Extreme Learning Machines (ELM): Filling the Gap between Frank Rosenblatt's Dream and John von Neumann's Puzzle
- Learning without iteratively tuning hidden neurons - Random hidden neurons - Random features
Neural networks (NN) and support vector machines (SVM) play key roles in machine learning and data analysis. Feedforward neural networks and support vector machines are usually considered different learning techniques in the computational intelligence community. Both popular learning techniques face some challenging issues, such as intensive human intervention, slow learning speed, and poor learning scalability.
It is clear that the learning speed of feedforward neural networks is in general far slower than required, and it has been a major bottleneck in their applications for the past decades. Two key reasons may be: 1) slow gradient-based learning algorithms are extensively used to train neural networks, and 2) all the parameters of the networks are tuned iteratively by such learning algorithms. On the other hand, due to their outstanding classification capability, the support vector machine and its variants such as the least squares support vector machine (LS-SVM) have been widely used in binary classification applications. The conventional SVM and LS-SVM cannot be used in regression and multi-class classification applications directly, although different SVM/LS-SVM variants have been proposed to handle such cases.
ELM works for the “generalized” single-hidden-layer feedforward networks (SLFNs), but the hidden layer (also called the feature mapping) in ELM need not be tuned. Such SLFNs include, but are not limited to, support vector machines, polynomial networks, RBF networks, and the conventional (both single-hidden-layer and multi-hidden-layer) feedforward neural networks. Different from the tenet in neural networks that all the hidden nodes in SLFNs need to be tuned, ELM learning theory shows that the hidden nodes / neurons of generalized feedforward networks need not be tuned and can be randomly generated. All the hidden node parameters are independent of the target functions and the training datasets. ELM theories conjecture that this randomness may also hold for biological learning in animal brains. Although in theory all the parameters of ELMs can be analytically determined instead of being tuned, for the sake of efficiency, in real applications the output weights of ELMs may be determined in different ways (with or without iterations, with or without incremental implementations, etc.).
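As a concrete illustration of this principle, here is a minimal numerical sketch of an ELM for regression. The function names (`elm_fit`, `elm_predict`) and sizes are illustrative choices, not the authors' reference code: the hidden-layer parameters are drawn at random and never tuned, and only the output weights are solved analytically via the Moore-Penrose pseudoinverse.

```python
import numpy as np

def elm_fit(X, T, n_hidden=50, seed=None):
    """Train an ELM: random hidden layer, analytic output weights."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_hidden))  # random input weights (never tuned)
    b = rng.standard_normal(n_hidden)                # random biases (never tuned)
    H = np.tanh(X @ W + b)                           # hidden-layer output matrix
    beta = np.linalg.pinv(H) @ T                     # output weights via pseudoinverse
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta

# Usage: approximate y = sin(x) from samples
X = np.linspace(0, np.pi, 200).reshape(-1, 1)
T = np.sin(X)
W, b, beta = elm_fit(X, T, n_hidden=30, seed=0)
err = np.max(np.abs(elm_predict(X, W, b, beta) - T))
```

Note that the only "learning" step is a single least-squares solve; no gradient iterations over the hidden-layer parameters are performed.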
Why can learning be made without tuning hidden neurons?
What kind of activation functions can be used in hidden neurons?
Does such a network have feature learning, clustering, regression and classification capabilities?
According to ELM theory:
Since the hidden node / neuron parameters are not only independent of the training data but also of each other, standard feedforward neural networks with such hidden nodes have universal approximation capability and separation capability. Such hidden nodes and their related mappings are termed ELM random nodes, ELM random neurons, or ELM random features.
Unlike conventional learning methods, which MUST see the training data before generating the hidden node / neuron parameters, ELM can randomly generate the hidden node / neuron parameters before seeing the training data.
Multi-hidden-layer networks can be built with hierarchical ELMs.
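The universal approximation claim above can be illustrated numerically (a sketch, not a proof; the target function and layer sizes are arbitrary choices): with untuned, randomly generated hidden neurons, the residual of the analytic least-squares output fit shrinks as hidden nodes are added.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.linspace(-1, 1, 300).reshape(-1, 1)
T = np.sinc(3 * X)  # arbitrary smooth target function

def train_rmse(n_hidden):
    # Hidden parameters are sampled independently of the data and of each other
    W = rng.standard_normal((1, n_hidden))
    b = rng.standard_normal(n_hidden)
    H = np.tanh(X @ W + b)
    beta = np.linalg.pinv(H) @ T  # analytic least-squares output weights
    return float(np.sqrt(np.mean((H @ beta - T) ** 2)))

# Training error for increasing numbers of random hidden nodes
errs = [train_rmse(n) for n in (5, 20, 80)]
```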
ELM was originally proposed for standard single-hidden-layer feedforward neural networks (with random hidden nodes, i.e., random hidden neurons or random features), and has recently been extended to kernel learning as well:
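A minimal sketch of the kernel variant (function names are assumptions, and the RBF kernel is one illustrative choice): the explicit random feature map is replaced by an implicit kernel mapping, and the output weights come from a single regularized linear solve, in the spirit of the kernel ELM formulation with regularization parameter C.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian (RBF) kernel matrix between row sets A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def kelm_fit(X, T, C=100.0, gamma=1.0):
    """Solve the regularized system (I/C + K) alpha = T."""
    K = rbf_kernel(X, X, gamma)
    return np.linalg.solve(np.eye(len(X)) / C + K, T)

def kelm_predict(Xnew, X, alpha, gamma=1.0):
    return rbf_kernel(Xnew, X, gamma) @ alpha

# Usage: kernel regression on y = sin(x)
X = np.linspace(0, np.pi, 100).reshape(-1, 1)
T = np.sin(X)
alpha = kelm_fit(X, T, C=1e4, gamma=2.0)
err = np.max(np.abs(kelm_predict(X, X, alpha, gamma=2.0) - T))
```

As with the random-feature form, there is no iterative tuning: the hidden mapping is fixed (here implicitly, by the kernel) and only one linear system is solved.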
ELM is efficient in:
ELM has been successfully used in the following applications:
Due to the demand on ELM solutions, ELM may help drive R&D in the following areas and make some applications which seem impossible in the past become true in the future:
Singapore, December 15 - 17 2015
Nanyang Technological University, Singapore
Zhejiang University, China
Tsinghua University, China