Author: Marek Libra
A brief note before reading this Knol: I suggest reading
- the introduction to Artificial Neural Networks,
- Feed-Forward Neural Networks,
- Adaptation of Feed-Forward Neural Networks,
- and The Perceptron Rule.
Back Propagation (referred to as BP below) is an algorithm used for the adaptation of a multi-layered feed-forward neural network. It is based on the perceptron rule, but it propagates the effect of the overall network error into the inner parts of the network.
The network error on the training set is a complicated nonlinear function of the weights. Minimizing its value is a classical hard optimization problem, and BP is one way of dealing with it.
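As an illustration, a commonly used network error function (its exact form is not given in the text, so take this as an assumption) is the sum of squared errors over the training set T:

$E(w) = \frac{1}{2} \sum_{x \in T} \sum_{j \in \mathrm{outputs}} \bigl( y_j(x, w) - d_j(x) \bigr)^2$

where $y_j(x, w)$ is the actual output of output neuron j for input x under the weights w, and $d_j(x)$ is the desired output.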
The BP uses the gradient descent method, which demands the differentiability of the network error function. Obviously, the differentiability of the network error function relies on the differentiability of the activation function (due to the feed-forward computation mechanism described in the preceding Adaptation of Feed-Forward Neural Networks Knol).
This Knol assumes the standard sigmoid as the activation function in the description of the following equations. This function is commonly used in practical experiments.
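A minimal Python sketch of the standard sigmoid and its derivative follows (the function names are chosen here only for illustration); the convenient form of the derivative, expressed through the neuron output itself, is what keeps the gradient computation below simple:

import math

def sigmoid(x):
    # Standard sigmoid activation: sigma(x) = 1 / (1 + exp(-x))
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_derivative_from_output(y):
    # Derivative expressed through the output y = sigmoid(x):
    # sigma'(x) = y * (1 - y), the term that reappears in the BP equations.
    return y * (1.0 - y)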
Initially, at time t = 0, the weight $w_{ij}^{(0)}$ from neuron i to neuron j is set randomly for each pair of interconnected neurons i and j.
The adaptation is computed in a loop of discrete steps (beginning at time t = 1 and incrementing t in each subsequent step).
The weights $w^{(t)}$ at time t are obtained from the weights $w^{(t-1)}$ at time t−1 by adding the negative gradient of the network error at the point $w^{(t-1)}$, multiplied by a learning speed parameter $\varepsilon \in \mathbb{R}$:

$w_{ij}^{(t)} = w_{ij}^{(t-1)} - \varepsilon \, \frac{\partial E}{\partial w_{ij}}\bigg|_{w^{(t-1)}}$   (Eq. 1)

The gradient in Eq. 1 is computed from

$\frac{\partial E}{\partial w_{ij}} = \delta_j \, y_i$   (Eq. 2)

where $y_i$ is the output of neuron i and the error term $\delta_j$ is obtained, for the standard sigmoid activation,
- for a neuron j in the output layer:
  $\delta_j = (y_j - d_j) \, y_j (1 - y_j)$   (Eq. 3)
- for a neuron j in a hidden layer:
  $\delta_j = \Bigl( \sum_{k} \delta_k \, w_{jk} \Bigr) \, y_j (1 - y_j)$   (Eq. 4)

where $d_j$ is the desired output of output neuron j and the sum in Eq. 4 runs over all neurons k that receive input from neuron j.
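As a small worked example with illustrative values: for an output neuron j with actual output $y_j = 0.8$, desired output $d_j = 1$, and incoming signal $y_i = 0.5$, Eq. 3 gives $\delta_j = (0.8 - 1) \cdot 0.8 \cdot (1 - 0.8) = -0.032$, and with $\varepsilon = 0.5$ Eqs. 1 and 2 change the weight $w_{ij}$ by $-\varepsilon \, \delta_j \, y_i = 0.008$, i.e. the weight grows so that the output moves towards the desired value.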
As with other gradient descent methods, the main disadvantage of the BP is that it tends to end up in a local minimum. It is very difficult to reach the global minimum with such approaches. Some heuristics based on a stochastic extension of BP or on tuning of the learning speed parameter exist, but they are not optimal in general.
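One widely known heuristic of this kind (named here explicitly, since the text does not specify which ones are meant) is the momentum term described in [1], which reuses a fraction $\alpha$ of the previous weight change so that the search can roll through shallow local minima:

$\Delta w_{ij}^{(t)} = -\varepsilon \, \frac{\partial E}{\partial w_{ij}} + \alpha \, \Delta w_{ij}^{(t-1)}$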
The following simplified BP algorithm is provided for a better description. ΔE denotes a matrix used for the step-by-step accumulation of the network error gradient.
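Since the original algorithm listing is not reproduced here, below is a minimal Python sketch of a simplified batch BP for a network with one hidden layer; the function and variable names, the omission of bias terms, and the default parameter values are assumptions made only for illustration, and the matrices dE1 and dE2 play the role of ΔE:

import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def backprop_train(samples, n_in, n_hidden, n_out, epsilon=0.5, epochs=1000):
    """Simplified batch BP for a network with one hidden layer.

    samples is a list of (input vector, desired output vector) pairs.
    Weights are stored as matrices W1[i][j] (input -> hidden) and
    W2[j][k] (hidden -> output); bias terms are omitted to keep the sketch short.
    """
    # t = 0: random initial weights
    W1 = [[random.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_in)]
    W2 = [[random.uniform(-0.5, 0.5) for _ in range(n_out)] for _ in range(n_hidden)]

    for _ in range(epochs):
        # dE1, dE2 play the role of the matrix Delta-E: the gradient is
        # accumulated here step by step over the whole training set.
        dE1 = [[0.0] * n_hidden for _ in range(n_in)]
        dE2 = [[0.0] * n_out for _ in range(n_hidden)]

        for x, d in samples:
            # forward pass
            h = [sigmoid(sum(x[i] * W1[i][j] for i in range(n_in))) for j in range(n_hidden)]
            y = [sigmoid(sum(h[j] * W2[j][k] for j in range(n_hidden))) for k in range(n_out)]

            # output-layer error terms (Eq. 3)
            delta_out = [(y[k] - d[k]) * y[k] * (1.0 - y[k]) for k in range(n_out)]
            # hidden-layer error terms (Eq. 4)
            delta_hid = [sum(delta_out[k] * W2[j][k] for k in range(n_out)) * h[j] * (1.0 - h[j])
                         for j in range(n_hidden)]

            # accumulate the gradient (Eq. 2)
            for j in range(n_hidden):
                for k in range(n_out):
                    dE2[j][k] += delta_out[k] * h[j]
            for i in range(n_in):
                for j in range(n_hidden):
                    dE1[i][j] += delta_hid[j] * x[i]

        # weight update (Eq. 1)
        for i in range(n_in):
            for j in range(n_hidden):
                W1[i][j] -= epsilon * dE1[i][j]
        for j in range(n_hidden):
            for k in range(n_out):
                W2[j][k] -= epsilon * dE2[j][k]

    return W1, W2

A typical call for a small problem could look like backprop_train(samples, n_in=2, n_hidden=2, n_out=1), with samples given as (input vector, desired output vector) pairs.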
The claim to the discovery of the BP is controversial. Initially, it was accepted to have been discovered independently by Rumelhart, Hinton and Williams (1986), Le Cun (1985), and Parker (1985). But it was mainly the work by Rumelhart et al. that made the model popular; the original article was published as [1].
Further Reading
Knols
- Artificial Neural Networks
- Feed-Forward Neural Networks
- Adaptation of Feed-Forward Neural Networks
- The Perceptron Rule
- My Knol Directory
External
- [1] D. E. Rumelhart, G. E. Hinton, and R. J. Williams. Learning internal representations by error propagation. In Parallel Distributed Processing: Explorations in the Microstructure of Cognition. Bradford Books/MIT Press, Cambridge, 1986.
Source Knol: The Back Propagation