Author: Marek Libra
A brief note before reading this Knol: I suggest reading the following first:
- Introduction to Artificial Neural Networks
- Feed-Forward Neural Networks
- Adaptation of Feed-Forward Neural Networks
- The Perceptron Rule
Back propagation (called BP below) is an algorithm for adapting a multi-layered feed-forward neural network. It is based on the perceptron rule, but it also propagates the effect of the overall network error back into the inner parts of the network.
The network error on the training set is a complicated nonlinear function of the network weights. Minimizing this error is a classical hard optimization problem, and BP is one attempt to deal with it.
BP uses the gradient descent method, which requires the network error function to be differentiable. The differentiability of the network error function in turn relies on the differentiability of the activation function (due to the feed-forward computation mechanism described in the preceding Adaptation of Feed-Forward Neural Networks Knol).
This Knol assumes the standard sigmoid as the activation function in the equations that follow. This function is commonly used in practical experiments.
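As a small illustration, the standard sigmoid and its derivative can be sketched in Python. The function names `sigmoid` and `sigmoid_derivative_from_output` are chosen here for illustration and do not come from the original Knol. The sketch is not numerically hardened: very large negative inputs would overflow `math.exp`.

```python
import math

def sigmoid(xi: float) -> float:
    """Standard sigmoid: maps a real inner potential xi into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-xi))

def sigmoid_derivative_from_output(y: float) -> float:
    """Derivative of the standard sigmoid, expressed through its output y = sigmoid(xi)."""
    return y * (1.0 - y)

y = sigmoid(0.0)
print(y, sigmoid_derivative_from_output(y))  # prints: 0.5 0.25
```

The derivative depends only on the output y, which is why BP can reuse the values already computed during the feed-forward pass.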
Initially, at time t = 0, the weight $w_{ij}^{(0)}$ from neuron i to neuron j is set randomly for each pair of interconnected neurons i and j.
The adaptation is computed in a loop of discrete steps (beginning at time t = 1, with t incremented in each subsequent step).
The weights $w^{(t)}$ at time t are obtained from the weights $w^{(t-1)}$ at time t−1 by adding the negative gradient of the network error E at the point $w^{(t-1)}$, multiplied by a learning speed parameter e ∈ R:
$$w_{ij}^{(t)} = w_{ij}^{(t-1)} - e \, \frac{\partial E}{\partial w_{ij}}\left(w^{(t-1)}\right)$$
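The update rule described above can be tried on a toy problem. The following sketch applies it to a hypothetical two-parameter error function E(w) = (w0 − 3)² + (w1 + 1)², which stands in for the real network error only to show the mechanics of the step. The names `gradient_step` and `toy_error_gradient` are illustrative assumptions.

```python
def gradient_step(w, grad, e):
    """Return the new weights w - e * grad, element by element."""
    return [w_k - e * g_k for w_k, g_k in zip(w, grad)]

def toy_error_gradient(w):
    """Gradient of the hypothetical error E(w) = (w0 - 3)^2 + (w1 + 1)^2."""
    return [2.0 * (w[0] - 3.0), 2.0 * (w[1] + 1.0)]

w = [0.0, 0.0]            # weights at time t = 0 (fixed here instead of random)
for t in range(1, 101):   # discrete steps t = 1, 2, ..., 100
    w = gradient_step(w, toy_error_gradient(w), e=0.1)
print(w)                  # approaches the minimum at [3.0, -1.0]
```

With e = 0.1 the distance to the minimum shrinks by a factor of 0.8 in every step. A learning speed that is too large would make the same loop overshoot and diverge.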
Using the standard sigmoid and the definition of the inner potential (described in Feed-Forward Neural Networks):
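A hedged sketch of this step in LaTeX, under assumed notation that is not taken from the original Knol: $y_j$ is the output of neuron j, $\xi_j = \sum_i w_{ij}\, y_i$ is its inner potential, and $y_j = \sigma(\xi_j) = 1 / (1 + \exp(-\xi_j))$ is the standard sigmoid. With these assumptions the chain rule gives:

```latex
\frac{\partial E}{\partial w_{ij}}
  = \frac{\partial E}{\partial y_j}\,
    \frac{\partial y_j}{\partial \xi_j}\,
    \frac{\partial \xi_j}{\partial w_{ij}}
  = \frac{\partial E}{\partial y_j}\; y_j\,(1 - y_j)\; y_i
```

The middle factor is the derivative of the standard sigmoid, and the last factor follows because $\xi_j$ is linear in $w_{ij}$.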
Next, the back propagation strategy is used to compute the partial derivative:
- for a neuron j in the output layer:
- for a neuron j in a hidden layer:
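The two cases can be sketched as follows, assuming the usual sum-of-squares error (an assumption, since the error definition lives in the referenced Knols): $E = \sum_k E_k$ with $E_k = \frac{1}{2} \sum_{j \in Y} (y_j - d_{kj})^2$, where Y is the set of output neurons, $d_{kj}$ is the desired output of neuron j for training pattern k, and $j^{\rightarrow}$ denotes the set of neurons that receive input from neuron j.

```latex
% neuron j in the output layer
\frac{\partial E_k}{\partial y_j} = y_j - d_{kj}

% neuron j in a hidden layer
\frac{\partial E_k}{\partial y_j}
  = \sum_{r \in j^{\rightarrow}}
    \frac{\partial E_k}{\partial y_r}\; y_r\,(1 - y_r)\; w_{jr}
```

The hidden-layer case needs the derivatives of the neurons in the following layer, so the computation runs from the output layer backwards, which gives the algorithm its name.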
As with other gradient descent methods, the main disadvantage of BP is that it may end up in a local minimum of the network error. Reaching the global minimum with such approaches is very difficult. Some heuristics exist, based on a stochastic extension of BP or on tuning the learning speed parameter, but they are not optimal in general.
The following simplified BP algorithm is provided for better description. ∆E is a matrix used for step-by-step computation of the network error gradient.
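A minimal Python sketch of such an algorithm follows. It is not the author's original listing. It assumes a single hidden layer, a sum-of-squares error, bias weights modelled as an extra constant input of 1.0, and batch accumulation of the gradient in the matrices `dE_hidden` and `dE_output`, which play the role of ∆E. All function names and the XOR training set are illustrative assumptions.

```python
import math
import random

def sigmoid(xi):
    return 1.0 / (1.0 + math.exp(-xi))

def forward(w_hidden, w_output, x):
    """Feed-forward pass; a constant 1.0 is appended to each layer's input as a bias."""
    inputs = x + [1.0]
    hidden = [sigmoid(sum(w * v for w, v in zip(row, inputs))) for row in w_hidden]
    hidden_b = hidden + [1.0]
    outputs = [sigmoid(sum(w * v for w, v in zip(row, hidden_b))) for row in w_output]
    return inputs, hidden_b, outputs

def network_error(w_hidden, w_output, training_set):
    """E = 1/2 * sum over patterns and output neurons of (y_j - d_j)^2."""
    total = 0.0
    for x, d in training_set:
        _, _, y = forward(w_hidden, w_output, x)
        total += 0.5 * sum((yj - dj) ** 2 for yj, dj in zip(y, d))
    return total

def error_gradient(w_hidden, w_output, training_set):
    """Accumulate the error gradient (the role of the matrix dE) over all patterns."""
    dE_hidden = [[0.0] * len(row) for row in w_hidden]
    dE_output = [[0.0] * len(row) for row in w_output]
    for x, d in training_set:
        inputs, hidden_b, y = forward(w_hidden, w_output, x)
        # Output layer: dE/dy_j = y_j - d_j, times the sigmoid derivative y_j (1 - y_j).
        delta_out = [(yj - dj) * yj * (1.0 - yj) for yj, dj in zip(y, d)]
        # Hidden layer: propagate the output deltas back through the weights.
        # w_output[r][j] is the weight from hidden neuron j to output neuron r.
        delta_hid = []
        for j in range(len(w_hidden)):  # the bias unit is skipped
            dE_dy = sum(delta_out[r] * w_output[r][j] for r in range(len(w_output)))
            delta_hid.append(dE_dy * hidden_b[j] * (1.0 - hidden_b[j]))
        for r, row in enumerate(dE_output):
            for j in range(len(row)):
                row[j] += delta_out[r] * hidden_b[j]
        for j, row in enumerate(dE_hidden):
            for i in range(len(row)):
                row[i] += delta_hid[j] * inputs[i]
    return dE_hidden, dE_output

def train(training_set, n_hidden, e=0.5, steps=2000, seed=0):
    rng = random.Random(seed)
    n_in, n_out = len(training_set[0][0]), len(training_set[0][1])
    # t = 0: random initial weights (one extra column per neuron for the bias).
    w_hidden = [[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w_output = [[rng.uniform(-1, 1) for _ in range(n_hidden + 1)] for _ in range(n_out)]
    for t in range(1, steps + 1):
        dE_hidden, dE_output = error_gradient(w_hidden, w_output, training_set)
        # w(t) = w(t-1) - e * gradient of E at w(t-1)
        w_hidden = [[w - e * g for w, g in zip(wr, gr)] for wr, gr in zip(w_hidden, dE_hidden)]
        w_output = [[w - e * g for w, g in zip(wr, gr)] for wr, gr in zip(w_output, dE_output)]
    return w_hidden, w_output

xor = [([0.0, 0.0], [0.0]), ([0.0, 1.0], [1.0]), ([1.0, 0.0], [1.0]), ([1.0, 1.0], [0.0])]
w_h, w_o = train(xor, n_hidden=3)
print("error after training:", network_error(w_h, w_o, xor))
```

Because of the local-minimum problem mentioned above, a run with a different seed or learning speed may settle at a noticeably different final error.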
- D. E. Rumelhart, G. E. Hinton, and R. J. Williams. Learning internal representations by error propagation. In Parallel Distributed Processing: Explorations in the Microstructure of Cognition. Bradford Books/MIT Press, Cambridge, 1986.
Source Knol: The Back Propagation