Author: Marek Libra

A brief note before reading this Knol: I suggest reading

- introduction to Artificial Neural Networks

- and Feed Forward Neural Networks
- and Adaptation of Feed-Forward Neural Networks
- and The Perceptron Rule

**Back Propagation** (called BP later) is an algorithm used for the adaptation of a multi-layered feed-forward neural network. It is based on the perceptron rule, extended by propagating the effect of the overall network error into the inner parts of the network.

The **network error** computed on the training set is a complicated nonlinear function of the weights. Minimizing the network error value is a classical hard optimization problem, and BP tries to deal with it.

BP uses the **gradient descent method**, which demands the differentiability of the network error function. Obviously, the differentiability of the network error function relies on the differentiability of the activation function (due to the feed-forward computation mechanism described in the preceding Adaptation of Feed-Forward Neural Networks Knol).

The Knol you are reading uses the **standard sigmoid** as the activation function in the following equations. This function is commonly used in practical experiments.

Initially, the weight w_ij^0 from the neuron i to the neuron j is set randomly for each pair of interconnected neurons i and j at the time t = 0.

The adaptation is computed in a loop of discrete steps (begin at time t = 1, increment t in each subsequent step).

The weights w_t of time t ensue from the weights w_{t−1} of time t−1 by adding the negative gradient of the network error E at the point w_{t−1}, multiplied by a learning speed parameter *e* ∈ R:

Eq. 1 | w_t = w_{t−1} − *e* · ∇E(w_{t−1})

*e* is set in the interval 0 < *e* < 1 by the user or by some supporting algorithm. Its value can be changed during the learning process to tune the adaptation behavior (learning speed or skipping over local extremes).
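A single step of Eq. 1, treating all weights as one flat vector, can be sketched as (a minimal illustration; the names are mine):

```python
def gradient_descent_step(weights, gradient, e):
    """One update of Eq. 1: w_t = w_{t-1} - e * grad E(w_{t-1}).
    `weights` and `gradient` are flat lists of equal length and
    `e` is the learning speed parameter, 0 < e < 1."""
    return [w - e * g for w, g in zip(weights, gradient)]
```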

The gradient in Eq. 1 is computed weight by weight. The partial derivative of the network error E with respect to the weight w_ij is the product of the output y_i of neuron i and an error term δ_j of neuron j:

Eq. 2 | ∂E/∂w_ij = δ_j · y_i

With the standard sigmoid as the activation function, the error term is computed backwards through the layers:

- for a neuron **j** in the output layer:
  Eq. 3 | δ_j = (y_j − d_j) · y_j · (1 − y_j)
- for a neuron **j** in a hidden layer:
  Eq. 4 | δ_j = y_j · (1 − y_j) · Σ_{k ∈ j→} δ_k · w_jk

Here d_j is the desired output of neuron j from the training set and j→ is the set of neurons with j as their input.
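Taking the network error as E = ½ Σ (y_j − d_j)², the two cases of the error term can be sketched as (the function names and the pair-list representation of j→ are mine):

```python
def output_delta(y_j, d_j):
    """Eq. 3 for an output neuron j with sigmoid activation:
    delta_j = (y_j - d_j) * y_j * (1 - y_j), where y_j is the neuron's
    output and d_j the desired output from the training set."""
    return (y_j - d_j) * y_j * (1.0 - y_j)

def hidden_delta(y_j, downstream):
    """Eq. 4 for a hidden neuron j.  `downstream` is a list of
    (delta_k, w_jk) pairs, one per neuron k in the set j-> of
    neurons that take j as their input."""
    return y_j * (1.0 - y_j) * sum(d_k * w_jk for d_k, w_jk in downstream)
```

The hidden-layer case shows where the algorithm's name comes from: the already-known error terms of the downstream neurons are propagated back through the connecting weights.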

As with other gradient descent methods, the main disadvantage of BP is that it finds only a **local minimum** of the network error. It is very difficult to reach the global minimum with such approaches. Some heuristics based on a stochastic extension of BP or on tuning of the learning speed parameter exist, but they are not optimal in general.

The following simplified BP algorithm is provided for better description. ∆E is a matrix used for the step-by-step computation of the network error gradient.
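A minimal batch-mode sketch of such a simplified BP, assuming one hidden layer, a single output neuron and no bias terms (all names are mine), could look like:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(W1, W2, x):
    """Feed-forward pass: W1[i][j] is the weight from input i to hidden
    neuron j, W2[j] the weight from hidden neuron j to the output."""
    h = [sigmoid(sum(x[i] * W1[i][j] for i in range(len(x))))
         for j in range(len(W2))]
    y = sigmoid(sum(h[j] * W2[j] for j in range(len(W2))))
    return y, h

def bp_epoch(W1, W2, samples, e):
    """One epoch of simplified batch-mode BP.  dE1 and dE2 play the role
    of the Delta-E matrix: the error gradient is accumulated sample by
    sample, then the weights are updated once as in Eq. 1."""
    dE1 = [[0.0] * len(W2) for _ in W1]
    dE2 = [0.0] * len(W2)
    for x, d in samples:
        y, h = forward(W1, W2, x)
        delta_out = (y - d) * y * (1.0 - y)                    # Eq. 3
        for j in range(len(W2)):
            delta_h = h[j] * (1.0 - h[j]) * delta_out * W2[j]  # Eq. 4
            dE2[j] += delta_out * h[j]                         # Eq. 2
            for i in range(len(x)):
                dE1[i][j] += delta_h * x[i]                    # Eq. 2
    for j in range(len(W2)):                                   # Eq. 1
        W2[j] -= e * dE2[j]
        for i in range(len(W1)):
            W1[i][j] -= e * dE1[i][j]
```

Repeating `bp_epoch` until the network error stops decreasing gives the basic adaptation loop; the local-minimum caveat above of course still applies.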

The claim to discovery of BP is controversial. Initially, it was accepted to have been discovered independently by Rumelhart, Hinton and Williams (1986), Le Cun (1985), and Parker (1985). But it was mainly the work by Rumelhart et al. that made the model popular; they published the original article as [1].

## Further Reading

### Knols

- Artificial Neural Networks
- Feed-Forward Neural Networks
- Adaptation of Feed-Forward Neural Networks
- The Perceptron Rule
- My Knol Directory

### External

- [1] D. E. Rumelhart, G. E. Hinton, and R. J. Williams. Learning internal representations by error propagation. In Parallel Distributed Processing: Explorations in the Microstructure of Cognition. Bradford Books/MIT Press, Cambridge, 1986.

Source Knol: The Back Propagation
