Computer Science and Engineering Knowledge Center: Feed Forward Neural Networks

Saturday, April 28, 2012

Feed Forward Neural Networks

Author: Marek Libra

A brief note before reading: I suggest reading the introduction to Artificial Neural Networks before this Knol.

Feed Forward Neural Networks (FFNNs) are a subclass of perceptron networks. Like other perceptron networks, FFNNs are computed in discrete time. Discrete-time models are very often used to solve practical problems, because they are generally well understood theoretically and, above all, because they are easy to simulate on von Neumann machines.

A feed-forward network is an acyclic graph of neurons in which every connection is labeled (weighted) with exactly one number from R (the real numbers). The neurons are grouped into a sequence of d + 1 pairwise disjoint layers. A neuron is connected only to neurons in the preceding or the subsequent layer.

The layers of a feed-forward NN are called input, hidden and output. A network has exactly one input layer and exactly one output layer; together they form the network interface. Hidden layers are optional, and there can be more than one.

The number of layers excluding the input layer is called the depth and equals d. The connection (sometimes called a synapse) from neuron i to neuron j carries the weight w_ij. A weight w_ij = 0 means there is no connection from neuron i to neuron j.

FFNNs are denoted by the numbers of neurons in their layers. For example, the notation 3-1-2-4 denotes an FFNN with 3 input neurons, 1 neuron in the first hidden layer, 2 neurons in the second hidden layer and 4 output neurons, as depicted in the following figure:

[Figure: a 3-1-2-4 feed-forward network]
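The layer-count notation can be unpacked mechanically. The following is a minimal Python sketch, not part of the original article; the helper names `parse_topology` and `describe` are hypothetical and chosen only for illustration. It splits a notation string into layer sizes, derives the depth d, and lists the shape of the weight block between each pair of consecutive layers.

```python
def parse_topology(notation):
    """Split a notation such as "3-1-2-4" into a list of layer sizes."""
    sizes = [int(part) for part in notation.split("-")]
    if len(sizes) < 2 or any(n <= 0 for n in sizes):
        raise ValueError("need an input and an output layer with positive sizes")
    return sizes


def describe(notation):
    """Summarize a topology (hypothetical helper, for illustration only)."""
    sizes = parse_topology(notation)
    return {
        "inputs": sizes[0],
        "hidden": sizes[1:-1],    # may be empty: hidden layers are optional
        "outputs": sizes[-1],
        "depth": len(sizes) - 1,  # number of layers excluding the input layer
        # one (neurons in layer k, neurons in layer k + 1) block of weights
        # per pair of consecutive layers
        "weight_shapes": [(a, b) for a, b in zip(sizes, sizes[1:])],
    }


info = describe("3-1-2-4")
print(info)
```

For the 3-1-2-4 example, the network has d + 1 = 4 layers, so the depth is 3 and there are three blocks of weights.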

An FFNN is computed gradually, layer by layer: the outputs of the neurons in one layer serve as the inputs of the neurons in the subsequent layer.
Initially, the states (that is, the outputs) of the neurons in the input layer are set to represent the external input.
The states of the neurons in each subsequent hidden or output layer are computed in parallel. A neuron takes the outputs of the neurons in the preceding layer as its inputs. Neuron j computes a weighted sum of these inputs, called the inner potential ε_j ∈ R:

ε_j = Σ_{i ∈ j←} w_ij y_i

where j← is the set of input neurons of neuron j (that is, neurons from the preceding layer) and y_i is the output of neuron i.
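The weighted sum can be illustrated with a few lines of Python. This is a sketch under stated assumptions: the inner potential is taken to be the plain weighted sum of the preceding layer's outputs (no bias term, since the text does not mention one), and the function name `inner_potentials` and the nested-list layout `weights[i][j]` for w_ij are illustrative choices, not the article's.

```python
def inner_potentials(weights, prev_outputs):
    """Return the inner potential of every neuron j in the next layer.

    weights[i][j] holds w_ij, the weight of the connection from neuron i
    of the preceding layer to neuron j; w_ij == 0.0 means "no connection".
    prev_outputs[i] holds y_i, the output of neuron i.
    """
    if len(weights) != len(prev_outputs):
        raise ValueError("one row of weights is needed per preceding neuron")
    n_next = len(weights[0])
    return [
        sum(weights[i][j] * prev_outputs[i] for i in range(len(prev_outputs)))
        for j in range(n_next)
    ]


# Three neurons in the preceding layer, two neurons in the next one.
y = [1.0, 0.5, -2.0]
w = [
    [0.2, 0.0],  # neuron 0 is not connected to neuron 1 of the next layer
    [0.4, 1.0],
    [0.1, 0.5],
]
eps = inner_potentials(w, y)
print(eps)
```

Because all neurons of a layer depend only on the preceding layer, the list comprehension over j could be evaluated in any order, which is what allows the layer to be computed in parallel.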
Nonlinearity is introduced into an NN by an activation function σ applied to the inner potential ε:

σ : R → (0, 1).

The activation function must be differentiable on R, or at least on the subset of R that the neurons actually use when solving a particular problem. Differentiability is required in particular by the back-propagation algorithm.

A commonly used activation function is the standard sigmoid:

σ(ε) = 1 / (1 + e^(-λε))

Here λ is a parameter of the function that is used to tune the network's behavior. It determines the steepness of the function, and in general its value can differ from neuron to neuron within an NN.
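The effect of the steepness parameter is easy to see numerically. The sketch below assumes the usual form of the standard sigmoid, σ(ε) = 1 / (1 + e^(-λε)); the function names and the default λ = 1 are assumptions made for illustration. It also includes the derivative λ·σ(ε)·(1 - σ(ε)), which follows from that form and is the quantity a gradient-based method such as back-propagation would need.

```python
import math


def sigmoid(eps, lam=1.0):
    """Standard sigmoid with steepness lam, assumed form 1 / (1 + e^(-lam*eps)).

    The two branches are algebraically identical; they only avoid
    overflow in math.exp for inputs of large magnitude.
    """
    z = lam * eps
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def sigmoid_derivative(eps, lam=1.0):
    """Derivative with respect to eps: lam * sigma * (1 - sigma)."""
    s = sigmoid(eps, lam)
    return lam * s * (1.0 - s)


# A larger lam pushes the output at the same potential closer to 0 or 1.
for lam in (0.5, 1.0, 5.0):
    print(lam, sigmoid(1.0, lam), sigmoid_derivative(0.0, lam))
```

At ε = 0 the output is 0.5 for every λ, while the slope there is λ/4, so λ directly controls how sharply the neuron switches between its two saturated states.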

The choice of σ is problem-specific, but the standard sigmoid, with values in (0, 1), is the common choice.
Another popular activation function is the hyperbolic tangent, σ = tanh, whose values lie in (-1, 1) rather than (0, 1):

σ(ε) = tanh(ε) = (e^ε - e^(-ε)) / (e^ε + e^(-ε))

A discussion of the activation function's behavior can be found in [1].
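The layer-by-layer computation described in this article can be sketched end to end in Python. This is an illustrative sketch, not the author's implementation: it assumes a plain weighted sum as the inner potential (no bias), a sigmoid of the form 1 / (1 + e^(-λε)) with λ = 1 by default, and randomly drawn weights; the names `forward` and `random_weights` are hypothetical.

```python
import math
import random


def sigmoid(eps, lam=1.0):
    """Assumed standard sigmoid; branches only guard against overflow."""
    z = lam * eps
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def random_weights(sizes, seed=0):
    """One matrix per pair of consecutive layers; entry [i][j] is w_ij."""
    rng = random.Random(seed)
    return [
        [[rng.uniform(-1.0, 1.0) for _ in range(n_out)] for _ in range(n_in)]
        for n_in, n_out in zip(sizes, sizes[1:])
    ]


def forward(weights, inputs, activation=sigmoid):
    """Propagate the external input through the network, layer by layer."""
    y = list(inputs)  # states of the input layer
    for w in weights:
        if len(w) != len(y):
            raise ValueError("weight matrix does not match the layer size")
        # inner potential of each neuron j in the next layer
        potentials = [
            sum(w[i][j] * y[i] for i in range(len(y)))
            for j in range(len(w[0]))
        ]
        # the activation turns potentials into the new layer's outputs
        y = [activation(e) for e in potentials]
    return y


sizes = [3, 1, 2, 4]  # the 3-1-2-4 example topology
weights = random_weights(sizes)
outputs = forward(weights, [0.5, -1.0, 2.0])
print(outputs)

# The hyperbolic tangent can be swapped in as the activation.
tanh_outputs = forward(weights, [0.5, -1.0, 2.0], activation=math.tanh)
print(tanh_outputs)
```

Passing the activation as a parameter keeps the propagation loop independent of the particular σ, so switching between the sigmoid and tanh changes only the range of the outputs, (0, 1) versus (-1, 1).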
