BackpropagationBackpropagation, Backpropagation, Error Back Propagation, Backprop, Deltarule. TaxonomyThe Backpropagation algorithm is a supervised learning method for multilayer feedforward networks from the field of Artificial Neural Networks and more broadly Computational Intelligence. The name refers to the backward propagation of error during the training of the network. Backpropagation is the basis for many variations and extensions for training multilayer feedforward networks not limited to Vogl's Method (Bold Drive), DeltaBarDelta, Quickprop, and Rprop. InspirationFeedforward neural networks are inspired by the information processing of one or more neural cells (called a neuron). A neuron accepts input signals via its dendrites, which pass the electrical signal down to the cell body. The axon carry the signal out to synapses, which are the connections of a cell's axon to other cell's dendrites. In a synapse, the electrical activity is converted into molecular activity (neurotransmitter molecules crossing the synaptic cleft and binding with receptors). The molecular binding develops an electrical signal which is passed onto the connected cells dendrites. The Backpropagation algorithm is a training regime for multilayer feed forward neural networks and is not directly inspired by the learning processes of the biological system. StrategyThe information processing objective of the technique is to model a given function by modifying internal weightings of input signals to produce an expected output signal. The system is trained using a supervised learning method, where the error between the system's output and a known expected output is presented to the system and used to modify its internal state. State is maintained in a set of weightings on the input signals. The weights are used to represent an abstraction of the mapping of input vectors to the output signal for the examples that the system was exposed to during training. Each layer of the network provides an abstraction of the information processing of the previous layer, allowing the combination of subfunctions and higher order modeling. ProcedureThe Backpropagation algorithm is a method for training the weights in a multilayer feedforward neural network. As such, it requires a network structure to be defined of one or more layers where one layer is fully connected to the next layer. A standard network structure is one input layer, one hidden layer, and one output layer. The method is primarily concerned with adapting the weights to the calculated error in the presence of input patterns, and the method is applied backward from the network output layer through to the input layer. Algorithm (below) provides a highlevel pseudocode for preparing a network using the Backpropagation training method. A weight is initialized for each input plus an additional weight for a fixed bias constant input that is almost always set to 1.0. The activation of a single neuron to a given input pattern is calculated as follows: $activation = \bigg(\sum_{k=1}^{n} w_{k} \times x_{ki}\bigg) + w_{bias} \times 1.0$where $n$ is the number of weights and inputs, $x_{ki}$ is the $k^{th}$ attribute on the $i^{th}$ input pattern, and $w_{bias}$ is the bias weight. A logistic transfer function (sigmoid) is used to calculate the output for a neuron $\in [0,1]$ and provide nonlinearities between in the input and output signals: $\frac{1}{1+exp(a)}$, where $a$ represents the neuron activation. The weight updates use the delta rule, specifically a modified delta rule where error is backwardly propagated through the network, starting at the output layer and weighted back through the previous layers. The following describes the backpropagation of error and weight updates for a single pattern. An error signal is calculated for each node and propagated back through the network. For the output nodes this is the sum of the error between the node outputs and the expected outputs: $es_i = (c_i  o_i) \times td_i$where $es_i$ is the error signal for the $i^{th}$ node, $c_i$ is the expected output and $o_i$ is the actual output for the $i^{th}$ node. The $td$ term is the derivative of the output of the $i^{th}$ node. If the sigmod transfer function is used, $td_i$ would be $o_i \times (1o_i)$ For the hidden nodes, the error signal is the sum of the weighted error signals from the next layer. $es_i = \bigg(\sum_{k=1}^n (w_{ik} \times es_k)\bigg) \times td_i$where $es_i$ is the error signal for the $i^{th}$ node, $w_{ik}$ is the weight between the $i^{th}$ and the $k^{th}$ nodes, and $es_k$ is the error signal of the $k_th$ node. The error derivatives for each weight are calculated by combining the input to each node and the error signal for the node. $ed_i = \sum_{k=1}^n es_i \times x_k$where $ed_i$ is the error derivative for the $i^{th}$ node, $es_i$ is the error signal for the $i^{th}$ node and $x_k$ is the input from the $k^{th}$ node in the previous layer. This process include the bias input that has a constant value. Weights are updated in a direction that reduces the error derivative $ed_i$ (error assigned to the weight), metered by a learning coefficient. $w_i(t+1) = w_i(t) + (ed_k \times learn_{rate})$where $w_i(t+1)$ is the updated $i^{th}$ weight, $ed_k$ is the error derivative for the $k^{th}$ node and $learn_{rate}$ is an update coefficient parameter. Input :
ProblemSize , InputPatterns , $iterations_{max}$, $learn_{rate}$
Output :
Network
Network $\leftarrow$ ConstructNetworkLayers ()$Network_{weights}$ $\leftarrow$ InitializeWeights (Network , ProblemSize )For ($i=1$ To $iterations_{max}$)$Pattern_i$ $\leftarrow$ SelectInputPattern (InputPatterns )$Output_i$ $\leftarrow$ ForwardPropagate ($Pattern_i$, Network )BackwardPropagateError ($Pattern_i$, $Output_i$, Network )UpdateWeights ($Pattern_i$, $Output_i$, Network , $learn_{rate}$)End Return (Network )Heuristics
Code ListingListing (below) provides an example of the Backpropagation algorithm implemented in the Ruby Programming Language. The problem is the classical XOR boolean problem, where the inputs of the boolean truth table are provided as inputs and the result of the boolean XOR operation is expected as output. This is a classical problem for BackPropagation because it was the problem instance referenced by Minsky and Papert in their analysis of the Perceptron highlighting the limitations of their simplified models of neural networks [Minsky1969]. The algorithm was implemented using a batch learning method, meaning the weights are updated after each epoch of patterns are observed. A logistic (sigmoid) transfer function is used to convert the activation into an output signal. Weight updates occur at the end of each epoch using the accumulated delta's. A momentum term is used in conjunction with the past weight update to ensure the last update influences the current update, reducing large changes. A three layer network is demonstrated with 2 nodes in the input layer (two inputs), 2 nodes in the hidden layer and 1 node in the output layer, which is sufficient for the chosen problem. A bias weight is used on each neuron for stability with a constant input of 1.0. The learning process is separated into four steps: forward propagation, backward propagation of error, calculation of error derivatives (assigning blame to the weights) and the weight update. This separation facilities easy extensions such as adding a momentum term and/or weight decay to the update process. def random_vector(minmax) return Array.new(minmax.size) do i minmax[i][0] + ((minmax[i][1]  minmax[i][0]) * rand()) end end def initialize_weights(num_weights) minmax = Array.new(num_weights) {[rand(),rand()]} return random_vector(minmax) end def activate(weights, vector) sum = weights[weights.size1] * 1.0 vector.each_with_index do input, i sum += weights[i] * input end return sum end def transfer(activation) return 1.0 / (1.0 + Math.exp(activation)) end def transfer_derivative(output) return output * (1.0  output) end def forward_propagate(net, vector) net.each_with_index do layer, i input=(i==0)? vector : Array.new(net[i1].size){knet[i1][k][:output]} layer.each do neuron neuron[:activation] = activate(neuron[:weights], input) neuron[:output] = transfer(neuron[:activation]) end end return net.last[0][:output] end def backward_propagate_error(network, expected_output) network.size.times do n index = network.size  1  n if index == network.size1 neuron = network[index][0] # assume one node in output layer error = (expected_output  neuron[:output]) neuron[:delta] = error * transfer_derivative(neuron[:output]) else network[index].each_with_index do neuron, k sum = 0.0 # only sum errors weighted by connection to the current k'th neuron network[index+1].each do next_neuron sum += (next_neuron[:weights][k] * next_neuron[:delta]) end neuron[:delta] = sum * transfer_derivative(neuron[:output]) end end end end def calculate_error_derivatives_for_weights(net, vector) net.each_with_index do layer, i input=(i==0)? vector : Array.new(net[i1].size){knet[i1][k][:output]} layer.each do neuron input.each_with_index do signal, j neuron[:deriv][j] += neuron[:delta] * signal end neuron[:deriv][1] += neuron[:delta] * 1.0 end end end def update_weights(network, lrate, mom=0.8) network.each do layer layer.each do neuron neuron[:weights].each_with_index do w, j delta = (lrate * neuron[:deriv][j]) + (neuron[:last_delta][j] * mom) neuron[:weights][j] += delta neuron[:last_delta][j] = delta neuron[:deriv][j] = 0.0 end end end end def train_network(network, domain, num_inputs, iterations, lrate) correct = 0 iterations.times do epoch domain.each do pattern vector,expected=Array.new(num_inputs){kpattern[k].to_f},pattern.last output = forward_propagate(network, vector) correct += 1 if output.round == expected backward_propagate_error(network, expected) calculate_error_derivatives_for_weights(network, vector) end update_weights(network, lrate) if (epoch+1).modulo(100) == 0 puts "> epoch=#{epoch+1}, Correct=#{correct}/#{100*domain.size}" correct = 0 end end end def test_network(network, domain, num_inputs) correct = 0 domain.each do pattern input_vector = Array.new(num_inputs) {k pattern[k].to_f} output = forward_propagate(network, input_vector) correct += 1 if output.round == pattern.last end puts "Finished test with a score of #{correct}/#{domain.length}" return correct end def create_neuron(num_inputs) return {:weights=>initialize_weights(num_inputs+1), :last_delta=>Array.new(num_inputs+1){0.0}, :deriv=>Array.new(num_inputs+1){0.0}} end def execute(domain, num_inputs, iterations, num_nodes, lrate) network = [] network << Array.new(num_nodes){create_neuron(num_inputs)} network << Array.new(1){create_neuron(network.last.size)} puts "Topology: #{num_inputs} #{network.inject(""){m,im+"#{i.size} "}}" train_network(network, domain, num_inputs, iterations, lrate) test_network(network, domain, num_inputs) return network end if __FILE__ == $0 # problem configuration xor = [[0,0,0], [0,1,1], [1,0,1], [1,1,0]] inputs = 2 # algorithm configuration learning_rate = 0.3 num_hidden_nodes = 4 iterations = 2000 # execute the algorithm execute(xor, inputs, iterations, num_hidden_nodes, learning_rate) end Download: backpropagation.rb.
ReferencesPrimary SourcesThe backward propagation of error method is credited to Bryson and Ho in [Bryson1969]. It was applied to the training of multilayer networks and called backpropagation by Rumelhart, Hinton and Williams in 1986 [Rumelhart1986b] [Rumelhart1986c]. This effort and the collection of studies edited by Rumelhart and McClelland helped to define the field of Artificial Neural Networks in the late 1980s [Rumelhart1986] [Rumelhart1986a]. Learn MoreA seminal book on the approach was "Backpropagation: theory, architectures, and applications" by Chauvin and Rumelhart that provided an excellent introduction (chapter 1) but also a collection of studies applying and extending the approach [Chauvin1995]. Reed and Marks provide an excellent treatment of feedforward neural networks called "Neural Smithing" that includes chapters dedicated to Backpropagation, the configuration of its parameters, error surface and speed improvements [Reed1999]. Bibliography 
Free CourseGet one algorithm per week...
Own A CopyThis 438page ebook has...

Please Note: This content was automatically generated from the book content and may contain minor differences. 

Do you like Clever Algorithms? Buy the book now. 