Advanced Feedforward; Backpropagation
We've seen a simple feedforward implementation in which two inputs were fed into a neuron, multiplied by their respective weights, and then a bias added before passing the sum to an activation function. Those are the fundamental mechanics of feedforward.
Let's take a look at a slightly more complex and realistic feedforward system.
A 2-2-1 System
We're going to look at a 2-2-1 network, which indicates that we'll have two inputs, two nodes in the middle layer, and one output node.
Rather than have the bias be a separate value that gets added at the end, we're going to actually use what looks like a 3-3-1 system, where there appears to be an additional input at the bottom. We're going to use that for the bias: the input will always be 1, and the weight for that input will be the bias itself. This will allow us to more easily adjust the weights and bias as our algorithm learns.
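To see that folding the bias into the weights changes nothing about the result, here is a minimal sketch for a single neuron. The input, weight, and bias values are hypothetical and chosen only for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical values, not taken from the diagram.
x = np.array([0.5, -1.0])   # two inputs
w = np.array([0.8, 0.2])    # one weight per input
b = 0.1                     # bias

# Separate-bias form: weighted sum, then add the bias at the end.
out_separate = sigmoid(np.dot(w, x) + b)

# Folded form: append a constant input of 1 and treat the bias as its weight.
x_aug = np.append(x, 1.0)
w_aug = np.append(w, b)
out_folded = sigmoid(np.dot(w_aug, x_aug))

print(out_separate, out_folded)  # the two values match
```

Because the bias is now just another entry in the weight vector, a single update rule can adjust weights and bias together.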
We're going to run through the feedforward process step-by-step to make sure we have a good handle on what's happening here. Use the indicated inputs, weights, and bias values and confirm that a sigmoid activation function will produce the values shown.
This process of working left-to-right through the layers of the network is the feedforward phase of our learning, also sometimes called the propagation phase.
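The left-to-right pass can be sketched in a few lines of NumPy. The weights below are hypothetical placeholders, so substitute the values from the diagram to check your hand calculations. The last column of each weight array is assumed to hold the bias.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def feedforward(x, W_hidden, W_output):
    """Forward pass through a 2-2-1 network with biases folded into the weights."""
    x_aug = np.append(x, 1.0)                # [x1, x2, 1]
    hidden = sigmoid(W_hidden @ x_aug)       # two hidden activations
    hidden_aug = np.append(hidden, 1.0)      # [h1, h2, 1]
    output = sigmoid(W_output @ hidden_aug)  # single output value
    return hidden, output

# Hypothetical weights; the last column of each array is the bias.
W_hidden = np.array([[0.1, 0.2, 0.3],
                     [0.4, 0.5, 0.6]])
W_output = np.array([0.7, 0.8, 0.9])

hidden, output = feedforward(np.array([1.0, 0.0]), W_hidden, W_output)
print(hidden, output)
```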
We now need to perform a backpropagation phase, in which we compute the gradient of the loss at the final layer, and then work backward, recursively applying partial derivatives and the chain rule to update the weights for each cell in each layer.
Backpropagation
The backpropagation process is a fascinating one, and the principles are easy to understand.
- With inputs, a set of random initial weights and biases, and a true (labeled) output, we can calculate the result of a feedforward pass and determine what the loss (or "error") in our network is.
- We can consider that loss as being a multivariable function, where the loss is dependent on the weights and biases in our layers: L(w_1, w_2, ..., b_1, b_2, ...)
- The value of L(...) will vary as a function of each individual variable. Because they are all independent variables, we can compute the partial derivative of the loss with respect to each variable.
- Use each partial derivative to update the appropriate weight (or bias), nudging it a small step (scaled by the learning rate) in the direction that reduces the loss.
- Repeat the process of feedforward and backprop until
- a given number of epochs have passed, or
- the predicted values align well with the true labels in the training set
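A single pass through the steps above can be sketched as follows. This is an illustration under stated assumptions rather than a definitive implementation: the loss is assumed to be half the squared error for one training example, the activation is the sigmoid, and the weights, input, label, and learning rate are all hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W1, W2):
    x_aug = np.append(x, 1.0)        # input plus constant 1 for the bias
    h = sigmoid(W1 @ x_aug)          # hidden activations
    h_aug = np.append(h, 1.0)        # hidden plus constant 1 for the bias
    y_hat = sigmoid(W2 @ h_aug)      # network output
    return x_aug, h, h_aug, y_hat

def loss(x, y, W1, W2):
    # Assumed loss: half the squared error for one example.
    return 0.5 * (forward(x, W1, W2)[3] - y) ** 2

def gradients(x, y, W1, W2):
    x_aug, h, h_aug, y_hat = forward(x, W1, W2)
    # Chain rule at the output: dL/dz_out = (y_hat - y) * sigmoid'(z_out)
    delta_out = (y_hat - y) * y_hat * (1.0 - y_hat)
    grad_W2 = delta_out * h_aug
    # Push the error back through the output weights (bias entry excluded).
    delta_hidden = delta_out * W2[:2] * h * (1.0 - h)
    grad_W1 = np.outer(delta_hidden, x_aug)
    return grad_W1, grad_W2

# Hypothetical starting point.
W1 = np.array([[0.1, 0.2, 0.3],
               [0.4, 0.5, 0.6]])
W2 = np.array([0.7, 0.8, 0.9])
x, y = np.array([1.0, 0.0]), 0.0
learning_rate = 0.5

loss_before = loss(x, y, W1, W2)
grad_W1, grad_W2 = gradients(x, y, W1, W2)
W1_new = W1 - learning_rate * grad_W1   # step against the gradient
W2_new = W2 - learning_rate * grad_W2
loss_after = loss(x, y, W1_new, W2_new)
print(loss_before, loss_after)
```

Each entry of `grad_W1` and `grad_W2` is the partial derivative of the loss with respect to one weight or bias, so subtracting a fraction of it moves that parameter in the direction that lowers the loss.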
Example: the XOR binary function
This example uses feedforward and backpropagation to try to learn the rules for the exclusive-or (XOR) binary operation. The NumPy library is used for manipulating arrays, but no machine-learning libraries (TensorFlow, Keras, scikit-learn, PyTorch) are used.
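A minimal sketch of what such a program might look like is given below. The hyperparameters (`hidden_cells`, `learning_rate`, `epochs`), the random seed, and the sum-of-half-squared-errors loss are all assumptions made for illustration. With only two hidden cells, training can stall depending on the initial weights, so the final predictions may or may not match the XOR table.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def add_bias_column(a):
    """Append a constant 1 to every row so the bias acts as a weight."""
    return np.hstack([a, np.ones((a.shape[0], 1))])

# XOR truth table: inputs and true labels.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# Hypothetical hyperparameters; adjust freely.
hidden_cells = 2
learning_rate = 0.5
epochs = 10000

rng = np.random.default_rng(seed=0)
W1 = rng.normal(size=(3, hidden_cells))       # (2 inputs + bias) -> hidden
W2 = rng.normal(size=(hidden_cells + 1, 1))   # (hidden + bias) -> output

X_aug = add_bias_column(X)
losses = []
for epoch in range(epochs):
    # Feedforward phase.
    H = sigmoid(X_aug @ W1)
    H_aug = add_bias_column(H)
    y_hat = sigmoid(H_aug @ W2)

    # Assumed loss: sum of half squared errors over the four examples.
    losses.append(float(np.sum(0.5 * (y_hat - y) ** 2)))

    # Backpropagation phase: chain rule from the output back to the input.
    delta_out = (y_hat - y) * y_hat * (1.0 - y_hat)
    delta_hidden = (delta_out @ W2[:-1].T) * H * (1.0 - H)

    # Gradient-descent updates for both layers.
    W2 -= learning_rate * (H_aug.T @ delta_out)
    W1 -= learning_rate * (X_aug.T @ delta_hidden)

predictions = sigmoid(add_bias_column(sigmoid(X_aug @ W1)) @ W2)
print(losses[0], losses[-1])
print(predictions.round(3))
```

Comparing `losses[0]` with `losses[-1]` shows whether training reduced the error. Changing the seed or raising `hidden_cells` is a quick way to see how sensitive the result is to the starting weights.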
Example: our own test data
This example uses feedforward and backpropagation to try to identify patterns in height-weight-gender data. Try manipulating the learning rate, the number of epochs, the number of cells in the hidden layer, or the number of layers to see how that affects your results.