Backpropagation DNN exercises #
*Figure: Computational graph in TensorBoard showing the components involved in a TensorFlow backpropagation update.*
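As a rough stand-in for that graph, the following minimal sketch (assuming TensorFlow 2; the toy shapes and data are made up for illustration) shows the components such an update involves: a forward pass through a dense layer, a cross-entropy loss, gradient computation with `tf.GradientTape`, and an SGD parameter update.

```python
import tensorflow as tf

# Toy data: 4 examples, 3 features, 2 classes (all shapes here are illustrative).
x = tf.random.normal((4, 3))
y = tf.one_hot([0, 1, 1, 0], depth=2)

W = tf.Variable(tf.random.normal((3, 2)))           # weights of a single dense layer
opt = tf.keras.optimizers.SGD(learning_rate=0.1)

with tf.GradientTape() as tape:
    logits = tf.matmul(x, W)                         # forward pass (no bias, as in the exercise)
    loss = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=logits))

grads = tape.gradient(loss, [W])                     # backward pass
opt.apply_gradients(zip(grads, [W]))                 # parameter update
```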
Neuron #
Simple DNN 1 #
Simple DNN 2 #
A network consists of a concatenation of the following layers:
- Fully connected layer with input $x^{(1)}$, weights $W^{(1)}$, and output $z^{(1)}$
- ReLU producing $a^{(1)}$
- Fully connected layer with weights $W^{(2)}$ producing $z^{(2)}$
- Softmax producing $\hat{y}$
- Cross-entropy (CE) loss producing $L$
The task of backprop consists of the following steps:
- Sketch the network and write down the equations for the forward pass.
- Propagate the backward pass, i.e., write down the expressions for the gradient of the loss with respect to all the network parameters.
NOTE: The bias terms are omitted for simplicity.
Forward Pass Step | Symbolic Equation |
---|---|
(1) | $z^{(1)} = W^{(1)} x^{(1)}$ |
(2) | $a^{(1)} = \max(0, z^{(1)})$ |
(3) | $z^{(2)} = W^{(2)} a^{(1)}$ |
(4) | $\hat{y} = \mathtt{softmax}(z^{(2)})$ |
(5) | $L = CE(y, \hat{y})$ |
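A minimal NumPy sketch of this forward pass follows, assuming made-up layer sizes (3 inputs, 4 hidden units, 2 classes) and a one-hot target; biases are omitted as in the text.

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(size=3)            # x^(1)
W1 = rng.normal(size=(4, 3))       # W^(1)
W2 = rng.normal(size=(2, 4))       # W^(2)
y = np.array([1.0, 0.0])           # one-hot target

z1 = W1 @ x1                       # (1) z^(1) = W^(1) x^(1)
a1 = np.maximum(0.0, z1)           # (2) a^(1) = max(0, z^(1))
z2 = W2 @ a1                       # (3) z^(2) = W^(2) a^(1)
y_hat = np.exp(z2 - z2.max())      # (4) softmax (shifted for numerical stability)
y_hat /= y_hat.sum()
L = -np.sum(y * np.log(y_hat))     # (5) cross-entropy loss
```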
Backward Pass Step | Symbolic Equation |
---|---|
(5) | $\frac{\partial L}{\partial L} = 1.0$ |
(4) | $\frac{\partial L}{\partial z^{(2)}} = \hat y - y$ |
(3a) | $\frac{\partial L}{\partial W^{(2)}} = (\hat y - y)\, a^{(1)\top}$ |
(3b) | $\frac{\partial L}{\partial a^{(1)}} = W^{(2)\top} (\hat y - y)$ |
(2) | $\frac{\partial L}{\partial z^{(1)}} = \frac{\partial L}{\partial a^{(1)}} \odot \mathbb{1}[z^{(1)} > 0]$ |
(1) | $\frac{\partial L}{\partial W^{(1)}} = \frac{\partial L}{\partial z^{(1)}}\, x^{(1)\top}$ |
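Continuing the NumPy sketch from the forward-pass table (same assumed shapes), the backward pass can be written directly from the rows above; the outer products give gradients with the same shapes as $W^{(2)}$ and $W^{(1)}$.

```python
dz2 = y_hat - y                    # (4)  dL/dz^(2)
dW2 = np.outer(dz2, a1)            # (3a) dL/dW^(2) = (y_hat - y) a^(1)^T
da1 = W2.T @ dz2                   # (3b) dL/da^(1) = W^(2)^T (y_hat - y)
dz1 = da1 * (z1 > 0)               # (2)  ReLU gate, elementwise
dW1 = np.outer(dz1, x1)            # (1)  dL/dW^(1) = dL/dz^(1) x^(1)^T
```

A quick finite-difference check on a single entry of $W^{(1)}$ or $W^{(2)}$ is a good way to verify these expressions against the forward pass.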