Building a Deep Neural Network: 2-Layer Logistic Regression
This post will walk through the process of building a 2-layer logistic regression neural network. The process goes as follows:
Forward Functions:
As you can see from figure 1, the 2-layer neural network can be summarized as:
- Input of feature vectors, which in this case is 187500 $$x$$ values.
- Execute the linear forward equation to get the $$z$$ values (one $$z$$ value for each node).
- Execute the ReLU function to get the $$a^{[1]}$$ values.
- With these values, execute the linear forward function again, which returns the $$z$$ value for the output layer.
- Execute the sigmoid function, which gives you the $$\hat{y}$$ output.
- Calculate the cost function for the dataset (a minimal code sketch of these steps follows below).
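To make the sequence concrete, here is a minimal numpy sketch of the forward pass. The function and variable names are my own illustration, not the post's actual code:

import numpy as np

def relu(Z):
    # ReLU: element-wise max(0, z)
    return np.maximum(0, Z)

def sigmoid(Z):
    # Sigmoid squashes z into (0, 1) for the output layer
    return 1 / (1 + np.exp(-Z))

def forward_pass(X, W1, b1, W2, b2):
    # Layer 1: linear forward, then ReLU
    Z1 = np.dot(W1, X) + b1      # shape (n1, m)
    A1 = relu(Z1)
    # Layer 2: linear forward, then sigmoid
    Z2 = np.dot(W2, A1) + b2     # shape (1, m)
    Y_hat = sigmoid(Z2)
    return Z1, A1, Z2, Y_hat     # cached values are reused in backward propagation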
The details of the analysis:
- The input is a (250, 250, 3) image, which is flattened to a vector of size $$(187500, 1)$$.
- The weight parameters $$w$$ are then initialized with random values. The bias variable $$b$$ is initialized with all 0's.
- The corresponding vector $$[x_1, x_2, \ldots, x_{187500}]^T$$ is then multiplied by the weight matrix $$W^{[1]}$$ of size $$(n^{[1]}, 187500)$$. You then add a bias term and calculate the activation function; for the hidden layer in this analysis, the ReLU (Rectified Linear Unit) function is used.
- The following vector is obtained: $$[a_0^{[1]}, a_1^{[1]}, \ldots, a_{n^{[1]}-1}^{[1]}]^T$$. This ends up being, in this case, an array of shape (7, 750), meaning there are 7 activation values for each of the 750 input images, based on the number of nodes in the hidden layer.
- The $$z$$ value for each of these 7 activation values ($$a^{[1]}$$) is then calculated using the new $$W^{[2]}$$ weight and $$b^{[2]}$$ bias values. In this setup, $$W^{[2]}$$ has dimensions (1, 7), the $$a^{[1]}$$ array has dimensions (7, 750), and the bias has dimensions (1, 1), resulting in a single $$z$$ value for each of the $$m$$ images.
- The $$\sigma(z)$$ value is calculated, yielding the $$\hat{y}$$ value.
- This $$\hat{y}$$ value is then used to calculate the cost function.
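The post does not spell the cost out here, but for a logistic-regression-style sigmoid output it is typically the cross-entropy cost, $$J = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)} \log \hat{y}^{(i)} + (1 - y^{(i)}) \log(1 - \hat{y}^{(i)})\right]$$. A minimal numpy sketch under that assumption (my own naming, not the post's exact code):

import numpy as np

def compute_cost(Y_hat, Y):
    # Binary cross-entropy cost averaged over the m examples
    m = Y.shape[1]
    cost = -np.sum(Y * np.log(Y_hat) + (1 - Y) * np.log(1 - Y_hat)) / m
    return float(np.squeeze(cost))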
Back Propagation:
This may seem complicated up to this point, and it is, but for me, it gets even more complicated with the backward propagation. It is relatively simple when you see the code, but actually porting the code from Python to R can get a little complicated. In fact, in many cases, I had to simply write the code and compare values on small matrices; in many instances that was the only way I knew for sure that the code was properly translated.
For every forward function, it is important to remember that there is always a corresponding backward propagation step. This is where the gradients for gradient descent are calculated, and the parameter values are then adjusted according to the learning rate $$\alpha$$. Figure 2 depicts the backward propagation as best I can draw it. I realize that it can be confusing, but hopefully you can follow it closely enough to map the code to what is actually happening. I have found that writing the matrix dimensions down and using them as a guide will help you avoid bugs.
Details of backward propagation:
- One detail omitted during the forward function was that the $$z^{[l]}$$ values, along with the parameters, were cached. These values are used in the backward propagation.
- The first step is to use the cost function to initialize the backward propagation.
- As with the forward function, backward propagation can be summarized as: a) compute the gradient of the cost with respect to $$z$$; b) implement the linear portion of the backward propagation by computing the gradients of the cost with respect to $$a^{[l]}$$, $$w^{[l]}$$, and $$b^{[l]}$$; c) finally, using these values, update the parameters using $$\alpha$$, the learning rate, and repeat the process for the set number of iterations defined by the initialization parameters.
I realize the last bullet was quite a mouthful, but it is what it is. Hopefully, the sketch below and the code translations between Python and R make it a little clearer.
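Here is a hedged numpy sketch of one backward pass and parameter update for this 2-layer setup (again my own variable names, not the post's code; it assumes the $$Z$$ and $$A$$ values cached during the forward pass, as described above):

import numpy as np

def backward_pass(X, Y, Y_hat, Z1, A1, W2):
    m = X.shape[1]
    # Initialize from the cost: for sigmoid + cross-entropy, dZ2 = Y_hat - Y
    dZ2 = Y_hat - Y                               # (1, m)
    dW2 = np.dot(dZ2, A1.T) / m                   # (1, n1)
    db2 = np.sum(dZ2, axis=1, keepdims=True) / m  # (1, 1)
    # Back through the hidden layer: linear part, then the ReLU derivative
    dA1 = np.dot(W2.T, dZ2)                       # (n1, m)
    dZ1 = dA1 * (Z1 > 0)                          # ReLU'(z) is 1 for z > 0, else 0
    dW1 = np.dot(dZ1, X.T) / m                    # (n1, n_x)
    db1 = np.sum(dZ1, axis=1, keepdims=True) / m  # (n1, 1)
    return dW1, db1, dW2, db2

def update_parameters(W1, b1, W2, b2, dW1, db1, dW2, db2, alpha):
    # Gradient descent step: parameter = parameter - alpha * gradient
    return (W1 - alpha * dW1, b1 - alpha * db1,
            W2 - alpha * dW2, b2 - alpha * db2)

Writing the expected shape next to each gradient, as in the comments above, is exactly the dimension-tracking habit mentioned earlier: each d-quantity must match the shape of the parameter it updates.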
Commonly used Python and R Matrix Functions:
Rather than display all of the code here, I will provide some of the less commonly required functions for matrix manipulation in both Python and R.
Create, Show Shape/Dimensions, and Reshape Matrix:
The very basics require you to create matrices, analyze matrix structures, and reshape the matrix to desired dimensions.
Python:
Keep in mind that Python and R index slightly differently. Python starts counting at index zero (0) and does not include the last number in an index range, whereas in R, indexing starts at one (1) and includes the last number in a range.
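A quick illustrative snippet of that difference (Python shown, with the equivalent R behavior noted in the comments):

import numpy as np

v = np.arange(10, 60, 10)  # array([10, 20, 30, 40, 50])
print(v[0])                # 10 -- the first element; in R, v[1] is the first element
print(v[1:3])              # [20 30] -- the stop index 3 is excluded
                           # in R, v[1:3] gives 10 20 30: starts at 1, endpoint included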
Create Matrix W: W = np.matrix(np.arange(1,253).reshape(7,36))
Display Structure: W.shape
Out[318]: (7, 36)
Reshape existing matrix: X = np.matrix(np.arange(1,37).reshape(36,1))
X.shape
Out[326]: (36, 1)
Create and return evenly spaced values within a given interval in Python, the last number in range is not included: b = np.matrix(np.arange(1,8).reshape(7,1))
b.shape
Out[330]: (7, 1)
The numpy dot function is used for 2-D arrays and is equivalent to matrix multiplication; for 1-D arrays, it is equivalent to the inner product of vectors:
z = W.dot(X) + b
matrix([
[ 16207],
[ 40184],
[ 64161],
[ 88138],
[112115],
[136092],
[160069]])
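For the 1-D case, a quick illustration of that inner product (not from the original post):

import numpy as np

u = np.array([1, 2, 3])
v = np.array([4, 5, 6])
print(np.dot(u, v))  # 32 -- the inner product: 1*4 + 2*5 + 3*6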
R:
W <- matrix(1:252, nrow = 7, ncol = 36, byrow = TRUE)
dim(W)
# [1] 7 36
X <- matrix(1:36, nrow = 36, ncol = 1, byrow = TRUE)
dim(X)
# [1] 36 1
b <- matrix(1:7, nrow = 7, ncol = 1, byrow = TRUE)
dim(b)
# [1] 7 1
Z <- W %*% X + b
[,1]
[1,] 16207
[2,] 40184
[3,] 64161
[4,] 88138
[5,] 112115
[6,] 136092
[7,] 160069
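One translation gotcha worth calling out: numpy's reshape fills row by row by default, while R's matrix() fills column by column unless byrow = TRUE is set (which is why it appears in the R code above). A small Python sketch of the two fill orders:

import numpy as np

v = np.arange(1, 7)                # [1 2 3 4 5 6]
print(v.reshape(2, 3))             # row-major (numpy default):
                                   # [[1 2 3]
                                   #  [4 5 6]]
print(v.reshape(2, 3, order='F'))  # column-major, matching R's matrix()
                                   # without byrow = TRUE:
                                   # [[1 3 5]
                                   #  [2 4 6]]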