{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# PySmorch: A machine learning library for the dregs of society\n", "by\n", "**Your name here**\n", "\n", "After graduating with an advanced degree in CS, you apply for jobs. You get an interview at Facebook, but the HR team rejects your application when they find out you didn't get an 'A+' in 764. Soon after, a recruiter for Friendster reaches out to you and offers you a job. The pay is lousy and the cafeteria only serves Hot Pockets, but after your performance in 764, this is the best you can ever hope for in life. You accept the job and come to terms with reality.\n", "\n", "Ready, code monkey? Your first assignment is to implement a neural network classifier. Company policy dictates that all data science work must be done using Friendster's proprietary ML framework, PySmorch. To your horror, you discover that the developers of PySmorch forgot to implement the backward routines in any of the neural network layers. \n", "\n", "You realize there's only one way forward: complete the PySmorch library, implement a neural network classifier, become the CEO of Friendster, and raise the company to fame as the foremost social network! So grab a bag a cheetos and hunker down in your cube...it's time to go to war. \n", "\n", "...but first...let's configure your environment." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from utility import check_gradient\n", "from math import sqrt\n", "from scipy.signal import convolve2d\n", "import numpy as np\n", "from numpy.linalg import norm\n", "from numpy.random import randn, normal, randint\n", "import urllib, io\n", "import matplotlib.pyplot as plt\n", "np.random.seed(0)\n", "def good_job(path):\n", " f = urllib.request.urlopen(path)\n", " a = plt.imread(io.BytesIO(f.read()))\n", " fig = plt.imshow(a)\n", " fig.axes.get_xaxis().set_visible(False)\n", " fig.axes.get_yaxis().set_visible(False)\n", " plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "PySmorch contains classes that implement different neural network components. You create a network by layering these components together.\n", "\n", "Lets start by defining a data input layer, which just stores an array. During the forward pass, it just hands its data array forward to the next layer. During the backward pass, it does nothing since it contains no variables.\n", "\n", "Let's also define a least-squares loss. It takes in $X$ and returns $\\frac{1}{2M}\\|X-Y\\|^2$ where $Y$ is an array of labels specified when the loss in constructed, and $M$ is the number of training vectors (i.e, rows of $X$). The backward pass computes the gradient of the loss with respect to the inputs ($X$), and recursively hands it back (i.e., \"backprops\") to the layer behind. Note that the factor of $1/M$ in the loss means that we are computing an *average* over all instances in the batch. This formulation is nice because it makes the size (and curvature) of the loss function invariant to the batch size. As a result, the stability bound on the learning rate remains roughly the same for all batch sizes.\n", "\n", "You don't have to modify the cell below - the PySmorch devs already wrote these classes. However understanding these methods will help you implement other missing methods." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Don't modify this code block!\n", "class Data:\n", " \"\"\"Stores an input array of training data, and hands it to the next layer.\"\"\"\n", " def __init__(self, data): \n", " self.data = data\n", " self.out_dims = data.shape\n", " def set_data(self, data):\n", " self.data = data\n", " def forward(self):\n", " return self.data\n", " def backward(self, dwnstrm):\n", " pass\n", "\n", "class SquareLoss:\n", " \"\"\"Given a matrix of logits (one logit vector per row), and a vector labels, \n", " compute the sum of squares difference between the two\"\"\"\n", " def __init__(self, in_layer, labels):\n", " self.in_layer = in_layer\n", " self.labels = labels\n", " def set_data(self, labels):\n", " self.labels[:] = labels\n", " def forward(self):\n", " \"\"\"Loss value is (1/2M) || X-Y ||^2\"\"\"\n", " self.in_array = self.in_layer.forward()\n", " self.num_data = self.in_array.shape[0]\n", " return (0.5/self.num_data) * norm(self.in_array-self.labels)**2\n", " \n", " def backward(self):\n", " \"\"\"Gradient is (1/M) (X-Y), where N is the number of training samples\"\"\"\n", " self.pass_back = (self.in_array-self.labels)/self.num_data\n", " self.in_layer.backward(self.pass_back) # hand the gradient of loss with respect to inputs back to previous layer\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### You don't trust the PySmorth devs though...so run this unit test just to make sure." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Create a random test problem \n", "X = Data(randn(10,3))\n", "Y = randn(10,3)\n", "# Create a square loss layer that accepts the X's, and compares them to the Y's. \n", "# This is basically a neural net that does nothing but compute a loss.\n", "loss = SquareLoss(X,Y)\n", "# Create functions to produce the loss, and its gradient\n", "def network_loss(Xdata):\n", " X.set_data(Xdata)\n", " return loss.forward()\n", "def loss_grad(Xdata):\n", " X.set_data(Xdata)\n", " loss.forward()\n", " loss.backward()\n", " return loss.pass_back\n", "did_pass = check_gradient(network_loss,loss_grad,randn(10,3))\n", "assert did_pass, \"The square loss layer does not produce the correct gradient!\"\n", "print(\"TEST PASSED! Guess the PySmorch devs don't screw up EVERYTHING.\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Problem 1: A linear layer \n", "This layer takes its input $X,$ and produces $XW,$ where $W$ is a matrix of layer parameters. The backward step does two things. First, it computes the gradient of the downstream loss with respect to `W`. It stores this gradient in an array called `self.G`. Then, it computes the gradient of the downstream loss with respect to the input, and recursively hands this down to the layer beneath it. 
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "class Linear:\n", " \"\"\"Given an input matrix X, with one feature vector per row, \n", " this layer computes XW, where W is a linear operator.\"\"\"\n", " def __init__(self, in_layer, num_out_features):\n", " assert len(in_layer.out_dims)==2, \"Input layer must contain a list of 1D linear feature data.\"\n", " self.in_layer = in_layer\n", " num_data, num_in_features = in_layer.out_dims\n", " self.out_dims = np.array([num_data, num_out_features])\n", " # Declare the weight matrix\n", " self.W = randn(num_in_features, num_out_features)/sqrt(num_in_features) \n", " def forward(self):\n", " \"\"\"This function computes XW\"\"\"\n", " self.in_array = self.in_layer.forward()\n", " ##### YOUR CODE HERE #####\n", " return ##### YOUR CODE HERE #####\n", " def backward(self, dwnstrm):\n", " # Compute the gradient of the loss with respect to W, and store it as G\n", " self.G = ##### YOUR CODE HERE #####\n", " # Compute grad of loss with respect to inputs, and hand this gradient backward to the layer behind\n", " input_grad = ##### YOUR CODE HERE #####\n", " self.in_layer.backward(input_grad)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Now run this unit test. Don't change anything in this code block." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Don't modify this code block!\n", "# Make a random dataset with 10 samples of 3 features each\n", "X = Data(randn(10,3))\n", "Y = randn(10,2) # The label vector has to entries per sample\n", "L1 = Linear(X,num_out_features=4) # A linear layer maps each 3-vector onto a 2-vector\n", "L2 = Linear(L1,num_out_features=2) # A linear layer maps each 3-vector onto a 2-vector\n", "loss = SquareLoss(L2,Y) # Loss compares the output of the linear layer to the labels\n", "\n", "# Check that the method computes the matrix product\n", "out = L1.forward()\n", "error = norm(X.data@L1.W-out)/norm(X.data)\n", "assert error<1e-8, \"Your Linear layer does not compute the correct forward function\"\n", "\n", "# Check the gradient for W\n", "def network_loss(W): # Computes the value of the loss function\n", " L2.W[:] = W[:]\n", " loss_value = loss.forward()\n", " return loss_value\n", "def L2_grad(W): # Gradient of loss function with respect to weights in linear layer\n", " L2.W[:] = W[:]\n", " loss.forward()\n", " loss.backward()\n", " return L2.G\n", "W = normal(size=L2.W.shape)\n", "did_pass = check_gradient(network_loss,L2_grad,W)\n", "assert did_pass, \"Your linear layer is no good! You did not get the right gradient with respect to W.\"\n", "\n", "# Check that the correct gradient is passed back\n", "def network_loss(W): # Computes the value of the loss function\n", " L1.W[:] = W[:]\n", " loss_value = loss.forward()\n", " return loss_value\n", "def L1_grad(W): # Gradient of loss function with respect to weights in linear layer\n", " L1.W[:] = W[:]\n", " loss.forward()\n", " loss.backward()\n", " return L1.G\n", "W = normal(size=L1.W.shape)\n", "did_pass = check_gradient(network_loss,L1_grad,W)\n", "assert did_pass, \"Your linear layer is no good! You did not pass back the correct gradient with respect to the layer inputs.\"\n", "\n", "print(\"TEST PASSED! 
Your linear layer is terrific\")\n", "good_job(\"https://www.cs.umd.edu/~tomg/img/important_memes/penguin_hate.png\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Problem 2 - Relu non-linearity\n", "\n", "Neural nets are no good without non-linearities. If all the layers are linear operators, then the entire network can never be any more powerful than a single linear operator (why?). Implement a Relu non-linearity." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "class Relu:\n", "    \"\"\"Given an input matrix X, with one feature vector per row,\n", "    this layer computes maximum(X,0), where the maximum operator is coordinate-wise.\"\"\"\n", "    def __init__(self, in_layer):\n", "        self.in_layer = in_layer\n", "        self.in_dims = in_layer.out_dims\n", "        self.out_dims = self.in_dims\n", "    def forward(self):\n", "        self.in_array = self.in_layer.forward()\n", "        ##### YOUR CODE HERE #####\n", "        return 0 # Change this return value\n", "    def backward(self, dwnstrm):\n", "        ##### YOUR CODE HERE #####\n", "        self.in_layer.backward(##### YOUR CODE HERE #####)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Now test your Relu" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Don't modify this code block!\n", "# Create a dataset\n", "X = Data(randn(10,3))\n", "Y = randn(10,1)\n", "# Create network with 2 linear layers and a Relu in the middle\n", "L1 = Linear(X,5)\n", "R1 = Relu(L1)\n", "L2 = Linear(R1,1)\n", "loss = SquareLoss(L2,Y)\n", "# Methods to compute loss function, and gradient of loss with respect to L1 parameters. This gradient has to\n", "# backprop through your Relu, so this test should fail if your Relu is messed up.\n", "def network_loss(W):\n", "    L1.W[:] = W[:]\n", "    loss_value = loss.forward()\n", "    return loss_value\n", "def L1_grad(W):\n", "    L1.W[:] = W[:]\n", "    loss.forward()\n", "    loss.backward()\n", "    return L1.G\n", "W = normal(size=L1.W.shape)\n", "did_pass = check_gradient(network_loss,L1_grad,W)\n", "assert did_pass, \"Your Relu layer is no good! The gradient that backprops through it is wrong.\"\n", "print(\"TEST PASSED! Your Relu layer works\")\n", "good_job(\"https://www.cs.umd.edu/~tomg/img/important_memes/otter.png\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Problem 3 - cross entropy\n", "A real classification net uses a cross-entropy loss instead of a least-squares loss. Given a single input/logit vector $x$ with class label $k,$ the cross entropy is given by\n", " $$- \\log\\left( \\frac{\\exp(x_k)} { \\sum_i \\exp(x_i) } \\right) = -x_k + \\log( \\sum_i \\exp(x_i)).$$\n", "You can use the `SquareLoss` class as a model for how to do this. The formulas for forward and backward will be different, though.\n", "\n", "Remember that in your implementation, the input array contains a whole batch of logit vectors, one per row. Just like we did for the least-squares loss, you should return the *average* cross entropy loss over the batch dimension. 
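\n\nHint (one standard approach, not the only valid one): the log-sum-exp term can be computed stably by subtracting the row-wise maximum, $\\log \\sum_i \\exp(x_i) = m + \\log \\sum_i \\exp(x_i - m)$ with $m = \\max_i x_i$, so that nothing positive is ever exponentiated. For the backward pass, the gradient of a single sample's loss with respect to its logit vector works out to the softmax of the logits minus the one-hot label vector; like the loss itself, it should be scaled by $1/M$ for a batch of $M$ samples.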
\n", "\n", "**Your implementation of this layer can never exponentiate a positive number!**\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "class CrossEntropy:\n", " \"\"\"Given a matrix of logits (one logit vector per row), and a vector labels, \n", " compute the cross entropy of the softmax.\n", " The labels must be a 1d vector\"\"\"\n", " def __init__(self, in_layer, ones_hot):\n", " self.in_layer = in_layer\n", " self.ones_hot = ones_hot\n", " def set_labels(self, ones_hot):\n", " self.ones_hot = ones_hot\n", " def forward(self):\n", " in_array = self.in_layer.forward()\n", " ##### YOUR CODE HERE #####\n", " return 0 # Change this return value\n", " def backward(self):\n", " ##### YOUR CODE HERE #####\n", " self.in_layer.backward(##### YOUR CODE HERE #####)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Now, run this unit test for your cross-entropy layer" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Don't modify this code block!\n", "## Make dataset\n", "X = Data(randn(10,3)) # Create 10 random feature vectors of length 3\n", "Y = randint(5,size=(10)) # Assign a random class to each vector\n", "Y = np.eye(5)[Y] # Convert the 1D label vector to a 2D one's hot encoding with a 5-entry row for each sample\n", "# Build neural net with a linear layer, Relu, and a cross entropy loss\n", "L1 = Linear(X,5)\n", "R1 = Relu(L1)\n", "loss = CrossEntropy(R1,Y)\n", "\n", "print('Checking against a slow for loop implementation...')\n", "logits = R1.forward() # Get the logits by running the network up to the last layer before the loss\n", "xent_value = loss.forward() # The loss value from the code above. This value must match the result of the loop\n", "mean = 0\n", "for x,y in zip(logits,Y): # Compute the xent loss using a loop\n", " xent = -x[y>0] + np.log(np.sum(np.exp(x)))\n", " mean = mean + xent\n", "mean = mean/logits.shape[0]\n", "assert (mean-xent_value)**2<1e-10, \"Your cross entropy formula is incorrect.\"\n", "print(' Passed')\n", "\n", "# Build methods to get loss and gradient\n", "def network_loss(W):\n", " L1.W[:] = W[:]\n", " loss_value = loss.forward()\n", " return loss_value\n", "def L1_grad(W):\n", " L1.W[:] = W[:]\n", " loss.forward()\n", " loss.backward()\n", " return L1.G\n", "W = normal(size=L1.W.shape)\n", "\n", "print('Checking the gradient...')\n", "did_pass = check_gradient(network_loss, L1_grad, W, error_tol=1e-4)\n", "assert did_pass, \"Your softmax gradient is no good! Check your gradient formula!\"\n", "print(' Passed')\n", "\n", "print('Checking that your method is stable with large inputs.')\n", "L1.W = normal(size=L1.W.shape)*1000\n", "l = loss.forward()\n", "assert np.isfinite(l), \"Your softmax is unstable. You probably exponentiated positive numbers. That's not allowed!\"\n", "print(' Passed')\n", "\n", "print(\"TEST PASSED! Your softmax layer isn't good...it's teeeerific!\")\n", "good_job(\"https://www.cs.umd.edu/~tomg/img/important_memes/hacker_database.png\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Problem 4 - bias\n", "Our linear layer above performs a matrix multiplication on its input. In practice, most neural net libraries have layers that include an additive \"bias\" term, and so they compute $xw+b$ for some (trainable) constant b. Very often, the linear and conv layers are executed without bias because they are immediately followed by a batch norm layer that includes a bias term. 
In PySmorch, we perform this affine operation using two layers: a linear layer and a bias layer. We already implemented the linear layer above. Now let's do the bias.\n", "\n", "Write a layer that takes in a feature map and adds a (trainable) constant to each output feature. This layer only adds the constant; it does not perform a matrix multiplication. For consistency, we will use W and G to denote the layer's parameters and gradient. So for each row vector $x$, this layer returns $x+W$. It accepts a 2D array $X$ containing many row vectors, and adds the row vector $W$ to each of them (i.e., $W$ is broadcast over the rows of $X$)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "class Bias:\n", "    \"\"\"Given an input matrix X, add a trainable row vector W to every row of X.\"\"\"\n", "    def __init__(self, in_layer):\n", "        self.in_layer = in_layer\n", "        num_data, num_in_features = in_layer.out_dims\n", "        self.out_dims = in_layer.out_dims\n", "        # Declare the weight (bias) vector\n", "        self.W = randn(1,num_in_features)\n", "\n", "    def forward(self):\n", "        ##### YOUR CODE HERE #####\n", "        return 0 # Change this return value\n", "\n", "    def backward(self, dwnstrm):\n", "        # Compute the gradient of the loss with respect to W, and store it as G\n", "        self.G = ##### YOUR CODE HERE #####\n", "        # Compute grad of loss with respect to inputs, and hand this gradient backward to the layer behind\n", "        self.in_layer.backward(##### YOUR CODE HERE #####)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Your code is always perfect. But there are all sorts of bugs in the Friendster codebase. So better run this unit test to be sure..." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Don't modify this code block!\n", "# Create a dataset\n", "X = Data(randn(10,3))\n", "Y = randn(10,1)\n", "# Create network with 2 linear layers, a bias, and a Relu in the middle\n", "L1 = Linear(X,5)\n", "B1 = Bias(L1)\n", "R1 = Relu(B1)\n", "L2 = Linear(R1,1)\n", "loss = SquareLoss(L2,Y)\n", "# Build methods to get loss and gradient\n", "def network_loss(W):\n", "    B1.W[:] = W[:]\n", "    loss_value = loss.forward()\n", "    return loss_value\n", "def B1_grad(W):\n", "    B1.W[:] = W[:]\n", "    loss.forward()\n", "    loss.backward()\n", "    return B1.G\n", "W = normal(size=B1.W.shape)\n", "did_pass = check_gradient(network_loss,B1_grad,W)\n", "assert did_pass, \"Your bias gradient is no good. You used the wrong formula for the gradient with respect to W!\"\n", "def network_loss(W):\n", "    L1.W[:] = W[:]\n", "    loss_value = loss.forward()\n", "    return loss_value\n", "def L1_grad(W):\n", "    L1.W[:] = W[:]\n", "    loss.forward()\n", "    loss.backward()\n", "    return L1.G\n", "W = normal(size=L1.W.shape)\n", "did_pass = check_gradient(network_loss,L1_grad,W)\n", "assert did_pass, \"Your bias gradient is no good! You passed back the wrong gradient with respect to layer inputs!\"\n", "print(\"TEST PASSED! 
Your bias layer works\")\n", "good_job(\"https://www.cs.umd.edu/~tomg/img/important_memes/my_voice.png\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Finally, test a complete net with 3 linear layers, relus, biases, and cross-entropy" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Don't modify this code block!\n", "## Make dataset\n", "X = Data(randn(10,3)) # Create 10 random feature vectors of length 3\n", "Y = randint(5,size=(10)) # Assign a random class to each vector\n", "Y = np.eye(5)[Y] # Convert the 1D label vector to a 2D one-hot encoding with a 5-entry row for each sample\n", "\n", "# Create network with 3 linear layers, biases, and Relus\n", "L1 = Linear(X,5)\n", "B1 = Bias(L1)\n", "R1 = Relu(B1)\n", "L2 = Linear(R1,8)\n", "B2 = Bias(L2)\n", "R2 = Relu(B2)\n", "L3 = Linear(R2,5)\n", "B3 = Bias(L3)\n", "loss = CrossEntropy(B3,Y)\n", "\n", "def network_loss(W):\n", "    L1.W[:] = W[:]\n", "    loss_value = loss.forward()\n", "    return loss_value\n", "\n", "def L1_grad(W):\n", "    L1.W[:] = W[:]\n", "    loss.forward()\n", "    loss.backward()\n", "    return L1.G\n", "W = normal(size=L1.W.shape)\n", "did_pass = check_gradient(network_loss,L1_grad,W)\n", "assert did_pass, \"Your network failed! Run all the unit tests above and double check!\"\n", "print(\"\"\"Wow! You did it! As your boss, I'm soooo impressed with you! You're the best dev we've got here at\nFriendster, and you've been promoted to CTO! Stock value is up and we've got 14M new users!\"\"\")\n", "\n", "good_job(\"https://www.cs.umd.edu/~tomg/img/important_memes/koala.png\")\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" } }, "nbformat": 4, "nbformat_minor": 2 }