{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "Name: **Your name here** \n", "UID: **Your student ID num here**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "###### Homework 6: Gradient methods \n", "\n", "Put the file `grad2d.py` into your directory, along with `logreg.py`. Then run the following cell." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "from grad2d import grad2d, divergence2d\n", "from logreg import logistic_loss, logreg_objective, logreg_objective_grad, create_classification_problem\n", "import numpy as np\n", "from numpy import sqrt, sum, abs, max, maximum, logspace, exp, log, log10, zeros\n", "from numpy.linalg import norm\n", "from numpy.random import randn, rand, normal, randint\n", "import urllib, io\n", "import matplotlib.pyplot as plt\n", "np.random.seed(0)\n", "def good_job(path):\n", " f = urllib.request.urlopen(path)\n", " a = plt.imread(io.BytesIO(f.read()))\n", " fig = plt.imshow(a)\n", " fig.axes.get_xaxis().set_visible(False)\n", " fig.axes.get_yaxis().set_visible(False)\n", " plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Problem 1: Gradient descent\n", "Write a function that estimates the Lipschitz constant of a function $g$. This can be done using the formula\n", "$$L \\approx \\frac{\\|g(x) - g(y)\\|}{\\|x-y\\|}.$$\n", "The inputs should be a function $g:\\mathbb{R}^n \\to \\mathbb{R}^m,$ and an initial vector $x$ in the domain of $g$." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def estimate_lipschitz(g, x):\n", " # Your work here\n", " return L" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Now, run this unit test" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "g = lambda x: 10*x\n", "x = randn(3,4,5)\n", "L = estimate_lipschitz(g,x)\n", "assert abs(L-10)<1e-10, \"Your Lipschitz estimator is broken!\"\n", "print(\"Great - your Lipschitz estimator works!\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Write a routine that minimizes a function using gradient descent.\n", "The inputs $f$ and $grad$ are function handles. The function $f: \\mathbb{R}^N\\to \\mathbb{R}$ is an arbitrary objective function, and $grad: \\mathbb{R}^N \\to \\mathbb{R}^N$ is its gradient. The method should minimize $f$ using gradient descent, and terminate when the gradient of $f$ is small. I suggest stopping when\n", " $$\\|\\nabla f(x^k)\\|<\\|\\nabla f(x^0)\\|*tol$$\n", " where $x^0$ is an initial guess and $tol$ is a small tolerance parameter (a typical value would be $10^{-4}$). \n", " \n", " Use a backtracking line search to guarantee convergence. The stepsize should be monotonically decreasing. Each iteration should begin by trying the stepsize that was used on the previous iteration, and then backtrack until the Armijo condition holds:\n", " $$f(x^{k+1}) \\le f(x^k) + \\alpha \\langle x^{k+1} - x^k, \\nabla f(x^k)\\rangle,$$\n", " where $\\alpha \\in (0,1),$ and $\\alpha=0.1$ is suggested.\n", "\n", " The function returns the solution vector $x_{sol}$, and also a vector $res$ containing the norm of the residual (i.e., the norm of the gradient) at each iteration.\n", "\n", "This initial stepsize should be $10/L$, where $L$ is an estimate of the Lipschitz constant for the gradient." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def grad_descent(f, grad, x0, max_iters=10000, tol=1e-4):\n", " # Your work here\n", " return x, res" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Now run this unit test. It will use your routine to fit a logistic regression" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Define a classification problem\n", "X, y = create_classification_problem(100, 10, cond_number=10)\n", "# Define the logistic loss function, and its gradient\n", "f = lambda w: logreg_objective(w,X,y)\n", "grad = lambda w: logreg_objective_grad(w,X,y)\n", "# Pick the initial guess\n", "w0 = zeros((10,1))\n", "\n", "# Now, solve the minimization problem\n", "w, res = grad_descent(f,grad,w0)\n", "\n", "# Check the solution\n", "assert norm(grad(w))/norm(grad(w0))<1e-4, \"ERROR: your gradient descent routine did not the minimize the function\"\n", "print(\"Solver terminated in %d steps\"%len(res))\n", "print(\"Terrific! Your routine works!\")\n", "\n", "_, res_g = grad_descent(f,grad,w0)\n", "n_g = list(range(len(res_g)))\n", "plt.semilogy(n_g,res_g)\n", "plt.legend(('gradient',))\n", "plt.xlabel('iteration')\n", "plt.ylabel('residual')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Now, implement a gradient solver that begins each iterations using a Barzilai-Borwein stepsize \n", " $$\\tau = \\frac{\\langle x^{k} - x^{k-1} ,x^{k} - x^{k-1} \\rangle}{\\langle x^{k} - x^{k-1} ,\\nabla f(x^{k}) - \\nabla f(x^{k-1}) \\rangle}.$$\n", " Your routine should perform only 1 gradient computation per iteration (i.e., your method should not require any \"extra\" work).\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def grad_descent_bb(f, grad, x0, max_iters=10000, tol=1e-4):\n", " # Your work here!\n", " return x, res" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Now run this unit test" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Minimize that logistic loss again\n", "f = lambda w: logreg_objective(w,X,y)\n", "grad = lambda w: logreg_objective_grad(w,X,y)\n", "\n", "w, res = grad_descent_bb(f,grad,w0)\n", "assert res[-1]/res[0]<1e-4, \"ERROR: Your BB routine did not the minimize the function\"\n", "print(\"Solver terminated in %d steps\"%len(res))\n", "print(\"Terrific! 
Your BB routine works!\")\n", "\n", "_, res_g = grad_descent(f,grad,w0)\n", "_, res_b = grad_descent_bb(f,grad,w0)\n", "n_g = list(range(len(res_g)))\n", "n_b = list(range(len(res_b)))\n", "\n", "plt.semilogy(n_g,res_g,n_b,res_b)\n", "plt.legend(('gradient','bb'))\n", "plt.xlabel('iteration')\n", "plt.ylabel('residual')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Write a routine that uses Nesterov's accelerated gradient method\n", "\\begin{align}\n", "x^{k} &= y^k - \\tau \\nabla f(y^k)\\\\\n", "\\delta^{k+1} &= \\frac{1+\\sqrt{1+4(\\delta^k)^2}}{2}\\\\\n", "y^{k+1} &= x^{k}+\\frac{\\delta^k-1}{\\delta^{k+1}}(x^k-x^{k-1})\n", "\\end{align}\n", "A standard initialization is $\\delta^1 = 1$ and $y^1 = x^0$. The stepsize restriction for Nesterov's method is $\\tau<1/L$; however, when $L$ is not known exactly, you can use the line search condition\n", " $$f(x^k) \\le f(y^{k}) + \\alpha (x^k-y^k)^T\\nabla f(y^k), $$\n", " where $\\alpha \\in [1/2,1).$ I suggest choosing $\\alpha=1/2.$" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def grad_descent_nesterov(f, grad, x0, max_iters=10000, tol=1e-4):\n", "    # Your work here!\n", "    return x, res" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Now run this unit test" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Minimize the logistic loss using Nesterov's method\n", "f = lambda w: logreg_objective(w,X,y)\n", "grad = lambda w: logreg_objective_grad(w,X,y)\n", "\n", "w, res = grad_descent_nesterov(f,grad,w0)\n", "assert res[-1]/res[0]<1e-4, \"ERROR: Your Nesterov routine did not minimize the function\"\n", "print(\"Solver terminated in %d steps\"%len(res))\n", "print(\"Terrific! Your Nesterov routine works!\")\n", "\n", "_, res_g = grad_descent(f,grad,w0)\n", "_, res_b = grad_descent_bb(f,grad,w0)\n", "_, res_n = grad_descent_nesterov(f,grad,w0)\n", "n_g = list(range(len(res_g)))\n", "n_b = list(range(len(res_b)))\n", "n_n = list(range(len(res_n)))\n", "\n", "plt.semilogy(n_g,res_g,n_b,res_b,n_n,res_n)\n", "plt.legend(('gradient','bb','nesterov'))\n", "plt.xlabel('iteration')\n", "plt.ylabel('residual')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Problem 2: Image denoising\n", "Consider this noisy test image." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Don't modify this block\n", "image = zeros((50,50))\n", "image[15:35,15:35]=1\n", "image = image+0.1*randn(50,50)\n", "print('Here is a noisy image...')\n", "plt.imshow(image)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Here's the TV loss function from a previous homework..." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Don't modify this block\n", "mu = .1\n", "def h(z, eps=.01):\n", "    \"\"\"The hyperbolic approximation to L1\"\"\"\n", "    return sum(sqrt(z*z+eps*eps).ravel())\n", "def tv_denoise_objective(x,mu,b):\n", "    return mu*h(grad2d(x)) + 0.5*norm(x-b)**2\n", "def h_grad(z, eps=.01):\n", "    \"\"\"The gradient of h\"\"\"\n", "    return z/sqrt(z*z+eps*eps)\n", "def tv_denoise_grad(x,mu,b):\n", "    \"\"\"The gradient of the TV objective\"\"\"\n", "    return mu*divergence2d(h_grad(grad2d(x))) + x-b" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Use your BB solver to minimize the TV objective and denoise the test image."
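] }, { "cell_type": "markdown", "metadata": {}, "source": [ "One possible way to wire the TV objective into your solver is sketched below: bind `mu` and the noisy image into the objective and its gradient with lambdas, and use the noisy image itself as the starting guess. The names `f_tv` and `grad_tv` are only illustrative, and the sketch assumes the `grad_descent_bb` you wrote in Problem 1." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Sketch: bind mu and the noisy image into the TV objective and its gradient.\n", "# f_tv and grad_tv are illustrative names; the solver call assumes grad_descent_bb\n", "# from Problem 1 and is left commented out here.\n", "f_tv = lambda z: tv_denoise_objective(z, mu, image)\n", "grad_tv = lambda z: tv_denoise_grad(z, mu, image)\n", "# x, res = grad_descent_bb(f_tv, grad_tv, image)   # the noisy image is a natural starting guess"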
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Your work here!\n", "\n", "# x, res = grad_descent_bb(...)\n", "\n", "# Don't modify the three lines below\n", "print(\"Here's the denoised image...\")\n", "plt.imshow(x)\n", "plt.show()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" } }, "nbformat": 4, "nbformat_minor": 2 }