Most proofs in optimization consist in using inequalities for a particular function class in some creative way. This is a cheatsheet with inequalities that I use most often. It considers class of functions that are convex, strongly convex and $L$-smooth.
Basic inequalities
Setting. $f$ is a function $\mathbb{R}^p \to \mathbb{R}$. Below are a set of inequalities that are verified when $f$ belongs to a particular class of functions and $x, y \in \mathbb{R}^p$ are arbitrary elements in its domain. For simplicity I’m assuming that functions are differentiable, but most of these are also true replacing the gradient with a subgradient.
$f$ is $L$-smooth. This is the class of functions that are differentiable and its gradient is Lipschitz continuous.
- $|\nabla f(y) - \nabla f(x) | \leq L |x - y|$
- $|f(y) - f(x) - \langle \nabla f(x), y - x\rangle| \leq \frac{L}{2}|y - x|^2$ $\nabla^2 f(x) \preceq L\qquad \text{ (assuming $f$ is twice differentiable)} $
$f$ is convex:
- $f(\lambda x + (1 - \lambda)y) \leq \lambda f(x) + (1 - \lambda)f(y)$ for all $\lambda \in [0, 1]$.
- $f(x) \leq f(y) + \langle \nabla f(x), x - y\rangle$
- $0 \leq \langle \nabla f(x) - \nabla f(y), x - y\rangle$
- $f(\mathbb{E}X) \leq \mathbb{E}[f(X)]$ where $X$ is a random variable (Jensen’s inequality).
- $x = \text{prox}{\gamma f}(x) + \gamma \text{prox}{f^/\gamma}(x/\gamma)$, where $f^$ is the Fenchel conjugate (Moreau’s decomposition, )
$f$ is both $L$-smooth and convex:
- $\frac{1}{L}|\nabla f(x) - \nabla f(y)|^2 \leq \langle \nabla f(x) - \nabla f(y), x - y\rangle$
- $0 \leq f(y) - f(x) - \langle \nabla f(x), y - x\rangle \leq \frac{L}{2}|x - y|^2$
- $f(x) \leq f(y) + \langle \nabla f(x), x - y\rangle - \frac{1}{2 L}|\nabla f(x) - \nabla f(y)|^2$
$f$ is $\mu$-strongly convex. Set of functions $f$ such that $f - \frac{\mu}{2}|\cdot|^2$ is convex. It includes the set of convex functions with $\mu=0$. Here $x^*$ denotes the minimizer of $f$.
-
$f(x) \leq f(y) + \langle \nabla f(x), x - y \rangle - \frac{\mu}{2}|x - y|^2$
-
$f(x) \leq f(y) + \langle \nabla f(y), x - y\rangle + \frac{1}{2\mu}|\nabla f(x) - \nabla f(y)|^2$
-
$\mu|x - y|^2 \leq \langle \nabla f(x) - \nabla f(y), x - y\rangle$
-
$\frac{\mu}{2}|x-x^|^2\leq f(x) - f(x^)$
$f$ is both $L$-smooth and $\mu$-strongly convex.
- $\frac{\mu L}{\mu + L}|x - y|^2 + \frac{1}{\mu + L}|\nabla f(x) - \nabla f(y)|^2 \leq \langle \nabla f(x) - \nabla f(y), x - y\rangle$
- $\mu \preceq \nabla^2 f(x) \preceq L \qquad \text{ (assuming $f$ is twice differentiable)}$
References
Most of these inequalities appear in the Book: “Introductory lectures on convex optimization: A basic course” by Nesterov (2013, Springer Science & Business Media). Another good (and free) resource is the book “Convex Optimization” by Stephen Boyd and Lieven Vandenberghe.