The Bregman Algorithm (3/4)

Unconstrained Optimization using the Bregman Algorithm

In a previous post we discussed how to solve constrained optimization problems by using the Bregman algorithm. Here we want to extend the approach to unconstrained problems.

Let’s start simple. Assume we want to minimize a convex and smooth function $f\colon\mathbb{R}^{n}\to\mathbb{R}$. Since $\mathbb{R}^n$ is a convex set, we can apply the Bregman algorithm from our previous post and get the following iterative scheme.

$$\begin{align} x^{(n+1)} &= \arg\min_z\left\lbrace D(z, x^{(n)})\right\rbrace \\ &= \arg\min_z\left\lbrace f(z) - \left\langle \nabla f(x^{(n)}), z \right\rangle \right\rbrace \end{align}$$
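The second equality follows directly from the definition of the Bregman divergence,

$$\begin{equation} D(z, x^{(n)}) = f(z) - f(x^{(n)}) - \left\langle \nabla f(x^{(n)}), z - x^{(n)} \right\rangle \end{equation}$$

since all remaining terms are constant in $z$ and therefore do not affect the minimizer.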

The corresponding first order optimality condition asks us to solve $\nabla f(z) = \nabla f(x^{(n)})$ for $z$. Additionally, we have to take the source condition into account when choosing $x^{(0)}$. As mentioned in our previous post, we may set $x^{(0)}$ to the global minimizer of $f$, in which case the algorithm would already be finished after its first iteration. That is very nice, but not really useful. We can draw two conclusions from this observation.

  1. In principle, it’s possible to apply the Bregman algorithm to unconstrained optimization problems.
  2. In such a general setting, it’s rather pointless.

So let’s try something more specific. Assume our function $f$ is given by $f(x) = g(x) + \lambda h(Ax)$ with convex and smooth functions $g$ and $h$, a matrix $A\in\mathbb{R}^{n,n}$, and a positive parameter $\lambda$. Minimizing such models is a very common task in applied mathematics and arises, for example, when computing optical flow in computer vision. The authors of [2] proposed to iterate the following minimization scheme, where we denote the Bregman divergence of $g$ by $D_g$.

$$\begin{equation} x^{(n+1)} = \arg\min_z\left\lbrace D_g(z,x^{(n)}) + \lambda h(Az) \right\rbrace \end{equation}$$

Instead of using the Bregman divergence of the whole expression, we only use the Bregman divergence of the first part of the energy. Nevertheless, it doesn’t feel like we have gained anything from this. We have replaced $g$ by its Bregman divergence, which adds at least one additional term to the energy that we need to take into consideration. However, we can add a second trick. We introduce a slack variable $y$ that takes the place of $Ax$ inside $h$ and require $y$ to stay close to $Ax$. This closeness requirement enters our model as an additional quadratic penalty $\frac{\mu}{2}\left\lVert y-Ax\right\rVert_2^2$ with a weight $\mu > 0$. Thus, we consider the following energy.

$$\begin{equation} \arg\min_{z,y}\left\lbrace g(z) + \lambda h(y) + \frac{\mu}{2}\left\lVert y-Az\right\rVert_2^2 \right\rbrace \end{equation}$$
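To make this concrete: the setting of [2] is $\ell^1$ regularized problems. As an illustrative instance (the data vector $c$ is notation introduced here, not taken from [2]), choose a quadratic data term $g(z) = \frac{1}{2}\left\lVert z-c\right\rVert_2^2$ and $h = \left\lVert\cdot\right\rVert_1$, so that the split energy reads

$$\begin{equation} \arg\min_{z,y}\left\lbrace \frac{1}{2}\left\lVert z-c\right\rVert_2^2 + \lambda\left\lVert y\right\rVert_1 + \frac{\mu}{2}\left\lVert y-Az\right\rVert_2^2 \right\rbrace \end{equation}$$

Strictly speaking, $\left\lVert\cdot\right\rVert_1$ is not smooth, so its Bregman divergence has to be formed with a subgradient instead of the gradient, but the scheme carries over.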

Combining both ideas, the proposed scheme from [2] then becomes

$$\begin{equation} (z,y)^{(n+1)} = \arg\min_{(z,y)} \left\lbrace D_{g+\lambda h}\left((z,y),(z,y)^{(n)}\right) + \frac{\mu}{2}\left\lVert y-Az\right\rVert_2^2 \right\rbrace \end{equation}$$

Here, $g+\lambda h$ is understood as the function $(z,y)\mapsto g(z)+\lambda h(y)$ of the joint variable. It’s easy to show that we have the following identity for its Bregman divergence.

$$\begin{equation} D_{g+\lambda h}(\cdot,\cdot) = D_{g}(\cdot,\cdot) + \lambda D_{h}(\cdot,\cdot) \end{equation}$$
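Indeed, writing $F(z,y) = g(z) + \lambda h(y)$, the gradient decouples as $\nabla F(z,y) = \left(\nabla g(z), \lambda\nabla h(y)\right)$, and the identity follows in one line from the definition of the divergence:

$$\begin{align} D_F\left((z,y),(z',y')\right) &= F(z,y) - F(z',y') - \left\langle \nabla F(z',y'), (z-z',\,y-y') \right\rangle \\ &= D_g(z,z') + \lambda D_h(y,y') \end{align}$$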

Thus, we can decouple the computation in our aforementioned iterative scheme into two simpler steps by minimizing alternately with respect to $z$ and $y$.

$$\begin{align} z^{(n+1)} &= \arg\min_{z}\left\lbrace D_g(z,z^{(n)}) + \frac{\mu}{2}\left\lVert y^{(n)}-Az\right\rVert_2^2 \right\rbrace \\ y^{(n+1)} &= \arg\min_{y}\left\lbrace \lambda D_h(y,y^{(n)}) + \frac{\mu}{2}\left\lVert y-Az^{(n+1)}\right\rVert_2^2 \right\rbrace \end{align}$$

These two steps involve proximal operators, which are well understood in mathematics, and closed form expressions exist for the most common choices of $g$ and $h$. Hence, their evaluation is simple and very fast. Because of this splitting, the variant of the Bregman algorithm presented here is also known as the Split Bregman algorithm. A thorough evaluation of its performance, as well as all the details on its convergence properties, can be found in my master’s thesis [3].
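To illustrate the whole procedure, here is a minimal numerical sketch for the $\ell^1$ regularized instance from above, i.e. $g(z) = \frac{1}{2}\left\lVert z-c\right\rVert_2^2$ and $h = \left\lVert\cdot\right\rVert_1$. It uses the formulation of [2], in which the Bregman divergence terms are realized by adding the constraint residual back through an auxiliary variable $p$; the $y$-step then reduces to soft-thresholding and the $z$-step to a linear solve. All names as well as the dense solver are illustrative choices, not prescribed by [2].

```python
import numpy as np

def soft_threshold(v, t):
    # Proximal operator of t * ||.||_1: elementwise shrinkage towards zero.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def split_bregman(A, c, lam, mu, n_iter=200):
    # Sketch of the Split Bregman iteration for
    #     min_x 0.5 * ||x - c||_2^2 + lam * ||A x||_1
    # with slack variable y ~ A x and Bregman variable p.
    m, n = A.shape
    x = np.zeros(n)
    y = np.zeros(m)                    # slack variable, y ~ A x
    p = np.zeros(m)                    # Bregman variable ("added back" residual)
    M = np.eye(n) + mu * (A.T @ A)     # system matrix of the x-step
    for _ in range(n_iter):
        # x-step: minimize 0.5*||x - c||^2 + (mu/2)*||y - A x - p||^2,
        # i.e. solve (I + mu * A^T A) x = c + mu * A^T (y - p).
        x = np.linalg.solve(M, c + mu * A.T @ (y - p))
        # y-step: minimize lam*||y||_1 + (mu/2)*||y - A x - p||^2,
        # which is plain soft-thresholding.
        y = soft_threshold(A @ x + p, lam / mu)
        # Bregman step: add the constraint residual back.
        p = p + A @ x - y
    return x
```

Taking $A$ as a finite difference matrix, for instance, turns this into a simple total variation style denoising of the signal $c$. For large problems one would of course replace the dense solve by a few sweeps of an iterative solver.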

References

[1] Bregman, L. M. “The Relaxation Method of Finding the Common Point of Convex Sets and Its Application to the Solution of Problems in Convex Programming.” USSR Computational Mathematics and Mathematical Physics, 1967, 7, 200–217 (English translation of the Russian original)

[2] Goldstein, T. and Osher, S. “The Split Bregman Method for L1-Regularized Problems.” SIAM Journal on Imaging Sciences, 2009, 2(2), 323–343

[3] Hoeltgen, L. “Bregman Iteration for Optical Flow.” Master’s Thesis, Saarland University, Saarbrücken, Germany, 2010
