Lanczos Algorithm

How to compute eigenvalues and eigenvectors of symmetric matrices

This is the third post in my series on Krylov subspaces. The first post is here and the second one is here.

The Lanczos Algorithm

In this post we cover the Lanczos algorithm that gives you eigenvalues and eigenvectors of symmetric matrices.

The Lanczos algorithm is named after Cornelius Lanczos, who was quite influential with his research. He also proposed an interpolation method and a method to approximate the gamma function. These findings will not be covered in this post. We will look at his minimized iterations method that allows you to compute eigenvalues and eigenvectors of symmetric matrices.

The Lanczos algorithm is a special case of the Arnoldi iteration that we discussed in a previous post. Historically, the Lanczos algorithm was published before the Arnoldi iteration. Arnoldi explicitly mentions the work of Lanczos [2] in his paper [1].

Similarly as for the post on the Arnoldi iteration, we will again closely follow the presentation of the original paper of Lanczos [2]. Actually, the paper of Arnoldi [1] contains a very readable presentation of the Lanczos method. When it comes to the algorithmic details and how to implement things, I would probably recommend [1] rather than [2]. However, the work of Lanczos [2] contains many detailed examples the might be useful to check that an implementation is actually correct.

Preliminary Observations

Lanczos begins his presentation [2] by introducing the (discrete) Fredholm equation $y - \lambda Ay = b$ and cites two popular methods to solve it. Here $A\in\mathbb{R}^{n,n}$ is some matrix, $\lambda\in\mathbb{R}$ a scalar, and $b\in\mathbb{R}^n$ a given righthand side. We would like to find a solution $y\in\mathbb{R}$. Setting $b=0$ leads us to the eigenvalue problem.

The Liouville-Neumann expansion obtains a solution of the Fredholm equation from the geometric series $$y = (I - \lambda A)^{-1}b = \sum_{k=0}^\infty \lambda^kA^k b\quad .$$ The Liouville-Neumann expansion converges if $\lambda$ is sufficiently small. This assumption asserts that the inverse in the previous expression really exists and that the series converges.

The second method is the Schmidt series, which requires all eigenvalues $\mu_i$ and eigenvectors $u_i$ of $A$ to be known upfront. Then you can express $y$ as $$y = \sum_{k=1}^n \frac{\gamma_k u_k}{1-\lambda\mu_k},\quad \gamma_k := \frac{\left\langle b, u_k\right\rangle}{\left\langle u_k, u_k\right\rangle}$$ The Schmidt series can easily be derived by expressing $A$ in terms of its eigendecomposition. We note that the Liouville-Neumann is rather simple in its structure. It only requires iterated multiplications with $\lambda A$. On the other hand the Schmidt series is appealing because it is a finite sum. No discussion on convergence is needed.

Lanczos’ method actually combines the advantages of both formulations. It has a simple algorithmic structure and can be represented as a finite sum.

We begin by defining $b_k:=A^k b$. The definition is solely for convenience. Clearly there must exist some integer $m\geqslant 1$ and $m$ real coefficients $g_k$ such that $$b_m + g_1 b_{m-1} + \ldots + g_m b_0 = 0$$ Let us, for now, assume that $m$ and all the coefficients $g_k$ are known. Then we can use them to define the following family of polynomials $$S_m(\lambda) := 1 + g_1 \lambda + \ldots + g_m \lambda^m$$ Note that $S_0(\lambda)$ is constant 1. Lanczos then shows that you can derive the following relation between $y$ and $b$ $$y = \frac{S_{m-1}(\lambda)b + S_{m-2}(\lambda)\lambda A b + \ldots + S_0(\lambda)\lambda^{m-1}A^{m-1}b}{S_m(\lambda)}$$ The previous formula is very similar to the Sturm-Liouville expression mentioned above. It uses a different weighting which allows it to give an exact result after a finite number of iterations, though! In the Sturm-Liouville formula, all weights in front of the $\lambda^kA^k$ were $1$. Here they are given by $$\frac{S_{m-1-k}(\lambda)}{S_m(\lambda)}$$ In order to find eigenvalues and eigenvectors one would have to set $b=0$. This would immediately give us $y=0$ in the equation above. Which is not a valid eigenvector. However, Lanczos shows in his paper that the eigenvalues of $A$ are obtained by solving $S_m(\lambda)=0$ and that you get the eigenvector $u_i$ corresponding to the eigenvalue $\mu_i$ by computing $$u_i = S_{m-1}(\mu_i)b + S_{m-2}(\mu_i)\mu_i A b + \ldots + S_0(\mu_i)\mu_i^{m-1} A^{m-1} b$$ In order to obtain this result you have to substitute $b$ with $S_m(\lambda)b$ in the expression for $y$ above to get rid of the denominator. Then it’s a simple discussion on what can or should be identically 0.

Finally, the previous expression for $u_i$ can even further be simplified to $$u_i = A^{m-1} b + g_1^i A^{m-2} b + \ldots + g_{m-1}^i b$$ This final form saves you a lot of polynome evaluations in practical applications.

The method of minimized iterations

In the previous section we did not discuss how to get the coefficients $g_i$ or how to get the eigenvalues. Computing the $g_i$ can be done by solving a linear system of equations. In his paper [2], Lanczos derives a recursive algorithm to solve this linear system. He exploits the fact that the system matrix is actually a Hankel matrix.

Lanczos also discusses potential issues if the multiplicity of an eigenvalue is higher than 1 or if the starting vector $b$ is chosen orthogonally to an eigenvector.

We don’t want to go into these details here because there’s another issue with the approach mentioned so far. The strategy is numerically unstable and would already cause issues for rather small matrices. The ratio between the smallest and largest eigenvalue is amplified in each iteration and ruins your computations already after a few steps. Lanczos mentions that the method would already fail for $12\times{}12$ matrices where the ratio between the smallest and largest eigenvalue is $10$. Such matrices are quite small and rather well conditioned. They shouldn’t cause any issues. For this particular reason he proposes the method of minimized iterations. This method is equivalent to the approach from the previous section. We compute the same quantities. However, we do so in a slightly more controlled way, which helps us avoid numerical instabilities.

Our approach from the previous section expresses eigenvectors of $A$ as linear combinations of vectors $A^kb$. Rather than using the vectors $A^kb$ directly, we could orthogonalize the set of all $A^kb$. If for example we had given $b$ and $Ab$ and wanted to find a vector $x$ orthogonal to $b$ such that $x$ and $b$ span the same space as $Ab$ and $b$, then we would try to find the minimizing $\alpha$ for the cost function $\left\lVert Ab - \alpha b\right\rVert^2$. A straightforward computation reveals that it is given by $$\alpha = \frac{\left\langle A b, b\right\rangle}{\left\langle b, b\right\rangle}$$ and one immediately sees that $x := Ab - \alpha b$ will indeed be orthogonal to $b$. This scheme can obviously be continued for $A^2b$, $A^3b$, …, where we always perform the orthogonalisation with respect to all previously computed vectors. Lanczos doesn’t state it explicitly, but this simply applying the Gram-Schmidt method onto the Krylov subspaces. It’s the very same trick we already encountered in the Arnoldi Iteration. Lanczos mentions in his paper that such a successive orthogonalization was probably first employed by Otto Szász in 1910. The works of Gram [6] and Schmidt [5] are slightly older, though. They date back to 1883 and 1907.

In Arnoldis approach, we had a recurrence formulation that included all terms. In Lanczos’ method we will never have more than two correction terms. The reason for this is that he assumes the matrix $A$ to be symetric when presenting his minimized iterations. Lanczos also briefly discusses the case of a non-symmetric $A$. He proposes to work simultaneously on $A$ and $A^\top$ to generate two sequences of vectors.

From an algorithmic point of view the remaining details in the paper are identical to the presentation in Arnoldis work [1]. We have a recursive strategy for the characteristic polynomial from which we can recover the eigenvalues by computing the roots of certain polynomials. Next we obtain the eigenvectors from these polynomials and the basis vectors of the Krylov subspace. Rather than copying all the equations from that post, I will simply refer to it here.

What does it have to do with Krylov subspaces?

Similarly as for the Arnoldi Iteration, we compute a basis of the space spanned by the vectors $A^kb$ for $k=0$, $1$, $2$, …. This space is a Krylov subspace.

Implementation

Have a look at the code published for the Arnoldi Iteration. If your input matrix is symmetric, then you will generate a coefficient matrix $(\mu_{i,j})_{i,j}$ which will be tridiagonal, i.e. your iterative scheme to compute the next basis vector of your Krylov subspace only consists of three terms. One could simplify some formulas to take advantage of this, but the overall implementation would still be the same.

References

[1] W. E. Arnoldi, The principle of minimized iterations in the solution of the matrix eigenvalue problem, Quarterly of Applied Mathematics 9, 17–29, 1951, doi:10.1090/QAM/42792

[2] C. Lanczos, An iteration method for the solution of the eigenvalue problem of linear differential and integral operators, Journal of Research, Nat. Bu. Stand., 45, 255-282, 1950

[3] J. Liesen, Zdenĕk Strakoš, Krylov Subspace Methods: Principles and Analysis; Oxford University Press, 2013

[4] Gene H. Golub, Charles F. Van Loan, Matrix Computations, John Hopkins University Press, 2013

[5] E. Schmidt, “Zur Theorie der linearen und nichtlinearen Integralgleichungen I. Teil: Entwicklung willkürlicher Funktionen nach Systemen vorgeschriebener,” Mathematische Annalen, vol. 63, pp. 433–476, 1907.

[6] J. P. Gram, “Ueber die Entwickelung reeller Functionen in Reihen mittels der Methode der kleinsten Quadrate,” Journal for die Reine und Angewandte Mathematik, vol. 94, pp. 41–73, 1883.