This is the second post in my series on Krylov subspaces. The first post is here and the third one is here.
Arnoldi Iterations
The Arnoldi iteration is an algorithm for finding eigenvalues and eigenvectors of general matrices. The algorithm constructs orthonormal bases of suitable Krylov subspaces and extracts the eigendata from the basis representation. The method is named after the American engineer Walter Edwin Arnoldi and was published in 1951 [1]. Arnoldi based his results on the findings of Lanczos [2]. We will present the Lanczos method in a later post since it is a special case of the Arnoldi iteration. The Lanczos method focuses on symmetric matrices, whereas the Arnoldi iteration works for arbitrary matrices.
The starting point is the (homogeneous) Fredholm equation $(\lambda I - A)u=0$. Pairs $(\lambda_k, u_k)$ with $u_k\neq 0$ that solve this equation are known as eigenvalues and eigenvectors of $A$. Eigenvalues and eigenvectors are a fundamental concept in mathematics: knowing the eigenvalues and eigenvectors of a (diagonalizable) linear operator essentially tells you everything about its behavior. We would like to find such pairs of eigendata by using Arnoldi iterations. The next paragraphs present the necessary ideas and follow the presentation in [1] rather closely.
We are given a square matrix $A\in\mathbb{R}^{n,n}$ and we want to solve the Fredholm equation $(\lambda I - A)u=0$ from the previous paragraph. To this end, let us first consider a set of linearly independent vectors $\left\lbrace e_i \right\rbrace_i$ with $e_i$ in $\mathbb{R}^n$ for all $i$. If we place these vectors $e_i$ as columns in a matrix $E$, then any vector $u$ in the space spanned by these vectors can be expressed as $u = E\alpha$, where $\alpha$ is the vector that contains the coefficients $\alpha_i$. Plugging this expression for $u$ into $(\lambda I - A)u=0$ gives us $(\lambda I - A)E\alpha = 0$. We can further multiply both sides of the equation from the left by $E^\top$ and obtain the expression $E^\top(\lambda I - A)E\alpha = 0$. Since the columns of $E$ are linearly independent, $E^\top E$ is invertible and the latter can be rewritten as $$ \left(\lambda I - (E^\top E)^{-1}E^\top A E\right)\alpha = 0\quad . $$ Arnoldi refers to the previous equation as the reduced equation and to $(E^\top E)^{-1}E^\top A E$ as the reduced matrix in his work [1]. Note that the reduced matrix can be much smaller than the original matrix $A$. Its size depends on how many linearly independent columns you have in your matrix $E$. Arnoldi continues by noting that it would be useful if the matrix $E^\top E$ were diagonal, because then its inverse is easy to compute. If $\left\langle e_i, e_j\right\rangle = 0$ whenever $i\neq j$ (i.e. the vectors $e_i$ are orthogonal to each other), then $E^\top E$ will indeed be diagonal and our previous equation becomes
$$ \left( \lambda I - \left( \frac{e_i^\top A e_j}{e_i^\top e_i} \right)_{i,j} \right) \alpha = 0\quad . $$
The eigenvalues and eigenvectors of $A$ will be extracted from the previous system. However, we first need a way to generate suitable linearly independent vectors. Unless we choose them properly, we do not gain anything from the previous transformations.
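As a small numerical illustration (this sketch and all names in it, such as Amat and EE, are my own and not from Arnoldi's paper), the following Mathematica snippet builds the reduced matrix for a random symmetric $6\times 6$ matrix and three random, linearly independent columns, and then extracts eigenvalue approximations from it.
(* Sketch: reduced matrix of a random symmetric test matrix with respect to
   three linearly independent vectors e_1, e_2, e_3 stored as columns of EE. *)
SeedRandom[1];
n = 6;
Amat = RandomReal[{-1, 1}, {n, n}];
Amat = (Amat + Transpose[Amat])/2;            (* symmetric test matrix *)
EE = Transpose[RandomReal[{-1, 1}, {3, n}]];  (* columns play the role of e_1, e_2, e_3 *)
reduced = Inverse[Transpose[EE] . EE] . Transpose[EE] . Amat . EE;
Eigenvalues[reduced]  (* three approximations to eigenvalues of Amat *)
The quality of these approximations depends entirely on the chosen subspace; the whole point of the Arnoldi iteration is to construct subspaces for which they become good.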
Arnoldi suggests the following iterative strategy to obtain the vectors $e_i$. This idea is already present in the work of Lanczos [2]. Starting from a given vector $e_1\in\mathbb{R}^n$, we compute $$ e_{i+1} = A e_i - \sum_{k=0}^{i-1} \mu_{i-k,k}\, e_{i-k} $$ In order to determine the scalar coefficients $\mu_{i,k}$ we exploit our orthogonality requirement. If $e_1$ is given, we want to generate a vector $e_2$ orthogonal to $e_1$. Next we generate a vector $e_3$ orthogonal to both $e_1$ and $e_2$, and so on. Overall we get the following identities. $$\begin{align} e_{i}^\top e_{i+1} &= e_{i}^\top A e_{i} - \mu_{i,0} e_i^\top e_i = 0 \\ e_{i}^\top e_{i+2} &= e_{i}^\top A e_{i+1} - \mu_{i,1} e_i^\top e_i = 0 \\ e_{i}^\top e_{i+3} &= e_{i}^\top A e_{i+2} - \mu_{i,2} e_i^\top e_i = 0 \\ &\vdots \end{align}$$ Solving these equations for the coefficients $\mu_{i,k}$ is straightforward. We immediately get $$ \mu_{i,k} = \frac{e_i^\top A e_{i+k}}{e_i^\top e_i}\quad . $$ In other words, $\mu_{i,k}$ is the coefficient that removes the component along $e_i$ when we orthogonalize $A e_{i+k}$. The two previous steps are commonly referred to as the Arnoldi iteration in the literature, cf. this Wikipedia article. If you compute all $n$ iterates (see below for why there cannot be more than $n$ iterates), then you have generated a matrix $E$ such that $E^\top A E$ is in Hessenberg shape. This is quite nice in itself, but you might wonder what you can do with that representation. Arnoldi's paper actually goes further and shows how to extract the eigendata. Unfortunately that part is omitted in many textbooks.
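A minimal Mathematica sketch of this construction could look as follows; the helper name arnoldiBasis is illustrative, and the vectors are deliberately left unnormalized to match the formulas above.
(* Sketch: build m orthogonal (not normalized) basis vectors via Arnoldi's recursion. *)
arnoldiBasis[A_, e1_, m_] := Module[{basis = {e1}, w, mu},
  Do[
   w = A . basis[[-1]];
   mu = Map[(# . w)/(# . #) &, basis];  (* coefficient along each previous e_j *)
   AppendTo[basis, w - Total[mu*basis]],
   {m - 1}];
  basis]
For example, arnoldiBasis[Amat, RandomReal[{-1, 1}, n], 3] reuses the matrix from the previous sketch and returns three mutually orthogonal vectors.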
As mentioned, our construction of the orthogonal basis vectors implies that the matrix $E^\top A E$ is in Hessenberg shape: all entries below the first subdiagonal are 0. Further, the eigenvalues of the reduced matrix approximate the eigenvalues of $A$. After $n$ steps they are even identical, because with a full basis the reduced matrix $(E^\top E)^{-1}E^\top A E = E^{-1}AE$ is a similarity transform of $A$.
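A quick numerical check of the Hessenberg claim, reusing Amat, n and arnoldiBasis from the sketches above:
(* All entries of the reduced matrix below the first subdiagonal vanish up to round-off. *)
ebasis = arnoldiBasis[Amat, RandomReal[{-1, 1}, n], n];
Hred = Outer[(#1 . Amat . #2)/(#1 . #1) &, ebasis, ebasis, 1];
Max@Abs@Flatten@Table[Hred[[i, j]], {i, 3, n}, {j, 1, i - 2}]  (* close to 0 *)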
The eigenvalues correspond to the roots of the characteristic polynomial. Since our matrix is in Hessenberg shape, computing its characteristic polynomial becomes much simpler because we can approach it recursively. Arnoldi shows in his work that we obtain the following sequence of polynomials in parallel with the basis vectors (with the convention $p_0(\lambda)=1$). $$\begin{align} p_1(\lambda) &= \lambda - \mu_{1,0} \\ p_2(\lambda) &= (\lambda - \mu_{2,0}) p_1(\lambda) - \mu_{1,1} \\ p_3(\lambda) &= (\lambda - \mu_{3,0}) p_2(\lambda) - \mu_{2,1} p_1(\lambda) - \mu_{1,2} \\ &\vdots\\ \end{align}$$ The roots of these polynomials approximate the eigenvalues of $A$. Once you have computed the roots $\lambda_k$, you can also compute the coefficients $\alpha_j$ that solve the equation $E^\top(\lambda I - A)E\alpha = 0$. Similarly to the $\lambda_k$, the $\alpha_j$ can be computed recursively, this time backwards. $$\begin{align} \alpha_m &= 1 \\ \alpha_{m-1} &= \lambda_k - \mu_{m,0} \\ \alpha_{m-2} &= (\lambda_k - \mu_{m-1,0})\alpha_{m-1} - \mu_{m-1,1}\alpha_m \\ &\vdots \end{align}$$ Finally you get an approximation to the eigenvector corresponding to $\lambda_k$ by evaluating $$ u = \sum_{j=1}^m \alpha_j e_j\quad . $$ Again, once you have gathered all $n$ vectors $e_i$, this will be an exact eigenvector of $A$. Also note that if you want to compute all eigenvectors, then you have to compute the coefficients $\alpha_j$ separately for each root $\lambda_k$.
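One way to see where both recursions come from is to write the reduced matrix out in terms of the coefficients $\mu_{i,k} = \frac{e_i^\top A e_{i+k}}{e_i^\top e_i}$: $$ H := (E^\top E)^{-1}E^\top A E = \begin{pmatrix} \mu_{1,0} & \mu_{1,1} & \mu_{1,2} & \cdots \\ 1 & \mu_{2,0} & \mu_{2,1} & \cdots \\ 0 & 1 & \mu_{3,0} & \cdots \\ \vdots & & \ddots & \ddots \end{pmatrix}\quad . $$ The subdiagonal entries all equal $1$ because $e_{i+1}^\top A e_i = e_{i+1}^\top e_{i+1}$ by construction of $e_{i+1}$. With this picture, $p_m$ is simply the characteristic polynomial of the leading $m\times m$ block of $H$ (expand the determinant of $\lambda I - H$ along the last column), and the backward recursion for the $\alpha_j$ is back substitution in $(\lambda_k I - H)\alpha = 0$ starting from $\alpha_m = 1$.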
It is important to emphasize that our representation of the eigenvector $u$ will always be a finite sum. Both Arnoldi and Lanczos cite several other formulas in their works that give you the eigenvector as well (e.g. the Schmidt expansion or the Liouville-Neumann expansion). However, these representations are based on infinite sums or require that you know an eigenvalue upfront. Arnoldi's and Lanczos' merit is to provide a representation in terms of a finite sum.
What does it have to do with Krylov subspaces?
Let’s have another look at how the iterates are generated.
- We begin with an arbitrary start vector $e_1$.
- We compute the second basis vector $e_2$ as follows: $$ e_2 = A e_1 - \frac{e_1^\top A e_1}{e_1^\top e_1} e_1 $$
- We compute the third basis vector $e_3$ as follows: $$ e_3 = A e_2 - \frac{e_2^\top A e_2}{e_2^\top e_2} e_2 - \frac{e_1^\top A e_2}{e_1^\top e_1} e_1 $$
- We keep iterating until we have generated the desired number of vectors or until the process breaks down by returning the zero vector.
The first thing we notice is that we repeatedly apply $A$ to the most recent basis vector, i.e. we work with $$ e_1,\quad Ae_1,\quad Ae_2,\quad \ldots $$ and these vectors span exactly the same spaces as $e_1, Ae_1, A^2e_1, \ldots$, which are precisely the Krylov subspaces. Next, by subtracting the terms $\frac{e_i^\top A e_j}{e_i^\top e_i} e_i$ from $A e_j$, we are applying a Gram-Schmidt orthogonalization to the iterates $A e_j$. Hence, in each iteration we enlarge the Krylov subspace and compute an orthogonal basis of it.
The algorithm then projects the original matrix onto these Krylov subspaces and computes approximations to the eigenvalues there. Since these subspaces are much smaller than the original space, the computations are much cheaper (and the projected matrix has a much nicer structure).
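A quick Mathematica check of this observation, again reusing Amat, n and arnoldiBasis from the earlier sketches:
(* The first four Arnoldi vectors span the same space as e1, A e1, A^2 e1, A^3 e1. *)
e1 = RandomReal[{-1, 1}, n];
krylov = NestList[Amat . # &, e1, 3];
MatrixRank[Join[krylov, arnoldiBasis[Amat, e1, 4]]] == MatrixRank[krylov]  (* True *)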
Implementation
The following Mathematica code implements a very simple Arnoldi Iteration to compute eigenvalues and eigenvectors.
(* Hermitian inner product: conjugate the left argument *)
ComplexDot[a_, b_] := Conjugate[a] . b
(* One entry of the reduced matrix: <e_i, A e_j> / <e_i, e_i> *)
ReducedMatrixEntry[leftvec_, rightvec_, A_] := Module[{},
Return[
ComplexDot[leftvec, A . rightvec]/ComplexDot[leftvec, leftvec]
];
];
(* Reduced matrix (Hessenberg shape) for the current basis *)
ReducedMatrix[basis_, A_] := Module[{},
Return[
Outer[ReducedMatrixEntry[#1, #2, A] &, basis, basis, 1]
];
];
(* Apply A to the newest basis vector and subtract its components along all
   previous basis vectors; the weights are the last column of the reduced matrix *)
GramSchmidtStep[basis_, weights_, A_] := Module[{},
Return[
Append[basis,
A . basis[[-1]] - Total[MapThread[#1*#2 &, {weights, basis}]]]
];
];
(* Forward recursion for the characteristic polynomials p_m of the reduced matrix *)
EigenvaluePolynomials[polynomials_, weights_, lambda_] := Module[{},
Return[
Append[polynomials,
(lambda - weights[[-1]])*polynomials[[-1]] -
Total[weights[[1 ;; -2]]*polynomials[[1 ;; -2]]]]
];
];
(* Backward recursion for the eigenvector coefficients alpha_j *)
EigenvectorPolynomials[polynomials_, weights_, lambda_] :=
Module[{},
Return[
Prepend[
polynomials,
(lambda - weights[[1]])*polynomials[[1]] -
Total[weights[[2 ;; -1]]*polynomials[[2 ;; -1]]]]
];
];
(* Eigenvalue estimates = roots of the last characteristic polynomial *)
EigenvalueEstimates[polynomial_, lambda_] := Module[{},
Return[lambda /. Solve[polynomial == 0, lambda]]
];
(* Assemble u = Sum_j alpha_j e_j for every eigenvalue estimate *)
EigenvectorEstimate[polynomials_, basis_, eigenvalues_, lambda_] :=
Module[{},
Return[
Map[{#,
Normalize[Total[(polynomials /. lambda -> #)*basis]]} &,
eigenvalues]
];
];
(* Arnoldi iteration with maxIts steps: returns {eigenvalue estimates, {eigenvalue, eigenvector} pairs} *)
ArnoldiEigendata[A_, start_, maxIts_] := Module[{
VecK = {Normalize[start]},
CoeffMat,
ColIdx = 0,
RowIdx = 0,
EigenvaluePolys = {1},
EigenvectorPolys = {1},
EigenvalEstimates,
EigenvecEstimates,
lambda
},
(* Grow the basis, the reduced matrix and the characteristic polynomials *)
For[ColIdx = 1, ColIdx <= maxIts, ColIdx++,
CoeffMat = ReducedMatrix[VecK, A];
VecK = GramSchmidtStep[VecK, CoeffMat[[1 ;; ColIdx, -1]], A];
EigenvaluePolys =
EigenvaluePolynomials[EigenvaluePolys,
CoeffMat[[1 ;; ColIdx, -1]], lambda];
];
EigenvalEstimates =
EigenvalueEstimates[EigenvaluePolys[[-1]], lambda];
(* Back substitution through the rows of the reduced matrix *)
For[RowIdx = maxIts, RowIdx >= 1, RowIdx--,
EigenvectorPolys =
EigenvectorPolynomials[EigenvectorPolys,
CoeffMat[[RowIdx, RowIdx ;; -1]], lambda];
];
EigenvecEstimates =
EigenvectorEstimate[EigenvectorPolys[[2 ;; -1]], VecK[[1 ;; -2]],
EigenvalEstimates, lambda];
Return[{EigenvalEstimates, EigenvecEstimates}];
];
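A call to the routine could look as follows. This snippet is only an illustrative sketch on a random symmetric test matrix (Mtest and the other names are mine) and is not how the example below was produced.
(* Illustrative usage: three Arnoldi steps on a random symmetric 5x5 matrix. *)
SeedRandom[2];
Mtest = RandomReal[{-1, 1}, {5, 5}];
Mtest = (Mtest + Transpose[Mtest])/2;
{valEstimates, vecEstimates} = ArnoldiEigendata[Mtest, RandomReal[{-1, 1}, 5], 3];
valEstimates   (* three approximate eigenvalues *)
vecEstimates   (* list of {eigenvalue, normalized eigenvector} pairs *)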
Example
We consider the following matrix (in Mathematica notation)
{
{1.943350, 0.578511, 1.163850, 0.268453, 1.73745, 0.98200},
{0.578511, 1.246780, 0.910821, 0.090292, 1.62437, 1.35639},
{1.163850, 0.910821, 0.409511, 0.265599, 1.74996, 0.67720},
{0.268453, 0.090292, 0.265599, 0.232830, 1.23293, 0.35352},
{1.737450, 1.624370, 1.749960, 1.232930, 1.41587, 1.07492},
{0.982009, 1.356390, 0.677200, 0.353520, 1.07492, 1.76505}
}
Its eigenvalues are given by
{-1.34007, -0.49569, 0.33907, 0.754853, 1.34977, 6.40546}
and its eigenvectors are given by the columns of the following matrix
{
{-0.460203, -0.554847, 0.164253, 0.585615, 0.326414, -0.062352},
{-0.398644, 0.480159, 0.287389, -0.248932, 0.459752, -0.504578},
{-0.363666, -0.164665, 0.410593, -0.158344, -0.763497, -0.253072},
{-0.174360, -0.143923, 0.459720, -0.454511, 0.225959, 0.692752},
{-0.548404, -0.185839, -0.710719, -0.397555, -0.010979, 0.037753},
{-0.407301, 0.615814, -0.073336, 0.453195, -0.219035, 0.442875}
}
Our Arnoldi implementation above gives us the following iterates
- Iteration 1:
{0.549131, 6.06347}
{
{-0.864387, 0.502827},
{ 0.121370, 0.208641},
{ 0.244172, 0.419744},
{ 0.056320, 0.096817},
{ 0.364510, 0.626613},
{ 0.206022, 0.354163}
}
- Iteration 2:
{-0.723417, 1.0684, 6.40053}
{
{ 0.380137, -0.801777, 0.461139},
{ 0.569453, 0.496034, 0.393024},
{-0.340171, 0.056756, 0.379099},
{ 0.294660, 0.249443, 0.190803},
{-0.572385, 0.043831, 0.548051},
{-0.032460, 0.209139, 0.390385}
}
- Iteration 3:
{-1.09743, 0.247749, 1.22842, 6.40536}
{
{-0.255709, 0.483779, -0.699114, 0.460229},
{-0.469832, 0.195576, 0.569868, 0.399032},
{-0.017525, -0.657474, -0.210288, 0.361940},
{-0.578735, -0.409707, 0.043735, 0.175555},
{ 0.555310, -0.200571, 0.020606, 0.550674},
{ 0.265066, 0.295547, 0.374075, 0.404846}
}
- Iteration 4:
{-1.33928, -0.492637, 0.750416, 1.34907, 6.40546}
{
{ 0.164574, -0.327319, 0.587271, -0.555971, 0.460203},
{ 0.277366, -0.489640, -0.195416, 0.493952, 0.39867},
{ 0.402705, 0.746602, -0.129910, -0.156531, 0.363682},
{ 0.474268, -0.185201, -0.524148, -0.159935, 0.174327},
{-0.710545, 0.010519, -0.399970, -0.185071, 0.548404},
{-0.063307, 0.247569, 0.4066220, 0.602143, 0.407275}
}
- Iteration 5:
{-1.34007, -0.49569, 0.33907, 0.754853, 1.34977, 6.40546}
{
{-0.164253, 0.326414, -0.062352, 0.585615, -0.554847, 0.460203},
{-0.287389, 0.459752, -0.504578, -0.248932, 0.480159, 0.398644},
{-0.410593, -0.763497, -0.253072, -0.158344, -0.164665, 0.363666},
{-0.459720, 0.225959, 0.692752, -0.454511, -0.143923, 0.174360},
{ 0.710719, -0.010979, 0.037753, -0.397555, -0.185839, 0.548404},
{ 0.073336, -0.219035, 0.442875, 0.453195, 0.615814, 0.407301}
}
At this point the eigenvalues agree exactly with those computed by Mathematica and the eigenvectors coincide up to sign, which is perfectly fine since eigenvectors are only determined up to scaling.
Algorithmic considerations
- Arnoldi already remarks in his paper that it can be helpful to apply a handful of power iterations to the start vector first. This gives you a better initial guess for the first basis vector. If you do not compute all $n$ iterates, the approximations you obtain should then be closer to the true eigenvalues and eigenvectors.
- Orthogonality is important. Rather than using classical Gram-Schmidt one should probably use the modified Gram-Schmidt variant (see the sketch after this list). I analysed its benefits here. Also, a common issue with power iterations is that once they start converging, the iterates become nearly parallel, which makes the orthogonalization process numerically more difficult.
- You need to find the roots of a polynomial. This requires a separate set of numerical methods.
- Finally, your memory consumption increases with each iterate. You should restart the whole method regularly to keep the memory usage within reasonable bounds. This is especially important when you work with large, sparse matrices; restarting is also what ARPACK does with its implicitly restarted Arnoldi method.
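A minimal sketch of such a modified Gram-Schmidt step, assuming the basis vectors are real, already normalized and stored as rows (the helper name mgsStep is illustrative and not part of the implementation above):
(* Modified Gram-Schmidt: orthogonalize A.basis[[-1]] against the previous
   vectors one at a time, always reusing the partially orthogonalized w. *)
mgsStep[A_, basis_] := Module[{w = A . basis[[-1]]},
  Do[w = w - (basis[[j]] . w)*basis[[j]], {j, Length[basis]}];
  Append[basis, Normalize[w]]]
The difference to classical Gram-Schmidt is only the order of operations, but it is considerably more robust in floating-point arithmetic.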
References
[1] W. E. Arnoldi, The principle of minimized iterations in the solution of the matrix eigenvalue problem, Quarterly of Applied Mathematics 9, 17–29, 1951, doi:10.1090/QAM/42792
[2] C. Lanczos, An iteration method for the solution of the eigenvalue problem of linear differential and integral operators, Journal of Research of the National Bureau of Standards 45, 255–282, 1950
[3] J. Liesen, Zdenĕk Strakoš, Krylov Subspace Methods: Principles and Analysis; Oxford University Press, 2013
[4] Gene H. Golub, Charles F. Van Loan, Matrix Computations, Johns Hopkins University Press, 2013
Notes
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License