# Gram-Schmidt vs. Modified Gram-Schmidt

## Problem Formulation

Given $k$ vectors $x_j\in\mathbb{R}^n$, we would like to find $k$ vectors $y_j\in\mathbb{R}^n$ such that the $y_j$ form an orthonormal system, i.e.

$$\lVert y_j \rVert = 1\quad\forall j\quad\text{and}\quad\langle y_j, y_i \rangle = 0\quad\forall j\neq i$$

and

$$\mathrm{span}\lbrace y_1, \ldots, y_j\rbrace = \mathrm{span}\lbrace x_1, \ldots, x_j\rbrace\quad \forall j = 1, \ldots, k$$
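The two conditions above can be checked mechanically. Here is a small sketch; the helper name `is_orthonormal` is my own, not part of any standard API.

```julia
using LinearAlgebra

# Hypothetical helper: checks both conditions for the columns of Y,
# up to a floating point tolerance.
function is_orthonormal(Y; tol = 1e-10)
    k = size(Y, 2)
    # condition 1: every column has unit length
    unit_norms = all(abs(norm(Y[:, j]) - 1) < tol for j = 1:k)
    # condition 2: distinct columns are pairwise orthogonal
    orthogonal = all(abs(dot(Y[:, i], Y[:, j])) < tol
                     for i = 1:k for j = 1:k if i != j)
    return unit_norms && orthogonal
end

is_orthonormal(Matrix{Float64}(I, 3, 3))  # the standard basis is orthonormal
```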

The most popular algorithm to solve this problem is probably the Gram-Schmidt orthogonalisation. The algorithm is named after Erhard Schmidt and Jørgen Pedersen Gram. Schmidt published the algorithm in 1907 [5] but notes in his work that the method can already be found in Gram's work from 1883 [6]. The first reference to the "Gram-Schmidt" algorithm probably comes from [7]. Finally, a very readable historical overview is given in [4].

## Classical Gram-Schmidt

Finding $y_1$ is rather straightforward. The first requirement tells us that $y_1$ must have length 1, and the span condition tells us that it must be parallel to $x_1$. The obvious choice is therefore to set

$$y_1 = \frac{x_1}{\lVert x_1 \rVert}$$

To find the second vector, consider the projection of $x_2$ onto $y_1$, given by $p = \langle x_2, y_1\rangle y_1$. The vector pointing from $p$ to $x_2$ is $x_2 - p$, and by construction it is orthogonal to $y_1$: indeed, $\langle x_2 - p, y_1\rangle = \langle x_2, y_1\rangle - \langle x_2, y_1\rangle\lVert y_1\rVert^2 = 0$. After normalisation, $x_2 - p$ is therefore a perfect candidate for $y_2$. This gives us an idea for a general formula

$$y_j = x_j - \sum_{i=1}^{j-1} \langle x_j, y_i\rangle y_i$$

and we end up with the following algorithm

Classical Gram-Schmidt Algorithm:

1. set $y_1 = \frac{x_1}{\lVert x_1\rVert}$
2. for $j = 2, \ldots, k$ compute: $$\begin{eqnarray} y_j &=& x_j - \sum_{i=1}^{j-1} \langle x_j, y_i\rangle y_i \\ y_j &=& \frac{y_j}{\lVert y_j\rVert} \end{eqnarray}$$
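The projection step can be verified numerically. A minimal sketch, with two arbitrary vectors chosen only for illustration:

```julia
using LinearAlgebra

x1 = [3.0, 4.0, 0.0]
x2 = [1.0, 1.0, 1.0]

y1 = x1 / norm(x1)            # step 1: normalise x1
p  = dot(x2, y1) * y1         # projection of x2 onto y1
y2 = (x2 - p) / norm(x2 - p)  # step 2: subtract the projection, then normalise

dot(y1, y2)  # zero up to rounding
```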

In Julia, classical Gram-Schmidt can be implemented as follows.

```julia
using LinearAlgebra

function classical_gram_schmidt_alt(matrix)
    # orthogonalises the columns of the input matrix
    num_vectors = size(matrix)[2]
    orth_matrix = zeros(size(matrix))
    for vec_idx = 1:num_vectors
        orth_matrix[:, vec_idx] = matrix[:, vec_idx]
        # sum of the projections onto the previously computed vectors
        projection_sum = zeros(size(orth_matrix[:, 1]))
        for span_base_idx = 1:(vec_idx - 1)
            projection_sum += dot(orth_matrix[:, span_base_idx], orth_matrix[:, vec_idx]) * orth_matrix[:, span_base_idx]
        end
        orth_matrix[:, vec_idx] -= projection_sum
        # normalise vector
        orth_matrix[:, vec_idx] = orth_matrix[:, vec_idx] / norm(orth_matrix[:, vec_idx])
    end
    return orth_matrix
end
```


## Modified Gram-Schmidt

Modified Gram-Schmidt performs the very same computational steps as classical Gram-Schmidt, only in a different order. Classical Gram-Schmidt computes, in each iteration, a single sum involving all previously computed vectors, with projection coefficients taken from the original $x_j$. The modified version subtracts one projection at a time and always projects the current intermediate vectors, which allows rounding errors from earlier steps to be corrected as you go.

Modified Gram-Schmidt Algorithm:

1. set $y_j = x_j$ for all $j$.
2. for $j = 1, \ldots, k$ compute: $$\begin{eqnarray} y_j &=& \frac{y_j}{\lVert y_j\rVert} \\ y_i &=& y_i - \langle y_j, y_i\rangle y_j\quad \forall i = j + 1, \ldots, k \end{eqnarray}$$

If you omit the normalisations, then classical Gram-Schmidt computes

$$\begin{eqnarray} y_2 &=& x_2 - \langle x_2, y_1 \rangle y_1 \\ y_3 &=& x_3 - \langle x_3, y_1 \rangle y_1 - \langle x_3, y_2 \rangle y_2 \\ y_4 &=& x_4 - \langle x_4, y_1 \rangle y_1 - \langle x_4, y_2 \rangle y_2 - \langle x_4, y_3 \rangle y_3 \end{eqnarray}$$

whereas modified Gram-Schmidt computes

$$\begin{eqnarray} y_2 &=& x_2 - \langle x_2, y_1 \rangle y_1, \\ y_3 &=& x_3 - \langle x_3, y_1 \rangle y_1, \\ &\vdots& \\ y_k &=& x_k - \langle x_k, y_1 \rangle y_1, \\ \\ y_3 &=& y_3 - \langle y_3, y_2 \rangle y_2 = x_3 - \langle x_3, y_1 \rangle y_1 - \langle x_3, y_2 \rangle y_2 \\ &\vdots& \\ y_k &=& y_k - \langle y_k, y_2 \rangle y_2 = x_k - \langle x_k, y_1 \rangle y_1 - \langle x_k, y_2 \rangle y_2 \\ &\vdots& \end{eqnarray}$$

Note that $\langle y_k, y_2 \rangle = \langle x_k, y_2 \rangle$ in exact arithmetic, since $y_2$ is orthogonal to $y_1$. In floating point arithmetic, however, the two inner products differ, and this difference is exactly where the numerical behaviour of the two algorithms diverges.

If you consider the strategy from a block-wise point of view, you'll notice that in each step every remaining vector is orthogonalised against the most recently computed $y_j$, i.e. based on the previous orthogonalisation. This does not happen in classical Gram-Schmidt, but it is what allows the modified version to correct for errors introduced in previous blocks.

In Julia, modified Gram-Schmidt can be implemented as follows.

```julia
using LinearAlgebra

function modified_gram_schmidt(matrix)
    # orthogonalises the columns of the input matrix
    num_vectors = size(matrix)[2]
    orth_matrix = copy(matrix)
    for vec_idx = 1:num_vectors
        orth_matrix[:, vec_idx] = orth_matrix[:, vec_idx] / norm(orth_matrix[:, vec_idx])
        for span_base_idx = (vec_idx + 1):num_vectors
            # orthogonalise all remaining vectors against the current one
            orth_matrix[:, span_base_idx] -= dot(orth_matrix[:, span_base_idx], orth_matrix[:, vec_idx]) * orth_matrix[:, vec_idx]
        end
    end
    return orth_matrix
end
```


The benefit of using the modified Gram-Schmidt algorithm is visible in the plot below. As input, I used differently sized regularised versions of the Hilbert matrix and computed the maximum absolute entry of the matrix $I - Q^\top Q$, where $I$ is the identity matrix and $Q$ is our output matrix. In a perfect setting this error would be 0.

```julia
using LinearAlgebra

error_cgs = []
error_mgs = []
for n = 1:10
    H = [1.0/(i+j-1) for i = 1:2^n, j = 1:2^n] + 0.00001 * I # regularised Hilbert matrix
    eye = Matrix{Float64}(I, 2^n, 2^n)                       # identity matrix
    H_cgs = classical_gram_schmidt(H)
    H_mgs = modified_gram_schmidt(H)
    append!(error_cgs, norm(eye - transpose(H_cgs)*H_cgs, Inf))
    append!(error_mgs, norm(eye - transpose(H_mgs)*H_mgs, Inf))
end
```


Please note that the error is expressed in multiples of machine epsilon here.

As we can see, the error is increasing much faster in classical Gram-Schmidt than in modified Gram-Schmidt.
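The effect can also be reproduced at a tiny scale with a Läuchli-type matrix, a classic example of nearly linearly dependent columns. Below is a self-contained sketch with stripped-down versions of the two loops (repeated here so the snippet runs on its own; the function names `cgs` and `mgs` are my own shorthand):

```julia
using LinearAlgebra

function cgs(X)
    Q = zeros(size(X))
    for j = 1:size(X, 2)
        v = X[:, j]
        for i = 1:(j - 1)
            v -= dot(X[:, j], Q[:, i]) * Q[:, i]  # coefficients from the original x_j
        end
        Q[:, j] = v / norm(v)
    end
    return Q
end

function mgs(X)
    Q = copy(X)
    for j = 1:size(X, 2)
        Q[:, j] = Q[:, j] / norm(Q[:, j])
        for i = (j + 1):size(X, 2)
            Q[:, i] -= dot(Q[:, i], Q[:, j]) * Q[:, j]  # coefficients from the running y_i
        end
    end
    return Q
end

# Läuchli-type matrix: three nearly linearly dependent columns
ε = 1e-8
X = [1.0 1.0 1.0;
     ε   0.0 0.0;
     0.0 ε   0.0;
     0.0 0.0 ε]

err_cgs = norm(I - transpose(cgs(X)) * cgs(X))
err_mgs = norm(I - transpose(mgs(X)) * mgs(X))
# err_cgs ends up many orders of magnitude larger than err_mgs
```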

It is interesting to note that the following formulation of the Gram-Schmidt algorithm (which is really just a better implementation of classical Gram-Schmidt) is referred to as the modified Gram-Schmidt algorithm of Schwarz and Rutishauser in [3].

```julia
using LinearAlgebra

function classical_gram_schmidt(matrix)
    # orthogonalises the columns of the input matrix
    num_vectors = size(matrix)[2]
    orth_matrix = copy(matrix)
    for vec_idx = 1:num_vectors
        for span_base_idx = 1:(vec_idx - 1)
            # subtract the projection onto the previously computed vector
            orth_matrix[:, vec_idx] -= dot(orth_matrix[:, span_base_idx], orth_matrix[:, vec_idx]) * orth_matrix[:, span_base_idx]
        end
        # normalise vector
        orth_matrix[:, vec_idx] = orth_matrix[:, vec_idx] / norm(orth_matrix[:, vec_idx])
    end
    return orth_matrix
end
```


## Notes

All the code was evaluated with Julia version 1.5.0 (2020-08-01) using the official Julia Docker image.

## References

[1] N. J. Higham, Accuracy and Stability of Numerical Algorithms, Society for Industrial and Applied Mathematics, 2002

[2] G. H. Golub and C. F. Van Loan, Matrix Computations, The Johns Hopkins University Press, 1996.

[3] W. Gander, Algorithms for the QR-Decomposition, Eidgenössische Technische Hochschule, Research Report No. 80-02, 1980. Retyped 2003

[4] S. J. Leon, A. Björck, and W. Gander, “Gram-Schmidt orthogonalization: 100 years and more,” Numerical Linear Algebra with Applications, vol. 20, no. 3, 2012.

[5] E. Schmidt, “Zur Theorie der linearen und nichtlinearen Integralgleichungen I. Teil: Entwicklung willkürlicher Funktionen nach Systemen vorgeschriebener,” Mathematische Annalen, vol. 63, pp. 433–476, 1907.

[6] J. P. Gram, “Ueber die Entwickelung reeller Functionen in Reihen mittels der Methode der kleinsten Quadrate,” Journal für die reine und angewandte Mathematik, vol. 94, pp. 41–73, 1883.

[7] Y. K. Wong, “An application of orthogonalization process to the theory of least squares,” Annals of Mathematical Statistics, vol. 6, pp. 53–75, 1935.