G. Strang’s Linear Algebra IV-1
Reading notes
This is the 1st note from Gilbert Strang’s book on linear algebra (LA, Strang 2012). The initial plan was to try to compress what I read into a tight summary to retain the most insightful bits. But what I found myself doing is deliberating on what I read and self-explaining the gaps. This is definitely more useful for me, and perhaps more useful for the reader of this post.
The IV-th edition I am reading is 2 iterations behind the latest edition (VI). In this book, the very first line of the very first chapter reads:
This book begins with the central problem of linear algebra: solving linear equations.
(In the 5th edition, this sentence only appears in the 2nd chapter. And I could not Ctrl+F it at all in the 6th edition.)
Which signals that solving linear equations is a major raison d’etre of LA. Before reading this chapter, I thought LA was about matrices, vectors, operations involving them, determinants, and eigen-things, but perhaps those things are just means to an end of solving linear equations…
Since it is of central importance, let’s think deeply about what solving equations means. If I remember anything from “regular” school-level algebra, we “solve for x” when we find a value of x that satisfies a constraint expressed as an equality of two sides, e.g., 2x - 5 = 11. The solution is 3 (\in \mathbb{R}) and it is a single point on the real number line. Thinking generally, we could say that the set of solutions is \{3\}, and it contains a single scalar that satisfies the equality 2x - 5 = 11.
If I was to add anything to Strang’s opening sentence is that LA seems to be concerned with systems of linear equations with (potentially) more than 1 variable, e.g.:
\begin{align} 2x - 5y = 11 x + 7y = 3 \end{align}
Each equation now has 2 scalars that may or may not satisfy two equalities. For the first equation, 2x - 5y = 11, the pair (x,y) =(0, 2.2) satisfies the equality, but so does (5.5, 0). In fact, there seems to be infinitely many points that satisfy that equality and the same can said for any equation of the form ax + by = c. If we marked all point solutions to each equation on an (x,y) coordinate plane, we’d see that they trace straight lines.
Aha! That’s why these equations are called linear.
Solving a system of equations means finding a set of points that satisfies the constraints given by all equalities in the system. With this understanding of what it means to solve a system of equation, it makes sense that the solution to a system can be a single point where the two lines intersect.
Aha! This is where an intuitive idea of intersecting lines corresponds exactly to a less intuitive notion of intersecting sets. Because a line is a set of points that satisfies a linear equation, an intersection of two lines is a single-point set that satisfies more than 1 linear equation.
Elimination algorithm
Apparently, Gaussian elimination is a really efficient method for solving systems of equations. When seeing Strang explain it on a simple example (and remembering how it was taught in my middle school), this elimination method seems like a recipe: look for a way to eliminate the coefficients so that we can express the variables as transformations of known quantities. In fact, it can be described as an algorithm – a sequence of steps that achieves a desired outcome. Here’s is what a naive learner like myself might come up with:
function gaussian_elimination!(M::Matrix{<:Real})
nrows = size(M, 1)
# Forward elimination
for row in 1:nrows-1
# Get pivot value
col = row
pivot = M[row, col]
# "Eliminate" values in the pivot's col for each row below the pivot
for row_below in row+1:nrows
coef = M[row_below, row] / pivot # rescale
M[row_below, :] -= coef .* M[row, :] # eliminate
end
end
# Backward substitution
for row in nrows:-1:1
col = row
M[row, :] /= M[row, col] # rescale (normalize) the substitute
# "Eliminate" values in the substitute's col for each row above the substitute
for row_above in 1:row-1
M[row_above, :] -= M[row_above, col] .* M[row, :] # eliminate
end
end
solution = M[:, end]
return solution
endOf course, this algorithm does not quite work when some pivots are already 0.
There are two major parts: 1. Forward elimination that gets the original matrix into an upper triangular form, M \rightarrow U. 2. Backward substitution that turns U into an identity matrix to reveal the solution.
Elimination via matrices
The author then introduces the elementary matrix which can be seen as a kind of device that performs an elimination step. It’s an “off-by-1” identity matrix with a single element (i,j)-th element – such that i>j is set to l. Instead of writing out the full matrix, I think it is convenient to annotate it as E^{(l)}_{i,j}. An elementary matrix transforms its input matrix A by it adding l times A’s row j to its row i, which is what we want to achieve when performing elimination.
After taking a hint from Strang’s didactics, I figured that the whole process elimination can be represented as a matrix transformation. Each elimination step corresponds to a multiplication of a matrix by an elementary matrix that subtracts a multiple of a pivot from a row, such that the latter ends up with a 0 in the column of a pivot:
\begin{align} E^{(-\frac{d}{a})}_{2,1} A & = \begin{bmatrix} 1 & 0 & 0 \\ -\frac{d}{a} & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix} \\ & = \begin{bmatrix} a & b & c \\ d-\frac{d}{a}a & e-\frac{d}{a}b & f-\frac{d}{a}c \\ g & h & i \end{bmatrix} \\ & = \begin{bmatrix} a & b & c \\ 0 & e-\frac{d}{a}b & f-\frac{d}{a}c \\ g & h & i \end{bmatrix} \end{align}
After arriving at E^{(-\frac{d}{a})}_{2,1} A we can apply another transformation E^{(-\frac{g}{a})}_{3,1} that will eliminate g from A:
E^{(-\frac{g}{a})}_{3,1} (E^{(-\frac{d}{a})}_{2,1} A) = \begin{bmatrix} a & b & c \\ 0 & e-\frac{d}{a}b & f-\frac{d}{a}c \\ 0 & h-\frac{g}{a}b & i-\frac{g}{a}c \end{bmatrix}
After that we apply yet another E^{x}_{3,2} that will do the trick of eliminating h-\frac{g}{a}b from the matrix above by subtracting a multiple of the 2nd row from the 3rd. We can represent (E^{x}_{3,2} E^{(-\frac{g}{a})}_{3,1} E^{(-\frac{d}{a})}_{2,1}) as a single elimination matrix E and write
EA = U
which I find quite elegant – a big chunk of the algorithm captured by a handful of symbols.
LU factorization
Strang then continues to introduce the idea of LU factorization – the representation of a matrix as its lower (L) and upper (U) triangular components. The L factor is a special matrix that contains all the pivot multipliers in their respective positions. The importance of LU factorization is emphasized and re-emphasized. I think the following passage summarizes it well:
There is a serious practical point about A = LU. It is more than just a record of elimination steps; L and U are the right matrices to solve Ax = b. In fact A could be thrown away! We go from b to c by forward elimination (this uses L) and we go from c to x by back-substitution (that uses U). We can and should do it without A: First Lc = b and then Ux = c.
The 1st chapter is huge (and dense), so I will conclude this reading note here. Looking forward to more insights from LA.