QR Decomposition and Least Squares

We have discussed the Gram-Schmidt algorithm for constructing an orthonormal set of vectors from a linearly independent set. When we perform the Gram-Schmidt algorithm, we are actually factoring a matrix A in the form A = QR, where Q is a matrix with orthonormal columns and R is an upper triangular matrix. This factorization can be done in MATLAB by typing [Q,R]=qr(A,0).
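
For instance, the factorization of the 3 by 3 matrix used in the example below can be computed and checked as follows (a quick sanity check, not part of the example itself):

    % The second argument 0 requests the reduced ("economy-size") decomposition.
    A = [1 1 2; 1 -1 1; 1 1 0];
    [Q, R] = qr(A, 0);
    disp(Q' * Q)   % approximately the identity: the columns of Q are orthonormal
    disp(Q * R)    % reproduces A
    % Note: qr computes the factorization by Householder reflections, so some
    % columns of Q (and rows of R) may be negated relative to the Gram-Schmidt
    % construction carried out below; the product Q*R is the same.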

Example:

\[
A = \begin{pmatrix} 1 & 1 & 2 \\ 1 & -1 & 1 \\ 1 & 1 & 0 \end{pmatrix}.
\]
Factor A in the form A = QR, where Q is an orthogonal matrix (i.e., a square matrix whose columns are orthonormal) and R is upper triangular.

The Gram-Schmidt algorithm for orthonormalizing the columns of A gives us such a factorization. The first step is to divide column one of A by its norm, to obtain the first orthonormal vector. This corresponds to multiplying A on the right by a diagonal matrix, which we will call $R_1$:

\[
A R_1 = \begin{pmatrix} 1 & 1 & 2 \\ 1 & -1 & 1 \\ 1 & 1 & 0 \end{pmatrix}
\begin{pmatrix} 1/\sqrt{3} & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}
= \begin{pmatrix} 1/\sqrt{3} & 1 & 2 \\ 1/\sqrt{3} & -1 & 1 \\ 1/\sqrt{3} & 1 & 0 \end{pmatrix}.
\]
The first column of this new matrix is our first orthonormal vector.

Next we replace column two of the matrix by itself minus a constant times column one, where the constant is chosen so that the new vector will be orthogonal to column one:

\[
\begin{pmatrix} 1 \\ -1 \\ 1 \end{pmatrix} \leftarrow
\begin{pmatrix} 1 \\ -1 \\ 1 \end{pmatrix}
- \frac{1}{\sqrt{3}} \begin{pmatrix} 1/\sqrt{3} \\ 1/\sqrt{3} \\ 1/\sqrt{3} \end{pmatrix}.
\]
The constant $1/\sqrt{3}$ was determined by taking the inner product of column one with column two:
\[
\left( 1/\sqrt{3},\ 1/\sqrt{3},\ 1/\sqrt{3} \right)
\begin{pmatrix} 1 \\ -1 \\ 1 \end{pmatrix} = \frac{1}{\sqrt{3}}.
\]
This corresponds to multiplying on the right by an upper triangular matrix $R_2$:
\[
A R_1 R_2 = \begin{pmatrix} 1/\sqrt{3} & 1 & 2 \\ 1/\sqrt{3} & -1 & 1 \\ 1/\sqrt{3} & 1 & 0 \end{pmatrix}
\begin{pmatrix} 1 & -1/\sqrt{3} & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}
= \begin{pmatrix} 1/\sqrt{3} & 2/3 & 2 \\ 1/\sqrt{3} & -4/3 & 1 \\ 1/\sqrt{3} & 2/3 & 0 \end{pmatrix}.
\]
Notice that the second column of the new matrix is orthogonal to the first.

Next we normalize the second column of the matrix. This again corresponds to multiplying on the right by a diagonal matrix $R_3$:

\[
A R_1 R_2 R_3 = \begin{pmatrix} 1/\sqrt{3} & 2/3 & 2 \\ 1/\sqrt{3} & -4/3 & 1 \\ 1/\sqrt{3} & 2/3 & 0 \end{pmatrix}
\begin{pmatrix} 1 & 0 & 0 \\ 0 & \sqrt{3}/(2\sqrt{2}) & 0 \\ 0 & 0 & 1 \end{pmatrix}
= \begin{pmatrix} 1/\sqrt{3} & 1/\sqrt{6} & 2 \\ 1/\sqrt{3} & -2/\sqrt{6} & 1 \\ 1/\sqrt{3} & 1/\sqrt{6} & 0 \end{pmatrix}.
\]
Note that column two is still orthogonal to column one, but now it has norm one. This is the second orthonormal vector produced by the Gram-Schmidt algorithm.

Column three still has not been changed. The next step is to orthogonalize it against columns one and two:

\[
\begin{pmatrix} 2 \\ 1 \\ 0 \end{pmatrix} \leftarrow
\begin{pmatrix} 2 \\ 1 \\ 0 \end{pmatrix}
- \sqrt{3} \begin{pmatrix} 1/\sqrt{3} \\ 1/\sqrt{3} \\ 1/\sqrt{3} \end{pmatrix}
- 0 \begin{pmatrix} 1/\sqrt{6} \\ -2/\sqrt{6} \\ 1/\sqrt{6} \end{pmatrix}.
\]
The coefficients $\sqrt{3}$ and $0$ are obtained by taking the inner product of columns one and two, respectively, with column three:
\[
\left( 1/\sqrt{3},\ 1/\sqrt{3},\ 1/\sqrt{3} \right) \begin{pmatrix} 2 \\ 1 \\ 0 \end{pmatrix} = \sqrt{3},
\qquad
\left( 1/\sqrt{6},\ -2/\sqrt{6},\ 1/\sqrt{6} \right) \begin{pmatrix} 2 \\ 1 \\ 0 \end{pmatrix} = 0.
\]
This corresponds to multiplying on the right by another upper triangular matrix $R_4$:
\[
A R_1 R_2 R_3 R_4 = \begin{pmatrix} 1/\sqrt{3} & 1/\sqrt{6} & 2 \\ 1/\sqrt{3} & -2/\sqrt{6} & 1 \\ 1/\sqrt{3} & 1/\sqrt{6} & 0 \end{pmatrix}
\begin{pmatrix} 1 & 0 & -\sqrt{3} \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}
= \begin{pmatrix} 1/\sqrt{3} & 1/\sqrt{6} & 1 \\ 1/\sqrt{3} & -2/\sqrt{6} & 0 \\ 1/\sqrt{3} & 1/\sqrt{6} & -1 \end{pmatrix}.
\]
Note that column three is now orthogonal to columns one and two.

The last step is to normalize column three. This corresponds to multiplying on the right by a diagonal matrix $R_5$:

\[
A R_1 R_2 R_3 R_4 R_5 = \begin{pmatrix} 1/\sqrt{3} & 1/\sqrt{6} & 1 \\ 1/\sqrt{3} & -2/\sqrt{6} & 0 \\ 1/\sqrt{3} & 1/\sqrt{6} & -1 \end{pmatrix}
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1/\sqrt{2} \end{pmatrix}
= \begin{pmatrix} 1/\sqrt{3} & 1/\sqrt{6} & 1/\sqrt{2} \\ 1/\sqrt{3} & -2/\sqrt{6} & 0 \\ 1/\sqrt{3} & 1/\sqrt{6} & -1/\sqrt{2} \end{pmatrix}.
\]
The columns of our final matrix are orthonormal. Thus we can write
\[
A R_1 R_2 R_3 R_4 R_5 = Q,
\]
where Q is an orthogonal matrix. If we define
\[
R = (R_1 R_2 R_3 R_4 R_5)^{-1} = R_5^{-1} R_4^{-1} \cdots R_1^{-1},
\tag{1}
\]
then R is upper triangular, since the inverse of each upper triangular matrix $R_1, \ldots, R_5$ is upper triangular and the product of upper triangular matrices is upper triangular. We thus have the factorization A = QR.
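
The column operations above translate directly into code. Here is a minimal MATLAB sketch of the reduced QR-decomposition by classical Gram-Schmidt (the function name gs_qr is ours, and this is for illustration only; the built-in qr uses Householder reflections, which behave better in floating point):

    function [Q, R] = gs_qr(A)
    % GS_QR  Reduced QR-decomposition by classical Gram-Schmidt (a sketch).
        [m, n] = size(A);
        Q = zeros(m, n);
        R = zeros(n, n);
        for j = 1:n
            v = A(:, j);
            for i = 1:j-1
                R(i, j) = Q(:, i)' * A(:, j);  % inner product with an earlier orthonormal column
                v = v - R(i, j) * Q(:, i);     % orthogonalize against that column
            end
            R(j, j) = norm(v);                 % length of the orthogonalized column
            Q(:, j) = v / R(j, j);             % normalize
        end
    end

Saved as gs_qr.m and applied to the 3 by 3 matrix of the example, this routine reproduces the Q and R computed above.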

The matrix R in (1) is easy to compute, since the matrices $R_1, \ldots, R_5$ are either diagonal or else have ones on the main diagonal and nonzeros above the diagonal in just one column. To invert a diagonal matrix, just take the inverse of each diagonal element. To invert a matrix with ones on the main diagonal and nonzeros in just one column, take the negative of the entries in that column. We can also compute R by noting that since A = QR and $Q^T Q = I$, we have $Q^T A = Q^T Q R = R$. For our example,

\[
R = Q^T A = \begin{pmatrix} \sqrt{3} & 1/\sqrt{3} & \sqrt{3} \\ 0 & 2\sqrt{2}/\sqrt{3} & 0 \\ 0 & 0 & \sqrt{2} \end{pmatrix}.
\]
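
As an illustration of the inversion rule stated above, the matrix $R_2$ from the second step is inverted simply by negating its one nonzero entry above the diagonal:
\[
R_2^{-1} = \begin{pmatrix} 1 & -1/\sqrt{3} & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}^{-1}
= \begin{pmatrix} 1 & 1/\sqrt{3} & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix},
\]
as is easily checked by multiplying the two matrices together.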

Least Squares Solution of Overdetermined Linear Systems

Suppose we have an overdetermined system of linear equations, that is, more equations than unknowns, so that the system cannot be satisfied exactly. An example is

\[
\begin{aligned}
x_1 + x_2 &= 2 \\
x_1 - x_2 &= 0 \\
2 x_1 + x_2 &= 4 .
\end{aligned}
\]
If the overdetermined system is denoted $Ax \approx b$, where A is an m by n matrix with m > n and b is a given m-vector, then we would like to find an approximate solution $x^*$ that minimizes the residual norm $\|b - Ax\|$; that is,
\[
\| b - A x^* \| \le \| b - A x \| \quad \text{for all } x \in \mathbb{R}^n .
\]
This is called a least squares solution. In our example, we look for $x_1^*$ and $x_2^*$ to minimize
\[
\left\| \begin{pmatrix} 2 \\ 0 \\ 4 \end{pmatrix}
- \begin{pmatrix} 1 & 1 \\ 1 & -1 \\ 2 & 1 \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \right\|
= \sqrt{ (2 - x_1 - x_2)^2 + (0 - x_1 + x_2)^2 + (4 - 2 x_1 - x_2)^2 } .
\]

If b lies in the range of A, then the equation Ax = b has an exact solution; that is, the least squares solution $x^*$ satisfies $\|b - A x^*\| = 0$. If b lies outside the range of A, then the equation Ax = b cannot be solved exactly. Instead, we find the closest vector $b^*$ to b in the range of A and solve $A x^* = b^*$. This is described further in the text. The vector $b^*$ is characterized by the fact that $b - b^* = b - A x^*$ is orthogonal to the range of A. This enables us to determine $x^*$ by solving the normal equations: $A^T (b - A x^*) = 0$, or

\[
A^T A x^* = A^T b .
\tag{2}
\]
For our example, the normal equations are
\[
\begin{pmatrix} 6 & 2 \\ 2 & 3 \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \end{pmatrix}
= \begin{pmatrix} 10 \\ 6 \end{pmatrix},
\]
yielding $x_1 = 9/7$, $x_2 = 8/7$.
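
In MATLAB, the normal equations for this example can be formed and solved in one line. This is only a quick check for a hand-sized problem; see the remarks at the end of these notes on why the QR-decomposition is preferred on a computer.

    A = [1 1; 1 -1; 2 1];
    b = [2; 0; 4];
    x = (A' * A) \ (A' * b)   % yields x = [9/7; 8/7]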

We can also use the (reduced) QR-decomposition of A to solve the least squares problem. [There are actually two types of QR-decompositions for an m by n matrix. In the full QR-decomposition, Q is m by m and R is m by n. We will discuss only the reduced QR-decomposition, in which Q is an m by n matrix with orthonormal columns and R is n by n. In the MATLAB expression qr(A,0), the second argument 0 indicates that it is the reduced QR-decomposition that we seek.] The columns of Q form an orthonormal basis for the range of A. (Do you see why?) Hence $b - A x^*$ will be orthogonal to the range of A if and only if $Q^T (b - A x^*) = 0$. Since A = QR and $Q^T Q = I$, this equation becomes $Q^T b - Q^T (Q R) x^* = Q^T b - (Q^T Q) R x^* = 0$, or

\[
R x^* = Q^T b .
\tag{3}
\]
This equation is easy to solve since R is upper triangular. We can use the routine back_solve that was described in a previous set of notes.
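
We do not reproduce back_solve here, but a minimal back-substitution routine (our own sketch, not necessarily identical to the one in the earlier notes) might look like this:

    function x = back_substitute(R, c)
    % BACK_SUBSTITUTE  Solve R x = c, where R is upper triangular and nonsingular.
        n = length(c);
        x = zeros(n, 1);
        for i = n:-1:1
            % subtract off the already-computed unknowns, then divide by the pivot
            x(i) = (c(i) - R(i, i+1:n) * x(i+1:n)) / R(i, i);
        end
    end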

For our example problem, the (reduced) QR-decomposition is

\[
\begin{pmatrix} 1 & 1 \\ 1 & -1 \\ 2 & 1 \end{pmatrix}
= \begin{pmatrix} 1/\sqrt{6} & 2/\sqrt{21} \\ 1/\sqrt{6} & -4/\sqrt{21} \\ 2/\sqrt{6} & 1/\sqrt{21} \end{pmatrix}
\begin{pmatrix} \sqrt{6} & 2/\sqrt{6} \\ 0 & \sqrt{7/3} \end{pmatrix},
\]
and so the linear system (3) becomes
\[
\begin{pmatrix} \sqrt{6} & 2/\sqrt{6} \\ 0 & \sqrt{7/3} \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \end{pmatrix}
= \begin{pmatrix} 10/\sqrt{6} \\ 8/\sqrt{21} \end{pmatrix} .
\]
(You should check this.) Solving for $x_1$ and $x_2$, we again obtain $x_1 = 9/7$, $x_2 = 8/7$.
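
The same computation in MATLAB, a sketch using the built-in qr in place of back_solve (backslash applied to a triangular matrix performs the back substitution):

    A = [1 1; 1 -1; 2 1];
    b = [2; 0; 4];
    [Q, R] = qr(A, 0);   % reduced QR-decomposition
    x = R \ (Q' * b)     % solves (3) by back substitution; again x = [9/7; 8/7]

Note that any sign differences between this Q, R and the hand computation cancel in $R^{-1} Q^T b$, so the least squares solution is unchanged.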

When solving a small least squares problem by hand, it is usually easier to solve the normal equations (2) than it is to compute the QR-decomposition. When working on a computer, however, least squares problems are usually solved by computing the QR-decomposition of the matrix and then backsolving the n by n upper triangular system in (3). This is because the QR-decomposition is less susceptible to the effects of rounding errors and hence delivers a more accurate approximation to the least squares solution than does solving the normal equations. This is what MATLAB does when you enter a nonsquare matrix A (m by n, say, with $m \neq n$) and issue the command A\b. Recall that when A is square, MATLAB attempts to solve the linear system by Gaussian elimination and issues a warning if the matrix is (nearly) singular. In the nonsquare case, however, it uses the QR-decomposition and returns a least squares solution, even if the linear system cannot be solved exactly.
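
For our example, then, a single backslash returns the least squares solution directly:

    A = [1 1; 1 -1; 2 1];
    b = [2; 0; 4];
    x = A \ b   % A is nonsquare, so MATLAB returns the least squares solution [9/7; 8/7]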

