QR Decomposition and Least Squares
We have discussed the Gram-Schmidt algorithm for constructing
an orthonormal set of vectors from a linearly independent set.
When we perform the Gram-Schmidt algorithm, we are actually
factoring a matrix A in the form A = QR, where Q is a matrix
with orthonormal columns and R is an upper triangular matrix.
This factorization can be done in MATLAB by typing [Q,R]=qr(A,0).
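The same reduced factorization can be obtained in Python with NumPy's `numpy.linalg.qr`; a minimal sketch (the small matrix here is just an illustration):

```python
import numpy as np

# Any matrix with linearly independent columns will do for illustration.
A = np.array([[1.0, 1.0],
              [1.0, -1.0],
              [2.0, 1.0]])

# mode="reduced" is NumPy's analogue of MATLAB's qr(A,0):
# Q is m-by-n with orthonormal columns, R is n-by-n upper triangular.
Q, R = np.linalg.qr(A, mode="reduced")

print(np.allclose(Q @ R, A))            # True: the factorization reproduces A
print(np.allclose(Q.T @ Q, np.eye(2)))  # True: the columns of Q are orthonormal
```

Note that a library QR routine may return columns of Q with flipped signs relative to a hand Gram-Schmidt computation; the product QR is the same.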
Example:
Factor A in the form A = QR, where Q is an orthogonal matrix
(i.e., a square matrix whose columns are orthonormal) and R is
upper triangular.
The Gram-Schmidt algorithm for orthonormalizing the columns of A
gives us such a factorization. The first step is to divide column one
of A by its norm, to obtain the first orthonormal vector. This
corresponds to multiplying A on the right by a diagonal matrix,
which we will call R1:
\[
A R_1 =
\begin{pmatrix} 1 & 1 & 2 \\ 1 & -1 & 1 \\ 1 & 1 & 0 \end{pmatrix}
\begin{pmatrix} 1/\sqrt{3} & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}
=
\begin{pmatrix} 1/\sqrt{3} & 1 & 2 \\ 1/\sqrt{3} & -1 & 1 \\ 1/\sqrt{3} & 1 & 0 \end{pmatrix}.
\]
The first column of this new matrix is our first orthonormal vector.
Next we replace column two of the matrix by itself minus a constant
times column one, where the constant is chosen so that the new vector
will be orthogonal to column one:
\[
\begin{pmatrix} 1 \\ -1 \\ 1 \end{pmatrix}
\leftarrow
\begin{pmatrix} 1 \\ -1 \\ 1 \end{pmatrix}
- \frac{1}{\sqrt{3}}
\begin{pmatrix} 1/\sqrt{3} \\ 1/\sqrt{3} \\ 1/\sqrt{3} \end{pmatrix}.
\]
The constant 1/√3 was determined by taking the inner product
of column one with column two:
\[
\left( \tfrac{1}{\sqrt{3}}, \tfrac{1}{\sqrt{3}}, \tfrac{1}{\sqrt{3}} \right)
\begin{pmatrix} 1 \\ -1 \\ 1 \end{pmatrix}
= \frac{1}{\sqrt{3}}.
\]
This corresponds to multiplying on the right by an upper triangular
matrix R2:
\[
A R_1 R_2 =
\begin{pmatrix} 1/\sqrt{3} & 1 & 2 \\ 1/\sqrt{3} & -1 & 1 \\ 1/\sqrt{3} & 1 & 0 \end{pmatrix}
\begin{pmatrix} 1 & -1/\sqrt{3} & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}
=
\begin{pmatrix} 1/\sqrt{3} & 2/3 & 2 \\ 1/\sqrt{3} & -4/3 & 1 \\ 1/\sqrt{3} & 2/3 & 0 \end{pmatrix}.
\]
Notice that the second column of the new matrix is orthogonal to the first.
Next we normalize the second column of the matrix. This again corresponds
to multiplying on the right by a diagonal matrix R3:
\[
A R_1 R_2 R_3 =
\begin{pmatrix} 1/\sqrt{3} & 2/3 & 2 \\ 1/\sqrt{3} & -4/3 & 1 \\ 1/\sqrt{3} & 2/3 & 0 \end{pmatrix}
\begin{pmatrix} 1 & 0 & 0 \\ 0 & \frac{3}{2\sqrt{6}} & 0 \\ 0 & 0 & 1 \end{pmatrix}
=
\begin{pmatrix} 1/\sqrt{3} & 1/\sqrt{6} & 2 \\ 1/\sqrt{3} & -2/\sqrt{6} & 1 \\ 1/\sqrt{3} & 1/\sqrt{6} & 0 \end{pmatrix}.
\]
Note that column two is still orthogonal to column one, but now it has
norm one. This is the second orthonormal vector produced by the Gram-Schmidt
algorithm.
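The first two Gram-Schmidt steps can be checked numerically. This sketch in Python/NumPy assumes the example's first two columns are (1, 1, 1) and (1, −1, 1), which is consistent with the constant 1/√3 and the vector (1/√6, −2/√6, 1/√6) appearing in the text:

```python
import numpy as np

a1 = np.array([1.0, 1.0, 1.0])    # assumed column one of A
a2 = np.array([1.0, -1.0, 1.0])   # assumed column two of A

q1 = a1 / np.linalg.norm(a1)      # first orthonormal vector, (1,1,1)/sqrt(3)
v2 = a2 - (q1 @ a2) * q1          # subtract the component of a2 along q1
q2 = v2 / np.linalg.norm(v2)      # second orthonormal vector

print(np.isclose(q1 @ a2, 1/np.sqrt(3)))                        # True: the constant 1/sqrt(3)
print(np.allclose(q2, np.array([1.0, -2.0, 1.0])/np.sqrt(6)))   # True: q2 = (1,-2,1)/sqrt(6)
```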
Column three still has not been changed. The next step is to orthogonalize
it against columns one and two:
\[
\begin{pmatrix} 2 \\ 1 \\ 0 \end{pmatrix}
\leftarrow
\begin{pmatrix} 2 \\ 1 \\ 0 \end{pmatrix}
- \sqrt{3} \begin{pmatrix} 1/\sqrt{3} \\ 1/\sqrt{3} \\ 1/\sqrt{3} \end{pmatrix}
- 0 \begin{pmatrix} 1/\sqrt{6} \\ -2/\sqrt{6} \\ 1/\sqrt{6} \end{pmatrix}.
\]
The coefficients √3 and 0 are obtained by taking the inner
product of columns one and two, respectively, with column three:
\[
\left( \tfrac{1}{\sqrt{3}}, \tfrac{1}{\sqrt{3}}, \tfrac{1}{\sqrt{3}} \right)
\begin{pmatrix} 2 \\ 1 \\ 0 \end{pmatrix} = \sqrt{3},
\qquad
\left( \tfrac{1}{\sqrt{6}}, -\tfrac{2}{\sqrt{6}}, \tfrac{1}{\sqrt{6}} \right)
\begin{pmatrix} 2 \\ 1 \\ 0 \end{pmatrix} = 0.
\]
This corresponds to multiplying on the right by another upper triangular
matrix R4:
\[
A R_1 R_2 R_3 R_4 =
\begin{pmatrix} 1/\sqrt{3} & 1/\sqrt{6} & 2 \\ 1/\sqrt{3} & -2/\sqrt{6} & 1 \\ 1/\sqrt{3} & 1/\sqrt{6} & 0 \end{pmatrix}
\begin{pmatrix} 1 & 0 & -\sqrt{3} \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}
=
\begin{pmatrix} 1/\sqrt{3} & 1/\sqrt{6} & 1 \\ 1/\sqrt{3} & -2/\sqrt{6} & 0 \\ 1/\sqrt{3} & 1/\sqrt{6} & -1 \end{pmatrix}.
\]
Note that column three is now orthogonal to columns one and two.
The last step is to normalize column three. This corresponds to
multiplying on the right by a diagonal matrix R5:
\[
A R_1 R_2 R_3 R_4 R_5 =
\begin{pmatrix} 1/\sqrt{3} & 1/\sqrt{6} & 1 \\ 1/\sqrt{3} & -2/\sqrt{6} & 0 \\ 1/\sqrt{3} & 1/\sqrt{6} & -1 \end{pmatrix}
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1/\sqrt{2} \end{pmatrix}
=
\begin{pmatrix} 1/\sqrt{3} & 1/\sqrt{6} & 1/\sqrt{2} \\ 1/\sqrt{3} & -2/\sqrt{6} & 0 \\ 1/\sqrt{3} & 1/\sqrt{6} & -1/\sqrt{2} \end{pmatrix}.
\]
The columns of our final matrix are orthonormal. Thus we can write
\[
A R_1 R_2 R_3 R_4 R_5 = Q,
\]
where Q is an orthogonal matrix. If we define
\[
R = ( R_1 R_2 R_3 R_4 R_5 )^{-1} = R_5^{-1} R_4^{-1} \cdots R_1^{-1},
\tag{1}
\]
then R is upper triangular, since the inverse of each upper triangular
matrix R1, …, R5 is upper triangular and the product of upper
triangular matrices is upper triangular. We thus have the factorization
A = QR.
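The whole procedure is classical Gram-Schmidt. It can be sketched in Python/NumPy (rather than MATLAB); the 3-by-3 matrix below is an assumed reconstruction of the example, with the first two columns consistent with the constants worked out above and a hypothetical third column:

```python
import numpy as np

def gram_schmidt_qr(A):
    """Classical Gram-Schmidt: returns Q with orthonormal columns and
    upper triangular R such that A = Q R."""
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for j in range(n):
        v = A[:, j].copy()
        for i in range(j):
            R[i, j] = Q[:, i] @ A[:, j]   # coefficient against earlier q_i
            v -= R[i, j] * Q[:, i]        # orthogonalize against q_i
        R[j, j] = np.linalg.norm(v)
        Q[:, j] = v / R[j, j]             # normalize
    return Q, R

# Assumed example matrix (first two columns pinned down by the text's constants).
A = np.array([[1.0,  1.0, 2.0],
              [1.0, -1.0, 1.0],
              [1.0,  1.0, 0.0]])
Q, R = gram_schmidt_qr(A)
print(np.allclose(Q @ R, A))            # True: A = QR
print(np.allclose(Q.T @ Q, np.eye(3)))  # True: Q is orthogonal
```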
The matrix R in (1) is easy to compute, since the matrices
R1, …, R5 are either diagonal or else have ones on the
main diagonal and nonzero entries above the diagonal in just one column.
To invert a diagonal matrix, just take the reciprocal of each diagonal
element. To invert a matrix with ones on the main diagonal and
nonzeros above the diagonal in just one column, simply negate those
off-diagonal entries. We can also compute R by noting that since A = QR
and Q^T Q = I, we have Q^T A = Q^T (Q R) = (Q^T Q) R = R. For our example,
\[
R = Q^T A =
\begin{pmatrix} \sqrt{3} & 1/\sqrt{3} & \sqrt{3} \\ 0 & 2\sqrt{6}/3 & 0 \\ 0 & 0 & \sqrt{2} \end{pmatrix}.
\]
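The two inversion rules stated above can be checked numerically. The matrices below follow the first two steps of the example (the scaling 1/√3 and the coefficient −1/√3 come from the text):

```python
import numpy as np

# A diagonal scaling matrix: its inverse just reciprocates the diagonal.
R1 = np.diag([1/np.sqrt(3), 1.0, 1.0])
print(np.allclose(np.linalg.inv(R1), np.diag([np.sqrt(3), 1.0, 1.0])))  # True

# Ones on the diagonal, a nonzero above the diagonal in one column:
# the inverse simply negates that off-diagonal entry.
R2 = np.array([[1.0, -1/np.sqrt(3), 0.0],
               [0.0,  1.0,          0.0],
               [0.0,  0.0,          1.0]])
R2_inv = R2.copy()
R2_inv[0, 1] = -R2_inv[0, 1]
print(np.allclose(R2 @ R2_inv, np.eye(3)))  # True: R2_inv really inverts R2
```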
Least Squares Solution of Overdetermined Linear Systems
Suppose we have an overdetermined system of linear equations,
that is, more equations than unknowns, so that the system cannot be
satisfied exactly. An example is
\[
\begin{aligned}
x_1 + x_2 &= 2, \\
x_1 - x_2 &= 0, \\
2 x_1 + x_2 &= 4.
\end{aligned}
\]
If the overdetermined system is denoted Ax ≈ b, where A is
an m by n matrix with m > n, and b is a given m-vector,
then we would like to find an approximate solution x* that minimizes
the residual norm || b − A x ||; that is,
\[
\| b - A x^* \| \le \| b - A x \| \quad \text{for all } x \in \mathbb{R}^n.
\]
This is called a least squares solution. In our example, we
look for x1* and x2* to minimize
\[
\left\| \begin{pmatrix} 2 \\ 0 \\ 4 \end{pmatrix}
- \begin{pmatrix} 1 & 1 \\ 1 & -1 \\ 2 & 1 \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \right\|
= \sqrt{ (2 - x_1 - x_2)^2 + (0 - x_1 + x_2)^2 + (4 - 2 x_1 - x_2)^2 } \, .
\]
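The residual norm is easy to evaluate directly. As a quick Python/NumPy check, the solution x1 = 9/7, x2 = 8/7 reported later in these notes gives a smaller residual than nearby trial points:

```python
import numpy as np

A = np.array([[1.0,  1.0],
              [1.0, -1.0],
              [2.0,  1.0]])
b = np.array([2.0, 0.0, 4.0])

def residual_norm(x):
    # || b - A x ||, the quantity the least squares solution minimizes
    return np.linalg.norm(b - A @ x)

x_star = np.array([9/7, 8/7])
# The minimizer should beat nearby trial points:
print(residual_norm(x_star) <= residual_norm(np.array([1.0, 1.0])))  # True
print(residual_norm(x_star) <= residual_norm(np.array([1.5, 1.0])))  # True
```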
If b lies in the range of A, then the equation Ax = b
has an exact solution; that is, the least squares solution x*
satisfies || b - A x* || = 0. If b lies outside the range of A,
then the equation Ax = b cannot be solved exactly. Instead, we find
the closest vector b* to b in the range of A and solve
A x* = b*. This is described further in the text. The vector b*
is characterized by the fact that b − b* = b − A x* is
orthogonal to the range of A. This enables us to determine x*
by solving the normal equations A^T ( b − A x* ) = 0, or
\[
A^T A x^* = A^T b.
\tag{2}
\]
For our example, the normal equations are
\[
\begin{pmatrix} 6 & 2 \\ 2 & 3 \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \end{pmatrix}
=
\begin{pmatrix} 10 \\ 6 \end{pmatrix},
\]
yielding x1 = 9/7, x2 = 8/7.
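The normal-equations computation can be reproduced in Python/NumPy as a check on the arithmetic:

```python
import numpy as np

A = np.array([[1.0,  1.0],
              [1.0, -1.0],
              [2.0,  1.0]])
b = np.array([2.0, 0.0, 4.0])

# Normal equations: A^T A x = A^T b
AtA = A.T @ A              # [[6, 2], [2, 3]]
Atb = A.T @ b              # [10, 6]
x = np.linalg.solve(AtA, Atb)
print(np.allclose(x, [9/7, 8/7]))  # True
```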
We can also use the (reduced) QR-decomposition of A to solve the
least squares problem. [There are actually two types of QR-decompositions
for an m by n matrix. In the full QR-decomposition, Q is
m by m and R is m by n. We will discuss only the reduced
QR-decomposition, in which Q is an m by n matrix with orthonormal
columns and R is n by n. In the MATLAB expression qr(A,0),
the second argument 0 indicates that it is the reduced QR-decomposition
that we seek.] The columns of Q form an orthonormal basis
for the range of A. (Do you see why?) Hence b - A x* will be
orthogonal to the range of A if and only if Q^T ( b − A x* ) = 0.
Since A = QR and Q^T Q = I, this equation becomes
Q^T b − Q^T (Q R) x* = Q^T b − ( Q^T Q ) R x* = 0, or
\[
R x^* = Q^T b.
\tag{3}
\]
This equation is easy to solve since R is upper triangular.
We can use the routine back_solve that was described in a previous
set of notes.
For our example problem, the (reduced) QR-decomposition is
\[
\begin{pmatrix} 1 & 1 \\ 1 & -1 \\ 2 & 1 \end{pmatrix}
=
\begin{pmatrix} 1/\sqrt{6} & 2/\sqrt{21} \\ 1/\sqrt{6} & -4/\sqrt{21} \\ 2/\sqrt{6} & 1/\sqrt{21} \end{pmatrix}
\begin{pmatrix} \sqrt{6} & 2/\sqrt{6} \\ 0 & \sqrt{21}/3 \end{pmatrix},
\]
and so the linear system (3) becomes
\[
\begin{pmatrix} \sqrt{6} & 2/\sqrt{6} \\ 0 & \sqrt{21}/3 \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \end{pmatrix}
=
\begin{pmatrix} 10/\sqrt{6} \\ 8/\sqrt{21} \end{pmatrix}.
\]
(You should check this.) Solving for x1 and x2, we again obtain
x1 = 9/7, x2 = 8/7.
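The QR route can also be carried out in Python/NumPy. The small routine below stands in for the back_solve routine mentioned above (its actual interface in the earlier notes may differ):

```python
import numpy as np

def back_solve(R, c):
    """Solve the upper triangular system R x = c by back substitution."""
    n = len(c)
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (c[i] - R[i, i+1:] @ x[i+1:]) / R[i, i]
    return x

A = np.array([[1.0,  1.0],
              [1.0, -1.0],
              [2.0,  1.0]])
b = np.array([2.0, 0.0, 4.0])

Q, R = np.linalg.qr(A, mode="reduced")  # reduced QR, as in MATLAB's qr(A,0)
x = back_solve(R, Q.T @ b)              # solve R x = Q^T b, equation (3)
print(np.allclose(x, [9/7, 8/7]))       # True
```

NumPy's qr may flip the signs of some columns of Q (and the corresponding rows of R) relative to the hand computation, but the least squares solution is unaffected.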
When solving a small least squares problem by hand, it is usually easier
to solve the normal equations (2) than it is to compute the
QR-decomposition. When working on a computer, however, least squares
problems are usually solved by computing the QR-decomposition of the
matrix and then backsolving to solve the n by n upper triangular
system in (3). This is because the QR-decomposition is less susceptible
to the effects of rounding errors and hence delivers a more accurate
approximation to the least squares solution than does solving the normal
equations. This is what MATLAB does when you enter a nonsquare matrix A
(m by n, say, with m ≠ n) and issue the command A\b.
Recall that when A is square, MATLAB attempts
to solve the linear system by Gaussian elimination and issues a warning
if the matrix is (nearly) singular. In the nonsquare case, however,
it uses the QR-decomposition and returns a least squares solution, even
if the linear system cannot be solved exactly.
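In NumPy, the counterpart of MATLAB's A\b for a nonsquare A is numpy.linalg.lstsq, which likewise returns a least squares solution even when the system has no exact solution:

```python
import numpy as np

A = np.array([[1.0,  1.0],
              [1.0, -1.0],
              [2.0,  1.0]])
b = np.array([2.0, 0.0, 4.0])

# rcond=None selects the recommended default cutoff for small singular values.
x, residuals, rank, sv = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(x, [9/7, 8/7]))  # True
```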