
Section 5.2 Orthogonal bases and orthogonal projection

Subsection 5.2.1 Orthogonal sets

Definition 5.2.1. Orthogonal.

Let \((V,\langle \ , \rangle)\) be an inner product space. Vectors \(\boldv, \boldw\in V\) are orthogonal if \(\langle \boldv, \boldw\rangle =0\text{.}\)

Let \(S\subseteq V\) be a subset of nonzero vectors.

  • The set \(S\) is orthogonal if \(\langle\boldv,\boldw \rangle=0\) for all \(\boldv\ne\boldw\in S\text{.}\) We say that the elements of \(S\) are pairwise orthogonal in this case.

  • The set \(S\) is orthonormal if it is both orthogonal and satisfies \(\norm{\boldv}=1\) for all \(\boldv\in S\text{:}\) i.e., \(S\) consists of pairwise orthogonal unit vectors.

An orthogonal set \(S=\{\boldv_1,\boldv_2,\dots,\boldv_r\}\) of nonzero vectors is automatically linearly independent: for each \(i\) we have

\begin{align*} a_1\boldv_1 +a_2\boldv_2+\cdots +a_r\boldv_r=\boldzero\amp \Rightarrow\amp \langle a_1\boldv_1 +a_2\boldv_2 +\cdots +a_r\boldv_r,\boldv_i\rangle=\langle\boldzero,\boldv_i\rangle\\ \amp \Rightarrow\amp a_1\langle\boldv_1,\boldv_i\rangle +a_2\langle \boldv_2,\boldv_i\rangle +\cdots +a_r\langle\boldv_r,\boldv_i\rangle=0\\ \amp \Rightarrow\amp a_i\langle \boldv_i,\boldv_i\rangle=0 \ \text{ (since \(\langle\boldv_j,\boldv_i\rangle= 0\) for \(j\ne i\)) }\\ \amp \Rightarrow\amp a_i=0 \text{ (since \(\langle\boldv_i,\boldv_i\rangle\ne 0\)) } \end{align*}

We have shown that if \(a_1\boldv_1+a_2\boldv_2+\cdots +a_r\boldv_r=\boldzero\text{,}\) then \(a_i=0\) for all \(i\text{,}\) proving that \(S\) is linearly independent.
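As a quick numerical illustration (a minimal sketch assuming NumPy is available, using the dot product on \(\R^4\)), the Gram matrix of an orthogonal set of nonzero vectors is diagonal with nonzero diagonal entries, so the set has full rank:

```python
import numpy as np

# Three of the pairwise-orthogonal vectors from Exercise 1 below.
S = np.array([
    [1,  1,  1,  1],
    [1, -1,  1, -1],
    [1,  1, -1, -1],
], dtype=float)

# Gram matrix G[i, j] = <v_i, v_j>: orthogonality makes it diagonal,
# and nonzero vectors make the diagonal entries nonzero.
G = S @ S.T
print(G)                          # diag(4, 4, 4)
print(np.linalg.matrix_rank(S))   # 3 = number of vectors, so S is linearly independent
```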

Subsection 5.2.2 Example

Let \(V=C([0,2\pi])\) with standard inner product \(\langle f, g\rangle=\int_0^{2\pi} f(x)g(x) \, dx\text{.}\)

Let

\begin{equation*} S=\{\cos(x),\sin(x),\cos(2x),\sin(2x), \dots\}=\{\cos(nx)\colon n\in\Z_{>0}\}\cup\{\sin(mx)\colon m\in\Z_{>0}\}\text{.} \end{equation*}

Then \(S\) is orthogonal, hence linearly independent.

Using some trig identities, one can show the following:

\begin{align*} \langle \cos(nx),\cos(mx)\rangle=\int_0^{2\pi}\cos(nx)\cos(mx)\, dx\amp =\begin{cases} 0\amp \text{ if \(n\ne m\) }\\ \pi\amp \text{ if \(n=m\) } \end{cases}\\ \langle \sin(nx),\sin(mx)\rangle=\int_0^{2\pi}\sin(nx)\sin(mx)\, dx\amp =\begin{cases} 0\amp \text{ if \(n\ne m\) }\\ \pi\amp \text{ if \(n=m\) } \end{cases}\\ \langle \cos(nx),\sin(mx)\rangle=\int_0^{2\pi}\cos(nx)\sin(mx)\, dx\amp =0 \text{ for any \(n,m\) } \end{align*}
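These integrals are easy to check numerically. Here is a minimal sketch (assuming NumPy is available; the helper ip is just for illustration) approximating the inner products with Riemann sums:

```python
import numpy as np

# Approximate <f, g> = ∫_0^{2π} f(x) g(x) dx with a Riemann sum.
x = np.linspace(0, 2 * np.pi, 100_000, endpoint=False)
dx = x[1] - x[0]

def ip(f, g):
    return np.sum(f(x) * g(x)) * dx

print(ip(lambda t: np.cos(2 * t), lambda t: np.cos(3 * t)))  # ≈ 0   (n ≠ m)
print(ip(lambda t: np.cos(2 * t), lambda t: np.cos(2 * t)))  # ≈ π   (n = m)
print(ip(lambda t: np.cos(2 * t), lambda t: np.sin(5 * t)))  # ≈ 0   (cos vs. sin)
```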

Orthogonality holds more generally if we replace the interval \([0,2\pi]\) with any interval of length \(L\text{,}\) and replace \(S\) with

\begin{equation*} \left\{\cos\left(\frac{2\pi x}{L}\right), \sin\left(\frac{2\pi x}{L}\right), \cos\left(2\cdot\frac{2\pi x}{L}\right),\sin\left(2\cdot\frac{2\pi x}{L}\right),\dots\right\}\text{.} \end{equation*}

Subsection 5.2.3 Orthogonal bases

Definition 5.2.3. Orthogonal and orthonormal bases.

Let \((V,\langle \ , \rangle)\) be an inner product space. An orthogonal basis (resp., orthonormal basis) of \(V\) is a basis \(B\) that is orthogonal (resp., orthonormal) as a set.

The proof that every finite-dimensional inner product space has an orthogonal basis is actually a procedure, called the Gram-Schmidt procedure, for converting an arbitrary basis of an inner product space to an orthogonal basis.

Subsection 5.2.4 Example

Let \(V=\R^2\) with the standard inner product (aka the dot product).

(a) Verify that \(B'=\{\boldv_1=(\sqrt{3}/2,1/2), \boldv_2=(-1/2,\sqrt{3}/2)\}\) is an orthonormal basis.

(b) Compute \([\boldv]_{B'}\) for \(\boldv=(4,2)\text{.}\)

(c) Compute \(\underset{B\rightarrow B'}{P}\text{,}\) where \(B\) is the standard basis.

Solution.

(a) This is easily verified: \(\boldv_1\cdot\boldv_2=0\) and \(\norm{\boldv_1}=\norm{\boldv_2}=1\text{.}\)

(b) Since \(B'\) is orthonormal, \(\boldv=a_1\boldv_1+a_2\boldv_2\) where \(a_1=\boldv\cdot\boldv_1=2\sqrt{3}+1\) and \(a_2=\boldv\cdot\boldv_2=\sqrt{3}-2\text{.}\) Thus \([\boldv]_{B'}=\begin{bmatrix}2\sqrt{3}+1\\ \sqrt{3}-2 \end{bmatrix}\)

(c) As we have seen before, \(\underset{B'\rightarrow B}{P}=\begin{bmatrix}\sqrt{3}/2\amp -1/2\\1/2\amp \sqrt{3}/2 \end{bmatrix}\) (put elements of \(B'\) in as columns). Hence \(\underset{B\rightarrow B'}{P}=(\underset{B'\rightarrow B}{P})^{-1}=\begin{bmatrix}\sqrt{3}/2\amp 1/2\\-1/2\amp \sqrt{3}/2 \end{bmatrix}\text{.}\)
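The computations in this example are easy to confirm numerically; here is a minimal sketch assuming NumPy is available:

```python
import numpy as np

v1 = np.array([np.sqrt(3) / 2, 1 / 2])
v2 = np.array([-1 / 2, np.sqrt(3) / 2])
v = np.array([4.0, 2.0])

# (a) orthonormality: unit lengths and zero dot product
print(v1 @ v1, v2 @ v2, v1 @ v2)            # 1.0 1.0 0.0

# (b) coordinates of v with respect to B' via dot products
print(v @ v1, v @ v2)                       # 2*sqrt(3)+1 ≈ 4.464,  sqrt(3)-2 ≈ -0.268

# (c) change-of-basis matrices: columns of P are the vectors of B'
P = np.column_stack([v1, v2])               # this is P_{B' -> B}
print(np.allclose(np.linalg.inv(P), P.T))   # True: P is an orthogonal matrix
print(P.T @ v)                              # same coordinates as in (b)
```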

Definition 5.2.8. Orthogonal matrices.

An \(n\times n\) matrix \(A\) is orthogonal if it is invertible and \(A^{-1}=A^T\text{.}\) Equivalently, \(A\) is orthogonal if its columns (or rows) are orthonormal.

Subsection 5.2.5 Gram-Schmidt Process
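The procedure is easiest to see in code. Here is a minimal sketch for the dot product on \(\R^n\) (assuming NumPy is available; the function name gram_schmidt is ours, and in a general inner product space one would replace np.dot with the relevant inner product):

```python
import numpy as np

def gram_schmidt(vectors):
    """Return an orthogonal basis (dot product) for the span of `vectors`.

    Each incoming vector has its projections onto the previously accepted
    basis vectors subtracted off; (near-)zero remainders are discarded.
    """
    basis = []
    for v in vectors:
        w = np.array(v, dtype=float)
        for b in basis:
            w -= (np.dot(w, b) / np.dot(b, b)) * b   # subtract proj_b(w)
        if not np.allclose(w, 0):
            basis.append(w)
    return basis

# Convert an arbitrary basis of R^3 into an orthogonal one.
B = gram_schmidt([[1, 1, 1], [1, 0, 1], [0, 1, 2]])
for b in B:
    print(b)
# Pairwise dot products of the output are (numerically) zero.
print([float(np.dot(B[i], B[j])) for i in range(len(B)) for j in range(i + 1, len(B))])
```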

Subsection 5.2.6 Orthogonal complement

Definition 5.2.9. Orthogonal complement.

Let \((V,\langle \ , \rangle)\) be an inner product vector space, and let \(W\subseteq V\) be a finite-dimensional subspace. The orthogonal complement of \(W\), denoted \(W^\perp\text{,}\) is defined as

\begin{equation*} W^\perp=\{\boldv\in V\colon \langle \boldv, \boldw\rangle=0 \text{ for all } \boldw\in W\}\text{.} \end{equation*}

In other words \(W^\perp\) is the set of vectors that are orthogonal to all elements of \(W\text{.}\)

Subsection 5.2.6.1 Example

Let \(V=\R^3\) equipped with the dot product, and let \(W=\Span\{(1,1,1)\}\subset \R^3\text{.}\) This is the line defined by the vector \((1,1,1)\text{.}\) Then \(W^\perp\) is the set of vectors orthogonal to \((1,1,1)\text{:}\) i.e., the plane perpendicular to \((1,1,1)\text{.}\)
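Concretely, \(W^\perp\) is the plane with equation \(x+y+z=0\text{.}\) As a quick check (a minimal sketch assuming NumPy), two independent solutions of this equation are orthogonal to \((1,1,1)\) and span a 2-dimensional space:

```python
import numpy as np

w = np.array([1, 1, 1])

# Two independent solutions of x + y + z = 0; both are orthogonal to (1,1,1).
u1 = np.array([1, -1, 0])
u2 = np.array([1, 0, -1])
print(w @ u1, w @ u2)                               # 0 0
print(np.linalg.matrix_rank(np.vstack([u1, u2])))   # 2 = dim of the plane W^perp
```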

Subsection 5.2.7 Geometry of fundamental spaces

The notion of orthogonal complement gives us a new way of understanding the relationship between the various fundamental spaces of a matrix: for any \(m\times n\) matrix \(A\) we have (i) \(\NS(A)=\RS(A)^\perp\text{,}\) and (ii) \(\NS(A^T)=\CS(A)^\perp\text{.}\)

(i) Using the dot product method of matrix multiplication, we see that a vector \(\boldv\in\NS(A)\) if and only if \(\boldv\cdot\boldr_i=0\) for each row \(\boldr_i\) of \(A\text{.}\) Since the \(\boldr_i\) span \(\RS(A)\text{,}\) the linear properties of the dot product imply that \(\boldv\cdot\boldr_i=0\) for each row \(\boldr_i\) of \(A\) if and only if \(\boldv\cdot\boldw=0\) for all \(\boldw\in\RS(A)\) if and only if \(\boldv\in \RS(A)^\perp\text{.}\)

(ii) This follows from (i) and the fact that \(\CS(A)=\RS(A^T)\text{.}\)

Subsection 5.2.8 Example

Understanding the orthogonal relationship between \(\NS(A)\) and \(\RS(A)\) allows us in many cases to quickly determine or visualize one space from the other. Consider the example \(A=\begin{bmatrix}1\amp 1\amp 1\\ 1\amp 1\amp -1 \end{bmatrix}\text{.}\)

Looking at the columns, we see easily that \(\rank(A)=2\text{,}\) which implies that \(\nullity(A)=3-2=1\text{.}\) Since \((1,-1,0)\) is an element of \(\NS(A)\) and \(\dim(\NS(A))=1\text{,}\) we must have \(\NS(A)=\Span\{(1,-1,0)\}\text{,}\) a line.

By orthogonality, we conclude that

\begin{equation*} \RS(A)=\NS(A)^\perp=\text{ (plane perpendicular to \((1,-1,0)\)) }\text{.} \end{equation*}
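A quick numerical check of this example (a minimal sketch assuming NumPy): \((1,-1,0)\) is orthogonal to each row of \(A\text{,}\) and \(A\) has rank \(2\text{,}\) so \(\RS(A)\) is indeed the full plane perpendicular to \((1,-1,0)\text{:}\)

```python
import numpy as np

A = np.array([[1, 1,  1],
              [1, 1, -1]], dtype=float)
n = np.array([1, -1, 0], dtype=float)

print(A @ n)                       # [0. 0.]: n lies in NS(A), i.e. n is orthogonal to each row
print(np.linalg.matrix_rank(A))    # 2, so RS(A) is all of the plane n^perp
```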

Subsection 5.2.9 Orthogonal Projection

Subsection 5.2.10 Proof of orthogonal projection theorem

Recall the statement of the theorem: given \(\boldv\in V\text{,}\) there exist unique vectors \(\boldw\in W\) and \(\boldw^\perp\in W^\perp\) such that \(\boldv=\boldw+\boldw^\perp\text{,}\) and furthermore \(\boldw\) is the element of \(W\) closest to \(\boldv\text{.}\) To prove this, pick an orthogonal basis \(B=\{\boldv_1,\boldv_2,\dots, \boldv_r\}\) of \(W\) and set \(\boldw=\sum_{i=1}^r\frac{\angvec{\boldv,\boldv_i}}{\angvec{\boldv_i, \boldv_i}}\boldv_i\text{.}\) This is clearly an element of \(W\text{.}\) Next we set \(\boldw^\perp=\boldv-\boldw=\boldv-\sum_{i=1}^r\frac{\angvec{\boldv,\boldv_i}}{\angvec{\boldv_i, \boldv_i}}\boldv_i\text{.}\)

To complete the proof, we must show the following: (A) \(\boldw^\perp\in W^\perp\text{,}\) (B) this choice of \(\boldw\) and \(\boldw^\perp\) is unique, and (C) \(\boldw\) is the closest element of \(W\) to \(\boldv\text{.}\)

Subsection 5.2.10.1 (A)

For all \(i\) we have

\begin{align*} \langle\boldw^\perp,\boldv_i\rangle\amp =\amp \langle \boldv-\sum_{j=1}^r\frac{\angvec{\boldv,\boldv_j}}{\angvec{\boldv_j, \boldv_j}}\boldv_j, \boldv_i\rangle\\ \amp =\amp \langle \boldv, \boldv_i\rangle-\langle \sum_{j=1}^r\frac{\angvec{\boldv,\boldv_j}}{\angvec{\boldv_j, \boldv_j}}\boldv_j ,\boldv_i\rangle \hspace{9pt} \text{ (distr.) }\\ \amp =\amp \langle \boldv, \boldv_i\rangle-\frac{\angvec{\boldv,\boldv_i}}{\angvec{\boldv_i,\boldv_i}}\langle\boldv_i,\boldv_i\rangle \hspace{9pt} \text{ (by orthogonality) }\\ \amp =\amp 0 \end{align*}

Subsection 5.2.10.2 (B)+(C)

Recall: \(\boldw\) satisfies \(\boldv=\boldw+\boldw^\perp\text{,}\) where \(\boldw^\perp\in W^\perp\text{.}\) Now take any other \(\boldw'\in W\text{.}\) Then

\begin{align*} \norm{\boldv-\boldw'}^2\amp =\amp \norm{\boldw^\perp+(\boldw-\boldw')}^2 =\norm{\boldw^\perp}^2+\norm{\boldw-\boldw'}^2 \hspace{9pt} \text{ (Pythag. theorem) }\\ \amp \geq\amp \norm{\boldw^\perp}^2=\norm{\boldv-\boldw}^2\text{.} \end{align*}

Taking square roots now proves the desired inequality. Furthermore, equality holds if and only if the last inequality is an equality, if and only if \(\norm{\boldw-\boldw'}=0\text{,}\) if and only if \(\boldw=\boldw'\text{.}\) This proves our choice of \(\boldw\) is the unique element of \(W\) minimizing the distance to \(\boldv\text{!}\)

Next we show that \((W^\perp)^\perp=W\text{.}\) Clearly \(W\subseteq (W^\perp)^\perp\text{.}\) For the other direction, take \(\boldv\in (W^\perp)^\perp\text{.}\) Using the orthogonal projection theorem, we can write \(\boldv=\boldw+\boldw^\perp\) with \(\boldw\in W\) and \(\boldw^\perp\in W^\perp\text{.}\) We will show \(\boldw^\perp=\boldzero\text{.}\)

Since \(\boldv\in (W^\perp)^\perp\) we have \(\angvec{\boldv,\boldw^\perp}=0\text{.}\) Then we have

\begin{align*} 0\amp =\angvec{\boldv,\boldw^\perp}\\ \amp =\angvec{\boldw+\boldw^\perp,\boldw^\perp}\\ \amp =\angvec{\boldw,\boldw^\perp}+\angvec{\boldw^\perp,\boldw^\perp} \amp \text{ (since \(W\perp W^\perp\)) }\\ \amp =0+\angvec{\boldw^\perp,\boldw^\perp} \end{align*}

Thus \(\angvec{\boldw^\perp,\boldw^\perp}=0\text{.}\) It follows that \(\boldw^\perp=\boldzero\text{,}\) and hence \(\boldv=\boldw+\boldzero=\boldw\in W\text{.}\)

To see that orthogonal projection onto \(W\) defines a linear transformation \(T(\boldv)=\proj{\boldv}{W}\text{,}\) we must show that \(T(c\boldv_1+d\boldv_2)=cT(\boldv_1)+dT(\boldv_2)\) for all \(c,d\in\R\) and \(\boldv_1,\boldv_2\in V\text{.}\) This is easily shown by picking an orthonormal basis \(B=\{\boldu_1,\boldu_2, \dots, \boldu_r\}\) of \(W\) and using the formula from the orthogonal projection theorem.
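The formula from the proof is easy to turn into a short computation. Here is a minimal sketch for the dot product on \(\R^n\) (assuming NumPy is available; the function name proj is ours), computing \(\proj{\boldv}{W}\) from an orthogonal basis of \(W\text{:}\)

```python
import numpy as np

def proj(v, orth_basis):
    """Orthogonal projection of v onto W = Span(orth_basis).

    `orth_basis` must be an orthogonal basis of W with respect to the dot
    product; implements w = Σ (<v, v_i> / <v_i, v_i>) v_i.
    """
    v = np.asarray(v, dtype=float)
    w = np.zeros_like(v)
    for b in orth_basis:
        b = np.asarray(b, dtype=float)
        w += (np.dot(v, b) / np.dot(b, b)) * b
    return w

# Project (1, 2, 3) onto the xy-plane, spanned by the orthogonal pair below.
v = [1, 2, 3]
W = [[1, 1, 0], [1, -1, 0]]
w = proj(v, W)
print(w, np.asarray(v, dtype=float) - w)   # [1. 2. 0.]  and  [0. 0. 3.] in W^perp
```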

Subsection 5.2.11 Projection onto lines and planes in $\R^3$

Let's revisit orthogonal projection onto lines and planes in \(\R^3\) passing through the origin. Here the relevant inner product is dot product.

Subsection 5.2.12 Projection onto a line $\ell$

Any line in \(\R^3\) passing through the origin can be described as \(\ell=\Span\{\boldv_0\}\text{,}\) for some \(\boldv_0=(a,b,c)\ne 0\text{.}\) Since \(\{\boldv_0\}\) is an orthogonal basis of \(\ell\text{,}\) by the orthogonal projection theorem we have, for any \(\boldv=(x,y,z)\text{,}\)

\begin{equation*} \proj{\boldv}{\ell}=\frac{\boldv\cdot \boldv_0}{\boldv_0\cdot\boldv_0}\boldv_0=\frac{ax+by+cz}{a^2+b^2+c^2}(a,b,c)=\frac{1}{a^2+b^2+c^2}\begin{bmatrix}a^2\amp ab\amp ac\\ ab\amp b^2\amp bc\\ ac\amp bc\amp c^2 \end{bmatrix} \begin{bmatrix}x\\ y\\ z \end{bmatrix}\text{.} \end{equation*}

We have re-derived the matrix formula for orthogonal projection onto \(\ell\text{.}\)
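As a quick sanity check of this matrix formula (a minimal sketch assuming NumPy), the matrix \(\frac{1}{a^2+b^2+c^2}\boldv_0\boldv_0^T\) applied to a vector agrees with the projection formula, and it is idempotent, as any projection matrix should be:

```python
import numpy as np

a, b, c = 1.0, 2.0, 2.0                        # direction vector of the line ℓ
v0 = np.array([a, b, c])
P = np.outer(v0, v0) / (v0 @ v0)               # (1/(a²+b²+c²)) [[a², ab, ac], [ab, b², bc], [ac, bc, c²]]

v = np.array([3.0, 0.0, 3.0])
print(P @ v)                                   # matrix formula for proj_ℓ(v)
print(((v @ v0) / (v0 @ v0)) * v0)             # same vector from the projection formula
print(np.allclose(P @ P, P))                   # True: P is idempotent
```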


Subsection 5.2.14 Projection onto a plane

Any plane in \(\R^3\) passing through the origin can be described with the equation \(\mathcal{P}\colon ax+by+cz=0\) for some \(\boldn=(a,b,c)\ne 0\text{.}\) This says precisely that \(\mathcal{P}\) is the orthogonal complement of the line \(\ell=\Span\{(a,b,c)\}\text{:}\) i.e., \(\mathcal{P}=\ell^\perp\text{.}\)

From the orthogonal projection theorem, we know that

\begin{equation*} \boldv=\proj{\boldv}{\ell}+\proj{\boldv}{\ell^\perp}=\proj{\boldv}{\ell}+\proj{\boldv}{\mathcal{P}}\text{.} \end{equation*}

But then

\begin{equation*} \proj{\boldv}{\mathcal{P}}=\boldv-\proj{\boldv}{\ell}=I\boldv-A\boldv=(I-A)\boldv\text{,} \end{equation*}

where \(A\) is the matrix for \(\proj{\boldv}{\ell}\) from the previous example. We conclude that the matrix defining \(\proj{\boldv}{\mathcal{P}}\) is

\begin{equation*} I-\frac{1}{a^2+b^2+c^2}\begin{bmatrix}a^2\amp ab\amp ac\\ ab\amp b^2\amp bc\\ ac\amp bc\amp c^2 \end{bmatrix} = \frac{1}{a^2+b^2+c^2}\begin{bmatrix}b^2+c^2\amp -ab\amp -ac\\ -ab\amp a^2+c^2\amp -bc\\ -ac\amp -bc\amp a^2+b^2 \end{bmatrix} \end{equation*}

We can express this in terms of matrix multiplication as

\begin{equation*} \proj{\boldv}{\mathcal{P}}=\frac{1}{a^2+b^2+c^2}\begin{bmatrix}b^2+c^2\amp -ab\amp -ac\\ -ab\amp a^2+c^2\amp -bc\\ -ac\amp -bc\amp a^2+b^2 \end{bmatrix} \begin{bmatrix}x\\ y\\ z \end{bmatrix}\text{.} \end{equation*}

For a line or plane passing through a point \(Q=(q_1,q_2,q_3)\) other than the origin, we can still compute orthogonal projections as follows (see the sketch after this list):

  1. Translate the whole picture by \(-Q=(-q_1,-q_2, -q_3)\text{,}\) which means we replace \(P=(x,y,z)\) with \(P-Q=(x-q_1,y-q_2,z-q_3)\text{.}\)

  2. Apply our formulas from before, replacing \((x,y,z)\) with \((x-q_1,y-q_2,z-q_3)\text{.}\)

  3. Translate back by adding \(Q\) to your answer.
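Here is a minimal sketch of this translate, project, translate-back recipe for a plane, assuming NumPy is available (the function name proj_plane_through_Q is ours):

```python
import numpy as np

def proj_plane_through_Q(p, n, Q):
    """Project the point p onto the plane through Q with normal vector n.

    Translate by -Q, apply the origin-based formula (I - A) from above,
    then translate back by adding Q.
    """
    p, n, Q = (np.asarray(u, dtype=float) for u in (p, n, Q))
    A = np.outer(n, n) / (n @ n)        # projection onto the line Span{n}
    M = np.eye(3) - A                   # projection onto the plane through 0 with normal n
    return Q + M @ (p - Q)

# Example: the plane through Q = (1, 0, 0) with normal (0, 0, 1) is z = 0.
print(proj_plane_through_Q([3, 3, 3], [0, 0, 1], [1, 0, 0]))   # [3. 3. 0.]
```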

Subsection 5.2.15 Example: sine/cosine series

Let \(V=C[0,2\pi]\) with inner product \(\langle f, g\rangle=\int_0^{2\pi}f(x)g(x) \, dx\text{.}\)

We have seen that the set

\begin{equation*} B=\{1, \cos(x),\sin(x),\cos(2x),\sin(2x), \dots , \cos(nx),\sin(nx)\} \end{equation*}

is orthogonal. Thus \(B\) is an orthogonal basis of \(W=\Span(B)\text{,}\) which we might describe as the space of trigonometric polynomials of degree at most \(n\).

Given an arbitrary function \(f(x)\in C[0,2\pi]\text{,}\) its orthogonal projection onto \(W\) is the function

\begin{equation*} \hat{f}(x)=a_0+a_1\cos(x)+b_1\sin(x)+a_2\cos(2x)+b_2\sin(2x)+\cdots +a_n\cos(nx)+b_n\sin(nx)\text{,} \end{equation*}

where

\begin{equation*} a_0=\frac{1}{2\pi}\int_0^{2\pi} f(x) \ dx, \ a_j=\frac{1}{\pi}\int_0^{2\pi}f(x)\cos(jx)\, dx, \ b_k=\frac{1}{\pi}\int_0^{2\pi}f(x)\sin(kx)\, dx\text{.} \end{equation*}

The projection theorem tells us that \(\hat{f}\) is the “best” trigonometric polynomial approximation of \(f(x)\) (of degree at most \(n\)), in the sense that for any other \(g\in W\text{,}\) \(\left\vert\left\vert f-\hat{f}\right\vert\right\vert\leq \norm{f-g}\text{.}\)

This means in turn

\begin{equation*} \int_0^{2\pi} (f-\hat{f})^2\, dx\leq \int_0^{2\pi} (f-g)^2 \, dx\text{.} \end{equation*}
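Here is a minimal sketch (assuming NumPy is available) that approximates the coefficients \(a_0\text{,}\) \(a_j\text{,}\) \(b_j\) above by Riemann sums, for the sample function \(f(x)=x\) on \([0,2\pi]\text{:}\)

```python
import numpy as np

# Riemann-sum approximations of the coefficients of the trigonometric
# approximation of f(x) = x on [0, 2π].
x = np.linspace(0, 2 * np.pi, 200_000, endpoint=False)
dx = x[1] - x[0]
f = x

a0 = np.sum(f) * dx / (2 * np.pi)
print(a0)                                    # ≈ π
for j in range(1, 4):
    aj = np.sum(f * np.cos(j * x)) * dx / np.pi
    bj = np.sum(f * np.sin(j * x)) * dx / np.pi
    print(j, aj, bj)                         # a_j ≈ 0,  b_j ≈ -2/j
```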

Subsection 5.2.16 Example: least-squares solution to $A\boldx=\boldy$

Often in applications we have an \(m\times n\) matrix \(A\) and vector \(\boldy\in\R^m\) for which the matrix equation

\begin{equation*} A\boldx=\boldy \end{equation*}

has no solution. In terms of fundamental spaces, this means simply that \(\boldy\notin \CS(A)\text{.}\) Set \(W=\CS(A)\text{.}\)

In such situations we speak of a least-squares solution to the matrix equation. This is a vector \(\hat{\boldx}\) such that \(A\hat{\boldx}=\hat{\boldy}\text{,}\) where \(\hat{\boldy}=\proj{\boldy}{W}\text{.}\) Here the inner product is taken to be the dot product.

Note: the equation \(A\hat{\boldx}=\hat{\boldy}\) is guaranteed to have a solution since \(\hat{\boldy}=\proj{\boldy}{W}\) lies in \(\CS(A)\text{.}\)

The vector \(\hat{\boldx}\) is called a least-squares solution because its image \(\hat{\boldy}\) is the element of \(\CS(A)\) that is “closest” to \(\boldy\) in terms of the dot product. Writing \(\boldy=(y_1,y_2,\dots,y_m)\) and \(\hat{\boldy}=(y_1',y_2',\dots, y_m')\text{,}\) this means that \(\hat{\boldy}\) minimizes the distance

\begin{equation*} \norm{\boldy-\hat{\boldy}}=\sqrt{(y_1-y_1')^2+(y_2-y_2')^2+\cdots +(y_m-y_m')^2}\text{.} \end{equation*}

Subsection 5.2.17 Least-squares example (curve fitting)

Suppose we wish to find an equation of a line \(y=mx+b\) that best fits (in the least-squares sense) the following \((x,y)\) data points: \(P_1=(-3,1), P_2=(1,2), P_3=(2,3)\text{.}\)

Then we seek \(m\) and \(b\) such that

\begin{align*} 1\amp =m(-3)+b\\ 2\amp =m(1)+b\\ 3\amp =m(2)+b\text{,} \end{align*}

or equivalently, we wish to solve \(\begin{bmatrix}-3\amp 1\\ 1\amp 1\\ 2\amp 1 \end{bmatrix} \begin{bmatrix}m \\ b \end{bmatrix} =\begin{bmatrix}1\\ 2\\ 3 \end{bmatrix}\text{.}\)

This equation has no solution as \(\boldy=(1,2,3)\) does not lie in \(W=\CS(A)=\Span\{(-3,1,2),(1,1,1)\}\text{.}\) So instead we compute \(\hat{\boldy}=\proj{\boldy}{W}=(13/14,33/14,38/14)\text{.}\) (This was not hard to compute, as conveniently the given basis of \(W\) was already orthogonal!)

Finally we solve \(A\begin{bmatrix}m\\ b \end{bmatrix} =\hat{\boldy}\text{,}\) getting \(m=5/14\text{,}\) \(b=28/14=2\text{.}\) Thus \(y=\frac{5}{14}x+2\) is the line best fitting the data in the least-squares sense.

Subsection 5.2.18 Least-squares example contd.

In what sense does \(y=\frac{5}{14}x+2\) “best” fit the data?

Let \(\boldy=(1,2,3)=(y_1,y_2,y_3)\) be the given \(y\)-values of the points, and \(\hat{\boldy}=(y_1',y_2',y_3')\) be the projection we computed before. In the graph the values \(\epsilon_i\) denote the vertical difference \(\epsilon_i=y_i-y_i'\) between the data points and our fitted line.

The projection \(\hat{\boldy}\) makes the error \(\norm{\boldy-\hat{\boldy}}=\sqrt{ \epsilon_1^2+\epsilon_2^2+\epsilon_3^2}\) as small as possible.

This means if I draw any other line and compute the corresponding differences \(\epsilon_i'\) at the \(x\)-values -3, 1 and 2, then we have

\begin{equation*} \epsilon_1^2+\epsilon_2^2+\epsilon_3^2\leq (\epsilon_1')^2+(\epsilon_2')^2+(\epsilon_3')^2 \end{equation*}

Subsection 5.2.19 Finding least squares solutions

As the last example illustrated, one method of finding a least-squares solution \(\boldx\) to \(A\boldx=\boldy\) is to first produce an orthogonal basis for \(\CS(A)\text{,}\) then compute \(\hat{\boldy}=\proj{\boldy}{\CS(A)}\text{,}\) and then use Gaussian elimination (GE) to solve \(A\boldx=\hat{\boldy}\text{.}\)

Alternatively, it turns out (through a little trickery) that \(\hat{\boldy}=A\boldx\text{,}\) where \(\boldx\) is a solution to the equation

\begin{equation*} A^TA\boldx=A^T\boldy\text{.} \end{equation*}

This saves us the hassle of computing an orthogonal basis for \(\CS(A)\text{;}\) to find a least-squares solution \(\boldx\) for \(A\boldx=\boldy\text{,}\) we simply use GE to solve the equation above. (Some more trickery shows a solution is guaranteed to exist!)

Subsection 5.2.19.1 Example

In the previous example we were seeking a least-squares solution \(\boldx=\colvec{m\\ b}\) to \(A\boldx=\boldy\text{,}\) where \(A=\begin{bmatrix}-3\amp 1\\ 1\amp 1\\ 2\amp 1 \end{bmatrix} , \boldy=\colvec{1\\2\\3}\text{.}\)

The equation \(A^TA\boldx=A^T\boldy\) is thus

\begin{equation*} \begin{bmatrix}14\amp 0\\ 0\amp 3 \end{bmatrix} \boldx= \colvec{5\\ 6} \end{equation*}

As you can see, \(\boldx=\colvec{m\\ b}=\colvec{5/14\\ 2}\) is a least-squares solution, just as before.
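The normal-equation computation above is easy to carry out numerically; a minimal sketch assuming NumPy:

```python
import numpy as np

A = np.array([[-3.0, 1.0],
              [ 1.0, 1.0],
              [ 2.0, 1.0]])
y = np.array([1.0, 2.0, 3.0])

# Solve the normal equations A^T A x = A^T y.
x_hat = np.linalg.solve(A.T @ A, A.T @ y)
print(x_hat)                                  # [5/14, 2] ≈ [0.3571, 2.]

# NumPy's built-in least-squares routine returns the same answer.
print(np.linalg.lstsq(A, y, rcond=None)[0])
```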

Exercises 5.2.20 Exercises

1.

The vectors

\begin{equation*} \boldv_1=(1,1,1,1), \boldv_2=(1,-1,1,-1), \boldv_3=(1,1,-1,-1), \boldv_4=(1,-1,-1,1) \end{equation*}

are pairwise orthogonal with respect to the dot product, as is easily verified. For each \(\boldv\) below, find the scalars \(c_i\) such that

\begin{equation*} \boldv=c_1\boldv_1+c_2\boldv_2+c_3\boldv_3+c_4\boldv_4\text{.} \end{equation*}
  1. \(\displaystyle \boldv=(3,0,-1,0)\)

  2. \(\displaystyle \boldv=(1,2,0,1)\)

  3. \(\boldv=(a,b,c,d)\) (Your answer will be expressed in terms of \(a,b,c\text{,}\) and \(d\text{.}\) )

2.

Consider the inner product space given by \(V=\R^3\) together with the dot product. Let \(W\) be the plane with defining equation \(x+2y-z=0\text{.}\) Compute an orthogonal basis of \(W\text{,}\) and then extend this to an orthogonal basis of \(\R^3\text{.}\)

3.

Consider the vector space \(V=C([0,1])\) with the integral inner product. Apply Gram-Schmidt to the basis \(B=\{1,2^x, 3^x\}\) of \(W=\Span(B)\) to obtain an orthogonal basis of \(W\text{.}\)

Solution.

The resulting orthogonal basis is \(B'=\{f_1, f_2,f_3\}\text{,}\) where

\begin{align*} f_1\amp =1\\ f_2\amp =2^x-(\angvec{2^x,1}/\angvec{1,1})1\\ \amp =2^x-(\int_{0}^12^x \ dx)/(\int_0^1 1 \ dx)=2^x-\frac{1}{\ln 2}\\ f_3\amp =3^x-(\angvec{3^x,2^x-\frac{1}{\ln 2}}/\angvec{2^x-\frac{1}{\ln 2}, 2^x-\frac{1}{\ln 2}})(2^x-\frac{1}{\ln 2})-(\angvec{3^x,1}/\angvec{1,1})1\\ \amp =3^x-\frac{\frac{5}{\ln 6}-\frac{2}{\ln 2\ln 3}}{\frac{3}{\ln 4}-\frac{1}{(\ln 2)^2}}\left(2^x-\frac{1}{\ln 2}\right)-\frac{2}{\ln 3} \end{align*}

OK, I admit, I used technology to compute those integrals.

4.

Consider the vector space \(V=P_2\) with the evaluation at \(-1, 0, 1\) inner product:

\begin{equation*} \angvec{p(x),q(x)}=p(-1)q(-1)+p(0)q(0)+p(1)q(1)\text{.} \end{equation*}

Apply Gram-Schmidt to the standard basis of \(P_2\) to obtain an orthogonal basis of \(P_2\text{.}\)

5.

Let \(V=M_{22}\) with inner product \(\angvec{A,B}=\tr(A^TB)\text{,}\) and let \(W\subseteq V\) be the subspace of matrices whose trace is 0.

  1. Compute an orthogonal basis for \(W\text{.}\) You can do this either by inspection (the space is manageable), or by starting with a simple basis of \(W\) and applying the Gram-Schmidt procedure.

  2. Compute \(\proj{A}{W}\text{,}\) where

    \begin{equation*} A=\begin{bmatrix}1\amp 2\\ 1\amp 1 \end{bmatrix}\text{.} \end{equation*}

6.

Let \(V=C([0,1])\) with the integral inner product, and let \(f(x)=x\text{.}\) Find the function of the form \(g(x)=a+b\cos(2\pi x)+c\sin(2\pi x)\) that “best approximates” \(f(x)\) in terms of this inner product: i.e., find the \(g(x)\) of this form that minimizes \(d(g,f)\text{.}\)

Hint.

The set \(S=\{1, \cos(2\pi x), \sin(2\pi x)\}\) is orthogonal with respect to the given inner product.

7.

Let \((V,\langle , \rangle )\) be an inner product space. Prove: if \(\angvec{\boldv,\boldw}=0\text{,}\) then

\begin{equation*} \norm{\boldv+\boldw}^2=\norm{\boldv}^2+\norm{\boldw}^2\text{.} \end{equation*}

This result can be thought of as the Pythagorean theorem for general inner product spaces.

8.

Let \((V, \langle , \rangle )\) be an inner product space, let \(S=\{\boldw_1, \boldw_2, \dots, \boldw_r\}\subseteq V\text{,}\) and let \(W=\Span S\text{.}\) Prove:

\begin{equation*} \boldv\in W^\perp \text{ if and only if } \langle \boldv,\boldw_i \rangle=0 \text{ for all } 1\leq i\leq r\text{.} \end{equation*}

In other words, to check whether an element is in \(W^\perp\text{,}\) it suffices to check that it is orthogonal to each element of its spanning set \(S\text{.}\)

9.

Let \((V, \langle , \rangle )\) be an inner product space, and suppose \(B=\{\boldv_1, \boldv_2, \dots, \boldv_n\}\) is an orthonormal basis of \(V\text{.}\) Suppose \(\boldv, \boldw\in V\) satisfy

\begin{equation*} \boldv=\sum_{i=1}^nc_i\boldv_i, \boldw=\sum_{i=1}^nd_i\boldv_i\text{.} \end{equation*}
  1. Prove:

    \begin{equation*} \langle \boldv, \boldw\rangle =\sum_{i=1}^nc_id_i\text{.} \end{equation*}
  2. Prove:

    \begin{equation*} \norm{\boldv}=\sqrt{\sum_{i=1}^nc_i^2}\text{.} \end{equation*}

12.

Let \(V\) be an inner product space, and let \(W\subseteq V\) be a finite-dimensional subspace. Recall that \(\proj{\boldv}{W}\) is defined as the unique \(\boldw\in W\) satisfying \(\boldv=\boldw+\boldw^\perp\text{,}\) where \(\boldw^\perp\in W^\perp\text{.}\) Use this definition (including the uniqueness claim) to prove the following statements.

  1. If \(\boldv\in W\text{,}\) then \(\proj{\boldv}{W}=\boldv\text{.}\)

  2. We have \(\boldv\in W^\perp\) if and only if \(\proj{\boldv}{W}=\boldzero\text{.}\)

13. Dimension of \(W^\perp\).

Let \((V, \ \angvec{\ , \ })\) be an inner product space of dimension \(n\text{,}\) and suppose \(W\subseteq V\) is a subspace of dimension \(r\text{.}\) Prove: \(\dim W^\perp=n-r\text{.}\)

Hint.

Begin by picking an orthogonal basis \(B=\{\boldv_1,\dots ,\boldv_r\}\) of \(W\) and extend to an orthogonal basis \(B'=\{\boldv_1,\boldv_2, \dots, \boldv_r, \boldu_1,\dots , \boldu_{n-r}\}\) of all of \(V\text{.}\) Show the \(\boldu_i\) form a basis for \(W^\perp\text{.}\)

14.

We consider the problem of fitting a collection of data points \((x,y)\) with a quadratic curve of the form \(y=f(x)=ax^2+bx+c\text{.}\) Thus we are given some collection of points \((x,y)\text{,}\) and we seek parameters \(a, b, c\) for which the graph of \(f(x)=ax^2+bx+c\) “best fits” the points in some way.

  1. Show, using linear algebra, that if we are given any three points \((x,y)=(r_1,s_1), (r_2,s_2), (r_3,s_3)\text{,}\) where the \(x\)-coordinates \(r_i\) are all distinct, then there is a unique choice of \(a,b,c\) such that the corresponding quadratic function agrees precisely with the data. In other words, given just about any three points in the plane, there is a unique quadratic curve connecting them.

  2. Now suppose we are given the four data points

    \begin{equation*} P_1=(0,2), P_2=(1,0), P_3=(2,2), P_4=(3,6)\text{.} \end{equation*}
    1. Use the least-squares method described in the lecture notes to come up with a quadratic function \(y=f(x)\) that “best fits” the data.

    2. Graph the function \(f\) you found, along with the points \(P_i\text{.}\) (You may want to use technology.) Use your graph to explain precisely in what sense \(f\) “best fits” the data.