
Section 5.2 Orthogonal bases and orthogonal projection

Subsection 5.2.1 Orthogonal sets

Definition 5.2.1. Orthogonal.

Let \((V,\langle \ , \rangle)\) be an inner product space. Vectors \(\boldv, \boldw\in V\) are orthogonal if \(\langle \boldv, \boldw\rangle =0\text{.}\)

Let \(S\subseteq V\) be a subset of nonzero vectors.

  • The set \(S\) is orthogonal if \(\langle\boldv,\boldw \rangle=0\) for all \(\boldv\ne\boldw\in S\text{.}\) We say that the elements of \(S\) are pairwise orthogonal in this case.

  • The set \(S\) is orthonormal if it is both orthogonal and satisfies \(\norm{\boldv}=1\) for all \(\boldv\in S\text{:}\) i.e., \(S\) consists of pairwise orthogonal unit vectors.

An orthogonal set \(S=\{\boldv_1,\boldv_2,\dots,\boldv_r\}\) of nonzero vectors is automatically linearly independent: for each \(i\) we have

\begin{align*} a_1\boldv_1 +a_2\boldv_2+\cdots +a_r\boldv_r=\boldzero\amp \Rightarrow\amp \langle a_1\boldv_1 +a_2\boldv_2 +\cdots +a_r\boldv_r,\boldv_i\rangle=\langle\boldzero,\boldv_i\rangle\\ \amp \Rightarrow\amp a_1\langle\boldv_1,\boldv_i\rangle +a_2\langle \boldv_2,\boldv_i\rangle +\cdots +a_r\langle\boldv_r,\boldv_i\rangle=0\\ \amp \Rightarrow\amp a_i\langle \boldv_i,\boldv_i\rangle=0 \ \text{ (since \(\langle\boldv_j,\boldv_i\rangle= 0\) for \(j\ne i\)) }\\ \amp \Rightarrow\amp a_i=0 \text{ (since \(\langle\boldv_i,\boldv_i\rangle\ne 0\)) } \end{align*}

We have shown that if \(a_1\boldv_1+a_2\boldv_2+\cdots +a_r\boldv_r=\boldzero\text{,}\) then \(a_i=0\) for all \(i\text{,}\) proving that \(S\) is linearly independent.
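As a quick numerical illustration (a minimal sketch assuming NumPy is available, using the dot product on \(\R^4\)), the Gram matrix of an orthogonal set of nonzero vectors is diagonal with nonzero diagonal entries, so the set has full rank:

```python
import numpy as np

# Three of the pairwise-orthogonal vectors from Exercise 1 below.
S = np.array([
    [1,  1,  1,  1],
    [1, -1,  1, -1],
    [1,  1, -1, -1],
], dtype=float)

# Gram matrix G[i, j] = <v_i, v_j>: orthogonality makes it diagonal,
# and nonzero vectors make the diagonal entries nonzero.
G = S @ S.T
print(G)                          # diag(4, 4, 4)
print(np.linalg.matrix_rank(S))   # 3 = number of vectors, so S is linearly independent
```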

Subsection 5.2.2 Example

Let \(V=C([0,2\pi])\) with standard inner product \(\langle f, g\rangle=\int_0^{2\pi} f(x)g(x) \, dx\text{.}\)

Let

\begin{equation*} S=\{\cos(x),\sin(x),\cos(2x),\sin(2x), \dots\}=\{\cos(nx)\colon n\in\Z_{>0}\}\cup\{\sin(mx)\colon m\in\Z_{>0}\}\text{.} \end{equation*}

Then \(S\) is orthogonal, hence linearly independent.

Using some trig identities, one can show the following:

\begin{align*} \langle \cos(nx),\cos(mx)\rangle=\int_0^{2\pi}\cos(nx)\cos(mx)\, dx\amp =\begin{cases} 0\amp \text{ if \(n\ne m\) }\\ \pi\amp \text{ if \(n=m\) } \end{cases}\\ \langle \sin(nx),\sin(mx)\rangle=\int_0^{2\pi}\sin(nx)\sin(mx)\, dx\amp =\begin{cases} 0\amp \text{ if \(n\ne m\) }\\ \pi\amp \text{ if \(n=m\) } \end{cases}\\ \langle \cos(nx),\sin(mx)\rangle=\int_0^{2\pi}\cos(nx)\sin(mx)\, dx\amp =0 \text{ for any \(n,m\) } \end{align*}
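These integrals are easy to check numerically. Here is a minimal sketch (assuming NumPy is available; the helper ip is just for illustration) approximating the inner products with Riemann sums:

```python
import numpy as np

# Approximate <f, g> = ∫_0^{2π} f(x) g(x) dx with a Riemann sum.
x = np.linspace(0, 2 * np.pi, 100_000, endpoint=False)
dx = x[1] - x[0]

def ip(f, g):
    return np.sum(f(x) * g(x)) * dx

print(ip(lambda t: np.cos(2 * t), lambda t: np.cos(3 * t)))  # ≈ 0   (n ≠ m)
print(ip(lambda t: np.cos(2 * t), lambda t: np.cos(2 * t)))  # ≈ π   (n = m)
print(ip(lambda t: np.cos(2 * t), lambda t: np.sin(5 * t)))  # ≈ 0   (cos vs. sin)
```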

Orthogonality holds more generally if we replace the interval \([0,2\pi]\) with any interval of length \(L\text{,}\) and replace \(S\) with

\begin{equation*} \left\{\cos\left(\frac{2\pi x}{L}\right), \sin\left(\frac{2\pi x}{L}\right), \cos\left(2\cdot\frac{2\pi x}{L}\right),\sin\left(2\cdot\frac{2\pi x}{L}\right),\dots\right\}\text{.} \end{equation*}

Subsection 5.2.3 Orthogonal bases

Definition 5.2.3. Orthogonal and orthonormal bases.

Let \((V,\langle \ , \rangle)\) be an inner product space. An orthogonal basis (resp., orthonormal basis) of \(V\) is a basis \(B\) that is orthogonal (resp., orthonormal) as a set.

The proof that every finite-dimensional inner product space has an orthogonal basis is actually a procedure, called the Gram-Schmidt procedure, for converting an arbitrary basis of an inner product space to an orthogonal basis.

Subsection 5.2.4 Example

Let \(V=\R^2\) with the standard inner product (aka the dot product).

(a) Verify that \(B'=\{\boldv_1=(\sqrt{3}/2,1/2), \boldv_2=(-1/2,\sqrt{3}/2)\}\) is an orthonormal basis.

(b) Compute \([\boldv]_{B'}\) for \(\boldv=(4,2)\text{.}\)

(c) Compute \(\underset{B\rightarrow B'}{P}\text{,}\) where \(B\) is the standard basis.

Solution.

(a) This is easily verified: \(\boldv_1\cdot\boldv_2=0\) and \(\norm{\boldv_1}=\norm{\boldv_2}=1\text{.}\)

(b) Since \(B'\) is orthonormal, \(\boldv=a_1\boldv_1+a_2\boldv_2\) where \(a_1=\boldv\cdot\boldv_1=2\sqrt{3}+1\) and \(a_2=\boldv\cdot\boldv_2=\sqrt{3}-2\text{.}\) Thus \([\boldv]_{B'}=\begin{bmatrix}2\sqrt{3}+1\\ \sqrt{3}-2 \end{bmatrix}\)

(c) As we have seen before, \(\underset{B'\rightarrow B}{P}=\begin{bmatrix}\sqrt{3}/2\amp -1/2\\1/2\amp \sqrt{3}/2 \end{bmatrix}\) (put elements of \(B'\) in as columns). Hence \(\underset{B\rightarrow B'}{P}=(\underset{B'\rightarrow B}{P})^{-1}=\begin{bmatrix}\sqrt{3}/2\amp 1/2\\-1/2\amp \sqrt{3}/2 \end{bmatrix}\text{.}\)
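The computations in this example are easy to confirm numerically; here is a minimal sketch assuming NumPy is available:

```python
import numpy as np

v1 = np.array([np.sqrt(3) / 2, 1 / 2])
v2 = np.array([-1 / 2, np.sqrt(3) / 2])
v = np.array([4.0, 2.0])

# (a) orthonormality: unit lengths and zero dot product
print(v1 @ v1, v2 @ v2, v1 @ v2)            # 1.0 1.0 0.0

# (b) coordinates of v with respect to B' via dot products
print(v @ v1, v @ v2)                       # 2*sqrt(3)+1 ≈ 4.464,  sqrt(3)-2 ≈ -0.268

# (c) change-of-basis matrices: columns of P are the vectors of B'
P = np.column_stack([v1, v2])               # this is P_{B' -> B}
print(np.allclose(np.linalg.inv(P), P.T))   # True: P is an orthogonal matrix
print(P.T @ v)                              # same coordinates as in (b)
```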

Definition 5.2.8. Orthogonal matrices.

An \(n\times n\) matrix \(A\) is orthogonal if it is invertible and \(A^{-1}=A^T\text{.}\) Equivalently, \(A\) is orthogonal if its columns (or rows) are orthonormal.

Subsection 5.2.5 Gram-Schmidt Process
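The procedure is easiest to see in code. Here is a minimal sketch for the dot product on \(\R^n\) (assuming NumPy is available; the function name gram_schmidt is ours, and in a general inner product space one would replace np.dot with the relevant inner product):

```python
import numpy as np

def gram_schmidt(vectors):
    """Return an orthogonal basis (dot product) for the span of `vectors`.

    Each incoming vector has its projections onto the previously accepted
    basis vectors subtracted off; (near-)zero remainders are discarded.
    """
    basis = []
    for v in vectors:
        w = np.array(v, dtype=float)
        for b in basis:
            w -= (np.dot(w, b) / np.dot(b, b)) * b   # subtract proj_b(w)
        if not np.allclose(w, 0):
            basis.append(w)
    return basis

# Convert an arbitrary basis of R^3 into an orthogonal one.
B = gram_schmidt([[1, 1, 1], [1, 0, 1], [0, 1, 2]])
for b in B:
    print(b)
# Pairwise dot products of the output are (numerically) zero.
print([float(np.dot(B[i], B[j])) for i in range(len(B)) for j in range(i + 1, len(B))])
```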

Subsection 5.2.6 Orthogonal complement

Definition 5.2.9. Orthogonal complement.

Let \((V,\langle \ , \rangle)\) be an inner product vector space, and let \(W\subseteq V\) be a finite-dimensional subspace. The orthogonal complement of \(W\), denoted \(W^\perp\text{,}\) is defined as

\begin{equation*} W^\perp=\{\boldv\in V\colon \langle \boldv, \boldw\rangle=0 \text{ for all } \boldw\in W\}\text{.} \end{equation*}

In other words \(W^\perp\) is the set of vectors that are orthogonal to all elements of \(W\text{.}\)

Subsection 5.2.6.1 Example

Let \(V=\R^3\) equipped with the dot product, and let \(W=\Span\{(1,1,1)\}\subset \R^3\text{.}\) This is the line defined by the vector \((1,1,1)\text{.}\) Then \(W^\perp\) is the set of vectors orthogonal to \((1,1,1)\text{:}\) i.e., the plane perpendicular to \((1,1,1)\text{.}\)
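Concretely, \(W^\perp\) is the plane with equation \(x+y+z=0\text{.}\) As a quick check (a minimal sketch assuming NumPy), two independent solutions of this equation are orthogonal to \((1,1,1)\) and span a 2-dimensional space:

```python
import numpy as np

w = np.array([1, 1, 1])

# Two independent solutions of x + y + z = 0; both are orthogonal to (1,1,1).
u1 = np.array([1, -1, 0])
u2 = np.array([1, 0, -1])
print(w @ u1, w @ u2)                               # 0 0
print(np.linalg.matrix_rank(np.vstack([u1, u2])))   # 2 = dim of the plane W^perp
```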

Subsection 5.2.7 Geometry of fundamental spaces

The notion of orthogonal complement gives us a new way of understanding the relationship between the various fundamental spaces of a matrix: for any \(m\times n\) matrix \(A\) we have (i) \(\NS(A)=\RS(A)^\perp\text{,}\) and (ii) \(\NS(A^T)=\CS(A)^\perp\text{.}\)

(i) Using the dot product method of matrix multiplication, we see that a vector \(\boldv\in\NS(A)\) if and only if \(\boldv\cdot\boldr_i=0\) for each row \(\boldr_i\) of \(A\text{.}\) Since the \(\boldr_i\) span \(\RS(A)\text{,}\) the linear properties of the dot product imply that \(\boldv\cdot\boldr_i=0\) for each row \(\boldr_i\) of \(A\) if and only if \(\boldv\cdot\boldw=0\) for all \(\boldw\in\RS(A)\) if and only if \(\boldv\in \RS(A)^\perp\text{.}\)

(ii) This follows from (i) and the fact that \(\CS(A)=\RS(A^T)\text{.}\)

Subsection 5.2.8 Example

Understanding the orthogonal relationship between \(\NS(A)\) and \(\RS(A)\) allows us in many cases to quickly determine or visualize one space from the other. Consider the example \(A=\begin{bmatrix}1\amp 1\amp 1\\ 1\amp 1\amp -1 \end{bmatrix}\text{.}\)

Looking at the columns, we see easily that \(\rank(A)=2\text{,}\) which implies that \(\nullity(A)=3-2=1\text{.}\) Since \((1,-1,0)\) is an element of \(\NS(A)\) and \(\dim(\NS(A))=1\text{,}\) we must have \(\NS(A)=\Span\{(1,-1,0)\}\text{,}\) a line.

By orthogonality, we conclude that

\begin{equation*} \RS(A)=\NS(A)^\perp=\text{ (plane perpendicular to \((1,-1,0)\)) }\text{.} \end{equation*}
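A quick numerical check of this example (a minimal sketch assuming NumPy): \((1,-1,0)\) is orthogonal to each row of \(A\text{,}\) and \(A\) has rank \(2\text{,}\) so \(\RS(A)\) is indeed the full plane perpendicular to \((1,-1,0)\text{:}\)

```python
import numpy as np

A = np.array([[1, 1,  1],
              [1, 1, -1]], dtype=float)
n = np.array([1, -1, 0], dtype=float)

print(A @ n)                       # [0. 0.]: n lies in NS(A), i.e. n is orthogonal to each row
print(np.linalg.matrix_rank(A))    # 2, so RS(A) is all of the plane n^perp
```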

Subsection 5.2.9 Orthogonal Projection

Subsection 5.2.10 Proof of orthogonal projection theorem

Recall the statement of the theorem: given \(\boldv\in V\text{,}\) there exist unique vectors \(\boldw\in W\) and \(\boldw^\perp\in W^\perp\) such that \(\boldv=\boldw+\boldw^\perp\text{,}\) and furthermore \(\boldw\) is the element of \(W\) closest to \(\boldv\text{.}\) To prove this, pick an orthogonal basis \(B=\{\boldv_1,\boldv_2,\dots, \boldv_r\}\) of \(W\) and set \(\boldw=\sum_{i=1}^r\frac{\angvec{\boldv,\boldv_i}}{\angvec{\boldv_i, \boldv_i}}\boldv_i\text{.}\) This is clearly an element of \(W\text{.}\) Next we set \(\boldw^\perp=\boldv-\boldw=\boldv-\sum_{i=1}^r\frac{\angvec{\boldv,\boldv_i}}{\angvec{\boldv_i, \boldv_i}}\boldv_i\text{.}\)

To complete the proof, we must show the following: (A) \(\boldw^\perp\in W^\perp\text{,}\) (B) this choice of \(\boldw\) and \(\boldw^\perp\) is unique, and (C) \(\boldw\) is the closest element of \(W\) to \(\boldv\text{.}\)

Subsection 5.2.10.1 (A)

For all \(i\) we have

\begin{align*} \langle\boldw^\perp,\boldv_i\rangle\amp =\amp \langle \boldv-\sum_{j=1}^r\frac{\angvec{\boldv,\boldv_j}}{\angvec{\boldv_j, \boldv_j}}\boldv_j, \boldv_i\rangle\\ \amp =\amp \langle \boldv, \boldv_i\rangle-\langle \sum_{j=1}^r\frac{\angvec{\boldv,\boldv_j}}{\angvec{\boldv_j, \boldv_j}}\boldv_j ,\boldv_i\rangle \hspace{9pt} \text{ (distr.) }\\ \amp =\amp \langle \boldv, \boldv_i\rangle-\frac{\angvec{\boldv,\boldv_i}}{\angvec{\boldv_i,\boldv_i}}\langle\boldv_i,\boldv_i\rangle \hspace{9pt} \text{ (by orthogonality) }\\ \amp =\amp 0 \end{align*}

Subsection 5.2.10.2 (B)+(C)

Recall: \(\boldw\) satisfies \(\boldv=\boldw+\boldw^\perp\text{,}\) where \(\boldw^\perp\in W^\perp\text{.}\) Now take any other \(\boldw'\in W\text{.}\) Then

\begin{align*} \norm{\boldv-\boldw'}^2\amp =\amp \norm{\boldw^\perp+(\boldw-\boldw')}^2 =\norm{\boldw^\perp}^2+\norm{\boldw-\boldw'}^2 \hspace{9pt} \text{ (Pythag. theorem) }\\ \amp \geq\amp \norm{\boldw^\perp}^2=\norm{\boldv-\boldw}^2\text{.} \end{align*}

Taking square roots now proves the desired inequality. Furthermore, equality holds if and only if the last inequality is an equality, if and only if \(\norm{\boldw-\boldw'}=0\text{,}\) if and only if \(\boldw=\boldw'\text{.}\) This proves our choice of \(\boldw\) is the unique element of \(W\) minimizing the distance to \(\boldv\text{!}\)

Next we show that \((W^\perp)^\perp=W\text{.}\) Clearly \(W\subseteq (W^\perp)^\perp\text{.}\) For the other direction, take \(\boldv\in (W^\perp)^\perp\text{.}\) Using the orthogonal projection theorem, we can write \(\boldv=\boldw+\boldw^\perp\) with \(\boldw\in W\) and \(\boldw^\perp\in W^\perp\text{.}\) We will show \(\boldw^\perp=\boldzero\text{.}\)

Since \(\boldv\in (W^\perp)^\perp\) we have \(\angvec{\boldv,\boldw^\perp}=0\text{.}\) Then we have

\begin{align*} 0\amp =\angvec{\boldv,\boldw^\perp}\\ \amp =\angvec{\boldw+\boldw^\perp,\boldw^\perp}\\ \amp =\angvec{\boldw,\boldw^\perp}+\angvec{\boldw^\perp,\boldw^\perp} \amp \text{ (since \(W\perp W^\perp\)) }\\ \amp =0+\angvec{\boldw^\perp,\boldw^\perp} \end{align*}

Thus \(\angvec{\boldw^\perp,\boldw^\perp}=0\text{.}\) It follows that \(\boldw^\perp=\boldzero\text{,}\) and hence \(\boldv=\boldw+\boldzero=\boldw\in W\text{.}\)

To see that orthogonal projection onto \(W\) defines a linear transformation \(T(\boldv)=\proj{\boldv}{W}\text{,}\) we must show that \(T(c\boldv_1+d\boldv_2)=cT(\boldv_1)+dT(\boldv_2)\) for all \(c,d\in\R\) and \(\boldv_1,\boldv_2\in V\text{.}\) This is easily shown by picking an orthonormal basis \(B=\{\boldu_1,\boldu_2, \dots, \boldu_r\}\) of \(W\) and using the formula from the orthogonal projection theorem.
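The formula from the proof is easy to turn into a short computation. Here is a minimal sketch for the dot product on \(\R^n\) (assuming NumPy is available; the function name proj is ours), computing \(\proj{\boldv}{W}\) from an orthogonal basis of \(W\text{:}\)

```python
import numpy as np

def proj(v, orth_basis):
    """Orthogonal projection of v onto W = Span(orth_basis).

    `orth_basis` must be an orthogonal basis of W with respect to the dot
    product; implements w = Σ (<v, v_i> / <v_i, v_i>) v_i.
    """
    v = np.asarray(v, dtype=float)
    w = np.zeros_like(v)
    for b in orth_basis:
        b = np.asarray(b, dtype=float)
        w += (np.dot(v, b) / np.dot(b, b)) * b
    return w

# Project (1, 2, 3) onto the xy-plane, spanned by the orthogonal pair below.
v = [1, 2, 3]
W = [[1, 1, 0], [1, -1, 0]]
w = proj(v, W)
print(w, np.asarray(v, dtype=float) - w)   # [1. 2. 0.]  and  [0. 0. 3.] in W^perp
```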

Subsection 5.2.11 Projection onto lines and planes in $\R^3$

Let's revisit orthogonal projection onto lines and planes in \(\R^3\) passing through the origin. Here the relevant inner product is dot product.

Subsection 5.2.12 Projection onto a line $\ell$

Any line in \(\R^3\) passing through the origin can be described as \(\ell=\Span\{\boldv_0\}\text{,}\) for some \(\boldv_0=(a,b,c)\ne 0\text{.}\) Since \(\{\boldv_0\}\) is an orthogonal basis of \(\ell\text{,}\) by the orthogonal projection theorem we have, for any \(\boldv=(x,y,z)\text{,}\)

\begin{equation*} \proj{\boldv}{\ell}=\frac{\boldv\cdot \boldv_0}{\boldv_0\cdot\boldv_0}\boldv_0=\frac{ax+by+cz}{a^2+b^2+c^2}(a,b,c)=\frac{1}{a^2+b^2+c^2}\begin{bmatrix}a^2\amp ab\amp ac\\ ab\amp b^2\amp bc\\ ac\amp bc\amp c^2 \end{bmatrix} \begin{bmatrix}x\\ y\\ z \end{bmatrix}\text{.} \end{equation*}

We have re-derived the matrix formula for orthogonal projection onto \(\ell\text{.}\)
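As a quick sanity check of this matrix formula (a minimal sketch assuming NumPy), the matrix \(\frac{1}{a^2+b^2+c^2}\boldv_0\boldv_0^T\) applied to a vector agrees with the projection formula, and it is idempotent, as any projection matrix should be:

```python
import numpy as np

a, b, c = 1.0, 2.0, 2.0                        # direction vector of the line ℓ
v0 = np.array([a, b, c])
P = np.outer(v0, v0) / (v0 @ v0)               # (1/(a²+b²+c²)) [[a², ab, ac], [ab, b², bc], [ac, bc, c²]]

v = np.array([3.0, 0.0, 3.0])
print(P @ v)                                   # matrix formula for proj_ℓ(v)
print(((v @ v0) / (v0 @ v0)) * v0)             # same vector from the projection formula
print(np.allclose(P @ P, P))                   # True: P is idempotent
```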


Subsection 5.2.14 Projection onto a plane

Any plane in \(\R^3\) passing through the origin can be described with the equation \(\mathcal{P}\colon ax+by+cz=0\) for some \(\boldn=(a,b,c)\ne 0\text{.}\) This says precisely that \(\mathcal{P}\) is the orthogonal complement of the line \(\ell=\Span\{(a,b,c)\}\text{:}\) i.e., \(\mathcal{P}=\ell^\perp\text{.}\)

From the orthogonal projection theorem, we know that

\begin{equation*} \boldv=\proj{\boldv}{\ell}+\proj{\boldv}{\ell^\perp}=\proj{\boldv}{\ell}+\proj{\boldv}{\mathcal{P}}\text{.} \end{equation*}

But then

\begin{equation*} \proj{\boldv}{\mathcal{P}}=\boldv-\proj{\boldv}{\ell}=I\boldv-A\boldv=(I-A)\boldv\text{,} \end{equation*}

where \(A\) is the matrix for \(\proj{\boldv}{\ell}\) from the previous example. We conclude that the matrix defining \(\proj{\boldv}{\mathcal{P}}\) is

\begin{equation*} I-\frac{1}{a^2+b^2+c^2}\begin{bmatrix}a^2\amp ab\amp ac\\ ab\amp b^2\amp bc\\ ac\amp bc\amp c^2 \end{bmatrix} = \frac{1}{a^2+b^2+c^2}\begin{bmatrix}b^2+c^2\amp -ab\amp -ac\\ -ab\amp a^2+c^2\amp -bc\\ -ac\amp -bc\amp a^2+b^2 \end{bmatrix} \end{equation*}

We can express this in terms of matrix multiplication as

\begin{equation*} \proj{\boldv}{\mathcal{P}}=\frac{1}{a^2+b^2+c^2}\begin{bmatrix}b^2+c^2\amp -ab\amp -ac\\ -ab\amp a^2+c^2\amp -bc\\ -ac\amp -bc\amp a^2+b^2 \end{bmatrix} \begin{bmatrix}x\\ y\\ z \end{bmatrix}\text{.} \end{equation*}

For a line or plane passing through a point \(Q=(q_1,q_2,q_3)\) other than the origin, we can still compute orthogonal projections as follows (see the sketch after this list):

  1. Translate the whole picture by \(-Q=(-q_1,-q_2, -q_3)\text{,}\) which means we replace \(P=(x,y,z)\) with \(P-Q=(x-q_1,y-q_2,z-q_3)\text{.}\)

  2. Apply our formulas from before, replacing \((x,y,z)\) with \((x-q_1,y-q_2,z-q_3)\text{.}\)

  3. Translate back by adding \(Q\) to your answer.
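Here is a minimal sketch of this translate, project, translate-back recipe for a plane, assuming NumPy is available (the function name proj_plane_through_Q is ours):

```python
import numpy as np

def proj_plane_through_Q(p, n, Q):
    """Project the point p onto the plane through Q with normal vector n.

    Translate by -Q, apply the origin-based formula (I - A) from above,
    then translate back by adding Q.
    """
    p, n, Q = (np.asarray(u, dtype=float) for u in (p, n, Q))
    A = np.outer(n, n) / (n @ n)        # projection onto the line Span{n}
    M = np.eye(3) - A                   # projection onto the plane through 0 with normal n
    return Q + M @ (p - Q)

# Example: the plane through Q = (1, 0, 0) with normal (0, 0, 1) is z = 0.
print(proj_plane_through_Q([3, 3, 3], [0, 0, 1], [1, 0, 0]))   # [3. 3. 0.]
```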

Subsection 5.2.15 Example: sine/cosine series

Let \(V=C[0,2\pi]\) with inner product \(\langle f, g\rangle=\int_0^{2\pi}f(x)g(x) \, dx\text{.}\)

We have seen that the set

\begin{equation*} B=\{1, \cos(x),\sin(x),\cos(2x),\sin(2x), \dots , \cos(nx),\sin(nx)\} \end{equation*}

is orthogonal. Thus \(B\) is an orthogonal basis of \(W=\Span(B)\text{,}\) which we might describe as the space of trigonometric polynomials of degree at most \(n\).

Given an arbitrary function \(f(x)\in C[0,2\pi]\text{,}\) its orthogonal projection onto \(W\) is the function

\begin{equation*} \hat{f}(x)=a_0+a_1\cos(x)+b_1\sin(x)+a_2\cos(2x)+b_2\sin(2x)+\cdots +a_n\cos(nx)+b_n\sin(nx)\text{,} \end{equation*}

where

\begin{equation*} a_0=\frac{1}{2\pi}\int_0^{2\pi} f(x) \ dx, \ a_j=\frac{1}{\pi}\int_0^{2\pi}f(x)\cos(jx)\, dx, \ b_k=\frac{1}{\pi}\int_0^{2\pi}f(x)\sin(kx)\, dx\text{.} \end{equation*}

The projection theorem tells us that \(\hat{f}\) is the “best” trigonometric polynomial approximation of \(f(x)\) (of degree at most \(n\)), in the sense that for any other \(g\in W\text{,}\) \(\left\vert\left\vert f-\hat{f}\right\vert\right\vert\leq \norm{f-g}\text{.}\)

This means in turn

\begin{equation*} \int_0^{2\pi} (f-\hat{f})^2\, dx\leq \int_0^{2\pi} (f-g)^2 \, dx\text{.} \end{equation*}
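Here is a minimal sketch (assuming NumPy is available) that approximates the coefficients \(a_0\text{,}\) \(a_j\text{,}\) \(b_j\) above by Riemann sums, for the sample function \(f(x)=x\) on \([0,2\pi]\text{:}\)

```python
import numpy as np

# Riemann-sum approximations of the coefficients of the trigonometric
# approximation of f(x) = x on [0, 2π].
x = np.linspace(0, 2 * np.pi, 200_000, endpoint=False)
dx = x[1] - x[0]
f = x

a0 = np.sum(f) * dx / (2 * np.pi)
print(a0)                                    # ≈ π
for j in range(1, 4):
    aj = np.sum(f * np.cos(j * x)) * dx / np.pi
    bj = np.sum(f * np.sin(j * x)) * dx / np.pi
    print(j, aj, bj)                         # a_j ≈ 0,  b_j ≈ -2/j
```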

Subsection 5.2.16 Example: least-squares solution to $A\boldx=\boldy$

Often in applications we have an \(m\times n\) matrix \(A\) and vector \(\boldy\in\R^m\) for which the matrix equation

\begin{equation*} A\boldx=\boldy \end{equation*}

has no solution. In terms of fundamental spaces, this means simply that \(\boldy\notin \CS(A)\text{.}\) Set \(W=\CS(A)\text{.}\)

In such situations we speak of a least-squares solution to the matrix equation. This is a vector \(\hat{\boldx}\) such that \(A\hat{\boldx}=\hat{\boldy}\text{,}\) where \(\hat{\boldy}=\proj{\boldy}{W}\text{.}\) Here the inner product is taken to be the dot product.

Note: the equation \(A\hat{\boldx}=\hat{\boldy}\) is guaranteed to have a solution since \(\hat{\boldy}=\proj{\boldy}{W}\) lies in \(\CS(A)\text{.}\)

The vector \(\hat{\boldx}\) is called a least-squares solution because its image \(\hat{\boldy}\) is the element of \(\CS(A)\) that is “closest” to \(\boldy\) in terms of the dot product. Writing \(\boldy=(y_1,y_2,\dots,y_m)\) and \(\hat{\boldy}=(y_1',y_2',\dots, y_m')\text{,}\) this means that \(\hat{\boldy}\) minimizes the distance

\begin{equation*} \norm{\boldy-\hat{\boldy}}=\sqrt{(y_1-y_1')^2+(y_2-y_2')^2+\cdots +(y_m-y_m')^2}\text{.} \end{equation*}

Subsection 5.2.17 Least-squares example (curve fitting)

Suppose we wish to find an equation of a line \(y=mx+b\) that best fits (in the least-squares sense) the following \((x,y)\) data points: \(P_1=(-3,1), P_2=(1,2), P_3=(2,3)\text{.}\)

Then we seek \(m\) and \(b\) such that

\begin{align*} 1\amp =m(-3)+b\\ 2\amp =m(1)+b\\ 3\amp =m(2)+b\text{,} \end{align*}

or equivalently, we wish to solve \(\begin{bmatrix}-3\amp 1\\ 1\amp 1\\ 2\amp 1 \end{bmatrix} \begin{bmatrix}m \\ b \end{bmatrix} =\begin{bmatrix}1\\ 2\\ 3 \end{bmatrix}\text{.}\)

This equation has no solution as \(\boldy=(1,2,3)\) does not lie in \(W=\CS(A)=\Span\{(-3,1,2),(1,1,1)\}\text{.}\) So instead we compute \(\hat{\boldy}=\proj{\boldy}{W}=(13/14,33/14,38/14)\text{.}\) (This was not hard to compute, as conveniently the given basis of \(W\) was already orthogonal!)

Finally we solve \(A\begin{bmatrix}m\\ b \end{bmatrix} =\hat{\boldy}\text{,}\) getting \(m=5/14\text{,}\) \(b=28/14=2\text{.}\) Thus \(y=\frac{5}{14}x+2\) is the line best fitting the data in the least-squares sense.

Subsection 5.2.18 Least-squares example contd.

In what sense does \(y=\frac{5}{14}x+2\) “best” fit the data?

Let \(\boldy=(1,2,3)=(y_1,y_2,y_3)\) be the given \(y\)-values of the points, and \(\hat{\boldy}=(y_1',y_2',y_3')\) be the projection we computed before. In the graph the values \(\epsilon_i\) denote the vertical difference \(\epsilon_i=y_i-y_i'\) between the data points and our fitted line.

The projection \(\hat{\boldy}\) makes the error \(\norm{\boldy-\hat{\boldy}}=\sqrt{ \epsilon_1^2+\epsilon_2^2+\epsilon_3^2}\) as small as possible.

This means if I draw any other line and compute the corresponding differences \(\epsilon_i'\) at the \(x\)-values -3, 1 and 2, then we have

\begin{equation*} \epsilon_1^2+\epsilon_2^2+\epsilon_3^2\leq (\epsilon_1')^2+(\epsilon_2')^2+(\epsilon_3')^2 \end{equation*}

Subsection 5.2.19 Finding least squares solutions

As the last example illustrated, one method of finding a least-squares solution \(\boldx\) to \(A\boldx=\boldy\) is to first produce an orthogonal basis for \(\CS(A)\text{,}\) then compute \(\hat{\boldy}=\proj{\boldy}{\CS(A)}\text{,}\) and then use Gaussian elimination (GE) to solve \(A\boldx=\hat{\boldy}\text{.}\)

Alternatively, it turns out (through a little trickery) that \(\hat{\boldy}=A\boldx\text{,}\) where \(\boldx\) is a solution to the equation

\begin{equation*} A^TA\boldx=A^T\boldy\text{.} \end{equation*}

This saves us the hassle of computing an orthogonal basis for \(\CS(A)\text{;}\) to find a least-squares solution \(\boldx\) for \(A\boldx=\boldy\text{,}\) we simply use GE to solve the equation above. (Some more trickery shows a solution is guaranteed to exist!)

Subsection 5.2.19.1 Example

In the previous example we were seeking a least-squares solution \(\boldx=\colvec{m\\ b}\) to \(A\boldx=\boldy\text{,}\) where \(A=\begin{bmatrix}-3\amp 1\\ 1\amp 1\\ 2\amp 1 \end{bmatrix} , \boldy=\colvec{1\\2\\3}\text{.}\)

The equation \(A^TA\boldx=A^T\boldy\) is thus

\begin{equation*} \begin{bmatrix}14\amp 0\\ 0\amp 3 \end{bmatrix} \boldx= \colvec{5\\ 6} \end{equation*}

As you can see, \(\boldx=\colvec{m\\ b}=\colvec{5/14\\ 2}\) is a least-squares solution, just as before.
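The normal-equation computation above is easy to carry out numerically; a minimal sketch assuming NumPy:

```python
import numpy as np

A = np.array([[-3.0, 1.0],
              [ 1.0, 1.0],
              [ 2.0, 1.0]])
y = np.array([1.0, 2.0, 3.0])

# Solve the normal equations A^T A x = A^T y.
x_hat = np.linalg.solve(A.T @ A, A.T @ y)
print(x_hat)                                  # [5/14, 2] ≈ [0.3571, 2.]

# NumPy's built-in least-squares routine returns the same answer.
print(np.linalg.lstsq(A, y, rcond=None)[0])
```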

Exercises 5.2.20 Exercises

1.

The vectors

\begin{equation*} \boldv_1=(1,1,1,1), \boldv_2=(1,-1,1,-1), \boldv_3=(1,1,-1,-1), \boldv_4=(1,-1,-1,1) \end{equation*}

are pairwise orthogonal with respect to the dot product, as is easily verified. For each \(\boldv\) below, find the scalars \(c_i\) such that

\begin{equation*} \boldv=c_1\boldv_1+c_2\boldv_2+c_3\boldv_3+c_4\boldv_4\text{.} \end{equation*}
  1. \(\displaystyle \boldv=(3,0,-1,0)\)

  2. \(\displaystyle \boldv=(1,2,0,1)\)

  3. \(\boldv=(a,b,c,d)\) (Your answer will be expressed in terms of \(a,b,c\text{,}\) and \(d\text{.}\) )

2.

Consider the inner product space given by \(V=\R^3\) together with the dot product. Let \(W\) be the plane with defining equation \(x+2y-z=0\text{.}\) Compute an orthogonal basis of \(W\text{,}\) and then extend this to an orthogonal basis of \(\R^3\text{.}\)

3.

Consider the vector space \(V=C([0,1])\) with the integral inner product. Apply Gram-Schmidt to the basis \(B=\{1,2^x, 3^x\}\) of \(W=\Span(B)\) to obtain an orthogonal basis of \(W\text{.}\)

Solution.

The resulting orthogonal basis is \(B'=\{f_1, f_2,f_3\}\text{,}\) where

\begin{align*} f_1\amp =1\\ f_2\amp =2^x-(\angvec{2^x,1}/\angvec{1,1})1\\ \amp =2^x-(\int_{0}^12^x \ dx)/(\int_0^1 1 \ dx)=2^x-\frac{1}{\ln 2}\\ f_3\amp =3^x-(\angvec{3^x,2^x-\frac{1}{\ln 2}}/\angvec{2^x-\frac{1}{\ln 2}, 2^x-\frac{1}{\ln 2}})(2^x-\frac{1}{\ln 2})-(\angvec{3^x,1}/\angvec{1,1})1\\ \amp =3^x-\frac{\frac{5}{\ln 6}-\frac{2}{\ln 2\ln 3}}{\frac{3}{\ln 4}-\frac{1}{(\ln 2)^2}}\left(2^x-\frac{1}{\ln 2}\right)-\frac{2}{\ln 3} \end{align*}

OK, I admit, I used technology to compute those integrals.

4.

Consider the vector space \(V=P_2\) with the evaluation at \(-1, 0, 1\) inner product:

\begin{equation*} \angvec{p(x),q(x)}=p(-1)q(-1)+p(0)q(0)+p(1)q(1)\text{.} \end{equation*}

Apply Gram-Schmidt to the standard basis of \(P_2\) to obtain an orthogonal basis of \(P_2\text{.}\)

5.

Let \(V=M_{22}\) with inner product \(\angvec{A,B}=\tr(A^TB)\text{,}\) and let \(W\subseteq V\) be the subspace of matrices whose trace is 0.

  1. Compute an orthogonal basis for \(W\text{.}\) You can do this either by inspection (the space is manageable), or by starting with a simple basis of \(W\) and applying the Gram-Schmidt procedure.

  2. Compute \(\proj{A}{W}\text{,}\) where

    \begin{equation*} A=\begin{bmatrix}1\amp 2\\ 1\amp 1 \end{bmatrix}\text{.} \end{equation*}

6.

Let \(V=C([0,1])\) with the integral inner product, and let \(f(x)=x\text{.}\) Find the function of the form \(g(x)=a+b\cos(2\pi x)+c\sin(2\pi x)\) that “best approximates” \(f(x)\) in terms of this inner product: i.e., find the \(g(x)\) of this form that minimizes \(d(g,f)\text{.}\)

Hint.

The set \(S=\{1, \cos(2\pi x), \sin(2\pi x)\}\) is orthogonal with respect to the given inner product.

7.

Let \((V,\langle , \rangle )\) be an inner product space. Prove: if \(\angvec{\boldv,\boldw}=0\text{,}\) then

\begin{equation*} \norm{\boldv+\boldw}^2=\norm{\boldv}^2+\norm{\boldw}^2\text{.} \end{equation*}

This result can be thought of as the Pythagorean theorem for general inner product spaces.

8.

Let \((V, \langle , \rangle )\) be an inner product space, let \(S=\{\boldw_1, \boldw_2, \dots, \boldw_r\}\subseteq V\text{,}\) and let \(W=\Span S\text{.}\) Prove:

\begin{equation*} \boldv\in W^\perp \text{ if and only if } \langle \boldv,\boldw_i \rangle=0 \text{ for all } 1\leq i\leq r\text{.} \end{equation*}

In other words, to check whether an element is in \(W^\perp\text{,}\) it suffices to check that it is orthogonal to each element of its spanning set \(S\text{.}\)

9.

Let \((V, \langle , \rangle )\) be an inner product space, and suppose \(B=\{\boldv_1, \boldv_2, \dots, \boldv_n\}\) is an orthonormal basis of \(V\text{.}\) Suppose \(\boldv, \boldw\in V\) satisfy

\begin{equation*} \boldv=\sum_{i=1}^nc_i\boldv_i, \boldw=\sum_{i=1}^nd_i\boldv_i\text{.} \end{equation*}
  1. Prove:

    \begin{equation*} \langle \boldv, \boldw\rangle =\sum_{i=1}^nc_id_i\text{.} \end{equation*}
  2. Prove:

    \begin{equation*} \norm{\boldv}=\sqrt{\sum_{i=1}^nc_i^2}\text{.} \end{equation*}

12.

Let \(V\) be an inner product space, and let \(W\subseteq V\) be a finite-dimensional subspace. Recall that \(\proj{\boldv}{W}\) is defined as the unique \(\boldw\in W\) satisfying \(\boldv=\boldw+\boldw^\perp\text{,}\) where \(\boldw^\perp\in W^\perp\text{.}\) Use this definition (including the uniqueness claim) to prove the following statements.

  1. If \(\boldv\in W\text{,}\) then \(\proj{\boldv}{W}=\boldv\text{.}\)

  2. We have \(\boldv\in W^\perp\) if and only if \(\proj{\boldv}{W}=\boldzero\text{.}\)

13. Dimension of \(W^\perp\).

Let \((V, \ \angvec{\ , \ })\) be an inner product space of dimension \(n\text{,}\) and suppose \(W\subseteq V\) is a subspace of dimension \(r\text{.}\) Prove: \(\dim W^\perp=n-r\text{.}\)

Hint.

Begin by picking an orthogonal basis \(B=\{\boldv_1,\dots ,\boldv_r\}\) of \(W\) and extend to an orthogonal basis \(B'=\{\boldv_1,\boldv_2, \dots, \boldv_r, \boldu_1,\dots , \boldu_{n-r}\}\) of all of \(V\text{.}\) Show the \(\boldu_i\) form a basis for \(W^\perp\text{.}\)

14.

We consider the problem of fitting a collection of data points \((x,y)\) with a quadratic curve of the form \(y=f(x)=ax^2+bx+c\text{.}\) Thus we are given some collection of points \((x,y)\text{,}\) and we seek parameters \(a, b, c\) for which the graph of \(f(x)=ax^2+bx+c\) “best fits” the points in some way.

  1. Show, using linear algebra, that if we are given any three points \((x,y)=(r_1,s_1), (r_2,s_2), (r_3,s_3)\text{,}\) where the \(x\)-coordinates \(r_i\) are all distinct, then there is a unique choice of \(a,b,c\) such that the corresponding quadratic function agrees precisely with the data. In other words, given just about any three points in the plane, there is a unique quadratic curve connecting them.

  2. Now suppose we are given the four data points

    \begin{equation*} P_1=(0,2), P_2=(1,0), P_3=(2,2), P_4=(3,6)\text{.} \end{equation*}
    1. Use the least-squares method described in the lecture notes to come up with a quadratic function \(y=f(x)\) that “best fits” the data.

    2. Graph the function \(f\) you found, along with the points \(P_i\text{.}\) (You may want to use technology.) Use your graph to explain precisely in what sense \(f\) “best fits” the data.