Inner product spaces

Section 5.1 Inner product spaces

An inner product is an additional layer of structure we can define on a vector space \(V\text{.}\) It takes a pair of elements \(\boldv, \boldw\in V\) and returns a scalar \(\langle \boldv,\boldw \rangle\in \R\text{.}\) As with the vector addition and scalar multiplication, we define inner products axiomatically, taking as our model the dot product on \(\R^2\) and \(\R^3\text{.}\) Our definition (5.1.1) simply promulgates a few important properties enjoyed by the dot product that may be familiar to you from studying calculus.

The addition of an inner product enriches the structure of a vector space considerably, and gives rise to a number of additional useful analytic tools. We highlight a few below.

Distance and angle: A notion of distance and angle between two vectors can be defined relative to a given inner product. These provide a numeric measurement of how “close” (distance) or “closely oriented” (angle) two vectors in our space are.
Orthogonality: Two vectors \(\boldv, \boldw\in V\) are orthogonal, relative to a given inner product, if \(\langle \boldv, \boldw\rangle=0\text{.}\) Orthogonality leads further to a general notion of orthogonal projection onto a subspace \(W\subseteq V\text{.}\)
Orthogonal bases: An orthogonal basis of a vector space \(V\text{,}\) relative to a given inner product, is one whose elements are pairwise orthogonal. As we will see there are many computational advantages of working with an orthogonal basis.

Subsection 5.1.1 Inner products

Definition 5.1.1. Inner product.

Let \(V\) be a vector space. An inner product on \(V\) is an operation that takes as input a pair of vectors \(\boldv, \boldw\in V\) and outputs a scalar \(\langle \boldv, \boldw \rangle \in \R\text{.}\) Using function notation:

\begin{align*} \langle \ , \rangle \colon \amp V\times V\rightarrow \R\\ (\boldv_1,\boldv_2)\amp \mapsto \langle \boldv_1,\boldv_2\rangle\text{.} \end{align*}

Furthermore, this operation must satisfy the following axioms.

Symmetry.
For all \(\boldv, \boldw\in V\) we have

\begin{equation*} \langle \boldv, \boldw \rangle =\langle \boldw, \boldv \rangle\text{.} \end{equation*}
Linearity.
For all \(\boldv, \boldw, \boldu\in V\) and \(c, d\in \R\) we have :

\begin{equation*} \langle c\boldv+d\boldw, \boldu \rangle =c \langle \boldv, \boldu \rangle +d \langle \boldw, \boldu \rangle\text{.} \end{equation*}

It follows by (i) (symmetry) that

\begin{equation*} \langle \boldu, c\boldv+d\boldw \rangle =c \langle \boldu, \boldv \rangle +d \langle \boldu, \boldw \rangle\text{.} \end{equation*}
Positivity.
For all \(\boldv\in V\) we have

\begin{equation*} \langle \boldv, \boldv \rangle \geq 0 \end{equation*}

and

\begin{equation*} \langle \boldv, \boldv \rangle =0 \text{ if and only if } \boldv=\boldzero\text{.} \end{equation*}

An inner product space is a pair \((V, \langle , \rangle )\text{,}\) where \(V\) is a vector space, and \(\langle , \rangle \) is a choice of inner product on \(V\text{.}\)

Remark 5.1.2. Inner products of linear combinations.

We will have many opportunities to “expand out” an inner product of two linear combinations of vectors. Using axioms (i) and (ii) in series, this process resembles the procedure for multiplying two polynomials. For example, we have

\begin{align*} \langle c\boldv+d\boldw, e\boldv+f\boldw\rangle \amp = c \langle \boldv, e\boldv+f\boldw \rangle +d \langle \boldw, e\boldv+f\boldw \rangle \amp (\knowl{./knowl/d_innerproduct.html}{\text{5.1.1}}, \text{(ii)})\\ \amp=ce \langle \boldv, \boldv\rangle+cf\langle \boldv,\boldw \rangle +de \langle \boldw, \boldv\rangle+df\langle \boldw, \boldw\rangle \amp (\knowl{./knowl/d_innerproduct.html}{\text{5.1.1}}, \text{(ii)}) \\ \amp = ce\langle \boldv, \boldv\rangle +(cf+de)\langle \boldv, \boldw\rangle +df\langle \boldw, \boldw\rangle \amp (\knowl{./knowl/d_innerproduct.html}{\text{5.1.1}}, \text{(i)})\text{.} \end{align*}

Note how in the last step we are able to group the “cross terms”, \(\langle \boldv, \boldw\rangle=\langle \boldw, \boldv\rangle\) using the symmetry axiom.

More generally, given linear combinations

\begin{align*} \boldv \amp = c_1\boldv_1+c_2\boldv_2+\cdots +c_n\boldv_n=\sum_{i=1}^n c_i\boldv_i \\ \boldw \amp =d_1\boldv_1+d_2\boldv_2+\cdots +d_n\boldv_n=\sum_{i=1}^nd_i\boldv_i \text{,} \end{align*}

the same reasoning shows that

\begin{align*} \langle \boldv, \boldw \rangle \amp= c_1d_1 \langle \boldv_1, \boldv_1 \rangle+c_2d_2 \langle \boldv_2, \boldv_2 \rangle +\cdots+c_nd_n \langle \boldv_n, \boldv_n \rangle +\underset{\text{cross terms}}{\underbrace{(c_1d_2+c_2d_1)\langle \boldv_1,\boldv_2 \rangle +\cdots}} \\ \amp= \sum_{i=1}^nc_id_i \langle \boldv_i, \boldv_i \rangle +\underset{\text{cross terms}}{\underbrace{\sum_{1\leq i\lt j\leq n}(c_{i}d_j+c_jd_i)\langle \boldv_i, \boldv_j \rangle}} \text{.} \end{align*}

In particular, we have

\begin{equation*} \langle \boldv, \boldv\rangle =\sum_{i=1}^nc_i^2 \langle \boldv_i, \boldv_i \rangle +\sum_{1\leq i\lt j\leq n}2c_ic_j \langle \boldv_i, \boldv_j \rangle\text{.} \end{equation*}

We now present a series of important examples of inner products defined on our various inner product spaces. Each is presented as a theorem, as we must prove that the proposed operation satisfies the axios of an inner product. The first example, the weighted dot product is itself a vast generalization of the familiar dot product operations defined on \(\R^2\) and \(\R^3\text{.}\)

Theorem 5.1.3. Weighted dot product.

Let \(V=\R^n\text{.}\) Let \(k_1, k_2, \dots , k_n\) be any list of real numbers. Define an operation on \(\R^n\) as follows: given \(\boldx=(x_1,x_2,\dots, x_n), \boldy=(y_1, y_2, \dots, y_n)\in\R^n\text{,}\) let

\begin{equation*} \langle \boldx, \boldy \rangle=k_1x_1y_1+k_2x_2y_2+\cdots k_nx_ny_n=\sum_{i=1}^nk_ix_iy_i\text{.} \end{equation*}

This operation is an inner product if and only if \(k_i>0\) for all \(i\text{.}\)

We call this inner product a weighted dot product on \(\R^n\text{,}\) or more specifically, the dot product with weights \(k_1, k_2,\dots, k_n\). In the special case where \(k_i=1\) for all \(i\) we call this the (standard) dot product and write \(\boldx\cdot \boldy\) instead of \(\langle \boldx, \boldy \rangle \text{.}\)

Proof.

First we show that axioms (i) and (ii) are satsified for any choice of \(k_i\text{.}\) Let

\begin{equation*} K=\begin{amatrix}[rrrr]k_1\amp 0\amp \dots \amp 0 \\ 0\amp k_2\amp 0\amp \dots \\ \vdots \\ 0\amp \dots\amp 0\amp k_n \end{amatrix}\text{,} \end{equation*}

the diagonal matrix whose \(i\)-th diagonal entry is \(k_i\text{.}\) Then for all \(\boldx=(x_1,x_2,\dots, x_n), \boldy=(y_1,y_2,\dots, y_n)\in \R^n\) we have

\begin{equation*} \langle \boldx, \boldy \rangle=\boldx^TK\boldy=\begin{amatrix}[cccc]x_1\amp x_2\amp \dots\amp x_n \end{amatrix} K\begin{bmatrix}y_1\\ y_2\\ \vdots \\ y_n\end{bmatrix}\text{.} \end{equation*}

Here we treat \(\boldx, \boldy\) as column vectors, and we treat the resulting \(1\times 1\) matrix \(\boldx^T K\boldy\) as a scalar. Axioms (i)-(ii) now follow from various matrix properties. For linearity, for example, we have

\begin{align*} \langle c\boldx_1+d\boldx_2, \boldy \rangle \amp = (c\boldx_1+d\boldx_2)^TK\boldy\\ \amp =(c\boldx_1^T+d\boldx_2^T)K\boldy \amp (\knowl{./knowl/th_trans_props.html}{\text{3.2.11}})\\ \amp =c\boldx_1^TK\boldy+d\boldx_2^TK\boldy \\ \amp =c \langle \boldx_1, \boldy \rangle +d \langle \boldx_2,\boldy \rangle \text{.} \end{align*}

Symmetry requires a little more trickery:

\begin{align*} \langle \boldy, \boldx \rangle \amp = \boldy^TK\boldx \\ \amp = \boldy^TK^T\boldx \amp (K^T=K) \\ \amp = (\boldx^T K\boldy)^T \amp (\knowl{./knowl/th_trans_props.html}{\text{Theorem 3.2.11}})\\ \amp =\boldx^T K \boldy \amp \\ \amp = \langle \boldx, \boldy \rangle \text{.} \end{align*}

Note that \((\boldx^T K\boldy)^T=\boldx^T K\boldy\) as \(\boldx^T K\boldy\) is just a \(1\times 1\) matrix.

Lastly, we show that axiom (iii) is satisfied if and only if \(k_i>0\) for all \(i\text{.}\) To this end consider the formula

\begin{equation*} \langle \boldx, \boldx \rangle=k_1x_1^2+k_2x_2^2+\cdots k_nx_n^2\text{.} \end{equation*}

If \(k_i>0\text{,}\) then since \(x_i^2\geq 0\) for all \(i\text{,}\) we have \(\langle \boldx, \boldx \rangle\geq 0\) for any \(\boldx\text{,}\) and \(\langle \boldx, \boldx \rangle=0\) if and only if \(x_i=0\) for all \(i\) if and only if \(\boldx=\boldzero\text{.}\)

For the other direction suppose \(k_i\leq 0\) for some \(i\text{.}\) Let \(\boldx=\bolde_i\text{,}\) the \(i\)-th element of the standard basis of \(\R^n\text{.}\) Then \(\langle \boldx, \boldx \rangle=k_i\leq 0\text{,}\) in contradiction to the positivity axiom.

Example 5.1.4. Dot product on \(\R^4\).

Let \(\boldx=(-1,2,0,1), \boldy=(1,2,1,1)\text{.}\) Then

\begin{equation*} \boldx\cdot \boldy=-1+4+0+1=4\text{,} \end{equation*}

and

\begin{equation*} \boldx\cdot\boldx=1+4+0+1=6\text{.} \end{equation*}

Example 5.1.5. Weighted dot product.

The dot product with weights \(2, 1, 3\) on \(\R^3\) is defined as

\begin{equation*} \langle \boldx, \boldy \rangle= 2x_1y_1+x_2y_2+3x_3y_3\text{.} \end{equation*}

Let \(\boldx=(-1,-1,-1)\) and \(\boldy=(1,0,1)\text{.}\) We have

\begin{equation*} \langle \boldx, \boldy \rangle =2(-1)+0-3=-5\text{,} \end{equation*}

and

\begin{equation*} \langle \boldx, \boldx \rangle =2(-1)^2+1(-1)^2+3(-1)^2=2+1+3=6\text{.} \end{equation*}

Example 5.1.6. Why the weights must be positive.

Define

\begin{equation*} \langle \boldx, \boldy \rangle =(-1)x_1y_1+2x_2y_2 \end{equation*}

for vectors \(\boldx=(x_1,x_2), \boldy=(y_1,y_2)\in \R^2\text{.}\) Then

\begin{equation*} \langle (3,1), (3,1) \rangle=-9+2=-7\lt 0\text{.} \end{equation*}

Remark 5.1.7.

It is worth highlighting the obersvation in the proof above that a dot product with weights \(k_1, k_2, \dots, k_n\) can be expressed as a matrix product:

\begin{equation*} \langle \boldx, \boldy \rangle=\sum_{i=1}^nk_ix_iy_i=\boldx^TK\boldy\text{,} \end{equation*}

where \(K\) is the diagonal \(n\times n\) matrix whose \(i\)-th diagonal entry is \(k_i\text{.}\) Here \(\boldx, \boldy\) are treated as column vectors, and we identify the resulting \(1\times 1\) matrix \(\boldx^T K \boldy\) with a scalar.

In particular for the standard dot product this matrix formula reduces to

\begin{equation*} \boldx\cdot \boldy=\boldx^T I \boldy=\boldx^T\boldy\text{.} \end{equation*}

Conversely, the dot product gives another way to formulate general matrix multiplication, as the next theorem articulates.

Theorem 5.1.8. Dot product method of matrix multiplication.

Let \(A\) be an \(m\times n\) matrix, and let \(B\) be a \(n\times r\) matrix. Let \(\boldr_i\in \R^n\) be the \(i\)-th row of \(A\text{,}\) and let \(\boldc_j\in \R^n\) be the \(j\)-th column of \(B\text{.}\) Then

\begin{equation*} (AB)_{ij}=\boldr_i\cdot \boldc_j\text{.} \end{equation*}

In other words, the \(ij\)-th entry of \(AB\) is the dot product of the \(i\)-th row of \(A\) and the \(j\)-th column of \(B\text{.}\)

Proof.

Let \(A=[a_{ij}]_{m\times n}\) and \(B=[b_{ij}]_{n\times r}\text{.}\) Then

\begin{equation*} (AB)_{ij}=\sum_{k=1}^na_{ik}b_{kj}=\boldr_i\cdot \boldc_j\text{,} \end{equation*}

since \(\boldr_i=(a_{i1}, a_{i2}, \dots, a_{in})\) and \(\boldc_j=(b_{j1}, b_{2j},\dots, b_{nj})\text{.}\)

Next we introduce an important family of inner products defined on polynomials spaces called evaluation inner products. These are useful when we wish to compare polynomials by how they behave at a specified list of inputs.

Theorem 5.1.9. Evaluation inner products on \(P_n\).

Let \(V=P_n\text{,}\) and let \(c_0, c_1, \dots, c_n\) be any list of \(n+1\) distinct real numbers. For any \(p(x), q(x)\in P_n\) define

\begin{equation*} \langle p(x), q(x) \rangle =p(c_0)q(c_0)+p(c_1)q(c_1)+\cdots +p(c_n)q(c_n)\text{.} \end{equation*}

This defines an inner product on \(P_n\) called an evaluation inner product, or more precisely, evaluation at the inputs \(c_0, c_1, \dots, c_n\).

Proof.

That axioms (i)-(ii) are satisfied is left as an exercise. For axiom (iii), note that

\begin{equation*} \langle p(x), p(x) \rangle =p(c_0)^2+p(c_1)^2+\cdots +p(c_n)^2\geq 0\text{,} \end{equation*}

and we have equality if and only if \(p(c_0)=p(c_1)=\dots p(c_n)=0\text{.}\) Since a nonzero polynomial of degree \(n\) or less has at most \(n\) distinct roots, we conclude that \(p(x)=\boldzero\text{,}\) the zero polynomial.

Example 5.1.10. Evaluation at \(-1, 0, 1\).

Let \(V=P_2\text{,}\) and let \(\langle p(x),q(x) \rangle \) be the evaluation at \(-1, 0, 1\) inner product. Compute \(\langle x^2-1,x^2+2x+1 \rangle \) and \(\langle x^2-1, x^2-1 \rangle. \)

Solution.

Let \(p(x)=x^2-1\text{,}\) \(q(x)=x^2+2x+1\text{.}\) We have

\begin{equation*} \langle p(x), q(x) \rangle=p(-1)q(-1)+p(0)q(0)+p(1)q(1)=0+(-1)1+0=-1 \end{equation*}

and

\begin{equation*} \langle p(x), p(x) \rangle =p(-1)^2+p(0)^2+p(1)^2=0+(-1)^2+0=1\text{.} \end{equation*}

Our last example defines an integral inner product on the space \(C([a,b])\) of continuous functions on an interval \([a,b]\text{.}\) This inner product plays an important role in Fourier analysis, which studies the approximation of arbibitrary continuous functions with linear combinations of certain trigonometric funtions.

Theorem 5.1.11. Integral inner product.

Fix an interval \([a,b]\text{,}\) and let \(V=C([a,b])\text{,}\) the space of all continuous functions on \([a,b]\text{.}\) For any \(f,g\in C([a,b])\) define

\begin{equation*} \langle f,g \rangle=\int_a^bf(x)g(x)\ dx\text{.} \end{equation*}

This defines an inner product on \(C([a,b])\) called the integral inner product.

Proof.

First observe that the integral defining the inner product always exists since the product \(fg\) is a continuous function on the closed interval \([a,b]\text{.}\)

Axioms (i)-(ii) follow directly from the definition and various properties of the integral. This is left as an exercise. As for (iii), we have

\begin{equation*} \langle f, f \rangle=\int_{a}^b f^2(x) \ dx \geq 0\text{,} \end{equation*}

since \(f^2(x)\geq 0\) for all \(x\in [a,b]\text{.}\) (This is a property of integration.) Furthermore, since \(f^2\) is continuous and \(f^2(x)\geq 0\text{,}\) we have

\begin{equation*} \langle f, f \rangle=\int_a^b f^2(x) \ dx=0 \end{equation*}

if and only if \(f^2(x)=0\) for all \(x\in [a,b]\) (a property of integrals of continuous functions) if and only if \(f(x)=0\) for all \(x\in [a,b]\) if and only if \(f=\boldzero\text{,}\) the zero function.

Example 5.1.12. Integral inner product.

Let \(V=C([0,1])\text{,}\) equipped with integral inner product. Let \(f(x)=x\text{,}\) \(g(x)=e^x\text{.}\) Compute \(\langle f,g \rangle \) and \(\langle f,f \rangle \text{.}\)

Solution.

We have

\begin{equation*} \langle f,g \rangle=\int_0^1xe^x\ dx=(xe^x\Bigr\vert_0^1-\int_0^1 e^x\ dx)=e-(e-1)=1 \end{equation*}

and

\begin{equation*} \langle f, f \rangle=\int_0^1 x^2 \ dx=\frac{1}{3}\text{.} \end{equation*}

Subsection 5.1.2 Norm and distance

As mentioned at the top, once an inner product is established, we can define further notions of norm (or length), distance, angle, etc.. When the inner product in question is the standard dot product on \(\R^2\) or \(\R^3\text{,}\) then these are precisely the standard notions you have met in multivariable calculus. For more exotic inner products, however, the corresponding notions of length, distance, etc., are to be a useful generalization of these notions.

Definition 5.1.13. Norm (or length) of a vector.

Let \((V, \langle , \rangle )\) be an inner product space. Given \(\boldv\in V\) we define its norm (or length), denoted \(\norm{\boldv}, \) as

\begin{equation*} \norm{\boldv}=\sqrt{\langle \boldv, \boldv \rangle }\text{.} \end{equation*}

A unit vector is a vector \(\boldv\) of length one: i.e., a vector \(\boldv\) satisfying \(\norm{\boldv}=1\text{.}\)

Remark 5.1.14. Unit vectors.

Given any \(\boldv\ne \boldzero\in V\text{,}\) the vector \(\boldu=\frac{1}{\norm{\boldv}}\boldv\) is a unit vector. To verify this, let \(c=\norm{\boldv}\) and compute

\begin{align*} \norm{\boldu} \amp \sqrt{\langle \frac{1}{c}\boldv,\frac{1}{c}\boldv \rangle }\\ \amp =\sqrt{\frac{1}{c^2}\langle \boldv,\boldv \rangle} \amp (\knowl{./knowl/d_innerproduct.html}{\text{Definition 5.1.1}}, \text{(ii)}) \\ \amp=\frac{1}{c}\sqrt{\langle \boldv,\boldv \rangle } \\ \amp =\frac{\norm{\boldv}}{\norm{\boldv}}=1\text{.} \end{align*}

Next, the distance between two vectors in an inner product space is defined as the length of their difference.

Definition 5.1.15. Distance between vectors.

Let \((V,\langle , \rangle )\) be an inner product space. The distance between \(\boldv, \boldw\in V\), denoted \(d(\boldv, \boldw)\text{,}\) is defined as

\begin{equation*} d(\boldv, \boldw)=\norm{\boldv-\boldw}=\sqrt{\langle \boldv-\boldw,\boldv-\boldw \rangle }\text{.} \end{equation*}

Theorem 5.1.16. Basic properties of norm and distance.

Let \((V,\langle , \rangle)\) be an inner product space.

For all \(\boldv\in V\) we have

\begin{equation*} \norm{\boldv}\geq 0\text{,} \end{equation*}

and equality holds if and only if \(\boldv=0\text{.}\)
For all \(c\in \R\) and \(\boldv\in V\) we have

\begin{equation*} \norm{c\boldv}=\val{c}\norm{\boldv}\text{.} \end{equation*}
For all \(\boldv, \boldw\in V\) we have

\begin{equation*} d(\boldv,\boldw)=d(\boldw,\boldv)\geq 0\text{,} \end{equation*}

and equality holds if and only if \(\boldv=\boldw\text{.}\)

Proof.

This is an elementary exercise of unpacking the various definitions, and is left to the reader.

Subsection 5.1.3 Cauchy-Schwarz inequality, triangle inequalities, and angles between vectors

The famous Cauchy-Schwarz inequality has a knack of cropping up all over the world of science: from properties of covariance in statistics, to the Heisenberg uncertainty principle of quantum mechanics. More directly pertinent to our discussion, the Cauchy-Schwarz inequality implies the triangle inequalities (5.1.18) and ensures that our notion of the angle between two nonzero vectors Definition 5.1.19) is well-defined.

Theorem 5.1.17. Cauchy-Schwarz inequality.

Let \((V,\langle \ , \rangle)\) be an inner product space. For all \(\boldv,\boldw\in V\) we have

\begin{equation*} \val{\langle\boldv,\boldw\rangle}\leq\norm{\boldv}\norm{\boldw}\text{,} \end{equation*}

and equality holds if and only if \(\boldv=c\boldw\) for some \(c\in\R\text{.}\)

Proof.

Fix vectors \(\boldv\) and \(\boldw\text{.}\) For any \(t\in\R\) we have by positivity

\begin{equation*} 0\leq \langle t\boldv-\boldw,t\boldv-\boldw\rangle=\langle\boldv,\boldv\rangle t^2-2\langle\boldv,\boldw\rangle t+\langle\boldw,\boldw\rangle=at^2-2bt+c\text{,} \end{equation*}

where

\begin{equation} a=\langle\boldv,\boldv\rangle, \ b=\langle\boldv,\boldw\rangle, \ c=\langle\boldw,\boldw\rangle=\norm{w}^2\text{.}\label{eq_cauchy_schwarz}\tag{5.1.1} \end{equation}

Since \(at^2-2b\,t+c\geq 0\) for all \(t\in \R\) the quadratic polynomial \(p(t)=at^2-2b\,t+c\) has at most one root. Using the quadratic formula for we conclude that we must have \(4b^2-4ac\leq 0\text{,}\) since otherwise \(p(t)\) would have two distinct roots. Substituting back in for \(a,b,c\) using (5.1.1), we see after a bit of algebra that

\begin{equation*} (\langle\boldv,\boldw\rangle)^2\leq \norm{\boldv}^2\norm{\boldw}^2\text{.} \end{equation*}

Taking square-roots yields the desired inequality.

The same reasoning shows that the Cauchy-Schwarz inequality is an actual equality if and only if \(p(t)=0\) for some \(t\) if and only if \(0=\langle t\boldv-\boldw,t\boldv-\boldw\rangle\) if and only if \(\boldv=t\boldw\) for some \(t\) (by positivity).

The following triangle inequalities are more or less direct consequences of the Cauchy-Schwarz inequality.

Theorem 5.1.18. Triangle Inequalities.

Let \((V, \langle \ , \rangle)\) be an inner product space.

For all \(\boldv, \boldw\in V\) we have

\begin{equation*} \norm{\boldv+\boldw}\leq \norm{\boldv}+\norm{\boldw}\text{.} \end{equation*}
For all \(\boldv, \boldw, \boldu\in V\) we have

\begin{equation*} d(\boldv,\boldw)\leq d(\boldv,\boldu)+d(\boldu,\boldw) \end{equation*}

Proof.

This is an elementary exercise of unpacking the definitions of norm and distance in terms of the inner product, and then applying the Cauchy-Schwarz inequality appropriately. The proof is left as an exercise.

Let \((V, \langle , \rangle )\) be an inner product space. For any nonzero vectors \(\boldv, \boldw\text{,}\) the Cauchy-Schwarz inequality tells us that

\begin{equation*} \val{\langle \boldv, \boldw \rangle }\leq \norm{\boldv}\, \norm{\boldw}\text{,} \end{equation*}

or equivalently,

\begin{equation*} -1\leq \frac{\langle \boldv, \boldw \rangle}{\norm{\boldv}\, \norm{\boldw}} \leq 1\text{.} \end{equation*}

It follows that there is a unique real number \(\theta\in [0,\pi]\) satisfying

\begin{equation*} \cos\theta=\frac{\langle \boldv, \boldw \rangle}{\norm{\boldv}\, \norm{\boldw}}\text{.} \end{equation*}

We define the angle between \(\boldv\) and \(\boldw\) to be this \(\theta\text{.}\)

Definition 5.1.19. Angle between vectors.

Let \((V,\langle , \rangle )\) be an inner product space. Given nonzero vectors \(\boldv, \boldw\in V\text{,}\) the angle between \(\boldv\) and \(\boldw\) is defined to be the unique \(\theta\in [0,\pi]\) satisfying

\begin{equation*} \cos\theta=\frac{\langle \boldv, \boldw \rangle}{\norm{\boldv}\, \norm{\boldw}}\text{.} \end{equation*}

Equivalently, we have

\begin{equation*} \theta=\cos^{-1}\left( \frac{\langle \boldv, \boldw \rangle}{\norm{\boldv}\, \norm{\boldw}} \right)\text{.} \end{equation*}

Subsection 5.1.4 Choosing your inner product

Why, given a fixed vector space \(V\text{,}\) would we prefer one inner product definition to another?

One way of understanding a particular choice of inner product is to ask what its corresponding notion of distance measures.

Subsection 5.1.4.1 Example

Take \(P_n\) with the evaluation inner product at inputs \(x=c_0, c_1,\dots, c_n\text{.}\) Given two polynomials \(p(x), q(x)\text{,}\) the distance between them with respect to this inner product is

\begin{equation*} \norm{p(x)-q(x)}=\sqrt{(p(c_0)-q(c_0))^2+(p(c_1)-q(c_1))^2+\cdots +(p(c_n)-q(c_n))^2}\text{.} \end{equation*}

So in this inner product space the “distance” between two polynomials is a measure of how different their values are at the inputs \(x=c_0,c_1,\dots ,c_n\text{.}\) This inner product may be useful if you are particularly interested in how a polynomial behaves at this finite list of inputs.

Subsection 5.1.4.2 Example

Take \(C[a,b]\) with the standard inner product \(\langle f, g \rangle=\int_a^b f(x)g(x) \ dx\text{.}\) Here the distance between two functions is defined as \(\ds \norm{f-g}=\sqrt{\int_a^b (f(x)-g(x))^2 \ dx}\text{.}\) In particular, a function \(f\) is “close” to the zero function (i.e. “is small” ) if the integral of \(f^2\) is small. This notion is useful in settings where integrals of functions represent quantities we are interested in (e.g. in probability theory, thermodynamics, and quantum mechanics).

Exercises 5.1.5 Exercises

1.

For each of the following operations on \(\R^2\text{,}\) determine whether it defines an inner product on \(\R^2\text{.}\) If it fails to be an inner product, identify which of the three inner product axioms (if any) it does satisfy, and provide explicit counterexamples for any axiom that fails.

\(\angvec{(x_1,x_2),\ (y_1,y_2)}=x_1y_2+x_2y_1\text{.}\)
\(\angvec{(x_1,x_2),\ (y_1,y_2)}=2x_1y_1+x_1y_2+x_2y_1+3x_2y_2\text{.}\)
\(\angvec{(x_1,x_2), \ (y_1,y_2)}=x_1^2y_1^2+x_2^2y_2^2\text{.}\)

Hint.

The operation in (b) is an inner product. Use that fact that

\begin{equation*} \angvec{\boldx,\ \boldy}=\boldx^T \begin{amatrix}[cc]2\amp 1 \\ 1 \amp 3 \end{amatrix}\boldy\text{,} \end{equation*}

where we treat \(\boldx, \boldy\) as column vectors. This helps to prove axioms (i)-(ii). For axiom (iii), use a “complete the square” argument.

2.

We work within the inner product space given by \(V=P_2\) together with the evaluation at 0, 1, 2 inner product.

Let \(q(x)=x\text{.}\) Give a parametric description of the set

\begin{equation*} W=\{p(x)\in P_2\colon \langle p(x), q(x)\rangle =0\}\text{.} \end{equation*}

3.

We work in the inner product space given by \(V=C([-\pi,\pi])\) together with the integral inner product.

Let \(f(x)=\cos x, g(x)=\sin x\text{.}\) Compute \(\langle f,g \rangle \) and \(\norm{g}\text{.}\)
Show that if \(f(x)\) is an odd function (i.e., \(f(x)=-f(-x)\) for all \(x\)) and \(g(x)\) is an even function (\(g(-x)=g(x)\) for all \(x\)), then \(\langle f, g \rangle=0 \text{.}\) Hint: use the area interpretation of the integral and properties of even/odd functions.

4.

Compute the angle between the given vectors with respect to the given inner product. Do not give your answer in terms of \(\arccos\text{:}\) i.e., the angles can be computed by hand.

\(V=\R^4\) with the standard dot product; \(\boldv=(1,1,1,1), \boldw=(1,-1,1,1)\)
\(V=C([0,1])\) with the integral inner product; \(f(x)=1, g(x)=x\text{.}\)
\(V=P_2\) with evaluation at \(-1, 1\) inner product; \(p(x)=-\frac{1}{2}x+\frac{1}{2}, q(x)=2x\)

5.

Let \(\boldv, \boldw\in V\text{,}\) and let \(\theta\) be the angle between them. Prove the following equivalence:

\begin{equation*} \norm{\boldv+\boldw}=\norm{\boldv}+\norm{\boldw}\text{ if and only if } \theta=0\text{.} \end{equation*}

Your proof should be a chain of equivalences with each step justified.

Solution.

\begin{align*} \norm{\boldv+\boldw}=\norm{\boldv}+\norm{\boldw}\amp \Leftrightarrow \norm{\boldv+\boldw}^2=\left(\norm{\boldv}+\norm{\boldw}\right)^2 \amp \text{ ( square both sides) }\\ \amp \Leftrightarrow (\boldv+\boldw)\cdot(\boldv+\boldw)=\norm{\boldv}^2+2\norm{\boldv}\norm{\boldw}+\norm{\boldw}^2\\ \amp \Leftrightarrow \boldv\cdot\boldv+2\boldv\cdot\boldw+\boldw\cdot\boldw=\boldv\cdot\boldv+2\norm{\boldv}\norm{\boldw}+\boldw\cdot\boldw\\ \amp \Leftrightarrow \boldv\cdot\boldw=\norm{\boldv}\norm{\boldw}\\ \amp \Leftrightarrow \frac{\boldv\cdot\boldw}{\norm{\boldv}\norm{\boldw}}=1\\ \amp \Leftrightarrow \cos(\theta)=1\\ \amp \Leftrightarrow \theta=0\text{.} \end{align*}

6.

Let \((V, \langle , \rangle )\) be an inner product space. Suppose vectors \(\boldv, \boldw\in V\) satisfy \(\norm{\boldv}=2\) and \(\norm{\boldw}=3\text{.}\) Using the Cauchy-Schwarz inequality (5.1.17) find the maximum and minimum possible values of \(\norm{\boldv-\boldw}\text{,}\) and give explicit examples where those values occur.

7.

Prove all statements of Theorem 5.1.16.

8.

Prove each inequality below using the Cauchy-Schwarz inequality (5.1.17) applied to a judicious choice of inner product space, and possibly a judicious choice of vector in said inner product space.

For all \(f, g\in C([a,b])\)

\begin{equation*} \left(\int_a^b f(x)g(x) \ dx\right)^2\leq \int_a^b f^2(x)\ dx\int_a^b g^2(x) \ dx\text{.} \end{equation*}
For all \((x_1,x_2,\dots, x_n)\in\R^n\text{,}\)

\begin{equation*} (x_1+x_2+\cdots +x_n)\leq\sqrt{x_1^2+x_2^2+\cdots +x_n^2}\sqrt{n}\text{.} \end{equation*}
For all \(a,b,\theta\in\R\)

\begin{equation*} (a\cos\theta+b\sin\theta)^2\leq a^2+b^2\text{.} \end{equation*}

9. Isometries of inner product spaces.

Let \((V,\angvec{ \ , })\) be an inner product space. An isometry of \(V\) is a function \(f\colon V\rightarrow V\) that preserves distance: i.e.,

\begin{equation*} d(f(\boldv), f(\boldw))=d(\boldv, \boldw) \text{ for all \(\boldv, \boldw\in V\) }\text{.} \end{equation*}

In this exercise we will show that any isometry that maps \(\boldzero\) to \(\boldzero\) is a linear transformation. This is a very useful fact. For example, it implies the linearity of many geometric transformations we have considered: rotation about the origin in \(\R^2\text{,}\) reflection through a line in \(\R^2\text{,}\) etc..

In what follows assume that \(f\) is an isometry of \(V\) satisfying \(f(\boldzero)=\boldzero\text{.}\)

Prove that \(\norm{f(\boldv)}=\norm{\boldv}\text{:}\) i.e., \(f\) preserves norms.
Prove \(\angvec{f(\boldv), f(\boldw)}=\angvec{\boldv, \boldw}\text{:}\) i.e., \(f\) preserves inner products. Hint: first prove that \(\angvec{\boldv, \boldw}=\frac{1}{2}(\norm{\boldv}^2+\norm{\boldw}^2-\norm{\boldv-\boldw}^2)\text{.}\)
To prove \(f\) is linear it is enough to show \(f(\boldv+c\boldw)=f(\boldv)+cf(\boldw)\) for all \(\boldv, \boldw\in V\text{,}\) \(c\in \R\text{.}\) To do so, use the above parts to show that

\begin{equation*} \norm{f(\boldv+c\boldw)-(f(\boldv)+cf(\boldw))}=0\text{.} \end{equation*}

Hint: regroup this difference in a suitable manner so that you can use parts (a)-(b). You may also want to use the identity

\begin{equation*} \norm{\boldv-\boldw}^2=\norm{\boldv}^2-2\angvec{\boldv,\boldw}+\norm{\boldw}^2\text{.} \end{equation*}