Section 6.5 Diagonalization
Definition 6.5.1.
A linear transformation \(T\colon V\rightarrow V\) is diagonalizable if there exists a basis \(B\) of \(V\) for which \([T]_B\) is a diagonal matrix.
At last we relate the property of being diagonalizable with the notion of eigenvectors. In the process we make clear what we mean when we say \(T\) is diagonalizable if and only if it has “enough” linearly independent eigenvectors.
Theorem 6.5.2. Diagonalizability theorem.
Let \(T\colon V\rightarrow V\) be linear, \(\dim(V)=n\text{.}\)
- Qualitative statement.
Given a basis \(B\) of \(V\text{,}\) the matrix \([T]_B\) is diagonal if and only if \(B\) consists of eigenvectors of \(T\text{.}\) Thus \(T\) is diagonalizable if and only if there is a basis of \(V\) consisting of eigenvectors of \(T\text{.}\)
- Quantitative statement.
Let \(\lambda_1, \lambda_2, \dots, \lambda_r\) be the distinct eigenvalues of \(T\text{,}\) let \(W_{\lambda_j}\) be the corresponding eigenspaces, and let \(n_j=\dim W_{\lambda_j}\text{.}\) Then
\begin{equation*} T \text{ is diagonalizable } \iff n_1+n_2+\cdots +n_r=n\text{.} \end{equation*}
Proof.
The first statement was a homework exercise. The proof of the second statement is within our means, but somewhat lengthy. I will sketch its proof elsewhere. For now it is more important to understand how to use the result.
Procedure 6.5.3. Deciding whether a linear transformation is diagonalizable.
Let \(T\colon V\rightarrow V\text{,}\) where \(V\) is an \(n\)-dimensional vector space. To decide whether \(T\) is diagonalizable proceed as follows.
- Use Procedure 6.4.5 to compute the distinct real eigenvalues \(\lambda_1, \lambda_2, \dots, \lambda_r\) of \(T\text{,}\) as well as bases of their corresponding eigenspaces.
- We have
\begin{equation*} T \text{ diagonalizable }\iff \sum_{i=1}^r\dim W_{\lambda_i}=n\text{.} \end{equation*} Let \(B_i\) be an ordered basis for \(W_{\lambda_i}\text{,}\) and let \(B\) be the ordered basis obtained by concatenating the bases \(B_1, B_2, \dots, B_r\text{.}\) Then \([T]_B\) is a diagonal matrix representing \(T\text{.}\)
Procedure 6.5.4.
Let \(A\) be an \(n\times n\) matrix. To decide whether \(A\) is diagonalizable, proceed as follows.
- Compute the distinct real eigenvalues \(\lambda_1, \lambda_2, \dots, \lambda_r\) of \(A\) as well as bases for their corresponding eigenspaces.
- We have
\begin{equation*} A \text{ diagonalizable }\iff \sum_{i=1}^r\dim W_{\lambda_i}=n\text{.} \end{equation*}
- Let \(B_i\) be an ordered basis for \(W_{\lambda_i}\text{,}\) let \(B'\) be the ordered basis obtained by concatenating the bases \(B_1, B_2, \dots, B_r\text{,}\) and let \(P\) be the matrix whose \(j\)-th column is the \(j\)-th element of \(B'\text{.}\) Then
\begin{equation*} D=P^{-1}AP \end{equation*} is diagonal. Furthermore, we have \(D=[T_A]_{B'}\text{:}\) i.e., \(D\) is the matrix representing \(T_A\) with respect to the basis \(B'\text{.}\)
Subsection 6.5.1 Deciding whether \(A_{n\times n}\) is diagonalizable
Find the distinct eigenvalues, \(\lambda_1,\dots, \lambda_r\text{,}\) of \(A\text{,}\) let \(W_{\lambda_i}\) be the corresponding eigenspaces, and let \(n_i=\dim W_{\lambda_i}\text{.}\)
\(A\) is diagonalizable if and only if \(n_1+n_2+\cdots +n_r=n\text{.}\)
If the above equality is true, compute bases \(B_j\) for each \(W_{\lambda_j}\text{.}\)
Place all the vectors from the bases \(B_j\) as columns of a matrix \(P\text{.}\) As these eigenvectors are linearly independent, \(P\) is invertible.
The matrix \(D=P^{-1}AP\) is diagonal. In more detail, the \(j\)-th diagonal entry of \(D\) is the eigenvalue associated to the eigenvector in the \(j\)-th column of \(P\text{.}\) This means the first \(n_1\) diagonal entries of \(D\) are equal to \(\lambda_1\text{,}\) the next \(n_2\) entries are equal to \(\lambda_2\text{,}\) etc.
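For readers who want to experiment, here is a minimal SymPy sketch of this procedure, assuming SymPy is available; the matrix below is just an illustrative placeholder, not one of the examples from these notes.

```python
import sympy as sp

# Placeholder matrix; replace with the matrix you want to test.
A = sp.Matrix([[2, 1],
               [1, 2]])
n = A.rows

# eigenvects() returns triples (eigenvalue, algebraic multiplicity, eigenspace basis).
eig_data = A.eigenvects()

# Sum of the geometric multiplicities n_i = dim W_{lambda_i}.
total = sum(len(basis) for _, _, basis in eig_data)
print("diagonalizable:", total == n)

if total == n:
    # Columns of P: the eigenspace bases, concatenated.
    P = sp.Matrix.hstack(*[v for _, _, basis in eig_data for v in basis])
    print(P.inv() * A * P)   # diagonal, eigenvalues repeated according to multiplicity
```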
Subsection 6.5.2 Example
Take \(A=\begin{bmatrix}1\amp 1\\ 0\amp 1 \end{bmatrix}\text{.}\) Earlier I claimed that this matrix is not diagonalizable. Let's see why.
The characteristic polynomial of \(A\) is \(p(t)=(t-1)^2\text{.}\) Thus \(\lambda=1\) is the only eigenvalue of \(A\text{.}\)
We have \(W_1=\NS(I-A)=\NS\begin{bmatrix}0\amp -1\\ 0\amp 0 \end{bmatrix}\text{.}\) We see clearly that \(\rank(I-A)=1\text{,}\) and hence \(\dim W_1=\dim\NS(I-A)=2-1=1\text{.}\)
Since \(W_1\) is the only eigenspace, and since \(\dim W_1=1\ne 2\text{,}\) we conclude \(A\) is not diagonalizable.
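A quick machine check of this example, assuming SymPy is available:

```python
import sympy as sp

t = sp.symbols('t')
A = sp.Matrix([[1, 1],
               [0, 1]])
print(A.charpoly(t).as_expr())   # t**2 - 2*t + 1, i.e. (t - 1)^2
print((sp.eye(2) - A).rank())    # 1, so dim W_1 = 2 - 1 = 1 < 2
print(A.is_diagonalizable())     # False
```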
Subsection 6.5.3 Example
Let \(A=\begin{bmatrix}14 \amp 21 \amp 3 \amp -39 \\ 12 \amp 25 \amp 3 \amp -41 \\ 12 \amp 24 \amp 5 \amp -42 \\ 12 \amp 22 \amp 3 \amp -38 \end{bmatrix}\text{.}\)
The characteristic polynomial of \(A\) is \(p(t)=t^4 - 6t^3 + 9t^2 + 4t - 12\text{.}\) (This is not obvious, and it would be a pain to compute in detail. You may take this for granted.)
Our usual factoring tricks allow us to factor this as \(p(t)=(t-2)^2(t+1)(t-3)\text{.}\)
The eigenspaces are \(W_2=\NS(2I-A), W_{-1}=\NS(-I-A)\text{,}\) and \(W_3=\NS(3I-A)\text{.}\) I'll leave it to you to verify that they have bases \(B=\{ (3,2,0,2), (1,1,2,1)\}\text{,}\) \(B'=\{(1,1,1,1)\}\text{,}\) and \(B''=\{(3,5,6,4)\}\text{,}\) respectively.
It follows that the dimensions of the eigenspaces are 2, 1, and 1, respectively. Since \(2+1+1=4\text{,}\) we conclude that \(A\) is diagonalizable.
In more detail, we have \(D=P^{-1}AP\text{,}\) where
\begin{equation*} D=\begin{bmatrix}2\amp 0\amp 0\amp 0\\ 0\amp 2\amp 0\amp 0\\ 0\amp 0\amp -1\amp 0\\ 0\amp 0\amp 0\amp 3 \end{bmatrix}, \qquad P=\begin{bmatrix}3\amp 1\amp 1\amp 3\\ 2\amp 1\amp 1\amp 5\\ 0\amp 2\amp 1\amp 6\\ 2\amp 1\amp 1\amp 4 \end{bmatrix}\text{.} \end{equation*}
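If you want to verify this by machine, the following SymPy snippet (assuming SymPy is available) checks that \(P^{-1}AP\) is the claimed diagonal matrix.

```python
import sympy as sp

A = sp.Matrix([[14, 21, 3, -39],
               [12, 25, 3, -41],
               [12, 24, 5, -42],
               [12, 22, 3, -38]])

# Columns of P: the eigenvectors (3,2,0,2), (1,1,2,1) for lambda = 2,
# then (1,1,1,1) for lambda = -1 and (3,5,6,4) for lambda = 3.
P = sp.Matrix([[3, 1, 1, 3],
               [2, 1, 1, 5],
               [0, 2, 1, 6],
               [2, 1, 1, 4]])

print(P.inv() * A * P)   # diag(2, 2, -1, 3)
```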
Subsection 6.5.4 Geometric and algebraic multiplicity
Take \(A\) (or \(T\)) and suppose the characteristic polynomial \(p(t)\) factors as
\begin{equation*} p(t)=(t-\lambda_1)^{n_1}(t-\lambda_2)^{n_2}\cdots (t-\lambda_r)^{n_r}\text{,} \end{equation*}
where the \(\lambda_i\) are the distinct eigenvalues of \(A\) (or \(T\)). It turns out that the exponent \(n_i\text{,}\) called the algebraic multiplicity of the eigenvalue \(\lambda_i\text{,}\) is an upper bound on \(m_i=\dim W_{\lambda_i}\text{,}\) called the geometric multiplicity.
Theorem 6.5.5. Algebraic and geometric multiplicity theorem.
Let \(A\) (or \(T\)) have characteristic polynomial
\begin{equation*} p(t)=(t-\lambda_1)^{n_1}(t-\lambda_2)^{n_2}\cdots (t-\lambda_r)^{n_r}\text{,} \end{equation*}
where the \(\lambda_i\) are the distinct eigenvalues of \(A\) (or \(T\)). Then
\begin{equation*} \dim W_{\lambda_i}\leq n_i \text{ for all } 1\leq i\leq r\text{:} \end{equation*}
i.e., the geometric multiplicity is less than or equal to the algebraic multiplicity.
Subsection 6.5.5 Linear independence and eigenvectors
The following result is used to prove the diagonalizability theorem (Theorem 6.5.2), but is also very useful in its own right.
Theorem 6.5.6.
Let \(T\colon V\rightarrow V\) be a linear transformation, and let \(S=\{\boldv_1,\dots, \boldv_r\}\) be a set of eigenvectors of \(T\) with \(T\boldv_i=\lambda_i\boldv_i\text{.}\)
If the \(\lambda_i\) are all distinct, then \(S\) is linearly independent.
Proof.
Proved elsewhere.
Corollary 6.5.7.
Let \(T\colon V\rightarrow V\) be a linear transformation, and suppose \(\dim V=n\text{.}\)
If \(T\) has \(n\) \alert{distinct} eigenvalues, then \(T\) is diagonalizable.
Proof.
Let \(\boldv_1, \boldv_2,\dots, \boldv_n\) be eigenvectors corresponding to these \(n\) distinct eigenvalues. The theorem tells us they form a linearly independent set. Since \(\dim V=n\text{,}\) they form a basis for \(V\) by the dimension theorem compendium. Since \(T\) has a basis of eigenvectors, it is diagonalizable.
Theorem 6.5.6 makes no assumption about the dimension of \(V\text{.}\) It can thus be applied to interesting infinite-dimensional examples.
Example 6.5.8.
Let \(V=C^\infty(\R)\text{,}\) and let \(T\colon V\rightarrow V\) be defined as \(T(f)=f'\text{.}\)
Let \(f_i(x)=e^{k_ix}\text{,}\) where the \(k_i\) are all distinct constants. I claim \(S=\{f_1,f_2,\dots , f_r\}\) is linearly independent.
Indeed, each \(f_i\) is an eigenvector of \(T\text{,}\) since \(T(f_i)=(e^{k_ix})'=k_ie^{k_ix}=k_if_i\text{.}\)
Since the \(k_i\)'s are all distinct, it follows that the \(f_i\) are eigenvectors with distinct eigenvalues, hence linearly independent!
Note: try proving that \(S\) is linearly independent using the Wronskian! You get a very interesting determinant computation.
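If you would like to experiment with the Wronskian suggestion before proving anything in general, here is a small SymPy computation (assuming SymPy is available) for three exponentials with symbolic exponents; the factored determinant hints at the pattern you should expect in general.

```python
import sympy as sp

x, k1, k2, k3 = sp.symbols('x k1 k2 k3')
fs = [sp.exp(k1*x), sp.exp(k2*x), sp.exp(k3*x)]

# Wronskian matrix: entry (i, j) is the i-th derivative of f_j.
W = sp.Matrix(3, 3, lambda i, j: sp.diff(fs[j], x, i))
print(sp.factor(W.det()))
# A product of pairwise differences (k_i - k_j) times an exponential,
# hence nonzero whenever the k_i are distinct.
```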
Subsection 6.5.6 Final extension of the invertibility theorem
Lastly, we can add one final statement to the invertibility theorem: \(A\) is invertible if and only if \(0\) is not an eigenvalue of \(A\text{.}\)
Indeed \(0\) is an eigenvalue of \(A\) if and only if \(p(0)=\det(0I-A)=\det(-A)=0\) if and only if \(\det A=0\text{,}\) since \(\det(-A)=(-1)^n\det A\text{.}\)
Since \(\det A=0\) if and only if \(A\) is not invertible, we conclude that \(0\) is an eigenvalue of \(A\) if and only if \(A\) is not invertible.
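A small SymPy illustration of this equivalence, with an arbitrarily chosen singular matrix (the matrix is purely illustrative, and SymPy is assumed to be available):

```python
import sympy as sp

t = sp.symbols('t')
A = sp.Matrix([[1, 2],
               [2, 4]])                 # singular: the second row is twice the first
p = (t*sp.eye(2) - A).det()             # characteristic polynomial of A
print(A.det(), p.subs(t, 0))            # 0 0: det(A) = 0 and p(0) = det(-A) = 0
print(A.eigenvals())                    # {0: 1, 5: 1} -- 0 appears as an eigenvalue
```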
You can find the final version of the invertibility theorem on the next slide.
Subsection 6.5.7 Diagonalizable matrices
Definition 6.5.9.
An \(n\times n\) matrix \(A\) is diagonalizable if there is an invertible matrix \(P\) such that \(D=P^{-1}AP\) is diagonal.
In other words, \(A\) is diagonalizable if it is similar to a diagonal matrix \(D\text{.}\)
Remark 6.5.10.
If \(A\) is itself diagonal, then it is diagonalizable: we may choose \(P=I_n\) in the definition.
In this section we will develop a systematic procedure for determining whether a matrix is diagonalizable. As we will see, the answer is yes if and only if \(A\) has “enough” linearly independent eigenvectors. Of course we will spell out precisely what we mean by “enough”.
Not all matrices are diagonalizable. For example, \(A=\begin{bmatrix}0\amp 1\\ 0\amp 0 \end{bmatrix}\) is not diagonalizable, as the aforementioned procedure will show.
Roughly speaking, you should interpret being diagonalizable as meaning “as good as diagonal”. To elaborate: doing arithmetic with diagonal matrices \(D\) is extremely easy; if we know \(A\) is diagonalizable, meaning it is similar to a diagonal matrix \(D\text{,}\) then it shares many essential properties of \(D\text{,}\) and we can use the relation \(D=P^{-1}AP\) to help ease arithmetic computations involving \(A\text{.}\)
Subsection 6.5.8 Properties of conjugation
If we have \(D=P^{-1}AP\text{,}\) why exactly is there such a close connection between \(D\) and \(A\text{?}\) One explanation has to do with the underlying operation
\begin{equation*} A\mapsto P^{-1}AP\text{,} \end{equation*}
which we call conjugation by \(P\). As the following theorem outlines, conjugation by an invertible \(P\) satisfies many useful properties, and we use these to relate the matrix \(A\) with the matrix \(P^{-1}AP\text{.}\)
Theorem 6.5.11. Properties of conjugation.
Let \(P\) be any invertible \(n\times n\) matrix.
(a) \(P^{-1}(c_1A_1+c_2A_2)P=c_1P^{-1}A_1P+c_2P^{-1}A_2P\text{.}\) (\alert{Conjugation by \(P\) is a linear transformation}.)
(b) \(P^{-1}A^nP=(P^{-1}AP)^n\) for any integer \(n\geq 0\text{.}\) If \(A\) is invertible, the equality holds for all integers \(n\text{.}\) (\alert{Conjugation preserves powers}.)
(c) Recall that given any polynomial \(f(x)=\anpoly\) and any \(n\times n\) matrix \(A\) we define \(f(A)=a_nA^n+a_{n-1}A^{n-1}+\cdots +a_1A+a_0I_n\text{.}\) We have \(f(P^{-1}AP)=P^{-1}f(A)P\) for any polynomial \(f(x)\text{,}\) and any \(n\times n\) matrix \(A\text{.}\) (\alert{Conjugation preserves polynomial evaluation}.)
Proof.
Exercise.
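These identities are easy to sanity-check by machine; here is one possible SymPy experiment with arbitrarily chosen matrices (assuming SymPy is available). It is of course not a substitute for the exercise.

```python
import sympy as sp

# Arbitrary invertible P and arbitrary A, chosen just for the check.
P = sp.Matrix([[1, 2], [1, 3]])
A = sp.Matrix([[0, 1], [2, 3]])

lhs = P.inv() * A**4 * P
rhs = (P.inv() * A * P)**4
print(sp.simplify(lhs - rhs))                # zero matrix: conjugation preserves powers

f = lambda M: M**2 - 5*M + 7*sp.eye(2)       # f(x) = x^2 - 5x + 7, evaluated at a matrix
print(sp.simplify(f(P.inv() * A * P) - P.inv() * f(A) * P))   # zero matrix
```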
Subsection 6.5.9 Examples: utility of diagonalizability
Suppose \(A\) is diagonalizable, so that \(D=P^{-1}AP\text{,}\) where \(D\) is diagonal with diagonal entries \(d_i\text{.}\) Using Theorem 6.5.11, we can now see how to translate statements about \(D\) (which are generally easy to prove), to statements about \(A\) (which otherwise might have been difficult to show).
Example 6.5.12.
To compute \(A^{n}\) (hard) we can just compute \(D^{n}\) (easy) and then observe that \(A=PDP^{-1}\text{,}\) and thus
\begin{equation*} A^n=(PDP^{-1})^n=PD^nP^{-1}\text{,} \end{equation*}
where the last equality follows from part (b) of Theorem 6.5.11; here we let \(P^{-1}\) assume the role of \(P\) in the theorem statement.
For example, let \(A=\begin{bmatrix}1\amp 3\\ 1\amp -1 \end{bmatrix}\text{.}\) Let's compute \(A^n\) for arbitrary \(n\text{.}\)
We have \(D=P^{-1}AP\text{,}\) where \(D=\begin{bmatrix}2\amp 0\\ 0\amp -2 \end{bmatrix}\text{,}\) and \(P=\begin{bmatrix}3\amp 1\\ 1\amp -1 \end{bmatrix}\text{.}\) (This is not obvious yet. Soon we will have the tools to see why this is so.)
Then \(A=PDP^{-1}\text{,}\) and thus
\begin{equation*} A^n=PD^nP^{-1}=\begin{bmatrix}3\amp 1\\ 1\amp -1 \end{bmatrix}\begin{bmatrix}2^n\amp 0\\ 0\amp (-2)^n \end{bmatrix}\begin{bmatrix}3\amp 1\\ 1\amp -1 \end{bmatrix}^{-1}=\frac{1}{4}\begin{bmatrix}3\cdot 2^n+(-2)^n\amp 3\cdot 2^n-3(-2)^n\\ 2^n-(-2)^n\amp 2^n+3(-2)^n \end{bmatrix}\text{.} \end{equation*}
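A SymPy verification of this computation, assuming the library is available; the symbolic exponent \(n\) is handled by building \(D^n\) entrywise.

```python
import sympy as sp

n = sp.symbols('n')
A = sp.Matrix([[1, 3], [1, -1]])
P = sp.Matrix([[3, 1], [1, -1]])
D = sp.diag(2, -2)

print(P.inv() * A * P - D)          # zero matrix, confirming D = P^{-1} A P

Dn = sp.diag(2**n, (-2)**n)         # D^n, computed entrywise
An = sp.simplify(P * Dn * P.inv())  # closed form for A^n
print(An)
print(An.subs(n, 5) - A**5)         # zero matrix: the formula agrees with A^5
```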
Subsection 6.5.10 Examples: utility of diagonalizability
Suppose \(A\) is diagonalizable, so that \(D=P^{-1}AP\text{,}\) where \(D\) is diagonal with diagonal entries \(d_i\text{.}\)
Example 6.5.13.
More generally, we have \(f(A)=f(PDP^{-1})=Pf(D)P^{-1}\) for any polynomial \(f(x)=\anpoly\text{.}\) Since \(D\) is diagonal, with diagonal entries \(d_i\text{,}\) it is easy to see that \(f(D)\) is also diagonal, with diagonal entries \(f(d_i)\text{.}\)
In particular we see that \(f(A)=\underset{n\times n}{\boldzero}\) if and only if \(f(D)=\underset{n\times n}{\boldzero}\text{,}\) and this holds if and only if \(f(d_i)=0\) for each diagonal entry \(d_i\) of \(D\text{.}\)
Take the matrix \(A\) from Example 6.5.12, and let \(f(x)=(x-2)(x+2)=x^2-4\text{.}\) Since \(f(2)=f(-2)=0\text{,}\) it follows that \(f(D)=f(A)=\underset{2\times 2}{\boldzero}\text{.}\) In other words, \(A^2-4I=\underset{2\times 2}{\boldzero}\text{,}\) as you can easily check.
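A one-line check, assuming SymPy is available:

```python
import sympy as sp

A = sp.Matrix([[1, 3], [1, -1]])
print(A**2 - 4*sp.eye(2))   # zero matrix, so f(A) = A^2 - 4I = 0
```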
Subsection 6.5.11 Examples: utility of diagonalizability
Suppose \(A\) is diagonalizable, so that \(D=P^{-1}AP\text{,}\) where \(D\) is diagonal with diagonal entries \(d_i\text{.}\)
Example 6.5.14.
\(A\) has a square-root (i.e., a matrix \(B\) such that \(B^2=A\)) iff \(D\) has a square-root.
Indeed, suppose \(B^2=A\text{.}\) Set \(C=P^{-1}BP\text{.}\) Then \(C^2=P^{-1}B^2P=P^{-1}AP=D\text{.}\) Similarly, if \(C^2=D\text{,}\) then \(B^2=A\text{,}\) where \(B=PCP^{-1}\text{.}\)
As an example, the matrix \(A=\begin{bmatrix}0\amp -2\\ 1 \amp 3 \end{bmatrix}\) satisfies \(D=P^{-1}AP\text{,}\) where \(D=\begin{bmatrix}1\amp 0\\ 0\amp 2 \end{bmatrix}\text{,}\) and \(P=\begin{bmatrix}2\amp 1\\ -1\amp -1 \end{bmatrix}\text{.}\) Since \(C=\begin{bmatrix}1\amp 0\\ 0\amp \sqrt{2} \end{bmatrix}\) is a square-root of \(D\text{,}\) \(B=PCP^{-1}=\begin{bmatrix}2-\sqrt{2}\amp 2-2\sqrt{2}\\ -1+\sqrt{2}\amp -1+2\sqrt{2} \end{bmatrix}\) is a square-root of \(A\text{,}\) as you can easily check.
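The claimed square root can be checked with SymPy (assuming it is available):

```python
import sympy as sp

A = sp.Matrix([[0, -2], [1, 3]])
P = sp.Matrix([[2, 1], [-1, -1]])
C = sp.diag(1, sp.sqrt(2))          # a square root of D = diag(1, 2)

B = P * C * P.inv()
print(sp.simplify(B))               # the matrix B given above
print(sp.simplify(B**2 - A))        # zero matrix, so B^2 = A
```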
So when exactly does a diagonal matrix \(D\) have a square-root? Clearly, it is sufficient that \(d_i\geq 0\) for all \(i\text{,}\) as in the example above. Interestingly, this is not a necessary condition! Indeed, consider the following example:
\(\begin{bmatrix}-1\amp 0\\ 0\amp -1 \end{bmatrix} =\begin{bmatrix}0\amp -1\\ 1\amp 0 \end{bmatrix} ^2\text{.}\)
Subsection 6.5.12 Properties of similarity
Before investigating the question of when a matrix is diagonalizable, we record a few more properties illustrating the close connection between similar matrices.
Theorem 6.5.15. Properties of similarity.
Suppose \(A\) is similar to \(B\text{:}\) i.e., there is an invertible matrix \(P\) such that \(B=P^{-1}AP\text{.}\) Then:
(a) \(B\) is similar to \(A\text{.}\) (\alert{Similarity is symmetric}.)
(b) \(A\) and \(B\) have the same trace and determinant.
(c) \(A\) and \(B\) have the same rank.
(d) \(A\) and \(B\) have the same characteristic polynomial.
(e) \(A\) and \(B\) have the same eigenvalues.
(f) Given any \(\lambda\in\R\text{,}\) let \(W_\lambda\) be the corresponding eigenspace for \(A\text{,}\) and \(W_\lambda'\) the corresponding eigenspace for \(B\text{.}\) Then \(\dim W_{\lambda}=\dim W_{\lambda}'\text{.}\)
The proof of (d) follows. I leave the rest as an exercise.
Proof.
By definition we have \(B=P^{-1}AP\) for some invertible matrix \(P\text{.}\) We wish to show the characteristic polynomials \(p_A(t)\) and \(p_B(t)\) of the two matrices are equal. Compute:
\begin{equation*} p_B(t)=\det(tI-B)=\det(tI-P^{-1}AP)=\det(P^{-1}(tI-A)P)=\det(P^{-1})\det(tI-A)\det P=\det(tI-A)=p_A(t)\text{,} \end{equation*}
where we have used \(tI=P^{-1}(tI)P\) and \(\det(P^{-1})\det P=1\text{.}\)
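A quick SymPy sanity check of these shared invariants, reusing the matrices from Example 6.5.14 (assuming SymPy is available):

```python
import sympy as sp

t = sp.symbols('t')
A = sp.Matrix([[0, -2], [1, 3]])
P = sp.Matrix([[2, 1], [-1, -1]])
B = P.inv() * A * P                      # a matrix similar to A

pA = (t*sp.eye(2) - A).det()
pB = (t*sp.eye(2) - B).det()
print(sp.expand(pA - pB))                # 0: same characteristic polynomial
print(A.trace() == B.trace(),
      A.det() == B.det(),
      A.rank() == B.rank())              # True True True
```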
Subsection 6.5.13 The true meaning of similarity
Hopefully Theorems 6.5.11 and 6.5.15 convince you that similar matrices (in the linear algebraic sense) are truly similar (in the usual sense).
There is, however, a deeper explanation for this. Namely, if \(A\) and \(A'\) are similar, then they are simply two different matrix representations of a common linear transformation!
In more detail: suppose we have \(A'=P^{-1}AP\text{.}\)
Let \(B\) be the standard basis of \(\R^n\text{,}\) and let \(B'\) be the basis of \(\R^n\) obtained by taking the columns of the invertible matrix \(P\text{.}\) Finally, let \(T=T_A\) be the matrix transformation associated to \(A\text{.}\)
Then \(A=[T]_B\text{,}\) \(P=\underset{B'\rightarrow B}{P}\text{,}\) and \(P^{-1}=\underset{B\rightarrow B'}{P}\text{.}\)
From the change of basis formula it follows that
\begin{equation*} A'=P^{-1}AP=\underset{B\rightarrow B'}{P}[T]_B\underset{B'\rightarrow B}{P}=[T]_{B'} \end{equation*}
In other words to say \(A\) and \(A'\) are similar is simply to say that they are different matrix representations of the same overlying linear transformation \(T\) (see Holy Commutative Tent of Linear Algebra on next slide). All their shared properties (same eigenvalues, same determinant, same trace, etc.) are simply the properties they inherit from this one overlying \(T\text{,}\) of which they are but earthly shadows.
There is one true \(T\text{!}\)

The previous theorem allows us to extend some of our matrix definitions to linear transformations.
Definition 6.5.16.
Let \(T\colon V\rightarrow V\) be linear, let \(B\) be any basis of \(V\text{,}\) and let \(A=[T]_B\text{.}\) We define the characteristic polynomial of \(T\) to be \(p(t)=\det(tI-A)\text{.}\)
Theorem 6.5.17. Invertibility theorem (final version).
Let \(A\) be \(n\times n\text{.}\) The following statements are equivalent.
\(A\) is invertible.
\(A\boldx=\boldzero\) has a unique solution (the trivial one).
\(A\) is row equivalent to \(I_n\text{,}\) the \(n\times n\) identity matrix.
\(A\) is a product of elementary matrices.
\(A\boldx=\boldb\) has a solution for every \(n\times 1\) column vector \(\boldb\text{.}\)
\(A\boldx=\boldb\) has a unique solution for every \(n\times 1\) column vector \(\boldb\text{.}\)
\(\det(A)\ne 0\text{.}\)
\(\NS(A)=\{\boldzero\}\text{.}\)
\(\nullity(A)=0\text{.}\)
\(\rank(A)=n\text{.}\)
\(\CS(A)=\R^n\text{.}\)
\(\RS(A)=\R^n\text{.}\)
The columns of \(A\) are linearly independent (or span \(\R^n\text{,}\) or are a basis of \(\R^n\)).
The rows of \(A\) are linearly independent (or span \(\R^n\text{,}\) or are a basis of \(\R^n\)).
\(A\) does not have 0 as an eigenvalue.