
Section 6.5 Diagonalization

Definition 6.5.1.

A linear transformation \(T\colon V\rightarrow V\) is diagonalizable if there exists a basis \(B\) of \(V\) for which \([T]_B\) is a diagonal matrix.

At last we relate the property of being diagonalizable with the notion of eigenvectors. In the process we make clear what we mean when we say \(T\) is diagonalizable if and only if it has “enough” linearly independent eigenvectors.

The first statement was a homework exercise. The proof of the second statement is within our means, but somewhat lengthy. I will sketch its proof elsewhere. For now it is more important to understand how to use the result.

Subsection 6.5.1 Deciding whether \(A_{n\times n}\) is diagonalizable

  1. Find the distinct eigenvalues, \(\lambda_1,\dots, \lambda_r\text{,}\) of \(A\text{,}\) let \(W_{\lambda_i}\) be the corresponding eigenspaces, and let \(n_i=\dim W_{\lambda_i}\text{.}\)

  2. \(A\) is diagonalizable if and only if \(n_1+n_2+\cdots +n_r=n\text{.}\)

  3. If the above equality is true, compute bases \(B_j\) for each \(W_{\lambda_j}\text{.}\)

  4. Place all the vectors from the bases \(B_j\) as columns of a matrix \(P\text{.}\) As these eigenvectors are linearly independent, \(P\) is invertible.

  5. The matrix \(D=P^{-1}AP\) is diagonal. In more detail, the \(j\)-th diagonal entry of \(D\) is the eigenvalue associated to the eigenvector in the \(j\)-th column of \(P\text{.}\) This means the first \(n_1\) diagonal entries of \(D\) are equal to \(\lambda_1\text{,}\) the next \(n_2\) entries are equal to \(\lambda_2\text{,}\) etc.
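If you like to check such computations by machine, here is a minimal SymPy sketch of the procedure above (just an illustration, not part of the notes; the function name diagonalize_by_eigenvectors is my own, and A can be any square SymPy Matrix):

    from sympy import Matrix

    def diagonalize_by_eigenvectors(A):
        # Steps 1-5: collect eigenspace bases, build P, and form D.
        n = A.rows
        columns, diag_entries = [], []
        for eigenvalue, alg_mult, basis in A.eigenvects():   # step 1
            for v in basis:
                columns.append(v)                # steps 3-4: basis vectors become columns of P
                diag_entries.append(eigenvalue)  # step 5: matching diagonal entries of D
        if len(columns) != n:                    # step 2: is n_1 + ... + n_r = n?
            return None                          # not diagonalizable
        P = Matrix.hstack(*columns)
        D = Matrix.diag(*diag_entries)
        assert P.inv() * A * P == D
        return P, D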

Subsection 6.5.2 Example

Take \(A=\begin{bmatrix}1\amp 1\\ 0\amp 1 \end{bmatrix}\text{.}\) Earlier I claimed that this matrix is not diagonalizable. Let's see why.

The characteristic polynomial of \(A\) is \(p(t)=(t-1)^2\text{.}\) Thus \(\lambda=1\) is the only eigenvalue of \(A\text{.}\)

We have \(W_1=\NS(I-A)=\NS\begin{bmatrix}0\amp -1\\ 0\amp 0 \end{bmatrix}\text{.}\) We see clearly that \(\rank(I-A)=1\text{,}\) and hence \(\dim W_1=\dim\NS(I-A)=2-1=1\text{.}\)

Since \(W_1\) is the only eigenspace, and since \(\dim W_1=1\ne 2\text{,}\) we conclude \(A\) is not diagonalizable.
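For what it's worth, SymPy reaches the same conclusion (a throwaway check, not a proof):

    from sympy import Matrix

    A = Matrix([[1, 1], [0, 1]])
    print(A.eigenvects())          # [(1, 2, [Matrix([[1], [0]])])]: one eigenvalue, one-dimensional eigenspace
    print(A.is_diagonalizable())   # False, since 1 != 2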

Subsection 6.5.3 Example

Let \(A=\begin{bmatrix}14 \amp 21 \amp 3 \amp -39 \\ 12 \amp 25 \amp 3 \amp -41 \\ 12 \amp 24 \amp 5 \amp -42 \\ 12 \amp 22 \amp 3 \amp -38 \end{bmatrix}\text{.}\)

The characteristic polynomial of \(A\) is \(p(t)=t^4 - 6t^3 + 9t^2 + 4t - 12\text{.}\) (This is not obvious, and it would be a pain to compute in detail, so you may take it for granted.)

Our usual factoring tricks allow us to factor this as \(p(t)=(t-2)^2(t+1)(t-3)\text{.}\)

The eigenspaces are \(W_2=\NS(2I-A), W_{-1}=\NS(-I-A)\text{,}\) and \(W_3=\NS(3I-A)\text{.}\) I'll leave it to you to verify that they have bases \(B=\{ (3,2,0,2), (1,1,2,1)\}\text{,}\) \(B'=\{(1,1,1,1)\}\text{,}\) and \(B''=\{(3,5,6,4)\}\text{,}\) respectively.

It follows that the dimensions of the eigenspaces are 2, 1, and 1, respectively. Since \(2+1+1=4\text{,}\) we conclude that \(A\) is diagonalizable.

In more detail, we have \(D=P^{-1}AP\text{,}\) where

\begin{equation*} D=\begin{bmatrix}2\amp 0\amp 0\amp 0\\ 0\amp 2\amp 0\amp 0\\ 0\amp 0\amp -1\amp 0\\ 0\amp 0\amp 0\amp 3 \end{bmatrix} , \ \ P=\begin{bmatrix}3 \amp 1 \amp 1 \amp 3 \\ 2 \amp 1 \amp 1 \amp 5 \\ 0 \amp 2 \amp 1 \amp 6 \\ 2 \amp 1 \amp 1 \amp 4 \end{bmatrix} \end{equation*}
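If you want to spare yourself the hand computation, here is a quick SymPy verification (my own sketch; the ordering of the eigenvalues in \(D\) may differ from the one displayed above):

    from sympy import Matrix, symbols

    t = symbols('t')
    A = Matrix([[14, 21, 3, -39],
                [12, 25, 3, -41],
                [12, 24, 5, -42],
                [12, 22, 3, -38]])
    print(A.charpoly(t).as_expr().factor())   # (t - 2)**2*(t + 1)*(t - 3), up to ordering of the factors
    P, D = A.diagonalize()                    # succeeds: the geometric multiplicities sum to 4
    print(D)                                  # diagonal entries 2, 2, -1, 3 (possibly reordered)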

Subsection 6.5.4 Geometric and algebraic multiplicity

Take \(A\) (or \(T\)) and suppose the characteristic polynomial \(p(t)\) factors as

\begin{equation*} p(t)=(t-\lambda_1)^{n_1}(t-\lambda_2)^{n_2}\cdots (t-\lambda_r)^{n_r}\text{,} \end{equation*}

where the \(\lambda_i\) are the distinct eigenvalues of \(A\) (or \(T\)). It turns out that the exponent \(n_i\text{,}\) called the algebraic multiplicity of the eigenvalue \(\lambda_i\text{,}\) is an upper bound on \(m_i=\dim W_{\lambda_i}\text{,}\) called the geometric multiplicity.
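Here is a concrete illustration of this bound, using the matrix from Subsection 6.5.2 (SymPy's eigenvects reports both multiplicities, so this is a one-loop check):

    from sympy import Matrix

    A = Matrix([[1, 1], [0, 1]])
    for eigenvalue, alg_mult, basis in A.eigenvects():
        print(eigenvalue, alg_mult, len(basis))   # 1 2 1: geometric multiplicity 1 <= algebraic multiplicity 2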

Subsection 6.5.5 Linear independence and eigenvectors

The following result is used to prove the diagonalizability theorem (Theorem 6.5.2), but is also very useful in its own right: eigenvectors of \(T\) corresponding to distinct eigenvalues form a linearly independent set.

Proved elsewhere.

Suppose now that \(\dim V=n\) and that \(T\) has \(n\) distinct eigenvalues. Let \(\boldv_1, \boldv_2,\dots, \boldv_n\) be eigenvectors corresponding to these \(n\) distinct eigenvalues. The theorem tells us they form a linearly independent set. Since \(\dim V=n\text{,}\) they form a basis for \(V\) by the dimension theorem compendium. Since \(T\) has a basis of eigenvectors, it is diagonalizable.

Theorem 6.5.6 makes no assumption about the dimension of \(V\text{.}\) It can thus be applied to interesting infinite-dimensional examples.

Let \(V=C^\infty(\R)\text{,}\) and let \(T\colon V\rightarrow V\) be defined as \(T(f)=f'\text{.}\)

Let \(f_i(x)=e^{k_ix}\text{,}\) where the \(k_i\) are all distinct constants. I claim \(S=\{f_1,f_2,\dots , f_r\}\) is linearly independent.

Indeed, each \(f_i\) is an eigenvector of \(T\text{,}\) since \(T(f_i)=(e^{k_ix})'=k_ie^{k_ix}=k_if_i\text{.}\)

Since the \(k_i\)'s are all distinct, it follows that the \(f_i\) are eigenvectors with distinct eigenvalues, hence linearly independent!

Note: try proving that \(S\) is linearly independent using the Wronskian! You get a very interesting determinant computation.
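If you are curious, SymPy can do that Wronskian computation symbolically; here is a sketch for three exponentials (the symbol names are mine):

    from sympy import symbols, exp, wronskian, factor

    x, k1, k2, k3 = symbols('x k1 k2 k3')
    W = wronskian([exp(k1*x), exp(k2*x), exp(k3*x)], x)
    print(factor(W))   # an exponential times a Vandermonde-style product of differences (k_i - k_j),
                       # hence nonzero exactly when the k_i are distinct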

Subsection 6.5.6 Final extension of the invertibility theorem

Lastly, we can add one final statement to the invertibility theorem: \(A\) is invertible if and only if \(0\) is not an eigenvalue of \(A\text{.}\)

Indeed \(0\) is an eigenvalue of \(A\) if and only if \(p(0)=\det(0I-A)=\det(-A)=0\) if and only if \(\det A=0\text{,}\) since \(\det(-A)=(-1)^n\det A\text{.}\)

Since \(\det A=0\) if and only if \(A\) is not invertible, we conclude that \(0\) is an eigenvalue of \(A\) if and only if \(A\) is not invertible.
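A quick illustration (the two matrices below are throwaway examples of my own): the invertible one has no zero eigenvalue, the singular one does.

    from sympy import Matrix

    A = Matrix([[1, 1], [0, 1]])   # det A = 1, invertible
    B = Matrix([[1, 2], [2, 4]])   # det B = 0, not invertible
    print(A.eigenvals())           # {1: 2}        -- 0 does not appear
    print(B.eigenvals())           # {0: 1, 5: 1}  -- 0 is an eigenvalue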

You will find the final version of the invertibility theorem on the next slide.

Subsection 6.5.7 Diagonalizable matrices

Definition 6.5.9.

An \(n\times n\) matrix \(A\) is diagonalizable if there is an invertible matrix \(P\) such that \(D=P^{-1}AP\) is diagonal.

In other words, \(A\) is diagonalizable if it is similar to a diagonal matrix \(D\text{.}\)

Remark 6.5.10.

  1. If \(A\) is itself diagonal, then it is diagonalizable: we may choose \(P=I_n\) in the definition.

  2. In this section we will develop a systematic procedure for determining whether a matrix is diagonalizable. As we will see, the answer is yes if and only if \(A\) has “enough” linearly independent eigenvectors. Of course we will spell out precisely what we mean by “enough”.

  3. Not all matrices are diagonalizable. For example, \(A=\begin{bmatrix}0\amp 1\\ 0\amp 0 \end{bmatrix}\) is not diagonalizable, as the aforementioned procedure will show.

  4. Roughly speaking, you should interpret being diagonalizable as meaning “as good as diagonal”. To elaborate: doing arithmetic with diagonal matrices \(D\) is extremely easy; if we know \(A\) is diagonalizable, meaning it is similar to a diagonal matrix \(D\text{,}\) then it shares many essential properties of \(D\text{,}\) and we can use the relation \(D=P^{-1}AP\) to help ease arithmetic computations involving \(A\text{.}\)

Subsection 6.5.8 Properties of conjugation

If we have \(D=P^{-1}AP\text{,}\) why exactly is there such a close connection between \(D\) and \(A\text{?}\) One explanation has to do with the underlying operation

\begin{equation*} A\longmapsto P^{-1}AP\text{,} \end{equation*}

which we call conjugation by \(P\). As the following theorem outlines, conjugation by an invertible \(P\) satisfies many useful properties, and we use these to relate the matrix \(A\) with the matrix \(P^{-1}AP\text{.}\)

Exercise.

Subsection 6.5.9 Examples: utility of diagonalizability

Suppose \(A\) is diagonalizable, so that \(D=P^{-1}AP\text{,}\) where \(D\) is diagonal with diagonal entries \(d_i\text{.}\) Using Theorem 6.5.11, we can now see how to translate statements about \(D\) (which are generally easy to prove) into statements about \(A\) (which otherwise might have been difficult to show).

To compute \(A^{n}\) (hard) we can just compute \(D^{n}\) (easy) and then observe that \(A=PDP^{-1}\text{,}\) and thus

\begin{equation*} A^{n}=(PDP^{-1})^{n}=PD^{n}P^{-1}\text{,} \end{equation*}

where the last equality follows from part (b) of Theorem 6.5.11; here we let \(P^{-1}\) assume the role of \(P\) in the theorem statement.

For example, let \(A=\begin{bmatrix}1\amp 3\\ 1\amp -1 \end{bmatrix}\text{.}\) Let's compute \(A^n\) for arbitrary \(n\text{.}\)

We have \(D=P^{-1}AP\text{,}\) where \(D=\begin{bmatrix}2\amp 0\\ 0\amp -2 \end{bmatrix}\text{,}\) and \(P=\begin{bmatrix}3\amp 1\\ 1\amp -1 \end{bmatrix}\text{.}\) (This is not obvious yet. Soon we will have the tools to see why this is so.)

Then \(A=PDP^{-1}\text{,}\) and thus

\begin{equation*} A^{n}=PD^{n}P^{-1}=P\begin{bmatrix}2^{n}\amp 0\\ 0\amp (-2)^{n} \end{bmatrix} P^{-1}=\frac{1}{4}\begin{bmatrix}3\cdot2^n+(-2)^n\amp 3\cdot 2^n-3(-2)^{n}\\ 2^{n}-(-2)^n\amp 2^n+3(-2)^{n} \end{bmatrix}\text{.} \end{equation*}
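A quick SymPy spot-check of this closed form (my own verification; note that diagonalize may list the eigenvalues in a different order than the slides do):

    from sympy import Matrix

    A = Matrix([[1, 3], [1, -1]])
    P, D = A.diagonalize()                  # e.g. D = diag(-2, 2)
    assert A**10 == P * D**10 * P.inv()     # spot-check A^n = P D^n P^-1 at n = 10
    print(A**10)                            # agrees with the closed form above at n = 10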

Subsection 6.5.10 Examples: utility of diagonalizability

Suppose \(A\) is diagonalizable, so that \(D=P^{-1}AP\text{,}\) where \(D\) is diagonal with diagonal entries \(d_i\text{.}\)

More generally, we have \(f(A)=f(PDP^{-1})=Pf(D)P^{-1}\) for any polynomial \(f(x)=a_nx^n+a_{n-1}x^{n-1}+\cdots +a_1x+a_0\text{.}\) Since \(D\) is diagonal, with diagonal entries \(d_i\text{,}\) it is easy to see that \(f(D)\) is also diagonal, with diagonal entries \(f(d_i)\text{.}\)

In particular we see that \(f(A)=\underset{n\times n}{\boldzero}\) if and only if \(f(D)=\underset{n\times n}{\boldzero}\text{,}\) and this holds if and only if \(f(d_i)=0\) for each diagonal entry \(d_i\) of \(D\text{.}\)

Take the matrix \(A\) from Example 6.5.12, and let \(f(x)=(x-2)(x+2)=x^2-4\text{.}\) Since \(f(2)=f(-2)=0\text{,}\) it follows that \(f(D)=f(A)=\underset{2\times 2}{\boldzero}\text{.}\) In other words, \(A^2-4I=\underset{2\times 2}{\boldzero}\text{,}\) as you can easily check.
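And a one-line check of that identity, if you'd rather not multiply matrices by hand:

    from sympy import Matrix, eye, zeros

    A = Matrix([[1, 3], [1, -1]])
    print(A**2 - 4*eye(2) == zeros(2, 2))   # True: f(A) = 0 for f(x) = x**2 - 4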

Subsection 6.5.11 Examples: utility of diagonalizability

Suppose \(A\) is diagonalizable, so that \(D=P^{-1}AP\text{,}\) where \(D\) is diagonal with diagonal entries \(d_i\text{.}\)

\(A\) has a square-root (i.e., a matrix \(B\) such that \(B^2=A\)) if and only if \(D\) has a square-root.

Indeed, suppose \(B^2=A\text{.}\) Set \(C=P^{-1}BP\text{.}\) Then \(C^2=P^{-1}B^2P=P^{-1}AP=D\text{.}\) Similarly, if \(C^2=D\text{,}\) then \(B^2=A\text{,}\) where \(B=PCP^{-1}\text{.}\)

As an example, the matrix \(A=\begin{bmatrix}0\amp -2\\ 1 \amp 3 \end{bmatrix}\) satisfies \(D=P^{-1}AP\text{,}\) where \(D=\begin{bmatrix}1\amp 0\\ 0\amp 2 \end{bmatrix}\text{,}\) and \(P=\begin{bmatrix}2\amp 1\\ -1\amp -1 \end{bmatrix}\text{.}\) Since \(C=\begin{bmatrix}1\amp 0\\ 0\amp \sqrt{2} \end{bmatrix}\) is a square-root of \(D\text{,}\) \(B=PCP^{-1}=\begin{bmatrix}2-\sqrt{2}\amp 2-2\sqrt{2}\\ -1+\sqrt{2}\amp -1+2\sqrt{2} \end{bmatrix}\) is a square-root of \(A\text{,}\) as you can easily check.

So when exactly does a diagonal matrix \(D\) have a square-root? Clearly, it is sufficient that \(d_i\geq 0\) for all \(i\text{,}\) as in the example above. Interestingly, this is not a necessary condition! Indeed, consider the following example:

\(\begin{bmatrix}-1\amp 0\\ 0\amp -1 \end{bmatrix} =\begin{bmatrix}0\amp -1\\ 1\amp 0 \end{bmatrix} ^2\text{.}\)
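Both square-roots above can be checked mechanically; here is a short SymPy sketch (the variable names are mine):

    from sympy import Matrix, sqrt

    A = Matrix([[0, -2], [1, 3]])
    P = Matrix([[2, 1], [-1, -1]])
    C = Matrix([[1, 0], [0, sqrt(2)]])   # a square-root of D = diag(1, 2)
    B = P * C * P.inv()
    print((B**2).expand() == A)          # True: B is a square-root of A

    J = Matrix([[0, -1], [1, 0]])
    print(J**2)                          # Matrix([[-1, 0], [0, -1]]): a square-root of -I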

Subsection 6.5.12 Properties of similarity

Before investigating the question of when a matrix is diagonalizable, we record a few more properties illustrating the close connection between similar matrices.

The proof of (d) follows. I leave the rest as an exercise.

By definition we have \(B=P^{-1}AP\) for some matrix \(P\text{.}\) We wish to show the characteristic polynomials \(p_A(t)\) and \(p_B(t)\) of the two matrices are equal. Compute:

\begin{align*} p_B(t)\amp =\amp \det(tI-B)\\ \amp =\amp \det(tI-P^{-1}AP)\\ \amp =\amp \det(tP^{-1}IP-P^{-1}AP) \ \text{ (since \(P^{-1}IP=I\))}\\ \amp =\amp \det(P^{-1}tIP-P^{-1}AP) \ \text{ (\(t\) behaves as scalar) }\\ \amp =\amp \det(P^{-1}(tI-A)P)\\ \amp =\amp \det(P^{-1})\det(tI-A)\det(P)\\ \amp =\amp (\det(P))^{-1}\det(P)\det(tI-A)\\ \amp =\amp \det(tI-A)=p_A(t)\text{.} \end{align*}
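This is also easy to confirm on an example, say with the matrices from Example 6.5.12 (a throwaway check, not a substitute for the proof above):

    from sympy import Matrix, symbols

    t = symbols('t')
    A = Matrix([[1, 3], [1, -1]])
    P = Matrix([[3, 1], [1, -1]])
    B = P.inv() * A * P
    print(A.charpoly(t).as_expr())                             # t**2 - 4
    print(A.charpoly(t).as_expr() == B.charpoly(t).as_expr())  # True: p_A(t) = p_B(t)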

Subsection 6.5.13 The true meaning of similarity

Hopefully Theorems 6.5.11 and 6.5.15 convince you that similar matrices (in the linear algebraic sense) are truly similar (in the usual sense).

There is, however, a deeper explanation for this. Namely, if \(A\) and \(A'\) are similar, then they are simply two different matrix representations of a common linear transformation!

In more detail: suppose we have \(A'=P^{-1}AP\text{.}\)

  • Let \(B\) be the standard basis of \(\R^n\text{,}\) and let \(B'\) be the basis of \(\R^n\) obtained by taking the columns of the invertible matrix \(P\text{.}\) Finally, let \(T=T_A\) be the matrix transformation associated to \(A\text{.}\)

  • Then \(A=[T]_B\text{,}\) \(P=\underset{B'\rightarrow B}{P}\text{,}\) and \(P^{-1}=\underset{B\rightarrow B'}{P}\text{.}\)

  • From the change of basis formula it follows that

    \begin{equation*} A'=P^{-1}AP=\underset{B\rightarrow B'}{P}[T]_B\underset{B'\rightarrow B}{P}=[T]_{B'} \end{equation*}

In other words, to say \(A\) and \(A'\) are similar is simply to say that they are different matrix representations of the same overlying linear transformation \(T\) (see the Holy Commutative Tent of Linear Algebra on the next slide). All their shared properties (same eigenvalues, same determinant, same trace, etc.) are simply the properties they inherit from this one overlying \(T\text{,}\) of which they are but earthly shadows.

There is one true \(T\text{!}\)

The previous theorem allows us to extend some of our matrix definitions to linear transformations.

Definition 6.5.16.

Let \(T\colon V\rightarrow V\) be linear, let \(B\) be any basis of \(V\text{,}\) and let \(A=[T]_B\text{.}\) We define the characteristic polynomial of \(T\) to be \(p(t)=\det(tI-A)\text{.}\)