
Section 6.2 Matrix representations of linear transformations

We have seen how the coordinate vector map can be used to translate a linear algebraic question posed about a finite-dimensional vector space \(V\) into a question about \(\R^n\text{,}\) where we have many computational algorithms at our disposal.

We would like to extend this technique to linear transformations \(T\colon V\rightarrow W\text{,}\) where both \(V\) and \(W\) are finite-dimensional. The basic idea, to be fleshed out below, can be described as follows:

  1. Pick a basis \(B\) for \(V\text{,}\) and a basis \(B'\) for \(W\text{.}\)

  2. “Identify” \(V\) with \(\R^n\) and \(W\) with \(\R^m\) using the coordinate vector isomorphisms \([\hspace{5pt}]_B\) and \([\hspace{5pt}]_{B'}\text{,}\) respectively.

  3. “Model” the linear transformation \(T\colon V\rightarrow W\) with a certain linear transformation \(T_A\colon \R^n\rightarrow \R^m\text{.}\)

The matrix \(A\) defining \(T_A\) will be called the matrix representing \(T\) with respect to our choice of basis \(B\) for \(V\) and \(B'\) for \(W\).

In what sense does \(A\) “model” \(T\text{?}\) All the properties of \(T\) we are interested in (\(\NS T\text{,}\) \(\nullity T\text{,}\) \(\im T\text{,}\) \(\rank T\text{,}\) etc.) are perfectly mirrored by the matrix \(A\text{.}\)

As a result, this technique allows us to answer questions about the original \(T\) essentially by applying a relevant matrix algorithm to \(A\text{.}\)

Subsection 6.2.1 Matrix representations of linear transformations

Definition 6.2.1.

Let \(V\) and \(W\) be vector spaces with ordered bases \(B=(\boldv_1, \boldv_2, \dots, \boldv_n)\) and \(B'=(\boldw_1, \boldw_2, \dots, \boldw_m)\text{,}\) respectively. Given a linear transformation \(T\colon V\rightarrow W\text{,}\) the matrix representing \(T\) with respect to \(B\) and \(B'\) is the \(m\times n\) matrix \([T]_B^{B'}\) whose \(j\)-th column is \([T(\boldv_j)]_{B'}\text{,}\) considered as a column vector: i.e.,

\begin{equation*} [T]_B^{B'}=\begin{amatrix}[cccc]\vert \amp \vert \amp \amp \vert \\ \left[T(\boldv_1)\right]_{B'}\amp [T(\boldv_2)]_{B'}\amp \dots \amp [T(\boldv_n)]_{B'} \\ \vert \amp \vert \amp \amp \vert \end{amatrix}\text{.} \end{equation*}

In the special case where \(W=V\) and we pick \(B'=B\) we write simply \([T]_B\text{.}\)

Theorem 6.2.2.

Let \(T\colon V\rightarrow W\) be a linear transformation, and let \(B\) and \(B'\) be ordered bases of \(V\) and \(W\text{,}\) respectively, with \(\dim V=n\) and \(\dim W=m\text{.}\) Then \([T]_B^{B'}\) is the unique \(m\times n\) matrix satisfying

\begin{equation} [T]_B^{B'}[\boldv]_B=[T(\boldv)]_{B'}\tag{6.2.1} \end{equation}

for all \(\boldv\in V\text{.}\)

Proof.

We must prove two things: (1) the matrix \(A=[T]_{B}^{B'}\) satisfies (6.2.1); (2) if \(A\) satisfies \(A[\boldv]_B=[T(\boldv)]_{B'}\) for all \(\boldv\in V\text{,}\) then \(A=[T]_{B}^{B'}\text{.}\)

Assume we have \(B=(\boldv_1, \boldv_2, \dots, \boldv_n)\text{.}\)

  1. By definition we have

    \begin{equation*} [T]_B^{B'}=\begin{amatrix}[cccc]\vert \amp \vert \amp \amp \vert \\ \left[T(\boldv_1)\right]_{B'}\amp [T(\boldv_2)]_{B'}\amp \dots \amp [T(\boldv_n)]_{B'} \\ \vert \amp \vert \amp \amp \vert \end{amatrix}\text{.} \end{equation*}

    Given any \(\boldv\in V\text{,}\) we can write

    \begin{equation*} \boldv=c_1\boldv_1+c_2\boldv_2+\dots +c_n\boldv_n \end{equation*}

    for some \(c_i\in \R\text{.}\) Then

    \begin{align*} [T]_{B}^{B'}[\boldv]_B \amp= [T]_{B}^{B'} \begin{bmatrix} c_1\\ c_2\\ \vdots \\ c_n \end{bmatrix} \\ \amp=c_1[T(\boldv_1)]_{B'}+c_2[T(\boldv_2)]_{B'}+\cdots +c_n[T(\boldv_n)]_{B'} \amp (\text{column method})\\ \amp = [c_1T(\boldv_1)+c_2T(\boldv_2)+\cdots +c_nT(\boldv_n)]_{B'} \amp (\knowl{./knowl/th_coordinates.html}{\text{6.1.9}})\\ \amp=[T(c_1\boldv_1+c_2\boldv_2+\cdots +c_n\boldv_n)]_{B'} \amp (T \text{ is linear})\\ \amp =[T(\boldv)]_{B'}\text{,} \end{align*}

    as desired.

  2. Assume \(A\) satisfies

    \begin{equation*} A[\boldv]_B=[T(\boldv)]_{B'} \end{equation*}

    for all \(\boldv\in V\text{.}\) Then in particular we have

    \begin{equation} A[\boldv_i]_B=[T(\boldv_i)]_{B'}\label{eq_matrixrep_proof}\tag{6.2.2} \end{equation}

    for all \(1\leq i\leq n\text{.}\) Since \(\boldv_i\) is the \(i\)-th element of \(B\text{,}\) we have \([\boldv_i]_B=\bolde_i\text{,}\) the \(i\)-th standard basis element of \(\R^n\text{.}\) Using the column method (3.1.19), we see that

    \begin{equation*} A[\boldv_i]_B=A\bolde_i=\boldc_i, \end{equation*}

    where \(\boldc_i\) is the \(i\)-th column of \(A\text{.}\) Thus (6.2.2) implies that the \(i\)-th column of \(A\) is equal to \([T(\boldv_i)]_{B'}\text{,}\) the \(i\)-th column of \([T]_B^{B'}\text{,}\) for all \(1\leq i\leq n\text{.}\) Since \(A\) and \([T]_{B}^{B'}\) have identical columns, we conclude that \(A=[T]_{B}^{B'}\text{,}\) as desired.

Remark 6.2.3. Uniqueness of \([T]_B^{B'}\).

The uniqueness claim of Theorem 6.2.2 provides an alternative way of computing \([T]_{B}^{B'}\text{:}\) namely, simply find an \(m\times n\) matrix \(A\) that satisfies

\begin{equation*} A[\boldv]_B=[T(\boldv)]_{B'} \end{equation*}

for all \(\boldv\in V\text{.}\) Since there is only one such matrix, we must have \(A=[T]_B^{B'}\text{.}\)
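For instance, since the \(n\times n\) identity matrix satisfies \(I_n[\boldv]_B=[\boldv]_B\) for all \(\boldv\in V\text{,}\) the uniqueness claim tells us immediately that the matrix representing the identity transformation \(\operatorname{id}_V\colon V\rightarrow V\) with respect to a single basis \(B\) is \([\operatorname{id}_V]_B=I_n\text{,}\) no matter which basis \(B\) we choose.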

Remark 6.2.4. Commutative diagram for \([T]_B^{B'}\).

Let \(T\colon V\rightarrow W\text{,}\) \(B\text{,}\) and \(B'\) be as in Theorem 6.2.2. The defining property of \([T]_B^{B'}\) ((6.2.1)) can be summarized by saying that the following diagram is commutative.

Figure 6.2.5. Commutative diagram for \([T]_B^{B'}\)

That the diagram is commutative means that, starting with an element \(\boldv\in V\) in the top left of the diagram, whether we travel to the bottom right by first applying \(T\) and then applying \([\hspace{5pt}]_{B'}\) (“go right, then down”), or by first applying \([\hspace{5pt}]_B\) and then applying \([T]_B^{B'}\) (“go down, then right”), we get the same result! (The bottom map should technically be labeled \(T_A\text{,}\) where \(A=[T]_B^{B'}\text{,}\) but this would detract from the elegance of the diagram.)

Remark 6.2.6. How \([T]_B^{B'}\) represents \(T\).

In what precise sense does the matrix \(A=[T]_{B}^{B'}\) represent or model the linear transformation \(T\text{?}\) To answer this question we enumerate the key features of Figure 6.2.5:

  • The diagram is commutative: i.e.,

    \begin{equation*} A[\boldv]_B=[T(\boldv)]_{B'} \end{equation*}

    for all \(\boldv\in V\text{.}\)

  • The vertical coordinate vector maps are isomorphisms.

These two properties together allow us to translate any linear algebraic fact about \(T\) to an equivalent fact about the matrix \(A\text{.}\) We list a few here:

  • \(\boldv\in \NS T\) if and only if \([\boldv]_B\in \NS A\)

  • \(\boldw\in \im T\) if and only if \([\boldw]_{B'}\in \CS A=\im T_A\)

  • \(\{\boldv_1,\boldv_2,\dots, \boldv_r\}\) is a basis of \(\NS T\) if and only if \(\{[\boldv_1]_B, [\boldv_2]_B, \dots, [\boldv_r]_B\}\) is a basis of \(\NS A\)

  • \(\{\boldw_1,\boldw_2,\dots, \boldw_s\}\) is a basis of \(\im T\) if and only if \(\{[\boldw_1]_{B'}, [\boldw_2]_{B'}, \dots, [\boldw_s]_{B'}\}\) is a basis of \(\CS A=\im T_A\text{.}\)

  • \(\nullity T=\nullity A\) and \(\rank T=\rank A\)

  • \(T\) is an isomorphism if and only if \(A\) is invertible.

Subsection 6.2.2 Example

Define \(T\colon P_{3}\rightarrow P_{2}\) by \(T(p(x))=p'(x)\text{.}\) Compute \(A=[T]_{B}^{B'}\text{,}\) where \(B\) and \(B'\) are the standard bases for \(P_3\) and \(P_2\text{,}\) respectively.

Use \(A\) to determine \(\NS T\) and \(\range T\text{.}\)

Solution. The matrix \(A\) will be \(3\times 4\text{.}\) Denote by \(\boldc_j\) the \(j\)-th column of \(A\text{.}\) We use the formula for \(\boldc_j\text{:}\)

\begin{align*} \boldc_1\amp =[T(1)]_{B'}=[0]_{B'}=\begin{bmatrix} 0\\ 0\\ 0 \end{bmatrix} \amp \boldc_2\amp =[T(x)]_{B'}=[1]_{B'}=\begin{bmatrix} 1\\ 0\\ 0 \end{bmatrix}\\ \boldc_3\amp =[T(x^2)]_{B'}=[2x]_{B'}=\begin{bmatrix} 0\\ 2\\ 0 \end{bmatrix} \amp \boldc_4\amp =[T(x^3)]_{B'}=[3x^2]_{B'}=\begin{bmatrix} 0\\ 0\\ 3 \end{bmatrix} \end{align*}

Thus \(A=\begin{bmatrix}0\amp 1\amp 0\amp 0\\ 0\amp 0\amp 2\amp 0\\ 0\amp 0\amp 0\amp 3 \end{bmatrix}\text{.}\)

We see easily that \(\NS A =\Span(\{(1,0,0,0)\})\) and \(\range A=\CS A=\R^3\text{.}\) Translating everything back to the original spaces, we see that \(\NS T=\Span(\{1\})=\{\text{constant polynomials}\}\) and \(\range T=P_2\text{.}\)
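This last step, extracting \(\NS A\) and \(\CS A\) from \(A\text{,}\) is exactly the kind of computation we can hand off to a machine. Below is a minimal SymPy sketch (an illustration only, not part of the solution above); the list B and the helper coords_Bprime are ad hoc names standing in for the standard basis of \(P_3\) and the coordinate map \([\hspace{5pt}]_{B'}\text{.}\)

import sympy as sp

x = sp.symbols('x')

# Standard basis B = (1, x, x^2, x^3) of P_3; T is differentiation.
B = [sp.Integer(1), x, x**2, x**3]

def coords_Bprime(p):
    # B'-coordinate vector of p in P_2: its coefficients of 1, x, x^2.
    cs = sp.Poly(p, x).all_coeffs()[::-1]        # constant term first
    return sp.Matrix(cs + [0] * (3 - len(cs)))   # pad up to length 3

# Assemble [T]_B^{B'} column by column, then apply matrix algorithms to it.
A = sp.Matrix.hstack(*[coords_Bprime(sp.diff(p, x)) for p in B])
print(A)              # Matrix([[0, 1, 0, 0], [0, 0, 2, 0], [0, 0, 0, 3]])
print(A.nullspace())  # one basis vector (1, 0, 0, 0), i.e. NS T = Span({1})
print(A.rank())       # 3, so CS A = R^3, i.e. range T = P_2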

Subsection 6.2.3 Example

Define \(T\colon M_{22}\rightarrow M_{22}\) by \(T(A)=A^T+A\text{.}\) Let \(B\) be the standard basis of \(M_{22}\text{,}\) and let

\begin{equation*} B'=\{ \begin{bmatrix}0\amp 1\\ -1\amp 0 \end{bmatrix} , \begin{bmatrix}1\amp 0\\ 0\amp 0 \end{bmatrix} , \begin{bmatrix}0\amp 1\\ 1\amp 0 \end{bmatrix} , \begin{bmatrix}0\amp 0\\ 0\amp 1 \end{bmatrix} \}\text{.} \end{equation*}
  1. Compute \(A=[T]_B\text{.}\)

  2. Compute \(A'=[T]_{B'}\text{.}\)

Solution.

\begin{equation*} A=\begin{bmatrix}2\amp 0\amp 0\amp 0\\ 0\amp 1\amp 1\amp 0\\ 0\amp 1\amp 1\amp 0\\ 0\amp 0\amp 0\amp 2 \end{bmatrix} , \qquad A'=\begin{bmatrix}0\amp 0\amp 0\amp 0\\ 0\amp 2\amp 0\amp 0\\ 0\amp 0\amp 2\amp 0\\ 0\amp 0\amp 0\amp 2 \end{bmatrix} \end{equation*}
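To see where these entries come from, consider for instance the second element \(\begin{bmatrix}0\amp 1\\ 0\amp 0 \end{bmatrix}\) of the standard basis \(B\text{:}\) we have

\begin{equation*} T\left(\begin{bmatrix}0\amp 1\\ 0\amp 0 \end{bmatrix}\right)=\begin{bmatrix}0\amp 0\\ 1\amp 0 \end{bmatrix}+\begin{bmatrix}0\amp 1\\ 0\amp 0 \end{bmatrix}=\begin{bmatrix}0\amp 1\\ 1\amp 0 \end{bmatrix}\text{,} \end{equation*}

whose coordinate vector with respect to \(B\) is \((0,1,1,0)\text{,}\) the second column of \(A\text{.}\) As for \(A'\text{,}\) note that the first element of \(B'\) is skew-symmetric, so \(T\) sends it to the zero matrix, while the remaining three elements of \(B'\) are symmetric, so \(T\) sends each of them to twice itself; this explains the diagonal form of \(A'\text{.}\)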

Moral: our choice of basis affects the matrix representing \(T\text{,}\) and some choices are better than others!

Subsection 6.2.4 \(\R^n\) revisited

Consider the special case of a linear transformation \(T\colon \R^n\rightarrow \R^m\text{.}\) We know that in this case we have \(T=T_A\text{,}\) where

\begin{equation*} A= \begin{bmatrix}\vert\amp \vert\amp \cdots \amp \vert \\ T(\bolde_1)\amp T(\bolde_2)\amp \cdots \amp T(\bolde_n)\\ \vert\amp \vert\amp \cdots \amp \vert \end{bmatrix}\text{.} \end{equation*}

In light of our recent discussion we recognize this as simply \(A=[T]_{B}^{B'}\text{,}\) where \(B,B'\) are the standard bases of \(\R^n\) and \(\R^m\text{.}\)

This is certainly the most direct way of associating a matrix to the transformation \(T\) in this case, but it raises the question of whether another choice of bases gives us a better matrix representation!

The next example explores this question.

Subsection 6.2.5 Example

Let \(W\colon x+y+z=0\) be the plane in \(\R^3\) perpendicular to \(\boldn=(1,1,1)\text{,}\) and consider the orthogonal projection transformation \(T=\text{ proj } _W\colon \R^3\rightarrow \R^3\text{.}\)

The recipe from the previous subsection tells us that \(\text{ proj } _W=T_A\text{,}\) where \(A=\begin{bmatrix}2/3 \amp -1/3\amp -1/3\\-1/3\amp 2/3\amp -1/3\\ -1/3\amp -1/3\amp 2/3 \end{bmatrix}\text{.}\)
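Indeed, recalling that projecting onto \(W\) amounts to subtracting off the component along the normal vector, \(\text{ proj } _W(\boldx)=\boldx-\frac{\boldx\cdot\boldn}{\boldn\cdot\boldn}\,\boldn\text{,}\) the first column of \(A\) is

\begin{equation*} T(\bolde_1)=(1,0,0)-\frac{1}{3}(1,1,1)=\left(\tfrac{2}{3},-\tfrac{1}{3},-\tfrac{1}{3}\right)\text{,} \end{equation*}

and the other two columns are computed in the same way.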

This \(A\) is nothing more than \([T]_B\text{,}\) where \(B=\{\bolde_1,\bolde_2,\bolde_3\}\) is the standard basis of \(\R^3\text{.}\) We ask: Is there another basis \(B'\) for which the matrix \(A'=[T]_{B'}\) is simpler?

Yes! We build a basis that pays more attention to the geometry involved in defining \(T\text{.}\) Start first with a basis of the plane \(W\text{:}\) the set \(\{\boldv_1=(1,-1,0),\boldv_2=(0,1,-1)\}\) will do. Now extend to a basis of \(\R^3\text{.}\) We need only add a vector not already in \(W\text{:}\) the normal vector \(\boldv_3=(1,1,1)\) to the plane is a natural choice.

Thus we consider the basis \(B'=\{\boldv_1,\boldv_2, \boldv_3\}\) and compute \(A'=[\text{ proj } _W]_{B'}\text{.}\) Since \(\boldv_1\) and \(\boldv_2\) lie in \(W\text{,}\) the projection fixes them, while \(\boldv_3=\boldn\) is orthogonal to \(W\) and is sent to \(\boldzero\text{.}\) Thus

\begin{equation*} A'= \begin{bmatrix}\vert\amp \vert\amp \vert\\ [T(\boldv_1)]_{B'}\amp [T(\boldv_2)]_{B'}\amp [T(\boldv_3)]_{B'}\\ \vert\amp \vert\amp \vert \end{bmatrix} = \begin{bmatrix}\vert\amp \vert\amp \vert\\ [\boldv_1]_{B'}\amp [\boldv_2]_{B'}\amp [\boldzero]_{B'}\\ \vert\amp \vert\amp \vert \end{bmatrix} = \begin{bmatrix}1\amp 0\amp 0\\ 0\amp 1\amp 0\\ 0\amp 0\amp 0 \end{bmatrix}\text{.} \end{equation*}

Wow, \(A'\) is way simpler! How can both of these matrices “represent” the same linear transformation?

To summarize: let \(W\colon x+y+z=0\) be the plane in \(\R^3\) perpendicular to \(\boldn=(1,1,1)\text{,}\) and let \(T=\text{ proj } _W\colon \R^3\rightarrow \R^3\) be the orthogonal projection onto \(W\text{.}\)

Two different bases: \(B=\{\bolde_1,\bolde_2,\bolde_3\}\) and \(B'=\{\boldv_1=(1,-1,0),\boldv_2=(0,1,-1), \boldv_3=(1,1,1)\}\text{.}\)

Two different matrix representations:

\(A=[T]_B=\frac{1}{3}\begin{bmatrix}2 \amp -1\amp -1\\-1\amp 2\amp -1\\ -1\amp -1\amp 2 \end{bmatrix}\text{,}\) \(A'=[T]_{B'}=\begin{bmatrix}1\amp 0\amp 0\\ 0\amp 1\amp 0\\ 0\amp 0\amp 0 \end{bmatrix}\text{.}\)

The simpler matrix \(A'\) gives us a clear conceptual understanding of this orthogonal projection.

For example, we see that \(\CS A'=\Span(\{(1,0,0),(0,1,0)\})\) and \(\NS A'=\Span(\{(0,0,1)\})\text{,}\) and furthermore \(A'\) acts as the identity on \(\CS A'\text{,}\) and as the zero transformation on \(\NS A'\text{.}\)

Using \([\hspace{5pt}]_{B'}^{-1}\) we can translate this information back to \(T=\text{ proj } _W\text{.}\) Namely, \(\range T=\Span(\{\boldv_1,\boldv_2\})=W\text{,}\) \(\NS T=\Span(\{\boldv_3\})=\Span(\{\boldn\})\text{,}\) and furthermore, \(T\) acts as the identity on \(W\) and as the zero transformation on \(\Span(\{\boldn\})\text{.}\)
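As a quick check (a minimal sketch, not from the text), we can verify with SymPy that \(A\) really does fix \(\boldv_1\) and \(\boldv_2\) and annihilate \(\boldv_3\text{,}\) exactly as the simpler matrix \(A'\) predicts.

import sympy as sp

# A = [T]_B for T = proj_W with respect to the standard basis, as computed above.
A = sp.Rational(1, 3) * sp.Matrix([[ 2, -1, -1],
                                   [-1,  2, -1],
                                   [-1, -1,  2]])
v1, v2, v3 = sp.Matrix([1, -1, 0]), sp.Matrix([0, 1, -1]), sp.Matrix([1, 1, 1])

print(A * v1 == v1)              # True: T acts as the identity on v1 (v1 lies in W)
print(A * v2 == v2)              # True: T acts as the identity on v2 (v2 lies in W)
print(A * v3 == sp.zeros(3, 1))  # True: T sends the normal vector n to zero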

However, if we actually want an explicit formula for computing the orthogonal projection of a vector \(\boldx\in \R^3\) onto \(W\text{,}\) we are better off using \(A\text{,}\) since we have \(\proj{\boldx}{W}=A\boldx\text{.}\)

So both representations have their own particular virtue! In the next section we develop a means for fluidly going back and forth between the two.
