A complex analytic proof of the Cayley-Hamilton theorem

The Cayley-Hamilton theorem states that any real or complex square matrix satisfies its own characteristic equation. Hamilton originally proved a version involving quaternions, which can be represented by {4\times 4} real matrices. A few years later, Cayley established it for {3\times 3} matrices. It was Frobenius who established the general case more than 20 years later. (The theorem is also valid for a matrix over commutative rings in general.)

There are many nice proofs of Cayley-Hamilton. Doron Zeilberger‘s exposition of the combinatorial proof of Howard Straubing comes to mind. To my taste, one of the nicest proofs, due to Charles A. McCarthy, uses a matrix version of Cauchy’s integral formula. Here I expand on McCarthy’s original proof, and also have borrowed from  Leandro Cioletti‘s exposition on the subject.

We first state the theorem. Let {A} be an {n\times n} matrix over the real or complex fields {\Bbb R} or {\Bbb C}. Let {I_n} be the identity matrix. The characteristic polynomial {p(t)} for the variable {t} and matrix {A} is defined as

\displaystyle p(t)= \det (t I_n -A )~. \ \ \ \ \ (1)

The charactetristic equation for {A} is defined as

\displaystyle p(t)=0~. \ \ \ \ \ (2)

In the past this equation was sometimes known as the secular equation. The degree of {p} is clearly {n}. The Cayley-Hamilton theorem states that

\displaystyle p(A) =0~. \ \ \ \ \ (3)

In other words {A} satisfies its own characteristic equation.

Note that if {A} is diagonal, {p(A)=0} is clearly satisfied, because the diagonal entries {A_{ii}} are just the eigenvalues, which necessarily satisfy the characteristic equation. If {A} is not diagonal but is diagonalizable, then also the theorem is clearly true, because the determinant is invariant under similarity transformations. But what about the general case, which includes all the non-diagonalizable matrices? This general case is what makes the theorem non-trivial to prove.

We can prove the theorem using a continuity argument. Recall that every matrix {A} with non-degenerate eigenvalues is diagonalizable. So we can approximate every non-diagonalizable matrix to arbitrary precision in terms of a diagonalizable matrix. This qualitative statement can be made precise. Specifically, the diagonalizable matrices are dense in the set of all square matrices. Since the determinant is a continuous function of {A}, therefore {p(A)} cannot discontinuously jump away from zero as {A} is varied continuously — thus establishing Cayley-Hamilton for the general case. There are many variations of this theme.

The reason that I very much like McCarthy’s complex analytic proof is that it uses the Cauchy integral formula to implement this continuity argument without ever explicitly invoking continuity. This is possible because Cauchy’s integral formula allows a function to be calculated at a point without ever having to evaluate the function explicitly at that point. So the value of {p(A)} can be calculated for non-diagonalizable matrices without actually having to compute {p(A)} directly. It is a beautiful proof.

In what follows, I give a step-by-step reproduction of McCarthy’s proof. I assume that readers have familiarity with Cauchy’s integral formula for complex functions of a single complex variable. For our purposes, let {f: \Bbb C \rightarrow \Bbb C} be entire. Then Cauchy’s integral formula states that

\displaystyle f(a) = \frac 1 {2 \pi i } \oint \frac{f(z)}{z-a} ~dz ~. \ \ \ \ \ (4)

Proofs are found in standard texts. Here we will need the following version for matrices. Let {A} be an {n\times n} matrix with entries in {\Bbb C}. Then

\displaystyle f(A) = \frac 1 {2 \pi i } \oint \frac{f(z)}{z I_n -A} ~dz ~, \ \ \ \ \ (5)

where {I_n} is the identity matrix. See the appendix below for a simple proof.

Recall that the inverse {B^{-1}} of an invertible matrix {B} is given by

\displaystyle B^{-1} = \frac {{\rm adj} (B)} {\det (B)} \ \ \ \ \ (6)

wbere {{\rm adj}(B)} is the adjugate matrix which is the transpose of the cofactor matrix. Hence, the inverse of {(z I_n - A)} is given by

\displaystyle (z I_n - A)^{-1} = \frac M {\det (z I_n-A)} \ \ \ \ \ (7)

where

\displaystyle M= {{\rm adj} (z I_n-A)} ~. \ \ \ \ \ (8)

This adjugate matrix {M} contains entries that are polynomials in {z}. Hence, the entries of {M} are of finite degree in {z}. (In fact, due to the definition of the cofactor matrix they are of degree no larger than {n-1}.)

Next observe that the entries of {M} do not have negative powers of {z}. Hence, by the residue theorem, the entries {M_{ij}} of {M} satisfy

\displaystyle \frac 1 {2 \pi i } \oint M_{ij} ~dz =0 ~, \ \ \ \ \ (9)

because the complex residues at {z=0} all vanish.

The Cayley-Hamilton theorem now follows from (5), (7) and (9) :

\displaystyle \begin{array}{rcl} p(A) &=& \displaystyle \frac 1 {2 \pi i } \oint {p(z)} \frac M {\det (z I_n-A)} ~dz \\ &=& \displaystyle\frac 1 {2 \pi i } \oint M \frac {\det (z I_n-A) } {\det (z I_n-A)} ~dz \\ &=& \displaystyle \frac 1 {2 \pi i } \oint M ~dz =0 ~. \end{array}

\square

Appendix

There are several ways to prove (5). A common approach is to show convergence. Here I instead use formal power series, without worrying about convergence, since the latter can be checked a posteriori.

Taylor expanding {(1-x)^{-1}} we obtain the series

\displaystyle { 1 \over (1-x)} = \sum_{j=0}^\infty x^j ~.\ \ \ \ \ (10)

It should not therefore be a surprise that

\displaystyle { I_n \over (I_n-A)} = \sum_{j=0}^\infty A^j ~. \ \ \ \ \ (11)

The proof of the above is simple:

\displaystyle (I_n -A) \sum_{j=0}^\infty A^j = I_n \sum_{j=0}^\infty A^j - \sum_{j=1}^\infty A^j = I_n A^0 = I_n \ \ \ \ \ (12)

If we now put {A/z} in place of {A} we get

\displaystyle { I_n \over (zI_n-A)} = \sum_{j=0}^\infty \frac{A^j}{z^{j+1}} ~. \ \ \ \ \ (13)

Now consider that

\displaystyle \frac 1 {2 \pi i} \oint {z^k \over (zI_n-A)} dz = \frac 1 {2 \pi i} \oint \sum_{j=0}^\infty \frac{A^{j}}{z^{j+1-k}} dz ~. \ \ \ \ \ (14)

By the residue theorem, this last expression contains nonzero contributions only when {j=k}. So we obtain

\displaystyle \frac 1 {2 \pi i} \oint \sum_{j=0}^\infty {z^k \over (zI_n-A)} dz = A^k \ \ \ \ \ (15)

Observe that we now have an expression that we can substitute for {A^k} in any series expansioin involving powers of {A}. We are ready to prove the claim (5).

Since {f} is entire, its Laurent series has vaninishing principal part. We can thus write

\displaystyle f(z) = \sum_{j=0}^\infty a_j z^j ~~, \ \ \ \ \ (16)

so that

\displaystyle f(A) = \sum_{j=0}^\infty a_j A^j ~~, \ \ \ \ \ (17)

Invoking (15) we arrive at the claim,

\displaystyle f(A) = \sum_{j=0}^\infty a_j \frac 1 {2 \pi i} \oint {z^j \over (zI_n-A)} dz = \frac 1 {2 \pi i} \oint {f(z) \over (zI_n-A)} dz ~. \ \ \ \ \ (18)

\square