The Cayley-Hamilton theorem states that any real or complex square matrix satisfies its own characteristic equation. Hamilton originally proved a version involving quaternions, which can be represented by real $4 \times 4$ matrices. A few years later, Cayley established it for $3 \times 3$ matrices. It was Frobenius who established the general case more than 20 years later. (The theorem is also valid for square matrices over any commutative ring.)

There are many nice proofs of Cayley-Hamilton. Doron Zeilberger’s exposition of the combinatorial proof of Howard Straubing comes to mind. To my taste, one of the nicest proofs, due to Charles A. McCarthy, uses a matrix version of Cauchy’s integral formula. Here I expand on McCarthy’s original proof, borrowing also from Leandro Cioletti’s exposition on the subject.

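We first state the theorem. Let $A$ be an $n \times n$ matrix over the real or complex field, $\mathbb{R}$ or $\mathbb{C}$. Let $I$ be the $n \times n$ identity matrix. The characteristic polynomial for the variable $\lambda$ and matrix $A$ is defined as

$$p(\lambda) = \det(\lambda I - A). \tag{1}$$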

The characteristic equation for $A$ is defined as

$$p(\lambda) = 0. \tag{2}$$

In the past this equation was sometimes known as the secular equation. The degree of $p$ is clearly $n$. The Cayley-Hamilton theorem states that

$$p(A) = 0. \tag{3}$$

In other words, $A$ satisfies its own characteristic equation.
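To make the statement concrete, here is a minimal Python sketch that checks $p(A) = 0$ exactly for one non-diagonalizable $3 \times 3$ matrix. The helper names (`char_poly`, `poly_at_matrix`) and the example matrix are my own illustrative choices. The characteristic polynomial is computed with the Faddeev-LeVerrier recursion, which is simply a convenient exact method and plays no role in McCarthy’s proof.

```python
from fractions import Fraction


def mat_mul(X, Y):
    """Product of two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def char_poly(A):
    """Coefficients [c_0, ..., c_{n-1}, 1] of det(lambda*I - A),
    computed exactly with the Faddeev-LeVerrier recursion."""
    n = len(A)
    coeffs = [Fraction(0)] * n + [Fraction(1)]
    M = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for k in range(1, n + 1):
        AM = mat_mul(A, M)
        c = -sum(AM[i][i] for i in range(n)) / k  # c_{n-k} = -tr(A M_k) / k
        coeffs[n - k] = c
        # Next auxiliary matrix: M_{k+1} = A M_k + c_{n-k} I
        M = [[AM[i][j] + (c if i == j else 0) for j in range(n)]
             for i in range(n)]
    return coeffs


def poly_at_matrix(coeffs, A):
    """Evaluate c_0 I + c_1 A + ... + c_n A^n with Horner's scheme."""
    n = len(A)
    P = [[Fraction(0)] * n for _ in range(n)]
    for c in reversed(coeffs):
        PA = mat_mul(P, A)
        P = [[PA[i][j] + (c if i == j else 0) for j in range(n)]
             for i in range(n)]
    return P


# Illustrative matrix: eigenvalue 2 has a 2x2 Jordan block, so A is not diagonalizable.
A = [[2, 1, 0],
     [0, 2, 1],
     [0, 0, 3]]

coeffs = char_poly(A)            # p(lambda) = lambda^3 - 7 lambda^2 + 16 lambda - 12
p_of_A = poly_at_matrix(coeffs, A)
print(all(entry == 0 for row in p_of_A for entry in row))  # True
```

Exact rational arithmetic is used so that the zero matrix comes out exactly, rather than up to floating-point error.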

Note that if $A$ is diagonal, (3) is clearly satisfied, because the diagonal entries are just the eigenvalues, which necessarily satisfy the characteristic equation. If $A$ is not diagonal but is diagonalizable, say $A = S D S^{-1}$ with $D$ diagonal, then the theorem is also clearly true: the determinant is invariant under similarity transformations, so $A$ and $D$ share the same characteristic polynomial, and $p(A) = S\,p(D)\,S^{-1} = 0$. But what about the general case, which includes all the non-diagonalizable matrices? This general case is what makes the theorem non-trivial to prove.

We can prove the theorem using a continuity argument. Recall that every matrix with non-degenerate eigenvalues is diagonalizable. So we can approximate every non-diagonalizable matrix to arbitrary precision by a diagonalizable matrix. This qualitative statement can be made precise. Specifically, the diagonalizable matrices are dense in the set of all complex square matrices. Since the determinant, and hence $p(A)$, is a continuous function of the entries of $A$, $p(A)$ cannot discontinuously jump away from zero as $A$ is varied continuously, which establishes Cayley-Hamilton for the general case. There are many variations of this theme.

The reason that I very much like McCarthy’s complex analytic proof is that it uses the Cauchy integral formula to implement this continuity argument without ever explicitly invoking continuity. This is possible because Cauchy’s integral formula allows a function to be calculated at a point without ever having to evaluate the function explicitly at that point. So the value of $p(A)$ can be calculated for non-diagonalizable matrices without actually having to compute $p(A)$ directly. It is a beautiful proof.

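In what follows, I give a step-by-step reproduction of McCarthy’s proof. I assume that readers are familiar with Cauchy’s integral formula for complex functions of a single complex variable. For our purposes, let $f : \mathbb{C} \to \mathbb{C}$ be entire. Then, for any point $a$ and any simple closed contour $\gamma$ that winds once counterclockwise around $a$, Cauchy’s integral formula states that

$$f(a) = \frac{1}{2\pi i} \oint_\gamma \frac{f(z)}{z - a}\, dz. \tag{4}$$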

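Proofs are found in standard texts. Here we will need the following version for matrices. Let $A$ be an $n \times n$ matrix with entries in $\mathbb{C}$. Then

$$f(A) = \frac{1}{2\pi i} \oint_\gamma f(z)\,(zI - A)^{-1}\, dz, \tag{5}$$

where $I$ is the $n \times n$ identity matrix and the contour $\gamma$ encloses all the eigenvalues of $A$. See the appendix below for a simple proof.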
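The matrix version of Cauchy’s formula can be checked numerically. The following is a minimal sketch, restricted to $2 \times 2$ matrices and to a circular contour $|z| = r$ discretized with the trapezoidal rule; the function names `inv2` and `cauchy_matrix`, the radius, and the number of sample points are all illustrative assumptions of mine.

```python
import cmath


def inv2(M):
    """Inverse of a 2x2 complex matrix via the adjugate formula."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def cauchy_matrix(f, A, radius, steps=400):
    """Approximate (1 / (2 pi i)) * integral of f(z) (zI - A)^{-1} dz
    over the circle |z| = radius, which must enclose the eigenvalues of A."""
    total = [[0j, 0j], [0j, 0j]]
    for k in range(steps):
        z = radius * cmath.exp(2j * cmath.pi * k / steps)
        R = inv2([[z - A[0][0], -A[0][1]],
                  [-A[1][0], z - A[1][1]]])
        # On the circle dz = i z dtheta, so dz / (2 pi i) becomes z / steps.
        w = f(z) * z / steps
        for i in range(2):
            for j in range(2):
                total[i][j] += w * R[i][j]
    return total


# A 2x2 Jordan block: not diagonalizable, single eigenvalue 1.
A = [[1, 1], [0, 1]]

cube = cauchy_matrix(lambda z: z ** 3, A, radius=3.0)
print(cube)        # numerically close to A^3 = [[1, 3], [0, 1]]

expA = cauchy_matrix(cmath.exp, A, radius=3.0)
print(expA)        # numerically close to e * [[1, 1], [0, 1]]
```

Note that the integrand is only ever evaluated on the circle, away from the eigenvalue, which is exactly the feature of the formula that the proof exploits.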

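Recall that the inverse of an invertible matrix $B$ is given by

$$B^{-1} = \frac{\operatorname{adj}(B)}{\det(B)}, \tag{6}$$

where $\operatorname{adj}(B)$ is the adjugate matrix, which is the transpose of the cofactor matrix. Hence, wherever $p(z) = \det(zI - A) \neq 0$, the inverse of $zI - A$ is given by

$$(zI - A)^{-1} = \frac{N(z)}{p(z)}, \tag{7}$$

where

$$N(z) = \operatorname{adj}(zI - A). \tag{8}$$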

This adjugate matrix contains entries that are polynomials in $z$. Hence, the entries of $N(z)$ are of finite degree in $z$. (In fact, due to the definition of the cofactor matrix they are of degree no larger than $n - 1$.)

Next observe that the entries of $N(z)$ do not have negative powers of $z$. Hence, by the residue theorem, the entries of $N(z)$ satisfy

$$\oint_\gamma N(z)\, dz = 0, \tag{9}$$

because the complex residues at all $z$ vanish.
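As a worked illustration (my own choice of example, not taken from McCarthy’s paper), consider the non-diagonalizable $2 \times 2$ Jordan block

$$A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}, \qquad zI - A = \begin{pmatrix} z - 1 & -1 \\ 0 & z - 1 \end{pmatrix}, \qquad p(z) = \det(zI - A) = (z - 1)^2.$$

Taking the adjugate gives

$$N(z) = \operatorname{adj}(zI - A) = \begin{pmatrix} z - 1 & 1 \\ 0 & z - 1 \end{pmatrix}, \qquad (zI - A)^{-1} = \frac{1}{(z - 1)^2} \begin{pmatrix} z - 1 & 1 \\ 0 & z - 1 \end{pmatrix}.$$

The entries of $N(z)$ are polynomials of degree at most $n - 1 = 1$, so each of them integrates to zero around any closed contour. The double pole of $(zI - A)^{-1}$ at $z = 1$ is cancelled exactly by the factor $p(z)$. For comparison, direct substitution gives $p(A) = (A - I)^2 = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}^2 = 0$.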

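The Cayley-Hamilton theorem now follows from (5), (7) and (9):

$$p(A) = \frac{1}{2\pi i} \oint_\gamma p(z)\,(zI - A)^{-1}\, dz = \frac{1}{2\pi i} \oint_\gamma N(z)\, dz = 0. \tag{10}$$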

**Appendix**

There are several ways to prove (5). A common approach is to show convergence. Here I instead use formal power series, without worrying about convergence, since the latter can be checked *a posteriori*.

Taylor expanding $1/(1 - x)$ about $x = 0$ we obtain the series

$$\frac{1}{1 - x} = \sum_{n=0}^{\infty} x^n. \tag{11}$$

It should not therefore be a surprise that

$$(I - A)^{-1} = \sum_{n=0}^{\infty} A^n. \tag{12}$$

The proof of the above is simple:

$$(I - A) \sum_{n=0}^{\infty} A^n = \sum_{n=0}^{\infty} A^n - \sum_{n=1}^{\infty} A^n = I.$$

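If we now put $A/z$ in place of $A$ we get

$$(zI - A)^{-1} = \frac{1}{z} \left( I - \frac{A}{z} \right)^{-1} = \sum_{n=0}^{\infty} \frac{A^n}{z^{n+1}}. \tag{13}$$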

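Now consider that, for any integer $m \geq 0$,

$$\frac{1}{2\pi i} \oint_\gamma z^m\,(zI - A)^{-1}\, dz = \sum_{n=0}^{\infty} A^n\, \frac{1}{2\pi i} \oint_\gamma z^{m-n-1}\, dz. \tag{14}$$

By the residue theorem, this last expression contains nonzero contributions only when $n = m$. So we obtain

$$A^m = \frac{1}{2\pi i} \oint_\gamma z^m\,(zI - A)^{-1}\, dz. \tag{15}$$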

Observe that we now have an expression that we can substitute for $A^m$ in any series expansion involving powers of $A$. We are ready to prove the claim (5).
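Although the argument treats the series formally, the resolvent expansion $(zI - A)^{-1} = \sum_{n \ge 0} A^n / z^{n+1}$ behind it is easy to sanity-check numerically when $|z|$ is large compared to the eigenvalues of $A$. The sketch below uses a hypothetical helper `resolvent_series` and an example matrix and value of $z$ that I picked for illustration; it multiplies the truncated series by $zI - A$ and compares the result with the identity.

```python
def mat_mul(X, Y):
    """Product of two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def resolvent_series(A, z, terms=60):
    """Partial sum of A^n / z^(n+1) for n = 0, ..., terms - 1."""
    n = len(A)
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # A^0
    total = [[0.0] * n for _ in range(n)]
    z_power = z  # holds z^(k+1) at step k
    for _ in range(terms):
        for i in range(n):
            for j in range(n):
                total[i][j] += power[i][j] / z_power
        power = mat_mul(power, A)
        z_power *= z
    return total


# Illustrative choice: a Jordan block with eigenvalue 1, and z = 4 well outside it.
A = [[1.0, 1.0], [0.0, 1.0]]
z = 4.0

S = resolvent_series(A, z)
zI_minus_A = [[(z if i == j else 0.0) - A[i][j] for j in range(2)]
              for i in range(2)]
product = mat_mul(zI_minus_A, S)
print(product)  # numerically close to the 2x2 identity matrix
```

For this example the exact resolvent has entries $1/(z-1) = 1/3$ on the diagonal and $1/(z-1)^2 = 1/9$ above it, which the partial sum reproduces to machine precision.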

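Since $f$ is entire, its Laurent series has vanishing principal part. We can thus write

$$f(z) = \sum_{m=0}^{\infty} a_m z^m, \tag{16}$$

so that

$$f(A) = \sum_{m=0}^{\infty} a_m A^m. \tag{17}$$

Invoking (15) we arrive at the claim,

$$f(A) = \sum_{m=0}^{\infty} a_m\, \frac{1}{2\pi i} \oint_\gamma z^m\,(zI - A)^{-1}\, dz = \frac{1}{2\pi i} \oint_\gamma f(z)\,(zI - A)^{-1}\, dz. \tag{18}$$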