In this blog post, I derive, step by step, the exact partition function for the ferromagnetic Ising model on the square lattice. The result is celebrated as “Onsager’s solution” of the 2-D Ising model. It was originally derived by Lars Onsager in 1942 and published in 1944 in Physical Review . That paper revolutionized the study of phase transitions and what we now call critical phenomena .
Somewhat ironically, I first heard about the Ising model when I was working in industry. I was 20 and held a summer job at what was then known as British Telecom Research Labs (BTRL), near Ipswich in the UK. This was before I had ever seen a cell phone or heard of the Internet (although I knew about BITNET and JANET). I worked there in the summer of 1990 and again for a month or so around April 1991. My job at BT involved writing C implementations of multilayer perceptrons and Hopfield neural nets. In those days, BT was interested in implementing hardware neural networks and my boss mentioned casually to me that certain kinds of neural nets are basically just special cases of the Ising model. (Indeed, the Hopfield network is closely related to the Ising spin glass.) Thus began my fascination with the Ising model. Later, in 1994 in Boston, I took a course given by Bill Klein at BU on statistical mechanics, where we went through the solution of the 1-D ferromagnetic Ising model. Still, I never had the chance to study properly the 2-D Ising model. As a PhD student, I would almost daily pass by a poster with a background photo of Lars Onsager (with a cigarette in his hand), hung near the office door of my advisor Gene Stanley, so I was regularly reminded of the 2-D Ising model. I kept telling myself that one day I would eventually learn how Onsager managed to do what seemed to me, at the time, an “impossible calculation.” That was 1994 and I am writing this in 2015!
In what follows, I solve the Ising model on the infinite square lattice, but I do not actually follow Onsager’s original argument. There are in fact several different ways of arriving at Onsager’s expression [3–9]. The method I use below is known as the combinatorial method and was developed by van der Waerden, Kac and Ward among others and relies essentially on counting certain kinds of closed graphs (see refs. [3,10–13]). I more or less follow Feynman  and I have also relied on the initial portions of ref. .
2. The 2-D Ising model
Consider a two dimensional lattice where at each point of the lattice is located a (somewhat idealized) spin- particle. Consider a finite subset of this lattice, of size and let . Let denote the set of pairs of integers such that spins and are nearest neighbors. In the ferromagnetic 2-D Ising model with nearest neighbor interactions, spins and interact if and only if . Each spin can assume only 2 values: .
Consider a system of spins. The Hamiltonian for a spin configuration is given by
The sum over the nearest neighbors should avoid double counting, so that and are not counted separately. Without loss of generality, we will assume for simplicity.
3. The canonical partition function
In the theory of equilibrium statistical mechanics, the canonical partition function contains all the information needed to recover the thermodynamic properties of a system with fixed number of particles, immersed in a heat bath, details of which can be found in any textbook on statistical mechanics [3-6,14].
I prefer to define the partition function as the two-sided Laplace transform of the degeneracy of the energy level . But traditionally, the partition function is defined as a sum or integral over all possible states of the system:
The two ways of thinking are equivalent. The Laplace transform variable is related to the thermodynamic temperature via , where is the Boltzmann constant. What follows is the exact calculation of .
4. The partition function of the 2-D Ising model
The sum over the full configuration space spans over exactly states, because each spin can only have 2 possible values. Moreover, since the sum is finite (for finite ), we can write the -sum as iterated sums, to obtain
Let us rewrite the exponential factor as
So we get for the partition function
Next, consider that can only assume the two values . Moreover, , so that
We will next carry out a change of variables to eliminate the hyperbolic functions. Let us write
where . From the identity
where is the number of total bonds, assuming periodic boundary conditions. So we finally get
To proceed further, we must now evaluate the product that appears inside the summation. Expanding the product, we can write
as a polynomial in . Collecting the monomial terms , we can write
The summation is over distinct nearest neighbor pairs, of which there are exactly , i.e. the number of bonds.
Recall that a graph is a collection of nodes or vertices connected by edges or links. Given any collection of bonds, we can treat the bonds as edges and the spins connected to those bonds as the nodes. So in the language of graph theory, we can say that the last summation above is over all graphs with edges, i.e. bonds. The maximum valence or degree of a node in the 2-D Ising model is 4.
Now, observe that if a node has degree in some particular graph of bonds, then appears times in the product over the graphs of bonds. Consider nodes with odd degree . In the iterated summation over all spin configurations, there is one sum over . Since
therefore any graph with even a single node of odd degree will have a zero factor, eliminating its contribution to the partition function. So the sum over graphs of bonds need include only graphs with nodes of degree 2 and 4. Indeed, for even degree
So for the graphs of nodes of even degree, each node contributes a factor 2. If the graph of bonds has vertices or nodes, then that graph will contribute a factor . There are also spins which are not counted in the graph of bonds. The summation over these leftover spins can be summed separately. For each such spin we get
so that we get an additional factor . We get a total factor of . Finally, let us separate the case and write
Let us at this point introduce new terminology to refer to graphs only with nodes with even degree and call such graphs admissible graphs. Let us denote the set of all admissible graphs by . We can thus rewrite the partition function as
Note that in dimensions, the only difference is that so that
In dimensions, there are no admissible graphs (if we neglect the boundary conditions) and we immediately recover
so that the partition function per spin (see definition further below) is
5. Summation over paths
We now have the partition function as a sum over admissible graphs. These admissible graphs contain only nodes with even degree, so that dangling nodes are impossible, i.e. the graphs contain only (disconnected or connected) loops. So it makes sense to try to decompose the graphs in terms of closed loops.
Let us consider the bonds between nearest neighbor sites as starting at and ending at . In other words, we are orienting the bonds. Let us define a path as a sequence of bonds, such that each bond starts where the previous bond finished, but never going backwards over the previous bond. Moreover, the last bond ends where the first bond started, so that paths are closed by definition.
Since all paths are closed, there are many possible starting (or ending) points, and we are only interested in the equivalence class of paths equivalent up to circular permutation of the bond order. The inverse of a path is defined as the bonds in inverse order and with opposite orientation, in agreement with the usual common sense. We will not distinguish between paths and their inverses, i.e. they are equivalent in what follows.
But now, we are going to assign to every path a positive or negative signature . Let be the number of windings or rotations made by the tangent vector of a path as it traverses the whole path, i.e. the total angle of rotation divided by . Angle is measured in the standard convention of counterclockwise being positive. We then define the sign of a path by
so that if a path goes around an odd number of times, then the sign is positive but if the path winds around an odd number of times, then the sign is negative.
If a path is periodic, we define the period to be the largest integer such that the identical sub-paths are not periodic. If a path is periodic with period , then the sign of the periodic path is defined as
where is the value of for the non-periodic subpath. In particular, the sign of a periodic path is not the product of the signs of the subpaths. Instead, the sign of a periodic path is -1 if the period is even. If the period is odd, then the sign of the periodic path equals the sign of the non-periodic subpath.
We now define the “amplitude” or weight of a path as
so that the amplitude can be positive or negative, but its absolute value is , where is the total length of the path. Recall that for non-negative .
Next, consider the infinite product over all equivalence classes of closed non-periodic paths
The expansion of the product will contain terms like
Let . Let us now decompose the product of all the powers of in terms of how many times the bonds appear in the paths.
Some of the bonds will appear multiple times, because they might be part of two or more distinct paths. Let us consider first the situation where each bond appears at most once in the different paths. This means that either the paths are disjoint, i.e. do not intersect, or else they intersect in a such a manner that no bond is traversed by both intersecting paths. This can only happen if the intersection nodes have degree 4. When a path enters through a bond to a node of degree 4, there are 3 possible exit bonds.
Let be the number of nodes with degree 4. Since at each node of degree 4 there are 3 types of path crossings, there are possible choices of path combinations for any given graph. Each such choice of paths will have a sign. Let us take a closer look at all these signed terms.
The 3 types of crossings are (1) straight through, (2) turn clockwise and (3) counterclockwise. Let be the number of intersections of type , including self-crossings plus crossings of the different paths. Crossings of type 1 of distinct paths can happen only an even number of times and to an even power is 1.
Each self-crossing of type 1 means that 1 full rotation of the tangent vector has been made. So we get a sign contribution .
The other 2 types of crossings do not lead to sign changes. Let us write their sign contribution as where .
What this means is that if there are crossings, then the sign for the entire graph will be . How many ways are there to choose , , crossings of type 1, 2, 3 from the crossings? The answer is
All such combinations of paths have the same , where is the total number of bonds in the graph. The total contribution of the signed terms is going to be, for a particular connected graph and a particular collection of paths such that no bond appears more than once,
By the multinomial theorem, this expression equals
We have so far considered the case of a connected graph. What about a disconnected graph? Let graph consist of components , each of which is connected. By the previous argument, each will have a contribution and since the summation for the graph can be separated into the product of summations for each graph, we get for the entire graph
We have so far seen how the product of over the paths that make up a graph equal the when no bond is repeated more than once.
Let us now consider the situation where at least one bond appears more than once. We wish to avoid paths which are periodic and paths which have no bond appearing more than once. For a connected graph, we must avoid a polygon, i.e. a path with only degree 2 nodes, because no bond appears more than once (and of course we also exclude periodic polygons). For a disconnected graph, at least one of the connected subgraphs must not be a polygon, for the same reason. Let denote the set of all connected graphs which are not polygons and all disconnected graphs which have at least one non-polygon connected subgraph.
Consider one such graph . This graph will have terms of the form
where is the sign for this particular path combination. Consider a path where a particular bond is repeated times. Consider all possible sets of paths which contribute terms of absolute value for some fixed and fixed for a chosen bond. Since each such path goes through the chosen bond times, therefore there are path segments separated by that bond, i.e. the bond chops the path into path segments. The set of all such paths will be given by all possible permutations of the bond segments. Although we did not distinguish between paths and their inverses, the path segments can be traversed in two directions each, and the sign may change. Since all permutations are being considered, a cancellation in sign takes place. In the permutation group, half of the permutations are odd and half are even, and these will have opposite signs. So whenever a bond is repeated, the terms cancel out.
So we have ended up with the sketch or outline for proving the following:
This identity between a product over paths and a sum over graphs is essentially the “backbone” of the combinatorial method. It is difficult to count the graphs, but relatively easier to calculate the product over the paths.
We can now express the 2-D Ising model’s partition function as
6. Counting all the paths
We need to count paths, but we need to do the book-keeping of the signs. To track the signs, which count the windings of the tangent vector along the path, we will consider every “left turn” as sign change, so we will count a factor . Similarly for every “right turn” we will include a factor . For no turns, i.e. going straight, we will include no extra factor, or equivalently, a factor of 1. Finally, going backwards is forbidden, so for a turn of radians we will kill this possibility by using a factor of zero, 0.
Recall that the amplitude of a path is just raised to the power of the length of the path, and with a sign equal to depending on how many times the tangent vector winds around the origin. Since the factors can be used to keep track of the changes in the direction of the tangent vector, therefore the amplitude of any (non-periodic) path is simply equal to where is the number of right turns, is the number of left turns, is the number of no turns (i.e. straight ahead) and is the number of backwards or 180 degree turns, of which there are none since these are forbidden.
Since we know how to calculate amplitudes now for any particular path, let us now turn to the calculation of amplitudes for collection of paths. Following probability theory, wave mechanics or quantum mechanics, we will assume that amplitudes obey the principle of linear superposition. In other words, the combined amplitude for a collection of paths equals the sum of the amplitudes for each path.
We are first going to count all the paths that start and end at the origin, i.e. closed loops that go through the origin. We are going to calculate the path propagator, i.e. the amplitude for arriving at a site at step starting at the origin. The propagator has translation invariance, i.e. where we place the origin does not matter. We are going to ignore boundary conditions because boundary effects will disappear in the thermodynamic limit. We will always consider the directions in the order up, down, left, right, for convenience. Let the positive axis go towards the right and the positive axis go upwards. Let the amplitudes for arrival at a site going in the up, down, left and right directions be respectively , , and . These amplitudes can refer to either one single path or to a collection or “ensemble” of paths.
The amplitude for arriving in the upwards direction (i.e., from below) at step at position is given by
In words, this equation tells us that the amplitude to arrive going upwards at going upwards depends only on the amplitudes at step at position and on the turning factors and , and on the factor of one bond.
Similarly, we thus have
If a site can only be reached in steps with , then the amplitudes . This extension allows us to consider arbitrarily distant .
Before we calculate the amplitudes of closed nonperiodic paths , which is what we need, let us first calculate the amplitudes of path segments. Consider all path segments that start at the origin and end at position after exactly steps. The total amplitude for this collection of path segments will be given by
which is just the sum of the amplitudes to end at coming in from each of the 4 possible directions.
We now have a situation where we can recursively express the amplitudes at step in terms of the amplitudes at step , however the recursion relations include a shift of the coordinates to a neighboring site. For example the amplitude depends on the amplitudes at step at position rather than , and so forth. These translations pose a difficulty that we need to overcome.
We will now invoke the magical powers of the Fourier transform! This integral transform has many amazing properties. For example, convolutions become products and vice versa under Fourier transformation. Similarly, differential operators transform to simple algebraic multipliers. The particular property of the Fourier transform that we will use is the ability to convert translations into phase shifts. I use the term Fourier transform in the general sense, which includes Fourier series. The Dirac comb can be used to treat them both as part of a single theory. So let us define the Fourier transform of and the inverse transform as follows:
Are these quantities well defined? The Fourier transform is well defined for any function. Since for fixed the amplitude has compact support and is finite, therefore it lies in the space of absolutely summable functions. However, it is possible for not to be an function! This topic is beyond the scope of this article. Suffice it to say that, in some sense, the inverse transform can also be made well defined even for non- functions.
Let us now explicitly calculate :
Note that the translation has disappeared and instead become a phase shift. For the transforms of the other amplitudes, we will get analogous expressions. Specifically, if we let and we can summarize the recursion relations for the transformed amplitudes as
Notice that the Fourier transformed amplitudes for step are linear in the transformed amplitudes for step . So we can use the tools of linear algebra on this set of 4 equations. So let us define the vector
and the matrix
So we can express the recursion relation in matrix form as
which is much more concise than the 4 separate equations in 4 variables. Even better, we can iterate the recursion to obtain
where is the Fourier transformed amplitude for a loop of length zero, for arriving at the origin before the first step. But do we know which way the walk arrived at the origin? In general we do not know, but for loops, i.e. closed paths, we do! The for arriving at the origin is simply , because the path is a closed loop. From now on, we focus exclusively on loops. In the case of a loop, the way that the path started at the origin is identical to the way it ends at the origin.
Paths starting at the origin can only start in 4 possible directions. Let for be the basis unit vectors for the space of the for the directions up, down, left and right. Then the paths ending at the origin going upwards at the origin have total (Fourier transformed) amplitude with , and similarly for the other 3 directions. Here is the transpose of . So the total amplitude for all loops going through the origin is given by
For those familiar with Dirac’s bra-ket notation, we could write instead
however, we will stick with the usual linear algebra notation. The sum of the diagonal terms of a matrix is the trace, so we can rewrite the total amplitude for loops of length that end at the origin more concisely as,
We have been working with the Fourier transform of the amplitudes , so we need to inverse Fourier transform and evaluate the inverse transformed quantity at the origin, :
This expression is still not the correct amplitude because the sign of the amplitude is times the number of times that the tangent vector of the path winds around the origin, so we must multiply by . We have also only considered paths that end at the origin. If we have sites, then there are times as many paths. We have also counted each path twice: one in the “forward” direction and one in the “reverse” direction. So we have counted paths as well as their inversions. Multiplying by and we get the total amplitude of closed paths of length :
This expression includes all closed paths that start anywhere. But there are possible starting positions for a loop of length . So we are over-counting the total number of loops. If we quotient out the circular permutations, we get the correct final amplitude. Summing over , we get the total amplitude for all (non-equivalent) closed paths:
Interchanging of the summation and integration is allowed provided we have absolute convergence. Similarly, for the logarithm we need to ensure that we have convergent power series. Let us take a closer look. The matrix has no element larger than 1. So the maximum trace of is 4 and the maximum trace of is also 4. Hence, for the sum of the double integral is absolutely convergent and we can take the sum inside the integral. Next, let us look at the convergence of the sum, taken inside the trace inside the double integral. The summand is a matrix, so we have a matrix valued power series. If we consider the power series expansion for the logarithm of a matrix, then as usual the radius of convergence is for matrix norm (smaller than) 1. and have norms no larger than 4. So for the power series converges in norm and we can express it as a logarithm. However, note that for the logarithm we do not in fact have to worry about convergence of the power series of the logarithm if we think in terms of formal power series or analytic continuation beyond the branch point singularity. For example, it is well known that the Riemann zeta function defined as a power series does not converge for small positive or negative real part of , but analytic continuation (or equivalent methods) allow it to be uniquely defined beyond the radius of convergence. Similarly, the formal power series of the logarithm is well defined independently of convergence (i.e. well defined algebraically rather than analytically).
We thus have an expression for the amplitudes that includes all non-periodic paths, up to equivalence under inversions and circular permutations. But what about periodic paths? A periodic path of total length with period has starting points only, not . So the above expression counts all non-periodic paths with weight 1 but also periodic paths with weight . What about the sign of periodic paths? The sign is because the sign of a periodic path is defined as , see Eq. (22). So for even period, the sign for terms of order is is always , whereas for odd period the sign should be the same as for the nonperiodic path, i.e. .
Let again denote the equivalence class of non-periodic paths, with inversions and circular permutations considered equivalent. We can then write
Note that the sign for even period is negative, but the sign of for odd is the same as the sign of . (Again, for readers concerned about convergence, since , the series for converges absolutely.)
Substituting these last few results back into (17) we thus get
To proceed further, we must calculate the trace of the log of . But this is actually quite easy, because the sum of logarithms is the logarithm of the products. In this case, the relevant sum and product are of the eigenvalues of . Specifically, , so
The determinant can be found using methods such as expansion in minors, Laplace expansion or Leibniz’ formula. We obtain the value
Let us try to simplify this expression. Recall that so that
Substituting back, we obtain
We are now essentially done and can easily obtain the solution found by Onsager. The Helmholtz free energy is defined according to
and the free energy per particle is thus given by
Following Onsager, let us define , i.e. the partition function per spin.
Specifically, the above result is equivalent to Onsager’s original expression. Except for notation, it is identical to Eq. (109b) of the 1944 paper, specialized to the case of unit symmetric coupling for both directions. It is also essentially the same as Eq. (5.29) in Feynman’s book .
7. The hypergeometric series for the partition function
I originally went through the above exercise with the sole motivation of wanting to understand Onsager’s solution. Quite by accident, however, I unexpectedly stumbled onto a very nice result about the 2-D Ising model.
Onsager’s solution above is written in terms of a definite integral. No “closed form” expression, even in terms of special functions, was generally known — until now! I recently uploaded to arXiv a paper in which I derive the hypergeometric series for the partition function. Hypergeometric series generalize the geometric series.
I was especially happy to learn that in the 1970s Larry Glasser and Lars Onsager had arrived at a result (unpublished) very similar to mine (see link to arXiv below for details). Their result and mine are related via a hypergeometric identity, which I hope to investigate more closely when time permits.
 L. Onsager, Phys. Rev. 65, 117 (1944).
 H. E. Stanley, Introduction to Phase Transitions and Critical Phenomena (Oxford University
Press, Oxford, New York, 1971).
 R. P. Feynman, Statistical Mechanics. A set of lectures (Benjamin and Cummings Publishing, Reading, 1972).
 B. McCoy and T. Wu, The two dimensional Ising model (Harvard University Press, Cambridge, 1973).
 K. Huang, Statistical Mechanics (John Wiley & Sons, New York, 1987).
 R. J. Baxter. Exactly solved models in statistical mechanics (Academic Press, London, 1989).
 S. G. Brush, Rev. Mod. Phys. 39, 883 (1967).
 G. F. Newell and E. W. Montroll, Rev. Mod. Phys. 25, 353 (1953).
 B. A. Cipra, The American Mathematical Monthly 94, 937 (1987).
 B. L. Van Der Waerden, Z. Physik 118, 473, 1941.
 M. Kac and J. C. Ward, Phys. Rev. 88, 1332, 1952.
 S. Sherman, J. Math. Phys. 1, 202 (1960).
 G. A. T. F. da Costa and A. L. Maciel, Rev. Bras. Ensino de Física 25, 49 (2003).
 S. Salinas, Introduction to Statistical Physics (Springer, Berlin, 2001).