KRACH Ratings

The KRACH system assigns a rating to each team such that the ratio of two teams' ratings is the odds on a game between them. The set of ratings is calculated so that each team's expected number of wins in the games it has already played, based on the odds implied by its and its opponent's ratings, equals the team's actual number of wins in those games. The calculation utilizes only teams' game results in terms of wins, losses, and ties: not game scores, dates, or locations. My own KRACH calculation incorporates a fictitious tie between each team and a team of rating 100.

KRACH stands for "Ken's Ratings for American College Hockey". Ken Butler rated college hockey teams using the "Bradley-Terry" rating system. Bradley and Terry developed a rating formula in 1952 to handle taste tests. While taste tests have little to do with hockey, I presume Bradley and Terry had data from tests on a bunch of different tastes, and needed a reasonable method of drawing conclusions from a lot of data that consisted of numerous individuals saying which taste they preferred, given the chance to compare two specific tastes. In other words, a bunch of random head-to-head competitions between tastes, each such trial with a winner.

The identical system was invented earlier, in 1929, by the German mathematician Ernst Zermelo as a way to evaluate the results of a chess tournament.

I will refer to the system as "KRACH", though terms such as "Bradley-Terry" or "KRACH-like" would make more sense, since I am discussing sports other than American college hockey and my name is not Ken. You can find references to Bradley-Terry ratings and Zermelo ratings as well as KRACH ratings. You will occasionally see this system proposed, under one or more of these names, as a means to determine which teams should receive post-season or championship tournament berths, or even as a means to declare a national champion.

Odds: KRACH is based upon the notion of odds. Given a bunch of games played, KRACH determines a set of odds that is consistent with what was seen in the games. Odds are typically expressed as a pair of numbers, e.g.

3:2

Given two teams such as East State and West College, you might say that you judge the odds of East State winning a game between them to be 3:2. By this, we mean that their respective strengths are such that for every 3 games East State won, West College would win 2. Presumably this wouldn't happen exactly, since chance plays a role: two flips of a coin do not always give you precisely one heads and one tails.

Odds are not the only way to express a difference in team strengths. Odds are popular for horse races, but in football, a difference in strengths is often given in a totally different way: a point spread. I won't discuss point spreads, which are irrelevant to KRACH, but want to point out that built into odds (and thus KRACH) is a way of looking at team strengths and it is hardly the only way.

Percentage chance of winning: Expressing relative team strengths as odds is exactly equivalent to expressing their relative strengths as a percentage chance that a team will win a game over another team. For example, if we think the odds are 3:2 that East State will beat West College, that says we think that over 5 games, we'd expect East State to win 3 of them. 3 out of 5 is 60% of the games, and another way of saying the odds are 3:2 is to say that East State has a 60% chance of winning a game over West College. Given odds, you can determine such a percentage, and given a percentage, you can determine the odds (with some trickiness for 0% and 100%; typically oddsmakers call these a "sure thing" rather than expressing odds as two numbers). Since 3:2 implies 3 wins for East State for every 2 wins for West College, it speaks of the result of 5 games, the sum of 3 and 2. The fraction that East State will win is 3/5. Here is the math both for this specific case, and in the general case.

If the odds are 3:2 that East State will beat West College:

East State           3                        60
is expected to win ----- of such games, i.e. --- i.e. 60%.
                   3 + 2                     100

If the odds are X:Y that TeamX will beat TeamY:

TeamX is          X                       Something
expected to win ----- of such games, i.e. --------- i.e. Something%.
                X + Y                        100

To calculate 'Something' (let's call it Z):

  X       Z
----- = -----
X + Y    100

100 x X
------- = Z
 X + Y

So, for 3:2 odds,

100 x 3         300
------- yields  --- which yields 60, those odds expressed as a percent.
 3 + 2           5
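
As a quick check of the arithmetic above, here is a minimal Python sketch of the odds-to-percentage formula. The function name odds_to_percent is my own label for illustration and is not part of any KRACH software.

```python
def odds_to_percent(x, y):
    """Percentage chance of a win for the side quoted first in odds x:y."""
    if x < 0 or y < 0 or x + y == 0:
        raise ValueError("odds must be non-negative and not both zero")
    # Z = 100 * X / (X + Y), the formula worked out above.
    return 100 * x / (x + y)


print(odds_to_percent(3, 2))    # 60.0
print(odds_to_percent(60, 40))  # 60.0, the unreduced form gives the same answer
```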

Also, given a percentage, you can find equivalent odds. You have some freedom, but typically want odds X:Y such that the two numbers are each natural (not fractional). We like 3:2 better than 1.5:1. One way to get such an answer, if Z is not fractional, is to take X to be Z.

From above if X:Y is odds, and Z is percentage chance, of an X win:

  X       Z
----- = -----
X + Y    100

Taking X to be Z, find Y:

  Z       Z
----- = -----
Z + Y    100

or, assuming Z is not zero:

Z + Y    100
----- = -----
  Z       Z

or:

Z + Y = 100

or:

Y = 100 - Z

For example, if Z is 60:

Y = 100 - 60

or Y = 40, or the odds are 60:40, which is correct, but we
normally express it in reduced form, 3:2.

If what we have is a fractional probability of East State winning, i.e. .6, we start by turning it into a percent, i.e. 60%, then do the same thing.
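
The reverse conversion can be sketched the same way. This is only an illustration under my own naming (probability_to_odds and percent_to_odds are hypothetical helpers); it leans on Python's fractions module to do the "reduced form" step, and it refuses 0% and 100% because, as noted above, those are a "sure thing" rather than a pair of numbers.

```python
from fractions import Fraction


def probability_to_odds(p):
    """Reduced whole-number odds (x, y) for a chance p of winning, 0 < p < 1."""
    p = Fraction(str(p))          # str() keeps .6 as exactly 6/10
    if not 0 < p < 1:
        raise ValueError("a sure thing cannot be written as odds x:y")
    ratio = p / (1 - p)           # X/Y = Z/(100 - Z), in lowest terms
    return ratio.numerator, ratio.denominator


def percent_to_odds(z):
    """Same conversion, starting from a percentage such as 60."""
    return probability_to_odds(Fraction(str(z)) / 100)


print(percent_to_odds(60))       # (3, 2)
print(probability_to_odds(0.6))  # (3, 2)
```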

Coming up with Odds: But where do odds come from? Obviously you, like any expert, can use your own judgment. Another way to get odds is from the teams' existing records, i.e., their past games. If East State has played West College five times and East State won three of the times, then we can use odds in a couple of ways. We could use our judgment to decide how likely it will be that West College will win their next such game, and who can say what our judgment will be without some particulars? Maybe East State just had a key injury. Maybe the weather conditions are such as to favor West College. Maybe those games were played over the course of five years, with different personnel, and this year, East State has won all ten of its regular season games against strong competition. It seems like any odds we come up with are just our own judgment.

But we can use odds to express the existing record, and in that case, we are not depending upon such judgment calls. We can say with total assurance that over those five games already played, East State won games at a ratio of 3:2. While odds regarding future games are arguable, odds regarding these past games are "perfect". What we can say is "the series was perfectly consistent with the notion that for each game, the odds were 3:2 that East State would win".

Were the odds actually 3:2 for each of the individual games? Odds, and probabilities, are slippery, always being relative to our prior knowledge. What you know determines what you would say are the true odds. If we were using our judgment to pick odds regarding a future game, we'd be thinking of every factor we were aware of: injuries, recent streaks, playing styles, etc. If East State did play West College five times and East State won three, what explaining factors might there have been? Perhaps each game was, in actuality, a sure thing: for example, suppose East State had a superstar that made them nearly invincible, but he was injured during two games and they were terrible without him. In that case, during none of these individual games were the odds 3:2. In each case it was a sure thing one way or the other. So, when we say 3:2 describes this particular past series, we mean to say that the actual won/loss result is the plausible outcome of five games, if in each of the five games, those were the odds. The odds describe the results of the series as if each team had a specific, consistent strength, and played all five games at that strength.

Odds are a rather good description of the outcome. The idea is to win games, and the odds tell how well each team did that.

Odds when there are more than just two teams: This is all well and good, but if you have lots of teams that have played a number of games, but they all haven't had lengthy series with every other team, then what? Major League Baseball allows every team to have a record against every other league team, thus you can come up with winning percentages (or the equivalent odds) between any two teams in the league. But in many sports, such as College Football, Basketball, and Lacrosse, no team plays every other team even just once each. If you have dozens or hundreds of teams, each of which has played from ten to thirty games, are odds of any use? How do you calculate them?

"Expected" victories: In fact, mathematics can handle it. The key is that mathematically, you can deal with fractions of a victory. To begin with, let's ask the question: if the odds are 3:2 that East State will beat West College, then if they play two games, how many victories will East State get? We are aware that the answer is, as yet, unknown, and we do know that the answer will be either 0, or 1, or 2. After all, you can't end up with something like 1.2 victories. That's just a mathematical abstraction, not a real-world result.

However, the mathematical abstraction is useful. For example, if the odds are 3:2 that East State will win against West College, what do we expect the outcome to be if they played 13 games? Mathematically, there is an answer:


For 1 game, East State has a .6 chance of winning.  If
we take this as saying it is expected to win .6 games out
of every 1, then in 13 games:

.6 x 13 yields 7.8

If East State gets .6 victory from each game, after 13 games, it will have racked up 7.8 victories. Once again, 7.8 is not a realistic number, but it does tell us that 7 or 8 victories are likely numbers, with the suggestion that 8 is more likely than 7. As an abstraction, ".6 victories" is useful to do quick calculations of how many total victories to expect in such cases. Now, suppose you have lots of teams that have played a bunch of games, but no team has played every other team. Each team has racked up a number of wins. Are there odds for each pair of teams that would produce this result? For example, if five teams have played a total of ten games, and we know the results, i.e. who won and lost each game, then, are there odds we could come up with for each pair of teams that would produce this result? We did this above, with just two teams playing five games, but more teams present more of a challenge. This is the challenge that the KRACH system meets: coming up with odds between every pair of teams that are consistent with the results of the games already played.

KRACH: First, there is the question: if you have picked odds for every pair of teams, how do you know they are consistent with the results? The consistency is that given the odds, and the games, each team's actual count of victories matches the victory count that is expected from the odds. "The victory count that is expected from the odds" is the sum of the expected victory count for each individual game. This is where we use the fractional expected victories, i.e. if the odds are 3:2, then we use the .6 victory to sum our victories.

For example, if East State played three games with West College, then we expect it to rack up 3 x .6 victories or 1.8 victories. On the other hand, maybe East State played three separate teams. If the odds in each case were 3:2 that East State won, then we still expect 1.8 victories. It doesn't matter whether it was the same team, only that three games were played in which East State had 3:2 odds of winning. Perhaps it also played another, weaker team, and its odds of winning that game were 9:1. That calculates to a 90% chance of victory, or .9 win. For the four games, the number of expected victories would be .6 + .6 + .6 + .9 or 2.7 victories.
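
The summing of fractional victories can be sketched in a few lines of Python. The helper names below (win_probability, expected_wins) are mine, chosen just for this illustration.

```python
def win_probability(x, y):
    """Fractional chance of a win for the side quoted first in odds x:y."""
    return x / (x + y)


def expected_wins(odds_per_game):
    """Sum the fractional victories over a list of (x, y) odds, one per game."""
    return sum(win_probability(x, y) for x, y in odds_per_game)


# Three games at 3:2 plus one game at 9:1, as in the example above.
four_games = [(3, 2), (3, 2), (3, 2), (9, 1)]
print(round(expected_wins(four_games), 6))     # 2.7

# Thirteen games at 3:2, as in the earlier example.
print(round(expected_wins([(3, 2)] * 13), 6))  # 7.8
```

The rounding is only there to hide floating-point noise in the printed sum.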

Getting back to the challenge, can one come up with odds such that every team shows a number of expected victories that add up to the victories they actually racked up? Looking in the opposite direction, suppose East State won against West College, but lost against North U. That means they had just 1 victory, total. Given the 3:2 odds that East State beats West College, this 1 victory matches the odds if East State's chance of beating North U was .4 or 2:3. We found odds in this very easy case, and not only found an answer, but there could be other answers: if East State:West College is 1:4 and East State:North U is 4:1, we also find the calculated expected victories add up to the actual total.

But back to the idea of simply finding an answer. A prime formula used by KRACH:


For a team X:

Victories by X = the sum of the expected victories for
                  each game that it played.

or, using "Vx" for short, if it played teams A, B, C ...

Vx = Sum of 
        Expected victories of X over A   +
        Expected victories of X over B   +
        Expected victories of X over C   etc. 

or, using Vxa to stand for "Expected victories of X over A":

Vx = Vxa + Vxb + Vxc + ...

Team X's expected victories over A, if it played 1 game,
is the fractional chance of a victory, e.g. .6 in the
examples above.

So if the odds of Team X beating Team A are X:A, then Vxa is

   X
 -----
 X + A

or, to sum:

       X       X       X
Vx = ----- + ----- + ----- + ...
     X + A   X + B   X + C

given that the odds are X:A, X:B, X:C, etc.

Note that for mathematical summing of probabilities, it does not matter whether the odds you are using are in their simplest form, e.g. 3:2 versus 60:40.

The above equation is key to KRACH. Given odds X:A, X:B, X:C, etc, it shows whether they are consistent with the outcome we've seen so far. This means if we are lucky guessers, and come up with some odds, we can see whether they are right or not. By "right", we mean: consistent with the outcomes we've seen so far.
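
Here is a minimal sketch of that consistency test in Python. Everything about it is my own illustration: the function names, the (winner, loser) game format, and the three made-up teams A, B and C. It handles wins and losses only, not ties.

```python
from collections import defaultdict


def expected_and_actual_wins(ratings, games):
    """Tally expected and actual wins per team.

    ratings maps team -> rating; games is a list of (winner, loser) pairs.
    """
    expected = defaultdict(float)
    actual = defaultdict(int)
    for winner, loser in games:
        total = ratings[winner] + ratings[loser]
        expected[winner] += ratings[winner] / total   # X / (X + A)
        expected[loser] += ratings[loser] / total
        actual[winner] += 1
        actual[loser] += 0    # make sure winless teams are listed too
    return dict(expected), dict(actual)


def is_consistent(ratings, games, tolerance=1e-9):
    """True if every team's expected wins equal its actual wins."""
    expected, actual = expected_and_actual_wins(ratings, games)
    return all(abs(expected[t] - actual[t]) <= tolerance for t in expected)


# Hypothetical results: A is 2-1 against B, B is 2-1 against C, A is 4-1 against C.
games = ([("A", "B")] * 2 + [("B", "A")] +
         [("B", "C")] * 2 + [("C", "B")] +
         [("A", "C")] * 4 + [("C", "A")])

print(is_consistent({"A": 4, "B": 2, "C": 1}, games))  # True
print(is_consistent({"A": 1, "B": 1, "C": 1}, games))  # False
```

With ratings 4, 2 and 1, team A expects 3 x 2/3 + 5 x 4/5 = 6 wins, which is what it actually has, and the same holds for B (3 wins) and C (2 wins).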

But we can also reform this equation into a way to calculate KRACH. It isn't a simple formula, in that it is not one that you can use to directly calculate these numbers. Rather it can be formed into a calculation that can be repeated, each time giving you numbers closer to the odds that fit this equation. Repeat the calculation enough and you have the answer. The equation itself also serves as a test of your ratings, showing you when you've reached it. You end up with odds between every team, that are consistent with the result.
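
I have not spelled out the repeated calculation here, so the following is only a sketch of one common way to do it, not necessarily the exact procedure used for my ratings. It rearranges the equation to rating = wins / (sum over games of 1 / (own rating + opponent's rating)) and applies that over and over. The function name fit_ratings, the (winner, loser) game format, and the rescaling to an average of 100 are all assumptions of the sketch. It ignores ties and will not work for unbeaten or winless teams.

```python
def fit_ratings(games, iterations=10_000, tolerance=1e-12, scale=100.0):
    """Repeat a rearranged form of the key equation until ratings settle.

    games is a list of (winner, loser) pairs. Every team needs at least
    one win and one loss, or its rating runs off to infinity or zero.
    """
    teams = sorted({team for game in games for team in game})
    wins = {team: 0 for team in teams}
    for winner, _ in games:
        wins[winner] += 1

    ratings = {team: scale for team in teams}
    for _ in range(iterations):
        updated = {}
        for team in teams:
            # One term 1 / (own rating + opponent rating) per game played.
            total = sum(1.0 / (ratings[winner] + ratings[loser])
                        for winner, loser in games
                        if team in (winner, loser))
            updated[team] = wins[team] / total
        # Only ratios matter, so rescale to keep the average at `scale`.
        factor = scale * len(teams) / sum(updated.values())
        updated = {team: r * factor for team, r in updated.items()}
        change = max(abs(updated[t] - ratings[t]) for t in teams)
        ratings = updated
        if change < tolerance * scale:
            break
    return ratings


# Hypothetical results: A is 2-1 against B, B is 2-1 against C, A is 4-1 against C.
games = ([("A", "B")] * 2 + [("B", "A")] +
         [("B", "C")] * 2 + [("C", "B")] +
         [("A", "C")] * 4 + [("C", "A")])
ratings = fit_ratings(games)
print({team: round(r, 2) for team, r in ratings.items()})
```

For these made-up results the ratings settle into a 4:2:1 ratio, i.e. 2:1 odds for A over B and for B over C.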

The one problem with the formulation is that it simply does not handle unbeaten teams. If one team has a non-zero KRACH rating, and an unbeaten team has beaten it, then the unbeaten team's KRACH rating cannot be high enough to yield the required expected wins. If you've played five games and won all five, your opponents all won 0% of their games with you, and your odds against each, after the fact, are a "sure thing" to win them all.

A KRACH assumption: There is a specific assumption built into KRACH that I have not mentioned yet. That is: If the odds that TeamA beats TeamX is A:X and the odds that TeamX beats TeamB is X:B, then the odds that TeamA beats TeamB is A:B.

To express this in an example, suppose East State has a 2:1 chance of beating North U, which has a 2:1 chance of beating South Institute. Can we conclude that East State has a 4:1 chance of beating South Institute?

Is this true? Is this rational? My suspicion is that in the real world, it is not true, but it does have a logic, which is purely mathematical, thus not subject to the vagaries of one sport versus another, or one season versus another.

In our small example, East State comes across as an excellent team, far better than South Institute. Perhaps in such games, the better team does not actually win 80% of the time. Perhaps the superior team actually finds it difficult to maintain its level of play to the team's actual potential, and the inferior team ends up winning (perhaps slightly) more than the expected 1 out of 5 times. Or perhaps it is the opposite: that teams like South Institute tend to lose heart and thus lose even more than 4 out of 5 times. Or perhaps other factors produce trends that differ from the logical 4:1 ratio of wins you would expect.
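
For what it's worth, the "logical" 4:1 figure in this small example is just the two sets of odds multiplied together. A tiny Python sketch (chain_odds is a name I made up for the illustration):

```python
from fractions import Fraction


def chain_odds(odds_ab, odds_bc):
    """Odds of A over C implied by A:B and B:C under the KRACH assumption."""
    a_over_c = Fraction(*odds_ab) * Fraction(*odds_bc)
    return a_over_c.numerator, a_over_c.denominator


x, y = chain_odds((2, 1), (2, 1))
print((x, y))               # (4, 1)
print(100 * x / (x + y))    # 80.0, the "80% of the time" mentioned above
```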

So in a system such as KRACH, an assumption is made: that this kind of relation between the odds of various teams does indeed hold. It could be argued that you should look at the record this way. Even if teams have a tendency to play less than their very best under predictable conditions, and even if that is provable by analyzing the game results of a lot of seasons (e.g. if you were to show that in College Football, weak teams tend to lose heart), that is no more a factor in determining a rating used as the basis for a national champion or entry into a tournament than such factors would be for a baseball team in Major League Baseball. After all, when we decide who won the pennant and proceeds to post season play, we don't say "The Atlanta Braves won X games, but some of those were against weak teams which we know tend to lose heart." We go with straight mathematics devoid of such "things we know", even if you can establish the truth of the statement "weak teams lose heart" from many seasons of game results. We want the season result and rewards to be based upon actual wins and losses.

Retrodictive systems: This illuminates the difference between "predictive systems" and "retrodictive systems", of which KRACH is the latter. A "predictive system" is designed to tell you who is likely to win in the future while a "retrodictive system" tells you what teams have earned. A lucky (or unlucky) team is an example of one that will differ in these two kinds of ratings. Perhaps a team botches a play, but by sheer dumb luck, it results in a touchdown, improving their record. Unless this happens so often with this team that you suspect some unseen potential, you might rationally discount that win when you make predictions: how likely is the team to have that lucky break again? But the retrodictive system is designed to reflect what actually did happen, just as won-lost records are used in sports such as MLB, NBA, NHL, etc.

Many will argue that a retrodictive system is what should be used to determine champions and post season play: what you are interested in is wins rather than predictions or potential. The team that had a good record of wins based upon who it beat, and who those teams beat, and so forth, has earned its place. That's what KRACH aims to show. Even if a team has "earned its place", there could conceivably be another team that you'd judge would more likely win a future game between the two. You have knowledge about injuries, about late season streaks, you've noted scoring differentials in the teams' games. You can make a judgment as to whether some team's record came down to a lucky or unlucky freakish play on the field. But to use such things for determining the teams allowed to advance is like passing over an MLB team for the World Series, even though they have the best record, simply because they subsequently lost a key player to an injury. Or because their scores indicated they barely won the games to get in and undoubtedly "lucked out".

The basic idea in a nutshell: The aim of KRACH is to provide ratings for a set of teams.

A set of ratings is calculated (see above) such as to express the odds on a game between any pair of teams.
For each set of teams, given their record, a set of expected wins is calculated from the ratings.
Each team's expected number of wins based upon these ratings matches its actual number of wins in real life.

If you accept that odds are a way of expressing the relative strengths of teams, and that odds are consistent between teams (such that if A:B and B:C express the odds of TeamA beating TeamB and TeamB beating TeamC, then A:C expresses the odds of TeamA beating TeamC), then KRACH is the answer to ratings that express the teams' actual wins and losses.

Fictitious ties: In my KRACH calculations, I incorporate a "Fictitious Tie" in each team's record, i.e. a supposed tie between each team and an 'average' team. This 'mutes' a team's ratings early in the season, before it has played many games. For example, if a team's actual record is 2 and 0, I actually do the calculation using the record 2, 0 and 1 tie, where the tie is with an imaginary team with a fixed rating (I use 100). This addition to the calculation results in each (real) team's rating being a bit more average: the higher ratings are a little bit lower and the lower ratings are a little bit higher. In other words, the ratings are bunched together a little more than they would be.

A problem faced by rating systems based upon wins and losses is that a team that is 2 and 0 has a perfect record, but hasn't played enough games to convince us that this really means much. By pretending that team is 2, 0, and 1, we differentiate it from a team which is, say, 5 and 0 (which we would pretend is 5, 0, and 1). We all mentally make such adjustments when we evaluate teams: "Sure, they are undefeated, but they've only played two games!". Fictitious games are one means of incorporating this kind of adjustment in a measured, calculable fashion, suitable for incorporating into a ratings formula. I treat such a 2, 0, and 1 record as 2.5 wins and .5 losses.

Since all the teams have played roughly the same number of games, this actually has a limited effect on the relative ratings of teams. But it does make some difference, for example, if one team is 2 and 0 while another is 3 and 0. As the season progresses, this factor has less and less effect, and if all teams played exactly the same number of games through their seasons, I believe the advantages/disadvantages given by this device to any specific teams would cancel.

Another convenience of fictitious ties is that they get around the fact that the KRACH calculation simply does not handle unbeaten teams. If we give each team a tie, counted as 1/2 win and 1/2 loss, our calculation always has a minimum of 1/2 loss to deal with for any team.
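
To show how the fictitious tie rescues an unbeaten team, here is a sketch of the repeated calculation with one extra half-win, half-loss game against an imaginary team fixed at 100. As before, this is one plausible way to arrange the arithmetic rather than a listing of my actual program; fit_with_fictitious_ties and the (winner, loser) game format are assumptions made for the example.

```python
def fit_with_fictitious_ties(games, anchor=100.0,
                             iterations=10_000, tolerance=1e-9):
    """Fit ratings when each team also has one tie against a fixed-rating team.

    games is a list of (winner, loser) pairs. The tie counts as half a win
    and half a loss, so no team is ever completely unbeaten or winless.
    """
    teams = sorted({team for game in games for team in game})
    wins = {team: 0.5 for team in teams}      # half a win from the tie
    for winner, _ in games:
        wins[winner] += 1

    ratings = {team: anchor for team in teams}
    for _ in range(iterations):
        updated = {}
        for team in teams:
            # One term for the fictitious game, then one per real game.
            total = 1.0 / (ratings[team] + anchor)
            for winner, loser in games:
                if team in (winner, loser):
                    total += 1.0 / (ratings[winner] + ratings[loser])
            updated[team] = wins[team] / total
        change = max(abs(updated[t] - ratings[t]) for t in teams)
        ratings = updated
        if change < tolerance:
            break
    return ratings


# East State is 2 and 0, both wins over West College.
ratings = fit_with_fictitious_ties([("East State", "West College")] * 2)
print({team: round(r, 1) for team, r in ratings.items()})
```

No rescaling is needed here because the imaginary team's fixed rating pins down the scale. East State comes out with a finite rating above 100 and West College with a rating below 100, where the plain calculation would have no answer.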

I read that Ken Butler originally included such a factor in his College Hockey rankings, but dropped it after a while.

I've seen reference to a method attributed to Laplace (called his Law of Succession or Rule of Succession) for the same purpose, essentially, to use (1+Wins)/(2+Wins+Losses) as if it were the team's winning percentage, rather than the actual Wins/(Wins+Losses). This measure is equivalent to pretending a team played two fictitious extra ties, or played two fictitious extra games, fictitiously winning one and losing one.
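
The equivalence is easy to check with a short Python sketch (both function names are mine, for illustration only):

```python
def succession_win_pct(wins, losses):
    """Rule of Succession estimate: (1 + Wins) / (2 + Wins + Losses)."""
    return (1 + wins) / (2 + wins + losses)


def win_pct_with_two_ties(wins, losses):
    """Winning percentage after adding two ties, each worth half a win."""
    return (wins + 2 * 0.5) / (wins + losses + 2)


print(succession_win_pct(2, 0))      # 0.75
print(win_pct_with_two_ties(2, 0))   # 0.75
print(succession_win_pct(5, 0) > succession_win_pct(2, 0))  # True
```

So a team that is 2 and 0 is treated as a .750 team rather than a perfect 1.000, and a team that is 5 and 0 rates a little higher than that.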

Links:
KRACH explanation at Ken Butler's homepage
College Hockey News info about KRACH.
Pairwise Comparison (including Bradley-Terry) at Wikipedia
Pierre-Simon Laplace (including Rule of Succession) at Wikipedia

John Wobus, 9/18/07

Wobus Sports: www.vaporia.com/sports