Football and Men's Basketball Rankings Correlations

These pages correlate various college football and men's basketball rankings: each individual ranking from an earlier week is compared with the most recent average (mean) ranking on Kenneth Massey's comparison pages. The intent is to show which individual rankings most accurately predict the later consensus, as represented by the average ranking.

Links to the results of these correlation calculations are at the bottom of this page. The input data is taken from Kenneth Massey's comparison pages.

Key to the Columns

This explains the pages linked below. You need to be familiar with Kenneth's comparison pages to make sense of them.

Example lines in my pages:


DEN        Week10  36%  921
DEN        Week11  48%  938

DEN designates the ranking, using the same abbreviation as Kenneth's comparison page.

Week10 is the week within the season, the number matching those in Kenneth's historical comparison-page URLs.

921 is the correlation between "DEN" for week 10 and the current "consensus", i.e. the ranking listed on Kenneth's current comparison page derived from the average (mean) ranking.

36% is a percentile, indicating "DEN" had a higher correlation than 36 percent of the rankings included on the week-10 comparison page. The number actually represents the percent of rankings this one beat, so the highest number is typically in the 95%-98% range (since a ranking doesn't beat itself) and the lowest is 0%.
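The percentile described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the ranking abbreviations and the correlation values are all made up, and ties are simply counted as "not beaten".

```python
def percentiles(correlations):
    """Map each ranking name to the percent of that week's rankings it beat.

    The denominator counts every ranking on the page, including the one
    being scored, which is why the best ranking lands below 100%.
    """
    n = len(correlations)
    result = {}
    for name, value in correlations.items():
        beaten = sum(1 for other in correlations.values() if other < value)
        result[name] = round(100 * beaten / n)
    return result


# Hypothetical correlations for one week (made-up values, not real data).
week = {"AAA": 950, "BBB": 921, "CCC": 900, "DDD": 880}
print(percentiles(week))
```

With only four rankings the best one scores 75%; with the 20 to 50 rankings on a typical comparison page, the same arithmetic gives the 95%-98% ceiling mentioned above.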

"Special" Rankings

Besides comparing all the individual rankings from Kenneth's page, I also threw in some other rankings. For lines with the first column containing:

Consensus correlates earlier weeks' consensus with the current consensus. (By "Consensus", I mean the ranking as the teams are listed on the comparison page, derived from the mean of the individual rankings.) You can see how various individual rankings' predictive ability compares with the predictive ability of the consensus of all rankings. In a typical week it beats almost all individual rankings.

Con2001 (or whatever year) correlates the previous year's final ranking with this year's current ranking. It does NOT use each week from the previous year: only the previous year's final ranking. However, it is listed against each week to show what percentile it achieves as compared to that week's individual rankings. I originally had this thought: "I'll bet the same teams are typically on top every year and last year's consensus ranking might stack up very well against the rankings we all produce, especially in the early weeks." That proved to be false, since it shows a low percentile even in the first week. I don't know what folks use to initialize their data, but it appears to be better than simply taking the previous year's rankings.

The pages for each sport

For each sport, there are four pages that are calculated with slight differences.

xx-corr.txt: The "ordinary" calculation, nothing special--each individual ranking is correlated with the order of the teams listed on the comparison page, i.e. the team at the top is considered 1, the next one 2, etc.
xx-corr-25.txt: "top 25 only"--ignores all ranks greater than 25. Any unranked team or team with a rank higher than 25 is given a rank of 26 under all the different rankings. This allows a limited-but-level comparison between rankings that only rank 25 teams, such as AP and USA, and the other rankings.
xx-corr-average.txt: Correlates against the "Average" or "Mean" fractional ranks rather than against the natural number ranks derived from them.
xx-corr-miss.txt: Fills in missing ranks. Any individual ranking that doesn't have a number for all the teams is extended to have a number for every team, as if the unranked teams were in a multi-way tie. For example, the "WAJL10" football ranking ranks 50 teams, so in this calculation, we produce the correlation for a "modified WAJL10" with each non-ranked team assigned a rank of 51. This was my final attempt at a reasonable method and I consider it the "best" except specifically for evaluating the "top-25-only" rankings like AP and USA.
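
The "top 25 only" and "missing ranks" adjustments described above could be sketched in Python roughly as follows. The function names, team names and sample ranks are hypothetical, and using "highest assigned rank plus one" as the fill value is an assumption that happens to give 51 for a ranking of 50 teams.

```python
def cap_at_25(ranks, teams):
    """Top-25 variant: any rank above 25, or no rank at all, becomes 26."""
    return {t: ranks[t] if t in ranks and ranks[t] <= 25 else 26 for t in teams}


def fill_missing(ranks, teams):
    """Missing-rank variant: every unranked team gets the same rank,
    one past the highest rank the ranking actually assigned."""
    fill = max(ranks.values()) + 1
    return {t: ranks.get(t, fill) for t in teams}


# Hypothetical four-team example (made-up data).
teams = ["A", "B", "C", "D"]
capped = cap_at_25({"A": 1, "B": 30}, teams)
filled = fill_missing({"A": 1, "B": 2}, teams)
print(capped)
print(filled)
```

In the first call, B's rank of 30 and the two unranked teams all collapse to 26; in the second, the two unranked teams share rank 3.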

What's in each page

Within each page there are four sections.

The current week's correlations. This is an attempt to duplicate the numbers across the bottom of Kenneth's comparison page. These don't match Kenneth's though I've checked and rechecked my formulas and programming. The numbers I produce look generally plausible, e.g. generally the same rankings have high and low correlations, but there are occasions when on Kenneth's page, "A" has a higher correlation than "B", whereas on my page "B" has the higher correlation, etc.
Week by week listings of the earlier weeks.
Same data in a single table, ordered from highest correlation to lowest. This will sometimes show you that one individual ranking's predictive ability is weeks behind another's, e.g. one's Week8 correlates less than another's Week7.
Same data grouped by individual ranking, e.g. all the AP rankings are together. This will show you how a particular ranking's correlation has risen over time and how its percentile has changed over time.
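
For anyone who wants to re-sort or re-group the listings themselves, here is a minimal Python sketch that reads lines in the format shown in the key and produces the two orderings described above. The two DEN lines are the examples from the key; the "XYZ" line and all the names in the code are made up for illustration.

```python
from collections import defaultdict


def parse_line(line):
    """Split a line such as 'DEN  Week10  36%  921' into its four fields."""
    name, week, pct, corr = line.split()
    return {
        "name": name,
        "week": int(week[len("Week"):]),
        "pct": int(pct.rstrip("%")),
        "corr": int(corr),
    }


lines = [
    "DEN        Week10  36%  921",
    "DEN        Week11  48%  938",
    "XYZ        Week10  50%  930",  # hypothetical ranking, made-up numbers
]
rows = [parse_line(line) for line in lines]

# Single table, highest correlation first.
by_corr = sorted(rows, key=lambda r: r["corr"], reverse=True)

# Grouped by ranking, weeks in order within each group.
grouped = defaultdict(list)
for row in sorted(rows, key=lambda r: r["week"]):
    grouped[row["name"]].append(row)

for row in by_corr:
    print(row["name"], row["week"], row["corr"])
```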

A note on the calculations

As I said above, when I apply my correlation formula in the same manner as on Kenneth Massey's comparison page, I can't reproduce his numbers, though mine are generally close to his. Another issue is that the procedure/formula I use does not handle unranked teams in a reasonable manner. Thus you will see negative numbers for some of the correlations of partial rankings such as AP and USA; I suppose this is an artifact of the particular formula I use and how I apply it to incomplete rankings.

This doesn't explain why I can't reproduce Kenneth's numbers, though, because differences also show up for complete rankings, where unranked teams are not a factor.

Also, I list a "concordance" for each week. This formula also assumes a ranking from 1 to N but has been applied to data that doesn't meet this requirement, so it is a little off.
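
This page does not say which correlation formula is used, so the following Python sketch is purely an illustration of the general problem, not a reconstruction of these calculations. It assumes, for the sake of example, the common shortcut form of Spearman's rank correlation, which is only exact when both lists are complete rankings from 1 to N with no ties, and compares it with a plain Pearson correlation of the same numbers. All names and data are made up.

```python
from math import sqrt


def spearman_shortcut(x, y):
    """1 - 6*sum(d^2) / (n*(n^2 - 1)); exact only for tie-free ranks 1..n."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return 1 - 6 * d2 / (n * (n * n - 1))


def pearson(x, y):
    """Ordinary product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


# Two complete rankings of five teams: the two formulas agree.
full_a = [1, 2, 3, 4, 5]
full_b = [2, 1, 3, 5, 4]
print(spearman_shortcut(full_a, full_b), pearson(full_a, full_b))

# A partial ranking (only two teams ranked, the rest filled in with 3)
# against a complete one: the two formulas no longer agree.
consensus = [1, 2, 3, 4, 5]
partial = [1, 2, 3, 3, 3]
print(spearman_shortcut(consensus, partial), pearson(consensus, partial))
```

The gap in the second case grows as more teams share the filled-in rank, which is one plausible way a formula that assumes ranks 1 to N can drift when applied to incomplete rankings.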

The Various Correlation Pages

From Kenneth Massey's Football Comparison 2003/2004
fb-corr.html
fb-corr-25.html: top 25 only
fb-corr-average.html: against mean ranks
fb-corr-miss.html: with missing ranks filled in

From Kenneth Massey's Basketball Comparison 2002/2003
bb-corr.txt
bb-corr-25.txt: top 25 only
bb-corr-average.txt: against mean ranks
bb-corr-miss.txt: with missing ranks filled in

-John Wobus, 9/8/03