Monday, March 16, 2009

Does the NCAA Selection Committee Use Regression Analysis?

The committee's job is to pick and rank the 65 best college basketball teams (including those that won their conference tournaments). They want to choose the teams that are most likely to win games in the NCAA tournament. Now, regression analysis seems like a perfect tool for this type of problem, but does the selection committee use it?

Based on the slow involvement of statistics in other sports, my guess is no.

When college basketball experts talk about using "computers", I fear they are only talking about comparing teams' RPIs and Strengths of Schedule via an eye test. The problem with these two measurements is that they are collinear (a statistical way of saying they carry mostly the same information). Teams that have crappy SOSs, like Penn State, are going to have crappy RPIs, since SOS is a part of RPI. Do we know how well these measures predict success? I don't know, but there is a lot of data from previous tournaments, and it would be easy to find out. However, RPI and SOS would only explain a portion of the variability in tournament results. There are a lot of other variables that should be considered, like record in the last 10 games, previous performance in tournaments, margin of victory, and potential injuries to key players. If these variables help predict a winning outcome (that is, they explain more of the variability in results), then they should be included.
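To make the collinearity point concrete, here is a minimal sketch of how you could check it. The numbers are made up for five hypothetical teams (they are not real RPI or SOS values), and the helper names `pearson` and `vif_two_predictors` are mine. It computes the correlation between the two measures and the variance inflation factor, which for a model with exactly two predictors is 1 / (1 - r^2). A large value means the regression has a hard time separating the effect of one variable from the other.

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def vif_two_predictors(r):
    """Variance inflation factor when the model has exactly two predictors."""
    return 1.0 / (1.0 - r * r)

# Made-up values for five hypothetical teams, NOT real RPI/SOS data.
rpi = [0.62, 0.58, 0.55, 0.51, 0.47]
sos = [0.60, 0.57, 0.55, 0.50, 0.48]

r = pearson(rpi, sos)
print(f"correlation = {r:.3f}, VIF = {vif_two_predictors(r):.1f}")
```

With real data from past seasons you would swap in the actual RPI and SOS columns. A common rule of thumb treats a VIF above roughly 5 to 10 as a warning sign.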

Yeah, it's a human decision, and the committee does look at these variables and take them into account. But there is a huge amount of bias in a purely human decision. The humans' job should be to decide which variables go into the regression analysis, and the regression should make the final decision. The committee may think that RPI is the most important variable and deserves the most weight. This may be true, but their assumption should at least be tested by regression.
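Testing that assumption could look something like the sketch below. It is only an illustration: the six teams and their numbers are invented, the helper names (`solve`, `ols`, `standardize`) are mine, and plain least squares on tournament wins is just one possible model (a logistic regression on game outcomes would be another). The idea is to put the predictors on the same scale and then compare the fitted weights, instead of assuming RPI gets the biggest one.

```python
import math

def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(rows, y):
    """Least-squares fit; returns [intercept, b1, b2, ...]."""
    x = [[1.0] + list(r) for r in rows]
    k = len(x[0])
    xtx = [[sum(xi[i] * xi[j] for xi in x) for j in range(k)] for i in range(k)]
    xty = [sum(xi[i] * yi for xi, yi in zip(x, y)) for i in range(k)]
    return solve(xtx, xty)

def standardize(col):
    """Rescale a column to mean 0 and standard deviation 1."""
    mean = sum(col) / len(col)
    sd = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col))
    return [(v - mean) / sd for v in col]

# Invented data for six hypothetical teams, NOT real results.
rpi_col = [0.62, 0.60, 0.58, 0.55, 0.53, 0.50]
last10_wins = [8, 5, 7, 6, 4, 7]
tourney_wins = [3, 1, 2, 1, 0, 1]

rows = list(zip(standardize(rpi_col), standardize(last10_wins)))
weights = ols(rows, tourney_wins)
print("intercept, RPI weight, last-10 weight:", [round(w, 3) for w in weights])
```

Because both predictors are standardized, the two weights can be compared directly: whichever is larger in absolute value matters more in this toy fit. With real data you would also want many more teams and a check of how uncertain each weight is.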

During the selection show last night, a committee member started to explain how they chose Arizona/Dayton over the other bubble teams. He emphasized that the teams were chosen based on playing a competitive schedule throughout the year and winning quality games away from home. Of course, Penn State had two huge road wins (MSU, Illinois) and Arizona didn't have any (and was 1-5 in its last six games). It's quite possible that a regression would have chosen the same 65 teams, but it would be interesting to see the analysis.