Showing posts with label statistics. Show all posts

Tuesday, November 10, 2009

Gay Marriage

I don't know if there is any issue with such a generational divide...




It currently appears that it's only a matter of time before legalization.



Nate Silver projects when each state will legalize gay marriage via a regression model with the following variables:
1. The year in which (a gay marriage type) amendment was voted upon;
2. The percentage of adults in 2008 Gallup tracking surveys who said that religion was an important part of their daily lives;
3. The percentage of white evangelicals in the state.
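To make the idea concrete, here is a minimal sketch of that kind of regression in Python. It assumes the response is the share of the vote in favor of a marriage ban, which is my assumption about the setup. Every number in it is invented for illustration; none of it is Silver's data or his coefficients. The helper names (`solve`, `fit_ols`, `crossing_year`) are hypothetical.

```python
# Illustrative only: all data below is invented, not Nate Silver's.

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_ols(rows, y):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)

def crossing_year(coefs, pct_religious, pct_evangelical, threshold=50.0):
    """Year at which the predicted ban vote share falls to the threshold."""
    b0, b_year, b_relig, b_evang = coefs
    baseline = b0 + b_relig * pct_religious + b_evang * pct_evangelical
    return 2000 + (threshold - baseline) / b_year

# Hypothetical ballot measures:
# (years since 2000, % calling religion important, % white evangelical)
rows = [(4, 60, 25), (4, 85, 40), (6, 50, 8), (6, 72, 30),
        (8, 55, 20), (8, 66, 12), (9, 48, 5)]
# Made-up, noise-free rule so the fit is easy to check by eye.
y = [30 - 2 * t + 0.4 * r + 0.3 * e for t, r, e in rows]

coefs = fit_ols(rows, y)
print("coefficients:", [round(c, 3) for c in coefs])
print("projected crossing year:", round(crossing_year(coefs, 60, 25), 2))
```

The projection step is just the fitted equation solved for the year at which predicted support for a ban drops to 50%. Real data would be noisy, so a real projection should come with a fair amount of uncertainty around each date.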


He projects that about half of the states will legalize by 2014 and the rest by 2024. I personally think the status quo bias is a bit stronger than that and it'll take a little longer...

*Edit - Nate Silver actually projects "the dates when the model predicts that each of the 50 states would vote against a marriage ban." The dates seem a bit more reasonable now, but I would still be more conservative in the projections. Though maybe we've hit a tipping point?

Monday, March 16, 2009

Does the NCAA Selection Committee Use Regression Analysis?

The committee's job is to pick and rank the 65 best college basketball teams (including those that won their conference tournaments). They want to choose the teams that are most likely to win games in the NCAA tournament. Now, regression analysis seems like a perfect tool for this type of problem, but does the selection committee use it?

Based on the slow involvement of statistics in other sports, my guess is no.

When college basketball experts talk about using "computers", I fear they are only talking about comparing teams' RPIs and Strengths of Schedule via an eye test. The problem with these two measurements is that they are collinear (a statistical way of saying they carry largely the same information). Teams that have crappy SOSs, like Penn State, are going to have crappy RPIs, since SOS is a part of RPI. Do we know how well these measures predict success? I don't know, but there is a lot of data from previous tournaments and it would be easy to determine. However, RPI and SOS would only explain a portion of the variability in tournament outcomes. There are a lot of other variables that should be considered, like record in the last 10 games, previous performance in tournaments, margin of victory, and potential injuries to key players. If these variables could help better predict a winning outcome (explain more of the variability) then they should be included.
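Here is a quick sketch of why RPI and SOS overlap so much. It assumes the commonly cited formulas, RPI = 0.25*WP + 0.50*OWP + 0.25*OOWP and SOS = (2*OWP + OOWP)/3, which together mean RPI = 0.25*WP + 0.75*SOS. The six teams and their percentages are made up for illustration, and the function names are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def sos(owp, oowp):
    # Assumed formula: opponents' win % counts twice as much as
    # opponents' opponents' win %.
    return (2 * owp + oowp) / 3

def rpi(wp, owp, oowp):
    # Assumed basic formula, ignoring any home/road weighting of wins.
    return 0.25 * wp + 0.50 * owp + 0.25 * oowp

# Invented teams: (win %, opponents' win %, opponents' opponents' win %)
teams = {
    "Team A": (0.80, 0.58, 0.54),
    "Team B": (0.70, 0.60, 0.55),
    "Team C": (0.75, 0.50, 0.50),
    "Team D": (0.65, 0.56, 0.53),
    "Team E": (0.85, 0.45, 0.48),
    "Team F": (0.60, 0.52, 0.51),
}

rpis = [rpi(*t) for t in teams.values()]
soss = [sos(owp, oowp) for _, owp, oowp in teams.values()]

r = pearson(rpis, soss)
print("correlation between RPI and SOS:", round(r, 2))
```

Since three quarters of RPI is SOS under these formulas, a regression that includes both has a hard time separating their effects. One option would be to use SOS and plain winning percentage as the two predictors instead.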

Yeah, it's a human decision, and the committee does look at these variables and take them into account. But there is a huge amount of bias in a purely human decision. The human's job should be to choose the variables that go into the regression analysis, and the regression should make the final decision. The committee may think that RPI is the most important variable and deserves the most weight. This may be true, but their assumption should at least be tested by regression.

During the selection show last night a committee member started to explain how they chose Arizona/Dayton over the other bubble teams. He emphasized that teams were chosen for playing a competitive schedule throughout the year and winning quality games away from home. Of course, Penn State had two huge road wins (MSU, Illinois) and Arizona didn't have any (and was 1-5 in its last six games). It's quite possible that the regression would have chosen the same 65 teams, but it would be interesting to see the analysis.