Showing posts with label sports. Show all posts

Monday, March 16, 2009

Does the NCAA Selection Committee Use Regression Analysis?

The committee's job is to pick and rank the 65 best college basketball teams (including those that won their conference tournaments). They want to choose the teams that are most likely to win games in the NCAA tournament. Regression analysis seems like a perfect tool for this type of problem, but does the selection committee use it?

Based on the slow involvement of statistics in other sports, my guess is no.

When college basketball experts talk about using "computers", I fear they are only talking about comparing teams' RPIs and strengths of schedule (SOS) via an eye test. The problem with these two measurements is that they are collinear (a statistical way of saying they carry mostly the same information). Teams that have crappy RPIs are usually going to have crappy SOSs too, like Penn State, since SOS is a part of RPI. Do we know how well these measures predict success? I don't know, but there is a lot of data from previous tournaments and it would be easy to find out. However, RPI and SOS would only explain a portion of the variability in tournament results. There are a lot of other variables that should be considered, like record in the last 10 games, previous performance in tournaments, margin of victory, and injuries to key players. If these variables help predict a winning outcome (explain more of the variability), then they should be included in the model.
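To see why the two measures overlap so much, here is a rough sketch using made-up numbers, not real team data. It assumes the commonly cited basketball RPI weights (25 percent own winning percentage, 50 percent opponents' winning percentage, 25 percent opponents' opponents' winning percentage) and treats SOS as the opponent portion of that formula. The function names and the random inputs are purely illustrative.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def simulate_rpi_vs_sos(n_teams=1000, seed=1):
    """Correlation between SOS and RPI for randomly generated teams."""
    rng = random.Random(seed)
    sos_values, rpi_values = [], []
    for _ in range(n_teams):
        wp = rng.uniform(0.3, 0.7)    # own winning percentage (made up)
        owp = rng.uniform(0.3, 0.7)   # opponents' winning percentage (made up)
        oowp = rng.uniform(0.3, 0.7)  # opponents' opponents' winning pct (made up)
        sos = (2 * owp + oowp) / 3                  # assumed SOS definition
        rpi = 0.25 * wp + 0.50 * owp + 0.25 * oowp  # assumed RPI weights
        sos_values.append(sos)
        rpi_values.append(rpi)
    return pearson(sos_values, rpi_values)

print(f"correlation between SOS and RPI: {simulate_rpi_vs_sos():.2f}")
```

Even with the three inputs drawn independently, the correlation comes out at roughly 0.9, because three quarters of the assumed RPI formula is the schedule component. Putting both in a regression would add little beyond using one of them.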

Yeah, it's a human decision, and the committee does look at these variables and take them into account. But there is a huge amount of bias in a purely human decision. The humans' job should be to choose the variables that go into the regression analysis, and the regression should make the final decision. The committee may think that RPI is the most important variable and deserves the most weight. This may be true, but their assumption should at least be tested by regression.

During the selection show last night a committee member started to explain how they chose Arizona/Dayton over the other bubble teams. He emphasized that teams chosen were based on playing a competitive schedule throughout the year and winning quality games away from home. Of course Penn State had two huge road wins (MSU, Illinois) and Arizona didn't have any (and were 1-5 in their last six games). It's quite possible that the regression could have chosen the same 65 teams, but it would be interesting to see the analysis.

Friday, January 23, 2009

Ryan Howard's Arbitration Case

I have been frequenting fangraphs.com recently, which is a great site for lots of cool baseball data and analysis. I recommend it for all the baseball stat nerds out there.

Recently Ryan Howard, the Phillies' slugging first baseman, has been in the news for having the biggest gap between submitted arbitration figures. Howard is asking for 18 million, while the Phillies submitted a figure of 14 million. Last year, in his first arbitration year, Howard took home a record 10 million after actually taking his case to the arbitration judges and winning.

So in terms of arbitration, major league players have three arbitration years. Howard is in his 2nd year. According to Fangraphs, estimates on what players get are 40% of their free market value their 1st year, 60% their 2nd year, and 80% their 3rd year. So Howard should be getting about 60 percent of his free market value.

The Phillies with their 14 million dollar offer, think Howard is worth 19.6 million/year.
Howard with his 18 million dollar suggestion, thinks he's worth 25.2 million/year.

So basically Howard thinks he deserves the 2nd highest contract in the league after ARod, who makes 27.5 million/year.
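As a rough check on the arithmetic, here is a small sketch that inverts the 40/60/80 rule of thumb by dividing an arbitration figure by that year's share of market value. The function name is made up for illustration. Note that dividing gives somewhat higher numbers than the 19.6 and 25.2 million quoted above, which match adding 40 percent to each figure (14 x 1.4 and 18 x 1.4), so treat both sets of numbers as rough estimates.

```python
# Share of free-market value a player is assumed to earn in each
# arbitration year (the 40/60/80 rule of thumb quoted from Fangraphs).
ARB_SHARE = {1: 0.40, 2: 0.60, 3: 0.80}

def implied_market_value(arb_salary_millions, arb_year):
    """Free-market value (in millions) implied by an arbitration salary."""
    return arb_salary_millions / ARB_SHARE[arb_year]

for label, figure in [("Phillies", 14.0), ("Howard", 18.0)]:
    value = implied_market_value(figure, arb_year=2)
    print(f"{label}: {figure:.1f}M in year 2 implies about {value:.1f}M/year")
```

Under this reading the Phillies' offer implies about 23.3 million/year and Howard's request about 30 million/year. Either way, Howard's number implies a salary at the very top of the league.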

Looking at Fangraphs data, here are what the top Phillies were "worth" based on last year's production:

1. Chase Utley: 35.7 million
2. Jimmy Rollins: 23.1 million
3. Jayson Werth: 21.3 million
4. Cole Hamels: 20.6 million
5. Shane Victorino: 17.0 million
6. Ryan Howard: 14.1 million

The stats take into account his defense, replacement level (1b) performance, OPS (which went down .100 this year), and many other measures.

Howard has regressed over the past three years, but considering his great 2nd half he will hopefully perform better next year. The Phils take that into account by giving him a "19.6 million/year" deal (5.5 million more than he "made" last year).

Yes - Howard wants the 2nd "highest" salary in the major leagues, when he was potentially only the 6th most valuable Phillie last year.

It will be interesting to see what happens if this goes to arbitration. Do these arbitration judges think like MVP voters (Howard finished 2nd in NL MVP voting) or do they take into account statistics other than HR and RBI? I think either way Howard will lose his case.

EDIT:
A good point (by a fangraphs contributor) was made that needs to be considered.

He basically believes that Howard cannot be treated like a typical arbitration case, which would normally fall under the 40/60/80 rule. The problem with Howard is that the Phillies took a while (age 26?) to call him up (since they had Thome). Assuming the arbitration judges aren't ignorant like MVP voters, they probably awarded Howard 10 million last year because he really should be in his 3rd year of arbitration (not his 2nd). I can understand that logic a bit more. If you make that assumption, Howard only thinks he's worth 21.6 million, which is a bit high but more understandable. In the above article, the fangraphs author used 2009 projections to calculate that Howard would be worth approximately 15 million (assuming he's in his "third" year of arb).

So it could go either way I guess, but I still don't think he's worth the money.

Saturday, December 13, 2008

The All-Time Greatest College Football Teams: Where are they Headed?


(rankings as of 12/3/08)

The teams highlighted in blue are likely to advance in the rankings in the future, while the teams in red look like they will fall, based on win percentage over the last ten years, 2009 recruiting rank, and current BCS rank. Any suggestions on how to make this more scientific?

Teams with a conference championship game (Big 12, SEC) have a slight advantage. I also give an advantage to two teams with nice locations (USC, Texas). Let's not kid ourselves - many good players would rather play in the sun at USC than in frigid State College. Notice that Alabama has the worst win percentage over the last ten years, but seems to be turning it around with a nice year and a quality recruiting class. Nebraska and Tennessee seem to have been falling out of the spotlight lately, and with poor performances on the field and in recruiting it will be tough to regain momentum. Notre Dame has really lost its luster lately, and it doesn't look like Charlie Weis will be the one to bring them back.

Pretty amazing how these powerhouses just keep going. Five of the top 10 teams are going to be playing in the top-tier bowls.

Notes:
1. Win percentage statistics generated by Stassen.
2. Rankings of non-BCS teams from USA Today.
3. 2009 recruiting rankings from Rivals.

Sunday, September 28, 2008

Phillies Clinch NL East!



My friends and I went to a classic Phillies game on Saturday, one that statistics cannot adequately describe. The figure above is from a cool website called fangraphs.com. It tracks the win probability as the game goes on. It also measures the "leverage index (LI)", or importance, of each at bat as the game goes along. You can see a big dip in win probability and an increase in leverage at the "C Guzman Single". That single made the game 4-3 and loaded the bases with only one out. The graph fails to illustrate the drama of the next play, which appeared to be a lead-changing single but was turned into a game-ending double play.

There are also some pretty cool statistics measured on this site, including WPA (Win Probability Added) and Clutch.

Since I know you're interested, rankings of a few players in WPA for the 2008 season:
1. Manny Ramirez: 7.03
2. Lance Berkman: 6.68
3. Albert Pujols: 6.22
8. Carlos Beltran: 4.53
9. Joe Mauer: 4.52 (top in AL)
11. Pat Burrell: 3.78 (top for Phillies)
39. Jason Giambi: 2.17 (yes he was ranked above Howard)
40. Ryan Howard: 2.17 (NL MVP or MVP of the month? - he leads in September WPA)
50. Jack Cust: 1.85 (sadly, the top Athletic)
148 (last). Jeff Francoeur: -3.91

Tops in "clutch" (WPA/pLI - WPA/LI, where pLI is the player's average leverage index and WPA/LI is his context-neutral win probability added):

1. Stephen Drew: 2.29
2. Lance Berkman: 1.82
3. Dustin Pedroia: 1.52
8. Pat Burrell: 1.15 (top for Phillies)
115. Ryan Howard: -0.76
145 (4th to last). Chase Utley: -2.11 (I don't think Phillies fans noticed)
148 (last). Alex Rodriguez: -3.09 (NY fans might have been right this year?)

And FYI, WPA for pitchers:

WPA: Starters
Cliff Lee: 6.22
Tim Lincecum: 4.73
CC Sabathia: 4.69
Roy Halladay: 4.48
Johan Santana: 4.41
(Hamels is 15th for starters)

WPA: Relievers
Brad Lidge: 5.43
Mariano Rivera: 4.47
Joakim Soria: 4.42
Joe Nathan: 3.73
Carlos Marmol: 3.71

Friday, September 19, 2008

Two Point Conversions


Last Sunday, in the 2nd week of the NFL season, the Denver Broncos defeated the San Diego Chargers in an exciting 39-38 game. The Broncos made a controversial decision to go for a 2-point conversion down 38-37 with 29 seconds left, declining to kick the extra point for the tie.

Disregarding the Broncos' successful attempt - the question is: what was the statistically correct move?

Most NFL fans and analysts would say the extra point, based on the fact that the Broncos would tie the game and have a 50% chance of winning in overtime. This is basically the conventional wisdom. “As a general rule, I feel like I have an obligation to my team to give them a chance to win the game in overtime by kicking an extra point,” Jeff Fisher, the Tennessee Titans’ coach, said, “not by winning or losing the game on one play.”

However, the alternative (the 2-point conversion) has had a varying success rate since its introduction. This New York Times article describes this rate, including this interesting fact: "Last season, N.F.L. teams converted just 30 of 61 attempts, a paltry .492 success rate". (The rate has varied over the years, but has been higher lately, possibly because of fewer attempts and better play calling.) The same article mentions that the success rate of an extra point is slightly below 99 percent.

Another scenario to account for is the Broncos missing the 2-point conversion, recovering an onside kick (10-15 percent success rate), then scoring in the final ~30 seconds (probably a 20-30 percent chance, assuming a successful onside kick). This would indicate an additional 1-3 percent increase in the win probability for the 2-point conversion.

It is hard to take every variable into account, including each player on the field, but I estimate the 2-point conversion was the correct statistical call by a very small margin. This is sometimes difficult to comprehend, since it's a dichotomous result. Close to 50 percent of the time, choosing the 2-point conversion will fail and look like the "wrong" choice.
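The estimate above can be sketched as a quick calculation. This is a simplified model, not a full win-probability analysis: it assumes overtime is a 50/50 coin flip, ignores anything the Chargers might do with the remaining seconds, and uses the rough onside kick and late-score guesses from the text. The function names are made up for illustration.

```python
def win_prob_kick(p_extra_point, p_overtime, p_comeback):
    """Kick: tie the game and win in overtime, or miss and still rally."""
    return p_extra_point * p_overtime + (1 - p_extra_point) * p_comeback

def win_prob_two_point(p_two_point, p_comeback):
    """Go for two: convert and win, or miss and still rally."""
    return p_two_point + (1 - p_two_point) * p_comeback

P_EXTRA_POINT = 0.99   # "slightly below 99 percent" per the article
P_TWO_POINT = 30 / 61  # last season's league-wide conversion rate
P_OVERTIME = 0.50      # coin-flip assumption for overtime

# Low and high guesses for recovering an onside kick, then scoring.
for p_onside, p_score in [(0.10, 0.20), (0.15, 0.30)]:
    p_comeback = p_onside * p_score
    kick = win_prob_kick(P_EXTRA_POINT, P_OVERTIME, p_comeback)
    two = win_prob_two_point(P_TWO_POINT, p_comeback)
    print(f"comeback {p_comeback:.3f}: kick {kick:.3f}, go for two {two:.3f}")
```

With these inputs, going for two comes out at roughly 50 to 51.5 percent, against about 49.5 percent for the kick. The edge is small enough that a slightly different conversion rate would flip it.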

“Sometimes you have to go with your gut,” Shanahan said. “I just felt like it was a chance for us to put them away. I didn’t want to count on the coin flip. I wanted to do it then, and obviously it worked out.”

While I am not a fan of the phrase "going with your gut" (did his lunch make the decision?), I must commend Shanahan for taking this risk. Most coaches (like Jeff Fisher) tend to be more conservative, choosing the statistically incorrect option because the riskier call could later be blamed on the coach.

I admire Mike Shanahan for his longevity, his unique use of "skill players" like running backs, and his willingness to put himself on the line by making risky decisions that are statistically sound. Just not his postgame explanations. ;)

Tuesday, September 16, 2008

Baseball Managers and Probability


I recently got into a discussion with my friend about whether the Brewers' decision to fire their manager 2 weeks before the end of the season will be beneficial for them.

Here is my commentary....

I think the effect is negligible, and it is overanalyzed by the media. Often, managers are fired during slumps, when the team is getting unlucky and playing at a short-term record below both its overall record and its Pythagorean record (the record expected from runs scored and runs allowed). However, as the sample size increases, the team will play at a level closer to its talent level (possibly around .550 for the Brewers). The Brewers were not "due" to lose, and they are not "due" to win in the future either. For the next two weeks, the Brewers are simply most likely going to perform better than they have in the previous 2 weeks, which is what their season's worth of data and talent level would predict. Analysts like John Kruk, and maybe even Brewers players, will credit this to the manager change, when there really isn't much evidence pointing to that.
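For reference, the Pythagorean record mentioned above is usually computed with Bill James's formula, runs scored squared divided by the sum of runs scored squared and runs allowed squared. Here is a minimal sketch. The run totals are hypothetical placeholders, not the Brewers' actual numbers, and the exponent of 2 is the classic choice (other exponents are sometimes used).

```python
def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Expected winning percentage from runs scored and runs allowed."""
    scored = runs_scored ** exponent
    allowed = runs_allowed ** exponent
    return scored / (scored + allowed)

# Hypothetical season totals, for illustration only.
expected = pythagorean_win_pct(runs_scored=750, runs_allowed=680)
print(f"expected winning percentage: {expected:.3f}")
```

A team whose actual record sits well below this number over a short stretch has probably been unlucky rather than badly managed.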

Let's say Carlos flips a fair coin and he gets a bunch of tails (losses) in a row. Carlos is fired and is replaced by the fresh and upcoming Ryan. Ryan flips closer to the 50/50 rate. Sports analysts would say Ryan turned things around. It's the same thing in baseball, except it's not exactly 50/50, but rather maybe 55/45 for the Brewers.
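The coin-flip argument can be checked with a quick simulation. This is a toy model under stated assumptions: every game is an independent flip won 55 percent of the time, a "slump" is 4 or fewer wins in a 14-game stretch, and the manager has no effect at all. The window size, the slump threshold, and the function name are all arbitrary choices made for illustration.

```python
import random

def simulate_after_slump(p_win=0.55, window=14, slump_max_wins=4,
                         trials=50000, seed=0):
    """Compare win rates during a slump and in the stretch right after it."""
    rng = random.Random(seed)
    slump_rates, next_rates = [], []
    for _ in range(trials):
        first = sum(rng.random() < p_win for _ in range(window))
        second = sum(rng.random() < p_win for _ in range(window))
        if first <= slump_max_wins:  # keep only stretches that look like a slump
            slump_rates.append(first / window)
            next_rates.append(second / window)
    count = len(slump_rates)
    return count, sum(slump_rates) / count, sum(next_rates) / count

count, during, after = simulate_after_slump()
print(f"{count} slumps: {during:.3f} during, {after:.3f} in the next stretch")
```

In this model the win rate right after a slump comes back to about .550 every time, with no manager change involved. Any new manager hired at the bottom of a slump would appear to have "turned things around".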

In my opinion the role of the baseball manager is vastly overrated. The decisions made by a manager could be made better by a computer. Players need a friend/leader to make sure they stay confident in themselves. Would hiring a psychologist or motivational speaker and using computer based analysis to make decisions be better than a "baseball guy"? Maybe...

So my prediction is that this move will "help" the Brewers, but will actually have little effect.

for further studies...
http://www.hardballtimes.com/
baseball prospectus

Edit @ 4:30pm - I know my computer manager thing isn't going to happen, but can we at least get a laptop or two in the clubhouse?

Friday, September 5, 2008

Fantasy Football Drafting Theories



Fantasy Football season is about to begin. The best part, the draft, has already passed - so let me share with you a few of my theories, based a bit on data, guts and observation.

1. When drafting, consider a player's "VORP"

VORP (Value Over Replacement Player) is a statistic developed by the sabermetric community in baseball, but it can be used in fantasy football quite well. When deciding between a backup QB and a fourth running back in the last rounds, one should think about who will be available on the waiver wire. If there will be several quarterbacks of equal value to the one you're considering drafting, then maybe you should go a different route.

It is important to utilize the waiver wire - and realize you do not always have to draft 2 QBs, 2 tight ends, 2 defenses, and a kicker. You can always pick up the equivalent backup later on the waiver wire.
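Here is a minimal sketch of that comparison. The projected point totals are invented for illustration, and "replacement level" is assumed to be the best player at the same position who would still be on the waiver wire. The function name is hypothetical too.

```python
def value_over_replacement(candidate_points, waiver_points):
    """Projected points above the best player left on the waiver wire."""
    replacement_level = max(waiver_points, default=0.0)
    return candidate_points - replacement_level

# Hypothetical season projections (fantasy points), for illustration only.
backup_qb = 210.0
waiver_qbs = [205.0, 200.0, 190.0]
fourth_rb = 140.0
waiver_rbs = [95.0, 90.0]

qb_value = value_over_replacement(backup_qb, waiver_qbs)
rb_value = value_over_replacement(fourth_rb, waiver_rbs)
print(f"backup QB: +{qb_value:.0f}, fourth RB: +{rb_value:.0f}")
```

In this made-up case the quarterback projects for more raw points, but the running back is the better pick because his replacements on the waiver wire are so much worse.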


2. Old players suck - avoid them

This study suggests that players start declining at age 28 for running backs, 30 for receivers, and 32 for quarterbacks.


There are several problems with this study:
1. It does not evaluate the percentage decrease/increase of performance
2. It evaluates all players equally - scrubs and superstars (maybe superstars, the ones who matter to fantasy leagues, decline later)
3. It makes arbitrary categories on age, instead of analyzing data as continuous.

Well I take what I can get in terms of data. I'm too lazy to analyze the data myself. Either way, I am still a believer in the theory that well known, older players are going to be overvalued by the average fantasy manager.

3. Draft running backs on Good Teams

Teams that usually win are not only scoring more, but are also usually running out the clock in the 4th quarter, giving their running backs some extra carries.

"The correlation between first quarter rushing attempts and team wins is a measly .171. That means there is almost no connection between running a lot in the first quarter, and winning a lot of games. The correlation between fourth quarter rushing attempts and team wins, on the other hand, is .750. That’s a sizeable relationship."
http://www.footballoutsiders.com/2003/07/14/ramblings/stat-analysis/3/
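One way to read the two correlations quoted above is to square them, which gives the share of the variation in team wins that a simple linear fit on rushing attempts would account for. A small sketch (the function name is made up for illustration):

```python
def variance_explained(r):
    """Share of variance explained by a simple linear fit with correlation r."""
    return r ** 2

for label, r in [("first quarter", 0.171), ("fourth quarter", 0.750)]:
    share = variance_explained(r)
    print(f"{label} rushing attempts: r = {r:.3f}, r squared = {share:.3f}")
```

That works out to about 3 percent for first quarter attempts versus about 56 percent for fourth quarter attempts. It fits the idea that teams run late because they are already winning.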

4. Draft receivers on Bad Teams, with decent QBs

The converse of this - when teams are playing from behind they have to throw more giving better stats to the receivers.

5. Draft many more running backs and wide receivers

In a typical league you will start 2 or 3 running backs and wide outs, while only starting 1 QB, TE, and Def. There are over 30 NFL starters at each position, so the one-starter spots are much easier to fill late in the draft or off the waiver wire.

6. Concentrate on Yards instead of TDs

Especially for Running Backs - TDs are a lot of luck. Look at Willie Parker last year who only had 2 TDs despite being one of the leading rushers. The previous year he had 16 TDs with a similar yardage amount. Yards gained are going to be a lot more constant (less variable) than TDs, with TD vultures, etc.

7. Don't overthink things

There is a lot of variability and luck in fantasy football. You cannot really control injuries for the most part, and predicting results on a week to week basis is little better than a crapshoot. Players come out of nowhere each year, like Derek Anderson, Jason Witten, and Ryan Grant last year. Solid producers like Shaun Alexander, Priest Holmes and Marvin Harrison can drop off a cliff in any given year. Just pick some guys you think should do well, loosely based on statistics and the measures above, then pick some guys you enjoy rooting for.

8. Finally, actually show up to the draft!


I actually missed my keeper league draft this year, and got screwed. Yahoo decided I needed 10 backup QBs like Charlie Batch. So I'll be working the waivers heavily this year.