Tuesday, September 30, 2008

700 billion dollars

Understanding the United States' financial woes is a rather difficult task. I'm still trying to get a handle on it myself. My basic thought is that we have been accumulating debt and deregulating the markets for several decades now, and it has finally come back to bite us. A corporation's job is to turn a profit; it's the government's job to regulate and protect the general public. The government has failed us in this instance.

That being said, this article from Daily Kos does a nice job of explaining why a plan is needed, describing the financial institutions as the heart of the economy.

As you know, the bailout/rescue bill failed yesterday, leading to a major drop in stock markets around the world. The congressmen (particularly the ones up for re-election) have a very difficult decision to make. There have been rumors that calls to congressmen from the general public ran 50 or 100 to 1 against the bailout/rescue bill. However, the markets' reaction suggests that some kind of bill is essential. I think it's important to remember that this 700 billion isn't all going down the drain; it is an investment that could actually make money one day (though a risky and possibly bad one).

It is understandable that taxpayers are against it, given figures like the ones in Time Magazine showing how much $700,000,000,000 really is. With that money you could...

  • Give every person in the United States $2,300, or give every household $6,200
  • Pay the income taxes of every American who makes $500,000 or less a year
  • Fully fund the Defense, Treasury, Education, State, Veterans Affairs and Interior departments next year, as well as NASA
  • Buy gasoline for every car in the United States for the next 16 months
  • Buy every NFL, NBA and Major League Baseball team, build each one a new stadium, and pay every player $191 million for a year
  • Create the 17th largest economy in the world, roughly equal to that of the Netherlands
  • Pay off just 7% of the $9.8 trillion national debt
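
As a rough sanity check on those figures, here is a small Python sketch that works backward from the per-person and per-household payouts. The function name is my own, and the only inputs are the numbers quoted above, so treat it as back-of-the-envelope arithmetic and nothing more.

```python
# Back-of-the-envelope check of the Time Magazine comparisons.
# All dollar figures come from the list above; nothing else is assumed.

BAILOUT = 700_000_000_000  # $700 billion


def implied_count(total, per_unit):
    """How many recipients a per-unit payout implies for a given total."""
    return total / per_unit


people = implied_count(BAILOUT, 2_300)      # implied US population
households = implied_count(BAILOUT, 6_200)  # implied number of households
debt_share = BAILOUT / 9.8e12               # fraction of the $9.8 trillion debt

print(f"Implied population: {people / 1e6:.0f} million")
print(f"Implied households: {households / 1e6:.0f} million")
print(f"Share of national debt: {debt_share:.1%}")
```

The payouts imply roughly 304 million people and 113 million households, and the debt share comes out to about 7.1%, so the list is at least internally consistent.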

Monday, September 29, 2008

Cell Phones and Brain Cancer


Figure 1. Estimated penetration of electromagnetic radiation from a cell phone by age (GSM, 900 MHz). The scale on the right shows the Specific Absorption Rate at different depths, in W/kg.

Today I saw Devra Davis lecture about her new book, The Secret History of the War on Cancer. In the book she discusses the controversial topic of cell phones and brain cancer, comparing it to the tobacco/lung cancer relationship back in the 1950s.

Her main points were:
-We don't have enough evidence to draw conclusions one way or the other
-Previous studies (which have mostly shown no association) have been biased: "cell phone users" included anyone who had made a call in the past 6 months
-The huge increase in use has only happened in the past ten years, while cancer takes 20-30 years to develop
-She is particularly concerned about children's increased use of cell phones, since their skulls are thinner and their brains aren't fully myelinated (see above figure)
----------
This NY Times article provides a more balanced view:

"According to the Food and Drug Administration, three large epidemiology studies since 2000 have shown no harmful effects. CTIA — the Wireless Association, the leading industry trade group, said in a statement, “The overwhelming majority of studies that have been published in scientific journals around the globe show that wireless phones do not pose a health risk."

The F.D.A. notes, however, that the average period of phone use in the studies it cites was about three years, so the research doesn’t answer questions about long-term exposures. Critics say many studies are flawed for that reason, and also because they do not distinguish between casual and heavy use.

Cellphones emit non-ionizing radiation, waves of energy that are too weak to break chemical bonds or to set off the DNA damage known to cause cancer. There is no known biological mechanism to explain how non-ionizing radiation might lead to cancer."
-----------
Studies that have shown a risk report unalarmingly low odds ratios:

Meta-analysis of long-term mobile phone use and the association with brain tumours


Abstract:
We evaluated long-term use of mobile phones and the risk for brain tumours in case-control studies published so far on this issue. We identified ten studies on glioma and meta-analysis yielded OR = 0.9, 95% CI = 0.8-1.1. Latency period of ≥10-years gave OR = 1.2, 95% CI = 0.8-1.9 based on six studies, for ipsilateral use (same side as tumour) OR = 2.0, 95% CI = 1.2-3.4 (four studies), but contralateral use did not increase the risk significantly, OR = 1.1, 95% CI = 0.6-2.0. Meta-analysis of nine studies on acoustic neuroma gave OR = 0.9, 95% CI = 0.7-1.1 increasing to OR = 1.3, 95% CI = 0.6-2.8 using ≥10-years latency period (four studies). Ipsilateral use gave OR = 2.4, 95% CI = 1.1-5.3 and contra-lateral OR = 1.2, 95% CI = 0.7-2.2 in the ≥10-years latency period group (three studies). Seven studies gave results for meningioma yielding overall OR = 0.8, 95% CI = 0.7-0.99. Using ≥10-years latency period OR = 1.3, 95% CI = 0.9-1.8 was calculated (four studies) increasing to OR = 1.7, 95% CI = 0.99-3.1 for ipsilateral use and OR = 1.0, 95% CI = 0.3-3.1 for contralateral use (two studies). We conclude that this meta-analysis gave a consistent pattern of an association between mobile phone use and ipsilateral glioma and acoustic neuroma using ≥10-years latency period.
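
One quick way to read an abstract like this is to check which 95% confidence intervals exclude 1.0 (no association). Here is a minimal Python sketch using only the odds ratios quoted above; the shorthand labels and the helper name are mine, not the paper's.

```python
# Odds ratios and 95% confidence intervals copied from the abstract above.
# Labels are my shorthand; excludes_one is a hypothetical helper name.
RESULTS = [
    ("glioma, overall",                        0.9, 0.8, 1.1),
    ("glioma, >=10y latency",                  1.2, 0.8, 1.9),
    ("glioma, >=10y ipsilateral",              2.0, 1.2, 3.4),
    ("glioma, >=10y contralateral",            1.1, 0.6, 2.0),
    ("acoustic neuroma, overall",              0.9, 0.7, 1.1),
    ("acoustic neuroma, >=10y latency",        1.3, 0.6, 2.8),
    ("acoustic neuroma, >=10y ipsilateral",    2.4, 1.1, 5.3),
    ("acoustic neuroma, >=10y contralateral",  1.2, 0.7, 2.2),
    ("meningioma, overall",                    0.8, 0.7, 0.99),
    ("meningioma, >=10y latency",              1.3, 0.9, 1.8),
    ("meningioma, >=10y ipsilateral",          1.7, 0.99, 3.1),
    ("meningioma, >=10y contralateral",        1.0, 0.3, 3.1),
]


def excludes_one(low, high):
    """True when the interval lies entirely above or entirely below 1.0."""
    return low > 1.0 or high < 1.0


significant = [
    label for label, _odds_ratio, low, high in RESULTS if excludes_one(low, high)
]

for label in significant:
    print(label)
```

Only three of the twelve intervals exclude 1.0: ipsilateral glioma, ipsilateral acoustic neuroma, and overall meningioma, and the last of those points toward lower risk, not higher.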
-----------------


My Conclusions:
1. Chances are there is little or no increase in risk for the general population, but even a small increase would be a big public health problem since billions of people use cell phones.
2. The lack of a known biological mechanism is huge, and it is the major reason the FDA and scientists aren't freaking out.
3. Studies should concentrate particularly on children.
4. Time will tell. Hopefully we can gather more evidence and not be slow to act like we were with tobacco (if there happens to be causal evidence).
5. If worst comes to worst, we can always go back to the Banana Phone...

Sunday, September 28, 2008

Phillies Clinch NL East!



My friends and I went to a classic Phillies game on Saturday, one that statistics cannot adequately describe. The figure above is from a cool website called fangraphs.com. It tracks the win probability as the game goes on. It also measures the "leverage index (LI)", or importance, of each at bat as the game goes along. You can see a big dip in win probability and an increase in leverage at the "C Guzman Single". That single made the game 4-3 and loaded the bases with only one out. The graph fails to illustrate the next play, which appeared to be a lead-changing single but was turned into a game-ending double play.

There are also some pretty cool statistics measured on this site, including WPA (Win Probability Added) and Clutch.

Since I know you're interested, rankings of a few players in WPA for the 2008 season:
1. Manny Ramirez: 7.03
2. Lance Berkman: 6.68
3. Albert Pujols: 6.22
8. Carlos Beltran: 4.53
9. Joe Mauer: 4.52 (top in AL)
11. Pat Burrell: 3.78 (top for Phillies)
39. Jason Giambi: 2.17 (yes he was ranked above Howard)
40. Ryan Howard: 2.17 (NL MVP or MVP of the month? - he leads in September WPA)
50. Jack Cust: 1.85 (sadly, the top Athletic)
148 (last). Jeff Francoeur: -3.91

Tops in "clutch" (WPA/(LI-(WPA/LI))):

1. Stephen Drew: 2.29
2. Lance Berkman: 1.82
3. Dustin Pedroia: 1.52
8. Pat Burrell: 1.15 (top for Phillies)
115. Ryan Howard: -0.76
145 (4th to last). Chase Utley: -2.11 (I don't think Phillies fans noticed)
148 (last). Alex Rodriguez: -3.09 (NY fans might have been right this year?)

And FYI for pitchers WPA:
WPA: Starters
Cliff Lee: 6.22
Tim Lincecum: 4.73
CC Sabathia: 4.69
Roy Halladay: 4.48
Johan Santana: 4.41
(Hamels is 15th for starters)

WPA: Relievers
Brad Lidge: 5.43
Mariano Rivera: 4.47
Joakim Soria: 4.42
Joe Nathan: 3.73
Carlos Marmol: 3.71

Tuesday, September 23, 2008

2008 Election: State Rankings - 9/23/08

(Rank, State, Score (max=1))
1. Pennsylvania - .854
2. Michigan - .775
3. Florida - .685
4. Minnesota - .675
5. Colorado - .652
6. Wisconsin - .620
7. Washington - .589
8. New Jersey - .583
9. Ohio - .576
10. Virginia - .560

-----
Just Missed: North Carolina, New Mexico, Indiana, Oregon, Nevada
(2000 Florida would rank 1st, 2004 Ohio would rank 2nd just below Pennsylvania)

See my intro - for the explanation of the question and methods for this project.
Current National Average: +3.0 for Obama
Nate Silver's top 5 tipping point states: Pennsylvania, Ohio, Virginia, Colorado, and Michigan

Analysis:
The states at the top seem to be the more Obama-leaning states, possibly because of a recent Obama bump in the national polls. It's possible that some of the state data has yet to catch up with the national data. The list is filled with larger states, possibly to a fault. I'm considering changing the 50/50 weighting. I'd love to hear feedback.

A couple of surprises on the list:
1. Pennsylvania - seemed to be a strong Obama state, but polls have been mixed lately, very close to the national average
8. New Jersey - see PA
9 and 10. Ohio and Virginia - I would expect to see these states at the top of the list. They seem to be running a few points behind the national average for Obama.

Monday, September 22, 2008

2008 Election: State Rankings - Intro

As we are under two months away from the Nov 4th election and a few days away from the first debate, I thought I would take a try at some statistical predictions.

Question: Which state is most likely to be this year's Florida (2000) or Ohio (2004)? Which one will be the "tipping point" state that decides a very close election? Several sites such as 538 have similar analyses, but they seem overly convoluted, attempting to adjust for every confounder, many of which are difficult to measure. I will provide a crude analysis that could be less biased than the others on the web. (I prefer the 538 one, but it is nice to have something to compare it to.)

-----------
Methods:
The rankings are based off of two measures:

1. State's Difference from National Average
Based on composites from Pollster (a popular aggregator which combines data from several pollsters: Gallup, Rasmussen, CNN, etc.). I excluded any state more than 10 points away from the national average, assuming it would not be a tipping state. I am also assuming that the election will be closely contested in terms of the popular vote; otherwise there will not be a tipping point state.

=(10-X)/10
X=State's difference from national average
*therefore a state that is exactly equal to the national average = 1*
-given .50 weight

2. # of Electoral Votes - (538 total in nation, 270 needed to be elected president)

=(State X's # of Electoral Votes)/(Largest State of Interest's Electoral Votes)
*therefore the largest state in question has a ratio = 1*
-given .50 weight (In 2004, New Mexico and Iowa were actually more closely contested than Ohio, but did not have enough EV to "tip" the election)
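
The two measures above can be sketched in a few lines of Python. This is only my reading of the method: the function names are invented for illustration, I am assuming the "difference" is an absolute value, and the example inputs are made up and are not the poll data behind the rankings.

```python
def closeness_score(state_margin, national_margin, cutoff=10.0):
    """Measure 1: (cutoff - X) / cutoff, X = distance from the national margin.

    Returns None for states more than `cutoff` points away (excluded).
    Assumption: X is taken as an absolute difference.
    """
    x = abs(state_margin - national_margin)
    if x > cutoff:
        return None
    return (cutoff - x) / cutoff


def ev_score(state_ev, largest_ev):
    """Measure 2: electoral votes relative to the largest state of interest."""
    return state_ev / largest_ev


def tipping_score(state_margin, national_margin, state_ev, largest_ev,
                  w_close=0.5, w_ev=0.5):
    """Weighted sum of the two measures (0.5 / 0.5 by default)."""
    closeness = closeness_score(state_margin, national_margin)
    if closeness is None:
        return None  # state excluded from the ranking
    return w_close * closeness + w_ev * ev_score(state_ev, largest_ev)


# Hypothetical inputs: a state polling +2.0 against a +3.0 national margin,
# with 20 electoral votes, where the largest state of interest has 27.
example = tipping_score(2.0, 3.0, 20, 27)
excluded = tipping_score(15.0, 3.0, 10, 27)

print(f"{example:.3f}")
print(excluded)
```

Changing `w_close` and `w_ev` is an easy way to see how sensitive the ranking is to the arbitrary 50/50 weighting.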
-------------

Assumptions/Drawbacks:
-State data is following same trend as National data (not lagging behind)
-Confounders such as ground game, similarities to other swing states (demographics), and nation/state lag time are not involved in model
-Arbitrary weighting of state average (.5) , electoral average (.5)

Sunday, September 21, 2008

Epidemiology Theories - Pre-Hopkins

Theories I have one month into my Hopkins education:

1. Life expectancy in the US will decrease in the near future
-too many w/o health insurance
-expectancy (~80) already close to human max (~100)
-America going down the drain (stupid wars, huge debt, not funding/encouraging science)
-obesity epidemic

2. Environment is a much greater factor than Genes
-Genome Wide Association studies have shown little
-BRCA1/2 an exception, but possibly no others quite like it
-Personal genetics companies like 23andMe will lead to overanalysis of genes
-However genes play a part in basically every disease (average approx 25 percent per disease?)

3. In terms of health disparities, Socioeconomic Status (SES) is a much greater factor than race
-Race overanalyzed - possibly since it's easy to measure?
-SES underdeveloped, needs a less arbitrary definition (make a uniform one?)
-However separation of groups and evolution gives more reason for race

4. Future in GIS (Geographic Information Systems) and Biomarker studies
-GIS used by gov't to identify disease clusters and disparities
-Biomarkers help reduce bias, identify diseases and potential diseases at earlier stages

5. Measures will be found to greatly reduce bias
-A well done small study (w/ biomarkers) is greater than a huge sample with surveys.
-find other ways to evaluate disease without relying on human memories.
-technology will greatly help in this area - measuring risk factor/food intake via cell phones, etc

6. Obesity will not be the next smoking in this generation
-people too lazy to exercise (much harder than quitting smoking?)
-does not have the obvious ability to harm others like smoking (2nd hand)
-very difficult to turn food into something "evil" (like cigarettes)

7. Poor health education (stubbornness?) is public health's biggest enemy
-public over-interprets studies - we should ignore a finding unless it increases risk by 300 percent. Current example: BPA (a poorly done study showed a 2x increase in heart disease, and Nalgene removed it from bottles, even though the FDA and well done studies are on the other side).
-ignoring studies - alarming increase in vaccine distrust (people need to see a disease to be afraid?)
-huge health disparities between education classes


We'll see how my Hopkins education alters my views....

Friday, September 19, 2008

Two Point Conversions


Last Sunday, in the 2nd week of the NFL season, the Denver Broncos defeated the San Diego Chargers in an exciting 39-38 game. The Broncos made a controversial decision to go for a 2-point conversion down 38-37 with 29 seconds left, declining to kick the extra point for the tie.

Disregarding the Broncos' successful attempt - the question is: what was the statistically correct move?

-Most NFL fans and analysts would say the extra point - based on the fact that the broncos would tie the game and have a 50% chance of winning in overtime. This is basically the conventional wisdom. “As a general rule, I feel like I have an obligation to my team to give them a chance to win the game in overtime by kicking an extra point,” Jeff Fisher, the Tennessee Titans’ coach, said, “not by winning or losing the game on one play.”

However, the alternative (the 2-point conversion) has had a varying success rate since its introduction. This New York Times article describes the rate, including this interesting fact: "Last season, N.F.L. teams converted just 30 of 61 attempts, a paltry .492 success rate". (The rate has been higher lately, possibly because of fewer attempts and better play calling.) The same article mentions that the success rate of an extra point is slightly below 99 percent.

Another scenario to account for is the Broncos missing the 2-point conversion, recovering an onside kick (10-15 percent success rate), then scoring in the final ~30 seconds (probably 20-30 percent, assuming a successful onside kick). This would add roughly 1-3 percent to the win probability for the 2-point conversion.

It is hard to take every variable into account, including each player on the field, but I estimate the 2-point conversion was the correct statistical call by a very small margin. This is sometimes difficult to comprehend, since it's a dichotomous result. Close to 50 percent of the time, choosing the 2-point conversion will fail and look like the "wrong" choice.
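
To make that estimate concrete, here is a minimal Python sketch of the two options using only the rates quoted above. It is a simplification under stated assumptions: overtime is treated as a coin flip, the extra point rate is rounded to 0.99, the onside kick and late score use the midpoints of my ranges, and anything San Diego might do in the last 29 seconds is ignored. The function names are my own.

```python
def win_prob_kick(p_xp=0.99, p_overtime=0.5):
    """Kick the extra point, then win a (roughly) 50/50 overtime."""
    return p_xp * p_overtime


def win_prob_two(p_two=0.492, p_onside=0.125, p_late_score=0.25):
    """Go for two; if it fails, recover an onside kick and score late."""
    return p_two + (1 - p_two) * p_onside * p_late_score


kick = win_prob_kick()
two = win_prob_two()
two_low = win_prob_two(p_onside=0.10, p_late_score=0.20)
two_high = win_prob_two(p_onside=0.15, p_late_score=0.30)

print(f"Extra point: {kick:.3f}")
print(f"Two-point:   {two:.3f} (range {two_low:.3f} to {two_high:.3f})")
```

Under these assumptions the 2-point conversion comes out at roughly 0.50 to 0.51 against about 0.495 for the kick, which matches the "very small margin" above. Nudging `p_two` down a couple of points flips the answer, so the conclusion is only as good as that one input.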

“Sometimes you have to go with your gut,” Shanahan said. “I just felt like it was a chance for us to put them away. I didn’t want to count on the coin flip. I wanted to do it then, and obviously it worked out.”

While I am not a fan of the phrase "going with your gut" (did his lunch make the decision?), I must commend Shanahan for taking this risk. Most coaches (like Jeff Fisher) tend to be more conservative, choosing the statistically incorrect decision when the correct one carries a high risk that could later be blamed on the coach.

I admire Mike Shanahan for his longevity, his unique use of "skill players" like running backs, and his willingness to put himself on the line by making risky decisions that are statistically sound. Just not his post-game explanations. ;)

Tuesday, September 16, 2008

Baseball Managers and Probability


I recently got into a discussion with my friend about whether the Brewers' decision to fire their manager 2 weeks before the end of the season will be beneficial for them.

Here is my commentary....

I think the effect is negligible and is overanalyzed by the media. Managers are often fired during slumps, when the team is getting unlucky and its short-term record is below both its Pythagorean record (based on runs scored and allowed) and its overall record. However, as the sample size increases, the team will play at a level closer to its talent level (possibly around .550 for the Brewers). The Brewers were "due" to lose, and they are not "due" to win in the future, though. For the next two weeks, the Brewers are most likely going to perform better than they have in the previous two weeks. That expectation should be based on their season's worth of data and talent level. Analysts like John Kruk, and maybe even Brewers players, will attribute it to the manager change, when there really isn't much evidence pointing to that.

Let's say Carlos flips a fair coin and gets a bunch of tails (losses) in a row. Carlos is fired and is replaced by the fresh and upcoming Ryan. Ryan flips closer to the 50/50 rate. Sports analysts would say Ryan turned things around. It's the same thing in baseball, except it's not exactly 50/50, but rather maybe 55/45 for the Brewers.
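
The coin analogy is easy to simulate. The Python sketch below is my own illustration, with made-up parameters: a team that wins each game independently with probability .550, a "slump" defined as 4 or fewer wins in a 14-game stretch, and nothing about the manager in the model at all.

```python
import random


def next_stretch_after_slump(p_win=0.55, stretch=14, slump_max_wins=4,
                             trials=50_000, seed=0):
    """Simulate pairs of stretches; report how teams do right after a slump.

    Returns (number of slumps seen, win rate in the stretch that follows).
    Every game is an independent coin flip with probability p_win.
    """
    rng = random.Random(seed)
    slumps = 0
    wins_after = 0
    for _ in range(trials):
        first = sum(rng.random() < p_win for _ in range(stretch))
        if first <= slump_max_wins:
            # The "new manager" stretch: same coin, same odds.
            second = sum(rng.random() < p_win for _ in range(stretch))
            slumps += 1
            wins_after += second
    rate_after = wins_after / (slumps * stretch) if slumps else float("nan")
    return slumps, rate_after


slumps, rate_after = next_stretch_after_slump()
print(f"Slumps observed: {slumps}")
print(f"Win rate in the following stretch: {rate_after:.3f}")
```

The win rate after a slump lands close to .550, far above the slump itself, even though nothing changed except the passage of time. That is regression to the mean, and it is what would get credited to Ryan.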

In my opinion the role of the baseball manager is vastly overrated. The decisions made by a manager could be made better by a computer. Players need a friend/leader to make sure they stay confident in themselves. Would hiring a psychologist or motivational speaker and using computer based analysis to make decisions be better than a "baseball guy"? Maybe...

So my prediction is that this move will "help" the brewers, but will technically bring little effect.

for further studies...
http://www.hardballtimes.com/
baseball prospectus

Edit @ 4:30pm - I know my computer manager thing isn't going to happen, but can we at least get a laptop or two in the clubhouse?

Wednesday, September 10, 2008

What's up with the McCain Bounce?


Gallup Daily tracking poll (Sept 9)


As you can see from Gallup and other polls - John McCain has been polling much better this week. The analyses suggest this is because of his VP selection of Gov. Palin and the Republican convention that was held last week - and that it may be short lived.

My problem with the polls:
There are several problems with these daily tracking polls, and with political polls in general. Robocalling, the failure to call cell phones, and the increased use of caller ID are a few of the many. My biggest problem, however, is with the low response rate, or response bias.

The goal of a poll is to develop a representative sample that describes the population of interest. In our case, the sample of approximately 1,000 people being polled is supposed to represent the general US voting population. In 2004, Bush received over 62 million votes, while Kerry received over 59 million, totaling well over 100 million votes. Organizations like Gallup and Rasmussen did a good job with their statistics, determining a good sample size and all of that. The problem is their response rate. The Pew Research Center found that in standard surveys (like the daily tracking poll) the response rate is 27 percent. I would wager that the number is closer to 10-15 percent these days. In Epi you want to get your response rate at least in the 70 percent range.
Gallup and Rasmussen still get their adequate sample size of, say, 1,000 (I can't find the exact #) by calling closer to 5K+ homes. They are also able to adjust for confounders such as age, race, sex, etc.

So what can we take from this? Response rate is highly dependent on enthusiasm. One explanation for the McCain bounce would be that voters are more enthusiastic about his candidacy thanks to both the selection of Palin and the convention. These voters would now be more willing to answer their phone and take a few minutes out of their day to respond to the pollster's questions.

For example: Let's assume Steve is a conservative-leaning independent who plans to vote for McCain. Before this past week he might have just ignored the pollster's call, not wanting to talk politics. The pollster would then go on to the next, more enthusiastic person, Andrew, who would answer their questions - possibly a liberal-leaning independent who is very unhappy with Bush. Now this week people like Steve are more willing to talk - and therefore the pollster wouldn't reach people like Andrew once the sample size has become adequate. This factor has been seen in British elections and is called the Shy Tory Factor.

The Democrats saw a smaller bump after the Biden announcement and their convention, possibly because their base voters were already very enthusiastic, and also because the events all kind of overlapped in a short period of time. The real question is - would unenthusiastic Steve, who will not answer the polling question, still end up voting on Nov. 4th? I think most will.

The real statistic that would be most helpful to test this hypothesis is the response rate for each day of tracking and for each group - republicans, democrats, and independents.
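
To show how much a small gap in response rates could matter, here is a minimal Python sketch. The numbers are made up for illustration (a true 50/50 race and response rates inside my guessed 10-15 percent range), and the function name is my own.

```python
def observed_share(true_share, resp_rate_a, resp_rate_b):
    """Share of respondents backing candidate A when the two camps
    pick up the phone at different rates.

    true_share  -- A's real share of the electorate (0 to 1)
    resp_rate_a -- response rate among A's supporters
    resp_rate_b -- response rate among B's supporters
    """
    a = true_share * resp_rate_a
    b = (1 - true_share) * resp_rate_b
    return a / (a + b)


# Hypothetical: dead-even race, equal enthusiasm on both sides.
before = observed_share(0.5, 0.12, 0.12)
# Same electorate, but A's supporters now answer 15% of calls instead of 12%.
after = observed_share(0.5, 0.15, 0.12)

print(f"Before: {before:.1%}")
print(f"After:  {after:.1%}")
```

In this toy example a three-point change in one side's response rate turns a tie into roughly 55.6 to 44.4 without a single voter changing their mind. Adjusting for age, race, and sex would only fix this to the extent that enthusiasm tracks those variables.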

Conclusion:
I think the bounce will eventually go down, but I don't think it matters. My advice: Ignore daily tracking polls until they discuss and alleviate some of the response bias problems.

(see 538 for Nate Silver's thoughts on the subject)

Monday, September 8, 2008

Gas Prices should be higher...

A discussion on the cost of gasoline:

“This price reflects only the cost of discovering the oil, pumping it to the surface, refining it into gasoline, and delivering the gas to service stations. It overlooks the costs of climate change as well as the costs of tax subsidies to the oil industry, the burgeoning military costs of protecting access to oil in the politically unstable Middle East, and the health care costs for treating respiratory illnesses from breathing polluted air.” (p. 7)
-Plan B 3.0: Mobilizing to Save Civilization, Third Edition (Lester Brown)

Sounds about right to me....

Friday, September 5, 2008

Fantasy Football Drafting Theories



Fantasy Football season is about to begin. The best part, the draft, has already passed - so let me share with you a few of my theories, based a bit on data, guts and observational study.

1. When drafting, one should consider a player's "VORP"

-VORP - Value Over Replacement Player is a statistic developed by the sabermetric community in baseball, but it can be used in fantasy football quite well. When deciding between a backup QB and a fourth running back in the last rounds, one should think about who will be available on the waiver wire. If there will be several quarterbacks of equal value to the one you're considering drafting, then maybe you should go a different route.

It is important to utilize the waiver wire - and realize you do not always have to draft 2 QBs, 2 tight ends, 2 defenses, and a kicker. You can always pick up the equivalent backup later on the waiver wire.


2. Old players suck - avoid them

This study suggests that players start declining at age 28 for running backs, 30 for receivers, and 32 for quarterbacks.


There are several problems with this study:
1. It does not evaluate the percentage decrease/increase of performance
2. It evaluates all players equally - scrubs and superstars (maybe superstars, the ones who matter to fantasy leagues, decline later)
3. It makes arbitrary categories on age, instead of analyzing data as continuous.

Well I take what I can get in terms of data. I'm too lazy to analyze the data myself. Either way, I am still a believer in the theory that well known, older players are going to be overvalued by the average fantasy manager.

3. Draft running backs on Good Teams

Teams that usually win are not only scoring more, but usually running out the clock in the 4th quarter - giving running backs some extra carries.

"The correlation between first quarter rushing attempts and team wins is a measly .171. That means there is almost no connection between running a lot in the first quarter, and winning a lot of games. The correlation between fourth quarter rushing attempts and team wins, on the other hand, is .750. That’s a sizable relationship."
http://www.footballoutsiders.com/2003/07/14/ramblings/stat-analysis/3/

4. Draft receivers on Bad Teams, with decent QBs

The converse of this - when teams are playing from behind they have to throw more, giving better stats to the receivers.

5. Draft many more running backs and wide receivers

In a typical league you will start 2 or 3 running backs and wideouts, while only starting 1 QB, TE, and Def. There are over 30 starters for each position.

6. Concentrate on Yards instead of TDs

Especially for running backs - TDs involve a lot of luck. Look at Willie Parker last year, who only had 2 TDs despite being one of the leading rushers. The previous year he had 16 TDs with a similar yardage amount. Yards gained are going to be a lot more constant (less variable) than TDs, with TD vultures, etc.

7. Don't over think things

There is a lot of variability and luck in fantasy football. You cannot really control injuries for the most part, and predicting results on a week to week basis is little better than a crapshoot. Players come out of nowhere each year, like Derek Anderson, Jason Witten, and Ryan Grant last year. Solid producers like Shaun Alexander, Priest Holmes and Marvin Harrison can drop off a cliff any given year. Just pick some guys you think should do well, loosely based on statistics and the other measures above, then pick some guys you enjoy rooting for.

8. Finally, actually show up to the draft!


I actually missed my keeper league draft this year, and got screwed. Yahoo decided I needed 10 backup QBs like Charlie Batch. So I'll be working the waivers heavily this year.

Thursday, September 4, 2008

Haiti and Hurricanes


I am a big supporter of Partners in Health (PIH), run by Paul Farmer, Ophelia Dahl and others. Their organization helped make the case that you can successfully treat people in developing countries like Haiti (the poorest country in the western hemisphere) the same way as in the United States. They treat diseases that were usually considered 'too expensive', like MDR-TB and HIV/AIDS.


I have traveled to Haiti and its neighbor the Dominican Republic several times over the past few years with a solidarity group. Following Haiti's news over the past few years, I've realized its problems are a bit more complicated than they seem on the surface.


One of Haiti's major problems is its location - specifically, it is a prime target for hurricanes. Similar to places like Florida, New Orleans, and Cuba, each fall it seems to be devastated by one hurricane after another.


Today I received an email from PIH discussing Haiti's recent blows from Hurricanes Gustav and Hanna.

"Loune Viaud, our Director of Operations in Haiti, explained that the situation is dire and the suffering extreme. She estimates that close to 10,000 people have been driven from their homes by floodwaters in Haiti’s Artibonite Valley, where we have recently expanded our operations to six new facilities."

"The situation is dire and catastrophic and sad and frustrating... worse than [Hurricane] Jeanne, if you can imagine." - Loune Viaud, Director of Operations for Zanmi Lasante


Hurricanes that hit Haiti can be especially deadly because of the lack of forestation in the country. Haiti was once a lush wonderland where Christopher Columbus landed on his first voyage to the New World. Now, thanks to years of poverty, political corruption, and mass deforestation to produce charcoal for cooking, the place seems almost bare. The lack of trees is a major problem because hurricanes can more easily cause floods and mudslides. Haiti's high population density (249.79 people per sq km [45th of 256 countries]) helps compound the effect.

I don't have too much data on the situation - but I feel the hurricanes make the case for tackling Haiti's environmental problems before its many others. How this can be done is another question...