Fun with Statistics

Chrysler Closings

Zero Hedge is a fantastic blog, one of the best financial blogs actually, but I think they really dropped the ball with this statistical analysis. They produce this chart, a regression on the Chrysler auto dealer shutdown based on political contributions:


This puzzled us. Why would there be a noticeable (we have rightly been called out for using “significant” here) and highly positive correlation between dealer survival and Clinton donors? Granted, that P-Value (0.125) isn’t enough to reject the null hypothesis at 95% confidence intervals (our null hypothesis being that the effect is due to random chance), but a 12.5% chance of a Type I error in rejecting a null hypothesis (false rejection of a true hypothesis) is at least eyebrow raising. Most statisticians would not call this a “find” as 95% confidence intervals are the gold standard for this sort of work. Nevertheless, it seems clear that something is going on here.

Nate Silver has a great writeup on why the p-values produced should make us throw this out right away. Two additional thoughts:

1) If we are just throwing out confidence intervals, and looking at the broad overview, what jumps out is that it is 45% likely that the distribution implies that Republicans are treated favorably, and Democrats targeted for closure. There’s no reason to believe this given the regression, because the confidence intervals don’t clear at all, but if we are just getting a sense of the data it seems to go against the hypothesis.

2) I’ve been there. You get all this data, do all the analysis, and nothing. So you set up a different model, in this case Obama AND Democratic Donations, and re-estimate. Then you add more. And more. Nothing. So you start taking some out, and soon you are testing every combination to see if any of them are significant. The sense I get is that this has been datamined pretty hard, and since they haven’t been able to find a way to force errors in multicollinearity to magically create results, this lack of results is pretty damning. Give me a dataset, and usually I can come up with something significant to wring out.
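To see why testing every combination usually turns up something, here is a minimal sketch using purely simulated noise, not the dealership data. The function names, the sample sizes, and the Fisher z approximation for the p-value are all my own assumptions for illustration. If each specification has a 5% false-positive rate and the tests are independent, the chance that at least one of 20 looks significant is 1 - 0.95^20, roughly 64%.

```python
import math
import random
from statistics import NormalDist


def pearson_r(xs, ys):
    """Sample Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def p_value(r, n):
    """Two-sided p-value for r, using the Fisher z normal approximation."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def share_with_false_find(n_specs, n_obs=100, n_trials=500, alpha=0.05, seed=0):
    """Fraction of all-noise datasets where at least one of n_specs
    unrelated predictors looks 'significant' at level alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        # Hypothetical outcome: pure noise, unrelated to every predictor.
        y = [rng.gauss(0, 1) for _ in range(n_obs)]
        found = False
        for _ in range(n_specs):
            x = [rng.gauss(0, 1) for _ in range(n_obs)]
            if p_value(pearson_r(x, y), n_obs) < alpha:
                found = True
                break
        hits += found
    return hits / n_trials


if __name__ == "__main__":
    print("one specification:", share_with_false_find(1))
    print("twenty specifications:", share_with_false_find(20))
```

The point of the sketch is the contrast between the two numbers: with enough specifications, a "find" is the expected outcome even when nothing is there, which is why coming up empty after heavy mining is telling.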

So there’s no information here. Note how correlated the variables are with one another! “P(Democratic Donor) = P(Obama Donor) + P(Clinton Donor) – P(Obama + Clinton Donor) + …..” Those variables are designed to muck up your standard errors, which is going to leave your p-values not making much sense. I know that multicollinearity tends to make the intervals too wide rather than too narrow, but if you read the comments you can see that they did a bunch of combinations and none of them gave better results. Awesome effort, but nothing here.
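A rough sketch of how much overlap like this can inflate standard errors, using the variance inflation factor (VIF). Everything here is hypothetical: the donation rates (10% Obama, 10% Clinton, 5% other Democratic), the dealer count, and the function names are made up for illustration and are not taken from the Zero Hedge dataset. With two other predictors, the R-squared of the target regressed on them can be computed from the pairwise correlations, and the VIF is 1 / (1 - R-squared).

```python
import math
import random


def pearson_r(xs, ys):
    """Sample Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_donor_flags(n_dealers=5000, seed=0):
    """Made-up 0/1 donor flags; 'democrat' is 1 whenever either
    candidate flag is 1, so the columns overlap by construction."""
    rng = random.Random(seed)
    obama, clinton, democrat = [], [], []
    for _ in range(n_dealers):
        gave_obama = rng.random() < 0.10      # assumed rate
        gave_clinton = rng.random() < 0.10    # assumed rate
        gave_other_dem = rng.random() < 0.05  # assumed rate
        obama.append(int(gave_obama))
        clinton.append(int(gave_clinton))
        democrat.append(int(gave_obama or gave_clinton or gave_other_dem))
    return obama, clinton, democrat


def vif_given_two_others(target, x1, x2):
    """VIF of `target` when x1 and x2 are also in the regression."""
    r_t1 = pearson_r(target, x1)
    r_t2 = pearson_r(target, x2)
    r_12 = pearson_r(x1, x2)
    # R^2 of target regressed on x1 and x2, from pairwise correlations.
    r_squared = (r_t1 ** 2 + r_t2 ** 2 - 2 * r_t1 * r_t2 * r_12) / (1 - r_12 ** 2)
    return 1 / (1 - r_squared)


obama, clinton, democrat = simulate_donor_flags()
vif = vif_given_two_others(democrat, obama, clinton)
print("VIF for the Democratic-donor flag:", round(vif, 2))
print("standard error inflated by a factor of about:", round(math.sqrt(vif), 2))
```

The standard error of the Democratic-donor coefficient scales with the square root of the VIF, so the more the flags overlap, the wider the interval on each one.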

On a certain kind of risk bias

From Andrew Gelman:

As a researcher and teacher in decision analysis, I’ve noticed a particular argument that seems to have a lot of appeal to people who don’t know better. I’ll call it the one-sided bet. Some examples:

– How much money would you accept in exchange for a 1-in-a-billion chance of immediate death? Students commonly say they wouldn’t take this wager for any amount of money. Then I have to explain that they will do things such as cross the street to save $1 on some purchase, there’s some chance they’ll get run over when crossing the street, etc. (See Section 6 of this paper; it’s also in our Teaching Statistics book.)

And a commenter (my underline):

as a professional risk manager I run into this frequently. Our normal tools for understanding risk and return seem to be amazingly good with things that are “familiar.” These familiar things are generally high probability (close to the mean) items (both low and high severity). We tend to proverbially fall apart at the seams when dealing with highly improbable yet highly catastrophic risk perception. We also do much worse calculating relative risks between things that we control (driving our car) and things others control (riding in a plane). The things we control we tend to feel more safe in, regardless of the true risks.

From the underline, what’s worse? Well, those errors are not iid. We tend to find more dangerous (assign a higher valuation to) risks that we don’t control. To put it the other way, given two identical risks, one which is the result of a die roll, and one which is the result of something we control, we find the one we control safer. So if you could get paid $100 to have a 1-in-100,000 chance of dying, presumably determined by typing a number into excel or wolframalpha, or $100 to drive for some number of weeks (which also has the same possibility of death), you’d pick the driving. I may only want $80 to drive, even though statistically they are the same. Is this irrational?
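The arithmetic behind that comparison, as a small sketch. The function name is my own, and the calculation simply assumes the two gambles carry exactly the same 1-in-100,000 odds, as stipulated above.

```python
def implied_value_of_life(payment_dollars, one_in_n_odds):
    """Dollar value per statistical life implied by accepting
    payment_dollars for a 1-in-one_in_n_odds chance of death."""
    return payment_dollars * one_in_n_odds


# Same odds of dying, different prices demanded for bearing the risk.
dice_roll = implied_value_of_life(100, 100_000)  # risk I don't control
driving = implied_value_of_life(80, 100_000)     # risk I do control

print("dice roll:", dice_roll)
print("driving:", driving)
print("driving as a share of dice roll:", driving / dice_roll)
```

Accepting $80 instead of $100 for the same odds amounts to pricing the controlled risk at a 20% discount, and that gap is what the control and familiarity arguments are trying to explain.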

More generally, people prefer risks that are familiar to ones that are not. I drive all the time. I love driving, and trust myself driving. I don’t know Stephen Wolfram. Do I really trust Stephen Wolfram to roll the dice to determine if I live or die? (I really, really think he’d like that.) People also value control. If wolframalpha rolls the dreaded “1”, then I’m screwed, but the outcome came down to a processor clock ticking away in Champaign, IL. If I get in a car accident, I was there, behind the wheel. It was my fault, something I take ownership of for better or worse, rather than an outcome that carries all the personal responsibility of drawing a certain colored marble from a jar. There are also other issues; when I’m driving, all other drivers face the same risks. The risks are equitably distributed among peers.

Before you say “Risk N00b”, I think a lot about risk, and I think a lot about how to deal with these assessments (calling them a bias assumes they are a problem) in light of both traditional risk assessments and policy issues. There’s a tendency to want to paper over this, to convince people they are being irrational for giving this kind of valuation to risks. “What’s the difference to you?” I think these issues, equity and control over one’s destiny, do have real value, economic-risk-metric value if you will, and assuming that people’s instincts are incorrect here may be losing very valuable information.

And this is information that gets lost in the first cut when doing more traditional cost-benefit analysis. Take global warming. How much of a say do I get in how the Earth warms? How familiar are we with a world where the climate is changing rapidly? How equitably are the effects distributed? How reversible are the effects? These concerns are often thought of as a bias or cognitive defect, and they disappear into our model’s estimates, where costs sit on one side and benefits to GDP on the other.


2 Responses to Fun with Statistics

  1. Jacob says:

    Great post. One other thing worth mentioning is that without doing the data analysis (and I sure as hell don’t care enough to dig up the original dataset and do it myself), we still haven’t answered the question conclusively; we’ve shown only that there isn’t enough information in the data to claim anything about political contributions and dealership closing probability.

    But, it’s entirely possible that if we did a stepwise selection procedure and dropped some of the noise terms, we’d end up with a more parsimonious, significant model that would indicate a clear association between one of the explanatory variables and the dealership closing. Or not.

    • Jacob says:

      Oops, I see I read over your paragraph where you described Silver’s commenters doing just that. Well damn, they have a lot of free time.
