Thanks to my moderate knowledge of statistics, I know that I have a lot more to learn in the field and should never make assumptions about data or analyses (even my own).
Because of this, I share a grievance with Zed Shaw, who says that “programmers need to learn statistics or I will kill them all”. His article is required reading, and its advice applies not just to programmers but to everyone who looks at data, creates models, or even reads a newspaper.
I have a major pet peeve that I need to confess. I go insane when I hear programmers talking about statistics like they know shit when its clearly obvious they do not. I’ve been studying it for years and years and still don’t think I know anything. This article is my call for all programmers to finally learn enough about statistics to at least know they don’t know shit. I have no idea why, but their confidence in their lacking knowledge is only surpassed by their lack of confidence in their personal appearance.
My recommendation? Read this article to realise that you know nothing, and then pick up a copy of John Allen Paulos’ Innumeracy and Darrell Huff’s How to Lie with Statistics in order to realise that you know even less than you thought (but a hell of a lot more than the average person).
Comments
2 responses to “Learn Statistics, Damn You!”
I’ve read John Allen Paulos’ excellent book.
Sometimes the book’s maths gets a bit hairy, but generally I’ve learned an awful lot from it. I’m now hooked on the power of conditional probability, and I’d like to give an example of how its lessons can be used in everyday life.
I’m currently looking for a job, so I speak to recruiters every day. A few weeks ago I was talking to a recruiter about a position working with computers. I’ll leave the details out, but one of the required skills was working with an old UNIX system. Another requirement was working with an unusual customer relationship management (CRM) system, and the last requirement was fluency in a Southeast Asian language.
If I remember correctly, my conversation with the recruiter went something like this:
“What chances do you think you have of finding someone with all three?”
“The problem is the language fluency. Of the fifty applications we’ve received so far, only 5 can speak that language at all.”
“How about the other skills?”
“10 of the applicants have the UNIX experience, and maybe 5 in total have the CRM experience.”
“Right … so … how do you rate your chances of finding someone with all three then?”
…
“We’ll probably only respond to the five or so that will have all three”
(She’s concluded that 10% of the fifty applicants will meet all of the requirements. However, if the three skills are independent of one another, the correct figure is 0.1 × 0.2 × 0.1 = 0.002, or 0.2%: about one applicant in 500. That means she will have to get through roughly another 450 applications to find someone with all three of the skills.)
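The arithmetic behind that estimate can be checked with a minimal Python sketch. It assumes the three skills are statistically independent, which the conversation does not establish (in practice UNIX and CRM experience may well be correlated). The variable and function names are illustrative only.

```python
from fractions import Fraction

# Figures quoted by the recruiter.
applicants = 50
with_language = 5
with_unix = 10
with_crm = 5


def joint_probability_if_independent(*counts, total):
    """Multiply the individual proportions together.

    ASSUMPTION: the skills are independent, so P(A and B and C) is
    P(A) * P(B) * P(C). If the skills are correlated this is only a rough guide.
    """
    p = Fraction(1)
    for count in counts:
        p *= Fraction(count, total)
    return p


p_all_three = joint_probability_if_independent(
    with_language, with_unix, with_crm, total=applicants
)

# On average, one matching candidate turns up per (1 / p) applications.
applications_per_match = 1 / p_all_three
extra_applications_needed = applications_per_match - applicants

print(f"P(all three skills) = {float(p_all_three):.3f} ({float(p_all_three * 100):.1f}%)")
print(f"Applications per expected match: {applications_per_match}")
print(f"Further applications needed: {extra_applications_needed}")
```

Exact fractions are used rather than floats so that the one-in-500 figure comes out without rounding noise.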
Notwithstanding the fact that she has a job and I don’t right now, this example does show that statistics and conditional probability are hugely undervalued, especially by the people who need to understand them most.
I found Stuart Sutherland’s book ‘Irrationality’ to be an excellent introduction to how people miscalculate risk too.
Even statisticians do not understand statistics. They spend their entire lives on it and still argue with each other.
In medicine it is even worse, as there are so many vested interests. Who do you listen to?