Since yesterday’s WSJ article on whether it’s easier for women than for men to qualify for the Boston Marathon, I’ve seen a lot of people debating a question I’ve often heard knocked around.
http://online.wsj.com/article/SB10001424052748703673604575550133914934718.html
It seems to me that if one had access to comprehensive data broken down by age group, gender, and finishing time, this is a question that could be settled statistically, once and for all, without the need for debate.
I dug around and found some data that begins to answer the question, though my method has some problems that I discuss below. I found the mean and standard deviation of finishing times for all American male and female finishers in 2009.
http://www.marathonguide.com/Features/Articles/2009RecapOverview.cfm
According to this, the mean finishing time for all USA men in 2009 was 4:24:17 with a standard deviation of 1:00:02. The mean finishing time for all USA women in 2009 was 4:52:31 with a standard deviation of 1:05:04.
I then used MATLAB to answer the question: assuming finishing times are normally distributed, what percentage of all USA male and female finishers ran a Boston Qualifying time in 2009?
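% Probability of running 3:10 or faster for men (all times in minutes)
u_men = 4*60+24+17/60;    % mean 4:24:17
std_men = 60+2/60;        % standard deviation 1:00:02
men_q = 3*60+10;          % qualifying time 3:10
p_310_men = normcdf((men_q-u_men)/std_men)
% Probability of running 3:40 or faster for women
u_women = 4*60+52+31/60;  % mean 4:52:31
std_women = 60+2+9/60;    % standard deviation 1:02:09
women_q = 3*60+40;        % qualifying time 3:40
p_340_women = normcdf((women_q-u_women)/std_women)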
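The results are p_310_men = 0.1080 and p_340_women = 0.1216. Under the normal approximation, about 10.8% of all male finishers in 2009 ran 3:10 or faster, and about 12.2% of all female finishers ran 3:40 or faster. One caveat on the inputs: the code uses a standard deviation of 1:02:09 for women, not the 1:05:04 quoted above. With 1:05:04 the women’s figure rises to roughly 13%, which does not change the direction of the result.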
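normcdf requires the Statistics Toolbox. For anyone who wants to check these numbers without it, here is a minimal sketch that writes the normal CDF in terms of erfc, which is part of base MATLAB. The helper name phi is my own, and the inputs are copied from the code above, including the 1:02:09 standard deviation it uses for women.

```matlab
% Standard normal CDF written with erfc (base MATLAB, no toolbox needed)
phi = @(z) 0.5 * erfc(-z / sqrt(2));

% Same inputs as the code above, all in minutes
u_men = 4*60 + 24 + 17/60;      % mean 4:24:17
std_men = 60 + 2/60;            % standard deviation 1:00:02
u_women = 4*60 + 52 + 31/60;    % mean 4:52:31
std_women = 60 + 2 + 9/60;      % 1:02:09, the value used in the code above

% Fraction of each population at or under the qualifying time
p_310_men = phi((3*60 + 10 - u_men) / std_men);
p_340_women = phi((3*60 + 40 - u_women) / std_women);
fprintf('men: %.4f  women: %.4f\n', p_310_men, p_340_women);
```

This should agree with the normcdf results to the four decimal places reported.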
This analysis has a big problem: I don’t have data separated by age group, so I’m comparing the entire male and female populations against the qualifying standard for runners under 35.
Despite the lack of age-group data, this quick and dirty analysis suggests that the qualifying standard is somewhat easier for women, since a slightly higher percentage of the female population is able to run the under-35 standard. It’s also quite possible that, because in general a higher percentage of male runners are older, age confounds my results and none of this means anything.
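Another way to look at the same numbers is to ask which women’s time sits at the same percentile as 3:10 does for men. Under the normal assumption, equal percentiles mean equal z-scores, so no inverse CDF is needed. This is only a sketch: z_men and t_eq are names I made up, the inputs are the ones from the code above, and the result inherits the same age-group problem.

```matlab
% Inputs from the code above, all in minutes
u_men = 4*60 + 24 + 17/60;      % mean 4:24:17
std_men = 60 + 2/60;            % standard deviation 1:00:02
u_women = 4*60 + 52 + 31/60;    % mean 4:52:31
std_women = 60 + 2 + 9/60;      % 1:02:09, the value used in the code above

% z-score of the men's 3:10 standard within the men's distribution
z_men = (3*60 + 10 - u_men) / std_men;

% Women's time with the same z-score, i.e. the same percentile
t_eq = u_women + z_men * std_women;

% Print as h:mm:ss, rounding to the nearest whole second first
tot = round(t_eq * 60);
fprintf('%d:%02d:%02d\n', floor(tot/3600), floor(mod(tot, 3600)/60), mod(tot, 60));
```

If the time printed is faster than 3:40, the current women’s standard is the more lenient of the two by this measure, which matches the percentages above.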
Does anyone know where I can get better data for this analysis? I imagine someone out there has already done a more careful job of answering this question with statistical distributions.
-
alexandertaylor posted this