The wisdom of crowds: Masturbation and asking the right questions

Here’s a small challenge for you: how many £2 coins are in this tin?

This tin is filled only with £2 coins (took about 4 years!). Ignore the fiver, I was clearly feeling rich. How many do you reckon there are in there? Bisto included for scale…

Your answer, perhaps unsurprisingly, is likely to be some way off. Some will guess too high, others too low. But, curiously, if enough people make a guess, and you average out the guesses, the ‘average guess’ is likely to be close to the right number: the wisdom of crowds.
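If you fancy seeing the effect for yourself, here is a minimal sketch in Python. Everything in it is made up for illustration: the true count of 500 is hypothetical (it is not the real number of coins in my tin), as is the spread of the guesses. It also assumes that guesses scatter evenly above and below the truth, which real crowds don't always manage.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

TRUE_COUNT = 500  # hypothetical: NOT the real number of coins in the tin
SPREAD = 150      # hypothetical: how far off a typical guess tends to be

# Assumption: guesses scatter evenly around the truth (no shared bias).
guesses = [random.gauss(TRUE_COUNT, SPREAD) for _ in range(1000)]

# The crowd's answer is simply the average of all the guesses.
crowd_guess = statistics.mean(guesses)
crowd_error = abs(crowd_guess - TRUE_COUNT)

# For comparison: how far off the middle-of-the-pack individual is.
individual_errors = [abs(g - TRUE_COUNT) for g in guesses]
typical_individual_error = statistics.median(individual_errors)

print(f"Crowd guess: {crowd_guess:.0f} (off by {crowd_error:.1f})")
print(f"Typical individual is off by {typical_individual_error:.1f}")
```

The averaging only helps because the individual errors cancel out. If everyone is fooled in the same direction (say, by the Bisto tin), the crowd will be confidently wrong together.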

The statistician Francis Galton is credited with first noticing the effect, in 1907. A public fair in Plymouth was offering a competition, with the princely entry fee of 6d., to guess the weight of a ‘slaughtered and dressed ox’. With roughly 800 entries, he noted that the ‘middlemost value’ (the median) of 1207 lb. was only about 0.8% higher than the actual value of 1198 lb.
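The size of the crowd's error is easy to check from the two weights quoted above. A quick sketch (the function name is my own, nothing official):

```python
def percent_error(estimate, actual):
    """Return how far the estimate is above the actual value, in percent."""
    return (estimate - actual) / actual * 100

median_guess_lb = 1207  # the 'middlemost value' of the entries
actual_weight_lb = 1198  # the weight of the dressed ox

error = percent_error(median_guess_lb, actual_weight_lb)
print(f"The crowd was {error:.1f}% too high")  # prints "The crowd was 0.8% too high"
```

Nine pounds out on an ox of nearly 1,200 lb is not bad for sixpence a go.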

Ok, so, the wisdom of crowds may work for answering specific questions about quantities or general knowledge, or even estimating and ranking the populations of cities. But such questions are usually set by some expert or group of people ‘in the know’ to find an answer. So what happens when you get the crowd to ask the questions? Can you crowd source the right questions to ask, so that the answers predict something?

The answer, it turns out, is probably yes. And on that note, how often do you masturbate per month*? Surprisingly, your answer may be predictive of your body mass index (BMI)…

To see if you can crowd source questions, a group from the University of Vermont set up two websites. One website was intended to discover correlations between user-submitted questions and the users' energy consumption. The other website looked for correlations between questions and the BMI of people visiting the site. Both sites invited visitors to enter the relevant information about themselves: the number of kilowatt-hours (kWh) used in previous months, or their height and weight, from which BMI is calculated.

The sites also asked visitors to input questions, and answers, for other users that they thought might be related to either energy usage or BMI. To start things off, the sites were seeded with some simple questions where a link had previously been suspected, for example between BMI and how often you eat fast food. Both sites also gave feedback to their users. Each user could compare themselves to other users and see how the other questions were answered. Each site also showed a constantly updating measure of how well each question correlated with energy usage or BMI.
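To give a flavour of what such a measure might look like, here is a minimal sketch that works out BMI from height and weight and then scores one question by how strongly its answers track BMI. The data are invented, the function names are my own, and I have used a plain Pearson correlation as a stand-in: the scoring the sites actually used may well differ.

```python
import math

def bmi(weight_kg, height_m):
    """Body mass index: weight in kilograms divided by height in metres squared."""
    return weight_kg / height_m ** 2

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Invented example data: (weight in kg, height in m) for five hypothetical users.
users = [(60, 1.70), (72, 1.75), (85, 1.80), (95, 1.78), (110, 1.82)]
# Their invented answers to a seed question: fast-food meals per week.
fast_food_per_week = [0, 1, 2, 4, 5]

bmis = [bmi(weight, height) for weight, height in users]
score = pearson(fast_food_per_week, bmis)
print(f"Correlation between answers and BMI: {score:.2f}")
```

A score near +1 or -1 means the answers line up closely with BMI, and a score near 0 means the question tells you little. With only a handful of users, though, a high score can easily be a fluke.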

Only about 60-70 people took part on each site. Even so, a statistically meaningful correlation was observed between some of the questions asked by the site visitors and either their energy consumption or their BMI.

Now, I agree that it isn’t necessarily hard to conceive of questions that would be directly related to energy usage or an individual’s BMI. However, the results of the study do suggest that crowd sourcing questions may be a way to generate scientific questions: questions that would otherwise elude an expert, or a group of experts, who, even with the best will in the world, will hold some biases and will simply leave some questions unasked.

In this case, the question “how often do you masturbate a month?” gave the second strongest predictor of BMI, above “how much of your job involves sitting?” I haven’t been able to find any other research linking frequency of masturbation to BMI. (What must Google now think of me…?)

I should stress, though, that this is not a truly confirmed correlation. The researchers themselves were also keen to stress this. “We’re not arguing that this study is actually predictive of the causes,” says Paul Hines, a professor in UVM’s College of Engineering and Mathematical Sciences, “but improvements to this method may lead in that direction.”

There are, of course, other factors which could confound these results, such as honesty in reported numbers and so on. However, it does raise an interesting possibility for a new way for the public to engage in research, and even help to drive what questions we should be asking.

You can grab the paper here: http://arxiv.org/pdf/1203.1833v1.pdf

* Rhetorical! I really don’t want to know that…