For most election analysts, the raw material for a prediction comes in the form of polling data. In theory, polls are random samples, drawn with uniform methodologies and only lightly weighted. In reality, pollsters use a variety of sampling methods and then heavily weight the data before (and sometimes after) pushing it through varying voter screens. Much of this is considered proprietary, so we don’t really know what is going on; suffice it to say that pollsters aren’t just presenting “pristine” random samples.
Even worse, pollsters seem to be increasingly engaging in something called poll herding: a tendency either to re-weight an outlying poll so that it falls in line with other pollsters or to withhold outlying polls altogether. In 2014 alone, we saw evidence that PPP, Rasmussen Reports, Gravis Marketing, and Hampton University all declined to release polls; forecasters suspect that there are many more instances like this (at least two of these polls were released by accident), but it is unknowable just how many.
This matters because, if a race shifts or the herd is wrong, pollsters will be unable to pick up on the movement—there is a collective “you first” tendency when the data suggest pollsters should break out of the herd. Moreover, for technical reasons, models that are denied access to outlying results will tend to understate the uncertainty of their predictions. The result, then, can be the types of massive misses that we saw in the recent elections in the United Kingdom and Israel.

When it comes to election forecasting models that use data other than preference polls, the problem is that there are just not that many elections to plug into the equation.
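The mechanism by which herding understates uncertainty can be sketched with a toy simulation. Every number here is hypothetical: an assumed true margin, an assumed per-poll error, and a crude herding rule that suppresses any poll falling too far from the published average. The point is only to show that the published polls end up less dispersed than the full set, so a model trained on them sees less variance than actually exists.

```python
import random
import statistics

random.seed(0)

TRUE_MARGIN = 2.0   # hypothetical true margin, in points
POLL_SD = 3.0       # hypothetical per-poll sampling error
N_POLLS = 500
HERD_WINDOW = 2.5   # polls further than this from the published average are withheld

all_polls, published = [], []
for _ in range(N_POLLS):
    poll = random.gauss(TRUE_MARGIN, POLL_SD)
    all_polls.append(poll)
    # Crude herding rule: once a published average exists, suppress outliers.
    if not published or abs(poll - statistics.mean(published)) <= HERD_WINDOW:
        published.append(poll)

print(f"spread of all polls conducted: {statistics.stdev(all_polls):.2f}")
print(f"spread of published polls:     {statistics.stdev(published):.2f}")
```

Because the herding rule truncates the tails of the distribution, the published polls' standard deviation is reliably smaller than that of the full set, and any model calibrated on published polls alone will report tighter confidence intervals than the underlying data warrant.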
The bottom line is that the relative paucity of electoral data complicates our efforts to evaluate results, which in turn exacerbates the problem of generating “unbiased” theories and weeding out (or controlling for) flawed data. All told, this means that we all have less certainty than we claim when we generate predictions, which increases the odds of a massive modeling miss in the near future.
A rigorous approach to predicting elections will, over time, produce better results than simply going with one’s gut. But when we’re terrified of being innovative or going out on a limb because of the consequences of being wrong, we’re no longer being rigorous. When we analysts retreat into our own herd, we’re just putting an analytical gloss on the same old conventional wisdom. The freedom to be wrong has always been a crucial part of the scientific endeavor. In an area like election forecasting, where the data are problematic and we can easily be wrong for several years running by simple random chance, it is even more important. Refusing to give analysts the breathing space to make mistakes is a recipe for analytical stagnation. The data journalism project is an incredibly important one. It would be a shame to torpedo it because people are reluctant to admit how much we don’t know.