Mind the ‘credibility’ gap

Colby Cosh finds out what subsets, modelling assumptions and ‘non-probability samples’ have to do with polling these days
(J Pat Carter/AP Photo)

Over the weekend, the estimable David Akin was talking U.S. politics with Ipsos’s Darrell Bricker on Twitter when he noticed an unfamiliar verbal oddity in a Reuters report on the polling firm’s recent survey of early voters.

Obama leads Romney 54 per cent to 39 per cent among voters who already have cast ballots, according to Reuters/Ipsos polling data compiled in recent weeks. The sample size of early voters is 960 people, with a credibility interval of plus or minus 3.5 percentage points.

Huh, what’s this “credibility interval” business? Sounds like a different name for the good old margin of error! But why would we need a different name for that? This question, it turns out, is the pop-top on a can of worms.
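For comparison, here is what the good old margin of error would look like for a sample of 960, sketched in Python. This is the textbook formula for a simple random sample at the 95 per cent level, using the worst case of a 50-50 split; it is not a reconstruction of whatever Ipsos actually computes, and the function name and defaults are made up for illustration.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Textbook 95% margin of error for a simple random sample of size n.

    p is the assumed true proportion (0.5 is the worst case) and z is the
    normal quantile for the chosen confidence level.
    """
    return z * math.sqrt(p * (1 - p) / n)

moe = margin_of_error(960)
print(f"{100 * moe:.1f} points")  # prints "3.2 points"
```

The classical figure comes out at roughly 3.2 points, a little narrower than the 3.5 points Reuters reported as a credibility interval.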

The polling business has a problem: when most households had a single land-line telephone, it was relatively easy to sample the population cheaply and well—to estimate quantities like voter intentions in a clean, mathematically uncomplicated way, as one might draw different-coloured balls from a single urn to estimate the amounts of each colour amongst the balls on the inside. That happy state of affairs has, of course, been reduced to chaos by the cell phone.

The cell phone, increasingly, does not just divide the population into two hypothetical urns—which is basically how pollsters originally went about solving the problem. Its overall effect (including the demise of the telephone directory) has affected the math of polling in several ways, all of them constantly intensifying; declining response rates to public surveys (“Get lost, pal, you’re eating up my minutes”) are the most obvious example. Put simply, individual members of the public are no longer necessarily accessible for polite questioning by means of a single randomizable number that everybody pretty much has one of. The problem of sampling from the urn has thus become infinitely more complicated. Pollsters can no longer assume that the balls are more or less evenly distributed inside the urn, and it is getting harder and harder to reach into the urn and rummage around.

So how are they handling this obstacle? Their job, at least when it comes to pre-election polling, is becoming a lot less like drawing balls from an urn and more like flying an aircraft in zero-visibility conditions. The boffins are becoming increasingly reliant on “non-probability samples” like internet panel groups, which give only narrow pictures of biased subsets of the overall population. The good news is that they can take many such pictures and use modern computational techniques to combine them and make pretty decent population inferences. “Obama is at 90 per cent with black voters in Shelbyville; 54 per cent among auto workers; 48 per cent among California epileptics; 62 per cent with people whose surnames start with the letter Z…” Pile up enough subsets of this sort, combined with knowledge of their relative sizes and other characteristics, and you can build models which let you guess at the characteristics of the entire electorate (or, if you’re doing market research, the consumerate).
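The basic idea of piling up subsets can be sketched in a few lines of Python. Everything here is hypothetical: the three groups, their shares of the electorate, their shares of an imaginary internet panel and the support figures are invented, and real pollsters use far more elaborate models. The sketch only shows why knowing the relative sizes of the groups matters.

```python
# Each hypothetical group: (share of the electorate, share of the panel,
# candidate support measured within that group). All numbers are invented.
groups = {
    "group_a": (0.30, 0.60, 0.62),
    "group_b": (0.45, 0.20, 0.48),
    "group_c": (0.25, 0.20, 0.55),
}

def raw_panel_estimate(groups):
    """Support as the lopsided panel sees it, weighted by panel composition."""
    return sum(panel * support for _, panel, support in groups.values())

def reweighted_estimate(groups):
    """Support after reweighting each group to its share of the electorate."""
    total = sum(pop for pop, _, _ in groups.values())
    return sum(pop * support for pop, _, support in groups.values()) / total

raw = raw_panel_estimate(groups)
adjusted = reweighted_estimate(groups)
print(f"raw panel: {raw:.3f}, reweighted: {adjusted:.3f}")
```

Because the imaginary panel over-represents the candidate's friendliest group, the raw figure overstates support; reweighting by the groups' real sizes pulls it back down. The catch is that the answer is only as good as the assumed group sizes and the assumption that panel members resemble their group.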

As a matter of truth in advertising, however, pollsters have concluded that they shouldn’t report the uncertainty of these guesses by using the traditional term “margin of error.” There is an extra layer of inference involved in the new techniques: they offer what one might call a “margin of error, given that the modelling assumptions are correct.” And there’s a philosophical problem, too. The new techniques are founded on what is called a “Bayesian” basis, meaning that sample data must be combined explicitly with a prior state of knowledge to derive both estimates of particular quantities and the uncertainty surrounding them.

A classical pre-election voter survey would neither require nor benefit from ordinary knowledge of the likely range of President Obama’s vote share: such surveys start only with the purely mathematical specification that the share must definitely be somewhere between 0 per cent and 100 per cent. A Bayesian approach might start by specifying that in the real world Obama, for no other reason than that he is a major-party candidate, is overwhelmingly likely to land somewhere between 35 per cent and 65 per cent. And this range would be tightened up gradually, using Bayes’ Law, as new survey information came in.
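One common textbook way to express this kind of tightening is a Beta prior updated with survey counts. The Python sketch below assumes a two-way race, a Beta(20, 20) prior (chosen because it puts about 95 per cent of its weight between roughly 35 and 65 per cent) and an invented survey of 960 respondents with 518 backing the candidate. None of these numbers comes from Ipsos, and actual polling models are considerably more involved.

```python
import random

def beta_summary(a, b, draws=50_000, seed=0):
    """Return the mean and an approximate 95% interval of a Beta(a, b)
    distribution, using simulated draws rather than an exact formula."""
    rng = random.Random(seed)
    samples = sorted(rng.betavariate(a, b) for _ in range(draws))
    low = samples[int(0.025 * draws)]
    high = samples[int(0.975 * draws)]
    return a / (a + b), low, high

# Prior: an assumption that a major-party candidate lands near 35-65%.
prior_a, prior_b = 20, 20

# Hypothetical survey: 518 of 960 respondents back the candidate.
supporters, n = 518, 960

# Bayes' rule for a Beta prior with yes/no data: add the counts.
post_a = prior_a + supporters
post_b = prior_b + (n - supporters)

prior_mean, prior_low, prior_high = beta_summary(prior_a, prior_b)
post_mean, post_low, post_high = beta_summary(post_a, post_b)

print(f"prior:     {prior_mean:.3f} ({prior_low:.3f} to {prior_high:.3f})")
print(f"posterior: {post_mean:.3f} ({post_low:.3f} to {post_high:.3f})")
```

The interval on the second line is the sort of thing that gets reported as a credibility interval: a range for the candidate's share, given both the data and the prior that went in.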

This is probably the best way, in principle, to make intelligent election forecasts. But you can see the issues with it. Bayesianism explicitly invites some subjectivity into the art of the pollster. (Whose “priors” do we use, and why?) And in making the step from estimating the current disposition of the populace to making positive election forecasts, one has to have a method of letting the influence of old information gradually attenuate as it gets less relevant. Even nifty Bayesian techniques, by themselves, don’t solve that problem.

Pollsters are trying very hard to appear as transparent and up-front about their methods as they were in the landline era. When it comes to communicating with journalists, who are by and large a gang of rampaging innumerates, I don’t really see much hope for this; polling firms may not want their methods to be some sort of mysterious “black box,” but the nuances of Bayesian multilevel modelling, even to fairly intense stat hobbyists, might as well be buried in about a mile of cognitive concrete. Our best hope is likely to be the advent of meta-analysts like (he said through tightly gritted teeth) Nate Silver, who are watching and evaluating polling agencies according to their past performance. That is, pretty much exactly as if they were “black boxes.” In the meantime, you will want to be on the lookout for that phrase “credibility interval.” As the American Association for Public Opinion Research says, it is, in effect, a “[news] consumer beware” reminder.