Tuesday night, September 14, 2010: The final day of the big primaries prior to the general election in November. I’m sitting watching the results come in. My network of choice tonight is MSNBC. It’s just so much fun watching Rachel Maddow narrate the primary like it was a Super Bowl.
Statistics during the election season are thrown around like cheap bead necklaces at a Mardi Gras parade. Polls are quoted like they mean something and are perfect predictors of the future.
Well, I’m going to tell you something the politicians and pollsters and pundits would prefer you don’t know:
Poll Results Can Be Stretched Like a Bungee Cord.
Yep, it’s true. And I can give some simple examples.
You can follow along. But open a new window in your browser so you can click back and forth more easily.
First click here on Pollster.com. This should take you to the 2010 National Congressional Ballot. This is for the House of Representatives only. What we’re interested in is the polling chart. It should look like this:
2010 National Congressional Ballot. Image: Pollster.com
So, how do you read this mess of dots and lines? The dots make up a scatter plot, and each one represents a poll taken on a certain day or period of days (usually 2-3). The date of the poll is on the horizontal axis and the percentage for the Republicans (red) and the Democrats (blue) is on the vertical axis. The squiggly lines in the middle are called trend lines and represent the mid-point of the dots for each day. For those of you who may have taken statistics some time in the past, this is also called a linear regression. The colors represent the same political parties as the dots. Correction: Dr. John Bogen, an Extreme Thinkover contributor, corrected my error in labeling the trend line as a “linear regression.” Although the trend lines are based on regression formulas, I should have labeled it as Pollster.com calls it, a “trend estimate.” For more info on Pollster’s statistical methods for trend estimates, click here.
Now take a moment to go to the Pollster.com site and look at their “live” chart. Each dot will open a fly-by box telling you the pollster, date and the results. Pretty nifty, huh? There is also an expand box in the upper right-hand corner if you want to open the chart to fill your screen. The features still all work.
What, then, does this chart, as presented, convey? Notice that the date range begins in November 2008, at the time of the Presidential election, and covers the time since then. Here the trend line is easier to read because you can see the ups and downs of the popularity of each of the two major parties over the past two years.
If this chart were the only one you looked at, you would conclude that the Republicans have made huge gains beginning about May 2010 and now hold a 47.1% to 40.6% lead over the Democrats. And you would be wrong. Something is missing. First of all, what about the undecided voters? Where are they? How many of them are there? What is their trend? For that answer, click here.
A new set of black dots with a trend line appears on the chart, representing those voters who answered “undecided” when asked who they plan to vote for. You can also see that as this year has progressed, the line has trended just slightly upward, and only since August have more of the people made up their minds. As of today, though, the undecideds are still 10.4% of the total, which is larger than the gap between the Republicans and the Democrats. This is where it gets interesting.
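The comparison in that last point is simple arithmetic, and a minimal Python sketch makes it explicit. The numbers are the trend estimates quoted above; the variable names are my own, chosen purely for illustration.

```python
# Trend estimates quoted in the text for the unfiltered chart (percent).
republican = 47.1
democratic = 40.6
undecided = 10.4

# The gap between the two parties, rounded to avoid floating-point noise.
gap = round(republican - democratic, 1)

# If the undecided share is larger than the gap, the undecided voters
# alone could, in principle, erase the lead.
undecided_exceeds_gap = undecided > gap

print(f"Gap between parties: {gap} points")
print(f"Undecided: {undecided} points")
print(f"Undecided larger than gap: {undecided_exceeds_gap}")
```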
I submit that in this gap is where the wild things are, to reframe the title of Maurice Sendak’s beloved children’s book.
The gaps are where you find the information about the most dynamic trends in the electorate. Follow me on this. Go back to the original chart. You should already have the Red, Blue and Black trends open. On the footer is a button titled “Tools.” Click on it and it will open another footer just above with six different choices on it. Click on “Filter.” This will open a small window with three check boxes: Live phone interviews, automated phone (i.e. robocall) interviews, and internet. Place your cursor on each one and you will get a list of “filter options.” Notice on the first option, Live phone interviews, there is an arrow in the top right-hand corner. This option has three pages and we’ll use them.
We want to narrow our polling data to the most relevant and to the polls with the highest chance of honest answers. To do that, based on my criteria (you are free to choose your own), I say let’s eliminate the internet surveys first. It is very hard to get a true random sample with them, and very easy to lie on them. Next, let’s eliminate the robocalls, too. Even though the calls go out to a random sample (supposedly), it is very easy to lie to a machine. That leaves us with the live interviews. These surveyors, you will notice, are familiar big-name pollsters who have a reputation to uphold, and nearly all of them publish their survey questions and results online, free for anyone interested in reading them (which would include geeks like me). We still want to cull some of these, though: the pollsters working for either political party, because there is a greater chance they will ask weighted questions that favor their side.
So, uncheck the internet, the robocalls, and, on the live call pages, every pollster ID’d with either an R or a D. Now we have a select set of pollsters who are as neutral as possible and use real people to talk to voters to decrease the chance for lying or misrepresentation.
One more thing. We really only want to look at the results for the current primary season. So again, click on “Tools” and then on “Date Range.” On the left date, set the month to “01,” the day to “01” and the year to “10,” and then click on the blue “Set Range” button.
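If you prefer to see the filtering steps spelled out, here is a minimal Python sketch of the same three rules: live interviews only, no party-affiliated pollsters, and nothing before January 1, 2010. The `Poll` record, its field names, and the sample entries are hypothetical, invented only to exercise the filter; they are not Pollster.com’s data or its data format.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Poll:
    pollster: str
    mode: str          # "live", "automated", or "internet"
    affiliation: str   # "R", "D", or "" for no party tie
    end_date: date


def keep_poll(poll: Poll, start: date) -> bool:
    """Apply the three filters: live calls only, no party pollsters, on or after start."""
    return (
        poll.mode == "live"
        and poll.affiliation not in ("R", "D")
        and poll.end_date >= start
    )


# Hypothetical records, invented only to exercise the filter.
polls = [
    Poll("Pollster A", "live", "", date(2010, 8, 30)),
    Poll("Pollster B", "automated", "", date(2010, 9, 1)),
    Poll("Pollster C", "internet", "", date(2010, 9, 5)),
    Poll("Pollster D", "live", "R", date(2010, 9, 7)),
    Poll("Pollster E", "live", "D", date(2010, 9, 9)),
    Poll("Pollster F", "live", "", date(2009, 6, 15)),
]

start_of_range = date(2010, 1, 1)
kept = [p.pollster for p in polls if keep_poll(p, start_of_range)]
print(kept)
```

Only the live, unaffiliated poll from 2010 survives. Swap in your own criteria by changing the conditions in `keep_poll`.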
Look at your results on the chart now. The polling results have changed. The Republicans sit at 47.7%, the Democrats at 41.1% and the undecided at a whopping 17.1%! This, I would suggest, is a much clearer picture of the state of the electorate regarding the races in the House of Representatives. By eliminating those polls that introduce bias into the big picture, either by the way they are administered or by the way they are designed to benefit their candidates, we can see that the November election is far less certain than most pundits and politicians are leading us to believe.
The fact that apparently over 17% of the electorate is still vacillating about who they will vote for in the general election means the predicted gains by the Republicans have to be called into question, the predicted losses by the Democrats have to be called into question, and the outcomes across the country will very possibly be different than is now being predicted. It may also mean that the gains or losses may be greater than predicted and one party or the other may end up with a significantly lopsided outcome.
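To put a rough number on that uncertainty, here is a small Python sketch that asks what share of the undecided voters the trailing party would need to win just to pull even. It rests on two simplifying assumptions of my own, not anything from Pollster.com: every undecided voter ends up choosing one of the two parties, and nobody who has already decided changes sides. The function name is hypothetical.

```python
def break_even_share(leader: float, trailer: float, undecided: float) -> float:
    """Fraction of undecided voters the trailing party must win to tie.

    Assumes every undecided voter ends up choosing one of the two parties
    and that nobody who has already decided changes sides.
    """
    gap = leader - trailer
    if undecided <= 0 or gap > undecided:
        raise ValueError("the undecided share is too small to close the gap")
    # Solve trailer + s * undecided == leader + (1 - s) * undecided for s.
    return (gap + undecided) / (2 * undecided)


# Filtered trend estimates quoted in the text (percent).
filtered = break_even_share(leader=47.7, trailer=41.1, undecided=17.1)
# Unfiltered trend estimates quoted earlier in the text (percent).
unfiltered = break_even_share(leader=47.1, trailer=40.6, undecided=10.4)

print(f"Filtered polls: trailing party needs {filtered:.1%} of undecideds to tie")
print(f"All polls: trailing party needs {unfiltered:.1%} of undecideds to tie")
```

Under those assumptions, the trailing party would need roughly 69 percent of the undecideds in the filtered picture, versus about 81 percent in the unfiltered one.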
But one principle in polling must not be forgotten. Each poll is a snapshot in time and by itself can be either an accurate or inaccurate reflection of the voters’ will. It is also important to remember the truism that all politics is local and, as Dr. Bogen also points out, the undecided percentage is likely to be smaller on the local scene. He also rightly suggests this local phenomenon, all things being equal, favors the challenger. This same principle applies to groups of polls as well because they are aggregates of local polls. Political trending, although becoming more sophisticated all the time, still cannot reliably predict the outcome on election day every time. We have far to go to reach the algorithmic precision of Isaac Asimov’s Foundation “psychohistory.” In the meantime, we have to search for the data where the wild things are.