There have been some fantastic pieces over the last couple of days analyzing the divergent polls and how partisans seem to choose whichever data supports their candidate and then argue for its veracity over the data that contradicts it. Today, a great many Republicans look at Mitt Romney’s lead in national polls and point to it as the reason for his expected election victory. Democrats look at the state polling (since that is where the actual electoral votes come from) and say Obama still has the Electoral College advantage regardless of any deficit in national polls. Sean Trende at Real Clear Politics had a great nonpartisan column on national polls versus state polls and how looking at each can lead to opposite conclusions:
The RCP Average currently has Mitt Romney up by 0.8 points nationally. He has held this lead fairly consistently ever since the first presidential debate. Given what we know about how individual states typically lean with respect to the popular vote, a Republican enjoying a one-point lead nationally should expect a three-to-four-point lead in Florida, a two-to-three-point lead in Ohio, and a tie in Iowa. Instead we see Romney ahead by roughly one point in Florida, and down by two in Ohio and Iowa.
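The lean arithmetic in Trende's example can be made concrete. The sketch below assumes a state's expected margin is simply the national margin plus a fixed partisan lean. It takes each lean as the midpoint of Trende's expected range minus the one-point national margin from his example (not the 0.8-point RCP figure). Those lean values, and the names `STATE_LEAN`, `expected_margin` and `state_gaps`, are illustrative assumptions rather than Trende's own figures. Margins are Obama minus Romney, so a positive gap means the state polls look better for Obama than the national polls imply.

```python
# Margins are Obama minus Romney, in percentage points (positive = Obama ahead).
NATIONAL_MARGIN = -1.0  # Trende's example: Romney up one point nationally

# Hypothetical partisan leans relative to the national margin: the midpoints of
# Trende's expected ranges (Romney +3.5 in Florida, +2.5 in Ohio, a tie in Iowa)
# minus the national margin. Negative = leans Republican.
STATE_LEAN = {"Florida": -2.5, "Ohio": -1.5, "Iowa": 1.0}

# Approximate state poll margins quoted in the passage.
OBSERVED = {"Florida": -1.0, "Ohio": 2.0, "Iowa": 2.0}


def expected_margin(national: float, lean: float) -> float:
    """Expected state margin: the national margin shifted by the state's lean."""
    return national + lean


def state_gaps(national: float, leans: dict, observed: dict) -> dict:
    """Observed minus expected margin for each state (positive = better for Obama)."""
    return {
        state: observed[state] - expected_margin(national, lean)
        for state, lean in leans.items()
    }


if __name__ == "__main__":
    for state, gap in state_gaps(NATIONAL_MARGIN, STATE_LEAN, OBSERVED).items():
        print(f"{state}: state polls run {gap:+.1f} pts vs. the national-poll expectation")
```

Run as-is, it reports gaps of +2.5 (Florida), +4.5 (Ohio) and +2.0 (Iowa). Under these assumptions each state is running more Democratic than a one-point Romney national lead would predict, which is the tension Trende describes.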
A national lead like that would point to a Romney presidency. But if you reverse-engineer the state polls into a national vote, you arrive at a different conclusion:
Since the national vote is a collection of state votes, polls of all states should collectively approximate the national vote (since errors should be randomly distributed, they should cancel out). This is done by a simple weighted average…[T]here are several good arguments for favoring the state polling: (1) you have more polls — a much larger collective “n”; (2) you compartmentalize sampling issues — pollsters focused exclusively on Colorado, for example, seem less likely to overlook downscale Latinos than pollsters with a national focus; and (3) the state pollsters were better in 1996 and 2000, two years that the national pollsters missed (although the truly final national pollsters in 2000 got it right, suggesting that perhaps there was a late shift in the race)…After adding the totals up, the results were plain: If the state polls are right, even assuming Romney performs as well as Bush 2004 did in the states without polling, Obama should lead by 1.18 points in the national vote. Given the high collective samples in both the state and national polling, this is almost certainly a statistically significant difference. It’s also a larger margin than all but one of the polls in the national RCP Average presently show.
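Trende's "simple weighted average" can be sketched as a turnout-weighted mean of state margins. This is a minimal illustration, not his actual calculation. The `StatePoll` type, the expected-vote weights and every number in the toy example are hypothetical. The `fallback_margin` field stands in for his assumption that Romney matches Bush 2004 in states without polling. Margins are Obama minus Romney, in points.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StatePoll:
    name: str
    expected_votes: int           # weight: hypothetical turnout for the state
    poll_margin: Optional[float]  # Obama minus Romney, in points; None if unpolled
    fallback_margin: float        # assumed margin used when the state has no polling


def implied_national_margin(states):
    """Turnout-weighted average of state margins, i.e. the implied national margin."""
    total_votes = sum(s.expected_votes for s in states)
    if total_votes == 0:
        raise ValueError("need at least one state with expected votes")
    weighted_sum = 0.0
    for s in states:
        # Use the poll where one exists, otherwise the assumed fallback margin.
        margin = s.poll_margin if s.poll_margin is not None else s.fallback_margin
        weighted_sum += s.expected_votes * margin
    return weighted_sum / total_votes


if __name__ == "__main__":
    toy_states = [
        StatePoll("State A", 5_000_000, 2.0, 0.0),
        StatePoll("State B", 3_000_000, -1.0, 0.0),
        StatePoll("State C", 2_000_000, None, -3.0),  # unpolled: uses fallback
    ]
    print(f"Implied national margin: {implied_national_margin(toy_states):+.2f} pts")
```

With these toy inputs the implied national margin is +0.10, an Obama lead of a tenth of a point. A full version would include every state plus D.C., using the fallback only where no polls exist.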
But national versus state polls isn’t the only debate. Poll toplines versus the data within those same polls may be the more contentious (and more valuable) debate this cycle. Enter Baseball Crank, with a fantastic look at the difference between modeling election outcomes from poll toplines and forecasting winners from the data that make up those polls:
Mathematical models are all the rage these days, but you need to start with the most basic of facts: a model is only as good as the underlying data, and that data comes in two varieties: (1) actual raw data about the current and recent past, and (2) historical evidence from which the future is projected from the raw data, on the assumption that the future will behave like the past.
[A]n argument Michael Lewis makes in his book The Big Short: nearly everybody involved in the mortgage-backed securities market (buy-side, sell-side, ratings agencies, regulators) bought into mathematical models valuing MBS as low-risk based on models whose historical data didn’t go back far enough to capture a collapse in housing prices. And it was precisely such a collapse that destroyed all the assumptions on which the models rested. But the people who saw the collapse coming weren’t people who built better models; they were people who questioned the assumptions in the existing models and figured out how dependent they were on those unquestioned assumptions. Something similar is what I believe is going on today with poll averages and the polling models on which they are based. The 2008 electorate that put Barack Obama in the White House is the 2005 housing market, the Dow 36,000 of politics. And any model that directly or indirectly assumes its continuation in 2012 is – no matter how diligently applied – combining bad raw data with a flawed reading of the historical evidence.
Nate Silver’s much-celebrated model is, like other poll averages, based simply on analyzing the toplines of public polls…My thesis, and that of a good many conservative skeptics of the 538 model, is that these internals are telling an entirely different story than some of the toplines: that Obama is getting clobbered with independent voters, traditionally the largest variable in any election and especially in a presidential election, where both sides will usually have sophisticated, well-funded turnout operations in the field. He’s on track to lose independents by double digits nationally, and the last three candidates to do that were Dukakis, Mondale and Carter in 1980. And he’s not balancing that with any particular crossover advantage (i.e., drawing more crossover Republican voters than Romney is drawing crossover Democratic voters). Similar trends are apparent throughout the state-by-state polls, not in every single poll but in enough of them to show a clear trend all over the battleground states.
If you averaged Obama’s standing in all the internals, you’d capture a profile of a candidate that looks an awful lot like a whole lot of people who have gone down to defeat in the past, and nearly nobody who has won. Under such circumstances, Obama can only win if the electorate features a historically decisive turnout advantage for Democrats – an advantage that none of the historically predictive turnout metrics are seeing, with the sole exception of the poll samples used by some (but not all) pollsters. Thus, Obama’s position in the toplines depends entirely on whether those pollsters are correctly sampling the partisan turnout.
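Baseball Crank's point that the toplines hinge on partisan sampling can be illustrated with a simple weighted sum. A poll's topline is each party-ID group's share of the sample times Obama's margin within that group. The sketch below holds the within-group margins fixed and varies only the partisan mix of the sample. It uses hypothetical values in which Obama loses independents by 10. The group margins, the two sample compositions and the `topline` helper are all assumptions for illustration, not figures from any actual poll.

```python
# Obama-minus-Romney margin within each party-ID group, in points.
# Hypothetical internals in which Obama loses independents by 10.
GROUP_MARGIN = {"D": 84, "R": -86, "I": -10}

# Two hypothetical partisan compositions of a poll sample, in percent.
SAMPLES = {
    "D+7 sample": {"D": 39, "R": 32, "I": 29},
    "even sample": {"D": 35, "R": 35, "I": 30},
}


def topline(mix: dict, group_margin: dict) -> float:
    """Topline margin: each group's share of the sample times its margin, summed."""
    if sum(mix.values()) != 100:
        raise ValueError("sample shares must sum to 100 percent")
    return sum(mix[group] * group_margin[group] for group in mix) / 100


if __name__ == "__main__":
    for label, mix in SAMPLES.items():
        print(f"{label}: topline {topline(mix, GROUP_MARGIN):+.2f} pts")
```

With identical internals, the D+7 sample produces an Obama lead of 2.34 points, while the evenly split sample produces a Romney lead of 3.70. The swing comes entirely from the assumed partisan composition, which is the dependence the passage describes.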
Battlegroundwatch clearly falls into the Baseball Crank camp: look at the internals and follow the conclusions wherever they lead. Following that methodology, Baseball Crank reaches a conclusion with which we have no disagreement:
I stand by my view that Obama is losing independent voters decisively, because the national and state polls both support that thesis. I stand by my view that Republican turnout will be up significantly from recent-historic lows in 2008 in the key swing states (Ohio, Wisconsin, Colorado) and nationally, because the post-2008 elections, the party registration data, the early-voting and absentee-ballot numbers, and the Rasmussen and Gallup national party-ID surveys (both of which have solid track records) all point to this conclusion. I stand by my view that no countervailing evidence outside of poll samples shows a similar surge above 2008 levels in Democratic voter turnout, as would be needed to offset Romney’s advantage with independents and increased GOP voter turnout. And I stand by the view that a mechanical reading of polling averages is an inadequate basis to project an event unprecedented in American history: the re-election of a sitting president without a clear-cut victory in the national popular vote. Perhaps, despite the paucity of evidence to the contrary, these assumptions are wrong. But if they are correct, no mathematical model can provide a convincing explanation of how Obama is going to win re-election. He remains toast.