Introducing the initial cross tab modeled popular vote prediction

Aug 4, 2024

This includes 19 polls with cross tabs to make up one big super poll!

What this basically does is ignore the top-line results of the polls coming out and instead focus on how specific groups are voting within each poll. This keeps your polling averages from being knocked off by pollsters who oversample Democrats or Republicans because they are trying to move the needle. For instance, the current Leger poll shows Harris up by seven points. But modeling the demographic results shows the race more likely within two points, right in the range of everyone else. They got to seven and became an outlier by oversampling Democrats.
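To make the idea concrete, here is a minimal Python sketch of re-weighting one poll's crosstabs by an assumed turnout model instead of the poll's own sample. Every number is invented for illustration (none of it comes from Leger or any real poll), and the names `crosstab`, `poll_sample`, `turnout_model`, and `weighted_topline` are placeholders, not necessarily how the actual model is built.

```python
# Hypothetical crosstab: share of each party-ID group backing each candidate.
crosstab = {
    "Democrat":    {"Harris": 0.94, "Trump": 0.04},
    "Republican":  {"Harris": 0.05, "Trump": 0.93},
    "Independent": {"Harris": 0.46, "Trump": 0.44},
}

# Hypothetical make-up of the poll's own sample (Democrats oversampled).
poll_sample = {"Democrat": 0.40, "Republican": 0.31, "Independent": 0.29}

# Assumed turnout model (illustrative values only).
turnout_model = {"Democrat": 0.35, "Republican": 0.34, "Independent": 0.31}


def weighted_topline(crosstab, weights):
    """Combine group-level support into a top line using the given group weights."""
    totals = {}
    for group, support in crosstab.items():
        for candidate, share in support.items():
            totals[candidate] = totals.get(candidate, 0.0) + weights[group] * share
    return totals


def margin(topline):
    """Harris minus Trump, in percentage points."""
    return 100 * (topline["Harris"] - topline["Trump"])


as_published = margin(weighted_topline(crosstab, poll_sample))
as_modeled = margin(weighted_topline(crosstab, turnout_model))
print(f"Poll's own sample: Harris +{as_published:.1f}")
print(f"Turnout model:     Harris +{as_modeled:.1f}")
```

With these made-up inputs, the same group-level answers produce a much wider Harris lead under the Democrat-heavy sample than under the turnout model. Only the group weights change between the two numbers.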

So how do I come up with the demographic turnout prediction? Simple. By averaging out the past few elections to get an actual historical value. What is the best indicator of future behavior? Yep, past behavior. I am not one to believe that "this" is the year that Democrats or Republicans are going to simply turn out in record numbers and completely skew the results. Sure, I have toyed with the idea of figuring out how to include voter enthusiasm in a turnout model. But that polling is fluid and generally consolidates towards the end of a cycle. So far, my modeling based on past election results has come close enough to the actual outcomes that I do not really want to mess with it.
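A rough sketch of that averaging step in Python might look like the following. The three past electorates are placeholder values rather than real exit-poll figures, and `average_turnout` is a hypothetical helper name. A simple unweighted mean is assumed here, since the exact elections and weighting are not spelled out.

```python
# Hypothetical party-ID shares of the electorate in three past elections.
# Placeholder values, not real exit-poll data.
past_turnout = [
    {"Democrat": 0.36, "Republican": 0.33, "Independent": 0.31},
    {"Democrat": 0.38, "Republican": 0.32, "Independent": 0.30},
    {"Democrat": 0.37, "Republican": 0.35, "Independent": 0.28},
]


def average_turnout(elections):
    """Unweighted mean of each group's share, rescaled so the shares sum to 1."""
    groups = elections[0].keys()
    averaged = {g: sum(e[g] for e in elections) / len(elections) for g in groups}
    total = sum(averaged.values())
    return {g: share / total for g, share in averaged.items()}


turnout_model = average_turnout(past_turnout)
for group, share in turnout_model.items():
    print(f"{group}: {share:.1%}")
```

The rescaling at the end only matters when the source figures do not add up to exactly 100%, which is common when exit polls include a small "other" or "no answer" bucket.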

To prove that point, the difference between Hillary Clinton's 2.1% popular vote margin and Joe Biden's 4.4% popular vote margin had nothing to do with turnout. In fact, Democrats held a slightly bigger turnout advantage in 2016 than they did in 2020. The difference was that Trump won independents in 2016 whereas Biden won them by double digits in 2020. The difference was not about turnout. The difference was how people voted.

Meanwhile, using a great many polls and replacing old versions of the same poll (rather than using multiple versions of the same poll, as 538 does) keeps a few very active pollsters from carrying more weight in the mix than they should. Moreover, this avoids the RCP polling fluctuation problem, where polls drop off simply because of age even though they have not been replaced. That RCP average can then end up with massive changes as pollsters more generous to one candidate drop off the average and are replaced by completely different polls that might be more generous to the other candidate. Keeping the latest of all the polls removes that problem.
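The "latest poll per pollster" rule is easy to sketch in Python. The `Poll` class, its fields, and the pollster names below are all hypothetical, made up for this example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Poll:
    pollster: str
    end_date: date
    margin: float  # hypothetical field: Harris minus Trump, in points


def latest_per_pollster(polls):
    """Keep only the most recent poll from each pollster; nothing expires by age."""
    latest = {}
    for poll in polls:
        current = latest.get(poll.pollster)
        if current is None or poll.end_date > current.end_date:
            latest[poll.pollster] = poll
    return list(latest.values())


polls = [
    Poll("Pollster A", date(2024, 7, 20), 1.0),
    Poll("Pollster A", date(2024, 7, 30), 3.0),   # replaces the July 20 release
    Poll("Pollster B", date(2024, 7, 10), -1.0),  # older, but still B's latest
]

kept = latest_per_pollster(polls)
for poll in kept:
    print(poll.pollster, poll.end_date, poll.margin)
```

Pollster A counts once no matter how often it publishes, and Pollster B stays in the mix until it releases something newer.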

That being said, those who rely on momentum, waves, and swings to make things interesting would probably not like my slow-moving average. But most of the time, something that affects the polling in the short term rarely affects it as much in the long term. If it does, then my model will reflect that incrementally as each of the pollsters comes through with newer numbers.

The one downside to this method is that many pollsters do not provide crosstabs. In essence, my model will not cover all of the polls out there. That being said, I am suspicious of pollsters who will not release their crosstabs, so maybe it is a good thing that they are not in the model. Ultimately, it is more about getting enough polls to create a working average in each of the demographic categories to effectively create the one super poll.
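As a small illustration of building a per-category average from whichever polls report that category, here is a Python sketch. The poll values are invented, and both `pooled_support` and its `min_polls` threshold are assumptions added for the example, not a stated rule of the model.

```python
from statistics import mean

# Hypothetical Harris share within each group, from three polls with crosstabs.
polls = [
    {"Democrat": 0.94, "Republican": 0.05, "Independent": 0.46},
    {"Democrat": 0.92, "Republican": 0.06, "Independent": 0.44},
    {"Democrat": 0.93, "Republican": 0.04},  # this poll did not break out independents
]


def pooled_support(polls, min_polls=2):
    """Average each group's support across the polls that report that group.

    Groups reported by fewer than `min_polls` polls are left out
    (an assumed cutoff for what counts as a "working average").
    """
    groups = {group for poll in polls for group in poll}
    pooled = {}
    for group in sorted(groups):
        values = [poll[group] for poll in polls if group in poll]
        if len(values) >= min_polls:
            pooled[group] = mean(values)
    return pooled


super_poll = pooled_support(polls)
for group, share in super_poll.items():
    print(f"{group}: {share:.1%}")
```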

I also include a predictive measure of where the undecideds will land. Nothing complicated. Just the idea that most Republicans and Democrats will come home in the end, and that independents and third-party voters are more likely than others to vote for someone outside of the major-party candidates. In the past few elections I have overestimated the effect of third-party voters, so I have made some adjustments.
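One simple way to express that kind of rule is sketched below in Python. The support figures and the allocation percentages are invented for illustration, and `allocate_undecided` is a hypothetical helper, so treat this as one possible shape for the adjustment rather than the exact numbers used in the model.

```python
# Hypothetical support within each group; whatever is left over is undecided.
support = {
    "Democrat":    {"Harris": 0.90, "Trump": 0.04, "Other": 0.01},
    "Republican":  {"Harris": 0.04, "Trump": 0.89, "Other": 0.02},
    "Independent": {"Harris": 0.42, "Trump": 0.40, "Other": 0.06},
}

# Assumed split of each group's undecideds (invented values): partisans mostly
# "come home", independents lean more toward options outside the major parties.
allocation = {
    "Democrat":    {"Harris": 0.80, "Trump": 0.10, "Other": 0.10},
    "Republican":  {"Harris": 0.10, "Trump": 0.80, "Other": 0.10},
    "Independent": {"Harris": 0.40, "Trump": 0.40, "Other": 0.20},
}


def allocate_undecided(support, allocation):
    """Distribute each group's undecided share according to the allocation table."""
    result = {}
    for group, shares in support.items():
        undecided = 1.0 - sum(shares.values())
        result[group] = {
            candidate: share + undecided * allocation[group][candidate]
            for candidate, share in shares.items()
        }
    return result


final = allocate_undecided(support, allocation)
for group, shares in final.items():
    print(group, {c: round(s, 3) for c, s in shares.items()})
```

Lowering the `"Other"` entries in `allocation` is where an adjustment for overestimated third-party support would go in this sketch.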

Lastly, for the initial result, most of these polls I am using are new. A majority of them have been released within the past week, and none of them are more than two weeks old. So, this may be as good as it gets with being completely current.

