by Julio Amador Diaz Lopez1 and Ralf Martin2
Last update: April 22, 2020 - 14:02
The coronavirus pandemic has changed everything overnight. Unfortunately, as the genetic sequence of the virus started to make its deadly journey through bodies around the world, a memetic sequence emerged in parallel in the minds of some people: the idea that the covid19 pandemic is not real but a hoax. Indeed, the worry is that the two infections exist in a symbiotic relationship, with each helping to advance the survival and spread of the other. Here we report on our ongoing efforts to map the spread of the memetic infection using Twitter. Since March 23 we have been sampling tweets mentioning the terms “corona” and/or “covid”. Currently, we have collected 11.95 million tweets.
Some emerging results include the following:
How bad is the hoax infection and is it getting better or worse? To identify tweeters who believe in the hoax (or promote the hoax idea) we look for tweets with one of the following hashtags:
Using hashtags instead of string searches for the same terms provides a good distinction between tweets that display support for hoaxism and tweets that criticise hoaxism. Note that this is likely a conservative way of counting hoaxist tweets, and in reality a larger fraction of tweets probably come from people supporting hoaxist ideas.
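To illustrate why hashtag matching is more conservative than a plain string search, here is a minimal sketch. The hashtag list in `HOAX_HASHTAGS` and all function names are hypothetical placeholders chosen for illustration; they are not the list used in our analysis.

```python
import re

# Hypothetical hashtag list for illustration only (not the list used in the analysis).
HOAX_HASHTAGS = {"covidhoax", "coronahoax"}

# A hashtag is a '#' followed by word characters.
HASHTAG_RE = re.compile(r"#(\w+)")


def extract_hashtags(text):
    """Return the set of lowercased hashtags found in a tweet."""
    return {tag.lower() for tag in HASHTAG_RE.findall(text)}


def is_hoaxist_by_hashtag(text):
    """True if the tweet carries at least one hashtag from the hoax list."""
    return bool(extract_hashtags(text) & HOAX_HASHTAGS)


def is_hoaxist_by_substring(text):
    """Naive alternative: flag any tweet that contains one of the terms anywhere."""
    lowered = text.lower()
    return any(term in lowered for term in HOAX_HASHTAGS)


supportive = "Wake up people! #CovidHoax"
critical = "Calling this a covidhoax is putting lives at risk"

print(is_hoaxist_by_hashtag(supportive), is_hoaxist_by_hashtag(critical))
print(is_hoaxist_by_substring(supportive), is_hoaxist_by_substring(critical))
```

In this toy example the substring search flags both tweets, while the hashtag rule flags only the tweet that uses the term as a hashtag.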
Below is a time series plot of the share of hoaxist tweets over our sample period.3 We report separate series for the US and UK. Assigning location to tweets is notoriously difficult as most users have switched off detailed location tracking. In the figure below we base location on the analysis of a free text field where users can write something about their whereabouts. In many cases this refers to known areas, although the level of detail varies (e.g. London, UK vs the Universe). Often it also involves fantasy locations (e.g. Walhalla). Hence, our “other” category might include tweeters from either the UK or US who have chosen not to reveal their location.
Note that towards the beginning of the sample period the share of hoax tweets in all covid-related tweets is less than 0.5%. However, the weekend around the 28th of March saw a major outbreak of hoaxism that was particularly bad in the UK. This had subsided somewhat by March 30. The whole-sample trend suggests that hoaxism is fairly stable and not subsiding, although there seems to be a declining trend over the last couple of days.
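A rough sketch of how a free-text profile location could be mapped to the three categories follows. The keyword sets `UK_TERMS` and `US_TERMS` and the function `classify_location` are hypothetical and deliberately tiny; a realistic version would need far longer lists or a gazetteer.

```python
import re

# Hypothetical, deliberately short keyword lists for illustration.
UK_TERMS = {"uk", "united kingdom", "england", "london"}
US_TERMS = {"usa", "united states", "new york", "california"}


def classify_location(profile_location):
    """Map a free-text location string to 'UK', 'US' or 'other'."""
    # Keep letters only, collapse whitespace, and pad with spaces so that
    # terms match whole words (so 'uk' does not match 'ukraine').
    text = re.sub(r"[^a-z ]", " ", profile_location.lower())
    text = " " + " ".join(text.split()) + " "
    if any(f" {term} " in text for term in UK_TERMS):
        return "UK"
    if any(f" {term} " in text for term in US_TERMS):
        return "US"
    return "other"


for loc in ["London, UK", "Brooklyn, New York", "the Universe", "Walhalla", ""]:
    print(repr(loc), "->", classify_location(loc))
```

Anything that matches neither list, including empty or fantasy locations, falls into “other”, which is why that category can still contain UK and US tweeters.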
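The daily share plotted here is simply the number of hoaxist tweets divided by all sampled covid-related tweets for each day and region. A minimal sketch, assuming each tweet has already been reduced to a `(day, region, is_hoax)` tuple (a representation chosen here for illustration, with toy data):

```python
from collections import defaultdict


def daily_hoax_share(tweets):
    """Return {(day, region): hoax share in percent}.

    `tweets` is an iterable of (day, region, is_hoax) tuples.
    """
    totals = defaultdict(int)
    hoax = defaultdict(int)
    for day, region, is_hoax in tweets:
        totals[(day, region)] += 1
        if is_hoax:
            hoax[(day, region)] += 1
    return {key: 100.0 * hoax[key] / totals[key] for key in totals}


# Toy data, not taken from our sample.
sample = [
    ("2020-03-28", "UK", True),
    ("2020-03-28", "UK", False),
    ("2020-03-28", "UK", False),
    ("2020-03-28", "UK", False),
    ("2020-03-28", "US", False),
    ("2020-03-28", "US", False),
]
shares = daily_hoax_share(sample)
print(shares)
```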
What are the drivers of hoaxism? We can start exploring this by looking at the tweets of hoaxists more widely. Below we plot a word cloud of the last 1000 tweets of the 300 most prolific hoaxists. One hypothesis is that hoaxism has been fueled by Trumpism. Because of worries that a strong response to the pandemic could negatively affect the economy and thereby his re-election chances, Trump had a vested interest in playing down the crisis. The word cloud confirms that an obsession with Trump is prevalent among hoaxists.
For comparison, here is a word cloud of the 300 most prolific non-hoaxist covid-related tweeters. Trump is relevant here too, although to a smaller degree: hoaxers have a 4.86 percentage point higher probability of mentioning Trump (the share of Trump mentions across both groups is 5.34%). Of course it might also be that one group is supporting Trump whereas the other is opposing him. We will address this in future work.
Also note that the term “filmyourhospital” shows up prominently, which according to reports is a hashtag pushed by right-wing commentators.
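The percentage point gap is the difference between the two groups' mention rates. The sketch below assumes the unit of observation is the tweet and that a mention is a case-insensitive substring match; both are assumptions made for illustration, and the tweets are invented.

```python
def mention_share(tweets, term):
    """Percent of tweets that contain `term` (case-insensitive substring match)."""
    if not tweets:
        return 0.0
    hits = sum(term.lower() in tweet.lower() for tweet in tweets)
    return 100.0 * hits / len(tweets)


# Invented toy tweets for the two groups.
hoaxist_tweets = ["Trump was right", "masks are useless", "TRUMP 2020", "it is all fake"]
other_tweets = ["stay home", "Trump briefing today", "wash your hands", "new cases reported"]

hoaxist_rate = mention_share(hoaxist_tweets, "trump")
other_rate = mention_share(other_tweets, "trump")
gap_pp = hoaxist_rate - other_rate  # difference in percentage points
print(hoaxist_rate, other_rate, gap_pp)
```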
We examine whether US state-level hoax infection rates are correlated with reported covid19 infection rates. This is interesting because it helps gauge whether misinformation has any effect on actual outcomes. Clearly, we cannot draw overly strong conclusions about causal effects from a simple exercise like this. However, it is a useful starting point. The figure below4 is a scatter plot of state-level per capita infection rates against the share of hoax tweets (in percent) from within the state. There is clearly a positive relationship. What is particularly striking is that New York is extreme not only in terms of infections but also in the prevalence of hoaxism.
An alternative explanation for the striking infection rates in New York is its relatively high population density. That’s why we also examine the relationship between infection rates and density (in people per square mile). Indeed, there is a positive relationship as well. But New York seems to be more of an outlier in terms of density. One potential hypothesis that the two figures combined suggest is that there might be an interaction effect between hoaxism and density. Take for instance Alaska, which has the second highest rate of hoaxism but a much lower infection rate than New York. Of course it’s also the least densely populated state. On the other hand, consider New Jersey, which is actually denser than New York but has a much lower rate of infection. It turns out that hoaxism is also less prevalent there.
To explore this further, we also undertake regression analysis.5 The table below shows that:
Hoaxism is indeed significantly and positively related to covid19 infection rates (column 1). The coefficient implies that a 1 percentage point higher hoaxism level is associated with 1.38 extra covid patients per 1000 citizens.
This result is highly robust to the inclusion of further controls such as population density, population size and covid tweet intensity (covid-related tweets per 1000 people) in column 2.
The hoaxism and density interaction hypothesis is confirmed in column 3, where we include the interaction of both variables as an additional regressor (as well as the interaction of covid tweet intensity with density as an additional control).
In column 4, we identify the model from daily data rather than from the cross-sectional variation of the latest available period (day). This allows us to include state as well as day controls (density is no longer separately identified as it is a fixed state-level characteristic). Hence we implicitly control for all fixed state characteristics that might be confounding our estimates. This preserves our qualitative conclusions, although the estimated coefficients become smaller.
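As a rough numerical companion to a scatter plot of this kind, one could compute a Pearson correlation across states. The sketch below uses made-up toy numbers rather than our data, and the helper name `pearson` is introduced here only for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Toy values: hoax tweet share (percent) and cases per 1000 inhabitants.
hoax_share = [0.2, 0.4, 0.6, 0.8]
cases_per_1000 = [0.5, 1.0, 1.5, 2.0]
print(pearson(hoax_share, cases_per_1000))
```

A correlation of this kind summarises the direction and strength of the linear relationship, but it says nothing about causality.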
In column 5 we repeat the exercise while dropping all observations from New York. This has little impact on the findings related to hoaxism.
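One way to write down the specifications described above is sketched here. The notation is introduced purely for illustration and may differ from the exact estimating equations: y denotes covid19 cases per capita, H the hoax tweet share (in percent), D population density, T covid tweets per capita, s indexes states and t days.

```latex
\begin{align*}
% Column 3: cross-section of states on the last available day
y_s &= \alpha + \beta_1 H_s + \beta_2 D_s + \beta_3 T_s
       + \beta_4 (H_s \times D_s) + \beta_5 (T_s \times D_s) + \varepsilon_s \\
% Columns 4 and 5: daily panel with state effects \mu_s and day effects \tau_t
y_{st} &= \beta_1 H_{st} + \beta_3 T_{st}
       + \beta_4 (H_{st} \times D_s) + \beta_5 (T_{st} \times D_s)
       + \mu_s + \tau_t + \varepsilon_{st}
\end{align*}
```

In this notation, column 1 keeps only the hoax share term of the first equation, and column 2 adds the controls without the two interaction terms. In the second equation the level of D drops out because it is absorbed by the state effects, while its interactions with H and T remain identified.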
We have to be cautious with causal claims at this stage. Our results could be contingent on our simple model specification or crude aggregation (e.g. we don’t take into account that New York state consists of the metropolitan area of New York as well as rather rural parts, although as we saw in column 5, the results are not contingent on New York). Still, to understand whether the results are not only statistically significant but also quantitatively meaningful, it is useful to ask what the impact of hoaxism would be if the estimates were taken at face value. Using the estimates from column 4, which we consider our most reliable at this stage, would imply that without hoaxism there would have been 144,039 fewer covid cases (of a total of 722,635) as of 2020-04-18. Clearly, this is substantial.
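To put the counterfactual in proportion, the sketch below first works out the share of cases that the reported figure represents, and then shows one plausible way such a number could be built up from the column 4 coefficients. The per-state formula, the assumption that the dependent variable is measured per 1000 inhabitants, and the example state are all illustrative assumptions, not a description of the exact calculation.

```python
# Column 4 coefficients as reported in the regression table.
BETA_HOAX = 0.184
BETA_HOAX_X_DENSITY = 0.002


def cases_attributed_to_hoaxism(hoax_share_pct, density, population):
    """Cases linked to hoaxism in one state under the assumed linear model.

    Assumes the outcome is cases per 1000 inhabitants (an assumption).
    """
    per_1000 = (BETA_HOAX + BETA_HOAX_X_DENSITY * density) * hoax_share_pct
    return per_1000 * population / 1000.0


# Figures reported in the text.
reported_reduction = 144_039
reported_total = 722_635
share_pct = 100.0 * reported_reduction / reported_total
remaining = reported_total - reported_reduction
print(round(share_pct, 1), remaining)

# Hypothetical state: 0.5% hoax share, 100 people per square mile, 1 million people.
print(cases_attributed_to_hoaxism(0.5, 100, 1_000_000))
```

On the reported figures, the implied reduction is roughly a fifth of all cases.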
| Dependent variable: Covid19 cases per capita | (1) | (2) | (3) | (4) | (5) |
|---|---|---|---|---|---|
| Hoax Tweets Share | 1.744** | 1.910*** | 2.220*** | 0.184*** | 0.280*** |
| | (0.774) | (0.568) | (0.426) | (0.048) | (0.037) |
| Population density | | 0.006*** | -0.001 | | |
| | | (0.001) | (0.002) | | |
| Tweets per capita | | 0.366 | 0.070 | 0.086*** | 0.004 |
| | | (0.407) | (0.307) | (0.008) | (0.007) |
| Hoax X Density | | | 0.012*** | 0.002*** | 0.002*** |
| | | | (0.002) | (0.0003) | (0.0002) |
| Tweets X Density | | | -0.0002 | 0.0003*** | 0.0003*** |
| | | | (0.001) | (0.00001) | (0.00001) |
| State Controls | No | No | No | Yes | Yes |
| Day Controls | No | No | No | Yes | Yes |
| Sample | Last Day | Last Day | Last Day | Daily | NY dropped |
| Observations | 49 | 49 | 49 | 1,346 | 1,319 |
| R² | 0.098 | 0.600 | 0.788 | 0.877 | 0.877 |
| Adjusted R² | 0.078 | 0.573 | 0.763 | 0.870 | 0.869 |
| Residual Std. Error | 2.153 | 1.465 | 1.091 | 0.496 | 0.367 |
| F Statistic | 5.079** | 22.496*** | 31.951*** | 114.667*** | 113.115*** |

Note: *p<0.1; **p<0.05; ***p<0.01. Standard errors in parentheses.