A couple of years ago, I noticed my heart would occasionally skip a beat. I was experiencing slight palpitations, just a few times a day. My cardiologist referred to these as premature ventricular contractions (PVCs) and, after some tests, reassured me that they were a benign presence in my otherwise healthy heart. But both he and my regular doctor noted that certain triggers could make them worse: stress, caffeine, alcohol, tiredness, the usual suspects.
I was reassured that my symptoms were harmless, but their unexplained presence still made me anxious. Rather than dealing with the anxiety in a healthy way, I instead decided to run a randomized controlled trial (RCT) on myself to see whether my daily coffee habit was driving these palpitations. Every day I flipped a coin and followed a very simple rule: if it came up heads, I could drink tea or coffee as much as I normally would; if it came up tails, I abstained from caffeinated beverages for the entire day.
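For the programmatically inclined, the assignment rule is nothing fancier than a fair coin flip each morning. Here is a toy sketch in Python; the seed and the fifty-day horizon are my own illustrative choices, not details of the actual experiment:

```python
import random

# A fair coin decides each day's assignment:
# heads = caffeine as usual, tails = no caffeinated beverages all day.
def todays_assignment(rng: random.Random) -> str:
    return "caffeine" if rng.random() < 0.5 else "abstain"

rng = random.Random(2024)  # arbitrary seed, purely for reproducibility
schedule = [todays_assignment(rng) for _ in range(50)]  # roughly fifty days
print(schedule[:7])  # e.g. ['caffeine', 'abstain', ...]
```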
For a couple of months I flipped coins and ran the experiment, using an app on my phone to keep track of my palpitations and other things like alcohol consumption.[1] My smart watch kept track of my sleep patterns and how often I exercised. When I eventually got fed up, I downloaded all the data and looked at the results, which covered a span of about fifty days.
Initially, there didn’t appear to be much of an effect. Comparing the average number of palpitations on days when the coin came up heads to days when it came up tails showed slightly more palpitations on caffeine days, but not significantly so.
However, looking at the raw data, I realized the number of palpitations I was experiencing varied wildly from day to day and from week to week. Once I accounted for these seasonal and day-of-week differences, there was some evidence that the doctors were correct: on days when I flipped heads, I experienced roughly 35-40% more palpitations (equivalent to a couple of skipped beats a day). The results were mostly, but not always, statistically significant (I probably should have run the experiment for more than 50 days, but I eventually lost the battle with my inner coffee demons).
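For readers curious what "accounting for these differences" can look like in practice, here is a minimal sketch of one way to do it: a Poisson regression of daily palpitation counts on the caffeine assignment, with day-of-week and week fixed effects. The file name and column names are hypothetical, and this is one plausible specification rather than necessarily the exact analysis I ran:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical columns: date, caffeine (1 = heads day), palpitations (daily count).
df = pd.read_csv("palpitations.csv", parse_dates=["date"])
df["dow"] = df["date"].dt.day_name()           # day-of-week effects (those Mondays)
df["week"] = df["date"].dt.isocalendar().week  # week-to-week drift

# Naive comparison: difference in average daily counts by assignment.
print(df.groupby("caffeine")["palpitations"].mean())

# Poisson regression with day-of-week and week fixed effects soaking up
# the "seasonal" variation; exp(coefficient) is the caffeine rate ratio.
fit = smf.poisson("palpitations ~ caffeine + C(dow) + C(week)", data=df).fit()
rate_ratio = np.exp(fit.params["caffeine"])
print(f"Caffeine days: {100 * (rate_ratio - 1):.0f}% more palpitations")
```

A count model with fixed effects is just one reasonable way to net out the weekly swings; a simple OLS on log counts, or day-of-week dummies alone, would tell a broadly similar story.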
But while staring at the graph documenting these palpitations over two months, I realized that I had answered the wrong question. Even if caffeine was increasing the number of palpitations I felt on a given day, it couldn’t explain the more interesting story: that they eventually went away on their own. Even if caffeine exacerbated my symptoms, it wasn’t the predominant cause.
Snippets of evidence (my palpitations peaked on Mondays and on the day before a job interview) suggested that underlying factors like stress played a role. But even if I could find ways to randomize my stress levels, such as forcing myself to meditate or to doomscroll on certain days, the trends above suggested that longer-term, cumulative factors were driving my condition. This got me thinking about how my own research might occasionally miss the bigger picture.
The field of economics is currently in the middle of what the academics Josh Angrist and Jörn-Steffen Pischke once referred to as a “Credibility Revolution,” a movement to make empirical work both more rigorous and more focused on uncovering causal relationships. While most revolutions have some sort of purported ideology at their core, the vanguard of the Credibility Revolution often insists that it is anti-ideological in nature: the data should tell us how the world works, not the other way around. If we have good reason to believe that caffeine causes heart palpitations, then it had better be shown to be true in a natural experiment or RCT.
Occasionally this dogma led to bust-ups between the Credibility Revolutionistas (I count myself as one of the pack, even though they still get annoyed at me from time to time) and those who felt that the Revolution was making an implicit (or, as is often the case on Twitter, explicit) case that certain types of empirical evidence were paramount.
A notable scrimmage took place in 2019, when superstar economists Abhijit Banerjee, Esther Duflo and Michael Kremer were awarded the Nobel Prize in Economics. The three were recognized for revolutionizing the field of development economics with an experimental approach to fighting poverty, largely centered around the use of RCTs. The use of these methods has increased dramatically since Kremer first started running randomized interventions in the domains of health and education in the late 90s, although even today their overall prevalence in the top economics journals remains muted.
Every new Nobel win shines a spotlight on a way of thinking about the world. Proponents want to step into that spotlight and celebrate, while opponents want to scribble caveats on the sidewalk for all to see. In the case of Banerjee, Duflo and Kremer, the recognition led to a pretty swift backlash on social media. Part of the dissent came from heterodox economists who rejected experiments as imperialist and unethical, and who probably also had a few long-standing bones to pick with the newly minted economics establishment (if not, what sort of heterodoxers could they really be?).
Another tide of criticism came from well-established empirical economists who have been largely skeptical of the new wave. Cynically, I suspect some of this pushback has been driven by intergenerational differences in methods: the old guard has not been terribly happy that no one looks at the world in quite the same way they do anymore.
However, a third group was uneasy with certain elements of the randomista movement and the wider Credibility Revolution, and it included card-carrying members. The concern was that, because certain questions are easier to answer with rigorous methods than others, focusing solely on those methods would lead to a skewed understanding of how the world works.
A great example of this came from a brief back-and-forth between the political scientist/economist Chris Blattman and Banerjee and Duflo in 2011, on the eve of South Sudan’s independence from the North. Responding to a query from an NYT columnist on what the new South Sudanese government’s policy priorities should be, Banerjee and Duflo focused on getting social services right, particularly cash transfers: micro-scale interventions that had largely been borne out by recent trials. Blattman instead argued that the government desperately needed to align incentives across the country so that people would not start shooting at each other again, focusing on coalition and state building instead of tweaking the plumbing of state services and anti-poverty programs.
It would be unfair to accuse Duflo and Banerjee of being ignorant of politics or unable to see past micro-scale interventions; they have been more than eager to grapple with these issues in their past few books. But as South Sudan descended into a brutal civil war from which it has yet to emerge, it is hard to see how the randomistas could ever have plucked a sufficient answer from their portfolio of work. They might have successfully implemented an effective cash transfer program and reduced poverty by 40% (and that would have been a very good thing), but those gains would have been pulverized by the oncoming tsunami of civil war.
It would be easy to throw one’s hands up in the air and lament that empirical economists are doomed to ask and answer the wrong questions. I don’t actually believe that is true.
For one, the Credibility Revolution brought with it a whole host of tools beyond the RCT, many of which are better suited for tackling sub-national or national-level policies, from the impact of austerity on voting choices to the medium-term labor market effects of large refugee inflows. Those tools are in a constant state of refinement, but they let us widen our research lens to take in the bigger picture and, data permitting, take more frequent snapshots of the state of the world.
Second, researchers wielding RCTs have begun to use them to unpack how we might better influence that bigger, more complex picture, ranging from post-conflict reconciliation, to how voters absorb information about political candidates, to how demonstrations of state power erode or embolden criminal enterprises.
But even when our methods answer “big” questions, they typically offer insight into a single slice of a large, complex causal picture. When devising, writing, publishing and pitching our research, we still need to be alert to the other forces at play, for two reasons:
First, they may make our answers unstable: an intervention or policy which works today may not work tomorrow, and we need to be aware of when that is likely to be the case. Given that the typical route of asking and answering questions (running a study and getting it published in a peer-reviewed journal) takes years, we should be very worried about how stable evidence is over time.
Second, the larger forces might be the better target of our efforts. While I am a firm believer that cash or asset transfer programs are excellent interventions for helping poor rural households escape poverty, it might still be the case that improving market access by building roads would have a larger impact on people’s lives.
We are ultimately better able to describe what we see at our feet when we stand under the lamplight of empirical scrutiny than when we stand in the dark. But things that remain out of sight do not disappear, and they still deserve our consideration and, when possible, experimentation. Over the last couple of years my palpitations have occasionally returned, but I found that switching to a job that gave me more autonomy and spending more time on self-care (e.g. meditation, therapy) did wonders for my stress levels. And now I get to enjoy my morning cup of coffee without worrying too much about increasing my palpitations by 40%.
[1] This wasn’t an ideal experiment for several reasons. Ideally, I would have been blind to whether or not I was ingesting caffeine, something that just wasn’t practical to implement. It is also possible I noticed my palpitations more often on days I drank coffee; however, they were strong enough to be easy to notice in either condition.