Causation Does Not Imply Variation
Tyler Muir suggested this lovely catchphrase, which should stand next to “Correlation does not imply causation” in our menagerie of econometric sayings. “Do changes in x cause changes in y?” does not answer the question “what are the most important causes of variation in y?” Many identified causal effects explain very little variation, and we know there are many other sources of variation. People often jump from one to the other without stopping to think.
An extra year of college, or growing up in a better neighborhood, might raise wages. But only a tiny fraction of why one person’s wage differs from another’s results from extra years of college or which neighborhood a person grew up in. Minimum wages might raise employment, some find, or lower it, others find. But only a tiny fraction of the huge variation in employment from one area to another, or one person to another, traces to variation in minimum wages. If you want employment, other levers are likely far more important. Demand shocks might move stock prices. But only a tiny fraction of stock price variation comes from demand shocks.
The causality revolution
The causality revolution has come to dominate empirical work in economics. And productively so. We want to know how x affects y. We might see a correlation between x and y. But our data don’t come from controlled experiments. Maybe y also causes x, maybe there are third variables that cause y and x. This is the central conundrum of empirical social science. College graduates have higher incomes. Does going to college raise your income? Well, rich men drive Porsches. That does not mean that driving a Porsche will make you rich.
So, we find a tiny slice of variation in x that is plausibly “exogenous,” like the random variation that a lab scientist could impose. The correlation of this tiny bit of x with a similarly tiny bit of variation in y can identify a causal effect of x on y. That’s great. This causality revolution has really improved empirical economics from the willy-nilly regressions we used to run. But that does not mean we understand the bulk of movement in y. The other causes of y may, and often do, dominate.
Throwing out variation
Start with a variable we want to understand, y, perhaps employment. y is the sum of many causes, y = b1 x1 + b2 x2 + … First of all, we look just at the effect of one variable, x1, say minimum wages, leaving out all the others, including population, demographics, education, unionization, immigrants, rising or falling industries, social program disincentives (they often cut benefits by a dollar for each dollar you earn), and on and on. For most people, who earn far more than minimum wage, minimum wages are obviously irrelevant. Right off the bat, you know we will explain a tiny fraction of employment.
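Here is that arithmetic in a minimal simulation sketch (my own illustration; the variable names and magnitudes are made up, not estimates from any study). A regression recovers the causal coefficient b1 almost perfectly, yet x1 explains a quarter of one percent of the variance of y:

```python
# A well-identified causal effect b1 that nonetheless explains almost
# none of the variation in y. All numbers are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

x1 = rng.normal(size=n)           # the one cause we study (say, minimum wages)
other = 10 * rng.normal(size=n)   # b2 x2 + b3 x3 + ...: everything else

b1 = 0.5
y = b1 * x1 + other               # y is the sum of many causes

# OLS recovers b1 almost exactly: the causal effect is well measured
b1_hat = np.cov(x1, y)[0, 1] / np.var(x1, ddof=1)
# but x1 accounts for a tiny share of var(y)
r2 = (b1**2 * np.var(x1)) / np.var(y)

print(f"estimated b1: {b1_hat:.3f}")   # ~0.50
print(f"R-squared:    {r2:.4f}")       # ~0.0025: a quarter of one percent
```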
But states don’t enact minimum wage laws randomly. They respond to conditions. Maybe you’re looking at the effect of employment on minimum wages. Or maybe you’re looking at governments that enact a bunch of policies at the same time, more regulation with more minimum wages, and it’s the regulations that lower employment. So we start throwing out variation in order to find something that looks like truly exogenous variation.
A typical study might employ “differences in differences.” Look at changes in minimum wage (difference in time) across different states, and correlate that with the difference across states in employment growth. We’ve thrown out a lot of the variation of the original data, the level of minimum wage in each state.
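In code, the whole estimator boils down to two subtractions. A hypothetical sketch, on my own made-up numbers rather than any study’s data:

```python
# Differences in differences: "treated" states raised the minimum wage
# between the two periods; "control" states did not. Names and
# magnitudes are hypothetical.
import numpy as np

rng = np.random.default_rng(1)

# employment (say, log employment) before and after, by group
treated_before = rng.normal(2.0, 1.0, size=50)
treated_after  = rng.normal(1.8, 1.0, size=50)
control_before = rng.normal(2.0, 1.0, size=50)
control_after  = rng.normal(2.1, 1.0, size=50)

# first difference: change over time within each group;
# second difference: compare that change across groups
did = (treated_after.mean() - treated_before.mean()) \
    - (control_after.mean() - control_before.mean())
print(f"diff-in-diff estimate: {did:.2f}")

# Note what carries no weight: the level of the minimum wage in each
# state, and all the variation in states that never changed their law.
```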
Studies typically add “fixed effects.” In a regression, y(state, time) = state fixed effect + b x(state, time) + error. A state fixed effect means we look only at the variation in a variable within a state over time, not how the variable varies across states. A time fixed effect means we look only at the variation of a variable across states, not how it varies over time. It is common to add both fixed effects. Yes, that’s possible: y(i,t) = a(i) + c(t) + b x(i,t) + error is not the same as y(i,t) = a(i,t) + b x(i,t) + error, which would not work. Let’s see if I can state the source of variation in words. (A great seminar question: can you please state the source of variation in x in words?) We’re looking at x in state i at time t relative to the average of x in state i over time, and relative to the average of x across all states at time t, and how that correlates with similar variation in y. Hmm, I didn’t do a great job of translating that into English. (Stating the assumption on standard errors in words gets even more fraught. Just what did you assume is independent of what? Without using the word “cluster?”)
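Maybe code does better than English. For a balanced panel, the two-way fixed effect estimate is just double demeaning; here is a sketch on simulated data (variable names and numbers are mine):

```python
# Two-way fixed effects as double demeaning: subtract each state's
# average over time and each year's average across states (adding
# back the grand mean), then run OLS on what survives. Made-up data.
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
df = pd.DataFrame({
    "state": np.repeat(np.arange(50), 20),   # 50 states
    "year":  np.tile(np.arange(20), 50),     # 20 years
})
df["x"] = rng.normal(size=len(df))
df["y"] = 0.3 * df["x"] + rng.normal(size=len(df))

def within(v):
    # the variation left over after state and year fixed effects
    return (v
            - v.groupby(df["state"]).transform("mean")
            - v.groupby(df["year"]).transform("mean")
            + v.mean())

x_t, y_t = within(df["x"]), within(df["y"])
b = (x_t @ y_t) / (x_t @ x_t)   # OLS on the surviving variation
print(f"two-way fixed effect estimate: {b:.3f}")   # ~0.3
```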
Other studies look only at states that share a border, or counties that share a border, in the hope that “other effects” are the same across the border. Great, but again we throw out all the variation in non-bordering states or counties.
Next, researchers add “controls.” Controls should be added judiciously: think about what else moves y, how it might be correlated with the x of interest, and then bring it in from the error term to the regression. Control for taxes, regulations, or other changes that might have happened at the same time as a change in minimum wage. Instead of y = b1 x1 + error, recognize that the error includes b2 x2 and that x1 and x2 are correlated, so run y = b1 x1 + b2 x2 + error. Drinking and cancer are correlated. But people who drink also smoke, so you want to look at the part of drinking not correlated with smoking to see if drinking on its own causes cancer. But we are now looking at that much smaller population of drinkers who don’t smoke. Technically, adding controls is the same thing as looking only at the variation in x1 that is not correlated with x2. We throw out variation. Fixed effects are just one type of control.
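That equivalence is exact: it is the Frisch-Waugh-Lovell theorem, and easy to verify in a sketch (made-up data; the smoking/drinking labels just echo the example above):

```python
# Frisch-Waugh-Lovell: the coefficient on x1 with x2 as a control equals
# the coefficient from regressing y on only the part of x1 orthogonal
# to x2. Made-up data: x2 ~ smoking, x1 ~ drinking, y ~ cancer risk.
import numpy as np

rng = np.random.default_rng(3)
n = 10_000
x2 = rng.normal(size=n)
x1 = 0.8 * x2 + rng.normal(size=n)            # drinkers tend to smoke
y = 1.0 * x1 + 2.0 * x2 + rng.normal(size=n)

# (1) multiple regression of y on [1, x1, x2]
X = np.column_stack([np.ones(n), x1, x2])
b_multi = np.linalg.lstsq(X, y, rcond=None)[0][1]

# (2) strip x2 out of x1, then regress y on the residual alone
slope, intercept = np.polyfit(x2, x1, 1)
x1_resid = x1 - (slope * x2 + intercept)
b_resid = (x1_resid @ y) / (x1_resid @ x1_resid)

print(f"with control:       {b_multi:.3f}")
print(f"on residual x1:     {b_resid:.3f}")   # the identical coefficient
print(f"variation retained: {np.var(x1_resid) / np.var(x1):.2f}")
```

The two coefficients match, and in this simulation the residualized x1 keeps only about 60% of the original variance of x1: the control threw out the rest.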
In fact, controls tend to be added willy-nilly, without thinking. Why is this control needed? What are we controlling for? That seems especially true of fixed effects and demographic controls. Extra controls often end up destroying the causal implication of the regression. Tom Rothenberg, beloved econometrics teacher at Berkeley, offered two great examples. Regress left shoe sales on price and right shoe sales. The R2 goes up dramatically, the standard errors drop, the magic stars appear. But now you’re measuring the effect of price on how many people buy a left shoe without buying a right shoe. More seriously, regress wages on education, but “control for” industry. The R2 goes up, and we explain much more variation in wages (sort of where this post wants to go, but not this way). But the point of education is to let you move from the burger-flipping industry to investment banking, so controlling for industry destroys the causal interpretation of the coefficient.
But I digress. To our point, adding controls reduces the variation in x we are looking at. It is correct to do so: a lot of the variation in x was reverse causality or correlation with other causes, and we want to throw that out in order to learn about causality.
Next, researchers add “instruments.” To avoid the correlation-is-not-causation problem, we find some variable z that is plausibly uncorrelated with other influences on y, and then only use the variation in x that is predicted by z. We throw out variation in x uncorrelated with z. (Great exam question: explain the difference between an instrument and a control.)
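Here is the instrument logic in a sketch, again on made-up data with names of my own: OLS is contaminated by a confounder, while the IV estimate uses only the slice of variation in x that z predicts.

```python
# Instrumental variables: z moves x but has no direct effect on y,
# so only the variation in x predicted by z identifies the causal
# effect. Everything here is simulated and illustrative.
import numpy as np

rng = np.random.default_rng(4)
n = 10_000
z = rng.normal(size=n)                       # the instrument
u = rng.normal(size=n)                       # unobserved confounder
x = 0.5 * z + u + rng.normal(size=n)         # x is endogenous
y = 1.0 * x + 2.0 * u + rng.normal(size=n)   # true causal effect is 1.0

b_ols = (x @ y) / (x @ x)    # contaminated by u
b_iv  = (z @ y) / (z @ x)    # reduced form over first stage

print(f"OLS: {b_ols:.2f}")   # well above 1.0
print(f"IV:  {b_iv:.2f}")    # near 1.0
# First stage uses only var(0.5*z) = 0.25 of var(x) = 2.25:
# nearly 90% of the variation in x is discarded.
```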
And so on. I am not criticizing. The improvement in causal inference from these techniques has been enormous. We are also now blessed by huge data sets, so we can do it. Take all the people in the US, and drill down to the fact that Joe Brown really did move exogenously from Newark to Manhattan, compared to Sam Smith who was otherwise identical but stayed put, and see how they did. But obviously that tells us little about the actual distribution of income in the US.
Causality intersects with large data, also newly available. With large data, you can afford to throw out variation profligately to look for that needle of exogenous variation. Ideally, large data means standard errors should vanish; everything should be significant. That standard errors still matter tells you how much data we throw out in the quest for causality.
Yes, it is often overdone, and not quite as causal as it seems. A “causally identified” “top 5” publication with three stars on the coefficients moves the average economist’s prior by about 1/10,000 of what Bayesian updating says it should, if the causal identification were correct. (Jeff Smith gave a great recent Hoover seminar on this topic, slides here, on how sensitive many results are to small changes in specification.) We are either incredibly behaviorally stuck in our ways, or the new techniques on their own don’t fully identify causal effects automatically. But I’m not here to delve into that question today, rather to point out that even if it were all perfectly identified, it only answers the question it says it answers.
Sometimes, of course, the jump is justified. Darwin figured out that natural selection accounts for finch beaks in the Galapagos. That must be 0.00001% of the variation in species. It turns out all the rest is also natural selection. But the finch beaks alone don’t prove that.
Price pressure, and the 90% full glass
This comment arose out of discussion at the NBER Asset Pricing Program over Aditya Chaudhry and Jiacui Li’s “Endogenous Elasticities” paper (reviewed in my last post). Like the rest of the price pressure literature, they find surprisingly large price effects of small changes in stock quantities: an unexpected sale of 1% of the outstanding stock lowers the price 1-2%. (Their point is a declining elasticity. Roughly speaking, sales under 1% move the price by twice the amount of the sale, sales over 1% only by the same amount as the sale. But even 1 is a large elasticity.)
But most changes in price occur without any demand (or is it supply?) pressure. Earnings announcements move stock prices, and no shares need change hands. When the market goes down and your stock has a beta of one, the price moves, with no selling pressure to move it. This is the standard theory and fact of trading: when information hits the market symmetrically, prices move with no “buying or selling pressure”, and no volume at all. Indeed, here we are talking about 1% movements in price from occasional 1% movements in sales, but the average stock moves 1% every single day, and 50% or more in a typical year.
Thus, while one can causally identify that buying or selling pressure moves prices, that does not establish that most price movement comes from buying or selling pressure. R(t+1) = beta x(t+1) + error can have a beautifully identified beta and x. But the “error,” which consists of all the other x’s left out of the regression, can be huge. “Liquidity traders move stock prices” does not imply “stock prices mostly move because of liquidity traders.”
To be clear, neither Chaudhry and Li nor any other price pressure authors I have seen claim otherwise. But one does sniff that misinterpretation hanging around.
Related but slightly different: most changes in quantity have no or tiny price effects, because they are anticipated. Most people trying to buy or sell financial assets are smart enough not to surprise the market. If you show up unexpectedly with a truckload of tomatoes outside Whole Foods at 2 am, you’re not going to get full price for them. The Treasury, for example, routinely sells hundreds of billions of dollars of debt with essentially no price impact. Why? It announces the sales well ahead of time, and talks to bond traders about the sale. Quantitative easing purchases of hundreds of billions had some price impact when announced, but no detectable impact when the Fed actually bought securities. Initial offerings amount to an infinite percent increase in the supply of shares. Investment banks exist to popularize offerings, announce them, line up investors, and limit any “sloping demand curve” price impact.
Moreover, we have long understood why selling drives prices down: people on the other side suspect you know something they don’t. The price pressure literature tries to find selling or buying shocks that the other side ought to be able to figure out are not tied to information. For example, with the same data that price pressure authors laboriously dig up, you should be able to figure out that a mutual fund is selling stocks because its customers are pulling out money, not because its analysts know something you don’t. Otherwise, the mere fact that a fund is selling might mean that its analysts know something the trader does not. Well, maybe high frequency arbitrageurs aren’t quite that good at parsing out who does and doesn’t know something when they sell.
This is a slightly different phenomenon, for which I don’t have a catchphrase: Just because your identified movement in x causes movement in y does not mean that all movements in x cause movement in y.
Macroeconomics
Macroeconomics should take a victory lap for being first to the table here. Chris Sims’ Vector Autoregressions taught us to look for the effects of a monetary policy shock by looking at the average events following not interest rate rises per se, but only unexpected interest rate rises. The trouble is, markets anticipate most interest rate changes very well, so true monetary policy shocks are few and far between. If we want to subdivide, for example into monetary policy shocks that persistently raise interest rates vs. those that die out quickly, then we have fewer data points still. (In contemporary theory, persistent vs. transitory shocks have very different effects.) The result: identified monetary policy shocks explain next to none of the observed variation in prices, output, and employment, and standard errors plus the effects of small specification changes are huge.
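A stripped-down sketch of the accounting, with one forecasting equation standing in for a full VAR. All numbers are simulated, and the small variance share is built in by assumption; the point is just the mechanics of identifying shocks as the unforecastable part of the rate:

```python
# Toy version of shock identification: the "monetary policy shock" is
# the part of the interest rate not predicted by its own past. Data
# are simulated; nothing here estimates the actual economy.
import numpy as np

rng = np.random.default_rng(5)
T = 400

# rates mostly follow the past (anticipated); genuine shocks are rare
true_shock = rng.normal(size=T) * (rng.random(T) < 0.05)
rate = np.zeros(T)
for t in range(1, T):
    rate[t] = 0.9 * rate[t - 1] + true_shock[t]

# identified shock = residual from forecasting the rate with its lag
slope, intercept = np.polyfit(rate[:-1], rate[1:], 1)
shock_hat = rate[1:] - (slope * rate[:-1] + intercept)

# output responds a little to the shock, a lot to everything else
output = 0.2 * shock_hat + rng.normal(size=T - 1)
share = np.var(0.2 * shock_hat) / np.var(output)
print(f"output variance explained by identified shocks: {share:.3f}")
```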
Final thoughts
So, causality is great, but it isn’t everything. We often do want to know, “what are the major causes of growth vs. stagnation, wealth vs. poverty, recession vs. boom, and why do stock prices wander around so much?” Causal identification can chip away at this question, but obviously there is a long way to go. And it’s not obvious we will ever get there, since so much movement in the causes is and will always be endogenous.
Maybe one should rule out such big picture questions. Medicine doesn’t get far with “why are people sick?” but instead attacks diseases with drugs of small marginal power, one by one. And clinical trials focus on just the people in the trial, ignoring the vast number outside it.
Still, one should not mistake the answer to the small causal question for the answer to the disallowed big picture question.
As I think about macroeconomics and finance, I think there is good work to be done that does not just follow the causal identification format, and allows us to address the big picture question. Sometimes broad facts fit one vs. another causal story in ways that cannot be captured by these techniques.
As a concrete example, I’ll plug again a recent paper, “Expectations and the Neutrality of Interest Rates.” Here I contrasted FTPL, old-Keynesian, new-Keynesian, and monetarist explanations for the recent surge of inflation, the long quiet zero bound, the lack of a deflation spiral in 2008, and the immense difference between QE and the 2020-2021 asset purchases. I argue that one can sort out the theories with a little Occam’s razor, basic fundamental predictions, and elephant-in-the-room facts. But I couldn’t think of an F test in a VAR to capture that common sense. This sort of examination of historical episodes remains productive. Tom Sargent’s plot of the end of the German hyperinflation did more than a thousand VARs could to demonstrate the possibility of painless disinflation and its likely mechanism.
Growth theory also seems to find it very productive to look at basic facts, rather than slice and dice causal estimates. It started with Bob Lucas noticing that capital should be flowing in droves to poor countries. Why doesn’t it? Tom Sowell is on my mind from his recent celebration. He documents facts that support one vs. another causal framework. For example, people who immigrate to the US from different countries, or different areas of the same country, whom Americans can’t tell apart, have very different outcomes. Well, pure discrimination can’t be everything.
But this sort of thing takes thought and judgement, and is hard to publish.

