Mean differences or mean changes?

While analyzing data from an experiment, I found myself writing things like ‘the treatment changes the outcome variable by…’ or ‘the treatment leads to changes in the outcome variable’. However, talking about changes often sounded too ‘dynamic’ to me. After all, I was referring to two different groups of subjects (a between-subjects design); what I was actually doing was statistically comparing the means of the outcome variable across different groups. I was fine with talking about changes when referring to within-subject differences, i.e. changes in outcomes for the same subject due to an intervention, but for the between-subjects case, shouldn’t I rather talk about differences instead of changes?

I was aware that talking about changes was probably due to the fact that my data came from carefully controlled laboratory or online experiments. Experiments are the standard tool for testing causal effects of treatments on outcomes. Happenstance data (e.g. from observational studies, questionnaires, or surveys), on the other hand, are used to make inferences about correlation, not primarily causation. At first glance this difference might seem subtle, but it is actually very important.

In observational studies, detecting differences in the outcome variable between groups, e.g. females, males, and any other gender, suggests that these sub-groups differ (significantly) with respect to the outcome variable. However, we cannot infer from the data whether gender is causing the difference in the outcome variable, or whether some other factor (correlated with gender) might be responsible for it. Therefore, we should not write that “variation of gender leads to changes in the outcome variable”. Rather, we could say that gender predicts a certain outcome variable, i.e. that knowing the gender leads us to infer that the outcome variable has a certain value (of course associated with uncertainty, e.g. a confidence interval), or that gender correlates with the outcome.

In experiments, however, we deliberately create a situation in which we can make sure that the cause precedes the effect. This is not the case in observational studies, i.e. happenstance data, where the variables are gathered at (more or less) the same time and none of them is systematically manipulated in order to influence the others. Thanks to this deliberate manipulation, we can be reasonably sure that differences in the outcome are caused by the intervention, assuming of course that nothing else changed.

So, statistically speaking, the treatment causes a difference in the outcome variable. As I indicated above, however, this difference is between two distinct groups, which are sampled randomly from the population. Random sampling allows us to make inferences about the population, i.e. to infer that the treatment at hand would also lead to a change in the outcome variable of a subject for whom we only observed the un-treated outcome, because they were randomly allocated to the control group. This is a fundamental limit of between-subjects experiments: we only observe one outcome per participant, either the outcome under the intervention or the outcome without it. The outcome we do not observe is termed the ‘counterfactual’ (about which I also write here). Random allocation, however, allows us to basically ‘assume’ that the mean un-treated outcome of the control group serves as a proxy for the mean un-treated outcome the treatment group would have shown, and vice versa. By taking the difference between the two group means, we can estimate the average treatment effect (ATE). This ATE basically tells us what the treatment intervention causes with respect to the outcome variable.
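To make this concrete, here is a minimal simulation sketch of the logic above. The numbers (a true effect of 2.0, a baseline of 10, 10,000 subjects) are of course made up for illustration: each simulated subject has an un-treated outcome, we only ever observe one outcome per subject, and the difference in group means recovers the true effect.

```python
import random

random.seed(42)

# Hypothetical numbers for illustration only.
true_effect = 2.0  # the causal effect we hope to recover
n = 10_000

# Each subject's un-treated (baseline) outcome.
baseline = [random.gauss(10, 2) for _ in range(n)]

# Random allocation to treatment (True) or control (False).
treated = [random.random() < 0.5 for _ in range(n)]

# We observe only ONE outcome per subject: treated subjects
# show baseline + effect, controls show the plain baseline.
observed = [b + true_effect if t else b for b, t in zip(baseline, treated)]

n_treated = sum(treated)
mean_treated = sum(y for y, t in zip(observed, treated) if t) / n_treated
mean_control = sum(y for y, t in zip(observed, treated) if not t) / (n - n_treated)

# The difference in group means estimates the average treatment effect (ATE).
ate_estimate = mean_treated - mean_control
print(round(ate_estimate, 2))  # close to the true effect of 2.0
```

Because allocation is random, the control group's mean stands in for the un-treated mean the treatment group would have shown, which is exactly why the simple difference in means works here.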

So, coming back to my initial question: I guess it is ok to talk about changes as well as about differences in the outcome variable. For experiments, that is. For happenstance data, I’d say it is a bit more difficult to justify talking about changes. Although the theory at hand should specify which direction of causality is expected, it is safer to talk about differences instead.

An excellent book about counterfactuals and how to understand (field) experiments from this perspective is Gerber, A.S. & Green, D.P. (2012). Field Experiments: Design, Analysis, and Interpretation. New York, London: W.W. Norton & Company.