The Causal Model and Notation

Modified: May 12, 2025

Notation

Consider a typical setting in which we have measured a treatment \(A\), a set of pre-treatment variables \(L\), and an outcome \(Y\). We will assume that the data are generated by the following causal model (causal effects can also be defined under other causal models):

\[ \begin{align} L &= f_L(U_L)\\ A &= f_A(L, U_A)\\ Y &= f_Y(A, L, U_Y) \end{align} \]

The functions \(f = (f_L, f_A, f_Y)\) are assumed fixed but are typically unknown.
\(U = (U_L, U_A, U_Y)\) is a vector of exogenous, unmeasured random variables.
Because no functional form is imposed on \(f\), this system is referred to as a non-parametric structural equation model (NPSEM).
We will assume that this system of equations exists in nature and is responsible for generating our observed data.
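To make the NPSEM concrete, here is a minimal Python sketch that simulates data from it. The specific forms of f_L, f_A, and f_Y, the distributions of U, and the helper names (expit, draw_unit) are hypothetical choices for illustration only; in practice these functions are unknown, and only (L, A, Y) would be recorded.

```python
import math
import random


def expit(x: float) -> float:
    """Logistic function, mapping a real number to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical structural equations; in nature they are fixed but unknown.
def f_L(u_L: float) -> float:
    return u_L  # a single continuous confounder


def f_A(l: float, u_A: float) -> int:
    # Binary treatment, more likely for larger values of L
    return int(u_A < expit(0.5 * l))


def f_Y(a: int, l: float, u_Y: float) -> int:
    # Binary outcome that depends on treatment and the confounder
    return int(u_Y < expit(-1.0 + 0.8 * a + 0.6 * l))


def draw_unit(rng: random.Random) -> dict:
    # Exogenous, unmeasured variables U = (U_L, U_A, U_Y)
    u_L, u_A, u_Y = rng.gauss(0.0, 1.0), rng.random(), rng.random()
    l = f_L(u_L)
    a = f_A(l, u_A)
    y = f_Y(a, l, u_Y)
    # Only (L, A, Y) are observed; U is never recorded
    return {"L": l, "A": a, "Y": y}


rng = random.Random(2025)
data = [draw_unit(rng) for _ in range(5)]
for row in data:
    print(row)
```

Evaluating the equations in the order L, A, Y mirrors the causal ordering assumed by the model: each variable is computed only from variables that precede it and its own exogenous input.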

We will often refer to \(L\) as confounders, \(A\) as the treatment or exposure, and \(Y\) as the outcome.

Take for example the following sample of data:

Let \(L\) = (sex, bmi, age, smoke), \(A\) = trt, and \(Y\) = event. By assuming the NPSEM above, we are positing that:

the variable trt was generated by an unknown function of the random variables sex, bmi, age, and smoke, as well as a set of unmeasured random variables not present in our data; and
the variable event was generated by an unknown function of sex, bmi, age, smoke, and trt, as well as a set of unmeasured random variables.

Note that in some cases, we may know one or more of the functions \(f\). For example, if \(A\) was randomly assigned in a trial, then \(f_A\) is known by design.
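As a sketch of what a known \(f_A\) looks like, suppose, hypothetically, that trt was assigned by a fair coin flip in a 1:1 randomized trial. Then \(f_A\) ignores \(L\) entirely, and the probability of treatment is 0.5 within every covariate stratum. The coin-flip design, the covariate ranges, and the name f_A_trial are assumptions made for illustration.

```python
import random


def f_A_trial(l: dict, u_A: float) -> int:
    """Known treatment mechanism in a hypothetical 1:1 randomized trial.

    The covariates l are accepted only to match the signature f_A(L, U_A);
    treatment depends solely on the coin flip u_A.
    """
    return int(u_A < 0.5)


rng = random.Random(1)
# Key: whether age > 50; value: [number treated, number of units]
counts = {True: [0, 0], False: [0, 0]}
for _ in range(20_000):
    l = {"age": rng.randint(20, 80), "smoke": rng.random() < 0.3}
    a = f_A_trial(l, rng.random())
    older = l["age"] > 50
    counts[older][0] += a
    counts[older][1] += 1

for older, (treated, total) in counts.items():
    print(f"age > 50: {older}, proportion treated: {treated / total:.3f}")
```

Both printed proportions should be close to 0.5, reflecting that, by design, treatment does not depend on the covariates.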

Counterfactuals

Central to how we will define causal effects is the concept of counterfactual random variables.

Counterfactuals are random variables that would have been observed, possibly contrary to fact, in an alternative world.

For example, consider a scenario where we are interested in the value of \(Y\) in a hypothetical situation where, instead of the variable \(A\) being equal to its observed value, it is set to some other value \(A^\dd\).

The value of \(Y\) in this hypothetical situation is a counterfactual random variable.

Typically, one is interested in counterfactuals where treatment is set to some deterministic value. For instance, one could be interested in setting treatment \(A^\dd=1\). One could also be interested in setting \(A\) according to some covariates, e.g., “treat if age > 50”.

Such an intervention is denoted \(A^\dd=\dd(L)\) for some function \(\dd\) and is known as a dynamic treatment regime.
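The difference between a static intervention (\(A^\dd=1\)) and a dynamic treatment regime (\(A^\dd=\dd(L)\)) can be sketched as plain functions of the covariates. The function names and the dictionary representation of \(L\) are illustrative assumptions; the "treat if age > 50" rule is the one from the text. Neither function looks at the natural treatment value.

```python
from typing import Callable, List, Mapping, Sequence

Covariates = Mapping[str, float]


def static_regime(l: Covariates) -> int:
    """Static intervention: set treatment to 1 regardless of covariates."""
    return 1


def treat_if_older_than_50(l: Covariates) -> int:
    """Dynamic treatment regime d(L): treat if age > 50, otherwise do not."""
    return int(l["age"] > 50)


def apply_regime(d: Callable[[Covariates], int], rows: Sequence[Covariates]) -> List[int]:
    """Return the intervened treatment value d(L) for each unit."""
    return [d(l) for l in rows]


patients = [{"age": 45}, {"age": 62}, {"age": 50}]
print(apply_regime(static_regime, patients))           # [1, 1, 1]
print(apply_regime(treat_if_older_than_50, patients))  # [0, 1, 0]
```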

In this workshop we are interested in a generalization of this concept, where the function \(\dd\) can also depend on the natural value of treatment \(A\):

Let \(\dd(A, L)\) be a function that takes a natural treatment value \(A\) and a covariate profile \(L\) and returns a new value of treatment. We will refer to the function \(\dd\) as a shift function or a general hypothetical intervention.
Denote the value of \(Y\) in the hypothetical world where treatment is set to the value \(\dd(A,L)\) as \(Y^{\dd}\).
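To illustrate how \(\dd(A, L)\) generalizes \(\dd(L)\), the sketch below defines two hypothetical shift functions. The first, for a binary treatment, starts treatment only among units who would naturally be untreated and are older than 50, and otherwise keeps the natural value. The second, for a continuous exposure, adds delta to the natural value unless that would exceed an assumed upper bound. Both rules, their names, and the bound are illustrative assumptions.

```python
from typing import Mapping


def d_binary(a: int, l: Mapping[str, float]) -> int:
    """Hypothetical shift: treat units that are naturally untreated and
    older than 50; otherwise keep the natural treatment value a."""
    if a == 0 and l["age"] > 50:
        return 1
    return a


def d_additive(a: float, l: Mapping[str, float],
               delta: float = 1.0, upper: float = 10.0) -> float:
    """Hypothetical additive shift for a continuous exposure: return a + delta
    if it stays within the assumed upper bound, otherwise keep a. The bound
    could depend on l in general; here it is a fixed constant for simplicity."""
    return a + delta if a + delta <= upper else a


print(d_binary(0, {"age": 62}))  # 1: untreated and older than 50, so treated
print(d_binary(0, {"age": 40}))  # 0: untreated and younger, unchanged
print(d_binary(1, {"age": 40}))  # 1: already treated, unchanged
print(d_additive(3.0, {}))       # 4.0
print(d_additive(9.5, {}))       # 9.5: the shift would exceed the bound
```

In both cases the new treatment value depends on the natural value \(A\), which is exactly what a function of \(L\) alone cannot express.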

Returning to the data example, imagine we are interested in the value of event if trt were replaced with the output of a function \(\dd\) that always returns 1:

\[ \begin{align} L &= f_L(U_L)\\ A &= f_A(L, U_A)\\ A^{\dd} &= \dd(A, L) = 1 \\ Y^{\dd} &= f_Y(A^{\dd}, L, U_Y) = f_Y(1, L, U_Y) \end{align} \]

Here we introduce some new notation \(A^{\dd}\) to refer to the post-intervention exposure. If we had the ability to collect data from this alternative NPSEM, the data may instead look like this:

Unfortunately, we are never able to collect data from this alternative world. This is called the fundamental problem of causal inference.
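Although real data never reveal both worlds, a simulation can, because there we generate \(U\) ourselves. The sketch below uses hypothetical structural equations, holds each unit's \(U\) fixed, and computes both the factual \(Y\) and the counterfactual \(Y^{\dd}\) under the shift function that always returns 1; a real data set would contain only the factual column. All functional forms and names are assumptions for illustration.

```python
import math
import random


def expit(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical structural equations
def f_L(u_L: float) -> float:
    return u_L


def f_A(l: float, u_A: float) -> int:
    return int(u_A < expit(0.5 * l))


def f_Y(a: int, l: float, u_Y: float) -> int:
    return int(u_Y < expit(-1.0 + 0.8 * a + 0.6 * l))


def d(a: int, l: float) -> int:
    """Shift function that always returns 1, ignoring A and L."""
    return 1


rng = random.Random(7)
rows = []
for _ in range(10_000):
    u_L, u_A, u_Y = rng.gauss(0.0, 1.0), rng.random(), rng.random()
    l = f_L(u_L)
    a = f_A(l, u_A)
    y = f_Y(a, l, u_Y)         # factual outcome: observable in real data
    y_d = f_Y(d(a, l), l, u_Y)  # counterfactual: same U_Y, treatment replaced
    rows.append((a, y, y_d))

# Units that were actually treated have Y^d equal to Y
assert all(y == y_d for a, y, y_d in rows if a == 1)

mean_y = sum(r[1] for r in rows) / len(rows)
mean_yd = sum(r[2] for r in rows) / len(rows)
print(f"mean of Y: {mean_y:.3f}, mean of Y^d: {mean_yd:.3f}")
```

Because this hypothetical \(f_Y\) increases with treatment, \(Y^{\dd} \ge Y\) for every simulated unit; with real data, the untreated units' values of \(Y^{\dd}\) are simply missing.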

The previous NPSEM is the simplest causal model we will assume in this workshop. However, real data are often more complex and may involve:

time-varying variables
loss to follow-up

As such, we need to modify our notation and introduce some additional symbols:

Symbol	Definition
\(i\)	The index of an observation (i.e., a row) in a data set with \(n\) total units (i.e., \(n\) rows)
\(t\)	The index of time, \(t = 1, \ldots, \tau\), where \(\tau\) is the total number of time points
\(L_t\)	Confounders at time \(t\)
\(A_t\)	A vector of intervention variables (i.e., treatment or exposure) at time \(t\)
\(Y\)	An outcome variable observed at the end of the study, that is, at time \(\tau + 1\). Earlier measurements of the outcome can be included in \(L_t\).
\(C_t\)	An indicator that a unit is observed (not censored) at time \(t+1\)
\(O_1, ..., O_n\)	A sample of \(n\) i.i.d. observations, with \(O = (L_1, A_1, C_1, L_2, A_2, C_2, ..., L_\tau, A_\tau, C_\tau, Y)\)
\(\bar{X}_t = (X_1, ..., X_t)\)	The history of a variable up until time \(t\)
\(\underline{X}_t = (X_t, ..., X_\tau)\)	The future of a variable, including time \(t\)
\(H_t = (\bar{A}_{t-1}, \bar{L}_t)\)	The history of all variables up until just before \(A_t\)
\(\epsilon_t\)	An exogenous randomizer, which allows the intervention \(\dd\) to be random rather than deterministic
\(\dd(a_t, h_t, \epsilon_t)\)	A function that maps \(A_t\), \(H_t\), and \(\epsilon_t\) to a new value of treatment \(A^{\dd}_t\)
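The notation above can be sketched for a single unit with \(\tau = 3\) time points. The code builds the history \(H_t = (\bar{A}_{t-1}, \bar{L}_t)\), evaluates a shift \(\dd(a_t, h_t, \epsilon_t)\) at each time point using a uniform randomizer, and orders the unit's data as \(O\). The particular stochastic rule (treat an untreated unit with probability 0.5 when its current confounder is positive), the convention of marking everything after the first \(C_t = 0\) as missing, and all names and values are hypothetical choices for illustration.

```python
import random
from typing import List, Optional, Sequence, Tuple

TAU = 3  # hypothetical number of time points
History = Tuple[Tuple[int, ...], Tuple[float, ...]]


def history(L: Sequence[float], A: Sequence[int], t: int) -> History:
    """Return H_t = (A-bar_{t-1}, L-bar_t) for a 1-based time index t."""
    return tuple(A[: t - 1]), tuple(L[:t])


def d(a_t: int, h_t: History, eps_t: float) -> int:
    """Hypothetical stochastic shift: if the unit is untreated at time t and
    its current confounder L_t is positive, treat it with probability 0.5,
    decided by the randomizer eps_t ~ Uniform(0, 1); otherwise keep a_t."""
    _, L_bar = h_t
    if a_t == 0 and L_bar[-1] > 0 and eps_t < 0.5:
        return 1
    return a_t


def flatten(L: Sequence[float], A: Sequence[int], C: Sequence[int],
            Y: int) -> List[Optional[float]]:
    """Order one unit's data as O = (L_1, A_1, C_1, ..., L_tau, A_tau, C_tau, Y).
    In this sketch, everything after the first C_t = 0 is missing (None)."""
    O: List[Optional[float]] = []
    censored = False
    for L_t, A_t, C_t in zip(L, A, C):
        if censored:
            O.extend([None, None, None])
            continue
        O.extend([L_t, A_t, C_t])
        censored = C_t == 0
    O.append(None if censored else Y)
    return O


rng = random.Random(3)
L = [0.4, -1.1, 0.9]  # hypothetical confounders L_1, L_2, L_3
A = [0, 0, 1]         # observed treatments A_1, A_2, A_3
C = [1, 1, 1]         # observed at every time point
Y = 1

# Post-intervention treatments A^d_t, one randomizer draw per time point
A_d = [d(A[t - 1], history(L, A, t), rng.random()) for t in range(1, TAU + 1)]
print(history(L, A, 2))            # ((0,), (0.4, -1.1))
print(A_d)
print(flatten(L, A, [1, 0, 0], Y))  # censored after time 2
```

Here \(A^{\dd}_2\) stays 0 because \(L_2\) is negative, \(A^{\dd}_3\) stays 1 because the unit is already treated, and \(A^{\dd}_1\) depends on the randomizer draw.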
