The Causal Model and Notation
\[ \renewcommand{\P}{\mathsf{P}} \newcommand{\m}{\mathsf{m}} \newcommand{\p}{\mathsf{p}} \newcommand{\q}{\mathsf{q}} \newcommand{\bb}{\mathsf{b}} \newcommand{\g}{\mathsf{g}} \newcommand{\rr}{\mathsf{r}} \newcommand{\IF}{\mathbb{IF}} \newcommand{\dd}{\mathsf{d}} \newcommand{\Pn}{\mathsf{P}_n} \newcommand{\E}{\mathsf{E}} \]
Consider a typical setting where we have measured some treatment \(A\), some set of pre-treatment variables \(L\), and an outcome \(Y\). We will assume that the data are generated by the following causal model (other causal models would lead to alternative definitions):
\[ \begin{align} L &= f_L(U_L)\\ A &= f_A(L, U_A)\\ Y &= f_Y(A, L, U_Y) \end{align} \]
The functions \(f\) are assumed fixed, but likely unknown.
\(U = (U_L, U_A, U_Y)\) is a vector of exogenous, unmeasured random variables.
This system is referred to as a Non-parametric Structural Equation Model (NPSEM).
We will assume that this system of equations exists in nature and is responsible for generating our observed data.
We will often refer to \(L\) as confounders, \(A\) as the treatment or exposure, and \(Y\) as the outcome.
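To make the NPSEM concrete, here is a minimal Python sketch that generates data from it. The functional forms are purely hypothetical assumptions chosen for illustration (a single continuous confounder, a logistic treatment mechanism, and a linear outcome); in practice, \(f_L\), \(f_A\), and \(f_Y\) are unknown.

```python
import numpy as np

rng = np.random.default_rng(2024)

# Hypothetical structural equations; in real data f_L, f_A, f_Y are unknown.
def f_L(u_L):
    # A single continuous confounder, for illustration only
    return u_L

def f_A(L, u_A):
    # Binary treatment, more likely when L is large (confounding by L)
    p = 1 / (1 + np.exp(-L))
    return (u_A < p).astype(int)

def f_Y(A, L, u_Y):
    # Outcome depends on treatment, the confounder, and its own noise
    return 2 * A + L + u_Y

n = 1_000
# Exogenous, unmeasured variables U = (U_L, U_A, U_Y)
U_L = rng.normal(size=n)
U_A = rng.uniform(size=n)
U_Y = rng.normal(size=n)

# Evaluate the equations in causal order: L, then A, then Y
L = f_L(U_L)
A = f_A(L, U_A)
Y = f_Y(A, L, U_Y)
```

Only \(L\), \(A\), and \(Y\) would be recorded in a real dataset; the \(U\) vectors exist only because this is a simulation.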
Take for example the following sample of data:
Let \(L\) = `sex`, `bmi`, `age`, and `smoke`; \(A\) = `trt`; and \(Y\) = `event`. When we say that we assume the previous NPSEM, we are positing that:
- the variable `trt` was generated from an unknown function of the random variables `sex`, `bmi`, `age`, and `smoke`, as well as a set of unmeasured random variables not present in our data;
- the variable `event` was generated from an unknown function of the measured variables that generated `trt` (`sex`, `bmi`, `age`, and `smoke`), `trt` itself, and a set of unmeasured random variables.
Note that in some cases, we may know one or more of the functions \(f\). For example, if our data came from a randomized clinical trial for \(A\), then we know the function \(f_A\).
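As an illustration, suppose (hypothetically) that the trial used simple 1:1 randomization. Then \(f_A\) is fully known: it depends only on a coin flip \(U_A\) and ignores \(L\). The function name `f_A_rct` below is our own, and trials with stratified or covariate-adaptive randomization would have a different (but still known) \(f_A\).

```python
import numpy as np

def f_A_rct(L, u_A):
    """Known treatment mechanism under hypothetical 1:1 randomization.

    L is accepted to match the NPSEM signature f_A(L, U_A), but it is
    ignored: assignment depends only on the uniform draw u_A.
    """
    return (u_A < 0.5).astype(int)

rng = np.random.default_rng(1)
L = rng.normal(size=10_000)
U_A = rng.uniform(size=10_000)
A = f_A_rct(L, U_A)

# Because f_A is known, P(A = 1 | L) = 0.5 for every covariate value
print(A.mean())  # approximately 0.5
```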
Central to how we will define causal effects is the concept of counterfactual random variables.
For example, consider a scenario where we are interested in the value of \(Y\) in a hypothetical situation where, instead of the variable \(A\) being equal to its observed value, \(A\) is set to some other value.
The value of \(Y\) in this hypothetical situation is a counterfactual random variable.
Let \(\dd(A, L)\) be a function that takes a treatment value \(A\) and a covariate profile \(L\) and returns a new value of treatment. We will refer to the function \(\dd\) as a shift function or a general hypothetical intervention.
Denote the value of \(Y\) in the hypothetical world where treatment is set to the value \(\dd(A,L)\) as \(Y^{\dd}\).
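Because \(\dd\) is just a function of \((A, L)\), several kinds of interventions fit the same signature. The Python sketch below shows three hypothetical examples: a static intervention, a dynamic rule that depends only on \(L\), and an additive shift of a continuous exposure. The function names and the `threshold` and `delta` parameters are illustrative assumptions, not part of the original text.

```python
import numpy as np

def d_static(a, l):
    """Static intervention: set treatment to 1 for everyone."""
    return np.ones_like(a)

def d_threshold(a, l, threshold=0.0):
    """Dynamic intervention: treat only units whose covariate exceeds a threshold."""
    return (l > threshold).astype(int)

def d_additive(a, l, delta=1.0):
    """Shift a continuous exposure by delta, keeping the observed value as the reference."""
    return a + delta

a = np.array([0, 1, 0])
l = np.array([-1.0, 0.5, 2.0])
print(d_static(a, l))     # [1 1 1]
print(d_threshold(a, l))  # [0 1 1]
```

Only the last example uses the observed treatment value \(A\); the first two ignore it, which is why \(\dd\) is allowed to depend on both arguments.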
Returning to the data example, imagine we are interested in the value of `event` if `trt` were replaced with the output of a function \(\dd\) that always returns 1:
\[ \begin{align} L &= f_L(U_L)\\ A^{\dd} &= \dd(A, L) = 1 \\ Y^{\dd} &= f_Y(1, L, U_Y) \end{align} \]
Here we introduce the notation \(A^{\dd}\) to refer to the post-intervention exposure. If we had the ability to collect data from this alternative NPSEM, the data may instead look like this:
Unfortunately, we are never able to collect data from this alternative world. This is called the fundamental problem of causal inference.
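In a simulation, where we choose the functions \(f\) and draw \(U\) ourselves, we can sidestep this problem and compute \(Y^{\dd}\) directly: keep each unit's \(L\) and \(U_Y\), replace \(A\) with \(A^{\dd} = \dd(A, L) = 1\), and re-evaluate \(f_Y\). The sketch below assumes the same hypothetical logistic treatment and linear outcome forms used for illustration elsewhere; it is not possible with real data, where \(U_Y\) and \(f_Y\) are unobserved.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 5

# Hypothetical structural equations (unknown in real data)
def f_A(L, u_A):
    return (u_A < 1 / (1 + np.exp(-L))).astype(int)

def f_Y(A, L, u_Y):
    return 2 * A + L + u_Y

def d(a, l):
    # Shift function that always returns 1
    return np.ones_like(a)

U_L, U_A, U_Y = rng.normal(size=n), rng.uniform(size=n), rng.normal(size=n)
L = U_L
A = f_A(L, U_A)
Y = f_Y(A, L, U_Y)        # observed outcome

A_d = d(A, L)             # post-intervention exposure A^d
Y_d = f_Y(A_d, L, U_Y)    # counterfactual outcome: same L and U_Y, new treatment

# Where A already equals d(A, L), the counterfactual equals the observed outcome;
# for every other unit, Y_d is never observed in real data.
treated = A == 1
assert np.allclose(Y[treated], Y_d[treated])
```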
The previous NPSEM is the simplest causal model we will assume in this workshop. However, real data is often much more complex and may be characterized by:
time-varying variables
loss-to-follow-up
To handle these settings, we need to modify our notation and introduce some additional symbols:
Symbol | Definition |
---|---|
\(i\) | The index of an observation (i.e., a row) in a data set with \(n\) total units (i.e., the total number of rows) |
\(t\) | The index of time for a total number of time points \(\tau\) |
\(L_t\) | Confounders at time \(t\) |
\(A_t\) | A vector of intervention variables (i.e., treatment or exposure) at time \(t\) |
\(Y\) | An outcome variable observed at the end of the study, that is at time \(\tau + 1\). Earlier measures of the outcome can be included in \(L_t\). |
\(C_t\) | An indicator variable that a unit is observed (not censored) at time \(t+1\) |
\(O_1, ..., O_n\) | A sample of \(n\) i.i.d. observations with \(O = (L_1, A_1, C_1, L_2, A_2, C_2, ..., L_\tau, A_\tau, C_\tau, Y)\) |
\(\bar{X}_t = (X_1, ..., X_t)\) | The history of a variable up until time \(t\) |
\(\underline{X}_t = (X_t, ..., X_\tau)\) | The future of a variable, including time \(t\) |
\(H_t = (\bar{A}_{t-1}, \bar{L}_t)\) | The history of all variables up until just before \(A_t\) |
\(\epsilon_t\) | A randomizer: an exogenous random variable that allows the intervention \(\dd\) to be stochastic |
\(\dd(a_t, h_t, \epsilon_t)\) | A function that maps \(A_t\), \(H_t\), and \(\epsilon_t\) to a new value of treatment \(A^{\dd}_t\) |
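The following Python sketch shows how one longitudinal observation \(O\) and the history \(H_t = (\bar{A}_{t-1}, \bar{L}_t)\) line up with the notation above, and how a shift function \(\dd(a_t, h_t, \epsilon_t)\) can use the randomizer \(\epsilon_t\). The numeric values, the dictionary layout, and the specific rule in `d` (with probability 0.5, lower treatment by one unit but not below 0) are hypothetical assumptions for illustration; this `d` accepts \(h_t\) to match the signature but does not use it.

```python
import numpy as np

rng = np.random.default_rng(3)
tau = 3

# One hypothetical observation O = (L_1, A_1, C_1, ..., L_tau, A_tau, C_tau, Y),
# stored with 1-indexed time to match the notation.
L = {1: 0.2, 2: -0.4, 3: 1.1}  # confounders L_t
A = {1: 1.0, 2: 0.0, 3: 2.0}   # treatment A_t
C = {1: 1, 2: 1, 3: 1}         # C_t = 1: still observed at time t + 1
Y = 0.7                        # outcome measured at time tau + 1

def history(t):
    """H_t = (A-bar_{t-1}, L-bar_t): all variables measured just before A_t."""
    a_bar = [A[s] for s in range(1, t)]
    l_bar = [L[s] for s in range(1, t + 1)]
    return a_bar, l_bar

def d(a_t, h_t, eps_t):
    """Hypothetical stochastic shift: if eps_t < 0.5, lower treatment
    by one unit (never below 0); otherwise keep the observed value."""
    return max(a_t - 1.0, 0.0) if eps_t < 0.5 else a_t

# Post-intervention treatment A_t^d at each time point
A_d = {}
for t in range(1, tau + 1):
    eps_t = rng.uniform()
    A_d[t] = d(A[t], history(t), eps_t)
```

Because \(\epsilon_t\) is random, two units with identical \(A_t\) and \(H_t\) can receive different post-intervention treatments; a deterministic intervention is the special case where \(\dd\) ignores \(\epsilon_t\).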