Defining General, Hypothetical Interventions
\[ \renewcommand{\P}{\mathsf{P}} \newcommand{\m}{\mathsf{m}} \newcommand{\p}{\mathsf{p}} \newcommand{\q}{\mathsf{q}} \newcommand{\bb}{\mathsf{b}} \newcommand{\g}{\mathsf{g}} \newcommand{\rr}{\mathsf{r}} \newcommand{\IF}{\mathbb{IF}} \newcommand{\dd}{\mathsf{d}} \newcommand{\Pn}{$\mathsf{P}_n$} \newcommand{\E}{\mathsf{E}} \]
Have you ever begun reading a paper in the methodological causal inference literature and encountered the phrase “assume the treatment or exposure is a binary…”? (Most papers we read assume this!!) While assuming exposure variables are binary can simplify the definition of causal effects, many exposures of interest in reality are not binary.
Instead, we will work in situations where \(A\) is a binary, categorical, multivariate, or continuous variable!
In the previous section, we defined \(\dd(a_t, h_t, \epsilon_t)\) as a function that maps \(A_t\), \(H_t\), and potentially a randomizer \(\epsilon_t\) to a new value of \(A_t\). Our focus henceforth is on estimating the causal effect on the outcome \(Y\) of an intervention characterized by \(\dd\), through the causal parameter
\[ \theta = \E[Y^{\bar A^{\dd}}]\text{,} \]
where \(Y^{\bar A^{\dd}}\) is the counterfactual outcome in a world where, possibly contrary to fact, each entry of \(\bar{A} = (A_1, \ldots, A_\tau)\) was modified according to the function \(\dd\).
When \(Y\) is continuous, \(\theta\) is the mean population value of \(Y\) under intervention \(\dd\).
When \(Y\) is dichotomous, \(\theta\) is the population proportion of event \(Y\) under intervention \(\dd\).
When \(Y\) is the indicator of an event by end of the study, \(\theta\) is defined as the cumulative incidence of \(Y\) under intervention \(\dd\).
But what is this function \(\dd\), how can it be defined, and how does using it to define interventions solve the problem? Let’s walk through examples of \(\dd\), from simple to more complex.
Static Interventions
Let \(A_t\) denote a binary variable, such as receiving a medication, and define \(\dd(a_t, h_t, \epsilon_t) = 1\). This intervention characterizes a hypothetical world where all members of the population receive treatment.
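As a concrete sketch, this static intervention can be written as a small R function; the function name and example values below are purely illustrative and not part of any package API.

```r
# Static intervention: everyone receives treatment, regardless of the
# natural treatment value `a` or the history `h`.
d_static <- function(a, h) {
  rep(1, length(a))
}

# Example: three units with natural treatment values 0, 1, 0
d_static(a = c(0, 1, 0), h = data.frame(discomfort = c(2, 5, 1)))
#> [1] 1 1 1
```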
Dynamic Treatment Regime
Let \(A_t\) denote a binary variable, such as receiving a medication, and \(H_t\) a numeric variable, such as a measure of discomfort. For a given value of \(\delta\), define \[ \dd(a_t, h_t, \epsilon_t) = \begin{cases} 1 &\text{ if } h_t > \delta \\ 0 &\text{ otherwise.} \end{cases} \]
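A sketch of this regime as an R function (illustrative only), where `delta` is the analyst-chosen discomfort threshold:

```r
# Dynamic treatment regime: treat at time t whenever the measure of
# discomfort exceeds the threshold `delta`.
d_dynamic <- function(a, h, delta) {
  ifelse(h > delta, 1, 0)
}

# Example: treat units whose discomfort exceeds 3
d_dynamic(a = c(0, 1, 0), h = c(2, 5, 4), delta = 3)
#> [1] 0 1 1
```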
Modified Treatment Policies
While much attention is given to static and dynamic interventions, their use is often accompanied by a few key problems.
Defining causal effects in terms of hypothetical interventions where treatment is applied to all units may be inconceivable. For example, we may be interested in whether reducing surgery time reduces surgical complications; however, it is inconceivable to set all surgeries to a given duration, even if that duration depends on patient covariates.
Defining causal effects in terms of hypothetical interventions where treatment is applied to all units may induce positivity violations.
A solution to these problems is to instead define causal effects using modified treatment policies (MTP).
Additive and multiplicative shift MTP
Let \(A_t\) denote a numeric variable. Assume that \(A_t\) has support in the data such that \(P(A_t \leq u(h_t) \mid H_t = h_t) = 1\). For an analyst-defined value of \(\delta\), define \[ \dd(a_t, h_t, \epsilon_t) = \begin{cases} a_t + \delta &\text{ if } a_t + \delta \leq u(h_t) \\ a_t &\text{ otherwise.} \end{cases} \]
Under this intervention, the natural value of exposure at time \(t\) is increased by the analyst-defined value \(\delta\), whenever such an increase is feasible. This MTP is referred to as an additive shift MTP.
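A sketch of the additive shift in R, where `u` stands for the analyst-specified upper bound \(u(h_t)\); the function name and bound are illustrative.

```r
# Additive shift MTP: increase the natural exposure by `delta` whenever
# doing so respects the upper bound u(h); otherwise leave it unchanged.
d_additive <- function(a, h, delta, u) {
  ifelse(a + delta <= u(h), a + delta, a)
}

# Example: shift exposure up by 10 units, but never past a (hypothetical)
# feasibility bound of 120
d_additive(a = c(100, 115, 90), h = NULL, delta = 10, u = function(h) 120)
#> [1] 110 115 100
```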
We can similarly define a multiplicative shift MTP as
\[ \dd(a_t, h_t, \epsilon_t) = \begin{cases} a_t \times \delta &\text{ if } a_t \times \delta \leq u(h_t) \\ a_t &\text{ otherwise}. \end{cases} \]
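The corresponding R sketch differs from the additive version only in how the natural value is modified:

```r
# Multiplicative shift MTP: scale the natural exposure by `delta` whenever
# the scaled value respects the upper bound u(h); otherwise leave it unchanged.
d_multiplicative <- function(a, h, delta, u) {
  ifelse(a * delta <= u(h), a * delta, a)
}
```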
Randomized Interventions
Let \(A_t\) denote a binary variable, \(\epsilon_t \sim U(0, 1)\), and \(\delta\) be an analyst-defined value between 0 and 1. We may then define randomized interventions. For example, imagine we are interested in a hypothetical world where half of all smokers quit smoking (\(\delta = 0.5\)). This intervention would be defined as
\[ \dd(a_t, \epsilon_t) = \begin{cases} 0 &\text{ if } \epsilon_t < 0.5 \text{ and } a_t = 1 \\ a_t &\text{ otherwise} \end{cases}. \]
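A sketch of this intervention in R, drawing the randomizer \(\epsilon_t\) inside the function (illustrative only):

```r
# Randomized intervention: each smoker (a = 1) quits with probability 0.5;
# non-smokers are left at their natural value.
d_random_quit <- function(a) {
  eps <- runif(length(a))          # epsilon_t ~ U(0, 1)
  ifelse(eps < 0.5 & a == 1, 0, a)
}

set.seed(1)
d_random_quit(a = c(1, 1, 1, 0, 0))
```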
Incremental Propensity Score Interventions Based on the Risk Ratio
Let \(A_t\) denote a binary variable, \(\epsilon_t \sim U(0, 1)\), and \(\delta\) be an analyst-defined risk ratio between \(0\) and \(1\). In addition, define \(\g_t(a_t \mid H_t) = P(A_t = a_t \mid H_t)\).
If we were interested in an intervention that decreased the likelihood of receiving treatment, define
\[ \dd_t(a_t, h_t, \epsilon_t) = \begin{cases} a_t &\text{ if } \epsilon_t < \delta \\ 0 &\text{ otherwise} \end{cases}. \] In this case, we have \(\g_t^\dd(a_t \mid H_t) = a_t \delta \g_t(1 \mid H_t) + (1 - a_t) (1 - \delta \g_t(1\mid H_t))\), which leads to a risk ratio of \(\g_t^\dd(1 \mid H_t)/\g_t(1\mid H_t) = \delta\) for comparing the propensity score post- vs. pre-intervention.
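A sketch of this intervention as an R function, followed by a quick numerical check that the implied risk ratio equals \(\delta\); the propensity-score values are arbitrary illustrations.

```r
# "Decrease" IPSI: keep the natural treatment value when epsilon_t < delta,
# otherwise set treatment to 0.
d_ipsi_down <- function(a, delta) {
  eps <- runif(length(a))
  ifelse(eps < delta, a, 0)
}

# Check the risk-ratio identity for arbitrary propensity scores g_t(1 | H_t):
# the post-intervention propensity is delta * g_t(1 | H_t), so the ratio is delta.
delta <- 0.6
g1    <- c(0.10, 0.45, 0.80)
(delta * g1) / g1
#> [1] 0.6 0.6 0.6
```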
Conversely, if we were interested in an intervention that increased the likelihood of receiving treatment, define
\[ \dd_t(a_t, h_t, \epsilon_t) = \begin{cases} a_t &\text{ if } \epsilon_t < \delta \\ 1 &\text{ otherwise.} \end{cases} \]
Now \(\g_t^\dd(a_t \mid H_t) = a_t (1 - \delta \g_t(0\mid H_t)) + (1 - a_t) \delta \g_t(0 \mid H_t)\), which implies a risk ratio \(\g_t^\dd(0\mid H_t)/\g_t(0\mid H_t) = \delta\).
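And the corresponding sketch for the intervention that increases the likelihood of treatment:

```r
# "Increase" IPSI: keep the natural treatment value when epsilon_t < delta,
# otherwise set treatment to 1. The implied risk ratio on the scale of not
# receiving treatment, g_t^d(0 | H_t) / g_t(0 | H_t), is delta.
d_ipsi_up <- function(a, delta) {
  eps <- runif(length(a))
  ifelse(eps < delta, a, 1)
}
```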
Interventions where the shift is defined on the odds ratio scale have also been proposed, but the effects of odds-ratio shifts should not be estimated with lmtp; we will discuss this more later.
Identification of the causal parameter
Recall that the fundamental problem of causal inference is that we can’t observe the alternative worlds which we use to define causal effects. If we can’t observe counterfactual variables, then how can we learn a causal effect? Under certain assumptions, we can identify a causal parameter from the observed data. These assumptions are called identification assumptions.
Positivity. If \((a_t, h_t) \in \text{supp}\{A_t, H_t\}\), then \((\dd(a_t, h_t), h_t) \in \text{supp}\{A_t, H_t\}\) for \(t \in \{1, ..., \tau\}\).
If there is a unit with observed treatment value \(a_t\) and covariates \(h_t\), there must also be a unit with treatment value \(\dd(a_t, h_t)\) and covariates \(h_t\).
No unmeasured confounders. All the common causes of \(A_t\) and \((L_s, A_s, Y)\) are measured and contained in \(H_t\) for all \(s \in \{t+1, ..., \tau\}\).
For all times \(t\), the history \(H_t\) contains sufficient variables to adjust for confounding of \(A_t\) and any subsequent variables, including future treatment.
Assuming the above, \(\theta\) is identified from the observed data by the following sequential regression: set \(\m_{\tau+1} = Y\), and for \(t = \tau, \ldots, 1\) recursively define \[ \m_t(a_t, h_t) = \E\left[\m_{t+1}(A_{t+1}^\dd, H_{t+1}) \mid A_t = a_t, H_t = h_t\right]\text{.} \] Then \(\theta = \E[\m_1(A_1^\dd, H_1)]\).
As an example, consider a study with \(\tau = 2\) time points and observed data \((L_1, A_1, L_2, A_2, Y)\), so that \(H_1 = L_1\) and \(H_2 = (L_1, A_1, L_2)\).
We can compute the identification formula in the following steps:
1. Set \(\m_3 = Y\).
2. Regress \(\m_3\) on \((A_2, H_2)\). This gives a predictive function; call it \(\m_2(A_2, H_2)\).
3. Use this predictive function to compute what would have occurred had the intervention been implemented at time \(t=2\), i.e., compute \(\m_2(A_2^\dd, H_2)\).
4. Regress \(\m_2(A_2^\dd, H_2)\) on \((A_1, H_1)\). This gives a predictive function; call it \(\m_1(A_1, H_1)\).
5. Use this predictive function to compute what would have occurred had the intervention been implemented at time \(t=1\), i.e., compute \(\m_1(A_1^\dd, H_1)\).
6. Compute the mean of \(\m_1(A_1^\dd, H_1)\). This mean equals \(\theta\).
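To make these steps concrete, here is a minimal sketch in R that simulates data with \(\tau = 2\), applies an additive shift \(\dd(a_t, h_t) = a_t + 1\) with no upper bound (purely for illustration), and computes the plug-in sequential-regression estimate of \(\theta\) using simple linear models. The data-generating process and variable names are invented for this example; in practice one would use the estimators implemented in the lmtp package rather than hand-coded parametric regressions.

```r
set.seed(29)

# --- Simulate longitudinal data with tau = 2 time points ------------------
n  <- 5000
L1 <- rnorm(n)                              # time 1 covariate
A1 <- rnorm(n, mean = L1)                   # time 1 continuous exposure
L2 <- rnorm(n, mean = 0.5 * L1 + 0.5 * A1)  # time 2 covariate
A2 <- rnorm(n, mean = L2)                   # time 2 continuous exposure
Y  <- rnorm(n, mean = A1 + A2 + L1 + L2)    # outcome

dat   <- data.frame(L1, A1, L2, A2, Y)
delta <- 1                                  # additive shift d(a_t) = a_t + 1

# Step 1: set m3 = Y
dat$m3 <- dat$Y

# Step 2: regress m3 on (A2, H2), where H2 = (L1, A1, L2)
fit2 <- lm(m3 ~ A2 + L1 + A1 + L2, data = dat)

# Step 3: predict under the intervention, i.e., evaluate m2 at A2 + delta
dat$m2 <- predict(fit2, newdata = transform(dat, A2 = A2 + delta))

# Step 4: regress m2 on (A1, H1), where H1 = L1
fit1 <- lm(m2 ~ A1 + L1, data = dat)

# Step 5: predict under the intervention, i.e., evaluate m1 at A1 + delta
dat$m1 <- predict(fit1, newdata = transform(dat, A1 = A1 + delta))

# Step 6: theta is the mean of m1 under the intervention
mean(dat$m1)
```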