Static Interventions
\[ \renewcommand{\P}{\mathsf{P}} \newcommand{\m}{\mathsf{m}} \newcommand{\p}{\mathsf{p}} \newcommand{\q}{\mathsf{q}} \newcommand{\bb}{\mathsf{b}} \newcommand{\g}{\mathsf{g}} \newcommand{\rr}{\mathsf{r}} \newcommand{\IF}{\mathbb{IF}} \newcommand{\dd}{\mathsf{d}} \newcommand{\Pn}{$\mathsf{P}_n$} \newcommand{\E}{\mathsf{E}} \]
Synthetic COVID RCT Data
For our first example, we’ll use a synthetic dataset of \(n = 1000\) patients from a hypothetical clinical trial for a treatment to decrease intubation and death among patients hospitalized with COVID-19.
The data is based on a database of over 1,500 patients hospitalized at Weill Cornell Medicine New York Presbyterian Hospital prior to 15 May 2020 with COVID-19 confirmed through PCR.
To replicate a two-arm randomized clinical trial (RCT), we’ve simulated a hypothetical treatment variable (
A
) that is randomly assigned for each observation with probability 0.5.The outcome of interest (
event
) is intubation or death by 15-days post-hospitalization.Treatment is associated with increased survival.
Baseline covariates include: age, sex, BMI, smoking status, whether the patient required supplemental oxygen within 3 hours of presenting to the emergency department, number of comorbidities, number of relevant symptoms, presence of bilateral infiltrates on chest X-ray, dyspnea, and hypertension.
Characteristic | 0, N = 4961 | 1, N = 5041 |
---|---|---|
event | 149 (30%) | 140 (28%) |
age | 66 (52, 77) | 63 (51, 78) |
sex | ||
Female | 210 (42%) | 216 (43%) |
Male | 286 (58%) | 288 (57%) |
bmi | 27 (23, 31) | 27 (23, 31) |
smoke | ||
Active Smoker | 31 (6.3%) | 21 (4.2%) |
Former Smoker | 122 (25%) | 119 (24%) |
No | 343 (69%) | 364 (72%) |
o2 | 222 (45%) | 223 (44%) |
num_comorbid | ||
0 | 115 (23%) | 121 (24%) |
1 | 118 (24%) | 113 (22%) |
2 | 93 (19%) | 97 (19%) |
3 | 86 (17%) | 78 (15%) |
4 | 40 (8.1%) | 55 (11%) |
5 | 20 (4.0%) | 24 (4.8%) |
6 | 14 (2.8%) | 12 (2.4%) |
7 | 8 (1.6%) | 4 (0.8%) |
8 | 2 (0.4%) | 0 (0%) |
num_symptoms | 3 (2, 4) | 4 (2, 5) |
bilat | 335 (68%) | 332 (66%) |
dyspnea | 296 (60%) | 303 (60%) |
hyper | 296 (60%) | 296 (59%) |
1 n (%); Median (IQR) |
Population mean outcomes
Our goal is to estimate the average treatment effect (ATE) of the hypothetical treatment on intubation or death:
\[ \begin{align} \theta &= \E(Y^{A=1} - Y^{A=0}) \\ &= \textcolor{blue}{\E(Y^{A=1})} - \textcolor{red}{\E(Y^{A=0})} \end{align} \]
Notice that the ATE is composed of two parameters:
the proportion of patients who receive intubation or die in a hypothetical world where all patients receive treatment (\(\textcolor{blue}{\E(Y^{A=1})}\)), and
the proportion of patients who receive intubation or die in a hypothetical world where no patients receive treatment (\(\textcolor{red}{\E(Y^{A=0})}\)).
We refer to these parameters as population mean outcomes. Let’s rewrite the target parameter as a function of interventions using the framework we previously discussed. Let \(\dd_1(a, h) = 1\) and \(\dd_0(a, h) = 0\). Then,
\[ \theta = \textcolor{blue}{\E(Y^{\dd_1})} - \textcolor{red}{\E(Y^{\dd_0})}. \]
Remember, we refer to these interventions as static because they return the same value regardless of their input.
Writing shift functions
So, how do we translate the functions \(\dd_1\) and \(\dd_0\) into R code to be used with lmtp
?
Policies can be specified using one of two arguments in lmtp
estimators: shift
or shifted
.
If using shift
, we supply a two-argument function of the form
<- function(data, trt) {
d # Insert code here
}
The first argument should correspond to a dataset containing all the variables in \(O\).
The second argument should expect a string corresponding to the variable(s) \(A_t\) in \(O\).
This function should return a size \(n\) vector with the same class as \(A_t\) modified according to \(d_t\).
For \(\dd_1\) this function would be:
If we apply this function to our data we are returned a vector of 1s with the same length as the number of rows in the observed data.
Problem 1
Let’s now estimate \(\E(Y^{\dd_1})\) using lmtp
.
Instead of specifying an intervention using shift
, we could instead supply a modified version of data
to the shifted
argument where the variables \(A_t\) are replaced with \(A^d_t\).
Problem 2
lmtp
objects
A call to an lmtp
estimator returns a list of class lmtp
.
Value | Description |
---|---|
estimator | The estimator used. |
theta | The estimated population LMTP estimand. |
standard_error | The estimated standard error of the LMTP estimand. |
low | Lower bound of the 95% confidence interval of the LMTP estimand. |
high | Upper bound of the 95% confidence interval of the LMTP estimand. |
eif | The estimated, un-centered, influence function of the estimate. |
shift | The shift function specifying the treatment policy of interest. |
outcome_reg | An \(n \times \tau + 1\) matrix of outcome regression predictions. The mean of the first column is used for calculating theta. |
density_ratios | An \(n \times \tau\) matrix of the estimated, non-cumulative, density ratios. |
fits_m | A list the same length as folds , containing the fits at each time-point for each fold for the outcome regression. |
fits_r | A list the same length as folds , containing the fits at each time-point for each fold of density ratio estimation. |
outcome_type | The outcome variable type. |
Let’s inspect some of these values.
Problem 3
Write a function to estimate the population mean outcome if no patients received treatment. Assign the result to an object named dont_treat
.
lmtp
already implements d1
and d0
as static_binary_on()
and static_binary_off()
!
Average treatment effect
With estimates of \(\E(Y^{\dd_1})\) and \(\E(Y^{\dd_0})\) we can now calculate the ATE. To do so, we’ll use the function lmtp_contrast()
.
In addition to the ATE, we can also calculate the causal risk ratio and the causal odds ratio.
Confirming what we already know, we’ve estimated the treatment as being associated with decreased intubation or death.