Characteristic | N = 500¹ |
---|---|
Y | 21 (14, 32) |
X1 | 0.86 (0.41, 1.49) |
X2 | 0.93 (0.66, 1.29) |
X3 | 0.85 (0.44, 1.56) |
X4 | 0.97 (0.62, 1.54) |
X5 | 0.92 (0.49, 1.68) |
X6 | 0.92 (0.53, 1.59) |
X7 | 0.84 (0.41, 1.51) |
Z | 214 (43%) |

¹ Median (IQR); n (%)
Multivariate exposures
\[ \renewcommand{\P}{\mathsf{P}} \newcommand{\m}{\mathsf{m}} \newcommand{\p}{\mathsf{p}} \newcommand{\q}{\mathsf{q}} \newcommand{\bb}{\mathsf{b}} \newcommand{\g}{\mathsf{g}} \newcommand{\rr}{\mathsf{r}} \newcommand{\IF}{\mathbb{IF}} \newcommand{\dd}{\mathsf{d}} \newcommand{\Pn}{$\mathsf{P}_n$} \newcommand{\E}{\mathsf{E}} \]
`lmtp` can estimate the effects of simultaneous interventions on multiple variables. Practically, this is useful for assessing the effects of mixtures on environmental outcomes.
NIEHS Simulation Data
For our example of estimating the effects of simultaneous interventions on multiple variables, we will use simulated data from the 2015 NIEHS Mixtures Workshop. The data has already been loaded into R in the background as `mixtures`. You can view and download the raw data here.
The simulated data has \(n = 500\) observations and is intended to replicate a prospective cohort study.
The data is composed of 7 log-normally distributed, correlated exposure variables (`"X1"`, `"X2"`, `"X3"`, `"X4"`, `"X5"`, `"X6"`, `"X7"`), a single continuous outcome (`"Y"`), and one binary confounder (`"Z"`). There is no missing covariate data, no measurement error, and no censoring.
Only exposure variables `X1`, `X2`, `X4`, `X5`, and `X7` have an effect on the outcome `Y`. However, the direction of the effects varies. `X1`, `X2`, and `X7` are positively associated with the outcome. `X4` and `X5` are negatively associated with the outcome.
Multivariate shift functions
Only two things need to change when using `lmtp` estimators with multivariate treatments:
1. Instead of a vector, you should now pass a list to the `trt` argument.
2. The shift function should return a named list of vectors instead of a single vector.
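Both changes can be sketched on a toy data set. The column names `A1` and `A2`, the shift amounts, and the function name `shift_both` are hypothetical and only illustrate the shape of the inputs; no estimator is called here.

```r
# Toy data with two hypothetical exposures (A1, A2); not the workshop data.
toy <- data.frame(A1 = c(0.5, 1.0, 1.5), A2 = c(2.0, 2.5, 3.0))

# Change 1: exposures measured at the same time point are grouped in one
# character vector, and that vector is wrapped in a list for `trt`.
exposures <- c("A1", "A2")
trt_arg <- list(exposures)

# Change 2: the shift function returns a named list with one vector per
# exposure. `data` is the data set; `trt` holds the exposure column names.
shift_both <- function(data, trt) {
  out <- list(
    data[[trt[1]]] + 0.1,  # hypothetical shift for the first exposure
    data[[trt[2]]] - 0.5   # hypothetical shift for the second exposure
  )
  setNames(out, trt)
}

shifted <- shift_both(toy, exposures)
str(shifted)
```

Naming the returned list by the exposure columns keeps each shifted vector tied to the variable it replaces.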
Let’s use `lmtp` to estimate the effect on the outcome of a modified treatment policy that intervenes on all 7 exposures simultaneously:
\[ \dd(\mathbf{a}, h) = \begin{cases} \dd(a_1, h) = \begin{cases} a_1 - 0.2 &\text{ if } a_1 - 0.2 > 0 \\ a_1 &\text{ otherwise } \end{cases} \\ \dd(a_2, h) = \begin{cases} a_2 - 0.4 &\text{ if } a_2 - 0.4 > 0 \\ a_2 &\text{ otherwise } \end{cases} \\ \dd(a_3, h) = a_3 + 0.4 \\ \dd(a_4, h) = a_4 + 0.1 \\ \dd(a_5, h) = a_5 + 0.5 \\ \dd(a_6, h) = \begin{cases} a_6 - 0.2 &\text{ if } a_6 - 0.2 > 0 \\ a_6 &\text{ otherwise } \end{cases} \\ \dd(a_7, h) = \begin{cases} a_7 - 0.3 &\text{ if } a_7 - 0.3 > 0 \\ a_7 &\text{ otherwise } \end{cases} \end{cases} \]
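In R, this policy might be written as the shift function below. This is a sketch under stated assumptions: the function name `d`, its argument names, and the helper `decrease_if_positive` are illustrative choices, and the two-row data frame `check` is made up solely to verify the behavior.

```r
exposures <- c("X1", "X2", "X3", "X4", "X5", "X6", "X7")

# Subtract `delta` only where the shifted value would remain positive;
# otherwise leave the observed value unchanged.
decrease_if_positive <- function(a, delta) {
  ifelse(a - delta > 0, a - delta, a)
}

# Multivariate shift function: returns a named list, one vector per exposure.
d <- function(data, trt) {
  out <- list(
    decrease_if_positive(data[[trt[1]]], 0.2),
    decrease_if_positive(data[[trt[2]]], 0.4),
    data[[trt[3]]] + 0.4,
    data[[trt[4]]] + 0.1,
    data[[trt[5]]] + 0.5,
    decrease_if_positive(data[[trt[6]]], 0.2),
    decrease_if_positive(data[[trt[7]]], 0.3)
  )
  setNames(out, trt)
}

# Made-up values (not the workshop data): row 1 sits below each decrease
# threshold, row 2 sits above it.
check <- data.frame(
  X1 = c(0.1, 1.0), X2 = c(0.3, 1.0), X3 = c(0.5, 1.0), X4 = c(0.5, 1.0),
  X5 = c(0.5, 1.0), X6 = c(0.1, 1.0), X7 = c(0.2, 1.0)
)
shifted <- d(check, exposures)
str(shifted)
```

A function of this form is what would be supplied as the `shift` argument, with the exposure names passed as `trt = list(exposures)`.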
Problem 1
Using TMLE, estimate the population mean outcome under the simultaneous intervention we just defined. Fit both the treatment mechanism and the outcome regression using this set of learners: `c("SL.mean", "SL.glm", "SL.gam", "SL.rpart", "SL.rpartPrune", "SL.step.interaction")`. Assign the result to `ans`. To save time, don’t use crossfitting; `lmtp` has already been loaded into the R session.
Problem 2
Compared to what was observed under the natural course of exposure, how did intervening upon the set of exposures affect the outcome? Estimate this effect using `lmtp_contrast()`.