Multivariate exposures

Modified: June 14, 2024

NIEHS Simulation Data

For our example of estimating the effects of simultaneous interventions on multiple variables, we will use simulated data from the 2015 NIEHS Mixtures Workshop (Taylor et al. 2016). The data has already been loaded into R in the background as mixtures. You can view and download the raw data here.

  • The simulated data has \(n = 500\) observations and is intended to replicate a prospective cohort study.

  • The data is composed of 7 log-normally distributed and correlated exposure variables ("X1", "X2", "X3", "X4", "X5", "X6", "X7"), a single continuous outcome ("Y"), and one binary confounder ("Z").

  • There is no missing covariate data, no measurement error, and no censoring.

Characteristic    N = 500¹
Y                 21 (14, 32)
X1                0.86 (0.41, 1.49)
X2                0.93 (0.66, 1.29)
X3                0.85 (0.44, 1.56)
X4                0.97 (0.62, 1.54)
X5                0.92 (0.49, 1.68)
X6                0.92 (0.53, 1.59)
X7                0.84 (0.41, 1.51)
Z                 214 (43%)

¹ Median (IQR); n (%)

  • Only exposure variables X1, X2, X4, X5, and X7 have an effect on the outcome Y. However, the direction of the effects varies.

  • X1, X2, and X7 are positively associated with the outcome.

  • X4 and X5 are negatively associated with the outcome.

Multivariate shift functions

Only two things need to change when using lmtp estimators with multivariate treatments:

  1. Instead of a vector, you should now pass a list to the trt argument.

  2. The shift function should return a named list of vectors instead of a single vector.

Let’s use lmtp to estimate the effect on the outcome of a modified treatment policy that intervenes on all 7 exposures simultaneously:

\[ \dd(\mathbf{a}, h) = \begin{cases} \dd(a_1, h) = \begin{cases} a_1 - 0.2 &\text{ if } a_1 - 0.2 > 0 \\ a_1 &\text{ otherwise } \end{cases} \\ \dd(a_2, h) = \begin{cases} a_2 - 0.4 &\text{ if } a_2 - 0.4 > 0 \\ a_2 &\text{ otherwise } \end{cases} \\ \dd(a_3, h) = a_3 + 0.4 \\ \dd(a_4, h) = a_4 + 0.1 \\ \dd(a_5, h) = a_5 + 0.5 \\ \dd(a_6, h) = \begin{cases} a_6 - 0.2 &\text{ if } a_6 - 0.2 > 0 \\ a_6 &\text{ otherwise } \end{cases} \\ \dd(a_7, h) = \begin{cases} a_7 - 0.3 &\text{ if } a_7 - 0.3 > 0 \\ a_7 &\text{ otherwise } \end{cases} \end{cases} \]
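
As a minimal sketch, the policy above could be written as a shift function along the following lines. The names A and d match those used in the solution to Problem 1; the shift_down helper and the toy data frame are hypothetical and exist only for illustration. The sketch assumes that lmtp passes the vector of exposure column names as the second argument of the shift function and expects back a list named by those columns.

```r
# Exposure column names, wrapped in a list: one vector per time point
# (here a single time point with 7 exposures).
A <- list(c("X1", "X2", "X3", "X4", "X5", "X6", "X7"))

# Hypothetical helper: subtract `delta` only when the result stays positive.
shift_down <- function(a, delta) {
  ifelse(a - delta > 0, a - delta, a)
}

# Multivariate shift function. Assumes `trt` is the character vector of
# exposure column names and that a list named by those columns is expected.
d <- function(data, trt) {
  out <- list(
    shift_down(data[[trt[1]]], 0.2),
    shift_down(data[[trt[2]]], 0.4),
    data[[trt[3]]] + 0.4,
    data[[trt[4]]] + 0.1,
    data[[trt[5]]] + 0.5,
    shift_down(data[[trt[6]]], 0.2),
    shift_down(data[[trt[7]]], 0.3)
  )
  setNames(out, trt)
}

# Toy check on made-up values (not the workshop data).
toy <- as.data.frame(setNames(rep(list(c(0.1, 1)), 7), A[[1]]))
shifted <- d(toy, A[[1]])
str(shifted)
```

Keeping the positivity check in one helper makes each line of the list read like the corresponding row of the formula, which makes it easier to spot a mistyped shift amount.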

Problem 1

Using TMLE, estimate the population mean outcome under the simultaneous intervention we just defined. Fit both the treatment mechanism and the outcome regression using this set of learners: c("SL.mean", "SL.glm", "SL.gam", "SL.rpart", "SL.rpartPrune", "SL.step.interaction"). Assign the result to ans. To save time, don’t use crossfitting; lmtp has already been loaded into the R session.

Solution

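set.seed(4363754)

# Candidate learners for the Super Learner ensembles
learners <- c("SL.mean",
              "SL.glm",
              "SL.gam",
              "SL.rpart",
              "SL.rpartPrune",
              "SL.step.interaction")

# A (the list of exposure column names) and d (the multivariate shift
# function for the policy above) are assumed to be defined already.
ans <- lmtp_tmle(data = mixtures,
                 trt = A,
                 outcome = "Y",
                 baseline = "Z",
                 shift = d,
                 mtp = TRUE,
                 outcome_type = "continuous",
                 learners_trt = learners,
                 learners_outcome = learners,
                 folds = 1)  # folds = 1 turns off cross-fitting

print(ans)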

Problem 2

Compared to what was observed under the natural course of exposure, how did intervening upon the set of exposures affect the outcome? Estimate this effect using lmtp_contrast().

Solution

obs_y <- mean(mixtures$Y)
lmtp_contrast(ans, ref = obs_y)
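
In symbols, and assuming lmtp_contrast() is left at its default additive contrast and treats the numeric ref as a known constant, the quantity reported is the difference between the TMLE estimate under the policy and the observed sample mean of the outcome (the notation \(\hat{\theta}\) and \(\bar{Y}\) is introduced here for illustration):

\[ \hat{\theta}(\dd) - \bar{Y}, \qquad \bar{Y} = \frac{1}{n} \sum_{i=1}^{n} Y_i \]

A negative value would suggest that the intervention lowers the mean outcome relative to what was observed, and a positive value that it raises it.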

References

Díaz, Iván, Nicholas Williams, Katherine L Hoffman, and Edward J Schenck. 2023. “Nonparametric Causal Effects Based on Longitudinal Modified Treatment Policies.” Journal of the American Statistical Association 118 (542): 846–57.
Taylor, Kyla W, Bonnie R Joubert, Joe M Braun, Caroline Dilworth, Chris Gennings, Russ Hauser, Jerry J Heindel, Cynthia V Rider, Thomas F Webster, and Danielle J Carlin. 2016. “Statistical Approaches for Assessing Health Effects of Environmental Chemical Mixtures in Epidemiology: Lessons from an Innovative Workshop.” Environmental Health Perspectives 124 (12): A227–29.