Statistical issues in general (Causal effects with TMLE: missing data)
June 3, 2026

The authors evaluated the performance of targeted maximum likelihood estimation (TMLE) for estimating the average treatment effect (ATE) in missing data scenarios under varying levels of positivity violations.

Directed acyclic graphs (DAGs) are widely used in causal inference, and they have been extended to depict missingness mechanisms (m-DAGs). The m-DAGs include missingness indicators, which encode assumptions about the processes leading to missing data. Recoverability refers to the ability to consistently estimate a target parameter, such as the ATE, from the available data even in the presence of missing data.

The ATE is a key target parameter in causal inference, and TMLE is often used to estimate it because of its robustness properties. TMLE is also a plug-in estimator: it directly targets the parameter of interest while still allowing data-adaptive methods, such as machine learning, to be incorporated. Positivity is a crucial assumption in causal inference; it requires that, within each stratum of the confounders, every individual has a nonzero probability of receiving either exposure condition.

There are many ways to handle missing data, each resting on its own assumptions. The non-multiple-imputation (non-MI) strategies the authors considered were: complete-case analysis; extended TMLE (Ext), which discards observations with missing values in any confounder or exposure variable and then incorporates a model for outcome missingness into the targeting step of the TMLE; and extended TMLE with the missing covariate missing indicator approach (Ext MCMI), in which only observations with missing exposure data are excluded and missingness indicators are included for incomplete confounders.
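To make the difference between the three non-MI strategies concrete, here is a minimal sketch of which records each one would retain before any estimation is done. It is only an illustration of the data handling, not the authors' implementation and not a TMLE: the variable names (W1, W2, A, Y), the toy records, the helper names, and the choice to fill a missing confounder with a constant alongside its indicator are all assumptions made for the example. It also assumes that Ext MCMI, being an extended TMLE, still keeps records with a missing outcome.

```python
from typing import Dict, List, Optional

# Hypothetical variable names: W1 and W2 are confounders, A is the exposure,
# Y is the outcome. None marks a missing value.
Row = Dict[str, Optional[float]]

CONFOUNDERS = ["W1", "W2"]
EXPOSURE = "A"
OUTCOME = "Y"


def complete_case(rows: List[Row]) -> List[Row]:
    """Keep only records with no missing value in any analysis variable."""
    cols = CONFOUNDERS + [EXPOSURE, OUTCOME]
    return [dict(r) for r in rows if all(r[c] is not None for c in cols)]


def ext_rows(rows: List[Row]) -> List[Row]:
    """Ext: drop records missing any confounder or the exposure.

    Records with a missing outcome are kept and flagged; a model for outcome
    missingness in the targeting step (not shown here) would use the flag.
    """
    kept = []
    for r in rows:
        if any(r[c] is None for c in CONFOUNDERS + [EXPOSURE]):
            continue
        new = dict(r)
        new["Y_observed"] = 0 if r[OUTCOME] is None else 1
        kept.append(new)
    return kept


def ext_mcmi_rows(rows: List[Row], fill_value: float = 0.0) -> List[Row]:
    """Ext MCMI: drop only records with a missing exposure.

    Each confounder gets a missingness indicator M_<name>. Filling the missing
    value with a constant is an assumption of this sketch.
    """
    kept = []
    for r in rows:
        if r[EXPOSURE] is None:
            continue
        new = dict(r)
        for c in CONFOUNDERS:
            is_missing = r[c] is None
            new["M_" + c] = 1 if is_missing else 0
            if is_missing:
                new[c] = fill_value
        new["Y_observed"] = 0 if r[OUTCOME] is None else 1
        kept.append(new)
    return kept


toy = [
    {"W1": 1.0, "W2": 0.0, "A": 1.0, "Y": 2.5},    # fully observed
    {"W1": None, "W2": 1.0, "A": 0.0, "Y": 1.0},   # missing confounder
    {"W1": 0.0, "W2": 1.0, "A": None, "Y": 3.0},   # missing exposure
    {"W1": 1.0, "W2": 1.0, "A": 1.0, "Y": None},   # missing outcome
]

# Number of records retained by each strategy on the toy data.
print(len(complete_case(toy)), len(ext_rows(toy)), len(ext_mcmi_rows(toy)))
# prints: 1 2 3
```

On these four toy records, complete-case analysis keeps one, Ext keeps two (it retains the record with the missing outcome), and Ext MCMI keeps three (it also retains the record with the missing confounder), which shows how the strategies trade sample size against additional modeling assumptions.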
Another option is the family of fully conditional specification (FCS) methods. These chained-equation approaches, implemented in the mice package in R, iteratively impute missing values by sampling from a series of univariate conditional models. The FCS methods considered were: MI using predictive mean matching (MI PMM), which uses parametric imputation with PMM for the outcome and applies models appropriate to each of the other variables according to its type; MI with interaction terms (MI Int); MI using classification and regression trees (MI CART); and MI using