ltmle
is Targeted Maximum Likelihood Estimation (TMLE) of
treatment/censoring specific mean outcome for point-treatment and
longitudinal data. ltmleMSM
adds Marginal Structural Models. Both
always provide Inverse Probability of Treatment/Censoring Weighted estimate
(IPTW) as well. Maximum likelihood based G-computation estimate (G-comp) can
be obtained instead of TMLE. ltmle
can be used to calculate additive
treatment effect, risk ratio, and odds ratio.
Usage
ltmle(
data,
Anodes,
Cnodes = NULL,
Lnodes = NULL,
Ynodes,
survivalOutcome = NULL,
Qform = NULL,
gform = NULL,
abar,
rule = NULL,
gbounds = c(0.01, 1),
Yrange = NULL,
deterministic.g.function = NULL,
stratify = FALSE,
SL.library = "glm",
SL.cvControl = list(),
estimate.time = TRUE,
gcomp = FALSE,
iptw.only = FALSE,
deterministic.Q.function = NULL,
variance.method = "tmle",
observation.weights = NULL,
id = NULL
)
ltmleMSM(
data,
Anodes,
Cnodes = NULL,
Lnodes = NULL,
Ynodes,
survivalOutcome = NULL,
Qform = NULL,
gform = NULL,
gbounds = c(0.01, 1),
Yrange = NULL,
deterministic.g.function = NULL,
SL.library = "glm",
SL.cvControl = list(),
regimes,
working.msm,
summary.measures,
final.Ynodes = NULL,
stratify = FALSE,
msm.weights = "empirical",
estimate.time = TRUE,
gcomp = FALSE,
iptw.only = FALSE,
deterministic.Q.function = NULL,
variance.method = "tmle",
observation.weights = NULL,
id = NULL
)
Arguments
- data
data frame following the time-ordering of the nodes. See 'Details'.
- Anodes
column names or indicies in
data
of treatment nodes- Cnodes
column names or indicies in
data
of censoring nodes- Lnodes
column names or indicies in
data
of time-dependent covariate nodes- Ynodes
column names or indicies in
data
of outcome nodes- survivalOutcome
If
TRUE
, then Y nodes are indicators of an event, and if Y at some time point is 1, then all following should be 1. Required to beTRUE
orFALSE
if outcomes are binary and there are multiple Ynodes.- Qform
character vector of regression formulas for \(Q\). See 'Details'.
- gform
character vector of regression formulas for \(g\) or a matrix/array of prob(A=1). See 'Details'.
- abar
binary vector (numAnodes x 1) or matrix (n x numAnodes) of counterfactual treatment or a list of length 2. See 'Details'.
- rule
a function to be applied to each row (a named vector) of
data
that returns a numeric vector of length numAnodes. See 'Details'.- gbounds
lower and upper bounds on estimated cumulative probabilities for g-factors. Vector of length 2, order unimportant.
- Yrange
NULL or a numerical vector where the min and max of
Yrange
specify the range of all Y nodes. See 'Details'.- deterministic.g.function
optional information on A and C nodes that are given deterministically. See 'Details'. Default
NULL
indicates no deterministic links.- stratify
if
TRUE
stratify on followingabar
when estimating Q and g. IfFALSE
, pool overabar
.- SL.library
optional character vector of libraries to pass to
SuperLearner
.NULL
indicates glm should be called instead ofSuperLearner
. 'default
' indicates a standard set of libraries. May be separately specified for \(Q\) and \(g\). See 'Details'.- SL.cvControl
optional list to be passed as
cvControl
toSuperLearner
- estimate.time
if
TRUE
, run an initial estimate using only 50 observations and use this to print a very rough estimate of the total time to completion. No action if there are fewer than 50 observations.- gcomp
if
TRUE
, run the maximum likelihood based G-computation estimate instead of TMLE- iptw.only
by default (
iptw.only = FALSE
), both TMLE and IPTW are run inltmle
andltmleMSM
. Ifiptw.only = TRUE
, only IPTW is run, which is faster.- deterministic.Q.function
optional information on Q given deterministically. See 'Details'. Default
NULL
indicates no deterministic links.- variance.method
Method for estimating variance of TMLE. One of "ic", "tmle", "iptw". If "tmle", compute both the robust variance estimate using TMLE and the influence curve based variance estimate (use the larger of the two). If "iptw", compute both the robust variance estimate using IPTW and the influence curve based variance estimate (use the larger of the two). If "ic", only compute the influence curve based variance estimate. "ic" is fastest, but may be substantially anti-conservative if there are positivity violations or rare outcomes. "tmle" is slowest but most robust if there are positivity violations or rare outcomes. "iptw" is a compromise between speed and robustness. variance.method="tmle" or "iptw" are not yet available with non-binary outcomes, gcomp=TRUE, stratify=TRUE, or deterministic.Q.function.
- observation.weights
observation (sampling) weights. Vector of length n. If
NULL
, assumed to be all 1.- id
Household or subject identifiers. Vector of length n or
NULL
. Integer, factor, or character recommended, but any type that can be coerced to factor will work.NULL
means all distinct ids.- regimes
binary array: n x numAnodes x numRegimes of counterfactual treatment or a list of 'rule' functions
- working.msm
character formula for the working marginal structural model
- summary.measures
array: num.regimes x num.summary.measures x num.final.Ynodes - measures summarizing the regimes that will be used on the right hand side of
working.msm
(baseline covariates may also be used in the right hand side ofworking.msm
and do not need to be included insummary.measures
)- final.Ynodes
vector subset of Ynodes - used in MSM to pool over a set of outcome nodes
- msm.weights
projection weights for the working MSM. If "empirical", weight by empirical proportions of rows matching each regime for each final.Ynode, with duplicate regimes given zero weight. If
NULL
, no weights. Or an array of user-supplied weights with dimensions c(n, num.regimes, num.final.Ynodes) or c(num.regimes, num.final.Ynodes).
Value
ltmle
returns an object of class "ltmle
" (unless
abar
or rule
is a list, in which case it returns an object of
class ltmleSummaryMeasures
, which has the same components as
ltmleMSM
.) The function summary
(i.e.
summary.ltmle
) can be used to obtain or print a summary of the
results. An object of class "ltmle
" is a list containing the
following components:
- estimates
a named vector of length 4 with elements, each an estimate of \(E[Y_{bar{a}}]\):
tmle
- Targeted Maximum Likelihood Estimate [NULL ifgcomp
isTRUE
]iptw
- Inverse Probability of Treatment/Censoring Weighted estimategcomp
- maximum likelihood based G-computation estimate [NULL ifgcomp
isFALSE
]
- IC
a list with the following components of Influence Curve values
tmle
- vector of influence curve values for Targeted Maximum Likelihood Estimate [NULL ifgcomp
isTRUE
]iptw
- vector of influence curve values for Inverse Probability of Treatment/Censoring Weighted estimategcomp
- vector of influence curve values for Targeted Maximum Likelihood Estimate without updating [NULL ifgcomp
isFALSE
]
- cum.g
cumulative g, after bounding: for ltmle, n x numACnodes, for ltmleMSM, n x numACnodes x num.regimes
- cum.g.unbounded
cumulative g, before bounding: for ltmle, n x numACnodes, for ltmleMSM, n x numACnodes x num.regimes
- cum.g.used
binary - TRUE if an entry of cum.g was used in the updating step (note: even if cum.g.used is FALSE, a small value of cum.g.unbounded may still indicate a positivity problem): for ltmle, n x numACnodes, for ltmleMSM, n x numACnodes x num.regimes
- call
the matched call
- gcomp
the
gcomp
input- formulas
a
list
with elementsQform
andgform
- fit
a list with the following components
g
- list of length numACnodes -glm
orSuperLearner
(see Details) return objects from fitting g regressionsQ
- list of length numLYnodes -glm
orSuperLearner
(see Details) return objects from fitting Q regressionsQstar
- list of length numLYnodes -glm
(or numerical optimization ifglm
fails to solve the score equation) return objects from updating the Q fit
ltmleMSM
returns an object of class "ltmleMSM
" The function
summary
(i.e. summary.ltmleMSM
) can be used to
obtain or print a summary of the results. An object of class
"ltmleMSM
" is a list containing the following components:
- beta
parameter estimates for working.msm using TMLE (GCOMP if
gcomp
input isTRUE
)- beta.iptw
parameter estimates for working.msm using IPTW
- IC
matrix, n x numBetas - influence curve values for TMLE (without updating if
gcomp
input isTRUE
)- IC.iptw
matrix, n x numBetas - influence curve values for IPTW
- msm
object of class glm - the result of fitting the working.msm
- cum.g
array, n x numACnodes x numRegimes - cumulative g, after bounding
- cum.g.unbounded
array, n x numACnodes x numRegimes - cumulative g, before bounding
- call
the matched call
- gcomp
the
gcomp
input- formulas
a
list
with elementsQform
andgform
- fit
a list with the following components
g
- list of length numRegimes of list of length numACnodes -glm
orSuperLearner
(see Details) return objects from fitting g regressionsQ
- list of length numLYnodes -glm
orSuperLearner
(see Details) return objects from fitting Q regressionsQstar
- list of length numLYnodes -glm
(or numerical optimization ifglm
fails to solve the score equation) return objects from updating the Q fit
Details
The estimates returned by ltmle
are of a treatment specific mean,
\(E[Y_{\bar{a}}]\), the mean of the final treatment node, where all
treatment nodes, \(A\), are set to \(\bar{a}\) (abar
) and all
censoring nodes \(C\) are set to 1 (uncensored). The estimates returned by
ltmleMSM
are similar but are the parameters in a working marginal
structural model.
data
should be a data frame where the order of the columns
corresponds to the time-ordering of the model.
in censoring columns (Cnodes): factor with two levels: "censored" and "uncensored". The helper function
BinaryToCensoring
can be used to create these factors.in treatment columns (Anodes): 1 = treated, 0 = untreated (must be binary)
in event columns (Ynodes): If
survivalOutcome
isTRUE
, then Y nodes are treated as indicators of a one-time event. See details forsurvivalOutocme
. IfsurvivalOutcome
isFALSE
, Y nodes are treated as binary if all values are 0 or 1, and are treated as continuous otherwise. If Y nodes are continuous, they may be automatically scaled. See details forYrange
.time-dependent covariate columns (Lnodes): can be any numeric data
Data in
Cnodes
,Anodes
,Lnodes
andYnodes
are not used after (to the right of) censoring (or an event whensurvivalOutcome==TRUE
) and may be coded asNA
or any other value.Columns in
data
that are before (to the left of) the first ofCnodes
orAnodes
are treated as baseline variables, even if they are specified asLnodes
.After the first of
Cnodes
,Anodes
,Ynodes
, orLnodes
, every column must be in one ofCnodes
,Anodes
,Ynodes
, orLnodes
.
If survivalOutcome
is TRUE
, all Y values are indicators of an
event (e.g. death) at or before the current time, where 1 = event and 0 = no
event. The events in Ynodes must be of the form where once Y jumps to 1, Y
remains 1 at subsequent nodes.
For continuous outcomes, (survivalOutcome==FALSE
and some Y nodes are
not 0 or 1,) Y values are truncated at the minimum and maximum of
Yrange
if specified, and then transformed and scaled to be in [0,1].
That is, transformed to (Y-min(Yrange))/(max(Yrange)-min(Yrange))
. If
Yrange
is NULL
, it is set to the range of all Y nodes. In that
case, Y nodes are only scaled if any values fall outside of [0,1]. For
intervention specific means (ltmle
), parameter estimates are
transformed back based Yrange
.
Qform
should be NULL
, in which case all parent nodes of each L
and Y node will be used as regressors, or a named character vector that can
be coerced to class "formula
". The length of Qform
must
be equal to length(Lnodes) + length(Ynodes)
** and the names and order
of the formulas must be the same as the names and order of the L and Y nodes
in data
. The left hand side of each formula should be
"Q.kplus1
". If SL.library
is NULL
, glm
will be
called using the elements of Qform
. If SL.library
is
specified, SuperLearner
will be
called after a design matrix is created using Qform.
** If there is a "block" of L and Y nodes not separated by A or C nodes, only one regression is required at the first L/Y node in a block. You can pass regression formulas for the other L/Y nodes, but they will be ignored (with a message). See example 5.
gform
should be NULL
, in which case all parent nodes of each L
and Y node will be used as regressors, or a character vector that can be
coerced to class "formula
", or a matrix/array of Prob(A=1). If
gform
is a character vector, the length of gform
must be equal
to length(Anodes) + length(Cnodes)
and the order of the formulas must
be the same as the order the A and C nodes appear in data
. The left
hand side of each formula should be the name of the Anode or Cnode. If
SL.library
is NULL
, glm
will be called using the
elements of gform
. If SL.library
is specified,
SuperLearner
will be called after a
design matrix is created using gform
.
In ltmle
, gform
can also be a n x numACnodes matrix where
entry (i, j) is the probability that the ith observation of the jth A/C node
is 1 (if an Anode) or uncensored (if a Cnode), conditional on following abar
up to that node. In ltmleMSM
, gform
can similarly be a n x
numACnodes x numRegimes array, where entry (i, j, k) is the probability that
the ith observation of the jth A/C node is 1 (if an Anode) or uncensored (if
a Cnode), conditional on following regime k up to that node. If gform
is a matrix/array, deterministic.g.function
will not be used and
should be NULL
.
abar
specifies the counterfactual values of the Anodes, using the
order they appear in data
and should have the same length (if abar is
a vector) or number of columns (if abar is a matrix) as Anodes
.
rule
can be used to specify a dynamic treatment rule. rule
is
a function applied to each row of data
which returns a numeric
vector of the same length as Anodes
.
abar
and rule
cannot both be specified. If one of them if a
list of length 2, additive treatment effect, risk ratio, and odds ratio can
be computed using summary.ltmleEffectMeasures
.
regimes
can be a binary array: n x numAnodes x numRegimes of
counterfactual treatment or a list of 'rule' functions as described above
for the rule
argument for the ltmle
function
deterministic.g.function
can be a function used to specify model
knowledge about value of Anodes and/or Cnodes that are set
deterministically. For example, it may be the case that once a patient
starts treatment, they always stay on treatment. For details on the form of
the function and examples, see
deterministic.g.function_template
deterministic.Q.function
can be a function used to specify model
knowledge about the final event state. For example, it may be the case that
a patient can complete the study at some intermediate time point, in which
case the probability of death is 0 (assuming they have not died already).
For details on the form of the function and examples, see
deterministic.Q.function_template
SL.library
may be a character vector of libraries (or 'glm
' or
'default
'), in which case these libraries are used to estimate both
\(Q\) and \(g\) OR a list with two components, Q
and g
,
where each is a character vector of libraries (or 'glm
' or
'default
'). 'glm
' indicates glm should be called
instead of SuperLearner
If
SL.library
is the string 'default
', SL.library
is set
to list("SL.glm", "SL.stepAIC", "SL.bayesglm", c("SL.glm",
"screen.corP"), c("SL.step", "screen.corP"), c("SL.step.forward",
"screen.corP"), c("SL.stepAIC", "screen.corP"), c("SL.step.interaction",
"screen.corP"), c("SL.bayesglm", "screen.corP")
. Note that the default set
of libraries consists of main terms models. It may be advisable to include
squared terms, interaction terms, etc in gform
and Qform
or
include libraries that consider non-linear terms.
If attr(SL.library, "return.fit") == TRUE
, then fit$g
and
fit$Q
will return full SuperLearner
or glm
objects.
If not, only a summary matrix will be returned to save memory.
The print method for ltmle
objects only prints the tmle estimates.
Functions
ltmleMSM()
: Longitudinal Targeted Maximum Likelihood Estimation for a Marginal Structural Model
Author
Joshua Schwab jschwab77@berkeley.edu, Samuel Lendle, Maya Petersen, and Mark van der Laan
Examples
# See \url{http://joshuaschwab.github.io/ltmle/} for more examples.
rexpit <- function(x) rbinom(n=length(x), size=1, prob=plogis(x))
# Single time point Example
n <- 1000
W <- rnorm(n)
A <- rexpit(-1 + 2 * W)
Y <- rexpit(W + A)
data <- data.frame(W, A, Y)
result1 <- ltmle(data, Anodes="A", Ynodes="Y", abar=1)
#> Qform not specified, using defaults:
#> formula for Y:
#> Q.kplus1 ~ W + A
#>
#> gform not specified, using defaults:
#> formula for A:
#> A ~ W
#>
#> Estimate of time to completion: < 1 minute
summary(result1)
#> Estimator: tmle
#> Call:
#> ltmle(data = data, Anodes = "A", Ynodes = "Y", abar = 1)
#>
#> Parameter Estimate: 0.70148
#> Estimated Std Err: 0.050825
#> p-value: <2e-16
#> 95% Conf Interval: (0.60186, 0.8011)
#>
summary(result1, estimator="iptw")
#> Estimator: iptw
#> Call:
#> ltmle(data = data, Anodes = "A", Ynodes = "Y", abar = 1)
#>
#> Parameter Estimate: 0.67206
#> Estimated Std Err: 0.081852
#> p-value: 2.1996e-16
#> 95% Conf Interval: (0.51163, 0.83248)
#>
# MSM Example
# Given data over 3 time points where A switches to 1 once and then stays 1. We want to know
# how death varies as a function of gender, time and an indicator of whether a patient's
# intended regime was to switch before time.
# Note that working.msm includes time and switch.time, which are columns of
# summary.measures; working.msm also includes male, which is ok because it is a baseline
# covariate (it comes before any A/C/L/Y nodes).
data(sampleDataForLtmleMSM)
Anodes <- grep("^A", names(sampleDataForLtmleMSM$data))
Lnodes <- c("CD4_1", "CD4_2")
Ynodes <- grep("^Y", names(sampleDataForLtmleMSM$data))
msm.weights <- matrix(1:12, nrow=4, ncol=3) #just an example (can also use a 200x3x4 array),
#or NULL (for no weights), or "empirical" (the default)
result2 <- ltmleMSM(sampleDataForLtmleMSM$data, Anodes=Anodes, Lnodes=Lnodes, Ynodes=Ynodes,
survivalOutcome=TRUE,
regimes=sampleDataForLtmleMSM$regimes,
summary.measures=sampleDataForLtmleMSM$summary.measures, final.Ynodes=Ynodes,
working.msm="Y ~ male + time + I(pmax(time - switch.time, 0))",
msm.weights=msm.weights, estimate.time=FALSE)
#> Qform not specified, using defaults:
#> formula for Y1:
#> Q.kplus1 ~ male + age + CD4_0 + A0
#> formula for Y2:
#> Q.kplus1 ~ male + age + CD4_0 + A0 + CD4_1 + A1
#> formula for Y3:
#> Q.kplus1 ~ male + age + CD4_0 + A0 + CD4_1 + A1 + CD4_2 + A2
#>
#> gform not specified, using defaults:
#> formula for A0:
#> A0 ~ male + age + CD4_0
#> formula for A1:
#> A1 ~ male + age + CD4_0 + A0 + CD4_1
#> formula for A2:
#> A2 ~ male + age + CD4_0 + A0 + CD4_1 + A1 + CD4_2
#>
print(summary(result2))
#> Estimator: tmle
#> Estimate Std. Error CI 2.5% CI 97.5% p-value
#> (Intercept) -3.4059 0.6545 -4.6886 -2.123 1.95e-07
#> male -0.3802 0.5305 -1.4198 0.660 0.47360
#> time 0.7241 0.2602 0.2141 1.234 0.00539
#> I(pmax(time - switch.time, 0)) -0.4717 0.2019 -0.8675 -0.076 0.01948
#>
#> (Intercept) ***
#> male
#> time **
#> I(pmax(time - switch.time, 0)) *
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1