Fit Proportional Hazards Regression Model
Description
Fits a Cox proportional hazards regression model.
Time dependent variables, time dependent strata, multiple events per subject,
and other extensions are incorporated using the counting process formulation
of Andersen and Gill.
Usage
coxph(formula, data=, weights, subset,
na.action, init, control,
ties=c("efron","breslow","exact"),
singular.ok=TRUE, robust,
model=FALSE, x=FALSE, y=TRUE, tt, method=ties,
id, cluster, istate, statedata, nocenter=c(1, 0, 1), ...)
Arguments
formula 
a formula object, with the response on the left of a ~ operator, and
the terms on the right. The response must be a survival object as
returned by the Surv function. For a multistate model the
formula may be a list of formulas.

data 
a data.frame in which to interpret the variables named in
the formula , or in the subset and the weights
argument.

weights 
vector of case weights, see the note below.
For a thorough discussion of these see the
book by Therneau and Grambsch.

subset 
expression indicating which subset of the rows of data should be used in
the fit. All observations are included by default.

na.action 
a missingdata filter function. This is applied to the model.frame
after any
subset argument has been used. Default is options()\$na.action .

init 
vector of initial values of the iteration. Default initial
value is zero for all variables.

control 
Object of class coxph.control specifying iteration limit
and other control options. Default is coxph.control(...) .

ties 
a character string specifying the method for tie handling. If there
are no tied death times all the methods are equivalent.
The Efron approximation is used as the default, it is more
accurate when dealing with tied death times, and is as efficient
computationally. (But see below for multistate models.)
The “exact partial likelihood” is
equivalent to a conditional logistic model, and is appropriate when
the times are a small set of discrete values.

singular.ok 
logical value indicating how to handle collinearity in the model matrix.
If TRUE , the program will automatically skip over columns of the X
matrix that are linear combinations of earlier columns. In this case the
coefficients for such columns will be NA, and the variance matrix will
contain zeros. For ancillary calculations, such as the linear
predictor,
the missing coefficients are treated as zeros.

robust 
should a robust variance be computed.
The default is TRUE if: there is a cluster argument, there
are case weights that are not 0 or 1, or there are id values
with more than one event.

id 
optional variable name that identifies subjects. Only
necessary when a subject can have multiple rows in the data, and
there is more than one event type. This variable will normally be
found in data .

cluster 
optional variable which clusters the observations, for
the purposes of a robust variance. If present, it implies
robust .
This variable will normally be found in data .

istate 
optional variable giving the current state at the start
each interval. This variable will normally be found in data .

statedata 
optional data set used to describe multistate models.

model 
logical value: if TRUE , the model frame is returned in component
model .

x 
logical value: if TRUE , the x matrix is returned in
component x .

y 
logical value: if TRUE , the response vector is returned in
component y .

tt 
optional list of timetransform functions.

method 
alternate name for the ties argument.

nocenter 
columns of the X matrix whose values lie strictly
within this set are not recentered. Remember that a factor
variable becomes a set of 0/1 columns.

... 
Other arguments will be passed to coxph.control

Details
The proportional hazards model is usually expressed in terms of a
single survival time value for each person, with possible censoring.
Andersen and Gill reformulated the same problem as a counting process;
as time marches onward we observe the events for a subject, rather
like watching a Geiger counter.
The data for a subject is presented as multiple rows or "observations",
each
of which applies to an interval of observation (start, stop].
The routine internally scales and centers data to avoid overflow in
the argument to the exponential function. These actions do not change
the result, but lead to more numerical stability.
Any column of the X matrix whose values lie within nocenter
list
are not recentered. The practical consequence of the default is to not
recenter dummy variables corresponding to factors.
However, arguments to offset are not scaled since there are situations
where a large offset value is a purposefully used.
In general, however, users should not avoid very large numeric values
for an offset due to possible loss of precision in the estimates.
Value
an object of class coxph
representing the fit.
See coxph.object
and coxphms.object
for details.
Side Effects
Depending on the call, the predict
, residuals
,
and survfit
routines may
need to reconstruct the x matrix created by coxph
.
It is possible for this to fail, as in the example below in
which the predict function is unable to find tform
.
tfun < function(tform) coxph(tform, data=lung)
fit < tfun(Surv(time, status) ~ age)
predict(fit)
In such a case add the model=TRUE
option to the
coxph
call to obviate the
need for reconstruction, at the expense of a larger fit
object.
Case weights
Case weights are treated as replication weights, i.e., a case weight of
2 is equivalent to having 2 copies of that subject's observation.
When computers were much smaller grouping like subjects together was a
common trick to used to conserve memory. Setting all weights to 2 for
instance will give the same coefficient estimate but halve the variance.
When the Efron approximation for ties (default) is employed replication
of the data will not give exactly the same coefficients as the weights option,
and in this case the weighted fit is arguably the correct one.
When the model includes a cluster
term or the robust=TRUE
option the computed variance treats any weights as sampling weights;
setting all weights to 2 will in this case give the same variance as weights of 1.
Special terms
There are three special terms that may be used in the model equation.
A strata
term identifies a stratified Cox model; separate baseline
hazard functions are fit for each strata.
The cluster
term is used to compute a robust variance for the model.
The term + cluster(id)
where each value of id
is unique is
equivalent to
specifying the robust=TRUE
argument.
If the id
variable is not
unique, it is assumed that it identifies clusters of correlated
observations.
The robust estimate arises from many different arguments and thus has
had many labels. It is variously known as the
Huber sandwich estimator, White's estimate (linear models/econometrics),
the HorvitzThompson estimate (survey sampling), the working
independence variance (generalized estimating equations), the
infinitesimal jackknife, and the Wei, Lin, Weissfeld (WLW) estimate.
A timetransform term allows variables to vary dynamically in time. In
this case the tt
argument will be a function or a list of
functions (if there are more than one tt() term in the model) giving the
appropriate transform. See the examples below.
One user mistake that has recently arisen is to slavishly follow the
advice of some coding guides and prepend survival::
onto
everthing, including the special terms, e.g.,
survival::coxph(survival:Surv(time, status) ~ age +
survival::cluster(inst), data=lung)
First, this is unnecessary: arguments within the coxph
call will
be evaluated within the survival namespace, so another package's Surv or
cluster function would not be noticed.
(Full qualification of the coxph call itself may be protective, however.)
Second, and more importantly, the call just above will not give the
correct answer. The specials are recognized by their name, and
survival::cluster
is not the same as cluster
; the above model would
treat inst
as an ordinary variable.
A similar issue arises from using stats::offset
as a term, in
either survival or glm models.
Convergence
In certain data cases the actual MLE estimate of a
coefficient is infinity, e.g., a dichotomous variable where one of the
groups has no events. When this happens the associated coefficient
grows at a steady pace and a race condition will exist in the fitting
routine: either the log likelihood converges, the information matrix
becomes effectively singular, an argument to exp becomes too large for
the computer hardware, or the maximum number of interactions is
exceeded.
(Most often number 1 is the first to occur.)
The routine attempts to detect when this has happened,
not always successfully.
The primary consequence for the user is that the Wald statistic =
coefficient/se(coefficient) is not valid in this case and should be
ignored; the likelihood ratio and score tests remain valid however.
Ties
There are three possible choices for handling tied event times.
The Breslow approximation is the easiest to program and hence became the
first option coded for almost all computer routines. It then ended up
as the default option when other options were added in order to "maintain
backwards compatability". The Efron option is more accurate if there are
a large number of ties, and it is the default option here.
In practice the number of ties is usually small, in which case all the
methods are statistically indistinguishable.
Using the "exact partial likelihood" approach the Cox partial likelihood
is equivalent to that for matched logistic regression. (The
clogit
function uses the coxph
code to do the fit.)
It is technically appropriate when the time scale is discrete and has
only a few unique values, and some packages refer to this as the
"discrete" option. There is also an "exact marginal likelihood" due to
Prentice which is not implemented here.
The calculation of the exact partial likelihood is numerically intense.
Say for instance 180 subjects are at risk on day 7 of which 15 had an event;
then the code needs to compute sums over all 180choose15 > 10^43 different
possible subsets of size 15. There is an efficient recursive algorithm
for this task, but even with this the computation can be insufferably
long. With (start, stop) data it is much worse since the recursion
needs to start anew for each unique start time.
Multi state models are a more difficult case.
First of all, a proper extension of the Efron argument is much more
difficult to do, and this author is not yet fully convinced that the
resulting algorithm is defensible.
Secondly, the current code for Efron case does not consistently
compute that extended logic (and extension would require major changes
in the code). Due to this
complexity, the default is ties='breslow'
for the multistate
case. If ties='efron'
is selected the current code will, in
effect, only apply to to tied transitions of the same type.
A separate issue is that of artificial ties due to floatingpoint
imprecision. See the vignette on this topic for a full explanation or
the timefix
option in coxph.control
.
Users may need to add timefix=FALSE
for simulated data sets.
Penalized regression
coxph
can maximise a penalised partial likelihood with
arbitrary userdefined penalty. Supplied penalty functions include
ridge regression (ridge), smoothing splines
(pspline), and frailty models (frailty).
References
Andersen, P. and Gill, R. (1982).
Cox's regression model for counting processes, a large sample study.
Annals of Statistics
10, 11001120.
Therneau, T., Grambsch, P., Modeling Survival Data: Extending the Cox Model.
SpringerVerlag, 2000.
See Also
coxph.object
, coxphms.object
,
coxph.control
,
cluster
, strata
, Surv
,
survfit
, pspline
.
Examples
test1 < list(time=c(4,3,1,1,2,2,3),
status=c(1,1,1,0,1,1,0),
x=c(0,2,1,1,1,0,0),
sex=c(0,0,0,0,1,1,1))
coxph(Surv(time, status) ~ x + strata(sex), test1)
test2 < list(start=c(1,2,5,2,1,7,3,4,8,8),
stop=c(2,3,6,7,8,9,9,9,14,17),
event=c(1,1,1,1,1,1,1,0,0,0),
x=c(1,0,0,1,0,1,1,1,0,0))
summary(coxph(Surv(start, stop, event) ~ x, test2))
test2 < list(start=c(1, 2, 5, 2, 1, 7, 3, 4, 8, 8),
stop =c(2, 3, 6, 7, 8, 9, 9, 9,14,17),
event=c(1, 1, 1, 1, 1, 1, 1, 0, 0, 0),
x =c(1, 0, 0, 1, 0, 1, 1, 1, 0, 0) )
summary( coxph( Surv(start, stop, event) ~ x, test2))
bladder1 < bladder[bladder$enum < 5, ]
coxph(Surv(stop, event) ~ (rx + size + number) * strata(enum),
cluster = id, bladder1)
coxph(Surv(time, status) ~ ph.ecog + tt(age), data=lung,
tt=function(x,t,...) pspline(x + t/365.25))