Functions to Check the Type of Variables passed to Model Frames


.checkMFClasses checks if the variables used in a predict method agree in type with those used for fitting.

.MFclass categorizes variables for this purpose.

.getXlevels() extracts factor levels from factor or character variables.


.checkMFClasses(cl, m, ordNotOK = FALSE)
.MFclass(x)
.getXlevels(Terms, m)



cl

a character vector of class descriptions to match.


m

a model frame (model.frame() result).


x

any R object.


ordNotOK

logical: are ordered factors different?


Terms

a terms object (terms.object).


For applications involving model.matrix() such as linear models we do not need to differentiate between ordered factors and factors as although these affect the coding, the coding used in the fit is already recorded and imposed during prediction. However, other applications may treat ordered factors differently: rpart does, for example.


.checkMFClasses() checks and either signals an error calling stop() or returns NULL invisibly.

.MFclass() returns a character string, one of "logical", "ordered", "factor", "numeric", "nmatrix.*" (a numeric matrix with a number of columns appended) or "other".

.getXlevels returns a named list of character vectors, possibly empty, or NULL.
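The effect of ordNotOK can be sketched as follows. This is a minimal illustration rather than part of the original examples: the toy data frame d_fit and the names cl, ok and strict are made up here, and the class vector is built with .MFclass in the way a fitting function might record it.

```r
## Hypothetical toy data: 'g' is an ordered factor at fitting time.
d_fit <- data.frame(y = c(1.2, 2.3, 3.1, 4.8),
                    g = factor(c("lo", "mid", "hi", "mid"),
                               levels = c("lo", "mid", "hi"), ordered = TRUE))
mf_fit <- model.frame(y ~ g, d_fit)
cl <- vapply(mf_fit, .MFclass, "")  # named vector: y = "numeric", g = "ordered"

## New data supplies 'g' as an unordered factor.
d_new <- d_fit
d_new$g <- factor(as.character(d_fit$g), levels = c("lo", "mid", "hi"))
mf_new <- model.frame(y ~ g, d_new)

## Default: ordered and unordered factors are treated alike, so NULL is returned.
ok <- .checkMFClasses(cl, mf_new)

## ordNotOK = TRUE: "ordered" at fitting time versus "factor" now is an error.
strict <- tryCatch(.checkMFClasses(cl, mf_new, ordNotOK = TRUE),
                   error = function(e) e)
conditionMessage(strict)
```

The class vector is named here because, as far as the current implementation goes, variables appear to be matched by name.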


sapply(warpbreaks, .MFclass) # "numeric" plus 2 x "factor"
sapply(iris,       .MFclass) # 4 x "numeric" plus "factor"

mf <- model.frame(Sepal.Width ~ Species,      iris)
mc <- model.frame(Sepal.Width ~ Sepal.Length, iris)

.checkMFClasses("numeric", mc) # nothing else
.checkMFClasses(c("numeric", "factor"), mf)

## simple .getXlevels() cases :
(xl <- .getXlevels(terms(mf), mf)) # a list with one entry " $ Species" with 3 levels:
stopifnot(exprs = {
  identical(xl$Species, levels(iris$Species))
  identical(.getXlevels(terms(mc), mc), xl[0]) # an empty named list, as no factors
  is.null(.getXlevels(terms(x~x), list(x=1)))
})

Auto- and Cross- Covariance and -Correlation Function Estimation


The function acf computes (and by default plots) estimates of the autocovariance or autocorrelation function. Function pacf is the function used for the partial autocorrelations. Function ccf computes the cross-correlation or cross-covariance of two univariate series.


acf(x, lag.max = NULL,
    type = c("correlation", "covariance", "partial"),
    plot = TRUE, na.action = na.fail, demean = TRUE, ...)

pacf(x, lag.max, plot, na.action, ...)

## Default S3 method:
pacf(x, lag.max = NULL, plot = TRUE, na.action = na.fail,
    ...)

ccf(x, y, lag.max = NULL, type = c("correlation", "covariance"),
    plot = TRUE, na.action = na.fail, ...)

## S3 method for class 'acf'
x[i, j]


x, y

a univariate or multivariate (not ccf) numeric time series object or a numeric vector or matrix, or an "acf" object.


lag.max

maximum lag at which to calculate the acf. Default is $10\log_{10}(N/m)$ where $N$ is the number of observations and $m$ the number of series. Will be automatically limited to one less than the number of observations in the series.


type

character string giving the type of acf to be computed. Allowed values are "correlation" (the default), "covariance" or "partial". Will be partially matched.


plot

logical. If TRUE (the default) the acf is plotted.


na.action

function to be called to handle missing values. na.pass can be used.


demean

logical. Should the covariances be about the sample means?


...

further arguments to be passed to plot.acf.


i

a set of lags (time differences) to retain.


j

a set of series (names or numbers) to retain.


For type = "correlation" and "covariance", the estimates are based on the sample covariance. (The lag 0 autocorrelation is fixed at 1 by convention.)

By default, no missing values are allowed. If the na.action function passes through missing values (as na.pass does), the covariances are computed from the complete cases. This means that the estimate computed may well not be a valid autocorrelation sequence, and may contain missing values. Missing values are not allowed when computing the PACF of a multivariate time series.

The partial correlation coefficient is estimated by fitting autoregressive models of successively higher orders up to lag.max.

The generic function plot has a method for objects of class "acf".

The lag is returned and plotted in units of time, and not numbers of observations.

There are print and subsetting methods for objects of class "acf".


An object of class "acf", which is a list with the following elements:


lag

A three-dimensional array containing the lags at which the acf is estimated.


acf

An array with the same dimensions as lag containing the estimated acf.


type

The type of correlation (same as the type argument).


n.used

The number of observations in the time series.


series

The name of the series x.


snames

The series names for a multivariate time series.

The lag k value returned by ccf(x, y) estimates the correlation between x[t+k] and y[t].

The result is returned invisibly if plot is TRUE.
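The layout of the returned object and the sign convention of ccf can be checked with a short sketch. The shifted white-noise series x and y below are constructed purely for illustration (they are not from the original examples), chosen so that x[t - 2] equals y[t].

```r
## Monthly series (frequency 12): lags are reported in years, not observations.
a <- acf(ldeaths, lag.max = 12, plot = FALSE)
dim(a$lag)          # lags x series x series
a$lag[1:3, 1, 1]    # 0, 1/12, 2/12
a$acf[1, 1, 1]      # lag-0 autocorrelation

## Constructed example for the ccf() convention.
set.seed(1)
e <- rnorm(200)
x <- e[3:200]       # x[t] = e[t + 2]
y <- e[1:198]       # y[t] = e[t], hence x[t - 2] equals y[t]
r <- ccf(x, y, lag.max = 5, plot = FALSE)
peak_lag <- r$lag[which.max(r$acf)]
peak_lag            # the lag k at which cor(x[t + k], y[t]) is largest
```

Since the lag k value estimates the correlation between x[t+k] and y[t], the largest cross-correlation is expected at k = -2 for this construction.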


Original: Paul Gilbert, Martyn Plummer. Extensive modifications and univariate case of pacf by B. D. Ripley.


Venables, W. N. and Ripley, B. D. (2002) Modern Applied Statistics with S. Fourth Edition. Springer-Verlag.

(This contains the exact definitions used.)

See Also

plot.acf, ARMAacf for the exact autocorrelations of a given ARMA process.



## Examples from Venables & Ripley
acf(lh, type = "covariance")

acf(ldeaths, ci.type = "ma")
acf(ts.union(mdeaths, fdeaths))
ccf(mdeaths, fdeaths, ylab = "cross-correlation")
# (just the cross-correlations)

presidents # contains missing values
acf(presidents, na.action = na.pass)
pacf(presidents, na.action = na.pass)

Compute an AR Process Exactly Fitting an ACF


Compute an AR process exactly fitting an autocorrelation function.


acf2AR(acf)





acf

An autocorrelation or autocovariance sequence.


A matrix, with one row for the computed AR(p) coefficients for 1 <= p <= length(acf).

See Also

ARMAacf, ar.yw which does this from an empirical ACF.


(Acf <- ARMAacf(c(0.6, 0.3, -0.2)))
acf2AR(Acf)
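A brief sketch of what "exactly fitting" means, assuming the function documented here is stats::acf2AR. The names phi, rho, fits and recovered are illustrative only. Starting from the theoretical ACF of an AR(3) process, the last row of the result should reproduce the generating coefficients.

```r
phi <- c(0.6, 0.3, -0.2)       # AR(3) coefficients, as in the example above
rho <- ARMAacf(phi)            # theoretical ACF at lags 0, 1, 2, 3
fits <- acf2AR(rho)            # one row per order p = 1, 2, 3
fits

## The order-3 row should match the coefficients that generated 'rho'.
recovered <- unname(fits[3, ])
recovered

## The last coefficient of each AR(p) fit lies on the diagonal.
diag(fits)
```

The diagonal holds the last coefficient of each successive fit, which is the usual way the partial autocorrelations are obtained from an ACF.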

Add or Drop All Possible Single Terms to a Model


Compute all the single terms in the scope argument that can be added to or dropped from the model, fit those models and compute a table of the changes in fit.


add1(object, scope, ...)

## Default S3 method:
add1(object, scope, scale = 0, test = c("none", "Chisq"),
     k = 2, trace = FALSE, ...)

## S3 method for class 'lm'
add1(object, scope, scale = 0, test = c("none", "Chisq", "F"),
     x = NULL, k = 2, ...)

## S3 method for class 'glm'
add1(object, scope, scale = 0,
     test = c("none", "Rao", "LRT", "Chisq", "F"),
     x = NULL, k = 2, ...)

drop1(object, scope, ...)

## Default S3 method:
drop1(object, scope, scale = 0, test = c("none", "Chisq"),
      k = 2, trace = FALSE, ...)

## S3 method for class 'lm'
drop1(object, scope, scale = 0, all.cols = TRUE,
      test = c("none", "Chisq", "F"), k = 2, ...)

## S3 method for class 'glm'
drop1(object, scope, scale = 0,
      test = c("none", "Rao", "LRT", "Chisq", "F"),
      k = 2, ...)



object

a fitted model object.


scope

a formula giving the terms to be considered for adding or dropping.


scale

an estimate of the residual mean square to be used in computing $C_p$. Ignored if 0 or NULL.


test

should the results include a test statistic relative to the original model? The F test is only appropriate for lm and aov models or perhaps for glm fits with estimated dispersion. The $\chi^2$ test can be an exact test (lm models with known scale) or a likelihood-ratio test or a test of the reduction in scaled deviance depending on the method. For glm fits, you can also choose "LRT" and "Rao" for likelihood ratio tests and Rao's efficient score test. The former is synonymous with "Chisq" (although both have an asymptotic chi-square distribution). Values can be abbreviated.


k

the penalty constant in AIC / $C_p$.


trace

if TRUE, print out progress reports.


x

a model matrix containing columns for the fitted model and all terms in the upper scope. Useful if add1 is to be called repeatedly. Warning: no checks are done on its validity.


all.cols

(Provided for compatibility with S.) Logical to specify whether all columns of the design matrix should be used. If FALSE then non-estimable columns are dropped, but the result is not usually statistically meaningful.


...

further arguments passed to or from other methods.


For drop1 methods, a missing scope is taken to be all terms in the model. The hierarchy is respected when considering terms to be added or dropped: all main effects contained in a second-order interaction must remain, and so on.

In a scope formula . means ‘what is already there’.

The methods for lm and glm are more efficient in that they do not recompute the model matrix and call the fit methods directly.

The default output table gives AIC, defined as minus twice log likelihood plus $2p$ where $p$ is the rank of the model (the number of effective parameters). This is only defined up to an additive constant (like log-likelihoods). For linear Gaussian models with fixed scale, the constant is chosen to give Mallows' $C_p$, $RSS/scale + 2p - n$. Where $C_p$ is used, the column is labelled as Cp rather than AIC.
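The two criteria can be compared with a small sketch on the swiss data. The object names d_aic, lm_noEd, s2 and d_cp are illustrative, and the choice of the residual mean square of the full model as scale is an assumption made only to keep the arithmetic transparent.

```r
lm1 <- lm(Fertility ~ ., data = swiss)

## AIC column of drop1() versus extractAIC() on an explicit refit.
d_aic <- drop1(lm1)
lm_noEd <- update(lm1, . ~ . - Education)
c(table = d_aic["Education", "AIC"], refit = extractAIC(lm_noEd)[2])

## Supplying 'scale' switches the criterion (and the column label) to Cp.
s2 <- deviance(lm1) / df.residual(lm1)   # residual mean square of lm1
d_cp <- drop1(lm1, scale = s2)
names(d_cp)
d_cp["<none>", "Cp"]   # RSS/scale + 2p - n for the full model
```

With this particular scale, RSS/scale equals the residual degrees of freedom n - p, so the Cp of the full model reduces to p, the rank of the fit.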

The F tests for the "glm" methods are based on analysis of deviance tests, so if the dispersion is estimated it is based on the residual deviance, unlike the F tests of anova.glm.


An object of class "anova" summarizing the differences in fit between the models.


The model fitting must apply the models to the same dataset. Most methods will attempt to use a subset of the data with no missing values for any of the variables if na.action = na.omit, but this may give biased results. Only use these functions with data containing missing values with great care.

The default methods make calls to the function nobs to check that the number of observations involved in the fitting process remained unchanged.


These are not fully equivalent to the functions in S. There is no keep argument, and the methods used are not quite so computationally efficient.

Their authors' definitions of Mallows' $C_p$ and Akaike's AIC are used, not those of the authors of the models chapter of S.


The design was inspired by the S functions of the same names described in Chambers (1992).


Chambers, J. M. (1992) Linear models. Chapter 4 of Statistical Models in S eds J. M. Chambers and T. J. Hastie, Wadsworth & Brooks/Cole.

See Also

step, aov, lm, extractAIC, anova


require(graphics); require(utils)
## following example(swiss)
lm1 <- lm(Fertility ~ ., data = swiss)
add1(lm1, ~ I(Education^2) + .^2)
drop1(lm1, test = "F")  # So called 'type II' anova

## following example(glm)
example(glm)
drop1(glm.D93, test = "Chisq")
drop1(glm.D93, test = "F")
add1(glm.D93, scope = ~outcome*treatment, test = "Rao") ## Pearson Chi-square

Puts Arbitrary Margins on Multidimensional Tables or Arrays


For a given table one can specify which of the classifying factors to expand by one or more levels to hold margins to be calculated. One may for example form sums and means over the first dimension and medians over the second. The resulting table will then have two extra levels for the first dimension and one extra level for the second. The default is to sum over all margins in the table. Other possibilities may give results that depend on the order in which the margins are computed. This is flagged in the printed output from the function.


addmargins(A, margin = seq_along(dim(A)), FUN = sum, quiet = FALSE)



A

table or array. The function uses the presence of the "dim" and "dimnames" attributes of A.


margin

vector of dimensions over which to form margins. Margins are formed in the order in which dimensions are specified in margin.


FUN

list of the same length as margin, each element of the list being either a function or a list of functions. In the length-1 case, can be a function instead of a list of one. Names of the list elements will appear as levels in dimnames of the result. Unnamed list elements will have names constructed: the name of a function or a constructed name based on the position in the table.


quiet

logical which suppresses the message telling the order in which the margins were computed.


If the functions used to form margins are not commutative, the result depends on the order in which margins are computed. Annotation of margins is done via naming the FUN list.


A table or array with the same number of dimensions as A, but with extra levels of the dimensions mentioned in margin. The number of levels added to each dimension is the length of the entries in FUN. A message with the order of computation of margins is printed.
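How the dimensions grow can be seen on a tiny table. The data below are invented for illustration, and the names A, s and m are not from the original examples.

```r
## A 2 x 2 table built from five made-up observations.
A <- table(g1 = c("a", "a", "b", "b", "b"),
           g2 = c("x", "y", "x", "x", "y"))

## Default: one "Sum" level is added to every dimension.
s <- addmargins(A, quiet = TRUE)
dim(s)
s["Sum", "Sum"]     # grand total of the table

## Two functions over the first dimension only: two extra levels there,
## named after the elements of the inner list.
m <- addmargins(A, margin = 1, FUN = list(list(Min = min, Max = max)),
                quiet = TRUE)
dim(m)
rownames(m)
```

The second call adds as many levels to dimension 1 as there are functions in the corresponding entry of FUN, while dimension 2 is left unchanged.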


Bendix Carstensen, Steno Diabetes Center & Department of Biostatistics, University of Copenhagen, autumn 2003. Margin naming enhanced by Duncan Murdoch.

See Also

table, ftable, margin.table.


Aye <- sample(c("Yes", "Si", "Oui"), 177, replace = TRUE)
Bee <- sample(c("Hum", "Buzz"), 177, replace = TRUE)
Sea <- sample(c("White", "Black", "Red", "Dead"), 177, replace = TRUE)
(A <- table(Aye, Bee, Sea))
(aA <- addmargins(A))


# Non-commutative functions - note differences between resulting tables:
ftable( addmargins(A, c(3, 1),
                   FUN = list(list(Min = min, Max = max),
                              Sum = sum)))
ftable( addmargins(A, c(1, 3),
                   FUN = list(Sum = sum,
                              list(Min = min, Max = max))))

# Weird function needed to return the N when computing percentages
sqsm <- function(x) sum(x)^2/100
B <- table(Sea, Bee)
round(sweep(addmargins(B, 1, list(list(All = sum, N = sqsm))), 2,
            apply(B, 2, sum)/100, `/`), 1)
round(sweep(addmargins(B, 2, list(list(All = sum, N = sqsm))), 1,
            apply(B, 1, sum)/100, `/`), 1)

# A total over Bee requires formation of the Bee-margin first:
mB <-  addmargins(B, 2, FUN = list(list(Total = sum)))
round(ftable(sweep(addmargins(mB, 1, list(list(All = sum, N = sqsm))), 2,
                   apply(mB, 2, sum)/100, `/`)), 1)

## Zero.Printing table+margins:
x <- sample( 1:7, 20, replace = TRUE)
y <- sample( 1:7, 20, replace = TRUE)
tx <- addmargins( table(x, y) )
print(tx, zero.print = ".")

Compute Summary Statistics of Data Subsets


Splits the data into subsets, computes summary statistics for each, and returns the result in a convenient form.


aggregate(x, ...)

## Default S3 method:
aggregate(x, ...)

## S3 method for class 'data.frame'
aggregate(x, by, FUN, ..., simplify = TRUE, drop = TRUE)

## S3 method for class 'formula'
aggregate(x, data, FUN, ...,
          subset, na.action = na.omit)

## S3 method for class 'ts'
aggregate(x, nfrequency = 1, FUN = sum, ndeltat = 1,
          ts.eps = getOption("ts.eps"), ...)



x

an R object. For the formula method a formula, such as y ~ x or cbind(y1, y2) ~ x1 + x2, where the y variables are numeric data to be split into groups according to the grouping x variables (usually factors).


by

a list of grouping elements, each as long as the variables in the data frame x, or a formula. The elements are coerced to factors before use.


FUN

a function to compute the summary statistics which can be applied to all data subsets.


simplify

a logical indicating whether results should be simplified to a vector or matrix if possible.


drop

a logical indicating whether to drop unused combinations of grouping values. The non-default case drop=FALSE has been amended for R 3.5.0 to drop unused combinations.


data

a data frame (or list) from which the variables in the formula should be taken.


subset

an optional vector specifying a subset of observations to be used.


na.action

a function which indicates what should happen when the data contain NA values. The default is to only consider complete cases with respect to the given variables.


nfrequency

new number of observations per unit of time; must be a divisor of the frequency of x.


ndeltat

new fraction of the sampling period between successive observations; must be a divisor of the sampling interval of x.


ts.eps

tolerance used to decide if nfrequency is a sub-multiple of the original frequency.


...

further arguments passed to or used by methods.


aggregate is a generic function with methods for data frames and time series.

The default method, aggregate.default, uses the time series method if x is a time series, and otherwise coerces x to a data frame and calls the data frame method.

aggregate.data.frame is the data frame method. If x is not a data frame, it is coerced to one, which must have a non-zero number of rows. Then, each of the variables (columns) in x is split into subsets of cases (rows) of identical combinations of the components of by, and FUN is applied to each such subset with further arguments in ... passed to it. The result is reformatted into a data frame containing the variables in by and x. The ones arising from by contain the unique combinations of grouping values used for determining the subsets, and the ones arising from x the corresponding summaries for the subset of the respective variables in x. If simplify is true, summaries are simplified to vectors or matrices if they have a common length of one or greater than one, respectively; otherwise, lists of summary results according to subsets are obtained. Rows with missing values in any of the by variables will be omitted from the result. (Note that versions of R prior to 2.11.0 required FUN to be a scalar function.)

The formula method provides a standard formula interface to aggregate.data.frame. The latter invokes the formula method if by is a formula, in which case aggregate(x, by, FUN) is the same as aggregate(by, x, FUN) for a data frame x.
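The split-and-summarize behaviour of the data frame method can be sketched with the warpbreaks data. The names by_list, ag_df, ag_fm, tp and cell are illustrative only; tapply is used as an independent cross-check of a single group mean.

```r
by_list <- list(wool = warpbreaks$wool, tension = warpbreaks$tension)

## Data frame method: one row per combination of the grouping values.
ag_df <- aggregate(warpbreaks["breaks"], by = by_list, FUN = mean)
ag_df

## Formula method: the same summaries through the formula interface.
ag_fm <- aggregate(breaks ~ wool + tension, data = warpbreaks, FUN = mean)

## Cross-check one cell against tapply().
tp <- tapply(warpbreaks$breaks, by_list, mean)
cell <- ag_df$breaks[ag_df$wool == "A" & ag_df$tension == "L"]
c(aggregate = cell, tapply = tp[["A", "L"]])
```

The grouping columns come first in the result, followed by the summarized column, as described above.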

aggregate.ts is the time series method, and requires FUN to be a scalar function. If x is not a time series, it is coerced to one. Then, the variables in x are split into appropriate blocks of length frequency(x) / nfrequency, and FUN is applied to each such block, with further (named) arguments in ... passed to it. The result returned is a time series with frequency nfrequency holding the aggregated values. Note that this makes most sense for a quarterly or yearly result when the original series covers a whole number of quarters or years: in particular aggregating a monthly series to quarters starting in February does not give a conventional quarterly series.
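The block structure of the time series method can be made concrete with an artificial monthly series. The series m below is invented for illustration: it holds the values 1 to 24, so the block sums are easy to verify by hand.

```r
## Two years of monthly data, starting in January.
m <- ts(1:24, start = c(2020, 1), frequency = 12)

## Monthly to quarterly: blocks of length frequency(m) / nfrequency = 3.
q <- aggregate(m, nfrequency = 4, FUN = sum)
q
frequency(q)
as.vector(q)[1:2]   # 1 + 2 + 3 and 4 + 5 + 6
```

Because the series starts in January and covers whole years, the blocks line up with conventional calendar quarters.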

FUN is passed to match.fun, and hence it can be a function or a symbol or character string naming a function.


For the time series method, a time series of class "ts" or class c("mts", "ts").

For the data frame method, a data frame with columns corresponding to the grouping variables in by followed by aggregated columns from x. If the by has names, the non-empty names are used to label the columns in the results, with unnamed grouping variables being named Group.i for by[[i]].


The first argument of the "formula" method was named formula rather than x prior to R 4.2.0. Portable uses should not name that argument.


Kurt Hornik, with contributions by Arni Magnusson.


Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.

See Also

apply, lapply, tapply.


## Compute the averages for the variables in 'state.x77', grouped
## according to the region (Northeast, South, North Central, West) that
## each state belongs to.
aggregate(state.x77, list(Region = state.region), mean)

## Compute the averages according to region and the occurrence of more
## than 130 days of frost.
aggregate(state.x77,
          list(Region = state.region,
               Cold = state.x77[,"Frost"] > 130),
          mean)
## (Note that no state in 'South' is THAT cold.)

## example with character variables and NAs
testDF <- data.frame(v1 = c(1,3,5,7,8,3,5,NA,4,5,7,9),
                     v2 = c(11,33,55,77,88,33,55,NA,44,55,77,99) )
by1 <- c("red", "blue", 1, 2, NA, "big", 1, 2, "red", 1, NA, 12)
by2 <- c("wet", "dry", 99, 95, NA, "damp", 95, 99, "red", 99, NA, NA)
aggregate(x = testDF, by = list(by1, by2), FUN = "mean")

# and if you want to treat NAs as a group
fby1 <- factor(by1, exclude = "")
fby2 <- factor(by2, exclude = "")
aggregate(x = testDF, by = list(fby1, fby2), FUN = "mean")

## Formulas, one ~ one, one ~ many, many ~ one, and many ~ many:
aggregate(weight ~ feed, data = chickwts, mean)
aggregate(breaks ~ wool + tension, data = warpbreaks, mean)
aggregate(cbind(Ozone, Temp) ~ Month, data = airquality, mean)
aggregate(cbind(ncases, ncontrols) ~ alcgp + tobgp, data = esoph, sum)

## "complete cases" vs. "available cases"
colSums(is.na(airquality))  # NAs in Ozone but not Temp
## the default is to summarize *complete cases*:
aggregate(cbind(Ozone, Temp) ~ Month, data = airquality, FUN = mean)
## to handle missing values *per variable*:
aggregate(cbind(Ozone, Temp) ~ Month, data = airquality, FUN = mean,
          na.action = na.pass, na.rm = TRUE)

## Dot notation:
aggregate(. ~ Species, data = iris, mean)
aggregate(len ~ ., data = ToothGrowth, mean)

## Often followed by xtabs():
ag <- aggregate(len ~ ., data = ToothGrowth, mean)
xtabs(len ~ ., data = ag)

## Formula interface via 'by' (for pipe operations)
ToothGrowth |> aggregate(len ~ ., FUN = mean)

## Compute the average annual approval ratings for American presidents.
aggregate(presidents, nfrequency = 1, FUN = mean)
## Give the summer less weight.
aggregate(presidents, nfrequency = 1,
          FUN = weighted.mean, w = c(1, 1, 0.5, 1))

Akaike's An Information Criterion


Generic function calculating Akaike's ‘An Information Criterion’ for one or several fitted model objects for which a log-likelihood value can be obtained, according to the formula $-2 \mbox{log-likelihood} + k n_{par}$, where $n_{par}$ represents the number of parameters in the fitted model, and $k = 2$ for the usual AIC, or $k = \log(n)$ ($n$ being the number of observations) for the so-called BIC or SBC (Schwarz's Bayesian criterion).


AIC(object, ..., k = 2)

BIC(object, ...)



object

a fitted model object for which there exists a logLik method to extract the corresponding log-likelihood, or an object inheriting from class logLik.


...

optionally more fitted model objects.


k

numeric, the penalty per parameter to be used; the default k = 2 is the classical AIC.


When comparing models fitted by maximum likelihood to the same data, the smaller the AIC or BIC, the better the fit.

The theory of AIC requires that the log-likelihood has been maximized: whereas AIC can be computed for models not fitted by maximum likelihood, their AIC values should not be compared.

Examples of models not ‘fitted to the same data’ are where the response is transformed (accelerated-life models are fitted to log-times) and where contingency tables have been used to summarize data.

These are generic functions (with S4 generics defined in package stats4): however methods should be defined for the log-likelihood function logLik rather than these functions: the action of their default methods is to call logLik on all the supplied objects and assemble the results. Note that in several common cases logLik does not return the value at the MLE: see its help page.

The log-likelihood and hence the AIC/BIC is only defined up to an additive constant. Different constants have conventionally been used for different purposes and so extractAIC and AIC may give different values (and do for models of class "lm": see the help for extractAIC). Particular care is needed when comparing fits of different classes (with, for example, a comparison of a Poisson and gamma GLM being meaningless since one has a discrete response, the other continuous).

BIC is defined as AIC(object, ..., k = log(nobs(object))). This needs the number of observations to be known: the default method looks first for a "nobs" attribute on the return value from the logLik method, then tries the nobs generic, and if neither succeeds, returns BIC as NA.
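The formula can be reproduced by hand from the log-likelihood. A minimal sketch on the swiss data follows; the names fit, ll, npar, n, aic_manual and bic_manual are illustrative.

```r
fit <- lm(Fertility ~ ., data = swiss)

ll   <- logLik(fit)
npar <- attr(ll, "df")     # number of parameters counted by logLik()
n    <- nobs(fit)          # number of observations

## -2 * log-likelihood + k * npar, with k = 2 (AIC) and k = log(n) (BIC).
aic_manual <- -2 * as.numeric(ll) + 2 * npar
bic_manual <- -2 * as.numeric(ll) + log(n) * npar

c(manual = aic_manual, AIC = AIC(fit))
c(manual = bic_manual, BIC = BIC(fit), AIC_k = AIC(fit, k = log(n)))
```

For an "lm" fit the parameter count reported by logLik includes the error variance, which is why npar is taken from the "df" attribute rather than from the number of coefficients.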


If just one object is provided, a numeric value with the corresponding AIC (or BIC, or ..., depending on k).

If multiple objects are provided, a data.frame with rows corresponding to the objects and columns representing the number of parameters in the model (df) and the AIC or BIC.


Originally by José Pinheiro and Douglas Bates, more recent revisions by R-core.


Sakamoto, Y., Ishiguro, M., and Kitagawa G. (1986). Akaike Information Criterion Statistics. D. Reidel Publishing Company.

See Also

extractAIC, logLik, nobs.


lm1 <- lm(Fertility ~ . , data = swiss)

lm2 <- update(lm1, . ~ . -Examination)
AIC(lm1, lm2)
BIC(lm1, lm2)

Find Aliases (Dependencies) in a Model


Find aliases (linearly dependent terms) in a linear model specified by a formula.


alias(object, ...)

## S3 method for class 'formula'
alias(object, data, ...)

## S3 method for class 'lm'
alias(object, complete = TRUE, partial = FALSE,
      partial.pattern = FALSE, ...)



object

A fitted model object, for example from lm or aov, or a formula for alias.formula.


data

Optionally, a data frame to search for the objects in the formula.


complete

Should information on complete aliasing be included?


partial

Should information on partial aliasing be included?


partial.pattern

Should partial aliasing be presented in a schematic way? If this is done, the results are presented in a more compact way, usually giving the deciles of the coefficients.


...

further arguments passed to or from other methods.


Although the main method is for class "lm", alias is most useful for experimental designs and so is used with fits from aov. Complete aliasing refers to effects in linear models that cannot be estimated independently of the terms which occur earlier in the model and so have their coefficients omitted from the fit. Partial aliasing refers to effects that can be estimated less precisely because of correlations induced by the design.
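Complete aliasing can be illustrated with a deliberately redundant regressor. The data frame d below is invented for this purpose, with x3 constructed as the exact sum of x1 and x2. The sketch assumes the recommended package MASS is installed, since parts of the "lm" method rely on it.

```r
## Made-up data in which x3 is an exact linear combination of x1 and x2.
d <- data.frame(y  = c(1, 3, 2, 5, 4, 6),
                x1 = c(1, 2, 3, 4, 5, 6),
                x2 = c(2, 1, 4, 3, 6, 5))
d$x3 <- d$x1 + d$x2

fit <- lm(y ~ x1 + x2 + x3, data = d)
coef(fit)              # the coefficient of x3 is omitted (NA)

al <- alias(fit)
al$Complete            # x3 expressed through the earlier terms

## Plain numeric view of the dependence of x3 on x1 and x2.
dep <- as.numeric(unclass(al$Complete)["x3", c("x1", "x2")])
dep
```

The row for x3 records how the aliased effect depends on the terms that precede it in the model, which here is simply x1 + x2.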

Some parts of the "lm" method require recommended package MASS to be installed.


A list (of class "listof") containing components


Model

Description of the model; usually the formula.


Complete

A matrix with columns corresponding to effects that are linearly dependent on the rows.


Partial

The correlations of the estimable effects, with a zero diagonal. An object of class "mtable" which has its own print method.


The aliasing pattern may depend on the contrasts in use: Helmert contrasts are probably most useful.

The defaults are different from those in S.


The design was inspired by the S function of the same name described in Chambers et al. (1992).


Chambers, J. M., Freeny, A and Heiberger, R. M. (1992) Analysis of variance; designed experiments. Chapter 5 of Statistical Models in S eds J. M. Chambers and T. J. Hastie, Wadsworth & Brooks/Cole.


op <- options(contrasts = c("contr.helmert", "contr.poly"))
npk.aov <- aov(yield ~ block + N*P*K, npk)
alias(npk.aov)
options(op)  # reset

ANOVA Tables


Compute analysis of variance (or deviance) tables for one or more fitted model objects.


anova(object, ...)



object

an object containing the results returned by a model fitting function (e.g., lm or glm).


...

additional objects of the same type.


This (generic) function returns an object of class anova. These objects represent analysis-of-variance and analysis-of-deviance tables. When given a single argument it produces a table which tests whether the model terms are significant.

When given a sequence of objects, anova tests the models against one another in the order specified.

The print method for anova objects prints tables in a ‘pretty’ form.


The comparison between two or more models will only be valid if they are fitted to the same dataset. This may be a problem if there are missing values and R's default of na.action = na.omit is used.


Chambers, J. M. and Hastie, T. J. (1992) Statistical Models in S, Wadsworth & Brooks/Cole.

See Also

coefficients, effects, fitted.values, residuals, summary, drop1, add1.

Analysis of Deviance for Generalized Linear Model Fits


Compute an analysis of deviance table for one or more generalized linear model fits.


## S3 method for class 'glm'
anova(object, ..., dispersion = NULL, test = NULL)


object, ...

objects of class glm, typically the result of a call to glm, or a list of objects for the "glmlist" method.


dispersion

the dispersion parameter for the fitting family. By default it is obtained from the object(s).


test

a character string, (partially) matching one of "Chisq", "LRT", "Rao", "F" or "Cp". See stat.anova. Or logical FALSE, which suppresses any test.


Specifying a single object gives a sequential analysis of deviance table for that fit. That is, the reductions in the residual deviance as each term of the formula is added in turn are given as the rows of a table, plus the residual deviances themselves.

If more than one object is specified, the table has a row for the residual degrees of freedom and deviance for each model. For all but the first model, the change in degrees of freedom and deviance is also given. (This only makes statistical sense if the models are nested.) It is conventional to list the models from smallest to largest, but this is up to the user.

The table will optionally contain test statistics (and P values) comparing the reduction in deviance for the row to the residuals. For models with known dispersion (e.g., binomial and Poisson fits) the chi-squared test is most appropriate, and for those with dispersion estimated by moments (e.g., gaussian, quasibinomial and quasipoisson fits) the F test is most appropriate. If anova.glm can determine which of these cases applies then by default it will use one of the above tests. If the dispersion argument is supplied, the dispersion is considered known and the chi-squared test will be used. Argument test=FALSE suppresses the test statistics and P values. Mallows' $C_p$ statistic is the residual deviance plus twice the estimate of $\sigma^2$ times the residual degrees of freedom, which is closely related to AIC (and a multiple of it if the dispersion is known). You can also choose "LRT" and "Rao" for likelihood ratio tests and Rao's efficient score test. The former is synonymous with "Chisq" (although both have an asymptotic chi-square distribution).

The dispersion estimate will be taken from the largest model, using the value returned by summary.glm. As this will in most cases use a Chi-squared-based estimate, the F tests are not based on the residual deviance in the analysis of deviance table shown.
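For a fit with known dispersion, the chi-squared test in the table can be reproduced by hand. The sketch below uses a small Poisson count data set entered inline (the numbers are assumed to match the example in ?glm); the names fit, tab, dev_drop and p_manual are illustrative.

```r
## Small Poisson regression; the data are typed in here for self-containment.
counts    <- c(18, 17, 15, 20, 10, 20, 25, 13, 12)
outcome   <- gl(3, 1, 9)
treatment <- gl(3, 3)
fit <- glm(counts ~ outcome + treatment, family = poisson())

tab <- anova(fit, test = "Chisq")
tab

## Reduction in residual deviance when 'outcome' is added to the null model.
dev_drop <- tab["NULL", "Resid. Dev"] - tab["outcome", "Resid. Dev"]

## With dispersion fixed at 1, the reduction is referred to a chi-squared
## distribution on the corresponding degrees of freedom.
p_manual <- pchisq(dev_drop, df = tab["outcome", "Df"], lower.tail = FALSE)
c(manual = p_manual, table = tab["outcome", "Pr(>Chi)"])
```

For families with estimated dispersion the reduction would first be scaled by the dispersion estimate, which this Poisson example does not need.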


An object of class "anova" inheriting from class "data.frame".


The comparison between two or more models will only be valid if they are fitted to the same dataset. This may be a problem if there are missing values and R's default of na.action = na.omit is used, and anova will detect this with an error.


Hastie, T. J. and Pregibon, D. (1992) Generalized linear models. Chapter 6 of Statistical Models in S eds J. M. Chambers and T. J. Hastie, Wadsworth & Brooks/Cole.

See Also

glm, anova.

drop1 for so-called ‘type II’ ANOVA where each term is dropped one at a time respecting their hierarchy.


## --- Continuing the Example from  '?glm':
example("glm", echo = FALSE)
anova(glm.D93, test = FALSE)
anova(glm.D93, test = "Cp")
anova(glm.D93, test = "Chisq")
glm.D93a <-
   update(glm.D93, ~treatment*outcome) # equivalent to Pearson Chi-square
anova(glm.D93, glm.D93a, test = "Rao")

ANOVA for Linear Model Fits


Compute an analysis of variance table for one or more linear model fits.


## S3 method for class 'lm'
anova(object, ...)

## S3 method for class 'lmlist'
anova(object, ..., scale = 0, test = "F")


object, ...

objects of class lm, usually, a result of a call to lm.


test

a character string specifying the test statistic to be used. Can be one of "F", "Chisq" or "Cp", with partial matching allowed, or NULL for no test.


scale

numeric. An estimate of the noise variance $\sigma^2$. If zero this will be estimated from the largest model considered.


Specifying a single object gives a sequential analysis of variance table for that fit. That is, the reductions in the residual sum of squares as each term of the formula is added in turn are given as the rows of a table, plus the residual sum of squares.

The table will contain F statistics (and P values) comparing the mean square for the row to the residual mean square.

If more than one object is specified, the table has a row for the residual degrees of freedom and sum of squares for each model. For all but the first model, the change in degrees of freedom and sum of squares is also given. (This only makes statistical sense if the models are nested.) It is conventional to list the models from smallest to largest, but this is up to the user.

Optionally the table can include test statistics. Normally the F statistic is most appropriate, which compares the mean square for a row to the residual mean square for the largest model considered. If scale is specified chi-squared tests can be used. Mallows' $C_p$ statistic is the residual sum of squares plus twice the estimate of $\sigma^2$ times the residual degrees of freedom.
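As a rough illustration of the F statistic for several models, the following sketch recomputes one row of the table by hand. The object names fit0, fit2 and fit4, and the choice of the LifeCycleSavings data, are assumptions made for this illustration only.

```r
## Illustrative sketch: recompute the F statistic in the second row.
fit0 <- lm(sr ~ 1, data = LifeCycleSavings)
fit2 <- lm(sr ~ pop15 + pop75, data = LifeCycleSavings)
fit4 <- lm(sr ~ pop15 + pop75 + dpi + ddpi, data = LifeCycleSavings)
tab <- anova(fit0, fit2, fit4, test = "F")

## Denominator: residual mean square of the largest model considered
rms <- deviance(fit4) / df.residual(fit4)
## Numerator: mean square for the step from fit0 to fit2
ms2 <- (deviance(fit0) - deviance(fit2)) /
       (df.residual(fit0) - df.residual(fit2))
F2 <- ms2 / rms

all.equal(F2, tab$F[2])   # compare with the value reported by anova()
```

The denominator comes from fit4 even though the row compares fit0 with fit2, which is why the order and choice of models passed to anova matters.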


An object of class "anova" inheriting from class "data.frame".


The comparison between two or more models will only be valid if they are fitted to the same dataset. This may be a problem if there are missing values and R's default of na.action = na.omit is used, and anova.lmlist will detect this with an error.


Chambers, J. M. (1992) Linear models. Chapter 4 of Statistical Models in S eds J. M. Chambers and T. J. Hastie, Wadsworth & Brooks/Cole.

See Also

The model fitting function lm, anova.

drop1 for so-called ‘type II’ ANOVA where each term is dropped one at a time respecting their hierarchy.


## sequential table
fit <- lm(sr ~ ., data = LifeCycleSavings)
anova(fit)

## same effect via separate models
fit0 <- lm(sr ~ 1, data = LifeCycleSavings)
fit1 <- update(fit0, . ~ . + pop15)
fit2 <- update(fit1, . ~ . + pop75)
fit3 <- update(fit2, . ~ . + dpi)
fit4 <- update(fit3, . ~ . + ddpi)
anova(fit0, fit1, fit2, fit3, fit4, test = "F")

anova(fit4, fit2, fit0, test = "F") # unconventional order

Comparisons between Multivariate Linear Models


Compute a (generalized) analysis of variance table for one or more multivariate linear models.


## S3 method for class 'mlm'
anova(object, ...,
      test = c("Pillai", "Wilks", "Hotelling-Lawley", "Roy",
               "Spherical"),
      Sigma = diag(nrow = p), T = Thin.row(Proj(M) - Proj(X)),
      M = diag(nrow = p), X = ~0,
      idata = data.frame(index = seq_len(p)), tol = 1e-7)



an object of class "mlm".


further objects of class "mlm".


choice of test statistic (see below). Can be abbreviated.


(only relevant if test == "Spherical"). Covariance matrix assumed proportional to Sigma.


transformation matrix. By default computed from M and X.


formula or matrix describing the outer projection (see below).


formula or matrix describing the inner projection (see below).


data frame describing intra-block design.


tolerance to be used in deciding if the residuals are rank-deficient: see qr.


The anova.mlm method uses either a multivariate test statistic for the summary table, or a test based on sphericity assumptions (i.e. that the covariance is proportional to a given matrix).

For the multivariate test, Wilks' statistic is most popular in the literature, but the default Pillai–Bartlett statistic is recommended by Hand and Taylor (1987). See summary.manova for further details.

For the "Spherical" test, proportionality is usually with the identity matrix but a different matrix can be specified using Sigma. Corrections for asphericity known as the Greenhouse–Geisser, respectively Huynh–Feldt, epsilons are given and adjusted $F$ tests are performed.

It is common to transform the observations prior to testing. This typically involves transformation to intra-block differences, but more complicated within-block designs can be encountered, making more elaborate transformations necessary. A transformation matrix T can be given directly or specified as the difference between two projections onto the spaces spanned by M and X, which in turn can be given as matrices or as model formulas with respect to idata (the tests will be invariant to parametrization of the quotient space M/X).

As with anova.lm, all test statistics use the SSD matrix from the largest model considered as the (generalized) denominator.

Contrary to other anova methods, the intercept is not excluded from the display in the single-model case. When contrast transformations are involved, it often makes good sense to test for a zero intercept.
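A minimal sketch of the single-model case may help here. It fits a two-response model to the iris data (an arbitrary choice for illustration, as is the object name fit) and requests two of the multivariate statistics.

```r
## Illustrative sketch: a multivariate linear model with two responses
fit <- lm(cbind(Sepal.Length, Sepal.Width) ~ Species, data = iris)

tab <- anova(fit)             # default: Pillai-Bartlett statistic
tab
anova(fit, test = "Wilks")    # Wilks' statistic instead

rownames(tab)                 # the intercept row is kept in the display
```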


An object of class "anova" inheriting from class "data.frame".


The Huynh–Feldt epsilon differs from that calculated by SAS (as of v. 8.2) except when the DF is equal to the number of observations minus one. This is believed to be a bug in SAS, not in R.


Hand, D. J. and Taylor, C. C. (1987) Multivariate Analysis of Variance and Repeated Measures. Chapman and Hall.

See Also



utils::example(SSD) # Brings in the mlmfit and reacttime objects

mlmfit0 <- update(mlmfit, ~0)

### Traditional tests of intrasubj. contrasts
## Using MANOVA techniques on contrasts:
anova(mlmfit, mlmfit0, X = ~1)

## Assuming sphericity
anova(mlmfit, mlmfit0, X = ~1, test = "Spherical")

### tests using intra-subject 3x2 design
idata <- data.frame(deg = gl(3, 1, 6, labels = c(0, 4, 8)),
                    noise = gl(2, 3, 6, labels = c("A", "P")))

anova(mlmfit, mlmfit0, X = ~ deg + noise,
      idata = idata, test = "Spherical")
anova(mlmfit, mlmfit0, M = ~ deg + noise, X = ~ noise,
      idata = idata, test = "Spherical" )
anova(mlmfit, mlmfit0, M = ~ deg + noise, X = ~ deg,
      idata = idata, test = "Spherical" )

f <- factor(rep(1:2, 5)) # bogus, just for illustration
mlmfit2 <- update(mlmfit, ~f)
anova(mlmfit2, mlmfit, mlmfit0, X = ~1, test = "Spherical")
anova(mlmfit2, X = ~1, test = "Spherical")
# one-model form, equiv. to previous

### There seems to be a strong interaction in these data

Ansari-Bradley Test


Performs the Ansari-Bradley two-sample test for a difference in scale parameters.


ansari.test(x, ...)

## Default S3 method:
ansari.test(x, y,
            alternative = c("two.sided", "less", "greater"),
            exact = NULL, conf.int = FALSE, conf.level = 0.95,
            ...)

## S3 method for class 'formula'
ansari.test(formula, data, subset, na.action, ...)



numeric vector of data values.


numeric vector of data values.


indicates the alternative hypothesis and must be one of "two.sided", "greater" or "less". You can specify just the initial letter.


a logical indicating whether an exact p-value should be computed.

a logical indicating whether a confidence interval should be computed.


confidence level of the interval.


a formula of the form lhs ~ rhs where lhs is a numeric variable giving the data values and rhs a factor with two levels giving the corresponding groups.


an optional matrix or data frame (or similar: see model.frame) containing the variables in the formula formula. By default the variables are taken from environment(formula).


an optional vector specifying a subset of observations to be used.


a function which indicates what should happen when the data contain NAs. Defaults to getOption("na.action").


further arguments to be passed to or from methods.


Suppose that x and y are independent samples from distributions with densities $f((t-m)/s)/s$ and $f(t-m)$, respectively, where $m$ is an unknown nuisance parameter and $s$, the ratio of scales, is the parameter of interest. The Ansari-Bradley test is used for testing the null that $s$ equals 1, the two-sided alternative being that $s \ne 1$ (the distributions differ only in variance), and the one-sided alternatives being $s > 1$ (the distribution underlying x has a larger variance, "greater") or $s < 1$ ("less").

By default (if exact is not specified), an exact p-value is computed if both samples contain fewer than 50 finite values and there are no ties. Otherwise, a normal approximation is used.

Optionally, a nonparametric confidence interval and an estimator for $s$ are computed. If exact p-values are available, an exact confidence interval is obtained by the algorithm described in Bauer (1972), and the Hodges-Lehmann estimator is employed. Otherwise, the returned confidence interval and point estimate are based on normal approximations.

Note that mid-ranks are used in the case of ties rather than average scores as employed in Hollander & Wolfe (1973). See, e.g., Hajek, Sidak and Sen (1999), pages 131ff, for more information.


A list with class "htest" containing the following components:


the value of the Ansari-Bradley test statistic.


the p-value of the test.


the ratio of scales $s$ under the null, 1.


a character string describing the alternative hypothesis.


the string "Ansari-Bradley test".

a character string giving the names of the data.

a confidence interval for the scale parameter. (Only present if argument conf.int = TRUE.)


an estimate of the ratio of scales. (Only present if argument conf.int = TRUE.)


To compare results of the Ansari-Bradley test to those of the F test to compare two variances (under the assumption of normality), observe that $s$ is the ratio of scales and hence $s^2$ is the ratio of variances (provided they exist), whereas for the F test the ratio of variances itself is the parameter of interest. In particular, confidence intervals are for $s$ in the Ansari-Bradley test but for $s^2$ in the F test.
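The following sketch puts the two intervals on a common scale by squaring the Ansari-Bradley interval. The simulated data, the seed and the object names ab and vt are assumptions for illustration only, and the two intervals are not expected to coincide.

```r
## Illustrative sketch: compare interval scales of the two tests
set.seed(42)                 # arbitrary seed, for reproducibility only
x <- rnorm(30, sd = 2)       # x has twice the scale of y
y <- rnorm(30)

ab <- ansari.test(x, y, conf.int = TRUE)
vt <- var.test(x, y)

ab$conf.int^2   # interval for the ratio of scales, squared
vt$conf.int     # interval for the ratio of variances
```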


David F. Bauer (1972). Constructing confidence sets using rank statistics. Journal of the American Statistical Association, 67, 687–690. doi:10.1080/01621459.1972.10481279.

Jaroslav Hajek, Zbynek Sidak and Pranab K. Sen (1999). Theory of Rank Tests. San Diego, London: Academic Press.

Myles Hollander and Douglas A. Wolfe (1973). Nonparametric Statistical Methods. New York: John Wiley & Sons. Pages 83–92.

See Also

fligner.test for a rank-based (nonparametric) $k$-sample test for homogeneity of variances; mood.test for another rank-based two-sample test for a difference in scale parameters; var.test and bartlett.test for parametric tests for the homogeneity in variance.

ansari_test in package coin for exact and approximate conditional p-values for the Ansari-Bradley test, as well as different methods for handling ties.


## Hollander & Wolfe (1973, p. 86f):
## Serum iron determination using Hyland control sera
ramsay <- c(111, 107, 100, 99, 102, 106, 109, 108, 104, 99,
            101, 96, 97, 102, 107, 113, 116, 113, 110, 98)
jung.parekh <- c(107, 108, 106, 98, 105, 103, 110, 105, 104,
            100, 96, 108, 103, 104, 114, 114, 113, 108, 106, 99)
ansari.test(ramsay, jung.parekh)

ansari.test(rnorm(10), rnorm(10, 0, 2), conf.int = TRUE)

## try more points - failed in 2.4.1
ansari.test(rnorm(100), rnorm(100, 0, 2), conf.int = TRUE)

Fit an Analysis of Variance Model


Fit an analysis of variance model by a call to lm (for each stratum if an Error(.) is used).


aov(formula, data = NULL, projections = FALSE, qr = TRUE,
    contrasts = NULL, ...)



A formula specifying the model.


A data frame in which the variables specified in the formula will be found. If missing, the variables are searched for in the standard way.


Logical flag: should the projections be returned?


Logical flag: should the QR decomposition be returned?


A list of contrasts to be used for some of the factors in the formula. These are not used for any Error term, and supplying contrasts for factors only in the Error term will give a warning.


Arguments to be passed to lm, such as subset or na.action. See ‘Details’ about weights.


This provides a wrapper to lm for fitting linear models to balanced or unbalanced experimental designs.

The main difference from lm is in the way print, summary and so on handle the fit: this is expressed in the traditional language of the analysis of variance rather than that of linear models.

If the formula contains a single Error term, this is used to specify error strata, and appropriate models are fitted within each error stratum.

The formula can specify multiple responses.

Weights can be specified by a weights argument, but should not be used with an Error term, and are incompletely supported (e.g., not by model.tables).


An object of class c("aov", "lm") or for multiple responses of class c("maov", "aov", "mlm", "lm") or for multiple error strata of class c("aovlist", "listof"). There are print and summary methods available for these.
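A short sketch of the returned classes, using the npk data that also appears in the examples for this function. The object names fit and fitE are chosen for illustration.

```r
## Illustrative sketch: classes returned with and without an Error() term
fit  <- aov(yield ~ block + N * P * K, data = npk)
fitE <- aov(yield ~ N * P * K + Error(block), data = npk)

class(fit)    # single-stratum fit
class(fitE)   # multiple error strata
summary(fitE) # one table per stratum
```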


aov is designed for balanced designs, and the results can be hard to interpret without balance: beware that missing values in the response(s) will likely lose the balance. If there are two or more error strata, the methods used are statistically inefficient without balance, and it may be better to use lme in package nlme.

Balance can be checked with the replications function.
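As a minimal sketch of such a check, the call below counts replicates per factor in the npk data. The formula form follows the usage documented for replications, and the object name reps is an assumption for illustration.

```r
## Illustrative sketch: count replicates for each factor in npk
reps <- replications(~ . - yield, data = npk)
reps

## replications() returns a plain vector when every term is balanced
## and a list otherwise, so this is one way to test for balance.
!is.list(reps)
```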

The default ‘contrasts’ in R are not orthogonal contrasts, and aov and its helper functions will work better with such contrasts: see the examples for how to select these.


The design was inspired by the S function of the same name described in Chambers et al. (1992).


Chambers, J. M., Freeny, A. and Heiberger, R. M. (1992) Analysis of variance; designed experiments. Chapter 5 of Statistical Models in S eds J. M. Chambers and T. J. Hastie, Wadsworth & Brooks/Cole.

See Also

lm, summary.aov, replications, alias, proj, model.tables, TukeyHSD


## From Venables and Ripley (2002) p.165.

## Set orthogonal contrasts.
op <- options(contrasts = c("contr.helmert", "contr.poly"))
( npk.aov <- aov(yield ~ block + N*P*K, npk) )

## to show the effects of re-ordering terms contrast the two fits
aov(yield ~ block + N * P + K, npk)
aov(terms(yield ~ block + N * P + K, keep.order = TRUE), npk)

## as a test, not particularly sensible statistically
npk.aovE <- aov(yield ~  N*P*K + Error(block), npk)
options(op)  # reset to previous

Interpolation Functions


Return a list of points which linearly interpolate given data points, or a function performing the linear (or constant) interpolation.


approx   (x, y = NULL, xout, method = "linear", n = 50,
          yleft, yright, rule = 1, f = 0, ties = mean, na.rm = TRUE)

approxfun(x, y = NULL,       method = "linear",
          yleft, yright, rule = 1, f = 0, ties = mean, na.rm = TRUE)


x, y

numeric vectors giving the coordinates of the points to be interpolated. Alternatively a single plotting structure can be specified: see xy.coords.


an optional set of numeric values specifying where interpolation is to take place.


specifies the interpolation method to be used. Choices are "linear" or "constant".


If xout is not specified, interpolation takes place at n equally spaced points spanning the interval [min(x), max(x)].


the value to be returned when input x values are less than min(x). The default is defined by the value of rule given below.


the value to be returned when input x values are greater than max(x). The default is defined by the value of rule given below.


an integer (of length 1 or 2) describing how interpolation is to take place outside the interval [min(x), max(x)]. If rule is 1 then NAs are returned for such points and if it is 2, the value at the closest data extreme is used. Use, e.g., rule = 2:1, if the left and right side extrapolation should differ.


for method = "constant" a number between 0 and 1 inclusive, indicating a compromise between left- and right-continuous step functions. If y0 and y1 are the values to the left and right of the point then the value is y0 if f == 0, y1 if f == 1, and y0*(1-f)+y1*f for intermediate values. In this way the result is right-continuous for f == 0 and left-continuous for f == 1, even for non-finite y values.


handling of tied x values. The string "ordered" or a function (or the name of a function) taking a single vector argument and returning a single number or a list of both, e.g., list("ordered", mean), see ‘Details’.


logical specifying how missing values (NA's) should be handled. Setting na.rm=FALSE will propagate NA's in y to the interpolated values, also depending on the rule set. Note that in this case, NA's in x are invalid, see also the examples.


The inputs can contain missing values which are deleted (if na.rm is true, i.e., by default), so at least two complete (x, y) pairs are required (for method = "linear", one otherwise). If there are duplicated (tied) x values and ties contains a function it is applied to the y values for each distinct x value to produce (x,y) pairs with unique x. Useful functions in this context include mean, min, and max.

If ties = "ordered" the x values are assumed to be already ordered (and unique) and ties are not checked but kept if present. This is the fastest option for large length(x).

If ties is a list of length two, ties[[2]] must be a function to be applied to ties, see above, but if ties[[1]] is identical to "ordered", the x values are assumed to be sorted and are only checked for ties. Consequently, ties = list("ordered", mean) will be slightly more efficient than the default ties = mean in such a case.

The first y value will be used for interpolation to the left and the last one for interpolation to the right.
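The effect of method, f and rule can be seen on a tiny data set. This is only a sketch: the three points and the result names are made up for illustration.

```r
## Illustrative sketch: three points, evaluated between and outside them
x <- c(1, 2, 3)
y <- c(10, 20, 40)

lin <- approx(x, y, xout = 2.5)$y                               # 30
c0  <- approx(x, y, xout = 2.5, method = "constant")$y          # 20 (f = 0)
c1  <- approx(x, y, xout = 2.5, method = "constant", f = 1)$y   # 40
ch  <- approx(x, y, xout = 2.5, method = "constant", f = 0.5)$y # 30

out1  <- approx(x, y, xout = c(0, 4))$y              # NA NA (rule = 1)
out2  <- approx(x, y, xout = c(0, 4), rule = 2)$y    # 10 40
out21 <- approx(x, y, xout = c(0, 4), rule = 2:1)$y  # 10 NA
```

With rule = 2:1 the first element applies to the left of min(x) and the second to the right of max(x).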


approx returns a list with components x and y, containing n coordinates which interpolate the given data points according to the method (and rule) desired.

The function approxfun returns a function performing (linear or constant) interpolation of the given data points. For a given set of x values, this function will return the corresponding interpolated values. It uses data stored in its environment when it was created, the details of which are subject to change.
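Because approxfun returns an ordinary vectorized function, the result can be passed to anything that expects one. A minimal sketch, with the data and the name f chosen for illustration:

```r
## Illustrative sketch: a straight line through (0, 0) and (10, 100)
f <- approxfun(c(0, 10), c(0, 100))

vals <- f(c(2.5, 5, 11))   # 25, 50, and NA outside the range (rule = 1)
vals

## The returned function can be handed to other tools, e.g. integrate()
area <- integrate(f, lower = 0, upper = 10)$value
area
```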


The value returned by approxfun contains references to the code in the current version of R: it is not intended to be saved and loaded into a different R session. This is safer for R >= 3.0.0.


Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.

See Also

spline and splinefun for spline interpolation.



x <- 1:10
y <- rnorm(10)
par(mfrow = c(2,1))
plot(x, y, main = "approx(.) and approxfun(.)")
points(approx(x, y), col = 2, pch = "*")
points(approx(x, y, method = "constant"), col = 4, pch = "*")

f <- approxfun(x, y)
curve(f(x), 0, 11, col = "green2")
points(x, y)
is.function(fc <- approxfun(x, y, method = "const")) # TRUE
curve(fc(x), 0, 10, col = "darkblue", add = TRUE)
## different extrapolation on left and right side :
plot(approxfun(x, y, rule = 2:1), 0, 11,
     col = "tomato", add = TRUE, lty = 3, lwd = 2)

### Treatment of 'NA's -- are kept if  na.rm=FALSE :

xn <- 1:4
yn <- c(1,NA,3:4)
xout <- (1:9)/2
## Default behavior (na.rm = TRUE): NA's omitted; extrapolation gives NA
data.frame(approx(xn,yn, xout))
data.frame(approx(xn,yn, xout, rule = 2))# -> *constant* extrapolation
## New (2019-2020)  na.rm = FALSE: NA's are "kept"
data.frame(approx(xn,yn, xout, na.rm=FALSE, rule = 2))
data.frame(approx(xn,yn, xout, na.rm=FALSE, rule = 2, method="constant"))

## NA's in x[] are not allowed:
stopifnot(inherits( try( approx(yn,yn, na.rm=FALSE) ), "try-error"))

## Give a nice overview of all possibilities  rule * method * na.rm :
##             -----------------------------  ====   ======   =====
## extrapolations "N":= NA;   "C":= Constant :
rules <- list(N=1, C=2, NC=1:2, CN=2:1)
methods <- c("constant","linear")
ry <- sapply(rules, function(R) {
       sapply(methods, function(M)
        sapply(setNames(,c(TRUE,FALSE)), function(na.)
                 approx(xn, yn, xout=xout, method=M, rule=R, na.rm=na.)$y),
        simplify="array")
   }, simplify="array")
names(dimnames(ry)) <- c("x = ", "na.rm", "method", "rule")
dimnames(ry)[[1]] <- format(xout)
ftable(aperm(ry, 4:1)) # --> (4 * 2 * 2) x length(xout)  =  16 x 9 matrix

## Show treatment of 'ties' :

x <- c(2,2:4,4,4,5,5,7,7,7)
y <- c(1:6, 5:4, 3:1)
(amy <- approx(x, y, xout = x)$y) # warning, can be avoided by specifying 'ties=':
op <- options(warn=2) # warnings would be error
stopifnot(identical(amy, approx(x, y, xout = x, ties=mean)$y))
(ay <- approx(x, y, xout = x, ties = "ordered")$y)
stopifnot(amy == c(1.5,1.5, 3, 5,5,5, 4.5,4.5, 2,2,2),
          ay  == c(2, 2,    3, 6,6,6, 4, 4,    1,1,1))
approx(x, y, xout = x, ties = min)$y
approx(x, y, xout = x, ties = max)$y
options(op) # revert 'warn'ing level

Fit Autoregressive Models to Time Series


Fit an autoregressive time series model to the data, by default selecting the complexity by AIC.


ar(x, aic = TRUE, order.max = NULL,
   method = c("yule-walker", "burg", "ols", "mle", "yw"),
   na.action, series, ...)

ar.burg(x, ...)
## Default S3 method:
ar.burg(x, aic = TRUE, order.max = NULL,
        na.action = na.fail, demean = TRUE, series,
        var.method = 1, ...)
## S3 method for class 'mts'
ar.burg(x, aic = TRUE, order.max = NULL,
        na.action = na.fail, demean = TRUE, series,
        var.method = 1, ...)

ar.yw(x, ...)
## Default S3 method:
ar.yw(x, aic = TRUE, order.max = NULL,
      na.action = na.fail, demean = TRUE, series, ...)
## S3 method for class 'mts'
ar.yw(x, aic = TRUE, order.max = NULL,
      na.action = na.fail, demean = TRUE, series,
      var.method = 1, ...)

ar.mle(x, aic = TRUE, order.max = NULL, na.action = na.fail,
       demean = TRUE, series, ...)

## S3 method for class 'ar'
predict(object, newdata, n.ahead = 1, se.fit = TRUE, ...)



a univariate or multivariate time series.


logical. If TRUE then the Akaike Information Criterion is used to choose the order of the autoregressive model. If FALSE, the model of order order.max is fitted.


maximum order (or order) of model to fit. Defaults to the smaller of $N-1$ and $10\log_{10}(N)$, where $N$ is the number of non-missing observations, except for method = "mle", where it is the minimum of this quantity and 12.


character string specifying the method to fit the model. Must be one of the strings in the default argument (the first few characters are sufficient). Defaults to "yule-walker".


function to be called to handle missing values. Currently, via na.action = na.pass, only the Yule-Walker method can handle missing values, which must be consistent within a time point: either all variables must be missing or none.


should a mean be estimated during fitting?


names for the series. Defaults to deparse1(substitute(x)).


the method to estimate the innovations variance (see ‘Details’).


additional arguments for specific methods.


a fit from ar().


data to which to apply the prediction.


number of steps ahead at which to predict.

logical: return estimated standard errors of the prediction error?


For definiteness, note that the AR coefficients have the sign in

$$x_t - \mu = a_1(x_{t-1} - \mu) + \cdots + a_p(x_{t-p} - \mu) + e_t$$

ar is just a wrapper for the functions ar.yw, ar.burg, ar.ols and ar.mle.

Order selection is done by AIC if aic is true. This is problematic, as of the methods here only ar.mle performs true maximum likelihood estimation. The AIC is computed as if the variance estimate were the MLE, omitting the determinant term from the likelihood. Note that this is not the same as the Gaussian likelihood evaluated at the estimated parameter values. In ar.yw the variance matrix of the innovations is computed from the fitted coefficients and the autocovariance of x.
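A small sketch of how the selected order relates to the stored AIC differences, using the lh series from the examples. The object name fit is an assumption for illustration.

```r
## Illustrative sketch: order selection by AIC (Yule-Walker by default)
fit <- ar(lh)

fit$aic     # AIC differences from the best model, named by order
fit$order   # the order whose AIC difference is zero

## Fitting a fixed order instead: aic = FALSE uses order.max as the order
fit4 <- ar(lh, aic = FALSE, order.max = 4)
fit4$order
```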

ar.burg allows two methods to estimate the innovations variance and hence AIC. Method 1 is to use the update given by the Levinson-Durbin recursion (Brockwell and Davis, 1991, (8.2.6) on page 242), and follows S-PLUS. Method 2 is the mean of the sum of squares of the forward and backward prediction errors (as in Brockwell and Davis, 1996, page 145). Percival and Walden (1998) discuss both. In the multivariate case the estimated coefficients will depend (slightly) on the variance estimation method.

Remember that ar includes by default a constant in the model, by removing the overall mean of x before fitting the AR model, or (ar.mle) estimating a constant to subtract.
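To make the sign convention and the role of the mean concrete, the sketch below reproduces a one-step-ahead prediction by hand for an AR(2) fit. The names fit, mu, a and manual are assumptions for illustration, and the comparison is only intended for the univariate Yule-Walker case shown.

```r
## Illustrative sketch: one-step prediction from the fitted coefficients
fit <- ar(lh, aic = FALSE, order.max = 2)   # AR(2) by Yule-Walker
mu  <- fit$x.mean                           # mean removed before fitting
a   <- fit$ar                               # a_1, a_2
n   <- length(lh)

## x_{n+1} - mu = a_1 (x_n - mu) + a_2 (x_{n-1} - mu)
manual <- mu + a[1] * (lh[n] - mu) + a[2] * (lh[n - 1] - mu)

p <- predict(fit, newdata = lh, n.ahead = 1)
all.equal(as.numeric(p$pred), manual)
```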


For ar and its methods a list of class "ar" with the following elements:


The order of the fitted model. This is chosen by minimizing the AIC if aic = TRUE, otherwise it is order.max.


Estimated autoregression coefficients for the fitted model.


The prediction variance: an estimate of the portion of the variance of the time series that is not explained by the autoregressive model.


The estimated mean of the series used in fitting and for use in prediction.


(ar.ols only.) The intercept in the model for x - x.mean.


The differences in AIC between each model and the best-fitting model. Note that the latter can have an AIC of -Inf.


The number of observations in the time series, including missing.


The number of non-missing observations in the time series.


The value of the order.max argument.


The estimate of the partial autocorrelation function up to lag order.max.


residuals from the fitted model, conditioning on the first order observations. The first order residuals are set to NA. If x is a time series, so is resid.


The value of the method argument.


The name(s) of the time series.


The frequency of the time series.


The matched call.


(univariate case, order > 0.) The asymptotic-theory variance matrix of the coefficient estimates.

For predict.ar, a time series of predictions, or if se.fit = TRUE, a list with components pred, the predictions, and se, the estimated standard errors. Both components are time series.


Only the univariate case of ar.mle is implemented.

Fitting by method="mle" to long series can be very slow.

If x contains missing values, see NA, also consider using arima(), possibly with method = "ML".


Martyn Plummer. Univariate case of ar.yw, ar.mle and C code for univariate case of ar.burg by B. D. Ripley.


Brockwell, P. J. and Davis, R. A. (1991). Time Series and Forecasting Methods, second edition. Springer, New York. Section 11.4.

Brockwell, P. J. and Davis, R. A. (1996). Introduction to Time Series and Forecasting. Springer, New York. Sections 5.1 and 7.6.

Percival, D. P. and Walden, A. T. (1998). Spectral Analysis for Physical Applications. Cambridge University Press.

Whittle, P. (1963). On the fitting of multivariate autoregressions and the approximate canonical factorization of a spectral density matrix. Biometrika, 40, 129–134. doi:10.2307/2333753.

See Also

ar.ols, arima for ARMA models; acf2AR, for AR construction from the ACF.

arima.sim for simulation of AR processes.


ar(lh, method = "burg")
ar(lh, method = "ols")
ar(lh, FALSE, 4) # fit ar(4)

(sunspot.ar <- ar(sunspot.year))
predict(sunspot.ar, n.ahead = 25)
## try the other methods too

ar(ts.union(BJsales, BJsales.lead))
## Burg is quite different here, as is OLS (see ar.ols)
ar(ts.union(BJsales, BJsales.lead), method = "burg")

Fit Autoregressive Models to Time Series by OLS


Fit an autoregressive time series model to the data by ordinary least squares, by default selecting the complexity by AIC.


ar.ols(x, aic = TRUE, order.max = NULL, na.action = na.fail,
       demean = TRUE, intercept = demean, series, ...)



A univariate or multivariate time series.


Logical flag. If TRUE then the Akaike Information Criterion is used to choose the order of the autoregressive model. If FALSE, the model of order order.max is fitted.


Maximum order (or order) of model to fit. Defaults to $10\log_{10}(N)$ where $N$ is the number of observations.


function to be called to handle missing values.


should the AR model be for x minus its mean?


should a separate intercept term be fitted?


names for the series. Defaults to deparse1(substitute(x)).


further arguments to be passed to or from methods.


ar.ols fits the general AR model to a possibly non-stationary and/or multivariate system of series x. The resulting unconstrained least squares estimates are consistent, even if some of the series are non-stationary and/or co-integrated. For definiteness, note that the AR coefficients have the sign in

$$x_t - \mu = a_0 + a_1(x_{t-1} - \mu) + \cdots + a_p(x_{t-p} - \mu) + e_t$$

where $a_0$ is zero unless intercept is true, and $\mu$ is the sample mean if demean is true, zero otherwise.

Order selection is done by AIC if aic is true. This is problematic, as ar.ols does not perform true maximum likelihood estimation. The AIC is computed as if the variance estimate (computed from the variance matrix of the residuals) were the MLE, omitting the determinant term from the likelihood. Note that this is not the same as the Gaussian likelihood evaluated at the estimated parameter values.

Some care is needed if intercept is true and demean is false. Only use this if the series are roughly centred on zero. Otherwise the computations may be inaccurate or fail entirely.
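The following sketch shows where the mean and intercept of the model end up in a fit with the default demean = TRUE and intercept = demean. It uses the lh series, and the object names are chosen for illustration.

```r
## Illustrative sketch: an AR(2) fitted by least squares
fit <- ar.ols(lh, aic = FALSE, order.max = 2)

fit$x.mean        # sample mean subtracted before fitting
fit$x.intercept   # fitted a_0 in the model for x - x.mean
drop(fit$ar)      # a_1 and a_2 (stored as an array, hence drop())
fit$method

## Without demeaning or an intercept, on a series centred beforehand
lh0  <- lh - mean(lh)
fit0 <- ar.ols(lh0, aic = FALSE, order.max = 2,
               demean = FALSE, intercept = FALSE)
fit0$x.mean       # zero, since demean = FALSE
```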


A list of class "ar" with the following elements:


The order of the fitted model. This is chosen by minimizing the AIC if aic = TRUE, otherwise it is order.max.


Estimated autoregression coefficients for the fitted model.


The prediction variance: an estimate of the portion of the variance of the time series that is not explained by the autoregressive model.


The estimated mean (or zero if demean is false) of the series used in fitting and for use in prediction.


The intercept in the model for x - x.mean, or zero if intercept is false.


The differences in AIC between each model and the best-fitting model. Note that the latter can have an AIC of -Inf.


The number of observations in the time series.


The value of the order.max argument.


NULL. For compatibility with ar.


residuals from the fitted model, conditioning on the first order observations. The first order residuals are set to NA. If x is a time series, so is resid.


The character string "Unconstrained LS".


The name(s) of the time series.


The frequency of the time series.


The matched call.

The asymptotic-theory standard errors of the coefficient estimates.


Adrian Trapletti, Brian Ripley.


Luetkepohl, H. (1991): Introduction to Multiple Time Series Analysis. Springer Verlag, NY, pp. 368–370.

See Also



ar(lh, method = "burg")
ar.ols(lh, FALSE, 4) # fit ar(4)

ar.ols(ts.union(BJsales, BJsales.lead))

x <- diff(log(EuStockMarkets))
ar.ols(x, order.max = 6, demean = FALSE, intercept = TRUE)

ARIMA Modelling of Time Series


Fit an ARIMA model to a univariate time series.


arima(x, order = c(0L, 0L, 0L),
      seasonal = list(order = c(0L, 0L, 0L), period = NA),
      xreg = NULL, include.mean = TRUE, transform.pars = TRUE,
      fixed = NULL, init = NULL,
      method = c("CSS-ML", "ML", "CSS"), n.cond,
      SSinit = c("Gardner1980", "Rossignol2011"),
      optim.method = "BFGS",
      optim.control = list(), kappa = 1e6)



a univariate time series


A specification of the non-seasonal part of the ARIMA model: the three integer components $(p, d, q)$ are the AR order, the degree of differencing, and the MA order.


A specification of the seasonal part of the ARIMA model, plus the period (which defaults to frequency(x)). This may be a list with components order and period, or just a numeric vector of length 3 which specifies the seasonal order. In the latter case the default period is used.


Optionally, a vector or matrix of external regressors, which must have the same number of rows as x.


Should the ARMA model include a mean/intercept term? The default is TRUE for undifferenced series, and it is ignored for ARIMA models with differencing.

logical; if true, the AR parameters are transformed to ensure that they remain in the region of stationarity. Not used for method = "CSS". For method = "ML", it has been advantageous to set transform.pars = FALSE in some cases, see also fixed.


optional numeric vector of the same length as the total number of coefficients to be estimated. It should be of the form

$$(\phi_1, \ldots, \phi_p, \theta_1, \ldots, \theta_q, \Phi_1, \ldots, \Phi_P, \Theta_1, \ldots, \Theta_Q, \mu),$$

where $\phi_i$ are the AR coefficients, $\theta_i$ are the MA coefficients, $\Phi_i$ are the seasonal AR coefficients, $\Theta_i$ are the seasonal MA coefficients and $\mu$ is the intercept term. Note that the $\mu$ entry is required if and only if include.mean is TRUE. In particular it should not be present if the model is an ARIMA model with differencing.

The entries of the fixed vector should consist of the values at which the user wishes to “fix” the corresponding coefficient, or NA if that coefficient should not be fixed, but estimated.

The argument transform.pars will be set to FALSE if any AR parameters are fixed. A warning will be given if transform.pars is set to (or left at its default) TRUE. It may be wise to set transform.pars = FALSE even when fixing MA parameters, especially at values that cause the model to be nearly non-invertible.


optional numeric vector of initial parameter values. Missing values will be filled in, by zeroes except for regression coefficients. Values already specified in fixed will be ignored.


fitting method: maximum likelihood or minimize conditional sum-of-squares. The default (unless there are missing values) is to use conditional-sum-of-squares to find starting values, then maximum likelihood. Can be abbreviated.


only used if fitting by conditional-sum-of-squares: the number of initial observations to ignore. It will be ignored if less than the maximum lag of an AR term.


a string specifying the algorithm to compute the state-space initialization of the likelihood; see KalmanLike for details. Can be abbreviated.


The value passed as the method argument to optim.


List of control parameters for optim.


the prior variance (as a multiple of the innovations variance) for the past observations in a differenced model. Do not reduce this.


Different definitions of ARMA models have different signs for the AR and/or MA coefficients. The definition used here has

$$X_t = a_1 X_{t-1} + \cdots + a_p X_{t-p} + e_t + b_1 e_{t-1} + \cdots + b_q e_{t-q}$$

and so the MA coefficients differ in sign from those used in documentation written for S-PLUS. Further, if include.mean is true (the default for an ARMA model), this formula applies to $X - m$ rather than $X$. For ARIMA models with differencing, the differenced series follows a zero-mean ARMA model. If an xreg term is included, a linear regression (with a constant term if include.mean is true and there is no differencing) is fitted with an ARMA model for the error term.
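
The sign convention can be checked numerically. As a small sketch: under the definition above, an MA(1) process with coefficient b_1 has lag-one autocorrelation b_1 / (1 + b_1^2), so a positive b_1 implies a positive lag-one autocorrelation. The value 0.5 is an arbitrary illustration.

```r
## MA(1) under the convention X_t = e_t + b_1 e_{t-1}
b1 <- 0.5                              # illustrative value only
rho <- ARMAacf(ma = b1, lag.max = 1)   # theoretical ACF at lags 0 and 1
rho

## Closed form for the lag-one autocorrelation: b1 / (1 + b1^2)
all.equal(unname(rho[2]), b1 / (1 + b1^2))
```

Under the opposite sign convention the lag-one autocorrelation for the same numeric coefficient would be negative.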

The variance matrix of the estimates is found from the Hessian of the log-likelihood, and so may only be a rough guide.

Optimization is done by optim. It will work best if the columns in xreg are roughly scaled to zero mean and unit variance, but does attempt to estimate suitable scalings.


A list of class "Arima" with components:


a vector of AR, MA and regression coefficients, which can be extracted by the coef method.


the MLE of the innovations variance.


the estimated variance matrix of the coefficients coef, which can be extracted by the vcov method.


the maximized log-likelihood (of the differenced data), or the approximation to it used.


A compact form of the specification, as a vector giving the number of AR, MA, seasonal AR and seasonal MA coefficients, plus the period and the number of non-seasonal and seasonal differences.


the AIC value corresponding to the log-likelihood. Only valid for method = "ML" fits.


the fitted innovations.


the matched call.


the name of the series x.


the convergence value returned by optim.


the number of initial observations not used in the fitting.


the number of “used” observations for the fitting, can also be extracted via nobs() and is used by BIC.


A list representing the Kalman filter used in the fitting. See KalmanLike.
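
As a brief sketch of how these components are typically reached, the following fits an AR(1) model to the built-in lh series and pulls out the main pieces. The object name fit is just illustrative.

```r
fit <- arima(lh, order = c(1, 0, 0))   # lh is a built-in data set

coef(fit)                # AR coefficient and intercept
sqrt(diag(vcov(fit)))    # approximate standard errors (from the Hessian)
fit$sigma2               # MLE of the innovations variance
nobs(fit)                # number of "used" observations

## For an ML fit the AIC counts the coefficients plus the innovations variance
all.equal(fit$aic, -2 * fit$loglik + 2 * (length(coef(fit)) + 1))
```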

Fitting methods

The exact likelihood is computed via a state-space representation of the ARIMA process, and the innovations and their variance found by a Kalman filter. The initialization of the differenced ARMA process uses stationarity and is based on Gardner et al. (1980). For a differenced process the non-stationary components are given a diffuse prior (controlled by kappa). Observations which are still controlled by the diffuse prior (determined by having a Kalman gain of at least 1e4) are excluded from the likelihood calculations. (This gives comparable results to arima0 in the absence of missing values, when the observations excluded are precisely those dropped by the differencing.)

Missing values are allowed, and are handled exactly in method "ML".

If transform.pars is true, the optimization is done using an alternative parametrization which is a variation on that suggested by Jones (1980) and ensures that the model is stationary. For an AR(p) model the parametrization is via the inverse tanh of the partial autocorrelations: the same procedure is applied (separately) to the AR and seasonal AR terms. The MA terms are not constrained to be invertible during optimization, but they will be converted to invertible form after optimization if transform.pars is true.
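
The idea behind this parametrization can be sketched in a few lines of R. This is an illustration only, not the internal code used by arima: the helper name u_to_ar and the input values are hypothetical. Unconstrained values are mapped through tanh to partial autocorrelations in (-1, 1), which the Durbin-Levinson recursion then turns into AR coefficients that are stationary by construction.

```r
## Hypothetical helper: unconstrained parameters -> stationary AR coefficients
u_to_ar <- function(u) {
  phi <- tanh(u)                  # partial autocorrelations, each in (-1, 1)
  ar <- phi
  for (j in seq_along(phi)[-1]) { # Durbin-Levinson step from order j-1 to j
    prev <- ar[seq_len(j - 1)]
    ar[seq_len(j - 1)] <- prev - phi[j] * rev(prev)
  }
  ar
}

u  <- c(0.5, -1, 0.3)             # arbitrary unconstrained values
ar <- u_to_ar(u)

## All roots of the AR polynomial lie outside the unit circle
min(Mod(polyroot(c(1, -ar)))) > 1

## The theoretical PACF of the resulting AR(3) recovers tanh(u)
all.equal(as.numeric(ARMAacf(ar = ar, lag.max = 3, pacf = TRUE)), tanh(u))
```

Because tanh maps the whole real line into (-1, 1), an unconstrained optimizer can search over u without ever leaving the stationary region.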

Conditional sum-of-squares is provided mainly for expositional purposes. This computes the sum of squares of the fitted innovations from observation n.cond on, (where n.cond is at least the maximum lag of an AR term), treating all earlier innovations to be zero. Argument n.cond can be used to allow comparability between different fits. The ‘part log-likelihood’ is the first term, half the log of the estimated mean square. Missing values are allowed, but will cause many of the innovations to be missing.

When regressors are specified, they are orthogonalized prior to fitting unless any of the coefficients is fixed. It can be helpful to roughly scale the regressors to zero mean and unit variance.


arima is very similar to arima0 for ARMA models or for differenced models without missing values, but handles differenced models with missing values exactly. It is somewhat slower than arima0, particularly for seasonally differenced models.


Brockwell, P. J. and Davis, R. A. (1996). Introduction to Time Series and Forecasting. Springer, New York. Sections 3.3 and 8.3.

Durbin, J. and Koopman, S. J. (2001). Time Series Analysis by State Space Methods. Oxford University Press.

Gardner, G, Harvey, A. C. and Phillips, G. D. A. (1980). Algorithm AS 154: An algorithm for exact maximum likelihood estimation of autoregressive-moving average models by means of Kalman filtering. Applied Statistics, 29, 311–322. doi:10.2307/2346910.

Harvey, A. C. (1993). Time Series Models. 2nd Edition. Harvester Wheatsheaf. Sections 3.3 and 4.4.

Jones, R. H. (1980). Maximum likelihood fitting of ARMA models to time series with missing observations. Technometrics, 22, 389–395. doi:10.2307/1268324.

Ripley, B. D. (2002). “Time series in R 1.5.0”. R News, 2(2), 2–7.

See Also

predict.Arima, arima.sim for simulating from an ARIMA model, tsdiag, arima0, ar


arima(lh, order = c(1,0,0))
arima(lh, order = c(3,0,0))
arima(lh, order = c(1,0,1))

arima(lh, order = c(3,0,0), method = "CSS")

arima(USAccDeaths, order = c(0,1,1), seasonal = list(order = c(0,1,1)))
arima(USAccDeaths, order = c(0,1,1), seasonal = list(order = c(0,1,1)),
      method = "CSS") # drops first 13 observations.
# for a model with as few years as this, we want full ML

arima(LakeHuron, order = c(2,0,0), xreg = time(LakeHuron) - 1920)

## presidents contains NAs
## graphs in example(acf) suggest order 1 or 3
(fit1 <- arima(presidents, c(1, 0, 0)))
(fit3 <- arima(presidents, c(3, 0, 0)))  # smaller AIC
BIC(fit1, fit3)
## compare a whole set of models; BIC() would choose the smallest
AIC(fit1, arima(presidents, c(2,0,0)),
          arima(presidents, c(2,0,1)), # <- chosen (barely) by AIC
    fit3, arima(presidents, c(3,0,1)))

## An example of using the  'fixed'  argument:
## Note that the period of the seasonal component is taken to be
## frequency(presidents), i.e. 4.
(fitSfx <- arima(presidents, order=c(2,0,1), seasonal=c(1,0,0),
                 fixed=c(NA, NA, 0.5, -0.1, 50), transform.pars=FALSE))
## The partly-fixed & smaller model seems better (as we "knew too much"):
AIC(fitSfx, arima(presidents, order=c(2,0,1), seasonal=c(1,0,0)))

## An example of ARIMA forecasting:
predict(fit3, 3)

Simulate from an ARIMA Model


Simulate from an ARIMA model.


arima.sim(model, n, rand.gen = rnorm, innov = rand.gen(n, ...),
          n.start = NA, start.innov = rand.gen(n.start, ...),
          ...)



A list with component ar and/or ma giving the AR and MA coefficients respectively. Optionally a component order can be used. An empty list gives an ARIMA(0, 0, 0) model, that is white noise.


length of output series, before un-differencing. A strictly positive integer.


optional: a function to generate the innovations.


an optional time series of innovations. If not provided, rand.gen is used.


length of ‘burn-in’ period. If NA, the default, a reasonable value is computed.


an optional time series of innovations to be used for the burn-in period. If supplied there must be at least n.start values (and n.start is by default computed inside the function).


additional arguments for rand.gen. Most usefully, the standard deviation of the innovations generated by rnorm can be specified by sd.


See arima for the precise definition of an ARIMA model.

The ARMA model is checked for stationarity.
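
A minimal sketch of what such a check amounts to: the AR part is stationary when all roots of the AR polynomial lie outside the unit circle. The coefficients are taken from the examples on this page; the value 1.1 is an arbitrary non-stationary choice.

```r
ar <- c(0.8897, -0.4858)

## Stationary if all roots of 1 - a_1 z - ... - a_p z^p have modulus > 1
min(Mod(polyroot(c(1, -ar)))) > 1

## A non-stationary AR(1) coefficient makes arima.sim() signal an error
res <- tryCatch(arima.sim(list(ar = 1.1), n = 10), error = function(e) e)
inherits(res, "error")
```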

ARIMA models are specified via the order component of model, in the same way as for arima. Other aspects of the order component are ignored, but inconsistent specifications of the MA and AR orders are detected. The un-differencing assumes previous values of zero, and to remind the user of this, those values are returned.
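
The effect of un-differencing on the returned series can be seen in a short sketch. The seed is arbitrary and is set only for reproducibility.

```r
set.seed(123)   # arbitrary seed
x <- arima.sim(list(order = c(1, 1, 0), ar = 0.7), n = 200)

length(x)       # n plus one starting value for the single difference
x[1]            # the assumed previous value of zero, returned as a reminder
dx <- diff(x)   # the underlying AR(1) series, of length n
length(dx)
```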

Random inputs for the ‘burn-in’ period are generated by calling rand.gen.


A time-series object of class "ts".

See Also

arima




arima.sim(n = 63, list(ar = c(0.8897, -0.4858), ma = c(-0.2279, 0.2488)),
          sd = sqrt(0.1796))
# mildly long-tailed
arima.sim(n = 63, list(ar = c(0.8897, -0.4858), ma = c(-0.2279, 0.2488)),
          rand.gen = function(n, ...) sqrt(0.1796) * rt(n, df = 5))

# An ARIMA simulation
ts.sim <- arima.sim(list(order = c(1,1,0), ar = 0.7), n = 200)

ARIMA Modelling of Time Series – Preliminary Version


Fit an ARIMA model to a univariate time series, and forecast from the fitted model.


arima0(x, order = c(0, 0, 0),
       seasonal = list(order = c(0, 0, 0), period = NA),
       xreg = NULL, include.mean = TRUE, delta = 0.01,
       transform.pars = TRUE, fixed = NULL, init = NULL,
       method = c("ML", "CSS"), n.cond, optim.control = list())

## S3 method for class 'arima0'
predict(object, n.ahead = 1, newxreg, se.fit = TRUE, ...)



a univariate time series


A specification of the non-seasonal part of the ARIMA model: the three components $(p, d, q)$ are the AR order, the degree of differencing, and the MA order.


A specification of the seasonal part of the ARIMA model, plus the period (which defaults to frequency(x)). This should be a list with components order and period, but a specification of just a numeric vector of length 3 will be turned into a suitable list with the specification as the order.


Optionally, a vector or matrix of external regressors, which must have the same number of rows as x.


Should the ARIMA model include a mean term? The default is TRUE for undifferenced series, FALSE for differenced ones (where a mean would not affect the fit nor predictions).


A value to indicate at which point ‘fast recursions’ should be used. See the ‘Details’ section.

Logical. If true, the AR parameters are transformed to ensure that they remain in the region of stationarity. Not used for method = "CSS".


optional numeric vector of the same length as the total number of parameters. If supplied, only NA entries in fixed will be varied. transform.pars = TRUE will be overridden (with a warning) if any ARMA parameters are fixed.


optional numeric vector of initial parameter values. Missing values will be filled in, by zeroes except for regression coefficients. Values already specified in fixed will be ignored.


Fitting method: maximum likelihood or minimize conditional sum-of-squares. Can be abbreviated.


Only used if fitting by conditional-sum-of-squares: the number of initial observations to ignore. It will be ignored if less than the maximum lag of an AR term.


List of control parameters for optim.


The result of an arima0 fit.


New values of xreg to be used for prediction. Must have at least n.ahead rows.


The number of steps ahead for which prediction is required.

Logical: should standard errors of prediction be returned?


arguments passed to or from other methods.


Different definitions of ARMA models have different signs for the AR and/or MA coefficients. The definition here has

$$X_t = a_1 X_{t-1} + \cdots + a_p X_{t-p} + e_t + b_1 e_{t-1} + \cdots + b_q e_{t-q}$$

and so the MA coefficients differ in sign from those given by S-PLUS. Further, if include.mean is true, this formula applies to $X - m$ rather than $X$. For ARIMA models with differencing, the differenced series follows a zero-mean ARMA model.

The variance matrix of the estimates is found from the Hessian of the log-likelihood, and so may only be a rough guide, especially for fits close to the boundary of invertibility.

Optimization is done by optim. It will work best if the columns in xreg are roughly scaled to zero mean and unit variance, but does attempt to estimate suitable scalings.

Finite-history prediction is used. This is only statistically efficient if the MA part of the fit is invertible, so predict.arima0 will give a warning for non-invertible MA models.


For arima0, a list of class "arima0" with components:


a vector of AR, MA and regression coefficients,


the MLE of the innovations variance.


the estimated variance matrix of the coefficients coef.


the maximized log-likelihood (of the differenced data), or the approximation to it used.


A compact form of the specification, as a vector giving the number of AR, MA, seasonal AR and seasonal MA coefficients, plus the period and the number of non-seasonal and seasonal differences.


the AIC value corresponding to the log-likelihood. Only valid for method = "ML" fits.


the fitted innovations.


the matched call.


the name of the series x.


the value returned by optim.


the number of initial observations not used in the fitting.

For predict.arima0, a time series of predictions, or if se.fit = TRUE, a list with components pred, the predictions, and se, the estimated standard errors. Both components are time series.
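
A short sketch of the two return shapes of predict.arima0, using the built-in lh series. The 1.96 multiplier assumes approximately normal forecast errors and is only illustrative.

```r
fit <- arima0(lh, order = c(3, 0, 0))

fc <- predict(fit, n.ahead = 6)    # standard errors are returned by default
names(fc)                          # components pred and se

## Rough 95% limits under an approximate normality assumption
cbind(lower = fc$pred - 1.96 * fc$se,
      upper = fc$pred + 1.96 * fc$se)

## Predictions only, returned as a plain time series
p_only <- predict(fit, n.ahead = 6, se.fit = FALSE)
```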

Fitting methods

The exact likelihood is computed via a state-space representation of the ARMA process, and the innovations and their variance found by a Kalman filter based on Gardner et al. (1980). This has the option to switch to ‘fast recursions’ (assume an effectively infinite past) if the innovations variance is close enough to its asymptotic bound. The argument delta sets the tolerance: at its default value the approximation is normally negligible and the speed-up considerable. Exact computations can be ensured by setting delta to a negative value.

If transform.pars is true, the optimization is done using an alternative parametrization which is a variation on that suggested by Jones (1980) and ensures that the model is stationary. For an AR(p) model the parametrization is via the inverse tanh of the partial autocorrelations: the same procedure is applied (separately) to the AR and seasonal AR terms. The MA terms are also constrained to be invertible during optimization by the same transformation if transform.pars is true. Note that the MLE for MA terms does sometimes occur for MA polynomials with unit roots: such models can be fitted by using transform.pars = FALSE and specifying a good set of initial values (often obtainable from a fit with transform.pars = TRUE).

Missing values are allowed, but any missing values will force delta to be ignored and full recursions used. Note that missing values will be propagated by differencing, so the procedure used in this function is not fully efficient in that case.

Conditional sum-of-squares is provided mainly for expositional purposes. This computes the sum of squares of the fitted innovations from observation n.cond on, (where n.cond is at least the maximum lag of an AR term), treating all earlier innovations to be zero. Argument n.cond can be used to allow comparability between different fits. The ‘part log-likelihood’ is the first term, half the log of the estimated mean square. Missing values are allowed, but will cause many of the innovations to be missing.

When regressors are specified, they are orthogonalized prior to fitting unless any of the coefficients is fixed. It can be helpful to roughly scale the regressors to zero mean and unit variance.


This is a preliminary version, and will be replaced by arima.

The standard errors of prediction exclude the uncertainty in the estimation of the ARMA model and the regression coefficients.

The results are likely to be different from S-PLUS's arima.mle, which computes a conditional likelihood and does not include a mean in the model. Further, the convention used by arima.mle reverses the signs of the MA coefficients.


Brockwell, P. J. and Davis, R. A. (1996). Introduction to Time Series and Forecasting. Springer, New York. Sections 3.3 and 8.3.

Gardner, G, Harvey, A. C. and Phillips, G. D. A. (1980). Algorithm AS 154: An algorithm for exact maximum likelihood estimation of autoregressive-moving average models by means of Kalman filtering. Applied Statistics, 29, 311–322. doi:10.2307/2346910.

Harvey, A. C. (1993). Time Series Models. 2nd Edition. Harvester Wheatsheaf. Sections 3.3 and 4.4.

Harvey, A. C. and McKenzie, C. R. (1982). Algorithm AS 182: An algorithm for finite sample prediction from ARIMA processes. Applied Statistics, 31, 180–187. doi:10.2307/2347987.

Jones, R. H. (1980). Maximum likelihood fitting of ARMA models to time series with missing observations. Technometrics, 22, 389–395. doi:10.2307/1268324.

See Also

arima, ar, tsdiag


## Not run: arima0(lh, order = c(1,0,0))
arima0(lh, order = c(3,0,0))
arima0(lh, order = c(1,0,1))
predict(arima0(lh, order = c(3,0,0)), n.ahead = 12)

arima0(lh, order = c(3,0,0), method = "CSS")

# for a model with as few years as this, we want full ML
(fit <- arima0(USAccDeaths, order = c(0,1,1),
               seasonal = list(order=c(0,1,1)), delta = -1))
predict(fit, n.ahead = 6)

arima0(LakeHuron, order = c(2,0,0), xreg = time(LakeHuron)-1920)
## Not run: 
## presidents contains NAs
## graphs in example(acf) suggest order 1 or 3
(fit1 <- arima0(presidents, c(1, 0, 0), delta = -1))  # avoid warning
(fit3 <- arima0(presidents, c(3, 0, 0), delta = -1))  # smaller AIC
## End(Not run)

Compute Theoretical ACF for an ARMA Process


Compute the theoretical autocorrelation function or partial autocorrelation function for an ARMA process.


ARMAacf(ar = numeric(), ma = numeric(), lag.max = r, pacf = FALSE)



numeric vector of AR coefficients


numeric vector of MA coefficients


integer. Maximum lag required. Defaults to max(p, q+1), where p, q are the numbers of AR and MA terms respectively.


logical. Should the partial autocorrelations be returned?


The methods used follow Brockwell & Davis (1991, section 3.3). Their equations (3.3.8) are solved for the autocovariances at lags $0, \dots, \max(p, q+1)$, and the remaining autocorrelations are given by a recursive filter.
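
As a simple sanity check on the output, the theoretical autocorrelation of an AR(1) process with coefficient phi is phi^k at lag k. The value 0.8 below is an arbitrary illustration.

```r
phi <- 0.8                               # illustrative AR(1) coefficient
acf_ar1 <- ARMAacf(ar = phi, lag.max = 5)
acf_ar1                                  # named by the lags 0, ..., 5

## Agrees with the closed form phi^k
all.equal(unname(acf_ar1), phi^(0:5))
```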


A vector of (partial) autocorrelations, named by the lags.


Brockwell, P. J. and Davis, R. A. (1991) Time Series: Theory and Methods, Second Edition. Springer.

See Also

arima, ARMAtoMA, acf2AR for inverting part of ARMAacf; further filter.


ARMAacf(c(1.0, -0.25), 1.0, lag.max = 10)

## Example from Brockwell & Davis (1991, pp.92-4)
## answer: 2^(-n) * (32/3 + 8 * n) /(32/3)
n <- 1:10
a.n <- 2^(-n) * (32/3 + 8 * n) /(32/3)
(A.n <- ARMAacf(c(1.0, -0.25), 1.0, lag.max = 10))
stopifnot(all.equal(unname(A.n), c(1, a.n)))

ARMAacf(c(1.0, -0.25), 1.0, lag.max = 10, pacf = TRUE)
zapsmall(ARMAacf(c(1.0, -0.25), lag.max = 10, pacf = TRUE))

## Cov-Matrix of length-7 sub-sample of AR(1) example:
toeplitz(ARMAacf(0.8, lag.max = 7))

Convert ARMA Process to Infinite MA Process


Convert ARMA process to infinite MA process.


ARMAtoMA(ar = numeric(), ma = numeric(), lag.max)



numeric vector of AR coefficients


numeric vector of MA coefficients


Largest MA(Inf) coefficient required.


A vector of coefficients.
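
The coefficients satisfy the standard recursion psi_j = theta_j + sum over k of phi_k psi_(j-k), with psi_0 = 1 and theta_j = 0 for j > q. The following is a plain R sketch of that recursion for comparison; the helper name psi_weights is hypothetical, and this is not the implementation used by ARMAtoMA.

```r
## Hypothetical helper implementing the psi-weight recursion directly
psi_weights <- function(ar = numeric(), ma = numeric(), lag.max) {
  p <- length(ar); q <- length(ma)
  psi <- numeric(lag.max)
  for (j in seq_len(lag.max)) {
    acc <- if (j <= q) ma[j] else 0
    for (k in seq_len(min(j, p))) {
      prev <- if (j == k) 1 else psi[j - k]   # psi_0 is 1
      acc <- acc + ar[k] * prev
    }
    psi[j] <- acc
  }
  psi
}

all.equal(psi_weights(c(1.0, -0.25), 1.0, 10),
          ARMAtoMA(c(1.0, -0.25), 1.0, 10))
```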


Brockwell, P. J. and Davis, R. A. (1991) Time Series: Theory and Methods, Second Edition. Springer.

See Also

arima, ARMAacf.


ARMAtoMA(c(1.0, -0.25), 1.0, 10)
## Example from Brockwell & Davis (1991, p.92)
## answer (1 + 3*n)*2^(-n)
n <- 1:10; (1 + 3*n)*2^(-n)

Convert Objects to Class "hclust"


Converts objects from other hierarchical clustering functions to class "hclust".


as.hclust(x, ...)



Hierarchical clustering object


further arguments passed to or from other methods.


Currently there is only support for converting objects of class "twins" as produced by the functions diana and agnes from the package cluster. The default method throws an error unless passed an "hclust" object.
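
The behaviour of the default method can be illustrated without the cluster package. This is a small sketch with arbitrary simulated data.

```r
set.seed(1)                        # arbitrary seed
x <- matrix(rnorm(30), ncol = 3)
hc <- hclust(dist(x))

## An "hclust" object is returned as is
identical(as.hclust(hc), hc)

## Anything else without a method signals an error
res <- tryCatch(as.hclust(1:3), error = function(e) e)
inherits(res, "error")
```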


An object of class "hclust".

See Also

hclust, and from package cluster, diana and agnes


x <- matrix(rnorm(30), ncol = 3)
hc <- hclust(dist(x), method = "complete")

if(require("cluster", quietly = TRUE)) {# is a recommended package
  ag <- agnes(x, method = "complete")
  hcag <- as.hclust(ag)
  ## The dendrograms order slightly differently:
  op <- par(mfrow = c(1,2))
  plot(hc) ;  mtext("hclust", side = 1)
  plot(hcag); mtext("agnes",  side = 1)
  par(op)  # restore the previous graphics settings
}

Convert to One-Sided Formula


Names, calls, expressions (first element), numeric values, and character strings are converted to one-sided formulae associated with the global environment. If the input is a formula, it must be one-sided, in which case it is returned unaltered.


asOneSidedFormula(object)





a one-sided formula, name, call, expression, numeric value, or character string.


a one-sided formula representing object


José Pinheiro and Douglas Bates

See Also



(form <- asOneSidedFormula("age"))
stopifnot(exprs = {
    identical(form, asOneSidedFormula(form))
    identical(form, asOneSidedFormula(as.name("age")))
    identical(form, asOneSidedFormula(expression(age)))
})

Group Averages Over Level Combinations of Factors


Subsets of x[] are averaged, where each subset consists of those observations with the same factor levels.


ave(x, ..., FUN = mean)



A numeric.


Grouping variables, typically factors, all of the same length as x.


Function to apply for each factor level combination.


A numeric vector, say y, of length length(x). If ... is g1, g2, for example, then y[i] is equal to FUN(x[j]) taken over all j with g1[j] == g1[i] and g2[j] == g2[i].
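
A tiny worked case may help. The data are made up for illustration; the comparison with tapply shows that ave returns the group summaries expanded back to the length of x.

```r
x <- c(1, 2, 3, 4, 10, 20)                       # illustrative data
g <- factor(c("a", "a", "b", "b", "c", "c"))

y <- ave(x, g)     # each element replaced by the mean of its group
y

## Same values as the per-group means looked up for every observation
all.equal(y, unname(tapply(x, g, mean)[as.character(g)]))
```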

See Also

mean, median.



ave(1:3)  # no grouping -> grand mean

attach(warpbreaks)  # makes breaks, wool and tension visible by name
ave(breaks, wool)
ave(breaks, tension)
ave(breaks, tension, FUN = function(x) mean(x, trim = 0.1))
plot(breaks, main =
     "ave( Warpbreaks )  for   wool  x  tension  combinations")
lines(ave(breaks, wool, tension              ), type = "s", col = "blue")
lines(ave(breaks, wool, tension, FUN = median), type = "s", col = "green")
legend(40, 70, c("mean", "median"), lty = 1,
      col = c("blue","green"), bg = "gray90")
detach()

Bandwidth Selectors for Kernel Density Estimation


Bandwidth selectors for Gaussian kernels in density.


bw.nrd0(x)

bw.nrd(x)




bw.ucv(x, nb = 1000, lower = 0.1 * hmax, upper = hmax,
       tol = 0.1 * lower)

bw.bcv(x, nb = 1000, lower = 0.1 * hmax, upper = hmax,
       tol = 0.1 * lower)

bw.SJ(x, nb = 1000, lower = 0.1 * hmax, upper = hmax,
      method = c("ste", "dpi"), tol = 0.1 * lower)



numeric vector.


number of bins to use.

lower, upper

range over which to minimize. The default is almost always satisfactory. hmax is calculated internally from a normal reference bandwidth.


either "ste" ("solve-the-equation") or "dpi" ("direct plug-in"). Can be abbreviated.


for method "ste", the convergence tolerance for uniroot. The default leads to bandwidth estimates with only slightly more than one digit accuracy, which is sufficient for practical density estimation, but possibly not for theoretical simulation studies.


bw.nrd0 implements a rule-of-thumb for choosing the bandwidth of a Gaussian kernel density estimator. It defaults to 0.9 times the minimum of the standard deviation and the interquartile range divided by 1.34, multiplied by the sample size to the negative one-fifth power (Silverman's ‘rule of thumb’, Silverman (1986, page 48, eqn (3.31))), unless the quartiles coincide, in which case a positive result is still guaranteed.

bw.nrd is the more common variation given by Scott (1992), using factor 1.06.
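
Both rules can be written out directly. The sketch below spells out the formula for the ordinary case in which the quartiles do not coincide, using the built-in precip data; the variable names are illustrative.

```r
x <- precip                               # built-in data set
n <- length(x)
spread <- min(sd(x), IQR(x) / 1.34)       # the smaller of the two scale estimates

h_nrd0 <- 0.9  * spread * n^(-1/5)        # Silverman's rule of thumb
h_nrd  <- 1.06 * spread * n^(-1/5)        # Scott's variation

all.equal(h_nrd0, as.numeric(bw.nrd0(x)))
all.equal(h_nrd,  as.numeric(bw.nrd(x)))
```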

bw.ucv and bw.bcv implement unbiased and biased cross-validation respectively.

bw.SJ implements the methods of Sheather & Jones (1991) to select the bandwidth using pilot estimation of derivatives.
The algorithm for method "ste" solves an equation (via uniroot) and because of that, enlarges the interval c(lower, upper) when the boundaries were not user-specified and do not bracket the root.

The last three methods use all pairwise binned distances: they are of complexity $O(n^2)$ up to n = nb/2 and $O(n)$ thereafter. Because of the binning, the results differ slightly when x is translated or sign-flipped.


A bandwidth on a scale suitable for the bw argument of density.


Long vectors x are not supported. Neither does density support them, and for more than a few thousand points a histogram would be preferred to kernel density estimation.


B. D. Ripley, taken from early versions of package MASS.


Scott, D. W. (1992) Multivariate Density Estimation: Theory, Practice, and Visualization. New York: Wiley.

Sheather, S. J. and Jones, M. C. (1991). A reliable data-based bandwidth selection method for kernel density estimation. Journal of the Royal Statistical Society Series B, 53, 683–690. doi:10.1111/j.2517-6161.1991.tb01857.x.

Silverman, B. W. (1986). Density Estimation. London: Chapman and Hall.

Venables, W. N. and Ripley, B. D. (2002). Modern Applied Statistics with S. Springer.

See Also


bandwidth.nrd, ucv, bcv and width.SJ in package MASS, which are all scaled to the width argument of density and so give answers four times as large.



plot(density(precip, n = 1000))
lines(density(precip, bw = "nrd"), col = 2)
lines(density(precip, bw = "ucv"), col = 3)
lines(density(precip, bw = "bcv"), col = 4)
lines(density(precip, bw = "SJ-ste"), col = 5)
lines(density(precip, bw = "SJ-dpi"), col = 6)
legend(55, 0.035,
       legend = c("nrd0", "nrd", "ucv", "bcv", "SJ-ste", "SJ-dpi"),
       col = 1:6, lty = 1)

Bartlett Test of Homogeneity of Variances


Performs Bartlett's test of the null that the variances in each of the groups (samples) are the same.


bartlett.test(x, ...)

## Default S3 method:
bartlett.test(x, g, ...)

## S3 method for class 'formula'
bartlett.test(formula, data, subset, na.action, ...)



a numeric vector of data values, or a list of numeric data vectors representing the respective samples, or fitted linear model objects (inheriting from class "lm").


a vector or factor object giving the group for the corresponding elements of x. Ignored if x is a list.


a formula of the form lhs ~ rhs where lhs gives the data values and rhs the corresponding groups.


an optional matrix or data frame (or similar: see model.frame) containing the variables in the formula formula. By default the variables are taken from environment(formula).


an optional vector specifying a subset of observations to be used.


a function which indicates what should happen when the data contain NAs. Defaults to getOption("na.action").


further arguments to be passed to or from methods.


If x is a list, its elements are taken as the samples or fitted linear models to be compared for homogeneity of variances. In this case, the elements must either all be numeric data vectors or fitted linear model objects, g is ignored, and one can simply use bartlett.test(x) to perform the test. If the samples are not yet contained in a list, use bartlett.test(list(x, ...)).

Otherwise, x must be a numeric data vector, and g must be a vector or factor object of the same length as x giving the group for the corresponding elements of x.
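
The two calling conventions give the same test. As a short sketch with the built-in InsectSprays data, splitting the counts by group and passing the resulting list is equivalent to passing the vector together with its grouping factor.

```r
by_vec  <- bartlett.test(InsectSprays$count, InsectSprays$spray)
by_list <- bartlett.test(split(InsectSprays$count, InsectSprays$spray))

all.equal(by_vec$statistic, by_list$statistic)
all.equal(by_vec$p.value,   by_list$p.value)
by_vec$parameter            # degrees of freedom: number of groups minus one
```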


A list of class "htest" containing the following components:


Bartlett's K-squared test statistic.


the degrees of freedom of the approximate chi-squared distribution of the test statistic.


the p-value of the test.


the character string "Bartlett test of homogeneity of variances".

a character string giving the names of the data.


Bartlett, M. S. (1937). Properties of sufficiency and statistical tests. Proceedings of the Royal Society of London Series A 160, 268–282. doi:10.1098/rspa.1937.0109.

See Also

var.test for the special case of comparing variances in two samples from normal distributions; fligner.test for a rank-based (nonparametric) $k$-sample test for homogeneity of variances; ansari.test and mood.test for two rank-based two-sample tests for difference in scale.



plot(count ~ spray, data = InsectSprays)
bartlett.test(InsectSprays$count, InsectSprays$spray)
bartlett.test(count ~ spray, data = InsectSprays)

The Beta Distribution


Density, distribution function, quantile function and random generation for the Beta distribution with parameters shape1 and shape2 (and optional non-centrality parameter ncp).


dbeta(x, shape1, shape2, ncp = 0, log = FALSE)
pbeta(q, shape1, shape2, ncp = 0, lower.tail = TRUE, log.p = FALSE)
qbeta(p, shape1, shape2, ncp = 0, lower.tail = TRUE, log.p = FALSE)
rbeta(n, shape1, shape2, ncp = 0)


x, q

vector of quantiles.


vector of probabilities.


number of observations. If length(n) > 1, the length is taken to be the number required.

shape1, shape2

non-negative parameters of the Beta distribution.


non-centrality parameter.

log, log.p

logical; if TRUE, probabilities p are given as log(p).


logical; if TRUE (default), probabilities are $P[X \le x]$, otherwise, $P[X > x]$.


The Beta distribution with parameters shape1 $= a$ and shape2 $= b$ has density

$$f(x) = \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)} x^{a-1} (1-x)^{b-1}$$

for $a > 0$, $b > 0$ and $0 \le x \le 1$ where the boundary values at $x=0$ or $x=1$ are defined by continuity (as limits).
The mean is $a/(a+b)$ and the variance is $ab/((a+b)^2 (a+b+1))$. If $a, b > 1$ (or one of them $= 1$), the mode is $(a-1)/(a+b-2)$. These and all other distributional properties can be defined as limits (leading to point masses at 0, 1/2, or 1) when $a$ or $b$ are zero or infinite, and the corresponding [dpqr]beta() functions are defined correspondingly.
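
These moment formulas can be checked numerically. The sketch below uses arbitrary shape values and numerical integration of the density, so agreement is only up to the stated tolerances.

```r
a <- 2; b <- 5                                   # illustrative shape parameters

m1 <- integrate(function(x) x   * dbeta(x, a, b), 0, 1)$value
m2 <- integrate(function(x) x^2 * dbeta(x, a, b), 0, 1)$value

all.equal(m1, a / (a + b), tolerance = 1e-6)                       # mean
all.equal(m2 - m1^2,
          a * b / ((a + b)^2 * (a + b + 1)), tolerance = 1e-6)     # variance

## Mode located by maximizing the density numerically
mode_num <- optimize(function(x) dbeta(x, a, b), c(0, 1),
                     maximum = TRUE, tol = 1e-8)$maximum
all.equal(mode_num, (a - 1) / (a + b - 2), tolerance = 1e-4)
```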

pbeta is closely related to the incomplete beta function. As defined by Abramowitz and Stegun 6.6.1

$$B_x(a,b) = \int_0^x t^{a-1} (1-t)^{b-1} dt,$$

and 6.6.2 $I_x(a,b) = B_x(a,b) / B(a,b)$ where $B(a,b) = B_1(a,b)$ is the Beta function (beta).

$I_x(a,b)$ is pbeta(x, a, b).
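
This relationship can be verified by integrating the definition directly. The values of a, b and x below are arbitrary illustrations.

```r
a <- 2.5; b <- 1.5; x <- 0.3     # illustrative values

## Incomplete beta function B_x(a, b) by numerical integration
B_x <- integrate(function(t) t^(a - 1) * (1 - t)^(b - 1), 0, x)$value

## Regularized version I_x(a, b) agrees with pbeta
all.equal(B_x / beta(a, b), pbeta(x, a, b), tolerance = 1e-6)
```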

The noncentral Beta distribution (with ncp $= \lambda$) is defined (Johnson et al., 1995, p. 502) as the distribution of $X/(X+Y)$ where $X \sim \chi^2_{2a}(\lambda)$ and $Y \sim \chi^2_{2b}$. There, $\chi^2_n(\lambda)$ is the noncentral chi-squared distribution with $n$ degrees of freedom and non-centrality parameter $\lambda$, see Chisquare.
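
A standard consequence of this definition is that the noncentral Beta distribution is a Poisson mixture of central Beta distributions: with Poisson weights of mean ncp/2, the j-th component has first shape parameter a + j. The sketch below checks this numerically for arbitrary parameter values, truncating the infinite sum at 200 terms.

```r
a <- 2; b <- 3; lambda <- 1.5; q <- 0.3   # illustrative values

j <- 0:200                                 # truncation of the infinite sum
mix <- sum(dpois(j, lambda / 2) * pbeta(q, a + j, b))

all.equal(mix, pbeta(q, a, b, ncp = lambda), tolerance = 1e-6)
```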


dbeta gives the density, pbeta the distribution function, qbeta the quantile function, and rbeta generates random deviates.

Invalid arguments will result in return value NaN, with a warning.

The length of the result is determined by n for rbeta, and is the maximum of the lengths of the numerical arguments for the other functions.

The numerical arguments other than n are recycled to the length of the result. Only the first elements of the logical arguments are used.


Supplying ncp = 0 uses the algorithm for the non-central distribution, which is not the same algorithm as when ncp is omitted. This is to give consistent behaviour in extreme cases with values of ncp very near zero.
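
In ordinary cases the two code paths agree to within numerical tolerance, as this small sketch with arbitrary arguments suggests.

```r
p_central <- pbeta(0.3, 2, 3)            # ncp omitted: central algorithm
p_ncp0    <- pbeta(0.3, 2, 3, ncp = 0)   # ncp supplied: non-central algorithm

all.equal(p_central, p_ncp0, tolerance = 1e-6)
```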


  • The central dbeta is based on a binomial probability, using code contributed by Catherine Loader (see dbinom) if either shape parameter is larger than one, otherwise directly from the definition. The non-central case is based on the derivation as a Poisson mixture of betas (Johnson et al., 1995, pp. 502–3).

  • The central pbeta for the default (log.p = FALSE) uses a C translation based on

    Didonato, A. and Morris, A., Jr, (1992) Algorithm 708: Significant digit computation of the incomplete beta function ratios, ACM Transactions on Mathematical Software, 18, 360–373, doi:10.1145/131766.131776. (See also
    Brown, B. and Lawrence Levy, L. (1994) Certification of algorithm 708: Significant digit computation of the incomplete beta, ACM Transactions on Mathematical Software, 20, 393–397, doi:10.1145/192115.192155.)
    We have slightly tweaked the original “TOMS 708” algorithm, and enhanced for log.p = TRUE. For that (log-scale) case, underflow to -Inf (i.e., $P = 0$) or 0 (i.e., $P = 1$) still happens because the original algorithm was designed without log-scale considerations. Underflow to -Inf now typically signals a warning.

  • The non-central pbeta uses a C translation of

    Lenth, R. V. (1987) Algorithm AS 226: Computing noncentral beta probabilities. Applied Statistics, 36, 241–244, doi:10.2307/2347558, incorporating
    Frick, H. (1990)'s AS R84, Applied Statistics, 39, 311–2, doi:10.2307/2347780 and
    Lam, M.L. (1995)'s AS R95, Applied Statistics, 44, 551–2, doi:10.2307/2986147.

    This computes the lower tail only, so the upper tail suffers from cancellation and a warning will be given when this is likely to be significant.

  • The central case of qbeta is based on a C translation of

    Cran, G. W., K. J. Martin and G. E. Thomas (1977). Remark AS R19 and Algorithm AS 109, Applied Statistics, 26, 111–114, doi:10.2307/2346887, and subsequent remarks (AS83 and correction).

    Enhancements, notably for starting values and switching to a log-scale Newton search, by R Core.

  • The central case of rbeta is based on a C translation of

    R. C. H. Cheng (1978). Generating beta variates with nonintegral shape parameters. Communications of the ACM, 21, 317–322.
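The Poisson-mixture representation mentioned above for the non-central dbeta can be checked numerically. The following is a minimal sketch, not a transcript of the C code: the shape parameters, the value of ncp and the truncation point jmax are arbitrary choices made for this illustration.

```r
## Non-central beta density as a Poisson(ncp/2) mixture of central betas.
## a, b, lambda and jmax are illustrative values only.
a <- 2; b <- 3; lambda <- 1.5
x <- seq(0.05, 0.95, by = 0.15)
jmax <- 200   # truncation point; the Poisson weights beyond it are negligible here
j <- 0:jmax
mix <- vapply(x, function(xi)
    sum(dpois(j, lambda / 2) * dbeta(xi, a + j, b)), numeric(1))
all.equal(mix, dbeta(x, a, b, ncp = lambda))
```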


Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.

Abramowitz, M. and Stegun, I. A. (1972) Handbook of Mathematical Functions. New York: Dover. Chapter 6: Gamma and Related Functions.

Johnson, N. L., Kotz, S. and Balakrishnan, N. (1995) Continuous Univariate Distributions, volume 2, especially chapter 25. Wiley, New York.

See Also

Distributions for other standard distributions.

beta for the Beta function.


x <- seq(0, 1, length.out = 21)
dbeta(x, 1, 1)
pbeta(x, 1, 1)

## Visualization, including limit cases:
pl.beta <- function(a,b, asp = if(isLim) 1, ylim = if(isLim) c(0,1.1)) {
  if(isLim <- a == 0 || b == 0 || a == Inf || b == Inf) {
    eps <- 1e-10
    x <- c(0, eps, (1:7)/16, 1/2+c(-eps,0,eps), (9:15)/16, 1-eps, 1)
  } else {
    x <- seq(0, 1, length.out = 1025)
  }
  fx <- cbind(dbeta(x, a,b), pbeta(x, a,b), qbeta(x, a,b))
  f <- fx; f[fx == Inf] <- 1e100
  matplot(x, f, ylab="", type="l", ylim=ylim, asp=asp,
          main = sprintf("[dpq]beta(x, a=%g, b=%g)", a,b))
  abline(0,1,     col="gray", lty=3)
  abline(h = 0:1, col="gray", lty=3)
  legend("top", paste0(c("d","p","q"), "beta(x, a,b)"),
         col=1:3, lty=1:3, bty = "n")
  invisible(cbind(x, fx))
}

pl.beta(2, 4)
pl.beta(3, 7)
pl.beta(3, 7, asp=1)

pl.beta(0, 0)   ## point masses at  {0, 1}

pl.beta(0, 2)   ## point mass at 0 ; the same as
pl.beta(1, Inf)

pl.beta(Inf, 2) ## point mass at 1 ; the same as
pl.beta(3, 0)

pl.beta(Inf, Inf)# point mass at 1/2

Exact Binomial Test


Performs an exact test of a simple null hypothesis about the probability of success in a Bernoulli experiment.


binom.test(x, n, p = 0.5,
           alternative = c("two.sided", "less", "greater"),
           conf.level = 0.95)



x

number of successes, or a vector of length 2 giving the numbers of successes and failures, respectively.


n

number of trials; ignored if x has length 2.


p

hypothesized probability of success.


alternative

indicates the alternative hypothesis and must be one of "two.sided", "greater" or "less". You can specify just the initial letter.


conf.level

confidence level for the returned confidence interval.


Confidence intervals are obtained by a procedure first given in Clopper and Pearson (1934). This guarantees that the confidence level is at least conf.level, but in general does not give the shortest-length confidence intervals.
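The Clopper-Pearson limits are commonly written in terms of beta quantiles. The sketch below assumes that formulation and compares it with the interval reported by binom.test; the counts and the confidence level are example values.

```r
## Clopper-Pearson limits from beta quantiles (illustrative values).
x <- 682; n <- 925; conf.level <- 0.95
alpha <- 1 - conf.level
lower <- if (x == 0) 0 else qbeta(alpha / 2, x, n - x + 1)
upper <- if (x == n) 1 else qbeta(1 - alpha / 2, x + 1, n - x)
ci <- binom.test(x, n, conf.level = conf.level)$conf.int
all.equal(c(lower, upper), as.numeric(ci))   # as.numeric() drops the conf.level attribute
```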


A list with class "htest" containing the following components:


statistic

the number of successes.


parameter

the number of trials.


p.value

the p-value of the test.


conf.int

a confidence interval for the probability of success.


estimate

the estimated probability of success.


null.value

the probability of success under the null, p.


alternative

a character string describing the alternative hypothesis.


method

the character string "Exact binomial test".


data.name

a character string giving the names of the data.


Clopper, C. J. & Pearson, E. S. (1934). The use of confidence or fiducial limits illustrated in the case of the binomial. Biometrika, 26, 404–413. doi:10.2307/2331986.

William J. Conover (1971), Practical nonparametric statistics. New York: John Wiley & Sons. Pages 97–104.

Myles Hollander & Douglas A. Wolfe (1973), Nonparametric Statistical Methods. New York: John Wiley & Sons. Pages 15–22.

See Also

prop.test for a general (approximate) test for equal or given proportions.


## Conover (1971), p. 97f.
## Under (the assumption of) simple Mendelian inheritance, a cross
##  between plants of two particular genotypes produces progeny 1/4 of
##  which are "dwarf" and 3/4 of which are "giant", respectively.
##  In an experiment to determine if this assumption is reasonable, a
##  cross results in progeny having 243 dwarf and 682 giant plants.
##  If "giant" is taken as success, the null hypothesis is that p =
##  3/4 and the alternative that p != 3/4.
binom.test(c(682, 243), p = 3/4)
binom.test(682, 682 + 243, p = 3/4)   # The same.
## => Data are in agreement with the null hypothesis.

The Binomial Distribution


Density, distribution function, quantile function and random generation for the binomial distribution with parameters size and prob.

This is conventionally interpreted as the number of ‘successes’ in size trials.


dbinom(x, size, prob, log = FALSE)
pbinom(q, size, prob, lower.tail = TRUE, log.p = FALSE)
qbinom(p, size, prob, lower.tail = TRUE, log.p = FALSE)
rbinom(n, size, prob)


x, q

vector of quantiles.


p

vector of probabilities.


n

number of observations. If length(n) > 1, the length is taken to be the number required.


size

number of trials (zero or more).


prob

probability of success on each trial.

log, log.p

logical; if TRUE, probabilities p are given as log(p).


lower.tail

logical; if TRUE (default), probabilities are $P[X \le x]$, otherwise, $P[X > x]$.


The binomial distribution with size $= n$ and prob $= p$ has density

$$p(x) = \binom{n}{x} p^{x} (1-p)^{n-x}$$

for $x = 0, \ldots, n$. Note that binomial coefficients can be computed by choose in R.

If an element of x is not integer, the result of dbinom is zero, with a warning.

$p(x)$ is computed using Loader's algorithm, see the reference below.

The quantile is defined as the smallest value $x$ such that $F(x) \ge p$, where $F$ is the distribution function.
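Both the density formula and the quantile definition above can be verified directly. This is a small sketch with arbitrary example values for size, prob and the probability level p.

```r
## size, prob and p are arbitrary example values.
size <- 10; prob <- 0.3
x <- 0:size
dens <- choose(size, x) * prob^x * (1 - prob)^(size - x)
all.equal(dens, dbinom(x, size, prob))

## Quantile: the smallest x with F(x) >= p
p <- 0.6
q <- qbinom(p, size, prob)
pbinom(q, size, prob) >= p       # TRUE
pbinom(q - 1, size, prob) < p    # TRUE
```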


dbinom gives the density, pbinom gives the distribution function, qbinom gives the quantile function and rbinom generates random deviates.

If size is not an integer, NaN is returned.

The length of the result is determined by n for rbinom, and is the maximum of the lengths of the numerical arguments for the other functions.

The numerical arguments other than n are recycled to the length of the result. Only the first elements of the logical arguments are used.


For dbinom a saddle-point expansion is used: see

Catherine Loader (2000). Fast and Accurate Computation of Binomial Probabilities; available as

pbinom uses pbeta.
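The link between pbinom and pbeta rests on a standard identity between the binomial distribution function and the incomplete beta function. The sketch below illustrates that identity only; it does not reproduce the internal code, and size and prob are example values.

```r
## P[X <= k] for X ~ Binomial(size, prob) equals the upper tail of a
## Beta(k + 1, size - k) distribution evaluated at prob (for 0 <= k < size).
size <- 20; prob <- 0.35
k <- 0:(size - 1)
via_beta <- pbeta(prob, k + 1, size - k, lower.tail = FALSE)
all.equal(via_beta, pbinom(k, size, prob))
```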

qbinom uses the Cornish–Fisher Expansion to include a skewness correction to a normal approximation, followed by a search.

rbinom (for size < .Machine$integer.max) is based on

Kachitvichyanukul, V. and Schmeiser, B. W. (1988) Binomial random variate generation. Communications of the ACM, 31, 216–222.

For larger values it uses inversion.

See Also

Distributions for other standard distributions, including dnbinom for the negative binomial, and dpois for the Poisson distribution.


# Compute P(45 < X < 55) for X Binomial(100,0.5)
sum(dbinom(46:54, 100, 0.5))

## Using "log = TRUE" for an extended range :
n <- 2000
k <- seq(0, n, by = 20)
plot (k, dbinom(k, n, pi/10, log = TRUE), type = "l", ylab = "log density",
      main = "dbinom(*, log=TRUE) is better than  log(dbinom(*))")
lines(k, log(dbinom(k, n, pi/10)), col = "red", lwd = 2)
## extreme points are omitted since dbinom gives 0.
mtext("dbinom(k, log=TRUE)", adj = 0)
mtext("extended range", adj = 0, line = -1, font = 4)
mtext("log(dbinom(k))", col = "red", adj = 1)

Biplot of Multivariate Data


Plot a biplot on the current graphics device.


biplot(x, ...)

## Default S3 method:
biplot(x, y, var.axes = TRUE, col, cex = rep(par("cex"), 2),
       xlabs = NULL, ylabs = NULL, expand = 1,
       xlim  = NULL, ylim  = NULL, arrow.len = 0.1,
       main = NULL, sub = NULL, xlab = NULL, ylab = NULL, ...)



x

The biplot, a fitted object. For biplot.default, the first set of points (a two-column matrix), usually associated with observations.


y

The second set of points (a two-column matrix), usually associated with variables.


var.axes

If TRUE the second set of points have arrows representing them as (unscaled) axes.


col

A vector of length 2 giving the colours for the first and second set of points respectively (and the corresponding axes). If a single colour is specified it will be used for both sets. If missing, the default colour is looked for in the palette: if it is found there, it and the next colour are used; otherwise the first two colours of the palette are used.


cex

The character expansion factor used for labelling the points. The labels can be of different sizes for the two sets by supplying a vector of length two.


xlabs

A vector of character strings to label the first set of points: the default is to use the row dimname of x, or 1:n if the dimname is NULL.


ylabs

A vector of character strings to label the second set of points: the default is to use the row dimname of y, or 1:n if the dimname is NULL.


expand

An expansion factor to apply when plotting the second set of points relative to the first. This can be used to tweak the scaling of the two sets to a physically comparable scale.


arrow.len

The length of the arrow heads on the axes plotted if var.axes is true. The arrow head can be suppressed by arrow.len = 0.

xlim, ylim

Limits for the x and y axes in the units of the first set of variables.

main, sub, xlab, ylab, ...

graphical parameters.


A biplot is a plot which aims to represent both the observations and variables of a matrix of multivariate data on the same plot. There are many variations on biplots (see the references) and perhaps the most widely used one is implemented by biplot.princomp. The function biplot.default merely provides the underlying code to plot two sets of variables on the same figure.

Graphical parameters can also be given to biplot: the size of xlabs and ylabs is controlled by cex.

Side Effects

a plot is produced on the current graphics device.


K. R. Gabriel (1971). The biplot graphical display of matrices with application to principal component analysis. Biometrika, 58, 453–467. doi:10.2307/2334381.

J.C. Gower and D. J. Hand (1996). Biplots. Chapman & Hall.

See Also

biplot.princomp, also for examples.

Biplot for Principal Components


Produces a biplot (in the strict sense) from the output of princomp or prcomp


## S3 method for class 'prcomp'
biplot(x, choices = 1:2, scale = 1, pc.biplot = FALSE, ...)

## S3 method for class 'princomp'
biplot(x, choices = 1:2, scale = 1, pc.biplot = FALSE, ...)



x

an object of class "princomp".


choices

length 2 vector specifying the components to plot. Only the default is a biplot in the strict sense.


scale

The variables are scaled by lambda ^ scale and the observations are scaled by lambda ^ (1-scale) where lambda are the singular values as computed by princomp. Normally 0 <= scale <= 1, and a warning will be issued if the specified scale is outside this range.


pc.biplot

If true, use what Gabriel (1971) refers to as a "principal component biplot", with lambda = 1 and observations scaled up by sqrt(n) and variables scaled down by sqrt(n). Then inner products between variables approximate covariances and distances between observations approximate Mahalanobis distance.


...

optional arguments to be passed to biplot.default.


This is a method for the generic function biplot. There is considerable confusion over the precise definitions: those of the original paper, Gabriel (1971), are followed here. Gabriel and Odoroff (1990) use the same definitions, but their plots actually correspond to pc.biplot = TRUE.
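A short usage sketch of the two methods follows. The USArrests data set ships with R and is used here purely for illustration; the particular argument combinations are examples, not recommendations.

```r
## USArrests is used only as a convenient built-in data set.
pc <- princomp(USArrests, cor = TRUE)
biplot(pc)                    # default: scale = 1
biplot(pc, pc.biplot = TRUE)  # Gabriel's "principal component biplot"
biplot(prcomp(USArrests, scale. = TRUE), scale = 0)
```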

Side Effects

a plot is produced on the current graphics device.


Gabriel, K. R. (1971). The biplot graphical display of matrices with applications to principal component analysis. Biometrika, 58, 453–467. doi:10.2307/2334381.

Gabriel, K. R. and Odoroff, C. L. (1990). Biplots in biomedical research. Statistics in Medicine, 9, 469–485. doi:10.1002/sim.4780090502.

See Also

biplot, princomp.



Probability of coincidences


Computes answers to a generalised birthday paradox problem. pbirthday computes the probability of a coincidence and qbirthday computes the smallest number of observations needed to have at least a specified probability of coincidence.


qbirthday(prob = 0.5, classes = 365, coincident = 2)
pbirthday(n, classes = 365, coincident = 2)



classes

How many distinct categories the people could fall into.


prob

The desired probability of coincidence.


n

The number of people.


coincident

The number of people to fall in the same category.


The birthday paradox is that a very small number of people, 23, suffices to have a 50–50 chance that two or more of them have the same birthday. This function generalises the calculation to probabilities other than 0.5, numbers of coincident events other than 2, and numbers of classes other than 365.

The formula used is approximate for coincident > 2. The approximation is very good for moderate values of prob but less good for very small probabilities.
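For the classical case coincident = 2, the probability can be written out directly. The sketch below assumes that pbirthday is exact in that case, as the statement about the approximation for coincident > 2 suggests; n is an example value.

```r
## Exact probability of at least one shared birthday among n people
## (coincident = 2), written out directly.
n <- 23; classes <- 365
exact <- 1 - prod((classes - 0:(n - 1)) / classes)
all.equal(exact, pbirthday(n))
qbirthday(0.5) == n   # 23 people suffice for a 50-50 chance
```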



qbirthday

Minimum number of people needed for a probability of at least prob that coincident or more of them have the same one out of classes equiprobable labels.


pbirthday

Probability of the specified coincidence.


Diaconis, P. and Mosteller F. (1989). Methods for studying coincidences. Journal of the American Statistical Association, 84, 853–861. doi:10.1080/01621459.1989.10478847.



## the standard version
qbirthday() # 23
## probability of > 2 people with the same birthday
pbirthday(23, coincident = 3)

## examples from Diaconis & Mosteller p. 858.
## 'coincidence' is that husband, wife, daughter all born on the 16th
qbirthday(classes = 30, coincident = 3) # approximately 18
qbirthday(coincident = 4)  # exact value 187
qbirthday(coincident = 10) # exact value 1181

## same 4-digit PIN number
qbirthday(classes = 10^4)

## 0.9 probability of three or more coincident birthdays
qbirthday(coincident = 3, prob = 0.9)

## Chance of 4 or more coincident birthdays in 150 people
pbirthday(150, coincident = 4)

## 100 or more coincident birthdays in 1000 people: very rare
pbirthday(1000, coincident = 100)

Box-Pierce and Ljung-Box Tests


Compute the Box–Pierce or Ljung–Box test statistic for examining the null hypothesis of independence in a given time series. These are sometimes known as ‘portmanteau’ tests.


Box.test(x, lag = 1, type = c("Box-Pierce", "Ljung-Box"), fitdf = 0)



x

a numeric vector or univariate time series.


lag

the statistic will be based on lag autocorrelation coefficients.


type

test to be performed: partial matching is used.


fitdf

number of degrees of freedom to be subtracted if x is a series of residuals.


These tests are sometimes applied to the residuals from an ARMA(p, q) fit, in which case the references suggest a better approximation to the null-hypothesis distribution is obtained by setting fitdf = p+q, provided of course that lag > fitdf.
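The two statistics can be reproduced from the sample autocorrelations, and the effect of fitdf on the degrees of freedom is easy to see on residuals from a fitted model. This is a sketch under stated assumptions: the simulated series, the lag, and the AR(1) fit to the lh data set are all example choices.

```r
## Box-Pierce and Ljung-Box statistics from the sample autocorrelations.
set.seed(1)
x <- rnorm(100)
lag <- 5
n <- length(x)
r <- acf(x, lag.max = lag, plot = FALSE)$acf[-1]   # drop lag 0
bp <- n * sum(r^2)
lb <- n * (n + 2) * sum(r^2 / (n - seq_len(lag)))
all.equal(bp, unname(Box.test(x, lag = lag)$statistic))
all.equal(lb, unname(Box.test(x, lag = lag, type = "Ljung-Box")$statistic))

## Residuals of an AR(1) fit: subtract p + q = 1 degree of freedom.
fit <- arima(lh, order = c(1, 0, 0))
bt <- Box.test(residuals(fit), lag = 10, type = "Ljung-Box", fitdf = 1)
bt$parameter   # df = lag - fitdf
```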


A list with class "htest" containing the following components:


statistic

the value of the test statistic.


parameter

the degrees of freedom of the approximate chi-squared distribution of the test statistic (taking fitdf into account).


p.value

the p-value of the test.


method

a character string indicating which type of test was performed.


data.name

a character string giving the name of the data.


Missing values are not handled.


A. Trapletti


Box, G. E. P. and Pierce, D. A. (1970), Distribution of residual correlations in autoregressive-integrated moving average time series models. Journal of the American Statistical Association, 65, 1509–1526. doi:10.2307/2284333.

Ljung, G. M. and Box, G. E. P. (1978), On a measure of lack of fit in time series models. Biometrika, 65, 297–303. doi:10.2307/2335207.

Harvey, A. C. (1993) Time Series Models. 2nd Edition, Harvester Wheatsheaf, NY, pp. 44, 45.


x <- rnorm (100)
Box.test (x, lag = 1)
Box.test (x, lag = 1, type = "Ljung")

Sets Contrasts for a Factor


Sets the "contrasts" attribute for the factor.


C(object, contr, how.many, ...)



object

a factor or ordered factor.


contr

which contrasts to use. Can be a matrix with one row for each level of the factor, or a suitable function like contr.poly, or a character string giving the name of the function.


how.many

the number of contrasts to set, by default one less than nlevels(object).


...

additional arguments for the function contr.


For compatibility with S, contr can be treatment, helmert, sum or poly (without quotes) as shorthand for contr.treatment and so on.
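The three ways of naming a contrast function should lead to the same contrast matrix. The sketch below checks this for the Helmert contrasts on the tension factor of warpbreaks; the choice of factor and contrast type is only an example.

```r
f <- warpbreaks$tension                  # a factor with 3 levels
m1 <- contrasts(C(f, helmert))           # S-style shorthand
m2 <- contrasts(C(f, contr.helmert))     # the function itself
m3 <- contrasts(C(f, "contr.helmert"))   # its name as a character string
all.equal(unname(m1), unname(m2))
all.equal(unname(m1), unname(m3))
ncol(contrasts(C(f, poly, 1)))           # how.many = 1 keeps a single contrast
```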


The factor object with the "contrasts" attribute set.


Chambers, J. M. and Hastie, T. J. (1992) Statistical models. Chapter 2 of Statistical Models in S eds J. M. Chambers and T. J. Hastie, Wadsworth & Brooks/Cole.

See Also

contrasts, contr.sum, etc.


## reset contrasts to defaults
options(contrasts = c("contr.treatment", "contr.poly"))
tens <- with(warpbreaks, C(tension, poly, 1))
## tension SHOULD be an ordered factor, but as it is not we can use
aov(breaks ~ wool + tens + tension, data = warpbreaks)

## show the use of ...  The default contrast is contr.treatment here
summary(lm(breaks ~ wool + C(tension, base = 2), data = warpbreaks))

# following on from help(esoph)
model3 <- glm(cbind(ncases, ncontrols) ~ agegp + C(tobgp, , 1) +
     C(alcgp, , 1), data = esoph, family = binomial())

Canonical Correlations


Compute the canonical correlations between two data matrices.


cancor(x, y, xcenter = TRUE, ycenter = TRUE)



x

numeric matrix ($n \times p_1$), containing the x coordinates.


y

numeric matrix ($n \times p_2$), containing the y coordinates.


xcenter

logical or numeric vector of length $p_1$, describing any centering to be done on the x values before the analysis. If TRUE (default), subtract the column means. If FALSE, do not adjust the columns. Otherwise, a vector of values to be subtracted from the columns.


ycenter

analogous to xcenter, but for the y values.


The canonical correlation analysis seeks linear combinations of the y variables which are well explained by linear combinations of the x variables. The relationship is symmetric as ‘well explained’ is measured by correlations.
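The symmetry of the analysis can be seen by swapping the roles of the two data sets. The sketch below uses the LifeCycleSavings split that also appears in the examples; it additionally checks that the leading pair of canonical variates attains the first canonical correlation, up to sign.

```r
pop <- LifeCycleSavings[, 2:3]
oec <- LifeCycleSavings[, -(2:3)]
cxy <- cancor(pop, oec)
cyx <- cancor(oec, pop)
all.equal(cxy$cor, cyx$cor)   # same correlations either way round

## Leading canonical variates (correlation is unaffected by centering)
u <- as.matrix(pop) %*% cxy$xcoef[, 1]
v <- as.matrix(oec) %*% cxy$ycoef[, 1]
all.equal(abs(cor(u, v)[1, 1]), cxy$cor[1])
```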


A list containing the following components:




cor

the canonical correlations.


xcoef

estimated coefficients for the x variables.


ycoef

estimated coefficients for the y variables.


xcenter

the values used to adjust the x variables.


ycenter

the values used to adjust the y variables.


Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988). The New S Language. Wadsworth & Brooks/Cole.

Hotelling H. (1936). Relations between two sets of variables. Biometrika, 28, 321–327. doi:10.1093/biomet/28.3-4.321.

Seber, G. A. F. (1984). Multivariate Observations. New York: Wiley. Page 506f.

See Also

qr, svd.


## signs of results are random
pop <- LifeCycleSavings[, 2:3]
oec <- LifeCycleSavings[, -(2:3)]
cancor(pop, oec)

x <- matrix(rnorm(150), 50, 3)
y <- matrix(rnorm(250), 50, 5)
(cxy <- cancor(x, y))
all(abs(cor(x %*% cxy$xcoef,
            y %*% cxy$ycoef)[,1:3] - diag(cxy $ cor)) < 1e-15)
all(abs(cor(x %*% cxy$xcoef) - diag(3)) < 1e-15)
all(abs(cor(y %*% cxy$ycoef) - diag(5)) < 1e-15)

Case and Variable Names of Fitted Models


Simple utilities returning (non-missing) case names, and (non-eliminated) variable names.


case.names(object, ...)
## S3 method for class 'lm'
case.names(object, full = FALSE, ...)

variable.names(object, ...)
## S3 method for class 'lm'
variable.names(object, full = FALSE, ...)



object

an R object, typically a fitted model.


full

logical; if TRUE, all names (including zero weights, ...) are returned.


...

further arguments passed to or from other methods.


A character vector.

See Also

lm; further, all.names, all.vars for functions with a similar name but only slightly related purpose.


x <- 1:20
y <-  setNames(x + (x/4 - 2)^3 + rnorm(20, sd = 3),
               paste("O", x, sep = "."))
ww <- rep(1, 20); ww[13] <- 0
summary(lmxy <- lm(y ~ x + I(x^2)+I(x^3) + I((x-10)^2), weights = ww),
        correlation = TRUE)
variable.names(lmxy, full = TRUE)  # includes the last
case.names(lmxy, full = TRUE)      # includes the 0-weight case

The Cauchy Distribution


Density, distribution function, quantile function and random generation for the Cauchy distribution with location parameter location and scale parameter scale.


dcauchy(x, location = 0, scale = 1, log = FALSE)
pcauchy(q, location = 0, scale = 1, lower.tail = TRUE, log.p = FALSE)
qcauchy(p, location = 0, scale = 1, lower.tail = TRUE, log.p = FALSE)
rcauchy(n, location = 0, scale = 1)


x, q

vector of quantiles.


p

vector of probabilities.


n

number of observations. If length(n) > 1, the length is taken to be the number required.

location, scale

location and scale parameters.

log, log.p

logical; if TRUE, probabilities p are given as log(p).


lower.tail

logical; if TRUE (default), probabilities are $P[X \le x]$, otherwise, $P[X > x]$.


If location or scale are not specified, they assume the default values of 0 and 1 respectively.

The Cauchy distribution with location $l$ and scale $s$ has density

$$f(x) = \frac{1}{\pi s} \left( 1 + \left(\frac{x - l}{s}\right)^2 \right)^{-1}$$

for all $x$.
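The density formula, the closed-form quantile function and the relation to the t distribution with one degree of freedom can be checked numerically. In this sketch the values of l, s, x and p are arbitrary examples.

```r
## location l and scale s are arbitrary example values.
l <- 1; s <- 2
x <- seq(-10, 10, by = 0.5)
dens <- 1 / (pi * s * (1 + ((x - l) / s)^2))
all.equal(dens, dcauchy(x, location = l, scale = s))

## The standard Cauchy is the t distribution with 1 degree of freedom
all.equal(dcauchy(x), dt(x, df = 1))

## Quantile function in closed form
p <- c(0.1, 0.5, 0.9)
all.equal(l + s * tan(pi * (p - 0.5)), qcauchy(p, l, s))
```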


dcauchy, pcauchy, and qcauchy are respectively the density, distribution function and quantile function of the Cauchy distribution. rcauchy generates random deviates from the Cauchy.

The length of the result is determined by n for rcauchy, and is the maximum of the lengths of the numerical arguments for the other functions.

The numerical arguments other than n are recycled to the length of the result. Only the first elements of the logical arguments are used.


dcauchy, pcauchy and qcauchy are all calculated from numerically stable versions of the definitions.

rcauchy uses inversion.


Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.

Johnson, N. L., Kotz, S. and Balakrishnan, N. (1995) Continuous Univariate Distributions, volume 1, chapter 16. Wiley, New York.

See Also

Distributions for other standard distributions, including dt for the t distribution which generalizes dcauchy(*, l = 0, s = 1).



Pearson's Chi-squared Test for Count Data


chisq.test performs chi-squared contingency table tests and goodness-of-fit tests.


chisq.test(x, y = NULL, correct = TRUE,
           p = rep(1/length(x), length(x)), rescale.p = FALSE,
           simulate.p.value = FALSE, B = 2000)



x

a numeric vector or matrix. x and y can also both be factors.


y

a numeric vector; ignored if x is a matrix. If x is a factor, y should be a factor of the same length.


correct

a logical indicating whether to apply continuity correction when computing the test statistic for 2 by 2 tables: one half is subtracted from all $|O - E|$ differences; however, the correction will not be bigger than the differences themselves. No correction is done if simulate.p.value = TRUE.


p

a vector of probabilities of the same length as x. An error is given if any entry of p is negative.


rescale.p

a logical scalar; if TRUE then p is rescaled (if necessary) to sum to 1. If rescale.p is FALSE, and p does not sum to 1, an error is given.


simulate.p.value

a logical indicating whether to compute p-values by Monte Carlo simulation.


B

an integer specifying the number of replicates used in the Monte Carlo test.


If x is a matrix with one row or column, or if x is a vector and y is not given, then a goodness-of-fit test is performed (x is treated as a one-dimensional contingency table). The entries of x must be non-negative integers. In this case, the hypothesis tested is whether the population probabilities equal those in p, or are all equal if p is not given.

If x is a matrix with at least two rows and columns, it is taken as a two-dimensional contingency table: the entries of x must be non-negative integers. Otherwise, x and y must be vectors or factors of the same length; cases with missing values are removed, the objects are coerced to factors, and the contingency table is computed from these. Then Pearson's chi-squared test is performed of the null hypothesis that the joint distribution of the cell counts in a 2-dimensional contingency table is the product of the row and column marginals.

If simulate.p.value is FALSE, the p-value is computed from the asymptotic chi-squared distribution of the test statistic; continuity correction is only used in the 2-by-2 case (if correct is TRUE, the default). Otherwise the p-value is computed for a Monte Carlo test (Hope, 1968) with B replicates. The default B = 2000 implies a minimum p-value of about 0.0005 ($1/(B+1)$).

In the contingency table case, simulation is done by random sampling from the set of all contingency tables with given marginals, and works only if the marginals are strictly positive. Continuity correction is never used, and the statistic is quoted without it. Note that this is not the usual sampling situation assumed for the chi-squared test but rather that for Fisher's exact test.

In the goodness-of-fit case simulation is done by random sampling from the discrete distribution specified by p, each sample being of size n = sum(x). This simulation is done in R and may be slow.


A list with class "htest" containing the following components:


statistic

the value of the chi-squared test statistic.


parameter

the degrees of freedom of the approximate chi-squared distribution of the test statistic, NA if the p-value is computed by Monte Carlo simulation.


p.value

the p-value for the test.


method

a character string indicating the type of test performed, and whether Monte Carlo simulation or continuity correction was used.


data.name

a character string giving the name(s) of the data.


observed

the observed counts.


expected

the expected counts under the null hypothesis.


residuals

the Pearson residuals, (observed - expected) / sqrt(expected).


stdres

standardized residuals, (observed - expected) / sqrt(V), where V is the residual cell variance (Agresti, 2007, section 2.4.5 for the case where x is a matrix, n * p * (1 - p) otherwise).
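For a two-dimensional table, the expected counts, the test statistic and the Pearson residuals can be computed by hand and compared with the returned components. The sketch below uses the same counts as the Agresti example; since the table is not 2 by 2, no continuity correction applies.

```r
M <- rbind(c(762, 327, 468), c(484, 239, 477))
n <- sum(M)
E <- outer(rowSums(M), colSums(M)) / n      # expected counts under independence
X2 <- sum((M - E)^2 / E)                    # Pearson statistic, no correction
df <- (nrow(M) - 1) * (ncol(M) - 1)
res <- chisq.test(M)
all.equal(X2, unname(res$statistic))
all.equal(E, res$expected, check.attributes = FALSE)
all.equal((M - E) / sqrt(E), res$residuals, check.attributes = FALSE)
pchisq(X2, df, lower.tail = FALSE)          # the asymptotic p-value
```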


The code for Monte Carlo simulation is a C translation of the Fortran algorithm of Patefield (1981).


Hope, A. C. A. (1968). A simplified Monte Carlo significance test procedure. Journal of the Royal Statistical Society Series B, 30, 582–598. doi:10.1111/j.2517-6161.1968.tb00759.x.

Patefield, W. M. (1981). Algorithm AS 159: An efficient method of generating r x c tables with given row and column totals. Applied Statistics, 30, 91–97. doi:10.2307/2346669.

Agresti, A. (2007). An Introduction to Categorical Data Analysis, 2nd ed. New York: John Wiley & Sons. Page 38.

See Also

For goodness-of-fit testing, notably of continuous distributions, ks.test.


## From Agresti(2007) p.39
M <- as.table(rbind(c(762, 327, 468), c(484, 239, 477)))
dimnames(M) <- list(gender = c("F", "M"),
                    party = c("Democrat","Independent", "Republican"))
(Xsq <- chisq.test(M))  # Prints test summary
Xsq$observed   # observed counts (same as M)
Xsq$expected   # expected counts under the null
Xsq$residuals  # Pearson residuals
Xsq$stdres     # standardized residuals

## Effect of simulating p-values
x <- matrix(c(12, 5, 7, 7), ncol = 2)
chisq.test(x)$p.value           # 0.4233
chisq.test(x, simulate.p.value = TRUE, B = 10000)$p.value
                                # around 0.29!

## Testing for population probabilities
## Case A. Tabulated data
x <- c(A = 20, B = 15, C = 25)
chisq.test(x)
chisq.test(as.table(x))             # the same
x <- c(89,37,30,28,2)
p <- c(40,20,20,15,5)
try(
chisq.test(x, p = p)                # gives an error
)
chisq.test(x, p = p, rescale.p = TRUE)
                                # works
p <- c(0.40,0.20,0.20,0.19,0.01)
                                # Expected count in category 5
                                # is 1.86 < 5 ==> chi square approx.
chisq.test(x, p = p)            #               maybe doubtful, but is ok!
chisq.test(x, p = p, simulate.p.value = TRUE)

## Case B. Raw data
x <- trunc(5 * runif(100))
chisq.test(table(x))            # NOT 'chisq.test(x)'!

The (non-central) Chi-Squared Distribution


Density, distribution function, quantile function and random generation for the chi-squared (χ2\chi^2) distribution with df degrees of freedom and optional non-centrality parameter ncp.


dchisq(x, df, ncp = 0, log = FALSE)
pchisq(q, df, ncp = 0, lower.tail = TRUE, log.p = FALSE)
qchisq(p, df, ncp = 0, lower.tail = TRUE, log.p = FALSE)
rchisq(n, df, ncp = 0)


x, q

vector of quantiles.


p

vector of probabilities.


n

number of observations. If length(n) > 1, the length is taken to be the number required.


df

degrees of freedom (non-negative, but can be non-integer).


ncp

non-centrality parameter (non-negative).

log, log.p

logical; if TRUE, probabilities p are given as log(p).


lower.tail

logical; if TRUE (default), probabilities are $P[X \le x]$, otherwise, $P[X > x]$.


The chi-squared distribution with df $= n \ge 0$ degrees of freedom has density

$$f_n(x) = \frac{1}{2^{n/2} \Gamma(n/2)} x^{n/2-1} e^{-x/2}$$

for $x > 0$, where $f_0(x) := \lim_{n \to 0} f_n(x) = \delta_0(x)$, a point mass at zero, is not a density function proper, but a “$\delta$ distribution”.
The mean and variance are $n$ and $2n$.

The non-central chi-squared distribution with df $= n$ degrees of freedom and non-centrality parameter ncp $= \lambda$ has density

$$f(x) = f_{n,\lambda}(x) = e^{-\lambda / 2} \sum_{r=0}^\infty \frac{(\lambda/2)^r}{r!}\, f_{n + 2r}(x)$$

for $x \ge 0$. For integer $n$, this is the distribution of the sum of squares of $n$ normals each with variance one, $\lambda$ being the sum of squares of the normal means; further,
$E(X) = n + \lambda$, $Var(X) = 2(n + 2\lambda)$, and $E((X - E(X))^3) = 8(n + 3\lambda)$.

Note that the degrees of freedom df $= n$ can be non-integer, and also $n = 0$, which is relevant for non-centrality $\lambda > 0$, see Johnson et al. (1995, chapter 29). In that (noncentral, zero df) case, the distribution is a mixture of a point mass at $x = 0$ (of size pchisq(0, df=0, ncp=ncp)) and a continuous part, and dchisq() is not a density with respect to that mixture measure but rather the limit of the density for $df \to 0$.

Note that ncp values larger than about 1e5 (and even smaller) may give inaccurate results with many warnings for pchisq and qchisq.
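The Poisson-mixture form of the non-central density and the stated mean and variance can be checked for a moderate ncp. This is an illustrative sketch: df, lambda, the grid x and the truncation point rmax are example values, and the moments are obtained by numerical integration rather than in closed form.

```r
## df, lambda, x and rmax are example values.
df <- 3; lambda <- 2
x <- c(0.5, 1, 2, 5, 10)
rmax <- 200   # truncation point; the Poisson weights beyond it are negligible here
r <- 0:rmax
mix <- vapply(x, function(xi)
    sum(dpois(r, lambda / 2) * dchisq(xi, df + 2 * r)), numeric(1))
all.equal(mix, dchisq(x, df, ncp = lambda))

## Mean and variance by numerical integration
m <- integrate(function(t) t * dchisq(t, df, ncp = lambda), 0, Inf)$value
v <- integrate(function(t) (t - m)^2 * dchisq(t, df, ncp = lambda), 0, Inf)$value
c(mean = m, theory = df + lambda)
c(var = v, theory = 2 * (df + 2 * lambda))
```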


dchisq gives the density, pchisq gives the distribution function, qchisq gives the quantile function, and rchisq generates random deviates.

Invalid arguments will result in return value NaN, with a warning.

The length of the result is determined by n for rchisq, and is the maximum of the lengths of the numerical arguments for the other functions.

The numerical arguments other than n are recycled to the length of the result. Only the first elements of the logical arguments are used.


Supplying ncp = 0 uses the algorithm for the non-central distribution, which is not the same algorithm used if ncp is omitted. This is to give consistent behaviour in extreme cases with values of ncp very near zero.

The code for non-zero ncp is principally intended to be used for moderate values of ncp: it will not be highly accurate, especially in the tails, for large values.


The central cases are computed via the gamma distribution.

The non-central dchisq and rchisq are computed as a Poisson mixture of central chi-squares (Johnson et al., 1995, p.436).

The non-central pchisq is for ncp < 80 computed from the Poisson mixture of central chi-squares and for larger ncp via a C translation of

Ding, C. G. (1992) Algorithm AS275: Computing the non-central chi-squared distribution function. Applied Statistics, 41, 478–482.

which computes the lower tail only (so the upper tail suffers from cancellation and a warning will be given when this is likely to be significant).

The non-central qchisq is based on inversion of pchisq.


Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.

Johnson, N. L., Kotz, S. and Balakrishnan, N. (1995) Continuous Univariate Distributions, chapters 18 (volume 1) and 29 (volume 2). Wiley, New York.

See Also

Distributions for other standard distributions.

A central chi-squared distribution with $n$ degrees of freedom is the same as a Gamma distribution with shape $\alpha = n/2$ and scale $\sigma = 2$. Hence, see dgamma for the Gamma distribution.

The central chi-squared distribution with 2 d.f. is identical to the exponential distribution with rate 1/2: $\chi^2_2 = Exp(1/2)$, see dexp.
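As a quick illustration of the Gamma relationship, the sketch below compares density and distribution function for a non-integer number of degrees of freedom. The evaluation points x and the value of n are arbitrary choices made for this example.

```r
x <- c(0.5, 1, 2, 5, 10)   # arbitrary evaluation points
n <- 3.5                   # non-integer degrees of freedom
dens_ok <- all.equal(dchisq(x, df = n), dgamma(x, shape = n / 2, scale = 2))
cdf_ok  <- all.equal(pchisq(x, df = n), pgamma(x, shape = n / 2, scale = 2))
c(dens_ok, cdf_ok)
```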



dchisq(1, df = 1:3)
pchisq(1, df =  3)
pchisq(1, df =  3, ncp = 0:4)  # includes the above

x <- 1:10
## Chi-squared(df = 2) is a special exponential distribution
all.equal(dchisq(x, df = 2), dexp(x, 1/2))
all.equal(pchisq(x, df = 2), pexp(x, 1/2))

## non-central RNG -- df = 0 with ncp > 0:  Z0 has point mass at 0!
Z0 <- rchisq(100, df = 0, ncp = 2.)

## visual testing
## do P-P plots for 1000 points at various degrees of freedom
L <- 1.2; n <- 1000; pp <- ppoints(n)
op <- par(mfrow = c(3,3), mar = c(3,3,1,1)+.1, mgp = c(1.5,.6,0),
          oma = c(0,0,3,0))
for(df in 2^(4*rnorm(9))) {
  plot(pp, sort(pchisq(rr <- rchisq(n, df = df, ncp = L), df = df, ncp = L)),
       ylab = "pchisq(rchisq(.),.)", pch = ".")
  mtext(paste("df = ", formatC(df, digits = 4)), line =  -2, adj = 0.05)
  abline(0, 1, col = 2)
}
mtext(expression("P-P plots : Noncentral  "*
                 chi^2 *"(n=1000, df=X, ncp= 1.2)"),
      cex = 1.5, font = 2, outer = TRUE)
par(op)

## "analytical" test
lam <- seq(0, 100, by = .25)
p00 <- pchisq(0,      df = 0, ncp = lam)
p.0 <- pchisq(1e-300, df = 0, ncp = lam)
stopifnot(all.equal(p00, exp(-lam/2)),
          all.equal(p.0, exp(-lam/2)))

Classical (Metric) Multidimensional Scaling


Classical multidimensional scaling (MDS) of a data matrix. Also known as principal coordinates analysis (Gower, 1966).


cmdscale(d, k = 2, eig = FALSE, add = FALSE, x.ret = FALSE,
         list. = eig || add || x.ret)



a distance structure such as that returned by dist or a full symmetric matrix containing the dissimilarities.


the maximum dimension of the space which the data are to be represented in; must be in $\{1, 2, \ldots, n-1\}$.


indicates whether eigenvalues should be returned.


logical indicating if an additive constant $c*$ should be computed, and added to the non-diagonal dissimilarities such that the modified dissimilarities are Euclidean.


indicates whether the doubly centred symmetric distance matrix should be returned.


logical indicating if a list should be returned or just the $n \times k$ matrix, see ‘Value:’.


Multidimensional scaling takes a set of dissimilarities and returns a set of points such that the distances between the points are approximately equal to the dissimilarities. (It is a major part of what ecologists call ‘ordination’.)

A set of Euclidean distances on $n$ points can be represented exactly in at most $n - 1$ dimensions. cmdscale follows the analysis of Mardia (1978), and returns the best-fitting $k$-dimensional representation, where $k$ may be less than the argument k.
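The exact-representation property can be illustrated with simulated data. This is a sketch under stated assumptions: the 5 points in 4 dimensions are random (seeded for reproducibility), so their Euclidean distances should be recovered exactly, up to rounding, with k = n - 1 = 4.

```r
set.seed(1)
X <- matrix(rnorm(20), nrow = 5, ncol = 4)  # 5 simulated points in 4 dimensions
d <- dist(X)                                # Euclidean distances
fit <- cmdscale(d, k = 4)                   # k = n - 1
## the configuration differs from X by location/rotation/reflection,
## but the interpoint distances agree
dist_ok <- all.equal(as.vector(dist(fit)), as.vector(d))
dist_ok
```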

The representation is only determined up to location (cmdscale takes the column means of the configuration to be at the origin), rotations and reflections. The configuration returned is given in principal-component axes, so the reflection chosen may differ between R platforms (see prcomp).

When add = TRUE, a minimal additive constant $c*$ is computed such that the dissimilarities $d_{ij} + c*$ are Euclidean and hence can be represented in n - 1 dimensions. Whereas S (Becker et al., 1988) computes this constant using an approximation suggested by Torgerson, R uses the analytical solution of Cailliez (1983), see also Cox and Cox (2001). Note that because of numerical errors the computed eigenvalues need not all be non-negative, and even theoretically the representation could be in fewer than n - 1 dimensions.


If list. is false (as per default), a matrix with k columns whose rows give the coordinates of the points chosen to represent the dissimilarities.

Otherwise, a list containing the following components.


a matrix with up to k columns whose rows give the coordinates of the points chosen to represent the dissimilarities.


the $n$ eigenvalues computed during the scaling process if eig is true. NB: versions of R before 2.12.1 returned only k but were documented to return $n - 1$.


the doubly centered distance matrix if x.ret is true.


the additive constant $c*$, 0 if add = FALSE.


a numeric vector of length 2, equal to say $(g_1, g_2)$, where $g_i = (\sum_{j=1}^k \lambda_j) / (\sum_{j=1}^n T_i(\lambda_j))$, where $\lambda_j$ are the eigenvalues (sorted in decreasing order), $T_1(v) = \left| v \right|$, and $T_2(v) = \max(v, 0)$.
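The two goodness-of-fit values can be recomputed from the returned eigenvalues. The sketch below uses the eurodist data with k = 2 and assumes that the first k eigenvalues are positive, as is the case for this data set.

```r
k <- 2
fit <- cmdscale(eurodist, k = k, eig = TRUE)
ev <- fit$eig                                 # all n eigenvalues, decreasing
g1 <- sum(ev[1:k]) / sum(abs(ev))             # T_1(v) = |v|
g2 <- sum(ev[1:k]) / sum(pmax(ev, 0))         # T_2(v) = max(v, 0)
gof_ok <- all.equal(c(g1, g2), fit$GOF)
gof_ok
```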


Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988). The New S Language. Wadsworth & Brooks/Cole.

Cailliez, F. (1983). The analytical solution of the additive constant problem. Psychometrika, 48, 343–349. doi:10.1007/BF02294026.

Cox, T. F. and Cox, M. A. A. (2001). Multidimensional Scaling. Second edition. Chapman and Hall.

Gower, J. C. (1966). Some distance properties of latent root and vector methods used in multivariate analysis. Biometrika, 53, 325–328. doi:10.2307/2333639.

Krzanowski, W. J. and Marriott, F. H. C. (1994). Multivariate Analysis. Part I. Distributions, Ordination and Inference. London: Edward Arnold. (Especially pp. 108–111.)

Mardia, K. V. (1978). Some properties of classical multidimensional scaling. Communications in Statistics – Theory and Methods, A7, 1233–41. doi:10.1080/03610927808827707.

Mardia, K. V., Kent, J. T. and Bibby, J. M. (1979). Chapter 14 of Multivariate Analysis, London: Academic Press.

Seber, G. A. F. (1984). Multivariate Observations. New York: Wiley.

Torgerson, W. S. (1958). Theory and Methods of Scaling. New York: Wiley.

See Also


isoMDS and sammon in package MASS provide alternative methods of multidimensional scaling.



loc <- cmdscale(eurodist)
x <- loc[, 1]
y <- -loc[, 2] # reflect so North is at the top
## note asp = 1, to ensure Euclidean distances are represented correctly
plot(x, y, type = "n", xlab = "", ylab = "", asp = 1, axes = FALSE,
     main = "cmdscale(eurodist)")
text(x, y, rownames(loc), cex = 0.6)

Extract Model Coefficients


coef is a generic function which extracts model coefficients from objects returned by modeling functions. coefficients is an alias for it.


coef(object, ...)
coefficients(object, ...)
## Default S3 method:
coef(object, complete = TRUE, ...)
## S3 method for class 'aov'
coef(object, complete = FALSE, ...)



an object for which the extraction of model coefficients is meaningful.


for the default (used for lm, etc) and aov methods: logical indicating if the full coefficient vector should be returned also in case of an over-determined system where some coefficients will be set to NA, see also alias. Note that the default differs for lm() and aov() results.


other arguments.


All object classes which are returned by model fitting functions should provide a coef method or use the default one. (Note that the method is for coef and not coefficients.)

The "aov" method does not report aliased coefficients (see alias) by default where complete = FALSE.

The complete argument also exists for compatibility with vcov methods, and coef and aov methods for other classes should typically also keep the complete = * behavior in sync. By that, with p <- length(coef(obj, complete = TF)), dim(vcov(obj, complete = TF)) == c(p,p) will be fulfilled for both complete settings and the default.
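The relationship between complete and vcov can be seen on a deliberately rank-deficient fit. This is an illustrative sketch: the data frame d and its perfectly collinear column x2 are made up for the example.

```r
d <- data.frame(y = c(1, 3, 2, 5, 4), x1 = 1:5)
d$x2 <- 2 * d$x1                     # perfectly collinear with x1 (made up)
fit <- lm(y ~ x1 + x2, data = d)
coef(fit)                            # aliased coefficient x2 reported as NA
coef(fit, complete = FALSE)          # aliased coefficient dropped
for (TF in c(TRUE, FALSE)) {
  p <- length(coef(fit, complete = TF))
  stopifnot(dim(vcov(fit, complete = TF)) == c(p, p))
}
```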


Coefficients extracted from the model object object.

For standard model fitting classes this will be a named numeric vector. For "maov" objects (produced by aov) it will be a matrix.


Chambers, J. M. and Hastie, T. J. (1992) Statistical Models in S. Wadsworth & Brooks/Cole.

See Also

fitted.values and residuals for related methods; glm, lm for model fitting.


x <- 1:5; coef(lm(c(1:3, 7, 6) ~ x))

Find Complete Cases


Return a logical vector indicating which cases are complete, i.e., have no missing values.


complete.cases(...)


a sequence of vectors, matrices and data frames.


A logical vector specifying which observations/rows have no missing values across the entire sequence.
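A small illustration of how missingness is combined across several arguments. The data frame df and the vector v are invented for this example.

```r
df <- data.frame(a = c(1, NA, 3), b = c("x", "y", NA))  # made-up data
v  <- c(10, 20, 30)
ok <- complete.cases(df, v)
ok   # TRUE FALSE FALSE: rows 2 and 3 each have a missing value in df
```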


A current limitation of this function is that it uses low level functions to determine lengths and missingness, ignoring the class. This will lead to spurious errors when some columns have classes with length or methods, for example "POSIXlt", as described in PR#16648.

See Also

na.omit


x <- airquality[, -1] # x is a regression design matrix
y <- airquality[,  1] # y is the corresponding response

stopifnot(complete.cases(y) != is.na(y))
ok <- complete.cases(x, y)
sum(!ok) # how many are not "ok" ?
x <- x[ok,]
y <- y[ok]

Confidence Intervals for Model Parameters


Computes confidence intervals for one or more parameters in a fitted model. There is a default and a method for objects inheriting from class "lm".


confint(object, parm, level = 0.95, ...)
## Default S3 method:
confint(object, parm, level = 0.95, ...)
## S3 method for class 'lm'
confint(object, parm, level = 0.95, ...)
## S3 method for class 'glm'
confint(object, parm, level = 0.95, trace = FALSE, test=c("LRT", "Rao"), ...)
## S3 method for class 'nls'
confint(object, parm, level = 0.95, ...)



a fitted model object.


a specification of which parameters are to be given confidence intervals, either a vector of numbers or a vector of names. If missing, all parameters are considered.


the confidence level required.


logical. Should profiling be traced?


use Likelihood Ratio or Rao Score test in profiling.


additional argument(s) for methods.


confint is a generic function. The default method assumes normality, and needs suitable coef and vcov methods to be available. The default method can be called directly for comparison with other methods.

For objects of class "lm" the direct formulae based on $t$ values are used.
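The difference between the two approaches can be made concrete by hand. The sketch below assumes the usual estimate ± quantile × standard error form, with a t quantile on the residual degrees of freedom for the "lm" method and a normal quantile for the default method; the cars data set is used only for illustration.

```r
fit <- lm(dist ~ speed, data = cars)
est <- coef(fit)
se  <- sqrt(diag(vcov(fit)))
tq  <- qt(0.975, df.residual(fit))     # t-based, as for class "lm"
zq  <- qnorm(0.975)                    # normal-based, as for the default method
by_t <- cbind(est - tq * se, est + tq * se)
by_z <- cbind(est - zq * se, est + zq * se)
t_ok <- all.equal(unname(by_t), unname(confint(fit)))
z_ok <- all.equal(unname(by_z), unname(confint.default(fit)))
c(t_ok, z_ok)
```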

Methods for classes "glm" and "nls" call the appropriate profile method, then find the confidence intervals by interpolation in the profile traces. If the profile object is already available it can be used as the main argument rather than the fitted model object itself.


A matrix (or vector) with columns giving lower and upper confidence limits for each parameter. These will be labelled as (1-level)/2 and 1 - (1-level)/2 in % (by default 2.5% and 97.5%).


Venables, W. N. and Ripley, B. D. (2002) Modern Applied Statistics with S. Fourth edition. Springer.

See Also

Original versions: confint.glm and confint.nls in package MASS.


fit <- lm(100/mpg ~ disp + hp + wt + am, data = mtcars)
confint(fit, "wt")

## from example(glm)
counts <- c(18,17,15,20,10,20,25,13,12)
outcome <- gl(3, 1, 9); treatment <- gl(3, 3)
glm.D93 <- glm(counts ~ outcome + treatment, family = poisson())
confint.default(glm.D93)  # based on asymptotic normality

Linearly Constrained Optimization


Minimise a function subject to linear inequality constraints using an adaptive barrier algorithm.


constrOptim(theta, f, grad, ui, ci, mu = 1e-04, control = list(),
            method = if(is.null(grad)) "Nelder-Mead" else "BFGS",
            outer.iterations = 100, outer.eps = 1e-05, ...,
            hessian = FALSE)



numeric (vector) starting value (of length $p$): must be in the feasible region.


function to minimise (see below).


gradient of f (a function as well), or NULL (see below).


constraint matrix ($k \times p$), see below.


constraint vector of length $k$ (see below).


(Small) tuning parameter.

control, method, hessian

passed to optim.


iterations of the barrier algorithm.


non-negative number; the relative convergence tolerance of the barrier algorithm.


Other named arguments to be passed to f and grad: needs to be passed through optim so should not match its argument names.


The feasible region is defined by ui %*% theta - ci >= 0. The starting value must be in the interior of the feasible region, but the minimum may be on the boundary.
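Before calling constrOptim it can help to confirm that a starting value lies strictly inside the feasible region. The helper in_interior below is hypothetical (it is not part of R) and simply evaluates ui %*% theta - ci; the constraint values are chosen for illustration.

```r
## hypothetical helper, not part of the stats package
in_interior <- function(theta, ui, ci) all(ui %*% theta - ci > 0)

ui <- rbind(c(-1, 0), c(1, -1))   # encodes -x >= -0.9 and x - y >= 0.1
ci <- c(-0.9, 0.1)
inside   <- in_interior(c(0.5, 0), ui, ci)   # TRUE: both slacks are 0.4
boundary <- in_interior(c(0.9, 0), ui, ci)   # FALSE: first constraint is active
c(inside, boundary)
```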

A logarithmic barrier is added to enforce the constraints and then optim is called. The barrier function is chosen so that the objective function should decrease at each outer iteration. Minima in the interior of the feasible region are typically found quite quickly, but a substantial number of outer iterations may be needed for a minimum on the boundary.

The tuning parameter mu multiplies the barrier term. Its precise value is often relatively unimportant. As mu decreases the augmented objective function becomes closer to the original objective function but also less smooth near the boundary of the feasible region.

Any optim method that permits infinite values for the objective function may be used (currently all but "L-BFGS-B").

The objective function f takes as first argument the vector of parameters over which minimisation is to take place. It should return a scalar result. Optional arguments ... will be passed to optim and then (if not used by optim) to f. As with optim, the default is to minimise, but maximisation can be performed by setting control$fnscale to a negative value.

The gradient function grad must be supplied except with method = "Nelder-Mead". It should take arguments matching those of f and return a vector containing the gradient.


As for optim, but with two extra components: barrier.value, giving the value of the barrier function at the optimum, and outer.iterations, giving the number of outer iterations (calls to optim). The counts component contains the sum of all optim()$counts.


Lange, K. (2001) Numerical Analysis for Statisticians. Springer, p. 185ff.

See Also

optim, especially method = "L-BFGS-B" which does box-constrained optimisation.


## from optim
fr <- function(x) {   ## Rosenbrock Banana function
    x1 <- x[1]
    x2 <- x[2]
    100 * (x2 - x1 * x1)^2 + (1 - x1)^2
}
grr <- function(x) { ## Gradient of 'fr'
    x1 <- x[1]
    x2 <- x[2]
    c(-400 * x1 * (x2 - x1 * x1) - 2 * (1 - x1),
       200 *      (x2 - x1 * x1))
}

optim(c(-1.2,1), fr, grr)
#Box-constraint, optimum on the boundary
constrOptim(c(-1.2,0.9), fr, grr, ui = rbind(c(-1,0), c(0,-1)), ci = c(-1,-1))
#  x <= 0.9,  x - y >= 0.1
constrOptim(c(.5,0), fr, grr, ui = rbind(c(-1,0), c(1,-1)), ci = c(-0.9,0.1))

## Solves linear and quadratic programming problems
## but needs a feasible starting value
# from example(solve.QP) in 'quadprog'
# no derivative
fQP <- function(b) {-sum(c(0,5,0)*b)+0.5*sum(b*b)}
Amat       <- matrix(c(-4,-3,0,2,1,0,0,-2,1), 3, 3)
bvec       <- c(-8, 2, 0)
constrOptim(c(2,-1,-1), fQP, NULL, ui = t(Amat), ci = bvec)
# derivative
gQP <- function(b) {-c(0, 5, 0) + b}
constrOptim(c(2,-1,-1), fQP, gQP, ui = t(Amat), ci = bvec)

## Now with maximisation instead of minimisation
hQP <- function(b) {sum(c(0,5,0)*b)-0.5*sum(b*b)}
constrOptim(c(2,-1,-1), hQP, NULL, ui = t(Amat), ci = bvec,
            control = list(fnscale = -1))

(Possibly Sparse) Contrast Matrices


Return a matrix of contrasts.


contr.helmert(n, contrasts = TRUE, sparse = FALSE)
contr.poly(n, scores = 1:n, contrasts = TRUE, sparse = FALSE)
contr.sum(n, contrasts = TRUE, sparse = FALSE)
contr.treatment(n, base = 1, contrasts = TRUE, sparse = FALSE)
contr.SAS(n, contrasts = TRUE, sparse = FALSE)



a vector of levels for a factor, or the number of levels.


a logical indicating whether contrasts should be computed.


logical indicating if the result should be sparse (of class dgCMatrix), using package Matrix.


the set of values over which orthogonal polynomials are to be computed.


an integer specifying which group is considered the baseline group. Ignored if contrasts is FALSE.


These functions are used for creating contrast matrices for use in fitting analysis of variance and regression models. The columns of the resulting matrices contain contrasts which can be used for coding a factor with n levels. The returned value contains the computed contrasts. If the argument contrasts is FALSE a square indicator matrix (the dummy coding) is returned except for contr.poly (which includes the 0-degree, i.e. constant, polynomial when contrasts = FALSE).

contr.helmert returns Helmert contrasts, which contrast the second level with the first, the third with the average of the first two, and so on. contr.poly returns contrasts based on orthogonal polynomials. contr.sum uses ‘sum to zero contrasts’.

contr.treatment contrasts each level with the baseline level (specified by base): the baseline level is omitted. Note that this does not produce ‘contrasts’ as defined in the standard theory for linear models as they are not orthogonal to the intercept.

contr.SAS is a wrapper for contr.treatment that sets the base level to be the last level of the factor. The coefficients produced when using these contrasts should be equivalent to those produced by many (but not all) SAS procedures.
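Two of the statements above can be verified directly. The sketch uses n = 4 levels as an arbitrary choice: contr.SAS coincides with treatment contrasts whose base is the last level, and the treatment columns do not sum to zero, whereas the Helmert columns do.

```r
n <- 4                                   # arbitrary number of levels
same <- identical(contr.SAS(n), contr.treatment(n, base = n))
same
cs_treat   <- colSums(contr.treatment(n))  # all 1: not orthogonal to the intercept
cs_helmert <- colSums(contr.helmert(n))    # all 0: proper contrasts
```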

For consistency, sparse is an argument to all these contrast functions, however sparse = TRUE for contr.poly is typically pointless and is rarely useful for contr.helmert.


A matrix with n rows and k columns, with k=n-1 if contrasts is TRUE and k=n if contrasts is FALSE.


Chambers, J. M. and Hastie, T. J. (1992) Statistical models. Chapter 2 of Statistical Models in S eds J. M. Chambers and T. J. Hastie, Wadsworth & Brooks/Cole.

See Also

contrasts, C, and aov, glm, lm.


(cH <- contr.helmert(4))
apply(cH, 2, sum) # column sums are 0
crossprod(cH) # diagonal -- columns are orthogonal
contr.helmert(4, contrasts = FALSE) # just the 4 x 4 identity matrix

(cT <- contr.treatment(5))
all(crossprod(cT) == diag(4)) # TRUE: even orthonormal

(cT. <- contr.SAS(5))
all(crossprod(cT.) == diag(4)) # TRUE

zapsmall(cP <- contr.poly(3)) # Linear and Quadratic
zapsmall(crossprod(cP), digits = 15) # orthonormal up to fuzz

Get and Set Contrast Matrices


Set and view the contrasts associated with a factor.


contrasts(x, contrasts = TRUE, sparse = FALSE)
contrasts(x, how.many = NULL) <- value



a factor or a logical variable.


logical. See ‘Details’.


logical indicating if the result should be sparse (of class dgCMatrix), using package Matrix.


integer number indicating how many contrasts should be made. Defaults to one less than the number of levels of x. This need not be the same as the number of columns of value.


either a numeric matrix (or a sparse or dense matrix of a class extending dMatrix from package Matrix) whose columns give coefficients for contrasts in the levels of x, or (the quoted name of) a function which computes such matrices.


If contrasts are not set for a factor the default functions from options("contrasts") are used.

A logical vector x is converted into a two-level factor with levels c(FALSE, TRUE) (regardless of which levels occur in the variable).
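For instance, a logical vector containing only TRUE still yields contrasts for both levels. This sketch assumes the default setting of options("contrasts"), i.e. treatment contrasts for unordered factors.

```r
m <- contrasts(c(TRUE, TRUE, TRUE))  # FALSE never occurs in the data
m
rownames(m)                          # both levels are present
```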

The argument contrasts is ignored if x has a matrix contrasts attribute set. Otherwise if contrasts = TRUE it is passed to a contrasts function such as contr.treatment and if contrasts = FALSE an identity matrix is returned. Suitable functions have a first argument which is the character vector of levels, a named argument contrasts (always called with contrasts = TRUE) and optionally a logical argument sparse.

If value supplies more than how.many contrasts, the first how.many are used. If too few are supplied, a suitable contrast matrix is created by extending value after ensuring its columns are contrasts (orthogonal to the constant term) and not collinear.


Chambers, J. M. and Hastie, T. J. (1992) Statistical models. Chapter 2 of Statistical Models in S eds J. M. Chambers and T. J. Hastie, Wadsworth & Brooks/Cole.

See Also

C, contr.helmert, contr.poly, contr.sum, contr.treatment; glm, aov, lm.


ff <- factor(substring("statistics", 1:10, 1:10), levels = letters) # as in example(factor)
fff <- ff[, drop = TRUE]  # reduce to 5 levels.
contrasts(fff) # treatment contrasts by default
contrasts(C(fff, sum))
contrasts(fff, contrasts = FALSE) # the 5x5 identity matrix

contrasts(fff) <- contr.sum(5); contrasts(fff)  # set sum contrasts
contrasts(fff, 2) <- contr.sum(5); contrasts(fff)  # set 2 contrasts
# supply 2 contrasts, compute 2 more to make full set of 4.
contrasts(fff) <- contr.sum(5)[, 1:2]; contrasts(fff)

## using sparse contrasts: % useful, once model.matrix() works with these :
ffs <- fff
contrasts(ffs) <- contr.sum(5, sparse = TRUE)[, 1:2]; contrasts(ffs)
stopifnot(all.equal(ffs, fff))
contrasts(ffs) <- contr.sum(5, sparse = TRUE); contrasts(ffs)

Convolution of Sequences via FFT


Use the Fast Fourier Transform to compute the several kinds of convolutions of two sequences.


convolve(x, y, conj = TRUE, type = c("circular", "open", "filter"))


x, y

numeric sequences of the same length to be convolved.


logical; if TRUE, take the complex conjugate before back-transforming (default, and used for usual convolution).


character; partially matched to "circular", "open", "filter". For "circular", the two sequences are treated as circular, i.e., periodic.

For "open" and "filter", the sequences are padded with 0s (from left and right) first; "filter" returns the middle sub-vector of "open", namely, the result of running a weighted mean of x with weights y.


The Fast Fourier Transform, fft, is used for efficiency.

The input sequences x and y must have the same length if type is "circular".

Note that the usual definition of convolution of two sequences x and y is given by convolve(x, rev(y), type = "o").


If r <- convolve(x, y, type = "open") and n <- length(x), m <- length(y), then

$r_k = \sum_{i} x_{k-m+i} y_{i}$

where the sum is over all valid indices $i$, for $k = 1, \dots, n+m-1$.

If type == "circular", $n = m$ is required, and the above is true for $i, k = 1, \dots, n$ when $x_{j} := x_{n+j}$ for $j < 1$.
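The formula for the "open" case can be checked against a direct double loop. The function open_direct below is a hypothetical helper written for this illustration, and the input vectors are arbitrary. The last lines show the usual convolution, via rev(y), as polynomial multiplication.

```r
## hypothetical helper: direct evaluation of r_k = sum_i x[k-m+i] * y[i]
open_direct <- function(x, y) {
  n <- length(x); m <- length(y)
  r <- numeric(n + m - 1)
  for (k in seq_along(r)) for (i in seq_len(m)) {
    j <- k - m + i
    if (j >= 1 && j <= n) r[k] <- r[k] + x[j] * y[i]  # valid indices only
  }
  r
}
x <- c(1, 2, 3); y <- c(0, 1, 0.5)          # arbitrary inputs
sum_ok <- all.equal(open_direct(x, y), convolve(x, y, type = "open"))

## usual convolution: coefficients of (1 + 2z)(1 + 3z) = 1 + 5z + 6z^2
prod_coef <- convolve(c(1, 2), rev(c(1, 3)), type = "open")
zapsmall(prod_coef)
```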


Brillinger, D. R. (1981) Time Series: Data Analysis and Theory, Second Edition. San Francisco: Holden-Day.

See Also

fft, nextn, and particularly filter (from the stats package) which may be more appropriate.



x <- c(0,0,0,100,0,0,0)
y <- c(0,0,1, 2 ,1,0,0)/4
zapsmall(convolve(x, y))         #  *NOT* what you first thought.
zapsmall(convolve(x, y[3:5], type = "f")) # rather
x <- rnorm(50)
y <- rnorm(50)
# Circular convolution *has* this symmetry:
all.equal(convolve(x, y, conj = FALSE), rev(convolve(rev(y),x)))

n <- length(x <- -20:24)
y <- (x-10)^2/1000 + rnorm(x)/8

Han <- function(y) # Hanning
       convolve(y, c(1,2,1)/4, type = "filter")

plot(x, y, main = "Using  convolve(.) for Hanning filters")
lines(x[-c(1  , n)      ], Han(y), col = "red")
lines(x[-c(1:2, (n-1):n)], Han(Han(y)), lwd = 2, col = "dark blue")

Cophenetic Distances for a Hierarchical Clustering


Computes the cophenetic distances for a hierarchical clustering.


cophenetic(x)
## Default S3 method:
cophenetic(x)
## S3 method for class 'dendrogram'
cophenetic(x)



an R object representing a hierarchical clustering. For the default method, an object of class "hclust" or with a method for as.hclust() such as "agnes" in package cluster.


The cophenetic distance between two observations that have been clustered is defined to be the intergroup dissimilarity at which the two observations are first combined into a single cluster. Note that this distance has many ties and restrictions.
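A three-point example makes the definition concrete. The values in pts are made up, and single linkage is chosen only to keep the merge heights easy to follow: a and b merge at height 1, and c joins them at height 4, so both a-c and b-c have cophenetic distance 4.

```r
pts <- c(a = 0, b = 1, c = 5)             # made-up one-dimensional data
hc  <- hclust(dist(pts), method = "single")
hc$height                                  # merge heights
coph <- cophenetic(hc)
as.matrix(coph)
```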

It can be argued that a dendrogram is an appropriate summary of some data if the correlation between the original distances and the cophenetic distances is high. Otherwise, it should simply be viewed as the description of the output of the clustering algorithm.

cophenetic is a generic function. Support for classes which represent hierarchical clusterings (total indexed hierarchies) can be added by providing an as.hclust() or, more directly, a cophenetic() method for such a class.

The method for objects of class "dendrogram" requires that all leaves of the dendrogram object have non-null labels.


An object of class "dist".


Robert Gentleman


Sneath, P.H.A. and Sokal, R.R. (1973) Numerical Taxonomy: The Principles and Practice of Numerical Classification, p. 278 ff; Freeman, San Francisco.

See Also

dist, hclust



d1 <- dist(USArrests)
hc <- hclust(d1, "ave")
d2 <- cophenetic(hc)
cor(d1, d2) # 0.7659

## Example from Sneath & Sokal, Fig. 5-29, p.279
d0 <- c(1,3.8,4.4,5.1, 4,4.2,5, 2.6,5.3, 5.4)
attributes(d0) <- list(Size = 5, diag = TRUE)
class(d0) <- "dist"
names(d0) <- letters[1:5]
utils::str(upgma <- hclust(d0, method = "average"))
plot(upgma, hang = -1)
(d.coph <- cophenetic(upgma))
cor(d0, d.coph) # 0.9911

Correlation, Variance and Covariance (Matrices)


var, cov and cor compute the variance of x and the covariance or correlation of x and y if these are vectors. If x and y are matrices then the covariances (or correlations) between the columns of x and the columns of y are computed.

cov2cor scales a covariance matrix into the corresponding correlation matrix efficiently.


var(x, y = NULL, na.rm = FALSE, use)

cov(x, y = NULL, use = "everything",
    method = c("pearson", "kendall", "spearman"))

cor(x, y = NULL, use = "everything",
    method = c("pearson", "kendall", "spearman"))

cov2cor(V)




a numeric vector, matrix or data frame.


NULL (default) or a vector, matrix or data frame with compatible dimensions to x. The default is equivalent to y = x (but more efficient).


logical. Should missing values be removed?


an optional character string giving a method for computing covariances in the presence of missing values. This must be (an abbreviation of) one of the strings "everything", "all.obs", "complete.obs", "na.or.complete", or "pairwise.complete.obs".


a character string indicating which correlation coefficient (or covariance) is to be computed. One of "pearson" (default), "kendall", or "spearman": can be abbreviated.


symmetric numeric matrix, usually positive definite such as a covariance matrix.


For cov and cor one must either give a matrix or data frame for x or give both x and y.

The inputs must be numeric (as determined by is.numeric: logical values are also allowed for historical compatibility): the "kendall" and "spearman" methods make sense for ordered inputs but xtfrm can be used to find a suitable prior transformation to numbers.

var is just another interface to cov, where na.rm is used to determine the default for use when that is unspecified. If na.rm is TRUE then the complete observations (rows) are used (use = "na.or.complete") to compute the variance. Otherwise, by default use = "everything".
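The effect of na.rm on the default for use can be seen with a short vector containing one missing value. The numbers are arbitrary.

```r
x <- c(1, 2, NA, 4)                     # arbitrary data with one NA
v_default <- var(x)                     # use = "everything": NA propagates
v_narm    <- var(x, na.rm = TRUE)       # complete observations only
v_use     <- var(x, use = "na.or.complete")
c(v_default, v_narm, v_use)
```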

If use is "everything", NAs will propagate conceptually, i.e., a resulting value will be NA whenever one of its contributing observations is NA.
If use is "all.obs", then the presence of missing observations will produce an error. If use is "complete.obs" then missing values are handled by casewise deletion (and if there are no complete cases, that gives an error).
"na.or.complete" is the same, except that NA is returned if there are no complete cases. Finally, if use has the value "pairwise.complete.obs" then the correlation or covariance between each pair of variables is computed using all complete pairs of observations on those variables. This can result in covariance or correlation matrices which are not positive semi-definite, as well as NA entries if there are no complete pairs for that pair of variables. For cov and var, "pairwise.complete.obs" only works with the "pearson" method. Note that (the equivalent of) var(double(0), use = *) gives NA for use = "everything" and "na.or.complete", and gives an error in the other cases.

The denominator $n - 1$ is used which gives an unbiased estimator of the (co)variance for i.i.d. observations. These functions return NA when there is only one observation.

For cor(), if method is "kendall" or "spearman", Kendall's $\tau$ or Spearman's $\rho$ statistic is used to estimate a rank-based measure of association. These are more robust and have been recommended if the data do not necessarily come from a bivariate normal distribution.
For cov(), a non-Pearson method is unusual but available for the sake of completeness. Note that "spearman" basically computes cor(R(x), R(y)) (or cov(., .)) where R(u) := rank(u, na.last = "keep"). In the case of missing values, the ranks are calculated depending on the value of use, either based on complete observations, or based on pairwise completeness with reranking for each pair.
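The rank-based description of "spearman" can be checked on a small example with ties. The vectors are arbitrary and contain no missing values, so the handling of use does not come into play.

```r
x <- c(3, 1, 4, 1, 5, 9)                # arbitrary data, with a tie in x
y <- c(2, 7, 1, 8, 2, 8)                # and ties in y
rho_ok <- all.equal(cor(x, y, method = "spearman"),
                    cor(rank(x), rank(y)))
cov_ok <- all.equal(cov(x, y, method = "spearman"),
                    cov(rank(x), rank(y)))
c(rho_ok, cov_ok)
```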

When there are ties, Kendall's $\tau_b$ is computed, as proposed by Kendall (1945).

Scaling a covariance matrix into a correlation one can be achieved in many ways, mathematically most appealing by multiplication with a diagonal matrix from left and right, or more efficiently by using sweep(.., FUN = "/") twice. The cov2cor function is even a bit more efficient, and provided mostly for didactical reasons.
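The three routes mentioned above can be compared side by side. This is a sketch using the longley data; note that the diagonal-matrix product drops dimnames, so names are removed before that comparison.

```r
V <- cov(longley)
s <- sqrt(diag(V))                                   # standard deviations
R1 <- diag(1 / s) %*% V %*% diag(1 / s)              # diagonal scaling, left and right
R2 <- sweep(sweep(V, 1, s, FUN = "/"), 2, s, FUN = "/")  # sweep() twice
R3 <- cov2cor(V)
diag_ok  <- all.equal(R1, unname(R3))
sweep_ok <- all.equal(R2, R3)
c(diag_ok, sweep_ok)
```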


For r <- cor(*, use = "all.obs"), it is now guaranteed that all(abs(r) <= 1).


Some people have noted that the code for Kendall's tau is slow for very large datasets (many more than 1000 cases). It rarely makes sense to do such a computation, but see function in package pcaPP.


Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988). The New S Language. Wadsworth & Brooks/Cole.

Kendall, M. G. (1938). A new measure of rank correlation, Biometrika, 30, 81–93. doi:10.1093/biomet/30.1-2.81.

Kendall, M. G. (1945). The treatment of ties in rank problems. Biometrika, 33 239–251. doi:10.1093/biomet/33.3.239

See Also

cor.test for confidence intervals (and tests).

cov.wt for weighted covariance computation.

sd for standard deviation (vectors).


var(1:10)  # 9.166667

var(1:5, 1:5) # 2.5

## Two simple vectors
cor(1:10, 2:11) # == 1

## Correlation Matrix of Multivariate sample:
(Cl <- cor(longley))
## Graphical Correlation Matrix:
symnum(Cl) # highly correlated

## Spearman's rho  and  Kendall's tau
symnum(clS <- cor(longley, method = "spearman"))
symnum(clK <- cor(longley, method = "kendall"))
## How much do they differ?
i <- lower.tri(Cl)
cor(cbind(P = Cl[i], S = clS[i], K = clK[i]))

## cov2cor() scales a covariance matrix by its diagonal
##           to become the correlation matrix.
cov2cor # see the function definition {and learn ..}
stopifnot(all.equal(Cl, cov2cor(cov(longley))),
          all.equal(cor(longley, method = "kendall"),
            cov2cor(cov(longley, method = "kendall"))))

##--- Missing value treatment:
C1 <- cov(swiss)
range(eigen(C1, only.values = TRUE)$values) # 6.19        1921

## swM := "swiss" with  3 "missing"s :
swM <- swiss
colnames(swM) <- abbreviate(colnames(swiss), minlength=6)
swM[1,2] <- swM[7,3] <- swM[25,5] <- NA # create 3 "missing"

## Consider all 5 "use" cases :
(C. <- cov(swM)) # use="everything"  quite a few NA's in cov.matrix
try(cov(swM, use = "all")) # Error: missing obs...
C2 <- cov(swM, use = "complete")
stopifnot(identical(C2, cov(swM, use = "na.or.complete")))
range(eigen(C2, only.values = TRUE)$values) # 6.46   1930
C3 <- cov(swM, use = "pairwise")
range(eigen(C3, only.values = TRUE)$values) # 6.19   1938

## Kendall's tau doesn't change much:
symnum(Rc <- cor(swM, method = "kendall", use = "complete"))
symnum(Rp <- cor(swM, method = "kendall", use = "pairwise"))
symnum(R. <- cor(swiss, method = "kendall"))

## "pairwise" is closer componentwise,
summary(abs(c(1 - Rp/R.)))
summary(abs(c(1 - Rc/R.)))

## but "complete" is closer in Eigen space:
EV <- function(m) eigen(m, only.values=TRUE)$values
summary(abs(1 - EV(Rp)/EV(R.)) / abs(1 - EV(Rc)/EV(R.)))

Test for Association/Correlation Between Paired Samples


Test for association between paired samples, using one of Pearson's product moment correlation coefficient, Kendall's $\tau$ or Spearman's $\rho$.


cor.test(x, ...)

## Default S3 method:
cor.test(x, y,
         alternative = c("two.sided", "less", "greater"),
         method = c("pearson", "kendall", "spearman"),
         exact = NULL, conf.level = 0.95, continuity = FALSE, ...)

## S3 method for class 'formula'
cor.test(formula, data, subset, na.action, ...)


x, y

numeric vectors of data values. x and y must have the same length.


indicates the alternative hypothesis and must be one of "two.sided", "greater" or "less". You can specify just the initial letter. "greater" corresponds to positive association, "less" to negative association.


a character string indicating which correlation coefficient is to be used for the test. One of "pearson", "kendall", or "spearman", can be abbreviated.


a logical indicating whether an exact p-value should be computed. Used for Kendall's τ and Spearman's ρ. See ‘Details’ for the meaning of NULL (the default).


confidence level for the returned confidence interval. Currently only used for the Pearson product moment correlation coefficient if there are at least 4 complete pairs of observations.


logical: if true, a continuity correction is used for Kendall's τ and Spearman's ρ when not computed exactly.


a formula of the form ~ u + v, where each of u and v are numeric variables giving the data values for one sample. The samples must be of the same length.


an optional matrix or data frame (or similar: see model.frame) containing the variables in the formula formula. By default the variables are taken from environment(formula).


an optional vector specifying a subset of observations to be used.


a function which indicates what should happen when the data contain NAs. Defaults to getOption("na.action").


further arguments to be passed to or from methods.


The three methods each estimate the association between paired samples and compute a test of the value being zero. They use different measures of association, all in the range [-1, 1] with 0 indicating no association. These are sometimes referred to as tests of no correlation, but that term is often confined to the default method.

If method is "pearson", the test statistic is based on Pearson's product moment correlation coefficient cor(x, y) and follows a t distribution with length(x)-2 degrees of freedom if the samples follow independent normal distributions. If there are at least 4 complete pairs of observations, an asymptotic confidence interval is given based on Fisher's Z transform.

If method is "kendall" or "spearman", Kendall's τ or Spearman's ρ statistic is used to estimate a rank-based measure of association. These tests may be used if the data do not necessarily come from a bivariate normal distribution.

For Kendall's test, by default (if exact is NULL), an exact p-value is computed if there are fewer than 50 paired samples containing finite values and there are no ties. Otherwise, the test statistic is the estimate scaled to zero mean and unit variance, and is approximately normally distributed.

For Spearman's test, p-values are computed using algorithm AS 89 for n < 1290 and exact = TRUE, otherwise via the asymptotic t approximation. Note that these are ‘exact’ for n < 10, and use an Edgeworth series approximation for larger sample sizes (the cutoff has been changed from the original paper).
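
The Pearson case described above can be checked by hand. The following is a minimal sketch, not the internal implementation: it assumes the textbook form t = r * sqrt((n - 2) / (1 - r^2)) for the statistic and a standard error of 1/sqrt(n - 3) on the atanh (Fisher Z) scale for the interval. The data are the tuna values used in the examples; the names t_stat, p_two and ci are illustrative only.

```r
## Data from the Hollander & Wolfe tuna example
x <- c(44.4, 45.9, 41.9, 53.3, 44.7, 44.1, 50.7, 45.2, 60.1)
y <- c( 2.6,  3.1,  2.5,  5.0,  3.6,  4.0,  5.2,  2.8,  3.8)

n <- length(x)
r <- cor(x, y)

## t statistic with n - 2 degrees of freedom (assumed textbook form)
t_stat <- r * sqrt((n - 2) / (1 - r^2))
p_two  <- 2 * pt(-abs(t_stat), df = n - 2)

## Fisher Z interval: transform, add normal quantiles, back-transform
ci <- tanh(atanh(r) + c(-1, 1) * qnorm(0.975) / sqrt(n - 3))

ct <- cor.test(x, y)   # method = "pearson", two-sided, conf.level = 0.95
stopifnot(all.equal(unname(ct$statistic), t_stat),
          all.equal(ct$p.value, p_two),
          all.equal(as.vector(ct$conf.int), ci))
```

The agreement is only checked for the two-sided default; one-sided alternatives use a single tail and a one-sided interval.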


A list with class "htest" containing the following components:


the value of the test statistic.


the degrees of freedom of the test statistic in the case that it follows a t distribution.


the p-value of the test.


the estimated measure of association, with name "cor", "tau", or "rho" corresponding to the method employed.


the value of the association measure under the null hypothesis, always 0.


a character string describing the alternative hypothesis.


a character string indicating how the association was measured.

a character string giving the names of the data.

a confidence interval for the measure of association. Currently only given for Pearson's product moment correlation coefficient in case of at least 4 complete pairs of observations.


D. J. Best & D. E. Roberts (1975). Algorithm AS 89: The Upper Tail Probabilities of Spearman's ρ. Applied Statistics, 24, 377–379. doi:10.2307/2347111.

Myles Hollander & Douglas A. Wolfe (1973), Nonparametric Statistical Methods. New York: John Wiley & Sons. Pages 185–194 (Kendall and Spearman tests).

See Also

Kendall in package Kendall.

pKendall and pSpearman in package SuppDists, spearman.test in package pspearman, which supply different (and often more accurate) approximations.


## Hollander & Wolfe (1973), p. 187f.
## Assessment of tuna quality.  We compare the Hunter L measure of
##  lightness to the averages of consumer panel scores (recoded as
##  integer values from 1 to 6 and averaged over 80 such values) in
##  9 lots of canned tuna.

x <- c(44.4, 45.9, 41.9, 53.3, 44.7, 44.1, 50.7, 45.2, 60.1)
y <- c( 2.6,  3.1,  2.5,  5.0,  3.6,  4.0,  5.2,  2.8,  3.8)

##  The alternative hypothesis of interest is that the
##  Hunter L value is positively associated with the panel score.

cor.test(x, y, method = "kendall", alternative = "greater")
## => p=0.05972

cor.test(x, y, method = "kendall", alternative = "greater",
         exact = FALSE) # using large sample approximation
## => p=0.04765

## Compare this to
cor.test(x, y, method = "spearm", alternative = "g")
cor.test(x, y,                    alternative = "g")

## Formula interface.
cor.test(~ CONT + INTG, data = USJudgeRatings)

Weighted Covariance Matrices


Returns a list containing estimates of the weighted covariance matrix and the mean of the data, and optionally of the (weighted) correlation matrix.


cov.wt(x, wt = rep(1/nrow(x), nrow(x)), cor = FALSE, center = TRUE,
       method = c("unbiased", "ML"))



a matrix or data frame. As usual, rows are observations and columns are variables.


a non-negative and non-zero vector of weights for each observation. Its length must equal the number of rows of x.


a logical indicating whether the estimated weighted correlation matrix will be returned as well.


either a logical or a numeric vector specifying the centers to be used when computing covariances. If TRUE, the (weighted) mean of each variable is used, if FALSE, zero is used. If center is numeric, its length must equal the number of columns of x.


string specifying how the result is scaled, see ‘Details’ below. Can be abbreviated.


By default, method = "unbiased": the covariance matrix is divided by one minus the sum of squares of the weights, so if the weights are the default (1/n) the conventional unbiased estimate of the covariance matrix with divisor (n - 1) is obtained. With method = "ML" no such division is performed.
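
The scaling rule can be illustrated with a short sketch. It assumes that weights are normalised to sum to one and that the weighted mean is used as the center (the default); the helper names wt, ctr, xc and manual are illustrative only.

```r
x <- as.matrix(longley[, 1:3])

## Default weights 1/n reproduce the usual (n - 1) divisor
stopifnot(all.equal(cov.wt(x)$cov, cov(x)))

## Non-uniform weights, normalised to sum to one
wt  <- seq_len(nrow(x))
wt  <- wt / sum(wt)
ctr <- colSums(wt * x)               # weighted column means
xc  <- sweep(x, 2, ctr)              # centered data
ss  <- crossprod(sqrt(wt) * xc)      # weighted sums of squares and products

manual <- ss / (1 - sum(wt^2))       # method = "unbiased"
stopifnot(all.equal(cov.wt(x, wt = wt)$cov, manual),
          all.equal(cov.wt(x, wt = wt, method = "ML")$cov, ss))
```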


A list containing the following named components:


the estimated (weighted) covariance matrix


an estimate for the center (mean) of the data.


the number of observations (rows) in x.


the weights used in the estimation. Only returned if given as an argument.


the estimated correlation matrix. Only returned if cor is TRUE.

cov and var.


(xy <- cbind(x = 1:10, y = c(1:3, 8:5, 8:10)))
 w1 <- c(0,0,0,1,1,1,1,1,0,0)
 cov.wt(xy, wt = w1) # i.e. method = "unbiased"
 cov.wt(xy, wt = w1, method = "ML", cor = TRUE)

Plot Cumulative Periodogram


Plots a cumulative periodogram.


cpgram(ts, taper = 0.1,
       main = paste("Series: ", deparse1(substitute(ts))),
       ci.col = "blue")



a univariate time series


proportion tapered in forming the periodogram


main title


colour for confidence band.



Side Effects

Plots the cumulative periodogram in a square plot.


From package MASS.


B.D. Ripley



par(pty = "s", mfrow = c(1,2))
cpgram(lh)
lh.ar <- ar(lh, order.max = 9)
cpgram(lh.ar$resid, main = "AR(3) fit to lh")


Cut a Tree into Groups of Data


Cuts a tree, e.g., as resulting from hclust, into several groups either by specifying the desired number(s) of groups or the cut height(s).


cutree(tree, k = NULL, h = NULL)



a tree as produced by hclust. cutree() only expects a list with components merge, height, and labels, of appropriate content each.


an integer scalar or vector with the desired number of groups


numeric scalar or vector with heights where the tree should be cut.

At least one of k or h must be specified; k overrides h if both are given.


Cutting trees at a given height is only possible for ultrametric trees (with monotone clustering heights).


cutree returns a vector with group memberships if k or h are scalar, otherwise a matrix with group memberships is returned where each column corresponds to the elements of k or h, respectively (which are also used as column names).
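
The relation between the two ways of cutting can be sketched as follows. For a tree with monotone heights, cutting at height h should give one more group than there are merges above h; the cut height 100 and the name k_from_h are chosen for illustration only.

```r
hc <- hclust(dist(USArrests))        # complete linkage: monotone heights

h <- 100
by_height <- cutree(hc, h = h)

## number of groups implied by h: merges above h, plus one
k_from_h <- sum(hc$height > h) + 1
by_count <- cutree(hc, k = k_from_h)

stopifnot(identical(by_height, by_count),
          length(unique(by_height)) == k_from_h)

## vector-valued k gives a matrix with one column per value of k
dim(cutree(hc, k = 2:4))
```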


Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.

See Also

hclust, dendrogram for cutting trees themselves.


hc <- hclust(dist(USArrests))

cutree(hc, k = 1:5) #k = 1 is trivial
cutree(hc, h = 250)

## Compare the 2 and 4 grouping:
g24 <- cutree(hc, k = c(2,4))
table(grp2 = g24[,"2"], grp4 = g24[,"4"])

Classical Seasonal Decomposition by Moving Averages


Decompose a time series into seasonal, trend and irregular components using moving averages. Deals with additive or multiplicative seasonal component.


decompose(x, type = c("additive", "multiplicative"), filter = NULL)



A time series.


The type of seasonal component. Can be abbreviated.


A vector of filter coefficients in reverse time order (as for AR or MA coefficients), used for filtering out the seasonal component. If NULL, a moving average with symmetric window is performed.


The additive model used is:

$$Y_t = T_t + S_t + e_t$$

The multiplicative model used is:

$$Y_t = T_t \, S_t \, e_t$$

The function first determines the trend component using a moving average (if filter is NULL, a symmetric window with equal weights is used), and removes it from the time series. Then, the seasonal figure is computed by averaging, for each time unit, over all periods. The seasonal figure is then centered. Finally, the error component is determined by removing trend and seasonal figure (recycled as needed) from the original time series.

This only works well if x covers an integer number of complete periods.
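
The additive steps described above can be sketched by hand and compared with decompose(). This is an illustration under stated assumptions, not the function's source: for an even frequency the default moving average is taken here to use half weights at the two end points of the window, and the series is assumed to start at the first position of a cycle (as co2 does). The names w, detr and fig are illustrative only.

```r
x <- co2
f <- frequency(x)

## centered moving average over one full period (assumed default filter)
w <- if (f %% 2 == 0) c(0.5, rep(1, f - 1), 0.5) / f else rep(1, f) / f
trend <- stats::filter(x, w)

## average the detrended series per position in the cycle, then center
detr <- as.numeric(x - trend)
fig  <- tapply(detr, as.integer(cycle(x)), mean, na.rm = TRUE)
fig  <- as.vector(fig - mean(fig))

m <- decompose(x)
stopifnot(all.equal(fig, m$figure),
          all.equal(as.numeric(trend), as.numeric(m$trend)))

## the remainder is what is left after removing trend and seasonal parts
resid_check <- m$x - m$trend - m$seasonal - m$random
stopifnot(max(abs(resid_check), na.rm = TRUE) < 1e-8)
```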


An object of class "decomposed.ts" with the following components:


The original series.


The seasonal component (i.e., the repeated seasonal figure).


The estimated seasonal figure only.


The trend component.


The remainder part.


The value of type.


The function stl provides a much more sophisticated decomposition.


David Meyer


M. Kendall and A. Stuart (1983) The Advanced Theory of Statistics, Vol.3, Griffin. pp. 410–414.

See Also

stl




m <- decompose(co2)

## example taken from Kendall/Stuart
x <- c(-50, 175, 149, 214, 247, 237, 225, 329, 729, 809,
       530, 489, 540, 457, 195, 176, 337, 239, 128, 102, 232, 429, 3,
       98, 43, -141, -77, -13, 125, 361, -45, 184)
x <- ts(x, start = c(1951, 1), end = c(1958, 4), frequency = 4)
m <- decompose(x)
## seasonal figure: 6.25, 8.62, -8.84, -6.03
round(decompose(x)$figure / 10, 2)

Modify Terms Objects


delete.response returns a terms object for the same model but with no response variable.

drop.terms removes variables from the right-hand side of the model. There is also a "[.terms" method to perform the same function (with keep.response = TRUE).

reformulate creates a formula from a character vector. If length(termlabels) > 1, its elements are concatenated with +. Non-syntactic names (e.g. containing spaces or special characters; see make.names) must be protected with backticks (see examples). A non-parseable response still works for now, back compatibly, with a deprecation warning.



delete.response(termobj)

reformulate(termlabels, response = NULL, intercept = TRUE, env = parent.frame())

drop.terms(termobj, dropx = NULL, keep.response = FALSE)



A terms object


character vector giving the right-hand side of a model formula. Cannot be zero-length.


character string, symbol or call giving the left-hand side of a model formula, or NULL.


logical: should the formula have an intercept?


the environment of the formula returned.


vector of positions of variables to drop from the right-hand side of the model.


Keep the response in the resulting object?


delete.response and drop.terms return a terms object.

reformulate returns a formula.
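
A brief sketch of how the three functions relate, using a hypothetical model formula built from mtcars variable names (no data are needed to build the terms object):

```r
tt <- terms(mpg ~ wt + hp + qsec)

## remove the response, keep the right-hand side
rhs <- delete.response(tt)
stopifnot(attr(rhs, "response") == 0,
          identical(attr(rhs, "term.labels"), c("wt", "hp", "qsec")))

## drop the 2nd term; the response is dropped too unless keep.response = TRUE
d2 <- drop.terms(tt, 2)
stopifnot(attr(d2, "response") == 0,
          identical(attr(d2, "term.labels"), c("wt", "qsec")))

## the "[" method selects terms to keep and retains the response
k13 <- tt[c(1, 3)]
stopifnot(attr(k13, "response") == 1,
          identical(attr(k13, "term.labels"), c("wt", "qsec")))

## reformulate() builds the corresponding formula from character input
fo <- reformulate(c("wt", "qsec"), response = "mpg")
deparse(fo)
```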

ff <- y ~ z + x + w
tt <- terms(ff)
drop.terms(tt, 2:3, keep.response = TRUE)
reformulate(attr(tt, "term.labels"))

## keep LHS :
reformulate("x*w", ff[[2]])
fS <- surv(ft, case) ~ a + b
reformulate(c("a", "b*f"), fS[[2]])

## using non-syntactic names:
reformulate(c("`P/E`", "`% Growth`"), response = as.name("+-"))

x <- c("a name", "another name")
tryCatch( reformulate(x), error = function(e) "Syntax error." )
## rather backquote the strings in x :
reformulate(sprintf("`%s`", x))

stopifnot(identical(      ~ var, reformulate("var")),
          identical(~ a + b + c, reformulate(letters[1:3])),
          identical(  y ~ a + b, reformulate(letters[1:2], "y")))

Apply a Function to All Nodes of a Dendrogram


Apply function FUN to each node of a dendrogram recursively. When y <- dendrapply(X, FUN), then y is a dendrogram of the same graph structure as X and for each node, y.node[j] <- FUN(X.node[j], ...) (where y.node[j] is an (invalid!) notation for the j-th node of y).


dendrapply(X, FUN, ...)



an object of class "dendrogram".


an R function to be applied to each dendrogram node, typically working on its attributes alone, returning an altered version of the same node.


potential further arguments passed to FUN.


Usually a dendrogram of the same (graph) structure as X. For that, the function must be conceptually of the form FUN <- function(X) { attributes(X) <- .....; X }, i.e., returning the node with some attributes added or changed.
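
A minimal sketch of a FUN of the form described above: it changes one attribute of each leaf and returns the node itself, so the tree structure is preserved. The function name shorten and the six-row subset are illustrative only.

```r
d <- as.dendrogram(hclust(dist(USArrests[1:6, ]), "ave"))

## abbreviate leaf labels to three characters; inner nodes are returned as-is
shorten <- function(n) {
  if (is.leaf(n))
    attr(n, "label") <- substr(attr(n, "label"), 1, 3)
  n
}

d_short <- dendrapply(d, shorten)

stopifnot(inherits(d_short, "dendrogram"),
          nobs(d_short) == nobs(d),
          identical(labels(d_short), substr(labels(d), 1, 3)))
```

Forgetting to return the node at the end of FUN is a common slip; the result is then no longer a dendrogram of the same structure.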


The implementation is somewhat experimental and suggestions for enhancements (or nice examples of usage) are very welcome. The current implementation is recursive and inefficient for dendrograms with many non-leaves. See the ‘Warning’ in dendrogram.


Martin Maechler

as.dendrogram, lapply for applying a function to each component of a list, rapply for doing so to each non-list component of a nested list.



## a smallish simple dendrogram
dhc <- as.dendrogram(hc <- hclust(dist(USArrests), "ave"))
(dhc21 <- dhc[[2]][[1]])

## too simple:
dendrapply(dhc21, function(n) utils::str(attributes(n)))

## toy example to set colored leaf labels :
local({
  colLab <<- function(n) {
      if(is.leaf(n)) {
        a <- attributes(n)
        i <<- i+1
        attr(n, "nodePar") <-
            c(a$nodePar, list(lab.col = mycols[i], lab.font = i%%3))
      }
      n
  }
  mycols <- grDevices::rainbow(attr(dhc21,"members"))
  i <- 0
 })
dL <- dendrapply(dhc21, colLab)
op <- par(mfrow = 2:1)
 plot(dhc21)
 plot(dL) ## --> colored labels!
par(op)

General Tree Structures


Class "dendrogram" provides general functions for handling tree-like structures. It is intended as a replacement for similar functions in hierarchical clustering and classification/regression trees, such that all of these can use the same engine for plotting or cutting trees.


as.dendrogram(object, ...)
## S3 method for class 'hclust'
as.dendrogram(object, hang = -1, check = TRUE, ...)

## S3 method for class 'dendrogram'
as.hclust(x, ...)

## S3 method for class 'dendrogram'
plot(x, type = c("rectangle", "triangle"),
      center = FALSE,
      edge.root = is.leaf(x) || !is.null(attr(x,"edgetext")),
      nodePar = NULL, edgePar = list(),
      leaflab = c("perpendicular", "textlike", "none"),
      dLeaf = NULL, xlab = "", ylab = "", xaxt = "n", yaxt = "s",
      horiz = FALSE, frame.plot = FALSE, xlim, ylim, ...)

## S3 method for class 'dendrogram'
cut(x, h, ...)

## S3 method for class 'dendrogram'
merge(x, y, ..., height,
      adjust = c("auto", "add.max", "none"))

## S3 method for class 'dendrogram'
nobs(object, ...)

## S3 method for class 'dendrogram'
print(x, digits, ...)

## S3 method for class 'dendrogram'
rev(x)

## S3 method for class 'dendrogram'
str(object, max.level = NA, digits.d = 3,
    give.attr = FALSE, wid = getOption("width"),
    nest.lev = 0, indent.str = "",
    last.str = getOption("str.dendrogram.last"), stem = "--",
    ...)




any R object that can be made into one of class "dendrogram".

x, y

object(s) of class "dendrogram".


numeric scalar indicating how the height of leaves should be computed from the heights of their parents; see plot.hclust.


logical indicating if object should be checked for validity. This check is not necessary when x is known to be valid such as when it is the direct result of hclust(). The default is check=TRUE, e.g. for protecting against memory explosion with invalid inputs.


type of plot.


logical; if TRUE, nodes are plotted centered with respect to the leaves in the branch. Otherwise (default), plot them in the middle of all direct child nodes.


logical; if true, draw an edge to the root node.


a list of plotting parameters to use for the nodes (see points) or NULL by default which does not draw symbols at the nodes. The list may contain components named pch, cex, col, xpd, and/or bg each of which can have length two for specifying separate attributes for inner nodes and leaves. Note that the default of pch is 1:2, so you may want to use pch = NA if you specify nodePar.


a list of plotting parameters to use for the edge segments and labels (if there's an edgetext). The list may contain components named col, lty and lwd (for the segments), p.col, p.lwd, and p.lty (for the polygon around the text) and t.col for the text color. As with nodePar, each can have length two for differentiating leaves and inner nodes.


a string specifying how leaves are labeled. The default "perpendicular" writes text vertically (by default).
"textlike" writes text horizontally (in a rectangle), and
"none" suppresses leaf labels.


a number specifying the distance in user coordinates between the tip of a leaf and its label. If NULL as per default, 3/4 of a letter width or height is used.


logical indicating if the dendrogram should be drawn horizontally or not.


logical indicating if a box around the plot should be drawn, see plot.default.


height at which the tree is cut.


height at which the two dendrograms should be merged. If not specified (or NULL), the default is ten percent larger than the (larger of the) two component heights.


a string determining if the leaf values should be adjusted. The default, "auto", checks if the (first) two dendrograms both start at 1; if they do, "add.max" is chosen, which adds the maximum of the previous dendrogram leaf values to each leaf of the “next” dendrogram. Specifying adjust to another value skips the check and hence is a tad more efficient.

xlim, ylim

optional x- and y-limits of the plot, passed to plot.default. The defaults for these show the full dendrogram.

..., xlab, ylab, xaxt, yaxt

graphical parameters, or arguments for other methods.


integer specifying the precision for printing, see print.default.

max.level, digits.d, give.attr, wid, nest.lev, indent.str

arguments to str, see str.default(). Note that give.attr = FALSE still shows height and members attributes for each node.

last.str, stem

strings used for str() specifying how the last branch (at each level) should start and the stem to use for each dendrogram branch. In some environments, using last.str = "'" will provide much nicer looking output than the historical default last.str = "`".


The dendrogram is directly represented as a nested list where each component corresponds to a branch of the tree. Hence, the first branch of tree z is z[[1]], the second branch of the corresponding subtree is z[[1]][[2]], or shorter z[[c(1,2)]], etc. Each node of the tree carries some information needed for efficient plotting or cutting as attributes, of which only members, height and leaf for leaves are compulsory:


total number of leaves in the branch


numeric non-negative height at which the node is plotted.


numeric horizontal distance of the node from the left border (the leftmost leaf) of the branch (unit 1 between all leaves). This is used for plot(*, center = FALSE).


character; the label of the node


for cut()$upper, the number of former members; more generally a substitute for the members component used for ‘horizontal’ (when horiz = FALSE, else ‘vertical’) alignment.


character; the label for the edge leading to the node


a named list (of length-1 components) specifying node-specific attributes for points plotting, see the nodePar argument above.


a named list (of length-1 components) specifying attributes for segments plotting of the edge leading to the node, and drawing of the edgetext if available, see the edgePar argument above.


logical, if TRUE, the node is a leaf of the tree.
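
The nested-list representation and the compulsory attributes can be inspected directly. The following sketch uses the average-linkage tree from the examples; the variable names are illustrative only.

```r
hc   <- hclust(dist(USArrests), "ave")
dend <- as.dendrogram(hc)

## nested indexing: second sub-branch of the first branch, two spellings
b12 <- dend[[1]][[2]]
stopifnot(identical(b12, dend[[c(1, 2)]]))

## 'members' of a node is the sum over its branches
stopifnot(attr(dend, "members") ==
            attr(dend[[1]], "members") + attr(dend[[2]], "members"))

## the root is plotted at the height of the last merge
stopifnot(all.equal(attr(dend, "height"), max(hc$height)))

## descend along first branches until a leaf is reached
node <- dend
while (!is.leaf(node)) node <- node[[1]]
stopifnot(isTRUE(attr(node, "leaf")), attr(node, "members") == 1)
attr(node, "label")
```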

cut.dendrogram() returns a list with components $upper and $lower, the first is a truncated version of the original tree, also of class dendrogram, the latter a list with the branches obtained from cutting the tree, each a dendrogram.

There are [[, print, and str methods for "dendrogram" objects where the first one (extraction) ensures that selecting sub-branches keeps the class, i.e., returns a dendrogram even if only a leaf. On the other hand, [ (single bracket) extraction returns the underlying list structure.
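
A short sketch of the return value of cut() and of class retention under [[ extraction. The cut height 70 is the one used in the examples; the names parts and sizes are illustrative only.

```r
dend  <- as.dendrogram(hclust(dist(USArrests), "ave"))
parts <- cut(dend, h = 70)

## a list with a truncated tree and the branches below the cut
stopifnot(identical(names(parts), c("upper", "lower")),
          inherits(parts$upper, "dendrogram"))

## every lower branch is a dendrogram, and together they hold all leaves
sizes <- vapply(parts$lower, nobs, numeric(1))
stopifnot(all(vapply(parts$lower, inherits, logical(1), what = "dendrogram")),
          sum(sizes) == nobs(dend))

## [[ keeps the class when selecting a sub-branch
stopifnot(inherits(dend[[1]], "dendrogram"))
```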

Objects of class "hclust" can be converted to class "dendrogram" using method as.dendrogram(), and since R 2.13.0, there is also an as.hclust() method as an inverse.

rev.dendrogram simply returns the dendrogram x with reversed nodes, see also reorder.dendrogram.

The merge(x, y, ...) method merges two or more dendrograms into a new one which has x and y (and optional further arguments) as branches. Note that before R 3.1.2, adjust = "none" was used implicitly, which is invalid when, e.g., the dendrograms are from as.dendrogram(hclust(..)).

nobs(object) returns the total number of leaves (the members attribute, see above).

is.leaf(object) returns logical indicating if object is a leaf (the most simple dendrogram).

plotNode() and plotNodeLimit() are helper functions.


Some operations on dendrograms such as merge() make use of recursion. For deep trees it may be necessary to increase options("expressions"): if you do, you are likely to need to set the C stack size (Cstack_info()[["size"]]) larger than the default where possible.



When using type = "triangle", center = TRUE often looks better.


If you really want to see the internal structure, use str(unclass(d)) instead.

dendrapply for applying a function to each node. order.dendrogram and reorder.dendrogram; further, the labels method.


require(graphics); require(utils)

hc <- hclust(dist(USArrests), "ave")
(dend1 <- as.dendrogram(hc)) # "print()" method
str(dend1)          # "str()" method
str(dend1, max.level = 2, last.str =  "'") # only the first two sub-levels
oo <- options(str.dendrogram.last = "\\") # yet another possibility
str(dend1, max.level = 2) # only the first two sub-levels
options(oo)  # .. resetting them

op <- par(mfrow =  c(2,2), mar = c(5,2,1,4))
## "triangle" type and show inner nodes:
plot(dend1, nodePar = list(pch = c(1,NA), cex = 0.8, lab.cex = 0.8),
      type = "t", center = TRUE)
plot(dend1, edgePar = list(col = 1:2, lty = 2:3),
     dLeaf = 1, edge.root = TRUE)
plot(dend1, nodePar = list(pch = 2:1, cex = .4*2:1, col = 2:3),
     horiz = TRUE)

## simple test for as.hclust() as the inverse of as.dendrogram():
stopifnot(identical(as.hclust(dend1)[1:4], hc[1:4]))

dend2 <- cut(dend1, h = 70)
## leaves are wrong horizontally in R 4.0 and earlier:
plot(dend2$upper, nodePar = list(pch = c(1,7), col = 2:1))
##  dend2$lower is *NOT* a dendrogram, but a list of .. :
plot(dend2$lower[[3]], nodePar = list(col = 4), horiz = TRUE, type = "tr")
## "inner" and "leaf" edges in different type & color :
plot(dend2$lower[[2]], nodePar = list(col = 1),   # non empty list
     edgePar = list(lty = 1:2, col = 2:1), edge.root = TRUE)
d3 <- dend2$lower[[2]][[2]][[1]]
stopifnot(identical(d3, dend2$lower[[2]][[c(2,1)]]))
str(d3, last.str = "'")

## to peek at the inner structure "if you must", use '[..]' indexing :
str(d3[2][[1]]) ## or the full

## merge() to join dendrograms:
(d13 <- merge(dend2$lower[[1]], dend2$lower[[3]]))
## merge() all parts back (using default 'height' instead of original one):
den.1 <- Reduce(merge, dend2$lower)
## or merge() all four parts at same height --> 4 branches (!)
d. <- merge(dend2$lower[[1]], dend2$lower[[2]], dend2$lower[[3]],
            dend2$lower[[4]])
## (with a warning) or the same using do.call :
stopifnot(identical(d., do.call(merge, dend2$lower)))
plot(d., main = "merge(d1, d2, d3, d4)  |->  dendrogram with a 4-split")

## "Zoom" in to the first dendrogram :
plot(dend1, xlim = c(1,20), ylim = c(1,50))

nP <- list(col = 3:2, cex = c(2.0, 0.75), pch =  21:22,
           bg =  c("light blue", "pink"),
           lab.cex = 0.75, lab.col = "tomato")
plot(d3, nodePar= nP, edgePar = list(col = "gray", lwd = 2), horiz = TRUE)
addE <- function(n) {
      if(!is.leaf(n)) {
        attr(n, "edgePar") <- list(p.col = "plum")
        attr(n, "edgetext") <- paste(attr(n,"members"),"members")
      }
      n
}
d3e <- dendrapply(d3, addE)
plot(d3e, nodePar =  nP)
plot(d3e, nodePar =  nP, leaflab = "textlike")

Kernel Density Estimation


The (S3) generic function density computes kernel density estimates. Its default method does so with the given kernel and bandwidth for univariate observations.


density(x, ...)
## Default S3 method:
density(x, bw = "nrd0", adjust = 1,
        kernel = c("gaussian", "epanechnikov", "rectangular",
                   "triangular", "biweight",
                   "cosine", "optcosine"),
        weights = NULL, window = kernel, width,
        give.Rkern = FALSE, subdensity = FALSE,
        warnWbw = var(weights) > 0,
        n = 512, from, to, cut = 3, ext = 4,
        old.coords = FALSE,
        na.rm = FALSE, ...)



the data from which the estimate is to be computed. For the default method a numeric vector: long vectors are not supported.


the smoothing bandwidth to be used. The kernels are scaled such that this is the standard deviation of the smoothing kernel. (Note this differs from the reference books cited below.)

bw can also be a character string giving a rule to choose the bandwidth. See bw.nrd.
The default, "nrd0", has remained the default for historical and compatibility reasons, rather than as a general recommendation, where e.g., "SJ" would rather fit, see also Venables and Ripley (2002).

The specified (or computed) value of bw is multiplied by adjust.


the bandwidth used is actually adjust*bw. This makes it easy to specify values like ‘half the default’ bandwidth.
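
For instance, the bw component of the result reflects the multiplication by adjust. This small sketch uses the Old Faithful data; the object names are illustrative only.

```r
x <- faithful$eruptions

## half the default ("nrd0") bandwidth
d_half <- density(x, adjust = 0.5)
stopifnot(all.equal(d_half$bw, 0.5 * bw.nrd0(x)))

## adjust also scales a numeric bw
d_num <- density(x, bw = 0.3, adjust = 2)
stopifnot(all.equal(d_num$bw, 0.6))
```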

kernel, window

a character string giving the smoothing kernel to be used. This must partially match one of "gaussian", "rectangular", "triangular", "epanechnikov", "biweight", "cosine" or "optcosine", with default "gaussian", and may be abbreviated to a unique prefix (single letter).

"cosine" is smoother than "optcosine", which is the usual ‘cosine’ kernel in the literature and almost MSE-efficient. However, "cosine" is the version used by S.


numeric vector of non-negative observation weights, hence of same length as x. The default NULL is equivalent to weights = rep(1/nx, nx) where nx is the length of (the finite entries of) x[]. If na.rm = TRUE and there are NA's in x, they and the corresponding weights are removed before computations. In that case, when the original weights have summed to one, they are re-scaled to keep doing so.

Note that weights are not taken into account for automatic bandwidth rules, i.e., when bw is a string. When the weights are proportional to true counts cn, density(x = rep(x, cn)) may be used instead of weights.


this exists for compatibility with S; if given, and bw is not, will set bw to width if this is a character string, or to a kernel-dependent multiple of width if this is numeric.


logical; if true, no density is estimated, and the ‘canonical bandwidth’ of the chosen kernel is returned instead.


used only when weights are specified which do not sum to one. When true, it indicates that a “sub-density” is desired and no warning should be signalled. By default, when false, a warning is signalled when the weights do not sum to one.


logical, used only when weights are specified and bw is character, i.e., automatic bandwidth selection is chosen (as by default). When true (as by default), a warning is signalled to alert the user that automatic bandwidth selection will not take the weights into account and hence may be suboptimal.


the number of equally spaced points at which the density is to be estimated. When n > 512, it is rounded up to a power of 2 during the calculations (as fft is used) and the final result is interpolated by approx. So it almost always makes sense to specify n as a power of two.

from, to

the left and right-most points of the grid at which the density is to be estimated; the defaults are cut * bw outside of range(x).


by default, the values of from and to are cut bandwidths beyond the extremes of the data. This allows the estimated density to drop to approximately zero at the extremes.
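
The default grid limits can be verified from the returned x coordinates. The sketch below assumes the defaults cut = 3 and n = 512; the object names are illustrative only.

```r
x <- faithful$eruptions
d <- density(x)                       # from, to and cut left at their defaults

## grid runs from cut * bw below the minimum to cut * bw above the maximum
expected <- range(x) + c(-1, 1) * 3 * d$bw
stopifnot(all.equal(range(d$x), expected),
          length(d$x) == 512)

## cut = 0 restricts the grid to the range of the data
d0 <- density(x, cut = 0)
stopifnot(all.equal(range(d0$x), range(x)))
```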


a positive extension factor, 4 by default. The values from and to are further extended on both sides to lo <- from - ext * bw and up <- to + ext * bw which are then used to build the grid used for the FFT and interpolation, see n above. Do not change unless you know what you are doing!


logical to require pre-R 4.4.0 behaviour which gives too large values by a factor of about $(1 + 1/(2n-2))$.


logical; if TRUE, missing values are removed from x. If FALSE any missing values cause an error.


further arguments for (non-default) methods.


The algorithm used in density.default disperses the mass of the empirical distribution function over a regular grid of at least 512 points and then uses the fast Fourier transform to convolve this approximation with a discretized version of the kernel and then uses linear approximation to evaluate the density at the specified points.
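
Because the algorithm bins the data and interpolates, its output is an approximation to the exact kernel sum. The following sketch compares the two for the Gaussian kernel, where the exact estimate at a point t is the average of normal densities centred at the observations with standard deviation bw. The tolerance 0.01 is a loose bound chosen for illustration, not a documented guarantee.

```r
x <- faithful$eruptions
d <- density(x)                       # Gaussian kernel, FFT-based

## exact kernel density estimate evaluated on the same grid
direct <- vapply(d$x,
                 function(t) mean(dnorm(t, mean = x, sd = d$bw)),
                 numeric(1))

## the two agree closely; the difference is the binning/interpolation error
max(abs(direct - d$y))
stopifnot(max(abs(direct - d$y)) < 0.01)
```

The direct evaluation costs one pass over the data per grid point, which is what the FFT approach avoids.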

The statistical properties of a kernel are determined by $\sigma^2_K = \int t^2 K(t) \, dt$ which is always $= 1$ for our kernels (and hence the bandwidth bw is the standard deviation of the kernel) and $R(K) = \int K^2(t) \, dt$.
MSE-equivalent bandwidths (for different kernels) are proportional to $\sigma_K R(K)$ which is scale invariant and for our kernels equal to $R(K)$. This value is returned when give.Rkern = TRUE. See the examples for using exact equivalent bandwidths.
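
The value returned with give.Rkern = TRUE can be checked by numerical integration for kernels whose unit-variance form is easy to write down. This sketch does so for the Gaussian kernel (the standard normal density) and the rectangular kernel (uniform on [-sqrt(3), sqrt(3)], which has variance one); the helper names are illustrative only.

```r
## Gaussian kernel with unit standard deviation
rk_gauss <- integrate(function(t) dnorm(t)^2, -Inf, Inf)$value

## rectangular kernel with unit standard deviation
a <- sqrt(3)
rk_rect <- integrate(function(t) dunif(t, -a, a)^2, -a, a)$value

stopifnot(
  all.equal(rk_gauss, density(kernel = "gaussian", give.Rkern = TRUE),
            tolerance = 1e-6),
  all.equal(rk_rect, density(kernel = "rectangular", give.Rkern = TRUE),
            tolerance = 1e-6))
```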

Infinite values in x are assumed to correspond to a point mass at +/-Inf and the density estimate is of the sub-density on (-Inf, +Inf).


If give.Rkern is true, the number $R(K)$, otherwise an object with class "density" whose underlying structure is a list containing the following components.


the n coordinates of the points where the density is estimated.


the estimated density values. These will be non-negative, but can be zero.


the bandwidth used.


the sample size after elimination of missing values.


the call which produced the result.

the deparsed name of the x argument.

logical, for compatibility (always FALSE).

The print method reports summary values on the x and y components.


Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988). The New S Language. Wadsworth & Brooks/Cole (for S version).

Scott, D. W. (1992). Multivariate Density Estimation. Theory, Practice and Visualization. New York: Wiley.

Sheather, S. J. and Jones, M. C. (1991). A reliable data-based bandwidth selection method for kernel density estimation. Journal of the Royal Statistical Society Series B, 53, 683–690. doi:10.1111/j.2517-6161.1991.tb01857.x.

Silverman, B. W. (1986). Density Estimation. London: Chapman and Hall.

Venables, W. N. and Ripley, B. D. (2002). Modern Applied Statistics with S. New York: Springer.

bw.nrd, plot.density, hist; fft and convolve for the computational short cut used.



plot(density(c(-20, rep(0,98), 20)), xlim = c(-4, 4))  # IQR = 0

# The Old Faithful geyser data
d <- density(faithful$eruptions, bw = "sj")

plot(d, type = "n")
polygon(d, col = "wheat")

## Missing values:
x <- xx <- faithful$eruptions
x[i.out <- sample(length(x), 10)] <- NA
doR <- density(x, bw = 0.15, na.rm = TRUE)
lines(doR, col = "blue")
points(xx[i.out], rep(0.01, 10))

## Weighted observations:
fe <- sort(faithful$eruptions) # has quite a few non-unique values
## use 'counts / n' as weights:
dw <- density(unique(fe), weights = table(fe)/length(fe), bw = d$bw)
utils::str(dw) ## smaller n: only 126, but identical estimate:
stopifnot(all.equal(d[1:3], dw[1:3]))

## simulation from a density() fit:
# a kernel density fit is an equally-weighted mixture.
fit <- density(xx)
N <- 1e6
x.new <- rnorm(N, sample(xx, size = N, replace = TRUE), fit$bw)
plot(fit)
lines(density(x.new), col = "blue")

## The available kernels:
(kernels <- eval(formals(density.default)$kernel))

## show the kernels in the R parametrization
plot (density(0, bw = 1), xlab = "",
      main = "R's density() kernels with bw = 1")
for(i in 2:length(kernels))
   lines(density(0, bw = 1, kernel =  kernels[i]), col = i)
legend(1.5,.4, legend = kernels, col = seq(kernels),
       lty = 1, cex = .8, y.intersp = 1)

## show the kernels in the S parametrization
plot(density(0, from = -1.2, to = 1.2, width = 2, kernel = "gaussian"),
     type = "l", ylim = c(0, 1), xlab = "",
     main = "R's density() kernels with width = 1")
for(i in 2:length(kernels))
   lines(density(0, width = 2, kernel =  kernels[i]), col = i)
legend(0.6, 1.0, legend = kernels, col = seq(kernels), lty = 1)

##-------- Semi-advanced theoretic from here on -------------

## Explore the old.coords TRUE --> FALSE change:
set.seed(7); x <- runif(2^12) # N = 4096
den  <- density(x) # -> grid of n = 512 points
den0 <- density(x, old.coords = TRUE)
summary(den0$y / den$y) # 1.001 ... 1.011
summary(    den0$y / den$y - 1) # ~= 1/(2n-2)
summary(1/ (den0$y / den$y - 1))# ~=    2n-2 = 1022
corr0 <- 1 - 1/(2*512-2) # 1 - 1/(2n-2)
all.equal(den$y, den0$y * corr0)# ~ 0.0001
plot(den$x, (den0$y - den$y)/den$y, type='o', cex=1/4)
title("relative error of density(runif(2^12), old.coords=TRUE)")
abline(h = 1/1022, v = range(x), lty=2); axis(2, at=1/1022, "1/(2n-2)", las=1)

## The R[K] for our kernels:
(RKs <- cbind(sapply(kernels,
                     function(k) density(kernel = k, give.Rkern = TRUE))))
100*round(RKs["epanechnikov",]/RKs, 4) ## Efficiencies

bw <- bw.SJ(precip) ## sensible automatic choice
plot(density(precip, bw = bw),
     main = "same sd bandwidths, 7 different kernels")
for(i in 2:length(kernels))
   lines(density(precip, bw = bw, kernel = kernels[i]), col = i)

## Bandwidth Adjustment for "Exactly Equivalent Kernels"
h.f <- sapply(kernels, function(k)density(kernel = k, give.Rkern = TRUE))
(h.f <- (h.f["gaussian"] / h.f)^ .2)
## -> 1, 1.01, .995, 1.007,... close to 1 => adjustment barely visible..

plot(density(precip, bw = bw),
     main = "equivalent bandwidths, 7 different kernels")
for(i in 2:length(kernels))
   lines(density(precip, bw = bw, adjust = h.f[i], kernel = kernels[i]),
         col = i)
legend(55, 0.035, legend = kernels, col = seq(kernels), lty = 1)

Symbolic and Algorithmic Derivatives of Simple Expressions


Compute derivatives of simple expressions, symbolically and algorithmically.


D (expr, name)
 deriv(expr, ...)
deriv3(expr, ...)

 ## Default S3 method:
deriv(expr, namevec, function.arg = NULL, tag = ".expr",
       hessian = FALSE, ...)
 ## S3 method for class 'formula'
deriv(expr, namevec, function.arg = NULL, tag = ".expr",
       hessian = FALSE, ...)

## Default S3 method:
deriv3(expr, namevec, function.arg = NULL, tag = ".expr",
       hessian = TRUE, ...)
## S3 method for class 'formula'
deriv3(expr, namevec, function.arg = NULL, tag = ".expr",
       hessian = TRUE, ...)



expr

an expression or call or (except D) a formula with no lhs.

name, namevec

character vector, giving the variable names (only one for D()) with respect to which derivatives will be computed.

function.arg

if specified and non-NULL, a character vector of arguments for a function return, or a function (with empty body) or TRUE, the latter indicating that a function with argument names namevec should be used.

tag

character; the prefix to be used for the locally created variables in result. Must be no longer than 60 bytes when translated to the native encoding.

hessian

a logical value indicating whether the second derivatives should be calculated and incorporated in the return value.

...

arguments to be passed to or from methods.


D is modelled after its S namesake for taking simple symbolic derivatives.

deriv is a generic function with a default and a formula method. It returns a call for computing the expr and its (partial) derivatives, simultaneously. It uses so-called algorithmic derivatives. If function.arg is a function, its arguments can have default values, see the fx example below.

Currently, deriv.formula just calls deriv.default after extracting the expression to the right of ~.

deriv3 and its methods are equivalent to deriv and its methods except that hessian defaults to TRUE for deriv3.

The internal code knows about the arithmetic operators +, -, *, / and ^, and the single-variable functions exp, log, sin, cos, tan, sinh, cosh, sqrt, pnorm, dnorm, asin, acos, atan, gamma, lgamma, digamma and trigamma, as well as psigamma for one or two arguments (but derivative only with respect to the first). (Note that only the standard normal distribution is considered.)
Since R 3.4.0, the single-variable functions log1p, expm1, log2, log10, cospi, sinpi, tanpi, factorial, and lfactorial are supported as well.


D returns a call and therefore can easily be iterated for higher derivatives.
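
A small sketch of such iteration, using the illustrative names f, d1 and d2: the call returned by D is passed to D again and then evaluated at a chosen point.

```r
f  <- quote(x^3)
d1 <- D(f, "x")     # first derivative, itself a call
d2 <- D(d1, "x")    # differentiate the result once more
x  <- 2
eval(d1)            # 3 * 2^2 = 12
eval(d2)            # 3 * (2 * 2) = 12
```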

deriv and deriv3 normally return an expression object whose evaluation returns the function values with a "gradient" attribute containing the gradient matrix. If hessian is TRUE the evaluation also returns a "hessian" attribute containing the Hessian array.

If function.arg is not NULL, deriv and deriv3 return a function with those arguments rather than an expression.
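
The following sketch shows the function form together with the "gradient" and "hessian" attributes. The names g, val, grad and hess are illustrative; the expression x^2 * y is an arbitrary choice.

```r
g <- deriv(~ x^2 * y, c("x", "y"), function.arg = TRUE, hessian = TRUE)
val  <- g(3, 2)                 # function value 3^2 * 2 = 18
grad <- attr(val, "gradient")   # 1 x 2 matrix: d/dx = 2*x*y, d/dy = x^2
hess <- attr(val, "hessian")    # 1 x 2 x 2 array of second derivatives

grad[1, "x"]        # 12
grad[1, "y"]        # 9
hess[1, "x", "y"]   # 6
```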


Griewank, A. and Corliss, G. F. (1991) Automatic Differentiation of Algorithms: Theory, Implementation, and Application. SIAM proceedings, Philadelphia.

Bates, D. M. and Chambers, J. M. (1992) Nonlinear models. Chapter 10 of Statistical Models in S eds J. M. Chambers and T. J. Hastie, Wadsworth & Brooks/Cole.

nlm and optim for numeric minimization which could make use of derivatives.


## formula argument :
dx2x <- deriv(~ x^2, "x") ; dx2x
## Not run: expression({
         .value <- x^2
         .grad <- array(0, c(length(.value), 1), list(NULL, c("x")))
         .grad[, "x"] <- 2 * x
         attr(.value, "gradient") <- .grad
         .value
})
## End(Not run)
x <- -1:2
eval(dx2x)

## Something 'tougher':
trig.exp <- expression(sin(cos(x + y^2)))
( D.sc <- D(trig.exp, "x") )
all.equal(D(trig.exp[[1]], "x"), D.sc)

( dxy <- deriv(trig.exp, c("x", "y")) )
y <- 1
eval(dxy)
eval(D.sc)

## function returned:
deriv((y ~ sin(cos(x) * y)), c("x","y"), function.arg = TRUE)

## function with defaulted arguments:
(fx <- deriv(y ~ b0 + b1 * 2^(-x/th), c("b0", "b1", "th"),
             function(b0, b1, th, x = 1:7){} ) )
fx(2, 3, 4)

## First derivative

D(expression(x^2), "x")
stopifnot(D(as.name("x"), "x") == 1)

## Higher derivatives
deriv3(y ~ b0 + b1 * 2^(-x/th), c("b0", "b1", "th"),
     c("b0", "b1", "th", "x") )

## Higher derivatives:
DD <- function(expr, name, order = 1) {
   if(order < 1) stop("'order' must be >= 1")
   if(order == 1) D(expr, name)
   else DD(D(expr, name), name, order - 1)
}
DD(expression(sin(x^2)), "x", 3)
## showing the limits of the internal "simplify()" :
## Not run: 
-sin(x^2) * (2 * x) * 2 + ((cos(x^2) * (2 * x) * (2 * x) + sin(x^2) *
    2) * (2 * x) + sin(x^2) * (2 * x) * 2)

## End(Not run)

## New (R 3.4.0, 2017):
D(quote(log1p(x^2)), "x") ## log1p(x) = log(1 + x)
stopifnot(identical(
       D(quote(log1p(x^2)), "x"),
       D(quote(log(1+x^2)), "x")))
D(quote(expm1(x^2)), "x") ## expm1(x) = exp(x) - 1
stopifnot(identical(
       D(quote(expm1(x^2)), "x") -> Dex1,
       D(quote(exp(x^2)-1), "x")),
       identical(Dex1, quote(exp(x^2) * (2 * x))))

D(quote(sinpi(x^2)), "x") ## sinpi(x) = sin(pi*x)
D(quote(cospi(x^2)), "x") ## cospi(x) = cos(pi*x)
D(quote(tanpi(x^2)), "x") ## tanpi(x) = tan(pi*x)

stopifnot(identical(D(quote(log2 (x^2)), "x"),
                    quote(2 * x/(x^2 * log(2)))),
          identical(D(quote(log10(x^2)), "x"),
                    quote(2 * x/(x^2 * log(10)))))

Model Deviance


Returns the deviance of a fitted model object.


deviance(object, ...)



object

an object for which the deviance is desired.

...

additional optional argument.


This is a generic function which can be used to extract deviances for fitted models. Consult the individual modeling functions for details on how to use this function.
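
As a brief sketch with the cars data (the names fit and gfit are illustrative): for a linear model the deviance is the residual sum of squares, and for a glm fit it is the stored deviance component.

```r
fit <- lm(dist ~ speed, data = cars)
deviance(fit)                                    # residual sum of squares
all.equal(deviance(fit), sum(residuals(fit)^2))

gfit <- glm(dist ~ speed, data = cars, family = poisson)
all.equal(deviance(gfit), gfit$deviance)
```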


The value of the deviance extracted from the object object.


Chambers, J. M. and Hastie, T. J. (1992) Statistical Models in S. Wadsworth & Brooks/Cole.

See Also

df.residual, extractAIC, glm, lm.

Residual Degrees-of-Freedom


Returns the residual degrees-of-freedom extracted from a fitted model object.


df.residual(object, ...)



object

an object for which the degrees-of-freedom are desired.

...

additional optional arguments.


This is a generic function which can be used to extract residual degrees-of-freedom for fitted models. Consult the individual modeling functions for details on how to use this function.

The default method just extracts the df.residual component.
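
A minimal sketch using the cars data, where fit is an illustrative name: with 50 observations and two coefficients, 48 residual degrees of freedom remain.

```r
fit <- lm(dist ~ speed, data = cars)
df.residual(fit)                               # 50 observations - 2 coefficients
identical(df.residual(fit), fit$df.residual)   # the stored component
```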


The value of the residual degrees-of-freedom extracted from the fitted model object.

deviance, glm, lm.

Discrete Integration: Inverse of Differencing


Computes the inverse function of the lagged differences function diff.


diffinv(x, ...)

## Default S3 method:
diffinv(x, lag = 1, differences = 1, xi, ...)
## S3 method for class 'ts'
diffinv(x, lag = 1, differences = 1, xi, ...)



x

a numeric vector, matrix, or time series.

lag

a scalar lag parameter.

differences

an integer representing the order of the difference.

xi

a numeric vector, matrix, or time series containing the initial values for the integrals. If missing, zeros are used.

...

arguments passed to or from other methods.


diffinv is a generic function with methods for class "ts" and default for vectors and matrices.

Missing values are not handled.


A numeric vector, matrix, or time series (the latter for the "ts" method) representing the discrete integral of x.
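
The sketch below inverts diff for a lag of 2 and for second-order differences. It assumes that xi must supply lag * differences initial values, here the first two elements of x; the names x_lag and x_dif are illustrative.

```r
x <- c(3, 1, 4, 1, 5, 9, 2, 6)

## lag = 2: the first two values of x serve as initial values
d2    <- diff(x, lag = 2)
x_lag <- diffinv(d2, lag = 2, xi = x[1:2])

## differences = 2: again lag * differences = 2 initial values
dd    <- diff(x, differences = 2)
x_dif <- diffinv(dd, differences = 2, xi = x[1:2])

all.equal(x_lag, x)
all.equal(x_dif, x)
```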


A. Trapletti

s <- 1:10
d <- diff(s)
diffinv(d, xi = 1)

Distance Matrix Computation


This function computes and returns the distance matrix computed by using the specified distance measure to compute the distances between the rows of a data matrix.


dist(x, method = "euclidean", diag = FALSE, upper = FALSE, p = 2)

as.dist(m, diag = FALSE, upper = FALSE)
## Default S3 method:
as.dist(m, diag = FALSE, upper = FALSE)

## S3 method for class 'dist'
print(x, diag = NULL, upper = NULL,
      digits = getOption("digits"), justify = "none",
      right = TRUE, ...)

## S3 method for class 'dist'
as.matrix(x, ...)



x

a numeric matrix, data frame or "dist" object.

method

the distance measure to be used. This must be one of "euclidean", "maximum", "manhattan", "canberra", "binary" or "minkowski". Any unambiguous substring can be given.

diag

logical value indicating whether the diagonal of the distance matrix should be printed by print.dist.

upper

logical value indicating whether the upper triangle of the distance matrix should be printed by print.dist.

p

The power of the Minkowski distance.

m

An object with distance information to be converted to a "dist" object. For the default method, a "dist" object, or a matrix (of distances) or an object which can be coerced to such a matrix using as.matrix(). (Only the lower triangle of the matrix is used, the rest is ignored).

digits, justify

passed to format inside of print().

right, ...

further arguments, passed to other methods.


Available distance measures are (written for two vectors $x$ and $y$):

euclidean

Usual distance between the two vectors (2 norm aka $L_2$), $\sqrt{\sum_i (x_i - y_i)^2}$.

maximum

Maximum distance between two components of $x$ and $y$ (supremum norm).

manhattan

Absolute distance between the two vectors (1 norm aka $L_1$).

canberra

$\sum_i |x_i - y_i| / (|x_i| + |y_i|)$. Terms with zero numerator and denominator are omitted from the sum and treated as if the values were missing.

This is intended for non-negative values (e.g., counts), in which case the denominator can be written in various equivalent ways. Originally, R used $x_i + y_i$, then from 1998 to 2017, $|x_i + y_i|$, and then the correct $|x_i| + |y_i|$.

binary

(aka asymmetric binary): The vectors are regarded as binary bits, so non-zero elements are ‘on’ and zero elements are ‘off’. The distance is the proportion of bits in which only one is on amongst those in which at least one is on. This is also called “Jaccard” distance in some contexts. Here, two all-zero observations have distance 0, whereas in traditional Jaccard definitions, the distance would be undefined for that case and give NaN numerically.

minkowski

The $p$ norm, the $p$-th root of the sum of the $p$-th powers of the differences of the components.
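
The measures listed above can be reproduced by hand for a single pair of vectors. This is only a sketch: the helper d1 and the names by_hand and via_dist are illustrative, and the vectors contain no zeros or missing values so that no terms are dropped.

```r
x <- c(1, 4, 2, 2)
y <- c(3, 1, 6, 5)
m <- rbind(x, y)
p <- 3

## hypothetical helper: one distance from dist() as a plain number
d1 <- function(method, ...) as.numeric(dist(m, method = method, ...))

by_hand <- c(
  euclidean = sqrt(sum((x - y)^2)),
  maximum   = max(abs(x - y)),
  manhattan = sum(abs(x - y)),
  canberra  = sum(abs(x - y) / (abs(x) + abs(y))),
  minkowski = sum(abs(x - y)^p)^(1 / p)
)
via_dist <- c(
  euclidean = d1("euclidean"),
  maximum   = d1("maximum"),
  manhattan = d1("manhattan"),
  canberra  = d1("canberra"),
  minkowski = d1("minkowski", p = p)
)
all.equal(by_hand, via_dist)
```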

Missing values are allowed, and are excluded from all computations involving the rows within which they occur. Further, when Inf values are involved, all pairs of values are excluded when their contribution to the distance gave NaN or NA. If some columns are excluded in calculating a Euclidean, Manhattan, Canberra or Minkowski distance, the sum is scaled up proportionally to the number of columns used. If all pairs are excluded when calculating a particular distance, the value is NA.
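
A small sketch of the proportional scaling, with illustrative names a, b and m_na: one of three columns is excluded because of an NA, so the sums are scaled by 3/2.

```r
a <- c(1, 2, NA)
b <- c(2, 4, 6)
m_na <- rbind(a, b)

## Two of three columns are usable, so sums are multiplied by 3/2.
man <- as.numeric(dist(m_na, method = "manhattan"))   # (1 + 2) * 3/2
euc <- as.numeric(dist(m_na, method = "euclidean"))   # sqrt((1 + 4) * 3/2)
c(man, euc)
```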

The "dist" method of as.matrix() and as.dist() can be used for conversion between objects of class "dist" and conventional distance matrices.

as.dist() is a generic function. Its default method handles objects inheriting from class "dist", or coercible to matrices using as.matrix(). Support for classes representing distances (also known as dissimilarities) can be added by providing an as.matrix() or, more directly, an as.dist method for such a class.


dist returns an object of class "dist".

The lower triangle of the distance matrix stored by columns in a vector, say do. If n is the number of observations, i.e., n <- attr(do, "Size"), then for $i < j \le n$, the dissimilarity between (row) i and j is do[n*(i-1) - i*(i-1)/2 + j-i]. The length of the vector is $n(n-1)/2$, i.e., of order $n^2$.
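
The index formula can be checked against the full matrix. In this sketch dist_index is a hypothetical helper written for illustration, not part of the stats package.

```r
set.seed(1)
X  <- matrix(rnorm(20), nrow = 5)
do <- dist(X)
n  <- attr(do, "Size")

## position of the (i, j) dissimilarity, i < j <= n, in the stored vector
dist_index <- function(i, j, n) n * (i - 1) - i * (i - 1) / 2 + j - i

i <- 2; j <- 4
do[dist_index(i, j, n)]
as.matrix(do)[i, j]      # same value taken from the full matrix
length(do)               # n * (n - 1) / 2
```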

The object has the following attributes (besides "class" equal to "dist"):


Size

integer, the number of observations in the dataset.

Labels

optionally, contains the labels, if any, of the observations of the dataset.

Diag, Upper

logicals corresponding to the arguments diag and upper above, specifying how the object should be printed.

call

optionally, the call used to create the object.

method

optionally, the distance method used; resulting from dist(), the (match.arg()ed) method argument.


Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.

Mardia, K. V., Kent, J. T. and Bibby, J. M. (1979) Multivariate Analysis. Academic Press.

Borg, I. and Groenen, P. (1997) Modern Multidimensional Scaling. Theory and Applications. Springer.

daisy in the cluster package with more possibilities in the case of mixed (continuous / categorical) variables. hclust.



x <- matrix(rnorm(100), nrow = 5)
dist(x, diag = TRUE)
dist(x, upper = TRUE)
m <- as.matrix(dist(x))
d <- as.dist(m)
stopifnot(d == dist(x))

## Use correlations between variables "as distance"
dd <- as.dist((1 - cor(USJudgeRatings))/2)
round(1000 * dd) # (prints more nicely)
plot(hclust(dd)) # to see a dendrogram of clustered variables

## example of binary and canberra distances.
x <- c(0, 0, 1, 1, 1, 1)
y <- c(1, 0, 1, 1, 0, 1)
dist(rbind(x, y), method = "binary")
## answer 0.4 = 2/5
dist(rbind(x, y), method = "canberra")
## answer 2 * (6/5)

## To find the names
labels(eurodist)

## Examples involving "Inf" :
## 1)
x[6] <- Inf
(m2 <- rbind(x, y))
dist(m2, method = "binary")   # warning, answer 0.5 = 2/4
## These all give "Inf":
stopifnot(Inf == dist(m2, method =  "euclidean"),
          Inf == dist(m2, method =  "maximum"),
          Inf == dist(m2, method =  "manhattan"))
##  "Inf" is same as very large number:
x1 <- x; x1[6] <- 1e100
stopifnot(dist(cbind(x, y), method = "canberra") ==
    print(dist(cbind(x1, y), method = "canberra")))

## 2)
y[6] <- Inf #-> 6-th pair is excluded
dist(rbind(x, y), method = "binary"  )   # warning; 0.5
dist(rbind(x, y), method = "canberra"  ) # 3
dist(rbind(x, y), method = "maximum")    # 1
dist(rbind(x, y), method = "manhattan")  # 2.4

Distributions in the stats package


Density, cumulative distribution function, quantile function and random variate generation for many standard probability distributions are available in the stats package.


The functions for the density/mass function, cumulative distribution function, quantile function and random variate generation are named in the form dxxx, pxxx, qxxx and rxxx respectively.
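
The naming scheme can be illustrated with the normal distribution. This is a sketch only; the step size h and the tolerance are arbitrary choices for the numerical check.

```r
p <- c(0.1, 0.5, 0.9)
q <- qnorm(p)             # quantiles of the standard normal
all.equal(pnorm(q), p)    # the distribution function inverts the quantiles

## the density is the derivative of the distribution function
h <- 1e-6
num_deriv <- (pnorm(1 + h) - pnorm(1 - h)) / (2 * h)
all.equal(num_deriv, dnorm(1), tolerance = 1e-6)

set.seed(42)
r <- rnorm(5)             # five random variates
```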

For the beta distribution see dbeta.

For the binomial (including Bernoulli) distribution see dbinom.

For the Cauchy distribution see dcauchy.

For the chi-squared distribution see dchisq.

For the exponential distribution see dexp.

For the F distribution see df.

For the gamma distribution see dgamma.

For the geometric distribution see dgeom. (This is also a special case of the negative binomial.)

For the hypergeometric distribution see dhyper.

For the log-normal distribution see dlnorm.

For the multinomial distribution see dmultinom.

For the negative binomial distribution see dnbinom.

For the normal distribution see dnorm.

For the Poisson distribution see dpois.

For the Student's t distribution see dt.

For the uniform distribution see dunif.

For the Weibull distribution see dweibull.

For less common distributions of test statistics see pbirthday, dsignrank, ptukey and dwilcox (and see the ‘See Also’ section of cor.test).

RNG about random number generation in R.

The CRAN task view on distributions, mentioning several CRAN packages for additional distributions.

Extract Coefficients in Original Coding


This extracts coefficients in terms of the original levels of the coefficients rather than the coded variables.


dummy.coef(object, ...)

## S3 method for class 'lm'
dummy.coef(object, use.na = FALSE, ...)

## S3 method for class 'aovlist'
dummy.coef(object, use.na = FALSE, ...)



object

a linear model fit.

use.na

logical flag for coefficients in a singular model. If use.na is true, undetermined coefficients will be missing; if false they will get one possible value.

...

arguments passed to or from other methods.


A fitted linear model has coefficients for the contrasts of the factor terms, usually one less in number than the number of levels. This function re-expresses the coefficients in the original coding; as the coefficients will have been fitted in the reduced basis, any implied constraints (e.g., zero sum for contr.helmert or contr.sum) will be respected. There will be little point in using dummy.coef for contr.treatment contrasts, as the missing coefficients are by definition zero.
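
A brief sketch of the zero-sum constraint under contr.sum, using the warpbreaks data; the names fit and dc are illustrative.

```r
fit <- lm(breaks ~ tension, data = warpbreaks,
          contrasts = list(tension = "contr.sum"))
coef(fit)            # intercept plus two contrast coefficients
dc <- dummy.coef(fit)
dc$tension           # one value per level: L, M, H
sum(dc$tension)      # zero, up to rounding, under contr.sum
```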

The method used has some limitations, and will give incomplete results for terms such as poly(x, 2). However, it is adequate for its main purpose, aov models.


A list giving for each term the values of the coefficients. For a multistratum aov model, such a list for each stratum.


This function is intended for human inspection of the output: it should not be used for calculations. Use coded variables for all calculations.

The results differ from S for singular values, where S can be incorrect.

aov, model.tables


options(contrasts = c("contr.helmert", "contr.poly"))
## From Venables and Ripley (2002) p.165.
npk.aov <- aov(yield ~ block + N*P*K, npk)
dummy.coef(npk.aov)

npk.aovE <- aov(yield ~  N*P*K + Error(block), npk)
dummy.coef(npk.aovE)

Empirical Cumulative Distribution Function


Compute an empirical cumulative distribution function, with several methods for plotting, printing and computing with such an “ecdf” object.



ecdf(x)

## S3 method for class 'ecdf'
plot(x, ..., ylab="Fn(x)", verticals = FALSE,
     col.01line = "gray70", pch = 19)

## S3 method for class 'ecdf'
print(x, digits= getOption("digits") - 2, ...)

## S3 method for class 'ecdf'
summary(object, ...)
## S3 method for class 'ecdf'
quantile(x, ...)


x, object

numeric vector of the observations for ecdf; for the methods, an object inheriting from class "ecdf".


...

arguments to be passed to subsequent methods, e.g., plot.stepfun for the plot method.

ylab

label for the y-axis.

verticals

see plot.stepfun.

col.01line

numeric or character specifying the color of the horizontal lines at y = 0 and 1, see colors.

pch

plotting character.

digits

number of significant digits to use, see print.


The e.c.d.f. (empirical cumulative distribution function) $F_n$ is a step function with jumps $i/n$ at observation values, where $i$ is the number of tied observations at that value. Missing values are ignored.

For observations $x = (x_1, x_2, \ldots, x_n)$, $F_n$ is the fraction of observations less than or equal to $t$, i.e.,

$$F_n(t) = \#\{x_i \le t\} / n = \frac{1}{n} \sum_{i=1}^n \mathbf{1}_{[x_i \le t]}.$$
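
The definition can be verified directly. In this sketch the data, the evaluation points t0 and the name by_hand are illustrative choices.

```r
x  <- c(2, 5, 5, 7, 9)     # note the tie at 5
Fn <- ecdf(x)
t0 <- c(1, 5, 6, 9)

## fraction of observations less than or equal to each t
by_hand <- sapply(t0, function(t) mean(x <= t))
by_hand                    # 0.0 0.6 0.6 1.0
all.equal(Fn(t0), by_hand)
```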

The function plot.ecdf, which implements the plot method for ecdf objects, is implemented via a call to plot.stepfun; see its documentation.


For ecdf, a function of class "ecdf", inheriting from the "stepfun" class, and hence inheriting a knots() method.

For the summary method, a summary of the knots of object with a "header" attribute.

The quantile(obj, ...) method computes the same quantiles as quantile(x, ...) would where x is the original sample.


The objects of class "ecdf" are not intended to be used for permanent storage and may change structure between versions of R (and did at R 3.0.0). They can usually be re-created by

    eval(attr(old_obj, "call"), environment(old_obj))

since the data used is stored as part of the object's environment.


See Also

stepfun, the more general class of step functions, approxfun and splinefun.


##-- Simple didactical  ecdf  example :
x <- rnorm(12)
Fn <- ecdf(x)
Fn     # a *function*
Fn(x)  # returns the percentiles for x
tt <- seq(-2, 2, by = 0.1)
12 * Fn(tt) # Fn is a 'simple' function {with values k/12}
##--> see below for graphics
knots(Fn)  # the unique data values {12 of them if there were no ties}

y <- round(rnorm(12), 1); y[3] <- y[1]
Fn12 <- ecdf(y)
knots(Fn12) # unique values (always less than 12!)

## Advanced: What's inside the function closure?
ls(environment(Fn12))
## "f"     "method" "na.rm"  "nobs"   "x"     "y"    "yleft"  "yright"
stopifnot(all.equal(quantile(Fn12), quantile(y)))

###----------------- Plotting --------------------------

op <- par(mfrow = c(3, 1), mgp = c(1.5, 0.8, 0), mar =  .1+c(3,3,2,1))

F10 <- ecdf(rnorm(10))

plot(F10, verticals = TRUE, do.points = FALSE)

plot(Fn12 , lwd = 2) ; mtext("lwd = 2", adj = 1)
xx <- unique(sort(c(seq(-3, 2, length.out = 201), knots(Fn12))))
lines(xx, Fn12(xx), col = "blue")
abline(v = knots(Fn12), lty = 2, col = "gray70")

plot(xx, Fn12(xx), type = "o", cex = .1)  #- plot.default {ugly}
plot(Fn12, col.hor = "red", add =  TRUE)  #- plot method
abline(v = knots(Fn12), lty = 2, col = "gray70")
## luxury plot
plot(Fn12, verticals = TRUE, col.points = "blue",
     col.hor = "red", col.vert = "bisque")

##-- this works too (automatic call to  ecdf(.)):
plot.ecdf(rnorm(24))
title("via  simple  plot.ecdf(x)", adj = 1)

par(op)


Compute Efficiencies of Multistratum Analysis of Variance


Computes the efficiencies of fixed-effect terms in an analysis of variance model with multiple strata.





eff.aovlist(aovlist)



aovlist

The result of a call to aov with an Error term.


Fixed-effect terms in an analysis of variance model with multiple strata may be estimable in more than one stratum, in which case there is less than complete information in each. The efficiency for a term is the fraction of the maximum possible precision (inverse variance) obtainable by estimating in just that stratum. Under the assumption of balance, this is the same for all contrasts involving that term.

This function is used to pick strata in which to estimate terms in model.tables.aovlist and se.contrast.aovlist.

In many cases terms will only occur in one stratum, when all the efficiencies will be one: this is detected and no further calculations are done.

The calculation used requires orthogonal contrasts for each term, and will throw an error if non-orthogonal contrasts (e.g., treatment contrasts or an unbalanced design) are detected.


A matrix giving for each non-pure-error stratum (row) the efficiencies for each fixed-effect term in the model.


Heiberger, R. M. (1989) Computation for the Analysis of Designed Experiments. Wiley.

See Also

aov, model.tables.aovlist, se.contrast.aovlist


## An example from Yates (1932),
## a 2^3 design in 2 blocks replicated 4 times

Block <- gl(8, 4)
A <- factor(c(0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,
B <- factor(c(0,0,1,1,0,0,1,1,0,1,0,1,1,0,1,0,0,0,1,1,
C <- factor(c(0,1,1,0,1,0,0,1,0,0,1,1,0,0,1,1,0,1,0,1,
Yield <- c(101, 373, 398, 291, 312, 106, 265, 450, 106, 306, 324, 449,
           272, 89, 407, 338, 87, 324, 279, 471, 323, 128, 423, 334,
           131, 103, 445, 437, 324, 361, 302, 272)
aovdat <- data.frame(Block, A, B, C, Yield)

old <- getOption("contrasts")
options(contrasts = c("contr.helmert", "contr.poly"))

(fit <- aov(Yield ~ A*B*C + Error(Block), data = aovdat))
eff.aovlist(fit)

options(contrasts = old)

Effects from Fitted Model


Returns (orthogonal) effects from a fitted model, usually a linear model. This is a generic function, but currently only has methods for objects inheriting from classes "lm" and "glm".


effects(object, ...)

## S3 method for class 'lm'
effects(object, set.sign = FALSE, ...)



object

an R object; typically, the result of a model fitting function such as lm.

set.sign

logical. If TRUE, the sign of the effects corresponding to coefficients in the model will be set to agree with the signs of the corresponding coefficients, otherwise the sign is arbitrary.

...

arguments passed to or from other methods.


For a linear model fitted by lm or aov, the effects are the uncorrelated single-degree-of-freedom values obtained by projecting the data onto the successive orthogonal subspaces generated by the QR decomposition during the fitting process. The first $r$ (the rank of the model) are associated with coefficients and the remainder span the space of residuals (but are not associated with particular residuals).
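
A short sketch of these properties for a simple regression on the cars data; the names fit, ee and r are illustrative.

```r
fit <- lm(dist ~ speed, data = cars)
ee  <- effects(fit)
r   <- fit$rank                       # 2: intercept and slope

## The effects are an orthogonal rotation of the response ...
all.equal(sum(ee^2), sum(cars$dist^2))
## ... and those beyond the first r carry the residual sum of squares.
all.equal(sum(ee[-seq_len(r)]^2), sum(residuals(fit)^2))
```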

Empty models do not have effects.


A (named) numeric vector of the same length as residuals, or a matrix if there were multiple responses in the fitted model, in either case of class "coef".

The first $r$ rows are labelled by the corresponding coefficients, and the remaining rows are unlabelled. Note that in rank-deficient models the corresponding coefficients will be in a different order if pivoting occurred.


Chambers, J. M. and Hastie, T. J. (1992) Statistical Models in S. Wadsworth & Brooks/Cole.

See Also

coef



y <- c(1:3, 7, 5)
x <- c(1:3, 6:7)
( ee <- effects(lm(y ~ x)) )
c( round(ee - effects(lm(y+10 ~ I(x-3.8))), 3) )
# just the first is different

Embedding a Time Series


Embeds the time series x into a low-dimensional Euclidean space.


embed (x, dimension = 1)



x

a numeric vector, matrix, or time series.

dimension

a scalar representing the embedding dimension.


Each row of the resulting matrix consists of sequences x[t], x[t-1], ..., x[t-dimension+1], where t is the original index of x. If x is a matrix, i.e., x contains more than one variable, then x[t] consists of the t-th observation on each variable.
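
For a vector the row structure can be checked directly; a minimal sketch with the illustrative name e:

```r
x <- 1:5
e <- embed(x, dimension = 2)
e
## rows are (x[t], x[t-1]) for t = 2, ..., 5
all.equal(e, cbind(x[2:5], x[1:4]))
```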


A matrix containing the embedded time series x.


A. Trapletti, B.D. Ripley


x <- 1:10
embed (x, 3)

Add new variables to a model frame


Evaluates new variables as if they had been part of the formula of the specified model. This ensures that the same na.action and subset arguments are applied and allows, for example, x to be recovered for a model using sin(x) as a predictor.


expand.model.frame(model, extras,
                   envir = environment(formula(model)),
                   na.expand = FALSE)



model

a fitted model

extras

one-sided formula or vector of character strings describing new variables to be added

envir

an environment to evaluate things in

na.expand

logical; see below


If na.expand = FALSE then NA values in the extra variables will be passed to the na.action function used in model. This may result in a shorter data frame (with na.omit) or an error (with na.fail). If na.expand = TRUE the returned data frame will have precisely the same rows as model.frame(model), but the columns corresponding to the extra variables may contain NA.
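
The sketch below uses a small made-up data frame (dat, with arbitrary values) to recover x from a model in sin(x) and to show the effect of na.expand. The model is fitted with an explicit na.action = na.omit, as the behaviour depends on the na.action recorded in the model call.

```r
dat <- data.frame(x = seq(0.5, 5, by = 0.5),
                  y = c(1.2, 0.8, 1.9, 2.4, 1.1, 0.3, 0.9, 1.7, 2.2, 1.4),
                  z = c(1, NA, 3, 4, 5, 6, 7, 8, 9, 10))
fit <- lm(y ~ sin(x), data = dat, na.action = na.omit)

names(model.frame(fit))                  # x itself is not a column
mf_x <- expand.model.frame(fit, ~ x)     # adds the untransformed x
names(mf_x)

## z has one NA: that row is omitted unless na.expand = TRUE
n_drop <- nrow(expand.model.frame(fit, ~ z))
n_keep <- nrow(expand.model.frame(fit, ~ z, na.expand = TRUE))
c(n_drop, n_keep)
```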


A data frame.

See Also

model.frame, predict


model <- lm(log(Volume) ~ log(Girth) + log(Height), data = trees)
expand.model.frame(model, ~ Girth) # prints data.frame like

dd <- data.frame(x = 1:5, y = rnorm(5), z = c(1,2,NA,4,5))
model <- glm(y ~ x, data = dd, subset = 1:4, na.action = na.omit)
expand.model.frame(model, "z", na.expand = FALSE) # = default
expand.model.frame(model, "z", na.expand = TRUE)

The Exponential Distribution


Density, distribution function, quantile function and random generation for the exponential distribution with rate rate (i.e., mean 1/rate).


dexp(x, rate = 1, log = FALSE)
pexp(q, rate = 1, lower.tail = TRUE, log.p = FALSE)
qexp(p, rate = 1, lower.tail = TRUE, log.p = FALSE)
rexp(n, rate = 1)


x, q

vector of quantiles.


p

vector of probabilities.

n

number of observations. If length(n) > 1, the length is taken to be the number required.

rate

vector of rates.

log, log.p

logical; if TRUE, probabilities p are given as log(p).

lower.tail

logical; if TRUE (default), probabilities are $P[X \le x]$, otherwise, $P[X > x]$.


If rate is not specified, it assumes the default value of 1.

The exponential distribution with rate $\lambda$ has density

$$f(x) = \lambda e^{-\lambda x}$$

for $x \ge 0$.


dexp gives the density, pexp gives the distribution function, qexp gives the quantile function, and rexp generates random deviates.

The length of the result is determined by n for rexp, and is the maximum of the lengths of the numerical arguments for the other functions.

The numerical arguments other than n are recycled to the length of the result. Only the first elements of the logical arguments are used.


The cumulative hazard $H(t) = -\log(1 - F(t))$ is -pexp(t, r, lower = FALSE, log = TRUE).
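
A quick numerical check of the density, the distribution function and the cumulative hazard, which for the exponential distribution equals rate times t. The rate and the time points tt are arbitrary illustrative values.

```r
rate <- 2.5
tt   <- c(0.1, 1, 4)

## density and distribution function from the closed forms
all.equal(dexp(tt, rate), rate * exp(-rate * tt))
all.equal(pexp(tt, rate), 1 - exp(-rate * tt))

## cumulative hazard H(t) = -log(1 - F(t))
H <- -pexp(tt, rate, lower.tail = FALSE, log.p = TRUE)
all.equal(H, rate * tt)
```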


dexp, pexp and qexp are all calculated from numerically stable versions of the definitions.

rexp uses

Ahrens, J. H. and Dieter, U. (1972). Computer methods for sampling from the exponential and normal distributions. Communications of the ACM, 15, 873–882.


Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.

Johnson, N. L., Kotz, S. and Balakrishnan, N. (1995) Continuous Univariate Distributions, volume 1, chapter 19. Wiley, New York.

See Also

exp for the exponential function.

Distributions for other standard distributions, including dgamma for the gamma distribution and dweibull for the Weibull distribution, both of which generalize the exponential.


dexp(1) - exp(-1) #-> 0

## a fast way to generate *sorted*  U[0,1]  random numbers:
rsunif <- function(n) { n1 <- n+1
   cE <- cumsum(rexp(n1)); cE[seq_len(n)]/cE[n1] }
plot(rsunif(1000), ylim=0:1, pch=".")
abline(0,1/(1000+1), col=adjustcolor(1, 0.5))

Extract AIC from a Fitted Model


Computes the (generalized) Akaike An Information Criterion for a fitted parametric model.


extractAIC(fit, scale, k = 2, ...)



fit

fitted model, usually the result of a fitter like lm.

scale

optional numeric specifying the scale parameter of the model, see scale in step. Currently only used in the "lm" method, where scale specifies the estimate of the error variance, and scale = 0 indicates that it is to be estimated by maximum likelihood.

k

numeric specifying the ‘weight’ of the equivalent degrees of freedom ($\equiv$ edf) part in the AIC formula.

...

further arguments (currently unused in base R).


This is a generic function, with methods in base R for classes "aov", "glm" and "lm" as well as for "negbin" (package MASS) and "coxph" and "survreg" (package survival).

The criterion used is

$$\mathrm{AIC} = -2 \log L + k \times \mathrm{edf},$$

where $L$ is the likelihood and edf the equivalent degrees of freedom (i.e., the number of free parameters for usual parametric models) of fit.

For linear models with unknown scale (i.e., for lm and aov), $-2\log L$ is computed from the deviance and uses a different additive constant to logLik and hence AIC. If $RSS$ denotes the (weighted) residual sum of squares then extractAIC uses for $-2\log L$ the formulae $RSS/s - n$ (corresponding to Mallows' $C_p$) in the case of known scale $s$ and $n \log (RSS/n)$ for unknown scale. AIC only handles unknown scale and uses the formula $n \log (RSS/n) + n + n \log 2\pi - \sum \log w$ where $w$ are the weights. Further AIC counts the scale estimation as a parameter in the edf and extractAIC does not.

For glm fits the family's aic() function is used to compute the AIC: see the note under logLik about the assumptions this makes.

k = 2 corresponds to the traditional AIC; using k = log(n) provides the BIC (Bayesian IC) instead.

Note that the methods for this function may differ in their assumptions from those of methods for AIC (usually via a method for logLik). We have already mentioned the case of "lm" models with estimated scale, and there are similar issues in the "glm" and "negbin" methods where the dispersion parameter may or may not be taken as ‘free’. This is immaterial as extractAIC is only used to compare models of the same class (where only differences in AIC values are considered).
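The point about differences can be illustrated with two nested linear models. This is only a sketch: the cars data and the names fit0 and fit1 are assumptions made for the example.

```r
## Two "lm" fits of the same class on the same data
fit0 <- lm(dist ~ 1,     data = cars)
fit1 <- lm(dist ~ speed, data = cars)
n    <- nobs(fit1)

## The values themselves differ between extractAIC() and AIC() ...
c(extractAIC = extractAIC(fit1)[2], AIC = AIC(fit1))

## ... but differences between the two models agree
d_extract <- extractAIC(fit1)[2] - extractAIC(fit0)[2]
d_AIC     <- AIC(fit1) - AIC(fit0)
all.equal(d_extract, d_AIC)

## The same holds with the BIC penalty k = log(n)
d_extract_bic <- extractAIC(fit1, k = log(n))[2] -
                 extractAIC(fit0, k = log(n))[2]
all.equal(d_extract_bic, BIC(fit1) - BIC(fit0))
```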


A numeric vector of length 2, with first and second elements giving


the ‘equivalent degrees of freedom’ for the fitted model fit.


the (generalized) Akaike Information Criterion for fit.


This function is used in add1, drop1 and step and the similar functions in package MASS from which it was adopted.


B. D. Ripley


Venables, W. N. and Ripley, B. D. (2002) Modern Applied Statistics with S. New York: Springer (4th ed).

AIC, deviance, add1, step


utils::example(glm)  # fits glm.D93, which is used below
extractAIC(glm.D93)  #>>  5  15.129

Factor Analysis


Perform maximum-likelihood factor analysis on a covariance matrix or data matrix.


factanal(x, factors, data = NULL, covmat = NULL, n.obs = NA,
         subset, na.action, start = NULL,
         scores = c("none", "regression", "Bartlett"),
         rotation = "varimax", control = NULL, ...)



x

A formula or a numeric matrix or an object that can be coerced to a numeric matrix.


factors

The number of factors to be fitted.


data

An optional data frame (or similar: see model.frame), used only if x is a formula. By default the variables are taken from environment(formula).


covmat

A covariance matrix, or a covariance list as returned by cov.wt. Of course, correlation matrices are covariance matrices.


n.obs

The number of observations, used if covmat is a covariance matrix.


subset

A specification of the cases to be used, if x is used as a matrix or formula.


na.action

The na.action to be used if x is used as a formula.


start

NULL or a matrix of starting values, each column giving an initial set of uniquenesses.


scores

Type of scores to produce, if any. The default is "none"; "regression" gives Thomson's scores and "Bartlett" gives Bartlett's weighted least-squares scores. Partial matching allows these names to be abbreviated.


rotation

character. "none" or the name of a function to be used to rotate the factors: it will be called with first argument the loadings matrix, and should return a list with component loadings giving the rotated loadings, or just the rotated loadings.


control

A list of control values:


nstart

The number of starting values to be tried if start = NULL. Default 1.


trace

logical. Output tracing information? Default FALSE.


lower

The lower bound for uniquenesses during optimization. Should be > 0. Default 0.005.


opt

A list of control values to be passed to optim's control argument.


rotation

a list of additional arguments for the rotation function.


...

Components of control can also be supplied as named arguments to factanal.


The factor analysis model is

$$x = \Lambda f + e$$

where $\Lambda$ is the matrix of loadings, $f$ the vector of (unobserved) scores and $e$ the vector of errors, whose variances $\Psi$ are the uniquenesses. Thus factor analysis is in essence a model for the correlation matrix of $x$,

$$\Sigma = \Lambda\Lambda^\prime + \Psi$$

There is still some indeterminacy in the model, for it is unchanged if $\Lambda$ is replaced by $\Lambda G$ for any orthogonal matrix $G$. Such matrices $G$ are known as rotations (although the term is applied also to non-orthogonal invertible matrices).
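The indeterminacy can be seen numerically. In this sketch the loadings are stored one column per factor, so the orthogonal matrix acts on the right of the loadings matrix. The rotation angle theta and the names L, G and LG are arbitrary illustrative choices; ability.cov is the covariance list shipped with R.

```r
## Unrotated two-factor fit from a covariance list
fa <- factanal(factors = 2, covmat = ability.cov, rotation = "none")
L  <- unclass(loadings(fa))           # 6 x 2 matrix of loadings

## An arbitrary 2 x 2 orthogonal (rotation) matrix
theta <- pi / 6
G <- matrix(c(cos(theta), sin(theta), -sin(theta), cos(theta)), 2, 2)
all.equal(crossprod(G), diag(2))      # G'G = I

## Rotated loadings give the same common part of Sigma
LG <- L %*% G
all.equal(LG %*% t(LG), L %*% t(L), check.attributes = FALSE)

## Fitted correlation matrix implied by the model
Sigma_hat <- L %*% t(L) + diag(fa$uniquenesses)
```

Because the fit cannot distinguish L from LG, a rotation criterion such as varimax is needed to pick one representative.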

If covmat is supplied it is used. Otherwise x is used if it is a matrix, or a formula x is used with data to construct a model matrix, and that is used to construct a covariance matrix. (It makes no sense for the formula to have a response, and all the variables must be numeric.) Once a covariance matrix is found or calculated from x, it is converted to a correlation matrix for analysis. The correlation matrix is returned as component correlation of the result.

The fit is done by optimizing the log likelihood assuming multivariate normality over the uniquenesses. (The maximizing loadings for given uniquenesses can be found analytically: Lawley & Maxwell (1971, p. 27).) All the starting values supplied in start are tried in turn and the best fit obtained is used. If start = NULL then the first fit is started at the value suggested by Jöreskog (1963) and given by Lawley & Maxwell (1971, p. 31), and then control$nstart - 1 other values are tried, randomly selected as equal values of the uniquenesses.

The uniquenesses are technically constrained to lie in $[0, 1]$, but near-zero values are problematical, and the optimization is done with a lower bound of control$lower, default 0.005 (Lawley & Maxwell, 1971, p. 32).

Scores can only be produced if a data matrix is supplied and used. The first method is the regression method of Thomson (1951), the second the weighted least squares method of Bartlett (1937, 1938). Both are estimates of the unobserved scores $f$. Thomson's method regresses (in the population) the unknown $f$ on $x$ to yield

$$\hat f = \Lambda^\prime \Sigma^{-1} x$$

and then substitutes the sample estimates of the quantities on the right-hand side. Bartlett's method minimizes the sum of squares of standardized errors over the choice of $f$, given (the fitted) $\Lambda$.
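A sketch of the regression scores computed by hand, under two assumptions that should be checked against the source of factanal: that the data are standardized with scale(), and that the sample correlation matrix (component correlation) is the estimate substituted for $\Sigma$. The simulated one-factor data and all object names are invented for the example.

```r
## Simulate 200 cases from a one-factor model with 5 variables
set.seed(1)
n   <- 200
f   <- rnorm(n)
lam <- c(0.9, 0.8, 0.7, 0.6, 0.5)
X <- outer(f, lam) +
     sweep(matrix(rnorm(n * 5), n, 5), 2, sqrt(1 - lam^2), "*")
colnames(X) <- paste0("v", 1:5)

fa     <- factanal(X, factors = 1, scores = "regression")
Lambda <- unclass(loadings(fa))

## f_hat = Lambda' Sigma^{-1} x for each standardized row x
by_hand <- scale(X) %*% solve(fa$correlation, Lambda)
all.equal(by_hand, fa$scores, check.attributes = FALSE)
```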

If x is a formula then the standard NA-handling is applied to the scores (if requested): see napredict.

The print method (documented under loadings) follows the factor analysis convention of drawing attention to the patterns of the results, so the default precision is three decimal places, and small loadings are suppressed.


An object of class "factanal" with components


loadings

A matrix of loadings, one column for each factor. The factors are ordered in decreasing order of sums of squares of loadings, and given the sign that will make the sum of the loadings positive. This is of class "loadings": see loadings for its print method.


uniquenesses

The uniquenesses computed.


correlation

The correlation matrix used.


criteria

The results of the optimization: the value of the criterion (a linear function of the negative log-likelihood) and information on the iterations used.


factors

The argument factors.


dof

The number of degrees of freedom of the factor analysis model.


method

The method: always "mle".


rotmat

The rotation matrix if relevant.


scores

If requested, a matrix of scores. napredict is applied to handle the treatment of values omitted by the na.action.


n.obs

The number of observations if available, or NA.


call

The matched call.


na.action

If relevant.


STATISTIC, PVAL

The significance-test statistic and P value, if it can be computed.


There are so many variations on factor analysis that it is hard to compare output from different programs. Further, the optimization in maximum likelihood factor analysis is hard, and many other examples we compared had less good fits than produced by this function. In particular, solutions which are ‘Heywood cases’ (with one or more uniquenesses essentially zero) are much more common than most texts and some other programs would lead one to believe.


Bartlett, M. S. (1937). The statistical conception of mental factors. British Journal of Psychology, 28, 97–104. doi:10.1111/j.2044-8295.1937.tb00863.x.

Bartlett, M. S. (1938). Methods of estimating mental factors. Nature, 141, 609–610. doi:10.1038/141246a0.

Jöreskog, K. G. (1963). Statistical Estimation in Factor Analysis. Almqvist and Wicksell.

Lawley, D. N. and Maxwell, A. E. (1971). Factor Analysis as a Statistical Method. Second edition. Butterworths.

Thomson, G. H. (1951). The Factorial Analysis of Human Ability. London University Press.

loadings (which explains some details of the print method), varimax, princomp, ability.cov, Harman23.cor, Harman74.cor.

Other rotation methods are available in various contributed packages, including GPArotation and psych.


# A little demonstration, v2 is just v1 with noise,
# and same for v4 vs. v3 and v6 vs. v5
# Last four cases are there to add noise
# and introduce a positive manifold (g factor)
v1 <- c(1,1,1,1,1,1,1,1,1,1,3,3,3,3,3,4,5,6)
v2 <- c(1,2,1,1,1,1,2,1,2,1,3,4,3,3,3,4,6,5)
v3 <- c(3,3,3,3,3,1,1,1,1,1,1,1,1,1,1,5,4,6)
v4 <- c(3,3,4,3,3,1,1,2,1,1,1,1,2,1,1,5,6,4)
v5 <- c(1,1,1,1,1,3,3,3,3,3,1,1,1,1,1,6,4,5)
v6 <- c(1,1,1,2,1,3,3,3,4,3,1,1,1,2,1,6,5,4)
m1 <- cbind(v1,v2,v3,v4,v5,v6)
factanal(m1, factors = 3) # varimax is the default
factanal(m1, factors = 3, rotation = "promax")
# The following shows the g factor as PC1
prcomp(m1) # signs may depend on platform

## formula interface
factanal(~v1+v2+v3+v4+v5+v6, factors = 3,
         scores = "Bartlett")$scores

## a realistic example from Bartholomew (1987, pp. 61-65)
utils::example(ability.cov)

Compute Allowed Changes in Adding to or Dropping from a Formula


add.scope and drop.scope compute those terms that can be individually added to or dropped from a model while respecting the hierarchy of terms.


add.scope(terms1, terms2)

drop.scope(terms1, terms2)

factor.scope(factor, scope)



terms1

the terms or formula for the base model.


terms2

the terms or formula for the upper (add.scope) or lower (drop.scope) scope. If missing for drop.scope it is taken to be the null formula, so all terms (except any intercept) are candidates to be dropped.


factor

the "factor" attribute of the terms of the base object.


scope

a list with one or both components drop and add giving the "factor" attribute of the lower and upper scopes respectively.


factor.scope is not intended to be called directly by users.


For add.scope and drop.scope a character vector of terms labels. For factor.scope, a list with components drop and add, character vectors of terms labels.
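A short sketch of how the scope arguments interact with the hierarchy of terms. The formulas use placeholder variables a, b and c, and the object names base, free and kept are illustrative.

```r
base <- ~ a + b + c + a:b

## No lower scope: a and b are protected by the a:b interaction
free <- drop.scope(base)

## Lower scope ~ c: c must stay in the model, so only a:b can go
kept <- drop.scope(base, ~ c)

## Upper scope with all two-way interactions: a:c and b:c need c first
added <- add.scope(~ a + b, ~ (a + b + c)^2)

free   # "c"   "a:b"
kept   # "a:b"
added  # "c"   "a:b"
```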

add1, drop1, aov, lm


add.scope( ~ a + b + c + a:b,  ~ (a + b + c)^3)
# [1] "a:c" "b:c"
drop.scope( ~ a + b + c + a:b)
# [1] "c"   "a:b"

Family Objects for Models


Family objects provide a convenient way to specify the details of the models used by functions such as glm. See the documentation for glm for the details on how such model fitting takes place.


family(object, ...)

binomial(link = "logit")
gaussian(link = "identity")
Gamma(link = "inverse")
inverse.gaussian(link = "1/mu^2")
poisson(link = "log")
quasi(link = "identity", variance = "constant")
quasibinomial(link = "logit")
quasipoisson(link = "log")



link

a specification for the model link function. This can be a name/expression, a literal character string, a length-one character vector, or an object of class "link-glm" (such as generated by make.link) provided it is not specified via one of the standard names given next.

The gaussian family accepts the links (as names) identity, log and inverse; the binomial family the links logit, probit, cauchit, (corresponding to logistic, normal and Cauchy CDFs respectively) log and cloglog (complementary log-log); the Gamma family the links inverse, identity and log; the poisson family the links log, identity, and sqrt; and the inverse.gaussian family the links 1/mu^2, inverse, identity and log.

The quasi family accepts the links logit, probit, cloglog, identity, inverse, log, 1/mu^2 and sqrt, and the function power can be used to create a power link function.


variance

for all families other than quasi, the variance function is determined by the family. The quasi family will accept the literal character string (or unquoted as a name/expression) specifications "constant", "mu(1-mu)", "mu", "mu^2" and "mu^3", a length-one character vector taking one of those values, or a list containing components varfun, validmu, dev.resids, initialize and name.


object

the function family accesses the family objects which are stored within objects created by modelling functions (e.g., glm).


...

further arguments passed to methods.


family is a generic function with methods for classes "glm" and "lm" (the latter returning gaussian()).

For the binomial and quasibinomial families the response can be specified in one of three ways:

  1. As a factor: ‘success’ is interpreted as the factor not having the first level (and hence usually of having the second level).

  2. As a numerical vector with values between 0 and 1, interpreted as the proportion of successful cases (with the total number of cases given by the weights).

  3. As a two-column integer matrix: the first column gives the number of successes and the second the number of failures.
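The three forms describe the same likelihood, which the following sketch checks on simulated data. The data, the seed and all object names (dose, y, succ, total, f1, f2, f3) are invented for the illustration.

```r
set.seed(123)
dose <- rep(1:4, each = 25)
y    <- rbinom(length(dose), 1, plogis(-2 + 0.8 * dose))

## 1. factor response: the first level ("fail") counts as failure
yf <- factor(y, levels = 0:1, labels = c("fail", "ok"))
f1 <- glm(yf ~ dose, family = binomial())

## Aggregate to one row per dose for the grouped forms
succ  <- as.vector(tapply(y, dose, sum))
total <- as.vector(tapply(y, dose, length))
d     <- 1:4

## 2. proportion of successes, with the group sizes as weights
f2 <- glm(succ / total ~ d, weights = total, family = binomial())

## 3. two-column matrix: successes, then failures
f3 <- glm(cbind(succ, total - succ) ~ d, family = binomial())

## All three give the same coefficient estimates
all.equal(unname(coef(f1)), unname(coef(f2)), tolerance = 1e-6)
all.equal(unname(coef(f2)), unname(coef(f3)), tolerance = 1e-6)
```

The deviances of the grouped and ungrouped fits differ, since the saturated models differ, but the estimates do not.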

The quasibinomial and quasipoisson families differ from the binomial and poisson families only in that the dispersion parameter is not fixed at one, so they can model over-dispersion. For the binomial case see McCullagh and Nelder (1989, pp. 124–8). Although they show that there is (under some restrictions) a model with variance proportional to mean as in the quasi-binomial model, note that glm does not compute maximum-likelihood estimates in that model. The behaviour of S is closer to the quasi- variants.
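As a sketch of what the free dispersion parameter changes in practice: the coefficient estimates are the same, and the standard errors are scaled by the square root of the estimated dispersion. The data are those of example(glm); the names fit_p, fit_q and phi are illustrative.

```r
d.AD <- data.frame(treatment = gl(3, 3),
                   outcome   = gl(3, 1, 9),
                   counts    = c(18, 17, 15, 20, 10, 20, 25, 13, 12))
fit_p <- glm(counts ~ outcome + treatment, data = d.AD, family = poisson())
fit_q <- glm(counts ~ outcome + treatment, data = d.AD,
             family = quasipoisson())

## Same point estimates
all.equal(coef(fit_p), coef(fit_q))

## Standard errors differ by sqrt(dispersion)
phi  <- summary(fit_q)$dispersion
se_p <- coef(summary(fit_p))[, "Std. Error"]
se_q <- coef(summary(fit_q))[, "Std. Error"]
all.equal(se_q, se_p * sqrt(phi))
```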


An object of class "family" (which has a concise print method). This is a list with elements


family

character: the family name.


link

character: the link name.


linkfun

function: the link.


linkinv

function: the inverse of the link function.


variance

function: the variance as a function of the mean.


dev.resids

function giving the deviance for each observation as a function of (y, mu, wt), used by the residuals method when computing deviance residuals.


aic

function giving the AIC value if appropriate (but NA for the quasi- families). More precisely, this function returns $-2\ell + 2s$, where $\ell$ is the log-likelihood and $s$ is the number of estimated scale parameters. Note that the penalty term for the location parameters (typically the “regression coefficients”) is added elsewhere, e.g., in glm.fit(), or AIC(), see the AIC example in glm. See logLik for the assumptions made about the dispersion parameter.


mu.eta

function: derivative of the inverse-link function with respect to the linear predictor. If the inverse-link function is $\mu = g^{-1}(\eta)$ where $\eta$ is the value of the linear predictor, then this function returns $d(g^{-1})/d\eta = d\mu/d\eta$.


initialize

expression. This needs to set up whatever data objects are needed for the family as well as n (needed for AIC in the binomial family) and mustart (see glm).


validmu

logical function. Returns TRUE if a mean vector mu is within the domain of variance.


valideta

logical function. Returns TRUE if a linear predictor eta is within the domain of linkinv.


simulate

(optional) function simulate(object, nsim) to be called by the "lm" method of simulate. It will normally return a matrix with nsim columns and one row for each fitted value, but it can also return a list of length nsim. Clearly this will be missing for ‘quasi-’ families.


dispersion

(optional since R version 4.3.0) numeric: value of the dispersion parameter, if fixed, or NA_real_ if free.
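The description of the aic element can be checked on a Poisson fit, where no scale parameter is estimated: the element returns minus twice the log-likelihood, and the penalty for the regression coefficients is added separately. This is a sketch; the data are those of example(glm), and fam, fit and m2ll are illustrative names.

```r
d.AD <- data.frame(treatment = gl(3, 3),
                   outcome   = gl(3, 1, 9),
                   counts    = c(18, 17, 15, 20, 10, 20, 25, 13, 12))
fam <- poisson()
fit <- glm(counts ~ outcome + treatment, data = d.AD, family = fam)

## The family's aic() element, called with unit prior weights
m2ll <- fam$aic(y = d.AD$counts, n = rep(1, 9), mu = fitted(fit),
                wt = rep(1, 9), dev = deviance(fit))

all.equal(m2ll, -2 * as.numeric(logLik(fit)))   # -2 * log-likelihood
all.equal(m2ll + 2 * fit$rank, AIC(fit))        # penalty added elsewhere
```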


The link and variance arguments have rather awkward semantics for back-compatibility. The recommended way is to supply them as quoted character strings, but they can also be supplied unquoted (as names or expressions). Additionally, they can be supplied as a length-one character vector giving the name of one of the options, or as a list (for link, of class "link-glm"). The restrictions apply only to links given as names: when given as a character string all the links known to make.link are accepted.

This is potentially ambiguous: supplying link = logit could mean the unquoted name of a link or the value of object logit. It is interpreted if possible as the name of an allowed link, then as an object. (You can force the interpretation to always be the value of an object via logit[1].)


The design was inspired by S functions of the same names described in Hastie & Pregibon (1992) (except quasibinomial and quasipoisson).


McCullagh P. and Nelder, J. A. (1989) Generalized Linear Models. London: Chapman and Hall.

Dobson, A. J. (1983) An Introduction to Statistical Modelling. London: Chapman and Hall.

Cox, D. R. and Snell, E. J. (1981). Applied Statistics; Principles and Examples. London: Chapman and Hall.

Hastie, T. J. and Pregibon, D. (1992) Generalized linear models. Chapter 6 of Statistical Models in S eds J. M. Chambers and T. J. Hastie, Wadsworth & Brooks/Cole.

glm, power, make.link.

For binomial coefficients, choose; the binomial and negative binomial distributions, Binomial, and NegBinomial.


require(utils) # for str

nf <- gaussian()  # Normal family

gf <- Gamma()
gf$variance(-3:4) #- == (.)^2

## Binomial with default 'logit' link:  Check some properties visually:
bi <- binomial()
et <- seq(-10,10, by=1/8)
plot(et, bi$mu.eta(et), type="l")
## show that mu.eta() is derivative of linkinv() :
lines((et[-1]+et[-length(et)])/2, col=adjustcolor("red", 1/4),
      diff(bi$linkinv(et))/diff(et), type="l", lwd=4)
## which here is the logistic density:
lines(et, dlogis(et), lwd=3, col=adjustcolor("blue", 1/4))
stopifnot(exprs = {
  all.equal(bi$ mu.eta(et), dlogis(et))
  all.equal(bi$linkinv(et), plogis(et) -> m)
  all.equal(bi$linkfun(m ), qlogis(m))    #  logit(.) == qlogis(.) !
})

## Data from example(glm) :
d.AD <- data.frame(treatment = gl(3,3),
                   outcome   = gl(3,1,9),
                   counts    = c(18,17,15, 20,10,20, 25,13,12))
glm.D93 <- glm(counts ~ outcome + treatment, d.AD, family = poisson())
## Quasipoisson: compare with above / example(glm) :
glm.qD93 <- glm(counts ~ outcome + treatment, d.AD, family = quasipoisson())

anova  (glm.qD93, test = "F")
## for Poisson results (same as from 'glm.D93' !) use
anova  (glm.qD93, dispersion = 1, test = "Chisq")
summary(glm.qD93, dispersion = 1)

## Example of user-specified link, a logit model for p^days
## See Shaffer, T.  2004. Auk 121(2): 526-540.
logexp <- function(days = 1)
{
    linkfun <- function(mu) qlogis(mu^(1/days))
    linkinv <- function(eta) plogis(eta)^days
    ## chain rule for plogis(eta)^days; binomial()$mu.eta is the logistic density
    mu.eta  <- function(eta) days * plogis(eta)^(days-1) *
      binomial()$mu.eta(eta)
    valideta <- function(eta) TRUE
    link <- paste0("logexp(", days, ")")
    structure(list(linkfun = linkfun, linkinv = linkinv,
                   mu.eta = mu.eta, valideta = valideta, name = link),
              class = "link-glm")
}
(bil3 <- binomial(logexp(3)))

## in practice this would be used with a vector of 'days', in
## which case use an offset of 0 in the corresponding formula
## to get the null deviance right.

## Binomial with identity link: often not a good idea, as both
## computationally and conceptually difficult:
binomial(link = "identity")  ## is exactly the same as
binomial(link = make.link("identity"))

## tests of quasi
x <- rnorm(100)
y <- rpois(100, exp(1+x))
glm(y ~ x, family = quasi(variance = "mu", link = "log"))
# which is the same as
glm(y ~ x, family = poisson)
glm(y ~ x, family = quasi(variance = "mu^2", link = "log"))
## Not run: glm(y ~ x, family = quasi(variance = "mu^3", link = "log")) # fails
y <- rbinom(100, 1, plogis(x))
# need to set a starting value for the next fit
glm(y ~ x, family = quasi(variance = "mu(1-mu)", link = "logit"), start = c(0,1))

The F Distribution


Density, distribution function, quantile function and random generation for the F distribution with df1 and df2 degrees of freedom (and optional non-centrality parameter ncp).


df(x, df1, df2, ncp, log = FALSE)
pf(q, df1, df2, ncp, lower.tail = TRUE, log.p = FALSE)
qf(p, df1, df2, ncp, lower.tail = TRUE, log.p = FALSE)
rf(n, df1, df2, ncp)


x, q

vector of quantiles.


p

vector of probabilities.


n

number of observations. If length(n) > 1, the length is taken to be the number required.

df1, df2

degrees of freedom. Inf is allowed.


ncp

non-centrality parameter. If omitted the central F is assumed.

log, log.p

logical; if TRUE, probabilities p are given as log(p).


lower.tail

logical; if TRUE (default), probabilities are $P[X \le x]$, otherwise, $P[X > x]$.


The F distribution with df1 = $\nu_1$ and df2 = $\nu_2$ degrees of freedom has density

$$f(x) = \frac{\Gamma(\nu_1/2 + \nu_2/2)}{\Gamma(\nu_1/2)\Gamma(\nu_2/2)} \left(\frac{\nu_1}{\nu_2}\right)^{\nu_1/2} x^{\nu_1/2 -1} \left(1 + \frac{\nu_1 x}{\nu_2}\right)^{-(\nu_1 + \nu_2) / 2}$$

for $x > 0$.

The F distribution's cumulative distribution function (cdf), $F_{\nu_1,\nu_2}$, fulfills (Abramowitz & Stegun 26.6.2, p.946)

$$F_{\nu_1,\nu_2}(qF) = 1 - I_x(\nu_2/2, \nu_1/2) = I_{1-x}(\nu_1/2, \nu_2/2),$$

where $x := \frac{\nu_2}{\nu_2 + \nu_1 \cdot qF}$, and $I_x(a,b)$ is the incomplete beta function; in R, $I_x(a,b)$ is pbeta(x, a, b).

It is the distribution of the ratio of the mean squares of $\nu_1$ and $\nu_2$ independent standard normals, and hence of the ratio of two independent chi-squared variates each divided by its degrees of freedom. Since the ratio of a normal and the root mean-square of $m$ independent normals has a Student's $t_m$ distribution, the square of a $t_m$ variate has an F distribution on 1 and $m$ degrees of freedom.
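Two consequences of this characterization can be checked numerically. As df2 grows, the denominator mean square tends to one, so df1 times an F variate tends to a chi-squared variate on df1 degrees of freedom; with df2 = Inf the relation is exact. The values of q, nu1 and m are arbitrary illustrative choices.

```r
## Limiting case df2 = Inf: F(nu1, Inf) is chi-squared(nu1) / nu1
q   <- c(0.5, 1, 2, 4)
nu1 <- 3
all.equal(pf(q, nu1, Inf), pchisq(q * nu1, nu1))

## Square of a t_m variate is F(1, m): compare upper 5% points
m <- 7
all.equal(qf(0.95, 1, m), qt(0.975, m)^2)
```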

The non-central F distribution is again the ratio of mean squares of independent normals of unit variance, but those in the numerator are allowed to have non-zero means and ncp is the sum of squares of the means. See Chisquare for further details on non-central distributions.


df gives the density, pf gives the distribution function, qf gives the quantile function, and rf generates random deviates.

Invalid arguments will result in return value NaN, with a warning.

The length of the result is determined by n for rf, and is the maximum of the lengths of the numerical arguments for the other functions.

The numerical arguments other than n are recycled to the length of the result. Only the first elements of the logical arguments are used.


Supplying ncp = 0 uses the algorithm for the non-central distribution, which is not the same algorithm used if ncp is omitted. This is to give consistent behaviour in extreme cases with values of ncp very near zero.

The code for non-zero ncp is principally intended to be used for moderate values of ncp: it will not be highly accurate, especially in the tails, for large values.
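A small sketch of the first point: omitting ncp and supplying ncp = 0 select different algorithms, but for ordinary arguments the results agree to numerical accuracy. The quantiles and degrees of freedom are arbitrary.

```r
q <- c(0.5, 1, 2, 5)
central    <- pf(q, df1 = 4, df2 = 12)            # central algorithm
noncentral <- pf(q, df1 = 4, df2 = 12, ncp = 0)   # non-central algorithm
all.equal(central, noncentral)
```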


For the central case of df, computed via a binomial probability, code contributed by Catherine Loader (see dbinom); for the non-central case computed via dbeta, code contributed by Peter Ruckdeschel.

For pf, via pbeta (or for large df2, via pchisq).

For qf, via qchisq for large df2, else via qbeta.


Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.

Johnson, N. L., Kotz, S. and Balakrishnan, N. (1995) Continuous Univariate Distributions, volume 2, chapters 27 and 30. Wiley, New York.

Distributions for other standard distributions, including dchisq for chi-squared and dt for Student's t distributions.


## Equivalence of pt(.,nu) with pf(.^2, 1,nu):
x <- seq(0.001, 5, length.out = 100)
nu <- 4
stopifnot(all.equal(2*pt(x,nu) - 1, pf(x^2, 1,nu)),
          ## upper tails:
          all.equal(2*pt(x,     nu, lower.tail=FALSE),
                      pf(x^2, 1,nu, lower.tail=FALSE)))

## the density of the square of a t_m is 2*dt(x, m)/(2*x)
# check this is the same as the density of F_{1,m}
all.equal(df(x^2, 1, 5), dt(x, 5)/x)

## Identity (F <-> t):  qf(2*p - 1, 1, df) == qt(p, df)^2  for  p >= 1/2
p <- seq(1/2, .99, length.out = 50); df <- 10
rel.err <- function(x, y) ifelse(x == y, 0, abs(x-y)/mean(abs(c(x,y))))
stopifnot(all.equal(qf(2*p - 1, df1 = 1, df2 = df),
                    qt(p, df)^2))

## Identity (F <-> Beta <-> incompl.beta):
n1 <- 7 ; n2 <- 12; qF <- c((0:4)/4, 1.5, 2:16)
x <- n2/(n2 + n1*qF)
stopifnot(all.equal(pf(qF, n1, n2, lower.tail=FALSE),
                    pbeta(x, n2/2, n1/2)))

Fast Discrete Fourier Transform (FFT)


Computes the Discrete Fourier Transform (DFT) of an array with a fast algorithm, the “Fast Fourier Transform” (FFT).


fft(z, inverse = FALSE)
mvfft(z, inverse = FALSE)



z

a real or complex array containing the values to be transformed. Long vectors are not supported.


inverse

if TRUE, the unnormalized inverse transform is computed (the inverse has a + in the exponent of $e$, but here we do not divide by length(z)).


When z is a vector, the value computed and returned by fft is the unnormalized univariate discrete Fourier transform of the sequence of values in z. Specifically, y <- fft(z) returns

$$y[h] = \sum_{k=1}^n z[k] \exp(-2\pi i (k-1) (h-1)/n)$$

for $h = 1,\ldots,n$ where n = length(y). If inverse is TRUE, $\exp(-2\pi\ldots)$ is replaced with $\exp(2\pi\ldots)$.

When z contains an array, fft computes and returns the multivariate (spatial) transform. If inverse is TRUE, the (unnormalized) inverse Fourier transform is returned, i.e., if y <- fft(z), then z is fft(y, inverse = TRUE) / length(y).

By contrast, mvfft takes a real or complex matrix as argument, and returns a similar shaped matrix, but with each column replaced by its discrete Fourier transform. This is useful for analyzing vector-valued series.
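The difference between the two functions on a matrix can be sketched as follows; the matrix m is an arbitrary illustration.

```r
m <- matrix(c(1, 2, 0, -1,   3, 1, 4, 1,   0, 0, 1, 0), nrow = 4)

## mvfft(): one univariate transform per column
all.equal(mvfft(m), apply(m, 2, fft))

## fft() on a matrix is the two-dimensional transform instead,
## i.e. transforms along the columns followed by transforms along the rows
all.equal(fft(m), t(mvfft(t(mvfft(m)))))

## so the two results differ in general
isTRUE(all.equal(fft(m), mvfft(m)))   # FALSE
```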

The FFT is fastest when the length of the series being transformed is highly composite (i.e., has many factors). If this is not the case, the transform may take a long time to compute and will use a large amount of memory.
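One common way to avoid awkward lengths is to zero-pad to a highly composite length found with nextn. Padding changes the transform itself, so this sketch uses it where padding is harmless: computing a linear convolution. The vectors x and y and all other names are illustrative.

```r
x <- c(2, -1, 0, 3, 1, 4, -2, 5, 1)
y <- c(1, 0.5, 0.25)
n_out <- length(x) + length(y) - 1     # 11, a prime
N     <- nextn(n_out)                  # 12 = 2^2 * 3

## Zero-pad both series to length N and multiply the transforms
xp <- c(x, numeric(N - length(x)))
yp <- c(y, numeric(N - length(y)))
via_fft <- Re(fft(fft(xp) * fft(yp), inverse = TRUE))[seq_len(n_out)] / N

## Direct evaluation of the convolution sum for comparison
direct <- numeric(n_out)
for (i in seq_along(x)) for (j in seq_along(y))
  direct[i + j - 1] <- direct[i + j - 1] + x[i] * y[j]

all.equal(via_fft, direct)
```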


Uses C translation of Fortran code in Singleton (1979).


Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988). The New S Language. Wadsworth & Brooks/Cole.

Singleton, R. C. (1979). Mixed Radix Fast Fourier Transforms, in Programs for Digital Signal Processing, IEEE Digital Signal Processing Committee eds. IEEE Press.

Cooley, James W., and Tukey, John W. (1965). An algorithm for the machine calculation of complex Fourier series, Mathematics of Computation, 19(90), 297–301. doi:10.2307/2003354.

convolve, nextn.


x <- 1:4
fft(fft(x), inverse = TRUE)/length(x)

## Slow Discrete Fourier Transform (DFT) - e.g., for checking the formula
fft0 <- function(z, inverse=FALSE) {
  n <- length(z)
  if(n == 0) return(z)
  k <- 0:(n-1)
  ff <- (if(inverse) 1 else -1) * 2*pi * 1i * k/n
  vapply(1:n, function(h) sum(z * exp(ff*(h-1))), complex(1))
}

relD <- function(x,y) 2* abs(x - y) / abs(x + y)
n <- 2^8
z <- complex(n, rnorm(n), rnorm(n))
## relative differences in the order of 4*10^{-14} :
summary(relD(fft(z), fft0(z)))
summary(relD(fft(z, inverse=TRUE), fft0(z, inverse=TRUE)))

Linear Filtering on a Time Series


Applies linear filtering to a univariate time series or to each series separately of a multivariate time series.


filter(x, filter, method = c("convolution", "recursive"),
       sides = 2, circular = FALSE, init)



x

a univariate or multivariate time series.


filter

a vector of filter coefficients in reverse time order (as for AR or MA coefficients).


method

Either "convolution" or "recursive" (and can be abbreviated). If "convolution" a moving average is used: if "recursive" an autoregression is used.


sides

for convolution filters only. If sides = 1 the filter coefficients are for past values only; if sides = 2 they are centred around lag 0. In this case the length of the filter should be odd, but if it is even, more of the filter is forward in time than backward.


circular

for convolution filters only. If TRUE, wrap the filter around the ends of the series, otherwise assume external values are missing (NA).


init

for recursive filters only. Specifies the initial values of the time series just prior to the start value, in reverse time order. The default is a set of zeros.


Missing values are allowed in x but not in filter (where they would lead to missing values everywhere in the output).

Note that there is an implied coefficient 1 at lag 0 in the recursive filter, which gives

$$y_i = x_i + f_1 y_{i-1} + \cdots + f_p y_{i-p}$$

No check is made to see whether the recursive filter is invertible: the output may diverge if it is not.

The convolution filter is

$$y_i = f_1 x_{i+o} + \cdots + f_p x_{i+o-(p-1)}$$

where o is the offset: see sides for how it is determined.
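Both formulae can be followed on a short series. In this sketch the coefficients are chosen so that each term is visible in the result; stats::filter is called explicitly in case another package masks filter. All values and object names are illustrative.

```r
x <- 1:5

## Convolution filter with coefficients f1 = 1, f2 = 10, f3 = 100
f <- c(1, 10, 100)
one_sided <- as.numeric(stats::filter(x, f, sides = 1))  # offset o = 0
two_sided <- as.numeric(stats::filter(x, f, sides = 2))  # offset o = 1
one_sided   # NA  NA 123 234 345   (y_3 = 1*3 + 10*2 + 100*1)
two_sided   # NA 123 234 345  NA   (y_2 = 1*3 + 10*2 + 100*1)

## Recursive filter: y_i = x_i + 0.5 * y_{i-1} - 0.25 * y_{i-2},
## with the default zero initial values
g <- c(0.5, -0.25)
rec <- as.numeric(stats::filter(x, g, method = "recursive"))
y <- numeric(length(x))
for (i in seq_along(x)) {
  y1 <- if (i > 1) y[i - 1] else 0
  y2 <- if (i > 2) y[i - 2] else 0
  y[i] <- x[i] + g[1] * y1 + g[2] * y2
}
all.equal(rec, y)
```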


A time series object.


convolve(, type = "filter") uses the FFT for computations and so may be faster for long filters on univariate series, but it does not return a time series (and so the time alignment is unclear), nor does it handle missing values. filter is faster for a filter of length 100 on a series of length 1000, for example.

See Also

convolve, arima.sim


x <- 1:100
filter(x, rep(1, 3))
filter(x, rep(1, 3), sides = 1)
filter(x, rep(1, 3), sides = 1, circular = TRUE)

filter(presidents, rep(1, 3))

Fisher's Exact Test for Count Data


Performs Fisher's exact test for testing the null of independence of rows and columns in a contingency table with fixed marginals.


fisher.test(x, y = NULL, workspace = 200000, hybrid = FALSE,
            hybridPars = c(expect = 5, percent = 80, Emin = 1),
            control = list(), or = 1, alternative = "two.sided",
            conf.int = TRUE, conf.level = 0.95,
            simulate.p.value = FALSE, B = 2000)



x

either a two-dimensional contingency table in matrix form, or a factor object.


y

a factor object; ignored if x is a matrix.


workspace

an integer specifying the size of the workspace used in the network algorithm, in units of 4 bytes. Only used for non-simulated p-values in tables larger than $2 \times 2$. Since R version 3.5.0, this also increases the internal stack size, which allows larger problems to be solved, although these can sometimes need hours. In such cases, simulate.p.value = TRUE may be more reasonable.


hybrid

a logical. Only used for tables larger than $2 \times 2$, in which case it indicates whether the exact probabilities (default) or a hybrid approximation thereof should be computed.


hybridPars

a numeric vector of length 3, by default describing “Cochran's conditions” for the validity of the chi-squared approximation, see ‘Details’.


control

a list with named components for low level algorithm control. At present the only one used is "mult", a positive integer $\ge 2$ with default 30, used only for tables larger than $2 \times 2$. This says how many times as much space should be allocated to paths as to keys: see file ‘fexact.c’ in the sources of this package.


or

the hypothesized odds ratio. Only used in the $2 \times 2$ case.


alternative

indicates the alternative hypothesis and must be one of "two.sided", "greater" or "less". You can specify just the initial letter. Only used in the $2 \times 2$ case.


conf.int

logical indicating if a confidence interval for the odds ratio in a $2 \times 2$ table should be computed (and returned).


conf.level

confidence level for the returned confidence interval. Only used in the $2 \times 2$ case and if conf.int = TRUE.


simulate.p.value

a logical indicating whether to compute p-values by Monte Carlo simulation, in tables larger than $2 \times 2$.


B

an integer specifying the number of replicates used in the Monte Carlo test.


If x is a matrix, it is taken as a two-dimensional contingency table, and hence its entries should be nonnegative integers. Otherwise, both x and y must be vectors or factors of the same length. Incomplete cases are removed, vectors are coerced into factor objects, and the contingency table is computed from these.

For $2 \times 2$ cases, p-values are obtained directly using the (central or non-central) hypergeometric distribution. Otherwise, computations are based on a C version of the FORTRAN subroutine FEXACT which implements the network developed by Mehta and Patel (1983, 1986) and improved by Clarkson, Fan and Joe (1993). The FORTRAN code can be obtained from Note this fails (with an error message) when the entries of the table are too large. (It transposes the table if necessary so it has no more rows than columns. One constraint is that the product of the row marginals be less than $2^{31} - 1$.)

For $2 \times 2$ tables, the null of conditional independence is equivalent to the hypothesis that the odds ratio equals one. ‘Exact’ inference can be based on observing that in general, given all marginal totals fixed, the first element of the contingency table has a non-central hypergeometric distribution with non-centrality parameter given by the odds ratio (Fisher, 1935). The alternative for a one-sided test is based on the odds ratio, so alternative = "greater" is a test of the odds ratio being bigger than or.

Two-sided tests are based on the probabilities of the tables, and take as ‘more extreme’ all tables with probabilities less than or equal to that of the observed table, the p-value being the sum of such probabilities.
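For a $2 \times 2$ table with or = 1 these definitions can be reproduced with dhyper. The sketch below uses a small tea-tasting style table; the table, the object names and the relative tolerance 1e-7 (added here to guard against floating-point ties) are assumptions of the illustration.

```r
tea <- matrix(c(3, 1, 1, 3), nrow = 2)

## Hypergeometric set-up with all margins fixed
m <- sum(tea[, 1]); n <- sum(tea[, 2])   # column totals
k <- sum(tea[1, ]); x <- tea[1, 1]       # first row total, first cell
support <- max(0, k - n):min(k, m)
probs   <- dhyper(support, m, n, k)
p_obs   <- dhyper(x, m, n, k)

## Two-sided: all tables no more probable than the observed one
p_two <- sum(probs[probs <= p_obs * (1 + 1e-7)])

## One-sided "greater": first cell at least as large as observed
p_greater <- sum(probs[support >= x])

all.equal(p_two,     fisher.test(tea)$p.value)
all.equal(p_greater, fisher.test(tea, alternative = "greater")$p.value)
```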

For larger than $2 \times 2$ tables and hybrid = TRUE, asymptotic chi-squared probabilities are only used if the ‘Cochran conditions’ (or modified version thereof) specified by hybridPars = c(expect = 5, percent = 80, Emin = 1) are satisfied, that is if no cell has expected counts less than 1 (= Emin) and more than 80% (= percent) of the cells have expected counts at least 5 (= expect), otherwise the exact calculation is used. A corresponding if() decision is made for all sub-tables considered. Accidentally, R has used 180 instead of 80 as percent, i.e., hybridPars[2] in R versions between 3.0.0 and 3.4.1 (inclusive), i.e., the 2nd of the hybridPars (all of which used to be hard-coded previous to R 3.5.0). Consequently, in these versions of R, hybrid=TRUE never made a difference.

In the $r \times c$ case with $r > 2$ or $c > 2$, internal tables can get too large for the exact test in which case an error is signalled. Apart from increasing workspace sufficiently, which then may lead to very long running times, using simulate.p.value = TRUE may then often be sufficient and hence advisable.

Simulation is done conditional on the row and column marginals, and works only if the marginals are strictly positive. (A C translation of the algorithm of Patefield (1981) is used.) Note that the default number of replicates (B = 2000) implies a minimum p-value of about 0.0005 ($1/(B+1)$).
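The idea of a Monte Carlo p-value conditional on the marginals can be sketched with r2dtable(), which generates random tables with given row and column totals. This is an illustration only and not the internal code of fisher.test(): the helper stat(), the tolerance and the seed are assumptions of the sketch. The statistic used is the log-probability of a table up to an additive constant, so smaller values mean less probable tables.

```r
set.seed(1)  # arbitrary seed, only for reproducibility of the sketch

Job <- matrix(c(1,2,1,0, 3,3,6,1, 10,10,14,9, 6,7,12,11), 4, 4)
B <- 2000

## Given the margins, the probability of a table is proportional to
## 1 / prod(cell counts!), so this is its log up to a constant.
stat <- function(tab) -sum(lfactorial(tab))

obs  <- stat(Job)
sims <- vapply(r2dtable(B, rowSums(Job), colSums(Job)), stat, numeric(1))

## Count simulated tables no more probable than the observed one.
## Adding 1 to numerator and denominator counts the observed table
## itself, so the estimate can never fall below 1 / (B + 1).
p_mc <- (1 + sum(sims <= obs + 1e-9)) / (B + 1)
p_mc
1 / (B + 1)   # smallest attainable value
```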


A list with class "htest" containing the following components:


the p-value of the test.

a confidence interval for the odds ratio. Only present in the $2 \times 2$ case and if argument conf.int = TRUE.


an estimate of the odds ratio. Note that the conditional Maximum Likelihood Estimate (MLE) rather than the unconditional MLE (the sample odds ratio) is used. Only present in the $2 \times 2$ case.


the odds ratio under the null, or. Only present in the $2 \times 2$ case.


a character string describing the alternative hypothesis.


the character string "Fisher's Exact Test for Count Data".

a character string giving the name(s) of the data.


Agresti, A. (1990). Categorical data analysis. New York: Wiley. Pages 59–66.

Agresti, A. (2002). Categorical data analysis. Second edition. New York: Wiley. Pages 91–101.

Fisher, R. A. (1935). The logic of inductive inference. Journal of the Royal Statistical Society Series A, 98, 39–54. doi:10.2307/2342435.

Fisher, R. A. (1962). Confidence limits for a cross-product ratio. Australian Journal of Statistics, 4, 41. doi:10.1111/j.1467-842X.1962.tb00285.x.

Fisher, R. A. (1970). Statistical Methods for Research Workers. Oliver & Boyd.

Mehta, C. R. and Patel, N. R. (1983). A network algorithm for performing Fisher's exact test in $r \times c$ contingency tables. Journal of the American Statistical Association, 78, 427–434. doi:10.1080/01621459.1983.10477989.

Mehta, C. R. and Patel, N. R. (1986). Algorithm 643: FEXACT, a FORTRAN subroutine for Fisher's exact test on unordered $r \times c$ contingency tables. ACM Transactions on Mathematical Software, 12, 154–161. doi:10.1145/6497.214326.

Clarkson, D. B., Fan, Y. and Joe, H. (1993). A Remark on Algorithm 643: FEXACT: An Algorithm for Performing Fisher's Exact Test in $r \times c$ Contingency Tables. ACM Transactions on Mathematical Software, 19, 484–488. doi:10.1145/168173.168412.

Patefield, W. M. (1981). Algorithm AS 159: An efficient method of generating r x c tables with given row and column totals. Applied Statistics, 30, 91–97. doi:10.2307/2346669.

fisher.exact in package exact2x2 for alternative interpretations of two-sided tests and confidence intervals for $2 \times 2$ tables.


## Agresti (1990, p. 61f; 2002, p. 91) Fisher's Tea Drinker
## A British woman claimed to be able to distinguish whether milk or
##  tea was added to the cup first.  To test, she was given 8 cups of
##  tea, in four of which milk was added first.  The null hypothesis
##  is that there is no association between the true order of pouring
##  and the woman's guess, the alternative that there is a positive
##  association (that the odds ratio is greater than 1).
TeaTasting <-
matrix(c(3, 1, 1, 3),
       nrow = 2,
       dimnames = list(Guess = c("Milk", "Tea"),
                       Truth = c("Milk", "Tea")))
fisher.test(TeaTasting, alternative = "greater")
## => p = 0.2429, association could not be established

## Fisher (1962, 1970), Criminal convictions of like-sex twins
Convictions <- matrix(c(2, 10, 15, 3), nrow = 2,
	              dimnames =
	       list(c("Dizygotic", "Monozygotic"),
		    c("Convicted", "Not convicted")))
fisher.test(Convictions, alternative = "less")
fisher.test(Convictions, conf.int = FALSE)
fisher.test(Convictions, conf.level = 0.95)$conf.int
fisher.test(Convictions, conf.level = 0.99)$conf.int

## A r x c table  Agresti (2002, p. 57) Job Satisfaction
Job <- matrix(c(1,2,1,0, 3,3,6,1, 10,10,14,9, 6,7,12,11), 4, 4,
           dimnames = list(income = c("< 15k", "15-25k", "25-40k", "> 40k"),
                     satisfaction = c("VeryD", "LittleD", "ModerateS", "VeryS")))
fisher.test(Job) # 0.7827
fisher.test(Job, simulate.p.value = TRUE, B = 1e5) # also close to 0.78

## 6th example in Mehta & Patel's JASA paper
MP6 <- rbind(
# Exactly the same p-value, as Cochran's conditions are never met:
fisher.test(MP6, hybrid=TRUE)

Extract Model Fitted Values


fitted is a generic function which extracts fitted values from objects returned by modeling functions. fitted.values is an alias for it.

All object classes which are returned by model fitting functions should provide a fitted method. (Note that the generic is fitted and not fitted.values.)

Methods can make use of napredict methods to compensate for the omission of missing values. The default and nls methods do.


fitted(object, ...)
fitted.values(object, ...)



an object for which the extraction of model fitted values is meaningful.


other arguments.


Fitted values extracted from the object object.


Chambers, J. M. and Hastie, T. J. (1992) Statistical Models in S. Wadsworth & Brooks/Cole.

See Also

coefficients, glm, lm, residuals.

Tukey Five-Number Summaries


Returns Tukey's five number summary (minimum, lower-hinge, median, upper-hinge, maximum) for the input data.


fivenum(x, na.rm = TRUE)



numeric, maybe including NAs and $\pm$Infs.


logical; if TRUE, all NA and NaNs are dropped, before the statistics are computed.


A numeric vector of length 5 containing the summary information. See boxplot.stats for more details.
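A short sketch of what the five numbers look like in practice. It also shows that the hinges need not coincide with the quartiles computed by quantile() with its default settings; the sample 1:6 is chosen here only because the difference is easy to see.

```r
x <- 1:6

## minimum, lower hinge, median, upper hinge, maximum
fn <- fivenum(x)
fn

## Quartiles from quantile() with its default type, for comparison
qu <- quantile(x, c(0.25, 0.75))
qu

## With the default na.rm = TRUE, missing values are dropped first
fn_na <- fivenum(c(x, NA))
```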

See Also

IQR, boxplot.stats, median, quantile, range.


fivenum(c(rnorm(100), -1:1/0))

Fligner-Killeen Test of Homogeneity of Variances


Performs a Fligner-Killeen (median) test of the null that the variances in each of the groups (samples) are the same.


fligner.test(x, ...)

## Default S3 method:
fligner.test(x, g, ...)

## S3 method for class 'formula'
fligner.test(formula, data, subset, na.action, ...)



a numeric vector of data values, or a list of numeric data vectors.


a vector or factor object giving the group for the corresponding elements of x. Ignored if x is a list.


a formula of the form lhs ~ rhs where lhs gives the data values and rhs the corresponding groups.


an optional matrix or data frame (or similar: see model.frame) containing the variables in the formula formula. By default the variables are taken from environment(formula).


an optional vector specifying a subset of observations to be used.


a function which indicates what should happen when the data contain NAs. Defaults to getOption("na.action").


further arguments to be passed to or from methods.


If x is a list, its elements are taken as the samples to be compared for homogeneity of variances, and hence have to be numeric data vectors. In this case, g is ignored, and one can simply use fligner.test(x) to perform the test. If the samples are not yet contained in a list, use fligner.test(list(x, ...)).

Otherwise, x must be a numeric data vector, and g must be a vector or factor object of the same length as x giving the group for the corresponding elements of x.

The Fligner-Killeen (median) test was found in a simulation study to be one of the tests for homogeneity of variances that is most robust against departures from normality, see Conover, Johnson & Johnson (1981). It is a $k$-sample simple linear rank test which uses the ranks of the absolute values of the centered samples and weights $a(i) = \mathrm{qnorm}((1 + i/(n+1))/2)$. The version implemented here uses median centering in each of the samples (F-K:med $X^2$ in the reference).
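The construction described above can be sketched by hand: center each sample at its median, rank the absolute values, turn the ranks into the normal scores a(i), and combine the scores per group. The text does not spell out the final form of the statistic; the quadratic form used below is an assumption of this sketch, and it is checked against fligner.test() rather than taken as the definitive implementation.

```r
x <- InsectSprays$count
g <- InsectSprays$spray
n <- length(x)
k <- nlevels(g)

## Median centering within each group
xc <- x - tapply(x, g, median)[g]

## Scores a(i) = qnorm((1 + i/(n+1))/2) applied to ranks of |xc|
a <- qnorm((1 + rank(abs(xc)) / (n + 1)) / 2)

## Assumed form of the statistic: between-group variation of the
## scores, standardized by their overall variance
stat <- (sum(tapply(a, g, sum)^2 / tapply(a, g, length)) -
         n * mean(a)^2) / var(a)
pval <- pchisq(stat, df = k - 1, lower.tail = FALSE)

## Compare with the built-in test
ft <- fligner.test(x, g)
c(manual = stat, builtin = unname(ft$statistic))
c(manual = pval, builtin = ft$p.value)
```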


A list of class "htest" containing the following components:


the Fligner-Killeen:med $X^2$ test statistic.


the degrees of freedom of the approximate chi-squared distribution of the test statistic.


the p-value of the test.


the character string "Fligner-Killeen test of homogeneity of variances".

a character string giving the names of the data.


William J. Conover, Mark E. Johnson and Myrle M. Johnson (1981). A comparative study of tests for homogeneity of variances, with applications to the outer continental shelf bidding data. Technometrics, 23, 351–361. doi:10.2307/1268225.

ansari.test and mood.test for rank-based two-sample tests for a difference in scale parameters; var.test and bartlett.test for parametric tests for the homogeneity of variances.



plot(count ~ spray, data = InsectSprays)
fligner.test(InsectSprays$count, InsectSprays$spray)
fligner.test(count ~ spray, data = InsectSprays)
## Compare this to bartlett.test()

Model Formulae


The generic function formula and its specific methods provide a way of extracting formulae which have been included in other objects.

as.formula is almost identical, additionally preserving attributes when object already inherits from "formula".


formula(x, ...)
DF2formula(x, env = parent.frame())
as.formula(object, env = parent.frame())

## S3 method for class 'formula'
print(x, showEnv = !identical(e, .GlobalEnv), ...)


x, object

R object, for DF2formula() a data.frame.


further arguments passed to or from other methods.


the environment to associate with the result, if not already a formula.


logical indicating if the environment should be printed as well.


The models fitted by, e.g., the lm and glm functions are specified in a compact symbolic form. The ~ operator is basic in the formation of such models. An expression of the form y ~ model is interpreted as a specification that the response y is modelled by a linear predictor specified symbolically by model. Such a model consists of a series of terms separated by + operators. The terms themselves consist of variable and factor names separated by : operators. Such a term is interpreted as the interaction of all the variables and factors appearing in the term.

In addition to + and :, a number of other operators are useful in model formulae.

  • The * operator denotes factor crossing: a*b is interpreted as a + b + a:b.

  • The ^ operator indicates crossing to the specified degree. For example (a+b+c)^2 is identical to (a+b+c)*(a+b+c) which in turn expands to a formula containing the main effects for a, b and c together with their second-order interactions.

  • The %in% operator indicates that the terms on its left are nested within those on the right. For example a + b %in% a expands to the formula a + a:b.

  • The / operator provides a shorthand, so that a / b is equivalent to a + b %in% a.

  • The - operator removes the specified terms, hence (a+b+c)^2 - a:b is identical to a + b + c + b:c + a:c. It can also be used to remove the intercept term: when fitting a linear model y ~ x - 1 specifies a line through the origin. A model with no intercept can also be specified as y ~ x + 0 or y ~ 0 + x.
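The expansions listed above can be inspected without fitting a model, by asking terms() for the term labels of a formula. The helper labels_of() is a name introduced for this sketch only.

```r
## Hypothetical helper: the expanded term labels of a formula
labels_of <- function(f) attr(terms(f), "term.labels")

crossing <- labels_of(y ~ a * b)               # main effects plus a:b
degree2  <- labels_of(y ~ (a + b + c)^2)       # all second-order interactions
nested   <- labels_of(y ~ a / b)               # a, and b within a
removed  <- labels_of(y ~ (a + b + c)^2 - a:b) # as above, without a:b

crossing; degree2; nested; removed

## Removing the intercept: each of these reports an intercept flag of 0
no_int <- c(attr(terms(y ~ x - 1), "intercept"),
            attr(terms(y ~ x + 0), "intercept"),
            attr(terms(y ~ 0 + x), "intercept"))
no_int
```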

While formulae usually involve just variable and factor names, they can also involve arithmetic expressions. The formula log(y) ~ a + log(x) is quite legal. When such arithmetic expressions involve operators which are also used symbolically in model formulae, there can be confusion between arithmetic and symbolic operator use.

To avoid this confusion, the function I() can be used to bracket those portions of a model formula where the operators are used in their arithmetic sense. For example, in the formula y ~ a + I(b+c), the term b+c is to be interpreted as the sum of b and c.
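The effect of I() is visible in the columns of the design matrix. In this sketch the data frame d and its values are made up for illustration.

```r
## Made-up data for illustration
d <- data.frame(y = c(1, 2, 4, 7), b = 1:4, c = c(2, 0, 1, 3))

## '+' used symbolically: two separate terms, b and c
symbolic <- colnames(model.matrix(y ~ b + c, d))

## '+' used arithmetically: a single term holding the sum b + c
arithmetic <- colnames(model.matrix(y ~ I(b + c), d))

symbolic
arithmetic
```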

Variable names can be quoted by backticks `like this` in formulae, although there is no guarantee that all code using formulae will accept such non-syntactic names.

Most model-fitting functions accept formulae with right-hand-side including the function offset to indicate terms with a fixed coefficient of one. Some functions accept other ‘specials’ such as strata or cluster (see the specials argument of terms.formula).

There are two special interpretations of . in a formula. The usual one is in the context of a data argument of model fitting functions and means ‘all columns not otherwise in the formula’: see terms.formula. In the context of update.formula, only, it means ‘what was previously in this part of the formula’.

When formula is called on a fitted model object, either a specific method is used (such as that for class "nls") or the default method. The default first looks for a "formula" component of the object (and evaluates it), then a "terms" component, then a formula parameter of the call (and evaluates its value) and finally a "formula" attribute.

There is a formula method for data frames. When there is a "terms" attribute with a formula, e.g., for a model.frame(), that formula is returned. If you'd like the previous (R $\le$ 3.5.x) behavior, use the auxiliary DF2formula() which does not consider a "terms" attribute. Otherwise, if there is only one column this forms the RHS with an empty LHS. For more columns, the first column is the LHS of the formula and the remaining columns separated by + form the RHS.
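A small sketch of the column rules for plain data frames (those without a "terms" attribute). The data frames and their column names are made up for illustration.

```r
## Several columns: the first becomes the LHS, the rest form the RHS
d3 <- data.frame(y = 1:3, x1 = c(2, 4, 6), x2 = c(1, 0, 1))
f3 <- formula(d3)
f3

## A single column: RHS only, with an empty LHS
d1 <- data.frame(x = 1:3)
f1 <- formula(d1)
f1
```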


All the functions above produce an object of class "formula" which contains a symbolic model formula.


A formula object has an associated environment, and this environment (rather than the parent environment) is used by model.frame to evaluate variables that are not found in the supplied data argument.

Formulas created with the ~ operator use the environment in which they were created. Formulas created with as.formula will use the env argument for their environment.
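The following sketch shows where each kind of formula gets its environment. The function make_formula() and the variable z are hypothetical names used only here.

```r
## A formula created with ~ inside a function remembers that
## function's evaluation environment, where z is defined.
make_formula <- function() {
  z <- 1
  y ~ z
}
f <- make_formula()
z_visible <- exists("z", envir = environment(f), inherits = FALSE)
z_visible

## as.formula() on a character string uses its env argument instead
e <- new.env()
g <- as.formula("y ~ z", env = e)
same_env <- identical(environment(g), e)
same_env
```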


In R versions up to 3.6.0, character x of length more than one were parsed as separate lines of R code and the first complete expression was evaluated into a formula when possible. This silently truncates such vectors of characters inefficiently and to some extent inconsistently as this behaviour had been undocumented. For this reason, such use has been deprecated. If you must work via character x, do use a string, i.e., a character vector of length one.

E.g., eval(call("~", quote(foo + bar))) has been an order of magnitude more efficient than formula(c("~", "foo + bar")).

Further, character “expressions” needing an eval() to return a formula are now deprecated.


Chambers, J. M. and Hastie, T. J. (1992) Statistical models. Chapter 2 of Statistical Models in S eds J. M. Chambers and T. J. Hastie, Wadsworth & Brooks/Cole.

~, I, offset.

For formula manipulation: update.formula, terms.formula, and all.vars. For typical use: lm, glm, and coplot. For formula construction: reformulate.


class(fo <- y ~ x1*x2) # "formula"
typeof(fo)  # R internal : "language"

environment(as.formula("y ~ x"))
environment(as.formula("y ~ x", env = new.env()))

## Create a formula for a model with a large number of variables:
xnam <- paste0("x", 1:25)
(fmla <- as.formula(paste("y ~ ", paste(xnam, collapse= "+"))))
## Equivalent with reformulate():
fmla2 <- reformulate(xnam, response = "y")
stopifnot(identical(fmla, fmla2))

Extract Model Formula from nls Object


Returns the model used to fit object.


## S3 method for class 'nls'
formula(x, ...)



an object inheriting from class "nls", representing a nonlinear least squares fit.


further arguments passed to or from other methods.


a formula representing the model used to obtain object.


José Pinheiro and Douglas Bates

nls, formula


fm1 <- nls(circumference ~ A/(1+exp((B-age)/C)), Orange,
           start = list(A = 160, B = 700, C = 350))
formula(fm1)  # extract the model formula from the fit

Friedman Rank Sum Test


Performs a Friedman rank sum test with unreplicated blocked data.


friedman.test(y, ...)

## Default S3 method:
friedman.test(y, groups, blocks, ...)

## S3 method for class 'formula'
friedman.test(formula, data, subset, na.action, ...)



either a numeric vector of data values, or a data matrix.


a vector giving the group for the corresponding elements of y if this is a vector; ignored if y is a matrix. If not a factor object, it is coerced to one.


a vector giving the block for the corresponding elements of y if this is a vector; ignored if y is a matrix. If not a factor object, it is coerced to one.


a formula of the form a ~ b | c, where a, b and c give the data values and corresponding groups and blocks, respectively.


an optional matrix or data frame (or similar: see model.frame) containing the variables in the formula formula. By default the variables are taken from environment(formula).


an optional vector specifying a subset of observations to be used.


a function which indicates what should happen when the data contain NAs. Defaults to getOption("na.action").


further arguments to be passed to or from methods.


friedman.test can be used for analyzing unreplicated complete block designs (i.e., there is exactly one observation in y for each combination of levels of groups and blocks) where the normality assumption may be violated.

The null hypothesis is that apart from an effect of blocks, the location parameter of y is the same in each of the groups.

If y is a matrix, groups and blocks are obtained from the column and row indices, respectively. NA's are not allowed in groups or blocks; if y contains NA's, corresponding blocks are removed.
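To illustrate the matrix convention, the sketch below runs the test once on a small matrix and once on the same values supplied as a vector, with groups taken from the column indices and blocks from the row indices. The four rows of data are used here only as a small example.

```r
## Rows are blocks, columns are groups
y <- matrix(c(5.40, 5.50, 5.55,
              5.85, 5.70, 5.75,
              5.20, 5.60, 5.50,
              5.55, 5.50, 5.40),
            nrow = 4, byrow = TRUE)

from_matrix <- friedman.test(y)

## The same data in 'long' form
from_vectors <- friedman.test(as.vector(y),
                              groups = as.vector(col(y)),
                              blocks = as.vector(row(y)))

c(matrix = unname(from_matrix$statistic),
  vectors = unname(from_vectors$statistic))
```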


A list with class "htest" containing the following components:


the value of Friedman's chi-squared statistic.


the degrees of freedom of the approximate chi-squared distribution of the test statistic.


the p-value of the test.


the character string "Friedman rank sum test".

a character string giving the names of the data.


Myles Hollander and Douglas A. Wolfe (1973), Nonparametric Statistical Methods. New York: John Wiley & Sons. Pages 139–146.

See Also



## Hollander & Wolfe (1973), p. 140ff.
## Comparison of three methods ("round out", "narrow angle", and
##  "wide angle") for rounding first base.  For each of 22 players
##  and the three methods, the average time of two runs from a point on
##  the first base line 35ft from home plate to a point 15ft short of
##  second base is recorded.
RoundingTimes <-
matrix(c(5.40, 5.50, 5.55,
         5.85, 5.70, 5.75,
         5.20, 5.60, 5.50,
         5.55, 5.50, 5.40,
         5.90, 5.85, 5.70,
         5.45, 5.55, 5.60,
         5.40, 5.40, 5.35,
         5.45, 5.50, 5.35,
         5.25, 5.15, 5.00,
         5.85, 5.80, 5.70,
         5.25, 5.20, 5.10,
         5.65, 5.55, 5.45,
         5.60, 5.35, 5.45,
         5.05, 5.00, 4.95,
         5.50, 5.50, 5.40,
         5.45, 5.55, 5.50,
         5.55, 5.55, 5.35,
         5.45, 5.50, 5.55,
         5.50, 5.45, 5.25,
         5.65, 5.60, 5.40,
         5.70, 5.65, 5.55,
         6.30, 6.30, 6.25),
       nrow = 22,
       byrow = TRUE,
       dimnames = list(1 : 22,
                       c("Round Out", "Narrow Angle", "Wide Angle")))
friedman.test(RoundingTimes)
## => strong evidence against the null that the methods are equivalent
##    with respect to speed

wb <- aggregate(warpbreaks$breaks,
                by = list(w = warpbreaks$wool,
                          t = warpbreaks$tension),
                FUN = mean)
friedman.test(wb$x, wb$w, wb$t)
friedman.test(x ~ w | t, data = wb)

Flat Contingency Tables


Create ‘flat’ contingency tables.


ftable(x, ...)

## Default S3 method:
ftable(..., exclude = c(NA, NaN), row.vars = NULL,
       col.vars = NULL)


x, ...

R objects which can be interpreted as factors (including character strings), or a list (or data frame) whose components can be so interpreted, or a contingency table object of class "table" or "ftable".


values to use in the exclude argument of factor when interpreting non-factor objects.


a vector of integers giving the numbers of the variables, or a character vector giving the names of the variables to be used for the rows of the flat contingency table.


a vector of integers giving the numbers of the variables, or a character vector giving the names of the variables to be used for the columns of the flat contingency table.


ftable creates ‘flat’ contingency tables. Similar to the usual contingency tables, these contain the counts of each combination of the levels of the variables (factors) involved. This information is then re-arranged as a matrix whose rows and columns correspond to unique combinations of the levels of the row and column variables (as specified by row.vars and col.vars, respectively). The combinations are created by looping over the variables in reverse order (so that the levels of the left-most variable vary the slowest). Displaying a contingency table in this flat matrix form (via print.ftable, the print method for objects of class "ftable") is often preferable to showing it as a higher-dimensional array.

ftable is a generic function. Its default method, ftable.default, first creates a contingency table in array form from all arguments except row.vars and col.vars. If the first argument is of class "table", it represents a contingency table and is used as is; if it is a flat table of class "ftable", the information it contains is converted to the usual array representation using as.table. Otherwise, the arguments should be R objects which can be interpreted as factors (including character strings), or a list (or data frame) whose components can be so interpreted, which are cross-tabulated using table. Then, the arguments row.vars and col.vars are used to collapse the contingency table into flat form. If neither of these two is given, the last variable is used for the columns. If both are given and their union is a proper subset of all variables involved, the other variables are summed out.
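Two of the rules above can be checked on the Titanic table: variables named in neither row.vars nor col.vars are summed out, and with neither argument given the last variable supplies the columns. This is a sketch; the comparison via margin.table() is one of several ways to obtain the same counts.

```r
## Rows by Class, columns by Survived; Sex and Age are not mentioned,
## so their counts are summed out.
ft <- ftable(Titanic, row.vars = "Class", col.vars = "Survived")
ft

## The same counts from the ordinary array representation
## (Class is dimension 1 and Survived is dimension 4 of Titanic)
m <- margin.table(Titanic, c(1, 4))
same_counts <- all.equal(as.vector(as.matrix(ft)), as.vector(m))
same_counts

## With neither row.vars nor col.vars, the last variable forms the columns
default_cols <- names(attr(ftable(Titanic), "col.vars"))
default_cols
```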

When the arguments are R expressions interpreted as factors, additional arguments will be passed to table to control how the variable names are displayed; see the last example below.

Function ftable.formula provides a formula method for creating flat contingency tables.

There are methods for as.table, as.matrix and


ftable returns an object of class "ftable", which is a matrix with counts of each combination of the levels of variables with information on the names and levels of the (row and columns) variables stored as attributes "row.vars" and "col.vars".

See Also

ftable.formula for the formula interface (which allows a data = . argument); read.ftable for information on reading, writing and coercing flat contingency tables; table for ordinary cross-tabulation; xtabs for formula-based cross-tabulation.


## Start with a contingency table.
ftable(Titanic, row.vars = 1:3)
ftable(Titanic, row.vars = 1:2, col.vars = "Survived")
ftable(Titanic, row.vars = 2:1, col.vars = "Survived")

## Start with a data frame.
x <- ftable(mtcars[c("cyl", "vs", "am", "gear")])
ftable(x, row.vars = c(2, 4))

## Start with expressions, use table()'s "dnn" to change labels
ftable(mtcars$cyl, mtcars$vs, mtcars$am, mtcars$gear, row.vars = c(2, 4),
       dnn = c("Cylinders", "V/S", "Transmission", "Gears"))

Formula Notation for Flat Contingency Tables


Produce or manipulate a flat contingency table using formula notation.


## S3 method for class 'formula'
ftable(formula, data = NULL, subset, na.action, ...)



a formula object with both left and right hand sides specifying the column and row variables of the flat table.


a data frame, list or environment (or similar: see model.frame) containing the variables to be cross-tabulated, or a contingency table (see below).


an optional vector specifying a subset of observations to be used. Ignored if data is a contingency table.


a function which indicates what should happen when the data contain NAs. Ignored if data is a contingency table.


further arguments to the default ftable method may also be passed as arguments, see ftable.default.


This is a method of the generic function ftable.

The left and right hand side of formula specify the column and row variables, respectively, of the flat contingency table to be created. Only the + operator is allowed for combining the variables. A . may be used once in the formula to indicate inclusion of all the remaining variables.

If data is an object of class "table" or an array with more than 2 dimensions, it is taken as a contingency table, and hence all entries should be nonnegative. Otherwise, if it is not a flat contingency table (i.e., an object of class "ftable"), it should be a data frame or matrix, list or environment containing the variables to be cross-tabulated. In this case, na.action is applied to the data to handle missing values, and, after possibly selecting a subset of the data as specified by the subset argument, a contingency table is computed from the variables.

The contingency table is then collapsed to a flat table, according to the row and column variables specified by formula.


A flat contingency table which contains the counts of each combination of the levels of the variables, collapsed into a matrix for suitably displaying the counts.

See Also

ftable, ftable.default; table.


x <- ftable(Survived ~ ., data = Titanic)
ftable(Sex ~ Class + Age, data = x)

The Gamma Distribution


Density, distribution function, quantile function and random generation for the Gamma distribution with parameters shape and scale.


dgamma(x, shape, rate = 1, scale = 1/rate, log = FALSE)
pgamma(q, shape, rate = 1, scale = 1/rate, lower.tail = TRUE,
       log.p = FALSE)
qgamma(p, shape, rate = 1, scale = 1/rate, lower.tail = TRUE,
       log.p = FALSE)
rgamma(n, shape, rate = 1, scale = 1/rate)


x, q

vector of quantiles.


vector of probabilities.


number of observations. If length(n) > 1, the length is taken to be the number required.


an alternative way to specify the scale.

shape, scale

shape and scale parameters. Must be positive, scale strictly.

log, log.p

logical; if TRUE, probabilities/densities $p$ are returned as $\log(p)$.


logical; if TRUE (default), probabilities are $P[X \le x]$, otherwise, $P[X > x]$.


If scale is omitted, it assumes the default value of 1.

The Gamma distribution with parameters shape $= \alpha$ and scale $= \sigma$ has density

$$f(x) = \frac{1}{\sigma^{\alpha}\,\Gamma(\alpha)}\, x^{\alpha-1} e^{-x/\sigma}$$

for $x \ge 0$, $\alpha > 0$ and $\sigma > 0$. (Here $\Gamma(\alpha)$ is the function implemented by R's gamma() and defined in its help. Note that $\alpha = 0$ corresponds to the trivial distribution with all mass at point 0.)

The mean and variance are $E(X) = \alpha\sigma$ and $Var(X) = \alpha\sigma^2$.
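A quick numerical check of the density and of the moment formulas, offered as a sketch. The parameter values and evaluation points are arbitrary, and the moments are obtained by numerical integration rather than in closed form.

```r
alpha <- 2            # shape
sigma <- 3            # scale
x <- c(0.5, 1, 2.5)   # arbitrary evaluation points

## Density written out from the formula above
manual <- x^(alpha - 1) * exp(-x / sigma) / (sigma^alpha * gamma(alpha))
dens   <- dgamma(x, shape = alpha, scale = sigma)

## rate is the reciprocal of scale
dens_rate <- dgamma(x, shape = alpha, rate = 1 / sigma)

## Mean and variance by numerical integration
mean_num <- integrate(function(t) t * dgamma(t, shape = alpha, scale = sigma),
                      0, Inf)$value
var_num  <- integrate(function(t) (t - alpha * sigma)^2 *
                        dgamma(t, shape = alpha, scale = sigma),
                      0, Inf)$value

rbind(manual, dens, dens_rate)
c(mean = mean_num, expected = alpha * sigma)
c(var = var_num, expected = alpha * sigma^2)
```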

The cumulative hazard $H(t) = - \log(1 - F(t))$ is

-pgamma(t, ..., lower = FALSE, log = TRUE)
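The reason for computing the cumulative hazard from the upper tail on the log scale shows up for large t, where 1 - F(t) underflows when formed directly. The sketch below uses shape = 2 and the default scale, for which the survival function is $(1 + t) e^{-t}$ and hence $H(t) = t - \log(1 + t)$; the evaluation points are arbitrary.

```r
t <- c(1, 10, 100)
shape <- 2

## Direct translation of H(t) = -log(1 - F(t)); loses accuracy and
## eventually becomes infinite once F(t) rounds to 1
naive <- -log(1 - pgamma(t, shape))

## Upper tail on the log scale, with argument names written in full
stable <- -pgamma(t, shape, lower.tail = FALSE, log.p = TRUE)

## Closed form for shape = 2, scale = 1
exact <- t - log1p(t)

cbind(t, naive, stable, exact)
```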

Note that for smallish values of shape (and moderate scale) a large part of the mass of the Gamma distribution is on values of $x$ so near zero that they will be represented as zero in computer arithmetic. So rgamma may well return values which will be represented as zero. (This will also happen for very large values of scale since the actual generation is done for scale = 1.)


dgamma gives the density, pgamma gives the distribution function, qgamma gives the quantile function, and rgamma generates random deviates.

Invalid arguments will result in return value NaN, with a warning.

The length of the result is determined by n for rgamma, and is the maximum of the lengths of the numerical arguments for the other functions.

The numerical arguments other than n are recycled to the length of the result. Only the first elements of the logical arguments are used.


The S (Becker et al., 1988) parametrization was via shape and rate: S had no scale parameter. It is an error to supply both scale and rate.

pgamma is closely related to the incomplete gamma function. As defined by Abramowitz and Stegun 6.5.1 (and by ‘Numerical Recipes’) this is

$$P(a,x) = \frac{1}{\Gamma(a)} \int_0^x t^{a-1} e^{-t} dt$$

$P(a, x)$ is pgamma(x, a). Other authors (for example Karl Pearson in his 1922 tables) omit the normalizing factor, defining the incomplete gamma function $\gamma(a,x)$ as $\gamma(a,x) = \int_0^x t^{a-1} e^{-t} dt$, i.e., pgamma(x, a) * gamma(a). Yet others use the ‘upper’ incomplete gamma function,

$$\Gamma(a,x) = \int_x^\infty t^{a-1} e^{-t} dt,$$

which can be computed by pgamma(x, a, lower = FALSE) * gamma(a).
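The relations between pgamma and the lower and upper incomplete gamma functions can be verified numerically. In this sketch the values of a and x are arbitrary, and integrate() serves as an independent check.

```r
a <- 2.5
x <- 1.7

## Lower and upper incomplete gamma functions via pgamma
lower_inc <- pgamma(x, a) * gamma(a)
upper_inc <- pgamma(x, a, lower.tail = FALSE) * gamma(a)

## Independent check by numerical integration of t^(a-1) * exp(-t)
integrand <- function(t) t^(a - 1) * exp(-t)
lower_num <- integrate(integrand, 0, x)$value
upper_num <- integrate(integrand, x, Inf)$value

c(pgamma = lower_inc, integrate = lower_num)
c(pgamma = upper_inc, integrate = upper_num)

## The two pieces add up to the complete gamma function
c(sum = lower_inc + upper_inc, gamma = gamma(a))
```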

Note however that pgamma(x, a, ..) currently requires $a > 0$, whereas the incomplete gamma function is also defined for negative $a$. In that case, you can use gamma_inc(a,x) (for $\Gamma(a,x)$) from package gsl.

See also the NIST Digital Library of Mathematical Functions, section 8.2.


dgamma is computed via the Poisson density, using code contributed by Catherine Loader (see dbinom).

pgamma uses an unpublished (and not otherwise documented) algorithm ‘mainly by Morten Welinder’.

qgamma is based on a C translation of

Best, D. J. and D. E. Roberts (1975). Algorithm AS91. Percentage points of the chi-squared distribution. Applied Statistics, 24, 385–388.

plus a final Newton step to improve the approximation.

rgamma for shape >= 1 uses

Ahrens, J. H. and Dieter, U. (1982). Generating gamma variates by a modified rejection technique. Communications of the ACM, 25, 47–54,

and for 0 < shape < 1 uses

Ahrens, J. H. and Dieter, U. (1974). Computer methods for sampling from gamma, beta, Poisson and binomial distributions. Computing, 12, 223–246.


Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988). The New S Language. Wadsworth & Brooks/Cole.

Shea, B. L. (1988). Algorithm AS 239: Chi-squared and incomplete Gamma integral, Applied Statistics (JRSS C), 37, 466–473. doi:10.2307/2347328.

Abramowitz, M. and Stegun, I. A. (1972) Handbook of Mathematical Functions. New York: Dover. Chapter 6: Gamma and Related Functions.

NIST Digital Library of Mathematical Functions, section 8.2.

See Also

gamma for the gamma function.

Distributions for other standard distributions, including dbeta for the Beta distribution and dchisq for the chi-squared distribution which is a special case of the Gamma distribution.


-log(dgamma(1:4, shape = 1))
p <- (1:9)/10
pgamma(qgamma(p, shape = 2), shape = 2)
1 - 1/exp(qgamma(p, shape = 1))

# even for shape = 0.001 about half the mass is on numbers
# that cannot be represented accurately (and most of those as zero)
pgamma(.Machine$double.xmin, 0.001)
pgamma(5e-324, 0.001)  # on most machines 5e-324 is the smallest
                       # representable non-zero number
table(rgamma(1e4, 0.001) == 0)/1e4

The Geometric Distribution


Density, distribution function, quantile function and random generation for the geometric distribution with parameter prob.


dgeom(x, prob, log = FALSE)
pgeom(q, prob, lower.tail = TRUE, log.p = FALSE)
qgeom(p, prob, lower.tail = TRUE, log.p = FALSE)
rgeom(n, prob)


x, q

vector of quantiles representing the number of failures in a sequence of Bernoulli trials before success occurs.


vector of probabilities.


number of observations. If length(n) > 1, the length is taken to be the number required.


probability of success in each trial. 0 < prob <= 1.

log, log.p

logical; if TRUE, probabilities p are given as log(p).


logical; if TRUE (default), probabilities are $P[X \le x]$, otherwise, $P[X > x]$.


The geometric distribution with prob $= p$ has density

$$p(x) = p (1-p)^{x}$$

for $x = 0, 1, 2, \ldots$, $0 < p \le 1$.

If an element of x is not integer, the result of dgeom is zero, with a warning.

The quantile is defined as the smallest value $x$ such that $F(x) \ge p$, where $F$ is the distribution function.
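
As a rough check of the definitions above, the following sketch compares dgeom with the density formula and verifies the quantile definition for one probability. The closed-form distribution function 1 - (1-p)^(x+1) is a standard result assumed here rather than quoted from this page; prob = 0.2 and u = 0.5 are arbitrary.

```r
prob <- 0.2      # arbitrary success probability (illustrative)
x <- 0:10

# Density: p(x) = p * (1 - p)^x
dens_ok <- all.equal(dgeom(x, prob), prob * (1 - prob)^x)

# Distribution function (assumed closed form): F(x) = 1 - (1 - p)^(x + 1)
cdf_ok <- all.equal(pgeom(x, prob), 1 - (1 - prob)^(x + 1))

# Quantile: the smallest x such that F(x) >= u
u <- 0.5
q <- qgeom(u, prob)
quantile_ok <- pgeom(q, prob) >= u && (q == 0 || pgeom(q - 1, prob) < u)

c(density = dens_ok, cdf = cdf_ok, quantile = quantile_ok)
```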


dgeom gives the density, pgeom gives the distribution function, qgeom gives the quantile function, and rgeom generates random deviates.

Invalid prob will result in return value NaN, with a warning.

The length of the result is determined by n for rgeom, and is the maximum of the lengths of the numerical arguments for the other functions.

The numerical arguments other than n are recycled to the length of the result. Only the first elements of the logical arguments are used.

rgeom returns a vector of type integer unless generated values exceed the maximum representable integer, in which case double values are returned.


dgeom computes via dbinom, using code contributed by Catherine Loader (see dbinom).

pgeom and qgeom are based on the closed-form formulae.

rgeom uses the derivation as an exponential mixture of Poisson distributions, see

Devroye, L. (1986) Non-Uniform Random Variate Generation. Springer-Verlag, New York. Page 480.

See Also

Distributions for other standard distributions, including dnbinom for the negative binomial which generalizes the geometric distribution.


qgeom((1:9)/10, prob = .2)
Ni <- rgeom(20, prob = 1/4); table(factor(Ni, 0:max(Ni)))

Get Initial Parameter Estimates


This function evaluates initial parameter estimates for a nonlinear regression model. If data is a parameterized data frame or pframe object, its parameters attribute is returned. Otherwise the object is examined to see if it contains a call to a selfStart object whose initial attribute can be evaluated.


getInitial(object, data, ...)



object

a formula or a selfStart model that defines a nonlinear regression model

data

a data frame in which the expressions in the formula or arguments to the selfStart model can be evaluated

...

optional additional arguments


A named numeric vector or list of starting estimates for the parameters. The construction of many selfStart models is such that these "starting" estimates are, in fact, the converged parameter estimates.
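
The remark about "starting" estimates can be illustrated by comparing them with the coefficients of a subsequent nls fit. This is only a sketch using the treated subset of the built-in Puromycin data and the SSmicmen self-starting model; how closely the two columns agree depends on the model and data.

```r
# Treated subset of the built-in Puromycin data
PurTrt <- Puromycin[Puromycin$state == "treated", ]

# Initial estimates from the self-starting Michaelis-Menten model
init <- getInitial(rate ~ SSmicmen(conc, Vm, K), data = PurTrt)

# Full nls fit; no 'start' is needed because SSmicmen is a selfStart model
fit <- nls(rate ~ SSmicmen(conc, Vm, K), data = PurTrt)

# Side-by-side comparison of initial and converged estimates
cbind(initial = unlist(init), converged = coef(fit))
```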


José Pinheiro and Douglas Bates

See Also

nls, selfStart, selfStart.default, selfStart.formula. Further, nlsList from nlme.


PurTrt <- Puromycin[ Puromycin$state == "treated", ]
print(getInitial( rate ~ SSmicmen( conc, Vm, K ), PurTrt ), digits = 3)

Fitting Generalized Linear Models


glm is used to fit generalized linear models, specified by giving a symbolic description of the linear predictor and a description of the error distribution.


glm(formula, family = gaussian, data, weights, subset,
    na.action, start = NULL, etastart, mustart, offset,
    control = list(...), model = TRUE, method = "glm.fit",
    x = FALSE, y = TRUE, singular.ok = TRUE, contrasts = NULL, ...)

glm.fit(x, y, weights = rep.int(1, nobs),
        start = NULL, etastart = NULL, mustart = NULL,
        offset = rep.int(0, nobs), family = gaussian(),
        control = list(), intercept = TRUE, singular.ok = TRUE)

## S3 method for class 'glm'
weights(object, type = c("prior", "working"), ...)



formula

an object of class "formula" (or one that can be coerced to that class): a symbolic description of the model to be fitted. The details of model specification are given under ‘Details’.

family

a description of the error distribution and link function to be used in the model. For glm this can be a character string naming a family function, a family function or the result of a call to a family function. For glm.fit only the third option is supported. (See family for details of family functions.)

data

an optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. If not found in data, the variables are taken from environment(formula), typically the environment from which glm is called.

weights

an optional vector of ‘prior weights’ to be used in the fitting process. Should be NULL or a numeric vector.

subset

an optional vector specifying a subset of observations to be used in the fitting process.

na.action

a function which indicates what should happen when the data contain NAs. The default is set by the na.action setting of options, and is na.fail if that is unset. The ‘factory-fresh’ default is na.omit. Another possible value is NULL, no action. Value na.exclude can be useful.

start

starting values for the parameters in the linear predictor.

etastart

starting values for the linear predictor.

mustart

starting values for the vector of means.

offset

this can be used to specify an a priori known component to be included in the linear predictor during fitting. This should be NULL or a numeric vector of length equal to the number of cases. One or more offset terms can be included in the formula instead or as well, and if more than one is specified their sum is used. See model.offset.

control

a list of parameters for controlling the fitting process. For glm.fit this is passed to glm.control.

model

a logical value indicating whether the model frame should be included as a component of the returned value.

method

the method to be used in fitting the model. The default method "glm.fit" uses iteratively reweighted least squares (IWLS): the alternative "model.frame" returns the model frame and does no fitting.

User-supplied fitting functions can be supplied either as a function or a character string naming a function, with a function which takes the same arguments as glm.fit. If specified as a character string it is looked up from within the stats namespace.

x, y

For glm: logical values indicating whether the response vector and model matrix used in the fitting process should be returned as components of the returned value.

For glm.fit: x is a design matrix of dimension n * p, and y is a vector of observations of length n.

singular.ok

logical; if FALSE a singular fit is an error.

contrasts

an optional list. See the contrasts.arg of model.matrix.default.

intercept

logical. Should an intercept be included in the null model?

object

an object inheriting from class "glm".

type

character, partial matching allowed. Type of weights to extract from the fitted model object. Can be abbreviated.

...

For glm: arguments to be used to form the default control argument if it is not supplied directly.

For weights: further arguments passed to or from other methods.


A typical predictor has the form response ~ terms where response is the (numeric) response vector and terms is a series of terms which specifies a linear predictor for response. For binomial and quasibinomial families the response can also be specified as a factor (when the first level denotes failure and all others success) or as a two-column matrix with the columns giving the numbers of successes and failures. A terms specification of the form first + second indicates all the terms in first together with all the terms in second with any duplicates removed.

A specification of the form first:second indicates the set of terms obtained by taking the interactions of all terms in first with all terms in second. The specification first*second indicates the cross of first and second. This is the same as first + second + first:second.

The terms in the formula will be re-ordered so that main effects come first, followed by the interactions, all second-order, all third-order and so on: to avoid this pass a terms object as the formula.
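
The formula rules above can be inspected without fitting anything by looking at the term labels that terms() produces. This is a small sketch; the variable names y, a and b are placeholders and need not exist as data.

```r
# first*second is the same as first + second + first:second
labels_cross <- attr(terms(y ~ a * b), "term.labels")
labels_long  <- attr(terms(y ~ a + b + a:b), "term.labels")
identical(labels_cross, labels_long)

# Duplicate terms are removed
labels_dup <- attr(terms(y ~ a + b + a), "term.labels")

# Terms are re-ordered so that main effects come before interactions ...
labels_reordered <- attr(terms(y ~ a:b + a + b), "term.labels")

# ... unless a terms object built with keep.order = TRUE is used
labels_kept <- attr(terms(y ~ a:b + a + b, keep.order = TRUE), "term.labels")

list(dup = labels_dup, reordered = labels_reordered, kept = labels_kept)
```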

Non-NULL weights can be used to indicate that different observations have different dispersions (with the values in weights being inversely proportional to the dispersions); or equivalently, when the elements of weights are positive integers $w_i$, that each response $y_i$ is the mean of $w_i$ unit-weight observations. For a binomial GLM prior weights are used to give the number of trials when the response is the proportion of successes: they would rarely be used for a Poisson GLM.

glm.fit is the workhorse function: it is not normally called directly but can be more efficient where the response vector, design matrix and family have already been calculated.
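
A minimal sketch of calling the fitting function glm.fit directly, assuming the design matrix has already been built with model.matrix. The data are the Dobson (1990) counts used in the examples for glm; the object names are illustrative.

```r
counts    <- c(18, 17, 15, 20, 10, 20, 25, 13, 12)
outcome   <- gl(3, 1, 9)
treatment <- gl(3, 3)

# Design matrix computed once, up front
X <- model.matrix(~ outcome + treatment)

# Direct call to the fitting function: no formula or model frame involved
fit_direct <- glm.fit(x = X, y = counts, family = poisson())

# The usual formula interface, for comparison
fit_glm <- glm(counts ~ outcome + treatment, family = poisson())

all.equal(fit_direct$coefficients, coef(fit_glm))
```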

If more than one of etastart, start and mustart is specified, the first in the list will be used. It is often advisable to supply starting values for a quasi family, and also for families with unusual links such as gaussian("log").

All of weights, subset, offset, etastart and mustart are evaluated in the same way as variables in formula, that is first in data and then in the environment of formula.

For the background to warning messages about ‘fitted probabilities numerically 0 or 1 occurred’ for binomial GLMs, see Venables & Ripley (2002, pp. 197–8).


glm returns an object of class inheriting from "glm" which inherits from the class "lm". See later in this section. If a non-standard method is used, the object will also inherit from the class (if any) returned by that function.

The function summary (i.e., summary.glm) can be used to obtain or print a summary of the results and the function anova (i.e., anova.glm) to produce an analysis of variance table.

The generic accessor functions coefficients, effects, fitted.values and residuals can be used to extract various useful features of the value returned by glm.

weights extracts a vector of weights, one for each case in the fit (after subsetting and na.action).

An object of class "glm" is a list containing at least the following components:


coefficients

a named vector of coefficients

residuals

the working residuals, that is the residuals in the final iteration of the IWLS fit. Since cases with zero weights are omitted, their working residuals are NA.

fitted.values

the fitted mean values, obtained by transforming the linear predictors by the inverse of the link function.

rank

the numeric rank of the fitted linear model.

family

the family object used.

linear.predictors

the linear fit on link scale.

deviance

up to a constant, minus twice the maximized log-likelihood. Where sensible, the constant is chosen so that a saturated model has deviance zero.

aic

A version of Akaike's An Information Criterion, minus twice the maximized log-likelihood plus twice the number of parameters, computed via the aic component of the family. For binomial and Poisson families the dispersion is fixed at one and the number of parameters is the number of coefficients. For gaussian, Gamma and inverse gaussian families the dispersion is estimated from the residual deviance, and the number of parameters is the number of coefficients plus one. For a gaussian family the MLE of the dispersion is used so this is a valid value of AIC, but for Gamma and inverse gaussian families it is not. For families fitted by quasi-likelihood the value is NA.

null.deviance

The deviance for the null model, comparable with deviance. The null model will include the offset, and an intercept if there is one in the model. Note that this will be incorrect if the link function depends on the data other than through the fitted mean: specify a zero offset to force a correct calculation.

iter

the number of iterations of IWLS used.

weights

the working weights, that is the weights in the final iteration of the IWLS fit.

prior.weights

the weights initially supplied, a vector of 1s if none were.

df.residual

the residual degrees of freedom.

df.null

the residual degrees of freedom for the null model.

y

if requested (the default) the y vector used. (It is a vector even for a binomial model.)

x

if requested, the model matrix.

model

if requested (the default), the model frame.

converged

logical. Was the IWLS algorithm judged to have converged?

boundary

logical. Is the fitted value on the boundary of the attainable values?

call

the matched call.

formula

the formula supplied.

terms

the terms object used.

data

the data argument.

offset

the offset vector used.

control

the value of the control argument used.

method

the name of the fitter function used (when provided as a character string to glm()) or the fitter function (when provided as that).

contrasts

(where relevant) the contrasts used.

xlevels

(where relevant) a record of the levels of the factors used in fitting.

na.action

(where relevant) information returned by model.frame on the special handling of NAs.

In addition, non-empty fits will have components qr, R and effects relating to the final weighted linear fit.

Objects of class "glm" are normally of class c("glm", "lm"), that is inherit from class "lm", and well-designed methods for class "lm" will be applied to the weighted linear model at the final iteration of IWLS. However, care is needed, as extractor functions for class "glm" such as residuals and weights do not just pick out the component of the fit with the same name.
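
The caution about extractor functions can be seen by comparing them with the like-named list components. In this sketch (again using the Dobson counts; object names are illustrative), residuals() defaults to deviance residuals while the residuals component holds working residuals, and weights() defaults to prior weights while the weights component holds working weights.

```r
counts    <- c(18, 17, 15, 20, 10, 20, 25, 13, 12)
outcome   <- gl(3, 1, 9)
treatment <- gl(3, 3)
fit <- glm(counts ~ outcome + treatment, family = poisson())

# Default extractor output differs from the component of the same name
same_resid <- isTRUE(all.equal(residuals(fit), fit$residuals))

# Asking for the matching type recovers the stored components
work_resid_ok <- isTRUE(all.equal(residuals(fit, type = "working"), fit$residuals))
prior_wt_ok   <- isTRUE(all.equal(weights(fit), fit$prior.weights))
work_wt_ok    <- isTRUE(all.equal(weights(fit, type = "working"), fit$weights))

c(same_resid = same_resid, work_resid_ok = work_resid_ok,
  prior_wt_ok = prior_wt_ok, work_wt_ok = work_wt_ok)
```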

If a binomial glm model was specified by giving a two-column response, the weights returned by prior.weights are the total numbers of cases (factored by the supplied case weights) and the component y of the result is the proportion of successes.
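
The following sketch illustrates the two-column response case with made-up dose-response counts (the data are purely illustrative). The same model is fitted from a matrix of successes and failures and from proportions with the numbers of trials as prior weights.

```r
# Hypothetical data: 20 trials at each of four doses
dose <- 1:4
n    <- rep(20, 4)
succ <- c(2, 5, 9, 14)

# Two-column response: successes and failures
fit_matrix <- glm(cbind(succ, n - succ) ~ dose, family = binomial)

# Equivalent specification: proportions with prior weights
fit_prop <- glm(succ / n ~ dose, family = binomial, weights = n)

all.equal(coef(fit_matrix), coef(fit_prop))

# For the two-column fit, prior weights are the totals and y is the proportion
unname(weights(fit_matrix, type = "prior"))
unname(fit_matrix$y)
```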

Fitting functions

The argument method serves two purposes. One is to allow the model frame to be recreated with no fitting. The other is to allow the default fitting function glm.fit to be replaced by a function which takes the same arguments and uses a different fitting algorithm. If the fitting function is supplied as a character string it is used to search for a function of that name, starting in the stats namespace.

The class of the object returned by the fitter (if any) will be prepended to the class returned by glm.
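
Both uses of method can be sketched as follows. The wrapper my_fitter is a hypothetical name introduced only for illustration; it simply forwards its arguments to glm.fit, so it shows the calling convention rather than a genuinely different algorithm.

```r
counts    <- c(18, 17, 15, 20, 10, 20, 25, 13, 12)
outcome   <- gl(3, 1, 9)
treatment <- gl(3, 3)

# Purpose 1: return the model frame without fitting
mf <- glm(counts ~ outcome + treatment, family = poisson(),
          method = "model.frame")
is.data.frame(mf)

# Purpose 2: supply a replacement fitter (hypothetical pass-through wrapper)
my_fitter <- function(...) glm.fit(...)

fit_custom  <- glm(counts ~ outcome + treatment, family = poisson(),
                   method = my_fitter)
fit_default <- glm(counts ~ outcome + treatment, family = poisson())

all.equal(coef(fit_custom), coef(fit_default))
```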


The original R implementation of glm was written by Simon Davies working for Ross Ihaka at the University of Auckland, but has since been extensively re-written by members of the R Core team.

The design was inspired by the S function of the same name described in Hastie & Pregibon (1992).


Dobson, A. J. (1990) An Introduction to Generalized Linear Models. London: Chapman and Hall.

Hastie, T. J. and Pregibon, D. (1992) Generalized linear models. Chapter 6 of Statistical Models in S eds J. M. Chambers and T. J. Hastie, Wadsworth & Brooks/Cole.

McCullagh P. and Nelder, J. A. (1989) Generalized Linear Models. London: Chapman and Hall.

Venables, W. N. and Ripley, B. D. (2002) Modern Applied Statistics with S. New York: Springer.

See Also

anova.glm, summary.glm, etc. for glm methods, and the generic functions anova, summary, effects, fitted.values, and residuals.

lm for non-generalized linear models (which SAS calls GLMs, for ‘general’ linear models).

loglin and loglm (package MASS) for fitting log-linear models (which binomial and Poisson GLMs are) to contingency tables.

bigglm in package biglm for an alternative way to fit GLMs to large datasets (especially those with many cases).

esoph, infert and predict.glm have examples of fitting binomial GLMs.


## Dobson (1990) Page 93: Randomized Controlled Trial :
counts <- c(18,17,15,20,10,20,25,13,12)
outcome <- gl(3,1,9)
treatment <- gl(3,3)
data.frame(treatment, outcome, counts) # showing data
glm.D93 <- glm(counts ~ outcome + treatment, family = poisson())
## Computing AIC [in many ways]:
(A0 <- AIC(glm.D93))
(ll <- logLik(glm.D93))
A1 <- -2*c(ll) + 2*attr(ll, "df")
A2 <- glm.D93$family$aic(counts, mu=fitted(glm.D93), wt=1) +
        2 * length(coef(glm.D93))
stopifnot(exprs = {
  all.equal(A0, A1)
  all.equal(A1, A2)
  all.equal(A1, glm.D93$aic)
})

## an example with offsets from Venables & Ripley (2002, p.189)
utils::data(anorexia, package = "MASS")

anorex.1 <- glm(Postwt ~ Prewt + Treat + offset(Prewt),
                family = gaussian, data = anorexia)

# A Gamma example, from McCullagh & Nelder (1989, pp. 300-2)
clotting <- data.frame(
    u = c(5,10,15,20,30,40,60,80,100),
    lot1 = c(118,58,42,35,27,25,21,19,18),
    lot2 = c(69,35,26,21,18,16,13,12,12))
summary(glm(lot1 ~ log(u), data = clotting, family = Gamma))
summary(glm(lot2 ~ log(u), data = clotting, family = Gamma))
## Aliased ("S"ingular) -> 1 NA coefficient
(fS <- glm(lot2 ~ log(u) + log(u^2), data = clotting, family = Gamma))
tools::assertError(update(fS, singular.ok=FALSE), verbose=interactive())
## -> .. "singular fit encountered"

## Not run: 
## for an example of the use of a terms object as a formula

## End(Not run)

Auxiliary for Controlling GLM Fitting


Auxiliary function for glm fitting. Typically only used internally by glm.fit, but may be used to construct a control argument to either function.


glm.control(epsilon = 1e-8, maxit = 25, trace = FALSE)



epsilon

positive convergence tolerance $\epsilon$; the iterations converge when $|dev - dev_{old}|/(|dev| + 0.1) < \epsilon$.

maxit

integer giving the maximal number of IWLS iterations.

trace

logical indicating if output should be produced for each iteration.


The control argument of glm is by default passed to the control argument of glm.fit, which uses its elements as arguments to glm.control: the latter provides defaults and sanity checking.

If epsilon is small (less than $10^{-10}$) it is also used as the tolerance for the detection of collinearity in the least squares solution.

When trace is true, calls to cat produce the output for each IWLS iteration. Hence, options(digits = *) can be used to increase the precision, see the example.


A list with components named as the arguments.
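
A short sketch of constructing a control list explicitly and passing it to glm. The particular values of epsilon and maxit are arbitrary, and the final call only illustrates the sanity checking mentioned above.

```r
# Build a control list; unspecified arguments take their defaults
ctrl <- glm.control(epsilon = 1e-10, maxit = 50)
str(ctrl)

counts    <- c(18, 17, 15, 20, 10, 20, 25, 13, 12)
outcome   <- gl(3, 1, 9)
treatment <- gl(3, 3)

# Pass the list through the 'control' argument of glm
fit <- glm(counts ~ outcome + treatment, family = poisson(), control = ctrl)
fit$control$maxit

# Invalid settings are rejected: a non-positive tolerance signals an error
bad <- try(glm.control(epsilon = -1), silent = TRUE)
inherits(bad, "try-error")
```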


Hastie, T. J. and Pregibon, D. (1992) Generalized linear models. Chapter 6 of Statistical Models in S eds J. M. Chambers and T. J. Hastie, Wadsworth & Brooks/Cole.

See Also

glm.fit, the fitting procedure used by glm.


### A variation on  example(glm) :

## Annette Dobson's example ...
counts <- c(18,17,15,20,10,20,25,13,12)
outcome <- gl(3,1,9)
treatment <- gl(3,3)
oo <- options(digits = 12) # to see more when tracing :
glm.D93X <- glm(counts ~ outcome + treatment, family = poisson(),
                trace = TRUE, epsilon = 1e-14)
options(oo) # restore the previous 'digits' setting
coef(glm.D93X) # the last two are closer to 0 than in ?glm's  glm.D93

Accessing Generalized Linear Model Fits


These functions are all methods for class glm or summary.glm objects.


## S3 method for class 'glm'
family(object, ...)

## S3 method for class 'glm'
residuals(object, type = c("deviance", "pearson", "working",
                           "response", "partial"), ...)



object

an object of class glm, typically the result of a call to glm.

type

the type of residuals which should be returned. The alternatives are: "deviance" (default), "pearson", "working", "response", and "partial". Can be abbreviated.

...

further arguments passed to or from other methods.


The references define the types of residuals: Davison & Snell is a good reference for the usages of each.

The partial residuals are a matrix of working residuals, with each column formed by omitting a term from the model.

How residuals treats cases with missing values in the original fit is determined by the na.action argument of that fit. If na.action = na.omit omitted cases will not appear in the residuals, whereas if na.action = na.exclude they will appear, with residual value NA. See also naresid.
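
The difference between the two na.action settings can be seen on a small made-up data set with one missing predictor value (the data are illustrative only).

```r
# Hypothetical data: the third case has a missing predictor
d <- data.frame(y = c(1, 3, 2, 5, 4, 6),
                x = c(1, 2, NA, 4, 5, 6))

fit_omit <- glm(y ~ x, data = d, na.action = na.omit)
fit_excl <- glm(y ~ x, data = d, na.action = na.exclude)

# na.omit: the incomplete case is dropped from the residuals
length(residuals(fit_omit))

# na.exclude: the residual vector is padded with NA for the incomplete case
residuals(fit_excl)
```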

For fits done with y = FALSE the response values are computed from other components.


Davison, A. C. and Snell, E. J. (1991) Residuals and diagnostics. In: Statistical Theory and Modelling. In Honour of Sir David Cox, FRS, eds. Hinkley, D. V., Reid, N. and Snell, E. J., Chapman & Hall.

Hastie, T. J. and Pregibon, D. (1992) Generalized linear models. Chapter 6 of Statistical Models in S eds J. M. Chambers and T. J. Hastie, Wadsworth & Brooks/Cole.

McCullagh P. and Nelder, J. A. (1989) Generalized Linear Models. London: Chapman and Hall.

See Also

glm for computing glm.obj, anova.glm; the corresponding generic functions, summary.glm, coef, deviance, df.residual, effects, fitted, residuals.

influence.measures for deletion diagnostics, including standardized (rstandard) and studentized (rstudent) residuals.

Hierarchical Clustering


Hierarchical cluster analysis on a set of dissimilarities and methods for analyzing it.


hclust(d, method = "complete", members = NULL)

## S3 method for class 'hclust'
plot(x, labels = NULL, hang = 0.1, check = TRUE,
     axes = TRUE, frame.plot = FALSE, ann = TRUE,
     main = "Cluster Dendrogram",
     sub = NULL, xlab = NULL, ylab = "Height", ...)



d

a dissimilarity structure as produced by dist.

method

the agglomeration method to be used. This should be (an unambiguous abbreviation of) one of "ward.D", "ward.D2", "single", "complete", "average" (= UPGMA), "mcquitty" (= WPGMA), "median" (= WPGMC) or "centroid" (= UPGMC).

members

NULL or a vector with length size of d. See the ‘Details’ section.

x

an object of the type produced by hclust.

hang

The fraction of the plot height by which labels should hang below the rest of the plot. A negative value will cause the labels to hang down from 0.

check

logical indicating if the x object should be checked for validity. This check is not necessary when x is known to be valid such as when it is the direct result of hclust(). The default is check=TRUE, as invalid inputs may crash R due to memory violation in the internal C plotting code.

labels

A character vector of labels for the leaves of the tree. By default the row names or row numbers of the original data are used. If labels = FALSE no labels at all are plotted.

axes, frame.plot, ann

logical flags as in plot.default.

main, sub, xlab, ylab

character strings for title. sub and xlab have a non-NULL default when there's a tree$call.

...

Further graphical arguments. E.g., cex controls the size of the labels (if plotted) in the same way as text.


This function performs a hierarchical cluster analysis using a set of dissimilarities for the $n$ objects being clustered. Initially, each object is assigned to its own cluster and then the algorithm proceeds iteratively, at each stage joining the two most similar clusters, continuing until there is just a single cluster. At each stage distances between clusters are recomputed by the Lance–Williams dissimilarity update formula according to the particular clustering method being used.

A number of different clustering methods are provided. Ward's minimum variance method aims at finding compact, spherical clusters. The complete linkage method finds similar clusters. The single linkage method (which is closely related to the minimal spanning tree) adopts a ‘friends of friends’ clustering strategy. The other methods can be regarded as aiming for clusters with characteristics somewhere between the single and complete link methods. Note however, that methods "median" and "centroid" are not leading to a monotone distance measure, or equivalently the resulting dendrograms can have so called inversions or reversals which are hard to interpret, but note the trichotomies in Legendre and Legendre (2012).

Two different algorithms are found in the literature for Ward clustering. The one used by option "ward.D" (equivalent to the only Ward option "ward" in R versions $\le$ 3.0.3) does not implement Ward's (1963) clustering criterion, whereas option "ward.D2" implements that criterion (Murtagh and Legendre 2014). With the latter, the dissimilarities are squared before cluster updating. Note that agnes(*, method="ward") corresponds to hclust(*, "ward.D2").
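
The statement that "ward.D2" squares the dissimilarities before updating suggests a simple consistency check, sketched below under the assumption (following Murtagh and Legendre 2014) that "ward.D" applied to squared distances yields the same merges, with heights that are the squares of the "ward.D2" heights. The USArrests data are used only for illustration.

```r
d <- dist(USArrests)

h_D2    <- hclust(d,   method = "ward.D2")  # Ward's criterion on d
h_D_sq  <- hclust(d^2, method = "ward.D")   # "ward.D" on squared distances

# Same sequence of merges ...
identical(h_D2$merge, h_D_sq$merge)

# ... and heights related by a square root
all.equal(h_D2$height, sqrt(h_D_sq$height))
```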

If members != NULL, then d is taken to be a dissimilarity matrix between clusters instead of dissimilarities between singletons and members gives the number of observations per cluster. This way the hierarchical cluster algorithm can be ‘started in the middle of the dendrogram’, e.g., in order to reconstruct the part of the tree above a cut (see examples). Dissimilarities between clusters can be efficiently computed (i.e., without hclust itself) only for a limited number of distance/linkage combinations, the simplest one being squared Euclidean distance and centroid linkage. In this case the dissimilarities between the clusters are the squared Euclidean distances between cluster means.

In hierarchical cluster displays, a decision is needed at each merge to specify which subtree should go on the left and which on the right. Since, for $n$ observations there are $n-1$ merges, there are $2^{(n-1)}$ possible orderings for the leaves in a cluster tree, or dendrogram. The algorithm used in hclust is to order the subtree so that the tighter cluster is on the left (the last, i.e., most recent, merge of the left subtree is at a lower value than the last merge of the right subtree). Single observations are the tightest clusters possible, and merges involving two observations place them in order by their observation sequence number.


An object of class "hclust" which describes the tree produced by the clustering process. The object is a list with components:


merge

an $n-1$ by 2 matrix. Row $i$ of merge describes the merging of clusters at step $i$ of the clustering. If an element $j$ in the row is negative, then observation $-j$ was merged at this stage. If $j$ is positive then the merge was with the cluster formed at the (earlier) stage $j$ of the algorithm. Thus negative entries in merge indicate agglomerations of singletons, and positive entries indicate agglomerations of non-singletons.

height

a set of $n-1$ real values (non-decreasing for ultrametric trees). The clustering height: that is, the value of the criterion associated with the clustering method for the particular agglomeration.

order

a vector giving the permutation of the original observations suitable for plotting, in the sense that a cluster plot using this ordering and matrix merge will not have crossings of the branches.

labels

labels for each of the objects being clustered.

call

the call which produced the result.

method

the cluster method that has been used.

dist.method

the distance that has been used to create d (only returned if the distance object has a "method" attribute).

There are print, plot and identify (see identify.hclust) methods and the rect.hclust() function for hclust objects.
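
To make the merge, height and order components concrete, here is a small sketch on five made-up points on a line, chosen so that all merge distances are distinct. Single linkage is used so that the heights can be read off the pairwise distances.

```r
# Five illustrative one-dimensional observations
x  <- c(0, 1, 5, 7, 20)
hc <- hclust(dist(x), method = "single")

# Each row is one merge: negative entries are single observations,
# positive entries refer to clusters formed at that earlier step
hc$merge

# Merge heights: 1 (obs 1 and 2), 2 (obs 3 and 4), 4 (the two pairs),
# and 13 (observation 5 joins the rest)
hc$height

# A permutation of 1:5 giving a plotting order without branch crossings
hc$order
```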


Method "centroid" is typically meant to be used with squared Euclidean distances.


The hclust function is based on Fortran code contributed to STATLIB by F. Murtagh.


Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988). The New S Language. Wadsworth & Brooks/Cole. (S version.)

Everitt, B. (1974). Cluster Analysis. London: Heinemann Educ. Books.

Hartigan, J.A. (1975). Clustering Algorithms. New York: Wiley.

Sneath, P. H. A. and R. R. Sokal (1973). Numerical Taxonomy. San Francisco: Freeman.

Anderberg, M. R. (1973). Cluster Analysis for Applications. Academic Press: New York.

Gordon, A. D. (1999). Classification. Second Edition. London: Chapman and Hall / CRC

Murtagh, F. (1985). “Multidimensional Clustering Algorithms”, in COMPSTAT Lectures 4. Wuerzburg: Physica-Verlag (for algorithmic details of algorithms used).

McQuitty, L.L. (1966). Similarity Analysis by Reciprocal Pairs for Discrete and Continuous Data. Educational and Psychological Measurement, 26, 825–831. doi:10.1177/001316446602600402.

Legendre, P. and L. Legendre (2012). Numerical Ecology, 3rd English ed. Amsterdam: Elsevier Science BV.

Murtagh, Fionn and Legendre, Pierre (2014). Ward's hierarchical agglomerative clustering method: which algorithms implement Ward's criterion? Journal of Classification, 31, 274–295. doi:10.1007/s00357-014-9161-z.

See Also

identify.hclust, rect.hclust, cutree, dendrogram, kmeans.

For the Lance–Williams formula and methods that apply it generally, see agnes from package cluster.



### Example 1: Violent crime rates by US state

hc <- hclust(dist(USArrests), "ave")
plot(hc, hang = -1)

## Do the same with centroid clustering and *squared* Euclidean distance,
## cut the tree into ten clusters and reconstruct the upper part of the
## tree from the cluster centers.
hc <- hclust(dist(USArrests)^2, "cen")
memb <- cutree(hc, k = 10)
cent <- NULL
for(k in 1:10){
  cent <- rbind(cent, colMeans(USArrests[memb == k, , drop = FALSE]))
}
hc1 <- hclust(dist(cent)^2, method = "cen", members = table(memb))
opar <- par(mfrow = c(1, 2))
plot(hc,  labels = FALSE, hang = -1, main = "Original Tree")
plot(hc1, labels = FALSE, hang = -1, main = "Re-start from 10 clusters")
par(opar) # restore the previous graphics settings

### Example 2: Straight-line distances among 10 US cities
##  Compare the results of algorithms "ward.D" and "ward.D2"

mds2 <- -cmdscale(UScitiesD)
plot(mds2, type="n", axes=FALSE, ann=FALSE)
text(mds2, labels=rownames(mds2), xpd = NA)

hcity.D  <- hclust(UScitiesD, "ward.D") # "wrong"
hcity.D2 <- hclust(UScitiesD, "ward.D2")
opar <- par(mfrow = c(1, 2))
plot(hcity.D,  hang=-1)
plot(hcity.D2, hang=-1)
par(opar) # restore the previous graphics settings

Draw a Heat Map


A heat map is a false color image (basically image(t(x))) with a dendrogram added to the left side and to the top. Typically, reordering of the rows and columns according to some set of values (row or column means) within the restrictions imposed by the dendrogram is carried out.


heatmap(x, Rowv = NULL, Colv = if(symm)"Rowv" else NULL,
        distfun = dist, hclustfun = hclust,
        reorderfun = function(d, w) reorder(d, w),
        add.expr, symm = FALSE, revC = identical(Colv, "Rowv"),
        scale = c("row", "column", "none"), na.rm = TRUE,
        margins = c(5, 5), ColSideColors, RowSideColors,
        cexRow = 0.2 + 1/log10(nr), cexCol = 0.2 + 1/log10(nc),
        labRow = NULL, labCol = NULL, main = NULL,
        xlab = NULL, ylab = NULL,
        keep.dendro = FALSE, verbose = getOption("verbose"), ...)



x

numeric matrix of the values to be plotted.

Rowv

determines if and how the row dendrogram should be computed and reordered. Either a dendrogram or a vector of values used to reorder the row dendrogram or NA to suppress any row dendrogram (and reordering) or by default, NULL, see ‘Details’ below.

Colv

determines if and how the column dendrogram should be reordered. Has the same options as the Rowv argument above and additionally when x is a square matrix, Colv = "Rowv" means that columns should be treated identically to the rows (and so if there is to be no row dendrogram there will not be a column one either).

distfun

function used to compute the distance (dissimilarity) between both rows and columns. Defaults to dist.

hclustfun

function used to compute the hierarchical clustering when Rowv or Colv are not dendrograms. Defaults to hclust. Should take as argument a result of distfun and return an object to which as.dendrogram can be applied.

reorderfun

function(d, w) of dendrogram and weights for reordering the row and column dendrograms. The default uses reorder.dendrogram.

add.expr

expression that will be evaluated after the call to image. Can be used to add components to the plot.

symm

logical indicating if x should be treated symmetrically; can only be true when x is a square matrix.

revC

logical indicating if the column order should be reversed for plotting, such that e.g., for the symmetric case, the symmetry axis is as usual.

scale

character indicating if the values should be centered and scaled in either the row direction or the column direction, or none. The default is "row" if symm is false, and "none" otherwise.

na.rm

logical indicating whether NA's should be removed.

margins

numeric vector of length 2 containing the margins (see par(mar = *)) for column and row names, respectively.

ColSideColors

(optional) character vector of length ncol(x) containing the color names for a horizontal side bar that may be used to annotate the columns of x.

RowSideColors

(optional) character vector of length nrow(x) containing the color names for a vertical side bar that may be used to annotate the rows of x.

cexRow, cexCol

positive numbers, used as cex.axis in axis for the row or column axis labeling. The defaults currently only use number of rows or columns, respectively.

labRow, labCol

character vectors with row and column labels to use; these default to rownames(x) or colnames(x), respectively.

main, xlab, ylab

main, x- and y-axis titles; defaults to none.

keep.dendro

logical indicating if the dendrogram(s) should be kept as part of the result (when Rowv and/or Colv are not NA).

verbose

logical indicating if information should be printed.

...

additional arguments passed on to image, e.g., col specifying the colors.


If either Rowv or Colv are dendrograms they are honored (and not reordered). Otherwise, dendrograms are computed as dd <- as.dendrogram(hclustfun(distfun(X))) where X is either x or t(x).

If either is a vector (of ‘weights’) then the appropriate dendrogram is reordered according to the supplied values subject to the constraints imposed by the dendrogram, by reorder(dd, Rowv), in the row case. If either is missing, as by default, then the ordering of the corresponding dendrogram is by the mean value of the rows/columns, i.e., in the case of rows, Rowv <- rowMeans(x, na.rm = na.rm). If either is NA, no reordering will be done for the corresponding side.
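
The default row ordering described above can be reproduced by hand, as in the following sketch. It assumes the default distfun, hclustfun and reorderfun, uses mtcars purely as an illustration, and draws a heat map on the current graphics device as a side effect.

```r
x <- as.matrix(mtcars)

# Row dendrogram as heatmap() computes it by default ...
dd <- as.dendrogram(hclust(dist(x)))

# ... reordered by the row means, subject to the dendrogram's constraints
dd <- reorder(dd, rowMeans(x, na.rm = TRUE))

# heatmap() with default Rowv; the column side is left untouched here
hv <- heatmap(x, Colv = NA, scale = "none")

# The returned row permutation should match the hand-built dendrogram
all.equal(hv$rowInd, order.dendrogram(dd))
```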

By default (scale = "row") the rows are scaled to have mean zero and standard deviation one. There is some empirical evidence from genomic plotting that this is useful.
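
The default ordering and row scaling described above can be reproduced by hand. The following is a minimal sketch, assuming the default hclustfun = hclust and distfun = dist; the object names dd, rowInd and xs are illustrative only.

```r
x <- as.matrix(mtcars)

# Row dendrogram: hierarchical clustering of the distances between rows
dd <- as.dendrogram(hclust(dist(x)))

# Reorder by the row means, subject to the constraints of the tree
dd <- reorder(dd, rowMeans(x, na.rm = TRUE))

# Row index permutation implied by the reordered dendrogram
rowInd <- order.dendrogram(dd)

# Row scaling: each row gets mean zero and standard deviation one
xs <- t(scale(t(x)))
```

For the column side the same steps would be applied to t(x).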


Invisibly, a list with components


row index permutation vector as returned by order.dendrogram.


column index permutation vector.


the row dendrogram; only if input Rowv was not NA and keep.dendro is true.


the column dendrogram; only if input Colv was not NA and keep.dendro is true.


Unless Rowv = NA (or Colv = NA), the original rows and columns are reordered in any case to match the dendrogram, e.g., the rows by order.dendrogram(Rowv) where Rowv is the (possibly reorder()ed) row dendrogram.

heatmap() uses layout and draws the image in the lower right corner of a 2x2 layout. Consequently, it cannot be used in a multi column/row layout, i.e., when par(mfrow = *) or par(mfcol = *) has been called.


Andy Liaw, original; R. Gentleman, M. Maechler, W. Huber, revisions.

See Also

image, hclust


require(graphics); require(grDevices)
x  <- as.matrix(mtcars)
rc <- rainbow(nrow(x), start = 0, end = .3)
cc <- rainbow(ncol(x), start = 0, end = .3)
hv <- heatmap(x, col = cm.colors(256), scale = "column",
              RowSideColors = rc, ColSideColors = cc, margins = c(5,10),
              xlab = "specification variables", ylab =  "Car Models",
              main = "heatmap(<Mtcars data>, ..., scale = \"column\")")
utils::str(hv) # the two re-ordering index vectors

## no column dendrogram (nor reordering) at all:
heatmap(x, Colv = NA, col = cm.colors(256), scale = "column",
        RowSideColors = rc, margins = c(5,10),
        xlab = "specification variables", ylab =  "Car Models",
        main = "heatmap(<Mtcars data>, ..., scale = \"column\")")

## "no nothing"
heatmap(x, Rowv = NA, Colv = NA, scale = "column",
        main = "heatmap(*, NA, NA) ~= image(t(x))")

round(Ca <- cor(attitude), 2)
symnum(Ca) # simple graphic
heatmap(Ca,               symm = TRUE, margins = c(6,6)) # with reorder()
heatmap(Ca, Rowv = FALSE, symm = TRUE, margins = c(6,6)) # _NO_ reorder()
## slightly artificial with color bar, without and with ordering:
cc <- rainbow(nrow(Ca))
heatmap(Ca, Rowv = FALSE, symm = TRUE, RowSideColors = cc, ColSideColors = cc,
	margins = c(6,6))
heatmap(Ca,		symm = TRUE, RowSideColors = cc, ColSideColors = cc,
	margins = c(6,6))

## For variable clustering, rather use distance based on cor():
symnum( cU <- cor(USJudgeRatings) )

hU <- heatmap(cU, Rowv = FALSE, symm = TRUE, col = topo.colors(16),
             distfun = function(c) as.dist(1 - c), keep.dendro = TRUE)
## The Correlation matrix with same reordering:
round(100 * cU[hU[[1]], hU[[2]]])
## The column dendrogram:
utils::str(hU$Colv)

Holt-Winters Filtering


Computes Holt-Winters Filtering of a given time series. Unknown parameters are determined by minimizing the squared prediction error.


HoltWinters(x, alpha = NULL, beta = NULL, gamma = NULL,
            seasonal = c("additive", "multiplicative"),
            start.periods = 2, l.start = NULL, b.start = NULL,
            s.start = NULL,
            optim.start = c(alpha = 0.3, beta = 0.1, gamma = 0.1),
            optim.control = list())



An object of class ts


\(\alpha\) parameter of Holt-Winters Filter.


\(\beta\) parameter of Holt-Winters Filter. If set to FALSE, the function will do exponential smoothing.


\(\gamma\) parameter used for the seasonal component. If set to FALSE, a non-seasonal model is fitted.


Character string to select an "additive" (the default) or "multiplicative" seasonal model. The first few characters are sufficient. (Only takes effect if gamma is non-zero).


Start periods used in the autodetection of start values. Must be at least 2.


Start value for level (a[0]).


Start value for trend (b[0]).


Vector of start values for the seasonal component (\(s_1[0] \ldots s_p[0]\))


Vector with named components alpha, beta, and gamma containing the starting values for the optimizer. Only the values needed must be specified. Ignored in the one-parameter case.


Optional list with additional control parameters passed to optim if this is used. Ignored in the one-parameter case.


The additive Holt-Winters prediction function (for time series with period length p) is

\[\hat Y[t+h] = a[t] + h b[t] + s[t - p + 1 + (h - 1) \bmod p],\]

where \(a[t]\), \(b[t]\) and \(s[t]\) are given by

\[a[t] = \alpha (Y[t] - s[t-p]) + (1-\alpha) (a[t-1] + b[t-1])\]

\[b[t] = \beta (a[t] - a[t-1]) + (1-\beta) b[t-1]\]

\[s[t] = \gamma (Y[t] - a[t]) + (1-\gamma) s[t-p]\]

The multiplicative Holt-Winters prediction function (for time series with period length p) is

\[\hat Y[t+h] = (a[t] + h b[t]) \times s[t - p + 1 + (h - 1) \bmod p],\]

where \(a[t]\), \(b[t]\) and \(s[t]\) are given by

\[a[t] = \alpha (Y[t] / s[t-p]) + (1-\alpha) (a[t-1] + b[t-1])\]

\[b[t] = \beta (a[t] - a[t-1]) + (1-\beta) b[t-1]\]

\[s[t] = \gamma (Y[t] / a[t]) + (1-\gamma) s[t-p]\]

The data in x are required to be non-zero for a multiplicative model, but it makes most sense if they are all positive.

The function tries to find the optimal values of \(\alpha\) and/or \(\beta\) and/or \(\gamma\) by minimizing the squared one-step prediction error if they are NULL (the default). optimize will be used for the single-parameter case, and optim otherwise.

For seasonal models, start values for a, b and s are inferred by performing a simple decomposition in trend and seasonal component using moving averages (see function decompose) on the start.periods first periods (a simple linear regression on the trend component is used for starting level and trend). For level/trend-models (no seasonal component), start values for a and b are x[2] and x[2] - x[1], respectively. For level-only models (ordinary exponential smoothing), the start value for a is x[1].
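
For the level-only case the recursion reduces to a[t] = alpha * Y[t] + (1 - alpha) * a[t-1] with start value x[1], and the one-step prediction for time t is a[t-1]. The sketch below writes this out by hand for a fixed, arbitrarily chosen alpha = 0.3 and compares it with HoltWinters(); the names a and xhat are illustrative.

```r
x     <- as.numeric(uspop)
n     <- length(x)
alpha <- 0.3                      # fixed here, so no optimization is run

# Level recursion for ordinary exponential smoothing, started at x[1]
a <- numeric(n)
a[1] <- x[1]
for (t in 2:n)
    a[t] <- alpha * x[t] + (1 - alpha) * a[t - 1]

# One-step-ahead predictions for times 2, ..., n
xhat <- a[-n]

fit <- HoltWinters(uspop, alpha = alpha, beta = FALSE, gamma = FALSE)
all.equal(xhat, as.numeric(fitted(fit)[, "xhat"]))
all.equal(sum((x[-1] - xhat)^2), fit$SSE)
```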


An object of class "HoltWinters", a list with components:


A multiple time series with one column for the filtered series as well as for the level, trend and seasonal components, estimated contemporaneously (that is at time t and not at the end of the series).


The original series


alpha used for filtering


beta used for filtering


gamma used for filtering


A vector with named components a, b, s1, ..., sp containing the estimated values for the level, trend and seasonal components


The specified seasonal parameter


The final sum of squared errors achieved in optimizing


The call used


David Meyer


C. C. Holt (1957) Forecasting seasonals and trends by exponentially weighted moving averages, ONR Research Memorandum, Carnegie Institute of Technology 52. (reprint at doi:10.1016/j.ijforecast.2003.09.015).

P. R. Winters (1960). Forecasting sales by exponentially weighted moving averages. Management Science, 6, 324–342. doi:10.1287/mnsc.6.3.324.

See Also

predict.HoltWinters, optim.



## Seasonal Holt-Winters
(m <- HoltWinters(co2))

(m <- HoltWinters(AirPassengers, seasonal = "mult"))

## Non-Seasonal Holt-Winters
x <- uspop + rnorm(uspop, sd = 5)
m <- HoltWinters(x, gamma = FALSE)
plot(m)  # opens the plot that lines() below adds to

## Exponential Smoothing
m2 <- HoltWinters(x, gamma = FALSE, beta = FALSE)
lines(fitted(m2)[,1], col = 3)

The Hypergeometric Distribution


Density, distribution function, quantile function and random generation for the hypergeometric distribution.


dhyper(x, m, n, k, log = FALSE)
phyper(q, m, n, k, lower.tail = TRUE, log.p = FALSE)
qhyper(p, m, n, k, lower.tail = TRUE, log.p = FALSE)
rhyper(nn, m, n, k)


x, q

vector of quantiles representing the number of white balls drawn without replacement from an urn which contains both black and white balls.


the number of white balls in the urn.


the number of black balls in the urn.


the number of balls drawn from the urn, hence must be in \(0, 1, \dots, m+n\).


probability, it must be between 0 and 1.


number of observations. If length(nn) > 1, the length is taken to be the number required.

log, log.p

logical; if TRUE, probabilities p are given as log(p).


logical; if TRUE (default), probabilities are \(P[X \le x]\), otherwise, \(P[X > x]\).


The hypergeometric distribution is used for sampling without replacement. The density of this distribution with parameters m, n and k (named \(Np\), \(N-Np\), and \(n\), respectively in the reference below, where \(N := m+n\) is also used in other references) is given by

\[p(x) = \left. {m \choose x}{n \choose k-x} \right/ {m+n \choose k}\]

for \(x = 0, \ldots, k\).

Note that \(p(x)\) is non-zero only for \(\max(0, k-n) \le x \le \min(k, m)\).

With \(p := m/(m+n)\) (hence \(Np = N \times p\) in the reference's notation), the first two moments are mean

\[E[X] = \mu = k p\]

and variance

\[\mbox{Var}(X) = k p (1 - p) \frac{m+n-k}{m+n-1},\]

which shows the closeness to the Binomial\((k,p)\) (where the hypergeometric has smaller variance unless \(k = 1\)).

The quantile is defined as the smallest value \(x\) such that \(F(x) \ge p\), where \(F\) is the distribution function.

In rhyper(), if one of m, n, k exceeds .Machine$integer.max, currently the equivalent of qhyper(runif(nn), m,n,k) is used which is comparably slow while instead a binomial approximation may be considerably more efficient.
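
The density and moment formulas above can be checked numerically. This is a small sketch with arbitrarily chosen parameter values; p_manual, mu and v are illustrative names.

```r
m <- 10; n <- 7; k <- 8
x <- 0:k

# Density from the binomial-coefficient formula
p_manual <- choose(m, x) * choose(n, k - x) / choose(m + n, k)
all.equal(p_manual, dhyper(x, m, n, k))

# Mean and variance from the density, compared with the closed forms
p  <- m / (m + n)
mu <- sum(x * dhyper(x, m, n, k))
v  <- sum((x - mu)^2 * dhyper(x, m, n, k))
all.equal(mu, k * p)
all.equal(v, k * p * (1 - p) * (m + n - k) / (m + n - 1))
```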


dhyper gives the density, phyper gives the distribution function, qhyper gives the quantile function, and rhyper generates random deviates.

Invalid arguments will result in return value NaN, with a warning.

The length of the result is determined by nn for rhyper, and is the maximum of the lengths of the numerical arguments for the other functions.

The numerical arguments other than nn are recycled to the length of the result. Only the first elements of the logical arguments are used.


dhyper computes via binomial probabilities, using code contributed by Catherine Loader (see dbinom).

phyper is based on calculating dhyper and phyper(...)/dhyper(...) (as a summation), based on ideas of Ian Smith and Morten Welinder.

qhyper is based on inversion (of an earlier phyper() algorithm).

rhyper is based on a corrected version of

Kachitvichyanukul, V. and Schmeiser, B. (1985). Computer generation of hypergeometric random variates. Journal of Statistical Computation and Simulation, 22, 127–145.


Johnson, N. L., Kotz, S., and Kemp, A. W. (1992) Univariate Discrete Distributions, Second Edition. New York: Wiley.

See Also

Distributions for other standard distributions.


m <- 10; n <- 7; k <- 8
x <- 0:(k+1)
rbind(phyper(x, m, n, k), dhyper(x, m, n, k))
all(phyper(x, m, n, k) == cumsum(dhyper(x, m, n, k)))  # FALSE
## but errors are very small:
signif(phyper(x, m, n, k) - cumsum(dhyper(x, m, n, k)), digits = 3)

stopifnot(abs(phyper(x, m, n, k) - cumsum(dhyper(x, m, n, k))) < 5e-16)

Identify Clusters in a Dendrogram


identify.hclust reads the position of the graphics pointer when the (first) mouse button is pressed. It then cuts the tree at the vertical position of the pointer and highlights the cluster containing the horizontal position of the pointer. Optionally a function is applied to the index of data points contained in the cluster.


## S3 method for class 'hclust'
identify(x, FUN = NULL, N = 20, MAXCLUSTER = 20, DEV.FUN = NULL,
         ...)



an object of the type produced by hclust.


(optional) function to be applied to the index numbers of the data points in a cluster (see ‘Details’ below).


the maximum number of clusters to be identified.


the maximum number of clusters that can be produced by a cut (limits the effective vertical range of the pointer).


(optional) integer scalar. If specified, the corresponding graphics device is made active before FUN is applied.


further arguments to FUN.


By default clusters can be identified using the mouse and an invisible list of indices of the respective data points is returned.

If FUN is not NULL, then the index vector of data points is passed to this function as first argument, see the examples below. The active graphics device for FUN can be specified using DEV.FUN.

The identification process is terminated by pressing any mouse button other than the first, see also identify.
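
Because identify.hclust needs a mouse, it cannot be scripted. A rough non-interactive analogue, offered only as a sketch, is to cut the tree at a chosen height with cutree() and collect the index vectors per cluster; the height h = 100 below is an arbitrary stand-in for the pointer position.

```r
hca <- hclust(dist(USArrests))

h  <- 100                      # hypothetical cut height (the "pointer position")
cl <- cutree(hca, h = h)       # cluster membership for every observation

# One index vector per cluster, similar in shape to what identify() returns
members <- split(seq_along(cl), cl)

# Apply a function to each index vector, as FUN would be applied
lapply(members, function(k) rownames(USArrests)[k])
```

Unlike identify(), this returns all clusters of the cut rather than only those selected by clicking.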


Either a list of data point index vectors or a list of return values of FUN.

See Also

hclust, rect.hclust


## Not run: 

hca <- hclust(dist(USArrests))
(x <- identify(hca)) ##  Terminate with 2nd mouse button !!

hci <- hclust(dist(iris[,1:4]))
identify(hci, function(k) print(table(iris[k,5])))

# open a new device (one for dendrogram, one for bars):
dev.new() # << make that narrow (& small)
          # and *beside* 1st one
nD <- dev.cur()            # to be for the barplot
dev.set(dev.prev())  # old one for dendrogram
## select subtrees in dendrogram and "see" the species distribution:
identify(hci, function(k) barplot(table(iris[k,5]), col = 2:4), DEV.FUN = nD)

## End(Not run)

Regression Deletion Diagnostics


This suite of functions can be used to compute some of the regression (leave-one-out deletion) diagnostics for linear and generalized linear models discussed in Belsley, Kuh and Welsch (1980), Cook and Weisberg (1982), etc.


influence.measures(model, infl = influence(model))

rstandard(model, ...)
## S3 method for class 'lm'
rstandard(model, infl = lm.influence(model, do.coef = FALSE),
          sd = sqrt(deviance(model)/df.residual(model)),
          type = c("sd.1", "predictive"), ...)
## S3 method for class 'glm'
rstandard(model, infl = influence(model, do.coef = FALSE),
          type = c("deviance", "pearson"), ...)

rstudent(model, ...)
## S3 method for class 'lm'
rstudent(model, infl = lm.influence(model, do.coef = FALSE),
         res = infl$wt.res, ...)
## S3 method for class 'glm'
rstudent(model, infl = influence(model, do.coef = FALSE), ...)

dffits(model, infl = , res = )

dfbeta(model, ...)
## S3 method for class 'lm'
dfbeta(model, infl = lm.influence(model, do.coef = TRUE), ...)

dfbetas(model, ...)
## S3 method for class 'lm'
dfbetas(model, infl = lm.influence(model, do.coef = TRUE), ...)

covratio(model, infl = lm.influence(model, do.coef = FALSE),
         res = weighted.residuals(model))

cooks.distance(model, ...)
## S3 method for class 'lm'
cooks.distance(model, infl = lm.influence(model, do.coef = FALSE),
               res = weighted.residuals(model),
               sd = sqrt(deviance(model)/df.residual(model)),
               hat = infl$hat, ...)
## S3 method for class 'glm'
cooks.distance(model, infl = influence(model, do.coef = FALSE),
               res = infl$pear.res,
               dispersion = summary(model)$dispersion,
               hat = infl$hat, ...)

hatvalues(model, ...)
## S3 method for class 'lm'
hatvalues(model, infl = lm.influence(model, do.coef = FALSE), ...)

hat(x, intercept = TRUE)



an R object, typically returned by lm or glm.


influence structure as returned by lm.influence or influence (the latter only for the glm method of rstudent and cooks.distance).


(possibly weighted) residuals, with proper default.


standard deviation to use, see default.


dispersion (for glm objects) to use, see default.


hat values \(H_{ii}\), see default.


type of residuals for rstandard, with different options and meanings for lm and glm. Can be abbreviated.


the \(X\) or design matrix.


should an intercept column be prepended to x?


further arguments passed to or from other methods.


The primary high-level function is influence.measures which produces a class "infl" object tabular display showing the DFBETAs for each model variable, DFFITs, covariance ratios, Cook's distances and the diagonal elements of the hat matrix. Cases which are influential with respect to any of these measures are marked with an asterisk.

The functions dfbetas, dffits, covratio and cooks.distance provide direct access to the corresponding diagnostic quantities. Functions rstandard and rstudent give the standardized and Studentized residuals respectively. (These re-normalize the residuals to have unit variance, using an overall and leave-one-out measure of the error variance respectively.)
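
To make the two normalizations concrete, the sketch below recomputes both residual types for an unweighted linear model from the hat values, the overall residual standard deviation and the leave-one-out standard deviations returned by lm.influence(). The object names are illustrative.

```r
fit  <- lm(sr ~ pop15 + pop75 + dpi + ddpi, data = LifeCycleSavings)
infl <- lm.influence(fit, do.coef = FALSE)

e <- residuals(fit)                              # raw residuals
h <- infl$hat                                    # diagonal of the hat matrix
s <- sqrt(deviance(fit) / df.residual(fit))      # overall error s.d.

# Standardized: overall measure of the error variance
all.equal(e / (s * sqrt(1 - h)), rstandard(fit))

# Studentized: leave-one-out measure of the error variance
all.equal(e / (infl$sigma * sqrt(1 - h)), rstudent(fit))
```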

Note that for multivariate lm() models (of class "mlm"), these functions return 3d arrays instead of matrices, or matrices instead of vectors.

Values for generalized linear models are approximations, as described in Williams (1987) (except that Cook's distances are scaled as \(F\) rather than as chi-square values). The approximations can be poor when some cases have large influence.

The optional infl, res and sd arguments are there to encourage the use of these direct access functions, in situations where, e.g., the underlying basic influence measures (from lm.influence or the generic influence) are already available.

Note that cases with weights == 0 are dropped from all these functions, but that if a linear model has been fitted with na.action = na.exclude, suitable values are filled in for the cases excluded during fitting.

For linear models, rstandard(*, type = "predictive") provides leave-one-out cross validation residuals, and the “PRESS” statistic (PREdictive Sum of Squares, the same as the CV score) of model model is

   PRESS <- sum(rstandard(model, type="pred")^2)

The function hat() exists mainly for S (version 2) compatibility; we recommend using hatvalues() instead.


For hatvalues, dfbeta, and dfbetas, the method for linear models also works for generalized linear models.


Several R core team members and John Fox, originally in his ‘car’ package.


Belsley, D. A., Kuh, E. and Welsch, R. E. (1980). Regression Diagnostics. New York: Wiley.

Cook, R. D. and Weisberg, S. (1982). Residuals and Influence in Regression. London: Chapman and Hall.

Williams, D. A. (1987). Generalized linear model diagnostics using the deviance and single case deletions. Applied Statistics, 36, 181–191. doi:10.2307/2347550.

Fox, J. (1997). Applied Regression, Linear Models, and Related Methods. Sage.

Fox, J. (2002) An R and S-Plus Companion to Applied Regression. Sage Publ.

Fox, J. and Weisberg, S. (2011). An R Companion to Applied Regression, second edition. Sage Publ.

See Also

influence (containing lm.influence).

‘plotmath’ for the use of hat in plot annotation.



## Analysis of the life-cycle savings data
## given in Belsley, Kuh and Welsch.
lm.SR <- lm(sr ~ pop15 + pop75 + dpi + ddpi, data = LifeCycleSavings)

inflm.SR <- influence.measures(lm.SR)
which(apply(inflm.SR$is.inf, 1, any))
# which observations 'are' influential
summary(inflm.SR) # only these
inflm.SR          # all
plot(rstudent(lm.SR) ~ hatvalues(lm.SR)) # recommended by some
plot(lm.SR, which = 5) # an enhanced version of that via plot(<lm>)

## The 'infl' argument is not needed, but avoids recomputation:
rs <- rstandard(lm.SR)
iflSR <- influence(lm.SR)
all.equal(rs, rstandard(lm.SR, infl = iflSR), tolerance = 1e-10)
## to "see" the larger values:
1000 * round(dfbetas(lm.SR, infl = iflSR), 3)
cat("PRESS :"); (PRESS <- sum( rstandard(lm.SR, type = "predictive")^2 ))
stopifnot(all.equal(PRESS, sum( (residuals(lm.SR) / (1 - iflSR$hat))^2)))

## Show that "PRE-residuals"  ==  L.O.O. Crossvalidation (CV) errors:
X <- model.matrix(lm.SR)
y <- model.response(model.frame(lm.SR))
## Leave-one-out CV least-squares prediction errors (relatively fast)
rCV <- vapply(seq_len(nrow(X)), function(i)
              y[i] - X[i,] %*% .lm.fit(X[-i,], y[-i])$coefficients,
              numeric(1))
## are the same as the *faster* rstandard(*, "pred") :
stopifnot(all.equal(rCV, unname(rstandard(lm.SR, type = "predictive"))))

## Huber's data [Atkinson 1985]
xh <- c(-4:0, 10)
yh <- c(2.48, .73, -.04, -1.44, -1.32, 0)
lmH <- lm(yh ~ xh)
im <- influence.measures(lmH)
is.inf <- apply(im$is.inf, 1, any)
plot(xh,yh, main = "Huber's data: L.S. line and influential obs.")
abline(lmH); points(xh[is.inf], yh[is.inf], pch = 20, col = 2)

## Irwin's data [Williams 1987]
xi <- 1:5
yi <- c(0,2,14,19,30)    # number of mice responding to dose xi
mi <- rep(40, 5)         # number of mice exposed
glmI <- glm(cbind(yi, mi -yi) ~ xi, family = binomial)
signif(cooks.distance(glmI), 3)   # ~= Ci in Table 3, p.184
imI <- influence.measures(glmI)

Integration of One-Dimensional Functions


Adaptive quadrature of functions of one variable over a finite or infinite interval.


integrate(f, lower, upper, ..., subdivisions = 100L,
          rel.tol = .Machine$double.eps^0.25, abs.tol = rel.tol,
          stop.on.error = TRUE, keep.xy = FALSE, aux = NULL)



an R function taking a numeric first argument and returning a numeric vector of the same length. Returning a non-finite element will generate an error.

lower, upper

the limits of integration. Can be infinite.


additional arguments to be passed to f.


the maximum number of subintervals.


relative accuracy requested.


absolute accuracy requested.


logical. If true (the default) an error stops the function. If false some errors will give a result with a warning in the message component.


unused. For compatibility with S.


unused. For compatibility with S.


Note that arguments after ... must be matched exactly.
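
As an illustration of this rule, the sketch below passes mean and sd through ... to dnorm and spells rel.tol out in full; the parameter values are arbitrary.

```r
# 'mean' and 'sd' are not arguments of integrate(), so they go to dnorm()
res <- integrate(dnorm, lower = -Inf, upper = 0, mean = 0, sd = 2)

# rel.tol comes after '...', so it must be written out exactly;
# an abbreviation would be handed on to f instead
res_tight <- integrate(dnorm, lower = -Inf, upper = 0, mean = 0, sd = 2,
                       rel.tol = 1e-8)
res
res_tight
```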

If one or both limits are infinite, the infinite range is mapped onto a finite interval.

For a finite interval, globally adaptive interval subdivision is used in connection with extrapolation by Wynn's Epsilon algorithm, with the basic step being Gauss–Kronrod quadrature.

rel.tol cannot be less than max(50*.Machine$double.eps, 0.5e-28) if abs.tol <= 0.

Note that the comments in the C source code in ‘R/src/appl/integrate.c’ give more details, particularly about reasons for failure (internal error code ier >= 1).

In R versions \(\le\) 3.2.x, the first entries of lower and upper were used whereas an error is signalled now if they are not of length one.


A list of class "integrate" with components


the final estimate of the integral.


estimate of the modulus of the absolute error.


the number of subintervals produced in the subdivision process.


"OK" or a character string giving the error message.


the matched call.
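
The components can be inspected directly on the returned object. The following sketch uses the component names value, abs.error, subdivisions and message of the "integrate" class.

```r
res <- integrate(dnorm, -1.96, 1.96)

res$value          # final estimate of the integral
res$abs.error      # estimate of the modulus of the absolute error
res$subdivisions   # number of subintervals used
res$message        # "OK" unless something went wrong

# Compare with the exact value from the distribution function
res$value - (pnorm(1.96) - pnorm(-1.96))
```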


Like all numerical integration routines, these evaluate the function on a finite set of points. If the function is approximately constant (in particular, zero) over nearly all its range it is possible that the result and error estimate may be seriously wrong.

When integrating over infinite intervals do so explicitly, rather than just using a large number as the endpoint. This increases the chance of a correct answer – any function whose integral over an infinite interval is finite must be near zero for most of that interval.

For values at a finite set of points to be a fair reflection of the behaviour of the function elsewhere, the function needs to be well-behaved, for example differentiable except perhaps for a small number of jumps or integrable singularities.

f must accept a vector of inputs and produce a vector of function evaluations at those points. The Vectorize function may be helpful to convert f to this form.


Based on QUADPACK routines dqags and dqagi by R. Piessens and E. deDoncker–Kapenga, available from Netlib.


R. Piessens, E. deDoncker–Kapenga, C. Uberhuber, D. Kahaner (1983) Quadpack: a Subroutine Package for Automatic Integration; Springer Verlag.


integrate(dnorm, -1.96, 1.96)
integrate(dnorm, -Inf, Inf)

## a slowly-convergent integral
integrand <- function(x) {1/((x+1)*sqrt(x))}
integrate(integrand, lower = 0, upper = Inf)

## don't do this if you really want the integral from 0 to Inf
integrate(integrand, lower = 0, upper = 10)
integrate(integrand, lower = 0, upper = 100000)
integrate(integrand, lower = 0, upper = 1000000, stop.on.error = FALSE)

## some functions do not handle vector input properly
f <- function(x) 2.0
try(integrate(f, 0, 1))
integrate(Vectorize(f), 0, 1)  ## correct
integrate(function(x) rep(2.0, length(x)), 0, 1)  ## correct

## integrate can fail if misused
integrate(dnorm, 0, 2)
integrate(dnorm, 0, 20)
integrate(dnorm, 0, 200)
integrate(dnorm, 0, 2000)
integrate(dnorm, 0, 20000) ## fails on many systems
integrate(dnorm, 0, Inf)   ## works

integrate(dnorm, 0:1, 20) #-> error!
## "silently" gave  integrate(dnorm, 0, 20)  in earlier versions of R

Two-way Interaction Plot


Plots the mean (or other summary) of the response for two-way combinations of factors, thereby illustrating possible interactions.


interaction.plot(x.factor, trace.factor, response, fun = mean,
                 type = c("l", "p", "b", "o", "c"), legend = TRUE,
                 trace.label = deparse1(substitute(trace.factor)),
                 fixed = FALSE,
                 xlab = deparse1(substitute(x.factor)),
                 ylab = ylabel,
                 ylim = range(cells, na.rm = TRUE),
                 lty = nc:1, col = 1, pch = c(1:9, 0, letters),
                 xpd = NULL, leg.bg = par("bg"), leg.bty = "n",
                 xtick = FALSE, xaxt = par("xaxt"), axes = TRUE,
                 ...)



a factor whose levels will form the x axis.


another factor whose levels will form the traces.


a numeric variable giving the response.


the function to compute the summary. Should return a single real value.


the type of plot (see plot.default): lines or points or both.


logical. Should a legend be included?


overall label for the legend.


logical. Should the legend be in the order of the levels of trace.factor (TRUE) or in the order of the traces at their right-hand ends (FALSE, the default)?

xlab, ylab

the x and y label of the plot each with a sensible default.


numeric of length 2 giving the y limits for the plot.


line type for the lines drawn, with sensible default.


the color to be used for plotting.


a vector of plotting symbols or characters, with sensible default.


determines clipping behaviour for the legend used, see par(xpd). Per default, the legend is not clipped at the figure border.

leg.bg, leg.bty

arguments passed to legend().


logical. Should tick marks be used on the x axis?

xaxt, axes, ...

graphics parameters to be passed to the plotting routines.


By default the levels of x.factor are plotted on the x axis in their given order, with extra space on the right for the legend (if specified). If x.factor is an ordered factor and the levels are numeric, these numeric values are used for the x axis.

The response and hence its summary can contain missing values. If so, the missing values and the line segments joining them are omitted from the plot (and this can be somewhat disconcerting).

The graphics parameters xlab, ylab, ylim, lty, col and pch are given suitable defaults (and xlim and xaxs are set and cannot be overridden). The defaults are to cycle through the line types, use the foreground colour, and to use the symbols 1:9, 0, and the small letters to plot the traces.
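
The quantity that is plotted is a two-way table of summaries, one row per level of x.factor and one column (trace) per level of trace.factor. A sketch of how such a table can be computed with tapply(), using the ToothGrowth data and fun = mean; the name cells follows the default ylim in the usage above.

```r
# Mean response for every dose x supp combination
cells <- with(ToothGrowth, tapply(len, list(dose, supp), mean))
cells

# The default y limits cover all cell summaries
range(cells, na.rm = TRUE)
```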


Some of the argument names and the precise behaviour are chosen for S-compatibility.


Chambers, J. M., Freeny, A and Heiberger, R. M. (1992) Analysis of variance; designed experiments. Chapter 5 of Statistical Models in S eds J. M. Chambers and T. J. Hastie, Wadsworth & Brooks/Cole.



with(ToothGrowth, {
interaction.plot(dose, supp, len, fixed = TRUE)
dose <- ordered(dose)
interaction.plot(dose, supp, len, fixed = TRUE,
                 col = 2:3, leg.bty = "o", xtick = TRUE)
interaction.plot(dose, supp, len, fixed = TRUE, col = 2:3, type = "p")
})

with(OrchardSprays, {
  interaction.plot(treatment, rowpos, decrease)
  interaction.plot(rowpos, treatment, decrease, cex.axis = 0.8)
  ## order the rows by their mean effect
  rowpos <- factor(rowpos,
                   levels = sort.list(tapply(decrease, rowpos, mean)))
  interaction.plot(rowpos, treatment, decrease, col = 2:9, lty = 1)
})

The Interquartile Range


Computes the interquartile range of the x values.


IQR(x, na.rm = FALSE, type = 7)



a numeric vector.


logical. Should missing values be removed?


an integer selecting one of the many quantile algorithms, see quantile.


Note that this function computes the quartiles using the quantile function rather than following Tukey's recommendations, i.e., IQR(x) = quantile(x, 3/4) - quantile(x, 1/4).

For normally \(N(m,1)\) distributed \(X\), the expected value of IQR(X) is 2*qnorm(3/4) = 1.3490, i.e., for a normal-consistent estimate of the standard deviation, use IQR(x) / 1.349.
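
Both statements can be tried out directly. In this sketch the small vector x and the simulated sample z (with true standard deviation 2) are made up for illustration.

```r
x <- c(2, 4, 4, 5, 7, 9, 10, 12)

# IQR() is the difference of the upper and lower quartiles from quantile()
iqr_manual <- unname(quantile(x, 3/4) - quantile(x, 1/4))
all.equal(IQR(x), iqr_manual)

# Scaling constant for normal data
2 * qnorm(3/4)

# Normal-consistent estimate of the standard deviation
set.seed(1)
z <- rnorm(1e5, mean = 3, sd = 2)
sd_est <- IQR(z) / 1.349
sd_est
```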


Tukey, J. W. (1977). Exploratory Data Analysis. Reading: Addison-Wesley.

See Also

fivenum, mad which is more robust, range, quantile.



Test if a Model's Formula is Empty


R's formula notation allows models with no intercept and no predictors. These require special handling internally. is.empty.model() checks whether an object describes an empty model.





A terms object or an object with a terms method.


TRUE if the model is empty

See Also

lm, glm


y <- rnorm(20)
is.empty.model(y ~ 0)
is.empty.model(y ~ -1)
is.empty.model(lm(y ~ 0))

Isotonic / Monotone Regression


Compute the isotonic (monotonically increasing nonparametric) least squares regression which is piecewise constant.


isoreg(x, y = NULL)


x, y

coordinate vectors of the regression points. Alternatively a single plotting structure can be specified: see xy.coords. The y values, and even sum(y) must be finite, currently.


The algorithm determines the convex minorant \(m(x)\) of the cumulative data (i.e., cumsum(y)) which is piecewise linear and the result is \(m'(x)\), a step function with level changes at locations where the convex \(m(x)\) touches the cumulative data polygon and changes slope.
as.stepfun() returns a stepfun object which can be more parsimonious.
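
The same least squares fit can also be reached by pooling adjacent violators. The sketch below is not the convex minorant algorithm described above but a different route to the same fitted values; pava is a hypothetical helper written for this illustration and assumes equally weighted, already ordered observations.

```r
pava <- function(y) {
    vals <- numeric(0)   # current block means
    wts  <- numeric(0)   # current block sizes
    for (yi in y) {
        vals <- c(vals, yi)
        wts  <- c(wts, 1)
        k <- length(vals)
        # merge the last two blocks while they violate monotonicity
        while (k > 1 && vals[k - 1] > vals[k]) {
            w <- wts[k - 1] + wts[k]
            vals[k - 1] <- (wts[k - 1] * vals[k - 1] + wts[k] * vals[k]) / w
            wts[k - 1]  <- w
            vals <- vals[-k]
            wts  <- wts[-k]
            k <- k - 1
        }
    }
    rep(vals, times = wts)   # expand block means back to one value per point
}

y <- c(1, 0, 4, 3, 3, 5, 4, 2, 0)
all.equal(pava(y), isoreg(y)$yf)
```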


isoreg() returns an object of class isoreg which is basically a list with components


original (constructed) abscissa values x.


corresponding y values.


fitted values corresponding to ordered x values.


cumulative y values corresponding to ordered x values.


integer vector giving indices where the fitted curve jumps, i.e., where the convex minorant has kinks.


logical indicating if original x values were ordered increasingly already.


if(!isOrd): integer permutation order(x) of original x.


the call to isoreg() used.


The inputs can be long vectors, but iKnots will wrap around at \(2^{31}\).

The code should be improved to accept weights additionally and solve the corresponding weighted least squares problem.
‘Patches are welcome!’


Barlow, R. E., Bartholomew, D. J., Bremner, J. M., and Brunk, H. D. (1972) Statistical Inference under Order Restrictions; Wiley, London.

Robertson, T., Wright, F. T. and Dykstra, R. L. (1988) Order Restricted Statistical Inference; Wiley, New York.

See Also

the plotting method plot.isoreg with more examples; isoMDS() from the MASS package internally uses isotonic regression.



(ir <- isoreg(c(1,0,4,3,3,5,4,2,0)))
plot(ir, plot.type = "row")

(ir3 <- isoreg(y3 <- c(1,0,4,3,3,5,4,2, 3))) # last "3", not "0"
(fi3 <- as.stepfun(ir3))
(ir4 <- isoreg(1:10, y4 <- c(5, 9, 1:2, 5:8, 3, 8)))
cat(sprintf("R^2 = %.2f\n",
            1 - sum(residuals(ir4)^2) / ((10-1)*var(y4))))

## If you are interested in the knots alone :
with(ir4, cbind(iKnots, yf[iKnots]))

## Example of unordered x[] with ties:
x <- sample((0:30)/8)
y <- exp(x)
x. <- round(x) # ties!
plot(m <- isoreg(x., y))
stopifnot(all.equal(with(m, yf[iKnots]),
                    as.vector(tapply(y, x., mean))))

Kalman Filtering


Use Kalman Filtering to find the (Gaussian) log-likelihood, or for forecasting or smoothing.


KalmanLike(y, mod, nit = 0L, update = FALSE)
KalmanRun(y, mod, nit = 0L, update = FALSE)
KalmanSmooth(y, mod, nit = 0L)
KalmanForecast(n.ahead = 10L, mod, update = FALSE)

makeARIMA(phi, theta, Delta, kappa = 1e6,
          SSinit = c("Gardner1980", "Rossignol2011"),
          tol = .Machine$double.eps)



a univariate time series.


a list describing the state-space model: see ‘Details’.


the time at which the initialization is computed. nit = 0L implies that the initialization is for a one-step prediction, so Pn should not be computed at the first step.


if TRUE the update mod object will be returned as attribute "mod" of the result.


the number of steps ahead for which prediction is required.

phi, theta

numeric vectors of length $\ge 0$ giving AR and MA parameters.


vector of differencing coefficients, so an ARMA model is fitted to y[t] - Delta[1]*y[t-1] - ....


the prior variance (as a multiple of the innovations variance) for the past observations in a differenced model.


a string specifying the algorithm to compute the Pn part of the state-space initialization; see ‘Details’.


tolerance eventually passed to solve.default when SSinit = "Rossignol2011".


These functions work with a general univariate state-space model with state vector ‘a’, transitions ‘a <- T a + R e’, $e \sim \mathcal{N}(0, \kappa Q)$ and observation equation ‘y = Z'a + eta’, $(eta \equiv \eta)$, $\eta \sim \mathcal{N}(0, \kappa h)$. The likelihood is a profile likelihood after estimation of $\kappa$.

The model is specified as a list with at least components


the transition matrix


the observation coefficients


the observation variance




the current state estimate


the current estimate of the state uncertainty matrix $Q$


the estimate at time $t-1$ of the state uncertainty matrix $Q$ (not updated by KalmanForecast).

KalmanSmooth is the workhorse function for tsSmooth.

makeARIMA constructs the state-space model for an ARIMA model, see also arima.

For many years the state-space initialization used only Gardner et al.'s method (SSinit = "Gardner1980"). However, that method sometimes suffers from deficiencies when the model is close to non-stationarity. For this reason, it may be replaced as default in the future and only kept for reproducibility reasons. Explicit specification of SSinit is therefore recommended, notably also in arima(). The "Rossignol2011" method was proposed and partly documented by Raphael Rossignol, Univ. Grenoble, on 2011-09-20 (see PR#14682, below), and was later ported to C by Matwey V. Kornilov. It computes the covariance matrix of $(X_{t-1},...,X_{t-p},Z_t,...,Z_{t-q})$ by the method of difference equations (page 93 of Brockwell and Davis (1991)), apparently suggested by a referee of Gardner et al. (see p.314 of their paper).
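
As a rough illustration of the two initialization options, the following sketch builds the same ARMA(1,1) state-space model with each SSinit value and compares the resulting Pn components. The coefficient values 0.5 and 0.3 are arbitrary choices made for this illustration.

```r
## Hypothetical ARMA(1,1) coefficients, chosen only for illustration
phi   <- 0.5
theta <- 0.3

## Same model, two ways of initializing the Pn part of the state covariance
m_g <- makeARIMA(phi, theta, Delta = numeric(0), SSinit = "Gardner1980")
m_r <- makeARIMA(phi, theta, Delta = numeric(0), SSinit = "Rossignol2011")

## Largest absolute difference between the two initial matrices;
## for a clearly stationary model this is expected to be small
max(abs(m_g$Pn - m_r$Pn))
```

Differences between the two methods, if any, would be expected to matter mostly for models near the boundary of stationarity, which is the situation the text above warns about.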


For KalmanLike, a list with components Lik (the log-likelihood less some constants) and s2, the estimate of $\kappa$.

For KalmanRun, a list with components values, a vector of length 2 giving the output of KalmanLike, resid (the residuals) and states, the contemporaneous state estimates, a matrix with one row for each observation time.

For KalmanSmooth, a list with two components. Component smooth is a n by p matrix of state estimates based on all the observations, with one row for each time. Component var is a n by p by p array of variance matrices.

For KalmanForecast, a list with components pred, the predictions, and var, the unscaled variances of the prediction errors (to be multiplied by s2).

For makeARIMA, a model list including components for its arguments.
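
The return values described above can be inspected with a small sketch. It assumes a simulated AR(1) series and a model built by makeARIMA with the same coefficient; the seed, series length and coefficient are arbitrary.

```r
set.seed(1)
y   <- arima.sim(list(ar = 0.5), n = 100)       # simulated AR(1) series
mod <- makeARIMA(phi = 0.5, theta = numeric(0), Delta = numeric(0),
                 SSinit = "Gardner1980")

ll <- KalmanLike(y, mod)       # list with components Lik and s2
kr <- KalmanRun(y, mod)        # components values, resid, states
sm <- KalmanSmooth(y, mod)     # components smooth and var
fc <- KalmanForecast(5, mod)   # components pred and var, 5 steps ahead

str(ll)
dim(kr$states)                 # one row per observation time
dim(sm$smooth); dim(sm$var)    # n by p and n by p by p
```

Because these functions do very little argument checking, it is safest to build mod with makeARIMA (or take it from an arima fit) rather than assembling the list by hand.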


These functions are designed to be called from other functions which check the validity of the arguments passed, so very little checking is done.


Brockwell, P. J. and Davis, R. A. (1991). Time Series: Theory and Methods, second edition. Springer.

Durbin, J. and Koopman, S. J. (2001). Time Series Analysis by State Space Methods. Oxford University Press.

Gardner, G, Harvey, A. C. and Phillips, G. D. A. (1980). Algorithm AS 154: An algorithm for exact maximum likelihood estimation of autoregressive-moving average models by means of Kalman filtering. Applied Statistics, 29, 311–322. doi:10.2307/2346910.

R bug report PR#14682 (2011-2013)

See Also

arima, StructTS, tsSmooth.


## an ARIMA fit
fit3 <- arima(presidents, c(3, 0, 0))
predict(fit3, 12)
## reconstruct this
pr <- KalmanForecast(12, fit3$model)
pr$pred + fit3$coef[4]
sqrt(pr$var * fit3$sigma2)
## and now do it year by year
mod <- fit3$model
for(y in 1:3) {
  pr <- KalmanForecast(4, mod, TRUE)
  print(list(pred = pr$pred + fit3$coef["intercept"],
             se = sqrt(pr$var * fit3$sigma2)))
  mod <- attr(pr, "mod")
}

Apply Smoothing Kernel


kernapply computes the convolution between an input sequence and a specific kernel.


kernapply(x, ...)

## Default S3 method:
kernapply(x, k, circular = FALSE, ...)
## S3 method for class 'ts'
kernapply(x, k, circular = FALSE, ...)
## S3 method for class 'vector'
kernapply(x, k, circular = FALSE, ...)

## S3 method for class 'tskernel'
kernapply(x, k, ...)



an input vector, matrix, time series or kernel to be smoothed.


smoothing "tskernel" object.


a logical indicating whether the input sequence to be smoothed is treated as circular, i.e., periodic.


arguments passed to or from other methods.


A smoothed version of the input sequence.


This uses fft to perform the convolution, so is fastest when NROW(x) is a power of 2 or some other highly composite integer.
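
A minimal sketch of what the convolution does on a short vector, using a three-point Daniell kernel (equal weights of 1/3). The input values are arbitrary.

```r
x <- c(1, 2, 3, 4, 5)
k <- kernel("daniell", 1)     # weights 1/3, 1/3, 1/3

## Non-circular: only positions with a full window are returned,
## so the result is shorter than x by 2 * k$m
kernapply(x, k)

## Circular: x is treated as periodic, so the length is preserved
kernapply(x, k, circular = TRUE)
```

In the non-circular case each output value is simply the mean of three neighbouring inputs.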


A. Trapletti

See Also

kernel, convolve, filter, spectrum


## see 'kernel' for examples

Smoothing Kernel Objects


The "tskernel" class is designed to represent discrete symmetric normalized smoothing kernels. These kernels can be used to smooth vectors, matrices, or time series objects.

There are print, plot and [ methods for these kernel objects.


kernel(coef, m = 2, r, name)


## S3 method for class 'tskernel'
plot(x, type = "h", xlab = "k", ylab = "W[k]",
     main = attr(x,"name"), ...)



the upper half of the smoothing kernel coefficients (including coefficient zero) or the name of a kernel (currently "daniell", "dirichlet", "fejer" or "modified.daniell").


the kernel dimension(s) if coef is a name. When m has length larger than one, it means the convolution of kernels of dimension m[j], for j in 1:length(m). Currently this is supported only for the named "*daniell" kernels.


the name the kernel will be called.


the kernel order for a Fejer kernel.

k, x

a "tskernel" object.

type, xlab, ylab, main, ...

arguments passed to plot.default.


kernel is used to construct a general kernel or named specific kernels. The modified Daniell kernel halves the end coefficients.

The [ method allows natural indexing of kernel objects with indices in (-m) : m. The normalization is such that for k <- kernel(*), sum(k[ -k$m : k$m ]) is one.
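
The indexing and normalization rules can be checked with a short sketch; the kernel dimension 2 is an arbitrary choice.

```r
k  <- kernel("daniell", 2)            # five equal weights
km <- kernel("modified.daniell", 2)   # end weights halved

k[-k$m : k$m]       # coefficients at lags -2, ..., 2
km[-km$m : km$m]

## Both kernels are normalized to sum to one
sum(k[-k$m : k$m])
sum(km[-km$m : km$m])

## Symmetry, and the halved end coefficient of the modified kernel
c(km[-2], km[2], km[1] / 2)
```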

df.kernel returns the ‘equivalent degrees of freedom’ of a smoothing kernel as defined in Brockwell and Davis (1991), page 362, and bandwidth.kernel returns the equivalent bandwidth as defined in Bloomfield (1976), p. 201, with a continuity correction.


kernel() returns an object of class "tskernel" which is basically a list with the two components coef and the kernel dimension m. An additional attribute is "name".


A. Trapletti; modifications by B.D. Ripley


Bloomfield, P. (1976) Fourier Analysis of Time Series: An Introduction. Wiley.

Brockwell, P.J. and Davis, R.A. (1991) Time Series: Theory and Methods. Second edition. Springer, pp. 350–365.

See Also




## Demonstrate a simple trading strategy for the
## financial time series German stock index DAX.
x <- EuStockMarkets[,1]
k1 <- kernel("daniell", 50)  # a long moving average
k2 <- kernel("daniell", 10)  # and a short one
x1 <- kernapply(x, k1)
x2 <- kernapply(x, k2)
plot(x)                   # lines() needs an existing plot to draw on
lines(x1, col = "red")    # go long if the short crosses the long upwards
lines(x2, col = "green")  # and go short otherwise

## More interesting kernels
kd <- kernel("daniell", c(3, 3))
kd # note the unusual indexing
plot(kernel("fejer", 100, r = 6))
plot(kernel("modified.daniell", c(7,5,3)))

# Reproduce example 10.4.3 from Brockwell and Davis (1991)
spectrum(sunspot.year, kernel = kernel("daniell", c(11,7,3)), log = "no")

K-Means Clustering


Perform k-means clustering on a data matrix.


kmeans(x, centers, iter.max = 10, nstart = 1,
       algorithm = c("Hartigan-Wong", "Lloyd", "Forgy",
                     "MacQueen"), trace = FALSE)
## S3 method for class 'kmeans'
fitted(object, method = c("centers", "classes"), ...)



numeric matrix of data, or an object that can be coerced to such a matrix (such as a numeric vector or a data frame with all numeric columns).


either the number of clusters, say $k$, or a set of initial (distinct) cluster centres. If a number, a random set of (distinct) rows in x is chosen as the initial centres.


the maximum number of iterations allowed.


if centers is a number, how many random sets should be chosen?


character: may be abbreviated. Note that "Lloyd" and "Forgy" are alternative names for one algorithm.


an R object of class "kmeans", typically the result ob of ob <- kmeans(..).


character: may be abbreviated. "centers" causes fitted to return cluster centers (one for each input point) and "classes" causes fitted to return a vector of class assignments.


logical or integer number, currently only used in the default method ("Hartigan-Wong"): if positive (or true), tracing information on the progress of the algorithm is produced. Higher values may produce more tracing information.


not used.


The data given by x are clustered by the $k$-means method, which aims to partition the points into $k$ groups such that the sum of squares from points to the assigned cluster centres is minimized. At the minimum, all cluster centres are at the mean of their Voronoi sets (the set of data points which are nearest to the cluster centre).

The algorithm of Hartigan and Wong (1979) is used by default. Note that some authors use $k$-means to refer to a specific algorithm rather than the general method: most commonly the algorithm given by MacQueen (1967) but sometimes that given by Lloyd (1957) and Forgy (1965). The Hartigan–Wong algorithm generally does a better job than either of those, but trying several random starts (nstart $> 1$) is often recommended. In rare cases, when some of the points (rows of x) are extremely close, the algorithm may not converge in the “Quick-Transfer” stage, signalling a warning (and returning ifault = 4). Slight rounding of the data may be advisable in that case.

For ease of programmatic exploration, $k = 1$ is allowed, notably returning the center and withinss.

Except for the Lloyd–Forgy method, $k$ clusters will always be returned if a number is specified. If an initial matrix of centres is supplied, it is possible that no point will be closest to one or more centres, which is currently an error for the Hartigan–Wong method.
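
The Voronoi property mentioned above can be checked numerically. The sketch below uses simulated, well-separated data (the seed, group means and sizes are arbitrary) and verifies that, for a converged fit, every point is assigned to its nearest centre.

```r
set.seed(42)
## Two well-separated simulated groups
x <- rbind(matrix(rnorm(60, mean = 0, sd = 0.3), ncol = 2),
           matrix(rnorm(60, mean = 3, sd = 0.3), ncol = 2))
cl <- kmeans(x, centers = 2, nstart = 5)

## Squared distance from every point to every centre (one column per centre)
d2 <- sapply(seq_len(nrow(cl$centers)),
             function(j) rowSums(sweep(x, 2, cl$centers[j, ])^2))
nearest <- apply(d2, 1, which.min)

## Each point sits with its nearest centre
all(nearest == cl$cluster)

## The objective is the sum of squared distances to the assigned centres
all.equal(cl$tot.withinss,
          sum(d2[cbind(seq_len(nrow(x)), cl$cluster)]))
```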


kmeans returns an object of class "kmeans" which has a print and a fitted method. It is a list with at least the following components:


A vector of integers (from 1:k) indicating the cluster to which each point is allocated.


A matrix of cluster centres.


The total sum of squares.


Vector of within-cluster sum of squares, one component per cluster.


Total within-cluster sum of squares, i.e. sum(withinss).


The between-cluster sum of squares, i.e. totss-tot.withinss.


The number of points in each cluster.


The number of (outer) iterations.


integer: indicator of a possible algorithm problem – for experts.


The clusters are numbered in the returned object, but they are a set and no ordering is implied. (Their apparent ordering may differ by platform.)


Forgy, E. W. (1965). Cluster analysis of multivariate data: efficiency vs interpretability of classifications. Biometrics, 21, 768–769.

Hartigan, J. A. and Wong, M. A. (1979). Algorithm AS 136: A K-means clustering algorithm. Applied Statistics, 28, 100–108. doi:10.2307/2346830.

Lloyd, S. P. (1957, 1982). Least squares quantization in PCM. Technical Note, Bell Laboratories. Published in 1982 in IEEE Transactions on Information Theory, 28, 128–137.

MacQueen, J. (1967). Some methods for classification and analysis of multivariate observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, eds L. M. Le Cam & J. Neyman, 1, pp. 281–297. Berkeley, CA: University of California Press.



# a 2-dimensional example
x <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
           matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))
colnames(x) <- c("x", "y")
(cl <- kmeans(x, 2))
plot(x, col = cl$cluster)
points(cl$centers, col = 1:2, pch = 8, cex = 2)

# sum of squares
ss <- function(x) sum(scale(x, scale = FALSE)^2)

## cluster centers "fitted" to each obs.:
fitted.x <- fitted(cl);  head(fitted.x)
resid.x <- x - fitted(cl)

## Equalities : ----------------------------------
cbind(cl[c("betweenss", "tot.withinss", "totss")], # the same two columns
         c(ss(fitted.x), ss(resid.x),    ss(x)))
stopifnot(all.equal(cl$ totss,        ss(x)),
          all.equal(cl$ tot.withinss, ss(resid.x)),
          ## these three are the same:
          all.equal(cl$ betweenss,    ss(fitted.x)),
          all.equal(cl$ betweenss, cl$totss - cl$tot.withinss),
          ## and hence also
          all.equal(ss(x), ss(fitted.x) + ss(resid.x))
          )

kmeans(x,1)$withinss # trivial one-cluster, (its W.SS == ss(x))

## random starts do help here with too many clusters
## (and are often recommended anyway!):
## The ordering of the clusters may be platform-dependent.

(cl <- kmeans(x, 5, nstart = 25))

plot(x, col = cl$cluster)
points(cl$centers, col = 1:5, pch = 8)

Kruskal-Wallis Rank Sum Test


Performs a Kruskal-Wallis rank sum test.


kruskal.test(x, ...)

## Default S3 method:
kruskal.test(x, g, ...)

## S3 method for class 'formula'
kruskal.test(formula, data, subset, na.action, ...)



a numeric vector of data values, or a list of numeric data vectors. Non-numeric elements of a list will be coerced, with a warning.


a vector or factor object giving the group for the corresponding elements of x. Ignored with a warning if x is a list.


a formula of the form response ~ group where response gives the data values and group a vector or factor of the corresponding groups.


an optional matrix or data frame (or similar: see model.frame) containing the variables in the formula formula. By default the variables are taken from environment(formula).


an optional vector specifying a subset of observations to be used.


a function which indicates what should happen when the data contain NAs. Defaults to getOption("na.action").


further arguments to be passed to or from methods.


kruskal.test performs a Kruskal-Wallis rank sum test of the null that the location parameters of the distribution of x are the same in each group (sample). The alternative is that they differ in at least one.

If x is a list, its elements are taken as the samples to be compared, and hence have to be numeric data vectors. In this case, g is ignored, and one can simply use kruskal.test(x) to perform the test. If the samples are not yet contained in a list, use kruskal.test(list(x, ...)).

Otherwise, x must be a numeric data vector, and g must be a vector or factor object of the same length as x giving the group for the corresponding elements of x.


A list with class "htest" containing the following components:


the Kruskal-Wallis rank sum statistic.


the degrees of freedom of the approximate chi-squared distribution of the test statistic.


the p-value of the test.


the character string "Kruskal-Wallis rank sum test".

a character string giving the names of the data.
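
As a sketch of how the statistic, degrees of freedom and p-value relate, the following computes them by hand using the standard tie-free form of the Kruskal-Wallis statistic and compares with kruskal.test. The data are the Hollander and Wolfe values used in the examples of this page, which contain no ties; the variable names are chosen for this illustration.

```r
x <- c(2.9, 3.0, 2.5, 2.6, 3.2)
y <- c(3.8, 2.7, 4.0, 2.4)
z <- c(2.8, 3.4, 3.7, 2.2, 2.0)

vals <- c(x, y, z)
g    <- rep(1:3, c(5, 4, 5))
N    <- length(vals)
r    <- rank(vals)                    # joint ranks of all observations
Rsum <- tapply(r, g, sum)             # rank sum per group
n_i  <- tapply(r, g, length)          # group sizes

## Tie-free statistic: H = 12 / (N (N + 1)) * sum(R_i^2 / n_i) - 3 (N + 1)
H <- 12 / (N * (N + 1)) * sum(Rsum^2 / n_i) - 3 * (N + 1)
p <- pchisq(H, df = 3 - 1, lower.tail = FALSE)

kt <- kruskal.test(vals, g)
c(by_hand = H, kruskal.test = unname(kt$statistic))
c(by_hand = p, kruskal.test = kt$p.value)
```

With ties present the statistic is additionally divided by a correction factor, which this sketch omits.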


Myles Hollander and Douglas A. Wolfe (1973), Nonparametric Statistical Methods. New York: John Wiley & Sons. Pages 115–120.

See Also

The Wilcoxon rank sum test (wilcox.test) as the special case for two samples; lm together with anova for performing one-way location analysis under normality assumptions; with Student's t test (t.test) as the special case for two samples.

wilcox_test in package coin for exact, asymptotic and Monte Carlo conditional p-values, including in the presence of ties.


## Hollander & Wolfe (1973), 116.
## Mucociliary efficiency from the rate of removal of dust in normal
##  subjects, subjects with obstructive airway disease, and subjects
##  with asbestosis.
x <- c(2.9, 3.0, 2.5, 2.6, 3.2) # normal subjects
y <- c(3.8, 2.7, 4.0, 2.4)      # with obstructive airway disease
z <- c(2.8, 3.4, 3.7, 2.2, 2.0) # with asbestosis
kruskal.test(list(x, y, z))
## Equivalently,
x <- c(x, y, z)
g <- factor(rep(1:3, c(5, 4, 5)),
            labels = c("Normal subjects",
                       "Subjects with obstructive airway disease",
                       "Subjects with asbestosis"))
kruskal.test(x, g)

## Formula interface.
boxplot(Ozone ~ Month, data = airquality)
kruskal.test(Ozone ~ Month, data = airquality)

Kolmogorov-Smirnov Tests


Perform a one- or two-sample Kolmogorov-Smirnov test.


ks.test(x, ...)
## Default S3 method:
ks.test(x, y, ...,
        alternative = c("two.sided", "less", "greater"),
        exact = NULL, simulate.p.value = FALSE, B = 2000)
## S3 method for class 'formula'
ks.test(formula, data, subset, na.action, ...)



a numeric vector of data values.


either a numeric vector of data values, or a character string naming a cumulative distribution function or an actual cumulative distribution function such as pnorm. Only continuous CDFs are valid.


for the default method, parameters of the distribution specified (as a character string) by y. Otherwise, further arguments to be passed to or from methods.


indicates the alternative hypothesis and must be one of "two.sided" (default), "less", or "greater". You can specify just the initial letter of the value, but the argument name must be given in full. See ‘Details’ for the meanings of the possible values.


NULL or a logical indicating whether an exact p-value should be computed. See ‘Details’ for the meaning of NULL.


a logical indicating whether to compute p-values by Monte Carlo simulation. (Ignored for the one-sample test.)


an integer specifying the number of replicates used in the Monte Carlo test.


a formula of the form lhs ~ rhs where lhs is a numeric variable giving the data values and rhs either 1 for a one-sample test or a factor with two levels giving the corresponding groups for a two-sample test.


an optional matrix or data frame (or similar: see model.frame) containing the variables in the formula formula. By default the variables are taken from environment(formula).


an optional vector specifying a subset of observations to be used.


a function which indicates what should happen when the data contain NAs. Defaults to getOption("na.action").


If y is numeric, a two-sample (Smirnov) test of the null hypothesis that x and y were drawn from the same distribution is performed.

Alternatively, y can be a character string naming a continuous (cumulative) distribution function, or such a function. In this case, a one-sample (Kolmogorov) test is carried out of the null that the distribution function which generated x is distribution y with parameters specified by .... The presence of ties always generates a warning in the one-sample case, as continuous distributions do not generate them. If the ties arose from rounding, the tests may be approximately valid, but even modest amounts of rounding can have a significant effect on the calculated statistic.

Missing values are silently omitted from x and (in the two-sample case) y.

The possible values "two.sided", "less" and "greater" of alternative specify the null hypothesis that the true cumulative distribution function (CDF) of x is equal to, not less than or not greater than the hypothesized CDF (one-sample case) or the CDF of y (two-sample case), respectively. The test compares the CDFs taking their maximal difference as test statistic, with the statistic in the "greater" alternative being $D^+ = \max_u [ F_x(u) - F_y(u) ]$. Thus in the two-sample case alternative = "greater" includes distributions for which x is stochastically smaller than y (the CDF of x lies above and hence to the left of that for y), in contrast to t.test or wilcox.test.
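
The two-sample statistics can be reproduced from the empirical CDFs. The sketch below uses simulated data without ties; the seed and sample sizes are arbitrary and the names D_plus, D_minus and D are introduced here for illustration.

```r
set.seed(123)
x <- rnorm(20)
y <- rnorm(25, mean = 0.5)

Fx <- ecdf(x)
Fy <- ecdf(y)
u  <- sort(c(x, y))               # the ECDFs only change at observed points

D_plus  <- max(Fx(u) - Fy(u))     # statistic for alternative = "greater"
D_minus <- max(Fy(u) - Fx(u))     # statistic for alternative = "less"
D       <- max(D_plus, D_minus)   # two-sided statistic

c(D_plus, unname(ks.test(x, y, alternative = "greater")$statistic))
c(D_minus, unname(ks.test(x, y, alternative = "less")$statistic))
c(D, unname(ks.test(x, y)$statistic))
```

Here y is shifted to the right, so the CDF of x tends to lie above that of y and D_plus is the larger of the two one-sided statistics for most samples.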

Exact p-values are not available for the one-sample case in the presence of ties. If exact = NULL (the default), an exact p-value is computed if the sample size is less than 100 in the one-sample case and there are no ties, and if the product of the sample sizes is less than 10000 in the two-sample case, with or without ties (using the algorithm described in Schröer and Trenkler (1995)). Otherwise, the p-value is computed via Monte Carlo simulation in the two-sample case if simulate.p.value is TRUE, or else asymptotic distributions are used whose approximations may be inaccurate in small samples. In the one-sample two-sided case, exact p-values are obtained as described in Marsaglia, Tsang & Wang (2003) (but not using the optional approximation in the right tail, so this can be slow for small p-values). The formula of Birnbaum & Tingey (1951) is used for the one-sample one-sided case.

If a one-sample test is used, the parameters specified in ... must be pre-specified and not estimated from the data. There is some more refined distribution theory for the KS test with estimated parameters (see Durbin, 1973), but that is not implemented in ks.test.


A list inheriting from classes "ks.test" and "htest" containing the following components:


the value of the test statistic.


the p-value of the test.


a character string describing the alternative hypothesis.


a character string indicating what type of test was performed.

a character string giving the name(s) of the data.


The two-sided one-sample distribution comes via Marsaglia, Tsang and Wang (2003).

Exact distributions for the two-sample (Smirnov) test are computed by the algorithm proposed by Schröer (1991) and Schröer & Trenkler (1995) using numerical improvements along the lines of Viehmann (2021).


Z. W. Birnbaum and Fred H. Tingey (1951). One-sided confidence contours for probability distribution functions. The Annals of Mathematical Statistics, 22/4, 592–596. doi:10.1214/aoms/1177729550.

William J. Conover (1971). Practical Nonparametric Statistics. New York: John Wiley & Sons. Pages 295–301 (one-sample Kolmogorov test), 309–314 (two-sample Smirnov test).

Durbin, J. (1973). Distribution theory for tests based on the sample distribution function. SIAM.

W. Feller (1948). On the Kolmogorov-Smirnov limit theorems for empirical distributions. The Annals of Mathematical Statistics, 19(2), 177–189. doi:10.1214/aoms/1177730243.

George Marsaglia, Wai Wan Tsang and Jingbo Wang (2003). Evaluating Kolmogorov's distribution. Journal of Statistical Software, 8/18. doi:10.18637/jss.v008.i18.

Gunar Schröer (1991). Computergestützte statistische Inferenz am Beispiel der Kolmogorov-Smirnov Tests. Diplomarbeit Universität Osnabrück.

Gunar Schröer and Dietrich Trenkler (1995). Exact and Randomization Distributions of Kolmogorov-Smirnov Tests for Two or Three Samples. Computational Statistics & Data Analysis, 20(2), 185–202. doi:10.1016/0167-9473(94)00040-P.

Thomas Viehmann (2021). Numerically more stable computation of the p-values for the two-sample Kolmogorov-Smirnov test.

See Also


shapiro.test which performs the Shapiro-Wilk test for normality.



x <- rnorm(50)
y <- runif(30)
# Do x and y come from the same distribution?
ks.test(x, y)
# Does x come from a shifted gamma distribution with shape 3 and rate 2?
ks.test(x+2, "pgamma", 3, 2) # two-sided, exact
ks.test(x+2, "pgamma", 3, 2, exact = FALSE)
ks.test(x+2, "pgamma", 3, 2, alternative = "gr")

# test if x is stochastically larger than x2
x2 <- rnorm(50, -1)
plot(ecdf(x), xlim = range(c(x, x2)))
plot(ecdf(x2), add = TRUE, lty = "dashed")
t.test(x, x2, alternative = "g")
wilcox.test(x, x2, alternative = "g")
ks.test(x, x2, alternative = "l")

# with ties, example from Schröer and Trenkler (1995)
# D = 3/7, p = 8/33 = 0.242424..
ks.test(c(1, 2, 2, 3, 3),
        c(1, 2, 3, 3, 4, 5, 6))# -> exact

# formula interface, see ?wilcox.test
ks.test(Ozone ~ Month, data = airquality,
        subset = Month %in% c(5, 8))

Kernel Regression Smoother


The Nadaraya–Watson kernel regression estimate.


ksmooth(x, y, kernel = c("box", "normal"), bandwidth = 0.5,
        range.x = range(x),
        n.points = max(100L, length(x)), x.points)



input x values. Long vectors are supported.


input y values. Long vectors are supported.


the kernel to be used. Can be abbreviated.


the bandwidth. The kernels are scaled so that their quartiles (viewed as probability densities) are at $\pm$ 0.25*bandwidth.


the range of points to be covered in the output.


the number of points at which to evaluate the fit.


points at which to evaluate the smoothed fit. If missing, n.points are chosen uniformly to cover range.x. Long vectors are supported.


A list with components


values at which the smoothed fit is evaluated. Guaranteed to be in increasing order.


fitted values corresponding to x.


This function was implemented for compatibility with S, although it is nowhere near as slow as the S function. Better kernel smoothers are available in other packages such as KernSmooth.
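
For the box kernel, the quartile scaling implies a window of half-width 0.5*bandwidth, and the Nadaraya–Watson estimate reduces to a local mean of the responses. The sketch below checks this on the cars data; the evaluation points and bandwidth are arbitrary and were chosen so that no observation falls exactly on a window edge.

```r
x0 <- c(10, 15, 20)    # evaluation points
bw <- 3                # box window is then x0 - 1.5 to x0 + 1.5

fit <- with(cars, ksmooth(speed, dist, kernel = "box",
                          bandwidth = bw, x.points = x0))

## Local mean of dist over speeds within bw / 2 of each evaluation point
by_hand <- sapply(x0, function(u)
    with(cars, mean(dist[abs(speed - u) <= bw / 2])))

cbind(x = fit$x, ksmooth = fit$y, by_hand = by_hand)
```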



with(cars, {
    plot(speed, dist)
    lines(ksmooth(speed, dist, "normal", bandwidth = 2), col = 2)
    lines(ksmooth(speed, dist, "normal", bandwidth = 5), col = 3)
})

Lag a Time Series


Compute a lagged version of a time series, shifting the time base back by a given number of observations.

lag is a generic function; this page documents its default method.


lag(x, ...)

## Default S3 method:
lag(x, k = 1, ...)



A vector or matrix or univariate or multivariate time series


The number of lags (in units of observations).


further arguments to be passed to or from methods.


Vector or matrix arguments x are given a tsp attribute via hasTsp.


A time series object with the same class as x.


Note the sign of k: a series lagged by a positive k starts earlier.
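
A small sketch of the sign convention, using a made-up yearly series: only the time base moves, and the data values stay the same.

```r
x  <- ts(1:5, start = 2000)   # yearly series covering 2000 to 2004
xl <- lag(x, k = 1)

tsp(x)     # start, end and frequency of the original series
tsp(xl)    # start and end are one observation earlier

## The observations themselves are unchanged
identical(as.vector(x), as.vector(xl))
```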


Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.

See Also

diff, deltat


lag(ldeaths, 12) # starts one year earlier

Time Series Lag Plots


Plot time series against lagged versions of themselves. Helps visualizing ‘auto-dependence’ even when auto-correlations vanish.


lag.plot(x, lags = 1, layout = NULL, set.lags = 1:lags,
         main = NULL, asp = 1,
         diag = TRUE, diag.col = "gray", type = "p", oma = NULL,
         ask = NULL, do.lines = (n <= 150), labels = do.lines,
         ...)



time-series (univariate or multivariate)


number of lag plots desired, see argument set.lags.


the layout of multiple plots, basically the mfrow par() argument. The default uses about a square layout (see n2mfrow) such that all plots are on one page.


vector of positive integers allowing specification of the set of lags used; defaults to 1:lags.


character with a main header title to be done on the top of each page.


Aspect ratio to be fixed, see plot.default.


logical indicating if the x=y diagonal should be drawn.


color to be used for the diagonal if(diag).


plot type to be used, but see plot.ts about its restricted meaning.


outer margins, see par.


logical or NULL; if true, the user is asked to confirm before a new page is started.


logical indicating if lines should be drawn.


logical indicating if labels should be used.


Further arguments to plot.ts. Several graphical parameters are set in this function and so cannot be changed: these include xlab, ylab, mgp, col.lab and font.lab: this also applies to the arguments xy.labels and xy.lines.


If just one plot is produced, this is a conventional plot. If more than one plot is to be produced, par(mfrow) and several other graphics parameters will be set, so it is not (easily) possible to mix such lag plots with other plots on the same page.

If ask = NULL, par(ask = TRUE) will be called if more than one page of plots is to be produced and the device is interactive.


It is more flexible and has different default behaviour than the S version. We use main = instead of head = for internal consistency.


Martin Maechler

See Also

plot.ts which is the basic work horse.



lag.plot(nhtemp, 8, diag.col = "forest green")
lag.plot(nhtemp, 5, main = "Average Temperatures in New Haven")
## ask defaults to TRUE when we have more than one page:
lag.plot(nhtemp, 6, layout = c(2,1), asp = NA,
         main = "New Haven Temperatures", col.main = "blue")

## Multivariate (but non-stationary! ...)
lag.plot(freeny.x, lags = 3)

## no lines for long series :
lag.plot(sqrt(sunspots), set.lags = c(1:4, 9:12), pch = ".", col = "gold")

Robust Line Fitting


Fit a line robustly as recommended in Exploratory Data Analysis.

Currently by default (iter = 1) the initial median-median line is not iterated (as opposed to Tukey's “resistant line” in the references).


line(x, y, iter = 1)


x, y

the arguments can be any way of specifying x-y pairs. See xy.coords.


positive integer specifying the number of “polishing” iterations. Note that this was hard coded to 1 in R versions before 3.5.0, and more importantly that such simple iterations may not converge, see Siegel's 9-point example.


Cases with missing values are omitted.

In the references, the data are split into three (almost) equally sized groups, with symmetric sizes depending on $n$ and n %% 3, and medians are computed inside each group. In contrast, the line() code splits into three groups using all observations with x[.] <= q1 and x[.] >= q2, where q1, q2 are (a kind of) quantiles for probabilities $p = 1/3$ and $p = 2/3$ of the form (x[j1]+x[j2])/2 where j1 = floor(p*(n-1)) and j2 = ceiling(p*(n-1)), n = length(x).
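
The splitting rule can be sketched in a few lines. This is only an illustration of the description above, not the actual implementation: it assumes that j1 and j2 are 0-based positions in the sorted x values, and the helper name tertile_cut is hypothetical.

```r
## Hypothetical helper: cut point for probability p, taking j1, j2 as
## 0-based positions in the sorted x (an assumption, see the lead-in)
tertile_cut <- function(x, p) {
    xs <- sort(x)
    n  <- length(xs)
    j1 <- floor(p * (n - 1))
    j2 <- ceiling(p * (n - 1))
    (xs[j1 + 1] + xs[j2 + 1]) / 2
}

x <- cars$speed
y <- cars$dist
q1 <- tertile_cut(x, 1/3)
q2 <- tertile_cut(x, 2/3)
left  <- x <= q1          # left group
right <- x >= q2          # right group

## Median-median slope from the two outer groups, without polishing
slope <- (median(y[right]) - median(y[left])) /
         (median(x[right]) - median(x[left]))

c(q1 = q1, q2 = q2, slope = slope)
coef(line(cars))          # fitted coefficients, for comparison
```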

Long vectors are not supported yet.


An object of class "tukeyline".

Methods are available for the generic functions coef, residuals, fitted, and print.


Tukey, J. W. (1977). Exploratory Data Analysis, Reading Massachusetts: Addison-Wesley.

Velleman, P. F. and Hoaglin, D. C. (1981). Applications, Basics and Computing of Exploratory Data Analysis, Duxbury Press. Chapter 5.

Emerson, J. D. and Hoaglin, D. C. (1983). Resistant Lines for $y$ versus $x$. Chapter 5 of Understanding Robust and Exploratory Data Analysis, eds. David C. Hoaglin, Frederick Mosteller and John W. Tukey. Wiley.

Iain M. Johnstone and Paul F. Velleman (1985). The Resistant Line and Related Regression Methods. Journal of the American Statistical Association, 80, 1041–1054. doi:10.2307/2288572.

See Also


There are alternatives for robust linear regression more robust and more (statistically) efficient, see rlm() from MASS, or lmrob() from robustbase.



(z <- line(cars))
## Tukey-Anscombe Plot :
plot(residuals(z) ~ fitted(z), main = deparse(z$call))

## Andrew Siegel's pathological 9-point data, y-values multiplied by 3:
d.AS <- data.frame(x = c(-4:3, 12), y = 3*c(rep(0,6), -5, 5, 1))
cAS <- with(d.AS, t(sapply(1:10,
                   function(it) line(x,y, iter=it)$coefficients)))
dimnames(cAS) <- list(paste("it =", format(1:10)), c("intercept", "slope"))
## iterations started to oscillate, repeating iteration 7,8 indefinitely

A Class for Lists of (Parts of) Model Fits


Class "listof" is used by aov and the "lm" method of alias for lists of model fits or parts thereof. It is simply a list with an assigned class to control the way methods, especially printing, act on it.

It has a coef method in this package (which returns an object of this class), and [ and print methods in package base.

Fitting Linear Models


lm is used to fit linear models, including multivariate ones. It can be used to carry out regression, single stratum analysis of variance and analysis of covariance (although aov may provide a more convenient interface for these).


lm(formula, data, subset, weights, na.action,
   method = "qr", model = TRUE, x = FALSE, y = FALSE, qr = TRUE,
   singular.ok = TRUE, contrasts = NULL, offset, ...)

## S3 method for class 'lm'
print(x, digits = max(3L, getOption("digits") - 3L), ...)



an object of class "formula" (or one that can be coerced to that class): a symbolic description of the model to be fitted. The details of model specification are given under ‘Details’.


an optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. If not found in data, the variables are taken from environment(formula), typically the environment from which lm is called.


an optional vector specifying a subset of observations to be used in the fitting process. (See additional details about how this argument interacts with data-dependent bases in the ‘Details’ section of the model.frame documentation.)


an optional vector of weights to be used in the fitting process. Should be NULL or a numeric vector. If non-NULL, weighted least squares is used with weights weights (that is, minimizing sum(w*e^2)); otherwise ordinary least squares is used. See also ‘Details’.


a function which indicates what should happen when the data contain NAs. The default is set by the na.action setting of options, and is na.fail if that is unset. The ‘factory-fresh’ default is na.omit. Another possible value is NULL, no action. Value na.exclude can be useful.


the method to be used; for fitting, currently only method = "qr" is supported; method = "model.frame" returns the model frame (the same as with model = TRUE, see below).

model, x, y, qr

logicals. If TRUE the corresponding components of the fit (the model frame, the model matrix, the response, the QR decomposition) are returned.


logical. If FALSE (the default in S but not in R) a singular fit is an error.


an optional list. See the contrasts.arg of model.matrix.default.


this can be used to specify an a priori known component to be included in the linear predictor during fitting. This should be NULL or a numeric vector or matrix of extents matching those of the response. One or more offset terms can be included in the formula instead or as well, and if more than one are specified their sum is used. See model.offset.


For lm(): additional arguments to be passed to the low level regression fitting functions (see below).


the number of significant digits to be passed to format(coef(x), .) when print()ing.


Models for lm are specified symbolically. A typical model has the form response ~ terms where response is the (numeric) response vector and terms is a series of terms which specifies a linear predictor for response. A terms specification of the form first + second indicates all the terms in first together with all the terms in second with duplicates removed. A specification of the form first:second indicates the set of terms obtained by taking the interactions of all terms in first with all terms in second. The specification first*second indicates the cross of first and second. This is the same as first + second + first:second.
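The operator rules above can be checked by expanding formulas with terms(). This is a small sketch: the names y, a and b are placeholders that need not exist, because terms() only parses the formula.

```r
# Expand two formulas and compare their term labels.
# y, a and b are placeholder names; terms() does not evaluate them.
crossed  <- attr(terms(y ~ a * b), "term.labels")
expanded <- attr(terms(y ~ a + b + a:b), "term.labels")

crossed                        # "a" "b" "a:b"
identical(crossed, expanded)   # TRUE

# Duplicates are removed when terms are combined with '+'
attr(terms(y ~ a + b + a), "term.labels")   # "a" "b"
```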

If the formula includes an offset, this is evaluated and subtracted from the response.
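As an illustration of the offset handling, the sketch below compares a fit with an offset() term to a fit where the same quantity is subtracted from the response by hand. The data are simulated and the variable off is a made-up known component.

```r
set.seed(42)
n   <- 30
x   <- rnorm(n)
off <- 0.5 * x            # hypothetical known component of the linear predictor
y   <- 1 + 2 * x + off + rnorm(n, sd = 0.1)

fit_offset <- lm(y ~ x + offset(off))
fit_manual <- lm(I(y - off) ~ x)     # subtract the offset by hand

# Both fits estimate the same coefficients
all.equal(coef(fit_offset), coef(fit_manual))

# Fitted values of the offset fit are on the scale of y, so they include 'off'
all.equal(unname(fitted(fit_offset)), unname(fitted(fit_manual) + off))
```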

If response is a matrix a linear model is fitted separately by least-squares to each column of the matrix and the result inherits from "mlm" (“multivariate linear model”).
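A minimal sketch of a matrix response, using simulated data: the columns of the coefficient matrix should agree with the coefficients of the separate single-response fits.

```r
set.seed(1)
x  <- rnorm(20)
y1 <-  1 + 2.0 * x + rnorm(20)
y2 <- -1 + 0.5 * x + rnorm(20)

fit_multi <- lm(cbind(y1, y2) ~ x)
class(fit_multi)   # "mlm" "lm"

# Each column of the coefficient matrix matches the separate fit
all.equal(coef(fit_multi)[, "y1"], coef(lm(y1 ~ x)))
all.equal(coef(fit_multi)[, "y2"], coef(lm(y2 ~ x)))
```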

See model.matrix for some further details. The terms in the formula will be re-ordered so that main effects come first, followed by the interactions, all second-order, all third-order and so on: to avoid this pass a terms object as the formula (see aov and demo(glm.vr) for an example).

A formula has an implied intercept term. To remove this use either y ~ x - 1 or y ~ 0 + x. See formula for more details of allowed formulae.

Non-NULL weights can be used to indicate that different observations have different variances (with the values in weights being inversely proportional to the variances); or equivalently, when the elements of weights are positive integers $w_i$, that each response $y_i$ is the mean of $w_i$ unit-weight observations (including the case that there are $w_i$ observations equal to $y_i$ and the data have been summarized). However, in the latter case, notice that within-group variation is not used. Therefore, the sigma estimate and residual degrees of freedom may be suboptimal; in the case of replication weights, even wrong. Hence, standard errors and analysis of variance tables should be treated with care.
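The caveat about summarized data can be seen in a small simulated example (all values below are made up). Fitting group means with group sizes as weights reproduces the coefficients of the full-data fit, but the residual degrees of freedom and the sigma estimate are not the same.

```r
set.seed(1)
g <- rep(1:4, times = c(2, 3, 5, 4))   # group label, also used as the predictor
y <- 1 + 2 * g + rnorm(length(g))

full <- lm(y ~ g)                       # one row per observation

# Summarized data: one row per group, weighted by group size
xg   <- 1:4
ybar <- as.vector(tapply(y, g, mean))
w    <- as.vector(table(g))
summ <- lm(ybar ~ xg, weights = w)

all.equal(unname(coef(full)), unname(coef(summ)))   # same coefficients
c(full = df.residual(full), summarized = df.residual(summ))
c(full = sigma(full), summarized = sigma(summ))     # sigma estimates differ
```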

lm calls the lower level functions lm.fit, etc., see below, for the actual numerical computations. For programming only, you may consider doing likewise.

All of weights, subset and offset are evaluated in the same way as variables in formula, that is first in data and then in the environment of formula.


lm returns an object of class "lm" or for multivariate (‘multiple’) responses of class c("mlm", "lm").

The functions summary and anova are used to obtain and print a summary and analysis of variance table of the results. The generic accessor functions coefficients, effects, fitted.values and residuals extract various useful features of the value returned by lm.

An object of class "lm" is a list containing at least the following components:


a named vector of coefficients


the residuals, that is response minus fitted values.


the fitted mean values.


the numeric rank of the fitted linear model.


(only for weighted fits) the specified weights.


the residual degrees of freedom.


the matched call.


the terms object used.


(only where relevant) the contrasts used.


(only where relevant) a record of the levels of the factors used in fitting.


the offset used (missing if none were used).


if requested, the response used.


if requested, the model matrix used.


if requested (the default), the model frame used.


(where relevant) information returned by model.frame on the special handling of NAs.

In addition, non-null fits will have components assign, effects and (unless not requested) qr relating to the linear fit, for use by extractor functions such as summary and effects.
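The component names are easiest to inspect on a fitted object. The sketch below uses made-up data with one factor predictor, so that the contrasts and factor-level components are present.

```r
group  <- gl(2, 5, labels = c("Ctl", "Trt"))
weight <- c(4.2, 5.6, 5.2, 6.1, 4.5, 4.8, 4.2, 4.4, 3.6, 5.9)  # made-up values
fit <- lm(weight ~ group)

names(fit)          # components stored in the "lm" object
fit$rank            # numeric rank of the fit
fit$df.residual     # residual degrees of freedom
fit$xlevels         # factor levels recorded for prediction
is.null(fit$x)      # TRUE: the model matrix is only kept with x = TRUE
```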

Using time series

Considerable care is needed when using lm with time series.

Unless na.action = NULL, the time series attributes are stripped from the variables before the regression is done. (This is necessary as omitting NAs would invalidate the time series attributes, and if NAs are omitted in the middle of the series the result would no longer be a regular time series.)

Even if the time series attributes are retained, they are not used to line up series, so that the time shift of a lagged or differenced regressor would be ignored. It is good practice to prepare a data argument by ts.intersect(..., dframe = TRUE), then apply a suitable na.action to that data frame and call lm with na.action = NULL so that residuals and fitted values are time series.


The design was inspired by the S function of the same name described in Chambers (1992). The implementation of model formula by Ross Ihaka was based on Wilkinson & Rogers (1973).


Chambers, J. M. (1992) Linear models. Chapter 4 of Statistical Models in S eds J. M. Chambers and T. J. Hastie, Wadsworth & Brooks/Cole.

Wilkinson, G. N. and Rogers, C. E. (1973). Symbolic descriptions of factorial models for analysis of variance. Applied Statistics, 22, 392–399. doi:10.2307/2346786.

See Also

summary.lm for more detailed summaries and anova.lm for the ANOVA table; aov for a different interface.

The generic functions coef, effects, residuals, fitted, vcov.

predict.lm (via predict) for prediction, including confidence and prediction intervals; confint for confidence intervals of parameters.

lm.influence for regression diagnostics, and glm for generalized linear models.

The underlying low level functions, lm.fit for plain, and lm.wfit for weighted regression fitting.

More lm() examples are available e.g., in anscombe, attitude, freeny, LifeCycleSavings, longley, stackloss, swiss.

biglm in package biglm for an alternative way to fit linear models to large datasets (especially those with many cases).



## Annette Dobson (1990) "An Introduction to Generalized Linear Models".
## Page 9: Plant Weight Data.
ctl <- c(4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14)
trt <- c(4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69)
group <- gl(2, 10, 20, labels = c("Ctl","Trt"))
weight <- c(ctl, trt)
lm.D9 <- lm(weight ~ group)
lm.D90 <- lm(weight ~ group - 1) # omitting intercept


opar <- par(mfrow = c(2,2), oma = c(0, 0, 1.1, 0))
plot(lm.D9, las = 1)      # Residuals, Fitted, ...
par(opar)                 # restore the previous graphics parameters

### less simple examples in "See Also" above

Fitter Functions for Linear Models


These are the basic computing engines called by lm used to fit linear models. These should usually not be used directly unless by experienced users. .lm.fit() is a bare-bones wrapper to the innermost QR-based C code, on which glm.fit and lsfit are also based, for even more experienced users.

lm.fit (x, y,    offset = NULL, method = "qr", tol = 1e-7,
       singular.ok = TRUE, ...)

lm.wfit(x, y, w, offset = NULL, method = "qr", tol = 1e-7,
        singular.ok = TRUE, ...)

.lm.fit(x, y, tol = 1e-7)



design matrix of dimension n * p.


vector of observations of length n, or a matrix with n rows.


vector of weights (length n) to be used in the fitting process for the wfit functions. Weighted least squares is used with weights w, i.e., sum(w * e^2) is minimized.


(numeric of length n). This can be used to specify an a priori known component to be included in the linear predictor during fitting.


currently, only method = "qr" is supported.


tolerance for the qr decomposition. Default is 1e-7.


logical. If FALSE, a singular model is an error.


currently disregarded.


If y is a matrix, offset can be a numeric matrix of the same dimensions, in which case each column is applied to the corresponding column of y.
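The weighted criterion described for w, minimizing sum(w * e^2), is equivalent to ordinary least squares on data scaled by sqrt(w). The sketch below checks this on simulated data; it illustrates the equivalence and is not a description of the internal code path.

```r
set.seed(2)
n <- 10
X <- cbind(intercept = 1, x = rnorm(n))   # design matrix with explicit column names
y <- rnorm(n)
w <- runif(n, 0.5, 2)                     # strictly positive weights

wfit <- lm.wfit(X, y, w)
sfit <- lm.fit(sqrt(w) * X, sqrt(w) * y)  # rows of X and y scaled by sqrt(w)

all.equal(coef(wfit), coef(sfit))

# Residuals of the weighted fit are on the original scale of y
all.equal(sqrt(w) * wfit$residuals, sfit$residuals)
```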


a list with components (for lm.fit and lm.wfit)


p vector


n vector or matrix


n vector or matrix


n vector of orthogonal single-df effects. The first rank of them correspond to non-aliased coefficients, and are named accordingly.


n vector — only for the *wfit* functions.


integer, giving the rank


degrees of freedom of residuals


the QR decomposition, see qr.

Fits without any columns or non-zero weights do not have the effects and qr components. .lm.fit() returns a subset of the above, the qr part unwrapped, plus a logical component pivoted indicating if the underlying QR algorithm did pivot.
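A rank-deficient design shows the rank and pivoted components at work. In this sketch the column x2 is deliberately constructed as a multiple of x; all data are simulated.

```r
set.seed(3)
n <- 8
x <- rnorm(n)
z <- rnorm(n)
X <- cbind(int = 1, x = x, x2 = 2 * x, z = z)   # x2 is collinear with x
y <- rnorm(n)

f1 <- lm.fit(X, y)
f1$rank        # 3 rather than 4
coef(f1)       # NA for the aliased column x2

f2 <- .lm.fit(X, y)
f2$rank
f2$pivoted     # TRUE here: the aliased column was moved behind z
f2$pivot       # column order used by the QR decomposition
```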

See Also

lm which you should use for linear least squares regression, unless you know better.



n <- 7 ; p <- 2
X <- matrix(rnorm(n * p), n, p) # no intercept!
y <- rnorm(n)
w <- rnorm(n)^2

str(lmw <- lm.wfit(x = X, y = y, w = w))

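str(lm. <- lm.fit(x = X, y = y))

## fits w/o intercept:
all.equal(unname(coef(lm(y ~ X-1))),
          unname(coef(lm.fit(X, y))))
all.equal(unname(coef(lm.fit(X, y))),
          coef(.lm.fit(X, y)))

if(require("microbenchmark")) {
  mb <- microbenchmark(lm(y~X-1), lm.fit(X,y), .lm.fit(X,y))
  boxplot(mb, notch=TRUE)
}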

Regression Diagnostics


This function provides the basic quantities which are used in forming a wide variety of diagnostics for checking the quality of regression fits.


influence(model, ...)
## S3 method for class 'lm'
influence(model, do.coef = TRUE, ...)
## S3 method for class 'glm'
influence(model, do.coef = TRUE, ...)

lm.influence(model, do.coef = TRUE)



an object as returned by lm or glm.


logical indicating if the changed coefficients (see below) are desired. These need $O(n^2 p)$ computing time.


further arguments passed to or from other methods.


The influence.measures() and other functions listed in See Also provide a more user oriented way of computing a variety of regression diagnostics. These all build on lm.influence. Note that for GLMs (other than the Gaussian family with identity link) these are based on one-step approximations which may be inadequate if a case has high influence.

An attempt is made to ensure that computed hat values that are probably one are treated as one, and the corresponding rows in sigma and coefficients are NaN. (Dropping such a case would normally result in a variable being dropped, so it is not possible to give simple drop-one diagnostics.)

naresid is applied to the results and so will fill in with NAs if the fit had na.action = na.exclude.


A list containing the following components of the same length or number of rows $n$, which is the number of non-zero weights. Cases omitted in the fit are omitted unless a na.action method was used (such as na.exclude) which restores them.


a vector containing the diagonal of the ‘hat’ matrix.


(unless do.coef is false) a matrix whose i-th row contains the change in the estimated coefficients which results when the i-th case is dropped from the regression. Note that aliased coefficients are not included in the matrix.


a vector whose i-th element contains the estimate of the residual standard deviation obtained when the i-th case is dropped from the regression. (The approximations needed for GLMs can result in this being NaN.)


a vector of weighted (or for class glm rather deviance) residuals.
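The returned quantities can be cross-checked against brute-force refits. The sketch below uses the LifeCycleSavings data and the components hat and sigma of the result; dropping case 1 by hand is only for illustration.

```r
fit <- lm(sr ~ pop15 + pop75 + dpi + ddpi, data = LifeCycleSavings)
inf <- lm.influence(fit)

sum(inf$hat)                          # trace of the hat matrix = number of coefficients
all.equal(inf$hat, hatvalues(fit))    # hatvalues() extracts the same diagonal

# sigma[1] is the residual standard deviation with case 1 removed
fit_drop1 <- lm(sr ~ pop15 + pop75 + dpi + ddpi, data = LifeCycleSavings[-1, ])
all.equal(unname(inf$sigma[1]), summary(fit_drop1)$sigma)
```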


The coefficients returned by the R version of lm.influence differ from those computed by S. Rather than returning the coefficients which result from dropping each case, we return the changes in the coefficients. This is more directly useful in many diagnostic measures.
Since these need $O(n p^2)$ computing time, they can be omitted by do.coef = FALSE.

Note that cases with weights == 0 are dropped (contrary to the situation in S).

If a model has been fitted with na.action = na.exclude (see na.exclude), cases excluded in the fit are considered here.


See the list in the documentation for influence.measures.

Chambers, J. M. (1992) Linear models. Chapter 4 of Statistical Models in S eds J. M. Chambers and T. J. Hastie, Wadsworth & Brooks/Cole.

See Also

summary.lm for summary and related methods;
hat for the hat matrix diagonals,
dfbetas, dffits, covratio, cooks.distance, lm.


## Analysis of the life-cycle savings data
## given in Belsley, Kuh and Welsch.
summary(lm.SR <- lm(sr ~ pop15 + pop75 + dpi + ddpi,
                    data = LifeCycleSavings),
        correlation = TRUE)
utils::str(lmI <- lm.influence(lm.SR))

## For more "user level" examples, use example(influence.measures)

Accessing Linear Model Fits


All these functions are methods for class "lm" objects.


## S3 method for class 'lm'
family(object, ...)

## S3 method for class 'lm'
formula(x, ...)

## S3 method for class 'lm'
residuals(object,
          type = c("working", "response", "deviance", "pearson",
                   "partial"),
          ...)

## S3 method for class 'lm'
labels(object, ...)


object, x

an object inheriting from class lm, usually the result of a call to lm or aov.


further arguments passed to or from other methods.


the type of residuals which should be returned. Can be abbreviated.


The generic accessor functions coef, effects, fitted and residuals can be used to extract various useful features of the value returned by lm.

The working and response residuals are ‘observed - fitted’. The deviance and Pearson residuals are weighted residuals, scaled by the square root of the weights used in fitting. The partial residuals are a matrix with each column formed by omitting a term from the model. In all these, zero weight cases are never omitted (as opposed to the standardized rstudent residuals, and the weighted.residuals).
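The relationship between the residual types can be verified on a weighted fit. This is a sketch with simulated data and arbitrary positive weights.

```r
set.seed(4)
n <- 12
x <- rnorm(n)
y <- 1 + x + rnorm(n)
w <- runif(n, 0.5, 2)
fit <- lm(y ~ x, weights = w)

r_work <- residuals(fit, type = "working")
all.equal(r_work, y - fitted(fit))                              # observed - fitted
all.equal(residuals(fit, type = "response"), r_work)            # same for "lm" fits
all.equal(residuals(fit, type = "pearson"), sqrt(w) * r_work)   # scaled by sqrt(weights)
all.equal(residuals(fit, type = "deviance"),
          residuals(fit, type = "pearson"))
```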

How residuals treats cases with missing values in the original fit is determined by the na.action argument of that fit. If na.action = na.omit omitted cases will not appear in the residuals, whereas if na.action = na.exclude they will appear, with residual value NA. See also naresid.
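The difference between na.omit and na.exclude shows up in the length of the residual vector. The data frame below is made up, with one missing response.

```r
d <- data.frame(x = 1:8,
                y = c(2.1, 3.9, NA, 8.2, 9.8, 12.1, 14.2, 15.9))

fit_omit <- lm(y ~ x, data = d, na.action = na.omit)
fit_excl <- lm(y ~ x, data = d, na.action = na.exclude)

length(residuals(fit_omit))   # 7: the incomplete case is dropped
length(residuals(fit_excl))   # 8: padded back to the original length
residuals(fit_excl)[3]        # NA for the case with the missing response

# The fitted coefficients are the same either way
all.equal(coef(fit_omit), coef(fit_excl))
```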

The "lm" method for generic labels returns the term labels for estimable terms, that is the names of the terms with at least one estimable coefficient.


Chambers, J. M. (1992) Linear models. Chapter 4 of Statistical Models in S eds J. M. Chambers and T. J. Hastie, Wadsworth & Brooks/Cole.

See Also

The model fitting function lm, anova.lm.

coef, deviance, df.residual, effects, fitted, glm for generalized linear models, influence (etc on that page) for regression diagnostics, weighted.residuals, residuals, residuals.glm, summary.lm, weights.

influence.measures for deletion diagnostics, including standardized (rstandard) and studentized (rstudent) residuals.


##-- Continuing the  lm(.) example:
coef(lm.D90) # the bare coefficients

## The 2 basic regression diagnostic plots [plot.lm(.) is preferred]
plot(fitted(lm.D90), resid(lm.D90)) # Tukey-Anscombe's: residuals against fitted values
abline(h = 0, lty = 2, col = "gray")


Print Loadings in Factor Analysis


Extract or print loadings in factor analysis (or principal components analysis).


loadings(x, ...)

## S3 method for class 'loadings'
print(x, digits = 3, cutoff = 0.1, sort = FALSE, ...)

## S3 method for class 'factanal'
print(x, digits = 3, ...)



an object of class "factanal" or "princomp" or the loadings component of such an object.


number of decimal places to use in printing uniquenesses and loadings.


loadings smaller than this (in absolute value) are suppressed.


logical. If true, the variables are sorted by their importance on each factor. Each variable with any loading larger than 0.5 (in modulus) is assigned to the factor with the largest loading, and the variables are printed in the order of the factor they are assigned to, then those unassigned.


further arguments for other methods, ignored for loadings.


‘Loadings’ is a term from factor analysis, but because factor analysis and principal component analysis (PCA) are often conflated in the social science literature, it was used for PCA by SPSS and hence by princomp in S-PLUS to help SPSS users.

Small loadings are conventionally not printed (replaced by spaces), to draw the eye to the pattern of the larger loadings.

The print method for class "factanal" calls the "loadings" method to print the loadings, and so passes down arguments such as cutoff and sort.

The signs of the loadings vectors are arbitrary for both factor analysis and PCA.
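The effect of cutoff and sort on printing can be tried with any "princomp" fit. The sketch below uses the USArrests data purely as a convenient example.

```r
pc <- princomp(USArrests, cor = TRUE)
ld <- loadings(pc)
class(ld)                              # "loadings"

print(ld)                              # default cutoff = 0.1
print(ld, cutoff = 0.5)                # smaller loadings are left blank
print(ld, cutoff = 0.5, sort = TRUE)   # variables grouped by dominant component
```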


There are other functions called loadings in contributed packages which are S3 or S4 generic: the ... argument is to make it easier for this one to become a default method.

See Also

factanal, princomp

Local Polynomial Regression Fitting


Fit a locally polynomial surface determined by one or more numerical predictors, using local fitting.


loess(formula, data, weights, subset, na.action, model = FALSE,
      span = 0.75, enp.target, degree = 2,
      parametric = FALSE, drop.square = FALSE, normalize = TRUE,
      family = c("gaussian", "symmetric"),
      method = c("loess", "model.frame"),
      control = loess.control(...), ...)



a formula specifying the numeric response and one to four numeric predictors (best specified via an interaction, but can also be specified additively). Will be coerced to a formula if necessary.


an optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. If not found in data, the variables are taken from environment(formula), typically the environment from which loess is called.


optional weights for each case.


an optional specification of a subset of the data to be used.


the action to be taken with missing values in the response or predictors. The default is given by getOption("na.action").


should the model frame be returned?


the parameter $\alpha$ which controls the degree of smoothing.

an alternative way to specify span, as the approximate equivalent number of parameters to be used.


the degree of the polynomials to be used, normally 1 or 2. (Degree 0 is also allowed, but see the ‘Note’.)


should any terms be fitted globally rather than locally? Terms can be specified by name, number or as a logical vector of the same length as the number of predictors.


for fits with more than one predictor and degree = 2, should the quadratic term be dropped for particular predictors? Terms are specified in the same way as for parametric.


should the predictors be normalized to a common scale if there is more than one? The normalization used is to set the 10% trimmed standard deviation to one. Set to false for spatial coordinate predictors and others known to be on a common scale.


if "gaussian" fitting is by least-squares, and if "symmetric" a re-descending M estimator is used with Tukey's biweight function. Can be abbreviated.


fit the model or just extract the model frame. Can be abbreviated.


control parameters: see loess.control.


control parameters can also be supplied directly (if control is not specified).


Fitting is done locally. That is, for the fit at point $x$, the fit is made using points in a neighbourhood of $x$, weighted by their distance from $x$ (with differences in ‘parametric’ variables being ignored when computing the distance). The size of the neighbourhood is controlled by $\alpha$ (set by span or enp.target). For $\alpha < 1$, the neighbourhood includes proportion $\alpha$ of the points, and these have tricubic weighting (proportional to $(1 - (\mathrm{dist}/\mathrm{maxdist})^3)^3$). For $\alpha > 1$, all points are used, with the ‘maximum distance’ assumed to be $\alpha^{1/p}$ times the actual maximum distance for $p$ explanatory variables.
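The tricubic weighting for a span below 1 can be sketched for a single predictor as follows. The helper tricube_weights is hypothetical and not part of the stats package, and the rounding used to turn the span into a neighbourhood size is an assumption, so the weights need not match the internal computation exactly.

```r
# Hypothetical helper, not part of stats: tricube weights for one fit point x0.
tricube_weights <- function(x, x0, span) {
  stopifnot(span > 0, span < 1)
  q <- ceiling(span * length(x))   # neighbourhood size; the rounding rule is an assumption
  d <- abs(x - x0)
  maxdist <- sort(d)[q]            # distance to the q-th nearest point
  ifelse(d < maxdist, (1 - (d / maxdist)^3)^3, 0)
}

w <- tricube_weights(cars$speed, x0 = 15, span = 0.75)
range(w)                # weights lie in [0, 1]
w[cars$speed == 15]     # weight 1 at the fit point itself
mean(w > 0)             # no more than 'span' of the points get non-zero weight
```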

For the default family, fitting is by (weighted) least squares. For family="symmetric" a few iterations of an M-estimation procedure with Tukey's biweight are used. Be aware that as the initial value is the least-squares fit, this need not be a very resistant fit.

It can be important to tune the control list to achieve acceptable speed. See loess.control for details.


An object of class "loess", with print(), summary(), predict and anova methods.


As this is based on cloess, it is similar to but not identical to the loess function of S. In particular, conditioning is not implemented.

The memory usage of this implementation of loess is roughly quadratic in the number of points, with 1000 points taking about 10Mb.

degree = 0, local constant fitting, is allowed in this implementation but not documented in the reference. It seems very little tested, so use with caution.


B. D. Ripley, based on the cloess package of Cleveland, Grosse and Shyu.


The 1998 version of cloess package of Cleveland, Grosse and Shyu. A later version is available as dloess at


W. S. Cleveland, E. Grosse and W. M. Shyu (1992) Local regression models. Chapter 8 of Statistical Models in S eds J.M. Chambers and T.J. Hastie, Wadsworth & Brooks/Cole.

See Also

loess.control, predict.loess.

lowess, the ancestor of loess (with different defaults!).


cars.lo <- loess(dist ~ speed, cars)
predict(cars.lo, data.frame(speed = seq(5, 30, 1)), se = TRUE)
# to allow extrapolation
cars.lo2 <- loess(dist ~ speed, cars,
                  control = loess.control(surface = "direct"))
predict(cars.lo2, data.frame(speed = seq(5, 30, 1)), se = TRUE)

Set Parameters for loess


Set control parameters for loess fits.


loess.control(surface = c("interpolate", "direct"),
              statistics = c("approximate", "exact", "none"),
              trace.hat = c("exact", "approximate"),
              cell = 0.2, iterations = 4, iterTrace = FALSE, ...)



should the fitted surface be computed exactly ("direct") or via interpolation from a k-d tree? Can be abbreviated.


should the statistics be computed exactly, approximately or not at all? Exact computation can be very slow. Can be abbreviated.


Only for the (default) case (surface = "interpolate", statistics = "approximate"): should the trace of the smoother matrix be computed exactly or approximately? It is recommended to use the approximation for more than about 1000 data points. Can be abbreviated.


if interpolation is used this controls the accuracy of the approximation via the maximum number of points in a cell in the k-d tree. Cells with more than floor(n*span*cell) points are subdivided.


the number of iterations used in robust fitting, i.e. only if family is "symmetric".


logical (or integer) determining if tracing information during the robust iterations (iterations $\ge 2$) is produced.


further arguments which are ignored.


A list with components


with meanings as explained under ‘Arguments’.
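A short sketch of building a control list and passing it on to a fit. The cars data and the particular settings are chosen only for illustration.

```r
ctrl <- loess.control(surface = "direct", iterations = 2)
str(ctrl)   # a plain list; unspecified entries keep their defaults

# family = "symmetric" makes the 'iterations' setting relevant
fit <- loess(dist ~ speed, cars, family = "symmetric", control = ctrl)

# Cell-size threshold floor(n * span * cell) for the defaults and n = 50
floor(nrow(cars) * 0.75 * ctrl$cell)
```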

See Also

loess

The Logistic Distribution


Density, distribution function, quantile function and random generation for the logistic distribution with parameters location and scale.


dlogis(x, location = 0, scale = 1, log = FALSE)
plogis(q, location = 0, scale = 1, lower.tail = TRUE, log.p = FALSE)
qlogis(p, location = 0, scale = 1, lower.tail = TRUE, log.p = FALSE)
rlogis(n, location = 0, scale = 1)


x, q

vector of quantiles.


vector of probabilities.


number of observations. If length(n) > 1, the length is taken to be the number required.

location, scale

location and scale parameters.

log, log.p

logical; if TRUE, probabilities p are given as log(p).


logical; if TRUE (default), probabilities are $P[X \le x]$, otherwise, $P[X > x]$.


If location or scale are omitted, they assume the default values of 0 and 1 respectively.

The Logistic distribution with location $= \mu$ and scale $= \sigma$ has distribution function

$$F(x) = \frac{1}{1 + e^{-(x-\mu)/\sigma}}$$

and density

$$f(x) = \frac{1}{\sigma}\frac{e^{(x-\mu)/\sigma}}{(1 + e^{(x-\mu)/\sigma})^2}$$

It is a long-tailed distribution with mean $\mu$ and variance $\pi^2 \sigma^2 / 3$.
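The formulas for the distribution function, density and variance can be checked numerically. The parameter values below are arbitrary.

```r
mu <- 1
sigma <- 2
x <- seq(-5, 7, by = 0.5)
z <- (x - mu) / sigma

# Distribution function and density written out from their definitions
all.equal(plogis(x, mu, sigma), 1 / (1 + exp(-z)))
all.equal(dlogis(x, mu, sigma), exp(z) / (sigma * (1 + exp(z))^2))

# Variance pi^2 sigma^2 / 3, checked by numerical integration
v <- integrate(function(t) (t - mu)^2 * dlogis(t, mu, sigma), -Inf, Inf)$value
all.equal(v, pi^2 * sigma^2 / 3, tolerance = 1e-3)
```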


dlogis gives the density, plogis gives the distribution function, qlogis gives the quantile function, and rlogis generates random deviates.

The length of the result is determined by n for rlogis, and is the maximum of the lengths of the numerical arguments for the other functions.

The numerical arguments other than n are recycled to the length of the result. Only the first elements of the logical arguments are used.


qlogis(p) is the same as the well known ‘logit’ function, $\mathrm{logit}(p) = \log(p/(1-p))$, and plogis(x) has consequently been called the ‘inverse logit’.

The distribution function is a rescaled hyperbolic tangent, plogis(x) == (1+ tanh(x/2))/2, and it is called a sigmoid function in contexts such as neural networks.
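Both identities are easy to confirm for the standard logistic distribution:

```r
p <- c(0.1, 0.25, 0.5, 0.9)
all.equal(qlogis(p), log(p / (1 - p)))        # logit
all.equal(plogis(qlogis(p)), p)               # the inverse logit undoes it

x <- seq(-4, 4, by = 0.5)
all.equal(plogis(x), (1 + tanh(x / 2)) / 2)   # rescaled hyperbolic tangent
```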


[dpq]logis are calculated directly from the definitions.

rlogis uses inversion.


Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.

Johnson, N. L., Kotz, S. and Balakrishnan, N. (1995) Continuous Univariate Distributions, volume 2, chapter 23. Wiley, New York.

See Also

Distributions for other standard distributions.


var(rlogis(4000, 0, scale = 5))  # approximately (+/- 3)
pi^2/3 * 5^2

Extract Log-Likelihood


This function is generic; method functions can be written to handle specific classes of objects. Classes which have methods for this function include: "glm", "lm", "nls" and "Arima". Packages contain methods for other classes, such as "fitdistr", "negbin" and "polr" in package MASS, "multinom" in package nnet and "gls", "gnls", "lme" and others in package nlme.


logLik(object, ...)

## S3 method for class 'lm'
logLik(object, REML = FALSE, ...)



any object from which a log-likelihood value, or a contribution to a log-likelihood value, can be extracted.


some methods for this generic function require additional arguments.


an optional logical value. If TRUE the restricted log-likelihood is returned, else, if FALSE, the log-likelihood is returned. Defaults to FALSE.


logLik is most commonly used for a model fitted by maximum likelihood, and some uses, e.g. by AIC, assume this. So care is needed where other fit criteria have been used, for example REML (the default for "lme").

For a "glm" fit the family does not have to specify how to calculate the log-likelihood, so this is based on using the family's aic() function to compute the AIC. For the gaussian, Gamma and inverse.gaussian families it is assumed that the dispersion of the GLM is estimated and has been counted as a parameter in the AIC value, and for all other families it is assumed that the dispersion is known. Note that this procedure does not give the maximized likelihood for "glm" fits from the Gamma and inverse gaussian families, as the estimate of dispersion used is not the MLE.

For "lm" fits it is assumed that the scale has been estimated (by maximum likelihood or REML), and all the constants in the log-likelihood are included. That method is only applicable to single-response fits.
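For an unweighted single-response fit with REML = FALSE, the value can be reproduced from the residuals. This sketch assumes the Gaussian log-likelihood evaluated at the maximum-likelihood variance estimate RSS/n, and uses the cars data as an example.

```r
fit <- lm(dist ~ speed, data = cars)
res <- residuals(fit)
n   <- length(res)

# Gaussian log-likelihood at the ML estimate sigma^2 = RSS / n
ll_manual <- -n / 2 * (log(2 * pi) + log(sum(res^2) / n) + 1)

ll <- logLik(fit)
all.equal(as.numeric(ll), ll_manual)
attr(ll, "df")    # two coefficients plus the error variance

# AIC is built from the same two ingredients
all.equal(AIC(fit), -2 * as.numeric(ll) + 2 * attr(ll, "df"))
```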


Returns an object of class logLik. This is a number with at least one attribute, "df" (degrees of freedom), giving the number of (estimated) parameters in the model.

There is a simple print method for "logLik" objects.

There may be other attributes depending on the method used: see the appropriate documentation. One that is used by several methods is "nobs", the number of observations used in estimation (after the restrictions if REML = TRUE).


José Pinheiro and Douglas Bates


For logLik.lm:

Harville, D.A. (1974). Bayesian inference for variance components using only error contrasts. Biometrika, 61, 383–385. doi:10.2307/2334370.

See Also

logLik.gls, logLik.lme, in package nlme, etc.



x <- 1:5
lmx <- lm(x ~ 1)
logLik(lmx) # using print.logLik() method

## lm method
(fm1 <- lm(rating ~ ., data = attitude))
logLik(fm1, REML = TRUE)

utils::data(Orthodont, package = "nlme")
fm1 <- lm(distance ~ Sex * age, Orthodont)
logLik(fm1, REML = TRUE)

Fitting Log-Linear Models


loglin is used to fit log-linear models to multidimensional contingency tables by Iterative Proportional Fitting.


loglin(table, margin, start = rep(1, length(table)), fit = FALSE,
       eps = 0.1, iter = 20, param = FALSE, print = TRUE)



a contingency table to be fit, typically the output from table.


a list of vectors with the marginal totals to be fit.

(Hierarchical) log-linear models can be specified in terms of these marginal totals which give the ‘maximal’ factor subsets contained in the model. For example, in a three-factor model, list(c(1, 2), c(1, 3)) specifies a model which contains parameters for the grand mean, each factor, and the 1-2 and 1-3 interactions, respectively (but no 2-3 or 1-2-3 interaction), i.e., a model where factors 2 and 3 are independent conditional on factor 1 (sometimes represented as ‘[12][13]’).

The names of factors (i.e., names(dimnames(table))) may be used rather than numeric indices.


a starting estimate for the fitted table. This optional argument is important for incomplete tables with structural zeros in table which should be preserved in the fit. In this case, the corresponding entries in start should be zero and the others can be taken as one.


a logical indicating whether the fitted values should be returned.


maximum deviation allowed between observed and fitted margins.


maximum number of iterations.


a logical indicating whether the parameter values should be returned.


a logical. If TRUE, the number of iterations and the final deviation are printed.


The Iterative Proportional Fitting algorithm as presented in Haberman (1972) is used for fitting the model. At most iter iterations are performed, convergence is taken to occur when the maximum deviation between observed and fitted margins is less than eps. All internal computations are done in double precision; there is no limit on the number of factors (the dimension of the table) in the model.

Assuming that there are no structural zeros, both the Likelihood Ratio Test and Pearson test statistics have an asymptotic chi-squared distribution with df degrees of freedom.
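As a sketch of how the test statistics and degrees of freedom are used, the model below treats Sex as jointly independent of Hair and Eye in the HairEyeColor table, specified once by index and once by factor name.

```r
by_index <- loglin(HairEyeColor, list(c(1, 2), 3), print = FALSE)
by_name  <- loglin(HairEyeColor, list(c("Hair", "Eye"), "Sex"), print = FALSE)

all.equal(by_index$lrt, by_name$lrt)   # both specifications describe the same model
by_index$df                            # 32 cells - 1 - 16 free parameters

# Upper-tail chi-squared probabilities for the two statistics
pchisq(by_index$lrt,     by_index$df, lower.tail = FALSE)
pchisq(by_index$pearson, by_index$df, lower.tail = FALSE)
```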

Note that the IPF steps are applied to the factors in the order given in margin. Hence if the model is decomposable and the order given in margin is a running intersection property ordering then IPF will converge in one iteration.
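The idea of iterative proportional fitting can be sketched for the simplest case, a two-way table fitted to its row and column margins. The function ipf2 is a hypothetical illustration written for this note, not the algorithm code used by loglin; its stopping rule mimics the eps and iter arguments.

```r
# Hypothetical two-way IPF: rescale to match row margins, then column margins.
ipf2 <- function(tab, iter = 20, eps = 0.1) {
  fit <- array(1, dim(tab))                                  # start from a table of ones
  for (k in seq_len(iter)) {
    fit <- fit * (rowSums(tab) / rowSums(fit))               # match the row margin
    fit <- sweep(fit, 2, colSums(tab) / colSums(fit), "*")   # match the column margin
    dev <- max(abs(rowSums(fit) - rowSums(tab)),
               abs(colSums(fit) - colSums(tab)))
    if (dev < eps) break                                     # margins reproduced
  }
  fit
}

tab <- margin.table(HairEyeColor, c(1, 2))   # Hair by Eye
ref <- loglin(tab, list(1, 2), fit = TRUE, print = FALSE)$fit

all.equal(as.vector(ipf2(tab)), as.vector(ref))
```

For the independence model the fitted table is reached after the first sweep, which is in line with the remark about decomposable models.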

Package MASS contains loglm, a front-end to loglin which allows the log-linear model to be specified and fitted in a formula-based manner similar to that of other fitting functions such as lm or glm.


A list with the following components.


the Likelihood Ratio Test statistic.


the Pearson test statistic (X-squared).


the degrees of freedom for the fitted model. There is no adjustment for structural zeros.


list of the margins that were fit. Basically the same as the input margin, but with numbers replaced by names where possible.


An array like table containing the fitted values. Only returned if fit is TRUE.


A list containing the estimated parameters of the model. The ‘standard’ constraints of zero marginal sums (e.g., zero row and column sums for a two factor parameter) are employed. Only returned if param is TRUE.


Kurt Hornik


Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988). The New S Language. Wadsworth & Brooks/Cole.

Haberman, S. J. (1972). Algorithm AS 51: Log-linear fit for contingency tables. Applied Statistics, 21, 218–225. doi:10.2307/2346506.

Agresti, A. (1990). Categorical data analysis. New York: Wiley.

See Also


loglm in package MASS for a user-friendly wrapper.

glm for another way to fit log-linear models.


## Model with all two-factor interactions of hair color, eye color and sex
## (i.e., no three-factor interaction).
fm <- loglin(HairEyeColor, list(c(1, 2), c(1, 3), c(2, 3)))
1 - pchisq(fm$lrt, fm$df)
## Model with no three-factor interactions fits well.

The Log Normal Distribution


Density, distribution function, quantile function and random generation for the log normal distribution whose logarithm has mean equal to meanlog and standard deviation equal to sdlog.


dlnorm(x, meanlog = 0, sdlog = 1, log = FALSE)
plnorm(q, meanlog = 0, sdlog = 1, lower.tail = TRUE, log.p = FALSE)
qlnorm(p, meanlog = 0, sdlog = 1, lower.tail = TRUE, log.p = FALSE)
rlnorm(n, meanlog = 0, sdlog = 1)


x, q

vector of quantiles.

p

vector of probabilities.

n

number of observations. If length(n) > 1, the length is taken to be the number required.

meanlog, sdlog

mean and standard deviation of the distribution on the log scale with default values of 0 and 1 respectively.

log, log.p

logical; if TRUE, probabilities p are given as log(p).

lower.tail

logical; if TRUE (default), probabilities are $P[X \le x]$, otherwise, $P[X > x]$.


The log normal distribution has density

$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma x} e^{-(\log(x) - \mu)^2 / (2\sigma^2)}$$

where $\mu$ and $\sigma$ are the mean and standard deviation of the logarithm. The mean is $E(X) = \exp(\mu + \sigma^2/2)$, the median is $\mathrm{med}(X) = \exp(\mu)$, and the variance is $\mathrm{Var}(X) = \exp(2\mu + \sigma^2)(\exp(\sigma^2) - 1)$; hence the coefficient of variation is $\sqrt{\exp(\sigma^2) - 1}$, which is approximately $\sigma$ when that is small (e.g., $\sigma < 1/2$).
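
The moment formulas above can be checked numerically. The following is a small sketch with arbitrarily chosen parameters (mu = 1, sigma = 0.5 are illustrative, not defaults); it compares the closed forms with qlnorm and with numerical integration of the density.

```r
mu <- 1; sigma <- 0.5                     # arbitrary illustrative parameters

m_theory   <- exp(mu + sigma^2 / 2)       # E(X)
med_theory <- exp(mu)                     # median
v_theory   <- exp(2 * mu + sigma^2) * (exp(sigma^2) - 1)   # Var(X)
cv_theory  <- sqrt(exp(sigma^2) - 1)      # coefficient of variation

## numerical counterparts
m_num   <- integrate(function(x) x * dlnorm(x, mu, sigma), 0, Inf)$value
med_num <- qlnorm(0.5, mu, sigma)
v_num   <- integrate(function(x) (x - m_theory)^2 * dlnorm(x, mu, sigma),
                     0, Inf)$value

c(mean = m_num - m_theory, median = med_num - med_theory,
  var = v_num - v_theory)                 # all close to zero
all.equal(cv_theory, sqrt(v_theory) / m_theory)
```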


dlnorm gives the density, plnorm gives the distribution function, qlnorm gives the quantile function, and rlnorm generates random deviates.

The length of the result is determined by n for rlnorm, and is the maximum of the lengths of the numerical arguments for the other functions.

The numerical arguments other than n are recycled to the length of the result. Only the first elements of the logical arguments are used.


The cumulative hazard $H(t) = -\log(1 - F(t))$ is -plnorm(t, meanlog, sdlog, lower.tail = FALSE, log.p = TRUE).


dlnorm is calculated from the definition (in ‘Details’). [pqr]lnorm are based on the relationship to the normal.

Consequently, they model a single point mass at exp(meanlog) for the boundary case sdlog = 0.
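
A short sketch of the relationship to the normal distribution, of the cumulative hazard on the log scale, and of the boundary case sdlog = 0. The parameter values (meanlog = 1, sdlog = 0.5) and the evaluation points are arbitrary choices for illustration.

```r
q <- c(0.5, 1, 2, 10)
all.equal(plnorm(q, meanlog = 1, sdlog = 0.5),
          pnorm(log(q), mean = 1, sd = 0.5))
all.equal(qlnorm(0.3, 1, 0.5), exp(qnorm(0.3, 1, 0.5)))

## cumulative hazard, computed on the log scale
H <- -plnorm(q, 1, 0.5, lower.tail = FALSE, log.p = TRUE)
all.equal(H, -log(1 - plnorm(q, 1, 0.5)))

## boundary case sdlog = 0: all mass at exp(meanlog)
plnorm(c(2, 3), meanlog = 1, sdlog = 0)   # exp(1) lies between 2 and 3
rlnorm(3, meanlog = 1, sdlog = 0)
```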


Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.

Johnson, N. L., Kotz, S. and Balakrishnan, N. (1995) Continuous Univariate Distributions, volume 1, chapter 14. Wiley, New York.

See Also

Distributions for other standard distributions, including dnorm for the normal distribution.


dlnorm(1) == dnorm(0)

Scatter Plot Smoothing


This function performs the computations for the LOWESS smoother which uses locally-weighted polynomial regression (see the references).


lowess(x, y = NULL, f = 2/3, iter = 3, delta = 0.01 * diff(range(x)))


x, y

vectors giving the coordinates of the points in the scatter plot. Alternatively a single plotting structure can be specified – see xy.coords.


f

the smoother span. This gives the proportion of points in the plot which influence the smooth at each value. Larger values give more smoothness.

iter

the number of ‘robustifying’ iterations which should be performed. Using smaller values of iter will make lowess run faster.

delta

See ‘Details’. Defaults to 1/100th of the range of x.


lowess is defined by a complex algorithm, the Ratfor original of which (by W. S. Cleveland) can be found in the R sources as file ‘src/library/stats/src/lowess.doc’. Normally a local linear polynomial fit is used, but under some circumstances (see the file) a local constant fit can be used. ‘Local’ is defined by the distance to the floor(f*n)-th nearest neighbour, and tricubic weighting is used for x which fall within the neighbourhood.

The initial fit is done using weighted least squares. If iter > 0, further weighted fits are done using the product of the weights from the proximity of the x values and case weights derived from the residuals at the previous iteration. Specifically, the case weight is Tukey's biweight, with cutoff 6 times the MAD of the residuals. (The current R implementation differs from the original in stopping iteration if the MAD is effectively zero since the algorithm is highly unstable in that case.)

delta is used to speed up computation: instead of computing the local polynomial fit at each data point it is not computed for points within delta of the last computed point, and linear interpolation is used to fill in the fitted values for the skipped points.
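
The two weight functions named above can be written down directly, and the effect of delta can be inspected by comparing an exact fit with an interpolated one. This is a sketch only: the helper names tricube and biweight are hypothetical, they follow the textbook definitions rather than the internal code, and the choice of delta is arbitrary.

```r
## Textbook forms of the weights (hypothetical helper names)
tricube  <- function(u) ifelse(abs(u) < 1, (1 - abs(u)^3)^3, 0)
biweight <- function(r, s) ifelse(abs(r) < 6 * s, (1 - (r / (6 * s))^2)^2, 0)

n <- nrow(cars); f <- 2/3
floor(f * n)                       # number of neighbours defining 'local'

lw_exact <- lowess(cars, delta = 0)                       # fit at every point
lw_fast  <- lowess(cars, delta = 0.1 * diff(range(cars$speed)))
max(abs(lw_exact$y - lw_fast$y))   # effect of the linear interpolation
```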


lowess returns a list containing components x and y which give the coordinates of the smooth. The smooth can be added to a plot of the original points with the function lines: see the examples.


Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988). The New S Language. Wadsworth & Brooks/Cole.

Cleveland, W. S. (1979). Robust locally weighted regression and smoothing scatterplots. Journal of the American Statistical Association, 74, 829–836. doi:10.1080/01621459.1979.10481038.

Cleveland, W. S. (1981) LOWESS: A program for smoothing scatterplots by robust locally weighted regression. The American Statistician, 35, 54. doi:10.2307/2683591.

See Also

loess, a newer formula based version of lowess (with different defaults!).



plot(cars, main = "lowess(cars)")
lines(lowess(cars), col = 2)
lines(lowess(cars, f = .2), col = 3)
legend(5, 120, c(paste("f = ", c("2/3", ".2"))), lty = 1, col = 2:3)

Compute Diagnostics for lsfit Regression Results


Computes basic statistics, including standard errors, t- and p-values for the regression coefficients.


ls.diag(ls.out)


ls.out

Typically the result of lsfit()


A list with the following numeric components.

std.dev

The standard deviation of the errors, an estimate of $\sigma$.

hat

diagonal entries $h_{ii}$ of the hat matrix $H$

std.res

standardized residuals

stud.res

studentized residuals

cooks

Cook's distances

dfits

DFITS statistics

correlation

correlation matrix

std.err

standard errors of the regression coefficients

cov.scaled

Scaled covariance matrix of the coefficients

cov.unscaled

Unscaled covariance matrix of the coefficients
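
As a rough cross-check, several of these components can be compared with the corresponding quantities from an lm fit. The sketch below uses the cars data purely for illustration and assumes a single response with an intercept.

```r
fit_ls <- lsfit(cars$speed, cars$dist)
d      <- ls.diag(fit_ls)
fit_lm <- lm(dist ~ speed, data = cars)

all.equal(as.vector(d$std.dev), summary(fit_lm)$sigma)
all.equal(as.vector(d$std.err),
          unname(coef(summary(fit_lm))[, "Std. Error"]))
all.equal(as.vector(d$hat), unname(hatvalues(fit_lm)))
all.equal(as.vector(d$std.res), unname(rstandard(fit_lm)))
```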


Belsley, D. A., Kuh, E. and Welsch, R. E. (1980) Regression Diagnostics. New York: Wiley.

See Also

hat for the hat matrix diagonals, ls.print, lm.influence, summary.lm, anova.


##-- Using the same data as the lm(.) example:
utils::example("lm", echo = FALSE)  # defines 'weight'
lsD9 <- lsfit(x = as.numeric(gl(2, 10, 20)), y = weight)
dlsD9 <- ls.diag(lsD9)
utils::str(dlsD9, give.attr = FALSE)
abs(1 - sum(dlsD9$hat) / 2) < 10*.Machine$double.eps # sum(h.ii) = p
plot(dlsD9$hat, dlsD9$stud.res, xlim = c(0, 0.11))
abline(h = 0, lty = 2, col = "lightgray")

Print lsfit Regression Results


Computes basic statistics, including standard errors, t- and p-values for the regression coefficients and prints them if print.it is TRUE.


ls.print(ls.out, digits = 4, print.it = TRUE)


ls.out

Typically the result of lsfit()

digits

The number of significant digits used for printing

print.it

a logical indicating whether the result should also be printed


A list with the components


summary

The ANOVA table of the regression

coef.table

matrix with regression coefficients, standard errors, t- and p-values


Usually you would use summary(lm(...)) and anova(lm(...)) to obtain similar output.

See Also

ls.diag, lsfit, also for examples; lm, lm.influence which usually are preferable.

Find the Least Squares Fit


The least squares estimate of $\beta$ in the model

$$\mathbf{Y} = \mathbf{X}\beta + \epsilon$$

is found.


lsfit(x, y, wt = NULL, intercept = TRUE, tolerance = 1e-07,
      yname = NULL)



x

a matrix whose rows correspond to cases and whose columns correspond to variables.

y

the responses, possibly a matrix if you want to fit multiple left hand sides.

wt

an optional vector of weights for performing weighted least squares.

intercept

whether or not an intercept term should be used.

tolerance

the tolerance to be used in the matrix decomposition.

yname

names to be used for the response variables.


If weights are specified then a weighted least squares is performed with the weight given to the j-th case specified by the j-th entry in wt.

If any observation has a missing value in any field, that observation is removed before the analysis is carried out. This can be quite inefficient if there is a lot of missing data.

The implementation is via a modification of the LINPACK subroutines which allow for multiple left-hand sides.
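
The weighting and the removal of incomplete cases described above can be illustrated as follows. This is a sketch on the cars data with arbitrary random weights; the comparison with lm is only there as a sanity check.

```r
set.seed(1)
w <- runif(nrow(cars))                     # arbitrary case weights
ls_w <- lsfit(cars$speed, cars$dist, wt = w)
lm_w <- lm(dist ~ speed, data = cars, weights = w)
all.equal(unname(ls_w$coefficients), unname(coef(lm_w)))

## an observation with a missing value is dropped before fitting
y_na <- cars$dist; y_na[1] <- NA
ls_na <- suppressWarnings(lsfit(cars$speed, y_na))
length(ls_na$residuals)                    # one fewer than nrow(cars)
```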


A list with the following named components:


coefficients

the least squares estimates of the coefficients in the model ($\beta$ as stated above).

residuals

residuals from the fit.

intercept

indicates whether an intercept was fitted.

qr

the QR decomposition of the design matrix.


Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.

See Also

lm which usually is preferable; ls.print, ls.diag.


##-- Using the same data as the lm(.) example:
utils::example("lm", echo = FALSE)  # defines 'weight'
lsD9 <- lsfit(x = unclass(gl(2, 10)), y = weight)

Median Absolute Deviation


Compute the median absolute deviation, i.e., the (lo-/hi-) median of the absolute deviations from the median, and (by default) adjust by a factor for asymptotically normal consistency.


mad(x, center = median(x), constant = 1.4826, na.rm = FALSE,
    low = FALSE, high = FALSE)



x

a numeric vector.

center

Optionally, the centre: defaults to the median.

constant

scale factor.

na.rm

if TRUE then NA values are stripped from x before computation takes place.

low

if TRUE, compute the ‘lo-median’, i.e., for even sample size, do not average the two middle values, but take the smaller one.

high

if TRUE, compute the ‘hi-median’, i.e., take the larger of the two middle values for even sample size.


The actual value calculated is constant * cMedian(abs(x - center)) with the default value of center being median(x), and cMedian being the usual, the ‘low’ or ‘high’ median, see the arguments description for low and high above.

In the case of $n = 1$ non-missing values and default center, the result is 0, consistent with “no deviation from the center”.

The default constant = 1.4826 (approximately $1/\Phi^{-1}(3/4)$ = 1/qnorm(3/4)) ensures consistency, i.e.,

$$E[\mathrm{mad}(X_1, \dots, X_n)] = \sigma$$

for $X_i$ distributed as $N(\mu, \sigma^2)$ and large $n$.

If na.rm is TRUE then NA values are stripped from x before computation takes place. If this is not done then an NA value in x will cause mad to return NA.
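
The computation described above can be reproduced by hand. The sketch below uses a small even-length sample so that the usual, ‘low’ and ‘high’ medians differ; the data values are arbitrary.

```r
x <- c(1, 2, 3, 5, 7, 8)
dev <- abs(x - median(x))          # absolute deviations from the median
1 / qnorm(3/4)                     # about 1.4826, the default 'constant'

all.equal(mad(x), 1.4826 * median(dev))
mad(x, constant = 1, low  = TRUE) == sort(dev)[3]   # smaller middle value
mad(x, constant = 1, high = TRUE) == sort(dev)[4]   # larger middle value
mad(c(x, NA))                      # NA
mad(c(x, NA), na.rm = TRUE) == mad(x)
```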

See Also

IQR which is simpler but less robust, median, var.


print(mad(c(1:9),     constant = 1)) ==
      mad(c(1:8, 100), constant = 1)       # = 2 ; TRUE
x <- c(1,2,3,5,7,8)
sort(abs(x - median(x)))
c(mad(x, constant = 1),
  mad(x, constant = 1, low = TRUE),
  mad(x, constant = 1, high = TRUE))

Mahalanobis Distance


Returns the squared Mahalanobis distance of all rows in x and the vector $\mu$ = center with respect to $\Sigma$ = cov. This is (for vector x) defined as

$$D^2 = (x - \mu)' \Sigma^{-1} (x - \mu)$$


mahalanobis(x, center, cov, inverted = FALSE, ...)



x

vector or matrix of data with, say, $p$ columns.

center

mean vector of the distribution or second data vector of length $p$ or recyclable to that length. If set to FALSE, the centering step is skipped.

cov

covariance matrix ($p \times p$) of the distribution.

inverted

logical. If TRUE, cov is supposed to contain the inverse of the covariance matrix.

...

passed to solve for computing the inverse of the covariance matrix (if inverted is false).
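
For a single vector, the definition can be evaluated directly and compared with the function, including the inverted = TRUE form. A minimal sketch with an arbitrary 2 x 2 covariance matrix:

```r
ma <- cbind(1:6, 1:3)
S  <- var(ma)
x  <- c(0, 0); mu <- 1:2

## D^2 = (x - mu)' S^{-1} (x - mu), computed from the definition
d2_manual <- drop(t(x - mu) %*% solve(S) %*% (x - mu))
all.equal(d2_manual, mahalanobis(x, mu, S))
all.equal(d2_manual, mahalanobis(x, mu, solve(S), inverted = TRUE))
```

Passing a precomputed inverse with inverted = TRUE avoids repeated calls to solve when the same covariance matrix is reused.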

See Also

cov, var



ma <- cbind(1:6, 1:3)
(S <-  var(ma))
mahalanobis(c(0, 0), 1:2, S)

x <- matrix(rnorm(100*3), ncol = 3)
stopifnot(mahalanobis(x, 0, diag(ncol(x))) == rowSums(x*x))
        ##- Here, D^2 = usual squared Euclidean distances

Sx <- cov(x)
D2 <- mahalanobis(x, colMeans(x), Sx)
plot(density(D2, bw = 0.5),
     main="Squared Mahalanobis distances, n=100, p=3") ; rug(D2)
qqplot(qchisq(ppoints(100), df = 3), D2,
       main = expression("Q-Q plot of Mahalanobis" * ~D^2 *
                         " vs. quantiles of" * ~ chi[3]^2))
abline(0, 1, col = 'gray')

Utility Function for Safe Prediction


A utility to help model.frame.default create the right matrices when predicting from models with terms like (univariate) poly or ns.


makepredictcall(var, call)



var

A variable.

call

The term in the formula, as a call.


This is a generic function with methods for poly, bs and ns: the default method handles scale. If model.frame.default encounters such a term when creating a model frame, it modifies the predvars attribute of the terms supplied by replacing the term with one which will work for predicting new data. For example makepredictcall.ns adds arguments for the knots and intercept.

To make use of this, have your model-fitting function return the terms attribute of the model frame, or copy the predvars attribute of the terms attribute of the model frame to your terms object.

To extend this, make sure the term creates variables with a class, and write a suitable method for that class.
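
To see the effect, one can compare the variables and predvars attributes of the terms of a fitted model. The sketch below uses the women data and a quadratic poly term as an illustration.

```r
fm <- lm(weight ~ poly(height, 2), data = women)
attr(terms(fm), "variables")   # the term as written in the formula
pv <- attr(terms(fm), "predvars")
pv                             # the poly() call now carries a 'coefs' argument
names(pv[[3]])
```

The extra coefs argument is what lets predict reuse the basis computed at fitting time for new data.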


A replacement for call for the predvars attribute of the terms.

See Also

model.frame, poly, scale; bs and ns in package splines.

cars for an example of prediction from a polynomial fit.



## using poly: this did not work in R < 1.5.0
fm <- lm(weight ~ poly(height, 2), data = women)
plot(women, xlab = "Height (in)", ylab = "Weight (lb)")
ht <- seq(57, 73, length.out = 200)
nD <- data.frame(height = ht)
pfm <- predict(fm, nD)
lines(ht, pfm)
pf2 <- predict(update(fm, ~ stats::poly(height, 2)), nD)
stopifnot(all.equal(pfm, pf2)) ## was off (rel.diff. 0.0766) in R <= 3.5.0

## see also example(cars)

## see bs and ns for spline examples.

Multivariate Analysis of Variance


A class for the multivariate analysis of variance.


manova(...)


...

Arguments to be passed to aov.


Class "manova" differs from class "aov" in selecting a different summary method. Function manova calls aov and then adds class "manova" to the result object for each stratum.


See aov and the comments in ‘Details’ here.


Krzanowski, W. J. (1988) Principles of Multivariate Analysis. A User's Perspective. Oxford.

Hand, D. J. and Taylor, C. C. (1987) Multivariate Analysis of Variance and Repeated Measures. Chapman and Hall.

See Also

aov, summary.manova, the latter containing more examples.


## Set orthogonal contrasts.
op <- options(contrasts = c("contr.helmert", "contr.poly"))

## Fake a 2nd response variable
npk2 <- within(npk, foo <- rnorm(24))
( npk2.aov <- manova(cbind(yield, foo) ~ block + N*P*K, npk2) )

( npk2.aovE <- manova(cbind(yield, foo) ~  N*P*K + Error(block), npk2) )

options(op)  # restore the previous contrasts

Cochran-Mantel-Haenszel Chi-Squared Test for Count Data


Performs a Cochran-Mantel-Haenszel chi-squared test of the null that two nominal variables are conditionally independent in each stratum, assuming that there is no three-way interaction.


mantelhaen.test(x, y = NULL, z = NULL,
                alternative = c("two.sided", "less", "greater"),
                correct = TRUE, exact = FALSE, conf.level = 0.95)



x

either a 3-dimensional contingency table in array form where each dimension is at least 2 and the last dimension corresponds to the strata, or a factor object with at least 2 levels.

y

a factor object with at least 2 levels; ignored if x is an array.

z

a factor object with at least 2 levels identifying to which stratum the corresponding elements in x and y belong; ignored if x is an array.

alternative

indicates the alternative hypothesis and must be one of "two.sided", "greater" or "less". You can specify just the initial letter. Only used in the 2 by 2 by $K$ case.

correct

a logical indicating whether to apply continuity correction when computing the test statistic. Only used in the 2 by 2 by $K$ case.

exact

a logical indicating whether the Mantel-Haenszel test or the exact conditional test (given the strata margins) should be computed. Only used in the 2 by 2 by $K$ case.

conf.level

confidence level for the returned confidence interval. Only used in the 2 by 2 by $K$ case.


If x is an array, each dimension must be at least 2, and the entries should be nonnegative integers. NA's are not allowed. Otherwise, x, y and z must have the same length. Triples containing NA's are removed. All variables must take at least two different values.


A list with class "htest" containing the following components:


statistic

Only present if no exact test is performed. In the classical case of a 2 by 2 by $K$ table (i.e., of dichotomous underlying variables), the Mantel-Haenszel chi-squared statistic; otherwise, the generalized Cochran-Mantel-Haenszel statistic.

parameter

the degrees of freedom of the approximate chi-squared distribution of the test statistic (1 in the classical case). Only present if no exact test is performed.

p.value

the p-value of the test.

conf.int

a confidence interval for the common odds ratio. Only present in the 2 by 2 by $K$ case.

estimate

an estimate of the common odds ratio. If an exact test is performed, the conditional Maximum Likelihood Estimate is given; otherwise, the Mantel-Haenszel estimate. Only present in the 2 by 2 by $K$ case.

null.value

the common odds ratio under the null of independence, 1. Only present in the 2 by 2 by $K$ case.

alternative

a character string describing the alternative hypothesis. Only present in the 2 by 2 by $K$ case.

method

a character string indicating the method employed, and whether or not continuity correction was used.

data.name

a character string giving the names of the data.


The asymptotic distribution is only valid if there is no three-way interaction. In the classical 2 by 2 by $K$ case, this is equivalent to the conditional odds ratios in each stratum being identical. Currently, no inference on homogeneity of the odds ratios is performed.

See also the example below.
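
As an informal illustration of the common odds ratio in the 2 by 2 by $K$ case, the sketch below computes the usual Mantel-Haenszel estimator by hand (the textbook formula, stated here as an assumption about what the estimate component holds when no exact test is requested) and lists the stratum-specific odds ratios for comparison.

```r
x <- UCBAdmissions                      # 2 x 2 x 6 table, departments as strata
n_k <- apply(x, 3, sum)                 # stratum totals

## Mantel-Haenszel estimate of the common odds ratio
or_mh <- sum(x[1, 1, ] * x[2, 2, ] / n_k) /
         sum(x[1, 2, ] * x[2, 1, ] / n_k)
res <- mantelhaen.test(x)
all.equal(or_mh, unname(res$estimate))

## conditional odds ratios per stratum: identical only without interaction
apply(x, 3, function(t) (t[1, 1] * t[2, 2]) / (t[1, 2] * t[2, 1]))
```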


Alan Agresti (1990). Categorical data analysis. New York: Wiley. Pages 230–235.

Alan Agresti (2002). Categorical data analysis (second edition). New York: Wiley.


## Agresti (1990), pages 231--237, Penicillin and Rabbits
## Investigation of the effectiveness of immediately injected or 1.5
##  hours delayed penicillin in protecting rabbits against a lethal
##  injection with beta-hemolytic streptococci.
Rabbits <-
array(c(0, 0, 6, 5,
        3, 0, 3, 6,
        6, 2, 0, 4,
        5, 6, 1, 0,
        2, 5, 0, 0),
      dim = c(2, 2, 5),
      dimnames = list(
          Delay = c("None", "1.5h"),
          Response = c("Cured", "Died"),
          Penicillin.Level = c("1/8", "1/4", "1/2", "1", "4")))
## Classical Mantel-Haenszel test
mantelhaen.test(Rabbits)
## => p = 0.047, some evidence for higher cure rate of immediate
##               injection
## Exact conditional test
mantelhaen.test(Rabbits, exact = TRUE)
## => p = 0.040
## Exact conditional test for one-sided alternative of a higher
## cure rate for immediate injection
mantelhaen.test(Rabbits, exact = TRUE, alternative = "greater")
## => p = 0.020

## UC Berkeley Student Admissions
mantelhaen.test(UCBAdmissions)
## No evidence for association between admission and gender
## when adjusted for department.  However,
apply(UCBAdmissions, 3, function(x) (x[1,1]*x[2,2])/(x[1,2]*x[2,1]))
## This suggests that the assumption of homogeneous (conditional)
## odds ratios may be violated.  The traditional approach would be
## using the Woolf test for interaction:
woolf <- function(x) {
  x <- x + 1 / 2
  k <- dim(x)[3]
  or <- apply(x, 3, function(x) (x[1,1]*x[2,2])/(x[1,2]*x[2,1]))
  w <-  apply(x, 3, function(x) 1 / sum(1 / x))
  1 - pchisq(sum(w * (log(or) - weighted.mean(log(or), w)) ^ 2), k - 1)
}
woolf(UCBAdmissions)
## => p = 0.003, indicating that there is significant heterogeneity.
## (And hence the Mantel-Haenszel test cannot be used.)

## Agresti (2002), p. 287f and p. 297.
## Job Satisfaction example.
Satisfaction <-
    as.table(array(c(1, 2, 0, 0, 3, 3, 1, 2,
                     11, 17, 8, 4, 2, 3, 5, 2,
                     1, 0, 0, 0, 1, 3, 0, 1,
                     2, 5, 7, 9, 1, 1, 3, 6),
                   dim = c(4, 4, 2),
                   dimnames =
                   list(Income =
                        c("<5000", "5000-15000",
                          "15000-25000", ">25000"),
                        "Job Satisfaction" =
                        c("V_D", "L_S", "M_S", "V_S"),
                        Gender = c("Female", "Male"))))
## (Satisfaction categories abbreviated for convenience.)
ftable(. ~ Gender + Income, Satisfaction)
## Table 7.8 in Agresti (2002), p. 288.
mantelhaen.test(Satisfaction)
## See Table 7.12 in Agresti (2002), p. 297.

Mauchly's Test of Sphericity


Tests whether a Wishart-distributed covariance matrix (or transformation thereof) is proportional to a given matrix.


mauchly.test(object, ...)
## S3 method for class 'mlm'
mauchly.test(object, ...)
## S3 method for class 'SSD'
mauchly.test(object, Sigma = diag(nrow = p),
   T = Thin.row(Proj(M) - Proj(X)), M = diag(nrow = p), X = ~0,
   idata = data.frame(index = seq_len(p)), ...)



object

object of class SSD or mlm.

Sigma

matrix to be proportional to.

T

transformation matrix. By default computed from M and X.

M

formula or matrix describing the outer projection (see below).

X

formula or matrix describing the inner projection (see below).

idata

data frame describing intra-block design.

...

arguments to be passed to or from other methods.


This is a generic function with methods for classes "mlm" and "SSD".

The basic method is for objects of class SSD; the method for mlm objects just extracts the SSD matrix and invokes the corresponding method with the same options and arguments.

The T argument is used to transform the observations prior to testing. This typically involves transformation to intra-block differences, but more complicated within-block designs can be encountered, making more elaborate transformations necessary. A matrix T can be given directly or specified as the difference between two projections onto the spaces spanned by M and X, which in turn can be given as matrices or as model formulas with respect to idata (the tests will be invariant to parametrization of the quotient space M/X).

The common use of this test is in repeated measurements designs, with X = ~1. This is almost, but not quite the same as testing for compound symmetry in the untransformed covariance matrix.

Notice that the defaults involve p, which is calculated internally as the dimension of the SSD matrix, and a couple of hidden functions in the stats namespace, namely Proj, which calculates projection matrices from design matrices or model formulas, and Thin.row, which removes linearly dependent rows from a matrix until it has full row rank.


An object of class "htest"


The p-value differs slightly from that of SAS because a second order term is included in the asymptotic approximation in R.


T. W. Anderson (1958). An Introduction to Multivariate Statistical Analysis. Wiley.

See Also

SSD, anova.mlm, rWishart


utils::example(SSD) # Brings in the mlmfit and reacttime objects

### traditional test of intrasubj. contrasts
mauchly.test(mlmfit, X = ~1)

### tests using intra-subject 3x2 design
idata <- data.frame(deg = gl(3, 1, 6, labels = c(0,4,8)),
                    noise = gl(2, 3, 6, labels = c("A","P")))
mauchly.test(mlmfit, X = ~ deg + noise, idata = idata)
mauchly.test(mlmfit, M = ~ deg + noise, X = ~ noise, idata = idata)

McNemar's Chi-squared Test for Count Data


Performs McNemar's chi-squared test for symmetry of rows and columns in a two-dimensional contingency table.


mcnemar.test(x, y = NULL, correct = TRUE)



x

either a two-dimensional contingency table in matrix form, or a factor object.

y

a factor object; ignored if x is a matrix.

correct

a logical indicating whether to apply continuity correction when computing the test statistic.


The null is that the probabilities of being classified into cells [i,j] and [j,i] are the same.

If x is a matrix, it is taken as a two-dimensional contingency table, and hence its entries should be nonnegative integers. Otherwise, both x and y must be vectors or factors of the same length. Incomplete cases are removed, vectors are coerced into factors, and the contingency table is computed from these.

Continuity correction is only used in the 2-by-2 case if correct is TRUE.


A list with class "htest" containing the following components:


statistic

the value of McNemar's statistic.

parameter

the degrees of freedom of the approximate chi-squared distribution of the test statistic.

p.value

the p-value of the test.

method

a character string indicating the type of test performed, and whether continuity correction was used.

data.name

a character string giving the name(s) of the data.


Alan Agresti (1990). Categorical data analysis. New York: Wiley. Pages 350–354.


## Agresti (1990), p. 350.
## Presidential Approval Ratings.
##  Approval of the President's performance in office in two surveys,
##  one month apart, for a random sample of 1600 voting-age Americans.
Performance <-
matrix(c(794, 86, 150, 570),
       nrow = 2,
       dimnames = list("1st Survey" = c("Approve", "Disapprove"),
                       "2nd Survey" = c("Approve", "Disapprove")))
Performance
mcnemar.test(Performance)
## => significant change (in fact, drop) in approval ratings

Median Value


Compute the sample median.


median(x, na.rm = FALSE, ...)
## Default S3 method:
median(x, na.rm = FALSE, ...)



x

an object for which a method has been defined, or a numeric vector containing the values whose median is to be computed.

na.rm

a logical value indicating whether NA values should be stripped before the computation proceeds.

...

potentially further arguments for methods; not used in the default method.


This is a generic function for which methods can be written. However, the default method makes use of is.na, sort and mean from package base, all of which are generic, and so the default method will work for most classes (e.g., "Date") for which a median is a reasonable concept.


The default method returns a length-one object of the same type as x, except when x is logical or integer of even length, when the result will be double.

If there are no values or if na.rm = FALSE and there are NA values the result is NA of the same type as x (or more generally the result of x[NA_integer_]).
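
The type rules above are easiest to see on a few small inputs. The values below are arbitrary; the Date case illustrates that the default method also works for classed objects.

```r
median(1:4)                       # even-length integer input gives a double
typeof(median(c(1L, 3L, 5L)))     # odd length keeps the integer type
median(integer(0))                # no values: NA of the same type
median(c(1, NA))                  # NA unless na.rm = TRUE
median(c(1, NA, 3), na.rm = TRUE)
median(as.Date(c("2020-01-01", "2020-01-03", "2020-01-05")))
```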


Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.

See Also

quantile for general quantiles.


median(1:4)                # = 2.5 [even number]
median(c(1:3, 100, 1000))  # = 3 [odd, robust]

Median Polish (Robust Two-way Decomposition) of a Matrix


Fits an additive model (two-way decomposition) using Tukey's median polish procedure.


medpolish(x, eps = 0.01, maxiter = 10, trace.iter = TRUE,
          na.rm = FALSE)



x

a numeric matrix.

eps

real number greater than 0. A tolerance for convergence: see ‘Details’.

maxiter

the maximum number of iterations

trace.iter

logical. Should progress in convergence be reported?

na.rm

logical. Should missing values be removed?


The model fitted is additive (constant + rows + columns). The algorithm works by alternately removing the row and column medians, and continues until the proportional reduction in the sum of absolute residuals is less than eps or until there have been maxiter iterations. The sum of absolute residuals is printed at each iteration of the fitting process, if trace.iter is TRUE. If na.rm is FALSE the presence of any NA value in x will cause an error, otherwise NA values are ignored.
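
The additive structure means that the fitted pieces always add back up to the data. A minimal sketch on an arbitrary 3 x 3 matrix (the values are made up for illustration):

```r
x <- matrix(c(1, 2,  3,
              4, 5, 12,
              7, 8,  9), nrow = 3, byrow = TRUE)
mp <- medpolish(x, trace.iter = FALSE)

## constant + rows + columns + residuals reproduces the data
recon <- mp$overall + outer(mp$row, mp$col, `+`) + mp$residuals
all.equal(recon, x, check.attributes = FALSE)
```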

medpolish returns an object of class medpolish (see below). There are printing and plotting methods for this class, which are invoked via the generics print and plot.


An object of class medpolish with the following named components:


overall

the fitted constant term.

row

the fitted row effects.

col

the fitted column effects.

residuals

the residuals.

name

the name of the dataset.


Tukey, J. W. (1977). Exploratory Data Analysis, Reading Massachusetts: Addison-Wesley.

See Also

median; aov for a mean instead of median decomposition.



## Deaths from sport parachuting;  from ABC of EDA, p.224:
deaths <-
    rbind(c(14,15,14),
          c( 7, 4, 7),
          c( 8, 2,10),
          c(15, 9,10),
          c( 0, 2, 0))
dimnames(deaths) <- list(c("1-24", "25-74", "75-199", "200++", "NA"),
                         paste(1973:1975))
deaths
(med.d <- medpolish(deaths))
## Check decomposition:
all(deaths ==
    med.d$overall + outer(med.d$row,med.d$col, `+`) + med.d$residuals)

Extract Components from a Model Frame


Returns the response, offset, subset, weights or other special components of a model frame passed as optional arguments to model.frame.


model.extract(frame, component)
model.offset(x)
model.response(data, type = "any")
model.weights(x)


frame, x, data

a model frame, see model.frame.

component

literal character string or name. The name of a component to extract, such as "weights" or "subset".

type

One of "any", "numeric" or "double". Using either of the latter two coerces the result to have storage mode "double".


model.extract is provided for compatibility with S, which does not have the more specific functions. It is also useful to extract e.g. the etastart and mustart components of a glm fit.

model.extract(m, "offset") and model.extract(m, "response") are equivalent to model.offset(m) and model.response(m) respectively. model.offset sums any terms specified by offset terms in the formula or by offset arguments in the call producing the model frame: it does check that the offset is numeric.

model.weights is slightly different from model.extract(, "weights") in not naming the vector it returns.
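
The behaviour of model.offset and model.weights described above can be seen on a small example. This sketch uses the cars data; the particular offset and weight expressions are arbitrary.

```r
## one offset term in the formula plus one 'offset' argument: they are summed
mf <- model.frame(dist ~ speed + offset(log(speed)), data = cars,
                  offset = speed / 10)
all.equal(as.numeric(model.offset(mf)), log(cars$speed) + cars$speed / 10)
is.null(model.offset(model.frame(dist ~ speed, data = cars)))

## model.extract() names the weights, model.weights() does not
mfw <- model.frame(dist ~ speed, data = cars, weights = speed)
names(model.extract(mfw, "weights"))[1:3]
names(model.weights(mfw))
```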


The specified component of the model frame, usually a vector. model.response() now drops a possible "AsIs" class (stemming from I(.)).

model.offset returns NULL if no offset was specified.

See Also

model.frame, offset


a <- model.frame(cbind(ncases,ncontrols) ~ agegp + tobgp + alcgp, data = esoph)
model.extract(a, "response")
stopifnot(model.extract(a, "response") == model.response(a))

a <- model.frame(ncases/(ncases+ncontrols) ~ agegp + tobgp + alcgp,
                 data = esoph, weights = ncases+ncontrols)
(mw <- model.extract(a, "weights"))
stopifnot(identical(unname(mw), model.weights(a)))

a <- model.frame(cbind(ncases,ncontrols) ~ agegp,
                 something = tobgp, data = esoph)
stopifnot(model.extract(a, "something") == esoph$tobgp)

Extracting the Model Frame from a Formula or Fit


model.frame (a generic function) and its methods return a data.frame with the variables needed to use formula and any ... arguments.


model.frame(formula, ...)

## Default S3 method:
model.frame(formula, data = NULL,
            subset = NULL, na.action,
            drop.unused.levels = FALSE, xlev = NULL, ...)

## S3 method for class 'aovlist'
model.frame(formula, data = NULL, ...)

## S3 method for class 'glm'
model.frame(formula, ...)

## S3 method for class 'lm'
model.frame(formula, ...)

get_all_vars(formula, data, ...)



formula

a model formula or terms object or an R object.

data

a data frame, list or environment (or object coercible by as.data.frame to a data frame), containing the variables in formula. Neither a matrix nor an array will be accepted.

subset

a specification of the rows/observations to be used: defaults to all. This can be any valid indexing vector (see [.data.frame) for the rows of data, or a (logical) expression using variables in data or, if that is not supplied, in formula. (See additional details about how this argument interacts with data-dependent bases under ‘Details’ below.)

na.action

an optional (name of a) function for treating missing values (NAs). The default is first, any na.action attribute of data, second a na.action setting of options, and third na.fail if that is unset. The ‘factory-fresh’ default is na.omit. Another possible value is NULL.

drop.unused.levels

should factors have unused levels dropped? Defaults to FALSE.

xlev

a named list of character vectors giving the full set of levels to be assumed for each factor.

...

for model.frame methods, a mix of further arguments such as data, na.action, subset to pass to the default method. Any additional arguments (such as offset and weights or other named arguments) which reach the default method are used to create further columns in the model frame, with parenthesised names such as "(offset)".

For get_all_vars, further named columns to include in the model frame.


Exactly what happens depends on the class and attributes of the object formula. If this is an object of fitted-model class such as "lm", the method will either return the saved model frame used when fitting the model (if any, often selected by argument model = TRUE) or pass the call used when fitting on to the default method. The default method itself can cope with rather standard model objects such as those of class "lqs" from package MASS if no other arguments are supplied.

The rest of this section applies only to the default method.

If either formula or data is already a model frame (a data frame with a "terms" attribute) and the other is missing, the model frame is returned. Unless formula is a terms object, as.formula and then terms is called on it. (If you wish to use the keep.order argument of terms.formula, pass a terms object rather than a formula.)

Row names for the model frame are taken from the data argument if present, then from the names of the response in the formula (or rownames if it is a matrix), if there is one.

All the variables in formula, subset and in ... are looked for first in data and then in the environment of formula (see the help for formula() for further details) and collected into a data frame. Then the subset expression is evaluated, and it is used as a row index to the data frame. Then the na.action function is applied to the data frame (and may well add attributes). The levels of any factors in the data frame are adjusted according to the drop.unused.levels and xlev arguments: if xlev specifies a factor and a character variable is found, it is converted to a factor (as from R 2.10.0).

Because variables in the formula are evaluated before rows are dropped based on subset, the characteristics of data-dependent bases such as orthogonal polynomials (i.e. from terms using poly) or splines will be computed based on the full data set rather than the subsetted one.
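As a minimal sketch of this behaviour, assuming the built-in cars data, the following compares a model frame built with subset against one built from data that were subsetted beforehand. The object names (mf_sub, mf_pre, lin_sub, lin_pre) are illustrative only.

```r
## subset applied inside model.frame(): the basis is built from all 50 rows
mf_sub <- model.frame(dist ~ poly(speed, 2), data = cars, subset = speed < 10)
## data subsetted first: the basis is built from the retained rows only
mf_pre <- model.frame(dist ~ poly(speed, 2), data = cars[cars$speed < 10, ])

## linear basis column from each model frame
lin_sub <- unclass(mf_sub[[2]])[, 1]
lin_pre <- unclass(mf_pre[[2]])[, 1]

sum(lin_pre)  # essentially zero: centred on the retained rows
sum(lin_sub)  # clearly non-zero: centred on the full data set
```

If the basis should reflect only the retained rows, subsetting the data before calling model.frame is one way to get that.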

Unless na.action = NULL, time-series attributes will be removed from the variables found (since they will be wrong if NAs are removed).

Note that all the variables in the formula are included in the data frame, even those preceded by -.

Only variables whose type is raw, logical, integer, real, complex or character can be included in a model frame: this includes classed variables such as factors (whose underlying type is integer), but excludes lists.

get_all_vars returns a data.frame containing the variables used in formula plus those specified in ... which are recycled to the number of data frame rows. Unlike model.frame.default, it returns the input variables and not those resulting from function calls in formula.
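The difference between the two functions, and the treatment of variables preceded by -, can be seen from the column names they produce. This is only an illustration using the built-in trees data; the object names are hypothetical.

```r
## model.frame() stores the evaluated function calls from the formula ...
nm_mf  <- names(model.frame(log(Volume) ~ log(Height), data = trees))
## ... whereas get_all_vars() returns the untransformed input variables
nm_all <- names(get_all_vars(log(Volume) ~ log(Height), data = trees))
nm_mf
nm_all

## a variable removed from the model with '-' is still kept in the model frame
nm_minus <- names(model.frame(Volume ~ Girth - Height, data = trees))
nm_minus
```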


A data.frame containing the variables used in formula plus those specified in .... It will have additional attributes, including "terms" for an object of class "terms" derived from formula, and possibly "na.action" giving information on the handling of NAs (which will not be present if no special handling was done, e.g. by na.pass).


Chambers, J. M. (1992) Data for models. Chapter 3 of Statistical Models in S eds J. M. Chambers and T. J. Hastie, Wadsworth & Brooks/Cole.

See Also

model.matrix for the ‘design matrix’, formula for formulas, model.extract to extract components, and expand.model.frame for model.frame manipulation.


data.class(model.frame(dist ~ speed, data = cars))

## using a subset and an extra variable
model.frame(dist ~ speed, data = cars, subset = speed < 10, z = log(dist))

## get_all_vars(): new var.s are recycled (iff length matches: 50 = 2*25)
ncars <- get_all_vars(sqrt(dist) ~ I(speed/2), data = cars, newVar = 2:3)
stopifnot(is.data.frame(ncars),
          identical(cars, ncars[,names(cars)]),
          ncol(ncars) == ncol(cars) + 1)

Construct Design Matrices


model.matrix creates a design (or model) matrix, e.g., by expanding factors to a set of dummy variables (depending on the contrasts) and expanding interactions similarly.


model.matrix(object, ...)

## Default S3 method:
model.matrix(object, data = environment(object),
             contrasts.arg = NULL, xlev = NULL, ...)
## S3 method for class 'lm'
model.matrix(object, ...)



an object of an appropriate class. For the default method, a model formula or a terms object.


a data frame created with model.frame. If another sort of object, model.frame is called first.


a list, whose entries are values (numeric matrices, functions or character strings naming functions) to be used as replacement values for the contrasts replacement function and whose names are the names of columns of data containing factors.


to be used as argument of model.frame if data is such that model.frame is called.


further arguments passed to or from other methods.


model.matrix creates a design matrix from the description given in terms(object), using the data in data which must supply variables with the same names as would be created by a call to model.frame(object) or, more precisely, by evaluating attr(terms(object), "variables"). If data is a data frame, there may be other columns and the order of columns is not important. Any character variables are coerced to factors. After coercion, all the variables used on the right-hand side of the formula must be logical, integer, numeric or factor.

If contrasts.arg is specified for a factor it overrides the default factor coding for that variable and any "contrasts" attribute set by C or contrasts. Invalid contrasts.arg entries have always been ignored; since R version 3.6.0 they also produce a warning.

In an interaction term, the variable whose levels vary fastest is the first one to appear in the formula (and not in the term), so in ~ a + b + b:a the interaction will have a varying fastest.

By convention, if the response variable also appears on the right-hand side of the formula it is dropped (with a warning), although interactions involving the term are retained.


The design matrix for a regression-like model with the specified formula and data.

There is an attribute "assign", an integer vector with an entry for each column in the matrix giving the term in the formula which gave rise to the column. Value 0 corresponds to the intercept (if any), and positive values to terms in the order given by the term.labels attribute of the terms structure corresponding to object.

If there are any factors in terms in the model, there is an attribute "contrasts", a named list with an entry for each factor. This specifies the contrasts that would be used in terms in which the factor is coded by contrasts (in some terms dummy coding may be used), either as a character vector naming a function or as a numeric matrix.
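The two attributes can be inspected directly. The sketch below assumes a balanced two-way layout like the one used in the examples for this page; the comments describe what the attributes are expected to hold in that case.

```r
dd <- data.frame(a = gl(3, 4), b = gl(4, 1, 12))  # balanced 2-way layout
mm <- model.matrix(~ a + b, dd)

colnames(mm)
attr(mm, "assign")            # 0 = intercept, 1 = columns from 'a', 2 = columns from 'b'
names(attr(mm, "contrasts"))  # one entry per factor

## pick out the columns generated by the second term ('b')
mm_b <- mm[, attr(mm, "assign") == 2, drop = FALSE]
ncol(mm_b)
```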


Chambers, J. M. (1992) Data for models. Chapter 3 of Statistical Models in S eds J. M. Chambers and T. J. Hastie, Wadsworth & Brooks/Cole.

See Also

model.frame, model.extract, terms

sparse.model.matrix from package Matrix for creating sparse model matrices, which may be more efficient in large dimensions.


ff <- log(Volume) ~ log(Height) + log(Girth)
utils::str(m <- model.frame(ff, trees))
mat <- model.matrix(ff, m)

dd <- data.frame(a = gl(3,4), b = gl(4,1,12)) # balanced 2-way
options("contrasts") # typically 'treatment' (for unordered factors)
model.matrix(~ a + b, dd)
model.matrix(~ a + b, dd, contrasts.arg = list(a = "contr.sum"))
model.matrix(~ a + b, dd, contrasts.arg = list(a = "contr.sum", b = contr.poly))
m.orth <- model.matrix(~a+b, dd, contrasts.arg = list(a = "contr.helmert"))
crossprod(m.orth) # m.orth is  ALMOST  orthogonal
# invalid contrasts.. ignored with a warning:
stopifnot(identical(
   model.matrix(~ a + b, dd),
   model.matrix(~ a + b, dd, contrasts.arg = "contr.FOO")))

Compute Tables of Results from an aov Model Fit


Computes summary tables for model fits, especially complex aov fits.


model.tables(x, ...)

## S3 method for class 'aov'
model.tables(x, type = "effects", se = FALSE, cterms, ...)

## S3 method for class 'aovlist'
model.tables(x, type = "effects", se = FALSE, ...)



a model object, usually produced by aov


type of table: currently only "effects" and "means" are implemented. Can be abbreviated.


should standard errors be computed?


A character vector giving the names of the terms for which tables should be computed. The default is all tables.


further arguments passed to or from other methods.


For type = "effects" gives tables of the coefficients for each term, optionally with standard errors.

For type = "means" gives tables of the mean response for each combination of levels of the factors in a term.

The "aov" method cannot be applied to components of an "aovlist" fit.


An object of class "tables.aov", a list which may contain components


A list of tables for each requested term.


The replication information for each term.


Standard error information.


The implementation is incomplete, and only the simpler cases have been tested thoroughly.

Weighted aov fits are not supported.

See Also

aov, proj, replications, TukeyHSD, se.contrast


options(contrasts = c("contr.helmert", "contr.treatment"))
npk.aov <- aov(yield ~ block + N*P*K, npk)
model.tables(npk.aov, "means", se = TRUE)

## as a test, not particularly sensible statistically
npk.aovE <- aov(yield ~  N*P*K + Error(block), npk)
model.tables(npk.aovE, se = TRUE)
model.tables(npk.aovE, "means")

Plot a Seasonal or other Subseries from a Time Series


These functions plot seasonal (or other) subseries of a time series. For each season (or other category), a time series is plotted.


monthplot(x, ...)

## S3 method for class 'stl'
monthplot(x, labels = NULL, ylab = choice, choice = "seasonal",
          ...)

## S3 method for class 'StructTS'
monthplot(x, labels = NULL, ylab = choice, choice = "sea", ...)

## S3 method for class 'ts'
monthplot(x, labels = NULL, times = time(x), phase = cycle(x),
             ylab = deparse1(substitute(x)), ...)

## Default S3 method:
monthplot(x, labels = 1L:12L,
          ylab = deparse1(substitute(x)),
          times = seq_along(x),
          phase = (times - 1L)%%length(labels) + 1L, base = mean,
          axes = TRUE, type = c("l", "h"), box = TRUE,
          add = FALSE,
          col = par("col"), lty = par("lty"), lwd = par("lwd"),
          col.base = col, lty.base = lty, lwd.base = lwd, ...)



Time series or related object.


Labels to use for each ‘season’.


y label.


Time of each observation.


Indicator for each ‘season’.


Function to use for reference line for subseries.


Which series of an stl or StructTS object?


Arguments to be passed to the default method or graphical parameters.


Should axes be drawn (ignored if add = TRUE)?


Type of plot. The default is to join the points with lines, and "h" is for histogram-like vertical lines.


Should a box be drawn (ignored if add = TRUE)?


Should this just add to an existing plot?

col, lty, lwd

Graphics parameters for the series.

col.base, lty.base, lwd.base

Graphics parameters for the segments used for the reference lines.


These functions extract subseries from a time series and plot them all in one frame. The ts, stl, and StructTS methods use the internally recorded frequency and start and finish times to set the scale and the seasons. The default method assumes observations come in groups of 12 (though this can be changed).

If the labels are not given but the phase is given, then the labels default to the unique values of the phase. If both are given, then the phase values are assumed to be indices into the labels array, i.e., they should be in the range from 1 to length(labels).


These functions are executed for their side effect of drawing a seasonal subseries plot on the current graphical window.


Duncan Murdoch


Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.

See Also

ts, stl, StructTS



## The CO2 data
fit <- stl(log(co2), s.window = 20, t.window = 20)
op <- par(mfrow = c(2,2))
monthplot(co2, ylab = "data", cex.axis = 0.8)
monthplot(fit, choice = "seasonal", cex.axis = 0.8)
monthplot(fit, choice = "trend", cex.axis = 0.8)
monthplot(fit, choice = "remainder", type = "h", cex.axis = 0.8)

## The CO2 data, grouped quarterly
quarter <- (cycle(co2) - 1) %/% 3
monthplot(co2, phase = quarter)

## see also JohnsonJohnson

Mood Two-Sample Test of Scale


Performs Mood's two-sample test for a difference in scale parameters.


mood.test(x, ...)

## Default S3 method:
mood.test(x, y,
          alternative = c("two.sided", "less", "greater"), ...)

## S3 method for class 'formula'
mood.test(formula, data, subset, na.action, ...)


x, y

numeric vectors of data values.


indicates the alternative hypothesis and must be one of "two.sided" (default), "greater" or "less" all of which can be abbreviated.


a formula of the form lhs ~ rhs where lhs is a numeric variable giving the data values and rhs a factor with two levels giving the corresponding groups.


an optional matrix or data frame (or similar: see model.frame) containing the variables in the formula formula. By default the variables are taken from environment(formula).


an optional vector specifying a subset of observations to be used.


a function which indicates what should happen when the data contain NAs. Defaults to getOption("na.action").


further arguments to be passed to or from methods.


The underlying model is that the two samples are drawn from $f(x-l)$ and $f((x-l)/s)/s$, respectively, where $l$ is a common location parameter and $s$ is a scale parameter.

The null hypothesis is $s = 1$.

There are more useful tests for this problem.

In the case of ties, the formulation of Mielke (1967) is employed.


A list with class "htest" containing the following components:


the value of the test statistic.


the p-value of the test.


a character string describing the alternative hypothesis. You can specify just the initial letter.


the character string "Mood two-sample test of scale".

a character string giving the names of the data.
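A hedged illustration of the formula interface and of the returned "htest" components, reusing the serum iron data from the examples for this page. The data frame serum and its column names are made up for this sketch; the factor levels are set explicitly so that the first group corresponds to x in the default method.

```r
ramsay <- c(111, 107, 100, 99, 102, 106, 109, 108, 104, 99,
            101, 96, 97, 102, 107, 113, 116, 113, 110, 98)
jung.parekh <- c(107, 108, 106, 98, 105, 103, 110, 105, 104,
                 100, 96, 108, 103, 104, 114, 114, 113, 108, 106, 99)

## long format: one column of values, one two-level grouping factor
serum <- data.frame(
  iron   = c(ramsay, jung.parekh),
  method = factor(rep(c("ramsay", "jung.parekh"), each = 20),
                  levels = c("ramsay", "jung.parekh"))
)

res_default <- mood.test(ramsay, jung.parekh)
res_formula <- mood.test(iron ~ method, data = serum)

res_formula$statistic
res_formula$p.value
all.equal(res_default$p.value, res_formula$p.value)
```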


William J. Conover (1971), Practical nonparametric statistics. New York: John Wiley & Sons. Pages 234f.

Paul W. Mielke, Jr. (1967). Note on some squared rank tests with existing ties. Technometrics, 9/2, 312–314. doi:10.2307/1266427.

See Also

fligner.test for a rank-based (nonparametric) k-sample test for homogeneity of variances; ansari.test for another rank-based two-sample test for a difference in scale parameters; var.test and bartlett.test for parametric tests for the homogeneity in variance.


## Same data as for the Ansari-Bradley test:
## Serum iron determination using Hyland control sera
ramsay <- c(111, 107, 100, 99, 102, 106, 109, 108, 104, 99,
            101, 96, 97, 102, 107, 113, 116, 113, 110, 98)
jung.parekh <- c(107, 108, 106, 98, 105, 103, 110, 105, 104,
            100, 96, 108, 103, 104, 114, 114, 113, 108, 106, 99)
mood.test(ramsay, jung.parekh)
## Compare this to ansari.test(ramsay, jung.parekh)

The Multinomial Distribution


Generate multinomially distributed random number vectors and compute multinomial probabilities.


rmultinom(n, size, prob)
dmultinom(x, size = NULL, prob, log = FALSE)



vector of length $K$ of integers in 0:size.


number of random vectors to draw.


integer, say $N$, specifying the total number of objects that are put into $K$ boxes in the typical multinomial experiment. For dmultinom, it defaults to sum(x).


numeric non-negative vector of length $K$, specifying the probability for the $K$ classes; is internally normalized to sum 1. Infinite and missing values are not allowed.


logical; if TRUE, log probabilities are computed.


If x is a $K$-component vector, dmultinom(x, prob) is the probability

$$P(X_1=x_1,\ldots,X_K=x_K) = C \times \prod_{j=1}^K \pi_j^{x_j}$$

where $C$ is the ‘multinomial coefficient’ $C = N! / (x_1! \cdots x_K!)$ and $N = \sum_{j=1}^K x_j$.
By definition, each component $X_j$ is binomially distributed as Bin(size, prob[j]) for $j = 1, \ldots, K$.
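The probability formula can be checked by hand for a small case. This is only a worked illustration; the values of x and prob are arbitrary, and prob is deliberately left unnormalized.

```r
x    <- c(1, 2, 1)
prob <- c(1, 2, 5)

p <- prob / sum(prob)                  # pi: prob scaled to sum 1
N <- sum(x)                            # total count
C <- factorial(N) / prod(factorial(x)) # multinomial coefficient

by_hand <- C * prod(p^x)
by_hand
all.equal(by_hand, dmultinom(x, prob = prob))
```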

The rmultinom() algorithm draws binomials $X_j$ from $Bin(n_j,P_j)$ sequentially, where $n_1 = N$ (N := size), $P_1 = \pi_1$ ($\pi$ is prob scaled to sum 1), and for $j \ge 2$, recursively, $n_j = N - \sum_{k=1}^{j-1} X_k$ and $P_j = \pi_j / (1 - \sum_{k=1}^{j-1} \pi_k)$.
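The sequential scheme can be sketched in plain R. The function rmultinom_seq below is a hypothetical illustration of the recursion for a single draw, not the code rmultinom() actually runs, and it does not reproduce the same random number stream.

```r
## One multinomial draw via sequential conditional binomials (illustrative only)
rmultinom_seq <- function(size, prob) {
  p <- prob / sum(prob)          # pi: prob scaled to sum 1
  K <- length(p)
  x <- integer(K)
  n_left <- as.integer(size)     # n_j: objects not yet allocated
  p_left <- 1                    # 1 - sum of pi_k over earlier classes
  for (j in seq_len(K)) {
    if (n_left == 0L) break      # nothing left to allocate
    if (j == K) {                # P_K = 1: the last class takes the remainder
      x[j] <- n_left
      break
    }
    Pj <- min(1, p[j] / p_left)  # guard against rounding slightly above 1
    x[j] <- rbinom(1L, n_left, Pj)
    n_left <- n_left - x[j]
    p_left <- p_left - p[j]
  }
  x
}

set.seed(1)
draw <- rmultinom_seq(20, c(1, 3, 6, 10))
draw
sum(draw)  # equals size
```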


For rmultinom(), an integer $K \times n$ matrix where each column is a random vector generated according to the desired multinomial law, and hence summing to size. Whereas the transposed result would seem more natural at first, the returned matrix is more efficient because of columnwise storage.


dmultinom is currently not vectorized at all and has no C interface (API); this may be amended in the future.

See Also

Distributions for standard distributions, including dbinom which is a special case conceptually.


rmultinom(10, size = 12, prob = c(0.1,0.2,0.8))

pr <- c(1,3,6,10) # normalization not necessary for generation
rmultinom(10, 20, prob = pr)

## all possible outcomes of Multinom(N = 3, K = 3)
X <- t(as.matrix(expand.grid(0:3, 0:3))); X <- X[, colSums(X) <= 3]
X <- rbind(X, 3:3 - colSums(X)); dimnames(X) <- list(letters[1:3], NULL)
round(apply(X, 2, function(x) dmultinom(x, prob = c(1,2,5))), 3)

NA Action


Extract information on the NA action used to create an object.


na.action(object, ...)



any object whose NA action is given.


further arguments special methods could require.


na.action is a generic function, and na.action.default its default method. The latter extracts the "na.action" component of a list if present, otherwise the "na.action" attribute.

When model.frame is called, it records any information on NA handling in a "na.action" attribute. Most model-fitting functions return this as a component of their result.
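A small sketch of how this looks for a linear model; the data frame DF is invented for the illustration, and na.action = na.omit is passed explicitly so the result does not depend on the current options setting.

```r
DF  <- data.frame(x = c(1, 2, 3, 4), y = c(0, 10, NA, 5))
fit <- lm(y ~ x, data = DF, na.action = na.omit)

act <- na.action(fit)   # taken from the "na.action" component of the fit
act
class(act)

## the same information is recorded on the stored model frame
identical(act, attr(model.frame(fit), "na.action"))
```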


Information from the action which was applied to object if NAs were handled specially, or NULL.


Chambers, J. M. and Hastie, T. J. (1992) Statistical Models in S. Wadsworth & Brooks/Cole.

See Also

options("na.action"), na.omit, na.fail, also for na.exclude, na.pass.


na.action(na.omit(c(1, NA)))

Find Longest Contiguous Stretch of non-NAs


Find the longest consecutive stretch of non-missing values in a time series object. (In the event of a tie, the first such stretch.)


na.contiguous(object, ...)



a univariate or multivariate time series.


further arguments passed to or from other methods.


A time series without missing values. The class of object will be preserved.
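For illustration, a short series with two runs of non-missing values; the series x is made up for this sketch.

```r
x <- ts(c(NA, 1, 2, NA, 3, 4, 5, NA))
y <- na.contiguous(x)

y        # the longest run of non-NA values: 3, 4, 5
tsp(y)   # start and end times refer to the original series
```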

See Also

na.omit and na.omit.ts; na.fail



Handle Missing Values in Objects


These generic functions are useful for dealing with NAs in e.g., data frames. na.fail returns the object if it does not contain any missing values, and signals an error otherwise. na.omit returns the object with incomplete cases removed. na.pass returns the object unchanged.


na.fail(object, ...)
na.omit(object, ...)
na.exclude(object, ...)
na.pass(object, ...)



an R object, typically a data frame


further arguments special methods could require.


At present these will handle vectors, matrices and data frames comprising vectors and matrices (only).

If na.omit removes cases, the row numbers of the cases form the "na.action" attribute of the result, of class "omit".

na.exclude differs from na.omit only in the class of the "na.action" attribute of the result, which is "exclude". This gives different behaviour in functions making use of naresid and napredict: when na.exclude is used the residuals and predictions are padded to the correct length by inserting NAs for cases omitted by na.exclude.
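The practical difference shows up in the residuals of a fitted model. This is a minimal sketch with an invented data frame; the fitted values themselves are not of interest here.

```r
DF <- data.frame(x = c(1, 2, 3, 4, 5), y = c(0.1, 1.9, NA, 4.2, 4.8))

fit_omit <- lm(y ~ x, data = DF, na.action = na.omit)
fit_excl <- lm(y ~ x, data = DF, na.action = na.exclude)

res_omit <- residuals(fit_omit)  # 4 values: row 3 is dropped
res_excl <- residuals(fit_excl)  # 5 values: NA inserted for row 3
length(res_omit)
length(res_excl)

class(na.action(fit_omit))
class(na.action(fit_excl))
```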


Chambers, J. M. and Hastie, T. J. (1992) Statistical Models in S. Wadsworth & Brooks/Cole.

See Also

na.action; options with argument na.action for setting NA actions; and lm and glm for functions using these. na.contiguous as alternative for time series.


DF <- data.frame(x = c(1, 2, 3), y = c(0, 10, NA))
m <- as.matrix(DF)
stopifnot(all(na.omit(1:3) == 1:3))  # does not affect objects with no NA's
try(na.fail(DF))   #> Error: missing values in ...


Adjust for Missing Values


Use missing value information to report the effects of an na.action.


naprint(x, ...)



An object produced by an na.action function.


further arguments passed to or from other methods.


This is a generic function, and the exact information differs by method. naprint.omit reports the number of rows omitted: naprint.default reports an empty string.


A character string providing information on missing values, for example the number.

Adjust for Missing Values


Use missing value information to adjust residuals and predictions.


naresid(omit, x, ...)
napredict(omit, x, ...)



an object produced by an na.action function, typically the "na.action" attribute of the result of na.omit or na.exclude.


a vector, data frame, or matrix to be adjusted based upon the missing value information.


further arguments passed to or from other methods.


These are utility functions used to allow predict, fitted and residuals methods for modelling functions to compensate for the removal of NAs in the fitting process. They are used by the default, "lm", "glm" and "nls" methods, and by further methods in packages MASS, rpart and survival. Also used for the scores returned by factanal, prcomp and princomp.

The default methods do nothing. The default method for the na.exclude action is to pad the object with NAs in the correct positions to have the same number of rows as the original data frame.

Currently naresid and napredict are identical, but future methods need not be. naresid is used for residuals, and napredict for fitted values, predictions and weights.
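The padding can be seen by calling the functions directly. In this sketch res stands in for hypothetical residuals computed on the three complete rows of an invented data frame.

```r
DF <- data.frame(x = c(1, 2, 3, 4), y = c(0, 10, NA, 5))
act_omit <- attr(na.omit(DF),    "na.action")  # class "omit"
act_excl <- attr(na.exclude(DF), "na.action")  # class "exclude"

res <- c(0.5, -0.5, 0.1)  # hypothetical values for the complete rows

same   <- naresid(act_omit, res)  # returned unchanged
padded <- naresid(act_excl, res)  # NA inserted for the omitted row
same
padded
identical(padded, napredict(act_excl, res))
```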


These return a similar object to x.


In the early 2000s, packages rpart and survival5 contained versions of these functions that had an na.omit action equivalent to that now used for na.exclude.

The Negative Binomial Distribution


Density, distribution function, quantile function and random generation for the negative binomial distribution with parameters size and prob.


dnbinom(x, size, prob, mu, log = FALSE)
pnbinom(q, size, prob, mu, lower.tail = TRUE, log.p = FALSE)
qnbinom(p, size, prob, mu, lower.tail = TRUE, log.p = FALSE)
rnbinom(n, size, prob, mu)



vector of (non-negative integer) quantiles.


vector of quantiles.


vector of probabilities.


number of observations. If length(n) > 1, the length is taken to be the number required.


target for number of successful trials, or dispersion parameter (the shape parameter of the gamma mixing distribution). Must be strictly positive, need not be integer.


probability of success in each trial. 0 < prob <= 1.


alternative parametrization via mean: see ‘Details’.

log, log.p

logical; if TRUE, probabilities p are given as log(p).


logical; if TRUE (default), probabilities are $P[X \le x]$, otherwise, $P[X > x]$.


The negative binomial distribution with size $= n$ and prob $= p$ has density

$$p(x) = \frac{\Gamma(x+n)}{\Gamma(n) x!} p^n (1-p)^x$$

for $x = 0, 1, 2, \ldots$, $n > 0$ and $0 < p \le 1$.

This represents the number of failures which occur in a sequence of Bernoulli trials before a target number of successes is reached. The mean is $\mu = n(1-p)/p$ and variance $n(1-p)/p^2$.

A negative binomial distribution can also arise as a mixture of Poisson distributions with mean distributed as a gamma distribution (see pgamma) with scale parameter (1 - prob)/prob and shape parameter size. (This definition allows non-integer values of size.)
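The mixture representation can be checked numerically by integrating the Poisson probability over the gamma mixing density. The parameter values below are arbitrary, and the agreement is only up to the accuracy of integrate().

```r
size <- 2.5; prob <- 0.4; x <- 3

## P(X = x) as a gamma mixture of Poisson probabilities
mix <- integrate(
  function(lambda)
    dpois(x, lambda) * dgamma(lambda, shape = size, scale = (1 - prob) / prob),
  lower = 0, upper = Inf
)$value

c(mixture = mix, dnbinom = dnbinom(x, size = size, prob = prob))
```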

An alternative parametrization (often used in ecology) is by the mean mu (see above), and size, the dispersion parameter, where prob = size/(size+mu). The variance is mu + mu^2/size in this parametrization.
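A short check that the two parametrizations agree, with arbitrary illustrative values for size and mu:

```r
size <- 3; mu <- 4; x <- 0:10
prob <- size / (size + mu)

d_mu   <- dnbinom(x, size = size, mu = mu)
d_prob <- dnbinom(x, size = size, prob = prob)
all.equal(d_mu, d_prob)

## mean and variance under both forms
c(mean     = size * (1 - prob) / prob,
  var_prob = size * (1 - prob) / prob^2,
  var_mu   = mu + mu^2 / size)
```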

If an element of x is not integer, the result of dnbinom is zero, with a warning.

The case size == 0 is the distribution concentrated at zero. This is the limiting distribution for size approaching zero, even if mu rather than prob is held constant. Notice though, that the mean of the limit distribution is 0, whatever the value of mu.

The quantile is defined as the smallest value $x$ such that $F(x) \ge p$, where $F$ is the distribution function.
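The definition can be verified against pnbinom for one illustrative set of values:

```r
size <- 3; prob <- 0.5; p <- 0.3

q <- qnbinom(p, size = size, prob = prob)
q
pnbinom(q,     size = size, prob = prob) >= p  # F(q) reaches p
pnbinom(q - 1, size = size, prob = prob) <  p  # F(q - 1) does not
```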


dnbinom gives the density, pnbinom gives the distribution function, qnbinom gives the quantile function, and rnbinom generates random deviates.

Invalid size or prob will result in return value NaN, with a warning.

The length of the result is determined by n for rnbinom, and is the maximum of the lengths of the numerical arguments for the other functions.

The numerical arguments other than n are recycled to the length of the result. Only the first elements of the logical arguments are used.

rnbinom returns a vector of type integer unless generated values exceed the maximum representable integer when double values are returned.


dnbinom computes via binomial probabilities, using code contributed by Catherine Loader (see dbinom).

pnbinom uses pbeta.

qnbinom uses the Cornish–Fisher Expansion to include a skewness correction to a normal approximation, followed by a search.

rnbinom uses the derivation as a gamma mixture of Poisson distributions, see

Devroye, L. (1986) Non-Uniform Random Variate Generation. Springer-Verlag, New York. Page 480.

See Also

Distributions for standard distributions, including dbinom for the binomial, dpois for the Poisson and dgeom for the geometric distribution, which is a special case of the negative binomial.


x <- 0:11
dnbinom(x, size = 1, prob = 1/2) * 2^(1 + x) # == 1
126 /  dnbinom(0:8, size  = 2, prob  = 1/2) #- theoretically integer

## Cumulative ('p') = Sum of discrete prob.s ('d');  Relative error :
summary(1 - cumsum(dnbinom(x, size = 2, prob = 1/2)) /
                  pnbinom(x, size  = 2, prob = 1/2))

x <- 0:15
size <- (1:20)/4
persp(x, size, dnb <- outer(x, size, function(x,s) dnbinom(x, s, prob = 0.4)),
      xlab = "x", ylab = "s", zlab = "density", theta = 150)
title(tit <- "negative binomial density(x,s, pr = 0.4)  vs.  x & s")

image  (x, size, log10(dnb), main = paste("log [", tit, "]"))
contour(x, size, log10(dnb), add = TRUE)

## Alternative parametrization
x1 <- rnbinom(500, mu = 4, size = 1)
x2 <- rnbinom(500, mu = 4, size = 10)
x3 <- rnbinom(500, mu = 4, size = 100)
h1 <- hist(x1, breaks = 20, plot = FALSE)
h2 <- hist(x2, breaks = h1$breaks, plot = FALSE)
h3 <- hist(x3, breaks = h1$breaks, plot = FALSE)
barplot(rbind(h1$counts, h2$counts, h3$counts),
        beside = TRUE, col = c("red","blue","cyan"),
        names.arg = round(h1$breaks[-length(h1$breaks)]))

Find Highly Composite Numbers


nextn returns the smallest integer, greater than or equal to n, which can be obtained as a product of powers of the values contained in factors.

nextn() is intended to be used to find a suitable length to zero-pad the argument of fft so that the transform is computed quickly. The default value for factors ensures this.
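A sketch of the zero-padding use case. The helper is_smooth is hypothetical and only confirms that a number is a product of powers of the given factors; the signal x is arbitrary.

```r
## TRUE if m is a product of powers of the given factors (illustrative helper)
is_smooth <- function(m, factors = c(2, 3, 5)) {
  for (f in factors) while (m %% f == 0) m <- m / f
  m == 1
}

x <- sin(seq_len(1001) / 10)                  # a signal of awkward length
N <- nextn(length(x))                         # suitable padded length
x_padded <- c(x, numeric(N - length(x)))      # zero-pad to length N
X <- fft(x_padded)

N
is_smooth(N)
any(vapply(1001:(N - 1), is_smooth, logical(1)))  # no smaller candidate
```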


nextn(n, factors = c(2,3,5))



a vector of integer numbers (of type "integer" or "double").


a vector of positive integer factors (at least 2 and preferably relatively prime, see the note).


a vector of the same length as n, of type "integer" when the values are small enough (determined before computing them) and "double" otherwise.


If the factors in factors are not relatively prime, i.e., have themselves a common factor larger than one, the result may be wrong in the sense that it may not be the smallest integer. E.g., nextn(91, c(2,6)) returns 128 instead of 96 as nextn(91, c(2,3)) returns.

When the resulting N <- nextn(..) is larger than 2^53, a warning with the true 64-bit integer value is signalled, as integers above that range may not be representable in double precision.

If you really need to deal with such large integers, it may be advisable to use package gmp.

See Also

convolve, fft.


nextn(1001) # 1024
n <- 1:100 ; plot(n, nextn(n) - n, type = "o", lwd=2, cex=1/2)

Non-Linear Minimization


This function carries out a minimization of the function f using a Newton-type algorithm. See the references for details.


nlm(f, p, ..., hessian = FALSE, typsize = rep(1, length(p)),
    fscale = 1, print.level = 0, ndigit = 12, gradtol = 1e-6,
    stepmax = max(1000 * sqrt(sum((p/typsize)^2)), 1000),
    steptol = 1e-6, iterlim = 100, check.analyticals = TRUE)



the function to be minimized, returning a single numeric value. This should be a function with first argument a vector of the length of p followed by any other arguments specified by the ... argument.

If the function value has an attribute called gradient or both gradient and hessian attributes, these will be used in the calculation of updated parameter values. Otherwise, numerical derivatives are used. deriv returns a function with suitable gradient attribute and optionally a hessian attribute.


starting parameter values for the minimization.


additional arguments to be passed to f.


if TRUE, the hessian of f at the minimum is returned.


an estimate of the size of each parameter at the minimum.


an estimate of the size of f at the minimum.


this argument determines the level of printing which is done during the minimization process. The default value of 0 means that no printing occurs, a value of 1 means that initial and final details are printed and a value of 2 means that full tracing information is printed.


the number of significant digits in the function f.


a positive scalar giving the tolerance at which the scaled gradient is considered close enough to zero to terminate the algorithm. The scaled gradient is a measure of the relative change in f in each direction p[i] divided by the relative change in p[i].


a positive scalar which gives the maximum allowable scaled step length. stepmax is used to prevent steps which would cause the optimization function to overflow, to prevent the algorithm from leaving the area of interest in parameter space, or to detect divergence in the algorithm. stepmax would be chosen small enough to prevent the first two of these occurrences, but should be larger than any anticipated reasonable step.


A positive scalar providing the minimum allowable relative step length.


a positive integer specifying the maximum number of iterations to be performed before the program is terminated.


a logical scalar specifying whether the analytic gradients and Hessians, if they are supplied, should be checked against numerical derivatives at the initial parameter values. This can help detect incorrectly formulated gradients or Hessians.


Note that arguments after ... must be matched exactly.

If a gradient or hessian is supplied but evaluates to the wrong mode or length, it will be ignored if check.analyticals = TRUE (the default) with a warning. The hessian is not even checked unless the gradient is present and passes the sanity checks.

The C code for the “perturbed” Cholesky, choldc() has had a bug in all R versions before 3.4.1.

From the three methods available in the original source, we always use method “1” which is line search.

The functions supplied should always return finite (including not NA and not NaN) values: for the function value itself non-finite values are replaced by the maximum positive value with a warning.
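One way to supply an analytic gradient is to let deriv build the function. The sketch below uses an arbitrary quadratic objective; g and f are illustrative names, and the wrapper is needed because nlm passes the parameters as a single vector.

```r
## deriv() returns a function whose value carries a "gradient" attribute
g <- deriv(~ (x1 - 3)^2 + (x2 - 5)^2, c("x1", "x2"), function.arg = TRUE)

## nlm() expects one vector argument, so unpack it for g()
f <- function(p) g(p[1], p[2])

res <- nlm(f, c(10, 10))
res$estimate   # close to c(3, 5)
res$gradient   # close to zero at the minimum
res$code
```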


A list containing the following components:


the value of the estimated minimum of f.


the point at which the minimum value of f is obtained.


the gradient at the estimated minimum of f.


the hessian at the estimated minimum of f (if requested).


an integer indicating why the optimization process terminated.


1: relative gradient is close to zero, current iterate is probably solution.


2: successive iterates within tolerance, current iterate is probably solution.


3: last global step failed to locate a point lower than estimate. Either estimate is an approximate local minimum of the function or steptol is too small.


4: iteration limit exceeded.


5: maximum step size stepmax exceeded five consecutive times. Either the function is unbounded below, becomes asymptotic to a finite value from above in some direction or stepmax is too small.


the number of iterations performed.


The current code is by Saikat DebRoy and the R Core team, using a C translation of Fortran code by Richard H. Jones.


Dennis, J. E. and Schnabel, R. B. (1983). Numerical Methods for Unconstrained Optimization and Nonlinear Equations. Prentice-Hall, Englewood Cliffs, NJ.

Schnabel, R. B., Koontz, J. E. and Weiss, B. E. (1985). A modular system of algorithms for unconstrained minimization. ACM Transactions on Mathematical Software, 11, 419–440. doi:10.1145/6187.6192.

See Also

optim and nlminb.

constrOptim for constrained optimization, optimize for one-dimensional minimization and uniroot for root finding. deriv to calculate analytical derivatives.

For nonlinear regression, nls may be better.


f <- function(x) sum((x-1:length(x))^2)
nlm(f, c(10,10))
nlm(f, c(10,10), print.level = 2)
utils::str(nlm(f, c(5), hessian = TRUE))

f <- function(x, a) sum((x-a)^2)
nlm(f, c(10,10), a = c(3,5))
f <- function(x, a)
{
    res <- sum((x-a)^2)
    attr(res, "gradient") <- 2*(x-a)   # analytic gradient, picked up by nlm
    res
}
nlm(f, c(10,10), a = c(3,5))

## more examples, including the use of derivatives.
## Not run: demo(nlm)

Optimization using PORT routines


Unconstrained and box-constrained optimization using PORT routines.

For historical compatibility.


nlminb(start, objective, gradient = NULL, hessian = NULL, ...,
       scale = 1, control = list(), lower = -Inf, upper = Inf)



numeric vector, initial values for the parameters to be optimized.


Function to be minimized. Must return a scalar value. The first argument to objective is the vector of parameters to be optimized, whose initial values are supplied through start. Further arguments (fixed during the course of the optimization) to objective may be specified as well (see ...).


Optional function that takes the same arguments as objective and evaluates the gradient of objective at its first argument. Must return a vector as long as start.


Optional function that takes the same arguments as objective and evaluates the hessian of objective at its first argument. Must return a square matrix of order length(start). Only the lower triangle is used.


Further arguments to be supplied to objective.


See PORT documentation (or leave alone).


A list of control parameters. See below for details.

lower, upper

vectors of lower and upper bounds, replicated to be as long as start. If unspecified, all parameters are assumed to be unconstrained.


Any names of start are passed on to objective and where applicable, gradient and hessian. The parameter vector will be coerced to double.

If any of the functions returns NA or NaN this is an error for the gradient and Hessian, and such values for function evaluation are replaced by +Inf with a warning.


A list with components:


The best set of parameters found.


The value of objective corresponding to par.


An integer code. 0 indicates successful convergence.


A character string giving any additional information returned by the optimizer, or NULL. For details, see PORT documentation.


Number of iterations performed.


Number of objective function and gradient function evaluations.

Control parameters

Possible names in the control list and their default values are:


Maximum number of evaluations of the objective function allowed. Defaults to 200.


Maximum number of iterations allowed. Defaults to 150.


The value of the objective function and the parameters is printed every trace'th iteration. Defaults to 0 which indicates no trace information is to be printed.


Absolute tolerance. Defaults to 0 so the absolute convergence test is not used. If the objective function is known to be non-negative, the previous default of 1e-20 would be more appropriate.


Relative tolerance. Defaults to 1e-10.


X tolerance. Defaults to 1.5e-8.


false convergence tolerance. Defaults to 2.2e-14.

step.min, step.max

Minimum and maximum step size. Both default to 1.


singular convergence tolerance; defaults to rel.tol.




an estimated bound on the relative error in the objective function value.
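
A minimal sketch of passing a control list follows. The control names eval.max, iter.max and trace are taken from the standard nlminb documentation, since the list above describes the entries without naming them; the objective obj and its target argument are purely illustrative.

```r
## Illustrative objective with minimum value 1 at par = target
obj <- function(par, target) 1 + sum((par - target)^2)

fit <- nlminb(start = c(a = 0, b = 0), objective = obj,
              target = c(3, 5),                 # passed on via '...'
              control = list(eval.max = 400,    # cap on objective evaluations
                             iter.max = 300,    # cap on iterations
                             trace = 0))        # no trace output

fit$par          # best set of parameters found
fit$objective    # value of obj at fit$par
fit$convergence  # 0 indicates successful convergence
fit$message      # additional information from the optimizer, or NULL
```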


R port: Douglas Bates and Deepayan Sarkar.

Underlying Fortran code by David M. Gay.



David M. Gay (1990), Usage summary for selected optimization routines. Computing Science Technical Report 153, AT&T Bell Laboratories, Murray Hill.

See Also

optim (which is preferred) and nlm.

optimize for one-dimensional minimization and constrOptim for constrained optimization.


x <- rnbinom(100, mu = 10, size = 10)
hdev <- function(par)
    -sum(dnbinom(x, mu = par[1], size = par[2], log = TRUE))
nlminb(c(9, 12), hdev)
nlminb(c(20, 20), hdev, lower = 0, upper = Inf)
nlminb(c(20, 20), hdev, lower = 0.001, upper = Inf)

## slightly modified from the S-PLUS help page for nlminb
# this example minimizes a sum of squares with known solution y
sumsq <- function( x, y) {sum((x-y)^2)}
y <- rep(1,5)
x0 <- rnorm(length(y))
nlminb(start = x0, sumsq, y = y)
# now use bounds with a y that has some components outside the bounds
y <- c( 0, 2, 0, -2, 0)
nlminb(start = x0, sumsq, lower = -1, upper = 1, y = y)
# try using the gradient
sumsq.g <- function(x, y) 2*(x-y)
nlminb(start = x0, sumsq, sumsq.g,
       lower = -1, upper = 1, y = y)
# now use the hessian, too
sumsq.h <- function(x, y) diag(2, nrow = length(x))
nlminb(start = x0, sumsq, sumsq.g, sumsq.h,
       lower = -1, upper = 1, y = y)

## Rest lifted from optim help page

fr <- function(x) {   ## Rosenbrock Banana function
    x1 <- x[1]
    x2 <- x[2]
    100 * (x2 - x1 * x1)^2 + (1 - x1)^2
}
grr <- function(x) { ## Gradient of 'fr'
    x1 <- x[1]
    x2 <- x[2]
    c(-400 * x1 * (x2 - x1 * x1) - 2 * (1 - x1),
       200 *      (x2 - x1 * x1))
}
nlminb(c(-1.2,1), fr)
nlminb(c(-1.2,1), fr, grr)

flb <- function(x)
    { p <- length(x); sum(c(1, rep(4, p-1)) * (x - c(1, x[-p])^2)^2) }
## 25-dimensional box constrained
## par[24] is *not* at boundary
nlminb(rep(3, 25), flb, lower = rep(2, 25), upper = rep(4, 25))
## trying to use a too small tolerance:
r <- nlminb(rep(3, 25), flb, control = list(rel.tol = 1e-16))
stopifnot(grepl("rel.tol", r$message))

Nonlinear Least Squares


Determine the nonlinear (weighted) least-squares estimates of the parameters of a nonlinear model.


nls(formula, data, start, control, algorithm,
    trace, subset, weights, na.action, model,
    lower, upper, ...)



a nonlinear model formula including variables and parameters. Will be coerced to a formula if necessary.


an optional data frame in which to evaluate the variables in formula and weights. Can also be a list or an environment, but not a matrix.


a named list or named numeric vector of starting estimates. When start is missing (and formula is not a self-starting model, see selfStart), a very cheap guess for start is tried (if algorithm != "plinear").


an optional list of control settings. See nls.control for the names of the settable control values and their effect.


character string specifying the algorithm to use. The default algorithm is a Gauss-Newton algorithm. Other possible values are "plinear" for the Golub-Pereyra algorithm for partially linear least-squares models and "port" for the ‘nl2sol’ algorithm from the Port library – see the references. Can be abbreviated.


logical value indicating if a trace of the iteration progress should be printed. Default is FALSE. If TRUE the residual (weighted) sum-of-squares, the convergence criterion and the parameter values are printed at the conclusion of each iteration. Note that format() is used, so these mostly depend on getOption("digits"). When the "plinear" algorithm is used, the conditional estimates of the linear parameters are printed after the nonlinear parameters. When the "port" algorithm is used the objective function value printed is half the residual (weighted) sum-of-squares.


an optional vector specifying a subset of observations to be used in the fitting process.


an optional numeric vector of (fixed) weights. When present, the objective function is weighted least squares.


a function which indicates what should happen when the data contain NAs. The default is set by the na.action setting of options, and is na.fail if that is unset. The ‘factory-fresh’ default is na.omit. Value na.exclude can be useful.


logical. If true, the model frame is returned as part of the object. Default is FALSE.

lower, upper

vectors of lower and upper bounds, replicated to be as long as start. If unspecified, all parameters are assumed to be unconstrained. Bounds can only be used with the "port" algorithm. They are ignored, with a warning, if given for other algorithms.


Additional optional arguments. None are used at present.


An nls object is a type of fitted model object. It has methods for the generic functions anova, coef, confint, deviance, df.residual, fitted, formula, logLik, predict, print, profile, residuals, summary, vcov and weights.

Variables in formula (and weights if not missing) are looked for first in data, then the environment of formula and finally along the search path. Functions in formula are searched for first in the environment of formula and then along the search path.

Arguments subset and na.action are supported only when all the variables in the formula taken from data are of the same length: other cases give a warning.

Note that the anova method does not check that the models are nested: this cannot easily be done automatically, so use with care.


A list of


an nlsModel object incorporating the model.


the expression that was passed to nls as the data argument. The actual data values are present in the environment of the m components, e.g., environment(m$conv).


the matched call with several components, notably algorithm.


the "na.action" attribute (if any) of the model frame.


the "dataClasses" attribute (if any) of the "terms" attribute of the model frame.


if model = TRUE, the model frame.


if weights is supplied, the weights.


a list with convergence information.


the control list used, see the control argument.

convergence, message

for an algorithm = "port" fit only, a convergence code (0 for convergence) and message.

Using these is deprecated, as they are now available from convInfo.


The default settings of nls generally fail on artificial “zero-residual” data problems.

The nls function uses a relative-offset convergence criterion that compares the numerical imprecision at the current parameter estimates to the residual sum-of-squares. This performs well on data of the form

$$y = f(x, \theta) + \varepsilon$$

(with $\mathrm{var}(\varepsilon) > 0$). It fails to indicate convergence on data of the form

$$y = f(x, \theta)$$

because the criterion amounts to comparing two components of the round-off error. To avoid a zero-divide in computing the convergence testing value, a positive constant scaleOffset should be added to the denominator sum-of-squares; it is set in control, as in the example below; this does not yet apply to algorithm = "port".

The algorithm = "port" code appears unfinished, and does not even check that the starting value is within the bounds. Use with caution, especially where bounds are supplied.
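
Because the starting value is not checked against the bounds, one cautious pattern is to verify it by hand before calling nls. The sketch below does this for the DNase data used in the examples; the particular bound values are illustrative assumptions, chosen only so that they bracket plausible parameter values.

```r
DNase1 <- subset(DNase, Run == 1)

start <- list(Asym = 3, xmid = 0, scal = 1)
lower <- c(0, -5, 0.1)   # illustrative lower bounds, same order as start
upper <- c(10, 5, 5)     # illustrative upper bounds

## Check by hand that the starting value lies inside the box
stopifnot(all(unlist(start) >= lower), all(unlist(start) <= upper))

fit <- nls(density ~ Asym / (1 + exp((xmid - log(conc)) / scal)),
           data = DNase1, start = start,
           algorithm = "port", lower = lower, upper = upper)
est <- coef(fit)
est
```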


Setting warnOnly = TRUE in the control argument (see nls.control) returns a non-converged object (since R version 2.5.0) which might be useful for further convergence analysis, but not for inference.


Douglas M. Bates and Saikat DebRoy: David M. Gay for the Fortran code used by algorithm = "port".


Bates, D. M. and Watts, D. G. (1988) Nonlinear Regression Analysis and Its Applications, Wiley.

Bates, D. M. and Chambers, J. M. (1992) Nonlinear models. Chapter 10 of Statistical Models in S, eds J. M. Chambers and T. J. Hastie, Wadsworth & Brooks/Cole.

https://netlib.org/port/ for the Port library documentation.

See Also

summary.nls, predict.nls, profile.nls.

Self starting models (with ‘automatic initial values’): selfStart.



DNase1 <- subset(DNase, Run == 1)

## using a selfStart model
fm1DNase1 <- nls(density ~ SSlogis(log(conc), Asym, xmid, scal), DNase1)
## the coefficients only:
coef(fm1DNase1)
## including their SE, etc:
coef(summary(fm1DNase1))

## using conditional linearity
fm2DNase1 <- nls(density ~ 1/(1 + exp((xmid - log(conc))/scal)),
                 data = DNase1,
                 start = list(xmid = 0, scal = 1),
                 algorithm = "plinear")

## without conditional linearity
fm3DNase1 <- nls(density ~ Asym/(1 + exp((xmid - log(conc))/scal)),
                 data = DNase1,
                 start = list(Asym = 3, xmid = 0, scal = 1))

## using Port's nl2sol algorithm
fm4DNase1 <- nls(density ~ Asym/(1 + exp((xmid - log(conc))/scal)),
                 data = DNase1,
                 start = list(Asym = 3, xmid = 0, scal = 1),
                 algorithm = "port")

## weighted nonlinear regression
Treated <- Puromycin[Puromycin$state == "treated", ]
weighted.MM <- function(resp, conc, Vm, K)
{
    ## Purpose: exactly as white book p. 451 -- RHS for nls()
    ##  Weighted version of Michaelis-Menten model
    ## ----------------------------------------------------------
    ## Arguments: 'y', 'x' and the two parameters (see book)
    ## ----------------------------------------------------------
    ## Author: Martin Maechler, Date: 23 Mar 2001

    pred <- (Vm * conc)/(K + conc)
    (resp - pred) / sqrt(pred)
}

Pur.wt <- nls( ~ weighted.MM(rate, conc, Vm, K), data = Treated,
              start = list(Vm = 200, K = 0.1))

## Passing arguments using a list that can not be coerced to a data.frame
lisTreat <- with(Treated,
                 list(conc1 = conc[1], conc.1 = conc[-1], rate = rate))

weighted.MM1 <- function(resp, conc1, conc.1, Vm, K)
{
    conc <- c(conc1, conc.1)
    pred <- (Vm * conc)/(K + conc)
    (resp - pred) / sqrt(pred)
}
Pur.wt1 <- nls( ~ weighted.MM1(rate, conc1, conc.1, Vm, K),
               data = lisTreat, start = list(Vm = 200, K = 0.1))
stopifnot(all.equal(coef(Pur.wt), coef(Pur.wt1)))

## Chambers and Hastie (1992) Statistical Models in S  (p. 537):
## If the value of the right side [of formula] has an attribute called
## 'gradient' this should be a matrix with the number of rows equal
## to the length of the response and one column for each parameter.

weighted.MM.grad <- function(resp, conc1, conc.1, Vm, K)
{
  conc <- c(conc1, conc.1)

  K.conc <- K+conc
  dy.dV <- conc/K.conc
  dy.dK <- -Vm*dy.dV/K.conc
  pred <- Vm*dy.dV
  pred.5 <- sqrt(pred)
  dev <- (resp - pred) / pred.5
  Ddev <- -0.5*(resp+pred)/(pred.5*pred)
  attr(dev, "gradient") <- Ddev * cbind(Vm = dy.dV, K = dy.dK)
  dev   # return the residuals carrying the "gradient" attribute
}

Pur.wt.grad <- nls( ~ weighted.MM.grad(rate, conc1, conc.1, Vm, K),
                   data = lisTreat, start = list(Vm = 200, K = 0.1))

rbind(coef(Pur.wt), coef(Pur.wt1), coef(Pur.wt.grad))

## In this example, there seems no advantage to providing the gradient.
## In other cases, there might be.

## The two examples below show that you can fit a model to
## artificial data with noise but not to artificial data
## without noise.
x <- 1:10
y <- 2*x + 3                            # perfect fit
## terminates in an error, because convergence cannot be confirmed:
try(nls(y ~ a + b*x, start = list(a = 0.12345, b = 0.54321)))
## adjusting the convergence test by adding 'scaleOffset' to its denominator RSS:
nls(y ~ a + b*x, start = list(a = 0.12345, b = 0.54321),
    control = list(scaleOffset = 1, printEval=TRUE))
## Alternatively jittering the "too exact" values, slightly:
yeps <- y + rnorm(length(y), sd = 0.01) # added noise
nls(yeps ~ a + b*x, start = list(a = 0.12345, b = 0.54321))

## the nls() internal cheap guess for starting values can be sufficient:
x <- -(1:100)/10
y <- 100 + 10 * exp(x / 2) + rnorm(x)/10
nlmod <- nls(y ~  Const + A * exp(B * x))

plot(x,y, main = "nls(*), data, true function and fit, n=100")
curve(100 + 10 * exp(x / 2), col = 4, add = TRUE)
lines(x, predict(nlmod), col = 2)

## Here, requiring close convergence, must use more accurate numerical differentiation,
## as this typically gives Error: "step factor .. reduced below 'minFactor' .."

try(nlm1 <- update(nlmod, control = list(tol = 1e-7)))
o2 <- options(digits = 10) # more accuracy for 'trace'
## central differencing works here typically (PR#18165: not converging on *some*):
ctr2 <- nls.control(nDcentral=TRUE, tol = 8e-8, # <- even smaller than above
   warnOnly =
        TRUE || # << work around; e.g. needed on some ATLAS-Lapack setups
        (grepl("^aarch64.*linux", R.version$platform) && grepl("^NixOS", osVersion)
              ))
(nlm2 <- update(nlmod, control = ctr2, trace = TRUE)); options(o2)
## --> convergence tolerance  4.997e-8 (in 11 iter.)

## The muscle dataset in MASS is from an experiment on muscle
## contraction on 21 animals.  The observed variables are Strip
## (identifier of muscle), Conc (Cacl concentration) and Length
## (resulting length of muscle section).

if(requireNamespace("MASS", quietly = TRUE)) withAutoprint({
## The non linear model considered is
##       Length = alpha + beta*exp(-Conc/theta) + error
## where theta is constant but alpha and beta may vary with Strip.

with(MASS::muscle, table(Strip)) # 2, 3 or 4 obs per strip

## We first use the plinear algorithm to fit an overall model,
## ignoring that alpha and beta might vary with Strip.
musc.1 <- nls(Length ~ cbind(1, exp(-Conc/th)), MASS::muscle,
              start = list(th = 1), algorithm = "plinear")

## Then we use nls' indexing feature for parameters in non-linear
## models to use the conventional algorithm to fit a model in which
## alpha and beta vary with Strip.  The starting values are provided
## by the previously fitted model.
## Note that with indexed parameters, the starting values must be
## given in a list (with names):
b <- coef(musc.1)
musc.2 <- nls(Length ~ a[Strip] + b[Strip]*exp(-Conc/th), MASS::muscle,
              start = list(a = rep(b[2], 21), b = rep(b[3], 21), th = b[1]))
})

Control the Iterations in nls


Allow the user to set some characteristics of the nls nonlinear least squares algorithm.


nls.control(maxiter = 50, tol = 1e-05, minFactor = 1/1024,
            printEval = FALSE, warnOnly = FALSE, scaleOffset = 0,
            nDcentral = FALSE)



A positive integer specifying the maximum number of iterations allowed.


A positive numeric value specifying the tolerance level for the relative offset convergence criterion.


A positive numeric value specifying the minimum step-size factor allowed on any step in the iteration. The increment is calculated with a Gauss-Newton algorithm and successively halved until the residual sum of squares has been decreased or until the step-size factor has been reduced below this limit.


a logical specifying whether the number of evaluations (steps in the gradient direction taken each iteration) is printed.


a logical specifying whether nls() should return instead of signalling an error in the case of termination before convergence. Termination before convergence happens upon completion of maxiter iterations, in the case of a singular gradient, and in the case that the step-size factor is reduced below minFactor.


a constant to be added to the denominator of the relative offset convergence criterion calculation to avoid a zero divide in the case where the fit of a model to data is very close. The default value of 0 keeps the legacy behaviour of nls(). A value such as 1 seems to work for problems of reasonable scale with very small residuals.


only when numerical derivatives are used: logical indicating if central differences should be employed, i.e., numericDeriv(*, central=TRUE) be used.


A list with components


with meanings as explained under ‘Arguments’.
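
The following sketch shows one way the returned list might be used. It builds a control list with a larger iteration limit and warnOnly = TRUE and passes it to nls for the DNase fit from the nls examples; the chosen values are illustrative, not recommendations.

```r
## A control list: more iterations, and warn instead of stopping on failure
ctrl <- nls.control(maxiter = 100, warnOnly = TRUE)
str(ctrl)

DNase1 <- subset(DNase, Run == 1)
fit <- nls(density ~ Asym / (1 + exp((xmid - log(conc)) / scal)),
           data = DNase1,
           start = list(Asym = 3, xmid = 0, scal = 1),
           control = ctrl)

## With warnOnly = TRUE a non-converged object may be returned,
## so check the convergence information explicitly
fit$convInfo$isConv
```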


Douglas Bates and Saikat DebRoy; John C. Nash for part of the scaleOffset option.


Bates, D. M. and Watts, D. G. (1988), Nonlinear Regression Analysis and Its Applications, Wiley.

See Also

nls



nls.control(minFactor = 1/2048)

Fit the Asymptotic Regression Model


Fits the asymptotic regression model, in the form b0 + b1*(1-exp(-exp(lrc) * x)) to the xy data. This can be used as a building block in determining starting estimates for more complicated models.


NLSstAsymptotic(xy)





a sortedXyData object


A numeric value of length 3 with components labelled b0, b1, and lrc. b0 is the estimated intercept on the y-axis, b1 is the estimated difference between the asymptote and the y-intercept, and lrc is the estimated logarithm of the rate constant.


José Pinheiro and Douglas Bates

See Also

SSasymp



Lob.329 <- Loblolly[ Loblolly$Seed == "329", ]
print(NLSstAsymptotic(sortedXyData(expression(age),
                                   expression(height),
                                   Lob.329)), digits = 3)

Inverse Interpolation


Use inverse linear interpolation to approximate the x value at which the function represented by xy is equal to yval.


NLSstClosestX(xy, yval)



a sortedXyData object


a numeric value on the y scale


A single numeric value on the x scale.
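
To make the idea of inverse linear interpolation concrete, here is a small sketch using approx() with the roles of x and y swapped. It illustrates the general technique on made-up data with strictly increasing y values; it is not the implementation of NLSstClosestX, and the variable names are illustrative.

```r
## Sorted x values with strictly increasing y values (illustrative data)
x <- c(1, 2, 3, 4)
y <- c(0.1, 0.4, 0.9, 1.6)
yval <- 0.65

## Swap the roles of x and y: interpolate x as a function of y
x_at_yval <- approx(x = y, y = x, xout = yval)$y
x_at_yval   # 0.65 lies halfway between 0.4 and 0.9, so this is 2.5
```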


José Pinheiro and Douglas Bates

See Also

sortedXyData, NLSstLfAsymptote, NLSstRtAsymptote, selfStart


DNase.2 <- DNase[ DNase$Run == "2", ]
DN.srt <- sortedXyData(expression(log(conc)), expression(density), DNase.2)
NLSstClosestX(DN.srt, 1.0)

Horizontal Asymptote on the Left Side


Provide an initial guess at the horizontal asymptote on the left side (i.e., small values of x) of the graph of y versus x from the xy object. Primarily used within initial functions for self-starting nonlinear regression models.


NLSstLfAsymptote(xy)





a sortedXyData object


A single numeric value estimating the horizontal asymptote for small x.


José Pinheiro and Douglas Bates

See Also

sortedXyData, NLSstClosestX, NLSstRtAsymptote, selfStart


DNase.2 <- DNase[ DNase$Run == "2", ]
DN.srt <- sortedXyData( expression(log(conc)), expression(density), DNase.2 )
NLSstLfAsymptote( DN.srt )

Horizontal Asymptote on the Right Side


Provide an initial guess at the horizontal asymptote on the right side (i.e., large values of x) of the graph of y versus x from the xy object. Primarily used within initial functions for self-starting nonlinear regression models.


NLSstRtAsymptote(xy)





a sortedXyData object


A single numeric value estimating the horizontal asymptote for large x.


José Pinheiro and Douglas Bates

See Also

sortedXyData, NLSstClosestX, NLSstLfAsymptote, selfStart


DNase.2 <- DNase[ DNase$Run == "2", ]
DN.srt <- sortedXyData( expression(log(conc)), expression(density), DNase.2 )
NLSstRtAsymptote( DN.srt )

Extract the Number of Observations from a Fit


Extract the number of ‘observations’ from a model fit. This is principally intended to be used in computing BIC (see AIC).


nobs(object, ...)

## Default S3 method:
nobs(object, use.fallback = FALSE, ...)



a fitted model object.


logical: should fallback methods be used to try to guess the value?


further arguments to be passed to methods.


This is a generic function, with an S4 generic in package stats4. There are methods in this package for objects of classes "lm", "glm", "nls" and "logLik", as well as a default method (which throws an error, unless use.fallback = TRUE when it looks for weights and residuals components – use with care!).

The main usage is in determining the appropriate penalty for BIC, but nobs is also used by the stepwise fitting methods step, add1 and drop1 as a quick check that different fits have been fitted to the same set of data (and not, say, that further rows have been dropped because of NAs in the new predictors).

For lm, glm and nls fits, observations with zero weight are not included.
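
A short sketch of the points above, using the built-in cars data: nobs on an ordinary lm fit, the effect of zero weights, and the role of nobs in the BIC penalty. The weight vector w is an illustrative construction.

```r
fit <- lm(dist ~ speed, data = cars)
n_all <- nobs(fit)          # all 50 rows of cars are used

## Observations with zero weight are not counted
w <- rep(1, nrow(cars))
w[1:5] <- 0
fit_w <- lm(dist ~ speed, data = cars, weights = w)
n_w <- nobs(fit_w)          # 45

## BIC is AIC with penalty k = log(number of observations)
bic_direct  <- BIC(fit)
bic_via_aic <- AIC(fit, k = log(n_all))
```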


A single number, normally an integer. Could be NA.

See Also

AIC


The Normal Distribution


Density, distribution function, quantile function and random generation for the normal distribution with mean equal to mean and standard deviation equal to sd.


dnorm(x, mean = 0, sd = 1, log = FALSE)
pnorm(q, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE)
qnorm(p, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE)
rnorm(n, mean = 0, sd = 1)


x, q

vector of quantiles.


vector of probabilities.


number of observations. If length(n) > 1, the length is taken to be the number required.


vector of means.


vector of standard deviations.

log, log.p

logical; if TRUE, probabilities p are given as log(p).


logical; if TRUE (default), probabilities are $P[X \le x]$, otherwise $P[X > x]$.


If mean or sd are not specified they assume the default values of 0 and 1, respectively.

The normal distribution has density

$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-(x-\mu)^2/(2\sigma^2)}$$

where $\mu$ is the mean of the distribution and $\sigma$ the standard deviation.


dnorm gives the density, pnorm gives the distribution function, qnorm gives the quantile function, and rnorm generates random deviates.

The length of the result is determined by n for rnorm, and is the maximum of the lengths of the numerical arguments for the other functions.

The numerical arguments other than n are recycled to the length of the result. Only the first elements of the logical arguments are used.

For sd = 0 this gives the limit as sd decreases to 0, a point mass at mu. sd < 0 is an error and returns NaN.
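
The recycling rules and the sd = 0 limit described above can be checked with a few calls. This is only a sketch; the variable names are illustrative.

```r
## Numeric arguments are recycled to the longest length
p_rec <- pnorm(0, mean = c(-1, 0, 1))   # three probabilities
r_len <- length(rnorm(c(10, 20, 30)))   # length(n) > 1: three deviates

## sd = 0 is the limiting point mass at the mean
p_pm <- pnorm(c(-1, 0, 1), mean = 0, sd = 0)   # 0 1 1
d_pm <- dnorm(c(0, 1), mean = 0, sd = 0)       # Inf 0

## sd < 0 yields NaN (with a warning, suppressed here)
d_neg <- suppressWarnings(dnorm(0, sd = -1))
```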


For pnorm, based on

Cody, W. D. (1993) Algorithm 715: SPECFUN – A portable FORTRAN package of special function routines and test drivers. ACM Transactions on Mathematical Software 19, 22–32.

For qnorm, the code is based on a C translation of

Wichura, M. J. (1988) Algorithm AS 241: The percentage points of the normal distribution. Applied Statistics, 37, 477–484; doi:10.2307/2347330.

which provides precise results up to about 16 digits for log.p=FALSE. For log scale probabilities in the extreme tails, since R version 4.1.0, extensively since 4.3.0, asymptotic expansions are used which have been derived and explored in

Maechler, M. (2022) Asymptotic tail formulas for gaussian quantiles; DPQ vignette

For rnorm, see RNG for how to select the algorithm and for references to the supplied methods.


Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.

Johnson, N. L., Kotz, S. and Balakrishnan, N. (1995) Continuous Univariate Distributions, volume 1, chapter 13. Wiley, New York.

See Also

Distributions for other standard distributions, including dlnorm for the Lognormal distribution.



dnorm(0) == 1/sqrt(2*pi)
dnorm(1) == exp(-1/2)/sqrt(2*pi)
dnorm(1) == 1/sqrt(2*pi*exp(1))

## Using "log = TRUE" for an extended range :
par(mfrow = c(2,1))
plot(function(x) dnorm(x, log = TRUE), -60, 50,
     main = "log { Normal density }")
curve(log(dnorm(x)), add = TRUE, col = "red", lwd = 2)
mtext("dnorm(x, log=TRUE)", adj = 0)
mtext("log(dnorm(x))", col = "red", adj = 1)

plot(function(x) pnorm(x, log.p = TRUE), -50, 10,
     main = "log { Normal Cumulative }")
curve(log(pnorm(x)), add = TRUE, col = "red", lwd = 2)
mtext("pnorm(x, log.p=TRUE)", adj = 0)
mtext("log(pnorm(x))", col = "red", adj = 1)

## if you want the so-called 'error function'
erf <- function(x) 2 * pnorm(x * sqrt(2)) - 1
## (see Abramowitz and Stegun 26.2.29)
## and the so-called 'complementary error function'
erfc <- function(x) 2 * pnorm(x * sqrt(2), lower.tail = FALSE)
## and the inverses
erfinv <- function (x) qnorm((1 + x)/2)/sqrt(2)
erfcinv <- function (x) qnorm(x/2, lower.tail = FALSE)/sqrt(2)

Evaluate Derivatives Numerically


numericDeriv numerically evaluates the gradient of an expression.


numericDeriv(expr, theta, rho = parent.frame(), dir = 1,
             eps = .Machine$double.eps ^ (1/if(central) 3 else 2), central = FALSE)



expression or call to be differentiated. Should evaluate to a numeric vector.


character vector of names of numeric variables used in expr.


environment containing all the variables needed to evaluate expr.


numeric vector of directions, typically with values in -1, 1 to use for the finite differences; will be recycled to the length of theta.


a positive number, to be used as unit step size $h$ for the approximate numerical derivative $(f(x+h) - f(x))/h$ or the central version, see central.


logical indicating if central divided differences should be computed, i.e., $(f(x+h) - f(x-h)) / 2h$. These are typically more accurate but need more evaluations of $f()$.


This is a front end to the C function numeric_deriv, which is described in Writing R Extensions.

The numeric variables must be of type double and not integer.
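
As a sketch of the forward and central options, the following compares both against an analytic derivative. The expression exp(a * x) and the variable names are illustrative; note that a is stored as a double, as required, and that the central argument assumes a version of R that provides it.

```r
env <- new.env()
env$a <- 0.5          # must be double; an integer such as 1L would not do
env$x <- c(1, 2, 3)

fwd <- numericDeriv(quote(exp(a * x)), "a", env)                  # forward
ctr <- numericDeriv(quote(exp(a * x)), "a", env, central = TRUE)  # central

## Analytic derivative of exp(a * x) with respect to a
exact <- env$x * exp(env$a * env$x)

cbind(forward = c(attr(fwd, "gradient")),
      central = c(attr(ctr, "gradient")),
      exact   = exact)
```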


The value of eval(expr, envir = rho) plus a matrix attribute "gradient". The columns of this matrix are the derivatives of the value with respect to the variables listed in theta.


Saikat DebRoy; tweaks and eps, central options by R Core Team.


myenv <- new.env()
myenv$mean <- 0.
myenv$sd   <- 1.
myenv$x    <- seq(-3., 3., length.out = 31)
nD <- numericDeriv(quote(pnorm(x, mean, sd)), c("mean", "sd"), myenv)

## Visualize :
matplot(myenv$x, cbind(c(nD), attr(nD, "gradient")), type="l")
abline(h=0, lty=3)
## "gradient" is close to the true derivatives, you don't see any diff.:
curve( - dnorm(x), col=2, lty=3, lwd=2, add=TRUE)
curve(-x*dnorm(x), col=3, lty=3, lwd=2, add=TRUE)
# shows 1.609e-8 on most platforms
all.equal(attr(nD, "gradient"),
          with(myenv, cbind(-dnorm(x), -x*dnorm(x))))

Include an Offset in a Model Formula


An offset is a term to be added to a linear predictor, such as in a generalised linear model, with known coefficient 1 rather than an estimated coefficient.


offset(object)





An offset to be included in a model frame


There can be more than one offset in a model formula, but - is not supported for offset terms (and is equivalent to +).
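
A minimal sketch of an offset in a Poisson rate model follows. The data frame d is made up for illustration: events are counts and exposure is the amount of observation time, so log(exposure) enters the linear predictor with a fixed coefficient of 1.

```r
## Illustrative data: counts observed over varying exposure
d <- data.frame(events   = c(2, 3, 6, 7, 8, 9, 10, 12, 15),
                exposure = c(10, 12, 20, 22, 25, 24, 30, 33, 40),
                x        = 1:9)

fit <- glm(events ~ x + offset(log(exposure)), family = poisson, data = d)

coef(fit)         # no coefficient is estimated for the offset term
head(fit$offset)  # the offset values used in the fit
```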


The input value.

See Also

model.offset, model.frame.

For examples see glm and Insurance in package MASS.

Test for Equal Means in a One-Way Layout


Test whether two or more samples from normal distributions have the same means. The variances are not necessarily assumed to be equal.


oneway.test(formula, data, subset, na.action, var.equal = FALSE)



a formula of the form lhs ~ rhs where lhs gives the sample values and rhs the corresponding groups.


an optional matrix or data frame (or similar: see model.frame) containing the variables in the formula formula. By default the variables are taken from environment(formula).


an optional vector specifying a subset of observations to be used.


a function which indicates what should happen when the data contain NAs. Defaults to getOption("na.action").


a logical variable indicating whether to treat the variances in the samples as equal. If TRUE, then a simple F test for the equality of means in a one-way analysis of variance is performed. If FALSE, an approximate method of Welch (1951) is used, which generalizes the commonly known 2-sample Welch test to the case of arbitrarily many samples.


If the right-hand side of the formula contains more than one term, their interaction is taken to form the grouping.
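
Since the default method generalizes the two-sample Welch test, the two should agree when there are exactly two groups. The sketch below checks this on the sleep data; the variable names are illustrative.

```r
welch_f <- oneway.test(extra ~ group, data = sleep)  # var.equal = FALSE
welch_t <- t.test(extra ~ group, data = sleep)       # two-sample Welch t test

## With two groups, the F statistic is the square of the t statistic
c(F = unname(welch_f$statistic), t_squared = unname(welch_t$statistic)^2)

## and the p-values coincide
c(oneway = welch_f$p.value, t.test = welch_t$p.value)
```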


A list with class "htest" containing the following components:


the value of the test statistic.


the degrees of freedom of the exact or approximate F distribution of the test statistic.


the p-value of the test.


a character string indicating the test performed.

a character string giving the names of the data.


B. L. Welch (1951). On the comparison of several mean values: an alternative approach. Biometrika, 38, 330–336. doi:10.2307/2332579.

See Also

The standard t test (t.test) as the special case for two samples; the Kruskal-Wallis test kruskal.test for a nonparametric test for equal location parameters in a one-way layout.


## Not assuming equal variances
oneway.test(extra ~ group, data = sleep)
## Assuming equal variances
oneway.test(extra ~ group, data = sleep, var.equal = TRUE)
## which gives the same result as
anova(lm(extra ~ group, data = sleep))

General-purpose Optimization


General-purpose optimization based on Nelder–Mead, quasi-Newton and conjugate-gradient algorithms. It includes an option for box-constrained optimization and simulated annealing.


optim(par, fn, gr = NULL, ...,
      method = c("Nelder-Mead", "BFGS", "CG", "L-BFGS-B", "SANN",
                 "Brent"),
      lower = -Inf, upper = Inf,
      control = list(), hessian = FALSE)

optimHess(par, fn, gr = NULL, ..., control = list())



Initial values for the parameters to be optimized over.


A function to be minimized (or maximized), with first argument the vector of parameters over which minimization is to take place. It should return a scalar result.


A function to return the gradient for the "BFGS", "CG" and "L-BFGS-B" methods. If it is NULL, a finite-difference approximation will be used.

For the "SANN" method it specifies a function to generate a new candidate point. If it is NULL a default Gaussian Markov kernel is used.


Further arguments to be passed to fn and gr.


The method to be used. See ‘Details’. Can be abbreviated.

lower, upper

Bounds on the variables for the "L-BFGS-B" method, or bounds in which to search for method "Brent".


a list of control parameters. See ‘Details’.


Logical. Should a numerically differentiated Hessian matrix be returned?


Note that arguments after ... must be matched exactly.

By default optim performs minimization, but it will maximize if control$fnscale is negative. optimHess is an auxiliary function to compute the Hessian at a later stage if hessian = TRUE was forgotten.
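The maximization switch described above can be sketched as follows. The objective g is made up for illustration; only optim() and its control component fnscale come from this page.

```r
## Hypothetical concave objective with its maximum at (1, 2)
g <- function(p) -((p[1] - 1)^2 + (p[2] - 2)^2)

## A negative fnscale turns the default minimization into maximization
res <- optim(c(0, 0), g, method = "BFGS", control = list(fnscale = -1))
res$par    # close to c(1, 2)
res$value  # close to 0, reported on the original scale of g
```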

The default method is an implementation of that of Nelder and Mead (1965), that uses only function values and is robust but relatively slow. It will work reasonably well for non-differentiable functions.

Method "BFGS" is a quasi-Newton method (also known as a variable metric algorithm), specifically that published simultaneously in 1970 by Broyden, Fletcher, Goldfarb and Shanno. This uses function values and gradients to build up a picture of the surface to be optimized.

Method "CG" is a conjugate gradients method based on that by Fletcher and Reeves (1964) (but with the option of Polak–Ribiere or Beale–Sorenson updates). Conjugate gradient methods will generally be more fragile than the BFGS method, but as they do not store a matrix they may be successful in much larger optimization problems.

Method "L-BFGS-B" is that of Byrd et al. (1995) which allows box constraints, that is each variable can be given a lower and/or upper bound. The initial value must satisfy the constraints. This uses a limited-memory modification of the BFGS quasi-Newton method. If non-trivial bounds are supplied, this method will be selected, with a warning.

Nocedal and Wright (1999) is a comprehensive reference for the previous three methods.

Method "SANN" is by default a variant of simulated annealing given in Belisle (1992). Simulated-annealing belongs to the class of stochastic global optimization methods. It uses only function values but is relatively slow. It will also work for non-differentiable functions. This implementation uses the Metropolis function for the acceptance probability. By default the next candidate point is generated from a Gaussian Markov kernel with scale proportional to the actual temperature. If a function to generate a new candidate point is given, method "SANN" can also be used to solve combinatorial optimization problems. Temperatures are decreased according to the logarithmic cooling schedule as given in Belisle (1992, p. 890); specifically, the temperature is set to temp / log(((t-1) %/% tmax)*tmax + exp(1)), where t is the current iteration step and temp and tmax are specifiable via control, see below. Note that the "SANN" method depends critically on the settings of the control parameters. It is not a general-purpose method but can be very useful in getting to a good value on a very rough surface.
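The cooling schedule quoted above can be evaluated directly. This is only a sketch of the formula: the helper name sann_temperature is hypothetical and is not part of the stats package; its defaults mirror the control defaults for temp and tmax.

```r
## Temperature at iteration t, following the formula in the text.
## 'sann_temperature' is an illustrative helper, not a stats function.
sann_temperature <- function(t, temp = 10, tmax = 10) {
    temp / log(((t - 1) %/% tmax) * tmax + exp(1))
}

## The temperature is held for tmax evaluations, then lowered
tt <- c(1, 10, 11, 20, 21, 1000)
round(sann_temperature(tt), 3)
```

The first tmax iterations run at the starting temperature temp, since log(exp(1)) is 1.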

Method "Brent" is for one-dimensional problems only, using optimize(). It can be useful in cases where optim() is used inside other functions where only method can be specified, such as in mle from package stats4.
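A minimal sketch of method "Brent" on a made-up one-dimensional objective f1. Finite lower and upper bounds are supplied because the method searches within them.

```r
## Hypothetical one-dimensional objective with its minimum at x = 2
f1 <- function(x) (x - 2)^2 + 1

res <- optim(par = 0, fn = f1, method = "Brent", lower = -10, upper = 10)
res$par    # close to 2
res$value  # close to 1
```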

Function fn can return NA or Inf if the function cannot be evaluated at the supplied value, but the initial value must have a computable finite value of fn. (Except for method "L-BFGS-B" where the values should always be finite.)

optim can be used recursively, and for a single parameter as well as many. It also accepts a zero-length par, and just evaluates the function with that argument.

The control argument is a list that can supply any of the following components:


trace

Non-negative integer. If positive, tracing information on the progress of the optimization is produced. Higher values may produce more tracing information: for method "L-BFGS-B" there are six levels of tracing. (To understand exactly what these do see the source code: higher levels give more detail.)

fnscale

An overall scaling to be applied to the value of fn and gr during optimization. If negative, turns the problem into a maximization problem. Optimization is performed on fn(par)/fnscale.

parscale

A vector of scaling values for the parameters. Optimization is performed on par/parscale and these should be comparable in the sense that a unit change in any element produces about a unit change in the scaled value. Not used (nor needed) for method = "Brent".

ndeps

A vector of step sizes for the finite-difference approximation to the gradient, on par/parscale scale. Defaults to 1e-3.

maxit

The maximum number of iterations. Defaults to 100 for the derivative-based methods, and 500 for "Nelder-Mead".

For "SANN" maxit gives the total number of function evaluations: there is no other stopping criterion. Defaults to 10000.

abstol

The absolute convergence tolerance. Only useful for non-negative functions, as a tolerance for reaching zero.

reltol

Relative convergence tolerance. The algorithm stops if it is unable to reduce the value by a factor of reltol * (abs(val) + reltol) at a step. Defaults to sqrt(.Machine$double.eps), typically about 1e-8.

alpha, beta, gamma

Scaling parameters for the "Nelder-Mead" method. alpha is the reflection factor (default 1.0), beta the contraction factor (0.5) and gamma the expansion factor (2.0).

REPORT

The frequency of reports for the "BFGS", "L-BFGS-B" and "SANN" methods if control$trace is positive. Defaults to every 10 iterations for "BFGS" and "L-BFGS-B", or every 100 temperatures for "SANN".

warn.1d.NelderMead

a logical indicating if the (default) "Nelder-Mead" method should signal a warning when used for one-dimensional minimization. As the warning is sometimes inappropriate, you can suppress it by setting this option to false.

type

for the conjugate-gradients method. Takes value 1 for the Fletcher–Reeves update, 2 for Polak–Ribiere and 3 for Beale–Sorenson.

lmm

is an integer giving the number of BFGS updates retained in the "L-BFGS-B" method. It defaults to 5.

factr

controls the convergence of the "L-BFGS-B" method. Convergence occurs when the reduction in the objective is within this factor of the machine tolerance. Default is 1e7, that is a tolerance of about 1e-8.

pgtol

helps control the convergence of the "L-BFGS-B" method. It is a tolerance on the projected gradient in the current search direction. This defaults to zero, when the check is suppressed.

temp

controls the "SANN" method. It is the starting temperature for the cooling schedule. Defaults to 10.

tmax

is the number of function evaluations at each temperature for the "SANN" method. Defaults to 10.

Any names given to par will be copied to the vectors passed to fn and gr. Note that no other attributes of par are copied over.
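The name propagation described above can be illustrated with a small sketch. The objective fn and the parameter names a and b are made up for this example.

```r
## Hypothetical objective that looks its parameters up by name
fn <- function(p) (p[["a"]] - 1)^2 + (p[["b"]] + 2)^2

## The names of the starting values are visible inside fn
res <- optim(c(a = 0, b = 0), fn, method = "BFGS")
names(res$par)  # "a" "b"
res$par         # close to a = 1, b = -2
```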

The parameter vector passed to fn has special semantics and may be shared between calls: the function should not change or copy it.


For optim, a list with components:


par

The best set of parameters found.

value

The value of fn corresponding to par.

counts

A two-element integer vector giving the number of calls to fn and gr respectively. This excludes those calls needed to compute the Hessian, if requested, and any calls to fn to compute a finite-difference approximation to the gradient.

convergence

An integer code. 0 indicates successful completion (which is always the case for "SANN" and "Brent"). Possible error codes are

1

indicates that the iteration limit maxit had been reached.

10

indicates degeneracy of the Nelder–Mead simplex.

51

indicates a warning from the "L-BFGS-B" method; see component message for further details.

52

indicates an error from the "L-BFGS-B" method; see component message for further details.

message

A character string giving any additional information returned by the optimizer, or NULL.

hessian

Only if argument hessian is true. A symmetric matrix giving an estimate of the Hessian at the solution found. Note that this is the Hessian of the unconstrained problem even if the box constraints are active.

For optimHess, the description of the hessian component applies.
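A sketch of how the returned components might be inspected after a run. The iteration limit is set deliberately low here so that the default "Nelder-Mead" method stops early with code 1 (iteration limit reached); the messages are illustrative only.

```r
## Rosenbrock function as a test problem, with a deliberately small maxit
fr <- function(x) 100 * (x[2] - x[1]^2)^2 + (1 - x[1])^2
res <- optim(c(-1.2, 1), fr, control = list(maxit = 20))

if (res$convergence == 0) {
    message("converged")
} else if (res$convergence == 1) {
    message("iteration limit maxit reached; consider raising it")
} else {
    message("code ", res$convergence, ": ", res$message)
}
res$counts  # calls to fn and gr (gr is not used by Nelder-Mead)
```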


optim will work with one-dimensional pars, but the default method does not work well (and will warn). Method "Brent" uses optimize and needs bounds to be available; "BFGS" often works well enough if not.


The code for methods "Nelder-Mead", "BFGS" and "CG" was based originally on Pascal code in Nash (1990) that was translated by p2c and then hand-optimized. Dr Nash has agreed that the code can be made freely available.

The code for method "L-BFGS-B" is based on Fortran code by Zhu, Byrd, Lu-Chen and Nocedal obtained from Netlib (file ‘opt/lbfgs_bcm.shar’: another version is in ‘toms/778’).

The code for method "SANN" was contributed by A. Trapletti.


Belisle, C. J. P. (1992). Convergence theorems for a class of simulated annealing algorithms on R^d. Journal of Applied Probability, 29, 885–895. doi:10.2307/3214721.

Byrd, R. H., Lu, P., Nocedal, J. and Zhu, C. (1995). A limited memory algorithm for bound constrained optimization. SIAM Journal on Scientific Computing, 16, 1190–1208. doi:10.1137/0916069.

Fletcher, R. and Reeves, C. M. (1964). Function minimization by conjugate gradients. Computer Journal 7, 148–154. doi:10.1093/comjnl/7.2.149.

Nash, J. C. (1990). Compact Numerical Methods for Computers. Linear Algebra and Function Minimisation. Adam Hilger.

Nelder, J. A. and Mead, R. (1965). A simplex algorithm for function minimization. Computer Journal, 7, 308–313. doi:10.1093/comjnl/7.4.308.

Nocedal, J. and Wright, S. J. (1999). Numerical Optimization. Springer.

See Also

nlm, nlminb.

optimize for one-dimensional minimization and constrOptim for constrained optimization.



fr <- function(x) {   ## Rosenbrock Banana function
    x1 <- x[1]
    x2 <- x[2]
    100 * (x2 - x1 * x1)^2 + (1 - x1)^2
}
grr <- function(x) { ## Gradient of 'fr'
    x1 <- x[1]
    x2 <- x[2]
    c(-400 * x1 * (x2 - x1 * x1) - 2 * (1 - x1),
       200 *      (x2 - x1 * x1))
}
optim(c(-1.2,1), fr)
(res <- optim(c(-1.2,1), fr, grr, method = "BFGS"))
optimHess(res$par, fr, grr)
optim(c(-1.2,1), fr, NULL, method = "BFGS", hessian = TRUE)
## These do not converge in the default number of steps
optim(c(-1.2,1), fr, grr, method = "CG")
optim(c(-1.2,1), fr, grr, method = "CG", control = list(type = 2))
optim(c(-1.2,1), fr, grr, method = "L-BFGS-B")

flb <- function(x)
    { p <- length(x); sum(c(1, rep(4, p-1)) * (x - c(1, x[-p])^2)^2) }
## 25-dimensional box constrained
optim(rep(3, 25), flb, NULL, method = "L-BFGS-B",
      lower = rep(2, 25), upper = rep(4, 25)) # par[24] is *not* at boundary

## "wild" function , global minimum at about -15.81515
fw <- function (x)
    10*sin(0.3*x)*sin(1.3*x^2) + 0.00001*x^4 + 0.2*x+80
plot(fw, -50, 50, n = 1000, main = "optim() minimising 'wild function'")

res <- optim(50, fw, method = "SANN",
             control = list(maxit = 20000, temp = 20, parscale = 20))
## Now improve locally {typically only by a small bit}:
(r2 <- optim(res$par, fw, method = "BFGS"))
points(r2$par,  r2$value,  pch = 8, col = "red", cex = 2)

## Combinatorial optimization: Traveling salesman problem
library(stats) # normally loaded

eurodistmat <- as.matrix(eurodist)

distance <- function(sq) {  # Target function
    sq2 <- embed(sq, 2)
    sum(eurodistmat[cbind(sq2[,2], sq2[,1])])
}

genseq <- function(sq) {  # Generate new candidate sequence
    idx <- seq(2, NROW(eurodistmat)-1)
    changepoints <- sample(idx, size = 2, replace = FALSE)
    tmp <- sq[changepoints[1]]
    sq[changepoints[1]] <- sq[changepoints[2]]
    sq[changepoints[2]] <- tmp
    sq  # return the modified sequence
}

sq <- c(1:nrow(eurodistmat), 1)  # Initial sequence: alphabetic
# rotate for conventional orientation
loc <- -cmdscale(eurodist, add = TRUE)$points
x <- loc[,1]; y <- loc[,2]
s <- seq_len(nrow(eurodistmat))
tspinit <- loc[sq,]

plot(x, y, type = "n", asp = 1, xlab = "", ylab = "",
     main = "initial solution of traveling salesman problem", axes = FALSE)
arrows(tspinit[s,1], tspinit[s,2], tspinit[s+1,1], tspinit[s+1,2],
       angle = 10, col = "green")
text(x, y, labels(eurodist), cex = 0.8)

set.seed(123) # chosen to get a good soln relatively quickly
res <- optim(sq, distance, genseq, method = "SANN",
             control = list(maxit = 30000, temp = 2000, trace = TRUE,
                            REPORT = 500))
res  # Near optimum distance around 12842

tspres <- loc[res$par,]
plot(x, y, type = "n", asp = 1, xlab = "", ylab = "",
     main = "optim() 'solving' traveling salesman problem", axes = FALSE)
arrows(tspres[s,1], tspres[s,2], tspres[s+1,1], tspres[s+1,2],
       angle = 10, col = "red")
text(x, y, labels(eurodist), cex = 0.8)

## 1-D minimization: "Brent" or optimize() being preferred.. but NM may be ok and "unavoidable",
## ----------------   so we can suppress the check+warning :
system.time(rO <- optimize(function(x) (x-pi)^2, c(0, 10)))
system.time(ro <- optim(1, function(x) (x-pi)^2, control=list(warn.1d.NelderMead = FALSE)))
rO$minimum - pi # 0 (perfect), on one platform
ro$par - pi     # ~= 1.9e-4    on one platform

One Dimensional Optimization


The function optimize searches the interval from lower to upper for a minimum or maximum of the function f with respect to its first argument.

optimise is an alias for optimize.


optimize(f, interval, ..., lower = min(interval), upper = max(interval),
         maximum = FALSE,
         tol = .Machine$double.eps^0.25)
optimise(f, interval, ..., lower = min(interval), upper = max(interval),
         maximum = FALSE,
         tol = .Machine$double.eps^0.25)



f

the function to be optimized. The function is either minimized or maximized over its first argument depending on the value of maximum.

interval

a vector containing the end-points of the interval to be searched for the minimum.

...

additional named or unnamed arguments to be passed to f.

lower

the lower end point of the interval to be searched.

upper

the upper end point of the interval to be searched.

maximum

logical. Should we maximize or minimize (the default)?

tol

the desired accuracy.


Note that arguments after ... must be matched exactly.

The method used is a combination of golden section search and successive parabolic interpolation, and was designed for use with continuous functions. Convergence is never much slower than that for a Fibonacci search. If f has a continuous second derivative which is positive at the minimum (which is not at lower or upper), then convergence is superlinear, and usually of the order of about 1.324.

The function f is never evaluated at two points closer together than \(\epsilon |x_0| + (tol/3)\), where \(\epsilon\) is approximately sqrt(.Machine$double.eps) and \(x_0\) is the final abscissa optimize()$minimum.
If f is a unimodal function and the computed values of f are always unimodal when separated by at least \(\epsilon |x| + (tol/3)\), then \(x_0\) approximates the abscissa of the global minimum of f on the interval (lower, upper) with an error less than \(\epsilon |x_0| + tol\).
If f is not unimodal, then optimize() may approximate a local, but perhaps non-global, minimum to the same accuracy.

The first evaluation of f is always at \(x_1 = a + (1-\phi)(b-a)\) where (a,b) = (lower, upper) and \(\phi = (\sqrt 5 - 1)/2 = 0.61803..\) is the golden section ratio. Almost always, the second evaluation is at \(x_2 = a + \phi(b-a)\). Note that a local minimum inside \([x_1, x_2]\) will be found as solution, even when f is constant in there, see the last example.
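The location of the first two evaluation points can be checked with a small sketch. The recording objective f and the vector pts are made up for this illustration.

```r
## Record every point at which f is evaluated
pts <- numeric(0)
f <- function(x) { pts <<- c(pts, x); (x - 3)^2 }

a <- 0; b <- 10
phi <- (sqrt(5) - 1) / 2   # golden section ratio, about 0.618
optimize(f, lower = a, upper = b)

all.equal(pts[1], a + (1 - phi) * (b - a))  # first evaluation point
all.equal(pts[2], a + phi * (b - a))        # usual second evaluation point
```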

f will be called as f(x, ...) for a numeric value of x.

The argument passed to f has special semantics and used to be shared between calls. The function should not copy it.


A list with components minimum (or maximum) and objective which give the location of the minimum (or maximum) and the value of the function at that point.
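As a sketch of the returned list when maximizing (the function h is hypothetical), the location component is then named maximum rather than minimum:

```r
## Hypothetical concave function with its maximum at x = 2
h <- function(x) -(x - 2)^2 + 3

om <- optimize(h, interval = c(0, 5), maximum = TRUE)
names(om)     # "maximum" "objective"
om$maximum    # close to 2
om$objective  # close to 3
```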


A C translation of Fortran code (author(s) unstated) based on the Algol 60 procedure localmin given in the reference.


Brent, R. (1973) Algorithms for Minimization without Derivatives. Englewood Cliffs N.J.: Prentice-Hall.

See Also

nlm, uniroot.



f <- function (x, a) (x - a)^2
xmin <- optimize(f, c(0, 1), tol = 0.0001, a = 1/3)

## See where the function is evaluated:
optimize(function(x) x^2*(print(x)-1), lower = 0, upper = 10)

## "wrong" solution with unlucky interval and piecewise constant f():
f  <- function(x) ifelse(x > -1, ifelse(x < 4, exp(-1/abs(x - 1)), 10), 10)
fp <- function(x) { print(x); f(x) }

plot(f, -2,5, ylim = 0:1, col = 2)
optimize(fp, c(-4, 20))   # doesn't see the minimum
optimize(fp, c(-7, 20))   # ok

Ordering or Labels of the Leaves in a Dendrogram


These functions return the order (index) or the "label" attribute for the leaves in a dendrogram. These indices can then be used to access the appropriate components of any additional data.



order.dendrogram(x)

## S3 method for class 'dendrogram'
labels(object, ...)


x, object

a dendrogram (see as.dendrogram).


...

additional arguments


The indices or labels for the leaves in left to right order are retrieved.


A vector with length equal to the number of leaves in the dendrogram is returned. From r <- order.dendrogram(), each element is the index into the original data (from which the dendrogram was computed).


R. Gentleman (order.dendrogram) and Martin Maechler (labels.dendrogram).

See Also

reorder, dendrogram.


x <- rnorm(10)
hc <- hclust(dist(x))
dd <- as.dendrogram(hc)
order.dendrogram(dd) ## the same :
stopifnot(hc$order == order.dendrogram(dd))

d2 <- as.dendrogram(hclust(dist(USArrests)))
labels(d2) ## in this case the same as
stopifnot(identical(labels(d2),
   rownames(USArrests)[order.dendrogram(d2)]))

Adjust P-values for Multiple Comparisons


Given a set of p-values, returns p-values adjusted using one of several methods.


p.adjust(p, method = p.adjust.methods, n = length(p))

p.adjust.methods
# c("holm", "hochberg", "hommel", "bonferroni", "BH", "BY",
#   "fdr", "none")



p

numeric vector of p-values (possibly with NAs). Any other R object is coerced by as.numeric.

method

correction method, a character string. Can be abbreviated.

n

number of comparisons, must be at least length(p); only set this (to non-default) when you know what you are doing!


The adjustment methods include the Bonferroni correction ("bonferroni") in which the p-values are multiplied by the number of comparisons. Less conservative corrections are also included by Holm (1979) ("holm"), Hochberg (1988) ("hochberg"), Hommel (1988) ("hommel"), Benjamini & Hochberg (1995) ("BH" or its alias "fdr"), and Benjamini & Yekutieli (2001) ("BY"), respectively. A pass-through option ("none") is also included. The set of methods is contained in the p.adjust.methods vector for the benefit of methods that need to have the method as an option and pass it on to p.adjust.

The first four methods are designed to give strong control of the family-wise error rate. There seems no reason to use the unmodified Bonferroni correction because it is dominated by Holm's method, which is also valid under arbitrary assumptions.

Hochberg's and Hommel's methods are valid when the hypothesis tests are independent or when they are non-negatively associated (Sarkar, 1998; Sarkar and Chang, 1997). Hommel's method is more powerful than Hochberg's, but the difference is usually small and the Hochberg p-values are faster to compute.

The "BH" (aka "fdr") and "BY" methods of Benjamini, Hochberg, and Yekutieli control the false discovery rate, the expected proportion of false discoveries amongst the rejected hypotheses. The false discovery rate is a less stringent condition than the family-wise error rate, so these methods are more powerful than the others.
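To make the adjustments concrete, here is a sketch that recomputes three of them by hand for made-up p-values and compares the results with p.adjust(). It is meant as an illustration of the arithmetic, not as a description of the internal code.

```r
p <- c(0.001, 0.008, 0.039, 0.041, 0.20)   # made-up p-values
n <- length(p)

## Bonferroni: multiply by the number of comparisons, cap at 1
bonf <- pmin(1, n * p)

## Holm: multipliers n, n-1, ..., 1 on the sorted p-values,
## made monotone with a running maximum
o <- order(p)
holm <- numeric(n)
holm[o] <- pmin(1, cummax((n - seq_len(n) + 1) * p[o]))

## Benjamini-Hochberg: multipliers n/i, with a running minimum
## taken from the largest p-value downwards
od <- order(p, decreasing = TRUE)
bh <- numeric(n)
bh[od] <- pmin(1, cummin(n / (n:1) * p[od]))

stopifnot(all.equal(bonf, p.adjust(p, "bonferroni")),
          all.equal(holm, p.adjust(p, "holm")),
          all.equal(bh,   p.adjust(p, "BH")))
```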

Note that you can set n larger than length(p) which means the unobserved p-values are assumed to be greater than all the observed p for "bonferroni" and "holm" methods and equal to 1 for the other methods.


A numeric vector of corrected p-values (of the same length as p, with names copied from p).


Benjamini, Y., and Hochberg, Y. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society Series B, 57, 289–300. doi:10.1111/j.2517-6161.1995.tb02031.x.

Benjamini, Y., and Yekutieli, D. (2001). The control of the false discovery rate in multiple testing under dependency. Annals of Statistics, 29, 1165–1188. doi:10.1214/aos/1013699998.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6, 65–70.

Hommel, G. (1988). A stagewise rejective multiple test procedure based on a modified Bonferroni test. Biometrika, 75, 383–386. doi:10.2307/2336190.

Hochberg, Y. (1988). A sharper Bonferroni procedure for multiple tests of significance. Biometrika, 75, 800–803. doi:10.2307/2336325.

Shaffer, J. P. (1995). Multiple hypothesis testing. Annual Review of Psychology, 46, 561–584. doi:10.1146/ (An excellent review of the area.)

Sarkar, S. (1998). Some probability inequalities for ordered MTP2 random variables: a proof of Simes conjecture. Annals of Statistics, 26, 494–504. doi:10.1214/aos/1028144846.

Sarkar, S., and Chang, C. K. (1997). The Simes method for multiple hypothesis testing with positively dependent test statistics. Journal of the American Statistical Association, 92, 1601–1608. doi:10.2307/2965431.

Wright, S. P. (1992). Adjusted P-values for simultaneous inference. Biometrics, 48, 1005–1013. doi:10.2307/2532694. (Explains the adjusted P-value approach.)

See Also

pairwise.* functions such as pairwise.t.test.



x <- rnorm(50, mean = c(rep(0, 25), rep(3, 25)))
p <- 2*pnorm(sort(-abs(x)))

round(p, 3)
round(p.adjust(p), 3)
round(p.adjust(p, "BH"), 3)

## or all of them at once (dropping the "fdr" alias):
p.adjust.M <- p.adjust.methods[p.adjust.methods != "fdr"]
p.adj    <- sapply(p.adjust.M, function(meth) p.adjust(p, meth))
p.adj.60 <- sapply(p.adjust.M, function(meth) p.adjust(p, meth, n = 60))
stopifnot(identical(p.adj[,"none"], p), p.adj <= p.adj.60)
round(p.adj, 3)
## or a bit nicer:
noquote(apply(p.adj, 2, format.pval, digits = 3))

## and a graphic:
matplot(p, p.adj, ylab="p.adjust(p, meth)", type = "l", asp = 1, lty = 1:6,
        main = "P-value adjustments")
legend(0.7, 0.6, p.adjust.M, col = 1:6, lty = 1:6)

## Can work with NA's:
pN <- p; iN <- c(46, 47); pN[iN] <- NA
pN.a <- sapply(p.adjust.M, function(meth) p.adjust(pN, meth))
## The smallest 20 P-values all affected by the NA's :
round((pN.a / p.adj)[1:20, ] , 4)

Construct a Paired-Data Object


Combines two vectors into an object of class "Pair".


Pair(x, y)



x

a vector, the 1st element of the pair.

y

a vector, the 2nd element of the pair. Should have the same length as x.


A 2-column matrix of class "Pair".


Mostly designed as part of the formula interface to paired tests.

See Also

t.test and wilcox.test
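A short sketch of how a "Pair" object might be used, assuming a version of R recent enough to provide Pair() and the formula interface to paired tests. The reshaped data frame sleep2 is built here only for illustration.

```r
## Reshape 'sleep' so each subject has one row with both measurements
sleep2 <- reshape(sleep, direction = "wide", idvar = "ID", timevar = "group")

pr <- with(sleep2, Pair(extra.1, extra.2))
class(pr)  # "Pair"
dim(pr)    # 10 rows, 2 columns

## Paired t test through the formula interface
t.test(Pair(extra.1, extra.2) ~ 1, data = sleep2)
```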

Pairwise comparisons for proportions


Calculate pairwise comparisons between pairs of proportions with correction for multiple testing


pairwise.prop.test(x, n, p.adjust.method = p.adjust.methods, ...)



x

Vector of counts of successes or a matrix with 2 columns giving the counts of successes and failures, respectively.

n

Vector of counts of trials; ignored if x is a matrix.

p.adjust.method

Method for adjusting p values (see p.adjust). Can be abbreviated.

...

Additional arguments to pass to prop.test


Object of class "pairwise.htest"

See Also

prop.test, p.adjust


smokers  <- c( 83, 90, 129, 70 )
patients <- c( 86, 93, 136, 82 )
pairwise.prop.test(smokers, patients)

Pairwise t tests


Calculate pairwise comparisons between group levels with corrections for multiple testing


pairwise.t.test(x, g, p.adjust.method = p.adjust.methods,
                pool.sd = !paired, paired = FALSE,
                alternative = c("two.sided", "less", "greater"),
                ...)



x

response vector.

g

grouping vector or factor.

p.adjust.method

Method for adjusting p values (see p.adjust).

pool.sd

switch to allow/disallow the use of a pooled SD

paired

a logical indicating whether you want paired t-tests.

alternative

a character string specifying the alternative hypothesis, must be one of "two.sided" (default), "greater" or "less". Can be abbreviated.

...

additional arguments to pass to t.test.


The pool.sd switch calculates a common SD for all groups and uses that for all comparisons (this can be useful if some groups are small). This method does not actually call t.test, so extra arguments are ignored. Pooling does not generalize to paired tests so pool.sd and paired cannot both be TRUE.

Only the lower triangle of the matrix of possible comparisons is being calculated, so setting alternative to anything other than "two.sided" requires that the levels of g are ordered sensibly.


Object of class "pairwise.htest"

See Also

t.test, p.adjust


attach(airquality)
Month <- factor(Month, labels = month.abb[5:9])
pairwise.t.test(Ozone, Month)
pairwise.t.test(Ozone, Month, p.adjust.method = "bonf")
pairwise.t.test(Ozone, Month, pool.sd = FALSE)
detach()

Tabulate p values for pairwise comparisons


Creates table of p values for pairwise comparisons with corrections for multiple testing.


pairwise.table(compare.levels, level.names, p.adjust.method)



compare.levels

a function to compute (raw) p value given indices i and j.

level.names

names of the group levels

p.adjust.method

a character string specifying the method for multiple testing adjustment; almost always one of p.adjust.methods. Can be abbreviated.


Functions that do multiple group comparisons create separate compare.levels functions (assumed to be symmetrical in i and j) and pass them to this function.
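As a sketch of this division of labour, a hand-written comparison function can be passed to pairwise.table() directly. The comparison used here (a Welch t test on the airquality ozone data) is chosen only for illustration.

```r
x <- airquality$Ozone
g <- factor(airquality$Month, labels = month.abb[5:9])

## Illustrative comparison function: raw p-value for levels i and j
compare.levels <- function(i, j) {
    t.test(x[as.integer(g) == i], x[as.integer(g) == j])$p.value
}

pt <- pairwise.table(compare.levels, levels(g), p.adjust.method = "holm")
pt  # lower-triangular matrix; entries above the diagonal are NA
```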


Table of p values in lower triangular form.

See Also


Pairwise Wilcoxon Rank Sum Tests


Calculate pairwise comparisons between group levels with corrections for multiple testing.


pairwise.wilcox.test(x, g, p.adjust.method = p.adjust.methods,
                      paired = FALSE, ...)



x

response vector.

g

grouping vector or factor.

p.adjust.method

method for adjusting p values (see p.adjust). Can be abbreviated.

paired

a logical indicating whether you want a paired test.

...

additional arguments to pass to wilcox.test.


Extra arguments that are passed on to wilcox.test may or may not be sensible in this context. In particular, only the lower triangle of the matrix of possible comparisons is being calculated, so setting alternative to anything other than "two.sided" requires that the levels of g are ordered sensibly.


Object of class "pairwise.htest"

See Also

wilcox.test, p.adjust


attach(airquality)
Month <- factor(Month, labels = month.abb[5:9])
## These give warnings because of ties :
pairwise.wilcox.test(Ozone, Month)
pairwise.wilcox.test(Ozone, Month, p.adjust.method = "bonf")
detach()

Plot Autocovariance and Autocorrelation Functions


Plot method for objects of class "acf".


## S3 method for class 'acf'
plot(x, ci = 0.95, type = "h", xlab = "Lag", ylab = NULL,
     ylim = NULL, main = NULL,
     ci.col = "blue", ci.type = c("white", "ma"),
     max.mfrow = 6, ask = Npgs > 1 && dev.interactive(),
     mar = if(nser > 2) c(3,2,2,0.8) else par("mar"),
     oma = if(nser > 2) c(1,1.2,1,1) else par("oma"),
     mgp = if(nser > 2) c(1.5,0.6,0) else par("mgp"),
     xpd = par("xpd"),
     cex.main = if(nser > 2) 1 else par("cex.main"),
     verbose = getOption("verbose"),
     ...)



x

an object of class "acf".

ci

coverage probability for confidence interval. Plotting of the confidence interval is suppressed if ci is zero or negative.

type

the type of plot to be drawn, default to histogram like vertical lines.

xlab

the x label of the plot.

ylab

the y label of the plot.

ylim

numeric of length 2 giving the y limits for the plot.

main

overall title for the plot.

ci.col

colour to plot the confidence interval lines.

ci.type

should the confidence limits assume a white noise input or for lag \(k\) an MA(\(k-1\)) input? Can be abbreviated.

max.mfrow

positive integer; for multivariate x indicating how many rows and columns of plots should be put on one page, using par(mfrow = c(m,m)).

ask

logical; if TRUE, the user is asked before a new page is started.

mar, oma, mgp, xpd, cex.main

graphics parameters as in par(*), by default adjusted to use smaller than default margins for multivariate x only.

verbose

logical. Should R report extra information on progress?

...

graphics parameters to be passed to the plotting routines.


The confidence interval plotted in plot.acf is based on an uncorrelated series and should be treated with appropriate caution. Using ci.type = "ma" may be less potentially misleading.
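A brief sketch contrasting the two kinds of confidence limits on a built-in series. The choice of ldeaths and of 99% coverage is for illustration only.

```r
a <- acf(ldeaths, plot = FALSE)     # compute only, no plot yet
plot(a)                             # white-noise confidence limits
plot(a, ci.type = "ma", ci = 0.99)  # MA(k-1) limits with 99% coverage
```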

See Also

acf which calls plot.acf by default.



z4  <- ts(matrix(rnorm(400), 100, 4), start = c(1961, 1), frequency = 12)
z7  <- ts(matrix(rnorm(700), 100, 7), start = c(1961, 1), frequency = 12)
acf(z7, max.mfrow = 7)   # squeeze onto 1 page
acf(z7) # multi-page

Plot Method for Kernel Density Estimation


The plot method for density objects.


## S3 method for class 'density'
plot(x, main = NULL, xlab = NULL, ylab = "Density", type = "l",
     zero.line = TRUE, ...)



x

a "density" object.

main, xlab, ylab, type

plotting parameters with useful defaults.

...

further plotting parameters.

zero.line

logical; if TRUE, add a base line at \(y = 0\)
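A minimal usage sketch; the data set and the title are chosen only for illustration.

```r
d <- density(faithful$eruptions, bw = "SJ")
plot(d)                      # default: base line drawn at y = 0
plot(d, zero.line = FALSE,   # same estimate without the base line
     main = "Old Faithful eruption durations")
```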



See Also


Plot function for "HoltWinters" objects


Produces a chart of the original time series along with the fitted values. Optionally, predicted values (and their confidence bounds) can also be plotted.


## S3 method for class 'HoltWinters'
plot(x, predicted.values = NA, intervals = TRUE,
        separator = TRUE, col = 1, col.predicted = 2,
        col.intervals = 4, col.separator = 1, lty = 1,
        lty.predicted = 1, lty.intervals = 1, lty.separator = 3,
        ylab = "Observed / Fitted",
        main = "Holt-Winters filtering",
        ylim = NULL, ...)



x

Object of class "HoltWinters"

predicted.values

Predicted values as returned by predict.HoltWinters

intervals

If TRUE, the prediction intervals are plotted (default).

separator

If TRUE, a separating line between fitted and predicted values is plotted (default).

col, lty

Color/line type of original data (default: black solid).

col.predicted, lty.predicted

Color/line type of fitted and predicted values (default: red solid).

col.intervals, lty.intervals

Color/line type of prediction intervals (default: blue solid).

col.separator, lty.separator

Color/line type of observed/predicted values separator (default: black dashed).

ylab

Label of the y-axis.

main

Main title.

ylim

Limits of the y-axis. If NULL, the range is chosen such that the plot contains the original series, the fitted values, and the predicted values if any.

...

Other graphics parameters.
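A usage sketch on the built-in co2 series, showing fitted values together with a two-year forecast and its prediction intervals. The forecast horizon is an arbitrary choice for illustration.

```r
hw <- HoltWinters(co2)
p  <- predict(hw, n.ahead = 24, prediction.interval = TRUE)

## Observed and fitted values, then forecasts with interval bounds
plot(hw, predicted.values = p, col.predicted = 2, col.intervals = 4)
```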


David Meyer


C. C. Holt (1957) Forecasting trends and seasonals by exponentially weighted moving averages, ONR Research Memorandum, Carnegie Institute of Technology 52.

P. R. Winters (1960). Forecasting sales by exponentially weighted moving averages. Management Science, 6, 324–342. doi:10.1287/mnsc.6.3.324.

See Also

HoltWinters, predict.HoltWinters

Plot Method for isoreg Objects


The plot and lines method for R objects of class isoreg.


## S3 method for class 'isoreg'
plot(x, plot.type = c("single", "row.wise", "col.wise"),
      main = paste("Isotonic regression", deparse(x$call)),
      main2 = "Cumulative Data and Convex Minorant",
      xlab = "x0", ylab = "x$y",
      par.fit = list(col = "red", cex = 1.5, pch = 13, lwd = 1.5),
      mar = if (both) 0.1 + c(3.5, 2.5, 1, 1) else par("mar"),
      mgp = if (both) c(1.6, 0.7, 0) else par("mgp"),
      grid = length(x$x) < 12, ...)

## S3 method for class 'isoreg'
lines(x, col = "red", lwd = 1.5,
       do.points = FALSE, cex = 1.5, pch = 13, ...)



x

an isoreg object.

plot.type

character indicating which type of plot is desired. The first (default) only draws the data and the fit, where the others add a plot of the cumulative data and fit. Can be abbreviated.

main

main title of plot, see title.

main2

title for second (cumulative) plot.

xlab, ylab

x- and y- axis annotation.

par.fit

a list of arguments (for points and lines) for drawing the fit.

mar, mgp

graphical parameters, see par, mainly for the case of two plots.

grid

logical indicating if grid lines should be drawn. If true, grid() is used for the first plot, whereas vertical lines are drawn at ‘touching’ points for the cumulative plot.

do.points

for lines(): logical indicating if the step points should be drawn as well (and as they are drawn in plot()).

col, lwd, cex, pch

graphical arguments for lines(), where cex and pch are only used when do.points is TRUE.

...

further arguments passed to and from methods.

See Also

isoreg for computation of isoreg objects.



utils::example(isoreg) # for the examples there

plot(y3, main = "simple plot(.)  +  lines(<isoreg>)")

## 'same' plot as above, "proving" that only ranks of 'x' are important
plot(isoreg(2^(1:9), c(1,0,4,3,3,5,4,2,0)), plot.type = "row", log = "x")

plot(ir3, plot.type = "row", ylab = "y3")
plot(isoreg(y3 - 4), plot.type = "r", ylab = "y3 - 4")
plot(ir4, plot.type = "ro",  ylab = "y4", xlab = "x = 1:n")

## experiment a bit with these (C-c C-j):
plot(isoreg(sample(9),  y3), plot.type = "row")
plot(isoreg(sample(9),  y3), plot.type = "col.wise")

plot(ir <- isoreg(sample(10), sample(10, replace = TRUE)),
                  plot.type = "r")

Plot Diagnostics for an lm Object


Six plots (selectable by which) are currently available: a plot of residuals against fitted values, a Scale-Location plot of \(\sqrt{|\text{residuals}|}\) against fitted values, a Q-Q plot of residuals, a plot of Cook's distances versus row labels, a plot of residuals against leverages, and a plot of Cook's distances against leverage/(1-leverage). By default, the first three and the fifth are provided.


## S3 method for class 'lm'
plot(x, which = c(1,2,3,5), 
     caption = list("Residuals vs Fitted", "Q-Q Residuals",
       "Scale-Location", "Cook's distance",
       "Residuals vs Leverage",
       expression("Cook's dist vs Leverage* " * h[ii] / (1 - h[ii]))),
     panel = if(add.smooth) function(x, y, ...)
              panel.smooth(x, y, iter=iter.smooth, ...) else points,
     sub.caption = NULL, main = "",
     ask = prod(par("mfcol")) < length(which) && dev.interactive(),
     ...,
     id.n = 3, labels.id = names(residuals(x)), cex.id = 0.75,
     qqline = TRUE, cook.levels = c(0.5, 1.0),
     cook.col = 8, cook.lty = 2, cook.legendChanges = list(),
     add.smooth = getOption("add.smooth"),
     iter.smooth = if(isGlm) 0 else 3,
     label.pos = c(4,2),
     cex.caption = 1, cex.oma.main = 1.25,
     extend.ylim.f = 0.08)



lm object, typically result of lm or glm.


a subset of the numbers 1:6, by default 1:3, 5, referring to

  1. "Residuals vs Fitted", aka ‘Tukey-Anscombe’ plot

  2. "Residual Q-Q" plot

  3. "Scale-Location"

  4. "Cook's distance"

  5. "Residuals vs Leverage"

  6. "Cook's dist vs Lev./(1-Lev.)"

See also ‘Details’ below.


captions to appear above the plots; character vector or list of valid graphics annotations, see as.graphicsAnnot, of length 6, the j-th entry corresponding to which[j], see also the default vector in ‘Usage’. Can be set to "" or NA to suppress all captions.


panel function. The useful alternative to points, panel.smooth can be chosen by add.smooth = TRUE.


common title—above the figures if there are more than one; used as sub (s.title) otherwise. If NULL, as by default, a possible abbreviated version of deparse(x$call) is used.


title to each plot—in addition to caption.


logical; if TRUE, the user is asked before each plot, see par(ask=.).


other parameters to be passed through to plotting functions.


number of points to be labelled in each plot, starting with the most extreme.

vector of labels, from which the labels for extreme points will be chosen. NULL uses observation numbers.

magnification of point labels.


logical indicating if a qqline() should be added to the normal Q-Q plot.


levels of Cook's distance at which to draw contours.

cook.col, cook.lty

color and line type to use for these contour lines.


a list (or NULL to suppress the call) of arguments to legend which should be modified from (or added to) the plot.lm() default list(x = "bottomleft", legend = "Cook's distance", lty = cook.lty, col = cook.col, text.col = cook.col, bty = "n", x.intersp = 1/4, y.intersp = 1/8) .


logical indicating if a smoother should be added to most plots; see also panel above.


the number of robustness iterations, the argument iter in panel.smooth(); the default uses no such iterations for glm fits which is particularly desirable for the (predominant) case of binary observations, but also for other models where the response distribution can be highly skewed.


positioning of labels, for the left half and right half of the graph respectively, for plots 1-3, 5, 6.


controls the size of caption.


controls the size of the sub.caption only if that is above the figures when there is more than one.


a numeric vector of length 1 or 2, to be used in ylim <- extendrange(r=ylim, f = *) for plots 1 and 5 when id.n is non-empty.


sub.caption—by default the function call—is shown as a subtitle (under the x-axis title) on each plot when plots are on separate pages, or as a subtitle in the outer margin (if any) when there are multiple plots per page.

The ‘Scale-Location’ plot (which=3), also called ‘Spread-Location’ or ‘S-L’ plot, takes the square root of the absolute residuals in order to diminish skewness (\(\sqrt{|E|}\) is much less skewed than \(|E|\) for Gaussian zero-mean \(E\)).

The ‘S-L’, the Q-Q, and the Residual-Leverage (which=5) plot use standardized residuals which have identical variance (under the hypothesis). They are given as \(R_i / (s \times \sqrt{1 - h_{ii}})\) where the ‘leverages’ \(h_{ii}\) are the diagonal entries of the hat matrix, influence()$hat (see also hat), and where the Residual-Leverage plot uses the standardized Pearson residuals (residuals.glm(type = "pearson")) for \(R_i\).
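As a minimal sketch of the formula above for a plain lm fit, the standardized residuals can be rebuilt from the raw residuals, the residual standard error and the hat values, and compared with rstandard(). The object names fit, h, s and r_manual are illustrative only and are not part of plot.lm().

```r
## Illustrative only: reproduce the standardized residuals of an lm fit by hand.
fit <- lm(sr ~ pop15 + pop75 + dpi + ddpi, data = LifeCycleSavings)

h <- hatvalues(fit)   # leverages h_ii, the diagonal of the hat matrix
s <- sigma(fit)       # residual standard error

## R_i / (s * sqrt(1 - h_ii)), with R_i the ordinary residuals
r_manual <- residuals(fit) / (s * sqrt(1 - h))

## Should agree with the values used by the S-L, Q-Q and Residual-Leverage plots
all.equal(r_manual, rstandard(fit))
```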

The Residual-Leverage plot (which=5) shows contours of equal Cook's distance, for values of cook.levels (by default 0.5 and 1) and omits cases with leverage one with a warning. If the leverages are constant (as is typically the case in a balanced aov situation) the plot uses factor level combinations instead of the leverages for the x-axis. (The factor levels are ordered by mean fitted value.)

In the Cook's distance vs leverage/(1-leverage) (= “leverage*”) plot (which=6), contours of standardized residuals (rstandard(.)) that are equal in magnitude are lines through the origin. These lines are labelled with the magnitudes. The x-axis is labeled with the (non equidistant) leverages \(h_{ii}\).
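Why these contours are straight lines can be checked numerically. The sketch below assumes the usual identity for linear models, Cook's distance = \(r_i^2 / p \times h_{ii} / (1 - h_{ii})\), with \(r_i\) the standardized residual and \(p\) the rank of the model. That identity is not spelled out in the text above, and all variable names are illustrative.

```r
## Illustrative only: Cook's distance is linear in leverage* = h / (1 - h)
## for a fixed magnitude of the standardized residual.
fit <- lm(sr ~ pop15 + pop75 + dpi + ddpi, data = LifeCycleSavings)

h        <- hatvalues(fit)
r        <- rstandard(fit)      # standardized residuals
p        <- fit$rank            # number of estimated coefficients
lev_star <- h / (1 - h)         # x-axis quantity of the which = 6 plot

## Assumed identity: slope r^2 / p through the origin
d_manual <- r^2 / p * lev_star
all.equal(d_manual, cooks.distance(fit))
```

Under this identity, all points sharing the same |r| fall on one line through the origin with slope r^2 / p, which is what the labelled contours show.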

For the glm case, the Q-Q plot is based on the absolute value of the standardized deviance residuals. When the saddlepoint approximation applies, these have an approximate half-normal distribution. The saddlepoint approximation is exact for the normal and inverse Gaussian family, and holds approximately for the Gamma family with small dispersion (large shape) and for the Poisson and binomial families with large counts (Dunn and Smyth 2018).


John Maindonald and Martin Maechler.


Belsley, D. A., Kuh, E. and Welsch, R. E. (1980). Regression Diagnostics. New York: Wiley.

Cook, R. D. and Weisberg, S. (1982). Residuals and Influence in Regression. London: Chapman and Hall.

Firth, D. (1991) Generalized Linear Models. In Hinkley, D. V. and Reid, N. and Snell, E. J., eds: Pp. 55-82 in Statistical Theory and Modelling. In Honour of Sir David Cox, FRS. London: Chapman and Hall.

Hinkley, D. V. (1975). On power transformations to symmetry. Biometrika, 62, 101–111. doi:10.2307/2334491.

McCullagh, P. and Nelder, J. A. (1989). Generalized Linear Models. London: Chapman and Hall.

Dunn, P.K. and Smyth G.K. (2018) Generalized Linear Models with Examples in R. New York: Springer-Verlag.

See Also

termplot, lm.influence, cooks.distance, hatvalues.



## Analysis of the life-cycle savings data
## given in Belsley, Kuh and Welsch.
lm.SR <- lm(sr ~ pop15 + pop75 + dpi + ddpi, data = LifeCycleSavings)

## 4 plots on 1 page;
## allow room for printing model formula in outer margin:
par(mfrow = c(2, 2), oma = c(0, 0, 2, 0)) -> opar
plot(lm.SR, id.n = NULL)                 # no id's
plot(lm.SR, id.n = 5, labels.id = NULL)  # 5 id numbers

## Was default in R <= 2.1.x:
## Cook's distances instead of Residual-Leverage plot
plot(lm.SR, which = 1:4)

## All the above fit a smooth curve where applicable
## by default unless "add.smooth" is changed.
## Give a smoother curve by increasing the lowess span :
plot(lm.SR, panel = function(x, y) panel.smooth(x, y, span = 1))

par(mfrow = c(2,1)) # same oma as above
plot(lm.SR, which = 1:2, sub.caption = "Saving Rates, n=50, p=5")

## Cook's distance tweaking
par(mfrow = c(2,3)) # same oma ...
plot(lm.SR, which = 1:6, cook.col = "royalblue")

## A case where over plotting of the "legend" is to be avoided:
if(dev.interactive(TRUE)) getOption("device")(height = 6, width = 4)
par(mfrow = c(3,1), mar = c(5,5,4,2)/2 +.1, mgp = c(1.4, .5, 0))
plot(lm.SR, which = 5, extend.ylim.f = c(0.2, 0.08))
plot(lm.SR, which = 5, cook.lty = "dotdash",
     cook.legendChanges = list(x = "bottomright", legend = "Cook"))
plot(lm.SR, which = 5, cook.legendChanges = NULL)  # no "legend"

par(opar) # reset par()s

Plot Ridge Functions for Projection Pursuit Regression Fit


Plot the ridge functions for a projection pursuit regression (ppr) fit.


## S3 method for class 'ppr'
plot(x, ask, type = "o", cex = 1/2,
     main = quote(bquote(
         "term"[.(i)]*":" ~~ hat(beta[.(i)]) == .(bet.i))),
     xlab = quote(bquote(bold(alpha)[.(i)]^T * bold(x))),
     ylab = "", ...)



an R object of class "ppr" as produced by a call to ppr.


the graphics parameter ask: see par for details. If set to TRUE will ask between the plot of each cross-section.


the type of line (see plot.default) to draw.


plot symbol expansion factor (relative to par("cex")).

main, xlab, ylab

axis annotations, see also title. Can be an expression (depending on i and bet.i), as by default which will be eval()uated.


further graphical parameters, passed to plot().



Side Effects

A series of plots are drawn on the current graphical device, one for each term in the fit.

See Also

ppr, par



rock1 <- within(rock, { area1 <- area/10000; peri1 <- peri/10000 })
par(mfrow = c(3,2)) # maybe: , pty = "s"
rock.ppr <- ppr(log(perm) ~ area1 + peri1 + shape,
                data = rock1, nterms = 2, max.terms = 5)
plot(rock.ppr, main = "ppr(log(perm)~ ., nterms=2, max.terms=5)")
plot(update(rock.ppr, bass = 5), main = "update(..., bass = 5)")
plot(update(rock.ppr, sm.method = "gcv", gcvpen = 2),
     main = "update(..., sm.method=\"gcv\", gcvpen=2)")

Plotting Functions for 'profile' Objects


plot and pairs methods for objects of class "profile".


## S3 method for class 'profile'
plot(x, ...)
## S3 method for class 'profile'
pairs(x, colours = 2:3, which = names(x), ...)



an object inheriting from class "profile".


colours to be used for the mean curves conditional on x and y respectively.


names or number of parameters in pairs plot


arguments passed to or from other methods.


This is the main plot method for objects created by profile.glm. It can also be called on objects created by profile.nls, but they have a specific method, plot.profile.nls.

The pairs method shows, for each pair of parameters x and y, two curves intersecting at the maximum likelihood estimate, which give the loci of the points at which the tangents to the contours of the bivariate profile likelihood become vertical and horizontal, respectively. In the case of an exactly bivariate normal profile likelihood, these two curves would be straight lines giving the conditional means of y|x and x|y, and the contours would be exactly elliptical. The which argument allows you to select a subset of parameters; the default corresponds to the set of parameters that have been profiled.


Originally, D. M. Bates and W. N. Venables for S (in 1996). Taken from MASS where these functions were re-written by B. D. Ripley for R (by 1998).

See Also

profile.glm, profile.nls.


## see ?profile.glm for another example using glm fits.

## a version of example(profile.nls) from R >= 2.8.0
fm1 <- nls(demand ~ SSasympOrig(Time, A, lrc), data = BOD)
pr1 <- profile(fm1, alphamax = 0.1)
stats:::plot.profile(pr1) ## override dispatch to plot.profile.nls
pairs(pr1) # a little odd since the parameters are highly correlated

## an example from ?nls
x <- -(1:100)/10
y <- 100 + 10 * exp(x / 2) + rnorm(x)/10
nlmod <- nls(y ~  Const + A * exp(B * x), start=list(Const=100, A=10, B=1))

## example from Dobson (1990) (see ?glm)
counts <- c(18,17,15,20,10,20,25,13,12)
outcome <- gl(3,1,9)
treatment <- gl(3,3)
## this example is only formally a Poisson model. It is really a 
## comparison of 3 multinomials. Only the interaction parameters are of 
## interest.
glm.D93i <- glm(counts ~ outcome * treatment, family = poisson())
pr1 <- profile(glm.D93i)
pr2 <- profile(glm.D93i, which=6:9)

Plot a profile.nls Object


Displays a series of plots of the profile t function and interpolated confidence intervals for the parameters in a nonlinear regression model that has been fit with nls and profiled with profile.nls.


## S3 method for class 'profile.nls'
plot(x, levels, conf = c(99, 95, 90, 80, 50)/100,
     absVal = TRUE, ylab = NULL, lty = 2, ...)



an object of class "profile.nls"


levels, on the scale of the absolute value of a t statistic, at which to interpolate intervals. Usually conf is used instead of giving levels explicitly.


a numeric vector of confidence levels for profile-based confidence intervals on the parameters. Defaults to c(0.99, 0.95, 0.90, 0.80, 0.50).


a logical value indicating whether or not the plots should be on the scale of the absolute value of the profile t. Defaults to TRUE.


the line type to be used for axis and dropped lines.

ylab, ...

other arguments to the plot.default function can be passed here (but not xlab, xlim, ylim nor type).


The plots are produced in a set of hard-coded colours, but as these are coded by number their effect can be changed by setting the palette. Colour 1 is used for the axes and 4 for the profile itself. Colours 3 and 6 are used for the axis line at zero and the horizontal/vertical lines dropping to the axes.


Douglas M. Bates and Saikat DebRoy


Bates, D.M. and Watts, D.G. (1988), Nonlinear Regression Analysis and Its Applications, Wiley (chapter 6)

See Also

nls, profile, profile.nls



# obtain the fitted object
fm1 <- nls(demand ~ SSasympOrig(Time, A, lrc), data = BOD)
# get the profile for the fitted model
pr1 <- profile(fm1, alphamax = 0.05)
opar <- par(mfrow = c(2,2), oma = c(1.1, 0, 1.1, 0), las = 1)
plot(pr1, conf = c(95, 90, 80, 50)/100)
plot(pr1, conf = c(95, 90, 80, 50)/100, absVal = FALSE)
mtext("Confidence intervals based on the profile sum of squares",
      side = 3, outer = TRUE)
mtext("BOD data - confidence levels of 50%, 80%, 90% and 95%",
      side = 1, outer = TRUE)

Plotting Spectral Densities


Plotting method for objects of class "spec". For multivariate time series it plots the marginal spectra of the series or pairs plots of the coherency and phase of the cross-spectra.


## S3 method for class 'spec'
plot(x, add = FALSE, ci = 0.95, log = c("yes", "dB", "no"),
     xlab = "frequency", ylab = NULL, type = "l",
     ci.col = "blue", ci.lty = 3,
     main = NULL, sub = NULL,
     plot.type = c("marginal", "coherency", "phase"),
     ...)

plot.spec.phase(x, ci = 0.95,
                xlab = "frequency", ylab = "phase",
                ylim = c(-pi, pi), type = "l",
                main = NULL, ci.col = "blue", ci.lty = 3, ...)

plot.spec.coherency(x, ci = 0.95,
                    xlab = "frequency",
                    ylab = "squared coherency",
                    ylim = c(0, 1), type = "l",
                    main = NULL, ci.col = "blue", ci.lty = 3, ...)



an object of class "spec".


logical. If TRUE, add to already existing plot. Only valid for plot.type = "marginal".


coverage probability for confidence interval. Plotting of the confidence bar/limits is omitted unless ci is strictly positive.


If "dB", plot on log10 (decibel) scale, otherwise use conventional log scale or linear scale. Logical values are also accepted. The default is "yes" unless options(ts.S.compat = TRUE) has been set, when it is "dB". Only valid for plot.type = "marginal".


the x label of the plot.


the y label of the plot. If missing a suitable label will be constructed.


the type of plot to be drawn, defaults to lines.


colour for plotting confidence bar or confidence intervals for coherency and phase.


line type for confidence intervals for coherency and phase.


overall title for the plot. If missing, a suitable title is constructed.


a subtitle for the plot. Only used for plot.type = "marginal". If missing, a description of the smoothing is used.


For multivariate time series, the type of plot required. Only the first character is needed.

ylim, ...

Graphical parameters.
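A short usage sketch for the arguments described above, assuming the standard datasets ldeaths, mdeaths and fdeaths and using spec.pgram(plot = FALSE) to obtain "spec" objects. The object names sp and sp2 and the chosen spans are illustrative.

```r
## Illustrative only: plot a univariate spectrum on the decibel scale.
sp <- spec.pgram(ldeaths, spans = c(3, 3), plot = FALSE)
plot(sp, log = "dB", ci.col = "red")

## Bivariate series: marginal spectra, then squared coherency and phase
## of the cross-spectrum, selected through plot.type.
sp2 <- spec.pgram(ts.union(mdeaths, fdeaths), spans = c(3, 3), plot = FALSE)
plot(sp2)                          # plot.type = "marginal" (the default)
plot(sp2, plot.type = "coherency")
plot(sp2, plot.type = "phase", ci.lty = 2)
```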

See Also


Plot Step Functions


Method of the generic plot for stepfun objects and utility for plotting piecewise constant functions.


## S3 method for class 'stepfun'
plot(x, xval, xlim, ylim = range(c(y, Fn.kn)),
     xlab = "x", ylab = "f(x)", main = NULL,
     add = FALSE, verticals = TRUE, do.points = (n < 1000),
     pch = par("pch"), col = par("col"),
     col.points = col, cex.points = par("cex"),
     col.hor = col, col.vert = col,
     lty = par("lty"), lwd = par("lwd"), ...)

## S3 method for class 'stepfun'
lines(x, ...)



an R object inheriting from "stepfun".


numeric vector of abscissa values at which to evaluate x. Defaults to knots(x) restricted to xlim.

xlim, ylim

limits for the plot region: see plot.window. Both have sensible defaults if omitted.

xlab, ylab

labels for x and y axis.


main title.


logical; if TRUE only add to an existing plot.


logical; if TRUE, draw vertical lines at steps.


logical; if TRUE, also draw points at the (xlim restricted) knot locations. Default is true, for sample size \(< 1000\).


character; point character if do.points.


default color of all points and lines.


character or integer code; color of points if do.points.


numeric; character expansion factor if do.points.


color of horizontal lines.


color of vertical lines.

lty, lwd

line type and thickness for all lines.


further arguments of plot(.), or if(add) segments(.).


A list with two components


abscissa (x) values, including the two outermost ones.


y values ‘in between’ the t[].


Martin Maechler, 1990, 1993; ported to R, 1997.

See Also

ecdf for empirical distribution functions as special step functions, approxfun and splinefun.



y0 <- c(1,2,4,3)
sfun0  <- stepfun(1:3, y0, f = 0)
sfun.2 <- stepfun(1:3, y0, f = .2)
sfun1  <- stepfun(1:3, y0, right = TRUE)

tt <- seq(0, 3, by = 0.1)
op <- par(mfrow = c(2,2))
plot(sfun0); plot(sfun0, xval = tt, add = TRUE, col.hor = "bisque")
plot(sfun.2);plot(sfun.2, xval = tt, add = TRUE, col = "orange") # all colors
plot(sfun1);lines(sfun1, xval = tt, col.hor = "coral")
##-- This is  revealing :
plot(sfun0, verticals = FALSE,
     main = "stepfun(x, y0, f=f)  for f = 0, .2, 1")
for(i in 1:3)
  lines(list(sfun0, sfun.2, stepfun(1:3, y0, f = 1))[[i]], col = i)
legend(2.5, 1.9, paste("f =", c(0, 0.2, 1)), col = 1:3, lty = 1, y.intersp = 1)

# Extend and/or restrict 'viewport':
plot(sfun0, xlim = c(0,5), ylim = c(0, 3.5),
     main = "plot(stepfun(*), xlim= . , ylim = .)")

##-- this works too (automatic call to  ecdf(.)):
plot.stepfun(rt(50, df = 3), col.vert = "gray20")

Plotting Time-Series Objects


Plotting method for objects inheriting from class "ts".


## S3 method for class 'ts'
plot(x, y = NULL, plot.type = c("multiple", "single"),
        xy.labels, xy.lines, panel = lines, nc, yax.flip = FALSE,
        mar.multi = c(0, 5.1, 0, if(yax.flip) 5.1 else 2.1),
        oma.multi = c(6, 0, 5, 0), axes = TRUE, ...)

## S3 method for class 'ts'
lines(x, ...)


x, y

time series objects, usually inheriting from class "ts".


for multivariate time series, should the series be plotted separately (with a common time axis) or on a single plot? Can be abbreviated.


logical, indicating if text() labels should be used for an x-y plot, or character, supplying a vector of labels to be used. The default is to label for up to 150 points, and not for more.


logical, indicating if lines should be drawn for an x-y plot. Defaults to the value of xy.labels if that is logical, otherwise to TRUE.


a function(x, col, bg, pch, type, ...) which gives the action to be carried out in each panel of the display for plot.type = "multiple". The default is lines.


the number of columns to use when plot.type = "multiple". Defaults to 1 for up to 4 series, otherwise to 2.


logical indicating if the y-axis (ticks and numbering) should flip from side 2 (left) to 4 (right) from series to series when plot.type = "multiple".

mar.multi, oma.multi

the (default) par settings for plot.type = "multiple". Modify with care!


logical indicating if x- and y- axes should be drawn.


additional graphical arguments, see plot, plot.default and par.


If y is missing, this function creates a time series plot, for multivariate series of one of two kinds depending on plot.type.

If y is present, both x and y must be univariate, and a scatter plot y ~ x will be drawn, enhanced by using text if xy.labels is TRUE or character, and lines if xy.lines is TRUE.

See Also

ts for basic time series construction and access functionality.



## Multivariate
z <- ts(matrix(rt(200 * 8, df = 3), 200, 8),
        start = c(1961, 1), frequency = 12)
plot(z, yax.flip = TRUE)
plot(z, axes = FALSE, ann = FALSE, frame.plot = TRUE,
     mar.multi = c(0,0,0,0), oma.multi = c(1,1,5,1))
title("plot(ts(..), axes=FALSE, ann=FALSE, frame.plot=TRUE, mar..., oma...)")

z <- window(z[,1:3], end = c(1969,12))
plot(z, type = "b")    # multiple
plot(z, plot.type = "single", lty = 1:3, col = 4:2)

## A phase plot:
plot(nhtemp, lag(nhtemp, 1), cex = .8, col = "blue",
     main = "Lag plot of New Haven temperatures")

## xy.lines and xy.labels are FALSE for large series:
plot(lag(sunspots, 1), sunspots, pch = ".")

SMI <- EuStockMarkets[, "SMI"]
plot(lag(SMI,  1), SMI, pch = ".")
plot(lag(SMI, 20), SMI, pch = ".", log = "xy",
     main = "4 weeks lagged SMI stocks -- log scale", xy.lines =  TRUE)

The Poisson Distribution


Density, distribution function, quantile function and random generation for the Poisson distribution with parameter lambda.


dpois(x, lambda, log = FALSE)
ppois(q, lambda, lower.tail = TRUE, log.p = FALSE)
qpois(p, lambda, lower.tail = TRUE, log.p = FALSE)
rpois(n, lambda)



vector of (non-negative integer) quantiles.


vector of quantiles.


vector of probabilities.


number of random values to return.


vector of (non-negative) means.

log, log.p

logical; if TRUE, probabilities p are given as log(p).


logical; if TRUE (default), probabilities are \(P[X \le x]\), otherwise, \(P[X > x]\).


The Poisson distribution has density

\[ p(x) = \frac{\lambda^x e^{-\lambda}}{x!} \]

for \(x = 0, 1, 2, \ldots\). The mean and variance are \(E(X) = Var(X) = \lambda\).
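The density formula and the two moments can be checked directly. This is a small sketch; the value of lambda, the truncation point 200 and the variable names are arbitrary choices for illustration.

```r
## Illustrative only: compare dpois() with the closed-form density.
lambda <- 2.5
x <- 0:10
manual <- lambda^x * exp(-lambda) / factorial(x)
all.equal(dpois(x, lambda), manual)

## Mean and variance from a (long) truncated sum over the support;
## both should be numerically equal to lambda.
xs <- 0:200
m <- sum(xs * dpois(xs, lambda))
v <- sum((xs - m)^2 * dpois(xs, lambda))
c(mean = m, variance = v)
```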

Note that \(\lambda = 0\) is really a limit case (setting \(0^0 = 1\)) resulting in a point mass at \(0\), see also the example.

If an element of x is not integer, the result of dpois is zero, with a warning. \(p(x)\) is computed using Loader's algorithm, see the reference in dbinom.

The quantile is right continuous: qpois(p, lambda) is the smallest integer \(x\) such that \(P(X \le x) \ge p\).
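The "smallest integer" property of the quantile can be verified with ppois(). This is a sketch with arbitrarily chosen lambda and probabilities; the names are illustrative.

```r
## Illustrative only: q = qpois(p, lambda) satisfies
## P(X <= q) >= p while P(X <= q - 1) < p.
lambda <- 3.5
p <- c(0.1, 0.5, 0.9, 0.99)
q <- qpois(p, lambda)

reaches_p  <- ppois(q,     lambda) >= p   # q is large enough
is_minimal <- ppois(q - 1, lambda) <  p   # q - 1 is not
data.frame(p, q, reaches_p, is_minimal)
```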

Setting lower.tail = FALSE gives much more precise results when the default, lower.tail = TRUE, would return 1; see the example below.


dpois gives the (log) density, ppois gives the (log) distribution function, qpois gives the quantile function, and rpois generates random deviates.

Invalid lambda will result in return value NaN, with a warning.

The length of the result is determined by n for rpois, and is the maximum of the lengths of the numerical arguments for the other functions.

The numerical arguments other than n are recycled to the length of the result. Only the first elements of the logical arguments are used.

rpois returns a vector of type integer unless generated values exceed the maximum representable integer, in which case double values are returned.
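The length and type rules above can be illustrated as follows. This is a sketch only; the means 4 and 1e10 are arbitrary, the latter chosen so that the draws lie far above .Machine$integer.max.

```r
## Illustrative only: recycling of numerical arguments and storage type of rpois().
d <- dpois(0:3, lambda = c(1, 2))   # lambda is recycled to length 4
length(d)

small <- rpois(5, lambda = 4)       # fits into integer storage
big   <- rpois(5, lambda = 1e10)    # too large for integer storage
typeof(small)
typeof(big)
```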


dpois uses C code contributed by Catherine Loader (see dbinom).

ppois uses pgamma.

qpois uses the Cornish–Fisher Expansion to include a skewness correction to a normal approximation, followed by a search.

rpois uses

Ahrens, J. H. and Dieter, U. (1982). Computer generation of Poisson deviates from modified normal distributions. ACM Transactions on Mathematical Software, 8, 163–179.

See Also

Distributions for other standard distributions, including dbinom for the binomial and dnbinom for the negative binomial distribution.




-log(dpois(0:7, lambda = 1) * gamma(1+ 0:7)) # == 1
Ni <- rpois(50, lambda = 4); table(factor(Ni, 0:max(Ni)))

1 - ppois(10*(15:25), lambda = 100)  # becomes 0 (cancellation)
    ppois(10*(15:25), lambda = 100, lower.tail = FALSE)  # no cancellation

par(mfrow = c(2, 1))
x <- seq(-0.01, 5, 0.01)
plot(x, ppois(x, 1), type = "s", ylab = "F(x)", main = "Poisson(1) CDF")
plot(x, pbinom(x, 100, 0.01), type = "s", ylab = "F(x)",
     main = "Binomial(100, 0.01) CDF")

## The (limit) case  lambda = 0 :
stopifnot(identical(dpois(0,0), 1),
	  identical(ppois(0,0), 1),
	  identical(qpois(1,0), 0))

Exact Poisson tests


Performs an exact test of a simple null hypothesis about the rate parameter in a Poisson distribution, or for the ratio between two rate parameters.


poisson.test(x, T = 1, r = 1,
    alternative = c("two.sided", "less", "greater"),
    conf.level = 0.95)



number of events. A vector of length one or two.


time base for event count. A vector of length one or two.


hypothesized rate or rate ratio


indicates the alternative hypothesis and must be one of "two.sided", "greater" or "less". You can specify just the initial letter.


confidence level for the returned confidence interval.


Confidence intervals are computed similarly to those of binom.test in the one-sample case, and using binom.test in the two sample case.


A list with class "htest" containing the following components:


the number of events (in the first sample if there are two.)


the corresponding expected count


the p-value of the test.

a confidence interval for the rate or rate ratio.


the estimated rate or rate ratio.


the rate or rate ratio under the null, r.


a character string describing the alternative hypothesis.


the character string "Exact Poisson test" or "Comparison of Poisson rates" as appropriate.

a character string giving the names of the data.


The rate parameter in Poisson data is often given based on a “time on test” or similar quantity (person-years, population size, or expected number of cases from mortality tables). This is the role of the T argument.

The one-sample case is effectively the binomial test with a very large n. The two sample case is converted to a binomial test by conditioning on the total event count, and the rate ratio is directly related to the odds in that binomial distribution.
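The two-sample reduction can be sketched by running the corresponding binomial test by hand. The counts and time bases below are arbitrary illustrative values. The names tb, p0 and odds_ci are hypothetical, and the interval conversion assumes the odds-to-rate-ratio relation odds = r × T1/T2 implied by the conditioning argument.

```r
## Illustrative only: two-sample poisson.test() versus the conditional binomial test.
x  <- c(11, 21)        # event counts in the two samples
tb <- c(800, 3011)     # corresponding time bases

pt <- poisson.test(x, tb)

## Conditional on the total count, the first count is binomial with
## success probability T1 / (T1 + T2) under the null rate ratio r = 1.
p0 <- tb[1] / sum(tb)
bt <- binom.test(x[1], sum(x), p = p0)
all.equal(pt$p.value, bt$p.value)

## Binomial odds p / (1 - p) map to the rate ratio via the factor T2 / T1.
odds_ci <- bt$conf.int / (1 - bt$conf.int)
all.equal(as.numeric(pt$conf.int), as.numeric(odds_ci * tb[2] / tb[1]))
```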

See Also



### These are paraphrased from data sets in the ISwR package

## SMR, Welsh Nickel workers
poisson.test(137, 24.19893)

## eba1977, compare Fredericia to other three cities for ages 55-59
poisson.test(c(11, 6+8+7), c(800, 1083+1050+878))

Compute Orthogonal Polynomials


Returns or evaluates orthogonal polynomials of degree 1 to degree over the specified set of points x: these are all orthogonal to the constant polynomial of degree 0. Alternatively, evaluate raw polynomials.


poly(x, ..., degree = 1, coefs = NULL, raw = FALSE, simple = FALSE)
polym  (..., degree = 1, coefs = NULL, raw = FALSE)

## S3 method for class 'poly'
predict(object, newdata, ...)


x, newdata

a numeric vector or an object with mode "numeric" (such as a Date) at which to evaluate the polynomial. x can also be a matrix. Missing values are not allowed in x.


the degree of the polynomial. Must be less than the number of unique points when raw is false, as by default.


for prediction, coefficients from a previous fit.


if true, use raw and not orthogonal polynomials.


logical indicating if a simple matrix (with no further attributes but dimnames) should be returned. For speedup only.


an object inheriting from class "poly", normally the result of a call to poly with a single vector argument.


poly, polym: further vectors.
predict.poly: arguments to be passed to or from other methods.


Although formally degree should be named (as it follows ...), an unnamed second argument of length 1 will be interpreted as the degree, such that poly(x, 3) can be used in formulas.

The orthogonal polynomial is summarized by the coefficients, which can be used to evaluate it via the three-term recursion given in Kennedy & Gentle (1980, pp. 343–4), and used in the predict part of the code.

poly using ... is just a convenience wrapper for polym: coef is ignored. Conversely, if polym is called with a single argument in ... it is a wrapper for poly.


For poly and polym() (when simple=FALSE and coefs=NULL as per default):
A matrix with rows corresponding to points in x and columns corresponding to the degree, with attributes "degree" specifying the degrees of the columns and (unless raw = TRUE) "coefs" which contains the centering and normalization constants used in constructing the orthogonal polynomials and class c("poly", "matrix").

For poly(*, simple=TRUE), polym(*, coefs=<non-NULL>), and predict.poly(): a matrix.
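A quick numerical look at the returned matrix follows. It is a sketch with an arbitrary set of points, and the names P, G and R are illustrative. For the default (non-raw) case the columns turn out to have unit length as well as being mutually orthogonal, so their cross-product is close to the identity matrix.

```r
## Illustrative only: properties of the matrix returned by poly().
x <- c(1, 2, 4, 7, 11)
P <- poly(x, degree = 3)

G <- crossprod(P)        # 3 x 3 matrix of column inner products
max(abs(G - diag(3)))    # close to 0: columns are orthogonal, unit length
max(abs(colSums(P)))     # close to 0: orthogonal to the constant polynomial
attr(P, "degree")        # degrees of the columns

## With raw = TRUE the columns are simply x, x^2, x^3.
R <- poly(x, degree = 3, raw = TRUE)
all.equal(unclass(R)[, 2], x^2)
```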


This routine is intended for statistical purposes such as contr.poly: it does not attempt to orthogonalize to machine accuracy.


R Core Team. Keith Jewell (Campden BRI Group, UK) contributed improvements for correct prediction on subsets.


Chambers, J. M. and Hastie, T. J. (1992) Statistical Models in S. Wadsworth & Brooks/Cole.

Kennedy, W. J. Jr and Gentle, J. E. (1980) Statistical Computing. Marcel Dekker.

See Also


cars for an example of polynomial regression.


od <- options(digits = 3) # avoid too much visual clutter
(z <- poly(1:10, 3))
predict(z, seq(2, 4, 0.5))
zapsmall(poly(seq(4, 6, 0.5), 3, coefs = attr(z, "coefs")))

 zm <- zapsmall(polym (    1:4, c(1, 4:6),  degree = 3)) # or just poly():
(z1 <- zapsmall(poly(cbind(1:4, c(1, 4:6)), degree = 3)))
## they are the same :
stopifnot(all.equal(zm, z1, tolerance = 1e-15))

## poly(<matrix>, df) --- used to fail till July 14 (vive la France!), 2017:
m2 <- cbind(1:4, c(1, 4:6))
pm2 <- zapsmall(poly(m2, 3)) # "unnamed degree = 3"
stopifnot(all.equal(pm2, zm, tolerance = 1e-15))


Create a Power Link Object


Creates a link object based on the link function \(\eta = \mu^\lambda\).


power(lambda = 1)



a real number.


If lambda is non-positive, it is taken as zero, and the log link is obtained. The default lambda = 1 gives the identity link.
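The special cases described above can be checked by evaluating the link functions directly. This is a small sketch; the vector mu and the object name sq are illustrative.

```r
## Illustrative only: evaluate power links on a few positive means.
mu <- c(1, 4, 9)

sq <- power(0.5)               # square-root link: eta = mu^0.5
sq$linkfun(mu)
sq$linkinv(sq$linkfun(mu))     # the inverse link maps back to mu

## Non-positive lambda is treated as zero and gives the log link;
## the default lambda = 1 gives the identity link.
all.equal(power(0)$linkfun(mu),  log(mu))
all.equal(power(-2)$linkfun(mu), log(mu))
all.equal(power(1)$linkfun(mu),  mu)
```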


A list with components linkfun, linkinv, mu.eta, and valideta. See make.link for information on their meaning.


Chambers, J. M. and Hastie, T. J. (1992) Statistical Models in S. Wadsworth & Brooks/Cole.

See Also

make.link, family

To raise a number to a power, see Arithmetic.

To calculate the power of a test, see various functions in the stats package, e.g., power.t.test.


quasi(link = power(1/3))[c("linkfun", "linkinv")]

Power Calculations for Balanced One-Way Analysis of Variance Tests


Compute power of test or determine parameters to obtain target power.


power.anova.test(groups = NULL, n = NULL,
                 between.var = NULL, within.var = NULL,
                 sig.level = 0.05, power = NULL)



Number of groups


Number of observations (per group)


Between group variance


Within group variance


Significance level (Type I error probability)


Power of test (1 minus Type II error probability)


Exactly one of the parameters groups, n, between.var, power, within.var, and sig.level must be passed as NULL, and that parameter is determined from the others. Notice that sig.level has non-NULL default so NULL must be explicitly passed if you want it computed.
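The "one NULL parameter is solved for" convention can be illustrated with a round trip: solve for n, feed it back in to recover the power, then solve for sig.level by passing NULL explicitly. The numbers and object names are illustrative only.

```r
## Illustrative only: which argument is NULL decides what is computed.
res <- power.anova.test(groups = 4, between.var = 1, within.var = 3,
                        power = 0.80)             # n is NULL: solve for n
res$n

back <- power.anova.test(groups = 4, n = res$n,
                         between.var = 1, within.var = 3)  # power is NULL
back$power                                        # recovers about 0.80

## sig.level has a non-NULL default, so NULL must be passed explicitly.
sl <- power.anova.test(groups = 4, n = 5, between.var = 1, within.var = 3,
                       sig.level = NULL, power = 0.80)
sl$sig.level
```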


Object of class "power.htest", a list of the arguments (including the computed one) augmented with method and note elements.


uniroot is used to solve the power equation for unknowns, so you may see errors from it, notably about inability to bracket the root when invalid arguments are given.


Claus Ekstrøm

See Also

anova, lm, uniroot


power.anova.test(groups = 4, n = 5, between.var = 1, within.var = 3)
# Power = 0.3535594

power.anova.test(groups = 4, between.var = 1, within.var = 3,
                 power = .80)
# n = 11.92613

## Assume we have prior knowledge of the group means:
groupmeans <- c(120, 130, 140, 150)
power.anova.test(groups = length(groupmeans),
                 between.var = var(groupmeans),
                 within.var = 500, power = .90) # n = 15.18834

Power Calculations for Two-Sample Test for Proportions


Compute the power of the two-sample test for proportions, or determine parameters to obtain a target power.


power.prop.test(n = NULL, p1 = NULL, p2 = NULL, sig.level = 0.05,
                power = NULL,
                alternative = c("two.sided", "one.sided"),
                strict = FALSE, tol = .Machine$double.eps^0.25)



number of observations (per group)


probability in one group


probability in other group


significance level (Type I error probability)


power of test (1 minus Type II error probability)


one- or two-sided test. Can be abbreviated.


use strict interpretation in two-sided case


numerical tolerance used in root finding, the default providing (at least) four significant digits.


Exactly one of the parameters n, p1, p2, power, and sig.level must be passed as NULL, and that parameter is determined from the others. Notice that sig.level has a non-NULL default so NULL must be explicitly passed if you want it computed.

If strict = TRUE is used, the power will include the probability of rejection in the opposite direction of the true effect, in the two-sided case. Without this the power will be half the significance level if the true difference is zero.

Note that not all conditions can be satisfied, e.g., for

power.prop.test(n=30, p1=0.90, p2=NULL, power=0.8, strict=TRUE)

there is no proportion p2 between p1 = 0.9 and 1, as you'd need a sample size of at least n = 74 to yield the desired power for (p1, p2) = (0.9, 1).

For these impossible conditions, currently a warning (warning) is signalled which may become an error (stop) in the future.
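The sample size quoted above can be checked directly by solving for n at the boundary case p2 = 1. This is only a sketch of the reasoning; it uses the default strict = FALSE, as the examples on this page do.

```r
# Smallest per-group n that reaches 80% power for p1 = 0.9 against p2 = 1
res <- power.prop.test(p1 = 0.90, p2 = 1.0, power = 0.8)
res$n           # fractional solution of the power equation
ceiling(res$n)  # whole observations needed per group

# Any p2 in (0.9, 1) is closer to p1 and so needs an even larger n,
# which is why n = 30 cannot reach the requested power
```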


Object of class "power.htest", a list of the arguments (including the computed one) augmented with method and note elements.


uniroot is used to solve the power equation for unknowns, so you may see errors from it, notably about inability to bracket the root when invalid arguments are given. If one of p1 and p2 is computed, then p1 < p2 is assumed and will hold, but if you specify both, p2 ≤ p1 is allowed.


Peter Dalgaard. Based on previous work by Claus Ekstrøm

See Also

prop.test, uniroot


power.prop.test(n = 50, p1 = .50, p2 = .75)      ## => power = 0.740
power.prop.test(p1 = .50, p2 = .75, power = .90) ## =>     n = 76.7
power.prop.test(n = 50, p1 = .5, power = .90)    ## =>    p2 = 0.8026
power.prop.test(n = 50, p1 = .5, p2 = 0.9, power = .90, sig.level=NULL)
                                                 ## => sig.l = 0.00131
power.prop.test(p1 = .5, p2 = 0.501, sig.level=.001, power=0.90)
                                                 ## => n = 10451937
try(
 power.prop.test(n=30, p1=0.90, p2=NULL, power=0.8)
) # a warning  (which may become an error)
## Reason:
power.prop.test(      p1=0.90, p2= 1.0, power=0.8) ##-> n = 73.37

Power calculations for one and two sample t tests


Compute the power of the one- or two- sample t test, or determine parameters to obtain a target power.


power.t.test(n = NULL, delta = NULL, sd = 1, sig.level = 0.05,
             power = NULL,
             type = c("two.sample", "one.sample", "paired"),
             alternative = c("two.sided", "one.sided"),
             strict = FALSE, tol = .Machine$double.eps^0.25)



number of observations (per group)


true difference in means


standard deviation


significance level (Type I error probability)


power of test (1 minus Type II error probability)


string specifying the type of t test. Can be abbreviated.


one- or two-sided test. Can be abbreviated.


use strict interpretation in two-sided case


numerical tolerance used in root finding, the default providing (at least) four significant digits.


Exactly one of the parameters n, delta, power, sd, and sig.level must be passed as NULL, and that parameter is determined from the others. Notice that the last two have non-NULL defaults, so NULL must be explicitly passed if you want to compute them.

If strict = TRUE is used, the power will include the probability of rejection in the opposite direction of the true effect, in the two-sided case. Without this the power will be half the significance level if the true difference is zero.
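The effect of strict can be seen at a true difference of zero, where the two interpretations differ most. The values n = 20 and sd = 1 below are arbitrary illustrative choices.

```r
# True difference of zero: only the upper tail is counted by default
loose  <- power.t.test(n = 20, delta = 0, sd = 1, sig.level = 0.05)

# strict = TRUE also counts rejections in the opposite tail
strict <- power.t.test(n = 20, delta = 0, sd = 1, sig.level = 0.05,
                       strict = TRUE)

loose$power   # about sig.level / 2
strict$power  # about sig.level
```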


Object of class "power.htest", a list of the arguments (including the computed one) augmented with method and note elements.


uniroot is used to solve the power equation for unknowns, so you may see errors from it, notably about inability to bracket the root when invalid arguments are given.


Peter Dalgaard. Based on previous work by Claus Ekstrøm

See Also

t.test, uniroot


power.t.test(n = 20, delta = 1)
power.t.test(power = .90, delta = 1)
power.t.test(power = .90, delta = 1, alternative = "one.sided")

Phillips-Perron Test for Unit Roots


Computes the Phillips-Perron test for the null hypothesis that x has a unit root against a stationary alternative.


PP.test(x, lshort = TRUE)



a numeric vector or univariate time series.


a logical indicating whether the short or long version of the truncation lag parameter is used.


The general regression equation which incorporates a constant and a linear trend is used, and the corrected t-statistic for the hypothesis that the first-order autoregressive coefficient equals one is computed. To estimate sigma^2, the Newey-West estimator is used. If lshort is TRUE, then the truncation lag parameter is set to trunc(4*(n/100)^0.25), otherwise trunc(12*(n/100)^0.25) is used. The p-values are interpolated from Table 4.2, page 103 of Banerjee et al. (1993).

Missing values are not handled.
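The two truncation lag rules can be evaluated by hand. The helper pp_lag below is hypothetical (it is not part of the stats package) and simply takes n to be the series length; the value of n used internally by PP.test may differ slightly.

```r
# Hypothetical helper reproducing the two lag formulas from the text
pp_lag <- function(n, lshort = TRUE) {
  if (lshort) trunc(4 * (n / 100)^0.25) else trunc(12 * (n / 100)^0.25)
}

pp_lag(1000)                  # short version: 7
pp_lag(1000, lshort = FALSE)  # long version: 21

# For comparison, the lag reported by PP.test itself
set.seed(1)
x <- rnorm(1000)
PP.test(x)$parameter
```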


A list with class "htest" containing the following components:


the value of the test statistic.


the truncation lag parameter.


the p-value of the test.


a character string indicating what type of test was performed.

a character string giving the name of the data.


A. Trapletti


A. Banerjee, J. J. Dolado, J. W. Galbraith, and D. F. Hendry (1993). Cointegration, Error Correction, and the Econometric Analysis of Non-Stationary Data. Oxford University Press, Oxford.

P. Perron (1988). Trends and random walks in macroeconomic time series. Journal of Economic Dynamics and Control, 12, 297–332. doi:10.1016/0165-1889(88)90043-7.


x <- rnorm(1000)
PP.test(x)
y <- cumsum(x) # has unit root
PP.test(y)

Ordinates for Probability Plotting


Generates the sequence of probability points (1:m - a)/(m + (1-a)-a) where m is either n, if length(n)==1, or length(n).


ppoints(n, a = if(n <= 10) 3/8 else 1/2)



either the number of points generated or a vector of observations.


the offset fraction to be used; typically in (0, 1).


If 0 < a < 1, the resulting values are within (0, 1) (excluding boundaries). In any case, the resulting sequence is symmetric in [0, 1], i.e., p + rev(p) == 1.
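A quick check of the formula and of the symmetry property, written as a sketch with an arbitrary m = 7 (so the default offset a = 3/8 applies):

```r
m <- 7
a <- 3/8                                 # default offset when m <= 10
p <- ppoints(m)

# Direct evaluation of (1:m - a)/(m + (1-a) - a)
manual <- (1:m - a) / (m + (1 - a) - a)
all.equal(p, manual)

# Symmetry: each point and its mirror image sum to one
all.equal(p + rev(p), rep(1, m))

# With 0 < a < 1 every point lies strictly inside (0, 1)
range(p)
```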

ppoints() is used in qqplot and qqnorm to generate the set of probabilities at which to evaluate the inverse distribution.

The choice of a follows the documentation of the function of the same name in Becker et al. (1988), and appears to have been motivated by results from Blom (1958) on approximations to expected normal order statistics (see also quantile).

The probability points for the continuous sample quantile types 5 to 9 (see quantile) can be obtained by taking a as, respectively, 1/2, 0, 1, 1/3, and 3/8.
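One way to see this correspondence, sketched here with arbitrary simulated data: evaluating a quantile of type 5 to 9 at the matching probability points should return the sorted sample itself.

```r
set.seed(1)
x <- rnorm(10)
n <- length(x)

# Offset 'a' paired with each continuous quantile type, as listed above
offsets <- c("5" = 1/2, "6" = 0, "7" = 1, "8" = 1/3, "9" = 3/8)

ok <- vapply(names(offsets), function(type) {
  p <- ppoints(n, a = offsets[[type]])
  q <- quantile(x, probs = p, type = as.integer(type), names = FALSE)
  # the quantiles at these points should be the order statistics
  isTRUE(all.equal(q, sort(x)))
}, logical(1))
ok
```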


Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.

Blom, G. (1958) Statistical Estimates and Transformed Beta Variables. Wiley.

See Also

qqplot, qqnorm.


ppoints(4) # the same as  ppoints(1:4)
ppoints(10, a = 1/2)

## Visualize including the fractions :
p.ppoints <- function(n, ..., add = FALSE, col = par("col")) {
  pn <- ppoints(n, ...)
  if(add)
      points(pn, pn, col = col)
  else {
      tit <- match.call(); tit[[1]] <- quote(ppoints)
      plot(pn,pn, main = deparse(tit), col=col,
           xlim = 0:1, ylim = 0:1, xaxs = "i", yaxs = "i")
      abline(0, 1, col = adjustcolor(1, 1/4), lty = 3)
  }
  if(!add && requireNamespace("MASS", quietly = TRUE))
    text(pn, pn, as.character(MASS::fractions(pn)),
         adj = c(0,0)-1/4, cex = 3/4, xpd = NA, col=col)
  abline(h = pn, v = pn, col = adjustcolor(col, 1/2), lty = 2, lwd = 1/2)
}

p.ppoints(10, a = 1/2)
p.ppoints(8) ; p.ppoints(8, a = 1/2, add=TRUE, col="tomato")

Projection Pursuit Regression


Fit a projection pursuit regression model.


ppr(x, ...)

## S3 method for class 'formula'
ppr(formula, data, weights, subset, na.action,
    contrasts = NULL, ..., model = FALSE)

## Default S3 method:
ppr(x, y, weights = rep(1, n),
    ww = rep(1, q), nterms, max.terms = nterms, optlevel = 2,
    sm.method = c("supsmu", "spline", "gcvspline"),
    bass = 0, span = 0, df = 5, gcvpen = 1, trace = FALSE, ...)



a formula specifying one or more numeric response variables and the explanatory variables.


numeric matrix of explanatory variables. Rows represent observations, and columns represent variables. Missing values are not accepted.


numeric matrix of response variables. Rows represent observations, and columns represent variables. Missing values are not accepted.


number of terms to include in the final model.


a data frame (or similar: see model.frame) from which variables specified in formula are preferentially to be taken.


a vector of weights w_i for each case.


a vector of weights for each response, so the fit criterion is the sum over case i and responses j of w_i ww_j (y_ij - fit_ij)^2 divided by the sum of w_i.


an index vector specifying the cases to be used in the training sample. (NOTE: If given, this argument must be named.)


a function to specify the action to be taken if NAs are found. The default action is given by getOption("na.action"). (NOTE: If given, this argument must be named.)


the contrasts to be used when any factor explanatory variables are coded.


maximum number of terms to choose from when building the model.


integer from 0 to 3 which determines the thoroughness of an optimization routine in the SMART program. See the ‘Details’ section.


the method used for smoothing the ridge functions. The default is to use Friedman's super smoother supsmu. The alternatives are to use the smoothing spline code underlying smooth.spline, either with a specified (equivalent) degrees of freedom for each ridge function, or to allow the smoothness to be chosen by GCV.

Can be abbreviated.


super smoother bass tone control used with automatic span selection (see supsmu); the range of values is 0 to 10, with larger values resulting in increased smoothing.


super smoother span control (see supsmu). The default, 0, results in automatic span selection by local cross validation. span can also take a value in (0, 1].


if sm.method is "spline" specifies the smoothness of each ridge term via the requested equivalent degrees of freedom.


if sm.method is "gcvspline" this is the penalty used in the GCV selection for each degree of freedom used.


logical indicating if each spline fit should produce diagnostic output (about lambda and df), and the supsmu fit about its steps.


arguments to be passed to or from other methods.


logical. If true, the model frame is returned.


The basic method is given by Friedman (1984) and based on his code. This code has been shown to be extremely sensitive to the Fortran compiler used.

The algorithm first adds up to max.terms ridge terms one at a time; it will use fewer if it is unable to find a term to add that makes sufficient difference. It then removes the least important term at each step until nterms terms are left.

The levels of optimization (argument optlevel) differ in how thoroughly the models are refitted during this process. At level 0 the existing ridge terms are not refitted. At level 1 the projection directions are not refitted, but the ridge functions and the regression coefficients are. Levels 2 and 3 refit all the terms and are equivalent for one response; level 3 is more careful to re-balance the contributions from each regressor at each step and so is a little less likely to converge to a saddle point of the sum of squares criterion.


A list with the following components, many of which are for use by the method functions.


the matched call


the number of explanatory variables (after any coding)


the number of response variables


the argument nterms


the argument max.terms


the overall residual (weighted) sum of squares for the selected model


the overall residual (weighted) sum of squares against the number of terms, up to max.terms. Will be invalid (and zero) for fewer than nterms terms.


the argument df


if sm.method is "spline" or "gcvspline" the equivalent number of degrees of freedom for each ridge term used.


the names of the explanatory variables


the names of the response variables


a matrix of the projection directions, with a column for each ridge term


a matrix of the coefficients applied for each response to the ridge terms: the rows are the responses and the columns the ridge terms


the weighted means of each response


the overall scale factor used: internally the responses are divided by ys to have unit total weighted sum of squares.


the fitted values, as a matrix if q > 1.


the residuals, as a matrix if q > 1.


internal work array, which includes the ridge functions evaluated at the training set points.


(only if model = TRUE) the model frame.
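A few of these components can be inspected on a small fit. The sketch below reuses the rock data from the examples; the component names p, q, mu, ml, alpha and gofn are those of the object returned by ppr in R's stats package, and the data frame rock1 is introduced here only for illustration.

```r
# Rescale two predictors, as in the examples, but inside a data frame
rock1 <- transform(rock, area1 = area / 10000, peri1 = peri / 10000)

fit <- ppr(log(perm) ~ area1 + peri1 + shape, data = rock1,
           nterms = 2, max.terms = 5)

# Number of explanatory variables, responses, and the two term counts
c(p = fit$p, q = fit$q, mu = fit$mu, ml = fit$ml)

# Projection directions: one row per variable, one column per ridge term
dim(fit$alpha)

# Residual sum of squares against the number of terms (values may vary
# between platforms, so none are asserted here)
fit$gofn
```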


Friedman (1984): converted to double precision and added interface to smoothing splines by B. D. Ripley, originally for the MASS package.


Friedman, J. H. and Stuetzle, W. (1981). Projection pursuit regression. Journal of the American Statistical Association, 76, 817–823. doi:10.2307/2287576.

Friedman, J. H. (1984). SMART User's Guide. Laboratory for Computational Statistics, Stanford University Technical Report No. 1.

Venables, W. N. and Ripley, B. D. (2002). Modern Applied Statistics with S. Springer.

See Also

plot.ppr, supsmu, smooth.spline



# Note: your numerical values may differ
area1 <- rock$area/10000; peri1 <- rock$peri/10000 # rescaled copies of two rock columns
rock.ppr <- ppr(log(perm) ~ area1 + peri1 + shape,
                data = rock, nterms = 2, max.terms = 5)
rock.ppr
# Call:
# ppr.formula(formula = log(perm) ~ area1 + peri1 + shape, data = rock,
#     nterms = 2, max.terms = 5)
# Goodness of fit:
#  2 terms  3 terms  4 terms  5 terms
# 8.737806 5.289517 4.745799 4.490378

summary(rock.ppr)
# .....  (same as above)
# .....
# Projection direction vectors ('alpha'):
#       term 1      term 2
# area1  0.34357179  0.37071027
# peri1 -0.93781471 -0.61923542
# shape  0.04961846  0.69218595
# Coefficients of ridge terms:
#    term 1    term 2
# 1.6079271 0.5460971

par(mfrow = c(3,2))   # maybe: , pty = "s")
plot(rock.ppr, main = "ppr(log(perm)~ ., nterms=2, max.terms=5)")
plot(update(rock.ppr, bass = 5), main = "update(..., bass = 5)")
plot(update(rock.ppr, sm.method = "gcv", gcvpen = 2),
     main = "update(..., sm.method=\"gcv\", gcvpen=2)")
cbind(perm = rock$perm, prediction = round(exp(predict(rock.ppr)), 1))

Principal Components Analysis


Performs a principal components analysis on the given data matrix and returns the results as an object of class prcomp.


prcomp(x, ...)

## S3 method for class 'formula'
prcomp(formula, data = NULL, subset, na.action, ...)

## Default S3 method:
prcomp(x, retx = TRUE, center = TRUE, scale. = FALSE,
       tol = NULL, rank. = NULL, ...)

## S3 method for class 'prcomp'
predict(object, newdata, ...)



a formula with no response variable, referring only to numeric variables.


an optional data frame (or similar: see model.frame) containing the variables in the formula formula. By default the variables are taken from environment(formula).


an optional vector used to select rows (observations) of the data matrix x.


a function which indicates what should happen when the data contain NAs. The default is set by the na.action setting of options, and is na.fail if that is unset. The ‘factory-fresh’ default is na.omit.


arguments passed to or from other methods. If x is a formula one might specify scale. or tol.


a numeric or complex matrix (or data frame) which provides the data for the principal components analysis.


a logical value indicating whether the rotated variables should be returned.


a logical value indicating whether the variables should be shifted to be zero centered. Alternatively, a vector of length equal to the number of columns of x can be supplied. The value is passed to scale.


a logical value indicating whether the variables should be scaled to have unit variance before the analysis takes place. The default is FALSE for consistency with S, but in general scaling is advisable. Alternatively, a vector of length equal to the number of columns of x can be supplied. The value is passed to scale.


a value indicating the magnitude below which components should be omitted. (Components are omitted if their standard deviations are less than or equal to tol times the standard deviation of the first component.) With the default null setting, no components are omitted (unless rank. is specified less than min(dim(x))). Other settings for tol could be tol = 0 or tol = sqrt(.Machine$double.eps), which would omit essentially constant components.


optionally, a number specifying the maximal rank, i.e., maximal number of principal components to be used. Can be set as alternative or in addition to tol, useful notably when the desired rank is considerably smaller than the dimensions of the matrix.


object of class inheriting from "prcomp"


An optional data frame or matrix in which to look for variables with which to predict. If omitted, the scores are used. If the original fit used a formula or a data frame or a matrix with column names, newdata must contain columns with the same names. Otherwise it must contain the same number of columns, to be used in the same order.


The calculation is done by a singular value decomposition of the (centered and possibly scaled) data matrix, not by using eigen on the covariance matrix. This is generally the preferred method for numerical accuracy. The print method for these objects prints the results in a nice format and the plot method produces a scree plot.

Unlike princomp, variances are computed with the usual divisor N - 1.
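Although the computation uses the singular value decomposition, the squared standard deviations should agree with the eigenvalues of the covariance matrix (or of the correlation matrix when scaling is requested), because of the N - 1 divisor. A sketch using the USArrests data:

```r
# Unscaled analysis: compare with eigenvalues of the covariance matrix
pc_cov <- prcomp(USArrests)
ev_cov <- eigen(cov(USArrests), symmetric = TRUE)$values
all.equal(pc_cov$sdev^2, ev_cov)

# Scaled analysis: compare with eigenvalues of the correlation matrix
pc_cor <- prcomp(USArrests, scale. = TRUE)
ev_cor <- eigen(cor(USArrests), symmetric = TRUE)$values
all.equal(pc_cor$sdev^2, ev_cor)
```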

Note that scale = TRUE cannot be used if there are zero or constant (for center = TRUE) variables.


prcomp returns a list with class "prcomp" containing the following components:


the standard deviations of the principal components (i.e., the square roots of the eigenvalues of the covariance/correlation matrix, though the calculation is actually done with the singular values of the data matrix).


the matrix of variable loadings (i.e., a matrix whose columns contain the eigenvectors). The function princomp returns this in the element loadings.


if retx is true the value of the rotated data (the centred (and scaled if requested) data multiplied by the rotation matrix) is returned. Hence, cov(x) is the diagonal matrix diag(sdev^2). For the formula method, napredict() is applied to handle the treatment of values omitted by the na.action.

center, scale

the centering and scaling used, or FALSE.
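The relationship between these components can be checked by hand. The sketch below, again on USArrests, rebuilds the rotated data from center, scale and rotation, confirms that its covariance is diag(sdev^2), and shows that predict on a subset of the original rows reproduces the stored scores.

```r
pc <- prcomp(USArrests, scale. = TRUE)

# Rotated data = (centred and scaled data) %*% rotation
manual <- scale(USArrests, center = pc$center, scale = pc$scale) %*% pc$rotation
all.equal(manual, pc$x, check.attributes = FALSE)

# The scores are uncorrelated, with variances sdev^2
all.equal(unname(cov(pc$x)), diag(pc$sdev^2))

# predict() applies the same centring, scaling and rotation to new rows
new_scores <- predict(pc, newdata = USArrests[1:5, ])
all.equal(new_scores, pc$x[1:5, ], check.attributes = FALSE)
```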


The signs of the columns of the rotation matrix are arbitrary, and so may differ between different programs for PCA, and even between different builds of R.


Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.

Mardia, K. V., J. T. Kent, and J. M. Bibby (1979) Multivariate Analysis, London: Academic Press.

Venables, W. N. and B. D. Ripley (2002) Modern Applied Statistics with S, Springer-Verlag.

See Also

biplot.prcomp, screeplot, princomp, cor, cov, svd, eigen.


C <- chol(S <- toeplitz(.9 ^ (0:31))) # Cov.matrix and its root
all.equal(S, crossprod(C))
X <- matrix(rnorm(32000), 1000, 32)
Z <- X %*% C  ## ==>  cov(Z) ~=  C'C = S
all.equal(cov(Z), S, tolerance = 0.08)
pZ <- prcomp(Z, tol = 0.1)
summary(pZ) # only ~14 PCs (out of 32)
## or choose only 3 PCs more directly:
pz3 <- prcomp(Z, rank. = 3)
summary(pz3) # same numbers as the first 3 above
stopifnot(ncol(pZ$rotation) == 14, ncol(pz3$rotation) == 3,
          all.equal(pz3$sdev, pZ$sdev, tolerance = 1e-15)) # exactly equal typically

## signs are random
## the variances of the variables in the
## USArrests data vary by orders of magnitude, so scaling is appropriate
prcomp(USArrests)  # inappropriate
prcomp(USArrests, scale. = TRUE)
prcomp(~ Murder + Assault + Rape, data = USArrests, scale. = TRUE)
summary(prcomp(USArrests, scale. = TRUE))
biplot(prcomp(USArrests, scale. = TRUE))

Model Predictions


predict is a generic function for predictions from the results of various model fitting functions. The function invokes particular methods which depend on the class of the first argument.


predict (object, ...)



a model object for which prediction is desired.


additional arguments affecting the predictions produced.


Most prediction methods which are similar to those for linear models have an argument newdata specifying the first place to look for explanatory variables to be used for prediction. Some considerable attempts are made to match up the columns in newdata to those used for fitting, for example that they are of comparable types and that any factors have the same level set in the same order (or can be transformed to be so).

Time series prediction methods in package stats have an argument n.ahead specifying how many time steps ahead to predict.

Many methods have a logical argument saying if standard errors are to be returned.


The form of the value returned by predict depends on the class of its argument. See the documentation of the particular methods for details of what is produced by that method.
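As a minimal sketch of the generic in use, the call below dispatches to the method for class "lm" and looks up the explanatory variable in newdata. The cars data set and the speeds 10 and 20 are arbitrary illustrative choices.

```r
fit <- lm(dist ~ speed, data = cars)
class(fit)                       # "lm", so predict() uses predict.lm

# newdata must supply the explanatory variable under the same name
new  <- data.frame(speed = c(10, 20))
pred <- predict(fit, newdata = new)

# The same numbers computed directly from the coefficients
manual <- coef(fit)[["(Intercept)"]] + coef(fit)[["speed"]] * new$speed
all.equal(unname(pred), manual)
```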


Chambers, J. M. and Hastie, T. J. (1992) Statistical Models in S. Wadsworth & Brooks/Cole.

See Also

predict.glm, predict.lm, predict.loess, predict.nls, predict.poly, predict.princomp, predict.smooth.spline.

SafePrediction for prediction from (univariable) polynomial and spline fits.

For time-series prediction, see predict.Arima, predict.arima0, predict.HoltWinters, predict.StructTS.



## All the "predict" methods found
## NB most of the methods in the standard packages are hidden.
## Output will depend on what namespaces are (or have been) loaded.

for(fn in methods("predict"))
   try({
       f <- eval(substitute(getAnywhere(fn)$objs[[1]], list(fn = fn)))
       cat(fn, ":\n\t", deparse(args(f)), "\n")
       }, silent = TRUE)

Forecast from ARIMA fits


Forecast from models fitted by arima.


## S3 method for class 'Arima'
predict(object, n.ahead = 1, newxreg = NULL, se.fit = TRUE, ...)



The result of an arima fit.


The number of steps ahead for which prediction is required.


New values of xreg to be used for prediction. Must have at least n.ahead rows.

Logical: should standard errors of prediction be returned?


arguments passed to or from other methods.


Finite-history prediction is used, via KalmanForecast. This is only statistically efficient if the MA part of the fit is invertible, so predict.Arima will give a warning for non-invertible MA models.

The standard errors of prediction exclude the uncertainty in the estimation of the ARMA model and the regression coefficients. According to Harvey (1993, pp. 58–9) the effect is small.


A time series of predictions, or if se.fit = TRUE, a list with components pred, the predictions, and se, the estimated standard errors. Both components are time series.
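The two components can be combined into rough forecast bands. The sketch below assumes approximately normal forecast errors (an assumption made here, not a statement from this page) and, as noted above, ignores the uncertainty in the estimated coefficients.

```r
fit <- arima(lh, order = c(3, 0, 0))
fc  <- predict(fit, n.ahead = 12)   # se.fit = TRUE by default
names(fc)                           # "pred" and "se"

# Approximate 95% bands under a normality assumption
z     <- qnorm(0.975)
lower <- fc$pred - z * fc$se
upper <- fc$pred + z * fc$se
cbind(pred = fc$pred, lower = lower, upper = upper)
```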


Durbin, J. and Koopman, S. J. (2001). Time Series Analysis by State Space Methods. Oxford University Press.

Harvey, A. C. and McKenzie, C. R. (1982). Algorithm AS 182: An algorithm for finite sample prediction from ARIMA processes. Applied Statistics, 31, 180–187. doi:10.2307/2347987.

Harvey, A. C. (1993). Time Series Models, 2nd Edition. Harvester Wheatsheaf. Sections 3.3 and 4.4.

See Also



od <- options(digits = 5) # avoid too much spurious accuracy
predict(arima(lh, order = c(3,0,0)), n.ahead = 12)

(fit <- arima(USAccDeaths, order = c(0,1,1),
              seasonal = list(order = c(0,1,1))))
predict(fit, n.ahead = 6)
options(od) # restore the previous digits setting

Predict Method for GLM Fits


Obtains predictions and optionally estimates standard errors of those predictions from a fitted generalized linear model object.


## S3 method for class 'glm'
predict(object, newdata = NULL,
            type = c("link", "response", "terms"),
            se.fit = FALSE, dispersion = NULL, terms = NULL,
            na.action = na.pass, ...)



a fitted object of class inheriting from "glm".


optionally, a data frame in which to look for variables with which to predict. If omitted, the fitted linear predictors are used.


the type of prediction required. The default is on the scale of the linear predictors; the alternative "response" is on the scale of the response variable. Thus for a default binomial model the default predictions are of log-odds (probabilities on logit scale) and type = "response" gives the predicted probabilities. The "terms" option returns a matrix giving the fitted values of each term in the model formula on the linear predictor scale.

The value of this argument can be abbreviated.

logical switch indicating if standard errors are required.


the dispersion of the GLM fit to be assumed in computing the standard errors. If omitted, that returned by summary applied to the object is used.


with type = "terms" by default all terms are returned. A character vector specifies which terms are to be returned.


function determining what should be done with missing values in newdata. The default is to predict NA.


further arguments passed to or from other methods.


If newdata is omitted the predictions are based on the data used for the fit. In that case how cases with missing values in the original fit are handled is determined by the na.action argument of that fit. If na.action = na.omit, omitted cases will not appear in the predictions, whereas if na.action = na.exclude they will appear (in predictions and standard errors), with value NA. See also napredict.


If se.fit = FALSE, a vector or matrix of predictions. For type = "terms" this is a matrix with a column per term, and may have an attribute "constant".

If se.fit = TRUE, a list with components


Predictions, as for se.fit = FALSE.

Estimated standard errors.


A scalar giving the square root of the dispersion used in computing the standard errors.
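The sketch below, built on the budworm data from the examples, shows the list returned with se.fit = TRUE and the relation between the "link" and "response" scales for a binomial fit with the default logit link. The back-transformed Wald-type interval is an illustrative construction added here, not something predict returns.

```r
ldose   <- rep(0:5, 2)
numdead <- c(1, 4, 9, 13, 18, 20, 0, 2, 6, 10, 12, 16)
sex     <- factor(rep(c("M", "F"), c(6, 6)))
SF      <- cbind(numdead, numalive = 20 - numdead)
fit     <- glm(SF ~ sex * ldose, family = binomial)

new  <- data.frame(ldose = c(1, 3),
                   sex = factor(c("M", "M"), levels = levels(sex)))
link <- predict(fit, new, type = "link", se.fit = TRUE)
resp <- predict(fit, new, type = "response")

names(link)                        # fit, se.fit, residual.scale
all.equal(plogis(link$fit), resp)  # inverse logit of link = response

# Approximate 95% interval: built on the link scale, then back-transformed
ci <- plogis(cbind(lower = link$fit - 1.96 * link$se.fit,
                   upper = link$fit + 1.96 * link$se.fit))
ci
```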


Variables are first looked for in newdata and then searched for in the usual way (which will include the environment of the formula used in the fit). A warning will be given if the variables found are not of the same length as those in newdata if it was supplied.

See Also

glm, SafePrediction



## example from Venables and Ripley (2002, pp. 190-2.)
ldose <- rep(0:5, 2)
numdead <- c(1, 4, 9, 13, 18, 20, 0, 2, 6, 10, 12, 16)
sex <- factor(rep(c("M", "F"), c(6, 6)))
SF <- cbind(numdead, numalive = 20-numdead)
budworm.lg <- glm(SF ~ sex*ldose, family = binomial)

plot(c(1,32), c(0,1), type = "n", xlab = "dose",
     ylab = "prob", log = "x")
text(2^ldose, numdead/20, as.character(sex))
ld <- seq(0, 5, 0.1)
lines(2^ld, predict(budworm.lg, data.frame(ldose = ld,
   sex = factor(rep("M", length(ld)), levels = levels(sex))),
   type = "response"))
lines(2^ld, predict(budworm.lg, data.frame(ldose = ld,
   sex = factor(rep("F", length(ld)), levels = levels(sex))),
   type = "response"))

Prediction Function for Fitted Holt-Winters Models


Computes predictions and prediction intervals for models fitted by the Holt-Winters method.


## S3 method for class 'HoltWinters'
predict(object, n.ahead = 1, prediction.interval = FALSE,
       level = 0.95, ...)



An object of class HoltWinters.


Number of future periods to predict.


logical. If TRUE, the lower and upper bounds of the corresponding prediction intervals are computed.


Confidence level for the prediction interval.


arguments passed to or from other methods.


A time series of the predicted values. If prediction intervals are requested, a multiple time series is returned with columns fit, lwr and upr for the predicted values and the lower and upper bounds respectively.
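A short sketch of the returned structure, using the co2 series from the examples and an arbitrary horizon of 12 periods:

```r
m <- HoltWinters(co2)
p <- predict(m, n.ahead = 12, prediction.interval = TRUE, level = 0.95)

colnames(p)   # contains fit, lwr and upr (column order may differ)
nrow(p)       # one row per predicted period

# Each point forecast lies between its interval bounds
all(p[, "lwr"] < p[, "fit"] & p[, "fit"] < p[, "upr"])
```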


David Meyer


C. C. Holt (1957) Forecasting trends and seasonals by exponentially weighted moving averages, ONR Research Memorandum, Carnegie Institute of Technology 52.

P. R. Winters (1960). Forecasting sales by exponentially weighted moving averages. Management Science, 6, 324–342. doi:10.1287/mnsc.6.3.324.

See Also




m <- HoltWinters(co2)
p <- predict(m, 50, prediction.interval = TRUE)
plot(m, p)

Predict method for Linear Model Fits


Predicted values based on linear model object.


## S3 method for class 'lm'
predict(object, newdata, se.fit = FALSE, scale = NULL, df = Inf,
        interval = c("none", "confidence", "prediction"),
        level = 0.95, type = c("response", "terms"),
        terms = NULL, na.action = na.pass,
        pred.var = res.var/weights, weights = 1,
        rankdeficient = c("warnif", "simple", "non-estim", "NA", "NAwarn"),
        tol = 1e-6, verbose = FALSE,
        ...)



Object of class inheriting from "lm"


An optional data frame in which to look for variables with which to predict. If omitted, the fitted values are used.

A switch indicating if standard errors are required.


Scale parameter for std.err. calculation.


Degrees of freedom for scale.


Type of interval calculation. Can be abbreviated.


Tolerance/confidence level.


Type of prediction (response or model term). Can be abbreviated.


If type = "terms", which terms (default is all terms), a character vector.


function determining what should be done with missing values in newdata. The default is to predict NA.


the variance(s) for future observations to be assumed for prediction intervals. See ‘Details’.


variance weights for prediction. This can be a numeric vector or a one-sided model formula. In the latter case, it is interpreted as an expression evaluated in newdata.


a character string specifying what should happen in the case of a rank deficient model, i.e., when object$rank < ncol(model.matrix(object)).


gives a warning only in case of predicting ‘non-estimable’ cases, i.e., vectors not in the same predictor subspace as the original data (with tolerance tol). In that case, the non-estimable indices are also returned as attribute "non-estim" (see rankdeficient="non-estim").


is backward compatible with R < 4.3.0, possibly giving dubious predictions in non-estimable cases, and always signalling a warning.


gives the same predictions without warning, and with an attribute attr(*, "non-estim") with indices in 1:nrow(newdata) of new data observations which are deemed non-estimable.


predicts NA for non-estimable new data, silently. Often recommended in new code.


predicts NA for non-estimable new data with a warning.


non-negative number determining how non-estimability is determined in rank deficient cases.


logical indicating if messages should be produced about rank deficiency handling.


further arguments passed to or from other methods.


predict.lm produces predicted values, obtained by evaluating the regression function in the frame newdata (which defaults to model.frame(object)). If the logical se.fit is TRUE, standard errors of the predictions are calculated. If the numeric argument scale is set (with optional df), it is used as the residual standard deviation in the computation of the standard errors, otherwise this is extracted from the model fit. Setting interval specifies computation of confidence or prediction (tolerance) intervals at the specified level, sometimes referred to as narrow vs. wide intervals.

If the fit is rank-deficient, some of the columns of the design matrix will have been dropped during the lm computations, and corresponding coef() components set to NA. Prediction from such a fit only makes sense if newdata is contained in the same subspace as the original data. Other newdata entries (rows) are non-estimable. This is now checked (up to numerical tolerance tol) unless rankdeficient == "simple", which corresponds to previous behaviour, warns always and predicts using the non-NA coefficients with the corresponding columns of the design matrix. The new default option, rankdeficient == "warnif" checks if there are “non-estimable” cases (up to tolerance tol) and only warns in that case. All further rankdeficient options also check and either predict NA or mark the non-estimable cases differently.
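The following sketch illustrates the rank-deficient case with simulated data in which x2 is an exact multiple of x1. It assumes R 4.3.0 or later, where the rankdeficient argument is available; the data and variable names are made up for illustration.

```r
set.seed(42)
d    <- data.frame(x1 = 1:10)
d$x2 <- 2 * d$x1                  # exactly collinear with x1
d$y  <- 1 + d$x1 + rnorm(10)

fit <- lm(y ~ x1 + x2, data = d)
coef(fit)                         # the coefficient of x2 is NA

# Row 1 respects x2 = 2 * x1 and lies in the subspace of the original data;
# row 2 does not, and is therefore non-estimable
new  <- data.frame(x1 = c(3, 3), x2 = c(6, 7))
pred <- predict(fit, new, rankdeficient = "NA")
pred
```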

If newdata is omitted the predictions are based on the data used for the fit. In that case how cases with missing values in the original fit are handled is determined by the na.action argument of that fit. If na.action = na.omit omitted cases will not appear in the predictions, whereas if na.action = na.exclude they will appear (in predictions, standard errors or interval limits), with value NA. See also napredict.
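The difference between na.omit and na.exclude can be seen in a small sketch. The data frame d below is invented for the illustration.

```r
## Hypothetical data with one missing response
d <- data.frame(x = 1:6, y = c(1.2, NA, 3.1, 3.9, 5.2, 6.1))

fit.omit <- lm(y ~ x, data = d, na.action = na.omit)
fit.excl <- lm(y ~ x, data = d, na.action = na.exclude)

p.omit <- predict(fit.omit)   # the incomplete case is dropped
p.excl <- predict(fit.excl)   # the incomplete case is kept, with value NA
length(p.omit)
length(p.excl)
p.excl
```

Both fits use the same five complete cases, so the non-missing predictions agree; only the padding with NA differs.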

The prediction intervals are for a single observation at each case in newdata (or by default, the data used for the fit) with error variance(s) pred.var. This can be a multiple of res.var, the estimated value of $\sigma^2$: the default is to assume that future observations have the same error variance as those used for fitting. If weights is supplied, the inverse of this is used as a scale factor. For a weighted fit, if the prediction is for the original data frame, weights defaults to the weights used for the model fit, with a warning since it might not be the intended result. If the fit was weighted and newdata is given, the default is to assume constant prediction variance, with a warning.
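For an unweighted fit with the default pred.var, the prediction interval can be reproduced by hand from the se.fit output, which may help to see how the error variance enters. This is a sketch on simulated data; the object names are illustrative only.

```r
set.seed(1)
x <- rnorm(15)
y <- x + rnorm(15)
fit <- lm(y ~ x)
new <- data.frame(x = c(-1, 0, 1))

p <- predict(fit, new, se.fit = TRUE)
res.var <- p$residual.scale^2            # estimated error variance

## half-width: t quantile times sqrt(variance of the mean + error variance)
half   <- qt(0.975, p$df) * sqrt(p$se.fit^2 + res.var)
manual <- cbind(fit = p$fit, lwr = p$fit - half, upr = p$fit + half)

auto <- predict(fit, new, interval = "prediction", level = 0.95)
all.equal(manual, auto, check.attributes = FALSE)
```

Dropping res.var from the square root gives the (narrower) confidence interval instead.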


predict.lm produces a vector of predictions or a matrix of predictions and bounds with column names fit, lwr, and upr if interval is set. For type = "terms" this is a matrix with a column per term and may have an attribute "constant".

If se.fit is TRUE, a list with the following components is returned:


vector or matrix as above

standard error of predicted means


residual standard deviations


degrees of freedom for residual


Variables are first looked for in newdata and then searched for in the usual way (which will include the environment of the formula used in the fit). A warning will be given if the variables found are not of the same length as those in newdata if it was supplied.

Notice that prediction variances and prediction intervals always refer to future observations, possibly corresponding to the same predictors as used for the fit. The variance of the residuals will be smaller.

Strictly speaking, the formula used for prediction limits assumes that the degrees of freedom for the fit are the same as those for the residual variance. This may not be the case if res.var is not obtained from the fit.

See Also

The model fitting function lm, predict.

SafePrediction for prediction from (univariable) polynomial and spline fits.



## Predictions
x <- rnorm(15)
y <- x + rnorm(15)
predict(lm(y ~ x))
new <- data.frame(x = seq(-3, 3, 0.5))
predict(lm(y ~ x), new, se.fit = TRUE)
pred.w.plim <- predict(lm(y ~ x), new, interval = "prediction")
pred.w.clim <- predict(lm(y ~ x), new, interval = "confidence")
matplot(new$x, cbind(pred.w.clim, pred.w.plim[,-1]),
        lty = c(1,2,2,3,3), type = "l", ylab = "predicted y")

## Prediction intervals, special cases
##  The first three of these throw warnings
w <- 1 + x^2
fit <- lm(y ~ x)
wfit <- lm(y ~ x, weights = w)
predict(fit, interval = "prediction")
predict(wfit, interval = "prediction")
predict(wfit, new, interval = "prediction")
predict(wfit, new, interval = "prediction", weights = (new$x)^2)
predict(wfit, new, interval = "prediction", weights = ~x^2)

##-- From  aov(.) example ---- predict(.. terms)
npk.aov <- aov(yield ~ block + N*P*K, npk)
(termL <- attr(terms(npk.aov), "term.labels"))
(pt <- predict(npk.aov, type = "terms"))
pt. <- predict(npk.aov, type = "terms", terms = termL[1:4])
stopifnot(all.equal(pt[,1:4], pt.,
                    tolerance = 1e-12, check.attributes = FALSE))

Predict LOESS Curve or Surface


Predictions from a loess fit, optionally with standard errors.


## S3 method for class 'loess'
predict(object, newdata = NULL, se = FALSE,
        na.action = na.pass, ...)



an object fitted by loess.


an optional data frame in which to look for variables with which to predict, or a matrix or vector containing exactly the variables needed for prediction. If missing, the original data points are used.


should standard errors be computed?


function determining what should be done with missing values in data frame newdata. The default is to predict NA.


arguments passed to or from other methods.


Computing standard errors (se = TRUE) is slower than prediction alone, notably because it needs a relatively large workspace (memory): matrices of dimension $N \times Nf$ where $f$ = span. That is, se = TRUE is $O(N^2)$ and hence stops when the sample size $N$ is larger than about 40'600 (for default span = 0.75).

When the fit was made using surface = "interpolate" (the default), predict.loess will not extrapolate – so points outside an axis-aligned hypercube enclosing the original data will have missing (NA) predictions and standard errors.


If se = FALSE, a vector giving the prediction for each row of newdata (or the original data). If se = TRUE, a list containing components


the predicted values.


an estimated standard error for each predicted value.


the estimated scale of the residuals used in computing the standard errors.


an estimate of the effective degrees of freedom used in estimating the residual scale, intended for use with t-based confidence intervals.

If newdata was the result of a call to expand.grid, the predictions (and s.e.'s if requested) will be an array of the appropriate dimensions.

Predictions from infinite inputs will be NA since loess does not support extrapolation.
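A minimal sketch of how the components returned with se = TRUE might be combined into t-based confidence limits, and of the NA returned outside the range of the data under the default surface = "interpolate". The names crit and ci are illustrative.

```r
cars.lo <- loess(dist ~ speed, cars)

p <- predict(cars.lo, data.frame(speed = c(10, 15, 20)), se = TRUE)
crit <- qt(0.975, p$df)                  # t quantile on the effective df
ci <- cbind(fit = p$fit,
            lwr = p$fit - crit * p$se.fit,
            upr = p$fit + crit * p$se.fit)
ci

## speed = 30 lies outside range(cars$speed), so no prediction is made
outside <- predict(cars.lo, data.frame(speed = 30))
outside
```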


Variables are first looked for in newdata and then searched for in the usual way (which will include the environment of the formula used in the fit). A warning will be given if the variables found are not of the same length as those in newdata if it was supplied.


B. D. Ripley, based on the cloess package of Cleveland, Grosse and Shyu.

See Also

loess

cars.lo <- loess(dist ~ speed, cars)
predict(cars.lo, data.frame(speed = seq(5, 30, 1)), se = TRUE)
# to get extrapolation
cars.lo2 <- loess(dist ~ speed, cars,
  control = loess.control(surface = "direct"))
predict(cars.lo2, data.frame(speed = seq(5, 30, 1)), se = TRUE)

Predicting from Nonlinear Least Squares Fits


predict.nls produces predicted values, obtained by evaluating the regression function in the frame newdata. If the logical se.fit is TRUE, standard errors of the predictions are calculated. If the numeric argument scale is set (with optional df), it is used as the residual standard deviation in the computation of the standard errors, otherwise this is extracted from the model fit. Setting interval specifies computation of confidence or prediction (tolerance) intervals at the specified level.

At present se.fit and interval are ignored.


## S3 method for class 'nls'
predict(object, newdata, se.fit = FALSE, scale = NULL, df = Inf,
        interval = c("none", "confidence", "prediction"),
        level = 0.95, ...)



An object that inherits from class nls.


A named list or data frame in which to look for variables with which to predict. If newdata is missing the fitted values at the original data points are returned.

A logical value indicating if the standard errors of the predictions should be calculated. Defaults to FALSE. At present this argument is ignored.


A numeric scalar. If it is set (with optional df), it is used as the residual standard deviation in the computation of the standard errors, otherwise this information is extracted from the model fit. At present this argument is ignored.


A positive numeric scalar giving the number of degrees of freedom for the scale estimate. At present this argument is ignored.


A character string indicating if prediction intervals or a confidence interval on the mean responses are to be calculated. At present this argument is ignored.


A numeric scalar between 0 and 1 giving the confidence level for the intervals (if any) to be calculated. At present this argument is ignored.


Additional optional arguments. At present no optional arguments are used.


predict.nls produces a vector of predictions. When implemented, interval will produce a matrix of predictions and bounds with column names fit, lwr, and upr. When implemented, if se.fit is TRUE, a list with the following components will be returned:


vector or matrix as above

standard error of predictions


residual standard deviations


degrees of freedom for residual


Variables are first looked for in newdata and then searched for in the usual way (which will include the environment of the formula used in the fit). A warning will be given if the variables found are not of the same length as those in newdata if it was supplied.
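Because se.fit and interval are currently ignored, supplying them should not change what predict.nls returns. The sketch below checks this on the BOD data; the names p.plain and p.extra are illustrative.

```r
fm <- nls(demand ~ SSasympOrig(Time, A, lrc), data = BOD)
nd <- list(Time = c(1, 2, 3))

p.plain <- predict(fm, nd)
p.extra <- predict(fm, nd, se.fit = TRUE, interval = "confidence")

is.list(p.extra)                                   # still a numeric vector
all.equal(as.vector(p.plain), as.vector(p.extra))  # same predictions
```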

See Also

The model fitting function nls, predict.



fm <- nls(demand ~ SSasympOrig(Time, A, lrc), data = BOD)
predict(fm)              # fitted values at observed times
## Form data plot and smooth line for the predictions
opar <- par(las = 1)
plot(demand ~ Time, data = BOD, col = 4,
     main = "BOD data and fitted first-order curve",
     xlim = c(0,7), ylim = c(0, 20) )
tt <- seq(0, 8, length.out = 101)
lines(tt, predict(fm, list(Time = tt)))
par(opar)   # restore the graphics settings changed above

Predict from Smoothing Spline Fit


Predict a smoothing spline fit at new points, return the derivative if desired. The predicted fit is linear beyond the original data.


## S3 method for class 'smooth.spline'
predict(object, x, deriv = 0, ...)



a fit from smooth.spline.


the new values of x.


integer; the order of the derivative required.


further arguments passed to or from other methods.


A list with components


The input x.


The fitted values or derivatives at x.
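The statement that the fit is linear beyond the original data can be checked through the derivatives: outside the data range the first derivative should be constant and the second derivative zero. A small sketch, where the evaluation points in beyond are an arbitrary choice to the right of the data:

```r
cars.spl <- with(cars, smooth.spline(speed, dist, df = 6.4))

beyond <- c(30, 35, 40)     # all larger than max(cars$speed), which is 25
y0 <- predict(cars.spl, beyond)$y
d1 <- predict(cars.spl, beyond, deriv = 1)$y
d2 <- predict(cars.spl, beyond, deriv = 2)$y

diff(y0) / diff(beyond)     # slope between successive points
d1                          # first derivative: the same constant slope
d2                          # second derivative: no curvature
```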

See Also

smooth.spline


attach(cars)   # makes speed and dist visible; undone by detach() below
cars.spl <- smooth.spline(speed, dist, df = 6.4)

## "Proof" that the derivatives are okay, by comparing with approximation
diff.quot <- function(x, y) {
  ## Difference quotient (central differences where available)
  n <- length(x); i1 <- 1:2; i2 <- (n-1):n
  c(diff(y[i1]) / diff(x[i1]), (y[-i1] - y[-i2]) / (x[-i1] - x[-i2]),
    diff(y[i2]) / diff(x[i2]))
}

xx <- unique(sort(c(seq(0, 30, by = .2), kn <- unique(speed))))
i.kn <- match(kn, xx)   # indices of knots within xx
op <- par(mfrow = c(2,2))
plot(speed, dist, xlim = range(xx), main = "Smooth.spline & derivatives")
lines(pp <- predict(cars.spl, xx), col = "red")
points(kn, pp$y[i.kn], pch = 3, col = "dark red")
mtext("s(x)", col = "red")
for(d in 1:3){
  n <- length(pp$x)
  plot(pp$x, diff.quot(pp$x,pp$y), type = "l", xlab = "x", ylab = "",
       col = "blue", col.main = "red",
       main = paste0("s" ,paste(rep("'", d), collapse = ""), "(x)"))
  mtext("Difference quotient approx.(last)", col = "blue")
  lines(pp <- predict(cars.spl, xx, deriv = d), col = "red")

  points(kn, pp$y[i.kn], pch = 3, col = "dark red")
  abline(h = 0, lty = 3, col = "gray")
}
detach(); par(op)

Pre-computations for a Plotting Object


Compute an object to be used for plots relating to the given model object.


preplot(object, ...)



a fitted model object.


additional arguments for specific methods.


Only the generic function is currently provided in base R, but some add-on packages have methods. Principally here for S compatibility.


An object set up to make a plot that describes object.

Principal Components Analysis


princomp performs a principal components analysis on the given numeric data matrix and returns the results as an object of class princomp.


princomp(x, ...)

## S3 method for class 'formula'
princomp(formula, data = NULL, subset, na.action, ...)

## Default S3 method:
princomp(x, cor = FALSE, scores = TRUE, covmat = NULL,
         subset = rep_len(TRUE, nrow(as.matrix(x))), fix_sign = TRUE, ...)

## S3 method for class 'princomp'
predict(object, newdata, ...)



a formula with no response variable, referring only to numeric variables.


an optional data frame (or similar: see model.frame) containing the variables in the formula formula. By default the variables are taken from environment(formula).


an optional vector used to select rows (observations) of the data matrix x.


a function which indicates what should happen when the data contain NAs. The default is set by the na.action setting of options, and is na.fail if that is unset. The ‘factory-fresh’ default is na.omit.


a numeric matrix or data frame which provides the data for the principal components analysis.


a logical value indicating whether the calculation should use the correlation matrix or the covariance matrix. (The correlation matrix can only be used if there are no constant variables.)


a logical value indicating whether the score on each principal component should be calculated.


a covariance matrix, or a covariance list as returned by cov.wt (and cov.mve or cov.mcd from package MASS). If supplied, this is used rather than the covariance matrix of x.


Should the signs of the loadings and scores be chosen so that the first element of each loading is non-negative?


arguments passed to or from other methods. If x is a formula one might specify cor or scores.


Object of class inheriting from "princomp".


An optional data frame or matrix in which to look for variables with which to predict. If omitted, the scores are used. If the original fit used a formula or a data frame or a matrix with column names, newdata must contain columns with the same names. Otherwise it must contain the same number of columns, to be used in the same order.
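As a sketch of the predict method: predicting for the data used in the fit should reproduce the stored scores, and new rows only need columns with matching names. The subset new.states is an arbitrary choice for illustration.

```r
pc <- princomp(USArrests, cor = TRUE)

## predicting the training data reproduces the scores
same <- all.equal(predict(pc, USArrests), pc$scores,
                  check.attributes = FALSE)
same

## scores for a few rows treated as "new" observations
new.states <- USArrests[c("Texas", "Utah"), ]
pred <- predict(pc, new.states)
pred
```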


princomp is a generic function with "formula" and "default" methods.

The calculation is done using eigen on the correlation or covariance matrix, as determined by cor. (This was done for compatibility with the S-PLUS result.) A preferred method of calculation is to use svd on x, as is done in prcomp.

Note that the default calculation uses divisor N for the covariance matrix.
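The effect of the divisor N can be seen by comparing with prcomp, which uses the usual divisor N - 1. In this sketch the standard deviations differ by the factor sqrt((N - 1)/N).

```r
pc <- princomp(USArrests)   # covariance matrix with divisor N
pr <- prcomp(USArrests)     # divisor N - 1
n  <- nrow(USArrests)

## princomp's sdev is prcomp's sdev shrunk by sqrt((n - 1) / n)
all.equal(unname(pc$sdev), pr$sdev * sqrt((n - 1) / n))
```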

The print method for these objects prints the results in a nice format and the plot method produces a scree plot (screeplot). There is also a biplot method.

If x is a formula then the standard NA-handling is applied to the scores (if requested): see napredict.

princomp only handles so-called R-mode PCA, that is feature extraction of variables. If a data matrix is supplied (possibly via a formula) it is required that there are at least as many units as variables. For Q-mode PCA use prcomp.


princomp returns a list with class "princomp" containing the following components:


the standard deviations of the principal components.


the matrix of variable loadings (i.e., a matrix whose columns contain the eigenvectors). This is of class "loadings": see loadings for its print method.


the means that were subtracted.


the scalings applied to each variable.


the number of observations.


if scores = TRUE, the scores of the supplied data on the principal components. These are non-null only if x was supplied and, if covmat was also supplied, only if it was a covariance list. For the formula method, napredict() is applied to handle the treatment of values omitted by the na.action.


the matched call.


If relevant.


The signs of the columns of the loadings and scores are arbitrary, and so may differ between different programs for PCA, and even between different builds of R: fix_sign = TRUE alleviates that.


Mardia, K. V., J. T. Kent and J. M. Bibby (1979). Multivariate Analysis, London: Academic Press.

Venables, W. N. and B. D. Ripley (2002). Modern Applied Statistics with S, Springer-Verlag.

See Also

summary.princomp, screeplot, biplot.princomp, prcomp, cor, cov, eigen.



## The variances of the variables in the
## USArrests data vary by orders of magnitude, so scaling is appropriate
(pc.cr <- princomp(USArrests))  # inappropriate
princomp(USArrests, cor = TRUE) # =^= prcomp(USArrests, scale=TRUE)
## Similar, but different:
## The standard deviations differ by a factor of sqrt(49/50)

summary(pc.cr <- princomp(USArrests, cor = TRUE))
loadings(pc.cr)  # note that blank entries are small but not zero
## The signs of the columns of the loadings are arbitrary
plot(pc.cr) # shows a screeplot.

## Formula interface
princomp(~ ., data = USArrests, cor = TRUE)

## NA-handling
USArrests[1, 2] <- NA
pc.cr <- princomp(~ Murder + Assault + UrbanPop,
                  data = USArrests, na.action = na.exclude, cor = TRUE)
pc.cr$scores[1:5, ]

## (Simple) Robust PCA:
## Classical:
(pc.cl  <- princomp(stackloss))
## Robust:
(pc.rob <- princomp(stackloss, covmat = MASS::cov.rob(stackloss)))

Print Methods for Hypothesis Tests and Power Calculation Objects


Printing objects of class "htest" or "power.htest", respectively, by simple print methods.


## S3 method for class 'htest'
print(x, digits = getOption("digits"), prefix = "\t", ...)

## S3 method for class 'power.htest'
print(x, digits = getOption("digits"), ...)



object of class "htest" or "power.htest".


number of significant digits to be used.


string, passed to strwrap for displaying the method component of the htest object.


further arguments to be passed to or from methods.


Both print methods traditionally have not obeyed the digits argument properly. They now do, the htest method mostly in expressions like max(1, digits - 2).

A power.htest object is just a named list of numbers and character strings, supplemented with method and note elements. The method is displayed as a title, the note as a footnote, and the remaining elements are given in an aligned ‘name = value’ format.


the argument x, invisibly, as for all print methods.


Peter Dalgaard

See Also

power.t.test, power.prop.test


(ptt <- power.t.test(n = 20, delta = 1))
print(ptt, digits =  4) # using fewer digits than default
print(ptt, digits = 12) # using more  "       "     "

Printing and Formatting of Time-Series Objects


Format and print methods, notably for calendar-related time series objects, showing years, months and/or quarters as appropriate.


## S3 method for class 'ts'
print(x, calendar, ...)
.preformat.ts(x, calendar, ...)



a time series object.


enable/disable the display of information about month names, quarter names or year when printing. The default is TRUE for a frequency of 4 or 12, FALSE otherwise.


additional arguments to print (or format methods).


The print method for "ts" objects prints a header (basically of tsp(x)), if calendar is false, and then prints the result of .preformat.ts(x, *), which is typically a matrix with rownames built from the calendar times where applicable.
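A small sketch of what .preformat.ts produces for a quarterly series, and of the header printed when calendar is false. The series q is made up for the example.

```r
q <- ts(1:8, frequency = 4, start = c(2000, 1))

m <- .preformat.ts(q)   # calendar defaults to TRUE for frequency 4
is.character(m)         # a character matrix ...
dim(m)                  # ... with one row per year, one column per quarter
dimnames(m)

print(q, calendar = FALSE)   # header from tsp(q), then the plain values
```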

See Also

print, ts.


print(ts(1:10, frequency = 7, start = c(12, 2)), calendar = TRUE)

print(sunsp.1 <- window(sunspot.month, end=c(1756, 12)))
m <- .preformat.ts(sunsp.1) # a character matrix

Print Coefficient Matrices


Utility function to be used in higher-level print methods, such as those for summary.lm, summary.glm and anova. The goal is to provide a flexible interface with smart defaults such that often, only x needs to be specified.


printCoefmat(x, digits = max(3, getOption("digits") - 2),
             signif.stars = getOption("show.signif.stars"),
             signif.legend = signif.stars,
             dig.tst = max(1, min(5, digits - 1)),
             cs.ind = 1L:k, tst.ind = k + 1L,
             zap.ind = integer(), P.values = NULL,
             has.Pvalue = nc >= 4L && length(cn <- colnames(x)) &&
                          substr(cn[nc], 1L, 3L) %in% c("Pr(", "p-v"),
             eps.Pvalue = .Machine$double.eps,
             na.print = "NA", quote = FALSE, right = TRUE, ...)



a numeric matrix-like object, to be printed.


minimum number of significant digits to be used for most numbers.


logical; if TRUE, P-values are additionally encoded visually as ‘significance stars’ in order to help scanning of long coefficient tables. It defaults to the show.signif.stars slot of options.


logical; if TRUE, a legend for the ‘significance stars’ is printed provided signif.stars = TRUE.


minimum number of significant digits for the test statistics, see tst.ind.


indices (integer) of column numbers which are (like) coefficients and standard errors to be formatted together.


indices (integer) of column numbers for test statistics.


indices (integer) of column numbers which should be formatted by zapsmall, i.e., by ‘zapping’ values close to 0.


logical or NULL; if TRUE, the last column of x is formatted by format.pval as P values. If P.values = NULL, the default, it is set to TRUE only if options("show.coef.Pvalues") is TRUE and x has at least 4 columns and the last column name of x starts with "Pr(".


logical; if TRUE, the last column of x contains P values; in that case, it is printed if and only if P.values (above) is true.


number, passed to format.pval() as eps.


a character string to code NA values in printed output.

quote, right, ...

further arguments passed to print.default.


Invisibly returns its argument, x.
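A typical use is on the coefficient table of a model summary, where the defaults pick up the P-value column from its name. A short sketch, with fit, cm and ret as illustrative names:

```r
fit <- lm(dist ~ speed, data = cars)
cm  <- coef(summary(fit))    # columns: Estimate, Std. Error, t value, Pr(>|t|)

## only x is required; stars are switched off here for compact output
ret <- printCoefmat(cm, digits = 3, signif.stars = FALSE)

identical(ret, cm)           # the argument is returned (invisibly) unchanged
```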


Martin Maechler

See Also

print.summary.lm, format.pval, format.


cmat <- cbind(rnorm(3, 10), sqrt(rchisq(3, 12)))
cmat <- cbind(cmat, cmat[, 1]/cmat[, 2])
cmat <- cbind(cmat, 2*pnorm(-cmat[, 3]))
colnames(cmat) <- c("Estimate", "Std.Err", "Z value", "Pr(>z)")
printCoefmat(cmat[, 1:3])
op <- options(show.coef.Pvalues = FALSE)
printCoefmat(cmat, digits = 2)
printCoefmat(cmat, digits = 2, P.values = TRUE)
options(op) # restore

Generic Function for Profiling Models


Investigates the behavior of the objective function near the solution represented by fitted.

See documentation on method functions for further details.


profile(fitted, ...)



the original fitted model object.


additional parameters. See documentation on individual methods.


A list with an element for each parameter being profiled. See the individual methods for further details.

See Also

profile.nls, profile.glm ...


For profiling R code, see Rprof.

Method for Profiling glm Objects


Investigates the profile log-likelihood function for a fitted model of class "glm".


## S3 method for class 'glm'
profile(fitted, which = 1:p, alpha = 0.01, maxsteps = 10,
        del = zmax/5, trace = FALSE, test = c("LRT", "Rao"), ...)



the original fitted model object.


the original model parameters which should be profiled. This can be a numeric or character vector. By default, all parameters are profiled.


highest significance level allowed for the profile z-statistics.


maximum number of points to be used for profiling each parameter.


suggested change on the scale of the profile t-statistics. Default value chosen to allow profiling at about 10 parameter values.


logical: should the progress of profiling be reported?


profile Likelihood Ratio test or Rao Score test.


further arguments passed to or from other methods.


The profile z-statistic is defined either as (case test = "LRT") the square root of change in deviance with an appropriate sign, or (case test = "Rao") as the similarly signed square root of the Rao Score test statistic. The latter is defined as the squared gradient of the profile log likelihood divided by the profile Fisher information, but more conveniently calculated via the deviance of a Gaussian GLM fitted to the residuals of the profiled model.
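As an informal check of the sign convention, the profile z-statistic should change sign at the maximum likelihood estimate. The sketch below assumes R >= 4.4.0 (where the glm method lives in stats; earlier versions need package MASS loaded) and assumes the profiled columns are named z and par.vals for a binomial fit, as described under the returned value.

```r
ldose   <- rep(0:5, 2)
numdead <- c(1, 4, 9, 13, 18, 20, 0, 2, 6, 10, 12, 16)
sex     <- factor(rep(c("M", "F"), c(6, 6)))
SF      <- cbind(numdead, numalive = 20 - numdead)
fit     <- glm(SF ~ sex + ldose, family = binomial)

pr <- profile(fit, which = "ldose")
z  <- pr$ldose$z                       # signed root of the deviance change
b  <- pr$ldose$par.vals[, "ldose"]     # profiled values of the coefficient

## where z crosses zero should be at (or very near) the estimate
b0 <- approx(z, b, xout = 0)$y
c(crossing = b0, estimate = coef(fit)[["ldose"]])
```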


A list of classes "profile.glm" and "profile" with an element for each parameter being profiled. The elements are data-frames with two variables


a matrix of parameter values for each fitted model.

tau or z

the profile t or z-statistics (the name depends on whether there is an estimated dispersion parameter.)


Originally, D. M. Bates and W. N. Venables. (For S in 1996.)

See Also

glm, profile, plot.profile


options(contrasts = c("contr.treatment", "contr.poly"))
ldose <- rep(0:5, 2)
numdead <- c(1, 4, 9, 13, 18, 20, 0, 2, 6, 10, 12, 16)
sex <- factor(rep(c("M", "F"), c(6, 6)))
SF <- cbind(numdead, numalive = 20 - numdead)
budworm.lg <- glm(SF ~ sex*ldose, family = binomial)
pr1 <- profile(budworm.lg)

Method for Profiling nls Objects


Investigates the profile log-likelihood function for a fitted model of class "nls".


## S3 method for class 'nls'
profile(fitted, which = 1:npar, maxpts = 100, alphamax = 0.01,
        delta.t = cutoff/5, ...)



the original fitted model object.


the original model parameters which should be profiled. This can be a numeric or character vector. By default, all non-linear parameters are profiled.


maximum number of points to be used for profiling each parameter.


highest significance level allowed for the profile t-statistics.


suggested change on the scale of the profile t-statistics. Default value chosen to allow profiling at about 10 parameter values.


further arguments passed to or from other methods.


The profile t-statistic is defined as the square root of change in sum-of-squares divided by residual standard error with an appropriate sign.
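A brief sketch of the structure of the result for the BOD fit: one data frame per profiled parameter, holding the statistic and the matrix of parameter values. The column names tau and par.vals are taken to be those described under the returned value.

```r
fm1 <- nls(demand ~ SSasympOrig(Time, A, lrc), data = BOD)
pr1 <- profile(fm1, alphamax = 0.05)

names(pr1)              # one element per profiled parameter
names(pr1$A)            # the statistic and the parameter values

tau    <- pr1$A$tau
A.vals <- pr1$A$par.vals[, "A"]

## the value of A where |tau| is smallest should be at or near the estimate
c(profiled = A.vals[which.min(abs(tau))], estimate = coef(fm1)[["A"]])
```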


A list with an element for each parameter being profiled. The elements are data-frames with two variables


a matrix of parameter values for each fitted model.


the profile t-statistics.


Of the original version, Douglas M. Bates and Saikat DebRoy


Bates, D. M. and Watts, D. G. (1988), Nonlinear Regression Analysis and Its Applications, Wiley (chapter 6).

See Also

nls, profile, plot.profile.nls


# obtain the fitted object
fm1 <- nls(demand ~ SSasympOrig(Time, A, lrc), data = BOD)
# get the profile for the fitted model: default level is too extreme
pr1 <- profile(fm1, alphamax = 0.05)
# profiled values for the two parameters
pr1$A
pr1$lrc

# see also example(plot.profile.nls)

Projections of Models


proj returns a matrix or list of matrices giving the projections of the data onto the terms of a linear model. It is most frequently used for aov models.


proj(object, ...)

## S3 method for class 'aov'
proj(object, onedf = FALSE, unweighted.scale = FALSE, ...)

## S3 method for class 'aovlist'
proj(object, onedf = FALSE, unweighted.scale = FALSE, ...)

## Default S3 method:
proj(object, onedf = TRUE, ...)

## S3 method for class 'lm'
proj(object, onedf = FALSE, unweighted.scale = FALSE, ...)



An object of class "lm" or a class inheriting from it, or an object with a similar structure including in particular components qr and effects.


A logical flag. If TRUE, a projection is returned for all the columns of the model matrix. If FALSE, the single-column projections are collapsed by terms of the model (as represented in the analysis of variance table).


If the fit producing object used weights, this determines if the projections correspond to weighted or unweighted observations.


Swallow and ignore any other arguments.


A projection is given for each stratum of the object, so for aov models with an Error term the result is a list of projections.


A projection matrix or (for multi-stratum objects) a list of projection matrices.

Each projection is a matrix with a row for each observation and either a column for each term (onedf = FALSE) or for each coefficient (onedf = TRUE). Projection matrices from the default method have orthogonal columns representing the projection of the response onto the column space of the Q matrix from the QR decomposition. The fitted values are the sum of the projections, and the sum of squares for each column is the reduction in sum of squares from fitting that column (after those to the left of it).

The methods for lm and aov models add a column to the projection matrix giving the residuals (the projection of the data onto the orthogonal complement of the model space).

Strictly, when onedf = FALSE the result is not a projection, but the columns represent sums of projections onto the columns of the model matrix corresponding to that term. In this case the matrix does not depend on the coding used.
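The additive structure described above can be checked numerically. This sketch uses the npk data set shipped with R and assumes, as stated above for aov models, that the residuals form the last column of the result.

```r
npk.aov <- aov(yield ~ block + N*P*K, npk)
pr <- proj(npk.aov)
colnames(pr)

## the term projections (all but the residual column) add up to the fit
fit.sum <- rowSums(pr[, -ncol(pr)])
all.equal(unname(fit.sum), unname(fitted(npk.aov)))

## adding the residual column recovers the response
all.equal(unname(rowSums(pr)), npk$yield)
```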


The design was inspired by the S function of the same name described in Chambers et al. (1992).


Chambers, J. M., Freeny, A. and Heiberger, R. M. (1992) Analysis of variance; designed experiments. Chapter 5 of Statistical Models in S eds J. M. Chambers and T. J. Hastie, Wadsworth & Brooks/Cole.

See Also

aov, lm, model.tables


N <- c(0,1,0,1,1,1,0,0,0,1,1,0,1,1,0,0,1,0,1,0,1,1,0,0)
P <- c(1,1,0,0,0,1,0,1,1,1,0,0,0,1,0,1,1,0,0,1,0,1,1,0)
K <- c(1,0,0,1,0,1,1,0,0,1,0,1,0,1,1,0,0,0,1,1,1,0,1,0)
yield <- c(49.5,62.8,46.8,57.0,59.8,58.5,55.5,56.0,62.8,55.8,69.5,
55.0, 62.0,48.8,45.5,44.2,52.0,51.5,49.8,48.8,57.2,59.0,53.2,56.0)

npk <- data.frame(block = gl(6,4), N = factor(N), P = factor(P),
                  K = factor(K), yield = yield)
npk.aov <- aov(yield ~ block + N*P*K, npk)
proj(npk.aov)

## as a test, not particularly sensible
options(contrasts = c("contr.helmert", "contr.treatment"))
npk.aovE <- aov(yield ~  N*P*K + Error(block), npk)
proj(npk.aovE)

Test of Equal or Given Proportions


prop.test can be used for testing the null that the proportions (probabilities of success) in several groups are the same, or that they equal certain given values.


prop.test(x, n, p = NULL,
          alternative = c("two.sided", "less", "greater"),
          conf.level = 0.95, correct = TRUE)



a vector of counts of successes, a one-dimensional table with two entries, or a two-dimensional table (or matrix) with 2 columns, giving the counts of successes and failures, respectively.


a vector of counts of trials; ignored if x is a matrix or a table.


a vector of probabilities of success. The length of p must be the same as the number of groups specified by x, and its elements must be greater than 0 and less than 1.


a character string specifying the alternative hypothesis, must be one of "two.sided" (default), "greater" or "less". You can specify just the initial letter. Only used for testing the null that a single proportion equals a given value, or that two proportions are equal; ignored otherwise.


confidence level of the returned confidence interval. Must be a single number between 0 and 1. Only used when testing the null that a single proportion equals a given value, or that two proportions are equal; ignored otherwise.


a logical indicating whether Yates' continuity correction should be applied where possible.


Only groups with finite numbers of successes and failures are used. Counts of successes and failures must be nonnegative and hence not greater than the corresponding numbers of trials which must be positive. All finite counts should be integers.

If p is NULL and there is more than one group, the null tested is that the proportions in each group are the same. If there are two groups, the alternatives are that the probability of success in the first group is less than, not equal to, or greater than the probability of success in the second group, as specified by alternative. A confidence interval for the difference of proportions with confidence level as specified by conf.level and clipped to $[-1, 1]$ is returned. Continuity correction is used only if it does not exceed the difference of the sample proportions in absolute value. Otherwise, if there are more than 2 groups, the alternative is always "two.sided", the returned confidence interval is NULL, and continuity correction is never used.

If there is only one group, then the null tested is that the underlying probability of success is p, or 0.5 if p is not given. The alternative is that the probability of success is less than, not equal to, or greater than p or 0.5, respectively, as specified by alternative. A confidence interval for the underlying proportion with confidence level as specified by conf.level and clipped to $[0, 1]$ is returned. Continuity correction is used only if it does not exceed the difference between sample and null proportions in absolute value. The confidence interval is computed by inverting the score test.

Finally, if p is given and there are more than 2 groups, the null tested is that the underlying probabilities of success are those given by p. The alternative is always "two.sided", the returned confidence interval is NULL, and continuity correction is never used.
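For the one-group case without continuity correction, inverting the score test gives the interval usually attributed to Wilson (1927), so the result can be reproduced by hand. The counts below are invented for the illustration.

```r
x <- 42; n <- 100                 # hypothetical successes and trials
z <- qnorm(0.975)                 # for a 95% two-sided interval
phat <- x / n

## Wilson score interval
centre <- (phat + z^2 / (2 * n)) / (1 + z^2 / n)
half   <- z * sqrt(phat * (1 - phat) / n + z^2 / (4 * n^2)) / (1 + z^2 / n)
wilson <- c(centre - half, centre + half)

pt <- prop.test(x, n, correct = FALSE)
all.equal(as.vector(pt$conf.int), wilson)

## Pearson statistic for the default null p = 0.5
all.equal(unname(pt$statistic), (x - n * 0.5)^2 / (n * 0.5 * 0.5))
```

With correct = TRUE (the default) the interval is slightly wider, since the continuity correction is built into both limits.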


A list with class "htest" containing the following components:


the value of Pearson's chi-squared test statistic.


the degrees of freedom of the approximate chi-squared distribution of the test statistic.


the p-value of the test.


a vector with the sample proportions x/n.

a confidence interval for the true proportion if there is one group, or for the difference in proportions if there are 2 groups and p is not given, or NULL otherwise. In the cases where it is not NULL, the returned confidence interval has an asymptotic confidence level as specified by conf.level, and is appropriate to the specified alternative hypothesis.


the value of p if specified by the null, or NULL otherwise.


a character string describing the alternative.


a character string indicating the method used, and whether Yates' continuity correction was applied.

a character string giving the names of the data.
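The components described above can be inspected directly on the returned object. The following is a small sketch; the counts (83 out of 86) and the null value p = 0.9 are chosen only for illustration.

```r
## One-group test against a hypothetical null proportion of 0.9
res <- prop.test(83, 86, p = 0.9)

class(res)        # an "htest" object
names(res)        # component names of the list

res$statistic     # Pearson's chi-squared statistic
res$parameter     # degrees of freedom
res$p.value
res$estimate      # sample proportion x/n
res$null.value    # the value of p under the null
res$conf.int      # carries the confidence level as an attribute
```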


Wilson, E.B. (1927). Probable inference, the law of succession, and statistical inference. Journal of the American Statistical Association, 22, 209–212. doi:10.2307/2276774.

Newcombe R.G. (1998). Two-Sided Confidence Intervals for the Single Proportion: Comparison of Seven Methods. Statistics in Medicine, 17, 857–872. doi:10.1002/(SICI)1097-0258(19980430)17:8<857::AID-SIM777>3.0.CO;2-E.

Newcombe R.G. (1998). Interval Estimation for the Difference Between Independent Proportions: Comparison of Eleven Methods. Statistics in Medicine, 17, 873–890. doi:10.1002/(SICI)1097-0258(19980430)17:8<873::AID-SIM779>3.0.CO;2-I.

See Also

binom.test for an exact test of a binomial hypothesis.


heads <- rbinom(1, size = 100, prob = .5)
prop.test(heads, 100)          # continuity correction TRUE by default
prop.test(heads, 100, correct = FALSE)

## Data from Fleiss (1981), p. 139.
## H0: The null hypothesis is that the four populations from which
##     the patients were drawn have the same true proportion of smokers.
## A:  The alternative is that this proportion is different in at
##     least one of the populations.

smokers  <- c( 83, 90, 129, 70 )
patients <- c( 86, 93, 136, 82 )
prop.test(smokers, patients)

Test for trend in proportions


Performs chi-squared test for trend in proportions, i.e., a test asymptotically optimal for local alternatives where the log odds vary in proportion with score. By default, score is chosen as the group numbers.


prop.trend.test(x, n, score = seq_along(x))



Number of events


Number of trials


Group score


An object of class "htest" with title, test statistic, p-value, etc.
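To make the trend statistic more concrete, the sketch below computes the usual Cochran-Armitage chi-squared statistic by hand and compares it with the result of prop.trend.test. It assumes that the function reports this standard statistic on one degree of freedom; the variable names p_pool, s_bar and stat are introduced here purely for illustration.

```r
smokers  <- c(83, 90, 129, 70)
patients <- c(86, 93, 136, 82)
score    <- seq_along(smokers)          # default scores: group numbers

p_pool <- sum(smokers) / sum(patients)          # pooled proportion
s_bar  <- sum(patients * score) / sum(patients) # trial-weighted mean score

## Squared weighted covariance between events and scores,
## scaled by the binomial variance under the null
stat <- sum(smokers * (score - s_bar))^2 /
  (p_pool * (1 - p_pool) * sum(patients * (score - s_bar)^2))

res <- prop.trend.test(smokers, patients)
all.equal(unname(res$statistic), stat)
all.equal(res$p.value, pchisq(stat, df = 1, lower.tail = FALSE))
```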


This really should get integrated with prop.test


Peter Dalgaard

See Also



smokers  <- c( 83, 90, 129, 70 )
patients <- c( 86, 93, 136, 82 )
prop.test(smokers, patients)
prop.trend.test(smokers, patients)
prop.trend.test(smokers, patients, c(0,0,0,1))

Quantile-Quantile Plots


qqnorm is a generic function the default method of which produces a normal QQ plot of the values in y. qqline adds a line to a “theoretical”, by default normal, quantile-quantile plot which passes through the probs quantiles, by default the first and third quartiles.

qqplot produces a QQ plot of two datasets. If conf.level is given, a confidence band for a function transforming the distribution of x into the distribution of y is plotted, based on Switzer (1976). The QQ plot itself can be understood as an estimate of this transformation (the "treatment function"). The band computation is controlled by the components of conf.args. If exact = NULL (the default), an exact confidence band is computed if the product of the sample sizes is less than 10000, with or without ties. Otherwise, asymptotic distributions are used, whose approximations may be inaccurate in small samples. Monte-Carlo approximations based on B random permutations are computed when simulate.p.value = TRUE. Confidence bands are in agreement with Smirnov's test, that is, the bisecting line is covered by the band if and only if the null hypothesis that both samples come from the same distribution cannot be rejected at the same level.

Graphical parameters may be given as arguments to qqnorm, qqplot and qqline.


qqnorm(y, ...)
## Default S3 method:
qqnorm(y, ylim, main = "Normal Q-Q Plot",
       xlab = "Theoretical Quantiles", ylab = "Sample Quantiles",
       plot.it = TRUE, datax = FALSE, ...)

qqline(y, datax = FALSE, distribution = qnorm,
       probs = c(0.25, 0.75), qtype = 7, ...)

qqplot(x, y, plot.it = TRUE,
       xlab = deparse1(substitute(x)),
       ylab = deparse1(substitute(y)), ...,
       conf.level = NULL, 
       conf.args = list(exact = NULL, simulate.p.value = FALSE,
                        B = 2000, col = NA, border = NULL))



The first sample for qqplot.


The second or only data sample.

xlab, ylab, main

plot labels. The xlab and ylab refer to the y and x axes respectively if datax = TRUE.

logical. Should the result be plotted?


logical. Should data values be on the x-axis?


quantile function for reference theoretical distribution.


numeric vector of length two, representing probabilities. Corresponding quantile pairs define the line drawn.


the type of quantile computation used in quantile.

ylim, ...

graphical parameters.


confidence level of the band. The default, NULL, does not lead to the computation of a confidence band.


list of arguments defining confidence band computation and visualisation: exact is NULL (see details) or a logical indicating whether an exact p-value should be computed, simulate.p.value is a logical indicating whether to compute p-values by Monte Carlo simulation, B defines the number of replicates used in the Monte Carlo test, col and border define the color for filling and border of the confidence band (the default, NA and NULL, is to leave the band unfilled with black borders).


For qqnorm and qqplot, a list with components


The x coordinates of the points that were/would be plotted


The original y vector, i.e., the corresponding y coordinates including NAs. If conf.level was specified to qqplot, the list contains additional components lwr and upr defining the confidence band.
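The returned coordinates can be obtained without drawing anything. The sketch below, on simulated data with no missing values, checks that the x component agrees with normal quantiles of ppoints arranged in the order of the data; by_hand is a name introduced only for this illustration.

```r
set.seed(1)
y <- rnorm(20)

qq <- qqnorm(y, plot.it = FALSE)   # compute only, no plot
str(qq)

## Theoretical quantiles, placed at the rank position of each observation
by_hand <- qnorm(ppoints(length(y)))[order(order(y))]
all.equal(qq$x, by_hand)
identical(qq$y, y)
```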


Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988). The New S Language. Wadsworth & Brooks/Cole.

Switzer, P. (1976). Confidence procedures for two-sample problems. Biometrika, 63(1), 13–25. doi:10.1093/biomet/63.1.13.

See Also

ppoints, used by qqnorm to generate approximations to expected order statistics for a normal distribution.



y <- rt(200, df = 5)
qqnorm(y); qqline(y, col = 2)
qqplot(y, rt(300, df = 5))

qqnorm(precip, ylab = "Precipitation [in/yr] for 70 US cities")

## "QQ-Chisquare" : --------------------------
y <- rchisq(500, df = 3)
## Q-Q plot for Chi^2 data against true theoretical distribution:
qqplot(qchisq(ppoints(500), df = 3), y,
       main = expression("Q-Q plot for" ~~ {chi^2}[nu == 3]))
qqline(y, distribution = function(p) qchisq(p, df = 3),
       probs = c(0.1, 0.6), col = 2)
mtext("qqline(*, dist = qchisq(., df=3), prob = c(0.1, 0.6))")
## (Note that the above uses ppoints() with a = 1/2, giving the
## probability points for quantile type 5: so theoretically, using
## qqline(qtype = 5) might be preferable.) 

## Figure 1 in Switzer (1976), knee angle data
switzer <- data.frame(
    angle = c(-31, -30, -25, -25, -23, -23, -22, -20, -20, -18,
              -18, -18, -16, -15, -15, -14, -13, -11, -10, - 9,
              - 8, - 7, - 7, - 7, - 6, - 6, - 4, - 4, - 3, - 2,
              - 2, - 1,   1,   1,   4,   5,  11,  12,  16,  34,
              -31, -20, -18, -16, -16, -16, -15, -14, -14, -14,
              -14, -13, -13, -11, -11, -10, - 9, - 9, - 8, - 7,
              - 7, - 6, - 6,  -5, - 5, - 5, - 4, - 2, - 2, - 2,
                0,   0,   1,   1,   2,   4,   5,   5,   6,  17),
    sex = gl(2, 40, labels = c("Female", "Male")))

ks.test(angle ~ sex, data = switzer)
d <- with(switzer, split(angle, sex))
with(d, qqplot(Female, Male, pch = 19, xlim = c(-31, 31), ylim = c(-31, 31),
               conf.level = 0.945,
               conf.args = list(col = "lightgrey", exact = TRUE)))
abline(a = 0, b = 1)

## agreement with ks.test
x <- rnorm(50)
y <- rnorm(50, mean = .5, sd = .95)
ex <- TRUE
### p = 0.112
(pval <- ks.test(x, y, exact = ex)$p.value)
## 88.8% confidence band with bisecting line
## touching the lower bound
qqplot(x, y, pch = 19, conf.level = 1 - pval, 
       conf.args = list(exact = ex, col = "lightgrey"))
abline(a = 0, b = 1)

Quade Test


Performs a Quade test with unreplicated blocked data.


quade.test(y, ...)

## Default S3 method:
quade.test(y, groups, blocks, ...)

## S3 method for class 'formula'
quade.test(formula, data, subset, na.action, ...)



either a numeric vector of data values, or a data matrix.


a vector giving the group for the corresponding elements of y if this is a vector; ignored if y is a matrix. If not a factor object, it is coerced to one.


a vector giving the block for the corresponding elements of y if this is a vector; ignored if y is a matrix. If not a factor object, it is coerced to one.


a formula of the form a ~ b | c, where a, b and c give the data values and corresponding groups and blocks, respectively.


an optional matrix or data frame (or similar: see model.frame) containing the variables in the formula formula. By default the variables are taken from environment(formula).


an optional vector specifying a subset of observations to be used.


a function which indicates what should happen when the data contain NAs. Defaults to getOption("na.action").


further arguments to be passed to or from methods.


quade.test can be used for analyzing unreplicated complete block designs (i.e., there is exactly one observation in y for each combination of levels of groups and blocks) where the normality assumption may be violated.

The null hypothesis is that apart from an effect of blocks, the location parameter of y is the same in each of the groups.

If y is a matrix, groups and blocks are obtained from the column and row indices, respectively. NA's are not allowed in groups or blocks; if y contains NA's, corresponding blocks are removed.
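The matrix and vector interfaces described above should give the same result. A minimal sketch, using a small made-up 6 x 3 matrix (blocks in rows, groups in columns):

```r
## Hypothetical data: 6 blocks (rows) and 3 groups (columns)
y <- matrix(c(31, 27, 24,
              31, 28, 31,
              45, 29, 46,
              21, 18, 48,
              42, 36, 46,
              32, 17, 40),
            nrow = 6, byrow = TRUE)

## Matrix interface: groups and blocks come from column and row indices
m <- quade.test(y)

## Vector interface: supply groups and blocks explicitly
v <- quade.test(as.vector(y),
                groups = as.vector(col(y)),
                blocks = as.vector(row(y)))

all.equal(m$statistic, v$statistic)
all.equal(m$p.value, v$p.value)
```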


A list with class "htest" containing the following components:


the value of Quade's F statistic.


a vector with the numerator and denominator degrees of freedom of the approximate F distribution of the test statistic.


the p-value of the test.


the character string "Quade test".

a character string giving the names of the data.


D. Quade (1979), Using weighted rankings in the analysis of complete blocks with additive block effects. Journal of the American Statistical Association 74, 680–683.

William J. Conover (1999), Practical nonparametric statistics. New York: John Wiley & Sons. Pages 373–380.

See Also



## Conover (1999, p. 375f):
## Numbers of five brands of a new hand lotion sold in seven stores
## during one week.
y <- matrix(c( 5,  4,  7, 10, 12,
               1,  3,  1,  0,  2,
              16, 12, 22, 22, 35,
               5,  4,  3,  5,  4,
              10,  9,  7, 13, 10,
              19, 18, 28, 37, 58,
              10,  7,  6,  8,  7),
            nrow = 7, byrow = TRUE,
            dimnames =
            list(Store = as.character(1:7),
                 Brand = LETTERS[1:5]))
(qTst <- quade.test(y))

## Show equivalence of different versions of test :
utils::str(dy <- as.data.frame(as.table(y)))
qT. <- quade.test(Freq ~ Brand|Store, data = dy)
## the results differ only in the recorded data name
qT.$data.name <- qTst$data.name
stopifnot(all.equal(qTst, qT., tolerance = 1e-15))
dys <- dy[order(dy[,"Freq"]),]
qTs <- quade.test(Freq ~ Brand|Store, data = dys)
qTs$data.name <- qTst$data.name
stopifnot(all.equal(qTst, qTs, tolerance = 1e-15))

Sample Quantiles


The generic function quantile produces sample quantiles corresponding to the given probabilities. The smallest observation corresponds to a probability of 0 and the largest to a probability of 1.


quantile(x, ...)

## Default S3 method:
quantile(x, probs = seq(0, 1, 0.25), na.rm = FALSE,
         names = TRUE, type = 7, digits = 7, ...)



numeric vector whose sample quantiles are wanted, or an object of a class for which a method has been defined (see also ‘details’). NA and NaN values are not allowed in numeric vectors unless na.rm is TRUE.


numeric vector of probabilities with values in [0, 1]. (Values up to 2e-14 outside that range are accepted and moved to the nearby endpoint.)


logical; if true, any NA and NaN's are removed from x before the quantiles are computed.


logical; if true, the result has a names attribute. Set to FALSE for speedup with many probs.


an integer between 1 and 9 selecting one of the nine quantile algorithms detailed below to be used.


used only when names is true: the precision to use when formatting the percentages. In R versions up to 4.0.x, this had been set to max(2, getOption("digits")), internally.


further arguments passed to or from other methods.


A vector of length length(probs) is returned; if names = TRUE, it has a names attribute.

NA and NaN values in probs are propagated to the result.

The default method works with classed objects sufficiently like numeric vectors that sort and (not needed by types 1 and 3) addition of elements and multiplication by a number work correctly. Note that as this is in a namespace, the copy of sort in base will be used, not some S4 generic of that name. Also note that there is no check on the ‘correctly’, and so e.g. quantile can be applied to complex vectors which (apart from ties) will be ordered on their real parts.

There is a method for the date-time classes (see "POSIXt"). Types 1 and 3 can be used for class "Date" and for ordered factors.


quantile returns estimates of underlying distribution quantiles based on one or two order statistics from the supplied elements in x at probabilities in probs. One of the nine quantile algorithms discussed in Hyndman and Fan (1996), selected by type, is employed.

All sample quantiles are defined as weighted averages of consecutive order statistics. Sample quantiles of type $i$ are defined by:

$$Q_i(p) = (1 - \gamma) x_j + \gamma x_{j+1}$$

where $1 \le i \le 9$, $\frac{j - m}{n} \le p < \frac{j - m + 1}{n}$, $x_j$ is the $j$-th order statistic, $n$ is the sample size, the value of $\gamma$ is a function of $j = \lfloor np + m \rfloor$ and $g = np + m - j$, and $m$ is a constant determined by the sample quantile type.

Discontinuous sample quantile types 1, 2, and 3

For types 1, 2 and 3, $Q_i(p)$ is a discontinuous function of $p$, with $m = 0$ when $i = 1$ and $i = 2$, and $m = -1/2$ when $i = 3$.

Type 1

Inverse of empirical distribution function. $\gamma = 0$ if $g = 0$, and 1 otherwise.

Type 2

Similar to type 1 but with averaging at discontinuities. $\gamma = 0.5$ if $g = 0$, and 1 otherwise (SAS default, see Wicklin (2017)).

Type 3

Nearest even order statistic (SAS default till ca. 2010). $\gamma = 0$ if $g = 0$ and $j$ is even, and 1 otherwise.

Continuous sample quantile types 4 through 9

For types 4 through 9, $Q_i(p)$ is a continuous function of $p$, with $\gamma = g$ and $m$ given below. The sample quantiles can be obtained equivalently by linear interpolation between the points $(p_k, x_k)$ where $x_k$ is the $k$-th order statistic. Specific expressions for $p_k$ are given below.

Type 4

$m = 0$. $p_k = \frac{k}{n}$. That is, linear interpolation of the empirical cdf.

Type 5

$m = 1/2$. $p_k = \frac{k - 0.5}{n}$. That is a piecewise linear function where the knots are the values midway through the steps of the empirical cdf. This is popular amongst hydrologists.

Type 6

$m = p$. $p_k = \frac{k}{n + 1}$. Thus $p_k = \mathrm{E}[F(x_k)]$. This is used by Minitab and by SPSS.

Type 7

$m = 1 - p$. $p_k = \frac{k - 1}{n - 1}$. In this case, $p_k = \mathrm{mode}[F(x_k)]$. This is used by S.

Type 8

$m = (p + 1)/3$. $p_k = \frac{k - 1/3}{n + 1/3}$. Then $p_k \approx \mathrm{median}[F(x_k)]$. The resulting quantile estimates are approximately median-unbiased regardless of the distribution of x.

Type 9

$m = p/4 + 3/8$. $p_k = \frac{k - 3/8}{n + 1/4}$. The resulting quantile estimates are approximately unbiased for the expected order statistics if x is normally distributed.

Further details are provided in Hyndman and Fan (1996) who recommended type 8. The default method is type 7, as used by S and by R < 2.0.0. Makkonen argues for type 6, also as already proposed by Weibull in 1939. The Wikipedia page contains further information about availability of these 9 types in software.
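The definitions of the continuous types can be checked numerically. The sketch below implements the weighted average of two order statistics for types 4 through 9 and compares it with quantile. The helper hf_quantile and the small data vector are hypothetical and introduced only for this illustration; the helper handles a single interior probability (one for which 1 <= j < n) and makes no attempt to reproduce the rounding safeguards of the real function.

```r
## Hand-rolled sample quantile for the continuous types 4..9
## (illustrative only; valid for a single p with 1 <= j < n)
hf_quantile <- function(x, p, type) {
  xs <- sort(x)
  n  <- length(xs)
  m  <- switch(as.character(type),
               "4" = 0,
               "5" = 1 / 2,
               "6" = p,
               "7" = 1 - p,
               "8" = (p + 1) / 3,
               "9" = p / 4 + 3 / 8)
  j <- floor(n * p + m)        # index of the lower order statistic
  g <- n * p + m - j           # fractional part; gamma = g for these types
  (1 - g) * xs[j] + g * xs[j + 1]
}

x <- c(2.1, 3.4, 1.9, 5.6, 4.2)
p <- 0.3

by_hand <- vapply(4:9, function(t) hf_quantile(x, p, t), numeric(1))
builtin <- vapply(4:9, function(t) unname(quantile(x, p, type = t)),
                  numeric(1))
all.equal(by_hand, builtin)
```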


The version used in R >= 2.0.0 was written by Ivan Frohne and Rob J Hyndman.


Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.

Hyndman, R. J. and Fan, Y. (1996) Sample quantiles in statistical packages, American Statistician 50, 361–365. doi:10.2307/2684934.

Wicklin, R. (2017) Sample quantiles: A comparison of 9 definitions; SAS Blog.


See Also

ecdf for empirical distributions of which quantile is an inverse; boxplot.stats and fivenum for computing other versions of quartiles, etc.


quantile(x <- rnorm(1001)) # Extremes & Quartiles by default
quantile(x,  probs = c(0.1, 0.5, 1, 2, 5, 10, 50, NA)/100)

### Compare different types
quantAll <- function(x, prob, ...)
  t(vapply(1:9, function(typ) quantile(x, probs = prob, type = typ, ...),
           quantile(x, prob, type=1, ...)))
p <- c(0.1, 0.5, 1, 2, 5, 10, 50)/100
signif(quantAll(x, p), 4)

## 0% and 100% are equal to min(), max() for all types:
stopifnot(t(quantAll(x, prob=0:1)) == range(x))

## for complex numbers:
z <- complex(real = x, imaginary = -10*x)
signif(quantAll(z, p), 4)

Random 2-way Tables with Given Marginals


Generate random 2-way tables with given marginals using Patefield's algorithm.


r2dtable(n, r, c)



a non-negative numeric giving the number of tables to be drawn.


a non-negative vector of length at least 2 giving the row totals, to be coerced to integer. Must sum to the same as c.


a non-negative vector of length at least 2 giving the column totals, to be coerced to integer.


A list of length n containing the generated tables as its components.
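A quick way to see what is returned is to draw a few tables and confirm that each one reproduces the requested margins. The row and column totals below are arbitrary values chosen for the sketch.

```r
set.seed(1)
row_tot <- c(3, 2, 5)   # hypothetical row totals (sum 10)
col_tot <- c(4, 6)      # hypothetical column totals (sum 10)

tabs <- r2dtable(3, row_tot, col_tot)
length(tabs)            # one list component per table
tabs[[1]]

## Every generated table has the given marginals
ok <- vapply(tabs, function(tab)
  all(rowSums(tab) == row_tot) && all(colSums(tab) == col_tot),
  logical(1))
all(ok)
```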


Patefield, W. M. (1981). Algorithm AS 159: An efficient method of generating r x c tables with given row and column totals. Applied Statistics, 30, 91–97. doi:10.2307/2346669.


## Fisher's Tea Drinker data.
TeaTasting <-
matrix(c(3, 1, 1, 3),
       nrow = 2,
       dimnames = list(Guess = c("Milk", "Tea"),
                       Truth = c("Milk", "Tea")))
## Simulate permutation test for independence based on the maximum
## Pearson residuals (rather than their sum).
rowTotals <- rowSums(TeaTasting)
colTotals <- colSums(TeaTasting)
nOfCases <- sum(rowTotals)
expected <- outer(rowTotals, colTotals) / nOfCases
maxSqResid <- function(x) max((x - expected) ^ 2 / expected)
simMaxSqResid <-
    sapply(r2dtable(1000, rowTotals, colTotals), maxSqResid)
sum(simMaxSqResid >= maxSqResid(TeaTasting)) / 1000
## Fisher's exact test gives p = 0.4857 ...

Manipulate Flat Contingency Tables


Read, write and coerce ‘flat’ (contingency) tables, aka ftables.


read.ftable(file, sep = "", quote = "\"",
            row.var.names, col.vars, skip = 0)

write.ftable(x, file = "", quote = TRUE, append = FALSE,
             digits = getOption("digits"), sep = " ", ...)

## S3 method for class 'ftable'
format(x, quote = TRUE, digits = getOption("digits"),
       method = c("non.compact", "row.compact", "col.compact", "compact"),
       lsep = " | ",
       justify = c("left", "right"),
       ...)

## S3 method for class 'ftable'
print(x, digits = getOption("digits"), ...)



either a character string naming a file or a connection which the data are to be read from or written to. "" indicates input from the console for reading and output to the console for writing.


the field separator string. Values on each line of the file are separated by this string.


a character string giving the set of quoting characters for read.ftable; to disable quoting altogether, use quote="". For write.ftable, a logical indicating whether strings in the data will be surrounded by double quotes.


a character vector with the names of the row variables, in case these cannot be determined automatically.


a list giving the names and levels of the column variables, in case these cannot be determined automatically.


the number of lines of the data file to skip before beginning to read data.


an object of class "ftable".


logical. If TRUE and file is the name of a file (and not a connection or "|cmd"), the output from write.ftable is appended to the file. If FALSE, the contents of file will be overwritten.


an integer giving the number of significant digits to use for (the cell entries of) x.


string specifying how the "ftable" object is formatted (and printed if used as in write.ftable() or the print method). Can be abbreviated. Available methods are (see the examples):


the default representation of an "ftable" object.


a row-compact version without empty cells below the column labels.


a column-compact version without empty cells to the right of the row labels.


a row- and column-compact version. This may imply a row and a column label sharing the same cell. They are then separated by the string lsep.


only for method = "compact", the separation string for row and column labels.


character vector of length (one or) two, specifying how string justification should happen in format(..), first for the labels, then the table entries.


further arguments to be passed to or from methods; for write.ftable() and print(), notably arguments such as method, passed to format().


read.ftable reads in a flat-like contingency table from a file. If the file contains the written representation of a flat table (more precisely, a header with all information on names and levels of column variables, followed by a line with the names of the row variables), no further arguments are needed. Similarly, flat tables with only one column variable the name of which is the only entry in the first line are handled automatically. Other variants can be dealt with by skipping all header information using skip, and providing the names of the row variables and the names and levels of the column variable using row.var.names and col.vars, respectively. See the examples below.

Note that flat tables are characterized by their ‘ragged’ display of row (and maybe also column) labels. If the full grid of levels of the row variables is given, one should instead use read.table to read in the data, and create the contingency table from this using xtabs.

write.ftable writes a flat table to a file, which is useful for generating ‘pretty’ ASCII representations of contingency tables. Different versions are available via the method argument, which may be useful, for example, for constructing LaTeX tables.
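The effect of the method argument is easiest to see on the character matrix returned by format. The following sketch formats one flat table with each method and compares the dimensions; the object names are introduced only for this illustration.

```r
ft <- ftable(Titanic, row.vars = 2:1, col.vars = 4:3)

full <- format(ft, quote = FALSE)                         # "non.compact"
rowc <- format(ft, quote = FALSE, method = "row.compact")
colc <- format(ft, quote = FALSE, method = "col.compact")
both <- format(ft, quote = FALSE, method = "compact")

## Each result is a character matrix; the compact variants drop
## a row and/or a column of otherwise empty label cells
rbind(non.compact = dim(full),
      row.compact = dim(rowc),
      col.compact = dim(colc),
      compact     = dim(both))
```

Because the result is an ordinary character matrix, it can be post-processed (for example, pasted together with a custom separator) before being written out.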


Agresti, A. (1990) Categorical data analysis. New York: Wiley.

See Also

ftable for more information on flat contingency tables.


## Agresti (1990), page 157, Table 5.8.
## Not in ftable standard format, but o.k.
file <- tempfile()
cat("             Intercourse\n",
    "Race  Gender     Yes  No\n",
    "White Male        43 134\n",
    "      Female      26 149\n",
    "Black Male        29  23\n",
    "      Female      22  36\n",
    file = file)
ft1 <- read.ftable(file)

## Agresti (1990), page 297, Table 8.16.
## Almost o.k., but misses the name of the row variable.
file <- tempfile()
cat("                      \"Tonsil Size\"\n",
    "            \"Not Enl.\" \"Enl.\" \"Greatly Enl.\"\n",
    "Noncarriers       497     560           269\n",
    "Carriers           19      29            24\n",
    file = file)
ft <- read.ftable(file, skip = 2,
                  row.var.names = "Status",
                  col.vars = list("Tonsil Size" =
                      c("Not Enl.", "Enl.", "Greatly Enl.")))

ft22 <- ftable(Titanic, row.vars = 2:1, col.vars = 4:3)
write.ftable(ft22, quote = FALSE) # is the same as
print(ft22)#method="non.compact" is default
print(ft22, method="row.compact")
print(ft22, method="col.compact")
print(ft22, method="compact")

## using 'justify' and 'quote' :
format(ftable(wool + tension ~ breaks, warpbreaks),
       justify = "none", quote = FALSE)

Draw Rectangles Around Hierarchical Clusters


Draws rectangles around the branches of a dendrogram highlighting the corresponding clusters. First the dendrogram is cut at a certain level, then a rectangle is drawn around selected branches.


rect.hclust(tree, k = NULL, which = NULL, x = NULL, h = NULL,
            border = 2, cluster = NULL)



an object of the type produced by hclust.

k, h

Scalar. Cut the dendrogram such that either exactly k clusters are produced or by cutting at height h.

which, x

A vector selecting the clusters around which a rectangle should be drawn. which selects clusters by number (from left to right in the tree), x selects clusters containing the respective horizontal coordinates. Default is which = 1:k.


Vector with border colors for the rectangles.


Optional vector with cluster memberships as returned by cutree(hclust.obj, k = k), can be specified for efficiency if already computed.


(Invisibly) returns a list where each element contains a vector of data points contained in the respective cluster.
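Since the result is returned invisibly, it has to be assigned to be inspected. A small sketch on the USArrests data, checking that the clusters drawn with the default which = 1:k partition the observations:

```r
hc <- hclust(dist(USArrests))
plot(hc)                          # a dendrogram must be plotted first
cl <- rect.hclust(hc, k = 3)      # draws rectangles, returns members

length(cl)                        # one element per rectangle drawn
lengths(cl)                       # cluster sizes, left to right

## Together the three clusters contain every observation exactly once
members <- sort(unname(unlist(cl)))
all(members == seq_len(nrow(USArrests)))
```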

See Also

hclust, identify.hclust.



hca <- hclust(dist(USArrests))
plot(hca)  # the dendrogram must be drawn before rectangles can be added
rect.hclust(hca, k = 3, border = "red")
x <- rect.hclust(hca, h = 50, which = c(2,7), border = 3:4)

Reorder Levels of Factor


The levels of a factor are re-ordered so that the level specified by ref is first and the others are moved down. This is useful for contr.treatment contrasts which take the first level as the reference.


relevel(x, ref, ...)



an unordered factor.


the reference level, typically a string.


additional arguments for future methods.


This, as reorder(), is a special case of simply calling factor(x, levels = levels(x)[....]).
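The equivalence with a direct call to factor can be sketched as follows; the factor f is a made-up example.

```r
## Hypothetical factor with levels in the order low, mid, high
f <- factor(c("low", "mid", "high", "mid"),
            levels = c("low", "mid", "high"))

r1 <- relevel(f, ref = "mid")

## Same thing by hand: put the reference level first, keep the rest in order
r2 <- factor(f, levels = c("mid", setdiff(levels(f), "mid")))

levels(r1)
identical(levels(r1), levels(r2))
identical(as.character(r1), as.character(f))   # the data are unchanged
```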


A factor of the same length as x.

See Also

factor, contr.treatment, levels, reorder.


warpbreaks$tension <- relevel(warpbreaks$tension, ref = "M")
summary(lm(breaks ~ wool + tension, data = warpbreaks))

Reorder Levels of a Factor


reorder is a generic function. The "default" method treats its first argument as a categorical variable, and reorders its levels based on the values of a second variable, usually numeric.


reorder(x, ...)

## Default S3 method:
reorder(x, X, FUN = mean, ...,
        order = is.ordered(x), decreasing = FALSE)



an atomic vector, usually a factor (possibly ordered). The vector is treated as a categorical variable whose levels will be reordered. If x is not a factor, its unique values will be used as the implicit levels.


a vector of the same length as x, whose subset of values for each unique level of x determines the eventual order of that level.


a function whose first argument is a vector and returns a scalar, to be applied to each subset of X determined by the levels of x.


optional: extra arguments supplied to FUN


logical, whether return value will be an ordered factor rather than a factor.


logical, whether the levels will be ordered in increasing or decreasing order.


This, as relevel(), is a special case of simply calling factor(x, levels = levels(x)[....]).


A factor or an ordered factor (depending on the value of order), with the order of the levels determined by FUN applied to X grouped by x. By default, the levels are ordered such that the values returned by FUN are in increasing order. Empty levels will be dropped.

Additionally, the values of FUN applied to the subsets of X (in the original order of the levels of x) is returned as the "scores" attribute.
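As a rough check of the description above, the new level order can be reproduced by computing the per-level summaries with tapply and ordering the original levels by them. The names sc, by_hand and ro are introduced only for this sketch.

```r
## Per-level medians, in the original order of the levels
sc <- tapply(InsectSprays$count, InsectSprays$spray, median)

## Reordering by hand via factor(x, levels = ...)
by_hand <- factor(InsectSprays$spray,
                  levels = levels(InsectSprays$spray)[order(sc)])

ro <- reorder(InsectSprays$spray, InsectSprays$count, median)

identical(levels(by_hand), levels(ro))
all.equal(as.vector(attr(ro, "scores")), as.vector(sc))
```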


Deepayan Sarkar

See Also

reorder.dendrogram, levels, relevel.



bymedian <- with(InsectSprays, reorder(spray, count, median))
boxplot(count ~ bymedian, data = InsectSprays,
        xlab = "Type of spray", ylab = "Insect count",
        main = "InsectSprays data", varwidth = TRUE,
        col = "lightgray")

bymedianR <- with(InsectSprays, reorder(spray, count, median, decreasing=TRUE))
stopifnot(exprs = {
    identical(attr(bymedian, "scores") -> sc,
              attr(bymedianR,"scores"))
    identical(nms <- names(sc), LETTERS[1:6])
    identical(levels(bymedian ), nms[isc <- order(sc)])
    identical(levels(bymedianR), nms[rev(isc)])
})

Reorder a Dendrogram


A method for the generic function reorder.

There are many different orderings of a dendrogram that are consistent with the structure imposed. This function takes a dendrogram and a vector of values and reorders the dendrogram in the order of the supplied vector, maintaining the constraints on the dendrogram.


## S3 method for class 'dendrogram'
reorder(x, wts, agglo.FUN = sum, ...)



the (dendrogram) object to be reordered


numeric weights (arbitrary values) for reordering.


a function for weights agglomeration, see below.


additional arguments


Using the weights wts, the leaves of the dendrogram are reordered so as to be in an order as consistent as possible with the weights. At each node, the branches are ordered in increasing weights where the weight of a branch is defined as f(wj)f(w_j) where ff is agglo.FUN and wjw_j is the weight of the jj-th sub branch.


A dendrogram where each node has a further attribute value with its corresponding weight.
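The value attribute and the effect on the leaf order can be inspected as in the sketch below, which uses six simulated points and the default agglo.FUN = sum. With sum as the agglomeration function, the weight accumulated at the root is expected to equal the total of the supplied weights.

```r
set.seed(42)
x  <- rnorm(6)
dd <- as.dendrogram(hclust(dist(x)))

dd2 <- reorder(dd, wts = 6:1)     # default agglo.FUN = sum

order.dendrogram(dd)              # leaf order before
order.dendrogram(dd2)             # leaf order after; same leaves, rearranged

## Weight stored at the root node
attr(dd2, "value")
attr(dd2, "value") == sum(6:1)
```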


R. Gentleman and M. Maechler

See Also


rev.dendrogram which simply reverses the nodes' order; heatmap, cophenetic.



x <- rnorm(10)
hc <- hclust(dist(x))
dd <- as.dendrogram(hc)
dd.reorder <- reorder(dd, 10:1)
plot(dd, main = "random dendrogram 'dd'")

op <- par(mfcol = 1:2)
plot(dd.reorder, main = "reorder(dd, 10:1)")
plot(reorder(dd, 10:1, agglo.FUN = mean), main = "reorder(dd, 10:1, mean)")
par(op)  # restore the previous graphical parameters

Number of Replications of Terms


Returns a vector or a list of the number of replicates for each term in the formula.


replications(formula, data = NULL, na.action)



a formula or a terms object or a data frame.


a data frame used to find the objects in formula.


function for handling missing values. Defaults to a na.action attribute of data, then a setting of the option na.action, or na.fail if that is not set.


If formula is a data frame and data is missing, formula is used for data with the formula ~ ..

Any character vectors in the formula are coerced to factors.


A vector or list with one entry for each term in the formula giving the number(s) of replications for each level. If all levels are balanced (have the same number of replications) the result is a vector, otherwise it is a list with a component for each term, as a vector, matrix or array as required.

A test for balance is !is.list(replications(formula,data)).
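The balance test can be tried on the npk data set shipped with R (the same experiment that is typed in by hand in the example below). Dropping a single row is an arbitrary way, chosen for this sketch, of making the design unbalanced.

```r
## Balanced design: the result is a plain vector
bal <- replications(~ . - yield, npk)
bal
!is.list(bal)

## Removing one observation unbalances the design: the result is a list
unbal <- replications(~ . - yield, npk[-1, ])
is.list(unbal)
unbal$block
```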


The design was inspired by the S function of the same name described in Chambers et al. (1992).


Chambers, J. M., Freeny, A and Heiberger, R. M. (1992) Analysis of variance; designed experiments. Chapter 5 of Statistical Models in S eds J. M. Chambers and T. J. Hastie, Wadsworth & Brooks/Cole.

See Also



## From Venables and Ripley (2002) p.165.
N <- c(0,1,0,1,1,1,0,0,0,1,1,0,1,1,0,0,1,0,1,0,1,1,0,0)
P <- c(1,1,0,0,0,1,0,1,1,1,0,0,0,1,0,1,1,0,0,1,0,1,1,0)
K <- c(1,0,0,1,0,1,1,0,0,1,0,1,0,1,1,0,0,0,1,1,1,0,1,0)
yield <- c(49.5,62.8,46.8,57.0,59.8,58.5,55.5,56.0,62.8,55.8,69.5,
55.0, 62.0,48.8,45.5,44.2,52.0,51.5,49.8,48.8,57.2,59.0,53.2,56.0)

npk <- data.frame(block = gl(6,4), N = factor(N), P = factor(P),
                  K = factor(K), yield = yield)
replications(~ . - yield, npk)

Reshape Grouped Data


This function reshapes a data frame between ‘wide’ format (with repeated measurements in separate columns of the same row) and ‘long’ format (with the repeated measurements in separate rows).


reshape(data, varying = NULL, v.names = NULL, timevar = "time",
        idvar = "id", ids = 1:NROW(data),
        times = seq_along(varying[[1]]),
        drop = NULL, direction, new.row.names = NULL,
        sep = ".",
        split = if (sep == "") {
            list(regexp = "[A-Za-z][0-9]", include = TRUE)
        } else {
            list(regexp = sep, include = FALSE, fixed = TRUE)}
        )

### Typical usage for converting from long to wide format:

# reshape(data, direction = "wide",
#         idvar = "___", timevar = "___", # mandatory
#         v.names = c(___),    # time-varying variables
#         varying = list(___)) # auto-generated if missing

### Typical usage for converting from wide to long format:

### If names of wide-format variables are in a 'nice' format

# reshape(data, direction = "long",
#         varying = c(___), # vector 
#         sep)              # to help guess 'v.names' and 'times'

### To specify long-format variable names explicitly

# reshape(data, direction = "long",
#         varying = ___,  # list / matrix / vector (use with care)
#         v.names = ___,  # vector of variable names in long format
#         timevar, times, # name / values of constructed time variable
#         idvar, ids)     # name / values of constructed id variable



a data frame


names of sets of variables in the wide format that correspond to single variables in long format (‘time-varying’). This is canonically a list of vectors of variable names, but it can optionally be a matrix of names, or a single vector of names. In each case, when direction = "long", the names can be replaced by indices which are interpreted as referring to names(data). See ‘Details’ for more details and options.


names of variables in the long format that correspond to multiple variables in the wide format. See ‘Details’.


the variable in long format that differentiates multiple records from the same group or individual. If more than one record matches, the first will be taken (with a warning).


Names of one or more variables in long format that identify multiple records from the same group/individual. These variables may also be present in wide format.


the values to use for a newly created idvar variable in long format.


the values to use for a newly created timevar variable in long format. See ‘Details’.


a vector of names of variables to drop before reshaping.


character string, partially matched to either "wide" to reshape to wide format, or "long" to reshape to long format.


character or NULL: a non-null value will be used for the row names of the result.


A character vector of length 1, indicating a separating character in the variable names in the wide format. This is used for guessing v.names and times arguments based on the names in varying. If sep == "", the split is just before the first numeral that follows an alphabetic character. This is also used to create variable names when reshaping to wide format.


A list with three components, regexp, include, and (optionally) fixed. This allows an extended interface to variable name splitting. See ‘Details’.


Although reshape() can be used in a variety of contexts, the motivating application is data from longitudinal studies, and the arguments of this function are named and described in those terms. A longitudinal study is characterized by repeated measurements of the same variable(s), e.g., height and weight, on each unit being studied (e.g., individual persons) at different time points (which are assumed to be the same for all units). These variables are called time-varying variables. The study may include other variables that are measured only once for each unit and do not vary with time (e.g., gender and race); these are called time-constant variables.

A ‘wide’ format representation of a longitudinal dataset will have one record (row) for each unit, typically with some time-constant variables that occupy single columns, and some time-varying variables that occupy multiple columns (one column for each time point). A ‘long’ format representation of the same dataset will have multiple records (rows) for each individual, with the time-constant variables being constant across these records and the time-varying variables varying across the records. The ‘long’ format dataset will have two additional variables: a ‘time’ variable identifying which time point each record comes from, and an ‘id’ variable showing which records refer to the same unit.

The type of conversion (long to wide or wide to long) is determined by the direction argument, which is mandatory unless the data argument is the result of a previous call to reshape. In that case, the operation can be reversed simply using reshape(data) (the other arguments are stored as attributes on the data frame).

Conversion from long to wide format with direction = "wide" is the simpler operation, and is mainly useful in the context of multivariate analysis where data is often expected as a wide-format matrix. In this case, the time variable timevar and id variable idvar must be specified. All other variables are assumed to be time-varying, unless the time-varying variables are explicitly specified via the v.names argument. A warning is issued if time-constant variables are not actually constant.

Each time-varying variable is expanded into multiple variables in the wide format. The names of these expanded variables are generated automatically, unless they are specified as the varying argument in the form of a list (or matrix) with one component (or row) for each time-varying variable. If varying is a vector of names, it is implicitly converted into a matrix, with one row for each time-varying variable. Use this option with care if there are multiple time-varying variables, as the ordering (by column, the default in the matrix constructor) may be unintuitive, whereas the explicit list or matrix form is unambiguous.

Conversion from wide to long with direction = "long" is the more common operation as most (univariate) statistical modeling functions expect data in the long format. In the simpler case where there is only one time-varying variable, the corresponding columns in the wide format input can be specified as the varying argument, which can be either a vector of column names or the corresponding column indices. The name of the corresponding variable in the long format output combining these columns can be optionally specified as the v.names argument, and the name of the time variables as the timevar argument. The values to use as the time values corresponding to the different columns in the wide format can be specified as the times argument. If v.names is unspecified, the function will attempt to guess v.names and times from varying (an explicitly specified times argument is unused in that case). The default expects variable names like x.1, x.2, where sep = "." specifies to split at the dot and drop it from the name. To have alphabetic followed by numeric times use sep = "".

Multiple time-varying variables can be specified in two ways, either with varying as an atomic vector as above, or as a list (or a matrix). The first form is useful (and mandatory) if the automatic variable name splitting as described above is used; this requires the names of all time-varying variables to be suitably formatted in the same manner, and v.names to be unspecified. If varying is a list (with one component for each time-varying variable) or a matrix (one row for each time-varying variable), variable name splitting is not attempted, and v.names and times will generally need to be specified, although they will default to, respectively, the first variable name in each set, and sequential times.

Also, guessing is not attempted if v.names is given explicitly, even if varying is an atomic vector. In that case, the number of time-varying variables is taken to be the length of v.names, and varying is implicitly converted into a matrix, with one row for each time-varying variable. As in the case of long to wide conversion, the matrix is filled up by column, so careful attention needs to be paid to the order of variable names (or indices) in varying, which is taken to be like x.1, y.1, x.2, y.2 (i.e., variables corresponding to the same time point need to be grouped together).
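The ordering requirement can be illustrated with a small sketch. The data frame `wide` and its column names are hypothetical, invented only for this illustration; the second call uses the explicit list form, which should yield the same long-format columns.

```r
## Hypothetical wide data: two time-varying variables (x, y) at two time points
wide <- data.frame(id = 1:3,
                   x.1 = c(1, 2, 3), y.1 = c(10, 20, 30),
                   x.2 = c(4, 5, 6), y.2 = c(40, 50, 60))

## 'varying' as an atomic vector, grouped by time point: x.1, y.1, x.2, y.2
long <- reshape(wide, direction = "long", idvar = "id",
                varying = c("x.1", "y.1", "x.2", "y.2"),
                v.names = c("x", "y"), times = 1:2)

## the unambiguous list form: one component per time-varying variable
long2 <- reshape(wide, direction = "long", idvar = "id",
                 varying = list(c("x.1", "x.2"), c("y.1", "y.2")),
                 v.names = c("x", "y"), times = 1:2)

long[, c("id", "time", "x", "y")]
```

When there is more than one time-varying variable, the list form is usually the safer choice, since it does not depend on how the vector is folded into a matrix.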

The split argument should not usually be necessary. The split$regexp component is passed to either strsplit or regexpr, where the latter is used if split$include is TRUE, in which case the splitting occurs after the first character of the matched string. In the strsplit case, the separator is not included in the result, and it is possible to specify fixed-string matching using split$fixed.


The reshaped data frame with added attributes to simplify reshaping back to the original form.

See Also

stack, aperm; relist for reshaping the result of unlist. xtabs and for creating contingency tables and converting them back to data frames.


summary(Indometh) # data in long format

## long to wide (direction = "wide") requires idvar and timevar at a minimum
reshape(Indometh, direction = "wide", idvar = "Subject", timevar = "time")

## can also explicitly specify name of combined variable
wide <- reshape(Indometh, direction = "wide", idvar = "Subject",
                timevar = "time", v.names = "conc", sep= "_")

## reverse transformation
reshape(wide, direction = "long")
reshape(wide, idvar = "Subject", varying = list(2:12),
        v.names = "conc", direction = "long")

## times need not be numeric
df <- data.frame(id = rep(1:4, rep(2,4)),
                 visit = I(rep(c("Before","After"), 4)),
                 x = rnorm(4), y = runif(4))
reshape(df, timevar = "visit", idvar = "id", direction = "wide")
## warns that y is really varying
reshape(df, timevar = "visit", idvar = "id", direction = "wide", v.names = "x")

##  unbalanced 'long' data leads to NA fill in 'wide' form
df2 <- df[1:7, ]
reshape(df2, timevar = "visit", idvar = "id", direction = "wide")

## Alternative regular expressions for guessing names
df3 <- data.frame(id = 1:4, age = c(40,50,60,50), dose1 = c(1,2,1,2),
                  dose2 = c(2,1,2,1), dose4 = c(3,3,3,3))
reshape(df3, direction = "long", varying = 3:5, sep = "")

## an example that isn't longitudinal data
state.x77 <- as.data.frame(state.x77)
long <- reshape(state.x77, idvar = "state", ids = row.names(state.x77),
                times = names(state.x77), timevar = "Characteristic",
                varying = list(names(state.x77)), direction = "long")

reshape(long, direction = "wide")

reshape(long, direction = "wide", new.row.names = unique(long$state))

## multiple id variables
df3 <- data.frame(school = rep(1:3, each = 4), class = rep(9:10, 6),
                  time = rep(c(1,1,2,2), 3), score = rnorm(12))
wide <- reshape(df3, idvar = c("school", "class"), direction = "wide")
## transform back
reshape(wide)

Extract Model Residuals


residuals is a generic function which extracts model residuals from objects returned by modeling functions.

resid is an alias for residuals, abbreviated to encourage users to access object components through an accessor function rather than by directly referencing an object slot.

All object classes which are returned by model fitting functions should provide a residuals method. (Note that the method is for ‘residuals’ and not ‘resid’.)

Methods can make use of naresid methods to compensate for the omission of missing values. The default, nls and smooth.spline methods do.
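As a minimal sketch of what this compensation looks like in practice, the following fits the same linear model with two different na.action settings. The data frame `d` is hypothetical; the lm method is used here because it also pads its residuals through naresid.

```r
## Hypothetical data with one missing response value
d <- data.frame(x = 1:6, y = c(1.1, 2.3, NA, 3.9, 5.2, 5.8))

fit_omit    <- lm(y ~ x, data = d, na.action = na.omit)
fit_exclude <- lm(y ~ x, data = d, na.action = na.exclude)

length(residuals(fit_omit))       # the incomplete row is dropped
length(residuals(fit_exclude))    # padded back to the length of the data
is.na(residuals(fit_exclude)[3])  # the padded position holds NA
```

With na.exclude the residuals line up with the rows of the original data, which is convenient when they are to be added back as a column.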


residuals(object, ...)
resid(object, ...)



an object for which the extraction of model residuals is meaningful.


other arguments.


Residuals extracted from the object object.


Chambers, J. M. and Hastie, T. J. (1992) Statistical Models in S. Wadsworth & Brooks/Cole.

See Also

coefficients, fitted.values, glm, lm.

influence.measures for standardized (rstandard) and studentized (rstudent) residuals.

Running Medians – Robust Scatter Plot Smoothing


Compute running medians of odd span. This is the ‘most robust’ scatter plot smoothing possible. For efficiency (and historical reasons), you can use one of two different algorithms giving identical results.


runmed(x, k, endrule = c("median", "keep", "constant"),
       algorithm = NULL,
       na.action = c("+Big_alternate", "-Big_alternate", "na.omit", "fail"),
       print.level = 0)



numeric vector, the ‘dependent’ variable to be smoothed.


integer width of median window; must be odd. Turlach had a default of k <- 1 + 2 * min((n-1)%/% 2, ceiling(0.1*n)). Use k = 3 for ‘minimal’ robust smoothing eliminating isolated outliers.


character string indicating how the values at the beginning and the end (of the data) should be treated. Can be abbreviated. Possible values are:


keeps the first and last $k_2$ values at both ends, where $k_2$ is the half-bandwidth k2 = k %/% 2, i.e., y[j] = x[j] for $j \in \{1,\ldots,k_2;\ n-k_2+1,\ldots,n\}$;


copies median(y[1:k2]) to the first values and analogously for the last ones making the smoothed ends constant;


the default, smooths the ends by using symmetrical medians of subsequently smaller bandwidth, but for the very first and last value where Tukey's robust end-point rule is applied, see smoothEnds.


character string (partially matching "Turlach" or "Stuetzle") or the default NULL, specifying which algorithm should be applied. The default choice depends on n = length(x) and k where "Turlach" will be used for larger problems.


character string determining the behavior in the case of NA or NaN in x, (partially matching) one of


Here, all the NAs in x are first replaced by alternating $\pm B$ where $B$ is a “Big” number (with $2B < M^*$, where $M^*$ is the value of .Machine$double.xmax). The replacement values are “from left” $(+B, -B, +B, \ldots)$, i.e. start with "+".


almost the same as "+Big_alternate", just starting with $-B$ ("-Big...").


the result is the same as runmed(x[!is.na(x)], k, ..).


the presence of NAs in x will raise an error.


integer, indicating verboseness of algorithm; should rarely be changed by average users.


Apart from the end values, the result y = runmed(x, k) simply has y[j] = median(x[(j-k2):(j+k2)]) (k = 2*k2+1), computed very efficiently.
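The interior definition can be checked by brute force. The sketch below uses arbitrary illustrative data and endrule = "keep", so that only the interior positions are smoothed; the names `interior` and `brute` are introduced here for illustration.

```r
x <- c(3, 9, 4, 22, 6, 1, 7, 5, 2, 8, 3)   # arbitrary illustrative data
k <- 5; k2 <- k %/% 2; n <- length(x)

y <- runmed(x, k, endrule = "keep")

## brute-force medians over the interior positions k2+1, ..., n-k2
interior <- (k2 + 1):(n - k2)
brute <- sapply(interior, function(j) median(x[(j - k2):(j + k2)]))

all.equal(as.vector(y)[interior], brute)
```

The brute-force version costs a full median per window, which is what the two algorithms described next avoid.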

The two algorithms are internally entirely different:


is the Härdle–Steiger algorithm (see Ref.) as implemented by Berwin Turlach. A tree algorithm is used, ensuring performance $O(n \log k)$ where n = length(x) which is asymptotically optimal.


is the (older) Stuetzle–Friedman implementation which makes use of median updating when one observation enters and one leaves the smoothing window. While this performs as $O(n \times k)$ which is slower asymptotically, it is considerably faster for small $k$ or $n$.
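A quick way to see that the choice of algorithm affects speed only is to run both on the same input. This is a sketch on simulated finite data with an arbitrary seed, not a benchmark.

```r
set.seed(1)                      # arbitrary seed, for reproducibility only
x <- rnorm(200)

yT <- runmed(x, k = 9, algorithm = "Turlach")
yS <- runmed(x, k = 9, algorithm = "Stuetzle")

## compare the smoothed values, ignoring attributes
all.equal(as.vector(yT), as.vector(yS))
```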

Note that both algorithms (and the smoothEnds() utility) now “work” also when x contains non-finite entries ($\pm$Inf, NaN, and NA):




currently simply works by applying the underlying math library (‘libm’) arithmetic for the non-finite numbers; this may optionally change in the future.

Currently long vectors are only supported for algorithm = "Stuetzle".


vector of smoothed values of the same length as x with an attribute k containing (the ‘oddified’) k.


Martin Maechler, based on Fortran code from Werner Stuetzle and S-PLUS and C code from Berwin Turlach.


Härdle, W. and Steiger, W. (1995) Algorithm AS 296: Optimal median smoothing, Applied Statistics 44, 258–264. doi:10.2307/2986349.

Jerome H. Friedman and Werner Stuetzle (1982) Smoothing of Scatterplots; Report, Dep. Statistics, Stanford U., Project Orion 003.

See Also

smoothEnds which implements Tukey's end point rule and is called by default from runmed(*, endrule = "median"). smooth uses running medians of 3 for its compound smoothers.



myNHT <- as.vector(nhtemp)
myNHT[20] <- 2 * nhtemp[20]
plot(myNHT, type = "b", ylim = c(48, 60), main = "Running Medians Example")
lines(runmed(myNHT, 7), col = "red")

## special: multiple y values for one x
plot(cars, main = "'cars' data and runmed(dist, 3)")
lines(cars, col = "light gray", type = "c")
with(cars, lines(speed, runmed(dist, k = 3), col = 2))

## nice quadratic with a few outliers
y <- ys <- (-20:20)^2
y [c(1,10,21,41)] <- c(150, 30, 400, 450)
all(y == runmed(y, 1)) # 1-neighbourhood <==> interpolation
plot(y) ## lines(y, lwd = .1, col = "light gray")
lines(lowess(seq(y), y, f = 0.3), col = "brown")
lines(runmed(y, 7), lwd = 2, col = "blue")
lines(runmed(y, 11), lwd = 2, col = "red")

## Lowess is not robust
y <- ys ; y[21] <- 6666 ; x <- seq(y)
col <- c("black", "brown","blue")
plot(y, col = col[1])
lines(lowess(x, y, f = 0.3), col = col[2])
lines(runmed(y, 7),      lwd = 2, col = col[3])
legend(length(y),max(y), c("data", "lowess(y, f = 0.3)", "runmed(y, 7)"),
       xjust = 1, col = col, lty = c(0, 1, 1), pch = c(1,NA,NA))

## An example with initial NA's - used to fail badly (notably for "Turlach"):
x15 <- c(rep(NA, 4), c(9, 9, 4, 22, 6, 1, 7, 5, 2, 8, 3))
rS15 <- cbind(Sk.3 = runmed(x15, k = 3, algorithm="S"),
              Sk.7 = runmed(x15, k = 7, algorithm="S"),
              Sk.11= runmed(x15, k =11, algorithm="S"))
rT15 <- cbind(Tk.3 = runmed(x15, k = 3, algorithm="T", print.level=1),
              Tk.7 = runmed(x15, k = 7, algorithm="T", print.level=1),
              Tk.9 = runmed(x15, k = 9, algorithm="T", print.level=1),
              Tk.11= runmed(x15, k =11, algorithm="T", print.level=1))
cbind(x15, rS15, rT15) # result for k=11  maybe a bit surprising ..
Tv <- rT15[-(1:3),]
stopifnot(3 <= Tv, Tv <= 9, 5 <= Tv[1:10,])
matplot(y = cbind(x15, rT15), type = "b", ylim = c(1,9), pch=1:5, xlab = NA,
        main = "runmed(x15, k, algo = \"Turlach\")")
mtext(paste("x15 <-", deparse(x15)))
points(x15, cex=2)
legend("bottomleft", legend=c("data", paste("k = ", c(3,7,9,11))),
       bty="n", col=1:5, lty=1:5, pch=1:5)

Random Wishart Distributed Matrices


Generate n random matrices, distributed according to the Wishart distribution with parameters Sigma and df, $W_p(\Sigma, m)$, $m =$ df, $\Sigma =$ Sigma.


rWishart(n, df, Sigma)



integer sample size.


numeric parameter, “degrees of freedom”.


positive definite ($p \times p$) “scale” matrix, the matrix parameter of the distribution.


If $X_1,\dots, X_m,\ X_i\in\mathbf{R}^p$ is a sample of $m$ independent multivariate Gaussians with mean (vector) 0, and covariance matrix $\Sigma$, the distribution of $M = X'X$ is $W_p(\Sigma, m)$.

Consequently, the expectation of $M$ is

$$E[M] = m\times\Sigma.$$

Further, if Sigma is scalar ($p = 1$), the Wishart distribution is a scaled chi-squared ($\chi^2$) distribution with df degrees of freedom, $W_1(\sigma^2, m) = \sigma^2 \chi^2_m$.

The component-wise variance is

$$\mathrm{Var}(M_{ij}) = m(\Sigma_{ij}^2 + \Sigma_{ii} \Sigma_{jj}).$$
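The construction $M = X'X$ can be sketched directly. The values of `p`, `m` and the scale matrix below are illustrative choices, and the Gaussian rows are generated through a Cholesky factor, which is one of several possible ways to do so.

```r
set.seed(42)                       # arbitrary seed
p <- 3; m <- 10
Sigma <- toeplitz((p:1) / p)       # an illustrative positive definite matrix

## m independent N_p(0, Sigma) rows: Z %*% chol(Sigma) has covariance Sigma
X <- matrix(rnorm(m * p), m, p) %*% chol(Sigma)
M <- crossprod(X)                  # M = X'X, one draw from W_p(Sigma, m)

## a draw of the same kind from rWishart(), returned as a p x p x 1 array
R <- rWishart(1, df = m, Sigma = Sigma)
dim(R)
isSymmetric(M)
```

Both `M` and `R[,,1]` are single random draws, so they will differ from each other; only their distribution is the same.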


a numeric array, say R, of dimension $p \times p \times n$, where each R[,,i] is a positive definite matrix, a realization of the Wishart distribution $W_p(\Sigma, m)$, $m =$ df, $\Sigma =$ Sigma.


Douglas Bates


Mardia, K. V., J. T. Kent, and J. M. Bibby (1979) Multivariate Analysis, London: Academic Press.

See Also

cov, rnorm, rchisq.


## Artificial
S <- toeplitz((10:1)/10)
R <- rWishart(1000, 20, S)
dim(R)  #  10 10  1000
mR <- apply(R, 1:2, mean)  # ~= E[ Wish(S, 20) ] = 20 * S
stopifnot(all.equal(mR, 20*S, tolerance = .009))

## See Details, the variance is
Va <- 20*(S^2 + tcrossprod(diag(S)))
vR <- apply(R, 1:2, var)
stopifnot(all.equal(vR, Va, tolerance = 1/16))

Scatter Plot with Smooth Curve Fitted by loess


Plot and add a smooth curve computed by loess to a scatter plot.


scatter.smooth(x, y = NULL, span = 2/3, degree = 1,
    family = c("symmetric", "gaussian"),
    xlab = NULL, ylab = NULL,
    ylim = range(y, pred$y, na.rm = TRUE),
    evaluation = 50, ..., lpars = list())

loess.smooth(x, y, span = 2/3, degree = 1,
    family = c("symmetric", "gaussian"), evaluation = 50, ...)


x, y

the x and y arguments provide the x and y coordinates for the plot. Any reasonable way of defining the coordinates is acceptable. See the function xy.coords for details.


smoothness parameter for loess.


degree of local polynomial used.


if "gaussian" fitting is by least-squares, and if family = "symmetric" a re-descending M estimator is used. Can be abbreviated.


label for x axis.


label for y axis.


the y limits of the plot.


number of points at which to evaluate the smooth curve.


For scatter.smooth(), graphical parameters, passed to plot() only. For loess.smooth, control parameters passed to loess.control.


a list of arguments to be passed to lines().


loess.smooth is an auxiliary function which evaluates the loess smooth at evaluation equally spaced points covering the range of x.


For scatter.smooth, none.

For loess.smooth, a list with two components, x (the grid of evaluation points) and y (the smoothed values at the grid points).
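A short sketch of the returned structure, using the cars data and an arbitrarily chosen evaluation = 25:

```r
sm <- with(cars, loess.smooth(speed, dist, evaluation = 25))

str(sm)                 # a list with components x and y
length(sm$x)            # one grid point per requested evaluation
range(sm$x)             # the grid spans the range of the x values
range(cars$speed)
```

Because the result is a plain list with x and y, it can be passed straight to lines() to overlay the smooth on an existing plot.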

See Also

loess; smoothScatter for scatter plots with smoothed density color representation.



with(cars, scatter.smooth(speed, dist))
## or with dotted thick smoothed line results :
with(cars, scatter.smooth(speed, dist, lpars =
                    list(col = "red", lwd = 3, lty = 3)))

Scree Plots


screeplot.default plots the variances against the number of the principal component. This is also the plot method for classes "princomp" and "prcomp".


screeplot(x, ...)
## Default S3 method:
screeplot(x, npcs = min(10, length(x$sdev)),
          type = c("barplot", "lines"),
          main = deparse1(substitute(x)), ...)



an object containing a sdev component, such as that returned by princomp() and prcomp().


the number of components to be plotted.


the type of plot. Can be abbreviated.

main, ...

graphics parameters.


Mardia, K. V., J. T. Kent and J. M. Bibby (1979). Multivariate Analysis, London: Academic Press.

Venables, W. N. and B. D. Ripley (2002). Modern Applied Statistics with S, Springer-Verlag.

See Also

princomp and prcomp.



## The variances of the variables in the
## USArrests data vary by orders of magnitude, so scaling is appropriate
princomp(USArrests, cor = TRUE)  # cor = TRUE rescales the variables

fit <- princomp(covmat = Harman74.cor)
screeplot(fit, npcs = 24, type = "lines")

Standard Deviation


This function computes the standard deviation of the values in x. If na.rm is TRUE then missing values are removed before computation proceeds.


sd(x, na.rm = FALSE)



a numeric vector or an R object (but not a factor) coercible to numeric by as.double(x).


logical. Should missing values be removed?


Like var this uses denominator $n - 1$.

The standard deviation of a length-one or zero-length vector is NA.
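The following sketch writes out the $n - 1$ denominator by hand and shows the two edge cases mentioned above. The data values are arbitrary.

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)          # arbitrary illustrative values
n <- length(x)

sd(x)
sqrt(sum((x - mean(x))^2) / (n - 1))    # the same quantity, written out
sqrt(var(x))                            # sd is the square root of var

sd(5)                                   # NA for a single value
sd(c(1, NA, 3), na.rm = TRUE)           # the missing value is removed first
```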

See Also

var for its square, and mad, the most robust alternative.


sd(1:2) ^ 2

Standard Errors for Contrasts in Model Terms


Returns the standard errors for one or more contrasts in an aov object.


se.contrast(object, ...)
## S3 method for class 'aov'
se.contrast(object, contrast.obj,
           coef = contr.helmert(ncol(contrast))[, 1],
           data = NULL, ...)



A suitable fit, usually from aov.


The contrasts for which standard errors are requested. This can be specified via a list or via a matrix. A single contrast can be specified by a list of logical vectors giving the cells to be contrasted. Multiple contrasts should be specified by a matrix, each column of which is a numerical contrast vector (summing to zero).


used when contrast.obj is a list; it should be a vector of the same length as the list with zero sum. The default value is the first Helmert contrast, which contrasts the first and second cell means specified by the list.


The data frame used to evaluate contrast.obj.


further arguments passed to or from other methods.


Contrasts are usually used to test if certain means are significantly different; it can be easier to use se.contrast than compute them directly from the coefficients.

In multistratum models, the contrasts can appear in more than one stratum, in which case the standard errors are computed in the lowest stratum and adjusted for efficiencies and comparisons between strata. (See the comments in the note in the help for aov about using orthogonal contrasts.) Such standard errors are often conservative.

Suitable matrices for use with coef can be found by calling contrasts and indexing the columns by a factor.


A vector giving the standard errors for each contrast.

See Also

contrasts, model.tables


## From Venables and Ripley (2002) p.165.
N <- c(0,1,0,1,1,1,0,0,0,1,1,0,1,1,0,0,1,0,1,0,1,1,0,0)
P <- c(1,1,0,0,0,1,0,1,1,1,0,0,0,1,0,1,1,0,0,1,0,1,1,0)
K <- c(1,0,0,1,0,1,1,0,0,1,0,1,0,1,1,0,0,0,1,1,1,0,1,0)
yield <- c(49.5,62.8,46.8,57.0,59.8,58.5,55.5,56.0,62.8,55.8,69.5,
55.0, 62.0,48.8,45.5,44.2,52.0,51.5,49.8,48.8,57.2,59.0,53.2,56.0)

npk <- data.frame(block = gl(6,4), N = factor(N), P = factor(P),
                  K = factor(K), yield = yield)
## Set suitable contrasts.
options(contrasts = c("contr.helmert", "contr.poly"))
npk.aov1 <- aov(yield ~ block + N + K, data = npk)
se.contrast(npk.aov1, list(N == "0", N == "1"), data = npk)
# or via a matrix
cont <- matrix(c(-1,1), 2, 1, dimnames = list(NULL, "N"))
se.contrast(npk.aov1, cont[N, , drop = FALSE]/12, data = npk)

## test a multi-stratum model
npk.aov2 <- aov(yield ~ N + K + Error(block/(N + K)), data = npk)
se.contrast(npk.aov2, list(N == "0", N == "1"))

## an example looking at an interaction contrast
## Dataset from R.E. Kirk (1995)
## 'Experimental Design: procedures for the behavioral sciences'
score <- c(12, 8,10, 6, 8, 4,10,12, 8, 6,10,14, 9, 7, 9, 5,11,12,
            7,13, 9, 9, 5,11, 8, 7, 3, 8,12,10,13,14,19, 9,16,14)
A <- gl(2, 18, labels = c("a1", "a2"))
B <- rep(gl(3, 6, labels = c("b1", "b2", "b3")), 2)
fit <- aov(score ~ A*B)
cont <- c(1, -1)[A] * c(1, -1, 0)[B]
sum(cont)       # 0
sum(cont*score) # value of the contrast
se.contrast(fit, as.matrix(cont))
(t.stat <- sum(cont*score)/se.contrast(fit, as.matrix(cont)))
summary(fit, split = list(B = 1:2), expand.split = TRUE)
## t.stat^2 is the F value on the A:B: C1 line (with Helmert contrasts)
## Now look at all three interaction contrasts
cont <- c(1, -1)[A] * cbind(c(1, -1, 0), c(1, 0, -1), c(0, 1, -1))[B,]
se.contrast(fit, cont)  # same, due to balance.
rm(A, B, score)

## multi-stratum example where efficiencies play a role
## An example from Yates (1932),
## a 2^3 design in 2 blocks replicated 4 times

Block <- gl(8, 4)
A <- factor(c(0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,
B <- factor(c(0,0,1,1,0,0,1,1,0,1,0,1,1,0,1,0,0,0,1,1,
C <- factor(c(0,1,1,0,1,0,0,1,0,0,1,1,0,0,1,1,0,1,0,1,
Yield <- c(101, 373, 398, 291, 312, 106, 265, 450, 106, 306, 324, 449,
           272, 89, 407, 338, 87, 324, 279, 471, 323, 128, 423, 334,
           131, 103, 445, 437, 324, 361, 302, 272)
aovdat <- data.frame(Block, A, B, C, Yield)
fit <- aov(Yield ~ A + B * C + Error(Block), data = aovdat)
cont1 <- c(-1, 1)[A]/32  # Helmert contrasts
cont2 <- c(-1, 1)[B] * c(-1, 1)[C]/32
cont <- cbind(A = cont1, BC = cont2)
colSums(cont*Yield) # values of the contrasts
se.contrast(fit, as.matrix(cont))
# comparison with lme (from package nlme)
library(nlme)
fit2 <- lme(Yield ~ A + B*C, random = ~1 | Block, data = aovdat)
summary(fit2)$tTable # same estimates, similar (but smaller) se's.

Construct Self-starting Nonlinear Models


Construct self-starting nonlinear models to be used in nls, etc. Because the function initial computes approximate parameter values from the data, such models are “self-starting”, i.e., they do not need a start argument in, e.g., nls().


selfStart(model, initial, parameters, template)



a function object defining a nonlinear model or a nonlinear formula object of the form ~ expression.


a function object, taking arguments mCall, data, and LHS, and ..., representing, respectively, a matched call to the function model, a data frame in which to interpret the variables in mCall, and the expression from the left-hand side of the model formula in the call to nls. This function should return initial values for the parameters in model. The ... is used by nls() to pass its control and trace arguments for the cases where initial() itself calls nls() as it does for the ten self-starting nonlinear models in R's stats package.


a character vector specifying the terms on the right hand side of model for which initial estimates should be calculated. Passed as the namevec argument to the deriv function.


an optional prototype for the calling sequence of the returned object, passed as the function.arg argument to the deriv function. By default, a template is generated with the covariates in model coming first and the parameters in model coming last in the calling sequence.


nls() calls getInitial and the initial function for these self-starting models.
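As a sketch of this mechanism, the built-in SSlogis model can be fitted without a start argument. The subset of ChickWeight used here is an arbitrary illustrative choice, and `Chick.1` and `fm` are names introduced for the example.

```r
## Built-in self-starting logistic model: no 'start' argument is supplied
Chick.1 <- subset(ChickWeight, Chick == 1)
fm <- nls(weight ~ SSlogis(Time, Asym, xmid, scal), data = Chick.1)
coef(fm)

## the starting values that the model's initial function computes
getInitial(weight ~ SSlogis(Time, Asym, xmid, scal), data = Chick.1)
```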

This function is generic; method functions can be written to handle specific classes of objects.


a function object of class "selfStart", for the formula method obtained by applying deriv to the right hand side of the model formula. An initial attribute (defined by the initial argument) is added to the function to calculate starting estimates for the parameters in the model automatically.


José Pinheiro and Douglas Bates

See Also

nls, getInitial.

Each of the following is a "selfStart" model (with examples): SSasymp, SSasympOff, SSasympOrig, SSbiexp, SSfol, SSfpl, SSgompertz, SSlogis, SSmicmen, SSweibull.

Further, package nlme's nlsList.


## self-starting logistic model

## The "initializer" (finds initial values for parameters from data):
initLogis <- function(mCall, data, LHS, ...) {
    xy <- sortedXyData(mCall[["x"]], LHS, data)
    if(nrow(xy) < 4)
        stop("too few distinct input values to fit a logistic model")
    z <- xy[["y"]]
    ## transform to proportion, i.e. in (0,1) :
    rng <- range(z); dz <- diff(rng)
    z <- (z - rng[1L] + 0.05 * dz)/(1.1 * dz)
    xy[["z"]] <- log(z/(1 - z))		# logit transformation
    aux <- coef(lm(x ~ z, xy))
    pars <- coef(nls(y ~ 1/(1 + exp((xmid - x)/scal)),
                     data = xy,
                     start = list(xmid = aux[[1L]], scal = aux[[2L]]),
                     algorithm = "plinear", ...))
    setNames(pars [c(".lin", "xmid", "scal")],
             mCall[c("Asym", "xmid", "scal")])
}

mySSlogis <- selfStart(~ Asym/(1 + exp((xmid - x)/scal)),
                       initial = initLogis,
                       parameters = c("Asym", "xmid", "scal"))

getInitial(weight ~ mySSlogis(Time, Asym, xmid, scal),
           data = subset(ChickWeight, Chick == 1))

# 'first.order.log.model' is a function object defining a first order
# compartment model
# 'first.order.log.initial' is a function object which calculates initial
# values for the parameters in 'first.order.log.model'
# self-starting first order compartment model
## Not run: 
SSfol <- selfStart(first.order.log.model, first.order.log.initial)

## End(Not run)

## Explore the self-starting models already available in R's  "stats":
pos.st <- which("package:stats" == search())
mSS <- apropos("^SS..", where = TRUE, ignore.case = FALSE)
(mSS <- unname(mSS[names(mSS) == pos.st]))
fSS <- sapply(mSS, get, pos = pos.st, mode = "function")
all(sapply(fSS, inherits, "selfStart"))  # -> TRUE

## Show the argument list of each self-starting function:
str(fSS, give.attr = FALSE)

Set the Names in an Object


This is a convenience function that sets the names on an object and returns the object. It is most useful at the end of a function definition where one is creating the object to be returned and would prefer not to store it under a name just so the names can be assigned.
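The end-of-function use case might look like the following sketch. The helper `col_means` is hypothetical and exists only to illustrate the pattern.

```r
## Hypothetical helper: per-column means, labelled by a derived name
col_means <- function(df) {
    ## compute and name the result in one expression, with no temporary
    setNames(vapply(df, mean, numeric(1)), paste0("mean_", names(df)))
}

col_means(cars)
```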


setNames(object = nm, nm)



an object for which a names attribute will be meaningful


a character vector of names to assign to the object


An object of the same sort as object with the new names assigned.


Douglas M. Bates and Saikat DebRoy

See Also

unname for removing names.


setNames( 1:3, c("foo", "bar", "baz") )
# this is just a short form of
tmp <- 1:3
names(tmp) <-  c("foo", "bar", "baz")

## special case of character vector, using default
setNames(nm = c("First", "2nd"))

Shapiro-Wilk Normality Test


Performs the Shapiro-Wilk test of normality.





a numeric vector of data values. Missing values are allowed, but the number of non-missing values must be between 3 and 5000.


A list with class "htest" containing the following components:


the value of the Shapiro-Wilk statistic.


an approximate p-value for the test. This is said in Royston (1995) to be adequate for p.value < 0.1.


the character string "Shapiro-Wilk normality test".

a character string giving the name(s) of the data.
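The components listed above can be inspected on any result. This sketch uses simulated data with an arbitrary seed and sample size.

```r
set.seed(123)                      # arbitrary seed
res <- shapiro.test(rnorm(50))

class(res)                         # "htest"
res$statistic                      # the W statistic
res$p.value                        # approximate p-value
res$method                         # description of the test
res$data.name                      # name(s) of the data
```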


The algorithm used is a C translation of the Fortran code described in Royston (1995). The calculation of the p value is exact for $n = 3$, otherwise approximations are used, separately for $4 \le n \le 11$ and $n \ge 12$.


Patrick Royston (1982). An extension of Shapiro and Wilk's $W$ test for normality to large samples. Applied Statistics, 31, 115–124. doi:10.2307/2347973.

Patrick Royston (1982). Algorithm AS 181: The $W$ test for Normality. Applied Statistics, 31, 176–180. doi:10.2307/2347986.

Patrick Royston (1995). Remark AS R94: A remark on Algorithm AS 181: The $W$ test for normality. Applied Statistics, 44, 547–551. doi:10.2307/2986146.

See Also

qqnorm for producing a normal quantile-quantile plot.


shapiro.test(rnorm(100, mean = 5, sd = 3))
shapiro.test(runif(100, min = 2, max = 4))

Extract Residual Standard Deviation 'Sigma'


Extract the estimated standard deviation of the errors, the “residual standard deviation” (also misnamed the “residual standard error”, e.g., in summary.lm()'s output), from a fitted model.

Many classical statistical models have a scale parameter, typically the standard deviation of a zero-mean normal (or Gaussian) random variable, which is denoted as $\sigma$. sigma(.) extracts the estimated parameter from a fitted model, i.e., $\hat\sigma$.


sigma(object, ...)

## Default S3 method:
sigma(object, use.fallback = TRUE, ...)



an R object, typically resulting from a model fitting function such as lm.


logical, passed to nobs.


potentially further arguments passed to and from methods. Passed to deviance(*, ...) for the default method.


The stats package provides the S3 generic, a default method, and a method for objects of class "glm". The default method is correct typically for (asymptotically / approximately) generalized gaussian (“least squares”) problems, since it is defined as

   sigma.default <- function (object, use.fallback = TRUE, ...)
       sqrt( deviance(object, ...) / (NN - PP) )

where NN <- nobs(object, use.fallback = use.fallback) and PP <- sum(!is.na(coef(object))). In older R versions PP was length(coef(object)), which is too large in case of undetermined coefficients, e.g., for rank deficient model fits.
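The definition can be checked by hand. The following is a sketch only, taking PP as the number of non-NA coefficients; the duplicated column Dup is a hypothetical addition used to force a rank-deficient fit.

```r
fit <- lm(Fertility ~ ., data = swiss)

NN <- nobs(fit)                    # number of observations
PP <- sum(!is.na(coef(fit)))       # number of estimated (non-NA) coefficients
by_hand <- sqrt(deviance(fit) / (NN - PP))
all.equal(sigma(fit), by_hand)

## Rank-deficient variant: an exact copy of a predictor gets an NA coefficient.
swiss2 <- transform(swiss, Dup = Education)   # 'Dup' is illustrative only
fit2 <- lm(Fertility ~ ., data = swiss2)
anyNA(coef(fit2))                  # one coefficient is undetermined
all.equal(sigma(fit2), sigma(fit)) # the NA coefficient is not counted in PP
```

Counting length(coef(fit2)) instead would subtract one degree of freedom too many for the second fit.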


Typically a number, the estimated standard deviation of the errors (“residual standard deviation”) for Gaussian models, and—less interpretably—the square root of the residual deviance per degree of freedom in more general models.

Very strictly speaking, $\hat{\sigma}$ (“$\sigma$ hat”) is actually $\sqrt{\widehat{\sigma^2}}$.

For generalized linear models (class "glm"), the sigma.glm method returns the square root of the dispersion parameter (see summary.glm). For families with free dispersion parameter, sigma is estimated from the root mean square of the Pearson residuals. For families with fixed dispersion, sigma is not estimated from the residuals but extracted directly from the family of the fitted model. Consequently, for binomial or Poisson GLMs, sigma is exactly 1.

For multivariate linear models (class "mlm"), a vector of sigmas is returned, each corresponding to one column of $Y$.


The misnomer “Residual standard error” has been part of too many R (and S) outputs to be easily changed there.

See Also

deviance, nobs, vcov, summary.glm.


## -- lm() ------------------------------
lm1 <- lm(Fertility ~ . , data = swiss)
sigma(lm1) # ~= 7.165  = "Residual standard error"  printed from summary(lm1)
stopifnot(all.equal(sigma(lm1), summary(lm1)$sigma, tolerance=1e-15))

## -- nls() -----------------------------
DNase1 <- subset(DNase, Run == 1)
fm.DN1 <- nls(density ~ SSlogis(log(conc), Asym, xmid, scal), DNase1)
sigma(fm.DN1) # ~= 0.01919  as from summary(..)
stopifnot(all.equal(sigma(fm.DN1), summary(fm.DN1)$sigma, tolerance=1e-15))

## -- glm() -----------------------------
## -- a) Binomial -- Example from MASS
ldose <- rep(0:5, 2)
numdead <- c(1, 4, 9, 13, 18, 20, 0, 2, 6, 10, 12, 16)
sex <- factor(rep(c("M", "F"), c(6, 6)))
SF <- cbind(numdead, numalive = 20-numdead)
sigma(budworm.lg <- glm(SF ~ sex*ldose, family = binomial))

## -- b) Poisson -- from ?glm :
## Dobson (1990) Page 93: Randomized Controlled Trial :
counts <- c(18,17,15,20,10,20,25,13,12)
outcome <- gl(3,1,9)
treatment <- gl(3,3)
sigma(glm.D93 <- glm(counts ~ outcome + treatment, family = poisson()))
## equal to
sqrt(summary(glm.D93)$dispersion) # == 1
## and the *Quasi*poisson's dispersion
sigma(glm.qD93 <- update(glm.D93, family = quasipoisson()))
sigma (glm.qD93)^2 # 1.2933 equal to
summary(glm.qD93)$dispersion # == 1.2933

## -- Multivariate lm() "mlm" -----------
utils::example("SSD", echo=FALSE)
sigma(mlmfit) # is the same as {but more efficient than}
sqrt(diag(estVar(mlmfit)))

Distribution of the Wilcoxon Signed Rank Statistic


Density, distribution function, quantile function and random generation for the distribution of the Wilcoxon Signed Rank statistic obtained from a sample with size n.


dsignrank(x, n, log = FALSE)
psignrank(q, n, lower.tail = TRUE, log.p = FALSE)
qsignrank(p, n, lower.tail = TRUE, log.p = FALSE)
rsignrank(nn, n)


x, q

vector of quantiles.


vector of probabilities.


number of observations. If length(nn) > 1, the length is taken to be the number required.


number(s) of observations in the sample(s). A positive integer, or a vector of such integers.

log, log.p

logical; if TRUE, probabilities p are given as log(p).


logical; if TRUE (default), probabilities are $P[X \le x]$, otherwise, $P[X > x]$.


This distribution is obtained as follows. Let x be a sample of size n from a continuous distribution symmetric about the origin. Then the Wilcoxon signed rank statistic is the sum of the ranks of the absolute values x[i] for which x[i] is positive. This statistic takes values between $0$ and $n(n+1)/2$, and its mean and variance are $n(n+1)/4$ and $n(n+1)(2n+1)/24$, respectively.

If either of the first two arguments is a vector, the recycling rule is used to do the calculations for all combinations of the two up to the length of the longer vector.
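The support, mean and variance stated above can be verified numerically from the density. This is a small sketch; the choice n = 10 is arbitrary.

```r
n <- 10
x <- 0:(n * (n + 1) / 2)          # full support of the statistic
p <- dsignrank(x, n)              # probability mass at each support point

total <- sum(p)                   # should be 1
mu    <- sum(x * p)               # mean computed from the density
v     <- sum((x - mu)^2 * p)      # variance computed from the density

all.equal(mu, n * (n + 1) / 4)
all.equal(v,  n * (n + 1) * (2 * n + 1) / 24)
```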


dsignrank gives the density, psignrank gives the distribution function, qsignrank gives the quantile function, and rsignrank generates random deviates.

The length of the result is determined by nn for rsignrank, and is the maximum of the lengths of the numerical arguments for the other functions.

The numerical arguments other than nn are recycled to the length of the result. Only the first elements of the logical arguments are used.


Kurt Hornik; efficiency improvement by Ivo Ugrina.

See Also

wilcox.test to calculate the statistic from data, find p values and so on.

Distributions for standard distributions, including dwilcox for the distribution of two-sample Wilcoxon rank sum statistic.



par(mfrow = c(2,2))
for(n in c(4:5,10,40)) {
  x <- seq(0, n*(n+1)/2, length.out = 501)
  plot(x, dsignrank(x, n = n), type = "l",
       main = paste0("dsignrank(x, n = ", n, ")"))
}

Simulate Responses


Simulate one or more responses from the distribution corresponding to a fitted model object.


simulate(object, nsim = 1, seed = NULL, ...)



an object representing a fitted model.


number of response vectors to simulate. Defaults to 1.


an object specifying if and how the random number generator should be initialized (‘seeded’).
For the "lm" method, either NULL or an integer that will be used in a call to set.seed before simulating the response vectors. If set, the value is saved as the "seed" attribute of the returned value. The default, NULL will not change the random generator state, and return .Random.seed as the "seed" attribute, see ‘Value’.


additional optional arguments.


This is a generic function. Consult the individual modeling functions for details on how to use this function.

Package stats has a method for "lm" objects which is used for lm and glm fits. There is a method for fits from glm.nb in package MASS, and hence the case of negative binomial families is not covered by the "lm" method.

The methods for linear models fitted by lm or glm(family = "gaussian") assume that any weights which have been supplied are inversely proportional to the error variance. For other GLMs the (optional) simulate component of the family object is used—there is no appropriate simulation method for ‘quasi’ models as they are specified only up to two moments.

For binomial and Poisson GLMs the dispersion is fixed at one. Integer prior weights $w_i$ can be interpreted as meaning that observation $i$ is an average of $w_i$ observations, which is natural for binomials specified as proportions but less so for a Poisson, for which prior weights are ignored with a warning.

For a gamma GLM the shape parameter is estimated by maximum likelihood (using function gamma.shape in package MASS). The interpretation of weights is as multipliers to a basic shape parameter, since dispersion is inversely proportional to shape.

For an inverse gaussian GLM the model assumed is $IG(\mu_i, \lambda w_i)$, where $\lambda$ is estimated by the inverse of the dispersion estimate for the fit. The variance is $\mu_i^3/(\lambda w_i)$ and hence inversely proportional to the prior weights. The simulation is done by function rinvGauss from the SuppDists package, which must be installed.


Typically, a list of length nsim of simulated responses. Where appropriate the result can be a data frame (which is a special type of list).

For the "lm" method, the result is a data frame with an attribute "seed". If argument seed is NULL, the attribute is the value of .Random.seed before the simulation was started; otherwise it is the value of the argument with a "kind" attribute with value as.list(RNGkind()).

See Also

RNG about random number generation in R, fitted.values and residuals for related methods; glm, lm for model fitting.

There are further examples in the ‘simulate.R’ tests file in the sources for package stats.


x <- 1:5
mod1 <- lm(c(1:3, 7, 6) ~ x)
S1 <- simulate(mod1, nsim = 4)
## repeat the simulation:
.Random.seed <- attr(S1, "seed")
identical(S1, simulate(mod1, nsim = 4))

S2 <- simulate(mod1, nsim = 200, seed = 101)
rowMeans(S2) # should be about the same as
fitted(mod1)

## repeat identically:
(sseed <- attr(S2, "seed")) # seed; RNGkind as attribute
stopifnot(identical(S2, simulate(mod1, nsim = 200, seed = sseed)))

## To be sure about the proper RNGkind, e.g., after the RNG kind has been changed,
## first set the RNG kind, then simulate
do.call(RNGkind, attr(sseed, "kind"))
identical(S2, simulate(mod1, nsim = 200, seed = sseed))

## Binomial GLM examples
yb1 <- matrix(c(4, 4, 5, 7, 8, 6, 6, 5, 3, 2), ncol = 2)
modb1 <- glm(yb1 ~ x, family = binomial)
S3 <- simulate(modb1, nsim = 4)
# each column of S3 is a two-column matrix.

x2 <- sort(runif(100))
yb2 <- rbinom(100, prob = plogis(2*(x2-1)), size = 1)
yb2 <- factor(1 + yb2, labels = c("failure", "success"))
modb2 <- glm(yb2 ~ x2, family = binomial)
S4 <- simulate(modb2, nsim = 4)
# each column of S4 is a factor

Distribution of the Smirnov Statistic


Distribution function, quantile function and random generation for the distribution of the Smirnov statistic.


psmirnov(q, sizes, z = NULL,
         alternative = c("two.sided", "less", "greater"),
         exact = TRUE, simulate = FALSE, B = 2000,
         lower.tail = TRUE, log.p = FALSE)
qsmirnov(p, sizes, z = NULL,
         alternative = c("two.sided", "less", "greater"),
         exact = TRUE, simulate = FALSE, B = 2000)
rsmirnov(n, sizes, z = NULL,
         alternative = c("two.sided", "less", "greater"))



a numeric vector of quantiles.


a numeric vector of probabilities.


an integer vector of length two giving the sample sizes.


a numeric vector of the pooled data values in both samples when the exact conditional distribution of the Smirnov statistic given the data shall be computed.


one of "two.sided" (default), "less", or "greater" indicating whether absolute (two-sided, default) or raw (one-sided) differences of frequencies define the test statistic. See ‘Details’.


NULL or a logical indicating whether the exact (conditional on the pooled data values in z) distribution or the asymptotic distribution should be used.


a logical indicating whether to compute the distribution function by Monte Carlo simulation.


an integer specifying the number of replicates used in the Monte Carlo test.


a logical, if TRUE (default), probabilities are $P[D < q]$, otherwise, $P[D \ge q]$.


a logical, if TRUE, probabilities are given as log-probabilities; the default is FALSE.


an integer giving number of observations.


For samples $x$ and $y$ with respective sizes $n_x$ and $n_y$ and empirical cumulative distribution functions $F_{x,n_x}$ and $F_{y,n_y}$, the Smirnov statistic is

$$D = \sup_c | F_{x,n_x}(c) - F_{y,n_y}(c) |$$

in the two-sided case,

$$D^+ = \sup_c ( F_{x,n_x}(c) - F_{y,n_y}(c) )$$

in the one-sided "greater" case, and

$$D^- = \sup_c ( F_{y,n_y}(c) - F_{x,n_x}(c) )$$

in the one-sided "less" case.

These statistics are used in the Smirnov test of the null that $x$ and $y$ were drawn from the same distribution, see ks.test.

If the underlying common distribution function $F$ is continuous, the distribution of the test statistics does not depend on $F$, and has a simple asymptotic approximation. For arbitrary $F$, one can compute the conditional distribution given the pooled data values $z$ of $x$ and $y$, either exactly (feasible provided that the product $n_x n_y$ of the sample sizes is “small enough”) or approximately by Monte Carlo simulation. If the pooled data values $z$ are not specified, a pooled sample without ties is assumed.
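The three statistics can be computed directly from the empirical distribution functions. The sketch below uses simulated data (the sample sizes, seed and shift are arbitrary assumptions) and relies on the fact that the supremum of a difference of step functions is attained at one of the pooled data points.

```r
set.seed(1)
x <- rnorm(30)
y <- rnorm(40, mean = 0.5)

z  <- sort(c(x, y))               # pooled data values
Fx <- ecdf(x)                     # empirical CDF of x
Fy <- ecdf(y)                     # empirical CDF of y

D      <- max(abs(Fx(z) - Fy(z))) # two-sided statistic
D_plus <- max(Fx(z) - Fy(z))      # one-sided "greater"
D_min  <- max(Fy(z) - Fx(z))      # one-sided "less"

## The two-sided statistic is the larger of the one-sided ones,
## and agrees with the statistic reported by ks.test().
all.equal(D, max(D_plus, D_min))
all.equal(D, unname(ks.test(x, y)$statistic))
```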


psmirnov gives the distribution function, qsmirnov gives the quantile function, and rsmirnov generates random deviates.

See Also

ks.test for references on the algorithms used for computing exact distributions.

Tukey's (Running Median) Smoothing


Tukey's smoothers, 3RS3R, 3RSS, 3R, etc.


smooth(x, kind = c("3RS3R", "3RSS", "3RSR", "3R", "3", "S"),
       twiceit = FALSE, endrule = c("Tukey", "copy"), do.ends = FALSE)



a vector or time series


a character string indicating the kind of smoother required; defaults to "3RS3R".


logical, indicating if the result should be ‘twiced’. Twicing a smoother $S(y)$ means $S(y) + S(y - S(y))$, i.e., adding smoothed residuals to the smoothed values. This decreases bias (increasing variance).


a character string indicating the rule for smoothing at the boundary. Either "Tukey" (default) or "copy".


logical, indicating if the 3-splitting of ties should also happen at the boundaries (ends). This is only used for kind = "S".


3 is Tukey's short notation for running medians of length 3,
3R stands for Repeated 3 until convergence, and
S for Splitting of horizontal stretches of length 2 or 3.

Hence, 3RS3R is a concatenation of 3R, S and 3R, 3RSS similarly, whereas 3RSR means first 3R and then (S and 3) Repeated until convergence – which can be bad.
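To make the notation concrete, the sketch below reimplements "3" and "3R" by hand and compares them with smooth(). The helper med3 is hypothetical, and endrule = "copy" is used so that the end points are simply kept and only the interior rule is compared.

```r
# Hypothetical helper: one pass of running medians of 3, end points copied.
med3 <- function(y) {
  out <- y
  for (i in seq_len(length(y) - 2) + 1L)   # interior positions 2 .. n-1
    out[i] <- median(y[(i - 1):(i + 1)])
  out
}

x1 <- c(4, 1, 3, 6, 6, 4, 1, 6, 2, 4, 2)

s3 <- med3(x1)                    # "3": a single pass

s3R <- x1                         # "3R": repeat "3" until nothing changes
repeat {
  nxt <- med3(s3R)
  if (identical(nxt, s3R)) break
  s3R <- nxt
}

all.equal(as.numeric(smooth(x1, "3",  endrule = "copy")), s3)
all.equal(as.numeric(smooth(x1, "3R", endrule = "copy")), s3R)
```

With the default endrule = "Tukey" the first and last values would additionally be adjusted by Tukey's end-point rule, so only the interior values would agree.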


An object of class "tukeysmooth" (which has print and summary methods): a vector or time series containing the smoothed values, with additional attributes.


Note that there are other smoothing methods which provide rather better results. These were designed for hand calculations and may be used mainly for didactical purposes.

Since R version 1.2, smooth does really implement Tukey's end-point rule correctly (see argument endrule).

kind = "3RSR" had been the default till R-1.1, but it can have very bad properties, see the examples.

Note that repeated application of smooth(*) does smooth more, for the "3RS*" kinds.


Tukey, J. W. (1977). Exploratory Data Analysis, Reading Massachusetts: Addison-Wesley.

See Also

runmed for running medians; lowess and loess; supsmu and smooth.spline.



## see also   demo(smooth) !

x1 <- c(4, 1, 3, 6, 6, 4, 1, 6, 2, 4, 2) # very artificial
(x3R <- smooth(x1, "3R")) # 2 iterations of "3"
smooth(x3R, kind = "S")

sm.3RS <- function(x, ...)
   smooth(smooth(x, "3R", ...), "S", ...)

y <- c(1, 1, 19:1)
plot(y, main = "misbehaviour of \"3RSR\"", col.main = 3)
lines(smooth(y, "3RSR"), col = 3, lwd = 2)  # the horror

x <- c(8:10, 10, 0, 0, 9, 9)
plot(x, main = "breakdown of  3R  and  S  and hence  3RSS")
matlines(cbind(smooth(x, "3R"), smooth(x, "S"), smooth(x, "3RSS"), smooth(x)))

presidents[is.na(presidents)] <- 0 # silly
summary(sm3 <- smooth(presidents, "3R"))
summary(sm2 <- smooth(presidents,"3RSS"))
summary(sm  <- smooth(presidents))

all.equal(c(sm2), c(smooth(smooth(sm3, "S"), "S")))  # 3RSS  === 3R S S
all.equal(c(sm),  c(smooth(smooth(sm3, "S"), "3R"))) # 3RS3R === 3R S 3R

plot(presidents, main = "smooth(presidents0, *) :  3R and default 3RS3R")
lines(sm3, col = 3, lwd = 1.5)
lines(sm, col = 2, lwd = 1.25)

Fit a Smoothing Spline


Fits a cubic smoothing spline to the supplied data.


smooth.spline(x, y = NULL, w = NULL, df, spar = NULL, lambda = NULL, cv = FALSE,
              all.knots = FALSE, nknots = .nknots.smspl,
              keep.data = TRUE, df.offset = 0, penalty = 1,
              control.spar = list(), tol = 1e-6 * IQR(x), keep.stuff = FALSE)




a vector giving the values of the predictor variable, or a list or a two-column matrix specifying x and y.


responses. If y is missing or NULL, the responses are assumed to be specified by x, with x the index vector.


optional vector of weights of the same length as x; defaults to all 1.


the desired equivalent number of degrees of freedom (trace of the smoother matrix). Must be in $(1, n_x]$, $n_x$ the number of unique x values, see below.


smoothing parameter, typically (but not necessarily) in $(0, 1]$. When spar is specified, the coefficient $\lambda$ of the integral of the squared second derivative in the fit (penalized log likelihood) criterion is a monotone function of spar, see the details below. Alternatively lambda may be specified instead of the scale free spar $= s$.


if desired, the internal (design-dependent) smoothing parameter $\lambda$ can be specified instead of spar. This may be desirable for resampling algorithms such as cross validation or the bootstrap.


ordinary leave-one-out (TRUE) or ‘generalized’ cross-validation (GCV) when FALSE; is used for smoothing parameter computation only when both spar and df are not specified; it is used however to determine cv.crit in the result. Setting it to NA for speedup skips the evaluation of leverages and any score.


if TRUE, all distinct points in x are used as knots. If FALSE (default), a subset of x[] is used, specifically x[j] where the nknots indices are evenly spaced in 1:n, see also the next argument nknots.

Alternatively, a strictly increasing numeric vector specifying “all the knots” to be used; must be rescaled to $[0, 1]$ already such that it corresponds to the ans$fit$knot sequence returned, not repeating the boundary knots.


integer or function giving the number of knots to use when all.knots = FALSE. If a function (as by default), the number of knots is nknots(nx). By default using .nknots.smspl(), for $n_x > 49$ this is less than $n_x$, the number of unique x values, see the Note.

logical specifying if the input data should be kept in the result. If TRUE (as per default), fitted values and residuals are available from the result.


allows the degrees of freedom to be increased by df.offset in the GCV criterion.


the coefficient of the penalty for degrees of freedom in the GCV criterion.


optional list with named components controlling the root finding when the smoothing parameter spar is computed, i.e., missing or NULL, see below.

Note that this is partly experimental and may change with general spar computation improvements!


lower bound for spar; defaults to -1.5 (used to implicitly default to 0 in R versions earlier than 1.4).


upper bound for spar; defaults to +1.5.


the absolute precision (tolerance) used; defaults to 1e-4 (formerly 1e-3).


the relative precision used; defaults to 2e-8 (formerly 0.00244).


logical indicating if iterations should be traced.


integer giving the maximal number of iterations; defaults to 500.

Note that spar is only searched for in the interval $[low, high]$.


a tolerance for sameness or uniqueness of the x values. The values are binned into bins of size tol and values which fall into the same bin are regarded as the same. Must be strictly positive (and finite).


an experimental logical indicating if the result should keep extras from the internal computations. This should allow the $X$ matrix, and more, to be reconstructed.


for .nknots.smspl; typically the number of unique x values (aka $n_x$).


Neither x nor y is allowed to contain missing or infinite values.

The x vector should contain at least four distinct values. ‘Distinct’ here is controlled by tol: values which are regarded as the same are replaced by the first of their values and the corresponding y and w are pooled accordingly.

Unless lambda has been specified instead of spar, the computational $\lambda$ used (as a function of $s = spar$) is $\lambda = r \cdot 256^{3 s - 1}$ where $r = tr(X' W X) / tr(\Sigma)$, $\Sigma$ is the matrix given by $\Sigma_{ij} = \int B_i''(t) B_j''(t) dt$, $X$ is given by $X_{ij} = B_j(x_i)$, $W$ is the diagonal matrix of weights (scaled such that its trace is $n$, the original number of observations) and $B_k(.)$ is the $k$-th B-spline.

Note that with these definitions, $f_i = f(x_i)$, and the B-spline basis representation $f = X c$ (i.e., $c$ is the vector of spline coefficients), the penalized log likelihood is $L = (y - f)' W (y - f) + \lambda c' \Sigma c$, and hence $c$ is the solution of the (ridge regression) $(X' W X + \lambda \Sigma) c = X' W y$.
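The step from the criterion to the linear system can be spelled out. This is a short derivation sketch using only the symbols defined above; it assumes that $W$ and $\Sigma$ are symmetric, which holds for a diagonal weight matrix and for $\Sigma_{ij} = \int B_i''(t) B_j''(t) dt$.

```latex
\begin{aligned}
L(c) &= (y - Xc)' W (y - Xc) + \lambda\, c' \Sigma c, \\
\frac{\partial L}{\partial c} &= -2 X' W (y - Xc) + 2 \lambda \Sigma c = 0 \\
\Longrightarrow\quad (X' W X + \lambda \Sigma)\, c &= X' W y .
\end{aligned}
```

Setting the gradient to zero and collecting the terms in $c$ gives the ridge-type normal equations stated above.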

If spar and lambda are missing or NULL, the value of df is used to determine the degree of smoothing. If df is missing as well, leave-one-out cross-validation (ordinary or ‘generalized’ as determined by cv) is used to determine $\lambda$.

Note that from the above relation, spar is $s = s0 + 0.0601 \cdot \log\lambda$.
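The relation between spar and $\lambda$ can be checked on a fitted object. The sketch below assumes the lambda, ratio and spar components described under ‘Value’ and uses an arbitrary spar = 0.5 on the cars data.

```r
fit <- with(cars, smooth.spline(speed, dist, spar = 0.5))

## lambda = r * 256^(3 s - 1), with r the ratio of the two traces
lam <- fit$ratio * 256^(3 * fit$spar - 1)
all.equal(fit$lambda, lam)

## Solving for s gives s = 1/3 + log(lambda / r) / (3 log 256),
## which is where the factor 0.0601 comes from.
slope  <- 1 / (3 * log(256))
s_back <- 1/3 + slope * log(fit$lambda / fit$ratio)
round(slope, 4)
all.equal(s_back, fit$spar)
```

Because r depends on the design, the offset s0 differs between data sets, while the slope in front of the logarithm of $\lambda$ does not.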

Note however that currently the results may become very unreliable for spar values smaller than about -1 or -2. The same may happen for values larger than 2 or so. Don't think of setting spar or the controls low and high outside such a safe range, unless you know what you are doing! Similarly, specifying lambda instead of spar is delicate, notably as the range of “safe” values for lambda is not scale-invariant and hence entirely data dependent.

The ‘generalized’ cross-validation method GCV will work correctly when there are duplicated points in x. However, it is ambiguous what leave-one-out cross-validation means with duplicated points, and the internal code uses an approximation that involves leaving out groups of duplicated points. cv = TRUE is best avoided in that case.


An object of class "smooth.spline" with components


the distinct x values in increasing order, see the ‘Details’ above.


the fitted values corresponding to x.


the weights used at the unique values of x.


the y values used at the unique x values.


the tol argument (whose default depends on x).


only if keep.data = TRUE: itself a list with components x, y and w of the same length. These are the original $(x_i, y_i, w_i), i = 1, \dots, n$, values where data$x may have repeated values and hence be longer than the above x component; see details.


an integer; the (original) sample size.


(when cv was not NA) leverages, the diagonal values of the smoother matrix.


the cv argument used; i.e., FALSE, TRUE, or NA.


cross-validation score, ‘generalized’ or true, depending on cv. The CV score is often called “PRESS” (and labeled on print()), for ‘PREdiction Sum of Squares’. Note that this is not the same as the (CV or GCV) score which is minimized during fitting (and returned in crit), e.g., in the case of nx < n (where nx $= n_x$ is the number of unique x values).


the penalized criterion, a non-negative number; simply the (weighted) residual sum of squares (RSS), sum(.$w * residuals(.)^2) .


the criterion value minimized in the underlying .Fortran routine ‘sslvrg’. When df has been specified, the criterion is $3 + (tr(S_\lambda) - df)^2$, where the $3 +$ is there for numerical (and historical) reasons.


equivalent degrees of freedom used. Note that (currently) this value may become quite imprecise when the true df is between 1 and 2.


the value of spar computed or given, unless it has been given as c(lambda = *), when it is set to NA here.


(when spar above is not NA), the value $r$, the ratio of two matrix traces.


the value of $\lambda$ corresponding to spar, see the details above.


named integer(3) vector where ..$ipars["iter"] gives number of spar computing iterations used.


experimental; when keep.stuff was true, a “flat” numeric vector containing parts of the internal computations.


list for use by predict.smooth.spline, with components


the knot sequence (including the repeated boundary knots), scaled into $[0, 1]$ (via min and range).


number of coefficients or number of ‘proper’ knots plus 2.


coefficients for the spline basis used.

min, range:

numbers giving the corresponding quantities of x.


the matched call.

methods(class = "smooth.spline") shows a hatvalues() method based on the lev vector above.


The number of unique x values, nx $= n_x$, is determined by the tol argument, equivalently to

    nx <- length(x) - sum(duplicated( round((x - mean(x)) / tol) ))

The default all.knots = FALSE and nknots = .nknots.smspl entails using only $O({n_x}^{0.2})$ knots instead of $n_x$ for $n_x > 49$. This cuts computing time and memory requirements, but no longer drastically since R version 1.5.1, where the cost is only $O(n_k) + O(n)$, with $n_k$ the number of knots.

In this case where not all unique x values are used as knots, the result is a regression spline rather than a smoothing spline in the strict sense, but very close unless a small smoothing parameter (or large df) is used.


R implementation by B. D. Ripley and Martin Maechler (spar/lambda, etc).


This function is based on code in the GAMFIT Fortran program by T. Hastie and R. Tibshirani, which makes use of spline code by Finbarr O'Sullivan. Its design parallels the smooth.spline function of Chambers & Hastie (1992).


Chambers, J. M. and Hastie, T. J. (1992) Statistical Models in S, Wadsworth & Brooks/Cole.

Green, P. J. and Silverman, B. W. (1994) Nonparametric Regression and Generalized Linear Models: A Roughness Penalty Approach. Chapman and Hall.

Hastie, T. J. and Tibshirani, R. J. (1990) Generalized Additive Models. Chapman and Hall.

See Also

predict.smooth.spline for evaluating the spline and its derivatives.


plot(dist ~ speed, data = cars, main = "data(cars)  &  smoothing splines")
cars.spl <- with(cars, smooth.spline(speed, dist))
## This example has duplicate points, so avoid cv = TRUE

lines(cars.spl, col = "blue")
ss10 <- smooth.spline(cars[,"speed"], cars[,"dist"], df = 10)
lines(ss10, lty = 2, col = "red")
legend(5,120,c(paste("default [C.V.] => df =",round(cars.spl$df,1)),
               "s( * , df = 10)"), col = c("blue","red"), lty = 1:2,
       bg = 'bisque')

## Residual (Tukey Anscombe) plot:
plot(residuals(cars.spl) ~ fitted(cars.spl))
abline(h = 0, col = "gray")

## consistency check:
stopifnot(all.equal(cars$dist,
                    fitted(cars.spl) + residuals(cars.spl)))
## The chosen inner knots in original x-scale :
with(cars.spl$fit, min + range * knot[-c(1:3, nk+1 +1:3)]) # == unique(cars$speed)

## Visualize the behavior of  .nknots.smspl()
nKnots <- Vectorize(.nknots.smspl) ; c.. <- adjustcolor("gray20",.5)
curve(nKnots, 1, 250, n=250)
abline(0,1, lty=2, col=c..); text(90,90,"y = x", col=c.., adj=-.25)
abline(h=100,lty=2); abline(v=200, lty=2)

n <- c(1:799, seq(800, 3490, by=10), seq(3500, 10000, by = 50))
plot(n, nKnots(n), type="l", main = "Vectorize(.nknots.smspl) (n)")
abline(0,1, lty=2, col=c..); text(180,180,"y = x", col=c..)
n0 <- c(50, 200, 800, 3200); c0 <- adjustcolor("blue3", .5)
lines(n0, nKnots(n0), type="h", col=c0)
axis(1, at=n0, line=-2, col.ticks=c0, col=NA, col.axis=c0)
axis(4, at=.nknots.smspl(10000), line=-.5, col=c..,col.axis=c.., las=1)

##-- artificial example
y18 <- c(1:3, 5, 4, 7:3, 2*(2:5), rep(10, 4))
xx  <- seq(1, length(y18), length.out = 201)
(s2   <- smooth.spline(y18)) # GCV
(s02  <- smooth.spline(y18, spar = 0.2))
(s02. <- smooth.spline(y18, spar = 0.2, cv = NA))
plot(y18, main = deparse(s2$call), col.main = 2)
lines(s2, col = "gray"); lines(predict(s2, xx), col = 2)
lines(predict(s02, xx), col = 3); mtext(deparse(s02$call), col = 3)

## Specifying 'lambda' instead of usual spar :
(s2. <- smooth.spline(y18, lambda = s2$lambda, tol = s2$tol))

## The following shows the problematic behavior of 'spar' searching:
(s2  <- smooth.spline(y18, control =
                      list(trace = TRUE, tol = 1e-6, low = -1.5)))
(s2m <- smooth.spline(y18, cv = TRUE, control =
                      list(trace = TRUE, tol = 1e-6, low = -1.5)))
## both above do quite similarly (Df = 8.5 +- 0.2)

End Points Smoothing (for Running Medians)


Smooth end points of a vector y using successively smaller medians (of odd span) and Tukey's end point rule at the very end.


smoothEnds(y, k = 3)



dependent variable to be smoothed (vector).


width of largest median window; must be odd.


smoothEnds is used to only do the ‘end point smoothing’, i.e., change at most the observations closer to the beginning/end than half the window k. The first and last value are computed using Tukey's end point rule, i.e., sm[1] = median(y[1], sm[2], 3*sm[2] - 2*sm[3], na.rm=TRUE).

In R versions 3.6.0 and earlier, missing values (NA) in y typically lead to an error, whereas now the equivalent of median(*, na.rm=TRUE) is used.


vector of smoothed values, the same length as y.


Martin Maechler


John W. Tukey (1977) Exploratory Data Analysis, Addison.

Velleman, P.F., and Hoaglin, D.C. (1981) ABC of EDA (Applications, Basics, and Computing of Exploratory Data Analysis); Duxbury.

See Also

runmed(*, endrule = "median") which calls smoothEnds().



y <- ys <- (-20:20)^2
y [c(1,10,21,41)] <-  c(100, 30, 400, 470)
s7k <- runmed(y, 7, endrule = "keep")
s7. <- runmed(y, 7, endrule = "const")
s7m <- runmed(y, 7)
col3 <- c("midnightblue","blue","steelblue")
plot(y, main = "Running Medians -- runmed(*, k=7, endrule = X)")
lines(ys, col = "light gray")
matlines(cbind(s7k, s7.,s7m), lwd = 1.5, lty = 1, col = col3)
eRules <- c("keep","constant","median")
legend("topleft", paste("endrule", eRules, sep = " = "),
       col = col3, lwd = 1.5, lty = 1, bty = "n")

stopifnot(identical(s7m, smoothEnds(s7k, 7)))

## With missing values (for R >= 3.6.1):
yN <- y; yN[c(2,40)] <- NA
rN <- sapply(eRules, function(R) runmed(yN, 7, endrule=R))
matlines(rN, type = "b", pch = 4, lwd = 3, lty=2,
         col = adjustcolor(c("red", "orange4", "orange1"), 0.5))
yN[c(1, 20:21)] <- NA # additionally
rN. <- sapply(eRules, function(R) runmed(yN, 7, endrule=R))
head(rN., 4); tail(rN.) # more NA's too, still not *so* many:
stopifnot(exprs = {
   identical(which(is.na(rN[,"keep"])), c(2L, 40L))
   identical(which(is.na(rN.), arr.ind=TRUE, useNames=FALSE),
             cbind(c(1:2,40L), 1L))
   identical(rN.[38:41, "median"], c(289,289, 397, 470))
})

Create a sortedXyData Object


This is a constructor function for the class of sortedXyData objects. These objects are mostly used in the initial function for a self-starting nonlinear regression model, which will be of the selfStart class.


sortedXyData(x, y, data)



a numeric vector or an expression that will evaluate in data to a numeric vector


a numeric vector or an expression that will evaluate in data to a numeric vector


an optional data frame in which to evaluate expressions for x and y, if they are given as expressions


A sortedXyData object. This is a data frame with exactly two numeric columns, named x and y. The rows are sorted so the x column is in increasing order. Duplicate x values are eliminated by averaging the corresponding y values.


José Pinheiro and Douglas Bates

See Also

selfStart, NLSstClosestX, NLSstLfAsymptote, NLSstRtAsymptote


DNase.2 <- DNase[ DNase$Run == "2", ]
sortedXyData( expression(log(conc)), expression(density), DNase.2 )

Estimate Spectral Density of a Time Series from AR Fit


Fits an AR model to x (or uses the existing fit) and computes (and by default plots) the spectral density of the fitted model.

spec.ar(x, n.freq, order = NULL, plot = TRUE, na.action = na.fail,
        method = "yule-walker", ...)



A univariate (not yet: or multivariate) time series or the result of a fit by ar.


The number of points at which to plot.


The order of the AR model to be fitted. If omitted, the order is chosen by AIC.


Plot the periodogram?


NA action function.


method for ar fit.


Graphical arguments passed to plot.spec.


An object of class "spec". The result is returned invisibly if plot is true.
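To inspect the returned object rather than the plot, the fit can be run with plot = FALSE. The following is a minimal sketch using the built-in lh series; the object name sp and the choice n.freq = 100 are arbitrary.

```r
## Fit an AR model to lh (order chosen by AIC) and compute its spectrum, no plot
sp <- spec.ar(lh, n.freq = 100, plot = FALSE)
class(sp)          # "spec"
sp$method          # a label that records the AR order used
length(sp$freq)    # one frequency per requested point
```

The result can be drawn later with plot(sp).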


Some authors, for example Thomson (1990), warn strongly that AR spectra can be misleading.


The multivariate case is not yet implemented.


Thomson, D.J. (1990). Time series analysis of Holocene climate data. Philosophical Transactions of the Royal Society of London Series A, 330, 601–616. doi:10.1098/rsta.1990.0041.

Venables, W.N. and Ripley, B.D. (2002) Modern Applied Statistics with S. Fourth edition. Springer. (Especially page 402.)

See Also

ar, spectrum.


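require(graphics)

spec.ar(lh)

spec.ar(ldeaths)
spec.ar(ldeaths, method = "burg")

spec.ar(log(lynx))
spec.ar(log(lynx), method = "burg", add = TRUE, col = "purple")
spec.ar(log(lynx), method = "mle", add = TRUE, col = "forest green")
spec.ar(log(lynx), method = "ols", add = TRUE, col = "blue")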

Estimate Spectral Density of a Time Series by a Smoothed Periodogram


spec.pgram calculates the periodogram using a fast Fourier transform, and optionally smooths the result with a series of modified Daniell smoothers (moving averages giving half weight to the end values).


spec.pgram(x, spans = NULL, kernel, taper = 0.1,
           pad = 0, fast = TRUE, demean = FALSE, detrend = TRUE,
           plot = TRUE, na.action = na.fail, ...)



univariate or multivariate time series.


vector of odd integers giving the widths of modified Daniell smoothers to be used to smooth the periodogram.


alternatively, a kernel smoother of class "tskernel".


specifies the proportion of data to taper. A split cosine bell taper is applied to this proportion of the data at the beginning and end of the series.


proportion of data to pad. Zeros are added to the end of the series to increase its length by the proportion pad.


logical; if TRUE, pad the series to a highly composite length.


logical. If TRUE, subtract the mean of the series.


logical. If TRUE, remove a linear trend from the series. This will also remove the mean.


plot the periodogram?


NA action function.


graphical arguments passed to plot.spec.


The raw periodogram is not a consistent estimator of the spectral density, but adjacent values are asymptotically independent. Hence a consistent estimator can be derived by smoothing the raw periodogram, assuming that the spectral density is smooth.

The series will be automatically padded with zeros until the series length is a highly composite number in order to help the Fast Fourier Transform. This is controlled by the fast and not the pad argument.

The periodogram at zero is in theory zero as the mean of the series is removed (but this may be affected by tapering): it is replaced by an interpolation of adjacent values during smoothing, and no value is returned for that frequency.


A list object of class "spec" (see spectrum) with the following additional components:


The kernel argument, or the kernel constructed from spans.


The distribution of the spectral density estimate can be approximated by a (scaled) chi square distribution with df degrees of freedom.


The equivalent bandwidth of the kernel smoother as defined by Bloomfield (1976, page 201).


The value of the taper argument.


The value of the pad argument.


The value of the detrend argument.


The value of the demean argument.

The result is returned invisibly if plot is true.
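The extra components can be compared between a raw and a smoothed periodogram. This is a small sketch on the built-in ldeaths series; the names raw and smo are chosen here for illustration only.

```r
raw <- spec.pgram(ldeaths, taper = 0.1, plot = FALSE)                  # no smoothing
smo <- spec.pgram(ldeaths, spans = c(3, 3), taper = 0.1, plot = FALSE) # two Daniell passes
c(raw = raw$df, smoothed = smo$df)                 # smoothing increases the degrees of freedom
c(raw = raw$bandwidth, smoothed = smo$bandwidth)   # ... and widens the bandwidth
range(smo$freq)    # frequency zero is not returned; the top is frequency(ldeaths)/2
```

More degrees of freedom mean a narrower confidence interval at the cost of frequency resolution.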


Originally Martyn Plummer; kernel smoothing by Adrian Trapletti, synthesis by B.D. Ripley


Bloomfield, P. (1976) Fourier Analysis of Time Series: An Introduction. Wiley.

Brockwell, P.J. and Davis, R.A. (1991) Time Series: Theory and Methods. Second edition. Springer.

Venables, W.N. and Ripley, B.D. (2002) Modern Applied Statistics with S. Fourth edition. Springer. (Especially pp. 392–7.)

See Also

spectrum, spec.taper, plot.spec, fft



## Examples from Venables & Ripley
spectrum(ldeaths, spans = c(3,5))
spectrum(ldeaths, spans = c(5,7))
spectrum(mdeaths, spans = c(3,3))
spectrum(fdeaths, spans = c(3,3))

## bivariate example
mfdeaths.spc <- spec.pgram(ts.union(mdeaths, fdeaths), spans = c(3,3))
# plots marginal spectra: now plot coherency and phase
plot(mfdeaths.spc, plot.type = "coherency")
plot(mfdeaths.spc, plot.type = "phase")

## now impose a lack of alignment
mfdeaths.spc <- spec.pgram(ts.intersect(mdeaths, lag(fdeaths, 4)),
   spans = c(3,3), plot = FALSE)
plot(mfdeaths.spc, plot.type = "coherency")
plot(mfdeaths.spc, plot.type = "phase")

stocks.spc <- spectrum(EuStockMarkets, kernel("daniell", c(30,50)),
                       plot = FALSE)
plot(stocks.spc, plot.type = "marginal") # the default type
plot(stocks.spc, plot.type = "coherency")
plot(stocks.spc, plot.type = "phase")

sales.spc <- spectrum(ts.union(BJsales, BJsales.lead),
                      kernel("modified.daniell", c(5,7)))
plot(sales.spc, plot.type = "coherency")
plot(sales.spc, plot.type = "phase")

Taper a Time Series by a Cosine Bell


Apply a cosine-bell taper to a time series.


spec.taper(x, p = 0.1)



A univariate or multivariate time series


The proportion to be tapered at each end of the series, either a scalar (giving the proportion for all series) or a vector of the length of the number of series (giving the proportion for each series).


The cosine-bell taper is applied to the first and last proportion p[i] of the observations of time series x[, i].
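Tapering a constant series makes the taper weights themselves visible. The sketch below uses an invented series x1 of 100 ones, so with p = 0.1 ten observations at each end are down-weighted.

```r
x1 <- ts(rep(1, 100))           # constant series: the output equals the taper weights
xt <- spec.taper(x1, p = 0.1)   # taper 10% of the data at each end
head(xt, 10)                    # rising part of the cosine bell
all(xt[11:90] == 1)             # the middle 80% is left untouched
```

The weights at the end of the series mirror those at the start.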


A new time series object.

See Also

spec.pgram, cpgram

Spectral Density Estimation


The spectrum function estimates the spectral density of a time series.


spectrum(x, ..., method = c("pgram", "ar"))



A univariate or multivariate time series.


String specifying the method used to estimate the spectral density. Allowed methods are "pgram" (the default) and "ar". Can be abbreviated.


Further arguments to specific spec methods or plot.spec.


spectrum is a wrapper function which calls the methods spec.pgram and spec.ar.

The spectrum here is defined (for historical compatibility) with scaling 1/frequency(x). This makes the spectral density a density over the range (-frequency(x)/2, +frequency(x)/2], whereas a more common scaling is 2π and range (-0.5, 0.5] (e.g., Bloomfield) or 1 and range (-π, π].

If available, a confidence interval will be plotted by plot.spec: this is asymmetric, and the width of the centre mark indicates the equivalent bandwidth.
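The effect of the 1/frequency(x) scaling can be checked by estimating the same data twice, once as a monthly series and once with the time attributes removed. This is a sketch only; sp12 and sp1 are names invented here.

```r
sp12 <- spectrum(ldeaths, plot = FALSE)                  # frequency(ldeaths) is 12
sp1  <- spectrum(ts(as.numeric(ldeaths)), plot = FALSE)  # same values, frequency 1
range(sp12$freq)    # in cycles per year, up to frequency(ldeaths)/2
range(sp1$freq)     # in cycles per observation, up to 0.5
all.equal(sp12$freq, 12 * sp1$freq)
all.equal(as.numeric(sp1$spec), 12 * as.numeric(sp12$spec))
```

The frequency axis stretches by the factor 12 and the density shrinks by the same factor, so the area under the curve is unchanged.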


An object of class "spec", which is a list containing at least the following components:


vector of frequencies at which the spectral density is estimated. (Possibly approximate Fourier frequencies.) The units are the reciprocal of cycles per unit time (and not per observation spacing): see ‘Details’ below.


Vector (for univariate series) or matrix (for multivariate series) of estimates of the spectral density at frequencies corresponding to freq.


NULL for univariate series. For multivariate time series, a matrix containing the squared coherency between different series. Column i + (j - 1) * (j - 2)/2 of coh contains the squared coherency between columns i and j of x, where i < j.


NULL for univariate series. For multivariate time series a matrix containing the cross-spectrum phase between different series. The format is the same as coh.


The name of the time series.


For multivariate input, the names of the component series.


The method used to calculate the spectrum.

The result is returned invisibly if plot is true.


The default plot for objects of class "spec" is quite complex, including an error bar and default title, subtitle and axis labels. The defaults can all be overridden by supplying the appropriate graphical parameters.


Martyn Plummer, B.D. Ripley


Bloomfield, P. (1976) Fourier Analysis of Time Series: An Introduction. Wiley.

Brockwell, P. J. and Davis, R. A. (1991) Time Series: Theory and Methods. Second edition. Springer.

Venables, W. N. and Ripley, B. D. (2002) Modern Applied Statistics with S. Fourth edition. Springer. (Especially pages 392–7.)

See Also

spec.ar, spec.pgram; plot.spec.



## Examples from Venables & Ripley
## spec.pgram
par(mfrow = c(2,2))
spectrum(lh, spans = 3)
spectrum(lh, spans = c(3,3))
spectrum(lh, spans = c(3,5))

spectrum(ldeaths, spans = c(3,3))
spectrum(ldeaths, spans = c(3,5))
spectrum(ldeaths, spans = c(5,7))
spectrum(ldeaths, spans = c(5,7), log = "dB", ci = 0.8)

# for multivariate examples see the help for spec.pgram

spectrum(lh, method = "ar")
spectrum(ldeaths, method = "ar")

Interpolating Splines


Perform cubic (or Hermite) spline interpolation of given data points, returning either a list of points obtained by the interpolation or a function performing the interpolation.


splinefun(x, y = NULL,
          method = c("fmm", "periodic", "natural", "monoH.FC", "hyman"),
          ties = mean)

spline(x, y = NULL, n = 3*length(x), method = "fmm",
       xmin = min(x), xmax = max(x), xout, ties = mean)

splinefunH(x, y, m)


x, y

vectors giving the coordinates of the points to be interpolated. Alternatively a single plotting structure can be specified: see xy.coords.

y must be increasing or decreasing for method = "hyman".


(for splinefunH()): vector of slopes m_i at the points (x_i, y_i); these together determine the Hermite “spline” which is piecewise cubic, (only) once differentiable continuously.


specifies the type of spline to be used. Possible values are "fmm", "natural", "periodic", "monoH.FC" and "hyman". Can be abbreviated.


if xout is left unspecified, interpolation takes place at n equally spaced points spanning the interval [xmin, xmax].

xmin, xmax

left-hand and right-hand endpoint of the interpolation interval (when xout is unspecified).


an optional set of values specifying where interpolation is to take place.


handling of tied x values. The string "ordered" or a function (or the name of a function) taking a single vector argument and returning a single number or a length-2 list of both, see approx and its ‘Details’ section, and the example below.


The inputs can contain missing values which are deleted, so at least one complete (x, y) pair is required. If method = "fmm", the spline used is that of Forsythe, Malcolm and Moler (an exact cubic is fitted through the four points at each end of the data, and this is used to determine the end conditions). Natural splines are used when method = "natural", and periodic splines when method = "periodic".

The method "monoH.FC" computes a monotone Hermite spline according to the method of Fritsch and Carlson. It does so by determining slopes such that the Hermite spline, determined by (x_i, y_i, m_i), is monotone (increasing or decreasing) iff the data are.

Method "hyman" computes a monotone cubic spline using Hyman filtering of a method = "fmm" fit for strictly monotonic inputs.

These interpolation splines can also be used for extrapolation, that is prediction at points outside the range of x. Extrapolation makes little sense for method = "fmm"; for natural splines it is linear using the slope of the interpolating curve at the nearest data point.
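The linear extrapolation of natural splines can be checked numerically: values at equally spaced points outside the data range should change by a constant amount. A minimal sketch, with data x0 and y0 invented for the purpose:

```r
x0 <- 1:8
y0 <- sin(x0)
fn <- splinefun(x0, y0, method = "natural")
right <- fn(8 + 0:4)   # equally spaced points at and beyond max(x0)
left  <- fn(1 - 0:4)   # ... and at and below min(x0)
diff(right)            # constant increments, i.e. a straight line
diff(left)
```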


spline returns a list containing components x and y which give the ordinates where interpolation took place and the interpolated values.

splinefun returns a function with formal arguments x and deriv, the latter defaulting to zero. This function can be used to evaluate the interpolating cubic spline (deriv = 0), or its derivatives (deriv = 1, 2, 3) at the points x, where the spline function interpolates the data points originally specified. It uses data stored in its environment when it was created, the details of which are subject to change.
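As a quick check of the deriv argument, one can interpolate data that come from an exact cubic. With the default "fmm" method the end conditions are taken from cubics through the end points, so the spline should agree with x^3 up to rounding. The names x0 and f3 below are illustrative only.

```r
x0 <- 0:5
f3 <- splinefun(x0, x0^3)   # default method = "fmm"
f3(x0)                      # reproduces the data at the knots
f3(2.5)                     # compare with 2.5^3
f3(2.5, deriv = 1)          # compare with 3 * 2.5^2
f3(2.5, deriv = 2)          # compare with 6 * 2.5
f3(2.5, deriv = 3)          # compare with 6
```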


The value returned by splinefun contains references to the code in the current version of R: it is not intended to be saved and loaded into a different R session. This is safer in R >= 3.0.0.


R Core Team.

Simon Wood for the original code for Hyman filtering.


Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988). The New S Language. Wadsworth & Brooks/Cole.

Dougherty, R. L., Edelman, A. and Hyman, J. M. (1989) Positivity-, monotonicity-, or convexity-preserving cubic and quintic Hermite interpolation. Mathematics of Computation, 52, 471–494. doi:10.1090/S0025-5718-1989-0962209-1.

Forsythe, G. E., Malcolm, M. A. and Moler, C. B. (1977). Computer Methods for Mathematical Computations. Wiley.

Fritsch, F. N. and Carlson, R. E. (1980). Monotone piecewise cubic interpolation. SIAM Journal on Numerical Analysis, 17, 238–246. doi:10.1137/0717021.

Hyman, J. M. (1983). Accurate monotonicity preserving cubic interpolation. SIAM Journal on Scientific and Statistical Computing, 4, 645–654. doi:10.1137/0904045.

See Also

approx and approxfun for constant and linear interpolation.

Package splines, especially interpSpline and periodicSpline for interpolation splines. That package also generates spline bases that can be used for regression splines.

smooth.spline for smoothing splines.



op <- par(mfrow = c(2,1), mgp = c(2,.8,0), mar = 0.1+c(3,3,3,1))
n <- 9
x <- 1:n
y <- rnorm(n)
plot(x, y, main = paste("spline[fun](.) through", n, "points"))
lines(spline(x, y))
lines(spline(x, y, n = 201), col = 2)

y <- (x-6)^2
plot(x, y, main = "spline(.) -- 3 methods")
lines(spline(x, y, n = 201), col = 2)
lines(spline(x, y, n = 201, method = "natural"), col = 3)
lines(spline(x, y, n = 201, method = "periodic"), col = 4)
legend(6, 25, c("fmm","natural","periodic"), col = 2:4, lty = 1)

y <- sin((x-0.5)*pi)
f <- splinefun(x, y)
ls(envir = environment(f))
splinecoef <- get("z", envir = environment(f))
curve(f(x), 1, 10, col = "green", lwd = 1.5)
points(splinecoef, col = "purple", cex = 2)
curve(f(x, deriv = 1), 1, 10, col = 2, lwd = 1.5)
curve(f(x, deriv = 2), 1, 10, col = 2, lwd = 1.5, n = 401)
curve(f(x, deriv = 3), 1, 10, col = 2, lwd = 1.5, n = 401)

## Manual spline evaluation --- demo the coefficients :
.x <- splinecoef$x
u <- seq(3, 6, by = 0.25)
(ii <- findInterval(u, .x))
dx <- u - .x[ii]
f.u <- with(splinecoef,
            y[ii] + dx*(b[ii] + dx*(c[ii] + dx* d[ii])))
stopifnot(all.equal(f(u), f.u))

## An example with ties (non-unique  x values):
set.seed(1); x <- round(rnorm(30), 1); y <- sin(pi * x) + rnorm(30)/10
plot(x, y, main = "spline(x,y)  when x has ties")
lines(spline(x, y, n = 201), col = 2)
## visualizes the non-unique ones:
tx <- table(x); mx <- as.numeric(names(tx[tx > 1]))
ry <- matrix(unlist(tapply(y, match(x, mx), range, simplify = FALSE)),
             ncol = 2, byrow = TRUE)
segments(mx, ry[, 1], mx, ry[, 2], col = "blue", lwd = 2)

## Another example with sorted x, but ties:
set.seed(8); x <- sort(round(rnorm(30), 1)); y <- round(sin(pi * x) + rnorm(30)/10, 3)
summary(diff(x) == 0) # -> 7 duplicated x-values
str(spline(x, y, n = 201, ties="ordered")) # all '$y' entries are NaN
## The default (ties=mean) is ok, but most efficient to use instead is
sxyo <- spline(x, y, n = 201, ties= list("ordered", mean))
sapply(sxyo, summary)# all fine now
plot(x, y, main = "spline(x,y, ties=list(\"ordered\", mean))  for when x has ties")
lines(sxyo, col="blue")

## An example of monotone interpolation
n <- 20
x. <- sort(runif(n)) ; y. <- cumsum(abs(rnorm(n)))
plot(x., y.)
curve(splinefun(x., y.)(x), add = TRUE, col = 2, n = 1001)
curve(splinefun(x., y., method = "monoH.FC")(x), add = TRUE, col = 3, n = 1001)
curve(splinefun(x., y., method = "hyman")   (x), add = TRUE, col = 4, n = 1001)
legend("topleft",
       paste0("splinefun( \"", c("fmm", "monoH.FC", "hyman"), "\" )"),
       col = 2:4, lty = 1, bty = "n")

## and one from Fritsch and Carlson (1980), Dougherty et al (1989)
x. <- c(7.09, 8.09, 8.19, 8.7, 9.2, 10, 12, 15, 20)
f <- c(0, 2.76429e-5, 4.37498e-2, 0.169183, 0.469428, 0.943740,
       0.998636, 0.999919, 0.999994)
s0 <- splinefun(x., f)
s1 <- splinefun(x., f, method = "monoH.FC")
s2 <- splinefun(x., f, method = "hyman")
plot(x., f, ylim = c(-0.2, 1.2))
curve(s0(x), add = TRUE, col = 2, n = 1001) -> m0
curve(s1(x), add = TRUE, col = 3, n = 1001)
curve(s2(x), add = TRUE, col = 4, n = 1001)
legend("right",
       paste0("splinefun( \"", c("fmm", "monoH.FC", "hyman"), "\" )"),
       col = 2:4, lty = 1, bty = "n")

## they seem identical, but are not quite:
xx <- m0$x
plot(xx, s1(xx) - s2(xx), type = "l",  col = 2, lwd = 2,
     main = "Difference   monoH.FC - hyman"); abline(h = 0, lty = 3)

x <- xx[xx < 10.2] ## full range: x <- xx .. does not show enough
ccol <- adjustcolor(2:4, 0.8)
matplot(x, cbind(s0(x, deriv = 2), s1(x, deriv = 2), s2(x, deriv = 2))^2,
        lwd = 2, col = ccol, type = "l", ylab = quote({{f*second}(x)}^2),
        main = expression({{f*second}(x)}^2 ~" for the three 'splines'"))
legend("topright",
       paste0("splinefun( \"", c("fmm", "monoH.FC", "hyman"), "\" )"),
       lwd = 2, col  =  ccol, lty = 1:3, bty = "n")
## --> "hyman" has slightly smaller  Integral f''(x)^2 dx  than "FC",
## here, and both are 'much worse' than the regular fmm spline.

Self-Starting nls Asymptotic Model


This selfStart model evaluates the asymptotic regression function and its gradient. It has an initial attribute that will evaluate initial estimates of the parameters Asym, R0, and lrc for a given set of data.

Note that SSweibull() generalizes this asymptotic model with an extra parameter.


SSasymp(input, Asym, R0, lrc)



a numeric vector of values at which to evaluate the model.


a numeric parameter representing the horizontal asymptote on the right side (very large values of input).


a numeric parameter representing the response when input is zero.


a numeric parameter representing the natural logarithm of the rate constant.


a numeric vector of the same length as input. It is the value of the expression Asym+(R0-Asym)*exp(-exp(lrc)*input). If all of the arguments Asym, R0, and lrc are names of objects, the gradient matrix with respect to these names is attached as an attribute named gradient.
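A small numerical check of the expression, using made-up parameter values: at the input log(2)/exp(lrc) the response lies halfway between R0 and Asym. Because the parameters are passed as names of objects, the gradient attribute is attached. The name t_half is introduced here for illustration.

```r
Asym <- 5; R0 <- 1; lrc <- -0.75
t_half <- log(2) / exp(lrc)             # input at which half the gap to Asym is closed
val <- SSasymp(t_half, Asym, R0, lrc)   # parameters given as names: gradient attached
as.numeric(val)                         # (Asym + R0)/2
dim(attr(val, "gradient"))              # one row per input, one column per parameter
```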


José Pinheiro and Douglas Bates

See Also

nls, selfStart


Lob.329 <- Loblolly[ Loblolly$Seed == "329", ]
SSasymp( Lob.329$age, 100, -8.5, -3.2 )   # response only
local({
  Asym <- 100 ; resp0 <- -8.5 ; lrc <- -3.2
  SSasymp( Lob.329$age, Asym, resp0, lrc) # response _and_ gradient
})
getInitial(height ~ SSasymp( age, Asym, resp0, lrc), data = Lob.329)
## Initial values are in fact the converged values
fm1 <- nls(height ~ SSasymp( age, Asym, resp0, lrc), data = Lob.329)

## Visualize the SSasymp()  model  parametrization :

  xx <- seq(-.3, 5, length.out = 101)
  ##  Asym + (R0-Asym) * exp(-exp(lrc)* x) :
  yy <- 5 - 4 * exp(-xx / exp(3/4))
  stopifnot( all.equal(yy, SSasymp(xx, Asym = 5, R0 = 1, lrc = -3/4)) )
  op <- par(mar = c(0, .2, 4.1, 0))
  plot(xx, yy, type = "l", axes = FALSE, ylim = c(0,5.2), xlim = c(-.3, 5),
       xlab = "", ylab = "", lwd = 2,
       main = quote("Parameters in the SSasymp model " ~
                    {f[phi](x) == phi[1] + (phi[2]-phi[1])*~e^{-e^{phi[3]}*~x}}))
  mtext(quote(list(phi[1] == "Asym", phi[2] == "R0", phi[3] == "lrc")))
  usr <- par("usr")
  arrows(usr[1], 0, usr[2], 0, length = 0.1, angle = 25)
  arrows(0, usr[3], 0, usr[4], length = 0.1, angle = 25)
  text(usr[2] - 0.2, 0.1, "x", adj = c(1, 0))
  text(     -0.1, usr[4], "y", adj = c(1, 1))
  abline(h = 5, lty = 3)
  arrows(c(0.35, 0.65), 1,
         c(0  ,  1   ), 1, length = 0.08, angle = 25); text(0.5, 1, quote(1))
  y0 <- 1 + 4*exp(-3/4) ; t.5 <- log(2) / exp(-3/4) ; AR2 <- 3 # (Asym + R0)/2
  segments(c(1, 1), c( 1, y0),
           c(1, 0), c(y0,  1),  lty = 2, lwd = 0.75)
  text(1.1, 1/2+y0/2, quote((phi[1]-phi[2])*e^phi[3]), adj = c(0,.5))
  axis(2, at = c(1,AR2,5), labels= expression(phi[2], frac(phi[1]+phi[2],2), phi[1]),
       pos=0, las=1)
  arrows(c(.6,t.5-.6), AR2,
         c(0, t.5   ), AR2, length = 0.08, angle = 25)
  text(   t.5/2,   AR2, quote(t[0.5]))
  text(   t.5 +.4, AR2,
       quote({f(t[0.5]) == frac(phi[1]+phi[2],2)}~{} %=>% {}~~
                {t[0.5] == frac(log(2), e^{phi[3]})}), adj = c(0, 0.5))

Self-Starting nls Asymptotic Model with an Offset


This selfStart model evaluates an alternative parametrization of the asymptotic regression function and the gradient with respect to those parameters. It has an initial attribute that creates initial estimates of the parameters Asym, lrc, and c0.


SSasympOff(input, Asym, lrc, c0)



a numeric vector of values at which to evaluate the model.


a numeric parameter representing the horizontal asymptote on the right side (very large values of input).


a numeric parameter representing the natural logarithm of the rate constant.


a numeric parameter representing the input for which the response is zero.


a numeric vector of the same length as input. It is the value of the expression Asym*(1 - exp(-exp(lrc)*(input - c0))). If all of the arguments Asym, lrc, and c0 are names of objects, the gradient matrix with respect to these names is attached as an attribute named gradient.
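The offset form and the SSasymp() form describe the same curve. Equating the two expressions gives R0 = Asym*(1 - exp(exp(lrc)*c0)), which the sketch below verifies for arbitrary illustrative parameter values.

```r
Asym <- 5; lrc <- log(0.4); c0 <- 0.75
x0 <- c(1, 2, 5, 10)
off <- as.numeric(SSasympOff(x0, Asym, lrc, c0))
as.numeric(SSasympOff(c0, Asym, lrc, c0))   # the response is zero at input = c0
R0 <- Asym * (1 - exp(exp(lrc) * c0))       # implied response at input = 0
all.equal(off, as.numeric(SSasymp(x0, Asym, R0, lrc)))
```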


José Pinheiro and Douglas Bates

See Also

nls, selfStart; example(SSasympOff) gives a graph showing the SSasympOff parametrization.


CO2.Qn1 <- CO2[CO2$Plant == "Qn1", ]
SSasympOff(CO2.Qn1$conc, 32, -4, 43)  # response only
local({  Asym <- 32; lrc <- -4; c0 <- 43
  SSasympOff(CO2.Qn1$conc, Asym, lrc, c0) # response and gradient
})
getInitial(uptake ~ SSasympOff(conc, Asym, lrc, c0), data = CO2.Qn1)
## Initial values are in fact the converged values
fm1 <- nls(uptake ~ SSasympOff(conc, Asym, lrc, c0), data = CO2.Qn1)

## Visualize the SSasympOff()  model  parametrization :

  xx <- seq(0.25, 8,  by=1/16)
  yy <- 5 * (1 -  exp(-(xx - 3/4)*0.4))
  stopifnot( all.equal(yy, SSasympOff(xx, Asym = 5, lrc = log(0.4), c0 = 3/4)) )
  op <- par(mar = c(0, 0, 4.0, 0))
  plot(xx, yy, type = "l", axes = FALSE, ylim = c(-.5,6), xlim = c(-1, 8),
       xlab = "", ylab = "", lwd = 2,
       main = "Parameters in the SSasympOff model")
  mtext(quote(list(phi[1] == "Asym", phi[2] == "lrc", phi[3] == "c0")))
  usr <- par("usr")
  arrows(usr[1], 0, usr[2], 0, length = 0.1, angle = 25)
  arrows(0, usr[3], 0, usr[4], length = 0.1, angle = 25)
  text(usr[2] - 0.2, 0.1, "x", adj = c(1, 0))
  text(     -0.1, usr[4], "y", adj = c(1, 1))
  abline(h = 5, lty = 3)
  arrows(-0.8, c(2.1, 2.9),
         -0.8, c(0  , 5  ), length = 0.1, angle = 25)
  text  (-0.8, 2.5, quote(phi[1]))
  segments(3/4, -.2, 3/4, 1.6, lty = 2)
  text    (3/4,    c(-.3, 1.7), quote(phi[3]))
  arrows(c(1.1, 1.4), -.15,
         c(3/4, 7/4), -.15, length = 0.07, angle = 25)
  text    (3/4 + 1/2, -.15, quote(1))
  segments(c(3/4, 7/4, 7/4), c(0, 0, 2),   # 5 * exp(log(0.4)) = 2
           c(7/4, 7/4, 3/4), c(0, 2, 0),  lty = 2, lwd = 2)
  text(      7/4 +.1, 2./2, quote(phi[1]*e^phi[2]), adj = c(0, .5))

Self-Starting nls Asymptotic Model through the Origin


This selfStart model evaluates the asymptotic regression function through the origin and its gradient. It has an initial attribute that will evaluate initial estimates of the parameters Asym and lrc for a given set of data.


SSasympOrig(input, Asym, lrc)



a numeric vector of values at which to evaluate the model.


a numeric parameter representing the horizontal asymptote.


a numeric parameter representing the natural logarithm of the rate constant.


a numeric vector of the same length as input. It is the value of the expression Asym*(1 - exp(-exp(lrc)*input)). If all of the arguments Asym and lrc are names of objects, the gradient matrix with respect to these names is attached as an attribute named gradient.
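Comparing the two expressions shows that this model is SSasymp() with R0 fixed at zero. A brief sketch with illustrative values, where the rate constant exp(lrc) = log(2) makes the gap to the asymptote halve with each unit of input:

```r
Asym <- 5; lrc <- log(log(2))            # exp(lrc) = log(2)
x0 <- c(0, 1, 2, 3)
as.numeric(SSasympOrig(x0, Asym, lrc))   # starts at 0; the gap to Asym halves each step
all.equal(as.numeric(SSasympOrig(x0, Asym, lrc)),
          as.numeric(SSasymp(x0, Asym, 0, lrc)))
```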


José Pinheiro and Douglas Bates

See Also

nls, selfStart


Lob.329 <- Loblolly[ Loblolly$Seed == "329", ]
SSasympOrig(Lob.329$age, 100, -3.2)  # response only
local({   Asym <- 100; lrc <- -3.2
  SSasympOrig(Lob.329$age, Asym, lrc) # response and gradient
})
getInitial(height ~ SSasympOrig(age, Asym, lrc), data = Lob.329)
## Initial values are in fact the converged values
fm1 <- nls(height ~ SSasympOrig(age, Asym, lrc), data = Lob.329)

## Visualize the SSasympOrig()  model  parametrization :

  xx <- seq(0, 5, length.out = 101)
  yy <- 5 * (1- exp(-xx * log(2)))
  stopifnot( all.equal(yy, SSasympOrig(xx, Asym = 5, lrc = log(log(2)))) )

  op <- par(mar = c(0, 0, 3.5, 0))
  plot(xx, yy, type = "l", axes = FALSE, ylim = c(0,5), xlim = c(-1/4, 5),
       xlab = "", ylab = "", lwd = 2,
       main = quote("Parameters in the SSasympOrig model"~~ f[phi](x)))
  mtext(quote(list(phi[1] == "Asym", phi[2] == "lrc")))
  usr <- par("usr")
  arrows(usr[1], 0, usr[2], 0, length = 0.1, angle = 25)
  arrows(0, usr[3], 0, usr[4], length = 0.1, angle = 25)
  text(usr[2] - 0.2, 0.1, "x", adj = c(1, 0))
  text(   -0.1,   usr[4], "y", adj = c(1, 1))
  abline(h = 5, lty = 3)
  axis(2, at = 5*c(1/2,1), labels= expression(frac(phi[1],2), phi[1]), pos=0, las=1)
  arrows(c(.3,.7), 5/2,
         c(0, 1 ), 5/2, length = 0.08, angle = 25)
  text(   0.5,     5/2, quote(t[0.5]))
  text(   1 +.4,   5/2,
       quote({f(t[0.5]) == frac(phi[1],2)}~{} %=>% {}~~{t[0.5] == frac(log(2), e^{phi[2]})}),
       adj = c(0, 0.5))

Self-Starting nls Biexponential Model


This selfStart model evaluates the biexponential model function and its gradient. It has an initial attribute that creates initial estimates of the parameters A1, lrc1, A2, and lrc2.


SSbiexp(input, A1, lrc1, A2, lrc2)



a numeric vector of values at which to evaluate the model.


a numeric parameter representing the multiplier of the first exponential.


a numeric parameter representing the natural logarithm of the rate constant of the first exponential.


a numeric parameter representing the multiplier of the second exponential.


a numeric parameter representing the natural logarithm of the rate constant of the second exponential.


a numeric vector of the same length as input. It is the value of the expression A1*exp(-exp(lrc1)*input)+A2*exp(-exp(lrc2)*input). If all of the arguments A1, lrc1, A2, and lrc2 are names of objects, the gradient matrix with respect to these names is attached as an attribute named gradient.
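Two properties follow directly from the expression: the response at input 0 is A1 + A2, and exchanging the two exponential terms leaves the curve unchanged. A minimal sketch with made-up parameter values (tt is an arbitrary vector of inputs):

```r
A1 <- 3.5; lrc1 <- log(4); A2 <- 1.5; lrc2 <- 0
as.numeric(SSbiexp(0, A1, lrc1, A2, lrc2))   # A1 + A2 at input = 0
tt <- c(0.5, 1, 2)
## exchanging the two terms gives the same values
all.equal(as.numeric(SSbiexp(tt, A1, lrc1, A2, lrc2)),
          as.numeric(SSbiexp(tt, A2, lrc2, A1, lrc1)))
```

This symmetry means the labelling of the two terms is a matter of convention.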


José Pinheiro and Douglas Bates

See Also

nls, selfStart


Indo.1 <- Indometh[Indometh$Subject == 1, ]
SSbiexp( Indo.1$time, 3, 1, 0.6, -1.3 )  # response only
A1 <- 3; lrc1 <- 1; A2 <- 0.6; lrc2 <- -1.3
SSbiexp( Indo.1$time, A1, lrc1, A2, lrc2 ) # response and gradient
print(getInitial(conc ~ SSbiexp(time, A1, lrc1, A2, lrc2), data = Indo.1),
      digits = 5)
## Initial values are in fact the converged values
fm1 <- nls(conc ~ SSbiexp(time, A1, lrc1, A2, lrc2), data = Indo.1)

## Show the model components visually

  xx <- seq(0, 5, length.out = 101)
  y1 <- 3.5 * exp(-4*xx)
  y2 <- 1.5 * exp(-xx)
  plot(xx, y1 + y2, type = "l", lwd=2, ylim = c(-0.2,6), xlim = c(0, 5),
       main = "Components of the SSbiexp model")
  lines(xx, y1, lty = 2, col="tomato"); abline(v=0, h=0, col="gray40")
  lines(xx, y2, lty = 3, col="blue2" )
  legend("topright", c("y1+y2", "y1 = 3.5 * exp(-4*x)", "y2 = 1.5 * exp(-x)"),
         lty=1:3, col=c("black","tomato","blue2"), bty="n")
  axis(2, pos=0, at = c(3.5, 1.5), labels = c("A1","A2"), las=2)

## and how you could have got their sum via SSbiexp():
  ySS <- SSbiexp(xx, 3.5, log(4), 1.5, log(1))
  ##                      ---          ---
  stopifnot(all.equal(y1+y2, ySS, tolerance = 1e-15))

## Show a no-noise example
datN <- data.frame(time = (0:600)/64)
datN$conc <- predict(fm1, newdata=datN)
plot(conc ~ time, data=datN) # perfect, no noise

## Fails by default (scaleOffset=0) on most platforms {also after increasing maxiter !}
## Not run: 
        nls(conc ~ SSbiexp(time, A1, lrc1, A2, lrc2), data = datN, trace=TRUE)
## End(Not run)

fmX1 <- nls(conc ~ SSbiexp(time, A1, lrc1, A2, lrc2), data = datN,
            control = list(scaleOffset=1))
fmX  <- nls(conc ~ SSbiexp(time, A1, lrc1, A2, lrc2), data = datN,
            control = list(scaleOffset=1, printEval=TRUE, tol=1e-11, nDcentral=TRUE), trace=TRUE)
all.equal(coef(fm1), coef(fmX1), tolerance=0) # ... rel.diff.: 1.57e-6
all.equal(coef(fm1), coef(fmX),  tolerance=0) # ... rel.diff.: 1.03e-12

stopifnot(all.equal(coef(fm1), coef(fmX1), tolerance = 6e-6),
          all.equal(coef(fm1), coef(fmX ), tolerance = 1e-11))

SSD Matrix and Estimated Variance Matrix in Multivariate Models


Functions to compute matrix of residual sums of squares and products, or the estimated variance matrix for multivariate linear models.


# S3 method for class 'mlm'
SSD(object, ...)

# S3 methods for class 'SSD' and 'mlm'
estVar(object, ...)



object of class "mlm", or "SSD" in the case of estVar.




SSD() returns a list of class "SSD" containing the following components


The residual sums of squares and products matrix


Degrees of freedom


Copied from object

estVar returns a matrix with the estimated variances and covariances.
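The relation between the two functions can be sketched on a built-in data set. The use of iris and the names Y, fit and ssd are choices made here for illustration; for an intercept-only model the estimated variance matrix should coincide with the sample covariance matrix.

```r
## Hypothetical illustration, not one of the original examples
Y <- as.matrix(iris[, 1:4])
fit <- lm(Y ~ 1)                  # intercept-only multivariate linear model
ssd <- SSD(fit)
ssd$df                            # residual degrees of freedom: nrow(Y) - 1
all.equal(estVar(fit), ssd$SSD / ssd$df)
all.equal(estVar(fit), cov(Y), check.attributes = FALSE)
```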

See Also

mauchly.test, anova.mlm


# Lifted from Baron+Li:
# "Notes on the use of R for psychology experiments and questionnaires"
# Maxwell and Delaney, p. 497
reacttime <- matrix(c(
420, 420, 480, 480, 600, 780,
420, 480, 480, 360, 480, 600,
480, 480, 540, 660, 780, 780,
420, 540, 540, 480, 780, 900,
540, 660, 540, 480, 660, 720,
360, 420, 360, 360, 480, 540,
480, 480, 600, 540, 720, 840,
480, 600, 660, 540, 720, 900,
540, 600, 540, 480, 720, 780,
480, 420, 540, 540, 660, 780),
ncol = 6, byrow = TRUE,
dimnames = list(subj = 1:10,
              cond = c("deg0NA", "deg4NA", "deg8NA",
                       "deg0NP", "deg4NP", "deg8NP")))

mlmfit <- lm(reacttime ~ 1)
SSD(mlmfit)
estVar(mlmfit)

Self-Starting nls First-order Compartment Model


This selfStart model evaluates the first-order compartment function and its gradient. It has an initial attribute that creates initial estimates of the parameters lKe, lKa, and lCl.


SSfol(Dose, input, lKe, lKa, lCl)



a numeric value representing the initial dose.


a numeric vector at which to evaluate the model.


a numeric parameter representing the natural logarithm of the elimination rate constant.


a numeric parameter representing the natural logarithm of the absorption rate constant.


a numeric parameter representing the natural logarithm of the clearance.


a numeric vector of the same length as input, which is the value of the expression

Dose * exp(lKe+lKa-lCl) * (exp(-exp(lKe)*input) - exp(-exp(lKa)*input))
    / (exp(lKa) - exp(lKe))

If all of the arguments lKe, lKa, and lCl are names of objects, the gradient matrix with respect to these names is attached as an attribute named gradient.
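The expression can be evaluated by hand and compared with SSfol(). The dose, times and parameter values below are illustrative assumptions, not fitted values.

```r
Dose <- 4; tt <- c(0, 0.5, 1, 2, 4, 8)
lKe <- -2.5; lKa <- 0.5; lCl <- -3
by_hand <- Dose * exp(lKe + lKa - lCl) *
  (exp(-exp(lKe) * tt) - exp(-exp(lKa) * tt)) / (exp(lKa) - exp(lKe))
all.equal(by_hand, as.numeric(SSfol(Dose, tt, lKe, lKa, lCl)))
by_hand[1]    # zero concentration at time 0
```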


José Pinheiro and Douglas Bates

See Also

nls, selfStart


Theoph.1 <- Theoph[ Theoph$Subject == 1, ]
with(Theoph.1, SSfol(Dose, Time, -2.5, 0.5, -3)) # response only
with(Theoph.1, local({  lKe <- -2.5; lKa <- 0.5; lCl <- -3
  SSfol(Dose, Time, lKe, lKa, lCl) # response _and_ gradient
}))
getInitial(conc ~ SSfol(Dose, Time, lKe, lKa, lCl), data = Theoph.1)
## Initial values are in fact the converged values
fm1 <- nls(conc ~ SSfol(Dose, Time, lKe, lKa, lCl), data = Theoph.1)

Self-Starting nls Four-Parameter Logistic Model


This selfStart model evaluates the four-parameter logistic function and its gradient. It has an initial attribute computing initial estimates of the parameters A, B, xmid, and scal for a given set of data.


SSfpl(input, A, B, xmid, scal)



a numeric vector of values at which to evaluate the model.


a numeric parameter representing the horizontal asymptote on the left side (very small values of input).


a numeric parameter representing the horizontal asymptote on the right side (very large values of input).


a numeric parameter representing the input value at the inflection point of the curve. The value of SSfpl will be midway between A and B at xmid.


a numeric scale parameter on the input axis.


a numeric vector of the same length as input. It is the value of the expression A+(B-A)/(1+exp((xmid-input)/scal)). If all of the arguments A, B, xmid, and scal are names of objects, the gradient matrix with respect to these names is attached as an attribute named gradient.
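The roles of the parameters can be confirmed numerically. In this sketch, with made-up values, the response at xmid is midway between A and B, and inputs far to either side approach the two asymptotes.

```r
A <- 1; B <- 5; xmid <- 2; scal <- 1
as.numeric(SSfpl(xmid, A, B, xmid, scal))         # midway: (A + B)/2
as.numeric(SSfpl(c(-50, 50), A, B, xmid, scal))   # close to A, then close to B
```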


José Pinheiro and Douglas Bates

See Also

nls, selfStart


Chick.1 <- ChickWeight[ChickWeight$Chick == 1, ]
SSfpl(Chick.1$Time, 13, 368, 14, 6)  # response only
local({
  A <- 13; B <- 368; xmid <- 14; scal <- 6
  SSfpl(Chick.1$Time, A, B, xmid, scal) # response _and_ gradient
})
print(getInitial(weight ~ SSfpl(Time, A, B, xmid, scal), data = Chick.1),
      digits = 5)
## Initial values are in fact the converged values
fm1 <- nls(weight ~ SSfpl(Time, A, B, xmid, scal), data = Chick.1)

## Visualizing the  SSfpl()  parametrization
  xx <- seq(-0.5, 5, length.out = 101)
  yy <- 1 + 4 / (1 + exp((2-xx))) # == SSfpl(xx, *) :
  stopifnot( all.equal(yy, SSfpl(xx, A = 1, B = 5, xmid = 2, scal = 1)) )
  op <- par(mar = c(0, 0, 3.5, 0))
  plot(xx, yy, type = "l", axes = FALSE, ylim = c(0,6), xlim = c(-1, 5),
       xlab = "", ylab = "", lwd = 2,
       main = "Parameters in the SSfpl model")
  mtext(quote(list(phi[1] == "A", phi[2] == "B", phi[3] == "xmid", phi[4] == "scal")))
  usr <- par("usr")
  arrows(usr[1], 0, usr[2], 0, length = 0.1, angle = 25)
  arrows(0, usr[3], 0, usr[4], length = 0.1, angle = 25)
  text(usr[2] - 0.2, 0.1, "x", adj = c(1, 0))
  text(     -0.1, usr[4], "y", adj = c(1, 1))
  abline(h = c(1, 5), lty = 3)
  arrows(-0.8, c(2.1, 2.9),
         -0.8, c(0,   5  ), length = 0.1, angle = 25)
  text  (-0.8, 2.5, quote(phi[1]))
  arrows(-0.3, c(1/4, 3/4),
         -0.3, c(0,   1  ), length = 0.07, angle = 25)
  text  (-0.3, 0.5, quote(phi[2]))
  text(2, -.1, quote(phi[3]))
  segments(c(2,3,3), c(0,3,4), # SSfpl(x = xmid = 2) = 3
           c(2,3,2), c(3,4,3),    lty = 2, lwd = 0.75)
  arrows(c(2.3, 2.7), 3,
         c(2.0, 3  ), 3, length = 0.08, angle = 25)
  text(      2.5,     3, quote(phi[4])); text(3.1, 3.5, "1")
  par(op) # restore the graphical parameters saved above

Self-Starting nls Gompertz Growth Model


This selfStart model evaluates the Gompertz growth model and its gradient. It has an initial attribute that creates initial estimates of the parameters Asym, b2, and b3.


SSgompertz(x, Asym, b2, b3)



a numeric vector of values at which to evaluate the model.


a numeric parameter representing the asymptote.


a numeric parameter related to the value of the function at x = 0.


a numeric parameter related to the scale of the x axis.


a numeric vector of the same length as x. It is the value of the expression Asym*exp(-b2*b3^x). If all of the arguments Asym, b2, and b3 are names of objects, the gradient matrix with respect to these names is attached as an attribute named gradient.
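The expression above can be evaluated from a named parameter vector with do.call(), which is convenient when drawing a fitted curve. The sketch below uses assumed parameter values rather than estimates from data.

```r
## Illustrative parameter values (assumed, not estimated from data)
pars <- c(Asym = 4.5, b2 = 2.3, b3 = 0.7)
x <- 0:10

## c(list(x = x), as.list(pars)) builds the argument list x, Asym, b2, b3
y <- do.call(SSgompertz, c(list(x = x), as.list(pars)))

## Agrees with the expression Asym * exp(-b2 * b3^x)
stopifnot(all.equal(as.numeric(y), 4.5 * exp(-2.3 * 0.7^x)))

## At x = 0 the value is Asym * exp(-b2), which is how b2 relates
## to the value of the function at the origin
y0 <- as.numeric(y)[1]
```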


Douglas Bates

See Also

nls, selfStart


DNase.1 <- subset(DNase, Run == 1)
SSgompertz(log(DNase.1$conc), 4.5, 2.3, 0.7)  # response only
local({  Asym <- 4.5; b2 <- 2.3; b3 <- 0.7
  SSgompertz(log(DNase.1$conc), Asym, b2, b3) # response _and_ gradient
})
print(getInitial(density ~ SSgompertz(log(conc), Asym, b2, b3),
                 data = DNase.1), digits = 5)
## Initial values are in fact the converged values
fm1 <- nls(density ~ SSgompertz(log(conc), Asym, b2, b3),
           data = DNase.1)
plot(density ~ log(conc), DNase.1, # xlim = c(0, 21),
     main = "SSgompertz() fit to DNase.1")
ux <- par("usr")[1:2]; x <- seq(ux[1], ux[2], length.out=250)
lines(x, do.call(SSgompertz, c(list(x=x), coef(fm1))), col = "red", lwd=2)
As <- coef(fm1)[["Asym"]]; abline(v = 0, h = 0, lty = 3)
axis(2, at= exp(-coef(fm1)[["b2"]]), quote(e^{-b[2]}), las=1, pos=0)

Self-Starting nls Logistic Model


This selfStart model evaluates the logistic function and its gradient. It has an initial attribute that creates initial estimates of the parameters Asym, xmid, and scal. In R 3.4.2 and earlier, that init function failed when min(input) was exactly zero.


SSlogis(input, Asym, xmid, scal)



a numeric vector of values at which to evaluate the model.


a numeric parameter representing the asymptote.


a numeric parameter representing the x value at the inflection point of the curve. The value of SSlogis will be Asym/2 at xmid.


a numeric scale parameter on the input axis.


a numeric vector of the same length as input. It is the value of the expression Asym/(1+exp((xmid-input)/scal)). If all of the arguments Asym, xmid, and scal are names of objects, the gradient matrix with respect to these names is attached as an attribute named gradient.


José Pinheiro and Douglas Bates

See Also

nls, selfStart


Chick.1 <- ChickWeight[ChickWeight$Chick == 1, ]
SSlogis(Chick.1$Time, 368, 14, 6)  # response only
local({
  Asym <- 368; xmid <- 14; scal <- 6
  SSlogis(Chick.1$Time, Asym, xmid, scal) # response _and_ gradient
})
getInitial(weight ~ SSlogis(Time, Asym, xmid, scal), data = Chick.1)
## Initial values are in fact the converged one here, "Number of iter...: 0" :
fm1 <- nls(weight ~ SSlogis(Time, Asym, xmid, scal), data = Chick.1)
## but are slightly improved here:
fm2 <- update(fm1, control=nls.control(tol = 1e-9, warnOnly=TRUE), trace = TRUE)
all.equal(coef(fm1), coef(fm2)) # "Mean relative difference: 9.6e-6"
str(fm2$convInfo) # 3 iterations

dwlg1 <- data.frame(Prop = c(rep(0,5), 2, 5, rep(9, 9)), end = 1:16)
iPar <- getInitial(Prop ~ SSlogis(end, Asym, xmid, scal), data = dwlg1)
## failed in R <= 3.4.2 (because of the '0's in 'Prop')
stopifnot(all.equal(tolerance = 1e-6,
   iPar, c(Asym = 9.0678, xmid = 6.79331, scal = 0.499934)))

## Visualize the SSlogis()  model  parametrization :
  xx <- seq(-0.75, 5, by=1/32)
  yy <- 5 / (1 + exp((2-xx)/0.6)) # == SSlogis(xx, *):
  stopifnot( all.equal(yy, SSlogis(xx, Asym = 5, xmid = 2, scal = 0.6)) )
  op <- par(mar = c(0.5, 0, 3.5, 0))
  plot(xx, yy, type = "l", axes = FALSE, ylim = c(0,6), xlim = c(-1, 5),
       xlab = "", ylab = "", lwd = 2,
       main = "Parameters in the SSlogis model")
  mtext(quote(list(phi[1] == "Asym", phi[2] == "xmid", phi[3] == "scal")))
  usr <- par("usr")
  arrows(usr[1], 0, usr[2], 0, length = 0.1, angle = 25)
  arrows(0, usr[3], 0, usr[4], length = 0.1, angle = 25)
  text(usr[2] - 0.2, 0.1, "x", adj = c(1, 0))
  text(     -0.1, usr[4], "y", adj = c(1, 1))
  abline(h = 5, lty = 3)
  arrows(-0.8, c(2.1, 2.9),
         -0.8, c(0,   5  ), length = 0.1, angle = 25)
  text  (-0.8, 2.5, quote(phi[1]))
  segments(c(2,2.6,2.6), c(0,  2.5,3.5),   # NB.  SSlogis(x = xmid = 2) = 2.5
           c(2,2.6,2  ), c(2.5,3.5,2.5), lty = 2, lwd = 0.75)
  text(2, -.1, quote(phi[2]))
  arrows(c(2.2, 2.4), 2.5,
         c(2.0, 2.6), 2.5, length = 0.08, angle = 25)
  text(      2.3,     2.5, quote(phi[3])); text(2.7, 3, "1")
  par(op) # restore the graphical parameters saved above

Self-Starting nls Michaelis-Menten Model


This selfStart model evaluates the Michaelis-Menten model and its gradient. It has an initial attribute that will evaluate initial estimates of the parameters Vm and K.


SSmicmen(input, Vm, K)



a numeric vector of values at which to evaluate the model.


a numeric parameter representing the maximum value of the response.


a numeric parameter representing the input value at which half the maximum response is attained. In the field of enzyme kinetics this is called the Michaelis parameter.


a numeric vector of the same length as input. It is the value of the expression Vm*input/(K+input). If both the arguments Vm and K are names of objects, the gradient matrix with respect to these names is attached as an attribute named gradient.
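A small sketch of the value and gradient described above. The concentrations and parameter values are assumptions chosen for illustration; the analytic derivatives are those of Vm*input/(K+input).

```r
Vm <- 200; K <- 0.05
conc <- c(0.02, 0.05, 0.1, 0.5, 1)

r <- SSmicmen(conc, Vm, K)   # Vm and K are names, so a gradient is attached

## Value matches the expression Vm * input / (K + input)
stopifnot(all.equal(as.numeric(r), Vm * conc / (K + conc)))

## At input = K the response is half of the maximum Vm
half <- as.numeric(r)[conc == K]

## Gradient columns are named after the arguments that were passed
grad <- attr(r, "gradient")
stopifnot(all.equal(grad[, "Vm"], conc / (K + conc)),
          all.equal(grad[, "K"], -Vm * conc / (K + conc)^2))
```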


José Pinheiro and Douglas Bates

See Also

nls, selfStart


PurTrt <- Puromycin[ Puromycin$state == "treated", ]
SSmicmen(PurTrt$conc, 200, 0.05)  # response only
local({  Vm <- 200; K <- 0.05
  SSmicmen(PurTrt$conc, Vm, K)    # response _and_ gradient
})
print(getInitial(rate ~ SSmicmen(conc, Vm, K), data = PurTrt), digits = 3)
## Initial values are in fact the converged values
fm1 <- nls(rate ~ SSmicmen(conc, Vm, K), data = PurTrt)
## Alternative call using the subset argument
fm2 <- nls(rate ~ SSmicmen(conc, Vm, K), data = Puromycin,
           subset = state == "treated")
summary(fm2) # The same indeed:
stopifnot(all.equal(coef(summary(fm1)), coef(summary(fm2))))

## Visualize the SSmicmen()  Michaelis-Menten model parametrization :

  xx <- seq(0, 5, length.out = 101)
  yy <- 5 * xx/(1+xx)
  stopifnot(all.equal(yy, SSmicmen(xx, Vm = 5, K = 1)))
  op <- par(mar = c(0, 0, 3.5, 0))
  plot(xx, yy, type = "l", lwd = 2, ylim = c(-1/4,6), xlim = c(-1, 5),
       ann = FALSE, axes = FALSE, main = "Parameters in the SSmicmen model")
  mtext(quote(list(phi[1] == "Vm", phi[2] == "K")))
  usr <- par("usr")
  arrows(usr[1], 0, usr[2], 0, length = 0.1, angle = 25)
  arrows(0, usr[3], 0, usr[4], length = 0.1, angle = 25)
  text(usr[2] - 0.2, 0.1, "x", adj = c(1, 0))
  text(     -0.1, usr[4], "y", adj = c(1, 1))
  abline(h = 5, lty = 3)
  arrows(-0.8, c(2.1, 2.9),
         -0.8, c(0,   5  ),  length = 0.1, angle = 25)
  text(  -0.8,     2.5, quote(phi[1]))
  segments(1, 0, 1, 2.7, lty = 2, lwd = 0.75)
  text(1, 2.7, quote(phi[2]))
  par(op) # restore the graphical parameters saved above

Self-Starting nls Weibull Growth Curve Model


This selfStart model evaluates the Weibull model for growth curve data and its gradient. It has an initial attribute that will evaluate initial estimates of the parameters Asym, Drop, lrc, and pwr for a given set of data.


SSweibull(x, Asym, Drop, lrc, pwr)



a numeric vector of values at which to evaluate the model.


a numeric parameter representing the horizontal asymptote on the right side (very large values of x).


a numeric parameter representing the change from Asym to the y intercept.


a numeric parameter representing the natural logarithm of the rate constant.


a numeric parameter representing the power to which x is raised.


This model is a generalization of the SSasymp model in that it reduces to SSasymp when pwr is unity.


a numeric vector of the same length as x. It is the value of the expression Asym-Drop*exp(-exp(lrc)*x^pwr). If all of the arguments Asym, Drop, lrc, and pwr are names of objects, the gradient matrix with respect to these names is attached as an attribute named gradient.
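The reduction to SSasymp when pwr is unity can be checked numerically. SSasymp is parametrized by the response at zero, R0, so the sketch below assumes R0 = Asym - Drop; the parameter values are illustrative.

```r
x <- seq(0.5, 10, by = 0.5)
Asym <- 160; Drop <- 115; lrc <- -1.5

w <- SSweibull(x, 160, 115, -1.5, 1)     # pwr = 1
a <- SSasymp(x, 160, 160 - 115, -1.5)    # R0 = Asym - Drop

## Both equal Asym - Drop * exp(-exp(lrc) * x)
stopifnot(all.equal(as.numeric(w), as.numeric(a)),
          all.equal(as.numeric(w), Asym - Drop * exp(-exp(lrc) * x)))

## With pwr different from 1 the two curves no longer coincide
w2 <- SSweibull(x, 160, 115, -1.5, 2.5)
differs <- !isTRUE(all.equal(as.numeric(w2), as.numeric(a)))
```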


Douglas Bates


Ratkowsky, David A. (1983), Nonlinear Regression Modeling, Dekker. (section 4.4.5)

See Also

nls, selfStart, SSasymp


Chick.6 <- subset(ChickWeight, (Chick == 6) & (Time > 0))
SSweibull(Chick.6$Time, 160, 115, -5.5, 2.5)   # response only
local({ Asym <- 160; Drop <- 115; lrc <- -5.5; pwr <- 2.5
  SSweibull(Chick.6$Time, Asym, Drop, lrc, pwr) # response _and_ gradient
})

getInitial(weight ~ SSweibull(Time, Asym, Drop, lrc, pwr), data = Chick.6)
## Initial values are in fact the converged values
fm1 <- nls(weight ~ SSweibull(Time, Asym, Drop, lrc, pwr), data = Chick.6)

## Data and Fit:
plot(weight ~ Time, Chick.6, xlim = c(0, 21), main = "SSweibull() fit to Chick.6")
ux <- par("usr")[1:2]; x <- seq(ux[1], ux[2], length.out=250)
lines(x, do.call(SSweibull, c(list(x=x), coef(fm1))), col = "red", lwd=2)
As <- coef(fm1)[["Asym"]]; abline(v = 0, h = c(As, As - coef(fm1)[["Drop"]]), lty = 3)

Encode the Terminal Times of Time Series


Extract and encode the times the first and last observations were taken. Provided only for compatibility with S version 2.


start(x, ...)
end(x, ...)



a univariate or multivariate time-series, or a vector or matrix.


extra arguments for future methods.


These are generic functions, which will use the tsp attribute of x if it exists. Their default methods decode the start time from the original time units, so that for a monthly series 1995.5 is represented as c(1995, 7). For a series of frequency f, time n+i/f is presented as c(n, i+1) (even for i = 0 and f = 1).
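The encoding described above can be seen on small constructed series. The series below are made up for illustration.

```r
## Monthly series starting in July 1995, i.e. at time 1995.5
x <- ts(1:24, start = c(1995, 7), frequency = 12)
t0 <- tsp(x)[1]      # start time in original units
s  <- start(x)       # (year, month) encoding of the first observation
e  <- end(x)         # (year, month) encoding of the last observation

## Annual series: frequency 1, so the second element is always 1
y  <- ts(1:3, start = 2000)
sy <- start(y)
ey <- end(y)
```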


The representation used by start and end has no meaning unless the frequency is supplied.

See Also

ts, time, tsp.

GLM ANOVA Statistics


This is a utility function, used in lm and glm methods for anova(..., test != NULL) and should not be used by the average user.


stat.anova(table, test = c("Rao","LRT", "Chisq", "F", "Cp"),
           scale, df.scale, n)



numeric matrix as results from anova.glm(..., test = NULL).


a character string, partially matching one of "Rao", "LRT", "Chisq", "F" or "Cp".


a residual mean square or other scale estimate to be used as the denominator in an F test.


degrees of freedom corresponding to scale.


number of observations.


A matrix which is the original table, augmented by a column of test statistics, depending on the test argument.


Hastie, T. J. and Pregibon, D. (1992) Generalized linear models. Chapter 6 of Statistical Models in S eds J. M. Chambers and T. J. Hastie, Wadsworth & Brooks/Cole.

See Also

anova.lm, anova.glm.


##-- Continued from '?glm':

print(ag <- anova(glm.D93))
stat.anova(ag$table, test = "Cp",
           scale = sum(resid(glm.D93, "pearson")^2)/4,
           df.scale = 4, n = 9)

Deprecated Functions in Package stats


These functions are provided for compatibility with older versions of R only, and may be defunct as soon as the next release.


There are currently no deprecated functions in this package.

See Also


Choose a model by AIC in a Stepwise Algorithm


Select a formula-based model by AIC.


step(object, scope, scale = 0,
     direction = c("both", "backward", "forward"),
     trace = 1, keep = NULL, steps = 1000, k = 2, ...)



an object representing a model of an appropriate class (mainly "lm" and "glm"). This is used as the initial model in the stepwise search.


defines the range of models examined in the stepwise search. This should be either a single formula, or a list containing components upper and lower, both formulae. See the details for how to specify the formulae and how they are used.


used in the definition of the AIC statistic for selecting the models, currently only for lm, aov and glm models. The default value, 0, indicates the scale should be estimated: see extractAIC.


the mode of stepwise search, can be one of "both", "backward", or "forward", with a default of "both". If the scope argument is missing the default for direction is "backward". Values can be abbreviated.


if positive, information is printed during the running of step. Larger values may give more detailed information.


a filter function whose input is a fitted model object and the associated AIC statistic, and whose output is arbitrary. Typically keep will select a subset of the components of the object and return them. The default is not to keep anything.


the maximum number of steps to be considered. The default is 1000 (essentially as many as required). It is typically used to stop the process early.


the multiple of the number of degrees of freedom used for the penalty. Only k = 2 gives the genuine AIC: k = log(n) is sometimes referred to as BIC or SBC.


any additional arguments to extractAIC.


step uses add1 and drop1 repeatedly; it will work for any method for which they work, and that is determined by having a valid method for extractAIC. When the additive constant can be chosen so that AIC is equal to Mallows' $C_p$, this is done and the tables are labelled appropriately.

The set of models searched is determined by the scope argument. The right-hand side of its lower component is always included in the model, and the right-hand side of the model is included in the upper component. If scope is a single formula, it specifies the upper component, and the lower model is empty. If scope is missing, the initial model is used as the upper model.

Models specified by scope can be templates to update object as used by update.formula. So using . in a scope formula means ‘what is already there’, with .^2 indicating all interactions of existing terms.

There is a potential problem in using glm fits with a variable scale, as in that case the deviance is not simply related to the maximized log-likelihood. The "glm" method for function extractAIC makes the appropriate adjustment for a gaussian family, but may need to be amended for other cases. (The binomial and poisson families have fixed scale by default and do not correspond to a particular maximum-likelihood problem for variable scale.)
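A short sketch of how k and scope interact, using the swiss data. The choice of Education as the term forced into every model is an arbitrary assumption made for illustration.

```r
lm1 <- lm(Fertility ~ ., data = swiss)
n   <- nobs(lm1)

## Penalty k = log(n): the criterion sometimes referred to as BIC
s_bic <- step(lm1, k = log(n), trace = 0)

## step() only moves to a model that lowers the criterion, so the selected
## model can never score worse than the starting model
crit_start <- extractAIC(lm1,   k = log(n))[2]
crit_final <- extractAIC(s_bic, k = log(n))[2]

## Terms on the right-hand side of 'lower' are never dropped;
## upper = ~ . means "what is already there"
s_low <- step(lm1, scope = list(lower = ~ Education, upper = ~ .),
              trace = 0)
kept  <- attr(terms(s_low), "term.labels")

## The "anova" component records the steps taken
s_bic$anova
```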


the stepwise-selected model is returned, with up to two additional components. There is an "anova" component corresponding to the steps taken in the search, as well as a "keep" component if the keep= argument was supplied in the call. The "Resid. Dev" column of the analysis of deviance table refers to a constant minus twice the maximized log likelihood: it will be a deviance only in cases where a saturated model is well-defined (thus excluding lm, aov and survreg fits, for example).


The model fitting must apply the models to the same dataset. This may be a problem if there are missing values and R's default of na.action = na.omit is used. We suggest you remove the missing values first.

Calls to the function nobs are used to check that the number of observations involved in the fitting process remains unchanged.


This function differs considerably from the function in S, which uses a number of approximations and does not in general compute the correct AIC.

This is a minimal implementation. Use stepAIC in package MASS for a wider range of object classes.


B. D. Ripley: step is a slightly simplified version of stepAIC in package MASS (Venables & Ripley, 2002 and earlier editions).

The idea of a step function follows that described in Hastie & Pregibon (1992); but the implementation in R is more general.


Hastie, T. J. and Pregibon, D. (1992) Generalized linear models. Chapter 6 of Statistical Models in S eds J. M. Chambers and T. J. Hastie, Wadsworth & Brooks/Cole.

Venables, W. N. and Ripley, B. D. (2002) Modern Applied Statistics with S. New York: Springer (4th ed).

See Also

stepAIC in MASS, add1, drop1


## following on from example(lm)


summary(lm1 <- lm(Fertility ~ ., data = swiss))
slm1 <- step(lm1)

Step Functions - Creation and Class


Given the vectors $(x_1, \ldots, x_n)$ and $(y_0, y_1, \ldots, y_n)$ (one value more!), stepfun(x, y, ...) returns an interpolating ‘step’ function, say fn. I.e., $fn(t) = c_i$ (constant) for $t \in (x_i, x_{i+1})$ and at the abscissa values, if (by default) right = FALSE, $fn(x_i) = y_i$ and for right = TRUE, $fn(x_i) = y_{i-1}$, for $i = 1, \ldots, n$.

The value of the constant $c_i$ above depends on the ‘continuity’ parameter f. For the default, right = FALSE, f = 0, fn is a cadlag function, i.e., continuous from the right, limits from the left, so that the function is piecewise constant on intervals that include their left endpoint. In general, $c_i$ is interpolated in between the neighbouring $y$ values, $c_i = (1-f) y_i + f \cdot y_{i+1}$. Therefore, for non-0 values of f, fn may no longer be a proper step function, since it can be discontinuous from both sides, unless right = TRUE, f = 1 which is left-continuous (i.e., constant pieces contain their right endpoint).
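The difference between the right-continuous default and the left-continuous variant shows up only at the knots. A minimal sketch with made-up heights:

```r
y0 <- c(1, 2, 4, 3)                  # y_0, ..., y_3: one more value than knots
f_rc <- stepfun(1:3, y0)                 # default: right = FALSE, f = 0
f_lc <- stepfun(1:3, y0, right = TRUE)   # hence f = 1

## At the knots: fn(x_i) = y_i for right = FALSE, y_{i-1} for right = TRUE
at_rc <- f_rc(1:3)
at_lc <- f_lc(1:3)

## Away from the knots the two functions agree
between <- c(0.5, 1.5, 2.5, 3.5)
stopifnot(all.equal(f_rc(between), f_lc(between)))

## The knots themselves can be recovered from either function
k <- knots(f_rc)
```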


stepfun(x, y, f = as.numeric(right), ties = "ordered",
        right = FALSE)

knots(Fn, ...)
as.stepfun(x, ...)

## S3 method for class 'stepfun'
print(x, digits = getOption("digits") - 2, ...)

## S3 method for class 'stepfun'
summary(object, ...)



numeric vector giving the knots or jump locations of the step function for stepfun(). For the other functions, x is as object below.


numeric vector one longer than x, giving the heights of the function values between the x values.


a number between 0 and 1, indicating how interpolation outside the given x values should happen. See approxfun.


Handling of tied x values. Either a function or the string "ordered". See approxfun.


logical, indicating if the intervals should be closed on the right (and open on the left) or vice versa.

Fn, object

an R object inheriting from "stepfun".


number of significant digits to use, see print.


potentially further arguments (required by the generic).


A function of class "stepfun", say fn.

There are methods available for summarizing ("summary(.)"), representing ("print(.)") and plotting ("plot(.)", see plot.stepfun) "stepfun" objects.

The environment of fn contains all the information needed;

"x", "y"

the original arguments


number of knots (x values)


continuity parameter

"yleft", "yright"

the function values outside the knots


(always == "constant", from approxfun(.)).

The knots are also available via knots(fn).


The objects of class "stepfun" are not intended to be used for permanent storage and may change structure between versions of R (and did at R 3.0.0). They can usually be re-created by

    eval(attr(old_obj, "call"), environment(old_obj))

since the data used is stored as part of the object's environment.


Martin Maechler, with some basic code from Thomas Lumley.

See Also

ecdf for empirical distribution functions as special step functions and plot.stepfun for plotting step functions.

approxfun and splinefun.


y0 <- c(1., 2., 4., 3.)
sfun0  <- stepfun(1:3, y0, f = 0)
sfun.2 <- stepfun(1:3, y0, f = 0.2)
sfun1  <- stepfun(1:3, y0, f = 1)
sfun1c <- stepfun(1:3, y0, right = TRUE) # hence f=1

## look at the internal structure:
ls(envir = environment(sfun0))

x0 <- seq(0.5, 3.5, by = 0.25)
rbind(x = x0, f.f0 = sfun0(x0), f.f02 = sfun.2(x0),
      f.f1 = sfun1(x0), f.f1c = sfun1c(x0))
## Identities :
stopifnot(identical(y0[-1], sfun0 (1:3)), # right = FALSE
          identical(y0[-4], sfun1c(1:3))) # right = TRUE

Seasonal Decomposition of Time Series by Loess


Decompose a time series into seasonal, trend and irregular components using loess, acronym STL.


stl(x, s.window, s.degree = 0,
    t.window = NULL, t.degree = 1,
    l.window = nextodd(period), l.degree = t.degree,
    s.jump = ceiling(s.window/10),
    t.jump = ceiling(t.window/10),
    l.jump = ceiling(l.window/10),
    robust = FALSE,
    inner = if(robust)  1 else 2,
    outer = if(robust) 15 else 0,
    na.action = na.fail)



univariate time series to be decomposed. This should be an object of class "ts" with a frequency greater than one.


either the character string "periodic" or the span (in lags) of the loess window for seasonal extraction, which should be odd and at least 7, according to Cleveland et al. This has no default.

degree of locally-fitted polynomial in seasonal extraction. Should be zero or one.


the span (in lags) of the loess window for trend extraction, which should be odd. If NULL, the default, nextodd(ceiling((1.5*period) / (1-(1.5/s.window)))), is taken.

degree of locally-fitted polynomial in trend extraction. Should be zero or one.


the span (in lags) of the loess window of the low-pass filter used for each subseries. Defaults to the smallest odd integer greater than or equal to frequency(x) which is recommended since it prevents competition between the trend and seasonal components. If not an odd integer its given value is increased to the next odd one.

degree of locally-fitted polynomial for the subseries low-pass filter. Must be 0 or 1.

s.jump, t.jump, l.jump

integers at least one to increase speed of the respective smoother. Linear interpolation happens between every *.jump-th value.


logical indicating if robust fitting be used in the loess procedure.


integer; the number of ‘inner’ (backfitting) iterations; usually very few (2) iterations suffice.


integer; the number of ‘outer’ robustness iterations.


action on missing values.


The seasonal component is found by loess smoothing the seasonal sub-series (the series of all January values, ...); if s.window = "periodic" smoothing is effectively replaced by taking the mean. The seasonal values are removed, and the remainder smoothed to find the trend. The overall level is removed from the seasonal component and added to the trend component. This process is iterated a few times. The remainder component is the residuals from the seasonal plus trend fit.

Several methods are available for the resulting class "stl" objects; see plot.stl.
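Two consequences of the description above can be checked directly on a fitted object: the three components add back up to the series, and with s.window = "periodic" the seasonal component repeats from one period to the next. A sketch using the nottem series:

```r
fit  <- stl(nottem, s.window = "periodic")
comp <- fit$time.series          # columns seasonal, trend and remainder

## seasonal + trend + remainder reproduces the original series
recon <- rowSums(comp)
stopifnot(all.equal(as.numeric(recon), as.numeric(nottem)))

## With a periodic seasonal window, every year has the same seasonal values
seas <- as.numeric(comp[, "seasonal"])
stopifnot(all.equal(seas[1:12], seas[13:24]))
```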


stl returns an object of class "stl" with components


a multiple time series with columns seasonal, trend and remainder.


the final robust weights (all one if fitting is not done robustly).


the matched call.


integer (length 3 vector) with the spans used for the "s", "t", and "l" smoothers.


integer (length 3) vector with the polynomial degrees for these smoothers.


integer (length 3) vector with the ‘jumps’ (skips) used for these smoothers.


number of inner iterations


number of outer robustness iterations


B.D. Ripley; Fortran code by Cleveland et al. (1990) from ‘netlib’.


R. B. Cleveland, W. S. Cleveland, J.E. McRae, and I. Terpenning (1990) STL: A Seasonal-Trend Decomposition Procedure Based on Loess. Journal of Official Statistics, 6, 3–73.

See Also

plot.stl for stl methods; loess in package stats (which is not actually used in stl).

StructTS for a different kind of decomposition.



plot(stl(nottem, "per"))
plot(stl(nottem, s.window = 7, t.window = 50, t.jump = 1))

plot(stllc <- stl(log(co2), s.window = 21))
## linear trend, strict period.
plot(stl(log(co2), s.window = "per", t.window = 1000))

## Two STL plotted side by side :
        stmd <- stl(mdeaths, s.window = "per") # non-robust
summary(stmR <- stl(mdeaths, s.window = "per", robust = TRUE))
op <- par(mar = c(0, 4, 0, 3), oma = c(5, 0, 4, 0), mfcol = c(4, 2))
plot(stmd, set.pars = NULL, labels  =  NULL,
     main = "stl(mdeaths, s.w = \"per\",  robust = FALSE / TRUE )")
plot(stmR, set.pars = NULL)
# mark the 'outliers' :
(iO <- which(stmR $ weights  < 1e-8)) # 10 were considered outliers
sts <- stmR$time.series
points(time(sts)[iO], 0.8* sts[,"remainder"][iO], pch = 4, col = "red")
par(op)   # reset

Methods for STL Objects


Methods for objects of class stl, typically the result of stl. The plot method does a multiple figure plot with some flexibility.

There are also (non-visible) print and summary methods.


## S3 method for class 'stl'
plot(x, labels = colnames(X), set.pars = list(mar = c(0, 6, 0, 6), oma = c(6, 0, 4, 0),
                     tck = -0.01, mfrow = c(nplot, 1)),
     main = NULL, range.bars = TRUE, ...,
     col.range = "light gray")



stl object.


character of length 4 giving the names of the component time-series.

settings for par(.) when setting up the plot.


plot main title.


logical indicating if each plot should have a bar at its right side; the bars are of equal height in user coordinates.


further arguments passed to or from other methods.


colour to be used for the range bars, if plotted. Note this appears after ... and so cannot be abbreviated.

See Also

plot.ts and stl, particularly for examples.

Fit Structural Time Series


Fit a structural model for a time series by maximum likelihood.


StructTS(x, type = c("level", "trend", "BSM"), init = NULL,
         fixed = NULL, optim.control = NULL)



a univariate numeric time series. Missing values are allowed.


the class of structural model. If omitted, a BSM is used for a time series with frequency(x) > 1, and a local trend model otherwise. Can be abbreviated.


initial values of the variance parameters.


optional numeric vector of the same length as the total number of parameters. If supplied, only NA entries in fixed will be varied. Probably most useful for setting variances to zero.


List of control parameters for optim. Method "L-BFGS-B" is used.


Structural time series models are (linear Gaussian) state-space models for (univariate) time series based on a decomposition of the series into a number of components. They are specified by a set of error variances, some of which may be zero.

The simplest model is the local level model specified by type = "level". This has an underlying level $\mu_t$ which evolves by

$$\mu_{t+1} = \mu_t + \xi_t, \qquad \xi_t \sim N(0, \sigma^2_\xi)$$

The observations are

$$x_t = \mu_t + \epsilon_t, \qquad \epsilon_t \sim N(0, \sigma^2_\epsilon)$$

There are two parameters, $\sigma^2_\xi$ and $\sigma^2_\epsilon$. It is an ARIMA(0,1,1) model, but with restrictions on the parameter set.

The local linear trend model, type = "trend", has the same measurement equation, but with a time-varying slope in the dynamics for $\mu_t$, given by

$$\mu_{t+1} = \mu_t + \nu_t + \xi_t, \qquad \xi_t \sim N(0, \sigma^2_\xi)$$

$$\nu_{t+1} = \nu_t + \zeta_t, \qquad \zeta_t \sim N(0, \sigma^2_\zeta)$$

with three variance parameters. It is not uncommon to find $\sigma^2_\zeta = 0$ (which reduces to the local level model) or $\sigma^2_\xi = 0$, which ensures a smooth trend. This is a restricted ARIMA(0,2,2) model.

The basic structural model, type = "BSM", is a local trend model with an additional seasonal component. Thus the measurement equation is

$$x_t = \mu_t + \gamma_t + \epsilon_t, \qquad \epsilon_t \sim N(0, \sigma^2_\epsilon)$$

where $\gamma_t$ is a seasonal component with dynamics

$$\gamma_{t+1} = -(\gamma_t + \cdots + \gamma_{t-s+2}) + \omega_t, \qquad \omega_t \sim N(0, \sigma^2_\omega)$$

The boundary case $\sigma^2_\omega = 0$ corresponds to a deterministic (but arbitrary) seasonal pattern. (This is sometimes known as the ‘dummy variable’ version of the BSM.)
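To connect the notation to a fitted object, the sketch below fits the local level model to the Nile series and inspects the two estimated variances. The mapping of the component names to the variances of the level disturbance and the observation error is how the printed output is usually read.

```r
fit <- StructTS(Nile, type = "level")

## Two variance parameters: one for the level disturbance xi_t,
## one for the observation error epsilon_t
v <- fit$coef
names(v)

## Variances are constrained to be non-negative
stopifnot(all(v >= 0))

## Estimated level and standardized residuals, one value per observation
lev <- fitted(fit)
res <- residuals(fit)
```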


A list of class "StructTS" with components:


the estimated variances of the components.


the maximized log-likelihood. Note that as all these models are non-stationary this includes a diffuse prior for some observations and hence is not comparable to arima nor different types of structural models.


the maximized log-likelihood with the constant used prior to R 3.0.0, for backwards compatibility.


the time series x.


the standardized residuals.


a multiple time series with one component for the level, slope and seasonal components, estimated contemporaneously (that is at time $t$ and not at the end of the series).


the matched call.


the name of the series x.


the convergence code returned by optim.

model, model0

Lists representing the Kalman filter used in the fitting. See KalmanLike. model0 is the initial state of the filter, model its final state.


the tsp attributes of x.


Optimization of structural models is a lot harder than many of the references admit. For example, the AirPassengers data are considered in Brockwell & Davis (1996): their solution appears to be a local maximum, but nowhere near as good a fit as that produced by StructTS. It is quite common to find fits with one or more variances zero, and this can include $\sigma^2_\epsilon$.


Brockwell, P. J. & Davis, R. A. (1996). Introduction to Time Series and Forecasting. Springer, New York. Sections 8.2 and 8.5.

Durbin, J. and Koopman, S. J. (2001) Time Series Analysis by State Space Methods. Oxford University Press.

Harvey, A. C. (1989) Forecasting, Structural Time Series Models and the Kalman Filter. Cambridge University Press.

Harvey, A. C. (1993) Time Series Models. 2nd Edition, Harvester Wheatsheaf.

See Also

KalmanLike, tsSmooth; stl for a different kind of (seasonal) decomposition.


## see also JohnsonJohnson, Nile and AirPassengers

trees <- window(treering, start = 0)
(fit <- StructTS(trees, type = "level"))
plot(trees) # open a plot first so that lines() has something to add to
lines(fitted(fit), col = "green")

(fit <- StructTS(log10(UKgas), type = "BSM"))
par(mfrow = c(4, 1)) # to give appropriate aspect ratio for next plot.
plot(cbind(fitted(fit), resids=resid(fit)), main = "UK gas consumption")

## keep some parameters fixed; trace optimizer:
StructTS(log10(UKgas), type = "BSM", fixed = c(0.1,0.001,NA,NA),
         optim.control = list(trace = TRUE))

Summarize an Analysis of Variance Model


Summarize an analysis of variance model.


## S3 method for class 'aov'
summary(object, intercept = FALSE, split,
        expand.split = TRUE, keep.zero.df = TRUE, ...)

## S3 method for class 'aovlist'
summary(object, ...)



An object of class "aov" or "aovlist".


logical: should intercept terms be included?


an optional named list, with names corresponding to terms in the model. Each component is itself a list with integer components giving contrasts whose contributions are to be summed.


logical: should the split apply also to interactions involving the factor?

logical: should terms with no degrees of freedom be included?


Arguments to be passed to or from other methods, for summary.aovlist including those for summary.aov.


An object of class c("summary.aov", "listof") or "summary.aovlist" respectively.

For fits with a single stratum the result will be a list of ANOVA tables, one for each response (even if there is only one response): the tables are of class "anova" inheriting from class "data.frame". They have columns "Df", "Sum Sq", "Mean Sq", as well as "F value" and "Pr(>F)" if there are non-zero residual degrees of freedom. There is a row for each term in the model, plus one for "Residuals" if there are any.

For multistratum fits the return value is a list of such summaries, one for each stratum.
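As a rough illustration of the return structure described above, the following sketch fits a single-stratum and a multistratum model and inspects the classes of the results. The npk data set and the object names are chosen here only for illustration.

```r
## Illustration only: npk is a data set shipped with R.
npk.aov <- aov(yield ~ block + N * P, data = npk)
s <- summary(npk.aov)

class(s)          # the summary object, a list of tables
tab <- s[[1]]     # the ANOVA table for the single response
class(tab)        # "anova", inheriting from "data.frame"
colnames(tab)     # Df, Sum Sq, Mean Sq, F value, Pr(>F)
rownames(tab)     # one row per term, plus Residuals

## A multistratum fit gives one such summary per stratum.
npk.aovE <- aov(yield ~ N * P * K + Error(block), data = npk)
sE <- summary(npk.aovE)
class(sE)
```

Note that the table is extracted with s[[1]] even though there is only one response.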


The use of expand.split = TRUE is little tested: it is always possible to set it to FALSE and specify exactly all the splits required.

See Also

aov, summary, model.tables, TukeyHSD


## For a simple example see example(aov)

# Cochran and Cox (1957, p.164)
# 3x3 factorial with ordered factors, each is average of 12.
CC <- data.frame(
    y = c(449, 413, 326, 409, 358, 291, 341, 278, 312)/12,
    P = ordered(gl(3, 3)), N = ordered(gl(3, 1, 9))
)
CC.aov <- aov(y ~ N * P, data = CC , weights = rep(12, 9))

# Split both main effects into linear and quadratic parts.
summary(CC.aov, split = list(N = list(L = 1, Q = 2),
                             P = list(L = 1, Q = 2)))

# Split only the interaction
summary(CC.aov, split = list("N:P" = list(L.L = 1, Q = 2:4)))

# split on just one var
summary(CC.aov, split = list(P = list(lin = 1, quad = 2)))
summary(CC.aov, split = list(P = list(lin = 1, quad = 2)),
        expand.split = FALSE)

Summarizing Generalized Linear Model Fits


These functions are all methods for class glm or summary.glm objects.


## S3 method for class 'glm'
summary(object, dispersion = NULL, correlation = FALSE,
        symbolic.cor = FALSE, ...)

## S3 method for class 'summary.glm'
print(x, digits = max(3, getOption("digits") - 3),
      symbolic.cor = x$symbolic.cor,
      signif.stars = getOption("show.signif.stars"),
      show.residuals = FALSE, ...)



an object of class "glm", usually, a result of a call to glm.


an object of class "summary.glm", usually, a result of a call to summary.glm.


the dispersion parameter for the family used. Either a single numerical value or NULL (the default), when it is inferred from object (see ‘Details’).


logical; if TRUE, the correlation matrix of the estimated parameters is returned and printed.


the number of significant digits to use when printing.


logical. If TRUE, print the correlations in a symbolic form (see symnum) rather than as numbers.


logical. If TRUE, ‘significance stars’ are printed for each coefficient.


logical. If TRUE then a summary of the deviance residuals is printed at the head of the output.


further arguments passed to or from other methods.


print.summary.glm tries to be smart about formatting the coefficients, standard errors, etc. and additionally gives ‘significance stars’ if signif.stars is TRUE. The coefficients component of the result gives the estimated coefficients and their estimated standard errors, together with their ratio. This third column is labelled t ratio if the dispersion is estimated, and z ratio if the dispersion is known (or fixed by the family). A fourth column gives the two-tailed p-value corresponding to the t or z ratio based on a Student t or Normal reference distribution. (It is possible that the dispersion is not known and there are no residual degrees of freedom from which to estimate it. In that case the estimate is NaN.)

Aliased coefficients are omitted in the returned object but restored by the print method.

Correlations are printed to two decimal places (or symbolically): to see the actual correlations print summary(object)$correlation directly.

The dispersion of a GLM is not used in the fitting process, but it is needed to find standard errors. If dispersion is not supplied or NULL, the dispersion is taken as 1 for the binomial and Poisson families, and otherwise estimated by the residual Chi-squared statistic (calculated from cases with non-zero weights) divided by the residual degrees of freedom.

summary can be used with Gaussian glm fits to handle the case of a linear regression with known error variance, something not handled by summary.lm.
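The dispersion rules above can be checked by hand. The sketch below is only an illustration: the count data are those used in example(glm), and the object names and the value 4 for the known error variance are arbitrary choices made here.

```r
## Illustration only: count data as used in example(glm).
counts    <- c(18, 17, 15, 20, 10, 20, 25, 13, 12)
outcome   <- gl(3, 1, 9)
treatment <- gl(3, 3)

## Poisson family: the dispersion is taken as 1.
fit.pois <- glm(counts ~ outcome + treatment, family = poisson())
summary(fit.pois)$dispersion

## Quasi-Poisson family: the dispersion is estimated as the
## Pearson chi-squared statistic divided by the residual df.
fit.quasi <- glm(counts ~ outcome + treatment, family = quasipoisson())
disp.hand <- sum(residuals(fit.quasi, type = "pearson")^2) /
    df.residual(fit.quasi)
all.equal(disp.hand, summary(fit.quasi)$dispersion)

## Gaussian fit with an error variance assumed known (here 4).
fit.gauss <- glm(dist ~ speed, data = cars, family = gaussian())
s.known <- summary(fit.gauss, dispersion = 4)
s.known$dispersion
```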


summary.glm returns an object of class "summary.glm", a list with components


call

the component from object.


family

the component from object.


deviance

the component from object.


contrasts

the component from object.


df.residual

the component from object.


null.deviance

the component from object.


df.null

the component from object.


deviance.resid

the deviance residuals: see residuals.glm.


coefficients

the matrix of coefficients, standard errors, z-values and p-values. Aliased coefficients are omitted.


aliased

named logical vector showing if the original coefficients are aliased.


dispersion

either the supplied argument or the inferred/estimated dispersion if the former is NULL.


df

a 3-vector of the rank of the model and the number of residual degrees of freedom, plus number of coefficients (including aliased ones).


cov.unscaled

the unscaled (dispersion = 1) estimated covariance matrix of the estimated coefficients.


cov.scaled

ditto, scaled by dispersion.


correlation

(only if correlation is true.) The estimated correlations of the estimated coefficients.


symbolic.cor

(only if correlation is true.) The value of the argument symbolic.cor.

See Also

glm, summary.


## For examples see example(glm)

Summarizing Linear Model Fits


summary method for class "lm".


## S3 method for class 'lm'
summary(object, correlation = FALSE, symbolic.cor = FALSE, ...)

## S3 method for class 'summary.lm'
print(x, digits = max(3, getOption("digits") - 3),
      symbolic.cor = x$symbolic.cor,
      signif.stars = getOption("show.signif.stars"), ...)



an object of class "lm", usually, a result of a call to lm.


an object of class "summary.lm", usually, a result of a call to summary.lm.


logical; if TRUE, the correlation matrix of the estimated parameters is returned and printed.


the number of significant digits to use when printing.


logical. If TRUE, print the correlations in a symbolic form (see symnum) rather than as numbers.


logical. If TRUE, ‘significance stars’ are printed for each coefficient.


further arguments passed to or from other methods.


print.summary.lm tries to be smart about formatting the coefficients, standard errors, etc. and additionally gives ‘significance stars’ if signif.stars is TRUE.

Aliased coefficients are omitted in the returned object but restored by the print method.

Correlations are printed to two decimal places (or symbolically): to see the actual correlations print summary(object)$correlation directly.


The function summary.lm computes and returns a list of summary statistics of the fitted linear model given in object, using the components (list elements) "call" and "terms" from its argument, plus


the weighted residuals, the usual residuals rescaled by the square root of the weights specified in the call to lm.


a $p \times 4$ matrix with columns for the estimated coefficient, its standard error, t-statistic and corresponding (two-sided) p-value. Aliased coefficients are omitted.


named logical vector showing if the original coefficients are aliased.


the square root of the estimated variance of the random error

$$\hat\sigma^2 = \frac{1}{n-p}\sum_i{w_i R_i^2},$$

where $R_i$ is the $i$-th residual, residuals[i].


degrees of freedom, a 3-vector $(p, n-p, p*)$, the first being the number of non-aliased coefficients, the last being the total number of coefficients.


(for models including non-intercept terms) a 3-vector with the value of the F-statistic with its numerator and denominator degrees of freedom.


$R^2$, the ‘fraction of variance explained by the model’,

$$R^2 = 1 - \frac{\sum_i{R_i^2}}{\sum_i(y_i- y^*)^2},$$

where $y^*$ is the mean of $y_i$ if there is an intercept and zero otherwise.


the above $R^2$ statistic ‘adjusted’, penalizing for higher $p$.


a $p \times p$ matrix of (unscaled) covariances of the $\hat\beta_j$, $j=1, \dots, p$.


the correlation matrix corresponding to the above cov.unscaled, if correlation = TRUE is specified.


(only if correlation is true.) The value of the argument symbolic.cor.


from object, if present there.
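The formulas for the residual standard error and for R-squared given above can be reproduced by hand. This is a minimal sketch for an unweighted fit with an intercept; the cars data set and the names sigma.hand and r2.hand are chosen here for illustration.

```r
## Illustration only: cars is a data set shipped with R.
fit <- lm(dist ~ speed, data = cars)
s   <- summary(fit)

R <- residuals(fit)     # unweighted fit, so all weights w_i are 1
n <- length(R)
p <- length(coef(fit))  # no aliased coefficients in this model

sigma.hand <- sqrt(sum(R^2) / (n - p))
r2.hand    <- 1 - sum(R^2) / sum((cars$dist - mean(cars$dist))^2)

all.equal(sigma.hand, s$sigma)
all.equal(r2.hand, s$r.squared)
s$df                    # (p, n - p, p*)
```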

See Also

The model fitting function lm, summary.

Function coef will extract the matrix of coefficients with standard errors, t-statistics and p-values.


##-- Continuing the  lm(.) example:
coef(lm.D90)  # the bare coefficients
sld90 <- summary(lm.D90 <- lm(weight ~ group -1))  # omitting intercept
coef(sld90)  # much more

## model with *aliased* coefficient:
lm.D9. <- lm(weight ~ group + I(group != "Ctl"))
Sm.D9. <- summary(lm.D9.)
Sm.D9. #  shows the NA NA NA NA  line
stopifnot(length(cc <- coef(lm.D9.)) == 3, is.na(cc[3]),
          dim(coef(Sm.D9.)) == c(2,4), Sm.D9.$df == c(2, 18, 3))

Summary Method for Multivariate Analysis of Variance


A summary method for class "manova".


## S3 method for class 'manova'
summary(object,
        test = c("Pillai", "Wilks", "Hotelling-Lawley", "Roy"),
        intercept = FALSE, tol = 1e-7, ...)



An object of class "manova" or an aov object with multiple responses.


The name of the test statistic to be used. Partial matching is used so the name can be abbreviated.


logical. If TRUE, the intercept term is included in the table.


tolerance to be used in deciding if the residuals are rank-deficient: see qr.


further arguments passed to or from other methods.


The summary.manova method uses a multivariate test statistic for the summary table. Wilks' statistic is most popular in the literature, but the default Pillai–Bartlett statistic is recommended by Hand and Taylor (1987).

The table gives a transformation of the test statistic which has approximately an F distribution. The approximations used follow S-PLUS and SAS (the latter apart from some cases of the Hotelling–Lawley statistic), but many other distributional approximations exist: see Anderson (1984) and Krzanowski and Marriott (1994) for further references. All four approximate F statistics are the same when the term being tested has one degree of freedom, but in other cases that for the Roy statistic is an upper bound.

The tolerance tol is applied to the QR decomposition of the residual correlation matrix (unless some response has essentially zero residuals, when it is unscaled). Thus the default value guards against very highly correlated responses: it can be reduced but doing so will allow rather inaccurate results and it will normally be better to transform the responses to remove the high correlation.


An object of class "summary.manova". If there is a positive residual degrees of freedom, this is a list with components


The names of the terms, the row names of the stats table if present.


A named list of sums of squares and product matrices.


A matrix of eigenvalues.


A matrix of the statistics, approximate F value, degrees of freedom and P value.

otherwise components row.names, SS and Df (degrees of freedom) for the terms (and not the residuals).


Anderson, T. W. (1984) An Introduction to Multivariate Statistical Analysis. Wiley.

Hand, D. J. and Taylor, C. C. (1987) Multivariate Analysis of Variance and Repeated Measures. Chapman and Hall.

Krzanowski, W. J. (1988) Principles of Multivariate Analysis. A User's Perspective. Oxford.

Krzanowski, W. J. and Marriott, F. H. C. (1994) Multivariate Analysis. Part I: Distributions, Ordination and Inference. Edward Arnold.

See Also

manova, aov


## Example on producing plastic film from Krzanowski (1988, p. 381)
tear <- c(6.5, 6.2, 5.8, 6.5, 6.5, 6.9, 7.2, 6.9, 6.1, 6.3,
          6.7, 6.6, 7.2, 7.1, 6.8, 7.1, 7.0, 7.2, 7.5, 7.6)
gloss <- c(9.5, 9.9, 9.6, 9.6, 9.2, 9.1, 10.0, 9.9, 9.5, 9.4,
           9.1, 9.3, 8.3, 8.4, 8.5, 9.2, 8.8, 9.7, 10.1, 9.2)
opacity <- c(4.4, 6.4, 3.0, 4.1, 0.8, 5.7, 2.0, 3.9, 1.9, 5.7,
             2.8, 4.1, 3.8, 1.6, 3.4, 8.4, 5.2, 6.9, 2.7, 1.9)
Y <- cbind(tear, gloss, opacity)
rate     <- gl(2,10, labels = c("Low", "High"))
additive <- gl(2, 5, length = 20, labels = c("Low", "High"))

fit <- manova(Y ~ rate * additive)
summary.aov(fit)             # univariate ANOVA tables
summary(fit, test = "Wilks") # ANOVA table of Wilks' lambda
summary(fit)                # same F statistics as single-df terms

Summarizing Non-Linear Least-Squares Model Fits


summary method for class "nls".


## S3 method for class 'nls'
summary(object, correlation = FALSE, symbolic.cor = FALSE, ...)

## S3 method for class 'summary.nls'
print(x, digits = max(3, getOption("digits") - 3),
      symbolic.cor = x$symbolic.cor,
      signif.stars = getOption("show.signif.stars"), ...)



an object of class "nls".


an object of class "summary.nls", usually the result of a call to summary.nls.


logical; if TRUE, the correlation matrix of the estimated parameters is returned and printed.


the number of significant digits to use when printing.


logical. If TRUE, print the correlations in a symbolic form (see symnum) rather than as numbers.


logical. If TRUE, ‘significance stars’ are printed for each coefficient.


further arguments passed to or from other methods.


The distribution theory used to find the distribution of the standard errors and of the residual standard error (for t ratios) is based on linearization and is approximate, maybe very approximate.

print.summary.nls tries to be smart about formatting the coefficients, standard errors, etc. and additionally gives ‘significance stars’ if signif.stars is TRUE.

Correlations are printed to two decimal places (or symbolically): to see the actual correlations print summary(object)$correlation directly.


The function summary.nls computes and returns a list of summary statistics of the fitted model given in object, using the component "formula" from its argument, plus


the weighted residuals, the usual residuals rescaled by the square root of the weights specified in the call to nls.


a $p \times 4$ matrix with columns for the estimated coefficient, its standard error, t-statistic and corresponding (two-sided) p-value.


the square root of the estimated variance of the random error

$$\hat\sigma^2 = \frac{1}{n-p}\sum_i{R_i^2},$$

where $R_i$ is the $i$-th weighted residual.


degrees of freedom, a 2-vector $(p, n-p)$. (Here and elsewhere $n$ omits observations with zero weights.)


a $p \times p$ matrix of (unscaled) covariances of the parameter estimates.


the correlation matrix corresponding to the above cov.unscaled, if correlation = TRUE is specified and there are a non-zero number of residual degrees of freedom.


(only if correlation is true.) The value of the argument symbolic.cor.

See Also

The model fitting function nls, summary.

Function coef will extract the matrix of coefficients with standard errors, t-statistics and p-values.
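A short sketch of how the components described above relate to a fitted model. The logistic fit to the first run of the DNase data follows example(nls); the remaining object names are chosen here for illustration.

```r
## Illustration only: first run of the DNase data, as in example(nls).
DNase1 <- subset(DNase, Run == 1)
fm <- nls(density ~ SSlogis(log(conc), Asym, xmid, scal), data = DNase1)
s  <- summary(fm)

coef(s)       # p x 4 matrix: estimate, std. error, t value, p-value
s$df          # (p, n - p)

## Residual standard error recomputed from the (unweighted) residuals.
R <- residuals(fm)
n <- length(R)
p <- length(coef(fm))
sigma.hand <- sqrt(sum(R^2) / (n - p))
all.equal(sigma.hand, s$sigma)
```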

Summary method for Principal Components Analysis


The summary method for class "princomp".


## S3 method for class 'princomp'
summary(object, loadings = FALSE, cutoff = 0.1, ...)

## S3 method for class 'summary.princomp'
print(x, digits = 3, loadings = x$print.loadings,
      cutoff = x$cutoff, ...)



an object of class "princomp", as from princomp().


logical. Should loadings be included?


numeric. Loadings below this cutoff in absolute value are shown as blank in the output.


an object of class "summary.princomp".


the number of significant digits to be used in listing loadings.


arguments to be passed to or from other methods.


object with additional components cutoff and print.loadings.

See Also

princomp



summary(princomp(USArrests, cor = TRUE))
## The signs of the loading columns are arbitrary
print(summary(princomp(USArrests, cor = TRUE),
              loadings = TRUE, cutoff = 0.2), digits = 2)

Friedman's SuperSmoother


Smooth the (x, y) values by Friedman's ‘super smoother’.


supsmu(x, y, wt =, span = "cv", periodic = FALSE, bass = 0, trace = FALSE)



x values for smoothing


y values for smoothing


case weights, by default all equal


the fraction of the observations in the span of the running lines smoother, or "cv" to choose this by leave-one-out cross-validation.


if TRUE, the x values are assumed to be in [0, 1] and of period 1.


controls the smoothness of the fitted curve. Values of up to 10 indicate increasing smoothness.


logical, if true, prints one line of info “per spar”, notably useful for "cv".


supsmu is a running lines smoother which chooses between three spans for the lines. The running lines smoothers are symmetric, with k/2 data points each side of the predicted point, and values of k as 0.5 * n, 0.2 * n and 0.05 * n, where n is the number of data points. If span is specified, a single smoother with span span * n is used.

The best of the three smoothers is chosen by cross-validation for each prediction. The best spans are then smoothed by a running lines smoother and the final prediction chosen by linear interpolation.

The FORTRAN code says: “For small samples (n < 40) or if there are substantial serial correlations between observations close in x-value, then a pre-specified fixed span smoother (span > 0) should be used. Reasonable span values are 0.2 to 0.4.”

Cases with non-finite values of x, y or wt are dropped, with a warning.


A list with components


the input values in increasing order with duplicates removed.


the corresponding y values on the fitted curve.
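The following sketch contrasts the default cross-validated span with a single fixed span and looks at the returned list. The cars data set and the span value 0.3 (inside the 0.2 to 0.4 range quoted above) are choices made here for illustration.

```r
## Illustration only: cars is a data set shipped with R.
fit.cv    <- with(cars, supsmu(speed, dist))              # span = "cv"
fit.fixed <- with(cars, supsmu(speed, dist, span = 0.3))  # one fixed span

str(fit.cv)              # a list with components x and y
length(fit.cv$x)         # duplicates in speed have been removed
length(cars$speed)
```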


Friedman, J. H. (1984) SMART User's Guide. Laboratory for Computational Statistics, Stanford University Technical Report No. 1.

Friedman, J. H. (1984) A variable span scatterplot smoother. Laboratory for Computational Statistics, Stanford University Technical Report No. 5.

See Also




with(cars, {
    plot(speed, dist)
    lines(supsmu(speed, dist))
    lines(supsmu(speed, dist, bass = 7), lty = 2)
})

Symbolic Number Coding


Symbolically encode a given numeric or logical vector or array. Particularly useful for visualization of structured matrices, e.g., correlation, sparse, or logical ones.


symnum(x, cutpoints = c(0.3, 0.6, 0.8, 0.9, 0.95),
       symbols = if(numeric.x) c(" ", ".", ",", "+", "*", "B")
                 else c(".", "|"),
       legend = length(symbols) >= 3,
       na = "?", eps = 1e-5, numeric.x = is.numeric(x),
       corr = missing(cutpoints) && numeric.x,
       show.max = if(corr) "1", show.min = NULL,
       abbr.colnames = has.colnames,
       lower.triangular = corr && is.numeric(x) && is.matrix(x),
       diag.lower.tri   = corr && !is.null(show.max))



numeric or logical vector or array.


numeric vector whose values cutpoints[j] $= c_j$ (after augmentation, see corr below) are used for intervals.


character vector, one shorter than (the augmented, see corr below) cutpoints. symbols[j] $= s_j$ are used as ‘code’ for the (half open) interval $(c_j,c_{j+1}]$.

When numeric.x is FALSE, i.e., by default when argument x is logical, the default is c(".","|") (graphical 0 / 1 s).


logical indicating if a "legend" attribute is desired.


character or logical. How NAs are coded. If na == FALSE, NAs are coded invisibly, including the "legend" attribute below, which otherwise mentions NA coding.


absolute precision to be used at left and right boundary.


logical indicating if x should be treated as numbers, otherwise as logical.


logical. If TRUE, x contains correlations. The cutpoints are augmented by 0 and 1 and abs(x) is coded.


if TRUE, or of mode character, the maximal cutpoint is coded especially.


if TRUE, or of mode character, the minimal cutpoint is coded especially.


logical, integer or NULL indicating how column names should be abbreviated (if they are); if NULL (or FALSE and x has no column names), the column names will all be empty, i.e., ""; otherwise if abbr.colnames is false, they are left unchanged. If TRUE or integer, existing column names will be abbreviated to abbreviate(*, minlength = abbr.colnames).


logical. If TRUE and x is a matrix, only the lower triangular part of the matrix is coded as non-blank.


logical. If lower.triangular and this are TRUE, the diagonal part of the matrix is shown.


An atomic character object of class noquote and the same dimensions as x.

If legend is TRUE (as by default when there are more than two classes), the result has an attribute "legend" containing a legend of the returned character codes, in the form

$$c_1 s_1 c_2 s_2 \dots s_n c_{n+1}$$

where $c_j$ = cutpoints[j] and $s_j$ = symbols[j].
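To make the interval coding concrete, the sketch below codes one value from each default interval. The input vector is made up for illustration; with the default arguments a numeric vector is treated as correlations, so the cutpoints are augmented by 0 and 1.

```r
## Illustration only: one value per default interval
## (0, 0.3], (0.3, 0.6], (0.6, 0.8], (0.8, 0.9], (0.9, 0.95], (0.95, 1].
x <- c(0.1, 0.45, 0.7, 0.85, 0.93, 0.99)
s <- symnum(x)

s                     # printed without quotes (class "noquote")
as.vector(s)          # the bare symbols, one per element of x
attr(s, "legend")     # cutpoints interleaved with symbols
```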


The optional (mostly logical) arguments all try to use smart defaults. Specifying them explicitly may lead to considerably improved output in many cases.


Martin Maechler

See Also

as.character; image


ii <- setNames(0:8, 0:8)
symnum(ii, cutpoints =  2*(0:4), symbols = c(".", "-", "+", "$"))
symnum(ii, cutpoints =  2*(0:4), symbols = c(".", "-", "+", "$"), show.max = TRUE)

symnum(1:12 %% 3 == 0)  # --> "|" = TRUE, "." = FALSE  for logical

## Pascal's Triangle modulo 2 -- odd and even numbers:
N <- 38
pascal <- t(sapply(0:N, function(n) round(choose(n, 0:N - (N-n)%/%2))))
rownames(pascal) <- rep("", 1+N) # <-- to improve "graphic"
symnum(pascal %% 2, symbols = c(" ", "A"), numeric.x = FALSE)

##-- Symbolic correlation matrices:
symnum(cor(attitude), diag.lower.tri = FALSE)
symnum(cor(attitude), abbr.colnames = NULL)
symnum(cor(attitude), abbr.colnames = FALSE)
symnum(cor(attitude), abbr.colnames = 2)

symnum(cor(rbind(1, rnorm(25), rnorm(25)^2)))
symnum(cor(matrix(rexp(30, 1), 5, 18))) # <<-- PATTERN ! --
symnum(cm1 <- cor(matrix(rnorm(90) ,  5, 18))) # < White Noise SMALL n
symnum(cm1, diag.lower.tri = FALSE)
symnum(cm2 <- cor(matrix(rnorm(900), 50, 18))) # < White Noise "BIG" n
symnum(cm2, lower.triangular = FALSE)

## NA's:
Cm <- cor(matrix(rnorm(60),  10, 6)); Cm[c(3,6), 2] <- NA
symnum(Cm, show.max = NULL)

## Graphical P-values (aka "significance stars"):
pval <- rev(sort(c(outer(1:6, 10^-(1:3)))))
symp <- symnum(pval, corr = FALSE,
               cutpoints = c(0,  .001,.01,.05, .1, 1),
               symbols = c("***","**","*","."," "))
noquote(cbind(P.val = format(pval), Signif = symp))

Student's t-Test


Performs one and two sample t-tests on vectors of data.


t.test(x, ...)

## Default S3 method:
t.test(x, y = NULL,
       alternative = c("two.sided", "less", "greater"),
       mu = 0, paired = FALSE, var.equal = FALSE,
       conf.level = 0.95, ...)

## S3 method for class 'formula'
t.test(formula, data, subset, na.action = na.pass, ...)



a (non-empty) numeric vector of data values.


an optional (non-empty) numeric vector of data values.


a character string specifying the alternative hypothesis, must be one of "two.sided" (default), "greater" or "less". You can specify just the initial letter.


a number indicating the true value of the mean (or difference in means if you are performing a two sample test).


a logical indicating whether you want a paired t-test.


a logical variable indicating whether to treat the two variances as being equal. If TRUE then the pooled variance is used to estimate the variance otherwise the Welch (or Satterthwaite) approximation to the degrees of freedom is used.


confidence level of the interval.


a formula of the form lhs ~ rhs where lhs is a numeric variable giving the data values and rhs either 1 for a one-sample or paired test or a factor with two levels giving the corresponding groups. If lhs is of class "Pair" and rhs is 1, a paired test is done, see Examples.


an optional matrix or data frame (or similar: see model.frame) containing the variables in the formula formula. By default the variables are taken from environment(formula).


an optional vector specifying a subset of observations to be used.


a function which indicates what should happen when the data contain NAs.


further arguments to be passed to or from methods. For the formula method, this includes arguments of the default method, but not paired.


alternative = "greater" is the alternative that x has a larger mean than y. For the one-sample case: that the mean is positive.

If paired is TRUE then both x and y must be specified and they must be the same length. Missing values are silently removed (in pairs if paired is TRUE). If var.equal is TRUE then the pooled estimate of the variance is used. By default, if var.equal is FALSE then the variance is estimated separately for both groups and the Welch modification to the degrees of freedom is used.

If the input data are effectively constant (compared to the larger of the two means) an error is generated.
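The difference between the pooled and the Welch analysis can be seen by comparing their degrees of freedom. This is a sketch using the mtcars data; the object names are chosen here for illustration.

```r
## Illustration only: mtcars is a data set shipped with R.
welch  <- t.test(mpg ~ am, data = mtcars)                   # default
pooled <- t.test(mpg ~ am, data = mtcars, var.equal = TRUE)

pooled$parameter   # n1 + n2 - 2 degrees of freedom
welch$parameter    # Welch approximation, usually non-integer

## The t-statistic is the difference in means (minus mu = 0)
## divided by the standard error.
t.hand <- unname((welch$estimate[1] - welch$estimate[2]) / welch$stderr)
all.equal(t.hand, unname(welch$statistic))
```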


A list with class "htest" containing the following components:


the value of the t-statistic.


the degrees of freedom for the t-statistic.


the p-value for the test.

a confidence interval for the mean appropriate to the specified alternative hypothesis.


the estimated mean or difference in means depending on whether it was a one-sample test or a two-sample test.


the specified hypothesized value of the mean or mean difference depending on whether it was a one-sample test or a two-sample test.


the standard error of the mean (difference), used as denominator in the t-statistic formula.


a character string describing the alternative hypothesis.


a character string indicating what type of t-test was performed.

a character string giving the name(s) of the data.

See Also



## Two-sample t-test
t.test(1:10, y = c(7:20))      # P = .00001855
t.test(1:10, y = c(7:20, 200)) # P = .1245    -- NOT significant anymore

## Traditional interface
with(mtcars, t.test(mpg[am == 0], mpg[am == 1]))

## Formula interface
t.test(mpg ~ am, data = mtcars)

## One-sample t-test
## Traditional interface
t.test(sleep$extra)

## Formula interface
t.test(extra ~ 1, data = sleep)

## Paired t-test
## The sleep data is actually paired, so could have been in wide format:
sleep2 <- reshape(sleep, direction = "wide",
                  idvar = "ID", timevar = "group")

## Traditional interface
t.test(sleep2$extra.1, sleep2$extra.2, paired = TRUE)

## Formula interface
t.test(Pair(extra.1, extra.2) ~ 1, data = sleep2)

The Student t Distribution


Density, distribution function, quantile function and random generation for the t distribution with df degrees of freedom (and optional non-centrality parameter ncp).


dt(x, df, ncp, log = FALSE)
pt(q, df, ncp, lower.tail = TRUE, log.p = FALSE)
qt(p, df, ncp, lower.tail = TRUE, log.p = FALSE)
rt(n, df, ncp)


x, q

vector of quantiles.


vector of probabilities.


number of observations. If length(n) > 1, the length is taken to be the number required.


degrees of freedom ($> 0$, maybe non-integer). df = Inf is allowed.


non-centrality parameter $\delta$; currently except for rt(), accurate only for abs(ncp) <= 37.62. If omitted, use the central t distribution.

log, log.p

logical; if TRUE, probabilities p are given as log(p).


logical; if TRUE (default), probabilities are $P[X \le x]$, otherwise, $P[X > x]$.


The $t$ distribution with df $= \nu$ degrees of freedom has density

$$f(x) = \frac{\Gamma ((\nu+1)/2)}{\sqrt{\pi \nu} \Gamma (\nu/2)} (1 + x^2/\nu)^{-(\nu+1)/2}$$

for all real $x$. It has mean $0$ (for $\nu > 1$) and variance $\frac{\nu}{\nu-2}$ (for $\nu > 2$).
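The density formula can be checked numerically against dt. In this sketch the degrees of freedom and the evaluation grid are arbitrary values picked for illustration.

```r
## Illustration only: arbitrary df and grid of x values.
nu <- 5
x  <- seq(-3, 3, by = 0.5)

f.hand <- gamma((nu + 1) / 2) / (sqrt(pi * nu) * gamma(nu / 2)) *
    (1 + x^2 / nu)^(-(nu + 1) / 2)

all.equal(f.hand, dt(x, df = nu))
```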

The general non-central $t$ with parameters $(\nu, \delta)$ = (df, ncp) is defined as the distribution of $T_{\nu}(\delta) := (U + \delta)/\sqrt{V/\nu}$ where $U$ and $V$ are independent random variables, $U \sim {\cal N}(0,1)$ and $V \sim \chi^2_\nu$ (see Chisquare).

The most used applications are power calculations for $t$-tests:
Let $T = \frac{\bar{X} - \mu_0}{S/\sqrt{n}}$ where $\bar{X}$ is the mean and $S$ the sample standard deviation (sd) of $X_1, X_2, \dots, X_n$ which are i.i.d. ${\cal N}(\mu, \sigma^2)$. Then $T$ is distributed as non-central $t$ with df $= n-1$ degrees of freedom and non-centrality parameter ncp $= (\mu - \mu_0) \sqrt{n}/\sigma$.
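As a sketch of such a power calculation, the power of a two-sided one-sample t-test can be computed directly from pt with a non-centrality parameter and compared with power.t.test. The sample size, effect size, standard deviation and level below are made-up values for illustration.

```r
## Illustration only: hypothetical design values.
n     <- 20      # sample size
delta <- 0.5     # mu - mu_0
sigma <- 1       # standard deviation
alpha <- 0.05    # two-sided significance level

ncp  <- delta * sqrt(n) / sigma
crit <- qt(1 - alpha / 2, df = n - 1)

## P(|T| > crit) when T is non-central t with df = n - 1 and this ncp.
power.hand <- pt(crit, df = n - 1, ncp = ncp, lower.tail = FALSE) +
    pt(-crit, df = n - 1, ncp = ncp)

power.ref <- power.t.test(n = n, delta = delta, sd = sigma,
                          sig.level = alpha, type = "one.sample",
                          strict = TRUE)$power
all.equal(power.hand, power.ref)
```

strict = TRUE is used so that power.t.test also counts rejections in the opposite tail, as the hand computation does.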

The $t$ distribution's cumulative distribution function (cdf), $F_{\nu}$, fulfills $F_{\nu}(t) = \frac 1 2 I_x(\frac{\nu}{2}, \frac 1 2)$ for $t \le 0$, and $F_{\nu}(t) = 1- \frac 1 2 I_x(\frac{\nu}{2}, \frac 1 2)$ for $t \ge 0$, where $x := \nu/(\nu + t^2)$, and $I_x(a,b)$ is the incomplete beta function, in R this is pbeta(x, a,b).


dt gives the density, pt gives the distribution function, qt gives the quantile function, and rt generates random deviates.

Invalid arguments will result in return value NaN, with a warning.

The length of the result is determined by n for rt, and is the maximum of the lengths of the numerical arguments for the other functions.

The numerical arguments other than n are recycled to the length of the result. Only the first elements of the logical arguments are used.


Supplying ncp = 0 uses the algorithm for the non-central distribution, which is not the same algorithm used if ncp is omitted. This is to give consistent behaviour in extreme cases with values of ncp very near zero.

The code for non-zero ncp is principally intended to be used for moderate values of ncp: it will not be highly accurate, especially in the tails, for large values.


The central dt is computed via an accurate formula provided by Catherine Loader (see the reference in dbinom).

For the non-central case of dt, C code contributed by Claus Ekstrøm based on the relationship (for $x \neq 0$) to the cumulative distribution.

For the central case of pt, a normal approximation in the tails, otherwise via pbeta.

For the non-central case of pt based on a C translation of

Lenth, R. V. (1989). Algorithm AS 243: Cumulative distribution function of the non-central $t$ distribution, Applied Statistics 38, 185–189.

This computes the lower tail only, so the upper tail currently suffers from cancellation and a warning will be given when this is likely to be significant.

For central qt, a C translation of

Hill, G. W. (1970) Algorithm 396: Student's t-quantiles. Communications of the ACM, 13(10), 619–620.

altered to take account of

Hill, G. W. (1981) Remark on Algorithm 396, ACM Transactions on Mathematical Software, 7, 250–1.

The non-central case is done by inversion.


Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole. (Except non-central versions.)

Johnson, N. L., Kotz, S. and Balakrishnan, N. (1995) Continuous Univariate Distributions, volume 2, chapters 28 and 31. Wiley, New York.

See Also

Distributions for other standard distributions, including df for the F distribution.



1 - pt(1:5, df = 1)
qt(.975, df = c(1:10,20,50,100,1000))

tt <- seq(0, 10, length.out = 21)
ncp <- seq(0, 6, length.out = 31)
ptn <- outer(tt, ncp, function(t, d) pt(t, df = 3, ncp = d))
t.tit <- "Non-central t - Probabilities"
image(tt, ncp, ptn, zlim = c(0,1), main = t.tit)
persp(tt, ncp, ptn, zlim = 0:1, r = 2, phi = 20, theta = 200, main = t.tit,
      xlab = "t", ylab = "non-centrality parameter",
      zlab = "Pr(T <= t)")

plot(function(x) dt(x, df = 3, ncp = 2), -3, 11, ylim = c(0, 0.32),
     main = "Non-central t - Density", yaxs = "i")

## Relation between F_t(.) = pt(x, n) and pbeta():
ptBet <- function(t, n) {
    x <- n/(n + t^2)
    r <- pb <- pbeta(x, n/2, 1/2) / 2
    pos <- t > 0
    r[pos] <- 1 - pb[pos]
    r  # return the assembled cdf values
}
x <- seq(-5, 5, by = 1/8)
nu <- 3:10
pt. <- outer(x, nu, pt)
ptB <- outer(x, nu, ptBet)
## matplot(x, pt., type = "l")
stopifnot(all.equal(pt., ptB, tolerance = 1e-15))

Plot Regression Terms


Plots regression terms against their predictors, optionally with standard errors and partial residuals added.


termplot(model, data = NULL, envir = environment(formula(model)),
         partial.resid = FALSE, rug = FALSE,
         terms = NULL, se = FALSE,
         xlabs = NULL, ylabs = NULL, main = NULL,
         col.term = 2, lwd.term = 1.5,
         col.se = "orange", lty.se = 2, lwd.se = 1,
         col.res = "gray", cex = 1, pch = par("pch"),
         col.smth = "darkred", lty.smth = 2, span.smth = 2/3,
         ask = dev.interactive() && nb.fig < n.tms,
         use.factor.levels = TRUE, smooth = NULL, ylim = "common",
         plot = TRUE, transform.x = FALSE, ...)



fitted model object


data frame in which variables in model can be found


environment in which variables in model can be found


logical; should partial residuals be plotted?


add rugplots (jittered 1-d histograms) to the axes?


which terms to plot (default NULL means all terms); a vector passed to predict(.., type = "terms", terms = *).


plot pointwise standard errors?


vector of labels for the x axes


vector of labels for the y axes


logical, or vector of main titles; if TRUE, the model's call is taken as main title, NULL or FALSE mean no titles.

col.term, lwd.term

color and line width for the ‘term curve’, see lines.

col.se, lty.se, lwd.se

color, line type and line width for the ‘twice-standard-error curve’ when se = TRUE.

col.res, cex, pch

color, plotting character expansion and type for partial residuals, when partial.resid = TRUE, see points.


logical; if TRUE, the user is asked before each plot, see par(ask=.).


Should x-axis ticks use factor levels or numbers for factor terms?


NULL or a function with the same arguments as panel.smooth to draw a smooth through the partial residuals for non-factor terms

lty.smth, col.smth, span.smth

Passed to smooth


an optional range for the y axis, or "common" when a range sufficient for all the plot will be computed, or "free" when limits are computed for each plot.


if set to FALSE plots are not produced: instead a list is returned containing the data that would have been plotted.


logical vector; if an element (recycled as necessary) is TRUE, partial residuals for the corresponding term are plotted against transformed values. The model response is then a straight line, allowing a ready comparison against the data or against the curve obtained from smooth = panel.smooth.


other graphical parameters.


The model object must have a predict method that accepts type = "terms", e.g., glm in the stats package, coxph and survreg in the survival package.

For the partial.resid = TRUE option model must have a residuals method that accepts type = "partial", which lm and glm do.

The data argument should rarely be needed, but in some cases termplot may be unable to reconstruct the original data frame. Using na.action=na.exclude makes these problems less likely.

Nothing sensible happens for interaction terms, and they may cause errors.

The plot = FALSE option is useful when some special action is needed, e.g. to overlay the results of two different models or to plot confidence bands.


For plot = FALSE, a list with one element for each plot which would have been produced. Each element of the list is a data frame with variables x, y, and optionally the pointwise standard errors se. For continuous predictors x will contain the ordered unique values and for a factor it will be a factor containing one instance of each level. The list has attribute "constant" copied from the predicted terms object.

Otherwise, the number of terms, invisibly.
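
As a rough sketch of the plot = FALSE usage, the following fits a small model to the built-in warpbreaks data and inspects the returned list instead of drawing anything. The object names fit and tp are purely illustrative.

```r
## Illustrative only: 'fit' and 'tp' are hypothetical names.
fit <- lm(breaks ~ wool + tension, data = warpbreaks)

## Collect the data that would have been plotted, including standard errors
tp <- termplot(fit, plot = FALSE, se = TRUE)

length(tp)            # one list element per term
str(tp[[2]])          # data frame with x, y and se for the second term
attr(tp, "constant")  # constant copied from the predicted terms
```

The data frames can then be passed to ordinary plotting functions, for example to overlay terms from two different fits.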

See Also

For (generalized) linear models, plot.lm and predict.glm.



had.splines <- "package:splines" %in% search()
if(!had.splines) rs <- require(splines)
x <- 1:100
z <- factor(rep(LETTERS[1:4], 25))
y <- rnorm(100, sin(x/10)+as.numeric(z))
model <- glm(y ~ ns(x, 6) + z)

par(mfrow = c(2,2)) ## 2 x 2 plots for same model :
termplot(model, main = paste("termplot( ", deparse(model$call)," ...)"))
termplot(model, rug = TRUE)
termplot(model, partial.resid = TRUE, se = TRUE, main = TRUE)
termplot(model, partial.resid = TRUE, smooth = panel.smooth, span.smth = 1/4)
if(!had.splines && rs) detach("package:splines")

if(requireNamespace("MASS", quietly = TRUE)) {
hills.lm <- lm(log(time) ~ log(climb)+log(dist), data = MASS::hills)
termplot(hills.lm, partial.resid = TRUE, smooth = panel.smooth,
        terms = "log(dist)", main = "Original")
termplot(hills.lm, transform.x = TRUE,
         partial.resid = TRUE, smooth = panel.smooth,
         terms = "log(dist)", main = "Transformed")
}


Model Terms


The function terms is a generic function which can be used to extract terms objects from various kinds of R data objects.


terms(x, ...)



object used to select a method to dispatch.


further arguments passed to or from other methods.


There are methods for classes "aovlist", "terms" and "formula" (see terms.formula): the default method just extracts the terms component of the object, or failing that a "terms" attribute (as used by model.frame).

There are print and labels methods for class "terms": the latter returns the term labels (see terms.object).
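
A minimal sketch of the dispatch described above, using the built-in mtcars data. The object names are illustrative, not part of the interface.

```r
## Illustrative names: 'fit', 'tt_fit', 'tt_frm', 'tt_mf'
fit <- lm(mpg ~ wt + hp, data = mtcars)

tt_fit <- terms(fit)               # default method: the 'terms' component
tt_frm <- terms(mpg ~ wt + hp)     # formula method
tt_mf  <- terms(model.frame(fit))  # default method: the "terms" attribute

labels(tt_fit)   # term labels of the fitted model
class(tt_frm)    # class of a terms object
```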


An object of class c("terms", "formula") which contains the terms representation of a symbolic model. See terms.object for its structure.


Chambers, J. M. and Hastie, T. J. (1992) Statistical models. Chapter 2 of Statistical Models in S eds J. M. Chambers and T. J. Hastie, Wadsworth & Brooks/Cole.

See Also

terms.object, terms.formula, lm, glm, formula.

Construct a terms Object from a Formula


This function takes a formula and some optional arguments and constructs a terms object. The terms object can then be used to construct a model.matrix.


## S3 method for class 'formula'
terms(x, specials = NULL, abb = NULL, data = NULL, neg.out = TRUE,
      keep.order = FALSE, simplify = FALSE, ...,
      allowDotAsName = FALSE)



a formula.


which functions in the formula should be marked as special in the terms object? A character vector or NULL.


Not implemented in R; deprecated.


a data frame from which the meaning of the special symbol . can be inferred. It is used only if there is a . in the formula.


Not implemented in R; deprecated.


a logical value indicating whether the terms should keep their positions. By default, when FALSE, the terms are reordered so that main effects come first, followed by the interactions, all second-order, all third-order and so on. Effects of a given order are kept in the order specified.


should the formula be expanded and simplified, the pre-1.7.0 behaviour?


further arguments passed to or from other methods.


normally . in a formula refers to the remaining variables contained in data. Exceptionally, . can be treated as a name for non-standard uses of formulae.


Not all of the options work in the same way that they do in S and not all are implemented.


A terms object is returned. It is the re-ordered formula (unless keep.order = TRUE) with several attributes, see terms.object for details. In all cases variables within an interaction term in the formula are re-ordered by the ordering of the "variables" attribute, which is the order in which the variables occur in the formula.
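
The re-ordering can be seen in a small sketch. The formula and the data frame d are made up for illustration.

```r
## A made-up formula with the interaction written first
f <- y ~ a:b + b + a

lab_default <- attr(terms(f), "term.labels")                     # main effects first
lab_kept    <- attr(terms(f, keep.order = TRUE), "term.labels")  # order as written
lab_default
lab_kept

## '.' is expanded using the remaining variables in 'data'
d <- data.frame(y = 1:3, u = 4:6, v = 7:9)
lab_dot <- attr(terms(y ~ ., data = d), "term.labels")
lab_dot
```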

See Also

terms, terms.object, also for examples.

Description of Terms Objects


An object of class terms holds information about a model. Usually the model was specified in terms of a formula and that formula was used to determine the terms object.


The object itself is simply the result of terms.formula(<formula>). It has a number of attributes and they are used to construct the model frame:


factors

An integer matrix of variables by terms showing which variables appear in which terms. The entries are

0 if the variable does not occur in the term,

1 if it does occur and should be coded by contrasts, and

2 if it occurs and should be coded via dummy variables for all levels (as when a lower-order term is missing).

Note that variables in main effects always receive 1, even if the intercept is missing (in which case the first one should be coded with dummy variables). If there are no terms other than an intercept and offsets, this is integer(0).

term.labels

A character vector containing the labels for each of the terms in the model, except for offsets. Note that these are after possible re-ordering of terms.

Non-syntactic names will be quoted by backticks: this makes it easier to re-construct the formula from the term labels.

variables

A call to list of the variables in the model.

intercept

Either 0, indicating no intercept is to be fit, or 1 indicating that an intercept is to be fit.

order

A vector of the same length as term.labels indicating the order of interaction for each term.

response

The index of the variable (in variables) of the response (the left hand side of the formula). Zero, if there is no response.

offset

If the model contains offset terms there is an offset attribute indicating which variable(s) are offsets.

specials

If a specials argument was given to terms.formula there is a specials attribute, a pairlist of vectors (one for each specified special function) giving numeric indices of the arguments of the list returned as the variables attribute which contain these special functions.

dataClasses

optional. A named character vector giving the classes (as given by .MFclass) of the variables used in a fit.

predvars

optional. An expression to help in computing predictions at new covariate values; see makepredictcall.

The object has class c("terms", "formula").
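
A short sketch of how these attributes can be inspected. The formula is made up for illustration and no data are needed to build the terms object.

```r
## Made-up formula with an interaction and an offset
tt <- terms(y ~ a * b + offset(w))

attr(tt, "term.labels")  # labels, offsets excluded
attr(tt, "order")        # interaction order of each term
attr(tt, "intercept")    # 1: an intercept is to be fit
attr(tt, "response")     # index of the response in 'variables'
attr(tt, "variables")    # a call to list()
attr(tt, "offset")       # index of the offset in 'variables'
attr(tt, "factors")      # variables-by-terms integer matrix
```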


These objects are different from those found in S. In particular there is no formula attribute: instead the object is itself a formula. (Thus, the mode of a terms object is different.)

Examples of the specials argument can be seen in the aov and coxph functions, the latter from package survival.

See Also

terms, formula.


## use of specials (as used for gam() in packages mgcv and gam)
(tf <- terms(y ~ x + x:z + s(x), specials = "s"))
## Note that the "factors" attribute has variables as row names
## and term labels as column names, both as character vectors.
attr(tf, "specials")    # index 's' variable(s)
rownames(attr(tf, "factors"))[attr(tf, "specials")$s]

## we can keep the order by
terms(y ~ x + x:z + s(x), specials = "s", keep.order = TRUE)

Sampling Times of Time Series


time creates the vector of times at which a time series was sampled.

cycle gives the positions in the cycle of each observation.

frequency returns the number of samples per unit time and deltat the time interval between observations (see ts).


time(x, ...)
## Default S3 method:
time(x, offset = 0, ts.eps = getOption("ts.eps"), ...)

cycle(x, ...)
frequency(x, ...)
deltat(x, ...)



a univariate or multivariate time-series, or a vector or matrix.


can be used to indicate when sampling took place in the time unit. 0 (the default) indicates the start of the unit, 0.5 the middle and 1 the end of the interval.


time series comparison tolerance, used in time() to determine if values closer than ts.eps to an integer should be round()ed to it in order to preserve the “year”.


extra arguments for future methods.


These are all generic functions, which will use the tsp attribute of x if it exists. time and cycle have methods for class ts that coerce the result to that class.

time() round()s values close to an integer, i.e., closer than ts.eps, since R 4.3.0. For previous behaviour, you can call it with ts.eps = 0.
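
As an illustration, the following sketch applies the four functions to a short quarterly series. The series x is made up.

```r
## Six quarterly observations starting in the 3rd quarter of 2000
x <- ts(1:6, start = c(2000, 3), frequency = 4)

tm <- time(x)               # sampling times at the start of each quarter
cy <- cycle(x)              # position within the year
tm
cy
frequency(x)                # samples per unit time
deltat(x)                   # time between observations
tm_mid <- time(x, offset = 0.5)  # times at the middle of each quarter
tm_mid
```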


Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.

See Also

ts, start, tsp, window.

date for clock time, system.time for CPU usage.



# a simple series plot
plot(as.vector(time(presidents)), as.vector(presidents), type = "l")

Create Symmetric and Asymmetric Toeplitz Matrix


In its simplest use, toeplitz() forms a symmetric Toeplitz matrix given its first column (or row). For the general case, asymmetric and non-square Toeplitz matrices are formed either by specifying the first column and row separately,

T1 <- toeplitz(col, row)

or by

T <- toeplitz2(x, nr, nc)

where only one of (nr, nc) needs to be specified. In the latter case, the simple equivalence $T_{i,j} = x_{i-j + n_c}$ is fulfilled where $n_c =$ ncol(T).


toeplitz (x, r = NULL, symmetric = is.null(r))
toeplitz2(x, nrow = length(x) +1 - ncol, ncol = length(x) +1 - nrow)



for toeplitz(x, *): the first column of the Toeplitz matrix; for toeplitz2(x, *) it is the upper-and-left border of the Toeplitz matrix, i.e., from top-right to bottom-left, such that T[i,j] == x[i-j + ncol].


the first row of the target Toeplitz matrix; only needed in asymmetric cases.


optional logical indicating if the matrix should be symmetric.

nrow, ncol

the number of rows and columns; only one needs to be specified.


The $n \times m$ Toeplitz matrix $T$; for

toeplitz():

dim(T) is (n,m) and m == length(x) and n == m in the symmetric case or n == length(r) otherwise.

toeplitz2():

dim(T) == c(nrow, ncol).
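
The index relations can be checked directly in a small sketch. The vectors are made up, and the toeplitz2() part is guarded on the assumption that it may be missing from older R installations.

```r
## Symmetric case: each entry depends only on |i - j|
x <- 1:4
Ts <- toeplitz(x)
idx <- abs(outer(1:4, 1:4, "-")) + 1
all(Ts == x[idx])

## General case (assumes an R version that provides toeplitz2())
if (exists("toeplitz2", mode = "function")) {
  xb <- 1:5
  T2 <- toeplitz2(xb, ncol = 3)      # nrow defaults to length(xb) + 1 - ncol
  i <- row(T2); j <- col(T2)
  print(all(T2 == xb[i - j + ncol(T2)]))
}
```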


A. Trapletti and Martin Maechler (speedup and asymmetric extensions)


x <- 1:5
toeplitz (x)

T. <- toeplitz (1:5, 11:13) # with a  *Warning* x[1] != r[1]
T2 <- toeplitz2(c(13:12, 1:5), 5, 3)# this is the same matrix:
stopifnot(identical(T., T2))

# Matrix of character (could also have logical, raw, complex ..) {also warning}:
noquote(toeplitz(letters[1:4], LETTERS[20:26]))

## A convolution/smoother weight matrix :
m <- 17
k <- length(wts <- c(76, 99, 60, 20, 1))
n <- m-k+1
## Convolution
W <- toeplitz2(c(rep(0, m-k), wts, rep(0, m-k)), ncol=n)

## "display" nicely :
if(requireNamespace("Matrix", quietly = TRUE)) {
   print(Matrix::Matrix(W))
} else {
   colnames(W) <- paste0(",", if(n <= 9) 1:n else c(1:9, letters[seq_len(n-9)]))
   print(W)
}

## scale W to have column sums 1:
W. <- W / sum(wts)
all.equal(rep(1, ncol(W.)), colSums(W.), check.attributes = FALSE)
## Visualize "mass-preserving" convolution
x <- 1:n; f <- function(x) exp(-((x - .4*n)/3)^2)
y <- f(x) + rep_len(3:-2, n)/10
## Smoothing convolution:
y.hat <- W. %*% y # y.hat := smoothed(y) ("mass preserving" -> longer than y)
stopifnot(length(y.hat) == m, m == n + (k-1))
plot(x,y, type="b", xlim=c(1,m)); curve(f(x), 1,n, col="gray", lty=2, add=TRUE)
lines(1:m, y.hat, col=2, lwd=3)
rbind(sum(y), sum(y.hat)) ## mass preserved

## And, yes, convolve(y, *) does the same when called appropriately:
all.equal(c(y.hat), convolve(y, rev(wts/sum(wts)), type="open"))

Time-Series Objects


The function ts is used to create time-series objects.

as.ts and is.ts coerce an object to a time-series and test whether an object is a time series.


ts(data = NA, start = 1, end = numeric(), frequency = 1,
   deltat = 1, ts.eps = getOption("ts.eps"),
   class = if(nseries > 1) c("mts", "ts", "matrix", "array") else "ts",
   names = )
as.ts(x, ...)




a vector or matrix of the observed time-series values. A data frame will be coerced to a numeric matrix via data.matrix. (See also ‘Details’.)


the time of the first observation. Either a single number or a vector of two numbers (the second of which is an integer), which specify a natural time unit and a (1-based) number of samples into the time unit. See the examples for the use of the second form.


the time of the last observation, specified in the same way as start.


the number of observations per unit of time.


the fraction of the sampling period between successive observations; e.g., 1/12 for monthly data. Only one of frequency or deltat should be provided.


time series comparison tolerance. Frequencies are considered equal if their absolute difference is less than ts.eps.


class to be given to the result, or none if NULL or "none". The default is "ts" for a single series, or c("mts", "ts", "matrix", "array") for multiple series.


a character vector of names for the series in a multiple series: defaults to the colnames of data, or "Series 1", "Series 2", ....


an arbitrary R object.


arguments passed to methods (unused for the default method).


The function ts is used to create time-series objects. These are vectors or matrices which inherit from class "ts" (and have additional attributes) which represent data which has been sampled at equispaced points in time. In the matrix case, each column of the matrix data is assumed to contain a single (univariate) time series. Time series must have at least one observation, and although they need not be numeric there is very limited support for non-numeric series.

Class "ts" has a number of methods. In particular arithmetic will attempt to align time axes, and subsetting to extract subsets of series can be used (e.g., EuStockMarkets[, "DAX"]). However, subsetting the first (or only) dimension will return a matrix or vector, as will matrix subsetting. Subassignment can be used to replace values but not to extend a series (see window). There is a method for t that transposes the series as a matrix (a one-column matrix if a vector) and hence returns a result that does not inherit from class "ts".

Argument frequency indicates the sampling frequency of the time series, with the default value 1 indicating one sample in each unit time interval. For example, one could use a value of 7 for frequency when the data are sampled daily, and the natural time period is a week, or 12 when the data are sampled monthly and the natural time period is a year. Values of 4 and 12 are assumed in (e.g.) print methods to imply a quarterly and monthly series respectively. frequency need not be a whole number: for example, frequency = 0.2 would imply sampling once every five time units.

as.ts is generic. Its default method will use the tsp attribute of the object if it has one to set the start and end times and frequency.

is.ts() tests if an object is a time series, i.e., inherits from "ts" and is of positive length.

is.mts(x) tests if an object x is a multivariate time series, i.e., fulfills is.ts(x), is.matrix(x) and inherits from class "mts".
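
The behaviour described above can be sketched as follows. All object names are illustrative.

```r
## A monthly series covering two years
x <- ts(1:24, start = c(2020, 1), frequency = 12)
tsp(x)                       # start, end, frequency

## Plain subsetting drops the time-series attributes ...
sub <- x[1:3]
is.ts(sub)

## ... whereas window() keeps them
w <- window(x, start = c(2020, 6), end = c(2020, 8))
is.ts(w)

## A matrix gives a multivariate series, one column per series
m <- ts(matrix(1:6, 3, 2), start = 1, names = c("u", "v"))
is.mts(m)

## as.ts() on a plain vector uses start 1 and frequency 1
tsp(as.ts(1:5))
```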


Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.

See Also

tsp, frequency, start, end, time, window; print.ts, the print method for time series objects; plot.ts, the plot method for time series objects.

For other definitions of ‘time series’ (e.g., time-ordered observations) see the CRAN task view at



ts(1:10, frequency = 4, start = c(1959, 2)) # 2nd Quarter of 1959
print( ts(1:10, frequency = 7, start = c(12, 2)), calendar = TRUE)
# print.ts(.)
## Using July 1954 as start date:
gnp <- ts(cumsum(1 + round(rnorm(100), 2)),
          start = c(1954, 7), frequency = 12)
plot(gnp) # using 'plot.ts' for time-series plot

## Multivariate
z <- ts(matrix(rnorm(300), 100, 3), start = c(1961, 1), frequency = 12)
head(z) # as "matrix"
plot(z, plot.type = "single", lty = 1:3)

## A phase plot:
plot(nhtemp, lag(nhtemp, 1), cex = .8, col = "blue",
     main = "Lag plot of New Haven temperatures")

Methods for Time Series Objects


Methods for objects of class "ts", typically the result of ts.


## S3 method for class 'ts'
diff(x, lag = 1, differences = 1, ...)

## S3 method for class 'ts'
na.omit(object, ...)



an object of class "ts" containing the values to be differenced.


an integer indicating which lag to use.


an integer indicating the order of the difference.


a univariate or multivariate time series.


further arguments to be passed to or from methods.


The na.omit method omits initial and final segments with missing values in one or more of the series. ‘Internal’ missing values will lead to failure.


For the na.omit method, a time series without missing values. The class of object will be preserved.
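
A small sketch of both methods on made-up series, including the failure for an internal missing value:

```r
## Leading and trailing NAs are dropped and the time base is adjusted
x <- ts(c(NA, 1, 3, 6, 10, NA), start = 2001)
y <- na.omit(x)
tsp(y)

## Differencing keeps the time-series attributes
d  <- diff(y)                   # first differences
d2 <- diff(y, differences = 2)  # second differences
d
d2

## An internal NA makes na.omit() fail; the message is captured here
bad <- ts(c(1, NA, 3), start = 2001)
msg <- tryCatch(na.omit(bad), error = function(e) conditionMessage(e))
msg
```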

See Also

diff; na.omit, na.contiguous.

Plot Multiple Time Series


Plot several time series on a common plot. Unlike plot.ts the series can have different time bases, but they should have the same frequency.


ts.plot(..., gpars = list())



one or more univariate or multivariate time series.


list of named graphics parameters to be passed to the plotting functions. Those commonly used can be supplied directly in ....




Although this can be used for a single time series, plot is easier to use and is preferred.

See Also




ts.plot(ldeaths, mdeaths, fdeaths,
        gpars=list(xlab="year", ylab="deaths", lty=c(1:3)))

Bind Two or More Time Series


Bind time series which have a common frequency. ts.union pads with NAs to the total time coverage, ts.intersect restricts to the time covered by all the series.


ts.intersect(..., dframe = FALSE)
ts.union(..., dframe = FALSE)



two or more univariate or multivariate time series, or objects which can be coerced to time series.


logical; if TRUE return the result as a data frame.


As a special case, ... can contain vectors or matrices of the same length as the combined time series of the time series present, as well as those of a single row.


A time series object if dframe is FALSE, otherwise a data frame.
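
The difference between the two functions shows up on two short, partly overlapping annual series. The series a and b are made up.

```r
a <- ts(1:5,   start = 2000)   # covers 2000 to 2004
b <- ts(11:15, start = 2003)   # covers 2003 to 2007

u <- ts.union(a, b)       # 2000 to 2007, padded with NA
i <- ts.intersect(a, b)   # 2003 to 2004 only
u
i

## Same union, returned as a data frame
df <- ts.union(a, b, dframe = TRUE)
```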

See Also



ts.union(mdeaths, fdeaths)
cbind(mdeaths, fdeaths) # same as the previous line
ts.intersect(window(mdeaths, 1976), window(fdeaths, 1974, 1978))

sales1 <- ts.union(BJsales, lead = BJsales.lead)
ts.intersect(sales1, lead3 = lag(BJsales.lead, -3))

Diagnostic Plots for Time-Series Fits


A generic function to plot time-series diagnostics.


tsdiag(object, gof.lag, ...)



a fitted time-series model


the maximum number of lags for a Portmanteau goodness-of-fit test


further arguments to be passed to particular methods


This is a generic function. It will generally plot the residuals, often standardized, the autocorrelation function of the residuals, and the p-values of a Portmanteau test for all lags up to gof.lag.

The methods for arima and StructTS objects plot residuals scaled by the estimate of their (individual) variance, and use the Ljung–Box version of the portmanteau test.


None. Diagnostics are plotted.

See Also

arima, StructTS, Box.test



fit <- arima(lh, c(1,0,0))
tsdiag(fit)

## see also examples(arima)

(fit <- StructTS(log10(JohnsonJohnson), type = "BSM"))
tsdiag(fit)

Tsp Attribute of Time-Series-like Objects


tsp returns the tsp attribute (or NULL). It is included for compatibility with S version 2. tsp<- sets the tsp attribute. hasTsp ensures x has a tsp attribute, by adding one if needed.


tsp(x) <- value



a vector or matrix or univariate or multivariate time-series.


a numeric vector of length 3 or NULL.


The tsp attribute gives the start time in time units, the end time and the frequency (the number of observations per unit of time, e.g. 12 for a monthly series).

Assignments are checked for consistency.

Assigning NULL removes the tsp attribute and any "ts" (or "mts") class of x.


An object which differs from x only in the tsp attribute (unless NULL is assigned).

hasTsp adds, if needed, an attribute with a start time and frequency of 1 and end time NROW(x).
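
A brief sketch of these functions on made-up objects, including the consistency check on assignment:

```r
v <- 1:10
tsp(v)                 # NULL: a plain vector has no tsp attribute
h <- hasTsp(v)
tsp(h)                 # start 1, end NROW(v), frequency 1

z <- ts(1:10, start = c(2000, 1), frequency = 4)
tsp(z)                 # start, end and frequency of a quarterly series

## An inconsistent assignment (end too early for 10 values) is rejected
outcome <- tryCatch({ tsp(v) <- c(1, 5, 1); "accepted" },
                    error = function(e) "rejected")
outcome

## Assigning NULL removes the attribute and the "ts" class
tsp(z) <- NULL
class(z)
```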


Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.

See Also

ts, time, start.

Use Fixed-Interval Smoothing on Time Series


Performs fixed-interval smoothing on a univariate time series via a state-space model. Fixed-interval smoothing gives the best estimate of the state at each time point based on the whole observed series.


tsSmooth(object, ...)



a time-series fit. Currently only class "StructTS" is supported


possible arguments for future methods.


A time series, with as many dimensions as the state space and results at each time point of the original series. (For seasonal models, only the current seasonal component is returned.)
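
A minimal sketch, assuming a local level model fitted to the built-in Nile series. The names fit and sm are illustrative.

```r
## Fit a local level model, then smooth the state over the whole series
fit <- StructTS(Nile, type = "level")
sm  <- tsSmooth(fit)

NROW(sm)   # one smoothed state per original time point
tsp(sm)    # same time base as the original series

## plot(Nile); lines(sm, col = 2)   # optional visual comparison
```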


B. D. Ripley


Durbin, J. and Koopman, S. J. (2001) Time Series Analysis by State Space Methods. Oxford University Press.

See Also

KalmanSmooth, StructTS.

For examples consult AirPassengers, JohnsonJohnson and Nile.

The Studentized Range Distribution


Functions of the distribution of the studentized range, $R/s$, where $R$ is the range of a standard normal sample and $df \times s^2$ is independently distributed as chi-squared with $df$ degrees of freedom, see pchisq.


ptukey(q, nmeans, df, nranges = 1, lower.tail = TRUE, log.p = FALSE)
qtukey(p, nmeans, df, nranges = 1, lower.tail = TRUE, log.p = FALSE)



vector of quantiles.


vector of probabilities.


sample size for range (same for each group).


degrees of freedom for $s$ (see below).


number of groups whose maximum range is considered.


logical; if TRUE, probabilities p are given as log(p).


logical; if TRUE (default), probabilities are $P[X \le x]$, otherwise, $P[X > x]$.


If $n_g =$ nranges is greater than one, $R$ is the maximum of $n_g$ groups of nmeans observations each.


ptukey gives the distribution function and qtukey its inverse, the quantile function.

The length of the result is the maximum of the lengths of the numerical arguments. The other numerical arguments are recycled to that length. Only the first elements of the logical arguments are used.


A Legendre 16-point formula is used for the integral of ptukey. The computations are relatively expensive, especially for qtukey which uses a simple secant method for finding the inverse of ptukey. qtukey will be accurate to the 4th decimal place.
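
The inverse relation and its limited accuracy can be checked with a short sketch. The particular values of q, nmeans and df are arbitrary choices.

```r
## Distribution function at an arbitrary point
p <- ptukey(3.5, nmeans = 4, df = 20)

## Inverting it should recover the quantile to about 4 decimal places
q <- qtukey(p, nmeans = 4, df = 20)
abs(q - 3.5)

## Upper tail is the complement of the lower tail
p_up <- ptukey(3.5, nmeans = 4, df = 20, lower.tail = FALSE)
p + p_up

## A 95% critical value, as used for multiple comparisons
qtukey(0.95, nmeans = 3, df = 12)
```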


qtukey is in part adapted from Odeh and Evans (1974).


Copenhaver, Margaret Diponzio and Holland, Burt S. (1988). Computation of the distribution of the maximum studentized range statistic with application to multiple significance testing of simple effects. Journal of Statistical Computation and Simulation, 30, 1–15. doi:10.1080/00949658808811082.

Odeh, R. E. and Evans, J. O. (1974). Algorithm AS 70: Percentage Points of the Normal Distribution. Applied Statistics, 23, 96–97. doi:10.2307/2347061.

See Also

Distributions for standard distributions, including pnorm and qnorm for the corresponding functions for the normal distribution.


curve(ptukey(x, nm = 6, df = 5), from = -1, to = 8, n = 101)
(ptt <- ptukey(0:10, 2, df =  5))
(qtt <- qtukey(.95, 2, df =  2:11))
## The precision may be not much more than about 8 digits:
summary(abs(.95 - ptukey(qtt, 2, df = 2:11)))

Compute Tukey Honest Significant Differences


Create a set of confidence intervals on the differences between the means of the levels of a factor with the specified family-wise probability of coverage. The intervals are based on the Studentized range statistic, Tukey's ‘Honest Significant Difference’ method.


TukeyHSD(x, which, ordered = FALSE, conf.level = 0.95, ...)



A fitted model object, usually an aov fit.


A character vector listing terms in the fitted model for which the intervals should be calculated. Defaults to all the terms.


A logical value indicating if the levels of the factor should be ordered according to increasing average in the sample before taking differences. If ordered is true then the calculated differences in the means will all be positive. The significant differences will be those for which the lwr end point is positive.


A numeric value between zero and one giving the family-wise confidence level to use.


Optional additional arguments. None are used at present.


This is a generic function: the description here applies to the method for fits of class "aov".

When comparing the means for the levels of a factor in an analysis of variance, a simple comparison using t-tests will inflate the probability of declaring a significant difference when it is not in fact present. This is because the intervals are calculated with a given coverage probability for each interval but the interpretation of the coverage is usually with respect to the entire family of intervals.

John Tukey introduced intervals based on the range of the sample means rather than the individual differences. The intervals returned by this function are based on this Studentized range statistic.

The intervals constructed in this way would only apply exactly to balanced designs where there are the same number of observations made at each level of the factor. This function incorporates an adjustment for sample size that produces sensible intervals for mildly unbalanced designs.

If which specifies non-factor terms these will be dropped with a warning: if no terms are left this is an error.


A list of class c("multicomp", "TukeyHSD"), with one component for each term requested in which. Each component is a matrix with columns diff giving the difference in the observed means, lwr giving the lower end point of the interval, upr giving the upper end point and p adj giving the p-value after adjustment for the multiple comparisons.

There are print and plot methods for class "TukeyHSD". The plot method does not accept xlab, ylab or main arguments and creates its own values for each plot.
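
A sketch of how the returned components might be used, based on the built-in warpbreaks data. The names fm and hsd are illustrative, and the 99% level is an arbitrary choice.

```r
fm  <- aov(breaks ~ wool + tension, data = warpbreaks)
hsd <- TukeyHSD(fm, which = "tension", ordered = TRUE, conf.level = 0.99)

colnames(hsd$tension)     # diff, lwr, upr, p adj
hsd$tension[, "diff"]     # all positive because ordered = TRUE

## Pairs whose interval excludes zero at the chosen family-wise level
rownames(hsd$tension)[hsd$tension[, "lwr"] > 0]
```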


Douglas Bates


Miller, R. G. (1981) Simultaneous Statistical Inference. Springer.

Yandell, B. S. (1997) Practical Data Analysis for Designed Experiments. Chapman & Hall.

See Also

aov, qtukey, model.tables, glht in package multcomp.



summary(fm1 <- aov(breaks ~ wool + tension, data = warpbreaks))
TukeyHSD(fm1, "tension", ordered = TRUE)
plot(TukeyHSD(fm1, "tension"))

The Uniform Distribution


These functions provide information about the uniform distribution on the interval from min to max. dunif gives the density, punif gives the distribution function, qunif gives the quantile function and runif generates random deviates.


dunif(x, min = 0, max = 1, log = FALSE)
punif(q, min = 0, max = 1, lower.tail = TRUE, log.p = FALSE)
qunif(p, min = 0, max = 1, lower.tail = TRUE, log.p = FALSE)
runif(n, min = 0, max = 1)


x, q

vector of quantiles.


vector of probabilities.


number of observations. If length(n) > 1, the length is taken to be the number required.

min, max

lower and upper limits of the distribution. Must be finite.

log, log.p

logical; if TRUE, probabilities p are given as log(p).


logical; if TRUE (default), probabilities are $P[X \le x]$, otherwise, $P[X > x]$.


If min or max are not specified they assume the default values of 0 and 1 respectively.

The uniform distribution has density

$$f(x) = \frac{1}{max - min}$$

for $min \le x \le max$.

For the case of $u := min == max$, the limit case of $X \equiv u$ is assumed, although there is no density in that case and dunif will return NaN (the error condition).

runif will not generate either of the extreme values unless max = min or max-min is small compared to min, and in particular not for the default arguments.


dunif gives the density, punif gives the distribution function, qunif gives the quantile function, and runif generates random deviates.

The length of the result is determined by n for runif, and is the maximum of the lengths of the numerical arguments for the other functions.

The numerical arguments other than n are recycled to the length of the result. Only the first elements of the logical arguments are used.
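
A few worked values on the interval from 0 to 2, given as a sketch. The interval and the seed are arbitrary choices.

```r
d_in  <- dunif(0.5, min = 0, max = 2)   # 1 / (max - min)
d_out <- dunif(3,   min = 0, max = 2)   # zero outside the interval
p_lo  <- punif(1.5, min = 0, max = 2)   # (1.5 - 0) / (2 - 0)
p_up  <- punif(1.5, min = 0, max = 2, lower.tail = FALSE)
q     <- qunif(0.25, min = 0, max = 2)  # inverse of punif

## Random deviates stay inside the interval
set.seed(1)
u <- runif(5, min = 10, max = 20)
range(u)

## Degenerate case min == max: no density, so NaN (with a warning)
d_deg <- suppressWarnings(dunif(1, min = 1, max = 1))
```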


The characteristics of output from pseudo-random number generators (such as precision and periodicity) vary widely. See .Random.seed for more information on R's random number generation algorithms.


Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.

See Also

RNG about random number generation in R.

Distributions for other standard distributions.


u <- runif(20)

## The following relations always hold :
punif(u) == u
dunif(u) == 1

var(runif(10000))  #- ~ = 1/12 = .08333

One Dimensional Root (Zero) Finding


The function uniroot searches the interval from lower to upper for a root (i.e., zero) of the function f with respect to its first argument.

Setting extendInt to a non-"no" string means searching for the correct interval = c(lower,upper) if sign(f(x)) does not satisfy the requirements at the interval end points; see the ‘Details’ section.


uniroot(f, interval, ...,
        lower = min(interval), upper = max(interval),
        f.lower = f(lower, ...), f.upper = f(upper, ...),
        extendInt = c("no", "yes", "downX", "upX"), check.conv = FALSE,
        tol = .Machine$double.eps^0.25, maxiter = 1000, trace = 0)



the function for which the root is sought.


a vector containing the end-points of the interval to be searched for the root.


additional named or unnamed arguments to be passed to f

lower, upper

the lower and upper end points of the interval to be searched.

f.lower, f.upper

the same as f(lower) and f(upper), respectively. Passing these values from the caller where they are often known is more economical as soon as f() contains non-trivial computations.


character string specifying if the interval c(lower,upper) should be extended or directly produce an error when f() does not have differing signs at the endpoints. The default, "no", keeps the search interval and hence produces an error. Can be abbreviated.


logical indicating whether a convergence warning of the underlying uniroot should be caught as an error and if non-convergence in maxiter iterations should be an error instead of a warning.


the desired accuracy (convergence tolerance).


the maximum number of iterations.


integer number; if positive, tracing information is produced. Higher values give more details.


Note that arguments after ... must be matched exactly.

Either interval or both lower and upper must be specified: the upper endpoint must be strictly larger than the lower endpoint. The function values at the endpoints must be of opposite signs (or zero), for extendInt="no", the default. Otherwise, if extendInt="yes", the interval is extended on both sides, in search of a sign change, i.e., until the search interval [l,u][l,u] satisfies f(l)f(u)0f(l) \cdot f(u) \le 0.

If it is known how $f$ changes sign at the root $x_0$, that is, if the function is increasing or decreasing there, extendInt can (and typically should) be specified as "upX" (for “upward crossing”) or "downX", respectively. Equivalently, define $S := \pm 1$, to require $S = \mathrm{sign}(f(x_0 + \epsilon))$ at the solution. In that case, the search interval $[l,u]$ possibly is extended to be such that $S \cdot f(l) \le 0$ and $S \cdot f(u) \ge 0$.
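A minimal sketch of the two directed extension modes. The functions f and g are made up for illustration; both have their only real root at x = 10, outside the initial interval c(0, 2).

```r
f <- function(x) x^3 - 1000   # increasing through the root: "upward crossing"
g <- function(x) 1000 - x^3   # decreasing through the root: "downward crossing"

## Both end-point values have the same sign on [0, 2], so the interval
## has to be extended before the root search can start.
ru <- uniroot(f, c(0, 2), extendInt = "upX")
rd <- uniroot(g, c(0, 2), extendInt = "downX")

c(up = ru$root, down = rd$root)        # both close to 10
c(up = ru$init.it, down = rd$init.it)  # number of extension steps used
```

Swapping the two settings (for example extendInt = "downX" for f) would ask for a crossing direction the function does not have near this root.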

uniroot() uses Fortran subroutine zeroin (from Netlib) based on algorithms given in the reference below. They assume a continuous function (which then is known to have at least one root in the interval).

Convergence is declared either if f(x) == 0 or the change in x for one step of the algorithm is less than tol (plus an allowance for representation error in x).

If the algorithm does not converge in maxiter steps, a warning is printed and the current approximation is returned.

f will be called as f(x, ...) for a numeric value of x.

The argument passed to f has special semantics and used to be shared between calls. The function should not copy it.
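As an illustration of the f.lower and f.upper arguments, the following sketch counts function evaluations with and without end-point values supplied by the caller. The counter environment and the test function are assumptions made for this example only.

```r
counter <- new.env()
counter$n <- 0
f <- function(x) { counter$n <- counter$n + 1; x^3 - 2 }

r1 <- uniroot(f, c(0, 2))
n1 <- counter$n                       # evaluations, end points included

counter$n <- 0
## f(0) = -2 and f(2) = 6 are known here, so pass them in directly
r2 <- uniroot(f, c(0, 2), f.lower = -2, f.upper = 6)
n2 <- counter$n

c(without = n1, with = n2)            # fewer calls when the values are supplied
all.equal(r1$root, r2$root)
```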


A list with at least five components: root and f.root give the location of the root and the value of the function evaluated at that point. iter and estim.prec give the number of iterations used and an approximate estimated precision for root. (If the root occurs at one of the endpoints, the estimated precision is NA.) init.it contains the number of initial extendInt iterations if there were any and is NA otherwise. In the case of such extendInt iterations, iter contains the sum of these and the zeroin iterations.

Further components may be added in the future.


Based on ‘zeroin.c’ in


Brent, R. (1973) Algorithms for Minimization without Derivatives. Englewood Cliffs, NJ: Prentice-Hall.

See Also

polyroot for all complex roots of a polynomial; optimize, nlm.


require(utils) # for str

## some platforms hit zero exactly on the first step:
## if so the estimated precision is 2/3.
f <- function (x, a) x - a
str(xmin <- uniroot(f, c(0, 1), tol = 0.0001, a = 1/3))

## handheld calculator example: fixed point of cos(.):
uniroot(function(x) cos(x) - x, lower = -pi, upper = pi, tol = 1e-9)$root

str(uniroot(function(x) x*(x^2-1) + .5, lower = -2, upper = 2,
            tol = 0.0001))
str(uniroot(function(x) x*(x^2-1) + .5, lower = -2, upper = 2,
            tol = 1e-10))

## Find the smallest value x for which exp(x) > 0 (numerically):
r <- uniroot(function(x) 1e80*exp(x) - 1e-300, c(-1000, 0), tol = 1e-15)
str(r, digits.d = 15) # around -745, depending on the platform.

exp(r$root)     # = 0, but not for r$root * 0.999...
minexp <- r$root * (1 - 10*.Machine$double.eps)
exp(minexp)     # typically denormalized

##--- uniroot() with new interval extension + checking features: --------------

f1 <- function(x) (121 - x^2)/(x^2+1)
f2 <- function(x) exp(-x)*(x - 12)

try(uniroot(f1, c(0,10)))
try(uniroot(f2, c(0, 2)))
##--> error: f() .. end points not of opposite sign

## where as  'extendInt="yes"'  simply first enlarges the search interval:
u1 <- uniroot(f1, c(0,10),extendInt="yes", trace=1)
u2 <- uniroot(f2, c(0,2), extendInt="yes", trace=2)
stopifnot(all.equal(u1$root, 11, tolerance = 1e-5),
          all.equal(u2$root, 12, tolerance = 6e-6))

## The *danger* of interval extension:
## No way to find a zero of a positive function, but
## numerically, f(-|M|) becomes zero :
u3 <- uniroot(exp, c(0,2), extendInt="yes", trace=TRUE)

## Nonsense example (must give an error):
tools::assertCondition( uniroot(function(x) 1, 0:1, extendInt="yes"),
                       "error", verbose=TRUE)

## Convergence checking :
sinc <- function(x) ifelse(x == 0, 1, sin(x)/x)
curve(sinc, -6,18); abline(h=0,v=0, lty=3, col=adjustcolor("gray", 0.8))

uniroot(sinc, c(0,5), extendInt="yes", maxiter=4) #-> "just" a warning

## now with  check.conv=TRUE, must signal a convergence error :

try(uniroot(sinc, c(0,5), extendInt="yes", maxiter=4, check.conv=TRUE))

### Weibull cumulative hazard (example origin, Ravi Varadhan):
cumhaz <- function(t, a, b) b * (t/b)^a
froot <- function(x, u, a, b) cumhaz(x, a, b) - u

n <- 1000
u <- -log(runif(n))
a <- 1/2
b <- 1
## Find failure times
ru <- sapply(u, function(x)
   uniroot(froot, u=x, a=a, b=b, interval= c(1.e-14, 1e04),
           extendInt="yes")$root)
ru2 <- sapply(u, function(x)
   uniroot(froot, u=x, a=a, b=b, interval= c(0.01,  10),
           extendInt="yes")$root)
stopifnot(all.equal(ru, ru2, tolerance = 6e-6))

r1 <- uniroot(froot, u= 0.99, a=a, b=b, interval= c(0.01, 10),
             extendInt="up")
stopifnot(all.equal(0.99, cumhaz(r1$root, a=a, b=b)))

## An error if 'extendInt' assumes "wrong zero-crossing direction":

try(uniroot(froot, u= 0.99, a=a, b=b, interval= c(0.1, 10), extendInt="down"))

Update and Re-fit a Model Call


update will update and (by default) re-fit a model. It does this by extracting the call stored in the object, updating the call and (by default) evaluating that call. Sometimes it is useful to call update with only one argument, for example if the data frame has been corrected.

“Extracting the call” in update() and similar functions uses getCall() which itself is a (S3) generic function with a default method that simply gets x$call.

Because of this, update() will often work (via its default method) on new model classes, either automatically, or by providing a simple getCall() method for that class.
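A minimal sketch of how a getCall() method makes update() work for a new class. The class "myfit", its constructor my_fit() and the component name original.call are hypothetical and only serve the illustration.

```r
## Hypothetical wrapper class that stores its call under a non-standard name
my_fit <- function(formula, data) {
  structure(list(fit = lm(formula, data), original.call = match.call()),
            class = "myfit")
}

## The default getCall() method looks for x$call; point it to the right slot
getCall.myfit <- function(x, ...) x$original.call

m1 <- my_fit(dist ~ speed, data = cars)
cl <- update(m1, data = cars[1:25, ], evaluate = FALSE)  # updated call only
cl
m2 <- update(m1, data = cars[1:25, ])                    # re-fitted object
nobs(m2$fit)
```

Changing the formula through update() would additionally need a formula() method for the hypothetical class.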


update(object, ...)
## Default S3 method:
update(object, formula., ..., evaluate = TRUE)

getCall(x, ...)


object, x

An existing fit from a model function such as lm, glm and many others.


Changes to the formula – see update.formula for details.


Additional arguments to the call, or arguments with changed values. Use name = NULL to remove the argument name.


If true, evaluate the new call; otherwise return the call.


If evaluate = TRUE the fitted object, otherwise the updated call.


Chambers, J. M. (1992) Linear models. Chapter 4 of Statistical Models in S eds J. M. Chambers and T. J. Hastie, Wadsworth & Brooks/Cole.



oldcon <- options(contrasts = c("contr.treatment", "contr.poly"))
## Annette Dobson (1990) "An Introduction to Generalized Linear Models".
## Page 9: Plant Weight Data.
ctl <- c(4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14)
trt <- c(4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69)
group <- gl(2, 10, 20, labels = c("Ctl", "Trt"))
weight <- c(ctl, trt)
lm.D9 <- lm(weight ~ group)
summary(lm.D90 <- update(lm.D9, . ~ . - 1))
options(contrasts = c("contr.helmert", "contr.poly"))
getCall(lm.D90)  # "through the origin"
options(oldcon)  # restore the contrasts saved at the start


Model Updating


update.formula is used to update model formulae. This typically involves adding or dropping terms, but updates can be more general.


## S3 method for class 'formula'
update(old, new, ...)



a model formula to be updated.


a formula giving a template which specifies how to update.


further arguments passed to or from other methods.


Either or both of old and new can be objects such as length-one character vectors which can be coerced to a formula via as.formula.

The function works by first identifying the left-hand side and right-hand side of the old formula. It then examines the new formula and substitutes the lhs of the old formula for any occurrence of ‘.’ on the left of new, and substitutes the rhs of the old formula for any occurrence of ‘.’ on the right of new. The result is then simplified via terms.formula(simplify = TRUE).
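The substitution and simplification steps can be sketched with a few small formulae. The variable names are arbitrary; the character inputs are passed to update.formula() directly, since update() itself would dispatch on class "character".

```r
old <- y ~ a + b

f1 <- update(old, . ~ . - b)    # drop a term
f2 <- update(old, . ~ . + a)    # a duplicated term is simplified away
f3 <- update.formula("y ~ a", "~ . + z")   # both arguments coerced via as.formula

f1
f2
f3
```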


The updated formula is returned. The environment of the result is that of old.

See Also

terms, model.matrix.


update(y ~ x,    ~ . + x2) #> y ~ x + x2
update(y ~ x, log(.) ~ . ) #> log(y) ~ x
update(. ~ u+v, res  ~ . ) #> res ~ u + v

F Test to Compare Two Variances


Performs an F test to compare the variances of two samples from normal populations.


var.test(x, ...)

## Default S3 method:
var.test(x, y, ratio = 1,
         alternative = c("two.sided", "less", "greater"),
         conf.level = 0.95, ...)

## S3 method for class 'formula'
var.test(formula, data, subset, na.action, ...)


x, y

numeric vectors of data values, or fitted linear model objects (inheriting from class "lm").


the hypothesized ratio of the population variances of x and y.


a character string specifying the alternative hypothesis, must be one of "two.sided" (default), "greater" or "less". You can specify just the initial letter.


confidence level for the returned confidence interval.


a formula of the form lhs ~ rhs where lhs is a numeric variable giving the data values and rhs a factor with two levels giving the corresponding groups.


an optional matrix or data frame (or similar: see model.frame) containing the variables in the formula formula. By default the variables are taken from environment(formula).


an optional vector specifying a subset of observations to be used.


a function which indicates what should happen when the data contain NAs. Defaults to getOption("na.action").


further arguments to be passed to or from methods.


The null hypothesis is that the ratio of the variances of the populations from which x and y were drawn, or in the data to which the linear models x and y were fitted, is equal to ratio.
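For two numeric samples, the statistic and two-sided p-value can be reproduced by hand. This is a sketch with simulated data; the seed and sample sizes are arbitrary.

```r
set.seed(1)
x <- rnorm(20, sd = 2)
y <- rnorm(15)
ratio <- 1                                   # hypothesized variance ratio

Fstat <- (var(x) / var(y)) / ratio           # test statistic
df1 <- length(x) - 1
df2 <- length(y) - 1
p.lower <- pf(Fstat, df1, df2)
pval <- 2 * min(p.lower, 1 - p.lower)        # two-sided alternative

vt <- var.test(x, y, ratio = ratio)
all.equal(unname(vt$statistic), Fstat)
all.equal(vt$p.value, pval)
```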


A list with class "htest" containing the following components:


the value of the F test statistic.


the degrees of freedom of the F distribution of the test statistic.


the p-value of the test.

a confidence interval for the ratio of the population variances.


the ratio of the sample variances of x and y.


the ratio of population variances under the null.


a character string describing the alternative hypothesis.


the character string "F test to compare two variances".

a character string giving the names of the data.

See Also

bartlett.test for testing homogeneity of variances in more than two samples from normal distributions; ansari.test and mood.test for two rank based (nonparametric) two-sample tests for difference in scale.


x <- rnorm(50, mean = 0, sd = 2)
y <- rnorm(30, mean = 1, sd = 1)
var.test(x, y)                  # Do x and y have the same variance?
var.test(lm(x ~ 1), lm(y ~ 1))  # The same.

Rotation Methods for Factor Analysis


These functions ‘rotate’ loading matrices in factor analysis.


varimax(x, normalize = TRUE, eps = 1e-5)
promax(x, m = 4)



A loadings matrix, with $p$ rows and $k < p$ columns.


The power used in the target for promax. Values of 2 to 4 are recommended.


logical. Should Kaiser normalization be performed? If so the rows of x are re-scaled to unit length before rotation, and scaled back afterwards.


The tolerance for stopping: the relative change in the sum of singular values.


These seek a ‘rotation’ of the factors x %*% T that aims to clarify the structure of the loadings matrix. The matrix T is a rotation (possibly with reflection) for varimax, but a general linear transformation for promax, with the variance of the factors being preserved.
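A short check of these properties, using the swiss data as in the factanal examples. It is only a sketch: it verifies that the varimax matrix is orthogonal and that the rotated loadings equal the original loadings times that matrix.

```r
fa <- factanal(~ ., factors = 2, data = swiss)
L  <- loadings(fa)

vm <- varimax(L)
## T is a rotation (possibly with reflection): T'T = I
all.equal(crossprod(vm$rotmat), diag(2))
## rotated loadings are x %*% T
all.equal(unclass(vm$loadings), unclass(L) %*% vm$rotmat,
          check.attributes = FALSE)

pm <- promax(L)
crossprod(pm$rotmat)   # for promax this need not be the identity
```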


A list with components


The ‘rotated’ loadings matrix, x %*% rotmat, of class "loadings".


The ‘rotation’ matrix.


Hendrickson, A. E. and White, P. O. (1964). Promax: a quick method for rotation to oblique simple structure. British Journal of Statistical Psychology, 17, 65–70. doi:10.1111/j.2044-8317.1964.tb00244.x.

Horst, P. (1965). Factor Analysis of Data Matrices. Holt, Rinehart and Winston. Chapter 10.

Kaiser, H. F. (1958). The varimax criterion for analytic rotation in factor analysis. Psychometrika, 23, 187–200. doi:10.1007/BF02289233.

Lawley, D. N. and Maxwell, A. E. (1971). Factor Analysis as a Statistical Method, second edition. Butterworths.

See Also

factanal, Harman74.cor.


## varimax with normalize = TRUE is the default
fa <- factanal( ~., 2, data = swiss)
varimax(loadings(fa), normalize = FALSE)

Calculate Variance-Covariance Matrix for a Fitted Model Object


Returns the variance-covariance matrix of the main parameters of a fitted model object. The “main” parameters of a model correspond to those returned by coef, and typically do not contain a nuisance scale parameter (sigma).


vcov(object, ...)
## S3 method for class 'lm'
vcov(object, complete = TRUE, ...)
## and also for '[summary.]glm' and 'mlm'
## S3 method for class 'aov'
vcov(object, complete = FALSE, ...)

.vcov.aliased(aliased, vc, complete = TRUE)



a fitted model object, typically. Sometimes also a summary() object of such a fitted model.


for the aov, lm, glm, mlm, and where applicable summary.lm etc methods: logical indicating if the full variance-covariance matrix should be returned also in case of an over-determined system where some coefficients are undefined and coef(.) contains NAs correspondingly. When complete = TRUE, vcov() is compatible with coef() also in this singular case.


additional arguments for method functions. For the glm method this can be used to pass a dispersion parameter.


a logical vector typically identical to is.na(coef(.)) indicating which coefficients are ‘aliased’.


a variance-covariance matrix, typically “incomplete”, i.e., with no rows and columns for aliased coefficients.


vcov() is a generic function and functions with names beginning in vcov. will be methods for this function. Classes with methods for this function include: lm, mlm, glm, nls, summary.lm, summary.glm, negbin, polr, rlm (in package MASS), multinom (in package nnet), gls, lme (in package nlme), coxph and survreg (in package survival).

(vcov() methods for summary objects allow more efficient and still encapsulated access when both summary(mod) and vcov(mod) are needed.)

.vcov.aliased() is an auxiliary function useful for vcov method implementations which have to deal with singular model fits encoded via NA coefficients: It augments a vcov–matrix vc by NA rows and columns where needed, i.e., when some entries of aliased are true and vc is of smaller dimension than length(aliased).


A matrix of the estimated covariances between the parameter estimates in the linear or non-linear predictor of the model. This should have row and column names corresponding to the parameter names given by the coef method.

When some coefficients of the (linear) model are undetermined and hence NA because of linearly dependent terms (or an “over specified” model), also called “aliased”, see alias, then since R version 3.5.0, vcov() (iff complete = TRUE, i.e., by default for lm etc, but not for aov) contains corresponding rows and columns of NAs, wherever coef() has always contained such NAs.
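The singular case can be sketched with a small made-up data set in which x2 is an exact multiple of x1, so that its coefficient is aliased.

```r
d <- data.frame(y  = c(1.2, 2.3, 2.9, 4.1, 5.2, 5.8),
                x1 = 1:6)
d$x2 <- 2 * d$x1                       # linearly dependent on x1
fit <- lm(y ~ x1 + x2, data = d)

coef(fit)                              # coefficient of x2 is NA
vc.full  <- vcov(fit)                  # complete = TRUE: NA row and column
vc.small <- vcov(fit, complete = FALSE)
dim(vc.full)
dim(vc.small)

## .vcov.aliased() pads the smaller matrix back to full size
all.equal(.vcov.aliased(is.na(coef(fit)), vc.small), vc.full)
```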

The Weibull Distribution


Density, distribution function, quantile function and random generation for the Weibull distribution with parameters shape and scale.


dweibull(x, shape, scale = 1, log = FALSE)
pweibull(q, shape, scale = 1, lower.tail = TRUE, log.p = FALSE)
qweibull(p, shape, scale = 1, lower.tail = TRUE, log.p = FALSE)
rweibull(n, shape, scale = 1)


x, q

vector of quantiles.


vector of probabilities.


number of observations. If length(n) > 1, the length is taken to be the number required.

shape, scale

shape and scale parameters, the latter defaulting to 1.

log, log.p

logical; if TRUE, probabilities p are given as log(p).


logical; if TRUE (default), probabilities are $P[X \le x]$, otherwise, $P[X > x]$.


The Weibull distribution with shape parameter $a$ and scale parameter $\sigma$ has density given by

$$f(x) = (a/\sigma) {(x/\sigma)}^{a-1} \exp(-{(x/\sigma)}^{a})$$

for $x > 0$. The cumulative distribution function is $F(x) = 1 - \exp(-{(x/\sigma)}^a)$ on $x > 0$, the mean is $E(X) = \sigma \Gamma(1 + 1/a)$, and the variance is $\mathrm{Var}(X) = \sigma^2(\Gamma(1 + 2/a) - (\Gamma(1 + 1/a))^2)$.
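The closed forms for the distribution function, mean and variance can be checked numerically. This is a sketch with arbitrarily chosen parameter values (shape 2.5, scale 3); the moments are obtained by numerical integration of the density.

```r
a <- 2.5   # shape
s <- 3     # scale

m <- s * gamma(1 + 1/a)                              # mean
v <- s^2 * (gamma(1 + 2/a) - gamma(1 + 1/a)^2)       # variance

m.num <- integrate(function(x) x * dweibull(x, a, s), 0, Inf)$value
v.num <- integrate(function(x) (x - m)^2 * dweibull(x, a, s), 0, Inf)$value

all.equal(m, m.num, tolerance = 1e-4)
all.equal(v, v.num, tolerance = 1e-4)
all.equal(pweibull(2, a, s), 1 - exp(-(2/s)^a))      # distribution function
```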


dweibull gives the density, pweibull gives the distribution function, qweibull gives the quantile function, and rweibull generates random deviates.

Invalid arguments will result in return value NaN, with a warning.

The length of the result is determined by n for rweibull, and is the maximum of the lengths of the numerical arguments for the other functions.

The numerical arguments other than n are recycled to the length of the result. Only the first elements of the logical arguments are used.


The cumulative hazard $H(t) = - \log(1 - F(t))$ is

-pweibull(t, a, b, lower = FALSE, log = TRUE)

which is just $H(t) = {(t/b)}^a$.


[dpq]weibull are calculated directly from the definitions. rweibull uses inversion.


Johnson, N. L., Kotz, S. and Balakrishnan, N. (1995) Continuous Univariate Distributions, volume 1, chapter 21. Wiley, New York.

See Also

Distributions for other standard distributions, including the Exponential which is a special case of the Weibull distribution.


x <- c(0, rlnorm(50))
all.equal(dweibull(x, shape = 1), dexp(x))
all.equal(pweibull(x, shape = 1, scale = pi), pexp(x, rate = 1/pi))
## Cumulative hazard H():
all.equal(pweibull(x, 2.5, pi, lower.tail = FALSE, log.p = TRUE),
          -(x/pi)^2.5, tolerance = 1e-15)
all.equal(qweibull(x/11, shape = 1, scale = pi), qexp(x/11, rate = 1/pi))

Weighted Arithmetic Mean


Compute a weighted mean.


weighted.mean(x, w, ...)

## Default S3 method:
weighted.mean(x, w, ..., na.rm = FALSE)



an object containing the values whose weighted mean is to be computed.


a numerical vector of weights the same length as x giving the weights to use for elements of x.


arguments to be passed to or from methods.


a logical value indicating whether NA values in x should be stripped before the computation proceeds.


This is a generic function and methods can be defined for the first argument x: apart from the default method there are methods for the date-time classes "POSIXct", "POSIXlt", "difftime" and "Date". The default method will work for any numeric-like object for which [, multiplication, division and sum have suitable methods, including complex vectors.

If w is missing then all elements of x are given the same weight, otherwise the weights are normalized to sum to one (if possible: if their sum is zero or infinite the value is likely to be NaN).

Missing values in w are not handled specially and so give a missing value as the result. However, zero weights are handled specially and the corresponding x values are omitted from the sum.
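These rules can be seen on a few small made-up vectors:

```r
x <- c(1, 2, NA, 4)
w <- c(1, 1, 0, 2)

## The NA in x has weight zero, so it is omitted from the sum
wm1 <- weighted.mean(x, w)             # (1*1 + 2*1 + 4*2) / 4

## Weights are normalized: rescaling them changes nothing
wm2 <- weighted.mean(x, 10 * w)

## A missing weight is not handled specially and gives NA
wm3 <- weighted.mean(c(1, 2, 3), c(1, NA, 1))

c(wm1, wm2, wm3)
```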


For the default method, a length-one numeric vector.



## GPA from Siegel 1994
wt <- c(5,  5,  4,  1)/15
x <- c(3.7,3.3,3.5,2.8)
xm <- weighted.mean(x, wt)

Compute Weighted Residuals


Computes weighted residuals from a linear model fit.


weighted.residuals(obj, drop0 = TRUE)



R object, typically of class lm or glm.


logical. If TRUE, drop all cases with weights == 0.


Weighted residuals are based on the deviance residuals, which for a lm fit are the raw residuals $R_i$ multiplied by $\sqrt{w_i}$, where $w_i$ are the weights as specified in lm's call.

Dropping cases with weights zero is compatible with influence and related functions.
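For an lm fit the relation to the raw residuals can be verified directly. The data below are simulated for illustration, with one zero weight.

```r
set.seed(1)
x <- 1:10
w <- 0:9                                   # first case has weight zero
y <- 2 + 0.5 * x + rnorm(10)
fit <- lm(y ~ x, weights = w)

wr  <- weighted.residuals(fit)             # zero-weight case dropped
wr0 <- weighted.residuals(fit, drop0 = FALSE)
c(length(wr), length(wr0))

## sqrt(w) times the raw residuals, restricted to non-zero weights
by.hand <- (sqrt(w) * residuals(fit))[w != 0]
all.equal(unname(wr), unname(by.hand))
```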


Numeric vector of length $n'$, where $n'$ is the number of non-0 weights (drop0 = TRUE) or the number of observations, otherwise.

See Also

residuals, lm.influence, etc.


## following on from example(lm)

x <- 1:10
w <- 0:9
y <- rnorm(x)
weighted.residuals(lmxy <- lm(y ~ x, weights = w))
weighted.residuals(lmxy, drop0 = FALSE)

Extract Model Weights


weights is a generic function which extracts fitting weights from objects returned by modeling functions.

Methods can make use of napredict methods to compensate for the omission of missing values. The default method does so.


weights(object, ...)



an object for which the extraction of model weights is meaningful.


other arguments passed to methods.


Weights extracted from the object object: the default method looks for component "weights" and if not NULL calls napredict on it.
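The effect of napredict in the default method can be sketched with a made-up data frame containing one missing predictor value, fitted once with na.omit and once with na.exclude.

```r
d <- data.frame(x = c(1, 2, NA, 4, 5),
                y = c(1.1, 1.9, 3.2, 3.9, 5.1),
                w = c(1, 2, 1, 2, 1))

fit.omit <- lm(y ~ x, data = d, weights = w, na.action = na.omit)
fit.excl <- lm(y ~ x, data = d, weights = w, na.action = na.exclude)

weights(fit.omit)   # one weight per case actually used
weights(fit.excl)   # padded with NA for the excluded case
```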


Chambers, J. M. and Hastie, T. J. (1992) Statistical Models in S. Wadsworth & Brooks/Cole.


Wilcoxon Rank Sum and Signed Rank Tests


Performs one- and two-sample Wilcoxon tests on vectors of data; the latter is also known as ‘Mann-Whitney’ test.


wilcox.test(x, ...)

## Default S3 method:
wilcox.test(x, y = NULL,
            alternative = c("two.sided", "less", "greater"),
            mu = 0, paired = FALSE, exact = NULL, correct = TRUE,
            conf.int = FALSE, conf.level = 0.95,
            tol.root = 1e-4, digits.rank = Inf, ...)

## S3 method for class 'formula'
wilcox.test(formula, data, subset, na.action = na.pass, ...)



numeric vector of data values. Non-finite (e.g., infinite or missing) values will be omitted.


an optional numeric vector of data values: as with x non-finite values will be omitted.


a character string specifying the alternative hypothesis, must be one of "two.sided" (default), "greater" or "less". You can specify just the initial letter.


a number specifying an optional parameter used to form the null hypothesis. See ‘Details’.


a logical indicating whether you want a paired test.


a logical indicating whether an exact p-value should be computed.


a logical indicating whether to apply continuity correction in the normal approximation for the p-value.

a logical indicating whether a confidence interval should be computed.


confidence level of the interval.


(when conf.int is true:) a positive numeric tolerance, used in uniroot(*, tol=tol.root) calls.


a number; if finite, rank(signif(r, digits.rank)) will be used to compute ranks for the test statistic instead of (the default) rank(r).


a formula of the form lhs ~ rhs where lhs is a numeric variable giving the data values and rhs either 1 for a one-sample or paired test or a factor with two levels giving the corresponding groups. If lhs is of class "Pair" and rhs is 1, a paired test is done, see Examples.


an optional matrix or data frame (or similar: see model.frame) containing the variables in the formula formula. By default the variables are taken from environment(formula).


an optional vector specifying a subset of observations to be used.


a function which indicates what should happen when the data contain NAs.


further arguments to be passed to or from methods. For the formula method, this includes arguments of the default method, but not paired.


The formula interface handles one-sample and paired tests (rhs equal to 1) as well as two-sample tests (rhs a factor with two levels).

If only x is given, or if both x and y are given and paired is TRUE, a Wilcoxon signed rank test of the null that the distribution of x (in the one sample case) or of x - y (in the paired two sample case) is symmetric about mu is performed.

Otherwise, if both x and y are given and paired is FALSE, a Wilcoxon rank sum test (equivalent to the Mann-Whitney test: see the Note) is carried out. In this case, the null hypothesis is that the distributions of x and y differ by a location shift of mu and the alternative is that they differ by some other location shift (and the one-sided alternative "greater" is that x is shifted to the right of y).

By default (if exact is not specified), an exact p-value is computed if the samples contain fewer than 50 finite values and there are no ties. Otherwise, a normal approximation is used.

For stability reasons, it may be advisable to use rounded data or to set digits.rank = 7, say, such that determination of ties does not depend on very small numeric differences (see the example).

Optionally (if argument conf.int is true), a nonparametric confidence interval and an estimator for the pseudomedian (one-sample case) or for the difference of the location parameters x-y is computed. (The pseudomedian of a distribution $F$ is the median of the distribution of $(u+v)/2$, where $u$ and $v$ are independent, each with distribution $F$. If $F$ is symmetric, then the pseudomedian and median coincide. See Hollander & Wolfe (1973), page 34.) Note that in the two-sample case the estimator for the difference in location parameters does not estimate the difference in medians (a common misconception) but rather the median of the difference between a sample from x and a sample from y.

If exact p-values are available, an exact confidence interval is obtained by the algorithm described in Bauer (1972), and the Hodges-Lehmann estimator is employed. Otherwise, the returned confidence interval and point estimate are based on normal approximations. These are continuity-corrected for the interval but not the estimate (as the correction depends on the alternative).
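In the exact case the point estimates can be reproduced by hand. The sketch below reuses the data of the Examples and assumes no ties, so that exact p-values are available: the two-sample estimate is the median of all pairwise differences, the one-sample estimate the median of the Walsh averages.

```r
## Two-sample case: median of all differences x[i] - y[j]
x <- c(0.80, 0.83, 1.89, 1.04, 1.45, 1.38, 1.91, 1.64, 0.73, 1.46)
y <- c(1.15, 0.88, 0.90, 0.74, 1.21)
est2 <- wilcox.test(x, y, conf.int = TRUE)$estimate
hl2  <- median(outer(x, y, "-"))

## One-sample case: median of the averages (z[i] + z[j]) / 2 for i <= j
z <- c(1.83, 0.50, 1.62, 2.48, 1.68, 1.88, 1.55, 3.06, 1.30) -
     c(0.878, 0.647, 0.598, 2.05, 1.06, 1.29, 1.06, 3.14, 1.29)
est1 <- wilcox.test(z, conf.int = TRUE)$estimate
walsh <- outer(z, z, "+") / 2
hl1  <- median(walsh[upper.tri(walsh, diag = TRUE)])

all.equal(unname(est2), hl2)
all.equal(unname(est1), hl1)
```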

With small samples it may not be possible to achieve very high confidence interval coverages. If this happens a warning will be given and an interval with lower coverage will be substituted.

When x (and y if applicable) are valid, the function now always returns, also in the conf.int = TRUE case when a confidence interval cannot be computed, in which case the interval boundaries and sometimes the estimate now contain NaN.


A list with class "htest" containing the following components:


the value of the test statistic with a name describing it.


the parameter(s) for the exact distribution of the test statistic.


the p-value for the test.


the location parameter mu.


a character string describing the alternative hypothesis.


the type of test applied.

a character string giving the names of the data.

a confidence interval for the location parameter. (Only present if argument conf.int = TRUE.)


an estimate of the location parameter. (Only present if argument conf.int = TRUE.)


This function can use large amounts of memory and stack (and even crash R if the stack limit is exceeded) if exact = TRUE and one sample is large (several thousands or more).


The literature is not unanimous about the definitions of the Wilcoxon rank sum and Mann-Whitney tests. The two most common definitions correspond to the sum of the ranks of the first sample with the minimum value ($m(m+1)/2$ for a first sample of size $m$) subtracted or not: R subtracts. It seems Wilcoxon's original paper used the unadjusted sum of the ranks but subsequent tables subtracted the minimum.

R's value can also be computed as the number of all pairs (x[i], y[j]) for which y[j] is not greater than x[i], the most common definition of the Mann-Whitney test.
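Both definitions can be compared on the two-sample data of the Examples, which contain no ties:

```r
x <- c(0.80, 0.83, 1.89, 1.04, 1.45, 1.38, 1.91, 1.64, 0.73, 1.46)
y <- c(1.15, 0.88, 0.90, 0.74, 1.21)
m <- length(x)

## Rank sum of the first sample, minus its minimum possible value
W.rank <- sum(rank(c(x, y))[seq_len(m)]) - m * (m + 1) / 2

## Number of pairs (x[i], y[j]) with y[j] not greater than x[i]
W.pair <- sum(outer(x, y, ">="))

W.test <- unname(wilcox.test(x, y)$statistic)
c(W.rank, W.pair, W.test)
```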


David F. Bauer (1972). Constructing confidence sets using rank statistics. Journal of the American Statistical Association 67, 687–690. doi:10.1080/01621459.1972.10481279.

Myles Hollander and Douglas A. Wolfe (1973). Nonparametric Statistical Methods. New York: John Wiley & Sons. Pages 27–33 (one-sample), 68–75 (two-sample).
Or second edition (1999).

See Also

psignrank, pwilcox.

wilcox_test in package coin for exact, asymptotic and Monte Carlo conditional p-values, including in the presence of ties.

kruskal.test for testing homogeneity in location parameters in the case of two or more samples; t.test for an alternative under normality assumptions [or large samples]


## One-sample test.
## Hollander & Wolfe (1973), 29f.
## Hamilton depression scale factor measurements in 9 patients with
##  mixed anxiety and depression, taken at the first (x) and second
##  (y) visit after initiation of a therapy (administration of a
##  tranquilizer).
x <- c(1.83,  0.50,  1.62,  2.48, 1.68, 1.88, 1.55, 3.06, 1.30)
y <- c(0.878, 0.647, 0.598, 2.05, 1.06, 1.29, 1.06, 3.14, 1.29)
wilcox.test(x, y, paired = TRUE, alternative = "greater")
wilcox.test(y - x, alternative = "less")    # The same.
wilcox.test(y - x, alternative = "less",
            exact = FALSE, correct = FALSE) # H&W large sample
                                            # approximation

## Formula interface to one-sample and paired tests

depression <- data.frame(first = x, second = y, change = y - x)
wilcox.test(change ~ 1, data = depression)
wilcox.test(Pair(first, second) ~ 1, data = depression)

## Two-sample test.
## Hollander & Wolfe (1973), 69f.
## Permeability constants of the human chorioamnion (a placental
##  membrane) at term (x) and between 12 to 26 weeks gestational
##  age (y).  The alternative of interest is greater permeability
##  of the human chorioamnion for the term pregnancy.
x <- c(0.80, 0.83, 1.89, 1.04, 1.45, 1.38, 1.91, 1.64, 0.73, 1.46)
y <- c(1.15, 0.88, 0.90, 0.74, 1.21)
wilcox.test(x, y, alternative = "g")        # greater
wilcox.test(x, y, alternative = "greater",
            exact = FALSE, correct = FALSE) # H&W large sample
                                            # approximation

wilcox.test(rnorm(10), rnorm(10, 2), conf.int = TRUE)

## Formula interface.
boxplot(Ozone ~ Month, data = airquality)
wilcox.test(Ozone ~ Month, data = airquality,
            subset = Month %in% c(5, 8))

## accuracy in ties determination via 'digits.rank':
wilcox.test( 4:2,      3:1,     paired=TRUE) # Warning:  cannot compute exact p-value with ties
wilcox.test((4:2)/10, (3:1)/10, paired=TRUE) # no ties => *no* warning
wilcox.test((4:2)/10, (3:1)/10, paired=TRUE, digits.rank = 9) # same ties as (4:2, 3:1)

Distribution of the Wilcoxon Rank Sum Statistic


Density, distribution function, quantile function and random generation for the distribution of the Wilcoxon rank sum statistic obtained from samples with size m and n, respectively.


dwilcox(x, m, n, log = FALSE)
pwilcox(q, m, n, lower.tail = TRUE, log.p = FALSE)
qwilcox(p, m, n, lower.tail = TRUE, log.p = FALSE)
rwilcox(nn, m, n)


x, q

vector of quantiles.


vector of probabilities.


number of observations. If length(nn) > 1, the length is taken to be the number required.

m, n

numbers of observations in the first and second sample, respectively. Can be vectors of positive integers.

log, log.p

logical; if TRUE, probabilities p are given as log(p).


logical; if TRUE (default), probabilities are $P[X \le x]$, otherwise, $P[X > x]$.


This distribution is obtained as follows. Let x and y be two random, independent samples of size m and n. Then the Wilcoxon rank sum statistic is the number of all pairs (x[i], y[j]) for which y[j] is not greater than x[i]. This statistic takes values between 0 and m * n, and its mean and variance are m * n / 2 and m * n * (m + n + 1) / 12, respectively.
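The range, mean and variance stated above can be checked from the probabilities themselves, here for m = 4 and n = 6 (an arbitrary choice):

```r
m <- 4
n <- 6
k <- 0:(m * n)                  # all possible values of the statistic
p <- dwilcox(k, m, n)

total <- sum(p)                 # probabilities sum to one
mu    <- sum(k * p)             # mean
sig2  <- sum((k - mu)^2 * p)    # variance

all.equal(total, 1)
all.equal(mu,   m * n / 2)
all.equal(sig2, m * n * (m + n + 1) / 12)
```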

If any of the first three arguments are vectors, the recycling rule is used to do the calculations for all combinations of the three up to the length of the longest vector.


dwilcox gives the density, pwilcox gives the distribution function, qwilcox gives the quantile function, and rwilcox generates random deviates.

The length of the result is determined by nn for rwilcox, and is the maximum of the lengths of the numerical arguments for the other functions.

The numerical arguments other than nn are recycled to the length of the result. Only the first elements of the logical arguments are used.


These functions can use large amounts of memory and stack (and even crash R if the stack limit is exceeded and stack-checking is not in place) if one sample is large (several thousands or more).


S-PLUS used a different (but equivalent) definition of the Wilcoxon statistic: see wilcox.test for details.


Kurt Hornik


These ("d","p","q") are calculated via recursion, based on cwilcox(k, m, n), the number of choices with statistic k from samples of size m and n, which is itself calculated recursively and the results cached. Then dwilcox and pwilcox sum appropriate values of cwilcox, and qwilcox is based on inversion.

rwilcox generates a random permutation of ranks and evaluates the statistic. Note that it is based on the same C code as sample(), and hence is determined by .Random.seed, notably from RNGkind(sample.kind = ..) which changed with R version 3.6.0.

See Also

wilcox.test to calculate the statistic from data, find p values and so on.

Distributions for standard distributions, including dsignrank for the distribution of the one-sample Wilcoxon signed rank statistic.



x <- -1:(4*6 + 1)
fx <- dwilcox(x, 4, 6)
Fx <- pwilcox(x, 4, 6)

layout(rbind(1,2), widths = 1, heights = c(3,2))
plot(x, fx, type = "h", col = "violet",
     main =  "Probabilities (density) of Wilcoxon-Statist.(n=6, m=4)")
plot(x, Fx, type = "s", col = "blue",
     main =  "Distribution of Wilcoxon-Statist.(n=6, m=4)")
abline(h = 0:1, col = "gray20", lty = 2)
layout(1) # set back

N <- 200
hist(U <- rwilcox(N, m = 4,n = 6), breaks = 0:25 - 1/2,
     border = "red", col = "pink", sub = paste("N =",N))
mtext("N * f(x),  f() = true \"density\"", side = 3, col = "blue")
 lines(x, N*fx, type = "h", col = "blue", lwd = 2)
points(x, N*fx, cex = 2)

## Better is a Quantile-Quantile Plot
qqplot(U, qw <- qwilcox((1:N - 1/2)/N, m = 4, n = 6),
       main = paste("Q-Q-Plot of empirical and theoretical quantiles",
                     "Wilcoxon Statistic,  (m=4, n=6)", sep = "\n"))
n <- as.numeric(names(print(tU <- table(U))))
text(n+.2, n+.5, labels = tU, col = "red")

Time (Series) Windows


window is a generic function which extracts the subset of the object x observed between the times start and end. If a frequency is specified, the series is then re-sampled at the new frequency.


window(x, ...)
## S3 method for class 'ts'
window(x, ...)
## Default S3 method:
window(x, start = NULL, end = NULL,
      frequency = NULL, deltat = NULL, extend = FALSE, ts.eps = getOption("ts.eps"), ...)

window(x, ...) <- value
## S3 replacement method for class 'ts'
window(x, start, end, frequency, deltat, ...) <- value



x

a time-series (or other object if not replacing values).


start

the start time of the period of interest.


end

the end time of the period of interest.

frequency, deltat

the new frequency can be specified by either (or both if they are consistent).


extend

logical. If true, the start and end values are allowed to extend the series. If false, attempts to extend the series give a warning and are ignored.


ts.eps

time series comparison tolerance. Frequencies are considered equal if their absolute difference is less than ts.eps and boundaries (length-1 versions of start and end) are checked with fuzz ts.eps/frequency(x).


...

further arguments passed to or from other methods.


value

replacement values.


The start and end times can be specified as for ts. If there is no observation at the new start or end, the immediately following (start) or preceding (end) observation time is used.
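
A minimal sketch of how start and end are interpreted, using a made-up quarterly series (the object x below is hypothetical). It also shows re-sampling when a new frequency is given.

```r
# Hypothetical quarterly series: 20 observations starting in 2000 Q1
x <- ts(1:20, start = c(2000, 1), frequency = 4)

# Times can be given as for ts(): c(year, quarter) or a single number
w1 <- window(x, start = c(2001, 2), end = c(2001, 4))
w2 <- window(x, start = 2001.25, end = 2001.75)
all.equal(w1, w2)

# 2001.1 and 2001.9 are not observation times: start moves forward
# to 2001.25 and end moves back to 2001.75
w3 <- window(x, start = 2001.1, end = 2001.9)
time(w3)
all.equal(w3, w1)

# A new frequency re-samples the series: frequency = 1 keeps first quarters
w4 <- window(x, frequency = 1)
w4
```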

The replacement function has a method for ts objects, and is allowed to extend the series (with a warning). There is no default method.


The value depends on the method. window.default will return a vector or matrix with an appropriate tsp attribute.

window.ts differs from window.default only in ensuring the result is a ts object.

If extend = TRUE the series will be padded with NAs if needed.
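
The effect of extend can be sketched with a made-up series (x is hypothetical). Without extend the out-of-range start is ignored with a warning, which is suppressed here only to keep the output clean.

```r
x <- ts(1:20, start = c(2000, 1), frequency = 4)

# Without extend, a start before the series is ignored (with a warning)
w_short <- suppressWarnings(window(x, start = 1999, end = c(2000, 2)))
length(w_short)

# With extend = TRUE the four quarters of 1999 are padded with NA
w_ext <- window(x, start = 1999, end = c(2000, 2), extend = TRUE)
w_ext
sum(is.na(w_ext))
```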


Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.

See Also

time, ts.


window(presidents, 1960, c(1969,4)) # values in the 1960's
window(presidents, deltat = 1)  # All Qtr1s
window(presidents, start = c(1945,3), deltat = 1)  # All Qtr3s
window(presidents, 1944, c(1979,2), extend = TRUE)

pres <- window(presidents, 1945, c(1949,4)) # values in the 1940's
window(pres, 1945.25, 1945.50) <- c(60, 70)
window(pres, 1944, 1944.75) <- 0 # will generate a warning
window(pres, c(1945,4), c(1949,4), frequency = 1) <- 85:89

Cross Tabulation


Create a contingency table (optionally a sparse matrix) from cross-classifying factors, usually contained in a data frame, using a formula interface.


xtabs(formula = ~., data = parent.frame(), subset, sparse = FALSE,
      na.action, na.rm = FALSE, addNA = FALSE,
      exclude = if(!addNA) c(NA, NaN), drop.unused.levels = FALSE)

## S3 method for class 'xtabs'
print(x, na.print = "", ...)



formula

a formula object with the cross-classifying variables (separated by +) on the right-hand side (or an object which can be coerced to a formula). Interactions are not allowed. On the left-hand side, one may optionally give a vector or a matrix of counts; in the latter case, the columns are interpreted as corresponding to the levels of a variable. This is useful if the data have already been tabulated; see the examples below.


data

an optional matrix or data frame (or similar: see model.frame) containing the variables in the formula formula. By default the variables are taken from environment(formula).


subset

an optional vector specifying a subset of observations to be used.


sparse

logical specifying if the result should be a sparse matrix, i.e., inheriting from sparseMatrix. Only works for two factors (since there are no higher-order sparse array classes yet).


na.action

a function which indicates what should happen when the variables in formula (or subset) contain NAs. Defaults to na.pass, so na.rm and addNA, respectively, control the handling of missing values for the two sides of the formula. Using na.omit removes any incomplete cases.


na.rm

logical: should missing values on the left-hand side of the formula be treated as zero when computing the sum?


addNA

logical indicating if NAs in the factors should get a separate level and be counted, using addNA(*, ifany=TRUE). This has no effect if na.action = na.omit.


exclude

a vector of values to be excluded when forming the set of levels of the classifying factors.


drop.unused.levels

a logical indicating whether to drop unused levels in the classifying factors. If this is FALSE and there are unused levels, the table will contain zero marginals, and a subsequent chi-squared test for independence of the factors will not work.


x

an object of class "xtabs".


na.print

character string (or NULL) indicating how NA are printed. The default ("") does not show NAs clearly, and na.print = "NA" may be advisable instead.


...

further arguments passed to or from other methods.


There is a summary method for contingency table objects created by table or xtabs(*, sparse = FALSE), which gives basic information and performs a chi-squared test for independence of factors (note that the function chisq.test currently only handles 2-d tables).

If a left-hand side is given in formula, its entries are simply summed over the cells corresponding to the right-hand side; this also works if the LHS does not give counts.
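
A small sketch of the summing behaviour, using made-up data in which the left-hand side holds weights rather than counts (the data frame d and its columns are hypothetical).

```r
# Hypothetical data: 'w' holds weights, not counts
d <- data.frame(g = c("a", "a", "b"), w = c(1.5, 2.5, 4))

xt <- xtabs(w ~ g, data = d)
xt

# The same cell sums computed directly
by_group <- tapply(d$w, d$g, sum)
all.equal(as.vector(xt), as.vector(by_group))
```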

For variables in formula which are factors, exclude must be specified explicitly; the default exclusions will not be used.

In R versions before 3.4.0, e.g., when na.action = na.pass, sometimes zeroes (0) were returned instead of NAs.

In R versions before 4.4.0, when !addNA as by default, the default na.action was na.omit, effectively treating missing counts as zero.


By default, when sparse = FALSE, a contingency table in array representation of S3 class c("xtabs", "table"), with a "call" attribute storing the matched call.

When sparse = TRUE, a sparse numeric matrix, specifically an object of S4 class dgTMatrix from package Matrix.
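
For the default dense case, the class and the "call" attribute can be inspected as below. The data frame d is a made-up example. The sparse case is not exercised here, since it needs package Matrix.

```r
# Hypothetical two-factor data
d <- data.frame(f = c("x", "y", "x"), g = c("u", "u", "v"))

xt <- xtabs(~ f + g, data = d)
class(xt)          # S3 classes of the dense result
attr(xt, "call")   # the matched call stored with the table
dim(xt)
```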

See Also

table for traditional cross-tabulation, and as.data.frame.table which is the inverse operation of xtabs (see the DF example below).

sparseMatrix on sparse matrices in package Matrix.
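
The relationship between a table and its data-frame form can be sketched as a round trip. The choice of the HairEyeColor dataset here is an assumption made for illustration; any contingency table with named dimensions should behave the same way.

```r
# Flatten a 3-way contingency table into a data frame with a 'Freq' column
df <- as.data.frame(HairEyeColor)
head(df)

# Rebuild the table from the data frame
xt <- xtabs(Freq ~ ., data = df)
dim(xt)

# Same counts and same dimension names as the original table
all(xt == HairEyeColor)
all.equal(dimnames(xt), dimnames(HairEyeColor))
```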


## 'esoph' has the frequencies of cases and controls for all levels of
## the variables 'agegp', 'alcgp', and 'tobgp'.
xtabs(cbind(ncases, ncontrols) ~ ., data = esoph)
## Output is not really helpful ... flat tables are better:
ftable(xtabs(cbind(ncases, ncontrols) ~ ., data = esoph))
## In particular if we have fewer factors ...
ftable(xtabs(cbind(ncases, ncontrols) ~ agegp, data = esoph))

## 'UCBAdmissions' is already a contingency table in array form.
DF <- as.data.frame(UCBAdmissions)
## Now 'DF' is a data frame with a grid of the factors and the counts
## in variable 'Freq'.
## Nice for taking margins ...
xtabs(Freq ~ Gender + Admit, DF)
## And for testing independence ...
summary(xtabs(Freq ~ ., DF))

## with NA's
DN <- DF; DN[cbind(6:9, c(1:2,4,1))] <- NA
DN # 'Freq' is missing only for (Rejected, Female, B)
(xtNA <- xtabs(Freq ~ Gender + Admit, DN))     # NA prints 'invisibly'
print(xtNA, na.print = "NA")                   # show NA's better
xtabs(Freq ~ Gender + Admit, DN, na.rm = TRUE) # ignore missing Freq
## Use addNA = TRUE to tabulate missing factor levels:
xtabs(Freq ~ Gender + Admit, DN, addNA = TRUE)
xtabs(Freq ~ Gender + Admit, DN, addNA = TRUE, na.rm = TRUE)
## na.action = na.omit removes all rows with NAs right from the start:
xtabs(Freq ~ Gender + Admit, DN, na.action = na.omit)

## Create a nice display for the warp break data.
warpbreaks$replicate <- rep_len(1:9, 54)
ftable(xtabs(breaks ~ wool + tension + replicate, data = warpbreaks))

### ---- Sparse Examples ----

if(require("Matrix")) withAutoprint({
 ## similar to "nlme"s  'ergoStool' :
 d.ergo <- data.frame(Type = paste0("T", rep(1:4, 9*4)),
                      Subj = gl(9, 4, 36*4))
 xtabs(~ Type + Subj, data = d.ergo) # 4 replicates each
 set.seed(15) # a subset of cases:
 xtabs(~ Type + Subj, data = d.ergo[sample(36, 10), ], sparse = TRUE)

 ## Hypothetical two-level setup:
 inner <- factor(sample(letters[1:25], 100, replace = TRUE))
 inout <- factor(sample(LETTERS[1:5], 25, replace = TRUE))
 fr <- data.frame(inner = inner, outer = inout[as.integer(inner)])
 xtabs(~ inner + outer, fr, sparse = TRUE)