Title: | The R Base Package |
---|---|
Description: | Base R functions. |
Authors: | R Core Team and contributors worldwide |
Maintainer: | R Core Team <[email protected]> |
License: | Part of R 4.4.1 |
Version: | 4.4.1 |
Built: | 2024-06-15 17:27:47 UTC |
Source: | base |
Base R functions
This package contains the basic functions which let R function as a language: arithmetic, input/output, basic programming support, etc. Its contents are available through inheritance from any environment.
For a complete list of functions, use library(help = "base").
Bin a numeric vector and return integer codes for the binning.
.bincode(x, breaks, right = TRUE, include.lowest = FALSE)
x |
a numeric vector which is to be converted to integer codes by binning. |
breaks |
a numeric vector of two or more cut points, sorted in increasing order. |
right |
logical, indicating if the intervals should be closed on the right (and open on the left) or vice versa. |
include.lowest |
logical, indicating if an ‘x[i]’ equal to
the lowest (or highest, for |
This is a ‘barebones’ version of cut.default(labels = FALSE) intended for use in other functions which have checked the arguments passed. (Note the different order of the arguments they have in common.)
Unlike cut, the breaks do not need to be unique. An input can only fall into a zero-length interval if it is closed at both ends, so only if include.lowest = TRUE and it is the first (or last for right = FALSE) interval.
An integer vector of the same length as x indicating which bin each element falls into (the leftmost bin being bin 1). NaN and NA elements of x are mapped to NA codes, as are values outside the range of breaks.
## An example with non-unique breaks:
x <- c(0, 0.01, 0.5, 0.99, 1)
b <- c(0, 0, 1, 1)
.bincode(x, b, TRUE)
.bincode(x, b, FALSE)
.bincode(x, b, TRUE, TRUE)
.bincode(x, b, FALSE, TRUE)
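As a rough illustration of the relationship to cut described above, the following sketch compares .bincode with cut(labels = FALSE) on a small vector. The data and break points are made up for the example, and arguments are passed by name because the two functions order them differently.

```r
# Hypothetical data: in-range values plus an NA and an out-of-range value
x <- c(0, 0.01, 0.5, 0.99, 1, NA, 2)
b <- c(0, 0.5, 1)

# With right = TRUE and include.lowest = TRUE the bins are [0, 0.5] and (0.5, 1]
codes <- .bincode(x, b, right = TRUE, include.lowest = TRUE)

# cut() with labels = FALSE should give the same integer codes
codes_cut <- cut(x, breaks = b, labels = FALSE,
                 right = TRUE, include.lowest = TRUE)

print(codes)
print(identical(codes, codes_cut))
```

The NA input and the value 2, which lies outside the range of the breaks, both receive an NA code.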
A pairlist of the names of open graphics devices is stored in .Devices. The name of the active device (see dev.cur) is stored in .Device. Both are symbols and so appear in the base namespace.
.Device
.Devices
.Device is a length-one character vector.
.Devices is a pairlist of length-one character vectors. The first entry is always "null device", and there are as many entries as the maximal number of graphics devices which have been simultaneously active. If a device has been removed, its entry will be "" until the device number is reused.
Devices may add attributes to the character vector: for example devices which write to a file may record its path in attribute "filepath".
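A minimal sketch of how these two symbols change when a device is opened, assuming the grDevices package is available and that a file-less PDF device (pdf(NULL)) can be opened in the current session. The variable names are illustrative only.

```r
# Open a PDF device that writes to no file, then inspect the bookkeeping symbols
grDevices::pdf(NULL)
active <- as.character(.Device)

# Flatten the pairlist to a plain character vector (attributes are dropped)
open_names <- vapply(as.list(.Devices),
                     function(d) as.character(d)[1L], character(1))

invisible(grDevices::dev.off())

print(active)
print(open_names)
```

The first element of open_names is expected to be "null device", with the newly opened device listed after it.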
.Machine is a variable holding information on the numerical characteristics of the machine R is running on, such as the largest double or integer and the machine's precision.
.Machine
The algorithm is based on Cody's (1988) subroutine MACHAR. As all current implementations of R use 32-bit integers and use IEC 60559 floating-point (double precision) arithmetic, the "integer" and "double" related values are the same for almost all R builds.
Note that on most platforms smaller positive values than .Machine$double.xmin can occur. On a typical R platform the smallest positive double is about 5e-324.
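The note about values below .Machine$double.xmin can be checked directly. The sketch below assumes a typical platform with IEC 60559 double precision arithmetic; on other platforms the results may differ.

```r
eps  <- .Machine$double.eps
xmin <- .Machine$double.xmin

# 1 + eps is distinguishable from 1, whereas 1 + eps/2 rounds back to 1
c(plus_eps = (1 + eps) != 1, plus_half_eps = (1 + eps / 2) != 1)

# Halving double.xmin still gives a positive (denormalized) number
below_xmin <- xmin / 2
c(positive = below_xmin > 0, smaller = below_xmin < xmin)

# The smallest positive double on such a platform, about 5e-324
tiny <- 2^(-1074)
print(tiny)
print(tiny / 2 == 0)   # halving it underflows to zero
```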
A list with components
double.eps |
the smallest positive floating-point number
|
double.neg.eps |
a small positive floating-point number |
double.xmin |
the smallest non-zero normalized
floating-point number, a power of the radix, i.e.,
|
double.xmax |
the largest normalized floating-point number.
Typically, it is equal to |
double.base |
the radix for the floating-point representation:
normally |
double.digits |
the number of base digits in the floating-point
significand: normally |
double.rounding |
the rounding action, one of |
double.guard |
the number of guard digits for multiplication
with truncating arithmetic. It is 1 if floating-point arithmetic
truncates and more than |
double.ulp.digits |
the largest negative integer |
double.neg.ulp.digits |
the largest negative integer |
double.exponent |
the number of bits (decimal places if |
double.min.exp |
the largest in magnitude negative integer |
double.max.exp |
the smallest positive power of |
integer.max |
the largest integer which can be represented.
Always |
sizeof.long |
the number of bytes in a C ‘long’ type:
|
sizeof.longlong |
the number of bytes in a C ‘long long’
type. Will be zero if there is no such type, otherwise usually
|
sizeof.longdouble |
the number of bytes in a C ‘long double’
type. Will be zero if there is no such type (or its use was
disabled when R was built), otherwise possibly
|
sizeof.pointer |
the number of bytes in the C |
sizeof.time_t |
the number of bytes in the C |
longdouble.eps , longdouble.neg.eps , longdouble.digits , ...
|
introduced in R 4.0.0. When
|
In the (typical) case where capabilities("long.double") is true, R uses the ‘long double’ C type in quite a few places internally for accumulators in e.g. sum, reading non-integer numeric constants into (binary) double precision numbers, or arithmetic such as x %% y; also, ‘long double’ can be read by readBin.
For this reason, in that case, .Machine contains ten further components, longdouble.eps, *.neg.eps, *.digits, *.rounding, *.guard, *.ulp.digits, *.neg.ulp.digits, *.exponent, *.min.exp, and *.max.exp, computed entirely analogously to their double.* counterparts, see there.
sizeof.longdouble only tells you the amount of storage allocated for a long double. Often what is stored is the 80-bit extended double type of IEC 60559, padded to the double alignment used on the platform; this seems to be the case for the common R platforms using ix86 and x86_64 chips. There are other implementations of long double, usually in software, for example on Sparc Solaris and AIX.
Note that it is legal for a platform to have a ‘long double’ C type which is identical to the ‘double’ type; this happens on ARM CPUs. In that case capabilities("long.double") will be false, but on versions of R prior to 4.0.4, .Machine may contain "longdouble.<kind>" elements.
Uses a C translation of Fortran code in the reference, modified by the R Core Team to defeat over-optimization in modern compilers.
Cody, W. J. (1988). MACHAR: A subroutine to dynamically determine machine parameters. Transactions on Mathematical Software, 14(4), 303–311. doi:10.1145/50063.51907.
.Platform for details of the platform.
.Machine
## or for a neat printout
noquote(unlist(format(.Machine)))
.Platform is a list with some details of the platform under which R was built. This provides means to write OS-portable R code.
.Platform
A list with at least the following components:
OS.type |
character string, giving the Operating System
(family) of the computer. One of |
file.sep |
character string, giving the file separator used on your
platform: |
dynlib.ext |
character string, giving the file name extension of
dynamically loadable libraries, e.g., |
GUI |
character string, giving the type of GUI in use, or |
endian |
character string, |
pkgType |
character string, the preferred setting for
This should not be used to identify the OS. |
path.sep |
character string, giving the path separator,
used on your platform, e.g., |
r_arch |
character string, possibly |
.Platform$GUI is set to "AQUA" under the macOS GUI, R.app. This has a number of consequences:
‘/usr/local/bin’ is appended to the PATH environment variable.
the default graphics device is set to quartz.
selects native (rather than Tk) widgets for the graphics = TRUE options of menu and select.list.
HTML help is displayed in the internal browser.
the spreadsheet-like data editor/viewer uses a Quartz version rather than the X11 one.
R.version and Sys.info give more details about the OS. In particular, R.version$platform is the canonical name of the platform under which R was compiled.
osVersion may give more details about the platform R is running on.
.Machine for details of the arithmetic used, and system for invoking platform-specific system commands.
capabilities and extSoftVersion (and links there) for availability of capabilities partly external to R but used from R functions.
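A short sketch of how .Platform might be used to keep code portable. The path components and the null-device choice are hypothetical and only serve the illustration.

```r
# OS.type identifies the OS family ("unix" or "windows")
os_family <- .Platform$OS.type

# Build paths with file.path() or .Platform$file.sep instead of hard-coding
cfg  <- file.path("project", "config", "settings.yml")   # hypothetical path
cfg2 <- paste("project", "config", "settings.yml", sep = .Platform$file.sep)

# Branch only where behaviour is genuinely platform-specific
null_dev <- if (os_family == "windows") "NUL" else "/dev/null"

print(os_family)
print(identical(cfg, cfg2))
print(.Platform$endian)
```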
## Note: this can be done in a system-independent way by dir.exists()
if(.Platform$OS.type == "unix") {
   system.test <- function(...) system(paste("test", ...)) == 0L
   dir.exists2 <- function(dir)
       sapply(dir, function(d) system.test("-d", d))
   dir.exists2(c(R.home(), "/tmp", "~", "/NO")) # > T T T F
}
Abbreviate strings to at least minlength characters, such that they remain unique (if they were), unless strict = TRUE.
abbreviate(names.arg, minlength = 4, use.classes = TRUE, dot = FALSE, strict = FALSE, method = c("left.kept", "both.sides"), named = TRUE)
names.arg |
a character vector of names to be abbreviated, or an
object to be coerced to a character vector by |
minlength |
the minimum length of the abbreviations. |
use.classes |
logical: should lowercase characters be removed first? |
dot |
logical: should a dot ( |
strict |
logical: should |
method |
a character string specifying the method used with default
|
named |
logical: should |
The default algorithm (method = "left.kept") used is similar to that of S. For a single string it works as follows.
First spaces at the ends of the string are stripped.
Then (if necessary) any other spaces are stripped.
Next, lower case vowels are removed followed by lower case consonants.
Finally if the abbreviation is still longer than minlength upper case letters and symbols are stripped.
Characters are always stripped from the end of the strings first. If an element of names.arg contains more than one word (words are separated by spaces) then at least one letter from each word will be retained.
Missing (NA) values are unaltered.
If use.classes is FALSE then the only distinction is to be between letters and space.
A character vector containing abbreviations for the character strings in its first argument. Duplicates in the original names.arg will be given identical abbreviations. If any non-duplicated elements have the same minlength abbreviations then, if method = "both.sides" the basic internal abbreviate() algorithm is applied to the characterwise reversed strings; if there are still duplicated abbreviations and if strict = FALSE as by default, minlength is incremented by one and new abbreviations are found for those elements only. This process is repeated until all unique elements of names.arg have unique abbreviations.
If named is true, the character version of names.arg is attached to the returned value as a names attribute: no other attributes are retained.
If an input element contains non-ASCII characters, the corresponding value will be in UTF-8 and marked as such (see Encoding).
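The effect of strict and named on the returned value can be sketched as follows. The input vector is the same small example used in this page's examples; the variable names are illustrative.

```r
x <- c("abcd", "efgh", "abce")

# strict = TRUE keeps to minlength, so clashing abbreviations may appear
ab_strict <- abbreviate(x, 2, strict = TRUE)

# strict = FALSE (the default) lengthens clashing abbreviations until unique
ab_unique <- abbreviate(x, 2)

# named = FALSE returns the abbreviations without a names attribute
ab_plain <- abbreviate(x, 2, named = FALSE)

print(ab_strict)
print(ab_unique)
print(anyDuplicated(ab_unique) == 0L)
print(is.null(names(ab_plain)))
```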
If use.classes is true (the default), this is really only suitable for English, and prior to R 3.3.0 did not work correctly with non-ASCII characters in multibyte locales. It will warn if used with non-ASCII characters (and required to reduce the length). It is unlikely to work well with inputs not in the Unicode Basic Multilingual Plane nor on (rare) platforms where wide characters are not encoded in Unicode.
As from R 3.3.0 the concept of ‘vowel’ is extended from English vowels by including characters which are accented versions of lower-case English vowels (including ‘o with stroke’). Of course, there are languages (even Western European languages such as Welsh) with other vowels.
x <- c("abcd", "efgh", "abce")
abbreviate(x, 2)
abbreviate(x, 2, strict = TRUE) # >> 1st and 3rd are == "ab"

(st.abb <- abbreviate(state.name, 2))
stopifnot(identical(unname(st.abb),
           abbreviate(state.name, 2, named=FALSE)))
table(nchar(st.abb)) # out of 50, 3 need 4 letters :
as <- abbreviate(state.name, 3, strict = TRUE)
as[which(as == "Mss")]

## and without distinguishing vowels:
st.abb2 <- abbreviate(state.name, 2, FALSE)
cbind(st.abb, st.abb2)[st.abb2 != st.abb, ]

## method = "both.sides" helps: no 4-letters, and only 4 3-letters:
st.ab2 <- abbreviate(state.name, 2, method = "both")
table(nchar(st.ab2))
## Compare the two methods:
cbind(st.abb, st.ab2)
Searches for approximate matches to pattern (the first argument) within each element of the string x (the second argument) using the generalized Levenshtein edit distance (the minimal possibly weighted number of insertions, deletions and substitutions needed to transform one string into another).
agrep(pattern, x, max.distance = 0.1, costs = NULL,
      ignore.case = FALSE, value = FALSE, fixed = TRUE,
      useBytes = FALSE)

agrepl(pattern, x, max.distance = 0.1, costs = NULL,
       ignore.case = FALSE, fixed = TRUE, useBytes = FALSE)
pattern |
a non-empty character string to be matched. For
|
x |
character vector where matches are sought.
Coerced by |
max.distance |
maximum distance allowed for a match. Expressed either as integer, or as a fraction of the pattern length times the maximal transformation cost (will be replaced by the smallest integer not less than the corresponding fraction), or a list with possible components
If |
costs |
a numeric vector or list with names partially matching
‘insertions’, ‘deletions’ and ‘substitutions’ giving
the respective costs for computing the generalized Levenshtein
distance, or |
ignore.case |
if |
value |
if |
fixed |
logical. If |
useBytes |
logical. If |
The Levenshtein edit distance is used as measure of approximateness: it is the (possibly cost-weighted) total number of insertions, deletions and substitutions required to transform one string into another.
This uses the tre code by Ville Laurikari (https://github.com/laurikari/tre), which supports MBCS character matching.
The main effect of useBytes = TRUE is to avoid errors/warnings about invalid inputs and spurious matches in multibyte locales. It inhibits the conversion of inputs with marked encodings, and is forced if any input is found which is marked as "bytes" (see Encoding).
agrep returns a vector giving the indices of the elements that yielded a match, or, if value is TRUE, the matched elements (after coercion, preserving names but no other attributes).
agrepl returns a logical vector.
Since someone who read the description carelessly even filed a bug report on it, do note that this matches substrings of each element of x (just as grep does) and not whole elements. See also adist in package utils, which optionally returns the offsets of the matched substrings.
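The substring behaviour noted above can be seen by comparing agrepl with the whole-string and partial distances from utils::adist. This is only a sketch; the second element of the input vector is a made-up non-matching string.

```r
haystack <- c("1 lazy 2", "completely different")

# "lasy" is within one substitution of the substring "lazy" in the first element
hits <- agrepl("lasy", haystack)

# Distance to the whole element versus the best-matching substring
whole   <- drop(utils::adist("lasy", "1 lazy 2"))
partial <- drop(utils::adist("lasy", "1 lazy 2", partial = TRUE))

print(hits)
print(c(whole = whole, partial = partial))
```

The whole-string distance is larger than the partial one because the surrounding characters count as insertions when the entire element must be matched.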
Original version in R < 2.10.0 by David Meyer. Current version by Brian Ripley and Kurt Hornik.
grep, adist.
A different interface to approximate string matching is provided by aregexec().
agrep("lasy", "1 lazy 2")
agrep("lasy", c(" 1 lazy 2", "1 lasy 2"), max.distance = list(sub = 0))
agrep("laysy", c("1 lazy", "1", "1 LAZY"), max.distance = 2)
agrep("laysy", c("1 lazy", "1", "1 LAZY"), max.distance = 2, value = TRUE)
agrep("laysy", c("1 lazy", "1", "1 LAZY"), max.distance = 2, ignore.case = TRUE)
Given a set of logical vectors, are all of the values true?
all(..., na.rm = FALSE)
... |
zero or more logical vectors. Other objects of zero length are ignored, and the rest are coerced to logical ignoring any class. |
na.rm |
logical. If true |
This is a generic function: methods can be defined for it directly or via the Summary group generic. For this to work properly, the arguments ... should be unnamed, and dispatch is on the first argument.
Coercion of types other than integer (raw, double, complex, character, list) gives a warning as this is often unintentional.
This is a primitive function.
The value is a logical vector of length one.
Let x denote the concatenation of all the logical vectors in ... (after coercion), after removing NAs if requested by na.rm = TRUE.
The value returned is TRUE if all of the values in x are TRUE (including if there are no values), and FALSE if at least one of the values in x is FALSE. Otherwise the value is NA (which can only occur if na.rm = FALSE and ... contains no FALSE values and at least one NA value).
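The rules above can be summarised in a small table of cases. The element names are illustrative labels, not part of the API.

```r
res <- c(
  all_true    = all(TRUE, TRUE),               # TRUE
  one_false   = all(TRUE, FALSE, NA),          # FALSE: a FALSE decides the result
  na_no_false = all(TRUE, NA),                 # NA: no FALSE, at least one NA
  na_removed  = all(TRUE, NA, na.rm = TRUE),   # TRUE once the NA is removed
  empty       = all(logical(0))                # TRUE: no values at all
)
print(res)
```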
This is part of the S4 Summary group generic. Methods for it must use the signature x, ..., na.rm.
That all(logical(0)) is true is a useful convention: it ensures that all(all(x), all(y)) == all(x, y) even if x has length zero.
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
any, the ‘complement’ of all, and stopifnot(*) which is an all(*) ‘insurance’.
range(x <- sort(round(stats::rnorm(10) - 1.2, 1)))
if(all(x < 0)) cat("all x values are negative\n")

all(logical(0))  # true, as all zero of the elements are true.
all.equal(x, y) is a utility to compare R objects x and y testing ‘near equality’. If they are different, comparison is still made to some extent, and a report of the differences is returned. Do not use all.equal directly in if expressions: either use isTRUE(all.equal(....)) or identical if appropriate.
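The reason for wrapping the call in isTRUE() is that all.equal returns a character report rather than FALSE when the objects differ. A minimal sketch, with made-up values:

```r
a <- 0.1 + 0.2
b <- 0.3

exact <- a == b                      # FALSE on typical platforms: rounding error
near  <- isTRUE(all.equal(a, b))     # TRUE: equal within the default tolerance

# When objects differ, the result is a description, not FALSE
report <- all.equal(1, 1.1)
safe   <- isTRUE(report)             # FALSE, and safe to use in if()

print(c(exact = exact, near = near, safe = safe))
print(report)
```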
all.equal(target, current, ...)

## Default S3 method:
all.equal(target, current, ..., check.class = TRUE)

## S3 method for class 'numeric'
all.equal(target, current,
          tolerance = sqrt(.Machine$double.eps), scale = NULL,
          countEQ = FALSE,
          formatFUN = function(err, what) format(err),
          ..., check.attributes = TRUE, check.class = TRUE, giveErr = FALSE)

## S3 method for class 'list'
all.equal(target, current, ...,
          check.attributes = TRUE, use.names = TRUE)

## S3 method for class 'environment'
all.equal(target, current, all.names = TRUE,
          evaluate = TRUE, ...)

## S3 method for class 'function'
all.equal(target, current, check.environment=TRUE, ...)

## S3 method for class 'POSIXt'
all.equal(target, current, ..., tolerance = 1e-3, scale,
          check.tzone = TRUE)

attr.all.equal(target, current, ...,
               check.attributes = TRUE, check.names = TRUE)
target |
R object. |
current |
other R object, to be compared with |
... |
further arguments for different methods, notably the following two, for numerical comparison: |
tolerance |
numeric |
scale |
|
countEQ |
logical indicating if the |
formatFUN |
a |
check.attributes |
logical indicating if the
|
check.class |
logical indicating if the |
giveErr |
|
use.names |
logical indicating if |
all.names |
logical passed to |
evaluate |
for the |
check.environment |
logical requiring that the
|
check.tzone |
logical indicating if the |
check.names |
logical indicating if the |
all.equal is a generic function, dispatching methods on the target argument. To see the available methods, use methods("all.equal"), but note that the default method also does some dispatching, e.g. using the raw method for logical targets.
Remember that arguments which follow ... must be specified by (unabbreviated) name. It is inadvisable to pass unnamed arguments in ... as these will match different arguments in different methods.
Numerical comparisons for scale = NULL (the default) are typically on a relative difference scale unless the target values are close to zero or infinite. Specifically, the scale is computed as the mean absolute value of target. If this scale is finite and exceeds tolerance, differences are expressed relative to it; otherwise, absolute differences are used. Note that this scale and all further steps are computed only for those vector elements where target is not NA and differs from current. If countEQ is true, the equal and NA cases are counted in determining the “sample” size.
If scale is numeric (and positive), absolute comparisons are made after scaling (dividing) by scale. Note that if all of scale is close to 1 (specifically, within 1e-7), the difference is still reported as being on an absolute scale.
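The default (scale = NULL) computation described above can be reproduced by hand for a simple case. This is a sketch of the description, not the internal code; the vectors are made up, and the names by_hand and differs are illustrative.

```r
target  <- c(1, 2, 3)
current <- c(1, 2, 3.3)   # hypothetical values: only the third pair differs

# Only elements where target is not NA and differs from current take part
differs <- !is.na(target) & target != current

mean_abs_diff <- mean(abs(target[differs] - current[differs]))
scale_used    <- mean(abs(target[differs]))   # mean absolute value of target
tolerance     <- sqrt(.Machine$double.eps)

# Relative if the scale is finite and exceeds the tolerance, otherwise absolute
by_hand <- if (is.finite(scale_used) && scale_used > tolerance) {
  mean_abs_diff / scale_used
} else {
  mean_abs_diff
}

msg <- all.equal(target, current)
print(by_hand)
print(msg)
```

The number reported in msg should agree with by_hand up to the formatting precision.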
For complex target, the modulus (Mod) of the difference is used: all.equal.numeric is called so arguments tolerance and scale are available.
The list method compares components of target and current recursively, passing all other arguments, as long as both are “list-like”, i.e., fulfill either is.vector or is.list.
The environment method works via the list method, and is also used for reference classes (unless a specific all.equal method is defined).
The method for date-time objects uses all.equal.numeric to compare times (in "POSIXct" representation) with a default tolerance of 0.001 seconds, ignoring scale. A time zone mismatch between target and current is reported unless check.tzone = FALSE.
attr.all.equal is used for comparing attributes, returning NULL or a character vector.
Either TRUE (NULL for attr.all.equal) or a vector of mode "character" describing the differences between target and current.
Chambers, J. M. (1998) Programming with Data. A Guide to the S Language. Springer (for =).
identical, isTRUE, ==, and all for exact equality testing.
all.equal(pi, 355/113)
# not precise enough (default tol) > relative error

quarts <- 1/4 + 1:10 # exact
d45 <- pi*quarts ; one <- rep(1, 10)
tan(d45) == one # mostly FALSE, as typically exact; embarrassingly,
tanpi(quarts) == one # (is always FALSE (Fedora 34; gcc 11.2.1))
stopifnot(all.equal(
          tan(d45), one)) # TRUE, but not if we are picky:
all.equal(tan(d45), one, tolerance = 0) # to see difference
all.equal(tan(d45), one, tolerance = 0, scale = 1)# "absolute diff.."
all.equal(tan(d45), one, tolerance = 0, scale = 1+(-2:2)/1e9) # "absolute"
all.equal(tan(d45), one, tolerance = 0, scale = 1+(-2:2)/1e6) # "scaled"

## advanced: equality of environments
ae <- all.equal(as.environment("package:stats"),
                asNamespace("stats"))
stopifnot(is.character(ae), length(ae) > 10,
          ## were incorrectly "considered equal" in R <= 3.1.1
          all.equal(asNamespace("stats"), asNamespace("stats")))

## A situation where 'countEQ = TRUE' makes sense:
x1 <- x2 <- (1:100)/10; x2[2] <- 1.1*x1[2]
## 99 out of 100 pairs (x1[i], x2[i]) are equal:
plot(x1,x2, main = "all.equal.numeric() -- not counting equal parts")
all.equal(x1,x2) ## "Mean relative difference: 0.1"
mtext(paste("all.equal(x1,x2) :", all.equal(x1,x2)), line= -2)
##' extract the 'Mean relative difference' as number:
all.eqNum <- function(...) as.numeric(sub(".*:", '', all.equal(...)))
set.seed(17)
## When x2 is jittered, typically all pairs (x1[i],x2[i]) do differ:
summary(r <- replicate(100, all.eqNum(x1, x2*(1+rnorm(x1)*1e-7))))
mtext(paste("mean(all.equal(x1, x2*(1 + eps_k))) {100 x} Mean rel.diff.=",
            signif(mean(r), 3)), line = -4, adj=0)
## With argument countEQ=TRUE, get "the same" (w/o need for jittering):
mtext(paste("all.equal(x1,x2, countEQ=TRUE) :",
          signif(all.eqNum(x1,x2, countEQ=TRUE), 3)), line= -6, col=2)

## Using giveErr=TRUE :
x1. <- x1 * (1+ 1e-9*rnorm(x1))
str(all.equal(x1, x1., giveErr=TRUE))
## logi TRUE
## - attr(*, "err")= num 8.66e-10
## - attr(*, "what")= chr "relative"

## Used with stopifnot(), still *showing* diff:
all.equalShow <- function (...) {
   r <- all.equal(..., giveErr=TRUE)
   cat(attr(r,"what"), "err:", attr(r,"err"), "\n")
   c(r) # can drop attributes, as not used anymore
}
# checks, showing error in any case:
stopifnot(all.equalShow(x1, x1.)) # -> relative err: 8.66002e-10
tryCatch(error=identity, stopifnot(all.equalShow(x1, 2*x1))) -> eAe
stopifnot(inherits(eAe, "error"))
# stopifnot(all.equal....()) giving smart msg:
cat(conditionMessage(eAe), "\n")

two <- structure(2, foo = 1, class = "bar")
all.equal(two^20, 2^20) # lots of diff
all.equal(two^20, 2^20, check.attributes = FALSE)# "target is bar, current is numeric"
all.equal(two^20, 2^20, check.attributes = FALSE, check.class = FALSE) # TRUE

## comparison of date-time objects
now <- Sys.time()
stopifnot(
  all.equal(now, now + 1e-4)  # TRUE (default tolerance = 0.001 seconds)
)
all.equal(now, now + 0.2)
all.equal(now, as.POSIXlt(now, "UTC"))
stopifnot(
  all.equal(now, as.POSIXlt(now, "UTC"), check.tzone = FALSE)  # TRUE
)
Return a character vector containing all the names which occur in an expression or call.
all.names(expr, functions = TRUE, max.names = -1L, unique = FALSE)

all.vars(expr, functions = FALSE, max.names = -1L, unique = TRUE)
expr |
an expression or call from which the names are to be extracted. |
functions |
a logical value indicating whether function names should be included in the result. |
max.names |
the maximum number of names to be returned. |
unique |
a logical value which indicates whether duplicate names should be removed from the value. |
These functions differ only in the default values for their arguments.
A character vector with the extracted names.
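Because the two functions differ only in their defaults, all.vars can be reproduced with all.names by overriding those defaults. A small sketch on a made-up call:

```r
e <- quote(sin(x + y + x))

# All names, including function names and repeats
nm <- all.names(e)

# Variables only, with duplicates removed
vars <- all.vars(e)

# all.names() with the defaults of all.vars() gives the same result
same <- identical(all.names(e, functions = FALSE, unique = TRUE), vars)

print(nm)
print(vars)
print(same)
```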
substitute to replace symbols with values in an expression.
all.names(expression(sin(x+y)))
all.names(quote(sin(x+y))) # or a call
all.vars(expression(sin(x+y)))
Given a set of logical vectors, is at least one of the values true?
any(..., na.rm = FALSE)
... |
zero or more logical vectors. Other objects of zero length are ignored, and the rest are coerced to logical ignoring any class. |
na.rm |
logical. If true |
This is a generic function: methods can be defined for it directly or via the Summary group generic. For this to work properly, the arguments ... should be unnamed, and dispatch is on the first argument.
Coercion of types other than integer (raw, double, complex, character, list) gives a warning as this is often unintentional.
This is a primitive function.
The value is a logical vector of length one.
Let x denote the concatenation of all the logical vectors in ... (after coercion), after removing NAs if requested by na.rm = TRUE.
The value returned is TRUE if at least one of the values in x is TRUE, and FALSE if all of the values in x are FALSE (including if there are no values). Otherwise the value is NA (which can only occur if na.rm = FALSE and ... contains no TRUE values and at least one NA value).
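The cases above, together with the relation between any and all, can be sketched as follows. The element names are illustrative labels only.

```r
res <- c(
  one_true   = any(FALSE, TRUE, NA),             # TRUE: a TRUE decides the result
  na_no_true = any(FALSE, NA),                   # NA: no TRUE, at least one NA
  na_removed = any(FALSE, NA, na.rm = TRUE),     # FALSE once the NA is removed
  empty      = any(logical(0))                   # FALSE: no values at all
)
print(res)

# any() and all() are related by negation
x <- c(FALSE, NA, TRUE)
print(identical(any(x), !all(!x)))
```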
This is part of the S4 Summary group generic. Methods for it must use the signature x, ..., na.rm.
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
all, the ‘complement’ of any.
range(x <- sort(round(stats::rnorm(10) - 1.2, 1)))
if(any(x < 0)) cat("x contains negative values\n")
Transpose an array by permuting its dimensions and optionally resizing it.
aperm(a, perm, ...)

## Default S3 method:
aperm(a, perm = NULL, resize = TRUE, ...)

## S3 method for class 'table'
aperm(a, perm = NULL, resize = TRUE, keep.class = TRUE, ...)
a |
the array to be transposed. |
perm |
the subscript permutation vector, usually a permutation of
the integers |
resize |
a flag indicating whether the vector should be
resized as well as having its elements reordered (default |
keep.class |
logical indicating if the result should be of the
same class as |
... |
potential further arguments of methods. |
A transposed version of array a, with subscripts permuted as indicated by the vector perm. If resize is TRUE, the array is reshaped as well as having its elements permuted, and the dimnames are also permuted; if resize = FALSE then the returned object has the same dimensions as a, and the dimnames are dropped. In each case other attributes are copied from a.
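As a rough illustration of the two resize settings, the sketch below permutes a small 2 x 3 x 4 array. The array and its dimnames are made up for the example.

```r
a <- array(1:24, dim = 2:4,
           dimnames = list(c("a", "b"), c("A", "B", "C"), NULL))

# perm = c(3, 1, 2): dimension d of the result is dimension perm[d] of a
p <- aperm(a, c(3, 1, 2))
dim(p)                      # 4 2 3
p[4, 2, 3] == a[2, 3, 4]    # p[k, i, j] corresponds to a[i, j, k]
dimnames(p)                 # dimnames are permuted along with the dims

# resize = FALSE: elements are reordered in the same way,
# but the dim of a is kept and the dimnames are dropped
q <- aperm(a, c(3, 1, 2), resize = FALSE)
dim(q)                      # 2 3 4
dimnames(q)                 # NULL
```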
The function t provides a faster and more convenient way of transposing matrices.
Jonathan Rougier did the faster C implementation.
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
t, to transpose matrices.
# interchange the first two subscripts on a 3-way array x
x  <- array(1:24, 2:4)
xt <- aperm(x, c(2,1,3))
stopifnot(t(xt[,,2]) == x[,,2],
          t(xt[,,3]) == x[,,3],
          t(xt[,,4]) == x[,,4])

UCB <- aperm(UCBAdmissions, c(2,1,3))
UCB[1,,]
summary(UCB) # UCB is still a contingency table
Add elements to a vector.
append(x, values, after = length(x))
x |
the vector the values are to be appended to. |
values |
to be included in the modified vector. |
after |
a subscript, after which the values are to be appended. |
A vector containing the values in x with the elements of values appended after the specified element of x.
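A short sketch of how the after argument positions the new values; the vectors are arbitrary.

```r
x <- 1:5

# Insert in the middle: values go after the 3rd element
mid   <- append(x, 0:1, after = 3)   # 1 2 3 0 1 4 5

# after = 0 puts the values in front of x
front <- append(x, 99L, after = 0)   # 99 1 2 3 4 5

# The default after = length(x) appends at the end
end   <- append(x, 99L)              # 1 2 3 4 5 99
```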
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
append(1:5, 0:1, after = 3)
Returns a vector or array or list of values obtained by applying a function to margins of an array or matrix.
apply(X, MARGIN, FUN, ..., simplify = TRUE)
X |
an array, including a matrix. |
MARGIN |
a vector giving the subscripts which the function will
be applied over. E.g., for a matrix |
FUN |
the function to be applied: see ‘Details’.
In the case of functions like |
... |
optional arguments to |
simplify |
a logical indicating whether results should be simplified if possible. |
If X is not an array but an object of a class with a non-null dim value (such as a data frame), apply attempts to coerce it to an array via as.matrix if it is two-dimensional (e.g., a data frame) or via as.array.
FUN is found by a call to match.fun and typically is either a function or a symbol (e.g., a backquoted name) or a character string specifying a function to be searched for from the environment of the call to apply.
Arguments in ... cannot have the same name as any of the other arguments, and care may be needed to avoid partial matching to MARGIN or FUN. In general-purpose code it is good practice to name the first three arguments if ... is passed through: this both avoids partial matching to MARGIN or FUN and ensures that a sensible error message is given if arguments named X, MARGIN or FUN are passed through ....
If each call to FUN returns a vector of length n, and simplify is TRUE, then apply returns an array of dimension c(n, dim(X)[MARGIN]) if n > 1. If n equals 1, apply returns a vector if MARGIN has length 1 and an array of dimension dim(X)[MARGIN] otherwise. If n is 0, the result has length 0 but not necessarily the ‘correct’ dimension.
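The shape rules above can be seen on a small matrix. This is an illustrative sketch with a made-up 2 x 3 input.

```r
X <- matrix(1:6, nrow = 2)          # 2 x 3

# FUN returns length n = 2 for each row: result has dim c(n, dim(X)[1])
r <- apply(X, 1, range)
dim(r)                              # 2 2 (one column per row of X)

# FUN returns length 1 and MARGIN has length 1: a plain vector
s <- apply(X, 2, sum)
s                                   # 3 7 11
dim(s)                              # NULL

# FUN returns length 1 and MARGIN has length 2: array of dim(X)[MARGIN]
m <- apply(X, c(1, 2), sum)
dim(m)                              # 2 3
```

Because each call's result becomes a column, row-wise results come back transposed relative to X; wrapping the call in t() restores the row orientation when that is wanted.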
If the calls to FUN return vectors of different lengths, or if simplify is FALSE, apply returns a list of length prod(dim(X)[MARGIN]) with dim set to dim(X)[MARGIN] if MARGIN has length greater than one.
In all cases the result is coerced by as.vector to one of the basic vector types before the dimensions are set, so that (for example) factor results will be coerced to a character array.
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
lapply and there, simplify2array; tapply, and convenience functions sweep and aggregate.
## Compute row and column sums for a matrix:
x <- cbind(x1 = 3, x2 = c(4:1, 2:5))
dimnames(x)[[1]] <- letters[1:8]
apply(x, 2, mean, trim = .2)
col.sums <- apply(x, 2, sum)
row.sums <- apply(x, 1, sum)
rbind(cbind(x, Rtot = row.sums), Ctot = c(col.sums, sum(col.sums)))

stopifnot( apply(x, 2, is.vector))

## Sort the columns of a matrix
apply(x, 2, sort)

## keeping named dimnames
names(dimnames(x)) <- c("row", "col")
x3 <- array(x, dim = c(dim(x),3),
            dimnames = c(dimnames(x), list(C = paste0("cop.",1:3))))
identical(x,  apply( x,  2,  identity))
identical(x3, apply(x3, 2:3, identity))

##- function with extra args:
cave <- function(x, c1, c2) c(mean(x[c1]), mean(x[c2]))
apply(x, 1, cave,  c1 = "x1", c2 = c("x1","x2"))

ma <- matrix(c(1:4, 1, 6:8), nrow = 2)
ma
apply(ma, 1, table)  #--> a list of length 2
apply(ma, 1, stats::quantile) # 5 x n matrix with rownames

stopifnot(dim(ma) == dim(apply(ma, 1:2, sum)))

## Example with different lengths for each call
z <- array(1:24, dim = 2:4)
zseq <- apply(z, 1:2, function(x) seq_len(max(x)))
zseq         ## a 2 x 3 matrix
typeof(zseq) ## list
dim(zseq) ## 2 3
zseq[1,]
apply(z, 3, function(x) seq_len(max(x)))
# a list without a dim attribute
Displays the argument names and corresponding default values of a (non-primitive or primitive) function.
args(name)
name |
a function (a primitive or a closure, i.e.,
“non-primitive”).
If |
This function is mainly used interactively to print the argument list of a function. For programming, consider using formals instead.
For a closure, a closure with identical formal argument list but an empty (NULL) body.
For a primitive (function), a closure with the documented usage and NULL body. Note that some primitives do not make use of named arguments and match by position rather than name.
NULL in case of a non-function.
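The return values listed above can be inspected programmatically. In this sketch f is a hypothetical function defined only for the illustration.

```r
# A hypothetical closure used only for this example
f <- function(x, y = 2, ...) x + y

a <- args(f)
is.function(a)        # TRUE: args() returns a closure
body(a)               # NULL: the body is empty
names(formals(a))     # "x" "y" "..."

# For programmatic access to the argument list, formals() is more direct
formals(f)$y          # 2

# A non-function gives NULL
args(1)
```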
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
formals, help; str also prints the argument list of a function.
## "regular" (non-primitive) functions "print their arguments"
## (by returning another function with NULL body which you also see):
args(ls)
args(graphics::plot.default)
utils::str(ls) # (just "prints": does not show a NULL)

## You can also pass a string naming a function.
args("scan")
## ...but :: package specification doesn't work in this case.
tryCatch(args("graphics::plot.default"), error = print)

## As explained above, args() gives a function with empty body:
list(is.f = is.function(args(scan)), body = body(args(scan)))

## Primitive functions mostly behave like non-primitive functions.
args(c)
args(`+`)
## primitive functions without well-defined argument list return NULL:
args(`if`)
These unary and binary operators perform arithmetic on numeric or complex vectors (or objects which can be coerced to them).
+ x
- x
x + y
x - y
x * y
x / y
x ^ y
x %% y
x %/% y
x, y |
numeric or complex vectors or objects which can be coerced to such, or other objects for which methods have been written. |
The unary and binary arithmetic operators are generic functions: methods can be written for them individually or via the Ops group generic function. (See Ops for how dispatch is computed.)
If applied to arrays the result will be an array if this is sensible (for example it will not if the recycling rule has been invoked).
Logical vectors will be coerced to integer or numeric vectors, FALSE having value zero and TRUE having value one.
1 ^ y and y ^ 0 are 1, always. x ^ y should also give the proper limit result when either (numeric) argument is infinite (one of Inf or -Inf).
Objects such as arrays or time-series can be operated on this way provided they are conformable.
For double arguments, %% can be subject to catastrophic loss of accuracy if x is much larger than y, and a warning is given if this is detected.
x %% y and x %/% y can be used for non-integer y, e.g. 1 %/% 0.2, but the results are subject to representation error and so may be platform-dependent. Mathematically, the answer to 1 %/% 0.2 should be 5, but because the IEC 60559 representation of 0.2 is a binary fraction slightly larger than 0.2, most platforms give 4.
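The representation error behind 1 %/% 0.2 can be made visible as follows. This is only a sketch: the value of the floored quotient is deliberately not asserted because, as noted above, it may differ between platforms.

```r
# Decimal expansion of the double closest to 0.2: slightly above 0.2
s <- sprintf("%.20f", 0.2)
s

# Floored division by that value (may print 4 rather than 5)
q <- 1 %/% 0.2
q

# If the quotient is known to be a whole number mathematically,
# rounding the ordinary quotient is one way to sidestep the problem
q_round <- round(1 / 0.2)
q_round   # 5
```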
Users are sometimes surprised by the value returned, for example why (-8)^(1/3) is NaN. For double inputs, R makes use of IEC 60559 arithmetic on all platforms, together with the C system function ‘pow’ for the ^ operator. The relevant standards define the result in many corner cases. In particular, the result in the example above is mandated by the C99 standard. On many Unix-alike systems the command man pow gives details of the values in a large number of corner cases.
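For readers who want the real cube root of a negative number, the sketch below shows the NaN result together with two common workarounds. The workarounds are illustrations, not something the help page itself prescribes.

```r
x <- -8

# Negative base with a non-integer exponent: NaN for double inputs
nan_result <- x^(1/3)
nan_result

# Real cube root: take the root of the magnitude and restore the sign
real_root <- sign(x) * abs(x)^(1/3)
real_root               # -2 (up to rounding)

# Complex arithmetic gives the principal complex root instead
z <- as.complex(x)^(1/3)
z                       # approximately 1+1.732051i
```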
Arithmetic on type double in R is supposed to be done in ‘round to nearest, ties to even’ mode, but this does depend on the compiler and FPU being set up correctly.
Unary + and unary - return a numeric or complex vector. All attributes (including class) are preserved if there is no coercion: logical x is coerced to integer and names, dims and dimnames are preserved.
The binary operators return vectors containing the result of the element by element operations. If involving a zero-length vector the result has length zero. Otherwise, the elements of shorter vectors are recycled as necessary (with a warning when they are recycled only fractionally). The operators are + for addition, - for subtraction, * for multiplication, / for division and ^ for exponentiation.
%% indicates x mod y (“x modulo y”), i.e., computes the ‘remainder’ r <- x %% y, and %/% indicates integer division, where R uses “floored” integer division, i.e., q <- x %/% y := floor(x/y), as promoted by Donald Knuth, see the Wikipedia page on ‘Modulo operation’, and hence sign(r) == sign(y). It is guaranteed that x == (x %% y) + y * (x %/% y) (up to rounding error) unless y == 0 where the result of %% is NA_integer_ or NaN (depending on the typeof of the arguments) or for some non-finite arguments, e.g., when the RHS of the identity above amounts to Inf - Inf.
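A small check of the floored-division identity and the sign rule, using integer inputs chosen for the illustration:

```r
x <- c(-7L, -1L, 0L, 5L, 7L)
y <- -3L

r <- x %% y      # remainders: -1 -1  0 -1 -2
q <- x %/% y     # floored quotients: 2  0  0 -2 -3

# The identity x == (x %% y) + y * (x %/% y)
all(x == r + y * q)

# Non-zero remainders take the sign of the divisor y
all(r == 0 | sign(r) == sign(y))

# Division by zero: the type of the arguments decides the result of %%
int_zero <- 5L %% 0L   # NA_integer_
dbl_zero <- 5 %% 0     # NaN
```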
If either argument is complex the result will be complex, otherwise if one or both arguments are numeric, the result will be numeric. If both arguments are of type integer, the type of the result of / and ^ is numeric and for the other operators it is integer (with overflow, which occurs at ±(2^31 - 1), returned as NA_integer_ with a warning).
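The result types and the integer overflow behaviour can be sketched as follows; suppressWarnings is used only to keep the overflow warning out of the output.

```r
# Integer / integer and integer ^ integer give doubles
t_div <- typeof(5L / 2L)     # "double"
t_pow <- typeof(5L ^ 2L)     # "double"

# The other operators stay integer
t_mul <- typeof(5L * 2L)     # "integer"
t_mod <- typeof(5L %% 2L)    # "integer"

# Integer overflow beyond 2^31 - 1 gives NA_integer_ (with a warning)
big <- .Machine$integer.max  # 2147483647
overflow <- suppressWarnings(big + 1L)
overflow

# Mixing in a double avoids the overflow
no_overflow <- big + 1       # 2147483648, stored as double
```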
The rules for determining the attributes of the result are rather complicated. Most attributes are taken from the longer argument. Names will be copied from the first if it is the same length as the answer, otherwise from the second if that is. If the arguments are the same length, attributes will be copied from both, with those of the first argument taking precedence when the same attribute is present in both arguments. For time series, these operations are allowed only if the series are compatible, when the class and tsp attribute of whichever is a time series (the same, if both are) are used. For arrays (and an array result) the dimensions and dimnames are taken from first argument if it is an array, otherwise the second.
These operators are members of the S4 Arith group generic, and so methods can be written for them individually as well as for the group generic (or the Ops group generic), with arguments c(e1, e2) (with e2 missing for a unary operator).
R is dependent on OS services (and they on FPUs) for floating-point arithmetic. On all current R platforms IEC 60559 (also known as IEEE 754) arithmetic is used, but some things in those standards are optional. In particular, the support for denormal aka subnormal numbers (those outside the range given by .Machine) may differ between platforms and even between calculations on a single platform.
Another potential issue is signed zeroes: on IEC 60559 platforms there are two zeroes with internal representations differing by sign. Where possible R treats them as the same, but for example direct output from C code often does not do so and may output ‘-0.0’ (and on Windows whether it does so or not depends on the version of Windows). One place in R where the difference might be seen is in division by zero: 1/x is Inf or -Inf depending on the sign of zero x. Another place is identical(0, -0, num.eq = FALSE).
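The two places mentioned above where the sign of zero shows up can be tried directly. A minimal sketch:

```r
pz <- 0
nz <- -0

# The two zeroes compare equal and print the same
pz == nz                                   # TRUE

# Division by zero reveals the sign
inv_p <- 1 / pz                            # Inf
inv_n <- 1 / nz                            # -Inf

# identical() treats them as the same unless num.eq = FALSE
same_default <- identical(pz, nz)                  # TRUE
same_bitwise <- identical(pz, nz, num.eq = FALSE)  # FALSE
```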
All logical operations involving a zero-length vector have a zero-length result.
The binary operators are sometimes called as functions as e.g. `&`(x, y): see the description of how argument-matching is done in Ops.
** is translated in the parser to ^, but this was undocumented for many years. It appears as an index entry in Becker et al. (1988), pointing to the help for Deprecated but is not actually mentioned on that page. Even though it had been deprecated in S for 20 years, it was still accepted in R in 2008.
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
D. Goldberg (1991). What Every Computer Scientist Should Know about Floating-Point Arithmetic. ACM Computing Surveys, 23(1), 5–48. doi:10.1145/103162.103163. Also available at https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html.
For the IEC 60559 (aka IEEE 754) standard: https://www.iso.org/standard/57469.html and https://en.wikipedia.org/wiki/IEEE_754.
On the integer division and remainder (modulo) computations, %% and %/%: https://en.wikipedia.org/wiki/Modulo_operation, and Donald Knuth (1972) The Art of Computer Programming, Vol.1.
sqrt for miscellaneous and Special for special mathematical functions.
Syntax for operator precedence.
%*% for matrix multiplication.
x <- -1:12
x + 1
2 * x + 3
x %% 3 # is periodic  2 0 1 2 0 1 ...
x %% -3 # (ditto)  -1 0 -2 -1 0 -2 ...
x %/% 5
x %% Inf # now is defined by limit (gave NaN in earlier versions of R)

## Illustrating PR#18677, see above
1 %/% print(0.2, digits=19)
Creates or tests for arrays.
array(data = NA, dim = length(data), dimnames = NULL)
as.array(x, ...)
is.array(x)
data |
a vector (including a list or |
dim |
the dim attribute for the array to be created, that is an integer vector of length one or more giving the maximal indices in each dimension. |
dimnames |
either |
x |
an R object. |
... |
additional arguments to be passed to or from methods. |
An array in R can have one, two or more dimensions. It is simply a vector which is stored with additional attributes giving the dimensions (attribute "dim") and optionally names for those dimensions (attribute "dimnames").
A two-dimensional array is the same thing as a matrix.
One-dimensional arrays often look like vectors, but may be handled differently by some functions: str does distinguish them in recent versions of R.
The "dim" attribute is an integer vector of length one or more containing non-negative values: the product of the values must match the length of the array.
The "dimnames" attribute is optional: if present it is a list with one component for each dimension, either NULL or a character vector of the length given by the element of the "dim" attribute for that dimension.
is.array is a primitive function.
For a list array, the print method prints entries of length not one in the form ‘integer,7’ indicating the type and length.
array returns an array with the extents specified in dim and naming information in dimnames. The values in data are taken to be those in the array with the leftmost subscript moving fastest. If there are too few elements in data to fill the array, then the elements in data are recycled. If data has length zero, NA of an appropriate type is used for atomic vectors (0 for raw vectors) and NULL for lists.
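The fill order, recycling and zero-length cases described above can be sketched on tiny inputs:

```r
# Recycling: 1:3 fills a 2 x 4 array column by column
# (leftmost subscript moving fastest)
a <- array(1:3, c(2, 4))
a[, 1]                     # 1 2
a[, 2]                     # 3 1

# Zero-length atomic data: filled with NA of the matching type
e_int <- array(integer(0), c(2, 2))
typeof(e_int)              # "integer"

# Zero-length raw data: filled with 00
e_raw <- array(raw(0), 2)

# Zero-length list data: entries are NULL
e_lst <- array(list(), c(1, 2))
```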
Unlike matrix, array does not currently remove any attributes left by as.vector from a classed list data, so can return a list array with a class attribute.
as.array is a generic function for coercing to arrays. The default method does so by attaching a dim attribute to it. It also attaches dimnames if x has names. The sole purpose of this is to make it possible to access the dim[names] attribute at a later time.
is.array returns TRUE or FALSE depending on whether its argument is an array (i.e., has a dim attribute of positive length) or not. It is generic: you can write methods to handle specific classes of objects, see InternalMethods.
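A brief sketch of the default as.array method and of is.array on a named vector (the vector is made up for the example):

```r
v <- c(a = 1, b = 2)

is.array(v)          # FALSE: a plain vector has no dim attribute

av <- as.array(v)
is.array(av)         # TRUE
dim(av)              # 2: a one-dimensional array
dimnames(av)         # names become the dimnames: list(c("a", "b"))
```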
is.array is a primitive function.
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
dim(as.array(letters))
array(1:3, c(2,4)) # recycle 1:3 "2 2/3 times"
#     [,1] [,2] [,3] [,4]
#[1,]    1    3    2    1
#[2,]    2    1    3    2
array2DF converts an array, including list arrays commonly returned by tapply, into data frames for use in further analysis or plotting functions.
array2DF(x, responseName = "Value",
         sep = "", base = list(LETTERS),
         simplify = TRUE, allowLong = TRUE)
x |
an array object. |
responseName |
character string, used for creating column name(s) in the result, if required. |
sep |
character string, used as separator when creating new names, if required. |
base |
character vector, giving an initial set of names to create
dimnames of |
simplify |
logical, whether to attempt simplification of the result. |
allowLong |
logical, specifying whether a long format data frame
should be returned if |
The main use of array2DF is to convert an array, as typically returned by tapply, into a data frame.
When simplify = FALSE, this is similar to as.data.frame.table, except that it works for list arrays as well as atomic arrays. Specifically, the resulting data frame has one row for each element of the array, with one column for each dimension of the array giving the corresponding dimnames. The contents of the array are placed in a column whose name is given by the responseName argument. The mode of this column is the same as that of x, usually an atomic vector or a list.
If x does not have dimnames, they are automatically created using base and sep.
In the default case, when simplify = TRUE, some common cases are handled specially.
If all components of x are data frames with identical column names (with possibly different numbers of rows), they are rbind-ed to form the response. The additional columns giving dimnames are repeated according to the number of rows, and responseName is ignored in this case.
If all components of x are unnamed atomic vectors and allowLong = TRUE, each component is treated as a single-column data frame with column name given by responseName, and processed as above.
In all other cases, an attempt to simplify is made by simplify2array. If this results in multiple unnamed columns, names are constructed using responseName and sep.
A data frame with at least length(dim(x)) + 1 columns. The first length(dim(x)) columns each represent one dimension of x and give the corresponding values of dimnames, which are implicitly created if necessary. The remaining columns contain the contents of x, after attempted simplification if requested.
tapply, as.data.frame.table, split, aggregate.
s1 <- with(ToothGrowth,
           tapply(len, list(dose, supp), mean, simplify = TRUE))

s2 <- with(ToothGrowth,
           tapply(len, list(dose, supp), mean, simplify = FALSE))

str(s1) # atomic array
str(s2) # list array

str(array2DF(s1, simplify = FALSE)) # Value column is vector
str(array2DF(s2, simplify = FALSE)) # Value column is list
str(array2DF(s2, simplify = TRUE))  # simplified to vector

### The remaining examples use the default 'simplify = TRUE'

## List array with list components: columns are lists (no simplification)
with(ToothGrowth,
     tapply(len, list(dose, supp),
            function(x) t.test(x)[c("p.value", "alternative")])) |>
    array2DF() |> str()

## List array with data frame components: columns are atomic (simplified)
with(ToothGrowth,
     tapply(len, list(dose, supp),
            function(x) with(t.test(x), data.frame(p.value, alternative)))) |>
    array2DF() |> str()

## named vectors
with(ToothGrowth,
     tapply(len, list(dose, supp), quantile)) |> array2DF()

## unnamed vectors: long format
with(ToothGrowth,
     tapply(len, list(dose, supp), sample, size = 5)) |> array2DF()

## unnamed vectors: wide format
with(ToothGrowth,
     tapply(len, list(dose, supp), sample, size = 5)) |>
    array2DF(allowLong = FALSE)

## unnamed vectors of unequal length
with(ToothGrowth[-1, ],
     tapply(len, list(dose, supp), sample, replace = TRUE)) |>
    array2DF(allowLong = FALSE)

## unnamed vectors of unequal length with allowLong = TRUE
## (within-group bootstrap)
with(ToothGrowth[-1, ],
     tapply(len, list(dose, supp), sample, replace = TRUE)) |>
    array2DF() |> str()

## data frame input
tapply(ToothGrowth, ~ dose + supp, FUN = with,
       data.frame(n = length(len), mean = mean(len), sd = sd(len))) |>
    array2DF()
Functions to check if an object is a data frame, or coerce it if possible.
as.data.frame(x, row.names = NULL, optional = FALSE, ...)

## S3 method for class 'character'
as.data.frame(x, ..., stringsAsFactors = FALSE)

## S3 method for class 'list'
as.data.frame(x, row.names = NULL, optional = FALSE, ...,
              cut.names = FALSE, col.names = names(x), fix.empty.names = TRUE,
              check.names = !optional,
              stringsAsFactors = FALSE)

## S3 method for class 'matrix'
as.data.frame(x, row.names = NULL, optional = FALSE,
              make.names = TRUE, ...,
              stringsAsFactors = FALSE)

as.data.frame.vector(x, row.names = NULL, optional = FALSE, ...,
                     nm = deparse1(substitute(x)))

is.data.frame(x)
x |
any R object. |
row.names |
|
optional |
logical. If |
... |
additional arguments to be passed to or from methods. |
stringsAsFactors |
logical: should the character vector be converted to a factor? |
cut.names |
logical or integer; indicating if column names with
more than 256 (or |
col.names |
(optional) character vector of column names. |
fix.empty.names |
logical indicating if empty column names, i.e.,
|
check.names |
logical; passed to the |
make.names |
a |
nm |
a |
as.data.frame is a generic function with many methods, and users and packages can supply further methods. For classes that act as vectors, often a copy of as.data.frame.vector will work as the method.
Since R 4.3.0, the default method will call as.data.frame.vector for atomic (as by is.atomic) x. Direct calls of as.data.frame.class are still possible (base package!) for 12 atomic base classes, but are deprecated: calling as.data.frame.vector instead is recommended.
If a list is supplied, each element is converted to a column in the data frame. Similarly, each column of a matrix is converted separately. This can be overridden if the object has a class which has a method for as.data.frame: two examples are matrices of class "model.matrix" (which are included as a single column) and list objects of class "POSIXlt" which are coerced to class "POSIXct".
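A minimal sketch of the list and matrix conversions, with stringsAsFactors given explicitly so the outcome does not depend on the R version's default. The input objects are made up for the example.

```r
l <- list(a = 1:3, b = c("x", "y", "z"))

# Each list element becomes a column
df_chr <- as.data.frame(l, stringsAsFactors = FALSE)
dim(df_chr)                  # 3 2
class(df_chr$b)              # "character"

# With stringsAsFactors = TRUE, character columns become factors ...
df_fac <- as.data.frame(l, stringsAsFactors = TRUE)
class(df_fac$b)              # "factor"

# ... unless they are protected by I()
df_asis <- as.data.frame(list(b = I(c("x", "y", "z"))),
                         stringsAsFactors = TRUE)
is.factor(df_asis$b)         # FALSE

# Each matrix column becomes a separate data frame column
m <- matrix(1:6, nrow = 2, dimnames = list(NULL, c("p", "q", "r")))
df_m <- as.data.frame(m)
names(df_m)                  # "p" "q" "r"
```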
Arrays can be converted to data frames. One-dimensional arrays are treated like vectors and two-dimensional arrays like matrices. Arrays with more than two dimensions are converted to matrices by ‘flattening’ all dimensions after the first and creating suitable column labels.
Character variables are converted to factor columns only when stringsAsFactors = TRUE, and even then not if they are protected by I.
If a data frame is supplied, all classes preceding "data.frame" are stripped, and the row names are changed if that argument is supplied.
If row.names = NULL, row names are constructed from the names or dimnames of x, otherwise are the integer sequence starting at one. Few of the methods check for duplicated row names. Names are removed from vector columns unless I.
as.data.frame returns a data frame, normally with all row names "" if optional = TRUE.
is.data.frame returns TRUE if its argument is a data frame (that is, has "data.frame" amongst its classes) and FALSE otherwise.
Chambers, J. M. (1992) Data for models. Chapter 3 of Statistical Models in S eds J. M. Chambers and T. J. Hastie, Wadsworth & Brooks/Cole.
data.frame, as.data.frame.table for the table method (which has additional arguments if called directly).
Functions to convert between character representations and objects of class "Date" representing calendar dates.
as.Date(x, ...)
## S3 method for class 'character'
as.Date(x, format, tryFormats = c("%Y-%m-%d", "%Y/%m/%d"),
        optional = FALSE, ...)
## S3 method for class 'numeric'
as.Date(x, origin, ...)
## S3 method for class 'POSIXct'
as.Date(x, tz = "UTC", ...)

## S3 method for class 'Date'
format(x, format = "%Y-%m-%d", ...)

## S3 method for class 'Date'
as.character(x, ...)
x |
an object to be converted. |
format |
a |
tryFormats |
|
optional |
|
origin |
a |
tz |
a time zone name. |
... |
further arguments to be passed from or to other methods. |
The usual vector re-cycling rules are applied to x and format so the answer will be of length that of the longer of the vectors.
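Recycling of x and format means each string can be paired with its own format. A small sketch using purely numeric formats, so it does not depend on the locale:

```r
x   <- c("2001-02-03", "03/02/2001")
fmt <- c("%Y-%m-%d", "%d/%m/%Y")

# Element i of x is parsed with element i of fmt
d <- as.Date(x, format = fmt)
format(d)       # both elements are 2001-02-03

# A single format is recycled over all strings
d2 <- as.Date(c("2001-02-03", "2001-12-31"), format = "%Y-%m-%d")
length(d2)      # 2
```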
Locale-specific conversions to and from character strings are used where appropriate and available. This affects the names of the days and months.
The as.Date methods accept character strings, factors, logical NA and objects of classes "POSIXlt" and "POSIXct". (The last is converted to days by ignoring the time after midnight in the representation of the time in specified time zone, default UTC.) Also objects of class "date" (from package date) and "dates" (from package chron). Character strings are processed as far as necessary for the format specified: any trailing characters are ignored.
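The handling of trailing characters can be seen with a string that carries a time after the date. A sketch, using only the default formats:

```r
# Only as much of the string as the format needs is consumed;
# the trailing time part is ignored
d_trail <- as.Date("2001-02-03T10:15:00")
d_plain <- as.Date("2001-02-03")

d_trail == d_plain      # TRUE
format(d_trail)         # "2001-02-03"
```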
as.Date will accept numeric data (the number of days since an epoch), since R 4.3.0 also when origin is not supplied.
The format and as.character methods ignore any fractional part of the date.
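A sketch of numeric conversion. The origin is passed explicitly here so that the code also runs on versions of R before 4.3.0.

```r
# Days since the given origin
d0   <- as.Date(0,   origin = "1970-01-01")
d365 <- as.Date(365, origin = "1970-01-01")
format(d0)              # "1970-01-01"
format(d365)            # "1971-01-01" (1970 was not a leap year)

# The reverse direction: a Date is stored as a day count
as.numeric(as.Date("1970-01-10"))   # 9

# A fractional day is kept internally but ignored by format()
d_frac <- as.Date(0.9, origin = "1970-01-01")
format(d_frac)          # "1970-01-01"
```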
The format and as.character methods return a character vector representing the date. NA dates are returned as NA_character_.
The as.Date methods return an object of class "Date".
Most systems record dates internally as the number of days since some origin, but this is fraught with problems, including
Is the origin day 0 or day 1? As the ‘Examples’ show, Excel manages to use both choices for its two date systems.
If the origin is far enough back, the designers may show their ignorance of calendar systems. For example, Excel's designer thought 1900 was a leap year (claiming to copy the error from earlier DOS spreadsheets), and Matlab's designer chose the non-existent date of ‘January 0, 0000’ (there is no such day), not specifying the calendar. (There is such a year in the ‘Gregorian’ calendar as used in ISO 8601:2004, but that does say that it is only to be used for years before 1582 with the agreement of the parties in information exchange.)
The only safe procedure is to check the other system's values for known dates: reports on the Internet (including R-help) are more often wrong than right.
The default formats follow the rules of the ISO 8601 international standard which expresses a day as "2001-02-03".
If the date string does not specify the date completely, the returned answer may be system-specific. The most common behaviour is to assume that a missing year, month or day is the current one. If it specifies a date incorrectly, reliable implementations will give an error and the date is reported as NA. Unfortunately some common implementations (such as ‘glibc’) are unreliable and guess at the intended meaning.
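As a hedged illustration of how an incorrectly specified date is reported, the sketch below uses an impossible calendar date and an unparseable string. It assumes R's own strptime code is in use and that the optional argument of the character method is available.

```r
# An impossible calendar date with an explicit format is reported as NA
bad <- as.Date("2001-02-30", format = "%Y-%m-%d")
is.na(bad)

# optional = TRUE returns NA instead of signalling an error
# when none of the tryFormats entries matches
unknown <- as.Date("not a date", optional = TRUE)
is.na(unknown)
```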
Years before 1CE (aka 1AD) will probably not be handled correctly.
International Organization for Standardization (2004, 1988, 1997, ...) ISO 8601. Data elements and interchange formats – Information interchange – Representation of dates and times. For links to versions available on-line see (at the time of writing) https://www.qsl.net/g1smd/isopdf.htm.
Date for details of the date class; locales to query or set a locale.
Your system's help pages on strftime and strptime to see how to specify their formats. Windows users will find no help page for strptime: code based on ‘glibc’ is used (with corrections), so all the format specifiers described here are supported, but with no alternative number representation nor era available in any locale.
## locale-specific version of the date
format(Sys.Date(), "%a %b %d")

## read in date info in format 'ddmmmyyyy'
## This will give NA(s) in some locales; setting the C locale
## as in the commented lines will overcome this on most systems.
## lct <- Sys.getlocale("LC_TIME"); Sys.setlocale("LC_TIME", "C")
x <- c("1jan1960", "2jan1960", "31mar1960", "30jul1960")
z <- as.Date(x, "%d%b%Y")
## Sys.setlocale("LC_TIME", lct)
z

## read in date/time info in format 'm/d/y'
dates <- c("02/27/92", "02/27/92", "01/14/92", "02/28/92", "02/01/92")
as.Date(dates, "%m/%d/%y")

## date given as number of days since 1900-01-01 (a date in 1989)
as.Date(32768, origin = "1900-01-01")
## Excel is said to use 1900-01-01 as day 1 (Windows default) or
## 1904-01-01 as day 0 (Mac default), but this is complicated by Excel
## incorrectly treating 1900 as a leap year.
## So for dates (post-1901) from Windows Excel
as.Date(35981, origin = "1899-12-30") # 1998-07-05
## and Mac Excel
as.Date(34519, origin = "1904-01-01") # 1998-07-05
## (these values come from http://support.microsoft.com/kb/214330)

## Experiment shows that Matlab's origin is 719529 days before ours,
## (it takes the non-existent 0000-01-01 as day 1)
## so Matlab day 734373 can be imported as
as.Date(734373) - 719529 # 2010-08-23
## (value from
## http://www.mathworks.de/de/help/matlab/matlab_prog/represent-date-and-times-in-MATLAB.html)

## Time zone effect
z <- ISOdate(2010, 04, 13, c(0,12)) # midnight and midday UTC
as.Date(z) # in UTC
## these time zone names are common
as.Date(z, tz = "NZ")
as.Date(z, tz = "HST") # Hawaii
A generic function coercing an R object to an environment. A number or a character string is converted to the corresponding environment on the search path.
as.environment(x)
x |
an R object to convert. If it is already an
environment, just return it. If it is a positive number, return the
environment corresponding to that position on the search list. If it
is If it is a list, the equivalent of If |
This is a primitive generic function: you can write methods to handle specific classes of objects, see InternalMethods.
The corresponding environment object.
John Chambers
environment for creation and manipulation, search; list2env.
as.environment(1) ## the global environment
identical(globalenv(), as.environment(1)) ## is TRUE
try( ## <<- stats need not be attached
    as.environment("package:stats"))
ee <- as.environment(list(a = "A", b = pi, ch = letters[1:8]))
ls(ee) # names of objects in ee
utils::ls.str(ee)
as.function is a generic function which is used to convert objects to functions.
as.function.default works on a list x, which should contain the concatenation of a formal argument list and an expression or an object of mode "call" which will become the function body. The function will be defined in a specified environment, by default that of the caller.
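The list layout and the envir argument described above can be sketched as below. The names f, g, e and k are hypothetical and chosen only for this illustration.

```r
# Formal arguments come first and the body last;
# alist() leaves `x` without a default value
f <- as.function(alist(x = , y = 2, x * y))
f(3)

# Define the function in a chosen environment so the body can see `k` there
e <- new.env()
assign("k", 10, envir = e)
g <- as.function(alist(x = , x + k), envir = e)
g(1)
identical(environment(g), e)
```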
as.function(x, ...)

## Default S3 method:
as.function(x, envir = parent.frame(), ...)
x |
object to convert, a list for the default method. |
... |
additional arguments to be passed to or from methods. |
envir |
environment in which the function should be defined. |
The desired function.
Peter Dalgaard
function; alist which is handy for the construction of argument lists, etc.
as.function(alist(a = , b = 2, a+b))
as.function(alist(a = , b = 2, a+b))(3)
Functions to manipulate objects of classes "POSIXlt" and "POSIXct" representing calendar dates and times.
as.POSIXct(x, tz = "", ...)
as.POSIXlt(x, tz = "", ...)

## S3 method for class 'character'
as.POSIXlt(x, tz = "", format,
           tryFormats = c("%Y-%m-%d %H:%M:%OS",
                          "%Y/%m/%d %H:%M:%OS",
                          "%Y-%m-%d %H:%M",
                          "%Y/%m/%d %H:%M",
                          "%Y-%m-%d",
                          "%Y/%m/%d"),
           optional = FALSE, ...)
## Default S3 method:
as.POSIXlt(x, tz = "", optional = FALSE, ...)
## S3 method for class 'numeric'
as.POSIXlt(x, tz = "", origin, ...)

## S3 method for class 'Date'
as.POSIXct(x, tz = "UTC", ...)
## S3 method for class 'Date'
as.POSIXlt(x, tz = "UTC", ...)
## S3 method for class 'numeric'
as.POSIXct(x, tz = "", origin, ...)

## S3 method for class 'POSIXlt'
as.double(x, ...)
x |
R object to be converted. |
tz |
a character string. The time zone specification to be used
for the conversion, if one is required. System-specific (see
time zones), but |
... |
further arguments to be passed to or from other methods. |
format |
character string giving a date-time format as used
by |
tryFormats |
|
optional |
|
origin |
a date-time object, or something which can be coerced by
|
The as.POSIX* functions convert an object to one of the two classes used to represent date/times (calendar dates plus time to the nearest second). They can convert objects of the other class and of class "Date" to these classes. Dates without times are treated as being at midnight UTC.
They can also convert character strings of the formats "2001-02-03" and "2001/02/03" optionally followed by white space and a time in the format "14:52" or "14:52:03". (Formats such as "01/02/03" are ambiguous but can be converted via a format specification by strptime.) Fractional seconds are allowed.
Alternatively, format can be specified for character vectors or factors: if it is not specified and no standard format works for all non-NA inputs an error is thrown.
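A minimal sketch of the character conversions just described. The strings are illustrative, UTC is used throughout to avoid system-specific time zone effects, and the day/month/year reading of "01/02/03" is an assumption made explicit through format.

```r
t1 <- as.POSIXct("2001-02-03 14:52:03", tz = "UTC")
t2 <- as.POSIXct("2001/02/03 14:52", tz = "UTC")

# Ambiguous strings need an explicit format
# (day/month/two-digit year is assumed here)
t3 <- as.POSIXct("01/02/03", format = "%d/%m/%y", tz = "UTC")

# Date-only input is taken to be midnight
t4 <- as.POSIXct("2001-02-03", tz = "UTC")

as.numeric(t1) - as.numeric(t2)  # seconds between the first two
format(t3, "%Y-%m-%d")
format(t4, "%H:%M:%S")
```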
If format is specified, remember that some of the format specifications are locale-specific, and you may need to set the LC_TIME category appropriately via Sys.setlocale. This most often affects the use of %a, %A (weekday names), %b, %B (month names) and %p (AM/PM).
Logical NAs can be converted to either of the classes, but no other logical vectors can be.
If you are given a numeric time as the number of seconds since an epoch, see the examples.
Character input is first converted to class "POSIXlt" by strptime: numeric input is first converted to "POSIXct". Any conversion that needs to go between the two date-time classes requires a time zone: conversion from "POSIXlt" to "POSIXct" will validate times in the selected time zone. One issue is what happens at transitions to and from DST, for example in the UK
as.POSIXct(strptime("2011-03-27 01:30:00", "%Y-%m-%d %H:%M:%S"))
as.POSIXct(strptime("2010-10-31 01:30:00", "%Y-%m-%d %H:%M:%S"))
are respectively invalid (the clocks went forward at 1:00 GMT to 2:00 BST) and ambiguous (the clocks went back at 2:00 BST to 1:00 GMT). What happens in such cases is OS-specific: one should expect the first to be NA, but the second could be interpreted as either BST or GMT (and common OSes give both possible values). Note too (see strftime) that OS facilities may not format invalid times correctly.
as.POSIXct and as.POSIXlt return an object of the appropriate class. If tz was specified, as.POSIXlt will give an appropriate "tzone" attribute. Date-times known to be invalid will be returned as NA.
Some of the concepts used have to be extended backwards in time (the usage is said to be ‘proleptic’). For example, the origin of time for the "POSIXct" class, ‘1970-01-01 00:00:00 UTC’, is before UTC was defined. More importantly, conversion is done assuming the Gregorian calendar which was introduced in 1582 and not used near-universally until the 20th century. One of the re-interpretations assumed by ISO 8601:2004 is that there was a year zero, even though current year numbering (and zero) is a much later concept (525 CE for year numbers from 1 CE).
Conversions between "POSIXlt" and "POSIXct" of future times are speculative except in UTC. The main uncertainty is in the use of and transitions to/from DST (most systems will assume the continuation of current rules but these can be changed at short notice).
If you want to extract specific aspects of a time (such as the day of the week) just convert it to class "POSIXlt" and extract the relevant component(s) of the list, or if you want a character representation (such as a named day of the week) use the format method.
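Extracting components from a "POSIXlt" object might look like the sketch below. The timestamp is an arbitrary illustrative value, and lt is a hypothetical variable name.

```r
lt <- as.POSIXlt("2010-08-23 16:35:00", tz = "UTC")

lt$hour          # hour of the day
lt$mon + 1       # months are stored as 0-11
lt$year + 1900   # years are stored as years since 1900
lt$wday          # day of the week, with 0 meaning Sunday

# A character representation; day and month names depend on the LC_TIME locale
format(lt, "%A %d %B %Y")
```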
If a time zone is needed and that specified is invalid on your system, what happens is system-specific but attempts to set it will probably be ignored.
Conversion from character needs to find a suitable format unless one is supplied (by trying common formats in turn): this can be slow for long inputs.
DateTimeClasses for details of the classes; strptime for conversion to and from character representations.
Sys.timezone for details of the (system-specific) naming of time zones.
locales for locale-specific aspects.
(z <- Sys.time())             # the current datetime, as class "POSIXct"
unclass(z)                    # a large integer
floor(unclass(z)/86400)       # the number of days since 1970-01-01 (UTC)
(now <- as.POSIXlt(Sys.time())) # the current datetime, as class "POSIXlt"
str(unclass(now))             # the internal list ; use now$hour, etc :
now$year + 1900               # see ?DateTimeClasses
months(now); weekdays(now)    # see ?months; using LC_TIME locale

## suppose we have a time in seconds since 1960-01-01 00:00:00 GMT
## (the origin used by SAS)
z <- 1472562988
# ways to convert this
as.POSIXct(z, origin = "1960-01-01")                # local
as.POSIXct(z, origin = "1960-01-01", tz = "GMT")    # in UTC

## SPSS dates (R-help 2006-02-16)
z <- c(10485849600, 10477641600, 10561104000, 10562745600)
as.Date(as.POSIXct(z, origin = "1582-10-14", tz = "GMT"))

## Stata date-times: milliseconds since 1960-01-01 00:00:00 GMT
## format %tc excludes leap-seconds, assumed here
## For format %tC including leap seconds, see foreign::read.dta()
z <- 1579598122120
op <- options(digits.secs = 3)
# avoid rounding down: milliseconds are not exactly representable
as.POSIXct((z+0.1)/1000, origin = "1960-01-01")
options(op)

## Matlab 'serial day number' (days and fractional days)
z <- 7.343736909722223e5 # 2010-08-23 16:35:00
as.POSIXct((z - 719529)*86400, origin = "1970-01-01", tz = "UTC")

as.POSIXlt(Sys.time(), "GMT") # the current time in UTC

## These may not be correct names on your system
as.POSIXlt(Sys.time(), "America/New_York")  # in New York
as.POSIXlt(Sys.time(), "EST5EDT")           # alternative.
as.POSIXlt(Sys.time(), "EST" )   # somewhere in Eastern Canada
as.POSIXlt(Sys.time(), "HST")    # in Hawaii
as.POSIXlt(Sys.time(), "Australia/Darwin")

tab <- file.path(R.home("share"), "zoneinfo", "zone1970.tab")
if(file.exists(tab)) { # typically on Windows; *not* on Linux
  cols <- c("code", "coordinates", "TZ", "comments")
  tmp <- read.delim(tab, header = FALSE, comment.char = "#", col.names = cols)
  if(interactive()) View(tmp)
  head(tmp, 10)
}
Change the class of an object to indicate that it should be treated ‘as is’.
I(x)
x |
an object |
Function I has two main uses.
In function data.frame. Protecting an object by enclosing it in I() in a call to data.frame inhibits the conversion of character vectors to factors and the dropping of names, and ensures that matrices are inserted as single columns. I can also be used to protect objects which are to be added to a data frame, or converted to a data frame via as.data.frame.
It achieves this by prepending the class "AsIs" to the object's classes. Class "AsIs" has a few of its own methods, including for [, as.data.frame, print and format.
In function formula. There it is used to inhibit the interpretation of operators such as "+", "-", "*" and "^" as formula operators, so they are used as arithmetical operators. This is interpreted as a symbol by terms.formula.
A copy of the object with class "AsIs" prepended to the class(es).
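Both uses of I can be sketched briefly. The data and the names m, df_split and df_asis are hypothetical; stats::terms is used only to show how the formula is interpreted.

```r
m <- matrix(1:6, nrow = 3)

df_split <- data.frame(id = 1:3, m = m)     # matrix expanded into separate columns
df_asis  <- data.frame(id = 1:3, m = I(m))  # matrix kept as a single column
ncol(df_split)
ncol(df_asis)
class(I(m))                                 # "AsIs" is prepended

# In a formula, ^ is a formula operator unless protected by I()
attr(stats::terms(y ~ x^2), "term.labels")
attr(stats::terms(y ~ I(x^2)), "term.labels")
```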
Chambers, J. M. (1992) Linear models. Chapter 4 of Statistical Models in S eds J. M. Chambers and T. J. Hastie, Wadsworth & Brooks/Cole.
Split an array or matrix by its margins.
asplit(x, MARGIN)
x |
an array, including a matrix. |
MARGIN |
a vector giving the margins to split by.
E.g., for a matrix |
Since R 4.1.0, one can also obtain the splits (less efficiently) using apply(x, MARGIN, identity, simplify = FALSE).
The values of the splits can also be obtained (less efficiently) by split(x, slice.index(x, MARGIN)).
A “list array” whose dimension is given by the dimensions of x included in MARGIN, with each element an array whose dimension is given by the dimensions of x not included in MARGIN, and with dimnames preserved as available.
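A short sketch of how the dimensions of the result relate to MARGIN, using an arbitrary 2 x 3 x 4 array. The names x and s are illustrative.

```r
x <- array(1:24, dim = c(2, 3, 4))
s <- asplit(x, c(1, 2))

dim(s)                 # the dimensions of x named in MARGIN
length(s[[1, 2]])      # the remaining dimension of x
as.vector(s[[1, 2]])   # the same values as x[1, 2, ]
```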
## A 3-dimensional array of dimension 2 x 3 x 4:
d <- 2 : 4
x <- array(seq_len(prod(d)), d)
x
## Splitting by margin 2 gives a 1-d list array of length 3
## consisting of 2 x 4 arrays:
asplit(x, 2)
## Splitting by margins 1 and 2 gives a 2 x 3 list array
## consisting of 1-d arrays of length 4:
asplit(x, c(1, 2))
## Compare to
split(x, slice.index(x, c(1, 2)))

## A 2 x 3 matrix:
(x <- matrix(1 : 6, 2, 3))
## To split x by its rows, one can use
asplit(x, 1)
## or less efficiently
split(x, slice.index(x, 1))
split(x, row(x))
Assign a value to a name in an environment.
assign(x, value, pos = -1, envir = as.environment(pos),
       inherits = FALSE, immediate = TRUE)
x |
a variable name, given as a character string. No coercion is done, and the first element of a character vector of length greater than one will be used, with a warning. |
value |
a value to be assigned to |
pos |
where to do the assignment. By default, assigns into the current environment. See ‘Details’ for other possibilities. |
envir |
the |
inherits |
should the enclosing frames of the environment be inspected? |
immediate |
an ignored compatibility feature. |
There are no restrictions on the name given as x: it can be a non-syntactic name (see make.names).
The pos argument can specify the environment in which to assign the object in any of several ways: as -1 (the default), as a positive integer (the position in the search list); as the character string name of an element in the search list; or as an environment (including using sys.frame to access the currently active function calls). The envir argument is an alternative way to specify an environment, but is primarily for back compatibility.
assign does not dispatch assignment methods, so it cannot be used to set elements of vectors, names, attributes, etc.
Note that assignment to an attached list or data frame changes the attached copy and not the original object: see attach and with.
This function is invoked for its side effect, which is assigning value to the variable x. If no envir is specified, then the assignment takes place in the currently active environment.
If inherits is TRUE, enclosing environments of the supplied environment are searched until the variable x is encountered. The value is then assigned in the environment in which the variable is encountered (provided that the binding is not locked: see lockBinding: if it is, an error is signaled). If the symbol is not encountered then assignment takes place in the user's workspace (the global environment).
If inherits is FALSE, assignment takes place in the initial frame of envir, unless an existing binding is locked or there is no existing binding and the environment is locked (when an error is signaled).
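The effect of inherits = TRUE, and the use of a non-syntactic name, can be sketched as follows. The function and variable names (outer_fn, bump, counter, e) are hypothetical.

```r
outer_fn <- function() {
  counter <- 0
  bump <- function() {
    # bump's own frame has no `counter`, so with inherits = TRUE
    # the binding in the enclosing frame is updated
    assign("counter", counter + 1, inherits = TRUE)
    invisible(NULL)
  }
  bump()
  bump()
  counter
}
res <- outer_fn()
res

# Non-syntactic names are allowed; retrieve them with get() or backticks
e <- new.env()
assign("my var", 42, envir = e)
get("my var", envir = e)
```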
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
<-, get, the inverse of assign(), exists, environment.
for(i in 1:6) { #-- Create objects 'r.1', 'r.2', ... 'r.6' --
    nam <- paste("r", i, sep = ".")
    assign(nam, 1:i)
}
ls(pattern = "^r..$")

##-- Global assignment within a function:
myf <- function(x) {
    innerf <- function(x) assign("Global.res", x^2, envir = .GlobalEnv)
    innerf(x+1)
}
myf(3)
Global.res # 16

a <- 1:4
assign("a[1]", 2)
a[1] == 2          # FALSE
get("a[1]") == 2   # TRUE
Assign a value to a name.
x <- value
x <<- value
value -> x
value ->> x

x = value
x |
a variable name (possibly quoted). |
value |
a value to be assigned to |
There are three different assignment operators: two of them have leftwards and rightwards forms.
The operators <- and = assign into the environment in which they are evaluated. The operator <- can be used anywhere, whereas the operator = is only allowed at the top level (e.g., in the complete expression typed at the command prompt) or as one of the subexpressions in a braced list of expressions.
The operators <<- and ->> are normally only used in functions, and cause a search to be made through parent environments for an existing definition of the variable being assigned. If such a variable is found (and its binding is not locked) then its value is redefined, otherwise assignment takes place in the global environment. Note that their semantics differ from that in the S language, but are useful in conjunction with the scoping rules of R. See ‘The R Language Definition’ manual for further details and examples.
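A common use of <<- together with R's scoping rules is a closure that keeps state. The sketch below uses hypothetical names (make_counter, count, n).

```r
make_counter <- function() {
  n <- 0
  function() {
    n <<- n + 1   # updates `n` in the enclosing function's environment
    n
  }
}

count <- make_counter()
count()
count()
third <- count()
third
```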
In all the assignment operator expressions, x can be a name or an expression defining a part of an object to be replaced (e.g., z[[1]]). A syntactic name does not need to be quoted, though it can be (preferably by backticks).
The leftwards forms of assignment <- = <<- group right to left, the other from left to right.
value. Thus one can use a <- b <- c <- 6.
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
Chambers, J. M. (1998) Programming with Data. A Guide to the S Language. Springer (for =).
assign (and its inverse get), for “subassignment” such as x[i] <- v, see [<-; further, environment.
The database is attached to the R search path. This means that the database is searched by R when evaluating a variable, so objects in the database can be accessed by simply giving their names.
attach(what, pos = 2L, name = deparse1(substitute(what), backtick=FALSE),
       warn.conflicts = TRUE)
what |
‘database’. This can be a
|
pos |
integer specifying position in |
name |
name to use for the attached database. Names starting with
|
warn.conflicts |
logical. If NB: Even though the name is |
When evaluating a variable or function name R searches for that name in the databases listed by search. The first name of the appropriate type is used.
By attaching a data frame (or list) to the search path it is possible to refer to the variables in the data frame by their names alone, rather than as components of the data frame (e.g., in the example below, height rather than women$height).
By default the database is attached in position 2 in the search path, immediately after the user's workspace and before all previously attached packages and previously attached databases. This can be altered to attach later in the search path with the pos option, but you cannot attach at pos = 1.
The database is not actually attached. Rather, a new environment is created on the search path and the elements of a list (including columns of a data frame) or objects in a save file or an environment are copied into the new environment. If you use <<- or assign to assign to an attached database, you only alter the attached copy, not the original object. (Normal assignment will place a modified version in the user's workspace: see the examples.) For this reason attach can lead to confusion.
One useful ‘trick’ is to use what = NULL (or equivalently a length-zero list) to create a new environment on the search path into which objects can be assigned by assign or load or sys.source.
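The what = NULL trick might be used as in this sketch. The name "scratch_env" and the helper add_one are hypothetical and chosen only for illustration.

```r
# attach() returns the new environment invisibly
env <- attach(NULL, name = "scratch_env")
assign("add_one", function(x) x + 1, envir = env)

on_path <- "scratch_env" %in% search()
result  <- add_one(1)     # found via the search path

# Clean up so the search path is left as it was
detach("scratch_env", character.only = TRUE)
still_there <- "scratch_env" %in% search()
```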
Names starting "package:" are reserved for library and should not be used by end users. Attached files are by default given the name file:what. The name argument given for the attached environment will be used by search and can be used as the argument to as.environment.
The environment is returned invisibly with a "name" attribute.
attach has the side effect of altering the search path and this can easily lead to the wrong object of a particular name being found. People do often forget to detach databases.
In interactive use, with is usually preferable to the use of attach/detach, unless what is a save()-produced file in which case attach() is a (safety) wrapper for load().
In programming, functions should not change the search path unless that is their purpose. Often with can be used within a function. If not, good practice is to
Always use a distinctive name argument, and
Immediately follow the attach call by an on.exit call to detach using the distinctive name.
This ensures that the search path is left unchanged even if the function is interrupted or if code after the attach call changes the search path.
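The two practices above can be sketched as follows. The function mean_height and the name "tmp_df_for_mean" are hypothetical, and the women data set from package datasets is assumed to be available; in real code with() would usually be simpler.

```r
mean_height <- function(df) {
  # A deliberately distinctive name for the attached copy
  attach(df, name = "tmp_df_for_mean", warn.conflicts = FALSE)
  # Registered immediately, so the search path is restored on any exit
  on.exit(detach("tmp_df_for_mean", character.only = TRUE))
  mean(height)   # `height` is found in the attached copy
}

res <- mean_height(datasets::women)
res
"tmp_df_for_mean" %in% search()
```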
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
library, detach, search, objects, environment, with.
require(utils)

summary(women$height)   # refers to variable 'height' in the data frame
attach(women)
summary(height)         # The same variable now available by name
height <- height*2.54   # Don't do this. It creates a new variable
                        # in the user's workspace
find("height")
summary(height)         # The new variable in the workspace
rm(height)
summary(height)         # The original variable.
height <<- height*25.4  # Change the copy in the attached environment
find("height")
summary(height)         # The changed copy
detach("women")
summary(women$height)   # unchanged

## Not run: ## create an environment on the search path and populate it
sys.source("myfuns.R", envir = attach(NULL, name = "myfuns"))

## End(Not run)
Get or set specific attributes of an object.
attr(x, which, exact = FALSE)
attr(x, which) <- value
x |
an object whose attributes are to be accessed. |
which |
a non-empty character string specifying which attribute is to be accessed. |
exact |
logical: should |
value |
an object, the new value of the attribute, or |
These functions provide access to a single attribute of an object. The replacement form causes the named attribute to take the value specified (or create a new attribute with the value given).
The extraction function first looks for an exact match to which amongst the attributes of x, then (unless exact = TRUE) a unique partial match. (Setting options(warnPartialMatchAttr = TRUE) causes partial matches to give warnings.)
The replacement function only uses exact matches.
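A minimal sketch of the matching rules, using a hypothetical attribute name myattribute. It assumes warnPartialMatchAttr is left at its default.

```r
x <- structure(1:3, myattribute = "hello")

partial <- attr(x, "myattr")                # unique partial match
strict  <- attr(x, "myattr", exact = TRUE)  # no exact match

# Replacement uses exact matching only, so this creates a second attribute
attr(x, "myattr") <- "other"
names(attributes(x))
```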
Note that some attributes (namely class, comment, dim, dimnames, names, row.names and tsp) are treated specially and have restrictions on the values which can be set. (Note that this is not true of levels which should be set for factors via the levels replacement function.)
The extractor function allows (and does not match) empty and missing values of which: the replacement function does not.
NULL objects cannot have attributes and attempting to assign one by attr gives an error.
Both are primitive functions.
For the extractor, the value of the attribute matched, or NULL if no exact match is found and no or more than one partial match is found.
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
# create a 2 by 5 matrix
x <- 1:10
attr(x,"dim") <- c(2, 5)
These functions access an object's attributes. The first form below returns the object's attribute list. The replacement forms use the list on the right-hand side of the assignment as the object's attributes (if appropriate).
attributes(x)
attributes(x) <- value
mostattributes(x) <- value
x |
any R object. |
value |
an appropriate named |
Unlike attr it is not an error to set attributes on a NULL object: it will first be coerced to an empty list.
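The contrast with attr can be sketched as below. The attribute name tag is hypothetical, and try() is used so that the expected error does not stop the script.

```r
x <- NULL
attributes(x) <- list(tag = "t")   # allowed: x is first coerced to an empty list
is.list(x)
length(x)
attr(x, "tag")

y <- NULL
res <- try(attr(y, "tag") <- "t", silent = TRUE)  # attr<- on NULL is an error
inherits(res, "try-error")
```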
Note that some attributes (namely class, comment, dim, dimnames, names, row.names and tsp) are treated specially and have restrictions on the values which can be set. (Note that this is not true of levels which should be set for factors via the levels replacement function.)
Attributes are not stored internally as a list and should be thought of as a set and not a vector, i.e., the order of the elements of attributes() does not matter. This is also reflected by identical()'s behaviour with the default argument attrib.as.set = TRUE. Attributes must have unique names (and NA is taken as "NA", not a missing value).
Assigning attributes first removes all attributes, then sets any
dim
attribute and then the remaining attributes in the order
given: this ensures that setting a dim
attribute always precedes
the dimnames
attribute.
The mostattributes assignment takes special care for the dim, names and dimnames attributes, and assigns them only when known to be valid whereas an attributes assignment would give an error if any are not. It is principally intended for arrays, and should be used with care on classed objects. For example, it does not check that row.names are assigned correctly for data frames.
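The difference between the two replacement forms can be seen with a dim value that does not fit the object. This is a minimal sketch; the note attribute is a made-up name.

```r
x <- 1:6

# A dim of c(4, 2) needs 8 elements, so a plain attributes<- assignment fails.
strict <- tryCatch({
  y <- x
  attributes(y) <- list(dim = c(4, 2))
  "assigned"
}, error = function(e) "error")

# mostattributes<- silently skips the invalid dim and keeps the rest.
mostattributes(x) <- list(dim = c(4, 2), note = "kept")
```

After the second assignment x is still a plain vector of length 6, carrying only the note attribute.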
The names of a pairlist are not stored as attributes, but are reported as if they were (and can be set by the replacement form of attributes).
NULL objects cannot have attributes and attempts to assign them will promote the object to an empty list.
Both the extractor and replacement forms of attributes are primitive functions.
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
x <- cbind(a = 1:3, pi = pi) # simple matrix with dimnames
attributes(x)

## strip an object's attributes:
attributes(x) <- NULL
x # now just a vector of length 6

mostattributes(x) <- list(mycomment = "really special", dim = 3:2,
   dimnames = list(LETTERS[1:3], letters[1:5]), names = paste(1:6))
x # dim(), but not {dim}names
autoload creates a promise-to-evaluate autoloader and stores it with name name in the .AutoloadEnv environment. When R attempts to evaluate name, autoloader is run, the package is loaded and name is re-evaluated in the new package's environment. The result is that R behaves as if package was loaded but it does not occupy memory.
.Autoloaded contains the names of the packages for which autoloading has been promised.
autoload(name, package, reset = FALSE, ...)
autoloader(name, package, ...)

.AutoloadEnv
.Autoloaded
name |
string giving the name of an object. |
package |
string giving the name of a package containing the object. |
reset |
logical: for internal use by |
... |
other arguments to |
This function is invoked for its side-effect. It has no return value.
require(stats)
autoload("interpSpline", "splines")
search()
ls("Autoloads")
.Autoloaded

x <- sort(stats::rnorm(12))
y <- x^2
is <- interpSpline(x, y)
search() ## now has splines
detach("package:splines")
search()
is2 <- interpSpline(x, y+x)
search() ## and again
detach("package:splines")
Solves a triangular system of linear equations.
backsolve(r, x, k = ncol(r), upper.tri = TRUE, transpose = FALSE)
forwardsolve(l, x, k = ncol(l), upper.tri = FALSE, transpose = FALSE)
r, l |
an upper (or lower) triangular matrix giving the coefficients for the system to be solved. Values below (above) the diagonal are ignored. |
x |
a matrix whose columns give the right-hand sides for the equations. |
k |
the number of columns of |
upper.tri |
logical; if |
transpose |
logical; if |
Solves a system of linear equations where the coefficient matrix is upper (or ‘right’, ‘R’) or lower (‘left’, ‘L’) triangular.
x <- backsolve(R, b) solves $R x = b$, and x <- forwardsolve(L, b) solves $L x = b$, respectively.
The r/l must have at least k rows and columns, and x must have at least k rows.
This is a wrapper for the level-3 BLAS routine dtrsm.
The solution of the triangular system. The result will be a vector if x is a vector and a matrix if x is a matrix.
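As an illustration of how the two solvers are typically combined, the sketch below first solves a small lower triangular system and then solves a symmetric positive definite system through its Cholesky factor. The matrices are arbitrary examples, and the use of chol() and solve() for the comparison is an assumption of this sketch, not something prescribed here.

```r
# Lower triangular system L x = b.
L <- rbind(c(2, 0, 0),
           c(1, 3, 0),
           c(4, 5, 6))
b <- c(2, 7, 32)
x <- forwardsolve(L, b)

# Symmetric positive definite A = t(R) %*% R with R upper triangular.
A   <- matrix(c(4, 2, 2, 3), 2, 2)
rhs <- c(2, 1)
R   <- chol(A)
z   <- backsolve(R, rhs, transpose = TRUE)  # solves t(R) z = rhs
sol <- backsolve(R, z)                      # solves R sol = z

check <- all.equal(sol, solve(A, rhs))
```

Two triangular solves replace one general solve, which is the usual reason for keeping a factorization around when several right-hand sides are needed.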
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
Dongarra, J. J., Bunch, J. R., Moler, C. B. and Stewart, G. W. (1978) LINPACK Users Guide. Philadelphia: SIAM Publications.
## upper triangular matrix 'r':
r <- rbind(c(1,2,3),
           c(0,1,1),
           c(0,0,2))
( y <- backsolve(r, x <- c(8,4,2)) ) # -1 3 1
r %*% y # == x = (8,4,2)
backsolve(r, x, transpose = TRUE) # 8 -12 -5
Utilities to ‘balance’ objects of class "POSIXlt".
unCfillPOSIXlt(x) is a fast primitive version of balancePOSIXlt(x, fill.only=TRUE, classed=FALSE) or equivalently, unclass(balancePOSIXlt(x, fill.only=TRUE)) from where it is named.
balancePOSIXlt(x, fill.only = FALSE, classed = TRUE)
unCfillPOSIXlt(x)
x |
an R object inheriting from |
fill.only |
a |
classed |
a |
Note that "POSIXlt" objects x may have their (9 to 11) list components of different lengths, by simply recycling them to full length. Prior to R 4.3.0, this worked in printing, formatting, and conversion to "POSIXct", but often not for length(), conversion to "Date" or indexing, i.e., subsetting, [, or subassigning, [<-.
Relatedly, components sec, min, hour, mday and mon could have been out of their designated range (say, 0–23 for hours) and still work correctly, e.g. in conversions and printing. This is supported as well, since R 4.3.0, at least when the values are not extreme.
Function balancePOSIXlt(x) will now return a version of the "POSIXlt" object x which by default is balanced in both ways: All the internal list components are of full length, and their values are inside their ranges as specified in as.POSIXlt's ‘Details on POSIXlt’. Setting fill.only = TRUE will only recycle the list components to full length, but not check them at all. This is considerably faster when all components of x are already of full length.
Experimentally, balancePOSIXlt() and other functions returning POSIXlt objects now set a logical attribute "balanced" with NA meaning “filled-in”, i.e., not “ragged” and TRUE means (fully) balanced.
For more details about many aspects of valid POSIXlt objects, notably their internal list components, see ‘DateTimeClasses’, e.g., as.POSIXlt, notably the section ‘Details on POSIXlt’.
## FIXME: this should also work for regular (non-UTC) time zones.
TZ <-"UTC"
# Could be
# d1 <- as.POSIXlt("2000-01-02 3:45", tz = TZ)
# on systems (almost all) which have tm_zone.
oldTZ <- Sys.getenv('TZ', unset = "unset")
Sys.setenv(TZ = "UTC")
d1 <- as.POSIXlt("2000-01-02 3:45")
d1$min <- d1$min + (0:16)*20L
(f1 <- format(d1))
str(unclass(d1)) # only $min is of length > 1
df <- balancePOSIXlt(d1, fill.only = TRUE) # a "POSIXlt" object
str(unclass(df)) # all of length 17; 'min' unchanged
db <- balancePOSIXlt(d1, classed = FALSE) # a list
stopifnot(identical(
    unCfillPOSIXlt(d1),
    balancePOSIXlt(d1, fill.only = TRUE, classed = FALSE)))
str(db) # of length 17 *and* in range
if(oldTZ == "unset") Sys.unsetenv('TZ') else Sys.setenv(TZ = oldTZ)
basename removes all of the path up to and including the last path separator (if any).
dirname returns the part of the path up to but excluding the last path separator, or "." if there is no path separator.
basename(path)
dirname(path)
path |
character vector, containing path names. |
Tilde expansion of the path will be performed.
Trailing path separators are removed before dissecting the path, and for dirname any trailing file separators are removed from the result.
A character vector of the same length as path. A zero-length input will give a zero-length output with no error.
Paths not containing any separators are taken to be in the current directory, so dirname returns ".".
If an element of path is NA, so is the result.
"" is not a valid pathname, but is returned unchanged.
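The rules above can be checked on a few inputs. The paths are hypothetical and need not exist, since both functions work on the strings only.

```r
p <- "/usr/local/lib/R/library.tar.gz"   # hypothetical path

base  <- basename(p)                     # last component
dir   <- dirname(p)                      # everything before the last separator
trail <- basename("/usr/local/lib/")     # trailing separator is dropped first
nosep <- dirname("notes.txt")            # no separator: current directory
mixed <- basename(c("a/b.R", NA, ""))    # NA stays NA, "" is returned unchanged
empty <- basename(character(0))          # zero-length in, zero-length out
```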
On Windows this will accept either \ or / as the path separator, but dirname will return a path using / (except if on a network share, when the leading \\ will be preserved). Expect these only to be able to handle complete paths, and not for example just a network share or a drive.
UTF-8-encoded path names not valid in the current locale can be used.
These are not wrappers for the POSIX system functions of the same names: in particular they do not have the special handling of the path "/" and of returning "." for empty strings.
basename(file.path("","p1","p2","p3", c("file1", "file2")))
dirname (file.path("","p1","p2","p3", "filename"))
Bessel Functions of integer and fractional order, of first and second kind, $J_\nu$ and $Y_\nu$, and Modified Bessel functions (of first and third kind), $I_\nu$ and $K_\nu$.
besselI(x, nu, expon.scaled = FALSE)
besselK(x, nu, expon.scaled = FALSE)
besselJ(x, nu)
besselY(x, nu)
x |
numeric, |
nu |
numeric; the order (maybe fractional and negative) of the corresponding Bessel function. |
expon.scaled |
logical; if |
If expon.scaled = TRUE, $e^{-x} I_\nu(x)$, or $e^{x} K_\nu(x)$ are returned.
For $\nu < 0$, formulae 9.1.2 and 9.6.2 from Abramowitz & Stegun are applied (which is probably suboptimal), except for besselK which is symmetric in nu.
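The scaling and the treatment of negative orders can be checked numerically. The last comparison uses Abramowitz & Stegun 9.1.2, $Y_\nu(x) = \big(J_\nu(x)\cos(\nu\pi) - J_{-\nu}(x)\big)/\sin(\nu\pi)$, rearranged as $J_{-\nu}(x) = \cos(\nu\pi)\,J_\nu(x) - \sin(\nu\pi)\,Y_\nu(x)$. The values of x and nu are arbitrary, and all.equal() is used because the identities hold only up to floating-point error.

```r
x  <- c(0.5, 2, 10)
nu <- 0.3

# expon.scaled = TRUE multiplies I by exp(-x) and K by exp(x).
ok_I <- all.equal(besselI(x, nu, expon.scaled = TRUE), exp(-x) * besselI(x, nu))
ok_K <- all.equal(besselK(x, nu, expon.scaled = TRUE), exp(x)  * besselK(x, nu))

# K is symmetric in the order.
ok_sym <- all.equal(besselK(x, -nu), besselK(x, nu))

# Negative order for J, expressed through J and Y of positive order.
ok_neg <- all.equal(besselJ(x, -nu),
                    cos(pi * nu) * besselJ(x, nu) - sin(pi * nu) * besselY(x, nu))
```

The scaled forms are mainly useful for large x, where the unscaled $I_\nu(x)$ overflows and the unscaled $K_\nu(x)$ underflows.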
The current algorithms will give warnings about accuracy loss for large arguments. In some cases, these warnings are exaggerated, and the precision is perfect. For large nu, say in the order of millions, the current algorithms are rarely useful.
Numeric vector with the (scaled, if expon.scaled = TRUE) values of the corresponding Bessel function.
The length of the result is the maximum of the lengths of the parameters. All parameters are recycled to that length.
Original Fortran code: W. J. Cody, Argonne National Laboratory
Translation to C and adaptation to R: Martin Maechler.
The C code is a translation of Fortran routines from https://netlib.org/specfun/ribesl, ‘../rjbesl’, etc. The four source code files for bessel[IJKY] each contain a paragraph “Acknowledgement” and “References”, a short summary of which is
besselI: based on (code) by David J. Sookne, see Sookne (1973)... Modifications... An earlier version was published in Cody (1983).
besselJ: as besselI
besselK: based on (code) by J. B. Campbell (1980)... Modifications...
besselY: draws heavily on Temme's Algol program for $Y$ ... and on Campbell's programs for $Y$ .... ... heavily modified.
Abramowitz, M. and Stegun, I. A. (1972). Handbook of Mathematical Functions. Dover, New York; Chapter 9: Bessel Functions of Integer Order.
In order of “Source” citation above:
Sookne, David J. (1973). Bessel Functions of Real Argument and Integer Order. Journal of Research of the National Bureau of Standards, 77B, 125–132. doi:10.6028/jres.077B.012.
Cody, William J. (1983). Algorithm 597: Sequence of modified Bessel functions of the first kind. ACM Transactions on Mathematical Software, 9(2), 242–245. doi:10.1145/357456.357462.
Campbell, J.B. (1980). On Temme's algorithm for the modified Bessel function of the third kind. ACM Transactions on Mathematical Software, 6(4), 581–586. doi:10.1145/355921.355928.
Campbell, J.B. (1979). Bessel functions J_nu(x) and Y_nu(x) of float order and float argument. Computer Physics Communications, 18, 133–142. doi:10.1016/0010-4655(79)90030-4.
Temme, Nico M. (1976). On the numerical evaluation of the ordinary Bessel function of the second kind. Journal of Computational Physics, 21, 343–350. doi:10.1016/0021-9991(76)90032-2.
Other special mathematical functions, such as gamma, $\Gamma(x)$, and beta, $B(a, b)$.
require(graphics)

nus <- c(0:5, 10, 20)

x <- seq(0, 4, length.out = 501)
plot(x, x, ylim = c(0, 6), ylab = "", type = "n",
     main = "Bessel Functions I_nu(x)")
for(nu in nus) lines(x, besselI(x, nu = nu), col = nu + 2)
legend(0, 6, legend = paste("nu=", nus), col = nus + 2, lwd = 1)

x <- seq(0, 40, length.out = 801); yl <- c(-.5, 1)
plot(x, x, ylim = yl, ylab = "", type = "n",
     main = "Bessel Functions J_nu(x)")
abline(h=0, v=0, lty=3)
for(nu in nus) lines(x, besselJ(x, nu = nu), col = nu + 2)
legend("topright", legend = paste("nu=", nus), col = nus + 2, lwd = 1, bty="n")

## Negative nu's --------------------------------------------------
xx <- 2:7
nu <- seq(-10, 9, length.out = 2001)
## --- I() --- --- --- ---
matplot(nu, t(outer(xx, nu, besselI)), type = "l", ylim = c(-50, 200),
        main = expression(paste("Bessel ", I[nu](x), " for fixed ", x,
                                ", as ", f(nu))),
        xlab = expression(nu))
abline(v = 0, col = "light gray", lty = 3)
legend(5, 200, legend = paste("x=", xx), col=seq(xx), lty=1:5)

## --- J() --- --- --- ---
bJ <- t(outer(xx, nu, besselJ))
matplot(nu, bJ, type = "l", ylim = c(-500, 200),
        xlab = quote(nu), ylab = quote(J[nu](x)),
        main = expression(paste("Bessel ", J[nu](x), " for fixed ", x)))
abline(v = 0, col = "light gray", lty = 3)
legend("topright", legend = paste("x=", xx), col=seq(xx), lty=1:5)

## ZOOM into right part:
matplot(nu[nu > -2], bJ[nu > -2,], type = "l",
        xlab = quote(nu), ylab = quote(J[nu](x)),
        main = expression(paste("Bessel ", J[nu](x), " for fixed ", x)))
abline(h=0, v = 0, col = "gray60", lty = 3)
legend("topright", legend = paste("x=", xx), col=seq(xx), lty=1:5)

##--------------- x --> 0 -----------------------------
x0 <- 2^seq(-16, 5, length.out=256)
plot(range(x0), c(1e-40, 1), log = "xy", xlab = "x", ylab = "", type = "n",
     main = "Bessel Functions J_nu(x) near 0\n log - log scale") ; axis(2, at=1)
for(nu in sort(c(nus, nus+0.5)))
    lines(x0, besselJ(x0, nu = nu), col = nu + 2, lty= 1+ (nu%%1 > 0))
legend("right", legend = paste("nu=", paste(nus, nus+0.5, sep=", ")),
       col = nus + 2, lwd = 1, bty="n")

x0 <- 2^seq(-10, 8, length.out=256)
plot(range(x0), 10^c(-100, 80), log = "xy", xlab = "x", ylab = "", type = "n",
     main = "Bessel Functions K_nu(x) near 0\n log - log scale") ; axis(2, at=1)
for(nu in sort(c(nus, nus+0.5)))
    lines(x0, besselK(x0, nu = nu), col = nu + 2, lty= 1+ (nu%%1 > 0))
legend("topright", legend = paste("nu=", paste(nus, nus + 0.5, sep = ", ")),
       col = nus + 2, lwd = 1, bty="n")

x <- x[x > 0]
plot(x, x, ylim = c(1e-18, 1e11), log = "y", ylab = "", type = "n",
     main = "Bessel Functions K_nu(x)"); axis(2, at=1)
for(nu in nus) lines(x, besselK(x, nu = nu), col = nu + 2)
legend(0, 1e-5, legend=paste("nu=", nus), col = nus + 2, lwd = 1)

yl <- c(-1.6, .6)
plot(x, x, ylim = yl, ylab = "", type = "n",
     main = "Bessel Functions Y_nu(x)")
for(nu in nus){
    xx <- x[x > .6*nu]
    lines(xx, besselY(xx, nu=nu), col = nu+2)
}
legend(25, -.5, legend = paste("nu=", nus), col = nus+2, lwd = 1)

## negative nu in bessel_Y -- was bogus for a long time
curve(besselY(x, -0.1), 0, 10, ylim = c(-3,1), ylab = "")
for(nu in c(seq(-0.2, -2, by = -0.1)))
  curve(besselY(x, nu), add = TRUE)
title(expression(besselY(x, nu) * " " *
                 {nu == list(-0.1, -0.2, ..., -2)}))
These functions represent an interface for adjustments to environments and bindings within environments. They allow for locking environments as well as individual bindings, and for linking a variable to a function.
lockEnvironment(env, bindings = FALSE)
environmentIsLocked(env)
lockBinding(sym, env)
unlockBinding(sym, env)
bindingIsLocked(sym, env)

makeActiveBinding(sym, fun, env)
bindingIsActive(sym, env)
activeBindingFunction(sym, env)
env |
an environment. |
bindings |
logical specifying whether bindings should be locked. |
sym |
a name object or character string. |
fun |
a function taking zero or one arguments. |
The function lockEnvironment locks its environment argument. Locking the environment prevents adding or removing variable bindings from the environment. Changing the value of a variable is still possible unless the binding has been locked. The namespace environments of packages with namespaces are locked when loaded.
lockBinding locks individual bindings in the specified environment. The value of a locked binding cannot be changed. Locked bindings may be removed from an environment unless the environment is locked.
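The interaction between the two kinds of lock can be sketched as follows. The environments e and e2 and the helper attempt() are made up for this illustration; attempt() only turns an error into FALSE so that the outcome can be inspected.

```r
# Hypothetical helper: TRUE if the expression ran, FALSE if it raised an error.
attempt <- function(expr) tryCatch({ expr; TRUE }, error = function(err) FALSE)

# Lock the environment and all of its bindings in one step.
e <- new.env()
assign("x", 1, envir = e)
lockEnvironment(e, bindings = TRUE)
env_locked <- environmentIsLocked(e)
bnd_locked <- bindingIsLocked("x", e)
changed    <- attempt(assign("x", 2, envir = e))  # value cannot be changed
added      <- attempt(assign("y", 1, envir = e))  # no new bindings either

# A locked binding in an unlocked environment can still be removed.
e2 <- new.env()
assign("x", 1, envir = e2)
lockBinding("x", e2)
rm("x", envir = e2)
still_there <- exists("x", envir = e2, inherits = FALSE)
```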
makeActiveBinding installs fun in environment env so that getting the value of sym calls fun with no arguments, and assigning to sym calls fun with one argument, the value to be assigned. This allows the implementation of things like C variables linked to R variables and variables linked to databases, and is used to implement setRefClass. It may also be useful for making thread-safe versions of some system globals. Currently active bindings are not preserved during package installation, but they can be created in .onLoad.
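A minimal sketch of a two-way active binding: one stored value (celsius) and a derived variable (fahrenheit) that reads from and writes to it. The environment and both variable names are invented for the example.

```r
e <- new.env()
e$celsius <- 20

makeActiveBinding("fahrenheit", function(v) {
  if (missing(v)) {
    e$celsius * 9 / 5 + 32          # read: called with no arguments
  } else {
    e$celsius <- (v - 32) * 5 / 9   # write: called with the assigned value
    invisible(v)
  }
}, e)

is_active <- bindingIsActive("fahrenheit", e)
before    <- e$fahrenheit           # computed from celsius on each access
e$fahrenheit <- 212                 # routed through the function
after     <- e$celsius
```

Nothing is stored under fahrenheit itself; every access runs the function, so the two variables cannot drift apart.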
The functions bindingIsLocked and environmentIsLocked return a length-one logical vector. The remaining functions return NULL, invisibly.
Luke Tierney
# locking environments
e <- new.env()
assign("x", 1, envir = e)
get("x", envir = e)
lockEnvironment(e)
get("x", envir = e)
assign("x", 2, envir = e)
try(assign("y", 2, envir = e)) # error

# locking bindings
e <- new.env()
assign("x", 1, envir = e)
get("x", envir = e)
lockBinding("x", e)
try(assign("x", 2, envir = e)) # error
unlockBinding("x", e)
assign("x", 2, envir = e)
get("x", envir = e)

# active bindings
f <- local( {
    x <- 1
    function(v) {
       if (missing(v))
           cat("get\n")
       else {
           cat("set\n")
           x <<- v
       }
       x
    }
})
makeActiveBinding("fred", f, .GlobalEnv)
bindingIsActive("fred", .GlobalEnv)
fred
fred <- 2
fred
Logical operations on integer vectors with elements viewed as sets of bits.
bitwNot(a)
bitwAnd(a, b)
bitwOr(a, b)
bitwXor(a, b)

bitwShiftL(a, n)
bitwShiftR(a, n)
a, b |
integer vectors; numeric vectors are coerced to integer vectors. |
n |
non-negative integer vector of values up to 31. |
Each element of an integer vector has 32 bits.
Pairwise operations can result in integer NA.
Shifting is done assuming the values represent unsigned integers.
An integer vector of length the longer of the arguments, or zero length if one is zero-length.
The output element is NA if an input is NA (after coercion) or an invalid shift.
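A common use of these functions is packing several on/off flags into one integer. The flag names below (READ, WRITE, EXEC) are invented for the sketch.

```r
# Hypothetical permission flags, one bit each.
READ  <- 1L
WRITE <- 2L
EXEC  <- 4L

perm      <- bitwOr(READ, EXEC)            # set two flags: binary 101
has_write <- bitwAnd(perm, WRITE) != 0L    # test a flag that is not set
has_exec  <- bitwAnd(perm, EXEC)  != 0L    # test a flag that is set
toggled   <- bitwXor(perm, EXEC)           # flip EXEC off again

shl <- bitwShiftL(1L, 4L)                  # 1 moved four bits to the left
shr <- bitwShiftR(16L, 2L)                 # 16 moved two bits to the right
inv <- bitwNot(5L)                         # all 32 bits flipped

na_out <- bitwAnd(NA_integer_, 1L)         # NA input gives NA output
```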
The logical operators, !, &, |, xor. Notably these do work bitwise for raw arguments.
The classes "octmode" and "hexmode" whose implementation of the standard logical operators is based on these functions.
Package bitops has similar functions for numeric vectors which differ in the way they treat integers $2^{31}$ or larger.
bitwNot(0:12) # -1 -2 ... -13
bitwAnd(15L, 7L) #  7
bitwOr (15L, 7L) # 15
bitwXor(15L, 7L) #  8
bitwXor(-1L, 1L) # -2

## The "same" for 'raw' instead of integer :
rr12 <- as.raw(0:12) ; rbind(rr12, !rr12)
c(r15 <- as.raw(15), r7 <- as.raw(7)) # 0f 07
r15 & r7     # 07
r15 | r7     # 0f
xor(r15, r7) # 08

bitwShiftR(-1, 1:31) # shifts of 2^32-1 = 4294967295
Get or set the body of a function which is basically all of the function definition but its formal arguments (formals), see the ‘Details’.
body(fun = sys.function(sys.parent()))
body(fun, envir = environment(fun)) <- value
fun |
a function object, or see ‘Details’. |
envir |
environment in which the function should be defined. |
value |
an object, usually a language object: see section ‘Value’. |
For the first form, fun can be a character string naming the function to be manipulated, which is searched for from the parent frame. If it is not specified, the function calling body is used.
The bodies of all but the simplest functions are braced expressions, that is calls to {: see the ‘Examples’ section for how to create such a call.
body returns the body of the function specified. This is normally a language object, most often a call to {, but it can also be a symbol such as pi or a constant (e.g., 3 or "R") to be the return value of the function.
The replacement form sets the body of a function to the object on the right hand side, and (potentially) resets the environment of the function, and drops attributes. If value is of class "expression" the first element is used as the body: any additional elements are ignored, with a warning.
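The kinds of body described above, and the warning for a multi-element expression, can be seen in a short sketch. The functions f, g and h are throwaway examples, and the warning is muffled here only so that the script can record that it occurred.

```r
f <- function(x) { x + 1 }
g <- function() pi
h <- function() 3

braced   <- is.call(body(f))     # a call to `{`
symbolic <- is.symbol(body(g))   # the symbol pi
constant <- body(h)              # the number 3

# Replace a body with a quoted call.
body(f) <- quote(x * 2)
doubled <- f(4)

# An expression vector of length 2: only the first element is used.
warned <- FALSE
withCallingHandlers(
  body(h) <- expression(10, 20),
  warning = function(w) {
    warned <<- TRUE
    invokeRestart("muffleWarning")
  }
)
first_only <- h()
```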
The three parts of a (non-primitive) function are its formals, body, and environment.
Further, see alist, args, function.
body(body)
f <- function(x) x^5
body(f) <- quote(5^x)
## or equivalently  body(f) <- expression(5^x)
f(3) # = 125
body(f)

## creating a multi-expression body
e <- expression(y <- x^2, return(y)) # or a list
body(f) <- as.call(c(as.name("{"), e))
f
f(8)
## Using substitute() may be simpler than 'as.call(c(as.name("{",..)))':
stopifnot(identical(body(f), substitute({ y <- x^2; return(y) })))
An analogue of the LISP backquote macro. bquote quotes its argument except that terms wrapped in .() are evaluated in the specified where environment. If splice = TRUE then terms wrapped in ..() are evaluated and spliced into a call.
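The difference between .() and ..() is easiest to see side by side. In this sketch n, x, args and the function name f are placeholders; f is never called, it only appears inside the quoted call.

```r
n <- 3

# .() substitutes the value of n; x stays a symbol.
e1 <- bquote(x^.(n))
x <- 2
val <- eval(e1)

# ..() with splice = TRUE inserts each element of the list as its own argument.
args <- list(1, quote(y))
e2 <- bquote(f(..(args)), splice = TRUE)
txt <- deparse(e2)
```

Without splicing, a list wrapped in .() would be inserted as a single argument rather than spread over the argument positions of the call.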
bquote(expr, where = parent.frame(), splice = FALSE)
expr |
|
where |
An environment. |
splice |
Logical; if |
require(graphics)

a <- 2

bquote(a == a)
quote(a == a)

bquote(a == .(a))
substitute(a == A, list(A = a))

plot(1:10, a*(1:10), main = bquote(a == .(a)))

## to set a function default arg
default <- 1
bquote( function(x, y = .(default)) x+y )

exprs <- expression(x <- 1, y <- 2, x + y)
bquote(function() {..(exprs)}, splice = TRUE)
Interrupt the execution of an expression and allow the inspection of the environment where browser was called from.
browser(text = "", condition = NULL, expr = TRUE, skipCalls = 0L)
text |
a text string that can be retrieved once the browser is invoked. |
condition |
a condition that can be retrieved once the browser is invoked. |
expr |
a “condition”. By default, and whenever not false
after being coerced to |
skipCalls |
how many previous calls to skip when reporting the calling context. |
A call to browser can be included in the body of a function. When reached, this causes a pause in the execution of the current expression and allows access to the R interpreter.
The purpose of the text and condition arguments is to allow helper programs (e.g., external debuggers) to insert specific values here, so that the specific call to browser (perhaps its location in a source file) can be identified and special processing can be achieved. The values can be retrieved by calling browserText and browserCondition.
The purpose of the expr argument is to allow for the illusion of conditional debugging. It is an illusion, because execution is always paused at the call to browser, but control is only passed to the evaluator described below if expr is not FALSE after coercion to logical. In most cases it is going to be more efficient to use an if statement in the calling program, but in some cases using this argument will be simpler.
The skipCalls argument should be used when the browser() call is nested within another debugging function: it will look further up the call stack to report its location.
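A sketch of conditional use of the expr and text arguments. The function sum_positive and its input are invented, and the interactive() guard is an addition of this sketch so that the code also runs unattended in batch mode; it is not required by browser itself.

```r
sum_positive <- function(xs) {
  total <- 0
  for (x in xs) {
    # Hand control to the browser only for negative input, and only when
    # a user is present to respond (the guard is an assumption of this sketch).
    if (interactive()) browser(text = "negative value", expr = x < 0)
    total <- total + max(x, 0)
  }
  total
}

result <- sum_positive(c(1, -2, 3))
```

In an interactive session the prompt appears on the second iteration, where x and total can be inspected before typing c to continue.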
At the browser prompt the user can enter commands or R expressions, followed by a newline. The commands are
c
exit the browser and continue execution at the next statement.
cont
synonym for c.
f
finish execution of the current loop or function.
help
print this list of commands.
n
evaluate the next statement, stepping over function calls. For byte compiled functions interrupted by browser calls, n is equivalent to c.
s
evaluate the next statement, stepping into function calls. Again, byte compiled functions make s equivalent to c.
where
print a stack trace of all active function calls.
r
invoke a "resume" restart if one is available; interpreted as an R expression otherwise. Typically "resume" restarts are established for continuing from user interrupts.
Q
exit the browser and the current evaluation and return to the top-level prompt.
Leading and trailing whitespace is ignored, except for an empty line. Handling of empty lines depends on the "browserNLdisabled" option; if it is TRUE, empty lines are ignored. If not, an empty line is the same as n (or s, if it was used most recently).
Anything else entered at the browser prompt is interpreted as an R expression to be evaluated in the calling environment: in particular typing an object name will cause the object to be printed, and ls() lists the objects in the calling frame. (If you want to look at an object with a name such as n, print it explicitly, or use autoprint via (n).)
The number of lines printed for the deparsed call can be limited by setting options(deparse.max.lines).
The browser prompt is of the form Browse[n]>: here n indicates the ‘browser level’. The browser can be called when browsing (and often is when debug is in use), and each recursive call increases the number. (The actual number is the number of ‘contexts’ on the context stack: this is usually 2 for the outer level of browsing and 1 when examining dumps in debugger.)
This is a primitive function but does argument matching in the standard way.
Because the browser prompt is implemented using the restart and condition handling mechanism, it prevents error handlers set up before the breakpoint from being called or invoked. The implementation follows this model:
repeat withRestarts(
  withCallingHandlers(
    readEvalPrint(),
    error = function(cnd) {
      cat("Error:", conditionMessage(cnd), "\n")
      invokeRestart("browser")
    }
  ),
  browser = function(...) NULL
)

readEvalPrint <- function(env = parent.frame()) {
  print(eval(parse(prompt = "Browse[n]> "), env))
}
The restart invocation interrupts the lookup for condition handlers and transfers control to the next iteration of the debugger REPL.
Note that condition handlers for other classes (such as "warning") are still called and may cause a non-local transfer of control out of the debugger.
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
Chambers, J. M. (1998) Programming with Data. A Guide to the S Language. Springer.
debug, and traceback for the stack on error.
browserText for how to retrieve the text and condition.
A call to browser can provide context by supplying either a text argument or a condition argument. These functions can be used to retrieve either of these arguments.
browserText(n = 1)
browserCondition(n = 1)
browserSetDebug(n = 1)
n |
The number of contexts to skip over; it must be non-negative. |
Each call to browser
can supply either a text string or a condition.
The functions browserText
and browserCondition
provide ways
to retrieve those values. Since there can be multiple browser contexts
active at any time we also support retrieving values from the different
contexts. The innermost (most recently initiated) browser context is
numbered 1: other contexts are numbered sequentially.
browserSetDebug
provides a mechanism for initiating the browser in
one of the calling functions. See sys.frame
for a more
complete discussion of the calling stack. To use browserSetDebug
you select some calling function, determine how far back it is in the call
stack and call browserSetDebug
with n
set to that value.
Then, by typing c
at the browser prompt you will cause evaluation
to continue, and provided there are no intervening calls to browser or
other interrupts, control will halt again once evaluation has returned to
the closure specified. This is similar to the up functionality in GDB
or the "step out" functionality in other debuggers.
browserText
returns the text, while browserCondition
returns the condition from the specified browser context.
browserSetDebug
returns NULL, invisibly.
It may be of interest to allow for querying further up the set of browser contexts and this functionality may be added at a later date.
R. Gentleman
Return the names of all the built-in objects. These are fetched directly from the symbol table of the R interpreter.
builtins(internal = FALSE)
internal |
a logical indicating whether only ‘internal’
functions (which can be called via |
builtins()
returns an unsorted list of the objects in the
symbol table, that is all the objects in the base environment.
These are the built-in objects plus any that have been added
subsequently when the base package was loaded. It is less confusing
to use ls(baseenv(), all.names = TRUE)
.
builtins(TRUE)
returns an unsorted list of the names of internal
functions, that is those which can be accessed as
.Internal(foo(args ...))
for foo in the list.
A character vector.
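As a small illustration (a sketch only; the particular names checked, "c" and "sum", are arbitrary choices), the two forms of the call and the suggested ls() alternative can be compared like this:

```r
all_builtins  <- builtins()                  # names from the symbol table
internal_only <- builtins(internal = TRUE)   # names usable via .Internal()

is.character(all_builtins)
c("c", "sum") %in% all_builtins
head(sort(internal_only))                    # the result is unsorted, so sort it

# The less confusing alternative mentioned above
base_names <- ls(baseenv(), all.names = TRUE)
"sum" %in% base_names
```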
Function by
is an object-oriented wrapper for
tapply
applied to data frames.
by(data, INDICES, FUN, ..., simplify = TRUE)
data |
an R object, normally a data frame, possibly a matrix. |
INDICES |
a factor or a list of factors, each of length
|
FUN |
a function to be applied to (usually data-frame) subsets of
|
... |
further arguments to |
simplify |
logical: see |
A data frame is split by row into data frames
subsetted by the values of one or more factors, and function
FUN
is applied to each subset in turn.
For the default method, an object with dimensions (e.g., a matrix) is
coerced to a data frame and the data frame method applied. Other
objects are also coerced to a data frame, but FUN
is applied
separately to (subsets of) each column of the data frame.
An object of class "by"
, giving the results for each subset.
This is always a list if simplify
is false, otherwise a list or
array (see tapply
).
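A minimal sketch of the two kinds of result, using a made-up data frame (scores, group and value are hypothetical names, not part of the help page):

```r
# Hypothetical toy data
scores <- data.frame(
  group = c("a", "a", "b", "b", "b"),
  value = c(1, 3, 10, 20, 30)
)

# FUN receives each row subset as a data frame
res <- by(scores, scores$group, function(d) mean(d$value))
class(res)        # "by"
as.numeric(res)   # group means in the order of the factor levels

# With simplify = FALSE the result is always a list
res_list <- by(scores, scores$group, function(d) mean(d$value),
               simplify = FALSE)
is.list(res_list)
```

Since FUN returns a scalar for every subset here, the simplified result is an array with one entry per group.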
tapply
, simplify2array
.
array2DF
to convert result to a data
frame. ave
also applies a function block-wise.
require(stats)
by(warpbreaks[, 1:2], warpbreaks[,"tension"], summary)
by(warpbreaks[, 1],   warpbreaks[, -1],       summary)
by(warpbreaks, warpbreaks[,"tension"],
   function(x) lm(breaks ~ wool, data = x))

## now suppose we want to extract the coefficients by group
tmp1 <- with(warpbreaks,
             by(warpbreaks, tension,
                function(x) lm(breaks ~ wool, data = x)))
sapply(tmp1, coef)

## another way
tmp2 <- by(warpbreaks, ~ tension, with, coef(lm(breaks ~ wool)))
array2DF(tmp2, simplify = TRUE)
This is a generic function which combines its arguments.
The default method combines its arguments to form a vector. All arguments are coerced to a common type which is the type of the returned value, and all attributes except names are removed.
## S3 Generic function
c(...)

## Default S3 method:
c(..., recursive = FALSE, use.names = TRUE)
... |
objects to be concatenated. All |
recursive |
logical. If |
use.names |
logical indicating if |
The output type is determined from the highest type of the components
in the hierarchy NULL < raw < logical < integer < double < complex < character
< list < expression. Pairlists are treated as lists, whereas non-vector
components (such as names / symbols and calls)
are treated as one-element lists
which cannot be unlisted even if recursive = TRUE.
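The hierarchy can be checked directly with typeof(). This is an illustrative sketch; the list combos and its element names are hypothetical.

```r
# Each entry combines two adjacent levels of the hierarchy
combos <- list(
  raw_logical      = c(as.raw(1), TRUE),
  logical_integer  = c(TRUE, 2L),
  integer_double   = c(2L, 2.5),
  double_complex   = c(2.5, 1i),
  double_character = c(2.5, "a"),
  double_list      = c(2.5, list("a"))
)
result_types <- vapply(combos, typeof, character(1))
result_types

# A symbol is a non-vector component, so it enters the result as a
# one-element list and the whole result becomes a list
mixed <- c(quote(x), 1)
typeof(mixed)
length(mixed)
```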
If the output type is complex, then logical, integer, and double NAs
keep their imaginary parts zero when coerced, and hence will
not become NA_complex_ (with imaginary part NA).
There is a c.factor
method which combines factors into
a factor.
c
is sometimes used for its side effect of removing attributes
except names, for example to turn an array
into a vector.
as.vector
is a more intuitive way to do this, but also drops
names. Note that c
methods other than the default are not required
to remove attributes (and they will almost certainly preserve a class attribute).
This is a primitive function.
NULL
or an expression or a vector of an appropriate mode.
(With no arguments the value is NULL
.)
This function is S4 generic, but with argument list
(x, ...)
.
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
unlist
and as.vector
to produce
attribute-free vectors.
c(1, 7:9)
c(1:5, 10.5, "next")

## uses with a single argument to drop attributes
x <- 1:4
names(x) <- letters[1:4]
x
c(x)          # has names
as.vector(x)  # no names
dim(x) <- c(2,2)
x
c(x)
as.vector(x)

## append to a list:
ll <- list(A = 1, c = "C")
## do *not* use
c(ll, d = 1:3) # which is == c(ll, as.list(c(d = 1:3)))
## but rather
c(ll, d = list(1:3))  # c() combining two lists

## descend through lists:
c(list(A = c(B = 1)), recursive = TRUE)
c(list(A = c(B = 1, C = 2), B = c(E = 7)), recursive = TRUE)
Create or test for objects of mode
"call"
(or
"("
, see Details).
call(name, ...)
is.call(x)
as.call(x)
name |
a non-empty character string naming the function to be called. |
... |
arguments to be part of the call. |
x |
an arbitrary R object. |
call
returns an unevaluated function call, that is, an
unevaluated expression which consists of the named function applied to
the given arguments (name
must be a string which gives
the name of a function to be called). Note that although the call is
unevaluated, the arguments ...
are evaluated.
call
is a primitive, so the first argument is
taken as name
and the remaining arguments as arguments for the
constructed call: if the first argument is named the name must
partially match name
.
is.call
is used to determine whether x
is a call (i.e.,
of mode
"call"
or "("
). Note that
is.call(x)
is strictly equivalent to
typeof(x) == "language"
.
is.language() is also true for calls (but also
for symbols and expressions where
is.call() is false).
When is.call(cl)
is true, class(cl)
typically returns "call"
, except when cl
is one of
if
, for
, while
, (
, {
, <-
, =
,
which each has its own class(cl)
(equal to the
“function” name), see the ‘Special calls’ example.
as.call(x)
: Objects of mode "list"
can be coerced to mode "call"
.
The first element of the list becomes the function part of the call,
so should be a function or the name of one (as a symbol; a character string will not do).
If you think of using as.call(string)
, consider using
str2lang(string)
which is an efficient version of
parse(text=string)
.
Note that call()
and as.call()
, when
applicable, are much preferable to these parse()
based
approaches.
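The three routes to the same call can be compared side by side. This is a sketch for illustration; the variable names are arbitrary.

```r
cl1 <- call("round", 10.5)                      # function given by name
cl2 <- as.call(list(as.name("round"), 10.5))    # list with a symbol first
cl3 <- str2lang("round(10.5)")                  # parsed from a string

identical(cl1, cl2)
identical(cl1, cl3)
eval(cl1)

# A character string in the function position will not do for as.call():
# the resulting object is a call, but evaluating it fails
bad <- as.call(list("round", 10.5))
bad_fails <- inherits(try(eval(bad), silent = TRUE), "try-error")
bad_fails
```

When the pieces of the call are already available as R objects, call() and as.call() avoid building and re-parsing a string.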
All three are primitive functions.
as.call
is generic: you can write methods to handle specific
classes of objects, see InternalMethods.
call
should not be used to attempt to evade restrictions on the
use of .Internal
and other non-API calls.
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
do.call
for calling a function by name and argument
list;
Recall
for recursive calling of functions;
further
is.language
,
expression
,
function
.
Producing calls etc from character: str2lang and
parse.
is.call(call) #-> FALSE: Functions are NOT calls

## set up a function call to round with argument 10.5
cl <- call("round", 10.5)
is.call(cl) # TRUE
cl
identical(quote(round(10.5)), # <- less functional, but the same
          cl) # TRUE
## such a call can also be evaluated.
eval(cl) # [1] 10

class(cl) # "call"
typeof(cl)# "language"
is.call(cl) && is.language(cl) # always TRUE for "call"s

A <- 10.5
call("round", A)        # round(10.5)
call("round", quote(A)) # round(A)
f <- "round"
call(f, quote(A))       # round(A)
## if we want to supply a function we need to use as.call or similar
f <- round
## Not run: call(f, quote(A))  # error: first arg must be character
(g <- as.call(list(f, quote(A))))
eval(g)
## alternatively but less transparently
g <- list(f, quote(A))
mode(g) <- "call"
g
eval(g)

## Special calls (and some regular ones):
L <- as.list(E <- setNames( , c("if", "for", "while", "repeat", "function",
                                "(", "{", "[", "<-", "<<-", "->", "=")))
for(i in seq_along(L)) L[[i]] <- call(E[[i]]) # instead of lapply(E, call) ..
list_ <- function (...) `names<-`(list(...), vapply(sys.call()[-1L], as.character, ""))
(Tab <- noquote(sapply(list_(is.call, typeof, class, mode),
                       \(F) sapply(L, F))))
## The 7 exceptions:
Tab[ Tab[,"class"] != "call" , c(3:4, 1:2)]
## see also the examples in the help for do.call
A downward-only version of Scheme's call with current continuation.
callCC(fun)
fun |
function of one argument, the exit procedure. |
callCC
provides a non-local exit mechanism that can be useful
for early termination of a computation. callCC
calls
fun
with one argument, an exit function. The exit
function takes a single argument, the intended return value. If the
body of fun
calls the exit function then the call to
callCC
immediately returns, with the value supplied to the exit
function as the value returned by callCC
.
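A typical use is abandoning a nested traversal as soon as an answer is known. The sketch below assumes a hypothetical helper, first_negative, written only for this illustration.

```r
# Hypothetical helper: return the first negative element found in a list of
# numeric vectors, leaving the traversal as soon as one is seen.
first_negative <- function(groups) {
  callCC(function(exit) {
    lapply(groups, function(g) {
      for (x in g) if (x < 0) exit(x)   # non-local exit out of lapply()
    })
    NULL  # reached only if no negative value was found
  })
}

first_negative(list(c(3, 1), c(4, -2, 5), c(-7)))
first_negative(list(1:3, 4:6))
```

The exit function is called from inside the function passed to lapply(), yet the whole callCC() call returns immediately with the value supplied.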
Luke Tierney
# The following all return the value 1
callCC(function(k) 1)
callCC(function(k) k(1))
callCC(function(k) {k(1); 2})
callCC(function(k) repeat k(1))
Functions to pass R objects to compiled C/C++ code that has been loaded into R.
.Call(.NAME, ..., PACKAGE)
.External(.NAME, ..., PACKAGE)
.NAME |
a character string giving the name of a C function,
or an object of class |
... |
arguments to be passed to the compiled code. Up to 65 for
|
PACKAGE |
if supplied, confine the search for a character string
This argument follows This is intended to add safety for packages, which can ensure by using this argument that no other package can override their external symbols, and also speeds up the search (see ‘Note’). |
The functions are used to call compiled code which makes use of internal R objects, passing the arguments to the code as a sequence of R objects. They assume C calling conventions, so can usually also be used for C++ code.
For details about how to write code to use with these functions see
the chapter on ‘System and foreign language interfaces’ in
the ‘Writing R Extensions’ manual. They differ in the way the
arguments are passed to the C code: .External
allows for a
variable or unlimited number of arguments.
These functions are primitive, and .NAME
is always
matched to the first argument supplied (which should not be named).
For clarity, avoid using names in the arguments passed to ...
that match or partially match .NAME
.
An R object constructed in the compiled code.
Writing code for use with these functions will need to use internal R structures defined in ‘Rinternals.h’ and/or the macros in ‘Rdefines.h’.
If one of these functions is to be used frequently, do specify
PACKAGE
(to confine the search to a single DLL) or pass
.NAME
as one of the native symbol objects. Searching for
symbols can take a long time, especially when many namespaces are loaded.
You may see PACKAGE = "base"
for symbols linked into R. Do
not use this in your own code: such symbols are not part of the API
and may be changed without warning.
PACKAGE = ""
used to be accepted (but was undocumented): it is
now an error.
Chambers, J. M. (1998)
Programming with Data. A Guide to the S Language.
Springer. (.Call
.)
The ‘Writing R Extensions’ manual.
Report on the optional features which have been compiled into this build of R.
capabilities(what = NULL,
             Xchk = any(nas %in% c("X11", "jpeg", "png", "tiff")))
what |
character vector or |
Xchk |
|
A named logical vector. Current components are
jpeg |
is the |
png |
is the |
tiff |
is the |
tcltk |
is the tcltk package operational?
Note that to make use of Tk you will almost always need to check
that |
X11 |
are the |
aqua |
is the Note that this is distinct from |
http/ftp |
does the default method for |
sockets |
are |
libxml |
is there support for integrating |
fifo |
are FIFO connections supported? |
cledit |
is command-line editing available in the current R
session? This is false in non-interactive sessions.
It will be true for the command-line interface if |
iconv |
is internationalization conversion via
|
NLS |
is there Natural Language Support (for message translations)? |
Rprof |
is there support for |
profmem |
is there support for memory profiling? See
|
cairo |
is there support for the |
ICU |
is ICU available for collation? See the help on
Comparison and |
long.double |
does this build use a Although not guaranteed, it is a reasonable assumption that if
present long doubles will have at least as much range and accuracy
as the ISO/IEC 60559 80-bit ‘extended precision’ format. Since
R 4.0.0 |
libcurl |
is |
Capabilities "jpeg"
, "png"
and "tiff"
refer to
the X11-based versions of these devices. If
capabilities("aqua")
is true, then these devices with
type = "quartz"
will be available, and out-of-the-box will be the
default type. Thus for example the tiff
device will be
available if capabilities("aqua") || capabilities("tiff")
if
the defaults are unchanged.
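In scripts, capabilities are usually queried by name and used to guard optional functionality. A short sketch follows; the particular capabilities queried are arbitrary picks from the components listed above.

```r
# Query a few capabilities by name; the result is a named logical vector
caps <- capabilities(c("fifo", "iconv", "libcurl"))
caps

# Guard optional functionality instead of assuming it is present
if (capabilities("libcurl")) {
  message("libcurl is available in this build")
}

# The rule stated above for the tiff device, assuming default device types
tiff_ok <- capabilities("aqua") || capabilities("tiff")
tiff_ok
```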
.Platform
, extSoftVersion
, and
grSoftVersion
(and links there)
for availability of capabilities external to R but
used from R functions.
capabilities()

if(!capabilities("ICU"))
   warning("ICU is not available")

## Does not call the internal X11-checking function:
capabilities(Xchk = FALSE)

## See also the examples for 'connections'.
Outputs the objects, concatenating the representations. cat
performs much less conversion than print
.
cat(... , file = "", sep = " ", fill = FALSE, labels = NULL,
    append = FALSE)
... |
R objects (see ‘Details’ for the types of objects allowed). |
file |
a connection, or a character string naming the file
to print to. If |
sep |
a character vector of strings to append after each element. |
fill |
a logical or (positive) numeric controlling how the output is
broken into successive lines. If |
labels |
character vector of labels for the lines printed.
Ignored if |
append |
logical. Only used if the argument |
cat
is useful for producing output in user-defined functions.
It converts its arguments to character vectors, concatenates
them to a single character vector, appends the given sep =
string(s) to each element and then outputs them.
No line feeds (aka “newline”s) are output unless explicitly
requested by ‘"\n"’
or if generated by filling (if argument fill
is TRUE
or
numeric).
If file
is a connection and open for writing it is written from
its current position. If it is not open, it is opened for the
duration of the call in "wt"
mode and then closed again.
Currently only atomic vectors and names are handled,
together with NULL
and other zero-length objects (which produce
no output). Character strings are output ‘as is’ (unlike
print.default
which escapes non-printable characters and
backslash — use encodeString
if you want to output
encoded strings using cat
). Other types of R object should be
converted (e.g., by as.character
or format
)
before being passed to cat
. That includes factors, which are
output as integer vectors.
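The factor case is a common surprise. The sketch below uses capture.output() from package utils only so that the text written by cat can be inspected; the variable names are illustrative.

```r
f <- factor(c("lo", "hi", "lo"))

# capture.output() collects what cat() writes
codes  <- capture.output(cat(f, "\n"))                 # integer codes
labels <- capture.output(cat(as.character(f), "\n"))   # level labels

codes
labels

# Strings are written as is; encodeString() gives the escaped form instead
cat("a\tb\n")
cat(encodeString("a\tb"), "\n")
```

Converting with as.character (or format) before the call is what makes the labels appear.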
cat
converts numeric/complex elements in the same way as
print
(and not in the same way as as.character
which is used by the S equivalent), so options
"digits"
and "scipen"
are relevant. However, it uses
the minimum field width necessary for each element, rather than the
same field width for all elements.
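A brief sketch of the difference from as.character and of the per-element field width (capture.output() from package utils is used only to record the output):

```r
old <- options(digits = 3)
short <- capture.output(cat(pi, "\n"))   # formatted as print() would
options(old)                              # restore the previous setting

as_chr <- as.character(pi)                # not affected by 'digits'

# Each element gets its own minimal width
widths <- capture.output(cat(c(1, 10, 100), "\n"))

short
as_chr
widths
```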
None (invisible NULL
).
If any element of sep
contains a newline character, it is
treated as a vector of terminators rather than separators, an element
being output after every vector element and a newline after the
last. Entries are recycled as needed.
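The separator and terminator behaviours can be compared as follows. This is a sketch; capture.output() from package utils splits the captured text into lines, which makes the line structure visible.

```r
# Plain separator: placed between elements, nothing after the last one
sep_out <- capture.output(cat(1:3, sep = ", "))

# Separator containing a newline: each element ends its own line
term_out <- capture.output(cat(1:3, sep = "\n"))

# 'sep' is recycled, here alternating a space and a newline
pair_out <- capture.output(cat(1:4, sep = c(" ", "\n")))

sep_out
term_out
pair_out
```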
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
print
, format
, and paste
which concatenates into a string.
iter <- stats::rpois(1, lambda = 10)
## print an informative message
cat("iteration = ", iter <- iter + 1, "\n")

## 'fill' and label lines:
cat(paste(letters, 100* 1:26), fill = TRUE, labels = paste0("{", 1:10, "}:"))
Take a sequence of vector, matrix or data-frame arguments and combine by columns or rows, respectively. These are generic functions with methods for other R classes.
cbind(..., deparse.level = 1)
rbind(..., deparse.level = 1)
## S3 method for class 'data.frame'
rbind(..., deparse.level = 1, make.row.names = TRUE,
      stringsAsFactors = FALSE, factor.exclude = TRUE)
... |
(generalized) vectors or matrices. These can be given as named
arguments. Other R objects may be coerced as appropriate, or S4
methods may be used: see sections ‘Details’ and
‘Value’. (For the |
deparse.level |
integer controlling the construction of labels in
the case of non-matrix-like arguments (for the default method): |
make.row.names |
(only for data frame method:) logical
indicating if unique and valid |
stringsAsFactors |
logical, passed to |
factor.exclude |
if the data frames contain factors, the default
|
The functions cbind
and rbind
are S3 generic, with
methods for data frames. The data frame method will be used if at
least one argument is a data frame and the rest are vectors or
matrices. There can be other methods; in particular, there is one for
time series objects. See the section on ‘Dispatch’ for how
the method to be used is selected. If some of the arguments are of an
S4 class, i.e., isS4(.)
is true, S4 methods are sought
also, and the hidden cbind / rbind functions
from package methods may be called, which in turn build on
cbind2 or rbind2, respectively. In that
case, deparse.level is obeyed, similarly to the default method.
In the default method, all the vectors/matrices must be atomic (see
vector
) or lists. Expressions are not allowed.
Language objects (such as formulae and calls) and pairlists will be
coerced to lists: other objects (such as names and external pointers)
will be included as elements in a list result. Any classes the inputs
might have are discarded (in particular, factors are replaced by their
internal codes).
If there are several matrix arguments, they must all have the same
number of columns (or rows) and this will be the number of columns (or
rows) of the result. If all the arguments are vectors, the number of
columns (rows) in the result is equal to the length of the longest
vector. Values in shorter arguments are recycled to achieve this
length (with a warning
if they are recycled only
fractionally).
When the arguments consist of a mix of matrices and vectors the number of columns (rows) of the result is determined by the number of columns (rows) of the matrix arguments. Any vectors have their values recycled or subsetted to achieve this length.
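Both rules are easy to check on small inputs (an illustrative sketch; m1 and m2 are arbitrary names):

```r
# Vectors only: the shorter one is recycled to the longest length
m1 <- cbind(1, 1:3)
dim(m1)

# Matrix plus vector: the matrix fixes the number of rows, so the vector
# 1:7 is cut down to its first three values (a warning is given)
m2 <- suppressWarnings(cbind(1:7, diag(3)))
dim(m2)
m2[, 1]
```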
For cbind
(rbind
), vectors of zero length (including
NULL
) are ignored unless the result would have zero rows
(columns), for S compatibility.
(Zero-extent matrices do not occur in S3 and are not ignored in R.)
Matrices are restricted to less than 2^31 rows and
columns even on 64-bit systems. So input vectors have the same length
restriction: as from R 3.2.0 input matrices with more elements (but
meeting the row and column restrictions) are allowed.
For the default method, a matrix combining the ...
arguments
column-wise or row-wise. (Exception: if there are no inputs or all
the inputs are NULL
, the value is NULL
.)
The type of a matrix result is determined from the highest type of any of the inputs in the hierarchy raw < logical < integer < double < complex < character < list.
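A short sketch of the type rule, including the factor case mentioned in ‘Details’ (the names f and fm are illustrative):

```r
t1 <- typeof(cbind(1L, TRUE))    # integer beats logical
t2 <- typeof(cbind(1L, 2.5))     # double beats integer
t3 <- typeof(cbind(2.5, "a"))    # character beats double
c(t1, t2, t3)

# Classes are discarded by the default method: a factor contributes its
# internal integer codes, not its labels
f  <- factor(c("b", "a", "b"))
fm <- cbind(f, n = 1:3)
fm
typeof(fm)
```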
For cbind
(rbind
) the column (row) names are taken from
the colnames
(rownames
) of the arguments if these are
matrix-like. Otherwise from the names of the arguments or where those
are not supplied and deparse.level > 0
, by deparsing the
expressions given, for deparse.level = 1
only if that gives a
sensible name (a ‘symbol’, see is.symbol
).
For cbind
row names are taken from the first argument with
appropriate names: rownames for a matrix, or names for a vector of
length the number of rows of the result.
For rbind
column names are taken from the first argument with
appropriate names: colnames for a matrix, or names for a vector of
length the number of columns of the result.
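The naming rules can be seen with two small vectors. This is a sketch; a, b and the result names are arbitrary.

```r
a <- 1:2
b <- 3:4

lab0 <- colnames(cbind(a, b + 0, deparse.level = 0))  # no labels constructed
lab1 <- colnames(cbind(a, b + 0, deparse.level = 1))  # only the symbol 'a'
lab2 <- colnames(cbind(a, b + 0, deparse.level = 2))  # 'b + 0' deparsed too

# Names given to vector arguments are used as labels
named <- colnames(cbind(first = a, b))

# Row names come from the first argument with suitable names
rn <- rownames(cbind(a, c(x = 10, y = 20)))

lab0; lab1; lab2; named; rn
```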
The cbind
data frame method is just a wrapper for
data.frame(..., check.names = FALSE)
. This means that
it will split matrix columns in data frame arguments, and convert
character columns to factors unless stringsAsFactors = FALSE
is
specified.
The rbind data frame method first drops all zero-column and
zero-row arguments. (If that leaves none, it returns the first
argument with columns, otherwise a zero-column zero-row data frame.)
It then takes the classes of the columns from the first data frame,
and matches columns by name (rather than by position). Factors have
their levels expanded as necessary (in the order of the levels of the
level sets of the factors encountered) and the result is an ordered
factor if and only if all the components were ordered factors.
Old-style categories (integer vectors with levels) are promoted to
factors.
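Matching by name and the expansion of factor levels can be seen in a small sketch (d1, d2 and their columns are made-up data):

```r
d1 <- data.frame(id = 1:2, grp = factor(c("a", "b")))
d2 <- data.frame(grp = factor("c"), id = 3L)   # same columns, other order

res <- rbind(d1, d2)
names(res)            # column order follows the first data frame
res$id                # values matched by column name, not position
levels(res$grp)       # levels expanded to cover the new value
is.ordered(res$grp)   # not ordered, since the inputs were not
```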
Note that for result column j
, factor(., exclude = X(j))
is applied, where
X(j) := if(isTRUE(factor.exclude)) {
           if(!NA.lev[j]) NA # else NULL
        } else factor.exclude
where NA.lev[j]
is true iff any contributing data frame has had a
factor
in column j
with an explicit NA
level.
The method dispatching is not done via
UseMethod()
, but by C-internal dispatching.
Therefore there is no need for, e.g., rbind.default
.
The dispatch algorithm is described in the source file (‘.../src/main/bind.c’) as
1. For each argument we get the list of possible class memberships from the class attribute.
2. We inspect each class in turn to see if there is an applicable method.
3. If we find a method, we use it. Otherwise, if there was an S4 object among the arguments, we try S4 dispatch; otherwise, we use the default code.
If you want to combine other objects with data frames, it may be necessary to coerce them to data frames first. (Note that this algorithm can result in calling the data frame method if all the arguments are either data frames or vectors, and this will result in the coercion of character vectors to factors.)
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
c
to combine vectors (and lists) as vectors,
data.frame
to combine vectors and matrices as a data
frame.
m <- cbind(1, 1:7) # the '1' (= shorter vector) is recycled
m
m <- cbind(m, 8:14)[, c(1, 3, 2)] # insert a column
m
cbind(1:7, diag(3)) # vector is subset -> warning

cbind(0, rbind(1, 1:3))
cbind(I = 0, X = rbind(a = 1, b = 1:3))  # use some names
xx <- data.frame(I = rep(0,2))
cbind(xx, X = rbind(a = 1, b = 1:3))   # named differently

cbind(0, matrix(1, nrow = 0, ncol = 4)) #> Warning (making sense)
dim(cbind(0, matrix(1, nrow = 2, ncol = 0))) #-> 2 x 1

## deparse.level
dd <- 10
rbind(1:4, c = 2, "a++" = 10, dd, deparse.level = 0) # middle 2 rownames
rbind(1:4, c = 2, "a++" = 10, dd, deparse.level = 1) # 3 rownames (default)
rbind(1:4, c = 2, "a++" = 10, dd, deparse.level = 2) # 4 rownames

## cheap row names:
b0 <- gl(3,4, labels=letters[1:3])
bf <- setNames(b0, paste0("o", seq_along(b0)))
df  <- data.frame(a = 1, B = b0, f = gl(4,3))
df. <- data.frame(a = 1, B = bf, f = gl(4,3))
new <- data.frame(a = 8, B ="B", f = "1")
(df1  <- rbind(df , new))
(df.1 <- rbind(df., new))
stopifnot(identical(df1, rbind(df,  new, make.row.names=FALSE)),
          identical(df1, rbind(df., new, make.row.names=FALSE)))
Seeks a unique match of its first argument among the elements of its second. If successful, it returns this element; otherwise, it performs an action specified by the third argument.
char.expand(input, target, nomatch = stop("no match"))
input |
a character string to be expanded. |
target |
a character vector with the values to be matched against. |
nomatch |
an R expression to be evaluated in case expansion was not possible. |
This function is particularly useful when abbreviations are allowed in function arguments, and need to be uniquely expanded with respect to a target table of possible values.
A length-one character vector, one of the elements of target
(unless nomatch
is changed to be a non-error, when it can be a
zero-length character string).
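A sketch of the intended use inside a function whose argument may be abbreviated. The function centre and its type argument are hypothetical, written only for this illustration.

```r
# Hypothetical function that lets callers abbreviate its 'type' argument
centre <- function(x, type = "mean") {
  type <- char.expand(type, c("mean", "median"))
  switch(type,
         mean   = mean(x),
         median = stats::median(x))
}

centre(c(1, 2, 10), "med")   # unique abbreviation of "median"

# "me" matches both targets, so the default 'nomatch' signals an error
ambiguous <- tryCatch(centre(c(1, 2, 10), "me"),
                      error = function(e) conditionMessage(e))
ambiguous
```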
charmatch
and pmatch
for performing
partial string matching.
locPars <- c("mean", "median", "mode")
char.expand("me", locPars, warning("Could not expand!"))
char.expand("mo", locPars)
Create or test for objects of type "character"
.
character(length = 0)
as.character(x, ...)
is.character(x)
length |
a non-negative integer specifying the desired length. Double values will be coerced to integer: supplying an argument of length other than one is an error. |
x |
object to be coerced or tested. |
... |
further arguments passed to or from other methods. |
as.character and is.character are generic: you can
write methods to handle specific classes of objects,
see InternalMethods. Further, for as.character the
default method calls as.vector, so dispatch occurs only if
is.object(x) is true, first on
methods for as.character and then on methods for as.vector.
as.character
represents real and complex numbers to 15 significant
digits (technically the compiler's setting of the ISO C constant
DBL_DIG
, which will be 15 on machines supporting IEC 60559
arithmetic according to the C99 standard). This ensures that all the
digits in the result will be reliable (and not the result of
representation error), but does mean that conversion to character and
back to numeric may change the number. If you want to convert numbers
to character with the maximum possible precision, use
format
.
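The round-trip caveat can be made concrete with a value that is not exactly representable. This is a sketch; the choice of 17 digits for format is an assumption suited to IEC 60559 doubles.

```r
x <- 0.1 + 0.2

chr <- as.character(x)                 # 15 significant digits
chr
round_trip_equal <- as.numeric(chr) == x
round_trip_equal                       # the round trip changed the value

# With enough digits requested from format(), the value survives
y <- as.numeric(format(x, digits = 17))
identical(x, y)
```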
character
creates a character vector of the specified length.
The elements of the vector are all equal to ""
.
as.character
attempts to coerce its argument to character type;
like as.vector
it strips attributes including names.
For lists and pairlists (including language objects such as
calls) it deparses the elements individually, except that it extracts
the first element of length-one character vectors, see the Abc
example.
is.character
returns TRUE
or FALSE
depending on
whether its argument is of character type or not.
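The three functions side by side, as a minimal sketch with arbitrary variable names:

```r
blank <- character(3)
blank                       # three empty strings

v <- c(a = 1.5, b = 2)
s <- as.character(v)
s
names(s)                    # NULL: attributes, including names, are dropped

is.character(s)
is.character(v)
```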
as.character
breaks lines in language objects at 500
characters, and inserts newlines. Prior to 2.15.0 lines were
truncated.
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
options: options scipen and OutDec affect the conversion of numbers.
paste, substr and strsplit for character concatenation and splitting, chartr for character translation and case folding (e.g., upper to lower case) and sub, grep etc for string matching and substitutions. Note that help.search(keyword = "character") gives even more links.
deparse, which is normally preferable to as.character for language objects.
Quotes on how to specify character / string constants, including raw ones.
form <- y ~ a + b + c
as.character(form)  ## length 3
deparse(form)       ## like the input

a0 <- 11/999          # has a repeating decimal representation
(a1 <- as.character(a0))
format(a0, digits = 16) # shows 1 to 2 more digit(s)
a2 <- as.numeric(a1)
a2 - a0               # normally around -1e-17
as.character(a2)      # possibly different from a1
print(c(a0, a2), digits = 16)

as.character(list(A = "Abc", xy = c("x", "y"))) # "Abc" "c(\"x\", \"y\")"
## i.e., "Abc" directly instead of deparsing to "\"Abc\""
charmatch seeks matches for the elements of its first argument among those of its second.
charmatch(x, table, nomatch = NA_integer_)
x |
the values to be matched: converted to a character vector by
|
table |
the values to be matched against: converted to a character vector. Long vectors are not supported. |
nomatch |
the (integer) value to be returned at non-matching positions. |
Exact matches are preferred to partial matches (those where the value to be matched has an exact match to the initial part of the target, but the target is longer).
If there is a single exact match or no exact match and a unique partial match then the index of the matching value is returned; if multiple exact or multiple partial matches are found then 0 is returned and if no match is found then nomatch is returned.
NA values are treated as the string constant "NA".
An integer vector of the same length as x, giving the indices of the elements in table which matched, or nomatch.
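The matching rules can be sketched as follows; the lookup table tab is made up for illustration.

```r
tab <- c("mean", "median", "meaning")
charmatch("mean", tab)                 # 1: the exact match wins over "meaning"
charmatch("med", tab)                  # 2: unique partial match
charmatch("me", tab)                   # 0: several partial matches
charmatch("x", tab)                    # NA: no match at all
charmatch("x", tab, nomatch = -1L)     # -1: user-supplied nomatch value
res <- charmatch(c("mean", "med", "me", "x"), tab)   # vectorised over x
```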
This function is based on a C function written by Terry Therneau.
startsWith for another matching of initial parts of strings; grep or regexpr for more general (regexp) matching of strings.
charmatch("", "")                             # returns 1
charmatch("m",   c("mean", "median", "mode")) # returns 0
charmatch("med", c("mean", "median", "mode")) # returns 2
Translate characters in character vectors, in particular from upper to lower case or vice versa.
chartr(old, new, x)
tolower(x)
toupper(x)
casefold(x, upper = FALSE)
x |
a character vector, or an object that can be coerced to
character by |
old |
a character string specifying the characters to be translated. If a character vector of length 2 or more is supplied, the first element is used with a warning. |
new |
a character string specifying the translations. If a character vector of length 2 or more is supplied, the first element is used with a warning. |
upper |
logical: translate to upper or lower case? |
chartr translates each character in x that is specified in old to the corresponding character specified in new. Ranges are supported in the specifications, but character classes and repeated characters are not. If old contains more characters than new, an error is signaled; if it contains fewer characters, the extra characters at the end of new are ignored.
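A small sketch of these translation rules, using an arbitrary input string:

```r
x <- "banana bread"
swapped <- chartr("abn", "ABN", x)     # a -> A, b -> B, n -> N
ranged  <- chartr("a-e", "A-E", x)     # the range a-e expands to a, b, c, d, e
extra   <- chartr("a", "AZ", x)        # the surplus "Z" in 'new' is ignored
# 'old' longer than 'new' is an error; catch it to show the outcome
too_long <- tryCatch(chartr("abc", "x", x), error = function(e) "error")
```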
tolower and toupper convert upper-case characters in a character vector to lower-case, or vice versa. Non-alphabetic characters are left unchanged. More than one character can be mapped to a single upper-case character.
casefold is a wrapper for tolower and toupper originally written for compatibility with S-PLUS.
A character vector of the same length and with the same attributes as x (after possible coercion).
Elements of the result will have the encoding declared as that of the current locale (see Encoding) if the corresponding input had a declared encoding and the current locale is either Latin-1 or UTF-8. The result will be in the current locale's encoding unless the corresponding input was in UTF-8 or Latin-1, when it will be in UTF-8.
These functions are platform-dependent, usually using OS services. The latter can be quite deficient, for example only covering ASCII characters in 8-bit locales. The definition of ‘alphabetic’ is platform-dependent and liable to change over time as most platforms are based on the frequently-updated Unicode tables.
sub and gsub for other substitutions in strings.
x <- "MiXeD cAsE 123"
chartr("iXs", "why", x)
chartr("a-cX", "D-Fw", x)
tolower(x)
toupper(x)

## "Mixed Case" Capitalizing - toupper( every first letter of a word ) :

.simpleCap <- function(x) {
    s <- strsplit(x, " ")[[1]]
    paste(toupper(substring(s, 1, 1)), substring(s, 2),
          sep = "", collapse = " ")
}
.simpleCap("the quick red fox jumps over the lazy brown dog")
## ->  [1] "The Quick Red Fox Jumps Over The Lazy Brown Dog"

## and the better, more sophisticated version:
capwords <- function(s, strict = FALSE) {
    cap <- function(s) paste(toupper(substring(s, 1, 1)),
                  {s <- substring(s, 2); if(strict) tolower(s) else s},
                             sep = "", collapse = " " )
    sapply(strsplit(s, split = " "), cap, USE.NAMES = !is.null(names(s)))
}
capwords(c("using AIC for model selection"))
## ->  [1] "Using AIC For Model Selection"
capwords(c("using AIC", "for MODEL selection"), strict = TRUE)
## ->  [1] "Using Aic"  "For Model Selection"
##                ^^^        ^^^^^
##               'bad'       'good'

## -- Very simple insecure crypto --
rot <- function(ch, k = 13) {
   p0 <- function(...) paste(c(...), collapse = "")
   A <- c(letters, LETTERS, " '")
   I <- seq_len(k); chartr(p0(A), p0(c(A[-I], A[I])), ch)
}

pw <- "my secret pass phrase"
(crypw <- rot(pw, 13)) #-> you can send this off

## now ``decrypt'' :
rot(crypw, 54 - 13) # -> the original:
stopifnot(identical(pw, rot(crypw, 54 - 13)))
Warn about extraneous arguments in the ... of its caller. A utility to be used e.g., in S3 methods which need a formal ... argument but do not make any use of it. This helps catch user errors in calling the function in question (which is the caller of chkDots()).
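The intended use can be sketched with a small S3 method. The generic area and the list sq are hypothetical names invented for this illustration; only chkDots() itself comes from base R.

```r
# 'area' is a hypothetical generic, used only for illustration.
area <- function(shape, ...) UseMethod("area")
area.default <- function(shape, ...) {
    chkDots(...)                 # warn if the caller passed unused arguments
    shape$w * shape$h
}

sq <- list(w = 2, h = 3)
ok <- area(sq)                   # no extra arguments, so no warning

# A misspelled argument lands in '...'; chkDots() turns it into a warning,
# which is captured here so that its message can be inspected.
msg <- tryCatch(area(sq, hieght = 4),
                warning = function(w) conditionMessage(w))
```

Without the chkDots() call the misspelled argument would be silently ignored.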
chkDots(..., which.call = -1, allowed = character(0))
... |
“the dots”, as passed from the caller. |
which.call |
passed to |
allowed |
not yet implemented: character vector of named
elements in |
Martin Maechler, first version outside base, June 2012.
seq.default ## <- you will see  ' chkDots(...) '

seq(1,5, foo = "bar") # gives warning via chkDots()

## warning with more than one ...-entry:
density.f <- function(x, ...) NextMethod("density")
x <- density(structure(rnorm(10), class="f"), bar=TRUE, baz=TRUE)
Compute the Cholesky factorization of a real symmetric positive-definite square matrix.
chol(x, ...)

## Default S3 method:
chol(x, pivot = FALSE, LINPACK = FALSE, tol = -1, ...)
x |
an object for which a method exists. The default method applies to numeric (or logical) symmetric, positive-definite matrices. |
... |
arguments to be passed to or from methods. |
pivot |
logical: should pivoting be used? |
LINPACK |
logical. Defunct and gives an error. |
tol |
a numeric tolerance for use with |
chol is generic: the description here applies to the default method.
Note that only the upper triangular part of x is used, so that $R^\top R = x$ (where $R$ is the returned factor) when x is symmetric.
If pivot = FALSE and x is not non-negative definite an error occurs. If x is positive semi-definite (i.e., some zero eigenvalues) an error will also occur as a numerical tolerance is used.
If pivot = TRUE, then the Cholesky decomposition of a positive semi-definite x can be computed. The rank of x is returned as attr(Q, "rank"), subject to numerical errors. The pivot is returned as attr(Q, "pivot"). It is no longer the case that t(Q) %*% Q equals x. However, setting pivot <- attr(Q, "pivot") and oo <- order(pivot), it is true that t(Q[, oo]) %*% Q[, oo] equals x, or, alternatively, t(Q) %*% Q equals x[pivot, pivot]. See the examples.
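The two identities for the pivoted factor can be checked numerically. This is a minimal sketch on a small positive-definite matrix chosen for illustration, so no rank-deficiency warning is expected.

```r
m <- matrix(c(2, 1, 1,
              1, 5, 1,
              1, 1, 3), 3, 3)             # symmetric, positive definite

R <- chol(m)                              # no pivoting
plain_ok <- all.equal(crossprod(R), m)    # t(R) %*% R reproduces m

Q <- chol(m, pivot = TRUE)
pivot <- attr(Q, "pivot")
oo <- order(pivot)
# Undo the pivoting by reordering the columns of Q ...
unpivot_ok <- all.equal(crossprod(Q[, oo]), m, check.attributes = FALSE)
# ... or compare with the symmetrically permuted matrix instead.
permuted_ok <- all.equal(crossprod(Q), m[pivot, pivot], check.attributes = FALSE)
```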
The value of tol is passed to LAPACK, with negative values selecting the default tolerance of (usually) nrow(x) * .Machine$double.neg.eps * max(diag(x)). The algorithm terminates once the pivot is less than tol.
Unsuccessful results from the underlying LAPACK code will result in an error giving a positive error code: these can only be interpreted by detailed study of the FORTRAN code.
The upper triangular factor of the Cholesky decomposition, i.e., the matrix $R$ such that $R^\top R = x$ (see example).
If pivoting is used, then two additional attributes "pivot" and "rank" are also returned.
The code does not check for symmetry.
If pivot = TRUE and x is not non-negative definite then there will be a warning message but a meaningless result will occur. So only use pivot = TRUE when x is non-negative definite by construction.
This is an interface to the LAPACK routines DPOTRF and DPSTRF.
LAPACK is from https://netlib.org/lapack/ and its guide is listed in the references.
Anderson, E. and ten others (1999) LAPACK Users' Guide. Third Edition. SIAM. Available on-line at https://netlib.org/lapack/lug/lapack_lug.html.
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
chol2inv for its inverse (without pivoting), backsolve for solving linear systems with upper triangular left sides.
qr, svd for related matrix factorizations.
( m <- matrix(c(5,1,1,3),2,2) )
( cm <- chol(m) )
t(cm) %*% cm  #-- = 'm'
crossprod(cm)  #-- = 'm'

# now for something positive semi-definite
x <- matrix(c(1:5, (1:5)^2), 5, 2)
x <- cbind(x, x[, 1] + 3*x[, 2])
colnames(x) <- letters[20:22]
m <- crossprod(x)
qr(m)$rank # is 2, as it should be

# chol() may fail, depending on numerical rounding:
# chol() unlike qr() does not use a tolerance.
try(chol(m))

(Q <- chol(m, pivot = TRUE))
## we can use this by
pivot <- attr(Q, "pivot")
crossprod(Q[, order(pivot)]) # recover m

## now for a non-positive-definite matrix
( m <- matrix(c(5,-5,-5,3), 2, 2) )
try(chol(m))  # fails
(Q <- chol(m, pivot = TRUE)) # warning
crossprod(Q)  # not equal to m
Invert a symmetric, positive definite square matrix from its Cholesky decomposition. Equivalently, compute $(X^\top X)^{-1}$ from the ($R$ part) of the QR decomposition of $X$.
chol2inv(x, size = NCOL(x), LINPACK = FALSE)
x |
a matrix. The first |
size |
the number of columns of |
LINPACK |
logical. Defunct and gives an error. |
The inverse of the matrix whose Cholesky decomposition was given.
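As a hedged sketch, the result can be compared with solve() on a small positive-definite matrix, and with the QR route on a small design matrix. Both matrices are made up for illustration.

```r
m <- matrix(c(4, 2,
              2, 3), 2, 2)                 # symmetric, positive definite
inv <- chol2inv(chol(m))
same_as_solve <- all.equal(inv, solve(m))
is_identity   <- all.equal(m %*% inv, diag(2))

# The same routine applied to the R factor of a QR decomposition of X
# gives the inverse of the cross-product matrix of X.
X <- cbind(1, 1:4)
via_qr <- all.equal(chol2inv(qr.R(qr(X))), solve(crossprod(X)),
                    check.attributes = FALSE)
```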
Unsuccessful results from the underlying LAPACK code will result in an error giving a positive error code: these can only be interpreted by detailed study of the FORTRAN code.
This is an interface to the LAPACK routine DPOTRI.
LAPACK is from https://netlib.org/lapack/ and its guide is listed in the references.
Anderson, E. and ten others (1999) LAPACK Users' Guide. Third Edition. SIAM. Available on-line at https://netlib.org/lapack/lug/lapack_lug.html.
Dongarra, J. J., Bunch, J. R., Moler, C. B. and Stewart, G. W. (1978) LINPACK Users Guide. Philadelphia: SIAM Publications.
cma <- chol(ma  <- cbind(1, 1:3, c(1,3,7)))
ma %*% chol2inv(cma)
chooseOpsMethod is a function called by the Ops Group Generic when two suitable methods are found for a given call. It determines which method to use for the operation based on the objects being dispatched.
The function is first called with reverse = FALSE, where x corresponds to the first argument and y to the second argument of the group generic call. If chooseOpsMethod() returns FALSE for x, then chooseOpsMethod is called again, with x and y swapped, mx and my swapped, and reverse = TRUE.
chooseOpsMethod(x, y, mx, my, cl, reverse)
x, y |
the objects being dispatched on by the group generic. |
mx, my |
the methods found for objects |
cl |
the call to the group generic. |
reverse |
logical value indicating whether |
This function must return either TRUE or FALSE. A value of TRUE indicates that method mx should be used.
# Create two objects with custom Ops methods
foo_obj <- structure(1, class = "foo")
bar_obj <- structure(1, class = "bar")
`+.foo` <- function(e1, e2) "foo"
Ops.bar <- function(e1, e2) "bar"

invisible(foo_obj + bar_obj) # Warning: Incompatible methods

chooseOpsMethod.bar <- function(x, y, mx, my, cl, reverse) TRUE

stopifnot(exprs = {
  identical(foo_obj + bar_obj, "bar")
  identical(bar_obj + foo_obj, "bar")
})

# cleanup
rm(foo_obj, bar_obj, `+.foo`, Ops.bar, chooseOpsMethod.bar)
R possesses a simple generic function mechanism which can be used for an object-oriented style of programming. Method dispatch takes place based on the class of the first argument to the generic function.
class(x)
class(x) <- value
unclass(x)
inherits(x, what, which = FALSE)
nameOfClass(x)
isa(x, what)

oldClass(x)
oldClass(x) <- value
.class2(x)
x |
an R object. |
what, value |
a character vector naming classes. |
which |
logical affecting return value: see ‘Details’. |
Here, we describe the so called “S3” classes (and methods). For “S4” classes (and methods), see ‘Formal classes’ below.
Many R objects have a class attribute, a character vector giving the names of the classes from which the object inherits. (Functions oldClass and oldClass<- get and set the attribute, which can also be done directly.)
If the object does not have a class attribute, it has an implicit class, notably "matrix", "array", "function" or "numeric", or the result of typeof(x) (which is similar to mode(x)). The exception is type "language" with mode "call", where the following extra classes exist for the corresponding function calls: if, for, while, (, {, <-, =.
Note that for objects x of an implicit (or an S4) class, when a (S3) generic function foo(x) is called, method dispatch may use more classes than are returned by class(x), e.g., for a numeric matrix, the foo.numeric() method may apply. The exact full character vector of the classes which UseMethod() uses, is available as .class2(x) since R version 4.0.0. (This also applies to S4 objects when S3 dispatch is considered, see below.)
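The difference between the class attribute, the implicit class and the full dispatch vector can be sketched for an integer matrix (R 4.0.0 or later is assumed):

```r
m <- matrix(1:6, 2, 3)
oldClass(m)      # NULL: no class attribute is set
class(m)         # "matrix" "array": the implicit class
.class2(m)       # "matrix" "array" "integer" "numeric": what UseMethod() tries
```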
Beware that using .class2() for purposes other than teaching, diagnostics or debugging is more likely a misuse than a smart solution.
NULL objects (of implicit class "NULL") cannot have attributes (hence no class attribute) and attempting to assign a class is an error.
When a generic function fun is applied to an object with class attribute c("first", "second"), the system searches for a function called fun.first and, if it finds it, applies it to the object. If no such function is found, a function called fun.second is tried. If no class name produces a suitable function, the function fun.default is used (if it exists). If there is no class attribute, the implicit class is tried, then the default method.
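The search order can be sketched with a toy generic. The generic speak and its methods are hypothetical names used only for this illustration.

```r
# 'speak' and its methods are hypothetical, defined only for illustration.
speak <- function(x, ...) UseMethod("speak")
speak.second  <- function(x, ...) "second method"
speak.default <- function(x, ...) "default method"

obj <- structure(list(), class = c("first", "second"))
r1 <- speak(obj)    # no speak.first exists yet, so speak.second is used
r2 <- speak(1:3)    # no class attribute, no method for the implicit class

speak.first <- function(x, ...) "first method"
r3 <- speak(obj)    # speak.first is now found before speak.second
```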
The function class returns the vector of names of classes an object inherits from. Correspondingly, class<- sets the classes an object inherits from. Assigning an empty character vector or NULL removes the class attribute, as for oldClass<- or direct attribute setting. Whereas it is clearer to explicitly assign NULL to remove the class, using an empty vector is more natural in e.g., class(x) <- setdiff(class(x), "ts").
unclass returns (a copy of) its argument with its class attribute removed. (It is not allowed for objects which cannot be copied, namely environments and external pointers.)
inherits indicates whether its first argument inherits from any of the classes specified in the what argument. If which is TRUE then an integer vector of the same length as what is returned. Each element indicates the position in class(x) matched by the element of what; zero indicates no match. If which is FALSE then TRUE is returned by inherits if any of the names in what match with any class.
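A brief sketch of both return modes, using made-up class names:

```r
x <- structure(1:3, class = c("a", "b"))
any_match <- inherits(x, c("z", "b"))                  # TRUE: one match suffices
no_match  <- inherits(x, "z")                          # FALSE
positions <- inherits(x, c("b", "z", "a"), which = TRUE)
positions                                              # 2 0 1: positions in class(x)
```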
nameOfClass is an S3 generic. It is called by inherits to get the class name for what, allowing for what to be values other than a character vector. nameOfClass methods are expected to return a character vector of length 1.
isa tests whether x is an object of class(es) as given in what by using is if x is an S4 object, and otherwise giving TRUE iff all elements of class(x) are contained in what.
All but inherits and isa are primitive functions.
An additional mechanism of formal classes, nicknamed “S4”, is available in package methods which is attached by default. For objects which have a formal class, its name is returned by class as a character vector of length one and method dispatch can happen on several arguments, instead of only the first. However, S3 method selection attempts to treat objects from an S4 class as if they had the appropriate S3 class attribute, as does inherits. Therefore, S3 methods can be defined for S4 classes. See the ‘Introduction’ and ‘Methods_for_S3’ help pages for basic information on S4 methods and for the relation between these and S3 methods.
The replacement version of the function sets the class to the value provided. For classes that have a formal definition, directly replacing the class this way is strongly deprecated. The expression as(object, value) is the way to coerce an object to a particular class.
The analogue of inherits for formal classes is is. The two functions behave consistently with one exception: S4 classes can have conditional inheritance, with an explicit test. In this case, is will test the condition, but inherits ignores all conditional superclasses.
UseMethod dispatches on the class as returned by class (with some interpolated classes: see the link) rather than oldClass. However, group generics dispatch on the oldClass for efficiency, and internal generics only dispatch on objects for which is.object is true.
UseMethod, NextMethod, ‘group generic’, ‘internal generic’
x <- 10
class(x) # "numeric"
oldClass(x) # NULL
inherits(x, "a") #FALSE
class(x) <- c("a", "b")
inherits(x,"a") #TRUE
inherits(x, "a", TRUE) # 1
inherits(x, c("a", "b", "c"), TRUE) # 1 2 0

class( quote(pi) )           # "name"
## regular calls
class( quote(sin(pi*x)) )    # "call"
## special calls
class( quote(x <- 1) )       # "<-"
class( quote((1 < 2)) )      # "("
class( quote( if(8<3) pi ) ) # "if"

.class2(pi)               # "double" "numeric"
.class2(matrix(1:6, 2,3)) # "matrix" "array" "integer" "numeric"
Returns a matrix of integers indicating their column number in a matrix-like object, or a factor of column labels.
col(x, as.factor = FALSE)
.col(dim)
x |
a matrix-like object, that is one with a two-dimensional
|
dim |
a matrix dimension, i.e., an integer valued numeric vector of length two (with non-negative entries). |
as.factor |
a logical value indicating whether the value should be returned as a factor of column labels (created if necessary) rather than as numbers. |
An integer (or factor) matrix with the same dimensions as x and whose ij-th element is equal to j (or the j-th column label).
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
row to get rows; slice.index for a general way to get slice indices in an array.
# extract an off-diagonal of a matrix
ma <- matrix(1:12, 3, 4)
ma[row(ma) == col(ma) + 1]

# create an identity 5-by-5 matrix more slowly than diag(n = 5):
x <- matrix(0, nrow = 5, ncol = 5)
x[row(x) == col(x)] <- 1

(i34 <- .col(3:4))
stopifnot(identical(i34, .col(c(3,4)))) # 'dim' maybe "double"
Generate regular sequences.
from:to
   a:b
from |
starting value of sequence. |
to |
(maximal) end value of the sequence. |
a, b |
|
The binary operator : has two meanings: for factors a:b is equivalent to interaction(a, b) (but the levels are ordered and labelled differently).
For other arguments from:to is equivalent to seq(from, to), and generates a sequence from from to to in steps of 1 or -1. Value to will be included if it differs from from by an integer up to a numeric fuzz of about 1e-7. Non-numeric arguments are coerced internally (hence without dispatching methods) to numeric; complex values will have their imaginary parts discarded with a warning.
For numeric arguments, a numeric vector. This will be of type integer if from is integer-valued and the result is representable in the R integer type, otherwise of type "double" (aka mode "numeric").
For factors, an unordered factor with levels labelled as la:lb and ordered lexicographically (that is, lb varies fastest).
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole. (for numeric arguments: S does not have : for factors.)
seq (a generalization of from:to).
As an alternative to using : for factors, interaction.
For : used in the formal representation of an interaction, see formula.
1:4
pi:6 # real
6:pi # integer

f1 <- gl(2, 3); f1
f2 <- gl(3, 2); f2
f1:f2 # a factor, the "cross"  f1 x f2
Form row and column sums and means for numeric arrays (or data frames).
colSums (x, na.rm = FALSE, dims = 1)
rowSums (x, na.rm = FALSE, dims = 1)
colMeans(x, na.rm = FALSE, dims = 1)
rowMeans(x, na.rm = FALSE, dims = 1)

.colSums(x, m, n, na.rm = FALSE)
.rowSums(x, m, n, na.rm = FALSE)
.colMeans(x, m, n, na.rm = FALSE)
.rowMeans(x, m, n, na.rm = FALSE)
x |
an array of two or more dimensions, containing numeric,
complex, integer or logical values, or a numeric data frame. For
|
na.rm |
logical. Should missing values (including |
dims |
integer: Which dimensions are regarded as ‘rows’ or
‘columns’ to sum over. For |
m, n |
the dimensions of the matrix |
These functions are equivalent to use of apply with FUN = mean or FUN = sum with appropriate margins, but are a lot faster. As they are written for speed, they blur over some of the subtleties of NaN and NA. If na.rm = FALSE and either NaN or NA appears in a sum, the result will be one of NaN or NA, but which might be platform-dependent.
Notice that omission of missing values is done on a per-column or per-row basis, so column means may not be over the same set of rows, and vice versa. To use only complete rows or columns, first select them with na.omit or complete.cases (possibly on the transpose of x).
The versions with an initial dot in the name (.colSums() etc) are ‘bare-bones’ versions for use in programming: they apply only to numeric (like) matrices and do not name the result.
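A short sketch of the equivalence with apply, the handling of missing values and the dims argument, on small made-up inputs:

```r
x <- matrix(1:6, nrow = 2)                  # rows: (1, 3, 5) and (2, 4, 6)
cs <- colSums(x)                            # 3 7 11
matches_apply <- all.equal(cs, apply(x, 2, sum))
rm_ <- rowMeans(x)                          # 3 4
bare <- .colSums(x, 2, 3)                   # dimensions supplied explicitly

y <- x
y[1, 1] <- NA
with_na    <- colSums(y)                    # NA 7 11
without_na <- colSums(y, na.rm = TRUE)      # 2 7 11

a <- array(1:24, dim = c(2, 3, 4))
dim(colSums(a))                             # 3 4: summed over the first dimension
dim(rowSums(a, dims = 2))                   # 2 3: summed over the last dimension
```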
A numeric or complex array of suitable size, or a vector if the result is one-dimensional. For the first four functions the dimnames (or names for a vector result) are taken from the original array.
If there are no values in a range to be summed over (after removing missing values with na.rm = TRUE), that component of the output is set to 0 (*Sums) or NaN (*Means), consistent with sum and mean.
## Compute row and column sums for a matrix:
x <- cbind(x1 = 3, x2 = c(4:1, 2:5))
rowSums(x); colSums(x)
dimnames(x)[[1]] <- letters[1:8]
rowSums(x); colSums(x); rowMeans(x); colMeans(x)
x[] <- as.integer(x)
rowSums(x); colSums(x)
x[] <- x < 3
rowSums(x); colSums(x)
x <- cbind(x1 = 3, x2 = c(4:1, 2:5))
x[3, ] <- NA; x[4, 2] <- NA
rowSums(x); colSums(x); rowMeans(x); colMeans(x)
rowSums(x, na.rm = TRUE); colSums(x, na.rm = TRUE)
rowMeans(x, na.rm = TRUE); colMeans(x, na.rm = TRUE)

## an array
dim(UCBAdmissions)
rowSums(UCBAdmissions); rowSums(UCBAdmissions, dims = 2)
colSums(UCBAdmissions); colSums(UCBAdmissions, dims = 2)

## complex case
x <- cbind(x1 = 3 + 2i, x2 = c(4:1, 2:5) - 5i)
x[3, ] <- NA; x[4, 2] <- NA
rowSums(x); colSums(x); rowMeans(x); colMeans(x)
rowSums(x, na.rm = TRUE); colSums(x, na.rm = TRUE)
rowMeans(x, na.rm = TRUE); colMeans(x, na.rm = TRUE)
Provides access to a copy of the command line arguments supplied when this R session was invoked.
commandArgs(trailingOnly = FALSE)
trailingOnly |
logical. Should only arguments after --args be returned? |
These arguments are captured before the standard R command line processing takes place. This means that they are the unmodified values. This is especially useful with the --args command-line flag to R, as all of the command line after that flag is skipped.
A character vector containing the name of the executable and the user-supplied command line arguments. The first element is the name of the executable by which R was invoked. The exact form of this element is platform dependent: it may be the fully qualified name, or simply the last component (or basename) of the application, or for an embedded R it can be anything the programmer supplied.
If trailingOnly = TRUE, a character vector of those arguments (if any) supplied after --args.
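A typical use is reading script arguments. The following is only a sketch: the script name simulate.R, the helper parse_args and its defaults are all hypothetical, and the helper is exercised on explicit vectors so that it does not depend on how the current session was started.

```r
# Hypothetical invocation:  Rscript simulate.R 100 results.csv
# (Rscript passes the arguments that follow the script name after --args.)
args <- commandArgs(trailingOnly = TRUE)

# 'parse_args' is an illustrative helper, not part of base R.
parse_args <- function(args) {
    list(n    = if (length(args) >= 1L) as.integer(args[[1L]]) else 10L,
         file = if (length(args) >= 2L) args[[2L]] else "out.csv")
}

given    <- parse_args(c("100", "results.csv"))   # both values supplied
fallback <- parse_args(character(0))              # defaults are used
```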
commandArgs()
## Spawn a copy of this application as it was invoked,
## subject to shell quoting issues
## system(paste(commandArgs(), collapse = " "))
"comment"
Attribute
These functions set and query a comment attribute for any R objects. This is typically useful for data.frames or model fits.
Contrary to other attributes, the comment is not printed (by print or print.default).
Assigning NULL or a zero-length character vector removes the comment.
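Setting, querying and removing a comment can be sketched as follows (the vector and the comment text are made up):

```r
x <- c(1.5, 2.5, 3.5)
comment(x) <- "measured in cm"
stored <- comment(x)               # "measured in cm"
attr_names <- names(attributes(x)) # the comment is stored as an attribute
x                                  # printing shows the values only
comment(x) <- NULL                 # remove the comment again
removed <- is.null(comment(x))     # TRUE
```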
comment(x)
comment(x) <- value
x |
any R object. |
value |
a |
attributes and attr for other attributes.
x <- matrix(1:12, 3, 4)
comment(x) <- c("This is my very important data from experiment #0234",
                "Jun 5, 1998")
x
comment(x)
Binary operators which allow the comparison of values in atomic vectors.
x < y
x > y
x <= y
x >= y
x == y
x != y
x, y |
atomic vectors, symbols, calls, or other objects for which methods have been written. |
The binary comparison operators are generic functions: methods can be written for them individually or via the Ops group generic function. (See Ops for how dispatch is computed.)
Comparison of strings in character vectors is lexicographic within the strings using the collating sequence of the locale in use: see locales. The collating sequence of locales such as ‘en_US’ is normally different from ‘C’ (which should use ASCII) and can be surprising. Beware of making any assumptions about the collation order: e.g. in Estonian Z comes between S and T, and collation is not necessarily character-by-character: in Danish aa sorts as a single letter, after z. In Welsh ng may or may not be a single sorting unit: if it is it follows g. Some platforms may not respect the locale and always sort in numerical order of the bytes in an 8-bit locale, or in Unicode code-point order for a UTF-8 locale (and may not sort in the same order for the same language in different character sets). Collation of non-letters (spaces, punctuation signs, hyphens, fractions and so on) is even more problematic.
Character strings can be compared with different marked encodings
(see Encoding
): they are translated to UTF-8 before
comparison.
Raw vectors should not really be considered to have an order, but the numeric order of the byte representation is used.
At least one of x
and y
must be an atomic vector, but if
the other is a list R attempts to coerce it to the type of the atomic
vector: this will succeed if the list is made up of elements of length
one that can be coerced to the correct type.
If the two arguments are atomic vectors of different types, one is coerced to the type of the other, the (decreasing) order of precedence being character, complex, numeric, integer, logical and raw.
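The coercion rules above can be illustrated with a few comparisons whose operands have different types. This is only a sketch; the values are chosen so that the result does not depend on the collation locale.

```r
## numeric vs character: the number is coerced to a string
r1 <- 1 == "1"

## logical vs numeric: TRUE is coerced to 1
r2 <- TRUE == 1

## raw vs integer: the raw byte is coerced to integer
r3 <- as.raw(255) == 255L

## list of length-one elements vs atomic vector: the list is coerced
r4 <- list(1, 2, 3) == c(1, 5, 3)
```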
Missing values (NA
) and NaN
values are
regarded as non-comparable even to themselves, so comparisons
involving them will always result in NA
. Missing values can
also result when character strings are compared and one is not valid
in the current collation locale.
Language objects such as symbols and calls can only be used as
operands for ==
and !=
; the other comparisons signal an
error when one of the operands is a language object. Currently
language objects are deparsed to character strings before
comparison. This can be inefficient and may not be what is really
wanted. For equality comparisons identical
is usually a
better choice.
A logical vector indicating the result of the element by element comparison. The elements of shorter vectors are recycled as necessary.
Objects such as arrays or time-series can be compared this way provided they are conformable.
These operators are members of the S4 Compare
group generic,
and so methods can be written for them individually as well as for the
group generic (or the Ops
group generic), with arguments
c(e1, e2)
.
Do not use ==
and !=
for tests, such as in if
expressions, where you must get a single TRUE
or
FALSE
. Unless you are absolutely sure that nothing unusual
can happen, you should use the identical
function
instead.
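A minimal sketch of why == is risky as an if test: it is elementwise and propagates NA, whereas identical always returns a single TRUE or FALSE.

```r
x <- c(1, NA)

## '==' works element by element and propagates NA
eq <- x == 1

## identical() always yields exactly one TRUE or FALSE
same <- identical(x, c(1, NA))

## An NA condition makes 'if' fail; tryCatch() turns the error into a value
outcome <- tryCatch(
  if (NA == 1) "equal" else "different",
  error = function(e) "error: condition is NA"
)
```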
For numerical and complex values, remember ==
and !=
do
not allow for the finite representation of fractions, nor for rounding
error. Using all.equal
with identical
or
isTRUE
is almost always preferable; see the examples.
(This also applies to the other comparison operators.)
These operators are sometimes called as functions as
e.g. `<`(x, y)
: see the description of how
argument-matching is done in Ops
.
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
Collation of character strings is a complex topic. For an introduction see https://en.wikipedia.org/wiki/Collating_sequence. The Unicode Collation Algorithm (https://unicode.org/reports/tr10/) is likely to be increasingly influential. Where available R by default makes use of ICU (https://icu.unicode.org/) for collation (except in a C locale).
Logic
on how to combine results of comparisons,
i.e., logical vectors.
factor
for the behaviour with factor arguments.
Syntax
for operator precedence.
capabilities
for whether ICU is available, and
icuSetCollate
to tune the string collation algorithm
when it is.
x <- stats::rnorm(20)
x < 1
x[x > 0]

x1 <- 0.5 - 0.3
x2 <- 0.3 - 0.1
x1 == x2                   # FALSE on most machines
isTRUE(all.equal(x1, x2))  # TRUE everywhere

# range of most 8-bit charsets, as well as of Latin-1 in Unicode
z <- c(32:126, 160:255)
x <- if(l10n_info()$MBCS) {
    intToUtf8(z, multiple = TRUE)
} else rawToChar(as.raw(z), multiple = TRUE)
## by number
writeLines(strwrap(paste(x, collapse=" "), width = 60))
## by locale collation
writeLines(strwrap(paste(sort(x), collapse=" "), width = 60))
Basic functions which support complex arithmetic in R, in addition to
the arithmetic operators +
, -
, *
, /
, and ^
.
complex(length.out = 0, real = numeric(), imaginary = numeric(),
        modulus = 1, argument = 0)
as.complex(x, ...)
is.complex(x)

Re(z)
Im(z)
Mod(z)
Arg(z)
Conj(z)
length.out |
numeric. Desired length of the output vector, inputs being recycled as needed. |
real |
numeric vector. |
imaginary |
numeric vector. |
modulus |
numeric vector. |
argument |
numeric vector. |
x |
an object, probably of mode |
z |
an object of mode |
... |
further arguments passed to or from other methods. |
Complex vectors can be created with complex
. The vector can be
specified either by giving its length, its real and imaginary parts, or
modulus and argument. (Giving just the length generates a vector of
complex zeroes.)
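The three ways of specifying a complex vector can be sketched as follows (the numeric values are arbitrary):

```r
z0 <- complex(3)                                 # length only: complex zeroes
z1 <- complex(real = 1:2, imaginary = c(0, -1))  # real and imaginary parts
z2 <- complex(modulus = 2, argument = pi / 2)    # polar form, close to 0+2i
```

Because the polar form goes through floating-point trigonometry, z2 is best compared with all.equal rather than ==.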
as.complex
attempts to coerce its argument to be of complex
type: like as.vector
it strips attributes including
names.
Since R version 4.4.0, as.complex(x)
for “number-like”
x
, i.e., types "logical"
, "integer"
, and
"double"
, will always keep imaginary part zero, now also for
NA
's.
Up to R versions 3.2.x, all forms of NA
and NaN
were coerced to a complex NA
, i.e., the NA_complex_
constant, for which both the real and imaginary parts are NA
.
Since R 3.3.0, typically only objects which are NA
in parts
are coerced to complex NA
, but others with NaN
parts
are not. As a consequence, complex arithmetic where only
NaN
's (but no NA
's) are involved typically will
not give complex NA
but complex numbers with real or
imaginary parts of NaN
.
All of these many different complex numbers fulfill is.na(.)
but
only one of them is identical to NA_complex_
.
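A short sketch of the distinction: a complex number with a NaN part satisfies is.na without being the NA_complex_ constant.

```r
z <- complex(real = NaN, imaginary = 0)

missing_z  <- is.na(z)                   # a NaN part counts for is.na()
is_const   <- identical(z, NA_complex_)  # but z is not the NA_complex_ constant
missing_na <- is.na(NA_complex_)
```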
Note that is.complex
and is.numeric
are never both
TRUE
.
The functions Re
, Im
, Mod
, Arg
and
Conj
have their usual interpretation as returning the real
part, imaginary part, modulus, argument and complex conjugate for
complex values. The modulus and argument are also called the polar
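coordinates. If $z = x + iy$ with real $x$ and $y$, for
$r = \mathrm{Mod}(z) = \sqrt{x^2 + y^2}$, and $\varphi = \mathrm{Arg}(z)$,
$x = r\cos(\varphi)$ and $y = r\sin(\varphi)$. They are all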
internal generic primitive functions: methods can be
defined for them
individually or via the
Complex
group generic.
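As a worked check of the polar-coordinate relations, the following sketch decomposes one complex number and rebuilds it from its modulus and argument:

```r
z   <- 3 + 4i
r   <- Mod(z)   # sqrt(3^2 + 4^2)
phi <- Arg(z)   # the angle atan2(4, 3)

## Rebuild z from its polar coordinates
back <- complex(real = r * cos(phi), imaginary = r * sin(phi))

zc <- Conj(z)   # same real part, imaginary part negated
```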
In addition to the arithmetic operators (see Arithmetic)
+
, -
, *
, /
, and ^
, the elementary
trigonometric, logarithmic, exponential, square root and hyperbolic
functions are implemented for complex values.
Matrix multiplications (%*%
, crossprod
,
tcrossprod
) are also defined for complex matrices
(matrix
), and so are solve
,
eigen
or svd
.
Internally, complex numbers are stored as a pair of double
precision numbers, either or both of which can be NaN
(including NA
, see NA_complex_
and above) or
plus or minus infinity.
as.complex
is primitive and can have S4 methods set.
Re
, Im
, Mod
, Arg
and Conj
constitute the S4 group generic
Complex
and so S4 methods can be
set for them individually or via the group generic.
Operations and functions involving complex NaN
mostly
rely on the C library's handling of ‘double complex’ arithmetic,
which typically returns complex(re=NaN, im=NaN)
(but we have
not seen a guarantee for that).
For +
and -
, R's own handling works strictly
“coordinate wise”.
Operations involving complex NA
, i.e., NA_complex_
, return
NA_complex_
.
Only since R version 4.4.0, as.complex("1i")
gives 1i
;
previously it returned NA_complex_
with a warning.
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
Arithmetic
; polyroot
finds all $n$
complex roots of a polynomial of degree $n$.
require(graphics)

0i ^ (-3:3)

matrix(1i^ (-6:5), nrow = 4) #- all columns are the same
0 ^ 1i # a complex NaN

## create a complex normal vector
z <- complex(real = stats::rnorm(100), imaginary = stats::rnorm(100))
## or also (less efficiently):
z2 <- 1:2 + 1i*(8:9)

## The Arg(.) is an angle:
zz <- (rep(1:4, length.out = 9) + 1i*(9:1))/10
zz.shift <- complex(modulus = Mod(zz), argument = Arg(zz) + pi)
plot(zz, xlim = c(-1,1), ylim = c(-1,1), col = "red", asp = 1,
     main = expression(paste("Rotation by "," ", pi == 180^o)))
abline(h = 0, v = 0, col = "blue", lty = 3)
points(zz.shift, col = "orange")

## as.complex(<some NA>): numbers keep Im = 0:
stopifnot(identical(as.complex(NA_real_), NA_real_ + 0i)) # has always been true
NAs <- vapply(list(NA, NA_integer_, NA_real_, NA_character_, NA_complex_),
              as.complex, 0+0i)
stopifnot(is.na(NAs), is.na(Re(NAs))) # has always been true
showC <- function(z) noquote(paste0("(", Re(z), ",", Im(z), ")"))
showC(NAs)
Im(NAs) # [0 0 0 NA NA] \ in R <= 4.3.x was [NA NA 0 NA NA]
stopifnot(Im(NAs)[1:3] == 0)

## The exact result of this *depends* on the platform, compiler, math-library:
(NpNA <- NaN + NA_complex_) ; str(NpNA) # *behaves* as 'cplx NA' ..
stopifnot(is.na(NpNA), is.na(NA_complex_),
          is.na(Re(NA_complex_)), is.na(Im(NA_complex_)))
showC(NpNA)# but does not always show '(NaN,NA)'
## and this is not TRUE everywhere:
identical(NpNA, NA_complex_)
showC(NA_complex_) # always == (NA,NA)
These functions provide a mechanism for handling unusual conditions, including errors and warnings.
tryCatch(expr, ..., finally)
withCallingHandlers(expr, ...)
globalCallingHandlers(...)

signalCondition(cond)

simpleCondition(message, call = NULL)
simpleError    (message, call = NULL)
simpleWarning  (message, call = NULL)
simpleMessage  (message, call = NULL)

errorCondition(message, ..., class = NULL, call = NULL)
warningCondition(message, ..., class = NULL, call = NULL)

## S3 method for class 'condition'
as.character(x, ...)
## S3 method for class 'error'
as.character(x, ...)
## S3 method for class 'condition'
print(x, ...)
## S3 method for class 'restart'
print(x, ...)

conditionCall(c)
## S3 method for class 'condition'
conditionCall(c)
conditionMessage(c)
## S3 method for class 'condition'
conditionMessage(c)

withRestarts(expr, ...)

computeRestarts(cond = NULL)
findRestart(name, cond = NULL)
invokeRestart(r, ...)
tryInvokeRestart(r, ...)
invokeRestartInteractively(r)

isRestart(x)
restartDescription(r)
restartFormals(r)

suspendInterrupts(expr)
allowInterrupts(expr)

.signalSimpleWarning(msg, call)
.handleSimpleError(h, msg, call)
.tryResumeInterrupt()
c |
a condition object. |
call |
call expression. |
cond |
a condition object. |
expr |
expression to be evaluated. |
finally |
expression to be evaluated before returning or exiting. |
h |
function. |
message |
character string. |
msg |
character string. |
name |
character string naming a restart. |
r |
restart object. |
x |
object. |
class |
character string naming a condition class. |
... |
additional arguments; see details below. |
The condition system provides a mechanism for signaling and handling unusual conditions, including errors and warnings. Conditions are represented as objects that contain information about the condition that occurred, such as a message and the call in which the condition occurred. Currently conditions are S3-style objects, though this may eventually change.
Conditions are objects inheriting from the abstract class
condition
. Errors and warnings are objects inheriting
from the abstract subclasses error
and warning
.
The class simpleError
is the class used by stop
and all internal error signals. Similarly, simpleWarning
is used by warning
, and simpleMessage
is used by
message
. The constructors by the same names take a string
describing the condition as argument and an optional call. The
functions conditionMessage
and conditionCall
are
generic functions that return the message and call of a condition.
The function errorCondition
can be
used to construct error conditions of a particular class with
additional fields specified as the ...
argument.
warningCondition
is analogous for warnings.
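A hedged sketch of constructing and catching a classed error. The class name configError and the field path are hypothetical, chosen only for illustration.

```r
## Hypothetical class and field names, for illustration only
cond <- errorCondition("config file not found",
                       class = "configError",
                       path  = "settings.yml")

cls <- class(cond)   # the custom class comes first, then "error", "condition"

## A handler can be selected by the custom class and can read the extra field
res <- tryCatch(stop(cond),
                configError = function(e) paste("missing:", e$path))
```

Catching by a specific class lets callers distinguish this failure from other errors without inspecting the message text.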
Conditions are signaled by signalCondition
. In addition,
the stop
and warning
functions have been modified to
also accept condition arguments.
The function tryCatch
evaluates its expression argument
in a context where the handlers provided in the ...
argument are available. The finally
expression is then
evaluated in the context in which tryCatch
was called; that
is, the handlers supplied to the current tryCatch
call are
not active when the finally
expression is evaluated.
Handlers provided in the ...
argument to tryCatch
are established for the duration of the evaluation of expr
.
If no condition is signaled when evaluating expr
then
tryCatch
returns the value of the expression.
If a condition is signaled while evaluating expr
then
established handlers are checked, starting with the most recently
established ones, for one matching the class of the condition.
When several handlers are supplied in a single tryCatch
then
the first one is considered more recent than the second. If a
handler is found then control is transferred to the
tryCatch
call that established the handler, the handler
found and all more recent handlers are disestablished, the handler
is called with the condition as its argument, and the result
returned by the handler is returned as the value of the
tryCatch
call.
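The following sketch illustrates the control flow just described: the value of a matching handler becomes the value of tryCatch, and the finally expression runs afterwards. The function safe_log and the variable events are illustrative names.

```r
## Illustrative wrapper: return NA instead of warning or failing
safe_log <- function(x) {
  tryCatch(
    log(x),
    warning = function(w) NA_real_,   # e.g. log(-1) signals a warning
    error   = function(e) NA_real_    # e.g. log("a") signals an error
  )
}

## Record the order in which the pieces run
events <- character()
value <- tryCatch(
  {
    events <- c(events, "body")
    stop("boom")
  },
  error = function(e) {
    events <<- c(events, "handler")
    conditionMessage(e)               # becomes the value of tryCatch()
  },
  finally = events <- c(events, "finally")
)
```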
Calling handlers are established by withCallingHandlers
. If
a condition is signaled and the applicable handler is a calling
handler, then the handler is called by signalCondition
in
the context where the condition was signaled but with the available
handlers restricted to those below the handler called in the
handler stack. If the handler returns, then the next handler is
tried; once the last handler has been tried, signalCondition
returns NULL
.
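A minimal sketch of a calling handler, assuming a custom condition class named progress (a hypothetical name). The handler runs in the context of the signal and then returns, so evaluation continues and signalCondition yields NULL.

```r
seen <- character()

## Build a condition with a hypothetical class "progress"
cond <- simpleCondition("progress: 50%")
class(cond) <- c("progress", "condition")

res <- withCallingHandlers(
  {
    out <- signalCondition(cond)   # handler is called, then control returns here
    list(out = out, value = 42)
  },
  progress = function(p) seen <<- c(seen, conditionMessage(p))
)
```

Unlike a tryCatch handler, the calling handler does not unwind the stack: the code after signalCondition still runs.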
globalCallingHandlers
establishes calling handlers globally.
These handlers are only called as a last resort, after the other
handlers dynamically registered with withCallingHandlers
have
been invoked. They are called before the error
global option
(which is the legacy interface for global handling of errors).
Registering the same handler multiple times moves that handler on
top of the stack, which ensures that it is called first. Global
handlers are a good place to define a general purpose logger (for
instance saving the last error object in the global workspace) or a
general recovery strategy (e.g. installing missing packages via the
retry_loadNamespace
restart).
Like withCallingHandlers
and tryCatch
,
globalCallingHandlers
takes named handlers. Unlike these
functions, it also has an options
-like interface: you
can establish handlers by passing a single list of named handlers.
To unregister all global handlers, supply a single 'NULL'. The list
of deleted handlers is returned invisibly. Finally, calling
globalCallingHandlers
without arguments returns the list of
currently established handlers, visibly.
User interrupts signal a condition of class interrupt
that
inherits directly from class condition
before executing the
default interrupt action.
Restarts are used for establishing recovery protocols. They can be
established using withRestarts
. One pre-established restart is
an abort
restart that represents a jump to top level.
findRestart
and computeRestarts
find the available
restarts. findRestart
returns the most recently established
restart of the specified name. computeRestarts
returns a
list of all restarts. Both can be given a condition argument and
will then ignore restarts that do not apply to the condition.
invokeRestart
transfers control to the point where the
specified restart was established and calls the restart's handler with the
arguments, if any, given as additional arguments to
invokeRestart
. The restart argument to invokeRestart
can be a character string, in which case findRestart
is used
to find the restart. If no restart is found, an error is thrown.
tryInvokeRestart
is a variant of invokeRestart
that
returns silently when the restart cannot be found with
findRestart
. Because a condition of a given class might be
signalled with arbitrary protocols (error, warning, etc), it is
recommended to use this permissive variant whenever you are handling
conditions signalled from a foreign context. For instance, invocation
of a "muffleWarning"
restart should be optional because the
warning might have been signalled by the user or from a different
package with the stop
or message
protocols. Only use
invokeRestart
when you have control of the signalling context,
or when it is a logical error if the restart is not available.
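A sketch of the permissive pattern recommended above. The handler quiet is an illustrative name; it is attached to every condition and only muffles when the corresponding restart is actually available.

```r
## Illustrative handler: muffle warnings and messages, but do not fail
## for conditions that offer neither restart.
quiet <- function(cond) {
  tryInvokeRestart("muffleWarning")
  tryInvokeRestart("muffleMessage")
}

res <- withCallingHandlers(
  {
    message("starting")                        # offers "muffleMessage"
    warning("something odd")                   # offers "muffleWarning"
    signalCondition(simpleCondition("plain"))  # offers neither restart
    "finished"
  },
  condition = quiet
)
```

With invokeRestart in place of tryInvokeRestart, the first call in quiet would throw an error for the message, because no muffleWarning restart is established at that point.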
New restarts for withRestarts
can be specified in several ways.
The simplest is in name = function
form where the function is
the handler to call when the restart is invoked. Another simple
variant is as name = string
where the string is stored in the
description
field of the restart object returned by
findRestart
; in this case the handler ignores its arguments
and returns NULL
. The most flexible form of a restart
specification is as a list that can include several fields, including
handler
, description
, and test
. The
test
field should contain a function of one argument, a
condition, that returns TRUE
if the restart applies to the
condition and FALSE
if it does not; the default function
returns TRUE
for all conditions.
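As a hedged sketch of a recovery protocol built from these pieces: the function parse_number and the restart name use_value are hypothetical. The low-level code offers the restart in list form; the caller decides, in a calling handler, to invoke it.

```r
## Hypothetical parser offering a "use_value" restart
parse_number <- function(text) {
  withRestarts(
    {
      num <- suppressWarnings(as.numeric(text))
      if (is.na(num)) stop("not a number: ", text)
      num
    },
    use_value = list(
      handler     = function(value) value,
      description = "Return a replacement value"
    )
  )
}

## The caller chooses the recovery: substitute 0 for unparsable input
res <- withCallingHandlers(
  vapply(c("1.5", "oops", "3"), parse_number, numeric(1)),
  error = function(e) invokeRestart("use_value", 0)
)
```

Because the handler is a calling handler, the restart is still on the stack when the error is signaled, and processing of the remaining elements continues after recovery.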
One additional field that can be specified for a restart is
interactive
. This should be a function of no arguments that
returns a list of arguments to pass to the restart handler. The list
could be obtained by interacting with the user if necessary. The
function invokeRestartInteractively
calls this function to
obtain the arguments to use when invoking the restart. The default
interactive
method queries the user for values for the
formal arguments of the handler function.
Interrupts can be suspended while evaluating an expression using
suspendInterrupts
. Subexpressions can be evaluated with
interrupts enabled using allowInterrupts
. These functions
can be used to make sure cleanup handlers cannot be interrupted.
.signalSimpleWarning
, .handleSimpleError
, and
.tryResumeInterrupt
are used internally and should not be
called directly.
The tryCatch
mechanism is similar to Java
error handling. Calling handlers are based on Common Lisp and
Dylan. Restarts are based on the Common Lisp restart mechanism.
stop
and warning
signal conditions,
and try
is essentially a simplified version of
tryCatch
.
assertCondition
in package tools tests
that conditions are signalled and works with several of the above
handlers.
tryCatch(1, finally = print("Hello"))
e <- simpleError("test error")
## Not run: 
stop(e)
tryCatch(stop(e), finally = print("Hello"))
tryCatch(stop("fred"), finally = print("Hello"))

## End(Not run)
tryCatch(stop(e), error = function(e) e, finally = print("Hello"))
tryCatch(stop("fred"), error = function(e) e, finally = print("Hello"))
withCallingHandlers({ warning("A"); 1+2 }, warning = function(w) {})
## Not run: 
{ withRestarts(stop("A"), abort = function() {}); 1 }

## End(Not run)
withRestarts(invokeRestart("foo", 1, 2), foo = function(x, y) {x + y})

##--> More examples are part of
##-->   demo(error.catching)
conflicts
reports on objects that exist with the same name in
two or more places on the search
path, usually because
an object in the user's workspace or a package is masking a system
object of the same name. This helps discover unintentional masking.
conflicts(where = search(), detail = FALSE)
where |
A subset of the search path, by default the whole search path. |
detail |
If |
If detail = FALSE
, a character vector of masked objects.
If detail = TRUE
, a list of character vectors giving the masked or
masking objects in that member of the search path. Empty vectors are
omitted.
lm <- 1:3
conflicts(, TRUE)
## gives something like
# $.GlobalEnv
# [1] "lm"
#
# $package:base
# [1] "lm"

## Remove things from your "workspace" that mask others:
remove(list = conflicts(detail = TRUE)$.GlobalEnv)
Functions to create, open and close connections, i.e., “generalized files”, such as possibly compressed files, URLs, pipes, etc.
file(description = "", open = "", blocking = TRUE,
     encoding = getOption("encoding"), raw = FALSE,
     method = getOption("url.method", "default"))

url(description, open = "", blocking = TRUE,
    encoding = getOption("encoding"),
    method = getOption("url.method", "default"),
    headers = NULL)

gzfile(description, open = "", encoding = getOption("encoding"),
       compression = 6)

bzfile(description, open = "", encoding = getOption("encoding"),
       compression = 9)

xzfile(description, open = "", encoding = getOption("encoding"),
       compression = 6)

unz(description, filename, open = "", encoding = getOption("encoding"))

pipe(description, open = "", encoding = getOption("encoding"))

fifo(description, open = "", blocking = FALSE,
     encoding = getOption("encoding"))

socketConnection(host = "localhost", port, server = FALSE,
                 blocking = FALSE, open = "a+",
                 encoding = getOption("encoding"),
                 timeout = getOption("timeout"),
                 options = getOption("socketOptions"))

serverSocket(port)

socketAccept(socket, blocking = FALSE, open = "a+",
             encoding = getOption("encoding"),
             timeout = getOption("timeout"),
             options = getOption("socketOptions"))

open(con, ...)
## S3 method for class 'connection'
open(con, open = "r", blocking = TRUE, ...)

close(con, ...)
## S3 method for class 'connection'
close(con, type = "rw", ...)

flush(con)

isOpen(con, rw = "")
isIncomplete(con)

socketTimeout(socket, timeout = -1)
description |
character string. A description of the connection: see ‘Details’. |
open |
character string. A description of how to open the connection (if it should be opened initially). See section ‘Modes’ for possible values. |
blocking |
logical. See the ‘Blocking’ section. |
encoding |
the name of the encoding to be assumed. See the ‘Encoding’ section. |
raw |
logical. If true, a ‘raw’ interface is used which will be more suitable for arguments which are not regular files, e.g. character devices. This suppresses the check for a compressed file when opening for text-mode reading, and asserts that the ‘file’ may not be seekable. |
method |
character string, partially matched to
|
headers |
named character vector of HTTP headers to use in HTTP
requests. It is ignored for non-HTTP URLs. The |
compression |
integer in 0–9. The amount of compression to be
applied when writing, from none to maximal available. For
|
timeout |
numeric: the timeout (in seconds) to be used for this connection. Beware that some OSes may treat very large values as zero: however the POSIX standard requires values up to 31 days to be supported. |
options |
optional character vector with options. Currently only
|
filename |
a filename within a zip file. |
host |
character string. Host name for the port. |
port |
integer. The TCP port number. |
server |
logical. Should the socket be a client or a server? |
socket |
a server socket listening for connections. |
con |
a connection. |
type |
character string. Currently ignored. |
rw |
character string. Empty or |
... |
arguments passed to or from other methods. |
The first eleven functions create connections. By default the
connection is not opened (except for a socket connection created by
socketConnection
or socketAccept
and for a server socket
connection created by serverSocket
), but may
be opened by setting a non-empty value of argument open
.
For file
the description is a path to the file to be opened
(when tilde expansion is done) or a complete URL (when it is
the same as calling url
), or ""
(the default) or
"clipboard"
(see the ‘Clipboard’ section). Use
"stdin"
to refer to the C-level ‘standard input’ of the
process (which need not be connected to anything in a console or
embedded version of R, and is not in RGui
on Windows). See
also stdin()
for the subtly different R-level concept of
stdin
. See nullfile()
for a platform-independent
way to get the filename of the null device.
For url
the description is a complete URL including scheme
(such as ‘http://’, ‘https://’, ‘ftp://’ or
‘file://’). Method "internal"
is that available since
connections were introduced but now mainly defunct. Method
"wininet"
is only available on Windows (it uses the WinINet
functions of that OS) and method "libcurl"
(using the library
of that name: https://curl.se/libcurl/) is nowadays required but
was optional on Windows before R 4.2.0. Method "default"
currently uses method "internal"
for ‘file://’ URLs and
"libcurl"
for all others. Which methods support which schemes
has varied by R version – currently "internal"
supports only
‘file://’; "wininet"
supports ‘file://’,
‘http://’ and ‘https://’. Proxies can be specified: see
download.file
.
For gzfile
the description is the path to a file compressed by
gzip
: it can also open for reading uncompressed files and
those compressed by bzip2
, xz
or lzma
.
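A small sketch of the behaviour described for gzfile: it writes gzip-compressed output, and when reading it accepts both compressed and uncompressed files. The helper read_via_gzfile is an illustrative name, and temporary files are used so nothing permanent is created.

```r
lines <- c("alpha", "beta")

gz_path    <- tempfile(fileext = ".gz")
plain_path <- tempfile(fileext = ".txt")

con <- gzfile(gz_path, "w")    # open for writing: output is gzip-compressed
writeLines(lines, con)
close(con)
writeLines(lines, plain_path)  # an ordinary uncompressed file

## Illustrative helper: read all lines through a gzfile connection
read_via_gzfile <- function(path) {
  con <- gzfile(path, "r")
  on.exit(close(con))
  readLines(con)
}

from_gz    <- read_via_gzfile(gz_path)
from_plain <- read_via_gzfile(plain_path)

## gzip files start with the magic bytes 1f 8b
magic <- readBin(gz_path, "raw", n = 2L)

unlink(c(gz_path, plain_path))
```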
For bzfile
the description is the path to a file compressed by
bzip2
.
For xzfile
the description is the path to a file compressed by
xz
(https://en.wikipedia.org/wiki/Xz) or (for reading
only) lzma
(https://en.wikipedia.org/wiki/LZMA).
unz
reads (only) single files within zip files, in binary mode.
The description is the full path to the zip file, with ‘.zip’
extension if required.
For pipe
the description is the command line to be piped to or
from. This is run in a shell, on Windows that specified by the
COMSPEC environment variable.
For fifo
the description is the path of the fifo. (Support for
fifo
connections is optional but they are available on most
Unix platforms and on Windows.)
The intention is that file
and gzfile
can be used
generally for text input (from files, ‘http://’ and
‘https://’ URLs) and binary input respectively.
open
, close
and seek
are generic functions: the
following applies to the methods relevant to connections.
open
opens a connection. In general functions using
connections will open them if they are not open, but then close them
again, so to leave a connection open call open
explicitly.
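The difference between an unopened and an explicitly opened connection can be sketched with a temporary file:

```r
path <- tempfile()
writeLines(c("first", "second"), path)

con <- file(path)              # created, but not opened
opened_initially <- isOpen(con)

a <- readLines(con, n = 1L)    # readLines() opens, reads, and closes again
b <- readLines(con, n = 1L)    # so this read starts from the top once more

open(con, "r")                 # explicit open: the read position persists
c1 <- readLines(con, n = 1L)
c2 <- readLines(con, n = 1L)

close(con)
unlink(path)
```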
close
closes and destroys a connection. This will happen
automatically in due course (with a warning) if there is no longer an
R object referring to the connection.
flush
flushes the output stream of a connection open for
write/append (where implemented, currently for file and clipboard
connections, stdout
and stderr
).
If for a file
or (on most platforms) a fifo
connection
the description is ""
, the file/fifo is immediately opened (in
"w+"
mode unless open = "w+b"
is specified) and unlinked
from the file system. This provides a temporary file/fifo to write to
and then read from.
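A minimal sketch of the anonymous temporary file obtained with an empty description: write to it, then read the same data back.

```r
tmp <- file()                  # description "": opened at once in "w+" mode
was_open <- isOpen(tmp)

cat("abc\ndef\n", file = tmp)  # write to the anonymous file ...
back <- readLines(tmp)         # ... then read the same data back

close(tmp)                     # the file has no name, so nothing to unlink
```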
socketConnection(server=TRUE)
creates a new temporary server socket
listening on the given port. As soon as a new socket connection is
accepted on that port, the server socket is automatically closed.
serverSocket
creates a listening server socket which can be used
for accepting multiple socket connections by socketAccept
. To stop
listening for new connections, a server socket needs to be closed
explicitly by close
.
socketConnection
and socketAccept
support setting of
socket-specific options. Currently only "no-delay"
is
implemented which enables the TCP_NODELAY
socket option, causing
the socket to flush send buffers immediately (instead of waiting to
collect all output before sending). This option is useful for
protocols that need fast request/response turn-around times.
socketTimeout
sets connection timeout of a socket connection. A
negative timeout
can be given to query the old value.
file
, pipe
, fifo
, url
, gzfile
,
bzfile
, xzfile
, unz
, socketConnection
,
socketAccept
and serverSocket
return a connection object which inherits from class
"connection"
and has a first more specific class.
open
and flush
return NULL
, invisibly.
close
returns either NULL
or an integer status,
invisibly. The status is from when the connection was last closed and
is available only for some types of connections (e.g., pipes, files and
fifos): typically zero values indicate success. Negative values will
result in a warning; if writing, these may indicate write failures and should
not be ignored. Connections should be closed explicitly when finished
with to avoid wasting resources and to reduce the risk that some buffered
data in output connections would be lost (see on.exit()
for
how to run code also in case of error).
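The advice about closing connections explicitly can be sketched as follows. This is a minimal illustration, not a prescribed pattern: `write_report()` is a hypothetical helper introduced here, and the temporary file path is arbitrary.

```r
# Hypothetical helper: writes lines to `path`, always closing the connection.
write_report <- function(path, lines) {
  con <- file(path, open = "w")
  # Registered right after opening, so it runs on normal exit and on error.
  on.exit(close(con), add = TRUE)
  writeLines(lines, con)
  invisible(path)
}

tmp <- tempfile(fileext = ".txt")
write_report(tmp, c("alpha", "beta"))
report_lines <- readLines(tmp)
unlink(tmp)
```

Registering `close()` with `on.exit()` means buffered output is flushed and the connection is released even if `writeLines()` fails part-way through.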
isOpen
returns a logical value, whether the connection is
currently open.
isIncomplete
returns a logical value, whether the last read attempt
from a non-blocking connection provided no data (currently no data from a
socket or an unterminated line in readLines
), or for an
output text connection whether there is unflushed output. See example
below.
socketTimeout
returns the old timeout value of a socket connection.
url
and file
support URL schemes ‘file://’,
‘http://’, ‘https://’ and ‘ftp://’.
method = "libcurl"
allows more schemes: exactly which schemes
is platform-dependent (see libcurlVersion
), but all
platforms will support ‘https://’ and most platforms will support
‘ftps://’.
Support for the ‘ftp://’ scheme by the "internal"
method was
deprecated in R 4.1.1 and removed in R 4.2.0.
Most methods do not percent-encode special characters such as spaces
in ‘http://’ URLs (see URLencode
), but it seems the
"wininet"
method does.
A note on ‘file://’ URLs (which are handled by the same internal
code irrespective of argument method
). The most general form
(from RFC1738) is ‘file://host/path/to/file’, but R only accepts
the form with an empty host
field referring to the local
machine.
On a Unix-alike, this is then ‘file:///path/to/file’, where ‘path/to/file’ is relative to ‘/’. So although the third slash is strictly part of the specification not part of the path, this can be regarded as a way to specify the file ‘/path/to/file’. It is not possible to specify a relative path using a file URL.
In this form the path is relative to the root of the filesystem, which is not a
Windows concept. The standard form on Windows is
‘file:///d:/R/repos’: for compatibility with earlier versions of
R and Unix versions, any other form is parsed by R as ‘file://’
plus path_to_file
. Also, backslashes are accepted within the
path even though RFC1738 does not allow them.
No attempt is made to decode a percent-encoded ‘file:’ URL: call
URLdecode
if necessary.
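As a small sketch of the ‘file://’ form described above, the following reads a temporary file through a file URL. The variable names are illustrative only, and the path is assumed to contain no characters that would need percent-encoding.

```r
# Write a small temporary file, then read it back through a 'file://' URL.
tmp <- tempfile(fileext = ".txt")
writeLines(c("first", "second"), tmp)

# Absolute path with forward slashes. A Unix path already starts with "/",
# giving 'file:///path'; a Windows path (d:/...) needs the extra slash.
abs_path <- normalizePath(tmp, winslash = "/")
prefix <- if (.Platform$OS.type == "windows") "file:///" else "file://"
file_url <- paste0(prefix, abs_path)

con <- file(file_url, open = "r")
url_lines <- readLines(con)
close(con)
unlink(tmp)
```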
All the methods attempt to follow redirected HTTP and HTTPS URLs.
Server-side cached data is always accepted.
Function download.file
and several contributed packages
provide more comprehensive facilities to download from URLs.
Possible values for the argument open
are
"r"
or "rt"
Open for reading in text mode.
"w"
or "wt"
Open for writing in text mode.
"a"
or "at"
Open for appending in text mode.
"rb"
Open for reading in binary mode.
"wb"
Open for writing in binary mode.
"ab"
Open for appending in binary mode.
"r+"
, "r+b"
Open for reading and writing.
"w+"
, "w+b"
Open for reading and writing, truncating file initially.
"a+"
, "a+b"
Open for reading and appending.
Not all modes are applicable to all connections: for example URLs can only be opened for reading. Only file and socket connections can be opened for both reading and writing. An unsupported mode is usually silently substituted.
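A short sketch of some of these modes on an ordinary file connection (the file and variable names are arbitrary):

```r
tmp <- tempfile()

# "wb": write binary data; "rb": read it back unchanged.
con <- file(tmp, open = "wb")
writeBin(1:3, con)
close(con)

con <- file(tmp, open = "rb")
ints <- readBin(con, what = "integer", n = 3L)
close(con)

# "w" truncates and writes text; "a" appends to what is already there.
con <- file(tmp, open = "w"); writeLines("one", con); close(con)
con <- file(tmp, open = "a"); writeLines("two", con); close(con)
txt <- readLines(tmp)
unlink(tmp)
```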
If a file or fifo is created on a Unix-alike, its permissions will be
the maximal allowed by the current setting of umask
(see
Sys.umask
).
For many connections there is little or no difference between text and
binary modes. For file-like connections on Windows, translation of
line endings (between LF and CRLF) is done in text mode only (but text
read operations on connections such as readLines
,
scan
and source
work for any form of line
ending). Various R operations are possible in only one of the modes:
for example pushBack
is text-oriented and is only
allowed on connections open for reading in text mode, and binary
operations such as readBin
, load
and
save
can only be done on binary-mode connections.
The mode of a connection is determined when actually opened, which is
deferred if open = ""
is given (the default for all but socket
connections). An explicit call to open
can specify the mode,
but otherwise the mode will be "r"
. (gzfile
,
bzfile
and xzfile
connections are exceptions, as the
compressed file always has to be opened in binary mode and no
conversion of line-endings is done even on Windows, so the default
mode is interpreted as "rb"
.) Most operations that need write
access or text-only or binary-only mode will override the default mode
of a not-yet-open connection.
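The difference between deferred and explicit opening can be illustrated with a plain file connection. This is only a sketch; the variable names are made up for the example.

```r
tmp <- tempfile()
writeLines(c("a", "b", "c"), tmp)

con <- file(tmp)                      # open = "" (default): created, not opened
opened_at_creation <- isOpen(con)

first_read <- readLines(con, n = 1L)  # opens in "r" mode, reads, closes again
still_closed <- !isOpen(con)

open(con, "r")                        # explicit open: connection stays open
line1 <- readLines(con, n = 1L)
line2 <- readLines(con, n = 1L)       # continues from the current position
close(con)
unlink(tmp)
```

Because the first `readLines()` call opened and closed the connection itself, the later explicit `open()` starts again from the beginning of the file.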
Append modes need to be considered carefully for compressed-file
connections. They do not produce a single compressed stream
on the file, but rather append a new compressed stream to the file.
Readers may or may not read beyond end of the first stream: currently
R does so for gzfile
, bzfile
and xzfile
connections.
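A minimal sketch of appending to a compressed file and reading both streams back with `gzfile`, relying on the reading behaviour described above:

```r
gz <- tempfile(fileext = ".gz")

# Each open/close cycle writes its own compressed stream into the file.
con <- gzfile(gz, open = "w"); writeLines("first stream", con); close(con)
con <- gzfile(gz, open = "a"); writeLines("second stream", con); close(con)

# Reading continues past the end of the first compressed stream.
con <- gzfile(gz)
both <- readLines(con)
close(con)
unlink(gz)
```

Other tools reading such a file may stop after the first stream, so this layout is best avoided for files that are shared outside R.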
R supports gzip
, bzip2
and xz
compression (also read-only support for its precursor, lzma
compression).
For reading, the type of compression (if any) can be determined from
the first few bytes of the file. Thus for file(raw = FALSE)
connections, if open
is ""
, "r"
or "rt"
the connection can read any of the compressed file types as well as
uncompressed files. (Using "rb"
will allow compressed files to
be read byte-by-byte.) Similarly, gzfile
connections can read
any of the forms of compression and uncompressed files in any read
mode.
(The type of compression is determined when the connection is created
if open
is unspecified and a file of that name exists. If the
intention is to open the connection to write a file with a
different form of compression under that name, specify
open = "w"
when the connection is created or
unlink
the file before creating the connection.)
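The automatic detection can be sketched by writing a file with `bzfile` and reading it back through `file` and `gzfile`. The names used are illustrative.

```r
bz <- tempfile(fileext = ".bz2")
con <- bzfile(bz, open = "w")
writeLines(c("compressed", "with bzip2"), con)
close(con)

# file() in a text read mode detects the compression from the first bytes.
con <- file(bz, open = "r")
via_file <- readLines(con)
close(con)

# gzfile() likewise reads other compression types, not only gzip.
con <- gzfile(bz)
via_gzfile <- readLines(con)
close(con)
unlink(bz)
```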
For write-mode connections, compress
specifies how hard the
compressor works to minimize the file size, and higher values need
more CPU time and more working memory (up to ca 800Mb for
xzfile(compress = 9)
). For xzfile
negative values of
compress
correspond to adding the xz
argument
-e: this takes more time (double?) to compress but may
achieve (slightly) better compression. The default (6
) has
good compression and modest (100Mb memory) usage: but if you are using
xz
compression you are probably looking for high compression.
Choosing the type of compression involves tradeoffs: gzip
,
bzip2
and xz
are successively less widely supported,
need more resources for both compression and decompression, and
achieve more compression (although individual files may buck the
general trend). Typical experience is that bzip2
compression
is 15% better on text files than gzip
compression, and
xz
with maximal compression 30% better. The experience with
R save
files is similar, but on some large ‘.rda’
files xz
compression is much better than the other two. With
current computers decompression times even with compress = 9
are typically modest and reading compressed files is usually faster
than uncompressed ones because of the reduction in disc activity.
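To get a feel for these tradeoffs on your own data, file sizes can be compared directly. The sketch below uses a hypothetical helper, `compressed_size()`, and deliberately repetitive input, so the ratios it reports say nothing about typical files.

```r
# Highly repetitive text, so every compressor should shrink it substantially.
lines <- rep("The quick brown fox jumps over the lazy dog.", 2000)

# Hypothetical helper: write `lines` through the connection constructor
# `opener` and return the resulting file size in bytes.
compressed_size <- function(opener, ...) {
  path <- tempfile()
  con <- opener(path, open = "w", ...)
  writeLines(lines, con)
  close(con)
  size <- file.size(path)
  unlink(path)
  size
}

sizes <- c(
  none  = compressed_size(file),
  gzip  = compressed_size(gzfile, compression = 6),
  bzip2 = compressed_size(bzfile, compression = 9),
  xz    = compressed_size(xzfile, compression = 6)
)
sizes
```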
The encoding of the input/output stream of a connection can be
specified by name in the same way as it would be given to
iconv
: see that help page for how to find out what
encoding names are recognized on your platform. Additionally,
""
and "native.enc"
both mean the ‘native’
encoding, that is the internal encoding of the current locale and
hence no translation is done.
When writing to a text connection, the connections code always assumes its
input is in native encoding, so e.g. writeLines
has to
convert text to native encoding. The native encoding is UTF-8 on most
systems (since R 4.2 also on recent Windows) and can represent all
characters. writeLines
does not do the conversion when
useBytes=TRUE
(for expert use only, only useful on systems with native
encoding other than UTF-8), but the connections code still behaves as if
the text was in native encoding, so any attempt to convert encoding
(encoding
argument other than ""
and "native.enc"
) in
connections will produce incorrect results.
When reading from a text connection, the connections code re-encodes the
input to native encoding (from the encoding given by the encoding
argument). On systems where UTF-8 is not the native encoding, one can
read text not representable in the native encoding using
readLines
and scan
by providing them with an
unopened connection that has been created with the encoding
argument specifying the input encoding. readLines
and
scan
would then instruct the connections code to convert the
text to UTF-8 (instead of native encoding) and they will return it marked
(aka declared, see Encoding
)
as "UTF-8"
. Finally and for expert use only, one may disable
re-encoding of input by specifying ""
or "native.enc"
as
encoding
for the connection, but then mark the text as being
"UTF-8"
or "latin1"
via the encoding
argument
of readLines
and scan
.
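Reading with a declared input encoding might look like the sketch below. It assumes that the platform's `iconv` recognizes the name "latin1"; the bytes are written with `writeBin` so that the file content does not depend on the current locale.

```r
tmp <- tempfile()
# The bytes of "café" in Latin-1 (0xe9 is e-acute), plus a newline.
writeBin(as.raw(c(0x63, 0x61, 0x66, 0xe9, 0x0a)), tmp)

# Unopened connection declaring the input encoding; readLines() opens it.
con <- file(tmp, encoding = "latin1")
word <- readLines(con)
close(con)
unlink(tmp)

# Inspect the result as UTF-8 bytes, whatever the native encoding is.
utf8_bytes <- charToRaw(enc2utf8(word))
```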
Re-encoding only works for connections in text mode: reading from a
connection with re-encoding specified in binary mode will read the
stream of bytes, but mixing text and binary mode reads (e.g., mixing
calls to readLines
and readChar
) is likely
to lead to incorrect results.
The encodings "UCS-2LE"
and "UTF-16LE"
are treated
specially, as they are appropriate values for Windows ‘Unicode’
text files. If the first two bytes are the Byte Order Mark
0xFEFF
then these are removed as some implementations of
iconv
do not accept BOMs. Note that whereas most
implementations will handle BOMs using encoding "UCS-2"
and
choose the appropriate byte order, some (including earlier versions of
glibc
) will not. There is a subtle distinction between
"UTF-16"
and "UCS-2"
(see
https://en.wikipedia.org/wiki/UTF-16): the use of characters in
the ‘Supplementary Planes’ which need surrogate pairs is very
rare so "UCS-2LE"
is an appropriate first choice (as it is more
widely implemented).
The encoding "UTF-8-BOM"
is accepted for reading and will
remove a Byte Order Mark if present (which it often is for files and
webpages generated by Microsoft applications). If a BOM is required
(it is not recommended) when writing it should be written explicitly,
e.g. by writeChar("\ufeff", con, eos = NULL)
or
writeBin(as.raw(c(0xef, 0xbb, 0xbf)), binary_con)
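A small sketch of reading a file that starts with a BOM, using the "UTF-8-BOM" encoding described above (the file content is constructed byte by byte for the example):

```r
tmp <- tempfile()
# A UTF-8 byte order mark (EF BB BF) followed by the ASCII line "abc".
writeBin(c(as.raw(c(0xef, 0xbb, 0xbf)), charToRaw("abc\n")), tmp)

con <- file(tmp, encoding = "UTF-8-BOM")
without_bom <- readLines(con)   # the BOM is dropped before the text is returned
close(con)
unlink(tmp)
```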
Encoding names "utf8"
, "mac"
and "macroman"
are
not portable, and not supported on all current R platforms.
"UTF-8"
is portable and "macintosh"
is the official (and
most widely supported) name for ‘Mac Roman’. (R maps
"utf8"
to "UTF-8"
internally.)
Requesting a conversion that is not supported is an error, reported when the connection is opened. Exactly what happens when the requested translation cannot be done for invalid input is in general undocumented. On output the result is likely to be that up to the error, with a warning. On input, it will most likely be all or some of the input up to the error.
It may be possible to deduce the current native encoding from
Sys.getlocale("LC_CTYPE")
, but not all OSes record it.
Whether or not the connection blocks can be specified for file, url (default yes), fifo and socket connections (default not).
In blocking mode, functions using the connection do not return to the R evaluator until the read/write is complete. In non-blocking mode, operations return as soon as possible, so on input they will return with whatever input is available (possibly none) and for output they will return whether or not the write succeeded.
The function readLines
behaves differently in respect of
incomplete last lines in the two modes: see its help page.
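As a rough illustration of the non-blocking case, the sketch below reads a file whose last line has no terminating newline. It relies on the behaviour documented for `readLines` and `isIncomplete`; a file stands in for the sockets or fifos where this usually matters.

```r
tmp <- tempfile()
# Two lines, the second without a terminating newline.
writeBin(charToRaw("one\ntwo"), tmp)

con <- file(tmp, open = "r", blocking = FALSE)
complete_lines <- readLines(con)   # the unterminated "two" is held back
pending <- isIncomplete(con)       # reports that a partial line is waiting
close(con)
unlink(tmp)
```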
Even when a connection is in blocking mode, attempts are made to ensure that it does not block the event loop and hence the operation of GUI parts of R. These do not always succeed, and the whole R process will be blocked during a DNS lookup on Unix, for example.
Most blocking operations on HTTP/FTP URLs and on sockets are subject to the
timeout set by options("timeout")
. Note that this is a timeout
for no response, not for the whole operation. The timeout is set at
the time the connection is opened (more precisely, when the last
connection of that type – ‘http:’, ‘ftp:’ or socket – was
opened).
Fifos default to non-blocking. That follows S version 4 and is probably most natural, but it does have some implications. In particular, opening a non-blocking fifo connection for writing (only) will fail unless some other process is reading on the fifo.
Opening a fifo for both reading and writing (in any mode: one can only
append to fifos) connects both sides of the fifo to the R process,
and provides a similar facility to file()
.
file
can be used with description = "clipboard"
in mode "r"
only. This reads the X11 primary selection (see
https://specifications.freedesktop.org/clipboards-spec/clipboards-latest.txt),
which can also be specified as "X11_primary"
and the secondary
selection as "X11_secondary"
. On most systems the clipboard
selection (that used by ‘Copy’ from an ‘Edit’ menu) can
be specified as "X11_clipboard"
.
When a clipboard is opened for reading, the contents are immediately copied to internal storage in the connection.
Unix users wishing to write to one of the X11 selections may be
able to do so via xclip
(https://github.com/astrand/xclip) or xsel
(https://www.vergenet.net/~conrad/software/xsel/), for example by
pipe("xclip -i", "w")
for the primary selection.
macOS users can use pipe("pbpaste")
and
pipe("pbcopy", "w")
to read from and write to that system's
clipboard.
In most cases description strings are translated to the native encoding.
The exceptions are file
and pipe
on Windows, where a
description
which is marked as being in UTF-8 is passed to
Windows as a ‘wide’ character string. This allows files with
names not in the native encoding to be opened on file systems which
use Unicode file names (such as NTFS but not FAT32).
Most modern browsers do not support ‘ftp://’ URLs, and ‘https://’ ones are much preferred for use in R.
It is intended that R will continue to allow such URLs for as long as
libcurl
does, but as they become rarer this is increasingly
untested. What ‘protocols’ the version of libcurl
being used supports can be seen by calling libcurlVersion()
.
There is a limit on the number of connections which can be allocated (not necessarily open) at any one time. It is good practice to close connections when finished with, but if necessary garbage-collection will be invoked to close those connections without any R object referring to them.
The default limit is 128 (including the three terminal connections,
stdin
, stdout
and stderr
). This can be increased
when R is started using the option --max-connections=N, where
the maximum allowed value is 4096.
However, many types of connections use other resources which are
themselves limited. Notably on Unix, ‘file descriptors’ are
by default limited per process: this limits the number of
connections using files, pipes and fifos. (The default limit is 256
on macOS (and Solaris) but 1024 on Linux. The limit can be raised in the
shell used to launch R, for example by ulimit -n
.) File
descriptors are used for many other purposes including dynamically
loading DSO/DLLs (see dyn.load
) which may use up to 60%
of the limit.
Windows has a default limit of 512 open C file streams: these are used
by at least file
, gzfile
, bzfile
, xzfile
,
pipe
, url
and unz
connections applied to files
(rather than URLs).
Package parallel's makeCluster
uses socket
connections to communicate with the worker processes, one per worker.
R's connections are modelled on those in S version 4 (see Chambers,
1998). However R goes well beyond the S model, for example in output
text connections and URL, compressed and socket connections.
The default open mode in R is "r"
except for socket connections.
This differs from S, where it is the equivalent of "r+"
,
known as "*"
.
On (historic) platforms where vsnprintf
does not return the needed
length of output there is a 100,000 byte output limit on the length of
a line for text output on fifo
, gzfile
, bzfile
and
xzfile
connections: longer lines will be truncated with a
warning.
Chambers, J. M. (1998) Programming with Data. A Guide to the S Language. Springer.
Ripley, B. D. (2001). “Connections.” R News, 1(1), 16–7. https://www.r-project.org/doc/Rnews/Rnews_2001-1.pdf.
textConnection
, seek
,
showConnections
, pushBack
.
Functions making direct use of connections are (text-mode)
readLines
, writeLines
, cat
,
sink
, scan
, parse
,
read.dcf
, dput
, dump
and
(binary-mode) readBin
, readChar
,
writeBin
, writeChar
, load
and save
.
capabilities
to see if fifo
connections are
supported by this build of R.
gzcon
to wrap gzip
(de)compression around a
connection.
options
HTTPUserAgent
, internet.info
and
timeout
are used by some of the methods for URL connections.
memCompress
for more ways to (de)compress and references
on data compression.
extSoftVersion
for the versions of the zlib
(for
gzfile
), bzip2
and xz
libraries in use.
To flush output to the Windows and macOS consoles, see
flush.console
.
zzfil <- tempfile(fileext=".data")
zz <- file(zzfil, "w")  # open an output file connection
cat("TITLE extra line", "2 3 5 7", "", "11 13 17", file = zz, sep = "\n")
cat("One more line\n", file = zz)
close(zz)
readLines(zzfil)
unlink(zzfil)

zzfil <- tempfile(fileext=".gz")
zz <- gzfile(zzfil, "w")  # compressed file
cat("TITLE extra line", "2 3 5 7", "", "11 13 17", file = zz, sep = "\n")
close(zz)
readLines(zz <- gzfile(zzfil))
close(zz)
unlink(zzfil)
zz # an invalid connection

zzfil <- tempfile(fileext=".bz2")
zz <- bzfile(zzfil, "w")  # bzip2-ed file
cat("TITLE extra line", "2 3 5 7", "", "11 13 17", file = zz, sep = "\n")
close(zz)
zz # print() method: invalid connection
print(readLines(zz <- bzfile(zzfil)))
close(zz)
unlink(zzfil)

## An example of a file open for reading and writing
Tpath <- tempfile("test")
Tfile <- file(Tpath, "w+")
c(isOpen(Tfile, "r"), isOpen(Tfile, "w")) # both TRUE
cat("abc\ndef\n", file = Tfile)
readLines(Tfile)
seek(Tfile, 0, rw = "r") # reset to beginning
readLines(Tfile)
cat("ghi\n", file = Tfile)
readLines(Tfile)
Tfile # -> print() : "valid" connection
close(Tfile)
Tfile # -> print() : "invalid" connection
unlink(Tpath)

## We can do the same thing with an anonymous file.
Tfile <- file()
cat("abc\ndef\n", file = Tfile)
readLines(Tfile)
close(Tfile)

## Not run:
## fifo example -- may hang even with OS support for fifos
if(capabilities("fifo")) {
  zzfil <- tempfile(fileext="-fifo")
  zz <- fifo(zzfil, "w+")
  writeLines("abc", zz)
  print(readLines(zz))
  close(zz)
  unlink(zzfil)
}
## End(Not run)

## Unix examples of use of pipes

# read listing of current directory
readLines(pipe("ls -1"))

# remove trailing commas. Suppose
## Not run:
% cat data2_
450, 390, 467, 654, 30, 542, 334, 432, 421,
357, 497, 493, 550, 549, 467, 575, 578, 342,
446, 547, 534, 495, 979, 479
## End(Not run)
# Then read this by
scan(pipe("sed -e s/,$// data2_"), sep = ",")

# convert decimal point to comma in output: see also write.table
# both R strings and (probably) the shell need \ doubled
zzfil <- tempfile("outfile")
zz <- pipe(paste("sed s/\\\\./,/ >", zzfil), "w")
cat(format(round(stats::rnorm(48), 4)), fill = 70, file = zz)
close(zz)
file.show(zzfil, delete.file = TRUE)

## Not run:
## example for a machine running a finger daemon
con <- socketConnection(port = 79, blocking = TRUE)
writeLines(paste0(system("whoami", intern = TRUE), "\r"), con)
gsub(" *$", "", readLines(con))
close(con)
## End(Not run)

## Not run:
## Two R processes communicating via non-blocking sockets
# R process 1
con1 <- socketConnection(port = 6011, server = TRUE)
writeLines(LETTERS, con1)
close(con1)

# R process 2
con2 <- socketConnection(Sys.info()["nodename"], port = 6011)
# as non-blocking, may need to loop for input
readLines(con2)
while(isIncomplete(con2)) {
   Sys.sleep(1)
   z <- readLines(con2)
   if(length(z)) print(z)
}
close(con2)

## examples of use of encodings
# write a file in UTF-8
cat(x, file = (con <- file("foo", "w", encoding = "UTF-8"))); close(con)
# read a 'Windows Unicode' file
A <- read.table(con <- file("students", encoding = "UCS-2LE")); close(con)
## End(Not run)
Constants built into R.
LETTERS
letters
month.abb
month.name
pi
R has a small number of built-in constants.
The following constants are available:
LETTERS
: the 26 upper-case letters of the Roman
alphabet;
letters
: the 26 lower-case letters of the Roman
alphabet;
month.abb
: the three-letter abbreviations for the
English month names;
month.name
: the English names for the months of the
year;
pi
: the ratio of the circumference of a circle to its
diameter.
These are implemented as variables in the base namespace taking appropriate values.
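Because the constants are ordinary variables, they can be indexed like any other vector, and a user assignment can mask them. The following is a small sketch; the variable names are illustrative.

```r
# Index the built-in vectors like any other character vector.
initials <- LETTERS[c(18, 19)]               # "R" "S"
quarter_starts <- month.abb[c(1, 4, 7, 10)]

# A user assignment masks the constant in the current environment ...
pi <- 3
masked <- pi
# ... but the base value stays reachable, and rm() removes the mask.
true_pi <- base::pi
rm(pi)
```

Referring to `base::pi` explicitly is one way to guard against accidental masking in long scripts.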
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
Quotes
for the parsing of character constants,
NumericConstants
for numeric constants.
## John Machin (ca 1706) computed pi to over 100 decimal places
## using the Taylor series expansion of the second term of
pi - 4*(4*atan(1/5) - atan(1/239))

## months in English
month.name
## months in your current locale
format(ISOdate(2000, 1:12, 1), "%B")
format(ISOdate(2000, 1:12, 1), "%b")
The R Who-is-who, describing who made significant contributions to the development of R.
contributors()
These are the basic control-flow constructs of the R language. They function in much the same way as control statements in any Algol-like language. They are all reserved words.
if(cond) expr
if(cond) cons.expr else alt.expr

for(var in seq) expr
while(cond) expr
repeat expr
break
next

x %||% y
cond |
A length-one logical vector that is not |
var |
A syntactical name for a variable. |
seq |
An expression evaluating to a vector (including a list and
an expression) or to a pairlist or |
expr , cons.expr , alt.expr , x , y
|
An expression in a formal sense. This is either a
simple expression or a so-called compound expression, usually
of the form |
break
breaks out of a for
, while
or repeat
loop; control is transferred to the first statement outside the
inner-most loop. next
halts the processing of the current
iteration and advances the looping index. Both break
and
next
apply only to the innermost of nested loops.
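That `break` and `next` affect only the innermost loop can be seen in a small nested example (a sketch; the names are arbitrary):

```r
visited <- character(0)
for (i in 1:3) {
  for (j in 1:3) {
    if (j == 2) next    # skip the rest of this inner iteration only
    if (j == 3) break   # leave the inner loop; the outer loop carries on
    visited <- c(visited, paste0(i, "-", j))
  }
}
visited   # "1-1" "2-1" "3-1"
```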
Note that it is a common mistake to forget to put braces ({ .. }
)
around your statements, e.g., after if(..)
or for(....)
.
In particular, you should not have a newline between }
and
else
to avoid a syntax error in entering an if ... else
construct at the keyboard or via source
.
For that reason, one (somewhat extreme) attitude of defensive programming
is to always use braces, e.g., for if
clauses.
The seq
in a for
loop is evaluated at the start of
the loop; changing it subsequently does not affect the loop. If
seq
has length zero the body of the loop is skipped. Otherwise the
variable var
is assigned in turn the value of each element of
seq
. You can assign to var
within the body of the loop,
but this will not affect the next iteration. When the loop terminates,
var
remains as a variable containing its latest value.
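These rules can be checked with a short loop that modifies both the sequence and the loop variable inside its body (an illustrative sketch):

```r
s <- 1:3
seen <- integer(0)
for (v in s) {
  s <- 10:20        # ignored: the loop iterates over the original 1:3
  seen <- c(seen, v)
  v <- v * 100L     # ignored at the next iteration, which reassigns v
}
seen   # 1 2 3
v      # 300: the value assigned in the final iteration
```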
The null coalescing operator %||%
is a simple 1-line function:
x %||% y
is an idiomatic way to call
if (is.null(x)) y else x
# or equivalently, of course,
if(!is.null(x)) x else y
Inspired by Ruby, it was first proposed by Hadley Wickham.
if
returns the value of the expression evaluated, or
NULL
invisibly if none was (which may happen if there is no
else
).
for
, while
and repeat
return NULL
invisibly.
for
sets var
to the last used element of seq
,
or to NULL
if it was of length zero.
break
and next
do not return a value as they transfer
control within the loop.
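The return values described above can be inspected by assigning the constructs directly. This is a sketch; the variable names are made up.

```r
no_branch <- if (FALSE) "never"            # no else, condition FALSE: NULL
loop_value <- for (i in 1:3) i             # loops return NULL

i <- "before"
for (i in integer(0)) stop("not reached")  # zero-length seq: body skipped
i_after_empty <- i                         # NULL, as stated above
```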
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
Syntax
for the basic R syntax and operators,
Paren
for parentheses and braces.
ifelse
, switch
for other ways to control flow.
for(i in 1:5) print(1:i)
for(n in c(2,5,10,20,50)) {
   x <- stats::rnorm(n)
   cat(n, ": ", sum(x^2), "\n", sep = "")
}
f <- factor(sample(letters[1:5], 10, replace = TRUE))
for(i in unique(f)) print(i)

res <- {}
res %||% "alternative result"
x <- head(x) %||% stop("parsed, but *not* evaluated..")

res <- if(sum(x) > 7.5) mean(x) # may be NULL
res %||% "sum(x) <= 7.5"
R is released under the ‘GNU Public License’: see
license
for details. The license describes your right
to use R. Copyright is concerned with ownership of intellectual
rights, and some of the software used has conditions that the
copyright must be explicitly stated: see the ‘Details’ section. We
are grateful to these people and other contributors (see
contributors
) for the ability to use their work.
The file ‘R_HOME/COPYRIGHTS’ lists the copyrights in full detail.
Given matrices x
and y
as arguments, return a matrix
cross-product. This is formally equivalent to (but faster than) the call
t(x) %*% y
(crossprod
) or
x %*% t(y)
(tcrossprod
).
These are generic functions since R 4.4.0: methods can be written
individually or via the matOps
group
generic function; it dispatches to S3 and S4 methods.
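The stated equivalences can be verified numerically on small matrices. This sketch uses `all.equal` rather than `identical` so that attributes such as dimnames do not affect the comparison.

```r
x <- matrix(1:6, nrow = 3)    # 3 x 2
y <- matrix(7:12, nrow = 3)   # 3 x 2

cp   <- crossprod(x, y)       # 2 x 2, same values as t(x) %*% y
tcp  <- tcrossprod(x, y)      # 3 x 3, same values as x %*% t(y)
gram <- crossprod(x)          # y = NULL: same values as crossprod(x, x)
```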
crossprod(x, y = NULL, ...)
tcrossprod(x, y = NULL, ...)
x , y
|
numeric or complex matrices (or vectors): |
... |
potential further arguments for methods. |
A double or complex matrix, with appropriate dimnames
taken
from x
and y
.
When x
or y
are not matrices, they are treated as column or
row matrices, but their names
are usually not
promoted to dimnames
. Hence, currently, the last
example has empty dimnames.
In the same situation, these matrix products (also %*%
)
are more flexible in promotion of vectors to row or column matrices, such
that more cases are allowed, since R 3.2.0.
The propagation of NaN
/Inf
values, precision, and performance of matrix
products can be controlled by options("matprod")
.
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
(z <- crossprod(1:4))    # = sum(1 + 2^2 + 3^2 + 4^2)
drop(z)                  # scalar
x <- 1:4; names(x) <- letters[1:4]; x
tcrossprod(as.matrix(x)) # is
identical(tcrossprod(as.matrix(x)),
          crossprod(t(x)))
tcrossprod(x)            # no dimnames

m <- matrix(1:6, 2,3) ; v <- 1:3; v2 <- 2:1
stopifnot(identical(tcrossprod(v, m), v %*% t(m)),
          identical(tcrossprod(v, m), crossprod(v, t(m))),
          identical(crossprod(m, v2), t(m) %*% v2))
Report information on the C stack size and usage (if available).
Cstack_info()
On most platforms, C stack information is recorded when R is initialized and used for stack-checking. If this information is unavailable, the size will be returned as NA, and stack-checking is not performed.
The information on the stack base address is thought to be accurate on Windows, Linux (using glibc), macOS and FreeBSD but a heuristic is used on other platforms. Because this might be slightly inaccurate, the current usage could be estimated as negative. (The heuristic is not used on embedded uses of R on platforms where the stack base information is not thought to be accurate.)
The ‘evaluation depth’ is the number of nested R expressions currently under evaluation: this has a limit controlled by options("expressions").
An integer vector. This has named elements
size |
The size of the stack (in bytes), or |
current |
The estimated current usage (in bytes), possibly |
direction |
|
eval_depth |
The current evaluation depth (including two calls
for the call to |
Cstack_info()
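A minimal sketch of inspecting the result; the variable name info is an arbitrary choice, and the actual numbers are platform-dependent, so none are shown.

```r
info <- Cstack_info()

names(info)          # size, current, direction, eval_depth
is.integer(info)     # the result is an integer vector

# size may be NA where stack information is unavailable, so guard before use
if (!is.na(info[["size"]])) {
  cat("C stack size (bytes):", info[["size"]], "\n")
}
```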
Returns a vector whose elements are the cumulative sums, products, minima or maxima of the elements of the argument.
cumsum(x)
cumprod(x)
cummax(x)
cummin(x)
x |
a numeric or complex (not |
These are generic functions: methods can be defined for them individually or via the Math group generic.
A vector of the same length and type as x (after coercion), except that cumprod returns a numeric vector for integer input (for consistency with *). Names are preserved.
An NA value in x causes the corresponding and following elements of the return value to be NA, as does integer overflow in cumsum (with a warning).
In the complex case with NAs, these NA elements may have finite real or imaginary parts, notably for cumsum(), fulfilling the identity Im(cumsum(x)) ≡ cumsum(Im(x)).
cumsum and cumprod are S4 generic functions: methods can be defined for them individually or via the Math group generic. cummax and cummin are individually S4 generic functions.
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole. (cumsum only.)
cumsum(1:10)
cumprod(1:10)
cummin(c(3:1, 2:0, 4:2))
cummax(c(3:1, 2:0, 4:2))
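The rules on NA propagation, result type and names can be illustrated with a short sketch. The vectors x and v and the name ov are made up for this example.

```r
x <- c(2L, 3L, NA, 5L)
cumsum(x)                 # NA from the third element onwards
cummax(x)                 # likewise

typeof(cumsum(1:5))       # integer input stays integer
typeof(cumprod(1:5))      # cumprod() returns double for integer input

v <- c(a = 1, b = 2, c = 3)
names(cumsum(v))          # names are preserved

# Integer overflow in cumsum() yields NA (the warning is suppressed here)
ov <- suppressWarnings(cumsum(c(.Machine$integer.max, 1L)))
ov
```

Converting with as.numeric() before calling cumsum() is a common way to avoid the integer overflow.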
Retrieve the headers for a URL for a supported protocol such as ‘http://’, ‘ftp://’, ‘https://’ and ‘ftps://’.
curlGetHeaders(url, redirect = TRUE, verify = TRUE, timeout = 0L, TLS = "")
url |
character string specifying the URL. |
redirect |
logical: should redirections be followed? |
verify |
logical: should certificates be verified as valid and applying to that host? |
timeout |
integer: the maximum time in seconds the request is allowed to take. Non-positive and invalid values are ignored (including the default). (Added in R 4.1.0.) |
TLS |
character: the minimum version of the TLS protocol to be used
for ‘https://’ URLs: the default ( |
This reports what curl -I -L or curl -I would report. For a ‘ftp://’ URL the ‘headers’ are a record of the conversation between client and server before data transfer.
Only 500 header lines will be reported: there is a limit of 20 redirections so this should suffice (and even 20 would indicate problems).
If argument timeout is not set to a positive integer this uses getOption("timeout") which defaults to 60 seconds. As the request cannot be interrupted you may want to consider a shorter value.
To see all the details of the interaction with the server(s) set options(internet.info = 1).
HTTP[S] servers are allowed to refuse requests to read the headers and some do: this will result in a status of 405.
For possible issues with secure URLs (especially on Windows) see download.file.
There is a security risk in not verifying certificates, but as only the headers are captured it is slight. Usually looking at the URL in a browser will reveal what the problem is (and it may well be machine-specific).
A character vector with integer attribute "status" (the last-received ‘status’ code). If redirection occurs this will include the headers for all the URLs visited.
For the interpretation of ‘status’ codes see https://en.wikipedia.org/wiki/List_of_HTTP_status_codes and https://en.wikipedia.org/wiki/List_of_FTP_server_return_codes. A successful FTP connection will usually have status 250, 257 or 350.
capabilities("libcurl") to see if this is supported. libcurlVersion for the version of libcurl in use.
options HTTPUserAgent and timeout are used.
## needs Internet access, results vary
curlGetHeaders("http://bugs.r-project.org")   ## this redirects to https://
## 2023-04: replaces slow and unreliable https://httpbin.org/status/404
curlGetHeaders("https://developer.R-project.org/inet-tests/not-found")   ## returns status
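Because the request needs Internet access and cannot be interrupted, it can help to wrap the call defensively. The helper below, header_status(), is a hypothetical convenience function written for this sketch (it is not part of base R). It uses only the documented "status" attribute and the timeout argument.

```r
# Hypothetical helper: return the last-received status code for a URL,
# or NA if libcurl support is missing or the request fails.
header_status <- function(url, timeout = 10L) {
  if (!isTRUE(capabilities("libcurl")[[1L]])) return(NA_integer_)
  tryCatch(
    attr(curlGetHeaders(url, timeout = timeout), "status"),
    error = function(e) NA_integer_
  )
}

## Not run here (needs Internet access, results vary):
## header_status("https://developer.R-project.org/inet-tests/not-found")
```

A short explicit timeout keeps a slow server from blocking the session for the full getOption("timeout") period.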
cut divides the range of x into intervals and codes the values in x according to the interval into which they fall. The leftmost interval corresponds to level one, the next leftmost to level two and so on.
cut(x, ...)

## Default S3 method:
cut(x, breaks, labels = NULL,
    include.lowest = FALSE, right = TRUE, dig.lab = 3,
    ordered_result = FALSE, ...)
x |
a numeric vector which is to be converted to a factor by cutting. |
breaks |
either a numeric vector of two or more unique cut points or a
single number (greater than or equal to 2) giving the number of
intervals into which |
labels |
labels for the levels of the resulting category. By default,
labels are constructed using |
include.lowest |
logical, indicating if an ‘x[i]’ equal to
the lowest (or highest, for |
right |
logical, indicating if the intervals should be closed on the right (and open on the left) or vice versa. |
dig.lab |
integer which is used when labels are not given. It determines the number of digits used in formatting the break numbers. |
ordered_result |
logical: should the result be an ordered factor? |
... |
further arguments passed to or from other methods. |
When breaks is specified as a single number, the range of the data is divided into breaks pieces of equal length, and then the outer limits are moved away by 0.1% of the range to ensure that the extreme values both fall within the break intervals. (If x is a constant vector, equal-length intervals are created, one of which includes the single value.)
If a labels parameter is specified, its values are used to name the factor levels. If none is specified, the factor level labels are constructed as "(b1, b2]", "(b2, b3]" etc. for right = TRUE and as "[b1, b2)", ... if right = FALSE. In this case, dig.lab indicates the minimum number of digits that should be used in formatting the numbers b1, b2, .... A larger value (up to 12) will be used if needed to distinguish between any pair of endpoints: if this fails labels such as "Range3" will be used. Formatting is done by formatC.
The default method will sort a numeric vector of breaks, but other methods are not required to and labels will correspond to the intervals after sorting.
As from R 3.2.0, getOption("OutDec") is consulted when labels are constructed for labels = NULL.
A factor is returned, unless labels = FALSE which results in an integer vector of level codes.
Values which fall outside the range of breaks are coded as NA, as are NaN and NA values.
Instead of table(cut(x, br)), hist(x, br, plot = FALSE) is more efficient and less memory hungry. Instead of cut(*, labels = FALSE), findInterval() is more efficient.
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
split for splitting a variable according to a group factor; factor, tabulate, table, findInterval.
quantile for ways of choosing breaks of roughly equal content (rather than length).
.bincode for a bare-bones version.
Z <- stats::rnorm(10000)
table(cut(Z, breaks = -6:6))
sum(table(cut(Z, breaks = -6:6, labels = FALSE)))
sum(graphics::hist(Z, breaks = -6:6, plot = FALSE)$counts)

cut(rep(1,5), 4) #-- dummy
tx0 <- c(9, 4, 6, 5, 3, 10, 5, 3, 5)
x <- rep(0:8, tx0)
stopifnot(table(x) == tx0)

table( cut(x, breaks = 8))
table( cut(x, breaks = 3*(-2:5)))
table( cut(x, breaks = 3*(-2:5), right = FALSE))

##--- some values OUTSIDE the breaks :
table(cx  <- cut(x, breaks = 2*(0:4)))
table(cxl <- cut(x, breaks = 2*(0:4), right = FALSE))
which(is.na(cx));  x[is.na(cx)]  #-- the first 9 values 0
which(is.na(cxl)); x[is.na(cxl)] #-- the last 5 values 8

## Label construction:
y <- stats::rnorm(100)
table(cut(y, breaks = pi/3*(-3:3)))
table(cut(y, breaks = pi/3*(-3:3), dig.lab = 4))

table(cut(y, breaks = 1*(-3:3), dig.lab = 4))
# extra digits don't "harm" here
table(cut(y, breaks = 1*(-3:3), right = FALSE))
#- the same, since no exact INT!

## sometimes the default dig.lab is not enough to avoid confusion:
aaa <- c(1,2,3,4,5,2,3,4,5,6,7)
cut(aaa, 3)
cut(aaa, 3, dig.lab = 4, ordered_result = TRUE)

## one way to extract the breakpoints
labs <- levels(cut(aaa, 3))
cbind(lower = as.numeric( sub("\\((.+),.*", "\\1", labs) ),
      upper = as.numeric( sub("[^,]*,([^]]*)\\]", "\\1", labs) ))
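The relationship between cut(labels = FALSE), findInterval() and the treatment of values at or beyond the outer breaks can be sketched as follows. The data x and breaks b are invented for illustration; the comparison with findInterval() assumes its left.open argument, which matches the default right = TRUE intervals.

```r
x <- 1:7
b <- c(0, 2, 4, 8)                      # intervals (0,2], (2,4], (4,8]

codes <- cut(x, breaks = b, labels = FALSE)
codes

# findInterval() with left-open intervals gives the same codes here
fi <- findInterval(x, b, left.open = TRUE)
identical(codes, fi)

# Values outside the breaks are NA; the lowest break itself is excluded
# unless include.lowest = TRUE.
cut(9, breaks = b, labels = FALSE)
cut(0, breaks = b, labels = FALSE)
lowest <- cut(0, breaks = b, labels = FALSE, include.lowest = TRUE)

# A single number of breaks: three equal-length pieces, no value left out
f <- cut(x, breaks = 3)
nlevels(f)
```

Note that findInterval() does not return NA for out-of-range values, so the two only agree for inputs inside the range of the breaks.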
Method for cut applied to date-time objects.
## S3 method for class 'POSIXt'
cut(x, breaks, labels = NULL, start.on.monday = TRUE,
    right = FALSE, ...)

## S3 method for class 'Date'
cut(x, breaks, labels = NULL, start.on.monday = TRUE,
    right = FALSE, ...)
x |
an object inheriting from class |
breaks |
a vector of cut points or number giving the number of
intervals which |
labels |
labels for the levels of the resulting category. By default,
labels are constructed from the left-hand end of the intervals
(which are included for the default value of |
start.on.monday |
logical. If |
right , ...
|
arguments to be passed to or from other methods. |
Note that the default for right differs from the default method. Using include.lowest = TRUE will include both ends of the range of dates.
Using breaks = "quarter" will create intervals of 3 calendar months, with the intervals beginning on January 1, April 1, July 1 or October 1 (based upon min(x)) as appropriate.
A vector of breaks will be sorted before use: labels should correspond to the sorted vector.
A factor is returned, unless labels = FALSE which returns the integer level codes.
Values which fall outside the range of breaks are coded as NA, as are NA values.
## random dates in a 10-week period
cut(ISOdate(2001, 1, 1) + 70*86400*stats::runif(100), "weeks")
cut(as.Date("2001/1/1") + 70*stats::runif(100), "weeks")

# The standards all have midnight as the start of the day, but some
# people incorrectly interpret it at the end of the previous day ...
tm <- seq(as.POSIXct("2012-06-01 06:00"), by = "6 hours", length.out = 24)
aggregate(1:24, list(day = cut(tm, "days")), mean)
# and a version with midnight included in the previous day:
aggregate(1:24, list(day = cut(tm, "days", right = TRUE)), mean)
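As a small sketch of the "quarter" behaviour described above (the dates in d are invented for the example), the levels are labelled by the left-hand end of each 3-month interval, and a date falling exactly on a quarter start belongs to the interval it opens because right = FALSE is the default:

```r
d <- as.Date(c("2024-01-15", "2024-02-10", "2024-04-01", "2024-06-30"))

q <- cut(d, "quarter")
levels(q)        # quarter start dates
as.integer(q)    # 2024-04-01 falls in the second interval
table(q)
```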
Determine the class of an arbitrary R object.
data.class(x)
x |
an R object. |
character string giving the class of x.
The class is the (first element of the) class attribute if this is non-NULL, or inferred from the object's dim attribute if this is non-NULL, or mode(x).
Simply speaking, data.class(x) returns what is typically useful for method dispatching. (Or, what the basic creator functions already attach, and maybe eventually all will attach, as a class attribute.)
For compatibility reasons, there is one exception to the rule above: When x is integer, the result of data.class(x) is "numeric" even when x is classed.
x <- LETTERS
data.class(factor(x))                 # has a class attribute
data.class(matrix(x, ncol = 13))      # has a dim attribute
data.class(list(x))                   # the same as mode(x)
data.class(x)                         # the same as mode(x)

stopifnot(data.class(1:2) == "numeric") # compatibility "rule"
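To make the lookup order concrete, the sketch below compares data.class() with class() on a few unclassed objects; the particular objects are arbitrary choices for illustration.

```r
data.class(1:3)                       # "numeric", whereas class(1:3) is "integer"
class(1:3)

data.class(matrix(1:4, 2))            # from the dim attribute: "matrix"
data.class(array(1:8, c(2, 2, 2)))    # more than two dimensions: "array"

data.class(factor("a"))               # first element of the class attribute
data.class(sum)                       # falls back to mode(): "function"
```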
The function data.frame() creates data frames, tightly coupled collections of variables which share many of the properties of matrices and of lists, used as the fundamental data structure by most of R's modeling software.
data.frame(..., row.names = NULL, check.rows = FALSE,
           check.names = TRUE, fix.empty.names = TRUE,
           stringsAsFactors = FALSE)
... |
these arguments are of either the form |
row.names |
|
check.rows |
if |
check.names |
logical. If |
fix.empty.names |
logical indicating if arguments which are
“unnamed” (in the sense of not being formally called as
|
stringsAsFactors |
logical: should character vectors be converted
to factors? The ‘factory-fresh’ default has been |
A data frame is a list of variables of the same number of rows with unique row names, given class "data.frame". If no variables are included, the row names determine the number of rows.
The column names should be non-empty, and attempts to use empty names will have unsupported results. Duplicate column names are allowed, but you need to use check.names = FALSE for data.frame to generate such a data frame. However, not all operations on data frames will preserve duplicated column names: for example matrix-like subsetting will force column names in the result to be unique.
data.frame converts each of its arguments to a data frame by calling as.data.frame(optional = TRUE). As that is a generic function, methods can be written to change the behaviour of arguments according to their classes: R comes with many such methods. Character variables passed to data.frame are converted to factor columns if not protected by I and argument stringsAsFactors is true. If a list or data frame or matrix is passed to data.frame it is as if each component or column had been passed as a separate argument (except for matrices protected by I).
Objects passed to data.frame should have the same number of rows, but atomic vectors (see is.vector), factors and character vectors protected by I will be recycled a whole number of times if necessary (including as elements of list arguments).
If row names are not supplied in the call to data.frame, the row names are taken from the first component that has suitable names, for example a named vector or a matrix with rownames or a data frame. (If that component is subsequently recycled, the names are discarded with a warning.) If row.names was supplied as NULL or no suitable component was found the row names are the integer sequence starting at one (and such row names are considered to be ‘automatic’, and not preserved by as.matrix).
If row names are supplied of length one and the data frame has a single row, the row.names is taken to specify the row names and not a column (by name or number).
Names are removed from vector inputs not protected by I.
A data frame, a matrix-like structure whose columns may be of differing types (numeric, logical, factor and character and so on).
How the names of the data frame are created is complex, and the rest of this paragraph is only the basic story. If the arguments are all named and simple objects (not lists, matrices or data frames) then the argument names give the column names. For an unnamed simple argument, a deparsed version of the argument is used as the name (with an enclosing I(...) removed). For a named matrix/list/data frame argument with more than one named column, the names of the columns are the name of the argument followed by a dot and the column name inside the argument: if the argument is unnamed, the argument's column names are used. For a named or unnamed matrix/list/data frame argument that contains a single column, the column name in the result is the column name in the argument. Finally, the names are adjusted to be unique and syntactically valid unless check.names = FALSE.
In versions of R prior to 2.4.0 row.names had to be character: to ensure compatibility with such versions of R, supply a character vector as the row.names argument.
Chambers, J. M. (1992) Data for models. Chapter 3 of Statistical Models in S eds J. M. Chambers and T. J. Hastie, Wadsworth & Brooks/Cole.
I, plot.data.frame, print.data.frame, row.names, names (for the column names), [.data.frame for subsetting methods and I(matrix(..)) examples; Math.data.frame etc, about Group methods for data.frames; read.table, make.names, list2DF for creating data frames from lists of variables.
L3 <- LETTERS[1:3]
char <- sample(L3, 10, replace = TRUE)
(d <- data.frame(x = 1, y = 1:10, char = char))
## The "same" with automatic column names:
data.frame(1, 1:10, sample(L3, 10, replace = TRUE))

is.data.frame(d)

## enable automatic conversion of character arguments to factor columns:
(dd <- data.frame(d, fac = letters[1:10], stringsAsFactors = TRUE))
rbind(class = sapply(dd, class), mode = sapply(dd, mode))

stopifnot(1:10 == row.names(d))  # {coercion}

(d0  <- d[, FALSE])   # data frame with 0 columns and 10 rows
(d.0 <- d[FALSE, ])   # <0 rows> data frame  (3 named cols)
(d00 <- d0[FALSE, ])  # data frame with 0 columns and 0 rows
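The naming and recycling rules are easier to follow with a concrete case. This is a sketch only; the matrix m and the column names used here are invented for illustration.

```r
m <- matrix(1:4, nrow = 2, dimnames = list(NULL, c("a", "b")))

names(data.frame(x = 1:2, m))          # unnamed matrix: its own column names
names(data.frame(x = 1:2, M = m))      # named matrix: argument name, dot, column name
names(data.frame(x = 1:2, M = I(m)))   # protected by I(): kept as one column

# check.names = TRUE (the default) makes names syntactically valid
names(data.frame(`my col` = 1:2))
names(data.frame(`my col` = 1:2, check.names = FALSE))

# Recycling happens only a whole number of times
nrow(data.frame(x = 1, y = 1:4))       # x is recycled to 4 rows
bad <- tryCatch(data.frame(x = 1:3, y = 1:4), error = function(e) e)
inherits(bad, "error")                 # 3 does not divide 4
```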
Return the matrix obtained by converting all the variables in a data frame to numeric mode and then binding them together as the columns of a matrix. Factors and ordered factors are replaced by their internal codes.
data.matrix(frame, rownames.force = NA)
frame |
a data frame whose components are logical vectors, factors or numeric or character vectors. |
rownames.force |
logical indicating if the resulting matrix
should have character (rather than |
Logical and factor columns are converted to integers. Character columns are first converted to factors and then to integers. Any other column which is not numeric (according to is.numeric) is converted by as.numeric or, for S4 objects, as(, "numeric"). If all columns are integer (after conversion) the result is an integer matrix, otherwise a numeric (double) matrix.
If frame inherits from class "data.frame", an integer or numeric matrix of the same dimensions as frame, with dimnames taken from the row.names (or NULL, depending on rownames.force) and names.
Otherwise, the result of as.matrix.
The default behaviour for data frames differs from R < 2.5.0 which always gave the result character rownames.
Chambers, J. M. (1992) Data for models. Chapter 3 of Statistical Models in S eds J. M. Chambers and T. J. Hastie, Wadsworth & Brooks/Cole.
as.matrix, data.frame, matrix.
DF <- data.frame(a = 1:3, b = letters[10:12],
                 c = seq(as.Date("2004-01-01"), by = "week", length.out = 3),
                 stringsAsFactors = TRUE)
data.matrix(DF[1:2])
data.matrix(DF)
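The conversion rules can be seen column by column in a small sketch; the data frame df and its column names are invented for this example.

```r
df <- data.frame(flag = c(TRUE, FALSE), grp = c("b", "a"), n = 1:2)

dm <- data.matrix(df)
dm
typeof(dm)                  # every column converts to integer

# Character values go via factor(), so codes follow the sorted levels
unname(dm[, "grp"])         # "b" -> 2, "a" -> 1

df$w <- c(0.5, 1.5)         # add a double column
typeof(data.matrix(df))     # the whole matrix is now double
```

Because character columns are coded through factor levels, the integer codes carry no numeric meaning beyond level order.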
Returns a character string of the current system date and time.
date()
The string has the form "Fri Aug 20 11:11:00 1999", i.e., length 24, since it relies on POSIX's ctime ensuring the above fixed format. Timezone and Daylight Saving Time are taken account of, but not indicated in the result.
The day and month abbreviations are always in English, irrespective of locale.
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
Sys.Date and Sys.time; Date and DateTimeClasses for objects representing date and time.
(d <- date())
nchar(d) == 24

## something similar in the current locale
##   depending on ctime; e.g. %e could be %d:
format(Sys.time(), "%a %b %e %H:%M:%S %Y")
Description of the class "Date" representing calendar dates.
## S3 method for class 'Date'
summary(object, digits = 12, ...)

## S3 method for class 'Date'
print(x, max = NULL, ...)
object , x
|
a |
digits |
number of significant digits for the computations. |
max |
numeric or |
... |
further arguments to be passed from or to other methods. |
Dates are represented as the number of days since 1970-01-01, with negative values for earlier dates. They are always printed following the rules of the current Gregorian calendar, even though that calendar was not in use long ago (it was adopted in 1752 in Great Britain and its colonies). When printing there is assumed to be a year zero.
It is intended that the date should be an integer value, but this is not enforced in the internal representation. Fractional days will be ignored when printing. It is possible to produce fractional days via the mean method or by adding or subtracting (see Ops.Date).
When a date is converted to a date-time (for example by as.POSIXct or as.POSIXlt) its time is taken as midnight in UTC.
Printing dates involves conversion to class "POSIXlt" which treats dates of more than about 780 million years from present as NA.
For the many methods see methods(class = "Date"). Several are documented separately, see below.
Sys.Date for the current date.
weekdays for convenience extraction functions.
Methods with extra arguments and documentation:
Ops.Date for operators on "Date" objects.
format.Date for conversion to and from character strings.
axis.Date and hist.Date for plotting.
seq.Date, cut.Date, and round.Date for utility operations.
DateTimeClasses for date-time classes.
(today <- Sys.Date())
format(today, "%d %b %Y")  # with month as a word
(tenweeks <- seq(today, length.out=10, by="1 week")) # next ten weeks
weekdays(today)
months(tenweeks)

(Dls <- as.Date(.leap.seconds))

##  Show use of year zero:
(z <- as.Date("01-01-01")) # how it is printed depends on the OS
z - 365 # so year zero was a leap year.
as.Date("00-02-29")
# if you want a different format, consider something like (if supported)
## Not run: 
format(z, "%04Y-%m-%d") # "0001-01-01"
format(z, "%_4Y-%m-%d") # " 1-01-01"
format(z, "%_Y-%m-%d") # "1-01-01"

## End(Not run)

##  length(<Date>) <- n   now works
ls <- Dls; length(ls) <- 12
l2 <- Dls; length(l2) <- 5 + length(Dls)
stopifnot(exprs = {
  ## length(.) <- * is compatible to subsetting/indexing:
  identical(ls, Dls[seq_along(ls)])
  identical(l2, Dls[seq_along(l2)])
  ## has filled with NA's
  is.na(l2[(length(Dls)+1):length(l2)])
})
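The internal representation and the handling of fractional days can be illustrated briefly. This is a sketch with arbitrarily chosen dates; d and half are just example names.

```r
unclass(as.Date("1970-01-01"))   # 0: the origin of the day count
unclass(as.Date("1969-12-31"))   # -1: earlier dates are negative

d    <- as.Date("2024-03-01")
half <- d + 0.5                  # a fractional day is allowed internally

format(half)                     # prints as the same calendar day
unclass(half) - unclass(d)       # but the stored value differs by 0.5
half > d
```

If whole days are required, one option is to rebuild the value from its printed form, for example as.Date(format(half)).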
Description of the classes "POSIXlt" and "POSIXct" representing calendar dates and times.
## S3 method for class 'POSIXct'
print(x, tz = "", usetz = TRUE, max = NULL, ...)

## S3 method for class 'POSIXct'
summary(object, digits = 15, ...)

time + z
z + time
time - z
time1 lop time2
x , object
|
an object to be printed or summarized from one of the date-time classes. |
tz , usetz
|
for timezone formatting, passed to |
max |
numeric or |
digits |
number of significant digits for the computations: should be high enough to represent the least important time unit exactly. |
... |
further arguments to be passed from or to other methods. |
time |
date-time objects. |
time1 , time2
|
date-time objects or character vectors. (Character
vectors are converted by |
z |
a numeric vector (in seconds). |
lop |
one of |
There are two basic classes of date/times. Class "POSIXct" represents the (signed) number of seconds since the beginning of 1970 (in the UTC time zone) as a numeric vector. Class "POSIXlt" is internally a list of vectors with components named sec, min, hour for the time, mday, mon, and year, for the date, wday, yday for the day of the week and day of the year, isdst, a Daylight Saving Time flag, and sometimes (both optional) zone, a string for the time zone, and gmtoff, offset in seconds from GMT, see the section ‘Details on POSIXlt’ below for more details.
The classes correspond to the POSIX/C99 constructs of ‘calendar time’ (the time_t data type, “ct”), and ‘local time’ (or broken-down time, the ‘struct tm’ data type, “lt”), from which they also inherit their names.
"POSIXct" is more convenient for including in data frames, and "POSIXlt" is closer to human-readable forms. A virtual class "POSIXt" exists from which both of the classes inherit: it is used to allow operations such as subtraction to mix the two classes.
Logical comparisons and some arithmetic operations are available for both classes. One can add or subtract a number of seconds from a date-time object, but not add two date-time objects. Subtraction of two date-time objects is equivalent to using difftime. Be aware that "POSIXlt" objects will be interpreted as being in the current time zone for these operations unless a time zone has been specified.
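A short sketch of the arithmetic rules just described, using an arbitrary UTC time so that the result does not depend on the local time zone (t1, t2 and res are example names):

```r
t1 <- as.POSIXct("2024-01-01 00:00:00", tz = "UTC")
as.numeric(t1)                         # seconds since the beginning of 1970 (UTC)

t2 <- t1 + 3600                        # adding a number of seconds is allowed
t2 - t1                                # subtraction gives a "difftime"
difftime(t2, t1, units = "secs")       # the same difference, units chosen explicitly

# Adding two date-time objects is not defined and signals an error
res <- tryCatch(t1 + t2, error = function(e) conditionMessage(e))
res
```

Using difftime() directly is worthwhile when the units must be fixed, since the `-` operator picks units automatically.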
Both classes may have an attribute "tzone", specifying the time zone. Note however that their meanings differ, see the section ‘Time Zones’ below for more details.
Unfortunately, the conversion is complicated by the operation of time zones and leap seconds (according to this version of R's data, 27 days have been 86401 seconds long so far, the last being on (actually, immediately before) 2017-01-01: the times of the extra seconds are in the object .leap.seconds). The details of this are entrusted to the OS services where possible. It seems that some rare systems used to use leap seconds, but all known current platforms ignore them (as required by POSIX). This is detected and corrected for at build time, so "POSIXct" times used by R do not include leap seconds on any platform.
Using c on "POSIXlt" objects converts them to the current time zone, and on "POSIXct" objects drops "tzone" attributes if they are not all the same.
A few times have specific issues. First, the leap seconds are ignored, and real times such as "2005-12-31 23:59:60" are (probably) treated as the next second. However, they will never be generated by R, and are unlikely to arise as input. Second, on some OSes there is a problem in the POSIX/C99 standard with "1969-12-31 23:59:59 UTC", which is -1 in calendar time and that value is on those OSes also used as an error code. Thus as.POSIXct("1969-12-31 23:59:59", format = "%Y-%m-%d %H:%M:%S", tz = "UTC") may give NA, and hence as.POSIXct("1969-12-31 23:59:59", tz = "UTC") will give "1969-12-31 23:59:00". Other OSes (including the code used by R on Windows) report errors separately and so are able to handle that time as valid.
The print methods respect options("max.print").
"POSIXlt" objects will often have an attribute "tzone", a character vector of length 3 giving the time zone name (from the TZ environment variable or argument tz of functions creating "POSIXlt" objects; "" marks the current time zone) and the names of the base time zone and the alternate (daylight-saving) time zone. Sometimes this may just be of length one, giving the time zone name.
"POSIXct" objects may also have an attribute "tzone", a character vector of length one. If set to a non-empty value, it will determine how the object is converted to class "POSIXlt" and in particular how it is printed. This is usually desirable, but if you want to specify an object in a particular time zone but to be printed in the current time zone you may want to remove the "tzone" attribute.
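Removing the "tzone" attribute changes only how a "POSIXct" value is displayed, not the instant it represents. A minimal sketch, with t and u as example names:

```r
t <- as.POSIXct("2024-06-01 12:00:00", tz = "UTC")
attr(t, "tzone")                 # "UTC": printing uses this zone

u <- t
attr(u, "tzone") <- NULL         # drop the attribute
is.null(attr(u, "tzone"))        # u now prints in the current time zone

as.numeric(t) == as.numeric(u)   # the underlying number of seconds is unchanged
```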
Class "POSIXlt" is internally a named list of vectors representing date-times, with the following list components
sec
0–61: seconds, allowing for leap seconds.
min
0–59: minutes.
hour
0–23: hours.
mday
1–31: day of the month.
mon
0–11: months after the first of the year.
year
years since 1900.
wday
0–6 day of the week, starting on Sunday.
yday
0–365: day of the year (365 only in leap years).
isdst
Daylight Saving Time flag. Positive if in force, zero if not, negative if unknown.
zone
(Optional.) The abbreviation for the time zone in force at that time: "" if unknown (but "" might also be used for UTC).
gmtoff
(Optional.) The offset in seconds from GMT: positive values are East of the meridian. Usually NA if unknown, but 0 could mean unknown.
The components must be in this order: that was only minimally checked
prior to R 4.3.0. All objects created in R 4.3.0 have the optional
components. From earlier versions of R, he last two components will
not be present for times in UTC and are platform-dependent. Currently
gmtoff
is set on almost all current platforms: those based on
BSD or glibc
(including Linux and macOS) and those using the
tzcode
implementation shipped with R (including Windows and by
default macOS).
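The component conventions listed above (zero-based months, years counted from 1900, and so on) can be checked on a small example. This is only a sketch: the date is arbitrary and UTC is used to sidestep any DST questions.

```r
# Illustrative date-time; UTC avoids any DST ambiguity
lt <- as.POSIXlt("2024-03-10 14:30:05", tz = "UTC")

lt$year + 1900   # calendar year (year is stored as years since 1900)
lt$mon + 1       # calendar month (mon is stored as 0-11)
lt$mday          # day of the month
lt$wday          # 0 means Sunday
lt$yday          # zero-based day of the year

# The component names, in their required order
names(unclass(lt))
```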
Note that the internal list structure is somewhat hidden, as many
methods (including length(x)
, print()
and
str()
) apply to the abstract date-time vector, as for
"POSIXct"
. One can extract and replace single
components via [
indexing with two indices (see the
examples).
The components of "POSIXlt"
are integer
vectors,
except sec
(double
) and zone
(character
). However most users will coerce numeric
values for the first to real and the rest bar zone
to integer.
Components wday
and yday
are for information, and are not
used in the conversion to calendar time nor for printing,
format()
, or in as.character()
.
However, component isdst
is needed to distinguish times at the
end of DST: typically 1am to 2am occurs twice, first in DST and then
in standard time. At all other times isdst
can be deduced from
the first six values, but the behaviour if it is set incorrectly is
platform-dependent. For example Linux/glibc when checked fixed up
incorrect values in time zones which support DST but gave an error on
value 1
in those without DST.
For “ragged” and out-of-range vs “balanced”
"POSIXlt"
objects, see balancePOSIXlt()
.
Classes "POSIXct"
and "POSIXlt"
are able to express
fractions of a second where the latter allows for higher accuracy.
Consequently, conversion of fractions between the two forms
may not be exact, but will have better than microsecond accuracy.
Fractional seconds are printed only if
options("digits.secs")
is set: see strftime
.
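A short sketch of how fractional seconds can be displayed. It assumes the `%OSn` output specifier described in the strftime help, and uses a value (half a second after the epoch) that is exactly representable so that no rounding question arises:

```r
# Half a second after the epoch, constructed directly in UTC
x <- .POSIXct(0.5, tz = "UTC")

format(x, "%H:%M:%S")     # whole seconds only
format(x, "%H:%M:%OS3")   # three decimal places via the %OSn specifier

# Setting the option affects default formatting; restore it afterwards
old <- options(digits.secs = 3)
format(x)
options(old)
```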
The "POSIXlt"
class can represent a very wide range of times (up
to billions of years), but such times can only be interpreted with
reference to a time zone.
The concept of time zones was first adopted in the nineteenth century, and the Gregorian calendar was introduced in 1582 but not universally adopted until 1927. OS services almost invariably assume the Gregorian calendar and may assume that the time zone that was first enacted for the location was in force before that date. (The earliest legislated time zone seems to have been London on 1847-12-01.) Some OSes assume the previous use of ‘local time’ based on the longitude of a location within the time zone.
Most operating systems represent POSIXct
times as C type
long
. This means that on 32-bit OSes this covers the period
1902 to 2037. On all known 64-bit platforms and for the code we use
on 32-bit Windows, the range of representable times is billions of
years: however, not all can correctly convert times before 1902 or
after 2037. A few benighted OSes used an unsigned type and so cannot
represent times before 1970.
Where possible the platform limits are detected, and outside
the limits we use our own C code. This uses the offset from
GMT in use either for 1902 (when there was no DST) or that predicted
for one of 2030 to 2037 (chosen so that the likely DST transition days
are Sundays), and uses the alternate (daylight-saving) time zone only
if isdst
is positive or (if -1
) if DST was predicted to
be in operation in the 2030s on that day.
Note that there are places (e.g., Rome) whose offset from UTC varied in the years prior to 1902, and these will be handled correctly only where there is OS support.
There is no reason to assume that the DST rules will remain the same in the future: the US legislated in 2005 to change its rules as from 2007, with a possible future reversion. So conversions for times more than a year or two ahead are speculative. Other countries have changed their rules (and indeed, if DST is used at all) at a few days' notice. So representations and conversion of future dates are tentative. This also applies to dates after the in-use version of the time-zone database – not all platforms keep it up to date, which includes that shipped with older versions of R where used (which it is by default on Windows and macOS).
Some Unix-like systems (especially Linux ones) do not have environment
variable TZ set, yet have internal code that expects it (as does
POSIX). We have tried to work around this, but if you get unexpected
results try setting TZ. See Sys.timezone
for
valid settings.
Great care is needed when comparing objects of class "POSIXlt"
.
Not only are components and attributes optional; several components
may have values meaning ‘not yet determined’ and the same time
represented in different time zones will look quite different.
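To illustrate why direct comparison of "POSIXlt" components can mislead, the sketch below represents one instant in two time zones. It assumes the zone name "America/New_York" is available in the time-zone database in use; the variable names are illustrative.

```r
ct <- as.POSIXct("2024-01-15 12:00:00", tz = "UTC")
lt_utc <- as.POSIXlt(ct, tz = "UTC")
lt_ny  <- as.POSIXlt(ct, tz = "America/New_York")  # assumes this zone is known

c(lt_utc$hour, lt_ny$hour)   # the list components differ

# Comparing after conversion to "POSIXct" compares the instants themselves
as.POSIXct(lt_utc) == as.POSIXct(lt_ny)
```

Converting to "POSIXct" before comparing is one way to avoid depending on the optional components and attributes.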
The order of the list components of "POSIXlt"
objects
must not be changed, as several C-based conversion methods rely on the
order for efficiency.
Ripley, B. D. and Hornik, K. (2001). “Date-time classes.” R News, 1(2), 8–11. https://www.r-project.org/doc/Rnews/Rnews_2001-2.pdf.
Dates for dates without times.
as.POSIXct
and as.POSIXlt
for conversion
between the classes.
strptime
for conversion to and from character
representations.
Sys.time
for clock time as a "POSIXct"
object.
difftime
for time intervals.
balancePOSIXlt()
for balancing or filling “ragged”
POSIXlt objects.
cut.POSIXt
, seq.POSIXt
,
round.POSIXt
and trunc.POSIXt
for methods
for these classes.
weekdays
for convenience extraction functions.
(z <- Sys.time())             # the current date, as class "POSIXct"
Sys.time() - 3600             # an hour ago
as.POSIXlt(Sys.time(), "GMT") # the current time in GMT
format(.leap.seconds)         # the leap seconds in your time zone
print(.leap.seconds, tz = "America/Los_Angeles") # and in Seattle's
## look at *internal* representation of "POSIXlt" :
leapS <- as.POSIXlt(.leap.seconds)
names(unclass(leapS)) ; is.list(leapS)
## str() on inner structure needs unclass(.):
utils::str(unclass(leapS), vec.len = 7)
## show all (apart from "tzone" attr):
data.frame(unclass(leapS))
## Extracting *single* components of POSIXlt objects:
leapS[1 : 5, "year"]
leapS[17:22, "mon" ]
## length(.) <- n now works for "POSIXct" and "POSIXlt" :
for(lpS in list(.leap.seconds, leapS)) {
    ls <- lpS; length(ls) <- 12
    l2 <- lpS; length(l2) <- 5 + length(lpS)
    stopifnot(exprs = {
      ## length(.) <- * is compatible to subsetting/indexing:
      identical(ls, lpS[seq_along(ls)])
      identical(l2, lpS[seq_along(l2)])
      ## has filled with NA's
      is.na(l2[(length(lpS)+1):length(l2)])
    })
}
Reads or writes an R object from/to a file in Debian Control File format.
read.dcf(file, fields = NULL, all = FALSE, keep.white = NULL)
write.dcf(x, file = "", append = FALSE, useBytes = FALSE,
          indent = 0.1 * getOption("width"),
          width = 0.9 * getOption("width"),
          keep.white = NULL)
file |
either a character string naming a file or a connection.
|
fields |
a character vector with the names of the fields to read from the DCF file. Default is to read all fields. |
all |
a logical indicating whether in case of multiple
occurrences of a field in a record, all these should be gathered.
If |
keep.white |
a character vector with the names of the fields for
which whitespace should be kept as is, or |
x |
the object to be written, typically a data frame. If not, it
is attempted to coerce |
append |
logical. If |
useBytes |
logical to be passed to |
indent |
a positive integer specifying the indentation for continuation lines in output entries. |
width |
a positive integer giving the target column for wrapping lines in the output. |
DCF is a simple format for storing databases in plain text files that can easily be directly read and written by humans. DCF is used in various places to store R system information, like descriptions and contents of packages.
The DCF rules as implemented in R are:
A database consists of one or more records, each with one or more named fields. Not every record must contain each field. Fields may appear more than once in a record.
Regular lines start with a non-whitespace character.
Regular lines are of form tag:value
, i.e., have a name
tag and a value for the field, separated by :
(only the first
:
counts). The value can be empty (i.e., whitespace only).
Lines starting with whitespace are continuation lines (to the preceding field) if at least one character in the line is non-whitespace. Continuation lines where the only non-whitespace character is a ‘.’ are taken as blank lines (allowing for multi-paragraph field values).
Records are separated by one or more empty (i.e., whitespace only) lines.
Individual lines may be arbitrarily long; prior to R 3.0.2 the length limit was approximately 8191 bytes per line.
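The rules above can be tried out on a small hand-written file. The sketch below uses made-up field values and a temporary file: two records separated by an empty line, with one continuation line in the first record.

```r
# Made-up DCF content: two records, separated by an empty line
dcf_text <- c(
  "Package: demo",
  "Title: A first",
  "  record with a continuation line",
  "",
  "Package: other",
  "Version: 0.1"
)
tf <- tempfile(fileext = ".dcf")
writeLines(dcf_text, tf)

m <- read.dcf(tf)
dim(m)             # one row per record, one column per field
colnames(m)
m[, "Package"]
is.na(m)           # fields absent from a record are NA

unlink(tf)
```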
Note that read.dcf(all = FALSE)
reads the file byte-by-byte.
This allows a ‘DESCRIPTION’ file to be read and only its ASCII
fields used, or its ‘Encoding’ field used to re-encode the
remaining fields.
write.dcf
does not write NA
fields.
The default read.dcf(all = FALSE)
returns a character matrix
with one row per record and one column per field. Leading and
trailing whitespace of field values is ignored unless a field is
listed in keep.white
. If a tag name is specified in the file,
but the corresponding value is empty, then an empty string is
returned. If the tag name of a field is specified in fields
but never used in a record, then the corresponding value is NA
.
If fields are repeated within a record, the last one encountered is
returned. Malformed lines lead to an error.
For read.dcf(all = TRUE)
a data frame is returned, again with
one row per record and one column per field. The columns are lists of
character vectors for fields with multiple occurrences, and character
vectors otherwise.
Note that an empty file
is a valid DCF file, and
read.dcf
will return a zero-row matrix or data frame.
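The difference between the two return forms shows up when a field is repeated within a record. A minimal sketch, using a made-up single-record file:

```r
# One record in which the field 'Tag' occurs twice (made-up content)
tf <- tempfile()
writeLines(c("Name: a", "Tag: x", "Tag: y"), tf)

m <- read.dcf(tf)              # character matrix
m[1, "Tag"]                    # only the last occurrence is kept

d <- read.dcf(tf, all = TRUE)  # data frame
d$Tag                          # a list column gathering all occurrences
d$Name                         # an ordinary character column

unlink(tf)
```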
For write.dcf
, invisible NULL
.
As from R 3.4.0, ‘whitespace’ in all cases includes newlines.
https://www.debian.org/doc/debian-policy/ch-controlfields.html.
Note that R does not require encoding in UTF-8, which is a recent Debian requirement. Nor does it use the Debian-specific sub-format which allows comment lines starting with ‘#’.
available.packages
, which uses read.dcf
to read
the indices of package repositories.
## Create a reduced version of the DESCRIPTION file in package 'splines'
x <- read.dcf(file = system.file("DESCRIPTION", package = "splines"),
              fields = c("Package", "Version", "Title"))
write.dcf(x)
## An online DCF file with multiple records
con <- url("https://cran.r-project.org/src/contrib/PACKAGES")
y <- read.dcf(con, all = TRUE)
close(con)
utils::str(y)
Set, unset or query the debugging flag on a function.
The text
and condition
arguments are the same as those
that can be supplied via a call to browser
. They can be retrieved
by the user once the browser has been entered, and provide a mechanism to
allow users to identify which breakpoint has been activated.
debug(fun, text = "", condition = NULL, signature = NULL)
debugonce(fun, text = "", condition = NULL, signature = NULL)
undebug(fun, signature = NULL)
isdebugged(fun, signature = NULL)
debuggingState(on = NULL)
fun |
any interpreted R function. |
text |
a text string that can be retrieved when the browser is entered. |
condition |
a condition that can be retrieved when the browser is entered. |
signature |
an optional method signature. If specified, the method is debugged, rather than its generic. |
on |
logical; a call to the support function
|
When a function flagged for debugging is entered, normal execution
is suspended and the body of the function is executed one statement at a
time. A new browser
context is initiated for each step
(and the previous one destroyed).
At the debug prompt the user can enter commands or R expressions,
followed by a newline. The commands are described in the
browser
help topic.
To debug a function which is defined inside another function,
single-step through to the end of its definition, and then call
debug
on its name.
If you want to debug a function not starting at the very beginning,
use trace(..., at = *)
or setBreakpoint
.
Using debug
is persistent, and unless debugging is turned off
the debugger will be entered on every invocation (note that if the
function is removed and replaced the debug state is not preserved).
Use debugonce()
to enter the debugger only the next time the
function is invoked.
To debug an S4 method by explicit signature, use
signature
. When specified, signature indicates the method of
fun
to be debugged. Note that debugging is implemented slightly
differently for this case, as it uses the trace machinery, rather than
the debugging bit. As such, text
and condition
cannot be
specified in combination with a non-null signature
. For methods
which implement the .local
rematching mechanism, the
.local
closure itself is the one that will be ultimately
debugged (see isRematched
).
isdebugged
returns TRUE
if a) signature
is NULL
and the closure fun
has been debugged, or b) signature
is not
NULL
, fun
is an S4 generic, and the method of fun
for that signature has been debugged. In all other cases, it returns
FALSE
.
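The flag can be set, queried and cleared without ever entering the browser, as long as the function is not called while flagged. A minimal sketch, in which `f` is a hypothetical function used only for illustration:

```r
f <- function(x) x + 1   # hypothetical function, never called while flagged

before <- isdebugged(f)  # not yet flagged
debug(f)
during <- isdebugged(f)  # flagged for debugging
undebug(f)
after <- isdebugged(f)   # flag cleared again

c(before = before, during = during, after = after)
```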
The number of lines printed for the deparsed call when a function is
entered for debugging can be limited by setting
options(deparse.max.lines)
.
When debugging is enabled on a byte compiled function then the interpreted version of the function will be used until debugging is disabled.
debug
and undebug
invisibly return NULL
.
isdebugged
returns TRUE
if the function or method is
marked for debugging, and FALSE
otherwise.
debugcall
for conveniently debugging methods,
browser
notably for its ‘commands’, trace
;
traceback
to see the stack after an Error: ...
message; recover
for another debugging approach.
## Not run: 
debug(library)
library(methods)
## End(Not run)
## Not run: 
debugonce(sample)
## only the first call will be debugged
sample(10, 1)
sample(10, 1)
## End(Not run)
A framework for specifying information about R code for use by the interpreter, compiler, and code analysis tools.
declare(...)
... |
declaration expressions. |
A syntax for declaration expressions is still being developed.
Evaluating a declare()
call ignores the arguments and returns
NULL
invisibly.
When a function is removed from R it should be replaced by a function
which calls .Defunct
.
.Defunct(new, package = NULL, msg)
new |
character string: A suggestion for a replacement function. |
package |
character string: The package to be used when suggesting where the defunct function might be listed. |
msg |
character string: A message to be printed, if missing a default message is used. |
.Defunct
is called from defunct functions. Functions should be
listed in help("pkg-defunct")
for an appropriate pkg
,
including base
(with the alias added to the respective Rd
file).
.Defunct
signals an error of class defunctError
with fields old
, new
, and package
.
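Because the error carries its own class, callers can handle it specifically. The sketch below is illustrative only: `old_fun`, `new_fun` and `mypkg` are hypothetical names, not real functions or packages.

```r
# 'old_fun', 'new_fun' and 'mypkg' are hypothetical names
old_fun <- function() .Defunct("new_fun", package = "mypkg")

# Catch the classed error instead of letting it stop evaluation
err <- tryCatch(old_fun(), defunctError = function(e) e)

inherits(err, "defunctError")
err$new                 # the suggested replacement
err$package
conditionMessage(err)   # the default message
```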
base-defunct
and so on which list the defunct functions
in the packages.
delayedAssign
creates a promise to evaluate the given
expression if its value is requested. This provides direct access
to the lazy evaluation mechanism used by R for the evaluation
of (interpreted) functions.
delayedAssign(x, value, eval.env = parent.frame(1), assign.env = parent.frame(1))
x |
a variable name (given as a quoted string in the function call) |
value |
an expression to be assigned to |
eval.env |
an environment in which to evaluate |
assign.env |
an environment in which to assign |
Both eval.env
and assign.env
default to the currently active
environment.
The expression assigned to a promise by delayedAssign
will
not be evaluated until it is eventually ‘forced’. This happens when
the variable is first accessed.
When the promise is eventually forced, it is evaluated within the
environment specified by eval.env
(whose contents may have changed in
the meantime). After that, the value is fixed and the expression will
not be evaluated again, although the promise still keeps its expression.
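That the expression is evaluated once, and only on first access, can be made visible with a counter. This is a sketch with illustrative names (`n_evals`, `lazy_val`), relying on the default environments:

```r
n_evals <- 0   # counts how often the promise expression runs

delayedAssign("lazy_val", {
    n_evals <- n_evals + 1   # side effect in the evaluation environment
    6 * 7
})

evals_before <- n_evals      # nothing has been evaluated yet
first  <- lazy_val           # forces the promise
second <- lazy_val           # reuses the fixed value
evals_after <- n_evals

c(evals_before = evals_before, evals_after = evals_after)
```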
This function is invoked for its side effect, which is assigning
a promise to evaluate value
to the variable x
.
substitute
, to see the expression associated with a
promise, if assign.env
is not the .GlobalEnv
.
msg <- "old"
delayedAssign("x", msg)
substitute(x) # shows only 'x', as it is in the global env.
msg <- "new!"
x # new!
delayedAssign("x", {
    for(i in 1:3)
        cat("yippee!\n")
    10
})
x^2 #- yippee
x^2 #- simple number
ne <- new.env()
delayedAssign("x", pi + 2, assign.env = ne)
## See the promise {without "forcing" (i.e. evaluating) it}:
substitute(x, ne) #  'pi + 2'
### Promises in an environment [for advanced users]:  ---------------------
e <- (function(x, y = 1, z) environment())(cos, "y", {cat(" HO!\n"); pi+2})
## How can we look at all promises in an env (w/o forcing them)?
gete <- function(e_) {
   ne <- names(e_)
   names(ne) <- ne
   lapply(lapply(ne, as.name),
          function(n) eval(substitute(substitute(X, e_), list(X=n))))
}
(exps <- gete(e))
sapply(exps, typeof)
(le <- as.list(e)) # evaluates ("force"s) the promises
stopifnot(identical(le, lapply(exps, eval))) # and another "Ho!"
Turn unevaluated expressions into character strings.
deparse(expr, width.cutoff = 60L,
        backtick = mode(expr) %in% c("call", "expression", "(", "function"),
        control = c("keepNA", "keepInteger", "niceNames", "showAttributes"),
        nlines = -1L)
deparse1(expr, collapse = " ", width.cutoff = 500L, ...)
expr |
any R expression. |
width.cutoff |
integer in |
backtick |
logical indicating whether symbolic names should be enclosed in backticks if they do not follow the standard syntax. |
control |
character vector (or |
nlines |
integer: the maximum number of lines to produce. Negative values indicate no limit. |
collapse |
a string, passed to |
... |
further arguments passed to |
These functions turn unevaluated expressions (where ‘expression’
is taken in a wider sense than the strict concept of a vector of
mode
and type (typeof
)
"expression"
used in expression
) into character
strings (a kind of inverse to parse
).
A typical use of this is to create informative labels for data sets
and plots. The example shows a simple use of this facility. It uses
the functions deparse
and substitute
to create labels
for a plot which are character string versions of the actual arguments
to the function myplot
.
The default for the backtick
option is not to quote single
symbols but only composite expressions. This is a compromise to
avoid breaking existing code.
width.cutoff
is a lower bound for the line lengths: deparsing a
line proceeds until at least width.cutoff
bytes have
been output and e.g. arg = value
expressions will not be split
across lines.
deparse1()
is a simple utility added in R 4.0.0 to ensure a
string result (character
vector of length one),
typically used in name construction, as
deparse1(substitute(.))
.
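A brief sketch of the difference between the two functions. The helper `label_of` and the call stored in `long_call` are made up for illustration, and the call is never evaluated:

```r
# Hypothetical helper: turn the unevaluated argument into a single string
label_of <- function(x) deparse1(substitute(x))
label_of(a + b)

# A made-up call, long enough to exceed a small width.cutoff
long_call <- quote(some_function(argument_one = 1, argument_two = 2,
                                 argument_three = 3, argument_four = 4))

pieces <- deparse(long_call, width.cutoff = 20)  # may span several strings
single <- deparse1(long_call)                    # always one string
length(pieces)
length(single)
```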
To avoid the risk of a source attribute out of sync with the actual function definition, the source attribute of a function will never be deparsed as an attribute.
Deparsing internal structures may not be accurate: for example the
graphics display list recorded by recordPlot
is not
intended to be deparsed and .Internal
calls will be shown as
primitive calls.
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
.deparseOpts
for available control
settings;
dput()
and dump()
for related functions using
identical internal deparsing functionality.
substitute
,
parse
,
expression
.
Quotes
for quoting conventions, including backticks.
require(stats); require(graphics)
deparse(args(lm))
deparse(args(lm), width.cutoff = 500)
myplot <- function(x, y) {
    plot(x, y, xlab = deparse1(substitute(x)),
               ylab = deparse1(substitute(y)))
}
e <- quote(`foo bar`)
deparse(e)
deparse(e, backtick = TRUE)
e <- quote(`foo bar`+1)
deparse(e)
deparse(e, control = "all") # wraps it w/ quote( . )
Process the deparsing options for deparse
, dput
and
dump
.
.deparseOpts(control)
..deparseOpts
control |
character vector of deparsing options. |
..deparseOpts
is the character
vector of possible
deparsing options used by .deparseOpts()
.
.deparseOpts()
is called by deparse
, dput
and
dump
to process their control
argument.
The control
argument is a vector containing zero or more of the
following strings (exactly those in ..deparseOpts
). Partial
string matching is used.
"keepInteger"
:Either surround integer vectors by as.integer()
or use
suffix L
, so they are not converted to type double when
parsed. This includes making sure that integer NA
s are
preserved (via NA_integer_
if there are no non-NA
values in the vector, unless "S_compatible"
is set).
"quoteExpressions"
:Surround unevaluated expressions, but not formula
s,
with quote()
, so they are not evaluated when re-parsed.
"showAttributes"
:If the object has attributes
(other than a source
attribute, see srcref
), use structure()
to display them as well as the object value unless the only such
attribute is names
and the "niceNames"
option is set.
This ("showAttributes"
) is the default for
deparse
and dput
.
"useSource"
:If the object has a source
attribute (srcref
),
display that instead of deparsing the object. Currently only
applies to function definitions.
"warnIncomplete"
:Some exotic objects such as environments, external pointers, etc. can not be deparsed properly. This option causes a warning to be issued if the deparser recognizes one of these situations.
Also, the parser in R < 2.7.0 would only accept strings of up to 8192 bytes, and this option gives a warning for longer strings.
"keepNA"
:Integer, real and character NA
s are surrounded by coercion
functions where necessary to ensure that they are parsed to the
same type. Since e.g. NA_real_
can be output in R, this is
mainly used in connection with S_compatible
.
"niceNames"
:If true, list
s and atomic vectors with non-NA
names (see names
) are deparsed as e.g., c(A = 1)
instead of structure(1, names = "A")
, independently of the
"showAttributes"
setting.
"all"
:An abbreviated way to specify all of the options
listed above plus "digits17"
.
This is the default for dump
, and, without "digits17"
, the options
used by edit
(which are fixed).
"delayPromises"
:Deparse promises in the form <promise: expression> rather than evaluating them. The value and the environment of the promise will not be shown and the deparsed code cannot be sourced.
"S_compatible"
:Make deparsing as far as possible compatible with S and R < 2.5.0. For compatibility with S, integer values of double vectors are deparsed with a trailing decimal point. Backticks are not used.
"hexNumeric"
:Real and finite complex numbers are output in ‘"%a"’ format as
binary fractions (coded as hexadecimal: see sprintf
)
with maximal opportunity to be recorded exactly to full precision.
Complex numbers with one or both non-finite components are
output as if this option were not set.
(This relies on that format being correctly supported: known problems on Windows are worked around as from R 3.1.2.)
"digits17"
:Real and finite complex numbers are output using format ‘"%.17g"’ which may give more precision than the default (but the output will depend on the platform and there may be loss of precision when read back). Complex numbers with one or both non-finite components are output as if this option were not set.
"exact"
:An abbreviated way to specify control = c("all", "hexNumeric")
which is guaranteed to be exact for numbers, see also below.
For the most readable (but perhaps incomplete) display, use
control = NULL
. This displays the object's value, but not its
attributes. The default in deparse
is to display the
attributes as well, but not to use any of the other options to make
the result parseable. (dump
uses more default options via
control = "all"
, and printing of functions without sources
uses c("keepInteger", "keepNA")
to which one may add
"warnIncomplete"
.)
Using control = "exact"
(short for control = c("all", "hexNumeric")
)
comes closest to making deparse()
an inverse of parse()
(but we have not yet seen an example where "all"
, now including
"digits17"
, would not have been as good). However, not all
objects are deparse-able even with these options, and a warning will be
issued if the function recognizes that it is being asked to do the
impossible.
Only one of "hexNumeric"
and "digits17"
can be specified.
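The effect of a few of these options can be seen by passing them to deparse() directly. This is a small sketch on made-up values, not an exhaustive tour of the options:

```r
x <- c(A = 1)
named   <- deparse(x)                   # defaults include "niceNames"
bare    <- deparse(x, control = NULL)   # value only, names dropped

int_kept <- deparse(1L, control = "keepInteger")  # integer type preserved
int_lost <- deparse(1L, control = NULL)           # would re-parse as double

# "hexNumeric" lets a double survive a deparse/parse round trip exactly
txt <- deparse(1/3, control = "hexNumeric")
round_trip_ok <- identical(eval(parse(text = txt)), 1/3)

c(named, bare, int_kept, int_lost)
round_trip_ok
```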
An integer value corresponding to the control
options
selected.
stopifnot(.deparseOpts("exact") == .deparseOpts(c("all", "hexNumeric")))
(iOpt.all <- .deparseOpts("all")) # a four digit integer
## one integer --> vector binary bits
int2bits <- function(x, base = 2L,
                     ndigits = 1 + floor(1e-9 + log(max(x,1), base))) {
    r <- numeric(ndigits)
    for (i in ndigits:1) {
        r[i] <- x%%base
        if (i > 1L)
            x <- x%/%base
    }
    rev(r) # smallest bit at left
}
int2bits(iOpt.all)
## What options does  "all" contain ? =========
(depO.indiv <- setdiff(..deparseOpts, c("all", "exact")))
(oa <- depO.indiv[int2bits(iOpt.all) == 1])# 8 strings
stopifnot(identical(iOpt.all, .deparseOpts(oa)))
## ditto for "exact" instead of "all":
(iOpt.X <- .deparseOpts("exact"))
data.frame(opts = depO.indiv,
           all  = int2bits(iOpt.all),
           exact= int2bits(iOpt.X))
(oX <- depO.indiv[int2bits(iOpt.X) == 1]) # 8 strings, too
diffXall <- oa != oX
stopifnot(identical(iOpt.X, .deparseOpts(oX)),
          identical(oX[diffXall], "hexNumeric"),
          identical(oa[diffXall], "digits17"))
When an object is about to be removed from R it is first deprecated and
should include a call to .Deprecated
.
.Deprecated(new, package = NULL, msg, old = as.character(sys.call(sys.parent()))[1L])
new |
character string: A suggestion for a replacement function. |
package |
character string: The package to be used when suggesting where the deprecated function might be listed. |
msg |
character string: A message to be printed, if missing a default message is used. |
old |
character string specifying the function (default) or usage which is being deprecated. |
.Deprecated("new name")
is called from deprecated
functions. The original help page for these functions is often
available at help("old-deprecated")
(note the quotes).
Deprecated functions should be listed in help("pkg-deprecated")
for an appropriate pkg, including base.
.Deprecated
signals a warning of class "deprecatedWarning"
with fields old
, new
, and package
.
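Since the warning is classed, it can be intercepted separately from other warnings while the deprecated function still returns its value. In the sketch below `old_mean` is a hypothetical deprecated wrapper, defined only for illustration:

```r
# Hypothetical deprecated wrapper around mean()
old_mean <- function(x) {
    .Deprecated("mean")
    mean(x)
}

seen <- NULL
val <- withCallingHandlers(
    old_mean(1:4),
    deprecatedWarning = function(w) {
        seen <<- w                       # keep the condition for inspection
        invokeRestart("muffleWarning")   # continue without printing it
    }
)

val
inherits(seen, "deprecatedWarning")
seen$new
```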
help("base-deprecated")
and so on which list the
deprecated functions in the packages.
det
calculates the determinant of a matrix. determinant
is a generic function that returns separately the modulus of the determinant,
optionally on the logarithm scale, and the sign of the determinant.
det(x, ...)
determinant(x, logarithm = TRUE, ...)
x |
numeric matrix: logical matrices are coerced to numeric. |
logarithm |
logical; if |
... |
optional arguments, currently unused. |
The determinant
function uses an LU decomposition and the
det
function is simply a wrapper around a call to
determinant
.
Often, computing the determinant is not what you should be doing to solve a given problem.
For det
, the determinant of x
. For determinant
, a
list with components
modulus |
a numeric value. The modulus (absolute value) of the
determinant if |
sign |
integer; either |
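The two components returned by determinant() can be recombined into the ordinary determinant. A minimal sketch on a small made-up matrix:

```r
m <- matrix(c(2, 1, 1, 3), nrow = 2)   # made-up 2 x 2 matrix, determinant 5

d <- determinant(m, logarithm = TRUE)
d$sign
d$modulus                              # log of the absolute value

# Rebuild the determinant from sign and log-modulus
rebuilt <- as.numeric(d$sign * exp(d$modulus))
c(rebuilt = rebuilt, det = det(m))
```

Working with the log-modulus can help when the determinant itself would overflow or underflow in double precision.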
(x <- matrix(1:4, ncol = 2))
unlist(determinant(x))
det(x)
det(print(cbind(1, 1:3, c(2,0,1))))
Detach a database, i.e., remove it from the search()
path of available R objects. Usually this is either a
data.frame
which has been attach
ed or a
package which was attached by library
.
detach(name, pos = 2L, unload = FALSE, character.only = FALSE, force = FALSE)
name |
the object to detach. Defaults to |
pos |
index position in |
unload |
a logical value indicating whether or not to attempt to
unload the namespace when a package is being detached. If the
package has a namespace and |
character.only |
a logical indicating whether |
force |
logical: should a package be detached even though other attached packages depend on it? |
This is most commonly used with a single number argument referring to a position on the search list, and can also be used with an unquoted or quoted name of an item on the search list such as package:tools.
If a package has a namespace, detaching it does not by default unload the namespace (and may not even with unload = TRUE), and detaching will not in general unload any dynamically loaded compiled code (DLLs); see getLoadedDLLs and library.dynam.unload. Further, registered S3 methods from the namespace will not be removed, and because S3 methods are not tagged to their source on registration, it is in general not possible to safely un-register the methods associated with a given package. If you use library on a package whose namespace is loaded, it attaches the exports of the already loaded namespace. So detaching and re-attaching a package may not refresh some or all components of the package, and is inadvisable. The most reliable way to completely detach a package is to restart R.
The return value is invisible. It is NULL when a package is detached, otherwise the environment which was returned by attach when the object was attached (incorporating any changes since it was attached).
detach() without an argument removes the first item on the search path after the workspace. It is all too easy to call it too many or too few times, or to not notice that the search path has changed since an attach call.
Use of attach/detach is best avoided in functions (see the help for attach) and in interactive use and scripts it is prudent to detach by name.
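Detaching by name might look like the sketch below. The data frame and the search-path label "demo_df" are hypothetical names chosen for this example.

```r
# Hypothetical data frame and label, for illustration only.
demo <- data.frame(zz_demo_col = 1:3)
attach(demo, name = "demo_df")
was_attached <- "demo_df" %in% search()

# Detach by name, so the call does not depend on the current position
# of the entry on the search path.
detach("demo_df", character.only = TRUE)
still_attached <- "demo_df" %in% search()
```

Naming the entry makes the script robust to other packages or databases being attached in between.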
You cannot detach either the workspace (position 1) or the base package (the last item in the search list), and attempting to do so will throw an error.
Unloading some namespaces has undesirable side effects: e.g. unloading grid closes all graphics devices, and on some systems tcltk cannot be reloaded once it has been unloaded and may crash R if this is attempted.
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
attach, library, search, objects, unloadNamespace, library.dynam.unload.
require(splines) # package
detach(package:splines)
## or also
library(splines)
pkg <- "package:splines"
detach(pkg, character.only = TRUE)

## careful: do not do this unless 'splines' is not already attached.
library(splines)
detach(2) # 'pos' used for 'name'

## an example of the name argument to attach
## and of detaching a database named by a character vector
attach_and_detach <- function(db, pos = 2)
{
   name <- deparse1(substitute(db))
   attach(db, pos = pos, name = name)
   print(search()[pos])
   detach(name, character.only = TRUE)
}
attach_and_detach(women, pos = 3)
Extract or replace the diagonal of a matrix, or construct a diagonal matrix.
diag(x = 1, nrow, ncol, names = TRUE)
diag(x) <- value
x |
a matrix, vector or 1D |
nrow , ncol
|
optional dimensions for the result when |
names |
(when |
value |
either a single value or a vector of length equal to that
of the current diagonal. Should be of a mode which can be coerced
to that of |
diag has four distinct usages:
x is a matrix, when it extracts the diagonal.
x is missing and nrow is specified, it returns an identity matrix.
x is a scalar (length-one vector) and the only argument, it returns a square identity matrix of size given by the scalar.
x is a ‘numeric’ (complex, numeric, integer, logical, or raw) vector, either of length at least 2 or there were further arguments. This returns a matrix with the given diagonal and zero off-diagonal entries.
It is an error to specify nrow or ncol in the first case.
If x is a matrix then diag(x) returns the diagonal of x. The resulting vector will have names if names is true and if the matrix x has matching column and rownames.
The replacement form sets the diagonal of the matrix x to the given value(s).
In all other cases the value is a diagonal matrix with nrow rows and ncol columns (if ncol is not given the matrix is square). Here nrow is taken from the argument if specified, otherwise inferred from x: if that is a vector (or 1D array) of length two or more, then its length is the number of rows, but if it is of length one and neither nrow nor ncol is specified, nrow = as.integer(x).
When a diagonal matrix is returned, the diagonal elements are one except in the fourth case, when x gives the diagonal elements: it will be recycled or truncated as needed, but fractional recycling and truncation will give a warning.
Using diag(x) can have unexpected effects if x is a vector that could be of length one. Use diag(x, nrow = length(x)) for consistent behaviour.
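The length-one pitfall can be seen in a short sketch; the value 5 is arbitrary and stands in for a vector whose length is not known in advance.

```r
x <- 5                                   # a vector that happens to have length one

as_size  <- diag(x)                      # treated as a size: 5 x 5 identity matrix
as_value <- diag(x, nrow = length(x))    # treated as a diagonal: 1 x 1 matrix

dim(as_size)
dim(as_value)
```

With nrow = length(x) the result always has x on its diagonal, whatever the length of x.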
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
dim(diag(3))
diag(10, 3, 4) # guess what?
all(diag(1:3) == {m <- matrix(0,3,3); diag(m) <- 1:3; m})

## other "numeric"-like diagonal matrices :
diag(c(1i,2i))    # complex
diag(TRUE, 3)     # logical
diag(as.raw(1:3)) # raw
(D2 <- diag(2:1, 4)); typeof(D2) # "integer"

require(stats)
## diag(<var-cov-matrix>) = variances
diag(var(M <- cbind(X = 1:5, Y = rnorm(5))))
#-> vector with names "X" and "Y"

rownames(M) <- c(colnames(M), rep("", 3))
M; diag(M) #  named as well
diag(M, names = FALSE) # w/o names
Returns suitably lagged and iterated differences.
diff(x, ...)

## Default S3 method:
diff(x, lag = 1, differences = 1, ...)

## S3 method for class 'POSIXt'
diff(x, lag = 1, differences = 1, ...)

## S3 method for class 'Date'
diff(x, lag = 1, differences = 1, ...)
x |
a numeric vector or matrix containing the values to be differenced. |
lag |
an integer indicating which lag to use. |
differences |
an integer indicating the order of the difference. |
... |
further arguments to be passed to or from methods. |
diff is a generic function with a default method and ones for classes "ts", "POSIXt" and "Date".
NAs propagate.
If x is a vector of length n and differences = 1, then the computed result is equal to the successive differences x[(1+lag):n] - x[1:(n-lag)].
If differences is larger than one this algorithm is applied recursively to x. Note that the returned value is a vector which is shorter than x.
If x is a matrix then the difference operations are carried out on each column separately.
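The definitions above can be checked directly. This is a small sketch with made-up data; manual and m are illustrative names.

```r
x <- c(1, 4, 9, 16, 25)
n <- length(x)
lag <- 2

# Lagged difference written out by hand, following the formula above.
manual <- x[(1 + lag):n] - x[1:(n - lag)]
all.equal(diff(x, lag = lag), manual)

# differences = 2 is the first difference applied twice.
all.equal(diff(x, differences = 2), diff(diff(x)))

# For a matrix, each column is differenced separately.
m <- cbind(1:4, (1:4)^2)
dim(diff(m))
```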
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
diff(1:10, 2)
diff(1:10, 2, 2)
x <- cumsum(cumsum(1:10))
diff(x, lag = 2)
diff(x, differences = 2)

diff(.leap.seconds)

## allows to pass units via ... to difftime()
diff(.leap.seconds, units = "weeks")
diff(as.Date(.leap.seconds), units = "weeks")
Time intervals creation, printing, and some arithmetic. The print() method calls these “time differences”.
time1 - time2

difftime(time1, time2, tz,
         units = c("auto", "secs", "mins", "hours",
                   "days", "weeks"))

as.difftime(tim, format = "%X", units = "auto", tz = "UTC")

## S3 method for class 'difftime'
format(x, ...)

## S3 method for class 'difftime'
units(x)

## S3 replacement method for class 'difftime'
units(x) <- value

## S3 method for class 'difftime'
as.double(x, units = "auto", ...)

## Group methods, notably for round(), signif(), floor(),
## ceiling(), trunc(), abs(); called directly, *not* as Math():
## S3 method for class 'difftime'
Math(x, ...)
time1 , time2
|
|
tz |
an optional time zone specification to be used for the
conversion, mainly for |
units |
character string. Units in which the results are desired. Can be abbreviated. |
value |
character string. Like |
tim |
character string or numeric value specifying a time interval. |
format |
character specifying the format of |
x |
an object inheriting from class |
... |
arguments to be passed to or from other methods. |
Function difftime calculates a difference of two date/time objects and returns an object of class "difftime" with an attribute indicating the units. The Math group method provides round, signif, floor, ceiling, trunc, abs, and sign methods for objects of this class, and there are methods for the group-generic (see Ops) logical and arithmetic operations.
If units = "auto", a suitable set of units is chosen, the largest possible (excluding "weeks") in which all the absolute differences are greater than one.
Subtraction of date-time objects gives an object of this class, by calling difftime with units = "auto". Alternatively, as.difftime() works on character-coded or numeric time intervals; in the latter case, units must be specified, and format has no effect.
Limited arithmetic is available on "difftime" objects: they can be added or subtracted, and multiplied or divided by a numeric vector. In addition, adding a numeric vector to a "difftime" object, or subtracting one from the other, implicitly converts the numeric vector to a "difftime" object with the same units as the "difftime" object. There are methods for mean and sum (via the Summary group generic), and diff via diff.default building on the "difftime" method for arithmetic, notably -.
The units of a "difftime" object can be extracted by the units function, which also has a replacement form. If the units are changed, the numerical value is scaled accordingly. The replacement version keeps attributes such as names and dimensions.
Note that units = "days" means a period of 24 hours, hence takes no account of Daylight Saving Time. Differences in objects of class "Date" are computed as if in the UTC time zone.
The as.double method returns the numeric value expressed in the specified units. Using units = "auto" means the units of the object.
The format method simply formats the numeric value and appends the units as a text string.
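A short sketch of these unit conversions, using a made-up interval d:

```r
d <- as.difftime(c(30, 90), units = "mins")   # illustrative intervals

in_hours <- as.numeric(d, units = "hours")    # value expressed in hours; d unchanged
units(d)                                      # still "mins"

units(d) <- "hours"                           # replacement form rescales the value
as.numeric(d)
format(d)                                     # numeric value with the units appended
```

Extracting the number with an explicit units argument avoids depending on whatever units were chosen automatically.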
Because R follows POSIX (and almost all computer clocks) in ignoring leap seconds, so do time differences. So in a UTC time zone
z <- as.POSIXct(c("2016-12-31 23:59:59", "2017-01-01 00:00:01"))
z[2] - z[1]
reports ‘Time difference of 2 secs’ but 3 seconds elapsed while the computer clock advanced by 2 seconds.
If you want the elapsed time interval, you need to add in any leap seconds for yourself.
Units such as "months" are not possible as they are not of constant length. To create intervals of months, quarters or years use seq.Date or seq.POSIXt.
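For instance, a monthly sequence can be built with seq on a Date, and the gaps between its elements show why a month is not a fixed unit. The dates below are arbitrary.

```r
# Month starts in early 2024 (a leap year), generated via the seq method for Dates.
starts <- seq(as.Date("2024-01-01"), by = "month", length.out = 4)
gaps <- diff(starts)     # a "difftime" object in days

starts
as.numeric(gaps)         # the months differ in length
```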
(z <- Sys.time() - 3600)
Sys.time() - z                # just over 3600 seconds.

## time interval between release days of R 1.2.2 and 1.2.3.
ISOdate(2001, 4, 26) - ISOdate(2001, 2, 26)

as.difftime(c("0:3:20", "11:23:15"))
as.difftime(c("3:20", "23:15", "2:"), format = "%H:%M") # 3rd gives NA
(z <- as.difftime(c(0,30,60), units = "mins"))
as.numeric(z, units = "secs")
as.numeric(z, units = "hours")
format(z)
Retrieve or set the dimension of an object.
dim(x)
dim(x) <- value
x |
an R object, for example a matrix, array or data frame. |
value |
for the default method, either |
The functions dim and dim<- are internal generic primitive functions.
dim has a method for data.frames, which returns the lengths of the row.names attribute of x and of x (as the numbers of rows and columns respectively).
For an array (and hence in particular, for a matrix) dim retrieves the dim attribute of the object. It is NULL or a vector of mode integer.
The replacement method changes the "dim" attribute (provided the new value is compatible) and removes any "dimnames" and "names" attributes.
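A brief sketch of these behaviours, with illustrative objects x and df:

```r
x <- c(a = 1, b = 2, c = 3, d = 4)   # a named vector
dim(x) <- c(2, 2)                    # becomes a 2 x 2 matrix
names(x)                             # NULL: the replacement removed the names
dim(x)                               # stored as an integer vector

df <- data.frame(u = 1:3, v = letters[1:3])
dim(df)                              # rows, then columns
```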
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
x <- 1:12 ; dim(x) <- c(3,4)
x

# simple versions of nrow and ncol could be defined as follows
nrow0 <- function(x) dim(x)[1]
ncol0 <- function(x) dim(x)[2]
Retrieve or set the dimnames of an object.
dimnames(x)
dimnames(x) <- value

provideDimnames(x, sep = "", base = list(LETTERS), unique = TRUE)
x |
an R object, for example a matrix, array or data frame. |
value |
a possible value for |
sep |
a character string, used to separate |
base |
a non-empty |
unique |
logical indicating that the dimnames constructed are
unique within each dimension in the sense of |
The functions dimnames and dimnames<- are generic.
For an array (and hence in particular, for a matrix), they retrieve or set the dimnames attribute (see attributes) of the object. A list value can have names, and these will be used to label the dimensions of the array where appropriate.
The replacement method for arrays/matrices coerces vector and factor elements of value to character, but does not dispatch methods for as.character. It coerces zero-length elements to NULL, and a zero-length list to NULL. If value is a list shorter than the number of dimensions, it is extended with NULLs to the needed length.
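The coercion and padding rules can be sketched on a small matrix; m is an illustrative object.

```r
m <- matrix(1:4, nrow = 2)

# A list shorter than the number of dimensions is padded with NULL.
dimnames(m) <- list(c("a", "b"))
dimnames(m)

# Non-character elements are coerced to character.
dimnames(m) <- list(1:2, c("x", "y"))
rownames(m)
```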
Both have methods for data frames. The dimnames of a data frame are its row.names and its names. For the replacement method each component of value will be coerced by as.character.
For a 1D matrix the names are the same thing as the (only) component of the dimnames.
Both are primitive functions.
provideDimnames(x) provides dimnames where “missing”, such that its result has character dimnames for each component. If unique is true as by default, they are unique within each component via make.unique(*, sep=sep).
The dimnames of a matrix or array can be NULL (which is not stored) or a list of the same length as dim(x). If a list, its components are either NULL or a character vector with positive length of the appropriate dimension of x. The list can have names. It is possible that all components are NULL: such dimnames may get converted to NULL.
For the "data.frame" method both dimnames are character vectors, and the rownames must contain no duplicates nor missing values.
provideDimnames(x) returns x, with “NULL-free” dimnames, i.e. each component a character vector of correct length.
Setting components of the dimnames, e.g., dimnames(A)[[1]] <- value is a common paradigm, but note that it will not work if the value assigned is NULL. Use rownames instead, or (as it does) manipulate the whole dimnames list.
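A minimal sketch of removing row names while keeping column names, on an illustrative matrix A:

```r
A <- matrix(1:4, nrow = 2,
            dimnames = list(c("r1", "r2"), c("c1", "c2")))

# Remove the row names via rownames<-, as recommended above.
rownames(A) <- NULL
dimnames(A)

# Equivalent approach: replace the whole dimnames list at once.
B <- matrix(1:4, nrow = 2,
            dimnames = list(c("r1", "r2"), c("c1", "c2")))
dimnames(B) <- list(NULL, colnames(B))
identical(A, B)
```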
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
rownames, colnames; array, matrix, data.frame.
## simple versions of rownames and colnames
## could be defined as follows
rownames0 <- function(x) dimnames(x)[[1]]
colnames0 <- function(x) dimnames(x)[[2]]

(dn <- dimnames(A <- provideDimnames(N <- array(1:24, dim = 2:4))))
A0 <- A; dimnames(A)[2:3] <- list(NULL)
stopifnot(identical(A0, provideDimnames(A)))
strd <- function(x) utils::str(dimnames(x))
strd(provideDimnames(A, base= list(letters[-(1:9)], tail(LETTERS))))
strd(provideDimnames(N, base= list(letters[-(1:9)], tail(LETTERS)))) # recycling
strd(provideDimnames(A, base= list(c("AA","BB")))) # recycling on both levels
## set "empty dimnames":
provideDimnames(rbind(1, 2:3), base = list(""), unique=FALSE)
do.call constructs and executes a function call from a name or a function and a list of arguments to be passed to it.
do.call(what, args, quote = FALSE, envir = parent.frame())
what |
either a function or a non-empty character string naming the function to be called. |
args |
a list of arguments to the function call. The
|
quote |
a logical value indicating whether to quote the arguments. |
envir |
an environment within which to evaluate the call. This
will be most useful if |
If quote is FALSE, the default, then the arguments are evaluated (in the calling environment, not in envir). If quote is TRUE then each argument is quoted (see quote) so that the effect of argument evaluation is to remove the quotes, leaving the original arguments unevaluated when the call is constructed.
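A minimal sketch of the basic pattern, with an illustrative argument list args:

```r
# Positional and named arguments are collected in one list.
args <- list(c(1, NA, 3), na.rm = TRUE)
res_fun  <- do.call(mean, args)        # same as mean(c(1, NA, 3), na.rm = TRUE)

# 'what' may also be given as a character string naming the function.
res_name <- do.call("paste", list("a", "b", sep = "-"))

res_fun
res_name
```

This is convenient when the arguments are assembled programmatically and their number is not known in advance.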
The behavior of some functions, such as substitute, will not be the same for functions evaluated using do.call as if they were evaluated from the interpreter. The precise semantics are currently undefined and subject to change.
The result of the (evaluated) function call.
This should not be used to attempt to evade restrictions on the use of .Internal and other non-API calls.
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
call which creates an unevaluated call.
do.call("complex", list(imaginary = 1:3))

## if we already have a list (e.g., a data frame)
## we need c() to add further arguments
tmp <- expand.grid(letters[1:2], 1:3, c("+", "-"))
do.call("paste", c(tmp, sep = ""))

do.call(paste, list(as.name("A"), as.name("B")), quote = TRUE)

## examples of where objects will be found.
A <- 2
f <- function(x) print(x^2)
env <- new.env()
assign("A", 10, envir = env)
assign("f", f, envir = env)
f <- function(x) print(x)
f(A)                                      # 2
do.call("f", list(A))                     # 2
do.call("f", list(A), envir = env)        # 4
do.call( f,  list(A), envir = env)        # 2
do.call("f", list(quote(A)), envir = env) # 100
do.call( f,  list(quote(A)), envir = env) # 10
do.call("f", list(as.name("A")), envir = env) # 100

eval(call("f", A))                      # 2
eval(call("f", quote(A)))               # 2
eval(call("f", A), envir = env)         # 4
eval(call("f", quote(A)), envir = env)  # 100
The dontCheck function is the same as identity, but is interpreted by R CMD check code analysis as a directive to suppress checking of x. Currently this is only used by checkFF(registration = TRUE) when checking the .NAME argument of foreign function calls.
dontCheck(x)
x |
an R object. |
suppressForeignCheck which explains why that and dontCheck are undesirable and should be avoided if at all possible.
..., ..1, etc used in Functions
... and ..1, ..2 etc are used to refer to arguments passed down from a calling function. These (and the following) can only be used inside a function which has ... among its formal arguments.
...elt(n) is a functional way to get ..n and basically the same as eval(paste0("..", n)), just more elegant and efficient. Note that switch(n, ...) is very close, differing by returning NULL invisibly instead of an error when n is zero or too large.
...length() returns the number of expressions in ..., and ...names() the names. These are the same as length(list(...)) or names(list(...)) but without evaluating the expressions in ... (which happens with list(...)).
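The lack of evaluation can be seen in a small sketch. The helpers count_dots and second_dot are hypothetical names defined only for this illustration.

```r
# Hypothetical helpers, for illustration only.
count_dots <- function(...) ...length()
second_dot <- function(...) ...elt(2)

count_dots(1, "a", TRUE)            # three expressions in '...'
count_dots(stop("not evaluated"))   # counted, but never evaluated
second_dot(10, 20, 30)              # evaluates only the second element
```

Had count_dots used length(list(...)) instead, the second call would have raised the error.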
Evaluating elements of ... with ..1, ..2, ...elt(n), etc. propagates visibility. This is consistent with the evaluation of named arguments which also propagates visibility.
...length()
...names()
...elt(n)
n |
a positive integer, not larger than the number of expressions
in ..., which is the same as |
... and ..1, ..2 are reserved words in R, see Reserved.
For more, see the Introduction to R manual for usage of these syntactic elements, and dotsMethods for their use in formal (S4) methods.
tst <- function(n, ...) ...elt(n)
tst(1, pi=pi*0:1, 2:4) ## [1] 0.000000 3.141593
tst(2, pi=pi*0:1, 2:4) ## [1] 2 3 4
try(tst(1)) # -> Error about '...' not containing an element.

tst.dl <- function(x, ...) ...length()
tst.dns <- function(x, ...) ...names()
tst.dl(1:10)    # 0  (because the first argument is 'x')
tst.dl(4, 5)    # 1
tst.dl(4, 5, 6) # 2  namely '5, 6'
tst.dl(4, 5, 6, 7, sin(1:10), "foo"/"bar") # 5.    Note: no evaluation!
tst.dns(4, foo=5, 6, bar=7, sini = sin(1:10), "foo"/"bar")
##        "foo"  "" "bar"  "sini"               ""

## From R 4.1.0 to 4.1.2, ...names() sometimes did not match names(list(...));
## check and show (these examples all would've failed):
chk.n2 <- function(...) stopifnot(identical(print(...names()), names(list(...))))
chk.n2(4, foo=5, 6, bar=7, sini = sin(1:10), "bar")
chk.n2()
chk.n2(1,2)
Create, coerce to or test for a double-precision vector.
double(length = 0)
as.double(x, ...)
is.double(x)

single(length = 0)
as.single(x, ...)
length |
a non-negative integer specifying the desired length. Double values will be coerced to integer: supplying an argument of length other than one is an error. |
x |
object to be coerced or tested. |
... |
further arguments passed to or from other methods. |
double creates a double-precision vector of the specified length. The elements of the vector are all equal to 0. It is identical to numeric.
as.double is a generic function. It is identical to as.numeric. Methods should return an object of base type "double".
is.double is a test of double type.
R has no single precision data type. All real numbers are stored in double precision format. The functions as.single and single are identical to as.double and double except they set the attribute Csingle that is used in the .C and .Fortran interface, and they are intended only to be used in that context.
double creates a double-precision vector of the specified length. The elements of the vector are all equal to 0.
as.double attempts to coerce its argument to be of double type: like as.vector it strips attributes including names. (To ensure that an object is of double type without stripping attributes, use storage.mode.) Character strings containing optional whitespace followed by either a decimal representation or a hexadecimal representation (starting with 0x or 0X) can be converted, as can special values such as "NA", "NaN", "Inf" and "infinity", irrespective of case.
as.double for factors yields the codes underlying the factor levels, not the numeric representation of the labels, see also factor.
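A short sketch of the character and factor cases; the strings and the factor f are made up for illustration.

```r
# Decimal, hexadecimal and special values, with optional leading whitespace.
parsed <- as.double(c(" 1e3", "0x1F", "inf"))
parsed

# For a factor, as.double() returns the internal codes ...
f <- factor(c("10", "20", "10"))
codes <- as.double(f)

# ... so convert via character to recover the numbers in the labels.
values <- as.double(as.character(f))
codes
values
```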
is.double returns TRUE or FALSE depending on whether its argument is of double type or not.
All R platforms are required to work with values conforming to the IEC 60559 (also known as IEEE 754) standard. This basically works with a precision of 53 bits, and represents to that precision a range of absolute values from about 2e-308 to 2e+308. It also has special values NaN (many of them), plus and minus infinity and plus and minus zero (although R acts as if these are the same). There are also denormal(ized) (or subnormal) numbers with values below the range given above but represented to less precision.
See .Machine for precise information on these limits.
Note that ultimately how double precision numbers are handled is down
to the CPU/FPU and compiler.
In IEEE 754-2008/IEC60559:2011 this is called ‘binary64’ format.
It is a historical anomaly that R has two names for its floating-point vectors, double and numeric (and formerly had real).
double is the name of the type. numeric is the name of the mode and also of the implicit class. As an S4 formal class, use "numeric".
The potential confusion is that R has used mode "numeric" to mean ‘double or integer’, which conflicts with the S4 usage. Thus is.numeric tests the mode, not the class, but as.numeric (which is identical to as.double) coerces to the class.
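The distinction can be checked on an integer value, as in this brief sketch:

```r
is.numeric(1L)           # TRUE: mode "numeric" covers integer and double
is.double(1L)            # FALSE: the type of 1L is "integer"
typeof(as.numeric(1L))   # "double": as.numeric coerces to double
```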
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
https://en.wikipedia.org/wiki/IEEE_754-1985, https://en.wikipedia.org/wiki/IEEE_754-2008, https://en.wikipedia.org/wiki/IEEE_754-2019, https://en.wikipedia.org/wiki/Double_precision, https://en.wikipedia.org/wiki/Denormal_number.
integer, numeric, storage.mode.
is.double(1)
all(double(3) == 0)
Writes an ASCII text representation of an R object to a file, the R console, or a connection, or uses one to recreate the object.
dput(x, file = "",
     control = c("keepNA", "keepInteger", "niceNames", "showAttributes"))

dget(file, keep.source = FALSE)
x |
an object. |
file |
either a character string naming a file or a
connection. |
control |
character vector (or |
keep.source |
logical: should the source formatting be retained when parsing functions, if possible? |
dput opens file and deparses the object x into that file. The object name is not written (unlike dump). If x is a function the associated environment is stripped. Hence scoping information can be lost.
Deparsing an object is difficult, and not always possible. With the default control, dput() attempts to deparse in a way that is readable, but for more complex or unusual objects (see dump), not likely to be parsed as identical to the original. Use control = "all" for the most complete deparsing; use control = NULL for the simplest deparsing, not even including attributes.
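For a simple object the round trip through a file works as expected. This is a minimal sketch; x and fil are illustrative names, and more complex objects may not survive the trip unchanged.

```r
x <- list(a = 1:3, b = c(u = "x", v = "y"))   # a simple, illustrative object
fil <- tempfile()

dput(x, fil)                         # write the deparsed object (not its name)
round_trip <- identical(dget(fil), x)
unlink(fil)
round_trip

# With control = NULL even the names attribute is dropped from the output.
dput(c(u = 1.5), control = NULL)
```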
dput will warn if fewer characters were written to a file than expected, which may indicate a full or corrupt file system.
To display saved source rather than deparsing the internal representation include "useSource" in control. R currently saves source only for function definitions. If you do not care about source representation (e.g., for a data object), for speed set options(keep.source = FALSE) when calling source.
For dput, the first argument invisibly.
For dget, the object created.
This is not a good way to transfer objects between R sessions. dump is better, but the functions save and saveRDS are designed to be used for transporting R data, and will work with R objects that dput does not handle correctly as well as being much faster.
To avoid the risk of a source attribute out of sync with the actual function definition, the source attribute of a function will never be written as an attribute.
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
deparse, .deparseOpts, dump, write.
fil <- tempfile()
## Write an ASCII version of the 'base' function mean() to our temp file, ..
dput(base::mean, fil)
## ... read it back into 'bar' and confirm it is the same
bar <- dget(fil)
stopifnot(all.equal(bar, base::mean, check.environment = FALSE))

## Create a function with comments
baz <- function(x) {
  # Subtract from one
  1-x
}
## and display it
dput(baz)
## and now display the saved source
dput(baz, control = "useSource")

## Numeric values:
xx <- pi^(1:3)
dput(xx)
dput(xx, control = "digits17")
dput(xx, control = "hexNumeric")
dput(xx, fil); dget(fil) - xx # slight rounding on all platforms
dput(xx, fil, control = "digits17")
dget(fil) - xx # slight rounding on some platforms
dput(xx, fil, control = "hexNumeric"); dget(fil) - xx
unlink(fil)

xn <- setNames(xx, paste0("pi^",1:3))
dput(xn) # nicer, now "niceNames" being part of default 'control'
dput(xn, control = "S_compat") # no names
## explicitly asking for output as in R < 3.5.0:
dput(xn, control = c("keepNA", "keepInteger", "showAttributes"))
Delete the dimensions of an array which have only one level.
drop(x)
x |
an array (including a matrix). |
If x
is an object with a dim
attribute (e.g., a matrix
or array
), then drop
returns an object like
x
, but with any extents of length one removed. Any
accompanying dimnames
attribute is adjusted and returned with
x
: if the result is a vector the names
are taken from
the dimnames
(if any). If the result is a length-one vector,
the names are taken from the first dimension with a dimname.
Array subsetting ([
) performs this reduction unless used
with drop = FALSE
, but sometimes it is useful to invoke
drop
directly.
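As a minimal sketch of the behaviour described above (the matrix `m` and the variable names are illustrative only), the following compares default subsetting, subsetting with `drop = FALSE`, and an explicit call to `drop`:

```r
# A 2 x 3 matrix with dimnames, used only for illustration
m <- matrix(1:6, nrow = 2,
            dimnames = list(c("r1", "r2"), c("a", "b", "c")))

row_vec <- m[1, ]                 # extent of length one is dropped: named vector
row_mat <- m[1, , drop = FALSE]   # stays a 1 x 3 matrix
dropped <- drop(row_mat)          # explicit drop gives the same named vector

dim(row_vec)   # NULL
dim(row_mat)   # 1 3
```

Calling `drop` explicitly is mainly useful when a function has returned an array with length-one extents that you did not index yourself.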
drop1
which is used for dropping terms in models, and
droplevels
used for dropping unused levels from a factor
.
dim(drop(array(1:12, dim = c(1,3,1,1,2,1,2)))) # = 3 2 2
drop(1:3 %*% 2:4) # scalar product
The function droplevels
is used to drop unused levels from a
factor
or, more commonly, from factors in a data frame.
droplevels(x, ...)
## S3 method for class 'factor'
droplevels(x, exclude = if(anyNA(levels(x))) NULL else NA, ...)
## S3 method for class 'data.frame'
droplevels(x, except, exclude, ...)
x |
an object from which to drop unused factor levels. |
exclude |
passed to |
... |
further arguments passed to methods. |
except |
indices of columns from which not to drop levels. |
The method for class "factor"
is currently equivalent to
factor(x, exclude=exclude)
. For the data frame method, you
should rarely specify exclude
“globally” for all factor
columns; rather the default uses the same factor-specific
exclude
as the factor method itself.
The except
argument follows the usual indexing rules.
droplevels
returns an object of the same class as x.
This function was introduced in R 2.12.0. It is primarily
intended for cases where one or more factors in a data frame
contain only elements from a reduced level set after
subsetting. (Notice that subsetting does not in general drop
unused levels). By default, levels are dropped from all factors in a
data frame, but the except
argument allows you to specify
columns for which this is not wanted.
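The following sketch illustrates the data frame method and the `except` argument. The data frame `df` and its columns `g` and `h` are made up for the example:

```r
# Two factor columns, each with three levels (illustrative data)
df <- data.frame(g = factor(c("a", "b", "c")),
                 h = factor(c("x", "y", "z")))

sub <- df[df$g != "c", ]        # subsetting keeps the unused levels
levels(sub$g)                   # still "a" "b" "c"

# Drop unused levels everywhere except in column 2 ("h")
res <- droplevels(sub, except = 2)
levels(res$g)                   # unused level "c" removed
levels(res$h)                   # left untouched
```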
subset
for subsetting data frames.
factor
for definition of factors.
drop
for dropping array dimensions.
drop1
for dropping terms from a model.
[.factor
for subsetting of factors.
aq <- transform(airquality, Month = factor(Month, labels = month.abb[5:9]))
aq <- subset(aq, Month != "Jul")
table(aq$Month)
table(droplevels(aq)$Month)
This function takes a vector of names of R objects and produces
text representations of the objects on a file or connection.
A dump
file can usually be source
d into another
R session.
dump(list, file = "dumpdata.R", append = FALSE, control = "all", envir = parent.frame(), evaluate = TRUE)
list |
character vector (or |
file |
either a character string naming a file or a
connection. |
append |
if |
control |
character vector (or |
envir |
the environment to search for objects. |
evaluate |
logical. Should promises be evaluated? |
If some of the objects named do not exist (in scope), they are
omitted, with a warning. If file
is a file and no objects
exist then no file is created.
source
ing may not produce an identical copy of
dump
ed objects. A warning is issued if it is likely that
problems will arise, for example when dumping exotic or complex
objects (see the Note).
dump
will also warn if fewer characters were written to a file
than expected, which may indicate a full or corrupt file system.
A dump
file can be source
d into another R (or
perhaps S) session, but the functions save
and
saveRDS
are designed to
be used for transporting R data, and will work with R objects that
dump
does not handle. For maximal reproducibility use
control = "exact"
.
To produce a more readable representation of an object, use
control = NULL
. This will skip attributes, and will make other
simplifications that make source
less likely to produce an
identical copy. See .deparseOpts
for details.
To deparse the internal representation of a function rather than
displaying the saved source, use control = c("keepInteger",
"warnIncomplete", "keepNA")
. This will lose all formatting and
comments, but may be useful in those cases where the saved source is
no longer correct.
Promises will normally only be encountered by users as a result of
lazy-loading (when the default evaluate = TRUE
is essential)
and after the use of delayedAssign
,
when evaluate = FALSE
might be intended.
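A small round trip may make the relationship between `dump` and `source` concrete. This is only a sketch: the environments `src_env` and `dst_env` and the object names are invented for the example, and `envir` is given explicitly so that lookup does not depend on where `dump` is called from.

```r
# Objects to dump live in their own environment (illustrative names)
src_env <- new.env()
assign("counts", c(a = 1L, b = 2L), envir = src_env)
assign("scale_by", function(x, k = 2) x * k, envir = src_env)

fil <- tempfile(fileext = ".R")
dumped <- dump(c("counts", "scale_by"), file = fil, envir = src_env)

# Re-create the objects in a fresh environment by sourcing the dump file
dst_env <- new.env()
source(fil, local = dst_env)
unlink(fil)

dumped                 # names of the objects that were written
dst_env$scale_by(3)
```

For simple objects like these the sourced copies match the originals; for more complex objects `save` or `saveRDS` remain the safer choice, as noted above.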
An invisible character vector containing the names of the objects which were dumped.
As dump
is defined in the base namespace, the base
package will be searched before the global environment unless
dump
is called from the top level prompt or the envir
argument is given explicitly.
To avoid the risk of a source attribute becoming out of sync with the actual function definition, the source attribute of a function will never be dumped as an attribute.
Currently environments, external pointers, weak references and objects
of type S4
are not deparsed in a way that can be
source
d. In addition, language objects are deparsed in a
simple way whatever the value of control
, and this includes not
dumping their attributes (which will result in a warning).
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
.deparseOpts
for available control
settings;
dput()
, dget()
and deparse()
for related functions using identical internal deparsing functionality.
write
, write.table
, etc for “dumping”
data to (text) files.
save
and saveRDS
for a more reliable way to
save R objects.
x <- 1; y <- 1:10
fil <- tempfile(fileext=".Rdmped")
dump(ls(pattern = '^[xyz]'), fil)
print(.Last.value)
unlink(fil)
duplicated()
determines which elements of a vector or data
frame are duplicates
of elements with smaller subscripts, and returns a logical vector
indicating which elements (rows) are duplicates.
anyDuplicated(.)
is a “generalized”, more efficient
version of any(duplicated(.))
, returning positive integer indices
instead of just TRUE
.
duplicated(x, incomparables = FALSE, ...)

## Default S3 method:
duplicated(x, incomparables = FALSE,
           fromLast = FALSE, nmax = NA, ...)

## S3 method for class 'array'
duplicated(x, incomparables = FALSE, MARGIN = 1,
           fromLast = FALSE, ...)

anyDuplicated(x, incomparables = FALSE, ...)
## Default S3 method:
anyDuplicated(x, incomparables = FALSE,
              fromLast = FALSE, ...)
## S3 method for class 'array'
anyDuplicated(x, incomparables = FALSE,
              MARGIN = 1, fromLast = FALSE, ...)
x |
a vector or a data frame or an array or |
incomparables |
a vector of values that cannot be compared.
|
fromLast |
logical indicating if duplication should be considered
from the reverse side, i.e., the last (or rightmost) of identical
elements would correspond to |
nmax |
the maximum number of unique items expected (greater than one). |
... |
arguments for particular methods. |
MARGIN |
the array margin to be held fixed: see
|
These are generic functions with methods for vectors (including lists), data frames and arrays (including matrices).
For the default methods, and whenever there are equivalent method
definitions for duplicated
and anyDuplicated
,
anyDuplicated(x, ...)
is a “generalized” shortcut for
any(duplicated(x, ...))
, in the sense that it returns the
index i
of the first duplicated entry x[i]
if
there is one, and 0
otherwise. Their behaviours may be
different when at least one of duplicated
and
anyDuplicated
has a relevant method.
duplicated(x, fromLast = TRUE)
is equivalent to but faster than
rev(duplicated(rev(x)))
.
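The statements above can be checked on a small vector. This is a sketch with an invented character vector `x`:

```r
x <- c("a", "b", "a", "c", "b")

dup_fwd <- duplicated(x)                   # later copies are flagged
dup_rev <- duplicated(x, fromLast = TRUE)  # earlier copies are flagged instead

first_dup <- anyDuplicated(x)              # index of the first duplicated entry
no_dup    <- anyDuplicated(c(1, 2, 3))     # 0 when there are no duplicates
```

Because `anyDuplicated` returns an index rather than a logical, it can be used both as a test (`anyDuplicated(x) > 0`) and to locate the offending element.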
The array method calculates for each element of the sub-array
specified by MARGIN
if the remaining dimensions are identical
to those for an earlier (or later, when fromLast = TRUE
) element
(in row-major order). This would most commonly be used to find
duplicated rows (the default) or columns (with MARGIN = 2
).
Note that MARGIN = 0
returns an array of the same
dimensionality attributes as x
.
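As an illustration of the array method, the sketch below uses a small made-up matrix `m` whose first and third rows are equal, and contrasts the row-wise, column-wise and element-wise (`MARGIN = 0`) results:

```r
m <- rbind(c(1, 2),
           c(3, 4),
           c(1, 2))

dup_rows  <- duplicated(m)               # default MARGIN = 1: compare rows
dup_cols  <- duplicated(m, MARGIN = 2)   # compare columns
dup_cells <- duplicated(m, MARGIN = 0)   # compare individual elements

dim(dup_cells)   # same dimensions as m
```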
Missing values ("NA"
) are regarded as equal, numeric and
complex ones differing from NaN
; character strings will be compared in a
“common encoding”; for details, see match
(and
unique
) which use the same concept.
Values in incomparables
will never be marked as duplicated.
This is intended to be used for a fairly small set of values and will
not be efficient for a very large set.
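A short sketch of `incomparables` with the default method, using an invented numeric vector. By default repeated `NA`s count as duplicates; listing `NA` as incomparable prevents that:

```r
x <- c(NA, 1, NA, 1, 2)

dup_default <- duplicated(x)                      # second NA is a duplicate
dup_na_skip <- duplicated(x, incomparables = NA)  # NA is never marked
```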
Except for factors, logical and raw vectors the default nmax = NA
is
equivalent to nmax = length(x)
. Since a hash table of size
8*nmax
bytes is allocated, setting nmax
suitably can
save large amounts of memory. For factors it is automatically set to
the smaller of length(x)
and the number of levels plus one (for
NA
). If nmax
is set too small there is liable to be an
error: nmax = 1
is silently ignored.
Long vectors are supported for the default method of
duplicated
, but may only be usable if nmax
is supplied.
duplicated()
:
For a vector input, a logical vector of the same length as
x
. For a data frame, a logical vector with one element for
each row. For a matrix or array, and when MARGIN = 0
, a
logical array with the same dimensions and dimnames.
anyDuplicated()
: an integer or real vector of length one with
value the 1-based index of the first duplicate if any, otherwise
0
.
Using this for lists is potentially slow, especially if the elements
are not atomic vectors (see vector
) or differ only
in their attributes. In the worst case it is O(n^2).
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
x <- c(9:20, 1:5, 3:7, 0:8)
## extract unique elements
(xu <- x[!duplicated(x)])
## similar, same elements but different order:
(xu2 <- x[!duplicated(x, fromLast = TRUE)])

## xu == unique(x) but unique(x) is more efficient
stopifnot(identical(xu,  unique(x)),
          identical(xu2, unique(x, fromLast = TRUE)))

duplicated(iris)[140:143]

duplicated(iris3, MARGIN = c(1, 3))
anyDuplicated(iris) ## 143

anyDuplicated(x)
anyDuplicated(x, fromLast = TRUE)
Load or unload DLLs (also known as shared objects), and test whether a C function or Fortran subroutine is available.
dyn.load(x, local = TRUE, now = TRUE, ...)
dyn.unload(x)

is.loaded(symbol, PACKAGE = "", type = "")
x |
a character string giving the pathname to a DLL, also known as a dynamic shared object. (See ‘Details’ for what these terms mean.) |
local |
a logical value controlling whether the symbols in the DLL are stored in their own local table and not shared across DLLs, or added to the global symbol table. Whether this has any effect is system-dependent. |
now |
a logical controlling whether all symbols are resolved (and relocated) immediately when the library is loaded or deferred until they are used. This control is useful for developers testing whether a library is complete and has all the necessary symbols, and for users to ignore missing symbols. Whether this has any effect is system-dependent. |
... |
other arguments for future expansion. |
symbol |
a character string giving a symbol name. |
PACKAGE |
if supplied, confine the search for the |
type |
the type of symbol to look for: can be any ( |
The objects dyn.load
loads are called ‘dynamically
loadable libraries’ (abbreviated to ‘DLL’) on all platforms
except macOS, which uses the term for a different sort
of object. On Unix-alikes they are also called ‘dynamic
shared objects’ (‘DSO’), or ‘shared objects’ for
short. (The POSIX standards use ‘executable object file’,
but no one else does.)
See ‘See Also’ and the ‘Writing R Extensions’ and ‘R Installation and Administration’ manuals for how to create and install a suitable DLL.
Unfortunately some rare platforms (e.g., Compaq Tru64) do not handle
the PACKAGE
argument correctly, and may incorrectly find
symbols linked into R.
The additional arguments to dyn.load
mirror the different
aspects of the mode argument to the dlopen()
routine on POSIX
systems. They are available so that users can exercise greater control
over the loading process for an individual library. In general, the
default values are appropriate and you should override them only if
there is good reason and you understand the implications.
The local
argument allows one to control whether the symbols in
the DLL being attached are visible to other DLLs. While maintaining
the symbols in their own namespace is good practice, the ability to
share symbols across related ‘chapters’ is useful in many
cases. Additionally, on certain platforms and versions of an
operating system, certain libraries must have their symbols loaded
globally to successfully resolve all symbols.
One should be careful of one potential side-effect of using lazy
loading via now = FALSE
: if a routine is
called that has a missing symbol, the process will terminate
immediately. The intended use is for library developers to call this with
value TRUE
to check that all symbols are actually resolved and
for regular users to call it with FALSE
so that missing symbols
can be ignored and the available ones can be called.
The initial motivation for adding these was to avoid such termination
in the _init()
routines of the Java virtual machine library.
However, symbols loaded locally may not be (read: probably are not) available
to other DLLs. Those added to the global table are available to all
other elements of the application and so can be shared across two
different DLLs.
Some (very old) systems do not provide (explicit) support for
local/global and lazy/eager symbol resolution. This can be the source
of subtle bugs. One can arrange to have warning messages emitted when
unsupported options are used. This is done by setting either of the
options verbose
or warn
to be non-zero via the
options
function.
There is a short discussion of these additional arguments with some example code available at https://www.stat.ucdavis.edu/~duncan/R/dynload/.
The function dyn.load
is used for its side effect which links
the specified DLL to the executing R image. Calls to .C
,
.Call
, .Fortran
and .External
can then be used to
execute compiled C functions or Fortran subroutines contained in the
library. The return value of dyn.load
is an object of class
DLLInfo
. See getLoadedDLLs
for information about
this class.
The function dyn.unload
unlinks the DLL. Note that unloading a
DLL and then re-loading a DLL of the same name may or may not work: on
Solaris it used the first version loaded. Note also that some DLLs cannot
be safely unloaded at all: unloading a DLL which implements C finalizers
but does not unregister them on unload causes R to crash.
is.loaded
checks if the symbol name is loaded and
searchable and hence available for use as a character string value
for argument .NAME
in .C
, .Fortran
,
.Call
, or .External
. It will succeed if any one of the
four calling functions would succeed in using the entry point unless
type
is specified. (See .Fortran
for how Fortran
symbols are mapped.) Note that symbols in base packages are not
searchable, and other packages can be so marked.
Do not use dyn.unload
on a DLL loaded by
library.dynam
: use library.dynam.unload
.
This is needed for system housekeeping.
is.loaded
requires the name you would give to .C
etc.
It must be a character string and so cannot be an R object as used
for registered native symbols (see “Writing R Extensions”
section 5.4). Some registered symbols are available by name but most are
not, including those in the examples below.
By default, the maximum number of DLLs that can be loaded is now 614
when the OS limit on the number of open files allows or can be
increased, but less otherwise (but it will be at least 100). A
specific maximum can be requested via the environment variable
R_MAX_NUM_DLLS, which has to be set (to a value between 100 and
1000 inclusive) before starting an R session. If the OS limit on
the number of open files does not allow using this maximum and cannot
be increased, R will fail to start with an error. The maximum is not
allowed to be greater than 60% of the OS limit on the number of open
files (essentially unlimited on Windows, on Unix typically 1024, but
256 on macOS). The limit can sometimes (including on macOS) be
modified using command ulimit -n
(sh
,
bash
) or limit descriptors
(csh
) in the
shell used to launch R. Increasing R_MAX_NUM_DLLS comes with
some memory overhead, and be aware that many types of
connections also use file descriptors.
If the OS limit on the number of open files cannot be determined, the DLL limit is 100 and cannot be changed via R_MAX_NUM_DLLS.
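To see how close a session is to the DLL limit, one can inspect the environment variable and count the currently loaded DLLs. This is only a sketch; the variable names are illustrative, and `R_MAX_NUM_DLLS` is merely read back here because it has no effect once R has started.

```r
# NA if the variable was not set before R was launched
requested_max <- Sys.getenv("R_MAX_NUM_DLLS", unset = NA)

loaded    <- getLoadedDLLs()   # DLLs currently linked into this session
n_loaded  <- length(loaded)
dll_names <- names(loaded)

n_loaded
```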
The creation of DLLs and the runtime linking of them into executing
programs is very platform dependent. In recent years there has been
some simplification in the process because the C subroutine call
dlopen
has become the POSIX standard for doing this. Under
Unix-alikes dyn.load
uses the dlopen
mechanism and
should work on all platforms which support it. On Windows it uses the
standard mechanism (LoadLibrary
) for loading DLLs.
The original code for loading DLLs in Unix-alikes was provided by Heiner Schwarte.
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
library.dynam
to be used inside a package's
.onLoad
initialization.
SHLIB
for how to create suitable DLLs.
.C
,
.Fortran
,
.External
,
.Call
.
## expect all of these to be false in R >= 3.0.0 as these can only be
## used via registered symbols.
is.loaded("supsmu") # Fortran entry point in stats
is.loaded("supsmu", "stats", "Fortran")
is.loaded("PDF", type = "External") # pdf() device in grDevices
eapply
applies FUN
to the named values from an
environment
and returns the results as a list. The user
can request that all named objects are used (normally names that begin
with a dot are not). The output is not sorted and no enclosing
environments are searched.
eapply(env, FUN, ..., all.names = FALSE, USE.NAMES = TRUE)
env |
environment to be used. |
FUN |
the function to be applied, found via
|
... |
optional arguments to |
all.names |
a logical indicating whether to apply the function to all values. |
USE.NAMES |
logical indicating whether the resulting list should
have |
A named (unless USE.NAMES = FALSE
) list. Note that the order of
the components is arbitrary for hashed environments.
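The effect of `all.names` can be seen with an environment that holds one ordinary and one dot-prefixed object. The names below are invented for the sketch:

```r
env <- new.env()
env$visible <- 1:4
env$.hidden <- 10

default_res <- eapply(env, length)                    # skips ".hidden"
all_res     <- eapply(env, length, all.names = TRUE)  # includes it

sort(names(all_res))   # sort, since the order is arbitrary for hashed environments
```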
require(stats)

env <- new.env(hash = FALSE) # so the order is fixed
env$a <- 1:10
env$beta <- exp(-3:3)
env$logic <- c(TRUE, FALSE, FALSE, TRUE)
# what have we there?
utils::ls.str(env)

# compute the mean for each list element
eapply(env, mean)
unlist(eapply(env, mean, USE.NAMES = FALSE))

# median and quartiles for each element (making use of "..." passing):
eapply(env, quantile, probs = 1:3/4)
eapply(env, quantile)
Computes eigenvalues and eigenvectors of numeric (double, integer, logical) or complex matrices.
eigen(x, symmetric, only.values = FALSE, EISPACK = FALSE)
x |
a numeric or complex matrix whose spectral decomposition is to be computed. Logical matrices are coerced to numeric. |
symmetric |
if |
only.values |
if |
EISPACK |
logical. Defunct and ignored. |
If symmetric
is unspecified, isSymmetric(x)
determines if the matrix is symmetric up to plausible numerical
inaccuracies. It is surer and typically much faster to set the value
yourself.
Computing the eigenvectors is the slow part for large matrices.
Computing the eigendecomposition of a matrix is subject to errors on a
real-world computer: the definitive analysis is Wilkinson (1965). All
you can hope for is a solution to a problem suitably close to
x
. So even though a real asymmetric x
may have an
algebraic solution with repeated real eigenvalues, the computed
solution may be of a similar matrix with complex conjugate pairs of
eigenvalues.
Unsuccessful results from the underlying LAPACK code will result in an
error giving a positive error code (most often 1
): these can
only be interpreted by detailed study of the FORTRAN code.
Missing, NaN
or infinite values in x
will give
an error.
The spectral decomposition of x
is returned as a list with components
values |
a vector containing the |
vectors |
either a Recall that the eigenvectors are only defined up to a constant: even when the length is specified they are still only defined up to a scalar of modulus one (the sign for real matrices). |
When only.values
is not true, as by default, the result is of
S3 class "eigen"
.
If r <- eigen(A)
, and V <- r$vectors; lam <- r$values
,
then
$A = V \Lambda V^{-1}$
(up to numerical
fuzz), where $\Lambda =$ diag(lam)
.
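A quick numerical check of this identity, as a sketch on a small symmetric matrix chosen for the example: rebuilding `A` from `V`, `diag(lam)` and the inverse of `V` should reproduce the input up to rounding.

```r
A <- matrix(c(2, 1,
              1, 3), nrow = 2)

r   <- eigen(A, symmetric = TRUE)
V   <- r$vectors
lam <- r$values

# A = V diag(lam) V^{-1}, up to numerical fuzz
A_rebuilt <- V %*% diag(lam) %*% solve(V)
all.equal(A, A_rebuilt)
```

For a real symmetric matrix the eigenvectors returned are orthonormal, so `t(V)` could be used in place of `solve(V)`.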
eigen
uses the LAPACK routines DSYEVR
, DGEEV
,
ZHEEV
and ZGEEV
.
LAPACK is from https://netlib.org/lapack/ and its guide is listed in the references.
Anderson, E. and ten others (1999)
LAPACK Users' Guide. Third Edition. SIAM.
Available on-line at
https://netlib.org/lapack/lug/lapack_lug.html.
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
Wilkinson, J. H. (1965) The Algebraic Eigenvalue Problem. Clarendon Press, Oxford.
svd
, a generalization of eigen
; qr
, and
chol
for related decompositions.
To compute the determinant of a matrix, the qr
decomposition is much more efficient: det
.
eigen(cbind(c(1,-1), c(-1,1)))
eigen(cbind(c(1,-1), c(-1,1)), symmetric = FALSE)
# same (different algorithm).

eigen(cbind(1, c(1,-1)), only.values = TRUE)
eigen(cbind(-1, 2:1)) # complex values
eigen(print(cbind(c(0, 1i), c(-1i, 0)))) # Hermite ==> real Eigenvalues
## 3 x 3:
eigen(cbind( 1, 3:1, 1:3))
eigen(cbind(-1, c(1:2,0), 0:2)) # complex values
encodeString
escapes the strings in a character vector in the
same way print.default
does, and optionally fits the encoded
strings within a field width.
encodeString(x, width = 0, quote = "", na.encode = TRUE, justify = c("left", "right", "centre", "none"))
x |
a character vector, or an object that can be coerced to one
by |
width |
integer: the minimum field width. If |
quote |
character: quoting character, if any. |
na.encode |
logical: should |
justify |
character: partial matches are allowed. If padding to
the minimum field width is needed, how should spaces be inserted?
|
This escapes backslash and the control characters ‘\a’ (bell), ‘\b’ (backspace), ‘\f’ (form feed), ‘\n’ (line feed, aka “newline”), ‘\r’ (carriage return), ‘\t’ (tab) and ‘\v’ (vertical tab) as well as any non-printable characters in a single-byte locale, which are printed in octal notation (‘\xyz’ with leading zeroes).
Which characters are non-printable depends on the current locale.
Windows' reporting of printable characters is unreliable, so there all
other control characters are regarded as non-printable, and all
characters with codes 32–255 as printable in a single-byte locale.
See print.default
for how non-printable characters are
handled in multi-byte locales.
If quote
is a single or double quote any embedded quote of the
same type is escaped. Note that justification is of the quoted
string, hence spaces are added outside the quotes.
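The quoting and justification rules can be illustrated as follows. This is a sketch with invented inputs; the results are inspected with `nchar` rather than printed, because printing would escape the strings a second time.

```r
x <- c("it's", "tab\there")

plain  <- encodeString(x)               # tab becomes the two characters \ and t
quoted <- encodeString(x, quote = "'")  # embedded single quote is escaped too

# width = NA pads to the common width of the quoted strings;
# with right justification the spaces go outside the quotes
padded <- encodeString(c("a", "abcde"), width = NA,
                       quote = "'", justify = "right")
nchar(padded)
```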
A character vector of the same length as x
, with the same
attributes (including names and dimensions) but with no class set.
Marked UTF-8 encodings are preserved.
The default for width
is different from format.default
,
which does similar things for character vectors but without encoding
using escapes.
x <- "ab\bc\ndef"
print(x)
cat(x) # interprets escapes
cat(encodeString(x), "\n", sep = "") # similar to print()

factor(x) # makes use of this to print the levels

x <- c("a", "ab", "abcde")
encodeString(x) # width = 0: use as little as possible
encodeString(x, 2) # use two or more (left justified)
encodeString(x, width = NA) # left justification
encodeString(x, width = NA, justify = "c")
encodeString(x, width = NA, justify = "r")
encodeString(x, width = NA, quote = "'", justify = "r")
Read or set the declared encodings for a character vector.
Encoding(x)
Encoding(x) <- value

enc2native(x)
enc2utf8(x)
x |
A character vector. |
value |
A character vector of positive length. |
Character strings in R can be declared to be encoded in
"latin1"
or "UTF-8"
or as "bytes"
. These
declarations can be read by Encoding
, which will return a
character vector of values "latin1"
, "UTF-8"
, "bytes"
or "unknown"
, or set, when value
is
recycled as needed and other values are silently treated as
"unknown"
. ASCII strings will never be marked with a declared
encoding, since their representation is the same in all supported
encodings. Strings marked as "bytes"
are intended to be
non-ASCII strings which should be manipulated as bytes, and never
converted to a character encoding (so writing them to a text file is
supported only by writeLines(useBytes = TRUE)
).
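The sketch below illustrates two of the points above with invented strings: an ASCII string stays `"unknown"` even after an explicit declaration, while a non-ASCII string keeps the declared encoding and can then be converted with `enc2utf8`.

```r
ascii <- "plain"
Encoding(ascii) <- "UTF-8"     # has no effect on an ASCII string
Encoding(ascii)

x <- "fran\xE7ais"             # byte 0xE7 is c-cedilla in Latin-1
Encoding(x) <- "latin1"        # declare how the bytes should be read
u <- enc2utf8(x)               # re-encode, taking the declaration into account
Encoding(c(x, u))
```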
enc2native
and enc2utf8
convert elements of character
vectors to the native encoding or UTF-8 respectively, taking any
marked encoding into account. They are primitive functions,
designed to do minimal copying.
There are other ways for character strings to acquire a declared
encoding apart from explicitly setting it (and these have changed as
R has evolved). The parser marks strings containing ‘\u’ or
‘\U’ escapes. Functions scan
,
read.table
, readLines
, and
parse
have an encoding
argument that is used to
declare encodings, iconv
declares encodings from its
to
argument, and console input in suitable locales is also
declared. intToUtf8
declares its output as
"UTF-8"
, and output text connections (see
textConnection
) are marked if running in a
suitable locale. Under some circumstances (see its help page)
source(encoding=)
will mark encodings of character
strings it outputs.
Most character manipulation functions will set the encoding on output
strings if it was declared on the corresponding input. These include
chartr
, strsplit(useBytes = FALSE)
,
tolower
and toupper
as well as
sub(useBytes = FALSE)
and gsub(useBytes =
FALSE)
. Note that such functions do not preserve the
encoding, but if they know the input encoding and that the string has
been successfully re-encoded (to the current encoding or UTF-8), they
mark the output.
substr
does preserve the encoding, and
chartr
, tolower
and toupper
preserve UTF-8 encoding on systems with Unicode wide characters. With
their fixed
and perl
options, strsplit
,
sub
and gsub
will give a marked UTF-8 result if
any of the inputs are UTF-8.
paste
and sprintf
return elements marked
as bytes if any of the corresponding inputs is marked as bytes, and
otherwise marked as UTF-8 if any of the inputs is marked as UTF-8.
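The marking rule for `paste` can be sketched as follows. The strings are made up; `u` is marked as UTF-8 by the parser because it contains a `\u` escape, and `b` is the same string re-declared as bytes.

```r
u <- "caf\u00e9"               # parser marks this as UTF-8
b <- u
Encoding(b) <- "bytes"

enc_utf8  <- Encoding(paste(u, "x"))   # UTF-8 input plus ASCII input
enc_bytes <- Encoding(paste(b, u))     # any bytes input wins
c(enc_utf8, enc_bytes)
```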
match
, pmatch
, charmatch
,
duplicated
and unique
all match in UTF-8
if any of the elements are marked as UTF-8.
Changing the current encoding from a running R session may lead to
confusion (see Sys.setlocale
).
There is some ambiguity as to what is meant by a ‘Latin-1’ locale, since some OSes (notably Windows) make use of character positions undefined (or used for control characters) in the ISO 8859-1 character set. How such characters are interpreted is system-dependent but as from R 3.5.0 they are if possible interpreted as per Windows codepage 1252 (which Microsoft calls ‘Windows Latin 1 (ANSI)’) when converting to e.g. UTF-8.
A character vector.
For enc2utf8
, encodings are always marked; for
enc2native
, they are marked in UTF-8 and Latin-1 locales.
## x is intended to be in latin1
x. <- x <- "fran\xE7ais"
Encoding(x.) # "unknown" (UTF-8 loc.) | "latin1" (8859-1/CP-1252 loc.) | ....
Encoding(x) <- "latin1"
x
xx <- iconv(x, "latin1", "UTF-8")
Encoding(c(x., x, xx))
c(x, xx)
xb <- xx; Encoding(xb) <- "bytes"
xb # will be encoded in hex
cat("x = ", x, ", xx = ", xx, ", xb = ", xb, "\n", sep = "")
(Ex <- Encoding(c(x.,x,xx,xb)))
stopifnot(identical(Ex, c(Encoding(x.), Encoding(x),
                          Encoding(xx), Encoding(xb))))
Get, set, test for and create environments.
environment(fun = NULL)
environment(fun) <- value

is.environment(x)

.GlobalEnv
globalenv()
.BaseNamespaceEnv

emptyenv()
baseenv()

new.env(hash = TRUE, parent = parent.frame(), size = 29L)

parent.env(env)
parent.env(env) <- value

environmentName(env)

env.profile(env)
fun |
|
value |
an environment to associate with the function. |
x |
an arbitrary R object. |
hash |
a logical, if |
parent |
an environment to be used as the enclosure of the environment created. |
env |
an environment. |
size |
an integer specifying the initial size for a hashed
environment. An internal default value will be used if
|
Environments consist of a frame, or collection of named objects, and a pointer to an enclosing environment. The most common example is the frame of variables local to a function call; its enclosure is the environment where the function was defined (unless changed subsequently). The enclosing environment is distinguished from the parent frame: the latter (returned by parent.frame) refers to the environment of the caller of a function. Since confusion is so easy, it is best never to use ‘parent’ in connection with an environment (despite the presence of the function parent.env).
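The difference between the enclosure and the parent frame can be checked directly. The following is a minimal sketch; the names make_reporter, reporter and caller are hypothetical and exist only for this illustration.

```r
# make_reporter() returns a closure; the closure's enclosure is the frame
# of the make_reporter() call in which it was defined.
make_reporter <- function() {
  defined_in <- environment()
  function() {
    list(enclosure  = parent.env(environment()),  # where the closure was defined
         caller     = parent.frame(),             # where the closure was called from
         defined_in = defined_in)
  }
}
reporter <- make_reporter()

caller <- function() {
  here <- environment()
  res <- reporter()
  c(enclosure_is_definition = identical(res$enclosure, res$defined_in),
    parent_frame_is_caller  = identical(res$caller, here))
}
result <- caller()
print(result)
```

Both comparisons should hold: the enclosure is fixed when the function is created, whereas the parent frame depends on who calls it.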
When get
or exists
search an environment
with the default inherits = TRUE
, they look for the variable
in the frame, then in the enclosing frame, and so on.
The global environment .GlobalEnv
, more often known as the
user's workspace, is the first item on the search path. It can also
be accessed by globalenv()
. On the search path, each item's
enclosure is the next item.
The object .BaseNamespaceEnv
is the namespace environment for
the base package. The environment of the base package itself is
available as baseenv()
.
If one follows the chain of enclosures found by repeatedly calling
parent.env
from any environment, eventually one reaches the
empty environment emptyenv()
, into which nothing may
be assigned.
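As a sketch of following the chain of enclosures, the hypothetical helper env_chain below (not part of base R) collects the names of the environments visited until the empty environment is reached.

```r
# Walk from `env` to the empty environment, recording each environment's name.
env_chain <- function(env) {
  out <- character()
  repeat {
    out <- c(out, environmentName(env))
    if (identical(env, emptyenv())) break
    env <- parent.env(env)
  }
  out
}

chain <- env_chain(globalenv())
print(chain)
```

Starting from the global environment, the chain runs along the search path, so the entries in between depend on which packages are attached.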
The replacement function parent.env<-
is extremely dangerous as
it can be used to destructively change environments in ways that
violate assumptions made by the internal C code. It may be removed
in the near future.
The replacement form of environment
, is.environment
,
baseenv
, emptyenv
and globalenv
are
primitive functions.
System environments, such as the base, global and empty environments,
have names as do the package and namespace environments and those
generated by attach()
. Other environments can be named by
giving a "name"
attribute, but this needs to be done with care
as environments have unusual copying semantics.
If fun
is a function or a formula then environment(fun)
returns the environment associated with that function or formula.
If fun
is NULL
then the current evaluation environment is
returned.
The replacement form sets the environment of the function or formula
fun
to the value
given.
is.environment(obj)
returns TRUE
if and only if
obj
is an environment
.
new.env
returns a new (empty) environment with (by default)
enclosure the parent frame.
parent.env
returns the enclosing environment of its argument.
parent.env<-
sets the enclosing environment of its first
argument.
environmentName
returns a character string, that given when
the environment is printed or ""
if it is not a named environment.
env.profile
returns a list with the following components:
size
the number of chains that can be stored in the hash table,
nchains
the number of non-empty chains in the table (as
reported by HASHPRI
), and counts
an integer vector
giving the length of each chain (zero for empty chains). This
function is intended to assess the performance of hashed environments.
When env
is a non-hashed environment, NULL
is returned.
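A small sketch of inspecting a hashed environment follows. The variable names and the initial size are arbitrary choices for illustration, and the table may be resized internally, so the reported size need not equal the requested one.

```r
hashed <- new.env(hash = TRUE, size = 10L)
for (nm in letters) assign(nm, 1, envir = hashed)

prof <- env.profile(hashed)
prof$size          # number of chains the table can hold
prof$nchains       # number of non-empty chains
sum(prof$counts)   # total number of bindings stored across all chains

# A non-hashed environment has no profile.
unhashed_profile <- env.profile(new.env(hash = FALSE))
is.null(unhashed_profile)
```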
For the performance implications of hashing or not, see https://en.wikipedia.org/wiki/Hash_table.
The envir argument of eval, get, and exists.
ls may be used to view the objects in an environment, and hence ls.str may be useful for an overview.
sys.source can be used to populate an environment.
f <- function() "top level function"

##-- all three give the same:
environment()
environment(f)
.GlobalEnv

ls(envir = environment(stats::approxfun(1:2, 1:2, method = "const")))

is.environment(.GlobalEnv) # TRUE

e1 <- new.env(parent = baseenv())  # this one has enclosure package:base.
e2 <- new.env(parent = e1)
assign("a", 3, envir = e1)
ls(e1)
ls(e2)
exists("a", envir = e2)   # this succeeds by inheritance
exists("a", envir = e2, inherits = FALSE)
exists("+", envir = e2)   # this succeeds by inheritance

eh <- new.env(hash = TRUE, size = NA)
with(env.profile(eh), stopifnot(size == length(counts)))
Details of some of the environment variables which affect an R session.
It is impossible to list all the environment variables which can affect an R session: some affect the OS system functions which R uses, and others will affect add-on packages. But here are notes on some of the more important ones. Those that set the defaults for options are consulted only at startup (as are some of the others).
The user's ‘home’ directory.
Optional. The language(s) to be used for message translations. This is consulted when needed.
(etc) Optional. Use to set various aspects of the locale – see Sys.getlocale. Consulted at startup.
The path to makeindex. If unset, a value determined when R was built is used. Used by the emulation mode of texi2dvi and texi2pdf.
Optional – set in a batch session, that is one started by R CMD BATCH. Most often set to "", so test by something like !is.na(Sys.getenv("R_BATCH", NA)).
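The reason for testing with NA rather than comparing against "" is that a variable can be set to an empty string. A minimal sketch, where env_is_set is a hypothetical helper and DEMO_ENVVAR_FOR_DOCS is a made-up variable name:

```r
# TRUE if the variable is set at all, even to the empty string.
env_is_set <- function(name) !is.na(Sys.getenv(name, unset = NA))

in_batch <- env_is_set("R_BATCH")   # expected to be TRUE only under R CMD BATCH

Sys.setenv(DEMO_ENVVAR_FOR_DOCS = "1")
set_now <- env_is_set("DEMO_ENVVAR_FOR_DOCS")
Sys.unsetenv("DEMO_ENVVAR_FOR_DOCS")
unset_now <- env_is_set("DEMO_ENVVAR_FOR_DOCS")
```

Whether a variable can be set to an empty string at all is platform-dependent, so that case is not exercised here.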
The path to the default browser. Used to
set the default value of options("browser")
.
Optional. If set to FALSE
,
command-line completion is not used. (Not used by the macOS GUI.)
A comma-separated list of packages
which are to be attached in every session. See options
.
The location of the R ‘doc’ directory. Set by R.
Optional. The path to the site environment file: see Startup. Consulted at startup.
Optional. The path to Ghostscript, used by
dev2bitmap
, bitmap
and
embedFonts
. Consulted when those functions are
invoked. Since it will be treated as if passed to
system
, spaces and shell metacharacters should be escaped.
Optional. The path of the history file: see Startup. Consulted at startup and when the history is saved.
Optional. The maximum size of the history file, in lines. Exactly how this is used depends on the interface:
For the readline command-line interface it takes effect when the history is saved (by savehistory or at the end of a session).
For Rgui it controls the number of lines saved to the history file: the size of the history used in the session is controlled by the console customization: see Rconsole.
The top-level directory of the R
installation: see R.home
. Set by R.
The location of the R ‘include’ directory. Set by R.
Optional. Used for initial setting of
.libPaths
.
Optional. Used for initial setting of
.libPaths
.
Optional. Used for initial setting of
.libPaths
.
Optional. Used to set the default for
options("papersize")
, e.g. used by
pdf
and postscript
.
Optional. Consulted when
PCRE's JIT pattern compiler is first used. See grep
.
The path to the default PDF viewer. Used
by R CMD Rd2pdf
.
The platform – a string of the form
"cpu-vendor-os"
, see R.Version
.
Optional. The path to the site profile file: see Startup. Consulted at startup.
Options for pdflatex
processing of
Rd
files. Used by R CMD Rd2pdf
.
The location of the R ‘share’ directory. Set by R.
The path to texi2dvi
.
Defaults to the value of TEXI2DVI, and if that is unset to a
value determined when R was built.
Only on Unix-alikes:
Consulted at startup to set the default for
options("texi2dvi")
, used by
texi2dvi
and texi2pdf
in package tools.
The path to HTML tidy. Used by R CMD check if _R_CHECK_RD_VALIDATE_RD2HTML_ is set to a true value (as it is by --as-cran).
The path to unzip
. Sets the
initial value for options("unzip")
on a Unix-alike
when namespace utils is loaded.
The path to zip
. Used by
zip
and by R CMD INSTALL --build
on Windows.
Consulted (in that
order) when setting the temporary directory for the session: see
tempdir
. TMPDIR is also used by some of the
utilities: see the help for build
.
Optional. The current time zone. See
Sys.timezone
for the system-specific
formats. Consulted as needed.
Optional. The top-level directory of the
time-zone database. See Sys.timezone
.
(and more). Optional. Settings for download.file
:
see its help for further details.
Some variables set on Unix-alikes, and not (in general) on Windows.
Optional: used by X11
, Tk (in
package tcltk), the data editor and various packages.
The path to the default editor: sets the
default for options("editor")
when namespace
utils is loaded.
The path to the pager with the default setting of
options("pager")
. The default value is chosen at
configuration, usually as the path to less
.
Sets the default for
options("printcmd")
, which sets the default print
command to be used by postscript
.
logical. Sets the default for the
support_old_tars
argument of untar
. Should
be set to TRUE
if an old system tar
command is
used which does not support either xz
compression or
automagically detecting compression type.
Some Windows-specific variables are:
Optional: the path to Ghostscript, used if R_GSCMD is not set.
The user's ‘home’ directory. Set by R. (HOME will be set to the same value if not already set.)
Sys.getenv and Sys.setenv to read and set environment variables in an R session.
gctorture for environment variables controlling garbage collection.
Evaluate an R expression in a specified environment.
eval(expr, envir = parent.frame(),
           enclos = if(is.list(envir) || is.pairlist(envir))
                       parent.frame() else baseenv())
evalq(expr, envir, enclos)
eval.parent(expr, n = 1)
local(expr, envir = new.env())
expr |
an object to be evaluated. See ‘Details’. |
envir |
the |
enclos |
relevant when |
n |
number of parent generations to go back. |
eval
evaluates the expr
argument in the
environment specified by envir
and returns the computed value.
If envir
is not specified, then the default is
parent.frame()
(the environment where the call to
eval
was made).
Objects to be evaluated can be of types call
or
expression
or name (when the name is looked
up in the current scope and its binding is evaluated), a promise
or any of the basic types such as vectors, functions and environments
(which are returned unchanged).
The evalq
form is equivalent to eval(quote(expr), ...)
.
eval
evaluates its first argument in the current scope
before passing it to the evaluator: evalq
avoids this.
eval.parent(expr, n)
is a shorthand for
eval(expr, parent.frame(n))
.
If envir
is a list (such as a data frame) or pairlist, it is
copied into a temporary environment (with enclosure enclos
),
and the temporary environment is used for evaluation. So if
expr
changes any of the components named in the (pair)list, the
changes are lost.
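The copying behaviour can be seen in a short sketch (the data frame and its column name are arbitrary): an assignment made while evaluating in a data frame affects only the temporary environment.

```r
df <- data.frame(a = 1:3)

# `a` is modified inside the temporary environment built from df ...
res <- eval(quote({ a <- a * 10; sum(a) }), df)
res

# ... but the data frame itself is untouched.
df$a
```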
If envir
is NULL
it is interpreted as an empty list so
no values could be found in envir
and look-up goes directly to
enclos
.
local
evaluates an expression in a local environment. It is
equivalent to evalq
except that its default argument creates a
new, empty environment. This is useful to create anonymous recursive
functions and as a kind of limited namespace feature since variables
defined in the environment are not visible from the outside.
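As an illustration of the limited namespace idea, the sketch below keeps a counter in an environment created by local; the names next_id and n_calls are hypothetical.

```r
next_id <- local({
  n_calls <- 0              # private to the environment created by local()
  function() {
    n_calls <<- n_calls + 1 # updates the private variable, not a global one
    n_calls
  }
})

first  <- next_id()
second <- next_id()
c(first, second)

# The state lives in the function's enclosure, not in the calling environment.
ls(envir = environment(next_id))
```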
The result of evaluating the object: for an expression vector this is the result of evaluating the last element.
Due to the difference in scoping rules, there are some differences between R and S in this area. In particular, the default enclosure in S is the global environment.
When evaluating expressions in a data frame that has been passed as an
argument to a function, the relevant enclosure is often the caller's
environment, i.e., one needs
eval(x, data, parent.frame())
.
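A minimal sketch of that pattern follows. The helper col_eval and the variable multiplier are hypothetical names used only for this example; the point is that names not found in the data frame are looked up in the caller's frame.

```r
# Evaluate an unevaluated expression using the columns of `data`,
# falling back to the variables of whoever called col_eval().
col_eval <- function(expr, data) {
  eval(substitute(expr), data, parent.frame())
}

run <- function() {
  multiplier <- 100            # exists only in the frame of run()
  d <- data.frame(x = 1:3)
  col_eval(x * multiplier, d)  # x comes from d, multiplier from run()
}
out <- run()
out
```

Had the third argument been omitted, the default enclosure would be the frame of col_eval itself, where multiplier is not defined.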
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988)
The New S Language.
Wadsworth & Brooks/Cole. (eval
only.)
expression
, quote
, sys.frame
,
parent.frame
, environment
.
Further, force
to force evaluation, typically of
function arguments.
eval(2 ^ 2 ^ 3)
mEx <- expression(2^2^3); mEx; 1 + eval(mEx)
eval({ xx <- pi; xx^2}) ; xx

a <- 3 ; aa <- 4 ; evalq(evalq(a+b+aa, list(a = 1)), list(b = 5)) # == 10
a <- 3 ; aa <- 4 ; evalq(evalq(a+b+aa, -1), list(b = 5))        # == 12

ev <- function() {
   e1 <- parent.frame()
   ## Evaluate a in e1
   aa <- eval(expression(a), e1)
   ## evaluate the expression bound to a in e1
   a <- expression(x+y)
   list(aa = aa, eval = eval(a, e1))
}
tst.ev <- function(a = 7) { x <- pi; y <- 1; ev() }
tst.ev()  #-> aa : 7,  eval : 4.14

a <- list(a = 3, b = 4)
with(a, a <- 5) # alters the copy of a from the list, discarded.

##
## Example of evalq()
##

N <- 3
env <- new.env()
assign("N", 27, envir = env)
## this version changes the visible copy of N only, since the argument
## passed to eval is '4'.
eval(N <- 4, env)
N
get("N", envir = env)
## this version does the assignment in env, and changes N only there.
evalq(N <- 5, env)
N
get("N", envir = env)

##
## Uses of local()
##

# Mutually recursive.
# gg gets value of last assignment, an anonymous version of f.

gg <- local({
    k <- function(y)f(y)
    f <- function(x) if(x) x*k(x-1) else 1
})
gg(10)
sapply(1:5, gg)

# Nesting locals: a is private storage accessible to k
gg <- local({
    k <- local({
        a <- 1
        function(y){print(a <<- a+1);f(y)}
    })
    f <- function(x) if(x) x*k(x-1) else 1
})
sapply(1:5, gg)

ls(envir = environment(gg))
ls(envir = environment(get("k", envir = environment(gg))))
Look for an R object of the given name and possibly return it.
exists(x, where = -1, envir = , frame, mode = "any",
       inherits = TRUE)
get0(x, envir = pos.to.env(-1L), mode = "any", inherits = TRUE,
     ifnotfound = NULL)
x |
a variable name (given as a character string or a symbol). |
where |
where to look for the object (see the details section); if omitted, the function will search as if the name of the object appeared unquoted in an expression. |
envir |
an alternative way to specify an environment to look in,
but it is usually simpler to just use the |
frame |
a frame in the calling list. Equivalent to giving
|
mode |
the mode or type of object sought: see the ‘Details’ section. |
inherits |
should the enclosing frames of the environment be searched? |
ifnotfound |
the return value of |
The where
argument can specify the environment in which to look
for the object in any of several ways: as an integer (the position in
the search
list); as the character string name of an
element in the search list; or as an environment
(including using sys.frame
to access the currently active
function calls). The envir
argument is an alternative way to
specify an environment, but is primarily there for back compatibility.
This function looks to see if the name x
has a value bound to
it in the specified environment. If inherits
is TRUE
and
a value is not found for x
in the specified environment, the
enclosing frames of the environment are searched until the name x
is encountered. See environment
and the ‘R
Language Definition’ manual for details about the structure of
environments and their enclosures.
Warning:
inherits = TRUE
is the default behaviour for R but not for S.
If mode
is specified then only objects of that type are sought.
The mode
may specify one of the collections "numeric"
and
"function"
(see mode
): any member of the
collection will suffice. (This is true even if a member of a
collection is specified, so for example mode = "special"
will
seek any type of function.)
exists():
Logical, true if and only if an object of the correct
name and mode is found.
get0():
The object—as from get(x, *)
—
if exists(x, *)
is true, otherwise ifnotfound
.
With get0(), instead of the easy to read but somewhat inefficient
if (exists(myVarName, envir = myEnvir)) {
  r <- get(myVarName, envir = myEnvir)
  ## ... deal with r ...
}
you now can use the more efficient (and slightly harder to read)
if (!is.null(r <- get0(myVarName, envir = myEnvir))) {
  ## ... deal with r ...
}
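One caveat of the is.null() idiom is that it cannot distinguish a missing object from an object whose value is NULL. The sketch below, in which all object names are made up, shows one possible way around this: passing a unique sentinel as ifnotfound.

```r
e <- new.env()
assign("stored_null", NULL, envir = e)

# Both calls return NULL, so is.null() alone cannot tell the cases apart.
is.null(get0("stored_null",   envir = e, inherits = FALSE))
is.null(get0("never_defined", envir = e, inherits = FALSE))

# A fresh environment is identical only to itself, so it works as a sentinel.
not_found <- new.env()
found_null <- !identical(
  get0("stored_null", envir = e, inherits = FALSE, ifnotfound = not_found),
  not_found)
found_missing <- !identical(
  get0("never_defined", envir = e, inherits = FALSE, ifnotfound = not_found),
  not_found)
c(found_null, found_missing)
```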
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
get
and hasName
. For quite a different
kind of “existence”
checking, namely if function arguments were specified,
missing
;
and for yet a different kind, namely if a file exists,
file.exists
.
##  Define a substitute function if necessary:
if(!exists("some.fun", mode = "function"))
  some.fun <- function(x) { cat("some.fun(x)\n"); x }
search()
exists("ls", 2) # true even though ls is in pos = 3
exists("ls", 2, inherits = FALSE) # false

## These are true (in most circumstances):
identical(ls,   get0("ls"))
identical(NULL, get0(".foo.bar.")) # default ifnotfound = NULL (!)
Create a data frame from all combinations of the supplied vectors or factors. See the description of the return value for precise details of the way this is done.
expand.grid(..., KEEP.OUT.ATTRS = TRUE, stringsAsFactors = TRUE)
... |
vectors, factors or a list containing these. |
KEEP.OUT.ATTRS |
a logical indicating the |
stringsAsFactors |
logical specifying if character vectors are converted to factors. |
A data frame containing one row for each combination of the supplied factors. The first factors vary fastest. The columns are labelled by the factors if these are supplied as named arguments or named components of a list. The row names are ‘automatic’.
Attribute "out.attrs"
is a list which gives the dimension and
dimnames for use by predict
methods.
Conversion to a factor is done with levels in the order they occur in the character vectors (and not alphabetically, as is most common when converting to factors).
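A brief sketch of both points, using arbitrary example values: the first factor varies fastest, and factor levels follow the order of appearance rather than alphabetical order.

```r
g <- expand.grid(size = c("small", "large"), n = 1:2)
g

levels(g$size)                        # order of appearance
levels(factor(c("small", "large")))   # alphabetical, for comparison
```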
Chambers, J. M. and Hastie, T. J. (1992) Statistical Models in S. Wadsworth & Brooks/Cole.
combn
(package utils
) for the generation
of all combinations of n elements, taken m at a time.
require(utils)

expand.grid(height = seq(60, 80, 5), weight = seq(100, 300, 50),
            sex = c("Male","Female"))

x <- seq(0, 10, length.out = 100)
y <- seq(-1, 1, length.out = 20)
d1 <- expand.grid(x = x, y = y)
d2 <- expand.grid(x = x, y = y, KEEP.OUT.ATTRS = FALSE)
object.size(d1) - object.size(d2)
##-> 5992 or 8832 (on 32- / 64-bit platform)
Creates or tests for objects of mode and class "expression"
.
expression(...)
is.expression(x)
as.expression(x, ...)
... |
|
x |
an arbitrary R object. |
‘Expression’ here is not being used in its colloquial sense,
that of mathematical expressions. Those are calls (see
call
) in R, and an R expression vector is a list of
calls, symbols etc, for example as returned by parse
.
As an object of mode "expression"
is a list, it can be
subsetted by [
, [[
or $
, the latter two extracting
individual calls etc. The replacement forms of these operators can be
used to replace or delete elements.
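The list-like behaviour can be sketched as follows; the expressions used are arbitrary.

```r
ex <- expression(a + b, sin(x), 42)

sub   <- ex[1:2]    # "[" keeps an expression vector, here of length 2
first <- ex[[1]]    # "[[" extracts a single element, here the call a + b
class(first)

ex[[2]] <- quote(cos(x))   # replace the second element

# An extracted call can be evaluated, here with values supplied in a list.
value <- eval(ex[[1]], list(a = 1, b = 2))
value
```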
expression
and is.expression
are primitive functions.
expression
is ‘special’: it does not evaluate its arguments.
expression
returns a vector of type "expression"
containing its arguments (unevaluated).
is.expression returns TRUE if x is an expression object and FALSE otherwise.
as.expression
attempts to coerce its argument into an
expression object. It is generic, and only the default method is
described here. (The default method calls
as.vector(type = "expression")
and so may dispatch methods for
as.vector
.) NULL
, calls, symbols (see
as.symbol
) and pairlists are returned as the element of
a length-one expression vector. Atomic vectors are placed
element-by-element into an expression vector (without using any
names): list
s have their type (typeof
)
changed to an expression vector
(keeping all attributes).
Other types are not currently supported.
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
call
,
eval
,
function
.
Further,
text
, legend
, and plotmath
for plotting mathematical expressions.
length(ex1 <- expression(1 + 0:9)) # 1
ex1
eval(ex1) # 1:10

length(ex3 <- expression(u, 2, u + 0:9)) # 3
mode(ex3 [3])   # expression
mode(ex3[[3]])  # call
## but not all components are 'call's :
sapply(ex3, mode  ) #  name  numeric  call
sapply(ex3, typeof) # symbol double language
rm(ex3)
Operators acting on vectors, matrices, arrays and lists to extract or replace parts.
x[i]
x[i, j, ... , drop = TRUE]
x[[i, exact = TRUE]]
x[[i, j, ..., exact = TRUE]]
x$name
getElement(object, name)

x[i] <- value
x[i, j, ...] <- value
x[[i]] <- value
x$name <- value
x , object
|
object from which to extract element(s) or in which to replace element(s). |
i , j , ...
|
indices specifying elements to extract or replace. Indices are
For When indexing arrays by An index value of |
name |
a literal character string or a name (possibly backtick
quoted). For extraction, this is normally (see under
‘Environments’) partially matched to the |
drop |
relevant for matrices and arrays. If |
exact |
controls possible partial matching of |
value |
typically an array-like R object of a similar class as
|
These operators are generic. You can write methods to handle indexing
of specific classes of objects, see InternalMethods as well as
[.data.frame
and [.factor
. The
descriptions here apply only to the default methods. Note that
separate methods are required for the replacement functions
[<-
, [[<-
and $<-
for use when indexing occurs on
the assignment side of an expression.
The most important distinction between [
, [[
and
$
is that the [
can select more than one element whereas
the other two select a single element.
Note that x[[]]
is always erroneous.
The default methods work somewhat differently for atomic vectors,
matrices/arrays and for recursive (list-like, see
is.recursive
) objects. $
is only valid for
recursive objects (and NULL
), and is only discussed in the section below on
recursive objects.
Subsetting (except by an empty index) will drop all attributes except
names
, dim
and dimnames
.
Indexing can occur on the right-hand-side of an expression for
extraction, or on the left-hand-side for replacement. When an index
expression appears on the left side of an assignment (known as
subassignment) then that part of x
is set to the value
of the right hand side of the assignment. In this case no partial
matching of character indices is done, and the left-hand-side is
coerced as needed to accept the values. For vectors, the answer will
be of the higher of the types of x
and value
in the
hierarchy raw < logical < integer < double < complex < character <
list < expression. Attributes are preserved (although names
,
dim
and dimnames
will be adjusted suitably).
Subassignment is done sequentially, so if an index is specified more
than once the latest assigned value for an index will result.
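The coercion and the sequential assignment described above can be illustrated with a few arbitrary vectors:

```r
x <- 1:3                 # integer
x[2] <- 2.5              # the vector is promoted to double
type_after_double <- typeof(x)
x[3] <- "c"              # and then to character
type_after_char <- typeof(x)

y <- c(a = 1L, b = 2L)
y[2] <- TRUE             # logical is lower than integer: y stays integer, names kept

z <- numeric(3)
z[c(1, 1)] <- c(10, 20)  # index 1 is assigned twice; the later value wins
z
```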
It is an error to apply any of these operators to an object which is not subsettable (e.g., a function).
The usual form of indexing is [
. [[
can be used to
select a single element dropping names
, whereas
[
keeps them, e.g., in c(abc = 123)[1]
.
The index object i
can be numeric, logical, character or empty.
Indexing by factors is allowed and is equivalent to indexing by the
numeric codes (see factor
) and not by the character
values which are printed (for which use [as.character(i)]
).
An empty index selects all values: this is most often used to replace
all the entries but keep the attributes
.
Matrices and arrays are vectors with a dimension attribute and so all
the vector forms of indexing can be used with a single index. The
result will be an unnamed vector unless x
is one-dimensional
when it will be a one-dimensional array.
The most common form of indexing a k-dimensional array is to specify k indices to [. As for vector indexing, the indices can be numeric, logical, character, empty or even factor. And again, indexing by factors is equivalent to indexing by the numeric codes, see ‘Atomic vectors’ above.
An empty index (a comma separated blank) indicates that all entries in
that dimension are selected.
The argument drop
applies to this form of indexing.
A third form of indexing is via a numeric matrix with one column for each dimension: each row of the index matrix then selects a single element of the array, and the result is a vector. Negative indices are not allowed in the index matrix. NA and zero values are allowed: rows of an index matrix containing a zero are ignored, whereas rows containing an NA produce an NA in the result.
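A small sketch of matrix indexing with zero and NA entries, using an arbitrary 2 x 3 matrix:

```r
m <- matrix(1:6, nrow = 2)

idx <- rbind(c(1, 1),    # selects m[1, 1]
             c(0, 2),    # contains a zero: this row is ignored
             c(NA, 3),   # contains an NA: gives NA in the result
             c(2, 3))    # selects m[2, 3]
picked <- m[idx]
picked
```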
Indexing via a character matrix with one column per dimension is also supported if the array has dimension names. As with numeric matrix indexing, each row of the index matrix selects a single element of the array. Indices are matched against the appropriate dimension names. NA is allowed and will produce an NA in the result. Unmatched indices as well as the empty string ("") are not allowed and will result in an error.
A vector obtained by matrix indexing will be unnamed unless x
is one-dimensional when the row names (if any) will be indexed to
provide names for the result.
Indexing by [
is similar to atomic vectors and selects a list
of the specified element(s).
Both [[
and $
select a single element of the list. The
main difference is that $
does not allow computed indices,
whereas [[
does. x$name
is equivalent to
x[["name", exact = FALSE]]
. Also, the partial matching
behavior of [[
can be controlled using the exact
argument.
getElement(x, name)
is a version of x[[name, exact = TRUE]]
which for formally classed (S4) objects returns slot(x, name)
,
hence providing access to even more general list-like objects.
[
and [[
are sometimes applied to other recursive
objects such as calls and expressions. Pairlists (such
as calls) are coerced to lists for extraction by [
, but all
three operators can be used for replacement.
[[
can be applied recursively to lists, so that if the single
index i
is a vector of length p
, alist[[i]]
is
equivalent to alist[[i1]]...[[ip]]
providing all but the
final indexing results in a list.
Note that in all three kinds of replacement, a value of NULL
deletes the corresponding item of the list. To set entries to
NULL
, you need x[i] <- list(NULL)
.
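The difference between deleting an element and storing NULL in it, sketched with an arbitrary list:

```r
x <- list(a = 1, b = 2, c = 3)

x$b <- NULL            # deletes the element named b
names(x)

x["a"] <- list(NULL)   # keeps element a, but sets its value to NULL
length(x)
is.null(x$a)
```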
When $<-
is applied to a NULL
x
, it first coerces
x
to list()
. This is what also happens with [[<-
where in R versions less than 4.y.z, a length one value resulted in a
length one (atomic) vector.
Both $
and [[
can be applied to environments. Only
character indices are allowed and no partial matching is done. The
semantics of these operations are those of get(i, env = x,
inherits = FALSE)
. If no match is found then NULL
is
returned. The replacement versions, $<-
and [[<-
, can
also be used. Again, only character arguments are allowed. The
semantics in this case are those of assign(i, value, env = x,
inherits = FALSE)
. Such an assignment will either create a new
binding or change the existing binding in x
.
When extracting, a numerical, logical or character NA
index picks
an unknown element and so returns NA
in the corresponding
element of a logical, integer, numeric, complex or character result,
and NULL
for a list. (It returns 00
for a raw result.)
When replacing (that is using indexing on the lhs of an
assignment) NA
does not select any element to be replaced. As
there is ambiguity as to whether an element of the rhs should
be used or not, this is only allowed if the rhs value is of length one
(so the two interpretations would have the same outcome).
(The documented behaviour of S was that an NA
replacement index
‘goes nowhere’ but uses up an element of value
:
Becker et al. p. 359. However, that has not been true of
other implementations.)
Note that these operations do not match their index arguments in the
standard way: argument names are ignored and positional matching only is
used. So m[j = 2, i = 1]
is equivalent to m[2, 1]
and
not to m[1, 2]
.
This may not be true for methods defined for them; for example it is
not true for the data.frame
methods described in
[.data.frame
which warn if i
or j
is named and have undocumented behaviour in that case.
To avoid confusion, do not name index arguments (but drop
and
exact
must be named).
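For the default methods, the effect of naming index arguments can be checked on an arbitrary matrix:

```r
m <- matrix(1:6, nrow = 2)

named      <- m[j = 2, i = 1]   # names are ignored: same as m[2, 1]
positional <- m[2, 1]
swapped    <- m[1, 2]
c(named = named, positional = positional, swapped = swapped)
```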
These operators are also implicit S4 generics, but as primitives, S4
methods will be dispatched only on S4 objects x
.
The implicit generics for the $
and $<-
operators do not
have name
in their signature because the grammar only allows
symbols or string constants for the name
argument.
Character indices can in some circumstances be partially matched (see
pmatch
) to the names or dimnames of the object being
subsetted (but never for subassignment).
Unlike S (Becker et al. p. 358), R never uses partial
matching when extracting by
[
, and partial matching is not by default used by [[
(see argument exact
).
Thus the default behaviour is to use partial matching only when
extracting from recursive objects (except environments) by $
.
Even in that case, warnings can be switched on by
options(warnPartialMatchDollar = TRUE)
.
Neither empty (""
) nor NA
indices match any names, not
even empty nor missing names. If any object has no names or
appropriate dimnames, they are taken as all ""
and so match
nothing.
Attempting to apply a subsetting operation to objects for which this is
not possible signals an error of class
notSubsettableError
. The object
component of the error
condition contains the non-subsettable object.
Subscript out of bounds errors are signaled as errors of class
subscriptOutOfBoundsError
. The object
component of the
error condition contains the object being subsetted. The integer
subscript
component is zero for vector subscripting, and for
multiple subscripts indicates which subscript was out of bounds. The
index
component contains the erroneous index.
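One way these condition classes might be used is with class-specific handlers in tryCatch. The helper classify below is hypothetical and only a sketch; a generic error handler is included as a fallback for R versions that do not signal the classed conditions described above.

```r
## 'classify' is a hypothetical helper, not part of base R.
classify <- function(expr) {
    tryCatch({
        expr            # forces the promise inside tryCatch
        "no error"
    },
    subscriptOutOfBoundsError = function(e) "out of bounds",
    notSubsettableError       = function(e) "not subsettable",
    error                     = function(e) "some other error")
}

classify(list(1)[[3]])   # subscript out of bounds
classify(sum[1])         # a builtin function is not subsettable
classify((1:3)[2])       # valid subsetting
```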
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
names
for details of matching to names, and
pmatch
for partial matching.
[.data.frame
and [.factor
for the
behaviour when applied to data frames and factors.
Syntax
for operator precedence, and the
‘R Language Definition’ manual about indexing details.
NULL
for details of indexing null objects.
x <- 1:12
m <- matrix(1:6, nrow = 2, dimnames = list(c("a", "b"), LETTERS[1:3]))
li <- list(pi = pi, e = exp(1))
x[10]                 # the tenth element of x
x <- x[-1]            # delete the 1st element of x
m[1,]                 # the first row of matrix m
m[1, , drop = FALSE]  # is a 1-row matrix
m[,c(TRUE,FALSE,TRUE)]# logical indexing
m[cbind(c(1,2,1),3:1)]# matrix numeric index
ci <- cbind(c("a", "b", "a"), c("A", "C", "B"))
m[ci]                 # matrix character index
m <- m[,-1]           # delete the first column of m
li[[1]]               # the first element of list li
y <- list(1, 2, a = 4, 5)
y[c(3, 4)]            # a list containing elements 3 and 4 of y
y$a                   # the element of y named a

## non-integer indices are truncated:
(i <- 3.999999999) # "4" is printed
(1:5)[i]  # 3

## named atomic vectors, compare "[" and "[[" :
nx <- c(Abc = 123, pi = pi)
nx[1] ; nx["pi"] # keeps names, whereas "[[" does not:
nx[[1]] ; nx[["pi"]]

## recursive indexing into lists
z <- list(a = list(b = 9, c = "hello"), d = 1:5)
unlist(z)
z[[c(1, 2)]]
z[[c(1, 2, 1)]]  # both "hello"
z[[c("a", "b")]] <- "new"
unlist(z)

## check $ and [[ for environments
e1 <- new.env()
e1$a <- 10
e1[["a"]]
e1[["b"]] <- 20
e1$b
ls(e1)

## partial matching - possibly with warning :
stopifnot(identical(li$p, pi))
op <- options(warnPartialMatchDollar = TRUE)
stopifnot( identical(li$p, pi), #-- a warning
  inherits(tryCatch (li$p, warning = identity), "warning"))
## revert the warning option:
options(op)
Extract or replace subsets of data frames.
## S3 method for class 'data.frame'
x[i, j, drop = ]

## S3 replacement method for class 'data.frame'
x[i, j] <- value

## S3 method for class 'data.frame'
x[[..., exact = TRUE]]

## S3 replacement method for class 'data.frame'
x[[i, j]] <- value

## S3 replacement method for class 'data.frame'
x$name <- value
x |
data frame. |
i , j , ...
|
elements to extract or replace. For |
name |
a literal character string or a name (possibly backtick quoted). |
drop |
logical. If |
value |
a suitable replacement value: it will be repeated a whole
number of times if necessary and it may be coerced: see the
Coercion section. If |
exact |
logical: see |
Data frames can be indexed in several modes. When [
and
[[
are used with a single vector index (x[i]
or
x[[i]]
), they index the data frame as if it were a list. In
this usage a drop
argument is ignored, with a warning.
There is no data.frame
method for $
, so x$name
uses the default method which treats x
as a list (with partial
matching of column names if the match is unique, see
Extract
). The replacement method (for $
) checks
value
for the correct number of rows, and replicates it if necessary.
When [
and [[
are used with two indices (x[i, j]
and x[[i, j]]
) they act like indexing a matrix: [[
can
only be used to select one element. Note that for each selected
column, xj
say, typically (if it is not matrix-like), the
resulting column will be xj[i]
, and hence rely on the
corresponding [
method, see the examples section.
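The two indexing modes can be compared on a small data frame. This is only a sketch; the data frame df and its columns are illustrative.

```r
df <- data.frame(a = 1:3, b = c("x", "y", "z"), stringsAsFactors = FALSE)

## one index: list-style
df["b"]        # a one-column data frame
df[["b"]]      # the column itself

## two indices: matrix-style
df[2, "b"]     # a single element
df[[2, "b"]]   # the same element; '[[' selects exactly one
df[2:3, ]      # a data frame with two rows
```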
If [
returns a data frame it will have unique (and non-missing)
row names, if necessary transforming the row names using
make.unique
. Similarly, if columns are selected column
names will be transformed to be unique if necessary (e.g., if columns
are selected more than once, or if more than one column of a given
name is selected if the data frame has duplicate column names).
When drop = TRUE
, this is applied to the subsetting of any
matrices contained in the data frame as well as to the data frame itself.
The replacement methods can be used to add whole column(s) by specifying non-existent column(s), in which case the column(s) are added at the right-hand edge of the data frame and numerical indices must be contiguous to existing indices. On the other hand, rows can be added at any row after the current last row, and the columns will be in-filled with missing values. Missing values in the indices are not allowed for replacement.
For [
the replacement value can be a list: each element of the
list is used to replace (part of) one column, recycling the list as
necessary. If columns specified by number are created, the names
(if any) of the corresponding list elements are used to name the
columns. If the replacement is not selecting rows, list values can
contain NULL
elements which will cause the corresponding
columns to be deleted. (See the Examples.)
Matrix indexing (x[i]
with a logical or a 2-column integer
matrix i
) using [
is not recommended. For extraction,
x
is first coerced to a matrix. For replacement, logical
matrix indices must be of the same dimension as x
.
Replacements are done one column at a time, with multiple type
coercions possibly taking place.
Both [
and [[
extraction methods partially match row
names. By default neither partially match column names, but [[
will if exact = FALSE
(and with a warning if exact =
NA
). If you want exact matching on row names, use
match
, as in the examples.
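A minimal sketch of the difference between partial and exact row-name matching (the data frame df and its row names are illustrative):

```r
df <- data.frame(v = 1:3, row.names = c("alpha", "beta", "gamma"))

df["al", ]                           # partially matches row "alpha"
df[match("al", row.names(df)), ]     # exact matching fails: a row of NA
df[match("beta", row.names(df)), ]   # exact match found
```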
For [
a data frame, list or a single column (the latter two
only when dimensions have been dropped). If matrix indexing is used for
extraction a vector results. If the result would be a data frame an
error results if undefined columns are selected (as there is no general
concept of a 'missing' column in a data frame). Otherwise if a single
column is selected and this is undefined the result is NULL
.
For [[
a column of the data frame or NULL
(extraction with one index)
or a length-one vector (extraction with two indices).
For $
, a column of the data frame (or NULL
).
For [<-
, [[<-
and $<-
, a data frame.
The story over when replacement values are coerced is a complicated one, and one that has changed during R's development. This section is a guide only.
When [
and [[
are used to add or replace a whole column,
no coercion takes place but value
will be
replicated (by calling the generic function rep
) to the
right length if an exact number of repeats can be used.
When [
is used with a logical matrix, each value is coerced to
the type of the column into which it is to be placed.
When [
and [[
are used with two indices, the
column will be coerced as necessary to accommodate the value.
Note that when the replacement value is an array (including a matrix)
it is not treated as a series of columns (as
data.frame
and as.data.frame
do) but
inserted as a single column.
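The contrast with data.frame can be sketched as follows (df and the column name m are illustrative): assigning a matrix yields one matrix-valued column, whereas data.frame splits the matrix into separate columns.

```r
df <- data.frame(id = 1:3)
df$m <- matrix(1:6, nrow = 3)    # inserted as a single (matrix) column

ncol(df)      # columns 'id' and 'm'
dim(df$m)     # the column keeps its matrix dimensions

## data.frame() splits the same matrix into two columns
ncol(data.frame(id = 1:3, m = matrix(1:6, nrow = 3)))
```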
The default behaviour when only one row is left is equivalent to
specifying drop = FALSE
. To drop from a data frame to a list,
drop = TRUE
has to be specified explicitly.
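For illustration (df is a made-up data frame), the default for drop differs between selecting one row and selecting one column:

```r
df <- data.frame(a = 1:3, b = c(2.5, 3.5, 4.5))

class(df[1, ])                  # one row: still a data frame
class(df[1, , drop = TRUE])     # explicitly dropped to a list
class(df[, 1])                  # one column: dropped to a vector by default
class(df[, 1, drop = FALSE])    # kept as a data frame
```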
Arguments other than drop
and exact
should not be named:
there is a warning if they are and the behaviour differs from the
description here.
subset
which is often easier for extraction,
data.frame
, Extract
.
sw <- swiss[1:5, 1:4]  # select a manageable subset

sw[1:3]      # select columns
sw[, 1:3]    # same
sw[4:5, 1:3] # select rows and columns
sw[1]        # a one-column data frame
sw[, 1, drop = FALSE]  # the same
sw[, 1]      # a (unnamed) vector
sw[[1]]      # the same
sw$Fert      # the same (possibly w/ warning, see ?Extract)

sw[1,]       # a one-row data frame
sw[1,, drop = TRUE]  # a list

sw["C", ] # partially matches
sw[match("C", row.names(sw)), ] # no exact match
try(sw[, "Ferti"]) # column names must match exactly

sw[sw$Fertility > 90,] # logical indexing, see also ?subset
sw[c(1, 1:2), ]        # duplicate row, unique row names are created

sw[sw <= 6] <- 6  # logical matrix indexing
sw

## adding a column
sw["new1"] <- LETTERS[1:5]   # adds a character column
sw[["new2"]] <- letters[1:5] # ditto
sw[, "new3"] <- LETTERS[1:5] # ditto
sw$new4 <- 1:5
sapply(sw, class)
sw$new  # -> NULL: no unique partial match
sw$new4 <- NULL              # delete the column
sw
sw[6:8] <- list(letters[10:14], NULL, aa = 1:5)
# update col. 6, delete 7, append
sw

## matrices in a data frame
A <- data.frame(x = 1:3, y = I(matrix(4:9, 3, 2)),
                z = I(matrix(letters[1:9], 3, 3)))
A[1:3, "y"] # a matrix
A[1:3, "z"] # a matrix
A[, "y"]    # a matrix
stopifnot(identical(colnames(A), c("x", "y", "z")), ncol(A) == 3L,
          identical(A[,"y"], A[1:3, "y"]),
          inherits (A[,"y"], "AsIs"))

## keeping special attributes: use a class with a
## "as.data.frame" and "[" method;
## "avector" := vector that keeps attributes.   Could provide a constructor
##  avector <- function(x) { class(x) <- c("avector", class(x)); x }
as.data.frame.avector <- as.data.frame.vector

`[.avector` <- function(x,i,...) {
  r <- NextMethod("[")
  mostattributes(r) <- attributes(x)
  r
}

d <- data.frame(i = 0:7, f = gl(2,4),
                u = structure(11:18, unit = "kg", class = "avector"))
str(d[2:4, -1]) # 'u' keeps its "unit"
Extract or replace subsets of factors.
## S3 method for class 'factor'
x[..., drop = FALSE]

## S3 method for class 'factor'
x[[...]]

## S3 replacement method for class 'factor'
x[...] <- value

## S3 replacement method for class 'factor'
x[[...]] <- value
x |
a factor. |
... |
a specification of indices – see |
drop |
logical. If true, unused levels are dropped. |
value |
character: a set of levels. Factor values are coerced to character. |
When unused levels are dropped the ordering of the remaining levels is preserved.
If value
is not in levels(x)
, a missing value is
assigned with a warning.
Any contrasts
assigned to the factor are preserved
unless drop = TRUE
.
The [[
method supports argument exact
.
A factor with the same set of levels as x
unless drop = TRUE
.
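A brief sketch of the behaviour described above (the factor f and the value "z" are illustrative): subsetting keeps all levels unless drop = TRUE, and assigning a value that is not a level yields a missing value with a warning.

```r
f <- factor(c("a", "b", "c"))

levels(f[1:2])                # all three levels are kept
levels(f[1:2, drop = TRUE])   # unused level "c" is dropped

g <- f
suppressWarnings(g[1] <- "z") # "z" is not a level: NA assigned (with a warning)
g
```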
## following example(factor)
(ff <- factor(substring("statistics", 1:10, 1:10), levels = letters))
ff[, drop = TRUE]
factor(letters[7:10])[2:3, drop = TRUE]
Returns the (regular or parallel) maxima and minima of the input values.
pmax*()
and pmin*()
take one or more vectors as
arguments, recycle them to common length and return a single vector
giving the ‘parallel’ maxima (or minima) of the argument
vectors.
max(..., na.rm = FALSE)
min(..., na.rm = FALSE)

pmax(..., na.rm = FALSE)
pmin(..., na.rm = FALSE)

pmax.int(..., na.rm = FALSE)
pmin.int(..., na.rm = FALSE)
... |
numeric or character arguments (see Note). |
na.rm |
a logical indicating whether missing values should be removed. |
max
and min
return the maximum or minimum of all
the values present in their arguments, as integer
if
all are logical
or integer
, as double
if
all are numeric, and character otherwise.
If na.rm
is FALSE
an NA
value in any of the
arguments will cause a value of NA
to be returned, otherwise
NA
values are ignored.
The minimum and maximum of a numeric empty set are +Inf
and
-Inf
(in this order!) which ensures transitivity, e.g.,
min(x1, min(x2)) == min(x1, x2)
. For numeric x
max(x) == -Inf
and min(x) == +Inf
whenever length(x) == 0
(after removing missing values if
requested). However, pmax
and pmin
return
NA
if all the parallel elements are NA
even for
na.rm = TRUE
.
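The empty-set conventions can be checked directly. This is a sketch with illustrative values; suppressWarnings is used only because min and max warn when given no non-missing values.

```r
x0 <- numeric(0)
mx <- suppressWarnings(max(x0))   # -Inf
mn <- suppressWarnings(min(x0))   # Inf

## the same applies once all values have been removed as missing
mn_na <- suppressWarnings(min(c(NA, NA), na.rm = TRUE))

## transitivity: nesting an empty set does not change the result
x1 <- c(3, 1)
nested <- suppressWarnings(min(x1, min(x0)))
flat   <- min(x1, x0)

## pmax() keeps NA where all parallel elements are NA, even with na.rm = TRUE
pm <- pmax(c(NA, 1), c(NA, 2), na.rm = TRUE)
pm
```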
pmax
and pmin
take one or more vectors (or matrices) as
arguments and return a single vector giving the ‘parallel’
maxima (or minima) of the vectors. The first element of the result is
the maximum (minimum) of the first elements of all the arguments, the
second element of the result is the maximum (minimum) of the second
elements of all the arguments and so on. Shorter inputs (of non-zero
length) are recycled if necessary. Attributes (see
attributes
: such as names
or
dim
) are copied from the first argument (if applicable,
e.g., not for an S4
object).
pmax.int
and pmin.int
are faster internal versions only
used when all arguments are atomic vectors and there are no classes:
they drop all attributes. (Note that all versions fail for raw and
complex vectors since these have no ordering.)
max
and min
are generic functions: methods can be
defined for them individually or via the
Summary
group generic. For this to
work properly, the arguments ...
should be unnamed, and
dispatch is on the first argument.
By definition the min/max of a numeric vector containing an NaN
is NaN
, except that the min/max of any vector containing an
NA
is NA
even if it also contains an NaN
.
Note that max(NA, Inf) == NA
even though the maximum would be
Inf
whatever the missing value actually is.
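These rules for NaN and NA can be illustrated as follows (a sketch with made-up values):

```r
m1 <- max(c(1, NaN))               # NaN: a NaN propagates
m2 <- max(c(NaN, NA_real_))        # NA: NA takes precedence over NaN
m3 <- max(NA, Inf)                 # NA, although Inf would be the maximum
m4 <- max(NA, Inf, na.rm = TRUE)   # Inf once the NA is removed
```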
Character versions are sorted lexicographically, and this depends on
the collating sequence of the locale in use: the help for
‘Comparison’ gives details. The max/min of an empty
character vector is defined to be character NA
. (One could
argue that as ""
is the smallest character element, the maximum
should be ""
, but there is no obvious candidate for the
minimum.)
For min
or max
, a length-one vector. For pmin
or
pmax
, a vector of length the longest of the input vectors, or
length zero if one of the inputs had zero length.
The type of the result will be that of the highest of the inputs in the hierarchy integer < double < character.
For min
and max
if there are only numeric inputs and all
are empty (after possible removal of NA
s), the result is double
(Inf
or -Inf
).
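A short sketch of the result types (the inputs are illustrative):

```r
typeof(max(TRUE, FALSE))   # logical inputs give an integer result
typeof(max(1L, 5L))        # integer
typeof(max(1L, 2.5))       # double
typeof(max(2, "10"))       # character: compared as strings, locale-dependent

## all-empty numeric input: the result is double, even for integer input
typeof(suppressWarnings(max(integer(0))))
```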
max
and min
are part of the S4
Summary
group generic. Methods
for them must use the signature x, ..., na.rm
.
‘Numeric’ arguments are vectors of type integer and numeric,
and logical (coerced to integer). For historical reasons, NULL
is accepted as equivalent to integer(0)
.
pmax
and pmin
will also work on classed S3 or S4 objects
with appropriate methods for comparison, is.na
and rep
(if recycling of arguments is needed).
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
range
(both min and max) and
which.min
(which.max
) for the arg min,
i.e., the location where an extreme value occurs.
‘plotmath’ for the use of min
in plot annotation.
require(stats); require(graphics)
min(5:1, pi)  #-> one number
pmin(5:1, pi) #-> 5 numbers

x <- sort(rnorm(100));  cH <- 1.35
pmin(cH, quantile(x)) # no names
pmin(quantile(x), cH) # has names
plot(x, pmin(cH, pmax(-cH, x)), type = "b", main = "Huber's function")

cut01 <- function(x) pmax(pmin(x, 1), 0)
curve(      x^2 - 1/4, -1.4, 1.5, col = 2)
curve(cut01(x^2 - 1/4), col = "blue", add = TRUE, n = 500)
## pmax(), pmin() preserve attributes of *first* argument
D <- diag(x = (3:1)/4) ; n0 <- numeric()
stopifnot(identical(D,  cut01(D) ),
          identical(n0, cut01(n0)),
          identical(n0, cut01(NULL)),
          identical(n0, pmax(3:1, n0, 2)),
          identical(n0, pmax(n0, 4)))
Report versions of (external) third-party software used.
extSoftVersion()
This function reports the versions of third-party software libraries in use. These are often external but might have been compiled into R when it was installed.
With dynamic linking, these are the versions of the libraries linked to in this session: with static linking, of those compiled in.
A named character vector, currently with components
zlib |
The version of |
bzlib |
The version of |
xz |
The version of |
libdeflate |
The version of |
PCRE |
The version of |
ICU |
The version of |
TRE |
The version of |
iconv |
The implementation and version of the |
readline |
The version of |
BLAS |
Name of the binary/executable file with the implementation of
|
Note that the values for bzlib
and pcre
normally contain
a date as well as the version number, and that for tre
includes
several items separated by spaces, the version number being the
second.
For iconv
this will give the implementation as well as the
version, for example "GNU libiconv 1.14"
, "glibc
2.18"
or "win_iconv"
(which has no version number).
The name of the binary/executable file for BLAS
can be used as an
indication of which implementation is in use. Typically, the R version of
BLAS will appear as libR.so
(libR.dylib
), R
or
libRblas.so
(libRblas.dylib
), depending on how R was built.
Note that libRblas.so
(libRblas.dylib
) may also be shown for
an external BLAS implementation that had been copied, hard-linked or
renamed by the system administrator. For an external BLAS, a shared
object file will be given and its path/name may indicate the
vendor/version. The detection does not work on Windows nor for some
uses of the Accelerate framework on macOS.
libcurlVersion
for the version of libCurl
.
La_version
for the version of LAPACK in use.
La_library
for the binary/executable file with the LAPACK in use.
grSoftVersion
for third-party graphics software.
tclVersion
in package tcltk for the version of Tcl/Tk.
pcre_config
for PCRE configuration options.
extSoftVersion()
## the PCRE version
sub(" .*", "", extSoftVersion()["PCRE"])
The function factor
is used to encode a vector as a factor (the
terms ‘category’ and ‘enumerated type’ are also used for
factors). If argument ordered
is TRUE
, the factor
levels are assumed to be ordered. For compatibility with S there is
also a function ordered
.
is.factor
, is.ordered
, as.factor
and as.ordered
are the membership and coercion functions for these classes.
factor(x = character(), levels, labels = levels,
       exclude = NA, ordered = is.ordered(x), nmax = NA)

ordered(x = character(), ...)

is.factor(x)
is.ordered(x)

as.factor(x)
as.ordered(x)

addNA(x, ifany = FALSE)

.valid.factor(object)
x |
a vector of data, usually taking a small number of distinct values. |
levels |
an optional vector of the unique values (as character strings)
that |
labels |
either an optional character vector of
labels for the levels (in the same order as |
exclude |
a vector of values to be excluded when forming the
set of levels. This may be a factor with the same level set as |
ordered |
logical flag to determine if the levels should be regarded as ordered (in the order given). |
nmax |
an upper bound on the number of levels; see ‘Details’. |
... |
(in |
ifany |
only add an |
object |
an R object. |
The type of the vector x
is not restricted; it only must have
an as.character
method and be sortable (by
order
).
Ordered factors differ from factors only in their class, but methods
and model-fitting functions may treat the two classes quite differently,
see options("contrasts")
.
The encoding of the vector happens as follows. First all the values
in exclude
are removed from levels
. If x[i]
equals levels[j]
, then the i
-th element of the result is
j
. If no match is found for x[i]
in levels
(which will happen for excluded values) then the i
-th element
of the result is set to NA
.
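The encoding steps can be traced on a small input (a sketch; the vector x and the chosen levels are illustrative):

```r
x <- c("a", "b", "c", "a")

## codes are positions in 'levels', not alphabetical ranks
f <- factor(x, levels = c("c", "b", "a"))
as.integer(f)

## excluded values are removed from the levels and encoded as NA
g <- factor(x, levels = c("c", "b", "a"), exclude = "b")
levels(g)
as.integer(g)
```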
Normally the ‘levels’ used as an attribute of the result are
the reduced set of levels after removing those in exclude
, but
this can be altered by supplying labels
. This should either
be a set of new labels for the levels, or a character string, in
which case the levels are that character string with a sequence
number appended.
factor(x, exclude = NULL)
applied to a factor without
NA
s is a no-operation unless there are unused levels: in
that case, a factor with the reduced level set is returned. If
exclude
is used, since R version 3.4.0, excluding non-existing
character levels is equivalent to excluding nothing, and when
exclude
is a character
vector, that is
applied to the levels of x
.
Alternatively, exclude
can be a factor with the same level set as
x
and will exclude the levels present in exclude
.
The codes of a factor may contain NA
. For a numeric
x
, set exclude = NULL
to make NA
an extra
level (prints as ‘<NA>’); by default, this is the last level.
If NA
is a level, the way to set a code to be missing (as
opposed to the code of the missing level) is to
use is.na
on the left-hand-side of an assignment (as in
is.na(f)[i] <- TRUE
; indexing inside is.na
does not work).
Under those circumstances missing values are currently printed as
‘<NA>’, i.e., identical to entries of level NA
.
is.factor
is generic: you can write methods to handle
specific classes of objects, see InternalMethods.
Where levels
is not supplied, unique
is called.
Since factors typically have quite a small number of levels, for large
vectors x
it is helpful to supply nmax
as an upper bound
on the number of unique values.
When using c
to combine a (possibly
ordered) factor with other objects, if all objects are (possibly
ordered) factors, the result will be a factor with levels the union of
the level sets of the elements, in the order the levels occur in the
level sets of the elements (which means that if all the elements have
the same level set, that is the level set of the result), equivalent
to how unlist
operates on a list of factor objects.
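A minimal sketch of combining factors with c (the factors f1 and f2 are illustrative), assuming an R version in which c has the factor method described above:

```r
f1 <- factor(c("a", "b"))
f2 <- factor(c("c", "a"), levels = c("c", "a"))

f12 <- c(f1, f2)
levels(f12)         # union of the level sets, in order of occurrence
as.character(f12)   # the values themselves are preserved
```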
factor
returns an object of class "factor"
which has a
set of integer codes the length of x
with a "levels"
attribute of mode character
and unique
(!anyDuplicated(.)
) entries. If argument ordered
is true (or ordered()
is used) the result has class
c("ordered", "factor")
.
Undocumentedly for a long time, factor(x)
loses all
attributes(x)
but "names"
, and resets
"levels"
and "class"
.
Applying factor
to an ordered or unordered factor returns a
factor (of the same type) with just the levels which occur: see also
[.factor
for a more transparent way to achieve this.
is.factor
returns TRUE
or FALSE
depending on
whether its argument is of type factor or not. Correspondingly,
is.ordered
returns TRUE
when its argument is an ordered
factor and FALSE
otherwise.
as.factor
coerces its argument to a factor.
It is an abbreviated (sometimes faster) form of factor
.
as.ordered(x)
returns x
if this is ordered, and
ordered(x)
otherwise.
addNA
modifies a factor by turning NA
into an extra
level (so that NA
values are counted in tables, for instance).
.valid.factor(object)
checks the validity of a factor,
currently only levels(object)
, and returns TRUE
if it is
valid, otherwise a string describing the validity problem. This
function is used for validObject(<factor>)
.
The interpretation of a factor depends on both the codes and the
"levels"
attribute. Be careful only to compare factors with
the same set of levels (in the same order). In particular,
as.numeric
applied to a factor is meaningless, and may
happen by implicit coercion. To transform a factor f
to
approximately its original numeric values,
as.numeric(levels(f))[f]
is recommended and slightly more
efficient than as.numeric(as.character(f))
.
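The difference between the integer codes and the original values can be seen in a small example (the factor f is illustrative):

```r
f <- factor(c("10", "20", "10"))

as.numeric(f)                 # the internal codes, not the values
as.numeric(levels(f))[f]      # the original numeric values
as.numeric(as.character(f))   # same result, converting every element
```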
The levels of a factor are by default sorted, but the sort order may well depend on the locale at the time of creation, and should not be assumed to be ASCII.
There are some anomalies associated with factors that have
NA
as a level. It is suggested to use them sparingly, e.g.,
only for tabulation purposes.
There are "factor"
and "ordered"
methods for the
group generic Ops
which
provide methods for the Comparison operators,
and for the min
, max
, and
range
generics in Summary
of "ordered"
. (The rest of the groups and the
Math
group generate an error as they
are not meaningful for factors.)
Only ==
and !=
can be used for factors: a factor can
only be compared to another factor with an identical set of levels
(not necessarily in the same ordering) or to a character vector.
Ordered factors are compared in the same way, but the general dispatch
mechanism precludes comparing ordered and unordered factors.
All the comparison operators are available for ordered factors. Collation is done by the levels of the operands: if both operands are ordered factors they must have the same level set.
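A sketch of which comparisons are meaningful (the factors f and o are illustrative; suppressWarnings hides the warning given for an ordering comparison on an unordered factor):

```r
f <- factor(c("a", "b"))
f == "a"                           # equality works for any factor

lt <- suppressWarnings(f < "b")    # ordering is not meaningful: all NA
lt

o <- factor(c("lo", "hi", "lo"), levels = c("lo", "hi"), ordered = TRUE)
o < "hi"                           # collation follows the levels
max(o)
```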
In earlier versions of R, storing character data as a factor was more space efficient if there was even a small proportion of repeats. However, identical character strings now share storage, so the difference is small in most cases. (Integer values are stored in 4 bytes whereas each reference to a character string needs a pointer of 4 or 8 bytes.)
Chambers, J. M. and Hastie, T. J. (1992) Statistical Models in S. Wadsworth & Brooks/Cole.
[.factor
for subsetting of factors.
gl
for construction of balanced factors and
C
for factors with specified contrasts.
levels
and nlevels
for accessing the
levels, and unclass
to get integer codes.
(ff <- factor(substring("statistics", 1:10, 1:10), levels = letters))
as.integer(ff)      # the internal codes
(f. <- factor(ff))  # drops the levels that do not occur
ff[, drop = TRUE]   # the same, more transparently

factor(letters[1:20], labels = "letter")

class(ordered(4:1)) # "ordered", inheriting from "factor"
z <- factor(LETTERS[3:1], ordered = TRUE)
## and "relational" methods work:
stopifnot(sort(z)[c(1,3)] == range(z), min(z) < max(z))

## suppose you want "NA" as a level, and to allow missing values.
(x <- factor(c(1, 2, NA), exclude = NULL))
is.na(x)[2] <- TRUE
x  # [1] 1    <NA> <NA>
is.na(x)
# [1] FALSE  TRUE FALSE

## More rational, since R 3.4.0 :
factor(c(1:2, NA), exclude =  "" ) # keeps <NA> , as
factor(c(1:2, NA), exclude = NULL) # always did
## exclude = <character>
z # ordered levels 'A < B < C'
factor(z, exclude = "C") # does exclude
factor(z, exclude = "B") # ditto

## Now, labels maybe duplicated:
## factor() with duplicated labels allowing to "merge levels"
x <- c("Man", "Male", "Man", "Lady", "Female")
## Map from 4 different values to only two levels:
(xf <- factor(x, levels = c("Male", "Man" , "Lady",   "Female"),
                 labels = c("Male", "Male", "Female", "Female")))
#> [1] Male   Male   Male   Female Female
#> Levels: Male Female

## Using addNA()
Month <- airquality$Month
table(addNA(Month))
table(addNA(Month, ifany = TRUE))
Utility function to access information about files on the user's file systems.
file.access(names, mode = 0)
names |
character vector containing file names.
Tilde-expansion will be done: see |
mode |
integer specifying access mode required: see ‘Details’. |
The mode
value can be the exclusive or (xor
), i.e., a
partial sum of the following values, and hence must be in 0:7
:
0 | test for existence. |
1 | test for execute permission. |
2 | test for write permission. |
4 | test for read permission. |
Permission will be computed for real user ID and real group ID (rather than the effective IDs).
Please note that it is not a good idea to use this function to test
before trying to open a file. On a multi-tasking system, it is
possible that the accessibility of a file will change between the time
you call file.access()
and the time you try to open the file.
It is better to wrap file open attempts in try
.
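The ‘test it and see’ advice can be sketched as follows. This is only an illustration under stated assumptions: the helper name safe_first_line is hypothetical (it is not part of base R), and the sketch relies on a failed file() open signalling an error, which is then caught.

```r
# Hypothetical helper: return the first line of a file,
# or NA if the file cannot be opened for reading.
safe_first_line <- function(path) {
  # Attempt the open directly rather than calling file.access() first.
  con <- tryCatch(suppressWarnings(file(path, open = "r")),
                  error = function(e) NULL)
  if (is.null(con)) return(NA_character_)
  on.exit(close(con))
  readLines(con, n = 1L)
}

tmp <- tempfile()
writeLines(c("alpha", "beta"), tmp)
first_ok  <- safe_first_line(tmp)
first_bad <- safe_first_line(file.path(tempdir(), "no-such-file-here"))
unlink(tmp)
```

Because the open itself is the test, there is no window between checking and opening in which the file's accessibility could change.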
An integer vector with values 0
for success and -1
for failure.
This was written as a replacement for the S-PLUS function
access
, a wrapper for the C function of the same name, which
explains the return value encoding. Note that the return value is
false for success.
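Since success is encoded as 0, results are best compared with 0 explicitly rather than used as logicals. A small sketch, using a temporary file and a made-up non-existent file name chosen purely for illustration:

```r
tmp <- tempfile()
writeLines("hello", tmp)

# mode = 0 tests existence, mode = 4 tests read permission
exists_code <- file.access(tmp, mode = 0)
read_code   <- file.access(tmp, mode = 4)

# Turn the C-style code into a logical: 0 means success
can_read <- read_code == 0

# A path that does not exist gives -1
missing_code <- file.access(file.path(tempdir(), "no-such-file-here"),
                            mode = 0)
unlink(tmp)
```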
file.info
for more details on permissions,
Sys.chmod
to change permissions, and
try
for a ‘test it and see’ approach.
file_test
for shell-style file tests.
fa <- file.access(dir("."))
table(fa) # count successes & failures
Choose a file interactively.
file.choose(new = FALSE)
new |
Logical: choose the style of dialog box presented to the user: at present only new = FALSE is used. |
A character vector of length one giving the file path.
list.files
for non-interactive selection.
Utility function to extract information about files on the user's file systems.
file.info(..., extra_cols = TRUE)
file.mode(...)
file.mtime(...)
file.size(...)
... |
character vectors containing file paths. Tilde-expansion
is done: see |
extra_cols |
logical: return all cols rather than just the first six. |
What constitutes a ‘file’ is OS-dependent but includes
directories. (However, directory names must not include a trailing
backslash or slash on Windows.) See also the section in the help for
file.exists
on case-insensitive file systems.
The file ‘mode’ follows POSIX conventions, giving three octal digits summarizing the permissions for the file owner, the owner's group and for anyone respectively. Each digit is the logical or of read (4), write (2) and execute/search (1) permissions.
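To make the octal encoding concrete, here is a small sketch that decodes a mode by hand. The mode value 644 is just an illustrative choice, and the bit tests use bitwAnd on the underlying integer.

```r
mode <- as.octmode("644")       # owner: read+write, group: read, others: read
mode_int <- as.integer(mode)    # the underlying integer value
mode_str <- format(mode)        # printed back in octal

# Test single permission bits: 200 (octal) is owner-write,
# 002 (octal) is others-write.
owner_write  <- bitwAnd(mode_int, strtoi("200", base = 8L)) != 0
others_write <- bitwAnd(mode_int, strtoi("002", base = 8L)) != 0
```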
See files for how file paths with marked encodings are interpreted.
On most systems symbolic links are followed, so information is given about the file to which the link points rather than about the link.
File modes are probably only useful on NTFS file systems, and it seems
all three digits refer to the file's owner.
The execute/search bits are set for directories, and for files based
on their extensions (e.g., ‘.exe’, ‘.com’, ‘.cmd’
and ‘.bat’ files). file.access
will give a more
reliable view of read/write access availability to the R process.
UTF-8-encoded file names not valid in the current locale can be used.
Junction points and symbolic links are followed, so information is given about the file/directory to which the link points rather than about the link.
For file.info()
, data frame with row names the file names and columns
size |
double: File size in bytes. |
isdir |
logical: Is the file a directory? |
mode |
integer of class |
mtime , ctime , atime
|
object of class |
uid |
integer, the user ID of the file's owner. |
gid |
integer, the group ID of the file's group. |
uname |
character, uid
interpreted as a user name. |
grname |
character, gid
interpreted as a group name. |
Unknown user and group names will be NA
.
character indicating the sort of executable. Possible
values are "no"
, "msdos"
, "win16"
,
"win32"
, "win64"
and "unknown"
. Note that a
file (e.g., a script file) can be executable according to the mode
bits but not executable in this sense.
If extra_cols
is false, only the first six columns are
returned: as these can all be found from a single C system call this
can be faster. (However, properly configured systems will use a
‘name service cache daemon’ to speed up the name lookups.)
Entries for non-existent or non-readable files will be NA
.
The uid
, gid
, uname
and grname
columns
may not be supplied on a non-POSIX Unix-alike system, and will not be
on Windows.
What is meant by the three file times depends on the OS and file
system. On Windows native file systems ctime
is the file
creation time (something which is not recorded on most Unix-alike file
systems). What is meant by ‘file access’ and hence the
‘last access time’ is system-dependent.
The resolution of the file times depends on both the OS and the type
of the file system. Modern file systems typically record times to an
accuracy of a microsecond or better: notable exceptions are HFS+ on
macOS (recorded in seconds) and modification time on older FAT systems
(recorded in increments of 2 seconds). Note that "POSIXct"
times are by default printed in whole seconds: to change that see
strftime
.
file.mode()
, file.mtime()
and file.size()
are fast
convenience wrappers returning just one of the columns.
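A brief sketch of how the data frame and the convenience wrappers relate, using a temporary file and a made-up non-existent file name (both chosen only for illustration):

```r
tmp <- tempfile(fileext = ".txt")
writeLines(c("a", "b", "c"), tmp)

# Only the first six columns: size, isdir, mode, mtime, ctime, atime
info <- file.info(tmp, extra_cols = FALSE)

# The wrappers return single columns of the same information
sz <- file.size(tmp)
mt <- file.mtime(tmp)
md <- file.mode(tmp)

# Non-existent files give NA entries rather than an error
missing_info <- file.info(file.path(tempdir(), "no-such-file-here"))
unlink(tmp)
```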
Some (now old) Unix-alike systems allow files of more than 2GB to be created but
not accessed by the stat
system call. Such files may show up
as non-readable (and very likely not be readable by any of R's input
functions).
Sys.readlink
to find out about symbolic links,
files
, file.access
,
list.files
,
and DateTimeClasses
for the date formats.
Sys.chmod
to change permissions.
ncol(finf <- file.info(dir()))  # at least six
finf # the whole list
## Those that are more than 100 days old :
finf <- file.info(dir(), extra_cols = FALSE)
finf[difftime(Sys.time(), finf[,"mtime"], units = "days") > 100 , 1:4]

file.info("no-such-file-exists")

## E.g., for R-core, in a R-devel version:
if(Sys.info()[["sysname"]] == "Linux")
  sort(file.mtime(file.path(R.home("bin"),
                            c("", file.path(c("", "exec"), "R")))))
Construct the path to a file from components in a platform-independent way.
file.path(..., fsep = .Platform$file.sep)
... |
character vectors. Long vectors are not supported. |
fsep |
the path separator to use (assumed to be ASCII). |
The implementation is designed to be fast (faster than
paste
) as this function is used extensively in R itself.
It can also be used for environment paths such as PATH and
R_LIBS with fsep = .Platform$path.sep
.
Trailing path separators are invalid for Windows file paths apart from
‘/’ and ‘d:/’ (although some functions/utilities do accept
them), so a trailing /
or \
is removed there.
A character vector of the arguments concatenated term-by-term and
separated by fsep
if all arguments have positive length;
otherwise, an empty character vector (unlike paste
).
An element of the result will be marked (see Encoding
) as
UTF-8 if run in a UTF-8 locale (when marked inputs are converted to
UTF-8) or if a component of the result is marked as UTF-8, or as
Latin-1 in a non-Latin-1 locale.
The components are by default separated by /
(not \
) on Windows.
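The behaviour described above can be illustrated with a few calls. The path components are arbitrary examples; the comparison with paste is included only to show the difference for zero-length arguments.

```r
# Components are joined term-by-term with "/"
p1 <- file.path("data", "raw", "x.csv")

# Vector arguments are recycled, giving one path per element
p2 <- file.path("out", c("a.txt", "b.txt"))

# A zero-length argument gives an empty result, unlike paste()
p3 <- file.path("out", character(0))
p4 <- paste("out", character(0), sep = "/")

# Building a PATH-style search path with the platform's path separator
search_path <- file.path("/usr/bin", "/bin", fsep = .Platform$path.sep)
```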
basename
, normalizePath
, path.expand
.
Display one or more (plain) text files, in a platform specific way, typically via a ‘pager’.
file.show(..., header = rep("", nfiles),
          title = "R Information",
          delete.file = FALSE, pager = getOption("pager"),
          encoding = "")
... |
one or more character vectors containing the names of the files to be displayed. Paths will have tilde expansion. |
header |
character vector (of the same length as the number of files
specified in |
title |
an overall title for the display. If a single separate
window is used for the display, |
delete.file |
should the files be deleted after display? Used for temporary files. |
pager |
the pager to be used, see ‘Details’. |
encoding |
character string giving the encoding to be assumed for the file(s). |
This function provides the core of the R help system, but it can be
used for other purposes as well, such as page
.
How the pager is implemented is highly system-dependent.
The basic Unix version concatenates the files (using the headers) to a
temporary file, and displays it in the pager selected by the
pager
argument, which is a character vector specifying a system
command (a full path or a command found on the PATH) to run on
the set of files. The ‘factory-fresh’ default is to use
‘R_HOME/bin/pager’, which is a shell script running the command-line
specified by the environment variable PAGER whose default is set
at configuration, usually to less
. On a Unix-alike
more
is used if pager
is empty.
Most GUI systems will use a separate pager window for each file, and
let the user leave it up while R continues running. The selection of
such pagers could either be done using special pager names being
intercepted by lower-level code (such as "internal"
and
"console"
on Windows), or by letting pager
be an R
function which will be called with arguments (files, header,
title, delete.file)
corresponding to the first four arguments of
file.show
and take care of interfacing to the GUI.
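As a sketch of the function-valued pager, the following passes an R function that merely records what it was given instead of opening a window. The names record and capture_pager are hypothetical and exist only for this illustration; a real GUI pager would display the files.

```r
tmp <- tempfile(fileext = ".txt")
writeLines(c("first line", "second line"), tmp)

# Environment used to keep what the pager function received
record <- new.env()

# A pager function takes the four arguments described above
capture_pager <- function(files, header, title, delete.file) {
  record$files <- files
  record$title <- title
  record$text  <- readLines(files[1])
  invisible(NULL)
}

file.show(tmp, title = "Demo", pager = capture_pager)
unlink(tmp)
```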
The R.app
GUI on macOS uses its internal pager irrespective
of the setting of pager
.
Not all implementations will honour delete.file
. In
particular, using an external pager on Windows does not, as there is
no way to know when the external application has finished with the
file.
Ross Ihaka, Brian Ripley.
Text-type help
and
RShowDoc
call file.show
.
Consider getOption("pdfviewer")
and,
e.g., system
for displaying pdf files.
file.show(file.path(R.home("doc"), "COPYRIGHTS"))
These functions provide a low-level interface to the computer's file system.
file.create(..., showWarnings = TRUE)
file.exists(...)
file.remove(...)
file.rename(from, to)
file.append(file1, file2)
file.copy(from, to, overwrite = recursive, recursive = FALSE,
          copy.mode = TRUE, copy.date = FALSE)
file.symlink(from, to)
file.link(from, to)
... , file1 , file2
|
character vectors, containing file names or paths. |
from , to
|
character vectors, containing file names or paths.
For
|
overwrite |
logical; should existing destination files be overwritten? |
showWarnings |
logical; should the warnings on failure be shown? |
recursive |
logical. If |
copy.mode |
logical: should file permission bits be copied where possible? |
copy.date |
logical: should file dates be preserved where
possible? See |
The ...
arguments are concatenated to form one character
vector: you can specify the files separately or as one vector. All of
these functions expand path names: see
path.expand
. (file.exists
silently reports false
for paths that would be too long after expansion: the rest will give a
warning.)
file.create
creates files with the given names if they do not
already exist and truncates them if they do. They are created with
the maximal read/write permissions allowed by the
‘umask’ setting (where relevant). By default a warning
is given (with the reason) if the operation fails.
file.exists
returns a logical vector indicating whether the
files named by its argument exist. (Here ‘exists’ is in the
sense of the system's stat
call: a file will be reported as
existing only if you have the permissions needed by stat
.
Existence can also be checked by file.access
, which
might use different permissions and so obtain a different result.
Note that the existence of a file does not imply that it is readable:
for that use file.access
.) What constitutes a
‘file’ is system-dependent, but should include directories.
(However, directory names must not include a trailing backslash or
slash on Windows.) Note that if the file is a symbolic link on a
Unix-alike, the result indicates if the link points to an actual file,
not just if the link exists. On Windows, the result is unreliable for a
broken symbolic link (junction).
Lastly, note the different function exists
which
checks for existence of R objects.
file.remove
attempts to remove the files named in its argument.
On most Unix platforms ‘file’ includes empty
directories, symbolic links, fifos and sockets. On Windows,
‘file’ means a regular file and not, say, an empty directory.
file.rename
attempts to rename files (and from
and
to
must be of the same length). Where file permissions allow
this will overwrite an existing element of to
. This is subject
to the limitations of the OS's corresponding system call (see
something like man 2 rename
on a Unix-alike): in particular
in the interpretation of ‘file’: most platforms will not rename
files from one file system to another. NB: This means that
renaming a file from a temporary directory to the user's filespace or
during package installation will often fail. (On Windows,
file.rename
can rename files but not directories across
volumes.) On platforms which allow directories to be renamed,
typically neither or both of from
and to
must be a
directory, and if to
exists it must be an empty directory.
file.append
attempts to append the files named by its
second argument to those named by its first. The R subscript
recycling rule is used to align names given in vectors
of different lengths.
file.copy
works in a similar way to file.append
but with
the arguments in the natural order for copying. Copying to existing
destination files is skipped unless overwrite = TRUE
. The
to
argument can specify a single existing directory. If
copy.mode = TRUE
file read/write/execute permissions are copied
where possible, restricted by ‘umask’. (On Windows this
applies only to files.) Other security attributes such as ACLs are not
copied. On a POSIX filesystem the targets of symbolic links will be
copied rather than the links themselves, and hard links are copied
separately. Using copy.date = TRUE
may or may not copy the
timestamp exactly (for example, fractional seconds may be omitted),
but is more likely to do so as from R 3.4.0.
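The skipping of existing destinations can be seen in a short sketch run inside a scratch directory. The directory and file names are arbitrary and used only for illustration.

```r
d <- tempfile("copydemo")
dir.create(d)
a <- file.path(d, "a.txt")
b <- file.path(d, "b.txt")
writeLines("A", a)
writeLines("B", b)

# The destination exists, so the copy is skipped and FALSE is returned
first_try <- file.copy(a, b)
kept <- readLines(b)

# With overwrite = TRUE the destination is replaced
second_try <- file.copy(a, b, overwrite = TRUE)
replaced <- readLines(b)

unlink(d, recursive = TRUE)
```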
file.symlink
and file.link
make symbolic and hard links
on those file systems which support them. For file.symlink
the
to
argument can specify a single existing directory. (Unix and
macOS native filesystems support both. Windows has hard links to
files on NTFS file systems and concepts related to symbolic links on
recent versions: see the section below on the Windows version of this
help page. What happens on a FAT or SMB-mounted file system is OS-specific.)
File arguments with a marked encoding (see Encoding
) are
if possible translated to the native encoding, except on Windows where
Unicode file operations are used (so marking as UTF-8 can be used to
access file paths not in the native encoding on suitable file
systems).
These functions return a logical vector indicating which operation succeeded for each of the files attempted. Using a missing value for a file or path name will always be regarded as a failure.
If showWarnings = TRUE
, file.create
will give a warning
for an unexpected failure.
Case-insensitive file systems are the norm on Windows and macOS, but can be found on all OSes (for example a FAT-formatted USB drive is probably case-insensitive).
These functions will most likely match existing files regardless of case on such file systems: however this is an OS function and it is possible that file names might be mapped to upper or lower case.
Always check the return value of these functions when used in package
code. This is especially important for file.rename
, which has
OS-specific restrictions (and note that the session temporary
directory is commonly on a different file system from the working
directory): it is only portable to use file.rename
to change
file name(s) within a single directory.
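One way to act on this advice is to check the result of file.rename and fall back to copy-then-remove when it fails. The helper below is a hypothetical sketch (move_file is not a base R function) and handles a single file only.

```r
# Hypothetical helper: move one file, falling back to copy + remove
# when the rename fails (for example across file systems).
move_file <- function(from, to) {
  stopifnot(length(from) == 1L, length(to) == 1L)
  ok <- suppressWarnings(file.rename(from, to))
  if (!ok) {
    ok <- file.copy(from, to, overwrite = TRUE) && file.remove(from)
  }
  ok
}

d <- tempfile("movedemo")
dir.create(d)
src <- file.path(d, "src.txt")
dst <- file.path(d, "dst.txt")
writeLines("payload", src)

moved <- move_file(src, dst)
src_gone <- !file.exists(src)
dst_text <- readLines(dst)
unlink(d, recursive = TRUE)
```

The return value is still a single logical, so callers can test it in the same way as the result of file.rename itself.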
Ross Ihaka, Brian Ripley
file.info
, file.access
, file.path
,
file.show
, list.files
,
unlink
, basename
,
path.expand
.
Sys.glob
to expand wildcards in file specifications.
file_test
, Sys.readlink
(for ‘symlink’s).
https://en.wikipedia.org/wiki/Hard_link and https://en.wikipedia.org/wiki/Symbolic_link for the concepts of links and their limitations.
cat("file A\n", file = "A")
cat("file B\n", file = "B")
file.append("A", "B")
file.create("A") # (trashing previous)
file.append("A", rep("B", 10))
if(interactive()) file.show("A") # -> the 10 lines from 'B'
file.copy("A", "C")
dir.create("tmp")
file.copy(c("A", "B"), "tmp")
list.files("tmp") # -> "A" and "B"
setwd("tmp")
file.remove("A") # the tmp/A file
file.symlink(file.path("..", c("A", "B")), ".")
# |--> (TRUE,FALSE) : ok for A but not B as it exists already
setwd("..")
unlink("tmp", recursive = TRUE)
file.remove("A", "B", "C")
These functions provide a low-level interface to the computer's file system.
dir.exists(paths)
dir.create(path, showWarnings = TRUE, recursive = FALSE, mode = "0777")
Sys.chmod(paths, mode = "0777", use_umask = TRUE)
Sys.umask(mode = NA)
path |
a character vector containing a single path name. Tilde
expansion (see |
paths |
character vectors containing file or directory paths. Tilde
expansion (see |
showWarnings |
logical; should the warnings on failure be shown? |
recursive |
logical. Should elements of the path other than the
last be created? If true, like the Unix command |
mode |
the mode to be used on Unix-alikes: it will be
coerced by |
use_umask |
logical: should the mode be restricted by the
|
dir.exists
checks that the paths exist (in the same sense as
file.exists
) and are directories.
dir.create
creates the last element of the path, unless
recursive = TRUE
. Trailing path separators are discarded.
The mode will be modified by the umask
setting in the same way
as for the system function mkdir
. What modes can be set is
OS-dependent, and it is unsafe to assume that more than three octal
digits will be used. For more details see your OS's documentation on the
system call mkdir
, e.g. man 2 mkdir
(and not that on
the command-line utility of that name).
One of the idiosyncrasies of Windows is that directory creation may
report success but create a directory with a different name, for
example dir.create("G.S.")
creates ‘"G.S"’. This is
undocumented, and the precise circumstances are unknown (and
might depend on the version of Windows). Also avoid directory names
with a trailing space.
Sys.chmod
sets the file permissions of one or more files.
It may not be supported on a system (when a warning is issued).
See the comments for dir.create
for how modes are interpreted.
Changing mode on a symbolic link is unlikely to work (nor be
necessary). For more details see your OS's documentation on the
system call chmod
, e.g. man 2 chmod
(and not that on
the command-line utility of that name). Whether this changes the
permission of a symbolic link or its target is OS-dependent (although
to change the target is more common, and POSIX does not support modes
for symbolic links: BSD-based Unixes do, though).
Sys.umask
sets the umask
and returns the previous value:
as a special case mode = NA
just returns the current value.
It may not be supported (when a warning is issued and "0"
is returned). For more details see your OS's documentation on the
system call umask
, e.g. man 2 umask
.
How modes are handled depends on the file system, even on Unix-alikes (although their documentation is often written assuming a POSIX file system). So treat documentation cautiously if you are using, say, a FAT/FAT32 or network-mounted file system.
See files for how file paths with marked encodings are interpreted.
dir.exists
returns a logical vector of TRUE
or
FALSE
values (without names).
dir.create
and Sys.chmod
return invisibly a logical vector
indicating if the operation succeeded for each of the files attempted.
Using a missing value for a path name will always be regarded as a
failure. dir.create
indicates failure if the directory already
exists. If showWarnings = TRUE
, dir.create
will give a
warning for an unexpected failure (e.g., not for a missing value nor
for an already existing component for recursive = TRUE
).
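The return values described above can be observed with a scratch directory tree. The directory names are arbitrary, and showWarnings = FALSE is used only to keep the expected failures quiet.

```r
base   <- tempfile("dirdemo")
nested <- file.path(base, "a", "b")

# Fails: the parent directories do not exist and recursive = FALSE
r1 <- dir.create(nested, showWarnings = FALSE)

# Succeeds: intermediate directories are created as needed
r2 <- dir.create(nested, recursive = TRUE)

# Fails again: the directory now already exists
r3 <- dir.create(nested, showWarnings = FALSE)

exists_now <- dir.exists(c(base, nested, file.path(base, "zzz")))
unlink(base, recursive = TRUE)
```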
Sys.umask
returns the previous value of the umask
,
as a length-one object of class "octmode"
: the
visibility flag is off unless mode
is NA
.
See also the section in the help for file.exists
on
case-insensitive file systems for the interpretation of path
and paths
.
Ross Ihaka, Brian Ripley
file.info
, file.exists
, file.path
,
list.files
, unlink
,
basename
, path.expand
.
## Not run: 
## Fix up maximal allowed permissions in a file tree
Sys.chmod(list.dirs("."), "777")
f <- list.files(".", all.files = TRUE, full.names = TRUE, recursive = TRUE)
Sys.chmod(f, (file.mode(f) | "664"))

## End(Not run)
Find the paths to one or more packages.
find.package(package, lib.loc = NULL, quiet = FALSE,
             verbose = getOption("verbose"))
path.package(package, quiet = FALSE)
packageNotFoundError(package, lib.loc, call = NULL)
package |
character vector: the names of packages. |
lib.loc |
a character vector describing the location of R
library trees to search through, or |
quiet |
logical. Should this not give warnings or an error if the package is not found? |
verbose |
a logical. If |
call |
call expression. |
find.package
returns path to the locations where the
given packages are found. If lib.loc
is NULL
, then
loaded namespaces are searched before the libraries. If a package is
found more than once, the first match is used. Unless quiet =
TRUE
a warning will be given about the named packages which are not
found, and an error if none are. If verbose
is true, warnings
about packages found more than once are given. For a package to be
returned it must contain either a ‘Meta’ subdirectory or a
‘DESCRIPTION’ file containing a valid version
field, but
it need not be installed (it could be a source package if
lib.loc
was set suitably).
find.package
is not usually the right tool to find out if a
package is available for use: the only way to do that is to use
require
to try to load it. It need not be installed for
the correct platform, it might have a version requirement not met by
the running version of R, there might be dependencies which are not
available, ....
path.package
returns the paths from which the named packages
were loaded, or if none were named, for all currently attached packages.
Unless quiet = TRUE
it will warn if some of the packages named
are not attached, and give an error if none are.
packageNotFoundError
creates an error condition object of class
packageNotFoundError
for signaling errors. The condition object
contains the fields package
and lib.loc
.
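A short sketch of the three outcomes: a package that is found, a quiet lookup of a missing package, and catching the classed error. The package name noSuchPackageXYZ123 is made up and assumed not to be installed.

```r
# A standard package is always found
stats_dir <- find.package("stats")

# With quiet = TRUE a missing package gives an empty result, no error
missing_quiet <- find.package("noSuchPackageXYZ123", quiet = TRUE)

# Without quiet = TRUE, the error can be caught by its class and
# the 'package' field of the condition object inspected
caught <- tryCatch(find.package("noSuchPackageXYZ123"),
                   packageNotFoundError = function(e) e$package)
```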
A character vector of paths of package directories.
path.expand
and normalizePath
for path
standardization.
try(find.package("knitr"))
## will not give an error, maybe a warning about *all* locations it is found:
find.package("kitty", quiet=TRUE, verbose=TRUE)

## Find all .libPaths() entries a package is found:
findPkgAll <- function(pkg)
  unlist(lapply(.libPaths(), function(lib)
           find.package(pkg, lib, quiet=TRUE, verbose=FALSE)))

findPkgAll("MASS")
findPkgAll("knitr")
Given a vector of non-decreasing breakpoints in vec
, find the
interval containing each element of x
; i.e., if
i <- findInterval(x,v)
, for each index j
in x
,
$v_{i_j} \le x_j < v_{i_j + 1}$
where
$v_0 := -\infty$,
$v_{N+1} := +\infty$, and
N <- length(v)
.
At the two boundaries, the returned index may differ by 1, depending
on the optional arguments rightmost.closed
and all.inside
.
findInterval(x, vec, rightmost.closed = FALSE, all.inside = FALSE,
             left.open = FALSE)
x |
numeric. |
vec |
numeric, sorted (weakly) increasingly, of length |
rightmost.closed |
logical; if true, the rightmost interval,
|
all.inside |
logical; if true, the returned indices are coerced
into |
left.open |
logical; if true all the intervals are open at left
and closed at right; in the formulas below, |
The function findInterval
finds the index of one vector x
in
another, vec
, where the latter must be non-decreasing. Where
this is trivial, equivalent to apply( outer(x, vec, `>=`), 1, sum)
,
as a matter of fact, the internal algorithm uses interval search
ensuring $O(n \log N)$ complexity where
n <- length(x)
(and N <- length(vec)
). For (almost)
sorted x
, it will be even faster, basically $O(n)$.
This is the same computation as for the empirical distribution
function, and indeed, findInterval(t, sort(X))
is
identical to $n F_n(t; X_1, \dots, X_n)$ where
$F_n$ is the empirical distribution
function of
$X_1, \dots, X_n$.
When rightmost.closed = TRUE
, the result for x[j] = vec[N]
($= \max(vec)$), is
N - 1
as for all other
values in the last interval.
left.open = TRUE
is occasionally useful, e.g., for survival data.
For (anti-)symmetry reasons, it is equivalent to using
“mirrored” data, i.e., the following is always true:
identical( findInterval( x, v, left.open= TRUE, ...) , N - findInterval(-x, -v[N:1], left.open=FALSE, ...) )
where N <- length(vec)
as above.
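The two equivalences described above, with the brute-force count and with the empirical distribution function, can be checked numerically. The data here are arbitrary simulated values; the comparison with ecdf uses all.equal because n * Fn(t) is computed in floating point.

```r
set.seed(42)
X  <- round(rnorm(25), 1)
tt <- seq(-2, 2, by = 0.25)
sX <- sort(X)

fi <- findInterval(tt, sX)

# Brute force: count how many sorted values are <= each t
brute <- rowSums(outer(tt, sX, `>=`))

# Empirical distribution function, scaled by n
Fn <- stats::ecdf(X)

same_as_brute <- all(fi == brute)
same_as_ecdf  <- isTRUE(all.equal(fi, length(X) * Fn(tt)))
```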
vector of length length(x)
with values in 0:N
(and
NA
) where N <- length(vec)
, or values coerced to
1:(N-1)
if and only if all.inside = TRUE
(equivalently coercing all
x values inside the intervals). Note that NA
s are
propagated from x
, and Inf
values are allowed in
both x
and vec
.
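The effect of the three optional arguments on the returned indices is easiest to see on a small example. The values of x and v below are chosen so that they hit both boundaries and each breakpoint.

```r
v <- c(5, 10, 15)
x <- c(1, 5, 7, 10, 15, 20)

# Default: intervals closed on the left, values in 0:N
i_default   <- findInterval(x, v)

# rightmost.closed: x == max(v) is placed in the last interval, N - 1
i_rightmost <- findInterval(x, v, rightmost.closed = TRUE)

# all.inside: every index is coerced into 1:(N-1)
i_inside    <- findInterval(x, v, all.inside = TRUE)

# left.open: intervals are open on the left and closed on the right
i_leftopen  <- findInterval(x, v, left.open = TRUE)

cbind(x, i_default, i_rightmost, i_inside, i_leftopen)
```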
Martin Maechler
approx(*, method = "constant")
which is a
generalization of findInterval()
, ecdf
for
computing the empirical distribution function which is (up to a factor
of $n$) also basically the same as
findInterval(.)
.
x <- 2:18
v <- c(5, 10, 15) # create two bins [5,10) and [10,15)
cbind(x, findInterval(x, v))

N <- 100
X <- sort(round(stats::rt(N, df = 2), 2))
tt <- c(-100, seq(-2, 2, length.out = 201), +100)
it <- findInterval(tt, X)
tt[it < 1 | it >= N] # only first and last are outside range(X)

## 'left.open = TRUE' means "mirroring" :
N <- length(v)
stopifnot(identical(
                  findInterval( x,  v,  left.open=TRUE) ,
              N - findInterval(-x, -v[N:1])))
Forces the evaluation of a function argument.
force(x)
x |
a formal argument of the enclosing function. |
force
forces the evaluation of a formal argument. This can
be useful if the argument will be captured in a closure by the lexical
scoping rules and will later be altered by an explicit assignment or
an implicit assignment in a loop or an apply function.
This is semantic sugar: just evaluating the symbol will do the same thing (see the examples).
force
does not force the evaluation of other
promises. (It works by forcing the promise that
is created when the actual arguments of a call are matched to the
formal arguments of a closure, the mechanism which implements
lazy evaluation.)
f <- function(y) function() y
lf <- vector("list", 5)
for (i in seq_along(lf)) lf[[i]] <- f(i)
lf[[1]]()  # returns 5

g <- function(y) { force(y); function() y }
lg <- vector("list", 5)
for (i in seq_along(lg)) lg[[i]] <- g(i)
lg[[1]]()  # returns 1

## This is identical to
g <- function(y) { y; function() y }
Call a function with a specified number of leading arguments forced before the call if the function is a closure.
forceAndCall(n, FUN, ...)
n |
number of leading arguments to force. |
FUN |
function to call. |
... |
arguments to |
forceAndCall
calls the function FUN
with arguments
specified in ...
. If the value of FUN
is a closure
then the first n
arguments to the function are evaluated
(i.e. their delayed evaluation promises are forced) before executing
the function body. If the value of FUN
is a primitive then
the call FUN(...)
is evaluated in the usual way.
forceAndCall
is intended to help higher-order
functions like apply
behave more reasonably when the
result returned by the function applied is a closure that captured its
arguments.
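A sketch of the situation this addresses: a hand-written mapping function whose applied function returns a closure. The names make_getter, lazy_map and forced_map are hypothetical and serve only to contrast a plain call with forceAndCall.

```r
# Returns a closure that captures its (lazily evaluated) argument
make_getter <- function(y) function() y

# Plain call: the promise for 'y' still refers to xs[[i]] and is only
# evaluated when a getter is first called, after the loop has finished
lazy_map <- function(xs, FUN) {
  out <- vector("list", length(xs))
  for (i in seq_along(xs)) out[[i]] <- FUN(xs[[i]])
  out
}

# forceAndCall: the first argument is forced before FUN's body runs
forced_map <- function(xs, FUN) {
  out <- vector("list", length(xs))
  for (i in seq_along(xs)) out[[i]] <- forceAndCall(1, FUN, xs[[i]])
  out
}

lazy_first   <- lazy_map(1:3, make_getter)[[1]]()
forced_first <- forced_map(1:3, make_getter)[[1]]()
```

In the lazy version the first getter sees the final value of the loop index, whereas the forced version captures the value that was current when the getter was created.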
Functions to make calls to compiled code that has been loaded into R.
.C(.NAME, ..., NAOK = FALSE, DUP = TRUE, PACKAGE, ENCODING)
.Fortran(.NAME, ..., NAOK = FALSE, DUP = TRUE, PACKAGE, ENCODING)
.NAME |
a character string giving the name of a C function or
Fortran subroutine, or an object of class
|
... |
arguments to be passed to the foreign function. Up to 65. |
NAOK |
if |
PACKAGE |
if supplied, confine the search for a character string
This is intended to add safety for packages, which can ensure by using this argument that no other package can override their external symbols, and also speeds up the search (see ‘Note’). |
DUP , ENCODING
|
For back-compatibility, accepted but ignored. |
These functions can be used to make calls to compiled C and Fortran
code. Later interfaces are .Call
and
.External
which are more flexible and have better
performance.
These functions are primitive, and .NAME
is always
matched to the first argument supplied (which should not be named).
The other named arguments follow ...
and so cannot be
abbreviated. For clarity, avoid using names in the arguments
passed to ...
that match or partially match .NAME
.
A list similar to the ...
list of arguments passed in
(including any names given to the arguments), but reflecting any
changes made by the C or Fortran code.
The mapping of the types of R arguments to C or Fortran arguments is
R | C | Fortran |
---|---|---|
integer | int * | integer |
numeric | double * | double precision |
-- or -- | float * | real |
complex | Rcomplex * | double complex |
logical | int * | integer |
character | char ** | [see below] |
raw | unsigned char * | not allowed |
list | SEXP * | not allowed |
other | SEXP | not allowed |
Note: The C types corresponding to integer
and
logical
are int
, not long
as in S. This
difference matters on most 64-bit platforms, where int
is
32-bit and long
is 64-bit (but not on 64-bit Windows).
Note: The Fortran type corresponding to logical
is
integer
, not logical
: the difference matters on some
Fortran compilers.
Numeric vectors in R will be passed as type double *
to C
(and as double precision
to Fortran) unless the argument has
attribute Csingle
set to TRUE
(use
as.single
or single
). This mechanism is
only intended to be used to facilitate the interfacing of existing C
and Fortran code.
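As a small sketch of the marking mechanism only (no compiled code is called here), the Csingle attribute can be inspected directly on the R side:

```r
## as.single() and single() return ordinary double vectors that carry
## the attribute Csingle = TRUE; the conversion to float happens only
## when the vector is passed to .C or .Fortran.
x <- as.single(c(1.5, 2.25))
typeof(x)            # "double"
attr(x, "Csingle")   # TRUE

y <- single(3)       # three zeros, also marked
attr(y, "Csingle")   # TRUE
```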
The C type Rcomplex
is defined in ‘Complex.h’ as a
typedef struct {double r; double i;}
. It may or may not be
equivalent to the C99 double complex
type, depending on the
compiler used.
Logical values are sent as 0
(FALSE
), 1
(TRUE
) or INT_MIN = -2147483648
(NA
, but only if
NAOK = TRUE
), and the compiled code should return one of these
three values: however non-zero values other than INT_MIN
are
mapped to TRUE
.
Missing (NA
) string values are passed to .C
as the string
"NA". As the C char
type can represent all possible bit patterns
there appears to be no way to distinguish missing strings from the
string "NA"
. If this distinction is important use .Call
.
Using a character string with .Fortran
is deprecated and will
give a warning. It passes the first (only) character string of a
character vector as a C character array to Fortran: that may be usable
as character*255
if its true length is passed separately. Only
up to 255 characters of the string are passed back. (How well this
works, and even if it works at all, depends on the C and Fortran
compilers and the platform.)
Lists, functions or other R objects can (for historical reasons) be
passed to .C
, but the .Call
interface is much
preferred. All inputs apart from atomic vectors should be regarded as
read-only, and all apart from vectors (including lists), functions and
environments are now deprecated.
All Fortran compilers known to be usable to compile R map symbol names
to lower case, and so does .Fortran
.
Symbol names containing underscores are not valid Fortran 77 (although
they are valid in Fortran 9x). Many Fortran 77 compilers will allow
them but may translate them in a different way to names not containing
underscores. Such names will often work with .Fortran
(since
how they are translated is detected when R is built and the
information used by .Fortran
), but portable code should not use
Fortran names containing underscores.
Use .Fortran
with care for compiled Fortran 9x code: it may not
work if the Fortran 9x compiler used differs from the Fortran compiler
used when configuring R, especially if the subroutine name is not
lower-case or includes an underscore. The most portable way to call
Fortran 9x code from R is to use .C
and the Fortran 2003
module iso_c_binding
to provide a C interface to the Fortran
code.
Character vectors are copied before calling the compiled code and to collect the results. For other atomic vectors the argument is copied before calling the compiled code if it is otherwise used in the calling code.
Non-atomic-vector objects are read-only to the C code and are never copied.
This behaviour can be changed by setting
options(CBoundsCheck = TRUE)
. In that case raw,
logical, integer, double and complex vector arguments are copied both
before and after calling the compiled code. The first copy made is
extended at each end by guard bytes, and on return it is checked that
these are unaltered. For .C
, each element of a character
vector uses guard bytes.
If one of these functions is to be used frequently, do specify
PACKAGE
(to confine the search to a single DLL) or pass
.NAME
as one of the native symbol objects. Searching for
symbols can take a long time, especially when many namespaces are loaded.
You may see PACKAGE = "base"
for symbols linked into R. Do
not use this in your own code: such symbols are not part of the API
and may be changed without warning.
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
The ‘Writing R Extensions’ manual.
Get or set the formal arguments of a function.
formals(fun = sys.function(sys.parent()), envir = parent.frame())
formals(fun, envir = environment(fun)) <- value
fun |
a |
envir |
|
value |
For the first form, fun
can also be a character string naming
the function to be manipulated, which is searched for in envir
,
by default from the parent
frame. If it is not specified, the function calling formals
is
used.
Only closures (that is, non-primitive functions) have formals; primitive
functions do not.
Note that formals(args(f))
gives a formal argument list for
all functions f
, primitive or not.
formals
returns the formal argument list of the function
specified, as a pairlist
, or NULL
for a
non-function or primitive.
The replacement form sets the formals of a function to the
list/pairlist on the right hand side, and (potentially) resets the
environment of the function, dropping attributes
.
formalArgs
(from methods), a shortcut for names(formals(.))
.
args
for a human-readable version, and as
intermediary to get formals of a primitive function.
alist
to construct a typical formals value
,
see the examples.
The three parts of a (non-primitive) function
are its
formals
, body
, and environment
.
require(stats)
formals(lm)

## If you just want the names of the arguments, use formalArgs instead.
names(formals(lm))
methods::formalArgs(lm) # same

## formals returns a pairlist. Arguments with no default have type symbol (aka name).
str(formals(lm))

## formals returns NULL for primitive functions. Use it in combination with
## args for this case.
is.primitive(`+`)
formals(`+`)
formals(args(`+`))

## You can overwrite the formal arguments of a function (though this is
## advanced, dangerous coding).
f <- function(x) a + b
formals(f) <- alist(a = , b = 3)
f    # function(a, b = 3) a + b
f(2) # result = 5
Format an R object for pretty printing.
format(x, ...)

## Default S3 method:
format(x, trim = FALSE, digits = NULL, nsmall = 0L,
       justify = c("left", "right", "centre", "none"),
       width = NULL, na.encode = TRUE, scientific = NA,
       big.mark = "", big.interval = 3L,
       small.mark = "", small.interval = 5L,
       decimal.mark = getOption("OutDec"),
       zero.print = NULL, drop0trailing = FALSE, ...)

## S3 method for class 'data.frame'
format(x, ..., justify = "none")

## S3 method for class 'factor'
format(x, ...)

## S3 method for class 'AsIs'
format(x, width = 12, ...)
x |
any R object (conceptually); typically numeric. |
trim |
logical; if |
digits |
a positive integer indicating how many significant digits
are to be used for
numeric and complex |
nsmall |
the minimum number of digits to the right of the decimal
point in formatting real/complex numbers in non-scientific formats.
Allowed values are |
justify |
should a character vector be left-justified (the default), right-justified, centred or left alone. Can be abbreviated. |
width |
|
na.encode |
logical: should |
scientific |
either a logical specifying whether
elements of a real or complex vector should be encoded in scientific
format, or an integer penalty (see |
... |
further arguments passed to or from other methods. |
big.mark , big.interval , small.mark , small.interval , decimal.mark , zero.print , drop0trailing
|
used for prettying (longish) numerical and complex sequences.
Passed to |
format
is a generic function. Apart from the methods described
here there are methods for dates (see format.Date
),
date-times (see format.POSIXct
) and for other classes such
as format.octmode
and format.dist
.
format.data.frame
formats the data frame column by column,
applying the appropriate method of format
for each column.
Methods for columns are often similar to as.character
but offer
more control. Matrix and data-frame columns will be converted to
separate columns in the result, and character columns (normally all)
will be given class "AsIs"
.
format.factor
converts the factor to a character vector and
then calls the default method (and so justify
applies).
format.AsIs
deals with columns of complicated objects that
have been extracted from a data frame. Character objects and (atomic)
matrices are passed to the default method (and so width
does
not apply).
Otherwise it calls toString
to convert the object
to character (if a vector or list, element by element) and then
right-justifies the result.
Justification for character vectors (and objects converted to
character vectors by their methods) is done on display width (see
nchar
), taking double-width characters and the rendering
of special characters (as escape sequences, including escaping
backslash but not double quote: see print.default
) into
account. Thus the width is as displayed by print(quote =
FALSE)
and not as displayed by cat
. Character strings
are padded with blanks to the display width of the widest. (If
na.encode = FALSE
missing character strings are not included in
the width computations and are not encoded.)
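A brief sketch of the padding behaviour described above, using a made-up character vector:

```r
s <- c("a", "bbb")
format(s)                       # "a  " "bbb"  (left-justified, common width)
format(s, justify = "right")    # "  a" "bbb"
format(s, justify = "none")     # "a"   "bbb"  (left alone)

## With na.encode = FALSE a missing string stays NA instead of
## becoming the padded text "NA".
with_na <- format(c("a", "bbb", NA), na.encode = FALSE)
is.na(with_na)
```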
Numeric vectors are encoded with the minimum number of decimal places
needed to display all the elements to at least the digits
significant digits. However, if all the elements then have trailing
zeroes, the number of decimal places is reduced until at least one
element has a non-zero final digit; see also the argument
documentation for big.*
, small.*
etc, above. See the
note in print.default
about digits >= 16
.
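The common-format rule for numeric vectors can be seen in a few small calls (assuming the default options(digits = 7) and a period as the decimal mark):

```r
## One decimal place is needed for 0.5, so every element gets one.
format(c(0.5, 10))                    # " 0.5" "10.0"

## Trailing zeros shared by all elements are dropped.
format(c(1.10, 2.20))                 # "1.1" "2.2"

## nsmall sets a minimum number of decimal places.
format(2, nsmall = 2)                 # "2.00"

## big.mark is applied after the digits have been chosen.
format(1234567.891, big.mark = ",")   # "1,234,568"
```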
Raw vectors are converted to their 2-digit hexadecimal representation
by as.character
.
format.default(x)
now provides a “minimal” string when
isS4(x)
is true.
While the internal code respects the option
getOption("OutDec")
for the ‘decimal mark’ in general,
decimal.mark
takes precedence over that option. Similarly,
scientific
takes precedence over getOption("scipen")
.
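As a short illustration of these precedence rules, explicit arguments win over the session options, whatever OutDec and scipen are currently set to:

```r
## decimal.mark overrides getOption("OutDec") for this call only.
format(1.5, decimal.mark = ",")          # "1,5"

## scientific overrides getOption("scipen").
format(1e5, scientific = TRUE)           # "1e+05"
format(1e-10, scientific = FALSE)        # "0.0000000001"
```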
An object of similar structure to x
containing character
representations of the elements of the first argument x
in a common format, and in the current locale's encoding.
For character, numeric, complex or factor x
, dims and dimnames
are preserved on matrices/arrays and names on vectors: no other
attributes are copied.
If x
is a list, the result is a character vector obtained by
applying format.default(x, ...)
to each element of the list
(after unlist
ing elements which are themselves lists),
and then collapsing the result for each element with
paste(collapse = ", ")
. The defaults in this case are
trim = TRUE, justify = "none"
since one does not usually want
alignment in the collapsed strings.
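A minimal sketch of the list case, with a made-up list: each element is formatted on its own and then collapsed into one string.

```r
z <- list(1:3, c("a", "b"), 2.5)
fz <- format(z)
fz          # "1, 2, 3" "a, b" "2.5"
length(fz)  # one string per list element
```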
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
format.info
indicates how an atomic vector would be
formatted.
formatC
, paste
, as.character
,
sprintf
, print
, prettyNum
,
toString
, encodeString
.
format(1:10)
format(1:10, trim = TRUE)

zz <- data.frame("(row names)"= c("aaaaa", "b"), check.names = FALSE)
format(zz)
format(zz, justify = "left")

## use of nsmall
format(13.7)
format(13.7, nsmall = 3)
format(c(6.0, 13.1), digits = 2)
format(c(6.0, 13.1), digits = 2, nsmall = 1)

## use of scientific
format(2^31-1)
format(2^31-1, scientific = TRUE)
## scientific = numeric scipen (= {sci}entific notation {pen}alty) :
x <- c(1e5, 1000, 10, 0.1, .001, .123)
t(sapply(setNames(,-4:1),
         \(sci) sapply(x, format, scientific=sci)))

## a list
z <- list(a = letters[1:3], b = (-pi+0i)^((-2:2)/2), c = c(1,10,100,1000),
          d = c("a", "longer", "character", "string"),
          q = quote( a + b ), e = expression(1+x))
## can you find the "2" small differences?
(f1 <- format(z, digits = 2))
(f2 <- format(z, digits = 2, justify = "left", trim = FALSE))
f1 == f2 ## 2 FALSE, 4 TRUE

## A "minimal" format() for S4 objects without their own format() method:
cc <- methods::getClassDef("standardGeneric")
format(cc) ## "<S4 class ......>"
Information is returned on how format(x, digits, nsmall)
would be formatted.
format.info(x, digits = NULL, nsmall = 0)
x |
an atomic vector; a potential argument of
|
digits |
how many significant digits are to be used for
numeric and complex |
nsmall |
(see |
An integer
vector
of length 1, 3 or 6, say
r
.
For logical, integer and character vectors a single element,
the width which would be used by format
if width = NULL
.
For numeric vectors:
r[1] |
width (in characters) used by |
r[2] |
number of digits after decimal point. |
r[3] |
in |
For a complex vector the first three elements refer to the real parts, and there are three further elements corresponding to the imaginary parts.
format
(notably about digits >= 16
),
formatC
.
dd <- options("digits") ; options(digits = 7) #-- for the following
format.info(123)   # 3 0 0
format.info(pi)    # 8 6 0
format.info(1e8)   # 5 0 1 - exponential "1e+08"
format.info(1e222) # 6 0 2 - exponential "1e+222"

x <- pi*10^c(-10,-2,0:2,8,20)
names(x) <- formatC(x, width = 1, digits = 3, format = "g")
cbind(sapply(x, format))
t(sapply(x, format.info))

## using at least 8 digits right of "."
t(sapply(x, format.info, nsmall = 8))

# Reset old options:
options(dd)
format.pval
is intended for formatting p-values.
format.pval(pv, digits = max(1, getOption("digits") - 2),
            eps = .Machine$double.eps, na.form = "NA", ...)
pv |
a numeric vector. |
digits |
how many significant digits are to be used. |
eps |
a numerical tolerance: see ‘Details’. |
na.form |
character representation of |
... |
further arguments to be passed to |
format.pval
is mainly an auxiliary function for
print.summary.lm
etc., and does separate formatting for
fixed, floating point and very small values; those less than
eps
are formatted as "< [eps]"
(where ‘[eps]’
stands for format(eps, digits)
).
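A small sketch of the eps threshold, using made-up p-values; the exact text of the other entries depends on digits, so only the thresholded entry is commented on:

```r
p <- c(0.5, 0.049, 1e-20)

## Values below eps are reported as "<" followed by the formatted eps.
out <- format.pval(p, eps = 1e-4)
out
out[3]   # starts with "<"

## A larger tolerance moves the cut-off accordingly.
format.pval(p, digits = 2, eps = 0.001)
```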
A character vector.
format.pval(c(stats::runif(5), pi^-100, NA))
format.pval(c(0.1, 0.0001, 1e-27))
formatC()
formats numbers individually and flexibly using
C
style format specifications.
prettyNum()
is used for “prettifying” (possibly
formatted) numbers, also in format.default
.
.format.zeros(x)
, an auxiliary function of prettyNum()
,
re-formats the zeros in a vector x
of formatted numbers.
formatC(x, digits = NULL, width = NULL,
        format = NULL, flag = "", mode = NULL,
        big.mark = "", big.interval = 3L,
        small.mark = "", small.interval = 5L,
        decimal.mark = getOption("OutDec"),
        preserve.width = "individual",
        zero.print = NULL, replace.zero = TRUE,
        drop0trailing = FALSE)

prettyNum(x, big.mark = "", big.interval = 3L,
          small.mark = "", small.interval = 5L,
          decimal.mark = getOption("OutDec"), input.d.mark = decimal.mark,
          preserve.width = c("common", "individual", "none"),
          zero.print = NULL, replace.zero = FALSE,
          drop0trailing = FALSE, is.cmplx = NA, ...)

.format.zeros(x, zero.print, nx = suppressWarnings(as.numeric(x)),
              replace = FALSE, warn.non.fitting = TRUE)
x |
an atomic numerical or character object, possibly
|
digits |
the desired number of digits after the decimal
point ( Default: 2 for integer, 4 for real numbers. If less than 0,
the C default of 6 digits is used. If specified as more than 50, 50
will be used with a warning unless |
width |
the total field width; if both |
format |
equal to
|
flag |
for
There can be more than one of these flags, in any order. Other characters
used to have no effect for |
mode |
|
big.mark |
character; if not empty used as mark between every
|
big.interval |
see |
small.mark |
character; if not empty used as mark between every
|
small.interval |
see |
decimal.mark |
the character to be used to indicate the numeric decimal point. |
input.d.mark |
if |
preserve.width |
string specifying if the string widths should
be preserved where possible in those cases where marks
( |
zero.print |
logical, character string or |
replace.zero , replace
|
logical; if This works via |
warn.non.fitting |
logical; if it is true, |
drop0trailing |
logical, indicating if trailing zeros,
i.e., |
is.cmplx |
optional logical, to be used when |
... |
arguments passed to |
nx |
numeric vector of the same length as |
For numbers, formatC()
calls prettyNum()
when needed
which itself calls .format.zeros(*, replace=replace.zero)
.
(“when needed”: when zero.print
is not
NULL
, drop0trailing
is true, or one of big.mark
,
small.mark
, or decimal.mark
is not at default.)
If you set format
it overrides the setting of mode
, so
formatC(123.45, mode = "double", format = "d")
gives 123
.
The rendering of scientific format is platform-dependent: some systems
use n.ddde+nnn
or n.dddenn
rather than n.ddde+nn
.
formatC
does not necessarily align the numbers on the decimal
point, so formatC(c(6.11, 13.1), digits = 2, format = "fg")
gives
c("6.1", " 13")
. If you want common formatting for several
numbers, use format
.
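The points above can be tried with a few short calls. This is only a sketch; it assumes a period as the decimal mark, and the last line is shown for contrast without a claimed result.

```r
## format overrides mode:
formatC(123.45, mode = "double", format = "d")      # "123"

## Fixed notation with two digits after the decimal point:
formatC(3.14159, digits = 2, format = "f")          # "3.14"

## Field width and zero padding via a flag:
formatC(42L, width = 8, flag = "0")                 # "00000042"

## Each number is formatted individually, then padded to 'width':
formatC(c(1, 10, 100), width = 5)                   # "    1" "   10" "  100"

## format() instead picks one common representation for the vector:
format(c(6.11, 13.1), digits = 2)
```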
prettyNum
is the utility function for prettifying x
.
x
can be complex (or format(<complex>)
), here. If
x
is not a character, format(x[i], ...)
is applied to
each element, and then it is left unchanged if all the other arguments
are at their defaults. Use the input.d.mark
argument for
prettyNum(x)
when x
is a character
vector not
resulting from something like format(<number>)
with a period as
decimal mark.
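A brief sketch of prettyNum on numeric and character input, using made-up values and assuming the default options(digits = 7):

```r
## Numeric input: each element is passed through format() first.
prettyNum(123456789, big.mark = ",")            # "123,456,789"
prettyNum(c(123456, 1234567), big.mark = ",")   # "123,456" "1,234,567"

## Character input that already looks like a formatted number:
prettyNum("1234567.891", big.mark = ",")        # "1,234,567.891"
```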
Because gsub
is used to insert the big.mark
and small.mark
, special characters need escaping. In particular,
to insert a single backslash, use "\\\\"
.
The C doubles used for R numerical vectors have signed zeros, which
formatC
may output as -0
, -0.000
....
There is a warning if big.mark
and decimal.mark
are the
same: that would be confusing to those reading the output.
A character object of same size and attributes as x
(after
discarding any class), in the current locale's encoding.
Unlike format
, each number is formatted individually.
Looping over each element of x
, the C function
sprintf(...)
is called for numeric inputs (inside the C
function str_signif
).
formatC
: for character x
, do simple (left or right)
padding with white space.
The default for decimal.mark
in formatC()
was changed in
R 3.2.0: for use within print
methods in packages which might
be used with earlier versions: use decimal.mark = getOption("OutDec")
explicitly.
formatC
was originally written by Bill Dunlap for S-PLUS, later
much improved by Martin Maechler.
It was first adapted for R by Friedrich Leisch and since much improved by the R Core team.
Kernighan, B. W. and Ritchie, D. M. (1988) The C Programming Language. Second edition. Prentice Hall.
sprintf
for more general C-like formatting.
xx <- pi * 10^(-5:4)
cbind(format(xx, digits = 4), formatC(xx))
cbind(formatC(xx, width = 9, flag = "-"))
cbind(formatC(xx, digits = 5, width = 8, format = "f", flag = "0"))
cbind(format(xx, digits = 4), formatC(xx, digits = 4, format = "fg"))

f <- (-2:4); f <- f*16^f
# Default ("g") format:
formatC(pi*f)
# Fixed ("f") format, more than one flag ('width' partly "enlarged"):
cbind(formatC(pi*f, digits = 3, width=9, format = "f", flag = "0+"))

formatC( c("a", "Abc", "no way"), width = -7) # <=> flag = "-"
formatC(c((-1:1)/0,c(1,100)*pi), width = 8, digits = 1)

## note that some of the results here depend on the implementation
## of long-double arithmetic, which is platform-specific.
xx <- c(1e-12,-3.98765e-10,1.45645e-69,1e-70,pi*1e37,3.44e4)
##       1        2             3        4      5       6
formatC(xx)
formatC(xx, format = "fg")       # special "fixed" format.
formatC(xx[1:4], format = "f", digits = 75) #>> even longer strings

formatC(c(3.24, 2.3e-6), format = "f", digits = 11)
formatC(c(3.24, 2.3e-6), format = "f", digits = 11, drop0trailing = TRUE)

r <- c("76491283764.97430", "29.12345678901", "-7.1234", "-100.1","1123")
## American:
prettyNum(r, big.mark = ",")
## Some Europeans:
prettyNum(r, big.mark = "'", decimal.mark = ",")

(dd <- sapply(1:10, function(i) paste((9:0)[1:i], collapse = "")))
prettyNum(dd, big.mark = "'")

## examples of 'small.mark'
pN <- stats::pnorm(1:7, lower.tail = FALSE)
cbind(format (pN, small.mark = " ", digits = 15))
cbind(formatC(pN, small.mark = " ", digits = 17, format = "f"))

cbind(ff <- format(1.2345 + 10^(0:5), width = 11, big.mark = "'"))
## all with same width (one more than the specified minimum)

## individual formatting to common width:
fc <- formatC(1.234 + 10^(0:8), format = "fg", width = 11, big.mark = "'")
cbind(fc)
## Powers of two, stored exactly, formatted individually:
pow.2 <- formatC(2^-(1:32), digits = 24, width = 1, format = "fg")
## nicely printed (the last line showing 5^32 exactly):
noquote(cbind(pow.2))

## complex numbers:
r <- 10.0000001; rv <- (r/10)^(1:10)
(zv <- (rv + 1i*rv))
op <- options(digits = 7) ## (system default)
(pnv <- prettyNum(zv))
stopifnot(pnv == "1+1i", pnv == format(zv),
          pnv == prettyNum(zv, drop0trailing = TRUE))
## more digits change the picture:
options(digits = 8)
head(fv <- format(zv), 3)
prettyNum(fv)
prettyNum(fv, drop0trailing = TRUE) # a bit nicer
options(op)

## The ' flag :
doLC <- FALSE # <= R warns, so change to TRUE manually if you want to see the effect
if(doLC) {
  oldLC <- Sys.getlocale("LC_NUMERIC")
  Sys.setlocale("LC_NUMERIC", "de_CH.UTF-8")
}
formatC(1.234 + 10^(0:4), format = "fg", width = 11, flag = "'")
## --> ..... " 1'001" " 10'001" on supported platforms
if(doLC) ## revert, typically to "C" :
  Sys.setlocale("LC_NUMERIC", oldLC)
Format vectors of items and their descriptions as 2-column tables or LaTeX-style description lists.
formatDL(x, y, style = c("table", "list"),
         width = 0.9 * getOption("width"), indent = NULL)
x |
a vector giving the items to be described, or a list of length 2 or a matrix with 2 columns giving both items and descriptions. |
y |
a vector of the same length as |
style |
a character string specifying the rendering style of the
description information. Can be abbreviated.
If |
width |
a positive integer giving the target column for wrapping lines in the output. |
indent |
a positive integer specifying the indentation of the
second column in table style, and the indentation of continuation
lines in list style. Must not be greater than |
After extracting the vectors of items and corresponding descriptions from the arguments, both are coerced to character vectors.
In table style, items with more than indent - 3
characters are
displayed on a line of their own.
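As a small sketch of both styles, with made-up items and descriptions (the vectors items and descs are hypothetical):

```r
items <- c("mean", "median", "a_rather_long_item_name")
descs <- c("arithmetic average",
           "middle value of the sorted data",
           "too long for the first column, so it gets its own line")

## Table style: descriptions start in column 'indent'; the long item
## exceeds indent - 3 characters and is placed on a line of its own.
tab <- formatDL(items, descs, style = "table", indent = 10)
writeLines(tab)

## List style: each item is followed directly by its description.
lst <- formatDL(items, descs, style = "list")
writeLines(lst)
```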
a character vector with the formatted entries.
## Provide a nice summary of the numerical characteristics of the
## machine R is running on:
writeLines(formatDL(unlist(.Machine)))
## Inspect Sys.getenv() results in "list" style (by default, these are
## printed in "table" style):
writeLines(formatDL(Sys.getenv(), style = "list"))
These functions provide the base mechanisms for defining new functions in the R language.
function( arglist ) expr
\( arglist ) expr
return(value)
arglist |
empty or one or more (comma-separated) ‘name’ or
‘name = expression’ terms
and/or the special token |
expr |
an expression. |
value |
an expression. |
The names in an argument list can be back-quoted non-standard names (see ‘backquote’).
If value
is missing, NULL
is returned. If it is a
single expression, the value of the evaluated expression is returned.
(The expression is evaluated as soon as return
is called, in
the evaluation frame of the function and before any
on.exit
expression is evaluated.)
If the end of a function is reached without calling return
, the
value of the last evaluated expression is returned.
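The three ways a value can leave a function can be sketched as follows; sign_label is a hypothetical helper used only for illustration.

```r
sign_label <- function(x) {
  if (x < 0) return("negative")  # explicit early return
  if (x == 0) return()           # return() with no value gives NULL
  "positive"                     # value of the last evaluated expression
}

sign_label(-1)   # "negative"
sign_label(0)    # NULL
sign_label(2)    # "positive"
```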
The shorthand form \(x) x + 1
is parsed as function(x) x
+ 1
. It may be helpful in making code containing simple function
expressions more readable.
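A short sketch of the shorthand in use (it requires a version of R that supports the \( ) syntax, as this one does):

```r
## The shorthand is convenient for small anonymous functions:
sapply(1:3, \(x) x + 1)                  # 2 3 4

## Both spellings produce the same function body:
identical(body(\(x) x + 1), body(function(x) x + 1))   # TRUE
```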
This type of function is not the only type in R: they are called closures (a name with origins in LISP) to distinguish them from primitive functions.
A closure has three components, its formals
(its argument
list), its body
(expr
in the ‘Usage’
section) and its environment
which provides the
enclosure of the evaluation frame when the closure is used.
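The three components can be inspected directly. In this sketch, make_counter and counter are hypothetical names chosen for the illustration.

```r
make_counter <- function(start = 0) {
  count <- start
  function(by = 1) {
    count <<- count + by   # updates 'count' in the enclosing environment
    count
  }
}

counter <- make_counter(10)

formals(counter)       # the argument list: by = 1
body(counter)          # the expression evaluated on each call
environment(counter)   # the frame of the make_counter(10) call

counter()              # 11
counter(5)             # 16
get("count", envir = environment(counter))   # 16
```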
There is an optional further component if the closure has been byte-compiled. This is not normally user-visible, but is indicated when functions are printed.
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
args
.
formals
, body
and
environment
for accessing the component parts of a
function.
debug
for debugging; using invisible
inside
return(.)
for returning invisibly.
norm <- function(x) sqrt(x%*%x)
norm(1:4)

## An anonymous function:
(function(x, y){ z <- x^2 + y^2; x+y+z })(0:7, 1)
Reduce
uses a binary function to successively combine the elements of a given vector and a possibly given initial value.
Filter
extracts the elements of a vector for which a predicate (logical) function gives true.
Find
and Position
give the first or last such element and its position in the vector, respectively.
Map
applies a function to the corresponding elements of given vectors.
Negate
creates the negation of a given function.
Reduce(f, x, init, right = FALSE, accumulate = FALSE, simplify = TRUE)
Filter(f, x)
Find(f, x, right = FALSE, nomatch = NULL)
Map(f, ...)
Negate(f)
Position(f, x, right = FALSE, nomatch = NA_integer_)
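f |
a function of the appropriate arity (binary for Reduce, unary for Filter, Find and Position, k-ary for Map if this is called with k arguments). An arbitrary predicate function for Negate. |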
x |
a vector. |
init |
an R object of the same kind as the elements of x. |
right |
a logical indicating whether to proceed from left to right (default) or from right to left. |
accumulate |
a logical indicating whether the successive reduce combinations should be accumulated. By default, only the final combination is used. |
simplify |
a logical indicating whether accumulated results should be simplified (by unlisting) in case they all are length one. |
nomatch |
the value to be returned in the case when “no match” (no element satisfying the predicate) is found. |
... |
vectors to which the function is applied by Map. |
If init is given, Reduce logically adds it to the start (when proceeding left to right) or the end of x, respectively. If this possibly augmented vector $v$ has $n > 1$ elements, Reduce successively applies $f$ to the elements of $v$ from left to right or right to left, respectively. I.e., a left reduce computes $l_1 = f(v_1, v_2)$, $l_2 = f(l_1, v_3)$, etc., and returns $l_{n-1} = f(l_{n-2}, v_n)$, and a right reduce does $r_{n-1} = f(v_{n-1}, v_n)$, $r_{n-2} = f(v_{n-2}, r_{n-1})$ and returns $r_1 = f(v_1, r_2)$. (E.g., if $v$ is the sequence (2, 3, 4) and $f$ is division, left and right reduce give $(2 / 3) / 4 = 1/6$ and $2 / (3 / 4) = 8/3$, respectively.) If $v$ has only a single element, this is returned; if there are no elements, NULL is returned. Thus, it is ensured that f is always called with 2 arguments.
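The division example from this paragraph can be checked directly. The following is a small sketch; the variable names are illustrative only.

```r
v <- c(2, 3, 4)

left  <- Reduce(`/`, v)                     # (2 / 3) / 4 = 1/6
right <- Reduce(`/`, v, right = TRUE)       # 2 / (3 / 4) = 8/3
steps <- Reduce(`/`, v, accumulate = TRUE)  # 2, 2/3, 1/6

# init is logically added to the start for a left reduce.
with_init <- Reduce(`+`, v, 100)            # ((100 + 2) + 3) + 4 = 109

# With no elements and no init, NULL is returned and f is never called.
empty <- Reduce(`+`, list())
```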
The current implementation is non-recursive to ensure stability and scalability.
Reduce is patterned after Common Lisp's reduce. A reduce is also known as a fold (e.g., in Haskell) or an accumulate (e.g., in the C++ Standard Template Library). The accumulative version corresponds to Haskell's scan functions.
Filter applies the unary predicate function f to each element of x, coercing to logical if necessary, and returns the subset of x for which this gives true. Note that possible NA values are currently always taken as false; control over NA handling may be added in the future. Filter corresponds to filter in Haskell or ‘remove-if-not’ in Common Lisp.
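A short sketch of the NA handling and the coercion to logical described here (the vectors are made up for illustration):

```r
x <- c(1, NA, 3, -2)

# The predicate gives NA for the second element, which is taken as false.
positive <- Filter(function(v) v > 0, x)

# A predicate that returns a number is coerced to logical (0 is false).
nonzero <- Filter(function(v) v, c(0, 2))

# Any unary predicate can be used, here nzchar on a character vector.
nonempty <- Filter(nzchar, c("a", "", "b"))
```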
Find and Position are patterned after Common Lisp's ‘find-if’ and ‘position-if’, respectively. If there is an element for which the predicate function gives true, then the first or last such element or its position is returned depending on whether right is false (default) or true, respectively. If there is no such element, the value specified by nomatch is returned. The current implementation is not optimized for performance.
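The effect of right and nomatch can be sketched as follows; the predicate name big is hypothetical.

```r
x <- c(4, 9, 16, 25)
big <- function(v) v > 5

first_val <- Find(big, x)                    # first match: 9
last_val  <- Find(big, x, right = TRUE)      # last match: 25
first_pos <- Position(big, x)                # 2
last_pos  <- Position(big, x, right = TRUE)  # 4

# No element satisfies the predicate, so the nomatch defaults are returned.
none_val <- Find(function(v) v > 100, x)      # NULL
none_pos <- Position(function(v) v > 100, x)  # NA_integer_
```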
Map is a simple wrapper to mapply which does not attempt to simplify the result, similar to Common Lisp's mapcar (with arguments being recycled, however). Future versions may allow some control of the result type.
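The difference from mapply, and the recycling of shorter arguments, can be seen in a small sketch:

```r
# Map always returns a list, one element per set of corresponding arguments.
products <- Map(function(a, b) a * b, 1:3, 4:6)

# mapply with its defaults simplifies the same result to a vector.
products_vec <- mapply(function(a, b) a * b, 1:3, 4:6)

# The shorter argument is recycled: 1+10, 2+20, 3+10, 4+20.
sums <- Map(function(a, b) a + b, 1:4, c(10, 20))
```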
Negate corresponds to Common Lisp's complement. Given a (predicate) function f, it creates a function which returns the logical negation of what f returns.
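For instance, a negated predicate can be passed straight to Filter. The helper is_even below is hypothetical.

```r
is_even <- function(n) n %% 2 == 0
is_odd  <- Negate(is_even)   # a new function returning !is_even(...)

odd <- Filter(is_odd, 1:6)
```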
Functions clusterMap and mcmapply (not Windows) in package parallel provide parallel versions of Map.
## A general-purpose adder:
add <- function(x) Reduce(`+`, x)
add(list(1, 2, 3))
## Like sum(), but can also be used for adding matrices etc., as it will
## use the appropriate '+' method in each reduction step.
## More generally, many generics meant to work on arbitrarily many
## arguments can be defined via reduction:
FOO <- function(...) Reduce(FOO2, list(...))
FOO2 <- function(x, y) UseMethod("FOO2")
## FOO() methods can then be provided via FOO2() methods.
## A general-purpose cumulative adder:
cadd <- function(x) Reduce(`+`, x, accumulate = TRUE)
cadd(seq_len(7))
## A simple function to compute continued fractions:
cfrac <- function(x) Reduce(function(u, v) u + 1 / v, x, right = TRUE)
## Continued fraction approximation for pi:
cfrac(c(3, 7, 15, 1, 292))
## Continued fraction approximation for Euler's number (e):
cfrac(c(2, 1, 2, 1, 1, 4, 1, 1, 6, 1, 1, 8))
## Map() now recycles similar to basic Ops:
Map(`+`, 1, 1 : 3) ; 1 + 1:3
Map(`+`, numeric(), 1 : 3) ; numeric() + 1:3
## Iterative function application:
Funcall <- function(f, ...) f(...)
## Compute log(exp(acos(cos(0))))
Reduce(Funcall, list(log, exp, acos, cos), 0, right = TRUE)
## n-fold iterate of a function, functional style:
Iterate <- function(f, n = 1)
    function(x) Reduce(Funcall, rep.int(list(f), n), x, right = TRUE)
## Continued fraction approximation to the golden ratio:
Iterate(function(x) 1 + 1 / x, 30)(1)
## which is the same as
cfrac(rep.int(1, 31))
## Computing square root approximations for x as fixed points of the
## function t |-> (t + x / t) / 2, as a function of the initial value:
asqrt <- function(x, n) Iterate(function(t) (t + x / t) / 2, n)
asqrt(2, 30)(10) # Starting from a positive value => +sqrt(2)
asqrt(2, 30)(-1) # Starting from a negative value => -sqrt(2)
## A list of all functions in the base environment:
funs <- Filter(is.function, sapply(ls(baseenv()), get, baseenv()))
## Functions in base with more than 10 arguments:
names(Filter(function(f) length(formals(f)) > 10, funs))
## Number of functions in base with a '...' argument:
length(Filter(function(f) any(names(formals(f)) %in% "..."), funs))
## Find all objects in the base environment which are *not* functions:
Filter(Negate(is.function), sapply(ls(baseenv()), get, baseenv()))
A call of gc causes a garbage collection to take place. gcinfo sets a flag so that automatic collection is either silent (verbose = FALSE) or prints memory usage statistics (verbose = TRUE).
gc(verbose = getOption("verbose"), reset = FALSE, full = TRUE)
gcinfo(verbose)
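verbose |
logical; if TRUE, the garbage collection prints statistics about cons cells and the space allocated for vectors. |
reset |
logical; if TRUE the values for maximum space used are reset to the current values. |
full |
logical; if TRUE a full collection is performed; otherwise only more recently allocated objects may be collected. |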
A call of gc causes a garbage collection to take place. This will also take place automatically without user intervention, and the primary purpose of calling gc is for the report on memory usage. For an accurate report full = TRUE should be used.
It can be useful to call gc after a large object has been removed, as this may prompt R to return memory to the operating system.
R allocates space for vectors in multiples of 8 bytes: hence the report of "Vcells", a relic of an earlier allocator (that used a vector heap).
When gcinfo(TRUE) is in force, messages are sent to the message connection at each garbage collection of the form
Garbage collection 12 = 10+0+2 (level 0) ...
6.4 Mbytes of cons cells used (58%)
2.0 Mbytes of vectors used (32%)
Here the last two lines give the current memory usage rounded up to the next 0.1Mb and as a percentage of the current trigger value. The first line gives a breakdown of the number of garbage collections at various levels (for an explanation see the ‘R Internals’ manual).
gc returns a matrix with rows "Ncells" (cons cells), usually 28 bytes each on 32-bit systems and 56 bytes on 64-bit systems, and "Vcells" (vector cells, 8 bytes each), and columns "used" and "gc trigger", each also interpreted in megabytes (rounded up to the next 0.1Mb).
If maxima have been set for either "Ncells" or "Vcells", a fifth column is printed giving the current limits in Mb (with NA denoting no limit).
The final two columns show the maximum space used since the last call to gc(reset = TRUE) (or since R started).
gcinfo returns the previous value of the flag.
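A brief sketch of working with the returned matrix rather than its printed form. The variable names are illustrative, and the Mb figure is only an approximation based on the 8 bytes per vector cell stated above.

```r
# Run a full collection and keep the report (a numeric matrix).
report <- gc(full = TRUE)

rownames(report)    # the "Ncells" and "Vcells" rows
report[, "used"]    # cells currently in use

# Approximate vector heap usage in Mb, assuming 8 bytes per Vcell.
vcells_mb <- report["Vcells", "used"] * 8 / 1024^2

# gcinfo() returns the previous flag, so the setting can be restored.
old <- gcinfo(FALSE)
gcinfo(old)
```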
The ‘R Internals’ manual.
Memory on R's memory management, and gctorture if you are an R developer.
gc.time() reports time used for garbage collection.
reg.finalizer for actions to happen at garbage collection.
gc() #- do it now
gcinfo(TRUE) #-- in the future, show when R does it
## vvvvv use larger to *show* something
x <- integer(100000); for(i in 1:18) x <- c(x, i)
gcinfo(verbose = FALSE) #-- don't show it anymore
gc(TRUE)
gc(reset = TRUE)
This function reports the time spent in garbage collection so far in the R session while GC timing was enabled.
gc.time(on = TRUE)
on |
logical; if TRUE, GC timing is enabled. |
Due to timer resolution this may be an under-estimate.
This is a primitive.
A numerical vector of length 5 giving the user CPU time, the system CPU time, the elapsed time and children's user and system CPU times (normally both zero), of time spent doing garbage collection whilst GC timing was enabled.
Times of child processes are not available on Windows and will always be given as NA.
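As a rough sketch, the cumulative timings can be sampled before and after a forced collection. The difference may well be zero because of the timer resolution mentioned above.

```r
# Times are cumulative over the session while GC timing is enabled.
before <- gc.time()
invisible(gc())        # force at least one collection
after <- gc.time()

length(after)          # five components, as described above
after - before         # time attributed to the forced collection
```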
gc, proc.time for the timings for the session.
gc.time()
Provokes garbage collection on (nearly) every memory allocation. Intended to ferret out memory protection bugs. Also makes R run very slowly, unfortunately.
gctorture(on = TRUE)
gctorture2(step, wait = step, inhibit_release = FALSE)
on |
logical; turning it on/off. |
step |
integer; run GC every step allocations. |
wait |
integer; number of allocations to wait before starting GC torture. |
inhibit_release |
logical; do not release free objects for re-use: use with caution. |
Calling gctorture(TRUE) instructs the memory manager to force a full GC on every allocation. gctorture2 provides a more refined interface that allows the start of the GC torture to be deferred and also gives the option of running a GC only every step allocations.
The third argument to gctorture2 is only used if R has been configured with a strict write barrier enabled. When this is the case all garbage collections are full collections, and the memory manager marks free nodes and enables checks in many situations that signal an error when a free node is used. This can help greatly in isolating unprotected values in C code. It does not detect the case where a node becomes free and is reallocated. The inhibit_release argument can be used to prevent such reallocation. This will cause memory to grow and should be used with caution and in conjunction with operating system facilities to monitor and limit process memory use.
gctorture2 can also be invoked via environment variables at the start of the R session. R_GCTORTURE corresponds to the step argument, R_GCTORTURE_WAIT to wait, and R_GCTORTURE_INHIBIT_RELEASE to inhibit_release.
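Because torture mode is very slow, one plausible pattern is to switch it on only around the code under suspicion and then restore the previous setting. This is a sketch, and the numeric(10) line merely stands in for code that calls into compiled routines.

```r
# Sketch only: keep the tortured region as small as possible.
old <- gctorture(TRUE)   # returns the previous value of the flag
x <- numeric(10)         # placeholder for the code being checked
gctorture(old)           # restore the previous setting
```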
Previous value of first argument.
Peter Dalgaard and Luke Tierney
Search by name for an object (get) or zero or more objects (mget).
get(x, pos = -1, envir = as.environment(pos), mode = "any", inherits = TRUE)
mget(x, envir = as.environment(-1), mode = "any", ifnotfound, inherits = FALSE)
dynGet(x, ifnotfound = , minframe = 1L, inherits = FALSE)
x |
For |
pos , envir
|
where to look for the object (see ‘Details’); if omitted search as if the name of the object appeared unquoted in an expression. |
mode |
the mode or type of object sought: see the ‘Details’ section. |
inherits |
should the enclosing frames of the environment be searched? |
ifnotfound |
For |
minframe |
integer specifying the minimal frame number to look into. |
The pos argument can specify the environment in which to look for the object in any of several ways: as a positive integer (the position in the search list); as the character string name of an element in the search list; or as an environment (including using sys.frame to access the currently active function calls). The default of -1 indicates the current environment of the call to get. The envir argument is an alternative way to specify an environment.
These functions look to see if each of the name(s) x have a value bound to it in the specified environment. If inherits is TRUE and a value is not found for x in the specified environment, the enclosing frames of the environment are searched until the name x is encountered. See environment and the ‘R Language Definition’ manual for details about the structure of environments and their enclosures.
If mode is specified then only objects of that type are sought. mode here is a mixture of the meanings of typeof and mode: "function" covers primitive functions and operators, "numeric", "integer" and "double" all refer to any numeric type, "symbol" and "name" are equivalent but "language" must be used (and not "call" or "(").
Currently, mode = "S4" and mode = "object" are equivalent.
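A typical use of mode is to skip over a non-function object that shadows a function of the same name. The following sketch deliberately creates such a shadowing variable.

```r
# A variable that shadows the base function c().
c <- 1:3

val <- get("c")                     # finds the integer vector first
fn  <- get("c", mode = "function")  # skips it and continues to base::c

rm(c)                               # remove the shadowing variable again
```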
For mget, the values of mode and ifnotfound can be either the same length as x or of length 1. The argument ifnotfound must be a list containing either the value to use if the requested item is not found or a function of one argument which will be called if the item is not found, with argument the name of the item being requested.
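Both forms of ifnotfound can be sketched with a small environment. The environment e and the names a and b are made up for illustration.

```r
e <- new.env()
assign("a", 1, envir = e)

# A length-1 list supplies the same fallback value for every missing name.
found <- mget(c("a", "b"), envir = e, ifnotfound = list(NA))

# A function fallback is called with the name of the missing item.
computed <- mget(c("a", "b"), envir = e,
                 ifnotfound = list(function(nm) paste("no value for", nm)))
```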
dynGet() is somewhat experimental and to be used inside another function. It looks for an object in the callers, i.e., the sys.frame()s of the function. Use with caution.
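The lookup through the calling frames can be illustrated with two hypothetical functions, caller_fn and callee_fn. Note that secret is local to the caller and is not lexically visible from the callee.

```r
callee_fn <- function() dynGet("secret", ifnotfound = "not found")

caller_fn <- function() {
    secret <- 42     # local to the caller's frame
    callee_fn()      # dynGet() finds it by walking up the call stack
}

from_caller <- caller_fn()
direct      <- callee_fn()   # no calling frame defines secret
```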
For get, the object found. If no object is found an error results.
For mget, a named list of objects (found or specified via ifnotfound).
The reverse (or “inverse”) of a <- get(nam) is assign(nam, a), assigning a to name nam.
inherits = TRUE is the default for get in R but not for S where it had a different meaning.
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
exists for checking whether an object exists; get0 for an efficient way of both checking existence and getting an object.
assign, the inverse of get(), see above.
Use getAnywhere for searching for an object anywhere, including in other namespaces, and getFromNamespace to find an object in a specific namespace.
get("%o%")
## test mget
e1 <- new.env()
mget(letters, e1, ifnotfound = as.list(LETTERS))
This function allows us to query the set of routines in a DLL that are registered with R to enhance dynamic lookup, error handling when calling native routines, and potentially security in the future. This function provides a description of each of the registered routines in the DLL for the different interfaces, i.e. .C, .Call, .Fortran and .External.
getDLLRegisteredRoutines(dll, addNames = TRUE)
dll |
a character string or The The |
addNames |
a logical value. If this is |
This takes the registration information after it has been registered and processed by the R internals. In other words, it uses the extended information.
There is a print method for the class, which prints only the types which have registered routines.
A list of class "DLLRegisteredRoutines" with four elements corresponding to the routines registered for the .C, .Call, .Fortran and .External interfaces. Each is a list (of class "NativeRoutineList") with as many elements as there were routines registered for that interface.
Each element identifies a routine and is an object of class "NativeSymbolInfo". An object of this class has the following fields:
name |
the registered name of the routine (not necessarily the name in the C code). |
address |
the memory address of the routine as resolved in the
loaded DLL. This may be |
dll |
an object of class |
numParameters |
the number of arguments the native routine is to be called with. |
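The structure of the result can be inspected along the following lines. This is a sketch that assumes the "base" entry of getLoadedDLLs() is present, as it is in the Examples for this function.

```r
dlls <- getLoadedDLLs()
routines <- getDLLRegisteredRoutines(dlls[["base"]])

class(routines)             # "DLLRegisteredRoutines"
names(routines)             # one element per native interface
sapply(routines, length)    # number of routines registered for each
```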
Duncan Temple Lang
‘Writing R Extensions’ manual for symbol registration.
Duncan Temple Lang (2001). “In Search of C/C++ & FORTRAN Routines”. R News, 1(3), 20–23. https://www.r-project.org/doc/Rnews/Rnews_2001-3.pdf.
getLoadedDLLs, getNativeSymbolInfo for information on the entry points listed.
dlls <- getLoadedDLLs()
getDLLRegisteredRoutines(dlls[["base"]])
getDLLRegisteredRoutines("stats")
This function provides a way to get a list of all the DLLs (see dyn.load) that are currently loaded in the R session.
getLoadedDLLs()
This queries the internal table that manages the DLLs.
An object of class "DLLInfoList" which is a list with an element corresponding to each DLL that is currently loaded in the session. Each element is an object of class "DLLInfo" which has the following entries.
name |
the abbreviated name. |
path |
the fully qualified name of the loaded DLL. |
dynamicLookup |
a logical value indicating whether R uses only the registration information to resolve symbols or whether it searches the entire symbol table of the DLL. |
handle |
a reference to the C-level data structure that
provides access to the contents of the DLL.
This is an object of class |
Note that the class DLLInfo has a method for $ which can be used to resolve native symbols within that DLL. Therefore, one must access the R-level elements described above using [[, e.g. x[["name"]] or x[["handle"]].
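A short sketch of accessing the R-level entries with [[ as recommended here, again assuming the "base" entry is present in the list:

```r
dlls <- getLoadedDLLs()
base_dll <- dlls[["base"]]

class(base_dll)                 # "DLLInfo"
dll_name <- base_dll[["name"]]  # R-level entry, accessed with [[ not $
dll_path <- base_dll[["path"]]
```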
We are starting to use the handle elements in the DLL object to resolve symbols more directly in R.
Duncan Temple Lang
getDLLRegisteredRoutines, getNativeSymbolInfo
getLoadedDLLs()
utils::tail(getLoadedDLLs(), 2) # the last 2 loaded ones, still a DLLInfoList
This finds and returns a description of one or more dynamically loaded or ‘exported’ built-in native symbols. For each name, it returns information about the name of the symbol, the library in which it is located and, if available, the number of arguments it expects and by which interface it should be called (i.e., .Call, .C, .Fortran, or .External). Additionally, it returns the address of the symbol and this can be passed to other C routines. Specifically, this provides a way to explicitly share symbols between different dynamically loaded package libraries. Also, it provides a way to query where symbols were resolved, and aids diagnosing strange behavior associated with dynamic resolution.
getNativeSymbolInfo(name, PACKAGE, unlist = TRUE, withRegistrationInfo = FALSE)
name |
the name(s) of the native symbol(s). |
PACKAGE |
an optional argument that specifies to which
DLL to restrict the search for this symbol. If this is
|
unlist |
a logical value which controls how the result is
returned if the function is called with the name of a single symbol.
If |
withRegistrationInfo |
a logical value indicating whether, if
|
This uses the same mechanism for resolving symbols as is used in all the native interfaces (.Call, etc.). If the symbol has been explicitly registered by the DLL in which it is contained, information about the number of arguments and the interface by which it should be called will be returned. Otherwise, a generic native symbol object is returned.
Generally, a list of NativeSymbolInfo elements whose elements can be indexed by the elements of name in the call. Each NativeSymbolInfo object is a list containing the following elements:
name |
the name of the symbol, as given by the
|
address |
if |
dll |
a list containing 3 elements:
|
If the routine was explicitly registered by the dynamically loaded library, the list contains a fourth field
numParameters |
the number of arguments that should be passed in a call to this routine. |
Additionally, the list will have an additional class, being CRoutine, CallRoutine, FortranRoutine or ExternalRoutine corresponding to the R interface by which it should be invoked.
If any of the symbols is not found, an error is raised.
If name contains only one symbol name and unlist is TRUE, then the single NativeSymbolInfo is returned rather than the list containing that one element.
The third element of the NativeSymbolInfo objects was renamed from package to dll in R version 3.6.0, for consistency with the names of the NativeSymbolInfo objects returned by getDLLRegisteredRoutines().
One motivation for accessing this reflectance information is to be able to pass native routines to C routines as function pointers in C. This allows us to treat native routines and R functions in a similar manner, such as when passing an R function to C code that makes callbacks to that function at different points in its computation (e.g., nls). Additionally, we can resolve the symbol just once and avoid resolving it repeatedly or using the internal cache.
Duncan Temple Lang
For information about registering native routines, see “In Search of C/C++ & FORTRAN Routines”, R-News, volume 1, number 3, 2001, p20–23 (https://www.r-project.org/doc/Rnews/Rnews_2001-3.pdf).
getDLLRegisteredRoutines, is.loaded, .C, .Fortran, .External, .Call, dyn.load.
Translation of text messages, typically from calls to stop(), warning(), or message(), happens when Native Language Support (NLS) was enabled in this build of R, as it almost always is; see also the bindtextdomain() example.
The functions documented here are the low level building blocks used explicitly or implicitly in almost all such message producing calls and they attempt to translate character vectors or set where the translations are to be found.
gettext(..., domain = NULL, trim = TRUE)
ngettext(n, msg1, msg2, domain = NULL)
bindtextdomain(domain, dirname = NULL)
Sys.setLanguage(lang, unset = "en")
... |
one or more character vectors. |
trim |
logical indicating if the white space trimming in
|
domain |
the ‘domain’ for the translation, a |
n |
a non-negative integer. |
msg1 |
the message to be used in English for |
msg2 |
the message to be used in English for |
dirname |
the directory in which to find translated message catalogs for the domain. |
lang |
a |
unset |
a string, specifying the default language assumed to be
current in the case |
If domain is NULL (the default) in gettext or ngettext, the domain is inferred. If gettext or ngettext is called from a function in the namespace of package pkg, including when called via stop(), warning(), or message() from the function, or, say, evaluated as if called from that namespace (see the evalq() example), the domain is set to "R-pkg". Otherwise there is no default domain and messages are not translated.
Setting domain = NA in gettext or ngettext suppresses any translation.
"" does not match any domain. In gettext or ngettext, domain = "" is effectively the same as domain = NA.
If the domain is found, each character string is offered for translation, and replaced by its translation into the current language if one is found.
The language to be used for message translation is determined by your OS default and/or the locale setting at R's startup, see Sys.getlocale(), and notably the LANGUAGE environment variable, and also Sys.setLanguage() here.
Conventionally the domain for R warning/error messages in package pkg is "R-pkg", and that for C-level messages is "pkg".
For gettext, when trim is true as by default, leading and trailing whitespace is ignored (“trimmed”) when looking for the translation.
ngettext is used where the message needs to vary by a single integer. Translating such messages is subject to very specific rules for different languages: see the GNU Gettext Manual. The string will often contain a single instance of %d to be used in sprintf. If English is used, msg1 is returned if n == 1 and msg2 in all other cases.
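The English rule can be seen without depending on the session language by passing domain = NA, which suppresses translation. The helper msg below is hypothetical.

```r
# domain = NA suppresses translation, so the English rule applies directly.
msg <- function(n)
    sprintf(ngettext(n, "%d file was read", "%d files were read",
                     domain = NA), n)

msg(0)   # plural form: msg2 is used for every n other than 1
msg(1)   # singular form: msg1
msg(2)   # plural form
```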
bindtextdomain is typically a wrapper for the C function of the same name: your system may have a man page for it. With a non-NULL dirname it specifies where to look for message catalogues: with dirname = NULL it returns the current location. If NLS is not enabled, bindtextdomain(*,*) returns NULL.
The special case bindtextdomain(NULL) calls C level textdomain(textdomain(NULL)) for the purpose of flushing (i.e., emptying) the cache of already translated strings; it returns TRUE when NLS is enabled.
The utility Sys.setLanguage(lang) combines setting the LANGUAGE environment variable with flushing the translation cache by bindtextdomain(NULL).
For gettext, a character vector, one element per string in the ... arguments. If translation is not enabled or no domain is found or no translation is found in that domain, the original strings are returned.
For ngettext, a character string.
For bindtextdomain, a character string giving the current base directory, or NULL if setting it failed.
For Sys.setLanguage(), the previous LANGUAGE setting with attribute attr(*, "ok"), a logical indicating success. Note that currently, a non-existing language lang is still set, and then no translation will happen, without any message.
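A typical save-and-restore pattern around Sys.setLanguage() might look like the following sketch. It assumes an R version that provides Sys.setLanguage(); whether any messages are actually translated depends on NLS support and the installed catalogues.

```r
old <- Sys.setLanguage("de")   # returns the previous LANGUAGE setting
attr(old, "ok")                # logical success indicator

# ... code whose messages should appear in the chosen language ...

Sys.setLanguage(old)           # restore the previous setting
```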
stop and warning make use of gettext to translate messages.
xgettext (package tools) for extracting translatable strings from R source files.
bindtextdomain("R") # non-null if and only if NLS is enabled
for(n in 0:3)
    print(sprintf(ngettext(n, "%d variable has missing values",
                              "%d variables have missing values"),
                  n))
## Not run:
## for translation, those strings should appear in R-pkg.pot as
msgid "%d variable has missing values"
msgid_plural "%d variables have missing values"
msgstr[0] ""
msgstr[1] ""
## End(Not run)
miss <- "One only" # this line, or the next for the ngettext() below
miss <- c("one", "or", "another")
cat(ngettext(length(miss), "variable", "variables"),
    paste(sQuote(miss), collapse = ", "),
    ngettext(length(miss), "contains", "contain"), "missing values\n")
## better for translators would be to use
cat(sprintf(ngettext(length(miss),
                     "variable %s contains missing values\n",
                     "variables %s contain missing values\n"),
            paste(sQuote(miss), collapse = ", ")))
thisLang <- Sys.getenv("LANGUAGE", unset = NA) # so we can reset it
if(is.na(thisLang) || !nzchar(thisLang)) thisLang <- "en" # "factory" default
enT <- "empty model supplied"
Sys.setenv(LANGUAGE = "de") # may not always 'work'
gettext(enT, domain="R-stats")# "leeres Modell angegeben" (if translation works)
tget <- function() gettext(enT)
tget() # not translated as fn tget() is not from "stats" pkg/namespace
evalq(function() gettext(enT), asNamespace("stats"))() # *is* translated
## Sys.setLanguage() -- typical usage --
Sys.setLanguage("en") -> oldSet # does set LANGUAGE env.var
errMsg <- function(expr) tryCatch(expr, error=conditionMessage)
(errMsg(1 + "2") -> err)
Sys.setLanguage("fr")
errMsg(1 + "2")
Sys.setLanguage("de")
errMsg(1 + "2")
## Usually, you would reset the language to "previous" via
Sys.setLanguage(oldSet)
## A show off of translations -- platform (font etc) dependent:
## The translation languages available for "base" R in this version of R:
if(capabilities("NLS")) withAutoprint({
  langs <- list.files(bindtextdomain("R"),
                      pattern = "^[a-z]{2}(_[A-Z]{2}|@quot)?$")
  langs
  txts <- sapply(setNames(,langs),
                 function(lang) {
                     Sys.setLanguage(lang)
                     gettext("incompatible dimensions", domain="R-stats") })
  cbind(txts)
  (nTrans <- length(unique(txts)))
  (not_translated <- names(txts[txts == txts[["en"]]]))
})
## Here, we reset to the *original* setting before the full example started:
if(nzchar(thisLang)) { ## reset to previous and check
  Sys.setLanguage(thisLang)
  stopifnot(identical(errMsg(1 + "2"), err))
} # else staying at 'de' ..
getwd returns an absolute filepath representing the current working directory of the R process; setwd(dir) is used to set the working directory to dir.
getwd()
setwd(dir)
dir |
A character string: tilde expansion will be done. |
See files for how file paths with marked encodings are interpreted.
getwd
returns a character string or NULL
if the working
directory is not available.
On Windows the path returned will use /
as the path separator
and be encoded in UTF-8. The path will not have a trailing /
unless it is the root directory (of a drive or share on Windows).
setwd
returns the current directory before the change,
invisibly and with the same conventions as getwd
. It will give
an error if it does not succeed (including if it is not implemented).
Note that the return value is said to be an absolute filepath: there can be more than one representation of the path to a directory and on some OSes the value returned can differ after changing directories and changing back to the same directory (for example if symbolic links have been traversed).
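Because setwd returns the previous directory invisibly, a function can change directory temporarily and restore it on exit. The following is a minimal sketch of that idiom; with_dir is a hypothetical helper name used only for illustration and is not part of base R.

```r
# Hypothetical helper: evaluate 'expr' with 'dir' as the working directory.
with_dir <- function(dir, expr) {
  old <- setwd(dir)      # setwd() returns the previous directory, invisibly
  on.exit(setwd(old))    # restore it even if evaluating 'expr' fails
  expr                   # lazy evaluation: 'expr' is evaluated here, inside 'dir'
}

before <- getwd()
inside <- with_dir(tempdir(), getwd())
after  <- getwd()
```

Comparing paths through normalizePath avoids false mismatches when the same directory has more than one representation.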
list.files
for the contents of a directory.
normalizePath
for a ‘canonical’ path name.
(WD <- getwd())
if (!is.null(WD)) setwd(WD)
Generate factors by specifying the pattern of their levels.
gl(n, k, length = n*k, labels = seq_len(n), ordered = FALSE)
n |
an integer giving the number of levels. |
k |
an integer giving the number of replications. |
length |
an integer giving the length of the result. |
labels |
an optional vector of labels for the resulting factor levels. |
ordered |
a logical indicating whether the result should be ordered or not. |
The result has levels from 1 to n with each value replicated in groups of length k out to a total length of length.
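The pattern can be reproduced by hand with rep and rep_len, which may help when reasoning about what a given call produces. This is a sketch for illustration only; the variable names are arbitrary.

```r
n <- 3; k <- 2; len <- 10

by_gl <- gl(n, k, length = len)

# Same integer codes built by hand: each level repeated k times,
# then recycled (or truncated) to the requested total length.
by_hand <- rep_len(rep(seq_len(n), each = k), len)

as.integer(by_gl)   # 1 1 2 2 3 3 1 1 2 2
levels(by_gl)       # "1" "2" "3"
```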
gl
is modelled on the GLIM function of the same name.
The underlying factor()
.
## First control, then treatment:
gl(2, 8, labels = c("Control", "Treat"))
## 20 alternating 1s and 2s
gl(2, 1, 20)
## alternating pairs of 1s and 2s
gl(2, 2, 20)
grep
, grepl
, regexpr
, gregexpr
,
regexec
and gregexec
search for matches to argument
pattern
within each element of a character vector: they differ in
the format of and amount of detail in the results.
sub
and gsub
perform replacement of the first and all
matches respectively.
grep(pattern, x, ignore.case = FALSE, perl = FALSE, value = FALSE,
     fixed = FALSE, useBytes = FALSE, invert = FALSE)

grepl(pattern, x, ignore.case = FALSE, perl = FALSE,
      fixed = FALSE, useBytes = FALSE)

sub(pattern, replacement, x, ignore.case = FALSE, perl = FALSE,
    fixed = FALSE, useBytes = FALSE)

gsub(pattern, replacement, x, ignore.case = FALSE, perl = FALSE,
     fixed = FALSE, useBytes = FALSE)

regexpr(pattern, text, ignore.case = FALSE, perl = FALSE,
        fixed = FALSE, useBytes = FALSE)

gregexpr(pattern, text, ignore.case = FALSE, perl = FALSE,
         fixed = FALSE, useBytes = FALSE)

regexec(pattern, text, ignore.case = FALSE, perl = FALSE,
        fixed = FALSE, useBytes = FALSE)

gregexec(pattern, text, ignore.case = FALSE, perl = FALSE,
         fixed = FALSE, useBytes = FALSE)
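pattern |
character string containing a regular expression (or character string for fixed = TRUE) to be matched in the given character vector. Coerced by as.character to a character string if possible. |
x , text |
a character vector where matches are sought, or an object which can be coerced by as.character to a character vector. |
ignore.case |
if FALSE, the pattern matching is case sensitive and if TRUE, case is ignored during matching. |
perl |
logical. Should Perl-compatible regexps be used? |
value |
if FALSE, a vector containing the (integer) indices of the matches determined by grep is returned, and if TRUE, a vector containing the matching elements themselves is returned. |
fixed |
logical. If TRUE, pattern is a string to be matched as is. |
useBytes |
logical. If TRUE the matching is done byte-by-byte rather than character-by-character. See ‘Details’. |
invert |
logical. If TRUE return indices or values for elements that do not match. |
replacement |
a replacement for matched pattern in sub and gsub. Coerced to character if possible. |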
Arguments which should be character strings or character vectors are coerced to character if possible.
Each of these functions operates in one of three modes:
fixed = TRUE
: use exact matching.
perl = TRUE
: use Perl-style regular expressions.
fixed = FALSE, perl = FALSE
: use POSIX 1003.2
extended regular expressions (the default).
See the help pages on regular expression for details of the different types of regular expressions.
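The three modes can give different answers for the same inputs. The sketch below uses a made-up vector x to show a literal match, a default (POSIX extended) match, and a Perl-only construct (a lookahead).

```r
x <- c("a.b", "axb", "a+b")

# fixed = TRUE: the pattern is matched literally, so "." is just a dot
fixed_hits <- grep("a.b", x, fixed = TRUE)       # 1

# default mode: "." is a metacharacter matching any single character
regex_hits <- grep("a.b", x)                     # 1 2 3

# perl = TRUE: Perl-only syntax such as the lookahead (?=...) is available
perl_hits  <- grep("a(?=\\+)", x, perl = TRUE)   # 3
```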
The two *sub
functions differ only in that sub
replaces
only the first occurrence of a pattern
whereas gsub
replaces all occurrences. If replacement
contains
backreferences which are not defined in pattern
the result is
undefined (but most often the backreference is taken to be ""
).
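A short illustration of the difference, including backreferences to parenthesized subexpressions; the input strings are arbitrary.

```r
s <- "foo boo"
first_only <- sub("o", "0", s)     # "f0o boo": only the first match is replaced
all_hits   <- gsub("o", "0", s)    # "f00 b00": every match is replaced

# "\\1" and "\\2" refer to the first and second parenthesized groups
swap_first <- sub("(a)(b)", "\\2\\1", "abab")    # "baab"
swap_all   <- gsub("(a)(b)", "\\2\\1", "abab")   # "baba"
```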
For regexpr
, gregexpr
, regexec
and gregexec
it is an error for pattern
to be NA
, otherwise NA
is permitted and gives an NA
match.
Both grep
and grepl
take missing values in x
as
not matching a non-missing pattern
.
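The treatment of missing values described above can be checked directly. This sketch uses a small made-up vector and wraps the regexpr call in tryCatch, since an NA pattern is documented to be an error there.

```r
x <- c("abc", NA, "xyz")

hit    <- grepl("a", x)   # NA in x is treated as not matching
idx    <- grep("a", x)    # so only element 1 is reported
na_pat <- grepl(NA, x)    # an NA pattern gives NA results

# For regexpr (and gregexpr, regexec, gregexec) an NA pattern is an error
rx_err <- tryCatch(regexpr(NA, x), error = function(e) conditionMessage(e))
```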
The main effect of useBytes = TRUE
is to avoid errors/warnings
about invalid inputs and spurious matches in multibyte locales, but
for regexpr
it changes the interpretation of the output. It
inhibits the conversion of inputs with marked encodings, and is forced
if any input is found which is marked as "bytes"
(see
Encoding
).
Caseless matching does not make much sense for bytes in a multibyte
locale, and you should expect it only to work for ASCII characters if
useBytes = TRUE
.
regexpr
and gregexpr
with perl = TRUE
allow
Python-style named captures, but not for long vector inputs.
Invalid inputs in the current locale are warned about up to 5 times.
Caseless matching with perl = TRUE
for non-ASCII characters
depends on the PCRE library being compiled with ‘Unicode
property support’, which PCRE2 is by default.
grep(value = FALSE)
returns a vector of the indices
of the elements of x
that yielded a match (or not, for
invert = TRUE
). This will be an integer vector unless the input
is a long vector, when it will be a double vector.
grep(value = TRUE)
returns a character vector containing the
selected elements of x
(after coercion, preserving names but no
other attributes).
grepl
returns a logical vector (match or not for each element of
x
).
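The three return forms side by side, as a sketch on a small named vector (the names and values are made up for illustration):

```r
x <- c(first = "apple", second = "banana", third = "cherry")

idx   <- grep("an", x)                  # index of the matching element
vals  <- grep("an", x, value = TRUE)    # the element itself, names preserved
lgl   <- grepl("an", x)                 # one logical per element of x
other <- grep("an", x, invert = TRUE)   # indices of the non-matching elements
```

The logical form is usually the most convenient for subsetting, for example x[grepl("an", x)].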
sub
and gsub
return a character vector of the same length
and with the same attributes as x
(after possible coercion to
character). Elements of character vectors x
which are not
substituted will be returned unchanged (including any declared encoding if
useBytes = FALSE
). If useBytes = FALSE
a non-ASCII
substituted result will often be in UTF-8 with a marked encoding (e.g., if
there is a UTF-8 input, and in a multibyte locale unless fixed =
TRUE
). Such strings can be re-encoded by enc2native
. If
any of the inputs is marked as "bytes"
, elements of character
vectors x
which are substituted will be returned marked as
"bytes"
, but the encoding flag on elements not substituted is
unspecified (it may be the original or "bytes"). If none of the inputs is
marked as "bytes"
, but useBytes = TRUE
is given explicitly,
the encoding flag is unspecified even on the substituted elements (it may
be "bytes"
or "unknown"
, possibly invalid in the current
encoding). Mixed use of "bytes"
and other marked encodings is
discouraged, but if still desired one may use iconv
to
re-encode the result e.g. to UTF-8 with suitably substituted invalid
bytes.
regexpr returns an integer vector of the same length as text giving the starting position of the first match or -1 if there is none, with attribute "match.length", an integer vector giving the length of the matched text (or -1 for no match). The match positions and lengths are in characters unless useBytes = TRUE is used, when they are in bytes (as they are for ASCII-only matching: in either case an attribute useBytes with value TRUE is set on the result). If named capture is used there are further attributes "capture.start", "capture.length" and "capture.names".
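A sketch of how these values look in practice, using one string with a non-ASCII character (written as a \u escape, so it is marked as UTF-8) and one string with no match. The byte position shown assumes the first string is stored as UTF-8, where "é" occupies two bytes.

```r
x <- c("caf\u00e9 au lait", "tea")

m <- regexpr("au", x)
as.integer(m)             # 6 -1 : start in characters, -1 for no match
attr(m, "match.length")   # 2 -1
regmatches(x, m)          # "au" : only elements with a match are returned

# With useBytes = TRUE positions are counted in bytes instead
mb <- regexpr("au", x, useBytes = TRUE)
as.integer(mb)            # 7 -1
```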
gregexpr
returns a list of the same length as text
each
element of which is of the same form as the return value for
regexpr
, except that the starting positions of every (disjoint)
match are given.
regexec returns a list of the same length as text each element of which is either -1 if there is no match, or a sequence of integers with the starting positions of the match and all substrings corresponding to parenthesized subexpressions of pattern, with attribute "match.length" a vector giving the lengths of the matches (or -1 for no match). The interpretation of positions and length and the attributes follows regexpr.
gregexec
returns the same as regexec
, except that to
accommodate multiple matches per element of text
, the integer
sequences for each match are made into columns of a matrix, with one
matrix per element of text
with matches.
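The difference between the regexec and gregexec results is easiest to see after passing them to regmatches. A minimal sketch with a made-up key=value string and one element that does not match:

```r
s   <- c("k1=v1;k2=v2", "none")
pat <- "([a-z0-9]+)=([a-z0-9]+)"

# regexec: first match only; full match followed by each subexpression
first <- regmatches(s, regexec(pat, s))
first[[1]]    # "k1=v1" "k1" "v1"
first[[2]]    # character(0)

# gregexec: one column per match, one row per (sub)expression
all_m <- regmatches(s, gregexec(pat, s))
dim(all_m[[1]])   # 3 2
```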
Where matching failed because of resource limits (especially for
perl = TRUE
) this is regarded as a non-match, usually with a
warning.
The POSIX 1003.2 mode of gsub
and gregexpr
does not
work correctly with repeated word-boundaries (e.g.,
pattern = "\b"
).
Use perl = TRUE
for such matches (but that may not
work as expected with non-ASCII inputs, as the meaning of
‘word’ is system-dependent).
If you are doing a lot of regular expression matching, including on
very long strings, you will want to consider the options used.
Generally perl = TRUE
will be faster than the default regular
expression engine, and fixed = TRUE
faster still (especially
when each pattern is matched only a few times).
If you are working with texts with non-ASCII characters, which can be easily turned into ASCII (e.g. by substituting fancy quotes), doing so is likely to improve performance.
If you are working in a single-byte locale (though not common since R 4.2) and have marked UTF-8 strings that are representable in that locale, convert them first as just one UTF-8 string will force all the matching to be done in Unicode, which attracts a penalty of around 3× for the default POSIX 1003.2 mode.
While useBytes = TRUE
will improve performance further, because the
strings will not be checked before matching and the actual matching will
be faster, it can produce unexpected results so is best avoided. With
fixed = TRUE
and useBytes = FALSE
, optimizations are in
place that take advantage of byte-based matching working for such patterns
in UTF-8. With useBytes = TRUE
, character ranges, wildcards,
and other regular expression patterns may produce unexpected results.
PCRE-based matching by default used to put additional effort into
‘studying’ the compiled pattern when x
/text
has
length 10 or more. That study may use the PCRE JIT compiler on
platforms where it is available (see pcre_config
). As
from PCRE2 (PCRE version >= 10.00 as reported by
extSoftVersion
), there is no study phase, but the
patterns are optimized automatically when possible, and PCRE JIT is
used when enabled. The details are controlled by
options
PCRE_study
and PCRE_use_JIT
.
(Some timing comparisons can be seen by running file
‘tests/PCRE.R’ in the R sources (and perhaps installed).)
People working with PCRE and very long strings can adjust the maximum
size of the JIT stack by setting environment variable
R_PCRE_JIT_STACK_MAXSIZE before JIT is used to a value between
1
and 1000
in MB: the default is 64
. When JIT is
not used with PCRE version < 10.30 (that is with PCRE1 and old
versions of PCRE2), it might also be wise to set the option
PCRE_limit_recursion
.
Aspects will be platform-dependent as well as locale-dependent: for
example the implementation of character classes (except
[:digit:]
and [:xdigit:]
). One can expect results to be
consistent for ASCII inputs and when working in UTF-8 mode (when most
platforms will use Unicode character tables, although those are
updated frequently and subject to some degree of interpretation – is
a circled capital letter alphabetic or a symbol?). However, results
in 8-bit encodings can differ considerably between platforms, modes
and from the UTF-8 versions.
The C code for POSIX-style regular expression matching has changed over the years. As from R 2.10.0 (Oct 2009) the TRE library of Ville Laurikari (https://github.com/laurikari/tre) is used. The POSIX standard does give some room for interpretation, especially in the handling of invalid regular expressions and the collation of character ranges, so the results will have changed slightly over the years.
For Perl-style matching PCRE2 or PCRE (https://www.pcre.org) is used: again the results may depend (slightly) on the version of PCRE in use.
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988)
The New S Language.
Wadsworth & Brooks/Cole (grep
)
regular expression (aka regexp
) for the details
of the pattern specification.
regmatches
for extracting matched substrings based on
the results of regexpr
, gregexpr
and regexec
.
glob2rx
to turn wildcard matches into regular expressions.
agrep
for approximate matching.
charmatch
, pmatch
for partial matching,
match
for matching to whole strings,
startsWith
for matching of initial parts of strings.
tolower
, toupper
and chartr
for character translations.
apropos
uses regexps and has more examples.
grepRaw
for matching raw vectors.
Options PCRE_limit_recursion
, PCRE_study
and
PCRE_use_JIT
.
extSoftVersion
for the versions of regex and PCRE
libraries in use, pcre_config
for more details for
PCRE.
grep("[a-z]", letters)

txt <- c("arm","foot","lefroo", "bafoobar")
if(length(i <- grep("foo", txt)))
   cat("'foo' appears at least once in\n\t", txt, "\n")
i # 2 and 4
txt[i]

## Double all 'a' or 'b's; "\" must be escaped, i.e., 'doubled'
gsub("([ab])", "\\1_\\1_", "abc and ABC")

txt <- c("The", "licenses", "for", "most", "software", "are",
  "designed", "to", "take", "away", "your", "freedom",
  "to", "share", "and", "change", "it.",
  "", "By", "contrast,", "the", "GNU", "General", "Public", "License",
  "is", "intended", "to", "guarantee", "your", "freedom", "to",
  "share", "and", "change", "free", "software", "--",
  "to", "make", "sure", "the", "software", "is",
  "free", "for", "all", "its", "users")
( i <- grep("[gu]", txt) ) # indices
stopifnot( txt[i] == grep("[gu]", txt, value = TRUE) )

## Note that for some implementations character ranges are
## locale-dependent (but not currently). Then [b-e] in locales such as
## en_US may include B as the collation order is aAbBcCdDe ...
(ot <- sub("[b-e]",".", txt))
txt[ot != gsub("[b-e]",".", txt)]#- gsub does "global" substitution
## In caseless matching, ranges include both cases:
a <- grep("[b-e]", txt, value = TRUE)
b <- grep("[b-e]", txt, ignore.case = TRUE, value = TRUE)
setdiff(b, a)

txt[gsub("g","#", txt) !=
    gsub("g","#", txt, ignore.case = TRUE)] # the "G" words

regexpr("en", txt)

gregexpr("e", txt)

## Using grepl() for filtering
## Find functions with argument names matching "warn":
findArgs <- function(env, pattern) {
  nms <- ls(envir = as.environment(env))
  nms <- nms[is.na(match(nms, c("F","T")))] # <-- work around "checking hack"
  aa <- sapply(nms, function(.) { o <- get(.)
               if(is.function(o)) names(formals(o)) })
  iw <- sapply(aa, function(a) any(grepl(pattern, a, ignore.case=TRUE)))
  aa[iw]
}
findArgs("package:base", "warn")

## trim trailing white space
str <- "Now is the time "
sub(" +$", "", str) ## spaces only
## what is considered 'white space' depends on the locale.
sub("[[:space:]]+$", "", str) ## white space, POSIX-style
## what PCRE considered white space changed in version 8.34: see ?regex
sub("\\s+$", "", str, perl = TRUE) ## PCRE-style white space

## capitalizing
txt <- "a test of capitalizing"
gsub("(\\w)(\\w*)", "\\U\\1\\L\\2", txt, perl=TRUE)
gsub("\\b(\\w)", "\\U\\1", txt, perl=TRUE)

txt2 <- "useRs may fly into JFK or laGuardia"
gsub("(\\w)(\\w*)(\\w)", "\\U\\1\\E\\2\\U\\3", txt2, perl=TRUE)
sub("(\\w)(\\w*)(\\w)", "\\U\\1\\E\\2\\U\\3", txt2, perl=TRUE)

## named capture
notables <- c(" Ben Franklin and Jefferson Davis",
              "\tMillard Fillmore")
# name groups 'first' and 'last'
name.rex <- "(?<first>[[:upper:]][[:lower:]]+) (?<last>[[:upper:]][[:lower:]]+)"
(parsed <- regexpr(name.rex, notables, perl = TRUE))
gregexpr(name.rex, notables, perl = TRUE)[[2]]
parse.one <- function(res, result) {
  m <- do.call(rbind, lapply(seq_along(res), function(i) {
    if(result[i] == -1) return("")
    st <- attr(result, "capture.start")[i, ]
    substring(res[i], st, st + attr(result, "capture.length")[i, ] - 1)
  }))
  colnames(m) <- attr(result, "capture.names")
  m
}
parse.one(notables, parsed)

## Decompose a URL into its components.
## Example by LT (http://www.cs.uiowa.edu/~luke/R/regexp.html).
x <- "http://stat.umn.edu:80/xyz"
m <- regexec("^(([^:]+)://)?([^:/]+)(:([0-9]+))?(/.*)", x)
m
regmatches(x, m)
## Element 3 is the protocol, 4 is the host, 6 is the port, and 7
## is the path. We can use this to make a function for extracting the
## parts of a URL:
URL_parts <- function(x) {
    m <- regexec("^(([^:]+)://)?([^:/]+)(:([0-9]+))?(/.*)", x)
    parts <- do.call(rbind,
                     lapply(regmatches(x, m), `[`, c(3L, 4L, 6L, 7L)))
    colnames(parts) <- c("protocol","host","port","path")
    parts
}
URL_parts(x)

## gregexec() may match multiple times within a single string.
pattern <- "([[:alpha:]]+)([[:digit:]]+)"
s <- "Test: A1 BC23 DEF456"
m <- gregexec(pattern, s)
m
regmatches(s, m)

## Before gregexec() was implemented, one could emulate it by running
## regexec() on the regmatches obtained via gregexpr(). E.g.:
lapply(regmatches(s, gregexpr(pattern, s)),
       function(e) regmatches(e, regexec(pattern, e)))
grepRaw
searches for substring pattern
matches within a
raw vector x
.
grepRaw(pattern, x, offset = 1L, ignore.case = FALSE,
        value = FALSE, fixed = FALSE, all = FALSE, invert = FALSE)
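pattern |
raw vector containing a regular expression (or fixed pattern for fixed = TRUE) to be matched in the given raw vector. Coerced by charToRaw if possible. |
x |
a raw vector where matches are sought, or an object which can be coerced by charToRaw to a raw vector. |
ignore.case |
if FALSE, the pattern matching is case sensitive and if TRUE, case is ignored during matching. |
offset |
an integer specifying the offset from which the search should start. Must be positive. The beginning of line is defined to be at that offset so "^" will match there. |
value |
logical. Determines the return value: see ‘Value’. |
fixed |
logical. If TRUE, pattern is a pattern to be matched as is. |
all |
logical. If TRUE all matches are returned, otherwise just the first one. |
invert |
logical. If TRUE return the complement of the match as detailed in ‘Details’. |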
Unlike grep, grepRaw seeks matching patterns within the raw vector x. This has implications especially in the all = TRUE case, e.g., patterns matching empty strings are inherently infinite and thus may lead to unexpected results.
The argument invert
is interpreted as asking to return the
complement of the match, which is only meaningful for value =
TRUE
. Argument offset
determines the start of the search, not
of the complement. Note that invert = TRUE
with all =
TRUE
will split x
into pieces delimited by the pattern
including leading and trailing empty strings (consequently the use of
regular expressions with "^"
or "$"
in that case may
lead to less intuitive results).
Some combinations of arguments such as fixed = TRUE
with
value = TRUE
are supported but are less meaningful.
grepRaw(value = FALSE)
returns an integer vector of the offsets
at which matches have occurred. If all = FALSE
then it will be
either of length zero (no match) or length one (first matching
position).
grepRaw(value = TRUE, all = FALSE)
returns a raw vector which
is either empty (no match) or the matched part of x
.
grepRaw(value = TRUE, all = TRUE)
returns a (potentially
empty) list of raw vectors corresponding to the matched parts.
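The return forms described above, sketched on a small comma-separated input. The last call uses invert = TRUE with all = TRUE to split x at the pattern; rawToChar is used only to display the pieces as text.

```r
x <- charToRaw("a,b,c")

first_pos <- grepRaw(",", x)               # 2: offset of the first match
all_pos   <- grepRaw(",", x, all = TRUE)   # 2 4: offsets of all matches

# value = TRUE returns the matched bytes rather than their offsets
first_val <- grepRaw("[a-z]", x, value = TRUE)

# invert = TRUE with all = TRUE returns the pieces between the matches
pieces <- grepRaw(",", x, all = TRUE, value = TRUE, invert = TRUE)
vapply(pieces, rawToChar, "")              # "a" "b" "c"
```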
The TRE library of Ville Laurikari (https://github.com/laurikari/tre/)
is used except for fixed = TRUE
.
regular expression (aka regexp
) for the details
of the pattern specification.
grep
for matching character vectors.
grepRaw("no match", "textText") # integer(0): no match
grepRaw("adf", "adadfadfdfadadf") # 3 - the first match
grepRaw("adf", "adadfadfdfadadf", all=TRUE, fixed=TRUE)
## [1] 3 6 13 -- three matches
Group generic methods can be defined for the following pre-specified groups of
functions, Math
, Ops
, matrixOps
, Summary
and Complex
.
(There are no objects of these names in base R, but there are in the
methods package, not yet for matrixOps
.)
A method defined for an individual member of the group takes precedence over a method defined for the group as a whole.
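The precedence rule can be seen with a toy class. In this sketch "demo" is a hypothetical class name and both methods simply return a label saying which one was called.

```r
# Group method: used for any member of Math without a specific method
Math.demo <- function(x, ...) "group method"
# Specific method for one member of the group
sqrt.demo <- function(x) "specific method"

obj <- structure(4, class = "demo")

sqrt(obj)   # "specific method": the individual method wins
abs(obj)    # "group method": falls back to Math.demo
```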
## S3 methods for group generics have prototypes:
Math(x, ...)
Ops(e1, e2)
Complex(z)
Summary(..., na.rm = FALSE)
matrixOps(x, y)
x , y , z , e1 , e2
|
objects. |
... |
further arguments passed to methods. |
na.rm |
logical: should missing values be removed? |
There are five groups for which S3 methods can be written, namely the "Math", "Ops", "Summary", "matrixOps", and "Complex" groups. These are not R objects in base R, but methods can be supplied for them and base R contains factor, data.frame and difftime methods for the first three groups. (There is also an ordered method for Ops, POSIXt and Date methods for Math and Ops, package_version methods for Ops and Summary, as well as a ts method for Ops in package stats.)
Group "Math"
:
abs, sign, sqrt, floor, ceiling, trunc, round, signif
exp, log, expm1, log1p, cos, sin, tan, cospi, sinpi, tanpi, acos, asin, atan
cosh, sinh, tanh, acosh, asinh, atanh
lgamma, gamma, digamma, trigamma
cumsum, cumprod, cummax, cummin
Members of this group dispatch on x
. Most members accept
only one argument, but members log
, round
and
signif
accept one or two arguments, and trunc
accepts
one or more.
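A single Math method can serve every member of the group by looking up the function named in .Generic. The sketch below assumes a hypothetical class "celsius"; extra arguments such as the digits of round reach the method through the dots.

```r
# Hypothetical S3 class holding temperatures
celsius <- function(x) structure(x, class = "celsius")

Math.celsius <- function(x, ...) {
  # .Generic is the name of the member actually called, e.g. "round";
  # compute on the bare numbers, then restore the class
  celsius(get(.Generic)(unclass(x), ...))
}

t1 <- celsius(c(21.456, -3.14))
r  <- round(t1, 1)   # two-argument member: 1 is passed on via '...'
a  <- abs(t1)        # one-argument member
```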
Group "Ops"
:
"+", "-", "*", "/", "^", "%%", "%/%"
"&", "|", "!"
"==", "!=", "<", "<=", ">=", ">"
This group contains both binary and unary operators (+
,
-
and !
): when a unary operator is encountered the
Ops
method is called with one argument and e2
is
missing.
The classes of both arguments are considered in dispatching any
member of this group. For each argument its vector of classes is
examined to see if there is a matching specific (preferred) or
Ops
method. If a method is found for just one argument or
the same method is found for both, it is used.
If different methods are found, then the generic
chooseOpsMethod()
is called to
pick the appropriate method. (See ?chooseOpsMethod
for
details). If chooseOpsMethod()
does not resolve the method,
then there is a warning about
‘incompatible methods’: in that case or if no method is found
for either argument the internal method is used.
Note that the data.frame methods for the comparison ("Compare": ==, <, ...) and logic ("Logic": &, | and !) operators return a logical matrix instead of a data frame, for convenience and back compatibility.
If the members of this group are called as functions, any argument names are removed to ensure that positional matching is always used.
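An Ops method has to cope with both the binary and the unary form. The following sketch assumes a hypothetical class "metres"; it keeps the class for arithmetic results and returns plain logical vectors for comparisons, which is one possible design rather than a required one.

```r
# Hypothetical S3 class holding lengths
metres <- function(x) structure(x, class = "metres")

Ops.metres <- function(e1, e2) {
  if (missing(e2)) {
    # unary call (+x, -x or !x): e2 is missing
    res <- get(.Generic)(unclass(e1))
  } else {
    res <- get(.Generic)(unclass(e1), unclass(e2))
  }
  # restore the class for arithmetic; leave comparisons and logic as they are
  if (.Generic %in% c("+", "-", "*", "/", "^", "%%", "%/%")) metres(res) else res
}

d   <- metres(c(1, 5))
neg <- -d       # unary minus
s   <- d + 2    # method found for the first argument only, so it is used
cmp <- d > 2    # plain logical vector
```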
Group "matrixOps"
:
"%*%"
This group currently contains the matrix multiply %*%
binary operator
only, where at least crossprod()
and tcrossprod()
are meant to follow.
Members of the group have the same dispatch semantics (using both arguments)
as the Ops
group.
Group "Summary"
:
all, any
sum, prod
min, max
range
Members of this group dispatch on the first argument supplied.
Note that the data.frame
methods for the
"Summary"
and "Math"
groups require “numeric-alike”
columns x
, i.e., fulfilling
is.numeric(x) || is.logical(x) || is.complex(x)
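A sketch of the data frame methods on made-up data: with numeric-alike columns the Summary members reduce over the whole data frame and the Math members return a data frame, while a character column is rejected with an error.

```r
num <- data.frame(a = 1:3, b = c(2.5, 0, 4))

total <- sum(num)     # 12.5, summed over all columns
rng   <- range(num)   # 0 4
rt    <- sqrt(num)    # Math members return a data frame

# A character column is not numeric-alike, so the method signals an error
mixed <- data.frame(a = 1:3, s = c("x", "y", "z"))
msg <- tryCatch(sum(mixed), error = function(e) "error")
```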
Group "Complex"
:
Arg
, Conj
, Im
, Mod
, Re
Members of this group dispatch on z
.
Note that a method will be used for one of these groups or one of its
members only if it corresponds to a "class"
attribute,
as the internal code dispatches on oldClass
and not on
class
. This is for efficiency: having to dispatch on,
say, Ops.integer
would be too slow.
The number of arguments supplied for primitive members of the
"Math"
group generic methods is not checked prior to dispatch.
There is no lazy evaluation of arguments for group-generic functions.
These functions are all primitive and internal generic.
The details of method dispatch and variables such as .Generic
are discussed in the help for UseMethod
. There are a
few small differences:
For the operators of group Ops
, the object
.Method
is a length-two character vector with elements the
methods selected for the left and right arguments respectively. (If
no method was selected, the corresponding element is ""
.)
Object .Group
records the group used for dispatch (if
a specific method is used this is ""
).
Package methods does contain objects with these names, which it has re-used in confusingly similar (but different) ways. See the help for that package.
Appendix A, Classes and Methods of
Chambers, J. M. and Hastie, T. J. eds (1992)
Statistical Models in S.
Wadsworth & Brooks/Cole.
methods
for methods of non-internal generic functions.
S4groupGeneric for group generics for S4 methods.
require(utils)

d.fr <- data.frame(x = 1:9, y = stats::rnorm(9))
class(1 + d.fr) == "data.frame" ##-- add to d.f. ...

methods("Math")
methods("Ops")
methods("Summary")
methods("Complex") # none in base R
grouping
returns a permutation which rearranges its first
argument such that identical values are adjacent to each other. Also
returned as attributes are the group-wise partitioning and the maximum
group size.
grouping(...)
... |
a sequence of numeric, character or logical vectors, all of the same length, or a classed R object. |
The function partially sorts the elements so that identical values are
adjacent. NA
values come last. This is guaranteed to be
stable, so ties are preserved, and if the data are already
grouped/sorted, the grouping is unchanged. This is useful for
aggregation and is particularly fast for character vectors.
Under the covers, the "radix" method of order is used, and the same caveats apply, including restrictions on character encodings and lack of support for long vectors (those with 2^31 or more elements). Real-valued numbers are slightly rounded to account for numerical imprecision.
Like order
, for a classed R object the grouping is based on
the result of xtfrm
.
An object of class "grouping"
, the representation of which
should be considered experimental and subject to change. It is an
integer vector with two attributes:
ends |
subscripts in the result corresponding to the last member of each group |
maxgrpn |
the maximum group size |
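The two attributes are enough to recover the groups from the permutation. The sketch below, on a small integer vector, derives the group sizes from ends and splits the permutation accordingly; since the representation is described as experimental, this should be read as an illustration rather than a stable recipe.

```r
x <- c(3L, 1L, 3L, 2L, 1L, 3L)

g    <- grouping(x)
idx  <- as.integer(g)        # the permutation itself, attributes dropped
ends <- attr(g, "ends")      # position of the last member of each group

sizes  <- diff(c(0L, ends))  # group sizes follow from consecutive 'ends'
groups <- split(idx, rep(seq_along(ends), sizes))

x[idx]                       # identical values are now adjacent
attr(g, "maxgrpn")           # 3: the value 3 occurs three times
```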
(ii <- grouping(x <- c(1, 1, 3:1, 1:4, 3), y <- c(9, 9:1), z <- c(2, 1:9)))
## 6 5 2 1 7 4 10 8 3 9
rbind(x, y, z)[, ii]
gzcon
provides a modified connection that wraps an existing
connection, and decompresses reads or compresses writes through that
connection. Standard gzip
headers are assumed.
gzcon(con, level = 6, allowNonCompressed = TRUE, text = FALSE)
con |
a connection. |
level |
integer between 0 and 9, the compression level when writing. |
allowNonCompressed |
logical. When reading, should non-compressed input be allowed? |
text |
logical. Should the connection be text-oriented? This is distinct from the mode of the connection (must always be binary). If TRUE, pushBack works on the connection, otherwise readBin and friends apply. |
If con
is open then the modified connection is opened. Closing
the wrapper connection will also close the underlying connection.
Reading from a connection which does not supply a gzip
magic
header is equivalent to reading from the original connection if
allowNonCompressed
is true, otherwise an error.
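With the default allowNonCompressed = TRUE, the same reading code can be used whether or not the input is compressed. A minimal sketch using two temporary files, one plain and one written through gzfile:

```r
# A plain (uncompressed) file read through gzcon()
plain <- tempfile()
writeLines(c("not", "compressed"), plain)
con <- gzcon(file(plain, "rb"))   # no gzip magic header: passed through as is
lines_plain <- readLines(con)
close(con)                        # also closes the underlying file connection

# A gzip-compressed file read the same way
gz <- tempfile(fileext = ".gz")
out <- gzfile(gz, "w")
writeLines(c("really", "compressed"), out)
close(out)
con <- gzcon(file(gz, "rb"))
lines_gz <- readLines(con)
close(con)

unlink(c(plain, gz))
```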
Compressed output will contain embedded NUL bytes, and so con
is not permitted to be a textConnection
opened with
open = "w"
. Use a writable rawConnection
to
compress data into a variable.
The original connection becomes unusable: any object pointing to it will now refer to the modified connection. For this reason, the new connection needs to be closed explicitly.
An object inheriting from class "connection"
. This is the same
connection number as supplied, but with a modified internal
structure. It has binary mode.
## Uncompress a data file from a URL
z <- gzcon(url("https://www.stats.ox.ac.uk/pub/datasets/csb/ch12.dat.gz"))
# read.table can only read from a text-mode connection.
raw <- textConnection(readLines(z))
close(z)
dat <- read.table(raw)
close(raw)
dat[1:4, ]

## gzfile and gzcon can inter-work.
## Of course here one would use gzfile, but file() can be replaced by
## any other connection generator.
zzfil <- tempfile(fileext = ".gz")
zz <- gzfile(zzfil, "w")
cat("TITLE extra line", "2 3 5 7", "", "11 13 17", file = zz, sep = "\n")
close(zz)
readLines(zz <- gzcon(file(zzfil, "rb")))
close(zz)
unlink(zzfil)

zzfil2 <- tempfile(fileext = ".gz")
zz <- gzcon(file(zzfil2, "wb"))
cat("TITLE extra line", "2 3 5 7", "", "11 13 17", file = zz, sep = "\n")
close(zz)
readLines(zz <- gzfile(zzfil2))
close(zz)
unlink(zzfil2)
Integers which are displayed in hexadecimal (short ‘hex’) format, with as many digits as are needed to display the largest, using leading zeroes as necessary.
Arithmetic works as for integers, and non-integer valued mathematical functions typically work by truncating the result to integer.
as.hexmode(x)

## S3 method for class 'hexmode'
as.character(x, keepStr = FALSE, ...)

## S3 method for class 'hexmode'
format(x, width = NULL, upper.case = FALSE, ...)

## S3 method for class 'hexmode'
print(x, ...)
x |
an object, for the methods inheriting from class |
keepStr |
a |
width |
|
upper.case |
a logical indicating whether to use upper-case letters or lower-case letters (default). |
... |
further arguments passed to or from other methods. |
Class "hexmode"
consists of integer vectors with that class
attribute, used primarily to ensure that they are printed in hex.
Subsetting ([
) works too, as do arithmetic or
other mathematical operations, albeit truncated to integer.
as.character(x)
drops all attributes
(except when keepStr = TRUE, where it keeps dim, dimnames and names for back compatibility) and converts each entry individually, hence with no
leading zeroes, whereas in format()
, when width = NULL
(the
default), the output is padded with leading zeroes to the smallest width
needed for all the non-missing elements.
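The difference between the two conversions can be seen in a short sketch. The vector `h` is a made-up example.

```r
h <- as.hexmode(c(1, 10, 255))

as.character(h)               # "1"  "a"  "ff": each entry converted on its own
format(h)                     # "01" "0a" "ff": padded to a common width
format(h, width = 4)          # "0001" "000a" "00ff"
format(h, upper.case = TRUE)  # "01" "0A" "FF"
```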
as.hexmode
can convert integers (of type "integer"
or
"double"
) and character vectors whose elements contain only
0-9
, a-f
, A-F
(or are NA
) to class
"hexmode"
.
There is a !
method and methods for |
and
&
:
these recycle their arguments to the length of the longer and then
apply the operators bitwise to each element.
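A minimal sketch of the bitwise methods, using two made-up values `a` and `b`; the comparison with `bitwAnd` and `bitwOr` is included only as a cross-check.

```r
a <- as.hexmode("c")   # binary 1100
b <- as.hexmode("a")   # binary 1010

a & b                  # hex 8 (binary 1000)
a | b                  # hex e (binary 1110)

## The same results as the integer bitwise functions
as.integer(a & b) == bitwAnd(12L, 10L)
as.integer(a | b) == bitwOr(12L, 10L)
```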
octmode
, sprintf
for other options in
converting integers to hex, strtoi
to convert hex
strings to integers.
i <- as.hexmode("7fffffff")
i; class(i)
identical(as.integer(i), .Machine$integer.max)

hm <- as.hexmode(c(NA, 1)); hm
as.integer(hm)

Xm <- as.hexmode(1:16)
Xm # print()s via format()
stopifnot(nchar(format(Xm)) == 2)
Xm[-16] # *no* leading zeroes!
stopifnot(format(Xm[-16]) == c(1:9, letters[1:6]))

## Integer arithmetic (remaining "hexmode"):
16*Xm
Xm^2
-Xm
(fac <- factorial(Xm[1:12])) # !1, !2, !3, !4 .. in hexadecimals
as.integer(fac) # indeed the same as factorial(1:12)
These functions give the obvious hyperbolic functions. They respectively compute the hyperbolic cosine, sine, tangent, and their inverses, arc-cosine, arc-sine, arc-tangent (or ‘area cosine’, etc).
cosh(x)
sinh(x)
tanh(x)
acosh(x)
asinh(x)
atanh(x)
x |
a numeric or complex vector |
These are internal generic primitive functions: methods
can be defined for them individually or via the
Math
group generic.
Branch cuts are consistent with the inverse trigonometric functions
asin
et seq, and agree with those defined in
Abramowitz & Stegun, figure 4.7, page 86.
The behaviour actually on the cuts
follows the C99 standard which requires continuity coming round the
endpoint in a counter-clockwise direction.
All are S4 generic functions: methods can be defined
for them individually or via the
Math
group generic.
Abramowitz, M. and Stegun, I. A. (1972)
Handbook of Mathematical Functions. New York: Dover.
Chapter 4. Elementary Transcendental Functions: Logarithmic,
Exponential, Circular and Hyperbolic Functions
The trigonometric functions, cos
, sin
,
tan
, and their inverses
acos
, asin
, atan
.
The logistic distribution function plogis
is a shifted
version of tanh()
for numeric x
.
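The relation between `plogis` and `tanh` mentioned above, together with two standard hyperbolic identities, can be checked numerically. This is only a sketch on an arbitrarily chosen grid `x`; the specific form (1 + tanh(x/2))/2 is the usual textbook identity rather than something stated in this help page.

```r
x <- seq(-3, 3, by = 0.5)

## plogis(x) = (1 + tanh(x/2)) / 2
all.equal(stats::plogis(x), (1 + tanh(x / 2)) / 2)

## cosh(x)^2 - sinh(x)^2 = 1
all.equal(cosh(x)^2 - sinh(x)^2, rep(1, length(x)))

## asinh() inverts sinh()
all.equal(asinh(sinh(x)), x)
```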
This uses system facilities to convert a character vector between encodings: the ‘i’ stands for ‘internationalization’.
iconv(x, from = "", to = "", sub = NA, mark = TRUE, toRaw = FALSE)

iconvlist()
x |
a character vector, or an object to be converted to a character
vector by |
from |
a character string describing the current encoding. |
to |
a character string describing the target encoding. |
sub |
character string. If not |
mark |
logical, for expert use. Should encodings be marked? |
toRaw |
logical. Should a list of raw vectors be returned rather than a character vector? |
The names of encodings and which ones are available are
platform-dependent. All R platforms support ""
(for the
encoding of the current locale), "latin1"
and "UTF-8"
.
Generally case is ignored when specifying an encoding.
On most platforms iconvlist
provides an alphabetical list of
the supported encodings. On others, the information is on the man
page for iconv(5)
or elsewhere in the man pages (but beware
that the system command iconv
may not support the same set of
encodings as the C functions R calls). Unfortunately, the names are
rarely supported across all platforms.
Elements of x
which cannot be converted (perhaps because they
are invalid or because they cannot be represented in the target
encoding) will be returned as NA
(or NULL
for
toRaw = TRUE
) unless sub
is specified.
Most versions of iconv
will allow transliteration by appending
‘//TRANSLIT’ to the to
encoding: see the examples.
Encoding "ASCII"
is accepted, and on most systems "C"
and "POSIX"
are synonyms for ASCII. Where
"ASCII//TRANSLIT"
is unsupported by the OS, "ASCII"
is
used with sub = "c99"
if from UTF-8, else sub =
"?"
. (However, musl's version of "ASCII"
substitutes
*
.)
Elements of x
with a declared encoding (UTF-8 or latin1, see
Encoding
) are converted from that encoding if from
= ""
, otherwise they are taken as being in the encoding specified by
from
.
Note that implementations of iconv
typically do not do much
validity checking and will often mis-convert inputs which are invalid
in encoding from
.
If sub = "Unicode"
or sub = "c99"
is used for a
non-UTF-8 input it is the same as sub = "byte"
.
If toRaw = FALSE
(the default), the value is a character vector
of the same length and the same attributes as x
(after
conversion to a character vector). If conversion fails for an element
that element of the result is set to NA_character_
. (NB:
whether conversion fails is implementation-specific.)
NA_character_
inputs give NA_character_
outputs.
If mark = TRUE
(the default) the elements of the result have a
declared encoding if to
is "latin1"
or "UTF-8"
,
or if to = ""
and the current locale's encoding is detected as
Latin-1 (or its superset CP1252 on Windows) or UTF-8.
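A short sketch of the character-vector result, using a made-up Latin-1 string: it shows the declared encoding of the result and that missing inputs stay missing.

```r
x <- "fran\xE7ais"            # the byte E7 is c-cedilla in Latin-1
Encoding(x) <- "latin1"

xx <- iconv(x, "latin1", "UTF-8")
Encoding(xx)                  # "UTF-8", as mark = TRUE by default
charToRaw(xx)                 # the single byte e7 has become c3 a7

iconv(NA_character_, "latin1", "UTF-8")   # NA
```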
If toRaw = TRUE
, the value is a list of the same length and
the same attributes as x
whose elements are either NULL
(if conversion fails or the input was NA_character_
) or a raw
vector.
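The list form of the result can be sketched as follows. The target encoding "UTF-16LE" is assumed to be available on the platform, as it is in the implementations discussed on this page.

```r
r <- iconv(c("A", NA), "UTF-8", "UTF-16LE", toRaw = TRUE)

r[[1]]   # raw bytes 41 00: one 16-bit unit, little-endian, no BOM
r[[2]]   # NULL, because the input was NA_character_
```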
For iconvlist()
, a character vector (typically of a few hundred
elements) of known encoding names.
There are three main implementations of iconv
in use. Linux's
most common C runtime, ‘glibc’, contains one. Several platforms
supply versions or emulations of GNU ‘libiconv’, including
previous versions of macOS and FreeBSD, in some cases with additional
encodings. On Windows we use a version of Yukihiro Nakadaira's
‘win_iconv’, which is based on Windows' codepages. (We have
added many encoding names for compatibility with other systems.) All
three have iconvlist
, ignore case in encoding names and support
‘//TRANSLIT’ (but with different results, and for
‘win_iconv’ currently a ‘best fit’ strategy is used except
for to = "ASCII"
).
The macOS 14 implementation is attributed to the ‘Citrus
Project’: the Apple headers declare it as ‘compatible’ with GNU
‘libiconv’ 1.11 from 2006. However, it differs in significant
ways including using transliteration for conversions which cannot be
represented exactly in the target encoding. (It seems this
implementation is also used in recent versions of FreeBSD. Earlier
versions of macOS used GNU ‘libiconv’ 1.11 and some
CRAN builds still do.) For a failing
conversion macOS 14 generally translated character(s) to ?
but
14.1 gives an error (so an NA
result in R).
Most commercial Unixes contain an implementation of iconv
but
none we have encountered have supported the encoding names we need:
the ‘R Installation and Administration’ manual recommended
installing GNU ‘libiconv’ on Solaris and AIX.
Some Linux distributions use ‘musl’ as their C runtime. This is less comprehensive than ‘glibc’: it does not support ‘//TRANSLIT’ but does inexact conversions (currently using ‘*’).
There are other implementations, e.g. NetBSD has used one from the Citrus project (which does not support ‘//TRANSLIT’) and there is an older FreeBSD port.
Note that you cannot rely on invalid inputs being detected, especially
for to = "ASCII"
where some implementations allow 8-bit
characters and pass them through unchanged or with transliteration or
substitution.
Some of the implementations have interesting extra encodings: for
example GNU ‘libiconv’ and macOS 14 allow to = "C99"
to use
‘\uxxxx’ escapes (or if needed ‘\Uxxxxxxxx’) for
non-ASCII characters.
Byte Order Marks are most commonly known as ‘BOMs’.
Encodings using character units which are more than one byte in size
can be written on a file in either big-endian or little-endian order:
this applies most commonly to UCS-2, UTF-16 and UTF-32/UCS-4
encodings. Some systems will write the Unicode character
U+FEFF
at the beginning of a file in these encodings and
perhaps also in UTF-8. In that usage the character is known as a BOM,
and should be handled during input (see the ‘Encodings’ section
under connection
: re-encoded connections have some
special handling of BOMs). The rest of this section applies when this
has not been done so x
starts with a BOM.
Implementations will generally interpret a BOM for from
given
as one of "UCS-2"
, "UTF-16"
and
"UTF-32"
. Implementations differ in how they treat BOMs in
x
in other from
encodings: they may be discarded,
returned as character U+FEFF
or regarded as invalid.
The most portable name for the ISO 8859-15 encoding, commonly known as
‘Latin 9’, is "iso885915"
: most platforms support both
"latin-9"
and "latin9"
but GNU ‘libiconv’ does not
support the latter. ‘musl’ (as used by Alpine Linux and other
lightweight Linux distributions) supports neither, but R remaps there
to "iso885915"
.
Encoding names "utf8"
, "mac"
and "macroman"
are
not portable. "utf8"
is converted to "UTF-8"
for
from
and to
by iconv
, but not
for e.g. fileEncoding
arguments. "macintosh"
is
the official (and most widely supported) name for ‘Mac Roman’
(https://en.wikipedia.org/wiki/Mac_OS_Roman).
Using sub
substitutes each non-convertible byte in the
input, so when converting from UTF-8 a non-convertible character may
be replaced by two or more bytes. Using sub = "c99"
or
sub = "Unicode"
will be clearer.
## In principle, as not all systems have iconvlist
try(utils::head(iconvlist(), n = 50))

## Not run:
## convert from Latin-2 to UTF-8: two of the glibc iconv variants.
iconv(x, "ISO_8859-2", "UTF-8")
iconv(x, "LATIN2", "UTF-8")

## End(Not run)

## Both x below are in latin1 and will only display correctly in a
## locale that can represent and display latin1.
x <- "fran\xE7ais"
Encoding(x) <- "latin1"
x
charToRaw(xx <- iconv(x, "latin1", "UTF-8"))
xx

## The results in the comments are those from glibc and GNU libiconv
iconv(x, "latin1", "ASCII")            # NA
iconv(x, "latin1", "ASCII", "?")       # "fran?ais"
iconv(x, "latin1", "ASCII", "")        # "franais"
iconv(x, "latin1", "ASCII", "byte")    # "fran<e7>ais"
iconv(xx, "UTF-8", "ASCII", "Unicode") # "fran<U+00E7>ais"
iconv(xx, "UTF-8", "ASCII", "c99")     # "fran\\u00e7ais"

## Extracts from old R help files (they are nowadays in UTF-8)
x <- c("Ekstr\xf8m", "J\xf6reskog", "bi\xdfchen Z\xfcrcher")
Encoding(x) <- "latin1"
x
try(iconv(x, "latin1", "ASCII//TRANSLIT")) # platform-dependent
## glibc gives "Ekstroem" "Joreskog" "bisschen Zurcher"
## macOS 14 gives "Ekstrom" "J\"oreskog" "bisschen Z\"urcher"
## musl gives "Ekstr*m" "J*reskog" "bi*chen Z*rcher"

iconv(x, "latin1", "ASCII", sub = "byte")
## and for Windows' 'Unicode'
str(xx <- iconv(x, "latin1", "UTF-16LE", toRaw = TRUE))
iconv(xx, "UTF-16LE", "UTF-8")

emoji <- "\U0001f604"
iconv(emoji,, "latin1", sub = "Unicode") # "<U+1F604>"
iconv(emoji,, "latin1", sub = "c99")
Controls the way collation is done by ICU (an optional part of the R build).
icuSetCollate(...)
icuGetCollate(type = c("actual", "valid"))
... |
named arguments, see ‘Details’. |
type |
a character string: either the |
Optionally, R can be built to collate character strings by ICU
(https://icu.unicode.org/). For such systems,
icuSetCollate
can be used to tune the way collation is done.
On other builds calling this function does nothing, with a warning.
Possible arguments are
locale
:A character string such as "da_DK"
giving the language and country whose collation rules are to be
used. If present, this should be the first argument.
case_first
:"upper"
, "lower"
or
"default"
, asking for upper- or lower-case characters to be
sorted first. The default is usually lower-case first, but not in
all languages (not under the default settings for Danish, for example).
alternate_handling
:Controls the handling of
‘variable’ characters (mainly punctuation and symbols).
Possible values are "non_ignorable"
(primary strength) and
"shifted"
(quaternary strength).
strength
:Which components should be used? Possible
values "primary"
, "secondary"
, "tertiary"
(default), "quaternary"
and "identical"
.
french_collation
:In a French locale the way accents
affect collation is from right to left, whereas in most other locales
it is from left to right. Possible values "on"
, "off"
and "default"
.
normalization
:Should strings be normalized? Possible values
are "on"
and "off"
(default). This affects the
collation of composite characters.
case_level
:An additional level between secondary and
tertiary, used to distinguish large and small Japanese Kana
characters. Possible values "on"
and "off"
(default).
hiragana_quaternary
:Possible values "on"
(sort
Hiragana first at quaternary level) and "off"
.
Only the first three are likely to be of interest except to those with a detailed understanding of collation and specialized requirements.
Some special values are accepted for locale
:
"none"
:ICU is not used for collation: the OS's collation services are used instead.
"ASCII"
:ICU is not used for collation: the C function
strcmp
is used instead, which should sort byte-by-byte in
(unsigned) numerical order.
"default"
:obtains the locale from the OS as is done at the start of the session (except on Windows). If environment variable R_ICU_LOCALE is set to a non-empty value, its value is used rather than consulting the OS, unless environment variable LC_ALL is set to 'C' (or unset but LC_COLLATE is set to 'C').
""
, "root"
:the ‘root’ collation: see https://www.unicode.org/reports/tr35/tr35-collation.html#Root_Collation.
For the specifications of ‘real’ ICU locales, see
https://unicode-org.github.io/icu/userguide/locale/. Note that ICU does not
report that a locale is not supported, but falls back to its idea of
‘best fit’ (which could be rather different and is reported by
icuGetCollate("actual")
, often "root"
). Most English
locales fall back to "root"
as although e.g. "en_GB"
is
a valid locale (at least on some platforms), it contains no special
rules for collation. Note that "C"
is not a supported ICU locale
and hence R_ICU_LOCALE should never be set to "C"
.
Some examples are case_level = "on", strength = "primary"
to ignore
accent differences and alternate_handling = "shifted"
to ignore
space and punctuation characters.
Initially ICU will not be used for collation if the OS is set to use the
C
locale for collation and R_ICU_LOCALE is not set. Once
this function is called with a value for locale
, ICU will be used
until it is called again with locale = "none"
. ICU will not be
used once Sys.setlocale
is called with a "C"
value for
LC_ALL
or LC_COLLATE
, even if R_ICU_LOCALE is set.
ICU will be used again honoring R_ICU_LOCALE once
Sys.setlocale
is called to set a different collation order.
Environment variables LC_ALL (or LC_COLLATE) take precedence
over R_ICU_LOCALE if and only if they are set to 'C'. Due to the
interaction with other ways of setting the collation order,
R_ICU_LOCALE should be used with care and only when needed.
All customizations are reset to the default for the locale if
locale
is specified: the collation engine is reset if the
OS collation locale category is changed by Sys.setlocale
.
For icuGetCollate
, a character string describing the ICU locale
in use (which may be reported as "ICU not in use"
). The
‘actual’ locale may be simpler than the requested locale: for
example "da"
rather than "da_DK"
: English locales are
likely to report "root"
.
Except on Windows, ICU is used by default wherever it is available. As it works internally in UTF-8, it will be most efficient in UTF-8 locales.
On Windows, R is normally built including ICU, but it will only be
used if environment variable R_ICU_LOCALE has been set when R
is started or after icuSetCollate
is called to select the
locale (as ICU and Windows differ in their idea of locale names).
Note that icuSetCollate(locale = "default")
should work
reasonably well, but finds the system default ignoring environment
variables such as LC_COLLATE.
capabilities
for whether ICU is available;
extSoftVersion
for its version.
The ICU user guide chapter on collation (https://unicode-org.github.io/icu/userguide/collation/).
## These examples depend on having ICU available, and on the locale.
## As we don't know the current settings, we can only reset to the default.
if(capabilities("ICU")) withAutoprint({
    icuGetCollate()
    icuGetCollate("valid")
    x <- c("Aarhus", "aarhus", "safe", "test", "Zoo")
    sort(x)
    icuSetCollate(case_first = "upper"); sort(x)
    icuSetCollate(case_first = "lower"); sort(x)
    ## Danish collates upper-case-first and with 'aa' as a single letter
    icuSetCollate(locale = "da_DK", case_first = "default"); sort(x)
    ## Estonian collates Z between S and T
    icuSetCollate(locale = "et_EE"); sort(x)
    icuSetCollate(locale = "default"); icuGetCollate("valid")
})
The safe and reliable way to test two objects for being
exactly equal. It returns TRUE
in this case,
FALSE
in every other case.
identical(x, y, num.eq = TRUE, single.NA = TRUE, attrib.as.set = TRUE,
          ignore.bytecode = TRUE, ignore.environment = FALSE,
          ignore.srcref = TRUE, extptr.as.ref = FALSE)
x , y
|
any R objects. |
num.eq |
logical indicating if ( |
single.NA |
logical indicating if there is conceptually just one numeric
|
attrib.as.set |
logical indicating if |
ignore.bytecode |
logical indicating if byte code should be ignored when comparing closures. |
ignore.environment |
logical indicating if their environments should be ignored when comparing closures. |
ignore.srcref |
logical indicating if their |
extptr.as.ref |
logical indicating whether external pointer objects should be compared as reference objects and considered identical only if they are the same object in memory. By default, external pointers are considered identical if the addresses they contain are identical. |
A call to identical
is the way to test exact equality in
if
and while
statements, as well as in logical
expressions that use &&
or ||
. In all these
applications you need to be assured of getting a single logical
value.
Users often use the comparison operators, such as ==
or
!=
, in these situations. It looks natural, but it is not what
these operators are designed to do in R. They return an object like
the arguments. If you expected x
and y
to be of length
1, but it happened that one of them was not, you will not get a
single FALSE
. Similarly, if one of the arguments is NA
,
the result is also NA
. In either case, the expression
if(x == y)....
won't work as expected.
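The two failure modes just described can be reproduced with a few made-up values; `x`, `y` and `eq` are names chosen for this sketch only.

```r
x <- c(1, 2); y <- 1

eq <- x == y
eq                  # TRUE FALSE: one result per element, not a single value
identical(x, y)     # FALSE: a single logical, safe inside if()

NA == 1             # NA
identical(NA, 1)    # FALSE
```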
The function all.equal
is also sometimes used to test equality
this way, but was intended for something different: it allows for
small differences in numeric results.
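A minimal sketch of the contrast, using two made-up numbers that differ by far less than the default tolerance of `all.equal`:

```r
a <- 1
b <- 1 + 1e-10

identical(a, b)           # FALSE: the two doubles are not exactly equal
isTRUE(all.equal(a, b))   # TRUE: the difference is within numeric tolerance
```

Wrapping `all.equal` in `isTRUE` matters because it returns a character description, not `FALSE`, when the objects differ.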
The computations in identical
are also reliable and usually
fast. There should never be an error. The only known way to kill
identical
is by having an invalid pointer at the C level,
generating a memory fault. It will usually find inequality quickly.
Checking equality for two large, complicated objects can take longer
if the objects are identical or nearly so, but represent completely
independent copies. For most applications, however, the computational cost
should be negligible.
If single.NA
is true, as by default, identical
sees
NaN
as different from NA_real_
, but all
NaN
s are equal (and all NA
of the same type are equal).
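With the default settings this treatment of missing values can be illustrated as follows.

```r
identical(NaN, NA_real_)           # FALSE: NaN and NA are kept apart
identical(NaN, 0/0)                # TRUE: all NaNs are considered equal
identical(NA_real_, NA_real_)      # TRUE
identical(NA_integer_, NA_real_)   # FALSE: the types differ
```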
Character strings (except those in marked encoding "bytes"
) are
regarded as identical even if they are in different marked encodings but
would agree when translated to UTF-8. A character string in marked encoding
"bytes"
is only regarded as identical to a character string in the
same encoding and with the same content.
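As a sketch of the rule for marked encodings, the same made-up word is stored once as Latin-1 and once as UTF-8. The underlying bytes differ, yet the strings compare as identical.

```r
x <- "fa\xE7ile"
Encoding(x) <- "latin1"
y <- iconv(x, "latin1", "UTF-8")

Encoding(c(x, y))                           # "latin1" "UTF-8"
identical(charToRaw(x), charToRaw(y))       # FALSE: different byte sequences
identical(x, y)                             # TRUE: equal once translated to UTF-8
```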
If attrib.as.set
is true, as by default, comparison of
attributes views them as a set (and not a vector, so order is not
tested).
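The effect of `attrib.as.set` can be sketched with two made-up objects that carry the same attributes in a different order; the attribute names `foo` and `bar` are arbitrary.

```r
a <- structure(1:3, foo = "x", bar = "y")
b <- structure(1:3, bar = "y", foo = "x")

identical(a, b)                          # TRUE: attributes compared as a set
identical(a, b, attrib.as.set = FALSE)   # FALSE: attribute order now matters
```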
If ignore.bytecode
is true (the default), the compiled
bytecode of a function (see cmpfun
) will be ignored in
the comparison. If it is false, functions will compare equal only if
they are copies of the same compiled object (or both are
uncompiled). To check whether two different compiles are equal, you
should compare the results of disassemble()
.
You almost never want to use identical
on datetimes of class
"POSIXlt"
: not only can different times in different
time zones represent the same time, and time zones have multiple names,
but several of the components are optional.
Note that the strictest test for equality is
identical(x, y, num.eq = FALSE, single.NA = FALSE, attrib.as.set = FALSE, ignore.bytecode = FALSE, ignore.environment = FALSE, ignore.srcref = FALSE, extptr.as.ref = TRUE)
A single logical value, TRUE
or FALSE
, never NA
and never anything other than a single value.
John Chambers and R Core
Chambers, J. M. (1998) Programming with Data. A Guide to the S Language. Springer.
all.equal
for descriptions of how two objects differ;
Comparison
and Logic
for elementwise comparisons.
identical(1, NULL) ## FALSE -- don't try this with ==
identical(1, 1.)   ## TRUE in R (both are stored as doubles)
identical(1, as.integer(1)) ## FALSE, stored as different types

x <- 1.0; y <- 0.99999999999
## how to test for object equality allowing for numeric fuzz :
(E <- all.equal(x, y))
identical(TRUE, E)
isTRUE(E) # alternative test
## If all.equal thinks the objects are different, it returns a
## character string, and the above expression evaluates to FALSE

## even for unusual R objects :
identical(.GlobalEnv, environment())

### ------- Pickyness Flags : -----------------------------

## the infamous example:
identical(0., -0.) # TRUE, i.e. not differentiated
identical(0., -0., num.eq = FALSE)
## similar:
identical(NaN, -NaN) # TRUE
identical(NaN, -NaN, single.NA = FALSE) # differ on bit-level

### For functions ("closure"s): ----------------------------------------------
###     ~~~~~~~~~
f <- function(x) x
f
g <- compiler::cmpfun(f)
g
identical(f, g)                        # TRUE, as bytecode is ignored by default
identical(f, g, ignore.bytecode=FALSE) # FALSE: bytecode differs

## GLM families contain several functions, some of which share an environment:
p1 <- poisson() ; p2 <- poisson()
identical(p1, p2)                          # FALSE
identical(p1, p2, ignore.environment=TRUE) # TRUE

## in interactive use, the 'keep.source' option is typically true:
op <- options(keep.source = TRUE) # and so, these have differing "srcref" :
f1 <- function() {}
f2 <- function() {}
identical(f1,f2)# ignore.srcref= TRUE : TRUE
identical(f1,f2, ignore.srcref=FALSE)# FALSE
options(op) # revert to previous state
A trivial identity function returning its argument.
identity(x)
x |
an R object. |
diag
creates diagonal matrices, including identity ones.
ifelse
returns a value with the same shape as
test
which is filled with elements selected
from either yes
or no
depending on whether the element of test
is TRUE
or FALSE
.
ifelse(test, yes, no)
test |
an object which can be coerced to logical mode. |
yes |
return values for true elements of |
no |
return values for false elements of |
If yes
or no
are too short, their elements are recycled.
yes
will be evaluated if and only if any element of test
is true, and analogously for no
.
Missing values in test
give missing values in the result.
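The recycling, conditional-evaluation and missing-value rules above can be seen in a short sketch with made-up inputs; the names `r1`, `r2` and `r3` exist only for this illustration.

```r
## 'yes' (length 2) and 'no' (length 1) are recycled to the length of 'test'
r1 <- ifelse(c(TRUE, FALSE, TRUE, FALSE), 1:2, 0)
r1    # 1 0 1 0

## 'yes' is never evaluated when no element of 'test' is true
r2 <- ifelse(c(FALSE, FALSE), stop("not evaluated"), 0)
r2    # 0 0

## A missing value in 'test' gives a missing value in the result
r3 <- ifelse(c(TRUE, NA, FALSE), "yes", "no")
r3    # "yes" NA "no"
```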
A vector of the same length and attributes (including dimensions and
"class"
) as test
and data values from the values of
yes
or no
. The mode of the answer will be coerced from
logical to accommodate first any values taken from yes
and then
any values taken from no
.
The mode of the result may depend on the value of test
(see the
examples), and the class attribute (see oldClass
) of the
result is taken from test
and may be inappropriate for the
values selected from yes
and no
.
Sometimes it is better to use a construction such as
(tmp <- yes; tmp[!test] <- no[!test]; tmp)
, possibly extended to handle missing values in test
.
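The assignment-based construction can be sketched with two made-up Date vectors. Unlike `ifelse`, it keeps the class of `yes`; the variable names follow the construction above, and the dates are arbitrary.

```r
## Hypothetical inputs: two Date vectors and a logical selector
test <- c(TRUE, FALSE, TRUE)
yes  <- as.Date("2024-01-01") + 0:2
no   <- as.Date("2000-01-01") + 0:2

## ifelse() takes its attributes from 'test', so the Date class is lost
class(ifelse(test, yes, no))   # "numeric"

## Assigning into a copy of 'yes' keeps the class
tmp <- yes
tmp[!test] <- no[!test]
class(tmp)                     # "Date"
format(tmp)                    # "2024-01-01" "2000-01-02" "2024-01-03"
```

This sketch assumes `test` contains no missing values; with `NA` in `test`, the subscript `!test` would need extra handling, as the text notes.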
Further note that if(test) yes else no
is much more efficient
and often much preferable to ifelse(test, yes, no)
whenever
test
is a simple true/false result, i.e., when
length(test) == 1
.
The srcref
attribute of functions is handled specially: if
test
is a simple true result and yes
evaluates to a function
with srcref
attribute, ifelse
returns yes
including
its attribute (the same applies to a false test
and no
argument). This functionality is only for backwards compatibility, the
form if(test) yes else no
should be used whenever yes
and
no
are functions.
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
if
.
x <- c(6:-4)
sqrt(x)  #- gives warning
sqrt(ifelse(x >= 0, x, NA))  # no warning

## Note: the following also gives the warning !
ifelse(x >= 0, sqrt(x), NA)

## ifelse() strips attributes
## This is important when working with Dates and factors
x <- seq(as.Date("2000-02-29"), as.Date("2004-10-04"), by = "1 month")
## has many "yyyy-mm-29", but a few "yyyy-03-01" in the non-leap years
y <- ifelse(as.POSIXlt(x)$mday == 29, x, NA)
head(y) # not what you expected ... ==> need restore the class attribute:
class(y) <- class(x)
y
## This is a (not atypical) case where it is better *not* to use ifelse(),
## but rather the more efficient and still clear:
y2 <- x
y2[as.POSIXlt(x)$mday != 29] <- NA
## which gives the same as ifelse()+class() hack:
stopifnot(identical(y2, y))

## example of different return modes (and 'test' alone determining length):
yes <- 1:3
no  <- pi^(1:4)
utils::str( ifelse(NA,    yes, no) ) # logical, length 1
utils::str( ifelse(TRUE,  yes, no) ) # integer, length 1
utils::str( ifelse(FALSE, yes, no) ) # double,  length 1
Creates or tests for objects of type "integer"
.
integer(length = 0)
as.integer(x, ...)
is.integer(x)
length |
a non-negative integer specifying the desired length. Double values will be coerced to integer: supplying an argument of length other than one is an error. |
x |
object to be coerced or tested. |
... |
further arguments passed to or from other methods. |
Integer vectors exist so that data can be passed to C or Fortran code which expects them, and so that (small) integer data can be represented exactly and compactly.
Note that current implementations of R use 32-bit integers for
integer vectors, so the range of representable integers is restricted
to about +/-2*10^9:
double
s can
hold much larger integers exactly.
integer
creates an integer vector of the specified length.
Each element of the vector is equal to 0
.
as.integer
attempts to coerce its argument to be of integer
type. The answer will be NA
unless the coercion succeeds. Real
values larger in modulus than the largest integer are coerced to
NA
(unlike S which gives the most extreme integer of the same
sign). Non-integral numeric values are truncated towards zero (i.e.,
as.integer(x)
equals trunc(x)
there), and
imaginary parts of complex numbers are discarded (with a warning).
Character strings containing optional whitespace followed by either a
decimal representation or a hexadecimal representation (starting with
0x
or 0X
) can be converted, as well as any allowed by
the platform for real numbers. Like as.vector
it strips
attributes including names. (To ensure that an object x
is of
integer type without stripping attributes, use
storage.mode(x) <- "integer"
.)
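The coercion rules described above can be illustrated with a few calls. This is a short sketch; the example values are chosen only for illustration.

```r
as.integer(c(2.9, -2.9))            # 2 -2 : truncation towards zero
as.integer("0x1A")                  # 26   : hexadecimal string
as.integer(" 42")                   # 42   : leading whitespace is allowed
.Machine$integer.max                # 2147483647
suppressWarnings(as.integer(3e9))   # NA   : outside the integer range

## as.integer() strips attributes, storage.mode<- keeps them:
x <- c(a = 1.7, b = 2.2)
names(as.integer(x))                # NULL
storage.mode(x) <- "integer"
x                                   # names 'a' and 'b' are retained
```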
is.integer
returns TRUE
or FALSE
depending on
whether its argument is of integer type or not, unless it is a
factor when it returns FALSE
.
is.integer(x)
does not test if x
contains integer
numbers! For that, use round
, as in the function
is.wholenumber(x)
in the examples.
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
round
(and ceiling
and floor
on that help
page) to convert to integral values.
## as.integer() truncates:
x <- pi * c(-1:1, 10)
as.integer(x)

is.integer(1) # is FALSE !

is.wholenumber <-
    function(x, tol = .Machine$double.eps^0.5)  abs(x - round(x)) < tol
is.wholenumber(1) # is TRUE
(x <- seq(1, 5, by = 0.5) )
is.wholenumber( x ) #-->  TRUE FALSE TRUE ...
interaction
computes a factor which represents the interaction
of the given factors. The result of interaction
is always unordered.
interaction(..., drop = FALSE, sep = ".", lex.order = FALSE)
... |
the factors for which interaction is to be computed, or a single list giving those factors. |
drop |
if |
sep |
string to construct the new level labels by joining the constituent ones. |
lex.order |
logical indicating if the order of factor concatenation should be lexically ordered. |
A factor which represents the interaction of the given factors.
The levels are labelled as the levels of the individual factors joined
by sep
which is .
by default.
By default, when lex.order = FALSE
, the levels are ordered so
the level of the first factor varies fastest, then the second and so
on. This is the reverse of lexicographic ordering (which you can get
by lex.order = TRUE
), and differs from
:
. (It is done this way for compatibility with S.)
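The difference between the two level orderings is easiest to see on two small factors. The following is a minimal sketch; the factors f and g are made up for illustration.

```r
f <- factor(c("a", "b"))
g <- factor(c("x", "y"))

## default: the level of the first factor varies fastest
levels(interaction(f, g))                     # "a.x" "b.x" "a.y" "b.y"
## lexicographic: the level of the last factor varies fastest
levels(interaction(f, g, lex.order = TRUE))   # "a.x" "a.y" "b.x" "b.y"
## the ':' operator uses the lexicographic ordering
levels(f:g)                                   # "a:x" "a:y" "b:x" "b:y"
```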
Chambers, J. M. and Hastie, T. J. (1992) Statistical Models in S. Wadsworth & Brooks/Cole.
factor
;
:
where f:g
is similar to
interaction(f, g, sep = ":")
when f
and g
are factors.
a <- gl(2, 4, 8)
b <- gl(2, 2, 8, labels = c("ctrl", "treat"))
s <- gl(2, 1, 8, labels = c("M", "F"))
interaction(a, b)
interaction(a, b, s, sep = ":")
stopifnot(identical(a:s,
                    interaction(a, s, sep = ":", lex.order = TRUE)),
          identical(a:s:b,
                    interaction(a, s, b, sep = ":", lex.order = TRUE)))
Return TRUE
when R is being used interactively and
FALSE
otherwise.
interactive()
An interactive R session is one in which it is assumed that there is a human operator to interact with, so for example R can prompt for corrections to incorrect input or ask what to do next or if it is OK to move to the next plot.
GUI consoles will arrange to start R in an interactive session. When
R is run in a terminal (via Rterm.exe
on Windows), it
assumes that it is interactive if ‘stdin’ is connected to a
(pseudo-)terminal and not if ‘stdin’ is redirected to a file or
pipe. Command-line options --interactive (Unix) and
--ess (Windows, Rterm.exe
) override the default
assumption.
(On a Unix-alike, whether the readline
command-line editor is
used is not overridden by --interactive.)
Embedded uses of R can set a session to be interactive or not.
Internally, whether a session is interactive determines
how some errors are handled and reported, e.g. see
stop
and options("showWarnCalls")
.
whether one of --save, --no-save or --vanilla is required, and if R ever asks whether to save the workspace.
the choice of default graphics device launched when needed and
by dev.new
: see options("device")
whether graphics devices ever ask for confirmation of a new page.
In addition, R's own R code makes use of interactive()
: for
example help
, debugger
and
install.packages
do.
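User code can use interactive() in the same way, for instance to avoid prompting when no human operator is present. The helper below is a hypothetical sketch (confirm is not a base R function): it returns a default answer in non-interactive sessions and only calls readline otherwise.

```r
## Hypothetical helper: ask for confirmation only in an interactive session.
confirm <- function(prompt, default = FALSE) {
    if (!interactive())
        return(default)              # nobody to ask: fall back to the default
    ans <- readline(paste0(prompt, " [y/n] "))
    tolower(substr(ans, 1, 1)) == "y"
}

## In a script run via Rscript this returns TRUE without prompting:
## confirm("Overwrite the file?", default = TRUE)
```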
This is a primitive function.
.First <- function() if(interactive()) x11()
.Internal
performs a call to an internal code
which is built in to the R interpreter.
Only true R wizards should even consider using this function, and only R developers can add to the list of internal functions.
.Internal(call)
call |
a call expression |
.Primitive
, .External
(the nearest
equivalent available to users).
Many R-internal functions are generic and allow methods to be written for.
The following primitive and internal functions are generic,
i.e., you can write methods for them:
length, length<-, lengths, dimnames, dimnames<-, dim, dim<-,
names, names<-, levels<-, @, @<-,
as.character, as.complex, as.double, as.integer, as.logical, as.raw,
as.vector, as.call, as.environment,
is.array, is.matrix, is.na, anyNA, is.nan, is.finite, is.infinite,
is.numeric, nchar, rep, rep.int, rep_len,
seq.int (which dispatches methods for "seq"),
is.unsorted and xtfrm.
In addition, is.name
is a synonym for is.symbol
and
dispatches methods for the latter. Similarly, as.numeric
is a synonym for as.double
and dispatches methods for the
latter, i.e., S3 methods are for as.double
, whereas S4 methods
are to be written for as.numeric
.
Note that all of the group generic functions are also internal/primitive and allow methods to be written for them.
.S3PrimitiveGenerics
is a character vector listing the
primitives which are internal generic and not group generic,
(not only for S3 but also S4).
Similarly, the .internalGenerics
character vector contains the names
of the internal (via .Internal(..)
) non-primitive functions
which are internally generic.
For efficiency, internal dispatch only occurs on objects, that
is those for which is.object
returns true.
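As an illustration of internal dispatch, the sketch below defines an S3 method for the internal generic length. The class name "stack" and its items component are hypothetical. The method is only used while the object carries a class attribute, that is, while is.object is true.

```r
## 'stack' is a hypothetical S3 class wrapping a list of items
s <- structure(list(items = list(1, 2, 3)), class = "stack")

## S3 method for the internal generic length()
length.stack <- function(x) length(unclass(x)$items)

length(s)             # 3 : dispatches to length.stack
length(unclass(s))    # 1 : no class attribute, so no dispatch
is.object(s)          # TRUE
is.object(unclass(s)) # FALSE
```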
methods
for the methods which are available.
Return a (temporarily) invisible copy of an object.
invisible(x = NULL)
x |
an arbitrary R object, by default NULL. |
This function can be useful when it is desired to have functions return values which can be assigned, but which do not print when they are not assigned.
This is a primitive function.
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
withVisible
,
return
,
function
.
# These functions both return their argument
f1 <- function(x) x
f2 <- function(x) invisible(x)
f1(1)  # prints
f2(1)  # does not
is.finite
and is.infinite
return a vector of the same
length as x
, indicating which elements are finite (not infinite
and not missing) or infinite.
Inf
and -Inf
are positive and negative infinity
whereas NaN
means ‘Not a Number’. (These apply to numeric
values and real and imaginary parts of complex values but not to
values of integer vectors.) Inf
and NaN
(as well as
NA
) are
reserved words in the R language.
is.finite(x)
is.infinite(x)
is.nan(x)

Inf
NaN
x |
R object to be tested: the default methods handle atomic vectors. |
is.finite
returns a vector of the same length as x
the
j-th element of which is TRUE
if x[j]
is finite (i.e., it
is not one of the values NA
, NaN
, Inf
or
-Inf
) and FALSE
otherwise. Complex
numbers are finite if both the real and imaginary parts are.
is.infinite
returns a vector of the same length as x
the
j-th element of which is TRUE
if x[j]
is infinite (i.e.,
equal to one of Inf
or -Inf
) and FALSE
otherwise. This will be false unless x
is numeric or complex.
Complex numbers are infinite if either the real or the imaginary part is.
is.nan
tests if a numeric value is NaN
. Do not test
equality to NaN
, or even use identical
, since
systems typically have many different NaN values. One of these is
used for the numeric missing value NA
, and is.nan
is
false for that value. A complex number is regarded as NaN
if
either the real or imaginary part is NaN
but not NA
.
All elements of logical, integer and raw vectors are considered not to
be NaN.
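The distinctions drawn above can be checked on a short vector. This is a minimal sketch with illustrative values.

```r
x <- c(1, NA, NaN, Inf, -Inf)
is.finite(x)     # TRUE  FALSE FALSE FALSE FALSE
is.infinite(x)   # FALSE FALSE FALSE TRUE  TRUE
is.nan(x)        # FALSE FALSE TRUE  FALSE FALSE
is.na(x)         # FALSE TRUE  TRUE  FALSE FALSE (NaN also counts as missing)

is.nan(NA_integer_)   # FALSE: integer vectors never contain NaN
is.finite("a")        # FALSE: character input is accepted, never finite

## complex: finite needs both parts finite, infinite needs only one
z <- complex(real = 1, imaginary = Inf)
is.finite(z)     # FALSE
is.infinite(z)   # TRUE
```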
All three functions accept NULL
as input and return a length
zero result. The default methods accept character and raw vectors, and
return FALSE
for all entries. Prior to R version 2.14.0 they
accepted all input, returning FALSE
for most non-numeric
values; cases which are not atomic vectors are now signalled as
errors.
All three functions are generic: you can write methods to handle specific classes of objects, see InternalMethods.
A logical vector of the same length as x
: dim
,
dimnames
and names
attributes are preserved.
In R, basically all mathematical functions (including basic
Arithmetic
), are supposed to work properly with
+/- Inf
and NaN
as input or output.
The basic rule should be that calls and relations with Inf
s
really are statements with a proper mathematical limit.
Computations involving NaN
will return NaN
or perhaps
NA
: which of those two is not guaranteed and may depend
on the R platform (since compilers may re-order computations).
The IEC 60559 standard, also known as the ANSI/IEEE 754 Floating-Point Standard.
https://en.wikipedia.org/wiki/NaN.
D. Goldberg (1991).
What Every Computer Scientist Should Know about Floating-Point
Arithmetic.
ACM Computing Surveys, 23(1), 5–48.
doi:10.1145/103162.103163.
Also available at
https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html.
The C99 function isfinite
is used for is.finite
.
NA
, ‘Not Available’ which is not a number
as well, however usually used for missing values and applies to many
modes, not just numeric and complex.
pi / 0 ## = Inf a non-zero number divided by zero creates infinity
0 / 0  ## =  NaN

1/0 + 1/0 # Inf
1/0 - 1/0 # NaN

stopifnot(
    1/0 == Inf,
    1/Inf == 0
)
sin(Inf)
cos(Inf)
tan(Inf)
Checks whether its argument is a (primitive) function.
is.function(x)
is.primitive(x)
x |
an R object. |
is.primitive(x)
tests if x
is a primitive function,
i.e., if typeof(x)
is either "builtin"
or
"special"
.
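The relation between is.primitive and typeof can be seen on three familiar functions. This is a small sketch; the choice of functions is only for illustration.

```r
typeof(sum)    # "builtin"
typeof(`if`)   # "special"
typeof(mean)   # "closure"

is.primitive(sum)    # TRUE
is.primitive(`if`)   # TRUE
is.primitive(mean)   # FALSE, although is.function(mean) is TRUE
```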
TRUE
if x
is a (primitive) function, and FALSE
otherwise.
is.function(1) # FALSE
is.function (is.primitive) # TRUE: it is a function, but ..
is.primitive(is.primitive) # FALSE: it's not a primitive one, whereas
is.primitive(is.function)  # TRUE: that one *is*
is.language
returns TRUE
if x
is a
variable name
, a call
, or an
expression
.
is.language(x)
x |
object to be tested. |
A name
is also known as ‘symbol’, from its type
(typeof
), see is.symbol
.
If typeof(x) == "language"
, then is.language(x)
is always true, but the reverse does not hold as expressions or
names y
also fulfill is.language(y)
, see the examples.
This is a primitive function.
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
ll <- list(a = expression(x^2 - 2*x + 1), b = as.name("Jim"),
           c = as.expression(exp(1)), d = call("sin", pi))
sapply(ll, typeof)
sapply(ll, mode)
stopifnot(sapply(ll, is.language))
A function mostly for internal use. It returns TRUE
if the
object x
has the R internal OBJECT
bit set, and
FALSE
otherwise. The OBJECT
bit is set when a
"class"
attribute is added and removed when that attribute is
removed, so this is a very efficient way to check if an object has a
class attribute. (S4 objects always should.)
Note that typical basic (‘atomic’, see is.atomic
)
R vectors and arrays x
are not objects in the above
sense as attributes(x)
does not contain "class"
.
is.object(x)
x |
object to be tested. |
This is a primitive function.
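The following sketch shows the bit following the "class" attribute as it is added and removed. The class name "myclass" is arbitrary. It also shows that an implicit class, such as that of a matrix, does not make something an object in this sense.

```r
x <- 1:3
is.object(x)          # FALSE: no class attribute
class(x) <- "myclass"
is.object(x)          # TRUE
x <- unclass(x)
is.object(x)          # FALSE again

m <- matrix(1:4, 2)
class(m)              # implicit class only, attributes(m) has no "class"
is.object(m)          # FALSE
```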
isS4
.
is.object(1) # FALSE
is.object(as.factor(1:3)) # TRUE
is.atomic
returns TRUE
if x
is of an atomic type
and FALSE
otherwise.
is.recursive
returns TRUE
if x
has a recursive
(list-like) structure and FALSE
otherwise.
is.atomic(x)
is.recursive(x)
x |
object to be tested. |
is.atomic
is true for the atomic types
("logical"
, "integer"
, "numeric"
,
"complex"
, "character"
and "raw"
).
Most types of objects are regarded as recursive. Exceptions are the atomic
types, NULL
, symbols (as given by as.name
),
S4
objects with slots, external pointers, and—rarely visible
from R—weak references and byte code, see typeof
.
It is common to call the atomic types ‘atomic vectors’, but
note that is.vector
imposes further restrictions: an
object can be atomic but not a vector (in that sense).
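A short sketch of an object that is atomic but not a vector in the sense of is.vector. The "units" attribute is made up for illustration.

```r
x <- structure(1:3, units = "cm")   # an attribute other than names
is.atomic(x)        # TRUE
is.vector(x)        # FALSE

m <- matrix(1:4, 2)
is.atomic(m)        # TRUE
is.vector(m)        # FALSE: it has a "dim" attribute

is.vector(c(a = 1)) # TRUE: names are allowed
```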
These are primitive functions.
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
is.list
,
is.language
, etc,
and the demo("is.things")
.
require(stats)

is.a.r <- function(x) c(is.atomic(x), is.recursive(x))

is.a.r(c(a = 1, b = 3)) # TRUE FALSE
is.a.r(list())          # FALSE TRUE - a list is a list
is.a.r(list(2))         # FALSE TRUE
is.a.r(lm)              # FALSE TRUE
is.a.r(y ~ x)           # FALSE TRUE
is.a.r(expression(x+1)) # FALSE TRUE
is.a.r(quote(exp))      # FALSE FALSE
is.a.r(NULL)            # FALSE FALSE
is.single
reports an error. There are no single precision
values in R.
is.single(x)
x |
object to be tested. |
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
Test if an object is not sorted (in increasing order), without the cost of sorting it.
is.unsorted(x, na.rm = FALSE, strictly = FALSE)
x |
an R object with a class or a numeric, complex, character, logical or raw vector. |
na.rm |
logical. Should missing values be removed before checking? |
strictly |
logical indicating if the check should be for strictly increasing values. |
is.unsorted
is generic: you can write methods to handle
specific classes of objects, see InternalMethods.
A length-one logical value. All objects of length 0 or 1 are sorted.
Otherwise, the result will be NA
except for atomic vectors and
objects with an S3 class (where the >=
or >
method is
used to compare x[i]
with x[i-1]
for i
in
2:length(x)
) or with an S4 class where you have to provide a
method for is.unsorted()
.
This function is designed for objects with one-dimensional indices, as described above. Data frames, matrices and other arrays may give surprising results.
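A few calls illustrating the arguments. This is a minimal sketch on numeric vectors.

```r
is.unsorted(c(1, 2, 2, 3))                    # FALSE: non-decreasing
is.unsorted(c(1, 2, 2, 3), strictly = TRUE)   # TRUE : tie between 2 and 2
is.unsorted(c(3, 1, 2))                       # TRUE
is.unsorted(c(1, NA, 3))                      # NA   : cannot be decided
is.unsorted(c(1, NA, 3), na.rm = TRUE)        # FALSE
is.unsorted(numeric(0))                       # FALSE: length 0 is sorted
```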
Convenience wrappers to create date-times from numeric representations.
ISOdatetime(year, month, day, hour, min, sec, tz = "")
ISOdate(year, month, day, hour = 12, min = 0, sec = 0, tz = "GMT")
year , month , day
|
numerical values to specify a day. |
hour , min , sec
|
numerical values for a time within a day. Fractional seconds are allowed. |
tz |
a time zone specification to be used for the conversion.
|
ISOdatetime
and ISOdate
are convenience wrappers for
strptime
that differ only in their defaults and that
ISOdate
sets UTC as the time zone. For dates without times it
would normally be better to use the "Date"
class.
The main arguments will be recycled using the usual recycling rules.
Because these make use of strptime
, only years in the
range 0:9999
are accepted.
An object of class "POSIXct"
.
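A brief sketch of both wrappers. The dates are arbitrary, and the time zone is given explicitly so that the result does not depend on the session's time zone.

```r
t1 <- ISOdatetime(2024, 6, 15, 17, 27, 47, tz = "UTC")
format(t1, "%Y-%m-%d %H:%M:%S")   # "2024-06-15 17:27:47"
class(t1)                         # "POSIXct" "POSIXt"

t2 <- ISOdate(2024, 6, 15)        # defaults to 12:00:00 GMT
format(t2, "%H:%M")               # "12:00"

ISOdate(2024, 1:3, 1)             # arguments are recycled: three date-times
ISOdate(2024, 2, 30)              # NA: not a valid calendar date
```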
DateTimeClasses for details of the date-time classes;
strptime
for conversions from character strings.
Tests whether the object is an instance of an S4 class.
isS4(object)

asS4(object, flag = TRUE, complete = TRUE)
asS3(object, flag = TRUE, complete = TRUE)
object |
Any R object. |
flag |
Optional, logical: indicate direction of conversion. |
complete |
Optional, logical: whether conversion to S3 is completed. Not usually needed, but see the details section. |
Note that isS4
does not rely on the methods
package, so in particular it can be used to detect the need to
require
that package.
asS3
uses the value of
complete
to control whether an attempt is made to transform
object
into a valid object of the implied S3 class. If
complete
is TRUE
,
then an object from an S4 class extending an S3 class will be
transformed into an S3 object with the corresponding S3 class (see
S3Part
). This includes classes extending the
pseudo-classes array
and matrix
: such objects will have
their class attribute set to NULL
.
isS4
is primitive.
isS4
always returns TRUE
or FALSE
according to
whether the internal flag marking an S4 object has been turned on for
this object.
asS4
and asS3
will turn this flag on or off,
and asS3
will set the class from the objects .S3Class
slot if one exists. Note that asS3
will not turn
the object into an S3 object
unless there is a valid conversion; that is, an object of type other
than "S4"
for which the S4 object is an extension, unless
argument complete
is FALSE
.
is.object
for a more general test; Introduction
for general information on S4; Classes_Details for more on S4
class definitions.
isS4(pi) # FALSE
isS4(getClass("MethodDefinition")) # TRUE
Generic function to test if object
is symmetric or not.
Currently only a matrix method is implemented, where a
complex
matrix Z
must be “Hermitian” for
isSymmetric(Z)
to be true.
isSymmetric(object, ...)
## S3 method for class 'matrix'
isSymmetric(object, tol = 100 * .Machine$double.eps,
            tol1 = 8 * tol, ...)
object |
any R object; a |
tol |
numeric scalar >= 0. Smaller differences are not
considered, see |
tol1 |
numeric scalar >= 0. |
... |
further arguments passed to methods; the matrix method
passes these to |
The matrix
method is used inside eigen
by
default to test symmetry of matrices up to rounding error, using
all.equal
. It might not be appropriate in all
situations.
Note that a matrix m
is only symmetric if its rownames
and
colnames
are identical. Consider using unname(m)
.
logical indicating if object
is symmetric or not.
eigen
which calls isSymmetric
when its
symmetric
argument is missing.
isSymmetric(D3 <- diag(3)) # -> TRUE

D3[2, 1] <- 1e-100
D3
isSymmetric(D3) # TRUE
isSymmetric(D3, tol = 0) # FALSE for zero-tolerance

## Complex Matrices - Hermitian or not
Z <- sqrt(matrix(-1:2 + 0i, 2)); Z <- t(Conj(Z)) %*% Z
Z
isSymmetric(Z)      # TRUE
isSymmetric(Z + 1)  # TRUE
isSymmetric(Z + 1i) # FALSE -- a Hermitian matrix has a *real* diagonal

colnames(D3) <- c("X", "Y", "Z")
isSymmetric(D3)                         # FALSE (as row and column names differ)
isSymmetric(D3, check.attributes=FALSE) # TRUE  (as names are not checked)
Add a small amount of noise to a numeric vector.
jitter(x, factor = 1, amount = NULL)
x |
numeric vector to which jitter should be added. |
factor |
numeric. |
amount |
numeric; if positive, used as amount (see below),
otherwise, if Default ( |
The result, say r
, is r <- x + runif(n, -a, a)
where n <- length(x)
and a
is the amount
argument (if specified).
Let z <- max(x) - min(x)
(assuming the usual case).
The amount a
to be added is either provided as positive
argument amount
or otherwise computed from z
, as
follows:
If amount == 0
, we set a <- factor * z/50
(same as S).
If amount
is NULL
(default), we set
a <- factor * d/5
where d is the smallest
difference between adjacent unique (apart from fuzz) x
values.
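The two rules for a can be worked through by hand on a small vector. This sketch ignores the rounding used internally to remove fuzz and assumes factor = 1. The tiny tolerance only guards against floating-point rounding in x + noise - x.

```r
x <- c(1, 1, 2, 4)
z <- max(x) - min(x)                # 3

## amount == 0 : a = factor * z/50
a0 <- 1 * z / 50                    # 0.06
all(abs(jitter(x, amount = 0) - x) <= a0 + 1e-12)   # TRUE

## amount NULL : a = factor * d/5, d = smallest gap between unique values
d  <- min(diff(sort(unique(x))))    # 1
aN <- 1 * d / 5                     # 0.2
all(abs(jitter(x) - x) <= aN + 1e-12)               # TRUE
```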
jitter(x, ...)
returns a numeric of the same length as
x
, but with an amount
of noise added in order to break
ties.
Werner Stahel and Martin Maechler, ETH Zurich
Chambers, J. M., Cleveland, W. S., Kleiner, B. and Tukey, P.A. (1983) Graphical Methods for Data Analysis. Wadsworth; figures 2.8, 4.22, 5.4.
Chambers, J. M. and Hastie, T. J. (1992) Statistical Models in S. Wadsworth & Brooks/Cole.
rug
which you may want to combine with jitter
.
round(jitter(c(rep(1, 3), rep(1.2, 4), rep(3, 3))), 3)
## These two 'fail' with S-plus 3.x:
jitter(rep(0, 7))
jitter(rep(10000, 5))
The condition number of a regular (square) matrix is the product of the norm of the matrix and the norm of its inverse (or pseudo-inverse), and hence depends on the kind of matrix-norm.
kappa()
computes by default (an estimate of) the 2-norm
condition number of a matrix or of the matrix of a
decomposition, perhaps of a linear fit. The 2-norm condition number
can be shown to be the ratio of the largest to the smallest
non-zero singular value of the matrix.
rcond()
computes an approximation of the reciprocal
condition number, see the details.
kappa(z, ...)
## Default S3 method:
kappa(z, exact = FALSE,
      norm = NULL, method = c("qr", "direct"),
      inv_z = solve(z),
      triangular = FALSE, uplo = "U", ...)

## S3 method for class 'lm'
kappa(z, ...)
## S3 method for class 'qr'
kappa(z, ...)

.kappa_tri(z, exact = FALSE, LINPACK = TRUE, norm = NULL, uplo = "U", ...)

rcond(x, norm = c("O","I","1"), triangular = FALSE, uplo = "U", ...)
z , x
|
a numeric or complex matrix or a result of
|
exact |
logical. Should the result be exact (up to small rounding error) as opposed to fast (but quite inaccurate)? |
norm |
character string, specifying the matrix norm with respect
to which the condition number is to be computed, see the function
|
method |
a partially matched character string specifying the method to be used;
|
inv_z |
for |
triangular |
logical. If true, the matrix used is just the upper or
lower triangular part of |
uplo |
character string, either |
LINPACK |
logical. If true and |
... |
further arguments passed to or from other methods;
for |
For kappa()
, if exact = FALSE
(the default) the
condition number is estimated by a cheap approximation to the 1-norm of
the triangular matrix of the
qr(x)
decomposition
. However, the exact 2-norm calculation (via
svd
) is also likely to be quick enough.
Note that the approximate 1- and Inf-norm condition numbers via
method = "direct"
are much faster to
calculate, and rcond()
computes these reciprocal
condition numbers, also for complex matrices, using standard LAPACK
routines.
Currently, also the kappa*()
functions compute these
approximations whenever exact
is false, i.e., by default.
kappa
and rcond
are different interfaces to
partly identical functionality.
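The relation between the quantities can be sketched on a small, well-conditioned matrix chosen for illustration. The exact 2-norm condition number is the ratio of extreme singular values, and rcond() returns an estimate of the reciprocal of the 1-norm condition number, which is computed here from its definition for comparison.

```r
m <- matrix(c(2, 1, 1, 3), 2)

## exact 2-norm condition number = ratio of extreme singular values
sv <- svd(m)$d
k2 <- max(sv) / min(sv)
all.equal(kappa(m, exact = TRUE), k2)      # TRUE

## 1-norm condition number from its definition
k1 <- norm(m, "1") * norm(solve(m), "1")   # 3.2
## LAPACK-based estimate of the same quantity via its reciprocal
1 / rcond(m)
```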
.kappa_tri
is an internal function called by kappa.qr
and
kappa.default
; tri
is for triangular and its methods
only consider the upper or lower triangular part of the matrix, depending
on uplo = "U"
or "L"
, where "U"
was internally hard
wired before R 4.4.0.
Unsuccessful results from the underlying LAPACK code will result in an error giving a positive error code: these can only be interpreted by detailed study of the FORTRAN code.
The condition number, kappa, or an approximation if
exact = FALSE
.
The design was inspired by (but differs considerably from) the S function of the same name described in Chambers (1992).
The LAPACK routines DTRCON
and ZTRCON
and the LINPACK
routine DTRCO
.
LAPACK and LINPACK are from https://netlib.org/lapack/ and https://netlib.org/linpack/ and their guides are listed in the references.
Anderson, E. and ten others (1999)
LAPACK Users' Guide. Third Edition. SIAM.
Available on-line at
https://netlib.org/lapack/lug/lapack_lug.html.
Chambers, J. M. (1992) Linear models. Chapter 4 of Statistical Models in S eds J. M. Chambers and T. J. Hastie, Wadsworth & Brooks/Cole.
Dongarra, J. J., Bunch, J. R., Moler, C. B. and Stewart, G. W. (1978) LINPACK Users Guide. Philadelphia: SIAM Publications.
norm
;
svd
for the singular value decomposition and
qr
for the QR one.
kappa(x1 <- cbind(1, 1:10)) # 15.71
kappa(x1, exact = TRUE)        # 13.68
kappa(x2 <- cbind(x1, 2:11)) # high! [x2 is singular!]

hilbert <- function(n) { i <- 1:n; 1 / outer(i - 1, i, `+`) }
sv9 <- svd(h9 <- hilbert(9))$ d
kappa(h9)  # pretty high; by default {exact=FALSE, method="qr"} :
kappa(h9) == kappa(qr.R(qr(h9)), norm = "1")
all.equal(kappa(h9, exact = TRUE), # its definition:
          max(sv9) / min(sv9),
          tolerance = 1e-12) ## the same (typically down to 2.22e-16)
kappa(h9, exact = TRUE) / kappa(h9) # 0.677 (i.e., rel.error = 32%)

## Exact kappa for rectangular matrix
## panmagic.6npm1(7) :
pm7 <- rbind(c( 1, 13, 18, 23, 35, 40, 45),
             c(37, 49,  5, 10, 15, 27, 32),
             c(24, 29, 41, 46,  2, 14, 19),
             c(11, 16, 28, 33, 38, 43,  6),
             c(47,  3,  8, 20, 25, 30, 42),
             c(34, 39, 44,  7, 12, 17, 22),
             c(21, 26, 31, 36, 48,  4,  9))

kappa(pm7, exact=TRUE, norm="1") # no problem for square matrix

m76 <- pm7[,1:6]
(m79 <- cbind(pm7, 50:56, 63:57))

## Moore-Penrose inverse { ~= MASS::ginv(); differing tol (value & meaning)}:
## pinv := p(seudo) inv(erse)
pinv <- function(X, s = svd(X), tol = 64*.Machine$double.eps) {
    if (is.complex(X))
        s$u <- Conj(s$u)
    dx <- dim(X)
    ## X = U D V' ==> Result =  V {1/D} U'
    pI <- function(u,d,v) tcrossprod(v, u / rep(d, each = dx[1L]))
    pos <- (d <- s$d) > max(tol * max(dx) * d[1L], 0)
    if (all(pos)) pI(s$u, d, s$v)
    else if (!any(pos)) array(0, dx[2L:1L])
    else { # some pos, some not:
        i <- which(pos)
        pI(s$u[, i, drop = FALSE], d[i],
           s$v[, i, drop = FALSE])
    }
}

## rectangular
kappa(m76, norm="1")
try( kappa(m76, exact=TRUE, norm="1") )# error in solve().. must be square

## ==> use pseudo-inverse instead of solve() for rectangular {and norm != "2"}:
iZ <- pinv(m76)
kappa(m76, exact=TRUE, norm="1", inv_z = iZ)
kappa(m76, exact=TRUE, norm="M", inv_z = iZ)
kappa(m76, exact=TRUE, norm="I", inv_z = iZ)

iX <- pinv(m79)
kappa(m79, exact=TRUE, norm="1", inv_z = iX)
kappa(m79, exact=TRUE, norm="M", inv_z = iX)
kappa(m79, exact=TRUE, norm="I", inv_z = iX)
Computes the generalised Kronecker product of two arrays, X and Y.
kronecker(X, Y, FUN = "*", make.dimnames = FALSE, ...)
X %x% Y
X |
a vector or array. |
Y |
a vector or array. |
FUN |
a function; it may be a quoted string. |
make.dimnames |
logical: provide dimnames that are the product of the
dimnames of |
... |
optional arguments to be passed to |
If X and Y do not have the same number of dimensions, the smaller array is padded with dimensions of size one. The returned array comprises submatrices constructed by taking X one term at a time and expanding that term as FUN(x, Y, ...).
%x% is an alias for kronecker (where FUN is hardwired to "*").
An array A with dimensions dim(X) * dim(Y).
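The dimension rule and the block structure described above can be checked directly. This is a minimal sketch; the matrices X and Y are illustrative values chosen here, not part of the original examples.

```r
X <- matrix(1:6, nrow = 2)    # 2 x 3
Y <- matrix(1:4, nrow = 2)    # 2 x 2
A <- kronecker(X, Y)
dim(A)                         # elementwise product dim(X) * dim(Y)

# Each block of A is one term of X multiplied by Y
top_left     <- A[1:2, 1:2]    # block for X[1, 1]
bottom_right <- A[3:4, 5:6]    # block for X[2, 3]

# A plain vector is padded with a trailing dimension of size one
B <- kronecker(X, c(10, 100))
dim(B)
```

With the default FUN, X %x% Y gives the same result as kronecker(X, Y).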
Jonathan Rougier
Shayle R. Searle (1982) Matrix Algebra Useful for Statistics. John Wiley and Sons.
outer, on which kronecker is built and %*% for usual matrix multiplication.
# simple scalar multiplication
( M <- matrix(1:6, ncol = 2) )
kronecker(4, M)
# Block diagonal matrix:
kronecker(diag(1, 3), M)
# ask for dimnames
fred <- matrix(1:12, 3, 4, dimnames = list(LETTERS[1:3], LETTERS[4:7]))
bill <- c("happy" = 100, "sad" = 1000)
kronecker(fred, bill, make.dimnames = TRUE)
bill <- outer(bill, c("cat" = 3, "dog" = 4))
kronecker(fred, bill, make.dimnames = TRUE)
Report on localization information.
l10n_info()
‘A Latin-1 locale’ includes supersets (for printable characters) such as Windows codepage 1252 but not Latin-9 (ISO 8859-15).
On Windows (where the resulting list additionally contains codepage and system.codepage components), common codepages are 1252 (Western European), 1250 (Central European), 1251 (Cyrillic), 1253 (Greek), 1254 (Turkish), 1255 (Hebrew), 1256 (Arabic), 1257 (Baltic), 1258 (Vietnamese), 874 (Thai), 932 (Japanese), 936 (Simplified Chinese), 949 (Korean) and 950 (Traditional Chinese). Codepage 28605 is Latin-9 and 65001 is UTF-8 (where supported). R does not allow the C locale, and uses 1252 as the default codepage.
A list with three logical elements and further OS-specific elements:
MBCS |
Is a multi-byte character set in use? |
UTF-8 |
Is this known to be a UTF-8 locale? |
Latin-1 |
Is this known to be a Latin-1 locale? |
Not on Windows:
codeset |
character. The encoding name as reported by the OS,
possibly |
Only on Windows:
codepage |
integer: the Windows codepage corresponding to the locale R is using (and not necessarily that Windows is using). |
system.codepage |
integer: the Windows system/ANSI codepage (the codepage Windows is using). Added in R 4.1.0. |
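The components listed above can be inspected by name. This is a small sketch; the variable names info and is_utf8 are illustrative, and the OS-specific components are only looked at if they are present.

```r
info <- l10n_info()
names(info)                    # always includes "MBCS", "UTF-8", "Latin-1"

is_utf8 <- isTRUE(info[["UTF-8"]])
if (is_utf8) {
  message("Running in a UTF-8 locale")
}

# OS-specific elements: 'codeset' (not on Windows), 'codepage' (Windows only)
if (!is.null(info$codeset))  print(info$codeset)
if (!is.null(info$codepage)) print(info$codepage)
```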
l10n_info()
Report the name of the shared object file with LAPACK implementation in use.
La_library()
A character vector of length one ("" when the name is not known).
The value can be used as an indication of which LAPACK implementation is in use. Typically, the R version of LAPACK will appear as libRlapack.so (libRlapack.dylib), depending on how R was built. Note that libRlapack.so (libRlapack.dylib) may also be shown for an external LAPACK implementation that had been copied, hard-linked or renamed by the system administrator. Otherwise, the shared object file will be given and its path/name may indicate the vendor/version.
The detection does not work on Windows, nor for the Accelerate framework on macOS, nor in the rare (and unsupported) case of a static external library.
It is possible to build R against an enhanced BLAS which contains some but not all LAPACK routines, in which case this function reports the library containing routine ILAVER.
extSoftVersion for versions of other third-party software including BLAS.
La_version for the version of LAPACK in use.
La_library()
Report the version of LAPACK in use.
La_version()
A character vector of length one.
Note that this is the version as reported by the library at runtime. It may differ from the reference (‘netlib’) implementation, for example by having some optimized or patched routines. For the version included with R, the older (not Fortran 90) versions of DLARTG, DLASSQ, ZLARTG and ZLASSQ are used.
extSoftVersion for versions of other third-party software.
La_library for binary/executable file with LAPACK in use.
La_version()
Find a suitable set of labels from an object for use in printing or plotting, for example. A generic function.
labels(object, ...)
object |
any R object: the function is generic. |
... |
further arguments passed to or from other methods. |
A character vector or list of such vectors. For a vector the result is the names or seq_along(x) and for a data frame or array it is the dimnames (with NULL expanded to seq_len(d[i])).
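The behaviour of the default method described above can be illustrated as follows. This is a minimal sketch with made-up inputs; the variable names are illustrative only.

```r
named   <- labels(c(a = 1, b = 2))    # names are used when present
unnamed <- labels(10:12)              # otherwise the positions, as character
m_lab   <- labels(matrix(1:4, 2, 2))  # a list with one vector per dimension
```

For the matrix without dimnames, each NULL component is expanded to the positions along that dimension.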
Chambers, J. M. and Hastie, T. J. (1992) Statistical Models in S. Wadsworth & Brooks/Cole.
lapply returns a list of the same length as X, each element of which is the result of applying FUN to the corresponding element of X.
sapply is a user-friendly version and wrapper of lapply by default returning a vector, matrix or, if simplify = "array", an array if appropriate, by applying simplify2array(). sapply(x, f, simplify = FALSE, USE.NAMES = FALSE) is the same as lapply(x, f).
vapply is similar to sapply, but has a pre-specified type of return value, so it can be safer (and sometimes faster) to use.
replicate is a wrapper for the common use of sapply for repeated evaluation of an expression (which will usually involve random number generation).
simplify2array() is the utility called from sapply() when simplify is not false and is similarly called from mapply().
lapply(X, FUN, ...)
sapply(X, FUN, ..., simplify = TRUE, USE.NAMES = TRUE)
vapply(X, FUN, FUN.VALUE, ..., USE.NAMES = TRUE)
replicate(n, expr, simplify = "array")
simplify2array(x, higher = TRUE, except = c(0L, 1L))
X |
a vector (atomic or list) or an |
FUN |
the function to be applied to each element of |
... |
optional arguments to |
simplify |
logical or character string; should the result be
simplified to a vector, matrix or higher dimensional array if
possible? For |
USE.NAMES |
logical; if |
FUN.VALUE |
a (generalized) vector; a template for the return value from FUN. See ‘Details’. |
n |
integer: the number of replications. |
expr |
the expression (a language object, usually a call) to evaluate repeatedly. |
x |
a list, typically returned from |
higher |
logical; if true, |
except |
integer vector or |
FUN is found by a call to match.fun and typically is specified as a function or a symbol (e.g., a backquoted name) or a character string specifying a function to be searched for from the environment of the call to lapply.
Function FUN must be able to accept as input any of the elements of X. If the latter is an atomic vector, FUN will always be passed a length-one vector of the same type as X.
Arguments in ... cannot have the same name as any of the other arguments, and care may be needed to avoid partial matching to FUN. In general-purpose code it is good practice to name the first two arguments X and FUN if ... is passed through: this both avoids partial matching to FUN and ensures that a sensible error message is given if arguments named X or FUN are passed through ....
Simplification in sapply is only attempted if X has length greater than zero and if the return values from all elements of X are all of the same (positive) length. If the common length is one the result is a vector, and if greater than one is a matrix with a column corresponding to each element of X.
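The simplification rules for sapply described above can be seen in a few small cases. This is a sketch with illustrative inputs; the variable names are not part of the original examples.

```r
# Common length one: the result is a vector
v <- sapply(1:3, function(i) i^2)

# Common length two: a matrix with one column per element of X
m <- sapply(1:3, function(i) c(i, i^2))

# Differing lengths: no simplification, a list is returned
r <- sapply(1:3, seq_len)

# Zero-length X: simplification is not attempted, an empty list is returned
e <- sapply(integer(0), function(i) i^2)
```

Because the shape of the result depends on the data, code that needs a guaranteed return type is usually better served by vapply.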
Simplification is always done in vapply. This function checks that all values of FUN are compatible with the FUN.VALUE, in that they must have the same length and type. (Types may be promoted to a higher type within the ordering logical < integer < double < complex, but not demoted.)
Users of S4 classes should pass a list to lapply and vapply: the internal coercion is done by the as.list in the base namespace and not one defined by a user (e.g., by setting S4 methods on the base function).
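The length and type checks that vapply performs against FUN.VALUE, as described above, can be sketched as follows. The inputs and variable names are illustrative assumptions, not taken from the original examples.

```r
# Integer results are promoted to the double template
promoted <- vapply(1:3, function(i) i * 2L, numeric(1))

# Double results cannot be demoted to an integer template: this is an error
demoted <- tryCatch(
  vapply(1:3, function(i) i + 0.5, integer(1)),
  error = function(e) "error"
)

# A named template of length two gives a 2 x length(X) matrix
rng <- vapply(1:3, function(i) c(min = i, max = i^2), c(min = 0, max = 0))
```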
For lapply, sapply(simplify = FALSE) and replicate(simplify = FALSE), a list.
For sapply(simplify = TRUE) and replicate(simplify = TRUE): if X has length zero or n = 0, an empty list. Otherwise an atomic vector or matrix or list of the same length as X (of length n for replicate). If simplification occurs, the output type is determined from the highest type of the return values in the hierarchy NULL < raw < logical < integer < double < complex < character < list < expression, after coercion of pairlists to lists.
vapply returns a vector or array of type matching the FUN.VALUE. If length(FUN.VALUE) == 1 a vector of the same length as X is returned, otherwise an array. If FUN.VALUE is not an array, the result is a matrix with length(FUN.VALUE) rows and length(X) columns, otherwise an array a with dim(a) == c(dim(FUN.VALUE), length(X)).
The (Dim)names of the array value are taken from the FUN.VALUE if it is named, otherwise from the result of the first function call. Column names of the matrix or more generally the names of the last dimension of the array value or names of the vector value are set from X as in sapply.
sapply(*, simplify = FALSE, USE.NAMES = FALSE) is equivalent to lapply(*).
For historical reasons, the calls created by lapply are unevaluated, and code has been written (e.g., bquote) that relies on this. This means that the recorded call is always of the form FUN(X[[i]], ...), with i replaced by the current (integer or double) index. This is not normally a problem, but it can be if FUN uses sys.call or match.call or if it is a primitive function that makes use of the call. This means that it is often safer to call primitive functions with a wrapper, so that e.g. lapply(ll, function(x) is.numeric(x)) is required to ensure that method dispatch for is.numeric occurs correctly.
If expr is a function call, be aware of assumptions about where it is evaluated, and in particular what ... might refer to. You can pass additional named arguments to a function call as additional named arguments to replicate: see ‘Examples’.
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
apply, tapply, mapply for applying a function to multiple arguments, and rapply for a recursive version of lapply(), eapply for applying a function to each entry in an environment.
require(stats); require(graphics)
x <- list(a = 1:10, beta = exp(-3:3), logic = c(TRUE,FALSE,FALSE,TRUE))
# compute the list mean for each list element
lapply(x, mean)
# median and quartiles for each list element
lapply(x, quantile, probs = 1:3/4)
sapply(x, quantile)
i39 <- sapply(3:9, seq) # list of vectors
sapply(i39, fivenum)
vapply(i39, fivenum,
       c(Min. = 0, "1st Qu." = 0, Median = 0, "3rd Qu." = 0, Max. = 0))
## sapply(*, "array") -- artificial example
(v <- structure(10*(5:8), names = LETTERS[1:4]))
f2 <- function(x, y) outer(rep(x, length.out = 3), y)
(a2 <- sapply(v, f2, y = 2*(1:5), simplify = "array"))
a.2 <- vapply(v, f2, outer(1:3, 1:5), y = 2*(1:5))
stopifnot(dim(a2) == c(3,5,4), all.equal(a2, a.2),
          identical(dimnames(a2), list(NULL,NULL,LETTERS[1:4])))
hist(replicate(100, mean(rexp(10))))
## use of replicate() with parameters:
foo <- function(x = 1, y = 2) c(x, y)
# does not work: bar <- function(n, ...) replicate(n, foo(...))
bar <- function(n, x) replicate(n, foo(x = x))
bar(5, x = 3)
The value of the internal evaluation of a top-level R expression is always assigned to .Last.value (in package:base) before further processing (e.g., printing).
.Last.value
The value of a top-level assignment is put in .Last.value, unlike S.
Do not assign to .Last.value in the workspace, because this will always mask the object of the same name in package:base.
## These will not work correctly from example(),
## but they will in make check or if pasted in,
## as example() does not run them at the top level
gamma(1:15)          # think of some intensive calculation...
fac14 <- .Last.value # keep them
library("splines") # returns invisibly
.Last.value    # shows what library(.) above returned
Get or set the length of vectors (including lists) and factors, and of any other R object for which a method has been defined.
length(x)
length(x) <- value
x |
an R object. For replacement, a vector or factor. |
value |
a non-negative integer or double (which will be rounded down). |
Both functions are generic: you can write methods to handle specific classes of objects, see InternalMethods. length<- has a "factor" method.
The replacement form can be used to reset the length of a vector. If a vector is shortened, extra values are discarded and when a vector is lengthened, it is padded out to its new length with NAs (nul for raw vectors).
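The truncation and padding performed by the replacement form can be sketched as follows; the vectors x and r are illustrative values.

```r
x <- 1:5
length(x) <- 3      # shortened: the extra values are discarded
short <- x
length(x) <- 5      # lengthened: padded with NA
padded <- x

r <- as.raw(1:2)
length(r) <- 3      # raw vectors are padded with nul (00)
```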
Both are primitive functions.
The default method for length currently returns a non-negative integer of length 1, except for vectors of more than 2^31 - 1 elements, when it returns a double.
For vectors (including lists) and factors the length is the number of elements. For an environment it is the number of objects in the environment, and NULL has length 0. For expressions and pairlists (including language objects and dot-dot-dot lists) it is the length of the pairlist chain. All other objects (including functions) have length one: note that for functions this differs from S.
The replacement form removes all the attributes of x except its names, which are adjusted (and if necessary extended by "").
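The lengths of non-vector objects and the handling of attributes by the replacement form, both described above, can be checked with a short sketch. The objects used here (e, x and the "unit" attribute) are illustrative only.

```r
e <- new.env()
assign("v", 1, envir = e)
len_env  <- length(e)                 # number of objects in the environment
len_null <- length(NULL)              # 0
len_fun  <- length(function(x) x)     # functions have length one
len_call <- length(quote(f(a, b)))    # function name plus two arguments

x <- c(a = 1, b = 2)
attr(x, "unit") <- "cm"
length(x) <- 3                        # names are kept and extended by ""
```

After the replacement, the "unit" attribute is gone and only the (extended) names remain.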
Package authors have written methods that return a result of length other than one (Formula) and that return a vector of type double (Matrix), even with non-integer values (earlier versions of sets). Where a single double value is returned that can be represented as an integer it is returned as a length-one integer vector.
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
nchar for counting the number of characters in character vectors, lengths for getting the length of every element in a list.
length(diag(4))  # = 16 (4 x 4)
length(options())  # 12 or more
length(y ~ x1 + x2 + x3)  # 3
length(expression(x, {y <- x^2; y+2}, x^y))  # 3
## from example(warpbreaks)
require(stats)
fm1 <- lm(breaks ~ wool * tension, data = warpbreaks)
length(fm1$call)      # 3, lm() and two arguments.
length(formula(fm1))  # 3, ~ lhs rhs
Get the length of each element of a list or atomic vector (is.atomic) as an integer or numeric vector.
lengths(x, use.names = TRUE)
x |
a |
use.names |
logical indicating if the result should inherit the
|
This function loops over x and returns a compatible vector containing the length of each element in x. Effectively, length(x[[i]]) is called for all i, so any methods on length are considered.
lengths is generic: you can write methods to handle specific classes of objects, see InternalMethods.
A non-negative integer of length length(x), except when any element has a length of more than 2^31 - 1 elements, when it returns a double vector. When use.names is true, the names are taken from the names on x, if any.
One raison d'être of lengths(x) is its use as a more efficient version of sapply(x, length) and similar *apply calls to length. This is the reason why x may be an atomic vector, even though lengths(x) is trivial in that case.
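The equivalence with sapply(x, length) mentioned above can be sketched on a small list; the list x and the variable names below are illustrative.

```r
x <- list(a = 1:3, b = letters, c = NULL)

n_named <- lengths(x)                      # names taken from x
n_plain <- lengths(x, use.names = FALSE)   # names dropped
n_apply <- sapply(x, length)               # the slower equivalent

n_atomic <- lengths(1:3)                   # trivially all ones
```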
length for getting the length of any R object.
require(stats)
## summarize by month
l <- split(airquality$Ozone, airquality$Month)
avgOz <- lapply(l, mean, na.rm=TRUE)
## merge result
airquality$avgOz <- rep(unlist(avgOz, use.names=FALSE), lengths(l))
## but this is safer and cleaner, but can be slower
airquality$avgOz <- unsplit(avgOz, airquality$Month)
## should always be true, except when a length does not fit in 32 bits
stopifnot(identical(lengths(l), vapply(l, length, integer(1L))))
## empty lists are not a problem
x <- list()
stopifnot(identical(lengths(x), integer()))
## nor are "list-like" expressions:
lengths(expression(u, v, 1+ 0:9))
## and we should dispatch to length methods
f <- c(rep(1, 3), rep(2, 6), 3)
dates <- split(as.POSIXlt(Sys.time() + 1:10), f)
stopifnot(identical(lengths(dates), vapply(dates, length, integer(1L))))
levels provides access to the levels attribute of a variable. The first form returns the value of the levels of its argument and the second sets the attribute.
levels(x)
levels(x) <- value
x |
an object, for example a factor. |
value |
a valid value for |
Both the extractor and replacement forms are generic and new methods can be written for them. The most important method for the replacement function is that for factors.
For the factor replacement method, a NA in value causes that level to be removed from the levels and the elements formerly with that level to be replaced by NA.
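The effect of an NA in value for the factor replacement method, as described above, can be sketched with a small factor; f is an illustrative object.

```r
f <- factor(c("a", "b", "c"))
levels(f) <- c("a", NA, "c")   # drop level "b"

remaining <- levels(f)         # the levels that are left
values    <- as.character(f)   # elements formerly "b" are now NA
```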
Note that for a factor, replacing the levels via levels(x) <- value is not the same as (and is preferred to) attr(x, "levels") <- value.
The replacement function is primitive.
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
## assign individual levels
x <- gl(2, 4, 8)
levels(x)[1] <- "low"
levels(x)[2] <- "high"
x
## or as a group
y <- gl(2, 4, 8)
levels(y) <- c("low", "high")
y
## combine some levels
z <- gl(3, 2, 12, labels = c("apple", "salad", "orange"))
z
levels(z) <- c("fruit", "veg", "fruit")
z
## same, using a named list
z <- gl(3, 2, 12, labels = c("apple", "salad", "orange"))
z
levels(z) <- list("fruit" = c("apple","orange"),
                  "veg"   = "salad")
z
## we can add levels this way:
f <- factor(c("a","b"))
levels(f) <- c("c", "a", "b")
f
f <- factor(c("a","b"))
levels(f) <- list(C = "C", A = "a", B = "b")
f
Report version of libcurl in use.
libcurlVersion()
A character string, with value the libcurl version in use or "" if none is. If libcurl is available, has attributes
ssl_version |
A character string naming the SSL/TLS implementation
and version, possibly |
libssh_version |
A character string naming the |
protocols |
A character vector of the names of supported protocols, also known as ‘schemes’ when part of a URL. |
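The attributes listed above can be read with attr(). This is a minimal sketch; the variable names are illustrative, and the attributes are only looked at when a libcurl version is actually reported.

```r
v <- libcurlVersion()

if (nzchar(v)) {
  # attributes are only present when libcurl is available
  ssl    <- attr(v, "ssl_version")
  protos <- attr(v, "protocols")
  has_https <- "https" %in% protos
  print(ssl)
  print(has_https)
}
```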
In late 2017 a libcurl installation was seen divided into two libraries, libcurl and libcurl-feature, and the first had been updated but not the second. As the compiled function recording the version was in the latter, the version reported by libcurlVersion was misleading.
extSoftVersion for versions of other third-party software.
curlGetHeaders, download.file and url for functions which (optionally) use libcurl.
https://curl.se/docs/sslcerts.html and https://curl.se/docs/ssl-compared.html for more details on SSL versions (the current standard being known as TLS). Normally libcurl used with R uses SecureTransport on macOS, OpenSSL on Windows and GnuTLS, NSS or OpenSSL on Unix-alikes. (At the time of writing Debian-based Linuxen use GnuTLS and RedHat-based ones use OpenSSL, having previously used NSS.)
libcurlVersion()
.libPaths gets/sets the library trees within which packages are looked for.
.libPaths(new, include.site = TRUE)
.Library
.Library.site
new |
a character vector with the locations of R library
trees. Tilde expansion ( |
include.site |
a logical value indicating whether the value of
|
.Library is a character string giving the location of the default library, the ‘library’ subdirectory of R_HOME.
.Library.site is a (possibly empty) character vector giving the locations of the site libraries.
.libPaths is used for getting or setting the library trees that R knows about and hence uses when looking for packages (the library search path). If called with argument new, by default, the library search path is set to the existing directories in unique(c(new, .Library.site, .Library)) and this is returned. If include.site is FALSE when the new argument is set, .Library.site is not added to the new library search path. If called without the new argument, a character vector with the currently active library trees is returned.
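Getting and setting the library search path as described above can be sketched as follows. The temporary directory extra is a hypothetical library tree created only for illustration, and the original search path is restored at the end.

```r
old <- .libPaths()                 # currently active library trees

extra <- tempfile("Rlib")          # hypothetical extra library tree
dir.create(extra)

.libPaths(c(extra, old))           # prepend it; only existing directories are kept
paths <- .libPaths()

.libPaths(old)                     # restore the previous search path
unlink(extra, recursive = TRUE)
```

Since the paths are normalized, compare against normalizePath(extra, winslash = "/") rather than the raw string when checking whether a directory was added.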
How paths in new with a trailing slash are treated is OS-dependent. On a POSIX filesystem existing directories can usually be specified with a trailing slash. On Windows filepaths with a trailing slash (or backslash) are invalid and existing directories specified with a trailing slash may not be added to the library search path.
At startup, the library search path is initialized from the environment variables R_LIBS, R_LIBS_USER and R_LIBS_SITE, which if set should give lists of directories where R library trees are rooted, colon-separated on Unix-alike systems and semicolon-separated on Windows. For the latter two, a value of NULL indicates an empty list of directories. (Note that as from R 4.2.0, both are set by R start-up code if not already set or empty so can be interrogated from an R session to find their defaults: in earlier versions this was true only for R_LIBS_USER.)
First, .Library.site is initialized from R_LIBS_SITE. If this is unset or empty, the ‘site-library’ subdirectory of R_HOME is used. Only directories which exist at the time of initialization are retained. Then, .libPaths() is called with the combination of the directories given by R_LIBS and R_LIBS_USER. By default R_LIBS is unset, and if R_LIBS_USER is unset or empty, it is set to directory ‘R/R.version$platform-library/x.y’ of the home directory on Unix-alike systems (or ‘Library/R/m/x.y/library’ for CRAN macOS builds, with m given by Sys.info()["machine"]) and ‘R/win-library/x.y’ subdirectory of LOCALAPPDATA on Windows, for R x.y.z.
Both R_LIBS_USER and R_LIBS_SITE feature possible expansion of specifiers for R-version-specific information as part of the startup process. The possible conversion specifiers all start with a ‘%’ and are followed by a single letter (use ‘%%’ to obtain ‘%’), with currently available conversion specifications as follows:
%V: R version number including the patch level (e.g., ‘2.5.0’).
%v: R version number excluding the patch level (e.g., ‘2.5’).
%p: the platform for which R was built, the value of R.version$platform.
%o: the underlying operating system, the value of R.version$os.
%a: the architecture (CPU) R was built on/for, the value of R.version$arch.
(See version for details on R version information.) In addition, ‘%U’ and ‘%S’ expand to the R defaults for, respectively, R_LIBS_USER and R_LIBS_SITE.
Function .libPaths always uses the values of .Library and .Library.site in the base namespace. .Library.site can be set by the site in ‘Rprofile.site’, which should be followed by a call to .libPaths(.libPaths()) to make use of the updated value.
For consistency, the paths are always normalized by normalizePath(winslash = "/").
LOCALAPPDATA (usually C:\Users\username\AppData\Local) on Windows is a hidden directory and may not be viewed by some software. It may be opened by shell.exec(Sys.getenv("LOCALAPPDATA")).
A character vector of file paths.
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
.libPaths() # all library trees R knows about
library and require load and attach add-on packages.
library(package, help, pos = 2, lib.loc = NULL,
        character.only = FALSE, logical.return = FALSE,
        warn.conflicts, quietly = FALSE,
        verbose = getOption("verbose"),
        mask.ok, exclude, include.only,
        attach.required = missing(include.only))
require(package, lib.loc = NULL, quietly = FALSE,
        warn.conflicts,
        character.only = FALSE,
        mask.ok, exclude, include.only,
        attach.required = missing(include.only))
conflictRules(pkg, mask.ok = NULL, exclude = NULL)
package , help
|
the name of a package, given as a name or
literal character string, or a character string, depending on
whether |
pos |
the position on the search list at which to attach the
loaded namespace. Can also be the name of a position on the current
search list as given by |
lib.loc |
a character vector describing the location of R
library trees to search through, or |
character.only |
a logical indicating whether |
logical.return |
logical. If it is |
warn.conflicts |
logical. If |
verbose |
a logical. If |
quietly |
a logical. If |
pkg |
character string naming a package. |
mask.ok |
character vector of names of objects that can mask objects on the search path without signaling an error when strict conflict checking is enabled. |
exclude , include.only
|
character vector of names of objects to
exclude or include in the attached frame. Only one of these arguments
may be used in a call to |
attach.required |
logical specifying whether required packages
listed in the |
library(package) and require(package) both load the namespace of the package with name package and attach it on the search list. require is designed for use inside other functions; it returns FALSE and gives a warning (rather than an error as library() does by default) if the package does not exist. Both functions check and update the list of currently attached packages and do not reload a namespace which is already loaded. (If you want to reload such a package, call detach(unload = TRUE) or unloadNamespace first.) If you want to load a package without attaching it on the search list, see requireNamespace.
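The difference in failure behaviour between library and require described above can be sketched as follows. The package name surelyNotInstalledPkg is a made-up name assumed not to be installed; stats is used because it ships with R.

```r
missing_pkg <- "surelyNotInstalledPkg"   # hypothetical, assumed not installed

# require() returns FALSE (with a warning) instead of stopping
ok <- suppressWarnings(
  require(missing_pkg, character.only = TRUE, quietly = TRUE)
)

# library() signals an error by default
res <- tryCatch(
  library(missing_pkg, character.only = TRUE),
  error = function(e) "error"
)

# Loading a namespace without attaching it to the search list
has_stats <- requireNamespace("stats", quietly = TRUE)
```

Inside a function, a pattern such as if (!requireNamespace("pkg", quietly = TRUE)) stop(...) is one way to test for an optional package without changing the search list.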
To suppress messages during the loading of packages use suppressPackageStartupMessages: this will suppress all messages from R itself but not necessarily all those from package authors.
If library is called with no package or help argument, it lists all available packages in the libraries specified by lib.loc, and returns the corresponding information in an object of class "libraryIQR". (The structure of this class may change in future versions.) Use .packages(all = TRUE) to obtain just the names of all available packages, and installed.packages() for even more information.
library(help = somename) computes basic information about the package somename, and returns this in an object of class "packageInfo". (The structure of this class may change in future versions.) When used with the default value (NULL) for lib.loc, the attached packages are searched before the libraries.
Normally library returns (invisibly) the list of attached packages, but TRUE or FALSE if logical.return is TRUE. When called as library() it returns an object of class "libraryIQR", and for library(help=), one of class "packageInfo".
require returns (invisibly) a logical indicating whether the required package is available.
Handling of conflicts depends on the setting of the conflicts.policy option. If this option is not set, then conflicts result in warning messages if the argument warn.conflicts is TRUE. If the option is set to the character string "strict", then all unresolved conflicts signal errors. Conflicts can be resolved using the mask.ok, exclude, and include.only arguments to library and require. Defaults for mask.ok and exclude can be specified using conflictRules.
If the conflicts.policy option is set to the string "depends.ok" then conflicts resulting from attaching declared dependencies will not produce errors, but other conflicts will. This is likely to be the best setting for most users wanting some additional protection against unexpected conflicts.
The policy can be tuned further by specifying the conflicts.policy option as a named list with the following fields:
error: logical; if TRUE treat unresolved conflicts as errors.
warn: logical; unless FALSE issue a warning message when conflicts are found.
generics.ok: logical; if TRUE ignore conflicts created by defining S4 generics for functions on the search path.
depends.ok: logical; if TRUE do not treat conflicts with required packages as errors.
can.mask: character vector of names of packages that are allowed to be masked. These would typically be base packages attached by default.
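Setting the conflicts.policy option, either as one of the strings or as a named list with the fields above, might look like the following sketch. The particular field values and the packages named in can.mask are illustrative choices, not recommended defaults; the previous option value is restored at the end.

```r
# String form: tolerate conflicts from declared dependencies only
old <- options(conflicts.policy = "depends.ok")

# List form: the fields are those described above; the values are illustrative
policy <- list(
  error       = TRUE,
  warn        = FALSE,
  generics.ok = TRUE,
  depends.ok  = TRUE,
  can.mask    = c("base", "methods", "utils",
                  "grDevices", "graphics", "stats")
)
options(conflicts.policy = policy)
current <- getOption("conflicts.policy")

options(old)   # restore the previous setting
```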
Some packages have restrictive licenses, and there is a mechanism to allow users to be aware of such licenses. If getOption("checkPackageLicense") == TRUE, then at first use of a namespace of a package with a not-known-to-be-FOSS (see below) license the user is asked to view and accept the license: a list of accepted licenses is stored in file ‘~/.R/licensed’. In a non-interactive session it is an error to use such a package whose license has not already been recorded as accepted.
Free or Open Source Software (FOSS, e.g. https://en.wikipedia.org/wiki/FOSS) packages are determined by the same filters used by available.packages but applied to just the current package, not its dependencies.
There can also be a site-wide file ‘R_HOME/etc/licensed.site’ of packages (one per line).
library
takes some further actions when package methods
is attached (as it is by default). Packages may define formal generic
functions as well as re-defining functions in other packages (notably
base) to be generic, and this information is cached whenever
such a namespace is loaded after methods. Re-defined functions
(implicit generics) are excluded from the list of conflicts.
The caching and check for conflicts require looking for a pattern of
objects; the search may be avoided by defining an object
.noGenerics
(with any value) in the namespace. Naturally, if the
package does have any such methods, this will prevent them from
being used.
library
and require
can only load/attach an
installed package, and this is detected by having a
‘DESCRIPTION’ file containing a ‘Built:’ field.
Under Unix-alikes, the code checks that the package was installed
under a similar operating system as given by R.version$platform
(the canonical name of the platform under which R was compiled),
provided it contains compiled code. Packages which do not contain
compiled code can be shared between Unix-alikes, but not to other OSes
because of potential problems with line endings and OS-specific help
files. If sub-architectures are used, the OS similarity is not
checked since the OS used to build may differ
(e.g. i386-pc-linux-gnu
code can be built on an
x86_64-unknown-linux-gnu
OS).
The package name given to library
and require
must match
the name given in the package's ‘DESCRIPTION’ file exactly, even
on case-insensitive file systems such as are common on Windows and
macOS.
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
attach
, detach
, search
,
objects
, autoload
,
requireNamespace
,
library.dynam
, data
,
install.packages
and
installed.packages
;
INSTALL
, REMOVE
.
The initial set of packages attached is set by
options(defaultPackages=)
: see also Startup
.
library()                   # list all available packages
library(lib.loc = .Library) # list all packages in the default library
library(help = splines)     # documentation on package 'splines'
library(splines)            # attach package 'splines'
require(splines)            # the same
search()                    # "splines", too
detach("package:splines")
# if the package name is in a character vector, use
pkg <- "splines"
library(pkg, character.only = TRUE)
detach(pos = match(paste("package", pkg, sep = ":"), search()))
require(pkg, character.only = TRUE)
detach(pos = match(paste("package", pkg, sep = ":"), search()))
require(nonexistent)        # FALSE
## Not run:
## if you want to mask as little as possible, use
library(mypkg, pos = "package:base")
## End(Not run)
Load the specified file of compiled code if it has not been loaded already, or unload it.
library.dynam(chname, package, lib.loc,
              verbose = getOption("verbose"),
              file.ext = .Platform$dynlib.ext, ...)
library.dynam.unload(chname, libpath,
                     verbose = getOption("verbose"),
                     file.ext = .Platform$dynlib.ext)
.dynLibs(new)
chname |
a character string naming a DLL (also known as a dynamic shared object or library) to load. |
package |
a character vector with the name of the package. |
lib.loc |
a character vector describing the location of R library trees to search through. |
libpath |
the path to the loaded package whose DLL is to be unloaded. |
verbose |
a logical value indicating whether an announcement
is printed on the console before loading the DLL. The
default value is taken from the verbose entry in the system
options (see the default getOption("verbose") above). |
file.ext |
the extension (including ‘.’ if used) to append to the file name to specify the library to be loaded. This defaults to the appropriate value for the operating system. |
... |
additional arguments needed by some libraries that
are passed to the call to |
new |
a list of |
See dyn.load
for what sort of objects these functions handle.
library.dynam
is designed to be used inside a package rather
than at the command line, and should really only be used inside
.onLoad
. The system-specific extension for DLLs (e.g.,
‘.so’ or ‘.sl’ on Unix-alike systems,
‘.dll’ on Windows) should not be added.
library.dynam.unload
is designed for use in
.onUnload
: it unloads the DLL and updates the value of
.dynLibs()
.
.dynLibs
is used for getting (with no argument) or setting the
DLLs which are currently loaded by packages (using library.dynam
).
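The intended division of labour between the two functions can be sketched with a pair of namespace hooks. This is only an outline under stated assumptions: "mypkg" is a hypothetical package whose compiled code is built into a DLL of the same name, and the hooks would live in the package's own R code rather than be run at the command line.

```r
# Hypothetical hooks for a package called "mypkg" (an assumed name).
# Note that chname is given without the system-specific extension.
.onLoad <- function(libname, pkgname) {
  # Load the package's DLL from the library tree it was installed in.
  library.dynam("mypkg", package = pkgname, lib.loc = libname)
}

.onUnload <- function(libpath) {
  # Unload the DLL and keep the value of .dynLibs() up to date.
  library.dynam.unload("mypkg", libpath)
}
```

Defining the hooks does nothing by itself; R calls them when the namespace is loaded or unloaded.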
If chname
is not specified, library.dynam
returns an
object of class "DLLInfoList"
corresponding to the DLLs
loaded by packages.
If chname
is specified, an object of class
"DLLInfo"
that identifies the DLL and which can be used
in future calls is returned invisibly. Note that the class
"DLLInfo"
has a method for $
which can be used to
resolve native symbols within that DLL.
library.dynam.unload
invisibly returns an object of class
"DLLInfo"
identifying the DLL successfully unloaded.
.dynLibs
returns an object of class "DLLInfoList"
corresponding to its current value.
Do not use dyn.unload
on a DLL loaded by
library.dynam
: use library.dynam.unload
to ensure
that .dynLibs
gets updated. Otherwise a subsequent call to
library.dynam
will be told the object is already loaded.
Note that whether or not it is possible to unload a DLL and then
reload a revised version of the same file is OS-dependent: see the
‘Value’ section of the help for dyn.unload
.
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
getLoadedDLLs
for information on "DLLInfo"
and
"DLLInfoList"
objects.
.onLoad
, library
,
dyn.load
, .packages
,
.libPaths
SHLIB
for how to create suitable DLLs.
## Which DLLs were dynamically loaded by packages?
library.dynam()
## More on library.dynam.unload() :
require(nlme)
nlme:::.onUnload # shows library.dynam.unload() call
detach("package:nlme") # by default, unload=FALSE , so,
tail(library.dynam(), 2)# nlme still there
## How to unload the DLL ?
## Best is to unload the namespace, unloadNamespace("nlme")
## If we need to do it separately which should be exceptional:
pd.file <- attr(packageDescription("nlme"), "file")
library.dynam.unload("nlme", libpath = sub("/Meta.*", '', pd.file))
tail(library.dynam(), 2)# 'nlme' is gone now
unloadNamespace("nlme") # now gives warning
The license terms under which R is distributed.
license()
licence()
R is distributed under the terms of the GNU GENERAL PUBLIC LICENSE,
either Version 2, June 1991 or Version 3, June 2007. A copy of the
version 2 license is in file ‘R_HOME/doc/COPYING’
and can be viewed by RShowDoc("COPYING")
. Version 3 of the
license can be displayed by RShowDoc("GPL-3")
.
A small number of files (some of the API header files) are distributed
under the LESSER GNU GENERAL PUBLIC LICENSE, version 2.1 or later. A
copy of this license is in file ‘R_SHARE_DIR/licenses/LGPL-2.1’
and can be viewed by RShowDoc("LGPL-2.1")
. Version 3 of the
license can be displayed by RShowDoc("LGPL-3")
.
Functions to construct, coerce and check for both kinds of R lists.
list(...)
pairlist(...)
as.list(x, ...)
## S3 method for class 'environment'
as.list(x, all.names = FALSE, sorted = FALSE, ...)
as.pairlist(x)
is.list(x)
is.pairlist(x)
alist(...)
... |
objects, possibly named. |
x |
object to be coerced or tested. |
all.names |
a logical indicating whether to copy all values or (default) only those whose names do not begin with a dot. |
sorted |
a logical indicating whether the |
Almost all lists in R internally are Generic Vectors, whereas
traditional dotted pair lists (as in LISP) remain available but are
rarely seen by users (except as formals
of functions).
The arguments to list
or pairlist
are of the form
value
or tag = value
. The functions return a list or
dotted pair list composed of their arguments with each value either
tagged or untagged, depending on how the argument was specified.
alist
handles its arguments as if they described function
arguments. So the values are not evaluated, and tagged arguments with
no value are allowed whereas list
simply ignores them.
alist
is most often used in conjunction with formals
.
as.list
attempts to coerce its argument to a list. For
functions, this returns the concatenation of the list of formal
arguments and the function body. For expressions, the list of
constituent elements is returned. as.list
is generic, and as
the default method calls as.vector(mode = "list")
for a
non-list, methods for as.vector
may be invoked. as.list
turns a factor into a list of one-element factors, keeping
names
. Other attributes may
be dropped unless the argument already is a list or expression. (This
is inconsistent with functions such as as.character
which always drop attributes, and is for efficiency since lists can be
expensive to copy.)
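The coercions described above can be illustrated with a short sketch. The function f and the factor fac are made-up examples introduced here only for demonstration.

```r
# A function becomes its formal arguments followed by its body.
f <- function(x, y = 2) x + y
fl <- as.list(f)
length(fl)   # two formals plus the body
names(fl)    # the body is unnamed

# A factor becomes a list of one-element factors, keeping names.
fac <- factor(c(u = "a", v = "b", w = "a"))
facl <- as.list(fac)

# An atomic vector becomes a list with one element per value.
il <- as.list(1:3)
```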
is.list
returns TRUE
if and only if its argument
is a list
or a pairlist
of length $> 0$.
is.pairlist
returns TRUE
if and only if the argument
is a pairlist or NULL
(see below).
The "environment"
method for as.list
copies the
name-value pairs (for names not beginning with a dot) from an
environment to a named list. The user can request that all named
objects are copied. Unless sorted = TRUE
, the list is in no
particular order (the order
depends on the order of creation of objects and whether the
environment is hashed). No enclosing environments are searched.
(Objects copied are duplicated so this can be an expensive operation.)
Note that there is an inverse operation, the
as.environment()
method for list objects.
An empty pairlist, pairlist()
is the same as
NULL
. This is different from list()
: some but
not all operations will promote an empty pairlist to an empty list.
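A small sketch of how the two kinds of list respond to the test functions, including the empty-pairlist case just described. The object names are arbitrary.

```r
empty_is_null <- is.null(pairlist())   # an empty pairlist is NULL
null_is_pl    <- is.pairlist(NULL)     # NULL counts as a pairlist
null_is_list  <- is.list(NULL)         # but not as a list (length 0)

pl <- pairlist(a = 1, b = "two")
pl_is_list <- is.list(pl)              # non-empty pairlists pass is.list
pl_is_pl   <- is.pairlist(pl)

list_is_pl <- is.pairlist(list(a = 1)) # an ordinary list is not a pairlist

# Formal arguments are one place where users meet pairlists.
formals_pl <- is.pairlist(formals(function(x, y = 2) NULL))
```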
as.pairlist
is implemented as as.vector(x,
"pairlist")
, and hence will dispatch methods for the generic function
as.vector
. Lists are copied element-by-element into a pairlist
and the names of the list used as tags for the pairlist: the return
value for other types of argument is undocumented.
list
, is.list
and is.pairlist
are
primitive functions.
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
vector("list", length)
for creation of a list with empty
components; c
, for concatenation; formals
.
unlist
is an approximate inverse to as.list()
.
‘plotmath’ for the use of list
in plot annotation.
require(graphics)
# create a plotting structure
pts <- list(x = cars[,1], y = cars[,2])
plot(pts)
is.pairlist(.Options)  # a user-level pairlist
## "pre-allocate" an empty list of length 5
vector("list", 5)
# Argument lists
f <- function() x
# Note the specification of a "..." argument:
formals(f) <- al <- alist(x = , y = 2+3, ... = )
f
al
## environment->list coercion
e1 <- new.env()
e1$a <- 10
e1$b <- 20
as.list(e1)
These functions produce a character vector of the names of files or directories in the named directory.
list.files(path = ".", pattern = NULL, all.files = FALSE,
           full.names = FALSE, recursive = FALSE,
           ignore.case = FALSE, include.dirs = FALSE, no.. = FALSE)
dir(path = ".", pattern = NULL, all.files = FALSE,
    full.names = FALSE, recursive = FALSE,
    ignore.case = FALSE, include.dirs = FALSE, no.. = FALSE)
list.dirs(path = ".", full.names = TRUE, recursive = TRUE)
path |
a character vector of full path names; the default
corresponds to the working directory, |
pattern |
an optional regular expression. Only file names which match the regular expression will be returned. |
all.files |
a logical value. If |
full.names |
a logical value. If |
recursive |
logical. Should the listing recurse into directories? |
ignore.case |
logical. Should pattern-matching be case-insensitive? |
include.dirs |
logical. Should subdirectory names be included in recursive listings? (They always are in non-recursive ones). |
no.. |
logical. Should both |
A character vector containing the names of the files in the specified directories (empty if there were no files). If a path does not exist or is not a directory or is unreadable it is skipped.
The files are sorted in alphabetical order, on the full path
if full.names = TRUE
.
list.dirs
implicitly has all.files = TRUE
, and if
recursive = TRUE
, the answer includes path
itself
(provided it is a readable directory).
dir
is an alias for list.files
.
File naming conventions are platform dependent. The pattern matching works with the case of file names as returned by the OS.
On a POSIX filesystem recursive listings will follow symbolic links to directories.
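The behaviour described above can be tried safely in a scratch directory. The sketch below builds a small, made-up directory tree under tempdir(); the file names are arbitrary and exist only for the demonstration.

```r
# Build a throwaway directory tree (all names are hypothetical).
d <- tempfile("listfiles-demo-")
dir.create(file.path(d, "sub"), recursive = TRUE)
invisible(file.create(file.path(d, c("b.csv", "a.csv", "notes.txt", ".hidden"))))
invisible(file.create(file.path(d, "sub", "c.csv")))

top     <- list.files(d)                       # visible entries, including "sub"
csvs    <- list.files(d, pattern = "\\.csv$")  # regular expression, not a glob
all_csv <- list.files(d, pattern = "\\.csv$", recursive = TRUE)
hidden  <- list.files(d, all.files = TRUE, no.. = TRUE)  # dot files, no "." or ".."
dirs    <- list.dirs(d)                        # recursive: includes d itself

# Clean up the scratch directory.
unlink(d, recursive = TRUE)
```

Note that pattern is a regular expression; glob2rx can convert a wildcard such as "*.csv" when that form is more convenient.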
Ross Ihaka, Brian Ripley
file.info
, file.access
and files
for many more file handling functions and
file.choose
for interactive selection.
glob2rx
to convert wildcards (as used by system file
commands and shells) to regular expressions.
Sys.glob
for wildcard expansion on file paths.
basename
and dirname
, useful for splitting paths
into non-directory (aka ‘filename’) and directory parts.
list.files(R.home())
## Only files starting with a-l or r
## Note that a-l is locale-dependent, but using case-insensitive
## matching makes it unambiguous in English locales
dir("../..", pattern = "^[a-lr]", full.names = TRUE, ignore.case = TRUE)
list.dirs(R.home("doc"))
list.dirs(R.home("doc"), full.names = FALSE)
Create a data frame from a list of variables.
list2DF(x = list(), nrow = 0)
x |
A list of same-length variables for the data frame. |
nrow |
An integer giving the desired number of rows for the data frame in
case |
Note that all list elements are taken “as is”.
A data frame with the given variables.
## Create a data frame holding a list of character vectors and the
## corresponding lengths:
x <- list(character(), "A", c("B", "C"))
n <- lengths(x)
list2DF(list(x = x, n = n))
## Create data frames with no variables and the desired number of rows:
list2DF()
list2DF(nrow = 3L)
From a named list x
, create an
environment
containing all list components as objects, or
“multi-assign” from x
into a pre-existing environment.
list2env(x, envir = NULL, parent = parent.frame(), hash = (length(x) > 100), size = max(29L, length(x)))
x |
a |
envir |
an |
parent |
(for the case |
hash |
(for the case |
size |
(in the case |
This will be very slow for large inputs unless hashing is used on the environment.
Environments must have uniquely named entries, but named lists need not: where the list has duplicate names it is the last element with the name that is used. Empty names throw an error.
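The handling of duplicate and empty names can be sketched as follows. The lists used here are made up for illustration, and the error for an empty name is caught so that the snippet runs to completion.

```r
# With duplicate names, the last element with that name is used.
e <- list2env(list(a = 1, b = "x", a = 2))
a_value <- e$a

# "Multi-assign" into the existing environment: 'b' is overwritten
# and 'z' is added.
list2env(list(b = "y", z = TRUE), envir = e)
contents <- sort(ls(e))

# An element with an empty name is an error; catch it for the demo.
res <- tryCatch(list2env(list(1, a = 2)),
                error = function(cond) "error")
```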
An environment
, either newly created (as by
new.env
) if the envir
argument was NULL
,
otherwise the updated environment envir
. Since environments
are never duplicated, the argument envir
is also changed.
Martin Maechler
environment
, new.env
,
as.environment
; further, assign
.
The (semantical) “inverse”: as.list.environment
.
L <- list(a = 1, b = 2:4, p = pi, ff = gl(3, 4, labels = LETTERS[1:3]))
e <- list2env(L)
ls(e)
stopifnot(ls(e) == sort(names(L)),
          identical(L$b, e$b)) # "$" working for environments as for lists
## consistency, when we do the inverse:
ll <- as.list(e)  # -> dispatching to the as.list.environment() method
rbind(names(L), names(ll)) # not in the same order, typically,
                           # but the same content:
stopifnot(identical(L [sort.list(names(L ))],
                    ll[sort.list(names(ll))]))
## now add to e -- can be seen as a fast "multi-assign":
list2env(list(abc = LETTERS, note = "just an example",
              df = data.frame(x = rnorm(20), y = rbinom(20, 1, prob = 0.2))),
         envir = e)
utils::ls.str(e)
Reload datasets written with the function save
.
load(file, envir = parent.frame(), verbose = FALSE)
file |
a (readable binary-mode) connection or a character string giving the name of the file to load (when tilde expansion is done). |
envir |
the environment where the data should be loaded. |
verbose |
should item names be printed during loading? |
load
can load R objects saved in the current or any earlier
format. It can read a compressed file (see save
)
directly from a file or from a suitable connection (including a call
to url
).
A not-open connection will be opened in mode "rb"
and closed
after use. Any connection other than a gzfile
or
gzcon
connection will be wrapped in gzcon
to allow compressed saves to be handled: note that this leaves the
connection in an altered state (in particular, binary-only), and that
it needs to be closed explicitly (it will not be garbage-collected).
Only R objects saved in the current format (used since R 1.4.0) can be read from a connection. If no input is available on a connection a warning will be given, but any input not in the current format will result in an error.
Loading from an earlier version will give a warning about the
‘magic number’: magic numbers 1971:1977
are from R <
0.99.0, and RD[ABX]1
from R 0.99.0 to R 1.3.1. These are all
obsolete, and you are strongly recommended to re-save such files in a
current format.
The verbose
argument is mainly intended for debugging. If it
is TRUE
, then as objects from the file are loaded, their
names will be printed to the console. If verbose
is set to
an integer value greater than one, additional names corresponding to
attributes and other parts of individual objects will also be printed.
Larger values will print names to a greater depth.
Objects can be saved with references to namespaces, usually as part of the environment of a function or formula. Such objects can be loaded even if the namespace is not available: it is replaced by a reference to the global environment with a warning. The warning identifies the first object with such a reference (but there may be more than one).
A character vector of the names of objects created, invisibly.
Saved R objects are binary files, even those saved with
ascii = TRUE
, so ensure that they are transferred without
conversion of end of line markers. load
tries to detect such a
conversion and gives an informative error message.
load(file)
replaces all existing objects with the same names
in the current environment (typically your workspace,
.GlobalEnv
) and hence potentially overwrites important data.
It is considerably safer to use envir =
to load into a
different environment, or to attach(file)
which
load()
s into a new entry in the search
path.
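A minimal sketch of the safer pattern, loading into a separate environment so that nothing in the calling environment is overwritten. The object names xx and yy and the temporary file are arbitrary choices for the illustration.

```r
# Save two objects to a temporary file.
f <- tempfile(fileext = ".rda")
xx <- pi
yy <- letters[1:3]
save(xx, yy, file = f)

# Change 'xx' in the calling environment after saving.
xx <- "changed"

# Load into a fresh environment rather than over the existing objects.
env <- new.env()
loaded <- load(f, envir = env)   # names of the objects created

saved_xx <- env$xx               # the saved value
unlink(f)
```

The value of load() is the character vector of object names, which is handy for checking what a file contained before copying anything out of env.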
save
, download.file
; further
attach
as wrapper for load()
.
For other interfaces to the underlying serialization format, see
unserialize
and readRDS
.
## save all data
xx <- pi # to ensure there is some data
save(list = ls(all.names = TRUE), file= "all.rda")
rm(xx)
## restore the saved values to the current environment
local({
   load("all.rda")
   ls()
})
xx <- exp(1:3)
## restore the saved values to the user's workspace
load("all.rda") ## which is here *equivalent* to
## load("all.rda", .GlobalEnv)
## This however annihilates all objects in .GlobalEnv with the same names !
xx # no longer exp(1:3)
rm(xx)
attach("all.rda") # safer and will warn about masked objects w/ same name in .GlobalEnv
ls(pos = 2)
## also typically need to cleanup the search path:
detach("file:all.rda")
## clean up (the example):
unlink("all.rda")
## Not run:
con <- url("http://some.where.net/R/data/example.rda")
## print the value to see what objects were created.
print(load(con))
close(con) # url() always opens the connection
## End(Not run)
Get details of or set aspects of the locale for the R process.
Sys.getlocale(category = "LC_ALL")
Sys.setlocale(category = "LC_ALL", locale = "")
.LC.categories
category |
character string. The following categories should
always be supported: |
locale |
character string. A valid locale name on the system in
use. Normally |
The locale describes aspects of the internationalization of a program.
Initially most aspects of the locale of R are set to "C"
(which is the default for the C language and reflects North-American
usage – also known as "POSIX"
). R sets "LC_CTYPE"
and
"LC_COLLATE"
, which allow the use of a different character set
and alphabetic comparisons in that character set (including the use of
sort
), "LC_MONETARY"
(for use by
Sys.localeconv
) and "LC_TIME"
, which may affect the
behaviour of as.POSIXlt
and strptime
and
functions which use them (but not date
).
The first seven categories described here are those specified by
POSIX. "LC_MESSAGES"
will be "C"
on systems that do not
support message translation, and is not supported on Windows, where
you must use the LANGUAGE environment variable for
message translation, see below and the Sys.setLanguage()
utility. Trying to use an unsupported category is an error for
Sys.setlocale
.
Note that setting category "LC_ALL"
sets only categories
"LC_COLLATE"
, "LC_CTYPE"
, "LC_MONETARY"
and
"LC_TIME"
.
Attempts to set an invalid locale are ignored. There may or may not be a warning, depending on the OS.
Attempts to change the character set (by
Sys.setlocale("LC_CTYPE", )
, if that implies a different
character set) during a session may not work and are likely to lead to
some confusion.
Note that the LANGUAGE environment variable has precedence over
"LC_MESSAGES"
in selecting the language for message translation
on most R platforms.
On platforms where ICU is used for collation the locale used for
collation can be reset by icuSetCollate
. Except on
Windows, the initial setting is taken from the "LC_COLLATE"
category, and it is reset when this is changed by a call to
Sys.setlocale
.
A character string of length one describing the locale in use (after
setting for Sys.setlocale
), or an empty character string if the
current locale settings are invalid or NULL
if locale
information is unavailable.
For category = "LC_ALL"
the details of the string are
system-specific: it might be a single locale name or a set of locale
names separated by "/"
(macOS) or ";"
(Windows, Linux). For portability, it is best to query categories
individually: it is not necessarily the case that the result of
foo <- Sys.getlocale()
can be used in
Sys.setlocale("LC_ALL", locale = foo)
.
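Querying and restoring categories one at a time, as recommended above, might look like the following sketch. The choice of LC_COLLATE and of the "C" locale is an assumption made for the demonstration; the test strings are arbitrary.

```r
# Query the categories individually rather than relying on "LC_ALL".
cats <- c("LC_COLLATE", "LC_CTYPE", "LC_MONETARY", "LC_TIME")
current <- vapply(cats, Sys.getlocale, character(1))

# Temporarily switch collation to the "C" locale, then restore it.
old_collate <- Sys.getlocale("LC_COLLATE")
invisible(Sys.setlocale("LC_COLLATE", "C"))
sorted_c <- sort(c("b", "A", "a", "B"))   # upper case sorts before lower case
invisible(Sys.setlocale("LC_COLLATE", old_collate))
```

Saving and restoring a single category avoids the portability problem of feeding a full Sys.getlocale() string back into Sys.setlocale("LC_ALL", ).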
On most Unix-alikes the POSIX shell command locale -a
will
list the ‘available public’ locales. What that means is
platform-dependent. On recent Linuxen this may mean ‘available
to be installed’ as on some RPM-based systems the locale data is in
separate RPMs. On Debian/Ubuntu the set of available locales is
managed by OS-specific facilities such as locale-gen
and
locale -a
lists those currently enabled.
For Windows, Microsoft moves its documentation frequently so a Web
search is the best way to find current information. From R 4.2, UCRT
locale names should be used. The character set should match the
system/ANSI codepage (l10n_info()$codepage
should be the same as
l10n_info()$system.codepage
). Setting it to any other value
results in a warning and may cause encoding problems. As from R 4.2
on recent Windows the system codepage is 65001 and one should always
use locale names ending with ".UTF-8"
(except for "C"
and ""
), otherwise Windows may add a different character set.
Setting "LC_NUMERIC"
to any value other than "C"
may
cause R to function anomalously, so gives a warning. Input
conversions in R itself are unaffected, but the reading and writing
of ASCII save
files will be, as may packages which do
their own input/output.
Setting it temporarily on a Unix-alike to produce graphical or text
output may work well enough, but options(OutDec)
is
often preferable.
Almost all the output routines used by R itself under Windows ignore
the setting of "LC_NUMERIC"
since they make use of the Trio
library which is not internationalized.
Changing the values of locale categories whilst R is running ought to be noticed by the OS services, and usually is, but exceptions have been seen (usually in collation services).
Do not use the value of Sys.getlocale("LC_CTYPE")
to attempt to
find the character set – for example UTF-8 locales can have suffix
‘.UTF-8’ or ‘.utf8’ (more common on Linux than ‘UTF-8’)
or none (as on macOS) and Latin-9 locales can have suffix
‘ISO8859-15’, ‘iso885915’, ‘iso885915@euro’ or
‘ISO8859-15@euro’. Use l10n_info
instead.
strptime
for uses of category = "LC_TIME"
.
Sys.localeconv
for details of numerical and monetary
representations.
l10n_info
gives some summary facts about the locale and
its encoding (including if it is UTF-8).
The ‘R Installation and Administration’ manual for background on locales and how to find out locale names on your system.
Sys.getlocale()
## Date-time related :
Sys.getlocale("LC_TIME") -> olcT
then <- as.POSIXlt("2001-01-01 01:01:01", tz = "UTC")
## Not run:
c(m = months(then), wd = weekdays(then)) # locale specific
Sys.setlocale("LC_TIME", "de")     # Solaris: details are OS-dependent
Sys.setlocale("LC_TIME", "de_DE")  # Many Unix-alikes
Sys.setlocale("LC_TIME", "de_DE.UTF-8")  # Linux, macOS, other Unix-alikes
Sys.setlocale("LC_TIME", "de_DE.utf8")   # some Linux versions
Sys.setlocale("LC_TIME", "German.UTF-8") # Windows
Sys.getlocale("LC_TIME") # the last one successfully set above
c(m = months(then), wd = weekdays(then)) # in C_TIME locale 'cT' ; typically German
## End(Not run)
Sys.setlocale("LC_TIME", "C")
c(m = months(then), wd = weekdays(then)) # "standard" (still platform specific ?)
Sys.setlocale("LC_TIME", olcT)           # reset to previous
## Other locales
Sys.getlocale("LC_PAPER")          # may or may not be set
.LC.categories # of length 9 on all platforms
## Not run:
Sys.setlocale("LC_COLLATE", "C")   # turn off locale-specific sorting,
                                   # usually (but not on all platforms)
Sys.setenv("LANGUAGE" = "es") # set the language for error/warning messages
## End(Not run)
## some nice formatting; should work on most platforms,
## macOS does not name the entries.
sep <- switch(Sys.info()[["sysname"]],
              "Darwin"=, "SunOS" = "/",
              "Linux" =, "Windows" = ";")
##' show a "full" Sys.getlocale() nicely:
showL <- function(loc) {
    sl <- strsplit(strsplit(loc, sep)[[1L]], "=")
    if(all(sapply(sl, length) == 2L))
        setNames(sapply(sl, `[[`, 2L), sapply(sl, `[[`, 1L))
    else
        setNames(as.character(sl), .LC.categories[1+seq_along(sl)])
}
print.Dlist(lloc <- showL(Sys.getlocale()))
## R-supported ones (but LC_ALL):
lloc[.LC.categories[-1]]
log computes logarithms, by default natural logarithms, log10 computes common (i.e., base 10) logarithms, and log2 computes binary (i.e., base 2) logarithms. The general form log(x, base) computes logarithms with base base.
log1p(x) computes log(1+x) accurately also for |x| << 1.
exp computes the exponential function.
expm1(x) computes exp(x) - 1 accurately also for |x| << 1.
log(x, base = exp(1))
logb(x, base = exp(1))
log10(x)
log2(x)

log1p(x)

exp(x)
expm1(x)
x |
a numeric or complex vector. |
base |
a positive or complex number: the base with respect to which
logarithms are computed. Defaults to e = exp(1). |
All except logb
are generic functions: methods can be defined
for them individually or via the Math
group generic.
log10
and log2
are only convenience wrappers, but logs
to bases 10 and 2 (whether computed via log
or the wrappers)
will be computed more efficiently and accurately where supported by the OS.
Methods can be set for them individually (and otherwise methods for
log
will be used).
logb
is a wrapper for log
for compatibility with S. If
(S3 or S4) methods are set for log
they will be dispatched.
Do not set S4 methods on logb
itself.
All except log
are primitive functions.
A vector of the same length as x containing the transformed values. log(0) gives -Inf, and log(x) for negative values of x is NaN. exp(-Inf) is 0.
For complex inputs to the log functions, the value is a complex number with imaginary part in the range [-π, π]: which end of the range is used might be platform-specific.
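The special values described here can be checked directly. The following is a minimal sketch; the variable name z is illustrative only, and the sign of the imaginary part in the last line may differ between platforms.

```r
# Special values of log() and exp()
log(0)                          # -Inf
suppressWarnings(log(-1))       # NaN (a warning is signalled for negative input)
exp(-Inf)                       # 0

# log(x, base) agrees with log(x) / log(base) up to rounding
all.equal(log(8, base = 2), log(8) / log(2))

# Complex input: the result is complex, with imaginary part in [-pi, pi]
z <- log(complex(real = -1, imaginary = 0))
Im(z)                           # pi or -pi, depending on the platform
```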
exp
, expm1
, log
, log10
, log2
and
log1p
are S4 generic and are members of the
Math
group generic.
Note that this means that the S4 generic for log has a signature with only one argument, x, but that base can be passed to methods (but will not be used for method selection). On the other hand, if you only set a method for the Math group generic then the base argument of log will be ignored for your class.
log1p
and expm1
may be taken from the operating system,
but if not available there then they are based on the Fortran subroutine
dlnrel
by W. Fullerton of Los Alamos Scientific Laboratory (see
https://netlib.org/slatec/fnlib/dlnrel.f) and (for small x) a
single Newton step for the solution of log1p(y) = x
respectively.
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole. (for log, log10 and exp.)
Chambers, J. M. (1998) Programming with Data. A Guide to the S Language. Springer. (for logb.)
Trig
,
sqrt
,
Arithmetic
.
log(exp(3))
log10(1e7) # = 7

x <- 10^-(1+2*1:9)
cbind(deparse.level=2, # to get nice column names
      x, log(1+x), log1p(x), exp(x)-1, expm1(x))
These operators act on raw, logical and number-like vectors.
! x
x & y
x && y
x | y
x || y
xor(x, y)

isTRUE (x)
isFALSE(x)
x, y |
raw, logical or ‘number-like’ vectors, or objects for which methods have been written. |
! indicates logical negation (NOT).
& and && indicate logical AND and | and || indicate logical OR. The shorter forms perform elementwise comparisons in much the same way as arithmetic operators. The longer forms evaluate left to right, proceeding only until the result is determined. The longer forms are appropriate for programming control-flow and typically preferred in if clauses.
Using vectors of more than one element in && or || will give an error.
xor
indicates elementwise exclusive OR.
isTRUE(x) is the same as { is.logical(x) && length(x) == 1 && !is.na(x) && x }; isFALSE() is defined analogously. Consequently, if(isTRUE(cond)) may be preferable to if(cond) because of NAs.
In earlier R versions, isTRUE was defined as function(x) identical(x, TRUE), which had the drawback of being false for, e.g., x <- c(val = TRUE).
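A short sketch of why isTRUE() is the safer test in control flow; cond is an illustrative variable, not part of the interface.

```r
cond <- NA
# if (cond) would signal an error here, because the condition is NA
if (isTRUE(cond)) "yes" else "no or unknown"

isTRUE(c(val = TRUE))    # a named length-one TRUE still counts as TRUE
isTRUE(c(TRUE, TRUE))    # FALSE: not of length one
isFALSE(NA)              # FALSE: NA is neither TRUE nor FALSE
```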
Numeric and complex vectors will be coerced to logical values, with
zero being false and all non-zero values being true. Raw vectors are
handled without any coercion for !
, &
, |
and
xor
, with these operators being applied bitwise (so !
is
the 1s-complement).
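The coercion and bitwise rules can be illustrated as follows. This is only a sketch; a and b are arbitrary example bytes.

```r
# Numeric input is coerced to logical: zero is FALSE, anything else is TRUE
!c(0, 1, -2.5)               # TRUE FALSE FALSE
c(0, 3) | c(0, 0)            # FALSE TRUE

# Raw input is combined bit by bit and stays raw
a <- as.raw(0x0c)            # bits 00001100
b <- as.raw(0x0a)            # bits 00001010
a & b                        # 08
a | b                        # 0e
xor(a, b)                    # 06
!a                           # f3, the ones' complement of 0c
```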
The operators !
, &
and |
are generic functions:
methods can be written for them individually or via the
Ops
(or S4 Logic
, see below)
group generic function. (See Ops
for
how dispatch is computed.)
NA
is a valid logical object. Where a component of
x
or y
is NA
, the result will be NA
if the
outcome is ambiguous. In other words NA & TRUE
evaluates to
NA
, but NA & FALSE
evaluates to FALSE
. See the
examples below.
See Syntax for the precedence of these operators: unlike many other languages (including S) the AND and OR operators do not have the same precedence (the AND operators have higher precedence than the OR operators).
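The following sketch shows the NA rules and the relative precedence of AND and OR on literal values.

```r
# NA propagates only when the outcome is genuinely ambiguous
NA & FALSE     # FALSE: the result is FALSE whatever the NA stands for
NA & TRUE      # NA
NA | TRUE      # TRUE
NA | FALSE     # NA

# AND binds more tightly than OR
TRUE || FALSE && FALSE       # parsed as TRUE || (FALSE && FALSE)
(TRUE || FALSE) && FALSE     # explicit grouping changes the result
```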
For !, a logical or raw vector (for raw x) of the same length as x: names, dims and dimnames are copied from x, and all other attributes (including class) if no coercion is done.
For |
, &
and xor
a logical or raw vector. If
involving a zero-length vector the result has length zero. Otherwise,
the elements of shorter vectors are recycled as necessary (with a
warning
when they are recycled only fractionally).
The rules for determining the attributes of the result are rather
complicated. Most attributes are taken from the longer argument, the
first if they are of the same length. Names will be copied from the
first if it is the same length as the answer, otherwise from the
second if that is. For time series, these operations are allowed only
if the series are compatible, when the class and tsp
attribute of whichever is a time series (the same, if both are) are
used. For arrays (and an array result) the dimensions and dimnames
are taken from the first argument if it is an array, otherwise the second.
For ||
, &&
and isTRUE
, a length-one logical vector.
!
, &
and |
are S4 generics, the latter two part
of the Logic
group generic (and
hence methods need argument names e1, e2
).
The elementwise operators are sometimes called as functions as
e.g. `&`(x, y)
: see the description of how
argument-matching is done in Ops
.
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
any
and all
for OR and AND on many scalar
arguments.
Syntax
for operator precedence.
L %||% R
which takes L
if it is not NULL
,
and R
otherwise.
bitwAnd
for bitwise versions for integer vectors.
y <- 1 + (x <- stats::rpois(50, lambda = 1.5) / 4 - 1)
x[(x > 0) & (x < 1)]    # all x values between 0 and 1
if (any(x == 0) || any(y == 0)) "zero encountered"

## construct truth tables :

x <- c(NA, FALSE, TRUE)
names(x) <- as.character(x)
outer(x, x, `&`) ## AND table
outer(x, x, `|`) ## OR  table
Create or test for objects of type "logical"
, and the basic
logical constants.
TRUE
FALSE
T; F

logical(length = 0)
as.logical(x, ...)
is.logical(x)
length |
a non-negative integer specifying the desired length. Double values will be coerced to integer: supplying an argument of length other than one is an error. |
x |
object to be coerced or tested. |
... |
further arguments passed to or from other methods. |
TRUE and FALSE are reserved words denoting logical constants in the R language, whereas T and F are global variables whose initial values are set to these. All four are logical(1) vectors.
as.logical
is a generic function. Methods should return an object
of type "logical"
.
Logical vectors are coerced to integer vectors in contexts where a
numerical value is required, with TRUE
being mapped to
1L
, FALSE
to 0L
and NA
to NA_integer_
.
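Because of this coercion, logical vectors can be counted and averaged directly. A minimal sketch, with flags as an illustrative vector:

```r
flags <- c(TRUE, FALSE, NA, TRUE)
sum(flags, na.rm = TRUE)     # number of TRUE values
mean(flags, na.rm = TRUE)    # proportion of TRUE among the non-missing values
as.integer(flags)            # 1 0 NA 1
typeof(TRUE + TRUE)          # "integer"
```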
logical
creates a logical vector of the specified length.
Each element of the vector is equal to FALSE
.
as.logical
attempts to coerce its argument to be of logical
type. In numeric and complex vectors, zeros are FALSE
and
non-zero values are TRUE
.
For factor
s, this uses the levels
(labels). Like as.vector
it strips attributes including
names. Character strings c("T", "TRUE", "True", "true")
are
regarded as true, c("F", "FALSE", "False", "false")
as false,
and all others as NA
.
is.logical
returns TRUE
or FALSE
depending on
whether its argument is of logical type or not.
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
NA
, the other logical constant.
Logical operators are documented in Logic
.
## non-zero values are TRUE
as.logical(c(pi,0))
if (length(letters)) cat("26 is TRUE\n")

## logical interpretation of particular strings
charvec <- c("FALSE", "F", "False", "false",    "fAlse", "0",
             "TRUE",  "T", "True",  "true",     "tRue",  "1")
as.logical(charvec)

## factors are converted via their levels, so string conversion is used
as.logical(factor(charvec))
as.logical(factor(c(0,1)))  # "0" and "1" give NA
Vectors of 2^31 or more elements were added in R 3.0.0.
Prior to R 3.0.0, all vectors in R were restricted to at most 2^31 - 1 elements and could be indexed by integer vectors.
Currently all atomic (raw, logical, integer, numeric, complex, character) vectors, lists and expressions can be much longer on 64-bit platforms: such vectors are referred to as ‘long vectors’ and have a slightly different internal structure. In theory they can contain up to 2^52 elements, but address space limits of current CPUs and OSes will be much smaller. Such objects will have a length that is expressed as a double, and can be indexed by double vectors.
Arrays (including matrices) can be based on long vectors provided each of their dimensions is at most 2^31 - 1: thus there are no 1-dimensional long arrays.
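Allocating an actual long vector needs several gigabytes of memory, so the sketch below only illustrates the integer limit behind these rules. The variable n is illustrative.

```r
# The largest value an integer can hold, which is also the pre-3.0.0 length limit
.Machine$integer.max == 2^31 - 1

# A length of 2^31 no longer fits in an integer but is exact as a double
n <- 2^31
n + 1 > n
suppressWarnings(as.integer(n))   # NA: avoid as.integer() on such lengths
```

This is why code that handles lengths should keep them as doubles rather than forcing them through as.integer.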
R code typically only needs minor changes to work with long vectors,
maybe only checking that as.integer
is not used unnecessarily
for e.g. lengths. However, compiled code typically needs quite
extensive changes. Note that the .C
and
.Fortran
interfaces do not accept long vectors, so
.Call
(or similar) has to be used.
Because of the storage requirements (a minimum of 64 bytes per character string), character vectors are only going to be usable if they have a small number of distinct elements, and even then factors will be more efficient (4 bytes per element rather than 8). So it is expected that most of the usage of long vectors will be integer vectors (including factors) and numeric vectors.
It is now possible to use matrices with more
than 2 billion elements. Whether matrix algebra (including
%*%
, crossprod
, svd
,
qr
, solve
and eigen
) will
actually work is somewhat implementation dependent, including the
Fortran compiler used and if an external BLAS or LAPACK is used.
An efficient parallel BLAS implementation will often be important to
obtain usable performance. For example on one particular platform
chol
on a 47,000 square matrix took about 5 hours with the
internal BLAS, 21 minutes using an optimized BLAS on one core, and 2
minutes using an optimized BLAS on 16 cores.
Returns a matrix of logicals the same size as a given matrix, with entries TRUE in the lower or upper triangle.
lower.tri(x, diag = FALSE)
upper.tri(x, diag = FALSE)
x |
a matrix or other R object with |
diag |
logical. Should the diagonal be included? |
diag
, matrix
; further row
and col
on which lower.tri()
and
upper.tri()
are built.
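The relationship to row and col can be sketched as below. This illustrates the idea only and is not meant as the actual source of either function; m is an example matrix.

```r
m <- matrix(1:20, 4, 5)

# Comparing row and column indices gives the same masks
identical(lower.tri(m), row(m) > col(m))
identical(upper.tri(m, diag = TRUE), row(m) <= col(m))

# Typical use: extract the strictly upper triangle as a vector
m[upper.tri(m)]
```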
(m2 <- matrix(1:20, 4, 5))
lower.tri(m2)
m2[lower.tri(m2)] <- NA
m2
ls
and objects
return a vector of character strings
giving the names of the objects in the specified environment. When
invoked with no argument at the top level prompt, ls
shows what
data sets and functions a user has defined. When invoked with no
argument inside a function, ls
returns the names of the
function's local variables: this is useful in conjunction with
browser
.
ls(name, pos = -1L, envir = as.environment(pos),
   all.names = FALSE, pattern, sorted = TRUE)
objects(name, pos= -1L, envir = as.environment(pos),
        all.names = FALSE, pattern, sorted = TRUE)
name |
which environment to use in listing the available objects.
Defaults to the current environment. Although called
|
pos |
an alternative argument to |
envir |
an alternative argument to |
all.names |
a logical value. If |
pattern |
an optional regular expression. Only names
matching |
sorted |
logical indicating if the resulting
|
The name
argument can specify the environment from which
object names are taken in one of several forms:
as an integer (the position in the search
list); as
the character string name of an element in the search list; or as an
explicit environment
(including using
sys.frame
to access the currently active function calls).
By default, the environment of the call to ls
or objects
is used. The pos
and envir
arguments are an alternative
way to specify an environment, but are primarily there for back
compatibility.
Note that the order of strings for sorted = TRUE
is
locale dependent, see Sys.getlocale
. If sorted =
FALSE
the order is arbitrary, depending if the environment is
hashed, the order of insertion of objects, ....
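A minimal sketch of listing an explicit environment; e and the object names in it are made up for the illustration.

```r
e <- new.env()                       # a scratch environment
assign("beta", 2, envir = e)
assign("alpha", 1, envir = e)
assign(".hidden", 0, envir = e)

ls(e)                                # names not beginning with a dot, sorted
ls(envir = e, all.names = TRUE)      # also includes ".hidden"
ls(e, pattern = "^al")               # only names matching the regular expression
```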
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
glob2rx
for converting wildcard patterns to regular
expressions.
ls.str
for a long listing based on str
.
apropos
(or find
)
for finding objects in the whole search path;
grep
for more details on ‘regular expressions’;
class
, methods
, etc., for
object-oriented programming.
.Ob <- 1
ls(pattern = "O")
ls(pattern= "O", all.names = TRUE)    # also shows ".[foo]"

# shows an empty list because inside myfunc no variables are defined
myfunc <- function() {ls()}
myfunc()

# define a local variable inside myfunc
myfunc <- function() {y <- 1; ls()}
myfunc()                # shows "y"
Make syntactically valid names out of character vectors.
make.names(names, unique = FALSE, allow_ = TRUE)
names |
character vector to be coerced to syntactically valid names. This is coerced to character if necessary. |
unique |
logical; if |
allow_ |
logical. For compatibility with R prior to 1.9.0. |
A syntactically valid name consists of letters, numbers and the dot or
underline characters and starts with a letter or the dot not followed
by a number. Names such as ".2way"
are not valid, and neither
are the reserved words.
The definition of a letter depends on the current locale, but only ASCII digits are considered to be digits.
The character "X"
is prepended if necessary.
All invalid characters are translated to "."
. A missing value
is translated to "NA"
. Names which match R keywords have a dot
appended to them. Duplicated values are altered by
make.unique
.
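The rules above can be seen on a few deliberately awkward inputs. This is a sketch; raw_names is an illustrative vector.

```r
raw_names <- c("1abc", ".2way", "if", "a b", "ok_name")

make.names(raw_names)                    # prefix "X", append ".", replace invalid characters
make.names(raw_names, allow_ = FALSE)    # underscores are converted to dots as well
make.names(c("x", "x", "x y"), unique = TRUE)
```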
A character vector of same length as names
with each changed to
a syntactically valid name, in the current locale's encoding.
Some OSes, notably FreeBSD, report extremely incorrect information about which characters are alphabetic in some locales (typically, all multi-byte locales including UTF-8 locales). However, R provides substitutes on Windows, macOS and AIX.
Prior to R version 1.9.0, underscores were not valid in variable names,
and code that relies on them being converted to dots will no longer
work. Use allow_ = FALSE
for back-compatibility.
allow_ = FALSE
is also useful when creating names for export to
applications which do not allow underline in names (such as some DBMSes).
make.unique
,
names
,
character
,
data.frame
.
make.names(c("a and b", "a-and-b"), unique = TRUE)
# "a.and.b"  "a.and.b.1"
make.names(c("a and b", "a_and_b"), unique = TRUE)
# "a.and.b"  "a_and_b"
make.names(c("a and b", "a_and_b"), unique = TRUE, allow_ = FALSE)
# "a.and.b"  "a.and.b.1"
make.names(c("", "X"), unique = TRUE)
# "X.1" "X" currently; R up to 3.0.2 gave "X" "X.1"

state.name[make.names(state.name) != state.name] # those 10 with a space
Makes the elements of a character vector unique by appending sequence numbers to duplicates.
make.unique(names, sep = ".")
names |
a character vector. |
sep |
a character string used to separate a duplicate name from its sequence number. |
The algorithm used by make.unique
has the property that
make.unique(c(A, B)) == make.unique(c(make.unique(A), B))
.
In other words, you can append one string at a time to a vector,
making it unique each time, and get the same result as applying
make.unique
to all of the strings at once.
If character vector A
is already unique, then
make.unique(c(A, B))
preserves A
.
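A small sketch of the incremental property for one particular input; A and B are illustrative vectors, and the check is for this input only.

```r
A <- c("a", "a")
B <- c("b", "a", "b")

all_at_once  <- make.unique(c(A, B))
step_by_step <- make.unique(c(make.unique(A), B))
identical(all_at_once, step_by_step)

make.unique(c("x", "x", "y"), sep = "_")   # a different separator
```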
A character vector of same length as names
with duplicates
changed, in the current locale's encoding.
Thomas P. Minka
make.unique(c("a", "a", "a"))
make.unique(c(make.unique(c("a", "a")), "a"))

make.unique(c("a", "a", "a.2", "a"))
make.unique(c(make.unique(c("a", "a")), "a.2", "a"))

## Now show a bit where this is used :
trace(make.unique)
## Applied in data.frame() constructions:
(d1 <- data.frame(x = 1, x = 2, x = 3)) # direct
 d2 <- data.frame(data.frame(x = 1, x = 2), x = 3) # pairwise
stopifnot(identical(d1, d2),
          colnames(d1) == c("x", "x.1", "x.2"))
untrace(make.unique)
mapply
is a multivariate version of sapply
.
mapply
applies FUN
to the first elements of each ...
argument, the second elements, the third elements, and so on.
Arguments are recycled if necessary.
.mapply()
is a bare-bones version of mapply()
, e.g., to be
used in other functions.
mapply(FUN, ..., MoreArgs = NULL, SIMPLIFY = TRUE,
       USE.NAMES = TRUE)

.mapply(FUN, dots, MoreArgs)
FUN |
function to apply, found via |
... |
arguments to vectorize over, will be recycled to common length (zero if one of them is). See also ‘Details’. |
dots |
|
MoreArgs |
a list of other arguments to |
SIMPLIFY |
logical or character string; attempt to reduce the
result to a vector, matrix or higher dimensional array; see
the |
USE.NAMES |
logical; use the names of the first ... argument, or if that is an unnamed character vector, use that vector as the names. |
mapply
calls FUN
for the values of ...
(re-cycled to the length of the longest, unless any have length zero
where recycling to zero length will return list()
),
followed by the arguments given in MoreArgs
. The arguments in
the call will be named if ...
or MoreArgs
are named.
For the arguments in ...
(or components in dots
) class specific
subsetting (such as [
) and length
methods will be
used where applicable.
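The calling convention can be sketched with a few anonymous functions; these are illustrative only.

```r
# Elements are paired up position by position; MoreArgs is passed whole to every call
mapply(function(x, y, z) x + y + z, 1:3, 4:6, MoreArgs = list(z = 100))

# Shorter arguments are recycled to the length of the longest
mapply(function(x, y) x + y, 1:4, 1:2)

# SIMPLIFY = FALSE always returns a list
mapply(function(x, n) rep(x, n), c("a", "b"), 2:1, SIMPLIFY = FALSE)

# .mapply() takes the vectorized arguments as a list and never simplifies
.mapply(function(x, y) x + y, dots = list(1:3, 4:6), MoreArgs = NULL)
```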
A list
, or for SIMPLIFY = TRUE
, a vector, array or list.
sapply
, after which mapply()
is modelled.
outer
, which applies a vectorized function to all
combinations of two arguments.
mapply(rep, 1:4, 4:1)

mapply(rep, times = 1:4, x = 4:1)

mapply(rep, times = 1:4, MoreArgs = list(x = 42))

mapply(function(x, y) seq_len(x) + y,
       c(a =  1, b = 2, c = 3),  # names from first
       c(A = 10, B = 0, C = -10))

word <- function(C, k) paste(rep.int(C, k), collapse = "")
## names from the first, too:
utils::str(L <- mapply(word, LETTERS[1:6], 6:1, SIMPLIFY = FALSE))

mapply(word, "A", integer()) # gave Error, now list()
For a contingency table in array form, compute the sum of table entries for a given margin or set of margins.
marginSums(x, margin = NULL)
margin.table(x, margin = NULL)
x |
an array, usually a |
margin |
a vector giving the margins to compute sums for.
E.g., for a matrix |
The relevant marginal table, or just the sum of all entries if margin
has length zero. The class of x
is copied to the
output table if margin
is non-NULL.
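For more than two dimensions, the margins can be combined. A minimal sketch, with a plain 3-way array a standing in for a contingency table:

```r
a <- array(1:24, dim = c(2, 3, 4))

marginSums(a)            # no margin: the grand total
marginSums(a, 1)         # totals for each level of the first dimension
marginSums(a, c(1, 3))   # a 2 x 4 table, summed over the second dimension

# The same numbers via apply()
all.equal(marginSums(a, c(1, 3)), apply(a, c(1, 3), sum))
```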
margin.table
is an earlier name, retained for back-compatibility.
Peter Dalgaard
rowSums
and colSums
for similar functionality.
proportions
and addmargins
.
m <- matrix(1:4, 2)
marginSums(m, 1)  # = rowSums(m)
marginSums(m, 2)  # = colSums(m)

DF <- as.data.frame(UCBAdmissions)
tbl <- xtabs(Freq ~ Gender + Admit, DF)
tbl
marginSums(tbl, "Gender")  # a 1-dim "table"
rowSums(tbl)               # a numeric vector
mat.or.vec
creates an nr
by nc
zero matrix if
nc
is greater than 1, and a zero vector of length nr
if
nc
equals 1.
mat.or.vec(nr, nc)
nr , nc
|
numbers of rows and columns. |
mat.or.vec(3, 1)
mat.or.vec(3, 2)
match
returns a vector of the positions of (first) matches of
its first argument in its second.
%in%
is a more intuitive interface as a binary operator,
which returns a logical vector indicating if there is a match or not
for its left operand.
match(x, table, nomatch = NA_integer_, incomparables = NULL)

x %in% table
x |
vector or |
table |
vector or |
nomatch |
the value to be returned in the case when no match is
found. Note that it is coerced to |
incomparables |
a vector of values that cannot be matched. Any
value in |
%in%
is currently defined as "%in%" <- function(x, table) match(x, table, nomatch = 0) > 0
Factors, raw vectors and lists are converted to character vectors,
internally classed objects are transformed via mtfrm
, and
then x
and table
are coerced to a common type (the later
of the two types in R's ordering, logical < integer < numeric <
complex < character) before matching. If incomparables
has
positive length it is coerced to the common type.
Matching for lists is potentially very slow and best avoided except in simple cases.
Exactly what matches what is to some extent a matter of definition.
For all types, NA
matches NA
and no other value.
For real and complex values, NaN
values are regarded
as matching any other NaN
value, but not matching NA
,
where for complex x
, real and imaginary parts must match both
(unless containing at least one NA
).
Character strings will be compared as byte sequences if any input is
marked as "bytes"
, and otherwise are regarded as equal if they are
in different encodings but would agree when translated to UTF-8 (see
Encoding
).
That %in%
never returns NA
makes it particularly
useful in if
conditions.
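The matching rules can be sketched on small vectors; tab is an illustrative lookup table.

```r
tab <- c("a", "b", "a", NA)

match(c("b", "z", "a", NA), tab)            # first positions; NA where there is no match
match(c("b", "z"), tab, nomatch = 0L)       # 0 instead of NA
c("b", "z", NA) %in% tab                    # always TRUE or FALSE

match(NaN, c(NA, NaN))                      # NaN matches NaN, not NA
match(c(1, 2), c(2, 1), incomparables = 1)  # 1 is declared unmatchable
```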
A vector of the same length as x
.
match
: An integer vector giving the position in table
of
the first match if there is a match, otherwise nomatch
.
If x[i]
is found to equal table[j]
then the value
returned in the i
-th position of the return value is j
,
for the smallest possible j
. If no match is found, the value
is nomatch
.
%in%
: A logical vector, indicating if a match was located for
each element of x
: thus the values are TRUE
or
FALSE
and never NA
.
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
pmatch
and charmatch
for (partial)
string matching, match.arg
, etc for function argument
matching.
findInterval
similarly returns a vector of positions, but
finds numbers within intervals, rather than exact matches.
is.element
for an S-compatible equivalent of %in%
.
unique
(and duplicated
) are using the same
definitions of “match” or “equality” as match()
,
and these are less strict than ==
, e.g., for
NA
and NaN
in numeric or complex vectors,
or for strings with different encodings, see also above.
## The intersection of two sets can be defined via match():
## Simple version:
## intersect <- function(x, y) y[match(x, y, nomatch = 0)]
intersect # the R function in base is slightly more careful
intersect(1:10, 7:20)

1:10 %in% c(1,3,5,9)
sstr <- c("c","ab","B","bba","c",NA,"@","bla","a","Ba","%")
sstr[sstr %in% c(letters, LETTERS)]

"%w/o%" <- function(x, y) x[!x %in% y] #--  x without y
(1:10) %w/o% c(3,7,12)
## Note that setdiff() is very similar and typically makes more sense:
        c(1:6,7:2) %w/o% c(3,7,12)  # -> keeps duplicates
setdiff(c(1:6,7:2),      c(3,7,12)) # -> unique values

## Illuminating example about NA matching
r <- c(1, NA, NaN)
zN <- c(complex(real = NA , imaginary =  r ), complex(real =  r , imaginary = NA ),
        complex(real =  r , imaginary = NaN), complex(real = NaN, imaginary =  r ))
zM <- cbind(Re=Re(zN), Im=Im(zN), match = match(zN, zN))
rownames(zM) <- format(zN)
zM ##--> many "NA's" (= 1) and the four non-NA's (3 different ones, at 7,9,10)

length(zN) # 12
unique(zN) # the "NA" and the 3 different non-NA NaN's
stopifnot(identical(unique(zN), zN[c(1, 7,9,10)]))

## very strict equality would have 4 duplicates (of 12):
symnum(outer(zN, zN, Vectorize(identical,c("x","y")),
                     FALSE,FALSE,FALSE,FALSE))
## removing "(very strictly) duplicates",
i <- c(5,8,11,12) # we get 8 pairwise non-identicals :
Ixy <- outer(zN[-i], zN[-i], Vectorize(identical,c("x","y")),
                     FALSE,FALSE,FALSE,FALSE)
stopifnot(identical(Ixy, diag(8) == 1))
match.arg
matches a character arg
against a table of
candidate values as specified by choices
.
match.arg(arg, choices, several.ok = FALSE)
arg |
a character vector (of length one unless |
choices |
a character vector of candidate values, often missing, see ‘Details’. |
several.ok |
logical specifying if |
In the one-argument form match.arg(arg)
, the choices are
obtained from a default setting for the formal argument arg
of
the function from which match.arg
was called. (Since default
argument matching will set arg
to choices
, this is
allowed as an exception to the ‘length one unless
several.ok
is TRUE
’ rule, and returns the first
element.)
Matching is done using pmatch
, so arg
may be
abbreviated and the empty string (""
) never matches, not even
itself, see pmatch
.
The unabbreviated version of the exact or unique partial match if
there is one; otherwise, an error is signalled if several.ok
is
false, as per default. When several.ok
is true and (at least)
one element of arg
has a match, all unabbreviated versions of
matches are returned.
The error messages given are liable to change and did so in R 4.2.0. Do not test them in packages.
pmatch
,
match.fun
,
match.call
.
require(stats)
## Extends the example for 'switch'
center <- function(x, type = c("mean", "median", "trimmed")) {
  type <- match.arg(type)
  switch(type,
         mean = mean(x),
         median = median(x),
         trimmed = mean(x, trim = .1))
}
x <- rcauchy(10)
center(x, "t")       # Works
center(x, "med")     # Works
try(center(x, "m"))  # Error
stopifnot(identical(center(x),       center(x, "mean")),
          identical(center(x, NULL), center(x, "mean")) )

## Allowing more than one 'arg' and hence more than one match:
match.arg(c("gauss", "rect", "ep"),
          c("gaussian", "epanechnikov", "rectangular", "triangular"),
          several.ok = TRUE)
match.arg(c("a", ""),  c("", NA, "bb", "abc"), several.ok=TRUE) # |-->  "abc"
match.call
returns a call in which all of the specified arguments are
specified by their full names.
match.call(definition = sys.function(sys.parent()),
           call = sys.call(sys.parent()),
           expand.dots = TRUE,
           envir = parent.frame(2L))
definition |
a function, by default the function from which
|
call |
an unevaluated call to the function specified by
|
expand.dots |
logical. Should arguments matching |
envir |
an environment, from which the |
‘function’ on this help page means an interpreted function (also known as a ‘closure’): match.call does not support primitive functions (where argument matching is normally positional).
match.call is most commonly used in two circumstances:
To record the call for later re-use: for example most model-fitting functions record the call as element call of the list they return. Here the default expand.dots = TRUE is appropriate.
To pass most of the call to another function, often model.frame. Here the common idiom is that expand.dots = FALSE is used, and the ... element of the matched call is removed. An alternative is to explicitly select the arguments to be passed on, as is done in lm.
Calling match.call outside a function without specifying definition is an error.
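The two uses described above can be sketched together. This is only an illustration: the wrapper my_fit and the data frame d are hypothetical names, and the argument-selection style shown is one possible way of forwarding part of a call to model.frame.

```r
## 'my_fit' and 'd' are hypothetical names used only for illustration.
my_fit <- function(formula, data, subset, ...) {
  cl <- match.call()                     # full call, recorded for later re-use
  mf <- match.call(expand.dots = FALSE)  # '...' is kept as a single element
  keep <- match(c("formula", "data", "subset"), names(mf), 0L)
  mf <- mf[c(1L, keep)]                  # keep only the arguments to pass on
  mf[[1L]] <- quote(stats::model.frame)  # retarget the call
  list(call = cl, frame = eval(mf, parent.frame()))
}

d <- data.frame(y = 1:6, x = c(2, 4, 6, 8, 10, 12), g = rep(c("a", "b"), 3))
res <- my_fit(y ~ x, data = d, subset = g == "a", extra = 1)
res$call         # all arguments appear under their full names
nrow(res$frame)  # only the rows selected by 'subset'
```

Selecting arguments by name means that anything passed through ... (here the made-up argument extra) never reaches model.frame.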
An object of class call
.
Chambers, J. M. (1998) Programming with Data. A Guide to the S Language. Springer.
sys.call()
is similar, but does not expand the
argument names;
call
, pmatch
, match.arg
,
match.fun
.
match.call(get, call("get", "abc", i = FALSE, p = 3))
## -> get(x = "abc", pos = 3, inherits = FALSE)
fun <- function(x, lower = 0, upper = 1) {
  structure((x - lower) / (upper - lower), CALL = match.call())
}
fun(4 * atan(1), u = pi)
When called inside functions that take a function as argument, extract the desired function object while avoiding undesired matching to objects of other types.
match.fun(FUN, descend = TRUE)
FUN |
item to match as function: a function, symbol or character string. See ‘Details’. |
descend |
logical; control whether to search past non-function objects. |
match.fun is not intended to be used at the top level since it will perform matching in the parent of the caller.
If FUN is a function, it is returned. If it is a symbol (for example, enclosed in backquotes) or a character vector of length one, it will be looked up using get in the environment of the parent of the caller. If it is of any other mode, match.fun first attempts to get the argument to the caller as a symbol (using substitute twice), and if that fails, an error is signalled.
If descend = TRUE, match.fun will look past non-function objects with the given name; otherwise if FUN points to a non-function object then an error is generated.
This is used in base functions such as apply, lapply, outer, and sweep.
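A minimal sketch of the intended use, inside a function that takes a function as argument. The helper apply_twice is a hypothetical name introduced only for this illustration.

```r
## 'apply_twice' is a hypothetical helper, not part of base R.
apply_twice <- function(x, FUN, ...) {
  FUN <- match.fun(FUN)   # accepts a function or a length-one character string
  FUN(FUN(x, ...), ...)
}

by_name   <- apply_twice(16, "sqrt")  # looked up by name in the caller's parent
by_object <- apply_twice(16, sqrt)    # a function object is returned as-is
c(by_name, by_object)
```

Resolving FUN once at the top of the function lets the rest of the body call it without caring how the caller supplied it.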
A function matching FUN
or an error is generated.
The descend argument is a bit of a misnomer and probably not actually needed by anything. It may go away in the future.
It is impossible to fully foolproof this. If one attaches a list or data frame containing a length-one character vector with the same name as a function, it may be used (although namespaces will help).
Peter Dalgaard and Robert Gentleman, based on an earlier version by Jonathan Rougier.
# Same as get("*"):
match.fun("*")
# Overwrite outer with a vector
outer <- 1:5
try(match.fun(outer, descend = FALSE)) #-> Error:  not a function
match.fun(outer) # finds it anyway
is.function(match.fun("outer")) # as well
abs(x) computes the absolute value of x, sqrt(x) computes the (principal) square root of x, √x.
The naming follows the standard for computer languages such as C or Fortran.
abs(x)
sqrt(x)
x |
a numeric or |
These are internal generic primitive functions: methods can be defined for them individually or via the Math group generic. For complex arguments (and the default method), z, abs(z) == Mod(z) and sqrt(z) == z^0.5.
abs(x) returns an integer vector when x is integer or logical.
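The return types and the complex-argument identities stated above can be checked directly. The values below are arbitrary illustrations.

```r
## Integer and logical input give an integer result; double input stays double.
abs_int <- abs(c(-3L, 2L))
abs_lgl <- abs(c(TRUE, FALSE))
abs_dbl <- abs(-3.5)
c(typeof(abs_int), typeof(abs_lgl), typeof(abs_dbl))

## For complex z, abs() is the modulus and sqrt() agrees with z^0.5.
z <- complex(real = 3, imaginary = -4)
abs(z) == Mod(z)
all.equal(sqrt(z), z^0.5)
```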
Both are S4 generic and members of the
Math
group generic.
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
Arithmetic
for simple, log
for logarithmic,
sin
for trigonometric, and Special
for
special mathematical functions.
‘plotmath’ for the use of sqrt
in plot annotation.
require(stats) # for spline
require(graphics)
xx <- -9:9
plot(xx, sqrt(abs(xx)),  col = "red")
lines(spline(xx, sqrt(abs(xx)), n=101), col = "pink")
Multiplies two matrices, if they are conformable. If one argument is a vector, it will be promoted to either a row or column matrix to make the two arguments conformable. If both are vectors of the same length, it will return the inner product (as a matrix).
x %*% y
x , y
|
numeric or complex matrices or vectors. |
When a vector is promoted to a matrix, its names are not promoted to row or column names, unlike as.matrix.
Promotion of a vector to a 1-row or 1-column matrix happens when one of the two choices allows x and y to get conformable dimensions.
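A small sketch of the promotion rule, using made-up matrices A and B: the same named vector is treated as a column in one product and as a row in the other, and its names are dropped in both cases.

```r
x <- c(a = 1, b = 2, c = 3)

A <- matrix(1:6, nrow = 2)   # 2 x 3, so x is treated as a 3 x 1 column
Ax <- A %*% x
dim(Ax)

B <- matrix(1:6, nrow = 3)   # 3 x 2, so x is treated as a 1 x 3 row
xB <- x %*% B
dim(xB)

dimnames(Ax)                 # names of x are not promoted by %*%
rownames(as.matrix(x))       # whereas as.matrix() keeps them as row names
```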
This operator is a generic function: methods can be written for it
individually or via the matOps
group
generic function; it dispatches to S3 and S4 methods. Methods need to be
written for a function that takes two arguments named x
and y
.
A double or complex matrix product. Use drop
to remove
dimensions which have only one level.
The propagation of NaN/Inf values, precision, and performance of matrix
products can be controlled by options("matprod")
.
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
For matrix cross products, crossprod()
and
tcrossprod()
are typically preferable.
matrix
, Arithmetic
, diag
.
x <- 1:4
(z <- x %*% x)    # scalar ("inner") product (1 x 1 matrix)
drop(z)             # as scalar

y <- diag(x)
z <- matrix(1:12, ncol = 3, nrow = 4)
y %*% z
y %*% x
x %*% z
matrix
creates a matrix from the given set of values.
as.matrix
attempts to turn its argument into a matrix.
is.matrix
tests if its argument is a (strict) matrix.
matrix(data = NA, nrow = 1, ncol = 1, byrow = FALSE,
       dimnames = NULL)

as.matrix(x, ...)
## S3 method for class 'data.frame'
as.matrix(x, rownames.force = NA, ...)

is.matrix(x)
data |
an optional data vector (including a list or
|
nrow |
the desired number of rows. |
ncol |
the desired number of columns. |
byrow |
logical. If |
dimnames |
a |
x |
an R object. |
... |
additional arguments to be passed to or from methods. |
rownames.force |
logical indicating if the resulting matrix
should have character (rather than |
If one of nrow
or ncol
is not given, an attempt is
made to infer it from the length of data
and the other
parameter. If neither is given, a one-column matrix is returned.
If there are too few elements in data
to fill the matrix,
then the elements in data
are recycled. If data
has
length zero, NA
of an appropriate type is used for atomic
vectors (0
for raw vectors) and NULL
for lists.
is.matrix
returns TRUE
if x
is a vector and has a
"dim"
attribute of length 2 and FALSE
otherwise.
Note that a data.frame
is not a matrix by this
test. The function is generic: you can write methods to handle
specific classes of objects, see InternalMethods.
as.matrix is a generic function. The method for data frames will return a character matrix if there are only atomic columns and any non-(numeric/logical/complex) column, applying as.vector to factors and format to other non-character columns. Otherwise, the usual coercion hierarchy (logical < integer < double < complex) will be used, e.g., all-logical data frames will be coerced to a logical matrix, mixed logical-integer will give an integer matrix, etc.
The default method for as.matrix
calls as.vector(x)
, and
hence e.g. coerces factors to character vectors.
When coercing a vector, it produces a one-column matrix, and promotes the names (if any) of the vector to the rownames of the matrix.
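The coercion rules above can be illustrated with a few toy objects. The data frames df_num and df_mix and the vector v are invented for this sketch.

```r
## Mixed logical/integer columns give an integer matrix.
df_num <- data.frame(flag = c(TRUE, FALSE), n = 1:2)
type_num <- typeof(as.matrix(df_num))

## Any non-(numeric/logical/complex) column gives a character matrix.
df_mix <- data.frame(n = 1:2, s = c("a", "b"))
type_mix <- typeof(as.matrix(df_mix))

## A named vector becomes a one-column matrix with row names.
v <- c(a = 1, b = 2)
m <- as.matrix(v)

c(type_num, type_mix)
dim(m)
rownames(m)
is.matrix(df_num)   # a data frame is not a matrix by this test
```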
is.matrix
is a primitive function.
The print
method for a matrix gives a rectangular layout with
dimnames or indices. For a list matrix, the entries of length not
one are printed in the form ‘integer,7’ indicating the type
and length.
If you just want to convert a vector to a matrix, something like
dim(x) <- c(nx, ny)
dimnames(x) <- list(row_names, col_names)
will avoid duplicating x and preserve class(x) which may be useful, e.g., for Date objects.
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
data.matrix
, which attempts to convert to a numeric
matrix.
A matrix is the special case of a two-dimensional array
.
inherits(m, "array")
is true for a matrix
m
.
is.matrix(as.matrix(1:10))
!is.matrix(warpbreaks)  # data.frame, NOT matrix!
warpbreaks[1:10,]
as.matrix(warpbreaks[1:10,])  # using as.matrix.data.frame(.) method

## Example of setting row and column names
mdat <- matrix(c(1,2,3, 11,12,13), nrow = 2, ncol = 3, byrow = TRUE,
               dimnames = list(c("row1", "row2"),
                               c("C.1", "C.2", "C.3")))
mdat
Find the maximum position for each row of a matrix, breaking ties at random.
max.col(m, ties.method = c("random", "first", "last"))
m |
a numerical matrix. |
ties.method |
a character string specifying how ties are
handled, |
When ties.method = "random", as per default, ties are broken at random. In this case, the determination of a tie assumes that the entries are probabilities: there is a relative tolerance of 1e-5, relative to the largest (in magnitude, omitting infinity) entry in the row.
If ties.method = "first"
, max.col
returns the
column number of the first of several maxima in every row, the
same as unname(apply(m, 1, which.max))
if m
has no missing values.
Correspondingly, ties.method = "last"
returns the last
of possibly several indices.
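The deterministic tie-breaking rules can be seen on a small made-up matrix in which the first two rows contain ties.

```r
m <- rbind(c(1, 3, 3),
           c(5, 2, 5),
           c(0, 0, 7))

first <- max.col(m, ties.method = "first")  # leftmost maximum in each row
last  <- max.col(m, ties.method = "last")   # rightmost maximum in each row
first
last

## With no missing values, "first" agrees with which.max applied row by row.
identical(first, unname(apply(m, 1, which.max)))
```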
index of a maximal value for each row, an integer vector of
length nrow(m)
.
Venables, W. N. and Ripley, B. D. (2002) Modern Applied Statistics with S. New York: Springer (4th ed).
which.max
for vectors.
table(mc <- max.col(swiss))  # mostly "1" and "5", 5 x "2" and once "4"
swiss[unique(print(mr <- max.col(t(swiss)))) , ]  # 3 33 45 45 33 6

set.seed(1)  # reproducible example:
(mm <- rbind(x = round(2*stats::runif(12)),
             y = round(5*stats::runif(12)),
             z = round(8*stats::runif(12))))
## Not run: 
  [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
x    1    1    1    2    0    2    2    1    1     0     0     0
y    3    2    4    2    4    5    2    4    5     1     3     1
z    2    3    0    3    7    3    4    5    4     1     7     5

## End(Not run)
## column indices of all row maxima :
utils::str(lapply(1:3, function(i) which(mm[i,] == max(mm[i,]))))
max.col(mm) ; max.col(mm) # "random"
max.col(mm, "first") # -> 4 6 5
max.col(mm, "last")  # -> 7 9 11
Generic function for the (trimmed) arithmetic mean.
mean(x, ...)

## Default S3 method:
mean(x, trim = 0, na.rm = FALSE, ...)
x |
an R object. Currently there are methods for
numeric/logical vectors and date,
date-time and time interval objects. Complex vectors
are allowed for |
trim |
the fraction (0 to 0.5) of observations to be
trimmed from each end of |
na.rm |
a logical evaluating to |
... |
further arguments passed to or from other methods. |
If trim
is zero (the default), the arithmetic mean of the
values in x
is computed, as a numeric or complex vector of
length one. If x
is not logical (coerced to numeric), numeric
(including integer) or complex, NA_real_
is returned, with a warning.
If trim
is non-zero, a symmetrically trimmed mean is computed
with a fraction of trim
observations deleted from each end
before the mean is computed.
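As a rough check of what "a fraction of trim observations deleted from each end" means, the trimmed mean can be reproduced by hand. This sketch assumes that the number of observations dropped from each end is floor(n * trim); the variable names are illustrative only.

```r
x <- c(0:10, 50)
trim <- 0.10

## Assumption: floor(n * trim) observations are dropped from each end.
k <- floor(length(x) * trim)
xs <- sort(x)
manual <- mean(xs[(k + 1):(length(xs) - k)])

trimmed <- mean(x, trim = trim)
c(manual = manual, trimmed = trimmed, untrimmed = mean(x))
```

Here one value is removed from each end (0 and 50), so the outlier no longer influences the result.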
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
weighted.mean
, mean.POSIXct
,
colMeans
for row and column means.
x <- c(0:10, 50)
xm <- mean(x)
c(xm, mean(x, trim = 0.10))
In-memory compression or decompression for raw vectors.
memCompress(from, type = c("gzip", "bzip2", "xz", "none"))

memDecompress(from,
              type = c("unknown", "gzip", "bzip2", "xz", "none"),
              asChar = FALSE)
from |
raw vector. For |
type |
character string, the type of compression. May be abbreviated to a single letter, defaults to the first of the alternatives. |
asChar |
logical: should the result be converted to a character
string? NB: character strings have a limit of
|
type = "none"
passes the input through unchanged, but may be
useful if type
is a variable.
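A sketch of the case where type is a variable: the same round-trip code can loop over every compression type, with "none" acting as a pass-through. The input raw_in is an arbitrary, highly repetitive test vector.

```r
raw_in <- charToRaw(strrep("abc", 1000))   # 3000 bytes of repetitive input

sizes <- integer()
for (type in c("gzip", "bzip2", "xz", "none")) {
  z <- memCompress(raw_in, type)
  out <- memDecompress(z, type)            # same 'type' variable on the way back
  stopifnot(identical(out, raw_in))
  sizes[type] <- length(z)
}
sizes   # compressed size in bytes for each type
```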
type = "unknown"
attempts to detect the type of compression
applied (if any): this will always succeed for bzip2
compression, and will succeed for other forms if there is a suitable
header. If no type of compression is detected this is the same as
type = "none"
but a warning is given.
gzip
compression uses whatever is the default compression
level of the underlying library (usually 6
). This supports the
RFC 1950 format, sometimes known as ‘zlib’ format, for
compression and decompression and for decompression only RFC 1952, the
‘gzip’ format (which wraps the ‘zlib’ format with a
header and footer).
bzip2 compression always adds a header ("BZh"). The underlying library only supports in-memory (de)compression of up to 2^31 - 1 elements. Compression is equivalent to bzip2 -9 (the default).
Compressing with type = "xz"
is equivalent to compressing a
file with xz -9e
(including adding the ‘magic’
header): decompression should cope with the contents of any file
compressed by xz
version 4.999 and later, as well as by some
versions of lzma
. There are other versions, in particular
‘raw’ streams, that are not currently handled.
All the types of compression can expand the input: for "gzip"
and "bzip2"
the maximum expansion is known and so
memCompress
can always allocate sufficient space. For
"xz"
it is possible (but extremely unlikely) that compression
will fail if the output would have been too large.
A raw vector or a character string (if asChar = TRUE
).
libdeflate
Support for the libdeflate
library was added for R 4.4.0. It
uses different code for the RFC 1950 ‘zlib’ format (and RFC
1952 for decompression), expected to be substantially faster than
using the reference (or system) zlib library. It is used for
type = "gzip"
if available.
The headers and sources can be downloaded from https://github.com/ebiggers/libdeflate and pre-built versions are available for most Linux distributions. It is used for binary Windows distributions.
extSoftVersion
for the versions of the zlib
or
libdeflate
, bzip2
and xz
libraries in use.
https://en.wikipedia.org/wiki/Data_compression for background on data compression, https://zlib.net/, https://en.wikipedia.org/wiki/Gzip, http://www.bzip.org/, https://en.wikipedia.org/wiki/Bzip2, and https://en.wikipedia.org/wiki/XZ_Utils for references about the particular schemes used.
txt <- readLines(file.path(R.home("doc"), "COPYING"))
sum(nchar(txt))
txt.gz <- memCompress(txt, "g") # "gzip", the default
length(txt.gz)
txt2 <- strsplit(memDecompress(txt.gz, "g", asChar = TRUE), "\n")[[1]]
stopifnot(identical(txt, txt2))
## as from R 4.4.0 this is detected if not specified.
txt2b <- strsplit(memDecompress(txt.gz, asChar = TRUE), "\n")[[1]]
stopifnot(identical(txt2b, txt2))

txt.bz2 <- memCompress(txt, "b")
length(txt.bz2)
## can auto-detect bzip2:
txt3 <- strsplit(memDecompress(txt.bz2, asChar = TRUE), "\n")[[1]]
stopifnot(identical(txt, txt3))

## xz compression is only worthwhile for large objects
txt.xz <- memCompress(txt, "x")
length(txt.xz)
txt3 <- strsplit(memDecompress(txt.xz, asChar = TRUE), "\n")[[1]]
stopifnot(identical(txt, txt3))

## test decompressing a gzip-ed file
tf <- tempfile(fileext = ".gz")
con <- gzfile(tf, "w")
writeLines(txt, con)
close(con)
(nf <- file.size(tf))
# if (nzchar(Sys.which("file"))) system2("file", tf)
foo <- readBin(tf, "raw", n = nf)
unlink(tf)
## will detect the gzip header and choose type = "gzip"
txt3 <- strsplit(memDecompress(foo, asChar = TRUE), "\n")[[1]]
stopifnot(identical(txt, txt3))
Query and set the maximal size of the vector heap and the maximal number of heap nodes for the current R process.
mem.maxVSize(vsize = 0)
mem.maxNSize(nsize = 0)
vsize |
numeric; new size limit in Mb. |
nsize |
numeric; new maximal node number. |
New limits lower than current usage are ignored.
Specifying a size of Inf
sets the limit to the maximal possible
value for the platform.
The default maximal values are unlimited on most platforms, but can be
adjusted using environment variables as described in
Memory
. On macOS a lower default vector heap limit is
used to protect against the R process being killed when macOS
over-commits memory.
Adjusting the maximal number of nodes is rarely necessary. Adjusting the vector heap size limit can be useful on macOS in particular but should be done with caution.
The current or new value, in Mb for mem.maxVSize
. Inf
is
returned if the current value is unlimited.
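A minimal sketch of querying the limits. It relies on the statement above that new limits lower than current usage are ignored, so calling with the default argument of 0 should leave the limits unchanged and simply report them.

```r
## Query only: the default argument 0 is below current usage and is ignored.
v_limit <- mem.maxVSize()   # vector heap limit in Mb, Inf if unlimited
n_limit <- mem.maxNSize()   # maximal number of nodes, Inf if unlimited
c(vsize_Mb = v_limit, nsize = n_limit)
```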
How R manages its workspace.
R has a variable-sized workspace. There are (rarely-used) command-line options to control its minimum size, but no longer any to control the maximum size.
R maintains separate areas for fixed and variable sized objects. The first of these is allocated as an array of cons cells (Lisp programmers will know what they are, others may think of them as the building blocks of the language itself, parse trees, etc.), and the second are thrown on a heap of ‘Vcells’ of 8 bytes each. Each cons cell occupies 28 bytes on a 32-bit build of R, (usually) 56 bytes on a 64-bit build.
The default values are (currently) an initial setting of 350k cons cells and 6Mb of vector heap. Note that the areas are not actually allocated initially: rather these values are the sizes for triggering garbage collection. These values can be set by the command line options --min-nsize and --min-vsize (or if they are not used, the environment variables R_NSIZE and R_VSIZE) when R is started. Thereafter R will grow or shrink the areas depending on usage, never decreasing below the initial values. The maximal vector heap size can be set with the environment variable R_MAX_VSIZE. An attempt to set a lower maximum than the current usage is ignored. Vector heap limits are given in bytes.
How much time R spends in the garbage collector will depend on these initial settings and on the trade-off the memory manager makes, when memory fills up, between collecting garbage to free up unused memory and growing these areas. The strategy used for growth can be specified by setting the environment variable R_GC_MEM_GROW to an integer value between 0 and 3. This variable is read at start-up. Higher values grow the heap more aggressively, thus reducing garbage collection time but using more memory.
You can find out the current memory consumption (the heap and cons
cells used as numbers and megabytes) by typing gc()
at the
R prompt. Note that following gcinfo(TRUE)
, automatic
garbage collection always prints memory use statistics.
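For instance, the matrix returned by gc() can be inspected programmatically rather than read off the console. This sketch uses only the row names and the "used" column; the exact set of remaining columns may differ between R versions.

```r
g <- gc()          # triggers a collection and returns a usage summary
rownames(g)        # "Ncells" (cons cells) and "Vcells" (vector heap)
g[, "used"]        # counts of cells currently in use

## Vcells are 8 bytes each, so the vector heap in use is roughly:
g["Vcells", "used"] * 8 / 1024^2   # Mb
```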
The command-line option --max-ppsize controls the maximum size of the pointer protection stack. This defaults to 50000, but can be increased to allow deep recursion or large and complicated calculations to be done. Note that parts of the garbage collection process go through the full reserved pointer protection stack and hence become slower when the size is increased. Currently the maximum value accepted is 500000.
An Introduction to R for more command-line options.
Memory-limits
for the design limitations.
gc
for information on the garbage collector and total
memory usage, object.size(a)
for the (approximate)
size of R object a
. memory.profile
for
profiling the usage of cons cells.
R holds objects it is using in virtual memory. This help file documents the current design limitations on large objects: these differ between 32-bit and 64-bit builds of R.
Currently R runs on 32- and 64-bit operating systems, and most 64-bit OSes (including Linux, Solaris, Windows and macOS) can run either 32- or 64-bit builds of R. The memory limits depend mainly on the build, but for a 32-bit build of R on Windows they also depend on the underlying OS version.
R holds all objects in virtual memory, and there are limits based on the amount of memory that can be used by all objects:
There may be limits on the size of the heap and the number of
cons cells allowed – see Memory
– but these are
usually not imposed.
There is a limit on the (user) address space of a single process such as the R executable. This is system-specific, and can depend on the executable.
The environment may impose limitations on the resources available to a single process: Windows' versions of R do so directly.
Error messages beginning ‘cannot allocate vector of size’ indicate a failure to obtain memory, either because the size exceeded the address-space limit for a process or, more likely, because the system was unable to provide the memory. Note that on a 32-bit build there may well be enough free memory available, but not a large enough contiguous block of address space into which to map it.
There are also limits on individual objects. The storage space cannot exceed the address limit, and if you try to exceed that limit, the error message begins ‘cannot allocate vector of length’. The number of bytes in a character string is limited to 2^31 - 1 (about 2*10^9), which is also the limit on each dimension of an array.
The address-space limit is system-specific: 32-bit OSes impose a limit of no more than 4Gb: it is often 3Gb. Running 32-bit executables on a 64-bit OS will have similar limits: 64-bit executables will have an essentially infinite system-specific limit (e.g., 128Tb for Linux on x86_64 CPUs).
See the OS/shell's help on commands such as limit or ulimit for how to impose limitations on the resources available to a single process. For example a bash user could use
ulimit -t 600 -v 4000000
whereas a csh user might use
limit cputime 10m
limit vmemoryuse 4096m
to limit a process to 10 minutes of CPU time and (around) 4Gb of virtual memory. (There are other options to set the RAM in use, but they are not generally honoured.)
The address-space limit is 2Gb under 32-bit Windows unless the OS's default has been changed to allow more (up to 3Gb). See https://docs.microsoft.com/en-gb/windows/desktop/Memory/physical-address-extension and https://docs.microsoft.com/en-gb/windows/desktop/Memory/4-gigabyte-tuning. Under most 64-bit versions of Windows the limit for a 32-bit build of R is 4Gb: for the oldest ones it is 2Gb. The limit for a 64-bit build of R (imposed by the OS) is 8Tb.
It is not normally possible to allocate as much as 2Gb to a single vector in a 32-bit build of R even on 64-bit Windows because of preallocations by Windows in the middle of the address space.
object.size(a)
for the (approximate) size of R object
a
.
Lists the usage of the cons cells by SEXPREC
type.
memory.profile()
The current types and their uses are listed in the include file ‘Rinternals.h’.
A vector of counts, named by the types. See typeof
for
an explanation of types.
gc
for the overall usage of cons cells.
Rprofmem
and tracemem
allow memory profiling
of specific code or objects, but need to be enabled at compile time.
memory.profile()
Merge two data frames by common columns or row names, or do other versions of database join operations.
merge(x, y, ...)

## Default S3 method:
merge(x, y, ...)

## S3 method for class 'data.frame'
merge(x, y, by = intersect(names(x), names(y)),
      by.x = by, by.y = by, all = FALSE, all.x = all, all.y = all,
      sort = TRUE, suffixes = c(".x",".y"), no.dups = TRUE,
      incomparables = NULL, ...)
x , y
|
data frames, or objects to be coerced to one. |
by , by.x , by.y
|
specifications of the columns used for merging. See ‘Details’. |
all |
logical; |
all.x |
logical; if |
all.y |
logical; analogous to |
sort |
logical. Should the result be sorted on the |
suffixes |
a character vector of length 2 specifying the suffixes
to be used for making unique the names of columns in the result
which are not used for merging (appearing in |
no.dups |
logical indicating that |
incomparables |
values which cannot be matched. See
|
... |
arguments to be passed to or from methods. |
merge
is a generic function whose principal method is for data
frames: the default method coerces its arguments to data frames and
calls the "data.frame"
method.
By default the data frames are merged on the columns with names they
both have, but separate specifications of the columns can be given by
by.x
and by.y
. The rows in the two data frames that
match on the specified columns are extracted, and joined together. If
there is more than one match, all possible matches contribute one row
each. For the precise meaning of ‘match’, see
match
.
Columns to merge on can be specified by name, number or by a logical
vector: the name "row.names"
or the number 0
specifies
the row names. If specified by name it must correspond uniquely to a
named column in the input.
If by
or both by.x
and by.y
are of length 0 (a
length zero vector or NULL
), the result, r
, is the
Cartesian product of x
and y
, i.e.,
dim(r) = c(nrow(x)*nrow(y), ncol(x) + ncol(y))
.
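A small sketch of the Cartesian-product case with two invented data frames; every row of one is paired with every row of the other.

```r
x <- data.frame(a = 1:2)
y <- data.frame(b = c("u", "v", "w"))

r <- merge(x, y, by = NULL)   # no merge columns: Cartesian product
r
dim(r)                        # nrow(x) * nrow(y) rows, ncol(x) + ncol(y) columns
```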
If all.x
is true, all the non matching cases of x
are
appended to the result as well, with NA
filled in the
corresponding columns of y
; analogously for all.y
.
If the columns in the data frames not used in merging have any common
names, these have suffixes
(".x"
and ".y"
by
default) appended to try to make the names of the result unique. If
this is not possible, an error is thrown.
If a by.x
column name matches one of y
, and if
no.dups
is true (as by default), the y version gets suffixed as
well, avoiding duplicate column names in the result.
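To illustrate the suffix handling, here is a sketch with two toy data frames that share a non-key column called value. The custom suffixes "_left" and "_right" are arbitrary choices for the example.

```r
x <- data.frame(id = 1:2, value = c(10, 20))
y <- data.frame(id = 1:2, value = c(0.1, 0.2))

default_names <- names(merge(x, y, by = "id"))
custom_names  <- names(merge(x, y, by = "id", suffixes = c("_left", "_right")))
default_names
custom_names
```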
The complexity of the algorithm used is proportional to the length of the answer.
In SQL database terminology, the default value of all = FALSE
gives a natural join, a special case of an inner
join. Specifying all.x = TRUE
gives a left (outer)
join, all.y = TRUE
a right (outer) join, and both
(all = TRUE
) a (full) outer join. DBMSes do not match
NULL
records, equivalent to incomparables = NA
in R.
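The four join types can be compared side by side on two invented data frames, left and right, that share the key column id.

```r
left  <- data.frame(id = c(1, 2, 3), l = c("a", "b", "c"))
right <- data.frame(id = c(2, 3, 4), r = c("x", "y", "z"))

inner   <- merge(left, right)                 # natural (inner) join
left_j  <- merge(left, right, all.x = TRUE)   # left outer join
right_j <- merge(left, right, all.y = TRUE)   # right outer join
full    <- merge(left, right, all = TRUE)     # full outer join

## Keys present in each result; unmatched rows carry NA in the other table's columns.
list(inner = inner$id, left = left_j$id, right = right_j$id, full = full$id)
```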
A data frame. The rows are by default lexicographically sorted on the
common columns, but for sort = FALSE
are in an unspecified order.
The columns are the common columns followed by the
remaining columns in x
and then those in y
. If the
matching involved row names, an extra character column called
Row.names
is added at the left, and in all cases the result has
‘automatic’ row names.
This is intended to work with data frames with vector-like columns: some aspects work with data frames containing matrices, but not all.
Currently long vectors are not accepted for inputs, which are thus restricted to less than 2^31 rows. That restriction also applies to the result for 32-bit platforms.
data.frame
,
by
,
cbind
.
dendrogram
for a class which has a merge
method.
authors <- data.frame(
    ## I(*) : use character columns of names to get sensible sort order
    surname = I(c("Tukey", "Venables", "Tierney", "Ripley", "McNeil")),
    nationality = c("US", "Australia", "US", "UK", "Australia"),
    deceased = c("yes", rep("no", 4)))
authorN <- within(authors, { name <- surname; rm(surname) })
books <- data.frame(
    name = I(c("Tukey", "Venables", "Tierney",
               "Ripley", "Ripley", "McNeil", "R Core")),
    title = c("Exploratory Data Analysis",
              "Modern Applied Statistics ...",
              "LISP-STAT",
              "Spatial Statistics", "Stochastic Simulation",
              "Interactive Data Analysis",
              "An Introduction to R"),
    other.author = c(NA, "Ripley", NA, NA, NA, NA,
                     "Venables & Smith"))

(m0 <- merge(authorN, books))
(m1 <- merge(authors, books, by.x = "surname", by.y = "name"))
 m2 <- merge(books, authors, by.x = "name", by.y = "surname")
stopifnot(exprs = {
   identical(m0, m2[, names(m0)])
   as.character(m1[, 1]) == as.character(m2[, 1])
   all.equal(m1[, -1], m2[, -1][ names(m1)[-1] ])
   identical(dim(merge(m1, m2, by = NULL)),
             c(nrow(m1)*nrow(m2), ncol(m1)+ncol(m2)))
})

## "R core" is missing from authors and appears only here :
merge(authors, books, by.x = "surname", by.y = "name", all = TRUE)

## example of using 'incomparables'
x <- data.frame(k1 = c(NA,NA,3,4,5), k2 = c(1,NA,NA,4,5), data = 1:5)
y <- data.frame(k1 = c(NA,2,NA,4,5), k2 = c(NA,NA,3,4,5), data = 1:5)
merge(x, y, by = c("k1","k2")) # NA's match
merge(x, y, by = "k1") # NA's match, so 6 rows
merge(x, y, by = "k2", incomparables = NA) # 2 rows
Generate a diagnostic message from its arguments.
message(..., domain = NULL, appendLF = TRUE)
suppressMessages(expr, classes = "message")

packageStartupMessage(..., domain = NULL, appendLF = TRUE)
suppressPackageStartupMessages(expr)

.makeMessage(..., domain = NULL, appendLF = FALSE)
... |
zero or more objects which can be coerced to character (and which are pasted together with no separator) or (for message only) a single condition object. |
domain |
see gettext. |
appendLF |
logical: should messages given as a character string have a newline appended? |
expr |
expression to evaluate. |
classes |
character, indicating which classes of messages should be suppressed. |
message
is used for generating ‘simple’ diagnostic
messages which are neither warnings nor errors, but nevertheless
represented as conditions. Unlike warnings and errors, a final
newline is regarded as part of the message, and is optional.
The default handler sends the message to the
stderr()
connection.
If a condition object is supplied to message
it should be
the only argument, and further arguments will be ignored, with a warning.
While the message is being processed, a muffleMessage
restart
is available.
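As a minimal sketch of how the muffleMessage restart might be used, a calling handler can record each message and then invoke the restart so that the default handler never writes to stderr(). The helper name collect_messages is hypothetical and not part of base R.

```r
# Hypothetical helper: evaluate `expr`, collect the text of every message
# it signals, and keep those messages away from the default handler.
collect_messages <- function(expr) {
  seen <- character()
  value <- withCallingHandlers(
    expr,
    message = function(m) {
      seen <<- c(seen, conditionMessage(m))
      invokeRestart("muffleMessage")  # skip the default handler
    }
  )
  list(value = value, messages = seen)
}

res <- collect_messages({
  message("step ", 1)
  message("no newline", appendLF = FALSE)
  42
})
res$value     # 42
res$messages  # "step 1\n" "no newline"
```

The recorded text shows that the final newline is part of the message itself, and that it is absent when appendLF = FALSE.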
suppressMessages
evaluates its expression in a context that
ignores all ‘simple’ diagnostic messages.
packageStartupMessage
is a variant whose messages can be
suppressed separately by suppressPackageStartupMessages
. (They
are still messages, so can be suppressed by suppressMessages
.)
.makeMessage
is a utility used by message
, warning
and stop
to generate a text message from the ...
arguments by possible translation (see gettext
) and
concatenation (with no separator).
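A small illustration of the concatenation step, assuming no translation catalogue applies to these strings:

```r
# Arguments are coerced to character and pasted with no separator;
# unlike message(), no newline is appended by default.
txt <- .makeMessage("read ", 3L, " files")
txt                                     # "read 3 files"
txt_lf <- .makeMessage("done", appendLF = TRUE)
txt_lf                                  # "done\n"
```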
warning
and stop
for generating warnings
and errors; conditions
for condition handling and
recovery.
gettext
for the mechanisms for the automated translation
of text.
message("ABC", "DEF")
suppressMessages(message("ABC"))

testit <- function() {
  message("testing package startup messages")
  packageStartupMessage("initializing ...", appendLF = FALSE)
  Sys.sleep(1)
  packageStartupMessage(" done")
}

testit()
suppressPackageStartupMessages(testit())
suppressMessages(testit())
missing
can be used to test whether a value was specified
as an argument to a function.
missing(x)
x |
a formal argument. |
missing(x)
is only reliable if x
has not been altered
since entering the function: in particular it will always
be false after x <- match.arg(x)
.
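The caveat about altered arguments can be seen in a short sketch (the function f is purely illustrative):

```r
f <- function(x) {
  before <- missing(x)  # TRUE when the caller supplied no value
  x <- 0                # altering x ...
  after <- missing(x)   # ... makes missing(x) FALSE from here on
  c(before = before, after = after)
}
f()    # before TRUE, after FALSE
f(1)   # both FALSE
```

For this reason it is usually safest to call missing() at the very top of the function body and store the result.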
The example shows how a plotting function can be written to work with either a pair of vectors giving x and y coordinates of points to be plotted or a single vector giving y values to be plotted against their indices.
Currently missing
can only be used in the immediate body of
the function that defines the argument, not in the body of a nested
function or a local
call. This may change in the future.
This is a ‘special’ primitive function: it must not evaluate its argument.
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
Chambers, J. M. (1998) Programming with Data. A Guide to the S Language. Springer.
substitute
for argument expression;
NA
for missing values in data.
myplot <- function(x, y) {
    if(missing(y)) {
        y <- x
        x <- 1:length(y)
    }
    plot(x, y)
}
Get or set the ‘mode’ (a kind of ‘type’), or the storage mode of an R object.
mode(x)
mode(x) <- value
storage.mode(x)
storage.mode(x) <- value
x |
any R object. |
value |
a character string giving the desired mode or ‘storage mode’ (type) of the object. |
Both mode
and storage.mode
return a character string
giving the (storage) mode of the object — often the same — both
relying on the output of typeof(x)
, see the example
below.
mode(x) <- "newmode" changes the mode of object x to newmode. This is only supported if there is an appropriate as.newmode function, for example "logical", "integer", "double", "complex", "raw", "character", "list", "expression", "name", "symbol" and "function". Attributes are preserved (but see below).
storage.mode(x) <- "newmode"
is a more efficient primitive
version of mode<-
, which works for "newmode"
which is
one of the internal types (see typeof
), but not for
"single"
. Attributes are preserved.
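A brief sketch of changing the storage mode of a matrix while keeping its attributes (the object m is just an example):

```r
m <- matrix(c(1, 2, 3, 4), nrow = 2, dimnames = list(NULL, c("a", "b")))
typeof(m)                     # "double"
storage.mode(m) <- "integer"
typeof(m)                     # "integer"
dim(m)                        # 2 2, the dim attribute is kept
colnames(m)                   # "a" "b"
```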
As storage mode "single"
is only a pseudo-mode in R, it will
not be reported by mode
or storage.mode
: use
attr(object, "Csingle")
to examine this. However,
mode<-
can be used to set the mode to "single"
,
which sets the real mode to "double"
and the "Csingle"
attribute to TRUE
. Setting any other mode will remove this
attribute.
Note (in the examples below) that some calls have mode "(" which is S compatible.
Modes have the same set of names as types (see typeof) except that
types "integer" and "double" are returned as "numeric".
types "special", "builtin" and "closure" are returned as "function".
type "symbol" is called mode "name".
type "language" is returned as "(" or "call".
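The mapping from types to modes listed above can be checked directly. This is only a sketch; the list objs is an arbitrary selection of objects chosen to cover each rule.

```r
objs <- list(int = 1L, dbl = 1, sym = quote(x), call = quote(f(x)),
             paren = quote((x)), closure = mean, builtin = sum,
             special = quote)
types <- vapply(objs, typeof, "")
modes <- vapply(objs, mode, "")
data.frame(typeof = types, mode = modes)
##          typeof     mode
## int     integer  numeric
## dbl      double  numeric
## sym      symbol     name
## call   language     call
## paren  language        (
## closure closure function
## builtin builtin function
## special special function
```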
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
typeof
for the R-internal ‘mode’ or ‘type’,
type.convert
, attributes
.
require(stats)

sapply(options(), mode)

cex3 <- c("NULL", "1", "1:1", "1i", "list(1)", "data.frame(x = 1)",
  "pairlist(pi)", "c", "lm", "formals(lm)[[1]]", "formals(lm)[[2]]",
  "y ~ x","expression((1))[[1]]", "(y ~ x)[[1]]",
  "expression(x <- pi)[[1]][[1]]")
lex3 <- sapply(cex3, function(x) eval(str2lang(x)))
mex3 <- t(sapply(lex3,
                 function(x) c(typeof(x), storage.mode(x), mode(x))))
dimnames(mex3) <- list(cex3, c("typeof(.)","storage.mode(.)","mode(.)"))
mex3

## This also makes a local copy of 'pi':
storage.mode(pi) <- "complex"
storage.mode(pi)
rm(pi)
Transform objects for matching via match()
, think
“match form” -> "mtfrm"
.
base provides the S3 generic and a default
plus
"POSIXct"
and "POSIXlt"
methods.
mtfrm(x)
x |
an R object |
Matching via match
will use mtfrm
to transform
internally classed objects (see is.object
) to a vector
representation appropriate for matching. The default method performs
as.character
if this preserves the length.
Ideally, methods for mtfrm
should ensure that comparisons of
same-classed objects via match
are consistent with those
employed by methods for duplicated
/unique
and ==
/!=
(where applicable).
A vector of the same length as x
.
NA
is a logical constant of length 1 which contains a missing
value indicator. NA
can be coerced to any other vector
type except raw. There are also constants NA_integer_
,
NA_real_
, NA_complex_
and NA_character_
of the
other atomic vector types which support missing values: all of these
are reserved words in the R language.
The generic function is.na
indicates which elements are missing.
The generic function is.na<-
sets elements to NA
.
The generic function anyNA
implements any(is.na(x))
in a
possibly faster way (especially for atomic vectors).
NA
is.na(x)
anyNA(x, recursive = FALSE)

## S3 method for class 'data.frame'
is.na(x)

is.na(x) <- value
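x |
an R object to be tested: the default methods for is.na and anyNA handle atomic vectors, lists, pairlists, and NULL. |
recursive |
logical: should anyNA be applied recursively to lists and pairlists? |
value |
a suitable index vector for use with x. |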
The NA
of character type is distinct from the string
"NA"
. Programmers who need to specify an explicit missing
string should use NA_character_
(rather than "NA"
) or set
elements to NA
using is.na<-
.
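A short sketch of the difference between a missing string and the string "NA", and of setting missingness with is.na<- (the vectors x and y are illustrative):

```r
x <- c("a", "NA", NA)     # the last element is NA_character_
is.na(x)                  # FALSE FALSE  TRUE
typeof(NA)                # "logical"
typeof(NA_character_)     # "character"

y <- c("a", "b", "c")
is.na(y) <- 2             # mark the second element as missing
y                         # "a" NA  "c"
```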
is.na
and anyNA
are generic: you can write
methods to handle specific classes of objects, see
InternalMethods.
Function is.na<-
may provide a safer way to set missingness.
It behaves differently for factors, for example.
Numerical computations using NA
will normally result in
NA
: a possible exception is where NaN
is also
involved, in which case either might result (which may depend on
the R platform). However, this is not guaranteed and future CPUs
and/or compilers may behave differently. Dynamic binary translation may
also impact this behavior (with valgrind, computations using NA
may result in NaN
even when no NaN
is involved).
Logical computations treat NA
as a missing TRUE/FALSE
value, and so may return TRUE
or FALSE
if the expression
does not depend on the NA
operand.
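The following lines sketch when a logical operation can be decided without knowing the missing value:

```r
NA & FALSE          # FALSE: FALSE whatever the missing value is
NA | TRUE           # TRUE
NA & TRUE           # NA: the result depends on the missing value
NA | FALSE          # NA
any(c(NA, TRUE))    # TRUE
all(c(NA, FALSE))   # FALSE
```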
The default method for anyNA
handles atomic vectors without a
class and NULL
. It calls any(is.na(x))
on objects with
classes and for recursive = FALSE
, on lists and pairlists.
The default method for is.na
applied to an atomic vector
returns a logical vector of the same length as its argument x
,
containing TRUE
for those elements marked NA
or, for
numeric or complex vectors, NaN
, and FALSE
otherwise. (A complex value is regarded as NA
if either its
real or imaginary part is NA
or NaN
.)
dim
, dimnames
and names
attributes are copied to
the result.
The default methods also work for lists and pairlists:
For is.na, elementwise the result is false unless that element is a length-one atomic vector and the single element of that vector is regarded as NA or NaN (note that any is.na method for the class of the element is ignored).
anyNA(recursive = FALSE) works the same way as is.na; anyNA(recursive = TRUE) applies anyNA (with method dispatch) to each element.
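The list behaviour described above can be sketched as follows (L and L2 are arbitrary example lists):

```r
L <- list(1, NA, c(NA, 1), "a")
is.na(L)                      # FALSE  TRUE FALSE FALSE
anyNA(L)                      # TRUE, from the length-one NA element

L2 <- list(1, c(NA, 1))
anyNA(L2)                     # FALSE: c(NA, 1) is not of length one
anyNA(L2, recursive = TRUE)   # TRUE: anyNA() is applied to each element
```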
The data frame method for is.na
returns a logical matrix
with the same dimensions as the data frame, and with dimnames taken
from the row and column names of the data frame.
anyNA(NULL)
is false; is.na(NULL)
is logical(0)
(no longer warning since R version 3.5.0).
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
Chambers, J. M. (1998) Programming with Data. A Guide to the S Language. Springer.
NaN
, is.nan
, etc.,
and the utility function complete.cases
.
na.action
, na.omit
, na.fail
on how methods can be tuned to deal with missing values.
is.na(c(1, NA))        #> FALSE  TRUE
is.na(paste(c(1, NA))) #> FALSE FALSE

(xx <- c(0:4))
is.na(xx) <- c(2, 4)
xx                     #> 0 NA  2 NA  4
anyNA(xx) # TRUE

# Some logical operations do not return NA
c(TRUE, FALSE) & NA
c(TRUE, FALSE) | NA

## Measure speed difference in a favourable case:
## the difference depends on the platform, on most ca 3x.
x <- 1:10000; x[5000] <- NaN  # coerces x to be double
if(require("microbenchmark")) { # does not work reliably on all platforms
  print(microbenchmark(any(is.na(x)), anyNA(x)))
} else {
  nSim <- 2^13
  print(rbind(is.na = system.time(replicate(nSim, any(is.na(x)))),
              anyNA = system.time(replicate(nSim, anyNA(x)))))
}

## anyNA() can work recursively with list()s:
LL <- list(1:5, c(NA, 5:8), c("A","NA"), c("a", NA_character_))
L2 <- LL[c(1,3)]
sapply(LL, anyNA); c(anyNA(LL), anyNA(LL, TRUE))
sapply(L2, anyNA); c(anyNA(L2), anyNA(L2, TRUE))

## ... lists, and hence data frames, too:
dN <- dd <- USJudgeRatings; dN[3,6] <- NA
anyNA(dd) # FALSE
anyNA(dN) # TRUE
A ‘name’ (also known as a ‘symbol’) is a way to refer to R objects by name (rather than the value of the object, if any, bound to that name).
as.name
and as.symbol
are identical: they attempt to
coerce the argument to a name.
is.symbol
and the identical is.name
return TRUE
or FALSE
depending on whether the argument is a name or not.
as.symbol(x)
is.symbol(x)

as.name(x)
is.name(x)
x |
object to be coerced or tested. |
Names are limited to 10,000 bytes (and were limited to 256 bytes in versions of R before 2.13.0).
as.name
first coerces its argument internally to a character
vector (so methods for as.character
are not used). It then
takes the first element and provided it is not ""
, returns a
symbol of that name (and if the element is NA_character_
, the
name is `NA`
).
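A few illustrative calls showing the coercion rules just described:

```r
first <- as.name(c("alpha", "beta"))   # only the first element is used
first                                  # alpha
identical(as.name("x"), quote(x))      # TRUE
na_sym <- as.name(NA_character_)       # the name `NA`
# An empty string is not a valid name, so this signals an error:
res <- tryCatch(as.name(""), error = function(e) "error")
res                                    # "error"
```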
as.name
is implemented as as.vector(x, "symbol")
,
and hence will dispatch methods for the generic function as.vector
.
is.name
and is.symbol
are primitive functions.
For as.name
and as.symbol
, an R object of type
"symbol"
(see typeof
).
For is.name
and is.symbol
, a length-one logical vector
with value TRUE
or FALSE
.
The term ‘symbol’ is from the LISP background of R, whereas ‘name’ has been the standard S term for this.
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
call
, is.language
.
For the internal object mode, typeof
.
plotmath
for another use of ‘symbol’.
an <- as.name("arrg")
is.name(an) # TRUE
mode(an)   # name
typeof(an) # symbol
Functions to get or set the names of an object.
names(x)
names(x) <- value
x |
an R object. |
value |
a character vector of up to the same length as x, or NULL. |
names
is a generic accessor function, and names<-
is a
generic replacement function. The default methods get and set
the "names"
attribute of a vector (including a list) or
pairlist.
For an environment
env
, names(env)
gives
the names of the corresponding list, i.e.,
names(as.list(env, all.names = TRUE))
which are also given by
ls(env, all.names = TRUE, sorted = FALSE)
. If the
environment is used as a hash table, names(env)
are its
“keys”.
If value
is shorter than x
, it is extended by character
NA
s to the length of x
.
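For instance, assigning two names to a vector of length three pads the names with a character NA:

```r
x <- 1:3
names(x) <- c("a", "b")
names(x)           # "a" "b" NA
is.na(names(x))    # FALSE FALSE  TRUE
```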
It is possible to update just part of the names attribute via the
general rules: see the examples. This works because the expression
there is evaluated as z <- "names<-"(z, "[<-"(names(z), 3, "c2"))
.
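The equivalence between the replacement syntax and the expanded form can be checked with a small sketch (z1 and z2 are illustrative copies):

```r
z <- list(a = 1, b = "c", c = 1:3)

z1 <- z
names(z1)[3] <- "c2"                            # usual replacement syntax

z2 <- `names<-`(z, `[<-`(names(z), 3, "c2"))    # the same steps spelled out

identical(z1, z2)   # TRUE
names(z2)           # "a"  "b"  "c2"
```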
The name ""
is special: it is used to indicate that there is no
name associated with an element of a (atomic or generic) vector.
Subscripting by ""
will match nothing (not even elements which
have no name).
A name can be character NA
, but such a name will never be
matched and is likely to lead to confusion.
Both are primitive functions.
For names
, NULL
or a character vector of the same length
as x
. (NULL
is given if the object has no names,
including for objects of types which cannot have names.) For an
environment, the length is the number of objects in the environment
but the order of the names is arbitrary.
For names<-
, the updated object. (Note that the value of
names(x) <- value
is that of the assignment, value
, not
the return value from the left-hand side.)
For vectors, the names are one of the attributes with restrictions on the possible values. For pairlists, the names are the tags and converted to and from a character vector.
For a one-dimensional array the names
attribute really is
dimnames[[1]]
.
Formally classed aka “S4” objects typically have
slotNames()
(and no names()
).
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
# print the names attribute of the islands data set
names(islands)

# remove the names attribute
names(islands) <- NULL
islands
rm(islands) # remove the copy made

z <- list(a = 1, b = "c", c = 1:3)
names(z)
# change just the name of the third element.
names(z)[3] <- "c2"
z

z <- 1:3
names(z)
## assign just one name
names(z)[2] <- "b"
z
When used inside a function body, nargs
returns the number of
arguments supplied to that function, including positional
arguments left blank.
nargs()
The count includes empty (missing) arguments, so that foo(x,,z)
will be considered to have three arguments (see ‘Examples’).
This can occur in rather indirect ways, so for example x[]
might dispatch a call to `[.some_method`(x, )
which is
considered to have two arguments.
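A sketch of how empty arguments are counted when subsetting dispatches to a method. The class "demo" and its `[` method are hypothetical and exist only to report nargs():

```r
# Hypothetical class whose `[` method only reports nargs()
`[.demo` <- function(x, i, j, ...) nargs()
x <- structure(1:6, class = "demo")

x[1]      # 2: x and i
x[1, ]    # 3: x, i and an empty j
x[1, 2]   # 3
x[]       # 2: x and an empty i
```

Methods for `[` commonly rely on this count to tell x[i] apart from x[i, ].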
This is a primitive function.
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
tst <- function(a, b = 3, ...) {nargs()}
tst() # 0
tst(clicketyclack) # 1 (even non-existing)
tst(c1, a2, rr3) # 3

foo <- function(x, y, z, w) {
   cat("call was ", deparse(match.call()), "\n", sep = "")
   nargs()
}
foo()      # 0
foo(, , 3) # 3
foo(z = 3) # 1, even though this is the same call

nargs()  # not really meaningful
nchar
takes a character vector as an argument and
returns a vector whose elements contain the sizes of
the corresponding elements of x
. Internally, it is a generic,
for which methods can be defined (see InternalMethods).
nzchar
is a fast way to find out if elements of a character
vector are non-empty strings.
nchar(x, type = "chars", allowNA = FALSE, keepNA = NA)

nzchar(x, keepNA = FALSE)
x |
character vector, or a vector to be coerced to a character vector. Giving a factor is an error. |
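type |
character string: partial matching to one of c("bytes", "chars", "width"). See ‘Details’. |
allowNA |
logical: should NA be returned for invalid multibyte strings or "bytes"-encoded strings (rather than throwing an error)? |
keepNA |
logical: should NA be returned when x is NA? If false, nchar() returns 2, as that is the number of printing characters used when strings are written to output, and nzchar() is TRUE. The default for nchar(), NA, means to use keepNA = TRUE unless type is "width". |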
The ‘size’ of a character string can be measured in one of
three ways (corresponding to the type
argument):
bytes
The number of bytes needed to store the string (plus in C a final terminator which is not counted).
chars
The number of characters.
width
The number of columns cat
will use to
print the string in a monospaced font. The same as chars
if this cannot be calculated.
These will often be the same, and usually will be in single-byte
locales (but note how type
determines the default for
keepNA
). There will be differences between the first two with
multibyte character sequences, e.g. in UTF-8 locales.
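As a rough illustration of the difference between bytes and characters, the sketch below forces a string into UTF-8 with enc2utf8() so that the byte count does not depend on the session locale:

```r
s <- enc2utf8("na\u00efve")    # "naive" with an i-diaeresis, in UTF-8
nchar(s, type = "chars")       # 5
nchar(s, type = "bytes")       # 6: the accented letter takes two bytes
nchar("abc", type = "bytes")   # 3: ASCII uses one byte per character
```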
The internal equivalent of the default method of
as.character
is performed on x
(so there is no
method dispatch). If you want to operate on non-vector objects
passing them through deparse
first will be required.
For nchar
, an integer vector giving the sizes of each element.
For missing values (i.e., NA
, i.e., NA_character_
),
nchar()
returns NA_integer_
if keepNA
is
true, and 2
, the number of printing characters, if false.
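The effect of keepNA on a missing string can be sketched as:

```r
nchar(NA_character_)                   # NA: keepNA = NA acts as TRUE for "chars"
nchar(NA_character_, type = "width")   # 2:  keepNA = NA acts as FALSE for "width"
nchar(NA_character_, keepNA = FALSE)   # 2
nzchar(NA_character_)                  # TRUE
nzchar(NA_character_, keepNA = TRUE)   # NA
```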
type = "width"
gives (an approximation to) the number of
columns used in printing each element in a terminal font, taking into
account double-width, zero-width and ‘composing’ characters.
The approximation is likely to be poor when there are unassigned or
non-printing characters.
If allowNA = TRUE
and an element is detected as invalid in a
multi-byte character set such as UTF-8, its number of characters and
the width will be NA
. Otherwise the number of characters will
be non-negative, so !is.na(nchar(x, "chars", TRUE))
is a test
of validity.
A character string marked with "bytes"
encoding (see
Encoding
) has a number of bytes, but neither a known
number of characters nor a width, so the latter two types are
NA
if allowNA = TRUE
, otherwise an error.
Names, dims and dimnames are copied from the input.
For nzchar
, a logical vector of the same length as x
,
true if and only if the element has non-zero size; if the element is
NA
, nzchar()
is true when keepNA
is false (the
default) or NA
, and NA
otherwise.
This does not by default give the number of characters that
will be used to print()
the string. Use
encodeString
to find that.
Where character strings have been marked as UTF-8, the number of characters and widths will be computed in UTF-8, even though printing may use escapes such as ‘<U+2642>’ in a non-UTF-8 locale.
The concept of ‘width’ is a slippery one even in a monospaced
font. Some human languages have the concept of combining
characters, in which two or more characters are rendered together: an
example would be "y\u306"
, which is two characters of width
one: combining characters are given width zero, and there are other
zero-width characters such as the zero-width space "\u200b"
.
Some East Asian languages have ‘wide’ characters, ideographs
which are conventionally printed across two columns when mixed with
ASCII and other ‘narrow’ characters in those languages. The
problem is that whether a computer prints wide characters over two or
one columns depends on the font, with it not being uncommon to use two
columns in a font intended for East Asian users and a single column in
a ‘Western’ font. Unicode has encodings for ‘fullwidth’
versions of ASCII characters and ‘halfwidth’ versions of
Katakana (Japanese) and Hangul (Korean) characters. Then there is the
‘East Asian Ambiguous class’ (Greek, Cyrillic, signs, some
accented Latin chars, etc), for which the historical practice was to
use two columns in East Asia and one elsewhere. The width quoted by
nchar
for characters in that class (and some others) depends on
the locale, being one except in some East Asian locales on some OSes
(notably Windows).
Control characters are usually given width zero: this includes CR and LF. Computing the width of a string containing control characters should be avoided (and may depend on the OS and R version).
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
Unicode Standard Annex #11: East Asian Width. https://www.unicode.org/reports/tr11/
strwidth
giving width of strings for plotting;
paste
, substr
, strsplit
x <- c("asfef", "qwerty", "yuiop[", "b", "stuff.blah.yech")
nchar(x)
# 5  6  6  1 15

nchar(deparse(mean))
# 18 17  <-- unless mean differs from base::mean

## NA behaviour as function of keepNA=* :
logi <- setNames(, c(FALSE, NA, TRUE))
sapply(logi, \(k) data.frame(nchar =  nchar (NA, keepNA=k),
                             nzchar = nzchar(NA, keepNA=k)))

x[3] <- NA; x
nchar(x, keepNA= TRUE) #  5  6 NA  1 15
nchar(x, keepNA=FALSE) #  5  6  2  1 15
stopifnot(identical(nchar(x     ), nchar(x, keepNA= TRUE)),
          identical(nchar(x, "w"), nchar(x, keepNA=FALSE)),
          identical(is.na(x), is.na(nchar(x))))

##' nchar() for all three types :
nchars <- function(x, ...)
   vapply(c("chars", "bytes", "width"),
          function(tp) nchar(x, tp, ...), integer(length(x)))

nchars("\u200b") # in R versions (>= 2015-09-xx):
## chars bytes width
##     1     3     0

data.frame(x, nchars(x)) ## all three types : same unless for NA
## force the same by forcing 'keepNA':
(ncT <- nchars(x, keepNA = TRUE)) ## .... NA NA NA ....
(ncF <- nchars(x, keepNA = FALSE))## ....  2  2  2 ....
stopifnot(apply(ncT, 1, function(.) length(unique(.))) == 1,
          apply(ncF, 1, function(.) length(unique(.))) == 1)
Return the number of levels which its argument has.
nlevels(x)
x |
an object, usually a factor. |
This is usually applied to a factor, but other objects can have levels.
The actual factor levels (if they exist) can be obtained
with the levels
function.
The length of levels(x)
, which is zero if
x
has no levels.
nlevels(gl(3, 7)) # = 3
Print character strings without quotes.
noquote(obj, right = FALSE)

## S3 method for class 'noquote'
print(x, quote = FALSE, right = FALSE, ...)

## S3 method for class 'noquote'
c(..., recursive = FALSE)
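obj |
any R object, typically a vector of character strings. |
right |
optional logical eventually to be passed to print(), used by print.default(), indicating whether or not strings should be right aligned. |
x |
an object of class "noquote". |
quote, ... |
further options passed to next methods, such as print. |
recursive |
for compatibility with the generic c function. |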
noquote
returns its argument as an object of class
"noquote"
. There is a method for c()
and subscript
method ("[.noquote"
) which ensures that the class is not lost
by subsetting. The print method (print.noquote
) prints
character strings without quotes ("...."
is printed as ....
).
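A small sketch showing that the class survives subsetting and concatenation (nq is an illustrative object):

```r
nq <- noquote(c("alpha", "beta", "gamma"))
class(nq)                             # "noquote"
inherits(nq[2:3], "noquote")          # TRUE: the `[` method keeps the class
inherits(c(nq, "delta"), "noquote")   # TRUE: so does the c() method
print(nq)                             # strings are shown without quotes
```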
If right
is specified in a call print(x, right=*)
, it
takes precedence over a possible right
setting of x
,
e.g., created by x <- noquote(*, right=TRUE)
.
These functions exist both as utilities and as an example of using (S3)
class
and object orientation.
Martin Maechler
letters
nql <- noquote(letters)
nql
nql[1:4] <- "oh"
nql[1:12]

cmp.logical <- function(log.v)
{
  ## Purpose: compact printing of logicals
  log.v <- as.logical(log.v)
  noquote(if(length(log.v) == 0)"()" else c(".","|")[1 + log.v])
}
cmp.logical(stats::runif(20) > 0.8)

chmat <- as.matrix(format(stackloss)) # a "typical" character matrix
## noquote(*, right=TRUE) so it prints exactly like a data frame
chmat <- noquote(chmat, right = TRUE)
chmat
Computes a matrix norm of x
using LAPACK. The norm can be
the one ("O"
) norm, the infinity ("I"
) norm, the
Frobenius ("F"
) norm, the maximum modulus ("M"
) among
elements of a matrix, or the “spectral” or "2"
-norm, as
determined by the value of type
.
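Each norm type can be compared with a direct computation from its usual definition. This is only a sketch on an arbitrary small matrix x; every line should return TRUE.

```r
x <- matrix(c(1, -2, 3, 4, 5, -6), nrow = 2)

all.equal(norm(x, "O"), max(colSums(abs(x))))   # one norm: largest column sum
all.equal(norm(x, "I"), max(rowSums(abs(x))))   # infinity norm: largest row sum
all.equal(norm(x, "F"), sqrt(sum(x^2)))         # Frobenius norm
all.equal(norm(x, "M"), max(abs(x)))            # maximum modulus
all.equal(norm(x, "2"), max(svd(x)$d))          # spectral norm: largest singular value
```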
norm(x, type = c("O", "I", "F", "M", "2"))
x |
numeric matrix; note that packages such as Matrix define more norm() methods. |
type |
character string, specifying the type of matrix norm to be computed. The default is "O". |
The base method of norm()
calls the LAPACK function
dlange
.
Note that the 1-, Inf- and "M" norms are faster to calculate than the Frobenius one.
Unsuccessful results from the underlying LAPACK code will result in an error giving a positive error code: these can only be interpreted by detailed study of the FORTRAN code.
The matrix norm, a non-negative number. Zero for a 0-extent (empty) matrix.
Except for type = "2", the LAPACK routine DLANGE.
LAPACK is from https://netlib.org/lapack/.
Anderson, E., et al (1994). LAPACK User's Guide, 2nd edition, SIAM, Philadelphia.
rcond
for the (reciprocal) condition number.
(x1 <- cbind(1, 1:10))
norm(x1)
norm(x1, "I")
norm(x1, "M")
stopifnot(all.equal(norm(x1, "F"),
                    sqrt(sum(x1^2))))

hilbert <- function(n) { i <- 1:n; 1 / outer(i - 1, i, `+`) }
h9 <- hilbert(9)
## all 5 (4 different) types of norm:
(nTyp <- eval(formals(base::norm)$type))
sapply(nTyp, norm, x = h9)
stopifnot(exprs = { # 0-extent matrices:
    sapply(nTyp, norm, x = matrix(, 1,0)) == 0
    sapply(nTyp, norm, x = matrix(, 0,0)) == 0
})
Convert file paths to canonical form for the platform, to display them in a user-understandable form and so that relative and absolute paths can be compared.
normalizePath(path, winslash = "\\", mustWork = NA)
path |
character vector of file paths. |
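winslash |
the separator to be used on Windows – ignored elsewhere. Must be one of c("/", "\\"). |
mustWork |
logical: if TRUE then an error is given if the result cannot be determined; if NA then a warning. |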
Tilde-expansion (see path.expand
) is first done on
paths
.
Where the Unix-alike platform supports it, normalizePath attempts to turn paths into
absolute paths in their canonical form (no ‘./’, ‘../’ nor
symbolic links). It relies on the POSIX system function
realpath
: if the platform does not have that (we know of no
current example) then the result will be an absolute path but might
not be canonical. Even where realpath
is used the canonical
path need not be unique, for example via hard links or
multiple mounts.
On Windows it converts relative paths to absolute paths, resolves symbolic
links, converts short names for path elements to long names and ensures the
separator is that specified by winslash
. It will match each path
element case-insensitively or case-sensitively as during the usual name
lookup and return the canonical case. It relies on Windows API function
GetFinalPathNameByHandle
and in case of an error (such as
insufficient permissions) it currently falls back to the R 3.6 (and
older) implementation, which relies on GetFullPathName
and
GetLongPathName
with limitations described in the Notes section.
An attempt is made not to introduce UNC paths in presence of mapped drives
or symbolic links: if GetFinalPathNameByHandle
returns a UNC path,
but GetLongPathName
returns a path starting with a drive letter, R
falls back to the R 3.6 (and older) implementation.
UTF-8-encoded paths not valid in the current locale can be used.
mustWork = FALSE
is useful for expressing paths for use in
messages.
A character vector.
If an input is not a real path the result is system-dependent (unless mustWork = TRUE, when this should be an error). It will be either the corresponding input element or a transformation of it into an absolute path.
Converting to an absolute file path can fail for a large number of reasons. The most common are:
One or more components of the file path do not exist.
A component before the last is not a directory, or there is insufficient permission to read the directory.
For a relative path, the current directory cannot be determined.
A symbolic link points to a non-existent place or links form a loop.
The canonicalized path would exceed the maximum supported length of a file path.
The canonical form of paths may not be what you expect. For example, on macOS absolute paths such as ‘/tmp’ and ‘/var’ are symbolic links. On Linux, a path produced by bash process substitution is a symbolic link (such as ‘/proc/fd/63’) to a pipe and there is no canonical form of such a path. In R 3.6 and older on Windows, symlinks will not be resolved and the long names for path elements will be returned with the case in which they are in path, which may not be canonical in case-insensitive folders.
cat(normalizePath(c(R.home(), tempdir())), sep = "\n")
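As a minimal sketch of how mustWork changes the handling of paths that do not exist, the following compares an existing temporary file with a made-up path. The file and directory names (np-demo-...) are illustrative assumptions, and the value returned for the missing path is system-dependent.

```r
## Illustrative paths: one file that exists, one that does not
existing <- tempfile("np-demo-")
invisible(file.create(existing))
missing_path <- file.path(tempdir(), "np-demo-no-such-dir", "file.txt")

## mustWork = FALSE: no warning or error; the result for the missing
## path is system-dependent (the input itself or an absolute version of it)
p <- normalizePath(c(existing, missing_path), mustWork = FALSE)
print(p)

## mustWork = TRUE: a path that cannot be resolved is an error
res <- tryCatch(normalizePath(missing_path, mustWork = TRUE),
                error = function(e) "error")
print(res)

unlink(existing)  # clean up the temporary file
```

Wrapping the strict call in tryCatch keeps a script running when a path is absent, while still making the failure visible.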
In order to pinpoint missing functionality, the R core team uses these functions for missing R functions and not yet used arguments of existing R functions (which are typically there for compatibility purposes).
You are very welcome to contribute your code ...
.NotYetImplemented()
.NotYetUsed(arg, error = TRUE)
arg |
an argument of a function that is not yet used. |
error |
a logical. If |
the contrary, Deprecated and Defunct for outdated code.
require(graphics)
barplot(1:5, inside = TRUE) # 'inside' is not yet used
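A small sketch of how .NotYetImplemented() might be used as a placeholder body. The function name future_feature is hypothetical; the exact wording of the error message may vary with the session language.

```r
## Hypothetical function whose body has not been written yet
future_feature <- function(x) .NotYetImplemented()

## Calling it signals an error that names the calling function
msg <- tryCatch(future_feature(1),
                error = function(e) conditionMessage(e))
print(msg)
```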
nrow and ncol return the number of rows or columns present in x. NCOL and NROW do the same treating a vector as a 1-column matrix, even a 0-length vector, compatibly with as.matrix() or cbind(), see the example.
nrow(x)
ncol(x)
NCOL(x)
NROW(x)
x |
a vector, array, data frame, or |
an integer of length 1 or NULL, the latter only for ncol and nrow.
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole (ncol and nrow.)
dim which returns all dimensions, and length which gives a number (a ‘count’) also in cases where dim() is NULL, and hence nrow() and ncol() return NULL; array, matrix.
ma <- matrix(1:12, 3, 4)
nrow(ma)   # 3
ncol(ma)   # 4

ncol(array(1:24, dim = 2:4)) # 3, the second dimension
NCOL(1:12) # 1
NROW(1:12) # 12, the length() of the vector

## as.matrix() produces 1-column matrices from 0-length vectors,
## and so does cbind() :
dim(as.matrix(numeric())) # 0 1
dim(    cbind(numeric())) # ditto
NCOL(numeric()) # 1
## However, as.matrix(NULL) fails and cbind(NULL) gives NULL, hence for
## consistency:
NCOL(NULL)      # 0
## (This gave 1 in R < 4.4.0.)
Accessing exported and internal variables, i.e. R objects (including lazy loaded data sets) in a namespace.
pkg::name
pkg:::name
pkg |
package name: symbol or literal character string. |
name |
variable name: symbol or literal character string. |
For a package pkg, pkg::name returns the value of the exported variable name in namespace pkg, whereas pkg:::name returns the value of the internal variable name. The package namespace will be loaded if it was not loaded before the call, but the package will not be attached to the search path.
Specifying a variable or package that does not exist is an error.
Note that pkg::name does not access the objects in the environment package:pkg (which does not exist until the package's namespace is attached): the latter may contain objects not exported from the namespace. It can access datasets made available by lazy-loading.
It is typically a design mistake to use ::: in your code since the corresponding object has probably been kept internal for a good reason. Consider contacting the package maintainer if you feel the need to access the object for anything but mere inspection.
get to access an object masked by another of the same name. loadNamespace, asNamespace for more about namespaces.
base::log
base::"+"

## Beware -- use ':::' at your own risk! (see "Details")
stats:::coef.default
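The point that pkg::name loads a namespace without attaching the package can be checked with a short sketch. It assumes the standard package tools is installed (it ships with R); the file name is made up for illustration.

```r
## Record whether 'tools' is on the search path before the call
attached_before <- "package:tools" %in% search()

## Calling an exported function via '::' loads the namespace if needed
ext <- tools::file_ext("report.tar.gz")
print(ext)

## The namespace is now loaded, but the search path is unchanged
print(isNamespaceLoaded("tools"))
attached_after <- "package:tools" %in% search()
print(identical(attached_before, attached_after))
```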
Packages can supply functions to be called when loaded, attached, detached or unloaded.
.onLoad(libname, pkgname)
.onAttach(libname, pkgname)
.onUnload(libpath)
.onDetach(libpath)
.Last.lib(libpath)
libname |
a character string giving the library directory where the package defining the namespace was found. |
pkgname |
a character string giving the name of the package. |
libpath |
a character string giving the complete path to the package. |
After loading, loadNamespace looks for a hook function named .onLoad and calls it (with two unnamed arguments) before sealing the namespace and processing exports.
When the package is attached (via library or attachNamespace), the hook function .onAttach is looked for and if found is called (with two unnamed arguments) before the package environment is sealed.
If a function .onDetach is in the namespace or .Last.lib is exported from the package, it will be called (with a single argument) when the package is detached. Beware that it might be called if .onAttach has failed, so it should be written defensively. (It is called within tryCatch, so errors will not stop the package being detached.)
If a namespace is unloaded (via unloadNamespace), a hook function .onUnload is run (with a single argument) before final unloading.
Note that the code in .onLoad and .onUnload should not assume any package except the base package is on the search path. Objects in the current package will be visible (unless this is circumvented), but objects from other packages should be imported or the double colon operator should be used.
.onLoad, .onUnload, .onAttach and .onDetach are looked for as internal objects in the namespace and should not be exported (whereas .Last.lib should be).
Note that packages are not detached nor namespaces unloaded at the end of an R session unless the user arranges to do so (e.g., via .Last).
Anything needed for the functioning of the namespace should be handled at load/unload times by the .onLoad and .onUnload hooks. For example, DLLs can be loaded (unless done by a useDynLib directive in the ‘NAMESPACE’ file) and initialized in .onLoad and unloaded in .onUnload. Use .onAttach only for actions that are needed only when the package becomes visible to the user (for example a start-up message) or need to be run after the package environment has been created.
Loading a namespace should where possible be silent, with startup messages given by .onAttach. These messages (and any essential ones from .onLoad) should use packageStartupMessage so they can be silenced where they would be a distraction.
There should be no calls to library nor require in these hooks. The way for a package to load other packages is via the ‘Depends’ field in the ‘DESCRIPTION’ file: this ensures that the dependence is documented and packages are loaded in the correct order. Loading a namespace should not change the search path, so rather than attach a package, dependence of a namespace on another package should be achieved by (selectively) importing from the other package's namespace.
Uses of library with argument help to display basic information about the package should use format on the computed package information object and pass this to packageStartupMessage.
There should be no calls to installed.packages in startup code: it is potentially very slow and may fail in versions of R before 2.14.2 if package installation is going on in parallel. See its help page for alternatives.
Compiled code should be loaded (e.g., via library.dynam) in .onLoad or a useDynLib directive in the ‘NAMESPACE’ file, and not in .onAttach. Similarly, compiled code should not be unloaded (e.g., via library.dynam.unload) in .Last.lib nor .onDetach, only in .onUnload.
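The recommendations above can be sketched as the hook definitions a package might keep in a file such as R/zzz.R. Everything named here is hypothetical: the package name "mypkg", the option mypkg.verbose and the startup text are assumptions made for illustration, not part of any real package.

```r
## Hypothetical hooks for a package called "mypkg" (illustration only)

.onLoad <- function(libname, pkgname) {
  ## Set a default for a package option unless the user already set it
  defaults <- list(mypkg.verbose = FALSE)
  toset <- !(names(defaults) %in% names(options()))
  if (any(toset)) options(defaults[toset])
  invisible()
}

.onAttach <- function(libname, pkgname) {
  ## Startup messages belong here and should be suppressible
  packageStartupMessage("Welcome to ", pkgname)
}

.onUnload <- function(libpath) {
  ## Compiled code, if any was loaded in .onLoad, would be unloaded here,
  ## e.g. library.dynam.unload("mypkg", libpath)
  invisible()
}
```

Keeping option defaults in .onLoad and the greeting in .onAttach means the namespace can be loaded silently via mypkg::fun while library(mypkg) still shows the message.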
setHook shows how users can set hooks on the same events, and lists the sequence of events involving all of the hooks.
reg.finalizer for hooks to be run at the end of a session.
loadNamespace for more about namespaces.
Functions to load and unload name spaces.
attachNamespace(ns, pos = 2L, depends = NULL, exclude, include.only)
loadNamespace(package, lib.loc = NULL,
              keep.source = getOption("keep.source.pkgs"),
              partial = FALSE, versionCheck = NULL,
              keep.parse.data = getOption("keep.parse.data.pkgs"))
requireNamespace(package, ..., quietly = FALSE)
loadedNamespaces()
unloadNamespace(ns)
isNamespaceLoaded(name)
ns |
string or name space object. |
pos |
integer specifying position to attach. |
depends |
|
package |
string naming the package/name space to load. |
lib.loc |
character vector specifying library search path (the location of R library trees to search through). |
keep.source |
now ignored except during package installation. |
keep.parse.data |
ignored except during package installation. |
partial |
logical; if true, stop just after loading code. |
versionCheck |
|
quietly |
logical: should progress and error messages be suppressed? |
name |
string or ‘name’, see |
exclude, include.only |
character vectors; see |
... |
further arguments to be passed to |
The functions loadNamespace and attachNamespace are usually called implicitly when library is used to load a name space and any imports needed. However it may be useful at times to call these functions directly.
loadNamespace loads the specified name space and registers it in an internal data base. A request to load a name space when one of that name is already loaded has no effect. The arguments have the same meaning as the corresponding arguments to library, whose help page explains the details of how a particular installed package comes to be chosen. After loading, loadNamespace looks for a hook function named .onLoad as an internal variable in the name space (it should not be exported). Partial loading is used to support installation with lazy-loading.
Optionally the package licence is checked during loading: see section ‘Licenses’ in the help for library.
loadNamespace does not attach the name space it loads to the search path. attachNamespace can be used to attach a frame containing the exported values of a name space to the search path (but this is almost always done via library). The hook function .onAttach is run after the name space exports are attached.
requireNamespace is a wrapper for loadNamespace analogous to require that returns a logical value.
loadedNamespaces returns a character vector of the names of the loaded name spaces.
isNamespaceLoaded(pkg) is equivalent to but more efficient than pkg %in% loadedNamespaces().
unloadNamespace can be used to attempt to force a name space to be unloaded. If the name space is attached, it is first detached, thereby running a .onDetach or .Last.lib function in the name space if one is exported. An error is signaled and the name space is not unloaded if the name space is imported by other loaded name spaces. If defined, a hook function .onUnload is run before removing the name space from the internal registry.
See the comments in the help for detach about some issues with unloading and reloading name spaces.
attachNamespace returns invisibly the package environment it adds to the search path.
loadNamespace returns the name space environment, either one already loaded or the one the function causes to be loaded.
requireNamespace returns TRUE if it succeeds or FALSE.
loadedNamespaces returns a character vector.
unloadNamespace returns NULL, invisibly.
As from R 4.1.0 the operation of loadNamespace can be traced, which can help track down the causes of unexpected messages (including which package(s) they come from since loadNamespace is called in many ways including from itself and by :: and can be called by load). Setting the environment variable _R_TRACE_LOADNAMESPACE_ to a numerical value will generate additional messages on progress. Non-zero values, e.g. 1, report which namespace is being loaded and when loading completes: values 2 to 4 report in increasing detail. Negative values are reserved for tracing specific features and their current meanings are documented in source-code comments.
Loading standard packages is never traced.
Luke Tierney and R-core
The ‘Writing R Extensions’ manual, section “Package namespaces”.
getNamespace, asNamespace, topenv, .onLoad (etc); further environment.
(lns <- loadedNamespaces())
statL <- isNamespaceLoaded("stats")
stopifnot( identical(statL, "stats" %in% lns) )

## The string "foo" and the symbol 'foo' can be used interchangeably here:
stopifnot( identical(isNamespaceLoaded(  "foo"   ), FALSE),
           identical(isNamespaceLoaded(quote(foo)), FALSE),
           identical(isNamespaceLoaded(quote(stats)), statL))

hasS <- isNamespaceLoaded("splines") # (to restore if needed)
Sns <- asNamespace("splines") # loads it if not already
stopifnot(   isNamespaceLoaded("splines"))
if (is.null(try(unloadNamespace(Sns)))) # try unloading the NS 'object'
  stopifnot( ! isNamespaceLoaded("splines"))
if (hasS) loadNamespace("splines") # (restoring previous state)
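Because requireNamespace returns a logical value instead of signalling an error, it lends itself to guarding optional functionality. The following is a sketch of that pattern, using splines (which ships with R) as a stand-in for any optional package; the fallback message text is an illustrative choice.

```r
## 'splines' stands in for any optional package
if (requireNamespace("splines", quietly = TRUE)) {
  ## Use the package via '::' without attaching it to the search path
  basis <- splines::bs(1:10, df = 4)
  print(dim(basis))
} else {
  message("Package 'splines' is not available; skipping the spline basis")
}
```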
Finding the top level environment from an environment envir and its enclosing environments.
topenv(envir = parent.frame(),
       matchThisEnv = getOption("topLevelEnvironment"))
envir |
environment. |
matchThisEnv |
return this environment, if it matches before
any other criterion is satisfied. The default, the option
‘topLevelEnvironment’, is set by |
topenv returns the first top level environment found when searching envir and its enclosing environments. If no top level environment is found, .GlobalEnv is returned. An environment is considered top level if it is the internal environment of a namespace, a package environment in the search path, or .GlobalEnv.
environment, notably parent.env() on “enclosing environments”; loadNamespace for more on namespaces.
topenv(.GlobalEnv)
topenv(new.env()) # also global env
topenv(environment(ls))# namespace:base
topenv(environment(lm))# namespace:stats
NULL represents the null object in R: it is a reserved word. NULL is often returned by expressions and functions whose value is undefined.
NULL
as.null(x, ...)
is.null(x)
x |
an object to be tested or coerced. |
... |
ignored. |
NULL can be indexed (see Extract) in just about any syntactically legal way: apart from NULL[[]] which is an error, the result is always NULL. Objects with value NULL can be changed by replacement operators and will be coerced to the type of the right-hand side.
NULL is also used as the empty pairlist: see the examples. Because pairlists are often promoted to lists, you may encounter NULL being promoted to an empty list.
Objects with value NULL cannot have attributes as there is only one null object: attempts to assign them are either an error (attr) or promote the object to an empty list with attribute(s) (attributes and structure).
as.null ignores its argument and returns NULL.
is.null returns TRUE if its argument's value is NULL and FALSE otherwise.
is.null is a primitive function.
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
%||%: L %||% R is equivalent to if(!is.null(L)) L else R
is.null(list())     # FALSE (on purpose!)
is.null(pairlist()) # TRUE
is.null(integer(0)) # FALSE
is.null(logical(0)) # FALSE
as.null(list(a = 1, b = "c"))
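The statements about replacement and attributes on NULL can be illustrated with a brief sketch. The variable names and the attribute name "note" are arbitrary choices for the illustration.

```r
## Replacement coerces NULL to the type of the right-hand side
x <- NULL
x[3] <- 2L
print(x)           # an integer vector: NA NA 2
print(typeof(x))   # "integer"

## attr<- on NULL is an error ...
y <- NULL
res <- tryCatch({ attr(y, "note") <- "hi"; "assigned" },
                error = function(e) "error")
print(res)

## ... whereas attributes<- promotes NULL to an empty list with the attribute
z <- NULL
attributes(z) <- list(note = "hi")
print(is.list(z))
print(attr(z, "note"))
```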
Creates or coerces objects of type "numeric". is.numeric is a more general test of an object being interpretable as numbers.
numeric(length = 0)
as.numeric(x, ...)
is.numeric(x)
length |
a non-negative integer specifying the desired length. Double values will be coerced to integer: supplying an argument of length other than one is an error. |
x |
object to be coerced or tested. |
... |
further arguments passed to or from other methods. |
numeric is identical to double. It creates a double-precision vector of the specified length with each element equal to 0.
as.numeric is a generic function, but S3 methods must be written for as.double. It is identical to as.double.
is.numeric is an internal generic primitive function: you can write methods to handle specific classes of objects, see InternalMethods. It is not the same as is.double. Factors are handled by the default method, and there are methods for classes "Date", "POSIXt" and "difftime" (all of which return false). Methods for is.numeric should only return true if the base type of the class is double or integer and values can reasonably be regarded as numeric (e.g., arithmetic on them makes sense, and comparison should be done via the base type).
For numeric and as.numeric see double.
The default method for is.numeric returns TRUE if its argument is of mode "numeric" (type "double" or type "integer") and not a factor, and FALSE otherwise. That is, is.integer(x) || is.double(x), or (mode(x) == "numeric") && !is.factor(x).
If x is a factor, as.numeric will return the underlying numeric (integer) representation, which is often meaningless as it may not correspond to the factor levels, see the ‘Warning’ section in factor (and the 2nd example below).
as.numeric and is.numeric are internally S4 generic and so methods can be set for them via setMethod.
To ensure that as.numeric and as.double remain identical, S4 methods can only be set for as.numeric.
It is a historical anomaly that R has two names for its floating-point vectors, double and numeric (and formerly had real).
double is the name of the type. numeric is the name of the mode and also of the implicit class. As an S4 formal class, use "numeric".
The potential confusion is that R has used mode "numeric" to mean ‘double or integer’, which conflicts with the S4 usage. Thus is.numeric tests the mode, not the class, but as.numeric (which is identical to as.double) coerces to the class.
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
double, integer, storage.mode.
## Conversion does trim whitespace; non-numeric strings give NA + warning
as.numeric(c("-.1"," 2.7 ","B"))

## Numeric values are sometimes accidentally converted to factors.
## Converting them back to numeric is trickier than you'd expect.
f <- factor(5:10)
as.numeric(f) # not what you might expect, probably not what you want
## what you typically meant and want:
as.numeric(as.character(f))
## the same, considerably more efficient (for long vectors):
as.numeric(levels(f))[f]
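As a complement, a few calls to is.numeric show which objects the test accepts. This is a small sketch of the behaviour described for the default method and for the "Date" and "difftime" methods; the particular values are arbitrary.

```r
## Base types double and integer are numeric
is.numeric(2.5)                              # TRUE
is.numeric(1L)                               # TRUE

## Factors and date-time classes are stored as numbers but are not numeric
is.numeric(factor(c("10", "20")))            # FALSE
is.numeric(as.Date("2024-01-01"))            # FALSE
is.numeric(as.difftime(1, units = "hours"))  # FALSE

## is.numeric() is not is.double(): an integer passes only the former
c(is.numeric(1L), is.double(1L))             # TRUE FALSE
```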
A simple S3 class for representing numeric versions including package versions, and associated methods.
numeric_version(x, strict = TRUE)
package_version(x, strict = TRUE)
R_system_version(x, strict = TRUE)
getRversion()

as.numeric_version(x)
as.package_version(x)
is.numeric_version(x)
is.package_version(x)
x |
for the creators, a character vector with suitable numeric
version strings (see ‘Details’);
for |
strict |
a logical indicating whether invalid numeric versions should result in an error (default) or not. |
Numeric versions are sequences of one or more non-negative integers, usually (e.g., in package ‘DESCRIPTION’ files) represented as character strings with the elements of the sequence concatenated and separated by single ‘.’ or ‘-’ characters. R package versions consist of at least two such integers, an R system version of exactly three (major, minor and patch level).
Functions numeric_version, package_version and R_system_version create a representation from such strings (if suitable) which allows for coercion and testing, combination, comparison, summaries (min/max), inclusion in data frames, subscripting, and printing. The classes can hold a vector of such representations.
getRversion returns the version of the running R as an R system version object.
The [[ operator extracts or replaces a single version. To access the integers of a version use two indices: see the examples.
compareVersion; packageVersion for the version of a specific R package. R.version etc for the version of R (and the information underlying getRversion()).
x <- package_version(c("1.2-4", "1.2-3", "2.1"))
x < "1.4-2.3"
c(min(x), max(x))
x[2, 2]
x$major
x$minor

if(getRversion() <= "2.5.0") { ## work around missing feature
  cat("Your version of R, ", as.character(getRversion()),
      ", is outdated.\n",
      "Now trying to work around that ...\n", sep = "")
}

x[[1]]
x[[c(1, 3)]]  # '4' as a numeric version
x[1, 3]       # same
x[[1, 3]]     # 4 as an integer
x[[2, 3]] <- 0   # zero the patchlevel
x[[c(2, 3)]] <- 0 # same
x
x[[3]] <- "2.2.3"
x
x <- c(x, package_version("0.0"))
is.na(x)[4] <- TRUE
stopifnot(identical(is.na(x), c(rep(FALSE,3), TRUE)),
          anyNA(x))
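One reason to use these classes instead of comparing version strings directly is that comparison is done integer by integer. The sketch below contrasts the two; the version numbers are made up for illustration.

```r
v_old <- numeric_version("1.9.0")
v_new <- numeric_version("1.10.0")

## Component-wise comparison: 10 > 9 in the second position
v_new > v_old                 # TRUE

## Plain string comparison gets this wrong
"1.10.0" > "1.9.0"            # FALSE

## A character operand is converted before comparing, which gives the
## usual idiom for guarding version-specific code
getRversion() >= "3.0.0"
```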
How R parses numeric constants.
R parses numeric constants in its input in a very similar way to C99 floating-point constants.
Inf and NaN are numeric constants (with typeof(.) "double"). In text input (e.g., in scan and as.double), these are recognized ignoring case as is infinity as an alternative to Inf. NA_real_ and NA_integer_ are constants of types "double" and "integer" representing missing values. All other numeric constants start with a digit or period and are either a decimal or hexadecimal constant optionally followed by L.
Hexadecimal constants start with 0x or 0X followed by a non-empty sequence from 0-9 a-f A-F . which is interpreted as a hexadecimal number, optionally followed by a binary exponent. A binary exponent consists of a P or p followed by an optional plus or minus sign followed by a non-empty sequence of (decimal) digits, and indicates multiplication by a power of two. Thus 0x123p456 is $291 \times 2^{456}$.
Decimal constants consist of a non-empty sequence of digits possibly containing a period (the decimal point), optionally followed by a decimal exponent. A decimal exponent consists of an E or e followed by an optional plus or minus sign followed by a non-empty sequence of digits, and indicates multiplication by a power of ten.
Values which are too large or too small to be representable will overflow to Inf or underflow to 0.0.
A numeric constant immediately followed by i is regarded as an imaginary complex number.
A numeric constant immediately followed by L is regarded as an integer number when possible (and with a warning if it contains a ".").
Only the ASCII digits 0–9 are recognized as digits, even in languages which have other representations of digits. The ‘decimal separator’ is always a period and never a comma.
Note that a leading plus or minus is not regarded by the parser as part of a numeric constant but as a unary operator applied to the constant.
When a string is parsed to input a numeric constant, the number may or may not be representable exactly in the C double type used. If not, one of the nearest representable numbers will be returned.
R's own C code is used to convert constants to binary numbers, so the effect can be expected to be the same on all platforms implementing full IEC 60559 arithmetic (the most likely area of difference being the handling of numbers less than .Machine$double.xmin). The same code is used by scan.
Syntax. For complex numbers, see complex. Quotes for the parsing of character constants, Reserved for the “reserved words” in R.
## You can create numbers using fixed or scientific formatting.
2.1
2.1e10
-2.1E-10

## The resulting objects have class numeric and type double.
class(2.1)
typeof(2.1)

## This holds even if what you typed looked like an integer.
class(2)
typeof(2)

## If you actually wanted integers, use an "L" suffix.
class(2L)
typeof(2L)

## These are equal but not identical
2 == 2L
identical(2, 2L)

## You can write numbers between 0 and 1 without a leading "0"
## (but typically this makes code harder to read)
.1234

sqrt(1i) # remember elementary math?
utils::str(0xA0)
identical(1L, as.integer(1))

## You can combine the "0x" prefix with the "L" suffix :
identical(0xFL, as.integer(15))
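A few further constants, chosen here purely for illustration, exercise the binary exponent, the L suffix after a decimal exponent, and overflow and underflow as described in the text.

```r
## Hexadecimal constant with a binary exponent:
## 0x10 is 16, and p2 multiplies by 2^2
0x10p2                    # 64
0x1p-1                    # 0.5

## Decimal exponent, and the L suffix on a constant with an integer value
1e3                       # 1000, a double
typeof(1e3L)              # "integer"

## Constants outside the representable range overflow or underflow
1e400                     # Inf
1e-400                    # 0

## A leading minus is a unary operator applied to the constant
identical(-2L, -(2L))     # TRUE
```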
Integers which are displayed in octal (base-8 number system) format, with as many digits as are needed to display the largest, using leading zeroes as necessary.
Arithmetic works as for integers, and non-integer valued mathematical functions typically work by truncating the result to integer.
as.octmode(x)

## S3 method for class 'octmode'
as.character(x, keepStr = FALSE, ...)

## S3 method for class 'octmode'
format(x, width = NULL, ...)

## S3 method for class 'octmode'
print(x, ...)
x |
an object, for the methods inheriting from class |
keepStr |
a |
width |
|
... |
further arguments passed to or from other methods. |
"octmode" objects are integer vectors with that class attribute, used primarily to ensure that they are printed in octal notation, specifically for Unix-like file permissions such as 755. Subsetting ([) works too, as do arithmetic or other mathematical operations, albeit truncated to integer.
as.character(x) drops all attributes (except when keepStr = TRUE, where it keeps dim, dimnames and names for back compatibility) and converts each entry individually, hence with no leading zeroes, whereas in format(), when width = NULL (the default), the output is padded with leading zeroes to the smallest width needed for all the non-missing elements.
as.octmode can convert integers (of type "integer" or "double") and character vectors whose elements contain only digits 0-7 (or are NA) to class "octmode".
There is a ! method and methods for | and &: these recycle their arguments to the length of the longer and then apply the operators bitwise to each element.
These are auxiliary functions for file.info.
hexmode, sprintf for other options in converting integers to octal, strtoi to convert octal strings to integers.
(on <- as.octmode(c(16, 32, 127:129))) # "020" "040" "177" "200" "201"
unclass(on[3:4]) # subsetting

## manipulate file modes
fmode <- as.octmode("170")
(fmode | "644") & "755"

(umask <- Sys.umask()) # depends on platform
c(fmode, "666", "755") & !umask

om <- as.octmode(1:12)
om # print()s via format()
stopifnot(nchar(format(om)) == 2)
om[1:7] # *no* leading zeroes!
stopifnot(format(om[1:7]) == as.character(1:7))
om2 <- as.octmode(c(1:10, 60:70))
om2 # prints via format() -> with 3 octals
stopifnot(nchar(format(om2)) == 3)
as.character(om2) # strings of length 1, 2, 3

## Integer arithmetic (remaining "octmode"):
om^2
om * 64
-om
(fac <- factorial(om)) # 1!, 2!, 3!, 4!, ... in octal
as.integer(fac) # indeed the same as factorial(1:12)
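For the file-permission use case mentioned above, a compact sketch of the round trip between octal strings, "octmode" objects and plain integers may help. The modes 644 and 111 are illustrative values only.

```r
## A file mode given as an octal string
mode <- as.octmode("644")
as.integer(mode)                     # 420, the underlying integer

## Add execute permission for user, group and others with a bitwise OR
exec_mode <- mode | as.octmode("111")
format(exec_mode)                    # "755"

## strtoi() converts an octal string to a plain integer
strtoi("755", base = 8L)             # 493
```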
on.exit records the expression given as its argument as needing to be executed when the current function exits (either naturally or as the result of an error). This is useful for resetting graphical parameters or performing other cleanup actions.
If no expression is provided, i.e., the call is on.exit(), then the current on.exit code is removed.
on.exit(expr = NULL, add = FALSE, after = TRUE)
expr |
an expression to be executed. |
add |
if TRUE, add |
after |
if |
The expr
argument passed to on.exit
is recorded without
evaluation. If it is not subsequently removed/replaced by another
on.exit
call in the same function, it is evaluated in the
evaluation frame of the function when it exits (including during
standard error handling). Thus any functions or variables in the
expression will be looked for in the function and its environment at
the time of exit: to capture the current value in expr
use
substitute
or similar.
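A minimal sketch of the late lookup described above. The names `late_lookup` and `seen` are hypothetical and exist only for this illustration; the point is that `x` is looked up when the function exits, not when `on.exit` is called.

```r
seen <- NULL   # hypothetical variable used to observe the exit code

late_lookup <- function() {
  x <- "value at registration"
  on.exit(seen <<- x)        # recorded without evaluation
  x <- "value at exit"       # changed before the function returns
  invisible(NULL)
}

late_lookup()
seen   # "value at exit"
```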
If multiple on.exit
expressions are set using add = TRUE
then all expressions will be run even if one signals an error.
This is a ‘special’ primitive function: it only
evaluates the arguments add
and after
.
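The effect of add and after on execution order can be sketched as follows. The helper `note` and the vector `events` are hypothetical names introduced only to record the order in which the exit expressions run.

```r
events <- character()                          # hypothetical event log
note <- function(msg) events <<- c(events, msg)

cleanup_demo <- function() {
  on.exit(note("first registered"))
  # add = TRUE appends, so this runs after the first expression
  on.exit(note("second registered"), add = TRUE)
  # after = FALSE puts the expression in front of those already registered
  on.exit(note("third registered"), add = TRUE, after = FALSE)
  note("body")
  invisible(NULL)
}

cleanup_demo()
events
# "body" "third registered" "first registered" "second registered"
```

Without add = TRUE, each later call would have replaced the earlier expression instead of extending it.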
Invisible NULL
.
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
sys.on.exit
which returns the expression stored for use
by on.exit()
in the function in which sys.on.exit()
is
evaluated.
require(graphics)

opar <- par(mai = c(1,1,1,1))
on.exit(par(opar))
Operators for the "Date"
class.
There is an Ops
method and specific
methods for +
and -
for the Date
class.
date + x
x + date
date - x
date1 lop date2
date |
an object of class "Date". |
date1, date2 |
date objects or character vectors. (Character vectors are converted by as.Date.) |
x |
a numeric vector (in days) or an object of class "difftime", rounded to the nearest whole day. |
lop |
one of ==, !=, <, <=, > or >=. |
x
does not need to be integer if specified as a numeric vector,
but see the comments about fractional days in the help for
Dates
.
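A short sketch of these operators on a fixed date, so the results do not depend on the day the code is run. The variable names are illustrative only.

```r
d <- as.Date("2024-02-27")

later <- d + 3                                  # numeric x is in days
same  <- 3 + d                                  # addition works in either order
gap   <- d - as.Date("2024-01-01")              # a "difftime" in days
dt    <- d + as.difftime(1, units = "weeks")    # difftime x
cmp   <- d < "2024-03-01"                       # character is converted to Date

later            # "2024-03-01" (2024 is a leap year)
as.numeric(gap)  # 57
dt               # "2024-03-05"
cmp              # TRUE
```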
(z <- Sys.Date())
z + 10
z < c("2009-06-01", "2010-01-01", "2015-01-01")
Allow the user to set and examine a variety of global options which affect the way in which R computes and displays its results.
options(...)

getOption(x, default = NULL)

.Options
... |
any options can be defined, using name = value. Options can also be passed by giving a single unnamed argument which is a named list. |
x |
a character string holding an option name. |
default |
if the specified option is not set in the options list, this value is returned. This facilitates retrieving an option and checking whether it is set and setting it separately if not. |
Invoking options()
with no arguments returns a list with the
current values of the options. Note that not all options listed below
are set initially. To access the value of a single option, one should
use, e.g., getOption("width")
rather than
options("width")
which is a list of length one.
For getOption
, the current value set for option x
, or
default
(which defaults to NULL
) if the option is unset.
For options()
, a list of all set options sorted by name. For
options(name)
, a list of length one containing the set value,
or NULL
if it is unset. For uses setting one or more options,
a list with the previous values of the options changed (returned
invisibly).
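Because a setting call returns the previous values, a common pattern is to save them and restore them on exit. The sketch below assumes a hypothetical helper `with_digits` and a hypothetical option name `hypothetical.unset.option`; neither is part of base R.

```r
# Hypothetical helper: evaluate expr with a temporary 'digits' setting.
with_digits <- function(n, expr) {
  old <- options(digits = n)   # list holding the previous value
  on.exit(options(old))        # restore it, even if expr signals an error
  expr                         # lazily evaluated here, under the new setting
}

before <- getOption("digits")
shown  <- with_digits(3, format(pi))
after  <- getOption("digits")

shown                      # "3.14" when OutDec is "."
identical(before, after)   # TRUE: the setting was restored

# getOption() returns 'default' for an option that is not set
fallback <- getOption("hypothetical.unset.option", default = "fallback")
```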
add.smooth
:typically logical, defaulting to
TRUE
. Could also be set to an integer for specifying how
many (simulated) smooths should be added. This is currently only
used by plot.lm
.
askYesNo
:a function (typically set by a front-end)
to ask the user binary response questions in a consistent way,
or a vector of strings used by askYesNo
to use
as default responses for such questions.
browserNLdisabled
:logical: whether newline is
disabled as a synonym for "n"
in the browser.
catch.script.errors
:logical, false by default. If
true and interactive()
is false, e.g., when an
R script is run by R CMD BATCH <script>.R
, then
errors do not stop execution of the script. Rather evaluation
continues after printing the error (and jumping to top level).
Also, traceback()
would provide info about the error.
Do use with care!
checkPackageLicense
:logical, not set by default. If
true, loadNamespace
asks a user to accept any
non-standard license at first load of the package.
check.bounds
:logical, defaulting to FALSE
. If
true, a warning is produced whenever a
vector (atomic or list
) is extended, by something
like x <- 1:3; x[5] <- 6
.
CBoundsCheck
:logical, controlling whether
.C
and .Fortran
make copies to check for
array over-runs on the atomic vector arguments.
Initially set from value of the environment variable
R_C_BOUNDS_CHECK (set to yes
to enable).
conflicts.policy
:character string or list controlling
handling of conflicts found in calls to library
or
require
. See library
for details.
continue
:a non-empty string setting the prompt used for lines which continue over one line.
defaultPackages
:the packages that are attached by
default when R starts up. Initially set from the value of the
environment variable R_DEFAULT_PACKAGES, or if that is unset
to c("datasets", "utils", "grDevices", "graphics", "stats",
"methods")
. (Set R_DEFAULT_PACKAGES to NULL
or
a comma-separated list of package names.)
This option can be changed in a ‘.Rprofile’ file, but it will
not work to exclude the methods package at this stage, as
the value is screened for methods before that file is read.
deparse.cutoff
:integer value controlling the
printing of language constructs which are deparsed (see deparse).
Default 60
.
deparse.max.lines
:controls the number of lines used
when deparsing in browser
, upon entry to a function
whose debugging flag is set, and if option traceback.max.lines
is unset, of traceback()
. Initially unset, and only
used if set to a positive integer.
traceback.max.lines
:controls the number of lines used
when deparsing in traceback
, if set.
Initially unset, and only used if set to a positive integer.
digits
:controls the number of significant (see signif
) digits to
print when printing numeric values. It is a suggestion only.
Valid values are 1...22 with default 7. See the note in
print.default
about values greater than 15.
digits.secs
:controls the maximum number of digits to
print when formatting time values in seconds. Valid values
are 0...6 with default 0 (equivalent to NULL
which is used
when it is undefined as on vanilla startup). See strftime
.
download.file.extra
:Extra command-line argument(s) for
non-default methods: see download.file
.
download.file.method
:Method to be used for
download.file
. Currently download methods
"internal"
, "wininet"
(Windows only),
"libcurl"
, "wget"
and "curl"
are available.
If not set, method = "auto"
is chosen: see download.file
.
echo
:logical. Only used in non-interactive mode,
when it controls whether input is echoed. Command-line option
--no-echo sets this to FALSE
, but otherwise
it starts the session as TRUE
.
encoding
:The name of an encoding, default
"native.enc"
. See connections
.
error
:either a function or an expression governing
the handling of non-catastrophic errors such as those generated by
stop
as well as by signals and internally detected
errors. If the option is a function, a call to that function,
with no arguments, is generated as the expression. By default
the option is not set: see stop
for the behaviour in
that case. The functions dump.frames
and
recover
provide alternatives that allow post-mortem
debugging. Note that these need to be specified as
e.g. options(error = utils::recover)
in startup
files such as ‘.Rprofile’.
expressions
:sets a limit on the number of nested
expressions that will be evaluated. Valid values are
25...500000 with default 5000. If you increase it, you may
also want to start R with a larger protection stack;
see --max-ppsize in Memory
. Note too that
you may cause a segfault from overflow of the C stack, and on OSes
where it is possible you may want to increase that. Once the
limit is reached an error is thrown. The current number under
evaluation can be found by calling Cstack_info
.
interrupt
:a function taking no arguments to be called on a user interrupt if the interrupt condition is not otherwise handled.
keep.parse.data
:When internally storing source code
(keep.source
is TRUE), also store parse data. Parse data can
then be retrieved with getParseData()
and used e.g. for
spell checking of string constants or syntax highlighting. The value
has effect only when internally storing source code (see
keep.source
). The default is TRUE
.
keep.parse.data.pkgs
:As for keep.parse.data
, used
only when packages are installed. Defaults to FALSE
unless the
environment variable R_KEEP_PKG_PARSE_DATA is set to yes
.
The space overhead of parse data can be substantial even after
compression and it causes performance overhead when loading packages.
keep.source
:When TRUE
, the source code for
functions (newly defined or loaded) is stored internally
allowing comments to be kept in the right places. Retrieve the
source by printing or using deparse(fn, control =
"useSource")
.
The default is interactive()
, i.e., TRUE
for
interactive use.
keep.source.pkgs
:As for keep.source
, used only
when packages are installed. Defaults to FALSE
unless the
environment variable R_KEEP_PKG_SOURCE is set to yes
.
matprod
:a string selecting the implementation of
the matrix products %*%
, crossprod
, and
tcrossprod
for double and complex vectors:
"internal"
uses an unoptimized 3-loop algorithm
which correctly propagates NaN
and
Inf
values and is consistent in precision with
other summation algorithms inside R like sum
or
colSums
(which now means that it uses a long
double
accumulator for summation if available and enabled,
see capabilities
).
"default"
uses BLAS to speed up computation, but
to ensure correct propagation of NaN
and Inf
values it uses an unoptimized 3-loop algorithm for inputs that may
contain NaN
or Inf
values. When deemed
beneficial for performance, "default"
may call the
3-loop algorithm unconditionally, i.e., without checking the
input for NaN
/Inf
values. The 3-loop algorithm uses
(only) a double
accumulator for summation, which is
consistent with the reference BLAS implementation.
"blas"
uses BLAS unconditionally without any
checks and should be used with extreme caution. BLAS
libraries do not propagate NaN
or
Inf
values correctly and for inputs with
NaN
/Inf
values the results may be undefined.
"default.simd"
is experimental and will likely be
removed in future versions of R. It provides the same behavior
as "default"
, but the check whether the input contains
NaN
/Inf
values is faster on some SIMD hardware.
On older systems it will run correctly, but may be much slower than
"default"
.
max.print
:integer, defaulting to 99999
.
print
or show
methods can make use of
this option, to limit the amount of information that is printed,
to something in the order of (and typically slightly less than)
max.print
entries.
OutDec
:character string containing a single
character. The preferred character to be used as the decimal
point in output conversions, that is in printing, plotting,
format
, formatC
and
as.character
but not when
deparsing nor by sprintf
(which is sometimes used prior to printing).
pager
:the command used for displaying text files by
file.show
, details depending on the platform:
On a Unix-alike, defaults to ‘R_HOME/bin/pager’, which is a shell
script running the command-line specified by the environment
variable PAGER whose default is set at configuration,
usually to less
.
On Windows, defaults to "internal"
, which uses a pager similar to the
GUI console. Another possibility is "console"
to use the
console itself.
Can be a character string or an R function, in which case it
needs to accept the arguments (files, header,
title, delete.file)
corresponding to the first four arguments of
file.show
.
papersize
:the default paper format used by
postscript
; set by environment variable
R_PAPERSIZE when R is started: if that is unset or invalid
it defaults platform dependently:
on a Unix-alike, to a value derived from the locale category
LC_PAPER
, or if that is unavailable to a default set
when R was built;
on Windows, to "a4"
, or "letter"
in US and
Canadian locales.
PCRE_limit_recursion
:Logical: should
grep(perl = TRUE)
and similar limit the maximal
recursion allowed when matching? Only relevant for PCRE1 and
PCRE2 <= 10.23.
PCRE can be built not to use a recursion stack (see
pcre_config
), but it uses recursion by default with
a recursion limit of 10000000 which potentially needs a very large
C stack: see the discussion at
https://www.pcre.org/original/doc/html/pcrestack.html. If
true, the limit is reduced using R's estimate of the C stack size
available (if known), otherwise 10000. If NA
, the limit is
imposed only if any input string has 1000 or more bytes. The
limit has no effect when PCRE's Just-in-Time compiler is used.
PCRE_study
:Logical or integer: should
grep(perl = TRUE)
and similar ‘study’ the
patterns? Either logical or a numerical threshold for the minimum
number of strings to be matched for the pattern to be studied (the
default is 10
). Missing values and negative numbers are
treated as false. This option is ignored with PCRE2 (PCRE version >=
10.00) which does not have a separate study phase and patterns are
automatically optimized when possible.
PCRE_use_JIT
:Logical: should grep(perl =
TRUE)
, strsplit(perl = TRUE)
and similar make use
of PCRE's Just-In-Time compiler if available? (This applies only to
studied patterns with PCRE1.) Default: true. Missing values are
treated as false.
pdfviewer
:default PDF viewer. The default is set from the environment variable R_PDFVIEWER, the default value of which
is set when R is configured (on a Unix-alike), or
is the full path to open.exe
, a utility
supplied with R (on Windows).
printcmd
:the command used by postscript
for printing; set by environment variable R_PRINTCMD when
R is started. This should be a command that expects either input
to be piped to ‘stdin’ or to be given a single filename
argument. Usually set to "lpr"
on a Unix-alike.
prompt
:a non-empty string to be used for R's prompt;
should usually end in a blank (" "
).
rl_word_breaks
:(Unix only:) Used for the readline-based terminal
interface. Default value " \t\n\"\\'`><=%;,|&{()}"
.
This is the set of characters used to break the input line into
tokens for object- and file-name completion. Those who do not use
spaces around operators may prefer " \t\n\"\\'`><=+-*%;,|&{()}".
save.defaults
, save.image.defaults
:see save
.
scipen
:integer. A penalty to be applied
when deciding to print numeric values in fixed or exponential
notation. Positive values bias towards fixed and negative towards
scientific notation: fixed notation will be preferred unless it is
more than scipen
digits wider.
setWidthOnResize
:a logical. If set and TRUE
, R
run in a terminal using a recent readline
library will set
the width
option when the terminal is resized.
showWarnCalls
, showErrorCalls
:a logical.
Should warning and error messages produced by the default handlers
show a summary of the call stack? By default error call stacks
are shown in non-interactive sessions. When warning
or stop
are called on a condition object the call
stacks are only shown if the value returned by
conditionCall
for the condition object is not
NULL
.
showNCalls
:integer. Controls how long the sequence of calls must be (in bytes) before ellipses are used. Defaults to 50 and should be at least 30 and no more than 500.
show.error.locations
:Should source locations of
errors be printed? If set to TRUE
or "top"
, the
source location that is highest on the stack (the most recent
call) will be printed. "bottom"
will print the location
of the earliest call found on the stack.
Integer values can select other entries. The value 0
corresponds to "top"
and positive values count down the
stack from there. The value -1
corresponds to
"bottom"
and negative values count up from there.
show.error.messages
:a logical. Should error messages
be printed? Intended for use with try
or a
user-installed error handler.
texi2dvi
:used by functions
texi2dvi
and texi2pdf
in package tools.
Set at startup from the environment variable R_TEXI2DVICMD,
which defaults first to the value of environment variable
TEXI2DVI, and then to a value set when R was installed (the
full path to a texi2dvi
script if one was found). If
necessary, that environment variable can be set to
"emulation"
.
timeout
:positive integer. The timeout for some
Internet operations, in seconds. Default 60 (seconds) but can be
set from environment variable
R_DEFAULT_INTERNET_TIMEOUT. (Invalid values of the option or
the variable are silently ignored: non-integer numeric values will
be truncated.) See download.file
and
connections
.
topLevelEnvironment
:see topenv
and
sys.source
.
url.method
:character string: the default method for
url
. Normally unset, which is equivalent to
"default"
, which is "internal"
except on Windows.
useFancyQuotes
:controls the use of
directional quotes in sQuote
, dQuote
and in
rendering text help (see Rd2txt
in package
tools). Can be TRUE
, FALSE
, "TeX"
or
"UTF-8"
.
verbose
:logical. Should R report extra information
on progress? Set to TRUE
by the command-line option
--verbose.
warn
:integer value to set the handling of warning
messages by the default warning handler. If
warn
is negative all warnings are ignored. If warn
is zero (the default) warnings are stored until the top-level
function returns. If 10 or fewer warnings were signalled they
will be printed; otherwise a message says how many were
signalled. An object called last.warning
is
created and can be printed through the function
warnings
. If warn
is one, warnings are
printed as they occur. If warn
is two (or larger, coercible
to integer), all warnings are turned into errors. While sometimes
useful for debugging, turning warnings into errors may trigger
bugs and resource leaks that would not have been triggered otherwise.
warnPartialMatchArgs
:logical. If true, warns if partial matching is used in argument matching.
warnPartialMatchAttr
:logical. If true, warns if
partial matching is used in extracting attributes via
attr
.
warnPartialMatchDollar
:logical. If true, warns if
partial matching is used for extraction by $
.
warning.expression
:an R code expression to be called
if a warning is generated, replacing the standard message. If
non-null it is called irrespective of the value of option
warn
.
warning.length
:sets the truncation limit in bytes for error and warning messages. A non-negative integer, with allowed values 100...8170, default 1000.
nwarnings
:the limit for the number of warnings kept
when warn = 0
, default 50. This will discard messages if
called whilst they are being collected. If you increase this
limit, be aware that the current implementation pre-allocates
the equivalent of a named list for them, i.e., do not increase it to
more than say a million.
width
:controls the maximum number of columns on a
line used in printing vectors, matrices and arrays, and when
filling by cat
.
Columns are normally the same as characters except in East Asian languages.
You may want to change this if you re-size the window that R is running in. Valid values are 10...10000 with default normally 80. (The limits on valid values are in file ‘Print.h’ and can be changed by re-compiling R.) Some R consoles automatically change the value when they are resized.
See the examples on Startup for one way to set this automatically from the terminal width when R is started.
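As an illustration of the warn option described above, the following sketch turns a coercion warning into an error and catches it. The variable names are illustrative, and the previous setting is restored afterwards.

```r
old <- options(warn = 2)     # warn = 2: warnings are turned into errors

outcome <- tryCatch(
  {
    as.integer("not a number")   # normally only warns about NAs
    "no error"
  },
  error = function(e) "warning became an error"
)

options(old)                 # restore the previous 'warn' setting
outcome   # "warning became an error"
```

As the description of warn notes, converting warnings to errors is mainly a debugging aid and can change how code behaves.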
The ‘factory-fresh’ default settings of some of these options are
add.smooth | TRUE |
check.bounds | FALSE |
continue | "+ " |
digits | 7 |
echo | TRUE |
encoding | "native.enc" |
error | NULL |
expressions | 5000 |
keep.source | interactive() |
keep.source.pkgs | FALSE |
max.print | 99999 |
OutDec | "." |
prompt | "> " |
scipen | 0 |
show.error.messages | TRUE |
timeout | 60 |
verbose | FALSE |
warn | 0 |
warning.length | 1000 |
width | 80 |
Others are set from environment variables or are platform-dependent.
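A brief sketch of how digits, scipen and OutDec interact with formatting, using format() so the results can be inspected as strings. The variable names are illustrative; the original settings are restored at the end.

```r
old <- options(digits = 3, scipen = 0, OutDec = ",")

a <- format(pi)     # 3 significant digits, comma as decimal mark
b <- format(1e5)    # scientific is narrower than fixed, so it is chosen

options(scipen = 3) # fixed may now be up to 3 characters wider
c3 <- format(1e5)

options(old)        # restore digits, scipen and OutDec

a    # "3,14"
b    # "1e+05"
c3   # "100000"
```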
These will be set when package grDevices (or its namespace) is loaded if not already set.
bitmapType
:(Unix only, incl. macOS) character. The
default type for the
bitmap devices such as png
. Defaults to
"cairo"
on systems where that is available, or to
"quartz"
on macOS where that is available.
device
:a character string giving
the name of a function, or the function object itself,
which when called creates a new graphics device of the default
type for that session. The value of this option defaults to the
normal screen device (e.g., X11
, windows
or
quartz
) for an interactive session, and pdf
in batch use or if a screen is not available. If set to the name
of a device, the device is looked for first from the global
environment (that is down the usual search path) and then in the
grDevices namespace.
The default values in interactive and non-interactive sessions are configurable via environment variables R_INTERACTIVE_DEVICE and R_DEFAULT_DEVICE respectively.
The search logic for ‘the normal screen device’ is that
this is windows
on Windows, and quartz
if available
on macOS (running at the console, and compiled into the build).
Otherwise X11
is used if environment variable DISPLAY
is set.
device.ask.default
:logical. The default for
devAskNewPage("ask")
when a device is opened.
locatorBell
:logical. Should selection in locator
and identify
be confirmed by a bell? Default TRUE
.
Honoured at least on X11
and windows
devices.
windowsTimeout
:(Windows-only) integer vector of length 2
representing two times in milliseconds. These control the
double-buffering of windows
devices when that is
enabled: the first is the delay after plotting finishes
(default 100) and the second is the update interval during
continuous plotting (default 500). The values at the time the
device is opened are used.
max.contour.segments
:positive integer, defaulting to
25000
if not set. A limit on the number of
segments in a single contour line in contour
or
contourLines
.
These will be set when package stats (or its namespace) is loaded if not already set.
contrasts
:the default contrasts
used in
model fitting such as with aov
or lm
.
A character vector of length two, the first giving the function to
be used with unordered factors and the second the function to be
used with ordered factors. By default the elements are named
c("unordered", "ordered")
, but the names are unused.
na.action
:the name of a function for treating missing
values (NA
's) for certain situations, see
na.action
and na.pass
.
show.coef.Pvalues
:logical, affecting whether P
values are printed in summary tables of coefficients. See
printCoefmat
.
show.nls.convergence
:logical, should nls
convergence messages be printed for successful fits?
show.signif.stars
:logical, should stars be printed on
summary tables of coefficients? See printCoefmat
.
ts.eps
:the relative tolerance for certain time series
(ts
) computations. Default 1e-05
.
ts.S.compat
:logical. Used to select S compatibility
for plotting time-series spectra. See the description of argument
log
in plot.spec
.
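To illustrate the contrasts option, the sketch below builds a model matrix for a three-level factor under the current setting and under sum-to-zero contrasts. It assumes package stats is available, as in a standard R installation; the variable names are illustrative.

```r
f <- factor(c("a", "b", "c"))

current_coding <- stats::model.matrix(~ f)   # uses the current 'contrasts'

old <- options(contrasts = c("contr.sum", "contr.poly"))
sum_coding <- stats::model.matrix(~ f)       # unordered factors use contr.sum
options(old)                                 # restore the previous contrasts

sum_coding[3, ]   # intercept 1, then -1 and -1 for the last level
```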
These will be set (apart from Ncpus
) when package utils
(or its namespace) is loaded if not already set.
BioC_mirror
:The URL of a Bioconductor mirror
for use by setRepositories
,
e.g. the default ‘"https://bioconductor.org"’
or the European mirror
‘"https://bioconductor.statistik.tu-dortmund.de"’. Can be set
by chooseBioCmirror
.
browser
:The HTML browser to be used by
browseURL
. This sets the default browser on UNIX or
a non-default browser on Windows. Alternatively, an R function
that is called with a URL as its argument. See
browseURL
for further details.
ccaddress
:default Cc: address used by
create.post
(and hence bug.report
and
help.request
). Can be FALSE
or ""
.
citation.bibtex.max
:default 1; the maximal number of
bibentries (bibentry
) in a citation
for
which the BibTeX version is printed in addition to the text one.
de.cellwidth
:integer: the cell widths (number of
characters) to be used in the data editor dataentry
.
If this is unset (the default), 0, negative or NA
, variable
cell widths are used.
demo.ask
:default for the ask
argument of
demo
.
editor
:a non-empty character string or an R function
that sets the default text editor, e.g., for edit
and file.edit
. Set from the environment variable
EDITOR on UNIX, or if unset VISUAL or vi
.
As a string it should specify the name of or path to an external
command.
example.ask
:default for the ask
argument of
example
.
help.ports
:optional integer vector for setting ports
of the internal HTTP server, see startDynamicHelp
.
help.search.types
:default types of documentation
to be searched by help.search
and ??
.
help.try.all.packages
:default for an argument of
help
.
help_type
:default for an argument of
help
, used also as the help type by ?
.
help.htmlmath
:default for the texmath
argument
of Rd2HTML
, controlling how LaTeX-like mathematical
equations are displayed in R help pages (if enabled). Useful
values are "katex"
(equivalent to NULL
, the default)
and "mathjax"
; for all other values basic substitutions are
used.
help.htmltoc
:default for the toc
argument
of Rd2HTML
, controlling whether a table of contents
should be included.
HTTPUserAgent
:string used as the ‘user agent’ in
HTTP(S) requests by download.file
, url
and curlGetHeaders
, or NULL
when requests will
be made without a user agent header. The default is
"R (version platform arch os)"
except when ‘libcurl’ is used when it is
"libcurl/version"
for the ‘libcurl’ version in use.
install.lock
:logical: should per-directory package
locking be used by install.packages
? Most useful
for binary installs on macOS and Windows, but can be used in a
startup file for source installs via
R CMD INSTALL
. For binary installs, can also be
the character string "pkglock"
.
internet.info
:The minimum level of information to be
printed on URL downloads etc, using the "internal"
and
"libcurl"
methods.
Default is 2, for failure causes. Set to 1 or 0 to get more
detailed information (for the "internal"
method 0 provides
more information than 1).
install.packages.check.source
:Used by
install.packages
(and indirectly
update.packages
) on platforms which support binary
packages. Possible values "yes"
and "no"
, with
unset being equivalent to "yes"
.
install.packages.compile.from.source
:Used by
install.packages(type = "both")
(and indirectly
update.packages
) on platforms which
support binary packages. Possible values are "never"
,
"interactive"
(which means ask in interactive use and
"never"
in batch use) and "always"
. The default is
taken from environment variable
R_COMPILE_AND_INSTALL_PACKAGES, with default
"interactive"
if unset. However, install.packages
uses "never"
unless a make
program is found,
consulting the environment variable MAKE.
mailer
:default emailing method used by
create.post
and hence bug.report
and
help.request
.
menu.graphics
:Logical: should graphical menus be used
if available? Defaults to TRUE
. Currently applies to
select.list
, chooseCRANmirror
,
setRepositories
and to select from multiple (text)
help files in help
.
Ncpus
:an integer, used in
install.packages
as default for the number of CPUs
to use in a potentially parallel installation, as
Ncpus = getOption("Ncpus", 1L)
, i.e., when unset is
equivalent to a setting of 1.
pkgType
:The default type of packages to be downloaded
and installed – see install.packages
.
Possible values are platform dependently:
on Windows, "win.binary"
, "source"
and
"both"
(the default);
on a Unix-alike, "source"
(the default except under a
CRAN macOS build), "mac.binary"
and
"both"
(the default for CRAN macOS builds).
("mac.binary.el-capitan"
,
"mac.binary.mavericks"
, "mac.binary.leopard"
and "mac.binary.universal"
are no longer in use.)
Value "binary"
is a synonym for the native binary type (if
there is one); "both"
is used by
install.packages
to choose between source and binary
installs.
repos
:character vector of repository URLs for use by
available.packages
and related functions. Initially
set from entries marked as default in the
‘repositories’ file,
whose path is configurable via environment variable R_REPOSITORIES
(set this to NULL
to skip initialization at startup).
The ‘factory-fresh’ setting from the file in R.home("etc")
is
c(CRAN="@CRAN@")
, a value that causes some utilities to
prompt for a CRAN mirror. To avoid this do set the CRAN mirror,
by something like
local({
  r <- getOption("repos")
  r["CRAN"] <- "https://my.local.cran"
  options(repos = r)
})
in your ‘.Rprofile’, or use a personal ‘repositories’ file.
Note that you can add more repositories (Bioconductor,
R-Forge, RForge.net, ...) for the current session
using setRepositories
.
str
:a list of options controlling the default
str
display. Defaults to strOptions()
.
str.dendrogram.last
:see str.dendrogram
.
SweaveHooks
, SweaveSyntax
:see Sweave
.
unzip
:a character string used by unzip
:
the path of the external program unzip
or "internal"
.
Defaults (platform dependently):
on a Unix-alike, to the value of R_UNZIPCMD, which is set in
‘etc/Renviron’ to the path of the unzip
command found
during configuration and otherwise to ""
;
on Windows, to "internal"
when the internal unzip
code is used.
These will be set when package parallel (or its namespace) is loaded if not already set.
mc.cores
:an integer giving the maximum allowed number
of additional R processes allowed to be run in parallel to
the current R process. Defaults to the setting of the
environment variable MC_CORES if set. Most applications
which use this assume a limit of 2
if it is unset.
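The fallback of 2 mentioned above is typically obtained through the default argument of getOption. A minimal sketch, in which the helper `cores_limit` is a hypothetical name and not part of package parallel:

```r
# Hypothetical helper: the limit to use, falling back to 2 when unset.
cores_limit <- function() as.integer(getOption("mc.cores", 2L))

old <- options(mc.cores = NULL)   # make sure the option is unset
unset_value <- cores_limit()      # falls back to 2

options(mc.cores = 4L)
set_value <- cores_limit()        # uses the option

options(old)                      # restore whatever was there before
```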
dvipscmd
:(Unix only:) character string giving a command to be used in
the (deprecated) off-line printing of help pages via
PostScript. Defaults to "dvips"
.
warn.FPU
:(Windows only:) logical, by default undefined. If true, a warning is produced whenever dyn.load repairs the control word damaged by a buggy DLL.
For compatibility with S there is a visible object .Options
whose
value is a pairlist containing the current options()
(in no
particular order). Assigning to it will make a local copy and not
change the original. (Using it however is faster than calling
options()
).
An option set to NULL
is indistinguishable from a non-existent
option.
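A small sketch of this behaviour, using a made-up option name (`hypothetical.demo.option`) that is assumed not to be used by any package:

```r
options(hypothetical.demo.option = 1)
was_listed <- "hypothetical.demo.option" %in% names(options())

options(hypothetical.demo.option = NULL)   # setting to NULL unsets it
now_listed <- "hypothetical.demo.option" %in% names(options())
now_value  <- getOption("hypothetical.demo.option")

was_listed   # TRUE
now_listed   # FALSE
now_value    # NULL
```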
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
op <- options(); utils::str(op) # op is a named list

getOption("width") == options()$width # the latter needs more memory
options(digits = 15)
pi

# set the editor, and save previous value
old.o <- options(editor = "nedit")
old.o

options(check.bounds = TRUE, warn = 1)
x <- NULL; x[4] <- "yes" # gives a warning

options(digits = 5)
print(1e5)
options(scipen = 3); print(1e5)

options(op) # reset (all) initial options
options("digits")

## Not run:
## set contrast handling to be like S
options(contrasts = c("contr.helmert", "contr.poly"))

## End(Not run)

## Not run:
## on error, terminate the R session with error status 66
options(error = quote(q("no", status = 66, runLast = FALSE)))
stop("test it")

## End(Not run)

## Not run:
## Set error actions for debugging:
## enter browser on error, see ?recover:
options(error = recover)
## allows to call debugger() afterwards, see ?debugger:
options(error = dump.frames)
## A possible setting for non-interactive sessions
options(error = quote({dump.frames(to.file = TRUE); q()}))

## End(Not run)

# Compare the two ways to get an option and use it
# accounting for the possibility it might not be set.
if(as.logical(getOption("performCleanup", TRUE)))
  cat("do cleanup\n")

## Not run:
# a clumsier way of expressing the above w/o the default.
tmp <- getOption("performCleanup")
if(is.null(tmp))
  tmp <- TRUE
if(tmp)
  cat("do cleanup\n")

## End(Not run)
order
returns a permutation which rearranges its first
argument into ascending or descending order, breaking ties by further
arguments. sort.list
does the same, using only one argument.
See the examples for how to use these functions to sort data frames,
etc.
order(..., na.last = TRUE, decreasing = FALSE,
      method = c("auto", "shell", "radix"))

sort.list(x, partial = NULL, na.last = TRUE, decreasing = FALSE,
          method = c("auto", "shell", "quick", "radix"))
... |
a sequence of numeric, complex, character or logical vectors, all of the same length, or a classed R object. |
x |
an atomic vector for |
partial |
vector of indices for partial sorting.
(Non- |
decreasing |
logical. Should the sort order be increasing or
decreasing? For the |
na.last |
for controlling the treatment of |
method |
the method to be used: partial matches are allowed. The
default ( |
In the case of ties in the first vector, values in the second are used
to break the ties. If the values are still tied, values in the later
arguments are used to break the tie (see the first example).
The sort used is stable (except for method = "quick"
),
so any unresolved ties will be left in their original ordering.
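As a minimal sketch of tie-breaking and stability (the vectors below are made up for illustration and are not part of the original page):

```r
# Ties in the first vector are broken by the second one
first  <- c(1, 1, 2)
second <- c(3, 2, 1)
tie_broken <- order(first, second)

# With a single key, tied elements keep their original relative order
v <- c(2, 1, 2, 1)
stable <- order(v)
```

Here the two tied `1`s in `first` are arranged according to `second`, while in `v` each pair of equal values stays in input order.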
Complex values are sorted first by the real part, then the imaginary part.
Except for method "radix"
, the sort order for character vectors
will depend on the collating sequence of the locale in use: see
Comparison
.
The "shell"
method is generally the safest bet and is the
default method, except for short factors, numeric vectors, integer
vectors and logical vectors, where "radix"
is assumed. Method
"radix"
stably sorts logical, numeric and character vectors in
linear time. It outperforms the other methods, although there are
drawbacks, especially for character vectors (see sort
).
Method "quick"
for sort.list
is only supported for
numeric x
with na.last = NA
, is not stable, and is
slower than "radix"
.
partial = NULL
is supported for compatibility with other
implementations of S, but no other values are accepted and ordering is
always complete.
For a classed R object, the sort order is taken from
xtfrm
: as its help page notes, this can be slow unless a
suitable method has been defined or is.numeric(x)
is
true. For factors, this sorts on the internal codes, which is
particularly appropriate for ordered factors.
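The difference between ordering a factor by its internal codes and ordering its labels can be sketched as follows; the factor `sizes` is a hypothetical example:

```r
# Levels are declared in a meaningful (non-alphabetical) order
sizes <- factor(c("low", "high", "mid"), levels = c("low", "mid", "high"))

by_codes  <- order(sizes)                # follows the level order: low, mid, high
by_labels <- order(as.character(sizes))  # follows the collating order of the labels
```

Ordering by codes respects the declared level order, which is usually what is wanted for ordered factors.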
An integer vector unless any of the inputs has 2^31 or
more elements, when it is a double vector.
In programmatic use it is unsafe to name the ...
arguments,
as the names could match current or future control
arguments such as decreasing
. A sometimes-encountered unsafe
practice is to call do.call('order', df_obj)
where
df_obj
might be a data frame: copy df_obj
and
remove any names, for example using unname
.
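A minimal sketch of the safer pattern, assuming a hypothetical data frame `df_obj` in which one column happens to share its name with a control argument:

```r
# Hypothetical data frame: the second column is called 'decreasing'
df_obj <- data.frame(id = c(2, 1, 3), decreasing = c(TRUE, FALSE, TRUE))

# Dropping the names passes every column positionally through '...',
# so no column can be mistaken for a control argument of order()
ord <- do.call(order, unname(as.list(df_obj)))
sorted <- df_obj[ord, ]
```

The explicit `as.list()` is only there to make the intent obvious; the point is that the argument list handed to `do.call` carries no names.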
sort.list
can get called by mistake as a method for
sort
with a list argument: it gives a suitable error
message for list x
.
There is a historical difference in behaviour for na.last = NA
:
sort.list
removes the NA
s and then computes the order
amongst the remaining elements: order
computes the order
amongst the non-NA
elements of the original vector. Thus
x[order(x, na.last = NA)]
zz <- x[!is.na(x)]; zz[sort.list(x, na.last = NA)]
both sort the non-NA
values of x
.
Prior to R 3.3.0 method = "radix"
was only supported for
integers of range less than 100,000.
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
Knuth, D. E. (1998) The Art of Computer Programming, Volume 3: Sorting and Searching. 2nd ed. Addison-Wesley.
require(stats)

(ii <- order(x <- c(1,1,3:1,1:4,3), y <- c(9,9:1), z <- c(2,1:9)))
## 6  5  2  1  7  4 10  8  3  9
rbind(x, y, z)[,ii] # shows the reordering (ties via 2nd & 3rd arg)

## Suppose we wanted descending order on y.
## A simple solution for numeric 'y' is
rbind(x, y, z)[, order(x, -y, z)]
## More generally we can make use of xtfrm
cy <- as.character(y)
rbind(x, y, z)[, order(x, -xtfrm(cy), z)]
## The radix sort supports multiple 'decreasing' values:
rbind(x, y, z)[, order(x, cy, z, decreasing = c(FALSE, TRUE, FALSE),
                       method="radix")]

## Sorting data frames:
dd <- transform(data.frame(x, y, z),
                z = factor(z, labels = LETTERS[9:1]))
## Either as above {for factor 'z' : using internal coding}:
dd[ order(x, -y, z), ]
## or along 1st column, ties along 2nd, ... *arbitrary* no.{columns}:
dd[ do.call(order, dd), ]

set.seed(1)  # reproducible example:
d4 <- data.frame(x = round(   rnorm(100)), y = round(10*runif(100)),
                 z = round( 8*rnorm(100)), u = round(50*runif(100)))
(d4s <- d4[ do.call(order, d4), ])
(i <- which(diff(d4s[, 3]) == 0))
#   in 2 places, needed 3 cols to break ties:
d4s[ rbind(i, i+1), ]

## rearrange matched vectors so that the first is in ascending order
x <- c(5:1, 6:8, 12:9)
y <- (x - 5)^2
o <- order(x)
rbind(x[o], y[o])

## tests of na.last
a <- c(4, 3, 2, NA, 1)
b <- c(4, NA, 2, 7, 1)
z <- cbind(a, b)
(o <- order(a, b)); z[o, ]
(o <- order(a, b, na.last = FALSE)); z[o, ]
(o <- order(a, b, na.last = NA)); z[o, ]

##  speed examples on an average laptop for long vectors:
##  factor/small-valued integers:
x <- factor(sample(letters, 1e7, replace = TRUE))
system.time(o <- sort.list(x, method = "quick", na.last = NA)) # 0.1 sec
stopifnot(!is.unsorted(x[o]))
system.time(o <- sort.list(x, method = "radix")) # 0.05 sec, 2X faster
stopifnot(!is.unsorted(x[o]))
##  large-valued integers:
xx <- sample(1:200000, 1e7, replace = TRUE)
system.time(o <- sort.list(xx, method = "quick", na.last = NA)) # 0.3 sec
system.time(o <- sort.list(xx, method = "radix")) # 0.2 sec
##  character vectors:
xx <- sample(state.name, 1e6, replace = TRUE)
system.time(o <- sort.list(xx, method = "shell")) # 2 sec
system.time(o <- sort.list(xx, method = "radix")) # 0.007 sec, 300X faster
##  double vectors:
xx <- rnorm(1e6)
system.time(o <- sort.list(xx, method = "shell")) # 0.4 sec
system.time(o <- sort.list(xx, method = "quick", na.last = NA)) # 0.1 sec
system.time(o <- sort.list(xx, method = "radix")) # 0.05 sec, 2X faster
The outer product of the arrays X
and Y
is the array
A
with dimension c(dim(X), dim(Y))
where element
A[c(arrayindex.x, arrayindex.y)]
= FUN(X[arrayindex.x], Y[arrayindex.y], ...)
.
outer(X, Y, FUN = "*", ...)
X %o% Y
X , Y
|
first and second arguments for function |
FUN |
a function to use on the outer products, found via
|
... |
optional arguments to be passed to |
X
and Y
must be suitable arguments for FUN
. Each
will be extended by rep
to a length equal to the product of the
lengths of X
and Y
before FUN
is called.
FUN
is called with these two extended vectors as arguments
(plus any arguments in ...
). It must be a vectorized
function (or the name of one) expecting at least two arguments and
returning a value with the same length as the first (and the second).
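The vectorization requirement can be illustrated with a short sketch; the functions and values below are assumptions chosen for the example, not part of the original page:

```r
X <- 1:3
Y <- 1:4

# An already vectorized FUN: A[i, j] equals X[i] + 10 * Y[j]
A <- outer(X, Y, FUN = function(x, y) x + 10 * y)

# 'larger' only handles scalars (if() needs a length-one condition),
# so it is wrapped with Vectorize() before being handed to outer()
larger <- function(x, y) if (x > y) x else y
B <- outer(X, Y, FUN = Vectorize(larger))
```

Both results have dimension `c(length(X), length(Y))`, because `FUN` is called once on the two extended vectors rather than once per cell.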
Where they exist, the [dim]names of X
and Y
will be
copied to the answer, and a dimension assigned which is the
concatenation of the dimensions of X
and Y
(or lengths
if dimensions do not exist).
FUN = "*"
is handled as a special case via
as.vector(X) %*% t(as.vector(Y))
, and is intended only for
numeric vectors and arrays.
%o%
is a binary operator providing a wrapper for
outer(x, y, "*")
.
Jonathan Rougier
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
%*%
for usual (inner) matrix vector
multiplication;
kronecker
which is based on outer
;
Vectorize
for vectorizing a non-vectorized function.
x <- 1:9; names(x) <- x
# Multiplication & Power Tables
x %o% x
y <- 2:8; names(y) <- paste(y,":", sep = "")
outer(y, x, `^`)

outer(month.abb, 1999:2003, FUN = paste)

## three way multiplication table:
x %o% x %o% y[1:3]
Open parenthesis, (
, and open brace, {
, are
.Primitive
functions in R.
Effectively, (
is semantically equivalent to the identity
function(x) x
, whereas {
is slightly more interesting,
see examples.
( ... )

{ ... }
For (
, the result of evaluating the argument. This has
visibility set, so will auto-print if used at top-level.
For {
, the result of the last expression evaluated. This has
the visibility of the last evaluation.
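The visibility rules can be checked with `withVisible()`; this is a small sketch rather than part of the original page:

```r
# '(' always marks its result as visible, even when the inner call is invisible
paren <- withVisible((invisible(5)))

# '{' keeps the visibility of the last expression it evaluated
brace <- withVisible({invisible(5)})
```

Both calls return the value 5; only the `visible` flag differs, which is why `(x <- 1)` prints at top level while `{x <- 1}` does not.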
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
if
, return
, etc for other objects used in
the R language itself.
Syntax
for operator precedence.
f <- get("(")
e <- expression(3 + 2 * 4)
identical(f(e), e)

do <- get("{")
do(x <- 3, y <- 2*x-3, 6-x-y); x; y

## note the differences
(2+3)
{2+3; 4+5}
(invisible(2+3))
{invisible(2+3)}
parse()
returns the parsed but unevaluated expressions in an
expression
, a “list” of call
s.
str2expression(s)
and str2lang(s)
return special versions
of parse(text=s, keep.source=FALSE)
and can therefore be regarded as
transforming character strings s
to expressions, calls, etc.
parse(file = "", n = NULL, text = NULL, prompt = "?",
      keep.source = getOption("keep.source"), srcfile,
      encoding = "unknown")

str2lang(s)
str2expression(text)
file |
a connection, or a character string giving the name of a
file or a URL to read the expressions from.
If |
n |
integer (or coerced to integer). The maximum number of
expressions to parse. If |
text |
character vector. The text to parse. Elements are treated as if they were lines of a file. Other R objects will be coerced to character if possible. |
prompt |
the prompt to print when parsing from the keyboard.
|
keep.source |
a logical value; if |
srcfile |
|
encoding |
encoding to be assumed for input strings. If the
value is |
s |
a |
parse(....)
: If text
has length greater than zero (after coercion) it is used in
preference to file
.
All versions of R accept input from a connection with end of line marked by LF (as used on Unix), CRLF (as used on DOS/Windows) or CR (as used on classic Mac OS). The final line can be incomplete, that is missing the final EOL marker.
When input is taken from the console, n = NULL
is equivalent to
n = 1
, and n < 0
will read until an EOF character is
read. (The EOF character is Ctrl-Z for the Windows front-ends.) The
line-length limit is 4095 bytes when reading from the console (which
may impose a lower limit: see ‘An Introduction to R’).
The default for srcfile
is set as follows. If
keep.source
is not TRUE
, srcfile
defaults to a character string, either "<text>"
or one
derived from file
. When keep.source
is
TRUE
, if text
is used, srcfile
will be set to a
srcfilecopy
containing the text. If a character
string is used for file
, a srcfile
object
referring to that file will be used.
When srcfile
is a character string, error messages will
include the name, but source reference information will not be added
to the result. When srcfile
is a srcfile
object, source reference information will be retained.
str2expression(s)
: for a character
vector
s
, str2expression(s)
corresponds to
parse(text = s, keep.source=FALSE)
, which is always of
type (typeof
) and class
expression
.
str2lang(s)
: for a character
string
s
, str2lang(s)
corresponds to
parse(text = s, keep.source=FALSE)[[1]]
(plus a check
that both s
and the parse(*)
result are of length one)
which is typically a call
but may also be a symbol
aka
name
, NULL
or an atomic constant such as
2
, 1L
, or TRUE
. Put differently, the value of
str2lang(.)
is a call or one of its parts, in short
“a call or simpler”.
Currently, encoding is not handled in str2lang()
and
str2expression()
.
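A brief sketch of how the two helpers differ in what they return; the strings and the environment `env` are illustrative assumptions:

```r
# A character vector becomes an expression vector, one element per statement
exprs <- str2expression(c("x <- 2", "x * 3"))

# A single string becomes a single language object (here a call)
one <- str2lang("x * 3")

# Evaluate the parsed statements in order in a scratch environment
env <- new.env()
result <- NULL
for (i in seq_along(exprs)) result <- eval(exprs[[i]], env)
```

After the loop `result` holds the value of the last statement, and `one` can be evaluated in `env` in the same way.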
parse()
and str2expression()
return an object of type
"expression"
, for parse()
with up to n
elements if specified as a non-negative integer.
str2lang(s)
, s
a string, returns “a
call
or simpler”, see the ‘Details:’ section.
When srcfile
is non-NULL
, a "srcref"
attribute
will be attached to the result containing a list of
srcref
records corresponding to each element, a
"srcfile"
attribute will be attached containing a copy of
srcfile
, and a "wholeSrcref"
attribute will be
attached containing a srcref
record corresponding to
all of the parsed text. Detailed parse information will be stored in
the "srcfile"
attribute, to be retrieved by
getParseData
.
A syntax error (including an incomplete expression) will throw an error.
Character strings in the result will have a declared encoding if
encoding
is "latin1"
or "UTF-8"
, or if
text
is supplied with every element of known encoding in a
Latin-1 or UTF-8 locale.
When a syntax error occurs during parsing, parse
signals an error. The partial parse data will be stored in the
srcfile
argument if it is a srcfile
object
and the text
argument was used to supply the text. In other
cases it will be lost when the error is triggered.
The partial parse data can be retrieved using
getParseData
applied to the srcfile
object.
Because parsing was incomplete, it will typically include references
to "parent"
entries that are not present.
Using parse(text = *, ..)
or its simplified and hence more
efficient versions str2lang()
or str2expression()
is at
least an order of magnitude less efficient than call(..)
or
as.call()
.
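When the pieces of a call are already known, it can be built directly instead of being parsed from text. A minimal sketch, using `log(y)` purely as an example:

```r
# Three ways to obtain the same unevaluated call log(y)
from_text  <- str2lang("log(y)")                            # parses a string
from_call  <- call("log", quote(y))                         # builds it directly
from_parts <- as.call(list(as.name("log"), as.name("y")))   # from a list of parts

value <- eval(from_call, list(y = 1))
```

All three objects are identical; the direct constructors simply skip the parsing step.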
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
Murdoch, D. (2010). “Source References”. The R Journal, 2(2), 16–19. doi:10.32614/RJ-2010-010.
The source reference information can be used for debugging (see
e.g. setBreakpoint
) and profiling (see
Rprof
). It can be examined by getSrcref
and related functions. More detailed information is available through
getParseData
.
fil <- tempfile(fileext = ".Rdmped")
cat("x <- c(1, 4)\n  x ^ 3 -10 ; outer(1:7, 5:9)\n", file = fil)
# parse 3 statements from our temp file
parse(file = fil, n = 3)
unlink(fil)

## str2lang(<string>)  || str2expression(<character>) :
stopifnot(exprs = {
  identical( str2lang("x[3] <- 1+4"), quote(x[3] <- 1+4))
  identical( str2lang("log(y)"),      quote(log(y)) )
  identical( str2lang("abc"   ),      quote(abc) -> qa)
  is.symbol(qa) & !is.call(qa)           # a symbol/name, not a call
  identical( str2lang("1.375" ), 1.375)  # just a number, not a call
  identical( str2expression(c("# a comment", "", "42")), expression(42) )
})

# A partial parse with a syntax error
txt <- "
x <- 1
an error
"
sf <- srcfile("txt")
tryCatch(parse(text = txt, srcfile = sf), error = function(e) "Syntax error.")
getParseData(sf)
Concatenate vectors after converting to character.
Concatenation happens in two basically different ways, determined by
collapse
being a string or not.
paste (..., sep = " ", collapse = NULL, recycle0 = FALSE)
paste0(...,            collapse = NULL, recycle0 = FALSE)
... |
one or more R objects, to be converted to character vectors. |
sep |
a character string to separate the terms. Not
|
collapse |
an optional character string to separate the results. Not
|
recycle0 |
|
paste
converts its arguments (via
as.character
) to character strings, and concatenates
them (separating them by the string given by sep
).
If the arguments are vectors, they are concatenated term-by-term to give a
character vector result. Vector arguments are recycled as needed.
Zero-length arguments are recycled as ""
unless recycle0
is TRUE
and collapse
is NULL
.
Note that paste()
coerces NA_character_
, the
character missing value, to "NA"
which may seem
undesirable, e.g., when pasting two character vectors, or very
desirable, e.g. in paste("the value of p is ", p)
.
paste0(..., collapse)
is equivalent to
paste(..., sep = "", collapse)
, slightly more efficiently.
If a value is specified for collapse
, the values in the result
are then concatenated into a single string, with the elements being
separated by the value of collapse
.
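The two phases (term-by-term pasting with `sep`, then optional joining with `collapse`) can be sketched as follows; the inputs are arbitrary illustrations:

```r
# Phase 1: term-by-term concatenation, separated by 'sep'
terms <- paste(c("a", "b"), 1:2, sep = "-")

# Phase 2: the resulting elements are joined into one string by 'collapse'
single <- paste(c("a", "b"), 1:2, sep = "-", collapse = "+")

# paste0() behaves like paste() with sep = ""
tight <- paste0(c("a", "b"), 1:2)

# The character missing value is pasted as the string "NA"
with_na <- paste("p =", NA_character_)
```

Without `collapse` the result has one element per recycled term; with it the result is always a single string.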
A character vector of the concatenated values. This will be of length
zero if all the objects are, unless collapse
is non-NULL, in which
case it is ""
(a single empty string).
If any input into an element of the result is in UTF-8 (and none are
declared with encoding "bytes"
, see Encoding
),
that element will be in UTF-8, otherwise in the current encoding in
which case the encoding of the element is declared if the current
locale is either Latin-1 or UTF-8, at least one of the corresponding
inputs (including separators) had a declared encoding and all inputs
were either ASCII or declared.
If an input into an element is declared with encoding "bytes"
,
no translation will be done of any of the elements and the resulting
element will have encoding "bytes"
. If collapse
is
non-NULL, this applies also to the second, collapsing, phase, but some
translation may have been done in pasting object together in the first
phase.
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
toString
typically calls paste(*, collapse=", ")
.
String manipulation with
as.character
, substr
, nchar
,
strsplit
; further, cat
which concatenates and
writes to a file, and sprintf
for C like string
construction.
‘plotmath’ for the use of paste
in plot annotation.
## When passing a single vector, paste0 and paste work like as.character.
paste0(1:12)
paste(1:12)        # same
as.character(1:12) # same

## If you pass several vectors to paste0, they are concatenated in a
## vectorized way.
(nth <- paste0(1:12, c("st", "nd", "rd", rep("th", 9))))

## paste works the same, but separates each input with a space.
## Notice that the recycling rules make every input as long as the longest input.
paste(month.abb, "is the", nth, "month of the year.")
paste(month.abb, letters)

## You can change the separator by passing a sep argument
## which can be multiple characters.
paste(month.abb, "is the", nth, "month of the year.", sep = "_*_")

## To collapse the output into a single string, pass a collapse argument.
paste0(nth, collapse = ", ")

## For inputs of length 1, use the sep argument rather than collapse
paste("1st", "2nd", "3rd", collapse = ", ") # probably not what you wanted
paste("1st", "2nd", "3rd", sep = ", ")

## You can combine the sep and collapse arguments together.
paste(month.abb, nth, sep = ": ", collapse = "; ")

## Using paste() in combination with strwrap() can be useful
## for dealing with long strings.
(title <- paste(strwrap(
    "Stopping distance of cars (ft) vs. speed (mph) from Ezekiel (1930)",
    width = 30), collapse = "\n"))

plot(dist ~ speed, cars, main = title)

## zero length arguments recycled as `""` -- NB: `{}` <==> character(0) here
paste({}, 1:2)

## 'recycle0 = TRUE' allows standard vectorized behaviour, i.e., zero-length
## recycling resulting in zero-length result character(0):
valid <- FALSE
val <- pi
paste("The value is", val[valid], "-- not so good!") # -> ".. value is -- not .."
paste("The value is", val[valid], "-- good: empty!", recycle0=TRUE) # -> character(0)

## When 'collapse = <string>', result is (length 1) string in all cases
paste("foo", {}, "bar", collapse = "|") # |--> "foo bar"
paste("foo", {}, collapse = "|", recycle0 = TRUE) # |--> ""
## If all arguments are empty (and collapse a string), "" results always
paste(    collapse = "|")
paste(    collapse = "|", recycle0 = TRUE)
paste({}, collapse = "|")
paste({}, collapse = "|", recycle0 = TRUE)
Expand a path name, for example by replacing a leading tilde by the user's home directory (if defined on that platform).
path.expand(path)
path |
character vector containing one or more path names. |
On most builds of R a leading ~user
will expand to the home
directory of user
.
There are possibly different concepts of ‘home directory’: that usually used is the setting of the environment variable HOME.
The ‘path names’ need not exist nor be valid path names but they do need to be representable in the session encoding.
On Windows, the definition of the ‘home’ directory is in the ‘rw-FAQ’
Q2.14: it is taken from the R_USER environment variable when
path.expand
is first called in a session.
The ‘path names’ need not exist nor be valid path names.
A character vector of possibly expanded path names: where the home directory is unknown or none is specified the path is returned unchanged.
If the expansion would exceed the maximum path length the result may be truncated or the path may be returned unchanged.
basename
, normalizePath
, file.path
.
path.expand("~/foo")
Report some of the configuration options of the version of PCRE in use in this R session.
pcre_config()
A named logical vector, currently with elements
UTF-8 |
Support for UTF-8 inputs. Required. |
Unicode properties |
Support for ‘\p{xx}’ and ‘\P{xx}’ in regular expressions. Desirable and used by some CRAN packages. As of PCRE2, always present with support for UTF-8. |
JIT |
Support for just-in-time compilation. Desirable for speed
(but only available as a compile-time option on certain
architectures, and may be unused as unreliable on some of those,
e.g. |
stack |
Does match recursion use a stack ( |
extSoftVersion
for the PCRE version.
pcre_config()
Pipe a value into a call expression or a function expression.
lhs |> rhs
lhs |
expression producing a value. |
rhs |
a call expression. |
A pipe expression passes, or ‘pipes’, the result of the left-hand-side
expression lhs
to the right-hand-side expression rhs
.
The lhs
is
inserted as the first argument in the call. So x |> f(y)
is
interpreted as f(x, y)
.
To avoid ambiguities, functions in rhs
calls may not be
syntactically special, such as +
or if
.
It is also possible to use a named argument with the placeholder
_
in the rhs
call to specify where the lhs
is to
be inserted. The placeholder can only appear once on the rhs
.
The placeholder can also be used as the first argument in an
extraction call, such as _$coef
. More generally, it can be used
as the head of a chain of extractions, such as _$coef[[2]]
,
using a sequence of the extraction functions $
, [
,
[[
, or @
.
Pipe notation allows a nested sequence of calls to be written in a way that may make the sequence of processing steps easier to follow.
Currently, pipe operations are implemented as syntax transformations.
So an expression written as x |> f(y)
is parsed as f(x, y)
. It is worth emphasizing that while the code in a pipeline is
written sequentially, regular R semantics for evaluation apply and
so piped expressions will be evaluated only when first used in the
rhs
expression.
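The syntax transformation and the lazy evaluation it implies can be observed directly. This is a sketch under the assumption of R 4.2.0 or later, since it uses the `_` placeholder; `f`, `x`, `y` and `d` are placeholder names that are never evaluated:

```r
# The parser rewrites the pipe, so the quoted forms are ordinary calls
same_call <- identical(quote(x |> f(y)), quote(f(x, y)))

# A named-argument placeholder controls where the lhs is inserted
same_named <- identical(quote(d |> lm(mpg ~ disp, data = _)),
                        quote(lm(mpg ~ disp, data = d)))

# The lhs is an ordinary lazily evaluated argument: stop() is never
# forced here because the function does not touch '...'
lazy <- stop("never evaluated") |> (function(...) "untouched")()
```

Because nothing is evaluated inside `quote()`, the first two comparisons only inspect what the parser produced.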
Returns the result of evaluating the transformed expression.
The forward pipe operator is motivated by the pipe introduced in the magrittr package, but is more streamlined. It is similar to the pipe or pipeline operators introduced in other languages, including F#, Julia, and JavaScript.
This was introduced in R 4.1.0. Code using it will not be parsed as intended (probably with an error) in earlier versions of R.
# simple uses:
mtcars |> head()                      # same as head(mtcars)
mtcars |> head(2)                     # same as head(mtcars, 2)
mtcars |> subset(cyl == 4) |> nrow()  # same as nrow(subset(mtcars, cyl == 4))

# to pass the lhs into an argument other than the first, either
# use the _ placeholder with a named argument:
mtcars |> subset(cyl == 4) |> lm(mpg ~ disp, data = _)
# or use an anonymous function:
mtcars |> subset(cyl == 4) |> (function(d) lm(mpg ~ disp, data = d))()
mtcars |> subset(cyl == 4) |> (\(d) lm(mpg ~ disp, data = d))()
# or explicitly name the argument(s) before the "one":
mtcars |> subset(cyl == 4) |> lm(formula = mpg ~ disp)

# using the placeholder as the head of an extraction chain:
mtcars |> subset(cyl == 4) |> lm(formula = mpg ~ disp) |> _$coef[[2]]

# the pipe operator is implemented as a syntax transformation:
quote(mtcars |> subset(cyl == 4) |> nrow())

# regular R evaluation semantics apply
stop() |> (function(...) {})() # stop() is not used on RHS so is not evaluated
Generic function for plotting of R objects.
For simple scatter plots, plot.default
will be used.
However, there are plot
methods for many R objects,
including function
s, data.frame
s,
density
objects, etc. Use methods(plot)
and
the documentation for these. Most of these methods are implemented
using traditional graphics (the graphics package), but this is
not mandatory.
For more details about graphical parameter arguments used by
traditional graphics, see par
.
plot(x, y, ...)
x |
the coordinates of points in the plot. Alternatively, a
single plotting structure, function or any R object with a
|
y |
the y coordinates of points in the plot, optional
if |
... |
arguments to be passed to methods, such as
graphical parameters (see
|
The two step types differ in their x-y preference: Going from
(x1, y1) to (x2, y2) with x1 < x2,
type = "s"
moves first horizontal, then vertical, whereas type = "S"
moves
the other way around.
The plot
generic was moved from the graphics package to
the base package in R 4.0.0. It is currently re-exported from
the graphics namespace to allow packages importing it from there
to continue working, but this may change in future versions of R.
plot.default
, plot.formula
and other
methods; points
, lines
, par
.
For thousands of points, consider using smoothScatter()
instead of plot()
.
For X-Y-Z plotting see contour
, persp
and
image
.
require(stats) # for lowess, rpois, rnorm
require(graphics) # for plot methods
plot(cars)
lines(lowess(cars))

plot(sin, -pi, 2*pi) # see ?plot.function

## Discrete Distribution Plot:
plot(table(rpois(100, 5)), type = "h", col = "red", lwd = 10,
     main = "rpois(100, lambda = 5)")

## Simple quantiles/ECDF, see ecdf() {library(stats)} for a better one:
plot(x <- sort(rnorm(47)), type = "s", main = "plot(x, type = \"s\")")
points(x, cex = .5, col = "dark red")
pmatch
seeks matches for the elements of its first argument
among those of its second.
pmatch(x, table, nomatch = NA_integer_, duplicates.ok = FALSE)
x |
the values to be matched: converted to a character vector by
|
table |
the values to be matched against: converted to a character vector. Long vectors are not supported. |
nomatch |
the value to be returned at non-matching or multiply
partially matching positions. Note that it is coerced to |
duplicates.ok |
should elements in |
The behaviour differs by the value of duplicates.ok
. Consider
first the case if this is true. First exact matches are considered,
and the positions of the first exact matches are recorded. Then unique
partial matches are considered, and if found recorded. (A partial
match occurs if the whole of the element of x
matches the
beginning of the element of table
.) Finally,
all remaining elements of x
are regarded as unmatched.
In addition, an empty string can match nothing, not even an exact
match to an empty string. This is the appropriate behaviour for
partial matching of character indices, for example.
If duplicates.ok
is FALSE
, values of table
once
matched are excluded from the search for subsequent matches. This
behaviour is equivalent to the R algorithm for argument
matching, except for the consideration of empty strings (which in
argument matching are matched after exact and partial matching to any
remaining arguments).
charmatch
is similar to pmatch
with
duplicates.ok
true, the differences being that it
differentiates between no match and an ambiguous partial match, it
does match empty strings, and it does not allow multiple exact matches.
NA
values are treated as if they were the string constant
"NA"
.
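The matching rules described above can be checked on a small table. The following is a minimal sketch; the table `tbl` and the inputs are illustrative only.

```r
tbl <- c("mean", "median", "mode")

pmatch("me", tbl)      # NA: "me" partially matches both "mean" and "median"
pmatch("mea", tbl)     # 1:  unique partial match
pmatch("mode", tbl)    # 3:  exact match

# With duplicates.ok = FALSE a table entry can be used only once,
# so the second "mode" finds nothing left to match:
pmatch(c("mode", "mode"), tbl)                        # 3 NA
pmatch(c("mode", "mode"), tbl, duplicates.ok = TRUE)  # 3 3

# charmatch() distinguishes "ambiguous" (0) from "no match" (NA):
charmatch("me", tbl)   # 0
charmatch("xyz", tbl)  # NA
```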
An integer vector (possibly including NA
if nomatch =
NA
) of the same length as x
, giving the indices of the
elements in table
which matched, or nomatch
.
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
Chambers, J. M. (1998) Programming with Data. A Guide to the S Language. Springer.
match
, charmatch
and
match.arg
, match.fun
,
match.call
, for function argument matching etc.,
startsWith
for particular checking of initial matches;
grep
etc for more general (regexp) matching of strings.
pmatch("", "")                             # returns NA
pmatch("m",   c("mean", "median", "mode")) # returns NA
pmatch("med", c("mean", "median", "mode")) # returns 2

pmatch(c("", "ab", "ab"), c("abc", "ab"), duplicates.ok = FALSE)
pmatch(c("", "ab", "ab"), c("abc", "ab"), duplicates.ok = TRUE)
## compare
charmatch(c("", "ab", "ab"), c("abc", "ab"))
Find zeros of a real or complex polynomial.
polyroot(z)
z |
the vector of polynomial coefficients in increasing order. |
A polynomial of degree $n - 1$,
$p(x) = z_1 + z_2 x + \cdots + z_n x^{n-1}$
is given by its coefficient vector z[1:n]. polyroot returns the $n - 1$ complex zeros of $p(x)$ using the Jenkins-Traub algorithm.
If the coefficient vector z
has zeroes for the highest powers,
these are discarded.
There is no maximum degree, but numerical stability may be an issue for all but low-degree polynomials.
A complex vector of length $n - 1$, where $n$ is the position of the largest non-zero element of z.
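As a quick check of the coefficient ordering and of the returned value, the sketch below uses a quadratic with known roots. The evaluation helper `p()` is written here for illustration and is not part of base R.

```r
# p(x) = 6 - 5x + x^2 = (x - 2)(x - 3); coefficients in increasing order
z <- c(6, -5, 1)
r <- polyroot(z)
r                       # complex vector of length 2

# Evaluate p at the computed roots; the moduli should be close to zero
p <- function(x) sapply(x, function(xi) sum(z * xi^(seq_along(z) - 1)))
Mod(p(r))

# Trailing zero coefficients (highest powers) are discarded
length(polyroot(c(6, -5, 1, 0, 0)))   # 2
```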
C translation by Ross Ihaka of Fortran code in the reference, with modifications by the R Core Team.
Jenkins, M. A. and Traub, J. F. (1972). Algorithm 419: zeros of a complex polynomial. Communications of the ACM, 15(2), 97–99. doi:10.1145/361254.361262.
uniroot
for numerical root finding of arbitrary
functions;
complex
and the zero
example in the demos
directory.
polyroot(c(1, 2, 1))
round(polyroot(choose(8, 0:8)), 11) # guess what!
for (n1 in 1:4) print(polyroot(1:n1), digits = 4)
polyroot(c(1, 2, 1, 0, 0)) # same as the first
Returns the environment at a specified position in the search path.
pos.to.env(x)
x |
an integer between |
Several R functions for manipulating objects in environments (such as
get
and ls
) allow specifying environments
via corresponding positions in the search path. pos.to.env
is
a convenience function for programmers which converts these positions
to corresponding environments; users will typically have no need for
it. It is primitive.
-1
is interpreted as the environment the function is called
from.
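The correspondence between search-path positions and environments can be verified directly. This is a small sketch; `idx` is just an illustrative variable name.

```r
# Position 1 is the global environment, the last position is base
identical(pos.to.env(1), globalenv())
identical(pos.to.env(length(search())), baseenv())

# Functions taking 'pos' accept the same positions as pos.to.env()
idx <- match("package:base", search())
identical(get("sum", envir = pos.to.env(idx)), get("sum", pos = idx))
```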
This is a primitive function.
pos.to.env(1) # R_GlobalEnv
# the next returns the base environment
pos.to.env(length(search()))
Compute a sequence of about n+1
equally spaced ‘round’
values which cover the range of the values in x
.
The values are chosen so that they are 1, 2 or 5 times a power of 10.
pretty(x, ...)

## Default S3 method:
pretty(x, n = 5, min.n = n %/% 3, shrink.sml = 0.75,
       high.u.bias = 1.5, u5.bias = .5 + 1.5*high.u.bias,
       eps.correct = 0, f.min = 2^-20, ...)

.pretty(x, n = 5L, min.n = n %/% 3, shrink.sml = 0.75,
        high.u.bias = 1.5, u5.bias = .5 + 1.5*high.u.bias,
        eps.correct = 0L, f.min = 2^-20, bounds = TRUE)
x |
an object coercible to numeric by |
n |
integer giving the desired number of intervals. Non-integer values are rounded down. |
min.n |
nonnegative integer giving the minimal number of
intervals. If |
shrink.sml |
positive number, a factor (smaller than one)
by which a default scale is shrunk in the case when
|
high.u.bias |
non-negative numeric, typically |
u5.bias |
non-negative numeric
multiplier favoring factor 5 over 2. Default and ‘optimal’:
|
eps.correct |
integer code, one of {0,1,2}. If non-0, an
epsilon correction is made at the boundaries such that
the result boundaries will be outside |
f.min |
positive factor multiplied by |
bounds |
a |
... |
further arguments for methods. |
pretty
ignores non-finite values in x
.
Let d <- max(x) - min(x) $\ge 0$.
If d is not (very close) to 0, we let c <- d/n, otherwise more or less c <- max(abs(range(x)))*shrink.sml / min.n.
Then, the 10 base b is $\lfloor \log_{10}(c) \rfloor$ such that $10^b \le c < 10^{b+1}$.
Now determine the basic unit $u$ as one of $\{1, 2, 5, 10\} \cdot 10^b$, depending on $c / 10^b \in [1, 10)$ and the two ‘bias’ coefficients, $h =$ high.u.bias and $f =$ u5.bias.
.........
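The first steps of this computation can be traced by hand for pretty(1:15). The sketch below only reproduces the quantities named in the text (d, c and b, with `c0` standing in for c to avoid masking the function c()); it does not reimplement the bias rules that pick the unit.

```r
x <- 1:15
n <- 5
d <- max(x) - min(x)          # 14
c0 <- d / n                   # 2.8
b <- floor(log10(c0))         # 0, since 10^0 <= 2.8 < 10^1

p <- pretty(x, n = n)
u <- unique(round(diff(p), 10))   # the spacing actually chosen
u / 10^b                          # one of 1, 2, 5, 10
```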
pretty() returns a numeric vector of approximately n increasing numbers which are “pretty” in decimal notation.
(in extreme range cases, the numbers can no longer be “pretty”
given the other constraints; e.g., for pretty(..)
For ease of investigating the underlying C R_pretty()
function, .pretty()
returns a named list
. By
default, when bounds=TRUE
, the entries are l
, u
,
and n
, whereas for bounds=FALSE
, they are
ns
, nu
, n
, and (a “pretty”) unit
where the n*
's are integer valued (but only n
is of class
integer
). Programmers may use this to create pretty
sequence (iterator) objects.
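A short sketch of how the two forms of the .pretty() result relate to pretty(), assuming a version of R that provides .pretty() (it is documented here for R 4.4.1). The reconstruction from ns, nu and unit follows the description above and is an illustration, not the internal implementation.

```r
x <- 1:15

str(.pretty(x))                  # bounds = TRUE: l, u, n
pb <- .pretty(x, bounds = FALSE) # ns, nu, n, unit
str(pb)

# Rebuild the sequence from the integer-valued multipliers and the unit;
# this should agree with pretty(x) up to floating point error
s <- seq(pb$ns, pb$nu) * pb$unit
all.equal(s, pretty(x))
```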
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
axTicks
for the computation of pretty axis tick
locations in plots, particularly on the log scale.
pretty(1:15)                    # 0 2 4 6 8 10 12 14 16
pretty(1:15, high.u.bias = 2)   # 0 5 10 15
pretty(1:15, n = 4)             # 0 5 10 15
pretty(1:15 * 2)                # 0 5 10 15 20 25 30
pretty(1:20)                    # 0 5 10 15 20
pretty(1:20, n = 2)             # 0 10 20
pretty(1:20, n = 10)            # 0 2 4 ... 20

for(k in 5:11) {
  cat("k=", k, ": "); print(diff(range(pretty(100 + c(0, pi*10^-k)))))}

##-- more bizarre, when min(x) == max(x):
pretty(pi)

add.names <- function(v) { names(v) <- paste(v); v}
utils::str(lapply(add.names(-10:20), pretty))
## min.n = 0 returns a length-1 vector "if pretty":
utils::str(lapply(add.names(0:20), pretty, min.n = 0))
sapply( add.names(0:20), pretty, min.n = 4)

pretty(1.234e100)
pretty(1001.1001)
pretty(1001.1001, shrink.sml = 0.2)
for(k in -7:3)
  cat("shrink=", formatC(2^k, width = 9),":",
      formatC(pretty(1001.1001, shrink.sml = 2^k), width = 6),"\n")
.Primitive
looks up by name a ‘primitive’
(internally implemented) function.
.Primitive(name)
name |
name of the R function. |
The advantage of .Primitive
over .Internal
functions is the potential efficiency of argument passing, and that
positional matching can be used where desirable, e.g. in
switch
. For more details, see the ‘R Internals’
manual.
All primitive functions are in the base namespace.
This function is almost never used: `name`
or, more carefully,
get(name, envir = baseenv())
work equally well and do
not depend on knowing which functions are primitive (which does change
as R evolves).
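The equivalence mentioned in the note, and the two types of primitive, can be seen in a few lines. This is a minimal sketch; `f1` and `f2` are illustrative names.

```r
f1 <- .Primitive("sqrt")
f2 <- get("sqrt", envir = baseenv())
identical(f1, f2)      # both refer to the same primitive
f1(16)                 # 4

# Primitives come in two types
is.primitive(sum); typeof(sum)    # "builtin"
is.primitive(`if`); typeof(`if`)  # "special"

# Ordinary closures are not primitive
is.primitive(mean)     # FALSE
```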
is.primitive
showing that primitive functions come in
two types (typeof
),
.Internal
.
mysqrt <- .Primitive("sqrt")
c
.Internal # this one *must* be primitive!
`if` # need backticks
print prints its argument and returns it invisibly (via invisible(x)). It is a generic function which means that new printing methods can be easily added for new classes.
print(x, ...)

## S3 method for class 'factor'
print(x, quote = FALSE, max.levels = NULL,
      width = getOption("width"), ...)

## S3 method for class 'table'
print(x, digits = getOption("digits"), quote = FALSE,
      na.print = "", zero.print = "0",
      right = is.numeric(x) || is.complex(x),
      justify = "none", ...)

## S3 method for class 'function'
print(x, useSource = TRUE, ...)
x |
an object used to select a method. |
... |
further arguments passed to or from other methods. |
quote |
logical, indicating whether or not strings should be printed with surrounding quotes. |
max.levels |
integer, indicating how many levels should be
printed for a factor; if |
width |
only used when |
digits |
minimal number of significant digits, see
|
na.print |
character string (or |
zero.print |
character specifying how zeros ( |
right |
logical, indicating whether or not strings should be right aligned. |
justify |
character indicating if strings should be left- or
right-justified or left alone, passed to |
useSource |
logical indicating if internally stored source
should be used for printing when present, e.g., if
|
The default method, print.default
has its own help page.
Use methods("print")
to get all the methods for the
print
generic.
print.factor
allows some customization and is used for printing
ordered
factors as well.
print.table
for printing table
s allows other
customization. As of R 3.0.0, it only prints a description in case of a table
with 0-extents (this can happen if a classifier has no valid data).
See noquote
as an example of a class whose main
purpose is a specific print
method.
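Adding a print method for a new class takes only a function named print.<class>. The sketch below uses a hypothetical class "temperature" and constructor `new_temperature()`, invented here purely for illustration; it follows the convention that a print method returns its argument invisibly.

```r
# 'temperature' is a hypothetical class used only for illustration
new_temperature <- function(celsius) {
  structure(list(celsius = celsius), class = "temperature")
}

# A print method formats the object and returns it invisibly,
# following the convention of the print() generic
print.temperature <- function(x, ...) {
  cat(format(x$celsius, nsmall = 1), "degrees C\n")
  invisible(x)
}

t1 <- new_temperature(21.5)
t1                    # auto-printing dispatches to print.temperature
y <- print(t1)        # the return value is the object itself
identical(y, t1)
```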
Chambers, J. M. and Hastie, T. J. (1992) Statistical Models in S. Wadsworth & Brooks/Cole.
The default method print.default
, and help for the
methods above; further options
, noquote
.
For more customizable (but cumbersome) printing, see
cat
, format
or also write
.
For a simple prototypical print method, see
.print.via.format
in package tools.
require(stats)

ts(1:20) #-- print is the "Default function" --> print.ts(.) is called
for(i in 1:3) print(1:i)

## Printing of factors
attenu$station ## 117 levels -> 'max.levels' depending on width

## ordered factors: levels "l1 < l2 < .."
esoph$agegp[1:12]
esoph$alcgp[1:12]

## Printing of sparse (contingency) tables
set.seed(521)
t1 <- round(abs(rt(200, df = 1.8)))
t2 <- round(abs(rt(200, df = 1.4)))
table(t1, t2) # simple
print(table(t1, t2), zero.print = ".") # nicer to read

## same for non-integer "table":
T <- table(t2,t1)
T <- T * (1+round(rlnorm(length(T)))/4)
print(T, zero.print = ".") # quite nicer,
print.table(T[,2:8] * 1e9, digits=3, zero.print = ".")
## still slightly inferior to Matrix::Matrix(T) for larger T

## Corner cases with empty extents:
table(1, NA) # < table of extent 1 x 0 >
Print a data frame.
## S3 method for class 'data.frame'
print(x, ..., digits = NULL,
      quote = FALSE, right = TRUE, row.names = TRUE, max = NULL)
x |
object of class |
... |
optional arguments to |
digits |
the minimum number of significant digits to be used: see
|
quote |
logical, indicating whether or not entries should be printed with surrounding quotes. |
right |
logical, indicating whether or not strings should be right-aligned. The default is right-alignment. |
row.names |
logical (or character vector), indicating whether (or what) row names should be printed. |
max |
numeric or |
This calls format
which formats the data frame
column-by-column, then converts to a character matrix and dispatches
to the print
method for matrices.
When quote = TRUE, only the entries are quoted, not the row names nor the column names.
(dd <- data.frame(x = 1:8, f = gl(2,4), ch = I(letters[1:8])))
     # print() with defaults
print(dd, quote = TRUE, row.names = FALSE)
     # suppresses row.names and quotes all entries
print.default
is the default method of the generic
print
function which prints its argument.
## Default S3 method:
print(x, digits = NULL, quote = TRUE,
      na.print = NULL, print.gap = NULL, right = FALSE,
      max = NULL, width = NULL, useSource = TRUE, ...)
x |
the object to be printed. |
digits |
a non-null value for |
quote |
logical, indicating whether or not strings
( |
na.print |
a character string which is used to indicate
|
print.gap |
a non-negative integer |
right |
logical, indicating whether or not strings should be right aligned. The default is left alignment. |
max |
a non-null value for |
width |
controls the maximum number of columns on a line used in
printing vectors, matrices, etc. The default, |
useSource |
logical, indicating whether to use source references or copies rather than deparsing language objects. The default is to use the original source if it is available. |
... |
further arguments to be passed to or from other methods. They are ignored in this function. |
The default for printing NA
s is to print NA
(without
quotes) unless this is a character NA
and quote =
FALSE
, when ‘<NA>’ is printed.
The same number of decimal places is used throughout a vector. This
means that digits
specifies the minimum number of significant
digits to be used, and that at least one entry will be encoded with
that minimum number. However, if all the encoded elements then have
trailing zeroes, the number of decimal places is reduced until at
least one element has a non-zero final digit. Decimal points are only
included if at least one decimal place is selected.
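The rule for choosing a common number of decimal places can be seen on a few small vectors. This is a sketch assuming the default options (in particular digits = 7 for the last call).

```r
# 2.25 needs two decimals to show 3 significant digits,
# so every element is printed with two decimals
print(c(1.5, 2.25, 3), digits = 3)
#> [1] 1.50 2.25 3.00

# Trailing zeros common to all elements are dropped
print(c(1.10, 2.20), digits = 7)
#> [1] 1.1 2.2

# No decimal point when no decimal place is needed
print(c(1, 20, 300))
#> [1]   1  20 300
```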
You can suppress “exponential” / scientific
notation in
printing of numbers (atomic vectors x
),
via format(., scientific=FALSE)
, see the prI()
example
below, or also by increasing global option scipen
, e.g.,
options(scipen = 12)
.
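Both ways of avoiding scientific notation can be compared on a single number. The sketch resets scipen to 0 first so that the result does not depend on the session's current setting, and restores the previous options at the end.

```r
x <- 1e10
old <- options(scipen = 0)     # start from the default penalty
print(x)                       # 1e+10
format(x, scientific = FALSE)  # "10000000000"

options(scipen = 12)           # penalise scientific notation
print(x)                       # 10000000000
options(old)                   # restore the previous setting
```

format() affects only the one call, whereas options(scipen = ) changes printing for the rest of the session unless it is restored.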
Attributes are printed respecting their class(es), using the values of
digits
to print.default
, but using the default values
(for the methods called) of the other arguments.
Option width
controls the printing of vectors, matrices and
arrays, and option deparse.cutoff
controls the printing of
language objects such as calls and formulae.
When the methods package is attached, print
will call
show
for R objects with formal classes (‘S4’)
if called with no optional arguments.
Note that for large values of digits
, currently for
digits >= 16
, the calculation of the number of significant
digits will depend on the platform's internal (C library)
implementation of ‘sprintf()’ functionality.
If a non-printable character is encountered during output, it is represented as one of the ANSI escape sequences (‘\a’, ‘\b’, ‘\f’, ‘\n’, ‘\r’, ‘\t’, ‘\v’, ‘\\’ and ‘\0’: see Quotes), or failing that as a 3-digit octal code: for example the UK currency pound sign in the C locale (if implemented correctly) is printed as ‘\243’. Which characters are non-printable depends on the locale. (Because some versions of Windows get this wrong, all bytes with the upper bit set are regarded as printable on Windows in a single-byte locale.)
In all locales, the characters in the ASCII range (‘0x00’ to ‘0x7f’) are printed in the same way, as-is if printable, otherwise via ANSI escape sequences or 3-digit octal escapes as described for single-byte locales. Whether a character is printable depends on the current locale and the operating system (C library).
Multi-byte non-printing characters are printed as an escape sequence of the form ‘\uxxxx’ or ‘\Uxxxxxxxx’ (in hexadecimal). This is the internal code for the wide-character representation of the character. If this is not known to be Unicode code points, a warning is issued. The only known exceptions are certain Japanese ISO 2022 locales on commercial Unixes, which use a concatenation of the bytes: it is unlikely that R compiles on such a system.
It is possible to have a character string in a character vector that is not valid in the current locale. If a byte is encountered that is not part of a valid character it is printed in hex in the form ‘\xab’ and this is repeated until the start of a valid character. (This will rapidly recover from minor errors in UTF-8.)
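The escaping applied to non-printable ASCII characters can be inspected without printing, using encodeString(), which encodes a character vector the way it would be printed. A minimal sketch:

```r
x <- "tab\there"
print(x)                           # the tab is shown as \t
encodeString(x)                    # the same escaping, returned as a string
nchar(x); nchar(encodeString(x))   # 8 and 9: \t becomes two characters
encodeString("a\nb", quote = '"')  # quotes added, newline escaped as \n
```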
The generic print
, options
.
The "noquote"
class and print method.
encodeString
, which encodes a character vector the way
it would be printed.
pi
print(pi, digits = 16)
LETTERS[1:16]
print(LETTERS, quote = FALSE)

M <- cbind(I = 1, matrix(1:10000, ncol = 10,
                         dimnames = list(NULL, LETTERS[1:10])))
utils::head(M) # makes more sense than
print(M, max = 1000) # prints 90 rows and a message about omitting 910

(x <- 2^seq(-8, 30, by=1/4)) # auto-prints; by default all in "exponential" format
prI <- function(x) noquote(format(x, scientific = FALSE))
prI(x) # prints more "nicely" (using a bit more space)
An earlier method for printing matrices, provided for S compatibility.
prmatrix(x, rowlab =, collab =,
         quote = TRUE, right = FALSE, na.print = NULL, ...)
x |
numeric or character matrix. |
rowlab , collab
|
(optional) character vectors giving row or column
names respectively. By default, these are taken from
|
quote |
logical; if |
right |
if |
na.print |
how |
... |
arguments for |
prmatrix
is an earlier form of print.matrix
, and
is very similar to the S function of the same name.
Invisibly returns its argument, x
.
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
print.default
, and other print
methods.
prmatrix(m6 <- diag(6), rowlab = rep("", 6), collab = rep("", 6))

chm <- matrix(scan(system.file("help", "AnIndex", package = "splines"),
                   what = ""), , 2, byrow = TRUE)
chm # uses print.matrix()
prmatrix(chm, collab = paste("Column", 1:3), right = TRUE, quote = FALSE)
proc.time
determines how much real and CPU time (in seconds)
the currently running R process has already taken.
proc.time()
proc.time
returns five elements for backwards compatibility,
but its print
method prints a named vector of
length 3. The first two entries are the total user and system CPU
times of the current R process and any child processes on which it
has waited, and the third entry is the ‘real’ elapsed time
since the process was started.
An object of class "proc_time"
which is a numeric vector of
length 5, containing the user, system, and total elapsed times for the
currently running R process, and the cumulative sum of user and
system times of any child processes spawned by it on which it has
waited. (The print
method uses the summary
method to
combine the child times with those of the main process.)
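The difference between the five stored elements and the three printed ones can be inspected directly. A brief sketch:

```r
pt <- proc.time()
class(pt)             # "proc_time"
length(pt)            # 5
names(pt)             # user.self, sys.self, elapsed, user.child, sys.child

# summary() folds the child times into user and system, giving the
# three values that the print method shows
summary(pt)
names(summary(pt))    # user, system, elapsed
```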
The definition of ‘user’ and ‘system’ times is from your OS. Typically it is something like
The ‘user time’ is the CPU time charged for the execution of user instructions of the calling process. The ‘system time’ is the CPU time charged for execution by the system on behalf of the calling process.
Times of child processes are not available on Windows and will always
be given as NA
.
The resolution of the times will be system-specific and on Unix-alikes times are rounded down to milliseconds. On modern systems they will be that accurate, but on older systems they might be accurate to 1/100 or 1/60 sec. They are typically available to 10ms on Windows.
This is a primitive function.
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
system.time
for timing an R expression,
gc.time
for how much of the time was spent in garbage
collection.
setTimeLimit
to limit the CPU or elapsed time for
the session or an expression.
## a way to time an R expression: system.time is preferred
ptm <- proc.time()
for (i in 1:50) mad(stats::runif(500))
proc.time() - ptm
prod
returns the product of all the values
present in its arguments.
prod(..., na.rm = FALSE)
... |
numeric or complex or logical vectors. |
na.rm |
logical. Should missing values be removed? |
If na.rm
is FALSE
an NA
value in any of the arguments will cause
a value of NA
to be returned, otherwise
NA
values are ignored.
This is a generic function: methods can be defined for it
directly or via the Summary
group generic.
For this to work properly, the arguments ...
should be
unnamed, and dispatch is on the first argument.
Logical true values are regarded as one, false values as zero.
For historical reasons, NULL
is accepted and treated as if it
were numeric(0)
.
The product, a numeric (of type "double"
) or complex vector of length one.
NB: the product of an empty set is one, by definition.
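The special cases described above (missing values, logicals, NULL and empty input) can be checked in a few lines:

```r
prod(c(2, NA, 3))                # NA
prod(c(2, NA, 3), na.rm = TRUE)  # 6

prod(TRUE, TRUE, 5)              # 5: TRUE counts as one
prod(TRUE, FALSE, 5)             # 0: FALSE counts as zero

prod(numeric(0))                 # 1: the empty product
prod(NULL)                       # 1: NULL is treated as numeric(0)

typeof(prod(1:3))                # "double", even for integer input
```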
This is part of the S4 Summary
group generic. Methods for it must use the signature
x, ..., na.rm
.
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
‘plotmath’ for the use of prod
in plot annotation.
print(prod(1:7)) == print(gamma(8))
Returns conditional proportions given margins
, i.e.,
entries of x
, divided by the appropriate marginal sums.
proportions(x, margin = NULL)
prop.table(x, margin = NULL)
x |
an array, usually a |
margin |
a vector giving the margins to split by.
E.g., for a matrix |
A table or array like x
, expressed relative to margin
.
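What "expressed relative to margin" means can be spelled out with sweep(), one of the more general mechanisms for sweeping out marginal statistics. This is a sketch for a small matrix.

```r
m <- matrix(1:4, 2)

# margin = 1: each row is divided by its row sum
p1 <- proportions(m, 1)
all.equal(p1, sweep(m, 1, rowSums(m), "/"))
rowSums(p1)                    # 1 1

# margin = NULL (the default): divide by the grand total
all.equal(proportions(m), m / sum(m))

# prop.table() is the earlier name for the same computation
identical(prop.table(m, 1), p1)
```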
prop.table
is an earlier name, retained for back-compatibility.
Peter Dalgaard
apply
and sweep
are more general
mechanisms for sweeping out marginal statistics.
m <- matrix(1:4, 2)
m
proportions(m, 1)

DF <- as.data.frame(UCBAdmissions)
tbl <- xtabs(Freq ~ Gender + Admit, DF)
tbl
proportions(tbl, "Gender")
Functions to push back text lines onto a connection, and to enquire how many lines are currently pushed back.
pushBack(data, connection, newLine = TRUE,
         encoding = c("", "bytes", "UTF-8"))
pushBackLength(connection)
clearPushBack(connection)
data |
a character vector. |
connection |
a connection. |
newLine |
logical. If true, a newline is appended to each string pushed back. |
encoding |
character string, partially matched. See details. |
Several character strings can be pushed back on one or more occasions.
The occasions form a stack, so the first line to be retrieved will be
the first string from the last call to pushBack
. Lines which
are pushed back are read prior to the normal input from the
connection, by the normal text-reading functions such as
readLines
and scan
.
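The stack behaviour across several calls can be illustrated with a text connection. This is a minimal sketch with made-up line contents.

```r
zz <- textConnection(c("line1", "line2"))
pushBack(c("a", "b"), zz)      # first push
pushBack(c("c", "d"), zz)      # second push goes on top of the stack
pushBackLength(zz)             # 4

# The last push is read first, in its original order, then the
# earlier push, and only then the connection's own content
readLines(zz)                  # "c" "d" "a" "b" "line1" "line2"
pushBackLength(zz)             # 0
close(zz)
```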
Pushback is only allowed for readable connections in text mode.
Not all uses of connections respect pushbacks, in particular the input
connection is still wired directly, so for example parsing
commands from the console and scan("")
ignore pushbacks on
stdin
.
When character strings with a marked encoding (see
Encoding
) are pushed back they are converted to the
current encoding if encoding = ""
. This may involve
representing characters as ‘<U+xxxx>’ if they cannot be
converted. They will be converted to UTF-8 if encoding =
"UTF-8"
or left as-is if encoding = "bytes"
.
pushBack
and clearPushBack()
return nothing, invisibly.
pushBackLength
returns the number of lines currently pushed back.
zz <- textConnection(LETTERS)
readLines(zz, 2)
pushBack(c("aa", "bb"), zz)
pushBackLength(zz)
readLines(zz, 1)
pushBackLength(zz)
readLines(zz, 1)
readLines(zz, 1)
close(zz)
qr
computes the QR decomposition of a matrix.
qr(x, ...)
## Default S3 method:
qr(x, tol = 1e-07 , LAPACK = FALSE, ...)

qr.coef(qr, y)
qr.qy(qr, y)
qr.qty(qr, y)
qr.resid(qr, y)
qr.fitted(qr, y, k = qr$rank)
qr.solve(a, b, tol = 1e-7)
## S3 method for class 'qr'
solve(a, b, ...)

is.qr(x)
as.qr(x)
x |
a numeric or complex matrix whose QR decomposition is to be computed. Logical matrices are coerced to numeric. |
tol |
the tolerance for detecting linear dependencies in the
columns of |
qr |
a QR decomposition of the type computed by |
y , b
|
a vector or matrix of right-hand sides of equations. |
a |
a QR decomposition or ( |
k |
effective rank. |
LAPACK |
logical. For real |
... |
further arguments passed to or from other methods. |
The QR decomposition plays an important role in many statistical techniques. In particular it can be used to solve the equation $Ax = b$ for given matrix $A$, and vector $b$. It is useful for computing regression coefficients and in applying the Newton-Raphson algorithm.
The functions qr.coef
, qr.resid
, and qr.fitted
return the coefficients, residuals and fitted values obtained when
fitting y
to the matrix with QR decomposition qr
.
(If pivoting is used, some of the coefficients will be NA
.)
qr.qy
and qr.qty
return Q %*% y
and
t(Q) %*% y
, where Q
is the (complete) matrix.
All the above functions keep dimnames
(and names
) of
x
and y
if there are any.
solve.qr
is the method for solve
for qr
objects.
qr.solve
solves systems of equations via the QR decomposition:
if a
is a QR decomposition it is the same as solve.qr
,
but if a
is a rectangular matrix the QR decomposition is
computed first. Either will handle over- and under-determined
systems, providing a least-squares fit if appropriate.
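The relationships between these helper functions can be checked on a small full-rank least-squares problem. The sketch below uses an arbitrary seed and design matrix chosen for illustration, and compares the QR-based coefficients with the normal equations, which is acceptable here only because the problem is small and well conditioned.

```r
set.seed(1)                       # arbitrary seed, for reproducibility
X <- cbind(1, 1:6, (1:6)^2)       # full-rank 6 x 3 design matrix
y <- rnorm(6)
q <- qr(X)

# Least-squares coefficients: QR versus the normal equations
beta_qr <- qr.coef(q, y)
beta_ne <- drop(solve(crossprod(X), crossprod(X, y)))
all.equal(beta_qr, beta_ne)

# Fitted values and residuals decompose y
all.equal(qr.fitted(q, y) + qr.resid(q, y), y)

# qr.qy() and qr.qty() apply Q and t(Q); Q is orthogonal,
# so applying one after the other recovers y
all.equal(qr.qy(q, qr.qty(q, y)), y)

# qr.solve() computes the decomposition itself when given a matrix
all.equal(qr.solve(X, y), beta_qr)
```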
is.qr
returns TRUE
if x
is a list
and inherits
from "qr"
.
It is not possible to coerce objects to mode "qr"
. Objects
either are QR decompositions or they are not.
The LINPACK interface is restricted to matrices x with less than $2^{31}$ elements.
qr.fitted
and qr.resid
only support the LINPACK interface.
Unsuccessful results from the underlying LAPACK code will result in an error giving a positive error code: these can only be interpreted by detailed study of the FORTRAN code.
The QR decomposition of the matrix as computed by LINPACK(*) or LAPACK. The components in the returned value correspond directly to the values returned by DQRDC(2)/DGEQP3/ZGEQP3.
qr |
a matrix with the same dimensions as |
qraux |
a vector of length |
rank |
the rank of |
pivot |
information on the pivoting strategy used during the decomposition. |
Non-complex QR objects computed by LAPACK have the attribute
"useLAPACK"
with value TRUE
.
*)
dqrdc2
instead of LINPACK's DQRDCIn the (default) LINPACK case (LAPACK = FALSE
), qr()
uses a modified version of LINPACK's DQRDC, called
‘dqrdc2
’. It differs by using the tolerance tol
for a pivoting strategy which moves columns with near-zero 2-norm to
the right-hand edge of the x matrix. This strategy means that
sequential one degree-of-freedom effects can be computed in a natural
way.
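A minimal sketch of the pivoting behaviour described above, using a small made-up matrix in which column b is an exact multiple of column a. The column names and values are assumptions chosen for illustration.

```r
# Hypothetical rank-deficient matrix: column 'b' is exactly 2 * column 'a'.
x <- cbind(a = c(1, 2, 3, 4),
           b = c(2, 4, 6, 8),
           c = c(1, 0, 1, 0))

qx <- qr(x)        # default LINPACK-based path (LAPACK = FALSE)
qx$rank            # numerical rank found using 'tol'
qx$pivot           # column order after pivoting; deficient columns go last

# The pivoted columns satisfy x[, pivot] = Q %*% R (up to rounding).
all.equal(unname(x[, qx$pivot]), unname(qr.Q(qx) %*% qr.R(qx)))
```

Inspecting qx$pivot together with qx$rank shows which columns were treated as linearly dependent: they are the ones listed after the first qx$rank entries.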
To compute the determinant of a matrix (do you really need it?),
the QR decomposition is much more efficient than using eigenvalues
(eigen
). See det
.
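The relation between the decomposition and the determinant can be sketched as follows. Because Q is orthogonal and pivoting only permutes columns, the absolute value of the determinant equals the absolute product of the diagonal of R. The matrix m is an arbitrary example, and this sketch does not recover the sign.

```r
set.seed(2)                            # arbitrary seed
m <- matrix(rnorm(16), 4, 4)           # hypothetical square matrix

R <- qr.R(qr(m))
abs_det_qr <- abs(prod(diag(R)))       # |det(m)| = |product of diag(R)|

all.equal(abs_det_qr, abs(det(m)))
```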
Using LAPACK (including in the complex case) uses column pivoting and does not attempt to detect rank-deficient matrices.
For qr
, the LINPACK routine DQRDC
(but modified to
dqrdc2
(*)) and the LAPACK
routines DGEQP3
and ZGEQP3
. Further LINPACK and LAPACK
routines are used for qr.coef
, qr.qy
and qr.qty
.
LAPACK and LINPACK are from https://netlib.org/lapack/ and https://netlib.org/linpack/ and their guides are listed in the references.
Anderson. E. and ten others (1999)
LAPACK Users' Guide. Third Edition. SIAM.
Available on-line at
https://netlib.org/lapack/lug/lapack_lug.html.
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
Dongarra, J. J., Bunch, J. R., Moler, C. B. and Stewart, G. W. (1978) LINPACK Users Guide. Philadelphia: SIAM Publications.
qr.Q
, qr.R
, qr.X
for
reconstruction of the matrices.
lm.fit
, lsfit
,
eigen
, svd
.
det
(using qr
) to compute the determinant of a matrix.
    hilbert <- function(n) { i <- 1:n; 1 / outer(i - 1, i, `+`) }
    h9 <- hilbert(9); h9
    qr(h9)$rank           #--> only 7
    qrh9 <- qr(h9, tol = 1e-10)
    qrh9$rank             #--> 9
    ##-- Solve linear equation system  H %*% x = y :
    y <- 1:9/10
    x <- qr.solve(h9, y, tol = 1e-10) # or equivalently :
    x <- qr.coef(qrh9, y) #-- is == but much better than
                          #-- solve(h9) %*% y
    h9 %*% x              # = y

    ## overdetermined system
    A <- matrix(runif(12), 4)
    b <- 1:4
    qr.solve(A, b) # or solve(qr(A), b)
    solve(qr(A, LAPACK = TRUE), b)
    # this is a least-squares solution, cf. lm(b ~ 0 + A)

    ## underdetermined system
    A <- matrix(runif(12), 3)
    b <- 1:3
    qr.solve(A, b)
    solve(qr(A, LAPACK = TRUE), b)
    # solutions will have one zero, not necessarily the same one
Returns the original matrix from which the object was constructed or the components of the decomposition.
    qr.X(qr, complete = FALSE, ncol =)
    qr.Q(qr, complete = FALSE, Dvec =)
    qr.R(qr, complete = FALSE)
qr |
object representing a QR decomposition. This will
typically have come from a previous call to |
complete |
logical expression of length 1. Indicates whether an
arbitrary orthogonal completion of the |
ncol |
integer in the range |
Dvec |
vector (not matrix) of diagonal values. Each column of
the returned |
qr.X
returns X, the original matrix from
which the qr object was constructed, provided
ncol(X) <= nrow(X)
.
If complete
is TRUE
or the argument ncol
is greater than
ncol(X)
, additional columns from an arbitrary orthogonal
(unitary) completion of X
are returned.
qr.Q
returns part or all of Q, the orthogonal (unitary)
transformation of order nrow(X)
represented by qr
. If
complete
is TRUE
, Q has nrow(X)
columns.
If complete
is FALSE
, Q has ncol(X)
columns. When Dvec
is specified, each column of Q is
multiplied by the corresponding value in Dvec
.
Note that qr.Q(qr, *)
is a special case of
qr.qy(qr, y)
(with a “diagonal” y
), and
qr.X(qr, *)
is basically qr.qy(qr, R)
(apart from
pivoting and dimnames
setting).
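The effect of complete and Dvec on qr.Q, and its relation to qr.qy, can be sketched as below. The matrix x and the vector d are made-up values for illustration.

```r
set.seed(3)                          # arbitrary seed
x  <- matrix(rnorm(15), 5, 3)        # hypothetical 5 x 3 matrix
qx <- qr(x)

Q1 <- qr.Q(qx)                       # ncol(x) columns
Q2 <- qr.Q(qx, complete = TRUE)      # nrow(x) columns
dim(Q1)
dim(Q2)

all.equal(crossprod(Q2), diag(5))    # columns of Q are orthonormal
all.equal(Q2, qr.qy(qx, diag(5)))    # qr.Q as qr.qy with a diagonal y

d  <- c(1, -1, 2)                    # hypothetical scaling values
Qd <- qr.Q(qx, Dvec = d)
all.equal(Qd, Q1 %*% diag(d))        # each column scaled by its Dvec entry
```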
qr.R
returns R. This may be pivoted, e.g., if
a <- qr(x)
then x[, a$pivot]
= QR. The number of
rows of R is either nrow(X)
or ncol(X)
(and may
depend on whether complete
is TRUE
or FALSE
).
    p <- ncol(x <- LifeCycleSavings[, -1]) # not the 'sr'
    qrstr <- qr(x)   # dim(x) == c(n,p)
    qrstr $ rank # = 4 = p
    Q <- qr.Q(qrstr) # dim(Q) == dim(x)
    R <- qr.R(qrstr) # dim(R) == ncol(x)
    X <- qr.X(qrstr) # X == x
    range(X - as.matrix(x))  # ~ < 6e-12
    ## X == Q %*% R if there has been no pivoting, as here:
    all.equal(unname(X),
              unname(Q %*% R))

    # example of pivoting
    x <- cbind(int = 1,
               b1 = rep(1:0, each = 3), b2 = rep(0:1, each = 3),
               c1 = rep(c(1,0,0), 2), c2 = rep(c(0,1,0), 2), c3 = rep(c(0,0,1),2))
    x # is singular, columns "b2" and "c3" are "extra"
    a <- qr(x)
    zapsmall(qr.R(a)) # columns are int b1 c1 c2 b2 c3
    a$pivot
    pivI <- sort.list(a$pivot) # the inverse permutation
    all.equal (x,            qr.Q(a) %*% qr.R(a)) # no, no
    stopifnot(
     all.equal(x[, a$pivot], qr.Q(a) %*% qr.R(a)),          # TRUE
     all.equal(x           , qr.Q(a) %*% qr.R(a)[, pivI]))  # TRUE too!
The function quit
or its alias q
terminates the current
R session.
    quit(save = "default", status = 0, runLast = TRUE)
       q(save = "default", status = 0, runLast = TRUE)
save |
a character string indicating whether the environment
(workspace) should be saved, one of |
status |
the (numerical) error status to be returned to the
operating system, where relevant. Conventionally |
runLast |
should |
save
must be one of "no"
, "yes"
,
"ask"
or "default"
. In the first case the workspace
is not saved, in the second it is saved and in the third the user is
prompted and can also decide not to quit. The default is to
ask in interactive use but may be overridden by command-line
arguments (which must be supplied in non-interactive use).
Immediately before normal termination, .Last()
is
executed if the function .Last
exists and runLast
is
true. If in interactive use there are errors in the .Last
function, control will be returned to the command prompt, so do test
the function thoroughly. There is a system analogue,
.Last.sys()
, which is run after .Last()
if
runLast
is true.
Exactly what happens at termination of an R session depends on the
platform and GUI interface in use. A typical sequence is to run
.Last()
and .Last.sys()
(unless runLast
is
false), to save the workspace if requested (and in most cases also
to save the session history: see savehistory
), then
run any finalizers (see reg.finalizer
) that have been
set to be run on exit, close all open graphics devices, remove the
session temporary directory and print any remaining warnings
(e.g., from .Last()
and device closure).
Some error status values are used by R itself. The default error
handler for non-interactive use effectively calls q("no", 1,
FALSE)
and returns error status 1. Error status 2 is used for R
‘suicide’, that is a catastrophic failure, and other small
numbers are used by specific ports for initialization failures. It
is recommended that users choose statuses of 10 or more.
Valid values of status
are system-dependent, but 0:255
are normally valid. (Many OSes will report the last byte of the
value, that is report the value modulo 256. But not all.)
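One way to follow the advice on status values in a batch script is a small wrapper such as the one sketched here. The function name exit_with_status is hypothetical (it is not part of base R), and the range check simply encodes the recommendation above together with the usual 0:255 limit.

```r
# Hypothetical helper (not part of base R): end a batch script with a
# status code in the range suggested for user code.
exit_with_status <- function(status) {
  status <- as.integer(status)
  if (is.na(status) || status < 10L || status > 255L)
    stop("status should be an integer in 10:255")
  # Do not save the workspace and skip .Last(), as the default error
  # handler for non-interactive use does.
  quit(save = "no", status = status, runLast = FALSE)
}

# exit_with_status(12)   # would terminate the session with status 12
```

The call is left commented out because running it ends the R session.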
The value of .Last
is for the end user to control: as
it can be replaced later in the session, it cannot safely be used
programmatically, e.g. by a package. The other way to set code to be run
at the end of the session is to use a finalizer: see
reg.finalizer
.
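A minimal sketch of the finalizer alternative mentioned above. The environment cleanup_env and the message text are assumptions for illustration; a package would typically register the finalizer on an object it owns.

```r
cleanup_env <- new.env()           # hypothetical object that owns a resource

res <- reg.finalizer(
  cleanup_env,
  function(e) cat("session ending: releasing resources\n"),
  onexit = TRUE                    # also run when the R session ends
)
```

Unlike .Last, a finalizer registered this way is attached to an object rather than to a name in the workspace, so a user redefining .Last later in the session does not replace it.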
The R.app
GUI on macOS has its own version of these functions
with slightly different behaviour for the save
argument (the
GUI's ‘Startup’ preferences for this action are taken into account).
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
.First
for setting things on startup.
    ## Not run: 
    ## Unix-flavour example
    .Last <- function() {
      graphics.off() # close devices before printing
      cat("Now sending PDF graphics to the printer:\n")
      system("lpr Rplots.pdf")
      cat("bye bye...\n")
    }
    quit("yes")
    ## End(Not run)
Descriptions of the various uses of quoting in R.
Three types of quotes are part of the syntax of R: single and double quotation marks and the backtick (or back quote, ‘`’). In addition, backslash is used to escape the following character inside character constants.
Single and double quotes delimit character constants. They can be used interchangeably but double quotes are preferred (and character constants are printed using double quotes), so single quotes are normally only used to delimit character constants containing double quotes.
Backslash is used to start an escape sequence inside character constants. Escaping a character not in the following table is an error.
Single quotes need to be escaped by backslash in single-quoted strings, and double quotes in double-quoted strings.
‘\n’ | newline (aka ‘line feed’) |
‘\r’ | carriage return |
‘\t’ | tab |
‘\b’ | backspace |
‘\a’ | alert (bell) |
‘\f’ | form feed |
‘\v’ | vertical tab |
‘\\’ | backslash ‘\’ |
‘\'’ | ASCII apostrophe ‘'’ |
‘\"’ | ASCII quotation mark ‘"’ |
‘\`’ | ASCII grave accent (backtick) ‘`’ |
‘\nnn’ | character with given octal code (1, 2 or 3 digits) |
‘\xnn’ | character with given hex code (1 or 2 hex digits) |
‘\unnnn’ | Unicode character with given code (1–4 hex digits) |
‘\Unnnnnnnn’ | Unicode character with given code (1–8 hex digits) |
Alternative forms for the last two are ‘\u{nnnn}’ and
‘\U{nnnnnnnn}’. All except the Unicode escape sequences are
also supported when reading character strings by scan
and read.table
if allowEscapes = TRUE
. Unicode
escapes can be used to enter Unicode characters not in the current
locale's charset (when the string will be stored internally in UTF-8).
The maximum allowed value for ‘\nnn’ is ‘\377’ (the same
character as ‘\xff’).
As from R 4.1.0 the largest allowed ‘\U’ value is ‘\U10FFFF’, the maximum Unicode point.
The parser does not allow the use of both octal/hex and Unicode escapes in a single string.
These forms will also be used by print.default
when outputting non-printable characters (including backslash).
Embedded NULs are not allowed in character strings, so using escapes (such as ‘\0’) for a NUL will result in the string being truncated at that point (usually with a warning).
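The escape sequences in the table can be checked directly. The short sketch below only uses ASCII code points so that the result does not depend on the locale.

```r
# Each escape sequence denotes a single character.
nchar("\t")                          # a tab is one character
nchar("a\\b")                        # 'a', one backslash, 'b'

# Octal, hex and Unicode escapes can name the same character.
identical("\101", "A")               # octal 101 is decimal 65
identical("\x41", "A")               # hex 41 is decimal 65
identical("\u0041", "A")
identical("\u{41}", "\U{00000041}")  # braced alternative forms

# print() shows the escaped form, writeLines() the characters themselves.
s <- "tab\there"
print(s)
writeLines(s)
```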
Raw character constants are also available using a syntax similar to
the one used in C++: r"(...)"
with ...
any character
sequence, except that it must not contain the closing sequence
‘)"’.
The delimiter pairs []
and {}
can also be
used, and R
can be used in place of r
. For additional
flexibility, a number of dashes can be placed between the opening quote
and the opening delimiter, as long as the same number of dashes appear
between the closing delimiter and the closing quote.
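A few raw character constants next to their conventionally escaped equivalents, as a sketch of the delimiter and dash rules just described. The strings themselves are made up; R 4.0.0 or later is assumed.

```r
# A raw constant keeps backslashes literally: no escape processing.
identical(r"(C:\temp\new)", "C:\\temp\\new")
nchar(r"(\n)")                       # a backslash followed by 'n'

# Alternative delimiters, and R in place of r.
identical(r"[a(b)c]", "a(b)c")
identical(R"{x"y}", "x\"y")

# Dashes allow the body to contain the sequence )" itself.
identical(r"-(say ")" here)-", "say \")\" here")
```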
Identifiers consist of a sequence of letters, digits, the period
(.
) and the underscore. They must not start with a digit nor
underscore, nor with a period followed by a digit. Reserved
words are not valid identifiers.
The definition of a letter depends on the current locale, but only ASCII digits are considered to be digits.
Such identifiers are also known as syntactic names and may be used
directly in R code. Almost always, other names can be used
provided they are quoted. The preferred quote is the backtick
(‘`’), and deparse
will normally use it, but under
many circumstances single or double quotes can be used (as a character
constant will often be converted to a name). One place where
backticks may be essential is to delimit variable names in formulae:
see formula
.
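The rules for syntactic names and backtick quoting can be sketched as follows. The variable and column names are hypothetical and chosen only because they are not syntactic.

```r
# Syntactic names can be used directly; other names need backticks.
`my var` <- 1:3
sum(`my var`)

# make.names() shows how a non-syntactic name would be converted.
make.names(c("ok_name", "1st", "a b"))

# In a formula, backticks delimit a non-syntactic variable name.
d <- data.frame(`dose (mg)` = c(1, 2, 3, 4),
                response    = c(2.1, 3.9, 6.2, 7.8),
                check.names = FALSE)
f <- response ~ `dose (mg)`
all.vars(f)                          # the names without the backticks
```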
UTF-16 surrogate pairs in ‘\unnnn\uoooo’ form will be converted
to a single Unicode point, so for example ‘\uD834\uDD1E’ gives
the single character ‘\U1D11E’. However, unpaired values in
the surrogate range such as in the string "abc\uD834de"
will be
converted to a non-standard-conformant UTF-8 string (as is done by most
other software): this may change in future.
Syntax
for other aspects of the syntax.
sQuote
for quoting English text.
shQuote
for quoting OS commands.
The ‘R Language Definition’ manual.
    'single quotes can be used more-or-less interchangeably'
    "with double quotes to create character vectors"

    ## Single quotes inside single-quoted strings need backslash-escaping.
    ## Ditto double quotes inside double-quoted strings.
    ##
    identical('"It\'s alive!", he screamed.',
              "\"It's alive!\", he screamed.") # same

    ## Backslashes need doubling, or they have a special meaning.
    x <- "In ALGOL, you could do logical AND with /\\."
    print(x)      # shows it as above ("input-like")
    writeLines(x) # shows it as you like it ;-)

    ## Single backslashes followed by a letter are used to denote
    ## special characters like tab(ulator)s and newlines:
    x <- "long\tlines can be\nbroken with newlines"
    writeLines(x) # see also ?strwrap

    ## Backticks are used for non-standard variable names.
    ## (See make.names and ?Reserved for what counts as
    ## non-standard.)
    `x y` <- 1:5
    `x y`
    d <- data.frame(`1st column` = rchisq(5, 2), check.names = FALSE)
    d$`1st column`

    ## Backslashes followed by up to three numbers are interpreted as
    ## octal notation for ASCII characters.
    "\110\145\154\154\157\40\127\157\162\154\144\41"

    ## \x followed by up to two numbers is interpreted as
    ## hexadecimal notation for ASCII characters.
    (hw1 <- "\x48\x65\x6c\x6c\x6f\x20\x57\x6f\x72\x6c\x64\x21")

    ## Mixing octal and hexadecimal in the same string is OK
    (hw2 <- "\110\x65\154\x6c\157\x20\127\x6f\162\x6c\144\x21")

    ## \u is also hexadecimal, but supports up to 4 digits,
    ## using Unicode specification.  In the previous example,
    ## you can simply replace \x with \u.
    (hw3 <- "\u48\u65\u6c\u6c\u6f\u20\u57\u6f\u72\u6c\u64\u21")

    ## The last three are all identical to
    hw <- "Hello World!"
    stopifnot(identical(hw, hw1), identical(hw1, hw2), identical(hw2, hw3))

    ## Using Unicode makes more sense for non-latin characters.
    (nn <- "\u0126\u0119\u1114\u022d\u2001\u03e2\u0954\u0f3f\u13d3\u147b\u203c")

    ## Mixing \x and \u throws a _parse_ error (which is not catchable!)
    ## Not run: 
      "\x48\u65\x6c\u6c\x6f\u20\x57\u6f\x72\u6c\x64\u21"

    ## End(Not run)
    ##   -->   Error: mixing Unicode and octal/hex escapes .....

    ## \U works like \u, but supports up to six hex digits.
    ## So we can replace \u with \U in the previous example.
    n2 <- "\U0126\U0119\U1114\U022d\U2001\U03e2\U0954\U0f3f\U13d3\U147b\U203c"
    stopifnot(identical(nn, n2))

    ## Under systems supporting multi-byte locales (and not Windows),
    ## \U also supports the rarer characters outside the usual 16^4 range.
    ## See the R language manual,
    ## https://cran.r-project.org/doc/manuals/r-release/R-lang.html#Literal-constants
    ## and bug 16098 https://bugs.r-project.org/show_bug.cgi?id=16098
    ## This character may or may not be printable (the platform decides)
    ## and if it is, may not have a glyph in the font used.
    "\U1d4d7" # On Windows this used to give the incorrect value of "\Ud4d7"

    ## nul characters (for terminating strings in C) are not allowed (parse errors)
    ## Not run: 
      "foo\0bar"     # Error: nul character not allowed (line 1)
      "foo\u0000bar" # same error

    ## End(Not run)

    ## A Windows path written as a raw string constant:
    r"(c:\Program files\R)"

    ## More raw strings:
    r"{(\1\2)}"
    r"(use both "double" and 'single' quotes)"
    r"---(\1--)-)---"
R.Version()
provides detailed information about the version of
R running.
R.version
is a variable (a list
) holding this
information (and version
is a copy of it for S compatibility).
    R.Version()
    R.version
    R.version.string
    version

    R_compiled_by()
This gives details of the OS under which R was built, not the one
under which it is currently running (for which see
Sys.info
).
Note that OS names might not be what you expect: for example macOS Mavericks 10.9.4 identifies itself as ‘darwin13.3.0’, Linux usually as ‘linux-gnu’, Solaris 10 as ‘solaris2.10’ and Windows as ‘mingw32’.
R.version$crt
is supported on Windows since R 4.2.0 and returns
"ucrt"
to denote the Universal C Runtime. It would return
"msvcrt"
for the older Microsoft Visual C++ Runtime (but R does
not use that runtime since 4.2.0).
R.Version
returns a list with character-string components
platform |
the platform for which R was built. A triplet of the
form CPU-VENDOR-OS, as determined by the configure script. E.g,
|
arch |
the architecture (CPU) R was built on/for. |
os |
the underlying operating system. |
crt |
the C runtime on Windows. |
system |
CPU and OS, separated by a comma. |
status |
the status of the version (e.g., |
major |
the major version number. |
minor |
the minor version number, including the patch level. |
year |
the year the version was released. |
month |
the month the version was released. |
day |
the day the version was released. |
svn rev |
the Subversion revision number, which should be either
|
language |
always |
version.string |
a
|
R.version
and version
are lists of class
"simple.list"
which has a print
method.
R_compiled_by
returns a two-element character vector giving
details of the C and Fortran compilers used to build R. (Empty
strings if no information is available.)
Do not use R.version$os
to test the platform the
code is running on: use .Platform$OS.type
instead. Slightly
different versions of the OS may report different values of
R.version$os
, as may different versions of R.
Alternatively, osVersion
typically contains more
details about the platform R is running on.
R.version.string
is a copy of R.version$version.string
for simplicity and backwards compatibility.
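The recommendations above can be sketched as follows: branch on .Platform$OS.type rather than on R.version$os, and compare versions with getRversion() rather than by parsing strings. The messages and the variable ver are illustrative only.

```r
# Branch on the platform the code is running on.
if (.Platform$OS.type == "windows") {
  message("running on Windows")
} else {
  message("running on a Unix-alike; R was built for: ", R.version$os)
}

# Compare versions numerically, not as strings.
if (getRversion() >= "4.0.0") message("raw string constants are available")

# The components of R.version agree with the other accessors.
ver <- paste(R.version$major, R.version$minor, sep = ".")
identical(ver, as.character(getRversion()))
identical(R.version.string, R.version$version.string)
```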
sessionInfo
which provides additional information;
getRversion
typically used inside R code,
osVersion
,
.Platform
, Sys.info
.
    require(graphics)

    R.version$os # to check how lucky you are ...
    plot(0) # any plot
    mtext(R.version.string, side = 1, line = 4, adj = 1) # a useful bottom-right note

    ## a good way to detect macOS:
    if(grepl("^darwin", R.version$os)) message("running on macOS")

    ## Short R version string, ("space free", useful in file/directory names;
    ##                          also fine for unreleased versions of R):
    shortRversion <- function() {
       rvs <- R.version.string
       if(grepl("devel", (st <- R.version$status)))
           rvs <- sub(paste0(" ",st," "), "-devel_", rvs, fixed=TRUE)
       gsub("[()]", "", gsub(" ", "_", sub(" version ", "-", rvs)))
    }
    shortRversion()
.Random.seed
is an integer vector, containing the random number
generator (RNG) state for random number generation in R. It
can be saved and restored, but should not be altered by the user.
RNGkind
is a more friendly interface to query or set the kind
of RNG in use.
RNGversion
can be used to set the random generators as they
were in an earlier R version (for reproducibility).
set.seed
is the recommended way to specify seeds.
    .Random.seed <- c(rng.kind, n1, n2, ...)

    RNGkind(kind = NULL, normal.kind = NULL, sample.kind = NULL)
    RNGversion(vstr)
    set.seed(seed, kind = NULL, normal.kind = NULL, sample.kind = NULL)
kind |
character or |
normal.kind |
character string or |
sample.kind |
character string or |
seed |
a single value, interpreted as an integer, or |
vstr |
a character string containing a version number,
e.g., |
rng.kind |
integer code in |
n1 , n2 , ...
|
integers. See the details for how many are required
(which depends on |
The currently available RNG kinds are given below. kind
is
partially matched to this list. The default is
"Mersenne-Twister"
.
"Wichmann-Hill"
:The seed, .Random.seed[-1] == r[1:3]
is an integer vector of
length 3, where each r[i]
is in 1:(p[i] - 1)
, where
p
is the length 3 vector of primes, p = (30269, 30307,
30323)
.
The Wichmann–Hill generator has a cycle length of
$6.9536 \times 10^{12}$ (=
prod(p-1)/4
, see Applied Statistics (1984)
33, 123 which corrects the original article).
It exhibits 12 clear failures in the TestU01 Crush suite and 22
in the BigCrush suite (L'Ecuyer, 2007).
"Marsaglia-Multicarry"
:A multiply-with-carry RNG is used, as recommended by George
Marsaglia in his post to the mailing list ‘sci.stat.math’.
It has a period of more than $2^{60}$.
It exhibits 40 clear failures in L'Ecuyer's TestU01 Crush suite. Combined with Ahrens-Dieter or Kinderman-Ramage it exhibits deviations from normality even for univariate distribution generation. See PR#18168 for a discussion.
The seed is two integers (all values allowed).
"Super-Duper"
:Marsaglia's famous Super-Duper from the 70's. This is the original
version which does not pass the MTUPLE test of the Diehard
battery. It has a period of about $4.6 \times 10^{18}$ for most initial seeds. The seed is two integers (all
values allowed for the first seed: the second must be odd).
We use the implementation by Reeds et al. (1982–84).
The two seeds are the Tausworthe and congruence long integers, respectively.
It exhibits 25 clear failures in the TestU01 Crush suite (L'Ecuyer, 2007).
"Mersenne-Twister"
:From Matsumoto and Nishimura (1998); code updated in 2002.
A twisted GFSR with period
$2^{19937} - 1$ and equidistribution in 623
consecutive dimensions (over the whole period). The ‘seed’ is a
624-dimensional set of 32-bit integers plus a current position in
that set.
R uses its own initialization method due to B. D. Ripley and is not affected by the initialization issue in the 1998 code of Matsumoto and Nishimura addressed in a 2002 update.
It exhibits 2 clear failures in each of the TestU01 Crush and the BigCrush suite (L'Ecuyer, 2007).
"Knuth-TAOCP-2002"
:A 32-bit integer GFSR using lagged Fibonacci sequences with subtraction. That is, the recurrence used is
$$X_j = (X_{j-100} - X_{j-37}) \bmod 2^{30}$$
and the ‘seed’ is the set of the 100 last numbers (actually
recorded as 101 numbers, the last being a cyclic shift of the
buffer). The period is around $2^{129}$.
"Knuth-TAOCP"
:An earlier version from Knuth (1997).
The 2002 version was not backwards compatible with the earlier version: the initialization of the GFSR from the seed was altered. R did not allow you to choose consecutive seeds, the reported ‘weakness’, and already scrambled the seeds. Otherwise, the algorithm is identical to Knuth-TAOCP-2002, with the same lagged Fibonacci recurrence formula.
Initialization of this generator is done in interpreted R code and so takes a short but noticeable time.
It exhibits 3 clear failures in the TestU01 Crush suite and 4 clear failures in the BigCrush suite (L'Ecuyer, 2007).
"L'Ecuyer-CMRG"
:A ‘combined multiple-recursive generator’ from L'Ecuyer
(1999), each element of which is a feedback multiplicative
generator with three integer elements: thus the seed is a (signed)
integer vector of length 6. The period is around
$2^{191}$.
The 6 elements of the seed are internally regarded as 32-bit
unsigned integers. Neither the first three nor the last three
should be all zero, and they are limited to less than
4294967087
and 4294944443
respectively.
This is not particularly interesting of itself, but provides the basis for the multiple streams used in package parallel.
It exhibits 6 clear failures in each of the TestU01 Crush and the BigCrush suite (L'Ecuyer, 2007).
"user-supplied"
:Use a user-supplied generator. See Random.user
for
details.
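The seed sizes quoted in the list above can be inspected by selecting each kind in turn and measuring .Random.seed without its first (coding) element. This is only a sketch: the seed value 1 is arbitrary, and running it replaces the current RNG state of the session.

```r
old_kind <- RNGkind()                        # remember the current settings

kinds <- c("Wichmann-Hill", "Marsaglia-Multicarry", "Super-Duper",
           "Mersenne-Twister", "Knuth-TAOCP-2002", "L'Ecuyer-CMRG")

seed_length <- vapply(kinds, function(k) {
  set.seed(1, kind = k)                      # creates a valid .Random.seed
  length(get(".Random.seed", envir = globalenv())) - 1L
}, integer(1))
seed_length

RNGkind(old_kind[1], old_kind[2], old_kind[3])   # restore the kinds
```

The lengths count every stored integer, so the Mersenne-Twister entry includes the current position and the Knuth entry includes the extra 101st number.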
normal.kind
can be "Kinderman-Ramage"
,
"Buggy Kinderman-Ramage"
(not for set.seed
),
"Ahrens-Dieter"
, "Box-Muller"
, "Inversion"
(the
default), or "user-supplied"
. (For inversion, see the
reference in qnorm
.) The Kinderman-Ramage generator
used in versions prior to 1.7.0 (now called "Buggy"
) had several
approximation errors and should only be used for reproduction of old
results. The "Box-Muller"
generator is stateful as pairs of
normals are generated and returned sequentially. The state is reset
whenever it is selected (even if it is the current normal generator)
and when kind
is changed.
sample.kind
can be "Rounding"
or "Rejection"
,
or partial matches to these. The former was the default in versions
prior to 3.6.0: it made sample
noticeably non-uniform
on large populations, and should only be used for reproduction of old
results. See PR#17494 for a discussion.
set.seed
uses a single integer argument to set as many seeds
as are required. It is intended as a simple way to get quite different
seeds by specifying small integer arguments, and also as a way to get
valid seed sets for the more complicated methods (especially
"Mersenne-Twister"
and "Knuth-TAOCP"
). There is no
guarantee that different values of seed
will seed the RNG
differently, although any exceptions would be extremely rare. If
called with seed = NULL
it re-initializes (see ‘Note’)
as if no seed had yet been set.
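A minimal sketch of reproducible use of set.seed. The seed 123 is arbitrary; the second half pins all three kinds so that the result does not depend on settings left over from earlier in the session.

```r
set.seed(123)
a <- runif(3)

set.seed(123)                        # same seed, same generator
b <- runif(3)
identical(a, b)

# Pinning the generators makes the script independent of session settings.
set.seed(123, kind = "Mersenne-Twister",
         normal.kind = "Inversion", sample.kind = "Rejection")
s1 <- sample(10)
set.seed(123, kind = "Mersenne-Twister",
         normal.kind = "Inversion", sample.kind = "Rejection")
s2 <- sample(10)
identical(s1, s2)
```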
The use of kind = NULL
, normal.kind = NULL
or
sample.kind = NULL
in
RNGkind
or set.seed
selects the currently-used
generator (including that used in the previous session if the
workspace has been restored): if no generator has been used it selects
"default"
.
.Random.seed
is an integer
vector whose first
element codes the kind of RNG and normal generator. The lowest
two decimal digits are in 0:(k-1)
where k
is the number of available RNGs. The hundreds
represent the type of normal generator (starting at 0
), and
the ten thousands represent the type of discrete uniform sampler.
In the underlying C, .Random.seed[-1]
is unsigned
;
therefore in R .Random.seed[-1]
can be negative, due to
the representation of an unsigned integer by a signed integer.
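The coding of the first element can be unpacked with integer arithmetic, as sketched below for the current default kinds. The variable names are illustrative, and the indices are zero-based positions in R's internal lists of generators.

```r
set.seed(1, kind = "Mersenne-Twister",
         normal.kind = "Inversion", sample.kind = "Rejection")
code <- get(".Random.seed", envir = globalenv())[1]

rng_index    <- code %% 100L               # lowest two decimal digits
normal_index <- (code %/% 100L) %% 100L    # the hundreds
sample_index <- code %/% 10000L            # the ten thousands

c(code = code, rng = rng_index, normal = normal_index, sample = sample_index)
```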
RNGkind
returns a three-element character vector of the RNG,
normal and sample kinds selected before the call, invisibly if
any argument is not NULL
. A type starts a session as the
default, and is selected either by a call to RNGkind
or by setting
.Random.seed
in the workspace. (NB: prior to R 3.6.0 the first
two kinds were returned in a two-element character vector.)
RNGversion
returns the same information as RNGkind
about
the defaults in a specific R version.
set.seed
returns NULL
, invisibly.
Initially, there is no seed; a new one is created from the current time and the process ID when one is required. Hence different sessions will give different simulation results, by default. However, the seed might be restored from a previous session if a previously saved workspace is restored.
.Random.seed
saves the seed set for the uniform random-number
generator, at least for the system generators. It does not
necessarily save the state of other generators, and in particular does
not save the state of the Box–Muller normal generator. If you want
to reproduce work later, call set.seed
(preferably with
explicit values for kind
and normal.kind
) rather than
set .Random.seed
.
The object .Random.seed
is only looked for in the user's
workspace.
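Saving and restoring the state, as opposed to altering it, can be sketched as follows. The object is read from and written back to the global environment because that is where R looks for it; the seed 2024 is arbitrary. As noted above, this does not capture the state of the Box–Muller normal generator.

```r
set.seed(2024)
invisible(runif(5))                                  # advance the generator

saved <- get(".Random.seed", envir = globalenv())    # snapshot the state
x <- runif(3)

assign(".Random.seed", saved, envir = globalenv())   # restore it unmodified
y <- runif(3)
identical(x, y)
```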
Do not rely on randomness of low-order bits from RNGs. Most of the
supplied uniform generators return 32-bit integer values that are
converted to doubles, so they take at most $2^{32}$ distinct
values and long runs will return duplicated values (Wichmann-Hill is
the exception, and all give at least 30 varying bits.)
of RNGkind: Martin Maechler. Current implementation, B. D. Ripley with modifications by Duncan Murdoch.
Ahrens, J. H. and Dieter, U. (1973). Extensions of Forsythe's method for random sampling from the normal distribution. Mathematics of Computation, 27, 927–937.
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988).
The New S Language.
Wadsworth & Brooks/Cole.
(set.seed
, storing in .Random.seed
.)
Box, G. E. P. and Muller, M. E. (1958). A note on the generation of normal random deviates. Annals of Mathematical Statistics, 29, 610–611. doi:10.1214/aoms/1177706645.
De Matteis, A. and Pagnutti, S. (1993). Long-range Correlation Analysis of the Wichmann-Hill Random Number Generator. Statistics and Computing, 3, 67–70. doi:10.1007/BF00153065.
Kinderman, A. J. and Ramage, J. G. (1976). Computer generation of normal random variables. Journal of the American Statistical Association, 71, 893–896. doi:10.2307/2286857.
Knuth, D. E. (1997).
The Art of Computer Programming.
Volume 2, third edition.
Source code at https://www-cs-faculty.stanford.edu/~knuth/taocp.html.
Knuth, D. E. (2002). The Art of Computer Programming. Volume 2, third edition, ninth printing.
L'Ecuyer, P. (1999). Good parameters and implementations for combined multiple recursive random number generators. Operations Research, 47, 159–164. doi:10.1287/opre.47.1.159.
L'Ecuyer, P. and Simard, R. (2007).
TestU01: A C Library for Empirical Testing of Random Number Generators.
ACM Transactions on Mathematical Software, 33, Article 22.
doi:10.1145/1268776.1268777.
The TestU01 C library is available from
http://simul.iro.umontreal.ca/testu01/tu01.html or also
https://github.com/umontreal-simul/TestU01-2009.
Marsaglia, G. (1997).
A random number generator for C.
Discussion paper, posting on Usenet newsgroup sci.stat.math
on
September 29, 1997.
Marsaglia, G. and Zaman, A. (1994). Some portable very-long-period random number generators. Computers in Physics, 8, 117–121. doi:10.1063/1.168514.
Matsumoto, M. and Nishimura, T. (1998).
Mersenne Twister: A 623-dimensionally equidistributed uniform
pseudo-random number generator,
ACM Transactions on Modeling and Computer Simulation,
8, 3–30.
Source code formerly at http://www.math.keio.ac.jp/~matumoto/emt.html
.
Now see http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/VERSIONS/C-LANG/c-lang.html.
Reeds, J., Hubert, S. and Abrahams, M. (1982–4). C implementation of SuperDuper, University of California at Berkeley. (Personal communication from Jim Reeds to Ross Ihaka.)
Wichmann, B. A. and Hill, I. D. (1982). Algorithm AS 183: An Efficient and Portable Pseudo-random Number Generator. Applied Statistics, 31, 188–190; Remarks: 34, 198 and 35, 89. doi:10.2307/2347988.
sample
for random sampling with and without replacement.
Distributions for functions for random-variate generation from standard distributions.
require(stats)

## Seed the current RNG, i.e., set the RNG status
set.seed(42); u1 <- runif(30)
set.seed(42); u2 <- runif(30) # the same because of identical RNG status:
stopifnot(identical(u1, u2))

## the default random seed is 626 integers, so only print a few
runif(1); .Random.seed[1:6]; runif(1); .Random.seed[1:6]
## If there is no seed, a "random" new one is created:
rm(.Random.seed); runif(1); .Random.seed[1:6]

ok <- RNGkind()
RNGkind("Wich")  # (partial string matching on 'kind')

## This shows how 'runif(.)' works for Wichmann-Hill,
## using only R functions:

p.WH <- c(30269, 30307, 30323)
a.WH <- c(  171,   172,   170)
next.WHseed <- function(i.seed = .Random.seed[-1])
  { (a.WH * i.seed) %% p.WH }
my.runif1 <- function(i.seed = .Random.seed)
  { ns <- next.WHseed(i.seed[-1]); sum(ns / p.WH) %% 1 }
set.seed(1998-12-04)# (when the next lines were added to the souRce)
rs <- .Random.seed
(WHs <- next.WHseed(rs[-1]))
u <- runif(1)
stopifnot(
 next.WHseed(rs[-1]) == .Random.seed[-1],
 all.equal(u, my.runif1(rs))
)

## ----
.Random.seed
RNGkind("Super") # matches  "Super-Duper"
RNGkind()
.Random.seed # new, corresponding to  Super-Duper

## Reset:
RNGkind(ok[1])

RNGversion(getRversion()) # the default version for this R version

## ----
sum(duplicated(runif(1e6))) # around 110 for default generator
## and we would expect about almost sure duplicates beyond about
qbirthday(1 - 1e-6, classes = 2e9) # 235,000
Function RNGkind
allows user-coded uniform and
normal random number generators to be supplied. The details are given
here.
A user-specified uniform RNG is called from entry points in
dynamically-loaded compiled code. The user must supply the entry point
user_unif_rand
, which takes no arguments and returns a
pointer to a double. The example below will show the general
pattern. The generator should have at least 25 bits of precision.
Optionally, the user can supply the entry point user_unif_init
,
which is called with an unsigned int
argument when
RNGkind
(or set.seed
) is called, and is intended
to be used to initialize the user's RNG code. The argument is intended
to be used to set the ‘seeds’; it is the seed
argument to
set.seed
or an essentially random seed if RNGkind
is called.
If only these functions are supplied, no information about the
generator's state is recorded in .Random.seed
. Optionally,
functions user_unif_nseed
and user_unif_seedloc
can be
supplied which are called with no arguments and should return pointers
to the number of seeds and to an integer (specifically, ‘Int32’)
array of seeds. Calls to GetRNGstate
and PutRNGstate
will then copy this array to and from .Random.seed
.
A user-specified normal RNG is specified by a single entry point
user_norm_rand
, which takes no arguments and returns a
pointer to a double.
As with all compiled code, mis-specifying these functions can crash R. Do include the ‘R_ext/Random.h’ header file for type checking.
## Not run: 
##  Marsaglia's congruential PRNG
#include <R_ext/Random.h>

static Int32 seed;
static double res;
static int nseed = 1;

double * user_unif_rand(void)
{
    seed = 69069 * seed + 1;
    res = seed * 2.32830643653869e-10;
    return &res;
}

void user_unif_init(Int32 seed_in) { seed = seed_in; }
int * user_unif_nseed(void) { return &nseed; }
int * user_unif_seedloc(void) { return (int *) &seed; }

/*  ratio-of-uniforms for normal  */
#include <math.h>
static double x;

double * user_norm_rand(void)
{
    double u, v, z;
    do {
        u = unif_rand();
        v = 0.857764 * (2. * unif_rand() - 1);
        x = v/u; z = 0.25 * x * x;
        if (z < 1. - u) break;
        if (z > 0.259/u + 0.35) continue;
    } while (z > -log(u));
    return &x;
}

## Use under Unix:
R CMD SHLIB urand.c
R
> dyn.load("urand.so")
> RNGkind("user")
> runif(10)
> .Random.seed
> RNGkind(, "user")
> rnorm(10)
> RNGkind()
[1] "user-supplied" "user-supplied"

## End(Not run)
range
returns a vector containing the minimum and maximum of
all the given arguments.
range(..., na.rm = FALSE)
## Default S3 method:
range(..., na.rm = FALSE, finite = FALSE)
## same for classes 'Date' and 'POSIXct'

.rangeNum(..., na.rm, finite, isNumeric)
... |
any |
na.rm |
logical, indicating if |
finite |
logical, indicating if all non-finite elements should be omitted. |
isNumeric |
a |
range
is a generic function: methods can be defined for it
directly or via the Summary
group generic.
For this to work properly, the arguments ...
should be
unnamed, and dispatch is on the first argument.
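The following sketch shows one way a method might be defined directly for an S3 class. The class name "temperature" and the method body are hypothetical, chosen only to illustrate dispatch on the first unnamed argument.

```r
## Hypothetical S3 class holding plain numbers.
temps <- structure(c(21.5, 19.0, 23.2), class = "temperature")

## A method defined directly for range(); it strips the class,
## uses the default method, and restores the class on the result.
range.temperature <- function(..., na.rm = FALSE) {
  vals <- unlist(lapply(list(...), unclass))
  r <- range(vals, na.rm = na.rm)
  structure(r, class = "temperature")
}

## Dispatch is on the first argument, which is passed unnamed.
r <- range(temps, structure(25, class = "temperature"))
unclass(r)   # 19 25
```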
If na.rm
is FALSE
, NA
and NaN
values in any of the arguments will cause NA
values
to be returned, otherwise NA
values are ignored.
If finite
is TRUE
, the minimum
and maximum of all finite values are computed, i.e.,
finite = TRUE
includes na.rm = TRUE
.
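A short sketch of how the two arguments interact, using a small vector that mixes missing, infinite and finite values (the variable names are illustrative):

```r
x <- c(NA, 1:3, -Inf, NaN, Inf)

r_default <- range(x)                  # both elements are missing
r_narm    <- range(x, na.rm = TRUE)    # -Inf  Inf
r_finite  <- range(x, finite = TRUE)   #    1    3
```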
A special situation occurs when there is no (after omission
of NA
s) nonempty argument left, see min
.
This is part of the S4 Summary
group generic. Methods for it must use the signature
x, ..., na.rm
.
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
The extendrange()
utility in package grDevices.
(r.x <- range(stats::rnorm(100)))
diff(r.x) # the SAMPLE range

x <- c(NA, 1:3, -1:1/0); x
range(x)
range(x, na.rm = TRUE)
range(x, finite = TRUE)
Returns the sample ranks of the values in a vector. Ties (i.e., equal values) and missing values can be handled in several ways.
rank(x, na.last = TRUE, ties.method = c("average", "first", "last", "random", "max", "min"))
x |
a numeric, complex, character or logical vector. |
na.last |
a logical or character string controlling the treatment
of |
ties.method |
a character string specifying how ties are treated, see ‘Details’; can be abbreviated. |
If all components are different (and no NA
s), the ranks are
well defined, with values in seq_along(x)
. With some values equal
(called ‘ties’), the argument ties.method
determines the
result at the corresponding indices. The "first"
method results
in a permutation with increasing values at each index set of ties, and
analogously "last"
with decreasing values. The
"random"
method puts these in random order whereas the
default, "average"
, replaces them by their mean, and
"max"
and "min"
replace them by their maximum and
minimum respectively, the latter being the typical sports
ranking.
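As a small worked illustration of the tie-handling methods, consider a vector with one pair of tied values (the variable names are illustrative):

```r
x <- c(10, 20, 20, 30)

r_avg   <- rank(x)                          # 1.0 2.5 2.5 4.0
r_min   <- rank(x, ties.method = "min")     # 1 2 2 4
r_max   <- rank(x, ties.method = "max")     # 1 3 3 4
r_first <- rank(x, ties.method = "first")   # 1 2 3 4
r_last  <- rank(x, ties.method = "last")    # 1 3 2 4
```

The tied values occupy positions 2 and 3 in sorted order, so "average" gives both 2.5, while "min" and "max" give both the lower or the upper of those positions.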
NA
values are never considered to be equal: for na.last =
TRUE
and na.last = FALSE
they are given distinct ranks in
the order in which they occur in x
.
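The treatment of missing values under each setting of na.last can be sketched as follows (a minimal illustration with two NAs; the variable names are illustrative):

```r
x <- c(3, NA, 1, NA)

r_last  <- rank(x, na.last = TRUE)     # 2 3 1 4 : NAs ranked last, in order
r_first <- rank(x, na.last = FALSE)    # 4 1 3 2 : NAs ranked first, in order
r_keep  <- rank(x, na.last = "keep")   # 2 NA 1 NA
r_drop  <- rank(x, na.last = NA)       # 2 1     : NAs removed
```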
NB: rank
is not itself generic but xtfrm
is, and rank(xtfrm(x), ....)
will have the desired result if
there is a xtfrm
method. Otherwise, rank
will make use
of ==
, >
, is.na
and extraction methods for
classed objects, possibly rather slowly.
A numeric vector of the same length as x
with names copied from
x
(unless na.last = NA
, when missing values are
removed). The vector is of integer type unless x
is a long
vector or ties.method = "average"
when it is of double type
(whether or not there are any ties).
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
order
and sort
;
xtfrm
, see above.
(r1 <- rank(x1 <- c(3, 1, 4, 15, 92)))
x2 <- c(3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5)
names(x2) <- letters[1:11]
(r2 <- rank(x2)) # ties are averaged

## rank() is "idempotent": rank(rank(x)) == rank(x) :
stopifnot(rank(r1) == r1, rank(r2) == r2)

## ranks without averaging
rank(x2, ties.method= "first")  # first occurrence wins
rank(x2, ties.method= "last")   #  last occurrence wins
rank(x2, ties.method= "random") # ties broken at random
rank(x2, ties.method= "random") # and again

## keep ties ties, no average
(rma <- rank(x2, ties.method= "max"))  # as used classically
(rmi <- rank(x2, ties.method= "min"))  # as in Sports
stopifnot(rma + rmi == round(r2 + r2))

## Comparing all tie.methods:
tMeth <- eval(formals(rank)$ties.method)
rx2 <- sapply(tMeth, function(M) rank(x2, ties.method=M))
cbind(x2, rx2)
## ties.method's does not matter w/o ties:
x <- sample(47)
rx <- sapply(tMeth, function(MM) rank(x, ties.method=MM))
stopifnot(all(rx[,1] == rx))
rapply
is a recursive version of lapply
with
flexibility in how the result is structured (how = ".."
).
rapply(object, f, classes = "ANY", deflt = NULL, how = c("unlist", "replace", "list"), ...)
object |
a |
f |
a |
classes |
character vector of |
deflt |
the default result (not used if |
how |
character string partially matching the three possibilities given: see ‘Details’. |
... |
additional arguments passed to the call to |
This function has two basic modes. If how = "replace"
, each
element of object
which is not itself list-like and has a class
included in classes
is replaced by the result of applying
f
to the element.
Otherwise, with mode how = "list"
or how = "unlist"
,
conceptually object
is copied, all non-list elements which have a class included in
classes
are replaced by the result of applying f
to the
element and all others are replaced by deflt
. Finally, if
how = "unlist"
, unlist(recursive = TRUE)
is called on
the result.
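The three values of how can be compared on one small nested list. This is a sketch; the list cfg and its contents are made up for illustration.

```r
cfg <- list(name = "run-1", opts = list(rate = 0.5, label = "fast"))

## "replace": matching leaves are transformed, everything else is kept.
repl <- rapply(cfg, toupper, classes = "character", how = "replace")

## "list": same shape, but non-matching leaves become 'deflt'.
as_list <- rapply(cfg, nchar, classes = "character",
                  deflt = NA_integer_, how = "list")

## "unlist": as "list", then flattened with unlist(recursive = TRUE).
flat <- rapply(cfg, nchar, classes = "character",
               deflt = NA_integer_, how = "unlist")
flat   # name = 5, opts.rate = NA, opts.label = 4
```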
The semantics differ in detail from lapply
: in
particular the arguments are evaluated before calling the C code.
In R 3.5.x and earlier, object
was required to be a list,
which was not the case for its list-like components.
If how = "unlist"
, a vector, otherwise “list-like”
of similar structure as object
.
Chambers, J. M. (1998)
Programming with Data.
Springer.
(rapply
is only described briefly there.)
X <- list(list(a = pi, b = list(c = 1L)), d = "a test")
# the "identity operation":
rapply(X, function(x) x, how = "replace") -> X.; stopifnot(identical(X, X.))
rapply(X, sqrt, classes = "numeric", how = "replace")
rapply(X, deparse, control = "all") # passing extras. argument of deparse()
rapply(X, nchar, classes = "character", deflt = NA_integer_, how = "list")
rapply(X, nchar, classes = "character", deflt = NA_integer_, how = "unlist")
rapply(X, nchar, classes = "character",                      how = "unlist")
rapply(X, log, classes = "numeric", how = "replace", base = 2)

## with expression() / list():
E  <- expression(list(a = pi, b = expression(c = C1 * C2)), d = "a test")
LE <- list(expression(a = pi, b = expression(c = C1 * C2)), d = "a test")
rapply(E, nchar, how="replace") # "expression(c = C1 * C2)" are 23 chars
rapply(E, nchar, classes = "character", deflt = NA_integer_, how = "unlist")
rapply(LE, as.character) # a "pi" | b1 "expression" | b2 "C1 * C2" ..
rapply(LE, nchar)        # (see above)
stopifnot(exprs = {
  identical(E , rapply(E , identity, how = "replace"))
  identical(LE, rapply(LE, identity, how = "replace"))
})
Creates or tests for objects of type "raw"
.
raw(length = 0)
as.raw(x)
is.raw(x)
length |
desired length. |
x |
object to be coerced. |
The raw type is intended to hold raw bytes. It is possible to extract subsequences of bytes, and to replace elements (but only by elements of a raw vector). The relational operators (see Comparison, using the numerical order of the byte representation) work, as do the logical operators (see Logic) with a bitwise interpretation.
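A minimal sketch of the bitwise and relational behaviour described above, on two single bytes (the variable names are illustrative):

```r
a <- as.raw(0x0c)      # bits 0000 1100
b <- as.raw(0x0a)      # bits 0000 1010

and_ab <- a & b        # 08 : bits set in both
or_ab  <- a | b        # 0e : bits set in either
not_a  <- !a           # f3 : every bit flipped
gt     <- a > b        # TRUE : compared as the numbers 12 and 10
```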
A raw vector is printed with each byte separately represented as a
pair of hex digits. If you want to see a character representation
(with escape sequences for non-printing characters) use
rawToChar
.
Coercion to raw treats the input values as representing small
(decimal) integers, so the input is first coerced to integer, and then
values which are outside the range [0 ... 255]
or are
NA
are set to 0
(the nul
byte).
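The coercion rule can be illustrated as follows. In this sketch the warning that R gives for out-of-range or missing input is suppressed so that only the resulting bytes are shown.

```r
ok  <- as.raw(c(0, 65.9, 255))   # 65.9 is truncated to 65: bytes 00 41 ff

## 256, -1 and NA cannot be represented and become the nul byte.
bad <- suppressWarnings(as.raw(c(256, -1, NA)))   # 00 00 00
```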
as.raw
and is.raw
are primitive functions.
raw
creates a raw vector of the specified length.
Each element of the vector is equal to 0
.
Raw vectors are used to store fixed-length sequences of bytes.
as.raw
attempts to coerce its argument to be of raw
type. The (elementwise) answer will be 0
unless the
coercion succeeds (or if the original value successfully coerces to 0).
is.raw
returns true if and only if typeof(x) == "raw"
.
&
for bitwise operations on raw vectors.
xx <- raw(2)
xx[1] <- as.raw(40)     # NB, not just 40.
xx[2] <- charToRaw("A")
xx       ## 28 41   -- raw prints hexadecimals
dput(xx) ## as.raw(c(0x28, 0x41))
as.integer(xx) ## 40 65

x <- "A test string"
(y <- charToRaw(x))
is.vector(y) # TRUE
rawToChar(y)
is.raw(x)
is.raw(y)
stopifnot( charToRaw("\xa3") == as.raw(0xa3) )

isASCII <-  function(txt) all(charToRaw(txt) <= as.raw(127))
isASCII(x)  # true
isASCII("\xa325.63") # false (in Latin-1, this is an amount in UK pounds)
Input and output raw connections.
rawConnection(object, open = "r")
rawConnectionValue(con)
object |
character or raw vector. A description of the connection. For an input this is an R raw vector object, and for an output connection the name for the connection. |
open |
character. Any of the standard connection open modes. |
con |
an output raw connection. |
An input raw connection is opened and the raw vector is copied
at the time the connection object is created, and close
destroys the copy.
An output raw connection is opened and creates an R raw vector
internally. The raw vector can be retrieved via
rawConnectionValue
.
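A short sketch of both directions. The open mode "r+" for the writing connection follows the usage in the examples of this help page; the variable names are illustrative.

```r
## Input: read bytes back from a copy of a raw vector.
inp <- rawConnection(as.raw(c(1, 2, 3)))       # open = "r" by default
got <- readBin(inp, what = "raw", n = 10L)     # only 3 bytes are available
close(inp)

## Output: accumulate bytes over several calls, then retrieve them.
out <- rawConnection(raw(0), open = "r+")
writeBin(as.raw(1:3), out)
writeBin(as.raw(4), out)
bytes <- rawConnectionValue(out)
close(out)
```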
If a connection is open for both input and output, the initial raw vector supplied is copied when the connection is opened.
For rawConnection
, a connection object of class
"rawConnection"
which inherits from class "connection"
.
For rawConnectionValue
, a raw vector.
As output raw connections keep the internal raw vector up to date
call-by-call, they are relatively expensive to use (although
over-allocation is used), and it may be better to use an anonymous
file()
connection to collect output.
On (rare) platforms where vsnprintf
does not return the needed length
of output there is a 100,000 character limit on the length of line for
output connections: longer lines will be truncated with a warning.
zz <- rawConnection(raw(0), "r+") # start with empty raw vector
writeBin(LETTERS, zz)
seek(zz, 0)
readLines(zz) # raw vector has embedded nuls
seek(zz, 0)
writeBin(letters[1:3], zz)
rawConnectionValue(zz)
close(zz)
Conversion to and from and manipulation of objects of type "raw"
,
both used as bits or “packed” 8 bits.
charToRaw(x)
rawToChar(x, multiple = FALSE)
rawShift(x, n)

rawToBits(x)
intToBits(x)
packBits(x, type = c("raw", "integer", "double"))

numToInts(x)
numToBits(x)
x |
object to be converted or shifted. |
multiple |
logical: should the conversion be to a single character string or multiple individual characters? |
n |
the number of bits to shift. Positive numbers shift right
and negative numbers shift left: allowed values are |
type |
the result type, partially matched. |
packBits
accepts raw, integer or logical inputs, the last two
without any NAs.
numToBits(.)
and packBits(., type="double")
are
inverse functions of each other, see also the examples.
Note that ‘bytes’ are not necessarily the same as characters, e.g. in UTF-8 locales.
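For instance, a single accented character occupies two bytes in UTF-8. The sketch below forces UTF-8 with enc2utf8 so that the result does not depend on the session locale.

```r
s <- enc2utf8("\u00e9")                 # e with an acute accent

n_chars <- nchar(s, type = "chars")     # 1 character
bytes   <- charToRaw(s)                 # c3 a9 : 2 bytes
```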
charToRaw
converts a length-one character string to raw bytes.
It does so without taking into account any declared encoding (see
Encoding
).
rawToChar
converts raw bytes either to a single character
string or a character vector of single bytes (with ""
for
0
). (Note that the input for a single character string must not contain
embedded nuls: only trailing nuls are allowed, and they are removed.)
In either case it is possible to create a result which is invalid in a
multibyte locale, e.g. one using UTF-8. Long vectors are
allowed if multiple
is true.
rawShift(x, n)
shifts the bits in x
by n
positions
to the right, see the argument n
, above.
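The direction is easiest to see through the numeric value of a byte. In this sketch, "right" is read in the least-significant-bit-first layout that rawToBits uses, so a positive n doubles the value n times; that reading is inferred from the examples on this page.

```r
z <- as.raw(5)                       # value 5

up   <- rawShift(z, 1)               # 0a : value becomes 10
down <- rawShift(z, -1)              # 02 : value halves, lowest bit is dropped
lost <- rawShift(as.raw(0x81), 1)    # 02 : the highest bit falls off the byte
```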
rawToBits
returns a raw vector of 8 times the length of a raw
vector with entries 0 or 1. intToBits
returns a raw vector
of 32 times the length of an integer vector with entries 0 or 1.
(Non-integral numeric values are truncated to integers.) In
both cases the unpacking is least-significant bit first.
packBits
packs its input (using only the lowest bit for raw or
integer vectors) least-significant bit first to a raw, integer or double
(“numeric”) vector.
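A small sketch of the least-significant-bit-first layout and of packing as the inverse operation (the variable names are illustrative):

```r
## 5 is binary 101; the lowest bit comes first in the unpacked form.
bits5 <- as.integer(rawToBits(as.raw(5)))       # 1 0 1 0 0 0 0 0

## Packing reverses the unpacking, 8 bits per raw element ...
repacked <- packBits(rawToBits(as.raw(c(5, 255))))   # 05 ff

## ... and 32 bits per integer element.
int5 <- packBits(intToBits(5L), type = "integer")    # 5
```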
numToInts()
and
numToBits()
split double
precision numeric vectors
either into two integer
s each or into 64 bits each,
stored as raw
. In both cases the unpacking is least-significant
element first.
x <- "A test string"
(y <- charToRaw(x))
is.vector(y) # TRUE

rawToChar(y)
rawToChar(y, multiple = TRUE)
(xx <- c(y,  charToRaw("&"), charToRaw(" more")))
rawToChar(xx)

rawShift(y, 1)
rawShift(y,-2)

rawToBits(y)

showBits <- function(r) stats::symnum(as.logical(rawToBits(r)))

z <- as.raw(5)
z ; showBits(z)
showBits(rawShift(z, 1)) # shift to right
showBits(rawShift(z, 2))
showBits(z)
showBits(rawShift(z, -1)) # shift to left
showBits(rawShift(z, -2)) # ..
showBits(rawShift(z, -3)) # shifted off entirely

packBits(as.raw(0:31))
i <- -2:3
stopifnot(exprs = {
  identical(i, packBits(intToBits(i), "integer"))
  identical(packBits(       0:31) ,
            packBits(as.raw(0:31)))
})
str(pBi <- packBits(intToBits(i)))
data.frame(B = matrix(pBi, nrow=6, byrow=TRUE),
           hex = format(as.hexmode(i)), i)

## Look at internal bit representation of ...

## ... of integers :
bitI <- function(x) vapply(as.integer(x), function(x) {
            b <- substr(as.character(rev(intToBits(x))), 2L, 2L)
            paste0(c(b[1L], " ", b[2:32]), collapse = "")
          }, "")
print(bitI(-8:8), width = 35, quote = FALSE)

## ... of double precision numbers in format  'sign exp | mantissa'
## where  1 bit sign  1 <==> "-";
##       11 bit exp   is the base-2 exponent biased by 2^10 - 1 (1023)
##       52 bit mantissa is without the implicit leading '1'
#
## Bit representation  [ sign | exponent | mantissa ] of double prec numbers :

bitC <- function(x) noquote(vapply(as.double(x), function(x) { # split one double
    b <- substr(as.character(rev(numToBits(x))), 2L, 2L)
    paste0(c(b[1L], " ", b[2:12], " | ", b[13:64]), collapse = "")
  }, ""))
bitC(17)
bitC(c(-1,0,1))
bitC(2^(-2:5))
bitC(1+2^-(1:53))# from 0.5 converge to 1

###  numToBits(.)  <==>   intToBits(numToInts(.)) :
d2bI <- function(x) vapply(as.double(x), function(x) intToBits(numToInts(x)), raw(64L))
d2b  <- function(x) vapply(as.double(x), function(x)           numToBits(x) , raw(64L))
set.seed(1)
x <- c(sort(rt(2048, df=1.5)),  2^(-10:10), 1+2^-(1:53))
str(bx <- d2b(x)) # a  64 x 2122  raw matrix
stopifnot( identical(bx, d2bI(x)) )

## Show that  packBits(*, "double")  is the inverse of numToBits() :
packBits(numToBits(pi), type="double")
bitC(2050)
b <- numToBits(2050)
identical(b, numToBits(packBits(b, type="double")))
pbx <- apply(bx, 2, packBits, type="double")
stopifnot( identical(pbx, x))
Utilities for converting files in R documentation (Rd) format to other formats or creating indices from them, and for converting documentation in other formats to Rd format.
R CMD Rdconv [options] file
R CMD Rd2pdf [options] files
file |
the path to a file to be processed. |
files |
a list of file names specifying the R documentation sources to use, by either giving the paths to the files, or the path to a directory with the sources of a package. |
options |
further options to control the processing, or for obtaining information about usage and version of the utility. |
R CMD Rdconv
converts Rd format to plain text, HTML or LaTeX
formats: it can also extract the examples.
R CMD Rd2pdf
is the user-level program for producing PDF output
from Rd sources. It will make use of the environment variables
R_PAPERSIZE (set by R CMD
, with a default set when R
was installed: values for R_PAPERSIZE are a4
,
letter
, legal
and executive
)
and R_PDFVIEWER (the PDF previewer). Also,
RD2PDF_INPUTENC can be set to inputenx
to make use of the
LaTeX package of that name rather than inputenc
: this might be
needed for better support of the UTF-8 encoding.
R CMD Rd2pdf
calls tools::texi2pdf
to produce
its PDF file: see its help for the possibilities for the
texi2dvi
command which that function uses (and which can be
overridden by setting environment variable R_TEXI2DVICMD).
Use R CMD foo --help
to obtain usage information on utility
foo
.
The section ‘Processing documentation files’ in the
‘Writing R Extensions’ manual: RShowDoc("R-exts")
.
Read binary data from or write binary data to a connection or raw vector.
readBin(con, what, n = 1L, size = NA_integer_, signed = TRUE,
        endian = .Platform$endian)

writeBin(object, con, size = NA_integer_,
         endian = .Platform$endian, useBytes = FALSE)
con |
A connection object or a character string naming a file or a raw vector. |
what |
Either an object whose mode will give the mode of the
vector to be read, or a character vector of length one describing
the mode: one of |
n |
numeric. The (maximal) number of records to be
read. You can use an over-estimate here, but not too large as
storage is reserved for |
size |
integer. The number of bytes per element in the byte
stream. The default, |
signed |
logical. Only used for integers of sizes 1 and 2, when it determines if the quantity on file should be regarded as a signed or unsigned integer. |
endian |
The endianness ( |
object |
An R object to be written to the connection. |
useBytes |
See |
These functions can only be used with binary-mode connections.
If con
is a character string, the functions call
file
to obtain a binary-mode file connection which is
opened for the duration of the function call.
If the connection is open it is read/written from its current position. If it is not open, it is opened for the duration of the call in an appropriate mode (binary read or write) and then closed again. An open connection must be in binary mode.
If readBin
is called with con
a raw vector, the data in
the vector is used as input. If writeBin
is called with
con
a raw vector, it is just an indication that a raw vector
should be returned.
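Using a raw vector as con gives a convenient way to inspect the bytes that would be written. A minimal sketch, with the endianness stated explicitly so the result is the same on every platform:

```r
## raw() as 'con' only signals that a raw vector should be returned.
bytes_big    <- writeBin(1L, raw(), endian = "big")      # 00 00 00 01
bytes_little <- writeBin(1L, raw(), endian = "little")   # 01 00 00 00

## A raw vector as 'con' for readBin() supplies the input data.
back <- readBin(bytes_big, what = "integer", n = 1L, endian = "big")   # 1
```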
If size is specified and not the natural size of the object, each element of the vector is coerced to an appropriate type before being written or as it is read. Possible sizes are 1, 2, 4 and possibly 8 for integer or logical vectors, and 4, 8 and possibly 12/16 for numeric vectors. (Note that coercion occurs as signed types except if signed = FALSE when reading integers of sizes 1 and 2.) Changing sizes is unlikely to preserve NAs, and the extended precision sizes are unlikely to be portable across platforms.
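As a small illustration of the signed versus unsigned interpretation described above, the following sketch reads the same two bytes from a raw vector both ways. The byte values are chosen only for illustration.

```r
# Two bytes as they might appear on a connection: 0xC8 (200) and 0x7F (127).
bytes <- as.raw(c(0xC8, 0x7F))

# Default: each 1-byte element is treated as a signed integer (-128..127).
signed_vals <- readBin(bytes, integer(), n = 2, size = 1)

# signed = FALSE: the same bytes are treated as unsigned (0..255).
unsigned_vals <- readBin(bytes, integer(), n = 2, size = 1, signed = FALSE)

print(signed_vals)    # -56 127
print(unsigned_vals)  # 200 127
```

Only the first byte differs between the two reads, because 0x7F is below 128 and so has the same value under both interpretations.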
readBin and writeBin read and write C-style zero-terminated character strings. Input strings are limited to 10000 characters. readChar and writeChar can be used to read and write fixed-length strings. No check is made that the string is valid in the current locale's encoding.
Handling R's missing and special (Inf, -Inf and NaN) values is discussed in the ‘R Data Import/Export’ manual.
Only 2^31 - 1 bytes can be written in a single call (and that is the maximum capacity of a raw vector on 32-bit platforms).
‘Endian-ness’ is relevant for size > 1, and should always be set for portable code (the default is only appropriate when writing and then reading files on the same platform).
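The effect of the endian argument can be seen by writing one integer to a raw vector in each byte order. This is a minimal sketch; the variable names are illustrative only.

```r
big    <- writeBin(1L, raw(), endian = "big")
little <- writeBin(1L, raw(), endian = "little")
print(big)      # 00 00 00 01
print(little)   # 01 00 00 00

# Reading with the matching byte order recovers the value ...
right <- readBin(big, integer(), endian = "big")
# ... while the wrong byte order silently yields a different number.
wrong <- readBin(big, integer(), endian = "little")
print(right)    # 1
print(wrong)    # 16777216
```

No error or warning is raised in the mismatched case, which is why setting endian explicitly is advisable for files that move between platforms.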
For readBin, a vector of appropriate mode and length the number of items read (which might be less than n).
For writeBin, a raw vector (if con is a raw vector) or invisibly NULL.
Integer read/writes of size 8 will be available if either C type long is of size 8 bytes or C type long long exists and is of size 8 bytes.
Real read/writes of size sizeof(long double) (usually 12 or 16 bytes) will be available only if that type is available and different from double.
If readBin(what = character()) is used incorrectly on a file which does not contain C-style character strings, warnings (usually many) are given. From a file or connection, the input will be broken into pieces of length 10000 with any final part being discarded.
The ‘R Data Import/Export’ manual.
readChar to read/write fixed-length strings.
connections, readLines, writeLines.
.Machine for the sizes of long, long long and long double.
zzfil <- tempfile("testbin")
zz <- file(zzfil, "wb")
writeBin(1:10, zz)
writeBin(pi, zz, endian = "swap")
writeBin(pi, zz, size = 4)
writeBin(pi^2, zz, size = 4, endian = "swap")
writeBin(pi+3i, zz)
writeBin("A test of a connection", zz)
z <- paste("A very long string", 1:100, collapse = " + ")
writeBin(z, zz)
if(.Machine$sizeof.long == 8 || .Machine$sizeof.longlong == 8)
    writeBin(as.integer(5^(1:10)), zz, size = 8)
if((s <- .Machine$sizeof.longdouble) > 8)
    writeBin((pi/3)^(1:10), zz, size = s)
close(zz)

zz <- file(zzfil, "rb")
readBin(zz, integer(), 4)
readBin(zz, integer(), 6)
readBin(zz, numeric(), 1, endian = "swap")
readBin(zz, numeric(), size = 4)
readBin(zz, numeric(), size = 4, endian = "swap")
readBin(zz, complex(), 1)
readBin(zz, character(), 1)
z2 <- readBin(zz, character(), 1)
if(.Machine$sizeof.long == 8 || .Machine$sizeof.longlong == 8)
    readBin(zz, integer(), 10, size = 8)
if((s <- .Machine$sizeof.longdouble) > 8)
    readBin(zz, numeric(), 10, size = s)
close(zz)
unlink(zzfil)
stopifnot(z2 == z)

## signed vs unsigned ints
zzfil <- tempfile("testbin")
zz <- file(zzfil, "wb")
x <- as.integer(seq(0, 255, 32))
writeBin(x, zz, size = 1)
writeBin(x, zz, size = 1)
x <- as.integer(seq(0, 60000, 10000))
writeBin(x, zz, size = 2)
writeBin(x, zz, size = 2)
close(zz)
zz <- file(zzfil, "rb")
readBin(zz, integer(), 8, size = 1)
readBin(zz, integer(), 8, size = 1, signed = FALSE)
readBin(zz, integer(), 7, size = 2)
readBin(zz, integer(), 7, size = 2, signed = FALSE)
close(zz)
unlink(zzfil)

## use of raw
z <- writeBin(pi^{1:5}, raw(), size = 4)
readBin(z, numeric(), 5, size = 4)
z <- writeBin(c("a", "test", "of", "character"), raw())
readBin(z, character(), 4)
Transfer character strings to and from connections, without assuming they are null-terminated on the connection.
readChar(con, nchars, useBytes = FALSE)
writeChar(object, con, nchars = nchar(object, type = "chars"), eos = "", useBytes = FALSE)
con |
a connection object, or a character string naming a file, or a raw vector. |
nchars |
integer vector, giving the lengths in characters of (unterminated) character strings to be read or written. Elements must be >= 0 and not NA. |
useBytes |
logical: For readChar, should nchars be regarded as a number of bytes not characters in a multi-byte locale? For writeChar, see writeLines. |
object |
a character vector to be written to the connection, at least as long as nchars. |
eos |
‘end of string’: character string. The terminator to be written after each string, followed by an ASCII nul; use NULL for no terminator at all. |
These functions complement readBin and writeBin which read and write C-style zero-terminated character strings. They are for strings of known length, and can optionally write an end-of-string mark. They are intended only for character strings valid in the current locale.
These functions are intended to be used with binary-mode connections.
If con is a character string, the functions call file to obtain a binary-mode file connection which is opened for the duration of the function call.
If the connection is open it is read/written from its current position. If it is not open, it is opened for the duration of the call in an appropriate mode (binary read or write) and then closed again. An open connection must be in binary mode.
If readChar is called with con a raw vector, the data in the vector is used as input. If writeChar is called with con a raw vector, it is just an indication that a raw vector should be returned.
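The raw-vector form gives a convenient way to inspect exactly which bytes are produced. The sketch below uses only ASCII text, so characters and bytes coincide; the variable names are illustrative.

```r
# Write three characters to a raw vector with no terminator (eos = NULL).
r <- writeChar("abc", raw(), eos = NULL)
print(r)        # 61 62 63

# With the default eos = "", an ASCII nul is appended after the string.
r_nul <- writeChar("abc", raw())
print(r_nul)    # 61 62 63 00

# Read the bytes back as two fixed-length strings of 1 and 2 characters.
parts <- readChar(r, c(1, 2))
print(parts)    # "a" "bc"
```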
Character strings containing ASCII nul(s) will be read correctly by readChar but truncated at the first nul with a warning.
If the character length requested for readChar is longer than the data available on the connection, what is available is returned. For writeChar if too many characters are requested the output is zero-padded, with a warning.
Missing strings are written as NA.
For readChar, a character vector of length the number of items read (which might be less than length(nchars)).
For writeChar, a raw vector (if con is a raw vector) or invisibly NULL.
Earlier versions of R allowed embedded NUL bytes within character strings, but not R >= 2.8.0. readChar was commonly used to read fixed-size zero-padded byte fields for which readBin was unsuitable. readChar can still be used for such fields if there are no embedded NULs: otherwise readBin(what = "raw") provides an alternative.
nchars will be interpreted in bytes not characters in a non-UTF-8 multi-byte locale, with a warning.
There is little validity checking of UTF-8 reads.
Using these functions on a text-mode connection may work but should not be mixed with text-mode access to the connection, especially if the connection was opened with an encoding argument.
The ‘R Data Import/Export’ manual.
connections, readLines, writeLines, readBin
## test fixed-length strings
zzfil <- tempfile("testchar")
zz <- file(zzfil, "wb")
x <- c("a", "this will be truncated", "abc")
nc <- c(3, 10, 3)
writeChar(x, zz, nc, eos = NULL)
writeChar(x, zz, eos = "\r\n")
close(zz)

zz <- file(zzfil, "rb")
readChar(zz, nc)
readChar(zz, nchar(x)+3) # need to read the terminator explicitly
close(zz)
unlink(zzfil)
readline reads a line from the terminal (in interactive use).
readline(prompt = "")
prompt |
the string printed when prompting the user for input. Should usually end with a space " ". |
The prompt string will be truncated to a maximum allowed length, normally 256 chars (but can be changed in the source code).
This can only be used in an interactive session.
A character vector of length one. Both leading and trailing spaces and tabs are stripped from the result.
In non-interactive use the result is as if the response was RETURN and the value is "".
readLines for reading text lines from connections, including files.
fun <- function() {
  ANSWER <- readline("Are you a satisfied R user? ")
  ## a better version would check the answer less cursorily, and
  ## perhaps re-prompt
  if (substr(ANSWER, 1, 1) == "n")
    cat("This is impossible. YOU LIED!\n")
  else
    cat("I knew it.\n")
}
if(interactive()) fun()
Read some or all text lines from a connection.
readLines(con = stdin(), n = -1L, ok = TRUE, warn = TRUE, encoding = "unknown", skipNul = FALSE)
con |
a connection object or a character string. |
n |
integer. The (maximal) number of lines to read. Negative values indicate that one should read up to the end of input on the connection. |
ok |
logical. Is it OK to reach the end of the connection before n > 0 lines are read? If not, an error will be generated. |
warn |
logical. Warn if a text file is missing a final EOL or if there are embedded NULs in the file. |
encoding |
encoding to be assumed for input strings. It is used to mark character strings as known to be in Latin-1, UTF-8 or to be bytes: it is not used to re-encode the input. To do the latter, specify the encoding as part of the connection con or via options(encoding=): see the examples. |
skipNul |
logical: should NULs be skipped? |
If the con is a character string, the function calls file to obtain a file connection which is opened for the duration of the function call. This can be a compressed file. (tilde expansion of the file path is done by file.)
If the connection is open it is read from its current position. If it is not open, it is opened in "rt" mode for the duration of the call and then closed (but not destroyed; one must call close to do that).
If the final line is incomplete (no final EOL marker) the behaviour depends on whether the connection is blocking or not. For a non-blocking text-mode connection the incomplete line is pushed back, silently. For all other connections the line will be accepted, with a warning.
Whatever mode the connection is opened in, any of LF, CRLF or CR will be accepted as the EOL marker for a line.
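The EOL handling described above can be checked with a file that mixes all three markers. In this sketch the bytes are written with writeBin so that no platform-specific translation of line endings takes place; the file name prefix is arbitrary.

```r
fil <- tempfile("eol")
# "a" ends in CRLF, "b" ends in a lone CR, "c" ends in LF.
writeBin(charToRaw("a\r\nb\rc\n"), fil)

lines <- readLines(fil)
print(lines)   # "a" "b" "c"
unlink(fil)    # tidy up
```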
Embedded NULs in the input stream will terminate the line currently being read, with a warning (unless skipNul = TRUE or warn = FALSE).
If con is a not-already-open connection with a non-default encoding argument, the text is converted to UTF-8 and declared as such (and the encoding argument to readLines is ignored). See the examples.
A character vector of length the number of lines read.
The elements of the result have a declared encoding if encoding is "latin1" or "UTF-8".
The default connection, stdin, may be different from con = "stdin": see file.
connections, writeLines, readBin, scan
fil <- tempfile(fileext = ".data")
cat("TITLE extra line", "2 3 5 7", "", "11 13 17", file = fil,
    sep = "\n")
readLines(fil, n = -1)
unlink(fil) # tidy up

## difference in blocking
fil <- tempfile("test")
cat("123\nabc", file = fil)
readLines(fil) # line with a warning

con <- file(fil, "r", blocking = FALSE)
readLines(con) # "123"
cat(" def\n", file = fil, append = TRUE)
readLines(con) # gets both
close(con)

unlink(fil) # tidy up

## Not run:
# read a 'Windows Unicode' file
A <- readLines(con <- file("Unicode.txt", encoding = "UCS-2LE"))
close(con)
unique(Encoding(A)) # will most likely be UTF-8
## End(Not run)
Functions to write a single R object to a file, and to restore it.
saveRDS(object, file = "", ascii = FALSE, version = NULL, compress = TRUE, refhook = NULL)
readRDS(file, refhook = NULL)
infoRDS(file)
object |
R object to serialize. |
file |
a connection or the name of the file where the R object is saved to or read from. |
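ascii |
a logical. If TRUE or NA, an ASCII representation is written; otherwise (default), a binary one is used. See the comments in the help for save. |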
version |
the workspace format version to use. |
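compress |
a logical specifying whether saving to a named file is to use "gzip" compression, or one of "gzip", "bzip2" or "xz" to indicate the type of compression to be used. Ignored if file is a connection. |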
refhook |
a hook function for handling reference objects. |
saveRDS and readRDS provide the means to save a single R object to a connection (typically a file) and to restore the object, quite possibly under a different name. This differs from save and load, which save and restore one or more named objects into an environment. They are widely used by R itself, for example to store metadata for a package and to store the help.search databases: the ".rds" file extension is most often used.
Functions serialize and unserialize provide a slightly lower-level interface to serialization: objects serialized to a connection by serialize can be read back by readRDS and conversely.
Function infoRDS retrieves meta-data about serialization produced by saveRDS or serialize. infoRDS cannot be used to detect whether a file is a serialization nor whether it is valid.
All of these interfaces use the same serialization format, but save writes a single line header (typically "RDXs\n") before the serialization of a single object (a pairlist of all the objects to be saved).
If file is a file name, it is opened by gzfile except for save(compress = FALSE) which uses file. Only in that exceptional case are marked encodings of file which cannot be translated to the native encoding handled on Windows.
Compression is handled by the connection opened when file is a file name, so is only possible when file is a connection if handled by the connection. So e.g. url connections will need to be wrapped in a call to gzcon.
If a connection is supplied it will be opened (in binary mode) for the duration of the function if not already open: if it is already open it must be in binary mode for saveRDS(ascii = FALSE) or to read non-ASCII saves.
For readRDS, an R object.
For saveRDS, NULL invisibly.
For infoRDS, an R list with elements version (version number, currently 2 or 3), writer_version (version of R that produced the serialization), min_reader_version (minimum version of R that can read the serialization), format (data representation) and native_encoding (native encoding of the session that produced the serialization, available since version 3). The data representation is given as "xdr" for big-endian binary representation, "ascii" for ASCII representation (produced via ascii = TRUE or ascii = NA) or "binary" (binary representation with native ‘endianness’ which can be produced by serialize).
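A short sketch of inspecting the metadata returned by infoRDS follows. It saves the same small object twice, once in the default binary form and once with ascii = TRUE, and compares the reported format. The object and file name are arbitrary.

```r
fil <- tempfile(fileext = ".rds")

# Default binary save: the representation is reported as "xdr".
saveRDS(1:3, fil)
info_bin <- infoRDS(fil)

# ASCII save: the representation is reported as "ascii".
saveRDS(1:3, fil, ascii = TRUE)
info_ascii <- infoRDS(fil)

print(info_bin$format)
print(info_ascii$format)
print(info_bin$version)   # 2 or 3, depending on the serialization version used
unlink(fil)               # tidy up
```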
Files produced by saveRDS (or serialize to a file connection) are not suitable as an interchange format between machines, for example to download from a website. The files produced by save have a header identifying the file type and so are better protected against erroneous use.
The ‘R Internals’ manual for details of the format used.
fil <- tempfile("women", fileext = ".rds")
## save a single object to file
saveRDS(women, fil)
## restore it under a different name
women2 <- readRDS(fil)
identical(women, women2)
## or examine the object via a connection, which will be opened as needed.
con <- gzfile(fil)
readRDS(con)
close(con)

## Less convenient ways to restore the object
## which demonstrate compatibility with unserialize()
con <- gzfile(fil, "rb")
identical(unserialize(con), women)
close(con)
con <- gzfile(fil, "rb")
wm <- readBin(con, "raw", n = 1e4) # size is a guess
close(con)
identical(unserialize(wm), women)

## Format compatibility with serialize():
fil2 <- tempfile("women")
con <- file(fil2, "w")
serialize(women, con) # ASCII, uncompressed
close(con)
identical(women, readRDS(fil2))
fil3 <- tempfile("women")
con <- bzfile(fil3, "w")
serialize(women, con) # binary, bzip2-compressed
close(con)
identical(women, readRDS(fil3))

unlink(c(fil, fil2, fil3))
Read a file such as ‘.Renviron’ or ‘Renviron.site’ in the format described in the help for Startup, and set environment variables as defined in the file.
readRenviron(path)
path |
A length-one character vector giving the path to the file. Tilde-expansion is performed where supported. |
Scalar logical indicating if the file was read successfully. Returned invisibly. If the file cannot be opened for reading, a warning is given.
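The following sketch writes a one-line environment file to a temporary location and reads it back. GR_DEMO_VAR is a hypothetical variable name used only for this illustration.

```r
# GR_DEMO_VAR is a made-up name; any NAME=value line would do.
env_file <- tempfile("Renviron")
writeLines("GR_DEMO_VAR=hello", env_file)

ok <- readRenviron(env_file)      # returned invisibly, so capture it
val <- Sys.getenv("GR_DEMO_VAR")
print(ok)    # TRUE
print(val)   # "hello"

Sys.unsetenv("GR_DEMO_VAR")       # tidy up
unlink(env_file)
```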
Startup for the file format.
## Not run:
## re-read a startup file (or read it in a vanilla session)
readRenviron("~/.Renviron")
## End(Not run)
Recall is used as a placeholder for the name of the function in which it is called. It allows the definition of recursive functions which still work after being renamed, see example below.
Recall(...)
... |
all the arguments to be passed. |
Recall will not work correctly when passed as a function argument, e.g. to the apply family of functions.
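The supported pattern is to call Recall directly from the body of the function that should recurse. As a minimal sketch, an anonymous function has no name to refer to, so Recall is what lets it call itself:

```r
# An anonymous factorial: Recall() stands in for the missing function name.
result <- (function(n) if (n <= 1) 1 else n * Recall(n - 1))(5)
print(result)   # 120
```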
local for another way to write anonymous recursive functions.
## A trivial (but inefficient!) example:
fib <- function(n)
   if(n<=2) { if(n>=0) 1 else 0 } else Recall(n-1) + Recall(n-2)
fibonacci <- fib; rm(fib)
## renaming wouldn't work without Recall
fibonacci(10) # 55
Registers an R function to be called upon garbage collection of object or (optionally) at the end of an R session.
reg.finalizer(e, f, onexit = FALSE)
e |
object to finalize. Must be an environment or an external pointer. |
f |
function to call on finalization. Must accept a single argument, which will be the object to finalize. |
onexit |
logical: should the finalizer be run if the object is still uncollected at the end of the R session? |
The main purpose of this function is to allow objects that refer to external items (a temporary file, say) to perform cleanup actions when they are no longer referenced from within R. This only makes sense for objects that are never copied on assignment, hence the restriction to environments and external pointers.
Inter alia, it provides a way to program code to be run at the end of an R session without manipulating .Last.
For use in a package, it is often a good idea to set a finalizer on an
object in the namespace: then it will be called at the end of the
session, or soon after the namespace is unloaded if that is done
during the session.
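As a sketch of the temporary-file use case mentioned above, the code below attaches a finalizer to an environment that owns a file. The names make_resource and cleanup are hypothetical helpers, not part of base R, and exactly when the finalizer runs after gc() is not guaranteed.

```r
# Hypothetical cleanup routine: receives the environment being finalized.
cleanup <- function(e) unlink(e$path)

# Hypothetical constructor for an environment that owns a temporary file.
make_resource <- function() {
  res <- new.env()
  res$path <- tempfile("resource")
  file.create(res$path)
  # onexit = TRUE asks for the finalizer to run at session end if the
  # environment has not been collected by then.
  ret <- reg.finalizer(res, cleanup, onexit = TRUE)
  stopifnot(is.null(ret))   # reg.finalizer() returns NULL
  res
}

res <- make_resource()
path <- res$path
rm(res)           # drop the last reference to the environment
invisible(gc())   # finalizers are scheduled at garbage collection
```

The cleanup function deliberately does very little, in keeping with the caution that many functions are unsafe to call from a finalizer.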
NULL.
R's interpreter is not re-entrant and the finalizer could be run in the middle of a computation. So there are many functions which it is potentially unsafe to call from f: one example which caused trouble is options. Finalizers are scheduled at garbage collection but only run at a relatively safe time thereafter.
gc and Memory for garbage collection and memory management.
f <- function(e) print("cleaning....")
g <- function(x){ e <- environment(); reg.finalizer(e, f) }
g()
invisible(gc()) # trigger cleanup
This help page documents the regular expression patterns supported by grep and related functions grepl, regexpr, gregexpr, sub and gsub, as well as by strsplit and optionally by agrep and agrepl.
A ‘regular expression’ is a pattern that describes a set of strings. Two types of regular expressions are used in R, extended regular expressions (the default) and Perl-like regular expressions used by perl = TRUE. There is also fixed = TRUE which can be considered to use a literal regular expression.
Other functions which use regular expressions (often via the use of grep) include apropos, browseEnv, help.search, list.files and ls. These will all use extended regular expressions.
Patterns are described here as they would be printed by cat: (do remember that backslashes need to be doubled when entering R character strings, e.g. from the keyboard).
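The difference between a pattern as typed and as printed by cat can be seen in a short sketch. The pattern here is an arbitrary illustration.

```r
pattern <- "a\\.b"     # typed with a doubled backslash
cat(pattern, "\n")     # prints: a\.b
n <- nchar(pattern)
print(n)               # 4 characters: a, \, ., b

escaped <- grepl(pattern, c("a.b", "axb"))
print(escaped)         # TRUE FALSE

# Without the escape, '.' matches any single character.
unescaped <- grepl("a.b", c("a.b", "axb"))
print(unescaped)       # TRUE TRUE
```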
Long regular expression patterns may or may not be accepted: the POSIX standard only requires up to 256 bytes.
This section covers the regular expressions allowed in the default mode of grep, grepl, regexpr, gregexpr, sub, gsub, regexec and strsplit. They use an implementation of the POSIX 1003.2 standard: that allows some scope for interpretation and the interpretations here are those currently used by R. The implementation supports some extensions to the standard.
Regular expressions are constructed analogously to arithmetic expressions, by using various operators to combine smaller expressions. The whole expression matches zero or more characters (read ‘character’ as ‘byte’ if useBytes = TRUE).
The fundamental building blocks are the regular expressions that match a single character. Most characters, including all letters and digits, are regular expressions that match themselves. Any metacharacter with special meaning may be quoted by preceding it with a backslash. The metacharacters in extended regular expressions are ‘. \ | ( ) [ { ^ $ * + ?’, but note that whether these have a special meaning depends on the context.
Escaping non-metacharacters with a backslash is implementation-dependent. The current implementation interprets ‘\a’ as ‘BEL’, ‘\e’ as ‘ESC’, ‘\f’ as ‘FF’, ‘\n’ as ‘LF’, ‘\r’ as ‘CR’ and ‘\t’ as ‘TAB’. (Note that these will be interpreted by R's parser in literal character strings.)
A character class is a list of characters enclosed between
‘[’ and ‘]’ which matches any single character in that list;
unless the first character of the list is the caret ‘^’, when it
matches any character not in the list. For example, the
regular expression ‘[0123456789]’ matches any single digit, and
‘[^abc]’ matches anything except the characters ‘a’,
‘b’ or ‘c’. A range of characters may be specified by
giving the first and last characters, separated by a hyphen. (Because
their interpretation is locale- and implementation-dependent,
character ranges are best avoided. Some but not all implementations
include both cases in ranges when doing caseless matching.) The only
portable way to specify all ASCII letters is to list them all as the
character class
‘[ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz]’.
(The
current implementation uses numerical order of the encoding, normally a
single-byte encoding or Unicode points.)
Certain named classes of characters are predefined. Their interpretation depends on the locale (see locales); the interpretation below is that of the POSIX locale.
‘[:alnum:]’ |
Alphanumeric characters: ‘[:alpha:]’ and ‘[:digit:]’. |
‘[:alpha:]’ |
Alphabetic characters: ‘[:lower:]’ and ‘[:upper:]’. |
‘[:blank:]’ |
Blank characters: space and tab, and possibly other locale-dependent characters, but on most platforms not including non-breaking space. |
‘[:cntrl:]’ |
Control characters. In ASCII, these characters have octal codes 000 through 037, and 177 (DEL). In another character set, these are the equivalent characters, if any. |
‘[:digit:]’ |
Digits: ‘0 1 2 3 4 5 6 7 8 9’. |
‘[:graph:]’ |
Graphical characters: ‘[:alnum:]’ and ‘[:punct:]’. |
‘[:lower:]’ |
Lower-case letters in the current locale. |
‘[:print:]’ |
Printable characters: ‘[:alnum:]’, ‘[:punct:]’ and space. |
‘[:punct:]’ |
Punctuation characters: ‘! " # $ % & ' ( ) * + , - . / : ; < = > ? @ [ \ ] ^ _ ` { | } ~’. |
‘[:space:]’ |
Space characters: tab, newline, vertical tab, form feed, carriage return, space and possibly other locale-dependent characters – on most platforms this does not include non-breaking spaces. |
‘[:upper:]’ |
Upper-case letters in the current locale. |
‘[:xdigit:]’ |
Hexadecimal digits: ‘0 1 2 3 4 5 6 7 8 9 A B C D E F a b c d e f’. |
For example, ‘[[:alnum:]]’ means ‘[0-9A-Za-z]’, except the
latter depends upon the locale and the character encoding, whereas the
former is independent of locale and character set. (Note that the
brackets in these class names are part of the symbolic names, and must
be included in addition to the brackets delimiting the bracket list.)
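A brief sketch of the named classes in use follows. Note the doubled brackets, and that several class names can share one bracket list. The input strings are ASCII-only and chosen for illustration.

```r
x <- c("abc123", "hello world", "A-1")

no_digits <- gsub("[[:digit:]]", "", x)
print(no_digits)     # "abc" "hello world" "A-"

all_alnum <- grepl("^[[:alnum:]]+$", x)
print(all_alnum)     # TRUE FALSE FALSE

# Two named classes combined in a single bracket list.
underscored <- gsub("[[:space:][:punct:]]", "_", x)
print(underscored)   # "abc123" "hello_world" "A_1"
```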
Most metacharacters lose their special meaning inside a character class. To include a literal ‘]’, place it first in the list. Similarly, to include a literal ‘^’, place it anywhere but first. Finally, to include a literal ‘-’, place it first or last (or, for perl = TRUE only, precede it by a backslash). (Only ‘^ - \ ]’ are special inside character classes.)
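The placement rules above can be combined into a single class that matches any of ‘]’, ‘^’ or ‘-’. This is a small sketch with arbitrary test strings.

```r
x <- c("]", "^", "-", "a")

# ']' first, '^' not first, '-' last: all three are literal.
literal <- grepl("[]^-]", x)
print(literal)   # TRUE TRUE TRUE FALSE

# With perl = TRUE a hyphen may instead be escaped with a backslash.
escaped_hyphen <- grepl("[a\\-z]", c("-", "m"), perl = TRUE)
print(escaped_hyphen)   # TRUE FALSE
```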
The period ‘.’ matches any single character. The symbol ‘\w’ matches a ‘word’ character (a synonym for ‘[[:alnum:]_]’, an extension) and ‘\W’ is its negation (‘[^[:alnum:]_]’). Symbols ‘\d’, ‘\s’, ‘\D’ and ‘\S’ denote the digit and space classes and their negations (these are all extensions).
The caret ‘^’ and the dollar sign ‘$’ are metacharacters that respectively match the empty string at the beginning and end of a line. The symbols ‘\<’ and ‘\>’ match the empty string at the beginning and end of a word. The symbol ‘\b’ matches the empty string at either edge of a word, and ‘\B’ matches the empty string provided it is not at an edge of a word. (The interpretation of ‘word’ depends on the locale and implementation: these are all extensions.)
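A sketch contrasting line anchors with word-boundary symbols, using ASCII input so that the locale-dependent definition of ‘word’ does not come into play:

```r
x <- c("cat", "concat", "a cat sat")

at_start <- grepl("^cat", x)
print(at_start)    # TRUE FALSE FALSE

at_end <- grepl("cat$", x)
print(at_end)      # TRUE TRUE FALSE

whole_word <- grepl("\\bcat\\b", x)
print(whole_word)  # TRUE FALSE TRUE

# \< and \> mark the beginning and end of a word respectively.
whole_word2 <- grepl("\\<cat\\>", x)
print(whole_word2) # TRUE FALSE TRUE
```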
A regular expression may be followed by one of several repetition quantifiers:
‘?’ |
The preceding item is optional and will be matched at most once. |
‘*’ |
The preceding item will be matched zero or more times. |
‘+’ |
The preceding item will be matched one or more times. |
‘{n}’ |
The preceding item is matched exactly n times. |
‘{n,}’ |
The preceding item is matched n or more times. |
‘{n,m}’ |
The preceding item is matched at least n times, but not more than m times. |
By default repetition is greedy, so the maximal possible number of repeats is used. This can be changed to ‘minimal’ by appending ? to the quantifier. (There are further quantifiers that allow approximate matching: see the TRE documentation.)
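The quantifiers and the greedy versus minimal distinction can be sketched as follows; the strings are arbitrary.

```r
x <- c("a", "aa", "aaa", "aaaa")
bounded <- grepl("^a{2,3}$", x)
print(bounded)    # FALSE TRUE TRUE FALSE

optional <- grepl("^ab?$", c("a", "ab", "abb"))
print(optional)   # TRUE TRUE FALSE

# Greedy: '.*' takes as much as possible, so the whole string is replaced.
greedy <- sub("a.*b", "X", "aabab")
print(greedy)     # "X"

# Minimal: '.*?' stops at the first 'b' that completes a match.
minimal <- sub("a.*?b", "X", "aabab")
print(minimal)    # "Xab"
```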
Regular expressions may be concatenated; the resulting regular expression matches any string formed by concatenating the substrings that match the concatenated subexpressions.
Two regular expressions may be joined by the infix operator ‘|’; the resulting regular expression matches any string matching either subexpression. For example, ‘abba|cde’ matches either the string abba or the string cde. Note that alternation does not work inside character classes, where ‘|’ has its literal meaning.
Repetition takes precedence over concatenation, which in turn takes precedence over alternation. A whole subexpression may be enclosed in parentheses to override these precedence rules.
The backreference ‘\N’, where ‘N = 1 ... 9’, matches the substring previously matched by the Nth parenthesized subexpression of the regular expression. (This is an extension for extended regular expressions: POSIX defines them only for basic ones.)
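The precedence rules and backreferences above are illustrated by this sketch, which reuses the ‘abba|cde’ pattern from the text:

```r
x <- c("abba", "cde", "abbacde")

# Parentheses make the anchors apply to both alternatives.
grouped <- grepl("^(abba|cde)$", x)
print(grouped)     # TRUE TRUE FALSE

# Without them, '^' belongs to the first branch and '$' to the second.
ungrouped <- grepl("^abba|cde$", x)
print(ungrouped)   # TRUE TRUE TRUE

# A backreference matches a repeat of what the group matched.
doubled <- grepl("(.)\\1", c("hello", "world"))
print(doubled)     # TRUE FALSE

# In a replacement, \1 and \2 refer to the matched groups.
swapped <- sub("(a+)(b+)", "\\2\\1", "aabbb")
print(swapped)     # "bbbaa"
```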
The perl = TRUE argument to grep, regexpr, gregexpr, sub, gsub and strsplit switches to the PCRE library that implements regular expression pattern matching using the same syntax and semantics as Perl 5.x, with just a few differences.
For complete details please consult the man pages for PCRE, especially man pcrepattern and man pcreapi, on your system or from the sources at https://www.pcre.org. (The version in use can be found by calling extSoftVersion. It need not be the version described in the system's man page. PCRE1 (reported as version < 10.00 by extSoftVersion) has been feature-frozen for some time (essentially 2012), the man pages at https://www.pcre.org/original/doc/html/ should be a good match. PCRE2 (PCRE version >= 10.00) has man pages at https://www.pcre.org/current/doc/html/).
Perl regular expressions can be computed byte-by-byte or (UTF-8) character-by-character: the latter is used in all multibyte locales and if any of the inputs are marked as UTF-8 (see Encoding), or as Latin-1 except in a Latin-1 locale.
All the regular expressions described for extended regular expressions are accepted except ‘\<’ and ‘\>’: in Perl all backslashed metacharacters are alphanumeric and backslashed symbols always are interpreted as a literal character. ‘{’ is not special if it would be the start of an invalid interval specification. There can be more than 9 backreferences (but the replacement in sub can only refer to the first 9).
Character ranges are interpreted in the numerical order of the characters, either as bytes in a single-byte locale or as Unicode code points in UTF-8 mode. So in either case ‘[A-Za-z]’ specifies the set of ASCII letters.
In UTF-8 mode the named character classes only match ASCII characters: see ‘\p’ below for an alternative.
The construct ‘(?...)’ is used for Perl extensions in a variety of ways depending on what immediately follows the ‘?’.
Perl-like matching can work in several modes, set by the options ‘(?i)’ (caseless, equivalent to Perl's ‘/i’), ‘(?m)’ (multiline, equivalent to Perl's ‘/m’), ‘(?s)’ (single line, so a dot matches all characters, even new lines: equivalent to Perl's ‘/s’) and ‘(?x)’ (extended, whitespace data characters are ignored unless escaped and comments are allowed: equivalent to Perl's ‘/x’). These can be concatenated, so for example, ‘(?im)’ sets caseless multiline matching. It is also possible to unset these options by preceding the letter with a hyphen, and to combine setting and unsetting such as ‘(?im-sx)’. These settings can be applied within patterns, and then apply to the remainder of the pattern. Additional options not in Perl include ‘(?U)’ to set ‘ungreedy’ mode (so matching is minimal unless ‘?’ is used as part of the repetition quantifier, when it is greedy). Initially none of these options are set.
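A sketch of the inline mode options, each compared with the default behaviour; the subject strings are arbitrary.

```r
plain_case <- grepl("hello", "HELLO", perl = TRUE)
caseless   <- grepl("(?i)hello", "HELLO", perl = TRUE)
print(c(plain_case, caseless))     # FALSE TRUE

x <- "one\ntwo"
# By default '^' matches only at the start of the subject.
single_anchor <- grepl("^two", x, perl = TRUE)
multi_anchor  <- grepl("(?m)^two", x, perl = TRUE)
print(c(single_anchor, multi_anchor))   # FALSE TRUE

# By default '.' does not match a newline.
dot_default <- grepl("one.two", x, perl = TRUE)
dot_all     <- grepl("(?s)one.two", x, perl = TRUE)
print(c(dot_default, dot_all))     # FALSE TRUE

# (?U) makes quantifiers minimal unless followed by '?'.
ungreedy <- sub("(?U)a.*b", "X", "aabab", perl = TRUE)
print(ungreedy)                    # "Xab"
```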
If you want to remove the special meaning from a sequence of characters, you can do so by putting them between ‘\Q’ and ‘\E’. This is different from Perl in that ‘$’ and ‘@’ are handled as literals in ‘\Q...\E’ sequences in PCRE, whereas in Perl, ‘$’ and ‘@’ cause variable interpolation.
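A small sketch of the quoting described above, using a made-up search string that is full of metacharacters. It assumes the string itself does not contain ‘\E’, which would end the quoted section early.

```r
# A literal string full of regex metacharacters (illustrative data).
needle <- "a.b*(c)"
x <- c("a.b*(c)", "aXbbbc", "see a.b*(c) here")

# Unquoted, the metacharacters keep their special meaning.
as_regex <- grepl(needle, x, perl = TRUE)

# Between \Q and \E every character is taken literally.
as_literal <- grepl(paste0("\\Q", needle, "\\E"), x, perl = TRUE)

# fixed = TRUE is another way to ask for a purely literal match.
as_fixed <- grepl(needle, x, fixed = TRUE)

rbind(as_regex, as_literal, as_fixed)
```

The quoted form is mainly useful when only part of a pattern should be literal; for a wholly literal pattern, fixed = TRUE is simpler.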
The escape sequences ‘\d’, ‘\s’ and ‘\w’ represent
any decimal digit, space character and ‘word’ character
(letter, digit or underscore in the current locale: in UTF-8 mode only
ASCII letters and digits are considered) respectively, and their
upper-case versions represent their negation. Vertical tab was not
regarded as a space character in a C
locale before PCRE 8.34.
Sequences ‘\h’, ‘\v’, ‘\H’ and ‘\V’ match
horizontal and vertical space or the negation. (In UTF-8 mode, these
do match non-ASCII Unicode code points.)
There are additional escape sequences: ‘\cx’ is ‘cntrl-x’ for any ‘x’, ‘\ddd’ is the octal character (for up to three digits unless interpretable as a backreference, as ‘\1’ to ‘\7’ always are), and ‘\xhh’ specifies a character by two hex digits. In a UTF-8 locale, ‘\x{h...}’ specifies a Unicode code point by one or more hex digits. (Note that some of these will be interpreted by R's parser in literal character strings.)
Outside a character class, ‘\A’ matches at the start of a subject (even in multiline mode, unlike ‘^’), ‘\Z’ matches at the end of a subject or before a newline at the end, ‘\z’ matches only at end of a subject, and ‘\G’ matches at first matching position in a subject (which is subtly different from Perl's end of the previous match). ‘\C’ matches a single byte, including a newline, but its use is warned against. In UTF-8 mode, ‘\R’ matches any Unicode newline character (not just CR), and ‘\X’ matches any number of Unicode characters that form an extended Unicode sequence. ‘\X’, ‘\R’ and ‘\B’ cannot be used inside a character class (with PCRE1, they are treated as characters ‘X’, ‘R’ and ‘B’; with PCRE2 they cause an error).
A hyphen (minus) inside a character class is treated as a range, unless it is first or last character in the class definition. It can be quoted to represent the hyphen literal (‘\-’). PCRE1 allows an unquoted hyphen at some other locations inside a character class where it cannot represent a valid range, but PCRE2 reports an error in such cases.
In UTF-8 mode, some Unicode properties may be supported via
‘\p{xx}’ and ‘\P{xx}’ which match characters with and
without property ‘xx’ respectively. For a list of supported
properties see the PCRE documentation, but for example ‘Lu’ is
‘upper case letter’ and ‘Sc’ is ‘currency symbol’. Note
that properties such as ‘\w’, ‘\W’, ‘\d’, ‘\D’, ‘\s’,
‘\S’, ‘\b’ and ‘\B’ by default do not refer to full
Unicode, but one can override this by starting a pattern with ‘(*UCP)’
(which comes with a performance penalty).
(This support depends on the PCRE library being compiled with
‘Unicode property support’, which can be checked via
pcre_config. PCRE2, when compiled with Unicode support, always
supports Unicode properties as well.)
The sequence ‘(?#’ marks the start of a comment which continues up to the next closing parenthesis. Nested parentheses are not permitted. The characters that make up a comment play no part at all in the pattern matching.
If the extended option is set, an unescaped ‘#’ character outside a character class introduces a comment that continues up to the next newline character in the pattern.
The pattern ‘(?:...)’ groups characters just as parentheses do but does not make a backreference.
Patterns ‘(?=...)’ and ‘(?!...)’ are zero-width positive and
negative lookahead assertions: they match if an attempt to
match the ...
forward from the current position would succeed
(or not), but use up no characters in the string being processed.
Patterns ‘(?<=...)’ and ‘(?<!...)’ are the lookbehind
equivalents: they do not allow repetition quantifiers nor ‘\C’
in ...
.
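The lookaround assertions above might be used as in the following sketch. The input strings are invented for the example; note that the lookbehind uses a fixed-length prefix, in line with the restriction on repetition quantifiers.

```r
x <- c("price: 100 USD", "price: 250 EUR", "weight: 100 kg")

# Lookahead: digits followed by " USD"; the currency is not consumed.
usd <- regmatches(x, regexpr("[0-9]+(?= USD)", x, perl = TRUE))

# Lookbehind: digits preceded by "price: " (a fixed-length prefix).
prices <- regmatches(x, regexpr("(?<=price: )[0-9]+", x, perl = TRUE))

# Negative lookahead at the start: elements not beginning with "price".
not_price <- grepl("^(?!price)", x, perl = TRUE)

usd
prices
not_price
```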
regexpr
and gregexpr
support ‘named capture’. If
groups are named, e.g., "(?<first>[A-Z][a-z]+)"
then the
positions of the matches are also returned by name. (Named
backreferences are not supported by sub
.)
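A minimal sketch of reading named captures from regexpr() match data. The helper group() is hypothetical (not part of R), and the sketch relies on the "capture.start", "capture.length" and "capture.names" attributes that regexpr() attaches when perl = TRUE.

```r
x <- c("Ada Lovelace", "Alan Turing", "no match here")
pattern <- "(?<first>[A-Z][a-z]+) (?<last>[A-Z][a-z]+)"
m <- regexpr(pattern, x, perl = TRUE)

nm     <- attr(m, "capture.names")    # group names, in pattern order
starts <- attr(m, "capture.start")    # one column per group
lens   <- attr(m, "capture.length")

# Hypothetical helper: extract one named group, NA where nothing matched.
group <- function(name) {
  j  <- match(name, nm)
  ok <- as.vector(m > 0)
  out <- rep(NA_character_, length(x))
  out[ok] <- substring(x[ok], starts[ok, j], starts[ok, j] + lens[ok, j] - 1L)
  out
}

first <- group("first")
last  <- group("last")
data.frame(first, last)
```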
Atomic grouping, possessive quantifiers and conditional and recursive patterns are not covered here.
This help page is based on the TRE documentation and the POSIX
standard, and the pcre2pattern
man page from PCRE2 10.35.
grep
, apropos
, browseEnv
,
glob2rx
, help.search
, list.files
,
ls
, strsplit
and agrep
.
The TRE regexp syntax.
The POSIX 1003.2 standard at https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html.
The pcre2pattern
or pcrepattern
man
page
(found as part of https://www.pcre.org/original/pcre.txt), and
details of Perl's own implementation at
https://perldoc.perl.org/perlre.
Extract or replace matched substrings from match data obtained by
regexpr
, gregexpr
,
regexec
or gregexec
.
regmatches(x, m, invert = FALSE)
regmatches(x, m, invert = FALSE) <- value
x |
a character vector. |
m |
an object with match data. |
invert |
a logical: if |
value |
an object with suitable replacement values for the
matched or non-matched substrings (see |
If invert
is FALSE
(default), regmatches
extracts
the matched substrings as specified by the match data. For vector
match data (as obtained from regexpr
), empty matches are
dropped; for list match data, empty matches give empty components
(zero-length character vectors).
If invert
is TRUE
, regmatches
extracts the
non-matched substrings, i.e., the strings are split according to the
matches similar to strsplit
(for vector match data, at
most a single split is performed).
If invert
is NA
, regmatches
extracts both
non-matched and matched substrings, always starting and ending with a
non-match (empty if the match occurred at the beginning or the end,
respectively).
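The three settings of invert can be compared side by side; the sketch below uses a small made-up string and list match data from gregexpr().

```r
x <- "a1b22c"
m <- gregexpr("[0-9]+", x)

matched   <- regmatches(x, m)[[1]]                 # the digit runs
unmatched <- regmatches(x, m, invert = TRUE)[[1]]  # the pieces in between
both      <- regmatches(x, m, invert = NA)[[1]]    # alternating, non-match first

matched
unmatched
both

# Pasting the alternating pieces back together recovers the input.
paste(both, collapse = "")
```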
Note that the match data can be obtained from regular expression
matching on a modified version of x
with the same numbers of
characters.
The replacement function can be used for replacing the matched or
non-matched substrings. For vector match data, if invert
is
FALSE
, value
should be a character vector with length the
number of matched elements in m
. Otherwise, it should be a
list of character vectors with the same length as m
, each as
long as the number of replacements needed. Replacement coerces values
to character or list and generously recycles values as needed.
Missing replacement values are not allowed.
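As a sketch of the two shapes of value described above (the strings and replacement tokens are invented): a character vector for vector match data, and a list of character vectors for list match data.

```r
x <- c("one 1", "two 22 and 333", "none")

# Vector match data (regexpr): one replacement per matched element of x.
m1 <- regexpr("[0-9]+", x)
y <- x
regmatches(y, m1) <- c("<A>", "<B>")

# List match data (gregexpr): one character vector per element of x,
# each with as many entries as there are matches in that element.
m2 <- gregexpr("[0-9]+", x)
z <- x
regmatches(z, m2) <- list("<1>", c("<22>", "<333>"), character(0))

y
z
```

The third element has no match, so its replacement vector is empty and the string is left as it was.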
For regmatches
, a character vector with the matched substrings
if m
is a vector and invert
is FALSE
. Otherwise,
a list with the matched and/or non-matched substrings.
For regmatches<-
, the updated character vector.
x <- c("A and B", "A, B and C", "A, B, C and D", "foobar")
pattern <- "[[:space:]]*(,|and)[[:space:]]"
## Match data from regexpr()
m <- regexpr(pattern, x)
regmatches(x, m)
regmatches(x, m, invert = TRUE)
## Match data from gregexpr()
m <- gregexpr(pattern, x)
regmatches(x, m)
regmatches(x, m, invert = TRUE)
## Consider
x <- "John (fishing, hunting), Paul (hiking, biking)"
## Suppose we want to split at the comma (plus spaces) between the
## persons, but not at the commas in the parenthesized hobby lists.
## One idea is to "blank out" the parenthesized parts to match the
## parts to be used for splitting, and extract the persons as the
## non-matched parts.
## First, match the parenthesized hobby lists.
m <- gregexpr("\\([^)]*\\)", x)
## Create blank strings with given numbers of characters.
blanks <- function(n) strrep(" ", n)
## Create a copy of x with the parenthesized parts blanked out.
s <- x
regmatches(s, m) <- Map(blanks, lapply(regmatches(s, m), nchar))
s
## Compute the positions of the split matches (note that we cannot call
## strsplit() on x with match data from s).
m <- gregexpr(", *", s)
## And finally extract the non-matched parts.
regmatches(x, m, invert = TRUE)
## regexec() and gregexec() return overlapping ranges because the
## first match is the full match. This conflicts with regmatches()<-
## and regmatches(..., invert=TRUE). We can work-around by dropping
## the first match.
drop_first <- function(x) {
  if(!anyNA(x) && all(x > 0)) {
    ml <- attr(x, 'match.length')
    if(is.matrix(x)) x <- x[-1,] else x <- x[-1]
    attr(x, 'match.length') <- if(is.matrix(ml)) ml[-1,] else ml[-1]
  }
  x
}
m <- gregexec("(\\w+) \\(((?:\\w+(?:, )?)+)\\)", x)
regmatches(x, m)
try(regmatches(x, m, invert=TRUE))
regmatches(x, lapply(m, drop_first))
## invert=TRUE loses matrix structure because we are retrieving what
## is in between every sub-match
regmatches(x, lapply(m, drop_first), invert=TRUE)
y <- z <- x
## Notice **list**(...) on the RHS
regmatches(y, lapply(m, drop_first)) <- list(c("<NAME>", "<HOBBY-LIST>"))
y
regmatches(z, lapply(m, drop_first), invert=TRUE) <- list(sprintf("<%d>", 1:5))
z
## With `perl = TRUE` and `invert = FALSE` capture group names
## are preserved. Collect functions and arguments in calls:
NEWS <- head(readLines(file.path(R.home(), 'doc', 'NEWS.2')), 100)
m <- gregexec("(?<fun>\\w+)\\((?<args>[^)]*)\\)", NEWS, perl = TRUE)
y <- regmatches(NEWS, m)
y[[16]]
## Make tabular, adding original line numbers
mdat <- as.data.frame(t(do.call(cbind, y)))
mdat <- cbind(mdat, line=rep(seq_along(y), lengths(y) / ncol(mdat)))
head(mdat)
NEWS[head(mdat[['line']])]
remove
and rm
are identical R functions that
can be used to remove objects. These can
be specified successively as character strings, or in the character
vector list
, or through a combination of both. All objects
thus specified will be removed.
If envir
is NULL then the currently active environment is
searched first.
If inherits
is TRUE
then parents of the supplied
environment are searched until a variable with the given name is
encountered. A warning is printed for each variable that is not
found.
remove(..., list = character(), pos = -1, envir = as.environment(pos), inherits = FALSE)
rm (..., list = character(), pos = -1, envir = as.environment(pos), inherits = FALSE)
... |
the objects to be removed, as names (unquoted) or character strings (quoted). |
list |
a character vector (or |
pos |
where to do the removal. By default, uses the current environment. See ‘details’ for other possibilities. |
envir |
the |
inherits |
should the enclosing frames of the environment be inspected? |
The pos
argument can specify the environment from which to remove
the objects in any of several ways:
as an integer (the position in the search
list); as
the character string name of an element in the search list; or as an
environment
(including using sys.frame
to
access the currently active function calls).
The envir
argument is an alternative way to specify an
environment, but is primarily there for back compatibility.
It is not allowed to remove variables from the base environment and
base namespace, nor from any environment which is locked (see
lockEnvironment
).
Earlier versions of R incorrectly claimed that supplying a character
vector in ...
removed the objects named in the character
vector, but it removed the character vector. Use the list
argument to specify objects via a character vector.
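The difference between naming objects in ... and passing a character vector via list can be sketched as follows. The environment demo_env and the object names are made up for the illustration, so that nothing in the workspace is touched apart from the helper variable doomed.

```r
demo_env <- new.env()            # hypothetical scratch environment
assign("a", 1, envir = demo_env)
assign("b", 2, envir = demo_env)
assign("c", 3, envir = demo_env)

# Names given directly in '...', unquoted or quoted.
rm(a, envir = demo_env)
rm("b", envir = demo_env)

# Names held in a character vector go through 'list'.
doomed <- "c"
rm(list = doomed, envir = demo_env)
ls(demo_env)                     # nothing left

# Supplying the vector in '...' removes the vector itself.
rm(doomed)
exists("doomed", inherits = FALSE)
```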
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
tmp <- 1:4
## work with tmp and cleanup
rm(tmp)
## Not run:
## remove (almost) everything in the working environment.
## You will get no warning, so don't do this unless you are really sure.
rm(list = ls())
## End(Not run)
rep
replicates the values in x
. It is a generic
function, and the (internal) default method is described here.
rep.int
and rep_len
are faster simplified versions for
two common cases. Internally, they are generic, so methods can be
defined for them (see InternalMethods).
rep(x, ...)
rep.int(x, times)
rep_len(x, length.out)
x |
a vector (of any mode including a |
... |
further arguments to be passed to or from other methods. For the internal default method these can include:
|
times , length.out
|
see |
The default behaviour is as if the call was
rep(x, times = 1, length.out = NA, each = 1)
. Normally just one of the additional
arguments is specified, but if each
is specified with either
of the other two, its replication is performed first, and then that
implied by times
or length.out
.
If times
consists of a single integer, the result consists of
the whole input repeated this many times. If times
is a
vector of the same length as x
(after replication by
each
), the result consists of x[1]
repeated
times[1]
times, x[2]
repeated times[2]
times and
so on.
length.out
may be given in place of times
,
in which case x
is repeated as many times as is
necessary to create a vector of this length. If both are given,
length.out
takes priority and times
is ignored.
Non-integer values of times
will be truncated towards zero.
If times
is a computed quantity it is prudent to add a small
fuzz or use round
. And analogously for each
.
If x
has length zero and length.out
is supplied and is
positive, the values are filled in using the extraction rules, that is
by an NA
of the appropriate class for an atomic vector
(0
for raw vectors) and NULL
for a list.
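The zero-length case described above can be tried directly. This is only a sketch of the documented behaviour; the last line anticipates the special treatment of NULL mentioned in the notes for this function.

```r
# Zero-length input with a positive length.out: filled as by extraction.
ints  <- rep(integer(0), length.out = 3)     # integer NAs
chars <- rep(character(0), length.out = 2)   # character NAs
raws  <- rep(raw(0), length.out = 2)         # zero bytes
lst   <- rep(list(), length.out = 2)         # NULL components

# NULL is the exception: the result stays NULL.
nul <- rep(NULL, length.out = 3)

str(list(ints = ints, chars = chars, raws = raws, lst = lst, nul = nul))
```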
An object of the same type as x
.
rep.int
and rep_len
return no attributes (except the
class if returning a factor).
The default method of rep
gives the result names (which will
almost always contain duplicates) if x
had names, but retains
no other attributes.
Function rep.int
is a simple case which was provided as a
separate function partly for S compatibility and partly for speed
(especially when names can be dropped). The performance of rep
has been improved since, but rep.int
is still at least twice as
fast when x
has names.
The name rep.int
long precedes making rep
generic.
Function rep
is a primitive, but (partial) matching of argument
names is performed as for normal functions.
For historical reasons rep
(only) works on NULL
: the
result is always NULL
even when length.out
is positive.
Although it has never been documented, these functions have always worked on expression vectors.
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
rep(1:4, 2)
rep(1:4, each = 2) # not the same.
rep(1:4, c(2,2,2,2)) # same as second.
rep(1:4, c(2,1,2,1))
rep(1:4, each = 2, length.out = 4) # first 4 only.
rep(1:4, each = 2, length.out = 10) # 8 integers plus two recycled 1's.
rep(1:4, each = 2, times = 3) # length 24, 3 complete replications
rep(1, 40*(1-.8)) # length 7 on most platforms
rep(1, 40*(1-.8)+1e-7) # better
## replicate a list
fred <- list(happy = 1:10, name = "squash")
rep(fred, 5)
# date-time objects
x <- .leap.seconds[1:3]
rep(x, 2)
rep(as.POSIXlt(x), rep(2, 3))
## named factor
x <- factor(LETTERS[1:4]); names(x) <- letters[1:4]
x
rep(x, 2)
rep(x, each = 2)
rep.int(x, 2) # no names
rep_len(x, 10)
replace
replaces the values in x
with indices given in list
by those given in values
.
If necessary, the values in values
are recycled.
replace(x, list, values)
x |
a vector. |
list |
an index vector. |
values |
replacement values. |
A vector with the values replaced.
x
is unchanged: remember to assign the result.
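A short sketch of typical use, with made-up data: a logical index to fill in missing values, and an integer index with a shorter values vector that is recycled. The original vector is left as it was in both cases.

```r
x <- c(3, NA, 7, NA, 10)

# Logical index: replace every missing value by 0.
y <- replace(x, is.na(x), 0)

# Integer index with a shorter 'values': recycled over the four positions.
z <- replace(1:6, 2:5, c(0L, 9L))

x   # unchanged, still contains NA
y
z
```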
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
The reserved words in R's parser are
if else repeat while function for in next break
TRUE FALSE NULL Inf NaN NA NA_integer_ NA_real_ NA_complex_ NA_character_
... and ..1, ..2 etc, which are used to refer to
arguments passed down from a calling function, see ... .
Reserved words outside quotes are always parsed to be
references to the objects linked to in the ‘Description’, and
hence they are not allowed as syntactic names (see
make.names
). They are allowed as non-syntactic
names, e.g. inside backtick quotes.
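The following sketch illustrates the point about syntactic and non-syntactic names. The helper parses() and the list scores are made up for the example; make.names() is used only to show how a reserved word is turned into a syntactic name.

```r
# Reserved words are not syntactic names; make.names() alters them.
fixed_names <- make.names(c("if", "NULL", "x"))

# Hypothetical helper: does a piece of code get past the parser?
parses <- function(code) {
  !inherits(try(parse(text = code), silent = TRUE), "try-error")
}
bare   <- parses("if <- 1")      # reserved word used as a bare name
quoted <- parses("`if` <- 1")    # same word inside backticks

# Backticks also allow reserved words as element names.
scores <- list(`for` = 1, `NULL` = 2)
scores$`for`
names(scores)
```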
rev
provides a reversed version of its argument. It is a generic
function with a default method for vectors and one for
dendrogram
s.
Note that this is no longer needed (nor efficient) for obtaining
vectors sorted into descending order, since that is now rather more
directly achievable by sort(x, decreasing = TRUE)
.
rev(x)
x |
a vector or another object for which reversal is defined. |
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
x <- c(1:5, 5:3)
## sort into descending order; first more efficiently:
stopifnot(sort(x, decreasing = TRUE) == rev(sort(x)))
stopifnot(rev(1:7) == 7:1) #- don't need 'rev' here
Return the R home directory, or the full path to a component of the R installation.
R.home(component = "home")
component |
|
The R home directory is the top-level directory of the R installation being run.
The R home directory is often referred to as R_HOME,
and is the value of an environment variable of that name in an R
session.
It can be found outside an R session by R RHOME
.
The paths to components often are subdirectories of R_HOME but
need not be: "doc"
, "include"
and "share"
are
not for some Linux binary installations of R.
A character string giving the R home directory or path to a particular component. Normally the components are all subdirectories of the R home directory, but this need not be the case in a Unix-like installation.
The value for "modules"
and on Windows "bin"
is a
sub-architecture-specific location. (This is not so for
"etc"
, which may have sub-architecture-specific files as well
as common ones.)
On a Unix-alike, the constructed paths are based on the current values of the environment variables R_HOME and where set R_SHARE_DIR, R_DOC_DIR and R_INCLUDE_DIR (these are set on startup and should not be altered).
On Windows the values of R.home()
and R_HOME are
switched to the 8.3 short form of path elements if required and if
the Windows service to do that is enabled. The value of
R_HOME is set to use forward slashes (since many package
maintainers pass it unquoted to shells, for example in
‘Makefile’s).
commandArgs()[1]
may provide related information.
## These result quite platform-dependently :
rbind(home = R.home(),
      bin = R.home("bin")) # often the 'bin' sub directory of 'home'
                           # but not always ...
list.files(R.home("bin"))
Compute the lengths and values of runs of equal values in a vector – or the reverse operation.
rle(x)
inverse.rle(x, ...)
## S3 method for class 'rle'
print(x, digits = getOption("digits"), prefix = "", ...)
x |
a vector (atomic, not a list) for |
... |
further arguments; ignored here. |
digits |
number of significant digits for printing, see
|
prefix |
character string, prepended to each printed line. |
‘vector’ is used in the sense of is.vector
.
Missing values are regarded as unequal to the previous value, even if that is also missing.
inverse.rle()
is the inverse function of rle()
,
reconstructing x
from the runs.
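The treatment of missing values can be seen in a small made-up vector: each NA starts a run of its own, and inverse.rle() still reconstructs the input.

```r
x <- c(1, 1, NA, NA, 2, 2, 2)
r <- rle(x)

r$lengths   # the two NAs are separate runs of length 1
r$values

# The longest run and the value it consists of.
longest <- max(r$lengths)
longest_value <- r$values[which.max(r$lengths)]

# Round trip back to the original vector.
identical(inverse.rle(r), x)
```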
rle()
returns an object of class "rle"
which is a list
with components:
lengths |
an integer vector containing the length of each run. |
values |
a vector of the same length as |
inverse.rle()
returns an atomic vector.
x <- rev(rep(6:10, 1:5))
rle(x)
## lengths [1:5] 5 4 3 2 1
## values [1:5] 10 9 8 7 6
z <- c(TRUE, TRUE, FALSE, FALSE, TRUE, FALSE, TRUE, TRUE, TRUE)
rle(z)
rle(as.character(z))
print(rle(z), prefix = "..| ")
N <- integer(0)
stopifnot(x == inverse.rle(rle(x)),
          identical(N, inverse.rle(rle(N))),
          z == inverse.rle(rle(z)))
ceiling
takes a single numeric argument x
and returns a
numeric vector containing the smallest integers not less than the
corresponding elements of x
.
floor
takes a single numeric argument x
and returns a
numeric vector containing the largest integers not greater than the
corresponding elements of x
.
trunc
takes a single numeric argument x
and returns a
numeric vector containing the integers formed by truncating the values in
x
toward 0
.
round
rounds the values in its first argument to the specified
number of decimal places (default 0). See ‘Details’ about
“round to even” when rounding off a 5.
signif
rounds the values in its first argument to the specified
number of significant digits. Hence, for numeric
x
,
signif(x, dig)
is the same as round(x, dig - ceiling(log10(abs(x))))
.
ceiling(x)
floor(x)
trunc(x, ...)
round(x, digits = 0, ...)
signif(x, digits = 6)
x |
a numeric vector. Or, for |
digits |
integer indicating the number of decimal places
( |
... |
arguments to be passed to methods. |
These are generic functions: methods can be defined for them
individually or via the Math
group
generic.
Note that for rounding off a 5, the IEC 60559 standard (see also
‘IEEE 754’) is expected to be used, ‘go to the even digit’.
Therefore round(0.5)
is 0
and round(-1.5)
is
-2
. However, this is dependent on OS services and on
representation error (since e.g. 0.15
is not represented
exactly, the rounding rule applies to the represented number and not
to the printed number, and so round(0.15, 1)
could be either
0.1
or 0.2
).
Rounding to a negative number of digits means rounding to a power of
ten, so for example round(x, digits = -2)
rounds to the nearest
hundred.
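A brief sketch of the two points above, using values chosen so that the halves are exactly representable in binary (so representation error does not interfere).

```r
# Exactly representable halves go to the even neighbour.
halves <- round(c(0.5, 1.5, 2.5, 3.5))

# Negative 'digits' rounds to a power of ten.
hundreds <- round(1234.567, digits = -2)   # nearest hundred
tenths   <- round(1234.567, digits = 1)

# signif() counts significant digits instead of decimal places.
two_sig <- signif(1234.567, digits = 2)

halves
c(hundreds = hundreds, tenths = tenths, two_sig = two_sig)
```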
For signif
the recognized values of digits
are
1...22
, and non-missing values are rounded to the nearest
integer in that range. Each element of the vector is rounded individually,
unlike printing.
These are all primitive functions.
These are all (internally) S4 generic.
ceiling
, floor
and trunc
are members of the
Math
group generic. As an S4
generic, trunc
has only one argument.
round
and signif
are members of the
Math2
group generic.
The realities of computer arithmetic can cause unexpected results,
especially with floor
and ceiling
. For example, we
‘know’ that floor(log(x, base = 8))
for x = 8
is
1
, but 0
has been seen on an R platform. It is
normally necessary to use a tolerance.
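One way to apply such a tolerance is sketched below. The value of tol is an assumption made for the example, not a recommendation from this page; a suitable tolerance depends on the scale and accuracy of the data.

```r
# Mathematically 8, but the computed double is slightly smaller.
x <- (0.1 + 0.7) * 10
naive <- floor(x)

# Assumed tolerance for this sketch; choose one to suit the data.
tol <- sqrt(.Machine$double.eps)
guarded <- floor(x + tol)

print(x, digits = 17)
c(naive = naive, guarded = guarded)
```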
Rounding to decimal digits in binary arithmetic is non-trivial (when
digits != 0
) and may be surprising. Be aware that most decimal
fractions are not exactly representable in binary double precision.
In R 4.0.0, the algorithm for round(x, d), for d > 0, has
been improved to measure and round “to nearest even”,
contrary to earlier versions of R (or also to
sprintf()
or format()
based rounding).
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
The ISO/IEC/IEEE 60559:2011 standard is available for money from https://www.iso.org.
The IEEE 754:2008 standard is more openly documented, e.g., at https://en.wikipedia.org/wiki/IEEE_754.
as.integer
.
Package round's roundX()
for several
versions or implementations of rounding, including some previous and the
current R version (as version = "3d.C"
).
round(.5 + -2:4) # IEEE / IEC rounding: -2 0 0 2 2 4 4
## (this is *good* behaviour -- do *NOT* report it as bug !)
( x1 <- seq(-2, 4, by = .5) )
round(x1) #-- IEEE / IEC rounding !
x1[trunc(x1) != floor(x1)]
x1[round(x1) != floor(x1 + .5)]
(non.int <- ceiling(x1) != floor(x1))
x2 <- pi * 100^(-1:3)
round(x2, 3)
signif(x2, 3)
Round or truncate date-time objects.
## S3 method for class 'POSIXt'
round(x, units = c("secs", "mins", "hours", "days", "months", "years"))
## S3 method for class 'POSIXt'
trunc(x, units = c("secs", "mins", "hours", "days", "months", "years"), ...)
## S3 method for class 'Date'
round(x, ...)
## S3 method for class 'Date'
trunc(x, units = c("secs", "mins", "hours", "days", "months", "years"), ...)
x |
|
units |
one of the units listed, a string. Can be abbreviated. |
... |
arguments to be passed to or from other methods, notably
|
The time is rounded or truncated to the second, minute, hour, day, month or year. Time zones are only relevant to days or more, when midnight in the current time zone is used.
For units
arguments besides “months” and “years”,
the methods for class "Date"
are of little use except to remove
fractional days.
An object of class "POSIXlt"
or "Date"
.
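A deterministic sketch of these methods, using a fixed instant in UTC rather than the current time. The date and time are made up; the "months" and "years" units assume a version of R that offers them, as in the usage shown on this page.

```r
# A fixed instant, so the results do not depend on when this is run.
t0 <- as.POSIXct("2024-03-15 13:47:31", tz = "UTC")

hour_floor   <- trunc(t0, "hours")    # drops minutes and seconds
hour_nearest <- round(t0, "hours")    # 13:47:31 is nearer to 14:00
month_start  <- trunc(t0, "months")   # midnight on the first of the month

d0 <- as.Date("2024-03-15")
year_start <- trunc(d0, "years")      # stays a "Date"

format(hour_floor, "%Y-%m-%d %H:%M:%S")
format(hour_nearest, "%Y-%m-%d %H:%M:%S")
format(month_start, "%Y-%m-%d")
format(year_start)
```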
round
for the generic function and default methods.
round(.leap.seconds + 1000, "hour")
trunc(Sys.time(), "day")
(timM <- trunc(Sys.time() -> St, "months")) # shows timezone
(datM <- trunc(Sys.Date() -> Sd, "months"))
(timY <- trunc(St, "years")) # + timezone
(datY <- trunc(Sd, "years"))
stopifnot(inherits(datM, "Date"), inherits(timM, "POSIXt"),
          substring(format(datM), 9,10) == "01", # first of month
          substring(format(datY), 6,10) == "01-01", # Jan 1
          identical(format(datM), format(timM)),
          identical(format(datY), format(timY)))
Returns a matrix of integers indicating their row number in a matrix-like object, or a factor indicating the row labels.
row(x, as.factor = FALSE)
.row(dim)
x |
a matrix-like object, that is one with a two-dimensional
|
dim |
a matrix dimension, i.e., an integer valued numeric vector of length two (with non-negative entries). |
as.factor |
a logical value indicating whether the value should be returned as a factor of row labels (created if necessary) rather than as numbers. |
An integer (or factor) matrix with the same dimensions as x
and whose
ij
-th element is equal to i
(or the i
-th row label).
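The two forms of the result can be sketched on a small matrix with made-up row labels; the last lines use row() together with col() to pick out the upper triangle.

```r
m <- matrix(1:6, nrow = 2, dimnames = list(c("r1", "r2"), NULL))

idx <- row(m)                     # integer matrix of row numbers
lab <- row(m, as.factor = TRUE)   # factor matrix of row labels

idx
levels(lab)

# .row() needs only the dimensions.
same <- identical(idx, .row(dim(m)))

# Elements on or above the diagonal, in column-major order.
upper <- m[row(m) <= col(m)]
upper
```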
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
col
to get columns;
slice.index
for a general way to get slice indices
in an array.
x <- matrix(1:12, 3, 4)
# extract the diagonal of a matrix - more slowly than diag(x)
dx <- x[row(x) == col(x)]
dx
# create an identity 5-by-5 matrix more slowly than diag(n = 5):
x <- matrix(0, nrow = 5, ncol = 5)
x[row(x) == col(x)] <- 1
x
(i34 <- .row(3:4))
stopifnot(identical(i34, .row(c(3,4)))) # 'dim' maybe "double"
All data frames have row names, a character vector of length the number of rows with no duplicates nor missing values.
There are generic functions for getting and setting row names,
with default methods for arrays.
The description here is for the data.frame
method.
`.rowNamesDF<-`
is a (non-generic replacement) function to set
row names for data frames, with extra argument make.names
.
This function only exists as a workaround as we cannot easily change the
row.names<-
generic without breaking legacy code in existing packages.
row.names(x)
row.names(x) <- value
.rowNamesDF(x, make.names=FALSE) <- value
x |
object of class |
make.names |
|
value |
an object to be coerced to character unless an integer
vector. It should have (after coercion) the same length as the
number of rows of |
A data frame has (by definition) a vector of row names which has length the number of rows in the data frame, and contains neither missing nor duplicated values. Where a row names sequence has been added by the software to meet this requirement, they are regarded as ‘automatic’.
Row names are currently allowed to be integer or character, but for backwards compatibility (with R <= 2.4.0) row.names will always return a character vector. (Use attr(x, "row.names") if you need to retrieve an integer-valued set of row names.)
Using NULL for the value resets the row names to seq_len(nrow(x)), regarded as ‘automatic’.
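The behaviour just described can be checked with a short sketch. The data frame `df` and its labels are invented for illustration; `.row_names_info` is the helper listed under See Also, and the sketch relies on its default result being negative for ‘automatic’ row names.

```r
# Illustrative data frame with explicit character row names
df <- data.frame(x = 1:3, row.names = c("a", "b", "c"))
rn_chr <- row.names(df)          # always a character vector

# Resetting with NULL gives 'automatic' row names 1, 2, 3
row.names(df) <- NULL
rn_auto <- row.names(df)         # still returned as character
rn_int  <- attr(df, "row.names") # the integer-valued row names
info    <- .row_names_info(df)   # negative count signals 'automatic'
```

Code that needs to distinguish automatic from user-supplied row names would therefore look at `.row_names_info()` rather than at the value of `row.names()`.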
row.names returns a character vector.
row.names<- returns a data frame with the row names changed.
row.names is similar to rownames for arrays, and it has a method that calls rownames for an array argument.
Row names of the form 1:n for n > 2 are stored internally in a compact form, which might be seen from C code or by deparsing but never via row.names or attr(x, "row.names"). Additionally, some names of this sort are marked as ‘automatic’ and handled differently by as.matrix and data.matrix (and potentially other functions). (All zero-row data frames are regarded as having automatic row names.)
Chambers, J. M. (1992) Data for models. Chapter 3 of Statistical Models in S eds J. M. Chambers and T. J. Hastie, Wadsworth & Brooks/Cole.
.row_names_info for the internal representations.
## To illustrate the note:
df <- data.frame(x = c(TRUE, FALSE, NA, NA), y = c(12, 34, 56, 78))
row.names(df) <- 1 : 4
attr(df, "row.names") #> 1:4
deparse(df) # or dput(df)
##--> c(NA, 4L) : Compact storage, *not* regarded as automatic.
row.names(df) <- NULL
attr(df, "row.names") #> 1:4
deparse(df) # or dput(df) -- shows
##--> c(NA, -4L) : Compact storage, regarded as automatic.
Retrieve or set the row or column names of a matrix-like object.
rownames(x, do.NULL = TRUE, prefix = "row")
rownames(x) <- value
colnames(x, do.NULL = TRUE, prefix = "col")
colnames(x) <- value
x |
a matrix-like R object, with at least two dimensions for
|
do.NULL |
logical. If |
prefix |
for created names. |
value |
a valid value for that component of
|
The extractor functions try to do something sensible for any matrix-like object x. If the object has dimnames the first component is used as the row names, and the second component (if any) is used for the column names. For a data frame, rownames and colnames eventually call row.names and names respectively, but the latter are preferred.
If do.NULL is FALSE, a character vector (of length NROW(x) or NCOL(x)) is returned in any case, prepending prefix to simple numbers, if there are no dimnames or the corresponding component of the dimnames is NULL.
The replacement methods for arrays/matrices coerce vector and factor values of value to character, but do not dispatch methods for as.character.
For a data frame, value for rownames should be a character vector of non-duplicated and non-missing names (this is enforced), and for colnames a character vector of (preferably) unique syntactically-valid names. In both cases, value will be coerced by as.character, and setting colnames will convert the row names to character.
If the replacement versions are called on a matrix without any existing dimnames, they will add suitable dimnames. But constructions such as
rownames(x)[3] <- "c"
may not work unless x already has dimnames, since this will create a length-3 value from the NULL value of rownames(x).
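A minimal sketch of this pitfall, assuming a 4-row matrix `m` made up for the purpose: the length-3 value does not match the four rows, so the assignment fails. Creating default names first with `do.NULL = FALSE` is one possible workaround, not the only one.

```r
m <- matrix(1:8, nrow = 4)   # illustrative matrix without dimnames

# rownames(m) is NULL, so indexing position 3 yields a length-3 value,
# which cannot be used as row names for a 4-row matrix.
res <- try({ rownames(m)[3] <- "c" }, silent = TRUE)
failed <- inherits(res, "try-error")

# One workaround: create default names first, then replace a single one.
rownames(m) <- rownames(m, do.NULL = FALSE)
rownames(m)[3] <- "c"
rn <- rownames(m)
```

Note that with exactly three rows the original construction would run, but it would leave NA names in positions 1 and 2.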
dimnames, case.names, variable.names.
m0 <- matrix(NA, 4, 0)
rownames(m0)
m2 <- cbind(1, 1:4)
colnames(m2, do.NULL = FALSE)
colnames(m2) <- c("x","Y")
rownames(m2) <- rownames(m2, do.NULL = FALSE, prefix = "Obs.")
m2
Compute column sums across rows of a numeric matrix-like object for each level of a grouping variable. rowsum is generic, with a method for data frames and a default method for vectors and matrices.
rowsum(x, group, reorder = TRUE, ...)
## S3 method for class 'data.frame'
rowsum(x, group, reorder = TRUE, na.rm = FALSE, ...)
## Default S3 method:
rowsum(x, group, reorder = TRUE, na.rm = FALSE, ...)
x |
a matrix, data frame or vector of numeric data. Missing values are allowed. A numeric vector will be treated as a column vector. |
group |
a vector or factor giving the grouping, with one element
per row of |
reorder |
if |
na.rm |
logical ( |
... |
other arguments to be passed to or from methods. |
The default is to reorder the rows to agree with tapply as in the example below. Reordering should not add noticeably to the time except when there are very many distinct values of group and x has few columns.
The original function was written by Terry Therneau, but this is a new implementation using hashing that is much faster for large matrices.
To sum over all the rows of a matrix (i.e., a single group) use colSums, which should be even faster.
For integer arguments, over/underflow in forming the sum results in NA.
A matrix or data frame containing the sums. There will be one row per unique value of group.
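A small deterministic sketch of the return value and of the reorder argument; the matrix `x` and grouping vector `g` are invented for illustration:

```r
x <- matrix(1:6, ncol = 2)   # rows: (1,4), (2,5), (3,6)
g <- c("b", "a", "b")        # group label for each row

rs_sorted <- rowsum(x, g)                   # one row per group, sorted: "a", "b"
rs_asis   <- rowsum(x, g, reorder = FALSE)  # groups in order of first appearance

rs_sorted["b", ]   # column sums over rows 1 and 3
```

Here group "b" collects rows 1 and 3, so its sums are 1 + 3 and 4 + 6.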
require(stats)
x <- matrix(runif(100), ncol = 5)
group <- sample(1:8, 20, TRUE)
(xsum <- rowsum(x, group))
## Slower versions
tapply(x, list(group[row(x)], col(x)), sum)
t(sapply(split(as.data.frame(x), group), colSums))
aggregate(x, list(group), sum)[-1]
Register S3 methods in R scripts.
.S3method(generic, class, method)
generic |
a character string naming an S3 generic function. |
class |
a character string naming an S3 class. |
method |
a character string or function giving the S3 method to
be registered. If not given, the function named
|
This function should only be used in R scripts: for package code, one should use the corresponding ‘S3method’ ‘NAMESPACE’ directive.
## Create a generic function and register a method for objects
## inheriting from class 'cls':
gen <- function(x) UseMethod("gen")
met <- function(x) writeLines("Hello world.")
.S3method("gen", "cls", met)
## Create an object inheriting from class 'cls', and call the
## generic on it:
x <- structure(123, class = "cls")
gen(x)
sample takes a sample of the specified size from the elements of x, either with or without replacement.
sample(x, size, replace = FALSE, prob = NULL)
sample.int(n, size = n, replace = FALSE, prob = NULL,
           useHash = (n > 1e+07 && !replace && is.null(prob) && size <= n/2))
x |
either a vector of one or more elements from which to choose, or a positive integer. See ‘Details.’ |
n |
a positive number, the number of items to choose from. See ‘Details.’ |
size |
a non-negative integer giving the number of items to choose. |
replace |
should sampling be with replacement? |
prob |
a vector of probability weights for obtaining the elements of the vector being sampled. |
useHash |
|
If x has length 1, is numeric (in the sense of is.numeric) and x >= 1, sampling via sample takes place from 1:x. Note that this convenience feature may lead to undesired behaviour when x is of varying length in calls such as sample(x). See the examples.
Otherwise x can be any R object for which length and subsetting by integers make sense: S3 or S4 methods for these operations will be dispatched as appropriate.
For sample the default for size is the number of items inferred from the first argument, so that sample(x) generates a random permutation of the elements of x (or 1:x).
It is allowed to ask for size = 0 samples with n = 0 or a length-zero x, but otherwise n > 0 or positive length(x) is required.
Non-integer positive numerical values of n or x will be truncated to the next smallest integer, which has to be no larger than .Machine$integer.max.
The optional prob argument can be used to give a vector of weights for obtaining the elements of the vector being sampled. They need not sum to one, but they should be non-negative and not all zero. If replace is true, Walker's alias method (Ripley, 1987) is used when there are more than 200 reasonably probable values: this gives results incompatible with those from R < 2.2.0.
If replace is false, these probabilities are applied sequentially, that is the probability of choosing the next item is proportional to the weights amongst the remaining items. The number of nonzero weights must be at least size in this case.
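The following sketch illustrates the weighting rules above. The weight vector `w`, its names and the seed are arbitrary choices made for the example.

```r
set.seed(1)                         # arbitrary seed, for reproducibility only
w <- c(a = 5, b = 3, c = 2, d = 0)  # weights need not sum to one

# With replacement: an item with zero weight is never drawn.
draws <- sample(names(w), 1000, replace = TRUE, prob = w)
table(draws) / 1000                 # roughly proportional to the weights

# Without replacement: 'size' may not exceed the number of nonzero weights.
s3  <- sample(names(w), 3, prob = w)                      # fine: three nonzero weights
res <- try(sample(names(w), 4, prob = w), silent = TRUE)  # fails: only three are nonzero
too_many <- inherits(res, "try-error")
```

Wrapping the oversized request in `try()` is only there to let the sketch run to completion; in real code the size would normally be checked beforehand.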
sample.int is a bare interface in which both n and size must be supplied as integers.
Argument n can be larger than the largest integer of type integer, up to the largest representable integer in type double. Only uniform sampling is supported. Two random numbers are used to ensure uniform sampling of large integers.
For sample a vector of length size with elements drawn from either x or from the integers 1:x.
For sample.int, an integer vector of length size with elements from 1:n, or a double vector if n >= 2^31.
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
Ripley, B. D. (1987) Stochastic Simulation. Wiley.
RNGkind(sample.kind = ..) about random number generation, notably the change of sample() results with R version 3.6.0.
CRAN package sampling for other methods of weighted sampling without replacement.
x <- 1:12
# a random permutation
sample(x)
# bootstrap resampling -- only if length(x) > 1 !
sample(x, replace = TRUE)
# 100 Bernoulli trials
sample(c(0,1), 100, replace = TRUE)
## More careful bootstrapping -- Consider this when using sample()
## programmatically (i.e., in your function or simulation)!
# sample()'s surprise -- example
x <- 1:10
sample(x[x > 8]) # length 2
sample(x[x > 9]) # oops -- length 10!
sample(x[x > 10]) # length 0
## safer version:
resample <- function(x, ...) x[sample.int(length(x), ...)]
resample(x[x > 8]) # length 2
resample(x[x > 9]) # length 1
resample(x[x > 10]) # length 0
## R 3.0.0 and later
sample.int(1e10, 12, replace = TRUE)
sample.int(1e10, 12) # not that there is much chance of duplicates
save writes an external representation of R objects to the specified file. The objects can be read back from the file at a later date by using the function load or attach (or data in some cases).
save.image() is just a short-cut for ‘save my current workspace’, i.e., save(list = ls(all.names = TRUE), file = ".RData", envir = .GlobalEnv). It is also what happens with q("yes").
save(..., list = character(),
     file = stop("'file' must be specified"),
     ascii = FALSE, version = NULL, envir = parent.frame(),
     compress = isTRUE(!ascii), compression_level,
     eval.promises = TRUE, precheck = TRUE)
save.image(file = ".RData", version = NULL, ascii = FALSE,
           compress = !ascii, safe = TRUE)
... |
the names of the objects to be saved (as symbols or character strings). |
list |
a character vector (or |
file |
a (writable binary-mode) connection or the name of the
file where the data will be saved (when tilde expansion
is done). Must be a file name for |
ascii |
if |
version |
the workspace format version to use. |
envir |
environment to search for objects to be saved. |
compress |
logical or character string specifying whether saving
to a named file is to use compression. |
compression_level |
integer: the level of compression to be
used. Defaults to |
eval.promises |
logical: should objects which are promises be forced before saving? |
precheck |
logical: should the existence of the objects be checked before starting to save (and in particular before opening the file/connection)? Does not apply to version 1 saves. |
safe |
logical. If |
The names of the objects specified either as symbols (or character strings) in ... or as a character vector in list are used to look up the objects from environment envir. By default promises are evaluated, but if eval.promises = FALSE promises are saved (together with their evaluation environments). (Promises embedded in objects are always saved unevaluated.)
All R platforms use the XDR (big-endian) representation of C ints and doubles in binary save-d files, and these are portable across all R platforms.
ASCII saves used to be useful for moving data between platforms but are now mainly of historical interest. They can be more compact than binary saves where compression is not used, but are almost always slower to both read and write: binary saves compress much better than ASCII ones. Further, decimal ASCII saves may not restore double/complex values exactly, and what value is restored may depend on the R platform.
Default values for the ascii, compress, safe and version arguments can be modified with the "save.defaults" option (used both by save and save.image), see also the ‘Examples’ section. If a "save.image.defaults" option is set it is used in preference to "save.defaults" for function save.image (which allows this to have different defaults). In addition, compression_level can be part of the "save.defaults" option.
A connection that is not already open will be opened in mode "wb". Supplying a connection which is open and not in binary mode gives an error.
Large files can be reduced considerably in size by compression. A particular 46MB R object was saved as 35MB without compression in 2 seconds, 22MB with gzip compression in 8 secs, 19MB with bzip2 compression in 13 secs and 9.4MB with xz compression in 40 secs. The load times were 1.3, 2.8, 5.5 and 5.7 seconds respectively. These results are indicative, but the relative performances do depend on the actual file: xz compressed unusually well here.
It is possible to compress later (with gzip, bzip2 or xz) a file saved with compress = FALSE: the effect is the same as saving with compression. Also, a saved file can be uncompressed and re-compressed under a different compression scheme (and see resaveRdaFiles for a way to do so from within R).
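To see the effect of the compress argument on a particular object, a sketch along the following lines can be used. The object `obj` is an arbitrary, highly repetitive vector chosen so that it compresses well, and the files are temporary; sizes and timings for real data will differ.

```r
obj <- rep(letters, 1e4)   # illustrative, highly compressible object

files <- c(none = tempfile(), gzip = tempfile(), xz = tempfile())
save(obj, file = files[["none"]], compress = FALSE)
save(obj, file = files[["gzip"]], compress = "gzip")
save(obj, file = files[["xz"]],   compress = "xz")

sizes <- setNames(file.size(files), names(files))
sizes                      # bytes on disk for each scheme

# load() copes with any of the compression schemes.
e <- new.env()
load(files[["xz"]], envir = e)
same <- identical(e$obj, obj)

unlink(files)              # tidy up
```

Comparing `sizes` for a representative object is usually a quicker guide than general rules, since (as noted above) relative performance depends on the actual data.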
The fact that file can be a connection can be exploited to make use of an external parallel compression utility such as pigz (https://zlib.net/pigz/) or pbzip2 (https://launchpad.net/pbzip2) via a pipe connection. For example, using 8 threads,
con <- pipe("pigz -p8 > fname.gz", "wb")
save(myObj, file = con); close(con)
con <- pipe("pbzip2 -p8 -9 > fname.bz2", "wb")
save(myObj, file = con); close(con)
con <- pipe("xz -T8 -6 -e > fname.xz", "wb")
save(myObj, file = con); close(con)
where the last requires xz 5.1.1 or later built with support for multiple threads (and parallel compression is only effective for large objects: at level 6 it will compress in serialized chunks of 12MB).
The ... arguments only give the names of the objects to be saved: they are searched for in the environment given by the envir argument, and the actual objects given as arguments need not be those found.
Saved R objects are binary files, even those saved with ascii = TRUE, so ensure that they are transferred without conversion of end-of-line markers and of 8-bit characters. The lines are delimited by LF on all platforms.
Although the default version was not changed between R 1.4.0 and R 3.4.4 nor since R 3.5.0, this does not mean that saved files are necessarily backwards compatible. You will be able to load a saved image into an earlier version of R which supports its version unless use is made of later additions (for example for version 2, raw vectors, external pointers and some S4 objects).
One such ‘later addition’ was long vectors, introduced in R 3.0.0 and loadable only on 64-bit platforms.
Loading files saved with ascii = NA requires a C99-compliant C function sscanf: this is a problem on Windows, first worked around in R 3.1.2: version-2 files in that format should be readable in earlier versions of R on all other platforms.
For saving single R objects, saveRDS() is mostly preferable to save(), notably because of the functional nature of readRDS(), as opposed to load().
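The difference in style can be sketched as follows, using temporary files and an arbitrary object `x`: load() restores the object under its saved name and returns only that name, whereas readRDS() returns the object itself so the caller chooses the name.

```r
f_rda <- tempfile(fileext = ".RData")
f_rds <- tempfile(fileext = ".rds")
x <- 1:5                      # illustrative object

save(x, file = f_rda)
saveRDS(x, f_rds)

# load(): objects reappear under their original names in 'envir';
# the return value is the character vector of names restored.
e  <- new.env()
nm <- load(f_rda, envir = e)

# readRDS(): the value is returned, so it can be bound to any name.
y <- readRDS(f_rds)

unlink(c(f_rda, f_rds))       # tidy up
```

Loading into a fresh environment, as done here, is one way to avoid load() silently overwriting objects of the same name in the workspace.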
The most common reason for failure is lack of write permission in the current directory. For save.image and for saving at the end of a session this will be shown by messages like
Error in gzfile(file, "wb") : unable to open connection
In addition: Warning message:
In gzfile(file, "wb") :
  cannot open compressed file '.RDataTmp',
  probable reason 'Permission denied'
For other interfaces to the underlying serialization format, see serialize and saveRDS.
x <- stats::runif(20)
y <- list(a = 1, b = TRUE, c = "oops")
save(x, y, file = "xy.RData")
save.image() # creating ".RData" in current working directory
unlink("xy.RData")
# set save defaults using option:
options(save.defaults = list(ascii = TRUE, safe = FALSE))
save.image() # creating ".RData"
if(interactive()) withAutoprint({
   file.info(".RData")
   readLines(".RData", n = 7) # first 7 lines; first starts w/ "RDA"..
})
unlink(".RData")
scale is a generic function whose default method centers and/or scales the columns of a numeric matrix.
scale(x, center = TRUE, scale = TRUE)
x |
a numeric matrix(like object). |
center |
either a logical value or numeric-alike vector of length
equal to the number of columns of |
scale |
either a logical value or a numeric-alike vector of length
equal to the number of columns of |
The value of center determines how column centering is performed. If center is a numeric-alike vector with length equal to the number of columns of x, then each column of x has the corresponding value from center subtracted from it. If center is TRUE then centering is done by subtracting the column means (omitting NAs) of x from their corresponding columns, and if center is FALSE, no centering is done.
The value of scale determines how column scaling is performed (after centering). If scale is a numeric-alike vector with length equal to the number of columns of x, then each column of x is divided by the corresponding value from scale. If scale is TRUE then scaling is done by dividing the (centered) columns of x by their standard deviations if center is TRUE, and the root mean square otherwise. If scale is FALSE, no scaling is done.
The root-mean-square for a (possibly centered) column is defined as $\sqrt{\sum(x^2)/(n-1)}$, where $x$ is a vector of the non-missing values and $n$ is the number of non-missing values. In the case center = TRUE, this is the same as the standard deviation, but in general it is not. (To scale by the standard deviations without centering, use scale(x, center = FALSE, scale = apply(x, 2, sd, na.rm = TRUE)).)
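The root-mean-square definition, with its divisor of n - 1, can be verified numerically. The matrix `x` below is made-up data with no missing values.

```r
x <- matrix(c(1, 2, 3, 4, 10, 20, 30, 40), ncol = 2)   # illustrative data

# Without centering, scale = TRUE divides by the root mean square.
s   <- scale(x, center = FALSE)
rms <- sqrt(colSums(x^2) / (nrow(x) - 1))
rms_ok <- isTRUE(all.equal(attr(s, "scaled:scale"), rms))

# With centering, the same quantity is the column standard deviation.
z <- scale(x)
sd_ok <- isTRUE(all.equal(attr(z, "scaled:scale"), apply(x, 2, sd)))
```

For uncentered data the root mean square generally exceeds the standard deviation, which is why the two scalings are not interchangeable.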
For scale.default, the centered, scaled matrix. The numeric centering and scalings used (if any) are returned as attributes "scaled:center" and "scaled:scale".
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
sweep which allows centering (and scaling) with arbitrary statistics.
For working with the scale of a plot, see par.
require(stats)
x <- matrix(1:10, ncol = 2)
(centered.x <- scale(x, scale = FALSE))
cov(centered.scaled.x <- scale(x)) # all 1
Read data into a vector or list from the console or file.
scan(file = "", what = double(), nmax = -1, n = -1, sep = "",
     quote = if(identical(sep, "\n")) "" else "'\"", dec = ".",
     skip = 0, nlines = 0, na.strings = "NA",
     flush = FALSE, fill = FALSE, strip.white = FALSE,
     quiet = FALSE, blank.lines.skip = TRUE, multi.line = TRUE,
     comment.char = "", allowEscapes = FALSE,
     fileEncoding = "", encoding = "unknown", text, skipNul = FALSE)
file |
the name of a file to read data values from. If the
specified file is Otherwise, the file name is interpreted relative to the
current working directory (given by This can be a compressed file (see Alternatively,
To read a data file not in the current encoding (for example a
Latin-1 file in a UTF-8 locale or conversely) use a
|
what |
the type of |
nmax |
the maximum number of data values to be read, or if
|
n |
integer: the maximum number of data values to be read, defaulting to no limit. Invalid values will be ignored. |
sep |
by default, scan expects to read ‘white-space’
delimited input fields. Alternatively, If specified this should be the empty character string (the default)
or |
quote |
the set of quoting characters as a single character
string or |
dec |
decimal point character. This should be a character string
containing just one single-byte character. ( |
skip |
the number of lines of the input file to skip before beginning to read data values. |
nlines |
if positive, the maximum number of lines of data to be read. |
na.strings |
character vector. Elements of this vector are to be
interpreted as missing ( |
flush |
logical: if |
fill |
logical: if |
strip.white |
vector of logical value(s) corresponding to items
in the If |
quiet |
logical: if |
blank.lines.skip |
logical: if |
multi.line |
logical. Only used if |
comment.char |
character: a character vector of length one
containing a single character or an empty string. Use |
allowEscapes |
logical. Should C-style escapes such as ‘\n’ be processed or read verbatim (the default)? Note that if not within quotes these could be interpreted as a delimiter (but not as a comment character). The escapes which are interpreted are the control characters ‘\a, \b, \f, \n, \r, \t, \v’ and octal and hexadecimal representations like ‘\040’ and ‘\0x2A’. Any other escaped character is treated as itself, including backslash. Note that Unicode escapes (starting ‘\u’ or ‘\U’: see Quotes) are never processed. |
fileEncoding |
character string: if non-empty declares the
encoding used on a file (not a connection nor the keyboard) so the
character data can be re-encoded. See the ‘Encoding’ section
of the help for |
encoding |
encoding to be assumed for input strings. If the
value is |
text |
character string: if |
skipNul |
logical: should NULs be skipped when reading character fields? |
The value of what can be a list of types, in which case scan returns a list of vectors with the types given by the types of the elements in what. This provides a way of reading columnar data. If any of the types is NULL, the corresponding field is skipped (but a NULL component appears in the result).
The type of what or its components can be one of the six atomic vector types or NULL (see is.atomic).
‘White space’ is defined for the purposes of this function as one or more contiguous characters from the set space, horizontal tab, carriage return and line feed (aka “newline”, "\n"). It does not include form feed nor vertical tab, but in Latin-1 and Windows 8-bit locales (but not UTF-8) 'space' includes the non-breaking space ‘"\xa0"’.
Empty numeric fields are always regarded as missing values. Empty character fields are scanned as empty character vectors, unless na.strings contains "" when they are regarded as missing values.
The allowed input for a numeric field is optional whitespace, followed by either NA or an optional sign followed by a decimal or hexadecimal constant (see NumericConstants), or NaN, Inf or infinity (ignoring case). Out-of-range values are recorded as Inf, -Inf or 0.
For an integer field the allowed input is optional whitespace, followed by either NA or an optional sign and one or more digits (‘0-9’): all out-of-range values are converted to NA_integer_.
If sep is the default (""), the character ‘\’ in a quoted string escapes the following character, so quotes may be included in the string by escaping them.
If sep is non-default, the fields may be quoted in the style of ‘.csv’ files where separators inside quotes ('' or "") are ignored and quotes may be put inside strings by doubling them. However, if sep = "\n" it is assumed by default that one wants to read entire lines verbatim.
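A short sketch of the two quoting behaviours with a non-default separator; the input strings are invented, and `text =` is used so that no file is needed.

```r
# CSV-style: the comma inside the quoted field is not a separator.
csv_fields <- scan(text = 'a,"b,c",d', what = "", sep = ",", quiet = TRUE)
csv_fields

# sep = "\n": quote defaults to "", so the line is read verbatim,
# quote characters included.
whole_line <- scan(text = 'say "hi", ok', what = "", sep = "\n", quiet = TRUE)
whole_line
```

If quote processing is wanted while reading whole lines, `quote` would have to be supplied explicitly rather than left at its default.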
Quoting is only interpreted in character fields and in NULL fields (which might be skipping character fields).
Note that since sep is a separator and not a terminator, reading a file by scan("foo", sep = "\n", blank.lines.skip = FALSE) will give an empty final line if the file ends in a line feed ("\n") and not if it does not. This might not be what you expected; see also readLines.
If comment.char occurs (except inside a quoted character field), it signals that the rest of the line should be regarded as a comment and be discarded. Lines beginning with a comment character (possibly after white space with the default separator) are treated as blank lines.
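For instance, with "#" chosen as the comment character (any single character could be used), both trailing comments and whole-line comments are dropped. The input text below is made up.

```r
txt <- "1 2 # trailing note\n# a whole-line comment\n3"
vals <- scan(text = txt, comment.char = "#", quiet = TRUE)
vals   # only the numeric data remain
```

Because the default is `comment.char = ""`, comment handling is off unless it is requested in this way.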
There is a line-length limit of 4095 bytes when reading from the console (which may impose a lower limit: see ‘An Introduction to R’).
There is a check for a user interrupt every 1000 lines if what is a list, otherwise every 10000 items.
If file is a character string and fileEncoding is non-default, or if it is a not-already-open connection with a non-default encoding argument, the text is converted to UTF-8 and declared as such (and the encoding argument to scan is ignored). See the examples of readLines.
Embedded NULs in the input stream will terminate the field currently being read, with a warning once per call to scan. Setting skipNul = TRUE causes them to be ignored.
If what is a list, a list of the same length and same names (if any) as what. Otherwise, a vector of the type of what.
Character strings in the result will have a declared encoding if encoding is "latin1" or "UTF-8".
The default for multi.line differs from S. To read one record per line, use flush = TRUE and multi.line = FALSE. (Note that quoted character strings can still include embedded newlines.)
If number of items is not specified, the internal mechanism re-allocates memory in powers of two and so could use up to three times as much memory as needed. (It needs both old and new copies.) If you can, specify either n or nmax whenever inputting a large vector, and nmax or nlines when inputting a large list.
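A toy-sized sketch of supplying these limits; the inputs are tiny and invented, and the memory benefit only matters for large inputs.

```r
# Vector input: read at most 3 values.
v <- scan(text = "1 2 3 4 5", n = 3, quiet = TRUE)

# List input: nmax bounds the number of records read.
l <- scan(text = "a 1 b 2 c 3", what = list("", 0), nmax = 2, quiet = TRUE)
```

In practice the limit would come from prior knowledge of the data, such as a record count stored in a header line.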
Using scan on an open connection to read partial lines can lose chars: use an explicit separator to avoid this.
Having nul bytes in fields (including ‘\0’ if allowEscapes = TRUE) may lead to interpretation of the field being terminated at the nul. They are not normally present in text files – see readBin.
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
read.table for more user-friendly reading of data matrices; readLines to read a file a line at a time. write.
Quotes for the details of C-style escape sequences.
readChar and readBin to read fixed or variable length character strings or binary representations of numbers a few at a time from a connection.
cat("TITLE extra line", "2 3 5 7", "11 13 17", file = "ex.data", sep = "\n")
pp <- scan("ex.data", skip = 1, quiet = TRUE)
scan("ex.data", skip = 1)
scan("ex.data", skip = 1, nlines = 1) # only 1 line after the skipped one
scan("ex.data", what = list("","","")) # flush is F -> read "7"
scan("ex.data", what = list("","",""), flush = TRUE)
unlink("ex.data") # tidy up
## "inline" usage
scan(text = "1 2 3")
Gives a list of attached packages (see library), and R objects, usually data.frames.
search()
searchpaths()
A character vector, starting with ".GlobalEnv", and ending with "package:base" which is R's base package required always.
searchpaths gives a similar character vector, with the entries for packages being the path to the package used to load the code.
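The shape of the result can be checked directly. This sketch assumes an ordinary session in which nothing unusual has been done to the search path.

```r
sp <- search()
first <- sp[1]             # the global environment comes first
last  <- sp[length(sp)]    # the base package comes last

# searchpaths() has one entry per entry of search()
same_length <- length(searchpaths()) == length(sp)
```

Entries between the first and last depend on which packages and objects have been attached, so they vary from session to session.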
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole. (search.)
Chambers, J. M. (1998) Programming with Data. A Guide to the S Language. Springer. (searchpaths.)
.packages to list just the packages on search path.
loadedNamespaces to list loaded namespaces.
attach and detach to change the search path, objects to find R objects in there.
search()
searchpaths()
Functions to re-position connections.
seek(con, ...)
## S3 method for class 'connection'
seek(con, where = NA, origin = "start", rw = "", ...)
isSeekable(con)
truncate(con, ...)
con |
a connection. |
where |
numeric. A file position (relative to the origin
specified by origin). |
rw |
character string. Empty or |
origin |
character string. One of |
... |
further arguments passed to or from other methods. |
seek
with where = NA
returns the current byte offset
of a connection (from the beginning), and with a non-missing where
argument the connection is re-positioned (if possible) to the
specified position. isSeekable
returns whether the connection
in principle supports seek
: currently only (possibly
gz-compressed) file connections do.
where
is stored as a real but should represent an integer:
non-integer values are likely to be truncated. Note that the possible
values can exceed the largest representable number in an R
integer
on 64-bit builds, and on some 32-bit builds.
File connections can be open for both reading and writing/appending, in which case
R keeps separate positions for reading and writing. Which seek
refers to can be set by its rw
argument: the default is the
last mode (reading or writing) which was used. Most files are
only opened for reading or writing and so default to that state. If a
file is open for both reading and writing but has not been used, the
default is to give the reading position (0).
The initial file position for reading is always at the beginning.
The initial position for writing is at the beginning of the file
for modes "r+"
and "r+b"
, otherwise at the end of the
file. Some platforms only allow writing at the end of the file in
the append modes. (The reported write position for a file opened in
an append mode will typically be unreliable until the file has been
written to.)
gzfile
connections support seek
with a number of
limitations, using the file position of the uncompressed file.
They do not support origin = "end"
. When writing, seeking is
only possible forwards: when reading seeking backwards is supported by
rewinding the file and re-reading from its start.
If seek
is called with a non-NA
value of where
,
any pushback on a text-mode connection is discarded.
truncate
truncates a file opened for writing at its current
position. It works only for file
connections, and is not
implemented on all platforms: on others (including Windows) it will
not work for large (> 2Gb) files.
None of these should be expected to work on text-mode connections with re-encoding selected.
seek
returns the current position (before any move), as a
(numeric) byte offset from the origin, if relevant, or 0
if
not. Note that the position can exceed the largest representable
number in an R integer
on 64-bit builds, and on some 32-bit
builds.
truncate
returns NULL
: it stops with an error if
it fails (or is not implemented).
isSeekable
returns a logical value, whether the connection
supports seek
.
Use of seek
on Windows is discouraged. We have found so many
errors in the Windows implementation of file positioning that users
are advised to use it only at their own risk, and asked not to waste
the R developers' time with bug reports on Windows' deficiencies.
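The following is a minimal sketch of reading the position of a binary file connection and then re-positioning it. It assumes a scratch file created with tempfile(); all variable names are hypothetical, and, given the warning above, the behaviour on Windows may differ.

```r
tf <- tempfile()                          # hypothetical scratch file
writeBin(as.raw(1:10), tf)                # ten bytes: 01 02 ... 0a

con <- file(tf, open = "rb")
seekable <- isSeekable(con)               # plain file connections support seek
first4 <- readBin(con, what = "raw", n = 4)
pos_after_read <- seek(con)               # where = NA: only report the position
old_pos <- seek(con, where = 8, origin = "start")  # returns the position before the move
last2 <- readBin(con, what = "raw", n = 2)         # bytes 9 and 10
close(con)
unlink(tf)
```

Note that the value returned by the re-positioning call is the old offset, not the new one.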
Generate regular sequences. seq
is a standard generic with a
default method. seq.int
is a primitive which can be
much faster but has a few restrictions. seq_along
and
seq_len
are very fast primitives for two common cases.
seq(...)

## Default S3 method:
seq(from = 1, to = 1, by = ((to - from)/(length.out - 1)),
    length.out = NULL, along.with = NULL, ...)

seq.int(from, to, by, length.out, along.with, ...)

seq_along(along.with)
seq_len(length.out)
... |
arguments passed to or from methods. |
from , to
|
the starting and (maximal) end values of the
sequence. Of length |
by |
number: increment of the sequence. |
length.out |
desired length of the sequence. A
non-negative number, which for |
along.with |
take the length from the length of this argument. |
Numerical inputs should all be finite (that is, not infinite,
NaN
or NA
).
The interpretation of the unnamed arguments of seq
and
seq.int
is not standard, and it is recommended always to
name the arguments when programming.
seq
is generic, and only the default method is described here.
Note that it dispatches on the class of the first argument
irrespective of argument names. This can have unintended consequences
if it is called with just one argument intending this to be taken as
along.with
: it is much better to use seq_along
in that
case.
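A short sketch of the one-argument pitfall mentioned above: a length-one numeric argument is interpreted as an upper limit rather than as a vector to index. The values are arbitrary illustrations.

```r
x <- 10                    # a length-one numeric vector
n_seq <- length(seq(x))    # interpreted as 1:x, so ten values
idx_x <- seq_along(x)      # indexes the single element: 1

y <- c(2.5, 7, 9)
idx_y1 <- seq(y)           # more than one element: behaves like seq_along(y)
idx_y2 <- seq_along(y)
```

Using seq_along() gives the same answer in both situations, which is why it is preferred when an index vector is wanted.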
seq.int
is an internal generic which dispatches on
methods for "seq"
based on the class of the first supplied
argument (before argument matching).
Typical usages are
seq(from, to)
seq(from, to, by= )
seq(from, to, length.out= )
seq(along.with= )
seq(from)
seq(length.out= )
The first form generates the sequence from, from+/-1, ..., to
(identical to from:to
).
The second form generates from, from+by
, ..., up to the
sequence value less than or equal to to
. Specifying to -
from
and by
of opposite signs is an error. Note that the
computed final value can go just beyond to
to allow for
rounding error, but is truncated to to
. (‘Just beyond’ is by up to 1e-10 times abs(from - to).)
The third generates a sequence of length.out
equally spaced
values from from
to to
. (length.out
is usually
abbreviated to length
or len
, and seq_len
is much
faster.)
The fourth form generates the integer sequence 1, 2, ...,
length(along.with)
. (along.with
is usually abbreviated to
along
, and seq_along
is much faster.)
The fifth form generates the sequence 1, 2, ..., length(from)
(as if argument along.with
had been specified), unless
the argument is numeric of length 1 when it is interpreted as
1:from
(even for seq(0)
for compatibility with S).
Using either seq_along
or seq_len
is much preferred
(unless strict S compatibility is essential).
The final form generates the integer sequence 1, 2, ...,
length.out
unless length.out = 0
, when it generates
integer(0)
.
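The forms above can be sketched with a few concrete calls. The numbers are arbitrary and the variable names are assumptions for illustration.

```r
a <- seq(2, 10, by = 3)           # 2 5 8: stops at the last value <= to
b <- seq(0, 1, length.out = 5)    # five equally spaced values from 0 to 1
e <- seq_len(0)                   # integer(0): safe to loop over
d <- seq(0)                       # 1 0: the S-compatible reading of seq(0)
```

The contrast between the last two lines is the usual reason to prefer seq_len() when the length may be zero.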
Very small sequences (with from - to of the order of 1e-14
times the larger of the ends) will return from.
For seq
(only), up to two of from
, to
and
by
can be supplied as complex values provided length.out
or along.with
is specified. More generally, the default method
of seq
will handle classed objects with methods for
the Math
, Ops
and Summary
group generics.
seq.int
, seq_along
and seq_len
are
primitive.
seq.int
and the default method of seq
for numeric
arguments return a vector of type "integer"
or "double"
:
programmers should not rely on which.
seq_along
and seq_len
return an integer vector, unless
it is a long vector when it will be double.
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
The methods seq.Date
and seq.POSIXt
.
seq(0, 1, length.out = 11)
seq(stats::rnorm(20)) # effectively 'along'
seq(1, 9, by = 2)     # matches 'end'
seq(1, 9, by = pi)    # stays below 'end'
seq(1, 6, by = 3)
seq(1.575, 5.125, by = 0.05)
seq(17) # same as 1:17, or even better seq_len(17)
The method for seq
for objects of class
"Date"
representing calendar dates.
## S3 method for class 'Date'
seq(from, to, by, length.out = NULL, along.with = NULL, ...)
from |
starting date. Required. |
to |
end date. Optional. |
by |
increment of the sequence. Optional. See ‘Details’. |
length.out |
integer, optional. Desired length of the sequence. |
along.with |
take the length from the length of this argument. |
... |
arguments passed to or from other methods. |
by
can be specified in several ways.
A number, taken to be in days.
An object of class difftime.
A character string, containing one of "day"
,
"week"
, "month"
, "quarter"
or "year"
.
This can optionally be preceded by a (positive or negative) integer
and a space, or followed by "s"
.
See seq.POSIXt
for the details of "month"
.
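As a hedged illustration of the ways of specifying by listed above, the sketch below uses an arbitrary start date; the results are formatted as character strings so they are easy to compare.

```r
start <- as.Date("2024-01-01")   # arbitrary illustrative date

by_days  <- format(seq(start, by = 7, length.out = 3))           # a number: days
by_text  <- format(seq(start, by = "2 weeks", length.out = 3))   # integer, space, unit
by_dt    <- format(seq(start, by = as.difftime(1, units = "weeks"),
                       length.out = 3))                          # a difftime object
by_back  <- format(seq(start, by = "-1 year", length.out = 3))   # negative steps go backwards
```

Here by_days and by_dt describe the same weekly sequence, while by_text steps a fortnight at a time.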
A vector of class "Date"
.
## first days of years
seq(as.Date("1910/1/1"), as.Date("1999/1/1"), "years")
## by month
seq(as.Date("2000/1/1"), by = "month", length.out = 12)
## quarters
seq(as.Date("2000/1/1"), as.Date("2003/1/1"), by = "quarter")

## find all 7th of the month between two dates, the last being a 7th.
st <- as.Date("1998-12-17")
en <- as.Date("2000-1-7")
ll <- seq(en, st, by = "-1 month")
rev(ll[ll > st & ll < en])
The method for seq
for date-time classes.
## S3 method for class 'POSIXt'
seq(from, to, by, length.out = NULL, along.with = NULL, ...)
from |
starting date. Required. |
to |
end date. Optional. |
by |
increment of the sequence. Optional. See ‘Details’. |
length.out |
integer, optional. Desired length of the sequence. |
along.with |
take the length from the length of this argument. |
... |
arguments passed to or from other methods. |
by
can be specified in several ways.
A number, taken to be in seconds.
An object of class difftime.
A character string, containing one of "sec"
,
"min"
, "hour"
, "day"
, "DSTday"
,
"week"
, "month"
, "quarter"
or "year"
.
This can optionally be preceded by a (positive or negative) integer
and a space, or followed by "s"
.
The difference between "day"
and "DSTday"
is that the
former ignores changes to/from daylight savings time and the latter takes
the same clock time each day. "week"
ignores DST (it is a
period of 168 hours), but "7 DSTdays"
can be used as an
alternative. "month"
and "year"
allow for DST.
The time zone of the result is taken from from
: remember
that GMT means UTC (and not the time zone of Greenwich, England) and so
does not have daylight savings time.
Using "month"
first advances the month without changing the
day: if this results in an invalid day of the month, it is counted
forward into the next month: see the examples.
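A brief sketch of the two points above, using ISOdate (which works in GMT, so no daylight saving changes are involved): stepping by "month" from the 31st rolls invalid days forward, and a "week" step is seven 24-hour days. Variable names are illustrative assumptions.

```r
m <- seq(ISOdate(2000, 1, 31), by = "month", length.out = 4)
month_days <- format(m, "%Y-%m-%d")
# 31 February 2000 is counted forward to 2 March; 31 April to 1 May

w <- seq(ISOdate(2000, 1, 1), by = "week", length.out = 2)
week_hours <- as.numeric(difftime(w[2], w[1], units = "hours"))
```

If month ends are wanted instead, one common approach is to step from the first of the following month and subtract a day.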
A vector of class "POSIXct"
.
## first days of years
seq(ISOdate(1910,1,1), ISOdate(1999,1,1), "years")
## by month
seq(ISOdate(2000,1,1), by = "month", length.out = 12)
seq(ISOdate(2000,1,31), by = "month", length.out = 4)
## quarters
seq(ISOdate(1990,1,1), ISOdate(2000,1,1), by = "quarter") # or "3 months"
## days vs DSTdays: use c() to lose the time zone.
seq(c(ISOdate(2000,3,20)), by = "day", length.out = 10)
seq(c(ISOdate(2000,3,20)), by = "DSTday", length.out = 10)
seq(c(ISOdate(2000,3,20)), by = "7 DSTdays", length.out = 4)
The default method for sequence
generates the sequence
seq(from[i], by = by[i], length.out = nvec[i])
for each
element i
in the parallel (and recycled) vectors from
,
by
and nvec
. It then returns the result of concatenating
those sequences.
sequence(nvec, ...)

## Default S3 method:
sequence(nvec, from = 1L, by = 1L, ...)
nvec |
coerced to a non-negative integer vector each element of which specifies the length of a sequence. |
from |
coerced to an integer vector each element of which specifies the first element of a sequence. |
by |
coerced to an integer vector each element of which specifies the step size between elements of a sequence. |
... |
additional arguments passed to methods. |
Negative values are supported for from
and
by
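. sequence(nvec, from, by=0L) is equivalent to rep(from, each=nvec) when nvec and from both have length one; when they are vectors of the same length it matches rep(from, times=nvec).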
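A small sketch of the by = 0L case with vector arguments of equal length, compared against rep() with its times argument. The values are arbitrary and this is only a check for the inputs shown, not a general proof.

```r
nvec <- c(3L, 2L)
from <- c(5L, 7L)

s0 <- sequence(nvec, from = from, by = 0L)   # each start value repeated nvec[i] times
r0 <- rep(from, times = nvec)                # 5 5 5 7 7

down <- sequence(3L, from = 10L, by = -2L)   # negative steps: 10 8 6
```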
This function was originally implemented in R with fewer features, but it has since become more flexible, and the default method is implemented in C for speed.
The current version is by Michael Lawrence, based on code from the S4Vectors Bioconductor package.
sequence(c(3, 2)) # the concatenated sequences 1:3 and 1:2.
#> [1] 1 2 3 1 2
sequence(c(3, 2), from=2L)
#> [1] 2 3 4 2 3
sequence(c(3, 2), from=2L, by=2L)
#> [1] 2 4 6 2 4
sequence(c(3, 2), by=c(-1L, 1L))
#> [1] 1 0 -1 1 2
A simple low-level interface for serializing to connections.
serialize(object, connection, ascii, xdr = TRUE,
          version = NULL, refhook = NULL)

unserialize(connection, refhook = NULL)
object |
R object to serialize. |
connection |
an open connection or (for |
ascii |
a logical. If |
xdr |
a logical: if a binary representation is used, should a big-endian one (XDR) be used? |
version |
the workspace format version to use. |
refhook |
a hook function for handling reference objects. |
The function serialize
serializes object
to the specified
connection. If connection
is NULL
then object
is
serialized to a raw vector, which is returned as the result of
serialize
.
Sharing of reference objects is preserved within the object but not
across separate calls to serialize
.
unserialize
reads an object (as written by serialize
)
from connection
or a raw vector.
The refhook
functions can be used to customize handling of
non-system reference objects (all external pointers and weak
references, and all environments other than namespace and package
environments and .GlobalEnv
). The hook function for
serialize
should return a character vector for references it
wants to handle; otherwise it should return NULL
. The hook for
unserialize
will be called with character vectors supplied to
serialize
and should return an appropriate object.
For a text-mode connection, the default value of ascii
is set
to TRUE
: only ASCII representations can be written to text-mode
connections and attempting to use ascii = FALSE
will throw an
error.
The format consists of a single line followed by the data: the first
line contains a single character: X
for binary serialization
and A
for ASCII serialization, followed by a new line. (The
format used is identical to that used by readRDS
.)
As almost all systems in current use are little-endian, xdr =
FALSE
can be used to avoid byte-shuffling at both ends when
transferring data from one little-endian machine to another (or
between processes on the same machine). Depending on the system, this
can speed up serialization and unserialization by a factor of up to
3x.
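The options discussed above can be sketched with a round trip through raw vectors. The object obj is an arbitrary example, and the first-byte checks only illustrate the header character described in the text for the default binary and ASCII formats.

```r
obj <- list(a = 1:3, b = "text")                        # arbitrary test object

bin <- serialize(obj, connection = NULL)                # binary (XDR) format
txt <- serialize(obj, connection = NULL, ascii = TRUE)  # ASCII format
nat <- serialize(obj, connection = NULL, xdr = FALSE)   # native byte order

bin_tag <- rawToChar(bin[1])   # header character of the binary format
txt_tag <- rawToChar(txt[1])   # header character of the ASCII format

ok <- c(identical(unserialize(bin), obj),
        identical(unserialize(txt), obj),
        identical(unserialize(nat), obj))
```

All three representations restore the same object; they differ only in size and in how quickly they can be written and read.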
For serialize
, NULL
unless connection = NULL
, when
the result is returned in a raw vector.
For unserialize
an R object.
These functions have provided a stable interface since R 2.4.0 (when the storage of serialized objects was changed from character to raw vectors). However, the serialization format may change in future versions of R, so this interface should not be used for long-term storage of R objects.
On 32-bit platforms a raw vector is limited to 2^31 - 1 bytes, but R objects can exceed this and their serializations will
normally be larger than the objects.
saveRDS
for a more convenient interface to serialize an
object to a file or connection.
save
and load
to serialize and restore one
or more named objects.
The ‘R Internals’ manual for details of the format used.
x <- serialize(list(1,2,3), NULL)
unserialize(x)

## see also the examples for saveRDS
Performs set union, intersection, (asymmetric!) difference, equality and membership on two vectors.
union(x, y)
intersect(x, y)
setdiff(x, y)
setequal(x, y)

is.element(el, set)
x , y , el , set
|
vectors (of the same mode) containing a sequence of items (conceptually) with no duplicated values. |
Each of union
, intersect
, setdiff
and
setequal
will discard any duplicated values in the arguments,
and they apply as.vector
to their arguments (and so
in particular coerce factors to character vectors).
is.element(x, y)
is identical to x %in% y
.
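A short sketch of the points above (duplicates are discarded, setdiff is asymmetric, and factors are coerced to character), using small made-up vectors.

```r
x <- c(3, 1, 2, 2, 3)
y <- c(2, 4, 4)

u  <- union(x, y)        # duplicates dropped: 3 1 2 4
i  <- intersect(x, y)    # 2
d1 <- setdiff(x, y)      # in x but not y: 3 1
d2 <- setdiff(y, x)      # in y but not x: 4
eq <- setequal(c(1, 1, 2), c(2, 1))   # duplicates and order are ignored

f  <- factor(c("a", "b"))
uf <- union(f, "c")      # factor coerced to character before combining
el <- is.element(2, x)   # same as 2 %in% x
```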
For union
, a vector of a common mode.
For intersect
, a vector of a common mode, or NULL
if
x
or y
is NULL
.
For setdiff
, a vector of the same mode
as x
.
A logical scalar for setequal
and a logical of the same
length as x
for is.element
.
‘plotmath’ for the use of union
and
intersect
in plot annotation.
(x <- c(sort(sample(1:20, 9)), NA))
(y <- c(sort(sample(3:23, 7)), NA))
union(x, y)
intersect(x, y)
setdiff(x, y)
setdiff(y, x)
setequal(x, y)

## True for all possible x & y :
setequal( union(x, y),
          c(setdiff(x, y), intersect(x, y), setdiff(y, x)))

is.element(x, y) # length 10
is.element(y, x) # length 8
Functions to set CPU and/or elapsed time limits for top-level computations or the current session.
setTimeLimit(cpu = Inf, elapsed = Inf, transient = FALSE)

setSessionTimeLimit(cpu = Inf, elapsed = Inf)
cpu , elapsed
|
double (of length one). Set a limit on the total CPU time or the elapsed time in seconds, respectively. |
transient |
logical. If |
setTimeLimit
sets limits which apply to each top-level
computation, that is a command line (including any continuation lines)
entered at the console or from a file. If it is called from within a
computation the limits apply to the rest of the computation and
(unless transient = TRUE
) to subsequent top-level computations.
setSessionTimeLimit
sets limits for the rest of the
session. Once a session limit is reached it is reset to Inf
.
Setting any limit has a small overhead – well under 1% on the systems measured.
Time limits are checked whenever a user interrupt could occur.
This will happen frequently in R code and during Sys.sleep
,
but only at points in compiled C and Fortran code identified by the
code author.
‘Total CPU time’ includes that used by child processes where the latter is reported.
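One possible way to use a transient elapsed-time limit is sketched below: the limit is set inside the computation it should guard, and the resulting error is caught with tryCatch. The 0.2 second limit and the Sys.sleep() stand-in for slow work are assumptions for illustration.

```r
res <- tryCatch({
  setTimeLimit(elapsed = 0.2, transient = TRUE)  # applies to the rest of this computation
  Sys.sleep(3)                                   # interruptible, so the limit is noticed
  "finished"
}, error = function(e) "timed out",
   finally = setTimeLimit(cpu = Inf, elapsed = Inf))  # clear any remaining limit
res
```

Because limits are only checked where a user interrupt could occur, long-running compiled code may overrun the limit before the error is raised.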
Display aspects of connections.
showConnections(all = FALSE)
getConnection(what)
closeAllConnections()

stdin()
stdout()
stderr()
nullfile()

isatty(con)

getAllConnections()
all |
logical: if true all connections, including closed ones and the standard ones are displayed. If false only open user-created connections are included. |
what |
integer: a row number of the table given by
showConnections. |
con |
a connection. |
stdin()
, stdout()
and stderr()
are standard
connections corresponding to input, output and error on the console
respectively (and not necessarily to file streams). They are text-mode
connections of class "terminal"
which cannot be opened or
closed, and are read-only, write-only and write-only respectively.
The stdout()
and stderr()
connections can be
re-directed by sink
(and in some circumstances the
output from stdout()
can be split: see the help page).
The encoding for stdin()
when redirected can
be set by the command-line flag --encoding.
nullfile()
returns the file name of the null device ("/dev/null"
on Unix, "nul:"
on Windows).
showConnections
returns a matrix of information. If a
connection object has been lost or forgotten, getConnection
will take a row number from the table and return a connection object
for that connection, which can be used to close the connection,
for example. However, if there is no R level object referring to the
connection it will be closed automatically at the next garbage
collection (except for gzcon
connections).
closeAllConnections
closes (and destroys) all user
connections, restoring all sink
diversions as it does
so.
isatty
returns true if the connection is one of the class
"terminal"
connections and it is apparently connected to a
terminal, otherwise false. This may not be reliable in embedded
applications, including GUI consoles.
getAllConnections
returns a sequence of integer connection
descriptors for use with getConnection
, corresponding to the
row names of the table returned by showConnections(all =
TRUE)
.
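The recovery of a connection from its number can be sketched as follows. A reference (tc) is kept here so the connection cannot be garbage collected during the illustration; the variable names and the text used are assumptions.

```r
before <- getAllConnections()
tc <- textConnection(c("alpha", "beta"))    # illustrative connection
id <- setdiff(getAllConnections(), before)  # descriptor of the new connection

tab <- showConnections()                    # one row per open user connection
same <- getConnection(id)                   # connection object rebuilt from its number
first <- readLines(same, n = 1)             # reads from the same underlying connection
close(same)                                 # this also invalidates 'tc'

nul <- nullfile()                           # file name of the null device
```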
stdin()
, stdout()
and stderr()
return connection
objects.
showConnections
returns a character matrix of information with
a row for each connection, by default only for open non-standard connections.
getConnection
returns a connection object, or NULL
.
stdin()
refers to the ‘console’ and not to the C-level
‘stdin’ of the process. The distinction matters in GUI consoles
(which may not have an active ‘stdin’, and if they do it may not
be connected to console input), and also in embedded applications.
If you want access to the C-level file stream ‘stdin’, use
file("stdin")
.
When R is reading a script from a file, the file is the ‘console’: this is traditional usage to allow in-line data (see ‘An Introduction to R’ for an example).
showConnections(all = TRUE)
## Not run: 
textConnection(letters)
# oops, I forgot to record that one
showConnections()
#  description class            mode text   isopen   can read can write
#3 "letters"   "textConnection" "r"  "text" "opened" "yes"    "no"
mycon <- getConnection(3)

## End(Not run)

c(isatty(stdin()), isatty(stdout()), isatty(stderr()))
Quote a string to be passed to an operating system shell.
shQuote(string, type = c("sh", "csh", "cmd", "cmd2"))
string |
a character vector, usually of length one. |
type |
character: the type of shell quoting. Partial matching is
supported. |
The default type of quoting supported under Unix-alikes is that for
the Bourne shell sh
. If the string does not contain single
quotes, we can just surround it with single quotes. Otherwise, the
string is surrounded in double quotes, which suppresses all special
meanings of metacharacters except dollar, backquote and backslash, so
these (and of course double quote) are preceded by backslash. This
type of quoting is also appropriate for bash
, ksh
and
zsh
.
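The Bourne-shell rules described above can be sketched with three made-up strings. The type is given explicitly so the result does not depend on the platform default.

```r
q1 <- shQuote("plain text", type = "sh")   # no single quote: wrap in single quotes
q2 <- shQuote("don't", type = "sh")        # contains a single quote: use double quotes
q3 <- shQuote("cost's $5", type = "sh")    # dollar is escaped inside double quotes

writeLines(c(q1, q2, q3))
```

writeLines() shows the strings as the shell would receive them, without R's own escaping of backslashes and quotes.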
The other type of quoting is for the C-shell (csh
and
tcsh
). Once again, if the string does not contain single
quotes, we can just surround it with single quotes. If it does
contain single quotes, we can use double quotes provided it does not
contain dollar or backquote (and we need to escape backslash,
exclamation mark and double quote). As a last resort, we need to
split the string into pieces not containing single quotes (some may be
empty) and surround each with single quotes, and the single quotes
with double quotes.
In Windows, command line interpretation is done by the application as well
as the shell. It may depend on the compiler used: Microsoft's rules for
the C run-time are given at
https://learn.microsoft.com/en-us/cpp/c-language/parsing-c-command-line-arguments?view=msvc-160.
It may depend on the whim of the programmer of the application: check its
documentation. The type = "cmd"
prepares the string for parsing as
an argument by Microsoft's rules and makes shQuote
safe for use
with many applications when used with system
or
system2
. It surrounds the string by double quotes and
escapes internal double quotes by a backslash. Any trailing backslashes
and backslashes that were originally before double quotes are doubled.
The Windows
cmd.exe
shell (used by default with shell
)
uses type = "cmd2"
quoting: special characters are prefixed
with "^"
. In some cases, two types of quoting should be
used: first for the application, and then type = "cmd2"
for cmd.exe
. See the examples below.
A character vector of the same length as string
.
Loukides, M. et al (2002) Unix Power Tools Third Edition. O'Reilly. Section 27.12.
Discussion in PR#16636.
Quotes for quoting R code.
sQuote
for quoting English text.
test <- "abc$def`gh`i\\j"
cat(shQuote(test), "\n")
## Not run: system(paste("echo", shQuote(test)))
test <- "don't do it!"
cat(shQuote(test), "\n")

tryit <- paste("use the", sQuote("-c"), "switch\nlike this")
cat(shQuote(tryit), "\n")
## Not run: system(paste("echo", shQuote(tryit)))
cat(shQuote(tryit, type = "csh"), "\n")

## Windows-only example, assuming cmd.exe:
perlcmd <- 'print "Hello World\\n";'
## Not run: 
shell(shQuote(paste("perl -e",
                    shQuote(perlcmd, type = "cmd")),
              type = "cmd2"))

## End(Not run)
sign
returns a vector with the signs of the corresponding
elements of x
(the sign of a real number is 1, 0, or -1
if the number is positive, zero, or negative, respectively).
Note that sign
does not operate on complex vectors.
sign(x)
x |
a numeric vector |
This is an internal generic primitive function: methods
can be defined for it directly or via the
Math
group generic.
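A minimal sketch with an arbitrary numeric vector, showing that the sign and the absolute value together recover the original values.

```r
v <- c(-2.5, 0, 3)
s <- sign(v)             # -1 0 1
back <- abs(v) * s       # recovers v
```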
sign(pi)    # == 1
sign(-2:3)  # -1 -1 0 1 1 1
On receiving SIGUSR1
R will save the workspace and quit.
SIGUSR2
has the same result except that the .Last
function and on.exit
expressions will not be called.
kill -USR1 pid
kill -USR2 pid
pid |
The process ID of the R process. |
The commands history will also be saved if it would be at normal termination.
This is not available on Windows, and possibly on other OSes which do not support these signals.
It is possible that one or more R objects will be undergoing modification at the time the signal is sent. These objects could be saved in a corrupted form.
Sys.getpid
to report the process ID for future use.
sink
diverts R output to a connection (and stops such diversions).
sink.number()
reports how many diversions are in use.
sink.number(type = "message")
reports the number of the
connection currently being used for error messages.
sink(file = NULL, append = FALSE, type = c("output", "message"),
     split = FALSE)

sink.number(type = c("output", "message"))
file |
a writable connection or a character string naming the
file to write to, or |
append |
logical. If |
type |
character string. Either the output stream or the messages stream. The name will be partially matched so can be abbreviated. |
split |
logical: if |
sink
diverts R output to a connection (and must be used again
to finish such a diversion, see below!). If file
is a
character string, a file connection with that name will be established
for the duration of the diversion.
Normal R output (to connection stdout
) is diverted by
the default type = "output"
. Only prompts and (most)
messages continue to appear on the console. Messages sent to
stderr()
(including those from message
,
warning
and stop
) can be diverted by
sink(type = "message")
(see below).
sink()
or sink(file = NULL)
ends the last diversion (of
the specified type). There is a stack of diversions for normal
output, so output reverts to the previous diversion (if there was
one). The stack is of up to 21 connections (20 diversions).
If file
is a connection it will be opened if necessary (in
"wt"
mode) and closed once it is removed from the stack of
diversions.
split = TRUE
only splits R output (via Rvprintf
) and
the default output from writeLines
: it does not split
all output that might be sent to stdout()
.
Sink-ing the messages stream should be done only with great care.
For that stream file
must be an already open connection, and
there is no stack of connections.
If file
is a character string, the file will be opened using
the current encoding. If you want a different encoding (e.g., to
represent strings which have been stored in UTF-8), use a
file
connection — but some ways to produce R output
will already have converted such strings to the current encoding.
sink
returns NULL
.
For sink.number()
the number (0, 1, 2, ...) of diversions of
output in place.
For sink.number("message")
the connection number used for
messages, 2 if no diversion has been used.
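The diversion stack can be sketched by counting diversions before, during and after a sink to a scratch file. The file comes from tempfile() and the variable names are assumptions for illustration.

```r
out <- tempfile(fileext = ".txt")   # hypothetical scratch file
depth0 <- sink.number()             # diversions in place before we start

sink(out)                           # divert normal output to the file
depth1 <- sink.number()             # one more diversion than before
print(1:3)                          # written to the file, not the console
sink()                              # end the diversion again
depth2 <- sink.number()

captured <- readLines(out)          # what print() wrote
unlink(out)
```

Pairing every sink(file) with a later sink() keeps the stack balanced; inside a function, on.exit(sink()) is a common safeguard.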
Do not use a connection that is open for sink
for any other
purpose. The software will stop you closing one such inadvertently.
Do not sink the messages stream unless you understand the source code implementing it and hence the pitfalls.
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
Chambers, J. M. (1998) Programming with Data. A Guide to the S Language. Springer.
sink("sink-examp.txt")
i <- 1:10
outer(i, i)
sink()

## capture all the output to a file.
zz <- file("all.Rout", open = "wt")
sink(zz)
sink(zz, type = "message")
try(log("a"))
## revert output back to the console -- only then access the file!
sink(type = "message")
sink()
file.show("all.Rout", delete.file = TRUE)
Returns a matrix of integers indicating the number of their slice in a given array.
slice.index(x, MARGIN)
x |
an array. If |
MARGIN |
an integer vector giving the dimension numbers to slice by. |
If MARGIN
gives a single dimension, then all elements of slice
number i
with respect to this have value i
. In general,
slice numbers are obtained by numbering all combinations of indices in
the dimensions given by MARGIN
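in column-major order. I.e.,
with $m_1$, ..., $m_k$ the dimension numbers (elements of
MARGIN) sliced by and $d_{m_1}$, ..., $d_{m_k}$ the
corresponding extents, and $n_1 = 1$, $n_2 = d_{m_1}$, ...,
$n_k = d_{m_1} \cdots d_{m_{k-1}}$,
the number of the slice where dimension $m_1$ has value $i_1$,
..., dimension $m_k$ has value $i_k$ is
$1 + n_1 (i_1 - 1) + \cdots + n_k (i_k - 1)$.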
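As a worked check of the column-major numbering, the sketch below slices a 2 x 3 x 4 array by dimensions 1 and 3. With extents 2 and 4 the multipliers are 1 and 2, so the slice number is 1 + 1 * (i1 - 1) + 2 * (i3 - 1); slice_no is a hypothetical helper written only for this illustration.

```r
x <- array(0, dim = c(2, 3, 4))
s <- slice.index(x, MARGIN = c(1, 3))

# hypothetical helper: slice number for index i1 in dim 1 and i3 in dim 3
slice_no <- function(i1, i3) 1 + 1 * (i1 - 1) + 2 * (i3 - 1)

vals_13 <- unique(as.vector(s[1, , 3]))   # every element of this slice has the same number
val_24  <- s[2, 1, 4]
c(vals_13, slice_no(1, 3))
c(val_24, slice_no(2, 4))
```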
An integer array y
with dimensions corresponding to those of
x
.
row
and col
for determining row and column
indexes; in fact, these are special cases of slice.index
corresponding to MARGIN
equal to 1 and 2, respectively when
x
is a matrix.
x <- array(1 : 24, c(2, 3, 4))
slice.index(x, 2)
slice.index(x, c(1, 3))
## When slicing by dimensions 1 and 3, slice index 5 is obtained for
## dimension 1 has value 1 and dimension 3 has value 3 (see above):
which(slice.index(x, c(1, 3)) == 5, arr.ind = TRUE)
Extract or replace the contents of a slot or property of an object.
object@name
object@name <- value
object |
An object from a formally defined (S4) class, or an object with a class for which '@' or '@<-' S3 methods are defined. |
name |
The name of the slot or property, supplied as a character
string or unquoted symbol. If |
value |
A suitable replacement value for the slot or
property. For an S4 object this must be from a class compatible
with the class defined for this slot in the definition of the class
of |
If object
is not an S4 object, then a suitable S3 method for
'@' or '@<-' is searched for. If no method is found, then an error
is signaled.
If object
is an S4 object, then these operators are for slot
access, and are enabled only when package methods is loaded (as
per default). The slot must be formally defined. (There is an
exception for the name .Data
, intended for internal use only.)
The replacement operator checks that the slot already exists on the
object (which it should if the object is really from the class it
claims to be). See slot
for further details, in
particular for the differences between slot()
and the @
operator.
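A minimal sketch of slot access on an S4 object follows. The class `Person` and its slots are hypothetical, defined here only for illustration; the example assumes package methods is available, as it is by default.

```r
library(methods)

## 'Person' is a hypothetical class used only for illustration.
setClass("Person", representation(name = "character", age = "numeric"))
p <- new("Person", name = "Ada", age = 36)

p@name            # extract a slot
p@age <- 37       # replace a slot; the value must suit the slot's class
slot(p, "age")    # functional form, here giving the same result as p@age

## Assigning a value of an incompatible class is an error.
res <- try(p@age <- "thirty-seven", silent = TRUE)
inherits(res, "try-error")
```

The failed assignment leaves `p` unchanged, so `p@age` is still 37 afterwards.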
These are internal generic operators: see InternalMethods.
The current contents of the slot.
Waits for the first of several socket connections and server sockets to become available.
socketSelect(socklist, write = FALSE, timeout = NULL)
socklist |
list of open socket connections and server sockets. |
write |
logical. If |
timeout |
numeric or |
The values in write
are recycled if necessary to make up a
logical vector the same length as socklist
. Socket connections
can appear more than once in socklist
; this can be useful if
you want to determine whether a socket is available for reading or
writing.
Logical the same length as socklist
indicating
whether the corresponding socket connection is available for
output or input, depending on the corresponding value of write
.
Server sockets can only become available for input.
## Not run: 
## test whether socket connection s is available for writing or reading
socketSelect(list(s, s), c(TRUE, FALSE), timeout = 0)

## End(Not run)
This generic function solves the equation a %*% x = b
for x
,
where b
can be either a vector or a matrix.
solve(a, b, ...)

## Default S3 method:
solve(a, b, tol, LINPACK = FALSE, ...)
a |
a square numeric or complex matrix containing the coefficients of the linear system. Logical matrices are coerced to numeric. |
b |
a numeric or complex vector or matrix giving the right-hand
side(s) of the linear system. If missing, |
tol |
the tolerance for detecting linear dependencies in the
columns of |
LINPACK |
logical. Defunct; supplying it is an error. |
... |
further arguments passed to or from other methods. |
a
or b
can be complex, but this uses double complex
arithmetic which might not be available on all platforms.
The row and column names of the result are taken from the column names
of a
and of b
respectively. If b
is missing the
column names of the result are the row names of a
. No check is
made that the column names of a
match the row names of b
.
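The naming rules can be illustrated with a small system. The matrices and their dimnames below are made up for the example.

```r
a <- matrix(c(2, 1, 1, 3), 2, 2,
            dimnames = list(c("eq1", "eq2"), c("u", "v")))
b <- matrix(c(3, 5, 1, 0), 2, 2,
            dimnames = list(c("eq1", "eq2"), c("rhs1", "rhs2")))

x <- solve(a, b)      # solves a %*% x = b, one column of x per column of b
dimnames(x)           # rows named after colnames(a), columns after colnames(b)
all.equal(unname(a %*% x), unname(b))

ainv <- solve(a)      # b missing: the inverse of a
dimnames(ainv)        # rows from colnames(a), columns from rownames(a)
```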
For back-compatibility a
can be a (real) QR decomposition,
although qr.solve
should be called in that case.
qr.solve
can handle non-square systems.
Unsuccessful results from the underlying LAPACK code will result in an error giving a positive error code: these can only be interpreted by detailed study of the FORTRAN code.
What happens if a
and/or b
contain missing, NaN
or infinite values is platform-dependent, including on the version of
LAPACK that is in use.
tol
is a tolerance for the (estimated 1-norm)
‘reciprocal condition number’: the check is skipped if
tol <= 0
.
For historical reasons, the default method accepts a
as an
object of class "qr"
(with a warning) and passes it on to
solve.qr
.
The default method is an interface to the LAPACK routines DGESV
and ZGESV
.
LAPACK is from https://netlib.org/lapack/.
Anderson, E. and ten others (1999)
LAPACK Users' Guide. Third Edition. SIAM.
Available on-line at
https://netlib.org/lapack/lug/lapack_lug.html.
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
solve.qr
for the qr
method,
chol2inv
for inverting from the Cholesky factor,
backsolve
, qr.solve
.
hilbert <- function(n) { i <- 1:n; 1 / outer(i - 1, i, `+`) }
h8 <- hilbert(8); h8
sh8 <- solve(h8)
round(sh8 %*% h8, 3)

A <- hilbert(4)
A[] <- as.complex(A)
## might not be supported on all platforms
try(solve(A))
Sort (or order) a vector or factor (partially) into
ascending or descending order. For ordering along more than one
variable, e.g., for sorting data frames, see order
.
sort(x, decreasing = FALSE, ...)

## Default S3 method:
sort(x, decreasing = FALSE, na.last = NA, ...)

sort.int(x, partial = NULL, na.last = NA, decreasing = FALSE,
         method = c("auto", "shell", "quick", "radix"), index.return = FALSE)
x |
for |
decreasing |
logical. Should the sort be increasing or decreasing? Not available for partial sorting. |
... |
arguments to be passed to or from methods or (for the
default methods and objects without a class) to |
na.last |
for controlling the treatment of |
partial |
|
method |
character string specifying the algorithm used. Not available for partial sorting. Can be abbreviated. |
index.return |
logical indicating if the ordering index vector should
be returned as well. Supported by |
sort
is a generic function for which methods can be written,
and sort.int
is the internal method which is compatible
with S if only the first three arguments are used.
The default sort
method makes use of order
for
classed objects, which in turn makes use of the generic function
xtfrm
(and can be slow unless a xtfrm
method has
been defined or is.numeric(x)
is true).
Complex values are sorted first by the real part, then the imaginary part.
The "auto"
method selects "radix"
for short (fewer than
$2^{31}$ elements) numeric vectors, integer vectors, logical
vectors and factors; otherwise,
"shell"
.
Except for method "radix"
,
the sort order for character vectors will depend on the collating
sequence of the locale in use: see Comparison
.
The sort order for factors is the order of their levels (which is
particularly appropriate for ordered factors).
If partial
is not NULL
, it is taken to contain indices
of elements of the result which are to be placed in their correct
positions in the sorted array by partial sorting. For each of the
result values in a specified position, any values smaller than that
one are guaranteed to have a smaller index in the sorted array and any
values which are greater are guaranteed to have a bigger index in the
sorted array. (This is included for efficiency, and many of the
options are not available for partial sorting. It is only
substantially more efficient if partial
has a handful of
elements, and a full sort is done (a Quicksort if possible) if there
are more than 10.) Names are discarded for partial sorting.
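The guarantee given by partial sorting can be seen on a permutation of 1:100, where a full sort is known in advance. This is a small sketch; the seed is arbitrary and only makes the run reproducible.

```r
set.seed(1)                         # arbitrary seed, for reproducibility only
x <- sample(100)                    # a permutation of 1:100
p <- sort(x, partial = c(10, 50))   # positions 10 and 50 are put in place

## Positions 10 and 50 hold what a full sort would put there ...
p[c(10, 50)]
## ... everything before position 10 is no larger and everything after
## position 50 is no smaller, although those stretches need not be sorted.
all(p[1:9] <= p[10])
all(p[51:100] >= p[50])
```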
Method "shell"
uses Shellsort (a variant from
Sedgewick (1986)). If
x
has names a stable modification is
used, so ties are not reordered. (This only matters if names are
present.)
Method "quick"
uses Singleton (1969)'s implementation of
Hoare's Quicksort method and is only available when x
is
numeric (double or integer) and partial
is NULL
. (For
other types of x
Shellsort is used, silently.) It is normally
somewhat faster than Shellsort (perhaps 50% faster on vectors of
length a million and twice as fast at a billion) but has poor
performance in the rare worst case. (Peto's modification using a
pseudo-random midpoint is used to make the worst case rarer.) This is
not a stable sort, and ties may be reordered.
Method "radix"
relies on simple hashing to scale time linearly
with the input size, i.e., its asymptotic time complexity is O(n). The
specific variant and its implementation originated from the data.table
package and are due to Matt Dowle and Arun Srinivasan. For small
inputs (< 200), the implementation uses an insertion sort (O(n^2))
that operates in-place to avoid the allocation overhead of the radix
sort. For integer vectors of range less than 100,000, it switches to a
simpler and faster linear time counting sort. In all cases, the sort
is stable; the order of ties is preserved. It is the default method
for integer vectors and factors.
The "radix"
method generally outperforms the other methods,
especially for small integers. Compared to quick sort, it is slightly
faster for vectors with large integer or real values (but unlike quick
sort, radix is stable and supports all na.last
options). The
implementation is orders of magnitude faster than shell sort for
character vectors, but collation does not respect the
locale and so gives incorrect answers even in English locales.
However, there are some caveats for the radix sort:
If x
is a character
vector, all elements must share
the same encoding. Only UTF-8 (including ASCII) and Latin-1
encodings are supported. Collation follows that with
LC_COLLATE=C, that is lexicographically byte-by-byte using
numerical ordering of bytes.
Long vectors (with $2^{31}$ or more elements)
and
complex
vectors are not supported.
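The collation caveat is easiest to see with mixed-case ASCII strings. In this sketch the radix result follows byte order, while the result of the other method depends on the locale in use, so no particular output is claimed for it.

```r
s <- c("banana", "Apple", "cherry", "apple", "Banana")

## Byte-wise (C-locale) collation: all upper-case ASCII letters sort
## before all lower-case ones.
sort(s, method = "radix")

## Locale-dependent collation; the result varies with LC_COLLATE.
sort(s, method = "shell")
```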
For sort
, the result depends on the S3 method which is
dispatched. If x
does not have a class sort.int
is used
and its description applies. For classed objects which do not have a
specific method the default method will be used and is equivalent to
x[order(x, ...)]
: this depends on the class having a suitable
method for [
(and also that order
will work,
which requires a xtfrm
method).
For sort.int
the value is the sorted vector unless
index.return
is true, when the result is a list with components
named x
and ix
containing the sorted numbers and the
ordering index vector. In the latter case, if method ==
"quick"
ties may be reversed in the ordering (unlike
sort.list
) as quicksort is not stable. For method ==
"radix"
, index.return
is supported for all na.last
modes. The other methods only support index.return
when na.last
is NA
. The index vector
refers to element numbers after removal of NA
s: see
order
if you want the original element numbers.
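The difference between the two kinds of index can be sketched on a short vector containing a missing value. The data are made up for illustration.

```r
x <- c(30, 10, NA, 20)

res <- sort.int(x, index.return = TRUE)   # na.last = NA: the NA is dropped
res$x
res$ix                                    # positions within x[!is.na(x)]

x_clean <- x[!is.na(x)]
identical(x_clean[res$ix], res$x)

## Positions in the original vector come from order():
order(x, na.last = NA)
```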
All attributes are removed from the return value (see Becker
et al., 1988, p.146) except names, which are sorted. (If
partial
is specified even the names are removed.) Note that
this means that the returned value has no class, except for factors
and ordered factors (which are treated specially and whose result is
transformed back to the original class).
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988). The New S Language. Wadsworth & Brooks/Cole.
Knuth, D. E. (1998). The Art of Computer Programming, Volume 3: Sorting and Searching, 2nd ed. Addison-Wesley.
Sedgewick, R. (1986). A new upper bound for Shellsort. Journal of Algorithms, 7, 159–173. doi:10.1016/0196-6774(86)90001-5.
Singleton, R. C. (1969). Algorithm 347: an efficient algorithm for sorting with minimal storage. Communications of the ACM, 12, 185–186. doi:10.1145/362875.362901.
‘Comparison’ for how character strings are collated.
order
for sorting on or reordering multiple variables.
require(stats)

x <- swiss$Education[1:25]
x; sort(x); sort(x, partial = c(10, 15))

## illustrate 'stable' sorting (of ties):
sort(c(10:3, 2:12), method = "shell", index.return = TRUE) # is stable
## $x : 2  3  3  4  4  5  5  6  6  7  7  8  8  9  9 10 10 11 12
## $ix: 9  8 10  7 11  6 12  5 13  4 14  3 15  2 16  1 17 18 19
sort(c(10:3, 2:12), method = "quick", index.return = TRUE) # is not
## $x : 2  3  3  4  4  5  5  6  6  7  7  8  8  9  9 10 10 11 12
## $ix: 9 10  8  7 11  6 12  5 13  4 14  3 15 16  2 17  1 18 19

x <- c(1:3, 3:5, 10)
is.unsorted(x)                  # FALSE: is sorted
is.unsorted(x, strictly = TRUE) # TRUE : is not (and cannot be)
                                # sorted strictly
## Not run: 
## Small speed comparison simulation:
N <- 2000
Sim <- 20
rep <- 1000 # << adjust to your CPU
c1 <- c2 <- numeric(Sim)
for(is in seq_len(Sim)){
  x <- rnorm(N)
  c1[is] <- system.time(for(i in 1:rep) sort(x, method = "shell"))[1]
  c2[is] <- system.time(for(i in 1:rep) sort(x, method = "quick"))[1]
  stopifnot(sort(x, method = "shell") == sort(x, method = "quick"))
}
rbind(ShellSort = c1, QuickSort = c2)
cat("Speedup factor of quick sort():\n")
summary({qq <- c1 / c2; qq[is.finite(qq)]})

## A larger test
x <- rnorm(1e7)
system.time(x1 <- sort(x, method = "shell"))
system.time(x2 <- sort(x, method = "quick"))
system.time(x3 <- sort(x, method = "radix"))
stopifnot(identical(x1, x2))
stopifnot(identical(x1, x3))

## End(Not run)
Generic function to sort an object in the order determined by one or more other objects, typically vectors. A method is defined for data frames to sort its rows (typically by one or more columns), and the default method handles vector-like objects.
sort_by(x, y, ...)

## Default S3 method:
sort_by(x, y, ...)

## S3 method for class 'data.frame'
sort_by(x, y, ...)
x |
An object to be sorted, typically a vector or data frame. |
y |
Variables to sort by. For the default method, this can be a vector, or more generally any
object that has a For the |
... |
Additional arguments, typically passed on to
|
A sorted version of x
. If x
is a data frame, this means
that the rows of x
have been reordered to sort the variables
specified in y
.
mtcars$am
mtcars$mpg
with(mtcars, sort_by(mpg, am)) # group mpg by am

## data.frame method
sort_by(mtcars, runif(nrow(mtcars))) # random row permutation
sort_by(mtcars, list(mtcars$am, mtcars$mpg))

# formula interface
sort_by(mtcars, ~ am + mpg) |> subset(select = c(am, mpg))
sort_by.data.frame(mtcars, ~ list(am, -mpg)) |> subset(select = c(am, mpg))
source
causes R to accept its input from the named file or URL
or connection or expressions directly. Input is read and
parse
d from that file
until the end of the file is reached, then the parsed expressions are
evaluated sequentially in the chosen environment.
withAutoprint(exprs)
is a wrapper for source(exprs =
exprs, ..)
with different defaults. Its main purpose is to evaluate
and auto-print expressions as if in a toplevel context, e.g., as in the
R console.
source(file, local = FALSE, echo = verbose, print.eval = echo,
       exprs, spaced = use_file,
       verbose = getOption("verbose"),
       prompt.echo = getOption("prompt"),
       max.deparse.length = 150, width.cutoff = 60L,
       deparseCtrl = "showAttributes",
       chdir = FALSE,
       catch.aborts = FALSE,
       encoding = getOption("encoding"),
       continue.echo = getOption("continue"),
       skip.echo = 0, keep.source = getOption("keep.source"))

withAutoprint(exprs, evaluated = FALSE, local = parent.frame(),
              print. = TRUE, echo = TRUE, max.deparse.length = Inf,
              width.cutoff = max(20, getOption("width")),
              deparseCtrl = c("keepInteger", "showAttributes", "keepNA"),
              skip.echo = 0, ...)
file |
a connection or a character string giving the
pathname of the file or URL to read from. The |
local |
|
echo |
logical; if |
print.eval , print.
|
logical; if |
exprs |
for for |
evaluated |
logical indicating that |
spaced |
logical indicating if newline (hence empty line) should
be printed before each expression (when |
verbose |
if |
prompt.echo |
character; gives the prompt to be used if
|
max.deparse.length |
integer; is used only if |
width.cutoff |
integer, passed to |
deparseCtrl |
|
chdir |
logical; if |
catch.aborts |
logical indicating that “abort”ing errors should be caught. |
encoding |
character vector. The encoding(s) to be assumed when
|
continue.echo |
character; gives the prompt to use on
continuation lines if |
skip.echo |
integer; how many comment lines at the start of the
file to skip if |
keep.source |
logical: should the source formatting be retained when echoing expressions, if possible? |
... |
(for |
Note that running code via source
differs in a few respects
from entering it at the R command line. Since expressions are not
executed at the top level, auto-printing is not done. So you will
need to include explicit print
calls for things you want to be
printed (and remember that this includes plotting by lattice,
FAQ Q7.22). Since the complete file is parsed before any of it is
run, syntax errors result in none of the code being run. If an error
occurs in running a syntactically correct script, anything assigned
into the workspace by code that has been run will be kept (just as
from the command line), but diagnostic information such as
traceback()
will contain additional calls to
withVisible
.
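The lack of auto-printing can be demonstrated with a two-line temporary script. This is a sketch only: the file contents and the environment `env` are made up for the example, and capture.output() is used just to make the console output inspectable.

```r
f <- tempfile(fileext = ".R")
writeLines(c("x <- 1 + 1", "x * 2"), f)
env <- new.env()

## No auto-printing by default: nothing is written to the console.
quiet <- capture.output(r <- source(f, local = env))
quiet
r$value          # value of the last expression evaluated

## print.eval = TRUE prints visible results, as at top level.
shown <- capture.output(source(f, local = env, print.eval = TRUE))
shown

unlink(f)
```

Passing an environment as `local` keeps the sourced assignments out of the global workspace.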
All versions of R accept input from a connection with end of line marked by LF (as used on Unix), CRLF (as used on DOS/Windows) or CR (as used on classic Mac OS) and map this to newline. The final line can be incomplete, that is missing the final end-of-line marker.
If keep.source
is true (the default in interactive use), the
source of functions is kept so they can be listed exactly as input.
Unlike input from a console, lines in the file or on a connection can contain an unlimited number of characters.
When skip.echo > 0
, that many comment lines at the start of
the file will not be echoed. This does not affect the execution of
the code at all. If there are executable lines within the first
skip.echo
lines, echoing will start with the first of them.
If echo
is true and a deparsed expression exceeds
max.deparse.length
, that many characters are output followed by
.... [TRUNCATED]
.
By default the input is read and parsed in the current encoding of the R session. This is usually what is required, but occasionally re-encoding is needed, e.g. if a file from a UTF-8-using system is to be read on Windows (or vice versa).
The rest of this paragraph applies if file
is an actual
filename or URL (and not a connection). If
encoding = "unknown"
, an attempt is made to guess the encoding:
the result of localeToCharset()
is used as a guide. If
encoding
has two or more elements, they are tried in turn until
the file/URL can be read without error in the trial encoding. If an
actual encoding
is specified (rather than the default or
"unknown"
) in a Latin-1 or UTF-8 locale then character strings
in the result will be translated to the current encoding and marked as
such (see Encoding
).
If file
is a connection,
it is not possible to re-encode the input inside source
, and so
the encoding
argument is just used to mark character strings in the
parsed input in Latin-1 and UTF-8 locales: see parse
.
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
demo
which uses source
;
eval
, parse
and scan
;
options("keep.source")
.
sys.source
which is a streamlined version to source a
file into an environment.
‘The R Language Definition’ for a discussion of source directives.
someCond <- 7 > 6
## want an if-clause to behave "as top level" wrt auto-printing :
## (all should look "as if on top level", e.g. non-assignments should print:)
if(someCond) withAutoprint({
   x <- 1:12
   x-1
   (y <- (x-5)^2)
   z <- y
   z - 10
})

## If you want to source() a bunch of files, something like
## the following may be useful:
sourceDir <- function(path, trace = TRUE, ...) {
   op <- options(); on.exit(options(op)) # to reset after each
   for (nm in list.files(path, pattern = "[.][RrSsQq]$")) {
      if(trace) cat(nm,":")
      source(file.path(path, nm), ...)
      if(trace) cat("\n")
      options(op)
   }
}

suppressWarnings( rm(x,y) ) # remove 'x' or 'y' from global env
withAutoprint({ x <- 1:2; cat("x=",x, "\n"); y <- x^2 })
## x and y now exist:
stopifnot(identical(x, 1:2), identical(y, x^2))

withAutoprint({ formals(sourceDir); body(sourceDir) },
              max.deparse.length = 20, verbose = TRUE)

## Continuing after (catchable) errors:
tc <- textConnection('1:3
 2 + "3"
 cat(" .. in spite of error: happily continuing! ..\n")
 6*7')
r <- source(tc, catch.aborts = TRUE)
## Error in 2 + "3" ....
## .. in spite of error: happily continuing! ..
stopifnot(identical(r, list(value = 42, visible=TRUE)))
Special mathematical functions related to the beta and gamma functions.
beta(a, b)
lbeta(a, b)

gamma(x)
lgamma(x)
psigamma(x, deriv = 0)
digamma(x)
trigamma(x)

choose(n, k)
lchoose(n, k)
factorial(x)
lfactorial(x)
a , b
|
non-negative numeric vectors. |
x , n
|
numeric vectors. |
k , deriv
|
integer vectors. |
The functions beta
and lbeta
return the beta function
and the natural logarithm of the beta function,
$B(a, b) = \Gamma(a)\Gamma(b) / \Gamma(a + b)$.
The formal definition is
$B(a, b) = \int_0^1 t^{a-1} (1-t)^{b-1} \, dt$
(Abramowitz and Stegun section 6.2.1, page 258).
Note that it is only
defined in R for non-negative a
and b
, and is infinite
if either is zero.
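A quick numerical check of the two standard characterisations of the beta function, as the ratio Gamma(a) Gamma(b) / Gamma(a + b) and as the integral of t^(a-1) (1-t)^(b-1) over [0, 1], is sketched below. The argument values are arbitrary, and stats::integrate() is used only as an independent numerical reference.

```r
a <- 2.5; b <- 4

## Ratio-of-gammas form.
all.equal(beta(a, b), gamma(a) * gamma(b) / gamma(a + b))

## Integral definition, checked numerically.
int <- stats::integrate(function(t) t^(a - 1) * (1 - t)^(b - 1), 0, 1)
all.equal(beta(a, b), int$value, tolerance = 1e-6)

## lbeta() works on the log scale and stays finite for large arguments.
lbeta(1e3, 1e3)

beta(0, 1)   # infinite if either argument is zero
```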
The functions gamma
and lgamma
return the gamma function
and the natural logarithm of the absolute value of the
gamma function. The gamma function is defined by
$\Gamma(x) = \int_0^\infty t^{x-1} e^{-t} \, dt$
(Abramowitz and Stegun section 6.1.1, page 255)
for all real x
except zero and negative integers (when
NaN
is returned). There will be a warning on possible loss of
precision for values which are too close (within about
$10^{-8}$) to a negative integer less than ‘-10’.
factorial(x)
($x!$ for non-negative integer
x
)
is defined to be gamma(x+1)
and lfactorial
to be
lgamma(x+1)
.
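The relationship between the factorial and gamma functions can be checked directly. This short sketch also shows why the log-scale variant is useful once the result no longer fits in a double.

```r
factorial(5)                     # 120
gamma(6)                         # the same value via gamma(x + 1)
all.equal(factorial(0:10), gamma(0:10 + 1))

## lfactorial() stays finite where the factorial itself exceeds the
## range of double precision.
lfactorial(200)
all.equal(lfactorial(200), sum(log(1:200)))
```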
The functions digamma
and trigamma
return the first and second
derivatives of the logarithm of the gamma function.
psigamma(x, deriv)
(deriv >= 0
) computes the
deriv
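-th derivative of $\psi(x)$, where
$\psi(x) = \frac{d}{dx} \ln \Gamma(x) = \Gamma'(x) / \Gamma(x)$
is the digamma function.
$\psi$ and its derivatives, the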
psigamma()
functions, are
often called the ‘polygamma’ functions, e.g. in
Abramowitz and Stegun (section 6.4.1, page 260); and higher
derivatives (deriv = 2:4
) have occasionally been called
‘tetragamma’, ‘pentagamma’, and ‘hexagamma’.
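The correspondence between digamma, trigamma and psigamma can be verified numerically. In this sketch the step size `h` and the evaluation points are arbitrary choices, and the central difference is only a rough stand-in for the derivative of lgamma.

```r
x <- c(0.5, 1, 2.5, 10)

all.equal(digamma(x),  psigamma(x, deriv = 0))
all.equal(trigamma(x), psigamma(x, deriv = 1))

## digamma is the derivative of lgamma; compare with a central difference.
h <- 1e-5
num_deriv <- (lgamma(x + h) - lgamma(x - h)) / (2 * h)
all.equal(digamma(x), num_deriv, tolerance = 1e-6)

-digamma(1)   # the Euler-Mascheroni constant, about 0.5772
```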
The functions choose
and lchoose
return binomial
coefficients and the logarithms of their absolute values. Note that
choose(n, k)
is defined for all real numbers $n$ and integer
$k$. For $k \ge 1$ it is defined as
$n(n-1) \cdots (n-k+1) / k!$,
as $1$ for $k = 0$ and as $0$ for negative $k$.
Non-integer values of
k
are rounded to an integer, with a warning.
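The three cases of the definition, including a real and a negative first argument, can be worked through by hand and compared with the function. The values below were chosen so that the arithmetic is easy to follow.

```r
choose(5, 2)       # k >= 1: 5 * 4 / 2! = 10
choose(5, 0)       # k = 0 gives 1
choose(5, -1)      # negative k gives 0
choose(0.5, 2)     # real n: 0.5 * (-0.5) / 2! = -0.125
choose(-2, 3)      # negative n: (-2) * (-3) * (-4) / 3! = -4

## lchoose() returns the log of the absolute value.
all.equal(lchoose(-2, 3), log(4))
```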
choose(*, k)
uses direct arithmetic (instead of
[l]gamma
calls) for small k
, for speed and accuracy
reasons. Note the function combn
(package
utils) for enumeration of all possible combinations.
The gamma
, lgamma
, digamma
and trigamma
functions are internal generic primitive functions: methods can be
defined for them individually or via the
Math
group generic.
gamma
, lgamma
, beta
and lbeta
are based on
C translations of Fortran subroutines by W. Fullerton of Los Alamos
Scientific Laboratory (now available as part of SLATEC).
digamma
, trigamma
and psigamma
for x >= 0
are based on
Amos, D. E. (1983). A portable Fortran subroutine for derivatives of the psi function, Algorithm 610, ACM Transactions on Mathematical Software 9(4), 494–502.
For x < 0
and deriv <= 5
, the reflection formula (6.4.7) of
Abramowitz and Stegun is used.
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988)
The New S Language.
Wadsworth & Brooks/Cole. (For gamma
and lgamma
.)
Abramowitz, M. and Stegun, I. A. (1972)
Handbook of Mathematical Functions. New York: Dover.
https://en.wikipedia.org/wiki/Abramowitz_and_Stegun provides
links to the full text which is in public domain.
Chapter 6: Gamma and Related Functions.
Arithmetic
for simple, sqrt
for
miscellaneous mathematical functions and Bessel
for the
real Bessel functions.
For the incomplete gamma function see pgamma
.
require(graphics)

choose(5, 2)
for (n in 0:10) print(choose(n, k = 0:n))

factorial(100)
lfactorial(10000)

## gamma has 1st order poles at 0, -1, -2, ...
## this will generate loss of precision warnings, so turn off
op <- options("warn")
options(warn = -1)
x <- sort(c(seq(-3, 4, length.out = 201), outer(0:-3, (-1:1)*1e-6, `+`)))
plot(x, gamma(x), ylim = c(-20,20), col = "red", type = "l", lwd = 2,
     main = expression(Gamma(x)))
abline(h = 0, v = -3:0, lty = 3, col = "midnightblue")
options(op)

x <- seq(0.1, 4, length.out = 201); dx <- diff(x)[1]
par(mfrow = c(2, 3))
for (ch in c("", "l","di","tri","tetra","penta")) {
  is.deriv <- nchar(ch) >= 2
  nm <- paste0(ch, "gamma")
  if (is.deriv) {
    dy <- diff(y) / dx # finite difference
    der <- which(ch == c("di","tri","tetra","penta")) - 1
    nm2 <- paste0("psigamma(*, deriv = ", der,")")
    nm  <- if(der >= 2) nm2 else paste(nm, nm2, sep = " ==\n")
    y <- psigamma(x, deriv = der)
  } else {
    y <- get(nm)(x)
  }
  plot(x, y, type = "l", main = nm, col = "red")
  abline(h = 0, col = "lightgray")
  if (is.deriv) lines(x[-1], dy, col = "blue", lty = 2)
}
par(mfrow = c(1, 1))

## "Extended" Pascal triangle:
fN <- function(n) formatC(n, width=2)
for (n in -4:10) {
    cat(fN(n),":", fN(choose(n, k = -2:max(3, n+2))))
    cat("\n")
}

## R code version of choose()  [simplistic; warning for k < 0]:
mychoose <- function(r, k)
    ifelse(k <= 0, (k == 0),
           sapply(k, function(k) prod(r:(r-k+1))) / factorial(k))
k <- -1:6
cbind(k = k, choose(1/2, k), mychoose(1/2, k))

## Binomial theorem for n = 1/2 ;
## sqrt(1+x) = (1+x)^(1/2) = sum_{k=0}^Inf  choose(1/2, k) * x^k :
k <- 0:10 # 10 is sufficient for ~ 9 digit precision:
sqrt(1.25)
sum(choose(1/2, k)* .25^k)
split
divides the data in the vector x
into the groups
defined by f
. The replacement forms replace values
corresponding to such a division. unsplit
reverses the effect of
split
.
split(x, f, drop = FALSE, ...)
## Default S3 method:
split(x, f, drop = FALSE, sep = ".", lex.order = FALSE, ...)

split(x, f, drop = FALSE, ...) <- value
unsplit(value, f, drop = FALSE)
x |
vector or data frame containing values to be divided into groups. |
f |
a ‘factor’ in the sense that |
drop |
logical indicating if levels that do not occur should be dropped
(if |
value |
a list of vectors or data frames compatible with a
splitting of |
sep |
character string, passed to |
lex.order |
logical, passed to |
... |
further potential arguments passed to methods. |
split
and split<-
are generic functions with default and
data.frame
methods. The data frame method can also be used to
split a matrix into a list of matrices, and the replacement form
likewise, provided they are invoked explicitly.
unsplit
works with lists of vectors or data frames (assumed to
have compatible structure, as if created by split
). It puts
elements or rows back in the positions given by f
. In the data
frame case, row names are obtained by unsplitting the row name
vectors from the elements of value
.
f
is recycled as necessary and if the length of x
is not
a multiple of the length of f
a warning is printed.
Any missing values in f
are dropped together with the
corresponding values of x
.
The default method calls interaction
when f
is a
list
. If the levels of the factors contain ‘.’
the factors may not be split as expected, unless sep
is set to
a string not present in the factor levels
.
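The two points above (missing values in the grouping are dropped, and `sep` matters when levels contain a dot) can be sketched as follows. All vectors here are invented for illustration.

```r
# Missing values in f are dropped together with the matching values of x.
x <- 1:6
f <- c("a", "b", NA, "a", "b", "a")
by_group <- split(x, f)       # element 3 of x does not appear in any group

# When f is a list, the default method combines the factors with
# interaction(); levels containing "." make the combined names ambiguous.
f1 <- c("a.b", "a.b", "c", "c")
f2 <- c("x", "y", "x", "y")
names_dot   <- names(split(1:4, list(f1, f2)))
names_under <- names(split(1:4, list(f1, f2), sep = "_"))

names_dot     # "a.b.x" "c.x" "a.b.y" "c.y"
names_under   # "a.b_x" "c_x" "a.b_y" "c_y"
```

With `sep = "_"` the boundary between the two factors stays visible even though one level contains a dot.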
The value returned from split
is a list of vectors containing
the values for the groups. The components of the list are named by
the levels of f
(after converting to a factor, or if already a
factor and drop = TRUE
, dropping unused levels).
The replacement forms return their right hand side. unsplit
returns a vector or data frame for which split(x, f)
equals
value.
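A small sketch of the round trip described above, together with the replacement form. The data are invented; the group-wise centring is only one possible use.

```r
x <- c(10, 20, 30, 40, 50)
f <- factor(c("u", "v", "u", "v", "u"))

# unsplit() puts the pieces back in the positions given by f.
parts <- split(x, f)
back  <- unsplit(parts, f)

# The replacement form writes group-wise results back into a copy of x.
centred <- x
split(centred, f) <- lapply(split(x, f), function(v) v - mean(v))
centred   # -20 -10 0 10 20 (both group means are 30)
```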
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
cut
to categorize numeric values.
strsplit
to split strings.
require(stats); require(graphics) n <- 10; nn <- 100 g <- factor(round(n * runif(n * nn))) x <- rnorm(n * nn) + sqrt(as.numeric(g)) xg <- split(x, g) boxplot(xg, col = "lavender", notch = TRUE, varwidth = TRUE) sapply(xg, length) sapply(xg, mean) ### Calculate 'z-scores' by group (standardize to mean zero, variance one) z <- unsplit(lapply(split(x, g), scale), g) # or zz <- x split(zz, g) <- lapply(split(x, g), scale) # and check that the within-group std dev is indeed one tapply(z, g, sd) tapply(zz, g, sd) ### data frame variation ## Notice that assignment form is not used since a variable is being added g <- airquality$Month l <- split(airquality, g) ## Alternative using a formula identical(l, split(airquality, ~ Month)) l <- lapply(l, transform, Oz.Z = scale(Ozone)) aq2 <- unsplit(l, g) head(aq2) with(aq2, tapply(Oz.Z, Month, sd, na.rm = TRUE)) ### Split a matrix into a list by columns ma <- cbind(x = 1:10, y = (-4:5)^2) split(ma, col(ma)) split(1:10, 1:2)
A wrapper for the C function sprintf
, that returns a character
vector containing a formatted combination of text and variable values.
sprintf(fmt, ...) gettextf(fmt, ..., domain = NULL, trim = TRUE)
fmt |
a character vector of format strings, each of up to 8192 bytes. |
... |
values to be passed into |
trim , domain
|
see |
sprintf
is a wrapper for the system sprintf
C-library
function. Attempts are made to check that the mode of the values
passed match the format supplied, and R's special values (NA
,
Inf
, -Inf
and NaN
) are handled correctly.
gettextf
is a convenience function which provides C-style
string formatting with possible translation of the format string.
The arguments (including fmt
) are recycled if possible a whole
number of times to the length of the longest, and then the formatting
is done in parallel. Zero-length arguments are allowed and will give
a zero-length result. All arguments are evaluated even if unused, and
hence some types (e.g., "symbol"
or "language"
, see
typeof
) are not allowed. Arguments unused by fmt
result in a warning. (The format %.0s
can be used to
“skip” an argument.)
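The recycling rules and the `%.0s` idiom described above can be illustrated with a short sketch. The strings are arbitrary.

```r
# The length-one format is recycled over the two length-two arguments.
labels <- sprintf("%s has %d items", c("A", "B"), 1:2)
labels    # "A has 1 items" "B has 2 items"

# "%.0s" consumes its argument but prints nothing, so no warning about an
# unused argument is raised.
skipped <- sprintf("%.0s%s", "ignored", "kept")
skipped   # "kept"

# A zero-length argument gives a zero-length result.
empty <- sprintf("%d", integer(0))
```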
The following is abstracted from Kernighan and Ritchie (1988): however the actual implementation will follow the C99 standard and fine details (especially the behaviour under user error) may depend on the platform. References to numbered arguments come from POSIX.
The string fmt
contains normal characters,
which are passed through to the output string, and also conversion
specifications which operate on the arguments provided through
...
. The allowed conversion specifications start with a
%
and end with one of the letters in the set
aAdifeEgGosxX%
. These letters denote the following types:
d
, i
, o
, x
, X
Integer
value, o
being octal,
x
and X
being hexadecimal (using the same case for
a-f
as the code). Numeric variables with exactly integer
values will be coerced to integer. Formats d
and i
can also be used for logical variables, which will be converted to
0
, 1
or NA
.
f
Double precision value, in “fixed
point” decimal notation of the form ‘"[-]mmm.ddd"’. The number of
decimal places ("d") is specified by the precision: the default is 6;
a precision of 0 suppresses the decimal point. Non-finite values
are converted to NA
, NaN
or (perhaps a sign followed
by) Inf
.
e
, E
Double precision value, in
“exponential” decimal notation of the
form [-]m.ddde[+-]xx
or [-]m.dddE[+-]xx
.
g
, G
Double precision value, in %e
or
%E
format if the exponent is less than -4 or greater than or
equal to the precision, and %f
format otherwise.
(The precision (default 6) specifies the number of
significant digits here, whereas in %f, %e
, it is
the number of digits after the decimal point.)
a
, A
Double precision value, in binary notation
of the form [-]0xh.hhhp[+-]d
. This is a binary fraction
expressed in hex multiplied by a (decimal) power of 2. The number
of hex digits after the decimal point is specified by the precision:
the default is enough digits to represent exactly the internal
binary representation. Non-finite values are converted to NA
,
NaN
or (perhaps a sign followed by) Inf
. Format
%a
uses lower-case for x
, p
and the hex
values: format %A
uses upper-case.
This should be supported on all platforms as it is a feature of C99.
The format is not uniquely defined: although it would be possible
to make the leading h
always zero or one, this is not
always done. Most systems will suppress trailing zeros, but a few
do not. On a well-written platform, for normal numbers there will
be a leading one before the decimal point plus (by default) 13
hexadecimal digits, hence 53 bits. The treatment of denormalized
(aka ‘subnormal’) numbers is very platform-dependent.
s
Character string. Character NA
s are
converted to "NA"
.
%
Literal %
(none of the extra formatting
characters given below are permitted in this case).
Conversion by as.character
is used for non-character
arguments with s
and by as.double
for
non-double arguments with f, e, E, g, G
. NB: the length is
determined before conversion, so do not rely on the internal
coercion if this would change the length. The coercion is done only
once, so if length(fmt) > 1
then all elements must expect the
same types of arguments.
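The conversion types listed above can be sketched with a few calls. The values are arbitrary, and the expected results in the comments assume a C99-conforming platform.

```r
int_from_double <- sprintf("%d", 3)           # integer-valued double: "3"
int_from_lgl    <- sprintf("%d", TRUE)        # logical: "1"
bases <- sprintf("%x / %X / %o", 255L, 255L, 8L)   # "ff / FF / 10"
fixed <- sprintf("%5.2f", pi)                 # " 3.14"
expo  <- sprintf("%e", 12345.678)             # "1.234568e+04"
chr   <- sprintf("%s", 1.5)                   # converted by as.character: "1.5"
pct   <- sprintf("%d%%", 10L)                 # literal percent sign: "10%"

# A double that is not integer-valued is rejected by %d.
bad <- try(sprintf("%d", 7.1), silent = TRUE)
inherits(bad, "try-error")   # TRUE
```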
In addition, between the initial %
and the terminating
conversion character there may be, in any order:
m.n
Two numbers separated by a period, denoting the
field width (m
) and the precision (n
).
-
Left adjustment of converted argument in its field.
+
Always print number with sign: by default only negative numbers are printed with a sign.
(a space)
Prefix a space if the first character is not a sign.
0
For numbers, pad to the field width with leading zeros. For characters, this zero-pads on some platforms and is ignored on others.
#
specifies “alternate output” for numbers, its
action depending on the type:
For x
or X
, 0x
or 0X
will be prefixed
to a non-zero result. For e
, E
, f
, g
and G
, the output will always have a decimal point; for
g
and G
, trailing zeros will not be removed.
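A brief sketch of the width, precision and flag modifiers listed above, applied to numbers only (zero-padding of strings is platform-dependent, as noted). The values are arbitrary.

```r
plus   <- sprintf("%+d", 5L)      # always show the sign: "+5"
space  <- sprintf("% d", 5L)      # space where the sign would go: " 5"
zeros  <- sprintf("%05d", 42L)    # pad with leading zeros: "00042"
left   <- sprintf("%-5d|", 42L)   # left-adjust in the field: "42   |"
widthp <- sprintf("%8.3f", pi)    # width 8, precision 3: "   3.142"
althex <- sprintf("%#x", 255L)    # alternate output for hex: "0xff"
altdec <- sprintf("%#.0f", 3)     # decimal point is kept: "3."
```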
Further, immediately after %
may come 1$
to 99$
to refer to a numbered argument: this allows arguments to be
referenced out of order and is mainly intended for translators of
error messages. If this is done it is best if all formats are
numbered: if not the unnumbered ones process the arguments in order.
See the examples. This notation allows arguments to be used more than
once, in which case they must be used as the same type (integer,
double or character).
A field width or precision (but not both) may be indicated by an
asterisk *
: in this case an argument specifies the desired
number. A negative field width is taken as a '-' flag followed by a
positive field width. A negative precision is treated as if the
precision were omitted. The argument should be integer, but a double
argument will be coerced to integer.
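Numbered arguments and the asterisk notation described above might be used as in this sketch. The strings and numbers are arbitrary.

```r
# Numbered arguments can be referenced out of order ...
swapped <- sprintf("%2$s %1$s", "world", "hello")    # "hello world"

# ... and re-used, provided each use has the same type.
reused <- sprintf("%1$d in hex is %1$x", 255L)       # "255 in hex is ff"

# An asterisk takes the field width or the precision from an argument.
star_width <- sprintf("%*d", 5, 42L)                 # "   42"
star_prec  <- sprintf("%.*f", 2, pi)                 # "3.14"
```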
There is a limit of 8192 bytes on elements of fmt
, and on
strings included from a single %
letter conversion
specification.
Field widths and precisions of %s
conversions are interpreted
as bytes, not characters, as described in the C standard.
The C doubles used for R numerical vectors have signed zeros, which
sprintf
may output as -0
, -0.000
....
A character vector of length that of the longest input. If any
element of fmt
or any character argument is declared as UTF-8,
the element of the result will be in UTF-8 and have the encoding
declared as UTF-8. Otherwise it will be in the current locale's
encoding.
The format string is passed down the OS's sprintf
function, and
incorrect formats can cause the latter to crash the R process. R
does perform sanity checks on the format, but not all possible user
errors on all platforms have been tested, and some might be terminal.
The behaviour on inputs not documented here is ‘undefined’, which means it is allowed to differ by platform.
Original code by Jonathan Rougier.
Kernighan, B. W. and Ritchie, D. M. (1988) The C Programming Language. Second edition, Prentice Hall. Describes the format options in table B-1 in the Appendix.
The C Standards, especially ISO/IEC 9899:1999 for ‘C99’. Links can be found at https://developer.r-project.org/Portability.html.
https://pubs.opengroup.org/onlinepubs/9699919799/functions/snprintf.html for POSIX extensions such as numbered arguments.
man sprintf
on a Unix-alike system.
formatC
for a way of formatting vectors of numbers in a
similar fashion.
paste
for another way of creating a vector combining
text and values.
gettext
for the mechanisms for the automated translation
of text.
## be careful with the format: most things in R are floats ## only integer-valued reals get coerced to integer. sprintf("%s is %f feet tall\n", "Sven", 7.1) # OK try(sprintf("%s is %i feet tall\n", "Sven", 7.1)) # not OK sprintf("%s is %i feet tall\n", "Sven", 7 ) # OK ## use a literal % : sprintf("%.0f%% said yes (out of a sample of size %.0f)", 66.666, 3) ## various formats of pi : sprintf("%f", pi) sprintf("%.3f", pi) sprintf("%1.0f", pi) sprintf("%5.1f", pi) sprintf("%05.1f", pi) sprintf("%+f", pi) sprintf("% f", pi) sprintf("%-10f", pi) # left justified sprintf("%e", pi) sprintf("%E", pi) sprintf("%g", pi) sprintf("%g", 1e6 * pi) # -> exponential sprintf("%.9g", 1e6 * pi) # -> "fixed" sprintf("%G", 1e-6 * pi) ## no truncation: sprintf("%1.f", 101) ## re-use one argument three times, show difference between %x and %X xx <- sprintf("%1$d %1$x %1$X", 0:15) xx <- matrix(xx, dimnames = list(rep("", 16), "%d%x%X")) noquote(format(xx, justify = "right")) ## More sophisticated: sprintf("min 10-char string '%10s'", c("a", "ABC", "and an even longer one")) n <- 1:18 sprintf(paste0("e with %2d digits = %.", n, "g"), n, exp(1)) ## Platform-dependent bad example: may pad with spaces or zeroes sprintf("%09s", month.name) ## Using arguments out of order sprintf("second %2$1.0f, first %1$5.2f, third %3$1.0f", pi, 2, 3) ## Using asterisk for width or precision sprintf("precision %.*f, width '%*.3f'", 3, pi, 8, pi) ## Asterisk and argument re-use, 'e' example reiterated: sprintf("e with %1$2d digits = %2$.*1$g", n, exp(1)) ## re-cycle arguments sprintf("%s %d", "test", 1:3) ## binary output showing rounding/representation errors x <- seq(0, 1.0, 0.1); y <- c(0,.1,.2,.3,.4,.5,.6,.7,.8,.9,1) cbind(x, sprintf("%a", x), sprintf("%a", y))
Single or double quote text by combining with appropriate single or double left and right quotation marks.
sQuote(x, q = getOption("useFancyQuotes")) dQuote(x, q = getOption("useFancyQuotes"))
x |
an R object, to be coerced to a character vector. |
q |
the kind of quotes to be used, see ‘Details’. |
The purpose of the functions is to provide a simple means of markup for quoting text to be used in the R output, e.g., in warnings or error messages.
The choice of the appropriate quotation marks depends on both the locale and the available character sets. Older Unix/X11 fonts displayed the grave accent (ASCII code 0x60) and the apostrophe (0x27) in a way that they could also be used as matching open and close single quotation marks. Using modern fonts, or non-Unix systems, these characters no longer produce matching glyphs. Unicode provides left and right single quotation mark characters (U+2018 and U+2019); if Unicode markup cannot be assumed to be available, it seems good practice to use the apostrophe as a non-directional single quotation mark.
Similarly, Unicode has left and right double quotation mark characters (U+201C and U+201D); if only ASCII's typewriter characteristics can be employed, then the ASCII quotation mark (0x22) should be used as both the left and right double quotation mark.
Some other locales also have the directional quotation marks, notably on Windows. TeX uses grave and apostrophe for the directional single quotation marks, and doubled grave and doubled apostrophe for the directional double quotation marks.
What rendering is used depends on q
which by default depends on
the options
setting for useFancyQuotes
. If this
is FALSE
then the undirectional
ASCII quotation style is used. If this is TRUE
(the default),
Unicode directional quotes are used where available
(currently, UTF-8 locales on Unix-alikes and all Windows locales
except C
): if set to "UTF-8"
UTF-8 markup is used
(whatever the current locale). If set to "TeX"
, TeX-style
markup is used. Finally, if this is set to a character vector of
length four, the first two entries are used for beginning and ending
single quotes and the second two for beginning and ending double
quotes: this can be used to implement non-English quoting conventions
such as the use of guillemets.
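The settings described above can also be passed per call through `q`, which avoids changing the global option. This is a minimal sketch; the four-element vector `marks` is an invented ASCII stand-in for locale-specific quotation marks.

```r
# FALSE: undirectional ASCII quotes.
plain_s <- sQuote("x", q = FALSE)     # "'x'"
plain_d <- dQuote("x", q = FALSE)     # "\"x\""

# "TeX": grave/apostrophe markup.
tex_s <- sQuote("x", q = "TeX")       # "`x'"
tex_d <- dQuote("x", q = "TeX")       # "``x''"

# Length-four character vector: entries 1-2 for single quotes,
# entries 3-4 for double quotes.
marks <- c("<", ">", "<<", ">>")
custom_s <- sQuote("x", q = marks)    # "<x>"
custom_d <- dQuote("x", q = marks)    # "<<x>>"
```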
Where fancy quotes are used, you should be aware that they may not be rendered correctly as not all fonts include the requisite glyphs: for example some have directional single quotes but not directional double quotes.
A character vector of the same length as x
(after any coercion)
in the current locale's encoding.
Markus Kuhn, “ASCII and Unicode quotation marks”. https://www.cl.cam.ac.uk/~mgk25/ucs/quotes.html
Quotes for quoting R code.
shQuote
for quoting OS commands.
op <- options("useFancyQuotes") paste("argument", sQuote("x"), "must be non-zero") options(useFancyQuotes = FALSE) cat("\ndistinguish plain", sQuote("single"), "and", dQuote("double"), "quotes\n") options(useFancyQuotes = TRUE) cat("\ndistinguish fancy", sQuote("single"), "and", dQuote("double"), "quotes\n") options(useFancyQuotes = "TeX") cat("\ndistinguish TeX", sQuote("single"), "and", dQuote("double"), "quotes\n") if(l10n_info()$`Latin-1`) { options(useFancyQuotes = c("\xab", "\xbb", "\xbf", "?")) cat("\n", sQuote("guillemet"), "and", dQuote("Spanish question"), "styles\n") } else if(l10n_info()$`UTF-8`) { options(useFancyQuotes = c("\xc2\xab", "\xc2\xbb", "\xc2\xbf", "?")) cat("\n", sQuote("guillemet"), "and", dQuote("Spanish question"), "styles\n") } options(op)
These functions are for working with source files and more generally
with “source references” ("srcref"
), i.e., references to
source code. The resulting data is used for printing and source level
debugging, and is typically available in interactive R sessions,
namely when options(keep.source = TRUE)
.
srcfile(filename, encoding = getOption("encoding"), Enc = "unknown") srcfilecopy(filename, lines, timestamp = Sys.time(), isFile = FALSE) srcfilealias(filename, srcfile) getSrcLines(srcfile, first, last) srcref(srcfile, lloc) ## S3 method for class 'srcfile' print(x, ...) ## S3 method for class 'srcfile' summary(object, ...) ## S3 method for class 'srcfile' open(con, line, ...) ## S3 method for class 'srcfile' close(con, ...) ## S3 method for class 'srcref' print(x, useSource = TRUE, ...) ## S3 method for class 'srcref' summary(object, useSource = FALSE, ...) ## S3 method for class 'srcref' as.character(x, useSource = TRUE, to = x, ...) .isOpen(srcfile)
filename |
The name of a file. |
encoding |
The character encoding to assume for the file. |
Enc |
The encoding with which to make strings: see the
|
lines |
A character vector of source lines. Other R objects will be coerced to character. |
timestamp |
The timestamp to use on a copy of a file. |
isFile |
Is this |
srcfile |
A |
first , last , line
|
Line numbers. |
lloc |
A vector of four, six or eight values giving a source location; see ‘Details’. |
x , object , con
|
An object of the appropriate class. |
useSource |
Whether to read the |
to |
An optional second |
... |
Additional arguments to the methods; these will be ignored. |
These functions and classes handle source code references.
The srcfile
function produces an object of class
srcfile
, which contains the name and directory of a source code
file, along with its timestamp, for use in source level debugging (not
yet implemented) and source echoing. The encoding of the file is
saved; see file
for a discussion of encodings, and
iconvlist
for a list of allowable encodings on your platform.
The srcfilecopy
function produces an object of the descendant
class srcfilecopy
, which saves the source lines in a character
vector. It copies the value of the isFile
argument, to help
debuggers identify whether this text comes from a real file in the
file system.
The srcfilealias
function produces an object of the descendant
class srcfilealias
, which gives an alternate name to another
srcfile
. This is produced by the parser when a #line
directive
is used.
The getSrcLines
function reads the specified lines from
srcfile
.
The srcref
function produces an object of class
srcref
, which describes a range of characters in a
srcfile
.
The lloc
value gives the following values:
c(first_line, first_byte, last_line, last_byte, first_column, last_column, first_parsed, last_parsed)
Bytes (elements 2, 4) and
columns (elements 5, 6) may be different due to multibyte
characters. If only four values are given, the columns and bytes
are assumed to match. Lines (elements 1, 3) and parsed lines
(elements 7, 8) may differ if a #line
directive is used in
code: the former will respect the directive, the latter will just
count lines. If only 4 or 6 elements are given, the parsed lines
will be assumed to match the lines.
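To make the layout of `lloc` concrete, here is a small sketch that builds a source reference over an in-memory copy of some lines. The file name "demo.R" and the three code lines are hypothetical; no file is read.

```r
# An in-memory "file" of three lines.
src <- srcfilecopy("demo.R", c("x <- 1", "y <- x + 1", "z <- y * 2"))

# Four values: first_line, first_byte, last_line, last_byte.
ref <- srcref(src, c(2L, 1L, 3L, 10L))

# The missing columns and parsed lines are filled in from the bytes and lines.
filled <- as.integer(ref)
filled    # 2 1 3 10 1 10 2 3

text_23 <- as.character(ref)          # the text of lines 2 to 3
first_2 <- getSrcLines(src, 1, 2)     # lines 1 to 2 of the copy
```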
Methods are defined for print
, summary
, open
,
and close
for classes srcfile
and srcfilecopy
.
The open
method opens its internal file
connection at
a particular line; if it was already open, it will be repositioned
to that line.
Methods are defined for print
, summary
and
as.character
for class srcref
. The as.character
method will read the associated source file to obtain the text
corresponding to the reference. If the to
argument is given,
it should be a second srcref
that follows the first, in the
same file; they will be treated as one reference to the whole
range. The exact behaviour depends on the
class of the source file. If the source file inherits from
class srcfilecopy
, the lines are taken from the saved copy
using the “parsed” line counts. If not, an attempt
is made to read the file, and the original line numbers of the
srcref
record (i.e., elements 1 and 3) are used. If an error
occurs (e.g., the file no longer exists), text like
‘<srcref: "file" chars 1:1 to 2:10>’ will be returned instead,
indicating the line:column
ranges of the first and last
character. The summary
method defaults to this type of
display.
Lists of srcref
objects may be attached to expressions as the
"srcref"
attribute. (The list of srcref
objects should be the same
length as the expression.) By default, expressions are printed by
print.default
using the associated srcref
. To
see deparsed code instead, call print
with argument
useSource = FALSE
. If a srcref
object
is printed with useSource = FALSE
, the ‘<srcref: ....>’
record will be printed.
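A short sketch of the "srcref" attribute described above, using `parse()` with `keep.source = TRUE` so that it does not depend on the session's `keep.source` option. The parsed text is arbitrary.

```r
ex <- parse(text = "a <- 1\nb <- a + 1", keep.source = TRUE)

# One srcref per top-level expression.
refs <- attr(ex, "srcref")
length(refs) == length(ex)            # TRUE

second <- as.character(refs[[2]])     # original text: "b <- a + 1"

# With useSource = FALSE the '<srcref: ....>' record is printed instead.
print(refs[[2]], useSource = FALSE)
```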
.isOpen
is intended for internal use: it checks whether the
connection associated with a srcfile
object is open.
srcfile
returns a srcfile
object.
srcfilecopy
returns a srcfilecopy
object.
getSrcLines
returns a character vector of source code lines.
srcref
returns a srcref
object.
Duncan Murdoch
getSrcFilename
for extracting information from a source
reference, or removeSource
to remove it from a
(non-primitive) function (aka ‘closure’).
src <- srcfile(system.file("DESCRIPTION", package = "base")) summary(src) getSrcLines(src, 1, 4) ref <- srcref(src, c(1, 1, 2, 1000)) ref print(ref, useSource = FALSE)
Errors signaled by R when stacks used in evaluation overflow.
R uses several stacks in evaluating expressions: the C stack, the
pointer protection stack, and the node stack used by the byte code
engine. In addition, the number of nested R expressions currently
under evaluation is limited by the value set as
options("expressions")
. Overflowing these stacks or
limits signals an error that inherits from classes
stackOverflowError
, error
, and condition
.
The specific classes signaled are:
CStackOverflowError
: Signaled when the C stack
overflows. The usage
field of the error object contains the
current stack usage.
protectStackOverflowError
: Signaled when the pointer
protection stack overflows.
nodeStackOverflowError
: Signaled when the node stack
used by the byte code engine overflows.
expressionStackOverflowError
: Signaled when the
evaluation depth, the number of nested R expressions currently
under evaluation, exceeds the limit set by
options("expressions")
.
Stack overflow errors can be caught and handled by exiting handlers
established with tryCatch()
. Calling handlers established
by withCallingHandlers()
may fail since there may not be
enough stack space to run the handler. In this case the next available
exiting handler will be run, or error handling will fall back to the
default handler. Default handlers set by
options("error")
may also fail to run in a stack
overflow situation.
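As a hedged sketch of catching such an error with an exiting handler, the following lowers the expression nesting limit temporarily so that a runaway recursion hits that limit quickly. The function `recurse` and the limit of 500 are illustrative choices only.

```r
# Unbounded recursion (hypothetical example function).
recurse <- function(n) recurse(n + 1)

# Lower the nesting limit for the duration of the demonstration.
old <- options(expressions = 500)

# The handler is an exiting one, so it runs after the stack has unwound.
res <- tryCatch(recurse(1), stackOverflowError = function(e) e)

options(old)

class(res)            # which specific class is signaled may vary
conditionMessage(res)
```

Matching on the parent class `stackOverflowError` keeps the handler independent of which particular stack or limit overflowed first.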
Cstack_info
for information on the environment and the
evaluation depth limit.
Memory
and options
for information on the
protection stack.
The function standardGeneric
initiates dispatch of S4
methods: see the references and the documentation of the
methods package. Usually, calls to this function are
generated automatically and not explicitly by the programmer.
standardGeneric(f, fdef)
f |
The name of the generic. |
fdef |
The generic function definition. Never passed when defining a new generic. |
standardGeneric
dispatches the method defined for a generic
function named f
, using the actual arguments in the frame from which
it is called.
The argument fdef
is inserted (automatically) when dispatching
methods for a primitive function. If present, it must always be the function
definition for the corresponding generic. Don't insert this argument
by hand, as there is no validity checking and misspecifying the
function definition will cause certain failure.
For more, use the methods package, and see the documentation in
GenericFunctions
.
John Chambers
Chambers, John M. (2008) Software for Data Analysis: Programming with R. Springer. (For the R version.)
Chambers, John M. (1998) Programming with Data. Springer. (For the original S4 version.)
Determines if entries of x
start or end with string (entries of)
prefix
or suffix
respectively, where strings are
recycled to common lengths.
startsWith(x, prefix) endsWith(x, suffix)
x |
|
prefix , suffix
|
|
startsWith()
is equivalent to but much faster than
substring(x, 1, nchar(prefix)) == prefix
or also
grepl("^<prefix>", x)
where prefix
is not to contain special regular expression
characters (and for grepl
, x
does not contain missing
values, see below).
The code has an optimized branch for the most common usage in which
prefix
or suffix
is of length one, and is further
optimized in a UTF-8 or 8-bit locale if that is an ASCII string.
A logical
vector, of “common length” of x
and prefix
(or suffix
), i.e., of the longer of the two
lengths unless one of them is zero when the result is
also of zero length. A shorter input is recycled to the output length.
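The "common length" rule can be sketched as follows. The strings are arbitrary.

```r
# A length-one prefix is recycled over x.
a <- startsWith(c("apple", "banana"), "a")        # TRUE FALSE

# A length-one x is recycled over several suffixes.
b <- endsWith("file.txt", c(".txt", ".csv"))      # TRUE FALSE

# A zero-length input gives a zero-length result.
z <- startsWith(character(0), "a")                # logical(0)

# Missing values in x give NA.
m <- startsWith(c("all", NA), "a")                # TRUE NA
```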
grepl
, substring
; the partial string
matching functions charmatch
and pmatch
solve a different task.
startsWith(search(), "package:") # typically at least two FALSE, nowadays often three x1 <- c("Foobar", "bla bla", "something", "another", "blu", "brown", "blau blüht der Enzian")# non-ASCII x2 <- cbind( startsWith(x1, "b"), startsWith(x1, "bl"), startsWith(x1, "bla"), endsWith(x1, "n"), endsWith(x1, "an")) rownames(x2) <- x1; colnames(x2) <- c("b", "b1", "bla", "n", "an") x2 ## Non-equivalence in case of missing values in 'x', see Details: x <- c("all", "but", NA_character_) cbind(startsWith(x, "a"), substring(x, 1L, 1L) == "a", grepl("^a", x))
In R, the startup mechanism is as follows.
Unless --no-environ was given on the command line, R searches for site and user files to process for setting environment variables. The name of the site file is the one pointed to by the environment variable R_ENVIRON; if this is unset, ‘R_HOME/etc/Renviron.site’ is used (if it exists, which it does not in a ‘factory-fresh’ installation). The name of the user file can be specified by the R_ENVIRON_USER environment variable; if this is unset, the files searched for are ‘.Renviron’ in the current or in the user's home directory (in that order). See ‘Details’ for how the files are read.
Then R searches for the site-wide startup profile file of R code unless the command line option --no-site-file was given. The path of this file is taken from the value of the R_PROFILE environment variable (after tilde expansion). If this variable is unset, the default is ‘R_HOME/etc/Rprofile.site’, which is used if it exists (which it does not in a ‘factory-fresh’ installation).
This code is sourced into the workspace (global environment). Users need
to be careful not to unintentionally create objects in the workspace, and
it is normally advisable to use local
if code needs to be
executed: see the examples. .Library.site
may be assigned to and
the assignment will effectively modify the value of the variable in the
base namespace where .libPaths()
finds it. One may also
assign to .First
and .Last
, but assigning to other variables
in the execution environment is not recommended and does not work in
some older versions of R.
Then, unless --no-init-file was given, R searches for a user profile, a file of R code. The path of this file can be specified by the R_PROFILE_USER environment variable (and tilde expansion will be performed). If this is unset, a file called ‘.Rprofile’ is searched for in the current directory or in the user's home directory (in that order). The user profile file is sourced into the workspace.
Note that when the site and user profile files are sourced only the
base package is loaded, so objects in other packages need to be
referred to by e.g. utils::dump.frames
or after explicitly
loading the package concerned.
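The advice above about `local` and about referring to other packages explicitly might look like this in a profile file. This is only a sketch: the mirror URL is an illustrative choice, and `profile.sketch.rversion` is a hypothetical option name invented for the example.

```r
# Sketch of code for an '.Rprofile' or 'Rprofile.site' file.
local({
  # r_tmp exists only inside local(), so nothing is left in the workspace.
  r_tmp <- getOption("repos")
  r_tmp["CRAN"] <- "https://cloud.r-project.org"
  options(repos = r_tmp)

  # Only base is loaded when profiles are sourced, so functions from
  # other packages are referred to with '::'.
  options(profile.sketch.rversion =
            as.character(utils::packageVersion("base")))
})
```

Options set inside `local()` persist, while the helper variable does not.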
R then loads a saved image of the user workspace from ‘.RData’ in the current directory if there is one (unless --no-restore-data or --no-restore was specified on the command line).
Next, if a function .First is found on the search path, it is executed as .First(). Finally, function .First.sys() in the base package is run. This calls require to attach the default packages specified by options("defaultPackages"). If the methods package is included, this will have been attached earlier (by function .OptRequireMethods()) so that namespace initializations such as those from the user workspace will proceed correctly.
A function .First (and .Last) can be defined in appropriate ‘.Rprofile’ or ‘Rprofile.site’ files or have been saved in ‘.RData’. If you want a different set of packages than the default ones when you start, insert a call to options in the ‘.Rprofile’ or ‘Rprofile.site’ file. For example, options(defaultPackages = character()) will attach no extra packages on startup (only the base package) (or set R_DEFAULT_PACKAGES=NULL as an environment variable before running R). Using options(defaultPackages = "") or R_DEFAULT_PACKAGES="" enforces the R system default.
On front-ends which support it, the commands history is read from the file specified by the environment variable R_HISTFILE (default ‘.Rhistory’ in the current directory) unless --no-restore-history or --no-restore was specified.
The command-line option --vanilla implies --no-site-file, --no-init-file, --no-environ and (except for R CMD) --no-restore.
Note that there are two sorts of files used in startup: environment files which contain lists of environment variables to be set, and profile files which contain R code.
Lines in a site or user environment file should be either comment lines starting with #, or lines of the form name=value. The latter sets the environmental variable name to value, overriding an existing value. If value contains an expression of the form ${foo-bar}, the value is that of the environmental variable foo if that is set, otherwise bar. For ${foo:-bar}, the value is that of foo if that is set to a non-empty value, otherwise bar. (If it is of the form ${foo}, the default is "".) This construction can be nested, so bar can be of the same form (as in ${foo-${bar-blah}}). Note that the braces are essential: for example $HOME will not be interpreted.
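The expansion rules can be tried out without touching a real startup file by writing a throw-away environment file and reading it with readRenviron. The following is a sketch under stated assumptions: every GR_DEMO_* variable name is hypothetical and exists only for this illustration.

```r
## All GR_DEMO_* names are hypothetical; they are used only in this sketch.
Sys.unsetenv(c("GR_DEMO_UNSET", "GR_DEMO_OTHER"))
Sys.setenv(GR_DEMO_SET = "from-env")

envfile <- tempfile(fileext = ".Renviron")
writeLines(c(
    "# comment lines start with #",
    "GR_DEMO_A=${GR_DEMO_UNSET-fallback}",               # unset, so 'fallback'
    "GR_DEMO_B=${GR_DEMO_SET-fallback}",                 # set, so its value
    "GR_DEMO_C=${GR_DEMO_UNSET-${GR_DEMO_OTHER-blah}}"   # nested default
), envfile)

readRenviron(envfile)
Sys.getenv(c("GR_DEMO_A", "GR_DEMO_B", "GR_DEMO_C"))
unlink(envfile)
```

Because readRenviron processes lines in the same way as the startup code, this is a convenient way to check a line before adding it to ‘.Renviron’.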
Leading and trailing white space in value are stripped. value is then processed in a similar way to a Unix shell: in particular (single or double) quotes not preceded by backslash are removed and backslashes are removed except inside such quotes.
For readability and future compatibility it is recommended to only use constructs that have the same behavior as in a Unix shell. Hence, expansions of variables should be in double quotes (e.g. "${HOME}", in case they may contain a backslash) and literals including a backslash should be in single quotes. If a variable value may end in a backslash, such as PATH on Windows, it may be necessary to protect the following quote from it, e.g. "${PATH}/". It is recommended to use forward slashes instead of backslashes. It is ok to mix text in single and double quotes, see examples below.
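As a hedged sketch of the recommended quoting style (expansion in double quotes, a literal containing a backslash in single quotes), the line below mirrors the R_LIBS_USER line in the examples. The variable names GR_DEMO_BASE, GR_DEMO_LIB and GR_DEMO_PADDED and the directory are assumptions made for illustration, and the behaviour shown is that described for the current R version.

```r
## GR_DEMO_* names and the base directory are hypothetical.
Sys.setenv(GR_DEMO_BASE = "C:/Users/me")

envfile <- tempfile(fileext = ".Renviron")
writeLines(c(
    "GR_DEMO_PADDED=   padded   ",                      # surrounding blanks are stripped
    "GR_DEMO_LIB=\"${GR_DEMO_BASE}\"'\\R-library'"      # "..." expansion + '...' literal
), envfile)

readRenviron(envfile)
Sys.getenv("GR_DEMO_PADDED")
cat(Sys.getenv("GR_DEMO_LIB"), "\n")
unlink(envfile)
```

The quotes themselves are removed from the stored value, while the backslash inside the single-quoted part is kept.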
On systems with sub-architectures (mainly Windows), the files ‘Renviron.site’ and ‘Rprofile.site’ are looked for first in architecture-specific directories, e.g. ‘R_HOME/etc/i386/Renviron.site’. And e.g. ‘.Renviron.i386’ will be used in preference to ‘.Renviron’.
There is a 100,000 byte limit on the length of a line (after expansions) in environment files.
It is not intended that there be interaction with the user during startup code. Attempting to do so can crash the R process.
On Unix versions of R there is also a file ‘R_HOME/etc/Renviron’ which is read very early in the start-up processing. It contains environment variables set by R in the configure process. Values in that file can be overridden in site or user environment files: do not change ‘R_HOME/etc/Renviron’ itself. Note that this is distinct from ‘R_HOME/etc/Renviron.site’.
Command-line options may well not apply to alternative front-ends: they do not apply to R.app on macOS.
R CMD check and R CMD build do not always read the standard startup files, but they do always read specific ‘Renviron’ files. The location of these can be controlled by the environment variables R_CHECK_ENVIRON and R_BUILD_ENVIRON. If these are set their value is used as the path for the ‘Renviron’ file; otherwise, files ‘~/.R/check.Renviron’ or ‘~/.R/build.Renviron’ or sub-architecture-specific versions are employed.
If you want ‘~/.Renviron’ or ‘~/.Rprofile’ to be ignored by child R processes (such as those run by R CMD check and R CMD build), set the appropriate environment variable R_ENVIRON_USER or R_PROFILE_USER to (if possible, which it is not on Windows) "" or to the name of a non-existent file.
For the definition of the ‘home’ directory on Windows see the ‘rw-FAQ’ Q2.14. It can be found from a running R by Sys.getenv("R_USER").
.Last for final actions at the close of an R session.
commandArgs for accessing the command line arguments.
There are examples of using startup files to set defaults for graphics devices in the help for
An Introduction to R for more command-line options: those affecting memory management are covered in the help file for Memory.
readRenviron to read ‘.Renviron’ files.
For profiling code, see Rprof.
## Not run:
## Example ~/.Renviron on Unix
R_LIBS=~/R/library
PAGER=/usr/local/bin/less
## Example .Renviron on Windows
R_LIBS=C:/R/library
MY_TCLTK="c:/Program Files/Tcl/bin"
# Variable expansion in double quotes, string literals with backslashes in
# single quotes.
R_LIBS_USER="${APPDATA}"'\R-library'
## Example of setting R_DEFAULT_PACKAGES (from R CMD check)
R_DEFAULT_PACKAGES='utils,grDevices,graphics,stats'
# this loads the packages in the order given, so they appear on
# the search path in reverse order.
## Example of .Rprofile
options(width=65, digits=5)
options(show.signif.stars=FALSE)
setHook(packageEvent("grDevices", "onLoad"),
        function(...) grDevices::ps.options(horizontal=FALSE))
set.seed(1234)
.First <- function() cat("\n Welcome to R!\n\n")
.Last <- function() cat("\n Goodbye!\n\n")
## Example of Rprofile.site
local({
  # add MASS to the default packages, set a CRAN mirror
  old <- getOption("defaultPackages"); r <- getOption("repos")
  r["CRAN"] <- "http://my.local.cran"
  options(defaultPackages = c(old, "MASS"), repos = r)
  ## (for Unix terminal users) set the width from COLUMNS if set
  cols <- Sys.getenv("COLUMNS")
  if(nzchar(cols)) options(width = as.integer(cols))
  # interactive sessions get a fortune cookie (needs fortunes package)
  if (interactive())
    fortunes::fortune()
})
## if .Renviron contains
FOOBAR="coo\bar"doh\ex"abc\"def'"
## then we get
# > cat(Sys.getenv("FOOBAR"), "\n")
# coo\bardoh\exabc"def'
## End(Not run)
stop stops execution of the current expression and executes an error action.
geterrmessage gives the last error message.
stop(..., call. = TRUE, domain = NULL)
geterrmessage()
... |
zero or more objects which can be coerced to character (and which are pasted together with no separator) or a single condition object. |
call. |
logical, indicating if the call should become part of the error message. |
domain |
see |
The error action is controlled by error handlers established within the executing code and by the current default error handler set by options(error=). The error is first signaled as if using signalCondition(). If there are no handlers or if all handlers return, then the error message is printed (if options("show.error.messages") is true) and the default error handler is used. The default behaviour (the NULL error-handler) in interactive use is to return to the top level prompt or the top level browser, and in non-interactive use to (effectively) call q("no", status = 1, runLast = FALSE) unless getOption("catch.script.errors") is true.
The default handler stores the error message in a buffer; it can be retrieved by geterrmessage(). It also stores a trace of the call stack that can be retrieved by traceback().
Errors will be truncated to getOption("warning.length") characters, default 1000.
If a condition object is supplied it should be the only argument, and further arguments will be ignored, with a warning.
geterrmessage gives the last error message, as a character string ending in "\n".
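The following sketch ties these pieces together: an error raised by stop, retrieval of the stored message, and a condition object passed as the only argument. The class name "storageError" and the messages are hypothetical and serve only as an illustration.

```r
## try() records the message, so geterrmessage() can retrieve it afterwards.
res <- try(stop("disk full", call. = FALSE), silent = TRUE)
msg <- geterrmessage()
inherits(res, "try-error")
endsWith(msg, "\n")          # the stored message ends in a newline

## A condition object is passed to stop() as its only argument.
## "storageError" is a hypothetical class used only for this sketch.
cond <- structure(
    class = c("storageError", "error", "condition"),
    list(message = "volume is read-only", call = NULL)
)
handled <- tryCatch(stop(cond),
                    storageError = function(e) conditionMessage(e))
handled
```

Giving the condition its own class lets a handler react to this kind of error specifically, while handlers for "error" still catch it.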
Use domain = NA whenever ... contains a result from gettextf(), as that is translated already.
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
warning, try to catch errors and retry, and options for setting error handlers. stopifnot for validity testing. tryCatch and withCallingHandlers can be used to establish custom handlers while executing an expression.
gettext for the mechanisms for the automated translation of messages.
iter <- 12
try(if(iter > 10) stop("too many iterations"))
tst1 <- function(...) stop("dummy error")
try(tst1(1:10, long, calling, expression))
tst2 <- function(...) stop("dummy error", call. = FALSE)
try(tst2(1:10, longcalling, expression, but.not.seen.in.Error))
If any of the expressions (in ... or exprs) are not all TRUE, stop is called, producing an error message indicating the first expression which was not (all) true.
stopifnot(..., exprs, exprObject, local = TRUE)
... , exprs
|
any number of R expressions, which should each
evaluate to (a logical vector of all) { expr1 expr2 .... } Note that e.g., positive numbers are not If names are provided to |
exprObject |
alternative to |
local |
(only when |
This function is intended for use in regression tests and for argument checking in functions, in particular to make such checks easier to read.
stopifnot(A, B) or equivalently stopifnot(exprs= {A ; B}) are conceptually equivalent to
{ if(any(is.na(A)) || !all(A)) stop(...);
  if(any(is.na(B)) || !all(B)) stop(...) }
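As an illustrative sketch of argument checking with stopifnot, the helper below uses named expressions so that the names replace the default error message. The function check_rate and its messages are hypothetical and not part of R.

```r
## check_rate() is a hypothetical helper used only for illustration.
check_rate <- function(rate) {
    stopifnot("rate must be numeric"    = is.numeric(rate),
              "rate must lie in [0, 1]" = rate >= 0 & rate <= 1)
    invisible(rate)
}

## Expressions are checked in order; the first failure determines the message.
msg1 <- tryCatch(check_rate("a"),         error = conditionMessage)
msg2 <- tryCatch(check_rate(c(0.2, 1.5)), error = conditionMessage)

## An NA among the values also counts as "not all TRUE".
msg3 <- tryCatch(stopifnot(c(TRUE, NA)),  error = conditionMessage)
c(msg1, msg2, msg3)
```

The second check is vectorized, which matches the conceptual equivalence above: every element must be TRUE and none may be NA.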
Since R version 3.6.0, stopifnot() no longer handles potential errors or warnings (by tryCatch() etc) for each single expression and may use sys.call(n) to get a meaningful and short error message in case an expression did not evaluate to all TRUE. This provides considerably less overhead.
Since R version 3.5.0, expressions are evaluated sequentially, and hence evaluation stops as soon as there is a “non-TRUE”, as indicated by the above conceptual equivalence statement.
Also, since R version 3.5.0, stopifnot(exprs = { ... }) can be used alternatively and may be preferable in the case of several expressions, as they are more conveniently evaluated interactively (“no extraneous ,”).
Since R version 3.4.0, when an expression (from ...) is not true and is a call to all.equal, the error message will report the (first part of the) differences reported by all.equal(*); since R 4.3.0, this happens for all calls where "all.equal" pmatch()es the function called, e.g., when that is called all.equalShow, see the example in all.equal.
(NULL if all statements in ... are TRUE.)
Trying to use the stopifnot(exprs = ..) version via a shortcut, say,
assertWRONG <- function(exprs) stopifnot(exprs = exprs)
is delicate and the above is not a good idea. Contrary to stopifnot() which takes care to evaluate the parts of exprs one by one and stop at the first non-TRUE, the above short cut would typically evaluate all parts of exprs and pass the result, i.e., typically of the last entry of exprs to stopifnot().
However, a more careful version,
assert <- function(exprs) eval.parent(substitute(stopifnot(exprs = exprs)))
may be a nice short cut for stopifnot(exprs = *) calls using the more commonly known verb as function name.
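A short usage sketch of the assert wrapper defined above; the variable x and the checked expressions are assumptions chosen for illustration.

```r
assert <- function(exprs) eval.parent(substitute(stopifnot(exprs = exprs)))

x <- 1:5
## All expressions hold, so nothing happens.
assert({
    is.numeric(x)
    length(x) == 5L
})

## Evaluation stops at the first expression that is not TRUE;
## the stop() call after it is never reached.
msg <- tryCatch(assert({
    all(x > 0)
    sum(x) > 100
    stop("not reached")
}), error = conditionMessage)
msg
```

Because the braces are substituted rather than evaluated, stopifnot still sees the individual expressions and can report which one failed.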
stop, warning; assertCondition in package tools complements stopifnot() for testing warnings and errors.
## NB: Some of these examples are expected to produce an error. To
##     prevent them from terminating a run with example() they are
##     piped into a call to try().
stopifnot(1 == 1, all.equal(pi, 3.14159265), 1 < 2) # all TRUE
m <- matrix(c(1,3,3,1), 2, 2)
stopifnot(m == t(m), diag(m) == rep(1, 2)) # all(.) |=> TRUE
stopifnot(length(10)) |> try() # gives an error: '1' is *not* TRUE
## even when if(1) "ok" works
stopifnot(all.equal(pi, 3.141593), 2 < 2, (1:10 < 12), "a" < "b") |> try()
## More convenient for interactive "line by line" evaluation:
stopifnot(exprs = {
  all.equal(pi, 3.1415927)
  2 < 2
  1:10 < 12
  "a" < "b"
}) |> try()
eObj <- expression(2 < 3, 3 <= 3:6, 1:10 < 2)
stopifnot(exprObject = eObj) |> try()
stopifnot(exprObject = quote(3 == 3))
stopifnot(exprObject = TRUE)
# long all.equal() error messages are abbreviated:
stopifnot(all.equal(rep(list(pi),4), list(3.1, 3.14, 3.141, 3.1415))) |> try()
# The default error message can be overridden to be more informative:
m[1,2] <- 12
stopifnot("m must be symmetric"= m == t(m)) |> try()
#=> Error: m must be symmetric
##' warnifnot(): a "only-warning" version of stopifnot()
##' {Yes, learn how to use do.call(substitute, ...) in a powerful manner !!}
warnifnot <- stopifnot ; N <- length(bdy <- body(warnifnot))
bdy <- do.call(substitute, list(bdy, list(stopifnot = quote(warnifnot))))
bdy[[N-1]] <- do.call(substitute, list(bdy[[N-1]], list(stop = quote(warning))))
body(warnifnot) <- bdy
warnifnot(1 == 1, 1 < 2, 2 < 2) # => warns " 2 < 2 is not TRUE "
warnifnot(exprs = {
    1 == 1
    3 < 3 # => warns "3 < 3 is not TRUE"
})
Functions to convert between character representations and objects of classes "POSIXlt" and "POSIXct" representing calendar dates and times.
## S3 method for class 'POSIXct'
format(x, format = "", tz = "", usetz = FALSE, ...)
## S3 method for class 'POSIXlt'
format(x, format = "", usetz = FALSE,
       digits = getOption("digits.secs"), ...)
## S3 method for class 'POSIXt'
as.character(x, digits = if(inherits(x, "POSIXlt")) 14L else 6L,
             OutDec = ".", ...)
strftime(x, format = "", tz = "", usetz = FALSE, ...)
strptime(x, format, tz = "")
x |
an object to be converted: a character vector for
|
tz |
a character string specifying the time zone to be used for
the conversion. System-specific (see |
format |
a character string. The default for the |
... |
further arguments to be passed from or to other methods. |
usetz |
logical. Should the time zone abbreviation be appended
to the output? This is used in printing times, and more reliable
than using |
digits |
integer determining the |
OutDec |
a 1-character string specifying the decimal point to be
used; the default is not |
The format and as.character methods and strftime convert objects from the classes "POSIXlt" and "POSIXct" to character vectors.
strptime converts character vectors to class "POSIXlt": its input x is first converted by as.character. Each input string is processed as far as necessary for the format specified: any trailing characters are ignored.
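A minimal sketch of these two points, using made-up input strings: text after the part matched by the format is ignored, and non-character input is converted by as.character first.

```r
## Only the leading "YYYY-mm-dd" part is consumed by the format;
## the trailing text in the first string is ignored.
x <- strptime(c("2024-03-15 09:30:00 (draft)", "2024-12-01"),
              "%Y-%m-%d", tz = "UTC")
class(x)
format(x, "%Y-%m-%d")

## Numeric input is first converted by as.character().
y <- strptime(20240315, "%Y%m%d", tz = "UTC")
format(y, "%Y-%m-%d")
```

Passing tz = "UTC" keeps the sketch independent of the time zone of the session in which it is run.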
strftime is a wrapper for format.POSIXlt, and it and format.POSIXct first convert to class "POSIXlt" by calling as.POSIXlt (so they also work for class "Date"). Note that only that conversion depends on the time zone. Since R version 4.2.0, as.POSIXlt() conversion now treats the non-finite numeric -Inf, Inf, NA and NaN differently (where previously all were treated as NA). Also the format() method for POSIXlt now treats these different non-finite times and dates analogously to type double.
The usual vector re-cycling rules are applied to x and format so the answer will be of length of the longer of these vectors.
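For instance, a single time can be formatted with several format strings in one call. This is a small sketch with an arbitrarily chosen time.

```r
t0 <- as.POSIXct("2024-03-15 13:45:00", tz = "UTC")
## One time, three formats: the time is recycled to length 3.
out <- format(t0, c("%Y", "%H:%M", "%j"))
out
```

The result has one element per format string, here the year, the clock time and the day of the year.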
Locale-specific conversions to and from character strings are used where appropriate and available. This affects the names of the days and months, the AM/PM indicator (if used) and the separators in output formats such as %x and %X, via the setting of the LC_TIME locale category. The ‘current locale’ of the descriptions might mean the locale in use at the start of the R session or when these functions are first used. (For input, the locale-specific conversions can be changed by calling Sys.setlocale with category LC_TIME (or LC_ALL). For output, what happens depends on the OS but usually works.)
The details of the formats are platform-specific, but the following are likely to be widely available: most are defined by the POSIX standard. A conversion specification is introduced by %, usually followed by a single letter or O or E and then a single letter. Any character in the format string not part of a conversion specification is interpreted literally (and %% gives %). Widely implemented conversion specifications include
%a
Abbreviated weekday name in the current locale on this platform. (Also matches full name on input: in some locales there are no abbreviations of names.)
%A
Full weekday name in the current locale. (Also matches abbreviated name on input.)
%b
Abbreviated month name in the current locale on this platform. (Also matches full name on input: in some locales there are no abbreviations of names.)
%B
Full month name in the current locale. (Also matches abbreviated name on input.)
%c
Date and time. Locale-specific on output, "%a %b %e %H:%M:%S %Y" on input.
%C
Century (00–99): the integer part of the year divided by 100.
%d
Day of the month as decimal number (01–31).
%D
Date format such as %m/%d/%y: the C99 standard says it should be that exact format (but not all OSes comply).
%e
Day of the month as decimal number (1–31), with a leading space for a single-digit number.
%F
Equivalent to %Y-%m-%d (the ISO 8601 date format).
%g
The last two digits of the week-based year (see %V). (Accepted but ignored on input.)
%G
The week-based year (see %V) as a decimal number. (Accepted but ignored on input.)
%h
Equivalent to %b.
%H
Hours as decimal number (00–23). As a special exception strings such as ‘24:00:00’ are accepted for input, since ISO 8601 allows these.
%I
Hours as decimal number (01–12).
%j
Day of year as decimal number (001–366): For input, 366 is only valid in a leap year.
%m
Month as decimal number (01–12).
%M
Minute as decimal number (00–59).
%n
Newline on output, arbitrary whitespace on input.
%p
AM/PM indicator in the locale. Used in conjunction with %I and not with %H. An empty string in some locales (for example on some OSes, non-English European locales including Russia). The behaviour is undefined if used for input in such a locale. Some platforms accept %P for output, which uses a lower-case version (%p may also use lower case): others will output P.
%r
For output, the 12-hour clock time (using the locale's AM or PM): only defined in some locales, and on some OSes misleading in locales which do not define an AM/PM indicator. For input, equivalent to %I:%M:%S %p.
%R
Equivalent to %H:%M.
%S
Second as integer (00–61), allowing for up to two leap-seconds (but POSIX-compliant implementations will ignore leap seconds).
%t
Tab on output, arbitrary whitespace on input.
%T
Equivalent to %H:%M:%S.
%u
Weekday as a decimal number (1–7, Monday is 1).
%U
Week of the year as decimal number (00–53) using Sunday as the first day of the week (and typically with the first Sunday of the year as day 1 of week 1). The US convention.
%V
Week of the year as decimal number (01–53) as defined in ISO 8601. If the week (starting on Monday) containing 1 January has four or more days in the new year, then it is considered week 1. Otherwise, it is the last week of the previous year, and the next week is week 1. See %G (%g) for the year corresponding to the week given by %V. (Accepted but ignored on input.)
%w
Weekday as decimal number (0–6, Sunday is 0).
%W
Week of the year as decimal number (00–53) using Monday as the first day of week (and typically with the first Monday of the year as day 1 of week 1). The UK convention.
%x
Date. Locale-specific on output, "%y/%m/%d" on input.
%X
Time. Locale-specific on output, "%H:%M:%S" on input.
%y
Year without century (00–99). On input, values 00 to 68 are prefixed by 20 and 69 to 99 by 19 – that is the behaviour specified by the 2018 POSIX standard, but it does also say ‘it is expected that in a future version the default century inferred from a 2-digit year will change’.
%Y
Year with century. Note that whereas there was no zero in the original Gregorian calendar, ISO 8601:2004 defines it to be valid (interpreted as 1BC): see https://en.wikipedia.org/wiki/0_(year). However, the standards also say that years before 1582 in its calendar should only be used with agreement of the parties involved.
For input, only years 0:9999 are accepted.
%z
Signed offset in hours and minutes from UTC, so -0800 is 8 hours behind UTC. (Standard only for output. For input R currently supports it on all platforms – values from -1400 to +1400 are accepted.)
%Z
(Output only.) Time zone abbreviation as a character string (empty if not available). This may not be reliable when a time zone has changed abbreviations over the years.
Where leading zeros are shown they will be used on output but are optional on input. Names are matched case-insensitively on input: whether they are capitalized on output depends on the platform and the locale. Note that abbreviated names are platform-specific (although the standards specify that in the ‘C’ locale they must be the first three letters of the capitalized English name: this convention is widely used in English-language locales but for example the French month abbreviations are not the same on any two of Linux, macOS, Solaris and Windows). Knowing what the abbreviations are is essential if you wish to use %a, %b or %h as part of an input format: see the examples for how to check.
When %z or %Z is used for output with an object with an assigned time zone an attempt is made to use the values for that time zone, but it is not guaranteed to succeed.
The definition of ‘whitespace’ for %n and %t is platform-dependent: for most it does not include non-breaking spaces.
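The week-numbering conventions are easiest to tell apart on a date near the turn of the year. The sketch below formats one arbitrarily chosen time, Sunday 3 January 2021, with several of the locale-independent specifications listed above; the names given to the format strings are labels for this example only.

```r
t1 <- as.POSIXct("2021-01-03 14:05:09", tz = "UTC")   # a Sunday

## The names on the left are labels for this sketch only.
specs <- c(date = "%F", time = "%T", yday = "%j",
           wday_iso = "%u", wday = "%w",
           week_us = "%U", week_uk = "%W",
           week_iso = "%V", year_iso = "%G")
out <- vapply(specs, function(f) format(t1, f), character(1))
out
```

Here %U reports week 01 (the first Sunday of the year starts week 1), %W reports week 00 (the first Monday has not yet occurred), and %V with %G place the day in ISO week 53 of 2020.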
Not in the standards and less widely implemented are
%k
The 24-hour clock time with single digits preceded by a blank.
%l
The 12-hour clock time with single digits preceded by a blank.
%s
(Output only.) The number of seconds since the epoch.
%+
(Output only.) Similar to %c, often "%a %b %e %H:%M:%S %Z %Y". May depend on the locale.
For output there are also %O[dHImMUVwWy] which may emit numbers in an alternative locale-dependent format (e.g., roman numerals), and %E[cCyYxX] which can use an alternative ‘era’ (e.g., a different religious calendar). Which of these are supported is OS-dependent. These are accepted for input, but with the standard interpretation.
Specific to R is %OSn, which for output gives the seconds truncated to 0 <= n <= 6 decimal places (and if %OS is not followed by a digit, it uses the setting of getOption("digits.secs"), or if that is unset, n = 0). Further, for strptime %OS will input seconds including fractional seconds. Note that %S does not read fractional parts on output.
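A brief sketch of %OS on input and %OSn on output. The time is made up, and the fractional part .25 was chosen deliberately because it is exactly representable in binary, so truncation cannot change the digits shown.

```r
z <- strptime("2006-02-20 11:16:16.25", "%Y-%m-%d %H:%M:%OS", tz = "UTC")
z$sec                      # the fractional seconds are kept in the object
format(z, "%H:%M:%S")      # whole seconds only
format(z, "%H:%M:%OS2")    # seconds with two decimal places
```

With fractions that are not exactly representable (for example .683), the truncation described above can show a final digit one lower than the input.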
The behaviour of other conversion specifications (and even if other character sequences commencing with % are conversion specifications) is system-specific. Some systems document that the use of multi-byte characters in format is unsupported: UTF-8 locales are unlikely to cause a problem.
The format methods and strftime return character vectors representing the time. NA times are returned as NA_character_.
strptime turns character representations into an object of class "POSIXlt". The time zone is used to set the isdst component and to set the "tzone" attribute if tz != "". If the specified time is invalid (for example ‘"2010-02-30 08:00"’) all the components of the result are NA. (NB: this does mean exactly what it says – if it is an invalid time, not just a time that does not exist in some time zone.)
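The invalid date quoted above can be checked directly. This sketch pairs it with a valid date for contrast.

```r
x <- strptime(c("2010-02-28 08:00", "2010-02-30 08:00"),
              "%Y-%m-%d %H:%M", tz = "UTC")
is.na(x)    # the second element is an invalid date
format(x)   # so it is formatted as NA
```

Testing the result with is.na is a simple way to validate date strings before using them further.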
Everyone agrees that years from 1000 to 9999 should be printed with 4 digits, but the standards do not define what is to be done outside that range. For years 0 to 999 most OSes pad with zeros or spaces to 4 characters, but Linux/glibc outputs just the number.
OS facilities will probably not print years before 1 CE (aka 1 AD) ‘correctly’ (they tend to assume the existence of a year 0: see https://en.wikipedia.org/wiki/0_(year), and some OSes get them completely wrong). Common formats are -45 and -045.
Years after 9999 and before -999 are normally printed with five or more characters.
Some platforms support modifiers from POSIX 2008 (and others). On Linux/glibc the format "%04Y" assures a minimum of four characters and zero-padding (the default is no padding). The internal code (as used on Windows and by default on macOS) uses zero-padding by default (this can be controlled by environment variable R_PAD_YEARS_BY_ZERO). On those platforms, formats %04Y, %_4Y and %_Y can be used for zero, space and no padding respectively. (On macOS, the native code (not the default) supports none of these and uses zero-padding to 4 digits.)
Offsets from GMT (also known as UTC) are part of the conversion between timezones and to/from class "POSIXct", but cause difficulties as they are often computed incorrectly.
They conventionally have the opposite sign from time-zone specifications (see Sys.timezone): positive values are East of the meridian. Although there have been time zones with offsets like +00:09:21 (Paris in 1900), and -00:44:30 (Liberia until 1972), offsets are usually treated as whole numbers of minutes, and are most often seen in RFC 5322 email headers in forms like -0800 (e.g., used on the Pacific coast of the USA in winter).
Format %z can be used for input or output: it is a character string, conventionally plus or minus followed by two digits for hours and two for minutes: the standards say that an empty string should be output if the offset is undetermined, but some systems use +0000 or the offsets for the time zone in use for the current year. (On some platforms this works better after conversion to "POSIXct". Some platforms only recognize hour or half-hour offsets for output.)
Using %z for input makes most sense with tz = "UTC".
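A small sketch of %z on input and output, using a made-up timestamp with an offset of four hours behind UTC.

```r
## The offset is applied on input, giving the equivalent time in UTC.
x <- strptime("2014-08-21 16:36:30 -0400",
              "%Y-%m-%d %H:%M:%S %z", tz = "UTC")
format(x, "%Y-%m-%d %H:%M:%S")

## On output, a time in UTC has a zero offset.
format(as.POSIXct("2014-08-21 20:36:30", tz = "UTC"), "%z")
```

With tz = "UTC" the result is unambiguous: the local clock time 16:36:30 at offset -0400 corresponds to 20:36:30 UTC.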
Input uses the POSIX function strptime and output the C99 function strftime.
However, not all OSes (notably Windows) provided strptime and many issues were found for those which did, so since 2000 R has used a fork of code from ‘glibc’. The forked code uses the system's strftime to find the locale-specific day and month names and any AM/PM indicator.
On some platforms (including Windows and by default on macOS) the system's strftime is replaced (along with most of the rest of the C-level datetime code) by code modified from IANA's ‘tzcode’ distribution (https://www.iana.org/time-zones).
Note that as strftime is used for output (and not wcsftime), argument format is translated if necessary to the session encoding.
The default formats follow the rules of the ISO 8601 international standard which expresses a day as "2001-02-28" and a time as "14:01:02" using leading zeroes as here. (The ISO form uses no space, possibly ‘T’, to separate dates and times: R uses a space by default.)
For strptime the input string need not specify the date completely: it is assumed that unspecified seconds, minutes or hours are zero, and an unspecified year, month or day is the current one. (However, if a month is specified, the day of that month has to be specified by %d or %e since the current day of the month need not be valid for the specified month.) Some components may be returned as NA (but an unknown tzone component is represented by an empty string).
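As a hedged illustration of incomplete input, with strings made up for the example: a time of day alone is completed with the current date and zero seconds, while a year and month without a day is not accepted.

```r
## Only hours and minutes are given: seconds default to zero and
## the date is filled in with the current one.
x <- strptime("14:30", "%H:%M", tz = "UTC")
c(hour = x$hour, min = x$min, sec = x$sec)
format(x, "%Y-%m-%d %H:%M:%S")

## A month without a day of the month gives NA.
is.na(strptime("2024-02", "%Y-%m", tz = "UTC"))
```

A common workaround for year-month strings is to paste a fixed day onto the input and add %d to the format.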
If the time zone specified is invalid on your system, what happens is system-specific but it will probably be ignored.
Remember that in most time zones some times do not occur and some occur twice because of transitions to/from ‘daylight saving’ (also known as ‘summer’) time. strptime does not validate such times (it does not assume a specific time zone), but conversion by as.POSIXct will do so. Conversion by strftime and formatting/printing uses OS facilities and may return nonsensical results for non-existent times at DST transitions.
In a C locale %c is required to be "%a %b %e %H:%M:%S %Y". As Windows does not comply (and uses a date format not understood outside N. America), that format is used by R on Windows in all locales.
There is a limit of 2048 bytes on each string produced by strftime and the format methods. As from R 4.3.0 attempting to exceed this is an error (previous versions silently truncated at 255 bytes).
International Organization for Standardization (2004, 2000, ...) ‘ISO 8601. Data elements and interchange formats – Information interchange – Representation of dates and times.’, slightly updated to International Organization for Standardization (2019) ‘ISO 8601-1:2019. Date and time – Representations for information interchange – Part 1: Basic rules’, and further amended in 2022. For links to versions available on-line see (at the time of writing) https://dotat.at/tmp/ISO_8601-2004_E.pdf and https://www.qsl.net/g1smd/isopdf.htm; for information on the current official version, see https://www.iso.org/iso/iso8601 and https://en.wikipedia.org/wiki/ISO_8601.
The POSIX 1003.1 standard, which is in some respects stricter than ISO 8601.
DateTimeClasses for details of the date-time classes; locales to query or set a locale.
Your system's help page on strftime
to see how to specify their
formats. (On some systems, including Windows, strftime
is
replaced by more comprehensive internal code.)
## locale-specific version of date()
format(Sys.time(), "%a %b %d %X %Y %Z")

## time to sub-second accuracy (if supported by the OS)
format(Sys.time(), "%H:%M:%OS3")

## read in date info in format 'ddmmmyyyy'
## This will give NA(s) in some non-English locales; setting the C locale
## as in the commented lines will overcome this on most systems.
## lct <- Sys.getlocale("LC_TIME"); Sys.setlocale("LC_TIME", "C")
x <- c("1jan1960", "2jan1960", "31mar1960", "30jul1960")
z <- strptime(x, "%d%b%Y")
## Sys.setlocale("LC_TIME", lct)
z
(chz <- as.character(z)) # same w/o TZ
## *here* (but not in general), the same as format():
stopifnot(exprs = {
    identical(chz, format(z))
    grepl("^1960-0[137]-[03][012]$", chz[!is.na(z)])
})

## read in date/time info in format 'm/d/y h:m:s'
dates <- c("02/27/92", "02/27/92", "01/14/92", "02/28/92", "02/01/92")
times <- c("23:03:20", "22:29:56", "01:03:30", "18:21:03", "16:56:26")
x <- paste(dates, times)
z2 <- strptime(x, "%m/%d/%y %H:%M:%S")
z2
## *here* (but not in general), the same as format():
stopifnot(identical(format(z2), as.character(z2)))

## time with fractional seconds
z3 <- strptime("20/2/06 11:16:16.683", "%d/%m/%y %H:%M:%OS")
z3 # prints without fractional seconds by default, digits.sec = NULL ("= 0")
op <- options(digits.secs = 3)
z3 # shows the 3 extra digits
as.character(z3) # ditto
options(op)

## time zone names are not portable, but 'EST5EDT' comes pretty close.
## (but its interpretation may not be universal: see ?timezones)
z4 <- strptime(c("2006-01-08 10:07:52", "2006-08-07 19:33:02"),
               "%Y-%m-%d %H:%M:%S", tz = "EST5EDT")
z4
attr(z4, "tzone")
as.character(z4)

z4$sec[2] <- pi # "very" fractional seconds
as.character(z4) # shows full precision
format(z4) # no fractional sec
format(z4, digits=8) # shows only 6 (hard-wired maximum)
format(z4, digits=4)

## An RFC 5322 header (Eastern Canada, during DST)
## In a non-English locale the commented lines may be needed.
## prev <- Sys.getlocale("LC_TIME"); Sys.setlocale("LC_TIME", "C")
strptime("Tue, 23 Mar 2010 14:36:38 -0400", "%a, %d %b %Y %H:%M:%S %z")
## Sys.setlocale("LC_TIME", prev)

## Make sure you know what the abbreviated names are for you if you wish
## to use them for input (they are matched case-insensitively):
format(s1 <- seq.Date(as.Date('1978-01-01'), by = 'day', len = 7), "%a")
format(s2 <- seq.Date(as.Date('2000-01-01'), by = 'month', len = 12), "%b")

## Non-finite date-times :
format(as.POSIXct(Inf)) # "Inf" (was NA in R <= 4.1.x)
format(as.POSIXlt(c(-Inf,Inf,NaN,NA))) # were all NA
Repeat the character strings in a character vector a given number of times (i.e., concatenate the respective numbers of copies of the strings).
strrep(x, times)
x |
a character vector, or an object which can be coerced to a
character vector using |
times |
an integer vector giving the (non-negative) numbers of
times to repeat the respective elements of |
The elements of x
and times
will be recycled as
necessary (if one has no elements, an empty character vector is
returned). Missing elements in x
or times
result in
missing elements of the return value.
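The recycling and missing-value rules can be checked with a few short calls. This is a minimal sketch using made-up inputs; the expected results are shown as comments.

```r
strrep("ab", 0:2)          # "" "ab" "abab"   (x is recycled)
strrep(c("a", "b"), 3)     # "aaa" "bbb"      (times is recycled)
strrep(c("x", NA), 2)      # "xx" NA
strrep("x", c(1, NA))      # "x" NA
strrep(character(0), 2)    # character(0)
```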
A character vector with the elements of the given character vector repeated the given numbers of times.
strrep("ABC", 2)
strrep(c("A", "B", "C"), 1 : 3)
## Create vectors with the given numbers of spaces:
strrep(" ", 1 : 5)
Split the elements of a character vector x
into substrings
according to the matches to substring split
within them.
strsplit(x, split, fixed = FALSE, perl = FALSE, useBytes = FALSE)
x |
character vector, each element of which is to be split. Other inputs, including a factor, will give an error. |
split |
character vector (or object which can be coerced to such)
containing regular expression(s) (unless |
fixed |
logical. If |
perl |
logical. Should Perl-compatible regexps be used? |
useBytes |
logical. If |
Argument split
will be coerced to character, so
you will see uses with split = NULL
to mean
split = character(0)
, including in the examples below.
Note that splitting into single characters can be done via
split = character(0)
or split = ""
; the two are
equivalent. The definition of ‘character’ here depends on the
locale: in a single-byte locale it is a byte, and in a multi-byte
locale it is the unit represented by a ‘wide character’ (almost
always a Unicode code point).
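A short sketch of splitting into single characters, using ASCII input so that the result does not depend on the locale:

```r
strsplit("abc", "")[[1]]                                        # "a" "b" "c"
identical(strsplit("abc", ""), strsplit("abc", character(0)))   # TRUE
identical(strsplit("abc", ""), strsplit("abc", NULL))           # TRUE

## The result is a list with one element per input string:
strsplit(c("hi", "you"), "")
```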
A missing value of split
does not split the corresponding
element(s) of x
at all.
The algorithm applied to each input string is
repeat {
    if the string is empty
        break.
    if there is a match
        add the string to the left of the match to the output.
        remove the match and all to the left of it.
    else
        add the string to the output.
        break.
}
Note that this means that if there is a match at the beginning of a
(non-empty) string, the first element of the output is ""
, but
if there is a match at the end of the string, the output is the same
as with the match removed.
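The asymmetry between a match at the start and a match at the end of a string can be seen with a comma as the separator. The inputs below are illustrative only.

```r
strsplit(",a,b", ",")[[1]]    # ""  "a" "b"   leading match gives ""
strsplit("a,b,", ",")[[1]]    # "a" "b"       trailing match adds nothing
strsplit("a,,b", ",")[[1]]    # "a" ""  "b"   interior empty field is kept
strsplit("a,b,,", ",")[[1]]   # "a" "b" ""    only the final empty string is lost
```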
Note also that if there is an empty match at the beginning of a non-empty
string, the first character is returned and the algorithm continues with
the rest of the string. This needs to be kept in mind when designing the
regular expressions. For example, when looking for a word boundary
followed by a letter ("[[:<:]]"
with perl = TRUE
), one can
disallow a match at the beginning of a string (via "(?!^)[[:<:]]"
).
Invalid inputs in the current locale are warned about up to 5 times.
A list of the same length as x
, the i
-th element of which
contains the vector of splits of x[i]
.
If any element of x
or split
is declared to be in UTF-8
(see Encoding
), all non-ASCII character strings in the
result will be in UTF-8 and have their encoding declared as UTF-8.
(This also holds if any element is declared to be Latin-1 except in a
Latin-1 locale.)
For perl = TRUE, useBytes = FALSE
all non-ASCII strings in a
multibyte locale are translated to UTF-8.
If any element of x
or split
is marked as "bytes"
(see Encoding
), all non-ASCII character strings created by
the splitting in the result will be marked as "bytes"
, but encoding
of the resulting character strings not split is unspecified (may be
"bytes"
or the original). If no element of x
or
split
is marked as "bytes"
, but useBytes = TRUE
, even
the encoding of the resulting character strings created by splitting is
unspecified (may be "bytes"
or "unknown"
, possibly invalid
in the current encoding). Mixed use of "bytes"
and other marked
encodings is discouraged, but if still desired one may use
iconv
to re-encode the result e.g. to UTF-8 with suitably
substituted invalid bytes.
paste
for the reverse,
grep
and sub
for string search and
manipulation; also nchar
, substr
.
‘regular expression’ for the details of the pattern specification.
Option PCRE_use_JIT
controls the details when perl = TRUE
.
noquote(strsplit("A text I want to display with spaces", NULL)[[1]])

x <- c(as = "asfef", qu = "qwerty", "yuiop[", "b", "stuff.blah.yech")
# split x on the letter e
strsplit(x, "e")

unlist(strsplit("a.b.c", "."))
## [1] "" "" "" "" ""
## Note that 'split' is a regexp!
## If you really want to split on '.', use
unlist(strsplit("a.b.c", "[.]"))
## [1] "a" "b" "c"
## or
unlist(strsplit("a.b.c", ".", fixed = TRUE))

## a useful function: rev() for strings
strReverse <- function(x)
    sapply(lapply(strsplit(x, NULL), rev), paste, collapse = "")
strReverse(c("abc", "Statistics"))

## get the first names of the members of R-core
a <- readLines(file.path(R.home("doc"),"AUTHORS"))[-(1:8)]
a <- a[(0:2)-length(a)]
(a <- sub(" .*","", a))
# and reverse them
strReverse(a)

## Note that final empty strings are not produced:
strsplit(paste(c("", "a", ""), collapse="#"), split="#")[[1]]
# [1] "" "a"
## and also an empty string is only produced before a definite match:
strsplit("", " ")[[1]] # character(0)
strsplit(" ", " ")[[1]] # [1] ""
Convert strings to integers according to the given base using the C
function strtol
, or choose a suitable base following the C rules.
strtoi(x, base = 0L)
x |
a character vector, or something coercible to this by
|
base |
an integer which is between 2 and 36 inclusive, or zero (default). |
Conversion is based on the C library function strtol
.
For the default base = 0L
, the base is chosen from the string
representation of that element of x
, so different elements can
have different bases (see the first example). The standard C rules
for choosing the base are that octal constants (prefix 0
not
followed by x
or X
) and hexadecimal constants (prefix
0x
or 0X
) are interpreted as base 8
and
16
; all other strings are interpreted as base 10
.
For a base greater than 10
, letters a
to z
(or
A
to Z
) are used to represent 10
to 35
.
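The base-selection rules can be sketched with a few literal strings. Expected values are given as comments.

```r
## base = 0L (the default): the prefix decides the base, element by element
strtoi(c("0xff", "0XFF", "077", "123"))   # 255 255 63 123

## an explicit base overrides the prefix rule
strtoi("077", 10L)                        # 77

## letters stand for the digits 10 to 35, in either case
strtoi(c("z", "Z", "10"), 36L)            # 35 35 36
strtoi("11111111", 2L)                    # 255
```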
An integer vector of the same length as x
. Values which cannot
be interpreted as integers or would overflow are returned as
NA_integer_
.
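A brief sketch of inputs that yield NA_integer_, assuming the usual 32-bit R integer range:

```r
strtoi("12abc")         # NA: trailing characters are not part of an integer
strtoi("8", 8L)         # NA: "8" is not an octal digit
strtoi("2147483648")    # NA: one more than .Machine$integer.max
strtoi("2147483647")    # 2147483647
```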
For decimal strings as.integer
is equally useful.
strtoi(c("0xff", "077", "123"))
strtoi(c("ffff", "FFFF"), 16L)
strtoi(c("177", "377"), 8L)
Trim character strings to specified display widths.
strtrim(x, width)
x |
a character vector, or an object which can be coerced to a
character vector by |
width |
positive integer values: recycled to the length of |
‘Width’ is interpreted as the display width in a monospaced font. What happens with non-printable characters (such as backspace, tab) is implementation-dependent and may depend on the locale (e.g., they may be included in the count or they may be omitted).
Using this function rather than substr
is important when
there might be double-width (e.g., Chinese/Japanese/Korean) characters
in the character vector.
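The difference from substr can be sketched as follows. The variable cjk is illustrative, and what the last two calls return depends on the locale: in a UTF-8 locale, substr should keep three characters regardless of their width, whereas strtrim should stop before the display width exceeds 3.

```r
## ASCII input: width and number of characters coincide
strtrim("abcdef", 3)                    # "abc"
strtrim(c("abcdef", "xyz"), c(2, 5))    # "ab" "xyz"

## Two double-width characters followed by ASCII letters
cjk <- "\u4e2d\u6587abc"
substr(cjk, 1, 3)     # counts characters
strtrim(cjk, 3)       # counts display columns
```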
A character vector of the same length and with the same attributes
as x
(after possible coercion).
Elements of the result will have the encoding declared as that of
the current locale (see Encoding
) if the corresponding
input had a declared encoding and the current locale is either Latin-1
or UTF-8.
strtrim(c("abcdef", "abcdef", "abcdef"), c(1,5,10))
structure
returns the given object with further
attributes set.
structure(.Data, ...)
.Data |
an object which will have various attributes attached to it. |
... |
attributes, specified in |
Adding a class "factor"
will ensure that numeric codes are
given integer storage mode.
For historical reasons (these names are used when deparsing),
attributes ".Dim"
, ".Dimnames"
, ".Names"
,
".Tsp"
and ".Label"
are renamed to "dim"
,
"dimnames"
, "names"
, "tsp"
and "levels"
.
It is possible to give the same tag more than once, in which case the
last value assigned wins. As with other ways of assigning attributes,
using tag = NULL
removes attribute tag
from .Data
if
it is present.
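The points above can be sketched with a few small objects. The names m, v and f are illustrative only.

```r
## ".Dim" is renamed to "dim", so both calls build the same matrix
m <- structure(1:6, dim = 2:3)
identical(m, structure(1:6, .Dim = 2:3))    # TRUE

## tag = NULL removes the attribute again
v <- structure(m, dim = NULL)
identical(v, 1:6)                           # TRUE

## double codes are stored as integer once class "factor" is added
f <- structure(c(1, 2, 1), levels = c("a", "b"), class = "factor")
typeof(f)                                   # "integer"
as.character(f)                             # "a" "b" "a"
```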
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
structure(1:6, dim = 2:3)
Each character string in the input is first split into paragraphs (or lines containing whitespace only). The paragraphs are then formatted by breaking lines at word boundaries. The target columns for wrapping lines and the indentation of the first and all subsequent lines of a paragraph can be controlled independently.
strwrap(x, width = 0.9 * getOption("width"), indent = 0, exdent = 0, prefix = "", simplify = TRUE, initial = prefix)
x |
a character vector, or an object which can be converted to a
character vector by |
width |
a positive integer giving the target column for wrapping lines in the output. |
indent |
a non-negative integer giving the indentation of the first line in a paragraph. |
exdent |
a non-negative integer specifying the indentation of subsequent lines in paragraphs. |
prefix , initial
|
a character string to be used as prefix for
each line except the first, for which |
simplify |
a logical. If |
Whitespace (space, tab or newline characters) in the input is destroyed. Double spaces after periods, question marks and exclamation marks (thought of as representing sentence ends) are preserved. Currently, possible sentence ends at line breaks are not considered specially.
Indentation is relative to the number of characters in the prefix string.
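A minimal sketch of how prefix, indent and exdent combine, using a made-up sentence. Exact line breaks are not shown because they depend on width.

```r
txt <- "The quick brown fox jumps over the lazy dog. It then rests for a while."
out <- strwrap(txt, width = 30, indent = 2, exdent = 4, prefix = "> ")
writeLines(out)
## Every line starts with the prefix; the first line is then indented by
## 2 spaces and all later lines by 4 spaces.

## Runs of whitespace inside a paragraph collapse to single spaces
strwrap("a   b\tc")    # "a b c"
```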
A character vector (if simplify
is TRUE
), or a list of
such character vectors, with declared input encodings preserved.
## Read in file 'THANKS'.
x <- paste(readLines(file.path(R.home("doc"), "THANKS")), collapse = "\n")
## Split into paragraphs and remove the first three ones
x <- unlist(strsplit(x, "\n[ \t\n]*\n"))[-(1:3)]
## Join the rest
x <- paste(x, collapse = "\n\n")
## Now for some fun:
writeLines(strwrap(x, width = 60))
writeLines(strwrap(x, width = 60, indent = 5))
writeLines(strwrap(x, width = 60, exdent = 5))
writeLines(strwrap(x, prefix = "THANKS> "))

## Note that messages are wrapped AT the target column indicated by
## 'width' (and not beyond it).
## From an R-devel posting by J. Hosking.
x <- paste(sapply(sample(10, 100, replace = TRUE),
                  function(x) substring("aaaaaaaaaa", 1, x)),
           collapse = " ")
sapply(10:40,
       function(m) c(target = m, actual = max(nchar(strwrap(x, m)))))
Return subsets of vectors, matrices or data frames which meet conditions.
subset(x, ...)

## Default S3 method:
subset(x, subset, ...)

## S3 method for class 'matrix'
subset(x, subset, select, drop = FALSE, ...)

## S3 method for class 'data.frame'
subset(x, subset, select, drop = FALSE, ...)
x |
object to be subsetted. |
subset |
logical expression indicating elements or rows to keep: missing values are taken as false. |
select |
expression, indicating columns to select from a data frame. |
drop |
passed on to |
... |
further arguments to be passed to or from other methods. |
This is a generic function, with methods supplied for matrices, data frames and vectors (including lists). Packages and users can add further methods.
For ordinary vectors, the result is simply
x[subset & !is.na(subset)]
.
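For a plain vector the difference from ordinary logical indexing is only in how missing values in the condition are handled, as this small sketch with a made-up vector v shows:

```r
v <- c(5, NA, 12, 8)
subset(v, v > 6)     # 12 8      (the NA condition counts as FALSE)
v[v > 6]             # NA 12 8   (the NA condition yields an NA element)

cond <- v > 6
identical(subset(v, v > 6), v[cond & !is.na(cond)])   # TRUE
```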
For data frames, the subset
argument works on the rows. Note
that subset
will be evaluated in the data frame, so columns can
be referred to (by name) as variables in the expression (see the examples).
The select
argument exists only for the methods for data frames
and matrices. It works by first replacing column names in the
selection expression with the corresponding column numbers in the data
frame and then using the resulting integer vector to index the
columns. This allows the use of the standard indexing conventions so
that for example ranges of columns can be specified easily, or single
columns can be dropped (see the examples).
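A minimal sketch of row and column selection on a small, made-up data frame df (the column names id, x, y and z are illustrative):

```r
df <- data.frame(id = 1:4,
                 x  = c(5, NA, 12, 8),
                 y  = letters[1:4],
                 z  = c(TRUE, FALSE, TRUE, FALSE))

subset(df, x > 6)                           # rows 3 and 4; the NA row is dropped
subset(df, x > 6 & z, select = c(id, x))    # row 3, columns id and x
subset(df, select = id:y)                   # a range of columns: id, x, y
subset(df, select = -z)                     # every column except z
```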
The drop
argument is passed on to the indexing method for
matrices and data frames: note that the default for matrices is
different from that for indexing.
Factors may have empty levels after subsetting; unused levels are
not automatically removed. See droplevels
for a way to
drop all unused levels from a data frame.
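The effect on factor levels can be sketched with a made-up data frame:

```r
df <- data.frame(g = factor(c("a", "b", "c", "a")), v = 1:4)
s <- subset(df, g != "c")
levels(s$g)                # "a" "b" "c"   the unused level "c" remains
levels(droplevels(s)$g)    # "a" "b"
```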
An object similar to x
containing just the selected elements (for
a vector), rows and columns (for a matrix or data frame), and so on.
This is a convenience function intended for use interactively. For
programming it is better to use the standard subsetting functions like
[
, and in particular the non-standard evaluation of
argument subset
can have unanticipated consequences.
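Inside a function, the same row selection can be written with standard indexing. The helper rows_above below is hypothetical (it is not part of base R) and is only a sketch of one way to avoid non-standard evaluation when the column name arrives as a string.

```r
## Hypothetical helper: keep the rows where a named column exceeds a threshold
rows_above <- function(data, column, threshold) {
    keep <- !is.na(data[[column]]) & data[[column]] > threshold
    data[keep, , drop = FALSE]
}

df <- data.frame(x = c(5, NA, 12, 8), y = letters[1:4])
rows_above(df, "x", 6)      # same rows as subset(df, x > 6)
```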
Peter Dalgaard and Brian Ripley
subset(airquality, Temp > 80, select = c(Ozone, Temp))
subset(airquality, Day == 1, select = -Temp)
subset(airquality, select = Ozone:Wind)

with(airquality, subset(Ozone, Temp > 80))

## sometimes requiring a logical 'subset' argument is a nuisance
nm <- rownames(state.x77)
start_with_M <- nm %in% grep("^M", nm, value = TRUE)
subset(state.x77, start_with_M, Illiteracy:Murder)
# but in recent versions of R this can simply be
subset(state.x77, grepl("^M", nm), Illiteracy:Murder)
substitute
returns the parse tree for the (unevaluated)
expression expr
, substituting any variables bound in
env
.
quote
simply returns its argument. The argument is not evaluated
and can be any R expression.
enquote
is a simple one-line utility which transforms a call of
the form Foo(....)
into the call quote(Foo(....))
. This
is typically used to protect a call
from early evaluation.
substitute(expr, env)
quote(expr)
enquote(cl)
expr |
any syntactically valid R expression. |
cl |
|
env |
an environment or a list object. Defaults to the current evaluation environment. |
The typical use of substitute
is to create informative labels
for data sets and plots.
The myplot
example below shows a simple use of this facility.
It uses the functions deparse
and substitute
to create labels for a plot which are character string versions
of the actual arguments to the function myplot
.
Substitution takes place by examining each component of the parse tree
as follows: If it is not a bound symbol in env
, it is
unchanged. If it is a promise object, i.e., a formal argument to a
function or explicitly created using delayedAssign()
,
the expression slot of the promise replaces the symbol. If it is an
ordinary variable, its value is substituted, unless env
is
.GlobalEnv
in which case the symbol is left unchanged.
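The three cases can be sketched as follows. The function names show_expr, local_value and label_of are hypothetical and exist only for this illustration.

```r
## Promise: the expression of the formal argument replaces the symbol
show_expr <- function(x) substitute(x)
show_expr(a + b)                                  # a + b

## Ordinary variable in a function's environment: its value is substituted,
## while the unbound symbol b is left alone
local_value <- function() { a <- 2; substitute(a + b) }
local_value()                                     # 2 + b

## Bindings supplied as a list
substitute(a + b, list(a = 1, b = quote(z)))      # 1 + z

## With the global environment, symbols are left unchanged
substitute(a + b, globalenv())                    # a + b

## Typical use: a label built from the actual argument
label_of <- function(x) deparse(substitute(x))
label_of(mtcars$mpg)                              # "mtcars$mpg"
```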
Both quote
and substitute
are ‘special’
primitive functions which do not evaluate their arguments.
The mode
of the result is generally "call"
but
may in principle be any type. In particular, single-variable
expressions have mode "name"
and constants have the
appropriate base mode.
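A quick sketch of the modes that can result, using quote for brevity:

```r
mode(quote(a + b))    # "call"
mode(quote(a))        # "name"
mode(quote(1))        # "numeric"
mode(quote("s"))      # "character"
```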
substitute
works on a purely lexical basis. There is no
guarantee that the resulting expression makes any sense.
Substituting and quoting often cause confusion when the argument is
expression(...)
. The result is a call to the
expression
constructor function and needs to be evaluated
with eval
to give the actual expression object.
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
missing
for argument ‘missingness’,
bquote
for partial substitution,
sQuote
and dQuote
for adding quotation
marks to strings.
Quotes
about forward, back, and double quotes ‘'’,
‘`’, and ‘"’.
all.names
to retrieve the symbol names from an expression
or call.
require(graphics)
(s.e <- substitute(expression(a + b), list(a = 1))) #> expression(1 + b)
(s.s <- substitute( a + b, list(a = 1))) #> 1 + b
c(mode(s.e), typeof(s.e)) # "call", "language"
c(mode(s.s), typeof(s.s)) # (the same)
# but:
(e.s.e <- eval(s.e)) #> expression(1 + b)
c(mode(e.s.e), typeof(e.s.e)) # "expression", "expression"

substitute(x <- x + 1, list(x = 1)) # nonsense

myplot <- function(x, y)
    plot(x, y, xlab = deparse1(substitute(x)),
         ylab = deparse1(substitute(y)))

## Simple examples about lazy evaluation, etc:

f1 <- function(x, y = x) { x <- x + 1; y }
s1 <- function(x, y = substitute(x)) { x <- x + 1; y }
s2 <- function(x, y) { if(missing(y)) y <- substitute(x); x <- x + 1; y }
a <- 10
f1(a) # 11
s1(a) # 11
s2(a) # a
typeof(s2(a)) # "symbol"
Extract or replace substrings in a character vector.
substr(x, start, stop)
substring(text, first, last = 1000000L)

substr(x, start, stop) <- value
substring(text, first, last = 1000000L) <- value
x , text
|
a character vector. |
start , first
|
integer. The first element to be extracted or replaced. |
stop , last
|
integer. The last element to be extracted or replaced. |
value |
a character vector, recycled if necessary. |
substring
is compatible with S, with first
and
last
instead of start
and stop
.
For vector arguments, it expands the arguments cyclically to the
length of the longest provided none are of zero length.
When extracting, if start
is larger than the string length then
""
is returned.
For the extraction functions, x
or text
will be
converted to a character vector by as.character
if it is not
already one.
For the replacement functions, if start
is larger than the
string length then no replacement is done. If the portion to be
replaced is longer than the replacement string, then only the
portion the length of the string is replaced.
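The extraction and replacement rules can be sketched with short literal strings. The variable names are illustrative.

```r
substr("abcdef", 2, 4)        # "bcd"
substr("abc", 5, 6)           # ""  (start is beyond the end of the string)
substring("abcdef", 1:3)      # "abcdef" "bcdef" "cdef"

x <- "abcdef"
substr(x, 2, 4) <- "ZZZZZ"    # value longer than the portion: only 3 characters used
x                             # "aZZZef"

y <- "abcdef"
substr(y, 2, 4) <- "Z"        # value shorter than the portion: only 1 character replaced
y                             # "aZcdef"

w <- "abc"
substr(w, 5, 6) <- "Q"        # start beyond the end: nothing is replaced
w                             # "abc"
```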
If any argument is an NA
element, the corresponding element of
the answer is NA
.
Elements of the result will have the encoding declared as that of
the current locale (see Encoding
) if the corresponding
input had a declared Latin-1 or UTF-8 encoding and the current locale
is either Latin-1 or UTF-8.
If an input element has declared "bytes"
encoding (see
Encoding
), the subsetting is done in units of bytes not
characters.
For substr
, a character vector of the same length and with the
same attributes as x
(after possible coercion).
For substring
, a character vector of length the longest of the
arguments. This will have names taken from x
(if it has any
after coercion, repeated as needed), and other attributes copied from
x
if it is the longest of the arguments.
For the replacement functions, a character vector of the same length as
x
or text
, with attributes
such as
names
preserved.
Elements of x
or text
with a declared encoding (see
Encoding
) will be returned with the same encoding.
The S version of substring<-
ignores last
; this version
does not.
These functions are often used with nchar
to truncate a
display. That does not really work (you want to limit the width, not
the number of characters, so it would be better to use
strtrim
), but at least make sure you use the default
nchar(type = "chars")
.
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988)
The New S Language.
Wadsworth & Brooks/Cole. (substring
.)
substr("abcdef", 2, 4)
substring("abcdef", 1:6, 1:6)
## strsplit() is more efficient ...

substr(rep("abcdef", 4), 1:4, 4:5)
x <- c("asfef", "qwerty", "yuiop[", "b", "stuff.blah.yech")
substr(x, 2, 5)
substring(x, 2, 4:6)

X <- x
names(X) <- LETTERS[seq_along(x)]
comment(X) <- noquote("is a named vector")
str(aX <- attributes(X))
substring(x, 2) <- c("..", "+++")
substring(X, 2) <- c("..", "+++")
X
stopifnot(x == X, identical(aX, attributes(X)), nzchar(comment(X)))
sum
returns the sum of all the values
present in its arguments.
sum(..., na.rm = FALSE)
... |
numeric or complex or logical vectors. |
na.rm |
logical. Should missing values (including |
This is a generic function: methods can be defined for it
directly or via the Summary
group generic.
For this to work properly, the arguments ...
should be
unnamed, and dispatch is on the first argument.
If na.rm
is FALSE
an NA
or NaN
value in
any of the arguments will cause a value of NA
or NaN
to
be returned, otherwise NA
and NaN
values are ignored.
Logical true values are regarded as one, false values as zero.
For historical reasons, NULL
is accepted and treated as if it
were integer(0)
.
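A few short calls sketch the handling of missing values, logicals and NULL:

```r
sum(1:5, NA)                    # NA
sum(1:5, NA, na.rm = TRUE)      # 15
sum(c(1, NaN), na.rm = TRUE)    # 1
sum(c(TRUE, FALSE, TRUE))       # 2
sum(NULL)                       # 0
```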
Loss of accuracy can occur when summing values of different signs: this can even occur for sufficiently long integer inputs if the partial sums would cause integer overflow. Where possible extended-precision accumulators are used, typically well supported with C99 and newer, but possibly platform-dependent.
The sum. If all of the ...
arguments are of type
integer or logical, then the sum is integer
when
possible and is double
otherwise. Integer overflow should no
longer happen since R version 3.5.0.
For other argument types it is a length-one numeric
(double
) or complex vector.
NB: the sum of an empty set is zero, by definition.
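The type of the result for different inputs, and the empty-sum convention, can be sketched as:

```r
typeof(sum(1:3))           # "integer"
typeof(sum(TRUE, 2L))      # "integer"
typeof(sum(1:3, 0.5))      # "double"
typeof(sum(1i, 2))         # "complex"

sum(integer(0))            # 0
sum(numeric(0))            # 0
sum()                      # 0
```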
This is part of the S4 Summary
group generic. Methods for it must use the signature
x, ..., na.rm
.
‘plotmath’ for the use of sum
in plot annotation.
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
colSums
for row and column sums.
## Pass a vector to sum, and it will add the elements together.
sum(1:5)

## Pass several numbers to sum, and it also adds the elements.
sum(1, 2, 3, 4, 5)

## In fact, you can pass vectors into several arguments, and everything gets added.
sum(1:2, 3:5)

## If there are missing values, the sum is unknown, i.e., also missing, ....
sum(1:5, NA)
## ... unless we exclude missing values explicitly:
sum(1:5, NA, na.rm = TRUE)
summary
is a generic function used to produce summaries of the results
of various model fitting functions. The function
invokes particular methods
which depend on the
class
of the first argument.
summary(object, ...)

## Default S3 method:
summary(object, ..., digits, quantile.type = 7)

## S3 method for class 'data.frame'
summary(object, maxsum = 7,
        digits = max(3, getOption("digits")-3), ...)

## S3 method for class 'factor'
summary(object, maxsum = 100, ...)

## S3 method for class 'matrix'
summary(object, ...)

## S3 method for class 'summaryDefault'
format(x, digits = max(3L, getOption("digits") - 3L), ...)

## S3 method for class 'summaryDefault'
print(x, digits = max(3L, getOption("digits") - 3L), ...)
summary(object, ...) ## Default S3 method: summary(object, ..., digits, quantile.type = 7) ## S3 method for class 'data.frame' summary(object, maxsum = 7, digits = max(3, getOption("digits")-3), ...) ## S3 method for class 'factor' summary(object, maxsum = 100, ...) ## S3 method for class 'matrix' summary(object, ...) ## S3 method for class 'summaryDefault' format(x, digits = max(3L, getOption("digits") - 3L), ...) ## S3 method for class 'summaryDefault' print(x, digits = max(3L, getOption("digits") - 3L), ...)
object |
an object for which a summary is desired. |
x |
a result of the default method of |
maxsum |
integer, indicating how many levels should be shown for
|
digits |
integer, used for number formatting with
|
quantile.type |
integer code used in |
... |
additional arguments affecting the summary produced. |
For factor
s, the frequency of the first maxsum - 1
most frequent levels is shown, and the less frequent levels are
summarized in "(Others)"
(resulting in at most maxsum
frequencies).
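The effect of maxsum on a factor can be seen in a small sketch. The factor below is hypothetical, and the comments describe the pooled entry by position rather than by its label.

```r
# Hypothetical factor with four levels of differing frequency.
f <- factor(c("a", "a", "a", "b", "b", "c", "d"))

full   <- summary(f)              # one count per level
pooled <- summary(f, maxsum = 3)  # two most frequent levels plus one pooled entry

length(pooled)      # 3: at most maxsum frequencies
unname(pooled[3])   # 2: the pooled count for levels "c" and "d"
```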
The functions summary.lm
and summary.glm
are examples
of particular methods which summarize the results produced by
lm
and glm
.
The form of the value returned by summary
depends on the
class of its argument. See the documentation of the particular
methods for details of what is produced by that method.
The default method returns an object of class
c("summaryDefault", "table")
which has specialized
format
and print
methods. The
factor
method returns an integer vector.
The matrix and data frame methods return a matrix of class
"table"
, obtained by applying summary
to each
column and collating the results.
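The classes of the returned objects can be inspected as follows. This is a sketch on made-up data, not an exhaustive description of each method.

```r
s <- summary(c(2, 4, 4, 10))   # default method on a numeric vector
class(s)                       # "summaryDefault" "table"
names(s)                       # the names of the summary statistics
s[["Median"]]                  # 4

# Matrix method: summaries are computed column by column and collated.
m  <- matrix(1:6, ncol = 2)
sm <- summary(m)
class(sm)                      # "table"
```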
Chambers, J. M. and Hastie, T. J. (1992) Statistical Models in S. Wadsworth & Brooks/Cole.
anova
,
summary.glm
,
summary.lm
.
summary(attenu, digits = 4) #-> summary.data.frame(...), default precision
summary(attenu $ station, maxsum = 20) #-> summary.factor(...)

lst <- unclass(attenu$station) > 20 # logical with NAs
## summary.default() for logicals -- different from *.factor:
summary(lst)
summary(as.factor(lst))
Compute the singular-value decomposition of a rectangular matrix.
svd(x, nu = min(n, p), nv = min(n, p), LINPACK = FALSE)

La.svd(x, nu = min(n, p), nv = min(n, p))
x |
a numeric or complex matrix whose SVD decomposition is to be computed. Logical matrices are coerced to numeric. |
nu |
the number of left singular vectors to be computed.
This must be between |
nv |
the number of right singular vectors to be computed.
This must be between |
LINPACK |
logical. Defunct and an error. |
The singular value decomposition plays an important role in many
statistical techniques. svd
and La.svd
provide two
interfaces which differ in their return values.
Computing the singular vectors is the slow part for large matrices.
The computation will be more efficient if both nu <= min(n, p)
and nv <= min(n, p)
, and even more so if both are zero.
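When only the singular values are needed, both nu and nv can be set to zero. The sketch below uses a made-up random matrix and checks that the singular values agree with those from the full decomposition.

```r
set.seed(1)                              # reproducible illustrative data
X <- matrix(rnorm(200 * 50), 200, 50)    # hypothetical 200 x 50 matrix

d_only <- svd(X, nu = 0, nv = 0)$d       # singular values only, no vectors
d_full <- svd(X)$d                       # singular values alongside u and v

all.equal(d_only, d_full)
```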
Unsuccessful results from the underlying LAPACK code will result in an
error giving a positive error code (most often 1
): these can
only be interpreted by detailed study of the FORTRAN code but mean
that the algorithm failed to converge.
Missing, NaN
or infinite values in x
will give
an error.
The SVD decomposition of the matrix as computed by LAPACK is
$X = U D V'$,
where $U$ and $V$ are
orthogonal, $V'$ means $V$ transposed (and conjugated
for complex input), and $D$
is a diagonal matrix with the
(non-negative) singular values $D_{ii}$
in decreasing
order. Equivalently, $D = U' X V$, which is verified in
the examples.
The returned value is a list with components
d |
a vector containing the singular values of |
u |
a matrix whose columns contain the left singular vectors of
|
v |
a matrix whose columns contain the right singular vectors of
|
Recall that the singular vectors are only defined up to sign (a constant of modulus one in the complex case). If a left singular vector has its sign changed, changing the sign of the corresponding right vector gives an equivalent decomposition.
For La.svd
the return value replaces v
by vt
, the
(conjugated if complex) transpose of v
.
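The difference between the two return values shows up when reconstructing the matrix. A small sketch with an arbitrary 4 x 3 matrix (all names here are illustrative):

```r
X <- matrix(c(2, 0, 1,
              1, 3, 0,
              0, 1, 4,
              1, 1, 1), nrow = 4, byrow = TRUE)

s   <- svd(X)      # components d, u, v
las <- La.svd(X)   # components d, u, vt

X_from_svd   <- s$u   %*% diag(s$d)   %*% t(s$v)   # needs an explicit transpose
X_from_lasvd <- las$u %*% diag(las$d) %*% las$vt   # vt is already transposed
```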
The main functions used are the LAPACK routines DGESDD
and
ZGESDD
.
LAPACK is from https://netlib.org/lapack/ and its guide is listed in the references.
Anderson, E. and ten others (1999)
LAPACK Users' Guide. Third Edition. SIAM.
Available on-line at
https://netlib.org/lapack/lug/lapack_lug.html.
The ‘Singular-value decomposition’ Wikipedia article.
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
hilbert <- function(n) { i <- 1:n; 1 / outer(i - 1, i, `+`) }
X <- hilbert(9)[, 1:6]
(s <- svd(X))
D <- diag(s$d)
s$u %*% D %*% t(s$v) #  X = U D V'
t(s$u) %*% X %*% s$v #  D = U' X V
Return an array obtained from an input array by sweeping out a summary statistic.
sweep(x, MARGIN, STATS, FUN = "-", check.margin = TRUE, ...)
x |
an array, including a matrix. |
MARGIN |
a vector of indices giving the extent(s) of |
STATS |
the summary statistic which is to be swept out. |
FUN |
the function to be used to carry out the sweep. |
check.margin |
logical. If |
... |
optional arguments to |
FUN
is found by a call to match.fun
. As in the
default, binary operators can be supplied if quoted or backquoted.
FUN
should be a function of two arguments: it will be called
with arguments x
and an array of the same dimensions generated
from STATS
by aperm
.
The consistency check among STATS
, MARGIN
and x
is stricter if STATS
is an array than if it is a vector.
In the vector case, some kinds of recycling are allowed without a
warning. Use sweep(x, MARGIN, as.array(STATS))
if STATS
is a vector and you want to be warned if any recycling occurs.
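A common use is centring or rescaling the columns of a matrix. The following sketch uses made-up values and passes a quoted binary operator as FUN.

```r
m <- matrix(c(1, 2, 3,
              10, 20, 30), nrow = 3)   # 3 x 2 matrix, illustrative values

centred <- sweep(m, 2, colMeans(m))            # default FUN = "-"
scaled  <- sweep(m, 2, apply(m, 2, max), "/")  # divide each column by its maximum

colMeans(centred)   # both columns now have mean zero
```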
An array with the same shape as x
, but with the summary
statistics swept out.
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
apply
on which sweep
used to be based;
scale
for centering and scaling.
require(stats) # for median
med.att <- apply(attitude, 2, median)
sweep(data.matrix(attitude), 2, med.att)  # subtract the column medians

## More sweeping:
A <- array(1:24, dim = 4:2)

## no warnings in normal use
sweep(A, 1, 5)
(A.min <- apply(A, 1, min))  # == 1:4
sweep(A, 1, A.min)
sweep(A, 1:2, apply(A, 1:2, median))

## warnings when mismatch
sweep(A, 1, 1:3)  # STATS does not recycle
sweep(A, 1, 6:1)  # STATS is longer

## exact recycling:
sweep(A, 1, 1:2)  # no warning
sweep(A, 1, as.array(1:2))  # warning

## Using named dimnames
dimnames(A) <- list(fee=1:4, fie=1:3, fum=1:2)
mn_fum_fie <- apply(A, c("fum", "fie"), mean)
mn_fum_fie
sweep(A, c("fum", "fie"), mn_fum_fie)
switch
evaluates EXPR
and accordingly chooses one of the
further arguments (in ...
).
switch(EXPR, ...)
EXPR |
an expression evaluating to a number or a character string. |
... |
the list of alternatives. If it is intended that
|
switch
works in two distinct ways depending whether the first
argument evaluates to a character string or a number.
If the value of EXPR
is not a character string it is coerced to
integer. Note that this also happens for factor
s, with
a warning, as typically the character level is meant. If the integer
is between 1 and nargs()-1
then the corresponding element of
...
is evaluated and the result returned: thus if the first
argument is 3
then the fourth argument is evaluated and
returned.
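The numeric form can be sketched as follows; pick is a hypothetical helper introduced only for illustration.

```r
pick <- function(i) switch(i, "first", "second", "third")

second   <- pick(2)   # "second": the second alternative is evaluated
too_big  <- pick(4)   # NULL: no fourth alternative exists
too_small <- pick(0)  # NULL: 0 is outside 1 .. number of alternatives
```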
If EXPR
evaluates to a character string then that string is
matched (exactly) to the names of the elements in ...
. If
there is a match then that element is evaluated unless it is missing,
in which case the next non-missing element is evaluated, so for
example switch("cc", a = 1, cc =, cd =, d = 2)
evaluates to
2
. If there is more than one match, the first matching element
is used. In the case of no match, if there is an unnamed element of
...
its value is returned. (If there is more than one such
argument an error is signaled.)
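Fall-through and the unnamed default can be combined in one call. The function below is a hypothetical illustration of the rules just described.

```r
food_group <- function(x) {
  switch(x,
         apple  = ,            # missing: falls through to the next non-missing value
         pear   = "fruit",
         carrot = "vegetable",
         "unknown")            # single unnamed argument acts as the default
}

food_group("apple")    # "fruit"
food_group("stone")    # "unknown"
```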
The first argument is always taken to be EXPR
: if it is named
its name must (partially) match.
A warning is signaled if no alternatives are provided, as this is usually a coding error.
This is implemented as a primitive function that only evaluates its first argument and one other if one is selected.
The value of one of the elements of ...
, or NULL
,
invisibly (whenever no element is selected).
The result has the visibility (see invisible
) of the
element evaluated.
It is possible to write calls to switch
that can be confusing
and may not work in the same way in earlier versions of R. For
compatibility (and clarity), always have EXPR
as the first
argument, naming it if partial matching is a possibility. For the
character-string form, have a single unnamed argument as the default
after the named values.
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
require(stats)
centre <- function(x, type) {
  switch(type,
         mean = mean(x),
         median = median(x),
         trimmed = mean(x, trim = .1))
}
x <- rcauchy(10)
centre(x, "mean")
centre(x, "median")
centre(x, "trimmed")

ccc <- c("b","QQ","a","A","bb")
# note: cat() produces no output for NULL
for(ch in ccc)
    cat(ch,":", switch(EXPR = ch, a = 1, b = 2:3), "\n")
for(ch in ccc)
    cat(ch,":", switch(EXPR = ch, a =, A = 1, b = 2:3, "Otherwise: last"),"\n")

## switch(f, *) with a factor f
ff <- gl(3,1, labels=LETTERS[3:1])
ff[1] # C
## so one might expect " is C" here, but
switch(ff[1], A = "I am A", B="Bb..", C=" is C")# -> "I am A"
## so we give a warning

## Numeric EXPR does not allow a default value to be specified
## -- it is always NULL
for(i in c(-1:3, 9))  print(switch(i, 1, 2 , 3, 4))

## visibility
switch(1, invisible(pi), pi)
switch(2, invisible(pi), pi)
Outlines R syntax and gives the precedence of operators.
The following unary and binary operators are defined. They are listed in precedence groups, from highest to lowest.
:: :::
|
access variables in a namespace |
$ @
|
component / slot extraction |
[ [[
|
indexing |
^
|
exponentiation (right to left) |
- +
|
unary minus and plus |
:
|
sequence operator |
%any% |>
|
special operators (including %% and %/% ) |
* /
|
multiply, divide |
+ -
|
(binary) add, subtract |
< > <= >= == !=
|
ordering and comparison |
!
|
negation |
& &&
|
and |
| ||
|
or |
~
|
as in formulae |
-> ->>
|
rightwards assignment |
<- <<-
|
assignment (right to left) |
=
|
assignment (right to left) |
?
|
help (unary and binary) |
Within an expression operators of equal precedence are evaluated
from left to right except where indicated. (Note that =
is not
necessarily an operator.)
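A few one-line expressions make the precedence groups concrete. This is only a sketch of some entries in the table above; parentheses remain the clearer choice in real code.

```r
-2^2          # -4: ^ binds tighter than unary minus, i.e. -(2^2)
(-2)^2        #  4
-1:2          # -1 0 1 2: unary minus binds tighter than ':', i.e. (-1):2
-(1:2)        # -1 -2
1:3^2         # 1 2 ... 9: ^ is applied before ':'
2 * 3 %% 2    # 2: %% binds tighter than '*', i.e. 2 * (3 %% 2)
2^3^2         # 512: ^ groups right to left, i.e. 2^(3^2)
```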
The binary operators ::
, :::
, $
and @
require
names or string constants on the right hand side, and the first two
also require them on the left.
The links in the See Also section cover most other aspects of the basic syntax.
There are substantial precedence differences between R and S. In
particular, in S ?
has the same precedence as (binary) + -
and & && | ||
have equal precedence.
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
Arithmetic
, Comparison
, Control
,
Extract
, Logic
,
NumericConstants
, Paren
,
Quotes
, Reserved
.
The ‘R Language Definition’ manual.
## Logical AND ("&&") has higher precedence than OR ("||"):
TRUE || TRUE && FALSE   # is the same as
TRUE || (TRUE && FALSE) # and different from
(TRUE || TRUE) && FALSE

## Special operators have higher precedence than "!" (logical NOT).
## You can use this for %in% :
! 1:10 %in% c(2, 3, 5, 7) # same as !(1:10 %in% c(2, 3, 5, 7))
## but we strongly advise to use the "!( ... )" form in this case!

## '=' has lower precedence than '<-' ... so you should not mix them
##     (and '<-' is considered better style anyway):
## Not run:
## Consequently, this gives a ("non-catchable") error
 x <- y = 5  #->  Error in (x <- y) = 5 : ....

## End(Not run)
Sys.getenv
obtains the values of the environment variables.
Sys.getenv(x = NULL, unset = "", names = NA)
x |
a character vector, or |
unset |
a character string. |
names |
logical: should the result be named? If |
Both arguments will be coerced to character if necessary.
Setting unset = NA
will enable unset variables and those set to
the value ""
to be distinguished, if the OS does. POSIX
requires the OS to distinguish, and all known current R platforms do.
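The use of unset = NA can be sketched as follows. The variable name GR_DEMO_UNSET_VAR is hypothetical and is assumed not to be used for anything else in the session.

```r
var <- "GR_DEMO_UNSET_VAR"    # hypothetical name, assumed not to be in use
Sys.unsetenv(var)             # make sure it really is unset

default_empty <- Sys.getenv(var)              # "" by default
default_na    <- Sys.getenv(var, unset = NA)  # NA marks "not set"

Sys.setenv(GR_DEMO_UNSET_VAR = "on")
now_set <- Sys.getenv(var, unset = NA)        # "on"
Sys.unsetenv(var)                             # tidy up
```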
A vector of the same length as x
, with (if names ==
TRUE
) the variable names as its names
attribute. Each element
holds the value of the environment variable named by the corresponding
component of x
(or the value of unset
if no environment
variable with that name was found).
On most platforms Sys.getenv()
will return a named vector
giving the values of all the environment variables, sorted in the
current locale. It may be confused by names containing =
which
some platforms allow but POSIX does not. (Windows is such a platform:
there names including =
are truncated just before the first
=
.)
When x
is missing and names
is not false, the result is
of class "Dlist"
in order to get a nice
print
method.
Sys.setenv
,
Sys.getlocale
for the locale in use,
getwd
for the working directory.
The help for ‘environment variables’ lists many of the environment variables used by R.
## whether HOST is set will be shell-dependent e.g. Solaris' csh did not.
Sys.getenv(c("R_HOME", "R_PAPERSIZE", "R_PRINTCMD", "HOST"))

s <- Sys.getenv() # *all* environment variables
op <- options(width=111) # (nice printing)
names(s)    # all settings (the values could be very long)
head(s, 12) # using the Dlist print() method

## Language and Locale settings -- but rather use Sys.getlocale()
s[grep("^L(C|ANG)", names(s))]
## typically R-related:
s[grep("^_?R_", names(s))]
options(op)# reset
Get the process ID of the R Session. It is guaranteed by the operating system that two R sessions running simultaneously will have different IDs, but it is possible that R sessions running at different times will have the same ID.
Sys.getpid()
An integer, often between 1 and 32767 under Unix-alikes (but for example FreeBSD and macOS use IDs up to 99999) and a positive integer (up to 32767) under Windows.
Sys.getpid()

## Show files opened from this R process
if(.Platform$OS.type == "unix") ## on Unix-alikes such as Linux, macOS, FreeBSD:
   system(paste("lsof -p", Sys.getpid()))
Function to do wildcard expansion (also known as ‘globbing’) on file paths.
Sys.glob(paths, dirmark = FALSE)
paths |
character vector of patterns for relative or absolute filepaths. Missing values will be ignored. |
dirmark |
logical: should matches to directories from patterns
that do not already end in / have a slash appended? May not be supported on all platforms. |
This expands tilde (see tilde expansion) and wildcards in file paths.
For precise details of wildcards expansion, see your
system's documentation on the glob
system call. There is a
POSIX 1003.2 standard (see
https://pubs.opengroup.org/onlinepubs/9699919799/functions/glob.html)
but some OSes will go beyond this.
All systems should interpret *
(match zero or more characters),
?
(match a single character) and (probably) [
(begin a
character class or range). The handling of paths
ending with a separator is system-dependent. On a POSIX-2008
compliant OS they will match directories (only), but as they are not
valid filepaths on Windows, they match nothing there. (Earlier POSIX
standards allowed them to match files.)
The rest of these details are indicative (and based on the POSIX standard).
If a filename starts with .
this may need to be matched
explicitly: for example Sys.glob("*.RData")
may or may not
match ‘.RData’ but will not usually match ‘.aa.RData’. Note
that this is platform-dependent: e.g. on Solaris
Sys.glob("*.*")
matches ‘.’ and ‘..’.
[
begins a character class. If the first character in
[...]
is not !
, this is a character class which matches
a single character against any of the characters specified. The class
cannot be empty, so ]
can be included provided it is first. If
the first character is !
, the character class matches a single
character which is none of the specified characters. Whether
.
in a character class matches a leading .
in the
filename is OS-dependent.
Character classes can include ranges such as [A-Z]
: include
-
as a character by having it first or last in a class. (The
interpretation of ranges should be locale-specific, so the example is
not a good idea in an Estonian locale.)
One can remove the special meaning of ?
, *
and
[
by preceding them by a backslash (except within a
character class).
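The basic wildcards can be tried on a scratch directory. The sketch below creates a few empty files with made-up names in a temporary directory and removes them afterwards; the order of matches is left unspecified because it is system-dependent.

```r
d <- tempfile("globdemo")     # scratch directory; names are illustrative
dir.create(d)
file.create(file.path(d, c("a1.txt", "a2.txt", "b1.csv", "notes.md")))

star     <- basename(Sys.glob(file.path(d, "*.txt")))    # a1.txt and a2.txt
question <- basename(Sys.glob(file.path(d, "?1.*")))     # a1.txt and b1.csv
class_ab <- basename(Sys.glob(file.path(d, "[ab]2.*")))  # a2.txt
none     <- Sys.glob(file.path(d, "*.zip"))              # no match: character(0)

unlink(d, recursive = TRUE)   # clean up
```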
A character vector of matched file paths. The order is
system-specific (but in the order of the elements of paths
): it
is normally collated in either the current locale or in byte (ASCII)
order; however, on Windows collation is in the order of Unicode
points.
Directory errors are normally ignored, so the matches are to accessible file paths (but not necessarily accessible files).
Quotes for handling backslashes in character strings.
Sys.glob(file.path(R.home(), "library", "*", "R", "*.rdx"))
Reports system and user information.
Sys.info()
This uses POSIX or Windows system calls. Note that OS names (sysname
) might not
be what you expect: for example macOS identifies itself as
‘Darwin’ and Solaris as ‘SunOS’.
Sys.info()
returns details of the platform R is running on,
whereas R.version
gives details of the platform R was
built on: the release
and version
may well be different.
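The distinction between the running platform and the build platform can be seen by printing both. A minimal sketch; is_macos is an illustrative name, and the values shown in comments are only examples.

```r
si <- Sys.info()

si[["sysname"]]      # e.g. "Linux", "Darwin" or "Windows"
si[["release"]]      # release of the OS that is running now
R.version$os         # OS that R was built for; may differ in detail

# A platform test often needs only sysname:
is_macos <- identical(si[["sysname"]], "Darwin")
```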
A character vector with fields
sysname |
The operating system name. |
release |
The OS release. |
version |
The OS version. |
nodename |
A name by which the machine is known on the network (if any). |
machine |
A concise description of the hardware, often the CPU type. |
login |
The user's login name, or |
user |
The name of the real user ID, or |
effective_user |
The name of the effective user ID, or
|
The first five fields come from the uname(2)
system call. The
login name comes from getlogin(2)
, and the user names from
getpwuid(getuid())
and getpwuid(geteuid())
.
On Windows, the last three fields give the same value.
The meaning of release
and version
is system-dependent:
on a Unix-alike they normally refer to the kernel. There, usually
release
contains a numeric version and version
gives
additional information. Examples for release
:
"4.17.11-200.fc28.x86_64" # Linux (Fedora)
"3.16.0-5-amd64"          # Linux (Debian)
"17.7.0"                  # macOS 10.13.6
"5.11"                    # Solaris
There is no guarantee that the node or login or user names will be what you might reasonably expect. (In particular on some Linux distributions the login name is unknown from sessions with re-directed inputs.)
The use of alternatives such as system("whoami")
is not
portable: the POSIX command system("id")
is much more portable
on Unix-alikes, provided only the POSIX options -[Ggu][nr] are
used (and not the many BSD and GNU extensions). whoami
is
equivalent to id -un
(on Solaris, /usr/xpg4/bin/id -un
).
Windows may report unexpected versions: there, see the help for
.Platform
, and R.version
.
sessionInfo()
gives a synopsis of both your system and
the R session (and gives the OS version in a human-readable form).
Sys.info()
## An alternative (and probably better) way to get the login name on Unix
Sys.getenv("LOGNAME")
Get details of the numerical and monetary representations in the current locale.
Sys.localeconv()
Normally R is run without looking at the value of LC_NUMERIC,
so the decimal point remains '.
'. So the first three of these
components will only be useful if you have set the locale category
LC_NUMERIC
using Sys.setlocale
in the current R session
(when R may not work correctly).
The monetary components will only be set to non-default values (see
the ‘Examples’ section) if the LC_MONETARY
category is
set. It often is not set: see the examples for how to trigger setting it.
A character vector with 18 named components. See your ISO C documentation for details of the meaning.
It is possible to compile R without support for locales, in which
case the value will be NULL
.
Sys.setlocale
for ways to set locales.
Sys.localeconv()
## The results in the C locale are
##    decimal_point     thousands_sep          grouping   int_curr_symbol
##              "."                ""                ""                ""
##  currency_symbol mon_decimal_point mon_thousands_sep      mon_grouping
##               ""                ""                ""                ""
##    positive_sign     negative_sign   int_frac_digits       frac_digits
##               ""                ""             "127"             "127"
##    p_cs_precedes    p_sep_by_space     n_cs_precedes    n_sep_by_space
##            "127"             "127"             "127"             "127"
##      p_sign_posn       n_sign_posn
##            "127"             "127"

## Now try your default locale (which might be "C").
old <- Sys.getlocale()
## The category may not be set:
## the following may do so, but it might not be supported.
Sys.setlocale("LC_MONETARY", locale = "")
Sys.localeconv()
## or set an appropriate value yourself, e.g.
Sys.setlocale("LC_MONETARY", "de_AT")
Sys.localeconv()
Sys.setlocale(locale = old)

## Not run: read.table("foo", dec=Sys.localeconv()["decimal_point"])
These functions provide access to environment
s
(‘frames’ in S terminology) associated with functions further
up the calling stack.
sys.call(which = 0)
sys.frame(which = 0)
sys.nframe()
sys.function(which = 0)
sys.parent(n = 1)

sys.calls()
sys.frames()
sys.parents()
sys.on.exit()
sys.status()
parent.frame(n = 1)
which |
the frame number if non-negative, the number of frames to go back if negative. |
n |
the number of generations to go back. (See the ‘Details’ section.) |
.GlobalEnv
is given number 0 in the list of frames.
Each subsequent function evaluation increases the frame stack by 1.
The call, function definition and the environment for evaluation
of that function are returned by sys.call
, sys.function
and sys.frame
with the appropriate index.
sys.call
, sys.function
and sys.frame
accept
integer values for the argument which
. Non-negative values of
which
are frame numbers starting from .GlobalEnv
whereas negative values are counted back from the frame number of the
current evaluation.
The parent frame of a function evaluation is the environment in which
the function was called. It is not necessarily numbered one less than
the frame number of the current evaluation, nor is it the environment
within which the function was defined. sys.parent
returns the
number of the parent frame if n
is 1 (the default), the
grandparent if n
is 2, and so on. See also the ‘Note’.
sys.nframe
returns an integer, the number of the current frame
as described in the first paragraph.
sys.calls
and sys.frames
give a pairlist of all the
active calls and frames, respectively, and sys.parents
returns
an integer vector of indices of the parent frames of each of those
frames.
Notice that even though the sys.
xxx functions (except
sys.status
) are interpreted, their contexts are not counted nor
are they reported. There is no access to them.
sys.status()
returns a list with components sys.calls
,
sys.parents
and sys.frames
, the results of calls to
those three functions (which will include the call to
sys.status
: see the first example).
sys.on.exit()
returns the expression stored for use by
on.exit
in the function currently being evaluated.
(Note that this differs from S, which returns a list of expressions
for the current frame and its parents.)
parent.frame(n)
is a convenient shorthand for
sys.frame(sys.parent(n))
(implemented slightly more efficiently).
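Two common uses are finding out how the calling function was invoked and writing into the caller's environment. The helper names below are hypothetical and the sketch makes no claim to cover the corner cases discussed in the Note.

```r
caller_call <- function() sys.call(-1)   # call of the function one frame back
outer_fun   <- function() caller_call()
res_call    <- outer_fun()               # the call outer_fun()

# parent.frame() gives the environment the current function was called from.
set_in_caller <- function(name, value) {
  assign(name, value, envir = parent.frame())
}
use_it <- function() {
  set_in_caller("z", 42)   # creates z in use_it()'s own frame
  z
}
res_z <- use_it()                        # 42
```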
sys.call
returns a call, sys.function
a function
definition, and sys.frame
and parent.frame
return an
environment.
For the other functions, see the ‘Details’ section.
Strictly, sys.parent
and parent.frame
refer to the
context of the parent interpreted function. So internal
functions (which may or may not set contexts and so may or may not
appear on the call stack) may not be counted, and S3 methods can also do
surprising things.
As an effect of lazy evaluation, these functions look at the call stack at the time they are evaluated, not at the time they are called. Passing calls to them as function arguments is unlikely to be a good idea, but these functions still look at the call stack and count frames from the frame of the function evaluation from which they were called.
Hence, when these functions are called to provide default values for
function arguments, they are evaluated in the evaluation of the called
function and they count frames accordingly (see e.g. the envir
argument of eval
).
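The behaviour of a default argument can be sketched as follows. Because the default is evaluated inside the called function, parent.frame() there refers to that function's caller. The function names are illustrative.

```r
# The default for env is evaluated within where_called()'s own evaluation.
where_called <- function(env = parent.frame()) env

demo <- function() {
  here <- environment()             # demo()'s own evaluation frame
  identical(where_called(), here)   # the default sees demo() as the caller
}
default_sees_caller <- demo()       # TRUE
```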
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988)
The New S Language.
Wadsworth & Brooks/Cole. (Not parent.frame
.)
eval
for a usage of sys.frame
and parent.frame
.
require(utils)

## Note: the first two examples will give different results
## if run by example().
ff <- function(x) gg(x)
gg <- function(y) sys.status()
str(ff(1))

gg <- function(y) {
    ggg <- function() {
        cat("current frame is", sys.nframe(), "\n")
        cat("parents are", sys.parents(), "\n")
        print(sys.function(0)) # ggg
        print(sys.function(2)) # gg
    }
    if(y > 0) gg(y-1) else ggg()
}
gg(3)

t1 <- function() {
  aa <- "here"
  t2 <- function() {
    ## in frame 2 here
    cat("current frame is", sys.nframe(), "\n")
    str(sys.calls()) ## list with two components t1() and t2()
    cat("parents are frame numbers", sys.parents(), "\n") ## 0 1
    print(ls(envir = sys.frame(-1))) ## [1] "aa" "t2"
    invisible()
  }
  t2()
}
t1()

test.sys.on.exit <- function() {
  on.exit(print(1))
  ex <- sys.on.exit()
  str(ex)
  cat("exiting...\n")
}
test.sys.on.exit()
## gives 'language print(1)', prints 1 on exit

## An example where the parent is not the next frame up the stack
## since method dispatch uses a frame.
as.double.foo <- function(x)
{
    str(sys.calls())
    print(sys.frames())
    print(sys.parents())
    print(sys.frame(-1)); print(parent.frame())
    x
}
t2 <- function(x) as.double(x)
a <- structure(pi, class = "foo")
t2(a)
Find out if a file path is a symbolic link, and if so what it is
linked to, via the system call readlink
.
Symbolic links are a POSIX concept: they are not implemented on Windows, but are available on most filesystems on Unix-alikes.
Sys.readlink(paths)
paths |
character vector of file paths. Tilde expansion is done:
see |
A character vector of the same length as paths
. The
entries are the path of the file linked to, ""
if the path is
not a symbolic link, and NA
if there is an error (e.g., the
path does not exist or cannot be converted to the native encoding).
On platforms without the readlink
system call, all elements are
""
.
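The three possible kinds of result can be sketched with scratch files. The file names are illustrative, and the symbolic-link part is run only on Unix-alikes.

```r
target <- tempfile("target")    # illustrative scratch file
file.create(target)

plain <- Sys.readlink(target)   # "": exists but is not a symbolic link

if (.Platform$OS.type == "unix") {
  link <- tempfile("link")
  file.symlink(target, link)
  linked <- Sys.readlink(link)                 # path the link points to
  absent <- Sys.readlink(tempfile("nothere"))  # NA: the path does not exist
  unlink(link)
}
unlink(target)
```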
file.symlink
for the creation of symbolic links (and
their Windows analogues), file.info
##' To check if files (incl. directories) are symbolic links:
is.symlink <- function(paths) isTRUE(nzchar(Sys.readlink(paths), keepNA=TRUE))
## will return all FALSE when the platform has no `readlink` system call.
is.symlink("/foo/bar")
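The is.symlink helper in the example wraps its result in isTRUE, so it yields a single value however many paths are supplied. A vectorized variant might look like the following minimal sketch. The name is_symlink_vec is hypothetical (it is not part of base R), and the sketch assumes a platform where file.symlink can create links; elsewhere every element is simply FALSE.

```r
# Hypothetical helper (not part of base R): one logical per path.
# NA results (errors such as a missing path) are treated as "not a link".
is_symlink_vec <- function(paths) {
  target <- Sys.readlink(paths)
  !is.na(target) & nzchar(target)
}

target_file <- tempfile("target")
link_file   <- tempfile("link")
invisible(file.create(target_file))
# file.symlink() returns FALSE (with a warning) if the link cannot be made
linked <- suppressWarnings(file.symlink(target_file, link_file))

res <- is_symlink_vec(c(target_file, link_file,
                        file.path(tempdir(), "no-such-file")))
print(res)
unlink(c(link_file, target_file))
```

Only the second element can be TRUE, and only if the link was actually created.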
Sys.setenv
sets environment variables (for other processes
called from within R or future calls to Sys.getenv
from
this R process).
Sys.unsetenv
removes environment variables.
Sys.setenv(...)
Sys.unsetenv(x)
... |
named arguments with values coercible to a character string. |
x |
a character vector, or an object coercible to character. |
Non-standard R names must be quoted in Sys.setenv
: see the
examples. Most platforms (and POSIX) do not allow names containing
"="
. Windows does, but the facilities provided by R may not
handle these correctly so they should be avoided. Most platforms
allow setting an environment variable to ""
, but Windows does
not and there Sys.setenv(FOO = "")
unsets FOO.
There may be system-specific limits on the maximum length of the values of individual environment variables or of names+values of all environment variables.
Recent versions of Windows have a maximum length of 32,767 characters for an
environment variable; however cmd.exe
has a limit of 8192
characters for a command line, hence set
can only set 8188.
A logical vector, with elements being true if (un)setting the
corresponding variable succeeded. (For Sys.unsetenv
this
includes attempting to remove a non-existent variable.)
On Unix-alikes, if Sys.unsetenv
is not supported, it will at
least try to set the value of the environment variable to ""
,
with a warning.
Sys.getenv
, Startup for ways to set environment
variables for the R session.
setwd
for the working directory.
Sys.setlocale
to set (and get) language locale variables,
and notably Sys.setLanguage
to set the LANGUAGE
environment variable which is used for conditionMessage
translations.
The help for ‘environment variables’ lists many of the environment variables used by R.
print(Sys.setenv(R_TEST = "testit", "A+C" = 123))  # `A+C` could also be used
Sys.getenv("R_TEST")
Sys.unsetenv("R_TEST") # on Unix-alike may warn and not succeed
Sys.getenv("R_TEST", unset = NA)
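Because both functions return a logical vector reporting success, the result can be checked rather than assumed. The sketch below illustrates this; GR_DEMO_VAR is an arbitrary variable name chosen for illustration, and the sketch assumes a platform on which Sys.unsetenv is supported.

```r
# GR_DEMO_VAR is an arbitrary, hypothetical variable name.
set_ok <- Sys.setenv(GR_DEMO_VAR = "some value")
value_after_set <- Sys.getenv("GR_DEMO_VAR")

unset_ok <- Sys.unsetenv("GR_DEMO_VAR")
# unset = NA distinguishes "not set" from "set to the empty string"
value_after_unset <- Sys.getenv("GR_DEMO_VAR", unset = NA)

print(c(set = set_ok, unset = unset_ok))
print(value_after_unset)
```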
Uses system calls to set the times on a file or directory.
Sys.setFileTime(path, time)
path |
A character vector containing file or directory paths. |
time |
A date-time of class |
This attempts to set the file time to the value specified.
On a Unix-alike it uses the system call utimensat
if that is
available, otherwise utimes
or utime
. On a POSIX file
system it sets both the last-access and modification times.
Fractional seconds will be set as from R 3.4.0 on OSes with the
requisite system calls and suitable filesystems.
On Windows it uses the system call SetFileTime
to set the
‘last write time’. Some Windows file systems only record the
time at a resolution of two seconds.
Sys.setFileTime
has been vectorized in R 3.6.0. Earlier versions
of R required path
and time
to be vectors of length one.
A logical vector indicating if the operation succeeded for each of the files and directories attempted, returned invisibly.
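As the value is returned invisibly, it has to be captured or printed explicitly to be inspected. A minimal sketch, assuming a filesystem that allows the modification time to be changed (the timestamp is arbitrary, and file.mtime is used only to read the result back):

```r
f <- tempfile()
invisible(file.create(f))
stamp <- as.POSIXct("2020-01-01 12:00:00", tz = "UTC")

ok <- Sys.setFileTime(f, stamp)   # returned invisibly, so capture it
mtime <- file.mtime(f)            # read the modification time back

print(ok)
print(format(mtime, tz = "UTC", usetz = TRUE))
unlink(f)
```

On file systems with coarse time resolution the value read back may differ slightly from the one requested.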
Suspend execution of R expressions for a specified time interval.
Sys.sleep(time)
time |
The time interval to suspend execution for, in seconds. |
Using this function allows R to temporarily be given very low priority and hence not to interfere with more important foreground tasks. A typical use is to allow a process launched from R to set itself up and read its input files before R execution is resumed.
The intention is that this function suspends execution of R expressions but wakes the process up often enough to respond to GUI events, typically every half second. It can be interrupted (e.g. by ‘Ctrl-C’ or ‘Esc’ at the R console).
There is no guarantee that the process will sleep for the whole of the specified interval (sleep might be interrupted), and it may well take slightly longer in real time to resume execution.
time
must be non-negative (and not NA
nor NaN
):
Inf
is allowed (and might be appropriate if the intention is to
wait indefinitely for an interrupt). The resolution of the time
interval is system-dependent, but will normally be 20ms or better.
(On modern Unix-alikes it will be better than 1ms.)
Invisible NULL
.
Despite its name, this is not currently implemented using the
sleep
system call (although on Windows it does make use of
Sleep
).
testit <- function(x)
{
    p1 <- proc.time()
    Sys.sleep(x)
    proc.time() - p1 # The cpu usage should be negligible
}
testit(3.7)
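A common use of Sys.sleep is a polling loop that waits for some condition without consuming CPU time. The following is a sketch only: wait_until is a hypothetical helper, not a base R function, and its timeout and interval defaults are arbitrary.

```r
# Hypothetical helper: call cond() every `interval` seconds until it
# returns TRUE or `timeout` seconds have passed.
wait_until <- function(cond, timeout = 5, interval = 0.1) {
  deadline <- Sys.time() + timeout
  while (Sys.time() < deadline) {
    if (isTRUE(cond())) return(TRUE)
    Sys.sleep(interval)
  }
  FALSE
}

flag <- tempfile()
invisible(file.create(flag))
found   <- wait_until(function() file.exists(flag))
expired <- wait_until(function() FALSE, timeout = 0.3)
unlink(flag)
print(c(found = found, expired = expired))
```

Since the sleep may be interrupted or run slightly long, the loop compares against a deadline instead of counting iterations.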
Parses expressions in the given file, and then successively evaluates them in the specified environment.
sys.source(file, envir = baseenv(), chdir = FALSE, keep.source = getOption("keep.source.pkgs"), keep.parse.data = getOption("keep.parse.data.pkgs"), toplevel.env = as.environment(envir))
file |
a character string naming the file to be read from. |
envir |
an R object specifying the environment in which the
expressions are to be evaluated. May also be a list or an integer.
The default |
chdir |
logical; if |
keep.source |
logical. If |
keep.parse.data |
logical. If |
toplevel.env |
an R environment to be used as top level while evaluating the expressions. This argument is useful for frameworks running package tests; the default should be used in other cases. |
For large files, keep.source = FALSE
may save quite a bit of
memory. Disabling only parse data via keep.parse.data = FALSE
can already save a lot.
envir
In order for the code being evaluated to use the correct environment
(for example, in global assignments), source code in packages should
call topenv()
, which will return the namespace, if any,
the environment set up by sys.source
, or the global environment
if a saved image is being used.
source
, and loadNamespace
which
is called from library(.)
and uses sys.source(.)
.
## a simple way to put some objects in an environment
## high on the search path
tmp <- tempfile()
writeLines("aaa <- pi", tmp)
env <- attach(NULL, name = "myenv")
sys.source(tmp, env)
unlink(tmp)
search()
aaa
detach("myenv")
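sys.source can also be used without touching the search path, by evaluating a file in a freshly created environment. A minimal sketch (the file contents and the names multiplier and scaled_square are invented for illustration):

```r
script <- tempfile(fileext = ".R")
writeLines(c("multiplier <- 2",
             "scaled_square <- function(v) multiplier * v^2"), script)

env <- new.env()
sys.source(script, envir = env)
unlink(script)

print(sort(ls(env)))          # objects created by the script
print(env$scaled_square(3))   # the function sees `multiplier` in env
```

Functions defined by the script have env as their enclosing environment, which is why scaled_square can find multiplier.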
Sys.time
and Sys.Date
return the system's idea of the
current date with and without time.
Sys.time()
Sys.Date()
Sys.time
returns an absolute date-time value which can be
converted to various time zones and may return different days.
Sys.Date
returns the current day in the current time zone.
Sys.time
returns an object of class "POSIXct"
(see
DateTimeClasses). On almost all systems it will have
sub-second accuracy, possibly microseconds or better. On Windows it
increments in clock ticks (usually 1/60 of a second) reported to
millisecond accuracy.
Sys.Date
returns an object of class "Date"
(see Date).
Sys.time
may return fractional seconds, but they are ignored by
the default conversions (e.g., printing) for class "POSIXct"
.
See the examples and format.POSIXct
for ways to reveal them.
date
for the system time in a fixed-format character
string.
system.time
for measuring elapsed/CPU time of expressions.
Sys.time()
## print with possibly greater accuracy:
op <- options(digits.secs = 6)
Sys.time()
options(op)

## locale-specific version of date()
format(Sys.time(), "%a %b %d %X %Y")

Sys.Date()
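Fractional seconds and time-zone conversion can also be requested per call, without changing options. The sketch below uses the "%OS3" output format and the tz argument of format and as.Date; the choice of UTC is only an example.

```r
now <- Sys.time()

# %OS3 prints the seconds with three decimal places
with_ms <- format(now, "%Y-%m-%d %H:%M:%OS3")
# The same instant rendered in UTC
in_utc <- format(now, tz = "UTC", usetz = TRUE)
# Calendar date of that instant in UTC; it can differ from Sys.Date(),
# which uses the current time zone
utc_date <- as.Date(now, tz = "UTC")

print(with_ms)
print(in_utc)
print(utc_date)
```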
This is an interface to the system command which
, or to an
emulation on Windows.
Sys.which(names)
names |
Character vector of names or paths of possible executables. |
The system command which
reports on the full path names of
an executable (including an executable script) as would be executed by
a shell, accepting either absolute paths or looking on the path.
On Windows an ‘executable’ is a file with extension
‘.exe’, ‘.com’, ‘.cmd’ or ‘.bat’. Such files need
not actually be executable, but they are what system
tries.
On a Unix-alike the full path to which
(usually
‘/usr/bin/which’) is found when R is installed.
A character vector of the same length as names
, named by
names
. The elements are either the full path to the
executable or some indication that no executable of that name was
found. Typically the indication is ""
, but this does depend on
the OS (and the known exceptions are changed to ""
). Missing
values in names
have missing return values.
On Windows the paths will be short paths (8+3 components, no spaces)
with \
as the path delimiter.
Except on Windows this calls the system command which
: since
that is not part of e.g. the POSIX standards, exactly what it does is
OS-dependent. It will usually do tilde-expansion and it may make use
of csh
aliases.
## the first two are likely to exist everywhere
## texi2dvi exists on most Unix-alikes and under MiKTeX
Sys.which(c("ftp", "ping", "texi2dvi", "this-does-not-exist"))
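The returned paths are often used only to test whether a command is available before trying to run it. A minimal sketch follows; has_command is a hypothetical helper, not part of base R, and it treats both "" and NA as "not found".

```r
# Hypothetical helper: TRUE where an executable of that name was found.
has_command <- function(names) {
  path <- Sys.which(names)
  unname(!is.na(path) & nzchar(path))
}

found <- has_command(c("sh", "this-does-not-exist"))
print(found)
```

Whether "sh" is found depends on the platform; the second name is not expected to exist anywhere.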
system
invokes the OS command specified by command
.
system(command, intern = FALSE, ignore.stdout = FALSE, ignore.stderr = FALSE, wait = TRUE, input = NULL, show.output.on.console = TRUE, minimized = FALSE, invisible = TRUE, timeout = 0, receive.console.signals = wait)
command |
the system command to be invoked, as a character string. |
intern |
a logical (not |
ignore.stdout , ignore.stderr
|
a logical (not |
wait |
a logical (not |
input |
if a character vector is supplied, this is copied one
string per line to a temporary file, and the standard input of
|
timeout |
timeout in seconds, ignored if 0. This is a limit for the
elapsed time running |
receive.console.signals |
a logical (not |
show.output.on.console , minimized , invisible
|
arguments that are accepted on Windows but ignored on this platform, with a warning. |
This interface has become rather complicated over the years: see
system2
for a more portable and flexible interface
which is recommended for new code.
command
is parsed as a command plus arguments separated by
spaces. So if the path to the command (or a single argument such as a
file path) contains spaces, it must be quoted e.g. by
shQuote
.
Unix-alikes pass the command line to a shell (normally ‘/bin/sh’,
and POSIX requires that shell), so command
can be anything the
shell regards as executable, including shell scripts, and it can
contain multiple commands separated by ;
.
On Windows, system
does not use a shell and there is a separate
function shell
which passes command lines to a shell.
If intern
is TRUE
then popen
is used to invoke the
command and the output collected, line by line, into an R
character
vector. If intern
is FALSE
then
the C function system
is used to invoke the command.
wait = FALSE
is implemented by appending &
to the command: this
is in principle shell-dependent, but required by POSIX and so widely
supported.
When timeout
is non-zero, the command is terminated after the given
number of seconds. The termination works for typical commands, but is not
guaranteed: it is possible to write a program that would keep running
after the time is out. Timeouts can only be set with wait = TRUE
.
Timeouts cannot be used with interactive commands: the command is run with
standard input redirected from ‘/dev/null’ and it must not modify
terminal settings. As long as the tty tostop
option is disabled, which
it usually is by default, the executed command may write to standard
output and standard error. One cannot rely on the execution time of
the child processes being included in the user.child
and
sys.child
elements of proc_time
returned by proc.time
.
For the time to be included, all child processes have to be waited for by
their parents, which has to be implemented in the parent applications.
The ordering of arguments after the first two has changed from time to time: it is recommended to name all arguments after the first.
There are many pitfalls in using system
to ascertain if a
command can be run — Sys.which
is more suitable.
receive.console.signals = TRUE
is useful when running asynchronous
processes (using wait = FALSE
) to implement a synchronous operation.
In all other cases it is recommended to use the default.
If intern = TRUE
, a character vector giving the output of the
command, one line per character string. (Output lines of more than
8095 bytes will be split on some systems.)
If the command could not be run an R error is generated.
If command
runs but gives a non-zero exit status this will be
reported with a warning and in the attribute "status"
of the
result: an attribute "errmsg"
may also be available.
If intern = FALSE
, the return value is an error code (0
for success), given the invisible attribute (so needs to be printed
explicitly). If the command could not be run for any reason, the
value is 127
and a warning is issued (as from R 3.5.0).
Otherwise if wait = TRUE
the value is the exit status returned
by the command, and if wait = FALSE
it is 0
(the
conventional success value).
If the command times out, a warning is reported and the exit status is
124
.
For command-line R, error messages written to ‘stderr’ will be
sent to the terminal unless ignore.stderr = TRUE
. They can be
captured (in the most likely shells) by
system("some command 2>&1", intern = TRUE)
For GUIs, what happens to output sent to ‘stdout’ or
‘stderr’ if intern = FALSE
is interface-specific, and it
is unsafe to assume that such messages will appear on a GUI console
(they do on the macOS GUI's console, but not on some others).
How processes are launched differs fundamentally between Windows and
Unix-alike operating systems, as do the higher-level OS functions on
which this R function is built. So it should not be surprising that
there are many differences between OSes in how system
behaves.
For the benefit of programmers, the more important ones are summarized
in this section.
The most important difference is that on a Unix-alike
system
launches a shell which then runs command
. On
Windows the command is run directly – use shell
for an
interface which runs command
via a shell (by default
the Windows shell cmd.exe
, which has many differences from
a POSIX shell).
This means that it cannot be assumed that redirection or piping will
work in system
(redirection sometimes does, but we have seen
cases where it stopped working after a Windows security patch), and
system2
(or shell
) must be used on Windows.
What happens to stdout
and stderr
when not
captured depends on how R is running: Windows batch commands behave
like a Unix-alike, but from the Windows GUI they are
generally lost. system(intern = TRUE)
captures ‘stderr’
when run from the Windows GUI console unless ignore.stderr =
TRUE
.
The behaviour on error is different in subtle ways (and has differed between R versions).
The quoting conventions for command
differ, but
shQuote
is a portable interface.
Arguments show.output.on.console
, minimized
,
invisible
only do something on Windows (and are most relevant
to Rgui
there).
man system
and man sh
for how this is implemented
on the OS in use.
.Platform
for platform-specific variables.
pipe
to set up a pipe connection.
# list all files in the current directory using the -F flag
## Not run: system("ls -F")

# t1 is a character vector, each element giving a line of output from who
# (if the platform has who)
t1 <- try(system("who", intern = TRUE))

try(system("ls fizzlipuzzli", intern = TRUE, ignore.stderr = TRUE))
# zero-length result since file does not exist, and will give warning.
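The quoting and return-value rules described for system can be illustrated as follows. This is a sketch for Unix-alikes only, where the command line is passed to a shell; the file name and the commands cat and exit are chosen purely for illustration.

```r
if (.Platform$OS.type == "unix") {
  f <- tempfile("name with spaces ")
  writeLines(c("first", "second"), f)

  # shQuote() protects the spaces in the path from the shell
  out <- system(paste("cat", shQuote(f)), intern = TRUE)

  # With intern = FALSE and wait = TRUE the value is the exit status,
  # returned invisibly
  status <- system("exit 3")

  unlink(f)
  print(out)
  print(status)
}
```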
Finds the full file names of files in packages etc.
system.file(..., package = "base", lib.loc = NULL, mustWork = FALSE)
... |
character vectors, specifying subdirectory and file(s) within some package. The default, none, returns the root of the package. Wildcards are not supported. |
package |
a character string with the name of a single package. An error occurs if more than one package name is given. |
lib.loc |
a character vector with path names of R libraries.
See ‘Details’ for the meaning of the default value of |
mustWork |
logical. If |
This checks the existence of the specified files with
file.exists
. So file paths are only returned if there
are sufficient permissions to establish their existence.
The unnamed arguments in ...
are usually character strings, but
if character vectors they are recycled to the same length.
This uses find.package
to find the package, and hence
with the default lib.loc = NULL
looks first for attached
packages then in each library listed in .libPaths()
.
Note that if a namespace is loaded but the package is not attached,
this will look only on .libPaths()
.
A character vector of positive length, containing the file paths
that matched ...
, or the empty string, ""
, if none
matched (unless mustWork = TRUE
).
If matching the root of a package, there is no trailing separator.
system.file()
with no arguments gives the root of the
base package.
R.home
for the root directory of the R
installation, list.files
.
Sys.glob
to find paths via wildcards.
system.file()                  # The root of the 'base' package
system.file(package = "stats") # The root of package 'stats'
system.file("INDEX")
system.file("help", "AnIndex", package = "splines")
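Since a failed lookup yields "" and not an error, callers usually either test the result with nzchar or ask for an error with mustWork = TRUE. A short sketch (the file name "no-such-file" is deliberately bogus):

```r
found   <- system.file("DESCRIPTION", package = "stats")
missing <- system.file("no-such-file", package = "stats")

print(nzchar(found))     # a real path was returned
print(nzchar(missing))   # "" signals that nothing matched

# mustWork = TRUE turns the silent "" into an error
res <- tryCatch(
  system.file("no-such-file", package = "stats", mustWork = TRUE),
  error = function(e) "error"
)
print(res)
```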
Return CPU (and other) times that expr
used.
system.time(expr, gcFirst = TRUE)
expr |
Valid R expression to be timed. |
gcFirst |
Logical - should a garbage collection be performed
immediately before the timing? Default is |
system.time
calls the function proc.time
,
evaluates expr
, and then calls proc.time
once more,
returning the difference between the two proc.time
calls.
unix.time
was an alias of system.time
, for
compatibility with S; it was deprecated in 2016 and finally became
defunct in 2022.
Timings of evaluations of the same expression can vary considerably
depending on whether the evaluation triggers a garbage collection. When
gcFirst
is TRUE
a garbage collection (gc
)
will be performed immediately before the evaluation of expr
.
This will usually produce more consistent timings.
An object of class "proc_time"
: see
proc.time
for details.
proc.time
, time
which is for time series.
setTimeLimit
to limit the (CPU/elapsed) time R is allowed
to use.
Sys.time
to get the current date & time.
require(stats)

system.time(for(i in 1:100) mad(runif(1000)))
## Not run: 
exT <- function(n = 10000) {
  # Purpose: Test if system.time works ok;   n: loop size
  system.time(for(i in 1:n) x <- mean(rt(1000, df = 4)))
}
#-- Try to interrupt one of the following (using Ctrl-C / Escape):
exT()                 #- about 4 secs on a 2.5GHz Xeon
system.time(exT())    #~ +/- same

## End(Not run)
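Individual components of the result can be extracted by name, and the same measurement can be approximated by hand with two proc.time calls, as the description of system.time suggests. A minimal sketch (the 0.2 second sleep is arbitrary, and the manual version omits the initial garbage collection):

```r
timing  <- system.time(Sys.sleep(0.2))
elapsed <- timing[["elapsed"]]    # wall-clock seconds

# Roughly the same measurement done by hand
start <- proc.time()
Sys.sleep(0.2)
by_hand <- proc.time() - start

print(elapsed)
print(by_hand)
```

Sleeping uses almost no CPU, so the user and system components should be close to zero while the elapsed component is about 0.2.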
system2
invokes the OS command specified by command
.
system2(command, args = character(), stdout = "", stderr = "", stdin = "", input = NULL, env = character(), wait = TRUE, minimized = FALSE, invisible = TRUE, timeout = 0, receive.console.signals = wait)
command |
the system command to be invoked, as a character string. |
args |
a character vector of arguments to |
stdout , stderr
|
where output to ‘stdout’ or
‘stderr’ should be sent. Possible values are |
stdin |
should input be diverted? |
input |
if a character vector is supplied, this is copied one
string per line to a temporary file, and the standard input of
|
env |
character vector of name=value strings to set environment variables. |
wait |
a logical (not |
timeout |
timeout in seconds, ignored if 0. This is a limit for the
elapsed time running |
receive.console.signals |
a logical (not |
minimized , invisible
|
arguments that are accepted on Windows but ignored on this platform, with a warning. |
Unlike system
, command
is always quoted by
shQuote
, so it must be a single command without arguments.
For details of how command
is found see system
.
On Windows, env
is only supported for commands such as
R
and make
which accept environment variables on
their command line.
Some Unix commands (such as some implementations of ls
) change
their output if they consider it to be piped or redirected:
stdout = TRUE
uses a pipe whereas stdout =
"some_file_name"
uses redirection.
Because of the way it is implemented, on a Unix-alike stderr =
TRUE
implies stdout = TRUE
: a warning is given if this is
not what was specified.
When timeout
is non-zero, the command is terminated after the given
number of seconds. The termination works for typical commands, but is not
guaranteed: it is possible to write a program that would keep running
after the time is out. Timeouts can only be set with wait = TRUE
.
Timeouts cannot be used with interactive commands: the command is run with
standard input redirected from /dev/null
and it must not modify
terminal settings. As long as the tty tostop
option is disabled, which
it usually is by default, the executed command may write to standard
output and standard error.
receive.console.signals = TRUE
is useful when running asynchronous
processes (using wait = FALSE
) to implement a synchronous operation.
In all other cases it is recommended to use the default.
If stdout = TRUE
or stderr = TRUE
, a character vector
giving the output of the command, one line per character string.
(Output lines of more than 8095 bytes will be split.) If the command
could not be run an R error is generated. If command
runs but
gives a non-zero exit status this will be reported with a warning and
in the attribute "status"
of the result: an attribute
"errmsg"
may also be available.
In other cases, the return value is an error code (0
for
success), given the invisible attribute (so needs to be printed
explicitly). If the command could not be run for any reason, the
value is 127
and a warning is issued (as from R 3.5.0).
Otherwise if wait = TRUE
the value is the exit status returned
by the command, and if wait = FALSE
it is 0
(the
conventional success value).
If the command times out, a warning is issued and the exit status is
124
.
system2
is a more portable and flexible interface than
system
. It allows redirection of output without needing
to invoke a shell on Windows, a portable way to set environment
variables for the execution of command
, and finer control over
the redirection of stdout
and stderr
. Conversely,
system
(and shell
on Windows) allows the invocation of
arbitrary command lines.
There is no guarantee that if stdout
and stderr
are both
TRUE
or the same file that the two streams will be interleaved
in order. This depends on both the buffering used by the command and
the OS.
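The separation of command and args, and the two kinds of return value, can be sketched as follows. The example runs the Rscript executable of the current R installation, located via R.home("bin"), so that it does not depend on other programs being installed; the expressions passed with -e are arbitrary.

```r
rscript <- file.path(R.home("bin"), "Rscript")

# stdout = TRUE captures the output, one element per line.
# Each argument that contains spaces is quoted with shQuote().
out <- system2(rscript,
               c("--vanilla", "-e", shQuote("cat(1 + 1)")),
               stdout = TRUE)

# With the default stdout = "" the value is the exit status
status <- system2(rscript,
                  c("--vanilla", "-e", shQuote("quit(status = 3)")))

print(out)
print(status)
```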
Given a matrix or data.frame
x
,
t
returns the transpose of x
.
t(x)
x |
a matrix or data frame, typically. |
This is a generic function for which methods can be written. The
description here applies to the default and "data.frame"
methods.
A data frame is first coerced to a matrix: see as.matrix
.
When x
is a vector, it is treated as a column, i.e., the
result is a 1-row matrix.
A matrix, with dim
and dimnames
constructed
appropriately from those of x
, and other attributes except
names copied across.
The conjugate transpose of a complex matrix A, denoted
A^H or
A^*, is computed as
Conj(t(A))
.
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
aperm
for permuting the dimensions of arrays.
a <- matrix(1:30, 5, 6)
ta <- t(a) ##-- i.e.,  a[i, j] == ta[j, i] for all i,j :
for(j in seq(ncol(a)))
  if(! all(a[, j] == ta[j, ])) stop("wrong transpose")
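The treatment of plain vectors and the conjugate transpose mentioned in the Note can be checked directly. A small sketch with an arbitrary 2 x 2 complex matrix:

```r
v <- 1:3
row_form <- t(v)       # a vector is treated as a column: result is 1 x 3
col_form <- t(t(v))    # transposing again gives a 3 x 1 matrix

A  <- matrix(c(1+2i, 3-1i, 0+1i, 2+0i), nrow = 2)
AH <- Conj(t(A))       # conjugate transpose of A

print(dim(row_form))
print(dim(col_form))
print(AH)
```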
table
uses cross-classifying factors to build a contingency
table of the counts at each combination of factor levels.
table(...,
      exclude = if (useNA == "no") c(NA, NaN),
      useNA = c("no", "ifany", "always"),
      dnn = list.names(...), deparse.level = 1)

as.table(x, ...)
is.table(x)

## S3 method for class 'table'
as.data.frame(x, row.names = NULL, ...,
              responseName = "Freq", stringsAsFactors = TRUE,
              sep = "", base = list(LETTERS))
... |
one or more objects which can be interpreted as factors
(including numbers or character strings), or a |
exclude |
levels to remove for all factors in |
useNA |
whether to include |
dnn |
the names to be given to the dimensions in the result (the dimnames names). |
deparse.level |
controls how the default |
x |
an arbitrary R object, or an object inheriting from class
|
row.names |
a character vector giving the row names for the data frame. |
responseName |
the name to be used for the column of table entries, usually counts. |
stringsAsFactors |
logical: should the classifying factors be returned as factors (the default) or character vectors? |
sep , base
|
passed to |
If the argument dnn
is not supplied, the internal function
list.names
is called to compute the ‘dimname names’ as
follows:
If ...
is one list
with its own names()
,
these names
are used. Otherwise, if the
arguments in ...
are named, those names are used. For the
remaining arguments, deparse.level = 0
gives an empty name,
deparse.level = 1
uses the supplied argument if it is a symbol,
and deparse.level = 2
will deparse the argument.
Only when exclude
is specified (i.e., not by default) and
non-empty, will table
potentially drop levels of factor
arguments.
useNA
controls if the table includes counts of NA
values: the allowed values correspond to never ("no"
), only if the count is
positive ("ifany"
) and even for zero counts ("always"
).
Note the somewhat “pathological” case of two different kinds of
NA
s which are treated differently, depending on both
useNA
and exclude
, see d.patho
in the
‘Examples:’ below.
Both exclude
and useNA
operate on an “all or none”
basis. If you want to control the dimensions of a multiway table
separately, modify each argument using factor
or
addNA
.
Non-factor arguments a
are coerced via factor(a,
exclude=exclude)
. Since R 3.4.0, care is taken not to
count the excluded values (where they were included in the NA
count, previously).
The summary
method for class "table"
(used for objects
created by table
or xtabs
) gives basic
information and performs a chi-squared test for independence of
factors (note that the function chisq.test
currently
only handles 2-d tables).
table()
returns a contingency table, an object of
class "table"
, an array of integer values.
Note that unlike S the result is always an array
, a 1D
array if one factor is given.
as.table
and is.table
coerce to and test for a contingency
table, respectively.
The as.data.frame
method for objects inheriting from class
"table"
can be used to convert the array-based representation
of a contingency table to a data frame containing the classifying
factors and the corresponding entries (the latter as component
named by responseName
). This is the inverse of xtabs
.
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
tabulate
is the underlying function and allows finer
control.
Use ftable
for printing (and more) of
multidimensional tables. margin.table
,
prop.table
, addmargins
.
addNA
for constructing factors with NA
as
a level.
xtabs
for cross tabulation of data frames with a
formula interface.
require(stats) # for rpois and xtabs
## Simple frequency distribution
table(rpois(100, 5))
## Check the design:
with(warpbreaks, table(wool, tension))
table(state.division, state.region)

# simple two-way contingency table
with(airquality, table(cut(Temp, quantile(Temp)), Month))

a <- letters[1:3]
table(a, sample(a))                    # dnn is c("a", "")
table(a, sample(a), dnn = NULL)        # dimnames() have no names
table(a, sample(a), deparse.level = 0) # dnn is c("", "")
table(a, sample(a), deparse.level = 2) # dnn is c("a", "sample(a)")

## xtabs() <-> as.data.frame.table() :
UCBAdmissions ## already a contingency table
DF <- as.data.frame(UCBAdmissions)
class(tab <- xtabs(Freq ~ ., DF)) # xtabs & table
## tab *is* "the same" as the original table:
all(tab == UCBAdmissions)
all.equal(dimnames(tab), dimnames(UCBAdmissions))

a <- rep(c(NA, 1/0:3), 10)
table(a)                 # does not report NA's
table(a, exclude = NULL) # reports NA's
b <- factor(rep(c("A","B","C"), 10))
table(b)
table(b, exclude = "B")
d <- factor(rep(c("A","B","C"), 10), levels = c("A","B","C","D","E"))
table(d, exclude = "B")
print(table(b, d), zero.print = ".")

## NA counting:
is.na(d) <- 3:4
d. <- addNA(d)
d.[1:7]
table(d.) # ", exclude = NULL" is not needed
## i.e., if you want to count the NA's of 'd', use
table(d, useNA = "ifany")

## "pathological" case:
d.patho <- addNA(c(1,NA,1:2,1:3))[-7]; is.na(d.patho) <- 3:4
d.patho
## just 3 consecutive NA's ? --- well, have *two* kinds of NAs here :
as.integer(d.patho) # 1 4 NA NA 1 2
##
## In R >= 3.4.0, table() allows to differentiate:
table(d.patho)                   # counts the "unusual" NA
table(d.patho, useNA = "ifany")  # counts all three
table(d.patho, exclude = NULL)   #  (ditto)
table(d.patho, exclude = NA)     # counts none

## Two-way tables with NA counts. The 3rd variant is absurd, but shows
## something that cannot be done using exclude or useNA.
with(airquality, table(OzHi = Ozone > 80, Month, useNA = "ifany"))
with(airquality, table(OzHi = Ozone > 80, Month, useNA = "always"))
with(airquality, table(OzHi = Ozone > 80, addNA(Month)))
tabulate takes the integer-valued vector bin and counts the number of times each integer occurs in it.
tabulate(bin, nbins = max(1, bin, na.rm = TRUE))
bin |
a numeric vector (of positive integers), or a factor. Long vectors are supported. |
nbins |
the number of bins to be used. |
tabulate is the workhorse for the table function.
If bin is a factor, its internal integer representation is tabulated.
If the elements of bin are numeric but not integers, they are truncated by as.integer.
An integer-valued integer or double vector (without names). There is a bin for each of the values 1, ..., nbins; values outside that range and NAs are (silently) ignored.
On 64-bit platforms bin can have 2^31 or more elements (i.e., length(bin) > .Machine$integer.max), and hence a count could exceed the maximum integer. For this reason, the return value is of type double for such long bin vectors.
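The truncation and out-of-range rules described above can be seen in a small sketch. The vectors x and f are made up for illustration; nbins is passed explicitly so that the length of the result is fixed.

```r
# Non-integer values are truncated by as.integer() before counting
x <- c(1.9, 2.2, 2.7, 3.0, 7.5)
counts <- tabulate(x, nbins = 5)   # 7.5 becomes 7, which lies outside 1..5 and is ignored
counts

# A factor is counted through its internal integer codes
f <- factor(c("b", "a", "b", "c"), levels = c("a", "b", "c", "d"))
by_level <- tabulate(f, nbins = nlevels(f))   # nbins keeps the unused level "d"
by_level
```

Passing nbins = nlevels(f) is a design choice for this sketch: it gives one count per level, including levels that never occur.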
tabulate(c(2,3,5))
tabulate(c(2,3,3,5), nbins = 10)
tabulate(c(-2,0,2,3,3,5)) # -2 and 0 are ignored
tabulate(c(-2,0,2,3,3,5), nbins = 3)
tabulate(factor(letters[1:10]))
Tailcall and Exec
Tailcall and Exec allow writing more stack-space-efficient recursive functions in R.
Tailcall(FUN, ...)
Exec(expr, envir)
FUN |
a function or a non-empty character string naming the function to be called. |
... |
all the arguments to be passed. |
expr |
a call expression. |
envir |
environment for evaluating expr. |
Tailcall evaluates a call to FUN with arguments ... in the current environment, and Exec evaluates the call expr in environment envir. If a Tailcall or Exec expression appears in tail position in an R function, and if there are no on.exit expressions set, then the evaluation context of the new call replaces the currently executing call context. If the requirements for context re-use are not met, then evaluation proceeds in the standard way, adding another context to the stack.
Using Tailcall it is possible to define tail-recursive functions that do not grow the evaluation stack. Exec can be used to simplify the call stack for functions that create and then evaluate an expression.
Because of lazy evaluation of arguments in R it may be necessary to force evaluation of some arguments to avoid accumulating deferred evaluations.
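A minimal sketch of forcing an argument inside a tail-recursive function follows. The names sum_to and iter are hypothetical, and the example assumes R 4.4.0 or later, where Tailcall is available. Calling force() on the accumulator at the start of each step evaluates the pending promise before the next call is made, so deferred evaluations do not build up into a long chain.

```r
# Hypothetical helper: sum the integers 1..n with a tail-recursive inner function
sum_to <- function(n) {
  iter <- function(acc, k) {
    force(acc)                        # evaluate the promise for 'acc' now
    if (k <= 0)
      acc
    else
      Tailcall(iter, acc + k, k - 1)  # tail position, so the context can be re-used
  }
  iter(0, n)
}

sum_to(10)
```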
This tail call optimization has the advantage of not growing the call stack and permitting arbitrarily deep tail recursions. It does also mean that stack traces produced by traceback or sys.calls will only show the call specified by Tailcall or Exec, not the previous call whose stack entry has been replaced.
Tailcall and Exec are experimental and may be changed or dropped in future released versions of R.
## tail-recursive log10-factorial
lfact <- function(n) {
    lfact_iter <- function(val, n) {
        if (n <= 0)
            val
        else {
            val <- val + log10(n) # forces val
            Tailcall(lfact_iter, val, n - 1)
        }
    }
    lfact_iter(0, n)
}
10 ^ lfact(3)
lfact(100000)

## simplified variant of do.call using Exec:
docall <- function (what, args, quote = FALSE) {
    if (!is.list(args))
        stop("second argument must be a list")
    if (quote)
        args <- lapply(args, enquote)
    Exec(as.call(c(list(substitute(what)), args)), parent.frame())
}
## the call stack does not contain the call to docall:
docall(function() sys.calls(), list()) |>
    Find(function(x) identical(x[[1]], quote(docall)), x = _)
## contrast to do.call:
do.call(function(x) sys.calls(), list()) |>
    Find(function(x) identical(x[[1]], quote(do.call)), x = _)
Apply a function to each cell of a ragged array, that is to each (non-empty) group of values or data rows given by a unique combination of the levels of certain factors.
tapply(X, INDEX, FUN = NULL, ..., default = NA, simplify = TRUE)
X |
an R object for which a |
INDEX |
a |
FUN |
a function (or name of a function) to be applied, or |
... |
optional arguments to |
default |
(only in the case of simplification to an array) the
value with which the array is initialized as
|
simplify |
logical; if |
If FUN is not NULL, it is passed to match.fun, and hence it can be a function or a symbol or character string naming a function.
When FUN is present, tapply calls FUN for each cell that has any data in it. If FUN returns a single atomic value for each such cell (e.g., functions mean or var) and when simplify is TRUE, tapply returns a multi-way array containing the values, and NA for the empty cells. The array has the same number of dimensions as INDEX has components; the number of levels in a dimension is the number of levels (nlevels()) in the corresponding component of INDEX. Note that if the return value has a class (e.g., an object of class "Date") the class is discarded.
simplify = TRUE always returns an array, possibly 1-dimensional.
If FUN does not return a single atomic value, tapply returns an array of mode list whose components are the values of the individual calls to FUN, i.e., the result is a list with a dim attribute.
When there is an array answer, its dimnames are named by the names of INDEX and are based on the levels of the grouping factors (possibly after coercion).
For a list result, the elements corresponding to empty cells are NULL.
The array2DF function can be used to convert the array returned by tapply into a data frame, which may be more convenient for further analysis.
Optional arguments to FUN supplied by the ... argument are not divided into cells. It is therefore inappropriate for FUN to expect additional arguments with the same length as X.
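The treatment of empty cells and of arguments passed through ... can be sketched as follows. The data x and g are invented for illustration. The extra arguments probs and names are handed unchanged to every call of quantile; they are not split by group.

```r
x <- c(1, 2, 3, 4, 5, 6)
g <- factor(c("a", "a", "b", "b", "b", "c"), levels = c("a", "b", "c", "d"))

res  <- tapply(x, g, sum)               # the empty cell "d" is NA
res0 <- tapply(x, g, sum, default = 0)  # the empty cell "d" is 0 instead

# Arguments in ... go whole to each call of FUN: here the group medians
med <- tapply(x, g, quantile, probs = 0.5, names = FALSE)

res
res0
med
```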
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
the convenience functions by and aggregate (using tapply); apply, lapply with its versions sapply and mapply.
array2DF to convert the result into a data frame.
require(stats)
groups <- as.factor(rbinom(32, n = 5, prob = 0.4))
tapply(groups, groups, length) #- is almost the same as
table(groups)

## contingency table from data.frame : array with named dimnames
tapply(warpbreaks$breaks, warpbreaks[,-1], sum)
tapply(warpbreaks$breaks, warpbreaks[, 3, drop = FALSE], sum)

n <- 17; fac <- factor(rep_len(1:3, n), levels = 1:5)
table(fac)
tapply(1:n, fac, sum)
tapply(1:n, fac, sum, default = 0) # maybe more desirable
tapply(1:n, fac, sum, simplify = FALSE)
tapply(1:n, fac, range)
tapply(1:n, fac, quantile)
tapply(1:n, fac, length) ## NA's
tapply(1:n, fac, length, default = 0) # == table(fac)

## example of ... argument: find quarterly means
tapply(presidents, cycle(presidents), mean, na.rm = TRUE)

ind <- list(c(1, 2, 2), c("A", "A", "B"))
table(ind)
tapply(1:3, ind) #-> the split vector
tapply(1:3, ind, sum)

## Some assertions (not held by all patch proposals):
nq <- names(quantile(1:5))
stopifnot(
  identical(tapply(1:3, ind), c(1L, 2L, 4L)),
  identical(tapply(1:3, ind, sum),
            matrix(c(1L, 2L, NA, 3L), 2,
                   dimnames = list(c("1", "2"), c("A", "B")))),
  identical(tapply(1:n, fac, quantile)[-1],
            array(list(`2` = structure(c(2, 5.75, 9.5, 13.25, 17), names = nq),
                       `3` = structure(c(3, 6, 9, 12, 15), names = nq),
                       `4` = NULL, `5` = NULL),
                  dim=4, dimnames=list(as.character(2:5)))))
addTaskCallback registers an R function that is to be called each time a top-level task is completed.
removeTaskCallback un-registers a function that was registered earlier via addTaskCallback.
These provide low-level access to the internal/native mechanism for managing task-completion actions. One can use taskCallbackManager at the R-language level to manage R functions that are called at the completion of each task. This is easier and more direct.
addTaskCallback(f, data = NULL, name = character())
removeTaskCallback(id)
f |
the function that is to be invoked each time a top-level task
is successfully completed. This is called with 5 or 4 arguments
depending on whether |
data |
if specified, this is the 5-th argument in the call to the
callback function |
id |
a string or an integer identifying the element in the
internal callback list to be removed.
Integer indices are 1-based, i.e the first element is 1.
The names of currently registered handlers is available
using |
name |
character: names to be used. |
Top-level tasks are individual expressions rather than entire lines of input. Thus an input line of the form expression1 ; expression2 will give rise to 2 top-level tasks.
A top-level task callback is called with the expression for the top-level task, the result of the top-level task, a logical value indicating whether it was successfully completed or not (always TRUE at present), and a logical value indicating whether the result was printed or not. If the data argument was specified in the call to addTaskCallback, that value is given as the fifth argument.
The callback function should return a logical value. If the value is FALSE, the callback is removed from the task list and will not be called again by this mechanism. If the function returns TRUE, it is kept in the list and will be called on the completion of the next top-level task.
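The calling convention described above can be sketched with a callback that takes the optional fifth argument. The function name log_task, the environment state and the callback name "countTasks" are all hypothetical. An environment is used for data in this sketch so that the callback can update a counter in place.

```r
# Hypothetical callback using the optional fifth 'data' argument
log_task <- function(expr, value, ok, visible, data) {
  data$count <- data$count + 1L   # 'data' is the object given to addTaskCallback()
  TRUE                            # TRUE keeps the callback; FALSE would remove it
}

state <- new.env()
state$count <- 0L

id <- addTaskCallback(log_task, data = state, name = "countTasks")
registered <- "countTasks" %in% getTaskCallbackNames()

# Remove by name rather than by position, since positions can shift
removed <- removeTaskCallback("countTasks")
```

The callback only runs when top-level tasks complete, so the counter stays at zero if the code is evaluated in some other way, for example inside a function.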
addTaskCallback returns an integer value giving the position in the list of task callbacks that this new callback occupies. This is only the current position of the callback. It can be used to remove the entry as long as no other values are removed from earlier positions in the list first.
removeTaskCallback returns a logical value indicating whether the specified element was removed. This can fail (i.e., return FALSE) if an incorrect name or index is given that does not correspond to the name or position of an element in the list.
There is also C-level access to top-level task callbacks to allow C routines rather than R functions to be used.
getTaskCallbackNames
taskCallbackManager
https://developer.r-project.org/TaskHandlers.pdf
times <- function(total = 3, str = "Task a") {
  ctr <- 0
  function(expr, value, ok, visible) {
    ctr <<- ctr + 1
    cat(str, ctr, "\n")
    keep.me <- (ctr < total)
    if (!keep.me)
      cat("handler removing itself\n")

    # return
    keep.me
  }
}

# add the callback that will work for
# 4 top-level tasks and then remove itself.
n <- addTaskCallback(times(4))

# now remove it, assuming it is still first in the list.
removeTaskCallback(n)

## See how the handler is called every time till "self destruction":

addTaskCallback(times(4)) # counts as once already

sum(1:10) ; mean(1:3) # two more
sinpi(1)              # 4th - and "done"
cospi(1)
tanpi(1)
This provides an entirely R-language mechanism for managing callbacks or actions that are invoked at the conclusion of each top-level task. Essentially, we register a single R function from this manager with the underlying, native task-callback mechanism and this function handles invoking the other R callbacks under the control of the manager. The manager consists of a collection of functions that access shared variables to manage the list of user-level callbacks.
taskCallbackManager(handlers = list(), registered = FALSE, verbose = FALSE)
handlers |
this can be a list of callbacks in which each element
is a list with an element named |
registered |
a logical value indicating whether
the |
verbose |
a logical value, which if |
A list containing 6 functions:
add() |
register a callback with this manager, giving the
function, an optional 5-th argument, an optional name
by which the callback is stored in the list,
and a |
remove() |
remove an element from the manager's collection of callbacks, either by name or position/index. |
evaluate() |
the ‘real’ callback function that is registered with the C-level dispatch mechanism and which invokes each of the R-level callbacks within this manager's control. |
suspend() |
a function to set the suspend state
of the manager. If it is suspended, none of the callbacks will be
invoked when a task is completed. One sets the state by specifying
a logical value for the |
register() |
a function to register the |
callbacks() |
returns the list of callbacks being maintained by this manager. |
Duncan Temple Lang (2001) Top-level Task Callbacks in R, https://developer.r-project.org/TaskHandlers.pdf
addTaskCallback, removeTaskCallback, getTaskCallbackNames and the reference.
# create the manager
h <- taskCallbackManager()

# add a callback
h$add(function(expr, value, ok, visible) {
        cat("In handler\n")
        return(TRUE)
      }, name = "simpleHandler")

# look at the internal callbacks.
getTaskCallbackNames()

# look at the R-level callbacks
names(h$callbacks())

removeTaskCallback("R-taskCallbackManager")
This provides a way to get the names (or identifiers) for the currently registered task callbacks that are invoked at the conclusion of each top-level task. These identifiers can be used to remove a callback.
getTaskCallbackNames()
A character vector giving the name for each of the registered callbacks which are invoked when a top-level task is completed successfully. Each name is the one used when registering the callback in the call to addTaskCallback.
One can use taskCallbackManager to manage user-level task callbacks, i.e., S-language functions, entirely within the S language and access the names more directly.
addTaskCallback, removeTaskCallback, taskCallbackManager
https://developer.r-project.org/TaskHandlers.pdf
n <- addTaskCallback(function(expr, value, ok, visible) {
                       cat("In handler\n")
                       return(TRUE)
                     }, name = "simpleHandler")

getTaskCallbackNames()

# now remove it by name
removeTaskCallback("simpleHandler")

h <- taskCallbackManager()
h$add(function(expr, value, ok, visible) {
        cat("In handler\n")
        return(TRUE)
      }, name = "simpleHandler")
getTaskCallbackNames()
removeTaskCallback("R-taskCallbackManager")
tempfile returns a vector of character strings which can be used as names for temporary files.
tempfile(pattern = "file", tmpdir = tempdir(), fileext = "")
tempdir(check = FALSE)
pattern |
a non-empty character vector giving the initial part of the name. |
tmpdir |
a non-empty character vector giving the directory name. |
fileext |
a non-empty character vector giving the file extension. |
check |
|
The length of the result is the maximum of the lengths of the three arguments; values of shorter arguments are recycled.
The names are very likely to be unique among calls to tempfile in an R session and across simultaneous R sessions (unless tmpdir is specified). The filenames are guaranteed not to be currently in use.
The file name is made by concatenating the path given by tmpdir, the pattern string, a random string in hex and a suffix of fileext.
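The way a name is assembled, and the recycling of shorter arguments, can be checked with a short sketch. The pattern "report_" and the two extensions are arbitrary choices for illustration.

```r
# Ask for two candidate names; the single pattern is recycled
paths <- tempfile(pattern = "report_", fileext = c(".csv", ".txt"))

startsWith(basename(paths), "report_")   # the pattern begins the file name
endsWith(paths, c(".csv", ".txt"))       # fileext is appended last
file.exists(paths)                       # tempfile() itself creates no files

# Create one of the files, then clean up
writeLines("x,y", paths[1])
created <- file.exists(paths[1])
unlink(paths[1])
```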
By default, tmpdir will be the directory given by tempdir(). This will be a subdirectory of the per-session temporary directory found by the following rule when the R session is started. The environment variables TMPDIR, TMP and TEMP are checked in turn and the first found which points to a writable directory is used: if none succeeds ‘/tmp’ is used. The path must not contain spaces. Note that setting any of these environment variables in the R session has no effect on tempdir(): the per-session temporary directory is created before the interpreter is started.
For tempfile a character vector giving the names of possible (temporary) files. Note that no files are generated by tempfile.
For tempdir, the path of the per-session temporary directory.
On Windows, both will use a backslash as the path separator.
On a Unix-alike, the value will be an absolute path (unless tmpdir is set to a relative path), but it need not be canonical (see normalizePath) and on macOS it often is not.
R processes forked by functions such as mclapply and makeForkCluster in package parallel share a per-session temporary directory. Further, the ‘guaranteed not to be currently in use’ applies only at the time of asking, and two children could ask simultaneously. This is circumvented by ensuring that tempfile calls in different children try different names.
The final component of tempdir() is created by the POSIX system call mkdtemp, or if this is not available (e.g. on Windows) a version derived from the source code of GNU glibc.
It will be of the form ‘RtmpXXXXXX’ where the last 6 characters are replaced in a platform-specific way. POSIX only requires that the replacements be ASCII, which allows . (so the value may appear to have a file extension) and regexp metacharacters such as +. Most commonly the replacements are from the regexp pattern [A-Za-z0-9], but . has been seen.
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
unlink for deleting files.
tempfile(c("ab", "a b c")) # give file name with spaces in!

tempfile("plot", fileext = c(".ps", ".pdf"))

tempdir() # works on all platforms with a platform-dependent result

## Show how 'check' is working on some platforms:
if(exists("I'm brave") && `I'm brave` &&
   identical(.Platform$OS.type, "unix") && grepl("^/tmp/", tempdir())) {
  cat("Current tempdir(): ", tempdir(), "\n")
  cat("Removing it :", file.remove(tempdir()),
      "; dir.exists(tempdir()):", dir.exists(tempdir()), "\n")
  cat("and now tempdir(check = TRUE) :", tempdir(check = TRUE),"\n")
}
Input and output text connections.
textConnection(object, open = "r", local = FALSE,
               name = deparse1(substitute(object)),
               encoding = c("", "bytes", "UTF-8"))
textConnectionValue(con)
object |
character. A description of the connection.
For an input this is an R character vector object, and for an output
connection the name for the R character vector to receive the
output, or |
open |
character string. Either |
local |
logical. Used only for output connections. If |
name |
a |
encoding |
character string, partially matched. Used only for input connections. How
marked strings in |
con |
an output text connection. |
An input text connection is opened and the character vector is copied at the time the connection object is created, and close destroys the copy. object should be the name of a character vector: however, short expressions will be accepted provided they deparse to less than 60 bytes.
An output text connection is opened and creates an R character vector of the given name in the user's workspace or in the calling environment, depending on the value of the local argument. This object will at all times hold the completed lines of output to the connection, and isIncomplete will indicate if there is an incomplete final line. Closing the connection will output the final line, complete or not. (A line is complete once it has been terminated by end-of-line, represented by "\n" in R.) The output character vector has locked bindings (see lockBinding) until close is called on the connection. The character vector can also be retrieved via textConnectionValue, which is the only way to do so if object = NULL. If the current locale is detected as Latin-1 or UTF-8, non-ASCII elements of the character vector will be marked accordingly (see Encoding).
Opening a text connection with open = "a" will attempt to append to an existing character vector with the given name in the user's workspace or the calling environment. If none is found (even if an object exists of the right name but the wrong type) a new character vector will be created, with a warning.
You cannot seek on a text connection, and seek will always return zero as the position.
Text connections have slightly unusual semantics: they are always open, and throwing away an input text connection without closing it (so it gets garbage-collected) does not give a warning.
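The behaviour of output and input text connections can be sketched as follows. The variable names out, inp, captured, pending and lines_in are illustrative only. The output connection is opened with object = NULL, so textConnectionValue is the way to get at the collected lines.

```r
# Output connection with no backing variable
out <- textConnection(NULL, open = "w")
writeLines(c("first line", "second line"), out)
cat("partial", file = out)        # no end-of-line yet
pending <- isIncomplete(out)      # an unterminated final line is pending
cat(" line\n", file = out)        # now the third line is complete
captured <- textConnectionValue(out)
close(out)

# Input connection: read a character vector as if it were a file
inp <- textConnection(c("alpha", "beta", "gamma"))
lines_in <- readLines(inp, n = 2)
close(inp)

captured
lines_in
```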
For textConnection, a connection object of class "textConnection" which inherits from class "connection".
For textConnectionValue, a character vector.
As output text connections keep the character vector up to date line-by-line, they are relatively expensive to use, and it is often better to use an anonymous file() connection to collect output.
On (rare) platforms where vsnprintf does not return the needed length of output there is a 100,000 character limit on the length of line for output connections: longer lines will be truncated with a warning.
Chambers, J. M. (1998) Programming with Data. A Guide to the S Language. Springer. [S has input text connections only.]
connections, showConnections, pushBack, capture.output.
zz <- textConnection(LETTERS)
readLines(zz, 2)
scan(zz, "", 4)
pushBack(c("aa", "bb"), zz)
scan(zz, "", 4)
close(zz)

zz <- textConnection("foo", "w")
writeLines(c("testit1", "testit2"), zz)
cat("testit3 ", file = zz)
isIncomplete(zz)
cat("testit4\n", file = zz)
isIncomplete(zz)
close(zz)
foo

# capture R output: use part of example from help(lm)
zz <- textConnection("foo", "w")
ctl <- c(4.17, 5.58, 5.18, 6.11, 4.5, 4.61, 5.17, 4.53, 5.33, 5.14)
trt <- c(4.81, 4.17, 4.41, 3.59, 5.87, 3.83, 6.03, 4.89, 4.32, 4.69)
group <- gl(2, 10, 20, labels = c("Ctl", "Trt"))
weight <- c(ctl, trt)
sink(zz)
anova(lm.D9 <- lm(weight ~ group))
cat("\nSummary of Residuals:\n\n")
summary(resid(lm.D9))
sink()
close(zz)
cat(foo, sep = "\n")
Tilde is used to separate the left- and right-hand sides in a model formula.
y ~ model
y, model |
symbolic expressions. |
The left-hand side is optional, and one-sided formulae are used in some contexts.
A formula has mode call. It can be subsetted by [[: the components are ~, the left-hand side (if present) and the right-hand side in that order. (Thus one-sided formulae have two components.)
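The component structure described above can be inspected directly. The formulae f and g and the symbols in them are arbitrary examples.

```r
f <- y ~ x + z     # two-sided formula
g <- ~ x           # one-sided formula

mode(f)            # a formula has mode "call"
length(f)          # three components: ~, left-hand side, right-hand side
f[[1]]             # the symbol `~`
f[[2]]             # left-hand side: y
f[[3]]             # right-hand side: x + z

length(g)          # two components: ~ and the right-hand side
g[[2]]             # right-hand side: x
```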
Chambers, J. M. and Hastie, T. J. (1992) Statistical models. Chapter 2 of Statistical Models in S eds J. M. Chambers and T. J. Hastie, Wadsworth & Brooks/Cole.
Information about time zones in R. Sys.timezone returns the name of the current time zone.
Sys.timezone(location = TRUE)
OlsonNames(tzdir = NULL)
location |
logical. Defunct, with a warning if |
tzdir |
the time-zone database to be used: the default is to try known locations until one is found. |
Time zones are a system-specific topic, but these days almost all R platforms use similar underlying code, used by Linux, macOS, Solaris, AIX and FreeBSD, and installed with R on Windows. (Unfortunately there are many system-specific errors in the implementations.) It is possible to use the R sources' version of the code on Unix-alikes as well as on Windows: this is the default on macOS.
It should be possible to set the current time zone via the environment variable TZ: see the section on ‘Time zone names’ for suitable values. Sys.timezone() will return the value of TZ if set initially (and on some OSes it is always set), otherwise it will try to retrieve from the OS a value which if set for TZ would give the initial time zone. (‘Initially’ means before any time-zone functions are used: if TZ is being set to override the OS setting or if the ‘try’ does not get this right, it should be set before the R process is started or (probably early enough) in file .Rprofile).
If TZ is set but invalid, most platforms default to ‘UTC’, the time zone colloquially known as ‘GMT’ (see https://en.wikipedia.org/wiki/Coordinated_Universal_Time). (Some but not all platforms will give a warning for invalid values.) If it is unset or empty the system time zone is used (the one returned by Sys.timezone).
Time zones did not come into use until the middle of the nineteenth century and were not widely adopted until the twentieth, and daylight saving time (DST, also known as summer time) was first introduced in the early twentieth century, most widely in 1916. Over the last 100 years places have changed their affiliation between major time zones, have opted out of (or in to) DST in various years or adopted DST rule changes late or not at all. (For example, the UK experimented with DST throughout 1971, only.) In a few countries (one is the Irish Republic) it is the summer time which is the ‘standard’ time and a different name is used in winter. And there can be multiple changes during a year, for example for Ramadan.
A quite common system implementation of POSIXct was as signed 32-bit integers and so only went back to the end of 1901: on such systems R assumes that dates prior to that are in the same time zone as they were in 1902. Most of the world had not adopted time zones by 1902 (so used local ‘mean time’ based on longitude) but for a few places there had been time-zone changes before then. 64-bit representations are becoming by far the most common; unfortunately on some 64-bit OSes the database information is 32-bit and so only available for the range 1901–2038, and incompletely for the end years.
When a time zone location is first found in a session its value is cached in object .sys.timezone in the base environment.
Sys.timezone returns an OS-specific character string, possibly NA or an empty string (which on some OSes means ‘UTC’). This will be a location such as "Europe/London" if one can be ascertained.
A time zone region may be known by several names: for example ‘"Europe/London"’ may also be known as ‘GB’, ‘GB-Eire’, ‘Europe/Belfast’, ‘Europe/Guernsey’, ‘Europe/Isle_of_Man’ and ‘Europe/Jersey’. A few regions are also known by a summary of their time zone, e.g. ‘PST8PDT’ is (on most but not all systems) an alias for ‘America/Los_Angeles’.
OlsonNames returns a character vector, see the examples for typical cases. It may have an attribute "Version", something like ‘"2023a"’. (It does on systems using --with-internal-tzcode and those like Fedora distributing file ‘tzdata.zi’.)
Names "UTC" and its synonym "GMT" are accepted on all platforms.
Where OSes describe their valid time zones can be obscure. The help for the C function tzset can be helpful, but it can also be inaccurate. There is a cumbersome POSIX specification (listed under environment variable TZ at https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap08.html#tag_08), which is often at least partially supported, but there are other more user-friendly ways to specify time zones.
Almost all R platforms make use of a time-zone database originally compiled by Arthur David Olson and now managed by IANA, in which the preferred way to refer to a time zone is by a location (typically of a city), e.g., Europe/London, America/Los_Angeles, Pacific/Easter within a ‘time zone region’. Some traditional designations are also allowed such as EST5EDT or GB. (Beware that some of these designations may not be what you expect: in particular EST is a time zone used in Canada without daylight saving time, and not EST5EDT nor (Australian) Eastern Standard Time.) The designation can also be an optional colon prepended to the path to a file giving compiled zone information (and the examples above are all files in a system-specific location). See https://data.iana.org/time-zones/tz-link.html for more details and references. By convention, regions with a unique time-zone history since 1970 have specific names in the database, but those with different earlier histories may not. Each time zone has one or two (the second for ‘summer’) abbreviations used when formatting times.
Increasingly OSes are (optionally or always) not including
‘legacy’ names such as US/Eastern
: only names of the
forms Continent/City
and Etc/...
are fully portable.
The abbreviations used have changed over the years: for example France used ‘PMT’ (‘Paris Mean Time’) from 1891 to 1911 then ‘WET/WEST’ up to 1940 and ‘CET/CEST’ from 1946. (In almost all time zones the abbreviations have been stable since 1970.) The POSIX standard allows only one or two abbreviations per time zone, so you may see the current abbreviation(s) used for older times.
For some time zones abbreviations are like ‘-03’ and
‘+0845’: this is done when there is no official abbreviation.
(Negative values are behind (West of) UTC, as for the "%z"
format for strftime
.)
The function OlsonNames
returns the time-zone names known to
the currently selected Olson/IANA database. The system-specific
location in the file system varies,
e.g. ‘/usr/share/zoneinfo’ (Linux, macOS, FreeBSD),
‘/usr/share/lib/zoneinfo’ (Solaris, AIX), .... It is likely
that there is a file named something like ‘zone1970.tab’ or
(older) ‘zone.tab’ under that directory listing the locations
known as time-zone names (but not for example EST5EDT
). See
also https://en.wikipedia.org/wiki/Zone.tab.
Where R was configured with option --with-internal-tzcode
(the default on Windows), the database at
file.path(R.home("share"), "zoneinfo")
is used by default: file
‘VERSION’ in that directory states the version. That option is
also the default on macOS but there whichever is more recent of the
system database at ‘/var/db/timezone/zoneinfo’ and that
distributed with R is used by default. Environment variable
TZDIR can be used to give the full path to a different
‘zoneinfo’ database: value "internal"
indicates the
database from the R sources and "macOS"
indicates the system
database. (Setting either of those values would not be recognized by
other software using TZDIR.)
Setting TZDIR is also supported by the native services on some
OSes, e.g. Linux using glibc
except in secure modes.
Time zones given by name (via environment variable TZ, in
tz
arguments to functions such as as.POSIXlt
and
perhaps the system time zone) are loaded from the currently selected
‘zoneinfo’ database.
On Windows only: An attempt is made (once only per session) to map Windows' idea of the current time zone to a location, following a version of http://unicode.org/repos/cldr/trunk/common/supplemental/windowsZones.xml with additional values deduced from the Windows Registry and documentation. It can be overridden by setting the TZ environment variable before any date-times are used in the session.
Most platforms support time zones of the form ‘Etc/GMT+n’ and ‘Etc/GMT-n’ (possibly also without prefix ‘Etc/’), which assume a fixed offset from UTC (hence no DST). Contrary to some expectations (but consistent with names such as ‘PST8PDT’), negative offsets are times ahead of (East of) UTC, positive offsets are times behind (West of) UTC.
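The sign convention can be checked directly. The following is a minimal sketch, assuming the platform's database includes ‘Etc/GMT+5’ and ‘Etc/GMT-5’; the variable names are illustrative only.

```r
## A fixed instant, specified in UTC.
t0 <- as.POSIXct("2024-01-01 12:00:00", tz = "UTC")

## 'Etc/GMT+5' is five hours *behind* UTC, despite the plus sign.
west <- as.POSIXlt(t0, tz = "Etc/GMT+5")
## 'Etc/GMT-5' is five hours *ahead* of UTC.
east <- as.POSIXlt(t0, tz = "Etc/GMT-5")

west$hour   # 7 where the zone is supported
east$hour   # 17 where the zone is supported

## "%z" reports the offset with the usual sign (negative is West of UTC).
format(west, "%z")
```

On a platform that does not know these names, R typically warns about an unknown time zone and falls back to UTC, so the hours above would both be 12.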
Immediately prior to the advent of legislated time zones, most people used time based on their longitude (or that of a nearby town), known as ‘Local Mean Time’ and abbreviated as ‘LMT’ in the databases: in many countries that was codified with a specific name before the switch to a standard time. For example, Paris codified its LMT as ‘Paris Mean Time’ in 1891 (to be used throughout mainland France) and switched to ‘GMT+0’ in 1911.
Some systems (notably Linux) have a tzselect
command which
allows the interactive selection of a supported time zone name. On
systems using systemd
(notably Linux), the OS command
timedatectl list-timezones
will list all available time zone
names.
There is a system-specific upper limit on the number of bytes in (abbreviated) time-zone names which can be as low as 6 (as required by POSIX). Some OSes allow the setting of time zones with names which exceed their limit, and that can crash the R session.
Information about future times is speculative (‘proleptic’): the database provides the best-known information based on current rules set by civil authorities. For the period 1900–1970 those rules (and which of any authority's rules were enacted) are often obscure, and the databases do get corrected frequently.
OlsonNames
tries to find an Olson database in known locations.
It might not succeed (when it returns an empty vector with a warning)
and even if it does it might not locate the database used by the
date-time code linked into R. Fortunately names are added rarely
and most databases are pretty complete. On the other hand, many names
which duplicate other named timezones have been moved to the
‘backward’ list – these are regarded as optional and omitted on
minimal installations. Similarly, there are timezones named in file
‘backzone’ which differ from those in the main lists only prior
to 1970 – these are usually included but may not be in minimalist
systems.
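Because OlsonNames may fail to locate a database, code that validates a time-zone name may want to allow for an empty result. A minimal sketch follows; tz_known is a hypothetical helper, not part of R.

```r
## Hypothetical helper: TRUE/FALSE if the name can be checked, NA if
## OlsonNames() could not find a database.
tz_known <- function(name) {
    known <- suppressWarnings(OlsonNames())
    if (length(known) == 0L) return(NA)
    name %in% known
}

tz_known("Europe/London")       # a Continent/City name
tz_known("US/Eastern")          # a 'legacy' name, possibly absent
tz_known("Mars/Olympus_Mons")   # not a time zone
```

A FALSE result only says that the name is absent from the database that OlsonNames found, which is not necessarily the one used by the date-time code.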
For many years, the legacy names EST5EDT
and PST8PDT
were portable, but musl
(the C runtime used by Alpine Linux)
does not use DST with those names.
This section is of background interest for users of a Unix-alike, but
may help if an NA
value is returned unexpectedly.
Commercial Unixen such as Solaris and AIX set TZ, so the value when R is started is used.
All other common platforms (Linux, macOS, *BSD) use similar schemes,
either derived from tzcode
(currently distributed from
https://www.iana.org/time-zones) or independently coded
(glibc
, musl-libc
). Such systems read the time-zone
information from a file ‘localtime’, usually under ‘/etc’
(but possibly under ‘/usr/local/etc’ or
‘/usr/local/etc/zoneinfo’). As the usual Linux manual page for
localtime
says
‘Because the time zone identifier is extracted from the symlink target name of ‘/etc/localtime’, this file may not be a normal file or hardlink.’
Nevertheless, some Linux distributions (including the one from which that quote was taken) or sysadmins have chosen to copy a time-zone file to ‘localtime’. For a non-symlink, the ultimate fallback is to compare that file to all files in the time-zone database.
Some Linux platforms provide two other mechanisms which are tried in turn before looking at ‘/etc/localtime’.
‘Modern’ Linux systems use systemd
which
provides mechanisms to set and retrieve the time zone (amongst other
things). There is a command timedatectl
to give details.
(Unfortunately RHEL/CentOS 6.x were not ‘modern’.)
Debian-derived systems since ca 2007 have supplied a file ‘/etc/timezone’. Its format is undocumented but empirically it contains a single line of text naming the time zone.
In each case a sanity check is performed that the time-zone name is the
name of a file in the time-zone database. (The systems probably use
the time-zone file (symlinked to) ‘/etc/localtime’, but the
Sys.timezone
code does not check that is the same as the named
file in the database. This is deliberate as they may be from
different dates.)
Since 2007 there has been considerable disruption over changes to the timings of the DST transitions; these often have short notice and time-zone databases may not be up to date. (Morocco in 2013 announced a change to the end of DST at a day's notice. In 2023 there was chaos in Lebanon as the authorities changed their minds repeatedly and some changes were not widely implemented.)
There have also been changes to the ‘standard’ time with little notice (Kazakhstan switched to a single time zone in Mar 2024 with six weeks' notice), and to whether ‘summer’ or ‘winter’ time is regarded as ‘standard’ (and hence to abbreviations).
On platforms with case-insensitive file systems, time zone names will be
case-insensitive. They may or may not be on other platforms and so,
for example, "gmt"
is valid on some platforms and not on others.
Note that except where replaced, the operation of time zones is an OS service, and even where replaced a third-party database is used and can be updated (see the section on ‘Time zone names’). Incorrect results will never be an R issue, so please ensure that you have the courtesy not to blame R for them.
https://en.wikipedia.org/wiki/Time_zone and https://data.iana.org/time-zones/tz-link.html for extensive sets of links.
https://data.iana.org/time-zones/theory.html for the ‘rules’ of the Olson/IANA database.
Sys.timezone()

str(OlsonNames()) ## typically around six hundred names,
## typically some acronyms/aliases such as "UTC", "NZ", "MET", "Eire", ..., but
## mostly pairs (and triplets) such as "Pacific/Auckland"
table(sl <- grepl("/", OlsonNames()))
OlsonNames()[ !sl ] # the simple ones
head(Osl <- strsplit(OlsonNames()[sl], "/"))
(tOS1 <- table(vapply(Osl, `[[`, "", 1))) # Continents, countries, ...
table(lengths(Osl))# most are pairs, some triplets
str(Osl[lengths(Osl) >= 3])# "America" South and North ...
This is a helper function for format
to produce a single
character string describing an R object.
toString(x, ...)

## Default S3 method:
toString(x, width = NULL, ...)
x |
The object to be converted. |
width |
Suggestion for the maximum field width. Values of
|
... |
Optional arguments passed to or from methods. |
This is a generic function for which methods can be written: only the
default method is described here. Most methods should honor the
width
argument to specify the maximum display width (as measured
by nchar(type = "width")
) of the result.
The default method first converts x
to character and then
concatenates the elements separated by ", "
.
If width
is supplied and is not NULL
, the default method
returns the first width - 4
characters of the result with
....
appended, if the full result would use more than
width
characters.
A character vector of length 1 is returned.
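The behaviour of the default method described above can be sketched as follows; the inputs are chosen only for illustration.

```r
## Numeric input is converted to character first.
s1 <- toString(1:5)
s1   # "1, 2, 3, 4, 5"

## With 'width', a result that is too long is truncated and "...." appended.
s2 <- toString(month.name, width = 20)
s2
nchar(s2)
```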
Robert Gentleman
x <- c("a", "b", "aaaaaaaaaaa")
toString(x)
toString(x, width = 8)
A call to trace
allows you to insert debugging code (e.g., a
call to browser
or recover
) at chosen
places in any function. A call to untrace
cancels the tracing.
Specified methods can be traced the same way, without tracing all
calls to the generic function. Trace code (tracer
) can be any
R expression. Tracing can be temporarily turned on or off globally
by calling tracingState
.
trace(what, tracer, exit, at, print, signature,
      where = topenv(parent.frame()), edit = FALSE)
untrace(what, signature = NULL, where = topenv(parent.frame()))

tracingState(on = NULL)
.doTrace(expr, msg)
returnValue(default = NULL)
what |
the name, possibly |
tracer |
either a function or an unevaluated expression. The
function will be called or the expression will be evaluated either
at the beginning of the call, or before those steps in the call
specified by the argument |
exit |
either a |
at |
optional numeric vector or list. If supplied, |
print |
if |
signature |
an optional
signature for a method for function |
edit |
For complicated tracing, such as tracing within a loop
inside the function, you will need to insert the desired calls by
editing the body of the function. If so, supply the |
where |
where to look for the function to be
traced; by default, the top-level environment of the call to
An important use of this argument is to trace functions from a
package which are “hidden” or called from another package.
The namespace mechanism imports the functions to be called (with the
exception of functions in the base package). The functions being
called are not the same objects seen from the top-level (in
general, the imported packages may not even be attached).
Therefore, you must ensure that the correct versions are being
traced. The way to do this is to set argument |
on |
logical; a call to the support function |
expr , msg
|
arguments to the support function |
default |
if |
The trace
function operates by constructing a revised version
of the function (or of the method, if signature
is supplied),
and assigning the new object back where the original was found.
If only the what
argument is given, a line of trace printing is
produced for each call to the function (back compatible with the
earlier version of trace
).
The object constructed by trace
is from a class that extends
"function"
and which contains the original, untraced version.
A call to untrace
re-assigns this version.
If the argument tracer
or exit
is the name of a
function, the tracing expression will be a call to that function, with
no arguments. This is the easiest and most common case, with the
functions browser
and recover
the
likeliest candidates; the former browses in the frame of the function
being traced, and the latter allows browsing in any of the currently
active calls. The arguments tracer
and exit
are evaluated to
see whether they are functions, but only their names are used in the
tracing expressions. The lookup is done again when the traced function
executes, so it may not be tracer
or exit
that will be called
while tracing.
The tracer
or exit
argument can also be an unevaluated
expression (such as returned by a call to quote
or
substitute
). This expression itself is inserted in the
traced function, so it will typically involve arguments or local
objects in the traced function. An expression of this form is useful
if you only want to interact when certain conditions apply (and in
this case you probably want to supply print = FALSE
in the call
to trace
also).
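A non-interactive sketch of such a conditional tracer is given here. The function scale01 is hypothetical and is assumed to be defined at top level, so that the default where finds it.

```r
## 'scale01' is a hypothetical function used only for illustration.
scale01 <- function(v) (v - min(v)) / (max(v) - min(v))

## Report only when the input is constant (which would give 0/0).
## The expression refers to 'v', the argument of the traced function.
trace(scale01,
      tracer = quote(if (length(unique(v)) == 1L)
                         cat("constant input of length", length(v), "\n")),
      print = FALSE)

r1 <- scale01(c(1, 2, 3))   # condition is false: the tracer stays silent
r2 <- scale01(c(5, 5))      # condition is true: the tracer reports
untrace(scale01)
```

Replacing the cat() call by browser() would give the interactive variant described in the text.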
When the at
argument is supplied, it can be a vector of
integers referring to the substeps of the body of the function (this
only works if the body of the function is enclosed in { ...}
). In
this case tracer
is not called on entry, but instead
just before evaluating each of the steps listed in at
. (Hint:
you don't want to try to count the steps in the printed version of a
function; instead, look at as.list(body(f))
to get the numbers
associated with the steps in function f
.)
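The step numbering can be illustrated with a small sketch. The function g is hypothetical and is assumed to be defined at top level.

```r
## 'g' is a hypothetical function used only for illustration.
g <- function(x) {
    y <- x * 2
    z <- y + 1
    z
}

## Element 1 is `{`, element 2 is 'y <- x * 2', element 3 is 'z <- y + 1', ...
as.list(body(g))

## Run the tracer just before step 3, by which time 'y' exists.
trace(g, tracer = quote(cat("y =", y, "\n")), at = 3, print = FALSE)
res <- g(5)
untrace(g)
```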
The at
argument can also be a list of integer vectors. In
this case, each vector refers to a step nested within another step of
the function. For example, at = list(c(3,4))
will call the tracer just before the fourth step of the third step
of the function. See the example below.
Using setBreakpoint
(from package utils) may be an
alternative, calling trace(...., at, ...)
.
The exit
argument is called during on.exit
processing. In an on.exit
expression, the experimental returnValue()
function may be called to obtain the value about to be returned by
the function. Calling this function in other circumstances will give
undefined results.
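As a minimal sketch of an exit expression that uses returnValue() (the function sq is hypothetical and is assumed to be defined at top level):

```r
## 'sq' is a hypothetical function used only for illustration.
sq <- function(x) x^2

## Report the value about to be returned each time sq() exits.
trace(sq, exit = quote(cat("about to return", returnValue(), "\n")),
      print = FALSE)
out <- sq(3)
untrace(sq)
```

Since returnValue() is described as experimental, this is best regarded as a debugging aid rather than something to rely on in production code.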
An intrinsic limitation in the exit
argument is that it won't
work if the function itself uses on.exit
with add=
FALSE
(the default), since the existing calls will override the one
supplied by trace
.
Tracing does not nest. Any call to trace
replaces previously
traced versions of that function or method (except for edited
versions as discussed below), and untrace
always
restores an untraced version. (Allowing nested tracing has too many
potentials for confusion and for accidentally leaving traced versions
behind.)
When the edit
argument is used repeatedly with no call to
untrace
on the same function or method in between, the
previously edited version is retained. If you want to throw away
all the previous tracing and then edit, call untrace
before the next
call to trace
. Editing may be combined with automatic
tracing; just supply the other arguments such as tracer
, and
the edit
argument as well. The edit = TRUE
argument
uses the default editor (see edit
).
Tracing primitive functions (builtins and specials) from the base
package works, but only by a special mechanism and not very
informatively. Tracing a primitive causes the primitive to be
replaced by a function with argument ... (only). You can get a bit
of information out, but not much. A warning message is issued when
trace
is used on a primitive.
The practice of saving the traced version of the function back where
the function came from means that tracing carries over from one
session to another, if the traced function is saved in the
session image. (In the next session, untrace
will remove the
tracing.) On the other hand, functions that were in a package, not in
the global environment, are not saved in the image, so tracing expires
with the session for such functions.
Tracing an S4 method is basically just like tracing a function, with the
exception that the traced version is stored by a call to
setMethod
rather than by direct assignment, and so is
the untraced version after a call to untrace
.
The version of trace
described here is largely compatible with
the version in S-Plus, although the two work by entirely different
mechanisms. The S-Plus trace
uses the session frame, with the
result that tracing never carries over from one session to another (R
does not have a session frame). Another relevant distinction has
nothing directly to do with trace
: The browser in S-Plus
allows changes to be made to the frame being browsed, and the changes
will persist after exiting the browser. The R browser allows changes,
but they disappear when the browser exits. This may be relevant in
that the S-Plus version allows you to experiment with code changes
interactively, but the R version does not. (A future revision may
include a ‘destructive’ browser for R.)
In the simple version (just the first argument), trace
returns
an invisible NULL
.
Otherwise, it returns the name(s) of the traced function(s). The relevant consequence is the
assignment that takes place.
untrace
returns the function name invisibly.
tracingState
returns the current global tracing state, and possibly
changes it.
When called during on.exit
processing, returnValue
returns
the value about to be returned by the exiting function. Behaviour in
other circumstances is undefined.
Using trace() is conceptually a generalization of debug,
implemented differently: namely, by calling browser via its
tracer or exit argument.
The version of function tracing that includes any of the arguments except for the function name requires the methods package (because it uses special classes of objects to store and restore versions of the traced functions).
If methods dispatch is not currently on, trace
will load the
methods namespace, but will not put the methods package on the
search
list.
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
browser
and recover
, the likeliest
tracing functions;
also, quote
and substitute
for
constructing general expressions.
require(stats)

## Very simple use
trace(sum)
hist(rnorm(100)) # shows about 3-4 calls to sum()
untrace(sum)

## Show how pt() is called from inside power.t.test():
if(FALSE)
  trace(pt) ## would show ~20 calls, but we want to see more:
trace(pt, tracer = quote(cat(sprintf("tracing pt(*, ncp = %.15g)\n", ncp))),
      print = FALSE) # <- not showing typical extra
power.t.test(20, 1, power=0.8, sd=NULL)  ##--> showing the ncp root finding:
untrace(pt)

f <- function(x, y) {
    y <- pmax(y, 0.001)
    if (x > 0) x ^ y else stop("x must be positive")
}

## arrange to call the browser on entering and exiting
## function f
trace("f", quote(browser(skipCalls = 4)),
      exit = quote(browser(skipCalls = 4)))

## instead, conditionally assign some data, and then browse
## on exit, but only then.  Don't bother me otherwise

trace("f", quote(if(any(y < 0)) yOrig <- y),
      exit = quote(if(exists("yOrig")) browser(skipCalls = 4)),
      print = FALSE)

## Enter the browser just before stop() is called.  First, find
## the step numbers

untrace(f) # (as it has changed f's body !)
as.list(body(f))
as.list(body(f)[[3]]) # -> stop(..) is [[4]]

## Now call the browser there

trace("f", quote(browser(skipCalls = 4)), at = list(c(3,4)))
## Not run:
f(-1,2) # --> enters browser just before stop(..)

## End(Not run)

## trace a utility function, with recover so we
## can browse in the calling functions as well.

trace("as.matrix", recover)

## turn off the tracing (that happened above)

untrace(c("f", "as.matrix"))

## Not run:
## Useful to find how system2() is called in a higher-up function:
trace(base::system2, quote(print(ls.str())))

## End(Not run)

##-------- Tracing hidden functions : need 'where = *'
##
## 'where' can be a function whose environment is meant:
trace(quote(ar.yw.default), where = ar)
a <- ar(rnorm(100)) # "Tracing ..."
untrace(quote(ar.yw.default), where = ar)

## trace() more than one function simultaneously:
##         expression(E1, E2, ...)  here is equivalent to
##          c(quote(E1), quote(E2), quote(.*), ..)
trace(expression(ar.yw, ar.yw.default), where = ar)
a <- ar(rnorm(100)) # --> 2 x "Tracing ..."
# and turn it off:
untrace(expression(ar.yw, ar.yw.default), where = ar)

## Not run:
## trace calls to the function lm() that come from
## the nlme package.
trace("lm", where = asNamespace("nlme"))
      lm (len ~ log(dose) * supp, ToothGrowth) -> fit1 # NOT traced
nlme::lmList(len ~ log(dose) | supp, ToothGrowth) -> fit2 # traced
untrace("lm", where = asNamespace("nlme"))

## End(Not run)
By default traceback()
prints the call stack of the last
uncaught error, i.e., the sequence of calls that led to the error.
This is useful when an error occurs with an unidentifiable error
message. It can also be used to print the current stack or
arbitrary lists of calls.
.traceback()
now returns the above call stack (and
traceback(x, *)
can be regarded as a convenience function for
printing the result of .traceback(x)
).
traceback(x = NULL,
          max.lines = getOption("traceback.max.lines",
                                getOption("deparse.max.lines", -1L)))
.traceback(x = NULL,
           max.lines = getOption("traceback.max.lines",
                                 getOption("deparse.max.lines", -1L)))
x |
|
max.lines |
a number, the maximum number of lines to be printed
per call. The default is unlimited. Applies only when |
The default display is of the stack of the last uncaught error as
stored as a list of call
s in .Traceback
, which
traceback
prints in a user-friendly format. The stack of
calls always contains all function calls and all foreign
function calls (such as .Call
): if profiling is in
progress it will include calls to some primitive functions. (Calls
to builtins are included, but not to specials.)
Errors which are caught via try
or
tryCatch
do not generate a traceback, so what is printed
is the call sequence for the last uncaught error, and not necessarily
for the last error.
If x
is numeric, then the current stack is printed, skipping
x
entries at the top of the stack. For example,
options(error = function() traceback(3))
will print the stack
at the time of the error, skipping the call to traceback()
and
.traceback()
and the error function that called it.
Otherwise, x
is assumed to be a list or pairlist of calls or
deparsed calls and will be displayed in the same way.
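The numeric form can also be used outside an error handler to inspect the current stack. This is a minimal sketch with hypothetical functions inner and outer, assuming (as the description of a numeric x suggests) that skipping one entry removes the .traceback() frame itself.

```r
## Hypothetical functions used only for illustration.
inner <- function() .traceback(1)   # skip the .traceback() frame itself
outer <- function() inner()

stack <- outer()

## Deepest call first: the first element corresponds to inner().
stack[[1]]
length(stack)   # at least 2; more if the code is itself run from a function
```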
.traceback()
and by extension traceback()
may trigger
deparsing of call
s. This is an expensive operation
for large calls so it may be advisable to set max.lines
to a reasonable value when such calls are on the call stack.
.traceback()
returns the deparsed call stack deepest call
first as a list or pairlist. The number of lines deparsed from
the call can be limited via max.lines
. Calls for which
max.lines
results in truncated output will gain a
"truncated"
attribute.
traceback()
formats, prints, and returns the call stack
produced by .traceback()
invisibly.
It is not documented where .Traceback
is stored or whether it is
visible, and this is subject to change. Currently
.Traceback
contains the call
s as language
objects.
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
foo <- function(x) { print(1); bar(2) }
bar <- function(x) { x + a.variable.which.does.not.exist }
## Not run:
foo(2) # gives a strange error
traceback()
## End(Not run)
## 2: bar(2)
## 1: foo(2)
bar
## Ah, this is the culprit ...

## This will print the stack trace at the time of the error.
options(error = function() traceback(3))
This function marks an object so that a message is printed whenever the internal code copies the object. Such copying is a major cause of hard-to-predict memory use in R.
tracemem(x)
untracemem(x)
retracemem(x, previous = NULL)
x |
An R object, not a function or environment or |
previous |
A value as returned by |
This functionality is optional, determined at compilation, because it
makes R run a little more slowly even when no objects are being
traced. tracemem
and untracemem
give errors when R is not
compiled with memory profiling; retracemem
does not (so it can be
left in code during development).
It is enabled in the CRAN macOS and Windows builds of R.
When an object is traced any copying of the object by the C function
duplicate
produces a message to standard output, as does type
coercion and copying when passing arguments to .C
or
.Fortran
.
The message consists of the string tracemem
, the identifying
strings for the object being copied and the new object being created,
and a stack trace showing where the duplication occurred.
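Because tracemem gives an error when R was built without memory profiling, a script can guard its use with capabilities("profmem"). A minimal sketch, with illustrative variable names:

```r
v <- c(1, 2, 3)

if (capabilities("profmem")) {
    tracemem(v)
    w <- v          # no copy yet: 'v' and 'w' share memory
    w[1] <- 10      # modifying 'w' duplicates the vector; a message is printed
    untracemem(v)
}

v   # unchanged by the modification of 'w'
```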
retracemem()
is used to indicate that a variable should be
considered a copy of a previous variable (e.g., after subscripting).
The messages can be turned off with tracingState
.
It is not possible to trace functions, as this would conflict with
trace
and it is not useful to trace NULL
,
environments, promises, weak references, or external pointer objects, as
these are not duplicated.
These functions are primitive.
A character string for identifying the object in the trace output (an
address in hex enclosed in angle brackets), or NULL
(invisibly).
capabilities("profmem")
to see if this was enabled for
this build of R.
https://developer.r-project.org/memory-profiling.html
## Not run:
a <- 1:10
tracemem(a)
## b and a share memory
b <- a
b[1] <- 1
untracemem(a)

## copying in lm: less than R <= 2.15.0
d <- stats::rnorm(10)
tracemem(d)
lm(d ~ a+log(b))

## f is not a copy and is not traced
f <- d[-1]
f+1
## indicate that f should be traced as a copy of d
retracemem(f, retracemem(d))
f+1

## End(Not run)
transform
is a generic function, which—at least
currently—only does anything useful with
data frames. transform.default
converts its first argument to
a data frame if possible and calls transform.data.frame
.
transform(`_data`, ...)
_data |
The object to be transformed |
... |
Further arguments of the form |
The ...
arguments to transform.data.frame
are tagged
vector expressions, which are evaluated in the data frame
_data
. The tags are matched against names(_data)
, and for
those that match, the values replace the corresponding variables in
_data
, and the others are appended to _data
.
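The matching rule can be seen in a small sketch; d0 is a hypothetical data frame used only for illustration.

```r
## 'd0' is a small hypothetical data frame.
d0 <- data.frame(a = 1:3, b = c(10, 20, 30))

## 'b' matches an existing name and is replaced; 'c' is new and is appended.
## Both expressions are evaluated using the original columns of 'd0'.
d1 <- transform(d0, b = b / 10, c = a + b)

names(d1)   # "a" "b" "c"
d1$b        # 1 2 3
d1$c        # 11 22 33 (computed from the original 'b')
```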
The modified value of _data
.
This is a convenience function intended for use interactively. For
programming it is better to use the standard subsetting and arithmetic functions,
and in particular the non-standard evaluation of
argument transform
can have unanticipated consequences.
If some of the values are not vectors of the appropriate length, you deserve whatever you get!
Peter Dalgaard
within
for a more flexible approach,
subset
,
list
,
data.frame
transform(airquality, Ozone = -Ozone)
transform(airquality, new = -Ozone, Temp = (Temp-32)/1.8)

attach(airquality)
transform(Ozone, logOzone = log(Ozone)) # marginally interesting ...
detach(airquality)
These functions give the obvious trigonometric functions. They respectively compute the cosine, sine, tangent, arc-cosine, arc-sine, arc-tangent, and the two-argument arc-tangent.
cospi(x)
, sinpi(x)
, and tanpi(x)
, compute
cos(pi*x)
, sin(pi*x)
, and tan(pi*x)
.
cos(x)
sin(x)
tan(x)

acos(x)
asin(x)
atan(x)
atan2(y, x)

cospi(x)
sinpi(x)
tanpi(x)
x , y
|
numeric or complex vectors. |
The arc-tangent of two arguments atan2(y, x)
returns the angle
between the x-axis and the vector from the origin to (x, y),
i.e., for positive arguments
atan2(y, x) == atan(y/x)
.
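A short sketch of why the restriction to positive arguments matters: outside the first quadrant, `atan(y/x)` cannot see the individual signs of `x` and `y`, while `atan2` can. The variable names are illustrative only.

```r
# First quadrant: the two forms agree
same <- all.equal(atan2(1, 2), atan(1 / 2))

# Second quadrant (x < 0, y > 0): atan(y/x) only sees the ratio -1
q2 <- atan2(1, -1)   # angle of the vector to (-1, 1)
a2 <- atan(1 / -1)   # angle for ratio -1, sign information lost

c(atan2 = q2, atan = a2)
```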
Angles are in radians, not degrees, for the standard versions (i.e., a
right angle is π/2), and in ‘half-rotations’ for
cospi
etc.
cospi(x)
, sinpi(x)
, and tanpi(x)
are accurate
for x
values which are multiples of a half.
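The difference is easiest to see at the zeros. The sketch below assumes ordinary IEEE double arithmetic, where `pi` is only an approximation and `cos(pi/2)` therefore comes out as a tiny non-zero number.

```r
exact_cos  <- cospi(0.5)     # zero of the cosine, computed exactly
approx_cos <- cos(pi * 0.5)  # close to, but not exactly, zero

exact_sin  <- sinpi(1)
approx_sin <- sin(pi)

c(exact_cos, approx_cos, exact_sin, approx_sin)
```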
All except atan2
are internal generic primitive
functions: methods can be defined for them individually or via the
Math
group generic.
These are all wrappers to system calls of the same name (with prefix
c
for complex arguments) where available. (cospi
,
sinpi
, and tanpi
are part of a C11 extension
and provided by e.g. macOS and Solaris: where not yet
available, calls to cos
etc. are used, with special cases
for multiples of a half.)
tanpi(0.5)
is NaN
. Similarly for other inputs
with fractional part 0.5
.
For the inverse trigonometric functions, branch cuts are defined as in Abramowitz and Stegun, figure 4.4, page 79.
For asin
and acos
, there are two cuts, both along
the real axis: (-∞, -1] and [1, ∞).
For atan
there are two cuts, both along the pure imaginary
axis: (-∞i, -1i] and [1i, ∞i).
The behaviour actually on the cuts follows the C99 standard which requires continuity coming round the endpoint in a counter-clockwise direction.
Complex arguments for cospi
, sinpi
, and tanpi
are not yet implemented, and they are a ‘future direction’ of
ISO/IEC TS 18661-4.
All except atan2
are S4 generic functions: methods can be defined
for them individually or via the
Math
group generic.
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
Abramowitz, M. and Stegun, I. A. (1972). Handbook of
Mathematical Functions. New York: Dover.
Chapter 4. Elementary Transcendental Functions: Logarithmic,
Exponential, Circular and Hyperbolic Functions
For cospi
, sinpi
, and tanpi
the C11 extension
ISO/IEC TS 18661-4:2015 (draft at
https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1950.pdf).
x <- seq(-3, 7, by = 1/8)
tx <- cbind(x, cos(pi*x), cospi(x), sin(pi*x), sinpi(x),
               tan(pi*x), tanpi(x), deparse.level=2)
op <- options(digits = 4, width = 90) # for nice formatting
head(tx)
tx[ (x %% 1) %in% c(0, 0.5) ,]
options(op)
Remove leading and/or trailing whitespace from character strings.
trimws(x, which = c("both", "left", "right"), whitespace = "[ \t\r\n]")
x |
a character vector. |
which |
a character string specifying whether to remove both
leading and trailing whitespace (default), or only leading
( |
whitespace |
a string specifying a regular expression to match (one character of) “white space”, see Details for alternatives to the default. |
Internally, sub(re, "", *, perl = TRUE)
, i.e., PCRE
library regular expressions are used.
For portability, the default ‘whitespace’ is the character class
[ \t\r\n]
(space, horizontal tab, carriage return,
newline). Alternatively, [\h\v]
is a good (PCRE)
generalization to match all Unicode horizontal and vertical white
space characters, see also https://www.pcre.org.
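To make the role of `whitespace` concrete, here is a rough sketch of equivalent `sub()` calls. The anchored patterns built with `paste0()` are an assumption about the general shape of what is used internally, not a statement of the actual implementation.

```r
x  <- "  \tSome text.\n "
ws <- "[ \t\r\n]"   # the documented default for 'whitespace'

# Assumed pattern shape: the character class, repeated, anchored at one end
left  <- sub(paste0("^", ws, "+"), "", x,    perl = TRUE)
right <- sub(paste0(ws, "+$"),     "", x,    perl = TRUE)
both  <- sub(paste0(ws, "+$"),     "", left, perl = TRUE)

identical(left,  trimws(x, "left"))
identical(right, trimws(x, "right"))
identical(both,  trimws(x, "both"))
```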
x <- " Some text. "
x
trimws(x)
trimws(x, "l")
trimws(x, "r")

## Unicode --> need "stronger" 'whitespace' to match all :
tt <- "text with unicode 'non breakable space'."
xu <- paste(" \t\v", tt, "\u00a0 \n\r")
(tu <- trimws(xu, whitespace = "[\\h\\v]"))
stopifnot(identical(tu, tt))
try
is a wrapper to run an expression that might fail and allow
the user's code to handle error-recovery.
try(expr, silent = FALSE, outFile = getOption("try.outFile", default = stderr()))
expr |
an R expression to try. |
silent |
logical: should the report of error messages be suppressed? |
outFile |
a connection, or a character string naming the
file to print to (via |
try
evaluates an expression and traps any errors that occur
during the evaluation. If an error occurs then the error
message is printed to the stderr
connection unless
options("show.error.messages")
is false or
the call includes silent = TRUE
. The error message is also
stored in a buffer where it can be retrieved by
geterrmessage
. (This should not be needed as the value returned
in case of an error contains the error message.)
try
is implemented using tryCatch
; for
programming, instead of try(expr, silent = TRUE)
, something like
tryCatch(expr, error = function(e) e)
(or other simple
error handler functions) may be more efficient and flexible.
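A minimal sketch of the `tryCatch` alternative mentioned above. The helper name `safe_log` is hypothetical and only serves to show a handler that substitutes a fallback value.

```r
# The handler returns the condition object itself, not a "try-error" string
res <- tryCatch(log("a"), error = function(e) e)

inherits(res, "error")
conditionMessage(res)

# A handler can equally return a fallback value (hypothetical helper)
safe_log <- function(x) tryCatch(log(x), error = function(e) NA_real_)
safe_log("a")
safe_log(exp(1))
```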
It may be useful to set the default for outFile
to
stdout()
, i.e.,
options(try.outFile = stdout())
instead of the default stderr()
,
notably when try()
is used inside a Sweave
code
chunk and the error message should appear in the resulting document.
The value of the expression if expr
is evaluated without error:
otherwise an invisible object inheriting from class "try-error"
containing the error message with the error condition as the
"condition"
attribute.
Do not test
if (class(res) == "try-error")
as if there is no error, the result might (now or in future) have a
class of length > 1. Use if(inherits(res, "try-error"))
instead.
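For instance, a result can be checked and the underlying condition recovered as follows (a sketch; the error message is made up for the illustration):

```r
res <- try(stop("something went wrong"), silent = TRUE)

if (inherits(res, "try-error")) {
  cond <- attr(res, "condition")   # the original error condition
  msg  <- conditionMessage(cond)
} else {
  msg <- "no error"
}
msg

# When nothing fails, the value of the expression is returned unchanged
ok <- try(sqrt(4), silent = TRUE)
inherits(ok, "try-error")
```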
options
for setting error handlers and suppressing the
printing of error messages;
geterrmessage
for retrieving the last error message.
The underlying tryCatch
provides more flexible means of
catching and handling errors.
assertCondition
in package tools is related and
useful for testing.
## this example will not work correctly in example(try), but
## it does work correctly if pasted in
options(show.error.messages = FALSE)
try(log("a"))
print(.Last.value)
options(show.error.messages = TRUE)

## alternatively,
print(try(log("a"), TRUE))

## run a simulation, keep only the results that worked.
set.seed(123)
x <- stats::rnorm(50)
doit <- function(x)
{
    x <- sample(x, replace = TRUE)
    if(length(unique(x)) > 30) mean(x)
    else stop("too few unique points")
}
## alternative 1
res <- lapply(1:100, function(i) try(doit(x), TRUE))
## alternative 2
## Not run: res <- vector("list", 100)
for(i in 1:100) res[[i]] <- try(doit(x), TRUE)
## End(Not run)
unlist(res[sapply(res, function(x) !inherits(x, "try-error"))])
typeof
determines the (R internal)
type or storage mode of any object.
typeof(x)
x |
any R object. |
A character string. The possible values are listed in the structure
TypeTable
in ‘src/main/util.c’. Current values are
the vector types "logical"
, "integer"
, "double"
,
"complex"
, "character"
, "raw"
and "list"
,
"NULL"
,
"closure"
(function), "special"
and "builtin"
(basic functions and operators), "environment"
, "S4"
(some S4 objects) and others that are unlikely to be seen at user
level ("symbol"
, "pairlist"
, "promise"
,
"object"
,
"language"
, "char"
, "..."
, "any"
,
"expression"
, "externalptr"
, "bytecode"
and
"weakref"
).
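A few of these values can be observed directly. The list `objs` below is just an illustrative collection of objects.

```r
# Illustrative objects covering several of the types listed above
objs <- list(int  = 1L,
             dbl  = 1,
             chr  = "a",
             lst  = list(),
             fun  = mean,          # an ordinary R function
             prim = sum,           # a primitive
             sym  = quote(x),
             call = quote(x + 1),
             env  = globalenv())

types <- vapply(objs, typeof, character(1))
types

typeof(NULL)
typeof(`if`)
```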
isS4
to determine if an object has an S4 class.
typeof(2)
mode(2)
## for a table of examples, see ?mode / example(mode)
unique
returns a vector, data frame or array like x
but with duplicate elements/rows removed.
unique(x, incomparables = FALSE, ...)

## Default S3 method:
unique(x, incomparables = FALSE, fromLast = FALSE,
       nmax = NA, ...)

## S3 method for class 'matrix'
unique(x, incomparables = FALSE, MARGIN = 1,
       fromLast = FALSE, ...)

## S3 method for class 'array'
unique(x, incomparables = FALSE, MARGIN = 1,
       fromLast = FALSE, ...)
x |
a vector or a data frame or an array or |
incomparables |
a vector of values that cannot be compared.
|
fromLast |
logical indicating if duplication should be considered
from the last, i.e., the last (or rightmost) of identical elements will
be kept. This only matters for |
nmax |
the maximum number of unique items expected (greater than one).
See |
... |
arguments for particular methods. |
MARGIN |
the array margin to be held fixed: a single integer. |
This is a generic function with methods for vectors, data frames and arrays (including matrices).
The array method calculates for each element of the dimension
specified by MARGIN
if the remaining dimensions are identical
to those for an earlier element (in row-major order). This would most
commonly be used for matrices to find unique rows (the default) or columns
(with MARGIN = 2
).
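A small sketch of the two common cases for a matrix. The matrix `m` is made up so that row 2 repeats row 1 and column 3 repeats column 1.

```r
m <- rbind(c(1, 2, 1),
           c(1, 2, 1),
           c(3, 4, 3))

rows <- unique(m)               # MARGIN = 1: the repeated row is dropped
cols <- unique(m, MARGIN = 2)   # the repeated column is dropped

dim(rows)
dim(cols)
```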
Note that unlike the Unix command uniq
this omits
duplicated and not just repeated elements/rows. That
is, an element is omitted if it is equal to any previous element and
not just if it is equal to the immediately previous one. (For the
latter, see rle
).
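The distinction can be seen on a vector in which a value recurs after an interruption (an illustrative input):

```r
x <- c("a", "a", "b", "a", "a")

u <- unique(x)       # every later "a" is a duplicate of the first
r <- rle(x)$values   # only adjacent repeats are collapsed, as with 'uniq'

u
r
```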
Missing values ("NA"
) are regarded as equal, numeric and
complex ones differing from NaN
; character strings will be compared in a
“common encoding”; for details, see match
(and
duplicated
) which use the same concept.
Values in incomparables
will never be marked as duplicated.
This is intended to be used for a fairly small set of values and will
not be efficient for a very large set.
When used on a data frame with more than one column, or an array or matrix when comparing dimensions of length greater than one, this tests for identity of character representations. This will catch people who unwisely rely on exact equality of floating-point numbers!
For a vector, an object of the same type as x
, but with only
one copy of each duplicated element. No attributes are copied (so
the result has no names).
For a data frame, a data frame is returned with the same columns but possibly fewer rows (and with row names from the first occurrences of the unique rows).
A matrix or array is subsetted by [, drop = FALSE]
, so
dimensions and dimnames are copied appropriately, and the result
always has the same number of dimensions as x
.
Using this for lists is potentially slow, especially if the elements
are not atomic vectors (see vector
) or differ only
in their attributes. In the worst case it is O(n^2).
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
duplicated
which gives the indices of duplicated
elements.
rle
which is the equivalent of the Unix uniq -c
command.
x <- c(3:5, 11:8, 8 + 0:5)
(ux <- unique(x))
(u2 <- unique(x, fromLast = TRUE)) # different order
stopifnot(identical(sort(ux), sort(u2)))

length(unique(sample(100, 100, replace = TRUE)))
## approximately 100(1 - 1/e) = 63.21

unique(iris)
unlink
deletes the file(s) or directories specified by x
.
unlink(x, recursive = FALSE, force = FALSE, expand = TRUE)
x |
a character vector with the names of the file(s) or directories to be deleted. |
recursive |
logical. Should directories be deleted recursively? |
force |
logical. Should permissions be changed (if possible) to allow the file or directory to be removed? |
expand |
logical. Should wildcards (see ‘Details’ below) and
tilde (see |
If recursive = FALSE
directories are not deleted,
not even empty ones.
On most platforms ‘file’ includes symbolic links, fifos and
sockets. unlink(x, recursive = TRUE)
deletes just the symbolic link if the target of such a link is a directory.
Wildcard expansion (normally ‘*’ and ‘?’ are allowed) is done by
the internal code of Sys.glob
. Wildcards never match a
leading ‘.’ in the filename, and files ‘.’, ‘..’ and
‘~’ will never be considered for deletion.
Wildcards will only be expanded if the system supports it. Most
systems will support not only ‘*’ and ‘?’ but also character
classes such as ‘[a-z]’ (see the man
pages for the system
call glob
on your OS). The metacharacters * ? [
can
occur in Unix filenames, and this makes it difficult to use
unlink
to delete such files (see file.remove
),
although escaping the metacharacters by backslashes usually works. If
a metacharacter matches nothing it is considered as a literal
character.
recursive = TRUE
might not be supported on all platforms, in which case it
will be ignored, with a warning: however there are no known current
examples.
0
for success, 1
for failure, invisibly.
Not deleting a non-existent file is not a failure, nor is being unable
to delete a directory if recursive = FALSE
. However, missing
values in x
are regarded as failures.
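The following sketch exercises these rules on a scratch directory under `tempdir()`. The directory and file names are arbitrary.

```r
d <- tempfile("unlink-demo")          # hypothetical scratch directory
dir.create(d)
file.create(file.path(d, "a.txt"))

unlink(d)                             # recursive = FALSE: directory is kept
still_there <- dir.exists(d)

status <- unlink(d, recursive = TRUE) # now the directory and its file go
gone <- !dir.exists(d)

# A non-existent file is not counted as a failure
missing_status <- unlink(file.path(tempdir(), "no-such-file"))

c(still_there = still_there, gone = gone)
c(status = status, missing_status = missing_status)
```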
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
Given a list structure x
, unlist
simplifies it to
produce a vector which contains all the atomic components
which occur in x
.
unlist(x, recursive = TRUE, use.names = TRUE)
x |
an R object, typically a list or vector. |
recursive |
logical. Should unlisting be applied to list
components of |
use.names |
logical. Should names be preserved? |
unlist
is generic: you can write methods to handle
specific classes of objects, see InternalMethods,
and note, e.g., relist
with the unlist
method
for relistable
objects.
If recursive = FALSE
, the function will not recurse beyond the
first level items in x
.
Factors are treated specially. If all non-list elements of x
are factor
(or ordered factor) objects then the result
will be a factor with
levels the union of the level sets of the elements, in the order the
levels occur in the level sets of the elements (which means that if
all the elements have the same level set, that is the level set of the
result).
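As an illustration of the factor rule, consider two factors with different, explicitly ordered level sets (made-up data):

```r
f1 <- factor(c("x", "y"), levels = c("x", "y"))
f2 <- factor(c("z", "x"), levels = c("z", "x"))

res <- unlist(list(f1, f2))

is.factor(res)
levels(res)          # union of the level sets, in order of first occurrence
as.character(res)
```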
x
can be an atomic vector, but then unlist
does nothing useful,
not even drop names.
By default, unlist
tries to retain the naming
information present in x
. If use.names = FALSE
all
naming information is dropped.
Where possible the list elements are coerced to a common mode during the unlisting, and so the result often ends up as a character vector. Vectors will be coerced to the highest type of the components in the hierarchy NULL < raw < logical < integer < double < complex < character < list < expression: pairlists are treated as lists.
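The hierarchy can be checked on a few two-element lists. This is only a sketch of some steps of the chain, not an exhaustive test.

```r
t_int <- typeof(unlist(list(TRUE, 2L)))      # logical < integer
t_dbl <- typeof(unlist(list(2L, 3.5)))       # integer < double
t_cpl <- typeof(unlist(list(3.5, 1i)))       # double < complex
t_chr <- typeof(unlist(list(3.5, "a")))      # complex < character
t_lst <- typeof(unlist(list(1, quote(x))))   # a symbol is not coerced

c(t_int, t_dbl, t_cpl, t_chr, t_lst)
```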
A list is a (generic) vector, and the simplified vector might still be
a list (and might be unchanged). Non-vector elements of the list
(for example language elements such as names, formulas and calls)
are not coerced, and so a list containing one or more of these remains a
list. (The effect of unlisting an lm
fit is a list which
has individual residuals as components.)
Note that unlist(x)
now returns x
unchanged also for
non-vector x
, instead of signalling an error in that case.
NULL
or an expression or a vector of an appropriate mode to
hold the list components.
The output type is determined from the highest type of the components in the hierarchy NULL < raw < logical < integer < double < complex < character < list < expression, after coercion of pairlists to lists.
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
unlist(options())
unlist(options(), use.names = FALSE)

l.ex <- list(a = list(1:5, LETTERS[1:5]), b = "Z", c = NA)
unlist(l.ex, recursive = FALSE)
unlist(l.ex, recursive = TRUE)

l1 <- list(a = "a", b = 2, c = pi+2i)
unlist(l1) # a character vector
l2 <- list(a = "a", b = as.name("b"), c = pi+2i)
unlist(l2) # remains a list

ll <- list(as.name("sinc"), quote( a + b ), 1:10, letters, expression(1+x))
utils::str(ll)
for(x in ll)
  stopifnot(identical(x, unlist(x)))
Remove the names
or dimnames
attribute of
an R object.
unname(obj, force = FALSE)
obj |
an R object. |
force |
logical; if true, the |
Object as obj
but without names
or
dimnames
.
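A minimal sketch on a named vector and on a matrix with dimnames (both objects are made up for the illustration):

```r
v <- c(a = 1, b = 2)
uv <- unname(v)
names(uv)            # NULL: the names attribute is gone

m <- matrix(1:4, 2, dimnames = list(c("r1", "r2"), c("c1", "c2")))
um <- unname(m)
dimnames(um)         # NULL, while dim(um) is unchanged
```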
require(graphics); require(stats)

## Answering a question on R-help (14 Oct 1999):
col3 <- 750+ 100*rt(1500, df = 3)
breaks <- factor(cut(col3, breaks = 360+5*(0:155)))
z <- table(breaks)
z[1:5] # The names are larger than the data ...
barplot(unname(z), axes = FALSE)
Use packages in R scripts by loading their namespace and attaching a package environment including (a subset of) their exports to the search path.
use(package, include.only)
package |
a character string giving the name of a package. |
include.only |
character vector of names of objects to include in the attached environment frame. If missing, all exports are included. |
This is a simple wrapper around library
which always
uses attach.required = FALSE
, so that packages listed in the
Depends
clause of the DESCRIPTION
file of the package to
be used never get attached automatically to the search path.
This therefore allows writing R scripts with full control over what
gets found on the search path. In addition, such scripts can easily
be integrated as package code, replacing the calls to use
by
the corresponding importFrom
directives in ‘NAMESPACE’
files.
(invisibly) a logical indicating whether the package to be used is available.
This functionality is still experimental: interfaces may change in future versions.
R possesses a simple generic function mechanism which can be used for
an object-oriented style of programming. Method dispatch takes place
based on the class(es) of the first argument to the generic function or of
the object supplied as an argument to UseMethod
or NextMethod
.
UseMethod(generic, object)

NextMethod(generic = NULL, object = NULL, ...)
generic |
a character string naming a function (and not a
built-in operator). Required for |
object |
for |
... |
further arguments to be passed to the next method. |
An R object is a data object which has a class
attribute (and this can be tested by is.object
).
A class attribute is a character vector giving the names of
the classes from which the object inherits.
If the object does not have a class attribute, it has an
implicit class. Matrices and arrays have class "matrix"
or "array"
followed by the class of the underlying vector.
Most vectors have class the result of mode(x)
, except
that integer vectors have class c("integer", "numeric")
and
real vectors have class c("double", "numeric")
.
Function .class2(x)
(since R 4.0.x) returns the full
implicit (or explicit) class vector of x
.
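The implicit class vectors described above can be inspected as follows. This sketch assumes R 4.0.0 or later, where `.class2` is available and matrices also carry the class "array".

```r
ci <- .class2(1L)                  # integer vector
cd <- .class2(1.5)                 # real (double) vector
cm <- .class2(matrix(1:4, 2))      # matrix of integers

ci
cd
cm
```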
When a function calling UseMethod("fun")
is applied to an
object with class vector c("first", "second")
, the system
searches for a function called fun.first
and, if it finds it,
applies it to the object. If no such function is found a function
called fun.second
is tried. If no class name produces a
suitable function, the function fun.default
is used, if it
exists, or an error results.
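A sketch of this search order, using a hypothetical generic `describe` and the class names from the text. No method exists for "first", so dispatch falls through to "second"; an unclassed integer is dispatched on its implicit class.

```r
# Hypothetical generic and methods, for illustration only
describe <- function(x, ...) UseMethod("describe")
describe.second  <- function(x, ...) "method for 'second'"
describe.numeric <- function(x, ...) "method for 'numeric'"
describe.default <- function(x, ...) "default method"

obj <- structure(list(), class = c("first", "second"))

r_obj <- describe(obj)   # no describe.first, so describe.second is used
r_int <- describe(1L)    # implicit class: "integer", then "numeric"
r_chr <- describe("a")   # nothing matches, so the default is used

c(r_obj, r_int, r_chr)
```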
Function methods
can be used to find out about the
methods for a particular generic function or class.
UseMethod
is a primitive function but uses standard argument
matching. It is not the only means of dispatch of methods, for there
are internal generic and group generic functions.
UseMethod
currently dispatches on the implicit class even for
arguments that are not objects, but the other means of dispatch do
not.
NextMethod
invokes the next method (determined by the
class vector, either of the object supplied to the generic, or of
the first argument to the function containing NextMethod
if a
method was invoked directly). Normally NextMethod
is used with
only one argument, generic
, but if further arguments are
supplied these modify the call to the next method.
NextMethod
should not be called except in methods called by
UseMethod
or from internal generics (see
InternalGenerics). In particular it will not work inside
anonymous calling functions (e.g., get("print.ts")(AirPassengers)
).
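The chaining performed by `NextMethod` can be sketched with a hypothetical generic `trace_classes`, whose methods each record their own name before delegating to the next method in the class vector.

```r
# Hypothetical generic and methods, for illustration only
trace_classes <- function(x, ...) UseMethod("trace_classes")
trace_classes.child   <- function(x, ...) paste("child ->", NextMethod())
trace_classes.parent  <- function(x, ...) paste("parent ->", NextMethod())
trace_classes.default <- function(x, ...) "default"

obj <- structure(list(), class = c("child", "parent"))

res <- trace_classes(obj)   # child, then parent, then the default method
res
```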
Namespaces can register methods for generic functions. To support
this, UseMethod
and NextMethod
search for methods in
two places: in the environment in which the generic function
is called, and in the registration data base for the
environment in which the generic is defined (typically a namespace).
So methods for a generic function need to be available in the
environment of the call to the generic, or they must be registered.
(It does not matter whether they are visible in the environment in
which the generic is defined.) As from R 3.5.0, the registration
data base is searched after the top level environment (see
topenv
) of the calling environment (but before the
parents of the top level environment).
Now for some obscure details that need to appear somewhere. These
comments will be slightly different from those in Chambers (1992).
(See also the draft ‘R Language Definition’.)
UseMethod
creates a new function call with
arguments matched as they came in to the generic. [Previously local
variables defined before the call to UseMethod
were retained;
as of R 4.4.0 this is no longer the case.] Any
statements after the call to UseMethod
will not be evaluated as
UseMethod
does not return. UseMethod
can be called with
more than two arguments: a warning will be given and additional
arguments ignored. (They are not completely ignored in S.) If it is
called with just one argument, the class of the first argument of the
enclosing function is used as object
: unlike S this is the first
actual argument passed and not the current value of the object of that
name.
NextMethod
works by creating a special call frame for the next
method. If no new arguments are supplied, the arguments will be the
same in number, order and name as those to the current method but
their values will be promises to evaluate their name in the current
method and environment. Any named arguments matched to ...
are handled specially: they either replace existing arguments of the
same name or are appended to the argument list. They are passed on as
the promise that was supplied as an argument to the current
environment. (S does this differently!) If they have been evaluated
in the current (or a previous) environment they remain evaluated.
(This is a complex area, and subject to change: see the draft
‘R Language Definition’.)
The search for methods for NextMethod
is slightly different
from that for UseMethod
. Finding no fun.default
is not
necessarily an error, as the search continues to the generic
itself. This is to pick up an internal generic like [
which has no separate default method, and succeeds only if the generic
is a primitive function or a wrapper for a
.Internal
function of the same name. (When a primitive
is called as the default method, argument matching may not work as
described above due to the different semantics of primitives.)
You will see objects such as .Generic
, .Method
, and
.Class
used in methods. These are set in the environment
within which the method is evaluated by the dispatch mechanism, which
is as follows:
Find the context for the calling function (the generic): this gives us the unevaluated arguments for the original call.
Evaluate the object (usually an argument) to be used for dispatch, and find a method (possibly the default method) or throw an error.
Create an environment for evaluating the method and insert special variables (see below) into that environment. (Prior to R 4.4.0, any variables in the environment of the generic that were not formal or actual arguments were also copied.)
Fix up the argument list to be the arguments of the call matched to the formals of the method.
.Generic
is a length-one character vector naming the generic function.
.Method
is a character vector (normally of length one) naming
the method function. (For functions in the group generic
Ops
it is of length two.)
.Class
is a character vector of classes used to find the next
method. NextMethod
adds an attribute "previous"
to
.Class
giving the .Class
last used for dispatch, and
shifts .Class
along to that used for dispatch.
.GenericCallEnv
and .GenericDefEnv
are the environments
of the call to the generic and of the definition of the generic, respectively. (The
latter is used to find methods registered for the generic.)
Note that .Class
is set when the generic is called, and is
unchanged if the class of the dispatching argument is changed in a
method. It is possible to change the method that NextMethod
would dispatch by manipulating .Class
, but ‘this is not
recommended unless you understand the inheritance mechanism
thoroughly’ (Chambers & Hastie, 1992, p. 469).
This scheme is called S3 (S version 3). For new projects, it is recommended to use the more flexible and robust S4 scheme provided in the methods package.
Chambers, J. M. (1992) Classes and methods: object-oriented programming in S. Appendix A of Statistical Models in S eds J. M. Chambers and T. J. Hastie, Wadsworth & Brooks/Cole.
The draft ‘R Language Definition’.
methods
, class
incl .class2()
;
getS3method
, is.object
.
These functions allow users to set actions to be taken before packages are attached/detached and namespaces are (un)loaded.
getHook(hookName)
setHook(hookName, value,
        action = c("append", "prepend", "replace"))

packageEvent(pkgname,
             event = c("onLoad", "attach", "detach", "onUnload"))
hookName |
character string: the hook name. |
pkgname |
character string: the package/namespace name. |
event |
character string: an event for the package. Can be abbreviated. |
value |
a function or a list of functions, or for |
action |
the action to be taken. Can be abbreviated. |
setHook
provides a general mechanism for users to register
hooks, a list of functions to be called from system (or user)
functions. The initial set of hooks was associated with events on
packages/namespaces: these hooks are named via calls to
packageEvent
.
To remove a hook completely, call setHook(hookName, NULL, "replace")
.
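A minimal sketch of registering, running and removing hooks. The hook name `"demo.hook"` is hypothetical, and the loop merely imitates what a function providing the hook might do.

```r
hook <- "demo.hook"   # hypothetical hook name

setHook(hook, function(...) message("first"))
setHook(hook, function(...) message("second"))   # default action: append
n_before <- length(getHook(hook))

# Call the registered functions in the order shown by getHook()
for (f in getHook(hook)) f()

setHook(hook, NULL, "replace")                   # remove the hook completely
n_after <- length(getHook(hook))

# Package events are named via packageEvent()
ev <- packageEvent("stats", "onLoad")
ev
```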
When an R package is attached by library
or loaded by
other means, it can call initialization code. See
.onLoad
for a description of the package hook functions
called during initialization. Users can add their own initialization
code via the hooks provided by setHook()
, functions which will
be called as funname(pkgname, pkgpath)
inside a
try
call.
The sequence of events depends on which hooks are defined, and whether a package is attached or just loaded. In the case where all hooks are defined and a package is attached, the order of initialization events is as follows:
The package namespace is loaded.
The package's .onLoad
function is run.
If S4 methods dispatch is on, any actions set by
setLoadAction
are run.
The namespace is sealed.
The user's "onLoad"
hook is run.
The package is added to the search path.
The package's .onAttach
function is run.
The package environment is sealed.
The user's "attach"
hook is run.
A similar sequence (but in reverse) is run when a package is detached and its namespace unloaded:
The user's "detach"
hook is run.
The package's .Last.lib
function is run.
The package is removed from the search path.
The user's "onUnload"
hook is run.
The package's .onUnload
function is run.
The package namespace is unloaded.
Note that when an R session is finished, packages are not detached and namespaces are not unloaded, so the corresponding hooks will not be run.
Also note that some of the user hooks are run without the package being on the search path, so in those hooks objects in the package need to be referred to using the double (or triple) colon operator, as in the example.
If multiple hooks are added, they are normally run in the order shown
by getHook
, but the "detach"
and "onUnload"
hooks
are run in reverse order so the default for package events is to add
hooks ‘inside’ existing ones.
The hooks are stored in the environment .userHooksEnv
in the
base package, with ‘mangled’ names.
For getHook
, a list of functions (possibly empty).
For setHook
, no return value.
For packageEvent
, the derived hook name (a character string).
Hooks need to be set before the event they modify: for standard packages this can be problematic as methods is loaded and attached early in the startup sequence. The usual place to set hooks such as the example below is in the ‘.Rprofile’ file, but that will not work for methods.
library
, detach
, loadNamespace
.
See ::
for a discussion of the double and triple colon operators.
Other hooks may be added later: functions plot.new
and
persp
already have them.
setHook(packageEvent("grDevices", "onLoad"),
        function(...) grDevices::ps.options(horizontal = FALSE))
Conversion of UTF-8 encoded character vectors to and from integer vectors representing a UTF-32 encoding.
utf8ToInt(x)
intToUtf8(x, multiple = FALSE, allow_surrogate_pairs = FALSE)
x |
object to be converted. |
multiple |
logical: should the conversion be to a single character string or multiple individual characters? |
allow_surrogate_pairs |
logical: should interpretation of
surrogate pairs be attempted? (See ‘Details’.)
Only supported for |
These will work in any locale, including on platforms that do not otherwise support multi-byte character sets.
Unicode defines a name and a number for all of the glyphs it
encompasses: the numbers are called code points: since RFC3629
they run from 0
to 0x10FFFF
(with about 5% being
assigned by version 13.0 of the Unicode standard and 7% reserved for
‘private use’).
intToUtf8 does not by default handle surrogate pairs: inputs in the surrogate ranges are mapped to NA. They might occur if a UTF-16 byte stream has been read as 2-byte integers (in the correct byte order), in which case allow_surrogate_pairs = TRUE will try to interpret them (with unmatched surrogate values still treated as NA).
utf8ToInt converts a length-one character string encoded in UTF-8 to an integer vector of Unicode code points.
intToUtf8 converts a numeric vector of Unicode code points either (default) to a single character string or a character vector of single characters. Non-integral numeric values are truncated to integers. For output to a single character string 0 is silently omitted: otherwise 0 is mapped to "". The Encoding of a non-NA return value is declared as "UTF-8".
Invalid and NA inputs are mapped to NA output.
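The conversion rules above can be checked with a few short calls. This is only an illustrative sketch; the variable names are arbitrary and the inputs are kept to ASCII so that the results display in any locale.

```r
cp      <- utf8ToInt("A")                            # one code point per character
joined  <- intToUtf8(c(72, 105))                     # a single string
split   <- intToUtf8(c(72, 105), multiple = TRUE)    # one element per code point
no_zero <- intToUtf8(c(72, 0, 105))                  # 0 is dropped from a single string
zero_el <- intToUtf8(c(72, 0, 105), multiple = TRUE) # 0 becomes ""
trunc_a <- intToUtf8(65.9)                           # 65.9 is truncated to 65
lone    <- intToUtf8(0xD800)                         # lone surrogate: NA by default
back    <- utf8ToInt(intToUtf8(0x10437))             # round trip outside the BMP
```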
Which code points are regarded as valid has changed over the lifetime of UTF-8. Originally all 32-bit unsigned integers were potentially valid and could be converted to up to 6 bytes in UTF-8. Since 2003 it has been stated that there will never be valid code points larger than 0x10FFFF, and so valid UTF-8 encodings are never more than 4 bytes.
The code points in the surrogate-pair range 0xD800 to 0xDFFF are prohibited in UTF-8 and so are regarded as invalid by utf8ToInt and by default by intToUtf8.
The position of ‘noncharacters’ (notably 0xFFFE and 0xFFFF) was clarified by ‘Corrigendum 9’ in 2013. These are valid but will never be given an official interpretation. (In some earlier versions of R utf8ToInt treated them as invalid.)
https://www.rfc-editor.org/rfc/rfc3629, the current standard for UTF-8.
https://www.unicode.org/versions/corrigendum9.html for non-characters.
## will only display in some locales and fonts
intToUtf8(0x03B2L) # Greek beta
utf8ToInt("bi\u00dfchen")
utf8ToInt("\xfa\xb4\xbf\xbf\x9f")

## A valid UTF-16 surrogate pair (for U+10437)
x <- c(0xD801, 0xDC37)
intToUtf8(x)
intToUtf8(x, TRUE)
(xx <- intToUtf8(x, , TRUE)) # will only display in some locales and fonts
charToRaw(xx)

## An example of how surrogate pairs might occur
x <- "\U10437"
charToRaw(x)
foo <- tempfile()
writeLines(x, file(foo, encoding = "UTF-16LE"))
## next two are OS-specific, but are mandated by POSIX
system(paste("od -x", foo)) # 2-byte units, correct on little-endian platforms
system(paste("od -t x1", foo)) # single bytes as hex
y <- readBin(foo, "integer", 2, 2, FALSE, endian = "little")
sprintf("%X", y)
intToUtf8(y, , TRUE)
Most modern file systems store file-path components (names of directories and files) in a character encoding of wide scope: usually UTF-8 on a Unix-alike and UCS-2/UTF-16 on Windows. However, this was not true when R was first developed and there are still exceptions amongst file systems, e.g. FAT32.
This was not something anticipated by the C and POSIX standards which only provide means to access files via file paths encoded in the current locale, for example those specified in Latin-1 in a Latin-1 locale.
Everything here apart from the specific section on Windows is about Unix-alikes.
It is possible to mark character strings (elements of character vectors) as being in UTF-8 or Latin-1 (see Encoding). This allows file paths not in the native encoding to be expressed in R character vectors but there is almost no way to use them unless they can be translated to the native encoding. That is of course not a problem if that is UTF-8, so these details are really only relevant to the use of a non-UTF-8 locale (including a C locale) on a Unix-alike.
Functions to open a file such as file, fifo, pipe, gzfile, bzfile, xzfile and unz give an error for non-native filepaths. Where functions look at existence such as file.exists, dir.exists, unlink, file.info and list.files, non-native filepaths are treated as non-existent.
Many other functions use file or gzfile to open their files.
file.path allows non-native file paths to be combined, marking them as UTF-8 if needed.
path.expand only handles paths in the native encoding.
Windows provides proprietary entry points to access its file systems, and these gained ‘wide’ versions in Windows NT that allowed file paths in UCS-2/UTF-16 to be accessed from any locale.
Some R functions use these entry points when file paths are marked as Latin-1 or UTF-8 to allow access to paths not in the current encoding. These include file, file.access, file.append, file.copy, file.create, file.exists, file.info, file.link, file.remove, file.rename, file.symlink and dir.create, dir.exists, normalizePath, path.expand, pipe, Sys.glob, Sys.junction, unlink but not gzfile, bzfile, xzfile nor unz.
For functions using gzfile (including load, readRDS, read.dcf and tar), it is often possible to use a gzcon connection wrapping a file connection.
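The gzcon workaround might look like the sketch below. It uses an ordinary temporary file, so it only demonstrates the mechanics; the assumption is that on Windows the same pattern would be applied to a path marked as UTF-8 or Latin-1, which file can open but gzfile cannot.

```r
path <- tempfile(fileext = ".rds")
saveRDS(list(a = 1, b = "two"), path)   # gzip-compressed by default

## Open the file with file() and let gzcon() do the decompression,
## instead of letting readRDS() open the path itself via gzfile().
con <- gzcon(file(path, "rb"))
obj <- readRDS(con)
close(con)
unlink(path)
```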
Other notable exceptions are list.files, list.dirs, system and file-path inputs for graphics devices.
Before R 4.0.0, file paths marked as being in Latin-1 or UTF-8 were silently translated to the native encoding using escapes such as ‘<e7>’ or ‘<U+00e7>’. This created valid file names but maybe not those intended.
This document is still a work-in-progress.
Check if each element of a character vector is valid in its implied encoding.
validUTF8(x)
validEnc(x)
x |
a character vector. |
These use similar checks to those used by functions such as grep.
validUTF8 ignores any marked encoding (see Encoding) and so checks directly whether the bytes in each string are valid UTF-8. (For the validity of ‘noncharacters’ see the help for intToUtf8.)
validEnc regards character strings as validly encoded unless their encodings are marked as UTF-8 or they are unmarked and the R session is in a UTF-8 or other multi-byte locale. (The checks in other multi-byte locales depend on the OS and as with iconv not all invalid inputs may be detected.)
A logical vector of the same length as x. NA elements are regarded as validly encoded.
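A small sketch of validUTF8 on mixed input follows. It is limited to validUTF8 because the result of validEnc depends on the locale; the strings are illustrative only.

```r
s <- c("plain ASCII",
       "z\xc3\xbcrich",              # valid two-byte UTF-8 sequence
       "\xfa\xb4\xbf\xbf\x9f",       # five-byte sequence: not valid UTF-8
       NA)
ok <- validUTF8(s)

## The marked encoding is ignored: only the bytes are inspected
marked <- s[2]
Encoding(marked) <- "latin1"
ok_marked <- validUTF8(marked)
```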
It would be possible to check for the validity of character strings in a Latin-1 encoding, but extensions such as CP1252 are widely accepted as ‘Latin-1’ and 8-bit encodings rarely need to be checked for validity.
x <-
  ## from example(text)
  c("Jetz", "no", "chli", "z\xc3\xbcrit\xc3\xbc\xc3\xbctsch:",
    "(noch", "ein", "bi\xc3\x9fchen", "Z\xc3\xbc", "deutsch)",
    ## from a CRAN check log
    "\xfa\xb4\xbf\xbf\x9f")
validUTF8(x)
validEnc(x) # depends on the locale
Encoding(x) <-"UTF-8"
validEnc(x) # typically the last, x[10], is invalid

## Maybe advantageous to declare it "unknown":
G <- x ; Encoding(G[!validEnc(G)]) <- "unknown"
try( substr(x, 1,1) ) # gives 'invalid multibyte string' error in a UTF-8 locale
try( substr(G, 1,1) ) # works in a UTF-8 locale
nchar(G) # fine, too
## but it is not "more valid" typically:
all.equal(validEnc(x), validEnc(G)) # typically TRUE
A vector in R is either an atomic vector, i.e., one of the atomic types (see ‘Details’), or of type (typeof) or mode list or expression.
vector produces a ‘simple’ vector of the given length and mode, where a ‘simple’ vector has no attribute, i.e., fulfills is.null(attributes(.)).
as.vector, a generic, attempts to coerce its argument into a vector of mode mode (the default is to coerce to whichever vector mode is most convenient): if the result is atomic (is.atomic), all attributes are removed. For mode="any", see ‘Details’.
is.vector(x) returns TRUE if x is a vector of the specified mode having no attributes other than names. For mode="any", see ‘Details’.
vector(mode = "logical", length = 0)
as.vector(x, mode = "any")
is.vector(x, mode = "any")
mode |
character string naming an atomic mode or
|
length |
a non-negative integer specifying the desired length. For
a long vector, i.e., |
x |
an R object. |
The atomic modes are "logical", "integer", "numeric" (synonym "double"), "complex", "character" and "raw".
If mode = "any", is.vector may return TRUE for the atomic modes, list and expression. For any mode, it will return FALSE if x has any attributes except names. (This is incompatible with S.) On the other hand, as.vector removes all attributes including names for results of atomic mode.
For mode = "any", and atomic vectors x, as.vector(x) strips all attributes (including names), returning a simple atomic vector. However, when x is of type "list" or "expression", as.vector(x) currently returns the argument x unchanged, unless there is an as.vector method for class(x).
Note that factors are not vectors; is.vector returns FALSE and as.vector converts a factor to a character vector for mode = "any".
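The attribute rules described above can be seen in a short sketch. The attribute name myattr is hypothetical and used only for illustration.

```r
x <- c(a = 1, b = 2)
named_ok  <- is.vector(x)                 # TRUE: names are allowed
stripped  <- as.vector(x)                 # names are removed

y <- structure(1:3, myattr = "extra")
attr_fail <- is.vector(y)                 # FALSE: attribute other than names

f <- factor(c("lo", "hi", "lo"))
fac_vec   <- is.vector(f)                 # FALSE: factors are not vectors
fac_chr   <- as.vector(f)                 # character vector of the labels

l <- structure(list(1, 2), myattr = "extra")
l_same    <- as.vector(l)                 # list input is returned unchanged
```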
For vector, a vector of the given length and mode. Logical vector elements are initialized to FALSE, numeric vector elements to 0, character vector elements to "", raw vector elements to nul bytes and list/expression elements to NULL.
For as.vector, a vector (atomic or of type list or expression). All attributes are removed from the result if it is of an atomic mode, but not in general for a list or expression result. The default method handles 24 input types and 12 values of type: the details of most coercions are undocumented and subject to change.
For is.vector, TRUE or FALSE. is.vector(x, mode = "numeric") can be true for vectors of types "integer" or "double" whereas is.vector(x, mode = "double") can only be true for those of type "double".
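As a quick illustration of the initial values and of the "numeric" versus "double" distinction, consider the following sketch (variable names are arbitrary):

```r
v_lgl <- vector("logical", 2)     # FALSE FALSE
v_num <- vector("numeric", 2)     # 0 0
v_chr <- vector("character", 2)   # "" ""
v_raw <- vector("raw", 1)         # a single nul byte
v_lst <- vector("list", 2)        # two NULL elements

i <- 1:3                          # type "integer"
i_numeric <- is.vector(i, mode = "numeric")   # TRUE
i_double  <- is.vector(i, mode = "double")    # FALSE
i_integer <- is.vector(i, mode = "integer")   # TRUE
```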
as.vector()
Writers of methods for as.vector need to take care to follow the conventions of the default method. In particular
Argument mode can be "any", any of the atomic modes, "list", "expression", "symbol", "pairlist" or one of the aliases "double" and "name".
The return value should be of the appropriate mode. For mode = "any" this means an atomic vector or list or expression.
Attributes should be treated appropriately: in particular when the result is an atomic vector there should be no attributes, not even names.
is.vector(as.vector(x, m), m) should be true for any mode m, including the default "any".
Currently this is not fulfilled in R when m == "any" and x is of type list or expression with attributes in addition to names, which is typically the case for (S3 or S4) objects (see is.object) that are lists internally.
as.vector and is.vector are quite distinct from the meaning of the formal class "vector" in the methods package, and hence as(x, "vector") and is(x, "vector").
Note that as.vector(x) is not necessarily a null operation if is.vector(x) is true: any names will be removed from an atomic vector.
Non-vector modes "symbol" (synonym "name") and "pairlist" are accepted but have long been undocumented: they are used to implement as.name and as.pairlist, and those functions should preferably be used directly. None of the description here applies to those modes: see the help for the preferred forms.
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
c, is.numeric, is.list, etc.
df <- data.frame(x = 1:3, y = 5:7)
## Error:
try(as.vector(data.frame(x = 1:3, y = 5:7), mode = "numeric"))

x <- c(a = 1, b = 2)
is.vector(x)
as.vector(x)
all.equal(x, as.vector(x)) ## FALSE

###-- All the following are TRUE:
is.list(df)
! is.vector(df)
! is.vector(df, mode = "list")

is.vector(list(), mode = "list")
Vectorize creates a function wrapper that vectorizes the action of its argument FUN.
Vectorize(FUN, vectorize.args = arg.names, SIMPLIFY = TRUE,
          USE.NAMES = TRUE)
FUN |
function to apply, found via |
vectorize.args |
a character vector of arguments which should be
vectorized. Defaults to all arguments of |
SIMPLIFY |
logical or character string; attempt to reduce the
result to a vector, matrix or higher dimensional array; see
the |
USE.NAMES |
logical; use names if the first ... argument has names, or if it is a character vector, use that character vector as the names. |
The arguments named in the vectorize.args argument to Vectorize are the arguments passed in the ... list to mapply. Only those that are actually passed will be vectorized; default values will not. See the examples.
Vectorize cannot be used with primitive functions as they do not have a value for formals.
It also cannot be used with functions that have arguments named FUN, vectorize.args, SIMPLIFY or USE.NAMES, as they will interfere with the Vectorize arguments. See the combn example below for a workaround.
A function with the same arguments as FUN, wrapping a call to mapply.
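To make the roles of vectorized and non-vectorized arguments concrete, here is a minimal sketch. The helper rep_str is hypothetical and written only for this illustration: it handles a single string and a single count, and Vectorize supplies the element-wise iteration over s and n while sep is passed through unchanged.

```r
## Hypothetical scalar-only helper: repeat one string n times
rep_str <- function(s, n, sep = "") paste(rep(s, n), collapse = sep)

v_rep_str <- Vectorize(rep_str, vectorize.args = c("s", "n"))

res     <- v_rep_str(c("a", "b"), 2:3)             # pairs ("a", 2) and ("b", 3)
res_sep <- v_rep_str(c("a", "b"), 2:3, sep = "-")  # 'sep' is not vectorized
```

Because the first vectorized argument is a character vector and USE.NAMES = TRUE by default, the result is named by that vector.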
# We use rep.int as rep is primitive
vrep <- Vectorize(rep.int)
vrep(1:4, 4:1)
vrep(times = 1:4, x = 4:1)

vrep <- Vectorize(rep.int, "times")
vrep(times = 1:4, x = 42)

f <- function(x = 1:3, y) c(x, y)
vf <- Vectorize(f, SIMPLIFY = FALSE)
f(1:3, 1:3)
vf(1:3, 1:3)
vf(y = 1:3) # Only vectorizes y, not x

# Nonlinear regression contour plot, based on nls() example
require(graphics)
SS <- function(Vm, K, resp, conc) {
    pred <- (Vm * conc)/(K + conc)
    sum((resp - pred)^2 / pred)
}
vSS <- Vectorize(SS, c("Vm", "K"))
Treated <- subset(Puromycin, state == "treated")

Vm <- seq(140, 310, length.out = 50)
K <- seq(0, 0.15, length.out = 40)
SSvals <- outer(Vm, K, vSS, Treated$rate, Treated$conc)
contour(Vm, K, SSvals, levels = (1:10)^2, xlab = "Vm", ylab = "K")

# combn() has an argument named FUN
combnV <- Vectorize(function(x, m, FUNV = NULL) combn(x, m, FUN = FUNV),
                    vectorize.args = c("x", "m"))
combnV(4, 1:4)
combnV(4, 1:4, sum)
Generates a warning message that corresponds to its argument(s) and (optionally) the expression or function from which it was called.
warning(..., call. = TRUE, immediate. = FALSE, noBreaks. = FALSE,
        domain = NULL)
suppressWarnings(expr, classes = "warning")
... |
either zero or more objects which can be coerced to character (and which are pasted together with no separator) or a single condition object. |
call. |
logical, indicating if the call should become part of the warning message. |
immediate. |
logical, indicating if the warning should be output
immediately, even if |
noBreaks. |
logical, indicating as far as possible the message should
be output as a single line when |
expr |
expression to evaluate. |
domain |
see |
classes |
character, indicating which classes of warnings should be suppressed. |
The result depends on the value of options("warn") and on handlers established in the executing code.
If a condition object is supplied it should be the only argument, and further arguments will be ignored, with a message.
options(warn = 1) can be used to request an immediate report.
warning signals a warning condition by (effectively) calling signalCondition. If there are no handlers or if all handlers return, then the value of warn = getOption("warn") is used to determine the appropriate action. If warn is negative warnings are ignored; if it is zero they are stored and printed after the top-level function has completed; if it is one they are printed as they occur and if it is 2 (or larger) warnings are turned into errors. Calling warning(immediate. = TRUE) turns warn <= 0 into warn = 1 for this call only.
If warn is zero (the default), a read-only variable last.warning is created. It contains the warnings which can be printed via a call to warnings.
Warnings will be truncated to getOption("warning.length") characters, default 1000, indicated by [... truncated].
While the warning is being processed, a muffleWarning restart is available. If this restart is invoked with invokeRestart, then warning returns immediately.
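The handler and restart mechanism described above can be sketched as follows. The function noisy and the variable names are hypothetical; the sketch shows a calling handler that records the message and invokes muffleWarning, and then the effect of warn = 2.

```r
## Hypothetical function that warns and then carries on
noisy <- function() {
    warning("something looks off")
    "finished"
}

seen <- character()
result <- withCallingHandlers(
    noisy(),
    warning = function(w) {
        seen <<- c(seen, conditionMessage(w))  # record the message
        invokeRestart("muffleWarning")         # warning() returns, noisy() continues
    })

## With warn = 2 the same warning is turned into an error
old <- options(warn = 2)
as_error <- tryCatch(noisy(), error = function(e) "converted to error")
options(old)                                   # restore the previous setting
```

A calling handler is used rather than tryCatch for the first part because tryCatch would unwind the call and noisy would never return its value.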
An attempt is made to coerce other types of inputs to warning to character vectors.
suppressWarnings evaluates its expression in a context that ignores all warnings.
The warning message as character string, invisibly.
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
stop for fatal errors, message for diagnostic messages, warnings, and options with argument warn=.
gettext for the mechanisms for the automated translation of messages.
testit <- function() warning("testit")
testit() ## shows call
testit <- function() warning("problem in testit", call. = FALSE)
testit() ## no call
suppressWarnings(warning("testit"))
warnings and its print method print the variable last.warning in a pleasing form.
warnings(...)

## S3 method for class 'warnings'
summary(object, ...)

## S3 method for class 'warnings'
print(x, tags,
      header = ngettext(n, "Warning message:\n", "Warning messages:\n"),
      ...)

## S3 method for class 'summary.warnings'
print(x, ...)
... |
arguments to be passed to |
object |
a |
x |
a |
tags |
if not |
header |
a character string |
See the description of options("warn") for the circumstances under which there is a last.warning object and warnings() is used. In essence this is if options(warn = 0) and warning has been called at least once.
Note that the length(last.warning) is maximally getOption("nwarnings") (at the time the warnings are generated) which is 50 by default. To increase, use something like
options(nwarnings = 10000)
It is possible that last.warning refers to the last recorded warning and not to the last warning, for example if options(warn) has been changed or if a catastrophic error occurred.
warnings() returns an object of S3 class "warnings", basically a named list. In R versions before 4.4.0, it returned NULL when there were no warnings, contrary to the above documentation.
summary(<warnings>) returns a "summary.warnings" object which is basically the list of unique warnings (unique(object)) with a "counts" attribute, somewhat experimentally.
Neither where last.warning is stored nor whether it is visible is documented, and both are subject to change.
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
## NB this example is intended to be pasted in,
## rather than run by example()
ow <- options("warn")
for(w in -1:1) {
   options(warn = w); cat("\n warn =", w, "\n")
   for(i in 1:3) { cat(i,"..\n"); m <- matrix(1:7, 3,4) }
   cat("--=--=--\n")
}
## at the end prints all three warnings, from the 'option(warn = 0)' above
options(ow) # reset to previous, typically 'warn = 0'
tail(warnings(), 2) # see the last two warnings only (via '[' method)

## Often the most useful way to look at many warnings:
summary(warnings())

op <- options(nwarnings = 10000) ## <- get "full statistics"
x <- 1:36; for(n in 1:13) for(m in 1:12) A <- matrix(x, n,m) # There were 105 warnings ...
summary(warnings())
options(op) # revert to previous (keeping 50 messages by default)
Extract the weekday, month or quarter, or the Julian time (days since some origin). These are generic functions: the methods for the internal date-time classes are documented here.
weekdays(x, abbreviate)
## S3 method for class 'POSIXt'
weekdays(x, abbreviate = FALSE)
## S3 method for class 'Date'
weekdays(x, abbreviate = FALSE)

months(x, abbreviate)
## S3 method for class 'POSIXt'
months(x, abbreviate = FALSE)
## S3 method for class 'Date'
months(x, abbreviate = FALSE)

quarters(x, abbreviate)
## S3 method for class 'POSIXt'
quarters(x, ...)
## S3 method for class 'Date'
quarters(x, ...)

julian(x, ...)
## S3 method for class 'POSIXt'
julian(x, origin = as.POSIXct("1970-01-01", tz = "GMT"), ...)
## S3 method for class 'Date'
julian(x, origin = as.Date("1970-01-01"), ...)
x |
an object inheriting from class |
abbreviate |
logical vector (possibly recycled). Should the names be abbreviated? |
origin |
a length-one object inheriting from class
|
... |
arguments for other methods. |
weekdays and months return a character vector of names in the locale in use, i.e., Sys.getlocale("LC_TIME").
quarters returns a character vector of "Q1" to "Q4".
julian returns the number of days (possibly fractional) since the origin, with the origin as an "origin" attribute.
All time calculations in R are done ignoring leap-seconds.
Other components such as the day of the month or the year are very easy to compute: just use as.POSIXlt and extract the relevant component. Alternatively (especially if the components are desired as character strings), use strftime.
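As a brief sketch of both routes, the following extracts components from a fixed date. The date is arbitrary and the locale-dependent functions weekdays and months are deliberately left out so that the results do not vary between systems.

```r
d <- as.Date("2024-02-29")

q <- quarters(d)                       # "Q1"
j <- julian(as.Date("1970-01-10"))     # days since the default origin 1970-01-01

lt  <- as.POSIXlt(d)
day <- lt$mday                         # day of the month
mon <- lt$mon + 1                      # months are counted from 0
yr  <- lt$year + 1900                  # years are counted from 1900

yr_chr <- format(d, "%Y")              # character result, strftime-style format
```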
DateTimeClasses, Date; Sys.getlocale("LC_TIME") crucially for months() and weekdays().
## first two are locale dependent:
weekdays(.leap.seconds)
months (.leap.seconds)
quarters(.leap.seconds)

## Show how easily you get month, day, year, day (of {month, week, yr}), ... :
## (remember to count from 0 (!): mon = 0..11, wday = 0..6, etc !!)

##' Transform (Time-)Date vector to convenient data frame :
dt2df <- function(dt, dName = deparse(substitute(dt))) {
    DF <- as.data.frame(unclass(as.POSIXlt( dt )))
    `names<-`(cbind(dt, DF, deparse.level=0L), c(dName, names(DF)))
}
## e.g.,
dt2df(.leap.seconds) # date+time
dt2df(Sys.Date() + 0:9) # date

##' Even simpler: Date -> Matrix - dropping time info {sec,min,hour, isdst}
d2mat <- function(x) simplify2array(unclass(as.POSIXlt(x))[4:7])
## e.g.,
d2mat(seq(as.Date("2000-02-02"), by=1, length.out=30)) # has R 1.0.0's release date

## Julian Day Number (JDN, https://en.wikipedia.org/wiki/Julian_day)
## is the number of days since noon UTC on the first day of 4713 BCE
## in the proleptic Julian calendar. More recently it is defined in
## 'Terrestrial Time', which differs from UTC by a few seconds.
## See https://en.wikipedia.org/wiki/Terrestrial_Time
julian(Sys.Date(), -2440588) # from a day
floor(as.numeric(julian(Sys.time())) + 2440587.5) # from a date-time
Give the TRUE indices of a logical object, allowing for array indices.
which(x, arr.ind = FALSE, useNames = TRUE)
arrayInd(ind, .dim, .dimnames = NULL, useNames = FALSE)
x |
a |
arr.ind |
logical; should array indices be returned
when |
ind |
integer-valued index vector, as resulting from
|
.dim |
|
.dimnames |
optional list of character |
useNames |
logical indicating if the value of |
If arr.ind == FALSE (the default), an integer vector, or a double vector if x is a long vector, with length equal to sum(x), i.e., to the number of TRUEs in x.
Basically, the result is (1:length(x))[x] in typical cases; more generally, including when x has NA's, which(x) is seq_along(x)[!is.na(x) & x] plus names when x has them.
If arr.ind == TRUE and x is an array (has a dim attribute), the result is arrayInd(which(x), dim(x), dimnames(x)), namely a matrix whose rows each are the indices of one element of x; see Examples below.
Unlike most other base R functions this does not coerce x to logical: only arguments with typeof logical are accepted and others give an error.
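The behaviour described above, including the treatment of NA and the refusal to coerce non-logical input, can be seen in a short sketch (the data are arbitrary):

```r
x <- c(FALSE, TRUE, NA, TRUE)
idx <- which(x)                          # NA is not counted as TRUE

m <- matrix(c(TRUE, FALSE, FALSE, TRUE, TRUE, FALSE), nrow = 2)
lin <- which(m)                          # linear (column-major) indices
ai  <- which(m, arr.ind = TRUE)          # one (row, col) pair per TRUE element
pos <- arrayInd(5, dim(m))               # linear index 5 in a 2 x 3 array

## Non-logical input is an error rather than being coerced
err <- tryCatch(which(1:3), error = function(e) "not logical")
```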
Werner Stahel and Peter Holzer (ETH Zurich) proposed the arr.ind option.
Logic, which.min for the index of the minimum or maximum, and match for the first index of an element in a vector, i.e., for a scalar a, match(a, x) is equivalent to min(which(x == a)) but much more efficient.
which(LETTERS == "R")
which(ll <- c(TRUE, FALSE, TRUE, NA, FALSE, FALSE, TRUE)) #> 1 3 7
names(ll) <- letters[seq(ll)]
which(ll)
which((1:12)%%2 == 0) # which are even?
which(1:10 > 3, arr.ind = TRUE)

( m <- matrix(1:12, 3, 4) )
div.3 <- m %% 3 == 0
which(div.3)
which(div.3, arr.ind = TRUE)
rownames(m) <- paste("Case", 1:3, sep = "_")
which(m %% 5 == 0, arr.ind = TRUE)

dim(m) <- c(2, 2, 3); m
which(div.3, arr.ind = FALSE)
which(div.3, arr.ind = TRUE)

vm <- c(m)
dim(vm) <- length(vm) #-- funny thing with length(dim(...)) == 1
which(div.3, arr.ind = TRUE)
Determines the location, i.e., index of the (first) minimum or maximum of a numeric (or logical) vector.
which.min(x)
which.max(x)
x |
numeric (logical, integer or double) vector or an R object
for which the internal coercion to |
Missing and NaN values are discarded.
An integer or, on 64-bit platforms if length(x) =: n ≥ 2^31, an integer-valued double, of length 1 or 0 (iff x has no non-NAs), giving the index of the first minimum or maximum respectively of x.
If this extremum is unique (or empty), the results are the same as (but more efficient than) which(x == min(x, na.rm = TRUE)) or which(x == max(x, na.rm = TRUE)) respectively.
x – First TRUE or FALSE
For a logical vector x with both FALSE and TRUE values, which.min(x) and which.max(x) return the index of the first FALSE or TRUE, respectively, as FALSE < TRUE. However, match(FALSE, x) or match(TRUE, x) are typically preferred, as they do indicate mismatches.
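The difference between which.max and match on a logical vector without any TRUE can be illustrated with a small sketch (the data are arbitrary):

```r
first_min <- which.min(c(3, 1, NA, 1))   # ties: the first minimum wins
first_max <- which.max(c(3, 1, NA, 1))

flags <- c(FALSE, FALSE)                 # contains no TRUE at all
wm <- which.max(flags)                   # still returns an index
mt <- match(TRUE, flags)                 # NA signals that there is no TRUE

none <- which.min(c(NA_real_, NA_real_)) # nothing left after discarding NAs
```

Here which.max(flags) reports position 1 because both values are equal, which is why match is the safer choice when the value sought may be absent.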
Martin Maechler
Use arrayInd(), if you need array/matrix indices instead of 1D vector ones.
which.is.max in package nnet differs in breaking ties at random (and having a ‘fuzz’ in the definition of ties).
x <- c(1:4, 0:5, 11)
which.min(x)
which.max(x)

## it *does* work with NA's present, by discarding them:
presidents[1:30]
range(presidents, na.rm = TRUE)
which.min(presidents) # 28
which.max(presidents) # 2

## Find the first occurrence, i.e. the first TRUE, if there is at least one:
x <- rpois(10000, lambda = 10); x[sample.int(50, 20)] <- NA
## where is the first value >= 20 ?
which.max(x >= 20)

## Also works for lists (which can be coerced to numeric vectors):
which.min(list(A = 7, pi = pi)) ## -> c(pi = 2L)
Evaluate an R expression in an environment constructed from data, possibly modifying (a copy of) the original data.
with(data, expr, ...)
within(data, expr, ...)
## S3 method for class 'list'
within(data, expr, keepAttrs = TRUE, ...)
data |
data to use for constructing an environment. For the
default |
expr |
expression to evaluate; particularly for { a <- somefun() b <- otherfun() ..... rm(unused1, temp) } |
keepAttrs |
for the |
... |
arguments to be passed to (future) methods. |
with is a generic function that evaluates expr in a local environment constructed from data. The environment has the caller's environment as its parent. This is useful for simplifying calls to modeling functions. (Note: if data is already an environment then this is used with its existing parent.)
Note that assignments within expr take place in the constructed environment and not in the user's workspace.
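The point about assignments can be checked directly. In this sketch the data frame df and the variable name demo_total are made up for illustration:

```r
df <- data.frame(a = 1:3)   # illustrative data

res <- with(df, {
  demo_total <- sum(a)      # assigned in the constructed environment only
  demo_total * 2
})

res                         # 12: the value of the evaluated expression
exists("demo_total")        # FALSE: nothing was created in the workspace
```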
within is similar, except that it examines the environment after the evaluation of expr and makes the corresponding modifications to a copy of data (this may fail in the data frame case if objects are created which cannot be stored in a data frame), and returns it. within can be used as an alternative to transform.
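A short side-by-side sketch of within() and transform() on a made-up data frame (df, a and b are illustrative names). Unlike transform(), the expression given to within() can also remove columns with rm():

```r
df <- data.frame(x = 1:3, y = c(10, 20, 30))  # illustrative data

# within(): arbitrary statements; rm() drops a column from the result
a <- within(df, {
  z <- x + y
  rm(y)
})

# transform(): tag = value pairs only; existing columns are kept
b <- transform(df, z = x + y)

names(a)      # "x" "z"
names(b)      # "x" "y" "z"
names(df)     # "x" "y": the original is untouched, a copy was modified
```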
For with, the value of the evaluated expr. For within, the modified object.
For interactive use this is very effective and nice to read. For programming however, i.e., in one's functions, more care is needed, and typically one should refrain from using with(), as, e.g., variables in data may accidentally override local variables, see the reference.
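The masking problem mentioned above can be reproduced in a few lines. The function scale_by() and both data frames are hypothetical and exist only for this sketch:

```r
# Hypothetical helper that multiplies column x by the argument k
scale_by <- function(data, k) with(data, x * k)

d_ok  <- data.frame(x = 1:3)           # no column named k
d_bad <- data.frame(x = 1:3, k = 10)   # a column k masks the argument k

scale_by(d_ok, 2)    # 2 4 6: k is found in the function's frame
scale_by(d_bad, 2)   # 10 20 30: k is taken from the data instead
```

Inside functions, explicit indexing such as data$x * k avoids this ambiguity.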
Further, when using modeling or graphics functions with an explicit data argument (and typically using formulas), it is typically preferred to use the data argument of that function rather than to use with(data, ...).
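One visible consequence of this advice, sketched with lm() and the mtcars data set (both from the standard distribution; the object names fit_data and fit_with are made up): the two fits agree numerically, but only the first records where the data came from.

```r
fit_data <- lm(mpg ~ wt, data = mtcars)   # preferred: explicit data argument
fit_with <- with(mtcars, lm(mpg ~ wt))    # works, but the call loses the data

all.equal(coef(fit_data), coef(fit_with)) # TRUE: same estimates

"data" %in% names(fit_data$call)          # TRUE
"data" %in% names(fit_with$call)          # FALSE
```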
Thomas Lumley (2003) Standard nonstandard evaluation rules. https://developer.r-project.org/nonstandard-eval.pdf
evalq, attach, assign, transform.
with(mtcars, mpg[cyl == 8  &  disp > 350])
    # is the same as, but nicer than
mtcars$mpg[mtcars$cyl == 8  &  mtcars$disp > 350]

require(stats); require(graphics)

# examples from glm:
with(data.frame(u = c(5,10,15,20,30,40,60,80,100),
                lot1 = c(118,58,42,35,27,25,21,19,18),
                lot2 = c(69,35,26,21,18,16,13,12,12)),
    list(summary(glm(lot1 ~ log(u), family = Gamma)),
         summary(glm(lot2 ~ log(u), family = Gamma))))

aq <- within(airquality, {     # Notice that multiple vars can be changed
    lOzone <- log(Ozone)
    Month <- factor(month.abb[Month])
    cTemp <- round((Temp - 32) * 5/9, 1) # From Fahrenheit to Celsius
    S.cT <- Solar.R / cTemp  # using the newly created variable
    rm(Day, Temp)
})
head(aq)

# example from boxplot:
with(ToothGrowth, {
    boxplot(len ~ dose, boxwex = 0.25, at = 1:3 - 0.2,
            subset = (supp == "VC"), col = "yellow",
            main = "Guinea Pigs' Tooth Growth",
            xlab = "Vitamin C dose mg",
            ylab = "tooth length", ylim = c(0, 35))
    boxplot(len ~ dose, add = TRUE, boxwex = 0.25, at = 1:3 + 0.2,
            subset = supp == "OJ", col = "orange")
    legend(2, 9, c("Ascorbic acid", "Orange juice"),
           fill = c("yellow", "orange"))
})

# alternate form that avoids subset argument:
with(subset(ToothGrowth, supp == "VC"),
     boxplot(len ~ dose, boxwex = 0.25, at = 1:3 - 0.2,
             col = "yellow", main = "Guinea Pigs' Tooth Growth",
             xlab = "Vitamin C dose mg",
             ylab = "tooth length", ylim = c(0, 35)))
with(subset(ToothGrowth, supp == "OJ"),
     boxplot(len ~ dose, add = TRUE, boxwex = 0.25, at = 1:3 + 0.2,
             col = "orange"))
legend(2, 9, c("Ascorbic acid", "Orange juice"),
       fill = c("yellow", "orange"))
This function evaluates an expression, returning it in a two element list containing its value and a flag showing whether it would automatically print.
withVisible(x)
x |
an expression to be evaluated. |
The argument, not an expression object, rather an (unevaluated function) call, is evaluated in the caller's context.
This is a primitive function.
value |
The value of |
visible |
logical; whether the value would auto-print. |
invisible, eval; withAutoprint() calls source() which itself uses withVisible() in order to correctly “auto print”.
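A minimal sketch of how the two components of the result can be inspected. The object names v_hidden and v_shown are made up for this illustration:

```r
v_hidden <- withVisible(invisible(5))  # value wrapped in invisible()
v_shown  <- withVisible(5)             # plain value

v_hidden$value     # 5
v_hidden$visible   # FALSE: would not auto-print at top level
v_shown$visible    # TRUE

# A caller can then decide whether to print, mimicking the top-level loop
if (v_shown$visible) print(v_shown$value)
```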
x <- 1
withVisible(x <- 1) # *$visible is FALSE
x
withVisible(x)      # *$visible is TRUE

# Wrap the call in evalq() for special handling
df <- data.frame(a = 1:5, b = 1:5)
evalq(withVisible(a + b), envir = df)
Write data x to a file or other connection.
As it simply calls cat(), less formatting happens than with print()ing.
If x is a matrix you need to transpose it (and typically set ncolumns) to get the columns in file the same as those in the internal representation.
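The transposition advice is easiest to see on a small matrix. In this sketch m and tmp are illustrative names, and the file is a temporary one:

```r
m <- matrix(1:6, nrow = 2)   # rows are (1, 3, 5) and (2, 4, 6)
tmp <- tempfile()

# Without transposing, elements are written in storage (column-major) order
write(m, tmp, ncolumns = ncol(m))
plain <- readLines(tmp)      # "1 2 3" "4 5 6"

# Transposing first makes each file line correspond to a matrix row
write(t(m), tmp, ncolumns = ncol(m))
by_row <- readLines(tmp)     # "1 3 5" "2 4 6"

unlink(tmp)                  # tidy up
```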
Whereas atomic vectors (numeric, character, etc., including matrices) are written plainly, i.e., without any names, less simple vector-like objects such as "factor", "Date", or "POSIXt" may be formatted to character before writing.
write(x, file = "data",
      ncolumns = if(is.character(x)) 1 else 5,
      append = FALSE, sep = " ")
x |
the data to be written out. |
file |
a When |
ncolumns |
the number of columns to write the data in. |
append |
if |
sep |
a string used to separate columns. Using |
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
write is a wrapper for cat, which gives further details on the format used.
write.table for matrix and data frame objects, writeLines for lines of text, and scan for reading data.
saveRDS and save are often preferable (for writing any R objects).
# Demonstrate default ncolumns, writing to the console
write(month.abb,  "")  # 1 element  per line for "character"
write(stack.loss, "")  # 5 elements per line for "numeric"

# Build a file with sequential calls
fil <- tempfile("data")
write("# Model settings", fil)
write(month.abb, fil, ncolumns = 6, append = TRUE)
write("\n# Initial parameter values", fil, append = TRUE)
write(sqrt(stack.loss), fil, append = TRUE)
if(interactive()) file.show(fil)
unlink(fil) # tidy up
Write text lines to a connection.
writeLines(text, con = stdout(), sep = "\n", useBytes = FALSE)
text |
a character vector. |
con |
a connection object or a character string. |
sep |
character string. A string to be written to the connection after each line of text. |
useBytes |
logical. See ‘Details’. |
If con is a character string, the function calls file to obtain a file connection which is opened for the duration of the function call. (Tilde expansion of the file path is done by file.)
If the connection is open it is written from its current position. If it is not open, it is opened for the duration of the call in "wt" mode and then closed again.
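The two cases can be compared on a temporary file. This is a sketch with illustrative names (tmp, con, two_lines, one_line):

```r
tmp <- tempfile()

# An already open connection: successive calls continue from the current position
con <- file(tmp, open = "wt")
writeLines("first", con)
writeLines("second", con)
close(con)
two_lines <- readLines(tmp)   # "first" "second"

# A character string: the file is opened in "wt" mode for this call only,
# so the previous contents are replaced
writeLines("only", tmp)
one_line <- readLines(tmp)    # "only"

unlink(tmp)                   # tidy up
```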
Normally writeLines is used with a text-mode connection, and the default separator is converted to the normal separator for that platform (LF on Unix/Linux, CRLF on Windows). For more control, open a binary connection and specify the precise value you want written to the file in sep. For even more control, use writeChar on a binary connection.
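A sketch of the binary-connection approach, forcing CRLF line endings regardless of platform. The names tmp, con and bytes are illustrative:

```r
tmp <- tempfile()

con <- file(tmp, open = "wb")               # binary mode: no separator translation
writeLines(c("a", "b"), con, sep = "\r\n")  # sep is written exactly as given
close(con)

# Inspect the raw bytes: 'a' CR LF 'b' CR LF
bytes <- readBin(tmp, what = "raw", n = 10)
bytes

unlink(tmp)                                 # tidy up
```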
useBytes is for expert use. Normally (when false) character strings with marked encodings are converted to the current encoding before being passed to the connection (which might do further re-encoding). useBytes = TRUE suppresses the re-encoding of marked strings so they are passed byte-by-byte to the connection: this can be useful when strings have already been re-encoded by e.g. iconv. (It is invoked automatically for strings with marked encoding "bytes".)
connections, writeChar, writeBin, readLines, cat
A generic auxiliary function that produces a numeric vector which will sort in the same order as x.
xtfrm(x)
x |
an R object. |
This is a special case of ranking, but as a less general function than rank is more suitable to be made generic. The default method is similar to rank(x, ties.method = "min", na.last = "keep"), so NA values are given rank NA and all tied values are given equal integer rank.
The factor method extracts the codes.
The default method will unclass the object if is.numeric(x) is true but otherwise make use of == and > methods for the class of x[i] (for integers i), and the is.na method for the class of x, but might be rather slow when doing so.
This is an internal generic primitive, so S3 or S4 methods can be written for it. Differently to other internal generics, the default method is called explicitly when no other dispatch has happened.
A numeric (usually integer) vector of the same length as x.
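A few calls illustrate the behaviour described for the default and factor methods. The vectors s and f are made up for this sketch:

```r
s <- c("b", "a", "b", NA)      # illustrative character vector
r <- xtfrm(s)
r                              # 2 1 2 NA: ties share the minimum rank, NA stays NA

# Sorting by the transformed values gives the same order as sorting s itself
identical(order(r), order(s))  # TRUE

# Factor method: the integer codes, which follow the level order
f <- factor(c("lo", "hi", "lo"), levels = c("lo", "hi"))
xtfrm(f)                       # 1 2 1
```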
zapsmall determines a digits argument dr for calling round(x, digits = dr) such that values close to zero (compared with the maximal absolute value in the vector) are ‘zapped’, i.e., replaced by 0.
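As a rough sketch of what this means in practice, the call below is compared with an explicit round(). The vector x and the hand-computed dr are illustrative, and the formula for dr (digits minus the number of integer digits of the largest absolute value) is an assumption about the default behaviour, not something stated above:

```r
x <- c(1234.5, 1e-8)                   # illustrative data

z <- zapsmall(x, digits = 7)
z                                      # 1234.5 0.0: the tiny value is zapped

# Assumed rule: dr = digits - ceiling(log10(max(abs(x)))) = 7 - 4 = 3
dr <- 7 - ceiling(log10(max(abs(x))))
identical(z, round(x, digits = dr))    # TRUE for this input
```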
zapsmall(x, digits = getOption("digits"),
         mFUN = function(x, ina) max(abs(x[!ina])), min.d = 0L)
x |
a numeric or complex vector or any R number-like object
which has a |
digits |
integer indicating the precision to be used. |
mFUN |
a |
min.d |
an integer specifying the minimal number of digits to use in
the resulting |
Chambers, J. M. (1998) Programming with Data. A Guide to the S Language. Springer.
x2 <- pi * 100^(-2:2)/10
   print(  x2, digits = 4)
zapsmall(  x2) # automatic digits
zapsmall(  x2, digits = 4)
zapsmall(c(x2, Inf)) # round()s to integer ..
zapsmall(c(x2, Inf), min.d=-Inf) # everything  is small wrt  Inf

(z <- exp(1i*0:4*pi/2))
zapsmall(z)

zapShow <- function(x, ...) rbind(orig = x, zapped = zapsmall(x, ...))
zapShow(x2)

## using a *robust* mFUN
mF_rob <- function(x, ina) boxplot.stats(x, do.conf=FALSE)$stats[5]
## with robust mFUN(), 'Inf' is no longer distorting the picture:
zapShow(c(x2, Inf), mFUN = mF_rob)
zapShow(c(x2, Inf), mFUN = mF_rob, min.d = -5) # the same
zapShow(c(x2, 999), mFUN = mF_rob) # same *rounding* as w/ Inf
zapShow(c(x2, 999), mFUN = mF_rob, min.d =  3) # the same
zapShow(c(x2, 999), mFUN = mF_rob, min.d =  8) # small diff
.packages returns information about package availability.
.packages(all.available = FALSE, lib.loc = NULL)
all.available |
logical; if |
lib.loc |
a character vector describing the location of R
library trees to search through, or |
.packages() returns the names of the currently attached packages invisibly whereas .packages(all.available = TRUE) gives (visibly) all packages available in the library location path lib.loc.
For a package to be regarded as being ‘available’ it must have valid metadata (and hence be an installed package). However, this will report a package as available if the metadata does not match the directory name: use find.package to confirm that the metadata match or installed.packages for a much slower but more comprehensive check of ‘available’ packages.
A character vector of package base names, invisible unless all.available = TRUE.
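The invisibility of the default result can be confirmed with withVisible(). This is a small sketch, with vis as an illustrative name:

```r
vis <- withVisible(.packages())

vis$visible            # FALSE: a bare .packages() call prints nothing
"base" %in% vis$value  # TRUE: the base package is always attached

# Wrapping the call in parentheses forces printing at top level
(.packages())
```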
.packages(all.available = TRUE) is not a way to find out if a small number of packages are available for use: not only is it expensive when thousands of packages are installed, it is an incomplete test. See the help for find.package for why require should be used.
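A minimal sketch of the require()-based check recommended above. The package name "noSuchPackageXyz" is deliberately fictitious, and has_stats and has_missing are illustrative names:

```r
# require() returns TRUE or FALSE instead of raising an error
has_stats <- require("stats", quietly = TRUE, character.only = TRUE)

# For an uninstalled (here: fictitious) package it warns and returns FALSE
has_missing <- suppressWarnings(
  require("noSuchPackageXyz", quietly = TRUE, character.only = TRUE)
)

has_stats     # TRUE
has_missing   # FALSE
```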
R core; Guido Masarotto for the all.available = TRUE part of .packages.
library, .libPaths, installed.packages.
(.packages())               # maybe just "base"
.packages(all.available = TRUE) # return all available as character vector
require(splines)
(.packages())               # "splines", too
detach("package:splines")
Miscellaneous internal/programming utilities.
.standard_regexps()
.standard_regexps returns a list of ‘standard’ regexps, including elements named valid_package_name and valid_package_version with the obvious meanings. The regexps are not anchored.
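Because the regexps are not anchored, matching a whole string requires adding ^ and $ yourself. The following sketch uses made-up candidate names; the results noted in the comments reflect the package-name pattern as shipped with current R versions:

```r
re <- .standard_regexps()

candidates <- c("stats", "data.table", "2fast", "a")  # illustrative inputs

# Unanchored: succeeds as soon as any substring looks like a package name
unanchored <- grepl(re$valid_package_name, candidates)

# Anchored: the entire string must be a valid package name
pat <- paste0("^", re$valid_package_name, "$")
anchored <- grepl(pat, candidates)

unanchored   # "2fast" matches here, via its substring "fast"
anchored     # only "stats" and "data.table" match
```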