variancePartition:Quantify and interpret drivers of variation in multilevel gene expression experiments
Quantify and interpret multiple sources of biological and technical variation in gene expression experiments. Uses a linear mixed model to quantify variation in gene expression attributable to individual, tissue, time point, or technical variables. Includes dream differential expression analysis for repeated measures.
Maintained by Gabriel E. Hoffman. Last updated 2 months ago.
nlme:Linear and Nonlinear Mixed Effects Models
Fit and compare Gaussian linear and nonlinear mixed-effects models.
Maintained by R Core Team. Last updated 2 months ago.
mdatools:Multivariate Data Analysis for Chemometrics
Projection based methods for preprocessing, exploring and analysis of multivariate data used in chemometrics. S. Kucheryavskiy (2020) <doi:10.1016/j.chemolab.2020.103937>.
Maintained by Sergey Kucheryavskiy. Last updated 8 months ago.
sommer:Solving Mixed Model Equations in R
Structural multivariate-univariate linear mixed model solver for estimation of multiple random effects with unknown variance-covariance structures (e.g., heterogeneous and unstructured) and known covariance among levels of random effects (e.g., pedigree and genomic relationship matrices) (Covarrubias-Pazaran, 2016 <doi:10.1371/journal.pone.0156744>; Maier et al., 2015 <doi:10.1016/j.ajhg.2014.12.006>; Jensen et al., 1997). REML estimates can be obtained using the Direct-Inversion Newton-Raphson and Direct-Inversion Average Information algorithms for the problems r x r (r being the number of records) or using the Henderson-based average information algorithm for the problem c x c (c being the number of coefficients to estimate). Spatial models can also be fitted using the two-dimensional spline functionality available.
Maintained by Giovanny Covarrubias-Pazaran. Last updated 22 days ago.
psychmeta:Psychometric Meta-Analysis Toolkit
Tools for computing bare-bones and psychometric meta-analyses and for generating psychometric data for use in meta-analysis simulations. Supports bare-bones, individual-correction, and artifact-distribution methods for meta-analyzing correlations and d values. Includes tools for converting effect sizes, computing sporadic artifact corrections, reshaping meta-analytic databases, computing multivariate corrections for range variation, and more.
Maintained by Jeffrey A. Dahlke. Last updated 9 months ago.
avar:Allan Variance
Implements the Allan variance and the Allan variance linear regression estimator for latent time series models. More details about the method can be found, for example, in Guerrier, S., Molinari, R., & Stebler, Y. (2016) <doi:10.1109/LSP.2016.2541867>.
Maintained by Stéphane Guerrier. Last updated 3 years ago.
AlphaSimR:Breeding Program Simulations
The successor to the 'AlphaSim' software for breeding program simulation [Faux et al. (2016) <doi:10.3835/plantgenome2016.02.0013>]. Used for stochastic simulations of breeding programs to the level of DNA sequence for every individual. Contained is a wide range of functions for modeling common tasks in a breeding program, such as selection and crossing. These functions allow for constructing simulations of highly complex plant and animal breeding programs via scripting in the R software environment. Such simulations can be used to evaluate overall breeding program performance and conduct research into breeding program design, such as implementation of genomic selection. Included is the 'Markovian Coalescent Simulator' ('MaCS') for fast simulation of biallelic sequences according to a population demographic history [Chen et al. (2009) <doi:10.1101/gr.083634.108>].
Maintained by Chris Gaynor. Last updated 5 months ago.
wv:Wavelet Variance
Provides a series of tools to compute and plot quantities related to classical and robust wavelet variance for time series and regular lattices. More details can be found, for example, in Serroukh, A., Walden, A.T., & Percival, D.B. (2000) <doi:10.2307/2669537> and Guerrier, S. & Molinari, R. (2016) <arXiv:1607.05858>.
Maintained by Stéphane Guerrier. Last updated 2 years ago.
metafor:Meta-Analysis Package for R
A comprehensive collection of functions for conducting meta-analyses in R. The package includes functions to calculate various effect sizes or outcome measures, fit equal-, fixed-, random-, and mixed-effects models to such data, carry out moderator and meta-regression analyses, and create various types of meta-analytical plots (e.g., forest, funnel, radial, L'Abbe, Baujat, bubble, and GOSH plots). For meta-analyses of binomial and person-time data, the package also provides functions that implement specialized methods, including the Mantel-Haenszel method, Peto's method, and a variety of suitable generalized linear (mixed-effects) models (i.e., mixed-effects logistic and Poisson regression models). Finally, the package provides functionality for fitting meta-analytic multivariate/multilevel models that account for non-independent sampling errors and/or true effects (e.g., due to the inclusion of multiple treatment studies, multiple endpoints, or other forms of clustering). Network meta-analyses and meta-analyses accounting for known correlation structures (e.g., due to phylogenetic relatedness) can also be conducted. An introduction to the package can be found in Viechtbauer (2010) <doi:10.18637/jss.v036.i03>.
Maintained by Wolfgang Viechtbauer. Last updated 2 days ago.
MAnorm2:Tools for Normalizing and Comparing ChIP-seq Samples
Chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) is the premier technology for profiling genome-wide localization of chromatin-binding proteins, including transcription factors and histones with various modifications. This package provides a robust method for normalizing ChIP-seq signals across individual samples or groups of samples. It also designs a self-contained system of statistical models for calling differential ChIP-seq signals between two or more biological conditions as well as for calling hypervariable ChIP-seq signals across samples. Refer to Tu et al. (2021) <doi:10.1101/gr.262675.120> and Chen et al. (2022) <doi:10.1186/s13059-022-02627-9> for associated statistical details.
Maintained by Shiqi Tu. Last updated 2 years ago.
ape:Analyses of Phylogenetics and Evolution
Functions for reading, writing, plotting, and manipulating phylogenetic trees, analyses of comparative data in a phylogenetic framework, ancestral character analyses, analyses of diversification and macroevolution, computing distances from DNA sequences, reading and writing nucleotide sequences as well as importing from BioConductor, and several tools such as Mantel's test, generalized skyline plots, graphical exploration of phylogenetic data (alex, trex, kronoviz), estimation of absolute evolutionary rates and clock-like trees using mean path lengths and penalized likelihood, dating trees with non-contemporaneous sequences, translating DNA into AA sequences, and assessing sequence alignments. Phylogeny estimation can be done with the NJ, BIONJ, ME, MVR, SDM, and triangle methods, and several methods handling incomplete distance matrices (NJ*, BIONJ*, MVR*, and the corresponding triangle method). Some functions call external applications (PhyML, Clustal, T-Coffee, Muscle) whose results are returned into R.
Maintained by Emmanuel Paradis. Last updated 14 hours ago.
svrep:Tools for Creating, Updating, and Analyzing Survey Replicate Weights
Provides tools for creating and working with survey replicate weights, extending functionality of the 'survey' package from Lumley (2004) <doi:10.18637/jss.v009.i08>. Implements bootstrap methods for complex surveys, including the generalized survey bootstrap as described by Beaumont and Patak (2012) <doi:10.1111/j.1751-5823.2011.00166.x>. Methods are provided for applying nonresponse adjustments to both full-sample and replicate weights as described by Rust and Rao (1996) <doi:10.1177/096228029600500305>. Implements methods for sample-based calibration described by Opsomer and Erciulescu (2021). Diagnostic functions are included to compare weights and weighted estimates from different sets of replicate weights.
Maintained by Ben Schneider. Last updated 7 days ago.
pROC:Display and Analyze ROC Curves
Tools for visualizing, smoothing and comparing receiver operating characteristic (ROC) curves. (Partial) area under the curve (AUC) can be compared with statistical tests based on U-statistics or bootstrap. Confidence intervals can be computed for (p)AUC or ROC curves.
Maintained by Xavier Robin. Last updated 4 months ago.
samplingVarEst:Sampling Variance Estimation
Functions to calculate some point estimators and estimate their variance under unequal probability sampling without replacement. Single and two-stage sampling designs are considered. Some approximations for the second-order inclusion probabilities (joint inclusion probabilities) are available (sample and population based). A variety of Jackknife variance estimators are implemented. Almost every function is written in C (compiled) code for faster results. The functions incorporate some performance improvements for faster results with large datasets.
Maintained by Emilio Lopez Escobar. Last updated 2 years ago.
spatstat.model:Parametric Statistical Modelling and Inference for the 'spatstat' Family
Functionality for parametric statistical modelling and inference for spatial data, mainly spatial point patterns, in the 'spatstat' family of packages. (Excludes analysis of spatial data on a linear network, which is covered by the separate package 'spatstat.linnet'.) Supports parametric modelling, formal statistical inference, and model validation. Parametric models include Poisson point processes, Cox point processes, Neyman-Scott cluster processes, Gibbs point processes and determinantal point processes. Models can be fitted to data using maximum likelihood, maximum pseudolikelihood, maximum composite likelihood and the method of minimum contrast. Fitted models can be simulated and predicted. Formal inference includes hypothesis tests (quadrat counting tests, Cressie-Read tests, Clark-Evans test, Berman test, Diggle-Cressie-Loosmore-Ford test, scan test, studentised permutation test, segregation test, ANOVA tests of fitted models, adjusted composite likelihood ratio test, envelope tests, Dao-Genton test, balanced independent two-stage test), confidence intervals for parameters, and prediction intervals for point counts. Model validation techniques include leverage, influence, partial residuals, added variable plots, diagnostic plots, pseudoscore residual plots, model compensators and Q-Q plots.
Maintained by Adrian Baddeley. Last updated 8 days ago.
polars:Lightning-Fast 'DataFrame' Library
Lightning-fast 'DataFrame' library written in 'Rust'. Convert R data to 'Polars' data and vice versa. Perform fast, lazy, larger-than-memory and optimized data queries. 'Polars' is interoperable with the package 'arrow', as both are based on the 'Apache Arrow' Columnar Format.
Maintained by Soren Welling. Last updated 3 days ago.
queueing:Analysis of Queueing Networks and Models
Provides versatile tools for analysis of birth and death based Markovian Queueing Models and Single and Multiclass Product-Form Queueing Networks. It implements M/M/1, M/M/c, M/M/Infinite, M/M/1/K, M/M/c/K, M/M/c/c, M/M/1/K/K, M/M/c/K/K, M/M/c/K/m, M/M/Infinite/K/K, Multiple Channel Open Jackson Networks, Multiple Channel Closed Jackson Networks, Single Channel Multiple Class Open Networks, Single Channel Multiple Class Closed Networks and Single Channel Multiple Class Mixed Networks. It also provides B-Erlang, C-Erlang and Engset calculators. This work is dedicated to the memory of D. Sixto Rios Insua.
Maintained by Pedro Canadilla. Last updated 5 years ago.
fastcpd:Fast Change Point Detection via Sequential Gradient Descent
Implements a fast change point detection algorithm based on the paper "Sequential Gradient Descent and Quasi-Newton's Method for Change-Point Analysis" by Xianyang Zhang, Trisha Dawn. The algorithm is based on dynamic programming with pruning and sequential gradient descent. It is able to detect change points an order of magnitude faster than the vanilla Pruned Exact Linear Time (PELT). The package includes examples of linear regression, logistic regression, Poisson regression, and penalized linear regression data, as well as many more examples with custom cost functions for users who want to supply their own.
Maintained by Xingchi Li. Last updated 1 day ago.
DESeq2:Differential gene expression analysis based on the negative binomial distribution
Estimate variance-mean dependence in count data from high-throughput sequencing assays and test for differential expression based on a model using the negative binomial distribution.
Maintained by Michael Love. Last updated 11 days ago.
survey:Analysis of Complex Survey Samples
Summary statistics, two-sample tests, rank tests, generalised linear models, cumulative link models, Cox models, loglinear models, and general maximum pseudolikelihood estimation for multistage stratified, cluster-sampled, unequally weighted survey samples. Variances by Taylor series linearisation or replicate weights. Post-stratification, calibration, and raking. Two-phase and multiphase subsampling designs. Graphics. PPS sampling without replacement. Small-area estimation. Dual-frame designs.
Maintained by Thomas Lumley. Last updated 6 months ago.
dsm:Density Surface Modelling of Distance Sampling Data
Density surface modelling of line transect data. A Generalized Additive Model-based approach is used to calculate spatially-explicit estimates of animal abundance from distance sampling (also presence/absence and strip transect) data. Several utility functions are provided for model checking, plotting and variance estimation.
Maintained by Laura Marshall. Last updated 2 years ago.
scran:Methods for Single-Cell RNA-Seq Data Analysis
Implements miscellaneous functions for interpretation of single-cell RNA-seq data. Methods are provided for assignment of cell cycle phase, detection of highly variable and significantly correlated genes, identification of marker genes, and other common tasks in routine single-cell analysis workflows.
Maintained by Aaron Lun. Last updated 5 months ago.
MOFA2:Multi-Omics Factor Analysis v2
The MOFA2 package contains a collection of tools for training and analysing multi-omic factor analysis (MOFA). MOFA is a probabilistic factor model that aims to identify principal axes of variation from data sets that can comprise multiple omic layers and/or groups of samples. Additional time or space information on the samples can be incorporated using the MEFISTO framework, which is part of MOFA2. Downstream analysis functions to inspect molecular features underlying each factor, visualisation, imputation, etc. are available.
Maintained by Ricard Argelaguet. Last updated 5 months ago.
MBECS:Evaluation and correction of batch effects in microbiome data-sets
The Microbiome Batch Effect Correction Suite (MBECS) provides a set of functions to evaluate and mitigate unwanted noise due to processing in batches. To that end it incorporates a host of batch correcting algorithms (BECA) from various packages. In addition it offers a correction and reporting pipeline that provides a preliminary look at the characteristics of a data-set before and after correcting for batch effects.
Maintained by Michael Olbrich. Last updated 5 months ago.
VCA:Variance Component Analysis
ANOVA and REML estimation of linear mixed models is implemented, once following Searle et al. (1991, ANOVA for unbalanced data), once making use of the 'lme4' package. The primary objective of this package is to perform a variance component analysis (VCA) according to CLSI EP05-A3 guideline "Evaluation of Precision of Quantitative Measurement Procedures" (2014). There are plotting methods for visualization of an experimental design, plotting random effects and residuals. For ANOVA type estimation two methods for computing ANOVA mean squares are implemented (SWEEP and quadratic forms). The covariance matrix of variance components can be derived, which is used in estimating confidence intervals. Linear hypotheses of fixed effects and LS means can be computed. LS means can be computed at specific values of covariables and with custom weighting schemes for factor variables. See ?VCA for a more comprehensive description of the features.
Maintained by Andre Schuetzenmeister. Last updated 1 year ago.
BSDA:Basic Statistics and Data Analysis
Data sets for book "Basic Statistics and Data Analysis" by Larry J. Kitchens.
Maintained by Alan T. Arnholt. Last updated 2 years ago.
recipes:Preprocessing and Feature Engineering Steps for Modeling
A recipe prepares your data for modeling. We provide an extensible framework for pipeable sequences of feature engineering steps that provide preprocessing tools to be applied to data. Statistical parameters for the steps can be estimated from an initial data set and then applied to other data sets. The resulting processed output can then be used as inputs for statistical or machine learning models.
Maintained by Max Kuhn. Last updated 6 days ago.
vegan:Community Ecology Package
Ordination methods, diversity analysis and other functions for community and vegetation ecologists.
Maintained by Jari Oksanen. Last updated 16 days ago.
WASP:Wavelet System Prediction
The wavelet-based variance transformation method is used for system modelling and prediction. It refines predictor spectral representation using Wavelet Theory, which leads to improved model specifications and prediction accuracy. Details of methodologies used in the package can be found in Jiang, Z., Sharma, A., & Johnson, F. (2020) <doi:10.1029/2019WR026962>, Jiang, Z., Rashid, M. M., Johnson, F., & Sharma, A. (2020) <doi:10.1016/j.envsoft.2020.104907>, and Jiang, Z., Sharma, A., & Johnson, F. (2021) <doi:10.1016/J.JHYDROL.2021.126816>.
Maintained by Ze Jiang. Last updated 7 months ago.
Pareto:The Pareto, Piecewise Pareto and Generalized Pareto Distribution
Utilities for the Pareto, piecewise Pareto and generalized Pareto distribution that are useful for reinsurance pricing. In particular, the package provides a non-trivial algorithm that can be used to match the expected losses of a tower of reinsurance layers with a layer-independent collective risk model. The theoretical background of the matching algorithm and most other methods are described in Ulrich Riegel (2018) <doi:10.1007/s13385-018-0177-3>.
Maintained by Ulrich Riegel. Last updated 2 years ago.
limma:Linear Models for Microarray and Omics Data
Data analysis, linear models and differential expression for omics data.
Maintained by Gordon Smyth. Last updated 6 days ago.
SHT:Statistical Hypothesis Testing Toolbox
We provide a collection of statistical hypothesis testing procedures ranging from classical to modern methods for non-trivial settings such as high-dimensional scenarios. For the general treatment of statistical hypothesis testing, see the book by Lehmann and Romano (2005) <doi:10.1007/0-387-27605-X>.
Maintained by Kisung You. Last updated 19 days ago.
lava:Latent Variable Models
A general implementation of Structural Equation Models with latent variables (MLE, 2SLS, and composite likelihood estimators) with continuous, censored, and ordinal outcomes (Holst and Budtz-Joergensen (2013) <doi:10.1007/s00180-012-0344-y>). Mixture latent variable models and non-linear latent variable models (Holst and Budtz-Joergensen (2020) <doi:10.1093/biostatistics/kxy082>). The package also provides methods for graph exploration (d-separation, back-door criterion), simulation of general non-linear latent variable models, and estimation of influence functions for a broad range of statistical models.
Maintained by Klaus K. Holst. Last updated 2 months ago.
distributional:Vectorised Probability Distributions
Vectorised distribution objects with tools for manipulating, visualising, and using probability distributions. Designed to allow model prediction outputs to return distributions rather than their parameters, allowing users to directly interact with predictive distributions in a data-oriented workflow. In addition to providing generic replacements for p/d/q/r functions, other useful statistics can be computed including means, variances, intervals, and highest density regions.
Maintained by Mitchell OHara-Wild. Last updated 2 months ago.
lestat:A Package for Learning Statistics
Some simple objects and functions to do statistics using linear models and a Bayesian framework.
Maintained by Petter Mostad. Last updated 7 years ago.
ade4:Analysis of Ecological Data: Exploratory and Euclidean Methods in Environmental Sciences
Tools for multivariate data analysis. Several methods are provided for the analysis (i.e., ordination) of one-table (e.g., principal component analysis, correspondence analysis), two-table (e.g., coinertia analysis, redundancy analysis), three-table (e.g., RLQ analysis) and K-table (e.g., STATIS, multiple coinertia analysis). The philosophy of the package is described in Dray and Dufour (2007) <doi:10.18637/jss.v022.i04>.
Maintained by Aurélie Siberchicot. Last updated 13 days ago.
lme4:Linear Mixed-Effects Models using 'Eigen' and S4
Fit linear and generalized linear mixed-effects models. The models and their components are represented using S4 classes and methods. The core computational algorithms are implemented using the 'Eigen' C++ library for numerical linear algebra and 'RcppEigen' "glue".
Maintained by Ben Bolker. Last updated 3 days ago.
umx:Structural Equation Modeling and Twin Modeling in R
Quickly create, run, and report structural equation models, and twin models. See '?umx' for help, and umx_open_CRAN_page("umx") for NEWS. Timothy C. Bates, Michael C. Neale, Hermine H. Maes, (2019). umx: A library for Structural Equation and Twin Modelling in R. Twin Research and Human Genetics, 22, 27-41. <doi:10.1017/thg.2019.2>.
Maintained by Timothy C. Bates. Last updated 2 days ago.
bsvars:Bayesian Estimation of Structural Vector Autoregressive Models
Provides fast and efficient procedures for Bayesian analysis of Structural Vector Autoregressions. This package estimates a wide range of models, including homo-, heteroskedastic, and non-normal specifications. Structural models can be identified by adjustable exclusion restrictions, time-varying volatility, or non-normality. They all include a flexible three-level equation-specific local-global hierarchical prior distribution for the estimated level of shrinkage for autoregressive and structural parameters. Additionally, the package facilitates predictive and structural analyses such as impulse responses, forecast error variance and historical decompositions, forecasting, verification of heteroskedasticity, non-normality, and hypotheses on autoregressive parameters, as well as analyses of structural shocks, volatilities, and fitted values. Beautiful plots, informative summary functions, and extensive documentation including the vignette by Woźniak (2024) <doi:10.48550/arXiv.2410.15090> complement all this. The implemented techniques align closely with those presented in Lütkepohl, Shang, Uzeda, & Woźniak (2024) <doi:10.48550/arXiv.2404.11057>, Lütkepohl & Woźniak (2020) <doi:10.1016/j.jedc.2020.103862>, and Song & Woźniak (2021) <doi:10.1093/acrefore/9780190625979.013.174>. The 'bsvars' package is aligned regarding objects, workflows, and code structure with the R package 'bsvarSIGNs' by Wang & Woźniak (2024) <doi:10.32614/CRAN.package.bsvarSIGNs>, and they constitute an integrated toolset.
Maintained by Tomasz Woźniak. Last updated 1 month ago.
mrds:Mark-Recapture Distance Sampling
Animal abundance estimation via conventional, multiple covariate and mark-recapture distance sampling (CDS/MCDS/MRDS). Detection function fitting is performed via maximum likelihood. Also included are diagnostics and plotting for fitted detection functions. Abundance estimation is via a Horvitz-Thompson-like estimator.
Maintained by Laura Marshall. Last updated 2 months ago.
EnvStats:Package for Environmental Statistics, Including US EPA Guidance
Graphical and statistical analyses of environmental data, with focus on analyzing chemical concentrations and physical parameters, usually in the context of mandated environmental monitoring. Major environmental statistical methods found in the literature and regulatory guidance documents, with extensive help that explains what these methods do, how to use them, and where to find them in the literature. Numerous built-in data sets from regulatory guidance documents and environmental statistics literature. Includes scripts reproducing analyses presented in the book "EnvStats: An R Package for Environmental Statistics" (Millard, 2013, Springer, ISBN 978-1-4614-8455-4, <doi:10.1007/978-1-4614-8456-1>).
Maintained by Alexander Kowarik. Last updated 17 days ago.
waveslim:Basic Wavelet Routines for One-, Two-, and Three-Dimensional Signal Processing
Basic wavelet routines for time series (1D), image (2D) and array (3D) analysis. The code provided here is based on wavelet methodology developed in Percival and Walden (2000); Gencay, Selcuk and Whitcher (2001); the dual-tree complex wavelet transform (DTCWT) from Kingsbury (1999, 2001) as implemented by Selesnick; and Hilbert wavelet pairs (Selesnick 2001, 2002). All figures in chapters 4-7 of GSW (2001) are reproducible using this package and R code available at the book website(s) below.
Maintained by Brandon Whitcher. Last updated 10 months ago.
dae:Functions Useful in the Design and ANOVA of Experiments
The content falls into the following groupings: (i) Data, (ii) Factor manipulation functions, (iii) Design functions, (iv) ANOVA functions, (v) Matrix functions, (vi) Projector and canonical efficiency functions, and (vii) Miscellaneous functions. A vignette called 'DesignNotes' describes how to use the design functions for randomizing and assessing designs. The ANOVA functions facilitate the extraction of information when the 'Error' function has been used in the call to 'aov'.
Maintained by Chris Brien. Last updated 4 months ago.
gap:Genetic Analysis Package
As first reported [Zhao, J. H. 2007. "gap: Genetic Analysis Package". J Stat Soft 23(8):1-18. <doi:10.18637/jss.v023.i08>], it is designed as an integrated package for genetic data analysis of both population and family data. Currently, it contains functions for sample size calculations of both population-based and family-based designs, probability of familial disease aggregation, kinship calculation, statistics in linkage analysis, and association analysis involving genetic markers including haplotype analysis with or without environmental covariates. Over the years, the package has been developed in between many projects, which is also in line with its name (gap).
Maintained by Jing Hua Zhao. Last updated 16 days ago.
VarReg:Semi-Parametric Variance Regression
Methods for fitting semi-parametric mean and variance models, with normal or censored data. Extended to allow a regression in the location, scale and shape parameters, and further for multiple regression in each.
Maintained by Kristy Robledo. Last updated 2 years ago.
dreamlet:Scalable differential expression analysis of single cell transcriptomics datasets with complex study designs
Recent advances in single cell/nucleus transcriptomic technology have enabled collection of cohort-scale datasets to study cell type specific gene expression differences associated with disease state, stimulus, and genetic regulation. The scale of these data, complex study designs, and low read count per cell mean that characterizing cell type specific molecular mechanisms requires a user-friendly, purpose-built analytical framework. We have developed the dreamlet package that applies a pseudobulk approach and fits a regression model for each gene and cell cluster to test differential expression across individuals associated with a trait of interest. Use of precision-weighted linear mixed models enables accounting for repeated measures study designs, high dimensional batch effects, and varying sequencing depth or observed cells per biosample.
Maintained by Gabriel Hoffman. Last updated 5 months ago.
keras3:R Interface to 'Keras'
Interface to 'Keras', a high-level neural networks API. 'Keras' was developed with a focus on enabling fast experimentation, supports both convolution based networks and recurrent networks (as well as combinations of the two), and runs seamlessly on both CPU and GPU devices.
Maintained by Tomasz Kalinowski. Last updated 4 days ago.
TwoSampleMR:Two Sample MR Functions and Interface to MRC Integrative Epidemiology Unit OpenGWAS Database
A package for performing Mendelian randomization using GWAS summary data. It uses the IEU OpenGWAS database to automatically obtain data, and a wide range of methods to run the analysis.
Maintained by Gibran Hemani. Last updated 11 days ago.
sandwich:Robust Covariance Matrix Estimators
Object-oriented software for model-robust covariance matrix estimators. Starting out from the basic robust Eicker-Huber-White sandwich covariance, methods include: heteroscedasticity-consistent (HC) covariances for cross-section data; heteroscedasticity- and autocorrelation-consistent (HAC) covariances for time series data (such as Andrews' kernel HAC, Newey-West, and WEAVE estimators); clustered covariances (one-way and multi-way); panel and panel-corrected covariances; outer-product-of-gradients covariances; and (clustered) bootstrap covariances. All methods are applicable to (generalized) linear model objects fitted by lm() and glm() but can also be adapted to other classes through S3 methods. Details can be found in Zeileis et al. (2020) <doi:10.18637/jss.v095.i01>, Zeileis (2004) <doi:10.18637/jss.v011.i10> and Zeileis (2006) <doi:10.18637/jss.v016.i09>.
Maintained by Achim Zeileis. Last updated 2 months ago.
Rdimtools:Dimension Reduction and Estimation Methods
We provide linear and nonlinear dimension reduction techniques. Intrinsic dimension estimation methods for exploratory analysis are also provided. For more details on the package, see the paper by You and Shung (2022) <doi:10.1016/j.simpa.2022.100414>.
Maintained by Kisung You. Last updated 2 years ago.
bvartools:Bayesian Inference of Vector Autoregressive and Error Correction Models
Assists in the set-up of algorithms for Bayesian inference of vector autoregressive (VAR) and error correction (VEC) models. Functions for posterior simulation, forecasting, impulse response analysis and forecast error variance decomposition are largely based on the introductory texts of Chan, Koop, Poirier and Tobias (2019, ISBN: 9781108437493), Koop and Korobilis (2010) <doi:10.1561/0800000013> and Luetkepohl (2006, ISBN: 9783540262398).
Maintained by Franz X. Mohr. Last updated 1 year ago.
lmomco:L-Moments, Censored L-Moments, Trimmed L-Moments, L-Comoments, and Many Distributions
Extensive functions for Lmoments (LMs) and probability-weighted moments (PWMs), distribution parameter estimation, LMs for distributions, LM ratio diagrams, multivariate Lcomoments, and asymmetric (asy) trimmed LMs (TLMs). Maximum likelihood and maximum product spacings estimation are available. Right-tail and left-tail LM censoring by threshold or indicator variable are available. LMs of residual (resid) and reversed (rev) residual life are implemented along with 13 quantile operators for reliability analyses. Exact analytical bootstrap estimates of order statistics, LMs, and LM var-covars are available. Harri-Coble Tau34-squared Normality Test is available. Distributions with L, TL, and added (+) support for right-tail censoring (RC) encompass: Asy Exponential (Exp) Power [L], Asy Triangular [L], Cauchy [TL], Eta-Mu [L], Exp. [L], Gamma [L], Generalized (Gen) Exp Poisson [L], Gen Extreme Value [L], Gen Lambda [L, TL], Gen Logistic [L], Gen Normal [L], Gen Pareto [L+RC, TL], Govindarajulu [L], Gumbel [L], Kappa [L], Kappa-Mu [L], Kumaraswamy [L], Laplace [L], Linear Mean Residual Quantile Function [L], Normal [L], 3p log-Normal [L], Pearson Type III [L], Polynomial Density-Quantile 3 and 4 [L], Rayleigh [L], Rev-Gumbel [L+RC], Rice [L], Singh Maddala [L], Slash [TL], 3p Student t [L], Truncated Exponential [L], Wakeby [L], and Weibull [L].
Maintained by William Asquith. Last updated 1 month ago.
cvAUC:Cross-Validated Area Under the ROC Curve Confidence Intervals
Tools for working with and evaluating cross-validated area under the ROC curve (AUC) estimators. The primary functions of the package are ci.cvAUC and ci.pooled.cvAUC, which report cross-validated AUC and compute confidence intervals for cross-validated AUC estimates based on influence curves for i.i.d. and pooled repeated measures data, respectively. One benefit to using influence curve based confidence intervals is that they require much less computation time than bootstrapping methods. The utility functions, AUC and cvAUC, are simple wrappers for functions from the ROCR package.
Maintained by Erin LeDell. Last updated 3 years ago.
stochtree:Stochastic Tree Ensembles (XBART and BART) for Supervised Learning and Causal Inference
Flexible stochastic tree ensemble software. Robust implementations of Bayesian Additive Regression Trees (BART) Chipman, George, McCulloch (2010) <doi:10.1214/09-AOAS285> for supervised learning and Bayesian Causal Forests (BCF) Hahn, Murray, Carvalho (2020) <doi:10.1214/19-BA1195> for causal inference. Enables model serialization and parallel sampling and provides a low-level interface for custom stochastic forest samplers.
Maintained by Drew Herren. Last updated 18 days ago.
BIGL:Biochemically Intuitive Generalized Loewe Model
Response surface methods for drug synergy analysis. Available methods include generalized and classical Loewe formulations as well as Highest Single Agent methodology. Response surfaces can be plotted in an interactive 3-D plot and formal statistical tests for presence of synergistic effects are available. Implemented methods and tests are described in the article "BIGL: Biochemically Intuitive Generalized Loewe null model for prediction of the expected combined effect compatible with partial agonism and antagonism" by Koen Van der Borght, Annelies Tourny, Rytis Bagdziunas, Olivier Thas, Maxim Nazarov, Heather Turner, Bie Verbist & Hugo Ceulemans (2017) <doi:10.1038/s41598-017-18068-5>.
Maintained by Maxim Nazarov. Last updated 2 years ago.
nvmix:Multivariate Normal Variance Mixtures
Functions for working with (grouped) multivariate normal variance mixture distributions (evaluation of distribution functions and densities, random number generation and parameter estimation), including Student's t distribution for non-integer degrees-of-freedom as well as the grouped t distribution and copula with multiple degrees-of-freedom parameters. See <doi:10.18637/jss.v102.i02> for a high-level description of select functionality.
Maintained by Marius Hofert. Last updated 1 year ago.
laeken:Estimation of Indicators on Social Exclusion and Poverty
Estimation of indicators on social exclusion and poverty, as well as Pareto tail modeling for empirical income distributions.
Maintained by Andreas Alfons. Last updated 1 year ago.
PEcAn.uncertainty:PEcAn Functions Used for Propagating and Partitioning Uncertainties in Ecological Forecasts and Reanalysis
The Predictive Ecosystem Carbon Analyzer (PEcAn) is a scientific workflow management tool that is designed to simplify the management of model parameterization, execution, and analysis. The goal of PEcAn is to streamline the interaction between data and models, and to improve the efficacy of scientific investigation.
Maintained by David LeBauer. Last updated 2 days ago.
sirt:Supplementary Item Response Theory Models
Supplementary functions for item response models aiming to complement existing R packages. The functionality includes among others multidimensional compensatory and noncompensatory IRT models (Reckase, 2009, <doi:10.1007/978-0-387-89976-3>), MCMC for hierarchical IRT models and testlet models (Fox, 2010, <doi:10.1007/978-1-4419-0742-4>), NOHARM (McDonald, 1982, <doi:10.1177/014662168200600402>), Rasch copula model (Braeken, 2011, <doi:10.1007/s11336-010-9190-4>; Schroeders, Robitzsch & Schipolowski, 2014, <doi:10.1111/jedm.12054>), faceted and hierarchical rater models (DeCarlo, Kim & Johnson, 2011, <doi:10.1111/j.1745-3984.2011.00143.x>), ordinal IRT model (ISOP; Scheiblechner, 1995, <doi:10.1007/BF02301417>), DETECT statistic (Stout, Habing, Douglas & Kim, 1996, <doi:10.1177/014662169602000403>), local structural equation modeling (LSEM; Hildebrandt, Luedtke, Robitzsch, Sommer & Wilhelm, 2016, <doi:10.1080/00273171.2016.1142856>).
Maintained by Alexander Robitzsch. Last updated 3 months ago.
circular:Circular Statistics
Circular Statistics, from "Topics in Circular Statistics" (2001) S. Rao Jammalamadaka and A. SenGupta, World Scientific.
Maintained by Eduardo García-Portugués. Last updated 7 months ago.
spc:Statistical Process Control -- Calculation of ARL and Other Control Chart Performance Measures
Evaluation of control charts by means of the zero-state, steady-state ARL (Average Run Length) and RL quantiles. Setting up control charts for given in-control ARL. The control charts under consideration are one- and two-sided EWMA, CUSUM, and Shiryaev-Roberts schemes for monitoring the mean or variance of normally distributed independent data. ARL calculation of the same set of schemes under drift (in the mean) is added. Finally, all ARL measures for the multivariate EWMA (MEWMA) are provided.
Maintained by Sven Knoth. Last updated 7 months ago.
Distance:Distance Sampling Detection Function and Abundance Estimation
A simple way of fitting detection functions to distance sampling data for both line and point transects. Adjustment term selection, left and right truncation as well as monotonicity constraints and binning are supported. Abundance and density estimates can also be calculated (via a Horvitz-Thompson-like estimator) if survey area information is provided. See Miller et al. (2019) <doi:10.18637/jss.v089.i01> for more information on methods and <> for example analyses.
Maintained by Laura Marshall. Last updated 11 days ago.
hierfstat:Estimation and Tests of Hierarchical F-Statistics
Estimates hierarchical F-statistics from haploid or diploid genetic data with any number of levels in the hierarchy, following the algorithm of Yang (Evolution (1998), 52:950). Tests the significance of each F and variance component via randomisations, using the likelihood-ratio statistic G (Goudet et al. (1996) <>). Estimates genetic diversity statistics for haploid and diploid genetic datasets in various formats, including inbreeding and coancestry coefficients, and population-specific F-statistics following Weir and Goudet (2017) <>.
Maintained by Jerome Goudet. Last updated 4 months ago.
lawstat:Tools for Biostatistics, Public Policy, and Law
Statistical tests widely utilized in biostatistics, public policy, and law. Along with the well-known tests for equality of means and variances, randomness, and measures of relative variability, the package contains new robust tests of symmetry, omnibus and directional tests of normality, and their graphical counterparts such as robust QQ plot, robust trend tests for variances, etc. All implemented tests and methods are illustrated by simulations and real-life examples from legal statistics, economics, and biostatistics.
Maintained by Yulia R. Gel. Last updated 2 years ago.
dearseq:Differential Expression Analysis for RNA-seq data through a robust variance component test
Differential expression analysis of RNA-seq data with a variance component score test accounting for data heteroscedasticity through precision weights. Performs both gene-wise and gene set analyses, and can deal with repeated or longitudinal data. Methods are detailed in: i) Agniel D & Hejblum BP (2017) Variance component score test for time-course gene set analysis of longitudinal RNA-seq data, Biostatistics, 18(4):589-604; and ii) Gauthier M, Agniel D, Thiébaut R & Hejblum BP (2020) dearseq: a variance component score test for RNA-Seq differential analysis that effectively controls the false discovery rate, NAR Genomics and Bioinformatics, 2(4):lqaa093.
Maintained by Boris P. Hejblum. Last updated 5 months ago.
gustave:A User-Oriented Statistical Toolkit for Analytical Variance Estimation
Provides a toolkit for analytical variance estimation in survey sampling. Apart from the implementation of standard variance estimators, its main feature is to help the sampling expert produce easy-to-use variance estimation "wrappers", where systematic operations (linearization, domain estimation) are handled in a consistent and transparent way.
Maintained by Khaled Larbi. Last updated 1 year ago.
gets:General-to-Specific (GETS) Modelling and Indicator Saturation Methods
Automated General-to-Specific (GETS) modelling of the mean and variance of a regression, and indicator saturation methods for detecting and testing for structural breaks in the mean; see Pretis, Reade and Sucarrat (2018) <doi:10.18637/jss.v086.i03> for an overview of the package. In advanced use, the estimator and diagnostic tests can be fully user-specified; see Sucarrat (2021) <doi:10.32614/RJ-2021-024>.
Maintained by Genaro Sucarrat. Last updated 8 months ago.
DEqMS:a tool to perform statistical analysis of differential protein expression for quantitative proteomics data.
DEqMS is developed on top of Limma. However, Limma assumes the same prior variance for all genes. In proteomics, the accuracy of protein abundance estimates varies with the number of peptides/PSMs quantified in both label-free and labelled data. Protein quantification by multiple peptides or PSMs is more accurate. The DEqMS package is able to estimate different prior variances for proteins quantified by different numbers of PSMs/peptides, therefore achieving better accuracy. The package can be applied to analyze both label-free and labelled proteomics data.
Maintained by Yafeng Zhu. Last updated 5 months ago.
Reacnorm:Perform a Partition of Variance of Reaction Norms
Partitions the phenotypic variance of a plastic trait, studied through its reaction norm. The variance partition distinguishes between the variance arising from the average shape of the reaction norms (V_Plas) and the (additive) genetic variance. The latter is itself separated into an environment-blind component (V_G/V_A) and the component arising from plasticity (V_GxE/V_AxE). The package also provides a way to further partition V_Plas into aspects (slope/curvature) of the shape of the average reaction norm (pi-decomposition) and partition V_Add (gamma-decomposition) and V_AxE (iota-decomposition) into the impact of genetic variation in the reaction norm parameters. Reference: de Villemereuil & Chevin (2025) <doi:10.32942/X2NC8B>.
Maintained by Pierre de Villemereuil. Last updated 18 days ago.
fullfact:Full Factorial Breeding Analysis
We facilitate the analysis of full factorial mating designs with mixed-effects models. The package contains six vignettes containing detailed examples.
Maintained by Aimee Lee Houde. Last updated 1 year ago.
powertools:Power and Sample Size Tools
Power and sample size calculations for a variety of study designs and outcomes. Methods include t tests, ANOVA (including tests for interactions, simple effects and contrasts), proportions, categorical data (chi-square tests and proportional odds), linear, logistic and Poisson regression, alternative and coprimary endpoints, power for confidence intervals, correlation coefficient tests, cluster randomized trials, individually randomized group treatment trials, multisite trials, treatment-by-covariate interaction effects and nonparametric tests of location. Utilities are provided for computing various effect sizes. Companion package to the book "Power and Sample Size in R", Crespi (2025, ISBN:9781138591622).
Maintained by Catherine M. Crespi. Last updated 5 days ago.
M3Drop:Michaelis-Menten Modelling of Dropouts in single-cell RNASeq
This package fits a model to the pattern of dropouts in single-cell RNASeq data. This model is used as a null to identify significantly variable (i.e. differentially expressed) genes for use in downstream analysis, such as clustering cells. Also includes a method for calculating exact Pearson residuals in UMI-tagged data using a library-size aware negative binomial model.
Maintained by Tallulah Andrews. Last updated 5 months ago.
neopolars:R Bindings for the 'polars' Rust Library
Lightning-fast 'DataFrame' library written in 'Rust'. Convert R data to 'Polars' data and vice versa. Perform fast, lazy, larger-than-memory and optimized data queries. 'Polars' is interoperable with the package 'arrow', as both are based on the 'Apache Arrow' Columnar Format.
Maintained by Tatsuya Shima. Last updated 1 day ago.
mixOmics:Omics Data Integration Project
Multivariate methods are well suited to large omics data sets where the number of variables (e.g. genes, proteins, metabolites) is much larger than the number of samples (patients, cells, mice). They have the appealing properties of reducing the dimension of the data by using instrumental variables (components), which are defined as combinations of all variables. Those components are then used to produce useful graphical outputs that enable better understanding of the relationships and correlation structures between the different data sets that are integrated. mixOmics offers a wide range of multivariate methods for the exploration and integration of biological datasets with a particular focus on variable selection. The package proposes several sparse multivariate models we have developed to identify the key variables that are highly correlated, and/or explain the biological outcome of interest. The data that can be analysed with mixOmics may come from high throughput sequencing technologies, such as omics data (transcriptomics, metabolomics, proteomics, metagenomics etc) but also beyond the realm of omics (e.g. spectral imaging). The methods implemented in mixOmics can also handle missing values without having to delete entire rows with missing data. A non-exhaustive list of methods includes variants of generalised Canonical Correlation Analysis, sparse Partial Least Squares and sparse Discriminant Analysis. Recently we implemented integrative methods to combine multiple data sets: N-integration with variants of Generalised Canonical Correlation Analysis and P-integration with variants of multi-group Partial Least Squares.
Maintained by Eva Hamrud. Last updated 4 days ago.
cSEM:Composite-Based Structural Equation Modeling
Estimate, assess, test, and study linear, nonlinear, hierarchical and multigroup structural equation models using composite-based approaches and procedures, including estimation techniques such as partial least squares path modeling (PLS-PM) and its derivatives (PLSc, ordPLSc, robustPLSc), generalized structured component analysis (GSCA), generalized structured component analysis with uniqueness terms (GSCAm), generalized canonical correlation analysis (GCCA), principal component analysis (PCA), factor score regression (FSR) using sum score, regression or Bartlett scores (including bias correction using Croon’s approach), as well as several tests and typical postestimation procedures (e.g., verify admissibility of the estimates, assess the model fit, test the model fit etc.).
Maintained by Florian Schuberth. Last updated 17 days ago.
picante:Integrating Phylogenies and Ecology
Functions for phylocom integration, community analyses, null-models, traits and evolution. Implements numerous ecophylogenetic approaches including measures of community phylogenetic and trait diversity, phylogenetic signal, estimation of trait values for unobserved taxa, null models for community and phylogeny randomizations, and utility functions for data input/output and phylogeny plotting. A full description of package functionality and methods is provided by Kembel et al. (2010) <doi:10.1093/bioinformatics/btq166>.
Maintained by Steven W. Kembel. Last updated 2 years ago.
rQCC:Robust Quality Control Chart
Constructs various robust quality control charts based on the median or Hodges-Lehmann estimator (location) and the median absolute deviation (MAD) or Shamos estimator (scale). The estimators used for the robust control charts are all unbiased with a sample of finite size. For more details, see Park, Kim and Wang (2022) <doi:10.1080/03610918.2019.1699114>. In addition, using this R package, the conventional quality control charts such as X-bar, S, R, p, np, u, c, g, h, and t charts are also easily constructed. This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2022R1A2C1091319).
Maintained by Chanseok Park. Last updated 1 year ago.
psychometric:Applied Psychometric Theory
Contains functions useful for correlation theory, meta-analysis (validity-generalization), reliability, item analysis, inter-rater reliability, and classical utility.
Maintained by Thomas D. Fletcher. Last updated 1 year ago.
rempsyc:Convenience Functions for Psychology
Make your workflow faster and easier. Easily customizable plots (via 'ggplot2'), nice APA tables (following the style of the *American Psychological Association*) exportable to Word (via 'flextable'), easily run statistical tests or check assumptions, and automatize various other tasks.
Maintained by Rémi Thériault. Last updated 1 month ago.
tsvr:Timescale-Specific Variance Ratio for Use in Community Ecology
Tools for timescale decomposition of the classic variance ratio of community ecology. Tools are as described in Zhao et al (in prep), extending commonly used methods introduced by Peterson et al (1975) <doi:10.2307/1936306>.
Maintained by Daniel C. Reuman. Last updated 4 years ago.
riskRegression:Risk Regression Models and Prediction Scores for Survival Analysis with Competing Risks
Implementation of the following methods for event history analysis. Risk regression models for survival endpoints also in the presence of competing risks are fitted using binomial regression based on a time sequence of binary event status variables. A formula interface for the Fine-Gray regression model and an interface for the combination of cause-specific Cox regression models. A toolbox for assessing and comparing performance of risk predictions (risk markers and risk prediction models). Prediction performance is measured by the Brier score and the area under the ROC curve for binary possibly time-dependent outcome. Inverse probability of censoring weighting and pseudo values are used to deal with right censored data. Lists of risk markers and lists of risk models are assessed simultaneously. Cross-validation repeatedly splits the data, trains the risk prediction models on one part of each split and then summarizes and compares the performance across splits.
Maintained by Thomas Alexander Gerds. Last updated 18 days ago.
transformGamPoi:Variance Stabilizing Transformation for Gamma-Poisson Models
Variance-stabilizing transformations help with the analysis of heteroskedastic data (i.e., data where the variance is not constant, like count data). This package provides two types of variance-stabilizing transformations: (1) methods based on the delta method (e.g., 'acosh', 'log(x+1)'), (2) model residual based (Pearson and randomized quantile residuals).
Maintained by Constantin Ahlmann-Eltze. Last updated 5 months ago.
PLNmodels:Poisson Lognormal Models
The Poisson-lognormal model and variants (Chiquet, Mariadassou and Robin, 2021 <doi:10.3389/fevo.2021.588292>) can be used for a variety of multivariate problems when count data are at play, including principal component analysis for count data, discriminant analysis, model-based clustering and network inference. Implements variational algorithms to fit such models, accompanied by a set of functions for visualization and diagnostics.
Maintained by Julien Chiquet. Last updated 4 days ago.
mvMORPH:Multivariate Comparative Tools for Fitting Evolutionary Models to Morphometric Data
Fits multivariate (Brownian Motion, Early Burst, ACDC, Ornstein-Uhlenbeck and Shifts) models of continuous traits evolution on trees and time series. 'mvMORPH' also proposes high-dimensional multivariate comparative tools (linear models using Generalized Least Squares and multivariate tests) based on penalized likelihood. See Clavel et al. (2015) <DOI:10.1111/2041-210X.12420>, Clavel et al. (2019) <DOI:10.1093/sysbio/syy045>, and Clavel & Morlon (2020) <DOI:10.1093/sysbio/syaa010>.
Maintained by Julien Clavel. Last updated 1 month ago.
RaceID:Identification of Cell Types, Inference of Lineage Trees, and Prediction of Noise Dynamics from Single-Cell RNA-Seq Data
Application of 'RaceID' allows inference of cell types and prediction of lineage trees by the 'StemID2' algorithm (Herman, J.S., Sagar, Grun D. (2018) <DOI:10.1038/nmeth.4662>). 'VarID2' is part of this package and allows quantification of biological gene expression noise at single-cell resolution (Rosales-Alvarez, R.E., Rettkowski, J., Herman, J.S., Dumbovic, G., Cabezas-Wallscheid, N., Grun, D. (2023) <DOI:10.1186/s13059-023-02974-1>).
Maintained by Dominic Grün. Last updated 4 months ago.
popbio:Construction and Analysis of Matrix Population Models
Construct and analyze projection matrix models from a demography study of marked individuals classified by age or stage. The package covers methods described in Matrix Population Models by Caswell (2001) and Quantitative Conservation Biology by Morris and Doak (2002).
Maintained by Chris Stubben. Last updated 12 months ago.
explore:Simplifies Exploratory Data Analysis
Interactive data exploration with one line of code, automated reporting, or an easy-to-remember set of tidy functions for low-code exploratory data analysis.
Maintained by Roland Krasser. Last updated 3 months ago.
breedR:Statistical Methods for Forest Genetic Resources Analysts
Statistical tools to build predictive models for the breeders' community. It aims to assess the genetic value of individuals under a number of situations, including spatial autocorrelation, genetic/environment interaction and competition. It is under active development as part of the Trees4Future project and was developed with forest genetic trials in mind, but it can be used for animals or other situations as well.
Maintained by Facundo Muñoz. Last updated 8 months ago.
matrixStats:Functions that Apply to Rows and Columns of Matrices (and to Vectors)
High-performing functions operating on rows and columns of matrices, e.g. col / rowMedians(), col / rowRanks(), and col / rowSds(). Functions are optimized per data type and for subsetted calculations such that both memory usage and processing time are minimized. There are also optimized vector-based methods, e.g. binMeans(), madDiff() and weightedMedian().
Maintained by Henrik Bengtsson. Last updated 2 months ago.
POV:Partition of Variation Variance Component Analysis Method
An implementation of the Partition of Variation (POV) method as developed by Dr. Thomas A Little <> in 1993 for the analysis of semiconductor data for hard drive manufacturing. POV is based on sequential sum of squares and is an exact method that explains all observed variation. It quantitates both the between and within factor variation effects and can quantitate the influence of both continuous and categorical factors.
Maintained by Paul Deen. Last updated 4 years ago.
metaSEM:Meta-Analysis using Structural Equation Modeling
A collection of functions for conducting meta-analysis using a structural equation modeling (SEM) approach via the 'OpenMx' and 'lavaan' packages. It also implements various procedures to perform meta-analytic structural equation modeling on the correlation and covariance matrices, see Cheung (2015) <doi:10.3389/fpsyg.2014.01521>.
Maintained by Mike Cheung. Last updated 10 days ago.
move:Visualizing and Analyzing Animal Track Data
Contains functions to access movement data stored in '' as well as tools to visualize and statistically analyze animal movement data, including functions to calculate dynamic Brownian Bridge Movement Models. Move helps address movement ecology questions.
Maintained by Bart Kranstauber. Last updated 4 months ago.
RJafroc:Artificial Intelligence Systems and Observer Performance
Analyzing the performance of artificial intelligence (AI) systems/algorithms characterized by a 'search-and-report' strategy. Historically observer performance has dealt with measuring radiologists' performances in search tasks, e.g., searching for lesions in medical images and reporting them, but the implicit location information has been ignored. The implemented methods apply to analyzing the absolute and relative performances of AI systems, comparing AI performance to a group of human readers or optimizing the reporting threshold of an AI system. In addition to performing historical receiver operating characteristic (ROC) analysis (localization information ignored), the software also performs free-response receiver operating characteristic (FROC) analysis, where lesion localization information is used. A book using the software has been published: Chakraborty DP: Observer Performance Methods for Diagnostic Imaging - Foundations, Modeling, and Applications with R-Based Examples, Taylor-Francis LLC; 2017: <>. Online updates to this book, which use the software, are at <>, <> and at <>. Supported data collection paradigms are the ROC, FROC and the location ROC (LROC). ROC data consists of a single rating per image, where a rating is the perceived confidence level that the image is that of a diseased patient. An ROC curve is a plot of true positive fraction vs. false positive fraction. FROC data consists of a variable number (zero or more) of mark-rating pairs per image, where a mark is the location of a reported suspicious region and the rating is the confidence level that it is a real lesion. LROC data consists of a rating and a location of the most suspicious region, for every image. Four models of observer performance, and curve-fitting software, are implemented: the binormal model (BM), the contaminated binormal model (CBM), the correlated contaminated binormal model (CORCBM), and the radiological search model (RSM).
Unlike the binormal model, CBM, CORCBM and RSM predict 'proper' ROC curves that do not inappropriately cross the chance diagonal. Additionally, RSM parameters are related to search performance (not measured in conventional ROC analysis) and classification performance. Search performance refers to finding lesions, i.e., true positives, while simultaneously not finding false positive locations. Classification performance measures the ability to distinguish between true and false positive locations. Knowing these separate performances allows principled optimization of reader or AI system performance. This package supersedes Windows JAFROC (jackknife alternative FROC) software V4.2.1, <>. Package functions are organized as follows. Data file related function names are preceded by 'Df', curve fitting functions by 'Fit', included data sets by 'dataset', plotting functions by 'Plot', significance testing functions by 'St', sample size related functions by 'Ss', data simulation functions by 'Simulate' and utility functions by 'Util'. Implemented are figures of merit (FOMs) for quantifying performance and functions for visualizing empirical or fitted operating characteristics: e.g., ROC, FROC, alternative FROC (AFROC) and weighted AFROC (wAFROC) curves. For fully crossed study designs significance testing of reader-averaged FOM differences between modalities is implemented via either Dorfman-Berbaum-Metz or the Obuchowski-Rockette methods. Also implemented is single modality analysis, which allows comparison of performance of a group of radiologists to a specified value, or comparison of AI to a group of radiologists interpreting the same cases. Crossed-modality analysis is implemented wherein there are two crossed modality factors and the aim is to determine performance in each modality factor averaged over all levels of the second factor.
Sample size estimation tools are provided for ROC and FROC studies; these use estimates of the relevant variances from a pilot study to predict required numbers of readers and cases in a pivotal study to achieve the desired power. Utility and data file manipulation functions allow data to be read in any of the currently used input formats, including Excel, and the results of the analysis can be viewed in text or Excel output files. The methods are illustrated with several included datasets from the author's collaborations. This update includes improvements to the code, some as a result of user-reported bugs and new feature requests, and others discovered during ongoing testing and code simplification.
Maintained by Dev Chakraborty. Last updated 5 months ago.
tidyLPA:Easily Carry Out Latent Profile Analysis (LPA) Using Open-Source or Commercial Software
Easily carry out latent profile analysis ("LPA"), determine the correct number of classes based on best practices, and tabulate and plot the results. Provides functionality to estimate commonly-specified models with free means, variances, and covariances for each profile. Follows a tidy approach, in that output is in the form of a data frame that can subsequently be computed on. Models can be estimated using the free open source 'R' packages 'Mclust' and 'OpenMx', or using the commercial program 'MPlus', via the 'MplusAutomation' package.
Maintained by Joshua M Rosenberg. Last updated 1 year ago.
coxme:Mixed Effects Cox Models
Fit Cox proportional hazards models containing both fixed and random effects. The random effects can have a general form, of which familial interactions (a "kinship" matrix) is a particular special case. Note that the simplest case of a mixed effects Cox model, i.e. a single random per-group intercept, is also called a "frailty" model. The approach is based on Ripatti and Palmgren, Biometrics 2002.
Maintained by Terry M. Therneau. Last updated 7 months ago.
HH:Statistical Analysis and Data Display: Heiberger and Holland
Support software for Statistical Analysis and Data Display (Second Edition, Springer, ISBN 978-1-4939-2121-8, 2015) and (First Edition, Springer, ISBN 0-387-40270-5, 2004) by Richard M. Heiberger and Burt Holland. This contemporary presentation of statistical methods features extensive use of graphical displays for exploring data and for displaying the analysis. The second edition includes redesigned graphics and additional chapters. The authors emphasize how to construct and interpret graphs, discuss principles of graphical design, and show how accompanying traditional tabular results are used to confirm the visual impressions derived directly from the graphs. Many of the graphical formats are novel and appear here for the first time in print. All chapters have exercises. All functions introduced in the book are in the package. R code for all examples, both graphs and tables, in the book is included in the scripts directory of the package.
Maintained by Richard M. Heiberger. Last updated 1 month ago.
sampling:Survey Sampling
Functions to draw random samples using different sampling schemes are available. Functions are also provided to obtain (generalized) calibration weights, different estimators, as well as some variance estimators.
Maintained by Alina Matei. Last updated 1 year ago.
insight:Easy Access to Model Information for Various Model Objects
A tool to provide easy, intuitive and consistent access to information contained in various R models, like model formulas, model terms, information about random effects, data that was used to fit the model or data from response variables. 'insight' mainly revolves around two types of functions: Functions that find (the names of) information, starting with 'find_', and functions that get the underlying data, starting with 'get_'. The package has a consistent syntax and works with many different model objects, where otherwise functions to access this information are missing.
Maintained by Daniel Lüdecke. Last updated 5 days ago.
Seurat:Tools for Single Cell Genomics
A toolkit for quality control, analysis, and exploration of single cell RNA sequencing data. 'Seurat' aims to enable users to identify and interpret sources of heterogeneity from single cell transcriptomic measurements, and to integrate diverse types of single cell data. See Satija R, Farrell J, Gennert D, et al (2015) <doi:10.1038/nbt.3192>, Macosko E, Basu A, Satija R, et al (2015) <doi:10.1016/j.cell.2015.05.002>, Stuart T, Butler A, et al (2019) <doi:10.1016/j.cell.2019.05.031>, and Hao, Hao, et al (2020) <doi:10.1101/2020.10.12.335331> for more details.
Maintained by Paul Hoffman. Last updated 1 year ago.
vcpen:Penalized Variance Components Analysis
Method to perform penalized variance component analysis.
Maintained by Jason Sinnwell. Last updated 3 years ago.
spectrolab:Class and Methods for Spectral Data
Input/Output, processing and visualization of spectra taken with different spectrometers, including SVC (Spectra Vista), ASD and PSR (Spectral Evolution). Implements an S3 class spectra that other packages can build on. Provides methods to access, plot, manipulate, splice sensor overlap, vector normalize and smooth spectra.
Maintained by Jose Eduardo Meireles. Last updated 2 months ago.
sparseMatrixStats:Summary Statistics for Rows and Columns of Sparse Matrices
High performance functions for row and column operations on sparse matrices. For example: col / rowMeans2, col / rowMedians, col / rowVars etc. Currently, the optimizations are limited to data in the column sparse format. This package is inspired by the matrixStats package by Henrik Bengtsson.
Maintained by Constantin Ahlmann-Eltze. Last updated 5 months ago.
lcmm:Extended Mixed Models Using Latent Classes and Latent Processes
Estimation of various extensions of the mixed models including latent class mixed models, joint latent class mixed models, mixed models for curvilinear outcomes, mixed models for multivariate longitudinal outcomes using a maximum likelihood estimation method (Proust-Lima, Philipps, Liquet (2017) <doi:10.18637/jss.v078.i02>).
Maintained by Cecile Proust-Lima. Last updated 1 month ago.
QGglmm:Estimate Quantitative Genetics Parameters from Generalised Linear Mixed Models
Compute various quantitative genetics parameters from a Generalised Linear Mixed Model (GLMM) estimates. Especially, it yields the observed phenotypic mean, phenotypic variance and additive genetic variance.
Maintained by Pierre de Villemereuil. Last updated 2 months ago.
genefilter:genefilter: methods for filtering genes from high-throughput experiments
Some basic functions for filtering genes.
Maintained by Bioconductor Package Maintainer. Last updated 5 months ago.
MatrixGenerics:S4 Generic Summary Statistic Functions that Operate on Matrix-Like Objects
S4 generic functions modeled after the 'matrixStats' API for alternative matrix implementations. Packages with alternative matrix implementations can depend on this package and implement the generic functions that are defined here for a useful set of row and column summary statistics. Other package developers can import this package and handle different matrix implementations without worrying about incompatibilities.
Maintained by Peter Hickey. Last updated 2 months ago.
glmmTMB:Generalized Linear Mixed Models using Template Model Builder
Fit linear and generalized linear mixed models with various extensions, including zero-inflation. The models are fitted using maximum likelihood estimation via 'TMB' (Template Model Builder). Random effects are assumed to be Gaussian on the scale of the linear predictor and are integrated out using the Laplace approximation. Gradients are calculated using automatic differentiation.
Maintained by Mollie Brooks. Last updated 12 days ago.
scater:Single-Cell Analysis Toolkit for Gene Expression Data in R
A collection of tools for doing various analyses of single-cell RNA-seq gene expression data, with a focus on quality control and visualization.
Maintained by Alan OCallaghan. Last updated 9 days ago.
performance:Assessment of Regression Models Performance
Utilities for computing measures to assess model quality, which are not directly provided by R's 'base' or 'stats' packages. These include e.g. measures like r-squared, intraclass correlation coefficient (Nakagawa, Johnson & Schielzeth (2017) <doi:10.1098/rsif.2017.0213>), root mean squared error or functions to check models for overdispersion, singularity or zero-inflation and more. Functions apply to a large variety of regression models, including generalized linear models, mixed effects models and Bayesian models. References: Lüdecke et al. (2021) <doi:10.21105/joss.03139>.
Maintained by Daniel Lüdecke. Last updated 19 days ago.
pwr:Basic Functions for Power Analysis
Power analysis functions along the lines of Cohen (1988).
Maintained by Helios De Rosario. Last updated 1 year ago.
MendelianRandomization:Mendelian Randomization Package
Encodes several methods for performing Mendelian randomization analyses with summarized data. Summarized data on genetic associations with the exposure and with the outcome can be obtained from large consortia. These data can be used for obtaining causal estimates using instrumental variable methods.
Maintained by Stephen Burgess. Last updated 2 years ago.
geesmv:Modified Variance Estimators for Generalized Estimating Equations
Generalized estimating equations with the original sandwich variance estimator proposed by Liang and Zeger (1986), and eight types of more recent modified variance estimators for improving the finite small-sample performance.
Maintained by Zheng Li. Last updated 9 years ago.
PRA:Project Risk Analysis
Data analysis for Project Risk Management via the Second Moment Method, Monte Carlo Simulation, Contingency Analysis, Sensitivity Analysis, Earned Value Management, Learning Curves, Design Structure Matrices, and more.
Maintained by Paul Govan. Last updated 3 months ago.
LMMstar:Repeated Measurement Models for Discrete Times
Companion R package for the course "Statistical analysis of correlated and repeated measurements for health science researchers" taught by the section of Biostatistics of the University of Copenhagen. It implements linear mixed models where the model for the variance-covariance of the residuals is specified via patterns (compound symmetry, toeplitz, unstructured, ...). Statistical inference for mean, variance, and correlation parameters is performed based on the observed information and a Satterthwaite approximation of the degrees of freedom. Normalized residuals are provided to assess model misspecification. Statistical inference can be performed for arbitrary linear or non-linear combination(s) of model coefficients. Predictions can be computed conditional to covariates only or also to outcome values.
Maintained by Brice Ozenne. Last updated 5 months ago.
s20x:Functions for University of Auckland Course STATS 201/208 Data Analysis
A set of functions used in teaching STATS 201/208 Data Analysis at the University of Auckland. The functions are designed to make parts of R more accessible to a large undergraduate population who are mostly not statistics majors.
Maintained by James Curran. Last updated 2 years ago.
r4ss:R Code for Stock Synthesis
A collection of R functions for use with Stock Synthesis, a fisheries stock assessment modeling platform written in ADMB by Dr. Richard D. Methot at the NOAA Northwest Fisheries Science Center. The functions include tools for summarizing and plotting results, manipulating files, visualizing model parameterizations, and various other common stock assessment tasks. This version of '{r4ss}' is compatible with Stock Synthesis versions 3.24 through 3.30 (specifically version, from December 2024). Support for 3.24 models is only through the core functions for reading output and plotting.
Maintained by Ian G. Taylor. Last updated 5 days ago.
heplots:Visualizing Hypothesis Tests in Multivariate Linear Models
Provides HE plot and other functions for visualizing hypothesis tests in multivariate linear models. HE plots represent sums-of-squares-and-products matrices for linear hypotheses and for error using ellipses (in two dimensions) and ellipsoids (in three dimensions). The related 'candisc' package provides visualizations in a reduced-rank canonical discriminant space when there are more than a few response variables.
Maintained by Michael Friendly. Last updated 9 days ago.
tramME:Transformation Models with Mixed Effects
Likelihood-based estimation of mixed-effects transformation models using the Template Model Builder ('TMB', Kristensen et al., 2016) <doi:10.18637/jss.v070.i05>. The technical details of transformation models are given in Hothorn et al. (2018) <doi:10.1111/sjos.12291>. Likelihood contributions of exact, randomly censored (left, right, interval) and truncated observations are supported. The random effects are assumed to be normally distributed on the scale of the transformation function, the marginal likelihood is evaluated using the Laplace approximation, and the gradients are calculated with automatic differentiation (Tamasi & Hothorn, 2021) <doi:10.32614/RJ-2021-075>. Penalized smooth shift terms can be defined using 'mgcv'.
Maintained by Balint Tamasi. Last updated 4 days ago.
car:Companion to Applied Regression
Functions to Accompany J. Fox and S. Weisberg, An R Companion to Applied Regression, Third Edition, Sage, 2019.
Maintained by John Fox. Last updated 5 months ago.
mrbayes:Bayesian Summary Data Models for Mendelian Randomization Studies
Bayesian estimation of inverse variance weighted (IVW), Burgess et al. (2013) <doi:10.1002/gepi.21758>, and MR-Egger, Bowden et al. (2015) <doi:10.1093/ije/dyv080>, summary data models for Mendelian randomization analyses.
Maintained by Tom Palmer. Last updated 11 days ago.
sits:Satellite Image Time Series Analysis for Earth Observation Data Cubes
An end-to-end toolkit for land use and land cover classification using big Earth observation data, based on machine learning methods applied to satellite image data cubes, as described in Simoes et al (2021) <doi:10.3390/rs13132428>. Builds regular data cubes from collections in AWS, Microsoft Planetary Computer, Brazil Data Cube, Copernicus Data Space Environment (CDSE), Digital Earth Africa, Digital Earth Australia, NASA HLS using the Spatio-temporal Asset Catalog (STAC) protocol (<>) and the 'gdalcubes' R package developed by Appel and Pebesma (2019) <doi:10.3390/data4030092>. Supports visualization methods for images and time series and smoothing filters for dealing with noisy time series. Includes functions for quality assessment of training samples using self-organized maps as presented by Santos et al (2021) <doi:10.1016/j.isprsjprs.2021.04.014>. Includes methods to reduce training samples imbalance proposed by Chawla et al (2002) <doi:10.1613/jair.953>. Provides machine learning methods including support vector machines, random forests, extreme gradient boosting, multi-layer perceptrons, temporal convolutional neural networks proposed by Pelletier et al (2019) <doi:10.3390/rs11050523>, and temporal attention encoders by Garnot and Landrieu (2020) <doi:10.48550/arXiv.2007.00586>. Supports GPU processing of deep learning models using torch <>. Performs efficient classification of big Earth observation data cubes and includes functions for post-classification smoothing based on Bayesian inference as described by Camara et al (2024) <doi:10.3390/rs16234572>, and methods for active learning and uncertainty assessment. Supports region-based time series analysis using package supercells <>. Enables best practices for estimating area and assessing accuracy of land change as recommended by Olofsson et al (2014) <doi:10.1016/j.rse.2014.02.015>. Minimum recommended requirements: 16 GB RAM and 4 CPU dual-core.
Maintained by Gilberto Camara. Last updated 1 month ago.
PerformanceAnalytics:Econometric Tools for Performance and Risk Analysis
Collection of econometric functions for performance and risk analysis. In addition to standard risk and performance metrics, this package aims to aid practitioners and researchers in utilizing the latest research in analysis of non-normal return streams. In general, it is most tested on return (rather than price) data on a regular scale, but most functions will work with irregular return data as well, and increasing numbers of functions will work with P&L or price data where possible.
Maintained by Brian G. Peterson. Last updated 3 months ago.
coin:Conditional Inference Procedures in a Permutation Test Framework
Conditional inference procedures for the general independence problem including two-sample, K-sample (non-parametric ANOVA), correlation, censored, ordered and multivariate problems described in <doi:10.18637/jss.v028.i08>.
Maintained by Torsten Hothorn. Last updated 9 months ago.
spatialEco:Spatial Analysis and Modelling Utilities
Utilities to support spatial data manipulation, query, sampling and modelling in ecological applications. Functions include models for species population density, spatial smoothing, multivariate separability, point process model for creating pseudo-absences and sub-sampling, Quadrant-based sampling and analysis, auto-logistic modeling, sampling models, cluster optimization, statistical exploratory tools and raster-based metrics.
Maintained by Jeffrey S. Evans. Last updated 13 days ago.
highfrequency:Tools for Highfrequency Data Analysis
Provide functionality to manage, clean and match highfrequency trades and quotes data, calculate various liquidity measures, estimate and forecast volatility, detect price jumps and investigate microstructure noise and intraday periodicity. A detailed vignette can be found in the paper "Analyzing Intraday Financial Data in R: The highfrequency Package" by Boudt, Kleen, and Sjoerup (2022, <doi:10.18637/jss.v104.i08>). The DOI in the CITATION is for a new Journal of Statistical Software publication that will be registered after publication on CRAN. A working paper version can be found on SSRN: <doi:10.2139/ssrn.3917548>.
Maintained by Kris Boudt. Last updated 2 years ago.
mvMAPIT:Multivariate Genome Wide Marginal Epistasis Test
Epistasis, commonly defined as the interaction between genetic loci, is known to play an important role in the phenotypic variation of complex traits. As a result, many statistical methods have been developed to identify genetic variants that are involved in epistasis, and nearly all of these approaches carry out this task by focusing on analyzing one trait at a time. Previous studies have shown that jointly modeling multiple phenotypes can often dramatically increase statistical power for association mapping. In this package, we present the 'multivariate MArginal ePIstasis Test' ('mvMAPIT') – a multi-outcome generalization of a recently proposed epistatic detection method which seeks to detect marginal epistasis or the combined pairwise interaction effects between a given variant and all other variants. By searching for marginal epistatic effects, one can identify genetic variants that are involved in epistasis without the need to identify the exact partners with which the variants interact – thus, potentially alleviating much of the statistical and computational burden associated with conventional explicit search based methods. Our proposed 'mvMAPIT' builds upon this strategy by taking advantage of correlation structure between traits to improve the identification of variants involved in epistasis. We formulate 'mvMAPIT' as a multivariate linear mixed model and develop a multi-trait variance component estimation algorithm for efficient parameter inference and P-value computation. Together with reasonable model approximations, our proposed approach is scalable to moderately sized genome-wide association studies. Crawford et al. (2017) <doi:10.1371/journal.pgen.1006869>. Stamp et al. (2023) <doi:10.1093/g3journal/jkad118>.
Maintained by Julian Stamp. Last updated 5 months ago.
pcds:Proximity Catch Digraphs and Their Applications
Contains the functions for construction and visualization of various families of the proximity catch digraphs (PCDs) (see Ceyhan (2005) ISBN:978-3-639-19063-2), and for computing the graph invariants for testing the patterns of segregation and association against complete spatial randomness (CSR) or uniformity in one, two and three dimensional cases. The package also has tools for generating points from these spatial patterns. The graph invariants used in testing spatial point data are the domination number (Ceyhan (2011) <doi:10.1080/03610921003597211>) and arc density (Ceyhan et al. (2006) <doi:10.1016/j.csda.2005.03.002>; Ceyhan et al. (2007) <doi:10.1002/cjs.5550350106>). The PCD families considered are Arc-Slice PCDs, Proportional-Edge PCDs, and Central Similarity PCDs.
Maintained by Elvan Ceyhan. Last updated 2 years ago.
robumeta:Robust Variance Meta-Regression
Functions for conducting robust variance estimation (RVE) meta-regression using both large and small sample RVE estimators under various weighting schemes. These methods are distribution free and provide valid point estimates, standard errors and hypothesis tests even when the degree and structure of dependence between effect sizes is unknown. Also included are functions for conducting sensitivity analyses under correlated effects weighting and producing RVE-based forest plots.
Maintained by Zachary Fisher. Last updated 4 years ago.
geex:An API for M-Estimation
Provides a general, flexible framework for estimating parameters and empirical sandwich variance estimator from a set of unbiased estimating equations (i.e., M-estimation in the vein of Stefanski & Boos (2002) <doi:10.1198/000313002753631330>). All examples from Stefanski & Boos (2002) are published in the corresponding Journal of Statistical Software paper "The Calculus of M-Estimation in R with geex" by Saul & Hudgens (2020) <doi:10.18637/jss.v092.i02>. Also provides an API to compute finite-sample variance corrections.
Maintained by Bradley Saul. Last updated 10 months ago.
inlabru:Bayesian Latent Gaussian Modelling using INLA and Extensions
Facilitates spatial and general latent Gaussian modeling using integrated nested Laplace approximation via the INLA package (<>). Additionally, extends the GAM-like model class to more general nonlinear predictor expressions, and implements a log Gaussian Cox process likelihood for modeling univariate and spatial point processes based on ecological survey data. Model components are specified with general inputs and mapping methods to the latent variables, and the predictors are specified via general R expressions, with separate expressions for each observation likelihood model in multi-likelihood models. A prediction method based on fast Monte Carlo sampling allows posterior prediction of general expressions of the latent variables. Ecology-focused introduction in Bachl, Lindgren, Borchers, and Illian (2019) <doi:10.1111/2041-210X.13168>.
Maintained by Finn Lindgren. Last updated 4 days ago.
cobalt:Covariate Balance Tables and Plots
Generate balance tables and plots for covariates of groups preprocessed through matching, weighting or subclassification, for example, using propensity scores. Includes integration with 'MatchIt', 'WeightIt', 'MatchThem', 'twang', 'Matching', 'optmatch', 'CBPS', 'ebal', 'cem', 'sbw', and 'designmatch' for assessing balance on the output of their preprocessing functions. Users can also specify data for balance assessment not generated through the above packages. Also included are methods for assessing balance in clustered or multiply imputed data sets or data sets with multi-category, continuous, or longitudinal treatments.
Maintained by Noah Greifer. Last updated 11 months ago.
kim:A Toolkit for Behavioral Scientists
A collection of functions for analyzing data typically collected or used by behavioral scientists. Examples of the functions include a function that compares groups in a factorial experimental design, a function that conducts two-way analysis of variance (ANOVA), and a function that cleans a data set generated by Qualtrics surveys. Some of the functions will require installing additional package(s). Such packages and other references are cited within the section describing the relevant functions. Many functions in this package rely heavily on these two popular R packages: Dowle et al. (2021) <>. Wickham et al. (2021) <>.
Maintained by Jin Kim. Last updated 19 days ago.
tvReg:Time-Varying Coefficient for Single and Multi-Equation Regressions
Fitting time-varying coefficient models for single and multi-equation regressions, using kernel smoothing techniques.
Maintained by Isabel Casas. Last updated 2 years ago.
genridge:Generalized Ridge Trace Plots for Ridge Regression
The genridge package introduces generalizations of the standard univariate ridge trace plot used in ridge regression and related methods. These graphical methods show both bias (actually, shrinkage) and precision, by plotting the covariance ellipsoids of the estimated coefficients, rather than just the estimates themselves. 2D and 3D plotting methods are provided, both in the space of the predictor variables and in the transformed space of the PCA/SVD of the predictors.
Maintained by Michael Friendly. Last updated 3 months ago.
dvmisc:Convenience Functions, Moving Window Statistics, and Graphics
Collection of functions for running and summarizing statistical simulation studies, creating visualizations (e.g. CART Shiny app, histograms with fitted probability mass/density functions), calculating moving-window statistics efficiently, and performing common computations.
Maintained by Dane R. Van Domelen. Last updated 4 years ago.
templateICAr:Estimate Brain Networks and Connectivity with ICA and Empirical Priors
Implements the template ICA (independent components analysis) model proposed in Mejia et al. (2020) <doi:10.1080/01621459.2019.1679638> and the spatial template ICA model proposed in Mejia et al. (2022) <doi:10.1080/10618600.2022.2104289>. Both models estimate subject-level brain networks as deviations from known population-level networks, which are estimated using standard ICA algorithms. Both models employ an expectation-maximization algorithm for estimation of the latent brain networks and unknown model parameters. Includes direct support for 'CIFTI', 'GIFTI', and 'NIFTI' neuroimaging file formats.
Maintained by Amanda Mejia. Last updated 3 days ago.
segmented:Regression Models with Break-Points / Change-Points Estimation (with Possibly Random Effects)
Fitting regression models where, in addition to possible linear terms, one or more covariates have segmented (i.e., broken-line or piece-wise linear) or stepmented (i.e. piece-wise constant) effects. Multiple breakpoints for the same variable are allowed. The estimation method is discussed in Muggeo (2003, <doi:10.1002/sim.1545>) and illustrated in Muggeo (2008, <>). An approach for hypothesis testing is presented in Muggeo (2016, <doi:10.1080/00949655.2016.1149855>), and interval estimation for the breakpoint is discussed in Muggeo (2017, <doi:10.1111/anzs.12200>). Segmented mixed models, i.e. random effects in the change point, are discussed in Muggeo (2014, <doi:10.1177/1471082X13504721>). Estimation of piecewise-constant relationships and changepoints (mean-shift models) is discussed in Fasola et al. (2018, <doi:10.1007/s00180-017-0740-4>).
Maintained by Vito M. R. Muggeo. Last updated 16 days ago.
inferr:Inferential Statistics
Select set of parametric and non-parametric statistical tests. 'inferr' builds upon the solid set of statistical tests provided in 'stats' package by including additional data types as inputs, expanding and restructuring the test results. The tests included are t tests, variance tests, proportion tests, chi square tests, Levene's test, McNemar Test, Cochran's Q test and Runs test.
Maintained by Aravind Hebbali. Last updated 4 months ago.
gnonadd:Various Non-Additive Models for Genetic Associations
The goal of 'gnonadd' is to simplify workflows in the analysis of non-additive effects of sequence variants. This includes variance effects (Ivarsdottir et al. (2017) <doi:10.1038/ng.3928>), correlation effects, interaction effects and dominance effects. The package also includes convenience functions for visualization.
Maintained by Audunn S. Snaebjarnarson. Last updated 3 months ago.
NBPSeq:Negative Binomial Models for RNA-Sequencing Data
Negative Binomial (NB) models for two-group comparisons and regression inferences from RNA-Sequencing Data.
Maintained by Yanming Di. Last updated 11 years ago.
mvgam:Multivariate (Dynamic) Generalized Additive Models
Fit Bayesian Dynamic Generalized Additive Models to multivariate observations. Users can build nonlinear State-Space models that can incorporate semiparametric effects in observation and process components, using a wide range of observation families. Estimation is performed using Markov Chain Monte Carlo with Hamiltonian Monte Carlo in the software 'Stan'. References: Clark & Wells (2023) <doi:10.1111/2041-210X.13974>.
Maintained by Nicholas J Clark. Last updated 21 hours ago.
pbkrtest:Parametric Bootstrap, Kenward-Roger and Satterthwaite Based Methods for Test in Mixed Models
Computes p-values based on (a) Satterthwaite or Kenward-Roger degree of freedom methods and (b) parametric bootstrap for mixed effects models as implemented in the 'lme4' package. Implements parametric bootstrap test for generalized linear mixed models as implemented in 'lme4' and generalized linear models. The package is documented in the paper by Halekoh and Højsgaard (2012, <doi:10.18637/jss.v059.i09>). Please see 'citation("pbkrtest")' for citation details.
Maintained by Søren Højsgaard. Last updated 10 days ago.
extras:Helper Functions for Bayesian Analyses
Functions to 'numericise' 'R' objects (coerce to numeric objects), summarise 'MCMC' (Markov Chain Monte Carlo) samples and calculate deviance residuals as well as 'R' translations of some 'BUGS' (Bayesian inference Using Gibbs Sampling), 'JAGS' (Just Another Gibbs Sampler), 'STAN' and 'TMB' (Template Model Builder) functions.
Maintained by Nicole Hill. Last updated 2 months ago.
mixmeta:An Extended Mixed-Effects Framework for Meta-Analysis
A collection of functions to perform various meta-analytical models through a unified mixed-effects framework, including standard univariate fixed and random-effects meta-analysis and meta-regression, and non-standard extensions such as multivariate, multilevel, longitudinal, and dose-response models.
Maintained by Antonio Gasparrini. Last updated 3 years ago.
metamisc:Meta-Analysis of Diagnosis and Prognosis Research Studies
Facilitate frequentist and Bayesian meta-analysis of diagnosis and prognosis research studies. It includes functions to summarize multiple estimates of prediction model discrimination and calibration performance (Debray et al., 2019) <doi:10.1177/0962280218785504>. It also includes functions to evaluate funnel plot asymmetry (Debray et al., 2018) <doi:10.1002/jrsm.1266>. Finally, the package provides functions for developing multivariable prediction models from datasets with clustering (de Jong et al., 2021) <doi:10.1002/sim.8981>.
Maintained by Thomas Debray. Last updated 1 month ago.
effectsize:Indices of Effect Size
Provide utilities to work with indices of effect size for a wide variety of models and hypothesis tests (see list of supported models using the function 'insight::supported_models()'), allowing computation of and conversion between indices such as Cohen's d, r, odds, etc. References: Ben-Shachar et al. (2020) <doi:10.21105/joss.02815>.
Maintained by Mattan S. Ben-Shachar. Last updated 1 month ago.
mme:Multinomial Mixed Effects Models
Fit Gaussian Multinomial mixed-effects models for small area estimation: Model 1, with one random effect in each category of the response variable (Lopez-Vizcaino,E. et al., 2013) <doi:10.1177/1471082X13478873>; Model 2, introducing independent time effect; Model 3, introducing correlated time effect. mme calculates direct and parametric bootstrap MSE estimators (Lopez-Vizcaino,E et al., 2014) <doi:10.1111/rssa.12085>.
Maintained by E. Lopez-Vizcaino. Last updated 6 years ago.
agricolae:Statistical Procedures for Agricultural Research
The original idea was presented in the thesis "A statistical analysis tool for agricultural research" to obtain the degree of Master of Science, National Engineering University (UNI), Lima-Peru. Some experimental data for the examples come from the CIP and other research. Agricolae offers extensive functionality on experimental design especially for agricultural and plant breeding experiments, which can also be useful for other purposes. It supports planning of lattice, Alpha, Cyclic, Complete Block, Latin Square, Graeco-Latin Squares, augmented block, factorial, split and strip plot designs. There are also various analysis facilities for experimental data, e.g. treatment comparison procedures and several non-parametric comparison tests, biodiversity indexes and consensus cluster.
Maintained by Felipe de Mendiburu. Last updated 1 year ago.
lsasim:Functions to Facilitate the Simulation of Large Scale Assessment Data
Provides functions to simulate data from large-scale educational assessments, including background questionnaire data and cognitive item responses that adhere to a multiple-matrix sampled design. The theoretical foundation can be found in Matta, T.H., Rutkowski, L., Rutkowski, D. et al. (2018) <doi:10.1186/s40536-018-0068-8>.
Maintained by Waldir Leoncio. Last updated 2 months ago.
SightabilityModel:Wildlife Sightability Modeling
Uses logistic regression to model the probability of detection as a function of covariates. This model is then used with observational survey data to estimate population size, while accounting for uncertain detection. See Steinhorst and Samuel (1989).
Maintained by Schwarz Carl James. Last updated 2 years ago.
lumi:BeadArray Specific Methods for Illumina Methylation and Expression Microarrays
The lumi package provides an integrated solution for the Illumina microarray data analysis. It includes functions of Illumina BeadStudio (GenomeStudio) data input, quality control, BeadArray-specific variance stabilization, normalization and gene annotation at the probe level. It also includes the functions of processing Illumina methylation microarrays, especially Illumina Infinium methylation microarrays.
Maintained by Lei Huang. Last updated 5 months ago.
srvyr:'dplyr'-Like Syntax for Summary Statistics of Survey Data
Use piping, verbs like 'group_by' and 'summarize', and other 'dplyr' inspired syntactic style when calculating summary statistics on survey data using functions from the 'survey' package.
Maintained by Greg Freedman Ellis. Last updated 1 month ago.
PracTools:Designing and Weighting Survey Samples
Functions and datasets to support Valliant, Dever, and Kreuter (2018), <doi:10.1007/978-3-319-93632-1>, "Practical Tools for Designing and Weighting Survey Samples". Contains functions for sample size calculation for survey samples using stratified or clustered one-, two-, and three-stage sample designs, and single-stage audit sample designs. Functions are included that will group geographic units accounting for distances apart and measures of size. Other functions compute variance components for multistage designs and sample sizes in two-phase designs. A number of example data sets are included.
Maintained by Richard Valliant. Last updated 9 months ago.
semTools:Useful Tools for Structural Equation Modeling
Provides miscellaneous tools for structural equation modeling, many of which extend the 'lavaan' package. For example, latent interactions can be estimated using product indicators (Lin et al., 2010, <doi:10.1080/10705511.2010.488999>) and simple effects probed; analytical power analyses can be conducted (Jak et al., 2021, <doi:10.3758/s13428-020-01479-0>); and scale reliability can be estimated based on estimated factor-model parameters.
Maintained by Terrence D. Jorgensen. Last updated 3 days ago.
CPAT:Change Point Analysis Tests
Implements several statistical tests for structural change, specifically the tests featured in Horváth, Rice and Miller (in press): CUSUM (with weighted/trimmed variants), Darling-Erdös, Hidalgo-Seo, Andrews, and the new Rényi-type test.
Maintained by Curtis Miller. Last updated 6 years ago.
posterior:Tools for Working with Posterior Distributions
Provides useful tools for both users and developers of packages for fitting Bayesian models or working with output from Bayesian models. The primary goals of the package are to: (a) Efficiently convert between many different useful formats of draws (samples) from posterior or prior distributions. (b) Provide consistent methods for operations commonly performed on draws, for example, subsetting, binding, or mutating draws. (c) Provide various summaries of draws in convenient formats. (d) Provide lightweight implementations of state of the art posterior inference diagnostics. References: Vehtari et al. (2021) <doi:10.1214/20-BA1221>.
Maintained by Paul-Christian Bürkner. Last updated 10 days ago.
treestats:Phylogenetic Tree Statistics
Collection of phylogenetic tree statistics, collected throughout the literature. All functions have been written to maximize computation speed. The package includes umbrella functions to calculate all statistics, all balance associated statistics, or all branching time related statistics. Furthermore, the 'treestats' package supports summary statistic calculations on Ltables, provides speed-improved coding of branching times, Ltable conversion and includes algorithms to create intermediately balanced trees. Full description can be found in Janzen (2024) <doi:10.1016/j.ympev.2024.108168>.
Maintained by Thijs Janzen. Last updated 6 months ago.
qvcalc:Quasi Variances for Factor Effects in Statistical Models
Functions to compute quasi variances and associated measures of approximation error.
Maintained by David Firth. Last updated 2 months ago.
sRACIPE:Systems biology tool to simulate gene regulatory circuits
sRACIPE implements a randomization-based method for gene circuit modeling. It allows us to study the effect of both the gene expression noise and the parametric variation on any gene regulatory circuit (GRC) using only its topology, and simulates an ensemble of models with random kinetic parameters at multiple noise levels. Statistical analysis of the generated gene expressions reveals the basin of attraction and stability of various phenotypic states and their changes associated with intrinsic and extrinsic noises. sRACIPE provides a holistic picture to evaluate the effects of both the stochastic nature of cellular processes and the parametric variation.
Maintained by Mingyang Lu. Last updated 20 days ago.
MVR:Mean-Variance Regularization
Implements a non-parametric method for joint adaptive mean-variance regularization and variance stabilization of high-dimensional data. It is suited for handling difficult problems posed by high-dimensional multivariate datasets (p >> n paradigm). Among those are that the variance is often a function of the mean, variable-specific estimators of variances are not reliable, and test statistics have low power due to a lack of degrees of freedom. Key features include: (i) Normalization and/or variance stabilization of the data, (ii) Computation of mean-variance-regularized t-statistics (F-statistics to follow), (iii) Generation of diverse diagnostic plots, (iv) Computationally efficient implementation using C/C++ interfacing and an option for parallel computing to enjoy a faster and easier experience in the R environment.
Maintained by Jean-Eudes Dazard. Last updated 3 years ago.
pls:Partial Least Squares and Principal Component Regression
Multivariate regression methods Partial Least Squares Regression (PLSR), Principal Component Regression (PCR) and Canonical Powered Partial Least Squares (CPPLS).
Maintained by Kristian Hovde Liland. Last updated 2 months ago.
smallarea:Fits a Fay Herriot Model
Inference techniques for the Fay-Herriot model.
Maintained by Abhishek Nandy. Last updated 8 years ago.
compositions:Compositional Data Analysis
Provides functions for the consistent analysis of compositional data (e.g. portions of substances) and positive numbers (e.g. concentrations) in the way proposed by J. Aitchison and V. Pawlowsky-Glahn.
Maintained by K. Gerald van den Boogaart. Last updated 1 year ago.
infer:Tidy Statistical Inference
The objective of this package is to perform inference using an expressive statistical grammar that coheres with the tidy design framework.
Maintained by Simon Couch. Last updated 6 months ago.
BGmisc:An R Package for Extended Behavior Genetics Analysis
Provides functions for behavior genetics analysis, including variance component model identification [Hunter et al. (2021) <doi:10.1007/s10519-021-10055-x>], calculation of relatedness coefficients using path-tracing methods [Wright (1922) <doi:10.1086/279872>; McArdle & McDonald (1984) <doi:10.1111/j.2044-8317.1984.tb00802.x>], inference of relatedness, pedigree conversion, and simulation of multi-generational family data [Lyu et al. (2024) <doi:10.1101/2024.12.19.629449>]. For a full overview, see Garrison et al. (2024) <doi:10.21105/joss.06203>.
Maintained by S. Mason Garrison. Last updated 24 days ago.
agriutilities:Utilities for Data Analysis in Agriculture
Utilities designed to make the analysis of field trials easier and more accessible for everyone working in plant breeding. It provides a simple and intuitive interface for conducting single and multi-environmental trial analysis, with minimal coding required. Whether you're a beginner or an experienced user, 'agriutilities' will help you quickly and easily carry out complex analyses with confidence. With built-in functions for fitting Linear Mixed Models, 'agriutilities' is the ideal choice for anyone who wants to save time and focus on interpreting their results. Some of the functions require the R package 'asreml' for the 'ASReml' software, which can be obtained upon purchase from 'VSN' International.
Maintained by Johan Aparicio. Last updated 2 months ago.
see:Model Visualisation Toolbox for 'easystats' and 'ggplot2'
Provides plotting utilities supporting packages in the 'easystats' ecosystem and some extra themes, geoms, and scales for 'ggplot2'. Color scales are based on <>. References: Lüdecke et al. (2021) <doi:10.21105/joss.03393>.
Maintained by Indrajeet Patil. Last updated 5 days ago.
daewr:Design and Analysis of Experiments with R
Contains data frames and functions used in the book "Design and Analysis of Experiments with R", Lawson (2015), ISBN-13: 978-1-4398-6813-3.
Maintained by John Lawson. Last updated 2 years ago.
vip:Variable Importance Plots
A general framework for constructing variable importance plots from various types of machine learning models in R. Aside from some standard model-specific variable importance measures, this package also provides model-agnostic approaches that can be applied to any supervised learning algorithm. These include 1) an efficient permutation-based variable importance measure, 2) variable importance based on Shapley values (Strumbelj and Kononenko, 2014) <doi:10.1007/s10115-013-0679-x>, and 3) the variance-based approach described in Greenwell et al. (2018) <arXiv:1805.04755>. A variance-based method for quantifying the relative strength of interaction effects is also included (see the previous reference for details).
Maintained by Brandon M. Greenwell. Last updated 2 years ago.
CLME:Constrained Inference for Linear Mixed Effects Models
Estimation and inference for linear models where some or all of the fixed-effects coefficients are subject to order restrictions. This package uses the robust residual bootstrap methodology for inference, and can handle some structure in the residual variance matrix.
Maintained by Casey M. Jelsema. Last updated 5 years ago.
openxlsx:Read, Write and Edit xlsx Files
Simplifies the creation of Excel .xlsx files by providing a high level interface to writing, styling and editing worksheets. Through the use of 'Rcpp', read/write times are comparable to the 'xlsx' and 'XLConnect' packages with the added benefit of removing the dependency on Java.
Maintained by Jan Marvin Garbuszus. Last updated 2 months ago.
pcev:Principal Component of Explained Variance
Principal component of explained variance (PCEV) is a statistical tool for the analysis of a multivariate response vector. It is a dimension-reduction technique, similar to principal component analysis (PCA), that seeks to maximize the proportion of variance (in the response vector) being explained by a set of covariates.
Maintained by Maxime Turgeon. Last updated 6 years ago.
edgeR:Empirical Analysis of Digital Gene Expression Data in R
Differential expression analysis of sequence count data. Implements a range of statistical methodology based on the negative binomial distributions, including empirical Bayes estimation, exact tests, generalized linear models, quasi-likelihood, and gene set enrichment. Can perform differential analyses of any type of omics data that produces read counts, including RNA-seq, ChIP-seq, ATAC-seq, Bisulfite-seq, SAGE, CAGE, metabolomics, or proteomics spectral counts. RNA-seq analyses can be conducted at the gene or isoform level, and tests can be conducted for differential exon or transcript usage.
Maintained by Yunshun Chen. Last updated 6 days ago.
dispRity:Measuring Disparity
A modular package for measuring disparity (multidimensional space occupancy). Disparity can be calculated from any matrix defining a multidimensional space. The package provides a set of implemented metrics to measure properties of the space and allows users to provide and test their own metrics. The package also provides functions for looking at disparity in a serial way (e.g. disparity through time) or per groups as well as visualising the results. Finally, this package provides several statistical tests for disparity analysis.
Maintained by Thomas Guillerme. Last updated 2 days ago.
calibrator:Bayesian Calibration of Complex Computer Codes
Performs Bayesian calibration of computer models as per Kennedy and O'Hagan 2001. The package includes routines to find the hyperparameters and parameters; see the help page for stage1() for a worked example using the toy dataset. A tutorial is provided in the calex.Rnw vignette; and a suite of especially simple one dimensional examples appears in inst/doc/one.dim/.
Maintained by Robin K. S. Hankin. Last updated 4 years ago.
qtl:Tools for Analyzing QTL Experiments
Analysis of experimental crosses to identify genes (called quantitative trait loci, QTLs) contributing to variation in quantitative traits. Broman et al. (2003) <doi:10.1093/bioinformatics/btg112>.
Maintained by Karl W Broman. Last updated 7 months ago.
corral:Correspondence Analysis for Single Cell Data
Correspondence analysis (CA) is a matrix factorization method, and is similar to principal components analysis (PCA). Whereas PCA is designed for application to continuous, approximately normally distributed data, CA is appropriate for non-negative, count-based data that are in the same additive scale. The corral package implements CA for dimensionality reduction of a single matrix of single-cell data, as well as a multi-table adaptation of CA that leverages data-optimized scaling to align data generated from different sequencing platforms by projecting into a shared latent space. corral utilizes sparse matrices and a fast implementation of SVD, and can be called directly on Bioconductor objects (e.g., SingleCellExperiment) for easy pipeline integration. The package also includes additional options, including variations of CA to address overdispersion in count data (e.g., Freeman-Tukey chi-squared residual), as well as the option to apply CA-style processing to continuous data (e.g., proteomic TOF intensities) with the Hellinger distance adaptation of CA.
Maintained by Lauren Hsu. Last updated 5 months ago.
mgcv:Mixed GAM Computation Vehicle with Automatic Smoothness Estimation
Generalized additive (mixed) models, some of their extensions and other generalized ridge regression with multiple smoothing parameter estimation by (Restricted) Marginal Likelihood, Generalized Cross Validation and similar, or using iterated nested Laplace approximation for fully Bayesian inference. See Wood (2017) <doi:10.1201/9781315370279> for an overview. Includes a gam() function, a wide variety of smoothers, 'JAGS' support and distributions beyond the exponential family.
Maintained by Simon Wood. Last updated 1 year ago.
3.5 match 32 stars 12.71 score 17k scripts 7.8k dependents