recipes:Preprocessing and Feature Engineering Steps for Modeling
A recipe prepares your data for modeling. The package provides an extensible framework for pipeable sequences of feature engineering steps that supply preprocessing tools to be applied to data. Statistical parameters for the steps can be estimated from an initial data set and then applied to other data sets. The resulting processed output can then be used as inputs for statistical or machine learning models.
Maintained by Max Kuhn. Last updated 6 hours ago.
586 stars
renv:Project Environments
A dependency management toolkit for R. Using 'renv', you can create and manage project-local R libraries, save the state of these libraries to a 'lockfile', and later restore your library as required. Together, these tools can help make your projects more isolated, portable, and reproducible.
Maintained by Kevin Ushey. Last updated 3 days ago.
1.0k stars
GenomicRanges:Representation and manipulation of genomic intervals
The ability to efficiently represent and manipulate genomic annotations and alignments is playing a central role when it comes to analyzing high-throughput sequencing data (a.k.a. NGS data). The GenomicRanges package defines general purpose containers for storing and manipulating genomic intervals and variables defined along a genome. More specialized containers for representing and manipulating short alignments against a reference genome, or a matrix-like summarization of an experiment, are defined in the GenomicAlignments and SummarizedExperiment packages, respectively. Both packages build on top of the GenomicRanges infrastructure.
Maintained by Hervé Pagès. Last updated 4 months ago.
44 stars
terra:Spatial Data Analysis
Methods for spatial data analysis with vector (points, lines, polygons) and raster (grid) data. Methods for vector data include geometric operations such as intersect and buffer. Raster methods include local, focal, global, zonal and geometric operations. The predict and interpolate methods facilitate the use of regression type (interpolation, machine learning) models for spatial prediction, including with satellite remote sensing data. Processing of very large files is supported. See the manual and tutorials on <> to get started. 'terra' replaces the 'raster' package ('terra' can do more, and it is faster and easier to use).
Maintained by Robert J. Hijmans. Last updated 2 hours ago.
560 stars
raster:Geographic Data Analysis and Modeling
Reading, writing, manipulating, analyzing and modeling of spatial data. This package has been superseded by the "terra" package <>.
Maintained by Robert J. Hijmans. Last updated 1 day ago.
163 stars
Matrix:Sparse and Dense Matrix Classes and Methods
A rich hierarchy of sparse and dense matrix classes, including general, symmetric, triangular, and diagonal matrices with numeric, logical, or pattern entries. Efficient methods for operating on such matrices, often wrapping the 'BLAS', 'LAPACK', and 'SuiteSparse' libraries.
Maintained by Martin Maechler. Last updated 19 days ago.
1 star
lavaan:Latent Variable Analysis
Fit a variety of latent variable models, including confirmatory factor analysis, structural equation modeling and latent growth curve models.
Maintained by Yves Rosseel. Last updated 3 days ago.
454 stars
GenomicAlignments:Representation and manipulation of short genomic alignments
Provides efficient containers for storing and manipulating short genomic alignments (typically obtained by aligning short reads to a reference genome). This includes read counting, computing the coverage, junction detection, and working with the nucleotide content of the alignments.
Maintained by Hervé Pagès. Last updated 5 months ago.
10 stars
bbmle:Tools for General Maximum Likelihood Estimation
Methods and functions for fitting maximum likelihood models in R. This package modifies and extends the 'mle' classes in the 'stats4' package.
Maintained by Ben Bolker. Last updated 1 month ago.
25 stars
unmarked:Models for Data from Unmarked Animals
Fits hierarchical models of animal abundance and occurrence to data collected using survey methods such as point counts, site occupancy sampling, distance sampling, removal sampling, and double observer sampling. Parameters governing the state and observation processes can be modeled as functions of covariates. References: Kellner et al. (2023) <doi:10.1111/2041-210X.14123>, Fiske and Chandler (2011) <doi:10.18637/jss.v043.i10>.
Maintained by Ken Kellner. Last updated 9 days ago.
4 stars
dtwclust:Time Series Clustering Along with Optimizations for the Dynamic Time Warping Distance
Time series clustering along with optimized techniques related to the Dynamic Time Warping distance and its corresponding lower bounds. Implementations of partitional, hierarchical, fuzzy, k-Shape and TADPole clustering are available. Functionality can be easily extended with custom distance measures and centroid definitions. Implementations of DTW barycenter averaging, a distance based on global alignment kernels, and the soft-DTW distance and centroid routines are also provided. All included distance functions have custom loops optimized for the calculation of cross-distance matrices, including parallelization support. Several cluster validity indices are included.
Maintained by Alexis Sarda. Last updated 8 months ago.
262 stars
RProtoBuf:R Interface to the 'Protocol Buffers' 'API' (Version 2 or 3)
Protocol Buffers are a way of encoding structured data in an efficient yet extensible format. Google uses Protocol Buffers for almost all of its internal 'RPC' protocols and file formats. Additional documentation is available in two included vignettes, one of which corresponds to our 'JSS' paper (2016, <doi:10.18637/jss.v071.i02>). A sufficiently recent version of the 'Protocol Buffers' library is required; currently version 3.3.0 from 2017 is the stated minimum.
Maintained by Dirk Eddelbuettel. Last updated 12 days ago.
73 stars
MAST:Model-based Analysis of Single Cell Transcriptomics
Methods and models for handling zero-inflated single cell assay data.
Maintained by Andrew McDavid. Last updated 5 months ago.
232 stars
mrgsolve:Simulate from ODE-Based Models
Fast simulation from ordinary differential equation (ODE) based models typically employed in quantitative pharmacology and systems biology.
Maintained by Kyle T Baron. Last updated 9 days ago.
138 stars
textmineR:Functions for Text Mining and Topic Modeling
An aid for text mining in R, with a syntax that should be familiar to experienced R users. Provides a wrapper for several topic models that take similarly-formatted input and give similarly-formatted output. Has additional functionality for analyzing and diagnosing topic models.
Maintained by Tommy Jones. Last updated 2 years ago.
106 stars
adegraphics:An S4 Lattice-Based Package for the Representation of Multivariate Data
Graphical functionalities for the representation of multivariate data. It is a complete re-implementation of the functions available in the 'ade4' package.
Maintained by Aurélie Siberchicot. Last updated 8 months ago.
9 stars
multtest:Resampling-based multiple hypothesis testing
Non-parametric bootstrap and permutation resampling-based multiple testing procedures (including empirical Bayes methods) for controlling the family-wise error rate (FWER), generalized family-wise error rate (gFWER), tail probability of the proportion of false positives (TPPFP), and false discovery rate (FDR). Several choices of bootstrap-based null distribution are implemented (centered, centered and scaled, quantile-transformed). Single-step and step-wise methods are available. Tests based on a variety of t- and F-statistics (including t-statistics based on regression parameters from linear and survival models as well as those based on correlation parameters) are included. When probing hypotheses with t-statistics, users may also select a potentially faster null distribution which is multivariate normal with mean zero and variance covariance matrix derived from the vector influence function. Results are reported in terms of adjusted p-values, confidence regions and test statistic cutoffs. The procedures are directly applicable to identifying differentially expressed genes in DNA microarray experiments.
Maintained by Katherine S. Pollard. Last updated 5 months ago.
9.34 score 932 scripts 136 dependents
FLCore:Core Package of FLR, Fisheries Modelling in R
Core classes and methods for FLR, a framework for fisheries modelling and management strategy simulation in R. Developed by a team of fisheries scientists in various countries. More information can be found at <>.
Maintained by Iago Mosqueira. Last updated 9 days ago.
16 stars
rbi:Interface to 'LibBi'
Provides a complete interface to 'LibBi', a library for Bayesian inference (see <> and Murray, 2015 <doi:10.18637/jss.v067.i10> for more information). This includes functions for manipulating 'LibBi' models, for reading and writing 'LibBi' input/output files, for converting 'LibBi' output to provide traces for use with the coda package, and for running 'LibBi' to conduct inference.
Maintained by Sebastian Funk. Last updated 10 months ago.
24 stars
gistr:Work with 'GitHub' 'Gists'
Work with 'GitHub' 'gists' from 'R' (e.g., <>, <>). A 'gist' is simply one or more files with code/text/images/etc. This package allows the user to create new 'gists', update 'gists' with new files, rename files, delete files, get and delete 'gists', star and 'un-star' 'gists', fork 'gists', open a 'gist' in your default browser, get embed code for a 'gist', list 'gist' 'commits', and get rate limit information when 'authenticated'. Some requests require authentication and some do not. 'Gists' website: <>.
Maintained by Scott Chamberlain. Last updated 2 years ago.
http, https, api, web-services, github, github api, gist, gists, code, script, snippet, api-wrapper, github-api, github-gist
104 stars
crmPack:Object-Oriented Implementation of CRM Designs
Implements a wide range of model-based dose escalation designs, ranging from classical and modern continual reassessment methods (CRMs) based on dose-limiting toxicity endpoints to dual-endpoint designs taking into account a biomarker/efficacy outcome. The focus is on Bayesian inference, making it very easy to set up a new design with its own JAGS code. However, it is also possible to implement 3+3 designs for comparison or models with non-Bayesian estimation. The whole package is written in a modular form in the S4 class system, making it very flexible for adaptation to new models, escalation or stopping rules. Further details are presented in Sabanes Bove et al. (2019) <doi:10.18637/jss.v089.i10>.
Maintained by Daniel Sabanes Bove. Last updated 2 months ago.
21 stars
fasterRaster:Faster Raster and Spatial Vector Processing Using 'GRASS GIS'
Processing of large-in-memory/large-on-disk rasters and spatial vectors using 'GRASS GIS' <>. Most functions in the 'terra' package are recreated. Processing of medium-sized and smaller spatial objects will nearly always be faster using 'terra' or 'sf', but for large-in-memory/large-on-disk objects, 'fasterRaster' may be faster. To use most of the functions, you must have the stand-alone version (not the 'OSGeo4W' installer version) of 'GRASS GIS' 8.0 or higher.
Maintained by Adam B. Smith. Last updated 2 days ago.
57 stars
adoptr:Adaptive Optimal Two-Stage Designs
Optimize one or two-arm, two-stage designs for clinical trials with respect to several implemented objective criteria or custom objectives. Optimization under uncertainty and conditional (given stage-one outcome) constraints are supported. See Pilz et al. (2019) <doi:10.1002/sim.8291> and Kunzmann et al. (2021) <doi:10.18637/jss.v098.i09> for details.
Maintained by Maximilian Pilz. Last updated 6 months ago.
1 star
DiceKriging:Kriging Methods for Computer Experiments
Estimation, validation and prediction of kriging models. Important functions : km,,,
Maintained by Olivier Roustant. Last updated 4 years ago.
4 stars
SSDM:Stacked Species Distribution Modelling
Maps species richness and endemism based on stacked species distribution models (SSDM). Individual SDMs can be created using a single algorithm or multiple algorithms (ensemble SDMs). For each species, an SDM can yield a habitat suitability map, a binary map, and a between-algorithm variance map, and can assess variable importance, algorithm accuracy, and between-algorithm correlation. Methods to stack individual SDMs include summing individual probabilities and thresholding then summing. Thresholding can be based on a specific evaluation metric or by drawing repeatedly from a Bernoulli distribution. The SSDM package also provides a user-friendly interface.
Maintained by Sylvain Schmitt. Last updated 11 months ago.
44 stars
restfulr:R Interface to RESTful Web Services
Models a RESTful service as if it were a nested R list.
Maintained by Michael Lawrence. Last updated 3 years ago.
2 stars
ouch:Ornstein-Uhlenbeck Models for Phylogenetic Comparative Hypotheses
Fit and compare Ornstein-Uhlenbeck models for evolution along a phylogenetic tree.
Maintained by Aaron A. King. Last updated 5 months ago.
15 stars
ClassComparison:Classes and Methods for "Class Comparison" Problems on Microarrays
Defines the classes used for "class comparison" problems in the OOMPA project (<>). Class comparison includes tests for differential expression; see Simon's book for details on typical problem types.
Maintained by Kevin R. Coombes. Last updated 2 months ago.
6.46 score 44 scripts 3 dependents
fGarch:Rmetrics - Autoregressive Conditional Heteroskedastic Modelling
Analyze and model heteroskedastic behavior in financial time series.
Maintained by Georgi N. Boshnakov. Last updated 1 year ago.
7 stars
dgpsi:Interface to 'dgpsi' for Deep and Linked Gaussian Process Emulations
Interface to the 'python' package 'dgpsi' for Gaussian process, deep Gaussian process, and linked deep Gaussian process emulations of computer models and networks using stochastic imputation (SI). The implementations follow Ming & Guillas (2021) <doi:10.1137/20M1323771>, Ming, Williamson, & Guillas (2023) <doi:10.1080/00401706.2022.2124311>, and Ming & Williamson (2023) <doi:10.48550/arXiv.2306.01212>. To get started with the package, see <>.
Maintained by Deyu Ming. Last updated 4 days ago.
6.03 score 76 scripts
jetpack:A Friendly Package Manager
Manage project dependencies from your DESCRIPTION file. Create a reproducible virtual environment with minimal additional files in your project. Provides tools to add, remove, and update dependencies as well as install existing dependencies with a single function.
Maintained by Andrew Kane. Last updated 11 days ago.
242 stars
R2MLwiN:Running 'MLwiN' from Within R
An R command interface to the 'MLwiN' multilevel modelling software package.
Maintained by Zhengzheng Zhang. Last updated 10 days ago.
5.35 score 125 scripts
xnet:Two-Step Kernel Ridge Regression for Network Predictions
Fit a two-step kernel ridge regression model for predicting edges in networks, and carry out cross-validation using shortcuts for swift and accurate performance assessment (Stock et al., 2018 <doi:10.1093/bib/bby095>).
Maintained by Joris Meys. Last updated 4 years ago.
11 stars
MSnID:Utilities for Exploration and Assessment of Confidence of LC-MSn Proteomics Identifications
Extracts MS/MS ID data from mzIdentML (leveraging the mzID package) or text files. After collating the search results from multiple datasets, it assesses their identification quality and optimizes filtering criteria to achieve the maximum number of identifications while not exceeding a specified false discovery rate. Also contains a number of utilities to explore the MS/MS results and assess missed and irregular enzymatic cleavages, mass measurement accuracy, etc.
Maintained by Vlad Petyuk. Last updated 5 months ago.
5.06 score 57 scripts
fitode:Tools for Ordinary Differential Equations Model Fitting
Methods and functions for fitting ordinary differential equation (ODE) models in 'R'. Sensitivity equations are used to compute the gradients of ODE trajectories with respect to underlying parameters, which in turn allows for more stable fitting. Other fitting methods, such as MCMC (Markov chain Monte Carlo), are also available.
Maintained by Sang Woo Park. Last updated 1 month ago.
6 stars
MLSeq:Machine Learning Interface for RNA-Seq Data
This package applies several machine learning methods, including SVM, bagSVM, Random Forest and CART to RNA-Seq data.
Maintained by Gokmen Zararsiz. Last updated 5 months ago.
4.81 score 27 scripts 1 dependent
momentfit:Methods of Moments
Several classes for moment-based models are defined. The classes are defined for moment conditions derived from a single equation or a system of equations. The conditions can also be expressed as functions or formulas. Several methods are also offered to facilitate the development of different estimation techniques. The methods that are currently provided are the Generalized Method of Moments (Hansen 1982; <doi:10.2307/1912775>), for single equations and systems of equations, and the Generalized Empirical Likelihood (Smith 1997; <doi:10.1111/j.0013-0133.1997.174.x>, Kitamura 1997; <doi:10.1214/aos/1069362388>, Newey and Smith 2004; <doi:10.1111/j.1468-0262.2004.00482.x>, and Anatolyev 2005 <doi:10.1111/j.1468-0262.2005.00601.x>).
Maintained by Pierre Chausse. Last updated 1 year ago.
4.80 score 21 scripts 1 dependent
FRAPO:Financial Risk Modelling and Portfolio Optimisation with R
Accompanying package of the book 'Financial Risk Modelling and Portfolio Optimisation with R', second edition. The data sets used in the book are contained in this package.
Maintained by Bernhard Pfaff. Last updated 8 years ago.
11 stars
dcmle:Hierarchical Models Made Easy with Data Cloning
S4 classes around infrastructure provided by the 'coda' and 'dclone' packages to make package development easy as a breeze with data cloning for hierarchical models.
Maintained by Peter Solymos. Last updated 6 months ago.
4.60 score 66 scripts 2 dependents
dpm:Dynamic Panel Models Fit with Maximum Likelihood
Implements the dynamic panel models described by Allison, Williams, and Moral-Benito (2017 <doi:10.1177/2378023117710578>) in R. This class of models uses structural equation modeling to specify dynamic (lagged dependent variable) models with fixed effects for panel data. Additionally, models may have predictors that are only weakly exogenous, i.e., are affected by prior values of the dependent variable. Options also allow for random effects, dropping the lagged dependent variable, and a number of other specification choices.
Maintained by Jacob A. Long. Last updated 1 year ago.
16 stars
sse:Sample Size Estimation
Provides functions to evaluate user-defined power functions for a parameter range, and draws a sensitivity plot. It also provides a resampling procedure for semi-parametric sample size estimation and methods for adding information to a Sweave report.
Maintained by Thomas Fabbro. Last updated 4 years ago.
4.41 score 16 scripts
saeRobust:Robust Small Area Estimation
Methods to fit robust alternatives to commonly used models in Small Area Estimation. The methods used here are based on best linear unbiased predictions and linear mixed models. At this time, available models include area level models incorporating spatial and temporal correlation in the random effects.
Maintained by Sebastian Warnholz. Last updated 1 year ago.
1 star
FVDDPpkg:Implement Fleming-Viot-Dependent Dirichlet Processes
A Bayesian nonparametric model for the study of time-evolving frequencies, which has become renowned in the study of population genetics. The model consists of a Hidden Markov Model (HMM) in which the latent signal is a distribution-valued stochastic process that takes the form of a finite mixture of Dirichlet Processes, indexed by vectors that count how many times each value is observed in the population. The package implements methodologies presented in Ascolani, Lijoi and Ruggiero (2021) <doi:10.1214/20-BA1206> and Ascolani, Lijoi and Ruggiero (2023) <doi:10.3150/22-BEJ1504> that make it possible to study the process at the time of data collection or to predict its evolution in the future or in the past.
Maintained by Stefano Damato. Last updated 9 months ago.
4.00 score 1 script
rEMM:Extensible Markov Model for Modelling Temporal Relationships Between Clusters
Implements TRACDS (Temporal Relationships between Clusters for Data Streams), a generalization of Extensible Markov Model (EMM). TRACDS adds a temporal or order model to data stream clustering by superimposing a dynamically adapting Markov Chain. Also provides an implementation of EMM (TRACDS on top of tNN data stream clustering). Development of this package was supported in part by NSF IIS-0948893 and R21HG005912 from the National Human Genome Research Institute. Hahsler and Dunham (2010) <doi:10.18637/jss.v035.i05>.
Maintained by Michael Hahsler. Last updated 7 months ago.
2 stars
funGp:Gaussian Process Models for Scalar and Functional Inputs
Construction and smart selection of Gaussian process models for analysis of computer experiments with emphasis on treatment of functional inputs that are regularly sampled. This package offers: (i) flexible modeling of functional-input regression problems through the fairly general Gaussian process model; (ii) built-in dimension reduction for functional inputs; (iii) heuristic optimization of the structural parameters of the model (e.g., active inputs, kernel function, type of distance). An in-depth tutorial on the use of funGp is provided in Betancourt et al. (2024) <doi:10.18637/jss.v109.i05>, and metamodeling background is provided in Betancourt et al. (2020) <doi:10.1016/j.ress.2020.106870>. The algorithm for structural parameter optimization is described in <>.
Maintained by Jose Betancourt. Last updated 11 months ago.
4 stars
rsolr:R to Solr Interface
A comprehensive R API for querying Apache Solr databases. A Solr core is represented as a data frame or list that supports Solr-side filtering, sorting, transformation and aggregation, all through the familiar base R API. Queries are processed lazily, i.e., a query is only sent to the database when the data are required.
Maintained by Michael Lawrence. Last updated 3 years ago.
9 stars
rui:A simple set of UI functions
This package provides a wrapper around different cli and usethis functions, aiming at providing a small but consistent set of verbs to construct a simple R package UI.
Maintained by Bart Rogiers. Last updated 9 months ago.
3.48 score 2 dependents
rlibkriging:Kriging Models using the 'libKriging' Library
Interface to the 'libKriging' 'C++' library <>, which should provide most standard Kriging / Gaussian process regression features (like in the 'DiceKriging', 'kergp' or 'RobustGaSP' packages). 'libKriging' relies on the Armadillo linear algebra library (Apache 2 license) by Conrad Sanderson; 'lbfgsb_cpp' is a 'C++' port by Pascal Have of the 'lbfgsb' library (BSD-3 license) by Ciyou Zhu, Richard Byrd, Jorge Nocedal and Jose Luis Morales, used for hyperparameter optimization.
Maintained by Yann Richet. Last updated 2 months ago.
3.40 score 126 scripts
dmbc:Model Based Clustering of Binary Dissimilarity Measurements
Functions for fitting a Bayesian model for grouping binary dissimilarity matrices in homogeneous clusters. Currently, it includes methods only for binary data (<doi:10.18637/jss.v100.i16>).
Maintained by Sergio Venturini. Last updated 6 months ago.
2 stars
dabr:Database Management with R
Provides functions to manage databases: select, update, insert, and delete records, list tables, backup tables as CSV files, and import CSV files as tables.
Maintained by Roberto Villegas-Diaz. Last updated 2 years ago.
4 stars
cpss:Change-Point Detection by Sample-Splitting Methods
Implements multiple change-point search algorithms for a variety of frequently considered parametric change-point models. In particular, it integrates a criterion proposed by Zou, Wang and Li (2020) <doi:10.1214/19-AOS1814> to select the number of change-points in a data-driven fashion. It also provides interfaces for user-customized change-point models with one's own cost function and parameter estimation routine. It is easy to get started with the cpss.* set of functions by accessing their documentation pages (e.g., ?cpss).
Maintained by Guanghui Wang. Last updated 3 years ago.
1 star
gems:Generalized Multistate Simulation Model
Simulate and analyze multistate models with general hazard functions. gems provides functionality for the preparation of hazard functions and parameters, simulation from a general multistate model and predicting future events. The multistate model is not required to be a Markov model and may take the history of previous events into account. In the basic version, it allows simulation from transition-specific hazard functions whose parameters follow a multivariate normal distribution.
Maintained by Luisa Salazar Vizcaya. Last updated 8 years ago.
2.52 score 33 scripts
rneos:XML-RPC Interface to NEOS
Within this package the XML-RPC API to NEOS <> is implemented. This enables the user to pass optimization problems to NEOS and retrieve results within R.
Maintained by Bernhard Pfaff. Last updated 5 years ago.
2.48 score 25 scripts 4 dependents
patRoonInst:Manages 'patRoon' Installations
Installs and updates patRoon and its dependencies.
Maintained by Rick Helmus. Last updated 2 months ago.
2.00 score 1 script
GPCsign:Gaussian Process Classification as Described in Bachoc et al. (2020)
Parameter estimation and prediction of Gaussian Process Classifier models as described in Bachoc et al. (2020) <doi:10.1007/S10898-020-00920-0>. Important functions : gpcm(), predict.gpcm(), update.gpcm().
Maintained by Morgane Menz. Last updated 30 days ago.
1.30 score
gogarch:Generalized Orthogonal GARCH (GO-GARCH) Models
Provision of classes and methods for estimating generalized orthogonal GARCH models. This is an alternative approach to CC-GARCH models in the context of multivariate volatility modeling.
Maintained by Bernhard Pfaff. Last updated 3 years ago.
1.26 score 18 scripts
plde:Penalized Log-Density Estimation Using Legendre Polynomials
We present a penalized log-density estimation method using Legendre polynomials with a lasso penalty to adjust the estimate's smoothness. Re-expressing the logarithm of the density estimator via a linear combination of Legendre polynomials, we can estimate parameters by maximizing the penalized log-likelihood function. In addition, we propose an implementation strategy that builds on the coordinate descent algorithm, together with the Bayesian information criterion (BIC).
Maintained by JungJun Lee. Last updated 7 years ago.
1.00 score