R-universe search: predictors

easystats

insight:Easy Access to Model Information for Various Model Objects

A tool to provide an easy, intuitive and consistent access to information contained in various R models, like model formulas, model terms, information about random effects, data that was used to fit the model or data from response variables. 'insight' mainly revolves around two types of functions: Functions that find (the names of) information, starting with 'find_', and functions that get the underlying data, starting with 'get_'. The package has a consistent syntax and works with many different model objects, where otherwise functions to access these information are missing.

Maintained by Daniel Lüdecke. Last updated 4 days ago.

easystats hacktoberfest insight models names predictors random

15.3 match 412 stars 17.24 score 568 scripts 210 dependents

mwheymans

psfmi:Prediction Model Pooling, Selection and Performance Evaluation Across Multiply Imputed Datasets

Pooling, backward and forward selection of linear, logistic and Cox regression models in multiply imputed datasets. Backward and forward selection can be done from the pooled model using Rubin's Rules (RR), the D1, D2, D3, D4 and the median p-values method. This is also possible for Mixed models. The models can contain continuous, dichotomous, categorical and restricted cubic spline predictors and interaction terms between all these type of predictors. The stability of the models can be evaluated using (cluster) bootstrapping. The package further contains functions to pool model performance measures as ROC/AUC, Reclassification, R-squared, scaled Brier score, H&L test and calibration plots for logistic regression models. Internal validation can be done across multiply imputed datasets with cross-validation or bootstrapping. The adjusted intercept after shrinkage of pooled regression coefficients can be obtained. Backward and forward selection as part of internal validation is possible. A function to externally validate logistic prediction models in multiple imputed datasets is available and a function to compare models. For Cox models a strata variable can be included. Eekhout (2017) <doi:10.1186/s12874-017-0404-7>. Wiel (2009) <doi:10.1093/biostatistics/kxp011>. Marshall (2009) <doi:10.1186/1471-2288-9-57>.

Maintained by Martijn Heymans. Last updated 2 years ago.

cox-regression imputation imputed-datasets logistic multiple-imputation pool predictor regression selection spline spline-predictors

34.1 match 10 stars 7.17 score 70 scripts

stan-dev

rstanarm:Bayesian Applied Regression Modeling via Stan

Estimates previously compiled regression models using the 'rstan' package, which provides the R interface to the Stan C++ library for Bayesian estimation. Users specify models via the customary R syntax with a formula and data.frame plus some additional arguments for priors.

Maintained by Ben Goodrich. Last updated 9 months ago.

bayesian bayesian-data-analysis bayesian-inference bayesian-methods bayesian-statistics multilevel-models rstan rstanarm stan statistical-modeling cpp

14.9 match 393 stars 15.68 score 5.0k scripts 13 dependents

blasbenito

collinear:Automated Multicollinearity Management

Effortless multicollinearity management in data frames with both numeric and categorical variables for statistical and machine learning applications. The package simplifies multicollinearity analysis by combining four robust methods: 1) target encoding for categorical variables (Micci-Barreca, D. 2001 <doi:10.1145/507533.507538>); 2) automated feature prioritization to prevent key variable loss during filtering; 3) pairwise correlation for all variable combinations (numeric-numeric, numeric-categorical, categorical-categorical); and 4) fast computation of variance inflation factors.

Maintained by Blas M. Benito. Last updated 2 months ago.

machine-learning multicollinearity statistics

41.2 match 11 stars 5.51 score 15 scripts 1 dependents

topepo

caret:Classification and Regression Training

Misc functions for training and plotting classification and regression models.

Maintained by Max Kuhn. Last updated 3 months ago.

10.5 match 1.6k stars 19.24 score 61k scripts 303 dependents

cran

Compositional:Compositional Data Analysis

Regression, classification, contour plots, hypothesis testing and fitting of distributions for compositional data are some of the functions included. We further include functions for percentages (or proportions). The standard textbook for such data is John Aitchison's (1986) "The statistical analysis of compositional data". Relevant papers include: a) Tsagris M.T., Preston S. and Wood A.T.A. (2011). "A data-based power transformation for compositional data". Fourth International International Workshop on Compositional Data Analysis. <doi:10.48550/arXiv.1106.1451> b) Tsagris M. (2014). "The k-NN algorithm for compositional data: a revised approach with and without zero values present". Journal of Data Science, 12(3): 519--534. <doi:10.6339/JDS.201407_12(3).0008>. c) Tsagris M. (2015). "A novel, divergence based, regression for compositional data". Proceedings of the 28th Panhellenic Statistics Conference, 15-18 April 2015, Athens, Greece, 430--444. <doi:10.48550/arXiv.1511.07600>. d) Tsagris M. (2015). "Regression analysis with compositional data containing zero values". Chilean Journal of Statistics, 6(2): 47--57. <https://soche.cl/chjs/volumes/06/02/Tsagris(2015).pdf>. e) Tsagris M., Preston S. and Wood A.T.A. (2016). "Improved supervised classification for compositional data using the alpha-transformation". Journal of Classification, 33(2): 243--261. <doi:10.1007/s00357-016-9207-5>. f) Tsagris M., Preston S. and Wood A.T.A. (2017). "Nonparametric hypothesis testing for equality of means on the simplex". Journal of Statistical Computation and Simulation, 87(2): 406--422. <doi:10.1080/00949655.2016.1216554>. g) Tsagris M. and Stewart C. (2018). "A Dirichlet regression model for compositional data with zeros". Lobachevskii Journal of Mathematics, 39(3): 398--412. <doi:10.1134/S1995080218030198>. h) Alenazi A. (2019). "Regression for compositional data with compositional data as predictor variables with or without zero values". Journal of Data Science, 17(1): 219--238. <doi:10.6339/JDS.201901_17(1).0010>. i) Tsagris M. and Stewart C. (2020). "A folded model for compositional data analysis". Australian and New Zealand Journal of Statistics, 62(2): 249--277. <doi:10.1111/anzs.12289>. j) Alenazi A.A. (2022). "f-divergence regression models for compositional data". Pakistan Journal of Statistics and Operation Research, 18(4): 867--882. <doi:10.18187/pjsor.v18i4.3969>. k) Tsagris M. and Stewart C. (2022). "A Review of Flexible Transformations for Modeling Compositional Data". In Advances and Innovations in Statistics and Data Science, pp. 225--234. <doi:10.1007/978-3-031-08329-7_10>. l) Alenazi A. (2023). "A review of compositional data analysis and recent advances". Communications in Statistics--Theory and Methods, 52(16): 5535--5567. <doi:10.1080/03610926.2021.2014890>. m) Tsagris M., Alenazi A. and Stewart C. (2023). "Flexible non-parametric regression models for compositional response data with zeros". Statistics and Computing, 33(106). <doi:10.1007/s11222-023-10277-5>. n) Tsagris. M. (2025). "Constrained least squares simplicial-simplicial regression". Statistics and Computing, 35(27). <doi:10.1007/s11222-024-10560-z>. o) Sevinc V. and Tsagris. M. (2024). "Energy Based Equality of Distributions Testing for Compositional Data". <doi:10.48550/arXiv.2412.05199>.

Maintained by Michail Tsagris. Last updated 2 months ago.

50.5 match 3 stars 3.64 score 4 dependents

paul-buerkner

brms:Bayesian Regression Models using 'Stan'

Fit Bayesian generalized (non-)linear multivariate multilevel models using 'Stan' for full Bayesian inference. A wide range of distributions and link functions are supported, allowing users to fit -- among others -- linear, robust linear, count data, survival, response times, ordinal, zero-inflated, hurdle, and even self-defined mixture models all in a multilevel context. Further modeling options include both theory-driven and data-driven non-linear terms, auto-correlation structures, censoring and truncation, meta-analytic standard errors, and quite a few more. In addition, all parameters of the response distribution can be predicted in order to perform distributional regression. Prior specifications are flexible and explicitly encourage users to apply prior distributions that actually reflect their prior knowledge. Models can easily be evaluated and compared using several methods assessing posterior or prior predictions. References: Bürkner (2017) <doi:10.18637/jss.v080.i01>; Bürkner (2018) <doi:10.32614/RJ-2018-017>; Bürkner (2021) <doi:10.18637/jss.v100.i05>; Carpenter et al. (2017) <doi:10.18637/jss.v076.i01>.

Maintained by Paul-Christian Bürkner. Last updated 2 days ago.

bayesian-inference brms multilevel-models stan statistical-models

11.0 match 1.3k stars 16.61 score 13k scripts 34 dependents

inlabru-org

inlabru:Bayesian Latent Gaussian Modelling using INLA and Extensions

Facilitates spatial and general latent Gaussian modeling using integrated nested Laplace approximation via the INLA package (<https://www.r-inla.org>). Additionally, extends the GAM-like model class to more general nonlinear predictor expressions, and implements a log Gaussian Cox process likelihood for modeling univariate and spatial point processes based on ecological survey data. Model components are specified with general inputs and mapping methods to the latent variables, and the predictors are specified via general R expressions, with separate expressions for each observation likelihood model in multi-likelihood models. A prediction method based on fast Monte Carlo sampling allows posterior prediction of general expressions of the latent variables. Ecology-focused introduction in Bachl, Lindgren, Borchers, and Illian (2019) <doi:10.1111/2041-210X.13168>.

Maintained by Finn Lindgren. Last updated 3 days ago.

14.1 match 96 stars 12.62 score 832 scripts 6 dependents

tomasfryda

h2o:R Interface for the 'H2O' Scalable Machine Learning Platform

R interface for 'H2O', the scalable open source machine learning platform that offers parallelized implementations of many supervised and unsupervised machine learning algorithms such as Generalized Linear Models (GLM), Gradient Boosting Machines (including XGBoost), Random Forests, Deep Neural Networks (Deep Learning), Stacked Ensembles, Naive Bayes, Generalized Additive Models (GAM), ANOVA GLM, Cox Proportional Hazards, K-Means, PCA, ModelSelection, Word2Vec, as well as a fully automatic machine learning algorithm (H2O AutoML).

Maintained by Tomas Fryda. Last updated 1 years ago.

18.6 match 3 stars 8.20 score 7.8k scripts 11 dependents

promidat

predictoR:Predictive Data Analysis System

Perform a supervised data analysis on a database through a 'shiny' graphical interface. It includes methods such as K-Nearest Neighbors, Decision Trees, ADA Boosting, Extreme Gradient Boosting, Random Forest, Neural Networks, Deep Learning, Support Vector Machines and Bayesian Methods.

Maintained by Oldemar Rodriguez. Last updated 1 years ago.

55.0 match 1 stars 2.60 score 3 scripts

tidymodels

recipes:Preprocessing and Feature Engineering Steps for Modeling

A recipe prepares your data for modeling. We provide an extensible framework for pipeable sequences of feature engineering steps provides preprocessing tools to be applied to data. Statistical parameters for the steps can be estimated from an initial data set and then applied to other data sets. The resulting processed output can then be used as inputs for statistical or machine learning models.

Maintained by Max Kuhn. Last updated 5 days ago.

7.1 match 584 stars 18.71 score 7.2k scripts 380 dependents

refunders

refund:Regression with Functional Data

Methods for regression for functional data, including function-on-scalar, scalar-on-function, and function-on-function regression. Some of the functions are applicable to image data.

Maintained by Julia Wrobel. Last updated 6 months ago.

11.3 match 41 stars 10.25 score 472 scripts 16 dependents

blasbenito

spatialRF:Easy Spatial Modeling with Random Forest

Automatic generation and selection of spatial predictors for spatial regression with Random Forest. Spatial predictors are surrogates of variables driving the spatial structure of a response variable. The package offers two methods to generate spatial predictors from a distance matrix among training cases: 1) Moran's Eigenvector Maps (MEMs; Dray, Legendre, and Peres-Neto 2006 <DOI:10.1016/j.ecolmodel.2006.02.015>): computed as the eigenvectors of a weighted matrix of distances; 2) RFsp (Hengl et al. <DOI:10.7717/peerj.5518>): columns of the distance matrix used as spatial predictors. Spatial predictors help minimize the spatial autocorrelation of the model residuals and facilitate an honest assessment of the importance scores of the non-spatial predictors. Additionally, functions to reduce multicollinearity, identify relevant variable interactions, tune random forest hyperparameters, assess model transferability via spatial cross-validation, and explore model results via partial dependence curves and interaction surfaces are included in the package. The modelling functions are built around the highly efficient 'ranger' package (Wright and Ziegler 2017 <DOI:10.18637/jss.v077.i01>).

Maintained by Blas M. Benito. Last updated 3 years ago.

random-forest spatial-analysis spatial-regression

21.1 match 114 stars 5.45 score 49 scripts

e-sensing

sits:Satellite Image Time Series Analysis for Earth Observation Data Cubes

An end-to-end toolkit for land use and land cover classification using big Earth observation data, based on machine learning methods applied to satellite image data cubes, as described in Simoes et al (2021) <doi:10.3390/rs13132428>. Builds regular data cubes from collections in AWS, Microsoft Planetary Computer, Brazil Data Cube, Copernicus Data Space Environment (CDSE), Digital Earth Africa, Digital Earth Australia, NASA HLS using the Spatio-temporal Asset Catalog (STAC) protocol (<https://stacspec.org/>) and the 'gdalcubes' R package developed by Appel and Pebesma (2019) <doi:10.3390/data4030092>. Supports visualization methods for images and time series and smoothing filters for dealing with noisy time series. Includes functions for quality assessment of training samples using self-organized maps as presented by Santos et al (2021) <doi:10.1016/j.isprsjprs.2021.04.014>. Includes methods to reduce training samples imbalance proposed by Chawla et al (2002) <doi:10.1613/jair.953>. Provides machine learning methods including support vector machines, random forests, extreme gradient boosting, multi-layer perceptrons, temporal convolutional neural networks proposed by Pelletier et al (2019) <doi:10.3390/rs11050523>, and temporal attention encoders by Garnot and Landrieu (2020) <doi:10.48550/arXiv.2007.00586>. Supports GPU processing of deep learning models using torch <https://torch.mlverse.org/>. Performs efficient classification of big Earth observation data cubes and includes functions for post-classification smoothing based on Bayesian inference as described by Camara et al (2024) <doi:10.3390/rs16234572>, and methods for active learning and uncertainty assessment. Supports region-based time series analysis using package supercells <https://jakubnowosad.com/supercells/>. Enables best practices for estimating area and assessing accuracy of land change as recommended by Olofsson et al (2014) <doi:10.1016/j.rse.2014.02.015>. Minimum recommended requirements: 16 GB RAM and 4 CPU dual-core.

Maintained by Gilberto Camara. Last updated 1 months ago.

big-earth-data cbers earth-observation eo-datacubes geospatial image-time-series land-cover-classification landsat planetary-computer r-spatial remote-sensing rspatial satellite-image-time-series satellite-imagery sentinel-2 stac-api stac-catalog cpp

10.8 match 494 stars 9.50 score 384 scripts

zejiang-unsw

WASP:Wavelet System Prediction

The wavelet-based variance transformation method is used for system modelling and prediction. It refines predictor spectral representation using Wavelet Theory, which leads to improved model specifications and prediction accuracy. Details of methodologies used in the package can be found in Jiang, Z., Sharma, A., & Johnson, F. (2020) <doi:10.1029/2019WR026962>, Jiang, Z., Rashid, M. M., Johnson, F., & Sharma, A. (2020) <doi:10.1016/j.envsoft.2020.104907>, and Jiang, Z., Sharma, A., & Johnson, F. (2021) <doi:10.1016/J.JHYDROL.2021.126816>.

Maintained by Ze Jiang. Last updated 7 months ago.

prediction transformation wavelet

15.8 match 9 stars 6.41 score 19 scripts

stan-dev

projpred:Projection Predictive Feature Selection

Performs projection predictive feature selection for generalized linear models (Piironen, Paasiniemi, and Vehtari, 2020, <doi:10.1214/20-EJS1711>) with or without multilevel or additive terms (Catalina, Bürkner, and Vehtari, 2022, <https://proceedings.mlr.press/v151/catalina22a.html>), for some ordinal and nominal regression models (Weber, Glass, and Vehtari, 2023, <arXiv:2301.01660>), and for many other regression models (using the latent projection by Catalina, Bürkner, and Vehtari, 2021, <arXiv:2109.04702>, which can also be applied to most of the former models). The package is compatible with the 'rstanarm' and 'brms' packages, but other reference models can also be used. See the vignettes and the documentation for more information and examples.

Maintained by Frank Weber. Last updated 1 months ago.

bayes bayesian bayesian-inference rstanarm stan statistics variable-selection openblas cpp

10.0 match 112 stars 10.08 score 241 scripts

mjskay

tidybayes:Tidy Data and 'Geoms' for Bayesian Models

Compose data for and extract, manipulate, and visualize posterior draws from Bayesian models ('JAGS', 'Stan', 'rstanarm', 'brms', 'MCMCglmm', 'coda', ...) in a tidy data format. Functions are provided to help extract tidy data frames of draws from Bayesian models and that generate point summaries and intervals in a tidy format. In addition, 'ggplot2' 'geoms' and 'stats' are provided for common visualization primitives like points with multiple uncertainty intervals, eye plots (intervals plus densities), and fit curves with multiple, arbitrary uncertainty bands.

Maintained by Matthew Kay. Last updated 6 months ago.

bayesian-data-analysis brms ggplot2 jags stan tidy-data visualization

6.6 match 733 stars 14.72 score 7.3k scripts 20 dependents

hturner

gnm:Generalized Nonlinear Models

Functions to specify and fit generalized nonlinear models, including models with multiplicative interaction terms such as the UNIDIFF model from sociology and the AMMI model from crop science, and many others. Over-parameterized representations of models are used throughout; functions are provided for inference on estimable parameter combinations, as well as standard methods for diagnostics etc.

Maintained by Heather Turner. Last updated 1 years ago.

generalized-linear-models generalized-nonlinear-models statistical-models openblas

8.9 match 16 stars 10.51 score 290 scripts 21 dependents

iiasa

ibis.iSDM:Modelling framework for integrated biodiversity distribution scenarios

Integrated framework of modelling the distribution of species and ecosystems in a suitability framing. This package allows the estimation of integrated species distribution models (iSDM) based on several sources of evidence and provided presence-only and presence-absence datasets. It makes heavy use of point-process models for estimating habitat suitability and allows to include spatial latent effects and priors in the estimation. To do so 'ibis.iSDM' supports a number of engines for Bayesian and more non-parametric machine learning estimation. Further, the 'ibis.iSDM' is specifically customized to support spatial-temporal projections of habitat suitability into the future.

Maintained by Martin Jung. Last updated 4 months ago.

bayesian biodiversity integrated-framework poisson-process scenarios sdm spatial-grain spatial-predictions species-distribution-modelling

21.0 match 21 stars 4.36 score 12 scripts 1 dependents

cran

rmlnomogram:Construct Explainable Nomogram for a Machine Learning Model

Construct an explainable nomogram for a machine learning (ML) model to improve availability of an ML prediction model in addition to a computer application, particularly in a situation where a computer, a mobile phone, an internet connection, or the application accessibility are unreliable. This package enables a nomogram creation for any ML prediction models, which is conventionally limited to only a linear/logistic regression model. This nomogram may indicate the explainability value per feature, e.g., the Shapley additive explanation value, for each individual. However, this package only allows a nomogram creation for a model using categorical without or with single numerical predictors. Detailed methodologies and examples are documented in our vignette, available at <https://htmlpreview.github.io/?https://github.com/herdiantrisufriyana/rmlnomogram/blob/master/doc/ml_nomogram_exemplar.html>.

Maintained by Herdiantri Sufriyana. Last updated 2 months ago.

33.7 match 2.70 score

giuseppec

iml:Interpretable Machine Learning

Interpretability methods to analyze the behavior and predictions of any machine learning model. Implemented methods are: Feature importance described by Fisher et al. (2018) <doi:10.48550/arxiv.1801.01489>, accumulated local effects plots described by Apley (2018) <doi:10.48550/arxiv.1612.08468>, partial dependence plots described by Friedman (2001) <www.jstor.org/stable/2699986>, individual conditional expectation ('ice') plots described by Goldstein et al. (2013) <doi:10.1080/10618600.2014.907095>, local models (variant of 'lime') described by Ribeiro et. al (2016) <doi:10.48550/arXiv.1602.04938>, the Shapley Value described by Strumbelj et. al (2014) <doi:10.1007/s10115-013-0679-x>, feature interactions described by Friedman et. al <doi:10.1214/07-AOAS148> and tree surrogate models.

Maintained by Giuseppe Casalicchio. Last updated 20 days ago.

6.8 match 494 stars 12.86 score 642 scripts 4 dependents

tidymodels

dials:Tools for Creating Tuning Parameter Values

Many models contain tuning parameters (i.e. parameters that cannot be directly estimated from the data). These tools can be used to define objects for creating, simulating, or validating values for such parameters.

Maintained by Hannah Frick. Last updated 29 days ago.

6.0 match 114 stars 14.31 score 426 scripts 52 dependents

tidymodels

applicable:A Compilation of Applicability Domain Methods

A modeling package compiling applicability domain methods in R. It combines different methods to measure the amount of extrapolation new samples can have from the training set. See Netzeva et al (2005) <doi:10.1177/026119290503300209> for an overview of applicability domains.

Maintained by Marly Gotti. Last updated 2 years ago.

11.3 match 47 stars 7.42 score 47 scripts 1 dependents

chrisaberson

pwr2ppl:Power Analyses for Common Designs (Power to the People)

Statistical power analysis for designs including t-tests, correlations, multiple regression, ANOVA, mediation, and logistic regression. Functions accompany Aberson (2019) <doi:10.4324/9781315171500>.

Maintained by Chris Aberson. Last updated 3 years ago.

19.5 match 17 stars 4.16 score 17 scripts

ebird

ebirdst:Access and Analyze eBird Status and Trends Data Products

Tools for accessing and analyzing eBird Status and Trends Data Products (<https://science.ebird.org/en/status-and-trends>). eBird (<https://ebird.org/home>) is a global database of bird observations collected by member of the public. eBird Status and Trends uses these data to model global bird distributions, abundances, and population trends at a high spatial and temporal resolution.

Maintained by Matthew Strimas-Mackey. Last updated 19 days ago.

9.0 match 26 stars 8.85 score 228 scripts

john-d-fox

effects:Effect Displays for Linear, Generalized Linear, and Other Models

Graphical and tabular effect displays, e.g., of interactions, for various statistical models with linear predictors.

Maintained by John Fox. Last updated 3 years ago.

7.4 match 6 stars 10.73 score 5.4k scripts 47 dependents

openintrostat

openintro:Datasets and Supplemental Functions from 'OpenIntro' Textbooks and Labs

Supplemental functions and data for 'OpenIntro' resources, which includes open-source textbooks and resources for introductory statistics (<https://www.openintro.org/>). The package contains datasets used in our open-source textbooks along with custom plotting functions for reproducing book figures. Note that many functions and examples include color transparency; some plotting elements may not show up properly (or at all) when run in some versions of Windows operating system.

Maintained by Mine Çetinkaya-Rundel. Last updated 2 months ago.

data openintro

6.6 match 240 stars 11.39 score 6.0k scripts

boehringer-ingelheim

BPrinStratTTE:Causal Effects in Principal Strata Defined by Antidrug Antibodies

Bayesian models to estimate causal effects of biological treatments on time-to-event endpoints in clinical trials with principal strata defined by the occurrence of antidrug antibodies. The methodology is based on Frangakis and Rubin (2002) <doi:10.1111/j.0006-341x.2002.00021.x> and Imbens and Rubin (1997) <doi:10.1214/aos/1034276631>, and here adapted to a specific time-to-event setting.

Maintained by Christian Stock. Last updated 11 months ago.

bayesian-methods causal-inference clinical-trial estimand mcmc-methods pharmaceutical-development principal-stratification simulation stan time-to-event cpp

22.5 match 3.18 score

ohdsi

PatientLevelPrediction:Develop Clinical Prediction Models Using the Common Data Model

A user friendly way to create patient level prediction models using the Observational Medical Outcomes Partnership Common Data Model. Given a cohort of interest and an outcome of interest, the package can use data in the Common Data Model to build a large set of features. These features can then be used to fit a predictive model with a number of machine learning algorithms. This is further described in Reps (2017) <doi:10.1093/jamia/ocy032>.

Maintained by Egill Fridgeirsson. Last updated 8 days ago.

hades openjdk

6.5 match 190 stars 10.85 score 297 scripts

rsquaredacademy

olsrr:Tools for Building OLS Regression Models

Tools designed to make it easier for users, particularly beginner/intermediate R users to build ordinary least squares regression models. Includes comprehensive regression output, heteroskedasticity tests, collinearity diagnostics, residual diagnostics, measures of influence, model fit assessment and variable selection procedures.

Maintained by Aravind Hebbali. Last updated 4 months ago.

collinearity-diagnostics linear-models regression stepwise-regression

5.8 match 103 stars 12.19 score 1.4k scripts 4 dependents

spatstat

spatstat.explore:Exploratory Data Analysis for the 'spatstat' Family

Functionality for exploratory data analysis and nonparametric analysis of spatial data, mainly spatial point patterns, in the 'spatstat' family of packages. (Excludes analysis of spatial data on a linear network, which is covered by the separate package 'spatstat.linnet'.) Methods include quadrat counts, K-functions and their simulation envelopes, nearest neighbour distance and empty space statistics, Fry plots, pair correlation function, kernel smoothed intensity, relative risk estimation with cross-validated bandwidth selection, mark correlation functions, segregation indices, mark dependence diagnostics, and kernel estimates of covariate effects. Formal hypothesis tests of random pattern (chi-squared, Kolmogorov-Smirnov, Monte Carlo, Diggle-Cressie-Loosmore-Ford, Dao-Genton, two-stage Monte Carlo) and tests for covariate effects (Cox-Berman-Waller-Lawson, Kolmogorov-Smirnov, ANOVA) are also supported.

Maintained by Adrian Baddeley. Last updated 1 months ago.

cluster-detection confidence-intervals hypothesis-testing k-function roc-curves scan-statistics significance-testing simulation-envelopes spatial-analysis spatial-data-analysis spatial-sharpening spatial-smoothing spatial-statistics

6.5 match 1 stars 10.17 score 67 scripts 148 dependents

myles-lewis

nestedcv:Nested Cross-Validation with 'glmnet' and 'caret'

Implements nested k*l-fold cross-validation for lasso and elastic-net regularised linear models via the 'glmnet' package and other machine learning models via the 'caret' package <doi:10.1093/bioadv/vbad048>. Cross-validation of 'glmnet' alpha mixing parameter and embedded fast filter functions for feature selection are provided. Described as double cross-validation by Stone (1977) <doi:10.1111/j.2517-6161.1977.tb01603.x>. Also implemented is a method using outer CV to measure unbiased model performance metrics when fitting Bayesian linear and logistic regression shrinkage models using the horseshoe prior over parameters to encourage a sparse model as described by Piironen & Vehtari (2017) <doi:10.1214/17-EJS1337SI>.

Maintained by Myles Lewis. Last updated 5 days ago.

7.9 match 12 stars 7.92 score 46 scripts

jedazard

superpc:Supervised Principal Components

Does prediction in the case of a censored survival outcome, or a regression outcome, using the "supervised principal component" approach. 'Superpc' is especially useful for high-dimensional data when the number of features p dominates the number of samples n (p >> n paradigm), as generated, for instance, by high-throughput technologies.

Maintained by Jean-Eudes Dazard. Last updated 3 years ago.

8.8 match 7 stars 6.96 score 80 scripts 2 dependents

tidymodels

workflowsets:Create a Collection of 'tidymodels' Workflows

A workflow is a combination of a model and preprocessors (e.g, a formula, recipe, etc.) (Kuhn and Silge (2021) <https://www.tmwr.org/>). In order to try different combinations of these, an object can be created that contains many workflows. There are functions to create workflows en masse as well as training them and visualizing the results.

Maintained by Simon Couch. Last updated 5 months ago.

5.0 match 93 stars 12.21 score 294 scripts 19 dependents

ddalthorp

GenEst:Generalized Mortality Estimator

Command-line and 'shiny' GUI implementation of the GenEst models for estimating bird and bat mortality at wind and solar power facilities, following Dalthorp, et al. (2018) <doi:10.3133/tm7A2>.

Maintained by Daniel Dalthorp. Last updated 2 years ago.

cpp

7.8 match 7 stars 7.81 score 55 scripts 2 dependents

nicholasjclark

mvgam:Multivariate (Dynamic) Generalized Additive Models

Fit Bayesian Dynamic Generalized Additive Models to multivariate observations. Users can build nonlinear State-Space models that can incorporate semiparametric effects in observation and process components, using a wide range of observation families. Estimation is performed using Markov Chain Monte Carlo with Hamiltonian Monte Carlo in the software 'Stan'. References: Clark & Wells (2023) <doi:10.1111/2041-210X.13974>.

Maintained by Nicholas J Clark. Last updated 4 hours ago.

bayesian-statistics dynamic-factor-models ecological-modelling forecasting gaussian-process generalised-additive-models generalized-additive-models joint-species-distribution-modelling multilevel-models multivariate-timeseries stan time-series-analysis timeseries vector-autoregression vectorautoregression cpp

6.1 match 139 stars 9.85 score 117 scripts

harrelfe

Hmisc:Harrell Miscellaneous

Contains many functions useful for data analysis, high-level graphics, utility operations, functions for computing sample size and power, simulation, importing and annotating datasets, imputing missing values, advanced table making, variable clustering, character string manipulation, conversion of R objects to LaTeX and html code, recoding variables, caching, simplified parallel computing, encrypting and decrypting data using a safe workflow, general moving window statistical estimation, and assistance in interpreting principal component analysis.

Maintained by Frank E Harrell Jr. Last updated 2 days ago.

fortran

3.4 match 210 stars 17.61 score 17k scripts 750 dependents

pauljohn32

rockchalk:Regression Estimation and Presentation

A collection of functions for interpretation and presentation of regression analysis. These functions are used to produce the statistics lectures in <https://pj.freefaculty.org/guides/>. Includes regression diagnostics, regression tables, and plots of interactions and "moderator" variables. The emphasis is on "mean-centered" and "residual-centered" predictors. The vignette 'rockchalk' offers a fairly comprehensive overview. The vignette 'Rstyle' has advice about coding in R. The package title 'rockchalk' refers to our school motto, 'Rock Chalk Jayhawk, Go K.U.'.

Maintained by Paul E. Johnson. Last updated 3 years ago.

8.3 match 7.13 score 584 scripts 18 dependents

tyee001

VGAM:Vector Generalized Linear and Additive Models

An implementation of about 6 major classes of statistical regression models. The central algorithm is Fisher scoring and iterative reweighted least squares. At the heart of this package are the vector generalized linear and additive model (VGLM/VGAM) classes. VGLMs can be loosely thought of as multivariate GLMs. VGAMs are data-driven VGLMs that use smoothing. The book "Vector Generalized Linear and Additive Models: With an Implementation in R" (Yee, 2015) <DOI:10.1007/978-1-4939-2818-7> gives details of the statistical framework and the package. Currently only fixed-effects models are implemented. Many (100+) models and distributions are estimated by maximum likelihood estimation (MLE) or penalized MLE. The other classes are RR-VGLMs (reduced-rank VGLMs), quadratic RR-VGLMs, doubly constrained RR-VGLMs, quadratic RR-VGLMs, reduced-rank VGAMs, RCIMs (row-column interaction models)---these classes perform constrained and unconstrained quadratic ordination (CQO/UQO) models in ecology, as well as constrained additive ordination (CAO). Hauck-Donner effect detection is implemented. Note that these functions are subject to change; see the NEWS and ChangeLog files for latest changes.

Maintained by Thomas Yee. Last updated 1 months ago.

fortran

5.3 match 10 stars 10.67 score 3.6k scripts 169 dependents

zejiang-unsw

synthesis:Generate Synthetic Data from Statistical Models

Generate synthetic time series from commonly used statistical models, including linear, nonlinear and chaotic systems. Applications to testing methods can be found in Jiang, Z., Sharma, A., & Johnson, F. (2019) <doi:10.1016/j.advwatres.2019.103430> and Jiang, Z., Sharma, A., & Johnson, F. (2020) <doi:10.1029/2019WR026962> associated with an open-source tool by Jiang, Z., Rashid, M. M., Johnson, F., & Sharma, A. (2020) <doi:10.1016/j.envsoft.2020.104907>.

Maintained by Ze Jiang. Last updated 9 months ago.

12.3 match 3 stars 4.56 score 12 scripts

fsolt

dotwhisker:Dot-and-Whisker Plots of Regression Results

Create quick and easy dot-and-whisker plots of regression results. It takes as input either (1) a coefficient table in standard form or (2) one (or a list of) fitted model objects (of any type that has methods implemented in the 'parameters' package). It returns 'ggplot' objects that can be further customized using tools from the 'ggplot2' package. The package also includes helper functions for tasks such as rescaling coefficients or relabeling predictor variables. See more methodological discussion of the visualization and data management methods used in this package in Kastellec and Leoni (2007) <doi:10.1017/S1537592707072209> and Gelman (2008) <doi:10.1002/sim.3107>.

Maintained by Yue Hu. Last updated 6 months ago.

graphics plot regression-models

5.4 match 60 stars 10.25 score 680 scripts

ecospat

ecospat:Spatial Ecology Miscellaneous Methods

Collection of R functions and data sets for the support of spatial ecology analyses with a focus on pre, core and post modelling analyses of species distribution, niche quantification and community assembly. Written by current and former members and collaborators of the ecospat group of Antoine Guisan, Department of Ecology and Evolution (DEE) and Institute of Earth Surface Dynamics (IDYST), University of Lausanne, Switzerland. Read Di Cola et al. (2016) <doi:10.1111/ecog.02671> for details.

Maintained by Olivier Broennimann. Last updated 1 months ago.

5.9 match 32 stars 9.35 score 418 scripts 1 dependents

jacob-long

interactions:Comprehensive, User-Friendly Toolkit for Probing Interactions

A suite of functions for conducting and interpreting analysis of statistical interaction in regression models that was formerly part of the 'jtools' package. Functionality includes visualization of two- and three-way interactions among continuous and/or categorical variables as well as calculation of "simple slopes" and Johnson-Neyman intervals (see e.g., Bauer & Curran, 2005 <doi:10.1207/s15327906mbr4003_5>). These capabilities are implemented for generalized linear models in addition to the standard linear regression context.

Maintained by Jacob A. Long. Last updated 8 months ago.

interactions moderation social-sciences statistics

4.7 match 131 stars 11.39 score 1.2k scripts 5 dependents

mmaechler

supclust:Supervised Clustering of Predictor Variables Such as Genes

Methodology for supervised grouping aka "clustering" of potentially many predictor variables, such as genes etc, implementing algorithms 'PELORA' and 'WILMA'.

Maintained by Martin Maechler. Last updated 7 months ago.

openblas

12.7 match 2 stars 4.15 score 28 scripts

bioc

sesame:SEnsible Step-wise Analysis of DNA MEthylation BeadChips

Tools For analyzing Illumina Infinium DNA methylation arrays. SeSAMe provides utilities to support analyses of multiple generations of Infinium DNA methylation BeadChips, including preprocessing, quality control, visualization and inference. SeSAMe features accurate detection calling, intelligent inference of ethnicity, sex and advanced quality control routines.

Maintained by Wanding Zhou. Last updated 2 months ago.

dnamethylation methylationarray preprocessing qualitycontrol bioinformatics dna-methylation microarray

5.7 match 69 stars 9.08 score 258 scripts 1 dependents

capnrefsmmat

regressinator:Simulate and Diagnose (Generalized) Linear Models

Simulate samples from populations with known covariate distributions, generate response variables according to common linear and generalized linear model families, draw from sampling distributions of regression estimates, and perform visual inference on diagnostics from model fits.

Maintained by Alex Reinhart. Last updated 5 months ago.

statistics

8.5 match 4 stars 6.08 score 25 scripts

tidymodels

embed:Extra Recipes for Encoding Predictors

Predictors can be converted to one or more numeric representations using a variety of methods. Effect encodings using simple generalized linear models <doi:10.48550/arXiv.1611.09477> or nonlinear models <doi:10.48550/arXiv.1604.06737> can be used. There are also functions for dimension reduction and other approaches.

Maintained by Emil Hvitfeldt. Last updated 2 months ago.

5.4 match 142 stars 9.35 score 1.1k scripts

simulatr

simrel:Simulation of Multivariate Linear Model Data

Researchers have been using simulated data from a multivariate linear model to compare and evaluate different methods, ideas and models. Additionally, teachers and educators have been using a simulation tool to demonstrate and teach various statistical and machine learning concepts. This package helps users to simulate linear model data with a wide range of properties by tuning few parameters such as relevant latent components. In addition, a shiny app as an 'RStudio' gadget gives users a simple interface for using the simulation function. See more on: Sæbø, S., Almøy, T., Helland, I.S. (2015) <doi:10.1016/j.chemolab.2015.05.012> and Rimal, R., Almøy, T., Sæbø, S. (2018) <doi:10.1016/j.chemolab.2018.02.009>.

Maintained by Raju Rimal. Last updated 2 years ago.

bivariate-simulation multivariate-simulation relevant-predictor-components simulated-data simulation univariate-simulation

10.4 match 3 stars 4.78 score 40 scripts

geomorphr

geomorph:Geometric Morphometric Analyses of 2D and 3D Landmark Data

Read, manipulate, and digitize landmark data, generate shape variables via Procrustes analysis for points, curves and surfaces, perform shape analyses, and provide graphical depictions of shapes and patterns of shape variation.

Maintained by Dean Adams. Last updated 1 months ago.

4.1 match 76 stars 12.05 score 700 scripts 6 dependents

jenfb

bkmr:Bayesian Kernel Machine Regression

Implementation of a statistical approach for estimating the joint health effects of multiple concurrent exposures, as described in Bobb et al (2015) <doi:10.1093/biostatistics/kxu058>.

Maintained by Jennifer F. Bobb. Last updated 4 months ago.

6.8 match 55 stars 7.03 score 182 scripts 1 dependents

knausb

vcfR:Manipulate and Visualize VCF Data

Facilitates easy manipulation of variant call format (VCF) data. Functions are provided to rapidly read from and write to VCF files. Once VCF data is read into R a parser function extracts matrices of data. This information can then be used for quality control or other purposes. Additional functions provide visualization of genomic data. Once processing is complete data may be written to a VCF file (*.vcf.gz). It also may be converted into other popular R objects (e.g., genlight, DNAbin). VcfR provides a link between VCF data and familiar R software.

Maintained by Brian J. Knaus. Last updated 22 days ago.

genomics population-genetics population-genomics rcpp vcf-data visualization zlib cpp

3.5 match 254 stars 13.59 score 3.1k scripts 19 dependents

dwarton

ecostats:Code and Data Accompanying the Eco-Stats Text (Warton 2022)

Functions and data supporting the Eco-Stats text (Warton, 2022, Springer), and solutions to exercises. Functions include tools for using simulation envelopes in diagnostic plots, and a function for diagnostic plots of multivariate linear models. Datasets mentioned in the package are included here (where not available elsewhere) and there is a vignette for each chapter of the text with solutions to exercises.

Maintained by David Warton. Last updated 1 years ago.

7.2 match 8 stars 6.58 score 53 scripts

choonghyunryu

dlookr:Tools for Data Diagnosis, Exploration, Transformation

A collection of tools that support data diagnosis, exploration, and transformation. Data diagnostics provides information and visualization of missing values, outliers, and unique and negative values to help you understand the distribution and quality of your data. Data exploration provides information and visualization of the descriptive statistics of univariate variables, normality tests and outliers, correlation of two variables, and the relationship between the target variable and predictor. Data transformation supports binning for categorizing continuous variables, imputes missing values and outliers, and resolves skewness. And it creates automated reports that support these three tasks.

Maintained by Choonghyun Ryu. Last updated 9 months ago.

4.3 match 212 stars 11.05 score 748 scripts 2 dependents

clbustos

dominanceanalysis:Dominance Analysis

Dominance analysis is a method that allows to compare the relative importance of predictors in multiple regression models: ordinary least squares, generalized linear models, hierarchical linear models, beta regression and dynamic linear models. The main principles and methods of dominance analysis are described in Budescu, D. V. (1993) <doi:10.1037/0033-2909.114.3.542> and Azen, R., & Budescu, D. V. (2003) <doi:10.1037/1082-989X.8.2.129> for ordinary least squares regression. Subsequently, the extensions for multivariate regression, logistic regression and hierarchical linear models were described in Azen, R., & Budescu, D. V. (2006) <doi:10.3102/10769986031002157>, Azen, R., & Traxel, N. (2009) <doi:10.3102/1076998609332754> and Luo, W., & Azen, R. (2013) <doi:10.3102/1076998612458319>, respectively.

Maintained by Claudio Bustos Navarrete. Last updated 1 years ago.

8.1 match 25 stars 5.75 score 45 scripts

gavinsimpson

gratia:Graceful 'ggplot'-Based Graphics and Other Functions for GAMs Fitted Using 'mgcv'

Graceful 'ggplot'-based graphics and utility functions for working with generalized additive models (GAMs) fitted using the 'mgcv' package. Provides a reimplementation of the plot() method for GAMs that 'mgcv' provides, as well as 'tidyverse' compatible representations of estimated smooths.

Maintained by Gavin L. Simpson. Last updated 5 days ago.

distributional-regression gam gamm generalized-additive-mixed-models generalized-additive-models ggplot2 glm lm mgcv penalized-spline random-effects smoothing splines

3.6 match 216 stars 12.68 score 1.6k scripts 1 dependents

sahirbhatnagar

casebase:Fitting Flexible Smooth-in-Time Hazards and Risk Functions via Logistic and Multinomial Regression

Fit flexible and fully parametric hazard regression models to survival data with single event type or multiple competing causes via logistic and multinomial regression. Our formulation allows for arbitrary functional forms of time and its interactions with other predictors for time-dependent hazards and hazard ratios. From the fitted hazard model, we provide functions to readily calculate and plot cumulative incidence and survival curves for a given covariate profile. This approach accommodates any log-linear hazard function of prognostic time, treatment, and covariates, and readily allows for non-proportionality. We also provide a plot method for visualizing incidence density via population time plots. Based on the case-base sampling approach of Hanley and Miettinen (2009) <DOI:10.2202/1557-4679.1125>, Saarela and Arjas (2015) <DOI:10.1111/sjos.12125>, and Saarela (2015) <DOI:10.1007/s10985-015-9352-x>.

Maintained by Sahir Bhatnagar. Last updated 7 months ago.

competing-risks cox-regression regression-models survival-analysis

6.3 match 9 stars 7.16 score 94 scripts

dadongz

OncoSubtype:Predict Cancer Subtypes Based on TCGA Data using Machine Learning Method

Provide functionality for cancer subtyping using nearest centroids or machine learning methods based on TCGA data.

Maintained by Dadong Zhang. Last updated 12 months ago.

12.0 match 1 stars 3.70 score 1 scripts

cran

mgcv:Mixed GAM Computation Vehicle with Automatic Smoothness Estimation

Generalized additive (mixed) models, some of their extensions and other generalized ridge regression with multiple smoothing parameter estimation by (Restricted) Marginal Likelihood, Generalized Cross Validation and similar, or using iterated nested Laplace approximation for fully Bayesian inference. See Wood (2017) <doi:10.1201/9781315370279> for an overview. Includes a gam() function, a wide variety of smoothers, 'JAGS' support and distributions beyond the exponential family.

Maintained by Simon Wood. Last updated 1 years ago.

openblas openmp

3.4 match 32 stars 12.71 score 17k scripts 7.8k dependents

qingzhaoyu

mma:Multiple Mediation Analysis

Used for general multiple mediation analysis. The analysis method is described in Yu and Li (2022) (ISBN: 9780367365479) "Statistical Methods for Mediation, Confounding and Moderation Analysis Using R and SAS", published by Chapman and Hall/CRC; and Yu et al.(2017) <DOI:10.1016/j.sste.2017.02.001> "Exploring racial disparity in obesity: a mediation analysis considering geo-coded environmental factors", published on Spatial and Spatio-temporal Epidemiology, 21, 13-23.

Maintained by Qingzhao Yu. Last updated 2 years ago.

10.7 match 1 stars 3.96 score 61 scripts 1 dependents

dmcartor

MDMR:Multivariate Distance Matrix Regression

This package allows users to conduct multivariate distance matrix regression using analytic p-values and compute measures of effect size. For details on the method, see McArtor, Lubke, & Bergeman (2017) <https://doi.org/10.1007/s11336-016-9527-8>.

Maintained by Daniel B. McArtor. Last updated 7 years ago.

6.9 match 6 stars 6.13 score 15 scripts 1 dependents

bioc

PDATK:Pancreatic Ductal Adenocarcinoma Tool-Kit

Pancreatic ductal adenocarcinoma (PDA) has a relatively poor prognosis and is one of the most lethal cancers. Molecular classification of gene expression profiles holds the potential to identify meaningful subtypes which can inform therapeutic strategy in the clinical setting. The Pancreatic Cancer Adenocarcinoma Tool-Kit (PDATK) provides an S4 class-based interface for performing unsupervised subtype discovery, cross-cohort meta-clustering, gene-expression-based classification, and subsequent survival analysis to identify prognostically useful subtypes in pancreatic cancer and beyond. Two novel methods, Consensus Subtypes in Pancreatic Cancer (CSPC) and Pancreatic Cancer Overall Survival Predictor (PCOSP) are included for consensus-based meta-clustering and overall-survival prediction, respectively. Additionally, four published subtype classifiers and three published prognostic gene signatures are included to allow users to easily recreate published results, apply existing classifiers to new data, and benchmark the relative performance of new methods. The use of existing Bioconductor classes as input to all PDATK classes and methods enables integration with existing Bioconductor datasets, including the 21 pancreatic cancer patient cohorts available in the MetaGxPancreas data package. PDATK has been used to replicate results from Sandhu et al (2019) [https://doi.org/10.1200/cci.18.00102] and an additional paper is in the works using CSPC to validate subtypes from the included published classifiers, both of which use the data available in MetaGxPancreas. The inclusion of subtype centroids and prognostic gene signatures from these and other publications will enable researchers and clinicians to classify novel patient gene expression data, allowing the direct clinical application of the classifiers included in PDATK. Overall, PDATK provides a rich set of tools to identify and validate useful prognostic and molecular subtypes based on gene-expression data, benchmark new classifiers against existing ones, and apply discovered classifiers on novel patient data to inform clinical decision making.

Maintained by Benjamin Haibe-Kains. Last updated 5 months ago.

geneexpression pharmacogenetics pharmacogenomics software classification survival clustering geneprediction

9.7 match 1 stars 4.31 score 17 scripts

florale

multilevelcoda:Estimate Bayesian Multilevel Models for Compositional Data

Implement Bayesian Multilevel Modelling for compositional data in a multilevel framework. Compute multilevel compositional data and Isometric log ratio (ILR) at between and within-person levels, fit Bayesian multilevel models for compositional predictors and outcomes, and run post-hoc analyses such as isotemporal substitution models. References: Le, Stanford, Dumuid, and Wiley (2024) <doi:10.48550/arXiv.2405.03985>, Le, Dumuid, Stanford, and Wiley (2024) <doi:10.48550/arXiv.2411.12407>.

Maintained by Flora Le. Last updated 2 days ago.

bayesian-inference compositional-data-analysis multilevel-models multilevelcoda

4.9 match 14 stars 8.31 score 118 scripts

cardiomoon

ggiraphExtra:Make Interactive 'ggplot2'. Extension to 'ggplot2' and 'ggiraph'

Collection of functions to enhance 'ggplot2' and 'ggiraph'. Provides functions for exploratory plots. All plot can be a 'static' plot or an 'interactive' plot using 'ggiraph'.

Maintained by Keon-Woong Moon. Last updated 4 years ago.

4.5 match 48 stars 8.93 score 402 scripts 3 dependents

qyaozh

Keng:Knock Errors Off Nice Guesses

Miscellaneous functions and data used in psychological research and teaching. Keng currently has a built-in dataset depress, and could (1) scale a vector; (2) compute the cut-off values of Pearson's r with known sample size; (3) test the significance and compute the post-hoc power for Pearson's r with known sample size; (4) conduct a priori power analysis and plan the sample size for Pearson's r; (5) compare lm()'s fitted outputs using R-squared, f_squared, post-hoc power, and PRE (Proportional Reduction in Error, also called partial R-squared or partial Eta-squared); (6) calculate PRE from partial correlation, Cohen's f, or f_squared; (7) conduct a priori power analysis and plan the sample size for one or a set of predictors in regression analysis; (8) conduct post-hoc power analysis for one or a set of predictors in regression analysis with known sample size.

Maintained by Qingyao Zhang. Last updated 1 months ago.

7.5 match 5.42 score 11 scripts

thothorn

ipred:Improved Predictors

Improved predictive models by indirect classification and bagging for classification, regression and survival problems as well as resampling based estimators of prediction error.

Maintained by Torsten Hothorn. Last updated 8 months ago.

3.8 match 10.76 score 3.3k scripts 411 dependents

friendly

candisc:Visualizing Generalized Canonical Discriminant and Canonical Correlation Analysis

Functions for computing and visualizing generalized canonical discriminant analyses and canonical correlation analysis for a multivariate linear model. Traditional canonical discriminant analysis is restricted to a one-way 'MANOVA' design and is equivalent to canonical correlation analysis between a set of quantitative response variables and a set of dummy variables coded from the factor variable. The 'candisc' package generalizes this to higher-way 'MANOVA' designs for all factors in a multivariate linear model, computing canonical scores and vectors for each term. The graphic functions provide low-rank (1D, 2D, 3D) visualizations of terms in an 'mlm' via the 'plot.candisc' and 'heplot.candisc' methods. Related plots are now provided for canonical correlation analysis when all predictors are quantitative.

Maintained by Michael Friendly. Last updated 10 months ago.

dimension-reduction multivariate-linear-models visualization

4.5 match 15 stars 8.86 score 221 scripts 3 dependents

holgstr

fmeffects:Model-Agnostic Interpretations with Forward Marginal Effects

Create local, regional, and global explanations for any machine learning model with forward marginal effects. You provide a model and data, and 'fmeffects' computes feature effects. The package is based on the theory in: C. A. Scholbeck, G. Casalicchio, C. Molnar, B. Bischl, and C. Heumann (2022) <doi:10.48550/arXiv.2201.08837>.

Maintained by Holger Löwe. Last updated 4 months ago.

7.0 match 2 stars 5.73 score 6 scripts

jenniniku

gllvm:Generalized Linear Latent Variable Models

Analysis of multivariate data using generalized linear latent variable models (gllvm). Estimation is performed using either the Laplace method, variational approximations, or extended variational approximations, implemented via TMB (Kristensen et al. (2016), <doi:10.18637/jss.v070.i05>).

Maintained by Jenni Niku. Last updated 2 days ago.

cpp openmp

3.8 match 51 stars 10.52 score 176 scripts 1 dependents

lifewatch

sdmpredictors:Species Distribution Modelling Predictor Datasets

Terrestrial and marine predictors for species distribution modelling from multiple sources, including WorldClim <https://www.worldclim.org/>,, ENVIREM <https://envirem.github.io/>, Bio-ORACLE <https://bio-oracle.org/> and MARSPEC <http://www.marspec.org/>.

Maintained by Salvador Fernandez. Last updated 2 years ago.

bio-oracle lifewatch lifewatchvliz species-distribution-modelling

5.3 match 30 stars 7.47 score 218 scripts

vgherard

sbo:Text Prediction via Stupid Back-Off N-Gram Models

Utilities for training and evaluating text predictors based on Stupid Back-Off N-gram models (Brants et al., 2007, <https://www.aclweb.org/anthology/D07-1090/>).

Maintained by Valerio Gherardi. Last updated 4 years ago.

natural-language-processing ngram-models predictive-text sbo cpp

8.2 match 10 stars 4.78 score 12 scripts

friendly

vcdExtra:'vcd' Extensions and Additions

Provides additional data sets, methods and documentation to complement the 'vcd' package for Visualizing Categorical Data and the 'gnm' package for Generalized Nonlinear Models. In particular, 'vcdExtra' extends mosaic, assoc and sieve plots from 'vcd' to handle 'glm()' and 'gnm()' models and adds a 3D version in 'mosaic3d'. Additionally, methods are provided for comparing and visualizing lists of 'glm' and 'loglm' objects. This package is now a support package for the book, "Discrete Data Analysis with R" by Michael Friendly and David Meyer.

Maintained by Michael Friendly. Last updated 5 months ago.

categorical-data-visualization generalized-linear-models mosaic-plots

3.8 match 24 stars 10.34 score 472 scripts 3 dependents

skranz

gtree:gtree basic functionality to model and solve games

gtree basic functionality to model and solve games

Maintained by Sebastian Kranz. Last updated 4 years ago.

economic-experiments economics gambit game-theory nash-equilibrium

10.2 match 18 stars 3.79 score 23 scripts 1 dependents

plangfelder

WGCNA:Weighted Correlation Network Analysis

Functions necessary to perform Weighted Correlation Network Analysis on high-dimensional data as originally described in Horvath and Zhang (2005) <doi:10.2202/1544-6115.1128> and Langfelder and Horvath (2008) <doi:10.1186/1471-2105-9-559>. Includes functions for rudimentary data cleaning, construction of correlation networks, module identification, summarization, and relating of variables and modules to sample traits. Also includes a number of utility functions for data manipulation and visualization.

Maintained by Peter Langfelder. Last updated 6 months ago.

cpp

4.0 match 54 stars 9.65 score 5.3k scripts 32 dependents

gamlss-dev

gamlss:Generalized Additive Models for Location Scale and Shape

Functions for fitting the Generalized Additive Models for Location Scale and Shape introduced by Rigby and Stasinopoulos (2005), <doi:10.1111/j.1467-9876.2005.00510.x>. The models use a distributional regression approach where all the parameters of the conditional distribution of the response variable are modelled using explanatory variables.

Maintained by Mikis Stasinopoulos. Last updated 4 months ago.

3.4 match 16 stars 11.23 score 2.0k scripts 49 dependents

theoreticalecology

sjSDM:Scalable Joint Species Distribution Modeling

A scalable and fast method for estimating joint Species Distribution Models (jSDMs) for big community data, including eDNA data. The package estimates a full (i.e. non-latent) jSDM with different response distributions (including the traditional multivariate probit model). The package allows to perform variation partitioning (VP) / ANOVA on the fitted models to separate the contribution of environmental, spatial, and biotic associations. In addition, the total R-squared can be further partitioned per species and site to reveal the internal metacommunity structure, see Leibold et al., <doi:10.1111/oik.08618>. The internal structure can then be regressed against environmental and spatial distinctiveness, richness, and traits to analyze metacommunity assembly processes. The package includes support for accounting for spatial autocorrelation and the option to fit responses using deep neural networks instead of a standard linear predictor. As described in Pichler & Hartig (2021) <doi:10.1111/2041-210X.13687>, scalability is achieved by using a Monte Carlo approximation of the joint likelihood implemented via 'PyTorch' and 'reticulate', which can be run on CPUs or GPUs.

Maintained by Maximilian Pichler. Last updated 23 days ago.

deep-learning gpu-acceleration machine-learning species-distribution-modelling species-interactions

4.8 match 69 stars 7.64 score 70 scripts

tobiasschoch

robsurvey:Robust Survey Statistics Estimation

Robust (outlier-resistant) estimators of finite population characteristics like of means, totals, ratios, regression, etc. Available methods are M- and GM-estimators of regression, weight reduction, trimming, and winsorization. The package extends the 'survey' <https://CRAN.R-project.org/package=survey> package.

Maintained by Tobias Schoch. Last updated 3 months ago.

openblas

6.0 match 9 stars 6.16 score 5 scripts

abichat

evabic:Evaluation of Binary Classifiers

Evaluates the performance of binary classifiers. Computes confusion measures (TP, TN, FP, FN), derived measures (TPR, FDR, accuracy, F1, DOR, ..), and area under the curve. Outputs are well suited for nested dataframes.

Maintained by Antoine Bichat. Last updated 3 years ago.

classifier measures predictors roc-curve statistics

10.0 match 6 stars 3.62 score 14 scripts

rvalavi

blockCV:Spatial and Environmental Blocking for K-Fold and LOO Cross-Validation

Creating spatially or environmentally separated folds for cross-validation to provide a robust error estimation in spatially structured environments; Investigating and visualising the effective range of spatial autocorrelation in continuous raster covariates and point samples to find an initial realistic distance band to separate training and testing datasets spatially described in Valavi, R. et al. (2019) <doi:10.1111/2041-210X.13107>.

Maintained by Roozbeh Valavi. Last updated 5 months ago.

cross-validation spatial spatial-cross-validation spatial-modelling species-distribution-modelling cpp

3.4 match 113 stars 10.49 score 302 scripts 3 dependents

jackmwolf

pcsstools:Tools for Regression Using Pre-Computed Summary Statistics

Defines functions to describe regression models using only pre-computed summary statistics (i.e. means, variances, and covariances) in place of individual participant data. Possible models include linear models for linear combinations, products, and logical combinations of phenotypes. Implements methods presented in Wolf et al. (2021) <doi:10.3389/fgene.2021.745901> Wolf et al. (2020) <doi:10.1142/9789811215636_0063> and Gasdaska et al. (2019) <doi:10.1142/9789813279827_0036>.

Maintained by Jack Wolf. Last updated 9 months ago.

gwas statistical-genetics

10.5 match 5 stars 3.40 score 5 scripts

forestry-labs

distillML:Model Distillation and Interpretability Methods for Machine Learning Models

Provides several methods for model distillation and interpretability for general black box machine learning models and treatment effect estimation methods. For details on the algorithms implemented, see <https://forestry-labs.github.io/distillML/index.html> Brian Cho, Theo F. Saarinen, Jasjeet S. Sekhon, Simon Walter.

Maintained by Theo Saarinen. Last updated 2 years ago.

bart distillation-model explainable-machine-learning explainable-ml interpretability interpretable-machine-learning machine-learning model random-forest xgboost

9.1 match 7 stars 3.92 score 12 scripts

jfiksel

codalm:Transformation-Free Linear Regression for Compositional Outcomes and Predictors

Implements the expectation-maximization (EM) algorithm as described in Fiksel et al. (2020) <arXiv:2004.07881> for transformation-free linear regression for compositional outcomes and predictors.

Maintained by Jacob Fiksel. Last updated 4 years ago.

8.5 match 3 stars 4.18 score 5 scripts

lme4

lme4:Linear Mixed-Effects Models using 'Eigen' and S4

Fit linear and generalized linear mixed-effects models. The models and their components are represented using S4 classes and methods. The core computational algorithms are implemented using the 'Eigen' C++ library for numerical linear algebra and 'RcppEigen' "glue".

Maintained by Ben Bolker. Last updated 2 days ago.

cpp

1.7 match 647 stars 20.69 score 35k scripts 1.5k dependents

nepem-ufsc

metan:Multi Environment Trials Analysis

Performs stability analysis of multi-environment trial data using parametric and non-parametric methods. Parametric methods includes Additive Main Effects and Multiplicative Interaction (AMMI) analysis by Gauch (2013) <doi:10.2135/cropsci2013.04.0241>, Ecovalence by Wricke (1965), Genotype plus Genotype-Environment (GGE) biplot analysis by Yan & Kang (2003) <doi:10.1201/9781420040371>, geometric adaptability index by Mohammadi & Amri (2008) <doi:10.1007/s10681-007-9600-6>, joint regression analysis by Eberhart & Russel (1966) <doi:10.2135/cropsci1966.0011183X000600010011x>, genotypic confidence index by Annicchiarico (1992), Murakami & Cruz's (2004) method, power law residuals (POLAR) statistics by Doring et al. (2015) <doi:10.1016/j.fcr.2015.08.005>, scale-adjusted coefficient of variation by Doring & Reckling (2018) <doi:10.1016/j.eja.2018.06.007>, stability variance by Shukla (1972) <doi:10.1038/hdy.1972.87>, weighted average of absolute scores by Olivoto et al. (2019a) <doi:10.2134/agronj2019.03.0220>, and multi-trait stability index by Olivoto et al. (2019b) <doi:10.2134/agronj2019.03.0221>. Non-parametric methods includes superiority index by Lin & Binns (1988) <doi:10.4141/cjps88-018>, nonparametric measures of phenotypic stability by Huehn (1990) <doi:10.1007/BF00024241>, TOP third statistic by Fox et al. (1990) <doi:10.1007/BF00040364>. Functions for computing biometrical analysis such as path analysis, canonical correlation, partial correlation, clustering analysis, and tools for inspecting, manipulating, summarizing and plotting typical multi-environment trial data are also provided.

Maintained by Tiago Olivoto. Last updated 9 days ago.

3.6 match 2 stars 9.48 score 1.3k scripts 2 dependents

sibipx

missForestPredict:Missing Value Imputation using Random Forest for Prediction Settings

Missing data imputation based on the 'missForest' algorithm (Stekhoven, Daniel J (2012) <doi:10.1093/bioinformatics/btr597>) with adaptations for prediction settings. The function missForest() is used to impute a (training) dataset with missing values and to learn imputation models that can be later used for imputing new observations. The function missForestPredict() is used to impute one or multiple new observations (test set) using the models learned on the training data.

Maintained by Elena Albu. Last updated 1 years ago.

8.5 match 4.00 score 3 scripts

bioc

cancerclass:Development and validation of diagnostic tests from high-dimensional molecular data

The classification protocol starts with a feature selection step and continues with nearest-centroid classification. The accurarcy of the predictor can be evaluated using training and test set validation, leave-one-out cross-validation or in a multiple random validation protocol. Methods for calculation and visualization of continuous prediction scores allow to balance sensitivity and specificity and define a cutoff value according to clinical requirements.

Maintained by Daniel Kosztyla. Last updated 5 months ago.

cancer microarray classification visualization

10.2 match 3.30 score 10 scripts

solivella

NetMix:Dynamic Mixed-Membership Network Regression Model

Stochastic collapsed variational inference on mixed-membership stochastic blockmodel for networks, incorporating node-level predictors of mixed-membership vectors, as well as dyad-level predictors. For networks observed over time, the model defines a hidden Markov process that allows the effects of node-level predictors to evolve in discrete, historical periods. In addition, the package offers a variety of utilities for exploring results of estimation, including tools for conducting posterior predictive checks of goodness-of-fit and several plotting functions. The package implements methods described in Olivella, Pratt and Imai (2019) 'Dynamic Stochastic Blockmodel Regression for Social Networks: Application to International Conflicts', available at <https://www.santiagoolivella.info/pdfs/socnet.pdf>.

Maintained by Santiago Olivella. Last updated 1 years ago.

openblas cpp openmp

7.8 match 11 stars 4.30 score 36 scripts

tidyverse

modelr:Modelling Functions that Work with the Pipe

Functions for modelling that help you seamlessly integrate modelling into a pipeline of data manipulation and visualisation.

Maintained by Hadley Wickham. Last updated 1 years ago.

modelling

2.0 match 401 stars 16.44 score 6.9k scripts 1.0k dependents

anthonydevaux

DynForest:Random Forest with Multivariate Longitudinal Predictors

Based on random forest principle, 'DynForest' is able to include multiple longitudinal predictors to provide individual predictions. Longitudinal predictors are modeled through the random forest. The methodology is fully described for a survival outcome in: Devaux, Helmer, Genuer & Proust-Lima (2023) <doi: 10.1177/09622802231206477>.

Maintained by Anthony Devaux. Last updated 5 months ago.

5.0 match 16 stars 6.38 score 8 scripts

munchfab

mlts:Multilevel Latent Time Series Models with 'R' and 'Stan'

Fit multilevel manifest or latent time-series models, including popular Dynamic Structural Equation Models (DSEM). The models can be set up and modified with user-friendly functions and are fit to the data using 'Stan' for Bayesian inference. Path models and formulas for user-defined models can be easily created with functions using 'knitr'. Asparouhov, Hamaker, & Muthen (2018) <doi:10.1080/10705511.2017.1406803>.

Maintained by Kenneth Koslowski. Last updated 9 months ago.

cpp

5.5 match 2 stars 5.68 score 9 scripts

hturner

BradleyTerry2:Bradley-Terry Models

Specify and fit the Bradley-Terry model, including structured versions in which the parameters are related to explanatory variables through a linear predictor and versions with contest-specific effects, such as a home advantage.

Maintained by Heather Turner. Last updated 6 years ago.

bradley-terry-models paired-comparisons statistical-models

3.9 match 20 stars 7.97 score 172 scripts 1 dependents

amices

mice:Multivariate Imputation by Chained Equations

Multiple imputation using Fully Conditional Specification (FCS) implemented by the MICE algorithm as described in Van Buuren and Groothuis-Oudshoorn (2011) <doi:10.18637/jss.v045.i03>. Each variable has its own imputation model. Built-in imputation models are provided for continuous data (predictive mean matching, normal), binary data (logistic regression), unordered categorical data (polytomous logistic regression) and ordered categorical data (proportional odds). MICE can also impute continuous two-level data (normal model, pan, second-level variables). Passive imputation can be used to maintain consistency between variables. Various diagnostic plots are available to inspect the quality of the imputations.

Maintained by Stef van Buuren. Last updated 6 days ago.

chained-equations fcs imputation mice missing-data missing-values multiple-imputation multivariate-data cpp

1.9 match 462 stars 16.50 score 10k scripts 154 dependents

gi0na

ghypernet:Fit and Simulate Generalised Hypergeometric Ensembles of Graphs

Provides functions for model fitting and selection of generalised hypergeometric ensembles of random graphs (gHypEG). To learn how to use it, check the vignettes for a quick tutorial. Please reference its use as Casiraghi, G., Nanumyan, V. (2019) <doi:10.5281/zenodo.2555300> together with those relevant references from the one listed below. The package is based on the research developed at the Chair of Systems Design, ETH Zurich. Casiraghi, G., Nanumyan, V., Scholtes, I., Schweitzer, F. (2016) <arXiv:1607.02441>. Casiraghi, G., Nanumyan, V., Scholtes, I., Schweitzer, F. (2017) <doi:10.1007/978-3-319-67256-4_11>. Casiraghi, G., (2017) <arXiv:1702.02048> Brandenberger, L., Casiraghi, G., Nanumyan, V., Schweitzer, F. (2019) <doi:10.1145/3341161.3342926> Casiraghi, G. (2019) <doi:10.1007/s41109-019-0241-1>. Casiraghi, G., Nanumyan, V. (2021) <doi:10.1038/s41598-021-92519-y>. Casiraghi, G. (2021) <doi:10.1088/2632-072X/ac0493>.

Maintained by Giona Casiraghi. Last updated 11 months ago.

data-mining data-science graphs network network-analysis random-graph-generation random-graphs

5.4 match 8 stars 5.68 score 20 scripts

uscbiostats

xrnet:Hierarchical Regularized Regression

Fits hierarchical regularized regression models to incorporate potentially informative external data, Weaver and Lewinger (2019) <doi:10.21105/joss.01761>. Utilizes coordinate descent to efficiently fit regularized regression models both with and without external information with the most common penalties used in practice (i.e. ridge, lasso, elastic net). Support for standard R matrices, sparse matrices and big.matrix objects.

Maintained by Garrett Weaver. Last updated 8 months ago.

cpp

6.9 match 10 stars 4.48 score 10 scripts

datalorax

equatiomatic:Transform Models into 'LaTeX' Equations

The goal of 'equatiomatic' is to reduce the pain associated with writing 'LaTeX' formulas from fitted models. The primary function of the package, extract_eq(), takes a fitted model object as its input and returns the corresponding 'LaTeX' code for the model.

Maintained by Philippe Grosjean. Last updated 7 days ago.

2.6 match 619 stars 11.75 score 424 scripts 5 dependents

tidymodels

rsample:General Resampling Infrastructure

Classes and functions to create and summarize different types of resampling objects (e.g. bootstrap, cross-validation).

Maintained by Hannah Frick. Last updated 5 days ago.

1.8 match 341 stars 16.72 score 5.2k scripts 79 dependents

jinli22

spm:Spatial Predictive Modeling

Introduction to some novel accurate hybrid methods of geostatistical and machine learning methods for spatial predictive modelling. It contains two commonly used geostatistical methods, two machine learning methods, four hybrid methods and two averaging methods. For each method, two functions are provided. One function is for assessing the predictive errors and accuracy of the method based on cross-validation. The other one is for generating spatial predictions using the method. For details please see: Li, J., Potter, A., Huang, Z., Daniell, J. J. and Heap, A. (2010) <https:www.ga.gov.au/metadata-gateway/metadata/record/gcat_71407> Li, J., Heap, A. D., Potter, A., Huang, Z. and Daniell, J. (2011) <doi:10.1016/j.csr.2011.05.015> Li, J., Heap, A. D., Potter, A. and Daniell, J. (2011) <doi:10.1016/j.envsoft.2011.07.004> Li, J., Potter, A., Huang, Z. and Heap, A. (2012) <https:www.ga.gov.au/metadata-gateway/metadata/record/74030>.

Maintained by Jin Li. Last updated 3 years ago.

5.5 match 3 stars 5.46 score 107 scripts 3 dependents

bioc

SemDist:Information Accretion-based Function Predictor Evaluation

This package implements methods to calculate information accretion for a given version of the gene ontology and uses this data to calculate remaining uncertainty, misinformation, and semantic similarity for given sets of predicted annotations and true annotations from a protein function predictor.

Maintained by Ian Gonzalez. Last updated 5 months ago.

classification annotation go software

6.9 match 1 stars 4.30 score 3 scripts

tidymodels

hardhat:Construct Modeling Packages

Building modeling packages is hard. A large amount of effort generally goes into providing an implementation for a new method that is efficient, fast, and correct, but often less emphasis is put on the user interface. A good interface requires specialized knowledge about S3 methods and formulas, which the average package developer might not have. The goal of 'hardhat' is to reduce the burden around building new modeling packages by providing functionality for preprocessing, predicting, and validating input.

Maintained by Hannah Frick. Last updated 1 months ago.

2.0 match 103 stars 14.88 score 175 scripts 436 dependents

diystat

NBPSeq:Negative Binomial Models for RNA-Sequencing Data

Negative Binomial (NB) models for two-group comparisons and regression inferences from RNA-Sequencing Data.

Maintained by Yanming Di. Last updated 11 years ago.

6.0 match 1 stars 4.88 score 17 scripts 3 dependents

easystats

performance:Assessment of Regression Models Performance

Utilities for computing measures to assess model quality, which are not directly provided by R's 'base' or 'stats' packages. These include e.g. measures like r-squared, intraclass correlation coefficient (Nakagawa, Johnson & Schielzeth (2017) <doi:10.1098/rsif.2017.0213>), root mean squared error or functions to check models for overdispersion, singularity or zero-inflation and more. Functions apply to a large variety of regression models, including generalized linear models, mixed effects models and Bayesian models. References: Lüdecke et al. (2021) <doi:10.21105/joss.03139>.

Maintained by Daniel Lüdecke. Last updated 18 days ago.

aic easystats hacktoberfest loo machine-learning mixed-models models performance r2 statistics

1.8 match 1.1k stars 16.17 score 4.3k scripts 47 dependents

rvlenth

rsm:Response-Surface Analysis

Provides functions to generate response-surface designs, fit first- and second-order response-surface models, make surface plots, obtain the path of steepest ascent, and do canonical analysis. A good reference on these methods is Chapter 10 of Wu, C-F J and Hamada, M (2009) "Experiments: Planning, Analysis, and Parameter Design Optimization" ISBN 978-0-471-69946-0. An early version of the package is documented in Journal of Statistical Software <doi:10.18637/jss.v032.i07>.

Maintained by Russell Lenth. Last updated 9 months ago.

2.8 match 18 stars 10.16 score 192 scripts 8 dependents

john-d-fox

Rcmdr:R Commander

A platform-independent basic-statistics GUI (graphical user interface) for R, based on the tcltk package.

Maintained by John Fox. Last updated 5 months ago.

3.0 match 4 stars 9.49 score 636 scripts 38 dependents

easystats

bayestestR:Understand and Describe Bayesian Models and Posterior Distributions

Provides utilities to describe posterior distributions and Bayesian models. It includes point-estimates such as Maximum A Posteriori (MAP), measures of dispersion (Highest Density Interval - HDI; Kruschke, 2015 <doi:10.1016/C2012-0-00477-2>) and indices used for null-hypothesis testing (such as ROPE percentage, pd and Bayes factors). References: Makowski et al. (2021) <doi:10.21105/joss.01541>.

Maintained by Dominique Makowski. Last updated 15 hours ago.

bayes-factors bayesfactor bayesian bayesian-framework credible-interval easystats hacktoberfest hdi map posterior-distributions rope

1.7 match 579 stars 16.84 score 2.2k scripts 82 dependents

business-science

timetk:A Tool Kit for Working with Time Series

Easy visualization, wrangling, and feature engineering of time series data for forecasting and machine learning prediction. Consolidates and extends time series functionality from packages including 'dplyr', 'stats', 'xts', 'forecast', 'slider', 'padr', 'recipes', and 'rsample'.

Maintained by Matt Dancho. Last updated 1 years ago.

coercion coercion-functions data-mining dplyr forecast forecasting forecasting-models machine-learning series-decomposition series-signature tibble tidy tidyquant tidyverse time time-series timeseries

2.0 match 625 stars 14.15 score 4.0k scripts 16 dependents

allengoebl

iopsych:Methods for Industrial/Organizational Psychology

Collection of functions for IO Psychologists.

Maintained by Allen Goebl. Last updated 7 years ago.

7.1 match 3 stars 4.00 score 66 scripts

mlverse

tabnet:Fit 'TabNet' Models for Classification and Regression

Implements the 'TabNet' model by Sercan O. Arik et al. (2019) <doi:10.48550/arXiv.1908.07442> with 'Coherent Hierarchical Multi-label Classification Networks' by Giunchiglia et al. <doi:10.48550/arXiv.2010.10151> and provides a consistent interface for fitting and creating predictions. It's also fully compatible with the 'tidymodels' ecosystem.

Maintained by Christophe Regouby. Last updated 6 months ago.

tabnet

3.1 match 109 stars 9.00 score 65 scripts

jingzzeng

TRES:Tensor Regression with Envelope Structure

Provides three estimators for tensor response regression (TRR) and tensor predictor regression (TPR) models with tensor envelope structure. The three types of estimation approaches are generic and can be applied to any envelope estimation problems. The full Grassmannian (FG) optimization is often associated with likelihood-based estimation but requires heavy computation and good initialization; the one-directional optimization approaches (1D and ECD algorithms) are faster, stable and does not require carefully chosen initial values; the SIMPLS-type is motivated by the partial least squares regression and is computationally the least expensive. For details of TRR, see Li L, Zhang X (2017) <doi:10.1080/01621459.2016.1193022>. For details of TPR, see Zhang X, Li L (2017) <doi:10.1080/00401706.2016.1272495>. For details of 1D algorithm, see Cook RD, Zhang X (2016) <doi:10.1080/10618600.2015.1029577>. For details of ECD algorithm, see Cook RD, Zhang X (2018) <doi:10.5705/ss.202016.0037>. For more details of the package, see Zeng J, Wang W, Zhang X (2021) <doi:10.18637/jss.v099.i12>.

Maintained by Jing Zeng. Last updated 3 years ago.

5.9 match 2 stars 4.76 score 19 scripts 1 dependents

cran

PLORN:Prediction with Less Overfitting and Robust to Noise

A method for the quantitative prediction with much predictors. This package provides functions to construct the quantitative prediction model with less overfitting and robust to noise.

Maintained by Takahiko Koizumi. Last updated 3 years ago.

10.2 match 2.70 score 4 scripts

r-forge

car:Companion to Applied Regression

Functions to Accompany J. Fox and S. Weisberg, An R Companion to Applied Regression, Third Edition, Sage, 2019.

Maintained by John Fox. Last updated 5 months ago.

1.8 match 15.29 score 43k scripts 901 dependents

f-rousset

spaMM:Mixed-Effect Models, with or without Spatial Random Effects

Inference based on models with or without spatially-correlated random effects, multivariate responses, or non-Gaussian random effects (e.g., Beta). Variation in residual variance (heteroscedasticity) can itself be represented by a mixed-effect model. Both classical geostatistical models (Rousset and Ferdy 2014 <doi:10.1111/ecog.00566>), and Markov random field models on irregular grids (as considered in the 'INLA' package, <https://www.r-inla.org>), can be fitted, with distinct computational procedures exploiting the sparse matrix representations for the latter case and other autoregressive models. Laplace approximations are used for likelihood or restricted likelihood. Penalized quasi-likelihood and other variants discussed in the h-likelihood literature (Lee and Nelder 2001 <doi:10.1093/biomet/88.4.987>) are also implemented.

Maintained by François Rousset. Last updated 9 months ago.

gsl cpp openmp

5.6 match 4.94 score 208 scripts 5 dependents

edwinkipruto

mfp2:Multivariable Fractional Polynomial Models with Extensions

Multivariable fractional polynomial algorithm simultaneously selects variables and functional forms in both generalized linear models and Cox proportional hazard models. Key references are Royston and Altman (1994) <doi:10.2307/2986270> and Royston and Sauerbrei (2008, ISBN:978-0-470-02842-1). In addition, it can model a sigmoid relationship between variable x and an outcome variable y using the approximate cumulative distribution transformation proposed by Royston (2014) <doi:10.1177/1536867X1401400206>. This feature distinguishes it from a standard fractional polynomial function, which lacks the ability to achieve such modeling.

Maintained by Edwin Kipruto. Last updated 10 months ago.

5.2 match 3 stars 5.26 score 4 scripts 2 dependents

strengejacke

ggeffects:Create Tidy Data Frames of Marginal Effects for 'ggplot' from Model Outputs

Compute marginal effects and adjusted predictions from statistical models and returns the result as tidy data frames. These data frames are ready to use with the 'ggplot2'-package. Effects and predictions can be calculated for many different models. Interaction terms, splines and polynomial terms are also supported. The main functions are ggpredict(), ggemmeans() and ggeffect(). There is a generic plot()-method to plot the results using 'ggplot2'.

Maintained by Daniel Lüdecke. Last updated 4 days ago.

estimated-marginal-means hacktoberfest marginal-effects prediction

1.8 match 588 stars 15.55 score 3.6k scripts 7 dependents

weiliang

powerMediation:Power/Sample Size Calculation for Mediation Analysis

Functions to calculate power and sample size for testing (1) mediation effects; (2) the slope in a simple linear regression; (3) odds ratio in a simple logistic regression; (4) mean change for longitudinal study with 2 time points; (5) interaction effect in 2-way ANOVA; and (6) the slope in a simple Poisson regression.

Maintained by Weiliang Qiu. Last updated 4 years ago.

6.8 match 3 stars 3.97 score 65 scripts 2 dependents

hojsgaard

doBy:Groupwise Statistics, LSmeans, Linear Estimates, Utilities

Utility package containing: 1) Facilities for working with grouped data: 'do' something to data stratified 'by' some variables. 2) LSmeans (least-squares means), general linear estimates. 3) Restrict functions to a smaller domain. 4) Miscellaneous other utilities.

Maintained by Søren Højsgaard. Last updated 4 days ago.

1.8 match 1 stars 14.94 score 3.2k scripts 939 dependents

nliulab

AutoScore:An Interpretable Machine Learning-Based Automatic Clinical Score Generator

A novel interpretable machine learning-based framework to automate the development of a clinical scoring model for predefined outcomes. Our novel framework consists of six modules: variable ranking with machine learning, variable transformation, score derivation, model selection, domain knowledge-based score fine-tuning, and performance evaluation.The details are described in our research paper<doi:10.2196/21798>. Users or clinicians could seamlessly generate parsimonious sparse-score risk models (i.e., risk scores), which can be easily implemented and validated in clinical practice. We hope to see its application in various medical case studies.

Maintained by Feng Xie. Last updated 14 days ago.

3.5 match 32 stars 7.70 score 30 scripts

tidy-finance

tidyfinance:Tidy Finance Helper Functions

Helper functions for empirical research in financial economics, addressing a variety of topics covered in Scheuch, Voigt, and Weiss (2023) <doi:10.1201/b23237>. The package is designed to provide shortcuts for issues extensively discussed in the book, facilitating easier application of its concepts. For more information and resources related to the book, visit <https://www.tidy-finance.org/r/index.html>.

Maintained by Christoph Scheuch. Last updated 3 months ago.

finance

3.5 match 15 stars 7.56 score 24 scripts

ahoshiyar

ordPens:Selection, Fusion, Smoothing and Principal Components Analysis for Ordinal Variables

Selection, fusion, and/or smoothing of ordinally scaled independent variables using a group lasso, fused lasso or generalized ridge penalty, as well as non-linear principal components analysis for ordinal variables using a second-order difference/smoothing penalty.

Maintained by Aisouda Hoshiyar. Last updated 10 months ago.

7.0 match 2 stars 3.79 score 31 scripts

statnet

ergm:Fit, Simulate and Diagnose Exponential-Family Models for Networks

An integrated set of tools to analyze and simulate networks based on exponential-family random graph models (ERGMs). 'ergm' is a part of the Statnet suite of packages for network analysis. See Hunter, Handcock, Butts, Goodreau, and Morris (2008) <doi:10.18637/jss.v024.i03> and Krivitsky, Hunter, Morris, and Klumb (2023) <doi:10.18637/jss.v105.i06>.

Maintained by Pavel N. Krivitsky. Last updated 6 days ago.

1.7 match 100 stars 15.36 score 1.4k scripts 36 dependents

dallenmidd

IxPopDyMod:Framework for Tick Population and Infection Modeling

Code to specify, run, and then visualize and analyze the results of Ixodidae (hard-bodied ticks) population and infection dynamics models. Such models exist in the literature, but the source code to run them is not always available. 'IxPopDyMod' provides an easy way for these models to be written and shared.

Maintained by Myles Stokowski. Last updated 4 months ago.

8.8 match 2 stars 3.00 score 6 scripts

maressyl

LPS:Linear Predictor Score, for Binary Inference from Multiple Continuous Variables

An implementation of the Linear Predictor Score approach, as initiated by Radmacher et al. (J Comput Biol 2001) and enhanced by Wright et al. (PNAS 2003) for gene expression signatures. Several tools for unsupervised clustering of gene expression data are also provided.

Maintained by Sylvain Mareschal. Last updated 4 years ago.

7.0 match 1 stars 3.74 score 11 scripts

evolecolgroup

tidysdm:Species Distribution Models with Tidymodels

Fit species distribution models (SDMs) using the 'tidymodels' framework, which provides a standardised interface to define models and process their outputs. 'tidysdm' expands 'tidymodels' by providing methods for spatial objects, models and metrics specific to SDMs, as well as a number of specialised functions to process occurrences for contemporary and palaeo datasets. The full functionalities of the package are described in Leonardi et al. (2023) <doi:10.1101/2023.07.24.550358>.

Maintained by Andrea Manica. Last updated 9 days ago.

species-distribution-modelling tidymodels

3.0 match 31 stars 8.82 score 51 scripts

rvlenth

emmeans:Estimated Marginal Means, aka Least-Squares Means

Obtain estimated marginal means (EMMs) for many linear, generalized linear, and mixed models. Compute contrasts or linear functions of EMMs, trends, and comparisons of slopes. Plots and other displays. Least-squares means are discussed, and the term "estimated marginal means" is suggested, in Searle, Speed, and Milliken (1980) Population marginal means in the linear model: An alternative to least squares means, The American Statistician 34(4), 216-221 <doi:10.1080/00031305.1980.10483031>.

Maintained by Russell V. Lenth. Last updated 3 days ago.

1.3 match 377 stars 19.19 score 13k scripts 187 dependents

guido-s

meta:General Package for Meta-Analysis

User-friendly general package providing standard methods for meta-analysis and supporting Schwarzer, Carpenter, and Rücker <DOI:10.1007/978-3-319-21416-0>, "Meta-Analysis with R" (2015): - common effect and random effects meta-analysis; - several plots (forest, funnel, Galbraith / radial, L'Abbe, Baujat, bubble); - three-level meta-analysis model; - generalised linear mixed model; - logistic regression with penalised likelihood for rare events; - Hartung-Knapp method for random effects model; - Kenward-Roger method for random effects model; - prediction interval; - statistical tests for funnel plot asymmetry; - trim-and-fill method to evaluate bias in meta-analysis; - meta-regression; - cumulative meta-analysis and leave-one-out meta-analysis; - import data from 'RevMan 5'; - produce forest plot summarising several (subgroup) meta-analyses.

Maintained by Guido Schwarzer. Last updated 25 days ago.

meta-analysis rstudio

1.7 match 84 stars 14.84 score 2.3k scripts 29 dependents

r-forge

robustbase:Basic Robust Statistics

"Essential" Robust Statistics. Tools allowing to analyze data with robust methods. This includes regression methodology including model selections and multivariate statistics where we strive to cover the book "Robust Statistics, Theory and Methods" by 'Maronna, Martin and Yohai'; Wiley 2006.

Maintained by Martin Maechler. Last updated 4 months ago.

fortran openblas

1.9 match 13.33 score 1.7k scripts 480 dependents

amices

ggmice:Visualizations for 'mice' with 'ggplot2'

Enhance a 'mice' imputation workflow with visualizations for incomplete and/or imputed data. The plotting functions produce 'ggplot' objects which may be easily manipulated or extended. Use 'ggmice' to inspect missing data, develop imputation models, evaluate algorithmic convergence, or compare observed versus imputed data.

Maintained by Hanne Oberman. Last updated 8 months ago.

ggplot2 mice visualization

3.3 match 32 stars 7.42 score 165 scripts

emmekeaarts

mHMMbayes:Multilevel Hidden Markov Models Using Bayesian Estimation

An implementation of the multilevel (also known as mixed or random effects) hidden Markov model using Bayesian estimation in R. The multilevel hidden Markov model (HMM) is a generalization of the well-known hidden Markov model, for the latter see Rabiner (1989) <doi:10.1109/5.18626>. The multilevel HMM is tailored to accommodate (intense) longitudinal data of multiple individuals simultaneously, see e.g., de Haan-Rietdijk et al. <doi:10.1080/00273171.2017.1370364>. Using a multilevel framework, we allow for heterogeneity in the model parameters (transition probability matrix and conditional distribution), while estimating one overall HMM. The model can be fitted on multivariate data with either a categorical, normal, or Poisson distribution, and include individual level covariates (allowing for e.g., group comparisons on model parameters). Parameters are estimated using Bayesian estimation utilizing the forward-backward recursion within a hybrid Metropolis within Gibbs sampler. Missing data (NA) in the dependent variables is accommodated assuming MAR. The package also includes various visualization options, a function to simulate data, and a function to obtain the most likely hidden state sequence for each individual using the Viterbi algorithm.

Maintained by Emmeke Aarts. Last updated 12 months ago.

cpp

4.0 match 17 stars 6.08 score 35 scripts

bioc

genefu:Computation of Gene Expression-Based Signatures in Breast Cancer

This package contains functions implementing various tasks usually required by gene expression analysis, especially in breast cancer studies: gene mapping between different microarray platforms, identification of molecular subtypes, implementation of published gene signatures, gene selection, and survival analysis.

Maintained by Benjamin Haibe-Kains. Last updated 4 months ago.

differentialexpression geneexpression visualization clustering classification

3.3 match 7.42 score 193 scripts 3 dependents

neural-structured-additive-learning

deeptrafo:Fitting Deep Conditional Transformation Models

Allows for the specification of deep conditional transformation models (DCTMs) and ordinal neural network transformation models, as described in Baumann et al (2021) <doi:10.1007/978-3-030-86523-8_1> and Kook et al (2022) <doi:10.1016/j.patcog.2021.108263>. Extensions such as autoregressive DCTMs (Ruegamer et al, 2023, <doi:10.1007/s11222-023-10212-8>) and transformation ensembles (Kook et al, 2022, <doi:10.48550/arXiv.2205.12729>) are implemented. The software package is described in Kook et al (2024, <doi:10.18637/jss.v111.i10>).

Maintained by Lucas Kook. Last updated 2 months ago.

5.4 match 5 stars 4.44 score 11 scripts

bioc

mixOmics:Omics Data Integration Project

Multivariate methods are well suited to large omics data sets where the number of variables (e.g. genes, proteins, metabolites) is much larger than the number of samples (patients, cells, mice). They have the appealing properties of reducing the dimension of the data by using instrumental variables (components), which are defined as combinations of all variables. Those components are then used to produce useful graphical outputs that enable better understanding of the relationships and correlation structures between the different data sets that are integrated. mixOmics offers a wide range of multivariate methods for the exploration and integration of biological datasets with a particular focus on variable selection. The package proposes several sparse multivariate models we have developed to identify the key variables that are highly correlated, and/or explain the biological outcome of interest. The data that can be analysed with mixOmics may come from high throughput sequencing technologies, such as omics data (transcriptomics, metabolomics, proteomics, metagenomics etc) but also beyond the realm of omics (e.g. spectral imaging). The methods implemented in mixOmics can also handle missing values without having to delete entire rows with missing data. A non exhaustive list of methods include variants of generalised Canonical Correlation Analysis, sparse Partial Least Squares and sparse Discriminant Analysis. Recently we implemented integrative methods to combine multiple data sets: N-integration with variants of Generalised Canonical Correlation Analysis and P-integration with variants of multi-group Partial Least Squares.

Maintained by Eva Hamrud. Last updated 3 days ago.

immunooncology microarray sequencing metabolomics metagenomics proteomics geneprediction multiplecomparison classification regression bioconductor genomics genomics-data genomics-visualization multivariate-analysis multivariate-statistics omics r-pkg r-project

1.8 match 182 stars 13.71 score 1.3k scripts 22 dependents

hannameyer

CAST:'caret' Applications for Spatial-Temporal Models

Supporting functionality to run 'caret' with spatial or spatial-temporal data. 'caret' is a frequently used package for model training and prediction using machine learning. CAST includes functions to improve spatial or spatial-temporal modelling tasks using 'caret'. It includes the newly suggested 'Nearest neighbor distance matching' cross-validation to estimate the performance of spatial prediction models and allows for spatial variable selection to selects suitable predictor variables in view to their contribution to the spatial model performance. CAST further includes functionality to estimate the (spatial) area of applicability of prediction models. Methods are described in Meyer et al. (2018) <doi:10.1016/j.envsoft.2017.12.001>; Meyer et al. (2019) <doi:10.1016/j.ecolmodel.2019.108815>; Meyer and Pebesma (2021) <doi:10.1111/2041-210X.13650>; Milà et al. (2022) <doi:10.1111/2041-210X.13851>; Meyer and Pebesma (2022) <doi:10.1038/s41467-022-29838-9>; Linnenbrink et al. (2023) <doi:10.5194/egusphere-2023-1308>; Schumacher et al. (2024) <doi:10.5194/egusphere-2024-2730>. The package is described in detail in Meyer et al. (2024) <doi:10.48550/arXiv.2404.06978>.

Maintained by Hanna Meyer. Last updated 2 months ago.

autocorrelation caret feature-selection machine-learning overfitting predictive-modeling spatial spatio-temporal variable-selection

2.0 match 114 stars 11.97 score 298 scripts 1 dependents

jasinmachkour

TRexSelector:T-Rex Selector: High-Dimensional Variable Selection & FDR Control

Performs fast variable selection in high-dimensional settings while controlling the false discovery rate (FDR) at a user-defined target level. The package is based on the paper Machkour, Muma, and Palomar (2022) <arXiv:2110.06048>.

Maintained by Jasin Machkour. Last updated 1 years ago.

5.4 match 5 stars 4.40 score 5 scripts

rspatial

dismo:Species Distribution Modeling

Methods for species distribution modeling, that is, predicting the environmental similarity of any site to that of the locations of known occurrences of a species.

Maintained by Robert J. Hijmans. Last updated 4 months ago.

cpp

2.0 match 26 stars 11.88 score 2.8k scripts 21 dependents

kkholst

targeted:Targeted Inference

Various methods for targeted and semiparametric inference including augmented inverse probability weighted (AIPW) estimators for missing data and causal inference (Bang and Robins (2005) <doi:10.1111/j.1541-0420.2005.00377.x>), variable importance and conditional average treatment effects (CATE) (van der Laan (2006) <doi:10.2202/1557-4679.1008>), estimators for risk differences and relative risks (Richardson et al. (2017) <doi:10.1080/01621459.2016.1192546>), assumption lean inference for generalized linear model parameters (Vansteelandt et al. (2022) <doi:10.1111/rssb.12504>).

Maintained by Klaus K. Holst. Last updated 1 months ago.

causal-inference double-robust estimation semiparametric-estimation statistics openblas cpp openmp

3.3 match 11 stars 7.20 score 30 scripts 1 dependents

tagteam

riskRegression:Risk Regression Models and Prediction Scores for Survival Analysis with Competing Risks

Implementation of the following methods for event history analysis. Risk regression models for survival endpoints also in the presence of competing risks are fitted using binomial regression based on a time sequence of binary event status variables. A formula interface for the Fine-Gray regression model and an interface for the combination of cause-specific Cox regression models. A toolbox for assessing and comparing performance of risk predictions (risk markers and risk prediction models). Prediction performance is measured by the Brier score and the area under the ROC curve for binary possibly time-dependent outcome. Inverse probability of censoring weighting and pseudo values are used to deal with right censored data. Lists of risk markers and lists of risk models are assessed simultaneously. Cross-validation repeatedly splits the data, trains the risk prediction models on one part of each split and then summarizes and compares the performance across splits.

Maintained by Thomas Alexander Gerds. Last updated 17 days ago.

openblas cpp

1.8 match 46 stars 13.00 score 736 scripts 35 dependents

r-forge

modEvA:Model Evaluation and Analysis

Analyses species distribution models and evaluates their performance. It includes functions for variation partitioning, extracting variable importance, computing several metrics of model discrimination and calibration performance, optimizing prediction thresholds based on a number of criteria, performing multivariate environmental similarity surface (MESS) analysis, and displaying various analytical plots. Initially described in Barbosa et al. (2013) <doi:10.1111/ddi.12100>.

Maintained by A. Marcia Barbosa. Last updated 10 days ago.

3.4 match 6.82 score 269 scripts 3 dependents

mastoffel

partR2:Partitioning R2 in GLMMs

Partitioning the R2 of GLMMs into variation explained by each predictor and combination of predictors using semi-partial (part) R2 and inclusive R2. Methods are based on the R2 for GLMMs described in Nakagawa & Schielzeth (2013) and Nakagawa, Johnson & Schielzeth (2017).

Maintained by Martin A. Stoffel. Last updated 6 months ago.

3.5 match 22 stars 6.68 score 73 scripts

nimble-dev

nimbleMacros:Macros Generating 'nimble' Code

Macros to generate 'nimble' code from a concise syntax. Included are macros for generating linear modeling code using a formula-based syntax and for building for() loops. For more details review the 'nimble' manual: <https://r-nimble.org/html_manual/cha-writing-models.html#subsec:macros>.

Maintained by Ken Kellner. Last updated 4 days ago.

4.7 match 4.98 score

goodekat

ggResidpanel:Panels and Interactive Versions of Diagnostic Plots using 'ggplot2'

An R package for creating diagnostic plots for models. The package allows for the creation of panels of plots and interactive plots.

Maintained by Katherine Goode. Last updated 2 months ago.

2.9 match 37 stars 7.68 score 262 scripts

jamiemkass

ENMeval:Automated Tuning and Evaluations of Ecological Niche Models

Runs ecological niche models over all combinations of user-defined settings (i.e., tuning), performs cross validation to evaluate models, and returns data tables to aid in selection of optimal model settings that balance goodness-of-fit and model complexity. Also has functions to partition data spatially (or not) for cross validation, to plot multiple visualizations of results, to run null models to estimate significance and effect sizes of performance metrics, and to calculate range overlap between model predictions, among others. The package was originally built for Maxent models (Phillips et al. 2006, Phillips et al. 2017), but the current version allows possible extensions for any modeling algorithm. The extensive vignette, which guides users through most package functionality but unfortunately has a file size too big for CRAN, can be found here on the package's Github Pages website: <https://jamiemkass.github.io/ENMeval/articles/ENMeval-2.0-vignette.html>.

Maintained by Jamie M. Kass. Last updated 2 months ago.

2.0 match 49 stars 11.25 score 332 scripts 2 dependents

stan-dev

rstantools:Tools for Developing R Packages Interfacing with 'Stan'

Provides various tools for developers of R packages interfacing with 'Stan' <https://mc-stan.org>, including functions to set up the required package structure, S3 generics and default methods to unify function naming across 'Stan'-based R packages, and vignettes with recommendations for developers.

Maintained by Jonah Gabry. Last updated 2 months ago.

bayesian-data-analysis bayesian-statistics developer-tools stan

1.7 match 50 stars 13.09 score 134 scripts 222 dependents

humaniverse

wildfires:Mapping Risk and Resilience to wildfires in the UK

Build an social vulnerability index using PCA and identify areas of high wildfire risk and high social vulnerability.

Maintained by Matteo Larrode. Last updated 7 months ago.

7.5 match 1 stars 2.95 score

lcbc-uio

galamm:Generalized Additive Latent and Mixed Models

Estimates generalized additive latent and mixed models using maximum marginal likelihood, as defined in Sorensen et al. (2023) <doi:10.1007/s11336-023-09910-z>, which is an extension of Rabe-Hesketh and Skrondal (2004)'s unifying framework for multilevel latent variable modeling <doi:10.1007/BF02295939>. Efficient computation is done using sparse matrix methods, Laplace approximation, and automatic differentiation. The framework includes generalized multilevel models with heteroscedastic residuals, mixed response types, factor loadings, smoothing splines, crossed random effects, and combinations thereof. Syntax for model formulation is close to 'lme4' (Bates et al. (2015) <doi:10.18637/jss.v067.i01>) and 'PLmixed' (Rockwood and Jeon (2019) <doi:10.1080/00273171.2018.1516541>).

Maintained by Øystein Sørensen. Last updated 6 months ago.

generalized-additive-models hierarchical-models item-response-theory latent-variable-models structural-equation-models cpp

3.0 match 29 stars 7.33 score 41 scripts

bgreenwell

pdp:Partial Dependence Plots

A general framework for constructing partial dependence (i.e., marginal effect) plots from various types machine learning models in R.

Maintained by Brandon M. Greenwell. Last updated 3 years ago.

black-box-model machine-learning partial-dependence-function partial-dependence-plot visualization

1.9 match 93 stars 11.72 score 1.1k scripts 8 dependents

cefet-rj-dal

daltoolbox:Leveraging Experiment Lines to Data Analytics

The natural increase in the complexity of current research experiments and data demands better tools to enhance productivity in Data Analytics. The package is a framework designed to address the modern challenges in data analytics workflows. The package is inspired by Experiment Line concepts. It aims to provide seamless support for users in developing their data mining workflows by offering a uniform data model and method API. It enables the integration of various data mining activities, including data preprocessing, classification, regression, clustering, and time series prediction. It also offers options for hyper-parameter tuning and supports integration with existing libraries and languages. Overall, the package provides researchers with a comprehensive set of functionalities for data science, promoting ease of use, extensibility, and integration with various tools and libraries. Information on Experiment Line is based on Ogasawara et al. (2009) <doi:10.1007/978-3-642-02279-1_20>.

Maintained by Eduardo Ogasawara. Last updated 1 months ago.

3.3 match 1 stars 6.65 score 536 scripts 4 dependents

rfastofficial

Rfast:A Collection of Efficient and Extremely Fast R Functions

A collection of fast (utility) functions for data analysis. Column and row wise means, medians, variances, minimums, maximums, many t, F and G-square tests, many regressions (normal, logistic, Poisson), are some of the many fast functions. References: a) Tsagris M., Papadakis M. (2018). Taking R to its limits: 70+ tips. PeerJ Preprints 6:e26605v1 <doi:10.7287/peerj.preprints.26605v1>. b) Tsagris M. and Papadakis M. (2018). Forward regression in R: from the extreme slow to the extreme fast. Journal of Data Science, 16(4): 771--780. <doi:10.6339/JDS.201810_16(4).00006>. c) Chatzipantsiou C., Dimitriadis M., Papadakis M. and Tsagris M. (2020). Extremely Efficient Permutation and Bootstrap Hypothesis Tests Using Hypothesis Tests Using R. Journal of Modern Applied Statistical Methods, 18(2), eP2898. <doi:10.48550/arXiv.1806.10947>. d) Tsagris M., Papadakis M., Alenazi A. and Alzeley O. (2024). Computationally Efficient Outlier Detection for High-Dimensional Data Using the MDP Algorithm. Computation, 12(9): 185. <doi:10.3390/computation12090185>. e) Tsagris M. and Papadakis M. (2025). Fast and light-weight energy statistics using the R package Rfast. <doi:10.48550/arXiv.2501.02849>.

Maintained by Manos Papadakis. Last updated 17 days ago.

openblas cpp openmp

1.8 match 147 stars 12.54 score 1.2k scripts 166 dependents

jacob-long

jtools:Analysis and Presentation of Social Scientific Data

This is a collection of tools for more efficiently understanding and sharing the results of (primarily) regression analyses. There are also a number of miscellaneous functions for statistical and programming purposes. Support for models produced by the survey and lme4 packages are points of emphasis.

Maintained by Jacob A. Long. Last updated 6 months ago.

social-sciences

1.5 match 167 stars 14.48 score 4.0k scripts 14 dependents

biodiverse

unmarked:Models for Data from Unmarked Animals

Fits hierarchical models of animal abundance and occurrence to data collected using survey methods such as point counts, site occupancy sampling, distance sampling, removal sampling, and double observer sampling. Parameters governing the state and observation processes can be modeled as functions of covariates. References: Kellner et al. (2023) <doi:10.1111/2041-210X.14123>, Fiske and Chandler (2011) <doi:10.18637/jss.v043.i10>.

Maintained by Ken Kellner. Last updated 16 hours ago.

openblas cpp openmp

1.7 match 4 stars 13.03 score 652 scripts 12 dependents

microsoft

wpa:Tools for Analysing and Visualising Viva Insights Data

Opinionated functions that enable easier and faster analysis of Viva Insights data. There are three main types of functions in 'wpa': (i) Standard functions create a 'ggplot' visual or a summary table based on a specific Viva Insights metric; (2) Report Generation functions generate HTML reports on a specific analysis area, e.g. Collaboration; (3) Other miscellaneous functions cover more specific applications (e.g. Subject Line text mining) of Viva Insights data. This package adheres to 'tidyverse' principles and works well with the pipe syntax. 'wpa' is built with the beginner-to-intermediate R users in mind, and is optimised for simplicity.

Maintained by Martin Chan. Last updated 4 months ago.

workplace-analytics

3.2 match 30 stars 6.69 score 39 scripts 1 dependents

fabian-s

spikeSlabGAM:Bayesian Variable Selection and Model Choice for Generalized Additive Mixed Models

Bayesian variable selection, model choice, and regularized estimation for (spatial) generalized additive mixed regression models via stochastic search variable selection with spike-and-slab priors.

Maintained by Fabian Scheipl. Last updated 5 months ago.

openblas

3.4 match 14 stars 6.28 score 15 scripts 1 dependents

easystats

modelbased:Estimation of Model-Based Predictions, Contrasts and Means

Implements a general interface for model-based estimations for a wide variety of models, used in the computation of marginal means, contrast analysis and predictions. For a list of supported models, see 'insight::supported_models()'.

Maintained by Dominique Makowski. Last updated 2 days ago.

contrast-analysis contrasts easystats estimate ggplot2 hacktoberfest marginal marginal-effects means predict

1.7 match 241 stars 12.35 score 315 scripts 4 dependents

bioc

ENmix:Quality control and analysis tools for Illumina DNA methylation BeadChip

Tools for quanlity control, analysis and visulization of Illumina DNA methylation array data.

Maintained by Zongli Xu. Last updated 2 days ago.

dnamethylation preprocessing qualitycontrol twochannel microarray onechannel methylationarray batcheffect normalization dataimport regression principalcomponent epigenetics multichannel differentialmethylation immunooncology

3.5 match 6.01 score 115 scripts

suyusung

arm:Data Analysis Using Regression and Multilevel/Hierarchical Models

Functions to accompany A. Gelman and J. Hill, Data Analysis Using Regression and Multilevel/Hierarchical Models, Cambridge University Press, 2007.

Maintained by Yu-Sung Su. Last updated 4 months ago.

1.7 match 25 stars 12.38 score 3.3k scripts 89 dependents

crsh

papaja:Prepare American Psychological Association Journal Articles with R Markdown

Tools to create dynamic, submission-ready manuscripts, which conform to American Psychological Association manuscript guidelines. We provide R Markdown document formats for manuscripts (PDF and Word) and revision letters (PDF). Helper functions facilitate reporting statistical analyses or create publication-ready tables and plots.

Maintained by Frederik Aust. Last updated 17 days ago.

apa apa-guidelines journal manuscript psychology reproducible-paper reproducible-research rmarkdown

1.8 match 662 stars 11.74 score 1.7k scripts 1 dependents

neuropsychology

psycho:Efficient and Publishing-Oriented Workflow for Psychological Science

The main goal of the psycho package is to provide tools for psychologists, neuropsychologists and neuroscientists, to facilitate and speed up the time spent on data analysis. It aims at supporting best practices and tools to format the output of statistical methods to directly paste them into a manuscript, ensuring statistical reporting standardization and conformity.

Maintained by Dominique Makowski. Last updated 4 years ago.

apa apa6 bayesian correlation format interpretation mixed-models neuroscience psycho psychology rstanarm statistics

1.9 match 149 stars 10.86 score 628 scripts 5 dependents

hjboonstra

mcmcsae:Markov Chain Monte Carlo Small Area Estimation

Fit multi-level models with possibly correlated random effects using Markov Chain Monte Carlo simulation. Such models allow smoothing over space and time and are useful in, for example, small area estimation.

Maintained by Harm Jan Boonstra. Last updated 3 months ago.

cpp

8.2 match 2.48 score 8 scripts

functionaldata

fdapace:Functional Data Analysis and Empirical Dynamics

A versatile package that provides implementation of various methods of Functional Data Analysis (FDA) and Empirical Dynamics. The core of this package is Functional Principal Component Analysis (FPCA), a key technique for functional data analysis, for sparsely or densely sampled random trajectories and time courses, via the Principal Analysis by Conditional Estimation (PACE) algorithm. This core algorithm yields covariance and mean functions, eigenfunctions and principal component (scores), for both functional data and derivatives, for both dense (functional) and sparse (longitudinal) sampling designs. For sparse designs, it provides fitted continuous trajectories with confidence bands, even for subjects with very few longitudinal observations. PACE is a viable and flexible alternative to random effects modeling of longitudinal data. There is also a Matlab version (PACE) that contains some methods not available on fdapace and vice versa. Updates to fdapace were supported by grants from NIH Echo and NSF DMS-1712864 and DMS-2014626. Please cite our package if you use it (You may run the command citation("fdapace") to get the citation format and bibtex entry). References: Wang, J.L., Chiou, J., Müller, H.G. (2016) <doi:10.1146/annurev-statistics-041715-033624>; Chen, K., Zhang, X., Petersen, A., Müller, H.G. (2017) <doi:10.1007/s12561-015-9137-5>.

Maintained by Yidong Zhou. Last updated 9 months ago.

cpp

1.8 match 31 stars 11.46 score 474 scripts 25 dependents

tdaverse

ripserr:Calculate Persistent Homology with Ripser-Based Engines

Ports the Ripser <https://arxiv.org/abs/1908.02518> and Cubical Ripser <https://arxiv.org/abs/2005.12692> persistent homology calculation engines from C++. Can be used as a rapid calculation tool in topological data analysis pipelines.

Maintained by Raoul Wadhwa. Last updated 1 days ago.

algebraic-topology cohomology cpp cubical-complex persistent-homology pixel point-cloud r-language r-programming rcpp rips-complex ripser simplicial-complex simplicial-homology topological-data-analysis topology vietoris-complex voxel cpp

3.4 match 7 stars 5.80 score 6 scripts

kopperud

slouch:Stochastic Linear Ornstein-Uhlenbeck Comparative Hypotheses

An implementation of a phylogenetic comparative method. It can fit univariate among-species Ornstein-Uhlenbeck models of phenotypic trait evolution, where the trait evolves towards a primary optimum. The optimum can be modelled as a single parameter, as multiple discrete regimes on the phylogenetic tree, and/or with continuous covariates. See also Hansen (1997) <doi:10.2307/2411186>, Butler & King (2004) <doi:10.1086/426002>, Hansen et al. (2008) <doi:10.1111/j.1558-5646.2008.00412.x>.

Maintained by Bjørn Tore Kopperud. Last updated 1 years ago.

3.9 match 2 stars 5.12 score 44 scripts 1 dependents

matthias-da

robCompositions:Compositional Data Analysis

Methods for analysis of compositional data including robust methods (<doi:10.1007/978-3-319-96422-5>), imputation of missing values (<doi:10.1016/j.csda.2009.11.023>), methods to replace rounded zeros (<doi:10.1080/02664763.2017.1410524>, <doi:10.1016/j.chemolab.2016.04.011>, <doi:10.1016/j.csda.2012.02.012>), count zeros (<doi:10.1177/1471082X14535524>), methods to deal with essential zeros (<doi:10.1080/02664763.2016.1182135>), (robust) outlier detection for compositional data, (robust) principal component analysis for compositional data, (robust) factor analysis for compositional data, (robust) discriminant analysis for compositional data (Fisher rule), robust regression with compositional predictors, functional data analysis (<doi:10.1016/j.csda.2015.07.007>) and p-splines (<doi:10.1016/j.csda.2015.07.007>), contingency (<doi:10.1080/03610926.2013.824980>) and compositional tables (<doi:10.1111/sjos.12326>, <doi:10.1111/sjos.12223>, <doi:10.1080/02664763.2013.856871>) and (robust) Anderson-Darling normality tests for compositional data as well as popular log-ratio transformations (addLR, cenLR, isomLR, and their inverse transformations). In addition, visualisation and diagnostic tools are implemented as well as high and low-level plot functions for the ternary diagram.

Maintained by Matthias Templ. Last updated 26 days ago.

cpp

2.1 match 11 stars 9.19 score 226 scripts 2 dependents

dyfanjones

sagemaker.mlcore:sagemaker machine learning core classes and methods

`sagemaker` machine learning core classes and methods.

Maintained by Dyfan Jones. Last updated 3 years ago.

amazon-sagemaker aws machine-learning sagemaker sdk

7.3 match 2.65 score 3 dependents

moderndive

moderndive:Tidyverse-Friendly Introductory Linear Regression

Datasets and wrapper functions for tidyverse-friendly introductory linear regression, used in "Statistical Inference via Data Science: A ModernDive into R and the Tidyverse" available at <https://moderndive.com/>.

Maintained by Albert Y. Kim. Last updated 3 months ago.

1.7 match 88 stars 11.35 score 1.8k scripts

glsnow

TeachingDemos:Demonstrations for Teaching and Learning

Demonstration functions that can be used in a classroom to demonstrate statistical concepts, or on your own to better understand the concepts or the programming.

Maintained by Greg Snow. Last updated 1 years ago.

2.7 match 7.18 score 760 scripts 13 dependents

cran

Renvlp:Computing Envelope Estimators

Provides a general routine, envMU, which allows estimation of the M envelope of span(U) given root n consistent estimators of M and U. The routine envMU does not presume a model. This package implements response envelopes, partial response envelopes, envelopes in the predictor space, heteroscedastic envelopes, simultaneous envelopes, scaled response envelopes, scaled envelopes in the predictor space, groupwise envelopes, weighted envelopes, envelopes in logistic regression, envelopes in Poisson regression envelopes in function-on-function linear regression, envelope-based Partial Partial Least Squares, envelopes with non-constant error covariance, envelopes with t-distributed errors, reduced rank envelopes and reduced rank envelopes with non-constant error covariance. For each of these model-based routines the package provides inference tools including bootstrap, cross validation, estimation and prediction, hypothesis testing on coefficients are included except for weighted envelopes. Tools for selection of dimension include AIC, BIC and likelihood ratio testing. Background is available at Cook, R. D., Forzani, L. and Su, Z. (2016) <doi:10.1016/j.jmva.2016.05.006>. Optimization is based on a clockwise coordinate descent algorithm.

Maintained by Minji Lee. Last updated 1 years ago.

9.8 match 1.98 score 95 scripts

hjunwoo

bbl:Boltzmann Bayes Learner

Supervised learning using Boltzmann Bayes model inference, which extends naive Bayes model to include interactions. Enables classification of data into multiple response groups based on a large number of discrete predictors that can take factor values of heterogeneous levels. Either pseudo-likelihood or mean field inference can be used with L2 regularization, cross-validation, and prediction on new data. <doi:10.18637/jss.v101.i05>.

Maintained by Jun Woo. Last updated 3 years ago.

gsl cpp

7.1 match 2.70 score 3 scripts

cran

epiDisplay:Epidemiological Data Display Package

Package for data exploration and result presentation. Full 'epicalc' package with data management functions is available at '<https://medipe.psu.ac.th/epicalc/>'.

Maintained by Virasakdi Chongsuvivatwong. Last updated 3 years ago.

3.5 match 1 stars 5.44 score 758 scripts 2 dependents

alanarnholt

PASWR:Probability and Statistics with R

Functions and data sets for the text Probability and Statistics with R.

Maintained by Alan T. Arnholt. Last updated 3 years ago.

4.0 match 2 stars 4.70 score 241 scripts

cropmodels

Recocrop:Estimating Environmental Suitability for Plants

The ecocrop model estimates environmental suitability for plants using a limiting factor approach for plant growth following Hackett (1991) <doi:10.1007/BF00045728>. The implementation in this package is fast and flexible: it allows for the use of any (environmental) predictor variable. Predictors can be either static (for example, soil pH) or dynamic (for example, monthly precipitation).

Maintained by Robert J. Hijmans. Last updated 3 years ago.

cpp

4.9 match 11 stars 3.82 score 12 scripts

giraultg

SpiceFP:Sparse Method to Identify Joint Effects of Functional Predictors

A set of functions allowing to implement the 'SpiceFP' approach which is iterative. It involves transformation of functional predictors into several candidate explanatory matrices (based on contingency tables), to which relative edge matrices with contiguity constraints are associated. Generalized Fused Lasso regression are performed in order to identify the best candidate matrix, the best class intervals and related coefficients at each iteration. The approach is stopped when the maximal number of iterations is reached or when retained coefficients are zeros. Supplementary functions allow to get coefficients of any candidate matrix or mean of coefficients of many candidates.

Maintained by Girault Gnanguenon Guesse. Last updated 2 years ago.

5.1 match 3.70 score 1 scripts

ludvigolsen

cvms:Cross-Validation for Model Selection

Cross-validate one or multiple regression and classification models and get relevant evaluation metrics in a tidy format. Validate the best model on a test set and compare it to a baseline evaluation. Alternatively, evaluate predictions from an external model. Currently supports regression and classification (binary and multiclass). Described in chp. 5 of Jeyaraman, B. P., Olsen, L. R., & Wambugu M. (2019, ISBN: 9781838550134).

Maintained by Ludvig Renbo Olsen. Last updated 9 days ago.

1.8 match 39 stars 10.31 score 492 scripts 5 dependents

mjuraska

CoRpower:Power Calculations for Assessing Correlates of Risk in Clinical Efficacy Trials

Calculates power for assessment of intermediate biomarker responses as correlates of risk in the active treatment group in clinical efficacy trials, as described in Gilbert, Janes, and Huang, Power/Sample Size Calculations for Assessing Correlates of Risk in Clinical Efficacy Trials (2016, Statistics in Medicine). The methods differ from past approaches by accounting for the level of clinical treatment efficacy overall and in biomarker response subgroups, which enables the correlates of risk results to be interpreted in terms of potential correlates of efficacy/protection. The methods also account for inter-individual variability of the observed biomarker response that is not biologically relevant (e.g., due to technical measurement error of the laboratory assay used to measure the biomarker response), which is important because power to detect a specified correlate of risk effect size is heavily affected by the biomarker's measurement error. The methods can be used for a general binary clinical endpoint model with a univariate dichotomous, trichotomous, or continuous biomarker response measured in active treatment recipients at a fixed timepoint after randomization, with either case-cohort Bernoulli sampling or case-control without-replacement sampling of the biomarker (a baseline biomarker is handled as a trivial special case). In a specified two-group trial design, the computeN() function can initially be used for calculating additional requisite design parameters pertaining to the target population of active treatment recipients observed to be at risk at the biomarker sampling timepoint. Subsequently, the power calculation employs an inverse probability weighted logistic regression model fitted by the tps() function in the 'osDesign' package. Power results as well as the relationship between the correlate of risk effect size and treatment efficacy can be visualized using various plotting functions. To link power calculations for detecting a correlate of risk and a correlate of treatment efficacy, a baseline immunogenicity predictor (BIP) can be simulated according to a specified classification rule (for dichotomous or trichotomous BIPs) or correlation with the biomarker response (for continuous BIPs), then outputted along with biomarker response data under assignment to treatment, and clinical endpoint data for both treatment and placebo groups.

Maintained by Michal Juraska. Last updated 4 years ago.

4.4 match 4.15 score 14 scripts

thie1e

cutpointr:Determine and Evaluate Optimal Cutpoints in Binary Classification Tasks

Estimate cutpoints that optimize a specified metric in binary classification tasks and validate performance using bootstrapping. Some methods for more robust cutpoint estimation are supported, e.g. a parametric method assuming normal distributions, bootstrapped cutpoints, and smoothing of the metric values per cutpoint using Generalized Additive Models. Various plotting functions are included. For an overview of the package see Thiele and Hirschfeld (2021) <doi:10.18637/jss.v098.i11>.

Maintained by Christian Thiele. Last updated 3 months ago.

bootstrapping cutpoint-optimization roc-curve cpp

1.8 match 88 stars 10.44 score 322 scripts 1 dependents

marjoleinf

pre:Prediction Rule Ensembles

Derives prediction rule ensembles (PREs). Largely follows the procedure for deriving PREs as described in Friedman & Popescu (2008; <DOI:10.1214/07-AOAS148>), with adjustments and improvements. The main function pre() derives prediction rule ensembles consisting of rules and/or linear terms for continuous, binary, count, multinomial, and multivariate continuous responses. Function gpe() derives generalized prediction ensembles, consisting of rules, hinge and linear functions of the predictor variables.

Maintained by Marjolein Fokkema. Last updated 9 months ago.

2.1 match 58 stars 8.49 score 98 scripts 1 dependents

dyfanjones

sagemaker.mlframework:sagemaker machine learning developed by amazon

`sagemaker` machine learning developed by amazon.

Maintained by Dyfan Jones. Last updated 3 years ago.

amazon-sagemaker aws machine-learning sagemaker sdk

7.3 match 2.48 score 2 dependents

sfcheung

semhelpinghands:Helper Functions for Structural Equation Modeling

An assortment of helper functions for doing structural equation modeling, mainly by 'lavaan' for now. Most of them are time-saving functions for common tasks in doing structural equation modeling and reading the output. This package is not for functions that implement advanced statistical procedures. It is a light-weight package for simple functions that do simple tasks conveniently, with as few dependencies as possible.

Maintained by Shu Fai Cheung. Last updated 4 months ago.

bootstrapping lavaan structural-equation-modeling

3.5 match 5.13 score 27 scripts

hugheylab

zeitzeiger:Regularized Supervised Learning for Data from Rhythmic Systems

Method for predicting the value of a periodic variable from a high-dimensional observation. See Hughey et al. (2016) <doi:10.1093/nar/gkw030> and Hughey (2017) <doi:10.1186/s13073-017-0406-4>.

Maintained by Jake Hughey. Last updated 2 years ago.

4.8 match 10 stars 3.70 score 1 scripts

sth1402

modelObj:A Model Object Framework for Regression Analysis

A utility library to facilitate the generalization of statistical methods built on a regression framework. Package developers can use 'modelObj' methods to initiate a regression analysis without concern for the details of the regression model and the method to be used to obtain parameter estimates. The specifics of the regression step are left to the user to define when calling the function. The user of a function developed within the 'modelObj' framework creates as input a 'modelObj' that contains the model and the R methods to be used to obtain parameter estimates and to obtain predictions. In this way, a user can easily go from linear to non-linear models within the same package.

Maintained by Shannon T. Holloway. Last updated 3 years ago.

5.3 match 3.32 score 23 scripts 3 dependents

friendly

heplots:Visualizing Hypothesis Tests in Multivariate Linear Models

Provides HE plot and other functions for visualizing hypothesis tests in multivariate linear models. HE plots represent sums-of-squares-and-products matrices for linear hypotheses and for error using ellipses (in two dimensions) and ellipsoids (in three dimensions). The related 'candisc' package provides visualizations in a reduced-rank canonical discriminant space when there are more than a few response variables.

Maintained by Michael Friendly. Last updated 8 days ago.

linear-hypotheses matrices multivariate-linear-models plot repeated-measure-designs visualizing-hypothesis-tests

1.5 match 9 stars 11.49 score 1.1k scripts 7 dependents

florianhartig

DHARMa:Residual Diagnostics for Hierarchical (Multi-Level / Mixed) Regression Models

The 'DHARMa' package uses a simulation-based approach to create readily interpretable scaled (quantile) residuals for fitted (generalized) linear mixed models. Currently supported are linear and generalized linear (mixed) models from 'lme4' (classes 'lmerMod', 'glmerMod'), 'glmmTMB', 'GLMMadaptive', and 'spaMM'; phylogenetic linear models from 'phylolm' (classes 'phylolm' and 'phyloglm'); generalized additive models ('gam' from 'mgcv'); 'glm' (including 'negbin' from 'MASS', but excluding quasi-distributions) and 'lm' model classes. Moreover, externally created simulations, e.g. posterior predictive simulations from Bayesian software such as 'JAGS', 'STAN', or 'BUGS' can be processed as well. The resulting residuals are standardized to values between 0 and 1 and can be interpreted as intuitively as residuals from a linear regression. The package also provides a number of plot and test functions for typical model misspecification problems, such as over/underdispersion, zero-inflation, and residual spatial, phylogenetic and temporal autocorrelation.

Maintained by Florian Hartig. Last updated 12 days ago.

glmm regression regression-diagnostics residual

1.2 match 226 stars 14.74 score 2.8k scripts 10 dependents

cran

CSTools:Assessing Skill of Climate Forecasts on Seasonal-to-Decadal Timescales

Exploits dynamical seasonal forecasts in order to provide information relevant to stakeholders at the seasonal timescale. The package contains process-based methods for forecast calibration, bias correction, statistical and stochastic downscaling, optimal forecast combination and multivariate verification, as well as basic and advanced tools to obtain tailored products. This package was developed in the context of the 'ERA4CS' project 'MEDSCOPE' and the 'H2020 S2S4E' project and includes contributions from 'ArticXchange' project founded by 'EU-PolarNet 2'. 'Pérez-Zanón et al. (2022) <doi:10.5194/gmd-15-6115-2022>'. 'Doblas-Reyes et al. (2005) <doi:10.1111/j.1600-0870.2005.00104.x>'. 'Mishra et al. (2018) <doi:10.1007/s00382-018-4404-z>'. 'Sanchez-Garcia et al. (2019) <doi:10.5194/asr-16-165-2019>'. 'Straus et al. (2007) <doi:10.1175/JCLI4070.1>'. 'Terzago et al. (2018) <doi:10.5194/nhess-18-2825-2018>'. 'Torralba et al. (2017) <doi:10.1175/JAMC-D-16-0204.1>'. 'D'Onofrio et al. (2014) <doi:10.1175/JHM-D-13-096.1>'. 'Verfaillie et al. (2017) <doi:10.5194/gmd-10-4257-2017>'. 'Van Schaeybroeck et al. (2019) <doi:10.1016/B978-0-12-812372-0.00010-8>'. 'Yiou et al. (2013) <doi:10.1007/s00382-012-1626-3>'.

Maintained by Victoria Agudetse. Last updated 1 years ago.

fortran

3.2 match 2 stars 5.32 score 62 scripts 1 dependents

cran

IIVpredictor:Modeling Within Individual Variability as Predictor

Time parceling method and Bayesian variability modeling methods for modeling within individual variability indicators as predictors.For more details, see <https://github.com/xliu12/IIVpredicitor>.

Maintained by Xiao Liu. Last updated 4 years ago.

jags cpp

8.6 match 2.00 score 2 scripts

nwaller

fungible:Psychometric Functions from the Waller Lab

Computes fungible coefficients and Monte Carlo data. Underlying theory for these functions is described in the following publications: Waller, N. (2008). Fungible Weights in Multiple Regression. Psychometrika, 73(4), 691-703, <DOI:10.1007/s11336-008-9066-z>. Waller, N. & Jones, J. (2009). Locating the Extrema of Fungible Regression Weights. Psychometrika, 74(4), 589-602, <DOI:10.1007/s11336-008-9087-7>. Waller, N. G. (2016). Fungible Correlation Matrices: A Method for Generating Nonsingular, Singular, and Improper Correlation Matrices for Monte Carlo Research. Multivariate Behavioral Research, 51(4), 554-568. Jones, J. A. & Waller, N. G. (2015). The normal-theory and asymptotic distribution-free (ADF) covariance matrix of standardized regression coefficients: theoretical extensions and finite sample behavior. Psychometrika, 80, 365-378, <DOI:10.1007/s11336-013-9380-y>. Waller, N. G. (2018). Direct Schmid-Leiman transformations and rank-deficient loadings matrices. Psychometrika, 83, 858-870. <DOI:10.1007/s11336-017-9599-0>.

Maintained by Niels Waller. Last updated 1 years ago.

3.4 match 5.01 score 136 scripts 8 dependents

druegamer

deepregression:Fitting Deep Distributional Regression

Allows for the specification of semi-structured deep distributional regression models which are fitted in a neural network as proposed by Ruegamer et al. (2023) <doi:10.18637/jss.v105.i02>. Predictors can be modeled using structured (penalized) linear effects, structured non-linear effects or using an unstructured deep network model.

Maintained by David Ruegamer. Last updated 3 months ago.

7.5 match 2.28 score 63 scripts 1 dependents

lingfeiwang

lassopv:Nonparametric P-Value Estimation for Predictors in Lasso

Estimate the p-values for predictors x against target variable y in lasso regression, using the regularization strength when each predictor enters the active set of regularization path for the first time as the statistic. This is based on the assumption that predictors (of the same variance) that (first) become active earlier tend to be more significant. Three null distributions are supported: normal and spherical, which are computed separately for each predictor and analytically under approximation, which aims at efficiency and accuracy for small p-values.

Maintained by Lingfei Wang. Last updated 2 years ago.

feature-selection lasso linear-regression p-value variable-selection

7.4 match 2 stars 2.30 score 5 scripts

emilioluissaenzguillen

GeDS:Geometrically Designed Spline Regression

Spline Regression, Generalized Additive Models, and Component-wise Gradient Boosting, utilizing Geometrically Designed (GeD) Splines. GeDS regression is a non-parametric method inspired by geometric principles, for fitting spline regression models with variable knots in one or two independent variables. It efficiently estimates the number of knots and their positions, as well as the spline order, assuming the response variable follows a distribution from the exponential family. GeDS models integrate the broader category of Generalized (Non-)Linear Models, offering a flexible approach to modeling complex relationships. A description of the method can be found in Kaishev et al. (2016) <doi:10.1007/s00180-015-0621-7> and Dimitrova et al. (2023) <doi:10.1016/j.amc.2022.127493>. Further extending its capabilities, GeDS's implementation includes Generalized Additive Models (GAM) and Functional Gradient Boosting (FGB), enabling versatile multivariate predictor modeling, as discussed in the forthcoming work of Dimitrova et al. (2025).

Maintained by Emilio L. Sáenz Guillén. Last updated 5 days ago.

cpp

4.2 match 4.06 score 19 scripts

alanarnholt

PASWR2:Probability and Statistics with R, Second Edition

Functions and data sets for the text Probability and Statistics with R, Second Edition.

Maintained by Alan T. Arnholt. Last updated 3 years ago.

4.0 match 1 stars 4.24 score 260 scripts

miriamesteve

eat:Efficiency Analysis Trees

Functions are provided to determine production frontiers and technical efficiency measures through non-parametric techniques based upon regression trees. The package includes code for estimating radial input, output, directional and additive measures, plotting graphical representations of the scores and the production frontiers by means of trees, and determining rankings of importance of input variables in the analysis. Additionally, an adaptation of Random Forest by a set of individual Efficiency Analysis Trees for estimating technical efficiency is also included. More details in: <doi:10.1016/j.eswa.2020.113783>.

Maintained by Miriam Esteve. Last updated 3 years ago.

3.6 match 5 stars 4.68 score 19 scripts

pat-s

oddsratio:Odds Ratio Calculation for GAM(M)s & GLM(M)s

Simplified odds ratio calculation of GAM(M)s & GLM(M)s. Provides structured output (data frame) of all predictors and their corresponding odds ratios and confident intervals for further analyses. It helps to avoid false references of predictors and increments by specifying these parameters in a list instead of using 'exp(coef(model))' (standard approach of odds ratio calculation for GLMs) which just returns a plain numeric output. For GAM(M)s, odds ratio calculation is highly simplified with this package since it takes care of the multiple 'predict()' calls of the chosen predictor while holding other predictors constant. Also, this package allows odds ratio calculation of percentage steps across the whole predictor distribution range for GAM(M)s. In both cases, confident intervals are returned additionally. Calculated odds ratio of GAM(M)s can be inserted into the smooth function plot.

Maintained by Patrick Schratz. Last updated 11 months ago.

odds-ratio probability statistics

2.3 match 31 stars 7.48 score 81 scripts 1 dependents

cran

catalytic:Tools for Applying Catalytic Priors in Statistical Modeling

To improve estimation accuracy and stability in statistical modeling, catalytic prior distributions are employed, integrating observed data with synthetic data generated from a simpler model's predictive distribution. This approach enhances model robustness, stability, and flexibility in complex data scenarios. The catalytic prior distributions are introduced by 'Huang et al.' (2020, <doi:10.1073/pnas.1920913117>), Li and Huang (2023, <doi:10.48550/arXiv.2312.01411>).

Maintained by Dongming Huang. Last updated 3 months ago.

5.3 match 3.18 score

cdriveraus

ctsem:Continuous Time Structural Equation Modelling

Hierarchical continuous (and discrete) time state space modelling, for linear and nonlinear systems measured by continuous variables, with limited support for binary data. The subject specific dynamic system is modelled as a stochastic differential equation (SDE) or difference equation, measurement models are typically multivariate normal factor models. Linear mixed effects SDE's estimated via maximum likelihood and optimization are the default. Nonlinearities, (state dependent parameters) and random effects on all parameters are possible, using either max likelihood / max a posteriori optimization (with optional importance sampling) or Stan's Hamiltonian Monte Carlo sampling. See <https://github.com/cdriveraus/ctsem/raw/master/vignettes/hierarchicalmanual.pdf> for details. Priors may be used. For the conceptual overview of the hierarchical Bayesian linear SDE approach, see <https://www.researchgate.net/publication/324093594_Hierarchical_Bayesian_Continuous_Time_Dynamic_Modeling>. Exogenous inputs may also be included, for an overview of such possibilities see <https://www.researchgate.net/publication/328221807_Understanding_the_Time_Course_of_Interventions_with_Continuous_Time_Dynamic_Models> . Stan based functions are not available on 32 bit Windows systems at present. <https://cdriver.netlify.app/> contains some tutorial blog posts.

Maintained by Charles Driver. Last updated 11 days ago.

stochastic-differential-equations time-series cpp

1.8 match 42 stars 9.58 score 366 scripts 1 dependents

alexpkeil1

bkmrhat:Parallel Chain Tools for Bayesian Kernel Machine Regression

Bayesian kernel machine regression (from the 'bkmr' package) is a Bayesian semi-parametric generalized linear model approach under identity and probit links. There are a number of functions in this package that extend Bayesian kernel machine regression fits to allow multiple-chain inference and diagnostics, which leverage functions from the 'future', 'rstan', and 'coda' packages. Reference: Bobb, J. F., Henn, B. C., Valeri, L., & Coull, B. A. (2018). Statistical software for analyzing the health effects of multiple concurrent exposures via Bayesian kernel machine regression. ; <doi:10.1186/s12940-018-0413-y>.

Maintained by Alexander Keil. Last updated 3 years ago.

3.7 match 7 stars 4.54 score 10 scripts

ying-ju

basemodels:Baseline Models for Classification and Regression

Providing equivalent functions for the dummy classifier and regressor used in 'Python' 'scikit-learn' library. Our goal is to allow R users to easily identify baseline performance for their classification and regression problems. Our baseline models use no predictors, and are useful in cases of class imbalance, multiclass classification, and when users want to quickly identify how much improvement their statistical and machine learning models are over several baseline models. We use a "better" default (proportional guessing) for the dummy classifier than the 'Python' implementation ("prior", which is the most frequent class in the training set). The functions in the package can be used on their own, or introduce methods named 'dummy_regressor' or 'dummy_classifier' that can be used within the caret package pipeline.

Maintained by Ying-Ju Chen. Last updated 2 years ago.

4.5 match 3.70 score 7 scripts

syksy

ePCR:Ensemble Penalized Cox Regression for Survival Prediction

The top-performing ensemble-based Penalized Cox Regression (ePCR) framework developed during the DREAM 9.5 mCRPC Prostate Cancer Challenge <https://www.synapse.org/ProstateCancerChallenge> presented in Guinney J, Wang T, Laajala TD, et al. (2017) <doi:10.1016/S1470-2045(16)30560-5> is provided here-in, together with the corresponding follow-up work. While initially aimed at modeling the most advanced stage of prostate cancer, metastatic Castration-Resistant Prostate Cancer (mCRPC), the modeling framework has subsequently been extended to cover also the non-metastatic form of advanced prostate cancer (CRPC). Readily fitted ensemble-based model S4-objects are provided, and a simulated example dataset based on a real-life cohort is provided from the Turku University Hospital, to illustrate the use of the package. Functionality of the ePCR methodology relies on constructing ensembles of strata in patient cohorts and averaging over them, with each ensemble member consisting of a highly optimized penalized/regularized Cox regression model. Various cross-validation and other modeling schema are provided for constructing novel model objects.

Maintained by Teemu Daniel Laajala. Last updated 1 years ago.

3.3 match 5.00 score 20 scripts

jmleach-bst

sim2Dpredictr:Simulate Outcomes Using Spatially Dependent Design Matrices

Provides tools for simulating spatially dependent predictors (continuous or binary), which are used to generate scalar outcomes in a (generalized) linear model framework. Continuous predictors are generated using traditional multivariate normal distributions or Gauss Markov random fields with several correlation function approaches (e.g., see Rue (2001) <doi:10.1111/1467-9868.00288> and Furrer and Sain (2010) <doi:10.18637/jss.v036.i10>), while binary predictors are generated using a Boolean model (see Cressie and Wikle (2011, ISBN: 978-0-471-69274-4)). Parameter vectors exhibiting spatial clustering can also be easily specified by the user.

Maintained by Justin Leach. Last updated 1 years ago.

6.1 match 2.70 score 2 scripts

bxc147

Epi:Statistical Analysis in Epidemiology

Functions for demographic and epidemiological analysis in the Lexis diagram, i.e. register and cohort follow-up data. In particular representation, manipulation, rate estimation and simulation for multistate data - the Lexis suite of functions, which includes interfaces to 'mstate', 'etm' and 'cmprsk' packages. Contains functions for Age-Period-Cohort and Lee-Carter modeling and a function for interval censored data and some useful functions for tabulation and plotting, as well as a number of epidemiological data sets.

Maintained by Bendix Carstensen. Last updated 2 months ago.

1.7 match 4 stars 9.65 score 708 scripts 11 dependents

ericarcher

rfPermute:Estimate Permutation p-Values for Random Forest Importance Metrics

Estimate significance of importance metrics for a Random Forest model by permuting the response variable. Produces null distribution of importance metrics for each predictor variable and p-value of observed. Provides summary and visualization functions for 'randomForest' results.

Maintained by Eric Archer. Last updated 2 years ago.

jags cpp

2.4 match 27 stars 6.77 score 96 scripts 1 dependents

daya6489

SmartEDA:Summarize and Explore the Data

Exploratory analysis on any input data describing the structure and the relationships present in the data. The package automatically select the variable and does related descriptive statistics. Analyzing information value, weight of evidence, custom tables, summary statistics, graphical techniques will be performed for both numeric and categorical predictors.

Maintained by Dayanand Ubrangala. Last updated 1 years ago.

analysis exploratory-data-analysis

2.2 match 42 stars 7.25 score 214 scripts

ajrgodfrey

BrailleR:Improved Access for Blind Users

Blind users do not have access to the graphical output from R without printing the content of graphics windows to an embosser of some kind. This is not as immediate as is required for efficient access to statistical output. The functions here are created so that blind people can make even better use of R. This includes the text descriptions of graphs, convenience functions to replace the functionality offered in many GUI front ends, and experimental functionality for optimising graphical content to prepare it for embossing as tactile images.

Maintained by A. Jonathan R. Godfrey. Last updated 11 months ago.

1.8 match 123 stars 8.90 score 143 scripts

rempsyc

rempsyc:Convenience Functions for Psychology

Make your workflow faster and easier. Easily customizable plots (via 'ggplot2'), nice APA tables (following the style of the *American Psychological Association*) exportable to Word (via 'flextable'), easily run statistical tests or check assumptions, and automatize various other tasks.

Maintained by Rémi Thériault. Last updated 1 months ago.

convenience-functions ggplot2 psychology statistics visualization

1.5 match 43 stars 10.68 score 214 scripts 2 dependents

jellegoeman

penalized:L1 (Lasso and Fused Lasso) and L2 (Ridge) Penalized Estimation in GLMs and in the Cox Model

Fitting possibly high dimensional penalized regression models. The penalty structure can be any combination of an L1 penalty (lasso and fused lasso), an L2 penalty (ridge) and a positivity constraint on the regression coefficients. The supported regression models are linear, logistic and Poisson regression and the Cox Proportional Hazards model. Cross-validation routines allow optimization of the tuning parameters.

Maintained by Jelle Goeman. Last updated 3 years ago.

openblas cpp

2.3 match 4 stars 7.09 score 429 scripts 17 dependents

ghbolstad

evolvability:Calculation of Evolvability Parameters

Provides tools for calculating evolvability parameters from estimated G-matrices as defined in Hansen and Houle (2008) <doi:10.1111/j.1420-9101.2008.01573.x> and fits phylogenetic comparative models that link the rate of evolution of a trait to the state of another evolving trait (see Hansen et al. 2021 Systematic Biology <doi:10.1093/sysbio/syab079>). The package was released with Bolstad et al. (2014) <doi:10.1098/rstb.2013.0255>, which contains some examples of use.

Maintained by Geir H. Bolstad. Last updated 10 months ago.

3.8 match 4.20 score 16 scripts