Showing 200 of 654 total results
easystats
insight:Easy Access to Model Information for Various Model Objects
A tool to provide easy, intuitive and consistent access to information contained in various R models, like model formulas, model terms, information about random effects, data that was used to fit the model or data from response variables. 'insight' mainly revolves around two types of functions: Functions that find (the names of) information, starting with 'find_', and functions that get the underlying data, starting with 'get_'. The package has a consistent syntax and works with many different model objects, where otherwise functions to access this information are missing.
Maintained by Daniel Lüdecke. Last updated 4 days ago.
easystats, hacktoberfest, insight, models, names, predictors, random
15.3 match 412 stars 17.24 score 568 scripts 210 dependents
mwheymans
psfmi:Prediction Model Pooling, Selection and Performance Evaluation Across Multiply Imputed Datasets
Pooling, backward and forward selection of linear, logistic and Cox regression models in multiply imputed datasets. Backward and forward selection can be done from the pooled model using Rubin's Rules (RR), the D1, D2, D3, D4 and the median p-values method. This is also possible for Mixed models. The models can contain continuous, dichotomous, categorical and restricted cubic spline predictors and interaction terms between all these types of predictors. The stability of the models can be evaluated using (cluster) bootstrapping. The package further contains functions to pool model performance measures such as ROC/AUC, Reclassification, R-squared, scaled Brier score, H&L test and calibration plots for logistic regression models. Internal validation can be done across multiply imputed datasets with cross-validation or bootstrapping. The adjusted intercept after shrinkage of pooled regression coefficients can be obtained. Backward and forward selection as part of internal validation is possible. A function to externally validate logistic prediction models in multiply imputed datasets is available, as is a function to compare models. For Cox models a strata variable can be included. Eekhout (2017) <doi:10.1186/s12874-017-0404-7>. Wiel (2009) <doi:10.1093/biostatistics/kxp011>. Marshall (2009) <doi:10.1186/1471-2288-9-57>.
Maintained by Martijn Heymans. Last updated 2 years ago.
cox-regression, imputation, imputed-datasets, logistic, multiple-imputation, pool, predictor, regression, selection, spline, spline-predictors
34.1 match 10 stars 7.17 score 70 scripts
stan-dev
rstanarm:Bayesian Applied Regression Modeling via Stan
Estimates previously compiled regression models using the 'rstan' package, which provides the R interface to the Stan C++ library for Bayesian estimation. Users specify models via the customary R syntax with a formula and data.frame plus some additional arguments for priors.
Maintained by Ben Goodrich. Last updated 9 months ago.
bayesian, bayesian-data-analysis, bayesian-inference, bayesian-methods, bayesian-statistics, multilevel-models, rstan, rstanarm, stan, statistical-modeling, cpp
14.9 match 393 stars 15.68 score 5.0k scripts 13 dependents
blasbenito
collinear:Automated Multicollinearity Management
Effortless multicollinearity management in data frames with both numeric and categorical variables for statistical and machine learning applications. The package simplifies multicollinearity analysis by combining four robust methods: 1) target encoding for categorical variables (Micci-Barreca, D. 2001 <doi:10.1145/507533.507538>); 2) automated feature prioritization to prevent key variable loss during filtering; 3) pairwise correlation for all variable combinations (numeric-numeric, numeric-categorical, categorical-categorical); and 4) fast computation of variance inflation factors.
Maintained by Blas M. Benito. Last updated 2 months ago.
machine-learning, multicollinearity, statistics
41.2 match 11 stars 5.51 score 15 scripts 1 dependents
topepo
caret:Classification and Regression Training
Misc functions for training and plotting classification and regression models.
Maintained by Max Kuhn. Last updated 3 months ago.
10.5 match 1.6k stars 19.24 score 61k scripts 303 dependents
paul-buerkner
brms:Bayesian Regression Models using 'Stan'
Fit Bayesian generalized (non-)linear multivariate multilevel models using 'Stan' for full Bayesian inference. A wide range of distributions and link functions are supported, allowing users to fit -- among others -- linear, robust linear, count data, survival, response times, ordinal, zero-inflated, hurdle, and even self-defined mixture models all in a multilevel context. Further modeling options include both theory-driven and data-driven non-linear terms, auto-correlation structures, censoring and truncation, meta-analytic standard errors, and quite a few more. In addition, all parameters of the response distribution can be predicted in order to perform distributional regression. Prior specifications are flexible and explicitly encourage users to apply prior distributions that actually reflect their prior knowledge. Models can easily be evaluated and compared using several methods assessing posterior or prior predictions. References: Bürkner (2017) <doi:10.18637/jss.v080.i01>; Bürkner (2018) <doi:10.32614/RJ-2018-017>; Bürkner (2021) <doi:10.18637/jss.v100.i05>; Carpenter et al. (2017) <doi:10.18637/jss.v076.i01>.
Maintained by Paul-Christian Bürkner. Last updated 2 days ago.
bayesian-inference, brms, multilevel-models, stan, statistical-models
11.0 match 1.3k stars 16.61 score 13k scripts 34 dependents
inlabru-org
inlabru:Bayesian Latent Gaussian Modelling using INLA and Extensions
Facilitates spatial and general latent Gaussian modeling using integrated nested Laplace approximation via the INLA package (<https://www.r-inla.org>). Additionally, extends the GAM-like model class to more general nonlinear predictor expressions, and implements a log Gaussian Cox process likelihood for modeling univariate and spatial point processes based on ecological survey data. Model components are specified with general inputs and mapping methods to the latent variables, and the predictors are specified via general R expressions, with separate expressions for each observation likelihood model in multi-likelihood models. A prediction method based on fast Monte Carlo sampling allows posterior prediction of general expressions of the latent variables. Ecology-focused introduction in Bachl, Lindgren, Borchers, and Illian (2019) <doi:10.1111/2041-210X.13168>.
Maintained by Finn Lindgren. Last updated 3 days ago.
14.1 match 96 stars 12.62 score 832 scripts 6 dependents
tomasfryda
h2o:R Interface for the 'H2O' Scalable Machine Learning Platform
R interface for 'H2O', the scalable open source machine learning platform that offers parallelized implementations of many supervised and unsupervised machine learning algorithms such as Generalized Linear Models (GLM), Gradient Boosting Machines (including XGBoost), Random Forests, Deep Neural Networks (Deep Learning), Stacked Ensembles, Naive Bayes, Generalized Additive Models (GAM), ANOVA GLM, Cox Proportional Hazards, K-Means, PCA, ModelSelection, Word2Vec, as well as a fully automatic machine learning algorithm (H2O AutoML).
Maintained by Tomas Fryda. Last updated 1 year ago.
18.6 match 3 stars 8.20 score 7.8k scripts 11 dependents
promidat
predictoR:Predictive Data Analysis System
Perform a supervised data analysis on a database through a 'shiny' graphical interface. It includes methods such as K-Nearest Neighbors, Decision Trees, ADA Boosting, Extreme Gradient Boosting, Random Forest, Neural Networks, Deep Learning, Support Vector Machines and Bayesian Methods.
Maintained by Oldemar Rodriguez. Last updated 1 year ago.
55.0 match 1 stars 2.60 score 3 scripts
tidymodels
recipes:Preprocessing and Feature Engineering Steps for Modeling
A recipe prepares your data for modeling. We provide an extensible framework for pipeable sequences of feature engineering steps that provide preprocessing tools to be applied to data. Statistical parameters for the steps can be estimated from an initial data set and then applied to other data sets. The resulting processed output can then be used as inputs for statistical or machine learning models.
Maintained by Max Kuhn. Last updated 5 days ago.
7.1 match 584 stars 18.71 score 7.2k scripts 380 dependents
refunders
refund:Regression with Functional Data
Methods for regression for functional data, including function-on-scalar, scalar-on-function, and function-on-function regression. Some of the functions are applicable to image data.
Maintained by Julia Wrobel. Last updated 6 months ago.
11.3 match 41 stars 10.25 score 472 scripts 16 dependents
blasbenito
spatialRF:Easy Spatial Modeling with Random Forest
Automatic generation and selection of spatial predictors for spatial regression with Random Forest. Spatial predictors are surrogates of variables driving the spatial structure of a response variable. The package offers two methods to generate spatial predictors from a distance matrix among training cases: 1) Moran's Eigenvector Maps (MEMs; Dray, Legendre, and Peres-Neto 2006 <DOI:10.1016/j.ecolmodel.2006.02.015>): computed as the eigenvectors of a weighted matrix of distances; 2) RFsp (Hengl et al. <DOI:10.7717/peerj.5518>): columns of the distance matrix used as spatial predictors. Spatial predictors help minimize the spatial autocorrelation of the model residuals and facilitate an honest assessment of the importance scores of the non-spatial predictors. Additionally, functions to reduce multicollinearity, identify relevant variable interactions, tune random forest hyperparameters, assess model transferability via spatial cross-validation, and explore model results via partial dependence curves and interaction surfaces are included in the package. The modelling functions are built around the highly efficient 'ranger' package (Wright and Ziegler 2017 <DOI:10.18637/jss.v077.i01>).
Maintained by Blas M. Benito. Last updated 3 years ago.
random-forest, spatial-analysis, spatial-regression
21.1 match 114 stars 5.45 score 49 scripts
e-sensing
sits:Satellite Image Time Series Analysis for Earth Observation Data Cubes
An end-to-end toolkit for land use and land cover classification using big Earth observation data, based on machine learning methods applied to satellite image data cubes, as described in Simoes et al (2021) <doi:10.3390/rs13132428>. Builds regular data cubes from collections in AWS, Microsoft Planetary Computer, Brazil Data Cube, Copernicus Data Space Environment (CDSE), Digital Earth Africa, Digital Earth Australia, NASA HLS using the Spatio-temporal Asset Catalog (STAC) protocol (<https://stacspec.org/>) and the 'gdalcubes' R package developed by Appel and Pebesma (2019) <doi:10.3390/data4030092>. Supports visualization methods for images and time series and smoothing filters for dealing with noisy time series. Includes functions for quality assessment of training samples using self-organized maps as presented by Santos et al (2021) <doi:10.1016/j.isprsjprs.2021.04.014>. Includes methods to reduce training samples imbalance proposed by Chawla et al (2002) <doi:10.1613/jair.953>. Provides machine learning methods including support vector machines, random forests, extreme gradient boosting, multi-layer perceptrons, temporal convolutional neural networks proposed by Pelletier et al (2019) <doi:10.3390/rs11050523>, and temporal attention encoders by Garnot and Landrieu (2020) <doi:10.48550/arXiv.2007.00586>. Supports GPU processing of deep learning models using torch <https://torch.mlverse.org/>. Performs efficient classification of big Earth observation data cubes and includes functions for post-classification smoothing based on Bayesian inference as described by Camara et al (2024) <doi:10.3390/rs16234572>, and methods for active learning and uncertainty assessment. Supports region-based time series analysis using package supercells <https://jakubnowosad.com/supercells/>. Enables best practices for estimating area and assessing accuracy of land change as recommended by Olofsson et al (2014) <doi:10.1016/j.rse.2014.02.015>. 
Minimum recommended requirements: 16 GB RAM and 4 CPU dual-core.
Maintained by Gilberto Camara. Last updated 1 month ago.
big-earth-data, cbers, earth-observation, eo-datacubes, geospatial, image-time-series, land-cover-classification, landsat, planetary-computer, r-spatial, remote-sensing, rspatial, satellite-image-time-series, satellite-imagery, sentinel-2, stac-api, stac-catalog, cpp
10.8 match 494 stars 9.50 score 384 scripts
zejiang-unsw
WASP:Wavelet System Prediction
The wavelet-based variance transformation method is used for system modelling and prediction. It refines predictor spectral representation using Wavelet Theory, which leads to improved model specifications and prediction accuracy. Details of methodologies used in the package can be found in Jiang, Z., Sharma, A., & Johnson, F. (2020) <doi:10.1029/2019WR026962>, Jiang, Z., Rashid, M. M., Johnson, F., & Sharma, A. (2020) <doi:10.1016/j.envsoft.2020.104907>, and Jiang, Z., Sharma, A., & Johnson, F. (2021) <doi:10.1016/J.JHYDROL.2021.126816>.
Maintained by Ze Jiang. Last updated 7 months ago.
prediction, transformation, wavelet
15.8 match 9 stars 6.41 score 19 scripts
stan-dev
projpred:Projection Predictive Feature Selection
Performs projection predictive feature selection for generalized linear models (Piironen, Paasiniemi, and Vehtari, 2020, <doi:10.1214/20-EJS1711>) with or without multilevel or additive terms (Catalina, Bürkner, and Vehtari, 2022, <https://proceedings.mlr.press/v151/catalina22a.html>), for some ordinal and nominal regression models (Weber, Glass, and Vehtari, 2023, <arXiv:2301.01660>), and for many other regression models (using the latent projection by Catalina, Bürkner, and Vehtari, 2021, <arXiv:2109.04702>, which can also be applied to most of the former models). The package is compatible with the 'rstanarm' and 'brms' packages, but other reference models can also be used. See the vignettes and the documentation for more information and examples.
Maintained by Frank Weber. Last updated 1 month ago.
bayes, bayesian, bayesian-inference, rstanarm, stan, statistics, variable-selection, openblas, cpp
10.0 match 112 stars 10.08 score 241 scripts
mjskay
tidybayes:Tidy Data and 'Geoms' for Bayesian Models
Compose data for and extract, manipulate, and visualize posterior draws from Bayesian models ('JAGS', 'Stan', 'rstanarm', 'brms', 'MCMCglmm', 'coda', ...) in a tidy data format. Functions are provided to help extract tidy data frames of draws from Bayesian models and that generate point summaries and intervals in a tidy format. In addition, 'ggplot2' 'geoms' and 'stats' are provided for common visualization primitives like points with multiple uncertainty intervals, eye plots (intervals plus densities), and fit curves with multiple, arbitrary uncertainty bands.
Maintained by Matthew Kay. Last updated 6 months ago.
bayesian-data-analysis, brms, ggplot2, jags, stan, tidy-data, visualization
6.6 match 733 stars 14.72 score 7.3k scripts 20 dependents
hturner
gnm:Generalized Nonlinear Models
Functions to specify and fit generalized nonlinear models, including models with multiplicative interaction terms such as the UNIDIFF model from sociology and the AMMI model from crop science, and many others. Over-parameterized representations of models are used throughout; functions are provided for inference on estimable parameter combinations, as well as standard methods for diagnostics etc.
Maintained by Heather Turner. Last updated 1 year ago.
generalized-linear-models, generalized-nonlinear-models, statistical-models, openblas
8.9 match 16 stars 10.51 score 290 scripts 21 dependents
iiasa
ibis.iSDM:Modelling framework for integrated biodiversity distribution scenarios
Integrated framework for modelling the distribution of species and ecosystems in a suitability framing. This package allows the estimation of integrated species distribution models (iSDM) based on several sources of evidence and provided presence-only and presence-absence datasets. It makes heavy use of point-process models for estimating habitat suitability and allows spatial latent effects and priors to be included in the estimation. To do so 'ibis.iSDM' supports a number of engines for Bayesian and more non-parametric machine learning estimation. Further, 'ibis.iSDM' is specifically customized to support spatial-temporal projections of habitat suitability into the future.
Maintained by Martin Jung. Last updated 4 months ago.
bayesian, biodiversity, integrated-framework, poisson-process, scenarios, sdm, spatial-grain, spatial-predictions, species-distribution-modelling
21.0 match 21 stars 4.36 score 12 scripts 1 dependents
giuseppec
iml:Interpretable Machine Learning
Interpretability methods to analyze the behavior and predictions of any machine learning model. Implemented methods are: Feature importance described by Fisher et al. (2018) <doi:10.48550/arxiv.1801.01489>, accumulated local effects plots described by Apley (2018) <doi:10.48550/arxiv.1612.08468>, partial dependence plots described by Friedman (2001) <www.jstor.org/stable/2699986>, individual conditional expectation ('ice') plots described by Goldstein et al. (2013) <doi:10.1080/10618600.2014.907095>, local models (variant of 'lime') described by Ribeiro et al. (2016) <doi:10.48550/arXiv.1602.04938>, the Shapley Value described by Strumbelj et al. (2014) <doi:10.1007/s10115-013-0679-x>, feature interactions described by Friedman et al. <doi:10.1214/07-AOAS148> and tree surrogate models.
Maintained by Giuseppe Casalicchio. Last updated 20 days ago.
6.8 match 494 stars 12.86 score 642 scripts 4 dependents
tidymodels
dials:Tools for Creating Tuning Parameter Values
Many models contain tuning parameters (i.e. parameters that cannot be directly estimated from the data). These tools can be used to define objects for creating, simulating, or validating values for such parameters.
Maintained by Hannah Frick. Last updated 29 days ago.
6.0 match 114 stars 14.31 score 426 scripts 52 dependents
tidymodels
applicable:A Compilation of Applicability Domain Methods
A modeling package compiling applicability domain methods in R. It combines different methods to measure the amount of extrapolation new samples can have from the training set. See Netzeva et al (2005) <doi:10.1177/026119290503300209> for an overview of applicability domains.
Maintained by Marly Gotti. Last updated 2 years ago.
11.3 match 47 stars 7.42 score 47 scripts 1 dependents
chrisaberson
pwr2ppl:Power Analyses for Common Designs (Power to the People)
Statistical power analysis for designs including t-tests, correlations, multiple regression, ANOVA, mediation, and logistic regression. Functions accompany Aberson (2019) <doi:10.4324/9781315171500>.
Maintained by Chris Aberson. Last updated 3 years ago.
19.5 match 17 stars 4.16 score 17 scripts
ebird
ebirdst:Access and Analyze eBird Status and Trends Data Products
Tools for accessing and analyzing eBird Status and Trends Data Products (<https://science.ebird.org/en/status-and-trends>). eBird (<https://ebird.org/home>) is a global database of bird observations collected by members of the public. eBird Status and Trends uses these data to model global bird distributions, abundances, and population trends at a high spatial and temporal resolution.
Maintained by Matthew Strimas-Mackey. Last updated 19 days ago.
9.0 match 26 stars 8.85 score 228 scripts
john-d-fox
effects:Effect Displays for Linear, Generalized Linear, and Other Models
Graphical and tabular effect displays, e.g., of interactions, for various statistical models with linear predictors.
Maintained by John Fox. Last updated 3 years ago.
7.4 match 6 stars 10.73 score 5.4k scripts 47 dependents
openintrostat
openintro:Datasets and Supplemental Functions from 'OpenIntro' Textbooks and Labs
Supplemental functions and data for 'OpenIntro' resources, which include open-source textbooks and resources for introductory statistics (<https://www.openintro.org/>). The package contains datasets used in our open-source textbooks along with custom plotting functions for reproducing book figures. Note that many functions and examples include color transparency; some plotting elements may not show up properly (or at all) when run in some versions of the Windows operating system.
Maintained by Mine Çetinkaya-Rundel. Last updated 2 months ago.
6.6 match 240 stars 11.39 score 6.0k scripts
boehringer-ingelheim
BPrinStratTTE:Causal Effects in Principal Strata Defined by Antidrug Antibodies
Bayesian models to estimate causal effects of biological treatments on time-to-event endpoints in clinical trials with principal strata defined by the occurrence of antidrug antibodies. The methodology is based on Frangakis and Rubin (2002) <doi:10.1111/j.0006-341x.2002.00021.x> and Imbens and Rubin (1997) <doi:10.1214/aos/1034276631>, and here adapted to a specific time-to-event setting.
Maintained by Christian Stock. Last updated 11 months ago.
bayesian-methods, causal-inference, clinical-trial, estimand, mcmc-methods, pharmaceutical-development, principal-stratification, simulation, stan, time-to-event, cpp
22.5 match 3.18 score
ohdsi
PatientLevelPrediction:Develop Clinical Prediction Models Using the Common Data Model
A user-friendly way to create patient-level prediction models using the Observational Medical Outcomes Partnership Common Data Model. Given a cohort of interest and an outcome of interest, the package can use data in the Common Data Model to build a large set of features. These features can then be used to fit a predictive model with a number of machine learning algorithms. This is further described in Reps (2017) <doi:10.1093/jamia/ocy032>.
Maintained by Egill Fridgeirsson. Last updated 8 days ago.
6.5 match 190 stars 10.85 score 297 scripts
rsquaredacademy
olsrr:Tools for Building OLS Regression Models
Tools designed to make it easier for users, particularly beginner/intermediate R users to build ordinary least squares regression models. Includes comprehensive regression output, heteroskedasticity tests, collinearity diagnostics, residual diagnostics, measures of influence, model fit assessment and variable selection procedures.
Maintained by Aravind Hebbali. Last updated 4 months ago.
collinearity-diagnostics, linear-models, regression, stepwise-regression
5.8 match 103 stars 12.19 score 1.4k scripts 4 dependents
spatstat
spatstat.explore:Exploratory Data Analysis for the 'spatstat' Family
Functionality for exploratory data analysis and nonparametric analysis of spatial data, mainly spatial point patterns, in the 'spatstat' family of packages. (Excludes analysis of spatial data on a linear network, which is covered by the separate package 'spatstat.linnet'.) Methods include quadrat counts, K-functions and their simulation envelopes, nearest neighbour distance and empty space statistics, Fry plots, pair correlation function, kernel smoothed intensity, relative risk estimation with cross-validated bandwidth selection, mark correlation functions, segregation indices, mark dependence diagnostics, and kernel estimates of covariate effects. Formal hypothesis tests of random pattern (chi-squared, Kolmogorov-Smirnov, Monte Carlo, Diggle-Cressie-Loosmore-Ford, Dao-Genton, two-stage Monte Carlo) and tests for covariate effects (Cox-Berman-Waller-Lawson, Kolmogorov-Smirnov, ANOVA) are also supported.
Maintained by Adrian Baddeley. Last updated 1 month ago.
cluster-detection, confidence-intervals, hypothesis-testing, k-function, roc-curves, scan-statistics, significance-testing, simulation-envelopes, spatial-analysis, spatial-data-analysis, spatial-sharpening, spatial-smoothing, spatial-statistics
6.5 match 1 stars 10.17 score 67 scripts 148 dependents
myles-lewis
nestedcv:Nested Cross-Validation with 'glmnet' and 'caret'
Implements nested k*l-fold cross-validation for lasso and elastic-net regularised linear models via the 'glmnet' package and other machine learning models via the 'caret' package <doi:10.1093/bioadv/vbad048>. Cross-validation of 'glmnet' alpha mixing parameter and embedded fast filter functions for feature selection are provided. Described as double cross-validation by Stone (1977) <doi:10.1111/j.2517-6161.1977.tb01603.x>. Also implemented is a method using outer CV to measure unbiased model performance metrics when fitting Bayesian linear and logistic regression shrinkage models using the horseshoe prior over parameters to encourage a sparse model as described by Piironen & Vehtari (2017) <doi:10.1214/17-EJS1337SI>.
Maintained by Myles Lewis. Last updated 5 days ago.
7.9 match 12 stars 7.92 score 46 scripts
jedazard
superpc:Supervised Principal Components
Does prediction in the case of a censored survival outcome, or a regression outcome, using the "supervised principal component" approach. 'Superpc' is especially useful for high-dimensional data when the number of features p dominates the number of samples n (p >> n paradigm), as generated, for instance, by high-throughput technologies.
Maintained by Jean-Eudes Dazard. Last updated 3 years ago.
8.8 match 7 stars 6.96 score 80 scripts 2 dependents
tidymodels
workflowsets:Create a Collection of 'tidymodels' Workflows
A workflow is a combination of a model and preprocessors (e.g., a formula, recipe, etc.) (Kuhn and Silge (2021) <https://www.tmwr.org/>). In order to try different combinations of these, an object can be created that contains many workflows. There are functions to create workflows en masse as well as to train them and visualize the results.
Maintained by Simon Couch. Last updated 5 months ago.
5.0 match 93 stars 12.21 score 294 scripts 19 dependents
ddalthorp
GenEst:Generalized Mortality Estimator
Command-line and 'shiny' GUI implementation of the GenEst models for estimating bird and bat mortality at wind and solar power facilities, following Dalthorp, et al. (2018) <doi:10.3133/tm7A2>.
Maintained by Daniel Dalthorp. Last updated 2 years ago.
7.8 match 7 stars 7.81 score 55 scripts 2 dependents
nicholasjclark
mvgam:Multivariate (Dynamic) Generalized Additive Models
Fit Bayesian Dynamic Generalized Additive Models to multivariate observations. Users can build nonlinear State-Space models that can incorporate semiparametric effects in observation and process components, using a wide range of observation families. Estimation is performed using Markov Chain Monte Carlo with Hamiltonian Monte Carlo in the software 'Stan'. References: Clark & Wells (2023) <doi:10.1111/2041-210X.13974>.
Maintained by Nicholas J Clark. Last updated 4 hours ago.
bayesian-statistics, dynamic-factor-models, ecological-modelling, forecasting, gaussian-process, generalised-additive-models, generalized-additive-models, joint-species-distribution-modelling, multilevel-models, multivariate-timeseries, stan, time-series-analysis, timeseries, vector-autoregression, vectorautoregression, cpp
6.1 match 139 stars 9.85 score 117 scripts
harrelfe
Hmisc:Harrell Miscellaneous
Contains many functions useful for data analysis, high-level graphics, utility operations, functions for computing sample size and power, simulation, importing and annotating datasets, imputing missing values, advanced table making, variable clustering, character string manipulation, conversion of R objects to LaTeX and html code, recoding variables, caching, simplified parallel computing, encrypting and decrypting data using a safe workflow, general moving window statistical estimation, and assistance in interpreting principal component analysis.
Maintained by Frank E Harrell Jr. Last updated 2 days ago.
3.4 match 210 stars 17.61 score 17k scripts 750 dependents
pauljohn32
rockchalk:Regression Estimation and Presentation
A collection of functions for interpretation and presentation of regression analysis. These functions are used to produce the statistics lectures in <https://pj.freefaculty.org/guides/>. Includes regression diagnostics, regression tables, and plots of interactions and "moderator" variables. The emphasis is on "mean-centered" and "residual-centered" predictors. The vignette 'rockchalk' offers a fairly comprehensive overview. The vignette 'Rstyle' has advice about coding in R. The package title 'rockchalk' refers to our school motto, 'Rock Chalk Jayhawk, Go K.U.'.
Maintained by Paul E. Johnson. Last updated 3 years ago.
8.3 match 7.13 score 584 scripts 18 dependents
zejiang-unsw
synthesis:Generate Synthetic Data from Statistical Models
Generate synthetic time series from commonly used statistical models, including linear, nonlinear and chaotic systems. Applications to testing methods can be found in Jiang, Z., Sharma, A., & Johnson, F. (2019) <doi:10.1016/j.advwatres.2019.103430> and Jiang, Z., Sharma, A., & Johnson, F. (2020) <doi:10.1029/2019WR026962> associated with an open-source tool by Jiang, Z., Rashid, M. M., Johnson, F., & Sharma, A. (2020) <doi:10.1016/j.envsoft.2020.104907>.
Maintained by Ze Jiang. Last updated 9 months ago.
12.3 match 3 stars 4.56 score 12 scripts
fsolt
dotwhisker:Dot-and-Whisker Plots of Regression Results
Create quick and easy dot-and-whisker plots of regression results. It takes as input either (1) a coefficient table in standard form or (2) one (or a list of) fitted model objects (of any type that has methods implemented in the 'parameters' package). It returns 'ggplot' objects that can be further customized using tools from the 'ggplot2' package. The package also includes helper functions for tasks such as rescaling coefficients or relabeling predictor variables. See more methodological discussion of the visualization and data management methods used in this package in Kastellec and Leoni (2007) <doi:10.1017/S1537592707072209> and Gelman (2008) <doi:10.1002/sim.3107>.
Maintained by Yue Hu. Last updated 6 months ago.
5.4 match 60 stars 10.25 score 680 scripts
ecospat
ecospat:Spatial Ecology Miscellaneous Methods
Collection of R functions and data sets for the support of spatial ecology analyses with a focus on pre, core and post modelling analyses of species distribution, niche quantification and community assembly. Written by current and former members and collaborators of the ecospat group of Antoine Guisan, Department of Ecology and Evolution (DEE) and Institute of Earth Surface Dynamics (IDYST), University of Lausanne, Switzerland. Read Di Cola et al. (2016) <doi:10.1111/ecog.02671> for details.
Maintained by Olivier Broennimann. Last updated 1 month ago.
5.9 match 32 stars 9.35 score 418 scripts 1 dependents
jacob-long
interactions:Comprehensive, User-Friendly Toolkit for Probing Interactions
A suite of functions for conducting and interpreting analysis of statistical interaction in regression models that was formerly part of the 'jtools' package. Functionality includes visualization of two- and three-way interactions among continuous and/or categorical variables as well as calculation of "simple slopes" and Johnson-Neyman intervals (see e.g., Bauer & Curran, 2005 <doi:10.1207/s15327906mbr4003_5>). These capabilities are implemented for generalized linear models in addition to the standard linear regression context.
Maintained by Jacob A. Long. Last updated 8 months ago.
interactions, moderation, social-sciences, statistics
4.7 match 131 stars 11.39 score 1.2k scripts 5 dependents
mmaechler
supclust:Supervised Clustering of Predictor Variables Such as Genes
Methodology for supervised grouping aka "clustering" of potentially many predictor variables, such as genes etc, implementing algorithms 'PELORA' and 'WILMA'.
Maintained by Martin Maechler. Last updated 7 months ago.
12.7 match 2 stars 4.15 score 28 scripts
bioc
sesame:SEnsible Step-wise Analysis of DNA MEthylation BeadChips
Tools for analyzing Illumina Infinium DNA methylation arrays. SeSAMe provides utilities to support analyses of multiple generations of Infinium DNA methylation BeadChips, including preprocessing, quality control, visualization and inference. SeSAMe features accurate detection calling, intelligent inference of ethnicity and sex, and advanced quality control routines.
Maintained by Wanding Zhou. Last updated 2 months ago.
dnamethylationmethylationarraypreprocessingqualitycontrolbioinformaticsdna-methylationmicroarray
5.7 match 69 stars 9.08 score 258 scripts 1 dependents
capnrefsmmat
regressinator:Simulate and Diagnose (Generalized) Linear Models
Simulate samples from populations with known covariate distributions, generate response variables according to common linear and generalized linear model families, draw from sampling distributions of regression estimates, and perform visual inference on diagnostics from model fits.
Maintained by Alex Reinhart. Last updated 5 months ago.
8.5 match 4 stars 6.08 score 25 scripts
tidymodels
embed:Extra Recipes for Encoding Predictors
Predictors can be converted to one or more numeric representations using a variety of methods. Effect encodings using simple generalized linear models <doi:10.48550/arXiv.1611.09477> or nonlinear models <doi:10.48550/arXiv.1604.06737> can be used. There are also functions for dimension reduction and other approaches.
Maintained by Emil Hvitfeldt. Last updated 2 months ago.
5.4 match 142 stars 9.35 score 1.1k scripts
simulatr
simrel:Simulation of Multivariate Linear Model Data
Researchers have been using simulated data from a multivariate linear model to compare and evaluate different methods, ideas and models. Additionally, teachers and educators have been using a simulation tool to demonstrate and teach various statistical and machine learning concepts. This package helps users to simulate linear model data with a wide range of properties by tuning a few parameters such as relevant latent components. In addition, a shiny app as an 'RStudio' gadget gives users a simple interface for using the simulation function. See more on: Sæbø, S., Almøy, T., Helland, I.S. (2015) <doi:10.1016/j.chemolab.2015.05.012> and Rimal, R., Almøy, T., Sæbø, S. (2018) <doi:10.1016/j.chemolab.2018.02.009>.
Maintained by Raju Rimal. Last updated 2 years ago.
bivariate-simulationmultivariate-simulationrelevant-predictor-componentssimulated-datasimulationunivariate-simulation
10.4 match 3 stars 4.78 score 40 scripts
geomorphr
geomorph:Geometric Morphometric Analyses of 2D and 3D Landmark Data
Read, manipulate, and digitize landmark data, generate shape variables via Procrustes analysis for points, curves and surfaces, perform shape analyses, and provide graphical depictions of shapes and patterns of shape variation.
Maintained by Dean Adams. Last updated 1 month ago.
4.1 match 76 stars 12.05 score 700 scripts 6 dependents
jenfb
bkmr:Bayesian Kernel Machine Regression
Implementation of a statistical approach for estimating the joint health effects of multiple concurrent exposures, as described in Bobb et al (2015) <doi:10.1093/biostatistics/kxu058>.
Maintained by Jennifer F. Bobb. Last updated 4 months ago.
6.8 match 55 stars 7.03 score 182 scripts 1 dependents
knausb
vcfR:Manipulate and Visualize VCF Data
Facilitates easy manipulation of variant call format (VCF) data. Functions are provided to rapidly read from and write to VCF files. Once VCF data is read into R a parser function extracts matrices of data. This information can then be used for quality control or other purposes. Additional functions provide visualization of genomic data. Once processing is complete data may be written to a VCF file (*.vcf.gz). It also may be converted into other popular R objects (e.g., genlight, DNAbin). VcfR provides a link between VCF data and familiar R software.
Maintained by Brian J. Knaus. Last updated 22 days ago.
genomicspopulation-geneticspopulation-genomicsrcppvcf-datavisualizationzlibcpp
3.5 match 254 stars 13.59 score 3.1k scripts 19 dependents
dwarton
ecostats:Code and Data Accompanying the Eco-Stats Text (Warton 2022)
Functions and data supporting the Eco-Stats text (Warton, 2022, Springer), and solutions to exercises. Functions include tools for using simulation envelopes in diagnostic plots, and a function for diagnostic plots of multivariate linear models. Datasets mentioned in the package are included here (where not available elsewhere) and there is a vignette for each chapter of the text with solutions to exercises.
Maintained by David Warton. Last updated 1 year ago.
7.2 match 8 stars 6.58 score 53 scripts
choonghyunryu
dlookr:Tools for Data Diagnosis, Exploration, Transformation
A collection of tools that support data diagnosis, exploration, and transformation. Data diagnostics provides information and visualization of missing values, outliers, and unique and negative values to help you understand the distribution and quality of your data. Data exploration provides information and visualization of the descriptive statistics of univariate variables, normality tests and outliers, correlation of two variables, and the relationship between the target variable and predictor. Data transformation supports binning for categorizing continuous variables, imputes missing values and outliers, and resolves skewness. And it creates automated reports that support these three tasks.
Maintained by Choonghyun Ryu. Last updated 9 months ago.
4.3 match 212 stars 11.05 score 748 scripts 2 dependents
clbustos
dominanceanalysis:Dominance Analysis
Dominance analysis is a method for comparing the relative importance of predictors in multiple regression models: ordinary least squares, generalized linear models, hierarchical linear models, beta regression and dynamic linear models. The main principles and methods of dominance analysis are described in Budescu, D. V. (1993) <doi:10.1037/0033-2909.114.3.542> and Azen, R., & Budescu, D. V. (2003) <doi:10.1037/1082-989X.8.2.129> for ordinary least squares regression. Subsequently, the extensions for multivariate regression, logistic regression and hierarchical linear models were described in Azen, R., & Budescu, D. V. (2006) <doi:10.3102/10769986031002157>, Azen, R., & Traxel, N. (2009) <doi:10.3102/1076998609332754> and Luo, W., & Azen, R. (2013) <doi:10.3102/1076998612458319>, respectively.
Maintained by Claudio Bustos Navarrete. Last updated 1 year ago.
8.1 match 25 stars 5.75 score 45 scripts
gavinsimpson
gratia:Graceful 'ggplot'-Based Graphics and Other Functions for GAMs Fitted Using 'mgcv'
Graceful 'ggplot'-based graphics and utility functions for working with generalized additive models (GAMs) fitted using the 'mgcv' package. Provides a reimplementation of the plot() method for GAMs that 'mgcv' provides, as well as 'tidyverse' compatible representations of estimated smooths.
Maintained by Gavin L. Simpson. Last updated 5 days ago.
distributional-regressiongamgammgeneralized-additive-mixed-modelsgeneralized-additive-modelsggplot2glmlmmgcvpenalized-splinerandom-effectssmoothingsplines
3.6 match 216 stars 12.68 score 1.6k scripts 1 dependents
sahirbhatnagar
casebase:Fitting Flexible Smooth-in-Time Hazards and Risk Functions via Logistic and Multinomial Regression
Fit flexible and fully parametric hazard regression models to survival data with single event type or multiple competing causes via logistic and multinomial regression. Our formulation allows for arbitrary functional forms of time and its interactions with other predictors for time-dependent hazards and hazard ratios. From the fitted hazard model, we provide functions to readily calculate and plot cumulative incidence and survival curves for a given covariate profile. This approach accommodates any log-linear hazard function of prognostic time, treatment, and covariates, and readily allows for non-proportionality. We also provide a plot method for visualizing incidence density via population time plots. Based on the case-base sampling approach of Hanley and Miettinen (2009) <DOI:10.2202/1557-4679.1125>, Saarela and Arjas (2015) <DOI:10.1111/sjos.12125>, and Saarela (2015) <DOI:10.1007/s10985-015-9352-x>.
Maintained by Sahir Bhatnagar. Last updated 7 months ago.
competing-riskscox-regressionregression-modelssurvival-analysis
6.3 match 9 stars 7.16 score 94 scripts
dadongz
OncoSubtype:Predict Cancer Subtypes Based on TCGA Data using Machine Learning Method
Provide functionality for cancer subtyping using nearest centroids or machine learning methods based on TCGA data.
Maintained by Dadong Zhang. Last updated 12 months ago.
12.0 match 1 stars 3.70 score 1 scripts
cran
mgcv:Mixed GAM Computation Vehicle with Automatic Smoothness Estimation
Generalized additive (mixed) models, some of their extensions and other generalized ridge regression with multiple smoothing parameter estimation by (Restricted) Marginal Likelihood, Generalized Cross Validation and similar, or using iterated nested Laplace approximation for fully Bayesian inference. See Wood (2017) <doi:10.1201/9781315370279> for an overview. Includes a gam() function, a wide variety of smoothers, 'JAGS' support and distributions beyond the exponential family.
Maintained by Simon Wood. Last updated 1 year ago.
3.4 match 32 stars 12.71 score 17k scripts 7.8k dependents
qingzhaoyu
mma:Multiple Mediation Analysis
Used for general multiple mediation analysis. The analysis method is described in Yu and Li (2022) (ISBN: 9780367365479) "Statistical Methods for Mediation, Confounding and Moderation Analysis Using R and SAS", published by Chapman and Hall/CRC; and Yu et al. (2017) <DOI:10.1016/j.sste.2017.02.001> "Exploring racial disparity in obesity: a mediation analysis considering geo-coded environmental factors", published in Spatial and Spatio-temporal Epidemiology, 21, 13-23.
Maintained by Qingzhao Yu. Last updated 2 years ago.
10.7 match 1 stars 3.96 score 61 scripts 1 dependents
dmcartor
MDMR:Multivariate Distance Matrix Regression
This package allows users to conduct multivariate distance matrix regression using analytic p-values and compute measures of effect size. For details on the method, see McArtor, Lubke, & Bergeman (2017) <https://doi.org/10.1007/s11336-016-9527-8>.
Maintained by Daniel B. McArtor. Last updated 7 years ago.
6.9 match 6 stars 6.13 score 15 scripts 1 dependents
bioc
PDATK:Pancreatic Ductal Adenocarcinoma Tool-Kit
Pancreatic ductal adenocarcinoma (PDA) has a relatively poor prognosis and is one of the most lethal cancers. Molecular classification of gene expression profiles holds the potential to identify meaningful subtypes which can inform therapeutic strategy in the clinical setting. The Pancreatic Cancer Adenocarcinoma Tool-Kit (PDATK) provides an S4 class-based interface for performing unsupervised subtype discovery, cross-cohort meta-clustering, gene-expression-based classification, and subsequent survival analysis to identify prognostically useful subtypes in pancreatic cancer and beyond. Two novel methods, Consensus Subtypes in Pancreatic Cancer (CSPC) and Pancreatic Cancer Overall Survival Predictor (PCOSP) are included for consensus-based meta-clustering and overall-survival prediction, respectively. Additionally, four published subtype classifiers and three published prognostic gene signatures are included to allow users to easily recreate published results, apply existing classifiers to new data, and benchmark the relative performance of new methods. The use of existing Bioconductor classes as input to all PDATK classes and methods enables integration with existing Bioconductor datasets, including the 21 pancreatic cancer patient cohorts available in the MetaGxPancreas data package. PDATK has been used to replicate results from Sandhu et al (2019) [https://doi.org/10.1200/cci.18.00102] and an additional paper is in the works using CSPC to validate subtypes from the included published classifiers, both of which use the data available in MetaGxPancreas. The inclusion of subtype centroids and prognostic gene signatures from these and other publications will enable researchers and clinicians to classify novel patient gene expression data, allowing the direct clinical application of the classifiers included in PDATK. 
Overall, PDATK provides a rich set of tools to identify and validate useful prognostic and molecular subtypes based on gene-expression data, benchmark new classifiers against existing ones, and apply discovered classifiers on novel patient data to inform clinical decision making.
Maintained by Benjamin Haibe-Kains. Last updated 5 months ago.
geneexpressionpharmacogeneticspharmacogenomicssoftwareclassificationsurvivalclusteringgeneprediction
9.7 match 1 stars 4.31 score 17 scripts
florale
multilevelcoda:Estimate Bayesian Multilevel Models for Compositional Data
Implement Bayesian Multilevel Modelling for compositional data in a multilevel framework. Compute multilevel compositional data and Isometric log ratio (ILR) at between and within-person levels, fit Bayesian multilevel models for compositional predictors and outcomes, and run post-hoc analyses such as isotemporal substitution models. References: Le, Stanford, Dumuid, and Wiley (2024) <doi:10.48550/arXiv.2405.03985>, Le, Dumuid, Stanford, and Wiley (2024) <doi:10.48550/arXiv.2411.12407>.
Maintained by Flora Le. Last updated 2 days ago.
bayesian-inferencecompositional-data-analysismultilevel-modelsmultilevelcoda
4.9 match 14 stars 8.31 score 118 scripts
cardiomoon
ggiraphExtra:Make Interactive 'ggplot2'. Extension to 'ggplot2' and 'ggiraph'
Collection of functions to enhance 'ggplot2' and 'ggiraph'. Provides functions for exploratory plots. Each plot can be a 'static' plot or an 'interactive' plot using 'ggiraph'.
Maintained by Keon-Woong Moon. Last updated 4 years ago.
4.5 match 48 stars 8.93 score 402 scripts 3 dependents
thothorn
ipred:Improved Predictors
Improved predictive models by indirect classification and bagging for classification, regression and survival problems as well as resampling based estimators of prediction error.
Maintained by Torsten Hothorn. Last updated 8 months ago.
3.8 match 10.76 score 3.3k scripts 411 dependents
friendly
candisc:Visualizing Generalized Canonical Discriminant and Canonical Correlation Analysis
Functions for computing and visualizing generalized canonical discriminant analyses and canonical correlation analysis for a multivariate linear model. Traditional canonical discriminant analysis is restricted to a one-way 'MANOVA' design and is equivalent to canonical correlation analysis between a set of quantitative response variables and a set of dummy variables coded from the factor variable. The 'candisc' package generalizes this to higher-way 'MANOVA' designs for all factors in a multivariate linear model, computing canonical scores and vectors for each term. The graphic functions provide low-rank (1D, 2D, 3D) visualizations of terms in an 'mlm' via the 'plot.candisc' and 'heplot.candisc' methods. Related plots are now provided for canonical correlation analysis when all predictors are quantitative.
Maintained by Michael Friendly. Last updated 10 months ago.
dimension-reductionmultivariate-linear-modelsvisualization
4.5 match 15 stars 8.86 score 221 scripts 3 dependents
holgstr
fmeffects:Model-Agnostic Interpretations with Forward Marginal Effects
Create local, regional, and global explanations for any machine learning model with forward marginal effects. You provide a model and data, and 'fmeffects' computes feature effects. The package is based on the theory in: C. A. Scholbeck, G. Casalicchio, C. Molnar, B. Bischl, and C. Heumann (2022) <doi:10.48550/arXiv.2201.08837>.
Maintained by Holger Löwe. Last updated 4 months ago.
7.0 match 2 stars 5.73 score 6 scripts
jenniniku
gllvm:Generalized Linear Latent Variable Models
Analysis of multivariate data using generalized linear latent variable models (gllvm). Estimation is performed using either the Laplace method, variational approximations, or extended variational approximations, implemented via TMB (Kristensen et al. (2016), <doi:10.18637/jss.v070.i05>).
Maintained by Jenni Niku. Last updated 2 days ago.
3.8 match 51 stars 10.52 score 176 scripts 1 dependents
lifewatch
sdmpredictors:Species Distribution Modelling Predictor Datasets
Terrestrial and marine predictors for species distribution modelling from multiple sources, including WorldClim <https://www.worldclim.org/>, ENVIREM <https://envirem.github.io/>, Bio-ORACLE <https://bio-oracle.org/> and MARSPEC <http://www.marspec.org/>.
Maintained by Salvador Fernandez. Last updated 2 years ago.
bio-oraclelifewatchlifewatchvlizspecies-distribution-modelling
5.3 match 30 stars 7.47 score 218 scripts
vgherard
sbo:Text Prediction via Stupid Back-Off N-Gram Models
Utilities for training and evaluating text predictors based on Stupid Back-Off N-gram models (Brants et al., 2007, <https://www.aclweb.org/anthology/D07-1090/>).
Maintained by Valerio Gherardi. Last updated 4 years ago.
natural-language-processingngram-modelspredictive-textsbocpp
8.2 match 10 stars 4.78 score 12 scripts
friendly
vcdExtra:'vcd' Extensions and Additions
Provides additional data sets, methods and documentation to complement the 'vcd' package for Visualizing Categorical Data and the 'gnm' package for Generalized Nonlinear Models. In particular, 'vcdExtra' extends mosaic, assoc and sieve plots from 'vcd' to handle 'glm()' and 'gnm()' models and adds a 3D version in 'mosaic3d'. Additionally, methods are provided for comparing and visualizing lists of 'glm' and 'loglm' objects. This package is now a support package for the book, "Discrete Data Analysis with R" by Michael Friendly and David Meyer.
Maintained by Michael Friendly. Last updated 5 months ago.
categorical-data-visualizationgeneralized-linear-modelsmosaic-plots
3.8 match 24 stars 10.34 score 472 scripts 3 dependents
skranz
gtree:gtree basic functionality to model and solve games
gtree basic functionality to model and solve games
Maintained by Sebastian Kranz. Last updated 4 years ago.
economic-experimentseconomicsgambitgame-theorynash-equilibrium
10.2 match 18 stars 3.79 score 23 scripts 1 dependents
plangfelder
WGCNA:Weighted Correlation Network Analysis
Functions necessary to perform Weighted Correlation Network Analysis on high-dimensional data as originally described in Horvath and Zhang (2005) <doi:10.2202/1544-6115.1128> and Langfelder and Horvath (2008) <doi:10.1186/1471-2105-9-559>. Includes functions for rudimentary data cleaning, construction of correlation networks, module identification, summarization, and relating of variables and modules to sample traits. Also includes a number of utility functions for data manipulation and visualization.
Maintained by Peter Langfelder. Last updated 6 months ago.
4.0 match 54 stars 9.65 score 5.3k scripts 32 dependents
gamlss-dev
gamlss:Generalized Additive Models for Location Scale and Shape
Functions for fitting the Generalized Additive Models for Location Scale and Shape introduced by Rigby and Stasinopoulos (2005), <doi:10.1111/j.1467-9876.2005.00510.x>. The models use a distributional regression approach where all the parameters of the conditional distribution of the response variable are modelled using explanatory variables.
Maintained by Mikis Stasinopoulos. Last updated 4 months ago.
3.4 match 16 stars 11.23 score 2.0k scripts 49 dependents
theoreticalecology
sjSDM:Scalable Joint Species Distribution Modeling
A scalable and fast method for estimating joint Species Distribution Models (jSDMs) for big community data, including eDNA data. The package estimates a full (i.e. non-latent) jSDM with different response distributions (including the traditional multivariate probit model). The package can perform variation partitioning (VP) / ANOVA on the fitted models to separate the contribution of environmental, spatial, and biotic associations. In addition, the total R-squared can be further partitioned per species and site to reveal the internal metacommunity structure, see Leibold et al., <doi:10.1111/oik.08618>. The internal structure can then be regressed against environmental and spatial distinctiveness, richness, and traits to analyze metacommunity assembly processes. The package includes support for accounting for spatial autocorrelation and the option to fit responses using deep neural networks instead of a standard linear predictor. As described in Pichler & Hartig (2021) <doi:10.1111/2041-210X.13687>, scalability is achieved by using a Monte Carlo approximation of the joint likelihood implemented via 'PyTorch' and 'reticulate', which can be run on CPUs or GPUs.
Maintained by Maximilian Pichler. Last updated 23 days ago.
deep-learninggpu-accelerationmachine-learningspecies-distribution-modellingspecies-interactions
4.8 match 69 stars 7.64 score 70 scripts
tobiasschoch
robsurvey:Robust Survey Statistics Estimation
Robust (outlier-resistant) estimators of finite population characteristics such as means, totals, ratios, regression, etc. Available methods are M- and GM-estimators of regression, weight reduction, trimming, and winsorization. The package extends the 'survey' <https://CRAN.R-project.org/package=survey> package.
Maintained by Tobias Schoch. Last updated 3 months ago.
6.0 match 9 stars 6.16 score 5 scripts
abichat
evabic:Evaluation of Binary Classifiers
Evaluates the performance of binary classifiers. Computes confusion measures (TP, TN, FP, FN), derived measures (TPR, FDR, accuracy, F1, DOR, ...), and area under the curve. Outputs are well suited for nested dataframes.
Maintained by Antoine Bichat. Last updated 3 years ago.
classifiermeasurespredictorsroc-curvestatistics
10.0 match 6 stars 3.62 score 14 scripts
rvalavi
blockCV:Spatial and Environmental Blocking for K-Fold and LOO Cross-Validation
Creating spatially or environmentally separated folds for cross-validation to provide a robust error estimation in spatially structured environments; investigating and visualising the effective range of spatial autocorrelation in continuous raster covariates and point samples to find an initial realistic distance band to separate training and testing datasets spatially, as described in Valavi, R. et al. (2019) <doi:10.1111/2041-210X.13107>.
Maintained by Roozbeh Valavi. Last updated 5 months ago.
cross-validationspatialspatial-cross-validationspatial-modellingspecies-distribution-modellingcpp
3.4 match 113 stars 10.49 score 302 scripts 3 dependents
jackmwolf
pcsstools:Tools for Regression Using Pre-Computed Summary Statistics
Defines functions to describe regression models using only pre-computed summary statistics (i.e. means, variances, and covariances) in place of individual participant data. Possible models include linear models for linear combinations, products, and logical combinations of phenotypes. Implements methods presented in Wolf et al. (2021) <doi:10.3389/fgene.2021.745901>, Wolf et al. (2020) <doi:10.1142/9789811215636_0063>, and Gasdaska et al. (2019) <doi:10.1142/9789813279827_0036>.
Maintained by Jack Wolf. Last updated 9 months ago.
10.5 match 5 stars 3.40 score 5 scripts
forestry-labs
distillML:Model Distillation and Interpretability Methods for Machine Learning Models
Provides several methods for model distillation and interpretability for general black box machine learning models and treatment effect estimation methods. For details on the algorithms implemented, see <https://forestry-labs.github.io/distillML/index.html> Brian Cho, Theo F. Saarinen, Jasjeet S. Sekhon, Simon Walter.
Maintained by Theo Saarinen. Last updated 2 years ago.
bartdistillation-modelexplainable-machine-learningexplainable-mlinterpretabilityinterpretable-machine-learningmachine-learningmodelrandom-forestxgboost
9.1 match 7 stars 3.92 score 12 scripts
jfiksel
codalm:Transformation-Free Linear Regression for Compositional Outcomes and Predictors
Implements the expectation-maximization (EM) algorithm as described in Fiksel et al. (2020) <arXiv:2004.07881> for transformation-free linear regression for compositional outcomes and predictors.
Maintained by Jacob Fiksel. Last updated 4 years ago.
8.5 match 3 stars 4.18 score 5 scripts
lme4
lme4:Linear Mixed-Effects Models using 'Eigen' and S4
Fit linear and generalized linear mixed-effects models. The models and their components are represented using S4 classes and methods. The core computational algorithms are implemented using the 'Eigen' C++ library for numerical linear algebra and 'RcppEigen' "glue".
Maintained by Ben Bolker. Last updated 2 days ago.
1.7 match 647 stars 20.69 score 35k scripts 1.5k dependents
sibipx
missForestPredict:Missing Value Imputation using Random Forest for Prediction Settings
Missing data imputation based on the 'missForest' algorithm (Stekhoven, Daniel J (2012) <doi:10.1093/bioinformatics/btr597>) with adaptations for prediction settings. The function missForest() is used to impute a (training) dataset with missing values and to learn imputation models that can be later used for imputing new observations. The function missForestPredict() is used to impute one or multiple new observations (test set) using the models learned on the training data.
Maintained by Elena Albu. Last updated 1 year ago.
8.5 match 4.00 score 3 scripts
bioc
cancerclass:Development and validation of diagnostic tests from high-dimensional molecular data
The classification protocol starts with a feature selection step and continues with nearest-centroid classification. The accuracy of the predictor can be evaluated using training and test set validation, leave-one-out cross-validation or in a multiple random validation protocol. Methods for calculation and visualization of continuous prediction scores make it possible to balance sensitivity and specificity and define a cutoff value according to clinical requirements.
Maintained by Daniel Kosztyla. Last updated 5 months ago.
cancermicroarrayclassificationvisualization
10.2 match 3.30 score 10 scripts
solivella
NetMix:Dynamic Mixed-Membership Network Regression Model
Stochastic collapsed variational inference on mixed-membership stochastic blockmodel for networks, incorporating node-level predictors of mixed-membership vectors, as well as dyad-level predictors. For networks observed over time, the model defines a hidden Markov process that allows the effects of node-level predictors to evolve in discrete, historical periods. In addition, the package offers a variety of utilities for exploring results of estimation, including tools for conducting posterior predictive checks of goodness-of-fit and several plotting functions. The package implements methods described in Olivella, Pratt and Imai (2019) 'Dynamic Stochastic Blockmodel Regression for Social Networks: Application to International Conflicts', available at <https://www.santiagoolivella.info/pdfs/socnet.pdf>.
Maintained by Santiago Olivella. Last updated 1 year ago.
7.8 match 11 stars 4.30 score 36 scripts
tidyverse
modelr:Modelling Functions that Work with the Pipe
Functions for modelling that help you seamlessly integrate modelling into a pipeline of data manipulation and visualisation.
Maintained by Hadley Wickham. Last updated 1 year ago.
2.0 match 401 stars 16.44 score 6.9k scripts 1.0k dependents
anthonydevaux
DynForest:Random Forest with Multivariate Longitudinal Predictors
Based on the random forest principle, 'DynForest' is able to include multiple longitudinal predictors to provide individual predictions. Longitudinal predictors are modeled through the random forest. The methodology is fully described for a survival outcome in: Devaux, Helmer, Genuer & Proust-Lima (2023) <doi:10.1177/09622802231206477>.
Maintained by Anthony Devaux. Last updated 5 months ago.
5.0 match 16 stars 6.38 score 8 scripts
munchfab
mlts:Multilevel Latent Time Series Models with 'R' and 'Stan'
Fit multilevel manifest or latent time-series models, including popular Dynamic Structural Equation Models (DSEM). The models can be set up and modified with user-friendly functions and are fit to the data using 'Stan' for Bayesian inference. Path models and formulas for user-defined models can be easily created with functions using 'knitr'. Asparouhov, Hamaker, & Muthen (2018) <doi:10.1080/10705511.2017.1406803>.
Maintained by Kenneth Koslowski. Last updated 9 months ago.
5.5 match 2 stars 5.68 score 9 scripts
hturner
BradleyTerry2:Bradley-Terry Models
Specify and fit the Bradley-Terry model, including structured versions in which the parameters are related to explanatory variables through a linear predictor and versions with contest-specific effects, such as a home advantage.
Maintained by Heather Turner. Last updated 6 years ago.
bradley-terry-modelspaired-comparisonsstatistical-models
3.9 match 20 stars 7.97 score 172 scripts 1 dependents
amices
mice:Multivariate Imputation by Chained Equations
Multiple imputation using Fully Conditional Specification (FCS) implemented by the MICE algorithm as described in Van Buuren and Groothuis-Oudshoorn (2011) <doi:10.18637/jss.v045.i03>. Each variable has its own imputation model. Built-in imputation models are provided for continuous data (predictive mean matching, normal), binary data (logistic regression), unordered categorical data (polytomous logistic regression) and ordered categorical data (proportional odds). MICE can also impute continuous two-level data (normal model, pan, second-level variables). Passive imputation can be used to maintain consistency between variables. Various diagnostic plots are available to inspect the quality of the imputations.
Maintained by Stef van Buuren. Last updated 6 days ago.
chained-equationsfcsimputationmicemissing-datamissing-valuesmultiple-imputationmultivariate-datacpp
1.9 match 462 stars 16.50 score 10k scripts 154 dependents
gi0na
ghypernet:Fit and Simulate Generalised Hypergeometric Ensembles of Graphs
Provides functions for model fitting and selection of generalised hypergeometric ensembles of random graphs (gHypEG). To learn how to use it, check the vignettes for a quick tutorial. Please reference its use as Casiraghi, G., Nanumyan, V. (2019) <doi:10.5281/zenodo.2555300> together with the relevant references from those listed below. The package is based on the research developed at the Chair of Systems Design, ETH Zurich. Casiraghi, G., Nanumyan, V., Scholtes, I., Schweitzer, F. (2016) <arXiv:1607.02441>. Casiraghi, G., Nanumyan, V., Scholtes, I., Schweitzer, F. (2017) <doi:10.1007/978-3-319-67256-4_11>. Casiraghi, G. (2017) <arXiv:1702.02048>. Brandenberger, L., Casiraghi, G., Nanumyan, V., Schweitzer, F. (2019) <doi:10.1145/3341161.3342926>. Casiraghi, G. (2019) <doi:10.1007/s41109-019-0241-1>. Casiraghi, G., Nanumyan, V. (2021) <doi:10.1038/s41598-021-92519-y>. Casiraghi, G. (2021) <doi:10.1088/2632-072X/ac0493>.
Maintained by Giona Casiraghi. Last updated 11 months ago.
data-miningdata-sciencegraphsnetworknetwork-analysisrandom-graph-generationrandom-graphs
5.4 match 8 stars 5.68 score 20 scripts
uscbiostats
xrnet:Hierarchical Regularized Regression
Fits hierarchical regularized regression models to incorporate potentially informative external data, Weaver and Lewinger (2019) <doi:10.21105/joss.01761>. Utilizes coordinate descent to efficiently fit regularized regression models both with and without external information with the most common penalties used in practice (i.e. ridge, lasso, elastic net). Support for standard R matrices, sparse matrices and big.matrix objects.
Maintained by Garrett Weaver. Last updated 8 months ago.
6.9 match 10 stars 4.48 score 10 scripts
datalorax
equatiomatic:Transform Models into 'LaTeX' Equations
The goal of 'equatiomatic' is to reduce the pain associated with writing 'LaTeX' formulas from fitted models. The primary function of the package, extract_eq(), takes a fitted model object as its input and returns the corresponding 'LaTeX' code for the model.
Maintained by Philippe Grosjean. Last updated 7 days ago.
2.6 match 619 stars 11.75 score 424 scripts 5 dependents
tidymodels
rsample:General Resampling Infrastructure
Classes and functions to create and summarize different types of resampling objects (e.g. bootstrap, cross-validation).
Maintained by Hannah Frick. Last updated 5 days ago.
1.8 match 341 stars 16.72 score 5.2k scripts 79 dependents
jinli22
spm:Spatial Predictive Modeling
Introduction to some novel accurate hybrid methods of geostatistical and machine learning methods for spatial predictive modelling. It contains two commonly used geostatistical methods, two machine learning methods, four hybrid methods and two averaging methods. For each method, two functions are provided. One function is for assessing the predictive errors and accuracy of the method based on cross-validation. The other one is for generating spatial predictions using the method. For details please see: Li, J., Potter, A., Huang, Z., Daniell, J. J. and Heap, A. (2010) <https://www.ga.gov.au/metadata-gateway/metadata/record/gcat_71407>; Li, J., Heap, A. D., Potter, A., Huang, Z. and Daniell, J. (2011) <doi:10.1016/j.csr.2011.05.015>; Li, J., Heap, A. D., Potter, A. and Daniell, J. (2011) <doi:10.1016/j.envsoft.2011.07.004>; Li, J., Potter, A., Huang, Z. and Heap, A. (2012) <https://www.ga.gov.au/metadata-gateway/metadata/record/74030>.
Maintained by Jin Li. Last updated 3 years ago.
5.5 match 3 stars 5.46 score 107 scripts 3 dependents
bioc
SemDist:Information Accretion-based Function Predictor Evaluation
This package implements methods to calculate information accretion for a given version of the gene ontology and uses this data to calculate remaining uncertainty, misinformation, and semantic similarity for given sets of predicted annotations and true annotations from a protein function predictor.
Maintained by Ian Gonzalez. Last updated 5 months ago.
classification annotation go software
6.9 match 1 star 4.30 score 3 scripts
tidymodels
hardhat:Construct Modeling Packages
Building modeling packages is hard. A large amount of effort generally goes into providing an implementation for a new method that is efficient, fast, and correct, but often less emphasis is put on the user interface. A good interface requires specialized knowledge about S3 methods and formulas, which the average package developer might not have. The goal of 'hardhat' is to reduce the burden around building new modeling packages by providing functionality for preprocessing, predicting, and validating input.
Maintained by Hannah Frick. Last updated 1 month ago.
2.0 match 103 stars 14.88 score 175 scripts 436 dependents
diystat
NBPSeq:Negative Binomial Models for RNA-Sequencing Data
Negative Binomial (NB) models for two-group comparisons and regression inferences from RNA-Sequencing Data.
Maintained by Yanming Di. Last updated 11 years ago.
6.0 match 1 star 4.88 score 17 scripts 3 dependents
easystats
performance:Assessment of Regression Models Performance
Utilities for computing measures to assess model quality, which are not directly provided by R's 'base' or 'stats' packages. These include e.g. measures like r-squared, intraclass correlation coefficient (Nakagawa, Johnson & Schielzeth (2017) <doi:10.1098/rsif.2017.0213>), root mean squared error or functions to check models for overdispersion, singularity or zero-inflation and more. Functions apply to a large variety of regression models, including generalized linear models, mixed effects models and Bayesian models. References: Lüdecke et al. (2021) <doi:10.21105/joss.03139>.
Maintained by Daniel Lüdecke. Last updated 18 days ago.
aic easystats hacktoberfest loo machine-learning mixed-models models performance r2 statistics
1.8 match 1.1k stars 16.17 score 4.3k scripts 47 dependents
rvlenth
rsm:Response-Surface Analysis
Provides functions to generate response-surface designs, fit first- and second-order response-surface models, make surface plots, obtain the path of steepest ascent, and do canonical analysis. A good reference on these methods is Chapter 10 of Wu, C-F J and Hamada, M (2009) "Experiments: Planning, Analysis, and Parameter Design Optimization" ISBN 978-0-471-69946-0. An early version of the package is documented in Journal of Statistical Software <doi:10.18637/jss.v032.i07>.
Maintained by Russell Lenth. Last updated 9 months ago.
2.8 match 18 stars 10.16 score 192 scripts 8 dependents
john-d-fox
Rcmdr:R Commander
A platform-independent basic-statistics GUI (graphical user interface) for R, based on the tcltk package.
Maintained by John Fox. Last updated 5 months ago.
3.0 match 4 stars 9.49 score 636 scripts 38 dependents
easystats
bayestestR:Understand and Describe Bayesian Models and Posterior Distributions
Provides utilities to describe posterior distributions and Bayesian models. It includes point-estimates such as Maximum A Posteriori (MAP), measures of dispersion (Highest Density Interval - HDI; Kruschke, 2015 <doi:10.1016/C2012-0-00477-2>) and indices used for null-hypothesis testing (such as ROPE percentage, pd and Bayes factors). References: Makowski et al. (2021) <doi:10.21105/joss.01541>.
Maintained by Dominique Makowski. Last updated 15 hours ago.
bayes-factors bayesfactor bayesian bayesian-framework credible-interval easystats hacktoberfest hdi map posterior-distributions rope
1.7 match 579 stars 16.84 score 2.2k scripts 82 dependents
business-science
timetk:A Tool Kit for Working with Time Series
Easy visualization, wrangling, and feature engineering of time series data for forecasting and machine learning prediction. Consolidates and extends time series functionality from packages including 'dplyr', 'stats', 'xts', 'forecast', 'slider', 'padr', 'recipes', and 'rsample'.
Maintained by Matt Dancho. Last updated 1 year ago.
coercion coercion-functions data-mining dplyr forecast forecasting forecasting-models machine-learning series-decomposition series-signature tibble tidy tidyquant tidyverse time time-series timeseries
2.0 match 625 stars 14.15 score 4.0k scripts 16 dependents
allengoebl
iopsych:Methods for Industrial/Organizational Psychology
Collection of functions for IO Psychologists.
Maintained by Allen Goebl. Last updated 7 years ago.
7.1 match 3 stars 4.00 score 66 scripts
mlverse
tabnet:Fit 'TabNet' Models for Classification and Regression
Implements the 'TabNet' model by Sercan O. Arik et al. (2019) <doi:10.48550/arXiv.1908.07442> with 'Coherent Hierarchical Multi-label Classification Networks' by Giunchiglia et al. <doi:10.48550/arXiv.2010.10151> and provides a consistent interface for fitting and creating predictions. It's also fully compatible with the 'tidymodels' ecosystem.
Maintained by Christophe Regouby. Last updated 6 months ago.
3.1 match 109 stars 9.00 score 65 scripts
cran
PLORN:Prediction with Less Overfitting and Robust to Noise
A method for quantitative prediction with many predictors. This package provides functions to construct a quantitative prediction model that is less prone to overfitting and robust to noise.
Maintained by Takahiko Koizumi. Last updated 3 years ago.
10.2 match 2.70 score 4 scripts
r-forge
car:Companion to Applied Regression
Functions to Accompany J. Fox and S. Weisberg, An R Companion to Applied Regression, Third Edition, Sage, 2019.
Maintained by John Fox. Last updated 5 months ago.
1.8 match 15.29 score 43k scripts 901 dependents
f-rousset
spaMM:Mixed-Effect Models, with or without Spatial Random Effects
Inference based on models with or without spatially-correlated random effects, multivariate responses, or non-Gaussian random effects (e.g., Beta). Variation in residual variance (heteroscedasticity) can itself be represented by a mixed-effect model. Both classical geostatistical models (Rousset and Ferdy 2014 <doi:10.1111/ecog.00566>), and Markov random field models on irregular grids (as considered in the 'INLA' package, <https://www.r-inla.org>), can be fitted, with distinct computational procedures exploiting the sparse matrix representations for the latter case and other autoregressive models. Laplace approximations are used for likelihood or restricted likelihood. Penalized quasi-likelihood and other variants discussed in the h-likelihood literature (Lee and Nelder 2001 <doi:10.1093/biomet/88.4.987>) are also implemented.
Maintained by François Rousset. Last updated 9 months ago.
5.6 match 4.94 score 208 scripts 5 dependents
edwinkipruto
mfp2:Multivariable Fractional Polynomial Models with Extensions
Multivariable fractional polynomial algorithm simultaneously selects variables and functional forms in both generalized linear models and Cox proportional hazard models. Key references are Royston and Altman (1994) <doi:10.2307/2986270> and Royston and Sauerbrei (2008, ISBN:978-0-470-02842-1). In addition, it can model a sigmoid relationship between variable x and an outcome variable y using the approximate cumulative distribution transformation proposed by Royston (2014) <doi:10.1177/1536867X1401400206>. This feature distinguishes it from a standard fractional polynomial function, which lacks the ability to achieve such modeling.
Maintained by Edwin Kipruto. Last updated 10 months ago.
5.2 match 3 stars 5.26 score 4 scripts 2 dependents
strengejacke
ggeffects:Create Tidy Data Frames of Marginal Effects for 'ggplot' from Model Outputs
Computes marginal effects and adjusted predictions from statistical models and returns the results as tidy data frames. These data frames are ready to use with the 'ggplot2' package. Effects and predictions can be calculated for many different models. Interaction terms, splines and polynomial terms are also supported. The main functions are ggpredict(), ggemmeans() and ggeffect(). There is a generic plot() method to plot the results using 'ggplot2'.
Maintained by Daniel Lüdecke. Last updated 4 days ago.
estimated-marginal-means hacktoberfest marginal-effects prediction
1.8 match 588 stars 15.55 score 3.6k scripts 7 dependents
weiliang
powerMediation:Power/Sample Size Calculation for Mediation Analysis
Functions to calculate power and sample size for testing (1) mediation effects; (2) the slope in a simple linear regression; (3) odds ratio in a simple logistic regression; (4) mean change for longitudinal study with 2 time points; (5) interaction effect in 2-way ANOVA; and (6) the slope in a simple Poisson regression.
Maintained by Weiliang Qiu. Last updated 4 years ago.
6.8 match 3 stars 3.97 score 65 scripts 2 dependents
hojsgaard
doBy:Groupwise Statistics, LSmeans, Linear Estimates, Utilities
Utility package containing: 1) Facilities for working with grouped data: 'do' something to data stratified 'by' some variables. 2) LSmeans (least-squares means), general linear estimates. 3) Restrict functions to a smaller domain. 4) Miscellaneous other utilities.
Maintained by Søren Højsgaard. Last updated 4 days ago.
1.8 match 1 star 14.94 score 3.2k scripts 939 dependents
nliulab
AutoScore:An Interpretable Machine Learning-Based Automatic Clinical Score Generator
A novel interpretable machine learning-based framework to automate the development of a clinical scoring model for predefined outcomes. Our novel framework consists of six modules: variable ranking with machine learning, variable transformation, score derivation, model selection, domain knowledge-based score fine-tuning, and performance evaluation. The details are described in our research paper <doi:10.2196/21798>. Users or clinicians could seamlessly generate parsimonious sparse-score risk models (i.e., risk scores), which can be easily implemented and validated in clinical practice. We hope to see its application in various medical case studies.
Maintained by Feng Xie. Last updated 14 days ago.
3.5 match 32 stars 7.70 score 30 scripts
tidy-finance
tidyfinance:Tidy Finance Helper Functions
Helper functions for empirical research in financial economics, addressing a variety of topics covered in Scheuch, Voigt, and Weiss (2023) <doi:10.1201/b23237>. The package is designed to provide shortcuts for issues extensively discussed in the book, facilitating easier application of its concepts. For more information and resources related to the book, visit <https://www.tidy-finance.org/r/index.html>.
Maintained by Christoph Scheuch. Last updated 3 months ago.
3.5 match 15 stars 7.56 score 24 scripts
ahoshiyar
ordPens:Selection, Fusion, Smoothing and Principal Components Analysis for Ordinal Variables
Selection, fusion, and/or smoothing of ordinally scaled independent variables using a group lasso, fused lasso or generalized ridge penalty, as well as non-linear principal components analysis for ordinal variables using a second-order difference/smoothing penalty.
Maintained by Aisouda Hoshiyar. Last updated 10 months ago.
7.0 match 2 stars 3.79 score 31 scripts
statnet
ergm:Fit, Simulate and Diagnose Exponential-Family Models for Networks
An integrated set of tools to analyze and simulate networks based on exponential-family random graph models (ERGMs). 'ergm' is a part of the Statnet suite of packages for network analysis. See Hunter, Handcock, Butts, Goodreau, and Morris (2008) <doi:10.18637/jss.v024.i03> and Krivitsky, Hunter, Morris, and Klumb (2023) <doi:10.18637/jss.v105.i06>.
Maintained by Pavel N. Krivitsky. Last updated 6 days ago.
1.7 match 100 stars 15.36 score 1.4k scripts 36 dependents
dallenmidd
IxPopDyMod:Framework for Tick Population and Infection Modeling
Code to specify, run, and then visualize and analyze the results of Ixodidae (hard-bodied ticks) population and infection dynamics models. Such models exist in the literature, but the source code to run them is not always available. 'IxPopDyMod' provides an easy way for these models to be written and shared.
Maintained by Myles Stokowski. Last updated 4 months ago.
8.8 match 2 stars 3.00 score 6 scripts
maressyl
LPS:Linear Predictor Score, for Binary Inference from Multiple Continuous Variables
An implementation of the Linear Predictor Score approach, as initiated by Radmacher et al. (J Comput Biol 2001) and enhanced by Wright et al. (PNAS 2003) for gene expression signatures. Several tools for unsupervised clustering of gene expression data are also provided.
Maintained by Sylvain Mareschal. Last updated 4 years ago.
7.0 match 1 star 3.74 score 11 scripts
evolecolgroup
tidysdm:Species Distribution Models with Tidymodels
Fit species distribution models (SDMs) using the 'tidymodels' framework, which provides a standardised interface to define models and process their outputs. 'tidysdm' expands 'tidymodels' by providing methods for spatial objects, models and metrics specific to SDMs, as well as a number of specialised functions to process occurrences for contemporary and palaeo datasets. The full functionalities of the package are described in Leonardi et al. (2023) <doi:10.1101/2023.07.24.550358>.
Maintained by Andrea Manica. Last updated 9 days ago.
species-distribution-modelling tidymodels
3.0 match 31 stars 8.82 score 51 scripts
rvlenth
emmeans:Estimated Marginal Means, aka Least-Squares Means
Obtain estimated marginal means (EMMs) for many linear, generalized linear, and mixed models. Compute contrasts or linear functions of EMMs, trends, and comparisons of slopes. Plots and other displays. Least-squares means are discussed, and the term "estimated marginal means" is suggested, in Searle, Speed, and Milliken (1980) Population marginal means in the linear model: An alternative to least squares means, The American Statistician 34(4), 216-221 <doi:10.1080/00031305.1980.10483031>.
Maintained by Russell V. Lenth. Last updated 3 days ago.
1.3 match 377 stars 19.19 score 13k scripts 187 dependents
guido-s
meta:General Package for Meta-Analysis
User-friendly general package providing standard methods for meta-analysis and supporting Schwarzer, Carpenter, and Rücker <DOI:10.1007/978-3-319-21416-0>, "Meta-Analysis with R" (2015): - common effect and random effects meta-analysis; - several plots (forest, funnel, Galbraith / radial, L'Abbe, Baujat, bubble); - three-level meta-analysis model; - generalised linear mixed model; - logistic regression with penalised likelihood for rare events; - Hartung-Knapp method for random effects model; - Kenward-Roger method for random effects model; - prediction interval; - statistical tests for funnel plot asymmetry; - trim-and-fill method to evaluate bias in meta-analysis; - meta-regression; - cumulative meta-analysis and leave-one-out meta-analysis; - import data from 'RevMan 5'; - produce forest plot summarising several (subgroup) meta-analyses.
Maintained by Guido Schwarzer. Last updated 25 days ago.
1.7 match 84 stars 14.84 score 2.3k scripts 29 dependents
r-forge
robustbase:Basic Robust Statistics
"Essential" Robust Statistics. Tools for analyzing data with robust methods. This includes regression methodology, including model selection, and multivariate statistics, where we strive to cover the book "Robust Statistics, Theory and Methods" by 'Maronna, Martin and Yohai'; Wiley 2006.
Maintained by Martin Maechler. Last updated 4 months ago.
1.9 match 13.33 score 1.7k scripts 480 dependents
amices
ggmice:Visualizations for 'mice' with 'ggplot2'
Enhance a 'mice' imputation workflow with visualizations for incomplete and/or imputed data. The plotting functions produce 'ggplot' objects which may be easily manipulated or extended. Use 'ggmice' to inspect missing data, develop imputation models, evaluate algorithmic convergence, or compare observed versus imputed data.
Maintained by Hanne Oberman. Last updated 8 months ago.
3.3 match 32 stars 7.42 score 165 scripts
bioc
genefu:Computation of Gene Expression-Based Signatures in Breast Cancer
This package contains functions implementing various tasks usually required by gene expression analysis, especially in breast cancer studies: gene mapping between different microarray platforms, identification of molecular subtypes, implementation of published gene signatures, gene selection, and survival analysis.
Maintained by Benjamin Haibe-Kains. Last updated 4 months ago.
differentialexpression geneexpression visualization clustering classification
3.3 match 7.42 score 193 scripts 3 dependents
neural-structured-additive-learning
deeptrafo:Fitting Deep Conditional Transformation Models
Allows for the specification of deep conditional transformation models (DCTMs) and ordinal neural network transformation models, as described in Baumann et al (2021) <doi:10.1007/978-3-030-86523-8_1> and Kook et al (2022) <doi:10.1016/j.patcog.2021.108263>. Extensions such as autoregressive DCTMs (Ruegamer et al, 2023, <doi:10.1007/s11222-023-10212-8>) and transformation ensembles (Kook et al, 2022, <doi:10.48550/arXiv.2205.12729>) are implemented. The software package is described in Kook et al (2024, <doi:10.18637/jss.v111.i10>).
Maintained by Lucas Kook. Last updated 2 months ago.
5.4 match 5 stars 4.44 score 11 scripts
bioc
mixOmics:Omics Data Integration Project
Multivariate methods are well suited to large omics data sets where the number of variables (e.g. genes, proteins, metabolites) is much larger than the number of samples (patients, cells, mice). They have the appealing properties of reducing the dimension of the data by using instrumental variables (components), which are defined as combinations of all variables. Those components are then used to produce useful graphical outputs that enable better understanding of the relationships and correlation structures between the different data sets that are integrated. mixOmics offers a wide range of multivariate methods for the exploration and integration of biological datasets with a particular focus on variable selection. The package proposes several sparse multivariate models we have developed to identify the key variables that are highly correlated, and/or explain the biological outcome of interest. The data that can be analysed with mixOmics may come from high throughput sequencing technologies, such as omics data (transcriptomics, metabolomics, proteomics, metagenomics etc) but also beyond the realm of omics (e.g. spectral imaging). The methods implemented in mixOmics can also handle missing values without having to delete entire rows with missing data. A non-exhaustive list of methods includes variants of generalised Canonical Correlation Analysis, sparse Partial Least Squares and sparse Discriminant Analysis. Recently we implemented integrative methods to combine multiple data sets: N-integration with variants of Generalised Canonical Correlation Analysis and P-integration with variants of multi-group Partial Least Squares.
Maintained by Eva Hamrud. Last updated 3 days ago.
immunooncology microarray sequencing metabolomics metagenomics proteomics geneprediction multiplecomparison classification regression bioconductor genomics genomics-data genomics-visualization multivariate-analysis multivariate-statistics omics r-pkg r-project
1.8 match 182 stars 13.71 score 1.3k scripts 22 dependents
hannameyer
CAST:'caret' Applications for Spatial-Temporal Models
Supporting functionality to run 'caret' with spatial or spatial-temporal data. 'caret' is a frequently used package for model training and prediction using machine learning. CAST includes functions to improve spatial or spatial-temporal modelling tasks using 'caret'. It includes the newly suggested 'Nearest neighbor distance matching' cross-validation to estimate the performance of spatial prediction models and allows for spatial variable selection to select suitable predictor variables in view of their contribution to the spatial model performance. CAST further includes functionality to estimate the (spatial) area of applicability of prediction models. Methods are described in Meyer et al. (2018) <doi:10.1016/j.envsoft.2017.12.001>; Meyer et al. (2019) <doi:10.1016/j.ecolmodel.2019.108815>; Meyer and Pebesma (2021) <doi:10.1111/2041-210X.13650>; Milà et al. (2022) <doi:10.1111/2041-210X.13851>; Meyer and Pebesma (2022) <doi:10.1038/s41467-022-29838-9>; Linnenbrink et al. (2023) <doi:10.5194/egusphere-2023-1308>; Schumacher et al. (2024) <doi:10.5194/egusphere-2024-2730>. The package is described in detail in Meyer et al. (2024) <doi:10.48550/arXiv.2404.06978>.
Maintained by Hanna Meyer. Last updated 2 months ago.
autocorrelation caret feature-selection machine-learning overfitting predictive-modeling spatial spatio-temporal variable-selection
2.0 match 114 stars 11.97 score 298 scripts 1 dependent
jasinmachkour
TRexSelector:T-Rex Selector: High-Dimensional Variable Selection & FDR Control
Performs fast variable selection in high-dimensional settings while controlling the false discovery rate (FDR) at a user-defined target level. The package is based on the paper Machkour, Muma, and Palomar (2022) <arXiv:2110.06048>.
Maintained by Jasin Machkour. Last updated 1 year ago.
5.4 match 5 stars 4.40 score 5 scripts
rspatial
dismo:Species Distribution Modeling
Methods for species distribution modeling, that is, predicting the environmental similarity of any site to that of the locations of known occurrences of a species.
Maintained by Robert J. Hijmans. Last updated 4 months ago.
2.0 match 26 stars 11.88 score 2.8k scripts 21 dependents
kkholst
targeted:Targeted Inference
Various methods for targeted and semiparametric inference including augmented inverse probability weighted (AIPW) estimators for missing data and causal inference (Bang and Robins (2005) <doi:10.1111/j.1541-0420.2005.00377.x>), variable importance and conditional average treatment effects (CATE) (van der Laan (2006) <doi:10.2202/1557-4679.1008>), estimators for risk differences and relative risks (Richardson et al. (2017) <doi:10.1080/01621459.2016.1192546>), assumption lean inference for generalized linear model parameters (Vansteelandt et al. (2022) <doi:10.1111/rssb.12504>).
Maintained by Klaus K. Holst. Last updated 1 month ago.
causal-inference double-robust estimation semiparametric-estimation statistics openblas cpp openmp
3.3 match 11 stars 7.20 score 30 scripts 1 dependent
tagteam
riskRegression:Risk Regression Models and Prediction Scores for Survival Analysis with Competing Risks
Implementation of the following methods for event history analysis. Risk regression models for survival endpoints also in the presence of competing risks are fitted using binomial regression based on a time sequence of binary event status variables. A formula interface for the Fine-Gray regression model and an interface for the combination of cause-specific Cox regression models. A toolbox for assessing and comparing performance of risk predictions (risk markers and risk prediction models). Prediction performance is measured by the Brier score and the area under the ROC curve for binary possibly time-dependent outcome. Inverse probability of censoring weighting and pseudo values are used to deal with right censored data. Lists of risk markers and lists of risk models are assessed simultaneously. Cross-validation repeatedly splits the data, trains the risk prediction models on one part of each split and then summarizes and compares the performance across splits.
Maintained by Thomas Alexander Gerds. Last updated 17 days ago.
1.8 match 46 stars 13.00 score 736 scripts 35 dependents
r-forge
modEvA:Model Evaluation and Analysis
Analyses species distribution models and evaluates their performance. It includes functions for variation partitioning, extracting variable importance, computing several metrics of model discrimination and calibration performance, optimizing prediction thresholds based on a number of criteria, performing multivariate environmental similarity surface (MESS) analysis, and displaying various analytical plots. Initially described in Barbosa et al. (2013) <doi:10.1111/ddi.12100>.
Maintained by A. Marcia Barbosa. Last updated 10 days ago.
3.4 match 6.82 score 269 scripts 3 dependents
mastoffel
partR2:Partitioning R2 in GLMMs
Partitioning the R2 of GLMMs into variation explained by each predictor and combination of predictors using semi-partial (part) R2 and inclusive R2. Methods are based on the R2 for GLMMs described in Nakagawa & Schielzeth (2013) and Nakagawa, Johnson & Schielzeth (2017).
Maintained by Martin A. Stoffel. Last updated 6 months ago.
3.5 match 22 stars 6.68 score 73 scripts
nimble-dev
nimbleMacros:Macros Generating 'nimble' Code
Macros to generate 'nimble' code from a concise syntax. Included are macros for generating linear modeling code using a formula-based syntax and for building for() loops. For more details review the 'nimble' manual: <https://r-nimble.org/html_manual/cha-writing-models.html#subsec:macros>.
Maintained by Ken Kellner. Last updated 4 days ago.
4.7 match 4.98 score
goodekat
ggResidpanel:Panels and Interactive Versions of Diagnostic Plots using 'ggplot2'
An R package for creating diagnostic plots for models. The package allows for the creation of panels of plots and interactive plots.
Maintained by Katherine Goode. Last updated 2 months ago.
2.9 match 37 stars 7.68 score 262 scripts
jamiemkass
ENMeval:Automated Tuning and Evaluations of Ecological Niche Models
Runs ecological niche models over all combinations of user-defined settings (i.e., tuning), performs cross validation to evaluate models, and returns data tables to aid in selection of optimal model settings that balance goodness-of-fit and model complexity. Also has functions to partition data spatially (or not) for cross validation, to plot multiple visualizations of results, to run null models to estimate significance and effect sizes of performance metrics, and to calculate range overlap between model predictions, among others. The package was originally built for Maxent models (Phillips et al. 2006, Phillips et al. 2017), but the current version allows possible extensions for any modeling algorithm. The extensive vignette, which guides users through most package functionality but unfortunately has a file size too big for CRAN, can be found here on the package's Github Pages website: <https://jamiemkass.github.io/ENMeval/articles/ENMeval-2.0-vignette.html>.
Maintained by Jamie M. Kass. Last updated 2 months ago.
2.0 match 49 stars 11.25 score 332 scripts 2 dependents
stan-dev
rstantools:Tools for Developing R Packages Interfacing with 'Stan'
Provides various tools for developers of R packages interfacing with 'Stan' <https://mc-stan.org>, including functions to set up the required package structure, S3 generics and default methods to unify function naming across 'Stan'-based R packages, and vignettes with recommendations for developers.
Maintained by Jonah Gabry. Last updated 2 months ago.
bayesian-data-analysis bayesian-statistics developer-tools stan
1.7 match 50 stars 13.09 score 134 scripts 222 dependents
humaniverse
wildfires:Mapping Risk and Resilience to wildfires in the UK
Build a social vulnerability index using PCA and identify areas of high wildfire risk and high social vulnerability.
Maintained by Matteo Larrode. Last updated 7 months ago.
7.5 match 1 star 2.95 score
lcbc-uio
galamm:Generalized Additive Latent and Mixed Models
Estimates generalized additive latent and mixed models using maximum marginal likelihood, as defined in Sorensen et al. (2023) <doi:10.1007/s11336-023-09910-z>, which is an extension of Rabe-Hesketh and Skrondal (2004)'s unifying framework for multilevel latent variable modeling <doi:10.1007/BF02295939>. Efficient computation is done using sparse matrix methods, Laplace approximation, and automatic differentiation. The framework includes generalized multilevel models with heteroscedastic residuals, mixed response types, factor loadings, smoothing splines, crossed random effects, and combinations thereof. Syntax for model formulation is close to 'lme4' (Bates et al. (2015) <doi:10.18637/jss.v067.i01>) and 'PLmixed' (Rockwood and Jeon (2019) <doi:10.1080/00273171.2018.1516541>).
Maintained by Øystein Sørensen. Last updated 6 months ago.
generalized-additive-models hierarchical-models item-response-theory latent-variable-models structural-equation-models cpp
3.0 match 29 stars 7.33 score 41 scripts
bgreenwell
pdp:Partial Dependence Plots
A general framework for constructing partial dependence (i.e., marginal effect) plots from various types of machine learning models in R.
Maintained by Brandon M. Greenwell. Last updated 3 years ago.
black-box-model machine-learning partial-dependence-function partial-dependence-plot visualization
1.9 match 93 stars 11.72 score 1.1k scripts 8 dependents
cefet-rj-dal
daltoolbox:Leveraging Experiment Lines to Data Analytics
The natural increase in the complexity of current research experiments and data demands better tools to enhance productivity in Data Analytics. The package is a framework designed to address the modern challenges in data analytics workflows. The package is inspired by Experiment Line concepts. It aims to provide seamless support for users in developing their data mining workflows by offering a uniform data model and method API. It enables the integration of various data mining activities, including data preprocessing, classification, regression, clustering, and time series prediction. It also offers options for hyper-parameter tuning and supports integration with existing libraries and languages. Overall, the package provides researchers with a comprehensive set of functionalities for data science, promoting ease of use, extensibility, and integration with various tools and libraries. Information on Experiment Line is based on Ogasawara et al. (2009) <doi:10.1007/978-3-642-02279-1_20>.
Maintained by Eduardo Ogasawara. Last updated 1 month ago.
3.3 match 1 star 6.65 score 536 scripts 4 dependents
jacob-long
jtools:Analysis and Presentation of Social Scientific Data
This is a collection of tools for more efficiently understanding and sharing the results of (primarily) regression analyses. There are also a number of miscellaneous functions for statistical and programming purposes. Support for models produced by the survey and lme4 packages is a point of emphasis.
Maintained by Jacob A. Long. Last updated 6 months ago.
1.5 match 167 stars 14.48 score 4.0k scripts 14 dependents
biodiverse
unmarked:Models for Data from Unmarked Animals
Fits hierarchical models of animal abundance and occurrence to data collected using survey methods such as point counts, site occupancy sampling, distance sampling, removal sampling, and double observer sampling. Parameters governing the state and observation processes can be modeled as functions of covariates. References: Kellner et al. (2023) <doi:10.1111/2041-210X.14123>, Fiske and Chandler (2011) <doi:10.18637/jss.v043.i10>.
Maintained by Ken Kellner. Last updated 16 hours ago.
1.7 match 4 stars 13.03 score 652 scripts 12 dependents
microsoft
wpa:Tools for Analysing and Visualising Viva Insights Data
Opinionated functions that enable easier and faster analysis of Viva Insights data. There are three main types of functions in 'wpa': (1) Standard functions create a 'ggplot' visual or a summary table based on a specific Viva Insights metric; (2) Report Generation functions generate HTML reports on a specific analysis area, e.g. Collaboration; (3) Other miscellaneous functions cover more specific applications (e.g. Subject Line text mining) of Viva Insights data. This package adheres to 'tidyverse' principles and works well with the pipe syntax. 'wpa' is built with beginner-to-intermediate R users in mind, and is optimised for simplicity.
Maintained by Martin Chan. Last updated 4 months ago.
3.2 match 30 stars 6.69 score 39 scripts 1 dependent
fabian-s
spikeSlabGAM:Bayesian Variable Selection and Model Choice for Generalized Additive Mixed Models
Bayesian variable selection, model choice, and regularized estimation for (spatial) generalized additive mixed regression models via stochastic search variable selection with spike-and-slab priors.
Maintained by Fabian Scheipl. Last updated 5 months ago.
3.4 match 14 stars 6.28 score 15 scripts 1 dependent
easystats
modelbased:Estimation of Model-Based Predictions, Contrasts and Means
Implements a general interface for model-based estimations for a wide variety of models, used in the computation of marginal means, contrast analysis and predictions. For a list of supported models, see 'insight::supported_models()'.
Maintained by Dominique Makowski. Last updated 2 days ago.
contrast-analysis contrasts easystats estimate ggplot2 hacktoberfest marginal marginal-effects means predict
1.7 match 241 stars 12.35 score 315 scripts 4 dependents
bioc
ENmix:Quality control and analysis tools for Illumina DNA methylation BeadChip
Tools for quality control, analysis and visualization of Illumina DNA methylation array data.
Maintained by Zongli Xu. Last updated 2 days ago.
dnamethylation preprocessing qualitycontrol twochannel microarray onechannel methylationarray batcheffect normalization dataimport regression principalcomponent epigenetics multichannel differentialmethylation immunooncology
3.5 match 6.01 score 115 scripts
suyusung
arm:Data Analysis Using Regression and Multilevel/Hierarchical Models
Functions to accompany A. Gelman and J. Hill, Data Analysis Using Regression and Multilevel/Hierarchical Models, Cambridge University Press, 2007.
Maintained by Yu-Sung Su. Last updated 4 months ago.
1.7 match 25 stars 12.38 score 3.3k scripts 89 dependents
crsh
papaja:Prepare American Psychological Association Journal Articles with R Markdown
Tools to create dynamic, submission-ready manuscripts, which conform to American Psychological Association manuscript guidelines. We provide R Markdown document formats for manuscripts (PDF and Word) and revision letters (PDF). Helper functions facilitate reporting statistical analyses or create publication-ready tables and plots.
Maintained by Frederik Aust. Last updated 17 days ago.
apa apa-guidelines journal manuscript psychology reproducible-paper reproducible-research rmarkdown
1.8 match 662 stars 11.74 score 1.7k scripts 1 dependent
neuropsychology
psycho:Efficient and Publishing-Oriented Workflow for Psychological Science
The main goal of the psycho package is to provide tools for psychologists, neuropsychologists and neuroscientists that facilitate and speed up data analysis. It aims at supporting best practices and provides tools to format the output of statistical methods so that it can be pasted directly into a manuscript, ensuring statistical reporting standardization and conformity.
Maintained by Dominique Makowski. Last updated 4 years ago.
apa apa6 bayesian correlation format interpretation mixed-models neuroscience psycho psychology rstanarm statistics
1.9 match 149 stars 10.86 score 628 scripts 5 dependents
hjboonstra
mcmcsae:Markov Chain Monte Carlo Small Area Estimation
Fit multi-level models with possibly correlated random effects using Markov Chain Monte Carlo simulation. Such models allow smoothing over space and time and are useful in, for example, small area estimation.
Maintained by Harm Jan Boonstra. Last updated 3 months ago.
8.2 match 2.48 score 8 scripts
tdaverse
ripserr:Calculate Persistent Homology with Ripser-Based Engines
Ports the Ripser <https://arxiv.org/abs/1908.02518> and Cubical Ripser <https://arxiv.org/abs/2005.12692> persistent homology calculation engines from C++. Can be used as a rapid calculation tool in topological data analysis pipelines.
Maintained by Raoul Wadhwa. Last updated 1 day ago.
algebraic-topology cohomology cpp cubical-complex persistent-homology pixel point-cloud r-language r-programming rcpp rips-complex ripser simplicial-complex simplicial-homology topological-data-analysis topology vietoris-complex voxel cpp
3.4 match 7 stars 5.80 score 6 scripts
kopperud
slouch:Stochastic Linear Ornstein-Uhlenbeck Comparative Hypotheses
An implementation of a phylogenetic comparative method. It can fit univariate among-species Ornstein-Uhlenbeck models of phenotypic trait evolution, where the trait evolves towards a primary optimum. The optimum can be modelled as a single parameter, as multiple discrete regimes on the phylogenetic tree, and/or with continuous covariates. See also Hansen (1997) <doi:10.2307/2411186>, Butler & King (2004) <doi:10.1086/426002>, Hansen et al. (2008) <doi:10.1111/j.1558-5646.2008.00412.x>.
Maintained by Bjørn Tore Kopperud. Last updated 1 year ago.
3.9 match 2 stars 5.12 score 44 scripts 1 dependent
dyfanjones
sagemaker.mlcore:sagemaker machine learning core classes and methods
`sagemaker` machine learning core classes and methods.
Maintained by Dyfan Jones. Last updated 3 years ago.
amazon-sagemaker aws machine-learning sagemaker sdk
7.3 match 2.65 score 3 dependents
moderndive
moderndive:Tidyverse-Friendly Introductory Linear Regression
Datasets and wrapper functions for tidyverse-friendly introductory linear regression, used in "Statistical Inference via Data Science: A ModernDive into R and the Tidyverse" available at <https://moderndive.com/>.
Maintained by Albert Y. Kim. Last updated 3 months ago.
1.7 match 88 stars 11.35 score 1.8k scripts
glsnow
TeachingDemos:Demonstrations for Teaching and Learning
Demonstration functions that can be used in a classroom to demonstrate statistical concepts, or on your own to better understand the concepts or the programming.
Maintained by Greg Snow. Last updated 1 year ago.
2.7 match 7.18 score 760 scripts 13 dependents
hjunwoo
bbl:Boltzmann Bayes Learner
Supervised learning using Boltzmann Bayes model inference, which extends naive Bayes model to include interactions. Enables classification of data into multiple response groups based on a large number of discrete predictors that can take factor values of heterogeneous levels. Either pseudo-likelihood or mean field inference can be used with L2 regularization, cross-validation, and prediction on new data. <doi:10.18637/jss.v101.i05>.
Maintained by Jun Woo. Last updated 3 years ago.
7.1 match 2.70 score 3 scripts
cran
epiDisplay:Epidemiological Data Display Package
Package for data exploration and result presentation. Full 'epicalc' package with data management functions is available at '<https://medipe.psu.ac.th/epicalc/>'.
Maintained by Virasakdi Chongsuvivatwong. Last updated 3 years ago.
3.5 match 1 star 5.44 score 758 scripts 2 dependents
alanarnholt
PASWR:Probability and Statistics with R
Functions and data sets for the text Probability and Statistics with R.
Maintained by Alan T. Arnholt. Last updated 3 years ago.
4.0 match 2 stars 4.70 score 241 scripts
cropmodels
Recocrop:Estimating Environmental Suitability for Plants
The ecocrop model estimates environmental suitability for plants using a limiting factor approach for plant growth following Hackett (1991) <doi:10.1007/BF00045728>. The implementation in this package is fast and flexible: it allows for the use of any (environmental) predictor variable. Predictors can be either static (for example, soil pH) or dynamic (for example, monthly precipitation).
Maintained by Robert J. Hijmans. Last updated 3 years ago.
4.9 match 11 stars 3.82 score 12 scripts
giraultg
SpiceFP:Sparse Method to Identify Joint Effects of Functional Predictors
A set of functions implementing the iterative 'SpiceFP' approach. It transforms functional predictors into several candidate explanatory matrices (based on contingency tables), to which relative edge matrices with contiguity constraints are associated. Generalized Fused Lasso regressions are performed to identify the best candidate matrix, the best class intervals and the related coefficients at each iteration. The approach stops when the maximal number of iterations is reached or when the retained coefficients are all zero. Supplementary functions return the coefficients of any candidate matrix or the mean of the coefficients of many candidates.
Maintained by Girault Gnanguenon Guesse. Last updated 2 years ago.
5.1 match 3.70 score 1 script
ludvigolsen
cvms:Cross-Validation for Model Selection
Cross-validate one or multiple regression and classification models and get relevant evaluation metrics in a tidy format. Validate the best model on a test set and compare it to a baseline evaluation. Alternatively, evaluate predictions from an external model. Currently supports regression and classification (binary and multiclass). Described in chp. 5 of Jeyaraman, B. P., Olsen, L. R., & Wambugu M. (2019, ISBN: 9781838550134).
Maintained by Ludvig Renbo Olsen. Last updated 9 days ago.
1.8 match 39 stars 10.31 score 492 scripts 5 dependents
thie1e
cutpointr:Determine and Evaluate Optimal Cutpoints in Binary Classification Tasks
Estimate cutpoints that optimize a specified metric in binary classification tasks and validate performance using bootstrapping. Some methods for more robust cutpoint estimation are supported, e.g. a parametric method assuming normal distributions, bootstrapped cutpoints, and smoothing of the metric values per cutpoint using Generalized Additive Models. Various plotting functions are included. For an overview of the package see Thiele and Hirschfeld (2021) <doi:10.18637/jss.v098.i11>.
Maintained by Christian Thiele. Last updated 3 months ago.
bootstrapping cutpoint-optimization roc-curve cpp
1.8 match 88 stars 10.44 score 322 scripts 1 dependent
marjoleinf
pre:Prediction Rule Ensembles
Derives prediction rule ensembles (PREs). Largely follows the procedure for deriving PREs as described in Friedman & Popescu (2008; <DOI:10.1214/07-AOAS148>), with adjustments and improvements. The main function pre() derives prediction rule ensembles consisting of rules and/or linear terms for continuous, binary, count, multinomial, and multivariate continuous responses. Function gpe() derives generalized prediction ensembles, consisting of rules, hinge and linear functions of the predictor variables.
Maintained by Marjolein Fokkema. Last updated 9 months ago.
2.1 match 58 stars 8.49 score 98 scripts 1 dependent
dyfanjones
sagemaker.mlframework:sagemaker machine learning developed by amazon
`sagemaker` machine learning developed by amazon.
Maintained by Dyfan Jones. Last updated 3 years ago.
amazon-sagemaker aws machine-learning sagemaker sdk
7.3 match 2.48 score 2 dependents
sfcheung
semhelpinghands:Helper Functions for Structural Equation Modeling
An assortment of helper functions for doing structural equation modeling, mainly by 'lavaan' for now. Most of them are time-saving functions for common tasks in doing structural equation modeling and reading the output. This package is not for functions that implement advanced statistical procedures. It is a light-weight package for simple functions that do simple tasks conveniently, with as few dependencies as possible.
Maintained by Shu Fai Cheung. Last updated 4 months ago.
bootstrapping lavaan structural-equation-modeling
3.5 match 5.13 score 27 scripts
hugheylab
zeitzeiger:Regularized Supervised Learning for Data from Rhythmic Systems
Method for predicting the value of a periodic variable from a high-dimensional observation. See Hughey et al. (2016) <doi:10.1093/nar/gkw030> and Hughey (2017) <doi:10.1186/s13073-017-0406-4>.
Maintained by Jake Hughey. Last updated 2 years ago.
4.8 match 10 stars 3.70 score 1 script
sth1402
modelObj:A Model Object Framework for Regression Analysis
A utility library to facilitate the generalization of statistical methods built on a regression framework. Package developers can use 'modelObj' methods to initiate a regression analysis without concern for the details of the regression model and the method to be used to obtain parameter estimates. The specifics of the regression step are left to the user to define when calling the function. The user of a function developed within the 'modelObj' framework creates as input a 'modelObj' that contains the model and the R methods to be used to obtain parameter estimates and to obtain predictions. In this way, a user can easily go from linear to non-linear models within the same package.
Maintained by Shannon T. Holloway. Last updated 3 years ago.
5.3 match 3.32 score 23 scripts 3 dependents
friendly
heplots:Visualizing Hypothesis Tests in Multivariate Linear Models
Provides HE plot and other functions for visualizing hypothesis tests in multivariate linear models. HE plots represent sums-of-squares-and-products matrices for linear hypotheses and for error using ellipses (in two dimensions) and ellipsoids (in three dimensions). The related 'candisc' package provides visualizations in a reduced-rank canonical discriminant space when there are more than a few response variables.
Maintained by Michael Friendly. Last updated 8 days ago.
linear-hypotheses matrices multivariate-linear-models plot repeated-measure-designs visualizing-hypothesis-tests
1.5 match 9 stars 11.49 score 1.1k scripts 7 dependents
florianhartig
DHARMa:Residual Diagnostics for Hierarchical (Multi-Level / Mixed) Regression Models
The 'DHARMa' package uses a simulation-based approach to create readily interpretable scaled (quantile) residuals for fitted (generalized) linear mixed models. Currently supported are linear and generalized linear (mixed) models from 'lme4' (classes 'lmerMod', 'glmerMod'), 'glmmTMB', 'GLMMadaptive', and 'spaMM'; phylogenetic linear models from 'phylolm' (classes 'phylolm' and 'phyloglm'); generalized additive models ('gam' from 'mgcv'); 'glm' (including 'negbin' from 'MASS', but excluding quasi-distributions) and 'lm' model classes. Moreover, externally created simulations, e.g. posterior predictive simulations from Bayesian software such as 'JAGS', 'STAN', or 'BUGS' can be processed as well. The resulting residuals are standardized to values between 0 and 1 and can be interpreted as intuitively as residuals from a linear regression. The package also provides a number of plot and test functions for typical model misspecification problems, such as over/underdispersion, zero-inflation, and residual spatial, phylogenetic and temporal autocorrelation.
Maintained by Florian Hartig. Last updated 12 days ago.
glmm regression regression-diagnostics residual
1.2 match 226 stars 14.74 score 2.8k scripts 10 dependents
cran
IIVpredictor:Modeling Within Individual Variability as Predictor
Time parceling method and Bayesian variability modeling methods for modeling within individual variability indicators as predictors. For more details, see <https://github.com/xliu12/IIVpredicitor>.
Maintained by Xiao Liu. Last updated 4 years ago.
8.6 match 2.00 score 2 scripts
druegamer
deepregression:Fitting Deep Distributional Regression
Allows for the specification of semi-structured deep distributional regression models which are fitted in a neural network as proposed by Ruegamer et al. (2023) <doi:10.18637/jss.v105.i02>. Predictors can be modeled using structured (penalized) linear effects, structured non-linear effects or using an unstructured deep network model.
Maintained by David Ruegamer. Last updated 3 months ago.
7.5 match 2.28 score 63 scripts 1 dependent
lingfeiwang
lassopv:Nonparametric P-Value Estimation for Predictors in Lasso
Estimate the p-values for predictors x against target variable y in lasso regression, using the regularization strength when each predictor enters the active set of regularization path for the first time as the statistic. This is based on the assumption that predictors (of the same variance) that (first) become active earlier tend to be more significant. Three null distributions are supported: normal and spherical, which are computed separately for each predictor and analytically under approximation, which aims at efficiency and accuracy for small p-values.
Maintained by Lingfei Wang. Last updated 2 years ago.
feature-selection lasso linear-regression p-value variable-selection
7.4 match 2 stars 2.30 score 5 scripts
alanarnholt
PASWR2:Probability and Statistics with R, Second Edition
Functions and data sets for the text Probability and Statistics with R, Second Edition.
Maintained by Alan T. Arnholt. Last updated 3 years ago.
4.0 match 1 star 4.24 score 260 scripts
miriamesteve
eat:Efficiency Analysis Trees
Functions are provided to determine production frontiers and technical efficiency measures through non-parametric techniques based upon regression trees. The package includes code for estimating radial input, output, directional and additive measures, plotting graphical representations of the scores and the production frontiers by means of trees, and determining rankings of importance of input variables in the analysis. Additionally, an adaptation of Random Forest by a set of individual Efficiency Analysis Trees for estimating technical efficiency is also included. More details in: <doi:10.1016/j.eswa.2020.113783>.
Maintained by Miriam Esteve. Last updated 3 years ago.
3.6 match 5 stars 4.68 score 19 scripts
pat-s
oddsratio:Odds Ratio Calculation for GAM(M)s & GLM(M)s
Simplified odds ratio calculation of GAM(M)s & GLM(M)s. Provides structured output (data frame) of all predictors and their corresponding odds ratios and confidence intervals for further analyses. It helps to avoid false references of predictors and increments by specifying these parameters in a list instead of using 'exp(coef(model))' (standard approach of odds ratio calculation for GLMs) which just returns a plain numeric output. For GAM(M)s, odds ratio calculation is highly simplified with this package since it takes care of the multiple 'predict()' calls of the chosen predictor while holding other predictors constant. Also, this package allows odds ratio calculation of percentage steps across the whole predictor distribution range for GAM(M)s. In both cases, confidence intervals are returned additionally. Calculated odds ratios of GAM(M)s can be inserted into the smooth function plot.
Maintained by Patrick Schratz. Last updated 11 months ago.
odds-ratio probability statistics
2.3 match 31 stars 7.48 score 81 scripts 1 dependent
cran
catalytic:Tools for Applying Catalytic Priors in Statistical Modeling
To improve estimation accuracy and stability in statistical modeling, catalytic prior distributions are employed, integrating observed data with synthetic data generated from a simpler model's predictive distribution. This approach enhances model robustness, stability, and flexibility in complex data scenarios. The catalytic prior distributions are introduced by 'Huang et al.' (2020, <doi:10.1073/pnas.1920913117>), Li and Huang (2023, <doi:10.48550/arXiv.2312.01411>).
Maintained by Dongming Huang. Last updated 3 months ago.
5.3 match 3.18 score
cdriveraus
ctsem:Continuous Time Structural Equation Modelling
Hierarchical continuous (and discrete) time state space modelling, for linear and nonlinear systems measured by continuous variables, with limited support for binary data. The subject specific dynamic system is modelled as a stochastic differential equation (SDE) or difference equation; measurement models are typically multivariate normal factor models. Linear mixed effects SDEs estimated via maximum likelihood and optimization are the default. Nonlinearities (state dependent parameters) and random effects on all parameters are possible, using either max likelihood / max a posteriori optimization (with optional importance sampling) or Stan's Hamiltonian Monte Carlo sampling. See <https://github.com/cdriveraus/ctsem/raw/master/vignettes/hierarchicalmanual.pdf> for details. Priors may be used. For the conceptual overview of the hierarchical Bayesian linear SDE approach, see <https://www.researchgate.net/publication/324093594_Hierarchical_Bayesian_Continuous_Time_Dynamic_Modeling>. Exogenous inputs may also be included; for an overview of such possibilities see <https://www.researchgate.net/publication/328221807_Understanding_the_Time_Course_of_Interventions_with_Continuous_Time_Dynamic_Models>. Stan based functions are not available on 32 bit Windows systems at present. <https://cdriver.netlify.app/> contains some tutorial blog posts.
Maintained by Charles Driver. Last updated 11 days ago.
stochastic-differential-equations time-series cpp
1.8 match 42 stars 9.58 score 366 scripts 1 dependent
alexpkeil1
bkmrhat:Parallel Chain Tools for Bayesian Kernel Machine Regression
Bayesian kernel machine regression (from the 'bkmr' package) is a Bayesian semi-parametric generalized linear model approach under identity and probit links. There are a number of functions in this package that extend Bayesian kernel machine regression fits to allow multiple-chain inference and diagnostics, which leverage functions from the 'future', 'rstan', and 'coda' packages. Reference: Bobb, J. F., Henn, B. C., Valeri, L., & Coull, B. A. (2018). Statistical software for analyzing the health effects of multiple concurrent exposures via Bayesian kernel machine regression. <doi:10.1186/s12940-018-0413-y>.
Maintained by Alexander Keil. Last updated 3 years ago.
3.7 match 7 stars 4.54 score 10 scripts
jmleach-bst
sim2Dpredictr:Simulate Outcomes Using Spatially Dependent Design Matrices
Provides tools for simulating spatially dependent predictors (continuous or binary), which are used to generate scalar outcomes in a (generalized) linear model framework. Continuous predictors are generated using traditional multivariate normal distributions or Gauss Markov random fields with several correlation function approaches (e.g., see Rue (2001) <doi:10.1111/1467-9868.00288> and Furrer and Sain (2010) <doi:10.18637/jss.v036.i10>), while binary predictors are generated using a Boolean model (see Cressie and Wikle (2011, ISBN: 978-0-471-69274-4)). Parameter vectors exhibiting spatial clustering can also be easily specified by the user.
Maintained by Justin Leach. Last updated 1 year ago.
6.1 match 2.70 score 2 scripts
bxc147
Epi:Statistical Analysis in Epidemiology
Functions for demographic and epidemiological analysis in the Lexis diagram, i.e. register and cohort follow-up data. In particular representation, manipulation, rate estimation and simulation for multistate data - the Lexis suite of functions, which includes interfaces to 'mstate', 'etm' and 'cmprsk' packages. Contains functions for Age-Period-Cohort and Lee-Carter modeling and a function for interval censored data and some useful functions for tabulation and plotting, as well as a number of epidemiological data sets.
Maintained by Bendix Carstensen. Last updated 2 months ago.
1.7 match 4 stars 9.65 score 708 scripts 11 dependents
ericarcher
rfPermute:Estimate Permutation p-Values for Random Forest Importance Metrics
Estimate significance of importance metrics for a Random Forest model by permuting the response variable. Produces null distribution of importance metrics for each predictor variable and p-value of observed. Provides summary and visualization functions for 'randomForest' results.
Maintained by Eric Archer. Last updated 2 years ago.
2.4 match 27 stars 6.77 score 96 scripts 1 dependent
daya6489
SmartEDA:Summarize and Explore the Data
Exploratory analysis of any input data, describing the structure and the relationships present in the data. The package automatically selects the variables and computes the related descriptive statistics. Information value, weight of evidence, custom tables, summary statistics, and graphical techniques are provided for both numeric and categorical predictors.
Maintained by Dayanand Ubrangala. Last updated 1 year ago.
analysis exploratory-data-analysis
2.2 match 42 stars 7.25 score 214 scripts
ajrgodfrey
BrailleR:Improved Access for Blind Users
Blind users do not have access to the graphical output from R without printing the content of graphics windows to an embosser of some kind. This is not as immediate as is required for efficient access to statistical output. The functions here are created so that blind people can make even better use of R. This includes the text descriptions of graphs, convenience functions to replace the functionality offered in many GUI front ends, and experimental functionality for optimising graphical content to prepare it for embossing as tactile images.
Maintained by A. Jonathan R. Godfrey. Last updated 11 months ago.
1.8 match 123 stars 8.90 score 143 scripts
rempsyc
rempsyc:Convenience Functions for Psychology
Make your workflow faster and easier. Create easily customizable plots (via 'ggplot2') and nice APA tables (following the style of the *American Psychological Association*) exportable to Word (via 'flextable'), easily run statistical tests or check assumptions, and automate various other tasks.
Maintained by Rémi Thériault. Last updated 1 month ago.
convenience-functions ggplot2 psychology statistics visualization
1.5 match 43 stars 10.68 score 214 scripts 2 dependents
jellegoeman
penalized:L1 (Lasso and Fused Lasso) and L2 (Ridge) Penalized Estimation in GLMs and in the Cox Model
Fitting possibly high dimensional penalized regression models. The penalty structure can be any combination of an L1 penalty (lasso and fused lasso), an L2 penalty (ridge) and a positivity constraint on the regression coefficients. The supported regression models are linear, logistic and Poisson regression and the Cox Proportional Hazards model. Cross-validation routines allow optimization of the tuning parameters.
Maintained by Jelle Goeman. Last updated 3 years ago.
2.3 match 4 stars 7.09 score 429 scripts 17 dependents
ghbolstad
evolvability:Calculation of Evolvability Parameters
Provides tools for calculating evolvability parameters from estimated G-matrices as defined in Hansen and Houle (2008) <doi:10.1111/j.1420-9101.2008.01573.x> and fits phylogenetic comparative models that link the rate of evolution of a trait to the state of another evolving trait (see Hansen et al. 2021 Systematic Biology <doi:10.1093/sysbio/syab079>). The package was released with Bolstad et al. (2014) <doi:10.1098/rstb.2013.0255>, which contains some examples of use.
Maintained by Geir H. Bolstad. Last updated 10 months ago.
3.8 match 4.20 score 16 scripts