R-universe search: exports:tidy

tidymodels

broom:Convert Statistical Objects into Tidy Tibbles

Summarizes key information about statistical objects in tidy tibbles. This makes it easy to report results, create plots and consistently work with large numbers of models at once. Broom provides three verbs that each provide different types of information about a model. tidy() summarizes information about model components such as coefficients of a regression. glance() reports information about an entire model, such as goodness of fit measures like AIC and BIC. augment() adds information about individual observations to a dataset, such as fitted values or influence measures.

Maintained by Simon Couch. Last updated 1 day ago.

modeling tidy-data

1.5k stars 21.58 score 37k scripts 1.5k dependents

tidymodels

recipes:Preprocessing and Feature Engineering Steps for Modeling

A recipe prepares your data for modeling. We provide an extensible framework for pipeable sequences of feature engineering steps that provide preprocessing tools to be applied to data. Statistical parameters for the steps can be estimated from an initial data set and then applied to other data sets. The resulting processed output can then be used as inputs for statistical or machine learning models.

Maintained by Max Kuhn. Last updated 19 hours ago.

586 stars 18.80 score 7.2k scripts 383 dependents

juliasilge

tidytext:Text Mining using 'dplyr', 'ggplot2', and Other Tidy Tools

Using tidy data principles can make many text mining tasks easier, more effective, and consistent with tools already in wide use. Much of the infrastructure needed for text mining with tidy data frames already exists in packages like 'dplyr', 'broom', 'tidyr', and 'ggplot2'. In this package, we provide functions and supporting data sets to allow conversion of text to and from tidy formats, and to switch seamlessly between tidy tools and existing text mining packages.

Maintained by Julia Silge. Last updated 12 months ago.

natural-language-processing text-mining tidy-data tidyverse

1.2k stars 16.86 score 17k scripts 61 dependents

tidymodels

rsample:General Resampling Infrastructure

Classes and functions to create and summarize different types of resampling objects (e.g. bootstrap, cross-validation).

Maintained by Hannah Frick. Last updated 18 days ago.

341 stars 16.72 score 5.2k scripts 79 dependents

amices

mice:Multivariate Imputation by Chained Equations

Multiple imputation using Fully Conditional Specification (FCS) implemented by the MICE algorithm as described in Van Buuren and Groothuis-Oudshoorn (2011) <doi:10.18637/jss.v045.i03>. Each variable has its own imputation model. Built-in imputation models are provided for continuous data (predictive mean matching, normal), binary data (logistic regression), unordered categorical data (polytomous logistic regression) and ordered categorical data (proportional odds). MICE can also impute continuous two-level data (normal model, pan, second-level variables). Passive imputation can be used to maintain consistency between variables. Various diagnostic plots are available to inspect the quality of the imputations.

Maintained by Stef van Buuren. Last updated 1 day ago.

chained-equations fcs imputation mice missing-data missing-values multiple-imputation multivariate-data cpp

462 stars 16.64 score 10k scripts 154 dependents

tidymodels

parsnip:A Common API to Modeling and Analysis Functions

A common interface is provided to allow users to specify a model without having to remember the different argument names across different functions or computational engines (e.g. 'R', 'Spark', 'Stan', 'H2O', etc).

Maintained by Max Kuhn. Last updated 17 days ago.

612 stars 16.37 score 3.4k scripts 69 dependents

mhahsler

dbscan:Density-Based Spatial Clustering of Applications with Noise (DBSCAN) and Related Algorithms

A fast reimplementation of several density-based algorithms of the DBSCAN family. Includes the clustering algorithms DBSCAN (density-based spatial clustering of applications with noise) and HDBSCAN (hierarchical DBSCAN), the ordering algorithm OPTICS (ordering points to identify the clustering structure), shared nearest neighbor clustering, and the outlier detection algorithms LOF (local outlier factor) and GLOSH (global-local outlier score from hierarchies). The implementations use the kd-tree data structure (from library ANN) for faster k-nearest neighbor search. An R interface to fast kNN and fixed-radius NN search is also provided. Hahsler, Piekenbrock and Doran (2019) <doi:10.18637/jss.v091.i01>.

Maintained by Michael Hahsler. Last updated 2 months ago.

clustering dbscan density-based-clustering hdbscan lof optics cpp

324 stars 15.60 score 1.6k scripts 85 dependents

tidymodels

yardstick:Tidy Characterizations of Model Performance

Tidy tools for quantifying how well a model fits a data set, such as confusion matrices, class probability curve summaries, and regression metrics (e.g., RMSE).

Maintained by Emil Hvitfeldt. Last updated 17 days ago.

387 stars 15.47 score 2.2k scripts 60 dependents

kassambara

rstatix:Pipe-Friendly Framework for Basic Statistical Tests

Provides a simple and intuitive pipe-friendly framework, coherent with the 'tidyverse' design philosophy, for performing basic statistical tests, including t-test, Wilcoxon test, ANOVA, Kruskal-Wallis and correlation analyses. The output of each test is automatically transformed into a tidy data frame to facilitate visualization. Additional functions are available for reshaping, reordering, manipulating and visualizing correlation matrix. Functions are also included to facilitate the analysis of factorial experiments, including purely 'within-Ss' designs (repeated measures), purely 'between-Ss' designs, and mixed 'within-and-between-Ss' designs. It's also possible to compute several effect size metrics, including "eta squared" for ANOVA, "Cohen's d" for t-test and 'Cramer V' for the association between categorical variables. The package contains helper functions for identifying univariate and multivariate outliers, assessing normality and homogeneity of variances.

Maintained by Alboukadel Kassambara. Last updated 2 years ago.

458 stars 15.27 score 11k scripts 432 dependents

bbolker

broom.mixed:Tidying Methods for Mixed Models

Convert fitted objects from various R mixed-model packages into tidy data frames along the lines of the 'broom' package. The package provides three S3 generics for each model: tidy(), which summarizes a model's statistical findings such as coefficients of a regression; augment(), which adds columns to the original data such as predictions, residuals and cluster assignments; and glance(), which provides a one-row summary of model-level statistics.

Maintained by Ben Bolker. Last updated 6 days ago.

230 stars 15.22 score 4.0k scripts 37 dependents

sparklyr

sparklyr:R Interface to Apache Spark

R interface to Apache Spark, a fast and general engine for big data processing, see <https://spark.apache.org/>. This package supports connecting to local and remote Apache Spark clusters, provides a 'dplyr' compatible back-end, and provides an interface to Spark's built-in machine learning algorithms.

Maintained by Edgar Ruiz. Last updated 11 days ago.

apache-spark distributed dplyr ide livy machine-learning remote-clusters spark sparklyr

959 stars 15.20 score 4.0k scripts 21 dependents

vincentarelbundock

marginaleffects:Predictions, Comparisons, Slopes, Marginal Means, and Hypothesis Tests

Compute and plot predictions, slopes, marginal means, and comparisons (contrasts, risk ratios, odds, etc.) for over 100 classes of statistical and machine learning models in R. Conduct linear and non-linear hypothesis tests, or equivalence tests. Calculate uncertainty estimates using the delta method, bootstrapping, or simulation-based inference. Details can be found in Arel-Bundock, Greifer, and Heiss (2024) <doi:10.18637/jss.v111.i09>.

Maintained by Vincent Arel-Bundock. Last updated 1 hour ago.

cpp

511 stars 14.57 score 1.8k scripts 10 dependents

jacob-long

jtools:Analysis and Presentation of Social Scientific Data

This is a collection of tools for more efficiently understanding and sharing the results of (primarily) regression analyses. There are also a number of miscellaneous functions for statistical and programming purposes. Support for models produced by the survey and lme4 packages is a point of emphasis.

Maintained by Jacob A. Long. Last updated 7 months ago.

social-sciences

167 stars 14.48 score 4.0k scripts 14 dependents

hojsgaard

pbkrtest:Parametric Bootstrap, Kenward-Roger and Satterthwaite Based Methods for Test in Mixed Models

Computes p-values based on (a) Satterthwaite or Kenward-Roger degrees-of-freedom methods and (b) parametric bootstrap for mixed effects models as implemented in the 'lme4' package. Implements parametric bootstrap test for generalized linear mixed models as implemented in 'lme4' and generalized linear models. The package is documented in the paper by Halekoh and Højsgaard, (2012, <doi:10.18637/jss.v059.i09>). Please see 'citation("pbkrtest")' for citation details.

Maintained by Søren Højsgaard. Last updated 23 days ago.

5 stars 14.36 score 648 scripts 915 dependents

r-lib

generics:Common S3 Generics not Provided by Base R Methods Related to Model Fitting

In order to reduce potential package dependencies and conflicts, generics provides a number of commonly used S3 generics.

Maintained by Hadley Wickham. Last updated 1 year ago.

61 stars 14.00 score 131 scripts 9.8k dependents

hughjonesd

huxtable:Easily Create and Style Tables for LaTeX, HTML and Other Formats

Creates styled tables for data presentation. Export to HTML, LaTeX, RTF, 'Word', 'Excel', and 'PowerPoint'. Simple, modern interface to manipulate borders, size, position, captions, colours, text styles and number formatting. Table cells can span multiple rows and/or columns. Includes a 'huxreg' function for creation of regression tables, and 'quick_*' one-liners to print data to a new document.

Maintained by David Hugh-Jones. Last updated 25 days ago.

html huxtable latex microsoft-word powerpoint reproducible-research tables

323 stars 13.93 score 1.9k scripts 16 dependents

vincentarelbundock

modelsummary:Summary Tables and Plots for Statistical Models and Data: Beautiful, Customizable, and Publication-Ready

Create beautiful and customizable tables to summarize several statistical models side-by-side. Draw coefficient plots, multi-level cross-tabs, dataset summaries, balance tables (a.k.a. "Table 1s"), and correlation matrices. This package supports dozens of statistical models, and it can produce tables in HTML, LaTeX, Word, Markdown, PDF, PowerPoint, Excel, RTF, JPG, or PNG. Tables can easily be embedded in 'Rmarkdown' or 'knitr' dynamic documents. Details can be found in Arel-Bundock (2022) <doi:10.18637/jss.v103.i01>.

Maintained by Vincent Arel-Bundock. Last updated 28 days ago.

926 stars 13.41 score 6.2k scripts 2 dependents

chjackson

flexsurv:Flexible Parametric Survival and Multi-State Models

Flexible parametric models for time-to-event data, including the Royston-Parmar spline model, generalized gamma and generalized F distributions. Any user-defined parametric distribution can be fitted, given at least an R function defining the probability density or hazard. There are also tools for fitting and predicting from fully parametric multi-state models, based on either cause-specific hazards or mixture models.

Maintained by Christopher Jackson. Last updated 2 months ago.

cpp

57 stars 13.31 score 632 scripts 43 dependents

tidyverts

fabletools:Core Tools for Packages in the 'fable' Framework

Provides tools, helpers and data structures for developing models and time series functions for 'fable' and extension packages. These tools support a consistent and tidy interface for time series modelling and analysis.

Maintained by Mitchell O'Hara-Wild. Last updated 2 months ago.

91 stars 12.18 score 396 scripts 18 dependents

openpharma

mmrm:Mixed Models for Repeated Measures

Mixed models for repeated measures (MMRM) are a popular choice for analyzing longitudinal continuous outcomes in randomized clinical trials and beyond; see Cnaan, Laird and Slasor (1997) <doi:10.1002/(SICI)1097-0258(19971030)16:20%3C2349::AID-SIM667%3E3.0.CO;2-E> for a tutorial and Mallinckrodt, Lane, Schnell, Peng and Mancuso (2008) <doi:10.1177/009286150804200402> for a review. This package implements MMRM based on the marginal linear model without random effects using Template Model Builder ('TMB') which enables fast and robust model fitting. Users can specify a variety of covariance matrices, weight observations, fit models with restricted or standard maximum likelihood inference, perform hypothesis testing with Satterthwaite or Kenward-Roger adjustment, and extract least square means estimates by using 'emmeans'.

Maintained by Daniel Sabanes Bove. Last updated 22 days ago.

cpp

138 stars 12.15 score 113 scripts 4 dependents

bcallaway11

did:Treatment Effects with Multiple Periods and Groups

The standard Difference-in-Differences (DID) setup involves two periods and two groups: a treated group and an untreated group. Many applications of DID methods involve more than two periods and have individuals that are treated at different points in time. This package contains tools for computing average treatment effect parameters in Difference-in-Differences setups with more than two periods and with variation in treatment timing using the methods developed in Callaway and Sant'Anna (2021) <doi:10.1016/j.jeconom.2020.12.001>. The main parameters are group-time average treatment effects, which are the average treatment effect for a particular group at a particular time. These can be aggregated into a smaller number of treatment effect parameters, and the package deals with the cases where there is selective treatment timing, dynamic treatment effects, calendar time effects, or combinations of these. There are also functions for testing the Difference-in-Differences assumption, and plotting group-time average treatment effects.

Maintained by Brantly Callaway. Last updated 3 days ago.

329 stars 12.09 score 696 scripts 3 dependents
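
As a point of reference for the "standard" two-period, two-group setup mentioned in the did description, the estimate is the change in the treated group's mean outcome minus the change in the untreated group's mean outcome. The sketch below is a language-agnostic illustration in Python, not the did package's API or its multi-period estimator; the function name `did_2x2` and all outcome values are hypothetical and made up for this example.

```python
from statistics import mean


def did_2x2(treated_pre, treated_post, control_pre, control_post):
    """Classic two-period, two-group difference-in-differences estimate.

    Hypothetical helper for illustration only: it subtracts the untreated
    group's before/after change from the treated group's before/after change.
    """
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Made-up outcomes for two units per group, before and after treatment.
estimate = did_2x2(
    treated_pre=[10.0, 12.0],
    treated_post=[15.0, 17.0],
    control_pre=[9.0, 11.0],
    control_post=[11.0, 13.0],
)
print(estimate)  # treated change 5.0 minus control change 2.0 -> 3.0
```

With more than two periods and staggered treatment timing, this single difference is no longer enough, which is the case the package description says it addresses.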

declaredesign

estimatr:Fast Estimators for Design-Based Inference

Fast procedures for a small set of commonly-used, design-appropriate estimators with robust standard errors and confidence intervals. Includes estimators for linear regression, instrumental variables regression, difference-in-means, Horvitz-Thompson estimation, and regression improving precision of experimental estimates by interacting treatment with centered pre-treatment covariates introduced by Lin (2013) <doi:10.1214/12-AOAS583>.

Maintained by Graeme Blair. Last updated 2 months ago.

cpp

133 stars 11.58 score 1.7k scripts 11 dependents

jacob-long

interactions:Comprehensive, User-Friendly Toolkit for Probing Interactions

A suite of functions for conducting and interpreting analysis of statistical interaction in regression models that was formerly part of the 'jtools' package. Functionality includes visualization of two- and three-way interactions among continuous and/or categorical variables as well as calculation of "simple slopes" and Johnson-Neyman intervals (see e.g., Bauer & Curran, 2005 <doi:10.1207/s15327906mbr4003_5>). These capabilities are implemented for generalized linear models in addition to the standard linear regression context.

Maintained by Jacob A. Long. Last updated 8 months ago.

interactions moderation social-sciences statistics

131 stars 11.40 score 1.2k scripts 5 dependents

tidymodels

tidypredict:Run Predictions Inside the Database

It parses a fitted 'R' model object, and returns a formula in 'Tidy Eval' code that calculates the predictions. It works with several database back-ends because it leverages 'dplyr' and 'dbplyr' for the final 'SQL' translation of the algorithm. It currently supports lm(), glm(), randomForest(), ranger(), earth(), xgb.Booster.complete(), cubist(), and ctree() models.

Maintained by Emil Hvitfeldt. Last updated 3 months ago.

dbplyr dplyr purrr rlang

262 stars 11.05 score 241 scripts 2 dependents

pbs-assess

sdmTMB:Spatial and Spatiotemporal SPDE-Based GLMMs with 'TMB'

Implements spatial and spatiotemporal GLMMs (Generalized Linear Mixed Effect Models) using 'TMB', 'fmesher', and the SPDE (Stochastic Partial Differential Equation) Gaussian Markov random field approximation to Gaussian random fields. One common application is for spatially explicit species distribution models (SDMs). See Anderson et al. (2024) <doi:10.1101/2022.03.24.485545>.

Maintained by Sean C. Anderson. Last updated 18 hours ago.

ecology glmm spatial-analysis species-distribution-modelling tmb cpp

205 stars 11.04 score 848 scripts 1 dependent

tidymodels

textrecipes:Extra 'Recipes' for Text Processing

Converting text to numerical features requires specifically created procedures, which are implemented as steps according to the 'recipes' package. These steps allow for tokenization, filtering, counting (tf and tfidf) and feature hashing.

Maintained by Emil Hvitfeldt. Last updated 10 days ago.

160 stars 10.86 score 964 scripts 1 dependent

hojsgaard

geepack:Generalized Estimating Equation Package

Generalized estimating equations solver for parameters in mean, scale, and correlation structures, through mean link, scale link, and correlation link. Can also handle clustered categorical responses. See e.g. Halekoh and Højsgaard, (2005, <doi:10.18637/jss.v015.i02>), for details.

Maintained by Søren Højsgaard. Last updated 8 months ago.

cpp

1 star 10.57 score 1.7k scripts 43 dependents

matthieustigler

tsDyn:Nonlinear Time Series Models with Regime Switching

Implements nonlinear autoregressive (AR) time series models. For univariate series, a non-parametric approach is available through additive nonlinear AR. Parametric modeling and testing for regime switching dynamics is available when the transition is either direct (TAR: threshold AR) or smooth (STAR: smooth transition AR, LSTAR). For multivariate series, one can estimate a range of TVAR or threshold cointegration TVECM models with two or three regimes. Tests can be conducted for TVAR as well as for TVECM (Hansen and Seo 2002 and Seo 2006).

Maintained by Matthieu Stigler. Last updated 5 months ago.

34 stars 10.53 score 684 scripts 3 dependents

tidymodels

themis:Extra Recipes Steps for Dealing with Unbalanced Data

A dataset with an uneven number of cases in each class is said to be unbalanced. Many models perform poorly on unbalanced datasets. A dataset can be balanced by increasing the number of minority cases using SMOTE 2011 <doi:10.48550/arXiv.1106.1813>, BorderlineSMOTE 2005 <doi:10.1007/11538059_91> and ADASYN 2008 <https://ieeexplore.ieee.org/document/4633969>, or by decreasing the number of majority cases using NearMiss 2003 <https://www.site.uottawa.ca/~nat/Workshop2003/jzhang.pdf> or Tomek link removal 1976 <https://ieeexplore.ieee.org/document/4309452>.

Maintained by Emil Hvitfeldt. Last updated 2 months ago.

143 stars 10.37 score 1.3k scripts 2 dependents

atsa-es

MARSS:Multivariate Autoregressive State-Space Modeling

The MARSS package provides maximum-likelihood parameter estimation for constrained and unconstrained linear multivariate autoregressive state-space (MARSS) models, including partially deterministic models. MARSS models are a class of dynamic linear model (DLM) and vector autoregressive (VAR) model. Fitting is available via Expectation-Maximization (EM), BFGS (using optim), and 'TMB' (using the 'marssTMB' companion package). Functions are provided for parametric and innovations bootstrapping, Kalman filtering and smoothing, model selection criteria including bootstrap AICb, confidence intervals via the Hessian approximation or bootstrapping, and all conditional residual types. See the user guide for examples of dynamic factor analysis, dynamic linear models, outlier and shock detection, and multivariate AR-p models. Online workshops (lectures, eBook, and computer labs) at <https://atsa-es.github.io/>.

Maintained by Elizabeth Eli Holmes. Last updated 1 year ago.

multivariate-timeseries state-space-models statistics time-series

52 stars 10.34 score 596 scripts 3 dependents

bcgov

ssdtools:Species Sensitivity Distributions

Species sensitivity distributions are cumulative probability distributions which are fitted to toxicity concentrations for different species as described by Posthuma et al. (2001) <isbn:9781566705783>. The ssdtools package uses Maximum Likelihood to fit distributions such as the gamma, log-logistic, log-normal and log-normal log-normal mixture. Multiple distributions can be averaged using Akaike Information Criteria. Confidence intervals on hazard concentrations and proportions are produced by bootstrapping.

Maintained by Joe Thorley. Last updated 1 month ago.

ecotoxicology env species-sensitivity-distribution cpp

33 stars 10.33 score 111 scripts 5 dependents

darwin-eu

omopgenerics:Methods and Classes for the OMOP Common Data Model

Provides definitions of core classes and methods used by analytic pipelines that query the OMOP (Observational Medical Outcomes Partnership) common data model.

Maintained by Martí Català. Last updated 22 days ago.

9.97 score 193 scripts 16 dependents

nicholasjclark

mvgam:Multivariate (Dynamic) Generalized Additive Models

Fit Bayesian Dynamic Generalized Additive Models to multivariate observations. Users can build nonlinear State-Space models that can incorporate semiparametric effects in observation and process components, using a wide range of observation families. Estimation is performed using Markov Chain Monte Carlo with Hamiltonian Monte Carlo in the software 'Stan'. References: Clark & Wells (2023) <doi:10.1111/2041-210X.13974>.

Maintained by Nicholas J Clark. Last updated 12 hours ago.

bayesian-statistics dynamic-factor-models ecological-modelling forecasting gaussian-process generalised-additive-models generalized-additive-models joint-species-distribution-modelling multilevel-models multivariate-timeseries stan time-series-analysis timeseries vector-autoregression vectorautoregression cpp

148 stars 9.92 score 117 scripts

tidymodels

rules:Model Wrappers for Rule-Based Models

Bindings for additional models for use with the 'parsnip' package. Models include prediction rule ensembles (Friedman and Popescu, 2008) <doi:10.1214/07-AOAS148>, C5.0 rules (Quinlan, 1992 ISBN: 1558602380), and Cubist (Kuhn and Johnson, 2013) <doi:10.1007/978-1-4614-6849-3>.

Maintained by Emil Hvitfeldt. Last updated 5 months ago.

40 stars 9.52 score 20k scripts 1 dependent

stemangiola

tidyseurat:Brings Seurat to the Tidyverse

It creates an invisible layer that allows the 'Seurat' object to be viewed as a tibble and to interact seamlessly with the tidyverse.

Maintained by Stefano Mangiola. Last updated 8 months ago.

assaydomain infrastructure rnaseq differentialexpression geneexpression normalization clustering qualitycontrol sequencing transcription transcriptomics dplyr ggplot2 pca purrr sct seurat single-cell single-cell-rna-seq tibble tidyr tidyverse transcripts tsne umap

159 stars 9.48 score 398 scripts 1 dependent

tidymodels

embed:Extra Recipes for Encoding Predictors

Predictors can be converted to one or more numeric representations using a variety of methods. Effect encodings using simple generalized linear models <doi:10.48550/arXiv.1611.09477> or nonlinear models <doi:10.48550/arXiv.1604.06737> can be used. There are also functions for dimension reduction and other approaches.

Maintained by Emil Hvitfeldt. Last updated 2 months ago.

142 stars 9.35 score 1.1k scripts

otoomet

maxLik:Maximum Likelihood Estimation and Related Tools

Functions for Maximum Likelihood (ML) estimation, non-linear optimization, and related tools. It includes a unified way to call different optimizers, and classes and methods to handle the results from the Maximum Likelihood viewpoint. It also includes a number of convenience tools for testing and developing your own models.

Maintained by Ott Toomet. Last updated 1 year ago.

9.14 score 480 scripts 110 dependents

jhelvy

logitr:Logit Models w/Preference & WTP Space Utility Parameterizations

Fast estimation of multinomial (MNL) and mixed logit (MXL) models in R. Models can be estimated using "Preference" space or "Willingness-to-pay" (WTP) space utility parameterizations. Weighted models can also be estimated. An option is available to run a parallelized multistart optimization loop with random starting points in each iteration, which is useful for non-convex problems like MXL models or models with WTP space utility parameterizations. The main optimization loop uses the 'nloptr' package to minimize the negative log-likelihood function. Additional functions are available for computing and comparing WTP from both preference space and WTP space models and for predicting expected choices and choice probabilities for sets of alternatives based on an estimated model. Mixed logit models can include uncorrelated or correlated heterogeneity covariances and are estimated using maximum simulated likelihood based on the algorithms in Train (2009) <doi:10.1017/CBO9780511805271>. More details can be found in Helveston (2023) <doi:10.18637/jss.v105.i10>.

Maintained by John Helveston. Last updated 5 months ago.

log-likelihood logit logit-model mixed-logit mlogit multinomial-regression mxl mxl-models preference-space preferences willingness-to-pay wtp

54 stars 9.10 score 119 scripts 1 dependent

graemeleehickey

joineRML:Joint Modelling of Multivariate Longitudinal Data and Time-to-Event Outcomes

Fits the joint model proposed by Henderson and colleagues (2000) <doi:10.1093/biostatistics/1.4.465>, but extended to the case of multiple continuous longitudinal measures. The time-to-event data is modelled using a Cox proportional hazards regression model with time-varying covariates. The multiple longitudinal outcomes are modelled using a multivariate version of the Laird and Ware linear mixed model. The association is captured by a multivariate latent Gaussian process. The model is estimated using a Monte Carlo Expectation Maximization algorithm. This project was funded by the Medical Research Council (Grant number MR/M013227/1).

Maintained by Graeme L. Hickey. Last updated 2 months ago.

armadillo biostatistics clinical-trials cox dynamic joint-models longitudinal-data multivariate-analysis multivariate-data multivariate-longitudinal-data prediction rcpp regression-models statistics survival openblas cpp openmp

30 stars 8.93 score 146 scripts 1 dependent

bioc

tidySingleCellExperiment:Brings SingleCellExperiment to the Tidyverse

'tidySingleCellExperiment' is an adapter that abstracts the 'SingleCellExperiment' container in the form of a 'tibble'. This allows *tidy* data manipulation, nesting, and plotting. For example, a 'tidySingleCellExperiment' is directly compatible with functions from 'tidyverse' packages `dplyr` and `tidyr`, as well as plotting with `ggplot2` and `plotly`. In addition, the package provides various utility functions specific to single-cell omics data analysis (e.g., aggregation of cell-level data to pseudobulks).

Maintained by Stefano Mangiola. Last updated 5 months ago.

assaydomain infrastructure rnaseq differentialexpression singlecell geneexpression normalization clustering qualitycontrol sequencing bioconductor dplyr ggplot2 plotly single-cell-rna-seq single-cell-sequencing singlecellexperiment tibble tidyr tidyverse

36 stars 8.86 score 125 scripts 2 dependents

charlie86

spotifyr:R Wrapper for the 'Spotify' Web API

An R wrapper for pulling data from the 'Spotify' Web API <https://developer.spotify.com/documentation/web-api/> in bulk, or post items on a 'Spotify' user's playlist.

Maintained by Daniel Antal. Last updated 5 months ago.

music-information-retrieval spotify

375 stars 8.61 score 936 scripts

tidymodels

tidyposterior:Bayesian Analysis to Compare Models using Resampling Statistics

Bayesian analysis is used here to answer the question: "when looking at resampling results, are the differences between models 'real'?" To answer this, a model can be created where the performance statistic is the resampling statistics (e.g. accuracy or RMSE). These values are explained by the model types. In doing this, we can get parameter estimates for each model's effect on performance and make statistical (and practical) comparisons between models. The methods included here are similar to Benavoli et al (2017) <https://jmlr.org/papers/v18/16-305.html>.

Maintained by Max Kuhn. Last updated 5 months ago.

102 stars 8.44 score 273 scripts

bioc

tidySummarizedExperiment:Brings SummarizedExperiment to the Tidyverse

The tidySummarizedExperiment package provides a set of tools for creating and manipulating tidy data representations of SummarizedExperiment objects. SummarizedExperiment is a widely used data structure in bioinformatics for storing high-throughput genomic data, such as gene expression or DNA sequencing data. The tidySummarizedExperiment package introduces a tidy framework for working with SummarizedExperiment objects. It allows users to convert their data into a tidy format, where each observation is a row and each variable is a column. This tidy representation simplifies data manipulation and integration with other tidyverse packages, and fits into the broader ecosystem of tidy tools for data analysis.

Maintained by Stefano Mangiola. Last updated 5 months ago.

assaydomain infrastructure rnaseq differentialexpression geneexpression normalization clustering qualitycontrol sequencing transcription transcriptomics

26 stars 8.44 score 196 scripts 1 dependent

declaredesign

DeclareDesign:Declare and Diagnose Research Designs

Researchers can characterize and learn about the properties of research designs before implementation using `DeclareDesign`. Ex ante declaration and diagnosis of designs can help researchers clarify the strengths and limitations of their designs and improve their properties, and can help readers evaluate a research strategy prior to implementation and without access to results. It can also make it easier for designs to be shared, replicated, and critiqued.

Maintained by Graeme Blair. Last updated 2 months ago.

design research simulations

101 stars 8.42 score 398 scripts 1 dependent

radiant-rstats

radiant.data:Data Menu for Radiant: Business Analytics using R and Shiny

The Radiant Data menu includes interfaces for loading, saving, viewing, visualizing, summarizing, transforming, and combining data. It also contains functionality to generate reproducible reports of the analyses conducted in the application.

Maintained by Vincent Nijs. Last updated 5 months ago.

53 stars 8.25 score 146 scripts 6 dependents

robinhankin

permutations:The Symmetric Group: Permutations of a Finite Set

Manipulates invertible functions from a finite set to itself. Can transform from word form to cycle form and back. To cite the package in publications please use Hankin (2020) "Introducing the permutations R package", SoftwareX, volume 11 <doi:10.1016/j.softx.2020.100453>.

Maintained by Robin K. S. Hankin. Last updated 2 months ago.

6 stars 8.23 score 49 scripts 2 dependents
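
The conversion between word form and cycle form mentioned in the permutations description can be sketched as follows. This is a language-agnostic illustration in Python, not the permutations package's interface; the function names `word_to_cycles` and `cycles_to_word`, the 1-based convention (position i of the word holds the image of i), and the choice to omit fixed points from the cycle form are assumptions made for this example.

```python
def word_to_cycles(word):
    """Convert word form to disjoint cycles (hypothetical helper).

    word[i - 1] is taken to be the image of i, for i = 1..n.
    Fixed points are omitted from the result.
    """
    seen = set()
    cycles = []
    for start in range(1, len(word) + 1):
        if start in seen:
            continue
        cycle = []
        current = start
        # Follow start -> image -> image of image ... until the cycle closes.
        while current not in seen:
            seen.add(current)
            cycle.append(current)
            current = word[current - 1]
        if len(cycle) > 1:
            cycles.append(tuple(cycle))
    return cycles


def cycles_to_word(cycles, n):
    """Convert disjoint cycles on {1..n} back to word form (hypothetical helper)."""
    word = list(range(1, n + 1))  # start from the identity permutation
    for cycle in cycles:
        for i, element in enumerate(cycle):
            # Each element maps to the next one in its cycle, wrapping around.
            word[element - 1] = cycle[(i + 1) % len(cycle)]
    return word


word = [2, 3, 1, 5, 4, 6]          # 1->2, 2->3, 3->1, 4->5, 5->4, 6->6
cycles = word_to_cycles(word)       # [(1, 2, 3), (4, 5)]
round_trip = cycles_to_word(cycles, n=len(word))
print(cycles, round_trip)
```

The size `n` has to be passed back in when converting from cycles because omitted fixed points (here, 6) carry no information about the size of the underlying set.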

darwin-eu

DrugUtilisation:Summarise Patient-Level Drug Utilisation in Data Mapped to the OMOP Common Data Model

Summarise patient-level drug utilisation cohorts using data mapped to the Observational Medical Outcomes Partnership (OMOP) common data model. New users and prevalent users cohorts can be generated and their characteristics, indication and drug use summarised.

Maintained by Martí Català. Last updated 2 months ago.

8.20 score 156 scripts 2 dependents

henrikbengtsson

R.rsp:Dynamic Generation of Scientific Reports

The RSP markup language makes any text-based document come alive. RSP provides a powerful markup for controlling the content and output of LaTeX, HTML, Markdown, AsciiDoc, Sweave and knitr documents (and more), e.g. 'Today's date is <%=Sys.Date()%>'. Contrary to many other literate programming languages, with RSP it is straightforward to loop over mixtures of code and text sections, e.g. in month-by-month summaries. RSP also has several preprocessing directives for incorporating static and dynamic contents of external files (local or online) among other things. Functions rstring() and rcat() make it easy to process RSP strings, rsource() sources an RSP file as if it were an R script, while rfile() compiles it (even online) into its final output format, e.g. rfile('report.tex.rsp') generates 'report.pdf' and rfile('report.md.rsp') generates 'report.html'. RSP is ideal for self-contained scientific reports and R package vignettes. It's easy to use - if you know how to write an R script, you'll be up and running within minutes.

Maintained by Henrik Bengtsson. Last updated 1 years ago.

document markup report reproducibility science

31 stars 8.06 score 36 scripts 9 dependents

darwin-eu

CohortCharacteristics:Summarise and Visualise Characteristics of Patients in the OMOP CDM

Summarise and visualise the characteristics of patients in data mapped to the Observational Medical Outcomes Partnership (OMOP) common data model (CDM).

Maintained by Marti Catala. Last updated 4 months ago.

1 stars 8.03 score 111 scripts 1 dependents

robinhankin

hyper2:The Hyperdirichlet Distribution, Mark 2

A suite of routines for the hyperdirichlet distribution and reified Bradley-Terry; supersedes the 'hyperdirichlet' package; uses 'disordR' discipline <doi:10.48550/ARXIV.2210.03856>. To cite in publications please use Hankin 2017 <doi:10.32614/rj-2017-061>, and for Generalized Plackett-Luce likelihoods use Hankin 2024 <doi:10.18637/jss.v109.i08>.

Maintained by Robin K. S. Hankin. Last updated 6 hours ago.

cpp

5 stars 7.91 score 38 scripts 1 dependents

stocnet

goldfish:Statistical Network Models for Dynamic Network Data

Tools for fitting statistical network models to dynamic network data. Can be used for fitting both dynamic network actor models ('DyNAMs') and relational event models ('REMs'). Stadtfeld, Hollway, and Block (2017a) <doi:10.1177/0081175017709295>, Stadtfeld, Hollway, and Block (2017b) <doi:10.1177/0081175017733457>, Stadtfeld and Block (2017) <doi:10.15195/v4.a14>, Hoffman et al. (2020) <doi:10.1017/nws.2020.3>.

Maintained by Alvaro Uzaheta. Last updated 7 months ago.

dynam network-modelling rem statistical-network-analysis openblas cpp openmp

61 stars 7.91 score 44 scripts

darwin-eu

visOmopResults:Graphs and Tables for OMOP Results

Provides methods to transform omop_result objects into formatted tables and figures, facilitating the visualisation of study results working with the Observational Medical Outcomes Partnership (OMOP) Common Data Model.

Maintained by Núria Mercadé-Besora. Last updated 8 days ago.

7.89 score 53 scripts 3 dependents

openpharma

crmPack:Object-Oriented Implementation of CRM Designs

Implements a wide range of model-based dose escalation designs, ranging from classical and modern continual reassessment methods (CRMs) based on dose-limiting toxicity endpoints to dual-endpoint designs taking into account a biomarker/efficacy outcome. The focus is on Bayesian inference, making it very easy to set up a new design with its own JAGS code. However, it is also possible to implement 3+3 designs for comparison or models with non-Bayesian estimation. The whole package is written in a modular form in the S4 class system, making it very flexible for adaptation to new models, escalation or stopping rules. Further details are presented in Sabanes Bove et al. (2019) <doi:10.18637/jss.v089.i10>.

Maintained by Daniel Sabanes Bove. Last updated 2 months ago.

jags cpp

21 stars 7.76 score 208 scripts

ellessenne

rsimsum:Analysis of Simulation Studies Including Monte Carlo Error

Summarise results from simulation studies and compute Monte Carlo standard errors of commonly used summary statistics. This package is modelled on the 'simsum' user-written command in 'Stata' (White I.R., 2010 <https://www.stata-journal.com/article.html?article=st0200>), further extending it with additional performance measures and functionality.

Maintained by Alessandro Gasparini. Last updated 11 months ago.

biostatistics monte-carlo-error simulation simulation-study simulations statistics

28 stars 7.70 score 148 scripts

usepa

spmodel:Spatial Statistical Modeling and Prediction

Fit, summarize, and predict for a variety of spatial statistical models applied to point-referenced and areal (lattice) data. Parameters are estimated using various methods. Additional modeling features include anisotropy, non-spatial random effects, partition factors, big data approaches, and more. Model-fit statistics are used to summarize, visualize, and compare models. Predictions at unobserved locations are readily obtainable. For additional details, see Dumelle et al. (2023) <doi:10.1371/journal.pone.0282524>.

Maintained by Michael Dumelle. Last updated 17 days ago.

15 stars 7.66 score 112 scripts 3 dependents

poissonconsulting

mcmcr:Manipulate MCMC Samples

Functions and classes to store, manipulate and summarise Markov chain Monte Carlo (MCMC) samples. For more information see Brooks et al. (2011) <isbn:978-1-4200-7941-8>.

Maintained by Joe Thorley. Last updated 2 months ago.

coda mcmc

17 stars 7.66 score 111 scripts 10 dependents

bioc

AlpsNMR:Automated spectraL Processing System for NMR

Reads Bruker NMR data directories both zipped and unzipped. It provides automated and efficient signal processing for untargeted NMR metabolomics. It is able to interpolate the samples, detect outliers, exclude regions, normalize, detect peaks, align the spectra, integrate peaks, manage metadata and visualize the spectra. After spectra processing, it can apply multivariate analysis on extracted data. Efficient plotting with 1-D data is also available. Basic reading of 1D ACD/Labs exported JDX samples is also available.

Maintained by Sergio Oller Moreno. Last updated 5 months ago.

software preprocessing visualization classification cheminformatics metabolomics dataimport

15 stars 7.59 score 12 scripts 1 dependents

bayesiandemography

bage:Bayesian Estimation and Forecasting of Age-Specific Rates

Fast Bayesian estimation and forecasting of age-specific rates, probabilities, and means, based on 'Template Model Builder'.

Maintained by John Bryant. Last updated 12 days ago.

cpp

3 stars 7.41 score 39 scripts

tommyjones

tidylda:Latent Dirichlet Allocation Using 'tidyverse' Conventions

Implements an algorithm for Latent Dirichlet Allocation (LDA), Blei et al. (2003) <https://www.jmlr.org/papers/volume3/blei03a/blei03a.pdf>, using style conventions from the 'tidyverse', Wickham et al. (2019) <doi:10.21105/joss.01686>, and 'tidymodels', Kuhn et al. <https://tidymodels.github.io/model-implementation-principles/>. Fitting is done via collapsed Gibbs sampling. Also implements several novel features for LDA such as guided models and transfer learning.

Maintained by Tommy Jones. Last updated 2 months ago.

cpp openmp

41 stars 7.36 score 53 scripts

corybrunson

ordr:A Tidyverse Extension for Ordinations and Biplots

Ordination comprises several multivariate exploratory and explanatory techniques with theoretical foundations in geometric data analysis; see Podani (2000, ISBN:90-5782-067-6) for techniques and applications and Le Roux & Rouanet (2005) <doi:10.1007/1-4020-2236-0> for foundations. Greenacre (2010, ISBN:978-84-923846) shows how the most established of these, including principal components analysis, correspondence analysis, multidimensional scaling, factor analysis, and discriminant analysis, rely on eigen-decompositions or singular value decompositions of pre-processed numeric matrix data. These decompositions give rise to a set of shared coordinates along which the row and column elements can be measured. The overlay of their scatterplots on these axes, introduced by Gabriel (1971) <doi:10.1093/biomet/58.3.453>, is called a biplot. 'ordr' provides inspection, extraction, manipulation, and visualization tools for several popular ordination classes supported by a set of recovery methods. It is inspired by and designed to integrate into 'tidyverse' workflows provided by Wickham et al (2019) <doi:10.21105/joss.01686>.

Maintained by Jason Cory Brunson. Last updated 26 days ago.

biplot data-visualization dimension-reduction geometric-data-analysis grammar-of-graphics log-ratio-analysis multivariate-analysis multivariate-statistics ordination tidymodels tidyverse

24 stars 7.26 score 28 scripts

tidymodels

poissonreg:Model Wrappers for Poisson Regression

Bindings for Poisson regression models for use with the 'parsnip' package. Models include simple generalized linear models, Bayesian models, and zero-inflated Poisson models (Zeileis, Kleiber, and Jackman (2008) <doi:10.18637/jss.v027.i08>).

Maintained by Hannah Frick. Last updated 5 months ago.

22 stars 7.26 score 342 scripts 1 dependents

poissonconsulting

nlist:Lists of Numeric Atomic Objects

Create and manipulate numeric list ('nlist') objects. An 'nlist' is an S3 list of uniquely named numeric objects. A numeric object is an integer or double vector, matrix or array. An 'nlists' object is an S3 class list of 'nlist' objects with the same names, dimensionalities and typeofs. Numeric list objects are of interest because they are the raw data inputs for analytic engines such as 'JAGS', 'STAN' and 'TMB'. Numeric list objects, which are useful for storing multiple realizations of simulated data sets, can be converted to coda::mcmc and coda::mcmc.list objects.

Maintained by Joe Thorley. Last updated 2 months ago.

data-frame natomic nlist nlists

6 stars 7.23 score 13 scripts 12 dependents

insightsengineering

tern.mmrm:Tables and Graphs for Mixed Models for Repeated Measures (MMRM)

Mixed models for repeated measures (MMRM) are a popular choice for analyzing longitudinal continuous outcomes in randomized clinical trials and beyond; see for example Cnaan, Laird and Slasor (1997) <doi:10.1002/(SICI)1097-0258(19971030)16:20%3C2349::AID-SIM667%3E3.0.CO;2-E>. This package provides an interface for fitting MMRM within the 'tern' <https://cran.r-project.org/package=tern> framework by Zhu et al. (2023) and for tabulating results easily using 'rtables' <https://cran.r-project.org/package=rtables> by Becker et al. (2023). It builds on 'mmrm' <https://cran.r-project.org/package=mmrm> by Sabanés Bové et al. (2023) for the actual MMRM computations.

Maintained by Joe Zhu. Last updated 6 months ago.

graphs listings statistical-engineering tables

6 stars 7.23 score 8 scripts 1 dependents

tidymodels

tidyclust:A Common API to Clustering

A common interface for specifying clustering models, in the same style as 'parsnip'. Creates a unified interface across different functions and computational engines.

Maintained by Emil Hvitfeldt. Last updated 2 months ago.

112 stars 7.21 score 139 scripts

robjhyndman

vital:Tidy Analysis Tools for Mortality, Fertility, Migration and Population Data

Analysing vital statistics based on tools consistent with the tidyverse. Tools are provided for data visualization, life table calculations, computing net migration numbers, Lee-Carter modelling, functional data modelling and forecasting.

Maintained by Rob Hyndman. Last updated 2 days ago.

28 stars 7.20 score 18 scripts

mskcc-epi-bio

tidycmprsk:Competing Risks Estimation

Provides an intuitive interface for working with competing risk endpoints. The package wraps the 'cmprsk' package, and exports functions for univariate cumulative incidence estimates and competing risk regression. Methods follow those introduced in Fine and Gray (1999) <doi:10.1002/sim.7501>.

Maintained by Daniel D. Sjoberg. Last updated 8 months ago.

competing-risks

23 stars 7.06 score 157 scripts 1 dependents

tidymodels

agua:'tidymodels' Integration with 'h2o'

Create and evaluate models using 'tidymodels' and 'h2o' <https://h2o.ai/>. The package enables users to specify 'h2o' as an engine for several modeling methods.

Maintained by Qiushi Yan. Last updated 10 months ago.

22 stars 6.88 score 80 scripts

kvasilopoulos

exuber:Econometric Analysis of Explosive Time Series

Testing for and dating periods of explosive dynamics (exuberance) in time series using the univariate and panel recursive unit root tests proposed by Phillips et al. (2015) <doi:10.1111/iere.12132> and Pavlidis et al. (2016) <doi:10.1007/s11146-015-9531-2>. The recursive least-squares algorithm utilizes the matrix inversion lemma to avoid matrix inversion, which results in significant speed improvements. Also supports simulation of a variety of periodically-collapsing bubble processes. Details can be found in Vasilopoulos et al. (2022) <doi:10.18637/jss.v103.i10>.

Maintained by Kostas Vasilopoulos. Last updated 1 years ago.

dickey-fuller explosive-dynamics simulation time-series openblas cpp

29 stars 6.83 score 77 scripts

seananderson

glmmfields:Generalized Linear Mixed Models with Robust Random Fields for Spatiotemporal Modeling

Implements Bayesian spatial and spatiotemporal models that optionally allow for extreme spatial deviations through time. 'glmmfields' uses a predictive process approach with random fields implemented through a multivariate-t distribution instead of the usual multivariate normal. Sampling is conducted with 'Stan'. References: Anderson and Ward (2019) <doi:10.1002/ecy.2403>.

Maintained by Sean C. Anderson. Last updated 1 years ago.

ecology extremes spatial-analysis spatiotemporal cpp

50 stars 6.74 score 55 scripts

shah-in-boots

card:Cardiovascular Applications in Research Data

A collection of cardiovascular research datasets and analytical tools, including methods for cardiovascular procedural data, such as electrocardiography, echocardiography, and catheterization data. Additional methods exist for analysis of procedural billing codes.

Maintained by Anish S. Shah. Last updated 2 months ago.

cardiology research

3 stars 6.73 score 163 scripts

s3alfisc

fwildclusterboot:Fast Wild Cluster Bootstrap Inference for Linear Models

Implementation of fast algorithms for wild cluster bootstrap inference developed in 'Roodman et al' (2019, 'STATA' Journal, <doi:10.1177/1536867X19830877>) and 'MacKinnon et al' (2022), which makes it feasible to quickly calculate bootstrap test statistics based on a large number of bootstrap draws even for large samples. Multiple bootstrap types as described in 'MacKinnon, Nielsen & Webb' (2022) are supported. Further, 'multiway' clustering, regression weights, bootstrap weights, fixed effects and 'subcluster' bootstrapping are supported. Further, both restricted ('WCR') and unrestricted ('WCU') bootstrap are supported. Methods are provided for a variety of fitted models, including 'lm()', 'feols()' (from package 'fixest') and 'felm()' (from package 'lfe'). Additionally implements a 'heteroskedasticity-robust' ('HC1') wild bootstrap. Last, the package provides an R binding to 'WildBootTests.jl', which provides additional speed gains and functionality, including the 'WRE' bootstrap for instrumental variable models (based on models of type 'ivreg()' from package 'ivreg') and hypotheses with q > 1.

Maintained by Alexander Fischer. Last updated 2 years ago.

clustered-standard-errors linear-regression-models wild-bootstrap wild-cluster-bootstrap openblas cpp openmp

25 stars 6.69 score 109 scripts 2 dependents

gsk-biostatistics

beastt:Bayesian Evaluation, Analysis, and Simulation Software Tools for Trials

Bayesian dynamic borrowing with covariate adjustment via inverse probability weighting for simulations and data analyses in clinical trials. This makes it easy to use propensity score methods to balance covariate distributions between external and internal data.

Maintained by Christina Fillmore. Last updated 4 days ago.

cpp

3 stars 6.65 score 4 scripts

usepa

SSN2:Spatial Modeling on Stream Networks

Spatial statistical modeling and prediction for data on stream networks, including models based on in-stream distance (Ver Hoef, J.M. and Peterson, E.E. (2010) <DOI:10.1198/jasa.2009.ap08248>). Models are created using moving average constructions. Spatial linear models, including explanatory variables, can be fit with (restricted) maximum likelihood. Mapping and other graphical functions are included.

Maintained by Michael Dumelle. Last updated 7 months ago.

19 stars 6.61 score 36 scripts 2 dependents

tidymodels

plsmod:Model Wrappers for Projection Methods

Bindings for additional regression models for use with the 'parsnip' package, including ordinary and sparse partial least squares models for regression and classification (Rohart et al (2017) <doi:10.1371/journal.pcbi.1005752>).

Maintained by Max Kuhn. Last updated 5 months ago.

mixomics

14 stars 6.47 score 59 scripts 1 dependents

mattheaphy

actxps:Create Actuarial Experience Studies: Prepare Data, Summarize Results, and Create Reports

Experience studies are used by actuaries to explore historical experience across blocks of business and to inform assumption setting activities. This package provides functions for preparing data, creating studies, visualizing results, and beginning assumption development. Experience study methods, including exposure calculations, are described in: Atkinson & McGarry (2016) "Experience Study Calculations" <https://www.soa.org/49378a/globalassets/assets/files/research/experience-study-calculations.pdf>. The limited fluctuation credibility method used by the 'exp_stats()' function is described in: Herzog (1999, ISBN:1-56698-374-6) "Introduction to Credibility Theory".

Maintained by Matt Heaphy. Last updated 3 months ago.

14 stars 6.38 score 23 scripts

stocnet

migraph:Univariate and Multivariate Tests for Multimodal and Other Networks

A set of tools for testing networks. It includes functions for univariate and multivariate conditional uniform graph and quadratic assignment procedure testing, and network regression. The package is a complement to 'Multimodal Political Networks' (2021, ISBN:9781108985000), and includes various datasets used in the book. Built on the 'manynet' package, all functions operate with matrices, edge lists, and 'igraph', 'network', and 'tidygraph' objects, and on one-mode and two-mode (bipartite) networks.

Maintained by James Hollway. Last updated 4 months ago.

igraph multilevel-networks multimodal-network network-analysis sna

41 stars 6.37 score 33 scripts

nt-williams

lmtp:Non-Parametric Causal Effects of Feasible Interventions Based on Modified Treatment Policies

Non-parametric estimators for causal effects based on longitudinal modified treatment policies as described in Diaz, Williams, Hoffman, and Schenck <doi:10.1080/01621459.2021.1955691>, traditional point treatment, and traditional longitudinal effects. Continuous, binary, categorical treatments, and multivariate treatments are allowed as well as censored outcomes. The treatment mechanism is estimated via a density ratio classification procedure irrespective of treatment variable type. For both continuous and binary outcomes, additive treatment effects can be calculated and relative risks and odds ratios may be calculated for binary outcomes. Supports survival outcomes with competing risks (Diaz, Hoffman, and Hejazi; <doi:10.1007/s10985-023-09606-7>).

Maintained by Nicholas Williams. Last updated 22 days ago.

causal-inference censored-data longitudinal-data machine-learning modified-treatment-policy nonparametric-statistics precision-medicine robust-statistics statistics stochastic-interventions survival-analysis targeted-learning

64 stars 6.37 score 91 scripts

cmstatr

cmstatr:Statistical Methods for Composite Material Data

An implementation of the statistical methods commonly used for advanced composite materials in aerospace applications. This package focuses on calculating basis values (lower tolerance bounds) for material strength properties, as well as performing the associated diagnostic tests. This package provides functions for calculating basis values assuming several different distributions, as well as providing functions for non-parametric methods of computing basis values. Functions are also provided for testing the hypothesis that there is no difference between strength and modulus data from an alternate sample and that from a "qualification" or "baseline" sample. For a discussion of these statistical methods and their use, see the Composite Materials Handbook, Volume 1 (2012, ISBN: 978-0-7680-7811-4). Additional details about this package are available in the paper by Kloppenborg (2020, <doi:10.21105/joss.02265>).

Maintained by Stefan Kloppenborg. Last updated 11 days ago.

composite-material-data data materials-science statistical-analysis statistics

4 stars 6.36 score 23 scripts

stevenpawley

Rsagacmd:Linking R with the Open-Source 'SAGA-GIS' Software

Provides an R scripting interface to the open-source 'SAGA-GIS' (System for Automated Geoscientific Analyses Geographical Information System) software. 'Rsagacmd' dynamically generates R functions for every 'SAGA-GIS' geoprocessing tool based on the user's currently installed 'SAGA-GIS' version. These functions are contained within an S3 object and are accessed as a named list of libraries and tools. This structure facilitates an easier scripting experience by organizing the large number of 'SAGA-GIS' geoprocessing tools (>700) by their respective library. Interactive scripting can fully take advantage of code autocompletion tools (e.g. in 'RStudio'), allowing for each tool's syntax to be quickly recognized. Furthermore, the most common types of spatial data (via the 'terra', 'sp', and 'sf' packages) along with non-spatial data are automatically passed from R to the 'SAGA-GIS' command line tool for geoprocessing operations, and the results are loaded as the appropriate R object. Outputs from individual 'SAGA-GIS' tools can also be chained using pipes from the 'magrittr' and 'dplyr' packages to combine complex geoprocessing operations together in a single statement. 'SAGA-GIS' is available under a GPLv2 / LGPLv2 licence from <https://sourceforge.net/projects/saga-gis/> including Windows x86/x64 and macOS binaries. SAGA-GIS is also included in Debian/Ubuntu default software repositories. Rsagacmd has currently been tested on 'SAGA-GIS' versions from 2.3.1 to 9.5.1 on Windows, Linux and macOS.

Maintained by Steven Pawley. Last updated 7 months ago.

32 stars 6.27 score 77 scripts

vivianalobo

lnmixsurv:Bayesian Mixture Log-Normal Survival Model

Bayesian survival models via a mixture of log-normal distributions extend the well-known survival models, accommodate different behaviour over time and consider higher censored survival times. The proposal combines mixture distributions, Fruhwirth-Schnatter (2006) <doi:10.1007/s11336-009-9121-4>, and data augmentation techniques, Tanner and Wong (1987) <doi:10.1080/01621459.1987.10478458>.

Maintained by Victor Hugo Soares Ney. Last updated 24 days ago.

gsl openblas cpp

2 stars 6.16 score 18 scripts

s3alfisc

summclust:Module to Compute Influence and Leverage Statistics for Regression Models with Clustered Errors

Module to compute cluster-specific information for regression models with clustered errors, including leverage and influence statistics. Models of type 'lm' and 'fixest' (from the 'stats' and 'fixest' packages) are supported. 'summclust' implements features similar to the user-written 'summclust.ado' Stata module (MacKinnon, Nielsen & Webb, 2022; <arXiv:2205.03288v1>).

Maintained by Alexander Fischer. Last updated 2 years ago.

clustered-standard-errors fixest linear-regression robust-inference

6 stars 6.16 score 53 scripts 3 dependents

ipd-tools

ipd:Inference on Predicted Data

Performs valid statistical inference on predicted data (IPD) using recent methods, where for a subset of the data, the outcomes have been predicted by an algorithm. Provides a wrapper function with specified defaults for the type of model and method to be used for estimation and inference. Further provides methods for tidying and summarizing results. Salerno et al., (2024) <doi:10.48550/arXiv.2410.09665>.

Maintained by Stephen Salerno. Last updated 3 months ago.

8 stars 6.13 score 5 scripts

mattblackwell

DirectEffects:Estimating Controlled Direct Effects for Explaining Causal Findings

A set of functions to estimate the controlled direct effect of treatment fixing a potential mediator to a specific value. Implements the sequential g-estimation estimator described in Vansteelandt (2009) <doi:10.1097/EDE.0b013e3181b6f4c9> and Acharya, Blackwell, and Sen (2016) <doi:10.1017/S0003055416000216> and the telescope matching estimator described in Blackwell and Strezhnev (2020) <doi:10.1111/rssa.12759>.

Maintained by Matthew Blackwell. Last updated 1 months ago.

18 stars 6.09 score 17 scripts

pachadotdev

capybara:Fast and Memory Efficient Fitting of Linear Models with High-Dimensional Fixed Effects

Fast and user-friendly estimation of generalized linear models with multiple fixed effects and clustered standard errors. The method to obtain the estimated fixed-effects coefficients is based on Stammann (2018) <doi:10.48550/arXiv.1707.01815> and Gaure (2013) <doi:10.1016/j.csda.2013.03.024>.

Maintained by Mauricio Vargas Sepulveda. Last updated 5 days ago.

cpp11 econometrics linear-models openblas cpp openmp

13 stars 6.07 score

yjunechoe

jlmerclusterperm:Cluster-Based Permutation Analysis for Densely Sampled Time Data

An implementation of fast cluster-based permutation analysis (CPA) for densely-sampled time data developed in Maris & Oostenveld, 2007 <doi:10.1016/j.jneumeth.2007.03.024>. Supports (generalized, mixed-effects) regression models for the calculation of timewise statistics. Provides both a wholesale and a piecemeal interface to the CPA procedure with an emphasis on interpretability and diagnostics. Integrates 'Julia' libraries 'MixedModels.jl' and 'GLM.jl' for performance improvements, with additional functionalities for interfacing with 'Julia' from 'R' powered by the 'JuliaConnectoR' package.

Maintained by June Choe. Last updated 19 days ago.

cluster-based-permutation-test eeg eyetracking mixed-effects-models timeseries

13 stars 5.86 score 14 scripts

evolecolgroup

tidypopgen:Tidy Population Genetics

We provide a tidy grammar of population genetics, facilitating the manipulation and analysis of data on biallelic single nucleotide polymorphisms (SNPs). `tidypopgen` scales to very large genetic datasets by storing genotypes on disk, and performing operations on them in chunks, without ever loading all data in memory.

Maintained by Andrea Manica. Last updated 8 days ago.

openblas zlib cpp openmp

4 stars 5.84 score 8 scripts

berrij

profoc:Probabilistic Forecast Combination Using CRPS Learning

Combine probabilistic forecasts using CRPS learning algorithms proposed in Berrisch, Ziel (2021) <doi:10.48550/arXiv.2102.00968> <doi:10.1016/j.jeconom.2021.11.008>. The package implements multiple online learning algorithms like Bernstein online aggregation; see Wintenberger (2014) <doi:10.48550/arXiv.1404.1356>. Quantile regression is also implemented for comparison purposes. Model parameters can be tuned automatically with respect to the loss of the forecast combination. Methods like predict(), update(), plot() and print() are available for convenience. This package utilizes the optim C++ library for numeric optimization <https://github.com/kthohr/optim>.

Maintained by Jonathan Berrisch. Last updated 6 months ago.

openblas cpp openmp

14 stars 5.74 score 13 scripts

acoppock

ri2:Randomization Inference for Randomized Experiments

Randomization inference procedures for simple and complex randomized designs, including multi-armed trials, as described in Gerber and Green (2012, ISBN: 978-0393979954). Users formally describe their randomization procedure and test statistic. The randomization distribution of the test statistic under some null hypothesis is efficiently simulated.

Maintained by Alexander Coppock. Last updated 3 years ago.

12 stars 5.69 score 82 scripts

vpnsctl

mixpoissonreg:Mixed Poisson Regression for Overdispersed Count Data

Fits mixed Poisson regression models (Poisson-Inverse Gaussian or Negative-Binomial) on data sets with response variables being count data. The models can have varying precision parameter, where a linear regression structure (through a link function) is assumed to hold on the precision parameter. The Expectation-Maximization algorithm for both these models (Poisson Inverse Gaussian and Negative Binomial) is an important contribution of this package. Another important feature of this package is the set of functions to perform global and local influence analysis. See Barreto-Souza and Simas (2016) <doi:10.1007/s11222-015-9601-6> for further details.

Maintained by Alexandre B. Simas. Last updated 4 years ago.

count-data diagnostics influence-analysis local-influence negative-binomial-regression poisson-inverse-gaussian-regression

3 stars 5.44 score 23 scripts

beanumber

tidychangepoint:A Tidy Framework for Changepoint Detection Analysis

Changepoint detection algorithms for R are widespread but have different interfaces and reporting conventions. This makes the comparative analysis of results difficult. We solve this problem by providing a tidy, unified interface for several different changepoint detection algorithms. We also provide consistent numerical and graphical reporting leveraging the 'broom' and 'ggplot2' packages.

Maintained by Benjamin S. Baumer. Last updated 2 months ago.

2 stars 5.30 score 8 scripts

sachsmc

stdReg2:Regression Standardization for Causal Inference

Contains more modern tools for causal inference using regression standardization. Four general classes of models are implemented: generalized linear models, conditional generalized estimating equation models, Cox proportional hazards models, and shared frailty gamma-Weibull models. Methodological details are described in Sjölander, A. (2016) <doi:10.1007/s10654-016-0157-3>. Also includes functionality for doubly robust estimation for generalized linear models in some special cases, and the ability to implement custom models.

Maintained by Michael C Sachs. Last updated 8 days ago.

2 stars 5.15 score 9 scripts

poissonconsulting

bboutools:Boreal Caribou Survival, Recruitment and Population Growth

Estimates annual survival, recruitment and population growth for boreal caribou populations using Bayesian and Maximum Likelihood models with fixed and random effects.

Maintained by Seb Dalgarno. Last updated 2 months ago.

1 stars 5.11 score 13 scripts 2 dependents

opisthokonta

chainbinomial:Chain Binomial Models for Analysis of Infectious Disease Data

Implements the chain binomial model for analysis of infectious disease data. Contains functions for calculating probabilities of the final size of infectious disease outbreaks using the method from D. Ludwig (1975) <doi:10.1016/0025-5564(75)90119-4> and for outbreaks that are not concluded, from Lindstrøm et al. (2024) <doi:10.48550/arXiv.2403.03948>. The package also contains methods for estimation and regression analysis of secondary attack rates.

Maintained by Jonas Christoffer Lindstrøm. Last updated 2 months ago.

5.00 score 5 scripts

korap

RKorAPClient:'KorAP' Web Service Client Package

A client package that makes the 'KorAP' web service API accessible from R. The corpus analysis platform 'KorAP' has been developed as a scientific tool to make potentially large, stratified and multiply annotated corpora, such as the 'German Reference Corpus DeReKo' or the 'Corpus of the Contemporary Romanian Language CoRoLa', accessible for linguists to let them verify hypotheses and to find interesting patterns in real language use. The 'RKorAPClient' package provides access to 'KorAP' and the corpora behind it for user-created R code, as a programmatic alternative to the 'KorAP' web user-interface. You can learn more about 'KorAP' and use it directly on 'DeReKo' at <https://korap.ids-mannheim.de/>.

Maintained by Marc Kupietz. Last updated 28 days ago.

6 stars 4.81 score 30 scripts

talegari

tidyrules:Utilities to Retrieve Rulelists from Model Fits, Filter, Prune, Reorder and Predict on Unseen Data

Provides a framework to work with decision rules. Rules can be extracted from supported models, augmented with (custom) metrics using validation data, manipulated using standard dataframe operations, reordered and pruned based on a metric, and used to predict on unseen (test) data. Utilities include creating a rulelist manually, exporting a rulelist as a SQL case statement, and so on. The package offers two classes, rulelist and ruleset, based on dataframe.

Maintained by Srikanth Komala Sheshachala. Last updated 2 months ago.

11 stars 4.75 score 17 scripts

njtierney

mmcc:tidy mcmc.list using data.table

Tidy up, diagnose, and visualise your mcmc samples quickly and easily so you can get on with your analysis.

Maintained by Nicholas Tierney. Last updated 3 years ago.

24 stars 4.68 score 10 scripts

hriebl

lmls:Gaussian Location-Scale Regression

The Gaussian location-scale regression model is a multi-predictor model with explanatory variables for the mean (= location) and the standard deviation (= scale) of a response variable. This package implements maximum likelihood and Markov chain Monte Carlo (MCMC) inference (using algorithms from Girolami and Calderhead (2011) <doi:10.1111/j.1467-9868.2010.00765.x> and Nesterov (2009) <doi:10.1007/s10107-007-0149-x>), a parametric bootstrap algorithm, and diagnostic plots for the model class.

Maintained by Hannes Riebl. Last updated 5 months ago.

3 stars 4.65 score 15 scripts

poissonconsulting

embr:Model Builder Utility Functions and Virtual Classes

Utility functions and virtual classes shared by model builder packages such as tmbr, jmbr and smbr.

Maintained by Joe Thorley. Last updated 2 months ago.

analyses mbr

3 stars 4.61 score 4 scripts 3 dependents

chjackson

disbayes:Bayesian Multi-State Modelling of Chronic Disease Burden Data

Estimation of incidence and case fatality for a chronic disease, given partial information, using a multi-state model. Given data on age-specific mortality and either incidence or prevalence, Bayesian inference is used to estimate the posterior distributions of incidence, case fatality, and functions of these such as prevalence. The methods are described in Jackson et al. (2023) <doi:10.1093/jrsssa/qnac015>.

Maintained by Christopher Jackson. Last updated 1 year ago.

cpp

7 stars 4.54 score 10 scripts

shah-in-boots

rmdl:A Causality-Informed Modeling Approach

A system for describing and manipulating the many models that are generated in causal inference and data analysis projects, as based on the causal theory and criteria of Austin Bradford Hill (1965) <doi:10.1177/003591576505800503>. This system includes the addition of formal attributes that modify base `R` objects, including terms and formulas, with a focus on variable roles in the "do-calculus" of modeling, as described in Pearl (2010) <doi:10.2202/1557-4679.1203>. For example, the definition of exposure, outcome, and interaction are implicit in the roles variables take in a formula. These premises allow for a more fluent modeling approach focusing on variable relationships, and assessing effect modification, as described by VanderWeele and Robins (2007) <doi:10.1097/EDE.0b013e318127181b>. The essential goal is to help contextualize formulas and models in causality-oriented workflows.

Maintained by Anish S. Shah. Last updated 10 months ago.

epidemiology modeling statistics

4.54 score 7 scripts

fndemarqui

survstan:Fitting Survival Regression Models via 'Stan'

Parametric survival regression models under the maximum likelihood approach via 'Stan'. Implemented regression models include accelerated failure time (AFT) models, proportional hazards (PH) models, proportional odds (PO) models, accelerated hazard (AH) models, Yang and Prentice (YP) models, and extended hazard (EH) models. Available baseline survival distributions include exponential, Weibull, log-normal, log-logistic, gamma, generalized gamma, Rayleigh, Gompertz and fatigue (Birnbaum-Saunders) distributions. The baseline survival distribution can be further modeled using a Bernstein polynomial approximation of the baseline hazard function. References: Lawless (2002) <ISBN:9780471372158>; Bennett (1982) <doi:10.1002/sim.4780020223>; Chen and Wang (2000) <doi:10.1080/01621459.2000.10474236>; Demarqui and Mayrink (2021) <doi:10.1214/20-BJPS471>.

Maintained by Fabio Demarqui. Last updated 7 months ago.

cpp

5 stars 4.50 score 63 scripts

mattblackwell

factiv:Instrumental Variables Estimation for 2^k Factorial Experiments

Implements instrumental variable estimators for 2^K factorial experiments with noncompliance.

Maintained by Matthew Blackwell. Last updated 3 years ago.

3 stars 4.18 score 6 scripts

data-wise

RMediation:Mediation Analysis Confidence Intervals

We provide functions to compute confidence intervals for a well-defined nonlinear function of the model parameters (e.g., product of k coefficients) in single-level and multilevel structural equation models. It also computes a chi-square test statistic for a function of indirect effects. 'Tofighi', D. and 'MacKinnon', D. P. (2011). 'RMediation': An R package for mediation analysis confidence intervals. Behavior Research Methods, 43, 692-700. <doi:10.3758/s13428-011-0076-x>. 'Tofighi', D. (2020). Bootstrap Model-Based Constrained Optimization Tests of Indirect Effects. Frontiers in Psychology, 10, 2989. <doi:10.3389/fpsyg.2019.02989>.

Maintained by Davood Tofighi. Last updated 1 year ago.

causal-inference confidence-intervals likelihood-ratio-test mediation mediation-analysis

1 star 4.10 score 25 scripts

yjunechoe

jlme:Regression Modelling with 'GLM.jl' and 'MixedModels.jl' in 'Julia'

Bindings to 'Julia' packages 'GLM.jl' <doi:10.5281/zenodo.3376013> and 'MixedModels.jl' <doi:10.5281/zenodo.12575371>, powered by 'JuliaConnectoR'. Fits (generalized) linear (mixed-effects) regression models in 'Julia' using familiar model fitting syntax from R. Offers 'broom'-style data frame summary functionalities for 'Julia' regression models.

Maintained by June Choe. Last updated 4 days ago.

1 star 3.98 score 6 scripts

openpharma

roxylint:Lint 'roxygen2'-Generated Documentation

Provides formatting linting to 'roxygen2' tags. Linters report 'roxygen2' tags that do not conform to a standard style. These linters can be a helpful check for building more consistent documentation and to provide reminders about best practices or checks for typos. Default linting suites are provided for common style guides such as the one followed by the 'tidyverse', though custom linters can be registered by other packages or be custom-tailored to a specific package.

Maintained by Doug Kelkhoff. Last updated 1 year ago.

linter roxygen2

17 stars 3.93 score

dmolitor

bolasso:Model Consistent Lasso Estimation Through the Bootstrap

Implements the bolasso algorithm for consistent variable selection and estimation accuracy. Includes support for many parallel backends via the future package. For details see: Bach (2008), 'Bolasso: model consistent Lasso estimation through the bootstrap', <doi:10.48550/arXiv.0804.1302>.

Maintained by Daniel Molitor. Last updated 3 months ago.

bolasso bootstrap lasso variable-selection

4 stars 3.90 score 7 scripts

ycroissant

mhurdle:Multiple Hurdle Tobit Models

Estimation of models with dependent variable left-censored at zero. Null values may be caused by a selection process Cragg (1971) <doi:10.2307/1909582>, insufficient resources Tobin (1958) <doi:10.2307/1907382>, or infrequency of purchase Deaton and Irish (1984) <doi:10.1016/0047-2727(84)90067-7>.

Maintained by Yves Croissant. Last updated 9 months ago.

fortran

3.88 score 15 scripts

grantmcdermott

ritest:Randomisation Inference Testing

An experimental port of the `ritest` Stata routine by Simon Heß. Fast and user-friendly. Aims to support a variety of model classes once it is fully baked.

Maintained by . Last updated 3 years ago.

10 stars 3.70 score 7 scripts

psychelzh

cpmr:Connectome Predictive Modelling in R

Connectome Predictive Modelling (CPM) (Shen et al. (2017) <doi:10.1038/nprot.2016.178>) is a method to predict individual differences in behaviour from brain functional connectivity. 'cpmr' provides a simple yet efficient implementation of this method.

Maintained by Liang Zhang. Last updated 6 months ago.

1 star 3.65 score 4 scripts

njtierney

broomstick:Convert Decision Tree Objects into Tidy Data Frames

Convert decision tree objects into tidy data frames using the framework laid out by the package broom. This means that decision tree output can be easily reshaped, processed, and combined with tools like 'dplyr', 'tidyr' and 'ggplot2'. Like the package broom, broomstick provides three S3 generics: tidy, which summarises decision-tree-specific features (it returns the variable importance table); augment, which adds columns to the original data such as predictions and residuals; and glance, which provides a one-row summary of model-level statistics.

Maintained by Nicholas Tierney. Last updated 1 year ago.

broom decision-trees gbm machine-learning randomforest rpart statistical-learning

29 stars 3.59 score 27 scripts

paithiov909

shikakusphere:Miscellaneous Functions for Japanese Mahjong

A collection of miscellaneous functions for Japanese mahjong that wraps C++ sources of 'shanten-number' <https://github.com/tomohxx/shanten-number> and 'cmajiang' <https://github.com/TadaoYamaoka/cmajiang>.

Maintained by Akiru Kato. Last updated 29 days ago.

mahjong rcpp cpp

4 stars 3.41 score 5 scripts

pasturm

bfsl:Best-Fit Straight Line

How to fit a straight line through a set of points with errors in both coordinates? The 'bfsl' package implements the York regression (York, 2004 <doi:10.1119/1.1632486>). It provides unbiased estimates of the intercept, slope and standard errors for the best-fit straight line to independent points with (possibly correlated) normally distributed errors in both x and y. Other commonly used errors-in-variables methods, such as orthogonal distance regression, geometric mean regression or Deming regression are special cases of the 'bfsl' solution.

Maintained by Patrick Sturm. Last updated 3 years ago.

3 stars 3.18 score 10 scripts

nk027

BVARverse:Tidy Bayesian Vector Autoregression

Functions to prepare tidy objects from estimated models via 'BVAR' (see Kuschnig & Vashold, 2019 <doi:10.13140/RG.2.2.25541.60643>) and visualisation thereof. Bridges the gap between estimating models with 'BVAR' and plotting the results in a more sophisticated way with 'ggplot2' as well as passing them on in a tidy format.

Maintained by Lukas Vashold. Last updated 5 years ago.

bayesian data-science vector-autoregressions

2 stars 3.00 score 7 scripts

njtierney

yahtsee:Yet Another Hierarchical Time Series Extension and Expansion

An opinionated approach to building hierarchical time series models in R using INLA and inlabru.

Maintained by Nicholas Tierney. Last updated 3 years ago.

2 stars 3.00 score 8 scripts

rdinnager

fibre:Fast Evolutionary Trait Modelling on Phylogenies using Branch Regression Models

Implements Phylogenetic Branch Regression models which allow for flexible and versatile models of evolution along a phylogeny. The model can be used to detect shifts in rates of evolution along branches. The model uses a continuous and linear model structure and so can be easily combined with other non-phylogenetic statistical structures, as long as they are implemented using the R package INLA. One major use of this is to condition on phylogeny in a standard regression between two traits, thus 'accounting' for phylogenetic structure in the response variable, similar to how pgls is used but allowing for a more flexible phylogenetic model. This also allows the phylogenetic model to be combined with the spatial models that INLA excels at (and with comparable flexibility to those spatial models).

Maintained by Russell Dinnage. Last updated 4 months ago.

3 stars 2.71 score 34 scripts

sciviews

exploreit:Exploratory Data Analysis for 'SciViews::R'

Multivariate analysis and data exploration for the 'SciViews::R' dialect.

Maintained by Philippe Grosjean. Last updated 11 months ago.

multivariate-analysis sciviews statistical-methods

2.70 score 4 scripts

graemeblair

rdss:Companion Datasets and Functions for Research Design in the Social Sciences

Helper functions to accompany Blair, Coppock, and Humphreys (2022), "Research Design in the Social Sciences: Declaration, Diagnosis, and Redesign" <https://book.declaredesign.org>. 'rdss' includes datasets, helper functions, and plotting components to enable use and replication of the book.

Maintained by Graeme Blair. Last updated 3 months ago.

2.64 score 29 scripts

shixiangwang

sigminer.prediction:Train and Predict Cancer Subtype with Keras Model based on Mutational Signatures

Mutational signatures represent mutational processes that occurred in cancer evolution, and are thus stable genetic resources for subtyping. This tool provides functions for training neural network models, based on the 'keras' and 'sigminer' packages, to predict the subtype a sample belongs to.

Maintained by Shixiang Wang. Last updated 3 years ago.

keras mutational-signatures prostate-cancer sigminer

8 stars 2.60 score 2 scripts

bklamer

rankdifferencetest:Kornbrot's Rank Difference Test

Implements Kornbrot's rank difference test as described in <doi:10.1111/j.2044-8317.1990.tb00939.x>. This method is a modified Wilcoxon signed-rank test which produces consistent and meaningful results for ordinal or monotonically-transformed data.

Maintained by Brett Klamer. Last updated 6 months ago.

2.18 score 4 scripts

eric-hunt

htce:A set of internal tools for managing high-throughput assay data at NEB

What the package does (one paragraph).

Maintained by Eric Hunt. Last updated 10 months ago.

1.00 score