Showing 120 of 120 results
tidymodels
broom: Convert Statistical Objects into Tidy Tibbles
Summarizes key information about statistical objects in tidy tibbles. This makes it easy to report results, create plots and consistently work with large numbers of models at once. Broom provides three verbs that each provide different types of information about a model. tidy() summarizes information about model components such as coefficients of a regression. glance() reports information about an entire model, such as goodness of fit measures like AIC and BIC. augment() adds information about individual observations to a dataset, such as fitted values or influence measures.
Maintained by Simon Couch. Last updated 1 day ago.
1.5k stars 21.58 score 37k scripts 1.5k dependents
tidymodels
recipes: Preprocessing and Feature Engineering Steps for Modeling
A recipe prepares your data for modeling. We provide an extensible framework for pipeable sequences of feature engineering steps that provide preprocessing tools to be applied to data. Statistical parameters for the steps can be estimated from an initial data set and then applied to other data sets. The resulting processed output can then be used as inputs for statistical or machine learning models.
Maintained by Max Kuhn. Last updated 19 hours ago.
586 stars 18.80 score 7.2k scripts 383 dependents
juliasilge
tidytext: Text Mining using 'dplyr', 'ggplot2', and Other Tidy Tools
Using tidy data principles can make many text mining tasks easier, more effective, and consistent with tools already in wide use. Much of the infrastructure needed for text mining with tidy data frames already exists in packages like 'dplyr', 'broom', 'tidyr', and 'ggplot2'. In this package, we provide functions and supporting data sets to allow conversion of text to and from tidy formats, and to switch seamlessly between tidy tools and existing text mining packages.
Maintained by Julia Silge. Last updated 12 months ago.
natural-language-processing, text-mining, tidy-data, tidyverse
1.2k stars 16.86 score 17k scripts 61 dependents
tidymodels
rsample: General Resampling Infrastructure
Classes and functions to create and summarize different types of resampling objects (e.g. bootstrap, cross-validation).
Maintained by Hannah Frick. Last updated 18 days ago.
341 stars 16.72 score 5.2k scripts 79 dependents
amices
mice: Multivariate Imputation by Chained Equations
Multiple imputation using Fully Conditional Specification (FCS) implemented by the MICE algorithm as described in Van Buuren and Groothuis-Oudshoorn (2011) <doi:10.18637/jss.v045.i03>. Each variable has its own imputation model. Built-in imputation models are provided for continuous data (predictive mean matching, normal), binary data (logistic regression), unordered categorical data (polytomous logistic regression) and ordered categorical data (proportional odds). MICE can also impute continuous two-level data (normal model, pan, second-level variables). Passive imputation can be used to maintain consistency between variables. Various diagnostic plots are available to inspect the quality of the imputations.
Maintained by Stef van Buuren. Last updated 1 day ago.
chained-equations, fcs, imputation, mice, missing-data, missing-values, multiple-imputation, multivariate-data, cpp
462 stars 16.64 score 10k scripts 154 dependents
tidymodels
parsnip: A Common API to Modeling and Analysis Functions
A common interface is provided to allow users to specify a model without having to remember the different argument names across different functions or computational engines (e.g. 'R', 'Spark', 'Stan', 'H2O', etc).
Maintained by Max Kuhn. Last updated 17 days ago.
612 stars 16.37 score 3.4k scripts 69 dependents
mhahsler
dbscan: Density-Based Spatial Clustering of Applications with Noise (DBSCAN) and Related Algorithms
A fast reimplementation of several density-based algorithms of the DBSCAN family. Includes the clustering algorithms DBSCAN (density-based spatial clustering of applications with noise) and HDBSCAN (hierarchical DBSCAN), the ordering algorithm OPTICS (ordering points to identify the clustering structure), shared nearest neighbor clustering, and the outlier detection algorithms LOF (local outlier factor) and GLOSH (global-local outlier score from hierarchies). The implementations use the kd-tree data structure (from library ANN) for faster k-nearest neighbor search. An R interface to fast kNN and fixed-radius NN search is also provided. Hahsler, Piekenbrock and Doran (2019) <doi:10.18637/jss.v091.i01>.
Maintained by Michael Hahsler. Last updated 2 months ago.
clustering, dbscan, density-based-clustering, hdbscan, lof, optics, cpp
324 stars 15.60 score 1.6k scripts 85 dependents
tidymodels
yardstick: Tidy Characterizations of Model Performance
Tidy tools for quantifying how well a model fits a data set, such as confusion matrices, class probability curve summaries, and regression metrics (e.g., RMSE).
Maintained by Emil Hvitfeldt. Last updated 17 days ago.
387 stars 15.47 score 2.2k scripts 60 dependents
kassambara
rstatix: Pipe-Friendly Framework for Basic Statistical Tests
Provides a simple and intuitive pipe-friendly framework, coherent with the 'tidyverse' design philosophy, for performing basic statistical tests, including t-test, Wilcoxon test, ANOVA, Kruskal-Wallis and correlation analyses. The output of each test is automatically transformed into a tidy data frame to facilitate visualization. Additional functions are available for reshaping, reordering, manipulating and visualizing correlation matrices. Functions are also included to facilitate the analysis of factorial experiments, including purely 'within-Ss' designs (repeated measures), purely 'between-Ss' designs, and mixed 'within-and-between-Ss' designs. It's also possible to compute several effect size metrics, including "eta squared" for ANOVA, "Cohen's d" for t-test and 'Cramer V' for the association between categorical variables. The package contains helper functions for identifying univariate and multivariate outliers, assessing normality and homogeneity of variances.
Maintained by Alboukadel Kassambara. Last updated 2 years ago.
458 stars 15.27 score 11k scripts 432 dependents
bbolker
broom.mixed: Tidying Methods for Mixed Models
Convert fitted objects from various R mixed-model packages into tidy data frames along the lines of the 'broom' package. The package provides three S3 generics for each model: tidy(), which summarizes a model's statistical findings such as coefficients of a regression; augment(), which adds columns to the original data such as predictions, residuals and cluster assignments; and glance(), which provides a one-row summary of model-level statistics.
Maintained by Ben Bolker. Last updated 6 days ago.
230 stars 15.22 score 4.0k scripts 37 dependents
sparklyr
sparklyr: R Interface to Apache Spark
R interface to Apache Spark, a fast and general engine for big data processing, see <https://spark.apache.org/>. This package supports connecting to local and remote Apache Spark clusters, provides a 'dplyr' compatible back-end, and provides an interface to Spark's built-in machine learning algorithms.
Maintained by Edgar Ruiz. Last updated 11 days ago.
apache-spark, distributed, dplyr, ide, livy, machine-learning, remote-clusters, spark, sparklyr
959 stars 15.20 score 4.0k scripts 21 dependents
vincentarelbundock
marginaleffects: Predictions, Comparisons, Slopes, Marginal Means, and Hypothesis Tests
Compute and plot predictions, slopes, marginal means, and comparisons (contrasts, risk ratios, odds, etc.) for over 100 classes of statistical and machine learning models in R. Conduct linear and non-linear hypothesis tests, or equivalence tests. Calculate uncertainty estimates using the delta method, bootstrapping, or simulation-based inference. Details can be found in Arel-Bundock, Greifer, and Heiss (2024) <doi:10.18637/jss.v111.i09>.
Maintained by Vincent Arel-Bundock. Last updated 1 hour ago.
511 stars 14.57 score 1.8k scripts 10 dependents
jacob-long
jtools: Analysis and Presentation of Social Scientific Data
This is a collection of tools for more efficiently understanding and sharing the results of (primarily) regression analyses. There are also a number of miscellaneous functions for statistical and programming purposes. Support for models produced by the survey and lme4 packages is a point of emphasis.
Maintained by Jacob A. Long. Last updated 7 months ago.
167 stars 14.48 score 4.0k scripts 14 dependents
hojsgaard
pbkrtest: Parametric Bootstrap, Kenward-Roger and Satterthwaite Based Methods for Test in Mixed Models
Computes p-values based on (a) Satterthwaite or Kenward-Roger degrees-of-freedom methods and (b) parametric bootstrap for mixed effects models as implemented in the 'lme4' package. Implements a parametric bootstrap test for generalized linear mixed models as implemented in 'lme4' and generalized linear models. The package is documented in the paper by Halekoh and Højsgaard, (2012, <doi:10.18637/jss.v059.i09>). Please see 'citation("pbkrtest")' for citation details.
Maintained by Søren Højsgaard. Last updated 23 days ago.
5 stars 14.36 score 648 scripts 915 dependents
r-lib
generics: Common S3 Generics not Provided by Base R Methods Related to Model Fitting
In order to reduce potential package dependencies and conflicts, generics provides a number of commonly used S3 generics.
Maintained by Hadley Wickham. Last updated 1 year ago.
61 stars 14.00 score 131 scripts 9.8k dependents
hughjonesd
huxtable: Easily Create and Style Tables for LaTeX, HTML and Other Formats
Creates styled tables for data presentation. Export to HTML, LaTeX, RTF, 'Word', 'Excel', and 'PowerPoint'. Simple, modern interface to manipulate borders, size, position, captions, colours, text styles and number formatting. Table cells can span multiple rows and/or columns. Includes a 'huxreg' function for creation of regression tables, and 'quick_*' one-liners to print data to a new document.
Maintained by David Hugh-Jones. Last updated 25 days ago.
html, huxtable, latex, microsoft-word, powerpoint, reproducible-research, tables
323 stars 13.93 score 1.9k scripts 16 dependents
vincentarelbundock
modelsummary: Summary Tables and Plots for Statistical Models and Data: Beautiful, Customizable, and Publication-Ready
Create beautiful and customizable tables to summarize several statistical models side-by-side. Draw coefficient plots, multi-level cross-tabs, dataset summaries, balance tables (a.k.a. "Table 1s"), and correlation matrices. This package supports dozens of statistical models, and it can produce tables in HTML, LaTeX, Word, Markdown, PDF, PowerPoint, Excel, RTF, JPG, or PNG. Tables can easily be embedded in 'Rmarkdown' or 'knitr' dynamic documents. Details can be found in Arel-Bundock (2022) <doi:10.18637/jss.v103.i01>.
Maintained by Vincent Arel-Bundock. Last updated 28 days ago.
926 stars 13.41 score 6.2k scripts 2 dependents
chjackson
flexsurv: Flexible Parametric Survival and Multi-State Models
Flexible parametric models for time-to-event data, including the Royston-Parmar spline model, generalized gamma and generalized F distributions. Any user-defined parametric distribution can be fitted, given at least an R function defining the probability density or hazard. There are also tools for fitting and predicting from fully parametric multi-state models, based on either cause-specific hazards or mixture models.
Maintained by Christopher Jackson. Last updated 2 months ago.
57 stars 13.31 score 632 scripts 43 dependents
tidyverts
fabletools: Core Tools for Packages in the 'fable' Framework
Provides tools, helpers and data structures for developing models and time series functions for 'fable' and extension packages. These tools support a consistent and tidy interface for time series modelling and analysis.
Maintained by Mitchell O'Hara-Wild. Last updated 2 months ago.
91 stars 12.18 score 396 scripts 18 dependents
openpharma
mmrm: Mixed Models for Repeated Measures
Mixed models for repeated measures (MMRM) are a popular choice for analyzing longitudinal continuous outcomes in randomized clinical trials and beyond; see Cnaan, Laird and Slasor (1997) <doi:10.1002/(SICI)1097-0258(19971030)16:20%3C2349::AID-SIM667%3E3.0.CO;2-E> for a tutorial and Mallinckrodt, Lane, Schnell, Peng and Mancuso (2008) <doi:10.1177/009286150804200402> for a review. This package implements MMRM based on the marginal linear model without random effects using Template Model Builder ('TMB') which enables fast and robust model fitting. Users can specify a variety of covariance matrices, weight observations, fit models with restricted or standard maximum likelihood inference, perform hypothesis testing with Satterthwaite or Kenward-Roger adjustment, and extract least square means estimates by using 'emmeans'.
Maintained by Daniel Sabanes Bove. Last updated 22 days ago.
138 stars 12.15 score 113 scripts 4 dependents
declaredesign
estimatr: Fast Estimators for Design-Based Inference
Fast procedures for a small set of commonly-used, design-appropriate estimators with robust standard errors and confidence intervals. Includes estimators for linear regression, instrumental variables regression, difference-in-means, Horvitz-Thompson estimation, and regression improving precision of experimental estimates by interacting treatment with centered pre-treatment covariates introduced by Lin (2013) <doi:10.1214/12-AOAS583>.
Maintained by Graeme Blair. Last updated 2 months ago.
133 stars 11.58 score 1.7k scripts 11 dependents
jacob-long
interactions: Comprehensive, User-Friendly Toolkit for Probing Interactions
A suite of functions for conducting and interpreting analysis of statistical interaction in regression models that was formerly part of the 'jtools' package. Functionality includes visualization of two- and three-way interactions among continuous and/or categorical variables as well as calculation of "simple slopes" and Johnson-Neyman intervals (see e.g., Bauer & Curran, 2005 <doi:10.1207/s15327906mbr4003_5>). These capabilities are implemented for generalized linear models in addition to the standard linear regression context.
Maintained by Jacob A. Long. Last updated 8 months ago.
interactions, moderation, social-sciences, statistics
131 stars 11.40 score 1.2k scripts 5 dependents
tidymodels
tidypredict: Run Predictions Inside the Database
It parses a fitted 'R' model object, and returns a formula in 'Tidy Eval' code that calculates the predictions. It works with several database back-ends because it leverages 'dplyr' and 'dbplyr' for the final 'SQL' translation of the algorithm. It currently supports lm(), glm(), randomForest(), ranger(), earth(), xgb.Booster.complete(), cubist(), and ctree() models.
Maintained by Emil Hvitfeldt. Last updated 3 months ago.
262 stars 11.05 score 241 scripts 2 dependents
pbs-assess
sdmTMB: Spatial and Spatiotemporal SPDE-Based GLMMs with 'TMB'
Implements spatial and spatiotemporal GLMMs (Generalized Linear Mixed Effect Models) using 'TMB', 'fmesher', and the SPDE (Stochastic Partial Differential Equation) Gaussian Markov random field approximation to Gaussian random fields. One common application is for spatially explicit species distribution models (SDMs). See Anderson et al. (2024) <doi:10.1101/2022.03.24.485545>.
Maintained by Sean C. Anderson. Last updated 18 hours ago.
ecology, glmm, spatial-analysis, species-distribution-modelling, tmb, cpp
205 stars 11.04 score 848 scripts 1 dependents
tidymodels
textrecipes: Extra 'Recipes' for Text Processing
Converting text to numerical features requires specifically created procedures, which are implemented as steps according to the 'recipes' package. These steps allow for tokenization, filtering, counting (tf and tfidf) and feature hashing.
Maintained by Emil Hvitfeldt. Last updated 10 days ago.
160 stars 10.86 score 964 scripts 1 dependents
hojsgaard
geepack: Generalized Estimating Equation Package
Generalized estimating equations solver for parameters in mean, scale, and correlation structures, through mean link, scale link, and correlation link. Can also handle clustered categorical responses. See e.g. Halekoh and Højsgaard, (2005, <doi:10.18637/jss.v015.i02>), for details.
Maintained by Søren Højsgaard. Last updated 8 months ago.
1 star 10.57 score 1.7k scripts 43 dependents
matthieustigler
tsDyn: Nonlinear Time Series Models with Regime Switching
Implements nonlinear autoregressive (AR) time series models. For univariate series, a non-parametric approach is available through additive nonlinear AR. Parametric modeling and testing for regime switching dynamics is available when the transition is either direct (TAR: threshold AR) or smooth (STAR: smooth transition AR, LSTAR). For multivariate series, one can estimate a range of TVAR or threshold cointegration TVECM models with two or three regimes. Tests can be conducted for TVAR as well as for TVECM (Hansen and Seo 2002 and Seo 2006).
Maintained by Matthieu Stigler. Last updated 5 months ago.
34 stars 10.53 score 684 scripts 3 dependents
tidymodels
themis: Extra Recipes Steps for Dealing with Unbalanced Data
A dataset with an uneven number of cases in each class is said to be unbalanced. Many models produce subpar performance on unbalanced datasets. A dataset can be balanced by increasing the number of minority cases using SMOTE 2011 <doi:10.48550/arXiv.1106.1813>, BorderlineSMOTE 2005 <doi:10.1007/11538059_91> and ADASYN 2008 <https://ieeexplore.ieee.org/document/4633969>, or by decreasing the number of majority cases using NearMiss 2003 <https://www.site.uottawa.ca/~nat/Workshop2003/jzhang.pdf> or Tomek link removal 1976 <https://ieeexplore.ieee.org/document/4309452>.
Maintained by Emil Hvitfeldt. Last updated 2 months ago.
143 stars 10.37 score 1.3k scripts 2 dependents
atsa-es
MARSS: Multivariate Autoregressive State-Space Modeling
The MARSS package provides maximum-likelihood parameter estimation for constrained and unconstrained linear multivariate autoregressive state-space (MARSS) models, including partially deterministic models. MARSS models are a class of dynamic linear model (DLM) and vector autoregressive (VAR) model. Fitting is available via Expectation-Maximization (EM), BFGS (using optim), and 'TMB' (using the 'marssTMB' companion package). Functions are provided for parametric and innovations bootstrapping, Kalman filtering and smoothing, model selection criteria including bootstrap AICb, confidence intervals via the Hessian approximation or bootstrapping, and all conditional residual types. See the user guide for examples of dynamic factor analysis, dynamic linear models, outlier and shock detection, and multivariate AR-p models. Online workshops (lectures, eBook, and computer labs) at <https://atsa-es.github.io/>.
Maintained by Elizabeth Eli Holmes. Last updated 1 year ago.
multivariate-timeseries, state-space-models, statistics, time-series
52 stars 10.34 score 596 scripts 3 dependents
bcgov
ssdtools: Species Sensitivity Distributions
Species sensitivity distributions are cumulative probability distributions which are fitted to toxicity concentrations for different species as described by Posthuma et al. (2001) <isbn:9781566705783>. The ssdtools package uses Maximum Likelihood to fit distributions such as the gamma, log-logistic, log-normal and log-normal log-normal mixture. Multiple distributions can be averaged using Akaike Information Criteria. Confidence intervals on hazard concentrations and proportions are produced by bootstrapping.
Maintained by Joe Thorley. Last updated 1 month ago.
ecotoxicology, env, species-sensitivity-distribution, cpp
33 stars 10.33 score 111 scripts 5 dependents
darwin-eu
omopgenerics: Methods and Classes for the OMOP Common Data Model
Provides definitions of core classes and methods used by analytic pipelines that query the OMOP (Observational Medical Outcomes Partnership) common data model.
Maintained by Martí Català. Last updated 22 days ago.
9.97 score 193 scripts 16 dependents
nicholasjclark
mvgam: Multivariate (Dynamic) Generalized Additive Models
Fit Bayesian Dynamic Generalized Additive Models to multivariate observations. Users can build nonlinear State-Space models that can incorporate semiparametric effects in observation and process components, using a wide range of observation families. Estimation is performed using Markov Chain Monte Carlo with Hamiltonian Monte Carlo in the software 'Stan'. References: Clark & Wells (2023) <doi:10.1111/2041-210X.13974>.
Maintained by Nicholas J Clark. Last updated 12 hours ago.
bayesian-statistics, dynamic-factor-models, ecological-modelling, forecasting, gaussian-process, generalised-additive-models, generalized-additive-models, joint-species-distribution-modelling, multilevel-models, multivariate-timeseries, stan, time-series-analysis, timeseries, vector-autoregression, vectorautoregression, cpp
148 stars 9.92 score 117 scripts
tidymodels
rules: Model Wrappers for Rule-Based Models
Bindings for additional models for use with the 'parsnip' package. Models include prediction rule ensembles (Friedman and Popescu, 2008) <doi:10.1214/07-AOAS148>, C5.0 rules (Quinlan, 1992 ISBN: 1558602380), and Cubist (Kuhn and Johnson, 2013) <doi:10.1007/978-1-4614-6849-3>.
Maintained by Emil Hvitfeldt. Last updated 5 months ago.
40 stars 9.52 score 20k scripts 1 dependents
stemangiola
tidyseurat: Brings Seurat to the Tidyverse
It creates an invisible layer that allows the 'Seurat' object to be viewed as a tibble and to interact seamlessly with the tidyverse.
Maintained by Stefano Mangiola. Last updated 8 months ago.
assaydomain, infrastructure, rnaseq, differentialexpression, geneexpression, normalization, clustering, qualitycontrol, sequencing, transcription, transcriptomics, dplyr, ggplot2, pca, purrr, sct, seurat, single-cell, single-cell-rna-seq, tibble, tidyr, tidyverse, transcripts, tsne, umap
159 stars 9.48 score 398 scripts 1 dependents
tidymodels
embed: Extra Recipes for Encoding Predictors
Predictors can be converted to one or more numeric representations using a variety of methods. Effect encodings using simple generalized linear models <doi:10.48550/arXiv.1611.09477> or nonlinear models <doi:10.48550/arXiv.1604.06737> can be used. There are also functions for dimension reduction and other approaches.
Maintained by Emil Hvitfeldt. Last updated 2 months ago.
142 stars 9.35 score 1.1k scripts
otoomet
maxLik: Maximum Likelihood Estimation and Related Tools
Functions for Maximum Likelihood (ML) estimation, non-linear optimization, and related tools. It includes a unified way to call different optimizers, and classes and methods to handle the results from the Maximum Likelihood viewpoint. It also includes a number of convenience tools for testing and developing your own models.
Maintained by Ott Toomet. Last updated 1 year ago.
9.14 score 480 scripts 110 dependents
jhelvy
logitr: Logit Models w/Preference & WTP Space Utility Parameterizations
Fast estimation of multinomial (MNL) and mixed logit (MXL) models in R. Models can be estimated using "Preference" space or "Willingness-to-pay" (WTP) space utility parameterizations. Weighted models can also be estimated. An option is available to run a parallelized multistart optimization loop with random starting points in each iteration, which is useful for non-convex problems like MXL models or models with WTP space utility parameterizations. The main optimization loop uses the 'nloptr' package to minimize the negative log-likelihood function. Additional functions are available for computing and comparing WTP from both preference space and WTP space models and for predicting expected choices and choice probabilities for sets of alternatives based on an estimated model. Mixed logit models can include uncorrelated or correlated heterogeneity covariances and are estimated using maximum simulated likelihood based on the algorithms in Train (2009) <doi:10.1017/CBO9780511805271>. More details can be found in Helveston (2023) <doi:10.18637/jss.v105.i10>.
Maintained by John Helveston. Last updated 5 months ago.
log-likelihood, logit, logit-model, mixed-logit, mlogit, multinomial-regression, mxl, mxl-models, preference-space, preferences, willingness-to-pay, wtp
54 stars 9.10 score 119 scripts 1 dependents
graemeleehickey
joineRML: Joint Modelling of Multivariate Longitudinal Data and Time-to-Event Outcomes
Fits the joint model proposed by Henderson and colleagues (2000) <doi:10.1093/biostatistics/1.4.465>, but extended to the case of multiple continuous longitudinal measures. The time-to-event data is modelled using a Cox proportional hazards regression model with time-varying covariates. The multiple longitudinal outcomes are modelled using a multivariate version of the Laird and Ware linear mixed model. The association is captured by a multivariate latent Gaussian process. The model is estimated using a Monte Carlo Expectation Maximization algorithm. This project was funded by the Medical Research Council (Grant number MR/M013227/1).
Maintained by Graeme L. Hickey. Last updated 2 months ago.
armadillo, biostatistics, clinical-trials, cox, dynamic, joint-models, longitudinal-data, multivariate-analysis, multivariate-data, multivariate-longitudinal-data, prediction, rcpp, regression-models, statistics, survival, openblas, cpp, openmp
30 stars 8.93 score 146 scripts 1 dependents
bioc
tidySingleCellExperiment: Brings SingleCellExperiment to the Tidyverse
'tidySingleCellExperiment' is an adapter that abstracts the 'SingleCellExperiment' container in the form of a 'tibble'. This allows *tidy* data manipulation, nesting, and plotting. For example, a 'tidySingleCellExperiment' is directly compatible with functions from 'tidyverse' packages `dplyr` and `tidyr`, as well as plotting with `ggplot2` and `plotly`. In addition, the package provides various utility functions specific to single-cell omics data analysis (e.g., aggregation of cell-level data to pseudobulks).
Maintained by Stefano Mangiola. Last updated 5 months ago.
assaydomain, infrastructure, rnaseq, differentialexpression, singlecell, geneexpression, normalization, clustering, qualitycontrol, sequencing, bioconductor, dplyr, ggplot2, plotly, single-cell-rna-seq, single-cell-sequencing, singlecellexperiment, tibble, tidyr, tidyverse
36 stars 8.86 score 125 scripts 2 dependents
charlie86
spotifyr: R Wrapper for the 'Spotify' Web API
An R wrapper for pulling data from the 'Spotify' Web API <https://developer.spotify.com/documentation/web-api/> in bulk, or posting items to a 'Spotify' user's playlist.
Maintained by Daniel Antal. Last updated 5 months ago.
music-information-retrieval, spotify
375 stars 8.61 score 936 scripts
tidymodels
tidyposterior: Bayesian Analysis to Compare Models using Resampling Statistics
Bayesian analysis is used here to answer the question: "when looking at resampling results, are the differences between models 'real'?" To answer this, a model can be created where the performance statistic is the resampling statistic (e.g. accuracy or RMSE). These values are explained by the model types. In doing this, we can get parameter estimates for each model's effect on performance and make statistical (and practical) comparisons between models. The methods included here are similar to Benavoli et al (2017) <https://jmlr.org/papers/v18/16-305.html>.
Maintained by Max Kuhn. Last updated 5 months ago.
102 stars 8.44 score 273 scripts
bioc
tidySummarizedExperiment: Brings SummarizedExperiment to the Tidyverse
The tidySummarizedExperiment package provides a set of tools for creating and manipulating tidy data representations of SummarizedExperiment objects. SummarizedExperiment is a widely used data structure in bioinformatics for storing high-throughput genomic data, such as gene expression or DNA sequencing data. The tidySummarizedExperiment package introduces a tidy framework for working with SummarizedExperiment objects. It allows users to convert their data into a tidy format, where each observation is a row and each variable is a column. This tidy representation simplifies data manipulation, integration with other tidyverse packages, and enables seamless integration with the broader ecosystem of tidy tools for data analysis.
Maintained by Stefano Mangiola. Last updated 5 months ago.
assaydomain, infrastructure, rnaseq, differentialexpression, geneexpression, normalization, clustering, qualitycontrol, sequencing, transcription, transcriptomics
26 stars 8.44 score 196 scripts 1 dependents
declaredesign
DeclareDesign: Declare and Diagnose Research Designs
Researchers can characterize and learn about the properties of research designs before implementation using `DeclareDesign`. Ex ante declaration and diagnosis of designs can help researchers clarify the strengths and limitations of their designs and improve their properties, and can help readers evaluate a research strategy prior to implementation and without access to results. It can also make it easier for designs to be shared, replicated, and critiqued.
Maintained by Graeme Blair. Last updated 2 months ago.
101 stars 8.42 score 398 scripts 1 dependents
radiant-rstats
radiant.data: Data Menu for Radiant: Business Analytics using R and Shiny
The Radiant Data menu includes interfaces for loading, saving, viewing, visualizing, summarizing, transforming, and combining data. It also contains functionality to generate reproducible reports of the analyses conducted in the application.
Maintained by Vincent Nijs. Last updated 5 months ago.
53 stars 8.25 score 146 scripts 6 dependents
robinhankin
permutations: The Symmetric Group: Permutations of a Finite Set
Manipulates invertible functions from a finite set to itself. Can transform from word form to cycle form and back. To cite the package in publications please use Hankin (2020) "Introducing the permutations R package", SoftwareX, volume 11 <doi:10.1016/j.softx.2020.100453>.
Maintained by Robin K. S. Hankin. Last updated 2 months ago.
6 stars 8.23 score 49 scripts 2 dependents
darwin-eu
DrugUtilisation: Summarise Patient-Level Drug Utilisation in Data Mapped to the OMOP Common Data Model
Summarise patient-level drug utilisation cohorts using data mapped to the Observational Medical Outcomes Partnership (OMOP) common data model. New-user and prevalent-user cohorts can be generated and their characteristics, indication and drug use summarised.
Maintained by Martí Català. Last updated 2 months ago.
8.20 score 156 scripts 2 dependents
henrikbengtsson
R.rsp: Dynamic Generation of Scientific Reports
The RSP markup language makes any text-based document come alive. RSP provides a powerful markup for controlling the content and output of LaTeX, HTML, Markdown, AsciiDoc, Sweave and knitr documents (and more), e.g. 'Today's date is <%=Sys.Date()%>'. Contrary to many other literate programming languages, with RSP it is straightforward to loop over mixtures of code and text sections, e.g. in month-by-month summaries. RSP also has several preprocessing directives for incorporating static and dynamic contents of external files (local or online) among other things. Functions rstring() and rcat() make it easy to process RSP strings, rsource() sources an RSP file as if it were an R script, while rfile() compiles it (even online) into its final output format, e.g. rfile('report.tex.rsp') generates 'report.pdf' and rfile('report.md.rsp') generates 'report.html'. RSP is ideal for self-contained scientific reports and R package vignettes. It's easy to use - if you know how to write an R script, you'll be up and running within minutes.
Maintained by Henrik Bengtsson. Last updated 1 year ago.
document, markup, report, reproducibility, science
31 stars 8.06 score 36 scripts 9 dependents
darwin-eu
CohortCharacteristics: Summarise and Visualise Characteristics of Patients in the OMOP CDM
Summarise and visualise the characteristics of patients in data mapped to the Observational Medical Outcomes Partnership (OMOP) common data model (CDM).
Maintained by Marti Catala. Last updated 4 months ago.
1 star 8.03 score 111 scripts 1 dependents
robinhankin
hyper2: The Hyperdirichlet Distribution, Mark 2
A suite of routines for the hyperdirichlet distribution and reified Bradley-Terry; supersedes the 'hyperdirichlet' package; uses 'disordR' discipline <doi:10.48550/ARXIV.2210.03856>. To cite in publications please use Hankin 2017 <doi:10.32614/rj-2017-061>, and for Generalized Plackett-Luce likelihoods use Hankin 2024 <doi:10.18637/jss.v109.i08>.
Maintained by Robin K. S. Hankin. Last updated 6 hours ago.
5 stars 7.91 score 38 scripts 1 dependents
stocnet
goldfish:Statistical Network Models for Dynamic Network Data
Tools for fitting statistical network models to dynamic network data. Can be used for fitting both dynamic network actor models ('DyNAMs') and relational event models ('REMs'). Stadtfeld, Hollway, and Block (2017a) <doi:10.1177/0081175017709295>, Stadtfeld, Hollway, and Block (2017b) <doi:10.1177/0081175017733457>, Stadtfeld and Block (2017) <doi:10.15195/v4.a14>, Hoffman et al. (2020) <doi:10.1017/nws.2020.3>.
Maintained by Alvaro Uzaheta. Last updated 7 months ago.
dynam, network-modelling, rem, statistical-network-analysis, openblas, cpp, openmp
61 stars 7.91 score 44 scripts
darwin-eu
visOmopResults:Graphs and Tables for OMOP Results
Provides methods to transform omop_result objects into formatted tables and figures, facilitating the visualisation of study results working with the Observational Medical Outcomes Partnership (OMOP) Common Data Model.
Maintained by Núria Mercadé-Besora. Last updated 8 days ago.
7.89 score 53 scripts 3 dependents
openpharma
crmPack:Object-Oriented Implementation of CRM Designs
Implements a wide range of model-based dose escalation designs, ranging from classical and modern continual reassessment methods (CRMs) based on dose-limiting toxicity endpoints to dual-endpoint designs taking into account a biomarker/efficacy outcome. The focus is on Bayesian inference, making it very easy to set up a new design with its own JAGS code. However, it is also possible to implement 3+3 designs for comparison or models with non-Bayesian estimation. The whole package is written in a modular form in the S4 class system, making it very flexible for adaptation to new models, escalation or stopping rules. Further details are presented in Sabanes Bove et al. (2019) <doi:10.18637/jss.v089.i10>.
Maintained by Daniel Sabanes Bove. Last updated 2 months ago.
21 stars 7.76 score 208 scripts
ellessenne
rsimsum:Analysis of Simulation Studies Including Monte Carlo Error
Summarise results from simulation studies and compute Monte Carlo standard errors of commonly used summary statistics. This package is modelled on the 'simsum' user-written command in 'Stata' (White I.R., 2010 <https://www.stata-journal.com/article.html?article=st0200>), further extending it with additional performance measures and functionality.
Maintained by Alessandro Gasparini. Last updated 11 months ago.
biostatistics, monte-carlo-error, simulation, simulation-study, simulations, statistics
28 stars 7.70 score 148 scripts
usepa
spmodel:Spatial Statistical Modeling and Prediction
Fit, summarize, and predict for a variety of spatial statistical models applied to point-referenced and areal (lattice) data. Parameters are estimated using various methods. Additional modeling features include anisotropy, non-spatial random effects, partition factors, big data approaches, and more. Model-fit statistics are used to summarize, visualize, and compare models. Predictions at unobserved locations are readily obtainable. For additional details, see Dumelle et al. (2023) <doi:10.1371/journal.pone.0282524>.
Maintained by Michael Dumelle. Last updated 17 days ago.
15 stars 7.66 score 112 scripts 3 dependents
poissonconsulting
mcmcr:Manipulate MCMC Samples
Functions and classes to store, manipulate and summarise Markov chain Monte Carlo (MCMC) samples. For more information see Brooks et al. (2011) <isbn:978-1-4200-7941-8>.
Maintained by Joe Thorley. Last updated 2 months ago.
17 stars 7.66 score 111 scripts 10 dependents
bioc
AlpsNMR:Automated spectraL Processing System for NMR
Reads Bruker NMR data directories both zipped and unzipped. It provides automated and efficient signal processing for untargeted NMR metabolomics. It is able to interpolate the samples, detect outliers, exclude regions, normalize, detect peaks, align the spectra, integrate peaks, manage metadata and visualize the spectra. After spectra processing, it can apply multivariate analysis on extracted data. Efficient plotting with 1-D data is also available. Basic reading of 1D ACD/Labs exported JDX samples is also available.
Maintained by Sergio Oller Moreno. Last updated 5 months ago.
software, preprocessing, visualization, classification, cheminformatics, metabolomics, dataimport
15 stars 7.59 score 12 scripts 1 dependents
bayesiandemography
bage:Bayesian Estimation and Forecasting of Age-Specific Rates
Fast Bayesian estimation and forecasting of age-specific rates, probabilities, and means, based on 'Template Model Builder'.
Maintained by John Bryant. Last updated 12 days ago.
3 stars 7.41 score 39 scripts
tommyjones
tidylda:Latent Dirichlet Allocation Using 'tidyverse' Conventions
Implements an algorithm for Latent Dirichlet Allocation (LDA), Blei et al. (2003) <https://www.jmlr.org/papers/volume3/blei03a/blei03a.pdf>, using style conventions from the 'tidyverse', Wickham et al. (2019) <doi:10.21105/joss.01686>, and 'tidymodels', Kuhn et al. <https://tidymodels.github.io/model-implementation-principles/>. Fitting is done via collapsed Gibbs sampling. Also implements several novel features for LDA such as guided models and transfer learning.
Maintained by Tommy Jones. Last updated 2 months ago.
41 stars 7.36 score 53 scripts
corybrunson
ordr:A Tidyverse Extension for Ordinations and Biplots
Ordination comprises several multivariate exploratory and explanatory techniques with theoretical foundations in geometric data analysis; see Podani (2000, ISBN:90-5782-067-6) for techniques and applications and Le Roux & Rouanet (2005) <doi:10.1007/1-4020-2236-0> for foundations. Greenacre (2010, ISBN:978-84-923846) shows how the most established of these, including principal components analysis, correspondence analysis, multidimensional scaling, factor analysis, and discriminant analysis, rely on eigen-decompositions or singular value decompositions of pre-processed numeric matrix data. These decompositions give rise to a set of shared coordinates along which the row and column elements can be measured. The overlay of their scatterplots on these axes, introduced by Gabriel (1971) <doi:10.1093/biomet/58.3.453>, is called a biplot. 'ordr' provides inspection, extraction, manipulation, and visualization tools for several popular ordination classes supported by a set of recovery methods. It is inspired by and designed to integrate into 'tidyverse' workflows provided by Wickham et al (2019) <doi:10.21105/joss.01686>.
Maintained by Jason Cory Brunson. Last updated 26 days ago.
biplot, data-visualization, dimension-reduction, geometric-data-analysis, grammar-of-graphics, log-ratio-analysis, multivariate-analysis, multivariate-statistics, ordination, tidymodels, tidyverse
24 stars 7.26 score 28 scripts
tidymodels
poissonreg:Model Wrappers for Poisson Regression
Bindings for Poisson regression models for use with the 'parsnip' package. Models include simple generalized linear models, Bayesian models, and zero-inflated Poisson models (Zeileis, Kleiber, and Jackman (2008) <doi:10.18637/jss.v027.i08>).
Maintained by Hannah Frick. Last updated 5 months ago.
22 stars 7.26 score 342 scripts 1 dependents
poissonconsulting
nlist:Lists of Numeric Atomic Objects
Create and manipulate numeric list ('nlist') objects. An 'nlist' is an S3 list of uniquely named numeric objects. A numeric object is an integer or double vector, matrix or array. An 'nlists' object is an S3 class list of 'nlist' objects with the same names, dimensionalities and typeofs. Numeric list objects are of interest because they are the raw data inputs for analytic engines such as 'JAGS', 'STAN' and 'TMB'. Numeric list objects, which are useful for storing multiple realizations of simulated data sets, can be converted to coda::mcmc and coda::mcmc.list objects.
Maintained by Joe Thorley. Last updated 2 months ago.
6 stars 7.23 score 13 scripts 12 dependents
insightsengineering
tern.mmrm:Tables and Graphs for Mixed Models for Repeated Measures (MMRM)
Mixed models for repeated measures (MMRM) are a popular choice for analyzing longitudinal continuous outcomes in randomized clinical trials and beyond; see for example Cnaan, Laird and Slasor (1997) <doi:10.1002/(SICI)1097-0258(19971030)16:20%3C2349::AID-SIM667%3E3.0.CO;2-E>. This package provides an interface for fitting MMRM within the 'tern' <https://cran.r-project.org/package=tern> framework by Zhu et al. (2023) and for tabulating results easily using 'rtables' <https://cran.r-project.org/package=rtables> by Becker et al. (2023). It builds on 'mmrm' <https://cran.r-project.org/package=mmrm> by Sabanés Bové et al. (2023) for the actual MMRM computations.
Maintained by Joe Zhu. Last updated 6 months ago.
graphs, listings, statistical-engineering, tables
6 stars 7.23 score 8 scripts 1 dependents
tidymodels
tidyclust:A Common API to Clustering
A common interface to specifying clustering models, in the same style as 'parsnip'. Creates a unified interface across different functions and computational engines.
Maintained by Emil Hvitfeldt. Last updated 2 months ago.
112 stars 7.21 score 139 scripts
robjhyndman
vital:Tidy Analysis Tools for Mortality, Fertility, Migration and Population Data
Analysing vital statistics based on tools consistent with the tidyverse. Tools are provided for data visualization, life table calculations, computing net migration numbers, Lee-Carter modelling, and functional data modelling and forecasting.
Maintained by Rob Hyndman. Last updated 2 days ago.
28 stars 7.20 score 18 scripts
mskcc-epi-bio
tidycmprsk:Competing Risks Estimation
Provides an intuitive interface for working with competing risk endpoints. The package wraps the 'cmprsk' package, and exports functions for univariate cumulative incidence estimates and competing risk regression. Methods follow those introduced in Fine and Gray (1999) <doi:10.1002/sim.7501>.
Maintained by Daniel D. Sjoberg. Last updated 8 months ago.
23 stars 7.06 score 157 scripts 1 dependents
tidymodels
agua:'tidymodels' Integration with 'h2o'
Create and evaluate models using 'tidymodels' and 'h2o' <https://h2o.ai/>. The package enables users to specify 'h2o' as an engine for several modeling methods.
Maintained by Qiushi Yan. Last updated 10 months ago.
22 stars 6.88 score 80 scripts
kvasilopoulos
exuber:Econometric Analysis of Explosive Time Series
Testing for and dating periods of explosive dynamics (exuberance) in time series using the univariate and panel recursive unit root tests proposed by Phillips et al. (2015) <doi:10.1111/iere.12132> and Pavlidis et al. (2016) <doi:10.1007/s11146-015-9531-2>. The recursive least-squares algorithm utilizes the matrix inversion lemma to avoid matrix inversion, which results in significant speed improvements. Also supports simulation of a variety of periodically-collapsing bubble processes. Details can be found in Vasilopoulos et al. (2022) <doi:10.18637/jss.v103.i10>.
Maintained by Kostas Vasilopoulos. Last updated 1 year ago.
dickey-fuller, explosive-dynamics, simulation, time-series, openblas, cpp
29 stars 6.83 score 77 scripts
seananderson
glmmfields:Generalized Linear Mixed Models with Robust Random Fields for Spatiotemporal Modeling
Implements Bayesian spatial and spatiotemporal models that optionally allow for extreme spatial deviations through time. 'glmmfields' uses a predictive process approach with random fields implemented through a multivariate-t distribution instead of the usual multivariate normal. Sampling is conducted with 'Stan'. References: Anderson and Ward (2019) <doi:10.1002/ecy.2403>.
Maintained by Sean C. Anderson. Last updated 1 year ago.
ecology, extremes, spatial-analysis, spatiotemporal, cpp
50 stars 6.74 score 55 scripts
shah-in-boots
card:Cardiovascular Applications in Research Data
A collection of cardiovascular research datasets and analytical tools, including methods for cardiovascular procedural data, such as electrocardiography, echocardiography, and catheterization data. Additional methods exist for analysis of procedural billing codes.
Maintained by Anish S. Shah. Last updated 2 months ago.
3 stars 6.73 score 163 scripts
s3alfisc
fwildclusterboot:Fast Wild Cluster Bootstrap Inference for Linear Models
Implementation of fast algorithms for wild cluster bootstrap inference developed in 'Roodman et al' (2019, 'STATA' Journal, <doi:10.1177/1536867X19830877>) and 'MacKinnon et al' (2022), which makes it feasible to quickly calculate bootstrap test statistics based on a large number of bootstrap draws even for large samples. Multiple bootstrap types as described in 'MacKinnon, Nielsen & Webb' (2022) are supported. Further, 'multiway' clustering, regression weights, bootstrap weights, fixed effects and 'subcluster' bootstrapping are supported. Further, both restricted ('WCR') and unrestricted ('WCU') bootstrap are supported. Methods are provided for a variety of fitted models, including 'lm()', 'feols()' (from package 'fixest') and 'felm()' (from package 'lfe'). Additionally implements a 'heteroskedasticity-robust' ('HC1') wild bootstrap. Last, the package provides an R binding to 'WildBootTests.jl', which provides additional speed gains and functionality, including the 'WRE' bootstrap for instrumental variable models (based on models of type 'ivreg()' from package 'ivreg') and hypotheses with q > 1.
Maintained by Alexander Fischer. Last updated 2 years ago.
clustered-standard-errors, linear-regression-models, wild-bootstrap, wild-cluster-bootstrap, openblas, cpp, openmp
25 stars 6.69 score 109 scripts 2 dependents
gsk-biostatistics
beastt:Bayesian Evaluation, Analysis, and Simulation Software Tools for Trials
Bayesian dynamic borrowing with covariate adjustment via inverse probability weighting for simulations and data analyses in clinical trials. This makes it easy to use propensity score methods to balance covariate distributions between external and internal data.
Maintained by Christina Fillmore. Last updated 4 days ago.
3 stars 6.65 score 4 scripts
usepa
SSN2:Spatial Modeling on Stream Networks
Spatial statistical modeling and prediction for data on stream networks, including models based on in-stream distance (Ver Hoef, J.M. and Peterson, E.E. (2010) <DOI:10.1198/jasa.2009.ap08248>). Models are created using moving average constructions. Spatial linear models, including explanatory variables, can be fit with (restricted) maximum likelihood. Mapping and other graphical functions are included.
Maintained by Michael Dumelle. Last updated 7 months ago.
19 stars 6.61 score 36 scripts 2 dependents
tidymodels
plsmod:Model Wrappers for Projection Methods
Bindings for additional regression models for use with the 'parsnip' package, including ordinary and sparse partial least squares models for regression and classification (Rohart et al (2017) <doi:10.1371/journal.pcbi.1005752>).
Maintained by Max Kuhn. Last updated 5 months ago.
14 stars 6.47 score 59 scripts 1 dependents
mattheaphy
actxps:Create Actuarial Experience Studies: Prepare Data, Summarize Results, and Create Reports
Experience studies are used by actuaries to explore historical experience across blocks of business and to inform assumption setting activities. This package provides functions for preparing data, creating studies, visualizing results, and beginning assumption development. Experience study methods, including exposure calculations, are described in: Atkinson & McGarry (2016) "Experience Study Calculations" <https://www.soa.org/49378a/globalassets/assets/files/research/experience-study-calculations.pdf>. The limited fluctuation credibility method used by the 'exp_stats()' function is described in: Herzog (1999, ISBN:1-56698-374-6) "Introduction to Credibility Theory".
Maintained by Matt Heaphy. Last updated 3 months ago.
14 stars 6.38 score 23 scripts
stocnet
migraph:Univariate and Multivariate Tests for Multimodal and Other Networks
A set of tools for testing networks. It includes functions for univariate and multivariate conditional uniform graph and quadratic assignment procedure testing, and network regression. The package is a complement to 'Multimodal Political Networks' (2021, ISBN:9781108985000), and includes various datasets used in the book. Built on the 'manynet' package, all functions operate with matrices, edge lists, and 'igraph', 'network', and 'tidygraph' objects, and on one-mode and two-mode (bipartite) networks.
Maintained by James Hollway. Last updated 4 months ago.
igraph, multilevel-networks, multimodal-network, network-analysis, sna
41 stars 6.37 score 33 scripts
nt-williams
lmtp:Non-Parametric Causal Effects of Feasible Interventions Based on Modified Treatment Policies
Non-parametric estimators for causal effects based on longitudinal modified treatment policies as described in Diaz, Williams, Hoffman, and Schenck <doi:10.1080/01621459.2021.1955691>, traditional point treatment, and traditional longitudinal effects. Continuous, binary, categorical, and multivariate treatments are allowed, as well as censored outcomes. The treatment mechanism is estimated via a density ratio classification procedure irrespective of treatment variable type. For both continuous and binary outcomes, additive treatment effects can be calculated, and relative risks and odds ratios may be calculated for binary outcomes. Supports survival outcomes with competing risks (Diaz, Hoffman, and Hejazi; <doi:10.1007/s10985-023-09606-7>).
Maintained by Nicholas Williams. Last updated 22 days ago.
causal-inference, censored-data, longitudinal-data, machine-learning, modified-treatment-policy, nonparametric-statistics, precision-medicine, robust-statistics, statistics, stochastic-interventions, survival-analysis, targeted-learning
64 stars 6.37 score 91 scripts
cmstatr
cmstatr:Statistical Methods for Composite Material Data
An implementation of the statistical methods commonly used for advanced composite materials in aerospace applications. This package focuses on calculating basis values (lower tolerance bounds) for material strength properties, as well as performing the associated diagnostic tests. This package provides functions for calculating basis values assuming several different distributions, as well as providing functions for non-parametric methods of computing basis values. Functions are also provided for testing the hypothesis that there is no difference between strength and modulus data from an alternate sample and that from a "qualification" or "baseline" sample. For a discussion of these statistical methods and their use, see the Composite Materials Handbook, Volume 1 (2012, ISBN: 978-0-7680-7811-4). Additional details about this package are available in the paper by Kloppenborg (2020, <doi:10.21105/joss.02265>).
Maintained by Stefan Kloppenborg. Last updated 11 days ago.
composite-material-data, data, materials-science, statistical-analysis, statistics
4 stars 6.36 score 23 scripts
vivianalobo
lnmixsurv:Bayesian Mixture Log-Normal Survival Model
Bayesian survival models via a mixture of log-normal distributions extend the well-known survival models, accommodate different behaviour over time, and consider higher censored survival times. The proposal combines mixture distributions, Fruhwirth-Schnatter (2006) <doi:10.1007/s11336-009-9121-4>, and data augmentation techniques, Tanner and Wong (1987) <doi:10.1080/01621459.1987.10478458>.
Maintained by Victor Hugo Soares Ney. Last updated 24 days ago.
2 stars 6.16 score 18 scripts
s3alfisc
summclust:Module to Compute Influence and Leverage Statistics for Regression Models with Clustered Errors
Module to compute cluster specific information for regression models with clustered errors, including leverage and influence statistics. Models of type 'lm' and 'fixest' (from the 'stats' and 'fixest' packages) are supported. 'summclust' implements features similar to those of the user-written 'summclust.ado' Stata module (MacKinnon, Nielsen & Webb, 2022; <arXiv:2205.03288v1>).
Maintained by Alexander Fischer. Last updated 2 years ago.
clustered-standard-errors, fixest, linear-regression, robust-inference
6 stars 6.16 score 53 scripts 3 dependents
ipd-tools
ipd:Inference on Predicted Data
Performs valid statistical inference on predicted data (IPD) using recent methods, where for a subset of the data, the outcomes have been predicted by an algorithm. Provides a wrapper function with specified defaults for the type of model and method to be used for estimation and inference. Further provides methods for tidying and summarizing results. See Salerno et al. (2024) <doi:10.48550/arXiv.2410.09665>.
Maintained by Stephen Salerno. Last updated 3 months ago.
8 stars 6.13 score 5 scripts
mattblackwell
DirectEffects:Estimating Controlled Direct Effects for Explaining Causal Findings
A set of functions to estimate the controlled direct effect of treatment fixing a potential mediator to a specific value. Implements the sequential g-estimation estimator described in Vansteelandt (2009) <doi:10.1097/EDE.0b013e3181b6f4c9> and Acharya, Blackwell, and Sen (2016) <doi:10.1017/S0003055416000216> and the telescope matching estimator described in Blackwell and Strezhnev (2020) <doi:10.1111/rssa.12759>.
Maintained by Matthew Blackwell. Last updated 1 month ago.
18 stars 6.09 score 17 scripts
pachadotdev
capybara:Fast and Memory Efficient Fitting of Linear Models with High-Dimensional Fixed Effects
Fast and user-friendly estimation of generalized linear models with multiple fixed effects and clustered standard errors. The method to obtain the estimated fixed-effects coefficients is based on Stammann (2018) <doi:10.48550/arXiv.1707.01815> and Gaure (2013) <doi:10.1016/j.csda.2013.03.024>.
Maintained by Mauricio Vargas Sepulveda. Last updated 5 days ago.
cpp11, econometrics, linear-models, openblas, cpp, openmp
13 stars 6.07 score
yjunechoe
jlmerclusterperm:Cluster-Based Permutation Analysis for Densely Sampled Time Data
An implementation of fast cluster-based permutation analysis (CPA) for densely-sampled time data developed in Maris & Oostenveld, 2007 <doi:10.1016/j.jneumeth.2007.03.024>. Supports (generalized, mixed-effects) regression models for the calculation of timewise statistics. Provides both a wholesale and a piecemeal interface to the CPA procedure with an emphasis on interpretability and diagnostics. Integrates 'Julia' libraries 'MixedModels.jl' and 'GLM.jl' for performance improvements, with additional functionalities for interfacing with 'Julia' from 'R' powered by the 'JuliaConnectoR' package.
Maintained by June Choe. Last updated 19 days ago.
cluster-based-permutation-test, eeg, eyetracking, mixed-effects-models, timeseries
13 stars 5.86 score 14 scripts
evolecolgroup
tidypopgen:Tidy Population Genetics
We provide a tidy grammar of population genetics, facilitating the manipulation and analysis of data on biallelic single nucleotide polymorphisms (SNPs). `tidypopgen` scales to very large genetic datasets by storing genotypes on disk, and performing operations on them in chunks, without ever loading all data in memory.
Maintained by Andrea Manica. Last updated 8 days ago.
4 stars 5.84 score 8 scripts
berrij
profoc:Probabilistic Forecast Combination Using CRPS Learning
Combine probabilistic forecasts using CRPS learning algorithms proposed in Berrisch and Ziel (2021) <doi:10.48550/arXiv.2102.00968> <doi:10.1016/j.jeconom.2021.11.008>. The package implements multiple online learning algorithms like Bernstein online aggregation; see Wintenberger (2014) <doi:10.48550/arXiv.1404.1356>. Quantile regression is also implemented for comparison purposes. Model parameters can be tuned automatically with respect to the loss of the forecast combination. Methods like predict(), update(), plot() and print() are available for convenience. This package utilizes the optim C++ library for numeric optimization <https://github.com/kthohr/optim>.
Maintained by Jonathan Berrisch. Last updated 6 months ago.
14 stars 5.74 score 13 scripts
acoppock
ri2:Randomization Inference for Randomized Experiments
Randomization inference procedures for simple and complex randomized designs, including multi-armed trials, as described in Gerber and Green (2012, ISBN: 978-0393979954). Users formally describe their randomization procedure and test statistic. The randomization distribution of the test statistic under some null hypothesis is efficiently simulated.
Maintained by Alexander Coppock. Last updated 3 years ago.
12 stars 5.69 score 82 scripts
vpnsctl
mixpoissonreg:Mixed Poisson Regression for Overdispersed Count Data
Fits mixed Poisson regression models (Poisson-Inverse Gaussian or Negative-Binomial) on data sets with response variables being count data. The models can have a varying precision parameter, where a linear regression structure (through a link function) is assumed to hold on the precision parameter. The Expectation-Maximization algorithm for both these models (Poisson Inverse Gaussian and Negative Binomial) is an important contribution of this package. Another important feature of this package is the set of functions to perform global and local influence analysis. See Barreto-Souza and Simas (2016) <doi:10.1007/s11222-015-9601-6> for further details.
Maintained by Alexandre B. Simas. Last updated 4 years ago.
count-data, diagnostics, influence-analysis, local-influence, negative-binomial-regression, poisson-inverse-gaussian-regression
3 stars 5.44 score 23 scripts
beanumber
tidychangepoint:A Tidy Framework for Changepoint Detection Analysis
Changepoint detection algorithms for R are widespread but have different interfaces and reporting conventions. This makes the comparative analysis of results difficult. We solve this problem by providing a tidy, unified interface for several different changepoint detection algorithms. We also provide consistent numerical and graphical reporting leveraging the 'broom' and 'ggplot2' packages.
Maintained by Benjamin S. Baumer. Last updated 2 months ago.
2 stars 5.30 score 8 scripts
sachsmc
stdReg2:Regression Standardization for Causal Inference
Contains more modern tools for causal inference using regression standardization. Four general classes of models are implemented: generalized linear models, conditional generalized estimating equation models, Cox proportional hazards models, and shared frailty gamma-Weibull models. Methodological details are described in Sjölander, A. (2016) <doi:10.1007/s10654-016-0157-3>. Also includes functionality for doubly robust estimation for generalized linear models in some special cases, and the ability to implement custom models.
Maintained by Michael C Sachs. Last updated 8 days ago.
2 stars 5.15 score 9 scripts
poissonconsulting
bboutools:Boreal Caribou Survival, Recruitment and Population Growth
Estimates annual survival, recruitment and population growth for boreal caribou populations using Bayesian and Maximum Likelihood models with fixed and random effects.
Maintained by Seb Dalgarno. Last updated 2 months ago.
1 stars 5.11 score 13 scripts 2 dependents
opisthokonta
chainbinomial:Chain Binomial Models for Analysis of Infectious Disease Data
Implements the chain binomial model for analysis of infectious disease data. Contains functions for calculating probabilities of the final size of infectious disease outbreaks using the method from D. Ludwig (1975) <doi:10.1016/0025-5564(75)90119-4> and for outbreaks that are not concluded, from Lindstrøm et al. (2024) <doi:10.48550/arXiv.2403.03948>. The package also contains methods for estimation and regression analysis of secondary attack rates.
Maintained by Jonas Christoffer Lindstrøm. Last updated 2 months ago.
5.00 score 5 scripts
korap
RKorAPClient:'KorAP' Web Service Client Package
A client package that makes the 'KorAP' web service API accessible from R. The corpus analysis platform 'KorAP' has been developed as a scientific tool to make potentially large, stratified and multiply annotated corpora, such as the 'German Reference Corpus DeReKo' or the 'Corpus of the Contemporary Romanian Language CoRoLa', accessible for linguists to let them verify hypotheses and to find interesting patterns in real language use. The 'RKorAPClient' package provides access to 'KorAP' and the corpora behind it for user-created R code, as a programmatic alternative to the 'KorAP' web user-interface. You can learn more about 'KorAP' and use it directly on 'DeReKo' at <https://korap.ids-mannheim.de/>.
Maintained by Marc Kupietz. Last updated 28 days ago.
6 stars 4.81 score 30 scripts
talegari
tidyrules:Utilities to Retrieve Rulelists from Model Fits, Filter, Prune, Reorder and Predict on Unseen Data
Provides a framework to work with decision rules. Rules can be extracted from supported models, augmented with (custom) metrics using validation data, manipulated using standard dataframe operations, reordered and pruned based on a metric, and used to predict on unseen (test) data. Utilities include creating a rulelist manually, exporting a rulelist as a SQL case statement, and so on. The package offers two classes, rulelist and ruleset, based on dataframe.
Maintained by Srikanth Komala Sheshachala. Last updated 2 months ago.
11 stars 4.75 score 17 scripts
njtierney
mmcc:tidy mcmc.list using data.table
Tidy up, diagnose, and visualise your MCMC samples quickly and easily so you can get on with your analysis.
Maintained by Nicholas Tierney. Last updated 3 years ago.
24 stars 4.68 score 10 scripts
hriebl
lmls:Gaussian Location-Scale Regression
The Gaussian location-scale regression model is a multi-predictor model with explanatory variables for the mean (= location) and the standard deviation (= scale) of a response variable. This package implements maximum likelihood and Markov chain Monte Carlo (MCMC) inference (using algorithms from Girolami and Calderhead (2011) <doi:10.1111/j.1467-9868.2010.00765.x> and Nesterov (2009) <doi:10.1007/s10107-007-0149-x>), a parametric bootstrap algorithm, and diagnostic plots for the model class.
Maintained by Hannes Riebl. Last updated 5 months ago.
3 stars 4.65 score 15 scripts
poissonconsulting
embr:Model Builder Utility Functions and Virtual Classes
Utility functions and virtual classes shared by model builder packages such as tmbr, jmbr and smbr.
Maintained by Joe Thorley. Last updated 2 months ago.
3 stars 4.61 score 4 scripts 3 dependents
chjackson
disbayes:Bayesian Multi-State Modelling of Chronic Disease Burden Data
Estimation of incidence and case fatality for a chronic disease, given partial information, using a multi-state model. Given data on age-specific mortality and either incidence or prevalence, Bayesian inference is used to estimate the posterior distributions of incidence, case fatality, and functions of these such as prevalence. The methods are described in Jackson et al. (2023) <doi:10.1093/jrsssa/qnac015>.
Maintained by Christopher Jackson. Last updated 1 year ago.
7 stars 4.54 score 10 scripts
shah-in-boots
rmdl:A Causality-Informed Modeling Approach
A system for describing and manipulating the many models that are generated in causal inference and data analysis projects, as based on the causal theory and criteria of Austin Bradford Hill (1965) <doi:10.1177/003591576505800503>. This system includes the addition of formal attributes that modify base `R` objects, including terms and formulas, with a focus on variable roles in the "do-calculus" of modeling, as described in Pearl (2010) <doi:10.2202/1557-4679.1203>. For example, the definition of exposure, outcome, and interaction are implicit in the roles variables take in a formula. These premises allow for a more fluent modeling approach focusing on variable relationships, and assessing effect modification, as described by VanderWeele and Robins (2007) <doi:10.1097/EDE.0b013e318127181b>. The essential goal is to help contextualize formulas and models in causality-oriented workflows.
Maintained by Anish S. Shah. Last updated 10 months ago.
epidemiology, modeling, statistics
4.54 score 7 scripts
mattblackwell
factiv:Instrumental Variables Estimation for 2^k Factorial Experiments
Implements instrumental variable estimators for 2^k factorial experiments with noncompliance.
Maintained by Matthew Blackwell. Last updated 3 years ago.
3 stars 4.18 score 6 scripts
data-wise
RMediation:Mediation Analysis Confidence Intervals
We provide functions to compute confidence intervals for a well-defined nonlinear function of the model parameters (e.g., product of k coefficients) in single-level and multilevel structural equation models. The package also computes a chi-square test statistic for a function of indirect effects. 'Tofighi', D. and 'MacKinnon', D. P. (2011). 'RMediation': An R package for mediation analysis confidence intervals. Behavior Research Methods, 43, 692-700. <doi:10.3758/s13428-011-0076-x>. 'Tofighi', D. (2020). Bootstrap Model-Based Constrained Optimization Tests of Indirect Effects. Frontiers in Psychology, 10, 2989. <doi:10.3389/fpsyg.2019.02989>.
Maintained by Davood Tofighi. Last updated 1 year ago.
causal-inference confidence-intervals likelihood-ratio-test mediation mediation-analysis
1 stars 4.10 score 25 scripts
yjunechoe
jlme:Regression Modelling with 'GLM.jl' and 'MixedModels.jl' in 'Julia'
Bindings to 'Julia' packages 'GLM.jl' <doi:10.5281/zenodo.3376013> and 'MixedModels.jl' <doi:10.5281/zenodo.12575371>, powered by 'JuliaConnectoR'. Fits (generalized) linear (mixed-effects) regression models in 'Julia' using familiar model fitting syntax from R. Offers 'broom'-style data frame summary functionalities for 'Julia' regression models.
Maintained by June Choe. Last updated 4 days ago.
1 stars 3.98 score 6 scripts
openpharma
roxylint:Lint 'roxygen2'-Generated Documentation
Provides formatting linting to 'roxygen2' tags. Linters report 'roxygen2' tags that do not conform to a standard style. These linters can be a helpful check for building more consistent documentation and to provide reminders about best practices or checks for typos. Default linting suites are provided for common style guides such as the one followed by the 'tidyverse', though custom linters can be registered by other packages or be custom-tailored to a specific package.
Maintained by Doug Kelkhoff. Last updated 1 year ago.
17 stars 3.93 score
dmolitor
bolasso:Model Consistent Lasso Estimation Through the Bootstrap
Implements the bolasso algorithm for consistent variable selection and estimation accuracy. Includes support for many parallel backends via the future package. For details see: Bach (2008), 'Bolasso: model consistent Lasso estimation through the bootstrap', <doi:10.48550/arXiv.0804.1302>.
Maintained by Daniel Molitor. Last updated 3 months ago.
bolasso bootstrap lasso variable-selection
4 stars 3.90 score 7 scripts
ycroissant
mhurdle:Multiple Hurdle Tobit Models
Estimation of models with dependent variable left-censored at zero. Null values may be caused by a selection process Cragg (1971) <doi:10.2307/1909582>, insufficient resources Tobin (1958) <doi:10.2307/1907382>, or infrequency of purchase Deaton and Irish (1984) <doi:10.1016/0047-2727(84)90067-7>.
Maintained by Yves Croissant. Last updated 9 months ago.
3.88 score 15 scripts
grantmcdermott
ritest:Randomisation Inference Testing
An experimental port of the `ritest` Stata routine by Simon Heß. Fast and user-friendly. Aims to support a variety of model classes once it is fully baked.
Maintained by . Last updated 3 years ago.
10 stars 3.70 score 7 scripts
psychelzh
cpmr:Connectome Predictive Modelling in R
Connectome Predictive Modelling (CPM) (Shen et al. (2017) <doi:10.1038/nprot.2016.178>) is a method to predict individual differences in behaviour from brain functional connectivity. 'cpmr' provides a simple yet efficient implementation of this method.
Maintained by Liang Zhang. Last updated 6 months ago.
1 stars 3.65 score 4 scripts
njtierney
broomstick:Convert Decision Tree Objects into Tidy Data Frames
Convert decision tree objects into tidy data frames using the framework laid out by the package broom. This means that decision tree output can be easily reshaped, processed, and combined with tools like 'dplyr', 'tidyr' and 'ggplot2'. Like the package broom, broomstick provides three S3 generics: tidy, which summarises decision-tree-specific features by returning the variable importance table; augment, which adds columns to the original data such as predictions and residuals; and glance, which provides a one-row summary of model-level statistics.
Maintained by Nicholas Tierney. Last updated 1 year ago.
broom decision-trees gbm machine-learning randomforest rpart statistical-learning
29 stars 3.59 score 27 scripts
paithiov909
shikakusphere:Miscellaneous Functions for Japanese Mahjong
A collection of miscellaneous functions for Japanese mahjong that wraps C++ sources of 'shanten-number' <https://github.com/tomohxx/shanten-number> and 'cmajiang' <https://github.com/TadaoYamaoka/cmajiang>.
Maintained by Akiru Kato. Last updated 29 days ago.
4 stars 3.41 score 5 scripts
pasturm
bfsl:Best-Fit Straight Line
How to fit a straight line through a set of points with errors in both coordinates? The 'bfsl' package implements the York regression (York, 2004 <doi:10.1119/1.1632486>). It provides unbiased estimates of the intercept, slope and standard errors for the best-fit straight line to independent points with (possibly correlated) normally distributed errors in both x and y. Other commonly used errors-in-variables methods, such as orthogonal distance regression, geometric mean regression or Deming regression are special cases of the 'bfsl' solution.
Maintained by Patrick Sturm. Last updated 3 years ago.
3 stars 3.18 score 10 scripts
nk027
BVARverse:Tidy Bayesian Vector Autoregression
Functions to prepare tidy objects from estimated models via 'BVAR' (see Kuschnig & Vashold, 2019 <doi:10.13140/RG.2.2.25541.60643>) and visualisation thereof. Bridges the gap between estimating models with 'BVAR' and plotting the results in a more sophisticated way with 'ggplot2' as well as passing them on in a tidy format.
Maintained by Lukas Vashold. Last updated 5 years ago.
bayesian data-science vector-autoregressions
2 stars 3.00 score 7 scripts
njtierney
yahtsee:Yet Another Hierarchical Time Series Extension and Expansion
An opinionated approach to building hierarchical time series models in R using INLA and inlabru.
Maintained by Nicholas Tierney. Last updated 3 years ago.
2 stars 3.00 score 8 scripts
sciviews
exploreit:Exploratory Data Analysis for 'SciViews::R'
Multivariate analysis and data exploration for the 'SciViews::R' dialect.
Maintained by Philippe Grosjean. Last updated 11 months ago.
multivariate-analysis sciviews statistical-methods
2.70 score 4 scripts
graemeblair
rdss:Companion Datasets and Functions for Research Design in the Social Sciences
Helper functions to accompany Blair, Coppock, and Humphreys (2022), "Research Design in the Social Sciences: Declaration, Diagnosis, and Redesign" <https://book.declaredesign.org>. 'rdss' includes datasets, helper functions, and plotting components to enable use and replication of the book.
Maintained by Graeme Blair. Last updated 3 months ago.
2.64 score 29 scripts
shixiangwang
sigminer.prediction:Train and Predict Cancer Subtype with Keras Model based on Mutational Signatures
Mutational signatures represent the mutational processes that occurred during cancer evolution and are therefore stable genetic resources for subtyping. This tool provides functions for training neural network models to predict the subtype a sample belongs to, based on the 'keras' and 'sigminer' packages.
Maintained by Shixiang Wang. Last updated 3 years ago.
keras mutational-signatures prostate-cancer sigminer
8 stars 2.60 score 2 scripts
bklamer
rankdifferencetest:Kornbrot's Rank Difference Test
Implements Kornbrot's rank difference test as described in <doi:10.1111/j.2044-8317.1990.tb00939.x>. This method is a modified Wilcoxon signed-rank test which produces consistent and meaningful results for ordinal or monotonically-transformed data.
Maintained by Brett Klamer. Last updated 6 months ago.
2.18 score 4 scripts
eric-hunt
htce:A set of internal tools for managing high-throughput assay data at NEB
What the package does (one paragraph).
Maintained by Eric Hunt. Last updated 10 months ago.
1.00 score