Showing 200 of 263 total results
ddsjoberg
gtsummary:Presentation-Ready Data Summary and Analytic Result Tables
Creates presentation-ready tables summarizing data sets, regression models, and more. The code to create the tables is concise and highly customizable. Data frames can be summarized with any function, e.g. mean(), median(), even user-written functions. Regression models are summarized and include the reference rows for categorical variables. Common regression models, such as logistic regression and Cox proportional hazards regression, are automatically identified and the tables are pre-filled with appropriate column headers.
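A minimal sketch of the two table types described above, assuming the trial example data that ships with gtsummary (trt, age, grade and response are columns of that data):
    library(gtsummary)
    # descriptive summary of the built-in trial data, split by treatment arm
    tbl_summary(trial[c("trt", "age", "grade")], by = trt)
    # regression table with reference rows for the categorical predictor
    fit <- glm(response ~ age + grade, data = trial, family = binomial)
    tbl_regression(fit, exponentiate = TRUE)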
Maintained by Daniel D. Sjoberg. Last updated 3 days ago.
easy-to-use, gt, html5, regression-models, reproducibility, reproducible-research, statistics, summary-statistics, summary-tables, table1, tableone
1.1k stars 17.02 score 8.2k scripts 15 dependents
sebkrantz
collapse:Advanced and Fast Data Transformation
A C/C++ based package for advanced data transformation and statistical computing in R that is extremely fast, class-agnostic, robust and programmer friendly. Core functionality includes a rich set of S3 generic grouped and weighted statistical functions for vectors, matrices and data frames, which provide efficient low-level vectorizations, OpenMP multithreading, and skip missing values by default. These are integrated with fast grouping and ordering algorithms (also callable from C), and efficient data manipulation functions. The package also provides a flexible and rigorous approach to time series and panel data in R. It further includes fast functions for common statistical procedures, detailed (grouped, weighted) summary statistics, powerful tools to work with nested data, fast data object conversions, functions for memory efficient R programming, and helpers to effectively deal with variable labels, attributes, and missing data. It is well integrated with base R classes, 'dplyr'/'tibble', 'data.table', 'sf', 'units', 'plm' (panel-series and data frames), and 'xts'/'zoo'.
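A minimal sketch of the grouped and weighted statistics, assuming collapse's fmean()/fgroup_by()/fsummarise() interface (the mtcars columns are only illustrative):
    library(collapse)
    # weighted group means; missing values are skipped by default
    fmean(mtcars$mpg, g = mtcars$cyl, w = mtcars$wt)
    # the same computation in a piped, dplyr-like style
    mtcars |> fgroup_by(cyl) |> fsummarise(mean_mpg = fmean(mpg, w = wt))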
Maintained by Sebastian Krantz. Last updated 6 days ago.
data-aggregation, data-analysis, data-manipulation, data-processing, data-science, data-transformation, econometrics, high-performance, panel-data, scientific-computing, statistics, time-series, weighted, weights, cpp, openmp
672 stars 16.68 score 708 scripts 99 dependents
bethatkinson
rpart:Recursive Partitioning and Regression Trees
Recursive partitioning for classification, regression and survival trees. An implementation of most of the functionality of the 1984 book by Breiman, Friedman, Olshen and Stone.
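A minimal sketch, fitting a classification tree to the kyphosis data that ships with rpart:
    library(rpart)
    fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis, method = "class")
    printcp(fit)               # cross-validated complexity-parameter table
    plot(fit); text(fit, use.n = TRUE)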
Maintained by Beth Atkinson. Last updated 9 months ago.
52 stars 16.59 score 18k scripts 1.6k dependents
easystats
effectsize:Indices of Effect Size
Provide utilities to work with indices of effect size for a wide variety of models and hypothesis tests (see list of supported models using the function 'insight::supported_models()'), allowing computation of and conversion between indices such as Cohen's d, r, odds, etc. References: Ben-Shachar et al. (2020) <doi:10.21105/joss.02815>.
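A minimal sketch, assuming the cohens_d() and d_to_r() helpers documented in the package (mtcars is just example data):
    library(effectsize)
    cohens_d(mpg ~ am, data = mtcars)   # standardized mean difference with CI
    d_to_r(0.8)                         # convert between effect-size indices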
Maintained by Mattan S. Ben-Shachar. Last updated 2 months ago.
anova, cohens-d, compute, conversion, correlation, effect-size, effectsize, hacktoberfest, hedges-g, interpretation, standardization, standardized, statistics
344 stars 16.38 score 1.8k scripts 29 dependents
spatstat
spatstat:Spatial Point Pattern Analysis, Model-Fitting, Simulation, Tests
Comprehensive open-source toolbox for analysing Spatial Point Patterns. Focused mainly on two-dimensional point patterns, including multitype/marked points, in any spatial region. Also supports three-dimensional point patterns, space-time point patterns in any number of dimensions, point patterns on a linear network, and patterns of other geometrical objects. Supports spatial covariate data such as pixel images. Contains over 3000 functions for plotting spatial data, exploratory data analysis, model-fitting, simulation, spatial sampling, model diagnostics, and formal inference. Data types include point patterns, line segment patterns, spatial windows, pixel images, tessellations, and linear networks. Exploratory methods include quadrat counts, K-functions and their simulation envelopes, nearest neighbour distance and empty space statistics, Fry plots, pair correlation function, kernel smoothed intensity, relative risk estimation with cross-validated bandwidth selection, mark correlation functions, segregation indices, mark dependence diagnostics, and kernel estimates of covariate effects. Formal hypothesis tests of random pattern (chi-squared, Kolmogorov-Smirnov, Monte Carlo, Diggle-Cressie-Loosmore-Ford, Dao-Genton, two-stage Monte Carlo) and tests for covariate effects (Cox-Berman-Waller-Lawson, Kolmogorov-Smirnov, ANOVA) are also supported. Parametric models can be fitted to point pattern data using the functions ppm(), kppm(), slrm(), dppm() similar to glm(). Types of models include Poisson, Gibbs and Cox point processes, Neyman-Scott cluster processes, and determinantal point processes. Models may involve dependence on covariates, inter-point interaction, cluster formation and dependence on marks. Models are fitted by maximum likelihood, logistic regression, minimum contrast, and composite likelihood methods. A model can be fitted to a list of point patterns (replicated point pattern data) using the function mppm(). The model can include random effects and fixed effects depending on the experimental design, in addition to all the features listed above. Fitted point process models can be simulated, automatically. Formal hypothesis tests of a fitted model are supported (likelihood ratio test, analysis of deviance, Monte Carlo tests) along with basic tools for model selection (stepwise(), AIC()) and variable selection (sdr). Tools for validating the fitted model include simulation envelopes, residuals, residual plots and Q-Q plots, leverage and influence diagnostics, partial residuals, and added variable plots.
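A minimal sketch of the model-fitting workflow described above, using ppm() on the built-in cells point pattern:
    library(spatstat)
    # fit a Strauss (Gibbs) point-process model and inspect it
    fit <- ppm(cells ~ 1, Strauss(r = 0.1))
    summary(fit)
    # simulation envelope of the K-function as an informal goodness-of-fit check
    plot(envelope(fit, Kest, nsim = 19))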
Maintained by Adrian Baddeley. Last updated 6 days ago.
cluster-process, cox-point-process, gibbs-process, kernel-density, network-analysis, point-process, poisson-process, spatial-analysis, spatial-data, spatial-data-analysis, spatial-statistics, spatstat, statistical-methods, statistical-models, statistical-tests, statistics
200 stars 16.25 score 5.5k scripts 40 dependents
easystats
performance:Assessment of Regression Models Performance
Utilities for computing measures to assess model quality, which are not directly provided by R's 'base' or 'stats' packages. These include e.g. measures like r-squared, intraclass correlation coefficient (Nakagawa, Johnson & Schielzeth (2017) <doi:10.1098/rsif.2017.0213>), root mean squared error or functions to check models for overdispersion, singularity or zero-inflation and more. Functions apply to a large variety of regression models, including generalized linear models, mixed effects models and Bayesian models. References: Lüdecke et al. (2021) <doi:10.21105/joss.03139>.
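A minimal sketch, assuming the r2(), model_performance() and check_overdispersion() helpers exported by the package (the toy models use built-in data):
    library(performance)
    m <- lm(mpg ~ wt + cyl, data = mtcars)
    r2(m)                  # R-squared
    model_performance(m)   # AIC, RMSE, R2, ... in one table
    check_overdispersion(glm(count ~ spray, data = InsectSprays, family = poisson))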
Maintained by Daniel Lüdecke. Last updated 3 days ago.
aic, easystats, hacktoberfest, loo, machine-learning, mixed-models, models, performance, r2, statistics
1.1k stars 16.20 score 4.3k scripts 48 dependents
aphalo
ggpmisc:Miscellaneous Extensions to 'ggplot2'
Extensions to 'ggplot2' respecting the grammar of graphics paradigm. Statistics: locate and tag peaks and valleys; label plot with the equation of a fitted polynomial or other types of models; labels with P-value, R^2 or adjusted R^2 or information criteria for fitted models; label with ANOVA table for fitted models; label with summary for fitted models. Model fit classes for which suitable methods are provided by package 'broom' and 'broom.mixed' are supported. Scales and stats to build volcano and quadrant plots based on outcomes, fold changes, p-values and false discovery rates.
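A minimal sketch of labelling a fitted model on a plot, assuming the stat_poly_eq() annotation stat (mtcars is just example data):
    library(ggplot2)
    library(ggpmisc)
    # scatterplot with a linear fit, labelled with fit statistics
    ggplot(mtcars, aes(wt, mpg)) +
      geom_point() +
      geom_smooth(method = "lm", formula = y ~ x) +
      stat_poly_eq(formula = y ~ x)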
Maintained by Pedro J. Aphalo. Last updated 16 hours ago.
data-analysis, dataviz, ggplot2-annotations, ggplot2-stats, statistics
107 stars 13.64 score 4.4k scripts 14 dependents
kaz-yos
tableone:Create 'Table 1' to Describe Baseline Characteristics with or without Propensity Score Weights
Creates 'Table 1', i.e., description of baseline patient characteristics, which is essential in every medical research. Supports both continuous and categorical variables, as well as p-values and standardized mean differences. Weighted data are supported via the 'survey' package.
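A minimal sketch of CreateTableOne(), the package's main function (the mtcars variables stand in for baseline characteristics):
    library(tableone)
    tab <- CreateTableOne(vars = c("mpg", "hp", "cyl"), strata = "am",
                          data = mtcars, factorVars = "cyl")
    print(tab, smd = TRUE)   # include standardized mean differences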
Maintained by Kazuki Yoshida. Last updated 3 years ago.
baseline-characteristics, descriptive-statistics, statistics
221 stars 13.55 score 2.3k scripts 12 dependents
mitchelloharawild
distributional:Vectorised Probability Distributions
Vectorised distribution objects with tools for manipulating, visualising, and using probability distributions. Designed to allow model prediction outputs to return distributions rather than their parameters, allowing users to directly interact with predictive distributions in a data-oriented workflow. In addition to providing generic replacements for p/d/q/r functions, other useful statistics can be computed including means, variances, intervals, and highest density regions.
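A minimal sketch of the vectorised-distribution idea, assuming the dist_normal(), quantile() and hilo() generics described in the package documentation:
    library(distributional)
    d <- dist_normal(mu = c(0, 1, 2), sigma = c(1, 1, 2))   # three distributions
    mean(d)              # mean of each distribution
    quantile(d, 0.975)   # upper 2.5% points
    hilo(d, 95)          # 95% intervals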
Maintained by Mitchell O'Hara-Wild. Last updated 1 day ago.
probability-distribution, statistics, vctrs
100 stars 13.54 score 744 scripts 388 dependents
mayoverse
arsenal:An Arsenal of 'R' Functions for Large-Scale Statistical Summaries
An Arsenal of 'R' functions for large-scale statistical summaries, which are streamlined to work within the latest reporting tools in 'R' and 'RStudio' and which use formulas and versatile summary statistics for summary tables and models. The primary functions include tableby(), a Table-1-like summary of multiple variable types 'by' the levels of one or more categorical variables; paired(), a Table-1-like summary of multiple variable types paired across two time points; modelsum(), which performs simple model fits on one or more endpoints for many variables (univariate or adjusted for covariates); freqlist(), a powerful frequency table across many categorical variables; comparedf(), a function for comparing data.frames; and write2(), a function to output tables to a document.
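A minimal sketch of tableby(), using the mockstudy data shipped with arsenal:
    library(arsenal)
    tab <- tableby(arm ~ sex + age, data = mockstudy)
    summary(tab, text = TRUE)   # Table-1-style summary by study arm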
Maintained by Ethan Heinzen. Last updated 8 months ago.
baseline-characteristics, descriptive-statistics, modeling, paired-comparisons, reporting, statistics, tableone
225 stars 13.40 score 1.2k scripts 15 dependents
easystats
see:Model Visualisation Toolbox for 'easystats' and 'ggplot2'
Provides plotting utilities supporting packages in the 'easystats' ecosystem (<https://github.com/easystats/easystats>) and some extra themes, geoms, and scales for 'ggplot2'. Color scales are based on <https://materialui.co/>. References: Lüdecke et al. (2021) <doi:10.21105/joss.03393>.
Maintained by Indrajeet Patil. Last updated 17 days ago.
data-visualization, easystats, ggplot2, hacktoberfest, plotting, see, statistics, visualisation, visualization
902 stars 13.22 score 2.0k scripts 3 dependents
easystats
easystats:Framework for Easy Statistical Modeling, Visualization, and Reporting
A meta-package that installs and loads a set of packages from the 'easystats' ecosystem in a single step. This collection of packages provides a unifying and consistent framework for statistical modeling, visualization, and reporting. Additionally, it provides articles targeted at instructors for teaching 'easystats', and a dashboard targeted at new R users for easily conducting statistical analysis by accessing summary results, model fit indices, and visualizations with minimal programming.
Maintained by Daniel Lüdecke. Last updated 24 days ago.
dataanalytics, datascience, easystats, hacktoberfest, models, performance-metrics, regression-models, statistics
1.1k stars 13.01 score 1.8k scripts 1 dependents
kkholst
lava:Latent Variable Models
A general implementation of Structural Equation Models with latent variables (MLE, 2SLS, and composite likelihood estimators) with continuous, censored, and ordinal outcomes (Holst and Budtz-Joergensen (2013) <doi:10.1007/s00180-012-0344-y>). Mixture latent variable models and non-linear latent variable models (Holst and Budtz-Joergensen (2020) <doi:10.1093/biostatistics/kxy082>). The package also provides methods for graph exploration (d-separation, back-door criterion), simulation of general non-linear latent variable models, and estimation of influence functions for a broad range of statistical models.
Maintained by Klaus K. Holst. Last updated 3 months ago.
latent-variable-models, simulation, statistics, structural-equation-models
33 stars 12.87 score 610 scripts 478 dependents
drostlab
philentropy:Similarity and Distance Quantification Between Probability Functions
Computes 46 optimized distance and similarity measures for comparing probability functions (Drost (2018) <doi:10.21105/joss.00765>). These comparisons between probability functions have their foundations in a broad range of scientific disciplines from mathematics to ecology. The aim of this package is to provide a core framework for clustering, classification, statistical inference, goodness-of-fit, non-parametric statistics, information theory, and machine learning tasks that are based on comparing univariate or multivariate probability functions.
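A minimal sketch, assuming the package's distance() interface in which each row of the input is a probability vector:
    library(philentropy)
    P <- c(0.2, 0.5, 0.3)
    Q <- c(0.1, 0.4, 0.5)
    distance(rbind(P, Q), method = "jensen-shannon")   # Jensen-Shannon divergence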
Maintained by Hajk-Georg Drost. Last updated 4 months ago.
distance-measures, distance-quantification, information-theory, jensen-shannon-divergence, parametric-distributions, similarity-measures, statistics, cpp
137 stars 12.44 score 484 scripts 24 dependents
yihui
animation:A Gallery of Animations in Statistics and Utilities to Create Animations
Provides functions for animations in statistics, covering topics in probability theory, mathematical statistics, multivariate statistics, non-parametric statistics, sampling survey, linear models, time series, computational statistics, data mining and machine learning. These functions may be helpful in teaching statistics and data analysis. Also provided in this package are a series of functions to save animations to various formats, e.g. Flash, 'GIF', HTML pages, 'PDF' and videos. 'PDF' animations can be inserted into 'Sweave' / 'knitr' easily.
Maintained by Yihui Xie. Last updated 2 years ago.
animation, statistical-computing, statistical-graphics, statistics
208 stars 12.13 score 2.5k scripts 28 dependents
twolodzko
extraDistr:Additional Univariate and Multivariate Distributions
Density, distribution function, quantile function and random generation for a number of univariate and multivariate distributions. This package implements the following distributions: Bernoulli, beta-binomial, beta-negative binomial, beta prime, Bhattacharjee, Birnbaum-Saunders, bivariate normal, bivariate Poisson, categorical, Dirichlet, Dirichlet-multinomial, discrete gamma, discrete Laplace, discrete normal, discrete uniform, discrete Weibull, Frechet, gamma-Poisson, generalized extreme value, Gompertz, generalized Pareto, Gumbel, half-Cauchy, half-normal, half-t, Huber density, inverse chi-squared, inverse-gamma, Kumaraswamy, Laplace, location-scale t, logarithmic, Lomax, multivariate hypergeometric, multinomial, negative hypergeometric, non-standard beta, normal mixture, Poisson mixture, Pareto, power, reparametrized beta, Rayleigh, shifted Gompertz, Skellam, slash, triangular, truncated binomial, truncated normal, truncated Poisson, Tukey lambda, Wald, zero-inflated binomial, zero-inflated negative binomial, zero-inflated Poisson.
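A minimal sketch of the usual d/p/q/r naming scheme for the added distributions (Gumbel and zero-inflated Poisson shown; argument names assumed from the package conventions):
    library(extraDistr)
    dgumbel(1, mu = 0, sigma = 1)      # Gumbel density
    qgumbel(0.95, mu = 0, sigma = 1)   # 95% quantile
    rzip(5, lambda = 2, pi = 0.3)      # zero-inflated Poisson draws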
Maintained by Tymoteusz Wolodzko. Last updated 23 days ago.
c-plus-plus, c-plus-plus-11, distribution, multivariate-distributions, probability, random-generation, rcpp, statistics, cpp
53 stars 11.60 score 1.5k scripts 107 dependents
jacob-long
interactions:Comprehensive, User-Friendly Toolkit for Probing Interactions
A suite of functions for conducting and interpreting analysis of statistical interaction in regression models that was formerly part of the 'jtools' package. Functionality includes visualization of two- and three-way interactions among continuous and/or categorical variables as well as calculation of "simple slopes" and Johnson-Neyman intervals (see e.g., Bauer & Curran, 2005 <doi:10.1207/s15327906mbr4003_5>). These capabilities are implemented for generalized linear models in addition to the standard linear regression context.
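A minimal sketch, assuming the interact_plot() and sim_slopes() probing functions (the mtcars interaction is only illustrative):
    library(interactions)
    fit <- lm(mpg ~ hp * wt, data = mtcars)
    interact_plot(fit, pred = hp, modx = wt)                      # simple-slopes plot
    sim_slopes(fit, pred = hp, modx = wt, johnson_neyman = TRUE)  # J-N interval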
Maintained by Jacob A. Long. Last updated 8 months ago.
interactions, moderation, social-sciences, statistics
131 stars 11.40 score 1.2k scripts 5 dependents
tnagler
VineCopula:Statistical Inference of Vine Copulas
Provides tools for the statistical analysis of regular vine copula models, see Aas et al. (2009) <doi:10.1016/j.insmatheco.2007.02.001> and Dissman et al. (2013) <doi:10.1016/j.csda.2012.08.010>. The package includes tools for parameter estimation, model selection, simulation, goodness-of-fit tests, and visualization. Tools for estimation, selection and exploratory data analysis of bivariate copula models are also provided.
Maintained by Thomas Nagler. Last updated 4 days ago.
copula, estimation, statistics, vine
92 stars 11.07 score 362 scripts 23 dependents
spatstat
spatstat.data:Datasets for 'spatstat' Family
Contains all the datasets for the 'spatstat' family of packages.
Maintained by Adrian Baddeley. Last updated 11 days ago.
kernel-density, point-process, spatial-analysis, spatial-data, spatial-data-analysis, spatstat, statistical-analysis, statistical-methods, statistical-tests, statistics
6 stars 11.07 score 186 scripts 228 dependents
openml
OpenML:Open Machine Learning and Open Data Platform
We provide an R interface to 'OpenML.org', an online machine learning platform where researchers can access open data, download and upload data sets, share their machine learning tasks and experiments, and organize them online to work and collaborate with other researchers. The R interface allows querying for data sets with specific properties, and supports downloading and uploading data sets, tasks, flows and runs. See <https://www.openml.org/guide/api> for more information.
Maintained by Giuseppe Casalicchio. Last updated 10 months ago.
arff, benchmarking, benchmarking-suite, classification, data-science, database, dataset, datasets, machine-learning, machine-learning-algorithms, open-data, open-science, opendata, openml, openscience, regression, reproducible-research, statistics
97 stars 11.04 score 7.1k scripts
config-i1
greybox:Toolbox for Model Building and Forecasting
Implements functions and instruments for regression model building and its application to forecasting. The main scope of the package is variable selection and model specification for time series data. This includes promotional modelling, selection between different dynamic regressions with non-standard error distributions, selection based on cross validation, solutions to the fat regression model problem, and more. Models developed in the package are tailored specifically for forecasting purposes, so several methods are provided for producing forecasts from these models and visualising them.
Maintained by Ivan Svetunkov. Last updated 14 days ago.
forecasting, model-selection, model-selection-and-evaluation, regression, regression-models, statistics, cpp
30 stars 11.03 score 97 scripts 34 dependents
rudeboybert
fivethirtyeight:Data and Code Behind the Stories and Interactives at 'FiveThirtyEight'
Datasets and code published by the data journalism website 'FiveThirtyEight' available at <https://github.com/fivethirtyeight/data>. Note that while we received guidance from editors at 'FiveThirtyEight', this package is not officially published by 'FiveThirtyEight'.
Maintained by Albert Y. Kim. Last updated 2 years ago.
data-science, datajournalism, fivethirtyeight, statistics
453 stars 10.98 score 1.7k scripts
neuropsychology
psycho:Efficient and Publishing-Oriented Workflow for Psychological Science
The main goal of the psycho package is to provide tools for psychologists, neuropsychologists and neuroscientists, to facilitate and speed up the time spent on data analysis. It aims at supporting best practices and tools to format the output of statistical methods to directly paste them into a manuscript, ensuring statistical reporting standardization and conformity.
Maintained by Dominique Makowski. Last updated 4 years ago.
apa, apa6, bayesian, correlation, format, interpretation, mixed-models, neuroscience, psycho, psychology, rstanarm, statistics
149 stars 10.86 score 628 scripts 5 dependents
ovvo-financial
NNS:Nonlinear Nonparametric Statistics
Nonlinear nonparametric statistics using partial moments. Partial moments are the elements of variance and asymptotically approximate the area of f(x). These robust statistics provide the basis for nonlinear analysis while retaining linear equivalences. NNS offers: Numerical integration, Numerical differentiation, Clustering, Correlation, Dependence, Causal analysis, ANOVA, Regression, Classification, Seasonality, Autoregressive modeling, Normalization, Stochastic dominance and Advanced Monte Carlo sampling. All routines based on: Viole, F. and Nawrocki, D. (2013), Nonlinear Nonparametric Statistics: Using Partial Moments (ISBN: 1490523995).
Maintained by Fred Viole. Last updated 1 hour ago.
clustering, econometrics, machine-learning, nonlinear, nonparametric, partial-moments, statistics, time-series, cpp
72 stars 10.77 score 66 scripts 3 dependents
mariarizzo
energy:E-Statistics: Multivariate Inference via the Energy of Data
E-statistics (energy) tests and statistics for multivariate and univariate inference, including distance correlation, one-sample, two-sample, and multi-sample tests for comparing multivariate distributions, are implemented. Measuring and testing multivariate independence based on distance correlation, partial distance correlation, multivariate goodness-of-fit tests, k-groups and hierarchical clustering based on energy distance, testing for multivariate normality, distance components (disco) for non-parametric analysis of structured data, and other energy statistics/methods are implemented.
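A minimal sketch of distance correlation and the corresponding independence test, assuming the dcor() and dcor.test() functions exported by the package:
    library(energy)
    x <- iris[, 1:2]; y <- iris[, 3:4]
    dcor(x, y)                 # distance correlation
    dcor.test(x, y, R = 199)   # permutation test of independence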
Maintained by Maria Rizzo. Last updated 7 months ago.
distance-correlation, energy, multivariate-analysis, statistics, cpp
45 stars 10.69 score 634 scripts 45 dependents
rempsyc
rempsyc:Convenience Functions for Psychology
Make your workflow faster and easier. Easily customizable plots (via 'ggplot2'), nice APA tables (following the style of the *American Psychological Association*) exportable to Word (via 'flextable'), easily run statistical tests or check assumptions, and automatize various other tasks.
Maintained by Rémi Thériault. Last updated 2 months ago.
convenience-functions, ggplot2, psychology, statistics, visualization
43 stars 10.68 score 214 scripts 2 dependents
ikosmidis
brglm2:Bias Reduction in Generalized Linear Models
Estimation and inference from generalized linear models based on various methods for bias reduction and maximum penalized likelihood with powers of the Jeffreys prior as penalty. The 'brglmFit' fitting method can achieve reduction of estimation bias by solving either the mean bias-reducing adjusted score equations in Firth (1993) <doi:10.1093/biomet/80.1.27> and Kosmidis and Firth (2009) <doi:10.1093/biomet/asp055>, or the median bias-reduction adjusted score equations in Kenne et al. (2017) <doi:10.1093/biomet/asx046>, or through the direct subtraction of an estimate of the bias of the maximum likelihood estimator from the maximum likelihood estimates as in Cordeiro and McCullagh (1991) <https://www.jstor.org/stable/2345592>. See Kosmidis et al (2020) <doi:10.1007/s11222-019-09860-6> for more details. Estimation in all cases takes place via a quasi Fisher scoring algorithm, and S3 methods for the construction of confidence intervals for the reduced-bias estimates are provided. In the special case of generalized linear models for binomial and multinomial responses (both ordinal and nominal), the adjusted score approaches to mean and median bias reduction have been found to return estimates with improved frequentist properties, that are also always finite, even in cases where the maximum likelihood estimates are infinite (e.g. complete and quasi-complete separation; see Kosmidis and Firth, 2020 <doi:10.1093/biomet/asaa052>, for a proof for mean bias reduction in logistic regression).
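A minimal sketch of the documented usage through glm()'s method argument; the endometrial data, a classic separation example, ships with brglm2:
    library(brglm2)
    data(endometrial)
    # mean bias-reduced fit; estimates stay finite even under separation
    fit <- glm(HG ~ NV + PI + EH, family = binomial, data = endometrial,
               method = "brglmFit")
    summary(fit)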
Maintained by Ioannis Kosmidis. Last updated 7 months ago.
adjusted-score-equations, algorithms, bias-reducing-adjustments, bias-reduction, estimation, glm, logistic-regression, nominal-responses, ordinal-responses, regression, regression-algorithms, statistics
32 stars 10.41 score 106 scripts 10 dependents
atsa-es
MARSS:Multivariate Autoregressive State-Space Modeling
The MARSS package provides maximum-likelihood parameter estimation for constrained and unconstrained linear multivariate autoregressive state-space (MARSS) models, including partially deterministic models. MARSS models are a class of dynamic linear model (DLM) and vector autoregressive (VAR) model. Fitting is available via Expectation-Maximization (EM), BFGS (using optim), and 'TMB' (using the 'marssTMB' companion package). Functions are provided for parametric and innovations bootstrapping, Kalman filtering and smoothing, model selection criteria including bootstrap AICb, confidence intervals via the Hessian approximation or bootstrapping, and all conditional residual types. See the user guide for examples of dynamic factor analysis, dynamic linear models, outlier and shock detection, and multivariate AR-p models. Online workshops (lectures, eBook, and computer labs) at <https://atsa-es.github.io/>.
Maintained by Elizabeth Eli Holmes. Last updated 1 years ago.
multivariate-timeseries, state-space-models, statistics, time-series
52 stars 10.34 score 596 scripts 3 dependents
msalibian
RobStatTM:Robust Statistics: Theory and Methods
Companion package for the book: "Robust Statistics: Theory and Methods, second edition", <http://www.wiley.com/go/maronna/robust>. This package contains code that implements the robust estimators discussed in the recent second edition of the book above, as well as the scripts reproducing all the examples in the book.
Maintained by Matias Salibian-Barrera. Last updated 15 days ago.
robust, robust-estimation, robust-regression, robust-statistics, robustness, statistics, fortran, openblas
17 stars 10.23 score 84 scripts 8 dependents
stan-dev
projpred:Projection Predictive Feature Selection
Performs projection predictive feature selection for generalized linear models (Piironen, Paasiniemi, and Vehtari, 2020, <doi:10.1214/20-EJS1711>) with or without multilevel or additive terms (Catalina, Bürkner, and Vehtari, 2022, <https://proceedings.mlr.press/v151/catalina22a.html>), for some ordinal and nominal regression models (Weber, Glass, and Vehtari, 2023, <arXiv:2301.01660>), and for many other regression models (using the latent projection by Catalina, Bürkner, and Vehtari, 2021, <arXiv:2109.04702>, which can also be applied to most of the former models). The package is compatible with the 'rstanarm' and 'brms' packages, but other reference models can also be used. See the vignettes and the documentation for more information and examples.
Maintained by Frank Weber. Last updated 11 days ago.
bayes, bayesian, bayesian-inference, rstanarm, stan, statistics, variable-selection, openblas, cpp
112 stars 10.09 score 241 scripts
tlverse
sl3:Pipelines for Machine Learning and Super Learning
A modern implementation of the Super Learner prediction algorithm, coupled with a general purpose framework for composing arbitrary pipelines for machine learning tasks.
Maintained by Jeremy Coyle. Last updated 4 months ago.
data-science, ensemble-learning, ensemble-model, machine-learning, model-selection, regression, stacking, statistics
100 stars 9.94 score 748 scripts 7 dependents
stocnet
RSiena:Siena - Simulation Investigation for Empirical Network Analysis
The main purpose of this package is to perform simulation-based estimation of stochastic actor-oriented models for longitudinal network data collected as panel data. Dependent variables can be single or multivariate networks, which can be directed, non-directed, or two-mode; and associated actor variables. There are also functions for testing parameters and checking goodness of fit. An overview of these models is given in Snijders (2017), <doi:10.1146/annurev-statistics-060116-054035>.
Maintained by Tom A.B. Snijders. Last updated 2 months ago.
longitudinal-data, rsiena, social-network-analysis, statistical-network-analysis, statistics, cpp
107 stars 9.93 score 346 scripts 1 dependents
acclab
dabestr:Data Analysis using Bootstrap-Coupled Estimation
Data Analysis using Bootstrap-Coupled ESTimation. Estimation statistics is a simple framework that avoids the pitfalls of significance testing. It uses familiar statistical concepts: means, mean differences, and error bars. More importantly, it focuses on the effect size of one's experiment/intervention, as opposed to a false dichotomy engendered by P values. An estimation plot has two key features: 1. It presents all datapoints as a swarmplot, which orders each point to display the underlying distribution. 2. It presents the effect size as a bootstrap 95% confidence interval on a separate but aligned axes. Estimation plots are introduced in Ho et al., Nature Methods 2019, 1548-7105. <doi:10.1038/s41592-019-0470-3>. The free-to-view PDF is located at <https://www.nature.com/articles/s41592-019-0470-3.epdf?author_access_token=Euy6APITxsYA3huBKOFBvNRgN0jAjWel9jnR3ZoTv0Pr6zJiJ3AA5aH4989gOJS_dajtNr1Wt17D0fh-t4GFcvqwMYN03qb8C33na_UrCUcGrt-Z0J9aPL6TPSbOxIC-pbHWKUDo2XsUOr3hQmlRew%3D%3D>.
Maintained by Yishan Mai. Last updated 1 year ago.
data-analysis, data-visualization, estimation, statistics
214 stars 9.80 score 142 scripts
jasonjfoster
roll:Rolling and Expanding Statistics
Fast and efficient computation of rolling and expanding statistics for time-series data.
Maintained by Jason Foster. Last updated 2 months ago.
algorithms, rcpp, statistics, openblas, cpp, openmp
116 stars 9.76 score 318 scripts 13 dependents
dcousin3
superb:Summary Plots with Adjusted Error Bars
Computes standard error and confidence interval of various descriptive statistics under various designs and sampling schemes. The main function, superb(), returns a plot. It can also be used to obtain a dataframe with the statistics and their precision intervals so that other plotting environments (e.g., Excel) can be used. See Cousineau and colleagues (2021) <doi:10.1177/25152459211035109> or Cousineau (2017) <doi:10.5709/acp-0214-z> for a review as well as Cousineau (2005) <doi:10.20982/tqmp.01.1.p042>, Morey (2008) <doi:10.20982/tqmp.04.2.p061>, Baguley (2012) <doi:10.3758/s13428-011-0123-7>, Cousineau & Laurencelle (2016) <doi:10.1037/met0000055>, Cousineau & O'Brien (2014) <doi:10.3758/s13428-013-0441-z>, Calderini & Harding <doi:10.20982/tqmp.15.1.p001> for specific references.
Maintained by Denis Cousineau. Last updated 2 months ago.
error-bars, plotting, statistics, summary-plots, summary-statistics, visualization
19 stars 9.53 score 155 scripts 2 dependents
tbates
umx:Structural Equation Modeling and Twin Modeling in R
Quickly create, run, and report structural equation models, and twin models. See '?umx' for help, and umx_open_CRAN_page("umx") for NEWS. Timothy C. Bates, Michael C. Neale, Hermine H. Maes, (2019). umx: A library for Structural Equation and Twin Modelling in R. Twin Research and Human Genetics, 22, 27-41. <doi:10.1017/thg.2019.2>.
Maintained by Timothy C. Bates. Last updated 14 days ago.
behavior-genetics, genetics, openmx, psychology, sem, statistics, structural-equation-modeling, tutorials, twin-models, umx
44 stars 9.45 score 472 scripts
eblondel
rsdmx:Tools for Reading SDMX Data and Metadata
Set of classes and methods to read data and metadata documents exchanged through the Statistical Data and Metadata Exchange (SDMX) framework, currently focusing on the SDMX XML standard format (SDMX-ML).
Maintained by Emmanuel Blondel. Last updated 8 days ago.
api, datastructures, dsd, read, readsdmx, sdmx, sdmx-format, sdmx-provider, sdmx-standards, statistics, timeseries, web-services
105 stars 9.37 score 4 dependents
dmphillippo
multinma:Bayesian Network Meta-Analysis of Individual and Aggregate Data
Network meta-analysis and network meta-regression models for aggregate data, individual patient data, and mixtures of both individual and aggregate data using multilevel network meta-regression as described by Phillippo et al. (2020) <doi:10.1111/rssa.12579>. Models are estimated in a Bayesian framework using 'Stan'.
Maintained by David M. Phillippo. Last updated 2 days ago.
35 stars 9.34 score 163 scripts
ledell
cvAUC:Cross-Validated Area Under the ROC Curve Confidence Intervals
Tools for working with and evaluating cross-validated area under the ROC curve (AUC) estimators. The primary functions of the package are ci.cvAUC and ci.pooled.cvAUC, which report cross-validated AUC and compute confidence intervals for cross-validated AUC estimates based on influence curves for i.i.d. and pooled repeated measures data, respectively. One benefit to using influence curve based confidence intervals is that they require much less computation time than bootstrapping methods. The utility functions, AUC and cvAUC, are simple wrappers for functions from the ROCR package.
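A minimal sketch of ci.cvAUC() on toy cross-validated predictions (the simulated scores and fold labels are purely illustrative):
    library(cvAUC)
    set.seed(1)
    labels <- rbinom(200, 1, 0.5)
    scores <- labels + rnorm(200)     # noisy cross-validated predictions
    folds  <- rep(1:10, each = 20)    # fold membership from 10-fold CV
    cvAUC(scores, labels, folds = folds)      # cross-validated AUC
    ci.cvAUC(scores, labels, folds = folds)   # with an influence-curve CI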
Maintained by Erin LeDell. Last updated 3 years ago.
auc, confidence-intervals, cross-validation, machine-learning, statistics, variance
23 stars 9.17 score 317 scripts 40 dependents
doubleml
DoubleML:Double Machine Learning in R
Implementation of the double/debiased machine learning framework of Chernozhukov et al. (2018) <doi:10.1111/ectj.12097> for partially linear regression models, partially linear instrumental variable regression models, interactive regression models and interactive instrumental variable regression models. 'DoubleML' allows estimation of the nuisance parts in these models by machine learning methods and computation of the Neyman orthogonal score functions. 'DoubleML' is built on top of 'mlr3' and the 'mlr3' ecosystem. The object-oriented implementation of 'DoubleML' based on the 'R6' package is very flexible. More information available in the publication in the Journal of Statistical Software: <doi:10.18637/jss.v108.i03>.
Maintained by Philipp Bach. Last updated 4 months ago.
causal-inference, data-science, double-machine-learning, econometrics, machine-learning, mlr3, statistics
139 stars 9.16 score 267 scripts 1 dependents
great-northern-diver
loon:Interactive Statistical Data Visualization
An extendable toolkit for interactive data visualization and exploration.
Maintained by R. Wayne Oldford. Last updated 2 years ago.
data-analysis, data-science, data-visualization, exploratory-analysis, exploratory-data-analysis, high-dimensional-data, interactive-graphics, interactive-visualizations, loon, python, statistical-analysis, statistical-graphics, statistics, tcl-extension, tk
48 stars 9.00 score 93 scripts 5 dependents
graemeleehickey
joineRML:Joint Modelling of Multivariate Longitudinal Data and Time-to-Event Outcomes
Fits the joint model proposed by Henderson and colleagues (2000) <doi:10.1093/biostatistics/1.4.465>, but extended to the case of multiple continuous longitudinal measures. The time-to-event data is modelled using a Cox proportional hazards regression model with time-varying covariates. The multiple longitudinal outcomes are modelled using a multivariate version of the Laird and Ware linear mixed model. The association is captured by a multivariate latent Gaussian process. The model is estimated using a Monte Carlo Expectation Maximization algorithm. This project was funded by the Medical Research Council (Grant number MR/M013227/1).
Maintained by Graeme L. Hickey. Last updated 2 months ago.
armadillo, biostatistics, clinical-trials, cox, dynamic, joint-models, longitudinal-data, multivariate-analysis, multivariate-data, multivariate-longitudinal-data, prediction, rcpp, regression-models, statistics, survival, openblas, cpp, openmp
30 stars 8.93 score 146 scripts 1 dependents
mattcowgill
readabs:Download and Tidy Time Series Data from the Australian Bureau of Statistics
Downloads, imports, and tidies time series data from the Australian Bureau of Statistics <https://www.abs.gov.au/>.
Maintained by Matt Cowgill. Last updated 26 days ago.
abs, australia, australian-bureau-of-statistics, australian-data, statistics, tidy-data, time-series
104 stars 8.85 score 180 scripts
ropengov
regions:Processing Regional Statistics
Validating sub-national statistical typologies, re-coding across standard typologies of sub-national statistics, and making valid aggregate level imputation, re-aggregation, re-weighting and projection down to lower hierarchical levels to create meaningful data panels and time series.
Maintained by Daniel Antal. Last updated 3 years ago.
observatory, regions, ropengov, statistics
12 stars 8.81 score 67 scripts 5 dependents
openpharma
brms.mmrm:Bayesian MMRMs using 'brms'
The mixed model for repeated measures (MMRM) is a popular model for longitudinal clinical trial data with continuous endpoints, and 'brms' is a powerful and versatile package for fitting Bayesian regression models. The 'brms.mmrm' R package leverages 'brms' to run MMRMs, and it supports a simplified interface to reduce difficulty and align with the best practices of the life sciences. References: Bürkner (2017) <doi:10.18637/jss.v080.i01>, Mallinckrodt (2008) <doi:10.1177/009286150804200402>.
Maintained by William Michael Landau. Last updated 6 months ago.
brms, life-sciences, mc-stan, mmrm, stan, statistics
21 stars 8.80 score 13 scripts
jinseob2kim
jsmodule:'RStudio' Addins and 'Shiny' Modules for Medical Research
'RStudio' addins and 'Shiny' modules for descriptive statistics, regression and survival analysis.
Maintained by Jinseob Kim. Last updated 10 days ago.
medical, rstudio-addins, shiny, shiny-modules, statistics
21 stars 8.69 score 61 scripts
mobiodiv
mobr:Measurement of Biodiversity
Functions for calculating metrics for the measurement of biodiversity and its changes across scales, treatments, and gradients. The methods implemented in this package are described in: Chase, J.M., et al. (2018) <doi:10.1111/ele.13151>, McGlinn, D.J., et al. (2019) <doi:10.1111/2041-210X.13102>, McGlinn, D.J., et al. (2020) <doi:10.1101/851717>, and McGlinn, D.J., et al. (2023) <doi:10.1101/2023.09.19.558467>.
Maintained by Daniel McGlinn. Last updated 8 days ago.
biodiversity, conservation, ecology, rarefaction, species, statistics
23 stars 8.65 score 93 scripts
mayer79
confintr:Confidence Intervals
Calculates classic and/or bootstrap confidence intervals for many parameters such as the population mean, variance, interquartile range (IQR), median absolute deviation (MAD), skewness, kurtosis, Cramer's V, odds ratio, R-squared, quantiles (incl. median), proportions, different types of correlation measures, difference in means, quantiles and medians. Many of the classic confidence intervals are described in Smithson, M. (2003, ISBN: 978-0761924999). Bootstrap confidence intervals are calculated with the R package 'boot'. Both one- and two-sided intervals are supported.
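A minimal sketch, assuming the package's ci_*() naming (ci_mean(), ci_cor()); the bootstrap variants delegate to 'boot' as noted above:
    library(confintr)
    ci_mean(mtcars$mpg)                       # classic t interval for the mean
    ci_mean(mtcars$mpg, type = "bootstrap")   # bootstrap interval
    ci_cor(mtcars$mpg, mtcars$wt, method = "spearman", type = "bootstrap")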
Maintained by Michael Mayer. Last updated 8 months ago.
bootstrap, confidence-intervals, statistical-inference, statistics
16 stars 8.62 score 104 scripts 17 dependents
bioc
nullranges:Generation of null ranges via bootstrapping or covariate matching
Modular package for generation of sets of ranges representing the null hypothesis. These can take the form of bootstrap samples of ranges (using the block bootstrap framework of Bickel et al 2010), or sets of control ranges that are matched across one or more covariates. nullranges is designed to be inter-operable with other packages for analysis of genomic overlap enrichment, including the plyranges Bioconductor package.
Maintained by Michael Love. Last updated 5 months ago.
visualization, genesetenrichment, functionalgenomics, epigenetics, generegulation, genetarget, genomeannotation, annotation, genomewideassociation, histonemodification, chipseq, atacseq, dnaseseq, rnaseq, hiddenmarkovmodel, bioconductor, bootstrap, genomics, matching, statistics
27 stars 8.16 score 50 scripts 1 dependents
gmgeorg
LambertW:Probabilistic Models to Analyze and Gaussianize Heavy-Tailed, Skewed Data
Lambert W x F distributions are a generalized framework to analyze skewed, heavy-tailed data. It is based on an input/output system, where the output random variable (RV) Y is a non-linearly transformed version of an input RV X ~ F with similar properties as X, but slightly skewed (heavy-tailed). The transformed RV Y has a Lambert W x F distribution. This package contains functions to model and analyze skewed, heavy-tailed data the Lambert Way: simulate random samples, estimate parameters, compute quantiles, and plot/ print results nicely. The most useful function is 'Gaussianize', which works similarly to 'scale', but actually makes the data Gaussian. A do-it-yourself toolkit allows users to define their own Lambert W x 'MyFavoriteDistribution' and use it in their analysis right away.
Maintained by Georg M. Goerg. Last updated 1 year ago.
gaussianize, gaussianize-data, heavy-tailed, heavy-tailed-distributions, leptokurtosis, normal-distribution, normalization, skewed-data, statistics, cpp
10 stars 8.16 score 78 scripts 13 dependents
psychbruce
bruceR:Broadly Useful Convenient and Efficient R Functions
Broadly useful convenient and efficient R functions that bring users concise and elegant R data analyses. This package includes easy-to-use functions for (1) basic R programming (e.g., set working directory to the path of currently opened file; import/export data from/to files in any format; print tables to Microsoft Word); (2) multivariate computation (e.g., compute scale sums/means/... with reverse scoring); (3) reliability analyses and factor analyses; (4) descriptive statistics and correlation analyses; (5) t-test, multi-factor analysis of variance (ANOVA), simple-effect analysis, and post-hoc multiple comparison; (6) tidy report of statistical models (to R Console and Microsoft Word); (7) mediation and moderation analyses (PROCESS); and (8) additional toolbox for statistics and graphics.
Maintained by Han-Wu-Shuang Bao. Last updated 10 months ago.
anova, data-analysis, data-science, linear-models, linear-regression, multilevel-models, statistics, toolbox
176 stars 7.87 score 316 scripts 3 dependents
cwatson
brainGraph:Graph Theory Analysis of Brain MRI Data
A set of tools for performing graph theory analysis of brain MRI data. It works with data from a Freesurfer analysis (cortical thickness, volumes, local gyrification index, surface area), diffusion tensor tractography data (e.g., from FSL) and resting-state fMRI data (e.g., from DPABI). It contains a graphical user interface for graph visualization and data exploration, along with several functions for generating useful figures.
Maintained by Christopher G. Watson. Last updated 1 year ago.
brain-connectivity, brain-imaging, complex-networks, connectome, connectomics, fmri, graph-theory, mri, network-analysis, neuroimaging, neuroscience, statistics, tractography
188 stars 7.86 score 107 scripts 3 dependents
bioc
fishpond:Fishpond: downstream methods and tools for expression data
Fishpond contains methods for differential transcript and gene expression analysis of RNA-seq data using inferential replicates for uncertainty of abundance quantification, as generated by Gibbs sampling or bootstrap sampling. Also the package contains a number of utilities for working with Salmon and Alevin quantification files.
Maintained by Michael Love. Last updated 5 months ago.
sequencing, rnaseq, geneexpression, transcription, normalization, regression, multiplecomparison, batcheffect, visualization, differentialexpression, differentialsplicing, alternativesplicing, singlecell, bioconductor, gene-expression, genomics, salmon, scrnaseq, statistics, transcriptomics
28 stars 7.83 score 150 scripts
statswithr
statsr:Companion Software for the Coursera Statistics with R Specialization
Data and functions to support Bayesian and frequentist inference and decision making for the Coursera Specialization "Statistics with R". See <https://github.com/StatsWithR/statsr> for more information.
Maintained by Merlise Clyde. Last updated 4 years ago.
bayesian-inference, coursera, statistics
71 stars 7.82 score 880 scripts
spsanderson
TidyDensity:Functions for Tidy Analysis and Generation of Random Data
Makes it easy to generate random numbers based upon the underlying stats distribution functions. All data is returned in a tidy and structured format, making working with the data simple and straightforward. Given that the data is returned in a tidy 'tibble', it lends itself to working with the rest of the 'tidyverse'.
Maintained by Steven Sanderson. Last updated 5 months ago.
bootstrap, density, distributions, ggplot2, probability, r-language, simulation, statistics, tibble, tidy
34 stars 7.73 score 66 scripts 1 dependents
ellessenne
rsimsum:Analysis of Simulation Studies Including Monte Carlo Error
Summarise results from simulation studies and compute Monte Carlo standard errors of commonly used summary statistics. This package is modelled on the 'simsum' user-written command in 'Stata' (White I.R., 2010 <https://www.stata-journal.com/article.html?article=st0200>), further extending it with additional performance measures and functionality.
Maintained by Alessandro Gasparini. Last updated 11 months ago.
biostatistics, monte-carlo-error, simulation, simulation-study, simulations, statistics
28 stars 7.70 score 148 scripts
statisticsnorway
SSBtools:Algorithms and Tools for Tabular Statistics and Hierarchical Computations
Includes general data manipulation functions, algorithms for statistical disclosure control (Langsrud, 2024) <doi:10.1007/978-3-031-69651-0_6> and functions for hierarchical computations by sparse model matrices (Langsrud, 2023) <doi:10.32614/RJ-2023-088>.
Maintained by Øyvind Langsrud. Last updated 14 days ago.
7 stars 7.62 score 68 scripts 7 dependents
michelenuijten
statcheck:Extract Statistics from Articles and Recompute P-Values
A "spellchecker" for statistics. It checks whether your p-values match their accompanying test statistic and degrees of freedom. statcheck searches for null-hypothesis significance test (NHST) in APA style (e.g., t(28) = 2.2, p < .05). It recalculates the p-value using the reported test statistic and degrees of freedom. If the reported and computed p-values don't match, statcheck will flag the result as an error. If the reported p-value is statistically significant and the recomputed one is not, or vice versa, the result will be flagged as a decision error. You can use statcheck directly on a string of text, but you can also scan a PDF or HTML file, or even a folder of PDF and/or HTML files. Statcheck needs an external program to convert PDF to text: Xpdf. Instructions on where and how to download this program, how to install statcheck, and more details on what statcheck can and cannot do can be found in the online manual: <https://rpubs.com/michelenuijten/statcheckmanual>. You can find a point-and-click web interface to scan PDF or HTML or DOCX articles on <http://statcheck.io>.
Maintained by Michele B. Nuijten. Last updated 8 months ago.
nhst, p-values, reproducibility, statistics
178 stars 7.55 score 40 scripts
isaakiel
mortAAR:Analysis of Archaeological Mortality Data
A collection of functions for the analysis of archaeological mortality data (on the topic see e.g. Chamberlain 2006 <https://books.google.de/books?id=nG5FoO_becAC&lpg=PA27&ots=LG0b_xrx6O&dq=life%20table%20archaeology&pg=PA27#v=onepage&q&f=false>). It takes demographic data in different formats and displays the result in a standard life table as well as plots the relevant indices (percentage of deaths, survivorship, probability of death, life expectancy, percentage of population). It also checks for possible biases in the age structure and applies corrections to life tables.
Maintained by Nils Mueller-Scheessel. Last updated 3 months ago.
anthropology, archaeology, demography, statistics
15 stars 7.49 score 23 scripts
pat-s
oddsratio:Odds Ratio Calculation for GAM(M)s & GLM(M)s
Simplified odds ratio calculation of GAM(M)s & GLM(M)s. Provides structured output (data frame) of all predictors and their corresponding odds ratios and confidence intervals for further analyses. It helps to avoid false references of predictors and increments by specifying these parameters in a list instead of using 'exp(coef(model))' (standard approach of odds ratio calculation for GLMs) which just returns a plain numeric output. For GAM(M)s, odds ratio calculation is highly simplified with this package since it takes care of the multiple 'predict()' calls of the chosen predictor while holding other predictors constant. Also, this package allows odds ratio calculation of percentage steps across the whole predictor distribution range for GAM(M)s. In both cases, confidence intervals are returned additionally. Calculated odds ratios of GAM(M)s can be inserted into the smooth function plot.
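A minimal sketch of or_glm() (argument names follow the package's documented examples; the mtcars model is only illustrative):
    library(oddsratio)
    fit <- glm(am ~ mpg + hp, data = mtcars, family = binomial)
    # odds ratios with CIs, using explicit increments for the continuous predictors
    or_glm(data = mtcars, model = fit, incr = list(mpg = 5, hp = 50))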
Maintained by Patrick Schratz. Last updated 12 months ago.
odds-ratio, probability, statistics
31 stars 7.48 score 81 scripts 1 dependents
mw201608
SuperExactTest:Exact Test and Visualization of Multi-Set Intersections
Identification of sets of objects with shared features is a common operation in all disciplines. Analysis of intersections among multiple sets is fundamental for in-depth understanding of their complex relationships. This package implements a theoretical framework for efficient computation of statistical distributions of multi-set intersections based upon combinatorial theory, and provides multiple scalable techniques for visualizing the intersection statistics. The statistical algorithm behind this package was published in Wang et al. (2015) <doi:10.1038/srep16923>.
Maintained by Minghui Wang. Last updated 1 year ago.
intersection, set, statistics, visualization
28 stars 7.47 score 70 scripts 1 dependents
vinecopulib
rvinecopulib:High Performance Algorithms for Vine Copula Modeling
Provides an interface to 'vinecopulib', a C++ library for vine copula modeling. The 'rvinecopulib' package implements the core features of the popular 'VineCopula' package, in particular inference algorithms for both vine copula and bivariate copula models. Advantages over 'VineCopula' are a sleeker and more modern API, improved performances, especially in high dimensions, nonparametric and multi-parameter families, and the ability to model discrete variables. The 'rvinecopulib' package includes 'vinecopulib' as header-only C++ library (currently version 0.7.2). Thus users do not need to install 'vinecopulib' itself in order to use 'rvinecopulib'. Since their initial releases, 'vinecopulib' is licensed under the MIT License, and 'rvinecopulib' is licensed under the GNU GPL version 3.
Maintained by Thomas Nagler. Last updated 4 days ago.
copula, estimation, statistics, vine, cpp
35 stars 7.43 score 60 scripts 14 dependents
stevecondylios
priceR:Economics and Pricing Tools
Functions to aid in micro and macro economic analysis and handling of price and currency data. Includes extraction of relevant inflation and exchange rate data from World Bank API, data cleaning/parsing, and standardisation. Inflation adjustment calculations as found in Principles of Macroeconomics by Gregory Mankiw et al (2014). Current and historical end of day exchange rates for 171 currencies from the European Central Bank Statistical Data Warehouse (2020) <https://sdw.ecb.europa.eu/curConverter.do>.
Maintained by Steve Condylios. Last updated 7 months ago.
data-science, econometrics, economics, finance, modeling, r-programming, statistics
59 stars 7.38 score 102 scripts
marberts
piar:Price Index Aggregation
Most price indexes are made with a two-step procedure, where period-over-period elemental indexes are first calculated for a collection of elemental aggregates at each point in time, and then aggregated according to a price index aggregation structure. These indexes can then be chained together to form a time series that gives the evolution of prices with respect to a fixed base period. This package contains a collection of functions that revolve around this work flow, making it easy to build standard price indexes, and implement the methods described by Balk (2008, <doi:10.1017/CBO9780511720758>), von der Lippe (2007, <doi:10.3726/978-3-653-01120-3>), and the CPI manual (2020, <doi:10.5089/9781484354841.069>) for bilateral price indexes.
Maintained by Steve Martin. Last updated 2 days ago.
economics, inflation, official-statistics, statistics
4 stars 7.30 score 25 scripts
airoldilab
sgd:Stochastic Gradient Descent for Scalable Estimation
A fast and flexible set of tools for large scale estimation. It features many stochastic gradient methods, built-in models, visualization tools, automated hyperparameter tuning, model checking, interval estimation, and convergence diagnostics.
Maintained by Junhyung Lyle Kim. Last updated 1 year ago.
big-data, data-analysis, gradient-descent, statistics, openblas, cpp
62 stars 7.25 score 71 scripts
deepankardatta
blandr:Bland-Altman Method Comparison
Carries out Bland Altman analyses (also known as a Tukey mean-difference plot) as described by JM Bland and DG Altman in 1986 <doi:10.1016/S0140-6736(86)90837-8>. This package was created in 2015 as existing Bland-Altman analysis functions did not calculate confidence intervals. This package was created to rectify this, and create reproducible plots. This package is also available as a module for the 'jamovi' statistical spreadsheet (see <https://www.jamovi.org> for more information).
Maintained by Deepankar Datta. Last updated 10 months ago.
bland-altman, ggplot2, method-comparison, statistics
22 stars 7.22 score 75 scripts
kkholst
targeted:Targeted Inference
Various methods for targeted and semiparametric inference including augmented inverse probability weighted (AIPW) estimators for missing data and causal inference (Bang and Robins (2005) <doi:10.1111/j.1541-0420.2005.00377.x>), variable importance and conditional average treatment effects (CATE) (van der Laan (2006) <doi:10.2202/1557-4679.1008>), estimators for risk differences and relative risks (Richardson et al. (2017) <doi:10.1080/01621459.2016.1192546>), assumption lean inference for generalized linear model parameters (Vansteelandt et al. (2022) <doi:10.1111/rssb.12504>).
Maintained by Klaus K. Holst. Last updated 2 months ago.
causal-inference, double-robust, estimation, semiparametric-estimation, statistics, openblas, cpp, openmp
11 stars 7.20 score 30 scripts 1 dependents
arsilva87
biotools:Tools for Biometry and Applied Statistics in Agricultural Science
Tools designed to perform and evaluate cluster analysis (including Tocher's algorithm), discriminant analysis and path analysis (standard and under collinearity), as well as some useful miscellaneous tools for dealing with sample size and optimum plot size calculations. A test for seed sample heterogeneity is now available. Mantel's permutation test can be found in this package. A new approach for calculating its power is implemented. biotools also contains tests for genetic covariance components. Heuristic approaches for performing non-parametric spatial predictions of generic response variables and spatial gene diversity are implemented.
Maintained by Anderson Rodrigo da Silva. Last updated 3 years ago.
cluster-analysis, multivariate-analysis, statistics, tocher
2 stars 7.11 score 161 scripts 1 dependents
nbarrowman
vtree:Display Information About Nested Subsets of a Data Frame
A tool for calculating and drawing "variable trees". Variable trees display information about nested subsets of a data frame.
Maintained by Nick Barrowman. Last updated 13 days ago.
data-science, data-visualization, exploratory-data-analysis, statistics
76 stars 7.09 score 65 scripts
doccstat
fastcpd:Fast Change Point Detection via Sequential Gradient Descent
Implements a fast change point detection algorithm based on the paper "Sequential Gradient Descent and Quasi-Newton's Method for Change-Point Analysis" by Xianyang Zhang and Trisha Dawn <https://proceedings.mlr.press/v206/zhang23b.html>. The algorithm is based on dynamic programming with pruning and sequential gradient descent. It is able to detect change points an order of magnitude faster than vanilla Pruned Exact Linear Time (PELT). The package includes examples of linear regression, logistic regression, Poisson regression, penalized linear regression data, and a whole lot more examples with a custom cost function in case the user wants to use their own.
Maintained by Xingchi Li. Last updated 10 days ago.
change-point-detection, cpp, custom-function, gradient-descent, lasso, linear-regression, logistic-regression, offline, pelt, penalized-regression, poisson-regression, quasi-newton, statistics, time-series, warm-start, fortran, openblas, cpp, openmp
22 stars 7.00 score 7 scripts
vandomed
tab:Create Summary Tables for Statistical Reports
Contains functions for creating various types of summary tables, e.g. comparing characteristics across levels of a categorical variable and summarizing fitted generalized linear models, generalized estimating equations, and Cox proportional hazards models. Functions are available to handle data from simple random samples as well as complex surveys.
Maintained by Dane R. Van Domelen. Last updated 4 years ago.
manuscripts, reports, reproducible-research, statistics, tables
2 stars 6.97 score 86 scripts 9 dependents
ropensci
jagstargets:Targets for JAGS Pipelines
Bayesian data analysis usually incurs long runtimes and cumbersome custom code. A pipeline toolkit tailored to Bayesian statisticians, the 'jagstargets' R package leverages 'targets' and 'R2jags' to ease this burden. 'jagstargets' makes it super easy to set up scalable JAGS pipelines that automatically parallelize the computation and skip expensive steps when the results are already up to date. Minimal custom code is required, and there is no need to manually configure branching, so usage is much easier than 'targets' alone. For the underlying methodology, please refer to the documentation of 'targets' <doi:10.21105/joss.02959> and 'JAGS' (Plummer 2003) <https://www.r-project.org/conferences/DSC-2003/Proceedings/Plummer.pdf>.
Maintained by William Michael Landau. Last updated 4 months ago.
bayesianhigh-performance-computingjagsmaker-targetopiareproducibilityrjagsstatisticstargetscpp
10 stars 6.95 score 32 scriptsrempsyc
lavaanExtra:Convenience Functions for Package 'lavaan'
Affords an alternative, vector-based syntax to 'lavaan', as well as other convenience functions such as naming paths and defining indirect links automatically, in addition to convenience formatting optimized for a publication and script sharing workflow.
Maintained by Rémi Thériault. Last updated 9 months ago.
convenience-functionslavaanpsychologystatisticsstructural-equation-modeling
18 stars 6.95 score 33 scriptsropensci
stantargets:Targets for Stan Workflows
Bayesian data analysis usually incurs long runtimes and cumbersome custom code. A pipeline toolkit tailored to Bayesian statisticians, the 'stantargets' R package leverages 'targets' and 'cmdstanr' to ease these burdens. 'stantargets' makes it super easy to set up scalable Stan pipelines that automatically parallelize the computation and skip expensive steps when the results are already up to date. Minimal custom code is required, and there is no need to manually configure branching, so usage is much easier than 'targets' alone. 'stantargets' can access all of 'cmdstanr''s major algorithms (MCMC, variational Bayes, and optimization) and it supports both single-fit workflows and multi-rep simulation studies. For the statistical methodology, please refer to 'Stan' documentation (Stan Development Team 2020) <https://mc-stan.org/>.
Maintained by William Michael Landau. Last updated 2 months ago.
bayesianhigh-performance-computingmaker-targetopiareproducibilitystanstatisticstargets
49 stars 6.85 score 180 scriptsdesctable
desctable:Produce Descriptive and Comparative Tables Easily
Easily create descriptive and comparative tables. It integrates directly with the tidyverse family of packages and pipes. Tables are produced as (nested) dataframes for easy manipulation.
Maintained by Maxime Wack. Last updated 3 years ago.
52 stars 6.85 score 45 scriptsmayer79
MetricsWeighted:Weighted Metrics and Performance Measures for Machine Learning
Provides weighted versions of several metrics and performance measures used in machine learning, including average unit deviances of the Bernoulli, Tweedie, Poisson, and Gamma distributions, see Jorgensen B. (1997, ISBN: 978-0412997112). The package also contains a weighted version of generalized R-squared, see e.g. Cohen, J. et al. (2002, ISBN: 978-0805822236). Furthermore, 'dplyr' chains are supported.
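For instance, a quick sketch of the weighted interface (toy numbers; the w argument for case weights follows the package's documented pattern):
library(MetricsWeighted)
actual    <- c(1.2, 0.8, 2.5, 1.9)
predicted <- c(1.0, 1.1, 2.2, 2.0)
w         <- c(1, 1, 2, 2)             # case weights
rmse(actual, predicted)                # unweighted root mean squared error
rmse(actual, predicted, w = w)         # weighted counterpart
r_squared(actual, predicted, w = w)    # weighted generalized R-squared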
Maintained by Michael Mayer. Last updated 8 months ago.
machine-learningmetricsperformancestatistics
11 stars 6.79 score 75 scripts 5 dependentsimbi-heidelberg
DescrTab2:Publication Quality Descriptive Statistics Tables
Provides functions to create descriptive statistics tables for continuous and categorical variables. By default, summary statistics such as mean, standard deviation, quantiles, minimum and maximum for continuous variables and relative and absolute frequencies for categorical variables are calculated. 'DescrTab2' features a sophisticated algorithm to choose appropriate test statistics for your data and provides p-values. On top of this, confidence intervals for group differences of appropriate summary measures are automatically produced for two-group comparisons. Tables generated by 'DescrTab2' can be integrated in a variety of document formats, including .html, .tex and .docx documents. 'DescrTab2' also allows printing tables to the console and saving table objects for later use.
Maintained by Jan Meis. Last updated 1 years ago.
categorical-variablescontinuous-variabledescriptive-statisticsp-valuesstatistical-testsstatistics
9 stars 6.71 score 19 scripts 1 dependentsdoomlab
MOTE:Effect Size and Confidence Interval Calculator
Measure of the Effect ('MOTE') is an effect size calculator, including a wide variety of effect sizes in the mean differences family (all versions of d) and the variance overlap family (eta, omega, epsilon, r). 'MOTE' provides non-central confidence intervals for each effect size, relevant test statistics, and output for reporting in APA Style (American Psychological Association, 2010, <ISBN:1433805618>) with 'LaTeX'. In research, an over-reliance on p-values may conceal the fact that a study is under-powered (Halsey, Curran-Everett, Vowler, & Drummond, 2015 <doi:10.1038/nmeth.3288>). A test may be statistically significant, yet practically inconsequential (Fritz, Scherndl, & Kühberger, 2012 <doi:10.1177/0959354312436870>). Although the American Psychological Association has long advocated for the inclusion of effect sizes (Wilkinson & American Psychological Association Task Force on Statistical Inference, 1999 <doi:10.1037/0003-066X.54.8.594>), the vast majority of peer-reviewed, published academic studies stop short of reporting effect sizes and confidence intervals (Cumming, 2013, <doi:10.1177/0956797613504966>). 'MOTE' simplifies the use and interpretation of effect sizes and confidence intervals. For more information, visit <https://www.aggieerin.com/shiny-server>.
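As an illustration (made-up summary statistics; d.ind.t() is the package's independent-samples d calculator, and the exact names of the returned confidence-limit fields are an assumption):
library(MOTE)
# Cohen's d for two independent groups from summary statistics (toy numbers)
res <- d.ind.t(m1 = 5.2, m2 = 4.6, sd1 = 1.1, sd2 = 1.3, n1 = 40, n2 = 42, a = .05)
res$d                    # point estimate of d
c(res$dlow, res$dhigh)   # non-central confidence interval (field names assumed)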
Maintained by Erin M. Buchanan. Last updated 3 years ago.
confidenceeffectintervalsizestatistics
17 stars 6.69 score 320 scripts 1 dependentskoenderks
jfa:Statistical Methods for Auditing
Provides statistical methods for auditing as implemented in JASP for Audit (Derks et al., 2021 <doi:10.21105/joss.02733>). First, the package makes it easy for an auditor to plan a statistical sample, select the sample from the population, and evaluate the misstatement in the sample compliant with international auditing standards. Second, the package provides statistical methods for auditing data, including tests of digit distributions and repeated values. Finally, the package includes methods for auditing algorithms on the aspect of fairness and bias. Next to classical statistical methodology, the package implements Bayesian equivalents of these methods whose statistical underpinnings are described in Derks et al. (2021) <doi:10.1111/ijau.12240>, Derks et al. (2024) <doi:10.2308/AJPT-2021-086>, Derks et al. (2022) <doi:10.31234/osf.io/8nf3e>, Derks et al. (2024) <doi:10.31234/osf.io/tgq5z>, and Derks et al. (2025) <doi:10.31234/osf.io/b8tu2>.
Maintained by Koen Derks. Last updated 12 days ago.
algorithm-auditingauditaudit-samplingbayesiandata-auditingjaspjasp-for-auditstatistical-auditstatisticscpp
8 stars 6.69 score 17 scriptsmingzehuang
latentcor:Fast Computation of Latent Correlations for Mixed Data
The first stand-alone R package for computation of latent correlation that takes into account all variable types (continuous/binary/ordinal/zero-inflated), comes with an optimized memory footprint, and is computationally efficient, essentially making latent correlation estimation almost as fast as rank-based correlation estimation. The estimation is based on latent copula Gaussian models. For continuous/binary types, see Fan, J., Liu, H., Ning, Y., and Zou, H. (2017). For ternary type, see Quan X., Booth J.G. and Wells M.T. (2018) <arXiv:1809.06255>. For truncated type or zero-inflated type, see Yoon G., Carroll R.J. and Gaynanova I. (2020) <doi:10.1093/biomet/asaa007>. For approximation method of computation, see Yoon G., Müller C.L. and Gaynanova I. (2021) <doi:10.1080/10618600.2021.1882468>. The latter method uses multi-linear interpolation originally implemented in the R package <https://cran.r-project.org/package=chebpol>.
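A rough usage sketch (the gen_data() simulator, its returned X element, and the types/method arguments are assumptions based on the package description and documentation):
library(latentcor)
# simulate one continuous and one binary variable, then estimate their latent correlation
sim <- gen_data(n = 200, types = c("con", "bin"))   # gen_data() and its return structure are assumed
latentcor(sim$X, types = c("con", "bin"), method = "approx")$R   # estimated latent correlation matrix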
Maintained by Mingze Huang. Last updated 3 years ago.
data-analysisdata-miningdata-processingdata-sciencedata-structuresmachine-learningmixed-typesstatistics
16 stars 6.65 score 46 scripts 1 dependentsbusiness-science
modeltime.resample:Resampling Tools for Time Series Forecasting
A 'modeltime' extension that implements forecast resampling tools that assess time-based model performance and stability for a single time series, panel data, and cross-sectional time series analysis.
Maintained by Matt Dancho. Last updated 1 years ago.
accuracy-metricsbacktestingbootstrapbootstrappingcross-validationforecastingmodeltimemodeltime-resampleresamplingstatisticstidymodelstime-series
19 stars 6.64 score 38 scripts 1 dependentssimnph
SimNPH:Simulate Non-Proportional Hazards
A toolkit for simulation studies concerning time-to-event endpoints with non-proportional hazards. 'SimNPH' encompasses functions for simulating time-to-event data in various scenarios, simulating different trial designs like fixed follow-up, event-driven, and group sequential designs. The package provides functions to calculate the true values of common summary statistics for the implemented scenarios and offers common analysis methods for time-to-event data. Helper functions for running simulations with the 'SimDesign' package and for aggregating and presenting the results are also included. Results of the conducted simulation study are available in the paper: "A Comparison of Statistical Methods for Time-To-Event Analyses in Randomized Controlled Trials Under Non-Proportional Hazards", Klinglmüller et al. (2025) <doi:10.1002/sim.70019>.
Maintained by Tobias Fellinger. Last updated 24 days ago.
clinical-trial-simulationsnon-proportional-hazardsstatistical-simulationstatisticssurvival-analysis
6 stars 6.63 score 43 scriptsrsquaredacademy
xplorerr:Tools for Interactive Data Exploration
Tools for interactive data exploration built using 'shiny'. Includes apps for descriptive statistics, visualizing probability distributions, inferential statistics, linear regression, logistic regression and RFM analysis.
Maintained by Aravind Hebbali. Last updated 5 months ago.
dataexplorationshiny-appsstatisticsvisualizationcpp
38 stars 6.62 score 11 scripts 6 dependentsmarberts
gpindex:Generalized Price and Quantity Indexes
Tools to build and work with bilateral generalized-mean price indexes (and by extension quantity indexes), and indexes composed of generalized-mean indexes (e.g., superlative quadratic-mean indexes, GEKS). Covers the core mathematical machinery for making bilateral price indexes, computing price relatives, detecting outliers, and decomposing indexes, with wrappers for all common (and many uncommon) index-number formulas. Implements and extends many of the methods in Balk (2008, <doi:10.1017/CBO9780511720758>), von der Lippe (2007, <doi:10.3726/978-3-653-01120-3>), and the CPI manual (2020, <doi:10.5089/9781484354841.069>).
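For example, a small sketch of the generalized-mean building blocks (toy prices and quantities; generalized_mean(r) returning a weighted-mean function is the package's documented pattern):
library(gpindex)
p1 <- c(2, 3, 4); p0 <- c(1, 3, 5)    # current- and base-period prices
q0 <- c(10, 5, 2)                     # base-period quantities
rel <- p1 / p0                        # price relatives
generalized_mean(1)(rel, p0 * q0)     # arithmetic mean with base expenditure weights (a Laspeyres-type index)
generalized_mean(0)(rel, p0 * q0)     # geometric mean of the same relatives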
Maintained by Steve Martin. Last updated 1 days ago.
economicsinflationofficial-statisticsstatistics
7 stars 6.60 score 29 scripts 1 dependentsserkor1
SLmetrics:Machine Learning Performance Evaluation on Steroids
Performance evaluation metrics for supervised and unsupervised machine learning, statistical learning and artificial intelligence applications. Core computations are implemented in 'C++' for scalability and efficiency.
Maintained by Serkan Korkmaz. Last updated 1 days ago.
cppdata-analysisdata-scienceeigen3machine-learningperformance-metricsrcpprcppeigenstatisticssupervised-learningcpp
22 stars 6.56 scorebrubinstein
diffpriv:Easy Differential Privacy
An implementation of major general-purpose mechanisms for privatizing statistics, models, and machine learners, within the framework of differential privacy of Dwork et al. (2006) <doi:10.1007/11681878_14>. Example mechanisms include the Laplace mechanism for releasing numeric aggregates, and the exponential mechanism for releasing set elements. A sensitivity sampler (Rubinstein & Alda, 2017) <arXiv:1706.02562> permits sampling target non-private function sensitivity; combined with the generic mechanisms, it permits turn-key privatization of arbitrary programs.
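A brief sketch in the spirit of the package vignette (treat the exact argument and field names as assumptions):
library(diffpriv)
f <- function(X) mean(X)                             # non-private target statistic
n <- 100
mech <- DPMechLaplace(target = f, sensitivity = 1/n, dims = 1)
X <- runif(n)                                        # sensitive data on [0, 1]
r <- releaseResponse(mech, privacyParams = DPParamsEps(epsilon = 1), X = X)
r$response                                           # privatized mean (field name assumed)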
Maintained by Benjamin Rubinstein. Last updated 3 years ago.
data-sciencedifferential-privacydiffprivmachine-learningstatistics
67 stars 6.54 score 52 scriptssacema
inctools:Incidence Estimation Tools
Tools for estimating incidence from biomarker data in cross-sectional surveys, and for calibrating tests for recent infection. Implements and extends the method of Kassanjee et al. (2012) <doi:10.1097/EDE.0b013e3182576c07>.
Maintained by Eduard Grebe. Last updated 4 years ago.
biomarkersbiostatisticsepidemiologyhivincidenceincidence-estimationincidence-inferenceinfectious-diseasesstatistics
6 stars 6.51 score 27 scriptsterrytangyuan
lfda:Local Fisher Discriminant Analysis
Functions for performing and visualizing Local Fisher Discriminant Analysis (LFDA), Kernel Fisher Discriminant Analysis (KLFDA), and Semi-supervised Local Fisher Discriminant Analysis (SELF).
Maintained by Yuan Tang. Last updated 2 years ago.
dimensionality-reductiondistance-metric-learningmachine-learningmetric-learningstatistics
76 stars 6.50 score 74 scripts 3 dependentsdhaine
episensr:Basic Sensitivity Analysis of Epidemiological Results
Basic sensitivity analysis of the observed relative risks adjusting for unmeasured confounding and misclassification of the exposure/outcome, or both. It follows the bias analysis methods and examples from the book by Lash T.L, Fox M.P, and Fink A.K. "Applying Quantitative Bias Analysis to Epidemiologic Data", ('Springer', 2021).
Maintained by Denis Haine. Last updated 1 years ago.
biasepidemiologysensitivity-analysisstatistics
13 stars 6.48 score 39 scripts 1 dependentsr-spark
sparklyr.flint:Sparklyr Extension for 'Flint'
This sparklyr extension makes 'Flint' time series library functionalities (<https://github.com/twosigma/flint>) easily accessible through R.
Maintained by Edgar Ruiz. Last updated 3 years ago.
apache-sparkdata-analysisdata-miningdata-sciencedistributeddistributed-computingflintremote-clusterssparksparklyrstatistical-analysisstatisticsstatssummarizationsummary-statisticstime-seriestime-series-analysistwosigma-flint
9 stars 6.46 score 54 scriptselbersb
segregation:Entropy-Based Segregation Indices
Computes segregation indices, including the Index of Dissimilarity, as well as the information-theoretic indices developed by Theil (1971) <isbn:978-0471858454>, namely the Mutual Information Index (M) and Theil's Information Index (H). The M, further described by Mora and Ruiz-Castillo (2011) <doi:10.1111/j.1467-9531.2011.01237.x> and Frankel and Volij (2011) <doi:10.1016/j.jet.2010.10.008>, is a measure of segregation that is highly decomposable. The package provides tools to decompose the index by units and groups (local segregation), and by within and between terms. The package also provides a method to decompose differences in segregation as described by Elbers (2021) <doi:10.1177/0049124121986204>. The package includes standard error estimation by bootstrapping, which also corrects for small sample bias. The package also contains functions for visualizing segregation patterns.
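As a quick sketch (toy counts; mutual_total() with group, unit and weight columns follows the package's documented interface):
library(segregation)
d <- data.frame(school = rep(c("s1", "s2"), each = 2),
                race   = rep(c("A", "B"), times = 2),
                n      = c(80, 20, 30, 70))
mutual_total(d, "race", "school", weight = "n")   # Mutual Information Index M and Theil's H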
Maintained by Benjamin Elbers. Last updated 1 years ago.
entropysegregationstatisticscpp
36 stars 6.44 score 51 scriptsfaosorios
fastmatrix:Fast Computation of some Matrices Useful in Statistics
Small set of functions for fast computation of some matrices and operations useful in statistics and econometrics. Currently, there are functions for efficient computation of duplication, commutation and symmetrizer matrices with minimal storage requirements. Some commonly used matrix decompositions (LU and LDL), basic matrix operations (for instance, Hadamard, Kronecker products and the Sherman-Morrison formula) and iterative solvers for linear systems are also available. In addition, the package includes a number of common statistical procedures such as the sweep operator, weighted mean and covariance matrix using an online algorithm, linear regression (using Cholesky, QR, SVD, sweep operator and conjugate gradients methods), ridge regression (with optimal selection of the ridge parameter considering several procedures), omnibus tests for univariate normality, functions to compute the multivariate skewness, kurtosis, the Mahalanobis distance (checking positive definiteness), and the Wilson-Hilferty transformation of gamma variables. Furthermore, the package provides interfaces to C code callable by C code from other R packages.
Maintained by Felipe Osorio. Last updated 1 years ago.
commutation-matrixjarque-bera-testldl-factorizationlu-factorizationmatrix-api-for-r-packagesmatrix-normsmodified-choleskyols-regressionpower-methodridge-regressionsherman-morrisonstatisticssweep-operatorsymmetrizer-matrixfortranopenblas
19 stars 6.37 score 37 scripts 11 dependentsnt-williams
lmtp:Non-Parametric Causal Effects of Feasible Interventions Based on Modified Treatment Policies
Non-parametric estimators for causal effects based on longitudinal modified treatment policies as described in Diaz, Williams, Hoffman, and Schenck <doi:10.1080/01621459.2021.1955691>, traditional point treatment, and traditional longitudinal effects. Continuous, binary, categorical, and multivariate treatments are allowed, as well as censored outcomes. The treatment mechanism is estimated via a density ratio classification procedure irrespective of treatment variable type. Additive treatment effects can be calculated for both continuous and binary outcomes, and relative risks and odds ratios for binary outcomes. Supports survival outcomes with competing risks (Diaz, Hoffman, and Hejazi; <doi:10.1007/s10985-023-09606-7>).
Maintained by Nicholas Williams. Last updated 21 days ago.
causal-inferencecensored-datalongitudinal-datamachine-learningmodified-treatment-policynonparametric-statisticsprecision-medicinerobust-statisticsstatisticsstochastic-interventionssurvival-analysistargeted-learning
64 stars 6.37 score 91 scriptscmstatr
cmstatr:Statistical Methods for Composite Material Data
An implementation of the statistical methods commonly used for advanced composite materials in aerospace applications. This package focuses on calculating basis values (lower tolerance bounds) for material strength properties, as well as performing the associated diagnostic tests. This package provides functions for calculating basis values assuming several different distributions, as well as providing functions for non-parametric methods of computing basis values. Functions are also provided for testing the hypothesis that there is no difference between strength and modulus data from an alternate sample and that from a "qualification" or "baseline" sample. For a discussion of these statistical methods and their use, see the Composite Materials Handbook, Volume 1 (2012, ISBN: 978-0-7680-7811-4). Additional details about this package are available in the paper by Kloppenborg (2020, <doi:10.21105/joss.02265>).
Maintained by Stefan Kloppenborg. Last updated 10 days ago.
composite-material-datadatamaterials-sciencestatistical-analysisstatistics
4 stars 6.36 score 23 scriptshoxo-m
densratio:Density Ratio Estimation
Density ratio estimation. The estimated density ratio function can be used in many applications such as anomaly detection, change-point detection, covariate shift adaptation. The implemented methods are uLSIF (Hido et al. (2011) <doi:10.1007/s10115-010-0283-2>), RuLSIF (Yamada et al. (2011) <doi:10.1162/NECO_a_00442>), and KLIEP (Sugiyama et al. (2007) <doi:10.1007/s10463-008-0197-x>).
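For example (a sketch following the README-style usage, with package defaults assumed):
library(densratio)
x <- rnorm(200, mean = 1, sd = 1/8)     # numerator sample
y <- rnorm(200, mean = 1, sd = 1/2)     # denominator sample
fit <- densratio(x, y)                  # uLSIF by default
w <- fit$compute_density_ratio(x)       # estimated density ratio p_x/p_y evaluated at x
summary(w)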
Maintained by Koji Makiyama. Last updated 6 years ago.
anomalydetectionmachine-learningmachine-learning-algorithmsmachine-learning-libraryr-languagestatistics
21 stars 6.36 score 36 scripts 2 dependentsbioc
structToolbox:Data processing & analysis tools for Metabolomics and other omics
An extensive set of data (pre-)processing and analysis methods and tools for metabolomics and other omics, with a strong emphasis on statistics and machine learning. This toolbox allows the user to build extensive and standardised workflows for data analysis. The methods and tools have been implemented using class-based templates provided by the struct (Statistics in R Using Class-based Templates) package. The toolbox includes pre-processing methods (e.g. signal drift and batch correction, normalisation, missing value imputation and scaling), univariate (e.g. ttest, various forms of ANOVA, Kruskal–Wallis test and more) and multivariate statistical methods (e.g. PCA and PLS, including cross-validation and permutation testing) as well as machine learning methods (e.g. Support Vector Machines). The STATistics Ontology (STATO) has been integrated and implemented to provide standardised definitions for the different methods, inputs and outputs.
Maintained by Gavin Rhys Lloyd. Last updated 1 months ago.
workflowstepmetabolomicsbioconductor-packagedimslc-msmachine-learningmultivariate-analysisstatisticsunivariate
10 stars 6.26 score 12 scriptsfabrice-rossi
mixvlmc:Variable Length Markov Chains with Covariates
Estimates Variable Length Markov Chains (VLMC) models and VLMC with covariates models from discrete sequences. Supports model selection via information criteria and simulation of new sequences from an estimated model. See Bühlmann, P. and Wyner, A. J. (1999) <doi:10.1214/aos/1018031204> for VLMC and Zanin Zambom, A., Kim, S. and Lopes Garcia, N. (2022) <doi:10.1111/jtsa.12615> for VLMC with covariates.
Maintained by Fabrice Rossi. Last updated 11 months ago.
machine-learningmarkov-chainmarkov-modelstatisticstime-seriescpp
2 stars 6.23 score 20 scriptsjacobseedorff21
BranchGLM:Efficient Best Subset Selection for GLMs via Branch and Bound Algorithms
Performs efficient and scalable glm best subset selection using a novel implementation of a branch and bound algorithm. To speed up the model fitting process, a range of optimization methods are implemented in 'RcppArmadillo'. Parallel computation is available using 'OpenMP'.
Maintained by Jacob Seedorff. Last updated 6 months ago.
generalized-linear-modelsregressionstatisticssubset-selectionvariable-selectionopenblascppopenmp
7 stars 6.20 score 30 scriptsmatherealize
simdata:Generate Simulated Datasets
Generate simulated datasets from an initial underlying distribution and apply transformations to obtain realistic data. Implements the 'NORTA' (Normal-to-anything) approach from Cario and Nelson (1997) and other data generating mechanisms. Simple network visualization tools are provided to facilitate communicating the simulation setup.
Maintained by Michael Kammer. Last updated 4 months ago.
data-generationregressionsimulationstatistics
7 stars 6.10 score 10 scripts 1 dependentsmodal-inria
RMixtComp:Mixture Models with Heterogeneous and (Partially) Missing Data
Mixture Composer (Biernacki (2015) <https://inria.hal.science/hal-01253393v1>) is a project to perform clustering using mixture models with heterogeneous data and partially missing data. Mixture models are fitted using a SEM algorithm. It includes 8 models for real, categorical, counting, functional and ranking data.
Maintained by Quentin Grimonprez. Last updated 11 months ago.
clusteringcppheterogeneous-datamissing-datamixed-datamixture-modelstatistics
13 stars 6.10 score 12 scriptscapnrefsmmat
regressinator:Simulate and Diagnose (Generalized) Linear Models
Simulate samples from populations with known covariate distributions, generate response variables according to common linear and generalized linear model families, draw from sampling distributions of regression estimates, and perform visual inference on diagnostics from model fits.
Maintained by Alex Reinhart. Last updated 6 months ago.
4 stars 6.08 score 25 scriptstanaylab
tgstat:Amos Tanay's Group High Performance Statistical Utilities
A collection of high performance utilities to compute distance, correlation, auto correlation, clustering and other tasks. Contains graph clustering algorithm described in "MetaCell: analysis of single-cell RNA-seq data using K-nn graph partitions" (Yael Baran, Akhiad Bercovich, Arnau Sebe-Pedros, Yaniv Lubling, Amir Giladi, Elad Chomsky, Zohar Meir, Michael Hoichman, Aviezer Lifshitz & Amos Tanay, 2019 <doi:10.1186/s13059-019-1812-2>).
Maintained by Aviezer Lifshitz. Last updated 6 months ago.
algorithms-implementedcorrelationknnstatisticsopenblascpp
8 stars 6.06 score 24 scripts 1 dependentsterrytangyuan
autoplotly:Automatic Generation of Interactive Visualizations for Statistical Results
Functionalities to automatically generate interactive visualizations for statistical results supported by 'ggfortify', such as time series, PCA, clustering and survival analysis, with 'plotly.js' <https://plotly.com/> and 'ggplot2' style. The generated visualizations can also be easily extended using 'ggplot2' and 'plotly' syntax while staying interactive.
Maintained by Yuan Tang. Last updated 2 years ago.
data-visualizationggplot2interactive-visualizationsmachine-learningplotlyplotlyjsstatistics
88 stars 6.01 score 23 scriptsltrr-arizona-edu
burnr:Forest Fire History Analysis
Tools to read, write, parse, and analyze forest fire history data (e.g. FHX). Described in Malevich et al. (2018) <doi:10.1016/j.dendro.2018.02.005>.
Maintained by Steven Malevich. Last updated 3 years ago.
citationdendrochronologyecologyforestfireplotscientificstatistics
15 stars 5.95 score 59 scriptsalexioannides
pipeliner:Machine Learning Pipelines for R
A framework for defining 'pipelines' of functions for applying data transformations, model estimation and inverse-transformations, resulting in predicted value generation (or model-scoring) functions that automatically apply the entire pipeline of functions required to go from input to predicted output.
Maintained by Alex Ioannides. Last updated 8 years ago.
data-sciencemachine-learningmachine-learning-pipelinespipelinepredictionstatisticstransform-functionsworkflow
67 stars 5.94 score 26 scriptsterrytangyuan
dml:Distance Metric Learning in R
State-of-the-art algorithms for distance metric learning, including global and local methods such as Relevant Component Analysis, Discriminative Component Analysis, Local Fisher Discriminant Analysis, etc. These distance metric learning methods are widely applied in feature extraction, dimensionality reduction, clustering, classification, information retrieval, and computer vision problems.
Maintained by Yuan Tang. Last updated 2 years ago.
dimensionality-reductiondistance-metric-learningmachine-learningmetric-learningstatistics
58 stars 5.94 score 8 scripts 1 dependentspegeler
samplesizeCMH:Power and Sample Size Calculation for the Cochran-Mantel-Haenszel Test
Calculates the power and sample size for Cochran-Mantel-Haenszel tests. There are also several helper functions for working with probability, odds, relative risk, and odds ratio values.
Maintained by Paul Egeler. Last updated 2 months ago.
categorical-datacmh-testsample-sizestatistical-powerstatistics
4 stars 5.94 score 36 scriptsbozenne
BuyseTest:Generalized Pairwise Comparisons
Implementation of the Generalized Pairwise Comparisons (GPC) as defined in Buyse (2010) <doi:10.1002/sim.3923> for complete observations, and extended in Peron (2018) <doi:10.1177/0962280216658320> to deal with right-censoring. GPC compares two groups of observations (intervention vs. control group) regarding several prioritized endpoints to estimate the probability that a random observation drawn from one group performs better/worse/equivalently than a random observation drawn from the other group. Summary statistics such as the net treatment benefit, win ratio, or win odds are then deduced from these probabilities. Confidence intervals and p-values are obtained based on asymptotic results (Ozenne 2021 <doi:10.1177/09622802211037067>), non-parametric bootstrap, or permutations. The software enables the use of thresholds of minimal importance difference, stratification, non-prioritized endpoints (O'Brien test), and can handle right-censoring and competing risks.
Maintained by Brice Ozenne. Last updated 16 days ago.
generalized-pairwise-comparisonsnon-parametricstatisticscpp
5 stars 5.91 score 90 scriptsstrakaps
MittagLeffleR:Mittag-Leffler Family of Distributions
Implements the Mittag-Leffler function, distribution, random variate generation, and estimation. Based on the Laplace-Inversion algorithm by Garrappa, R. (2015) <doi:10.1137/140971191>.
Maintained by Peter Straka. Last updated 4 years ago.
6 stars 5.88 score 28 scriptsmvuorre
bmlm:Bayesian Multilevel Mediation
Easy estimation of Bayesian multilevel mediation models with Stan.
Maintained by Matti Vuorre. Last updated 4 months ago.
bayesian-data-analysismultilevel-mediation-modelsstatisticscpp
42 stars 5.81 score 34 scriptstnagler
vinereg:D-Vine Quantile Regression
Implements D-vine quantile regression models with parametric or nonparametric pair-copulas. See Kraus and Czado (2017) <doi:10.1016/j.csda.2016.12.009> and Schallhorn et al. (2017) <doi:10.48550/arXiv.1705.08310>.
Maintained by Thomas Nagler. Last updated 3 months ago.
copulaestimationstatisticsvinecpp
11 stars 5.76 score 26 scriptsflying-sheep
ggplot.multistats:Multiple Summary Statistics for Binned Stats/Geometries
Provides the ggplot binning layer stat_summaries_hex(), which functions similarly to its singular form, but allows the use of multiple statistics per bin. Those statistics can be mapped to multiple bin aesthetics.
Maintained by Philipp Angerer. Last updated 6 months ago.
10 stars 5.76 score 16 scripts 2 dependentsclaudiozandonella
PRDA:Conduct a Prospective or Retrospective Design Analysis
An implementation of the "Design Analysis" proposed by Gelman and Carlin (2014) <doi:10.1177/1745691614551642>. It combines the evaluation of Power-Analysis with other inferential-risks as Type-M error (i.e. Magnitude) and Type-S error (i.e. Sign). See also Altoè et al. (2020) <doi:10.3389/fpsyg.2019.02893> and Bertoldo et al. (2020) <doi:10.31234/osf.io/q9f86>.
Maintained by Claudio Zandonella Callegher. Last updated 4 years ago.
design-analysisstatisticsopenblascpp
6 stars 5.73 score 30 scriptsrubenfcasal
npsp:Nonparametric Spatial Statistics
Multidimensional nonparametric spatial (spatio-temporal) geostatistics. S3 classes and methods for multidimensional: linear binning, local polynomial kernel regression (spatial trend estimation), density and variogram estimation. Nonparametric methods for simultaneous inference on both spatial trend and variogram functions (for spatial processes). Nonparametric residual kriging (spatial prediction). For details on these methods see, for example, Fernandez-Casal and Francisco-Fernandez (2014) <doi:10.1007/s00477-013-0817-8> or Castillo-Paez et al. (2019) <doi:10.1016/j.csda.2019.01.017>.
Maintained by Ruben Fernandez-Casal. Last updated 5 months ago.
geostatisticsspatial-data-analysisstatisticsfortranopenblas
4 stars 5.71 score 64 scriptssvazzole
sparsevar:Sparse VAR/VECM Models Estimation
A wrapper for sparse VAR/VECM time series models estimation using penalties like ENET (Elastic Net), SCAD (Smoothly Clipped Absolute Deviation) and MCP (Minimax Concave Penalty). Based on the work of Sumanta Basu and George Michailidis <doi:10.1214/15-AOS1315>.
Maintained by Simone Vazzoler. Last updated 4 years ago.
econometricslassomcpscadsparsestatisticstime-seriesvarvecm
11 stars 5.69 score 30 scripts 1 dependentstsostarics
contrastable:Consistent Contrast Coding for Factors
Quickly set and summarize contrasts for factors prior to regression analyses. Intended comparisons, baseline conditions, and intercepts can be explicitly set and documented without the user needing to directly manipulate matrices. Reviews and introductions for contrast coding are available in Brehm and Alday (2022)<doi:10.1016/j.jml.2022.104334> and Schad et al. (2020)<doi:10.1016/j.jml.2019.104038>.
Maintained by Thomas Sostarics. Last updated 2 months ago.
5.69 score 22 scriptsadelahladka
difNLR:DIF and DDF Detection by Non-Linear Regression Models
Detection of differential item functioning (DIF) among dichotomously scored items and differential distractor functioning (DDF) among unscored items with non-linear regression procedures based on generalized logistic regression models (Hladka & Martinkova, 2020, <doi:10.32614/RJ-2020-014>).
Maintained by Adela Hladka. Last updated 8 days ago.
differential-item-functioningitem-analysispsychometricsstatistics
6 stars 5.66 score 51 scripts 1 dependentsdgerlanc
bootES:Bootstrap Confidence Intervals on Effect Sizes
Calculate robust measures of effect sizes using the bootstrap.
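A minimal sketch (column names and the contrast weights below are illustrative; the data.col/group.col/effect.type arguments follow the package's documented interface):
library(bootES)
set.seed(1)
d <- data.frame(score = c(rnorm(30, 0), rnorm(30, 0.5)),
                group = rep(c("ctrl", "trt"), each = 30))
bootES(d, data.col = "score", group.col = "group",
       contrast = c(ctrl = -1, trt = 1), effect.type = "cohens.d")   # bootstrap CI for Cohen's d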
Maintained by Daniel Gerlanc. Last updated 5 months ago.
bootstrapping-statisticseffect-sizesocial-sciencesstatistics
11 stars 5.63 score 62 scriptshendersontrent
correctR:Corrected Test Statistics for Comparing Machine Learning Models on Correlated Samples
Calculate a set of corrected test statistics for cases when samples are not independent, such as when classification accuracy values are obtained over resamples or through k-fold cross-validation, as proposed by Nadeau and Bengio (2003) <doi:10.1023/A:1024068626366> and presented in Bouckaert and Frank (2004) <doi:10.1007/978-3-540-24775-3_3>.
Maintained by Trent Henderson. Last updated 2 months ago.
hypothesis-testingmachine-learningstatistics
14 stars 5.62 score 8 scripts 1 dependentsnelson-gon
mde:Missing Data Explorer
Correct identification and handling of missing data is one of the most important steps in any analysis. To aid this process, 'mde' provides a very easy to use yet robust framework to quickly get an idea of where the missing data lies and therefore find the most appropriate action to take. Graham WJ (2009) <doi:10.1146/annurev.psych.58.110405.085530>.
Maintained by Nelson Gonzabato. Last updated 3 years ago.
data-analysisdata-cleaningdata-explorationdata-sciencedatacleanerdatacleaningexploratory-data-analysismissingmissing-datamissing-value-treatmentmissing-valuesmissingnessomitrecodereplacestatistics
4 stars 5.61 score 34 scriptscfwp
rags2ridges:Ridge Estimation of Precision Matrices from High-Dimensional Data
Proper L2-penalized maximum likelihood estimators for precision matrices and supporting functions to employ these estimators in a graphical modeling setting. For details, see Peeters, Bilgrau, & van Wieringen (2022) <doi:10.18637/jss.v102.i04> and associated publications.
Maintained by Carel F.W. Peeters. Last updated 1 years ago.
c-plus-plusgraphical-modelsmachine-learningnetworksciencestatisticsopenblascpp
8 stars 5.60 score 46 scriptsblasbenito
collinear:Automated Multicollinearity Management
Effortless multicollinearity management in data frames with both numeric and categorical variables for statistical and machine learning applications. The package simplifies multicollinearity analysis by combining four robust methods: 1) target encoding for categorical variables (Micci-Barreca, D. 2001 <doi:10.1145/507533.507538>); 2) automated feature prioritization to prevent key variable loss during filtering; 3) pairwise correlation for all variable combinations (numeric-numeric, numeric-categorical, categorical-categorical); and 4) fast computation of variance inflation factors.
Maintained by Blas M. Benito. Last updated 2 months ago.
machine-learningmulticollinearitystatistics
11 stars 5.51 score 15 scripts 1 dependentsmuriteams
ergmito:Exponential Random Graph Models for Small Networks
Simulation and estimation of Exponential Random Graph Models (ERGMs) for small networks using exact statistics, as shown in Vega Yon et al. (2020) <DOI:10.1016/j.socnet.2020.07.005>. Unlike the 'ergm' package, 'ergmito' avoids the Markov-Chain Maximum Likelihood Estimator (MC-MLE) and instead uses the Maximum Likelihood Estimator (MLE) to fit ERGMs for small networks. Because exhaustive enumeration is computationally feasible for small networks, the package calculates likelihood functions, and other relevant functions, directly, meaning that in many cases both estimation and simulation of ERGMs for small networks can be faster and more accurate than simulation-based algorithms.
Maintained by George Vega Yon. Last updated 2 years ago.
ergmexponential-random-graph-modelsstatisticsopenblascppopenmp
9 stars 5.49 score 34 scriptsmcanouil
insane:INsulin Secretion ANalysEr
A user-friendly interface, using Shiny, to analyse glucose-stimulated insulin secretion (GSIS) assays in pancreatic beta cells or islets. The package allows the user to import several sets of experiments from different spreadsheets and to perform subsequent steps: summarise in a tidy format, visualise data quality and compare experimental conditions while accounting for technical confounders such as the date of the experiment or the technician. Together, insane provides a comprehensive method that streamlines pre-processing and analysis of GSIS experiments in a user-friendly interface. The Shiny app was initially designed for the EndoC-betaH1 cell line, following the method described in Ndiaye et al., 2017 (<doi:10.1016/j.molmet.2017.03.011>).
Maintained by Mickaël Canouil. Last updated 3 months ago.
beta-cellsendoc-betah1insulin-secretionpancreasshinystatisticsstats
3 stars 5.48 score 4 scriptsgdkrmr
coRanking:Co-Ranking Matrix
Calculates the co-ranking matrix to assess the quality of a dimensionality reduction.
Maintained by Guido Kraemer. Last updated 6 months ago.
dimensionality-reductionmanifold-learningqualitystatisticsunsupervised-learningcpp
9 stars 5.43 score 20 scripts 1 dependentspersimune
explainer:Machine Learning Model Explainer
It enables detailed interpretation of complex classification and regression models through Shapley analysis including data-driven characterization of subgroups of individuals. Furthermore, it facilitates multi-measure model evaluation, model fairness, and decision curve analysis. Additionally, it offers enhanced visualizations with interactive elements.
Maintained by Ramtin Zargari Marandi. Last updated 6 months ago.
aiclassificationclinical-researchexplainabilityexplainable-aiinterpretabilitymachine-learningregressionshapstatistics
15 stars 5.43 score 12 scriptsmodeloriented
hstats:Interaction Statistics
Fast, model-agnostic implementation of different H-statistics introduced by Jerome H. Friedman and Bogdan E. Popescu (2008) <doi:10.1214/07-AOAS148>. These statistics quantify interaction strength per feature, feature pair, and feature triple. The package supports multi-output predictions and can account for case weights. In addition, several variants of the original statistics are provided. The shape of the interactions can be explored through partial dependence plots or individual conditional expectation plots. 'DALEX' explainers, meta learners ('mlr3', 'tidymodels', 'caret') and most other models work out-of-the-box.
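A small sketch on a toy linear model (the helper h2_pairwise() and the default predict-based interface are taken from the package documentation, but treat the exact call pattern as an assumption):
library(hstats)
fit <- lm(Sepal.Length ~ . + Petal.Length:Petal.Width, data = iris)
s <- hstats(fit, X = iris[, -1])   # Friedman-Popescu H statistics for all features
h2_pairwise(s)                     # pairwise interaction strength
plot(s)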
Maintained by Michael Mayer. Last updated 7 months ago.
interactioninterpretabilitymachine-learningrstatstatisticsxai
29 stars 5.39 score 34 scriptsntguardian
CPAT:Change Point Analysis Tests
Implements several statistical tests for structural change, specifically the tests featured in Horváth, Rice and Miller (in press): CUSUM (with weighted/trimmed variants), Darling-Erdös, Hidalgo-Seo, Andrews, and the new Rényi-type test.
Maintained by Curtis Miller. Last updated 6 years ago.
11 stars 5.37 score 43 scriptsstatisticsnorway
SmallCountRounding:Small Count Rounding of Tabular Data
A statistical disclosure control tool to protect frequency tables in cases where small values are sensitive. The function PLSrounding() performs small count rounding of necessary inner cells so that all small frequencies of cross-classifications to be published (publishable cells) are rounded. This is equivalent to changing micro data since frequencies of unique combinations are changed. Thus, additivity and consistency are guaranteed. The methodology is described in Langsrud and Heldal (2018) <https://www.researchgate.net/publication/327768398_An_Algorithm_for_Small_Count_Rounding_of_Tabular_Data>.
Maintained by Øyvind Langsrud. Last updated 15 days ago.
3 stars 5.36 score 14 scriptscenterforstatistics-ugent
pim:Fit Probabilistic Index Models
Fit a probabilistic index model as described in Thas et al, 2012: <doi:10.1111/j.1467-9868.2011.01020.x>. The interface to the modeling function has changed in this new version. The old version is still available at R-Forge.
Maintained by Joris Meys. Last updated 3 months ago.
10 stars 5.33 score 43 scriptsmlr-org
mlr3inferr:Inference on the Generalization Error
Confidence interval and resampling methods for inference on the generalization error.
Maintained by Sebastian Fischer. Last updated 2 months ago.
4 stars 5.32 score 4 scripts 2 dependentsashenoy-cmbi
grafify:Easy Graphs for Data Visualisation and Linear Models for ANOVA
Easily explore data by plotting graphs with a few lines of code. Use these ggplot() wrappers to quickly draw graphs of scatter/dots with box-whiskers, violins or SD error bars, data distributions, before-after graphs, factorial ANOVA and more. Customise graphs in many ways, for example, by choosing from colour blind-friendly palettes (12 discrete, 3 continuous and 2 divergent palettes). Use the simple code for ANOVA as ordinary (lm()) or mixed-effects linear models (lmer()), including randomised-block or repeated-measures designs, and fit non-linear outcomes as a generalised additive model (gam) using mgcv(). Obtain estimated marginal means and perform post-hoc comparisons on fitted models (via emmeans()). Also includes small datasets for practising code and teaching basics before users move on to more complex designs. See vignettes for details on usage <https://grafify.shenoylab.com/>. Citation: <doi:10.5281/zenodo.5136508>.
Maintained by Avinash R Shenoy. Last updated 13 days ago.
ggplot2linear-modelspost-hoc-comparisonsstatisticsvignettes
48 stars 5.31 score 107 scriptstnagler
wdm:Weighted Dependence Measures
Provides efficient implementations of weighted dependence measures and related asymptotic tests for independence. Implemented measures are the Pearson correlation, Spearman's rho, Kendall's tau, Blomqvist's beta, and Hoeffding's D; see, e.g., Nelsen (2006) <doi:10.1007/0-387-28678-0> and Hollander et al. (2015, ISBN:9780470387375).
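For instance (a sketch; wdm() and indep_test() taking a weights argument follow the package's documented interface):
library(wdm)
set.seed(1)
x <- rnorm(100); y <- x + rnorm(100)
w <- runif(100)                                      # observation weights
wdm(x, y, method = "kendall", weights = w)           # weighted Kendall's tau
indep_test(x, y, method = "hoeffding", weights = w)  # asymptotic test of independence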
Maintained by Thomas Nagler. Last updated 3 months ago.
3 stars 5.30 score 11 scripts 21 dependentsbioc
biotmle:Targeted Learning with Moderated Statistics for Biomarker Discovery
Tools for differential expression biomarker discovery based on microarray and next-generation sequencing data that leverage efficient semiparametric estimators of the average treatment effect for variable importance analysis. Estimation and inference of the (marginal) average treatment effects of potential biomarkers are computed by targeted minimum loss-based estimation, with joint, stable inference constructed across all biomarkers using a generalization of moderated statistics for use with the estimated efficient influence function. The procedure accommodates the use of ensemble machine learning for the estimation of nuisance functions.
Maintained by Nima Hejazi. Last updated 5 months ago.
regressiongeneexpressiondifferentialexpressionsequencingmicroarrayrnaseqimmunooncologybioconductorbioconductor-packagebioconductor-packagesbioinformaticsbiomarker-discoverybiostatisticscausal-inferencecomputational-biologymachine-learningstatisticstargeted-learning
5 stars 5.30 score 5 scriptsncchung
jackstraw:Statistical Inference for Unsupervised Learning
Test for association between the observed data and their estimated latent variables. The jackstraw package provides a resampling strategy and testing scheme to estimate statistical significance of association between the observed data and their latent variables. Depending on the data type and the analysis aim, the latent variables may be estimated by principal component analysis (PCA), factor analysis (FA), K-means clustering, and related unsupervised learning algorithms. The jackstraw methods learn over-fitting characteristics inherent in this circular analysis, where the observed data are used to estimate the latent variables and used again to test against those estimated latent variables. When latent variables are estimated by PCA, the jackstraw enables statistical testing for association between observed variables and latent variables, as estimated by low-dimensional principal components (PCs). This essentially leads to identifying variables that are significantly associated with PCs. Similarly, unsupervised clustering, such as K-means clustering, partition around medoids (PAM), and others, finds coherent groups in high-dimensional data. The jackstraw estimates statistical significance of cluster membership, by testing association between data and cluster centers. Clustering membership can be improved by using the resulting jackstraw p-values and posterior inclusion probabilities (PIPs), with an application to unsupervised evaluation of cell identities in single cell RNA-seq (scRNA-seq).
Maintained by Neo Christopher Chung. Last updated 3 months ago.
clusteringk-meansmachine-learningpcastatisticsunsupervised
16 stars 5.29 score 35 scriptszzawadz
DepthProc:Statistical Depth Functions for Multivariate Analysis
The data depth concept offers a variety of powerful and user-friendly tools for robust exploration and inference for multivariate data. The offered techniques can be used successfully when, by the nature of the data, no parametric model for the generating process is known. The package consists of, among others, implementations of several data depth techniques involving multivariate quantile-quantile plots, multivariate scatter estimators, multivariate Wilcoxon tests and robust regressions.
Maintained by Zygmunt Zawadzki. Last updated 3 years ago.
depth-functionsexploratory-data-analysisstatisticsopenblascppopenmp
6 stars 5.27 score 104 scripts 2 dependentspbiecek
ddst:Data Driven Smooth Tests
Smooth tests are data driven (the alternative hypothesis is dynamically selected based on the data). In this package you will find two groups of smooth tests: goodness-of-fit tests and nonparametric tests for comparing distributions. Among the goodness-of-fit tests there are tests for the exponential, Gaussian, Gumbel and uniform distributions. Among the nonparametric tests there are tests for stochastic dominance, a k-sample test, a test with umbrella alternatives and a test for change-point problems.
Maintained by Przemyslaw Biecek. Last updated 2 years ago.
data-drivensmooth-teststatisticstest
6 stars 5.26 score 6 scripts 2 dependentsmarberts
sps:Sequential Poisson Sampling
Sequential Poisson sampling is a variation of Poisson sampling for drawing probability-proportional-to-size samples with a given number of units, and is commonly used for price-index surveys. This package gives functions to draw stratified sequential Poisson samples according to the method by Ohlsson (1998, ISSN:0282-423X), as well as other order sample designs by Rosén (1997, <doi:10.1016/S0378-3758(96)00186-3>), and generate appropriate bootstrap replicate weights according to the generalized bootstrap method by Beaumont and Patak (2012, <doi:10.1111/j.1751-5823.2011.00166.x>).
Maintained by Steve Martin. Last updated 1 days ago.
official-statisticssamplingstatisticssurvey-sampling
4 stars 5.26 score 8 scriptspetrbouchal
czso:Use Open Data from the Czech Statistical Office in R
Get programmatic access to the open data provided by the Czech Statistical Office (CZSO, <https://czso.cz>).
Maintained by Petr Bouchal. Last updated 7 months ago.
czech-republicczech-statistical-officeczsodatasetopen-datastatistics
11 stars 5.24 score 53 scriptsrcalinjageman
esci:Estimation Statistics with Confidence Intervals
A collection of functions and 'jamovi' module for the estimation approach to inferential statistics, the approach which emphasizes effect sizes, interval estimates, and meta-analysis. Nearly all functions are based on 'statpsych' and 'metafor'. This package is still under active development, and breaking changes are likely, especially with the plot and hypothesis test functions. Data sets are included for all examples from Cumming & Calin-Jageman (2024) <ISBN:9780367531508>.
Maintained by Robert Calin-Jageman. Last updated 1 months ago.
jamovijaspsciencestatisticsvisualization
24 stars 5.24 score 12 scriptsalexanderlynl
safestats:Safe Anytime-Valid Inference
Functions to design and apply tests that are anytime valid. The functions can be used to design hypothesis tests in the prospective/randomised control trial setting or in the observational/retrospective setting. The resulting tests remain valid under both optional stopping and optional continuation. The current version includes safe t-tests and safe tests of two proportions. For details on the theory of safe tests, see Grunwald, de Heide and Koolen (2019) "Safe Testing" <arXiv:1906.07801>, for details on safe logrank tests see ter Schure, Perez-Ortiz, Ly and Grunwald (2020) "The Safe Logrank Test: Error Control under Continuous Monitoring with Unlimited Horizon" <arXiv:2011.06931v3> and Turner, Ly and Grunwald (2021) "Safe Tests and Always-Valid Confidence Intervals for contingency tables and beyond" <arXiv:2106.02693> for details on safe contingency table tests.
Maintained by Alexander Ly. Last updated 2 years ago.
evalueshacktoberfestsafe-testingstatistics
6 stars 5.23 score 14 scriptsshabbychef
fromo:Fast Robust Moments
Fast, numerically robust computation of weighted moments via 'Rcpp'. Supports computation on vectors and matrices, and monoidal append of moments. Moments and cumulants over running fixed-length windows can be computed, as well as over time-based windows. Moment computations are via a generalization of Welford's method, as described by Bennett et al. (2009) <doi:10.1109/CLUSTR.2009.5289161>.
Maintained by Steven E. Pav. Last updated 4 months ago.
cumulantsmomentsrolling-statisticsstatisticscpp
3 stars 5.22 score 22 scriptsbioc
OmicCircos:High-quality circular visualization of omics data
OmicCircos is an R application and package for generating high-quality circular plots for omics data.
Maintained by Ying Hu. Last updated 5 months ago.
visualizationstatisticsannotation
5.20 score 80 scriptsmodal-inria
RMixtCompUtilities:Utility Functions for 'MixtComp' Outputs
Mixture Composer <https://github.com/modal-inria/MixtComp> is a project to build mixture models with heterogeneous data sets and partially missing data management. This package contains graphical, getter and some utility functions to facilitate the analysis of 'MixtComp' output.
Maintained by Quentin Grimonprez. Last updated 11 months ago.
clusteringcppheterogeneous-datamissing-datamixed-datamixture-modelstatistics
13 stars 5.19 score 2 scripts 1 dependentsnalimilan
logmult:Log-Multiplicative Models, Including Association Models
Functions to fit log-multiplicative models using 'gnm', with support for convenient printing, plots, and jackknife/bootstrap standard errors. For complex survey data, models can be fitted from design objects from the 'survey' package. Currently supported models include UNIDIFF (Erikson & Goldthorpe, 1992), a.k.a. the log-multiplicative layer effect model (Xie, 1992) <doi:10.2307/2096242>, and several association models: Goodman (1979) <doi:10.2307/2286971> row-column association models of the RC(M) and RC(M)-L families with one or several dimensions, and two skew-symmetric association models proposed by Yamaguchi (1990) <doi:10.2307/271086> and by van der Heijden & Mooijaart (1995) <doi:10.1177/0049124195024001002>. Functions allow computing the intrinsic association coefficient (see Bouchet-Valat (2022) <doi:10.1177/0049124119852389>) and the Altham (1970) index <doi:10.1111/j.2517-6161.1970.tb00816.x>, including via the Bayes shrinkage estimator proposed by Zhou (2015) <doi:10.1177/0081175015570097>; and the RAS/IPF/Deming-Stephan algorithm.
Maintained by Milan Bouchet-Valat. Last updated 3 years ago.
log-linear-modelmodellingstatistics
4 stars 5.18 score 76 scriptsrkabacoff
qacBase:Functions to Facilitate Exploratory Data Analysis
Functions for descriptive statistics, data management, and data visualization.
Maintained by Kabacoff Robert. Last updated 3 years ago.
1 stars 5.13 score 45 scriptsnhejazi
txshift:Efficient Estimation of the Causal Effects of Stochastic Interventions
Efficient estimation of the population-level causal effects of stochastic interventions on a continuous-valued exposure. Both one-step and targeted minimum loss estimators are implemented for the counterfactual mean value of an outcome of interest under an additive modified treatment policy, a stochastic intervention that may depend on the natural value of the exposure. To accommodate settings with outcome-dependent two-phase sampling, procedures incorporating inverse probability of censoring weighting are provided to facilitate the construction of inefficient and efficient one-step and targeted minimum loss estimators. The causal parameter and its estimation were first described by Díaz and van der Laan (2013) <doi:10.1111/j.1541-0420.2011.01685.x>, while the multiply robust estimation procedure and its application to data from two-phase sampling designs is detailed in NS Hejazi, MJ van der Laan, HE Janes, PB Gilbert, and DC Benkeser (2020) <doi:10.1111/biom.13375>. The software package implementation is described in NS Hejazi and DC Benkeser (2020) <doi:10.21105/joss.02447>. Estimation of nuisance parameters may be enhanced through the Super Learner ensemble model in 'sl3', available for download from GitHub using 'remotes::install_github("tlverse/sl3")'.
Maintained by Nima Hejazi. Last updated 6 months ago.
causal-effectscausal-inferencecensored-datamachine-learningrobust-statisticsstatisticsstochastic-interventionsstochastic-treatment-regimestargeted-learningtreatment-effectsvariable-importance
14 stars 5.12 score 19 scriptstjmahr
polypoly:Helper Functions for Orthogonal Polynomials
Tools for reshaping, plotting, and manipulating matrices of orthogonal polynomials.
Maintained by Tristan Mahr. Last updated 2 years ago.
19 stars 5.12 score 14 scriptsmikejareds
hermiter:Efficient Sequential and Batch Estimation of Univariate and Bivariate Probability Density Functions and Cumulative Distribution Functions along with Quantiles (Univariate) and Nonparametric Correlation (Bivariate)
Facilitates estimation of full univariate and bivariate probability density functions and cumulative distribution functions along with full quantile functions (univariate) and nonparametric correlation (bivariate) using Hermite series based estimators. These estimators are particularly useful in the sequential setting (both stationary and non-stationary) and one-pass batch estimation setting for large data sets. Based on: Stephanou, Michael, Varughese, Melvin and Macdonald, Iain. "Sequential quantiles via Hermite series density estimation." Electronic Journal of Statistics 11.1 (2017): 570-607 <doi:10.1214/17-EJS1245>, Stephanou, Michael and Varughese, Melvin. "On the properties of Hermite series based distribution function estimators." Metrika (2020) <doi:10.1007/s00184-020-00785-z> and Stephanou, Michael and Varughese, Melvin. "Sequential estimation of Spearman rank correlation using Hermite series estimators." Journal of Multivariate Analysis (2021) <doi:10.1016/j.jmva.2021.104783>.
Maintained by Michael Stephanou. Last updated 7 months ago.
cumulative-distribution-functionkendall-correlation-coefficientonline-algorithmsprobability-density-functionquantilespearman-correlation-coefficientstatisticsstreaming-algorithmsstreaming-datacpp
15 stars 5.11 score 17 scriptslindanab
mecor:Measurement Error Correction in Linear Models with a Continuous Outcome
Covariate measurement error correction is implemented by means of regression calibration by Carroll RJ, Ruppert D, Stefanski LA & Crainiceanu CM (2006, ISBN:1584886331), efficient regression calibration by Spiegelman D, Carroll RJ & Kipnis V (2001) <doi:10.1002/1097-0258(20010115)20:1%3C139::AID-SIM644%3E3.0.CO;2-K> and maximum likelihood estimation by Bartlett JW, Stavola DBL & Frost C (2009) <doi:10.1002/sim.3713>. Outcome measurement error correction is implemented by means of the method of moments by Buonaccorsi JP (2010, ISBN:1420066560) and efficient method of moments by Keogh RH, Carroll RJ, Tooze JA, Kirkpatrick SI & Freedman LS (2014) <doi:10.1002/sim.7011>. Standard error estimation of the corrected estimators is implemented by means of the Delta method by Rosner B, Spiegelman D & Willett WC (1990) <doi:10.1093/oxfordjournals.aje.a115715> and Rosner B, Spiegelman D & Willett WC (1992) <doi:10.1093/oxfordjournals.aje.a116453>, the Fieller method described by Buonaccorsi JP (2010, ISBN:1420066560), and the Bootstrap by Carroll RJ, Ruppert D, Stefanski LA & Crainiceanu CM (2006, ISBN:1584886331).
Maintained by Linda Nab. Last updated 3 years ago.
linear-modelsmeasurement-errorstatistics
6 stars 5.07 score 13 scriptsralmond
CPTtools:Tools for Creating Conditional Probability Tables
Provides support for parameterized tables for Bayesian networks, particularly the IRT-like DiBello tables. Also provides some tools for visualising the networks.
Maintained by Russell Almond. Last updated 3 months ago.
1 stars 5.05 score 21 scripts 4 dependentsathammad
pbox:Exploring Multivariate Spaces with Probability Boxes
Advanced statistical library offering a method to encapsulate and query the probability space of a dataset effortlessly using Probability Boxes (p-boxes). Its distinctive feature lies in the ease with which users can navigate and analyze marginal, joint, and conditional probabilities while taking into account the underlying correlation structure inherent in the data using copula theory and models. A comprehensive explanation is available in the paper "pbox: Exploring Multivariate Spaces with Probability Boxes" to be published in the Journal of Statistical Software.
Maintained by Ahmed T. Hammad. Last updated 9 months ago.
climate-changecopulaenvironmental-monitoringfinancial-analysisprobabilityrisk-assessmentrisk-managementstatistics
2 stars 5.04 score 4 scriptsjobnmadu
Dyn4cast:Dynamic Modeling and Machine Learning Environment
Estimates, predicts, and forecasts dynamic models, and computes machine learning metrics that assist in model selection for further analysis. The package also provides tools and metrics useful in machine learning and modeling, for example a quick summary, a percent-sign formatter, Mallows' Cp, and other tools. The ecosystem of this package is the analysis of economic data for national development. The package is stable, reliable, efficient, and time-saving.
Maintained by Job Nmadu. Last updated 12 days ago.
data-scienceequal-lenght-forecastforecastingknotsmachine-learningnigeriapredictionregression-modelsspline-modelsstatisticstime-series
4 stars 5.03 score 38 scriptsncchung
jaccard:Testing similarity between binary datasets using Jaccard/Tanimoto coefficients
Calculate statistical significance of Jaccard/Tanimoto similarity coefficients.
Maintained by Neo Christopher Chung. Last updated 5 years ago.
binary-datahypothesis-testingjaccardsimilaritystatisticstanimotocpp
5 stars 5.03 score 85 scripts
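
A conceptual base R sketch of the coefficient itself and a naive permutation p-value, shown only to illustrate the quantity being tested; this is not the package's interface, and the helper name jaccard_coef is illustrative.

# Jaccard/Tanimoto similarity between two binary vectors, with a naive permutation p-value
jaccard_coef <- function(a, b) sum(a & b) / sum(a | b)

set.seed(1)
x <- rbinom(100, 1, 0.3)
y <- rbinom(100, 1, 0.3)

obs  <- jaccard_coef(x, y)
perm <- replicate(2000, jaccard_coef(x, sample(y)))   # shuffling y breaks any association
p_value <- mean(perm >= obs)                          # one-sided: similarity higher than chance
c(jaccard = obs, p = p_value)
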
marberts
rsmatrix:Matrices for Repeat-Sales Price Indexes
Calculate the matrices in Shiller (1991, <doi:10.1016/S1051-1377(05)80028-2>) that serve as the foundation for many repeat-sales price indexes.
Maintained by Steve Martin. Last updated 1 day ago.
4 stars 5.00 score 7 scriptshendersontrent
theftdlc:Analyse and Interpret Time Series Features
Provides a suite of functions for analysing, interpreting, and visualising time-series features calculated from different feature sets from the 'theft' package. Implements statistical learning methodologies described in Henderson, T., Bryant, A., and Fulcher, B. (2023) <arXiv:2303.17809>.
Maintained by Trent Henderson. Last updated 2 months ago.
data-sciencedata-visualizationmachine-learningstatisticstime-series
4 stars 4.94 score 11 scriptssmac-group
avar:Allan Variance
Implements the Allan variance and the Allan variance linear regression estimator for latent time series models. More details about the method can be found, for example, in Guerrier, S., Molinari, R., & Stebler, Y. (2016) <doi:10.1109/LSP.2016.2541867>.
Maintained by Stéphane Guerrier. Last updated 3 years ago.
allan-varianceinertial-sensorsstatisticstime-seriescpp
5 stars 4.88 score 9 scripts
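
For reference, the non-overlapping Allan variance at a given cluster size can be computed in a few lines of base R. This is a conceptual sketch of the textbook formula, not the package's estimators; the helper name allan_var is illustrative.

# Non-overlapping Allan variance at cluster size m: average the series in blocks of
# length m, then take half the mean squared successive difference of the block averages.
allan_var <- function(x, m) {
  n_blocks <- floor(length(x) / m)
  block_means <- colMeans(matrix(x[1:(n_blocks * m)], nrow = m))
  mean(diff(block_means)^2) / 2
}

set.seed(1)
x <- cumsum(rnorm(10000))            # example: a random-walk signal
sapply(c(1, 2, 4, 8, 16), function(m) allan_var(x, m))
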
graemeleehickey
goldilocks:Goldilocks Adaptive Trial Designs for Time-to-Event Endpoints
Implements the Goldilocks adaptive trial design for a time-to-event outcome using a piecewise exponential model and conjugate Gamma prior distributions. The method closely follows the article by Broglio and colleagues <doi:10.1080/10543406.2014.888569>, which allows users to explore the operating characteristics of different trial designs.
Maintained by Graeme L. Hickey. Last updated 2 months ago.
adaptivebayesianbayesian-statisticsclinical-trialsstatisticscpp
7 stars 4.85 score 4 scriptsjucheng1992
ctmle:Collaborative Targeted Maximum Likelihood Estimation
Implements the general template for collaborative targeted maximum likelihood estimation. It also provides several commonly used C-TMLE instantiations, such as the vanilla/scalable variable-selection C-TMLE (Ju et al. (2017) <doi:10.1177/0962280217729845>) and the glmnet-C-TMLE algorithm (Ju et al. (2017) <arXiv:1706.10029>).
Maintained by Cheng Ju. Last updated 5 years ago.
causal-inferencemachine-learningstatisticstmle
5 stars 4.83 score 27 scriptsquadrama
DramaAnalysis:Analysis of Dramatic Texts
Analysis of preprocessed dramatic texts, with respect to literary research. The package provides functions to analyze and visualize information about characters, stage directions, the dramatic structure and the text itself. The dramatic texts are expected to be in CSV format; sample texts are provided and can be installed from within the package. The package and the reasoning behind it are described in Reiter et al. (2017) <doi:10.18420/in2017_119>.
Maintained by Nils Reiter. Last updated 5 years ago.
corpus-linguisticsdigital-humanitiesdramadramatic-textsstatistics
15 stars 4.79 score 41 scriptsbioc
HDTD:Statistical Inference about the Mean Matrix and the Covariance Matrices in High-Dimensional Transposable Data (HDTD)
Characterization of intra-individual variability using physiologically relevant measurements provides important insights into fundamental biological questions ranging from cell type identity to tumor development. For each individual, the data measurements can be written as a matrix with the different subsamples of the individual recorded in the columns and the different phenotypic units recorded in the rows. Datasets of this type are called high-dimensional transposable data. The HDTD package provides functions for conducting statistical inference for the mean relationship between the row and column variables and for the covariance structure within and between the row and column variables.
Maintained by Anestis Touloumis. Last updated 5 months ago.
differentialexpressiongeneticsgeneexpressionmicroarraysequencingstatisticalmethodsoftwarebioconductor-packagehigh-dimensionalstatisticsopenblascppopenmp
1 stars 4.78 scorebioc
VaSP:Quantification and Visualization of Variations of Splicing in Population
Discovery of genome-wide variable alternative splicing events from short-read RNA-seq data and visualizations of gene splicing information for publication-quality multi-panel figures in a population. (Warning: the visualization function has been removed because its dependency, the Sushi package, was deprecated. To use it, revert to an older version of this package.)
Maintained by Huihui Yu. Last updated 5 months ago.
rnaseqalternativesplicingdifferentialsplicingstatisticalmethodvisualizationpreprocessingclusteringdifferentialexpressionkeggimmunooncology3s-scoresalternative-splicingballgownrna-seqsplicingsqtlstatistics
3 stars 4.78 score 3 scriptsropensci
tacmagic:Positron Emission Tomography Time-Activity Curve Analysis
To facilitate the analysis of positron emission tomography (PET) time activity curve (TAC) data, and to encourage open science and replicability, this package supports data loading and analysis of multiple TAC file formats. Functions are available to analyze loaded TAC data for individual participants or in batches. Major functionality includes weighted TAC merging by region of interest (ROI), calculating models including standardized uptake value ratio (SUVR) and distribution volume ratio (DVR, Logan et al. 1996 <doi:10.1097/00004647-199609000-00008>), basic plotting functions and calculation of cut-off values (Aizenstein et al. 2008 <doi:10.1001/archneur.65.11.1509>). Please see the walkthrough vignette for a detailed overview of 'tacmagic' functions.
Maintained by Eric Brown. Last updated 5 years ago.
mrineuroimagingneuroscienceneuroscience-methodspetpet-mrpositronpositron-emission-tomographystatistics
5 stars 4.76 score 23 scriptsegarpor
goffda:Goodness-of-Fit Tests for Functional Data
Implementation of several goodness-of-fit tests for functional data. Currently, mostly related with the functional linear model with functional/scalar response and functional/scalar predictor. The package allows for the replication of the data applications considered in García-Portugués, Álvarez-Liébana, Álvarez-Pérez and González-Manteiga (2021) <doi:10.1111/sjos.12486>.
Maintained by Eduardo García-Portugués. Last updated 1 year ago.
functional-data-analysisgoodness-of-fitreproducible-researchstatisticsopenblascpp
10 stars 4.76 score 19 scripts 1 dependentsegarpor
rotasym:Tests for Rotational Symmetry on the Hypersphere
Implementation of the tests for rotational symmetry on the hypersphere proposed in García-Portugués, Paindaveine and Verdebout (2020) <doi:10.1080/01621459.2019.1665527>. The package also implements the proposed distributions on the hypersphere, based on the tangent-normal decomposition, and allows for the replication of the data application considered in the paper.
Maintained by Eduardo García-Portugués. Last updated 16 days ago.
circular-statisticsdirectional-statisticsgoodness-of-fitsemiparametricstatisticscpp
2 stars 4.68 score 32 scripts 5 dependentsalexiosg
RcppBessel:Bessel Functions Rcpp Interface
Exports an 'Rcpp' interface for the Bessel functions in the 'Bessel' package, which can then be called from the 'C++' code of other packages. For the original 'Fortran' implementation of these functions see Amos (1995) <doi:10.1145/212066.212078>.
Maintained by Alexios Galanos. Last updated 7 months ago.
mathematical-functionsrcppstatisticscpp
1 stars 4.65 score 4 scripts 1 dependentsbenjilu
forestError:A Unified Framework for Random Forest Prediction Error Estimation
Estimates the conditional error distributions of random forest predictions and common parameters of those distributions, including conditional misclassification rates, conditional mean squared prediction errors, conditional biases, and conditional quantiles, by out-of-bag weighting of out-of-bag prediction errors as proposed by Lu and Hardin (2021). This package is compatible with several existing packages that implement random forests in R.
Maintained by Benjamin Lu. Last updated 4 years ago.
inferenceintervalsmachine-learningmachinelearningpredictionrandom-forestrandomforeststatistics
26 stars 4.62 score 16 scriptsgraemeleehickey
adaptDiag:Bayesian Adaptive Designs for Diagnostic Trials
Simulate clinical trials for diagnostic test devices and evaluate the operating characteristics under an adaptive design with futility assessment determined via the posterior predictive probabilities.
Maintained by Graeme L. Hickey. Last updated 3 months ago.
adaptivebayesianbayesian-statisticsclinical-trialsdiagnostic-testsdiagnosticsstatistics
4 stars 4.60 score 5 scriptscotterell
TDCM:The Transition Diagnostic Classification Model Framework
Estimate the transition diagnostic classification model (TDCM) described in Madison & Bradshaw (2018) <doi:10.1007/s11336-018-9638-5>, a longitudinal extension of the log-linear cognitive diagnosis model (LCDM) in Henson, Templin & Willse (2009) <doi:10.1007/s11336-008-9089-5>. As the LCDM subsumes many other diagnostic classification models (DCMs), many other DCMs can be estimated longitudinally via the TDCM. The 'TDCM' package includes functions to estimate the single-group and multigroup TDCM, summarize results of interest including item parameters, growth proportions, transition probabilities, transitional reliability, attribute correlations, model fit, and growth plots.
Maintained by Michael E. Cotterell. Last updated 10 days ago.
4.60 score 5 scriptsbozenne
lavaSearch2:Tools for Model Specification in the Latent Variable Framework
Tools for model specification in the latent variable framework (add-on to the 'lava' package). The package contains three main functionalities: Wald tests/F-tests with improved control of the type 1 error in small samples, adjustment for multiple comparisons when searching for local dependencies, and adjustment for multiple comparisons when doing inference for multiple latent variable models.
Maintained by Brice Ozenne. Last updated 8 months ago.
inferencelatent-variable-modelsstatisticsopenblascpp
4.59 score 155 scriptsbioc
meshr:Tools for conducting enrichment analysis of MeSH
A set of annotation maps describing the entire MeSH assembled using data from MeSH.
Maintained by Koki Tsuyuzaki. Last updated 5 months ago.
annotationdatafunctionalannotationbioinformaticsstatisticsannotationmultiplecomparisonsmeshdb
4.56 score 9 scripts 1 dependentsjacob-long
dpm:Dynamic Panel Models Fit with Maximum Likelihood
Implements the dynamic panel models described by Allison, Williams, and Moral-Benito (2017 <doi:10.1177/2378023117710578>) in R. This class of models uses structural equation modeling to specify dynamic (lagged dependent variable) models with fixed effects for panel data. Additionally, models may have predictors that are only weakly exogenous, i.e., are affected by prior values of the dependent variable. Options also allow for random effects, dropping the lagged dependent variable, and a number of other specification choices.
Maintained by Jacob A. Long. Last updated 1 year ago.
16 stars 4.55 score 44 scriptsshah-in-boots
rmdl:A Causality-Informed Modeling Approach
A system for describing and manipulating the many models that are generated in causal inference and data analysis projects, as based on the causal theory and criteria of Austin Bradford Hill (1965) <doi:10.1177/003591576505800503>. This system includes the addition of formal attributes that modify base `R` objects, including terms and formulas, with a focus on variable roles in the "do-calculus" of modeling, as described in Pearl (2010) <doi:10.2202/1557-4679.1203>. For example, the definition of exposure, outcome, and interaction are implicit in the roles variables take in a formula. These premises allow for a more fluent modeling approach focusing on variable relationships, and assessing effect modification, as described by VanderWeele and Robins (2007) <doi:10.1097/EDE.0b013e318127181b>. The essential goal is to help contextualize formulas and models in causality-oriented workflows.
Maintained by Anish S. Shah. Last updated 10 months ago.
epidemiologymodelingstatistics
4.54 score 7 scriptscsblatvia
surveyplanning:Survey Planning Tools
Tools for sample survey planning, including sample size calculation, estimation of expected precision for the estimates of totals, and calculation of optimal sample size allocation.
Maintained by Juris Breidaks. Last updated 4 years ago.
8 stars 4.53 score 14 scripts 1 dependents
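
As a rough illustration of the kind of calculation involved (a textbook formula, not this package's interface), the sample size needed to estimate a proportion with a given margin of error, including a finite-population correction, can be computed directly in base R. The helper name sample_size_prop and its arguments are illustrative.

# Sample size for estimating a proportion p with margin of error e at confidence level conf,
# with a finite-population correction for population size N (conceptual sketch only).
sample_size_prop <- function(p = 0.5, e = 0.03, conf = 0.95, N = Inf) {
  z  <- qnorm(1 - (1 - conf) / 2)
  n0 <- z^2 * p * (1 - p) / e^2                      # infinite-population sample size
  if (is.finite(N)) n0 <- n0 / (1 + (n0 - 1) / N)    # finite-population correction
  ceiling(n0)
}
sample_size_prop(p = 0.5, e = 0.03, conf = 0.95, N = 10000)
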
hanjunwei-lab
MiRSEA:'MicroRNA' Set Enrichment Analysis
The tools for 'MicroRNA Set Enrichment Analysis' can identify risk pathways (or prior gene sets) regulated by a microRNA set in the context of microRNA expression data. (1) This package constructs a correlation profile of microRNAs and pathways using the hypergeometric test. The gene sets of pathways are derived from three public databases (Kyoto Encyclopedia of Genes and Genomes ('KEGG'); 'Reactome'; 'Biocarta') and the target gene sets of microRNAs are provided by four databases ('TarBaseV6.0'; 'mir2Disease'; 'miRecords'; 'miRTarBase'). (2) This package can quantify the change in correlation between microRNAs for each pathway (or prior gene set) based on microRNA expression data with cases and controls. (3) This package uses the weighted Kolmogorov-Smirnov statistic to calculate an enrichment score (ES) of a microRNA set that co-regulates a pathway, which reflects the degree to which a given pathway is associated with the specific phenotype. (4) This package provides visualization of the results.
Maintained by Junwei Han. Last updated 5 years ago.
statisticspathwaysmicrornaenrichment analysis
4.51 score 16 scriptsxiaoruizhu
SurrogateRsq:Goodness-of-Fit Analysis for Categorical Data using the Surrogate R-Squared
To assess and compare the models' goodness of fit, R-squared is one of the most popular measures. For categorical data analysis, however, no universally adopted R-squared measure can resemble the ordinary least squares (OLS) R-squared for linear models with continuous data. This package implements the surrogate R-squared measure for categorical data analysis, which is proposed in the study of Dungang Liu, Xiaorui Zhu, Brandon Greenwell, and Zewei Lin (2022) <doi:10.1111/bmsp.12289>. It can generate a point or interval measure of the surrogate R-squared. It can also provide a ranking measure of the percentage contribution of each variable to the overall surrogate R-squared. This ranking assessment allows one to check the importance of each variable in terms of their explained variance. This package can be jointly used with other existing R packages for variable selection and model diagnostics in the model-building process.
Maintained by Xiaorui (Jeremy) Zhu. Last updated 1 year ago.
categorical-data-analysisgoodness-of-fitr-squared-statisticstatistics
5 stars 4.48 score 12 scriptseurostat
hicp:Harmonised Index of Consumer Prices
The Harmonised Index of Consumer Prices (HICP) is the key economic figure to measure inflation in the euro area. The methodology underlying the HICP is documented in the HICP Methodological Manual (<https://ec.europa.eu/eurostat/web/products-manuals-and-guidelines/w/ks-gq-24-003>). Based on the manual, this package provides functions to access and work with HICP data from Eurostat's public database (<https://ec.europa.eu/eurostat/data/database>).
Maintained by Sebastian Weinand. Last updated 8 months ago.
consumer-price-indexinflationpricesstatistics
2 stars 4.48 score 6 scriptsyboulag
cTOST:Finite Sample Correction of the Two One-Sided Tests in the Univariate Framework
A system containing easy-to-use tools to compute the bioequivalence assessment in the univariate framework using the methods proposed in Boulaguiem et al. (2023) <doi:10.1101/2023.03.11.532179>.
Maintained by Younes Boulaguiem. Last updated 2 months ago.
bioequivalenceequivalencehighly-variable-drugsstatistics
4.48 score 4 scripts
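
For context, the standard uncorrected TOST procedure that this package adjusts can be run with two one-sided t-tests in base R. This is a minimal sketch assuming an equivalence margin of plus or minus delta on the mean difference; the helper name tost is illustrative and does not include the package's finite-sample correction.

# Standard TOST for equivalence of two means with margin +/- delta (uncorrected version).
tost <- function(x, y, delta, alpha = 0.05) {
  lower <- t.test(x, y, mu = -delta, alternative = "greater")$p.value  # H0: diff <= -delta
  upper <- t.test(x, y, mu =  delta, alternative = "less")$p.value     # H0: diff >=  delta
  p <- max(lower, upper)               # reject both one-sided nulls => conclude equivalence
  list(p.value = p, equivalent = p < alpha)
}

set.seed(1)
tost(rnorm(20, mean = 0.1), rnorm(20, mean = 0), delta = 0.5)
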
friendly
mvinfluence:Influence Measures and Diagnostic Plots for Multivariate Linear Models
Computes regression deletion diagnostics for multivariate linear models and provides some associated diagnostic plots. The diagnostic measures include hat-values (leverages), generalized Cook's distance, and generalized squared 'studentized' residuals. Several types of plots to detect influential observations are provided.
Maintained by Michael Friendly. Last updated 3 years ago.
multivariate-analysismultivariate-linear-regressionstatisticsvisualization
2 stars 4.41 score 26 scriptspbosetti
adas.utils:Design of Experiments and Factorial Plans Utilities
A number of functions to create and analyze factorial plans according to the Design of Experiments (DoE) approach, with the addition of some utility functions to perform statistical analyses. The DoE approach follows "Design and Analysis of Experiments" by Douglas C. Montgomery (2019, ISBN:978-1-119-49244-3). The package also provides utilities used in the course "Analysis of Data and Statistics" at the University of Trento, Italy.
Maintained by Paolo Bosetti. Last updated 19 hours ago.
4.40 score 6 scriptsnvietto
samplezoo:Generate Samples with a Variety of Probability Distributions
Simplifies the process of generating samples from a variety of probability distributions, allowing users to quickly create data frames for demonstrations, troubleshooting, or teaching purposes. Data are available in three sizes: small, medium, and large. For more information, refer to the package documentation.
Maintained by Nicholas Vietto. Last updated 1 month ago.
probability-distributionrngsimulationstatistics
4.40 score 8 scriptsmarkajoc
condvis:Conditional Visualization for Statistical Models
Exploring fitted models by interactively taking 2-D and 3-D sections in data space.
Maintained by Mark OConnell. Last updated 7 years ago.
20 stars 4.38 score 24 scriptsvusaverse
vvdoctor:Statistical Test App with R 'shiny'
Provides a user-friendly R 'shiny' app for performing various statistical tests on datasets. It allows users to upload data in numerous formats and perform statistical analyses. The app dynamically adapts its options based on the selected columns and supports both single and multiple column comparisons. The app's user interface is designed to streamline the process of selecting datasets, columns, and test options, making it easy for users to explore and interpret their data. The underlying functions for statistical tests are well-organized and can be used independently within other R scripts.
Maintained by Tomer Iwan. Last updated 11 months ago.
hypothesis-testingr-r-shinyshiny-appsshiny-rstatistical-testsstatisticsstats
7 stars 4.32 score 3 scriptshugleipzig
kitesquare:Visualize Contingency Tables Using Kite-Square Plots
Create a kite-square plot for contingency tables using 'ggplot2', to display their relevant quantities in a single figure (marginal, conditional, expected, observed, chi-squared). The plot resembles a flying kite inside a square if the variables are independent, and deviates from this the more dependence exists.
Maintained by John Wiedenhöft. Last updated 4 days ago.
contingency-tablecontingency-tablesstatisticsvisualisationvisualization
1 stars 4.30 scoretimbeechey
clubpro:Classification Using Binary Procrustes Rotation
Implements a classification method described by Grice (2011, ISBN:978-0-12-385194-9) using binary Procrustes rotation, a simplified version of Procrustes rotation.
Maintained by Timothy Beechey. Last updated 10 months ago.
classificationdata-analysispsychology-experimentsrcppstatistical-analysisstatisticsopenblascppopenmp
4.30 score 2 scriptsgasparl
neatStats:Neat and Painless Statistical Reporting
User-friendly, clear and simple statistics, primarily for publication in psychological science. The main functions are wrappers for other packages, but there are various additions as well. Every relevant step from data aggregation to reportable printed statistics is covered for basic experimental designs.
Maintained by Gáspár Lukács. Last updated 2 years ago.
bayesfactorconfidence-intervalspipelinestatistical-analysisstatistics
4 stars 4.30 scorexsswang
remiod:Reference-Based Multiple Imputation for Ordinal/Binary Response
Reference-based multiple imputation of ordinal and binary responses under Bayesian framework, as described in Wang and Liu (2022) <arXiv:2203.02771>. Methods for missing-not-at-random include Jump-to-Reference (J2R), Copy Reference (CR), and Delta Adjustment which can generate tipping point analysis.
Maintained by Tony Wang. Last updated 2 years ago.
bayesiancontrol-basedcopy-referencedelta-adjustmentgeneralized-linear-modelsglmjagsjump-to-referencemcmcmissing-at-randommissing-datamissing-not-at-randommultiple-imputationnon-ignorableordinal-regressionpattern-mixture-modelreference-basedstatisticscpp
4.30 score 3 scriptsstephaneguerrier
pempi:Proportion Estimation with Marginal Proxy Information
A system containing easy-to-use tools for the conditional estimation of the prevalence of an emerging or rare infectious disease using the methods proposed in Guerrier et al. (2023) <arXiv:2012.10745>.
Maintained by Stéphane Guerrier. Last updated 1 year ago.
covidprevalencerare-infectious-diseasesstatistics
4.30 score 9 scriptsthiyangt
DSjobtracker:What Skills and Qualifications are Required for Data Science Related Jobs?
Dataset containing information about job listings for data science job roles.
Maintained by Thiyanga S. Talagala. Last updated 1 year ago.
datasetqualificationsskillsstatisticstidy
3 stars 4.29 score 13 scriptsegarpor
DirStats:Nonparametric Methods for Directional Data
Nonparametric kernel density estimation, bandwidth selection, and other utilities for analyzing directional data. Implements the estimator in Bai, Rao and Zhao (1987) <doi:10.1016/0047-259X(88)90113-3>, the cross-validation bandwidth selectors in Hall, Watson and Cabrera (1987) <doi:10.1093/biomet/74.4.751> and the plug-in bandwidth selectors in García-Portugués (2013) <doi:10.1214/13-ejs821>.
Maintained by Eduardo García-Portugués. Last updated 2 years ago.
directional-statisticsnonparametric-statisticsstatisticsfortran
12 stars 4.26 score 7 scripts 1 dependentsmightymetrika
npboottprm:Nonparametric Bootstrap Test with Pooled Resampling
Addressing crucial research questions often necessitates a small sample size due to factors such as distinctive target populations, rarity of the event under study, time and cost constraints, ethical concerns, or group-level unit of analysis. Many readily available analytic methods, however, do not accommodate small sample sizes, and the choice of the best method can be unclear. The 'npboottprm' package enables the execution of nonparametric bootstrap tests with pooled resampling to help fill this gap. Grounded in the statistical methods for small sample size studies detailed in Dwivedi, Mallawaarachchi, and Alvarado (2017) <doi:10.1002/sim.7263>, the package facilitates a range of statistical tests, encompassing independent t-tests, paired t-tests, and one-way Analysis of Variance (ANOVA) F-tests. The nonparboot() function undertakes essential computations, yielding detailed outputs which include test statistics, effect sizes, confidence intervals, and bootstrap distributions. Further, 'npboottprm' incorporates an interactive 'shiny' web application, nonparboot_app(), offering intuitive, user-friendly data exploration.
Maintained by Mackson Ncube. Last updated 6 months ago.
datasciencenonparametricstatistics
1 stars 4.26 score 5 scripts 2 dependents
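
The pooled-resampling idea can be illustrated in a few lines of base R. This is a conceptual sketch of a bootstrap two-sample t-test under pooled resampling, not the package's nonparboot() interface, and the data values are made up for illustration.

# Nonparametric bootstrap two-sample t-test with pooled resampling (conceptual sketch).
set.seed(1)
x <- c(12, 15, 9, 14, 17, 11)          # small group 1
y <- c(10, 8, 13, 9, 7)                # small group 2
pooled <- c(x, y)
t_obs  <- t.test(x, y)$statistic

# Resample both groups from the pooled data (consistent with the null of no group
# difference), recompute the t statistic, and compare the observed value to this
# bootstrap null distribution.
t_null <- replicate(5000, {
  xb <- sample(pooled, length(x), replace = TRUE)
  yb <- sample(pooled, length(y), replace = TRUE)
  t.test(xb, yb)$statistic
})
p_value <- mean(abs(t_null) >= abs(t_obs))
p_value
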
hoxo-m
deltatest:Statistical Hypothesis Testing Using the Delta Method
Statistical hypothesis testing using the Delta method as proposed by Deng et al. (2018) <doi:10.1145/3219819.3219919>. This method replaces the standard variance estimation formula in the Z-test with an approximate formula derived via the Delta method, which can account for within-user correlation.
Maintained by Koji Makiyama. Last updated 13 days ago.
ab-testingdata-sciencestatistics
4 stars 4.26 score
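
The underlying idea can be illustrated with the delta-method variance of a ratio of per-user totals. This is a conceptual base R sketch assuming simulated user-level data in two experiment arms, not this package's interface; the helper name ratio_var is illustrative.

# Delta-method variance of a ratio metric sum(y)/sum(n) computed from user-level data
# (e.g. clicks per page view, where each user contributes y clicks over n views).
ratio_var <- function(y, n) {
  k <- length(y)
  r <- mean(y) / mean(n)
  (var(y) - 2 * r * cov(y, n) + r^2 * var(n)) / (k * mean(n)^2)
}

set.seed(1)
# hypothetical per-user counts in two experiment arms
n_a <- rpois(500, 10); y_a <- rbinom(500, n_a, 0.10)
n_b <- rpois(500, 10); y_b <- rbinom(500, n_b, 0.11)

diff_hat <- sum(y_b) / sum(n_b) - sum(y_a) / sum(n_a)
se       <- sqrt(ratio_var(y_a, n_a) + ratio_var(y_b, n_b))
z        <- diff_hat / se
2 * pnorm(-abs(z))                      # two-sided p-value for the Z-test
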
jeffreyevans
GeNetIt:Spatial Graph-Theoretic Genetic Gravity Modelling
Implementation of spatial graph-theoretic genetic gravity models. The model framework is applicable for other types of spatial flow questions. Includes functions for constructing spatial graphs, sampling and summarizing associated raster variables and building unconstrained and singly constrained gravity models.
Maintained by Jeffrey S. Evans. Last updated 2 years ago.
landscape-geneticsr-spatialspatialstatistics
9 stars 4.24 score 39 scriptskoenderks
digitTests:Tests for Detecting Irregular Digit Patterns
Provides statistical tests and support functions for detecting irregular digit patterns in numerical data. The package includes tools for extracting digits at various locations in a number, tests for repeated values, and (Bayesian) tests of digit distributions.
Maintained by Koen Derks. Last updated 2 years ago.
digit-analysisdigitsstatistics
3 stars 4.18 score 9 scripts
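
As a rough illustration of one such test (not this package's interface), leading digits can be extracted and compared against the Benford distribution with a chi-squared test in base R. The helper name first_digit and the simulated amounts are illustrative.

# Chi-squared test of first digits against Benford's law (conceptual sketch).
first_digit <- function(x) {
  x <- abs(x[x != 0])
  floor(x / 10^floor(log10(x)))          # leading digit of each nonzero value
}

set.seed(1)
amounts  <- rlnorm(1000, meanlog = 6, sdlog = 1.5)  # hypothetical transaction amounts
observed <- table(factor(first_digit(amounts), levels = 1:9))
benford  <- log10(1 + 1 / (1:9))                    # P(first digit = d) under Benford's law
chisq.test(observed, p = benford)
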
coatless-rpkg
msos:Data Sets and Functions Used in Multivariate Statistics: Old School by John Marden
Multivariate Analysis methods and data sets used in John Marden's book Multivariate Statistics: Old School (2015) <ISBN:978-1456538835>. This also serves as a companion package for the STAT 571: Multivariate Analysis course offered by the Department of Statistics at the University of Illinois at Urbana-Champaign ('UIUC').
Maintained by James Balamuta. Last updated 1 year ago.
3 stars 4.16 score 32 scripts 1 dependentsjoshuawlambert
rFSA:Feasible Solution Algorithm for Finding Best Subsets and Interactions
Assists in statistical model building to find optimal and semi-optimal higher order interactions and best subsets. Uses the lm(), glm(), and other R functions to fit models generated from a feasible solution algorithm. Discussed in Subset Selection in Regression, A Miller (2002). Applied and explained for least median of squares in Hawkins (1993) <doi:10.1016/0167-9473(93)90246-P>. The feasible solution algorithm comes up with model forms of a specific type that can have fixed variables, higher order interactions and their lower order terms.
Maintained by Joshua Lambert. Last updated 4 years ago.
algorithmfsainteractionmodelsparallelstatisticalstatisticssubset
7 stars 4.15 score 20 scriptsxiaoruizhu
PAsso:Assessing the Partial Association Between Ordinal Variables
An implementation of the unified framework for assessing partial association between ordinal variables after adjusting for a set of covariates (Dungang Liu, Shaobo Li, Yan Yu and Irini Moustaki (2020) <doi:10.1080/01621459.2020.1796394>, Journal of the American Statistical Association). This package provides a set of tools to quantify, visualize, and test partial associations between multiple ordinal variables. It can produce a number of phi measures, partial regression plots, 3-D plots, and p-values for testing H_0: phi = 0 or H_0: phi <= delta.
Maintained by Xiaorui (Jeremy) Zhu. Last updated 1 year ago.
association-analysisordinal-variablespartial-associationstatisticscpp
7 stars 4.14 score 13 scripts 1 dependents