Showing 200 of 263 total results
ddsjoberg
gtsummary:Presentation-Ready Data Summary and Analytic Result Tables
Creates presentation-ready tables summarizing data sets, regression models, and more. The code to create the tables is concise and highly customizable. Data frames can be summarized with any function, e.g. mean(), median(), even user-written functions. Regression models are summarized and include the reference rows for categorical variables. Common regression models, such as logistic regression and Cox proportional hazards regression, are automatically identified and the tables are pre-filled with appropriate column headers.
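A minimal sketch of the two table types described above, assuming the trial example data that ships with gtsummary (trt, age, grade and response are columns of that data):
    library(gtsummary)
    # descriptive summary of the built-in trial data, split by treatment arm
    tbl_summary(trial[c("trt", "age", "grade")], by = trt)
    # regression table with reference rows for the categorical predictor
    fit <- glm(response ~ age + grade, data = trial, family = binomial)
    tbl_regression(fit, exponentiate = TRUE)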
Maintained by Daniel D. Sjoberg. Last updated 3 days ago.
easy-to-use, gt, html5, regression-models, reproducibility, reproducible-research, statistics, summary-statistics, summary-tables, table1, tableone
1.1k stars 17.02 score 8.2k scripts 15 dependents
sebkrantz
collapse:Advanced and Fast Data Transformation
A C/C++ based package for advanced data transformation and statistical computing in R that is extremely fast, class-agnostic, robust and programmer friendly. Core functionality includes a rich set of S3 generic grouped and weighted statistical functions for vectors, matrices and data frames, which provide efficient low-level vectorizations, OpenMP multithreading, and skip missing values by default. These are integrated with fast grouping and ordering algorithms (also callable from C), and efficient data manipulation functions. The package also provides a flexible and rigorous approach to time series and panel data in R. It further includes fast functions for common statistical procedures, detailed (grouped, weighted) summary statistics, powerful tools to work with nested data, fast data object conversions, functions for memory efficient R programming, and helpers to effectively deal with variable labels, attributes, and missing data. It is well integrated with base R classes, 'dplyr'/'tibble', 'data.table', 'sf', 'units', 'plm' (panel-series and data frames), and 'xts'/'zoo'.
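A minimal sketch of the grouped and weighted statistics, assuming collapse's fmean()/fgroup_by()/fsummarise() interface (the mtcars columns are only illustrative):
    library(collapse)
    # weighted group means; missing values are skipped by default
    fmean(mtcars$mpg, g = mtcars$cyl, w = mtcars$wt)
    # the same computation in a piped, dplyr-like style
    mtcars |> fgroup_by(cyl) |> fsummarise(mean_mpg = fmean(mpg, w = wt))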
Maintained by Sebastian Krantz. Last updated 6 days ago.
data-aggregation, data-analysis, data-manipulation, data-processing, data-science, data-transformation, econometrics, high-performance, panel-data, scientific-computing, statistics, time-series, weighted, weights, cpp, openmp
672 stars 16.68 score 708 scripts 99 dependents
bethatkinson
rpart:Recursive Partitioning and Regression Trees
Recursive partitioning for classification, regression and survival trees. An implementation of most of the functionality of the 1984 book by Breiman, Friedman, Olshen and Stone.
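A minimal sketch, fitting a classification tree to the kyphosis data that ships with rpart:
    library(rpart)
    fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis, method = "class")
    printcp(fit)               # cross-validated complexity-parameter table
    plot(fit); text(fit, use.n = TRUE)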
Maintained by Beth Atkinson. Last updated 9 months ago.
52 stars 16.59 score 18k scripts 1.6k dependents
easystats
effectsize:Indices of Effect Size
Provide utilities to work with indices of effect size for a wide variety of models and hypothesis tests (see list of supported models using the function 'insight::supported_models()'), allowing computation of and conversion between indices such as Cohen's d, r, odds, etc. References: Ben-Shachar et al. (2020) <doi:10.21105/joss.02815>.
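A minimal sketch, assuming the cohens_d() and d_to_r() helpers documented in the package (mtcars is just example data):
    library(effectsize)
    cohens_d(mpg ~ am, data = mtcars)   # standardized mean difference with CI
    d_to_r(0.8)                         # convert between effect-size indices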
Maintained by Mattan S. Ben-Shachar. Last updated 2 months ago.
anova, cohens-d, compute, conversion, correlation, effect-size, effectsize, hacktoberfest, hedges-g, interpretation, standardization, standardized, statistics
344 stars 16.38 score 1.8k scripts 29 dependents
spatstat
spatstat:Spatial Point Pattern Analysis, Model-Fitting, Simulation, Tests
Comprehensive open-source toolbox for analysing Spatial Point Patterns. Focused mainly on two-dimensional point patterns, including multitype/marked points, in any spatial region. Also supports three-dimensional point patterns, space-time point patterns in any number of dimensions, point patterns on a linear network, and patterns of other geometrical objects. Supports spatial covariate data such as pixel images. Contains over 3000 functions for plotting spatial data, exploratory data analysis, model-fitting, simulation, spatial sampling, model diagnostics, and formal inference. Data types include point patterns, line segment patterns, spatial windows, pixel images, tessellations, and linear networks. Exploratory methods include quadrat counts, K-functions and their simulation envelopes, nearest neighbour distance and empty space statistics, Fry plots, pair correlation function, kernel smoothed intensity, relative risk estimation with cross-validated bandwidth selection, mark correlation functions, segregation indices, mark dependence diagnostics, and kernel estimates of covariate effects. Formal hypothesis tests of random pattern (chi-squared, Kolmogorov-Smirnov, Monte Carlo, Diggle-Cressie-Loosmore-Ford, Dao-Genton, two-stage Monte Carlo) and tests for covariate effects (Cox-Berman-Waller-Lawson, Kolmogorov-Smirnov, ANOVA) are also supported. Parametric models can be fitted to point pattern data using the functions ppm(), kppm(), slrm(), dppm() similar to glm(). Types of models include Poisson, Gibbs and Cox point processes, Neyman-Scott cluster processes, and determinantal point processes. Models may involve dependence on covariates, inter-point interaction, cluster formation and dependence on marks. Models are fitted by maximum likelihood, logistic regression, minimum contrast, and composite likelihood methods. A model can be fitted to a list of point patterns (replicated point pattern data) using the function mppm(). The model can include random effects and fixed effects depending on the experimental design, in addition to all the features listed above. Fitted point process models can be simulated, automatically. Formal hypothesis tests of a fitted model are supported (likelihood ratio test, analysis of deviance, Monte Carlo tests) along with basic tools for model selection (stepwise(), AIC()) and variable selection (sdr). Tools for validating the fitted model include simulation envelopes, residuals, residual plots and Q-Q plots, leverage and influence diagnostics, partial residuals, and added variable plots.
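A minimal sketch of the model-fitting workflow described above, using ppm() on the built-in cells point pattern:
    library(spatstat)
    # fit a Strauss (Gibbs) point-process model and inspect it
    fit <- ppm(cells ~ 1, Strauss(r = 0.1))
    summary(fit)
    # simulation envelope of the K-function as an informal goodness-of-fit check
    plot(envelope(fit, Kest, nsim = 19))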
Maintained by Adrian Baddeley. Last updated 6 days ago.
cluster-process, cox-point-process, gibbs-process, kernel-density, network-analysis, point-process, poisson-process, spatial-analysis, spatial-data, spatial-data-analysis, spatial-statistics, spatstat, statistical-methods, statistical-models, statistical-tests, statistics
200 stars 16.25 score 5.5k scripts 40 dependents
easystats
performance:Assessment of Regression Models Performance
Utilities for computing measures to assess model quality, which are not directly provided by R's 'base' or 'stats' packages. These include e.g. measures like r-squared, intraclass correlation coefficient (Nakagawa, Johnson & Schielzeth (2017) <doi:10.1098/rsif.2017.0213>), root mean squared error or functions to check models for overdispersion, singularity or zero-inflation and more. Functions apply to a large variety of regression models, including generalized linear models, mixed effects models and Bayesian models. References: Lüdecke et al. (2021) <doi:10.21105/joss.03139>.
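A minimal sketch, assuming the r2(), model_performance() and check_overdispersion() helpers exported by the package (the toy models use built-in data):
    library(performance)
    m <- lm(mpg ~ wt + cyl, data = mtcars)
    r2(m)                  # R-squared
    model_performance(m)   # AIC, RMSE, R2, ... in one table
    check_overdispersion(glm(count ~ spray, data = InsectSprays, family = poisson))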
Maintained by Daniel Lüdecke. Last updated 3 days ago.
aic, easystats, hacktoberfest, loo, machine-learning, mixed-models, models, performance, r2, statistics
1.1k stars 16.20 score 4.3k scripts 48 dependents
aphalo
ggpmisc:Miscellaneous Extensions to 'ggplot2'
Extensions to 'ggplot2' respecting the grammar of graphics paradigm. Statistics: locate and tag peaks and valleys; label plot with the equation of a fitted polynomial or other types of models; labels with P-value, R^2 or adjusted R^2 or information criteria for fitted models; label with ANOVA table for fitted models; label with summary for fitted models. Model fit classes for which suitable methods are provided by package 'broom' and 'broom.mixed' are supported. Scales and stats to build volcano and quadrant plots based on outcomes, fold changes, p-values and false discovery rates.
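A minimal sketch of labelling a fitted model on a plot, assuming the stat_poly_eq() annotation stat (mtcars is just example data):
    library(ggplot2)
    library(ggpmisc)
    # scatterplot with a linear fit, labelled with fit statistics
    ggplot(mtcars, aes(wt, mpg)) +
      geom_point() +
      geom_smooth(method = "lm", formula = y ~ x) +
      stat_poly_eq(formula = y ~ x)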
Maintained by Pedro J. Aphalo. Last updated 16 hours ago.
data-analysis, dataviz, ggplot2-annotations, ggplot2-stats, statistics
107 stars 13.64 score 4.4k scripts 14 dependents
kaz-yos
tableone:Create 'Table 1' to Describe Baseline Characteristics with or without Propensity Score Weights
Creates 'Table 1', i.e., description of baseline patient characteristics, which is essential in every medical research. Supports both continuous and categorical variables, as well as p-values and standardized mean differences. Weighted data are supported via the 'survey' package.
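A minimal sketch of CreateTableOne(), the package's main function (the mtcars variables stand in for baseline characteristics):
    library(tableone)
    tab <- CreateTableOne(vars = c("mpg", "hp", "cyl"), strata = "am",
                          data = mtcars, factorVars = "cyl")
    print(tab, smd = TRUE)   # include standardized mean differences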
Maintained by Kazuki Yoshida. Last updated 3 years ago.
baseline-characteristics, descriptive-statistics, statistics
221 stars 13.55 score 2.3k scripts 12 dependents
mitchelloharawild
distributional:Vectorised Probability Distributions
Vectorised distribution objects with tools for manipulating, visualising, and using probability distributions. Designed to allow model prediction outputs to return distributions rather than their parameters, allowing users to directly interact with predictive distributions in a data-oriented workflow. In addition to providing generic replacements for p/d/q/r functions, other useful statistics can be computed including means, variances, intervals, and highest density regions.
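A minimal sketch of the vectorised-distribution idea, assuming the dist_normal(), quantile() and hilo() generics described in the package documentation:
    library(distributional)
    d <- dist_normal(mu = c(0, 1, 2), sigma = c(1, 1, 2))   # three distributions
    mean(d)              # mean of each distribution
    quantile(d, 0.975)   # upper 2.5% points
    hilo(d, 95)          # 95% intervals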
Maintained by Mitchell O'Hara-Wild. Last updated 1 day ago.
probability-distribution, statistics, vctrs
100 stars 13.54 score 744 scripts 388 dependents
mayoverse
arsenal:An Arsenal of 'R' Functions for Large-Scale Statistical Summaries
An Arsenal of 'R' functions for large-scale statistical summaries, which are streamlined to work within the latest reporting tools in 'R' and 'RStudio' and which use formulas and versatile summary statistics for summary tables and models. The primary functions include tableby(), a Table-1-like summary of multiple variable types 'by' the levels of one or more categorical variables; paired(), a Table-1-like summary of multiple variable types paired across two time points; modelsum(), which performs simple model fits on one or more endpoints for many variables (univariate or adjusted for covariates); freqlist(), a powerful frequency table across many categorical variables; comparedf(), a function for comparing data.frames; and write2(), a function to output tables to a document.
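A minimal sketch of tableby(), using the mockstudy data shipped with arsenal:
    library(arsenal)
    tab <- tableby(arm ~ sex + age, data = mockstudy)
    summary(tab, text = TRUE)   # Table-1-style summary by study arm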
Maintained by Ethan Heinzen. Last updated 8 months ago.
baseline-characteristics, descriptive-statistics, modeling, paired-comparisons, reporting, statistics, tableone
225 stars 13.40 score 1.2k scripts 15 dependents
easystats
see:Model Visualisation Toolbox for 'easystats' and 'ggplot2'
Provides plotting utilities supporting packages in the 'easystats' ecosystem (<https://github.com/easystats/easystats>) and some extra themes, geoms, and scales for 'ggplot2'. Color scales are based on <https://materialui.co/>. References: Lüdecke et al. (2021) <doi:10.21105/joss.03393>.
Maintained by Indrajeet Patil. Last updated 17 days ago.
data-visualization, easystats, ggplot2, hacktoberfest, plotting, see, statistics, visualisation, visualization
902 stars 13.22 score 2.0k scripts 3 dependents
easystats
easystats:Framework for Easy Statistical Modeling, Visualization, and Reporting
A meta-package that installs and loads a set of packages from the 'easystats' ecosystem in a single step. This collection of packages provides a unifying and consistent framework for statistical modeling, visualization, and reporting. Additionally, it provides articles targeted at instructors for teaching 'easystats', and a dashboard targeted at new R users for easily conducting statistical analysis by accessing summary results, model fit indices, and visualizations with minimal programming.
Maintained by Daniel Lüdecke. Last updated 24 days ago.
dataanalytics, datascience, easystats, hacktoberfest, models, performance-metrics, regression-models, statistics
1.1k stars 13.01 score 1.8k scripts 1 dependents
kkholst
lava:Latent Variable Models
A general implementation of Structural Equation Models with latent variables (MLE, 2SLS, and composite likelihood estimators) with continuous, censored, and ordinal outcomes (Holst and Budtz-Joergensen (2013) <doi:10.1007/s00180-012-0344-y>). Mixture latent variable models and non-linear latent variable models (Holst and Budtz-Joergensen (2020) <doi:10.1093/biostatistics/kxy082>). The package also provides methods for graph exploration (d-separation, back-door criterion), simulation of general non-linear latent variable models, and estimation of influence functions for a broad range of statistical models.
Maintained by Klaus K. Holst. Last updated 3 months ago.
latent-variable-models, simulation, statistics, structural-equation-models
33 stars 12.87 score 610 scripts 478 dependents
drostlab
philentropy:Similarity and Distance Quantification Between Probability Functions
Computes 46 optimized distance and similarity measures for comparing probability functions (Drost (2018) <doi:10.21105/joss.00765>). These comparisons between probability functions have their foundations in a broad range of scientific disciplines from mathematics to ecology. The aim of this package is to provide a core framework for clustering, classification, statistical inference, goodness-of-fit, non-parametric statistics, information theory, and machine learning tasks that are based on comparing univariate or multivariate probability functions.
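A minimal sketch, assuming the package's distance() interface in which each row of the input is a probability vector:
    library(philentropy)
    P <- c(0.2, 0.5, 0.3)
    Q <- c(0.1, 0.4, 0.5)
    distance(rbind(P, Q), method = "jensen-shannon")   # Jensen-Shannon divergence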
Maintained by Hajk-Georg Drost. Last updated 4 months ago.
distance-measures, distance-quantification, information-theory, jensen-shannon-divergence, parametric-distributions, similarity-measures, statistics, cpp
137 stars 12.44 score 484 scripts 24 dependents
yihui
animation:A Gallery of Animations in Statistics and Utilities to Create Animations
Provides functions for animations in statistics, covering topics in probability theory, mathematical statistics, multivariate statistics, non-parametric statistics, sampling survey, linear models, time series, computational statistics, data mining and machine learning. These functions may be helpful in teaching statistics and data analysis. Also provided in this package are a series of functions to save animations to various formats, e.g. Flash, 'GIF', HTML pages, 'PDF' and videos. 'PDF' animations can be inserted into 'Sweave' / 'knitr' easily.
Maintained by Yihui Xie. Last updated 2 years ago.
animation, statistical-computing, statistical-graphics, statistics
208 stars 12.13 score 2.5k scripts 28 dependents
twolodzko
extraDistr:Additional Univariate and Multivariate Distributions
Density, distribution function, quantile function and random generation for a number of univariate and multivariate distributions. This package implements the following distributions: Bernoulli, beta-binomial, beta-negative binomial, beta prime, Bhattacharjee, Birnbaum-Saunders, bivariate normal, bivariate Poisson, categorical, Dirichlet, Dirichlet-multinomial, discrete gamma, discrete Laplace, discrete normal, discrete uniform, discrete Weibull, Frechet, gamma-Poisson, generalized extreme value, Gompertz, generalized Pareto, Gumbel, half-Cauchy, half-normal, half-t, Huber density, inverse chi-squared, inverse-gamma, Kumaraswamy, Laplace, location-scale t, logarithmic, Lomax, multivariate hypergeometric, multinomial, negative hypergeometric, non-standard beta, normal mixture, Poisson mixture, Pareto, power, reparametrized beta, Rayleigh, shifted Gompertz, Skellam, slash, triangular, truncated binomial, truncated normal, truncated Poisson, Tukey lambda, Wald, zero-inflated binomial, zero-inflated negative binomial, zero-inflated Poisson.
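A minimal sketch of the usual d/p/q/r naming scheme for the added distributions (Gumbel and zero-inflated Poisson shown; argument names assumed from the package conventions):
    library(extraDistr)
    dgumbel(1, mu = 0, sigma = 1)      # Gumbel density
    qgumbel(0.95, mu = 0, sigma = 1)   # 95% quantile
    rzip(5, lambda = 2, pi = 0.3)      # zero-inflated Poisson draws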
Maintained by Tymoteusz Wolodzko. Last updated 23 days ago.
c-plus-plus, c-plus-plus-11, distribution, multivariate-distributions, probability, random-generation, rcpp, statistics, cpp
53 stars 11.60 score 1.5k scripts 107 dependents
jacob-long
interactions:Comprehensive, User-Friendly Toolkit for Probing Interactions
A suite of functions for conducting and interpreting analysis of statistical interaction in regression models that was formerly part of the 'jtools' package. Functionality includes visualization of two- and three-way interactions among continuous and/or categorical variables as well as calculation of "simple slopes" and Johnson-Neyman intervals (see e.g., Bauer & Curran, 2005 <doi:10.1207/s15327906mbr4003_5>). These capabilities are implemented for generalized linear models in addition to the standard linear regression context.
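A minimal sketch, assuming the interact_plot() and sim_slopes() probing functions (the mtcars interaction is only illustrative):
    library(interactions)
    fit <- lm(mpg ~ hp * wt, data = mtcars)
    interact_plot(fit, pred = hp, modx = wt)                      # simple-slopes plot
    sim_slopes(fit, pred = hp, modx = wt, johnson_neyman = TRUE)  # J-N interval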
Maintained by Jacob A. Long. Last updated 8 months ago.
interactions, moderation, social-sciences, statistics
131 stars 11.40 score 1.2k scripts 5 dependents
tnagler
VineCopula:Statistical Inference of Vine Copulas
Provides tools for the statistical analysis of regular vine copula models, see Aas et al. (2009) <doi:10.1016/j.insmatheco.2007.02.001> and Dissman et al. (2013) <doi:10.1016/j.csda.2012.08.010>. The package includes tools for parameter estimation, model selection, simulation, goodness-of-fit tests, and visualization. Tools for estimation, selection and exploratory data analysis of bivariate copula models are also provided.
Maintained by Thomas Nagler. Last updated 4 days ago.
copula, estimation, statistics, vine
92 stars 11.07 score 362 scripts 23 dependents
spatstat
spatstat.data:Datasets for 'spatstat' Family
Contains all the datasets for the 'spatstat' family of packages.
Maintained by Adrian Baddeley. Last updated 11 days ago.
kernel-density, point-process, spatial-analysis, spatial-data, spatial-data-analysis, spatstat, statistical-analysis, statistical-methods, statistical-tests, statistics
6 stars 11.07 score 186 scripts 228 dependents
openml
OpenML:Open Machine Learning and Open Data Platform
We provide an R interface to 'OpenML.org', an online machine learning platform where researchers can access open data, download and upload data sets, share their machine learning tasks and experiments, and organize them online to work and collaborate with other researchers. The R interface allows querying for data sets with specific properties, and supports downloading and uploading data sets, tasks, flows and runs. See <https://www.openml.org/guide/api> for more information.
Maintained by Giuseppe Casalicchio. Last updated 10 months ago.
arff, benchmarking, benchmarking-suite, classification, data-science, database, dataset, datasets, machine-learning, machine-learning-algorithms, open-data, open-science, opendata, openml, openscience, regression, reproducible-research, statistics
97 stars 11.04 score 7.1k scripts
config-i1
greybox:Toolbox for Model Building and Forecasting
Implements functions and instruments for regression model building and its application to forecasting. The main scope of the package is variable selection and model specification for time series data. This includes promotional modelling, selection between different dynamic regressions with non-standard error distributions, selection based on cross validation, solutions to the fat regression model problem, and more. Models developed in the package are tailored specifically for forecasting purposes, so several methods are provided for producing forecasts from these models and visualising them.
Maintained by Ivan Svetunkov. Last updated 14 days ago.
forecasting, model-selection, model-selection-and-evaluation, regression, regression-models, statistics, cpp
30 stars 11.03 score 97 scripts 34 dependents
rudeboybert
fivethirtyeight:Data and Code Behind the Stories and Interactives at 'FiveThirtyEight'
Datasets and code published by the data journalism website 'FiveThirtyEight' available at <https://github.com/fivethirtyeight/data>. Note that while we received guidance from editors at 'FiveThirtyEight', this package is not officially published by 'FiveThirtyEight'.
Maintained by Albert Y. Kim. Last updated 2 years ago.
data-science, datajournalism, fivethirtyeight, statistics
453 stars 10.98 score 1.7k scripts
neuropsychology
psycho:Efficient and Publishing-Oriented Workflow for Psychological Science
The main goal of the psycho package is to provide tools for psychologists, neuropsychologists and neuroscientists, to facilitate and speed up the time spent on data analysis. It aims at supporting best practices and tools to format the output of statistical methods to directly paste them into a manuscript, ensuring statistical reporting standardization and conformity.
Maintained by Dominique Makowski. Last updated 4 years ago.
apa, apa6, bayesian, correlation, format, interpretation, mixed-models, neuroscience, psycho, psychology, rstanarm, statistics
149 stars 10.86 score 628 scripts 5 dependents
ovvo-financial
NNS:Nonlinear Nonparametric Statistics
Nonlinear nonparametric statistics using partial moments. Partial moments are the elements of variance and asymptotically approximate the area of f(x). These robust statistics provide the basis for nonlinear analysis while retaining linear equivalences. NNS offers: Numerical integration, Numerical differentiation, Clustering, Correlation, Dependence, Causal analysis, ANOVA, Regression, Classification, Seasonality, Autoregressive modeling, Normalization, Stochastic dominance and Advanced Monte Carlo sampling. All routines based on: Viole, F. and Nawrocki, D. (2013), Nonlinear Nonparametric Statistics: Using Partial Moments (ISBN: 1490523995).
Maintained by Fred Viole. Last updated 1 hour ago.
clustering, econometrics, machine-learning, nonlinear, nonparametric, partial-moments, statistics, time-series, cpp
72 stars 10.77 score 66 scripts 3 dependents
mariarizzo
energy:E-Statistics: Multivariate Inference via the Energy of Data
E-statistics (energy) tests and statistics for multivariate and univariate inference, including distance correlation, one-sample, two-sample, and multi-sample tests for comparing multivariate distributions, are implemented. Measuring and testing multivariate independence based on distance correlation, partial distance correlation, multivariate goodness-of-fit tests, k-groups and hierarchical clustering based on energy distance, testing for multivariate normality, distance components (disco) for non-parametric analysis of structured data, and other energy statistics/methods are implemented.
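A minimal sketch of distance correlation and the corresponding independence test, assuming the dcor() and dcor.test() functions exported by the package:
    library(energy)
    x <- iris[, 1:2]; y <- iris[, 3:4]
    dcor(x, y)                 # distance correlation
    dcor.test(x, y, R = 199)   # permutation test of independence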
Maintained by Maria Rizzo. Last updated 7 months ago.
distance-correlation, energy, multivariate-analysis, statistics, cpp
45 stars 10.69 score 634 scripts 45 dependents
rempsyc
rempsyc:Convenience Functions for Psychology
Make your workflow faster and easier. Easily customizable plots (via 'ggplot2'), nice APA tables (following the style of the *American Psychological Association*) exportable to Word (via 'flextable'), easily run statistical tests or check assumptions, and automatize various other tasks.
Maintained by Rémi Thériault. Last updated 2 months ago.
convenience-functions, ggplot2, psychology, statistics, visualization
43 stars 10.68 score 214 scripts 2 dependents
ikosmidis
brglm2:Bias Reduction in Generalized Linear Models
Estimation and inference from generalized linear models based on various methods for bias reduction and maximum penalized likelihood with powers of the Jeffreys prior as penalty. The 'brglmFit' fitting method can achieve reduction of estimation bias by solving either the mean bias-reducing adjusted score equations in Firth (1993) <doi:10.1093/biomet/80.1.27> and Kosmidis and Firth (2009) <doi:10.1093/biomet/asp055>, or the median bias-reduction adjusted score equations in Kenne et al. (2017) <doi:10.1093/biomet/asx046>, or through the direct subtraction of an estimate of the bias of the maximum likelihood estimator from the maximum likelihood estimates as in Cordeiro and McCullagh (1991) <https://www.jstor.org/stable/2345592>. See Kosmidis et al (2020) <doi:10.1007/s11222-019-09860-6> for more details. Estimation in all cases takes place via a quasi Fisher scoring algorithm, and S3 methods for the construction of confidence intervals for the reduced-bias estimates are provided. In the special case of generalized linear models for binomial and multinomial responses (both ordinal and nominal), the adjusted score approaches to mean and median bias reduction have been found to return estimates with improved frequentist properties, that are also always finite, even in cases where the maximum likelihood estimates are infinite (e.g. complete and quasi-complete separation; see Kosmidis and Firth, 2020 <doi:10.1093/biomet/asaa052>, for a proof for mean bias reduction in logistic regression).
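A minimal sketch of the documented usage through glm()'s method argument; the endometrial data, a classic separation example, ships with brglm2:
    library(brglm2)
    data(endometrial)
    # mean bias-reduced fit; estimates stay finite even under separation
    fit <- glm(HG ~ NV + PI + EH, family = binomial, data = endometrial,
               method = "brglmFit")
    summary(fit)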
Maintained by Ioannis Kosmidis. Last updated 7 months ago.
adjusted-score-equations, algorithms, bias-reducing-adjustments, bias-reduction, estimation, glm, logistic-regression, nominal-responses, ordinal-responses, regression, regression-algorithms, statistics
32 stars 10.41 score 106 scripts 10 dependents
atsa-es
MARSS:Multivariate Autoregressive State-Space Modeling
The MARSS package provides maximum-likelihood parameter estimation for constrained and unconstrained linear multivariate autoregressive state-space (MARSS) models, including partially deterministic models. MARSS models are a class of dynamic linear model (DLM) and vector autoregressive (VAR) model. Fitting is available via Expectation-Maximization (EM), BFGS (using optim), and 'TMB' (using the 'marssTMB' companion package). Functions are provided for parametric and innovations bootstrapping, Kalman filtering and smoothing, model selection criteria including bootstrap AICb, confidence intervals via the Hessian approximation or bootstrapping, and all conditional residual types. See the user guide for examples of dynamic factor analysis, dynamic linear models, outlier and shock detection, and multivariate AR-p models. Online workshops (lectures, eBook, and computer labs) at <https://atsa-es.github.io/>.
Maintained by Elizabeth Eli Holmes. Last updated 1 years ago.
multivariate-timeseries, state-space-models, statistics, time-series
52 stars 10.34 score 596 scripts 3 dependents
msalibian
RobStatTM:Robust Statistics: Theory and Methods
Companion package for the book: "Robust Statistics: Theory and Methods, second edition", <http://www.wiley.com/go/maronna/robust>. This package contains code that implements the robust estimators discussed in the recent second edition of the book above, as well as the scripts reproducing all the examples in the book.
Maintained by Matias Salibian-Barrera. Last updated 15 days ago.
robust, robust-estimation, robust-regression, robust-statistics, robustness, statistics, fortran, openblas
17 stars 10.23 score 84 scripts 8 dependents
stan-dev
projpred:Projection Predictive Feature Selection
Performs projection predictive feature selection for generalized linear models (Piironen, Paasiniemi, and Vehtari, 2020, <doi:10.1214/20-EJS1711>) with or without multilevel or additive terms (Catalina, Bürkner, and Vehtari, 2022, <https://proceedings.mlr.press/v151/catalina22a.html>), for some ordinal and nominal regression models (Weber, Glass, and Vehtari, 2023, <arXiv:2301.01660>), and for many other regression models (using the latent projection by Catalina, Bürkner, and Vehtari, 2021, <arXiv:2109.04702>, which can also be applied to most of the former models). The package is compatible with the 'rstanarm' and 'brms' packages, but other reference models can also be used. See the vignettes and the documentation for more information and examples.
Maintained by Frank Weber. Last updated 11 days ago.
bayes, bayesian, bayesian-inference, rstanarm, stan, statistics, variable-selection, openblas, cpp
112 stars 10.09 score 241 scripts
tlverse
sl3:Pipelines for Machine Learning and Super Learning
A modern implementation of the Super Learner prediction algorithm, coupled with a general purpose framework for composing arbitrary pipelines for machine learning tasks.
Maintained by Jeremy Coyle. Last updated 4 months ago.
data-science, ensemble-learning, ensemble-model, machine-learning, model-selection, regression, stacking, statistics
100 stars 9.94 score 748 scripts 7 dependents
stocnet
RSiena:Siena - Simulation Investigation for Empirical Network Analysis
The main purpose of this package is to perform simulation-based estimation of stochastic actor-oriented models for longitudinal network data collected as panel data. Dependent variables can be single or multivariate networks, which can be directed, non-directed, or two-mode; and associated actor variables. There are also functions for testing parameters and checking goodness of fit. An overview of these models is given in Snijders (2017), <doi:10.1146/annurev-statistics-060116-054035>.
Maintained by Tom A.B. Snijders. Last updated 2 months ago.
longitudinal-data, rsiena, social-network-analysis, statistical-network-analysis, statistics, cpp
107 stars 9.93 score 346 scripts 1 dependents
acclab
dabestr:Data Analysis using Bootstrap-Coupled Estimation
Data Analysis using Bootstrap-Coupled ESTimation. Estimation statistics is a simple framework that avoids the pitfalls of significance testing. It uses familiar statistical concepts: means, mean differences, and error bars. More importantly, it focuses on the effect size of one's experiment/intervention, as opposed to a false dichotomy engendered by P values. An estimation plot has two key features: 1. It presents all datapoints as a swarmplot, which orders each point to display the underlying distribution. 2. It presents the effect size as a bootstrap 95% confidence interval on a separate but aligned axes. Estimation plots are introduced in Ho et al., Nature Methods 2019, 1548-7105. <doi:10.1038/s41592-019-0470-3>. The free-to-view PDF is located at <https://www.nature.com/articles/s41592-019-0470-3.epdf?author_access_token=Euy6APITxsYA3huBKOFBvNRgN0jAjWel9jnR3ZoTv0Pr6zJiJ3AA5aH4989gOJS_dajtNr1Wt17D0fh-t4GFcvqwMYN03qb8C33na_UrCUcGrt-Z0J9aPL6TPSbOxIC-pbHWKUDo2XsUOr3hQmlRew%3D%3D>.
Maintained by Yishan Mai. Last updated 1 year ago.
data-analysis, data-visualization, estimation, statistics
214 stars 9.80 score 142 scripts
jasonjfoster
roll:Rolling and Expanding Statistics
Fast and efficient computation of rolling and expanding statistics for time-series data.
Maintained by Jason Foster. Last updated 2 months ago.
algorithms, rcpp, statistics, openblas, cpp, openmp
116 stars 9.76 score 318 scripts 13 dependents
dcousin3
superb:Summary Plots with Adjusted Error Bars
Computes standard error and confidence interval of various descriptive statistics under various designs and sampling schemes. The main function, superb(), returns a plot. It can also be used to obtain a dataframe with the statistics and their precision intervals so that other plotting environments (e.g., Excel) can be used. See Cousineau and colleagues (2021) <doi:10.1177/25152459211035109> or Cousineau (2017) <doi:10.5709/acp-0214-z> for a review as well as Cousineau (2005) <doi:10.20982/tqmp.01.1.p042>, Morey (2008) <doi:10.20982/tqmp.04.2.p061>, Baguley (2012) <doi:10.3758/s13428-011-0123-7>, Cousineau & Laurencelle (2016) <doi:10.1037/met0000055>, Cousineau & O'Brien (2014) <doi:10.3758/s13428-013-0441-z>, Calderini & Harding <doi:10.20982/tqmp.15.1.p001> for specific references.
Maintained by Denis Cousineau. Last updated 2 months ago.
error-bars, plotting, statistics, summary-plots, summary-statistics, visualization
19 stars 9.53 score 155 scripts 2 dependents
tbates
umx:Structural Equation Modeling and Twin Modeling in R
Quickly create, run, and report structural equation models, and twin models. See '?umx' for help, and umx_open_CRAN_page("umx") for NEWS. Timothy C. Bates, Michael C. Neale, Hermine H. Maes, (2019). umx: A library for Structural Equation and Twin Modelling in R. Twin Research and Human Genetics, 22, 27-41. <doi:10.1017/thg.2019.2>.
Maintained by Timothy C. Bates. Last updated 14 days ago.
behavior-genetics, genetics, openmx, psychology, sem, statistics, structural-equation-modeling, tutorials, twin-models, umx
44 stars 9.45 score 472 scripts
eblondel
rsdmx:Tools for Reading SDMX Data and Metadata
Set of classes and methods to read data and metadata documents exchanged through the Statistical Data and Metadata Exchange (SDMX) framework, currently focusing on the SDMX XML standard format (SDMX-ML).
Maintained by Emmanuel Blondel. Last updated 8 days ago.
api, datastructures, dsd, read, readsdmx, sdmx, sdmx-format, sdmx-provider, sdmx-standards, statistics, timeseries, web-services
105 stars 9.37 score 4 dependents
dmphillippo
multinma:Bayesian Network Meta-Analysis of Individual and Aggregate Data
Network meta-analysis and network meta-regression models for aggregate data, individual patient data, and mixtures of both individual and aggregate data using multilevel network meta-regression as described by Phillippo et al. (2020) <doi:10.1111/rssa.12579>. Models are estimated in a Bayesian framework using 'Stan'.
Maintained by David M. Phillippo. Last updated 2 days ago.
35 stars 9.34 score 163 scripts
ledell
cvAUC:Cross-Validated Area Under the ROC Curve Confidence Intervals
Tools for working with and evaluating cross-validated area under the ROC curve (AUC) estimators. The primary functions of the package are ci.cvAUC and ci.pooled.cvAUC, which report cross-validated AUC and compute confidence intervals for cross-validated AUC estimates based on influence curves for i.i.d. and pooled repeated measures data, respectively. One benefit to using influence curve based confidence intervals is that they require much less computation time than bootstrapping methods. The utility functions, AUC and cvAUC, are simple wrappers for functions from the ROCR package.
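A minimal sketch of ci.cvAUC() on toy cross-validated predictions (the simulated scores and fold labels are purely illustrative):
    library(cvAUC)
    set.seed(1)
    labels <- rbinom(200, 1, 0.5)
    scores <- labels + rnorm(200)     # noisy cross-validated predictions
    folds  <- rep(1:10, each = 20)    # fold membership from 10-fold CV
    cvAUC(scores, labels, folds = folds)      # cross-validated AUC
    ci.cvAUC(scores, labels, folds = folds)   # with an influence-curve CI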
Maintained by Erin LeDell. Last updated 3 years ago.
auc, confidence-intervals, cross-validation, machine-learning, statistics, variance
23 stars 9.17 score 317 scripts 40 dependents
doubleml
DoubleML:Double Machine Learning in R
Implementation of the double/debiased machine learning framework of Chernozhukov et al. (2018) <doi:10.1111/ectj.12097> for partially linear regression models, partially linear instrumental variable regression models, interactive regression models and interactive instrumental variable regression models. 'DoubleML' allows estimation of the nuisance parts in these models by machine learning methods and computation of the Neyman orthogonal score functions. 'DoubleML' is built on top of 'mlr3' and the 'mlr3' ecosystem. The object-oriented implementation of 'DoubleML' based on the 'R6' package is very flexible. More information available in the publication in the Journal of Statistical Software: <doi:10.18637/jss.v108.i03>.
Maintained by Philipp Bach. Last updated 4 months ago.
causal-inference, data-science, double-machine-learning, econometrics, machine-learning, mlr3, statistics
139 stars 9.16 score 267 scripts 1 dependents
great-northern-diver
loon:Interactive Statistical Data Visualization
An extendable toolkit for interactive data visualization and exploration.
Maintained by R. Wayne Oldford. Last updated 2 years ago.
data-analysis, data-science, data-visualization, exploratory-analysis, exploratory-data-analysis, high-dimensional-data, interactive-graphics, interactive-visualizations, loon, python, statistical-analysis, statistical-graphics, statistics, tcl-extension, tk
48 stars 9.00 score 93 scripts 5 dependents
graemeleehickey
joineRML:Joint Modelling of Multivariate Longitudinal Data and Time-to-Event Outcomes
Fits the joint model proposed by Henderson and colleagues (2000) <doi:10.1093/biostatistics/1.4.465>, but extended to the case of multiple continuous longitudinal measures. The time-to-event data is modelled using a Cox proportional hazards regression model with time-varying covariates. The multiple longitudinal outcomes are modelled using a multivariate version of the Laird and Ware linear mixed model. The association is captured by a multivariate latent Gaussian process. The model is estimated using a Monte Carlo Expectation Maximization algorithm. This project was funded by the Medical Research Council (Grant number MR/M013227/1).
Maintained by Graeme L. Hickey. Last updated 2 months ago.
armadillo, biostatistics, clinical-trials, cox, dynamic, joint-models, longitudinal-data, multivariate-analysis, multivariate-data, multivariate-longitudinal-data, prediction, rcpp, regression-models, statistics, survival, openblas, cpp, openmp
30 stars 8.93 score 146 scripts 1 dependents
mattcowgill
readabs:Download and Tidy Time Series Data from the Australian Bureau of Statistics
Downloads, imports, and tidies time series data from the Australian Bureau of Statistics <https://www.abs.gov.au/>.
Maintained by Matt Cowgill. Last updated 26 days ago.
abs, australia, australian-bureau-of-statistics, australian-data, statistics, tidy-data, time-series
104 stars 8.85 score 180 scripts
ropengov
regions:Processing Regional Statistics
Validating sub-national statistical typologies, re-coding across standard typologies of sub-national statistics, and making valid aggregate level imputation, re-aggregation, re-weighting and projection down to lower hierarchical levels to create meaningful data panels and time series.
Maintained by Daniel Antal. Last updated 3 years ago.
observatory, regions, ropengov, statistics
12 stars 8.81 score 67 scripts 5 dependents
openpharma
brms.mmrm:Bayesian MMRMs using 'brms'
The mixed model for repeated measures (MMRM) is a popular model for longitudinal clinical trial data with continuous endpoints, and 'brms' is a powerful and versatile package for fitting Bayesian regression models. The 'brms.mmrm' R package leverages 'brms' to run MMRMs, and it supports a simplified interface to reduce difficulty and align with the best practices of the life sciences. References: Bürkner (2017) <doi:10.18637/jss.v080.i01>, Mallinckrodt (2008) <doi:10.1177/009286150804200402>.
Maintained by William Michael Landau. Last updated 6 months ago.
brms, life-sciences, mc-stan, mmrm, stan, statistics
21 stars 8.80 score 13 scripts
jinseob2kim
jsmodule:'RStudio' Addins and 'Shiny' Modules for Medical Research
'RStudio' addins and 'Shiny' modules for descriptive statistics, regression and survival analysis.
Maintained by Jinseob Kim. Last updated 10 days ago.
medical, rstudio-addins, shiny, shiny-modules, statistics
21 stars 8.69 score 61 scripts
mobiodiv
mobr:Measurement of Biodiversity
Functions for calculating metrics for the measurement of biodiversity and its changes across scales, treatments, and gradients. The methods implemented in this package are described in: Chase, J.M., et al. (2018) <doi:10.1111/ele.13151>, McGlinn, D.J., et al. (2019) <doi:10.1111/2041-210X.13102>, McGlinn, D.J., et al. (2020) <doi:10.1101/851717>, and McGlinn, D.J., et al. (2023) <doi:10.1101/2023.09.19.558467>.
Maintained by Daniel McGlinn. Last updated 8 days ago.
biodiversity, conservation, ecology, rarefaction, species, statistics
23 stars 8.65 score 93 scripts
mayer79
confintr:Confidence Intervals
Calculates classic and/or bootstrap confidence intervals for many parameters such as the population mean, variance, interquartile range (IQR), median absolute deviation (MAD), skewness, kurtosis, Cramer's V, odds ratio, R-squared, quantiles (incl. median), proportions, different types of correlation measures, difference in means, quantiles and medians. Many of the classic confidence intervals are described in Smithson, M. (2003, ISBN: 978-0761924999). Bootstrap confidence intervals are calculated with the R package 'boot'. Both one- and two-sided intervals are supported.
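A minimal sketch, assuming the package's ci_*() naming (ci_mean(), ci_cor()); the bootstrap variants delegate to 'boot' as noted above:
    library(confintr)
    ci_mean(mtcars$mpg)                       # classic t interval for the mean
    ci_mean(mtcars$mpg, type = "bootstrap")   # bootstrap interval
    ci_cor(mtcars$mpg, mtcars$wt, method = "spearman", type = "bootstrap")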
Maintained by Michael Mayer. Last updated 8 months ago.
bootstrap, confidence-intervals, statistical-inference, statistics
16 stars 8.62 score 104 scripts 17 dependents
bioc
nullranges:Generation of null ranges via bootstrapping or covariate matching
Modular package for generation of sets of ranges representing the null hypothesis. These can take the form of bootstrap samples of ranges (using the block bootstrap framework of Bickel et al 2010), or sets of control ranges that are matched across one or more covariates. nullranges is designed to be inter-operable with other packages for analysis of genomic overlap enrichment, including the plyranges Bioconductor package.
Maintained by Michael Love. Last updated 5 months ago.
visualization, genesetenrichment, functionalgenomics, epigenetics, generegulation, genetarget, genomeannotation, annotation, genomewideassociation, histonemodification, chipseq, atacseq, dnaseseq, rnaseq, hiddenmarkovmodel, bioconductor, bootstrap, genomics, matching, statistics
27 stars 8.16 score 50 scripts 1 dependents
gmgeorg
LambertW:Probabilistic Models to Analyze and Gaussianize Heavy-Tailed, Skewed Data
Lambert W x F distributions are a generalized framework to analyze skewed, heavy-tailed data. It is based on an input/output system, where the output random variable (RV) Y is a non-linearly transformed version of an input RV X ~ F with similar properties as X, but slightly skewed (heavy-tailed). The transformed RV Y has a Lambert W x F distribution. This package contains functions to model and analyze skewed, heavy-tailed data the Lambert Way: simulate random samples, estimate parameters, compute quantiles, and plot/ print results nicely. The most useful function is 'Gaussianize', which works similarly to 'scale', but actually makes the data Gaussian. A do-it-yourself toolkit allows users to define their own Lambert W x 'MyFavoriteDistribution' and use it in their analysis right away.
Maintained by Georg M. Goerg. Last updated 1 year ago.
gaussianize, gaussianize-data, heavy-tailed, heavy-tailed-distributions, leptokurtosis, normal-distribution, normalization, skewed-data, statistics, cpp
10 stars 8.16 score 78 scripts 13 dependents
psychbruce
bruceR:Broadly Useful Convenient and Efficient R Functions
Broadly useful convenient and efficient R functions that bring users concise and elegant R data analyses. This package includes easy-to-use functions for (1) basic R programming (e.g., set working directory to the path of currently opened file; import/export data from/to files in any format; print tables to Microsoft Word); (2) multivariate computation (e.g., compute scale sums/means/... with reverse scoring); (3) reliability analyses and factor analyses; (4) descriptive statistics and correlation analyses; (5) t-test, multi-factor analysis of variance (ANOVA), simple-effect analysis, and post-hoc multiple comparison; (6) tidy report of statistical models (to R Console and Microsoft Word); (7) mediation and moderation analyses (PROCESS); and (8) additional toolbox for statistics and graphics.
Maintained by Han-Wu-Shuang Bao. Last updated 10 months ago.
anova, data-analysis, data-science, linear-models, linear-regression, multilevel-models, statistics, toolbox
176 stars 7.87 score 316 scripts 3 dependents
cwatson
brainGraph:Graph Theory Analysis of Brain MRI Data
A set of tools for performing graph theory analysis of brain MRI data. It works with data from a Freesurfer analysis (cortical thickness, volumes, local gyrification index, surface area), diffusion tensor tractography data (e.g., from FSL) and resting-state fMRI data (e.g., from DPABI). It contains a graphical user interface for graph visualization and data exploration, along with several functions for generating useful figures.
Maintained by Christopher G. Watson. Last updated 1 year ago.
brain-connectivity, brain-imaging, complex-networks, connectome, connectomics, fmri, graph-theory, mri, network-analysis, neuroimaging, neuroscience, statistics, tractography
188 stars 7.86 score 107 scripts 3 dependents
bioc
fishpond:Fishpond: downstream methods and tools for expression data
Fishpond contains methods for differential transcript and gene expression analysis of RNA-seq data using inferential replicates for uncertainty of abundance quantification, as generated by Gibbs sampling or bootstrap sampling. Also the package contains a number of utilities for working with Salmon and Alevin quantification files.
Maintained by Michael Love. Last updated 5 months ago.
sequencing, rnaseq, geneexpression, transcription, normalization, regression, multiplecomparison, batcheffect, visualization, differentialexpression, differentialsplicing, alternativesplicing, singlecell, bioconductor, gene-expression, genomics, salmon, scrnaseq, statistics, transcriptomics
28 stars 7.83 score 150 scripts
statswithr
statsr:Companion Software for the Coursera Statistics with R Specialization
Data and functions to support Bayesian and frequentist inference and decision making for the Coursera Specialization "Statistics with R". See <https://github.com/StatsWithR/statsr> for more information.
Maintained by Merlise Clyde. Last updated 4 years ago.
bayesian-inference, coursera, statistics
71 stars 7.82 score 880 scripts
spsanderson
TidyDensity:Functions for Tidy Analysis and Generation of Random Data
Makes it easy to generate random numbers based upon the underlying stats distribution functions. All data is returned in a tidy and structured format, making working with the data simple and straightforward. Given that the data is returned in a tidy 'tibble', it lends itself to working with the rest of the 'tidyverse'.
Maintained by Steven Sanderson. Last updated 5 months ago.
bootstrap, density, distributions, ggplot2, probability, r-language, simulation, statistics, tibble, tidy
34 stars 7.73 score 66 scripts 1 dependents
ellessenne
rsimsum:Analysis of Simulation Studies Including Monte Carlo Error
Summarise results from simulation studies and compute Monte Carlo standard errors of commonly used summary statistics. This package is modelled on the 'simsum' user-written command in 'Stata' (White I.R., 2010 <https://www.stata-journal.com/article.html?article=st0200>), further extending it with additional performance measures and functionality.
Maintained by Alessandro Gasparini. Last updated 11 months ago.
biostatistics, monte-carlo-error, simulation, simulation-study, simulations, statistics
28 stars 7.70 score 148 scripts
statisticsnorway
SSBtools:Algorithms and Tools for Tabular Statistics and Hierarchical Computations
Includes general data manipulation functions, algorithms for statistical disclosure control (Langsrud, 2024) <doi:10.1007/978-3-031-69651-0_6> and functions for hierarchical computations by sparse model matrices (Langsrud, 2023) <doi:10.32614/RJ-2023-088>.
Maintained by Øyvind Langsrud. Last updated 14 days ago.
7 stars 7.62 score 68 scripts 7 dependents
michelenuijten
statcheck:Extract Statistics from Articles and Recompute P-Values
A "spellchecker" for statistics. It checks whether your p-values match their accompanying test statistic and degrees of freedom. statcheck searches for null-hypothesis significance test (NHST) in APA style (e.g., t(28) = 2.2, p < .05). It recalculates the p-value using the reported test statistic and degrees of freedom. If the reported and computed p-values don't match, statcheck will flag the result as an error. If the reported p-value is statistically significant and the recomputed one is not, or vice versa, the result will be flagged as a decision error. You can use statcheck directly on a string of text, but you can also scan a PDF or HTML file, or even a folder of PDF and/or HTML files. Statcheck needs an external program to convert PDF to text: Xpdf. Instructions on where and how to download this program, how to install statcheck, and more details on what statcheck can and cannot do can be found in the online manual: <https://rpubs.com/michelenuijten/statcheckmanual>. You can find a point-and-click web interface to scan PDF or HTML or DOCX articles on <http://statcheck.io>.
Maintained by Michele B. Nuijten. Last updated 8 months ago.
nhst, p-values, reproducibility, statistics
178 stars 7.55 score 40 scripts
isaakiel
mortAAR:Analysis of Archaeological Mortality Data
A collection of functions for the analysis of archaeological mortality data (on the topic see e.g. Chamberlain 2006 <https://books.google.de/books?id=nG5FoO_becAC&lpg=PA27&ots=LG0b_xrx6O&dq=life%20table%20archaeology&pg=PA27#v=onepage&q&f=false>). It takes demographic data in different formats and displays the result in a standard life table as well as plots the relevant indices (percentage of deaths, survivorship, probability of death, life expectancy, percentage of population). It also checks for possible biases in the age structure and applies corrections to life tables.
Maintained by Nils Mueller-Scheessel. Last updated 3 months ago.
anthropology, archaeology, demography, statistics
15 stars 7.49 score 23 scripts
pat-s
oddsratio:Odds Ratio Calculation for GAM(M)s & GLM(M)s
Simplified odds ratio calculation of GAM(M)s & GLM(M)s. Provides structured output (data frame) of all predictors and their corresponding odds ratios and confidence intervals for further analyses. It helps to avoid false references of predictors and increments by specifying these parameters in a list instead of using 'exp(coef(model))' (standard approach of odds ratio calculation for GLMs) which just returns a plain numeric output. For GAM(M)s, odds ratio calculation is highly simplified with this package since it takes care of the multiple 'predict()' calls of the chosen predictor while holding other predictors constant. Also, this package allows odds ratio calculation of percentage steps across the whole predictor distribution range for GAM(M)s. In both cases, confidence intervals are returned additionally. Calculated odds ratios of GAM(M)s can be inserted into the smooth function plot.
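A minimal sketch of or_glm() (argument names follow the package's documented examples; the mtcars model is only illustrative):
    library(oddsratio)
    fit <- glm(am ~ mpg + hp, data = mtcars, family = binomial)
    # odds ratios with CIs, using explicit increments for the continuous predictors
    or_glm(data = mtcars, model = fit, incr = list(mpg = 5, hp = 50))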
Maintained by Patrick Schratz. Last updated 12 months ago.
odds-ratio, probability, statistics
31 stars 7.48 score 81 scripts 1 dependents
mw201608
SuperExactTest:Exact Test and Visualization of Multi-Set Intersections
Identification of sets of objects with shared features is a common operation in all disciplines. Analysis of intersections among multiple sets is fundamental for in-depth understanding of their complex relationships. This package implements a theoretical framework for efficient computation of statistical distributions of multi-set intersections based upon combinatorial theory, and provides multiple scalable techniques for visualizing the intersection statistics. The statistical algorithm behind this package was published in Wang et al. (2015) <doi:10.1038/srep16923>.
Maintained by Minghui Wang. Last updated 1 year ago.
intersection, set, statistics, visualization
28 stars 7.47 score 70 scripts 1 dependents
vinecopulib
rvinecopulib:High Performance Algorithms for Vine Copula Modeling
Provides an interface to 'vinecopulib', a C++ library for vine copula modeling. The 'rvinecopulib' package implements the core features of the popular 'VineCopula' package, in particular inference algorithms for both vine copula and bivariate copula models. Advantages over 'VineCopula' are a sleeker and more modern API, improved performances, especially in high dimensions, nonparametric and multi-parameter families, and the ability to model discrete variables. The 'rvinecopulib' package includes 'vinecopulib' as header-only C++ library (currently version 0.7.2). Thus users do not need to install 'vinecopulib' itself in order to use 'rvinecopulib'. Since their initial releases, 'vinecopulib' is licensed under the MIT License, and 'rvinecopulib' is licensed under the GNU GPL version 3.
Maintained by Thomas Nagler. Last updated 4 days ago.
copula, estimation, statistics, vine, cpp
35 stars 7.43 score 60 scripts 14 dependents
stevecondylios
priceR:Economics and Pricing Tools
Functions to aid in micro and macro economic analysis and handling of price and currency data. Includes extraction of relevant inflation and exchange rate data from World Bank API, data cleaning/parsing, and standardisation. Inflation adjustment calculations as found in Principles of Macroeconomics by Gregory Mankiw et al (2014). Current and historical end of day exchange rates for 171 currencies from the European Central Bank Statistical Data Warehouse (2020) <https://sdw.ecb.europa.eu/curConverter.do>.
Maintained by Steve Condylios. Last updated 7 months ago.
data-science, econometrics, economics, finance, modeling, r-programming, statistics
59 stars 7.38 score 102 scripts
marberts
piar:Price Index Aggregation
Most price indexes are made with a two-step procedure, where period-over-period elemental indexes are first calculated for a collection of elemental aggregates at each point in time, and then aggregated according to a price index aggregation structure. These indexes can then be chained together to form a time series that gives the evolution of prices with respect to a fixed base period. This package contains a collection of functions that revolve around this work flow, making it easy to build standard price indexes, and implement the methods described by Balk (2008, <doi:10.1017/CBO9780511720758>), von der Lippe (2007, <doi:10.3726/978-3-653-01120-3>), and the CPI manual (2020, <doi:10.5089/9781484354841.069>) for bilateral price indexes.
Maintained by Steve Martin. Last updated 2 days ago.
economics, inflation, official-statistics, statistics
4 stars 7.30 score 25 scripts
airoldilab
sgd:Stochastic Gradient Descent for Scalable Estimation
A fast and flexible set of tools for large scale estimation. It features many stochastic gradient methods, built-in models, visualization tools, automated hyperparameter tuning, model checking, interval estimation, and convergence diagnostics.
Maintained by Junhyung Lyle Kim. Last updated 1 year ago.
big-data, data-analysis, gradient-descent, statistics, openblas, cpp
62 stars 7.25 score 71 scripts
deepankardatta
blandr:Bland-Altman Method Comparison
Carries out Bland Altman analyses (also known as a Tukey mean-difference plot) as described by JM Bland and DG Altman in 1986 <doi:10.1016/S0140-6736(86)90837-8>. This package was created in 2015 as existing Bland-Altman analysis functions did not calculate confidence intervals. This package was created to rectify this, and create reproducible plots. This package is also available as a module for the 'jamovi' statistical spreadsheet (see <https://www.jamovi.org> for more information).
Maintained by Deepankar Datta. Last updated 10 months ago.
bland-altman, ggplot2, method-comparison, statistics
22 stars 7.22 score 75 scripts
kkholst
targeted:Targeted Inference
Various methods for targeted and semiparametric inference including augmented inverse probability weighted (AIPW) estimators for missing data and causal inference (Bang and Robins (2005) <doi:10.1111/j.1541-0420.2005.00377.x>), variable importance and conditional average treatment effects (CATE) (van der Laan (2006) <doi:10.2202/1557-4679.1008>), estimators for risk differences and relative risks (Richardson et al. (2017) <doi:10.1080/01621459.2016.1192546>), assumption lean inference for generalized linear model parameters (Vansteelandt et al. (2022) <doi:10.1111/rssb.12504>).
Maintained by Klaus K. Holst. Last updated 2 months ago.
causal-inference, double-robust, estimation, semiparametric-estimation, statistics, openblas, cpp, openmp
11 stars 7.20 score 30 scripts 1 dependents
arsilva87
biotools:Tools for Biometry and Applied Statistics in Agricultural Science
Tools designed to perform and evaluate cluster analysis (including Tocher's algorithm), discriminant analysis and path analysis (standard and under collinearity), as well as some useful miscellaneous tools for dealing with sample size and optimum plot size calculations. A test for seed sample heterogeneity is now available. Mantel's permutation test can be found in this package. A new approach for calculating its power is implemented. biotools also contains tests for genetic covariance components. Heuristic approaches for performing non-parametric spatial predictions of generic response variables and spatial gene diversity are implemented.
Maintained by Anderson Rodrigo da Silva. Last updated 3 years ago.
cluster-analysis, multivariate-analysis, statistics, tocher
2 stars 7.11 score 161 scripts 1 dependents
nbarrowman
vtree:Display Information About Nested Subsets of a Data Frame
A tool for calculating and drawing "variable trees". Variable trees display information about nested subsets of a data frame.
Maintained by Nick Barrowman. Last updated 13 days ago.
data-science, data-visualization, exploratory-data-analysis, statistics
76 stars 7.09 score 65 scripts
doccstat
fastcpd:Fast Change Point Detection via Sequential Gradient Descent
Implements a fast change point detection algorithm based on the paper "Sequential Gradient Descent and Quasi-Newton's Method for Change-Point Analysis" by Xianyang Zhang and Trisha Dawn <https://proceedings.mlr.press/v206/zhang23b.html>. The algorithm is based on dynamic programming with pruning and sequential gradient descent. It is able to detect change points an order of magnitude faster than vanilla Pruned Exact Linear Time (PELT). The package includes examples of linear regression, logistic regression, Poisson regression, penalized linear regression data, and a whole lot more examples with a custom cost function in case the user wants to use their own.
Maintained by Xingchi Li. Last updated 10 days ago.
change-point-detection, cpp, custom-function, gradient-descent, lasso, linear-regression, logistic-regression, offline, pelt, penalized-regression, poisson-regression, quasi-newton, statistics, time-series, warm-start, fortran, openblas, cpp, openmp
22 stars 7.00 score 7 scripts
vandomed
tab:Create Summary Tables for Statistical Reports
Contains functions for creating various types of summary tables, e.g. comparing characteristics across levels of a categorical variable and summarizing fitted generalized linear models, generalized estimating equations, and Cox proportional hazards models. Functions are available to handle data from simple random samples as well as complex surveys.
Maintained by Dane R. Van Domelen. Last updated 4 years ago.
manuscripts, reports, reproducible-research, statistics, tables
2 stars 6.97 score 86 scripts 9 dependents
ropensci
jagstargets:Targets for JAGS Pipelines
Bayesian data analysis usually incurs long runtimes and cumbersome custom code. A pipeline toolkit tailored to Bayesian statisticians, the 'jagstargets' R package leverages 'targets' and 'R2jags' to ease this burden. 'jagstargets' makes it super easy to set up scalable JAGS pipelines that automatically parallelize the computation and skip expensive steps when the results are already up to date. Minimal custom code is required, and there is no need to manually configure branching, so usage is much easier than 'targets' alone. For the underlying methodology, please refer to the documentation of 'targets' <doi:10.21105/joss.02959> and 'JAGS' (Plummer 2003) <https://www.r-project.org/conferences/DSC-2003/Proceedings/Plummer.pdf>.
Maintained by William Michael Landau. Last updated 4 months ago.
bayesianhigh-performance-computingjagsmaker-targetopiareproducibilityrjagsstatisticstargetscpp
10 stars 6.95 score 32 scriptsrempsyc
lavaanExtra:Convenience Functions for Package 'lavaan'
Affords an alternative, vector-based syntax to 'lavaan', as well as other convenience functions such as naming paths and defining indirect links automatically, in addition to convenience formatting optimized for a publication and script sharing workflow.
Maintained by Rémi Thériault. Last updated 9 months ago.
convenience-functionslavaanpsychologystatisticsstructural-equation-modeling
18 stars 6.95 score 33 scriptsropensci
stantargets:Targets for Stan Workflows
Bayesian data analysis usually incurs long runtimes and cumbersome custom code. A pipeline toolkit tailored to Bayesian statisticians, the 'stantargets' R package leverages 'targets' and 'cmdstanr' to ease these burdens. 'stantargets' makes it super easy to set up scalable Stan pipelines that automatically parallelize the computation and skip expensive steps when the results are already up to date. Minimal custom code is required, and there is no need to manually configure branching, so usage is much easier than 'targets' alone. 'stantargets' can access all of 'cmdstanr''s major algorithms (MCMC, variational Bayes, and optimization) and it supports both single-fit workflows and multi-rep simulation studies. For the statistical methodology, please refer to 'Stan' documentation (Stan Development Team 2020) <https://mc-stan.org/>.
Maintained by William Michael Landau. Last updated 2 months ago.
bayesianhigh-performance-computingmaker-targetopiareproducibilitystanstatisticstargets
49 stars 6.85 score 180 scriptsdesctable
desctable:Produce Descriptive and Comparative Tables Easily
Easily create descriptive and comparative tables. It integrates directly with the tidyverse family of packages and pipes. Tables are produced as (nested) dataframes for easy manipulation.
Maintained by Maxime Wack. Last updated 3 years ago.
52 stars 6.85 score 45 scriptsmayer79
MetricsWeighted:Weighted Metrics and Performance Measures for Machine Learning
Provides weighted versions of several metrics and performance measures used in machine learning, including average unit deviances of the Bernoulli, Tweedie, Poisson, and Gamma distributions, see Jorgensen B. (1997, ISBN: 978-0412997112). The package also contains a weighted version of generalized R-squared, see e.g. Cohen, J. et al. (2002, ISBN: 978-0805822236). Furthermore, 'dplyr' chains are supported.
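For instance, a quick sketch of the weighted interface (toy numbers; the w argument for case weights follows the package's documented pattern):
library(MetricsWeighted)
actual    <- c(1.2, 0.8, 2.5, 1.9)
predicted <- c(1.0, 1.1, 2.2, 2.0)
w         <- c(1, 1, 2, 2)             # case weights
rmse(actual, predicted)                # unweighted root mean squared error
rmse(actual, predicted, w = w)         # weighted counterpart
r_squared(actual, predicted, w = w)    # weighted generalized R-squared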
Maintained by Michael Mayer. Last updated 8 months ago.
machine-learningmetricsperformancestatistics
11 stars 6.79 score 75 scripts 5 dependentsimbi-heidelberg
DescrTab2:Publication Quality Descriptive Statistics Tables
Provides functions to create descriptive statistics tables for continuous and categorical variables. By default, summary statistics such as mean, standard deviation, quantiles, minimum and maximum for continuous variables and relative and absolute frequencies for categorical variables are calculated. 'DescrTab2' features a sophisticated algorithm to choose appropriate test statistics for your data and provides p-values. On top of this, confidence intervals for group differences of appropriate summary measures are automatically produced for two-group comparisons. Tables generated by 'DescrTab2' can be integrated in a variety of document formats, including .html, .tex and .docx documents. 'DescrTab2' also allows printing tables to the console and saving table objects for later use.
Maintained by Jan Meis. Last updated 1 years ago.
categorical-variablescontinuous-variabledescriptive-statisticsp-valuesstatistical-testsstatistics
9 stars 6.71 score 19 scripts 1 dependentsdoomlab
MOTE:Effect Size and Confidence Interval Calculator
Measure of the Effect ('MOTE') is an effect size calculator, including a wide variety of effect sizes in the mean differences family (all versions of d) and the variance overlap family (eta, omega, epsilon, r). 'MOTE' provides non-central confidence intervals for each effect size, relevant test statistics, and output for reporting in APA Style (American Psychological Association, 2010, <ISBN:1433805618>) with 'LaTeX'. In research, an over-reliance on p-values may conceal the fact that a study is under-powered (Halsey, Curran-Everett, Vowler, & Drummond, 2015 <doi:10.1038/nmeth.3288>). A test may be statistically significant, yet practically inconsequential (Fritz, Scherndl, & Kühberger, 2012 <doi:10.1177/0959354312436870>). Although the American Psychological Association has long advocated for the inclusion of effect sizes (Wilkinson & American Psychological Association Task Force on Statistical Inference, 1999 <doi:10.1037/0003-066X.54.8.594>), the vast majority of peer-reviewed, published academic studies stop short of reporting effect sizes and confidence intervals (Cumming, 2013, <doi:10.1177/0956797613504966>). 'MOTE' simplifies the use and interpretation of effect sizes and confidence intervals. For more information, visit <https://www.aggieerin.com/shiny-server>.
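As an illustration (made-up summary statistics; d.ind.t() is the package's independent-samples d calculator, and the exact names of the returned confidence-limit fields are an assumption):
library(MOTE)
# Cohen's d for two independent groups from summary statistics (toy numbers)
res <- d.ind.t(m1 = 5.2, m2 = 4.6, sd1 = 1.1, sd2 = 1.3, n1 = 40, n2 = 42, a = .05)
res$d                    # point estimate of d
c(res$dlow, res$dhigh)   # non-central confidence interval (field names assumed)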
Maintained by Erin M. Buchanan. Last updated 3 years ago.
confidenceeffectintervalsizestatistics
17 stars 6.69 score 320 scripts 1 dependentskoenderks
jfa:Statistical Methods for Auditing
Provides statistical methods for auditing as implemented in JASP for Audit (Derks et al., 2021 <doi:10.21105/joss.02733>). First, the package makes it easy for an auditor to plan a statistical sample, select the sample from the population, and evaluate the misstatement in the sample compliant with international auditing standards. Second, the package provides statistical methods for auditing data, including tests of digit distributions and repeated values. Finally, the package includes methods for auditing algorithms on the aspect of fairness and bias. Next to classical statistical methodology, the package implements Bayesian equivalents of these methods whose statistical underpinnings are described in Derks et al. (2021) <doi:10.1111/ijau.12240>, Derks et al. (2024) <doi:10.2308/AJPT-2021-086>, Derks et al. (2022) <doi:10.31234/osf.io/8nf3e>, Derks et al. (2024) <doi:10.31234/osf.io/tgq5z>, and Derks et al. (2025) <doi:10.31234/osf.io/b8tu2>.
Maintained by Koen Derks. Last updated 12 days ago.
algorithm-auditingauditaudit-samplingbayesiandata-auditingjaspjasp-for-auditstatistical-auditstatisticscpp
8 stars 6.69 score 17 scriptsmingzehuang
latentcor:Fast Computation of Latent Correlations for Mixed Data
The first stand-alone R package for computation of latent correlation that takes into account all variable types (continuous/binary/ordinal/zero-inflated), comes with an optimized memory footprint, and is computationally efficient, essentially making latent correlation estimation almost as fast as rank-based correlation estimation. The estimation is based on latent copula Gaussian models. For continuous/binary types, see Fan, J., Liu, H., Ning, Y., and Zou, H. (2017). For ternary type, see Quan X., Booth J.G. and Wells M.T. (2018) <arXiv:1809.06255>. For truncated type or zero-inflated type, see Yoon G., Carroll R.J. and Gaynanova I. (2020) <doi:10.1093/biomet/asaa007>. For approximation method of computation, see Yoon G., Müller C.L. and Gaynanova I. (2021) <doi:10.1080/10618600.2021.1882468>. The latter method uses multi-linear interpolation originally implemented in the R package <https://cran.r-project.org/package=chebpol>.
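A rough usage sketch (the gen_data() simulator, its returned X element, and the types/method arguments are assumptions based on the package description and documentation):
library(latentcor)
# simulate one continuous and one binary variable, then estimate their latent correlation
sim <- gen_data(n = 200, types = c("con", "bin"))   # gen_data() and its return structure are assumed
latentcor(sim$X, types = c("con", "bin"), method = "approx")$R   # estimated latent correlation matrix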
Maintained by Mingze Huang. Last updated 3 years ago.
data-analysisdata-miningdata-processingdata-sciencedata-structuresmachine-learningmixed-typesstatistics
16 stars 6.65 score 46 scripts 1 dependentsbusiness-science
modeltime.resample:Resampling Tools for Time Series Forecasting
A 'modeltime' extension that implements forecast resampling tools that assess time-based model performance and stability for a single time series, panel data, and cross-sectional time series analysis.
Maintained by Matt Dancho. Last updated 1 years ago.
accuracy-metricsbacktestingbootstrapbootstrappingcross-validationforecastingmodeltimemodeltime-resampleresamplingstatisticstidymodelstime-series
19 stars 6.64 score 38 scripts 1 dependentssimnph
SimNPH:Simulate Non-Proportional Hazards
A toolkit for simulation studies concerning time-to-event endpoints with non-proportional hazards. 'SimNPH' encompasses functions for simulating time-to-event data in various scenarios, simulating different trial designs like fixed follow-up, event-driven, and group sequential designs. The package provides functions to calculate the true values of common summary statistics for the implemented scenarios and offers common analysis methods for time-to-event data. Helper functions for running simulations with the 'SimDesign' package and for aggregating and presenting the results are also included. Results of the conducted simulation study are available in the paper: "A Comparison of Statistical Methods for Time-To-Event Analyses in Randomized Controlled Trials Under Non-Proportional Hazards", Klinglmüller et al. (2025) <doi:10.1002/sim.70019>.
Maintained by Tobias Fellinger. Last updated 24 days ago.
clinical-trial-simulationsnon-proportional-hazardsstatistical-simulationstatisticssurvival-analysis
6 stars 6.63 score 43 scriptsrsquaredacademy
xplorerr:Tools for Interactive Data Exploration
Tools for interactive data exploration built using 'shiny'. Includes apps for descriptive statistics, visualizing probability distributions, inferential statistics, linear regression, logistic regression and RFM analysis.
Maintained by Aravind Hebbali. Last updated 5 months ago.
dataexplorationshiny-appsstatisticsvisualizationcpp
38 stars 6.62 score 11 scripts 6 dependentsmarberts
gpindex:Generalized Price and Quantity Indexes
Tools to build and work with bilateral generalized-mean price indexes (and by extension quantity indexes), and indexes composed of generalized-mean indexes (e.g., superlative quadratic-mean indexes, GEKS). Covers the core mathematical machinery for making bilateral price indexes, computing price relatives, detecting outliers, and decomposing indexes, with wrappers for all common (and many uncommon) index-number formulas. Implements and extends many of the methods in Balk (2008, <doi:10.1017/CBO9780511720758>), von der Lippe (2007, <doi:10.3726/978-3-653-01120-3>), and the CPI manual (2020, <doi:10.5089/9781484354841.069>).
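For example, a small sketch of the generalized-mean building blocks (toy prices and quantities; generalized_mean(r) returning a weighted-mean function is the package's documented pattern):
library(gpindex)
p1 <- c(2, 3, 4); p0 <- c(1, 3, 5)    # current- and base-period prices
q0 <- c(10, 5, 2)                     # base-period quantities
rel <- p1 / p0                        # price relatives
generalized_mean(1)(rel, p0 * q0)     # arithmetic mean with base expenditure weights (a Laspeyres-type index)
generalized_mean(0)(rel, p0 * q0)     # geometric mean of the same relatives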
Maintained by Steve Martin. Last updated 1 days ago.
economicsinflationofficial-statisticsstatistics
7 stars 6.60 score 29 scripts 1 dependentsserkor1
SLmetrics:Machine Learning Performance Evaluation on Steroids
Performance evaluation metrics for supervised and unsupervised machine learning, statistical learning and artificial intelligence applications. Core computations are implemented in 'C++' for scalability and efficiency.
Maintained by Serkan Korkmaz. Last updated 1 days ago.
cppdata-analysisdata-scienceeigen3machine-learningperformance-metricsrcpprcppeigenstatisticssupervised-learningcpp
22 stars 6.56 scorebrubinstein
diffpriv:Easy Differential Privacy
An implementation of major general-purpose mechanisms for privatizing statistics, models, and machine learners, within the framework of differential privacy of Dwork et al. (2006) <doi:10.1007/11681878_14>. Example mechanisms include the Laplace mechanism for releasing numeric aggregates, and the exponential mechanism for releasing set elements. A sensitivity sampler (Rubinstein & Alda, 2017) <arXiv:1706.02562> permits sampling target non-private function sensitivity; combined with the generic mechanisms, it permits turn-key privatization of arbitrary programs.
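A brief sketch in the spirit of the package vignette (treat the exact argument and field names as assumptions):
library(diffpriv)
f <- function(X) mean(X)                             # non-private target statistic
n <- 100
mech <- DPMechLaplace(target = f, sensitivity = 1/n, dims = 1)
X <- runif(n)                                        # sensitive data on [0, 1]
r <- releaseResponse(mech, privacyParams = DPParamsEps(epsilon = 1), X = X)
r$response                                           # privatized mean (field name assumed)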
Maintained by Benjamin Rubinstein. Last updated 3 years ago.
data-sciencedifferential-privacydiffprivmachine-learningstatistics
67 stars 6.54 score 52 scriptssacema
inctools:Incidence Estimation Tools
Tools for estimating incidence from biomarker data in cross-sectional surveys, and for calibrating tests for recent infection. Implements and extends the method of Kassanjee et al. (2012) <doi:10.1097/EDE.0b013e3182576c07>.
Maintained by Eduard Grebe. Last updated 4 years ago.
biomarkersbiostatisticsepidemiologyhivincidenceincidence-estimationincidence-inferenceinfectious-diseasesstatistics
6 stars 6.51 score 27 scriptsterrytangyuan
lfda:Local Fisher Discriminant Analysis
Functions for performing and visualizing Local Fisher Discriminant Analysis (LFDA), Kernel Fisher Discriminant Analysis (KLFDA), and Semi-supervised Local Fisher Discriminant Analysis (SELF).
Maintained by Yuan Tang. Last updated 2 years ago.
dimensionality-reductiondistance-metric-learningmachine-learningmetric-learningstatistics
76 stars 6.50 score 74 scripts 3 dependentsdhaine
episensr:Basic Sensitivity Analysis of Epidemiological Results
Basic sensitivity analysis of the observed relative risks adjusting for unmeasured confounding and misclassification of the exposure/outcome, or both. It follows the bias analysis methods and examples from the book by Lash T.L, Fox M.P, and Fink A.K. "Applying Quantitative Bias Analysis to Epidemiologic Data", ('Springer', 2021).
Maintained by Denis Haine. Last updated 1 years ago.
biasepidemiologysensitivity-analysisstatistics
13 stars 6.48 score 39 scripts 1 dependentsr-spark
sparklyr.flint:Sparklyr Extension for 'Flint'
This sparklyr extension makes 'Flint' time series library functionalities (<https://github.com/twosigma/flint>) easily accessible through R.
Maintained by Edgar Ruiz. Last updated 3 years ago.
apache-sparkdata-analysisdata-miningdata-sciencedistributeddistributed-computingflintremote-clusterssparksparklyrstatistical-analysisstatisticsstatssummarizationsummary-statisticstime-seriestime-series-analysistwosigma-flint
9 stars 6.46 score 54 scriptselbersb
segregation:Entropy-Based Segregation Indices
Computes segregation indices, including the Index of Dissimilarity, as well as the information-theoretic indices developed by Theil (1971) <isbn:978-0471858454>, namely the Mutual Information Index (M) and Theil's Information Index (H). The M, further described by Mora and Ruiz-Castillo (2011) <doi:10.1111/j.1467-9531.2011.01237.x> and Frankel and Volij (2011) <doi:10.1016/j.jet.2010.10.008>, is a measure of segregation that is highly decomposable. The package provides tools to decompose the index by units and groups (local segregation), and by within and between terms. The package also provides a method to decompose differences in segregation as described by Elbers (2021) <doi:10.1177/0049124121986204>. The package includes standard error estimation by bootstrapping, which also corrects for small sample bias. The package also contains functions for visualizing segregation patterns.
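As a quick sketch (toy counts; mutual_total() with group, unit and weight columns follows the package's documented interface):
library(segregation)
d <- data.frame(school = rep(c("s1", "s2"), each = 2),
                race   = rep(c("A", "B"), times = 2),
                n      = c(80, 20, 30, 70))
mutual_total(d, "race", "school", weight = "n")   # Mutual Information Index M and Theil's H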
Maintained by Benjamin Elbers. Last updated 1 years ago.
entropysegregationstatisticscpp
36 stars 6.44 score 51 scriptsfaosorios
fastmatrix:Fast Computation of some Matrices Useful in Statistics
Small set of functions for fast computation of some matrices and operations useful in statistics and econometrics. Currently, there are functions for efficient computation of duplication, commutation and symmetrizer matrices with minimal storage requirements. Some commonly used matrix decompositions (LU and LDL), basic matrix operations (for instance, Hadamard, Kronecker products and the Sherman-Morrison formula) and iterative solvers for linear systems are also available. In addition, the package includes a number of common statistical procedures such as the sweep operator, weighted mean and covariance matrix using an online algorithm, linear regression (using Cholesky, QR, SVD, sweep operator and conjugate gradients methods), ridge regression (with optimal selection of the ridge parameter considering several procedures), omnibus tests for univariate normality, functions to compute the multivariate skewness, kurtosis, the Mahalanobis distance (checking positive definiteness), and the Wilson-Hilferty transformation of gamma variables. Furthermore, the package provides interfaces to C code callable by C code from other R packages.
Maintained by Felipe Osorio. Last updated 1 years ago.
commutation-matrixjarque-bera-testldl-factorizationlu-factorizationmatrix-api-for-r-packagesmatrix-normsmodified-choleskyols-regressionpower-methodridge-regressionsherman-morrisonstatisticssweep-operatorsymmetrizer-matrixfortranopenblas
19 stars 6.37 score 37 scripts 11 dependentsnt-williams
lmtp:Non-Parametric Causal Effects of Feasible Interventions Based on Modified Treatment Policies
Non-parametric estimators for causal effects based on longitudinal modified treatment policies as described in Diaz, Williams, Hoffman, and Schenck <doi:10.1080/01621459.2021.1955691>, traditional point treatment, and traditional longitudinal effects. Continuous, binary, categorical, and multivariate treatments are allowed, as well as censored outcomes. The treatment mechanism is estimated via a density ratio classification procedure irrespective of treatment variable type. Additive treatment effects can be calculated for both continuous and binary outcomes, and relative risks and odds ratios for binary outcomes. Supports survival outcomes with competing risks (Diaz, Hoffman, and Hejazi; <doi:10.1007/s10985-023-09606-7>).
Maintained by Nicholas Williams. Last updated 21 days ago.
causal-inferencecensored-datalongitudinal-datamachine-learningmodified-treatment-policynonparametric-statisticsprecision-medicinerobust-statisticsstatisticsstochastic-interventionssurvival-analysistargeted-learning
64 stars 6.37 score 91 scriptscmstatr
cmstatr:Statistical Methods for Composite Material Data
An implementation of the statistical methods commonly used for advanced composite materials in aerospace applications. This package focuses on calculating basis values (lower tolerance bounds) for material strength properties, as well as performing the associated diagnostic tests. This package provides functions for calculating basis values assuming several different distributions, as well as providing functions for non-parametric methods of computing basis values. Functions are also provided for testing the hypothesis that there is no difference between strength and modulus data from an alternate sample and that from a "qualification" or "baseline" sample. For a discussion of these statistical methods and their use, see the Composite Materials Handbook, Volume 1 (2012, ISBN: 978-0-7680-7811-4). Additional details about this package are available in the paper by Kloppenborg (2020, <doi:10.21105/joss.02265>).
Maintained by Stefan Kloppenborg. Last updated 10 days ago.
composite-material-datadatamaterials-sciencestatistical-analysisstatistics
4 stars 6.36 score 23 scriptshoxo-m
densratio:Density Ratio Estimation
Density ratio estimation. The estimated density ratio function can be used in many applications such as anomaly detection, change-point detection, covariate shift adaptation. The implemented methods are uLSIF (Hido et al. (2011) <doi:10.1007/s10115-010-0283-2>), RuLSIF (Yamada et al. (2011) <doi:10.1162/NECO_a_00442>), and KLIEP (Sugiyama et al. (2007) <doi:10.1007/s10463-008-0197-x>).
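For example (a sketch following the README-style usage, with package defaults assumed):
library(densratio)
x <- rnorm(200, mean = 1, sd = 1/8)     # numerator sample
y <- rnorm(200, mean = 1, sd = 1/2)     # denominator sample
fit <- densratio(x, y)                  # uLSIF by default
w <- fit$compute_density_ratio(x)       # estimated density ratio p_x/p_y evaluated at x
summary(w)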
Maintained by Koji Makiyama. Last updated 6 years ago.
anomalydetectionmachine-learningmachine-learning-algorithmsmachine-learning-libraryr-languagestatistics
21 stars 6.36 score 36 scripts 2 dependentsbioc
structToolbox:Data processing & analysis tools for Metabolomics and other omics
An extensive set of data (pre-)processing and analysis methods and tools for metabolomics and other omics, with a strong emphasis on statistics and machine learning. This toolbox allows the user to build extensive and standardised workflows for data analysis. The methods and tools have been implemented using class-based templates provided by the struct (Statistics in R Using Class-based Templates) package. The toolbox includes pre-processing methods (e.g. signal drift and batch correction, normalisation, missing value imputation and scaling), univariate (e.g. ttest, various forms of ANOVA, Kruskal–Wallis test and more) and multivariate statistical methods (e.g. PCA and PLS, including cross-validation and permutation testing) as well as machine learning methods (e.g. Support Vector Machines). The STATistics Ontology (STATO) has been integrated and implemented to provide standardised definitions for the different methods, inputs and outputs.
Maintained by Gavin Rhys Lloyd. Last updated 1 months ago.
workflowstepmetabolomicsbioconductor-packagedimslc-msmachine-learningmultivariate-analysisstatisticsunivariate
10 stars 6.26 score 12 scriptsfabrice-rossi
mixvlmc:Variable Length Markov Chains with Covariates
Estimates Variable Length Markov Chains (VLMC) models and VLMC with covariates models from discrete sequences. Supports model selection via information criteria and simulation of new sequences from an estimated model. See Bühlmann, P. and Wyner, A. J. (1999) <doi:10.1214/aos/1018031204> for VLMC and Zanin Zambom, A., Kim, S. and Lopes Garcia, N. (2022) <doi:10.1111/jtsa.12615> for VLMC with covariates.
Maintained by Fabrice Rossi. Last updated 11 months ago.
machine-learningmarkov-chainmarkov-modelstatisticstime-seriescpp
2 stars 6.23 score 20 scriptsjacobseedorff21
BranchGLM:Efficient Best Subset Selection for GLMs via Branch and Bound Algorithms
Performs efficient and scalable glm best subset selection using a novel implementation of a branch and bound algorithm. To speed up the model fitting process, a range of optimization methods are implemented in 'RcppArmadillo'. Parallel computation is available using 'OpenMP'.
Maintained by Jacob Seedorff. Last updated 6 months ago.
generalized-linear-modelsregressionstatisticssubset-selectionvariable-selectionopenblascppopenmp
7 stars 6.20 score 30 scriptsmatherealize
simdata:Generate Simulated Datasets
Generate simulated datasets from an initial underlying distribution and apply transformations to obtain realistic data. Implements the 'NORTA' (Normal-to-anything) approach from Cario and Nelson (1997) and other data generating mechanisms. Simple network visualization tools are provided to facilitate communicating the simulation setup.
Maintained by Michael Kammer. Last updated 4 months ago.
data-generationregressionsimulationstatistics
7 stars 6.10 score 10 scripts 1 dependentsmodal-inria
RMixtComp:Mixture Models with Heterogeneous and (Partially) Missing Data
Mixture Composer (Biernacki (2015) <https://inria.hal.science/hal-01253393v1>) is a project to perform clustering using mixture models with heterogeneous data and partially missing data. Mixture models are fitted using a SEM algorithm. It includes 8 models for real, categorical, counting, functional and ranking data.
Maintained by Quentin Grimonprez. Last updated 11 months ago.
clusteringcppheterogeneous-datamissing-datamixed-datamixture-modelstatistics
13 stars 6.10 score 12 scriptscapnrefsmmat
regressinator:Simulate and Diagnose (Generalized) Linear Models
Simulate samples from populations with known covariate distributions, generate response variables according to common linear and generalized linear model families, draw from sampling distributions of regression estimates, and perform visual inference on diagnostics from model fits.
Maintained by Alex Reinhart. Last updated 6 months ago.
4 stars 6.08 score 25 scriptstanaylab
tgstat:Amos Tanay's Group High Performance Statistical Utilities
A collection of high performance utilities to compute distance, correlation, auto correlation, clustering and other tasks. Contains graph clustering algorithm described in "MetaCell: analysis of single-cell RNA-seq data using K-nn graph partitions" (Yael Baran, Akhiad Bercovich, Arnau Sebe-Pedros, Yaniv Lubling, Amir Giladi, Elad Chomsky, Zohar Meir, Michael Hoichman, Aviezer Lifshitz & Amos Tanay, 2019 <doi:10.1186/s13059-019-1812-2>).
Maintained by Aviezer Lifshitz. Last updated 6 months ago.
algorithms-implementedcorrelationknnstatisticsopenblascpp
8 stars 6.06 score 24 scripts 1 dependentsterrytangyuan
autoplotly:Automatic Generation of Interactive Visualizations for Statistical Results
Functionalities to automatically generate interactive visualizations for statistical results supported by 'ggfortify', such as time series, PCA, clustering and survival analysis, with 'plotly.js' <https://plotly.com/> and 'ggplot2' style. The generated visualizations can also be easily extended using 'ggplot2' and 'plotly' syntax while staying interactive.
Maintained by Yuan Tang. Last updated 2 years ago.
data-visualizationggplot2interactive-visualizationsmachine-learningplotlyplotlyjsstatistics
88 stars 6.01 score 23 scriptsltrr-arizona-edu
burnr:Forest Fire History Analysis
Tools to read, write, parse, and analyze forest fire history data (e.g. FHX). Described in Malevich et al. (2018) <doi:10.1016/j.dendro.2018.02.005>.
Maintained by Steven Malevich. Last updated 3 years ago.
citationdendrochronologyecologyforestfireplotscientificstatistics
15 stars 5.95 score 59 scriptsalexioannides
pipeliner:Machine Learning Pipelines for R
A framework for defining 'pipelines' of functions for applying data transformations, model estimation and inverse-transformations, resulting in predicted value generation (or model-scoring) functions that automatically apply the entire pipeline of functions required to go from input to predicted output.
Maintained by Alex Ioannides. Last updated 8 years ago.
data-sciencemachine-learningmachine-learning-pipelinespipelinepredictionstatisticstransform-functionsworkflow
67 stars 5.94 score 26 scriptsterrytangyuan
dml:Distance Metric Learning in R
State-of-the-art algorithms for distance metric learning, including global and local methods such as Relevant Component Analysis, Discriminative Component Analysis, Local Fisher Discriminant Analysis, etc. These distance metric learning methods are widely applied in feature extraction, dimensionality reduction, clustering, classification, information retrieval, and computer vision problems.
Maintained by Yuan Tang. Last updated 2 years ago.
dimensionality-reductiondistance-metric-learningmachine-learningmetric-learningstatistics
58 stars 5.94 score 8 scripts 1 dependentspegeler
samplesizeCMH:Power and Sample Size Calculation for the Cochran-Mantel-Haenszel Test
Calculates the power and sample size for Cochran-Mantel-Haenszel tests. There are also several helper functions for working with probability, odds, relative risk, and odds ratio values.
Maintained by Paul Egeler. Last updated 2 months ago.
categorical-datacmh-testsample-sizestatistical-powerstatistics
4 stars 5.94 score 36 scriptsbozenne
BuyseTest:Generalized Pairwise Comparisons
Implementation of the Generalized Pairwise Comparisons (GPC) as defined in Buyse (2010) <doi:10.1002/sim.3923> for complete observations, and extended in Peron (2018) <doi:10.1177/0962280216658320> to deal with right-censoring. GPC compares two groups of observations (intervention vs. control group) regarding several prioritized endpoints to estimate the probability that a random observation drawn from one group performs better/worse/equivalently than a random observation drawn from the other group. Summary statistics such as the net treatment benefit, win ratio, or win odds are then deduced from these probabilities. Confidence intervals and p-values are obtained based on asymptotic results (Ozenne 2021 <doi:10.1177/09622802211037067>), non-parametric bootstrap, or permutations. The software enables the use of thresholds of minimal importance difference, stratification, non-prioritized endpoints (O'Brien test), and can handle right-censoring and competing risks.
Maintained by Brice Ozenne. Last updated 16 days ago.
generalized-pairwise-comparisonsnon-parametricstatisticscpp
5 stars 5.91 score 90 scriptsstrakaps
MittagLeffleR:Mittag-Leffler Family of Distributions
Implements the Mittag-Leffler function, distribution, random variate generation, and estimation. Based on the Laplace-Inversion algorithm by Garrappa, R. (2015) <doi:10.1137/140971191>.
Maintained by Peter Straka. Last updated 4 years ago.
6 stars 5.88 score 28 scriptsmvuorre
bmlm:Bayesian Multilevel Mediation
Easy estimation of Bayesian multilevel mediation models with Stan.
Maintained by Matti Vuorre. Last updated 4 months ago.
bayesian-data-analysismultilevel-mediation-modelsstatisticscpp
42 stars 5.81 score 34 scriptstnagler
vinereg:D-Vine Quantile Regression
Implements D-vine quantile regression models with parametric or nonparametric pair-copulas. See Kraus and Czado (2017) <doi:10.1016/j.csda.2016.12.009> and Schallhorn et al. (2017) <doi:10.48550/arXiv.1705.08310>.
Maintained by Thomas Nagler. Last updated 3 months ago.
copulaestimationstatisticsvinecpp
11 stars 5.76 score 26 scriptsflying-sheep
ggplot.multistats:Multiple Summary Statistics for Binned Stats/Geometries
Provides the ggplot binning layer stat_summaries_hex(), which functions similarly to its singular form, but allows the use of multiple statistics per bin. Those statistics can be mapped to multiple bin aesthetics.
Maintained by Philipp Angerer. Last updated 6 months ago.
10 stars 5.76 score 16 scripts 2 dependentsclaudiozandonella
PRDA:Conduct a Prospective or Retrospective Design Analysis
An implementation of the "Design Analysis" proposed by Gelman and Carlin (2014) <doi:10.1177/1745691614551642>. It combines the evaluation of Power-Analysis with other inferential-risks as Type-M error (i.e. Magnitude) and Type-S error (i.e. Sign). See also Altoè et al. (2020) <doi:10.3389/fpsyg.2019.02893> and Bertoldo et al. (2020) <doi:10.31234/osf.io/q9f86>.
Maintained by Claudio Zandonella Callegher. Last updated 4 years ago.
design-analysisstatisticsopenblascpp
6 stars 5.73 score 30 scriptsrubenfcasal
npsp:Nonparametric Spatial Statistics
Multidimensional nonparametric spatial (spatio-temporal) geostatistics. S3 classes and methods for multidimensional: linear binning, local polynomial kernel regression (spatial trend estimation), density and variogram estimation. Nonparametric methods for simultaneous inference on both spatial trend and variogram functions (for spatial processes). Nonparametric residual kriging (spatial prediction). For details on these methods see, for example, Fernandez-Casal and Francisco-Fernandez (2014) <doi:10.1007/s00477-013-0817-8> or Castillo-Paez et al. (2019) <doi:10.1016/j.csda.2019.01.017>.
Maintained by Ruben Fernandez-Casal. Last updated 5 months ago.
geostatisticsspatial-data-analysisstatisticsfortranopenblas
4 stars 5.71 score 64 scriptssvazzole
sparsevar:Sparse VAR/VECM Models Estimation
A wrapper for sparse VAR/VECM time series models estimation using penalties like ENET (Elastic Net), SCAD (Smoothly Clipped Absolute Deviation) and MCP (Minimax Concave Penalty). Based on the work of Sumanta Basu and George Michailidis <doi:10.1214/15-AOS1315>.
Maintained by Simone Vazzoler. Last updated 4 years ago.
econometricslassomcpscadsparsestatisticstime-seriesvarvecm
11 stars 5.69 score 30 scripts 1 dependentstsostarics
contrastable:Consistent Contrast Coding for Factors
Quickly set and summarize contrasts for factors prior to regression analyses. Intended comparisons, baseline conditions, and intercepts can be explicitly set and documented without the user needing to directly manipulate matrices. Reviews and introductions for contrast coding are available in Brehm and Alday (2022)<doi:10.1016/j.jml.2022.104334> and Schad et al. (2020)<doi:10.1016/j.jml.2019.104038>.
Maintained by Thomas Sostarics. Last updated 2 months ago.
5.69 score 22 scriptsadelahladka
difNLR:DIF and DDF Detection by Non-Linear Regression Models
Detection of differential item functioning (DIF) among dichotomously scored items and differential distractor functioning (DDF) among unscored items with non-linear regression procedures based on generalized logistic regression models (Hladka & Martinkova, 2020, <doi:10.32614/RJ-2020-014>).
Maintained by Adela Hladka. Last updated 8 days ago.
differential-item-functioningitem-analysispsychometricsstatistics
6 stars 5.66 score 51 scripts 1 dependentsdgerlanc
bootES:Bootstrap Confidence Intervals on Effect Sizes
Calculate robust measures of effect sizes using the bootstrap.
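A minimal sketch (column names and the contrast weights below are illustrative; the data.col/group.col/effect.type arguments follow the package's documented interface):
library(bootES)
set.seed(1)
d <- data.frame(score = c(rnorm(30, 0), rnorm(30, 0.5)),
                group = rep(c("ctrl", "trt"), each = 30))
bootES(d, data.col = "score", group.col = "group",
       contrast = c(ctrl = -1, trt = 1), effect.type = "cohens.d")   # bootstrap CI for Cohen's d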
Maintained by Daniel Gerlanc. Last updated 5 months ago.
bootstrapping-statisticseffect-sizesocial-sciencesstatistics
11 stars 5.63 score 62 scriptshendersontrent
correctR:Corrected Test Statistics for Comparing Machine Learning Models on Correlated Samples
Calculate a set of corrected test statistics for cases when samples are not independent, such as when classification accuracy values are obtained over resamples or through k-fold cross-validation, as proposed by Nadeau and Bengio (2003) <doi:10.1023/A:1024068626366> and presented in Bouckaert and Frank (2004) <doi:10.1007/978-3-540-24775-3_3>.
Maintained by Trent Henderson. Last updated 2 months ago.
hypothesis-testingmachine-learningstatistics
14 stars 5.62 score 8 scripts 1 dependentsnelson-gon
mde:Missing Data Explorer
Correct identification and handling of missing data is one of the most important steps in any analysis. To aid this process, 'mde' provides a very easy to use yet robust framework to quickly get an idea of where the missing data lies and therefore find the most appropriate action to take. Graham WJ (2009) <doi:10.1146/annurev.psych.58.110405.085530>.
Maintained by Nelson Gonzabato. Last updated 3 years ago.
data-analysisdata-cleaningdata-explorationdata-sciencedatacleanerdatacleaningexploratory-data-analysismissingmissing-datamissing-value-treatmentmissing-valuesmissingnessomitrecodereplacestatistics
4 stars 5.61 score 34 scriptscfwp
rags2ridges:Ridge Estimation of Precision Matrices from High-Dimensional Data
Proper L2-penalized maximum likelihood estimators for precision matrices and supporting functions to employ these estimators in a graphical modeling setting. For details, see Peeters, Bilgrau, & van Wieringen (2022) <doi:10.18637/jss.v102.i04> and associated publications.
Maintained by Carel F.W. Peeters. Last updated 1 years ago.
c-plus-plusgraphical-modelsmachine-learningnetworksciencestatisticsopenblascpp
8 stars 5.60 score 46 scriptsblasbenito
collinear:Automated Multicollinearity Management
Effortless multicollinearity management in data frames with both numeric and categorical variables for statistical and machine learning applications. The package simplifies multicollinearity analysis by combining four robust methods: 1) target encoding for categorical variables (Micci-Barreca, D. 2001 <doi:10.1145/507533.507538>); 2) automated feature prioritization to prevent key variable loss during filtering; 3) pairwise correlation for all variable combinations (numeric-numeric, numeric-categorical, categorical-categorical); and 4) fast computation of variance inflation factors.
Maintained by Blas M. Benito. Last updated 2 months ago.
machine-learningmulticollinearitystatistics
11 stars 5.51 score 15 scripts 1 dependentsmuriteams
ergmito:Exponential Random Graph Models for Small Networks
Simulation and estimation of Exponential Random Graph Models (ERGMs) for small networks using exact statistics, as shown in Vega Yon et al. (2020) <DOI:10.1016/j.socnet.2020.07.005>. Unlike the 'ergm' package, 'ergmito' avoids the Markov-Chain Maximum Likelihood Estimator (MC-MLE) and instead uses the Maximum Likelihood Estimator (MLE) to fit ERGMs for small networks. Because exhaustive enumeration is computationally feasible for small networks, the package calculates likelihood functions, and other relevant functions, directly, meaning that in many cases both estimation and simulation of ERGMs for small networks can be faster and more accurate than simulation-based algorithms.
Maintained by George Vega Yon. Last updated 2 years ago.
ergmexponential-random-graph-modelsstatisticsopenblascppopenmp
9 stars 5.49 score 34 scriptsmcanouil
insane:INsulin Secretion ANalysEr
A user-friendly interface, using Shiny, to analyse glucose-stimulated insulin secretion (GSIS) assays in pancreatic beta cells or islets. The package allows the user to import several sets of experiments from different spreadsheets and to perform subsequent steps: summarise in a tidy format, visualise data quality and compare experimental conditions while accounting for technical confounders such as the date of the experiment or the technician. Together, insane provides a comprehensive method that streamlines pre-processing and analysis of GSIS experiments in a user-friendly interface. The Shiny app was initially designed for the EndoC-betaH1 cell line, following the method described in Ndiaye et al., 2017 (<doi:10.1016/j.molmet.2017.03.011>).
Maintained by Mickaël Canouil. Last updated 3 months ago.
beta-cellsendoc-betah1insulin-secretionpancreasshinystatisticsstats
3 stars 5.48 score 4 scriptsgdkrmr
coRanking:Co-Ranking Matrix
Calculates the co-ranking matrix to assess the quality of a dimensionality reduction.
Maintained by Guido Kraemer. Last updated 6 months ago.
dimensionality-reductionmanifold-learningqualitystatisticsunsupervised-learningcpp
9 stars 5.43 score 20 scripts 1 dependentspersimune
explainer:Machine Learning Model Explainer
It enables detailed interpretation of complex classification and regression models through Shapley analysis including data-driven characterization of subgroups of individuals. Furthermore, it facilitates multi-measure model evaluation, model fairness, and decision curve analysis. Additionally, it offers enhanced visualizations with interactive elements.
Maintained by Ramtin Zargari Marandi. Last updated 6 months ago.
aiclassificationclinical-researchexplainabilityexplainable-aiinterpretabilitymachine-learningregressionshapstatistics
15 stars 5.43 score 12 scriptsmodeloriented
hstats:Interaction Statistics
Fast, model-agnostic implementation of different H-statistics introduced by Jerome H. Friedman and Bogdan E. Popescu (2008) <doi:10.1214/07-AOAS148>. These statistics quantify interaction strength per feature, feature pair, and feature triple. The package supports multi-output predictions and can account for case weights. In addition, several variants of the original statistics are provided. The shape of the interactions can be explored through partial dependence plots or individual conditional expectation plots. 'DALEX' explainers, meta learners ('mlr3', 'tidymodels', 'caret') and most other models work out-of-the-box.
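A small sketch on a toy linear model (the helper h2_pairwise() and the default predict-based interface are taken from the package documentation, but treat the exact call pattern as an assumption):
library(hstats)
fit <- lm(Sepal.Length ~ . + Petal.Length:Petal.Width, data = iris)
s <- hstats(fit, X = iris[, -1])   # Friedman-Popescu H statistics for all features
h2_pairwise(s)                     # pairwise interaction strength
plot(s)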
Maintained by Michael Mayer. Last updated 7 months ago.
interactioninterpretabilitymachine-learningrstatstatisticsxai
29 stars 5.39 score 34 scriptsntguardian
CPAT:Change Point Analysis Tests
Implements several statistical tests for structural change, specifically the tests featured in Horváth, Rice and Miller (in press): CUSUM (with weighted/trimmed variants), Darling-Erdös, Hidalgo-Seo, Andrews, and the new Rényi-type test.
Maintained by Curtis Miller. Last updated 6 years ago.
11 stars 5.37 score 43 scriptsstatisticsnorway
SmallCountRounding:Small Count Rounding of Tabular Data
A statistical disclosure control tool to protect frequency tables in cases where small values are sensitive. The function PLSrounding() performs small count rounding of necessary inner cells so that all small frequencies of cross-classifications to be published (publishable cells) are rounded. This is equivalent to changing micro data since frequencies of unique combinations are changed. Thus, additivity and consistency are guaranteed. The methodology is described in Langsrud and Heldal (2018) <https://www.researchgate.net/publication/327768398_An_Algorithm_for_Small_Count_Rounding_of_Tabular_Data>.
Maintained by Øyvind Langsrud. Last updated 15 days ago.
3 stars 5.36 score 14 scriptscenterforstatistics-ugent
pim:Fit Probabilistic Index Models
Fit a probabilistic index model as described in Thas et al, 2012: <doi:10.1111/j.1467-9868.2011.01020.x>. The interface to the modeling function has changed in this new version. The old version is still available at R-Forge.
Maintained by Joris Meys. Last updated 3 months ago.
10 stars 5.33 score 43 scriptsmlr-org
mlr3inferr:Inference on the Generalization Error
Confidence interval and resampling methods for inference on the generalization error.
Maintained by Sebastian Fischer. Last updated 2 months ago.
4 stars 5.32 score 4 scripts 2 dependentsashenoy-cmbi
grafify:Easy Graphs for Data Visualisation and Linear Models for ANOVA
Easily explore data by plotting graphs with a few lines of code. Use these ggplot() wrappers to quickly draw graphs of scatter/dots with box-whiskers, violins or SD error bars, data distributions, before-after graphs, factorial ANOVA and more. Customise graphs in many ways, for example, by choosing from colour blind-friendly palettes (12 discrete, 3 continuous and 2 divergent palettes). Use the simple code for ANOVA as ordinary (lm()) or mixed-effects linear models (lmer()), including randomised-block or repeated-measures designs, and fit non-linear outcomes as a generalised additive model (gam) using mgcv(). Obtain estimated marginal means and perform post-hoc comparisons on fitted models (via emmeans()). Also includes small datasets for practising code and teaching basics before users move on to more complex designs. See vignettes for details on usage <https://grafify.shenoylab.com/>. Citation: <doi:10.5281/zenodo.5136508>.
Maintained by Avinash R Shenoy. Last updated 13 days ago.
ggplot2linear-modelspost-hoc-comparisonsstatisticsvignettes
48 stars 5.31 score 107 scriptstnagler
wdm:Weighted Dependence Measures
Provides efficient implementations of weighted dependence measures and related asymptotic tests for independence. Implemented measures are the Pearson correlation, Spearman's rho, Kendall's tau, Blomqvist's beta, and Hoeffding's D; see, e.g., Nelsen (2006) <doi:10.1007/0-387-28678-0> and Hollander et al. (2015, ISBN:9780470387375).
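For instance (a sketch; wdm() and indep_test() taking a weights argument follow the package's documented interface):
library(wdm)
set.seed(1)
x <- rnorm(100); y <- x + rnorm(100)
w <- runif(100)                                      # observation weights
wdm(x, y, method = "kendall", weights = w)           # weighted Kendall's tau
indep_test(x, y, method = "hoeffding", weights = w)  # asymptotic test of independence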
Maintained by Thomas Nagler. Last updated 3 months ago.
3 stars 5.30 score 11 scripts 21 dependentsbioc
biotmle:Targeted Learning with Moderated Statistics for Biomarker Discovery
Tools for differential expression biomarker discovery based on microarray and next-generation sequencing data that leverage efficient semiparametric estimators of the average treatment effect for variable importance analysis. Estimation and inference of the (marginal) average treatment effects of potential biomarkers are computed by targeted minimum loss-based estimation, with joint, stable inference constructed across all biomarkers using a generalization of moderated statistics for use with the estimated efficient influence function. The procedure accommodates the use of ensemble machine learning for the estimation of nuisance functions.
Maintained by Nima Hejazi. Last updated 5 months ago.
regressiongeneexpressiondifferentialexpressionsequencingmicroarrayrnaseqimmunooncologybioconductorbioconductor-packagebioconductor-packagesbioinformaticsbiomarker-discoverybiostatisticscausal-inferencecomputational-biologymachine-learningstatisticstargeted-learning
5 stars 5.30 score 5 scriptsncchung
jackstraw:Statistical Inference for Unsupervised Learning
Test for association between the observed data and their estimated latent variables. The jackstraw package provides a resampling strategy and testing scheme to estimate statistical significance of association between the observed data and their latent variables. Depending on the data type and the analysis aim, the latent variables may be estimated by principal component analysis (PCA), factor analysis (FA), K-means clustering, and related unsupervised learning algorithms. The jackstraw methods learn over-fitting characteristics inherent in this circular analysis, where the observed data are used to estimate the latent variables and used again to test against those estimated latent variables. When latent variables are estimated by PCA, the jackstraw enables statistical testing for association between observed variables and latent variables, as estimated by low-dimensional principal components (PCs). This essentially leads to identifying variables that are significantly associated with PCs. Similarly, unsupervised clustering, such as K-means clustering, partition around medoids (PAM), and others, finds coherent groups in high-dimensional data. The jackstraw estimates statistical significance of cluster membership, by testing association between data and cluster centers. Clustering membership can be improved by using the resulting jackstraw p-values and posterior inclusion probabilities (PIPs), with an application to unsupervised evaluation of cell identities in single cell RNA-seq (scRNA-seq).
Maintained by Neo Christopher Chung. Last updated 3 months ago.
clusteringk-meansmachine-learningpcastatisticsunsupervised
16 stars 5.29 score 35 scriptszzawadz
DepthProc:Statistical Depth Functions for Multivariate Analysis
The data depth concept offers a variety of powerful and user-friendly tools for robust exploration and inference for multivariate data. The offered techniques can be used successfully when, by the nature of the data, no parametric model for the generating process is known. The package consists of, among others, implementations of several data depth techniques involving multivariate quantile-quantile plots, multivariate scatter estimators, multivariate Wilcoxon tests and robust regressions.
Maintained by Zygmunt Zawadzki. Last updated 3 years ago.
depth-functionsexploratory-data-analysisstatisticsopenblascppopenmp
6 stars 5.27 score 104 scripts 2 dependentspbiecek
ddst:Data Driven Smooth Tests
Smooth tests are data driven (the alternative hypothesis is dynamically selected based on the data). In this package you will find two groups of smooth tests: goodness-of-fit tests and nonparametric tests for comparing distributions. Among the goodness-of-fit tests there are tests for the exponential, Gaussian, Gumbel and uniform distributions. Among the nonparametric tests there are tests for stochastic dominance, a k-sample test, a test with umbrella alternatives and a test for change-point problems.
Maintained by Przemyslaw Biecek. Last updated 2 years ago.
data-drivensmooth-teststatisticstest
6 stars 5.26 score 6 scripts 2 dependentsmarberts
sps:Sequential Poisson Sampling
Sequential Poisson sampling is a variation of Poisson sampling for drawing probability-proportional-to-size samples with a given number of units, and is commonly used for price-index surveys. This package gives functions to draw stratified sequential Poisson samples according to the method by Ohlsson (1998, ISSN:0282-423X), as well as other order sample designs by Rosén (1997, <doi:10.1016/S0378-3758(96)00186-3>), and generate appropriate bootstrap replicate weights according to the generalized bootstrap method by Beaumont and Patak (2012, <doi:10.1111/j.1751-5823.2011.00166.x>).
Maintained by Steve Martin. Last updated 1 days ago.
official-statisticssamplingstatisticssurvey-sampling
4 stars 5.26 score 8 scriptspetrbouchal
czso:Use Open Data from the Czech Statistical Office in R
Get programmatic access to the open data provided by the Czech Statistical Office (CZSO, <https://czso.cz>).
Maintained by Petr Bouchal. Last updated 7 months ago.
czech-republicczech-statistical-officeczsodatasetopen-datastatistics
11 stars 5.24 score 53 scriptsrcalinjageman
esci:Estimation Statistics with Confidence Intervals
A collection of functions and 'jamovi' module for the estimation approach to inferential statistics, the approach which emphasizes effect sizes, interval estimates, and meta-analysis. Nearly all functions are based on 'statpsych' and 'metafor'. This package is still under active development, and breaking changes are likely, especially with the plot and hypothesis test functions. Data sets are included for all examples from Cumming & Calin-Jageman (2024) <ISBN:9780367531508>.
Maintained by Robert Calin-Jageman. Last updated 1 months ago.
jamovijaspsciencestatisticsvisualization
24 stars 5.24 score 12 scriptsalexanderlynl
safestats:Safe Anytime-Valid Inference
Functions to design and apply tests that are anytime valid. The functions can be used to design hypothesis tests in the prospective/randomised control trial setting or in the observational/retrospective setting. The resulting tests remain valid under both optional stopping and optional continuation. The current version includes safe t-tests and safe tests of two proportions. For details on the theory of safe tests, see Grunwald, de Heide and Koolen (2019) "Safe Testing" <arXiv:1906.07801>, for details on safe logrank tests see ter Schure, Perez-Ortiz, Ly and Grunwald (2020) "The Safe Logrank Test: Error Control under Continuous Monitoring with Unlimited Horizon" <arXiv:2011.06931v3> and Turner, Ly and Grunwald (2021) "Safe Tests and Always-Valid Confidence Intervals for contingency tables and beyond" <arXiv:2106.02693> for details on safe contingency table tests.
Maintained by Alexander Ly. Last updated 2 years ago.
evalueshacktoberfestsafe-testingstatistics
6 stars 5.23 score 14 scriptsshabbychef
fromo:Fast Robust Moments
Fast, numerically robust computation of weighted moments via 'Rcpp'. Supports computation on vectors and matrices, and monoidal append of moments. Moments and cumulants over running fixed-length windows can be computed, as well as over time-based windows. Moment computations are via a generalization of Welford's method, as described by Bennett et al. (2009) <doi:10.1109/CLUSTR.2009.5289161>.
Maintained by Steven E. Pav. Last updated 4 months ago.
cumulantsmomentsrolling-statisticsstatisticscpp
3 stars 5.22 score 22 scriptsbioc
OmicCircos:High-quality circular visualization of omics data
OmicCircos is an R application and package for generating high-quality circular plots for omics data.
Maintained by Ying Hu. Last updated 5 months ago.
visualizationstatisticsannotation
5.20 score 80 scriptsmodal-inria
RMixtCompUtilities:Utility Functions for 'MixtComp' Outputs
Mixture Composer <https://github.com/modal-inria/MixtComp> is a project to build mixture models with heterogeneous data sets and partially missing data management. This package contains graphical, getter and some utility functions to facilitate the analysis of 'MixtComp' output.
Maintained by Quentin Grimonprez. Last updated 11 months ago.
clusteringcppheterogeneous-datamissing-datamixed-datamixture-modelstatistics
13 stars 5.19 score 2 scripts 1 dependentsnalimilan
logmult:Log-Multiplicative Models, Including Association Models
Functions to fit log-multiplicative models using 'gnm', with support for convenient printing, plots, and jackknife/bootstrap standard errors. For complex survey data, models can be fitted from design objects from the 'survey' package. Currently supported models include UNIDIFF (Erikson & Goldthorpe, 1992), a.k.a. the log-multiplicative layer effect model (Xie, 1992) <doi:10.2307/2096242>, and several association models: Goodman (1979) <doi:10.2307/2286971> row-column association models of the RC(M) and RC(M)-L families with one or several dimensions, and two skew-symmetric association models proposed by Yamaguchi (1990) <doi:10.2307/271086> and by van der Heijden & Mooijaart (1995) <doi:10.1177/0049124195024001002>. Functions allow computing the intrinsic association coefficient (see Bouchet-Valat (2022) <doi:10.1177/0049124119852389>) and the Altham (1970) index <doi:10.1111/j.2517-6161.1970.tb00816.x>, including via the Bayes shrinkage estimator proposed by Zhou (2015) <doi:10.1177/0081175015570097>; and the RAS/IPF/Deming-Stephan algorithm.
Maintained by Milan Bouchet-Valat. Last updated 3 years ago.
log-linear-modelmodellingstatistics
4 stars 5.18 score 76 scriptsrkabacoff
qacBase:Functions to Facilitate Exploratory Data Analysis
Functions for descriptive statistics, data management, and data visualization.
Maintained by Kabacoff Robert. Last updated 3 years ago.
1 stars 5.13 score 45 scriptsnhejazi
txshift:Efficient Estimation of the Causal Effects of Stochastic Interventions
Efficient estimation of the population-level causal effects of stochastic interventions on a continuous-valued exposure. Both one-step and targeted minimum loss estimators are implemented for the counterfactual mean value of an outcome of interest under an additive modified treatment policy, a stochastic intervention that may depend on the natural value of the exposure. To accommodate settings with outcome-dependent two-phase sampling, procedures incorporating inverse probability of censoring weighting are provided to facilitate the construction of inefficient and efficient one-step and targeted minimum loss estimators. The causal parameter and its estimation were first described by Díaz and van der Laan (2013) <doi:10.1111/j.1541-0420.2011.01685.x>, while the multiply robust estimation procedure and its application to data from two-phase sampling designs is detailed in NS Hejazi, MJ van der Laan, HE Janes, PB Gilbert, and DC Benkeser (2020) <doi:10.1111/biom.13375>. The software package implementation is described in NS Hejazi and DC Benkeser (2020) <doi:10.21105/joss.02447>. Estimation of nuisance parameters may be enhanced through the Super Learner ensemble model in 'sl3', available for download from GitHub using 'remotes::install_github("tlverse/sl3")'.
Maintained by Nima Hejazi. Last updated 6 months ago.
causal-effectscausal-inferencecensored-datamachine-learningrobust-statisticsstatisticsstochastic-interventionsstochastic-treatment-regimestargeted-learningtreatment-effectsvariable-importance
14 stars 5.12 score 19 scriptstjmahr
polypoly:Helper Functions for Orthogonal Polynomials
Tools for reshaping, plotting, and manipulating matrices of orthogonal polynomials.
Maintained by Tristan Mahr. Last updated 2 years ago.
19 stars 5.12 score 14 scriptsmikejareds
hermiter:Efficient Sequential and Batch Estimation of Univariate and Bivariate Probability Density Functions and Cumulative Distribution Functions along with Quantiles (Univariate) and Nonparametric Correlation (Bivariate)
Facilitates estimation of full univariate and bivariate probability density functions and cumulative distribution functions along with full quantile functions (univariate) and nonparametric correlation (bivariate) using Hermite series based estimators. These estimators are particularly useful in the sequential setting (both stationary and non-stationary) and one-pass batch estimation setting for large data sets. Based on: Stephanou, Michael, Varughese, Melvin and Macdonald, Iain. "Sequential quantiles via Hermite series density estimation." Electronic Journal of Statistics 11.1 (2017): 570-607 <doi:10.1214/17-EJS1245>, Stephanou, Michael and Varughese, Melvin. "On the properties of Hermite series based distribution function estimators." Metrika (2020) <doi:10.1007/s00184-020-00785-z> and Stephanou, Michael and Varughese, Melvin. "Sequential estimation of Spearman rank correlation using Hermite series estimators." Journal of Multivariate Analysis (2021) <doi:10.1016/j.jmva.2021.104783>.
Maintained by Michael Stephanou. Last updated 7 months ago.
cumulative-distribution-functionkendall-correlation-coefficientonline-algorithmsprobability-density-functionquantilespearman-correlation-coefficientstatisticsstreaming-algorithmsstreaming-datacpp
15 stars 5.11 score 17 scriptslindanab
mecor:Measurement Error Correction in Linear Models with a Continuous Outcome
Covariate measurement error correction is implemented by means of regression calibration by Carroll RJ, Ruppert D, Stefanski LA & Crainiceanu CM (2006, ISBN:1584886331), efficient regression calibration by Spiegelman D, Carroll RJ & Kipnis V (2001) <doi:10.1002/1097-0258(20010115)20:1%3C139::AID-SIM644%3E3.0.CO;2-K> and maximum likelihood estimation by Bartlett JW, Stavola DBL & Frost C (2009) <doi:10.1002/sim.3713>. Outcome measurement error correction is implemented by means of the method of moments by Buonaccorsi JP (2010, ISBN:1420066560) and efficient method of moments by Keogh RH, Carroll RJ, Tooze JA, Kirkpatrick SI & Freedman LS (2014) <doi:10.1002/sim.7011>. Standard error estimation of the corrected estimators is implemented by means of the Delta method by Rosner B, Spiegelman D & Willett WC (1990) <doi:10.1093/oxfordjournals.aje.a115715> and Rosner B, Spiegelman D & Willett WC (1992) <doi:10.1093/oxfordjournals.aje.a116453>, the Fieller method described by Buonaccorsi JP (2010, ISBN:1420066560), and the Bootstrap by Carroll RJ, Ruppert D, Stefanski LA & Crainiceanu CM (2006, ISBN:1584886331).
Maintained by Linda Nab. Last updated 3 years ago.
linear-modelsmeasurement-errorstatistics
6 stars 5.07 score 13 scriptsralmond
CPTtools:Tools for Creating Conditional Probability Tables
Provides support for parameterized tables for Bayesian networks, particularly the IRT-like DiBello tables. Also provides some tools for visualising the networks.
Maintained by Russell Almond. Last updated 3 months ago.
1 stars 5.05 score 21 scripts 4 dependentsathammad
pbox:Exploring Multivariate Spaces with Probability Boxes
Advanced statistical library offering a method to encapsulate and query the probability space of a dataset effortlessly using Probability Boxes (p-boxes). Its distinctive feature lies in the ease with which users can navigate and analyze marginal, joint, and conditional probabilities while taking into account the underlying correlation structure inherent in the data using copula theory and models. A comprehensive explanation is available in the paper "pbox: Exploring Multivariate Spaces with Probability Boxes" to be published in the Journal of Statistical Software.
Maintained by Ahmed T. Hammad. Last updated 9 months ago.
climate-changecopulaenvironmental-monitoringfinancial-analysisprobabilityrisk-assessmentrisk-managementstatistics
2 stars 5.04 score 4 scriptsjobnmadu
Dyn4cast:Dynamic Modeling and Machine Learning Environment
Estimates, predicts, and forecasts dynamic models, and computes machine learning metrics that assist in model selection for further analysis. The package also provides tools and metrics useful in machine learning and modeling, for example a quick summary, a percent-sign formatter, Mallows' Cp, and other tools. The ecosystem of this package is the analysis of economic data for national development. The package is stable, reliable, efficient, and time-saving.
Maintained by Job Nmadu. Last updated 12 days ago.
data-scienceequal-lenght-forecastforecastingknotsmachine-learningnigeriapredictionregression-modelsspline-modelsstatisticstime-series
4 stars 5.03 score 38 scriptsncchung
jaccard:Testing similarity between binary datasets using Jaccard/Tanimoto coefficients
Calculate statistical significance of Jaccard/Tanimoto similarity coefficients.
Maintained by Neo Christopher Chung. Last updated 5 years ago.
binary-datahypothesis-testingjaccardsimilaritystatisticstanimotocpp
5 stars 5.03 score 85 scripts
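
A conceptual base R sketch of the coefficient itself and a naive permutation p-value, shown only to illustrate the quantity being tested; this is not the package's interface, and the helper name jaccard_coef is illustrative.

# Jaccard/Tanimoto similarity between two binary vectors, with a naive permutation p-value
jaccard_coef <- function(a, b) sum(a & b) / sum(a | b)

set.seed(1)
x <- rbinom(100, 1, 0.3)
y <- rbinom(100, 1, 0.3)

obs  <- jaccard_coef(x, y)
perm <- replicate(2000, jaccard_coef(x, sample(y)))   # shuffling y breaks any association
p_value <- mean(perm >= obs)                          # one-sided: similarity higher than chance
c(jaccard = obs, p = p_value)
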
marberts
rsmatrix:Matrices for Repeat-Sales Price Indexes
Calculate the matrices in Shiller (1991, <doi:10.1016/S1051-1377(05)80028-2>) that serve as the foundation for many repeat-sales price indexes.
Maintained by Steve Martin. Last updated 1 day ago.
4 stars 5.00 score 7 scriptshendersontrent
theftdlc:Analyse and Interpret Time Series Features
Provides a suite of functions for analysing, interpreting, and visualising time-series features calculated from different feature sets from the 'theft' package. Implements statistical learning methodologies described in Henderson, T., Bryant, A., and Fulcher, B. (2023) <arXiv:2303.17809>.
Maintained by Trent Henderson. Last updated 2 months ago.
data-sciencedata-visualizationmachine-learningstatisticstime-series
4 stars 4.94 score 11 scriptssmac-group
avar:Allan Variance
Implements the Allan variance and the Allan variance linear regression estimator for latent time series models. More details about the method can be found, for example, in Guerrier, S., Molinari, R., & Stebler, Y. (2016) <doi:10.1109/LSP.2016.2541867>.
Maintained by Stéphane Guerrier. Last updated 3 years ago.
allan-varianceinertial-sensorsstatisticstime-seriescpp
5 stars 4.88 score 9 scripts
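
For reference, the non-overlapping Allan variance at a given cluster size can be computed in a few lines of base R. This is a conceptual sketch of the textbook formula, not the package's estimators; the helper name allan_var is illustrative.

# Non-overlapping Allan variance at cluster size m: average the series in blocks of
# length m, then take half the mean squared successive difference of the block averages.
allan_var <- function(x, m) {
  n_blocks <- floor(length(x) / m)
  block_means <- colMeans(matrix(x[1:(n_blocks * m)], nrow = m))
  mean(diff(block_means)^2) / 2
}

set.seed(1)
x <- cumsum(rnorm(10000))            # example: a random-walk signal
sapply(c(1, 2, 4, 8, 16), function(m) allan_var(x, m))
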
graemeleehickey
goldilocks:Goldilocks Adaptive Trial Designs for Time-to-Event Endpoints
Implements the Goldilocks adaptive trial design for a time-to-event outcome using a piecewise exponential model and conjugate Gamma prior distributions. The method closely follows the article by Broglio and colleagues <doi:10.1080/10543406.2014.888569>, which allows users to explore the operating characteristics of different trial designs.
Maintained by Graeme L. Hickey. Last updated 2 months ago.
adaptivebayesianbayesian-statisticsclinical-trialsstatisticscpp
7 stars 4.85 score 4 scriptsjucheng1992
ctmle:Collaborative Targeted Maximum Likelihood Estimation
Implements the general template for collaborative targeted maximum likelihood estimation. It also provides several commonly used C-TMLE instantiations, such as the vanilla/scalable variable-selection C-TMLE (Ju et al. (2017) <doi:10.1177/0962280217729845>) and the glmnet-C-TMLE algorithm (Ju et al. (2017) <arXiv:1706.10029>).
Maintained by Cheng Ju. Last updated 5 years ago.
causal-inferencemachine-learningstatisticstmle
5 stars 4.83 score 27 scriptsquadrama
DramaAnalysis:Analysis of Dramatic Texts
Analysis of preprocessed dramatic texts, with respect to literary research. The package provides functions to analyze and visualize information about characters, stage directions, the dramatic structure and the text itself. The dramatic texts are expected to be in CSV format; sample texts are provided and can be installed from within the package. The package and the reasoning behind it are described in Reiter et al. (2017) <doi:10.18420/in2017_119>.
Maintained by Nils Reiter. Last updated 5 years ago.
corpus-linguisticsdigital-humanitiesdramadramatic-textsstatistics
15 stars 4.79 score 41 scriptsbioc
HDTD:Statistical Inference about the Mean Matrix and the Covariance Matrices in High-Dimensional Transposable Data (HDTD)
Characterization of intra-individual variability using physiologically relevant measurements provides important insights into fundamental biological questions ranging from cell type identity to tumor development. For each individual, the data measurements can be written as a matrix with the different subsamples of the individual recorded in the columns and the different phenotypic units recorded in the rows. Datasets of this type are called high-dimensional transposable data. The HDTD package provides functions for conducting statistical inference for the mean relationship between the row and column variables and for the covariance structure within and between the row and column variables.
Maintained by Anestis Touloumis. Last updated 5 months ago.
differentialexpressiongeneticsgeneexpressionmicroarraysequencingstatisticalmethodsoftwarebioconductor-packagehigh-dimensionalstatisticsopenblascppopenmp
1 stars 4.78 scorebioc
VaSP:Quantification and Visualization of Variations of Splicing in Population
Discovery of genome-wide variable alternative splicing events from short-read RNA-seq data and visualizations of gene splicing information for publication-quality multi-panel figures in a population. (Warning: the visualization function has been removed because its dependency, the Sushi package, was deprecated. To use it, revert to an older version of this package.)
Maintained by Huihui Yu. Last updated 5 months ago.
rnaseqalternativesplicingdifferentialsplicingstatisticalmethodvisualizationpreprocessingclusteringdifferentialexpressionkeggimmunooncology3s-scoresalternative-splicingballgownrna-seqsplicingsqtlstatistics
3 stars 4.78 score 3 scriptsropensci
tacmagic:Positron Emission Tomography Time-Activity Curve Analysis
To facilitate the analysis of positron emission tomography (PET) time activity curve (TAC) data, and to encourage open science and replicability, this package supports data loading and analysis of multiple TAC file formats. Functions are available to analyze loaded TAC data for individual participants or in batches. Major functionality includes weighted TAC merging by region of interest (ROI), calculating models including standardized uptake value ratio (SUVR) and distribution volume ratio (DVR, Logan et al. 1996 <doi:10.1097/00004647-199609000-00008>), basic plotting functions and calculation of cut-off values (Aizenstein et al. 2008 <doi:10.1001/archneur.65.11.1509>). Please see the walkthrough vignette for a detailed overview of 'tacmagic' functions.
Maintained by Eric Brown. Last updated 5 years ago.
mrineuroimagingneuroscienceneuroscience-methodspetpet-mrpositronpositron-emission-tomographystatistics
5 stars 4.76 score 23 scriptsegarpor
goffda:Goodness-of-Fit Tests for Functional Data
Implementation of several goodness-of-fit tests for functional data. Currently, mostly related with the functional linear model with functional/scalar response and functional/scalar predictor. The package allows for the replication of the data applications considered in García-Portugués, Álvarez-Liébana, Álvarez-Pérez and González-Manteiga (2021) <doi:10.1111/sjos.12486>.
Maintained by Eduardo García-Portugués. Last updated 1 year ago.
functional-data-analysisgoodness-of-fitreproducible-researchstatisticsopenblascpp
10 stars 4.76 score 19 scripts 1 dependentsegarpor
rotasym:Tests for Rotational Symmetry on the Hypersphere
Implementation of the tests for rotational symmetry on the hypersphere proposed in García-Portugués, Paindaveine and Verdebout (2020) <doi:10.1080/01621459.2019.1665527>. The package also implements the proposed distributions on the hypersphere, based on the tangent-normal decomposition, and allows for the replication of the data application considered in the paper.
Maintained by Eduardo García-Portugués. Last updated 16 days ago.
circular-statisticsdirectional-statisticsgoodness-of-fitsemiparametricstatisticscpp
2 stars 4.68 score 32 scripts 5 dependentsalexiosg
RcppBessel:Bessel Functions Rcpp Interface
Exports an 'Rcpp' interface for the Bessel functions in the 'Bessel' package, which can then be called from the 'C++' code of other packages. For the original 'Fortran' implementation of these functions see Amos (1995) <doi:10.1145/212066.212078>.
Maintained by Alexios Galanos. Last updated 7 months ago.
mathematical-functionsrcppstatisticscpp
1 stars 4.65 score 4 scripts 1 dependentsbenjilu
forestError:A Unified Framework for Random Forest Prediction Error Estimation
Estimates the conditional error distributions of random forest predictions and common parameters of those distributions, including conditional misclassification rates, conditional mean squared prediction errors, conditional biases, and conditional quantiles, by out-of-bag weighting of out-of-bag prediction errors as proposed by Lu and Hardin (2021). This package is compatible with several existing packages that implement random forests in R.
Maintained by Benjamin Lu. Last updated 4 years ago.
inferenceintervalsmachine-learningmachinelearningpredictionrandom-forestrandomforeststatistics
26 stars 4.62 score 16 scriptsgraemeleehickey
adaptDiag:Bayesian Adaptive Designs for Diagnostic Trials
Simulate clinical trials for diagnostic test devices and evaluate the operating characteristics under an adaptive design with futility assessment determined via the posterior predictive probabilities.
Maintained by Graeme L. Hickey. Last updated 3 months ago.
adaptivebayesianbayesian-statisticsclinical-trialsdiagnostic-testsdiagnosticsstatistics
4 stars 4.60 score 5 scriptscotterell
TDCM:The Transition Diagnostic Classification Model Framework
Estimate the transition diagnostic classification model (TDCM) described in Madison & Bradshaw (2018) <doi:10.1007/s11336-018-9638-5>, a longitudinal extension of the log-linear cognitive diagnosis model (LCDM) in Henson, Templin & Willse (2009) <doi:10.1007/s11336-008-9089-5>. As the LCDM subsumes many other diagnostic classification models (DCMs), many other DCMs can be estimated longitudinally via the TDCM. The 'TDCM' package includes functions to estimate the single-group and multigroup TDCM, summarize results of interest including item parameters, growth proportions, transition probabilities, transitional reliability, attribute correlations, model fit, and growth plots.
Maintained by Michael E. Cotterell. Last updated 10 days ago.
4.60 score 5 scriptsbozenne
lavaSearch2:Tools for Model Specification in the Latent Variable Framework
Tools for model specification in the latent variable framework (add-on to the 'lava' package). The package contains three main functionalities: Wald tests/F-tests with improved control of the type 1 error in small samples, adjustment for multiple comparisons when searching for local dependencies, and adjustment for multiple comparisons when doing inference for multiple latent variable models.
Maintained by Brice Ozenne. Last updated 8 months ago.
inferencelatent-variable-modelsstatisticsopenblascpp
4.59 score 155 scriptsbioc
meshr:Tools for conducting enrichment analysis of MeSH
A set of annotation maps describing the entire MeSH assembled using data from MeSH.
Maintained by Koki Tsuyuzaki. Last updated 5 months ago.
annotationdatafunctionalannotationbioinformaticsstatisticsannotationmultiplecomparisonsmeshdb
4.56 score 9 scripts 1 dependentsjacob-long
dpm:Dynamic Panel Models Fit with Maximum Likelihood
Implements the dynamic panel models described by Allison, Williams, and Moral-Benito (2017 <doi:10.1177/2378023117710578>) in R. This class of models uses structural equation modeling to specify dynamic (lagged dependent variable) models with fixed effects for panel data. Additionally, models may have predictors that are only weakly exogenous, i.e., are affected by prior values of the dependent variable. Options also allow for random effects, dropping the lagged dependent variable, and a number of other specification choices.
Maintained by Jacob A. Long. Last updated 1 year ago.
16 stars 4.55 score 44 scriptsshah-in-boots
rmdl:A Causality-Informed Modeling Approach
A system for describing and manipulating the many models that are generated in causal inference and data analysis projects, as based on the causal theory and criteria of Austin Bradford Hill (1965) <doi:10.1177/003591576505800503>. This system includes the addition of formal attributes that modify base `R` objects, including terms and formulas, with a focus on variable roles in the "do-calculus" of modeling, as described in Pearl (2010) <doi:10.2202/1557-4679.1203>. For example, the definition of exposure, outcome, and interaction are implicit in the roles variables take in a formula. These premises allow for a more fluent modeling approach focusing on variable relationships, and assessing effect modification, as described by VanderWeele and Robins (2007) <doi:10.1097/EDE.0b013e318127181b>. The essential goal is to help contextualize formulas and models in causality-oriented workflows.
Maintained by Anish S. Shah. Last updated 10 months ago.
epidemiologymodelingstatistics
4.54 score 7 scriptscsblatvia
surveyplanning:Survey Planning Tools
Tools for sample survey planning, including sample size calculation, estimation of expected precision for the estimates of totals, and calculation of optimal sample size allocation.
Maintained by Juris Breidaks. Last updated 4 years ago.
8 stars 4.53 score 14 scripts 1 dependents
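
As a rough illustration of the kind of calculation involved (a textbook formula, not this package's interface), the sample size needed to estimate a proportion with a given margin of error, including a finite-population correction, can be computed directly in base R. The helper name sample_size_prop and its arguments are illustrative.

# Sample size for estimating a proportion p with margin of error e at confidence level conf,
# with a finite-population correction for population size N (conceptual sketch only).
sample_size_prop <- function(p = 0.5, e = 0.03, conf = 0.95, N = Inf) {
  z  <- qnorm(1 - (1 - conf) / 2)
  n0 <- z^2 * p * (1 - p) / e^2                      # infinite-population sample size
  if (is.finite(N)) n0 <- n0 / (1 + (n0 - 1) / N)    # finite-population correction
  ceiling(n0)
}
sample_size_prop(p = 0.5, e = 0.03, conf = 0.95, N = 10000)
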
hanjunwei-lab
MiRSEA:'MicroRNA' Set Enrichment Analysis
The tools for 'MicroRNA Set Enrichment Analysis' can identify risk pathways (or prior gene sets) regulated by a microRNA set in the context of microRNA expression data. (1) This package constructs a correlation profile of microRNAs and pathways using the hypergeometric test. The gene sets of pathways are derived from three public databases (Kyoto Encyclopedia of Genes and Genomes ('KEGG'); 'Reactome'; 'Biocarta') and the target gene sets of microRNAs are provided by four databases ('TarBaseV6.0'; 'mir2Disease'; 'miRecords'; 'miRTarBase'). (2) This package can quantify the change in correlation between microRNAs for each pathway (or prior gene set) based on microRNA expression data with cases and controls. (3) This package uses the weighted Kolmogorov-Smirnov statistic to calculate an enrichment score (ES) of a microRNA set that co-regulates a pathway, which reflects the degree to which a given pathway is associated with the specific phenotype. (4) This package provides visualization of the results.
Maintained by Junwei Han. Last updated 5 years ago.
statisticspathwaysmicrornaenrichment analysis
4.51 score 16 scriptsxiaoruizhu
SurrogateRsq:Goodness-of-Fit Analysis for Categorical Data using the Surrogate R-Squared
To assess and compare the models' goodness of fit, R-squared is one of the most popular measures. For categorical data analysis, however, no universally adopted R-squared measure can resemble the ordinary least squares (OLS) R-squared for linear models with continuous data. This package implements the surrogate R-squared measure for categorical data analysis, which is proposed in the study of Dungang Liu, Xiaorui Zhu, Brandon Greenwell, and Zewei Lin (2022) <doi:10.1111/bmsp.12289>. It can generate a point or interval measure of the surrogate R-squared. It can also provide a ranking measure of the percentage contribution of each variable to the overall surrogate R-squared. This ranking assessment allows one to check the importance of each variable in terms of their explained variance. This package can be jointly used with other existing R packages for variable selection and model diagnostics in the model-building process.
Maintained by Xiaorui (Jeremy) Zhu. Last updated 1 year ago.
categorical-data-analysisgoodness-of-fitr-squared-statisticstatistics
5 stars 4.48 score 12 scriptseurostat
hicp:Harmonised Index of Consumer Prices
The Harmonised Index of Consumer Prices (HICP) is the key economic figure to measure inflation in the euro area. The methodology underlying the HICP is documented in the HICP Methodological Manual (<https://ec.europa.eu/eurostat/web/products-manuals-and-guidelines/w/ks-gq-24-003>). Based on the manual, this package provides functions to access and work with HICP data from Eurostat's public database (<https://ec.europa.eu/eurostat/data/database>).
Maintained by Sebastian Weinand. Last updated 8 months ago.
consumer-price-indexinflationpricesstatistics
2 stars 4.48 score 6 scriptsyboulag
cTOST:Finite Sample Correction of the Two One-Sided Tests in the Univariate Framework
A system containing easy-to-use tools to compute the bioequivalence assessment in the univariate framework using the methods proposed in Boulaguiem et al. (2023) <doi:10.1101/2023.03.11.532179>.
Maintained by Younes Boulaguiem. Last updated 2 months ago.
bioequivalenceequivalencehighly-variable-drugsstatistics
4.48 score 4 scripts
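
For context, the standard uncorrected TOST procedure that this package adjusts can be run with two one-sided t-tests in base R. This is a minimal sketch assuming an equivalence margin of plus or minus delta on the mean difference; the helper name tost is illustrative and does not include the package's finite-sample correction.

# Standard TOST for equivalence of two means with margin +/- delta (uncorrected version).
tost <- function(x, y, delta, alpha = 0.05) {
  lower <- t.test(x, y, mu = -delta, alternative = "greater")$p.value  # H0: diff <= -delta
  upper <- t.test(x, y, mu =  delta, alternative = "less")$p.value     # H0: diff >=  delta
  p <- max(lower, upper)               # reject both one-sided nulls => conclude equivalence
  list(p.value = p, equivalent = p < alpha)
}

set.seed(1)
tost(rnorm(20, mean = 0.1), rnorm(20, mean = 0), delta = 0.5)
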
friendly
mvinfluence:Influence Measures and Diagnostic Plots for Multivariate Linear Models
Computes regression deletion diagnostics for multivariate linear models and provides some associated diagnostic plots. The diagnostic measures include hat-values (leverages), generalized Cook's distance, and generalized squared 'studentized' residuals. Several types of plots to detect influential observations are provided.
Maintained by Michael Friendly. Last updated 3 years ago.
multivariate-analysismultivariate-linear-regressionstatisticsvisualization
2 stars 4.41 score 26 scriptspbosetti
adas.utils:Design of Experiments and Factorial Plans Utilities
A number of functions to create and analyze factorial plans according to the Design of Experiments (DoE) approach, with the addition of some utility functions to perform statistical analyses. The DoE approach follows "Design and Analysis of Experiments" by Douglas C. Montgomery (2019, ISBN:978-1-119-49244-3). The package also provides utilities used in the course "Analysis of Data and Statistics" at the University of Trento, Italy.
Maintained by Paolo Bosetti. Last updated 19 hours ago.
4.40 score 6 scriptsnvietto
samplezoo:Generate Samples with a Variety of Probability Distributions
Simplifies the process of generating samples from a variety of probability distributions, allowing users to quickly create data frames for demonstrations, troubleshooting, or teaching purposes. Data are available in three sizes: small, medium, and large. For more information, refer to the package documentation.
Maintained by Nicholas Vietto. Last updated 1 month ago.
probability-distributionrngsimulationstatistics
4.40 score 8 scriptsmarkajoc
condvis:Conditional Visualization for Statistical Models
Exploring fitted models by interactively taking 2-D and 3-D sections in data space.
Maintained by Mark OConnell. Last updated 7 years ago.
20 stars 4.38 score 24 scriptsvusaverse
vvdoctor:Statistical Test App with R 'shiny'
Provides a user-friendly R 'shiny' app for performing various statistical tests on datasets. It allows users to upload data in numerous formats and perform statistical analyses. The app dynamically adapts its options based on the selected columns and supports both single and multiple column comparisons. The app's user interface is designed to streamline the process of selecting datasets, columns, and test options, making it easy for users to explore and interpret their data. The underlying functions for statistical tests are well-organized and can be used independently within other R scripts.
Maintained by Tomer Iwan. Last updated 11 months ago.
hypothesis-testingr-r-shinyshiny-appsshiny-rstatistical-testsstatisticsstats
7 stars 4.32 score 3 scriptshugleipzig
kitesquare:Visualize Contingency Tables Using Kite-Square Plots
Create a kite-square plot for contingency tables using 'ggplot2', to display their relevant quantities in a single figure (marginal, conditional, expected, observed, chi-squared). The plot resembles a flying kite inside a square if the variables are independent, and deviates from this the more dependence exists.
Maintained by John Wiedenhöft. Last updated 4 days ago.
contingency-tablecontingency-tablesstatisticsvisualisationvisualization
1 stars 4.30 scoretimbeechey
clubpro:Classification Using Binary Procrustes Rotation
Implements a classification method described by Grice (2011, ISBN:978-0-12-385194-9) using binary Procrustes rotation, a simplified version of Procrustes rotation.
Maintained by Timothy Beechey. Last updated 10 months ago.
classificationdata-analysispsychology-experimentsrcppstatistical-analysisstatisticsopenblascppopenmp
4.30 score 2 scriptsgasparl
neatStats:Neat and Painless Statistical Reporting
User-friendly, clear and simple statistics, primarily for publication in psychological science. The main functions are wrappers for other packages, but there are various additions as well. Every relevant step from data aggregation to reportable printed statistics is covered for basic experimental designs.
Maintained by Gáspár Lukács. Last updated 2 years ago.
bayesfactorconfidence-intervalspipelinestatistical-analysisstatistics
4 stars 4.30 scorexsswang
remiod:Reference-Based Multiple Imputation for Ordinal/Binary Response
Reference-based multiple imputation of ordinal and binary responses under Bayesian framework, as described in Wang and Liu (2022) <arXiv:2203.02771>. Methods for missing-not-at-random include Jump-to-Reference (J2R), Copy Reference (CR), and Delta Adjustment which can generate tipping point analysis.
Maintained by Tony Wang. Last updated 2 years ago.
bayesiancontrol-basedcopy-referencedelta-adjustmentgeneralized-linear-modelsglmjagsjump-to-referencemcmcmissing-at-randommissing-datamissing-not-at-randommultiple-imputationnon-ignorableordinal-regressionpattern-mixture-modelreference-basedstatisticscpp
4.30 score 3 scriptsstephaneguerrier
pempi:Proportion Estimation with Marginal Proxy Information
A system containing easy-to-use tools for the conditional estimation of the prevalence of an emerging or rare infectious disease using the methods proposed in Guerrier et al. (2023) <arXiv:2012.10745>.
Maintained by Stéphane Guerrier. Last updated 1 year ago.
covidprevalencerare-infectious-diseasesstatistics
4.30 score 9 scriptsthiyangt
DSjobtracker:What Skills and Qualifications are Required for Data Science Related Jobs?
Dataset containing information about job listings for data science job roles.
Maintained by Thiyanga S. Talagala. Last updated 1 year ago.
datasetqualificationsskillsstatisticstidy
3 stars 4.29 score 13 scriptsegarpor
DirStats:Nonparametric Methods for Directional Data
Nonparametric kernel density estimation, bandwidth selection, and other utilities for analyzing directional data. Implements the estimator in Bai, Rao and Zhao (1987) <doi:10.1016/0047-259X(88)90113-3>, the cross-validation bandwidth selectors in Hall, Watson and Cabrera (1987) <doi:10.1093/biomet/74.4.751> and the plug-in bandwidth selectors in García-Portugués (2013) <doi:10.1214/13-ejs821>.
Maintained by Eduardo García-Portugués. Last updated 2 years ago.
directional-statisticsnonparametric-statisticsstatisticsfortran
12 stars 4.26 score 7 scripts 1 dependentsmightymetrika
npboottprm:Nonparametric Bootstrap Test with Pooled Resampling
Addressing crucial research questions often necessitates a small sample size due to factors such as distinctive target populations, rarity of the event under study, time and cost constraints, ethical concerns, or group-level unit of analysis. Many readily available analytic methods, however, do not accommodate small sample sizes, and the choice of the best method can be unclear. The 'npboottprm' package enables the execution of nonparametric bootstrap tests with pooled resampling to help fill this gap. Grounded in the statistical methods for small sample size studies detailed in Dwivedi, Mallawaarachchi, and Alvarado (2017) <doi:10.1002/sim.7263>, the package facilitates a range of statistical tests, encompassing independent t-tests, paired t-tests, and one-way Analysis of Variance (ANOVA) F-tests. The nonparboot() function undertakes essential computations, yielding detailed outputs which include test statistics, effect sizes, confidence intervals, and bootstrap distributions. Further, 'npboottprm' incorporates an interactive 'shiny' web application, nonparboot_app(), offering intuitive, user-friendly data exploration.
Maintained by Mackson Ncube. Last updated 6 months ago.
datasciencenonparametricstatistics
1 stars 4.26 score 5 scripts 2 dependents
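
The pooled-resampling idea can be illustrated in a few lines of base R. This is a conceptual sketch of a bootstrap two-sample t-test under pooled resampling, not the package's nonparboot() interface, and the data values are made up for illustration.

# Nonparametric bootstrap two-sample t-test with pooled resampling (conceptual sketch).
set.seed(1)
x <- c(12, 15, 9, 14, 17, 11)          # small group 1
y <- c(10, 8, 13, 9, 7)                # small group 2
pooled <- c(x, y)
t_obs  <- t.test(x, y)$statistic

# Resample both groups from the pooled data (consistent with the null of no group
# difference), recompute the t statistic, and compare the observed value to this
# bootstrap null distribution.
t_null <- replicate(5000, {
  xb <- sample(pooled, length(x), replace = TRUE)
  yb <- sample(pooled, length(y), replace = TRUE)
  t.test(xb, yb)$statistic
})
p_value <- mean(abs(t_null) >= abs(t_obs))
p_value
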
hoxo-m
deltatest:Statistical Hypothesis Testing Using the Delta Method
Statistical hypothesis testing using the Delta method as proposed by Deng et al. (2018) <doi:10.1145/3219819.3219919>. This method replaces the standard variance estimation formula in the Z-test with an approximate formula derived via the Delta method, which can account for within-user correlation.
Maintained by Koji Makiyama. Last updated 13 days ago.
ab-testingdata-sciencestatistics
4 stars 4.26 score
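
The underlying idea can be illustrated with the delta-method variance of a ratio of per-user totals. This is a conceptual base R sketch assuming simulated user-level data in two experiment arms, not this package's interface; the helper name ratio_var is illustrative.

# Delta-method variance of a ratio metric sum(y)/sum(n) computed from user-level data
# (e.g. clicks per page view, where each user contributes y clicks over n views).
ratio_var <- function(y, n) {
  k <- length(y)
  r <- mean(y) / mean(n)
  (var(y) - 2 * r * cov(y, n) + r^2 * var(n)) / (k * mean(n)^2)
}

set.seed(1)
# hypothetical per-user counts in two experiment arms
n_a <- rpois(500, 10); y_a <- rbinom(500, n_a, 0.10)
n_b <- rpois(500, 10); y_b <- rbinom(500, n_b, 0.11)

diff_hat <- sum(y_b) / sum(n_b) - sum(y_a) / sum(n_a)
se       <- sqrt(ratio_var(y_a, n_a) + ratio_var(y_b, n_b))
z        <- diff_hat / se
2 * pnorm(-abs(z))                      # two-sided p-value for the Z-test
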
jeffreyevans
GeNetIt:Spatial Graph-Theoretic Genetic Gravity Modelling
Implementation of spatial graph-theoretic genetic gravity models. The model framework is applicable for other types of spatial flow questions. Includes functions for constructing spatial graphs, sampling and summarizing associated raster variables and building unconstrained and singly constrained gravity models.
Maintained by Jeffrey S. Evans. Last updated 2 years ago.
landscape-geneticsr-spatialspatialstatistics
9 stars 4.24 score 39 scriptskoenderks
digitTests:Tests for Detecting Irregular Digit Patterns
Provides statistical tests and support functions for detecting irregular digit patterns in numerical data. The package includes tools for extracting digits at various locations in a number, tests for repeated values, and (Bayesian) tests of digit distributions.
Maintained by Koen Derks. Last updated 2 years ago.
digit-analysisdigitsstatistics
3 stars 4.18 score 9 scripts
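
As a rough illustration of one such test (not this package's interface), leading digits can be extracted and compared against the Benford distribution with a chi-squared test in base R. The helper name first_digit and the simulated amounts are illustrative.

# Chi-squared test of first digits against Benford's law (conceptual sketch).
first_digit <- function(x) {
  x <- abs(x[x != 0])
  floor(x / 10^floor(log10(x)))          # leading digit of each nonzero value
}

set.seed(1)
amounts  <- rlnorm(1000, meanlog = 6, sdlog = 1.5)  # hypothetical transaction amounts
observed <- table(factor(first_digit(amounts), levels = 1:9))
benford  <- log10(1 + 1 / (1:9))                    # P(first digit = d) under Benford's law
chisq.test(observed, p = benford)
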
coatless-rpkg
msos:Data Sets and Functions Used in Multivariate Statistics: Old School by John Marden
Multivariate Analysis methods and data sets used in John Marden's book Multivariate Statistics: Old School (2015) <ISBN:978-1456538835>. This also serves as a companion package for the STAT 571: Multivariate Analysis course offered by the Department of Statistics at the University of Illinois at Urbana-Champaign ('UIUC').
Maintained by James Balamuta. Last updated 1 year ago.
3 stars 4.16 score 32 scripts 1 dependentsjoshuawlambert
rFSA:Feasible Solution Algorithm for Finding Best Subsets and Interactions
Assists in statistical model building to find optimal and semi-optimal higher order interactions and best subsets. Uses the lm(), glm(), and other R functions to fit models generated from a feasible solution algorithm. Discussed in Subset Selection in Regression, A Miller (2002). Applied and explained for least median of squares in Hawkins (1993) <doi:10.1016/0167-9473(93)90246-P>. The feasible solution algorithm comes up with model forms of a specific type that can have fixed variables, higher order interactions and their lower order terms.
Maintained by Joshua Lambert. Last updated 4 years ago.
algorithmfsainteractionmodelsparallelstatisticalstatisticssubset
7 stars 4.15 score 20 scriptsxiaoruizhu
PAsso:Assessing the Partial Association Between Ordinal Variables
An implementation of the unified framework for assessing partial association between ordinal variables after adjusting for a set of covariates (Dungang Liu, Shaobo Li, Yan Yu and Irini Moustaki (2020) <doi:10.1080/01621459.2020.1796394>, Journal of the American Statistical Association). This package provides a set of tools to quantify, visualize, and test partial associations between multiple ordinal variables. It can produce a number of phi measures, partial regression plots, 3-D plots, and p-values for testing H_0: phi = 0 or H_0: phi <= delta.
Maintained by Xiaorui (Jeremy) Zhu. Last updated 1 year ago.
association-analysisordinal-variablespartial-associationstatisticscpp
7 stars 4.14 score 13 scripts 1 dependents