baseballr:Acquiring and Analyzing Baseball Data
Provides numerous utilities for acquiring and analyzing baseball data from online sources such as 'Baseball Reference', 'FanGraphs', and the 'MLB Stats' API.
Maintained by Saiem Gilani. Last updated 4 months ago.
73.1 match 380 stars 8.98 score 582 scripts
openintrostat
openintro:Datasets and Supplemental Functions from 'OpenIntro' Textbooks and Labs
Supplemental functions and data for 'OpenIntro' resources, which include open-source textbooks and resources for introductory statistics. The package contains datasets used in our open-source textbooks along with custom plotting functions for reproducing book figures. Note that many functions and examples include color transparency; some plotting elements may not show up properly (or at all) when run in some versions of the Windows operating system.
Maintained by Mine Çetinkaya-Rundel. Last updated 2 months ago.
13.9 match 240 stars 11.39 score 6.0k scripts
bayesball
LearnBayes:Learning Bayesian Inference
Contains functions for summarizing basic one and two parameter posterior distributions and predictive distributions. It contains MCMC algorithms for summarizing posterior distributions defined by the user. It also contains functions for regression models, hierarchical models, Bayesian tests, and illustrations of Gibbs sampling.
Maintained by Jim Albert. Last updated 7 years ago.
10.9 match 38 stars 11.34 score 690 scripts 31 dependents
colindouglas
retrosheet:Import Professional Baseball Data from 'Retrosheet'
A collection of tools to import and structure the (currently) single-season event, game-log, roster, and schedule data available from 'Retrosheet'. The event (a.k.a. play-by-play) files can be especially difficult to parse. This package parses those files and returns the requested data in the most practical R structure for sabermetric or other analyses.
Maintained by Colin Douglas. Last updated 1 year ago.
25.5 match 5 stars 4.18 score 30 scripts
alanarnholt
BSDA:Basic Statistics and Data Analysis
Data sets for book "Basic Statistics and Data Analysis" by Larry J. Kitchens.
Maintained by Alan T. Arnholt. Last updated 2 years ago.
10.2 match 7 stars 9.11 score 1.3k scripts 6 dependents
camdenk
mlbplotR:Create 'ggplot2' and 'gt' Visuals with Major League Baseball Logos
Tools to help visualize Major League Baseball analysis in 'ggplot2' and 'gt'. You provide team/player information and 'mlbplotR' will transform that information into team colors, logos, or player headshots for graphics.
Maintained by Camden Kay. Last updated 4 months ago.
13.3 match 21 stars 5.97 score 111 scripts
statmanrobin
Stat2Data:Datasets for Stat2
Datasets for the textbook Stat2: Modeling with Regression and ANOVA (second edition). The package also includes data for the first edition, Stat2: Building Models for a World of Data, and a few functions for plotting diagnostics.
Maintained by Robin Lock. Last updated 6 years ago.
14.1 match 5 stars 4.94 score 544 scripts
cdalzell
Lahman:Sean 'Lahman' Baseball Database
Provides the tables from the 'Sean Lahman Baseball Database' as a set of R data.frames. It uses the data on pitching, hitting and fielding performance and other tables from 1871 through 2023, as recorded in the 2024 version of the database. Documentation examples show how many baseball questions can be investigated.
Maintained by Chris Dalzell. Last updated 4 months ago.
5.8 match 79 stars 11.98 score 1.7k scripts 2 dependents
jverzani
UsingR:Data Sets, Etc. for the Text "Using R for Introductory Statistics", Second Edition
A collection of data sets to accompany the textbook "Using R for Introductory Statistics," second edition.
Maintained by John Verzani. Last updated 3 years ago.
14.0 match 1 star 4.97 score 1.4k scripts
hadley
plyr:Tools for Splitting, Applying and Combining Data
A set of tools that solves a common set of problems: you need to break a big problem down into manageable pieces, operate on each piece and then put all the pieces back together. For example, you might want to fit a model to each spatial location or time point in your study, summarise data by panels or collapse high-dimensional arrays to simpler summary statistics. The development of 'plyr' has been generously supported by 'Becton Dickinson'.
Maintained by Hadley Wickham. Last updated 4 months ago.
3.4 match 500 stars 18.16 score 83k scripts 3.3k dependents
ggobi
GGally:Extension to 'ggplot2'
The R package 'ggplot2' is a plotting system based on the grammar of graphics. 'GGally' extends 'ggplot2' by adding several functions to reduce the complexity of combining geometric objects with transformed data. Some of these functions include a pairwise plot matrix, a two group pairwise plot matrix, a parallel coordinates plot, a survival plot, and several functions to plot networks.
Maintained by Barret Schloerke. Last updated 10 months ago.
3.4 match 597 stars 16.15 score 17k scripts 154 dependents
cran
vcd:Visualizing Categorical Data
Visualization techniques, data sets, summary and inference procedures aimed particularly at categorical data. Special emphasis is given to highly extensible grid graphics. The package was originally inspired by the book "Visualizing Categorical Data" by Michael Friendly and is now the main support package for a new book, "Discrete Data Analysis with R" by Michael Friendly and David Meyer (2015).
Maintained by David Meyer. Last updated 6 months ago.
4.5 match 5 stars 10.57 score 3.1k scripts 86 dependents
statmanrobin
Lock5Data:Datasets for "Statistics: UnLocking the Power of Data"
Datasets for the third edition of "Statistics: Unlocking the Power of Data" by Lock^5. Includes versions of datasets from earlier editions.
Maintained by Robin Lock. Last updated 4 years ago.
16.3 match 2.90 score 322 scripts
robinhankin
hyper2:The Hyperdirichlet Distribution, Mark 2
A suite of routines for the hyperdirichlet distribution and reified Bradley-Terry; supersedes the 'hyperdirichlet' package; uses 'disordR' discipline <doi:10.48550/ARXIV.2210.03856>. To cite in publications please use Hankin 2017 <doi:10.32614/rj-2017-061>, and for Generalized Plackett-Luce likelihoods use Hankin 2024 <doi:10.18637/jss.v109.i08>.
Maintained by Robin K. S. Hankin. Last updated 3 days ago.
7.5 match 5 stars 6.01 score 38 scripts 1 dependent
atahk
pscl:Political Science Computational Laboratory
Bayesian analysis of item-response theory (IRT) models, roll call analysis; computing highest density regions; maximum likelihood estimation of zero-inflated and hurdle models for count data; goodness-of-fit measures for GLMs; data sets used in writing and teaching; seats-votes curves.
Maintained by Simon Jackman. Last updated 1 year ago.
3.4 match 67 stars 13.28 score 2.7k scripts 54 dependents
matloff
qeML:Quick and Easy Machine Learning Tools
The letters 'qe' in the package title stand for "quick and easy," alluding to the convenience goal of the package. We bring together a variety of machine learning (ML) tools from standard R packages, providing wrappers with a simple, convenient, and uniform interface.
Maintained by Norm Matloff. Last updated 25 days ago.
5.3 match 41 stars 8.41 score 48 scripts 1 dependent
kwstat
corrgram:Plot a Correlogram
Calculates correlation of variables and displays the results graphically. Included panel functions can display points, shading, ellipses, and correlation values with confidence intervals. See Friendly (2002) <doi:10.1198/000313002533>.
Maintained by Kevin Wright. Last updated 8 months ago.
4.0 match 18 stars 10.93 score 1.3k scripts 3 dependents
rpruim
fastR2:Foundations and Applications of Statistics Using R (2nd Edition)
Data sets and utilities to accompany the second edition of "Foundations and Applications of Statistics: an Introduction using R" (R Pruim, published by AMS, 2017), a text covering topics from probability and mathematical statistics at an advanced undergraduate level. R is integrated throughout, and access to all the R code in the book is provided via the snippet() function.
Maintained by Randall Pruim. Last updated 1 year ago.
7.1 match 13 stars 5.85 score 108 scripts
stan-dev
rstanarm:Bayesian Applied Regression Modeling via Stan
Estimates previously compiled regression models using the 'rstan' package, which provides the R interface to the Stan C++ library for Bayesian estimation. Users specify models via the customary R syntax with a formula and data.frame plus some additional arguments for priors.
Maintained by Ben Goodrich. Last updated 9 months ago.
2.4 match 393 stars 15.68 score 5.0k scripts 13 dependents
cran
Rfit:Rank-Based Estimation for Linear Models
Rank-based (R) estimation and inference for linear models. Estimation is for general scores and a library of commonly used score functions is included.
Maintained by John Kloke. Last updated 10 months ago.
8.5 match 4.35 score 9 dependents
rudeboybert
resampledata:Data Sets for Mathematical Statistics with Resampling in R
Package of data sets from "Mathematical Statistics with Resampling in R" (1st Ed. 2011, 2nd Ed. 2018) by Laura Chihara and Tim Hesterberg.
Maintained by Albert Y. Kim. Last updated 4 months ago.
7.0 match 15 stars 5.15 score 187 scripts
trevorhastie
ISLR:Data for an Introduction to Statistical Learning with Applications in R
We provide the collection of data-sets used in the book 'An Introduction to Statistical Learning with Applications in R'.
Maintained by Trevor Hastie. Last updated 4 years ago.
4.5 match 4 stars 7.58 score 10k scripts 2 dependents
matloff
regtools:Regression and Classification Tools
Tools for linear, nonlinear and nonparametric regression and classification. Novel graphical methods for assessment of parametric models using nonparametric methods. One vs. All and All vs. All multiclass classification, optional class probabilities adjustment. Nonparametric regression (k-NN) for general dimension, local-linear option. Nonlinear regression with Eicker-White method for dealing with heteroscedasticity. Utilities for converting time series to rectangular form. Utilities for conversion between factors and indicator variables. Some code related to "Statistical Regression and Classification: from Linear Models to Machine Learning", N. Matloff, 2017, CRC, ISBN 9781498710916.
Maintained by Norm Matloff. Last updated 2 months ago.
3.5 match 127 stars 9.39 score 48 scripts 3 dependents
projectmosaic
mosaicCalc:R-Language Based Calculus Operations for Teaching
Software to support the introductory *MOSAIC Calculus* textbook, one of many data- and modeling-oriented educational resources developed by Project MOSAIC. Provides symbolic and numerical differentiation and integration, as well as support for applied linear algebra (for data science), and differential equations/dynamics. Includes grammar-of-graphics-based functions for drawing vector fields, trajectories, etc. The software is suitable for general use, but intended mainly for teaching calculus.
Maintained by Daniel Kaplan. Last updated 19 days ago.
3.8 match 13 stars 8.68 score 546 scripts
bayesball
ProbBayes:Probability and Bayesian Modeling
Functions and datasets to accompany J. Albert and J. Hu, "Probability and Bayesian Modeling", CRC Press, (2019, ISBN: 1138492566).
Maintained by Jim Albert. Last updated 4 years ago.
7.3 match 5 stars 4.30 score 80 scripts
hturner
BradleyTerry2:Bradley-Terry Models
Specify and fit the Bradley-Terry model, including structured versions in which the parameters are related to explanatory variables through a linear predictor and versions with contest-specific effects, such as a home advantage.
Maintained by Heather Turner. Last updated 6 years ago.
3.8 match 20 stars 7.97 score 172 scripts 1 dependent
heliosdrm
phia:Post-Hoc Interaction Analysis
Analysis of terms in linear, generalized and mixed linear models, on the basis of multiple comparisons of factor contrasts. Specially suited for the analysis of interaction terms.
Maintained by Helios De Rosario-Martinez. Last updated 1 year ago.
3.6 match 4 stars 8.07 score 199 scripts 4 dependents
statswithr
statsr:Companion Software for the Coursera Statistics with R Specialization
Data and functions to support Bayesian and frequentist inference and decision making for the Coursera Specialization "Statistics with R".
Maintained by Merlise Clyde. Last updated 4 years ago.
3.6 match 71 stars 7.80 score 880 scripts
mdsr-book
mdsr:Complement to 'Modern Data Science with R'
A complement to all editions of *Modern Data Science with R* (ISBN: 978-0367191498). This package contains data and code to complete exercises and reproduce examples from the text. It also facilitates connections to the SQL database server used in the book.
Maintained by Benjamin S. Baumer. Last updated 7 months ago.
3.5 match 38 stars 7.21 score 504 scripts
trevorhastie
ISLR2:Introduction to Statistical Learning, Second Edition
We provide the collection of data-sets used in the book 'An Introduction to Statistical Learning with Applications in R, Second Edition'. These include many data-sets that we used in the first edition (some with minor changes), and some new datasets.
Maintained by Trevor Hastie. Last updated 2 years ago.
4.5 match 2 stars 5.49 score 2.2k scripts
kloke
npsm:Nonparametric Statistical Methods
Accompanies the book "Nonparametric Statistical Methods Using R, 2nd Edition" by Kloke and McKean (2024, ISBN:9780367651350). Includes methods, datasets, and random number generation useful for the study of robust and/or nonparametric statistics. Emphasizes classical nonparametric methods for a variety of designs, especially one-sample and two-sample problems. Includes methods for general scores, including estimation and testing for the two-sample location problem as well as Hogg's adaptive method.
Maintained by John Kloke. Last updated 9 months ago.
7.0 match 3.47 score 59 scripts
flyaflya
causact:Fast, Easy, and Visual Bayesian Inference
Accelerate Bayesian analytics workflows in 'R' through interactive modelling, visualization, and inference. Define probabilistic graphical models using directed acyclic graphs (DAGs) as a unifying language for business stakeholders, statisticians, and programmers. This package relies on interfacing with the 'numpyro' python package.
Maintained by Adam Fleischhacker. Last updated 2 months ago.
3.4 match 45 stars 7.15 score 52 scripts
wch
gcookbook:Data for "R Graphics Cookbook"
Data sets used in the book "R Graphics Cookbook" by Winston Chang, published by O'Reilly Media.
Maintained by Winston Chang. Last updated 6 years ago.
3.4 match 10 stars 6.77 score 1.3k scripts 1 dependent
lightbluetitan
usdatasets:A Comprehensive Collection of U.S. Datasets
Provides a diverse collection of U.S. datasets encompassing various fields such as crime, economics, education, finance, energy, healthcare, and more. It serves as a valuable resource for researchers and analysts seeking to perform in-depth analyses and derive insights from U.S.-specific data.
Maintained by Renzo Caceres Rossi. Last updated 5 months ago.
3.4 match 7 stars 5.99 score 141 scripts
schochastics
networkdata:Repository of Network Datasets
The package contains a large collection of network datasets from different contexts, including social networks, animal networks and movie networks. All datasets are in 'igraph' format.
Maintained by David Schoch. Last updated 12 months ago.
4.0 match 143 stars 5.01 score 143 scripts
beanumber
tidychangepoint:A Tidy Framework for Changepoint Detection Analysis
Changepoint detection algorithms for R are widespread but have different interfaces and reporting conventions. This makes the comparative analysis of results difficult. We solve this problem by providing a tidy, unified interface for several different changepoint detection algorithms. We also provide consistent numerical and graphical reporting leveraging the 'broom' and 'ggplot2' packages.
Maintained by Benjamin S. Baumer. Last updated 1 month ago.
3.6 match 2 stars 5.30 score 8 scripts
mlr-org
mlr3data:Collection of Machine Learning Data Sets for 'mlr3'
A small collection of interesting and educational machine learning data sets which are used as examples in the 'mlr3' book, the use case gallery, or in other examples. All data sets are properly preprocessed and ready to be analyzed by most machine learning algorithms. Data sets are automatically added to the dictionary of tasks if 'mlr3' is loaded.
Maintained by Marc Becker. Last updated 4 months ago.
3.5 match 2 stars 5.28 score 18 scripts 2 dependents
mariarizzo
RbyExample:Data for the Book "R by Example"
Data for the examples and exercises in the book "R by Example" by Jim Albert and Maria Rizzo (2012, ISBN 978-1-4614-1365-3).
Maintained by Maria Rizzo. Last updated 8 months ago.
3.8 match 2 stars 3.94 score 22 scripts
sportsdataverse
sportyR:Plot Scaled 'ggplot' Representations of Sports Playing Surfaces
Create scaled 'ggplot' representations of playing surfaces. Playing surfaces are drawn pursuant to rule-book specifications. This package should be used as a baseline plot for displaying any type of tracking data.
Maintained by Ross Drucker. Last updated 1 month ago.
1.7 match 104 stars 8.08 score 97 scripts
wjbraun
MPV:Data Sets from Montgomery, Peck and Vining
Most of this package consists of data sets from the textbook Introduction to Linear Regression Analysis, by Montgomery, Peck and Vining. All data sets from the 3rd edition are included and many from the 6th edition are also included. The package also contains some additional data sets and functions.
Maintained by W.J. Braun. Last updated 7 months ago.
3.8 match 3.36 score 193 scripts 1 dependent
pdwaggoner
mlbstats:Major League Baseball Player Statistics Calculator
Computational functions for player metrics in major league baseball including batting, pitching, fielding, base-running, and overall player statistics. This package is actively maintained with new metrics being added as they are developed.
Maintained by Philip D. Waggoner. Last updated 6 months ago.
3.4 match 3.36 score 46 scripts
timhesterberg
resampledata3:Data Sets for "Mathematical Statistics with Resampling and R" (3rd Ed)
Data sets for Chihara and Hesterberg (2022, ISBN: 978-1-119-87404-1) "Mathematical Statistics with Resampling and R" (3rd Ed).
Maintained by Tim Hesterberg. Last updated 3 years ago.
7.5 match 1.52 score 33 scripts
hanmingwu1103
dataSDA:Data Sets for Symbolic Data Analysis
Collects a diverse range of symbolic data and offers a comprehensive set of functions that facilitate the conversion of traditional data into the symbolic data format.
Maintained by Han-Ming Wu. Last updated 2 years ago.
3.8 match 2.70 score 2 scripts
cran
SDAResources:Datasets and Functions for 'Sampling: Design and Analysis, 3rd Edition'
Includes all the datasets of 'Sampling: Design and Analysis' (3rd edition by Sharon Lohr) in R format and additional functions for analyzing and graphing probability samples.
Maintained by Yan Lu. Last updated 3 years ago.
4.5 match 2.00 score
maiermarco
prefmod:Utilities to Fit Paired Comparison Models for Preferences
Generates design matrix for analysing real paired comparisons and derived paired comparison data (Likert type items/ratings or rankings) using a loglinear approach. Fits loglinear Bradley-Terry model (LLBT) exploiting an eliminate feature. Computes pattern models for paired comparisons, rankings, and ratings. Some treatment of missing values (MCAR and MNAR). Fits latent class (mixture) models for paired comparison, rating and ranking patterns using a non-parametric ML approach.
Maintained by Marco Johannes Maier. Last updated 1 year ago.
3.6 match 2.41 score 53 scripts 1 dependent
cran
freqparcoord:Novel Methods for Parallel Coordinates
New approaches to parallel coordinates plots for multivariate data visualization, including applications to clustering, outlier hunting and regression diagnostics. Includes general functions for multivariate nonparametric density and regression estimation, using parallel computation.
Maintained by Norm Matloff. Last updated 9 years ago.
3.5 match 1.78 score 2 dependents
nateybear
probstats4econ:Companion Package to Probability and Statistics for Economics and Business
Utilities for multiple hypothesis testing, companion datasets from "Probability and Statistics for Economics and Business: An Introduction Using R" by Jason Abrevaya (MIT Press, under contract).
Maintained by Nathan Gardner Hattersley. Last updated 7 months ago.
4.0 match 1.30 score
fpzhang2015
cthreshER:Continuous Threshold Expectile Regression
Estimation and inference methods for continuous threshold expectile regression. The package can fit the model and test for the existence of a change point, as described in the paper "Feipeng Zhang and Qunhua Li (2016). A continuous threshold expectile regression, submitted."
Maintained by Feipeng Zhang. Last updated 8 years ago.
3.8 match 1 star 1.00 score 4 scripts
marcoblume
Odds Data from Pinnacle
Market odds from Pinnacle, an online sports betting bookmaker. Included are datasets for the Major League Baseball (MLB) 2016 season and the USA election 2016. These datasets can be used to build models and compare statistical information with the information from prediction markets. The MLB 2016 dataset can be used for sabermetric analysis and in conjunction with other popular MLB datasets, such as Retrosheet or the Lahman package, by merging on GameID.
Maintained by Marco Blume. Last updated 8 years ago.
0.9 match 12 stars 3.78 score 7 scripts