dplyr: A Grammar of Data Manipulation
A fast, consistent tool for working with data frame like objects, both in memory and out of memory.
Maintained by Hadley Wickham. Last updated 25 days ago.
4.8k stars 24.68 score 659k scripts 7.8k dependents
tidyverse
tidyr: Tidy Messy Data
Tools to help to create tidy data, where each column is a variable, each row is an observation, and each cell contains a single value. 'tidyr' contains tools for changing the shape (pivoting) and hierarchy (nesting and 'unnesting') of a dataset, turning deeply nested lists into rectangular data frames ('rectangling'), and extracting values out of string columns. It also includes tools for working with missing values (both implicit and explicit).
Maintained by Hadley Wickham. Last updated 25 days ago.
1.4k stars 22.88 score 168k scripts 5.5k dependents
apache
arrow: Integration to 'Apache' 'Arrow'
'Apache' 'Arrow' is a cross-language development platform for in-memory data. It specifies a standardized language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern hardware. This package provides an interface to the 'Arrow C++' library.
Maintained by Jonathan Keane. Last updated 2 months ago.
15k stars 19.25 score 10k scripts 82 dependents
rstudio
gt: Easily Create Presentation-Ready Display Tables
Build display tables from tabular data with an easy-to-use set of functions. With its progressive approach, we can construct display tables with a cohesive set of table parts. Table values can be formatted using any of the included formatting functions. Footnotes and cell styles can be precisely added through a location targeting system. The way in which 'gt' handles things for you means that you don't often have to worry about the fine details.
Maintained by Richard Iannone. Last updated 23 days ago.
2.1k stars 18.36 score 20k scripts 112 dependents
r-lib
tidyselect: Select from a Set of Strings
A backend for the selecting functions of the 'tidyverse'. It makes it easy to implement select-like functions in your own packages in a way that is consistent with other 'tidyverse' interfaces for selection.
Maintained by Lionel Henry. Last updated 4 months ago.
130 stars 18.31 score 1.9k scripts 8.2k dependents
tidyverse
vroom: Read and Write Rectangular Text Data Quickly
The goal of 'vroom' is to read and write data (like 'csv', 'tsv' and 'fwf') quickly. When reading it uses a quick initial indexing step, then reads the values lazily, so only the data you actually use needs to be read. The writer formats the data in parallel and writes to disk asynchronously from formatting.
Maintained by Jennifer Bryan. Last updated 7 months ago.
625 stars 17.82 score 4.5k scripts 2.1k dependents
ddsjoberg
gtsummary: Presentation-Ready Data Summary and Analytic Result Tables
Creates presentation-ready tables summarizing data sets, regression models, and more. The code to create the tables is concise and highly customizable. Data frames can be summarized with any function, e.g. mean(), median(), even user-written functions. Regression models are summarized and include the reference rows for categorical variables. Common regression models, such as logistic regression and Cox proportional hazards regression, are automatically identified and the tables are pre-filled with appropriate column headers.
Maintained by Daniel D. Sjoberg. Last updated 3 days ago.
1.1k stars 17.02 score 8.2k scripts 15 dependents
ropensci
skimr: Compact and Flexible Summaries of Data
A simple to use summary function that can be used with pipes and displays nicely in the console. The default summary statistics may be modified by the user as can the default formatting. Support for data frames and vectors is included, and users can implement their own skim methods for specific object types as described in a vignette. Default summaries include support for inline spark graphs. Instructions for managing these on specific operating systems are given in the "Using skimr" vignette and the README.
Maintained by Elin Waring. Last updated 2 months ago.
1.1k stars 16.80 score 18k scripts 14 dependents
ropensci
targets: Dynamic Function-Oriented 'Make'-Like Declarative Pipelines
Pipeline tools coordinate the pieces of computationally demanding analysis projects. The 'targets' package is a 'Make'-like pipeline tool for statistics and data science in R. The package skips costly runtime for tasks that are already up to date, orchestrates the necessary computation with implicit parallel computing, and abstracts files as R objects. If all the current output matches the current upstream code and data, then the whole pipeline is up to date, and the results are more trustworthy than otherwise. The methodology in this package borrows from GNU 'Make' (2015, ISBN:978-9881443519) and 'drake' (2018, <doi:10.21105/joss.00550>).
Maintained by William Michael Landau. Last updated 1 hour ago.
978 stars 15.16 score 4.6k scripts 22 dependents
thomasp85
tidygraph: A Tidy API for Graph Manipulation
A graph, while not "tidy" in itself, can be thought of as two tidy data frames describing node and edge data respectively. 'tidygraph' provides an approach to manipulate these two virtual data frames using the API defined in the 'dplyr' package, and also provides tidy interfaces to many common graph algorithms.
Maintained by Thomas Lin Pedersen. Last updated 2 months ago.
553 stars 14.74 score 4.6k scripts 136 dependents
ropensci
drake: A Pipeline Toolkit for Reproducible Computation at Scale
A general-purpose computational engine for data analysis, drake rebuilds intermediate data objects when their dependencies change, and it skips work when the results are already up to date. Not every execution starts from scratch; there is native support for parallel and distributed computing, and completed projects have tangible evidence that they are reproducible. Extensive documentation, from beginner-friendly tutorials to practical examples and more, is available at the reference website and the online manual.
Maintained by William Michael Landau. Last updated 4 months ago.
1.3k stars 11.49 score 1.7k scripts 1 dependent
larmarange
broom.helpers: Helpers for Model Coefficients Tibbles
Provides a suite of functions to work with regression model 'broom::tidy()' tibbles. The suite includes functions to group regression model terms by variable, insert reference and header rows for categorical variables, add variable labels, and more.
Maintained by Joseph Larmarange. Last updated 22 days ago.
22 stars 11.45 score 165 scripts 2 dependents
jthomasmock
gtExtras: Extending 'gt' for Beautiful HTML Tables
Provides additional functions for creating beautiful tables with 'gt'. The functions are generally wrappers around boilerplate, or they add opinionated niche capabilities and helper functions.
Maintained by Thomas Mock. Last updated 12 months ago.
199 stars 11.45 score 2.4k scripts 3 dependents
insightsengineering
cards: Analysis Results Data
Construct CDISC (Clinical Data Interchange Standards Consortium) compliant Analysis Results Data objects. These objects are used and re-used to construct summary tables, visualizations, and written reports. The package also exports utilities for working with these objects and creating new Analysis Results Data objects.
Maintained by Daniel D. Sjoberg. Last updated 27 days ago.
39 stars 11.41 score 100 scripts 20 dependents
ropensci
tarchetypes: Archetypes for Targets
Function-oriented Make-like declarative pipelines for statistics and data science are supported in the 'targets' R package. As an extension to 'targets', the 'tarchetypes' package provides convenient user-side functions to make 'targets' easier to use. By establishing reusable archetypes for common kinds of targets and pipelines, these functions help express complicated reproducible pipelines concisely and compactly. The methods in this package were influenced by the 'drake' R package by Will Landau (2018) <doi:10.21105/joss.00550>.
Maintained by William Michael Landau. Last updated 5 hours ago.
142 stars 11.27 score 1.7k scripts 10 dependents
wlandau
crew: A Distributed Worker Launcher Framework
In computationally demanding analysis projects, statisticians and data scientists asynchronously deploy long-running tasks to distributed systems, ranging from traditional clusters to cloud services. The 'NNG'-powered 'mirai' R package by Gao (2023) <doi:10.5281/zenodo.7912722> is a sleek and sophisticated scheduler that efficiently processes these intense workloads. The 'crew' package extends 'mirai' with a unifying interface for third-party worker launchers. Inspiration also comes from the packages 'future' by Bengtsson (2021) <doi:10.32614/RJ-2021-048>, 'rrq' by FitzJohn and Ashton (2023), 'clustermq' by Schubert (2019) <doi:10.1093/bioinformatics/btz284>, and 'batchtools' by Lang, Bischl, and Surmann (2017) <doi:10.21105/joss.00135>.
Maintained by William Michael Landau. Last updated 1 day ago.
136 stars 11.13 score 243 scripts 2 dependents
ipums
ipumsr: An R Interface for Downloading, Reading, and Handling IPUMS Data
An easy way to work with census, survey, and geographic data provided by IPUMS in R. Generate and download data through the IPUMS API and load IPUMS files into R with their associated metadata to make analysis easier. IPUMS data describing 1.4 billion individuals drawn from over 750 censuses and surveys is available free of charge from the IPUMS website.
Maintained by Derek Burk. Last updated 1 month ago.
30 stars 11.05 score 720 scripts 2 dependents
grunwaldlab
metacoder: Tools for Parsing, Manipulating, and Graphing Taxonomic Abundance Data
Reads, plots, and manipulates large taxonomic data sets, like those generated from modern high-throughput sequencing, such as metabarcoding (i.e. amplification metagenomics, 16S metagenomics, etc). It provides a tree-based visualization called "heat trees" used to depict statistics for every taxon in a taxonomy using color and size. It also provides various functions to do common tasks in microbiome bioinformatics on data in the 'taxmap' format defined by the 'taxa' package. The 'metacoder' package is described in the publication by Foster et al. (2017) <doi:10.1371/journal.pcbi.1005404>.
Maintained by Zachary Foster. Last updated 2 months ago.
140 stars 9.64 score 328 scripts
rstudio
tfdatasets: Interface to 'TensorFlow' Datasets
Interface to 'TensorFlow' Datasets, a high-level library for building complex input pipelines from simple, re-usable pieces. See <> for additional details.
Maintained by Tomasz Kalinowski. Last updated 16 days ago.
34 stars 9.32 score 656 scripts 3 dependents
insightsengineering
cardx: Extra Analysis Results Data Utilities
Create extra Analysis Results Data (ARD) summary objects. The package supplements the simple ARD functions from the 'cards' package, exporting functions to put statistical results in the ARD format. These objects are used and re-used to construct summary tables, visualizations, and written reports.
Maintained by Daniel D. Sjoberg. Last updated 1 month ago.
19 stars 8.99 score 50 scripts 1 dependent
rstudio
tfestimators: Interface to 'TensorFlow' Estimators
Interface to 'TensorFlow' Estimators, a high-level API that provides implementations of many different model types including linear models and deep neural networks.
Maintained by Tomasz Kalinowski. Last updated 3 years ago.
57 stars 8.42 score 170 scripts
shannonpileggi
gtreg: Regulatory Tables for Clinical Research
Creates tables suitable for regulatory agency submission by leveraging the 'gtsummary' package as the back end. Tables can be exported to HTML, Word, PDF and more. Highly customized outputs are available by utilizing existing styling functions from 'gtsummary' as well as custom options designed for regulatory tables.
Maintained by Shannon Pileggi. Last updated 1 month ago.
37 stars 6.92 score 30 scripts
ropensci
taxa: Classes for Storing and Manipulating Taxonomic Data
Provides classes for storing and manipulating taxonomic data. Most of the classes can be treated like base R vectors (e.g. can be used in tables as columns and can be named). Vectorized classes can store taxon names and authorities, taxon IDs from databases, taxon ranks, and other types of information. More complex classes are provided to store taxonomic trees and user-defined data associated with them.
Maintained by Zachary Foster. Last updated 1 year ago.
47 stars 6.79 score 217 scripts
christopherkenny
name: Tools for Working with Names
A system for organizing column names in data. Aimed at supporting a prefix-based and suffix-based column naming scheme. Extends 'dplyr' functionality to add ordering by function and more explicit renaming.
Maintained by Christopher T. Kenny. Last updated 3 years ago.
2 stars 6.28 score 19k scripts
asardaes
'data.table' Expressions with Data Manipulation Verbs
A specialization of 'dplyr' data manipulation verbs that parse and build expressions which are ultimately evaluated by 'data.table', letting it handle all optimizations. A set of additional verbs is also provided to facilitate some common operations on a subset of the data.
Maintained by Alexis Sarda-Espinosa. Last updated 2 years ago.
65 stars 5.81 score 8 scripts
socialresearchcentre
testdat: Data Unit Testing for R
Test your data! An extension of the 'testthat' unit testing framework with a family of functions and reporting tools for checking and validating data frames.
Maintained by Danny Smith. Last updated 10 months ago.
8 stars 5.78 score 50 scripts
r-causal
halfmoon: Techniques to Build Better Balance
Build better balance in causal inference models. 'halfmoon' helps you assess propensity score models for balance between groups using metrics like standardized mean differences and visualization techniques like mirrored histograms. 'halfmoon' supports both weighting and matching techniques.
Maintained by Malcolm Barrett. Last updated 28 days ago.
17 stars 5.39 score 72 scripts
malcolmbarrett
metaconfoundr: Visualize 'Confounder' Control in Meta-Analyses
Visualize 'confounder' control in meta-analysis. 'metaconfoundr' is an approach to evaluating bias in studies used in meta-analyses based on the causal inference framework. Study groups create a causal diagram displaying their assumptions about the scientific question. From this, they develop a list of important 'confounders'. Then, they evaluate whether studies controlled for these variables well. 'metaconfoundr' is a toolkit to facilitate this process and visualize the results as heat maps, traffic light plots, and more.
Maintained by Malcolm Barrett. Last updated 2 years ago.
11 stars 5.04 score 9 scripts
armcn
quickcheck: Property Based Testing
Property based testing, inspired by the original 'QuickCheck'. This package builds on the property based testing framework provided by 'hedgehog' and is designed to seamlessly integrate with 'testthat'.
Maintained by Andrew McNeil. Last updated 1 year ago.
25 stars 4.94 score 70 scripts
forestgeo
fgeo.tool: Import and Manipulate 'ForestGEO' Data
To help you access, transform, analyze, and visualize 'ForestGEO' data, we developed a collection of R packages (<>). This package, in particular, helps you to easily import, filter, and modify 'ForestGEO' data. To learn more about 'ForestGEO' visit <>.
Maintained by Mauro Lepore. Last updated 3 years ago.
2 stars 4.86 score 27 scripts 3 dependents
r-causal
tidysmd: Tidy Standardized Mean Differences
Tidy standardized mean differences ('SMDs'). 'tidysmd' uses the 'smd' package to calculate standardized mean differences for variables in a data frame, returning the results in a tidy format.
Maintained by Malcolm Barrett. Last updated 2 months ago.
9 stars 4.36 score 17 scripts 1 dependent
christopherkenny
divseg: Calculate Diversity and Segregation Indices
Implements common measures of diversity and spatial segregation. This package has tools to compute the majority of the measures reviewed in Massey and Denton (1988) <doi:10.2307/2579183>. Multiple common measures of within-geography diversity are implemented as well. All functions operate on data frames with a 'tidyselect' based workflow.
Maintained by Christopher T. Kenny. Last updated 10 months ago.
1 star 2.78 score 12 scripts
efinite
utile.tables: Build Tables for Publication
Functions for building customized ready-to-export tables for publication.
Maintained by Eric Finnesgard. Last updated 2 years ago.
1 star 2.70 score 2 scripts