R-universe search: exports:one

tidyverse

dplyr:A Grammar of Data Manipulation

A fast, consistent tool for working with data frame like objects, both in memory and out of memory.

Maintained by Hadley Wickham. Last updated 25 days ago.

data-manipulation grammar cpp

4.8k stars 24.68 score 659k scripts 7.8k dependents

tidyverse

tidyr:Tidy Messy Data

Tools to help to create tidy data, where each column is a variable, each row is an observation, and each cell contains a single value. 'tidyr' contains tools for changing the shape (pivoting) and hierarchy (nesting and 'unnesting') of a dataset, turning deeply nested lists into rectangular data frames ('rectangling'), and extracting values out of string columns. It also includes tools for working with missing values (both implicit and explicit).

Maintained by Hadley Wickham. Last updated 25 days ago.

tidy-data cpp

1.4k stars 22.88 score 168k scripts 5.5k dependents

apache

arrow:Integration to 'Apache' 'Arrow'

'Apache' 'Arrow' <https://arrow.apache.org/> is a cross-language development platform for in-memory data. It specifies a standardized language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern hardware. This package provides an interface to the 'Arrow C++' library.

Maintained by Jonathan Keane. Last updated 2 months ago.

arrow curl openssl cpp

15k stars 19.25 score 10k scripts 82 dependents

rstudio

gt:Easily Create Presentation-Ready Display Tables

Build display tables from tabular data with an easy-to-use set of functions. With its progressive approach, we can construct display tables with a cohesive set of table parts. Table values can be formatted using any of the included formatting functions. Footnotes and cell styles can be precisely added through a location targeting system. The way in which 'gt' handles things for you means that you don't often have to worry about the fine details.

Maintained by Richard Iannone. Last updated 23 days ago.

docx easy-to-use html latex rtf summary-tables

2.1k stars 18.36 score 20k scripts 112 dependents

r-lib

tidyselect:Select from a Set of Strings

A backend for the selecting functions of the 'tidyverse'. It makes it easy to implement select-like functions in your own packages in a way that is consistent with other 'tidyverse' interfaces for selection.

Maintained by Lionel Henry. Last updated 4 months ago.

130 stars 18.31 score 1.9k scripts 8.2k dependents

tidyverse

vroom:Read and Write Rectangular Text Data Quickly

The goal of 'vroom' is to read and write data (like 'csv', 'tsv' and 'fwf') quickly. When reading, it uses a quick initial indexing step, then reads the values lazily, so only the data you actually use needs to be read. The writer formats the data in parallel and writes to disk asynchronously from formatting.

Maintained by Jennifer Bryan. Last updated 7 months ago.

csv csv-parser fixed-width-text tsv tsv-parser cpp

625 stars 17.82 score 4.5k scripts 2.1k dependents

ddsjoberg

gtsummary:Presentation-Ready Data Summary and Analytic Result Tables

Creates presentation-ready tables summarizing data sets, regression models, and more. The code to create the tables is concise and highly customizable. Data frames can be summarized with any function, e.g. mean(), median(), even user-written functions. Regression models are summarized and include the reference rows for categorical variables. Common regression models, such as logistic regression and Cox proportional hazards regression, are automatically identified and the tables are pre-filled with appropriate column headers.

Maintained by Daniel D. Sjoberg. Last updated 3 days ago.

easy-to-use gt html5 regression-models reproducibility reproducible-research statistics summary-statistics summary-tables table1 tableone

1.1k stars 17.02 score 8.2k scripts 15 dependents

ropensci

skimr:Compact and Flexible Summaries of Data

A simple to use summary function that can be used with pipes and displays nicely in the console. The default summary statistics may be modified by the user as can the default formatting. Support for data frames and vectors is included, and users can implement their own skim methods for specific object types as described in a vignette. Default summaries include support for inline spark graphs. Instructions for managing these on specific operating systems are given in the "Using skimr" vignette and the README.

Maintained by Elin Waring. Last updated 2 months ago.

peer-reviewed ropensci summary-statistics unconf unconf17

1.1k stars 16.80 score 18k scripts 14 dependents

ropensci

targets:Dynamic Function-Oriented 'Make'-Like Declarative Pipelines

Pipeline tools coordinate the pieces of computationally demanding analysis projects. The 'targets' package is a 'Make'-like pipeline tool for statistics and data science in R. The package skips costly runtime for tasks that are already up to date, orchestrates the necessary computation with implicit parallel computing, and abstracts files as R objects. If all the current output matches the current upstream code and data, then the whole pipeline is up to date, and the results are more trustworthy than otherwise. The methodology in this package borrows from GNU 'Make' (2015, ISBN:978-9881443519) and 'drake' (2018, <doi:10.21105/joss.00550>).

Maintained by William Michael Landau. Last updated 1 hour ago.

data-science high-performance-computing make peer-reviewed pipeline r-targetopia reproducibility reproducible-research targets workflow

978 stars 15.16 score 4.6k scripts 22 dependents

thomasp85

tidygraph:A Tidy API for Graph Manipulation

A graph, while not "tidy" in itself, can be thought of as two tidy data frames describing node and edge data respectively. 'tidygraph' provides an approach to manipulate these two virtual data frames using the API defined in the 'dplyr' package, and also provides tidy interfaces to a lot of common graph algorithms.

Maintained by Thomas Lin Pedersen. Last updated 2 months ago.

graph-algorithms graph-manipulation igraph network-analysis tidyverse cpp

553 stars 14.74 score 4.6k scripts 136 dependents

ropensci

drake:A Pipeline Toolkit for Reproducible Computation at Scale

A general-purpose computational engine for data analysis, drake rebuilds intermediate data objects when their dependencies change, and it skips work when the results are already up to date. Not every execution starts from scratch; there is native support for parallel and distributed computing, and completed projects have tangible evidence that they are reproducible. Extensive documentation, from beginner-friendly tutorials to practical examples and more, is available at the reference website <https://docs.ropensci.org/drake/> and the online manual <https://books.ropensci.org/drake/>.

Maintained by William Michael Landau. Last updated 4 months ago.

data-science drake high-performance-computing makefile peer-reviewed pipeline reproducibility reproducible-research ropensci workflow

1.3k stars 11.49 score 1.7k scripts 1 dependent

larmarange

broom.helpers:Helpers for Model Coefficients Tibbles

Provides a suite of functions to work with regression model 'broom::tidy()' tibbles. The suite includes functions to group regression model terms by variable, insert reference and header rows for categorical variables, add variable labels, and more.

Maintained by Joseph Larmarange. Last updated 22 days ago.

22 stars 11.45 score 165 scripts 2 dependents

jthomasmock

gtExtras:Extending 'gt' for Beautiful HTML Tables

Provides additional functions for creating beautiful tables with 'gt'. The functions are generally wrappers around boilerplate or add opinionated niche capabilities and helper functions.

Maintained by Thomas Mock. Last updated 12 months ago.

data-science data-visualization datascience ggplot2 gt plots sparkline sparkline-graphs sparklines tables

199 stars 11.45 score 2.4k scripts 3 dependents

insightsengineering

cards:Analysis Results Data

Construct CDISC (Clinical Data Interchange Standards Consortium) compliant Analysis Results Data objects. These objects are used and re-used to construct summary tables, visualizations, and written reports. The package also exports utilities for working with these objects and creating new Analysis Results Data objects.

Maintained by Daniel D. Sjoberg. Last updated 27 days ago.

analysis cdisc dataset

39 stars 11.41 score 100 scripts 20 dependents

ropensci

tarchetypes:Archetypes for Targets

Function-oriented Make-like declarative pipelines for statistics and data science are supported in the 'targets' R package. As an extension to 'targets', the 'tarchetypes' package provides convenient user-side functions to make 'targets' easier to use. By establishing reusable archetypes for common kinds of targets and pipelines, these functions help express complicated reproducible pipelines concisely and compactly. The methods in this package were influenced by the 'targets' R package by Will Landau (2018) <doi:10.21105/joss.00550>.

Maintained by William Michael Landau. Last updated 5 hours ago.

data-science high-performance-computing peer-reviewed pipeline r-targetopia reproducibility targets workflow

142 stars 11.27 score 1.7k scripts 10 dependents

wlandau

crew:A Distributed Worker Launcher Framework

In computationally demanding analysis projects, statisticians and data scientists asynchronously deploy long-running tasks to distributed systems, ranging from traditional clusters to cloud services. The 'NNG'-powered 'mirai' R package by Gao (2023) <doi:10.5281/zenodo.7912722> is a sleek and sophisticated scheduler that efficiently processes these intense workloads. The 'crew' package extends 'mirai' with a unifying interface for third-party worker launchers. Inspiration also comes from the packages 'future' by Bengtsson (2021) <doi:10.32614/RJ-2021-048>, 'rrq' by FitzJohn and Ashton (2023) <https://github.com/mrc-ide/rrq>, 'clustermq' by Schubert (2019) <doi:10.1093/bioinformatics/btz284>, and 'batchtools' by Lang, Bischl, and Surmann (2017) <doi:10.21105/joss.00135>.

Maintained by William Michael Landau. Last updated 1 day ago.

high-performance-computing

136 stars 11.13 score 243 scripts 2 dependents

ipums

ipumsr:An R Interface for Downloading, Reading, and Handling IPUMS Data

An easy way to work with census, survey, and geographic data provided by IPUMS in R. Generate and download data through the IPUMS API and load IPUMS files into R with their associated metadata to make analysis easier. IPUMS data describing 1.4 billion individuals drawn from over 750 censuses and surveys is available free of charge from the IPUMS website <https://www.ipums.org>.

Maintained by Derek Burk. Last updated 1 month ago.

30 stars 11.05 score 720 scripts 2 dependents

grunwaldlab

metacoder:Tools for Parsing, Manipulating, and Graphing Taxonomic Abundance Data

Reads, plots, and manipulates large taxonomic data sets, like those generated from modern high-throughput sequencing, such as metabarcoding (i.e. amplification metagenomics, 16S metagenomics, etc). It provides a tree-based visualization called "heat trees" used to depict statistics for every taxon in a taxonomy using color and size. It also provides various functions to do common tasks in microbiome bioinformatics on data in the 'taxmap' format defined by the 'taxa' package. The 'metacoder' package is described in the publication by Foster et al. (2017) <doi:10.1371/journal.pcbi.1005404>.

Maintained by Zachary Foster. Last updated 2 months ago.

community-diversity hierarchical metabarcoding pcr taxonomy trees cpp

140 stars 9.64 score 328 scripts

nepem-ufsc

metan:Multi Environment Trials Analysis

Performs stability analysis of multi-environment trial data using parametric and non-parametric methods. Parametric methods include Additive Main Effects and Multiplicative Interaction (AMMI) analysis by Gauch (2013) <doi:10.2135/cropsci2013.04.0241>, Ecovalence by Wricke (1965), Genotype plus Genotype-Environment (GGE) biplot analysis by Yan & Kang (2003) <doi:10.1201/9781420040371>, geometric adaptability index by Mohammadi & Amri (2008) <doi:10.1007/s10681-007-9600-6>, joint regression analysis by Eberhart & Russell (1966) <doi:10.2135/cropsci1966.0011183X000600010011x>, genotypic confidence index by Annicchiarico (1992), Murakami & Cruz's (2004) method, power law residuals (POLAR) statistics by Doring et al. (2015) <doi:10.1016/j.fcr.2015.08.005>, scale-adjusted coefficient of variation by Doring & Reckling (2018) <doi:10.1016/j.eja.2018.06.007>, stability variance by Shukla (1972) <doi:10.1038/hdy.1972.87>, weighted average of absolute scores by Olivoto et al. (2019a) <doi:10.2134/agronj2019.03.0220>, and multi-trait stability index by Olivoto et al. (2019b) <doi:10.2134/agronj2019.03.0221>. Non-parametric methods include superiority index by Lin & Binns (1988) <doi:10.4141/cjps88-018>, nonparametric measures of phenotypic stability by Huehn (1990) <doi:10.1007/BF00024241>, and TOP third statistic by Fox et al. (1990) <doi:10.1007/BF00040364>. Functions for computing biometrical analysis such as path analysis, canonical correlation, partial correlation, clustering analysis, and tools for inspecting, manipulating, summarizing and plotting typical multi-environment trial data are also provided.

Maintained by Tiago Olivoto. Last updated 21 days ago.

2 stars 9.48 score 1.3k scripts 2 dependents

rstudio

tfdatasets:Interface to 'TensorFlow' Datasets

Interface to 'TensorFlow' Datasets, a high-level library for building complex input pipelines from simple, re-usable pieces. See <https://www.tensorflow.org/guide> for additional details.

Maintained by Tomasz Kalinowski. Last updated 16 days ago.

34 stars 9.32 score 656 scripts 3 dependents

insightsengineering

cardx:Extra Analysis Results Data Utilities

Create extra Analysis Results Data (ARD) summary objects. The package supplements the simple ARD functions from the 'cards' package, exporting functions to put statistical results in the ARD format. These objects are used and re-used to construct summary tables, visualizations, and written reports.

Maintained by Daniel D. Sjoberg. Last updated 1 month ago.

19 stars 8.99 score 50 scripts 1 dependent

rstudio

tfestimators:Interface to 'TensorFlow' Estimators

Interface to 'TensorFlow' Estimators <https://www.tensorflow.org/guide/estimator>, a high-level API that provides implementations of many different model types including linear models and deep neural networks.

Maintained by Tomasz Kalinowski. Last updated 3 years ago.

57 stars 8.42 score 170 scripts

shannonpileggi

gtreg:Regulatory Tables for Clinical Research

Creates tables suitable for regulatory agency submission by leveraging the 'gtsummary' package as the back end. Tables can be exported to HTML, Word, PDF and more. Highly customized outputs are available by utilizing existing styling functions from 'gtsummary' as well as custom options designed for regulatory tables.

Maintained by Shannon Pileggi. Last updated 1 month ago.

37 stars 6.92 score 30 scripts

ropensci

taxa:Classes for Storing and Manipulating Taxonomic Data

Provides classes for storing and manipulating taxonomic data. Most of the classes can be treated like base R vectors (e.g. can be used in tables as columns and can be named). Vectorized classes can store taxon names and authorities, taxon IDs from databases, taxon ranks, and other types of information. More complex classes are provided to store taxonomic trees and user-defined data associated with them.

Maintained by Zachary Foster. Last updated 1 year ago.

taxonomy biology hierarchy data-cleaning taxon

47 stars 6.79 score 217 scripts

christopherkenny

name:Tools for Working with Names

A system for organizing column names in data. Aimed at supporting a prefix-based and suffix-based column naming scheme. Extends 'dplyr' functionality to add ordering by function and more explicit renaming.

Maintained by Christopher T. Kenny. Last updated 3 years ago.

2 stars 6.28 score 19k scripts

asardaes

table.express:Build 'data.table' Expressions with Data Manipulation Verbs

A specialization of 'dplyr' data manipulation verbs that parse and build expressions which are ultimately evaluated by 'data.table', letting it handle all optimizations. A set of additional verbs is also provided to facilitate some common operations on a subset of the data.

Maintained by Alexis Sarda-Espinosa. Last updated 2 years ago.

65 stars 5.81 score 8 scripts

socialresearchcentre

testdat:Data Unit Testing for R

Test your data! An extension of the 'testthat' unit testing framework with a family of functions and reporting tools for checking and validating data frames.

Maintained by Danny Smith. Last updated 10 months ago.

8 stars 5.78 score 50 scripts

r-causal

halfmoon:Techniques to Build Better Balance

Build better balance in causal inference models. 'halfmoon' helps you assess propensity score models for balance between groups using metrics like standardized mean differences and visualization techniques like mirrored histograms. 'halfmoon' supports both weighting and matching techniques.

Maintained by Malcolm Barrett. Last updated 28 days ago.

17 stars 5.39 score 72 scripts

malcolmbarrett

metaconfoundr:Visualize 'Confounder' Control in Meta-Analyses

Visualize 'confounder' control in meta-analysis. 'metaconfoundr' is an approach to evaluating bias in studies used in meta-analyses based on the causal inference framework. Study groups create a causal diagram displaying their assumptions about the scientific question. From this, they develop a list of important 'confounders'. Then, they evaluate whether studies controlled for these variables well. 'metaconfoundr' is a toolkit to facilitate this process and visualize the results as heat maps, traffic light plots, and more.

Maintained by Malcolm Barrett. Last updated 2 years ago.

11 stars 5.04 score 9 scripts

armcn

quickcheck:Property Based Testing

Property based testing, inspired by the original 'QuickCheck'. This package builds on the property based testing framework provided by 'hedgehog' and is designed to seamlessly integrate with 'testthat'.

Maintained by Andrew McNeil. Last updated 1 year ago.

functional-programming property-based-testing

25 stars 4.94 score 70 scripts

forestgeo

fgeo.tool:Import and Manipulate 'ForestGEO' Data

To help you access, transform, analyze, and visualize 'ForestGEO' data, we developed a collection of R packages (<https://forestgeo.github.io/fgeo/>). This package, in particular, helps you to easily import, filter, and modify 'ForestGEO' data. To learn more about 'ForestGEO' visit <https://forestgeo.si.edu/>.

Maintained by Mauro Lepore. Last updated 3 years ago.

dynamics ecology fgeo forestgeo miscelaneas tools tree utils

2 stars 4.86 score 27 scripts 3 dependents

r-causal

tidysmd:Tidy Standardized Mean Differences

Tidy standardized mean differences ('SMDs'). 'tidysmd' uses the 'smd' package to calculate standardized mean differences for variables in a data frame, returning the results in a tidy format.

Maintained by Malcolm Barrett. Last updated 2 months ago.

9 stars 4.36 score 17 scripts 1 dependent

christopherkenny

divseg:Calculate Diversity and Segregation Indices

Implements common measures of diversity and spatial segregation. This package has tools to compute the majority of measures reviewed in Massey and Denton (1988) <doi:10.2307/2579183>. Multiple common measures of within-geography diversity are implemented as well. All functions operate on data frames with a 'tidyselect' based workflow.

Maintained by Christopher T. Kenny. Last updated 10 months ago.

1 star 2.78 score 12 scripts

efinite

utile.tables:Build Tables for Publication

Functions for building customized ready-to-export tables for publication.

Maintained by Eric Finnesgard. Last updated 2 years ago.

1 star 2.70 score 2 scripts