Showing 19 of 19 results
tidyverse
tidyr: Tidy Messy Data
Tools to help to create tidy data, where each column is a variable, each row is an observation, and each cell contains a single value. 'tidyr' contains tools for changing the shape (pivoting) and hierarchy (nesting and 'unnesting') of a dataset, turning deeply nested lists into rectangular data frames ('rectangling'), and extracting values out of string columns. It also includes tools for working with missing values (both implicit and explicit).
Maintained by Hadley Wickham. Last updated 25 days ago.
1.4k stars 22.88 score 168k scripts 5.5k dependents
sebkrantz
collapse: Advanced and Fast Data Transformation
A C/C++ based package for advanced data transformation and statistical computing in R that is extremely fast, class-agnostic, robust and programmer friendly. Core functionality includes a rich set of S3 generic grouped and weighted statistical functions for vectors, matrices and data frames, which provide efficient low-level vectorizations, OpenMP multithreading, and skip missing values by default. These are integrated with fast grouping and ordering algorithms (also callable from C), and efficient data manipulation functions. The package also provides a flexible and rigorous approach to time series and panel data in R. It further includes fast functions for common statistical procedures, detailed (grouped, weighted) summary statistics, powerful tools to work with nested data, fast data object conversions, functions for memory efficient R programming, and helpers to effectively deal with variable labels, attributes, and missing data. It is well integrated with base R classes, 'dplyr'/'tibble', 'data.table', 'sf', 'units', 'plm' (panel-series and data frames), and 'xts'/'zoo'.
Maintained by Sebastian Krantz. Last updated 7 days ago.
data-aggregation, data-analysis, data-manipulation, data-processing, data-science, data-transformation, econometrics, high-performance, panel-data, scientific-computing, statistics, time-series, weighted, weights, cpp, openmp
672 stars 16.68 score 708 scripts 99 dependents
sparklyr
sparklyr: R Interface to Apache Spark
R interface to Apache Spark, a fast and general engine for big data processing, see <https://spark.apache.org/>. This package supports connecting to local and remote Apache Spark clusters, provides a 'dplyr' compatible back-end, and provides an interface to Spark's built-in machine learning algorithms.
Maintained by Edgar Ruiz. Last updated 10 days ago.
apache-spark, distributed, dplyr, ide, livy, machine-learning, remote-clusters, spark, sparklyr
959 stars 15.20 score 4.0k scripts 21 dependents
thomasp85
tidygraph: A Tidy API for Graph Manipulation
A graph, while not "tidy" in itself, can be thought of as two tidy data frames describing node and edge data respectively. 'tidygraph' provides an approach to manipulate these two virtual data frames using the API defined in the 'dplyr' package, and provides tidy interfaces to many common graph algorithms.
Maintained by Thomas Lin Pedersen. Last updated 2 months ago.
graph-algorithms, graph-manipulation, igraph, network-analysis, tidyverse, cpp
553 stars 14.74 score 4.6k scripts 136 dependents
dieghernan
tidyterra: 'tidyverse' Methods and 'ggplot2' Helpers for 'terra' Objects
Extension of the 'tidyverse' for 'SpatRaster' and 'SpatVector' objects of the 'terra' package. It also includes new 'geom_' functions that provide a convenient way of visualizing 'terra' objects with 'ggplot2'.
Maintained by Diego Hernangómez. Last updated 4 days ago.
terra, ggplot-extension, r-spatial, rspatial
190 stars 13.59 score 1.9k scripts 25 dependents
markfairbanks
tidytable: Tidy Interface to 'data.table'
A tidy interface to 'data.table', giving users the speed of 'data.table' while using tidyverse-like syntax.
Maintained by Mark Fairbanks. Last updated 2 months ago.
460 stars 11.39 score 732 scripts 11 dependents
nathaneastwood
poorman: A Poor Man's Dependency Free Recreation of 'dplyr'
A replication of key functionality from 'dplyr' and the wider 'tidyverse' using only 'base'.
Maintained by Nathan Eastwood. Last updated 1 year ago.
base-r, data-manipulation, grammar
342 stars 10.79 score 156 scripts 27 dependents
elbersb
tidylog: Logging for 'dplyr' and 'tidyr' Functions
Provides feedback about 'dplyr' and 'tidyr' operations.
Maintained by Benjamin Elbers. Last updated 10 months ago.
dplyr, tidyr, tidyverse, wrapper-functions
593 stars 10.23 score 1.7k scripts
ben519
mltools: Machine Learning Tools
A collection of machine learning helper functions, particularly assisting in the Exploratory Data Analysis phase. Makes heavy use of the 'data.table' package for optimal speed and memory efficiency. Highlights include a versatile bin_data() function, sparsify() for converting a data.table to sparse matrix format with one-hot encoding, fast evaluation metrics, and empirical_cdf() for calculating empirical Multivariate Cumulative Distribution Functions.
Maintained by Ben Gorman. Last updated 4 years ago.
exploratory-data-analysis, machine-learning
72 stars 9.67 score 1.2k scripts 13 dependents
insightsengineering
random.cdisc.data: Create Random ADaM Datasets
A set of functions to create random Analysis Data Model (ADaM) datasets and cached dataset. ADaM dataset specifications are described by the Clinical Data Interchange Standards Consortium (CDISC) Analysis Data Model Team.
Maintained by Joe Zhu. Last updated 6 months ago.
33 stars 8.60 score 52 scripts
shichenxie
scorecard: Credit Risk Scorecard
The `scorecard` package makes the development of credit risk scorecards easier and more efficient by providing functions for common tasks, such as data partition, variable selection, woe binning, scorecard scaling, performance evaluation and report generation. These functions can also be used in the development of machine learning models. References: 1. Refaat, M. (2011, ISBN: 9781447511199). Credit Risk Scorecard: Development and Implementation Using SAS. 2. Siddiqi, N. (2006, ISBN: 9780471754510). Credit risk scorecards. Developing and Implementing Intelligent Credit Scoring.
Maintained by Shichen Xie. Last updated 12 months ago.
binning, credit-scoring, release, scorecard, woe, woebinning
164 stars 8.07 score 94 scripts
bioc
BioNERO: Biological Network Reconstruction Omnibus
BioNERO aims to integrate all aspects of biological network inference in a single package, including data preprocessing, exploratory analyses, network inference, and analyses for biological interpretations. BioNERO can be used to infer gene coexpression networks (GCNs) and gene regulatory networks (GRNs) from gene expression data. Additionally, it can be used to explore topological properties of protein-protein interaction (PPI) networks. GCN inference relies on the popular WGCNA algorithm. GRN inference is based on the "wisdom of the crowds" principle, which consists of inferring GRNs with multiple algorithms (here, CLR, GENIE3 and ARACNE) and calculating the average rank for each interaction pair. As all steps of network analyses are included in this package, BioNERO spares users from learning the syntaxes of several packages and how to pass data between them. Finally, users can also identify consensus modules across independent expression sets and calculate intra- and interspecies module preservation statistics between different networks.
Maintained by Fabricio Almeida-Silva. Last updated 5 months ago.
software, geneexpression, generegulation, systemsbiology, graphandnetwork, preprocessing, network, networkinference
27 stars 7.78 score 50 scripts 1 dependent
cbroeckl
RAMClustR: Mass Spectrometry Metabolomics Feature Clustering and Interpretation
A feature clustering algorithm for non-targeted mass spectrometric metabolomics data. This method is compatible with gas and liquid chromatography coupled mass spectrometry, including indiscriminant tandem mass spectrometry <DOI: 10.1021/ac501530d> data.
Maintained by Helge Hecht. Last updated 7 months ago.
12 stars 7.46 score 20 scripts
ropensci
taxlist: Handling Taxonomic Lists
Handling taxonomic lists through objects of class 'taxlist'. This package provides functions to import species lists from 'Turboveg' (<https://www.synbiosys.alterra.nl/turboveg/>) and the possibility to create backups from resulting R objects. Quick displays are also implemented as summary methods.
Maintained by Miguel Alvarez. Last updated 6 months ago.
12 stars 7.07 score 81 scripts 2 dependents
statisfactions
simpr: Flexible 'Tidyverse'-Friendly Simulations
A general, 'tidyverse'-friendly framework for simulation studies, design analysis, and power analysis. Specify data generation, define varying parameters, generate data, fit models, and tidy model results in a single pipeline, without needing loops or custom functions.
Maintained by Ethan Brown. Last updated 9 months ago.
43 stars 6.89 score 30 scripts
eth-mds
ricu: Intensive Care Unit Data with R
Focused on (but not exclusive to) data sets hosted on PhysioNet (<https://physionet.org>), 'ricu' provides utilities for download, setup and access of intensive care unit (ICU) data sets. In addition to functions for running arbitrary queries against available data sets, a system for defining clinical concepts and encoding their representations in tabular ICU data is presented.
Maintained by Nicolas Bennett. Last updated 10 months ago.
39 stars 5.65 score 77 scripts
roaldarbol
animovement: An R toolbox for analysing animal movement across space and time
An R toolbox for analysing animal movement across space and time.
Maintained by Mikkel Roald-Arbøl. Last updated 3 months ago.
animal-behaviour, animal-movement, neuroethology, neuroscience
10 stars 4.81 score 8 scripts
poissonconsulting
mcmcdata: Manipulate MCMC Samples and Data Frames
Manipulates Markov chain Monte Carlo (MCMC) samples and associated data frames.
Maintained by Joe Thorley. Last updated 2 months ago.
1 star 3.56 score 4 scripts 4 dependents