R-universe search: cleaning

sfirke

janitor:Simple Tools for Examining and Cleaning Dirty Data

The main janitor functions can: perfectly format data.frame column names; provide quick counts of variable combinations (i.e., frequency tables and crosstabs); and explore duplicate records. Other janitor functions nicely format the tabulation results. These tabulate-and-report functions approximate popular features of SPSS and Microsoft Excel. This package follows the principles of the "tidyverse" and works well with the pipe function %>%. janitor was built with beginning-to-intermediate R users in mind and is optimized for user-friendliness.

Maintained by Sam Firke. Last updated 3 months ago.

data-analysis data-cleaning data-science dirty-data excel pivot-tables spss tabulations tidyverse

17.1 match 1.4k stars 19.15 score 35k scripts 231 dependents
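
As a rough illustration of the three tasks named in the janitor description above (cleaning column names, quick counts, and duplicate checks), the sketch below calls `clean_names()`, `tabyl()`, and `get_dupes()` on a small made-up data frame. The data frame, its column names, and its values are assumptions invented for this example.

```r
library(janitor)

# Hypothetical messy data; columns and values are invented for illustration.
survey <- data.frame(
  "First Name"  = c("Ann", "Bob", "Ann"),
  "Age (years)" = c(30, 25, 30),
  check.names = FALSE
)

# Standardize the column names to snake_case.
clean <- clean_names(survey)
names(clean)

# One-way frequency table of a single variable.
counts <- tabyl(clean, first_name)
counts

# Rows that are duplicated across all columns.
dupes <- get_dupes(clean)
dupes
```

Because each function takes a data frame as its first argument, the same calls can also be chained with `%>%`.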

data-cleaning

validate:Data Validation Infrastructure

Declare data validation rules and data quality indicators; confront data with them and analyze or visualize the results. The package supports rules that are per-field, in-record, cross-record or cross-dataset. Rules can be automatically analyzed for rule type and connectivity. Supports checks implied by an SDMX DSD file as well. See also Van der Loo and De Jonge (2018) <doi:10.1002/9781118897126>, Chapter 6 and the JSS paper (2021) <doi:10.18637/jss.v097.i10>.

Maintained by Mark van der Loo. Last updated 12 days ago.

data-cleaning validation

22.5 match 418 stars 12.50 score 448 scripts 9 dependents
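
The declare-then-confront workflow in the validate description can be sketched as follows. This is a minimal example assuming a toy data frame and two simple per-field rules, all invented here; real rule sets would typically be larger and may be stored in separate files.

```r
library(validate)

# Hypothetical records; values are invented so that each rule fails once.
records <- data.frame(
  age    = c(34, -2, 51),
  height = c(1.72, 1.80, 0)
)

# Declare the validation rules.
rules <- validator(age >= 0, height > 0)

# Confront the data with the rules and summarize passes and fails per rule.
checked <- confront(records, rules)
result  <- summary(checked)
result
```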

epiforecasts

covidregionaldata:Subnational Data for COVID-19 Epidemiology

An interface to subnational and national level COVID-19 data sourced from both official sources, such as Public Health England in the UK, and from other COVID-19 data collections, including the World Health Organisation (WHO), European Centre for Disease Prevention and Control (ECDC), Johns Hopkins University (JHU), Google Open Data and others. Designed to streamline COVID-19 data extraction, cleaning, and processing from a range of data sources in an open and transparent way. This allows users to inspect and scrutinise the data, and tools used to process it, at every step. For all countries supported, data includes a daily time-series of cases. Wherever available data is also provided for deaths, hospitalisations, and tests. National level data are also supported using a range of sources.

Maintained by Sam Abbott. Last updated 3 years ago.

covid-19 data open-science r6 regional-data

31.5 match 37 stars 5.67 score 121 scripts

ices-tools-prod

TAF:Transparent Assessment Framework for Reproducible Research

General framework to organize data, methods, and results used in reproducible scientific analyses. A TAF analysis consists of four scripts (data.R, model.R, output.R, report.R) that are run sequentially. Each script starts by reading files from a previous step and ends with writing out files for the next step. Convenience functions are provided to version control the required data and software, run analyses, clean residues from previous runs, manage files, manipulate tables, and produce figures. With a focus on stability and reproducible analyses, the TAF package comes with no dependencies. TAF forms a base layer for the 'icesTAF' package and other scientific applications.

Maintained by Arni Magnusson. Last updated 4 months ago.

23.1 match 3 stars 6.85 score 282 scripts 2 dependents

epiverse-trace

cleanepi:Clean and Standardize Epidemiological Data

A package for cleaning and standardizing tabular data, tailored specifically for curating epidemiological data. It streamlines various data cleaning tasks that are typically expected when working with datasets in epidemiology. It returns the processed data in the same format, and generates a comprehensive report detailing the outcomes of each cleaning task.

Maintained by Karim Mané. Last updated 3 days ago.

data-cleaning epidemiology epiverse

21.1 match 9 stars 7.44 score 19 scripts

ropensci

CoordinateCleaner:Automated Cleaning of Occurrence Records from Biological Collections

Automated flagging of common spatial and temporal errors in biological and paleontological collection data, for use in conservation, ecology and paleontology. Includes automated tests to easily flag (and exclude) records assigned to country or province centroid, the open ocean, the headquarters of the Global Biodiversity Information Facility, urban areas or the location of biodiversity institutions (museums, zoos, botanical gardens, universities). Furthermore, it identifies per-species outlier coordinates, zero coordinates, identical latitude/longitude and invalid coordinates. Also implements an algorithm to identify data sets with a significant proportion of rounded coordinates. Especially suited for large data sets. The reference for the methodology is: Zizka et al. (2019) <doi:10.1111/2041-210X.13152>.

Maintained by Alexander Zizka. Last updated 1 year ago.

14.2 match 82 stars 10.93 score 306 scripts 3 dependents

eblondel

cleangeo:Cleaning Geometries from Spatial Objects

Provides a set of utility tools to inspect spatial objects and to facilitate handling and reporting of topology errors and geometry validity issues with sp objects. Finally, it provides a geometry cleaner that will fix all geometry problems, and eliminate (or at least reduce) the likelihood of having issues when doing spatial data processing.

Maintained by Emmanuel Blondel. Last updated 2 years ago.

cleaning cleaning-geometries gis sp spatial

22.5 match 45 stars 6.82 score 99 scripts 1 dependent

data-cleaning

errorlocate:Locate Errors with Validation Rules

Errors in data can be located and removed using validation rules from package 'validate'. See also Van der Loo and De Jonge (2018) <doi:10.1002/9781118897126>, chapter 7.

Maintained by Edwin de Jonge. Last updated 9 months ago.

data-cleaning errors invalidation

24.2 match 22 stars 6.11 score 59 scripts

trinker

textclean:Text Cleaning Tools

Tools to clean and process text. Tools are geared at checking for substrings that are not optimal for analysis and replacing or removing them (normalizing) with more analysis friendly substrings (see Sproat, Black, Chen, Kumar, Ostendorf, & Richards (2001) <doi:10.1006/csla.2001.0169>) or extracting them into new variables. For example, emoticons are often used in text but not always easily handled by analysis algorithms. The replace_emoticon() function replaces emoticons with word equivalents.

Maintained by Tyler Rinker. Last updated 3 years ago.

data-munging emoticons regex text-analysis text-cleaning

13.3 match 248 stars 10.08 score 760 scripts 22 dependents
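
The `replace_emoticon()` function mentioned in the textclean description can be tried on a couple of sample strings. This is only a sketch: the input strings are invented, and the exact replacement words depend on the emoticon lookup table shipped with the installed package version.

```r
library(textclean)

# Hypothetical input strings containing emoticons.
messages <- c("Nice work :)", "Not again :(")

# Replace each emoticon with a word equivalent.
normalized <- replace_emoticon(messages)
normalized
```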

braverock

PerformanceAnalytics:Econometric Tools for Performance and Risk Analysis

Collection of econometric functions for performance and risk analysis. In addition to standard risk and performance metrics, this package aims to aid practitioners and researchers in utilizing the latest research in analysis of non-normal return streams. In general, it is most tested on return (rather than price) data on a regular scale, but most functions will work with irregular return data as well, and increasing numbers of functions will work with P&L or price data where possible.

Maintained by Brian G. Peterson. Last updated 3 months ago.

7.9 match 222 stars 15.93 score 4.8k scripts 20 dependents

data-cleaning

editrules:Parsing, Applying, and Manipulating Data Cleaning Rules

Please note: active development has moved to packages 'validate' and 'errorlocate'. Facilitates reading and manipulating (multivariate) data restrictions (edit rules) on numerical and categorical data. Rules can be defined with common R syntax and parsed to an internal (matrix-like) format. Rules can be manipulated with variable elimination and value substitution methods, allowing for feasibility checks and more. Data can be tested against the rules and erroneous fields can be found based on Fellegi and Holt's generalized principle. Rule dependencies can be visualized using the 'igraph' package.

Maintained by Edwin de Jonge. Last updated 9 months ago.

17.9 match 22 stars 6.97 score 140 scripts 1 dependent

rolkra

explore:Simplifies Exploratory Data Analysis

Interactive data exploration with one line of code, automated reporting, or an easy-to-remember set of tidy functions for low-code exploratory data analysis.

Maintained by Roland Krasser. Last updated 3 months ago.

data-exploration data-visualisation decision-trees eda rmarkdown shiny tidy

10.3 match 228 stars 11.43 score 221 scripts 1 dependent

gadenbuie

cleanrmd:Clean Class-Less 'R Markdown' HTML Documents

A collection of clean 'R Markdown' HTML document templates using classy-looking classless CSS styles. These documents use a minimal set of dependencies but still look great, making them suitable for use as package vignettes or for sharing results via email.

Maintained by Garrick Aden-Buie. Last updated 2 years ago.

classless classless-theme clean css html rmarkdown style theme

18.0 match 151 stars 5.95 score 10 scripts 1 dependent

rstudio

renv:Project Environments

A dependency management toolkit for R. Using 'renv', you can create and manage project-local R libraries, save the state of these libraries to a 'lockfile', and later restore your library as required. Together, these tools can help make your projects more isolated, portable, and reproducible.

Maintained by Kevin Ushey. Last updated 3 days ago.

5.6 match 1.0k stars 18.55 score 1.5k scripts 113 dependents

data-cleaning

validatetools:Checking and Simplifying Validation Rule Sets

Rule sets with validation rules may contain redundancies or contradictions. Functions for finding redundancies and problematic rules are provided, given a set of rules formulated with 'validate'.

Maintained by Edwin de Jonge. Last updated 9 months ago.

data-cleaning rules validation

22.5 match 15 stars 4.47 score 39 scripts

nataliepatten

gatoRs:Geographic and Taxonomic Occurrence R-Based Scrubbing

Streamlines downloading and cleaning biodiversity data from Integrated Digitized Biocollections (iDigBio) and the Global Biodiversity Information Facility (GBIF).

Maintained by Natalie N. Patten. Last updated 10 months ago.

16.1 match 11 stars 6.16 score 66 scripts

data-cleaning

validatesuggest:Generate Suggestions for Validation Rules

Generate suggestions for validation rules from a reference data set, which can be used as a starting point for domain specific rules to be checked with package 'validate'.

Maintained by Edwin de Jonge. Last updated 1 year ago.

data-cleaning validation

22.5 match 5 stars 4.40 score 5 scripts

data-cleaning

dcmodify:Modify Data Using Externally Defined Modification Rules

Data cleaning scripts typically contain a lot of 'if this change that' type of statements. Such statements are typically condensed expert knowledge. With this package, such 'data modifying rules' are taken out of the code and instead become parameters to the workflow. This allows one to maintain, document, and reason about data modification rules as separate entities.

Maintained by Mark van der Loo. Last updated 9 months ago.

15.5 match 10 stars 6.24 score 58 scripts

jonathancornelissen

highfrequency:Tools for Highfrequency Data Analysis

Provide functionality to manage, clean and match highfrequency trades and quotes data, calculate various liquidity measures, estimate and forecast volatility, detect price jumps and investigate microstructure noise and intraday periodicity. A detailed vignette can be found in the paper "Analyzing Intraday Financial Data in R: The highfrequency Package" by Boudt, Kleen, and Sjoerup (2022, <doi:10.18637/jss.v104.i08>). The DOI in the CITATION is for a new Journal of Statistical Software publication that will be registered after publication on CRAN. A working paper version can be found on SSRN: <doi:10.2139/ssrn.3917548>.

Maintained by Kris Boudt. Last updated 2 years ago.

openblas cpp openmp

13.0 match 152 stars 7.37 score 286 scripts

data-cleaning

deductive:Data Correction and Imputation Using Deductive Methods

Attempt to repair inconsistencies and missing values in data records by using information from valid values and validation rules restricting the data.

Maintained by Mark van der Loo. Last updated 1 month ago.

data-cleaning

22.5 match 14 stars 4.26 score 13 scripts

cran

podcleaner:Legacy Scottish Post Office Directories Cleaner

Attempts to clean optical character recognition (OCR) errors in legacy Scottish Post Office Directories. Further attempts to match records from trades and general directories.

Maintained by Olivier Bautheac. Last updated 3 years ago.

55.6 match 1.70 score

wadpac

GGIR:Raw Accelerometer Data Analysis

A tool to process and analyse data collected with wearable raw acceleration sensors as described in Migueles and colleagues (JMPB 2019), and van Hees and colleagues (JApplPhysiol 2014; PLoSONE 2015). The package has been developed and tested for binary data from 'GENEActiv' <https://activinsights.com/>, binary (.gt3x) and .csv-export data from 'Actigraph' <https://theactigraph.com> devices, and binary (.cwa) and .csv-export data from 'Axivity' <https://axivity.com>. These devices are currently widely used in research on human daily physical activity. Further, the package can handle accelerometer data files from any other sensor brand, provided that the data is stored in csv format. The package also allows for external function embedding.

Maintained by Vincent T van Hees. Last updated 2 days ago.

accelerometer activity-recognition circadian-rhythm movement-sensor sleep

7.1 match 109 stars 13.20 score 342 scripts 3 dependents

msberends

cleaner:Fast and Easy Data Cleaning

Data cleaning functions for classes logical, factor, numeric, character, currency and Date to make data cleaning fast and easy. Relying on very few dependencies, it provides smart guessing, but with user options to override anything if needed.

Maintained by Matthijs S. Berends. Last updated 4 months ago.

13.1 match 32 stars 6.95 score 64 scripts 9 dependents

rqtl

qtl2:Quantitative Trait Locus Mapping in Experimental Crosses

Provides a set of tools to perform quantitative trait locus (QTL) analysis in experimental crosses. It is a reimplementation of the 'R/qtl' package to better handle high-dimensional data and complex cross designs. Broman et al. (2019) <doi:10.1534/genetics.118.301595>.

Maintained by Karl W Broman. Last updated 8 days ago.

cpp

9.6 match 34 stars 9.48 score 1.1k scripts 5 dependents

openbiox

UCSCXenaShiny:Interactive Analysis of UCSC Xena Data

Provides functions and a Shiny application for downloading, analyzing and visualizing datasets from UCSC Xena (<http://xena.ucsc.edu/>), which is a collection of UCSC-hosted public databases such as TCGA, ICGC, TARGET, GTEx, CCLE, and others.

Maintained by Shixiang Wang. Last updated 4 months ago.

cancer-dataset shiny-apps ucsc-xena

10.4 match 96 stars 8.54 score 35 scripts

arutools

ARUtools:Management and Processing of Autonomous Recording Unit (ARU) Data

Parse Autonomous Recording Unit (ARU) data and sub-sample recordings. Extract metadata from your recordings, select a subset of recordings for interpretation, and prepare files for processing on the 'WildTrax' <https://wildtrax.ca/> platform. Read and process metadata from recordings collected using the SongMeter and BAR-LT types of ARUs.

Maintained by David Hope. Last updated 4 months ago.

14.0 match 6.30 score 26 scripts

business-science

anomalize:Tidy Anomaly Detection

The 'anomalize' package enables a "tidy" workflow for detecting anomalies in data. The main functions are time_decompose(), anomalize(), and time_recompose(). When combined, it's quite simple to decompose time series, detect anomalies, and create bands separating the "normal" data from the anomalous data at scale (i.e. for multiple time series). Time series decomposition is used to remove trend and seasonal components via the time_decompose() function and methods include seasonal decomposition of time series by Loess ("stl") and seasonal decomposition by piecewise medians ("twitter"). The anomalize() function implements two methods for anomaly detection of residuals including using an interquartile range ("iqr") and generalized extreme studentized deviation ("gesd"). These methods are based on those used in the 'forecast' package and the Twitter 'AnomalyDetection' package. Refer to the associated functions for specific references for these methods.

Maintained by Matt Dancho. Last updated 1 year ago.

anomaly anomaly-detection decomposition detect-anomalies iqr time-series

9.2 match 339 stars 9.56 score 332 scripts
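
The three main functions named in the anomalize description are usually chained in the order decompose, detect, recompose. The sketch below assumes the `tidyverse_cran_downloads` example dataset that ships with the package and restricts it to a single series to keep the run short; the choice of the "lubridate" series and of the "stl" and "iqr" methods is arbitrary.

```r
library(dplyr)
library(anomalize)

# Example data bundled with anomalize: daily CRAN download counts per package.
flagged <- tidyverse_cran_downloads %>%
  filter(package == "lubridate") %>%          # one series, chosen arbitrarily
  time_decompose(count, method = "stl") %>%   # split into season, trend, remainder
  anomalize(remainder, method = "iqr") %>%    # flag anomalies in the remainder
  time_recompose()                            # bounds on the original scale

# Keep only the observations flagged as anomalous.
anomalies <- filter(flagged, anomaly == "Yes")
anomalies
```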

lrberge

stringmagic:Character String Operations and Interpolation, Magic Edition

Performs complex string operations compactly and efficiently. Supports string interpolation jointly with over 50 string operations. Also enhances regular string functions (like grep() and co). See an introduction at <https://lrberge.github.io/stringmagic/>.

Maintained by Laurent R Berge. Last updated 7 months ago.

interpolation string cpp

8.1 match 15 stars 10.56 score 37 scripts 33 dependents

dominiquemaucieri

quadcleanR:Cleanup and Visualization of Quadrat Data

A tool that can be customized to aid in the clean up of ecological data collected using quadrats and can crop quadrats to ensure comparability between quadrats collected under different methodologies.

Maintained by Dominique Maucieri. Last updated 2 years ago.

19.0 match 4.45 score 14 scripts

usaid-oha-si

mindthegap:Mind the Gap

Package to tidy UNAIDS estimates (from the EDMS database) as well as plot trends in UNAIDS 95 goals and ART coverage gap by country.

Maintained by Karishma Srikanth. Last updated 2 months ago.

14.7 match 5 stars 5.51 score 13 scripts

assuom44

arlclustering:Exploring Social Network Structures Through Friendship-Driven Community Detection with Association Rules Mining

Implements an innovative approach to community detection in social networks using Association Rules Learning. The package provides tools for processing graph and rules objects, generating association rules, and detecting communities based on node interactions. Designed to facilitate advanced research in Social Network Analysis, this package leverages association rules learning for enhanced community detection. This approach is described in El-Moussaoui et al. (2021) <doi:10.1007/978-3-030-66840-2_3>.

Maintained by Mohamed El-Moussaoui. Last updated 6 months ago.

12.6 match 6.45 score 50 scripts

ices-tools-prod

icesTAF:Functions to Support the ICES Transparent Assessment Framework

Functions to support the ICES Transparent Assessment Framework <https://taf.ices.dk> to organize data, methods, and results used in ICES assessments. ICES is an organization facilitating international collaboration in marine science.

Maintained by Colin Millar. Last updated 2 years ago.

12.3 match 5 stars 6.37 score 1.1k scripts 1 dependent

ropensci

pathviewr:Wrangle, Analyze, and Visualize Animal Movement Data

Tools to import, clean, and visualize movement data, particularly from motion capture systems such as Optitrack's 'Motive', the Straw Lab's 'Flydra', or from other sources. We provide functions to remove artifacts, standardize tunnel position and tunnel axes, select a region of interest, isolate specific trajectories, fill gaps in trajectory data, and calculate 3D and per-axis velocity. For experiments of visual guidance, we also provide functions that use subject position to estimate perception of visual stimuli.

Maintained by Vikram B. Baliga. Last updated 2 years ago.

animal-movement flydra motion movement-data optitrack trajectories trajectory-analysis visual-guidance visual-perception

11.9 match 8 stars 6.56 score 102 scripts

data-cleaning

lintools:Manipulation of Linear Systems of (in)Equalities

Variable elimination (Gaussian elimination, Fourier-Motzkin elimination), Moore-Penrose pseudoinverse, reduction to reduced row echelon form, value substitution, projecting a vector on the convex polytope described by a system of (in)equations, simplify systems by removing spurious columns and rows and collapse implied equalities, test if a matrix is totally unimodular, compute variable ranges implied by linear (in)equalities.

Maintained by Mark van der Loo. Last updated 9 months ago.

15.0 match 4 stars 5.19 score 13 scripts 2 dependents

ropensci

drake:A Pipeline Toolkit for Reproducible Computation at Scale

A general-purpose computational engine for data analysis, drake rebuilds intermediate data objects when their dependencies change, and it skips work when the results are already up to date. Not every execution starts from scratch, there is native support for parallel and distributed computing, and completed projects have tangible evidence that they are reproducible. Extensive documentation, from beginner-friendly tutorials to practical examples and more, is available at the reference website <https://docs.ropensci.org/drake/> and the online manual <https://books.ropensci.org/drake/>.

Maintained by William Michael Landau. Last updated 3 months ago.

data-science drake high-performance-computing makefile peer-reviewed pipeline reproducibility reproducible-research ropensci workflow

6.8 match 1.3k stars 11.49 score 1.7k scripts 1 dependent

epicentre-msf

dbc:Dictionary-Based Cleaning

Tools for dictionary-based data cleaning.

Maintained by Patrick Barks. Last updated 1 year ago.

31.3 match 2 stars 2.48 score 4 scripts 1 dependent

ropensci

EndoMineR:Functions to mine endoscopic and associated pathology datasets

This script comprises the functions that are used to clean up endoscopic reports and pathology reports as well as many of the scripts used for analysis. The scripts assume the endoscopy and histopathology data set is merged already, but they can of course also be used with the unmerged datasets.

Maintained by Sebastian Zeki. Last updated 7 months ago.

endoscopy gastroenterology peer-reviewed semi-structured-data text-mining

13.4 match 13 stars 5.47 score 30 scripts

billdenney

PKNCA:Perform Pharmacokinetic Non-Compartmental Analysis

Compute standard Non-Compartmental Analysis (NCA) parameters for typical pharmacokinetic analyses and summarize them.

Maintained by Bill Denney. Last updated 17 days ago.

nca noncompartmental-analysis pharmacokinetics

5.6 match 73 stars 12.61 score 214 scripts 4 dependents

ibot-geoecology

myClim:Microclimatic Data Processing

Handling microclimatic data in R. The 'myClim' workflow begins with reading data, primarily from microclimatic dataloggers, but it can also read meteorological station data from files. Cleaning the time step, setting the time zone, and collecting metadata form the next step of the workflow. With 'myClim' tools one can crop, join, downscale, and convert microclimatic data formats, sort them into localities, request descriptive characteristics and compute microclimatic variables. Handy plotting functions are provided with smart defaults.

Maintained by Vojtěch Kalčík. Last updated 14 days ago.

10.0 match 7 stars 6.97 score 30 scripts

asgr

imager:Image Processing Library Based on 'CImg'

Fast image processing for images in up to 4 dimensions (two spatial dimensions, one time/depth dimension, one colour dimension). Provides most traditional image processing tools (filtering, morphology, transformations, etc.) as well as various functions for easily analysing image data using R. The package wraps 'CImg', <http://cimg.eu>, a simple, modern C++ library for image processing.

Maintained by Aaron Robotham. Last updated 27 days ago.

libx11 fftw3 tiff cpp openmp

5.0 match 17 stars 13.62 score 2.4k scripts 45 dependents

kbroman

qtl:Tools for Analyzing QTL Experiments

Analysis of experimental crosses to identify genes (called quantitative trait loci, QTLs) contributing to variation in quantitative traits. Broman et al. (2003) <doi:10.1093/bioinformatics/btg112>.

Maintained by Karl W Broman. Last updated 7 months ago.

openblas

5.3 match 80 stars 12.79 score 2.4k scripts 29 dependents

rafapereirabr

r5r:Rapid Realistic Routing with 'R5'

Rapid realistic routing on multimodal transport networks (walk, bike, public transport and car) using 'R5', the Rapid Realistic Routing on Real-world and Reimagined networks engine <https://github.com/conveyal/r5>. The package allows users to generate detailed routing analysis or calculate travel time and monetary cost matrices using seamless parallel computing on top of the R5 Java machine. While R5 is developed by Conveyal, the package r5r is independently developed by a team at the Institute for Applied Economic Research (Ipea) with contributions from collaborators. Apart from the documentation in this package, users will find additional information on R5 documentation at <https://docs.conveyal.com/>. Although we try to keep new releases of r5r in synchrony with R5, the development of R5 follows Conveyal's independent update process. Hence, users should confirm the R5 version implied by the Conveyal user manual (see <https://docs.conveyal.com/changelog>) corresponds with the R5 version that r5r depends on. This version of r5r depends on R5 v7.1.

Maintained by Rafael H. M. Pereira. Last updated 8 days ago.

openjdk

12.0 match 5.62 score 432 scripts

azure

azuremlsdk:Interface to the 'Azure Machine Learning' 'SDK'

Interface to the 'Azure Machine Learning' Software Development Kit ('SDK'). Data scientists can use the 'SDK' to train, deploy, automate, and manage machine learning models on the 'Azure Machine Learning' service. To learn more about 'Azure Machine Learning' visit the website: <https://docs.microsoft.com/en-us/azure/machine-learning/service/overview-what-is-azure-ml>.

Maintained by Diondra Peck. Last updated 3 years ago.

amlcompute azure azure-machine-learning azureml dsi machine-learning rstudio sdk-r

7.5 match 106 stars 8.91 score 221 scripts

bioc

MSnbase:Base Functions and Classes for Mass Spectrometry and Proteomics

MSnbase provides infrastructure for manipulation, processing and visualisation of mass spectrometry and proteomics data, ranging from raw to quantitative and annotated data.

Maintained by Laurent Gatto. Last updated 2 days ago.

immunooncology infrastructure proteomics massspectrometry qualitycontrol dataimport bioconductor bioinformatics mass-spectrometry proteomics-data visualisation cpp

5.1 match 130 stars 12.81 score 772 scripts 36 dependents

data-cleaning

deducorrect:Deductive Correction, Deductive Imputation, and Deterministic Correction

A collection of methods for automated data cleaning where all actions are logged. NOTE: active development has moved to the 'deductive' package.

Maintained by Mark van der Loo. Last updated 9 months ago.

15.5 match 9 stars 4.18 score 34 scripts

jrnold

ggthemes:Extra Themes, Scales and Geoms for 'ggplot2'

Some extra themes, geoms, and scales for 'ggplot2'. Provides 'ggplot2' themes and scales that replicate the look of plots by Edward Tufte, Stephen Few, 'Fivethirtyeight', 'The Economist', 'Stata', 'Excel', and 'The Wall Street Journal', among others. Provides 'geoms' for Tufte's box plot and range frame.

Maintained by Jeffrey B. Arnold. Last updated 1 year ago.

data-visualisation ggplot2 ggplot2-themes plot plotting theme visualization

4.0 match 1.3k stars 16.17 score 40k scripts 102 dependents

bioc

COTAN:COexpression Tables ANalysis

Statistical and computational method to analyze the co-expression of gene pairs at single cell level. It provides the foundation for single-cell gene interactome analysis. The basic idea is studying the zero UMI counts' distribution instead of focusing on positive counts; this is done with a generalized contingency tables framework. COTAN can effectively assess the correlated or anti-correlated expression of gene pairs. It provides a numerical index related to the correlation and an approximate p-value for the associated independence test. COTAN can also evaluate whether single genes are differentially expressed, scoring them with a newly defined global differentiation index. Moreover, this approach provides ways to plot and cluster genes according to their co-expression pattern with other genes, effectively helping the study of gene interactions and becoming a new tool to identify cell-identity marker genes.

Maintained by Galfrè Silvia Giulia. Last updated 19 days ago.

systemsbiology transcriptomics geneexpression singlecell

8.1 match 16 stars 7.88 score 96 scripts

easystats

insight:Easy Access to Model Information for Various Model Objects

A tool to provide easy, intuitive and consistent access to information contained in various R models, like model formulas, model terms, information about random effects, data that was used to fit the model or data from response variables. 'insight' mainly revolves around two types of functions: Functions that find (the names of) information, starting with 'find_', and functions that get the underlying data, starting with 'get_'. The package has a consistent syntax and works with many different model objects, where otherwise functions to access this information are missing.

Maintained by Daniel Lüdecke. Last updated 5 days ago.

easystats hacktoberfest insight models names predictors random

3.6 match 412 stars 17.24 score 568 scripts 210 dependents
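
The 'find_' versus 'get_' split described for insight can be seen on an ordinary linear model. This is a small sketch; the model formula and the use of the built-in `mtcars` data are choices made for this example only.

```r
library(insight)

# A simple model fitted on a dataset that ships with R.
model <- lm(mpg ~ wt + cyl, data = mtcars)

# find_* functions return names.
predictors <- find_predictors(model)
response   <- find_response(model)

# get_* functions return the underlying data.
model_data <- get_data(model)

predictors
response
head(model_data)
```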

mi2datalab

tidycharts:Generate Tidy Charts Inspired by 'IBCS'

There is a wide range of R packages created for data visualization, but still, there was no simple and easily accessible way to create clean and transparent charts - up to now. The 'tidycharts' package enables the user to generate charts compliant with International Business Communication Standards ('IBCS'). It means unified bar widths, colors, chart sizes, etc. Creating homogeneous reports has never been that easy! Additionally, users can apply semantic notation to indicate different data scenarios (plan, budget, forecast). What's more, it is possible to customize the charts by creating a personal color palette with the possibility of switching to default options after the experiments. We wanted the package to be helpful in writing reports, so we also made it possible to join charts into one clear image. All charts are generated in SVG format and can be shown in the 'RStudio' viewer pane or exported to HTML output of 'knitr'/'markdown'.

Maintained by Bartosz Sawicki. Last updated 3 years ago.

charts clean ibcs visualization

11.5 match 5 stars 5.23 score 17 scripts

billpetti

baseballr:Acquiring and Analyzing Baseball Data

Provides numerous utilities for acquiring and analyzing baseball data from online sources such as 'Baseball Reference' <https://www.baseball-reference.com/>, 'FanGraphs' <https://www.fangraphs.com/>, and the 'MLB Stats' API <https://www.mlb.com/>.

Maintained by Saiem Gilani. Last updated 4 months ago.

baseball pitchfx sabermetrics statcast

6.6 match 380 stars 8.98 score 582 scripts

renkun-ken

rlist:A Toolbox for Non-Tabular Data Manipulation

Provides a set of functions for data manipulation with list objects, including mapping, filtering, grouping, sorting, updating, searching, and other useful functions. Most functions are designed to be pipeline friendly so that data processing with lists can be chained.

Maintained by Kun Ren. Last updated 2 years ago.

4.3 match 206 stars 13.73 score 2.2k scripts 123 dependents

business-science

timetk:A Tool Kit for Working with Time Series

Easy visualization, wrangling, and feature engineering of time series data for forecasting and machine learning prediction. Consolidates and extends time series functionality from packages including 'dplyr', 'stats', 'xts', 'forecast', 'slider', 'padr', 'recipes', and 'rsample'.

Maintained by Matt Dancho. Last updated 1 year ago.

coercion coercion-functions data-mining dplyr forecast forecasting forecasting-models machine-learning series-decomposition series-signature tibble tidy tidyquant tidyverse time time-series timeseries

4.1 match 625 stars 14.15 score 4.0k scripts 16 dependents

the-hull

datacleanr:Interactive and Reproducible Data Cleaning

Flexible and efficient cleaning of data with interactivity. 'datacleanr' facilitates best practices in data analyses and reproducibility with built-in features and by translating interactive/manual operations to code. The package is designed for interoperability, and so fits seamlessly into reproducible analysis pipelines in 'R'.

Maintained by Alexander Hurley. Last updated 3 years ago.

annotation-tool data-cleaning outlier-detection outlier-removal reproducibility

13.0 match 20 stars 4.38 score 24 scripts

ekstroem

dataMaid:A Suite of Checks for Identification of Potential Errors in a Data Frame as Part of the Data Screening Process

Data screening is an important first step of any statistical analysis. dataMaid automatically generates a customizable data report with a thorough summary of the checks and the results that a human can use to identify possible errors. It provides an extendable suite of tests for common potential errors in a dataset.

Maintained by Claus Thorn Ekstrøm. Last updated 3 years ago.

data-cleaning data-screening reproducible-research

7.5 match 143 stars 7.53 score 236 scripts

egenn

rtemis:Machine Learning and Visualization

Advanced Machine Learning and Visualization. Unsupervised Learning (Clustering, Decomposition), Supervised Learning (Classification, Regression), Cross-Decomposition, Bagging, Boosting, Meta-models. Static and interactive graphics.

Maintained by E.D. Gennatas. Last updated 1 month ago.

data-science data-visualization machine-learning machine-learning-library visualization

7.8 match 145 stars 7.09 score 50 scripts 2 dependents

thinkr-open

thinkr:Tools for Cleaning Up Messy Files

Some tools for cleaning up messy 'Excel' files to be suitable for R. People who have been working with 'Excel' for years have built more or less complicated sheets with names, characters, and formats that are not homogeneous. To be able to use them in R, we built a set of functions that avoid the majority of import problems and preserve as much of the data as possible.

Maintained by Vincent Guyader. Last updated 3 years ago.

hacktoberfest thinkr-not-maintained

7.6 match 29 stars 6.96 score 45 scripts

chrismuir

refinr:Cluster and Merge Similar Values Within a Character Vector

These functions take a character vector as input, identify and cluster similar values, and then merge clusters together so their values become identical. The functions are an implementation of the key collision and ngram fingerprint algorithms from the open source tool Open Refine <https://openrefine.org/>. More info on key collision and ngram fingerprint can be found here <https://openrefine.org/docs/technical-reference/clustering-in-depth>.

Maintained by Chris Muir. Last updated 1 year ago.

approximate-string-matching clustering data-cleaning data-clustering fuzzy-matching ngram openrefine cpp

7.5 match 104 stars 6.80 score 121 scripts
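
The key collision idea mentioned in the 'refinr' description can be illustrated with a minimal sketch. This is not refinr's implementation or API: it is written in Python for illustration, the function names `fingerprint` and `merge_by_fingerprint` are hypothetical, and the normalization steps (lowercase, strip accents and punctuation, sort unique tokens) are an assumption based on the general fingerprint method documented by OpenRefine.

```python
import re
import unicodedata
from collections import Counter, defaultdict


def fingerprint(value: str) -> str:
    """Build a normalized key: values with the same key are treated as one cluster."""
    text = unicodedata.normalize("NFKD", value.strip().lower())
    text = text.encode("ascii", "ignore").decode("ascii")  # drop accents
    text = re.sub(r"[^\w\s]", "", text)                    # drop punctuation
    tokens = sorted(set(text.split()))                     # unique, order-independent
    return " ".join(tokens)


def merge_by_fingerprint(values):
    """Replace every value with the most frequent spelling in its cluster."""
    clusters = defaultdict(list)
    for v in values:
        clusters[fingerprint(v)].append(v)
    canonical = {
        key: Counter(members).most_common(1)[0][0]
        for key, members in clusters.items()
    }
    return [canonical[fingerprint(v)] for v in values]


names = ["Acme Pizza, Inc.", "ACME PIZZA INC", "acme pizza inc.",
         "Acme Pizza, Inc.", "Bob's Burgers"]
merged = merge_by_fingerprint(names)
```

In this sketch the four "Acme" spellings share one key and collapse to the most frequent spelling, while the unrelated value is left untouched.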

ropensci

taxa:Classes for Storing and Manipulating Taxonomic Data

Provides classes for storing and manipulating taxonomic data. Most of the classes can be treated like base R vectors (e.g. can be used in tables as columns and can be named). Vectorized classes can store taxon names and authorities, taxon IDs from databases, taxon ranks, and other types of information. More complex classes are provided to store taxonomic trees and user-defined data associated with them.

Maintained by Zachary Foster. Last updated 1 year ago.

taxonomy biology hierarchy data-cleaning taxon

7.5 match 48 stars 6.80 score 217 scripts

trinker

qdap:Bridging the Gap Between Qualitative Data and Quantitative Analysis

Automates many of the tasks associated with quantitative discourse analysis of transcripts containing discourse, including frequency counts of sentence types, words, sentences, turns of talk, syllables and other assorted analysis tasks. The package provides parsing tools for preparing transcript data. Many functions enable the user to aggregate data by any number of grouping variables, providing analysis and seamless integration with other R packages that undertake higher level analysis and visualization of text. This affords the user a more efficient and targeted analysis. 'qdap' is designed for transcript analysis; however, many functions are applicable to other areas of Text Mining/Natural Language Processing.

Maintained by Tyler Rinker. Last updated 4 years ago.

qdap quantitative-discourse-analysis text-analysis text-mining text-plotting openjdk

5.3 match 176 stars 9.61 score 1.3k scripts 3 dependents

pascoalf

ulrb:Unsupervised Learning Based Definition of Microbial Rare Biosphere

A tool to define rare biosphere. 'ulrb' solves the problem of the definition of rarity by replacing arbitrary thresholds with an unsupervised machine learning algorithm (partitioning around medoids, or k-medoids). This algorithm works for any type of microbiome data, provided there is a species abundance table. For validation of this method to different species abundance tables see Pascoal et al, 2024 (in peer-review). This method also works for non-microbiome data.

Maintained by Francisco Pascoal. Last updated 20 days ago.

8.9 match 3 stars 5.68 score 9 scripts

epiforecasts

socialmixr:Social Mixing Matrices for Infectious Disease Modelling

Provides methods for sampling contact matrices from diary data for use in infectious disease modelling, as discussed in Mossong et al. (2008) <doi:10.1371/journal.pmed.0050074>.

Maintained by Sebastian Funk. Last updated 5 months ago.

5.2 match 38 stars 9.74 score 227 scripts 1 dependents

epiforecasts

EpiNow2:Estimate Real-Time Case Counts and Time-Varying Epidemiological Parameters

Estimates the time-varying reproduction number, rate of spread, and doubling time using a range of open-source tools (Abbott et al. (2020) <doi:10.12688/wellcomeopenres.16006.1>), and current best practices (Gostic et al. (2020) <doi:10.1101/2020.06.18.20134858>). It aims to help users avoid some of the limitations of naive implementations in a framework that is informed by community feedback and is actively supported.

Maintained by Sebastian Funk. Last updated 25 days ago.

backcalculation covid-19 gaussian-processes open-source reproduction-number stan cpp

4.1 match 120 stars 11.88 score 210 scripts

kcuilla

reactablefmtr:Streamlined Table Styling and Formatting for Reactable

Provides various features to streamline and enhance the styling of interactive reactable tables with easy-to-use and highly-customizable functions and themes. Apply conditional formatting to cells with data bars, color scales, color tiles, and icon sets. Utilize custom table themes inspired by popular websites and bootstrap themes. Apply sparkline line & bar charts (note this feature requires the 'dataui' package which can be downloaded from <https://github.com/timelyportfolio/dataui>). Increase the portability and reproducibility of reactable tables by embedding images from the web directly into cells. Save the final table output as a static image or interactive file.

Maintained by Kyle Cuilla. Last updated 2 years ago.

customization data-visualization easy-to-use reproducible tables

5.6 match 209 stars 8.79 score 460 scripts 4 dependents

oscarkjell

text:Analyses of Text using Transformers Models from HuggingFace, Natural Language Processing and Machine Learning

Link R with Transformers from Hugging Face to transform text variables to word embeddings, where the word embeddings are used to statistically test the mean difference between sets of texts, compute semantic similarity scores between texts, predict numerical variables, and visualize statistically significant words according to various dimensions, etc. For more information see <https://www.r-text.org>.

Maintained by Oscar Kjell. Last updated 4 days ago.

deep-learning machine-learning nlp transformers openjdk

3.7 match 146 stars 13.16 score 436 scripts 1 dependents

extendr

rextendr:Call Rust Code from R using the 'extendr' Crate

Provides functions to compile and load Rust code from R, similar to how 'Rcpp' or 'cpp11' allow easy interfacing with C++ code. Also provides helper functions to create R packages that use Rust code. Under the hood, the Rust crate 'extendr' is used to do all the heavy lifting.

Maintained by Ilia Kosenkov. Last updated 24 days ago.

5.1 match 205 stars 9.43 score 61 scripts

aphalo

photobiology:Photobiological Calculations

Definitions of classes, methods, operators and functions for use in photobiology and radiation meteorology and climatology. Calculation of effective (weighted) and not-weighted irradiances/doses, fluence rates, transmittance, reflectance, absorptance, absorbance and diverse ratios and other derived quantities from spectral data. Local maxima and minima: peaks, valleys and spikes. Conversion between energy- and photon-based units. Wavelength interpolation. Astronomical calculations related to solar angles and day length. Colours and vision. This package is part of the 'r4photobiology' suite, Aphalo, P. J. (2015) <doi:10.19232/uv4pb.2015.1.14>.

Maintained by Pedro J. Aphalo. Last updated 3 days ago.

light photobiology quantification r4photobiology-suite radiation spectra sun-position

5.1 match 4 stars 9.35 score 604 scripts 12 dependents

beckerbenj

eatGADS:Data Management of Large Hierarchical Data

Import 'SPSS' data, handle and change 'SPSS' meta data, store and access large hierarchical data in 'SQLite' data bases.

Maintained by Benjamin Becker. Last updated 24 days ago.

6.5 match 1 stars 7.36 score 34 scripts 1 dependents

stmcg

metamedian:Meta-Analysis of Medians

Implements several methods to meta-analyze studies that report the sample median of the outcome. The methods described by McGrath et al. (2019) <doi:10.1002/sim.8013>, Ozturk and Balakrishnan (2020) <doi:10.1002/sim.8738>, and McGrath et al. (2020a) <doi:10.1002/bimj.201900036> can be applied to directly meta-analyze the median or difference of medians between groups. Additionally, a number of methods (e.g., McGrath et al. (2020b) <doi:10.1177/0962280219889080>, Cai et al. (2021) <doi:10.1177/09622802211047348>, and McGrath et al. (2023) <doi:10.1177/09622802221139233>) are implemented to estimate study-specific (difference of) means and their standard errors in order to estimate the pooled (difference of) means. Methods for meta-analyzing median survival times (McGrath et al. (2025) <doi:10.48550/arXiv.2503.03065>) are also implemented. See McGrath et al. (2024) <doi:10.1002/jrsm.1686> for a detailed guide on using the package.

Maintained by Sean McGrath. Last updated 9 days ago.

9.7 match 9 stars 4.86 score 16 scripts

bioc

xcms:LC-MS and GC-MS Data Analysis

Framework for processing and visualization of chromatographically separated and single-spectra mass spectral data. Imports from AIA/ANDI NetCDF, mzXML, mzData and mzML files. Preprocesses data for high-throughput, untargeted analyte profiling.

Maintained by Steffen Neumann. Last updated 3 days ago.

immunooncology massspectrometry metabolomics bioconductor feature-detection mass-spectrometry peak-detection cpp

3.3 match 196 stars 14.31 score 984 scripts 11 dependents

usaid-oha-si

gophr:Utility functions related to working with the MER Structured Dataset

This package contains a number of functions for working with the PEPFAR MSD.

Maintained by Aaron Chafetz. Last updated 4 months ago.

7.5 match 1 stars 6.21 score 182 scripts 1 dependents

bioc

iNETgrate:Integrates DNA methylation data with gene expression in a single gene network

The iNETgrate package provides functions to build a correlation network in which nodes are genes. DNA methylation and gene expression data are integrated to define the connections between genes. This network is used to identify modules (clusters) of genes. The biological information in each of the resulting modules is represented by an eigengene. These biological signatures can be used as features e.g., for classification of patients into risk categories. The resulting biological signatures are very robust and give a holistic view of the underlying molecular changes.

Maintained by Habil Zare. Last updated 5 months ago.

geneexpression rnaseq dnamethylation networkinference network graphandnetwork biomedicalinformatics systemsbiology transcriptomics classification clustering dimensionreduction principalcomponent mrnamicroarray normalization geneprediction kegg survival core-services

7.5 match 74 stars 6.21 score 1 scripts

chandlerxiandeyang

CleaningValidation:Cleaning Validation Functions for Pharmaceutical Cleaning Process

Provides essential Cleaning Validation functions for complying with pharmaceutical cleaning process regulatory standards. The package includes non-parametric methods to analyze drug active-ingredient residue (DAR), cleaning agent residue (CAR), and microbial colonies (Mic) for non-Poisson distributions. Additionally, Poisson methods are provided for Mic analysis when Mic data follow a Poisson distribution.

Maintained by Xiande Yang. Last updated 10 months ago.

17.1 match 2.70 score

bioc

MultiAssayExperiment:Software for the integration of multi-omics experiments in Bioconductor

Harmonize data management of multiple experimental assays performed on an overlapping set of specimens. It provides a familiar Bioconductor user experience by extending concepts from SummarizedExperiment, supporting an open-ended mix of standard data classes for individual assays, and allowing subsetting by genomic ranges or rownames. Facilities are provided for reshaping data into wide and long formats for adaptability to graphing and downstream analysis.

Maintained by Marcel Ramos. Last updated 2 months ago.

infrastructure datarepresentation bioconductor bioconductor-package genomics nci-itcr tcga u24ca289073

3.0 match 71 stars 14.95 score 670 scripts 127 dependents

alexchristensen

SemNetCleaner:An Automated Cleaning Tool for Semantic and Linguistic Data

Implements several functions that automate the cleaning and spell-checking of text data. Also converges, finalizes, removes plurals and continuous strings, and puts text data in binary format for semantic network analysis. Uses the 'SemNetDictionaries' package to make the cleaning process more accurate, efficient, and reproducible.

Maintained by Alexander P. Christensen. Last updated 3 years ago.

preprocessing semantic-network-analysis

7.2 match 10 stars 6.16 score 48 scripts 1 dependents

taxonomicallyinformedannotation

tima:Taxonomically Informed Metabolite Annotation

This package provides the infrastructure to perform Taxonomically Informed Metabolite Annotation.

Maintained by Adriano Rutz. Last updated 6 days ago.

metabolite annotation chemotaxonomy scoring system natural products computational metabolomics taxonomic distance specialized metabolome

6.8 match 9 stars 6.55 score 32 scripts 2 dependents

tidymodels

textrecipes:Extra 'Recipes' for Text Processing

Converting text to numerical features requires specifically created procedures, which are implemented as steps according to the 'recipes' package. These steps allow for tokenization, filtering, counting (tf and tfidf) and feature hashing.

Maintained by Emil Hvitfeldt. Last updated 9 days ago.

4.0 match 160 stars 10.87 score 964 scripts 1 dependents

renands

RMLPCA:Maximum Likelihood Principal Component Analysis

R implementation of Maximum Likelihood Principal Component Analysis. The main idea of this package is to have an alternative way of PCA for subspace modeling that considers measurement errors. More details can be found in Peter D. Wentzell (2009) <doi:10.1016/B978-0-444-64165-6.03029-9>.

Maintained by Renan Santos Barbosa. Last updated 4 years ago.

13.7 match 2 stars 3.15 score 14 scripts

ikosmidis

cranly:Package Directives and Collaboration Networks in CRAN

Core visualizations and summaries for the CRAN package database. The package provides comprehensive methods for cleaning up and organizing the information in the CRAN package database, for building package directives networks (depends, imports, suggests, enhances, linking to) and collaboration networks, producing package dependence trees, and for computing useful summaries and producing interactive visualizations from the resulting networks and summaries. The resulting networks can be coerced to 'igraph' <https://CRAN.R-project.org/package=igraph> objects for further analyses and modelling.

Maintained by Ioannis Kosmidis. Last updated 3 years ago.

network-analysis network-visualization

6.2 match 49 stars 6.85 score 32 scripts 1 dependents

nelson-gon

mde:Missing Data Explorer

Correct identification and handling of missing data is one of the most important steps in any analysis. To aid this process, 'mde' provides a very easy to use yet robust framework to quickly get an idea of where the missing data lies and therefore find the most appropriate action to take. Graham WJ (2009) <doi:10.1146/annurev.psych.58.110405.085530>.

Maintained by Nelson Gonzabato. Last updated 3 years ago.

data-analysis data-cleaning data-exploration data-science datacleaner datacleaning exploratory-data-analysis missing missing-data missing-value-treatment missing-values missingness omit recode replace statistics

7.5 match 4 stars 5.61 score 34 scripts

kwb-r

kwb.endnote:Helper Functions for Analysing KWB Endnote Library (Exported as .xml)

Helper Functions For Analysing KWB Endnote Library (Exported As .XML).

Maintained by Michael Rustler. Last updated 4 years ago.

endnote knowledge-repo literature-data-management project-fakin publication

14.0 match 3.00 score 2 scripts

usaid-oha-si

glamr:SI Utilities Package

Provides a series of base functions useful to the GH OHA SI team. This includes project setup, pulling from DATIM, and key functions for working with the MSD.

Maintained by Aaron Chafetz. Last updated 6 months ago.

5.7 match 2 stars 7.28 score 1.3k scripts 1 dependents

bioc

affy:Methods for Affymetrix Oligonucleotide Arrays

The package contains functions for exploratory oligonucleotide array analysis. The dependence on tkWidgets only concerns a few convenience functions. 'affy' is fully functional without it.

Maintained by Robert D. Shear. Last updated 2 months ago.

microarray onechannel preprocessing

3.8 match 11.12 score 2.5k scripts 98 dependents

r-lib

pkgdown:Make Static HTML Documentation for a Package

Generate an attractive and useful website from a source package. 'pkgdown' converts your documentation, vignettes, 'README', and more to 'HTML' making it easy to share information about your package online.

Maintained by Hadley Wickham. Last updated 11 hours ago.

documentation-tool

2.3 match 734 stars 18.47 score 588 scripts 162 dependents

yihui

knitr:A General-Purpose Package for Dynamic Report Generation in R

Provides a general-purpose tool for dynamic report generation in R using Literate Programming techniques.

Maintained by Yihui Xie. Last updated 4 hours ago.

dynamic-documents knitr literate-programming rmarkdown sweave

1.8 match 2.4k stars 23.61 score 116k scripts 4.2k dependents

rstudio

tfruns:Training Run Tools for 'TensorFlow'

Create and manage unique directories for each 'TensorFlow' training run. Provides a unique, time stamped directory for each run along with functions to retrieve the directory of the latest run or latest several runs.

Maintained by Tomasz Kalinowski. Last updated 11 months ago.

3.5 match 34 stars 11.80 score 325 scripts 77 dependents

pecanproject

PEcAn.settings:PEcAn Settings package

Contains functions to read PEcAn settings files.

Maintained by David LeBauer. Last updated 2 days ago.

bayesian cyberinfrastructure data-assimilation data-science ecosystem-model ecosystem-science forecasting meta-analysis national-science-foundation pecan plants

4.1 match 216 stars 10.00 score 54 scripts 17 dependents

blasbenito

distantia:Advanced Toolset for Efficient Time Series Dissimilarity Analysis

Fast C++ implementation of Dynamic Time Warping for time series dissimilarity analysis, with applications in environmental monitoring and sensor data analysis, climate science, signal processing and pattern recognition, and financial data analysis. Built upon the ideas presented in Benito and Birks (2020) <doi:10.1111/ecog.04895>, provides tools for analyzing time series of varying lengths and structures, including irregular multivariate time series. Key features include individual variable contribution analysis, restricted permutation tests for statistical significance, and imputation of missing data via GAMs. Additionally, the package provides an ample set of tools to prepare and manage time series data.

Maintained by Blas M. Benito. Last updated 25 days ago.

7.1 match 23 stars 5.76 score 11 scripts

weecology

portalr:Create Useful Summaries of the Portal Data

Download and generate summaries for the rodent, plant, ant, and weather data from the Portal Project. Portal is a long-term (and ongoing) experimental monitoring site in the Chihuahuan desert. The raw data files can be found at <https://github.com/weecology/portaldata>.

Maintained by Glenda M. Yenni. Last updated 4 months ago.

community-ecology ecology small-mammal-trapping

5.3 match 11 stars 7.64 score 63 scripts

bioc

GWASTools:Tools for Genome Wide Association Studies

Classes for storing very large GWAS data sets and annotation, and functions for GWAS data cleaning and analysis.

Maintained by Stephanie M. Gogarten. Last updated 5 months ago.

snp geneticvariability qualitycontrol microarray

3.9 match 17 stars 10.50 score 396 scripts 5 dependents

r-spatial

qgisprocess:Use 'QGIS' Processing Algorithms

Provides seamless access to the 'QGIS' (<https://qgis.org>) processing toolbox using the standalone 'qgis_process' command-line utility. Both native and third-party (plugin) processing providers are supported. Besides referencing data sources from file, common objects from 'sf', 'terra' and 'stars' are also supported. The native processing algorithms are documented by QGIS.org (2024) <https://docs.qgis.org/latest/en/docs/user_manual/processing_algs/>.

Maintained by Floris Vanderhaeghe. Last updated 5 months ago.

4.0 match 210 stars 10.09 score 175 scripts

pachadotdev

cpp11armadillo:An 'Armadillo' Interface

Provides function declarations and inline function definitions that facilitate communication between R and the 'Armadillo' 'C++' library for linear algebra and scientific computing. This implementation is detailed in Vargas Sepulveda and Schneider Malamud (2024) <doi:10.48550/arXiv.2408.11074>.

Maintained by Mauricio Vargas Sepulveda. Last updated 26 days ago.

armadillo cpp cpp11 hacktoberfest linear-algebra

4.4 match 9 stars 9.14 score 1 scripts 16 dependents

inbo

checklist:A Thorough and Strict Set of Checks for R Packages and Source Code

An opinionated set of rules for R packages and R source code projects.

Maintained by Thierry Onkelinx. Last updated 27 days ago.

checklist continuous-integration continuous-testing quality-assurance

5.5 match 19 stars 7.24 score 21 scripts 2 dependents

rstudio

packrat:A Dependency Management System for Projects and their R Package Dependencies

Manage the R packages your project depends on in an isolated, portable, and reproducible way.

Maintained by Aron Atkins. Last updated 1 month ago.

3.3 match 406 stars 12.15 score 256 scripts 9 dependents

datalowe

synr:Explore and Process Synesthesia Consistency Test Data

Explore synesthesia consistency test data, calculate consistency scores, and classify participant data as valid or invalid.

Maintained by Lowe Wilsson. Last updated 1 year ago.

data-cleaning synesthesia

7.5 match 5.32 score 139 scripts

bioc

ShortRead:FASTQ input and manipulation

This package implements sampling, iteration, and input of FASTQ files. The package includes functions for filtering and trimming reads, and for generating a quality assessment report. Data are represented as DNAStringSet-derived objects, and easily manipulated for a diversity of purposes. The package also contains legacy support for early single-end, ungapped alignment formats.

Maintained by Bioconductor Package Maintainer. Last updated 5 months ago.

dataimport sequencing qualitycontrol bioconductor-package core-package zlib cpp

3.3 match 8 stars 12.08 score 1.8k scripts 49 dependents

brunobrr

bdc:Biodiversity Data Cleaning

It brings together several aspects of biodiversity data-cleaning in one place. 'bdc' is organized in thematic modules related to different biodiversity dimensions, including 1) Merge datasets: standardization and integration of different datasets; 2) Pre-filter: flagging and removal of invalid or non-interpretable information, followed by data amendments; 3) Taxonomy: cleaning, parsing, and harmonization of scientific names from several taxonomic groups against taxonomic databases locally stored through the application of exact and partial matching algorithms; 4) Space: flagging of erroneous, suspect, and low-precision geographic coordinates; and 5) Time: flagging and, whenever possible, correction of inconsistent collection date. In addition, it contains features to visualize, document, and report data quality – which is essential for making data quality assessment transparent and reproducible. The reference for the methodology is Bruno et al. (2022) <doi:10.1111/2041-210X.13868>.

Maintained by Bruno Ribeiro. Last updated 3 months ago.

bdc biodiversity-data workflow

6.0 match 24 stars 6.66 score 53 scripts

bioc

beer:Bayesian Enrichment Estimation in R

BEER implements a Bayesian model for analyzing phage-immunoprecipitation sequencing (PhIP-seq) data. Given a PhIPData object, BEER returns posterior probabilities of enriched antibody responses, point estimates for the relative fold-change in comparison to negative control samples, and more. Additionally, BEER provides a convenient implementation for using edgeR to identify enriched antibody responses.

Maintained by Athena Chen. Last updated 5 months ago.

software statisticalmethod bayesian sequencing coverage jags cpp

7.4 match 10 stars 5.38 score 12 scripts

ambuvjyn

baseq:Basic Sequence Processing Tool for Biological Data

Primarily created as an easy and understandable way to perform basic sequence operations surrounding the central dogma of molecular biology.

Maintained by Ambu Vijayan. Last updated 2 years ago.

bioinformatics sequencing

9.9 match 2 stars 4.00 score

gpilgrim2670

SwimmeR:Data Import, Cleaning, and Conversions for Swimming Results

The goal of the 'SwimmeR' package is to provide means of acquiring, and then analyzing, data from swimming (and diving) competitions. To that end 'SwimmeR' allows results to be read in from .html sources, like 'Hy-Tek' real time results pages, '.pdf' files, 'ISL' results, 'Omega' results, and (on a development basis) '.hy3' files. Once read in, 'SwimmeR' can convert swimming times (performances) between the computationally useful format of seconds reported to the '100ths' place (e.g. 95.37), and the conventional reporting format (1:35.37) used in the swimming community. 'SwimmeR' can also score meets in a variety of formats with user defined point values, convert times between courses ('LCM', 'SCM', 'SCY') and draw single elimination brackets, as well as providing a suite of tools for cleaning swimming data. This is a developmental package, not yet mature.

Maintained by Greg Pilgrim. Last updated 2 years ago.

8.6 match 4 stars 4.53 score 17 scripts
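
The time conversion described for 'SwimmeR' (95.37 seconds versus 1:35.37) can be sketched as follows. This is an illustrative Python sketch, not the package's own functions: the names `seconds_to_swim_time` and `swim_time_to_seconds` are hypothetical, and the sketch assumes times under one hour reported to hundredths of a second.

```python
def seconds_to_swim_time(seconds: float) -> str:
    """Format seconds (to the hundredths place) as m:ss.hh, or ss.hh under a minute."""
    hundredths = round(seconds * 100)          # work in integers to avoid float drift
    minutes, rem = divmod(hundredths, 6000)    # 6000 hundredths per minute
    secs, hund = divmod(rem, 100)
    if minutes:
        return f"{minutes}:{secs:02d}.{hund:02d}"
    return f"{secs}.{hund:02d}"


def swim_time_to_seconds(text: str) -> float:
    """Parse m:ss.hh (or ss.hh) back into seconds."""
    minutes, _, rest = text.rpartition(":")    # minutes is "" when there is no colon
    total = float(rest) + (60 * int(minutes) if minutes else 0)
    return round(total, 2)


formatted = seconds_to_swim_time(95.37)
parsed = swim_time_to_seconds("1:35.37")
```

Working in integer hundredths keeps the formatting step free of floating-point rounding surprises; hour-long performances would need an extra field that this sketch omits.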

r-lib

devtools:Tools to Make Developing R Packages Easier

Collection of package development tools.

Maintained by Jennifer Bryan. Last updated 6 months ago.

package-creation

2.0 match 2.4k stars 19.51 score 51k scripts 148 dependents

wenlong-liu

usfertilizer:County-Level Estimates of Fertilizer Application in USA

Compiled and cleaned the county-level estimates of fertilizer, nitrogen and phosphorus, from 1945 to 2012 in the United States of America (USA). The commercial fertilizer data were originally generated by USGS based on the sales data of commercial fertilizer. The manure data were estimated based on county-level population data of livestock, poultry, and other animals. See the user manual for detailed data sources and cleaning methods. 'usfertilizer' used the tidyverse to clean the original data and provide a user-friendly data frame. Please note that USGS does not endorse this package. Also, data from 1986 are not available for now.

Maintained by Wenlong Liu. Last updated 7 years ago.

datasets tidyverse

8.9 match 11 stars 4.34 score 1 scripts

tvganesh

cricketr:Analyze Cricketers and Cricket Teams Based on ESPN Cricinfo Statsguru

Tools for analyzing performances of cricketers based on stats in ESPN Cricinfo Statsguru. The toolset can be used for analysis of Tests, ODIs and Twenty20 matches of both batsmen and bowlers. The package can also be used to analyze team performances.

Maintained by Tinniam V Ganesh. Last updated 4 years ago.

6.9 match 62 stars 5.55 score 115 scripts

silvadenisson

electionsBR:R Functions to Download and Clean Brazilian Electoral Data

Offers a set of functions to easily download and clean Brazilian electoral data from the Superior Electoral Court and 'CepespData' websites. Among other features, the package retrieves data on local and federal elections for all positions (city councilor, mayor, state deputy, federal deputy, governor, and president) aggregated by state, city, and electoral zones.

Maintained by Denisson Silva. Last updated 4 months ago.

5.1 match 65 stars 7.54 score 66 scripts

msberends

AMR:Antimicrobial Resistance Data Analysis

Functions to simplify and standardise antimicrobial resistance (AMR) data analysis and to work with microbial and antimicrobial properties by using evidence-based methods, as described in <doi:10.18637/jss.v104.i03>.

Maintained by Matthijs S. Berends. Last updated 7 hours ago.

amr antimicrobial-data epidemiology microbiology software

3.2 match 92 stars 11.87 score 182 scripts 6 dependents

ropensci

taxlist:Handling Taxonomic Lists

Handling taxonomic lists through objects of class 'taxlist'. This package provides functions to import species lists from 'Turboveg' (<https://www.synbiosys.alterra.nl/turboveg/>) and the possibility to create backups from resulting R-objects. Also quick displays are implemented as summary-methods.

Maintained by Miguel Alvarez. Last updated 6 months ago.

5.3 match 12 stars 7.07 score 81 scripts 2 dependents

tirgit

missCompare:Intuitive Missing Data Imputation Framework

Offers a convenient pipeline to test and compare various missing data imputation algorithms on simulated and real data. These include simpler methods, such as mean and median imputation and random replacement, but also include more sophisticated algorithms already implemented in popular R packages, such as 'mi', described by Su et al. (2011) <doi:10.18637/jss.v045.i02>; 'mice', described by van Buuren and Groothuis-Oudshoorn (2011) <doi:10.18637/jss.v045.i03>; 'missForest', described by Stekhoven and Buhlmann (2012) <doi:10.1093/bioinformatics/btr597>; 'missMDA', described by Josse and Husson (2016) <doi:10.18637/jss.v070.i01>; and 'pcaMethods', described by Stacklies et al. (2007) <doi:10.1093/bioinformatics/btm069>. The central assumption behind 'missCompare' is that structurally different datasets (e.g. larger datasets with a large number of correlated variables vs. smaller datasets with non correlated variables) will benefit differently from different missing data imputation algorithms. 'missCompare' takes measurements of your dataset and sets up a sandbox to try a curated list of standard and sophisticated missing data imputation algorithms and compares them assuming custom missingness patterns. 'missCompare' will also impute your real-life dataset for you after the selection of the best performing algorithm in the simulations. The package also provides various post-imputation diagnostics and visualizations to help you assess imputation performance.

Maintained by Tibor V. Varga. Last updated 4 years ago.

comparison comparison-benchmarks imputation imputation-algorithm imputation-methods imputations kolmogorov-smirnov missing missing-data missing-data-imputation missing-status-check missing-values missingness post-imputation-diagnostics rmse

6.3 match 39 stars 5.89 score 40 scripts

rstudio

rmarkdown:Dynamic Documents for R

Convert R Markdown documents into a variety of formats.

Maintained by Yihui Xie. Last updated 4 months ago.

literate-programming markdown pandoc rmarkdown

1.7 match 2.9k stars 21.79 score 14k scripts 3.7k dependents

ecohealthalliance

ohcleandat:One Health Data Cleaning and Quality Checking Package

This package provides useful functions to orchestrate analytics and data cleaning pipelines for One Health projects.

Maintained by Collin Schwantes. Last updated 5 days ago.

7.6 match 1 star 4.88 score 5 scripts

samhforbes

eyetrackingR:Eye-Tracking Data Analysis

Addresses tasks along the pipeline from raw data to analysis and visualization for eye-tracking data. Offers several popular types of analyses, including linear and growth curve time analyses, onset-contingent reaction time analyses, as well as several non-parametric bootstrapping approaches. For references to the approach see Mirman, Dixon & Magnuson (2008) <doi:10.1016/j.jml.2007.11.006>, and Barr (2008) <doi:10.1016/j.jml.2007.09.002>.

Maintained by Samuel Forbes. Last updated 2 years ago.

4.7 match 22 stars 7.84 score 60 scripts

bioc

DropletUtils:Utilities for Handling Single-Cell Droplet Data

Provides a number of utility functions for handling single-cell (RNA-seq) data from droplet technologies such as 10X Genomics. This includes data loading from count matrices or molecule information files, identification of cells from empty droplets, removal of barcode-swapped pseudo-cells, and downsampling of the count matrix.

Maintained by Jonathan Griffiths. Last updated 3 months ago.

immunooncology singlecell sequencing rnaseq geneexpression transcriptomics dataimport coverage zlib cpp

3.7 match 10.08 score 2.7k scripts 9 dependents

tguillerme

dispRity:Measuring Disparity

A modular package for measuring disparity (multidimensional space occupancy). Disparity can be calculated from any matrix defining a multidimensional space. The package provides a set of implemented metrics to measure properties of the space and allows users to provide and test their own metrics. The package also provides functions for looking at disparity in a serial way (e.g. disparity through time) or per groups as well as visualising the results. Finally, this package provides several statistical tests for disparity analysis.

Maintained by Thomas Guillerme. Last updated 2 days ago.

disparity ecology multidimensionality palaeobiology

4.3 match 26 stars 8.69 score 220 scripts 1 dependent

hannahcomiskey

mcmsupply:Estimating Public and Private Sector Contraceptive Market Supply Shares

Family Planning programs and initiatives typically use nationally representative surveys to estimate key indicators of a country’s family planning progress. However, in recent years, routinely collected family planning services data (Service Statistics) have been used as a supplementary data source to bridge gaps in the surveys. The use of service statistics comes with the caveat that adjustments need to be made for missing private sector contributions to the contraceptive method supply chain. Evaluating the supply source of modern contraceptives often relies on Demographic Health Surveys (DHS), where many countries do not have recent data beyond 2015/16. Fortunately, in the absence of recent surveys we can rely on statistical model-based estimates and projections to fill the knowledge gap. We present a Bayesian, hierarchical, penalized-spline model with multivariate-normal spline coefficients, to account for across-method correlations, to produce country-specific, annual estimates for the proportion of modern contraceptive methods coming from the public and private sectors. This package provides a quick and convenient way for users to access the DHS modern contraceptive supply share data at national and subnational administration levels, estimate, evaluate and plot annual estimates with uncertainty for a sample of low- and middle-income countries. Methods for the estimation of method supply shares at the national level are described in Comiskey, Alkema, Cahill (2022) <arXiv:2212.03844>.

Maintained by Hannah Comiskey. Last updated 12 months ago.

jags cpp

7.0 match 2 stars 5.15 score 20 scripts

cran

textreg:n-Gram Text Regression, aka Concise Comparative Summarization

Function for sparse regression on raw text, regressing a labeling vector onto a feature space consisting of all possible phrases.

Maintained by Luke Miratrix. Last updated 6 years ago.

cpp

11.0 match 1 star 3.26 score

tom-wolff

ideanet:Integrating Data Exchange and Analysis for Networks ('ideanet')

A suite of convenient tools for social network analysis geared toward students, entry-level users, and non-expert practitioners. ‘ideanet’ features unique functions for the processing and measurement of sociocentric and egocentric network data. These functions automatically generate node- and system-level measures commonly used in the analysis of these types of networks. Outputs from these functions maximize the ability of novice users to employ network measurements in further analyses while making all users less prone to common data analytic errors. Additionally, ‘ideanet’ features an R Shiny graphic user interface that allows novices to explore network data with minimal need for coding.

Maintained by Tom Wolff. Last updated 3 days ago.

5.3 match 6 stars 6.80 score 10 scripts

cloudyr

googleComputeEngineR:R Interface with Google Compute Engine

Interact with the 'Google Compute Engine' API in R. Lets you create, start and stop instances in the 'Google Cloud'. Support for preconfigured instances, with templates for common R needs.

Maintained by Mark Edmondson. Last updated 1 day ago.

api cloud-computing cloudyr google-cloud googleauthr launching-virtual-machines

3.7 match 152 stars 9.73 score 235 scripts

harrison4192

framecleaner:Clean Data Frames

Provides a friendly interface for modifying data frames with a sequence of piped commands built upon the 'tidyverse', Wickham et al. (2019) <doi:10.21105/joss.01686>. The majority of commands wrap 'dplyr' mutate statements in a convenient way to concisely solve common issues that arise when tidying small to medium data sets. Includes smart defaults and allows flexible selection of columns via 'tidyselect'.

Maintained by Harrison Tietze. Last updated 1 year ago.

6.8 match 2 stars 5.18 score 5 scripts 5 dependents

bioboot

bio3d:Biological Structure Analysis

Utilities to process, organize and explore protein structure, sequence and dynamics data. Features include the ability to read and write structure, sequence and dynamic trajectory data, perform sequence and structure database searches, data summaries, atom selection, alignment, superposition, rigid core identification, clustering, torsion analysis, distance matrix analysis, structure and sequence conservation analysis, normal mode analysis, principal component analysis of heterogeneous structure data, and correlation network analysis from normal mode and molecular dynamics data. In addition, various utility functions are provided to enable the statistical and graphical power of the R environment to work with biological sequence and structural data. Please refer to the URLs below for more information.

Maintained by Barry Grant. Last updated 5 months ago.

zlib cpp

4.1 match 5 stars 8.49 score 1.4k scripts 10 dependents

bgreenwell

bpa:Basic Pattern Analysis

Run basic pattern analyses on character sets, digits, or combined input containing both characters and numeric digits. Useful for data cleaning and for identifying columns containing multiple or nonstandard formats.

Maintained by Brandon Greenwell. Last updated 9 years ago.

basic-pattern-analysis data-cleaning standardization

8.0 match 3 stars 4.32 score 14 scripts

cardiomoon

ggiraphExtra:Make Interactive 'ggplot2'. Extension to 'ggplot2' and 'ggiraph'

Collection of functions to enhance 'ggplot2' and 'ggiraph'. Provides functions for exploratory plots. All plots can be either a 'static' plot or an 'interactive' plot using 'ggiraph'.

Maintained by Keon-Woong Moon. Last updated 4 years ago.

3.9 match 48 stars 8.93 score 402 scripts 3 dependents

bioc

scde:Single Cell Differential Expression

The scde package implements a set of statistical methods for analyzing single-cell RNA-seq data. scde fits individual error models for single-cell RNA-seq measurements. These models can then be used for assessment of differential expression between groups of cells, as well as other types of analysis. The scde package also contains the pagoda framework which applies pathway and gene set overdispersion analysis to identify and characterize putative cell subpopulations based on transcriptional signatures. The overall approach to the differential expression analysis is detailed in the following publication: "Bayesian approach to single-cell differential expression analysis" (Kharchenko PV, Silberstein L, Scadden DT, Nature Methods, doi: 10.1038/nmeth.2967). The overall approach to subpopulation identification and characterization is detailed in the following pre-print: "Characterizing transcriptional heterogeneity through pathway and gene set overdispersion analysis" (Fan J, Salathia N, Liu R, Kaeser G, Yung Y, Herman J, Kaper F, Fan JB, Zhang K, Chun J, and Kharchenko PV, Nature Methods, doi:10.1038/nmeth.3734).

Maintained by Evan Biederstedt. Last updated 5 months ago.

immunooncology rnaseq statisticalmethod differentialexpression bayesian transcription software analysis bioinformatics heterogenity ngs single-cell transcriptomics openblas cpp openmp

4.5 match 173 stars 7.53 score 141 scripts

jaseziv

worldfootballR:Extract and Clean World Football (Soccer) Data

Allows users to obtain clean and tidy football (soccer) game, team and player data. Data is collected from a number of popular sites, including 'FBref', transfer and valuations data from 'Transfermarkt' <https://www.transfermarkt.com/> and shooting location and other match stats data from 'Understat' <https://understat.com/>. It gives users the ability to access data more efficiently, rather than having to export data tables to files before being able to complete their analysis.

Maintained by Jason Zivkovic. Last updated 1 month ago.

fbref football football-data soccer-data sports-data transfermarkt understat

3.4 match 506 stars 9.89 score 516 scripts 2 dependents

e-sensing

sits:Satellite Image Time Series Analysis for Earth Observation Data Cubes

An end-to-end toolkit for land use and land cover classification using big Earth observation data, based on machine learning methods applied to satellite image data cubes, as described in Simoes et al (2021) <doi:10.3390/rs13132428>. Builds regular data cubes from collections in AWS, Microsoft Planetary Computer, Brazil Data Cube, Copernicus Data Space Environment (CDSE), Digital Earth Africa, Digital Earth Australia, NASA HLS using the Spatio-temporal Asset Catalog (STAC) protocol (<https://stacspec.org/>) and the 'gdalcubes' R package developed by Appel and Pebesma (2019) <doi:10.3390/data4030092>. Supports visualization methods for images and time series and smoothing filters for dealing with noisy time series. Includes functions for quality assessment of training samples using self-organized maps as presented by Santos et al (2021) <doi:10.1016/j.isprsjprs.2021.04.014>. Includes methods to reduce training samples imbalance proposed by Chawla et al (2002) <doi:10.1613/jair.953>. Provides machine learning methods including support vector machines, random forests, extreme gradient boosting, multi-layer perceptrons, temporal convolutional neural networks proposed by Pelletier et al (2019) <doi:10.3390/rs11050523>, and temporal attention encoders by Garnot and Landrieu (2020) <doi:10.48550/arXiv.2007.00586>. Supports GPU processing of deep learning models using torch <https://torch.mlverse.org/>. Performs efficient classification of big Earth observation data cubes and includes functions for post-classification smoothing based on Bayesian inference as described by Camara et al (2024) <doi:10.3390/rs16234572>, and methods for active learning and uncertainty assessment. Supports region-based time series analysis using package supercells <https://jakubnowosad.com/supercells/>. Enables best practices for estimating area and assessing accuracy of land change as recommended by Olofsson et al (2014) <doi:10.1016/j.rse.2014.02.015>. 
Minimum recommended requirements: 16 GB RAM and 4 CPU dual-core.

Maintained by Gilberto Camara. Last updated 1 month ago.

big-earth-data cbers earth-observation eo-datacubes geospatial image-time-series land-cover-classification landsat planetary-computer r-spatial remote-sensing rspatial satellite-image-time-series satellite-imagery sentinel-2 stac-api stac-catalog cpp

3.5 match 494 stars 9.50 score 384 scripts

alphaprime7

normfluodbf:Cleans and Normalizes FLUOstar DBF and DAT Files from 'Liposome' Flux Assays

Cleans and Normalizes FLUOstar DBF and DAT Files obtained from liposome flux assays. Users should verify extended usage of the package on files from other assay types.

Maintained by Tingwei Adeck. Last updated 4 months ago.

6.5 match 1 star 4.98 score 12 scripts

jbdorey

BeeBDC:Occurrence Data Cleaning

Flags and checks occurrence data that are in Darwin Core format. The package includes generic functions and data as well as some that are specific to bees. This package is meant to build upon and be complementary to other excellent occurrence cleaning packages, including 'bdc' and 'CoordinateCleaner'. This package uses datasets from several sources and particularly from the Discover Life Website, created by Ascher and Pickering (2020). For further information, please see the original publication and package website. Publication - Dorey et al. (2023) <doi:10.1101/2023.06.30.547152> and package website - Dorey et al. (2023) <https://github.com/jbdorey/BeeBDC>.

Maintained by James B. Dorey. Last updated 4 months ago.

5.6 match 3 stars 5.68 score 7 scripts

rstudio

bookdown:Authoring Books and Technical Documents with R Markdown

Output formats and utilities for authoring books and technical documents with R Markdown.

Maintained by Yihui Xie. Last updated 2 days ago.

book bookdown epub gitbook html latex rmarkdown

1.8 match 3.9k stars 17.51 score 1.7k scripts 136 dependents

bioc

MSstatsConvert:Import Data from Various Mass Spectrometry Signal Processing Tools to MSstats Format

MSstatsConvert provides tools for importing reports of Mass Spectrometry data processing tools into R format suitable for statistical analysis using the MSstats and MSstatsTMT packages.

Maintained by Mateusz Staniak. Last updated 3 months ago.

massspectrometry proteomics software dataimport qualitycontrol

4.9 match 6.37 score 25 scripts 7 dependents

kassambara

rstatix:Pipe-Friendly Framework for Basic Statistical Tests

Provides a simple and intuitive pipe-friendly framework, coherent with the 'tidyverse' design philosophy, for performing basic statistical tests, including t-test, Wilcoxon test, ANOVA, Kruskal-Wallis and correlation analyses. The output of each test is automatically transformed into a tidy data frame to facilitate visualization. Additional functions are available for reshaping, reordering, manipulating and visualizing correlation matrix. Functions are also included to facilitate the analysis of factorial experiments, including purely 'within-Ss' designs (repeated measures), purely 'between-Ss' designs, and mixed 'within-and-between-Ss' designs. It's also possible to compute several effect size metrics, including "eta squared" for ANOVA, "Cohen's d" for t-test and 'Cramer V' for the association between categorical variables. The package contains helper functions for identifying univariate and multivariate outliers, assessing normality and homogeneity of variances.

Maintained by Alboukadel Kassambara. Last updated 2 years ago.

2.0 match 456 stars 15.16 score 11k scripts 420 dependents

jpquast

protti:Bottom-Up Proteomics and LiP-MS Quality Control and Data Analysis Tools

Useful functions and workflows for proteomics quality control and data analysis of both limited proteolysis-coupled mass spectrometry (LiP-MS) (Feng et al. (2014) <doi:10.1038/nbt.2999>) and regular bottom-up proteomics experiments. Data generated with search tools such as 'Spectronaut', 'MaxQuant' and 'Proteome Discoverer' can be easily used due to the flexibility of the functions.

Maintained by Jan-Philipp Quast. Last updated 5 months ago.

data-analysis lip-ms mass-spectrometry omics protein proteomics systems-biology

3.5 match 61 stars 8.58 score 83 scripts

cbailiss

pivottabler:Create Pivot Tables

Create regular pivot tables with just a few lines of R. More complex pivot tables can also be created, e.g. pivot tables with irregular layouts, multiple calculations and/or derived calculations based on multiple data frames. Pivot tables are constructed using R only and can be written to a range of output formats (plain text, 'HTML', 'Latex' and 'Excel'), including with styling/formatting.

Maintained by Christopher Bailiss. Last updated 1 year ago.

calculations html htmlwidget latex pivot-tables visualization

3.7 match 122 stars 8.08 score 358 scripts 1 dependent

usa-npn

rnpn:Interface to the National 'Phenology' Network 'API'

Programmatic interface to the Web Service methods provided by the National 'Phenology' Network (<https://usanpn.org/>), which includes data on various life history events that occur at specific times.

Maintained by Jeff Switzer. Last updated 5 days ago.

data national-phenology-network phenology species web-api

3.3 match 21 stars 8.82 score 109 scripts

bioc

flowClean:flowClean

A quality control tool for flow cytometry data based on compositional data analysis.

Maintained by Kipper Fletez-Brant. Last updated 5 months ago.

flowcytometry qualitycontrol immunooncology

6.5 match 4.56 score 18 scripts

christophergandrud

DataCombine:Tools for Easily Combining and Cleaning Data Sets

Tools for combining and cleaning data sets, particularly with grouped and time series data. This includes functions for merging data while reporting duplicates, filling in columns with values of a column in another data frame, and creating continuous time data for interrupted time series.

Maintained by Christopher Gandrud. Last updated 5 years ago.

3.4 match 55 stars 8.50 score 864 scripts 3 dependents

reconhub

matchmaker:Flexible Dictionary-Based Cleaning

Provides flexible dictionary-based cleaning that allows users to specify implicit and explicit missing data, regular expressions for both data and columns, and global matches, while respecting ordering of factors. This package is part of the 'RECON' (<https://www.repidemicsconsortium.org/>) toolkit for outbreak analysis.

Maintained by Zhian N. Kamvar. Last updated 5 years ago.

5.3 match 9 stars 5.43 score 9 scripts 2 dependents

bioc

iSEE:Interactive SummarizedExperiment Explorer

Create an interactive Shiny-based graphical user interface for exploring data stored in SummarizedExperiment objects, including row- and column-level metadata. The interface supports transmission of selections between plots and tables, code tracking, interactive tours, interactive or programmatic initialization, preservation of app state, and extensibility to new panel types via S4 classes. Special attention is given to single-cell data in a SingleCellExperiment object with visualization of dimensionality reduction results.

Maintained by Kevin Rue-Albrecht. Last updated 11 days ago.

cellbasedassays clustering dimensionreduction featureextraction geneexpression gui immunooncology shinyapps singlecell transcription transcriptomics visualization dimension-reduction feature-extraction gene-expression hacktoberfest human-cell-atlas shiny single-cell

2.3 match 225 stars 12.86 score 380 scripts 9 dependents

helixcn

phylotools:Phylogenetic Tools for Eco-Phylogenetics

A collection of tools for building a RAxML supermatrix using PHYLIP or aligned FASTA files. These functions will be useful for building large phylogenies using multiple markers.

Maintained by Jinlong Zhang. Last updated 5 months ago.

3.9 match 11 stars 7.31 score 368 scripts

ropensci

jstor:Read Data from JSTOR/DfR

Functions and helpers to import metadata, ngrams and full-texts delivered by Data for Research by JSTOR.

Maintained by Thomas Klebel. Last updated 8 months ago.

jstor peer-reviewed text-analysis text-mining

3.9 match 47 stars 7.29 score 55 scripts

thinkr-open

fusen:Build a Package from Rmarkdown Files

Use Rmarkdown First method to build your package. Start your package with documentation, functions, examples and tests in the same unique file. Everything can be set from the Rmarkdown template file provided in your project, then inflated as a package. Inflating the template copies the relevant chunks and sections in the appropriate files required for package development.

Maintained by Vincent Guyader. Last updated 2 months ago.

hacktoberfest rmd-first

3.0 match 163 stars 9.45 score 35 scripts

msperlin

BatchGetSymbols:Downloads and Organizes Financial Data for Multiple Tickers

Makes it easy to download financial data from Yahoo Finance <https://finance.yahoo.com/>.

Maintained by Marcelo Perlin. Last updated 3 years ago.

financial-data individual-stocks tickers yahoo-finance

3.8 match 18 stars 7.21 score 393 scripts

janmarvin

openxlsx2:Read, Write and Edit 'xlsx' Files

Simplifies the creation of 'xlsx' files by providing a high level interface to writing, styling and editing worksheets.

Maintained by Jan Marvin Garbuszus. Last updated 2 days ago.

xlsx cpp

2.0 match 138 stars 13.67 score 194 scripts 11 dependents

r-lib

covr:Test Coverage for Packages

Track and report code coverage for your package and (optionally) upload the results to a coverage service like 'Codecov' <https://about.codecov.io> or 'Coveralls' <https://coveralls.io>. Code coverage is a measure of the amount of code being exercised by a set of tests. It is an indirect measure of test quality and completeness. This package is compatible with any testing methodology or framework and tracks coverage of both R code and compiled C/C++/FORTRAN code.

Maintained by Jim Hester. Last updated 1 month ago.

codecov coverage coverage-report travis-ci

1.8 match 337 stars 15.25 score 2.3k scripts 9 dependents

ropensci

restez:Create and Query a Local Copy of 'GenBank' in R

Download large sections of 'GenBank' <https://www.ncbi.nlm.nih.gov/genbank/> and generate a local SQL-based database. A user can then query this database using 'restez' functions or through 'rentrez' <https://CRAN.R-project.org/package=rentrez> wrappers.

Maintained by Joel H. Nitta. Last updated 10 days ago.

dna entrez genbank sequence

3.8 match 26 stars 7.01 score 175 scripts 1 dependent

johncoene

packer:An Opinionated Framework for Using 'JavaScript'

Enforces good practice and provides convenience functions to make work with 'JavaScript' not just easier but also scalable. It is a robust wrapper to 'NPM', 'yarn', and 'webpack' that makes it possible to compartmentalize 'JavaScript' code, leverage 'NPM' and 'yarn' packages, include 'TypeScript', 'React', or 'Vue' in web applications, and much more.

Maintained by John Coene. Last updated 7 months ago.

javascript webpack

4.3 match 148 stars 6.25 score 1 script 2 dependents

r-lib

pak:Another Approach to Package Installation

The goal of 'pak' is to make package installation faster and more reliable. In particular, it performs all HTTP operations in parallel, so metadata resolution and package downloads are fast. Metadata and package files are cached on the local disk as well. 'pak' has a dependency solver, so it finds version conflicts before performing the installation. This version of 'pak' supports CRAN, 'Bioconductor' and 'GitHub' packages as well.

Maintained by Gábor Csárdi. Last updated 16 hours ago.

2.0 match 717 stars 13.05 score 277 scripts 17 dependents

r-lib

ps:List, Query, Manipulate System Processes

List, query and manipulate all system processes, on 'Windows', 'Linux' and 'macOS'.

Maintained by Gábor Csárdi. Last updated 17 days ago.

1.7 match 79 stars 15.09 score 108 scripts 1.5k dependents

r-spatial

rgee:R Bindings for Calling the 'Earth Engine' API

Earth Engine <https://earthengine.google.com/> client library for R. All of the 'Earth Engine' API classes, modules, and functions are made available. Additional functions implemented include importing (exporting) of Earth Engine spatial objects, extraction of time series, interactive map display, assets management interface, and metadata display. See <https://r-spatial.github.io/rgee/> for further details.

Maintained by Cesar Aybar. Last updated 4 days ago.

earth-engine earthengine google-earth-engine googleearthengine spatial-analysis spatial-data

1.9 match 715 stars 13.77 score 1.9k scripts 3 dependents

truenomad

epiCleanr:A Tidy Solution for Epidemiological Data

Offers a tidy solution for epidemiological data. It houses a range of functions for epidemiologists and public health data wizards for data management and cleaning.

Maintained by Mohamed A. Yusuf. Last updated 1 year ago.

7.0 match 3.70 score 4 scripts

capitalone

dataCompareR:Compare Two Data Frames and Summarise the Difference

Easy comparison of two tabular data objects in R. Specifically designed to show differences between two sets of data in a useful way that should make it easier to understand the differences, and if necessary, help you work out how to remedy them. Aims to offer a more useful output than all.equal() when your two data sets do not match, but isn't intended to replace all.equal() as a way to test for equality.

Maintained by Sarah Johnston. Last updated 2 years ago.

compare-data data data-analysis data-science

3.5 match 76 stars 7.24 score 76 scripts

rstudio

promises:Abstractions for Promise-Based Asynchronous Programming

Provides fundamental abstractions for doing asynchronous programming in R using promises. Asynchronous programming is useful for allowing a single R process to orchestrate multiple tasks in the background while also attending to something else. Semantics are similar to 'JavaScript' promises, but with a syntax that is idiomatic R.

Maintained by Joe Cheng. Last updated 1 month ago.

cpp

1.5 match 204 stars 17.10 score 688 scripts 2.6k dependents

oobianom

quickcode:Quick and Essential 'R' Tricks for Better Scripts

The NOT functions, 'R' tricks, and a compilation of simple, quick, and frequently used 'R' code to improve your scripts. Improve the quality and reproducibility of 'R' scripts.

Maintained by Obinna Obianom. Last updated 14 days ago.

colors data distributions images

3.3 match 5 stars 7.76 score 7 scripts 6 dependents

bioc

gdsfmt:R Interface to CoreArray Genomic Data Structure (GDS) Files

Provides a high-level R interface to CoreArray Genomic Data Structure (GDS) data files. GDS is portable across platforms with hierarchical structure to store multiple scalable array-oriented data sets with metadata information. It is suited for large-scale datasets, especially for data which are much larger than the available random-access memory. The gdsfmt package offers efficient operations specifically designed for integers of less than 8 bits, since a diploid genotype, like single-nucleotide polymorphism (SNP), usually occupies fewer bits than a byte. Data compression and decompression are available with relatively efficient random access. A GDS file can also be read in parallel by multiple R processes, as supported by the package parallel.

Maintained by Xiuwen Zheng. Last updated 2 days ago.

infrastructure dataimport bioinformatics gds-format genomics cpp

2.3 match 18 stars 11.34 score 920 scripts 29 dependents

dmurdoch

plotrix:Various Plotting Functions

Lots of plots, various labeling, axis and color scaling functions. The author/maintainer died in September 2023.

Maintained by Duncan Murdoch. Last updated 1 year ago.

2.3 match 5 stars 11.31 score 9.2k scripts 361 dependents

pik-piam

magclass:Data Class and Tools for Handling Spatial-Temporal Data

Data class for increased interoperability working with spatial-temporal data together with corresponding functions and methods (conversions, basic calculations and basic data manipulation). The class distinguishes between spatial, temporal and other dimensions to facilitate the development and interoperability of tools built for it. Additional features are name-based addressing of data and internal consistency checks (e.g. checking for the right data order in calculations).

Maintained by Jan Philipp Dietrich. Last updated 10 days ago.

2.3 match 5 stars 11.16 score 412 scripts 56 dependents

ropensci

birdsize:Estimate Avian Body Size Distributions

Generate estimated body size distributions for populations or communities of birds, given either species ID or species' mean body size. Designed to work naturally with the North American Breeding Bird Survey, or with any dataset of bird species, abundance, and/or mean size data.

Maintained by Renata Diaz. Last updated 1 year ago.

6.6 match 3 stars 3.78 score 8 scripts

geomarker-io

addr:Clean, Parse, Harmonize, Match, and Geocode Messy Real-World Addresses

Addresses that were not validated at the time of collection are often heterogeneously formatted, making them difficult to compare or link to other sets of addresses. The addr package is designed to clean character strings of addresses, use the `usaddress` library to tag address components, and paste together select components to create a normalized address. Normalized addresses can be hashed to create hashdresses that can be used to merge with other sets of addresses.

Maintained by Cole Brokamp. Last updated 5 months ago.

rust cargo

5.3 match 2 stars 4.70 score 388 scripts

josesamos

starschemar:Obtaining Stars from Flat Tables

Data in multidimensional systems is obtained from operational systems and is transformed to adapt it to the new structure. Frequently, the operations to be performed aim to transform a flat table into a star schema. Transformations can be carried out using professional extract, transform and load tools or tools intended for data transformation for end users. With the tools mentioned, this transformation can be carried out, but it requires a lot of work. The main objective of this package is to define transformations that allow obtaining stars from flat tables easily. In addition, it includes basic data cleaning, dimension enrichment, incremental data refresh and query operations, adapted to this context.

Maintained by Jose Samos. Last updated 11 months ago.

4.3 match 7 stars 5.66 score 11 scripts 2 dependents

c0webster

fedmatch:Fast, Flexible, and User-Friendly Record Linkage Methods

Provides a flexible set of tools for matching two un-linked data sets. 'fedmatch' allows for three ways to match data: exact matches, fuzzy matches, and multi-variable matches. It also allows an easy combination of these three matches via the tier matching function.

Maintained by Chris Webster. Last updated 1 month ago.

cpp openmp

5.3 match 1 star 4.62 score 80 scripts

tetratech

baytrends:Long Term Water Quality Trend Analysis

Enables users to evaluate long-term trends using a Generalized Additive Modeling (GAM) approach. The model development includes selecting a GAM structure to describe nonlinear seasonally-varying changes over time, incorporation of hydrologic variability via either a river flow or salinity, the use of an intervention to deal with method or laboratory changes suspected to impact data values, and representation of left- and interval-censored data. The approach has been applied to water quality data in the Chesapeake Bay, a major estuary on the east coast of the United States, to provide insights to a range of management- and research-focused questions. Methodology described in Murphy (2019) <doi:10.1016/j.envsoft.2019.03.027>.

Maintained by Erik W Leppo. Last updated 5 months ago.

3.6 match 12 stars 6.67 score 97 scripts

philchalmers

SimDesign:Structure for Organizing Monte Carlo Simulation Designs

Provides tools to safely and efficiently organize and execute Monte Carlo simulation experiments in R. The package controls the structure and back-end of Monte Carlo simulation experiments by utilizing a generate-analyse-summarise workflow. The workflow safeguards against common simulation coding issues: it automatically re-simulates non-convergent results, prevents inadvertent overwriting of simulation files, catches error and warning messages during execution, implicitly supports parallel processing with high-quality random number generation, and provides tools for managing high-performance computing (HPC) array jobs submitted to schedulers such as SLURM. For a pedagogical introduction to the package see Sigal and Chalmers (2016) <doi:10.1080/10691898.2016.1246953>. For a more in-depth overview of the package and its design philosophy see Chalmers and Adkins (2020) <doi:10.20982/tqmp.16.4.p248>.

Maintained by Phil Chalmers. Last updated 22 hours ago.

monte-carlo-simulation simulation simulation-framework

1.8 match 62 stars 13.36 score 253 scripts 46 dependents

kjhealy

gssrdoc:Document General Social Survey Variable

The General Social Survey (GSS) is a long-running, mostly annual survey of US households. It is administered by the National Opinion Research Center (NORC). This package contains a tibble with information on the survey variables, together with every variable documented as an R help page. For more information on the GSS see <http://gss.norc.org>.

Maintained by Kieran Healy. Last updated 11 months ago.

10.5 match 2.28 score 38 scripts

winvector

rquery:Relational Query Generator for Data Manipulation at Scale

A piped query generator based on Edgar F. Codd's relational algebra, and on production experience using 'SQL' and 'dplyr' at big data scale. The design represents an attempt to make 'SQL' more teachable by denoting composition by a sequential pipeline notation instead of nested queries or functions. The implementation delivers reliable high performance data processing on large data systems such as 'Spark', databases, and 'data.table'. Package features include: data processing trees or pipelines as observable objects (able to report both columns produced and columns used), optimized 'SQL' generation as an explicit user visible table modeling step, plus explicit query reasoning and checking.

Maintained by John Mount. Last updated 2 years ago.

2.5 match 110 stars 9.53 score 126 scripts 3 dependents

business-science

tidyquant:Tidy Quantitative Financial Analysis

Bringing business and financial analysis to the 'tidyverse'. The 'tidyquant' package provides a convenient wrapper to various 'xts', 'zoo', 'quantmod', 'TTR' and 'PerformanceAnalytics' package functions and returns the objects in the tidy 'tibble' format. The main advantage is being able to use quantitative functions with the 'tidyverse' functions including 'purrr', 'dplyr', 'tidyr', 'ggplot2', 'lubridate', etc. See the 'tidyquant' website for more information, documentation and examples.

Maintained by Matt Dancho. Last updated 1 month ago.

dplyr financial-analysis financial-data financial-statements multiple-stocks performance-analysis performanceanalytics quantmod stock stock-exchanges stock-indexes stock-lists stock-performance stock-prices stock-symbol tidyverse time-series timeseries xts

1.8 match 872 stars 13.34 score 5.2k scripts

vpnagraj

rrefine:r Client for OpenRefine API

'OpenRefine' (formerly 'Google Refine') is a popular open-source data cleaning tool. This package enables users to programmatically trigger data transfer between R and 'OpenRefine'. Available functionality includes project import, export and deletion.

Maintained by VP Nagraj. Last updated 2 years ago.

4.0 match 22 stars 5.77 score 27 scripts

nrennie

messy:Create Messy Data from Clean Data Frames

For the purposes of teaching, it is often desirable to show examples of working with messy data and how to clean it. This R package creates messy data from clean, tidy data frames so that students have a clean example to work towards.

Maintained by Nicola Rennie. Last updated 3 months ago.

teaching

3.9 match 141 stars 5.93 score 8 scripts

chgrl

bReeze:Functions for Wind Resource Assessment

A collection of functions to analyse, visualize and interpret wind data and to calculate the potential energy production of wind turbines.

Maintained by Christian Graul. Last updated 1 year ago.

5.3 match 20 stars 4.34 score 22 scripts

tjebo

eye:Analysis of Eye Data

There is no ophthalmic researcher who has not had headaches from the handling of visual acuity entries. Different notations, untidy entries. This shall now be a matter of the past. Eye makes it as easy as pie to work with VA data - easy cleaning, easy conversion between Snellen, logMAR, ETDRS letters, and qualitative visual acuity shall never pester you again. The eye package automates the pesky task of counting the number of patients and eyes, and can help to clean data with easy re-coding for right and left eyes. It also contains functions to help reshape eye side specific variables between wide and long format. Visual acuity conversion is based on Schulze-Bonsel et al. (2006) <doi:10.1167/iovs.05-0981>, Gregori et al. (2010) <doi:10.1097/iae.0b013e3181d87e04>, Beck et al. (2003) <doi:10.1016/s0002-9394(02)01825-1> and Bach (2007) <http://michaelbach.de/sci/acuity.html>.

Maintained by Tjebo Heeren. Last updated 3 years ago.

4.6 match 6 stars 4.92 score 14 scripts

mdlincoln

salty:Turn Clean Data into Messy Data

Take real or simulated data and salt it with errors commonly found in the wild, such as pseudo-OCR errors, Unicode problems, numeric fields with nonsensical punctuation, bad dates, etc.

Maintained by Matthew Lincoln. Last updated 7 months ago.

4.8 match 64 stars 4.81 score 20 scripts

poissonconsulting

batchr:Batch Process Files

Processes multiple files with a user-supplied function. The key design principle is that only files which were last modified before the directory was configured are processed. A hidden file stores the configuration time, function, etc., while successfully processed files are automatically touched to update their modification date. As a result, batch processing can be stopped and restarted, and any files created (or modified or deleted) during processing are ignored.

Maintained by Joe Thorley. Last updated 2 months ago.

batch-processing

5.0 match 6 stars 4.56 score 8 scripts

revelle

psych:Procedures for Psychological, Psychometric, and Personality Research

A general purpose toolbox developed originally for personality, psychometric theory and experimental psychology. Functions are primarily for multivariate analysis and scale construction using factor analysis, principal component analysis, cluster analysis and reliability analysis, although others provide basic descriptive statistics. Item Response Theory is done using factor analysis of tetrachoric and polychoric correlations. Functions for analyzing data at multiple levels include within and between group statistics, including correlations and factor analysis. Validation and cross validation of scales developed using basic machine learning algorithms are provided, as are functions for simulating and testing particular item and test structures. Several functions serve as a useful front end for structural equation modeling. Graphical displays of path diagrams, including mediation models, factor analysis and structural equation models are created using basic graphics. Some of the functions are written to support a book on psychometric theory as well as publications in personality research. For more information, see the <https://personality-project.org/r/> web page.

Maintained by William Revelle. Last updated 3 months ago.

1.6 match 52 stars 13.94 score 29k scripts 317 dependents

vincentarelbundock

modelsummary:Summary Tables and Plots for Statistical Models and Data: Beautiful, Customizable, and Publication-Ready

Create beautiful and customizable tables to summarize several statistical models side-by-side. Draw coefficient plots, multi-level cross-tabs, dataset summaries, balance tables (a.k.a. "Table 1s"), and correlation matrices. This package supports dozens of statistical models, and it can produce tables in HTML, LaTeX, Word, Markdown, PDF, PowerPoint, Excel, RTF, JPG, or PNG. Tables can easily be embedded in 'Rmarkdown' or 'knitr' dynamic documents. Details can be found in Arel-Bundock (2022) <doi:10.18637/jss.v103.i01>.

Maintained by Vincent Arel-Bundock. Last updated 15 days ago.

1.7 match 926 stars 13.41 score 6.2k scripts 2 dependents

miracum

DIZtools:Lightweight Utilities for 'DIZ' R Package Development

Lightweight utility functions used for the R package development infrastructure inside the data integration centers ('DIZ') to standardize and facilitate repetitive tasks such as setting up a database connection or issuing notification messages and to avoid redundancy.

Maintained by Jonathan M. Mang. Last updated 1 year ago.

snippets tools

5.5 match 3 stars 4.13 score 2 scripts 3 dependents

schochastics

networkdata:Repository of Network Datasets

The package contains a large collection of network datasets from different contexts. This includes social networks, animal networks and movie networks. All datasets are in 'igraph' format.

Maintained by David Schoch. Last updated 12 months ago.

dataset network-analysis

4.5 match 143 stars 5.01 score 143 scripts

matt-dray

oystr:Handle Personal Oyster Journey History Data Provided by Transport for London

You can opt-in to monthly emails from Transport for London (TfL) that have your Oyster journey history attached as a CSV. Functions in this small package help you read, wrangle and summarise these data. I, and this work, are unaffiliated with Transport for London (TfL).

Maintained by Matt Dray. Last updated 4 years ago.

london oyster oystercard tfl transport

7.5 match 2 stars 3.00 score 5 scripts

ropengov

eurostat:Tools for Eurostat Open Data

Tools to download data from the Eurostat database <https://ec.europa.eu/eurostat> together with search and manipulation utilities.

Maintained by Leo Lahti. Last updated 28 days ago.

ropengov eurostat eurostat-data

2.0 match 239 stars 11.09 score 892 scripts 5 dependents

richjjackson

psc:Personalised Synthetic Controls

Allows the comparison of data cohorts (DC) against a Counter Factual Model (CFM) and measures the difference in terms of an efficacy parameter. Allows the application of Personalised Synthetic Controls.

Maintained by Richard Jackson. Last updated 4 months ago.

5.3 match 1 star 4.23 score 24 scripts

ipums

ipumsr:An R Interface for Downloading, Reading, and Handling IPUMS Data

An easy way to work with census, survey, and geographic data provided by IPUMS in R. Generate and download data through the IPUMS API and load IPUMS files into R with their associated metadata to make analysis easier. IPUMS data describing 1.4 billion individuals drawn from over 750 censuses and surveys is available free of charge from the IPUMS website <https://www.ipums.org>.

Maintained by Derek Burk. Last updated 19 days ago.

2.0 match 28 stars 11.07 score 720 scripts 2 dependents

hadley

reshape:Flexibly Reshape Data

Flexibly restructure and aggregate data using just two functions: melt and cast.

Maintained by Hadley Wickham. Last updated 3 years ago.

2.3 match 9.83 score 21k scripts 231 dependents

ylin00

gen5helper:Processing 'Gen5' 2.06 Exported Data

A collection of functions for processing 'Gen5' 2.06 exported data. 'Gen5' is an essential data analysis software package for BioTek plate readers <https://www.biotek.com/products/software-robotics-software/gen5-microplate-reader-and-imager-software/>. This package contains functions for data cleaning, modeling and plotting using exported data from 'Gen5' version 2.06. It exports technically correct data, as defined by de Jonge and van der Loo (2013) <https://cran.r-project.org/doc/contrib/de_Jonge+van_der_Loo-Introduction_to_data_cleaning_with_R.pdf>, for customized analysis. It contains Boltzmann fitting for general kinetic analysis. See <https://www.github.com/yanxianUCSB/gen5helper> for more information, documentation and examples.

Maintained by Yanxian Lin. Last updated 5 years ago.

8.1 match 2.70 score 1 script

opengeos

whitebox:'WhiteboxTools' R Frontend

An R frontend for the 'WhiteboxTools' library, which is an advanced geospatial data analysis platform developed by Prof. John Lindsay at the University of Guelph's Geomorphometry and Hydrogeomatics Research Group. 'WhiteboxTools' can be used to perform common geographical information systems (GIS) analysis operations, such as cost-distance analysis, distance buffering, and raster reclassification. Remote sensing and image processing tasks include image enhancement (e.g. panchromatic sharpening, contrast adjustments), image mosaicing, numerous filtering operations, simple classification (k-means), and common image transformations. 'WhiteboxTools' also contains advanced tooling for spatial hydrological analysis (e.g. flow-accumulation, watershed delineation, stream network analysis, sink removal), terrain analysis (e.g. common terrain indices such as slope, curvatures, wetness index, hillshading; hypsometric analysis; multi-scale topographic position analysis), and LiDAR data processing. Suggested citation: Lindsay (2016) <doi:10.1016/j.cageo.2016.07.003>.

Maintained by Andrew Brown. Last updated 5 months ago.

geomorphometry geoprocessing geospatial gis hydrology remote-sensing rstudio

2.3 match 173 stars 9.65 score 203 scripts 2 dependents

rstudio

blogdown:Create Blogs and Websites with R Markdown

Write blog posts and web pages in R Markdown. This package supports the static site generator 'Hugo' (<https://gohugo.io>) best, and it also supports 'Jekyll' (<https://jekyllrb.com>) and 'Hexo' (<https://hexo.io>).

Maintained by Yihui Xie. Last updated 22 hours ago.

blog-engine blogdown hugo rmarkdown rstudio website-generation

1.9 match 1.8k stars 11.55 score 1.4k scripts 1 dependent

mandymejia

fMRIscrub:Scrubbing and Other Data Cleaning Routines for fMRI

Data-driven fMRI denoising with projection scrubbing (Pham et al (2022) <doi:10.1016/j.neuroimage.2023.119972>). Also includes routines for DVARS (Derivatives VARianceS) (Afyouni and Nichols (2018) <doi:10.1016/j.neuroimage.2017.12.098>), motion scrubbing (Power et al (2012) <doi:10.1016/j.neuroimage.2011.10.018>), aCompCor (anatomical Components Correction) (Muschelli et al (2014) <doi:10.1016/j.neuroimage.2014.03.028>), detrending, and nuisance regression. Projection scrubbing is also applicable to other outlier detection tasks involving high-dimensional data.

Maintained by Amanda Mejia. Last updated 2 years ago.

4.8 match 4 stars 4.56 score 15 scripts 1 dependent

kamapu

vegtable:Handling Vegetation Data Sets

Import and handle data from vegetation-plot databases, especially data stored in 'Turboveg 2' (<https://www.synbiosys.alterra.nl/turboveg/>). Import/export routines for exchanging data with 'Juice' (<https://www.sci.muni.cz/botany/juice/>) are also implemented.

Maintained by Miguel Alvarez. Last updated 8 months ago.

5.1 match 7 stars 4.23 score 49 scripts

ericdunipace

RcppCGAL:'Rcpp' Integration for 'CGAL'

Creates a header only package to link to the 'CGAL' (Computational Geometry Algorithms Library) header files in 'Rcpp'. There are a variety of potential uses for the software such as Hilbert sorting, K-D Tree nearest neighbors, and convex hull algorithms. For more information about how to use the header files, see the 'CGAL' documentation at <https://www.cgal.org>. Currently downloads version 6.0.1 of the 'CGAL' header files.

Maintained by Eric Dunipace. Last updated 2 months ago.

cgal cgal-headers

3.0 match 12 stars 7.18 score 1 script 12 dependents

gavinrozzi

njtr1:Download, Analyze & Clean New Jersey Car Crash Data

Download and analyze motor vehicle crash data released by the New Jersey Department of Transportation (NJDOT). The data in this package is collected through the filing of NJTR-1 forms by police officers, which provide a standardized way of documenting a motor vehicle crash that occurred in New Jersey. Three data tables containing data on crashes, vehicles & pedestrians released from 2001 to the present can be downloaded & cleaned using this package.

Maintained by Gavin Rozzi. Last updated 1 year ago.

njtr1 new-jersey road-safety car-crashes car-accidents data

4.9 match 5 stars 4.40 score 7 scripts

stan-dev

cmdstanr:R Interface to 'CmdStan'

A lightweight interface to 'Stan' <https://mc-stan.org>. The 'CmdStanR' interface is an alternative to 'RStan' that calls the command line interface for compilation and running algorithms instead of interfacing with C++ via 'Rcpp'. This has many benefits including always being compatible with the latest version of Stan, fewer installation errors, fewer unexpected crashes in RStudio, and a more permissive license.

Maintained by Andrew Johnson. Last updated 9 months ago.

bayes bayesian markov-chain-monte-carlo maximum-likelihood mcmc stan variational-inference

1.8 match 145 stars 12.27 score 5.2k scripts 9 dependents

nflverse

nflreadr:Download 'nflverse' Data

A minimal package for downloading data from 'GitHub' repositories of the 'nflverse' project.

Maintained by Tan Ho. Last updated 4 months ago.

nfl nflfastr nflverse sports-data

1.7 match 66 stars 12.46 score 476 scripts 10 dependents

tyee001

VGAM:Vector Generalized Linear and Additive Models

An implementation of about 6 major classes of statistical regression models. The central algorithm is Fisher scoring and iterative reweighted least squares. At the heart of this package are the vector generalized linear and additive model (VGLM/VGAM) classes. VGLMs can be loosely thought of as multivariate GLMs. VGAMs are data-driven VGLMs that use smoothing. The book "Vector Generalized Linear and Additive Models: With an Implementation in R" (Yee, 2015) <DOI:10.1007/978-1-4939-2818-7> gives details of the statistical framework and the package. Currently only fixed-effects models are implemented. Many (100+) models and distributions are estimated by maximum likelihood estimation (MLE) or penalized MLE. The other classes are RR-VGLMs (reduced-rank VGLMs), quadratic RR-VGLMs, doubly constrained RR-VGLMs, reduced-rank VGAMs, and RCIMs (row-column interaction models); these classes perform constrained and unconstrained quadratic ordination (CQO/UQO) models in ecology, as well as constrained additive ordination (CAO). Hauck-Donner effect detection is implemented. Note that these functions are subject to change; see the NEWS and ChangeLog files for latest changes.

Maintained by Thomas Yee. Last updated 1 month ago.

fortran

2.0 match 10 stars 10.67 score 3.6k scripts 169 dependents

alexbhatt

epidm:UK Epidemiological Data Management

Contains utilities and functions for the cleaning, processing and management of patient level public health data for surveillance and analysis held by the UK Health Security Agency, UKHSA.

Maintained by Alex Bhattacharya. Last updated 7 months ago.

4.2 match 13 stars 5.07 score 2 scripts

nutriverse

mwana:An Efficient Workflow for Plausibility Checks and Prevalence Analysis of Wasting in R

A simple and streamlined workflow for plausibility checks and prevalence analysis of wasting based on the Standardized Monitoring and Assessment of Relief and Transition (SMART) Methodology <https://smartmethodology.org/>, with application in R.

Maintained by Tomás Zaba. Last updated 1 month ago.

acute-malnutrition anthropometry muac nutrition smart survey wasting

5.0 match 2 stars 4.23 score 6 scripts

paws-r

paws:Amazon Web Services Software Development Kit

Interface to Amazon Web Services <https://aws.amazon.com>, including storage, database, and compute services, such as 'Simple Storage Service' ('S3'), 'DynamoDB' 'NoSQL' database, and 'Lambda' functions-as-a-service.

Maintained by Dyfan Jones. Last updated 4 days ago.

aws aws-sdk

1.9 match 332 stars 11.25 score 177 scripts 12 dependents

joundso

cleaR:Clean the R Console and Environment

Small package to clean the R console and the R environment with the call of just one function.

Maintained by Jonathan M. Mang. Last updated 1 year ago.

5.5 match 3.78 score 3 scripts 4 dependents

asa12138

MetaNet:Network Analysis for Omics Data

Comprehensive network analysis package. Calculates correlation networks quickly and accelerates many analyses through parallel computing. Supports multi-omics data and sub-network search. Handles larger data, with more than 10,000 nodes per omics. Offers various layout methods for multi-omics networks and interfaces to other software ('Gephi', 'Cytoscape', 'ggplot2') for easy visualization. Provides comprehensive calculation of topology indexes, including ecological network stability.

Maintained by Chen Peng. Last updated 11 days ago.

dataimport network analysis omics software visualization

3.8 match 13 stars 5.51 score 9 scripts

mayur1009

cleanTS:Testbench for Univariate Time Series Cleaning

A reliable and efficient tool for cleaning univariate time series data. It implements procedures that automate the cleaning process. The package provides integration with already developed and deployed tools for missing value imputation and outlier detection. It also provides a way of visualizing large time-series data in different resolutions.

Maintained by Mayur Shende. Last updated 1 year ago.

5.6 match 11 stars 3.74 score 3 scripts

bioc

basecallQC:Working with Illumina Basecalling and Demultiplexing input and output files

The basecallQC package provides tools to work with Illumina bcl2Fastq (versions >= 2.1.7) software. Prior to basecalling and demultiplexing using the bcl2Fastq software, basecallQC functions allow the user to update Illumina sample sheets from versions <= 1.8.9 to >= 2.1.7 standards, clean sample sheets of common problems such as invalid sample names and IDs, and create read and index basemasks and the bcl2Fastq command. Following the generation of basecalled and demultiplexed data, the basecallQC package allows the user to generate HTML tables, plots and a self-contained report of summary metrics from Illumina XML output files.

Maintained by Thomas Carroll. Last updated 5 months ago.

sequencing infrastructure dataimport qualitycontrol

4.7 match 4.32 score 21 scripts

eu-ecdc

EpiSignalDetection:Signal Detection Analysis

Exploring time series for signal detection. It is specifically designed to detect possible outbreaks using infectious disease surveillance data at the European Union / European Economic Area or country level. Automatic detection tools used are presented in the paper "Monitoring count time series in R: aberration detection in public health surveillance", by Salmon et al. (2016) <doi:10.18637/jss.v070.i10>. The package includes: (1) the Signal Detection tool, an interactive 'shiny' application in which the user can import external data and perform basic signal detection analyses; and (2) an automated report in HTML format, presenting the results of the time series analysis in tables and graphs. This report can also be stratified by population characteristics (see 'Population' variable). This project was funded by the European Centre for Disease Prevention and Control.

Maintained by Joana Gomes Dias. Last updated 6 years ago.

3.8 match 16 stars 5.43 score 17 scripts

r-forge

deSolve:Solvers for Initial Value Problems of Differential Equations ('ODE', 'DAE', 'DDE')

Functions that solve initial value problems of a system of first-order ordinary differential equations ('ODE'), of partial differential equations ('PDE'), of differential algebraic equations ('DAE'), and of delay differential equations. The functions provide an interface to the FORTRAN functions 'lsoda', 'lsodar', 'lsode', 'lsodes' of the 'ODEPACK' collection, to the FORTRAN functions 'dvode', 'zvode' and 'daspk' and a C-implementation of solvers of the 'Runge-Kutta' family with fixed or variable time steps. The package contains routines designed for solving 'ODEs' resulting from 1-D, 2-D and 3-D partial differential equations ('PDE') that have been converted to 'ODEs' by numerical differencing.

Maintained by Thomas Petzoldt. Last updated 1 year ago.

fortran openblas

1.7 match 12.33 score 8.0k scripts 427 dependents

surveygraph

surveygraph:Network Representations of Attitudes

A tool for computing network representations of attitudes, extracted from tabular data such as sociological surveys. Development of surveygraph software and training materials was initially funded by the European Union under the ERC Proof-of-concept programme (ERC, Attitude-Maps-4-All, project number: 101069264). Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or the European Research Council Executive Agency. Neither the European Union nor the granting authority can be held responsible for them.

Maintained by Samuel Unicomb. Last updated 4 months ago.

cpp

3.8 match 1 star 5.41 score 6 scripts

maarten14c

rice:Radiocarbon Equations

Provides functions for the calibration of radiocarbon dates, as well as options to calculate different radiocarbon realms (C14 age, F14C, pMC, D14C) and to estimate the effects of contamination or local reservoir offsets (Reimer and Reimer 2001 <doi:10.1017/S0033822200038339>). The methods follow long-established recommendations such as Stuiver and Polach (1977) <doi:10.1017/S0033822200003672> and Reimer et al. (2004) <doi:10.1017/S0033822200033154>. This package complements the data package 'rintcal'.

Maintained by Maarten Blaauw. Last updated 2 months ago.

3.3 match 1 star 6.13 score 13 scripts 4 dependents

pauljohn32

kutils:Project Management Tools

Tools for data importation, recoding, and inspection. There are functions to create new project folders, R code templates, and uniquely named output directories, and to quickly obtain a visual summary for each variable in a data frame. The main feature here is the systematic implementation of the "variable key" framework for data importation and recoding. We are eager to have community feedback about the variable key and the vignette about it. In version 1.7, the function 'semTable' is removed. It had been deprecated since 1.67 and is now provided in a separate package, 'semTable'.

Maintained by Paul Johnson. Last updated 1 year ago.

3.4 match 5.85 score 110 scripts 20 dependents

matt-dray

tamRgo:Digital Pets for R

Store a persistent digital pet on your computer and interact with it in your R console.

Maintained by Matt Dray. Last updated 2 years ago.

tamagotchi

5.6 match 7 stars 3.54 score 4 scripts

fkeck

bioseq:A Toolbox for Manipulating Biological Sequences

Classes and functions to work with biological sequences (DNA, RNA and amino acid sequences). Implements S3 infrastructure to work with biological sequences as described in Keck (2020) <doi:10.1111/2041-210X.13490>. Provides a collection of functions to perform biological conversion among classes (transcription, translation) and basic operations on sequences (detection, selection and replacement based on positions or patterns). The package also provides functions to import and export sequences from and to other package formats.

Maintained by Francois Keck. Last updated 3 years ago.

2.9 match 22 stars 6.72 score 80 scripts 1 dependent