Showing 200 of 771 results
sfirke
janitor:Simple Tools for Examining and Cleaning Dirty Data
The main janitor functions can: perfectly format data.frame column names; provide quick counts of variable combinations (i.e., frequency tables and crosstabs); and explore duplicate records. Other janitor functions nicely format the tabulation results. These tabulate-and-report functions approximate popular features of SPSS and Microsoft Excel. This package follows the principles of the "tidyverse" and works well with the pipe function %>%. janitor was built with beginning-to-intermediate R users in mind and is optimized for user-friendliness.
Maintained by Sam Firke. Last updated 3 months ago.
data-analysis data-cleaning data-science dirty-data excel pivot-tables spss tabulations tidyverse
17.1 match 1.4k stars 19.15 score 35k scripts 231 dependents
data-cleaning
validate:Data Validation Infrastructure
Declare data validation rules and data quality indicators; confront data with them and analyze or visualize the results. The package supports rules that are per-field, in-record, cross-record or cross-dataset. Rules can be automatically analyzed for rule type and connectivity. Supports checks implied by an SDMX DSD file as well. See also Van der Loo and De Jonge (2018) <doi:10.1002/9781118897126>, Chapter 6 and the JSS paper (2021) <doi:10.18637/jss.v097.i10>.
Maintained by Mark van der Loo. Last updated 12 days ago.
22.5 match 418 stars 12.50 score 448 scripts 9 dependents
epiforecasts
covidregionaldata:Subnational Data for COVID-19 Epidemiology
An interface to subnational and national level COVID-19 data sourced from both official sources, such as Public Health England in the UK, and from other COVID-19 data collections, including the World Health Organisation (WHO), European Centre for Disease Prevention and Control (ECDC), Johns Hopkins University (JHU), Google Open Data and others. Designed to streamline COVID-19 data extraction, cleaning, and processing from a range of data sources in an open and transparent way. This allows users to inspect and scrutinise the data, and tools used to process it, at every step. For all countries supported, data includes a daily time-series of cases. Wherever available data is also provided for deaths, hospitalisations, and tests. National level data are also supported using a range of sources.
Maintained by Sam Abbott. Last updated 3 years ago.
covid-19 data open-science r6 regional-data
31.5 match 37 stars 5.67 score 121 scripts
ices-tools-prod
TAF:Transparent Assessment Framework for Reproducible Research
General framework to organize data, methods, and results used in reproducible scientific analyses. A TAF analysis consists of four scripts (data.R, model.R, output.R, report.R) that are run sequentially. Each script starts by reading files from a previous step and ends with writing out files for the next step. Convenience functions are provided to version control the required data and software, run analyses, clean residues from previous runs, manage files, manipulate tables, and produce figures. With a focus on stability and reproducible analyses, the TAF package comes with no dependencies. TAF forms a base layer for the 'icesTAF' package and other scientific applications.
Maintained by Arni Magnusson. Last updated 4 months ago.
23.1 match 3 stars 6.85 score 282 scripts 2 dependents
epiverse-trace
cleanepi:Clean and Standardize Epidemiological Data
A package for cleaning and standardizing tabular data, tailored specifically for curating epidemiological data. It streamlines various data cleaning tasks that are typically expected when working with datasets in epidemiology. It returns the processed data in the same format, and generates a comprehensive report detailing the outcomes of each cleaning task.
Maintained by Karim Mané. Last updated 3 days ago.
data-cleaning epidemiology epiverse
21.1 match 9 stars 7.44 score 19 scripts
ropensci
CoordinateCleaner:Automated Cleaning of Occurrence Records from Biological Collections
Automated flagging of common spatial and temporal errors in biological and paleontological collection data, for the use in conservation, ecology and paleontology. Includes automated tests to easily flag (and exclude) records assigned to country or province centroid, the open ocean, the headquarters of the Global Biodiversity Information Facility, urban areas or the location of biodiversity institutions (museums, zoos, botanical gardens, universities). Furthermore identifies per species outlier coordinates, zero coordinates, identical latitude/longitude and invalid coordinates. Also implements an algorithm to identify data sets with a significant proportion of rounded coordinates. Especially suited for large data sets. The reference for the methodology is: Zizka et al. (2019) <doi:10.1111/2041-210X.13152>.
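A few of the checks named in this description (invalid, zero, and identical coordinates) are simple enough to sketch in base R. This is an illustration only and not CoordinateCleaner's implementation; the helper name `flag_coordinates()` and the example values are hypothetical.

```r
# Hypothetical helper: flag three of the simpler coordinate problems.
flag_coordinates <- function(lon, lat) {
  # Invalid: missing, or outside the valid longitude/latitude ranges.
  invalid <- is.na(lon) | is.na(lat) | abs(lon) > 180 | abs(lat) > 90
  # Zero: both coordinates are exactly 0.
  zero <- !invalid & lon == 0 & lat == 0
  # Equal: identical latitude and longitude (this includes the zero case).
  equal <- !invalid & lon == lat
  data.frame(lon, lat, invalid, zero, equal)
}

# Made-up example records.
records <- flag_coordinates(
  lon = c(13.4, 0, 25, 200, NA),
  lat = c(52.5, 0, 25, 10, 5)
)

# Keep only the records for which no flag was raised.
clean <- records[!(records$invalid | records$zero | records$equal), ]
```

Flagging and filtering are kept separate here so that suspicious records can be inspected before they are excluded.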
Maintained by Alexander Zizka. Last updated 1 year ago.
14.2 match 82 stars 10.93 score 306 scripts 3 dependents
eblondel
cleangeo:Cleaning Geometries from Spatial Objects
Provides a set of utility tools to inspect spatial objects, facilitate handling and reporting of topology errors and geometry validity issues with sp objects. Finally, it provides a geometry cleaner that will fix all geometry problems, and eliminate (at least reduce) the likelihood of having issues when doing spatial data processing.
Maintained by Emmanuel Blondel. Last updated 2 years ago.
cleaning cleaning-geometries gis sp spatial
22.5 match 45 stars 6.82 score 99 scripts 1 dependent
data-cleaning
errorlocate:Locate Errors with Validation Rules
Errors in data can be located and removed using validation rules from package 'validate'. See also Van der Loo and De Jonge (2018) <doi:10.1002/9781118897126>, chapter 7.
Maintained by Edwin de Jonge. Last updated 9 months ago.
data-cleaning errors invalidation
24.2 match 22 stars 6.11 score 59 scripts
trinker
textclean:Text Cleaning Tools
Tools to clean and process text. Tools are geared at checking for substrings that are not optimal for analysis and replacing or removing them (normalizing) with more analysis friendly substrings (see Sproat, Black, Chen, Kumar, Ostendorf, & Richards (2001) <doi:10.1006/csla.2001.0169>) or extracting them into new variables. For example, emoticons are often used in text but not always easily handled by analysis algorithms. The replace_emoticon() function replaces emoticons with word equivalents.
Maintained by Tyler Rinker. Last updated 3 years ago.
data-munging emoticons regex text-analysis text-cleaning
13.3 match 248 stars 10.08 score 760 scripts 22 dependents
braverock
PerformanceAnalytics:Econometric Tools for Performance and Risk Analysis
Collection of econometric functions for performance and risk analysis. In addition to standard risk and performance metrics, this package aims to aid practitioners and researchers in utilizing the latest research in analysis of non-normal return streams. In general, it is most tested on return (rather than price) data on a regular scale, but most functions will work with irregular return data as well, and increasing numbers of functions will work with P&L or price data where possible.
Maintained by Brian G. Peterson. Last updated 3 months ago.
7.9 match 222 stars 15.93 score 4.8k scripts 20 dependents
data-cleaning
editrules:Parsing, Applying, and Manipulating Data Cleaning Rules
Please note: active development has moved to packages 'validate' and 'errorlocate'. Facilitates reading and manipulating (multivariate) data restrictions (edit rules) on numerical and categorical data. Rules can be defined with common R syntax and parsed to an internal (matrix-like) format. Rules can be manipulated with variable elimination and value substitution methods, allowing for feasibility checks and more. Data can be tested against the rules and erroneous fields can be found based on Fellegi and Holt's generalized principle. Rule dependencies can be visualized using the 'igraph' package.
Maintained by Edwin de Jonge. Last updated 9 months ago.
17.9 match 22 stars 6.97 score 140 scripts 1 dependent
rolkra
explore:Simplifies Exploratory Data Analysis
Interactive data exploration with one line of code, automated reporting, or an easy-to-remember set of tidy functions for low-code exploratory data analysis.
Maintained by Roland Krasser. Last updated 3 months ago.
data-exploration data-visualisation decision-trees eda rmarkdown shiny tidy
10.3 match 228 stars 11.43 score 221 scripts 1 dependent
gadenbuie
cleanrmd:Clean Class-Less 'R Markdown' HTML Documents
A collection of clean 'R Markdown' HTML document templates using classy-looking classless CSS styles. These documents use a minimal set of dependencies but still look great, making them suitable for use as package vignettes or for sharing results via email.
Maintained by Garrick Aden-Buie. Last updated 2 years ago.
classless classless-theme clean css html rmarkdown style theme
18.0 match 151 stars 5.95 score 10 scripts 1 dependent
rstudio
renv:Project Environments
A dependency management toolkit for R. Using 'renv', you can create and manage project-local R libraries, save the state of these libraries to a 'lockfile', and later restore your library as required. Together, these tools can help make your projects more isolated, portable, and reproducible.
Maintained by Kevin Ushey. Last updated 3 days ago.
5.6 match 1.0k stars 18.55 score 1.5k scripts 113 dependents
data-cleaning
validatetools:Checking and Simplifying Validation Rule Sets
Rule sets with validation rules may contain redundancies or contradictions. Functions for finding redundancies and problematic rules are provided, given a set of rules formulated with 'validate'.
Maintained by Edwin de Jonge. Last updated 9 months ago.
22.5 match 15 stars 4.47 score 39 scripts
nataliepatten
gatoRs:Geographic and Taxonomic Occurrence R-Based Scrubbing
Streamlines downloading and cleaning biodiversity data from Integrated Digitized Biocollections (iDigBio) and the Global Biodiversity Information Facility (GBIF).
Maintained by Natalie N. Patten. Last updated 10 months ago.
16.1 match 11 stars 6.16 score 66 scripts
data-cleaning
validatesuggest:Generate Suggestions for Validation Rules
Generate suggestions for validation rules from a reference data set, which can be used as a starting point for domain specific rules to be checked with package 'validate'.
Maintained by Edwin de Jonge. Last updated 1 year ago.
22.5 match 5 stars 4.40 score 5 scripts
data-cleaning
dcmodify:Modify Data Using Externally Defined Modification Rules
Data cleaning scripts typically contain a lot of 'if this change that' type of statements. Such statements are typically condensed expert knowledge. With this package, such 'data modifying rules' are taken out of the code and instead become parameters to the workflow. This allows one to maintain, document, and reason about data modification rules as separate entities.
Maintained by Mark van der Loo. Last updated 9 months ago.
15.5 match 10 stars 6.24 score 58 scripts
jonathancornelissen
highfrequency:Tools for Highfrequency Data Analysis
Provide functionality to manage, clean and match highfrequency trades and quotes data, calculate various liquidity measures, estimate and forecast volatility, detect price jumps and investigate microstructure noise and intraday periodicity. A detailed vignette can be found in the paper "Analyzing Intraday Financial Data in R: The highfrequency Package" by Boudt, Kleen, and Sjoerup (2022, <doi:10.18637/jss.v104.i08>). The DOI in the CITATION is for a new Journal of Statistical Software publication that will be registered after publication on CRAN. A working paper version can be found on SSRN: <doi:10.2139/ssrn.3917548>.
Maintained by Kris Boudt. Last updated 2 years ago.
13.0 match 152 stars 7.37 score 286 scripts
data-cleaning
deductive:Data Correction and Imputation Using Deductive Methods
Attempt to repair inconsistencies and missing values in data records by using information from valid values and validation rules restricting the data.
Maintained by Mark van der Loo. Last updated 1 month ago.
22.5 match 14 stars 4.26 score 13 scripts
cran
podcleaner:Legacy Scottish Post Office Directories Cleaner
Attempts to clean optical character recognition (OCR) errors in legacy Scottish Post Office Directories. Further attempts to match records from trades and general directories.
Maintained by Olivier Bautheac. Last updated 3 years ago.
55.6 match 1.70 score
wadpac
GGIR:Raw Accelerometer Data Analysis
A tool to process and analyse data collected with wearable raw acceleration sensors as described in Migueles and colleagues (JMPB 2019), and van Hees and colleagues (JApplPhysiol 2014; PLoSONE 2015). The package has been developed and tested for binary data from 'GENEActiv' <https://activinsights.com/>, binary (.gt3x) and .csv-export data from 'Actigraph' <https://theactigraph.com> devices, and binary (.cwa) and .csv-export data from 'Axivity' <https://axivity.com>. These devices are currently widely used in research on human daily physical activity. Further, the package can handle accelerometer data files from any other sensor brand, provided that the data is stored in csv format. The package also allows for external function embedding.
Maintained by Vincent T van Hees. Last updated 2 days ago.
accelerometer activity-recognition circadian-rhythm movement-sensor sleep
7.1 match 109 stars 13.20 score 342 scripts 3 dependents
msberends
cleaner:Fast and Easy Data Cleaning
Data cleaning functions for classes logical, factor, numeric, character, currency and Date to make data cleaning fast and easy. Relying on very few dependencies, it provides smart guessing, but with user options to override anything if needed.
Maintained by Matthijs S. Berends. Last updated 4 months ago.
13.1 match 32 stars 6.95 score 64 scripts 9 dependents
rqtl
qtl2:Quantitative Trait Locus Mapping in Experimental Crosses
Provides a set of tools to perform quantitative trait locus (QTL) analysis in experimental crosses. It is a reimplementation of the 'R/qtl' package to better handle high-dimensional data and complex cross designs. Broman et al. (2019) <doi:10.1534/genetics.118.301595>.
Maintained by Karl W Broman. Last updated 8 days ago.
9.6 match 34 stars 9.48 score 1.1k scripts 5 dependents
openbiox
UCSCXenaShiny:Interactive Analysis of UCSC Xena Data
Provides functions and a Shiny application for downloading, analyzing and visualizing datasets from UCSC Xena (<http://xena.ucsc.edu/>), which is a collection of UCSC-hosted public databases such as TCGA, ICGC, TARGET, GTEx, CCLE, and others.
Maintained by Shixiang Wang. Last updated 4 months ago.
cancer-dataset shiny-apps ucsc-xena
10.4 match 96 stars 8.54 score 35 scripts
arutools
ARUtools:Management and Processing of Autonomous Recording Unit (ARU) Data
Parse Autonomous Recording Unit (ARU) data and sub-sample recordings. Extract metadata from your recordings, select a subset of recordings for interpretation, and prepare files for processing on the 'WildTrax' <https://wildtrax.ca/> platform. Read and process metadata from recordings collected using the SongMeter and BAR-LT types of ARUs.
Maintained by David Hope. Last updated 4 months ago.
14.0 match 6.30 score 26 scripts
business-science
anomalize:Tidy Anomaly Detection
The 'anomalize' package enables a "tidy" workflow for detecting anomalies in data. The main functions are time_decompose(), anomalize(), and time_recompose(). When combined, it's quite simple to decompose time series, detect anomalies, and create bands separating the "normal" data from the anomalous data at scale (i.e. for multiple time series). Time series decomposition is used to remove trend and seasonal components via the time_decompose() function and methods include seasonal decomposition of time series by Loess ("stl") and seasonal decomposition by piecewise medians ("twitter"). The anomalize() function implements two methods for anomaly detection of residuals: the interquartile range ("iqr") and generalized extreme studentized deviation ("gesd"). These methods are based on those used in the 'forecast' package and the Twitter 'AnomalyDetection' package. Refer to the associated functions for specific references for these methods.
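The interquartile-range idea mentioned in this description can be sketched in base R as a generic Tukey-style fence applied to residuals. This is not the 'anomalize' implementation: the helper name `flag_iqr()`, the multiplier `k = 1.5`, and the sample residuals are all assumptions chosen for illustration.

```r
# Hypothetical helper: flag residuals outside [Q1 - k * IQR, Q3 + k * IQR].
flag_iqr <- function(resid, k = 1.5) {
  q <- stats::quantile(resid, c(0.25, 0.75), names = FALSE)
  iqr <- q[2] - q[1]
  resid < q[1] - k * iqr | resid > q[2] + k * iqr
}

# Made-up residuals, i.e. what is left after trend and seasonality are removed.
remainder <- c(0.1, -0.2, 0.0, 0.3, -0.1, 5.0, 0.2, -0.3)
flags <- flag_iqr(remainder)

# Positions of the flagged observations.
anomalies <- which(flags)
```

A larger `k` widens the band and flags fewer points, so the multiplier is the main tuning choice in a sketch like this.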
Maintained by Matt Dancho. Last updated 1 year ago.
anomaly anomaly-detection decomposition detect-anomalies iqr time-series
9.2 match 339 stars 9.56 score 332 scripts
lrberge
stringmagic:Character String Operations and Interpolation, Magic Edition
Performs complex string operations compactly and efficiently. Supports string interpolation jointly with over 50 string operations. Also enhances regular string functions (like grep() and co). See an introduction at <https://lrberge.github.io/stringmagic/>.
Maintained by Laurent R Berge. Last updated 7 months ago.
8.1 match 15 stars 10.56 score 37 scripts 33 dependents
dominiquemaucieri
quadcleanR:Cleanup and Visualization of Quadrat Data
A tool that can be customized to aid in the clean up of ecological data collected using quadrats and can crop quadrats to ensure comparability between quadrats collected under different methodologies.
Maintained by Dominique Maucieri. Last updated 2 years ago.
19.0 match 4.45 score 14 scripts
usaid-oha-si
mindthegap:Mind the Gap
Package to tidy UNAIDS estimates (from the EDMS database) as well as plot trends in UNAIDS 95 goals and ART coverage gap by country.
Maintained by Karishma Srikanth. Last updated 2 months ago.
14.7 match 5 stars 5.51 score 13 scripts
assuom44
arlclustering:Exploring Social Network Structures Through Friendship-Driven Community Detection with Association Rules Mining
Implements an innovative approach to community detection in social networks using Association Rules Learning. The package provides tools for processing graph and rules objects, generating association rules, and detecting communities based on node interactions. Designed to facilitate advanced research in Social Network Analysis, this package leverages association rules learning for enhanced community detection. This approach is described in El-Moussaoui et al. (2021) <doi:10.1007/978-3-030-66840-2_3>.
Maintained by Mohamed El-Moussaoui. Last updated 6 months ago.
12.6 match 6.45 score 50 scripts
ices-tools-prod
icesTAF:Functions to Support the ICES Transparent Assessment Framework
Functions to support the ICES Transparent Assessment Framework <https://taf.ices.dk> to organize data, methods, and results used in ICES assessments. ICES is an organization facilitating international collaboration in marine science.
Maintained by Colin Millar. Last updated 2 years ago.
12.3 match 5 stars 6.37 score 1.1k scripts 1 dependent
ropensci
pathviewr:Wrangle, Analyze, and Visualize Animal Movement Data
Tools to import, clean, and visualize movement data, particularly from motion capture systems such as Optitrack's 'Motive', the Straw Lab's 'Flydra', or from other sources. We provide functions to remove artifacts, standardize tunnel position and tunnel axes, select a region of interest, isolate specific trajectories, fill gaps in trajectory data, and calculate 3D and per-axis velocity. For experiments of visual guidance, we also provide functions that use subject position to estimate perception of visual stimuli.
Maintained by Vikram B. Baliga. Last updated 2 years ago.
animal-movement flydra motion movement-data optitrack trajectories trajectory-analysis visual-guidance visual-perception
11.9 match 8 stars 6.56 score 102 scripts
data-cleaning
lintools:Manipulation of Linear Systems of (in)Equalities
Variable elimination (Gaussian elimination, Fourier-Motzkin elimination), Moore-Penrose pseudoinverse, reduction to reduced row echelon form, value substitution, projecting a vector on the convex polytope described by a system of (in)equations, simplify systems by removing spurious columns and rows and collapse implied equalities, test if a matrix is totally unimodular, compute variable ranges implied by linear (in)equalities.
Maintained by Mark van der Loo. Last updated 9 months ago.
15.0 match 4 stars 5.19 score 13 scripts 2 dependents
ropensci
drake:A Pipeline Toolkit for Reproducible Computation at Scale
A general-purpose computational engine for data analysis, drake rebuilds intermediate data objects when their dependencies change, and it skips work when the results are already up to date. Not every execution starts from scratch, there is native support for parallel and distributed computing, and completed projects have tangible evidence that they are reproducible. Extensive documentation, from beginner-friendly tutorials to practical examples and more, is available at the reference website <https://docs.ropensci.org/drake/> and the online manual <https://books.ropensci.org/drake/>.
Maintained by William Michael Landau. Last updated 3 months ago.
data-science drake high-performance-computing makefile peer-reviewed pipeline reproducibility reproducible-research ropensci workflow
6.8 match 1.3k stars 11.49 score 1.7k scripts 1 dependent
epicentre-msf
dbc:Dictionary-Based Cleaning
Tools for dictionary-based data cleaning.
Maintained by Patrick Barks. Last updated 1 year ago.
31.3 match 2 stars 2.48 score 4 scripts 1 dependent
ropensci
EndoMineR:Functions to mine endoscopic and associated pathology datasets
This script comprises the functions that are used to clean up endoscopic reports and pathology reports as well as many of the scripts used for analysis. The scripts assume the endoscopy and histopathology data set is merged already but it can also be used of course with the unmerged datasets.
Maintained by Sebastian Zeki. Last updated 7 months ago.
endoscopy gastroenterology peer-reviewed semi-structured-data text-mining
13.4 match 13 stars 5.47 score 30 scripts
billdenney
PKNCA:Perform Pharmacokinetic Non-Compartmental Analysis
Compute standard Non-Compartmental Analysis (NCA) parameters for typical pharmacokinetic analyses and summarize them.
Maintained by Bill Denney. Last updated 17 days ago.
nca noncompartmental-analysis pharmacokinetics
5.6 match 73 stars 12.61 score 214 scripts 4 dependents
ibot-geoecology
myClim:Microclimatic Data Processing
Handles microclimatic data in R. The 'myClim' workflow begins with reading data, primarily from microclimatic dataloggers, but it can also read meteorological station data from files. The next step of the workflow is cleaning the time step, setting the time zone, and collecting metadata. With 'myClim' tools one can crop, join, downscale, and convert microclimatic data formats, sort them into localities, request descriptive characteristics and compute microclimatic variables. Handy plotting functions are provided with smart defaults.
Maintained by Vojtěch Kalčík. Last updated 14 days ago.
10.0 match 7 stars 6.97 score 30 scripts
asgr
imager:Image Processing Library Based on 'CImg'
Fast image processing for images in up to 4 dimensions (two spatial dimensions, one time/depth dimension, one colour dimension). Provides most traditional image processing tools (filtering, morphology, transformations, etc.) as well as various functions for easily analysing image data using R. The package wraps 'CImg', <http://cimg.eu>, a simple, modern C++ library for image processing.
Maintained by Aaron Robotham. Last updated 27 days ago.
5.0 match 17 stars 13.62 score 2.4k scripts 45 dependents
kbroman
qtl:Tools for Analyzing QTL Experiments
Analysis of experimental crosses to identify genes (called quantitative trait loci, QTLs) contributing to variation in quantitative traits. Broman et al. (2003) <doi:10.1093/bioinformatics/btg112>.
Maintained by Karl W Broman. Last updated 7 months ago.
5.3 match 80 stars 12.79 score 2.4k scripts 29 dependents
azure
azuremlsdk:Interface to the 'Azure Machine Learning' 'SDK'
Interface to the 'Azure Machine Learning' Software Development Kit ('SDK'). Data scientists can use the 'SDK' to train, deploy, automate, and manage machine learning models on the 'Azure Machine Learning' service. To learn more about 'Azure Machine Learning' visit the website: <https://docs.microsoft.com/en-us/azure/machine-learning/service/overview-what-is-azure-ml>.
Maintained by Diondra Peck. Last updated 3 years ago.
amlcompute azure azure-machine-learning azureml dsi machine-learning rstudio sdk-r
7.5 match 106 stars 8.91 score 221 scripts
bioc
MSnbase:Base Functions and Classes for Mass Spectrometry and Proteomics
MSnbase provides infrastructure for manipulation, processing and visualisation of mass spectrometry and proteomics data, ranging from raw to quantitative and annotated data.
Maintained by Laurent Gatto. Last updated 2 days ago.
immunooncology infrastructure proteomics massspectrometry qualitycontrol dataimport bioconductor bioinformatics mass-spectrometry proteomics-data visualisation cpp
5.1 match 130 stars 12.81 score 772 scripts 36 dependents
data-cleaning
deducorrect:Deductive Correction, Deductive Imputation, and Deterministic Correction
A collection of methods for automated data cleaning where all actions are logged. NOTE: active development has moved to the 'deductive' package.
Maintained by Mark van der Loo. Last updated 9 months ago.
15.5 match 9 stars 4.18 score 34 scripts
jrnold
ggthemes:Extra Themes, Scales and Geoms for 'ggplot2'
Some extra themes, geoms, and scales for 'ggplot2'. Provides 'ggplot2' themes and scales that replicate the look of plots by Edward Tufte, Stephen Few, 'Fivethirtyeight', 'The Economist', 'Stata', 'Excel', and 'The Wall Street Journal', among others. Provides 'geoms' for Tufte's box plot and range frame.
Maintained by Jeffrey B. Arnold. Last updated 1 year ago.
data-visualisation ggplot2 ggplot2-themes plot plotting theme visualization
4.0 match 1.3k stars 16.17 score 40k scripts 102 dependents
bioc
COTAN:COexpression Tables ANalysis
Statistical and computational method to analyze the co-expression of gene pairs at single cell level. It provides the foundation for single-cell gene interactome analysis. The basic idea is studying the zero UMI counts' distribution instead of focusing on positive counts; this is done with a generalized contingency tables framework. COTAN can effectively assess the correlated or anti-correlated expression of gene pairs. It provides a numerical index related to the correlation and an approximate p-value for the associated independence test. COTAN can also evaluate whether single genes are differentially expressed, scoring them with a newly defined global differentiation index. Moreover, this approach provides ways to plot and cluster genes according to their co-expression pattern with other genes, effectively helping the study of gene interactions and becoming a new tool to identify cell-identity marker genes.
Maintained by Galfrè Silvia Giulia. Last updated 19 days ago.
systemsbiology transcriptomics geneexpression singlecell
8.1 match 16 stars 7.88 score 96 scripts
easystats
insight:Easy Access to Model Information for Various Model Objects
A tool to provide easy, intuitive and consistent access to information contained in various R models, like model formulas, model terms, information about random effects, data that was used to fit the model or data from response variables. 'insight' mainly revolves around two types of functions: Functions that find (the names of) information, starting with 'find_', and functions that get the underlying data, starting with 'get_'. The package has a consistent syntax and works with many different model objects for which functions to access this information are otherwise missing.
Maintained by Daniel Lüdecke. Last updated 5 days ago.
easystats hacktoberfest insight models names predictors random
3.6 match 412 stars 17.24 score 568 scripts 210 dependents
billpetti
baseballr:Acquiring and Analyzing Baseball Data
Provides numerous utilities for acquiring and analyzing baseball data from online sources such as 'Baseball Reference' <https://www.baseball-reference.com/>, 'FanGraphs' <https://www.fangraphs.com/>, and the 'MLB Stats' API <https://www.mlb.com/>.
Maintained by Saiem Gilani. Last updated 4 months ago.
baseball pitchfx sabermetrics statcast
6.6 match 380 stars 8.98 score 582 scripts
renkun-ken
rlist:A Toolbox for Non-Tabular Data Manipulation
Provides a set of functions for data manipulation with list objects, including mapping, filtering, grouping, sorting, updating, searching, and other useful functions. Most functions are designed to be pipeline friendly so that data processing with lists can be chained.
Maintained by Kun Ren. Last updated 2 years ago.
4.3 match 206 stars 13.73 score 2.2k scripts 123 dependents
business-science
timetk:A Tool Kit for Working with Time Series
Easy visualization, wrangling, and feature engineering of time series data for forecasting and machine learning prediction. Consolidates and extends time series functionality from packages including 'dplyr', 'stats', 'xts', 'forecast', 'slider', 'padr', 'recipes', and 'rsample'.
Maintained by Matt Dancho. Last updated 1 year ago.
coercion coercion-functions data-mining dplyr forecast forecasting forecasting-models machine-learning series-decomposition series-signature tibble tidy tidyquant tidyverse time time-series timeseries
4.1 match 625 stars 14.15 score 4.0k scripts 16 dependents
the-hull
datacleanr:Interactive and Reproducible Data Cleaning
Flexible and efficient cleaning of data with interactivity. 'datacleanr' facilitates best practices in data analyses and reproducibility with built-in features and by translating interactive/manual operations to code. The package is designed for interoperability, and so seamlessly fits into reproducible analyses pipelines in 'R'.
Maintained by Alexander Hurley. Last updated 3 years ago.
annotation-tool data-cleaning outlier-detection outlier-removal reproducibility
13.0 match 20 stars 4.38 score 24 scripts
ekstroem
dataMaid:A Suite of Checks for Identification of Potential Errors in a Data Frame as Part of the Data Screening Process
Data screening is an important first step of any statistical analysis. dataMaid auto-generates a customizable data report with a thorough summary of the checks and the results that a human can use to identify possible errors. It provides an extendable suite of tests for common potential errors in a dataset.
Maintained by Claus Thorn Ekstrøm. Last updated 3 years ago.
data-cleaning data-screening reproducible-research
7.5 match 143 stars 7.53 score 236 scripts
egenn
rtemis:Machine Learning and Visualization
Advanced Machine Learning and Visualization. Unsupervised Learning (Clustering, Decomposition), Supervised Learning (Classification, Regression), Cross-Decomposition, Bagging, Boosting, Meta-models. Static and interactive graphics.
Maintained by E.D. Gennatas. Last updated 1 month ago.
data-science data-visualization machine-learning machine-learning-library visualization
7.8 match 145 stars 7.09 score 50 scripts 2 dependents
thinkr-open
thinkr:Tools for Cleaning Up Messy Files
Some tools for cleaning up messy 'Excel' files to be suitable for R. People who have been working with 'Excel' for years built more or less complicated sheets with names, characters, formats that are not homogeneous. To be able to use them in R nowadays, we built a set of functions that will avoid the majority of importation problems and keep all the data at best.
Maintained by Vincent Guyader. Last updated 3 years ago.
hacktoberfest thinkr-not-maintained
7.6 match 29 stars 6.96 score 45 scripts
chrismuir
refinr:Cluster and Merge Similar Values Within a Character Vector
These functions take a character vector as input, identify and cluster similar values, and then merge clusters together so their values become identical. The functions are an implementation of the key collision and ngram fingerprint algorithms from the open source tool Open Refine <https://openrefine.org/>. More info on key collision and ngram fingerprint can be found here <https://openrefine.org/docs/technical-reference/clustering-in-depth>.
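The key collision idea described here can be sketched in base R: values that reduce to the same normalized "fingerprint" are treated as one cluster and merged to a single spelling. This is a simplified illustration and not refinr's implementation; the helper names `fingerprint()` and `merge_by_fingerprint()`, the normalization steps, and the rule of keeping the most frequent spelling are assumptions.

```r
# Hypothetical fingerprint: lowercase, strip punctuation, then split into
# tokens, drop duplicates, sort, and re-join.
fingerprint <- function(x) {
  x <- tolower(x)
  x <- gsub("[[:punct:]]", "", x)
  x <- trimws(x)
  vapply(strsplit(x, "[[:space:]]+"), function(tokens) {
    paste(sort(unique(tokens)), collapse = " ")
  }, character(1))
}

# Replace every value with the most frequent spelling in its cluster.
merge_by_fingerprint <- function(x) {
  keys <- fingerprint(x)
  canonical <- tapply(x, keys, function(v) names(which.max(table(v))))
  unname(as.character(canonical[keys]))
}

# Made-up example values.
vendors <- c("Acme Corp.", "acme corp", "Corp Acme", "Acme Corp.", "Widget Co")
merged <- merge_by_fingerprint(vendors)
```

In this example the first four values share the fingerprint "acme corp", so they are all rewritten to the most common spelling, while "Widget Co" is left unchanged.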
Maintained by Chris Muir. Last updated 1 year ago.
approximate-string-matching clustering data-cleaning data-clustering fuzzy-matching ngram openrefine cpp
7.5 match 104 stars 6.80 score 121 scripts
ropensci
taxa:Classes for Storing and Manipulating Taxonomic Data
Provides classes for storing and manipulating taxonomic data. Most of the classes can be treated like base R vectors (e.g. can be used in tables as columns and can be named). Vectorized classes can store taxon names and authorities, taxon IDs from databases, taxon ranks, and other types of information. More complex classes are provided to store taxonomic trees and user-defined data associated with them.
Maintained by Zachary Foster. Last updated 1 year ago.
taxonomybiologyhierarchydata-cleaningtaxon
7.5 match 48 stars 6.80 score 217 scripts
trinker
qdap:Bridging the Gap Between Qualitative Data and Quantitative Analysis
Automates many of the tasks associated with quantitative discourse analysis of transcripts containing discourse including frequency counts of sentence types, words, sentences, turns of talk, syllables and other assorted analysis tasks. The package provides parsing tools for preparing transcript data. Many functions enable the user to aggregate data by any number of grouping variables, providing analysis and seamless integration with other R packages that undertake higher level analysis and visualization of text. This affords the user a more efficient and targeted analysis. 'qdap' is designed for transcript analysis, however, many functions are applicable to other areas of Text Mining/ Natural Language Processing.
Maintained by Tyler Rinker. Last updated 4 years ago.
qdapquantitative-discourse-analysistext-analysistext-miningtext-plottingopenjdk
5.3 match 176 stars 9.61 score 1.3k scripts 3 dependents
pascoalf
ulrb:Unsupervised Learning Based Definition of Microbial Rare Biosphere
A tool to define rare biosphere. 'ulrb' solves the problem of the definition of rarity by replacing arbitrary thresholds with an unsupervised machine learning algorithm (partitioning around medoids, or k-medoids). This algorithm works for any type of microbiome data, provided there is a species abundance table. For validation of this method to different species abundance tables see Pascoal et al, 2024 (in peer-review). This method also works for non-microbiome data.
Maintained by Francisco Pascoal. Last updated 20 days ago.
8.9 match 3 stars 5.68 score 9 scripts
epiforecasts
socialmixr:Social Mixing Matrices for Infectious Disease Modelling
Provides methods for sampling contact matrices from diary data for use in infectious disease modelling, as discussed in Mossong et al. (2008) <doi:10.1371/journal.pmed.0050074>.
Maintained by Sebastian Funk. Last updated 5 months ago.
5.2 match 38 stars 9.74 score 227 scripts 1 dependents
epiforecasts
EpiNow2:Estimate Real-Time Case Counts and Time-Varying Epidemiological Parameters
Estimates the time-varying reproduction number, rate of spread, and doubling time using a range of open-source tools (Abbott et al. (2020) <doi:10.12688/wellcomeopenres.16006.1>), and current best practices (Gostic et al. (2020) <doi:10.1101/2020.06.18.20134858>). It aims to help users avoid some of the limitations of naive implementations in a framework that is informed by community feedback and is actively supported.
Maintained by Sebastian Funk. Last updated 25 days ago.
backcalculationcovid-19gaussian-processesopen-sourcereproduction-numberstancpp
4.1 match 120 stars 11.88 score 210 scripts
kcuilla
reactablefmtr:Streamlined Table Styling and Formatting for Reactable
Provides various features to streamline and enhance the styling of interactive reactable tables with easy-to-use and highly customizable functions and themes. Apply conditional formatting to cells with data bars, color scales, color tiles, and icon sets. Utilize custom table themes inspired by popular websites and Bootstrap themes. Apply sparkline line & bar charts (note this feature requires the 'dataui' package which can be downloaded from <https://github.com/timelyportfolio/dataui>). Increase the portability and reproducibility of reactable tables by embedding images from the web directly into cells. Save the final table output as a static image or interactive file.
Maintained by Kyle Cuilla. Last updated 2 years ago.
customizationdata-visualizationeasy-to-usereproducibletables
5.6 match 209 stars 8.79 score 460 scripts 4 dependents
oscarkjell
text:Analyses of Text using Transformers Models from HuggingFace, Natural Language Processing and Machine Learning
Link R with Transformers from Hugging Face to transform text variables to word embeddings, where the word embeddings are used to statistically test the mean difference between sets of texts, compute semantic similarity scores between texts, predict numerical variables, and visualize statistically significant words according to various dimensions, etc. For more information see <https://www.r-text.org>.
Maintained by Oscar Kjell. Last updated 4 days ago.
deep-learningmachine-learningnlptransformersopenjdk
3.7 match 146 stars 13.16 score 436 scripts 1 dependents
extendr
rextendr:Call Rust Code from R using the 'extendr' Crate
Provides functions to compile and load Rust code from R, similar to how 'Rcpp' or 'cpp11' allow easy interfacing with C++ code. Also provides helper functions to create R packages that use Rust code. Under the hood, the Rust crate 'extendr' is used to do all the heavy lifting.
Maintained by Ilia Kosenkov. Last updated 24 days ago.
5.1 match 205 stars 9.43 score 61 scripts
aphalo
photobiology:Photobiological Calculations
Definitions of classes, methods, operators and functions for use in photobiology and radiation meteorology and climatology. Calculation of effective (weighted) and not-weighted irradiances/doses, fluence rates, transmittance, reflectance, absorptance, absorbance and diverse ratios and other derived quantities from spectral data. Local maxima and minima: peaks, valleys and spikes. Conversion between energy- and photon-based units. Wavelength interpolation. Astronomical calculations related to solar angles and day length. Colours and vision. This package is part of the 'r4photobiology' suite, Aphalo, P. J. (2015) <doi:10.19232/uv4pb.2015.1.14>.
Maintained by Pedro J. Aphalo. Last updated 3 days ago.
lightphotobiologyquantificationr4photobiology-suiteradiationspectrasun-position
5.1 match 4 stars 9.35 score 604 scripts 12 dependents
beckerbenj
eatGADS:Data Management of Large Hierarchical Data
Import 'SPSS' data, handle and change 'SPSS' meta data, store and access large hierarchical data in 'SQLite' data bases.
Maintained by Benjamin Becker. Last updated 24 days ago.
6.5 match 1 stars 7.36 score 34 scripts 1 dependents
bioc
xcms:LC-MS and GC-MS Data Analysis
Framework for processing and visualization of chromatographically separated and single-spectra mass spectral data. Imports from AIA/ANDI NetCDF, mzXML, mzData and mzML files. Preprocesses data for high-throughput, untargeted analyte profiling.
Maintained by Steffen Neumann. Last updated 3 days ago.
immunooncologymassspectrometrymetabolomicsbioconductorfeature-detectionmass-spectrometrypeak-detectioncpp
3.3 match 196 stars 14.31 score 984 scripts 11 dependents
usaid-oha-si
gophr:Utility functions related to working with the MER Structured Dataset
This package contains a number of functions for working with the PEPFAR MSD.
Maintained by Aaron Chafetz. Last updated 4 months ago.
7.5 match 1 stars 6.21 score 182 scripts 1 dependents
bioc
iNETgrate:Integrates DNA methylation data with gene expression in a single gene network
The iNETgrate package provides functions to build a correlation network in which nodes are genes. DNA methylation and gene expression data are integrated to define the connections between genes. This network is used to identify modules (clusters) of genes. The biological information in each of the resulting modules is represented by an eigengene. These biological signatures can be used as features e.g., for classification of patients into risk categories. The resulting biological signatures are very robust and give a holistic view of the underlying molecular changes.
Maintained by Habil Zare. Last updated 5 months ago.
geneexpressionrnaseqdnamethylationnetworkinferencenetworkgraphandnetworkbiomedicalinformaticssystemsbiologytranscriptomicsclassificationclusteringdimensionreductionprincipalcomponentmrnamicroarraynormalizationgenepredictionkeggsurvivalcore-services
7.5 match 74 stars 6.21 score 1 scripts
chandlerxiandeyang
CleaningValidation:Cleaning Validation Functions for Pharmaceutical Cleaning Process
Provides essential Cleaning Validation functions for complying with pharmaceutical cleaning process regulatory standards. The package includes non-parametric methods to analyze drug active-ingredient residue (DAR), cleaning agent residue (CAR), and microbial colonies (Mic) for non-Poisson distributions. Additionally, Poisson methods are provided for Mic analysis when Mic data follow a Poisson distribution.
Maintained by Xiande Yang. Last updated 10 months ago.
17.1 match 2.70 score
bioc
MultiAssayExperiment:Software for the integration of multi-omics experiments in Bioconductor
Harmonize data management of multiple experimental assays performed on an overlapping set of specimens. It provides a familiar Bioconductor user experience by extending concepts from SummarizedExperiment, supporting an open-ended mix of standard data classes for individual assays, and allowing subsetting by genomic ranges or rownames. Facilities are provided for reshaping data into wide and long formats for adaptability to graphing and downstream analysis.
Maintained by Marcel Ramos. Last updated 2 months ago.
infrastructuredatarepresentationbioconductorbioconductor-packagegenomicsnci-itcrtcgau24ca289073
3.0 match 71 stars 14.95 score 670 scripts 127 dependents
alexchristensen
SemNetCleaner:An Automated Cleaning Tool for Semantic and Linguistic Data
Implements several functions that automate the cleaning and spell-checking of text data. Also converges, finalizes, removes plurals and continuous strings, and puts text data in binary format for semantic network analysis. Uses the 'SemNetDictionaries' package to make the cleaning process more accurate, efficient, and reproducible.
Maintained by Alexander P. Christensen. Last updated 3 years ago.
preprocessingsemantic-network-analysis
7.2 match 10 stars 6.16 score 48 scripts 1 dependents
taxonomicallyinformedannotation
tima:Taxonomically Informed Metabolite Annotation
This package provides the infrastructure to perform Taxonomically Informed Metabolite Annotation.
Maintained by Adriano Rutz. Last updated 6 days ago.
metabolite annotationchemotaxonomyscoring systemnatural productscomputational metabolomicstaxonomic distancespecialized metabolome
6.8 match 9 stars 6.55 score 32 scripts 2 dependents
tidymodels
textrecipes:Extra 'Recipes' for Text Processing
Converting text to numerical features requires specifically created procedures, which are implemented as steps according to the 'recipes' package. These steps allow for tokenization, filtering, counting (tf and tfidf) and feature hashing.
Maintained by Emil Hvitfeldt. Last updated 9 days ago.
4.0 match 160 stars 10.87 score 964 scripts 1 dependents
renands
RMLPCA:Maximum Likelihood Principal Component Analysis
R implementation of Maximum Likelihood Principal Component Analysis. The main idea of this package is to provide an alternative form of PCA for subspace modeling that considers measurement errors. More details can be found in Peter D. Wentzell (2009) <doi:10.1016/B978-0-444-64165-6.03029-9>.
Maintained by Renan Santos Barbosa. Last updated 4 years ago.
13.7 match 2 stars 3.15 score 14 scripts
ikosmidis
cranly:Package Directives and Collaboration Networks in CRAN
Core visualizations and summaries for the CRAN package database. The package provides comprehensive methods for cleaning up and organizing the information in the CRAN package database, for building package directives networks (depends, imports, suggests, enhances, linking to) and collaboration networks, producing package dependence trees, and for computing useful summaries and producing interactive visualizations from the resulting networks and summaries. The resulting networks can be coerced to 'igraph' <https://CRAN.R-project.org/package=igraph> objects for further analyses and modelling.
Maintained by Ioannis Kosmidis. Last updated 3 years ago.
network-analysisnetwork-visualization
6.2 match 49 stars 6.85 score 32 scripts 1 dependents
nelson-gon
mde:Missing Data Explorer
Correct identification and handling of missing data is one of the most important steps in any analysis. To aid this process, 'mde' provides a very easy to use yet robust framework to quickly get an idea of where the missing data lies and therefore find the most appropriate action to take. Graham WJ (2009) <doi:10.1146/annurev.psych.58.110405.085530>.
Maintained by Nelson Gonzabato. Last updated 3 years ago.
data-analysisdata-cleaningdata-explorationdata-sciencedatacleanerdatacleaningexploratory-data-analysismissingmissing-datamissing-value-treatmentmissing-valuesmissingnessomitrecodereplacestatistics
7.5 match 4 stars 5.61 score 34 scripts
kwb-r
kwb.endnote:Helper Functions for Analysing KWB Endnote Library (Exported as .xml)
Helper Functions For Analysing KWB Endnote Library (Exported As .XML).
Maintained by Michael Rustler. Last updated 4 years ago.
endnoteknowledge-repoliterature-data-managementproject-fakinpublication
14.0 match 3.00 score 2 scripts
usaid-oha-si
glamr:SI Utilities Package
Provides a series of base functions useful to the GH OHA SI team. This includes project setup, pulling from DATIM, and key functions for working with the MSD.
Maintained by Aaron Chafetz. Last updated 6 months ago.
5.7 match 2 stars 7.28 score 1.3k scripts 1 dependents
bioc
affy:Methods for Affymetrix Oligonucleotide Arrays
The package contains functions for exploratory oligonucleotide array analysis. The dependence on tkWidgets only concerns a few convenience functions. 'affy' is fully functional without it.
Maintained by Robert D. Shear. Last updated 2 months ago.
microarrayonechannelpreprocessing
3.8 match 11.12 score 2.5k scripts 98 dependents
r-lib
pkgdown:Make Static HTML Documentation for a Package
Generate an attractive and useful website from a source package. 'pkgdown' converts your documentation, vignettes, 'README', and more to 'HTML' making it easy to share information about your package online.
Maintained by Hadley Wickham. Last updated 11 hours ago.
2.3 match 734 stars 18.47 score 588 scripts 162 dependents
yihui
knitr:A General-Purpose Package for Dynamic Report Generation in R
Provides a general-purpose tool for dynamic report generation in R using Literate Programming techniques.
Maintained by Yihui Xie. Last updated 4 hours ago.
dynamic-documentsknitrliterate-programmingrmarkdownsweave
1.8 match 2.4k stars 23.61 score 116k scripts 4.2k dependents
rstudio
tfruns:Training Run Tools for 'TensorFlow'
Create and manage unique directories for each 'TensorFlow' training run. Provides a unique, time stamped directory for each run along with functions to retrieve the directory of the latest run or latest several runs.
Maintained by Tomasz Kalinowski. Last updated 11 months ago.
3.5 match 34 stars 11.80 score 325 scripts 77 dependents
pecanproject
PEcAn.settings:PEcAn Settings package
Contains functions to read PEcAn settings files.
Maintained by David LeBauer. Last updated 2 days ago.
bayesiancyberinfrastructuredata-assimilationdata-scienceecosystem-modelecosystem-scienceforecastingmeta-analysisnational-science-foundationpecanplants
4.1 match 216 stars 10.00 score 54 scripts 17 dependents
blasbenito
distantia:Advanced Toolset for Efficient Time Series Dissimilarity Analysis
Fast C++ implementation of Dynamic Time Warping for time series dissimilarity analysis, with applications in environmental monitoring and sensor data analysis, climate science, signal processing and pattern recognition, and financial data analysis. Built upon the ideas presented in Benito and Birks (2020) <doi:10.1111/ecog.04895>, provides tools for analyzing time series of varying lengths and structures, including irregular multivariate time series. Key features include individual variable contribution analysis, restricted permutation tests for statistical significance, and imputation of missing data via GAMs. Additionally, the package provides an ample set of tools to prepare and manage time series data.
Maintained by Blas M. Benito. Last updated 25 days ago.
dissimilaritydynamic-time-warpinglock-steptime-seriescpp
7.1 match 23 stars 5.76 score 11 scripts
weecology
portalr:Create Useful Summaries of the Portal Data
Download and generate summaries for the rodent, plant, ant, and weather data from the Portal Project. Portal is a long-term (and ongoing) experimental monitoring site in the Chihuahuan desert. The raw data files can be found at <https://github.com/weecology/portaldata>.
Maintained by Glenda M. Yenni. Last updated 4 months ago.
community-ecologyecologysmall-mammal-trapping
5.3 match 11 stars 7.64 score 63 scripts
bioc
GWASTools:Tools for Genome Wide Association Studies
Classes for storing very large GWAS data sets and annotation, and functions for GWAS data cleaning and analysis.
Maintained by Stephanie M. Gogarten. Last updated 5 months ago.
snpgeneticvariabilityqualitycontrolmicroarray
3.9 match 17 stars 10.50 score 396 scripts 5 dependents
r-spatial
qgisprocess:Use 'QGIS' Processing Algorithms
Provides seamless access to the 'QGIS' (<https://qgis.org>) processing toolbox using the standalone 'qgis_process' command-line utility. Both native and third-party (plugin) processing providers are supported. Besides references to data sources on file, common objects from 'sf', 'terra' and 'stars' are also supported. The native processing algorithms are documented by QGIS.org (2024) <https://docs.qgis.org/latest/en/docs/user_manual/processing_algs/>.
Maintained by Floris Vanderhaeghe. Last updated 5 months ago.
4.0 match 210 stars 10.09 score 175 scripts
pachadotdev
cpp11armadillo:An 'Armadillo' Interface
Provides function declarations and inline function definitions that facilitate communication between R and the 'Armadillo' 'C++' library for linear algebra and scientific computing. This implementation is detailed in Vargas Sepulveda and Schneider Malamud (2024) <doi:10.48550/arXiv.2408.11074>.
Maintained by Mauricio Vargas Sepulveda. Last updated 26 days ago.
armadillocppcpp11hacktoberfestlinear-algebra
4.4 match 9 stars 9.14 score 1 scripts 16 dependents
inbo
checklist:A Thorough and Strict Set of Checks for R Packages and Source Code
An opinionated set of rules for R packages and R source code projects.
Maintained by Thierry Onkelinx. Last updated 27 days ago.
checklistcontinuous-integrationcontinuous-testingquality-assurance
5.5 match 19 stars 7.24 score 21 scripts 2 dependents
rstudio
packrat:A Dependency Management System for Projects and their R Package Dependencies
Manage the R packages your project depends on in an isolated, portable, and reproducible way.
Maintained by Aron Atkins. Last updated 1 month ago.
3.3 match 406 stars 12.15 score 256 scripts 9 dependents
datalowe
synr:Explore and Process Synesthesia Consistency Test Data
Explore synesthesia consistency test data, calculate consistency scores, and classify participant data as valid or invalid.
Maintained by Lowe Wilsson. Last updated 1 year ago.
7.5 match 5.32 score 139 scripts
bioc
ShortRead:FASTQ input and manipulation
This package implements sampling, iteration, and input of FASTQ files. The package includes functions for filtering and trimming reads, and for generating a quality assessment report. Data are represented as DNAStringSet-derived objects, and easily manipulated for a diversity of purposes. The package also contains legacy support for early single-end, ungapped alignment formats.
Maintained by Bioconductor Package Maintainer. Last updated 5 months ago.
dataimportsequencingqualitycontrolbioconductor-packagecore-packagezlibcpp
3.3 match 8 stars 12.08 score 1.8k scripts 49 dependents
bioc
beer:Bayesian Enrichment Estimation in R
BEER implements a Bayesian model for analyzing phage-immunoprecipitation sequencing (PhIP-seq) data. Given a PhIPData object, BEER returns posterior probabilities of enriched antibody responses, point estimates for the relative fold-change in comparison to negative control samples, and more. Additionally, BEER provides a convenient implementation for using edgeR to identify enriched antibody responses.
Maintained by Athena Chen. Last updated 5 months ago.
softwarestatisticalmethodbayesiansequencingcoveragejagscpp
7.4 match 10 stars 5.38 score 12 scripts
ambuvjyn
baseq:Basic Sequence Processing Tool for Biological Data
Primarily created as an easy and understandable way to do basic sequence processing surrounding the central dogma of molecular biology.
Maintained by Ambu Vijayan. Last updated 2 years ago.
9.9 match 2 stars 4.00 score
gpilgrim2670
SwimmeR:Data Import, Cleaning, and Conversions for Swimming Results
The goal of the 'SwimmeR' package is to provide means of acquiring, and then analyzing, data from swimming (and diving) competitions. To that end 'SwimmeR' allows results to be read in from '.html' sources, like 'Hy-Tek' real time results pages, '.pdf' files, 'ISL' results, 'Omega' results, and (on a development basis) '.hy3' files. Once read in, 'SwimmeR' can convert swimming times (performances) between the computationally useful format of seconds reported to the '100ths' place (e.g. 95.37), and the conventional reporting format (1:35.37) used in the swimming community. 'SwimmeR' can also score meets in a variety of formats with user defined point values, convert times between courses ('LCM', 'SCM', 'SCY') and draw single elimination brackets, as well as providing a suite of tools for cleaning swimming data. This is a developmental package, not yet mature.
Maintained by Greg Pilgrim. Last updated 2 years ago.
8.6 match 4 stars 4.53 score 17 scripts
r-lib
devtools:Tools to Make Developing R Packages Easier
Collection of package development tools.
Maintained by Jennifer Bryan. Last updated 6 months ago.
2.0 match 2.4k stars 19.51 score 51k scripts 148 dependents
wenlong-liu
usfertilizer:County-Level Estimates of Fertilizer Application in USA
Compiled and cleaned the county-level estimates of fertilizer, nitrogen and phosphorus, from 1945 to 2012 in United States of America (USA). The commercial fertilizer data were originally generated by USGS based on the sales data of commercial fertilizer. The manure data were estimated based on county-level population data of livestock, poultry, and other animals. See the user manual for detailed data sources and cleaning methods. 'usfertilizer' utilized the tidyverse to clean the original data and provide a user-friendly data frame. Please note that USGS does not endorse this package. Also, data from 1986 is not available for now.
Maintained by Wenlong Liu. Last updated 7 years ago.
8.9 match 11 stars 4.34 score 1 scripts
tvganesh
cricketr:Analyze Cricketers and Cricket Teams Based on ESPN Cricinfo Statsguru
Tools for analyzing performances of cricketers based on stats in ESPN Cricinfo Statsguru. The toolset can be used for analysis of Tests, ODIs and Twenty20 matches of both batsmen and bowlers. The package can also be used to analyze team performances.
Maintained by Tinniam V Ganesh. Last updated 4 years ago.
6.9 match 62 stars 5.55 score 115 scripts
silvadenisson
electionsBR:R Functions to Download and Clean Brazilian Electoral Data
Offers a set of functions to easily download and clean Brazilian electoral data from the Superior Electoral Court and 'CepespData' websites. Among other features, the package retrieves data on local and federal elections for all positions (city councilor, mayor, state deputy, federal deputy, governor, and president) aggregated by state, city, and electoral zones.
Maintained by Denisson Silva. Last updated 4 months ago.
5.1 match 65 stars 7.54 score 66 scripts
msberends
AMR:Antimicrobial Resistance Data Analysis
Functions to simplify and standardise antimicrobial resistance (AMR) data analysis and to work with microbial and antimicrobial properties by using evidence-based methods, as described in <doi:10.18637/jss.v104.i03>.
Maintained by Matthijs S. Berends. Last updated 7 hours ago.
amrantimicrobial-dataepidemiologymicrobiologysoftware
3.2 match 92 stars 11.87 score 182 scripts 6 dependents
ropensci
taxlist:Handling Taxonomic Lists
Handling taxonomic lists through objects of class 'taxlist'. This package provides functions to import species lists from 'Turboveg' (<https://www.synbiosys.alterra.nl/turboveg/>) and the possibility to create backups from resulting R-objects. Also quick displays are implemented as summary-methods.
Maintained by Miguel Alvarez. Last updated 6 months ago.
5.3 match 12 stars 7.07 score 81 scripts 2 dependents
tirgit
missCompare:Intuitive Missing Data Imputation Framework
Offers a convenient pipeline to test and compare various missing data imputation algorithms on simulated and real data. These include simpler methods, such as mean and median imputation and random replacement, but also include more sophisticated algorithms already implemented in popular R packages, such as 'mi', described by Su et al. (2011) <doi:10.18637/jss.v045.i02>; 'mice', described by van Buuren and Groothuis-Oudshoorn (2011) <doi:10.18637/jss.v045.i03>; 'missForest', described by Stekhoven and Buhlmann (2012) <doi:10.1093/bioinformatics/btr597>; 'missMDA', described by Josse and Husson (2016) <doi:10.18637/jss.v070.i01>; and 'pcaMethods', described by Stacklies et al. (2007) <doi:10.1093/bioinformatics/btm069>. The central assumption behind 'missCompare' is that structurally different datasets (e.g. larger datasets with a large number of correlated variables vs. smaller datasets with non correlated variables) will benefit differently from different missing data imputation algorithms. 'missCompare' takes measurements of your dataset and sets up a sandbox to try a curated list of standard and sophisticated missing data imputation algorithms and compares them assuming custom missingness patterns. 'missCompare' will also impute your real-life dataset for you after the selection of the best performing algorithm in the simulations. The package also provides various post-imputation diagnostics and visualizations to help you assess imputation performance.
Maintained by Tibor V. Varga. Last updated 4 years ago.
comparisoncomparison-benchmarksimputationimputation-algorithmimputation-methodsimputationskolmogorov-smirnovmissingmissing-datamissing-data-imputationmissing-status-checkmissing-valuesmissingnesspost-imputation-diagnosticsrmse
6.3 match 39 stars 5.89 score 40 scripts
rstudio
rmarkdown:Dynamic Documents for R
Convert R Markdown documents into a variety of formats.
Maintained by Yihui Xie. Last updated 4 months ago.
literate-programmingmarkdownpandocrmarkdown
1.7 match 2.9k stars 21.79 score 14k scripts 3.7k dependents
ecohealthalliance
ohcleandat:One Health Data Cleaning and Quality Checking Package
This package provides useful functions to orchestrate analytics and data cleaning pipelines for One Health projects.
Maintained by Collin Schwantes. Last updated 5 days ago.
7.6 match 1 stars 4.88 score 5 scripts
samhforbes
eyetrackingR:Eye-Tracking Data Analysis
Addresses tasks along the pipeline from raw data to analysis and visualization for eye-tracking data. Offers several popular types of analyses, including linear and growth curve time analyses, onset-contingent reaction time analyses, as well as several non-parametric bootstrapping approaches. For references to the approach see Mirman, Dixon & Magnuson (2008) <doi:10.1016/j.jml.2007.11.006>, and Barr (2008) <doi:10.1016/j.jml.2007.09.002>.
Maintained by Samuel Forbes. Last updated 2 years ago.
4.7 match 22 stars 7.84 score 60 scripts
bioc
DropletUtils:Utilities for Handling Single-Cell Droplet Data
Provides a number of utility functions for handling single-cell (RNA-seq) data from droplet technologies such as 10X Genomics. This includes data loading from count matrices or molecule information files, identification of cells from empty droplets, removal of barcode-swapped pseudo-cells, and downsampling of the count matrix.
Maintained by Jonathan Griffiths. Last updated 3 months ago.
immunooncologysinglecellsequencingrnaseqgeneexpressiontranscriptomicsdataimportcoveragezlibcpp
3.7 match 10.08 score 2.7k scripts 9 dependents
tguillerme
dispRity:Measuring Disparity
A modular package for measuring disparity (multidimensional space occupancy). Disparity can be calculated from any matrix defining a multidimensional space. The package provides a set of implemented metrics to measure properties of the space and allows users to provide and test their own metrics. The package also provides functions for looking at disparity in a serial way (e.g. disparity through time) or per groups as well as visualising the results. Finally, this package provides several statistical tests for disparity analysis.
Maintained by Thomas Guillerme. Last updated 2 days ago.
disparityecologymultidimensionalitypalaeobiology
4.3 match 26 stars 8.69 score 220 scripts 1 dependents
cran
textreg:n-Gram Text Regression, aka Concise Comparative Summarization
Function for sparse regression on raw text, regressing a labeling vector onto a feature space consisting of all possible phrases.
Maintained by Luke Miratrix. Last updated 6 years ago.
11.0 match 1 stars 3.26 score
tom-wolff
ideanet:Integrating Data Exchange and Analysis for Networks ('ideanet')
A suite of convenient tools for social network analysis geared toward students, entry-level users, and non-expert practitioners. ‘ideanet’ features unique functions for the processing and measurement of sociocentric and egocentric network data. These functions automatically generate node- and system-level measures commonly used in the analysis of these types of networks. Outputs from these functions maximize the ability of novice users to employ network measurements in further analyses while making all users less prone to common data analytic errors. Additionally, ‘ideanet’ features an R Shiny graphic user interface that allows novices to explore network data with minimal need for coding.
Maintained by Tom Wolff. Last updated 3 days ago.
5.3 match 6 stars 6.80 score 10 scripts
cloudyr
googleComputeEngineR:R Interface with Google Compute Engine
Interact with the 'Google Compute Engine' API in R. Lets you create, start and stop instances in the 'Google Cloud'. Support for preconfigured instances, with templates for common R needs.
Maintained by Mark Edmondson. Last updated 1 day ago.
apicloud-computingcloudyrgoogle-cloudgoogleauthrlaunching-virtual-machines
3.7 match 152 stars 9.73 score 235 scripts
harrison4192
framecleaner:Clean Data Frames
Provides a friendly interface for modifying data frames with a sequence of piped commands built upon the 'tidyverse', Wickham et al. (2019) <doi:10.21105/joss.01686>. The majority of commands wrap 'dplyr' mutate statements in a convenient way to concisely solve common issues that arise when tidying small to medium data sets. Includes smart defaults and allows flexible selection of columns via 'tidyselect'.
Maintained by Harrison Tietze. Last updated 1 year ago.
6.8 match 2 stars 5.18 score 5 scripts 5 dependents
bioboot
bio3d:Biological Structure Analysis
Utilities to process, organize and explore protein structure, sequence and dynamics data. Features include the ability to read and write structure, sequence and dynamic trajectory data, perform sequence and structure database searches, data summaries, atom selection, alignment, superposition, rigid core identification, clustering, torsion analysis, distance matrix analysis, structure and sequence conservation analysis, normal mode analysis, principal component analysis of heterogeneous structure data, and correlation network analysis from normal mode and molecular dynamics data. In addition, various utility functions are provided to enable the statistical and graphical power of the R environment to work with biological sequence and structural data. Please refer to the URLs below for more information.
Maintained by Barry Grant. Last updated 5 months ago.
4.1 match 5 stars 8.49 score 1.4k scripts 10 dependents
bgreenwell
bpa:Basic Pattern Analysis
Run basic pattern analyses on character sets, digits, or combined input containing both characters and numeric digits. Useful for data cleaning and for identifying columns containing multiple or nonstandard formats.
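The idea behind basic pattern analysis can be sketched in a few lines. The Python snippet below is an illustration only and does not use or reproduce the bpa API: the function names `basic_pattern` and `pattern_counts`, and the placeholder characters `9`, `A`, and `a`, are assumptions chosen for this example.

```python
from collections import Counter
from typing import Dict, Iterable


def basic_pattern(value: str) -> str:
    """Collapse a string to its character-class pattern (hypothetical helper)."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("9")   # any digit
        elif ch.isupper():
            out.append("A")   # any uppercase letter
        elif ch.islower():
            out.append("a")   # any lowercase letter
        else:
            out.append(ch)    # punctuation and whitespace are kept as-is
    return "".join(out)


def pattern_counts(column: Iterable[str]) -> Dict[str, int]:
    """Count how often each pattern occurs in a column of strings."""
    return dict(Counter(basic_pattern(v) for v in column))
```

Under these assumptions, a column that mixes `2016-03-14` with `Mar 14, 2016` produces two distinct patterns (`9999-99-99` and `Aaa 99, 9999`), which is the kind of signal that marks a column as holding multiple formats.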
Maintained by Brandon Greenwell. Last updated 9 years ago.
basic-pattern-analysisdata-cleaningstandardization
8.0 match 3 stars 4.32 score 14 scripts
cardiomoon
ggiraphExtra:Make Interactive 'ggplot2'. Extension to 'ggplot2' and 'ggiraph'
Collection of functions to enhance 'ggplot2' and 'ggiraph'. Provides functions for exploratory plots. All plots can be 'static' plots or 'interactive' plots using 'ggiraph'.
Maintained by Keon-Woong Moon. Last updated 4 years ago.
3.9 match 48 stars 8.93 score 402 scripts 3 dependents
bioc
scde:Single Cell Differential Expression
The scde package implements a set of statistical methods for analyzing single-cell RNA-seq data. scde fits individual error models for single-cell RNA-seq measurements. These models can then be used for assessment of differential expression between groups of cells, as well as other types of analysis. The scde package also contains the pagoda framework which applies pathway and gene set overdispersion analysis to identify and characterize putative cell subpopulations based on transcriptional signatures. The overall approach to the differential expression analysis is detailed in the following publication: "Bayesian approach to single-cell differential expression analysis" (Kharchenko PV, Silberstein L, Scadden DT, Nature Methods, doi: 10.1038/nmeth.2967). The overall approach to subpopulation identification and characterization is detailed in the following pre-print: "Characterizing transcriptional heterogeneity through pathway and gene set overdispersion analysis" (Fan J, Salathia N, Liu R, Kaeser G, Yung Y, Herman J, Kaper F, Fan JB, Zhang K, Chun J, and Kharchenko PV, Nature Methods, doi:10.1038/nmeth.3734).
Maintained by Evan Biederstedt. Last updated 5 months ago.
immunooncologyrnaseqstatisticalmethoddifferentialexpressionbayesiantranscriptionsoftwareanalysisbioinformaticsheterogenityngssingle-celltranscriptomicsopenblascppopenmp
4.5 match 173 stars 7.53 score 141 scripts
jaseziv
worldfootballR:Extract and Clean World Football (Soccer) Data
Allows users to obtain clean and tidy football (soccer) game, team and player data. Data is collected from a number of popular sites, including 'FBref', transfer and valuations data from 'Transfermarkt' <https://www.transfermarkt.com/> and shooting location and other match stats data from 'Understat' <https://understat.com/>. It gives users the ability to access data more efficiently, rather than having to export data tables to files before being able to complete their analysis.
Maintained by Jason Zivkovic. Last updated 1 month ago.
fbreffootballfootball-datasoccer-datasports-datatransfermarktunderstat
3.4 match 506 stars 9.89 score 516 scripts 2 dependents
e-sensing
sits:Satellite Image Time Series Analysis for Earth Observation Data Cubes
An end-to-end toolkit for land use and land cover classification using big Earth observation data, based on machine learning methods applied to satellite image data cubes, as described in Simoes et al (2021) <doi:10.3390/rs13132428>. Builds regular data cubes from collections in AWS, Microsoft Planetary Computer, Brazil Data Cube, Copernicus Data Space Environment (CDSE), Digital Earth Africa, Digital Earth Australia, NASA HLS using the Spatio-temporal Asset Catalog (STAC) protocol (<https://stacspec.org/>) and the 'gdalcubes' R package developed by Appel and Pebesma (2019) <doi:10.3390/data4030092>. Supports visualization methods for images and time series and smoothing filters for dealing with noisy time series. Includes functions for quality assessment of training samples using self-organized maps as presented by Santos et al (2021) <doi:10.1016/j.isprsjprs.2021.04.014>. Includes methods to reduce training samples imbalance proposed by Chawla et al (2002) <doi:10.1613/jair.953>. Provides machine learning methods including support vector machines, random forests, extreme gradient boosting, multi-layer perceptrons, temporal convolutional neural networks proposed by Pelletier et al (2019) <doi:10.3390/rs11050523>, and temporal attention encoders by Garnot and Landrieu (2020) <doi:10.48550/arXiv.2007.00586>. Supports GPU processing of deep learning models using torch <https://torch.mlverse.org/>. Performs efficient classification of big Earth observation data cubes and includes functions for post-classification smoothing based on Bayesian inference as described by Camara et al (2024) <doi:10.3390/rs16234572>, and methods for active learning and uncertainty assessment. Supports region-based time series analysis using package supercells <https://jakubnowosad.com/supercells/>. Enables best practices for estimating area and assessing accuracy of land change as recommended by Olofsson et al (2014) <doi:10.1016/j.rse.2014.02.015>. 
Minimum recommended requirements: 16 GB RAM and 4 CPU dual-core.
Maintained by Gilberto Camara. Last updated 1 month ago.
big-earth-datacbersearth-observationeo-datacubesgeospatialimage-time-seriesland-cover-classificationlandsatplanetary-computerr-spatialremote-sensingrspatialsatellite-image-time-seriessatellite-imagerysentinel-2stac-apistac-catalogcpp
3.5 match 494 stars 9.50 score 384 scripts
alphaprime7
normfluodbf:Cleans and Normalizes FLUOstar DBF and DAT Files from 'Liposome' Flux Assays
Cleans and normalizes FLUOstar DBF and DAT files obtained from liposome flux assays. Users should verify extended usage of the package on files from other assay types.
Maintained by Tingwei Adeck. Last updated 4 months ago.
6.5 match 1 star 4.98 score 12 scripts
jbdorey
BeeBDC:Occurrence Data Cleaning
Flags and checks occurrence data that are in Darwin Core format. The package includes generic functions and data as well as some that are specific to bees. This package is meant to build upon and be complementary to other excellent occurrence cleaning packages, including 'bdc' and 'CoordinateCleaner'. This package uses datasets from several sources and particularly from the Discover Life Website, created by Ascher and Pickering (2020). For further information, please see the original publication and package website. Publication - Dorey et al. (2023) <doi:10.1101/2023.06.30.547152> and package website - Dorey et al. (2023) <https://github.com/jbdorey/BeeBDC>.
Maintained by James B. Dorey. Last updated 4 months ago.
5.6 match 3 stars 5.68 score 7 scripts
rstudio
bookdown:Authoring Books and Technical Documents with R Markdown
Output formats and utilities for authoring books and technical documents with R Markdown.
Maintained by Yihui Xie. Last updated 2 days ago.
bookbookdownepubgitbookhtmllatexrmarkdown
1.8 match 3.9k stars 17.51 score 1.7k scripts 136 dependents
bioc
MSstatsConvert:Import Data from Various Mass Spectrometry Signal Processing Tools to MSstats Format
MSstatsConvert provides tools for importing reports of Mass Spectrometry data processing tools into an R format suitable for statistical analysis using the MSstats and MSstatsTMT packages.
Maintained by Mateusz Staniak. Last updated 3 months ago.
massspectrometryproteomicssoftwaredataimportqualitycontrol
4.9 match 6.37 score 25 scripts 7 dependents
kassambara
rstatix:Pipe-Friendly Framework for Basic Statistical Tests
Provides a simple and intuitive pipe-friendly framework, coherent with the 'tidyverse' design philosophy, for performing basic statistical tests, including t-test, Wilcoxon test, ANOVA, Kruskal-Wallis and correlation analyses. The output of each test is automatically transformed into a tidy data frame to facilitate visualization. Additional functions are available for reshaping, reordering, manipulating and visualizing a correlation matrix. Functions are also included to facilitate the analysis of factorial experiments, including purely 'within-Ss' designs (repeated measures), purely 'between-Ss' designs, and mixed 'within-and-between-Ss' designs. It's also possible to compute several effect size metrics, including "eta squared" for ANOVA, "Cohen's d" for t-test and 'Cramer V' for the association between categorical variables. The package contains helper functions for identifying univariate and multivariate outliers, assessing normality and homogeneity of variances.
Maintained by Alboukadel Kassambara. Last updated 2 years ago.
2.0 match 456 stars 15.16 score 11k scripts 420 dependents
jpquast
protti:Bottom-Up Proteomics and LiP-MS Quality Control and Data Analysis Tools
Useful functions and workflows for proteomics quality control and data analysis of both limited proteolysis-coupled mass spectrometry (LiP-MS) (Feng et al. (2014) <doi:10.1038/nbt.2999>) and regular bottom-up proteomics experiments. Data generated with search tools such as 'Spectronaut', 'MaxQuant' and 'Proteome Discoverer' can be used easily thanks to the flexibility of the functions.
Maintained by Jan-Philipp Quast. Last updated 5 months ago.
data-analysislip-msmass-spectrometryomicsproteinproteomicssystems-biology
3.5 match 61 stars 8.58 score 83 scripts
cbailiss
pivottabler:Create Pivot Tables
Create regular pivot tables with just a few lines of R. More complex pivot tables can also be created, e.g. pivot tables with irregular layouts, multiple calculations and/or derived calculations based on multiple data frames. Pivot tables are constructed using R only and can be written to a range of output formats (plain text, 'HTML', 'Latex' and 'Excel'), including with styling/formatting.
Maintained by Christopher Bailiss. Last updated 1 year ago.
calculationshtmlhtmlwidgetlatexpivot-tablesvisualization
3.7 match 122 stars 8.08 score 358 scripts 1 dependent
usa-npn
rnpn:Interface to the National 'Phenology' Network 'API'
Programmatic interface to the Web Service methods provided by the National 'Phenology' Network (<https://usanpn.org/>), which includes data on various life history events that occur at specific times.
Maintained by Jeff Switzer. Last updated 5 days ago.
datanational-phenology-networkphenologyspeciesweb-api
3.3 match 21 stars 8.82 score 109 scripts
bioc
flowClean:flowClean
A quality control tool for flow cytometry data based on compositional data analysis.
Maintained by Kipper Fletez-Brant. Last updated 5 months ago.
flowcytometryqualitycontrolimmunooncology
6.5 match 4.56 score 18 scripts
christophergandrud
DataCombine:Tools for Easily Combining and Cleaning Data Sets
Tools for combining and cleaning data sets, particularly with grouped and time series data. This includes functions for merging data while reporting duplicates, filling in columns with values of a column in another data frame, and creating continuous time data for interrupted time series.
Maintained by Christopher Gandrud. Last updated 5 years ago.
3.4 match 55 stars 8.50 score 864 scripts 3 dependents
reconhub
matchmaker:Flexible Dictionary-Based Cleaning
Provides flexible dictionary-based cleaning that allows users to specify implicit and explicit missing data, regular expressions for both data and columns, and global matches, while respecting ordering of factors. This package is part of the 'RECON' (<https://www.repidemicsconsortium.org/>) toolkit for outbreak analysis.
Maintained by Zhian N. Kamvar. Last updated 5 years ago.
5.3 match 9 stars 5.43 score 9 scripts 2 dependents
bioc
iSEE:Interactive SummarizedExperiment Explorer
Create an interactive Shiny-based graphical user interface for exploring data stored in SummarizedExperiment objects, including row- and column-level metadata. The interface supports transmission of selections between plots and tables, code tracking, interactive tours, interactive or programmatic initialization, preservation of app state, and extensibility to new panel types via S4 classes. Special attention is given to single-cell data in a SingleCellExperiment object with visualization of dimensionality reduction results.
Maintained by Kevin Rue-Albrecht. Last updated 11 days ago.
cellbasedassaysclusteringdimensionreductionfeatureextractiongeneexpressionguiimmunooncologyshinyappssinglecelltranscriptiontranscriptomicsvisualizationdimension-reductionfeature-extractiongene-expressionhacktoberfesthuman-cell-atlasshinysingle-cell
2.3 match 225 stars 12.86 score 380 scripts 9 dependents
helixcn
phylotools:Phylogenetic Tools for Eco-Phylogenetics
A collection of tools for building a RAxML supermatrix using PHYLIP or aligned FASTA files. These functions will be useful for building large phylogenies using multiple markers.
Maintained by Jinlong Zhang. Last updated 5 months ago.
3.9 match 11 stars 7.31 score 368 scripts
ropensci
jstor:Read Data from JSTOR/DfR
Functions and helpers to import metadata, ngrams and full-texts delivered by Data for Research by JSTOR.
Maintained by Thomas Klebel. Last updated 8 months ago.
jstorpeer-reviewedtext-analysistext-mining
3.9 match 47 stars 7.29 score 55 scripts
thinkr-open
fusen:Build a Package from Rmarkdown Files
Use the 'Rmarkdown First' method to build your package. Start your package with documentation, functions, examples and tests in a single file. Everything can be set from the Rmarkdown template file provided in your project, then inflated as a package. Inflating the template copies the relevant chunks and sections into the appropriate files required for package development.
Maintained by Vincent Guyader. Last updated 2 months ago.
3.0 match 163 stars 9.45 score 35 scripts
msperlin
BatchGetSymbols:Downloads and Organizes Financial Data for Multiple Tickers
Makes it easy to download financial data from Yahoo Finance <https://finance.yahoo.com/>.
Maintained by Marcelo Perlin. Last updated 3 years ago.
financial-dataindividual-stockstickersyahoo-finance
3.8 match 18 stars 7.21 score 393 scripts
janmarvin
openxlsx2:Read, Write and Edit 'xlsx' Files
Simplifies the creation of 'xlsx' files by providing a high level interface to writing, styling and editing worksheets.
Maintained by Jan Marvin Garbuszus. Last updated 2 days ago.
2.0 match 138 stars 13.67 score 194 scripts 11 dependents
r-lib
covr:Test Coverage for Packages
Track and report code coverage for your package and (optionally) upload the results to a coverage service like 'Codecov' <https://about.codecov.io> or 'Coveralls' <https://coveralls.io>. Code coverage is a measure of the amount of code being exercised by a set of tests. It is an indirect measure of test quality and completeness. This package is compatible with any testing methodology or framework and tracks coverage of both R code and compiled C/C++/FORTRAN code.
Maintained by Jim Hester. Last updated 1 month ago.
codecovcoveragecoverage-reporttravis-ci
1.8 match 337 stars 15.25 score 2.3k scripts 9 dependents
ropensci
restez:Create and Query a Local Copy of 'GenBank' in R
Download large sections of 'GenBank' <https://www.ncbi.nlm.nih.gov/genbank/> and generate a local SQL-based database. A user can then query this database using 'restez' functions or through 'rentrez' <https://CRAN.R-project.org/package=rentrez> wrappers.
Maintained by Joel H. Nitta. Last updated 10 days ago.
3.8 match 26 stars 7.01 score 175 scripts 1 dependent
johncoene
packer:An Opinionated Framework for Using 'JavaScript'
Enforces good practice and provides convenience functions to make work with 'JavaScript' not just easier but also scalable. It is a robust wrapper to 'NPM', 'yarn', and 'webpack' that makes it possible to compartmentalize 'JavaScript' code, leverage 'NPM' and 'yarn' packages, include 'TypeScript', 'React', or 'Vue' in web applications, and much more.
Maintained by John Coene. Last updated 7 months ago.
4.3 match 148 stars 6.25 score 1 script 2 dependents
r-lib
pak:Another Approach to Package Installation
The goal of 'pak' is to make package installation faster and more reliable. In particular, it performs all HTTP operations in parallel, so metadata resolution and package downloads are fast. Metadata and package files are cached on the local disk as well. 'pak' has a dependency solver, so it finds version conflicts before performing the installation. This version of 'pak' supports CRAN, 'Bioconductor' and 'GitHub' packages as well.
Maintained by Gábor Csárdi. Last updated 16 hours ago.
2.0 match 717 stars 13.05 score 277 scripts 17 dependents
r-lib
ps:List, Query, Manipulate System Processes
List, query and manipulate all system processes, on 'Windows', 'Linux' and 'macOS'.
Maintained by Gábor Csárdi. Last updated 17 days ago.
1.7 match 79 stars 15.09 score 108 scripts 1.5k dependents
r-spatial
rgee:R Bindings for Calling the 'Earth Engine' API
Earth Engine <https://earthengine.google.com/> client library for R. All of the 'Earth Engine' API classes, modules, and functions are made available. Additional functions implemented include importing (exporting) of Earth Engine spatial objects, extraction of time series, interactive map display, assets management interface, and metadata display. See <https://r-spatial.github.io/rgee/> for further details.
Maintained by Cesar Aybar. Last updated 4 days ago.
earth-engineearthenginegoogle-earth-enginegoogleearthenginespatial-analysisspatial-data
1.9 match 715 stars 13.77 score 1.9k scripts 3 dependents
truenomad
epiCleanr:A Tidy Solution for Epidemiological Data
Offers a tidy solution for epidemiological data. It houses a range of functions for epidemiologists and public health data wizards for data management and cleaning.
Maintained by Mohamed A. Yusuf. Last updated 1 year ago.
7.0 match 3.70 score 4 scripts
capitalone
dataCompareR:Compare Two Data Frames and Summarise the Difference
Easy comparison of two tabular data objects in R. Specifically designed to show differences between two sets of data in a useful way that should make it easier to understand the differences, and if necessary, help you work out how to remedy them. Aims to offer a more useful output than all.equal() when your two data sets do not match, but isn't intended to replace all.equal() as a way to test for equality.
Maintained by Sarah Johnston. Last updated 2 years ago.
compare-datadatadata-analysisdata-science
3.5 match 76 stars 7.24 score 76 scripts
rstudio
promises:Abstractions for Promise-Based Asynchronous Programming
Provides fundamental abstractions for doing asynchronous programming in R using promises. Asynchronous programming is useful for allowing a single R process to orchestrate multiple tasks in the background while also attending to something else. Semantics are similar to 'JavaScript' promises, but with a syntax that is idiomatic R.
Maintained by Joe Cheng. Last updated 1 month ago.
1.5 match 204 stars 17.10 score 688 scripts 2.6k dependents
oobianom
quickcode:Quick and Essential 'R' Tricks for Better Scripts
The NOT functions, 'R' tricks, and a compilation of simple, quick, and often-used 'R' code to improve your scripts. Improve the quality and reproducibility of 'R' scripts.
Maintained by Obinna Obianom. Last updated 14 days ago.
3.3 match 5 stars 7.76 score 7 scripts 6 dependents
bioc
gdsfmt:R Interface to CoreArray Genomic Data Structure (GDS) Files
Provides a high-level R interface to CoreArray Genomic Data Structure (GDS) data files. GDS is portable across platforms with hierarchical structure to store multiple scalable array-oriented data sets with metadata information. It is suited for large-scale datasets, especially for data which are much larger than the available random-access memory. The gdsfmt package offers efficient operations specifically designed for integers of less than 8 bits, since a diploid genotype, like single-nucleotide polymorphism (SNP), usually occupies fewer bits than a byte. Data compression and decompression are available with relatively efficient random access. A GDS file can also be read in parallel by multiple R processes, supported by the package parallel.
Maintained by Xiuwen Zheng. Last updated 2 days ago.
infrastructuredataimportbioinformaticsgds-formatgenomicscpp
2.3 match 18 stars 11.34 score 920 scripts 29 dependents
dmurdoch
plotrix:Various Plotting Functions
Lots of plots, various labeling, axis and color scaling functions. The author/maintainer died in September 2023.
Maintained by Duncan Murdoch. Last updated 1 year ago.
2.3 match 5 stars 11.31 score 9.2k scripts 361 dependents
pik-piam
magclass:Data Class and Tools for Handling Spatial-Temporal Data
Data class for increased interoperability working with spatial-temporal data together with corresponding functions and methods (conversions, basic calculations and basic data manipulation). The class distinguishes between spatial, temporal and other dimensions to facilitate the development and interoperability of tools built for it. Additional features are name-based addressing of data and internal consistency checks (e.g. checking for the right data order in calculations).
Maintained by Jan Philipp Dietrich. Last updated 10 days ago.
2.3 match 5 stars 11.16 score 412 scripts 56 dependents
ropensci
birdsize:Estimate Avian Body Size Distributions
Generate estimated body size distributions for populations or communities of birds, given either species ID or species' mean body size. Designed to work naturally with the North American Breeding Bird Survey, or with any dataset of bird species, abundance, and/or mean size data.
Maintained by Renata Diaz. Last updated 1 year ago.
6.6 match 3 stars 3.78 score 8 scripts
geomarker-io
addr:Clean, Parse, Harmonize, Match, and Geocode Messy Real-World Addresses
Addresses that were not validated at the time of collection are often heterogeneously formatted, making them difficult to compare or link to other sets of addresses. The addr package is designed to clean character strings of addresses, use the `usaddress` library to tag address components, and paste together select components to create a normalized address. Normalized addresses can be hashed to create hashdresses that can be used to merge with other sets of addresses.
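The clean, normalize, and hash steps described above can be sketched as follows. This Python snippet is an illustration only and is not the addr API: it skips the component-tagging step entirely, the helper names `normalize_address` and `hash_address` are hypothetical, and SHA-256 is an assumed choice of hash, since the description does not name one.

```python
import hashlib
import re


def normalize_address(raw: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = raw.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    return " ".join(text.split())


def hash_address(raw: str) -> str:
    """Hash the normalized form so differently formatted inputs can be joined."""
    normalized = normalize_address(raw)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()
```

With this sketch, `"123 Main St., Springfield, IL 62701"` and `"  123 main st springfield il  62701 "` normalize to the same string and therefore share one hash, which is what makes the hash usable as a merge key between address sets.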
Maintained by Cole Brokamp. Last updated 5 months ago.
5.3 match 2 stars 4.70 score 388 scripts
josesamos
starschemar:Obtaining Stars from Flat Tables
Data in multidimensional systems is obtained from operational systems and is transformed to adapt it to the new structure. Frequently, the operations to be performed aim to transform a flat table into a star schema. Transformations can be carried out using professional extract, transform and load tools or tools intended for data transformation for end users. With the tools mentioned, this transformation can be carried out, but it requires a lot of work. The main objective of this package is to define transformations that allow obtaining stars from flat tables easily. In addition, it includes basic data cleaning, dimension enrichment, incremental data refresh and query operations, adapted to this context.
Maintained by Jose Samos. Last updated 11 months ago.
4.3 match 7 stars 5.66 score 11 scripts 2 dependents
c0webster
fedmatch:Fast, Flexible, and User-Friendly Record Linkage Methods
Provides a flexible set of tools for matching two un-linked data sets. 'fedmatch' allows for three ways to match data: exact matches, fuzzy matches, and multi-variable matches. It also allows an easy combination of these three matches via the tier matching function.
Maintained by Chris Webster. Last updated 1 month ago.
5.3 match 1 star 4.62 score 80 scripts
tetratech
baytrends:Long Term Water Quality Trend Analysis
Enables users to evaluate long-term trends using a Generalized Additive Modeling (GAM) approach. The model development includes selecting a GAM structure to describe nonlinear seasonally-varying changes over time, incorporation of hydrologic variability via either a river flow or salinity, the use of an intervention to deal with method or laboratory changes suspected to impact data values, and representation of left- and interval-censored data. The approach has been applied to water quality data in the Chesapeake Bay, a major estuary on the east coast of the United States, to provide insights to a range of management- and research-focused questions. Methodology described in Murphy (2019) <doi:10.1016/j.envsoft.2019.03.027>.
Maintained by Erik W Leppo. Last updated 5 months ago.
3.6 match 12 stars 6.67 score 97 scripts
philchalmers
SimDesign:Structure for Organizing Monte Carlo Simulation Designs
Provides tools to safely and efficiently organize and execute Monte Carlo simulation experiments in R. The package controls the structure and back-end of Monte Carlo simulation experiments by utilizing a generate-analyse-summarise workflow. The workflow safeguards against common simulation coding issues, such as automatically re-simulating non-convergent results, prevents inadvertently overwriting simulation files, catches error and warning messages during execution, implicitly supports parallel processing with high-quality random number generation, and provides tools for managing high-performance computing (HPC) array jobs submitted to schedulers such as SLURM. For a pedagogical introduction to the package see Sigal and Chalmers (2016) <doi:10.1080/10691898.2016.1246953>. For a more in-depth overview of the package and its design philosophy see Chalmers and Adkins (2020) <doi:10.20982/tqmp.16.4.p248>.
Maintained by Phil Chalmers. Last updated 22 hours ago.
monte-carlo-simulationsimulationsimulation-framework
1.8 match 62 stars 13.36 score 253 scripts 46 dependents
kjhealy
gssrdoc:Document General Social Survey Variable
The General Social Survey (GSS) is a long-running, mostly annual survey of US households. It is administered by the National Opinion Research Center (NORC). This package contains a tibble with information on the survey variables, together with every variable documented as an R help page. For more information on the GSS see <http://gss.norc.org>.
Maintained by Kieran Healy. Last updated 11 months ago.
10.5 match 2.28 score 38 scripts
winvector
rquery:Relational Query Generator for Data Manipulation at Scale
A piped query generator based on Edgar F. Codd's relational algebra, and on production experience using 'SQL' and 'dplyr' at big data scale. The design represents an attempt to make 'SQL' more teachable by denoting composition by a sequential pipeline notation instead of nested queries or functions. The implementation delivers reliable high performance data processing on large data systems such as 'Spark', databases, and 'data.table'. Package features include: data processing trees or pipelines as observable objects (able to report both columns produced and columns used), optimized 'SQL' generation as an explicit user visible table modeling step, plus explicit query reasoning and checking.
Maintained by John Mount. Last updated 2 years ago.
2.5 match 110 stars 9.53 score 126 scripts 3 dependents
business-science
tidyquant:Tidy Quantitative Financial Analysis
Bringing business and financial analysis to the 'tidyverse'. The 'tidyquant' package provides a convenient wrapper to various 'xts', 'zoo', 'quantmod', 'TTR' and 'PerformanceAnalytics' package functions and returns the objects in the tidy 'tibble' format. The main advantage is being able to use quantitative functions with the 'tidyverse' functions including 'purrr', 'dplyr', 'tidyr', 'ggplot2', 'lubridate', etc. See the 'tidyquant' website for more information, documentation and examples.
Maintained by Matt Dancho. Last updated 1 month ago.
dplyrfinancial-analysisfinancial-datafinancial-statementsmultiple-stocksperformance-analysisperformanceanalyticsquantmodstockstock-exchangesstock-indexesstock-listsstock-performancestock-pricesstock-symboltidyversetime-seriestimeseriesxts
1.8 match 872 stars 13.34 score 5.2k scripts
vpnagraj
rrefine:r Client for OpenRefine API
'OpenRefine' (formerly 'Google Refine') is a popular, open source data cleaning software. This package enables users to programmatically trigger data transfer between R and 'OpenRefine'. Available functionality includes project import, export and deletion.
Maintained by VP Nagraj. Last updated 2 years ago.
4.0 match 22 stars 5.77 score 27 scripts
nrennie
messy:Create Messy Data from Clean Data Frames
For the purposes of teaching, it is often desirable to show examples of working with messy data and how to clean it. This R package creates messy data from clean, tidy data frames so that students have a clean example to work towards.
Maintained by Nicola Rennie. Last updated 3 months ago.
3.9 match 141 stars 5.93 score 8 scripts
chgrl
bReeze:Functions for Wind Resource Assessment
A collection of functions to analyse, visualize and interpret wind data and to calculate the potential energy production of wind turbines.
Maintained by Christian Graul. Last updated 1 year ago.
5.3 match 20 stars 4.34 score 22 scripts
mdlincoln
salty:Turn Clean Data into Messy Data
Take real or simulated data and salt it with errors commonly found in the wild, such as pseudo-OCR errors, Unicode problems, numeric fields with nonsensical punctuation, bad dates, etc.
Maintained by Matthew Lincoln. Last updated 7 months ago.
4.8 match 64 stars 4.81 score 20 scripts
poissonconsulting
batchr:Batch Process Files
Processes multiple files with a user-supplied function. The key design principle is that only files which were last modified before the directory was configured are processed. A hidden file stores the configuration time, the function, and related settings, while successfully processed files are automatically touched to update their modification date. As a result, batch processing can be stopped and restarted, and any files created (or modified or deleted) during processing are ignored.
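The modification-time rule described above can be sketched as follows. This Python snippet is an illustration only and is not the batchr API: `run_batch` is a hypothetical name, and the configuration time is passed in as an argument rather than read from a hidden file.

```python
import os
from pathlib import Path
from typing import Callable, List


def run_batch(directory: Path, config_time: float,
              fun: Callable[[Path], bool]) -> List[str]:
    """Process only files last modified before config_time (hypothetical helper).

    fun returns True on success. Successful files are touched, which moves
    their modification time past config_time so a restart skips them.
    """
    processed = []
    for path in sorted(directory.iterdir()):
        if not path.is_file():
            continue
        if path.stat().st_mtime >= config_time:
            # Created, modified, or already processed after configuration.
            continue
        if fun(path):
            os.utime(path, None)  # "touch": set the modification time to now
            processed.append(path.name)
    return processed
```

Because the only state is each file's modification time, calling `run_batch` again with the same `config_time` picks up just the files that have not yet succeeded, which is what allows a stopped run to be restarted.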
Maintained by Joe Thorley. Last updated 2 months ago.
5.0 match 6 stars 4.56 score 8 scripts
vincentarelbundock
modelsummary:Summary Tables and Plots for Statistical Models and Data: Beautiful, Customizable, and Publication-Ready
Create beautiful and customizable tables to summarize several statistical models side-by-side. Draw coefficient plots, multi-level cross-tabs, dataset summaries, balance tables (a.k.a. "Table 1s"), and correlation matrices. This package supports dozens of statistical models, and it can produce tables in HTML, LaTeX, Word, Markdown, PDF, PowerPoint, Excel, RTF, JPG, or PNG. Tables can easily be embedded in 'Rmarkdown' or 'knitr' dynamic documents. Details can be found in Arel-Bundock (2022) <doi:10.18637/jss.v103.i01>.
Maintained by Vincent Arel-Bundock. Last updated 15 days ago.
1.7 match 926 stars 13.41 score 6.2k scripts 2 dependents
miracum
DIZtools:Lightweight Utilities for 'DIZ' R Package Development
Lightweight utility functions used for the R package development infrastructure inside the data integration centers ('DIZ') to standardize and facilitate repetitive tasks such as setting up a database connection or issuing notification messages and to avoid redundancy.
Maintained by Jonathan M. Mang. Last updated 1 year ago.
5.5 match 3 stars 4.13 score 2 scripts 3 dependents
schochastics
networkdata:Repository of Network Datasets
The package contains a large collection of network datasets from different contexts. This includes social networks, animal networks and movie networks. All datasets are in 'igraph' format.
Maintained by David Schoch. Last updated 12 months ago.
4.5 match 143 stars 5.01 score 143 scripts
matt-dray
oystr:Handle Personal Oyster Journey History Data Provided by Transport for London
You can opt-in to monthly emails from Transport for London (TfL) that have your Oyster journey history attached as a CSV. Functions in this small package help you read, wrangle and summarise these data. I, and this work, are unaffiliated with Transport for London (TfL).
Maintained by Matt Dray. Last updated 4 years ago.
londonoysteroystercardtfltransport
7.5 match 2 stars 3.00 score 5 scripts
ropengov
eurostat:Tools for Eurostat Open Data
Tools to download data from the Eurostat database <https://ec.europa.eu/eurostat> together with search and manipulation utilities.
Maintained by Leo Lahti. Last updated 28 days ago.
2.0 match 239 stars 11.09 score 892 scripts 5 dependents
richjjackson
psc:Personalised Synthetic Controls
Allows the comparison of data cohorts (DC) against a Counter Factual Model (CFM) and measures the difference in terms of an efficacy parameter. Allows the application of Personalised Synthetic Controls.
Maintained by Richard Jackson. Last updated 4 months ago.
5.3 match 1 star 4.23 score 24 scripts
ipums
ipumsr:An R Interface for Downloading, Reading, and Handling IPUMS Data
An easy way to work with census, survey, and geographic data provided by IPUMS in R. Generate and download data through the IPUMS API and load IPUMS files into R with their associated metadata to make analysis easier. IPUMS data describing 1.4 billion individuals drawn from over 750 censuses and surveys is available free of charge from the IPUMS website <https://www.ipums.org>.
Maintained by Derek Burk. Last updated 19 days ago.
2.0 match 28 stars 11.07 score 720 scripts 2 dependents
hadley
reshape:Flexibly Reshape Data
Flexibly restructure and aggregate data using just two functions: melt and cast.
Maintained by Hadley Wickham. Last updated 3 years ago.
2.3 match 9.83 score 21k scripts 231 dependents
ylin00
gen5helper:Processing 'Gen5' 2.06 Exported Data
A collection of functions for processing 'Gen5' 2.06 exported data. 'Gen5' is an essential data analysis software for BioTek plate readers <https://www.biotek.com/products/software-robotics-software/gen5-microplate-reader-and-imager-software/>. This package contains functions for data cleaning, modeling and plotting using exported data from 'Gen5' version 2.06. It exports technically correct data defined in (Edwin de Jonge and Mark van der Loo (2013) <https://cran.r-project.org/doc/contrib/de_Jonge+van_der_Loo-Introduction_to_data_cleaning_with_R.pdf>) for customized analysis. It contains Boltzmann fitting for general kinetic analysis. See <https://www.github.com/yanxianUCSB/gen5helper> for more information, documentation and examples.
Maintained by Yanxian Lin. Last updated 5 years ago.
8.1 match 2.70 score 1 script
opengeos
whitebox:'WhiteboxTools' R Frontend
An R frontend for the 'WhiteboxTools' library, which is an advanced geospatial data analysis platform developed by Prof. John Lindsay at the University of Guelph's Geomorphometry and Hydrogeomatics Research Group. 'WhiteboxTools' can be used to perform common geographical information systems (GIS) analysis operations, such as cost-distance analysis, distance buffering, and raster reclassification. Remote sensing and image processing tasks include image enhancement (e.g. panchromatic sharpening, contrast adjustments), image mosaicing, numerous filtering operations, simple classification (k-means), and common image transformations. 'WhiteboxTools' also contains advanced tooling for spatial hydrological analysis (e.g. flow-accumulation, watershed delineation, stream network analysis, sink removal), terrain analysis (e.g. common terrain indices such as slope, curvatures, wetness index, hillshading; hypsometric analysis; multi-scale topographic position analysis), and LiDAR data processing. Suggested citation: Lindsay (2016) <doi:10.1016/j.cageo.2016.07.003>.
Maintained by Andrew Brown. Last updated 5 months ago.
geomorphometry, geoprocessing, geospatial, gis, hydrology, remote-sensing, rstudio
2.3 match 173 stars 9.65 score 203 scripts 2 dependents
rstudio
blogdown:Create Blogs and Websites with R Markdown
Write blog posts and web pages in R Markdown. This package supports the static site generator 'Hugo' (<https://gohugo.io>) best, and it also supports 'Jekyll' (<https://jekyllrb.com>) and 'Hexo' (<https://hexo.io>).
Maintained by Yihui Xie. Last updated 22 hours ago.
blog-engine, blogdown, hugo, rmarkdown, rstudio, website-generation
1.9 match 1.8k stars 11.55 score 1.4k scripts 1 dependent
mandymejia
fMRIscrub:Scrubbing and Other Data Cleaning Routines for fMRI
Data-driven fMRI denoising with projection scrubbing (Pham et al (2022) <doi:10.1016/j.neuroimage.2023.119972>). Also includes routines for DVARS (Derivatives VARianceS) (Afyouni and Nichols (2018) <doi:10.1016/j.neuroimage.2017.12.098>), motion scrubbing (Power et al (2012) <doi:10.1016/j.neuroimage.2011.10.018>), aCompCor (anatomical Components Correction) (Muschelli et al (2014) <doi:10.1016/j.neuroimage.2014.03.028>), detrending, and nuisance regression. Projection scrubbing is also applicable to other outlier detection tasks involving high-dimensional data.
Maintained by Amanda Mejia. Last updated 2 years ago.
4.8 match 4 stars 4.56 score 15 scripts 1 dependent
kamapu
vegtable:Handling Vegetation Data Sets
Import and handle data from vegetation-plot databases, especially data stored in 'Turboveg 2' (<https://www.synbiosys.alterra.nl/turboveg/>). Import/export routines for exchanging data with 'Juice' (<https://www.sci.muni.cz/botany/juice/>) are also implemented.
Maintained by Miguel Alvarez. Last updated 8 months ago.
5.1 match 7 stars 4.23 score 49 scripts
ericdunipace
RcppCGAL:'Rcpp' Integration for 'CGAL'
Creates a header only package to link to the 'CGAL' (Computational Geometry Algorithms Library) header files in 'Rcpp'. There are a variety of potential uses for the software such as Hilbert sorting, K-D Tree nearest neighbors, and convex hull algorithms. For more information about how to use the header files, see the 'CGAL' documentation at <https://www.cgal.org>. Currently downloads version 6.0.1 of the 'CGAL' header files.
Maintained by Eric Dunipace. Last updated 2 months ago.
3.0 match 12 stars 7.18 score 1 script 12 dependents
gavinrozzi
njtr1:Download, Analyze & Clean New Jersey Car Crash Data
Download and analyze motor vehicle crash data released by the New Jersey Department of Transportation (NJDOT). The data in this package is collected through NJTR-1 forms filed by police officers, which provide a standardized way of documenting a motor vehicle crash that occurred in New Jersey. Three data tables containing data on crashes, vehicles & pedestrians released from 2001 to the present can be downloaded & cleaned using this package.
Maintained by Gavin Rozzi. Last updated 1 year ago.
njtr1, new-jersey, road-safety, car-crashes, car-accidents, data
4.9 match 5 stars 4.40 score 7 scripts
stan-dev
cmdstanr:R Interface to 'CmdStan'
A lightweight interface to 'Stan' <https://mc-stan.org>. The 'CmdStanR' interface is an alternative to 'RStan' that calls the command line interface for compilation and running algorithms instead of interfacing with C++ via 'Rcpp'. This has many benefits including always being compatible with the latest version of Stan, fewer installation errors, fewer unexpected crashes in RStudio, and a more permissive license.
Maintained by Andrew Johnson. Last updated 9 months ago.
bayes, bayesian, markov-chain-monte-carlo, maximum-likelihood, mcmc, stan, variational-inference
1.8 match 145 stars 12.27 score 5.2k scripts 9 dependents
nflverse
nflreadr:Download 'nflverse' Data
A minimal package for downloading data from 'GitHub' repositories of the 'nflverse' project.
Maintained by Tan Ho. Last updated 4 months ago.
nfl, nflfastr, nflverse, sports-data
1.7 match 66 stars 12.46 score 476 scripts 10 dependents
alexbhatt
epidm:UK Epidemiological Data Management
Contains utilities and functions for the cleaning, processing and management of patient level public health data for surveillance and analysis held by the UK Health Security Agency, UKHSA.
Maintained by Alex Bhattacharya. Last updated 7 months ago.
4.2 match 13 stars 5.07 score 2 scripts
nutriverse
mwana:An Efficient Workflow for Plausibility Checks and Prevalence Analysis of Wasting in R
A simple and streamlined workflow for plausibility checks and prevalence analysis of wasting based on the Standardized Monitoring and Assessment of Relief and Transition (SMART) Methodology <https://smartmethodology.org/>, with application in R.
Maintained by Tomás Zaba. Last updated 1 month ago.
acute-malnutritionanthropometrymuacnutritionsmartsurveywasting
5.0 match 2 stars 4.23 score 6 scripts
paws-r
paws:Amazon Web Services Software Development Kit
Interface to Amazon Web Services <https://aws.amazon.com>, including storage, database, and compute services, such as 'Simple Storage Service' ('S3'), 'DynamoDB' 'NoSQL' database, and 'Lambda' functions-as-a-service.
Maintained by Dyfan Jones. Last updated 4 days ago.
1.9 match 332 stars 11.25 score 177 scripts 12 dependents
joundso
cleaR:Clean the R Console and Environment
Small package to clean the R console and the R environment with the call of just one function.
Maintained by Jonathan M. Mang. Last updated 1 year ago.
5.5 match 3.78 score 3 scripts 4 dependents
asa12138
MetaNet:Network Analysis for Omics Data
Comprehensive network analysis package. Calculates correlation networks quickly and accelerates many analyses through parallel computing. Supports multi-omics data and efficient sub-network search. Handles larger data, with more than 10,000 nodes in each omics. Offers various layout methods for multi-omics networks and interfaces to other software ('Gephi', 'Cytoscape', 'ggplot2') for easy visualization. Provides comprehensive calculation of topological indexes, including ecological network stability.
Maintained by Chen Peng. Last updated 11 days ago.
dataimportnetwork analysisomicssoftwarevisualization
3.8 match 13 stars 5.51 score 9 scripts
mayur1009
cleanTS:Testbench for Univariate Time Series Cleaning
A reliable and efficient tool for cleaning univariate time series data. It implements reliable and efficient procedures for automating the process of cleaning univariate time series data. The package provides integration with already developed and deployed tools for missing value imputation and outlier detection. It also provides a way of visualizing large time-series data in different resolutions.
Maintained by Mayur Shende. Last updated 1 year ago.
5.6 match 11 stars 3.74 score 3 scripts
bioc
basecallQC:Working with Illumina Basecalling and Demultiplexing input and output files
The basecallQC package provides tools to work with Illumina bcl2Fastq (versions >= 2.1.7) software. Prior to basecalling and demultiplexing using the bcl2Fastq software, basecallQC functions allow the user to update Illumina sample sheets from versions <= 1.8.9 to >= 2.1.7 standards, clean sample sheets of common problems such as invalid sample names and IDs, and create read and index basemasks and the bcl2Fastq command. Following the generation of basecalled and demultiplexed data, the basecallQC package allows the user to generate HTML tables, plots and a self-contained report of summary metrics from Illumina XML output files.
Maintained by Thomas Carroll. Last updated 5 months ago.
sequencing, infrastructure, dataimport, qualitycontrol
4.7 match 4.32 score 21 scripts
eu-ecdc
EpiSignalDetection:Signal Detection Analysis
Exploring time series for signal detection. It is specifically designed to detect possible outbreaks using infectious disease surveillance data at the European Union / European Economic Area or country level. Automatic detection tools used are presented in the paper "Monitoring count time series in R: aberration detection in public health surveillance", by Salmon et al. (2016) <doi:10.18637/jss.v070.i10>. The package includes: - Signal Detection tool, an interactive 'shiny' application in which the user can import external data and perform basic signal detection analyses; - An automated report in HTML format, presenting the results of the time series analysis in tables and graphs. This report can also be stratified by population characteristics (see 'Population' variable). This project was funded by the European Centre for Disease Prevention and Control.
Maintained by Joana Gomes Dias. Last updated 6 years ago.
3.8 match 16 stars 5.43 score 17 scripts
r-forge
deSolve:Solvers for Initial Value Problems of Differential Equations ('ODE', 'DAE', 'DDE')
Functions that solve initial value problems of a system of first-order ordinary differential equations ('ODE'), of partial differential equations ('PDE'), of differential algebraic equations ('DAE'), and of delay differential equations. The functions provide an interface to the FORTRAN functions 'lsoda', 'lsodar', 'lsode', 'lsodes' of the 'ODEPACK' collection, to the FORTRAN functions 'dvode', 'zvode' and 'daspk' and a C-implementation of solvers of the 'Runge-Kutta' family with fixed or variable time steps. The package contains routines designed for solving 'ODEs' resulting from 1-D, 2-D and 3-D partial differential equations ('PDE') that have been converted to 'ODEs' by numerical differencing.
Maintained by Thomas Petzoldt. Last updated 1 year ago.
1.7 match 12.33 score 8.0k scripts 427 dependents
surveygraph
surveygraph:Network Representations of Attitudes
A tool for computing network representations of attitudes, extracted from tabular data such as sociological surveys. Development of surveygraph software and training materials was initially funded by the European Union under the ERC Proof-of-concept programme (ERC, Attitude-Maps-4-All, project number: 101069264). Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or the European Research Council Executive Agency. Neither the European Union nor the granting authority can be held responsible for them.
Maintained by Samuel Unicomb. Last updated 4 months ago.
3.8 match 1 star 5.41 score 6 scripts
maarten14c
rice:Radiocarbon Equations
Provides functions for the calibration of radiocarbon dates, as well as options to calculate different radiocarbon realms (C14 age, F14C, pMC, D14C) and estimating the effects of contamination or local reservoir offsets (Reimer and Reimer 2001 <doi:10.1017/S0033822200038339>). The methods follow long-established recommendations such as Stuiver and Polach (1977) <doi:10.1017/S0033822200003672> and Reimer et al. (2004) <doi:10.1017/S0033822200033154>. This package complements the data package 'rintcal'.
Maintained by Maarten Blaauw. Last updated 2 months ago.
3.3 match 1 star 6.13 score 13 scripts 4 dependents
pauljohn32
kutils:Project Management Tools
Tools for data importation, recoding, and inspection. There are functions to create new project folders, R code templates, create uniquely named output directories, and to quickly obtain a visual summary for each variable in a data frame. The main feature here is the systematic implementation of the "variable key" framework for data importation and recoding. We are eager to have community feedback about the variable key and the vignette about it. In version 1.7, the function 'semTable' was removed. It had been deprecated since 1.67 and is now provided in a separate package, 'semTable'.
Maintained by Paul Johnson. Last updated 1 year ago.
3.4 match 5.85 score 110 scripts 20 dependents
matt-dray
tamRgo:Digital Pets for R
Store a persistent digital pet on your computer and interact with it in your R console.
Maintained by Matt Dray. Last updated 2 years ago.
5.6 match 7 stars 3.54 score 4 scripts
fkeck
bioseq:A Toolbox for Manipulating Biological Sequences
Classes and functions to work with biological sequences (DNA, RNA and amino acid sequences). Implements S3 infrastructure to work with biological sequences as described in Keck (2020) <doi:10.1111/2041-210X.13490>. Provides a collection of functions to perform biological conversion among classes (transcription, translation) and basic operations on sequences (detection, selection and replacement based on positions or patterns). The package also provides functions to import and export sequences from and to other package formats.
Maintained by Francois Keck. Last updated 3 years ago.
2.9 match 22 stars 6.72 score 80 scripts 1 dependent