tidyverse:Easily Install and Load the 'Tidyverse'
The 'tidyverse' is a set of packages that work in harmony because they share common data representations and 'API' design. This package is designed to make it easy to install and load multiple 'tidyverse' packages in a single step. Learn more about the 'tidyverse' at <>.
Maintained by Hadley Wickham. Last updated 5 months ago.
1.7k stars 20.23 score 664k scripts 125 dependents
tidyverse
haven:Import and Export 'SPSS', 'Stata' and 'SAS' Files
Import foreign statistical formats into R via the embedded 'ReadStat' C library, <>.
Maintained by Hadley Wickham. Last updated 6 months ago.
427 stars 18.63 score 18k scripts 682 dependents
gesistsa
rio:A Swiss-Army Knife for Data I/O
Streamlined data import and export by making assumptions that the user is probably willing to make: 'import()' and 'export()' determine the data format from the file extension, reasonable defaults are used for data import and export, web-based import is natively supported (including from SSL/HTTPS), compressed files can be read directly, and fast import packages are used where appropriate. An additional convenience function, 'convert()', provides a simple method for converting between file types.
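As a minimal sketch of the extension-based dispatch described above, assuming the 'rio' package is installed and using the built-in `mtcars` data with temporary file paths chosen purely for illustration:

```r
library(rio)

# Write a built-in data frame to a temporary CSV file.
# The ".csv" extension tells export() which format to use.
csv_path <- tempfile(fileext = ".csv")
export(mtcars, csv_path)

# Read it back; import() infers the format from the extension as well.
cars <- import(csv_path)

# Convert the CSV file to R's native RDS format in one call.
rds_path <- tempfile(fileext = ".rds")
convert(csv_path, rds_path)
cars_rds <- import(rds_path)
```

Other formats may need optional packages that 'rio' loads on demand, so this sketch sticks to CSV and RDS.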
Maintained by Chung-hong Chan. Last updated 3 months ago.
610 stars 17.10 score 7.8k scripts 74 dependents
amices
mice:Multivariate Imputation by Chained Equations
Multiple imputation using Fully Conditional Specification (FCS) implemented by the MICE algorithm as described in Van Buuren and Groothuis-Oudshoorn (2011) <doi:10.18637/jss.v045.i03>. Each variable has its own imputation model. Built-in imputation models are provided for continuous data (predictive mean matching, normal), binary data (logistic regression), unordered categorical data (polytomous logistic regression) and ordered categorical data (proportional odds). MICE can also impute continuous two-level data (normal model, pan, second-level variables). Passive imputation can be used to maintain consistency between variables. Various diagnostic plots are available to inspect the quality of the imputations.
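The impute, analyse, and pool workflow behind this description can be sketched as follows. This is an illustrative example only, assuming the 'mice' package is installed; it uses the `nhanes` example data shipped with the package, and the regression model, number of imputations, and seed are arbitrary choices.

```r
library(mice)

# Create 5 imputed data sets with predictive mean matching.
# nhanes is a small example data set included in 'mice'.
imp <- mice(nhanes, m = 5, method = "pmm", seed = 1, printFlag = FALSE)

# Fit the same regression model on each imputed data set.
fit <- with(imp, lm(bmi ~ age + hyp))

# Combine the per-data-set estimates using Rubin's rules.
pooled <- pool(fit)
summary(pooled)

# Extract the first completed data set for inspection.
completed <- complete(imp, 1)
```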
Maintained by Stef van Buuren. Last updated 22 hours ago.
462 stars 16.64 score 10k scripts 154 dependents
njtierney
naniar:Data Structures, Summaries, and Visualisations for Missing Data
Missing values are ubiquitous in data and need to be explored and handled in the initial stages of analysis. 'naniar' provides data structures and functions that facilitate the plotting of missing values and examination of imputations. This allows missing data dependencies to be explored with minimal deviation from the common work patterns of 'ggplot2' and tidy data. The work is fully discussed at Tierney & Cook (2023) <doi:10.18637/jss.v105.i07>.
Maintained by Nicholas Tierney. Last updated 16 days ago.
657 stars 15.63 score 5.1k scripts 9 dependents
rich-iannone
DiagrammeR:Graph/Network Visualization
Build graph/network structures using functions for stepwise addition and deletion of nodes and edges. Work with data available in tables for bulk addition of nodes, edges, and associated metadata. Use graph selections and traversals to apply changes to specific nodes or edges. A wide selection of graph algorithms allows for the analysis of graphs. Visualize the graphs and take advantage of any aesthetic properties assigned to nodes and edges.
Maintained by Richard Iannone. Last updated 2 months ago.
1.7k stars 15.29 score 3.8k scripts 86 dependents
larmarange
labelled:Manipulating Labelled Data
Work with labelled data imported from 'SPSS' or 'Stata' with 'haven' or 'foreign'. This package provides useful functions to deal with the "haven_labelled" and "haven_labelled_spss" classes introduced by the 'haven' package.
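A small sketch of what handling a "haven_labelled" vector can look like, assuming the 'labelled' package is installed. The vector and its labels are made up for illustration; real data would usually come from an imported 'SPSS' or 'Stata' file.

```r
library(labelled)

# Build a labelled vector by hand (hypothetical values and labels).
smoker <- labelled(c(1, 2, 1), labels = c(yes = 1, no = 2))

# Attach a variable label describing the column.
var_label(smoker) <- "Current smoker"

# Inspect the value labels stored on the vector.
val_labels(smoker)

# Convert to a regular factor for modelling or plotting.
smoker_f <- to_factor(smoker)
```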
Maintained by Joseph Larmarange. Last updated 1 month ago.
76 stars 15.04 score 2.4k scripts 98 dependents
guido-s
meta:General Package for Meta-Analysis
User-friendly general package providing standard methods for meta-analysis and supporting Schwarzer, Carpenter, and Rücker <DOI:10.1007/978-3-319-21416-0>, "Meta-Analysis with R" (2015): - common effect and random effects meta-analysis; - several plots (forest, funnel, Galbraith / radial, L'Abbe, Baujat, bubble); - three-level meta-analysis model; - generalised linear mixed model; - logistic regression with penalised likelihood for rare events; - Hartung-Knapp method for random effects model; - Kenward-Roger method for random effects model; - prediction interval; - statistical tests for funnel plot asymmetry; - trim-and-fill method to evaluate bias in meta-analysis; - meta-regression; - cumulative meta-analysis and leave-one-out meta-analysis; - import data from 'RevMan 5'; - produce forest plot summarising several (subgroup) meta-analyses.
Maintained by Guido Schwarzer. Last updated 19 hours ago.
89 stars 14.95 score 2.3k scripts 30 dependents
bioc
TCGAbiolinks:TCGAbiolinks: An R/Bioconductor package for integrative analysis with GDC data
The aims of TCGAbiolinks are to: i) facilitate GDC open-access data retrieval, ii) prepare the data using the appropriate pre-processing strategies, iii) provide the means to carry out different standard analyses, and iv) easily reproduce earlier research results. In more detail, the package provides multiple methods for analysis (e.g., differential expression analysis, identifying differentially methylated regions) and methods for visualization (e.g., survival plots, volcano plots, starburst plots) in order to easily develop complete analysis pipelines.
Maintained by Tiago Chedraoui Silva. Last updated 1 month ago.
310 stars 14.47 score 1.6k scripts 6 dependents
business-science
timetk:A Tool Kit for Working with Time Series
Easy visualization, wrangling, and feature engineering of time series data for forecasting and machine learning prediction. Consolidates and extends time series functionality from packages including 'dplyr', 'stats', 'xts', 'forecast', 'slider', 'padr', 'recipes', and 'rsample'.
Maintained by Matt Dancho. Last updated 1 year ago.
626 stars 14.20 score 4.0k scripts 16 dependents
doi-usgs
dataRetrieval:Retrieval Functions for USGS and EPA Hydrology and Water Quality Data
Collection of functions to help retrieve U.S. Geological Survey and U.S. Environmental Protection Agency water quality and hydrology data from web services. Data are discovered from National Water Information System <> and <>. Water quality data are obtained from the Water Quality Portal <>.
Maintained by Laura DeCicco. Last updated 3 days ago.
286 stars 14.16 score 1.7k scripts 15 dependents
walkerke
tidycensus:Load US Census Boundary and Attribute Data as 'tidyverse' and 'sf'-Ready Data Frames
An integrated R interface to several United States Census Bureau APIs (<>) and the US Census Bureau's geographic boundary files. Allows R users to return Census and ACS data as tidyverse-ready data frames, and optionally returns a list-column with feature geometry for mapping and spatial analysis.
Maintained by Kyle Walker. Last updated 2 months ago.
648 stars 14.02 score 7.5k scripts 10 dependents
ropensci
taxize:Taxonomic Information from Around the Web
Interacts with a suite of web application programming interfaces (API) for taxonomic tasks, such as getting database specific taxonomic identifiers, verifying species names, getting taxonomic hierarchies, fetching downstream and upstream taxonomic names, getting taxonomic synonyms, converting scientific to common names and vice versa, and more. Some of the services supported include 'NCBI E-utilities' (<>), 'Encyclopedia of Life' (<>), 'Global Biodiversity Information Facility' (<>), and many more. Links to the API documentation for other supported services are available in the documentation for their respective functions in this package.
Maintained by Zachary Foster. Last updated 25 days ago.
274 stars 13.63 score 1.6k scripts 23 dependents
kaz-yos
tableone:Create 'Table 1' to Describe Baseline Characteristics with or without Propensity Score Weights
Creates 'Table 1', i.e., a description of baseline patient characteristics, which is essential in medical research. Supports both continuous and categorical variables, as well as p-values and standardized mean differences. Weighted data are supported via the 'survey' package.
Maintained by Kazuki Yoshida. Last updated 3 years ago.
221 stars 13.55 score 2.3k scripts 12 dependents
bioc
GEOquery:Get data from NCBI Gene Expression Omnibus (GEO)
The NCBI Gene Expression Omnibus (GEO) is a public repository of microarray data. Given the rich and varied nature of this resource, it is only natural to want to apply BioConductor tools to these data. GEOquery is the bridge between GEO and BioConductor.
Maintained by Sean Davis. Last updated 5 months ago.
93 stars 13.48 score 4.1k scripts 45 dependents
business-science
tidyquant:Tidy Quantitative Financial Analysis
Bringing business and financial analysis to the 'tidyverse'. The 'tidyquant' package provides a convenient wrapper to various 'xts', 'zoo', 'quantmod', 'TTR' and 'PerformanceAnalytics' package functions and returns the objects in the tidy 'tibble' format. The main advantage is being able to use quantitative functions with the 'tidyverse' functions including 'purrr', 'dplyr', 'tidyr', 'ggplot2', 'lubridate', etc. See the 'tidyquant' website for more information, documentation and examples.
Maintained by Matt Dancho. Last updated 1 month ago.
872 stars 13.34 score 5.2k scripts
projectmosaic
mosaic:Project MOSAIC Statistics and Mathematics Teaching Utilities
Data sets and utilities from Project MOSAIC (<>) used to teach mathematics, statistics, computation and modeling. Funded by the NSF, Project MOSAIC is a community of educators working to tie together aspects of quantitative work that students in science, technology, engineering and mathematics will need in their professional lives, but which are usually taught in isolation, if at all.
Maintained by Randall Pruim. Last updated 1 year ago.
93 stars 13.32 score 7.2k scripts 7 dependents
ropensci
visdat:Preliminary Visualisation of Data
Create preliminary exploratory data visualisations of an entire dataset to identify problems or unexpected features using 'ggplot2'.
Maintained by Nicholas Tierney. Last updated 8 months ago.
452 stars 13.31 score 2.1k scripts 11 dependents
dreamrs
esquisse:Explore and Visualize Your Data Interactively
A 'shiny' gadget to create 'ggplot2' figures interactively with drag-and-drop to map your variables to different aesthetics. You can quickly visualize your data according to their type, export in various formats, and retrieve the code to reproduce the plot.
Maintained by Victor Perrier. Last updated 1 month ago.
1.8k stars 13.31 score 1.1k scripts 1 dependent
oscarkjell
text:Analyses of Text using Transformers Models from HuggingFace, Natural Language Processing and Machine Learning
Link R with Transformers from Hugging Face to transform text variables to word embeddings, where the word embeddings are used to statistically test the mean difference between sets of texts, compute semantic similarity scores between texts, predict numerical variables, and visualize statistically significant words according to various dimensions, etc. For more information see <>.
Maintained by Oscar Kjell. Last updated 7 days ago.
145 stars 13.21 score 436 scripts 1 dependent
openair-project
openair:Tools for the Analysis of Air Pollution Data
Tools to analyse, interpret and understand air pollution data. Data are typically regular time series and air quality measurement, meteorological data and dispersion model output can be analysed. The package is described in Carslaw and Ropkins (2012, <doi:10.1016/j.envsoft.2011.09.008>) and subsequent papers.
Maintained by David Carslaw. Last updated 1 day ago.
316 stars 12.94 score 1.2k scripts 12 dependents
juba
questionr:Functions to Make Surveys Processing Easier
A set of functions to make the processing and analysis of surveys easier: interactive shiny apps and addins for data recoding, contingency tables, dataset metadata handling, and several convenience functions.
Maintained by Julien Barnier. Last updated 8 days ago.
83 stars 12.93 score 1.1k scripts 19 dependents
bioc
minfi:Analyze Illumina Infinium DNA methylation arrays
Tools to analyze & visualize Illumina Infinium methylation arrays.
Maintained by Kasper Daniel Hansen. Last updated 4 months ago.
60 stars 12.82 score 996 scripts 27 dependents
ohdsi
DatabaseConnector:Connecting to Various Database Platforms
An R 'DataBase Interface' ('DBI') compatible interface to various database platforms ('PostgreSQL', 'Oracle', 'Microsoft SQL Server', 'Amazon Redshift', 'Microsoft Parallel Database Warehouse', 'IBM Netezza', 'Apache Impala', 'Google BigQuery', 'Snowflake', 'Spark', 'SQLite', and 'InterSystems IRIS'). Also includes support for fetching data as 'Andromeda' objects. Uses either 'Java Database Connectivity' ('JDBC') or other 'DBI' drivers to connect to databases.
Maintained by Martijn Schuemie. Last updated 2 months ago.
56 stars 12.63 score 772 scripts 11 dependents
massimoaria
bibliometrix:Comprehensive Science Mapping Analysis
Tool for quantitative research in scientometrics and bibliometrics. It implements the comprehensive workflow for science mapping analysis proposed in Aria M. and Cuccurullo C. (2017) <doi:10.1016/j.joi.2017.08.007>. 'bibliometrix' provides various routines for importing bibliographic data from 'SCOPUS', 'Clarivate Analytics Web of Science' (<>), 'Digital Science Dimensions' (<>), 'OpenAlex' (<>), 'Cochrane Library' (<>), 'Lens' (<>), and 'PubMed' (<>) databases, performing bibliometric analysis and building networks for co-citation, coupling, scientific collaboration and co-word analysis.
Maintained by Massimo Aria. Last updated 10 days ago.
545 stars 12.54 score 518 scripts 2 dependents
simongrund1
mitml:Tools for Multiple Imputation in Multilevel Modeling
Provides tools for multiple imputation of missing data in multilevel modeling. Includes a user-friendly interface to the packages 'pan' and 'jomo', and several functions for visualization, data management and the analysis of multiply imputed data sets.
Maintained by Simon Grund. Last updated 1 year ago.
29 stars 12.36 score 246 scripts 153 dependents
ouhscbbmc
REDCapR:Interaction Between R and REDCap
Encapsulates functions to streamline calls from R to the REDCap API. REDCap (Research Electronic Data CAPture) is a web application for building and managing online surveys and databases developed at Vanderbilt University. The Application Programming Interface (API) offers an avenue to access and modify data programmatically, improving the capacity for literate and reproducible programming.
Maintained by Will Beasley. Last updated 3 months ago.
118 stars 12.36 score 438 scripts 6 dependents
dreamrs
datamods:Modules to Import and Manipulate Data in 'Shiny'
'Shiny' modules to import data into an application or 'addin' from various sources, and to manipulate them after that.
Maintained by Victor Perrier. Last updated 24 days ago.
144 stars 12.03 score 174 scripts 7 dependents
bioc
GenomicDataCommons:NIH / NCI Genomic Data Commons Access
Programmatically access the NIH / NCI Genomic Data Commons RESTful service.
Maintained by Sean Davis. Last updated 2 months ago.
87 stars 11.94 score 238 scripts 12 dependents
guido-s
netmeta:Network Meta-Analysis using Frequentist Methods
A comprehensive set of functions providing frequentist methods for network meta-analysis (Balduzzi et al., 2023) <doi:10.18637/jss.v106.i02> and supporting Schwarzer et al. (2015) <doi:10.1007/978-3-319-21416-0>, Chapter 8 "Network Meta-Analysis": - frequentist network meta-analysis following Rücker (2012) <doi:10.1002/jrsm.1058>; - additive network meta-analysis for combinations of treatments (Rücker et al., 2020) <doi:10.1002/bimj.201800167>; - network meta-analysis of binary data using the Mantel-Haenszel or non-central hypergeometric distribution method (Efthimiou et al., 2019) <doi:10.1002/sim.8158>, or penalised logistic regression (Evrenoglou et al., 2022) <doi:10.1002/sim.9562>; - rankograms and ranking of treatments by the Surface under the cumulative ranking curve (SUCRA) (Salanti et al., 2013) <doi:10.1016/j.jclinepi.2010.03.016>; - ranking of treatments using P-scores (frequentist analogue of SUCRAs without resampling) according to Rücker & Schwarzer (2015) <doi:10.1186/s12874-015-0060-8>; - split direct and indirect evidence to check consistency (Dias et al., 2010) <doi:10.1002/sim.3767>, (Efthimiou et al., 2019) <doi:10.1002/sim.8158>; - league table with network meta-analysis results; - 'comparison-adjusted' funnel plot (Chaimani & Salanti, 2012) <doi:10.1002/jrsm.57>; - net heat plot and design-based decomposition of Cochran's Q according to Krahn et al. (2013) <doi:10.1186/1471-2288-13-35>; - measures characterizing the flow of evidence between two treatments by König et al. (2013) <doi:10.1002/sim.6001>; - automated drawing of network graphs described in Rücker & Schwarzer (2016) <doi:10.1002/jrsm.1143>; - partial order of treatment rankings ('poset') and Hasse diagram for 'poset' (Carlsen & Bruggemann, 2014) <doi:10.1002/cem.2569>; (Rücker & Schwarzer, 2017) <doi:10.1002/jrsm.1270>; - contribution matrix as described in Papakonstantinou et al. (2018) <doi:10.12688/f1000research.14770.3> and Davies et al. (2022) <doi:10.1002/sim.9346>; - subgroup network meta-analysis.
Maintained by Guido Schwarzer. Last updated 8 days ago.
33 stars 11.84 score 199 scripts 10 dependents
ateucher
rmapshaper:Client for 'mapshaper' for 'Geospatial' Operations
Edit and simplify 'geojson', 'Spatial', and 'sf' objects. This is a wrapper around the 'mapshaper' 'JavaScript' library by Matthew Bloch <> to perform topologically-aware polygon simplification, as well as other operations such as clipping, erasing, dissolving, and converting 'multi-part' to 'single-part' geometries.
Maintained by Andy Teucher. Last updated 9 months ago.
204 stars 11.64 score 2.1k scripts 18 dependents
pecanproject
Functions Used for Managing Climate Driver Data
The Predictive Ecosystem Carbon Analyzer (PEcAn) is a scientific workflow management tool that is designed to simplify the management of model parameterization, execution, and analysis. The package converts climate driver data into a standard format for models integrated into PEcAn. As a standalone package, it provides an interface to access diverse climate data sets.
Maintained by David LeBauer. Last updated 2 days ago.
216 stars 11.61 score 64 scripts 14 dependents
projectmosaic
ggformula:Formula Interface to the Grammar of Graphics
Provides a formula interface to 'ggplot2' graphics.
Maintained by Randall Pruim. Last updated 1 year ago.
38 stars 11.55 score 1.7k scripts 25 dependents
bioc
mia:Microbiome analysis
mia implements tools for microbiome analysis based on the SummarizedExperiment, SingleCellExperiment and TreeSummarizedExperiment infrastructure. Data wrangling and analysis in the context of taxonomic data is the main scope. Additional functions for common tasks are implemented, such as community indices calculation and summarization.
Maintained by Tuomas Borman. Last updated 2 days ago.
51 stars 11.51 score 316 scripts 5 dependents
larmarange
broom.helpers:Helpers for Model Coefficients Tibbles
Provides a suite of functions to work with regression model 'broom::tidy()' tibbles. The suite includes functions to group regression model terms by variable, insert reference and header rows for categorical variables, add variable labels, and more.
Maintained by Joseph Larmarange. Last updated 23 days ago.
22 stars 11.45 score 165 scripts 2 dependents
ewenharrison
finalfit:Quickly Create Elegant Regression Results Tables and Plots when Modelling
Generate regression results tables and plots in final format for publication. Explore models and export directly to PDF and 'Word' using 'RMarkdown'.
Maintained by Ewen Harrison. Last updated 8 days ago.
270 stars 11.43 score 1.0k scripts
darwin-eu
CDMConnector:Connect to an OMOP Common Data Model
Provides tools for working with observational health data in the Observational Medical Outcomes Partnership (OMOP) Common Data Model format with a pipe friendly syntax. Common data model database table references are stored in a single compound object along with metadata.
Maintained by Adam Black. Last updated 1 month ago.
12 stars 11.43 score 502 scripts 12 dependents
openintrostat
openintro:Datasets and Supplemental Functions from 'OpenIntro' Textbooks and Labs
Supplemental functions and data for 'OpenIntro' resources, which include open-source textbooks and resources for introductory statistics (<>). The package contains datasets used in our open-source textbooks along with custom plotting functions for reproducing book figures. Note that many functions and examples include color transparency; some plotting elements may not show up properly (or at all) when run in some versions of the Windows operating system.
Maintained by Mine Çetinkaya-Rundel. Last updated 3 months ago.
240 stars 11.39 score 6.0k scripts
doi-usgs
nhdplusTools:NHDPlus Tools
Tools for traversing and working with National Hydrography Dataset Plus (NHDPlus) data. All methods implemented in 'nhdplusTools' are available in the NHDPlus documentation available from the US Environmental Protection Agency <>.
Maintained by David Blodgett. Last updated 1 month ago.
87 stars 11.38 score 348 scripts 5 dependents
ropensci
biomartr:Genomic Data Retrieval
Perform large scale genomic data retrieval and functional annotation retrieval. This package aims to provide users with a standardized way to automate genome, proteome, 'RNA', coding sequence ('CDS'), 'GFF', and metagenome retrieval from 'NCBI RefSeq', 'NCBI Genbank', 'ENSEMBL', and 'UniProt' databases. Furthermore, an interface to the 'BioMart' database (Smedley et al. (2009) <doi:10.1186/1471-2164-10-22>) allows users to retrieve functional annotation for genomic loci. In addition, users can download entire databases such as 'NCBI RefSeq' (Pruitt et al. (2007) <doi:10.1093/nar/gkl842>), 'NCBI nr', 'NCBI nt', 'NCBI Genbank' (Benson et al. (2013) <doi:10.1093/nar/gks1195>), etc. with only one command.
Maintained by Hajk-Georg Drost. Last updated 2 months ago.
218 stars 11.35 score 129 scripts 3 dependents
mrcieu
TwoSampleMR:Two Sample MR Functions and Interface to MRC Integrative Epidemiology Unit OpenGWAS Database
A package for performing Mendelian randomization using GWAS summary data. It uses the IEU OpenGWAS database <> to automatically obtain data, and a wide range of methods to run the analysis.
Maintained by Gibran Hemani. Last updated 1 day ago.
476 stars 11.27 score 1.7k scripts 1 dependent
bioc
genomation:Summary, annotation and visualization of genomic data
A package for summary and annotation of genomic intervals. Users can visualize and quantify genomic intervals over pre-defined functional regions, such as promoters, exons, introns, etc. The genomic intervals represent regions with a defined chromosome position, which may be associated with a score, such as aligned reads from HT-seq experiments, TF binding sites, methylation scores, etc. The package can use any tabular genomic feature data as long as it has minimal information on the locations of genomic intervals. In addition, it can use BAM or BigWig files as input.
Maintained by Altuna Akalin. Last updated 5 months ago.
76 stars 11.13 score 738 scripts 5 dependents
covid19datahub
COVID19:COVID-19 Data Hub
Unified datasets for a better understanding of COVID-19.
Maintained by Emanuele Guidotti. Last updated 1 month ago.
252 stars 11.08 score 265 scripts
ropengov
eurostat:Tools for Eurostat Open Data
Tools to download data from the Eurostat database <> together with search and manipulation utilities.
Maintained by Leo Lahti. Last updated 1 month ago.
242 stars 11.07 score 892 scripts 4 dependents
choonghyunryu
dlookr:Tools for Data Diagnosis, Exploration, Transformation
A collection of tools that support data diagnosis, exploration, and transformation. Data diagnostics provides information and visualization of missing values, outliers, and unique and negative values to help you understand the distribution and quality of your data. Data exploration provides information and visualization of the descriptive statistics of univariate variables, normality tests and outliers, correlation of two variables, and the relationship between the target variable and predictor. Data transformation supports binning for categorizing continuous variables, imputes missing values and outliers, and resolves skewness. The package also creates automated reports that support these three tasks.
Maintained by Choonghyun Ryu. Last updated 10 months ago.
212 stars 11.05 score 748 scripts 2 dependents
ipums
ipumsr:An R Interface for Downloading, Reading, and Handling IPUMS Data
An easy way to work with census, survey, and geographic data provided by IPUMS in R. Generate and download data through the IPUMS API and load IPUMS files into R with their associated metadata to make analysis easier. IPUMS data describing 1.4 billion individuals drawn from over 750 censuses and surveys is available free of charge from the IPUMS website <>.
Maintained by Derek Burk. Last updated 1 month ago.
30 stars 11.05 score 720 scripts 2 dependents
uupharmacometrics
xpose:Diagnostics for Pharmacometric Models
Diagnostics for non-linear mixed-effects (population) models from 'NONMEM' <>. 'xpose' facilitates data import and the creation of numerical run summaries, and provides 'ggplot2'-based graphics for data exploration and model diagnostics.
Maintained by Benjamin Guiastrennec. Last updated 3 months ago.
62 stars 11.02 score 183 scripts 6 dependents
ohdsi
PatientLevelPrediction:Develop Clinical Prediction Models Using the Common Data Model
A user friendly way to create patient level prediction models using the Observational Medical Outcomes Partnership Common Data Model. Given a cohort of interest and an outcome of interest, the package can use data in the Common Data Model to build a large set of features. These features can then be used to fit a predictive model with a number of machine learning algorithms. This is further described in Reps (2017) <doi:10.1093/jamia/ocy032>.
Maintained by Egill Fridgeirsson. Last updated 22 days ago.
190 stars 10.85 score 297 scripts
ropensci
geojsonio:Convert Data from and to 'GeoJSON' or 'TopoJSON'
Convert data to 'GeoJSON' or 'TopoJSON' from various R classes, including vectors, lists, data frames, shape files, and spatial classes. 'geojsonio' does not aim to replace packages like 'sp', 'rgdal', 'rgeos', but rather aims to be a high level client to simplify conversions of data from and to 'GeoJSON' and 'TopoJSON'.
Maintained by Michael Mahoney. Last updated 1 year ago.
151 stars 10.83 score 2.9k scripts 13 dependents
bioc
ANCOMBC:Microbiome differential abundance and correlation analyses with bias correction
ANCOMBC is a package containing differential abundance (DA) and correlation analyses for microbiome data. Specifically, the package includes Analysis of Compositions of Microbiomes with Bias Correction 2 (ANCOM-BC2), Analysis of Compositions of Microbiomes with Bias Correction (ANCOM-BC), and Analysis of Composition of Microbiomes (ANCOM) for DA analysis, and Sparse Estimation of Correlations among Microbiomes (SECOM) for correlation analysis. Microbiome data are typically subject to two sources of biases: unequal sampling fractions (sample-specific biases) and differential sequencing efficiencies (taxon-specific biases). Methodologies included in the ANCOMBC package are designed to correct these biases and construct statistically consistent estimators.
Maintained by Huang Lin. Last updated 13 days ago.
120 stars 10.79 score 406 scripts 1 dependent
jimmyday12
fitzRoy:Easily Scrape and Process AFL Data
An easy package for scraping and processing Australian Rules Football (AFL) data. 'fitzRoy' provides a range of functions for accessing publicly available data from 'AFL Tables' <>, 'Footy Wire' <> and 'The Squiggle' <>. Further functions allow for easy processing, cleaning and transformation of this data into formats that can be used for analysis.
Maintained by James Day. Last updated 9 days ago.
136 stars 10.72 score 324 scripts
bioc
GWASTools:Tools for Genome Wide Association Studies
Classes for storing very large GWAS data sets and annotation, and functions for GWAS data cleaning and analysis.
Maintained by Stephanie M. Gogarten. Last updated 11 days ago.
17 stars 10.67 score 396 scripts 5 dependents
doi-usgs
EGRET:Exploration and Graphics for RivEr Trends
Statistics and graphics for streamflow history, water quality trends, and the statistical modeling algorithm: Weighted Regressions on Time, Discharge, and Season (WRTDS).
Maintained by Laura DeCicco. Last updated 4 months ago.
90 stars 10.67 score 362 scripts 1 dependent
ohdsi
FeatureExtraction:Generating Features for a Cohort
An R interface for generating features for a cohort using data in the Common Data Model. Features can be constructed using default or custom made feature definitions. Furthermore it's possible to aggregate features and get the summary statistics.
Maintained by Ger Inberg. Last updated 9 days ago.
62 stars 10.64 score 209 scripts 2 dependents
business-science
modeltime:The Tidymodels Extension for Time Series Modeling
The time series forecasting framework for use with the 'tidymodels' ecosystem. Models include ARIMA, Exponential Smoothing, and additional time series models from the 'forecast' and 'prophet' packages. Refer to "Forecasting Principles & Practice, Second edition" (<>). Refer to "Prophet: forecasting at scale" (<>).
Maintained by Matt Dancho. Last updated 5 months ago.
551 stars 10.61 score 1.1k scripts 7 dependents
bioc
ORFik:Open Reading Frames in Genomics
R package for analysis of transcript and translation features through manipulation of sequence data and NGS data like Ribo-Seq, RNA-Seq, TCP-Seq and CAGE. It is generalized in the sense that any transcript region can be analysed; as the name hints, it was made with investigation of ribosomal patterns over Open Reading Frames (ORFs) as its primary use case. ORFik is extremely fast through use of C++, data.table and GenomicRanges. The package allows reassigning transcript starts using CAGE-Seq data, automatic shifting of Ribo-Seq reads, finding Open Reading Frames for whole genomes, and much more.
Maintained by Haakon Tjeldnes. Last updated 1 month ago.
33 stars 10.56 score 115 scripts 2 dependents
ropensci
gutenbergr:Download and Process Public Domain Works from Project Gutenberg
Download and process public domain works in the Project Gutenberg collection <>. Includes metadata for all Project Gutenberg works, so that they can be searched and retrieved.
Maintained by Jon Harmon. Last updated 3 months ago.
105 stars 10.50 score 1.1k scripts 1 dependent
rstudio
vetiver:Version, Share, Deploy, and Monitor Models
The goal of 'vetiver' is to provide fluent tooling to version, share, deploy, and monitor a trained model. Functions handle both recording and checking the model's input data prototype, and predicting from a remote API endpoint. The 'vetiver' package is extensible, with generics that can support many kinds of models.
Maintained by Julia Silge. Last updated 6 months ago.
185 stars 10.48 score 466 scripts 1 dependent
bioc
GENESIS:GENetic EStimation and Inference in Structured samples (GENESIS): Statistical methods for analyzing genetic data from samples with population structure and/or relatedness
The GENESIS package provides methodology for estimating, inferring, and accounting for population and pedigree structure in genetic analyses. The current implementation provides functions to perform PC-AiR (Conomos et al., 2015, Gen Epi) and PC-Relate (Conomos et al., 2016, AJHG). PC-AiR performs a Principal Components Analysis on genome-wide SNP data for the detection of population structure in a sample that may contain known or cryptic relatedness. Unlike standard PCA, PC-AiR accounts for relatedness in the sample to provide accurate ancestry inference that is not confounded by family structure. PC-Relate uses ancestry representative principal components to adjust for population structure/ancestry and accurately estimate measures of recent genetic relatedness such as kinship coefficients, IBD sharing probabilities, and inbreeding coefficients. Additionally, functions are provided to perform efficient variance component estimation and mixed model association testing for both quantitative and binary phenotypes.
Maintained by Stephanie M. Gogarten. Last updated 2 months ago.
36 stars 10.44 score 342 scripts 1 dependentsbcgov
bcdata:Search and Retrieve Data from the BC Data Catalogue
Search, query, and download tabular and 'geospatial' data from the British Columbia Data Catalogue (<>). Search catalogue data records based on keywords, data licence, sector, data format, and B.C. government organization. View metadata directly in R, download many data formats, and query 'geospatial' data available via the B.C. government Web Feature Service ('WFS') using 'dplyr' syntax.
Maintained by Andy Teucher. Last updated 3 days ago.
83 stars 10.36 score 186 scripts 4 dependentsssnn-airr
alakazam:Immunoglobulin Clonal Lineage and Diversity Analysis
Provides methods for high-throughput adaptive immune receptor repertoire sequencing (AIRR-Seq; Rep-Seq) analysis. In particular, immunoglobulin (Ig) sequence lineage reconstruction, lineage topology analysis, diversity profiling, amino acid property analysis and gene usage. Citations: Gupta and Vander Heiden, et al (2017) <doi:10.1093/bioinformatics/btv359>, Stern, Yaari and Vander Heiden, et al (2014) <doi:10.1126/scitranslmed.3008879>.
Maintained by Susanna Marquez. Last updated 3 months ago.
10.33 score 424 scripts 7 dependentsrichardli
SUMMER:Small-Area-Estimation Unit/Area Models and Methods for Estimation in R
Provides methods for spatial and spatio-temporal smoothing of demographic and health indicators using survey data, with particular focus on estimating and projecting under-five mortality rates, described in Mercer et al. (2015) <doi:10.1214/15-AOAS872>, Li et al. (2019) <doi:10.1371/journal.pone.0210645>, Wu et al. (DHS Spatial Analysis Reports No. 21, 2021), and Li et al. (2023) <doi:10.48550/arXiv.2007.05117>.
Maintained by Zehang R Li. Last updated 3 months ago.
23 stars 10.28 score 134 scripts 2 dependentsinsightsengineering
teal.modules.clinical:'teal' Modules for Standard Clinical Outputs
Provides user-friendly tools for creating and customizing clinical trial reports. By leveraging the 'teal' framework, this package provides 'teal' modules to easily create an interactive panel that allows for seamless adjustments to data presentation, thereby streamlining the creation of detailed and accurate reports.
Maintained by Dawid Kaledkowski. Last updated 29 days ago.
34 stars 10.25 score 149 scriptsropensci
qualtRics:Download 'Qualtrics' Survey Data
Provides functions to access survey results directly into R using the 'Qualtrics' API. 'Qualtrics' <> is an online survey and data collection software platform. See <> for more information about the 'Qualtrics' API. This package is community-maintained and is not officially supported by 'Qualtrics'.
Maintained by Julia Silge. Last updated 7 months ago.
221 stars 10.23 score 272 scriptsidigbio
ridigbio:Interface to the iDigBio Data API
An interface to iDigBio's search API that allows downloading specimen records. Searches are returned as a data.frame. Other functions such as the metadata end points return lists of information. iDigBio is a US project focused on digitizing and serving museum specimen collections on the web. See <> for information on iDigBio.
Maintained by Jesse Bennett. Last updated 18 days ago.
16 stars 10.23 score 63 scripts 7 dependentsbioc
cBioPortalData:Exposes and Makes Available Data from the cBioPortal Web Resources
The cBioPortalData R package accesses study datasets from the cBio Cancer Genomics Portal. It accesses the data either from the pre-packaged zip / tar files or from the API that was recently implemented by the cBioPortal Data Team. The package can provide data either in tabular format or as a MultiAssayExperiment object that uses familiar Bioconductor data representations.
Maintained by Marcel Ramos. Last updated 8 days ago.
33 stars 10.17 score 147 scripts 4 dependentsropensci
rdhs:API Client and Dataset Management for the Demographic and Health Survey (DHS) Data
Provides a client for (1) querying the DHS API for survey indicators and metadata (<>), (2) identifying surveys and datasets for analysis, (3) downloading survey datasets from the DHS website, (4) loading datasets and associated metadata into R, and (5) extracting variables and combining datasets for pooled analysis.
Maintained by OJ Watson. Last updated 30 days ago.
37 stars 10.16 score 286 scripts 4 dependentsdslc-io
tidytuesdayR:Access the Weekly 'TidyTuesday' Project Dataset
'TidyTuesday' is a project by the 'Data Science Learning Community' in which they post a weekly dataset in a public data repository (<>) for people to analyze and visualize. This package provides the tools to easily download this data and the description of the source.
Maintained by Jon Harmon. Last updated 3 days ago.
77 stars 10.13 score 3.0k scriptskogalur
randomForestSRC:Fast Unified Random Forests for Survival, Regression, and Classification (RF-SRC)
Fast OpenMP parallel computing of Breiman's random forests for univariate, multivariate, unsupervised, survival, competing risks, class imbalanced classification and quantile regression. New Mahalanobis splitting for correlated outcomes. Extreme random forests and randomized splitting. Suite of imputation methods for missing data. Fast random forests using subsampling. Confidence regions and standard errors for variable importance. New improved holdout importance. Case-specific importance. Minimal depth variable importance. Visualize trees on your Safari or Google Chrome browser. Anonymous random forests for data privacy.
Maintained by Udaya B. Kogalur. Last updated 20 hours ago.
124 stars 10.10 score 1.2k scripts 11 dependentsropensci
spocc:Interface to Species Occurrence Data Sources
A programmatic interface to many species occurrence data sources, including Global Biodiversity Information Facility ('GBIF'), 'iNaturalist', 'eBird', Integrated Digitized 'Biocollections' ('iDigBio'), 'VertNet', Ocean 'Biogeographic' Information System ('OBIS'), and Atlas of Living Australia ('ALA'). Includes functionality for retrieving species occurrence data, and combining those data.
Maintained by Hannah Owens. Last updated 2 months ago.
118 stars 10.09 score 552 scripts 5 dependentsjinseob2kim
jstable:Create Tables from Different Types of Regression
Create regression tables from generalized linear model (GLM), generalized estimating equation (GEE), generalized linear mixed-effects model (GLMM), Cox proportional hazards model, survey-weighted generalized linear model (svyglm) and survey-weighted Cox model results for publication.
Maintained by Jinseob Kim. Last updated 20 hours ago.
28 stars 10.08 score 199 scripts 1 dependentsgshs-ornl
wbstats:Programmatic Access to Data and Statistics from the World Bank API
Search and download data from the World Bank Data API.
Maintained by Jesse Piburn. Last updated 4 years ago.
126 stars 10.07 score 1.1k scripts 3 dependentsropensci
tabulapdf:Extract Tables from PDF Documents
Bindings for the 'Tabula' <> 'Java' library, which can extract tables from PDF files. This tool can reduce time and effort in data extraction processes in fields like investigative journalism. It allows for automatic and manual table extraction, the latter facilitated through a 'Shiny' interface, enabling manual area selection with a computer mouse for data retrieval.
Maintained by Mauricio Vargas Sepulveda. Last updated 3 months ago.
552 stars 10.07 score 159 scripts 1 dependentsropensci
nasapower:NASA POWER API Client
An API client for the NASA POWER global meteorology, surface solar energy and climatology data API. POWER (Prediction Of Worldwide Energy Resources) data are freely available for download with varying spatial resolutions dependent on the original data and with several temporal resolutions depending on the POWER parameter and community. This work is funded through the NASA Earth Science Directorate Applied Science Program. For more on the data themselves, the methodologies used in creating them, a web-based data viewer and web access, please see <>.
Maintained by Adam H. Sparks. Last updated 23 days ago.
101 stars 9.98 score 137 scripts 3 dependentsiqss
dataverse:Client for Dataverse 4+ Repositories
Provides access to Dataverse APIs <> (versions 4-5), enabling data search, retrieval, and deposit. For Dataverse versions <= 3.0, use the archived 'dvn' package <>.
Maintained by Shiro Kuriwaki. Last updated 5 months ago.
61 stars 9.98 score 217 scripts 4 dependentsdarwin-eu
PatientProfiles:Identify Characteristics of Patients in the OMOP Common Data Model
Identify the characteristics of patients in data mapped to the Observational Medical Outcomes Partnership (OMOP) common data model.
Maintained by Marti Catala. Last updated 22 days ago.
1 star 9.97 score 225 scripts 9 dependentspecanproject
PEcAn.assim.batch:PEcAn Functions Used for Ecological Forecasts and Reanalysis
The Predictive Ecosystem Carbon Analyzer (PEcAn) is a scientific workflow management tool that is designed to simplify the management of model parameterization, execution, and analysis. The goal of PEcAn is to streamline the interaction between data and models, and to improve the efficacy of scientific investigation.
Maintained by Istem Fer. Last updated 2 days ago.
216 stars 9.96 score 20 scripts 2 dependentsdarwin-eu
CodelistGenerator:Identify Relevant Clinical Codes and Evaluate Their Use
Generate a candidate code list for the Observational Medical Outcomes Partnership (OMOP) common data model based on string matching. For a given search strategy, a candidate code list will be returned.
Maintained by Edward Burn. Last updated 2 days ago.
14 stars 9.94 score 165 scripts 4 dependentsbioc
OmnipathR:OmniPath web service client and more
A client for the OmniPath web service and many other resources. It also includes functions to transform and pretty print some of the downloaded data, functions to access a number of other resources such as BioPlex, ConsensusPathDB, EVEX, Gene Ontology, Guide to Pharmacology (IUPHAR/BPS), Harmonizome, HTRIdb, Human Phenotype Ontology, InWeb InBioMap, KEGG Pathway, Pathway Commons, Ramilowski et al. 2015, RegNetwork, ReMap, TF census, TRRUST and Vinayagam et al. 2011. Furthermore, OmnipathR features a close integration with the NicheNet method for ligand activity prediction from transcriptomics data, and its R implementation `nichenetr` (available only on GitHub).
Maintained by Denes Turei. Last updated 1 month ago.
130 stars 9.90 score 226 scripts 2 dependentsbioc
methylumi:Handle Illumina methylation data
This package provides classes for holding and manipulating Illumina methylation data. Based on eSet, it can contain MIAME information, sample information, feature information, and multiple matrices of data. An "intelligent" import function, methylumiR can read the Illumina text files and create a MethyLumiSet. methylumIDAT can directly read raw IDAT files from HumanMethylation27 and HumanMethylation450 microarrays. Normalization, background correction, and quality control features for GoldenGate, Infinium, and Infinium HD arrays are also included.
Maintained by Sean Davis. Last updated 5 months ago.
9 stars 9.90 score 89 scripts 9 dependentsjaseziv
worldfootballR:Extract and Clean World Football (Soccer) Data
Allows users to obtain clean and tidy football (soccer) game, team and player data. Data is collected from a number of popular sites, including 'FBref', transfer and valuations data from 'Transfermarkt' <> and shooting location and other match stats data from 'Understat' <>. It gives users the ability to access data more efficiently, rather than having to export data tables to files before being able to complete their analysis.
Maintained by Jason Zivkovic. Last updated 1 month ago.
506 stars 9.89 score 516 scripts 2 dependentsjslefche
piecewiseSEM:Piecewise Structural Equation Modeling
Implements piecewise structural equation modeling from a single list of structural equations, with new methods for non-linear, latent, and composite variables, standardized coefficients, query-based prediction and indirect effects. See <> for more.
Maintained by Jon Lefcheck. Last updated 10 months ago.
163 stars 9.85 score 452 scriptsemilhvitfeldt
textdata:Download and Load Various Text Datasets
Provides a framework to download, parse, and store text datasets on the disk and load them when needed. Includes various sentiment lexicons and labeled text data sets for classification and analysis.
Maintained by Emil Hvitfeldt. Last updated 10 months ago.
75 stars 9.84 score 1.4k scripts 1 dependentsropensci
frictionless:Read and Write Frictionless Data Packages
Read and write Frictionless Data Packages. A 'Data Package' (<>) is a simple container format and standard to describe and package a collection of (tabular) data. It is typically used to publish FAIR (<>) and open datasets.
Maintained by Peter Desmet. Last updated 6 months ago.
30 stars 9.79 score 55 scripts 6 dependentsbioc
annotatr:Annotation of Genomic Regions to Genomic Annotations
Given a set of genomic sites/regions (e.g. ChIP-seq peaks, CpGs, differentially methylated CpGs or regions, SNPs, etc.) it is often of interest to investigate the intersecting genomic annotations. Such annotations include those relating to gene models (promoters, 5'UTRs, exons, introns, and 3'UTRs), CpGs (CpG islands, CpG shores, CpG shelves), or regulatory sequences such as enhancers. The annotatr package provides an easy way to summarize and visualize the intersection of genomic sites/regions with genomic annotations.
Maintained by Raymond G. Cavalcante. Last updated 5 months ago.
26 stars 9.76 score 246 scripts 5 dependentsbioc
RTCGAToolbox:A new tool for exporting TCGA Firehose data
Managing data from large-scale projects such as The Cancer Genome Atlas (TCGA) for further analysis is an important and time-consuming step for research projects. Several efforts, such as the Firehose project, make TCGA pre-processed data publicly available via web services and data portals, but using it requires managing, downloading and preparing the data for subsequent steps. We developed an open source and extensible R-based data client for Firehose pre-processed data and demonstrated its use with sample case studies. Results showed that RTCGAToolbox could improve data management for researchers who are interested in TCGA data. In addition, it can be integrated with other analysis pipelines for subsequent data analysis.
Maintained by Marcel Ramos. Last updated 3 months ago.
18 stars 9.75 score 76 scripts 5 dependentsropensci
prism:Access Data from the Oregon State Prism Climate Project
Allows users to access the Oregon State Prism climate data (<>). Using the web service API, data can easily be downloaded in bulk and loaded into R for spatial analysis. Some user-friendly visualizations are also provided.
Maintained by Alan Butler. Last updated 3 days ago.
57 stars 9.74 score 354 scriptsohdsi
CohortConstructor:Build and Manipulate Study Cohorts Using a Common Data Model
Create and manipulate study cohorts in data mapped to the Observational Medical Outcomes Partnership Common Data Model.
Maintained by Edward Burn. Last updated 16 hours ago.
2 stars 9.73 score 207 scripts 2 dependentspecanproject
PEcAnRTM:PEcAn Functions Used for Radiative Transfer Modeling
Functions for performing forward runs and inversions of radiative transfer models (RTMs). Inversions can be performed using maximum likelihood, or more complex hierarchical Bayesian methods. Underlying numerical analyses are optimized for speed using Fortran code.
Maintained by Alexey Shiklomanov. Last updated 2 days ago.
216 stars 9.70 score 132 scriptsrnabioco
valr:Genome Interval Arithmetic
Read and manipulate genome intervals and signals. Provides functionality similar to command-line tool suites within R, enabling interactive analysis and visualization of genome-scale data. Riemondy et al. (2017) <doi:10.12688/f1000research.11997.1>.
Maintained by Kent Riemondy. Last updated 20 days ago.
90 stars 9.69 score 227 scriptsbioc
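The interval arithmetic named in the 'valr' entry above can be illustrated with a minimal sketch in plain Python. This is not the package's API: the `intersect` function, the `(chrom, start, end)` tuple layout, and the half-open coordinate convention are all assumptions made for illustration.

```python
def intersect(a, b):
    """Return the overlapping pieces of two interval lists.

    Each interval is a hypothetical (chrom, start, end) tuple using
    half-open coordinates, so an interval covers start <= pos < end.
    """
    overlaps = []
    for chrom_a, start_a, end_a in a:
        for chrom_b, start_b, end_b in b:
            if chrom_a != chrom_b:
                continue  # intervals on different chromosomes never overlap
            start, end = max(start_a, start_b), min(end_a, end_b)
            if start < end:  # keep only non-empty overlaps
                overlaps.append((chrom_a, start, end))
    return overlaps


x = [("chr1", 100, 200), ("chr1", 300, 400)]
y = [("chr1", 150, 350), ("chr2", 100, 200)]
print(intersect(x, y))
```

The nested loop compares every pair, which is fine for a handful of intervals; tools built for genome-scale data typically sort the inputs or use an index instead.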
TCGAutils:TCGA utility functions for data management
A suite of helper functions for checking and manipulating TCGA data including data obtained from the curatedTCGAData experiment package. These functions aim to make working with TCGA data simpler and more manageable. Exported functions include those that import data from flat files into Bioconductor objects, convert row annotations, and translate identifiers via the GDC API.
Maintained by Marcel Ramos. Last updated 3 months ago.
27 stars 9.66 score 210 scripts 10 dependentsgrunwaldlab
metacoder:Tools for Parsing, Manipulating, and Graphing Taxonomic Abundance Data
Reads, plots, and manipulates large taxonomic data sets, like those generated from modern high-throughput sequencing, such as metabarcoding (i.e. amplification metagenomics, 16S metagenomics, etc). It provides a tree-based visualization called "heat trees" used to depict statistics for every taxon in a taxonomy using color and size. It also provides various functions to do common tasks in microbiome bioinformatics on data in the 'taxmap' format defined by the 'taxa' package. The 'metacoder' package is described in the publication by Foster et al. (2017) <doi:10.1371/journal.pcbi.1005404>.
Maintained by Zachary Foster. Last updated 2 months ago.
140 stars 9.64 score 328 scriptssdctools
sdcMicro:Statistical Disclosure Control Methods for Anonymization of Data and Risk Estimation
Data from statistical agencies and other institutions are mostly confidential. This package, introduced in Templ, Kowarik and Meindl (2017) <doi:10.18637/jss.v067.i04>, can be used for the generation of anonymized (micro)data, i.e. for the creation of public- and scientific-use files. The theoretical basis for the methods implemented can be found in Templ (2017) <doi:10.1007/978-3-319-50272-4>. Various risk estimation and anonymization methods are included. Note that the package includes a graphical user interface, published in Meindl and Templ (2019) <doi:10.3390/a12090191>, that gives access to various methods of this package.
Maintained by Matthias Templ. Last updated 1 month ago.
84 stars 9.63 score 258 scriptsfmmattioni
downloadthis:Implement Download Buttons in 'rmarkdown'
Implement download buttons in HTML output from 'rmarkdown' without the need for 'runtime:shiny'.
Maintained by Felipe Mattioni Maturana. Last updated 6 months ago.
146 stars 9.63 score 856 scripts 1 dependentshafen
trelliscopejs:Create Interactive Trelliscope Displays
Trelliscope is a scalable, flexible, interactive approach to visualizing data (Hafen, 2013 <doi:10.1109/LDAV.2013.6675164>). This package provides methods that make it easy to create a Trelliscope display specification for TrelliscopeJS. High-level functions are provided for creating displays from within 'tidyverse' or 'ggplot2' workflows. Low-level functions are also provided for creating new interfaces.
Maintained by Ryan Hafen. Last updated 1 year ago.
262 stars 9.61 score 1000 scripts 1 dependentsropensci
tidyhydat:Extract and Tidy Canadian 'Hydrometric' Data
Provides functions to access historical and real-time national 'hydrometric' data from Water Survey of Canada data sources (<> and <>) and then applies tidy data principles.
Maintained by Sam Albers. Last updated 18 days ago.
71 stars 9.59 score 202 scripts 3 dependentsropensci
rdflib:Tools to Manipulate and Query Semantic Data
The Resource Description Framework, or 'RDF' is a widely used data representation model that forms the cornerstone of the Semantic Web. 'RDF' represents data as a graph rather than the familiar data table or rectangle of relational databases. The 'rdflib' package provides a friendly and concise user interface for performing common tasks on 'RDF' data, such as reading, writing and converting between the various serializations of 'RDF' data, including 'rdfxml', 'turtle', 'nquads', 'ntriples', and 'json-ld'; creating new 'RDF' graphs, and performing graph queries using 'SPARQL'. This package wraps the low level 'redland' R package which provides direct bindings to the 'redland' C library. Additionally, the package supports the newer and more developer friendly 'JSON-LD' format through the 'jsonld' package. The package interface takes inspiration from the Python 'rdflib' library.
Maintained by Carl Boettiger. Last updated 8 months ago.
57 stars 9.59 score 123 scripts 7 dependentsbioc
tidybulk:Brings transcriptomics to the tidyverse
This is a collection of utility functions for exploring and performing calculations on RNA sequencing data in a modular, pipe-friendly and tidy fashion.
Maintained by Stefano Mangiola. Last updated 12 days ago.
171 stars 9.57 score 172 scripts 1 dependentsbioc
recount:Explore and download data from the recount project
Explore and download data from the recount project available at <>. Using the recount package you can download RangedSummarizedExperiment objects at the gene, exon or exon-exon junction level, the raw counts, the phenotype metadata used, the URLs to the sample coverage bigWig files or the mean coverage bigWig file for a particular study. The RangedSummarizedExperiment objects can be used by different packages for performing differential expression analysis. Using <> you can perform annotation-agnostic differential expression analyses with the data from the recount project as described at <>.
Maintained by Leonardo Collado-Torres. Last updated 4 months ago.
41 stars 9.57 score 498 scripts 3 dependentsbusiness-science
anomalize:Tidy Anomaly Detection
The 'anomalize' package enables a "tidy" workflow for detecting anomalies in data. The main functions are time_decompose(), anomalize(), and time_recompose(). When combined, it's quite simple to decompose time series, detect anomalies, and create bands separating the "normal" data from the anomalous data at scale (i.e. for multiple time series). Time series decomposition is used to remove trend and seasonal components via the time_decompose() function, and methods include seasonal decomposition of time series by Loess ("stl") and seasonal decomposition by piecewise medians ("twitter"). The anomalize() function implements two methods for anomaly detection of residuals: the interquartile range ("iqr") and the generalized extreme studentized deviate test ("gesd"). These methods are based on those used in the 'forecast' package and the Twitter 'AnomalyDetection' package. Refer to the associated functions for specific references for these methods.
Maintained by Matt Dancho. Last updated 1 year ago.
339 stars 9.56 score 332 scriptsthackl
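The interquartile-range idea behind the "iqr" method in the 'anomalize' entry above can be sketched in a few lines of standard-library Python. This is an illustration of the general technique, not the package's implementation: the function name `iqr_anomalies`, the default `factor` of 3.0, and the "inclusive" quantile method are assumptions, and the package's own defaults may differ.

```python
from statistics import quantiles


def iqr_anomalies(residuals, factor=3.0):
    """Return indices of residuals outside [Q1 - factor*IQR, Q3 + factor*IQR].

    `factor` is a hypothetical tuning parameter: larger values flag fewer points.
    """
    q1, _, q3 = quantiles(residuals, n=4, method="inclusive")
    spread = q3 - q1  # the interquartile range
    lower, upper = q1 - factor * spread, q3 + factor * spread
    return [i for i, r in enumerate(residuals) if r < lower or r > upper]


# Residuals left over after removing trend and seasonality (made-up values).
residuals = [0.1, -0.2, 0.0, 0.3, -0.1, 5.0, 0.2, -0.3]
print(iqr_anomalies(residuals))
```

The rule is applied to residuals rather than raw values so that ordinary trend and seasonal swings are not flagged as anomalies.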
gggenomes:A Grammar of Graphics for Comparative Genomics
An extension of 'ggplot2' for creating complex genomic maps. It builds on the power of 'ggplot2' and 'tidyverse' adding new 'ggplot2'-style geoms & positions and 'dplyr'-style verbs to manipulate the underlying data. It implements a layout concept inspired by 'ggraph' and introduces tracks to bring tidiness to the mess that is genomics data.
Maintained by Thomas Hackl. Last updated 2 months ago.
650 stars 9.56 score 123 scriptsdaattali
ddpcr:Analysis and Visualization of Droplet Digital PCR in R and on the Web
An interface to explore, analyze, and visualize droplet digital PCR (ddPCR) data in R. This is the first non-proprietary software for analyzing two-channel ddPCR data. An interactive tool was also created and is available online to facilitate this analysis for anyone who is not comfortable with using R.
Maintained by Dean Attali. Last updated 1 year ago.
61 stars 9.54 score 131 scripts 2 dependentsimmunomind
immunarch:Bioinformatics Analysis of T-Cell and B-Cell Immune Repertoires
A comprehensive framework for bioinformatics exploratory analysis of bulk and single-cell T-cell receptor and antibody repertoires. It provides seamless data loading, analysis and visualisation for AIRR (Adaptive Immune Receptor Repertoire) data, both bulk immunosequencing (RepSeq) and single-cell sequencing (scRNAseq). Immunarch implements most of the widely used AIRR analysis methods, such as: clonality analysis, estimation of repertoire similarities in distribution of clonotypes and gene segments, repertoire diversity analysis, annotation of clonotypes using external immune receptor databases and clonotype tracking in vaccination and cancer studies. A successor to our previously published 'tcR' immunoinformatics package (Nazarov 2015) <doi:10.1186/s12859-015-0613-1>.
Maintained by Vadim I. Nazarov. Last updated 1 year ago.
316 stars 9.49 score 203 scriptsjohn-d-fox
Rcmdr:R Commander
A platform-independent basic-statistics GUI (graphical user interface) for R, based on the tcltk package.
Maintained by John Fox. Last updated 5 months ago.
4 stars 9.48 score 636 scripts 38 dependentstbates
umx:Structural Equation Modeling and Twin Modeling in R
Quickly create, run, and report structural equation models, and twin models. See '?umx' for help, and umx_open_CRAN_page("umx") for NEWS. Timothy C. Bates, Michael C. Neale, Hermine H. Maes, (2019). umx: A library for Structural Equation and Twin Modelling in R. Twin Research and Human Genetics, 22, 27-41. <doi:10.1017/thg.2019.2>.
Maintained by Timothy C. Bates. Last updated 14 days ago.
44 stars 9.45 score 472 scriptspecanproject
Functions Used for Ecological Forecasts and Reanalysis
The Predictive Ecosystem Carbon Analyzer (PEcAn) is a scientific workflow management tool that is designed to simplify the management of model parameterization, execution, and analysis. The goal of PEcAn is to streamline the interaction between data and models, and to improve the efficacy of scientific investigation.
Maintained by Mike Dietze. Last updated 2 days ago.
216 stars 9.33 score 19 scripts 10 dependentsmicrosoft
finnts:Microsoft Finance Time Series Forecasting Framework
Automated time series forecasting developed by Microsoft Finance. The Microsoft Finance Time Series Forecasting Framework, aka Finn, can be used to forecast any component of the income statement, balance sheet, or any other area of interest to finance. Finn can be used to forecast any numerical quantity over time. While it can be applied outside of the finance domain, Finn was built to meet the needs of financial analysts to better forecast their businesses within a company, and has many built-in features that are specific to the needs of financial forecasters. Happy forecasting!
Maintained by Mike Tokic. Last updated 1 month ago.
194 stars 9.30 score 39 scriptsbioc
EWCE:Expression Weighted Celltype Enrichment
Used to determine which cell types are enriched within gene lists. The package provides tools for testing enrichments within simple gene lists (such as human disease associated genes) and those resulting from differential expression studies. The package does not depend upon any particular Single Cell Transcriptome dataset and user defined datasets can be loaded in and used in the analyses.
Maintained by Alan Murphy. Last updated 1 month ago.
56 stars 9.29 score 99 scriptsbioc
CNEr:CNE Detection and Visualization
Large-scale identification and advanced visualization of sets of conserved noncoding elements.
Maintained by Ge Tan. Last updated 5 months ago.
3 stars 9.28 score 35 scripts 19 dependentsstevenmmortimer
salesforcer:An Implementation of 'Salesforce' APIs Using Tidy Principles
Functions connecting to the 'Salesforce' Platform APIs (REST, SOAP, Bulk 1.0, Bulk 2.0, Metadata, Reports and Dashboards) <>. "API" is an acronym for "application programming interface". Almost all calls from these APIs are supported as they use CSV, XML or JSON data that can be parsed into R data structures. For more details, see the 'Salesforce' API documentation and this package's website <>, which provides further documentation and examples.
Maintained by Steven M. Mortimer. Last updated 5 months ago.
82 stars 9.27 score 191 scriptsbioc
IsoformSwitchAnalyzeR:Identify, Annotate and Visualize Isoform Switches with Functional Consequences from both short- and long-read RNA-seq data
Analysis of alternative splicing and isoform switches with predicted functional consequences (e.g. gain/loss of protein domains etc.) from quantification of all types of RNASeq by tools such as Kallisto, Salmon, StringTie, Cufflinks/Cuffdiff etc.
Maintained by Kristoffer Vitting-Seerup. Last updated 5 months ago.
108 stars 9.26 score 125 scriptsbusiness-science
sweep:Tidy Tools for Forecasting
Tidies up the forecasting modeling and prediction workflow, extends the 'broom' package with 'sw_tidy', 'sw_glance', 'sw_augment', and 'sw_tidy_decomp' functions for various forecasting models, and enables converting 'forecast' objects to "tidy" data frames with 'sw_sweep'.
Maintained by Matt Dancho. Last updated 1 year ago.
155 stars 9.23 score 399 scripts 1 dependentsbioc
rWikiPathways:rWikiPathways - R client library for the WikiPathways API
Use this package to interface with the WikiPathways API. It provides programmatic access to WikiPathways content in multiple data and image formats, including official monthly release files and convenient GMT read/write functions.
Maintained by Egon Willighagen. Last updated 5 months ago.
15 stars 9.23 score 131 scripts 3 dependentsgeorgheinze
logistf:Firth's Bias-Reduced Logistic Regression
Fit a logistic regression model using Firth's bias reduction method, equivalent to penalization of the log-likelihood by the Jeffreys prior. Confidence intervals for regression coefficients can be computed by penalized profile likelihood. Firth's method was proposed as an ideal solution to the problem of separation in logistic regression, see Heinze and Schemper (2002) <doi:10.1002/sim.1047>. If needed, the bias reduction can be turned off such that ordinary maximum likelihood logistic regression is obtained. Two new modifications of Firth's method, FLIC and FLAC, lead to unbiased predictions and are now available in the package as well, see Puhr et al (2017) <doi:10.1002/sim.7273>.
Maintained by Georg Heinze. Last updated 2 years ago.
12 stars 9.23 score 346 scripts 16 dependentsropensci
stats19:Work with Open Road Traffic Casualty Data from Great Britain
Tools to help download, process and analyse the UK road collision data collected using the 'STATS19' form. The datasets are provided as 'CSV' files with detailed road safety information about the circumstances of car crashes and other incidents on the roads resulting in casualties in Great Britain from 1979 to present. Tables are available on 'collisions' with the circumstances (e.g. speed limit of road), information about 'vehicles' involved (e.g. type of vehicle), and 'casualties' (e.g. age). The statistics relate only to events on public roads that were reported to the police, and subsequently recorded, using the 'STATS19' collision reporting form. See the Department for Transport website <> for more information on these datasets. The package is described in a paper in the Journal of Open Source Software (Lovelace et al. 2019) <doi:10.21105/joss.01181>. See Gilardi et al. (2022) <doi:10.1111/rssa.12823>, Vidal-Tortosa et al. (2021) <doi:10.1016/j.jth.2021.101291>, and Tait et al. (2023) <doi:10.1016/j.aap.2022.106895> for examples of how the data can be used for methodological and empirical road safety research.
Maintained by Robin Lovelace. Last updated 2 months ago.
64 stars 9.20 score 193 scriptsropensci
auk:eBird Data Extraction and Processing in R
Extract and process bird sightings records from eBird (<>), an online tool for recording bird observations. Public access to the full eBird database is via the eBird Basic Dataset (EBD; see <> for access), a downloadable text file. This package is an interface to AWK for extracting data from the EBD based on taxonomic, spatial, or temporal filters, to produce a manageable file size that can be imported into R.
Maintained by Matthew Strimas-Mackey. Last updated 10 days ago.
143 stars 9.18 score 254 scriptsbzhanglab
WebGestaltR:Gene Set Analysis Toolkit WebGestaltR
The web version WebGestalt <> supports 12 organisms, 354 gene identifiers and 321,251 function categories. Users can upload the data and functional categories with their own gene identifiers. In addition to the Over-Representation Analysis, WebGestalt also supports Gene Set Enrichment Analysis and Network Topology Analysis. The user-friendly output report allows interactive and efficient exploration of enrichment results. The WebGestaltR package not only supports all of the above functions but can also be integrated into other pipelines or used to analyze multiple gene lists simultaneously.
Maintained by John Elizarraras. Last updated 2 days ago.
35 stars 9.18 score 180 scriptsatlasoflivingaustralia
galah:Biodiversity Data from the GBIF Node Network
The Global Biodiversity Information Facility ('GBIF', <>) sources data from an international network of data providers, known as 'nodes'. Several of these nodes - the "living atlases" (<>) - maintain their own web services using software originally developed by the Atlas of Living Australia ('ALA', <>). 'galah' enables the R community to directly access data and resources hosted by 'GBIF' and its partner nodes.
Maintained by Martin Westgate. Last updated 2 months ago.
43 stars 9.17 score 275 scripts 1 dependentsalexanderrobitzsch
miceadds:Some Additional Multiple Imputation Functions, Especially for 'mice'
Contains functions for multiple imputation which complement existing functionality in R. In particular, several imputation methods for the mice package (van Buuren & Groothuis-Oudshoorn, 2011, <doi:10.18637/jss.v045.i03>) are implemented. Main features of the miceadds package include plausible value imputation (Mislevy, 1991, <doi:10.1007/BF02294457>), multilevel imputation for variables at any level or with any number of hierarchical and non-hierarchical levels (Grund, Luedtke & Robitzsch, 2018, <doi:10.1177/1094428117703686>; van Buuren, 2018, Ch.7, <doi:10.1201/9780429492259>), imputation using partial least squares (PLS) for high dimensional predictors (Robitzsch, Pham & Yanagida, 2016), nested multiple imputation (Rubin, 2003, <doi:10.1111/1467-9574.00217>), substantive model compatible imputation (Bartlett et al., 2015, <doi:10.1177/0962280214521348>), and features for the generation of synthetic datasets (Reiter, 2005, <doi:10.1111/j.1467-985X.2004.00343.x>; Nowok, Raab, & Dibben, 2016, <doi:10.18637/jss.v074.i11>).
Maintained by Alexander Robitzsch. Last updated 28 days ago.
16 stars 9.16 score 542 scripts 9 dependentsbodkan
slendr:A Simulation Framework for Spatiotemporal Population Genetics
A framework for simulating spatially explicit genomic data which leverages real cartographic information for programmatic and visual encoding of spatiotemporal population dynamics on real geographic landscapes. Population genetic models are then automatically executed by the 'SLiM' software by Haller et al. (2019) <doi:10.1093/molbev/msy228> behind the scenes, using a custom built-in simulation 'SLiM' script. Additionally, fully abstract spatial models not tied to a specific geographic location are supported, and users can also simulate data from standard, non-spatial, random-mating models. These can be simulated either with the 'SLiM' built-in back-end script, or using an efficient coalescent population genetics simulator 'msprime' by Baumdicker et al. (2022) <doi:10.1093/genetics/iyab229> with a custom-built 'Python' script bundled with the R package. Simulated genomic data is saved in a tree-sequence format and can be loaded, manipulated, and summarised using tree-sequence functionality via an R interface to the 'Python' module 'tskit' by Kelleher et al. (2019) <doi:10.1038/s41588-019-0483-y>. Complete model configuration, simulation and analysis pipelines can be therefore constructed without a need to leave the R environment, eliminating friction between disparate tools for population genetic simulations and data analysis.
Maintained by Martin Petr. Last updated 15 hours ago.
56 stars 9.13 score 88 scriptsmalaria-atlas-project
malariaAtlas:An R Interface to Open-Access Malaria Data, Hosted by the 'Malaria Atlas Project'
A suite of tools to allow you to download all publicly available parasite rate survey points, mosquito occurrence points and raster surfaces from the 'Malaria Atlas Project' <> servers as well as utility functions for plotting the downloaded data.
Maintained by Mauricio van den Berg. Last updated 8 months ago.
44 stars 9.10 score 118 scripts 3 dependentsnickch-k
vtable:Variable Table for Variable Documentation
Automatically generates HTML variable documentation including variable names, labels, classes, value labels (if applicable), value ranges, and summary statistics. See the vignette "vtable" for a package overview.
Maintained by Nick Huntington-Klein. Last updated 3 months ago.
40 stars 9.10 score 1.2k scriptsbioc
sesame:SEnsible Step-wise Analysis of DNA MEthylation BeadChips
Tools for analyzing Illumina Infinium DNA methylation arrays. SeSAMe provides utilities to support analyses of multiple generations of Infinium DNA methylation BeadChips, including preprocessing, quality control, visualization and inference. SeSAMe features accurate detection calling, intelligent inference of ethnicity, sex and advanced quality control routines.
Maintained by Wanding Zhou. Last updated 3 months ago.
69 stars 9.08 score 258 scripts 1 dependentsehrlinger
ggRandomForests:Visually Exploring Random Forests
Graphic elements for exploring Random Forests using the 'randomForest' or 'randomForestSRC' package for survival, regression and classification forests and 'ggplot2' package plotting.
Maintained by John Ehrlinger. Last updated 8 days ago.
148 stars 9.07 score 197 scriptscmmr
rbiom:Read/Write, Analyze, and Visualize 'BIOM' Data
A toolkit for working with Biological Observation Matrix ('BIOM') files. Read/write all 'BIOM' formats. Compute rarefaction, alpha diversity, and beta diversity (including 'UniFrac'). Summarize counts by taxonomic level. Subset based on metadata. Generate visualizations and statistical analyses. CPU intensive operations are coded in C for speed.
Maintained by Daniel P. Smith. Last updated 11 days ago.
15 stars 9.07 score 117 scripts 6 dependentsbioc
BatchQC:Batch Effects Quality Control Software
Sequencing and microarray samples often are collected or processed in multiple batches or at different times. This often produces technical biases that can lead to incorrect results in the downstream analysis. BatchQC is a software tool that streamlines batch preprocessing and evaluation by providing interactive diagnostics, visualizations, and statistical analyses to explore the extent to which batch variation impacts the data. BatchQC diagnostics help determine whether batch adjustment needs to be done, and how correction should be applied before proceeding with a downstream analysis. Moreover, BatchQC interactively applies multiple common batch effect approaches to the data and the user can quickly see the benefits of each method. BatchQC is developed as a Shiny App. The output is organized into multiple tabs and each tab features an important part of the batch effect analysis and visualization of the data. The BatchQC interface has the following analysis groups: Summary, Differential Expression, Median Correlations, Heatmaps, Circular Dendrogram, PCA Analysis, Shape, ComBat and SVA.
Maintained by Jessica Anderson. Last updated 11 days ago.
7 stars 9.06 score 54 scriptsropensci
ijtiff:Comprehensive TIFF I/O with Full Support for 'ImageJ' TIFF Files
General purpose TIFF file I/O for R users. Currently the only such package with read and write support for TIFF files with floating point (real-numbered) pixels, and the only package that can correctly import TIFF files that were saved from 'ImageJ' and write TIFF files that can be correctly read by 'ImageJ' <>. Also supports text image I/O.
Maintained by Rory Nolan. Last updated 6 days ago.
18 stars 9.03 score 36 scripts 7 dependentseblondel
ows4R:Interface to OGC Web-Services (OWS)
Provides an Interface to Web-Services defined as standards by the Open Geospatial Consortium (OGC), including Web Feature Service (WFS) for vector data, Web Coverage Service (WCS), Catalogue Service (CSW) for ISO/OGC metadata, Web Processing Service (WPS) for data processes, and associated standards such as the common web-service specification (OWS) and OGC Filter Encoding. Partial support is provided for the Web Map Service (WMS). The purpose is to add support for additional OGC service standards such as Web Coverage Processing Service (WCPS), the Sensor Observation Service (SOS), or even newly emerging standard services such as OGC API or SensorThings.
Maintained by Emmanuel Blondel. Last updated 2 months ago.
38 stars 9.03 score 99 scripts 5 dependentsbioc
CARNIVAL:A CAusal Reasoning tool for Network Identification (from gene expression data) using Integer VALue programming
An upgraded causal reasoning tool from Melas et al. in R with updated assignments of TFs' weights from PROGENy scores. Optimization parameters can be freely adjusted and multiple solutions can be obtained and aggregated.
Maintained by Attila Gabor. Last updated 5 months ago.
57 stars 9.03 score 90 scripts 1 dependentsronkeizer
vpc:Create Visual Predictive Checks
Visual predictive checks are a commonly used diagnostic plot in pharmacometrics, showing how certain statistics (percentiles) for observed data compare to those same statistics for data simulated from a model. The package can generate VPCs for continuous, categorical, censored, and (repeated) time-to-event data.
Maintained by Ron Keizer. Last updated 10 months ago.
36 stars 9.01 score 318 scripts 11 dependentsts404
WikidataR:Read-Write API Client Library for Wikidata
Read from, interrogate, and write to Wikidata <> - the multilingual, interdisciplinary, semantic knowledgebase. Includes functions to: read from Wikidata (single items or properties); query Wikidata (retrieving all items that match a set of criteria via Wikidata SPARQL query service); write to Wikidata (adding new items or statements via QuickStatements); and handle and manipulate Wikidata objects (as lists and tibbles). Uses the Wikidata and QuickStatements APIs.
Maintained by Thomas Shafee. Last updated 2 months ago.
22 stars 9.01 score 109 scripts 28 dependentspecanproject
PEcAn.all:PEcAn Functions Used for Ecological Forecasts and Reanalysis
The Predictive Ecosystem Carbon Analyzer (PEcAn) is a scientific workflow management tool that is designed to simplify the management of model parameterization, execution, and analysis. The goal of PEcAn is to streamline the interaction between data and models, and to improve the efficacy of scientific investigation.
Maintained by David LeBauer. Last updated 2 days ago.
216 stars 9.00 score 266 scriptspecanproject
PEcAn.MAAT:PEcAn Package for Integration of the MAAT Model
This module provides functions to wrap the MAAT model into the PEcAn workflows.
Maintained by Shawn Serbin. Last updated 2 days ago.
216 stars 8.96 score 12 scriptssachaepskamp
bootnet:Bootstrap Methods for Various Network Estimation Routines
Bootstrap methods to assess accuracy and stability of estimated network structures and centrality indices <doi:10.3758/s13428-017-0862-1>. Allows for flexible specification of any undirected network estimation procedure in R, and offers default sets for various estimation routines.
Maintained by Sacha Epskamp. Last updated 5 months ago.
32 stars 8.94 score 155 scripts 3 dependentspecanproject
PEcAn.BIOCRO:PEcAn Package for Integration of the BioCro Model
This module provides functions to link BioCro to PEcAn.
Maintained by David LeBauer. Last updated 2 days ago.
216 stars 8.94 score 23 scriptspik-piam
remind2:The REMIND R package (2nd generation)
Contains the REMIND-specific routines for data and model output manipulation.
Maintained by Renato Rodrigues. Last updated 1 day ago.
8.87 score 161 scripts 5 dependentsropensci
nlrx:Setup, Run and Analyze 'NetLogo' Model Simulations from 'R' via 'XML'
Set up, run and analyze 'NetLogo' (<>) model simulations in 'R'. 'nlrx' experiments use a structure similar to that of 'NetLogo' Behavior Space experiments. However, 'nlrx' offers more flexibility and additional tools for running and analyzing complex simulation designs and sensitivity analyses. The user defines all information that is needed in an intuitive framework, using class objects. Experiments are submitted from 'R' to 'NetLogo' via 'XML' files that are dynamically written, based on specifications defined by the user. By nesting model calls in future environments, large simulation designs with many runs can be executed in parallel. This also enables simulating 'NetLogo' experiments on remote high performance computing machines. In order to use this package, 'Java' and 'NetLogo' (>= 5.3.1) need to be available on the executing system.
Maintained by Sebastian Hanss. Last updated 7 months ago.
78 stars 8.86 score 195 scriptsmattcowgill
readabs:Download and Tidy Time Series Data from the Australian Bureau of Statistics
Downloads, imports, and tidies time series data from the Australian Bureau of Statistics <>.
Maintained by Matt Cowgill. Last updated 27 days ago.
104 stars 8.85 score 180 scriptsatorus-research
xportr:Utilities to Output CDISC SDTM/ADaM XPT Files
Tools to build CDISC compliant data sets and check for CDISC compliance.
Maintained by Eli Miller. Last updated 3 months ago.
43 stars 8.84 score 102 scriptspecanproject
PEcAn.workflow:PEcAn Functions Used for Ecological Forecasts and Reanalysis
The Predictive Ecosystem Carbon Analyzer (PEcAn) is a scientific workflow management tool that is designed to simplify the management of model parameterization, execution, and analysis. The goal of PEcAn is to streamline the interaction between data and models, and to improve the efficacy of scientific investigation. This package provides workhorse functions that can be used to run the major steps of a PEcAn analysis.
Maintained by David LeBauer. Last updated 2 days ago.
216 stars 8.83 score 15 scripts 4 dependentsmountainmath
cansim:Accessing Statistics Canada Data Table and Vectors
Searches for, accesses, and retrieves Statistics Canada data tables, as well as individual vectors, as tidy data frames. This package enriches the tables with metadata, deals with encoding issues, allows for bilingual English or French language data retrieval, and bundles convenience functions to make it easier to work with retrieved table data. For more efficient data access the package allows for caching data in a local database and database level filtering, data manipulation and summarizing.
Maintained by Jens von Bergmann. Last updated 14 days ago.
45 stars 8.78 score 446 scriptsbioc
SeqVarTools:Tools for variant data
An interface to the fast-access storage format for VCF data provided in SeqArray, with tools for common operations and analysis.
Maintained by Stephanie M. Gogarten. Last updated 5 months ago.
3 stars 8.76 score 384 scripts 2 dependentsbioc
drawProteins:Package to Draw Protein Schematics from Uniprot API output
This package draws protein schematics from Uniprot API output. From the JSON returned by the GET command, it creates a dataframe from the Uniprot Features API. This dataframe can then be used by geoms based on ggplot2 and base R to draw protein schematics.
Maintained by Paul Brennan. Last updated 5 months ago.
34 stars 8.75 score 61 scripts 1 dependentspecanproject
PEcAn.ED2:PEcAn Package for Integration of ED2 Model
The Predictive Ecosystem Carbon Analyzer (PEcAn) is a scientific workflow management tool that is designed to simplify the management of model parameterization, execution, and analysis. The goal of PEcAn is to streamline the interaction between data and models, and to improve the efficacy of scientific investigation. This package provides functions to link the Ecosystem Demography Model, version 2, to PEcAn.
Maintained by Mike Dietze. Last updated 2 days ago.
216 stars 8.74 score 145 scriptsadokter
bioRad:Biological Analysis and Visualization of Weather Radar Data
Extract, visualize and summarize aerial movements of birds and insects from weather radar data. See Dokter, A. M. et al. (2018) "bioRad: biological analysis and visualization of weather radar data" <doi:10.1111/ecog.04028> for a software paper describing the package and its methodologies.
Maintained by Adriaan M. Dokter. Last updated 4 days ago.
29 stars 8.70 score 56 scriptsbioc
memes:motif matching, comparison, and de novo discovery using the MEME Suite
A seamless interface to the MEME Suite family of tools for motif analysis. 'memes' provides data aware utilities for using GRanges objects as entrypoints to motif analysis, data structures for examining & editing motif lists, and novel data visualizations. 'memes' functions and data structures are amenable to both base R and tidyverse workflows.
Maintained by Spencer Nystrom. Last updated 5 months ago.
50 stars 8.69 score 117 scripts 1 dependentsjinseob2kim
jsmodule:'RStudio' Addins and 'Shiny' Modules for Medical Research
'RStudio' addins and 'Shiny' modules for descriptive statistics, regression and survival analysis.
Maintained by Jinseob Kim. Last updated 11 days ago.
21 stars 8.69 score 61 scriptsbioc
miaViz:Microbiome Analysis Plotting and Visualization
The miaViz package implements functions to visualize TreeSummarizedExperiment objects especially in the context of microbiome analysis. Part of the mia family of R/Bioconductor packages.
Maintained by Tuomas Borman. Last updated 10 days ago.
10 stars 8.67 score 81 scripts 1 dependentsropensci
comtradr:Interface with the United Nations Comtrade API
Interface with and extract data from the United Nations 'Comtrade' API <>. 'Comtrade' provides country level shipping data for a variety of commodities. These functions allow for easy API queries and return data as a tidy data frame.
Maintained by Paul Bochtler. Last updated 4 months ago.
66 stars 8.67 score 70 scriptsr-box
boxr:Interface for the ' API'
An R interface for the remote file hosting service 'Box' (<>). In addition to uploading and downloading files, this package includes functions which mirror base R operations for local files, (e.g. box_load(), box_save(), box_read(), box_setwd(), etc.), as well as 'git' style functions for entire directories (e.g. box_fetch(), box_push()).
Maintained by Ian Lyttle. Last updated 12 months ago.
63 stars 8.65 score 238 scriptsbcgov
bcmaps:Map Layers and Spatial Utilities for British Columbia
Various layers of B.C., including administrative boundaries, natural resource management boundaries, census boundaries etc. All layers are available in BC Albers (<>) equal-area projection, which is the B.C. government standard. The layers are sourced from the British Columbia and Canadian government under open licenses, including B.C. Data Catalogue (<>), the Government of Canada Open Data Portal (<>), and Statistics Canada (<>).
Maintained by Andy Teucher. Last updated 3 months ago.
73 stars 8.65 score 254 scriptsropensci
traits:Species Trait Data from Around the Web
Species trait data from many different sources, including sequence data from 'NCBI' (<>), plant trait data from 'BETYdb', data from 'EOL' 'Traitbank', 'Birdlife' International, and more.
Maintained by David LeBauer. Last updated 2 months ago.
41 stars 8.65 score 82 scripts 11 dependentsprojectmosaic
mosaicCalc:R-Language Based Calculus Operations for Teaching
Software to support the introductory *MOSAIC Calculus* textbook (<>), one of many data- and modeling-oriented educational resources developed by Project MOSAIC (<>). Provides symbolic and numerical differentiation and integration, as well as support for applied linear algebra (for data science), and differential equations/dynamics. Includes grammar-of-graphics-based functions for drawing vector fields, trajectories, etc. The software is suitable for general use, but intended mainly for teaching calculus.
Maintained by Daniel Kaplan. Last updated 1 month ago.
13 stars 8.63 score 546 scriptscharlie86
spotifyr:R Wrapper for the 'Spotify' Web API
An R wrapper for pulling data from the 'Spotify' Web API <> in bulk, or posting items to a 'Spotify' user's playlist.
Maintained by Daniel Antal. Last updated 5 months ago.
375 stars 8.61 score 936 scriptsropensci
babette:Control 'BEAST2'
'BEAST2' (<>) is a widely used Bayesian phylogenetic tool, that uses DNA/RNA/protein data and many model priors to create a posterior of jointly estimated phylogenies and parameters. 'BEAST2' is commonly accompanied by 'BEAUti 2', 'Tracer' and 'DensiTree'. 'babette' provides for an alternative workflow of using all these tools separately. This allows doing complex Bayesian phylogenetics easily and reproducibly from 'R'.
Maintained by Richèl J.C. Bilderbeek. Last updated 23 hours ago.
45 stars 8.55 score 53 scripts 1 dependentsropensci
UCSCXenaTools:Download and Explore Datasets from UCSC Xena Data Hubs
Download and explore datasets from UCSC Xena data hubs, which are a collection of UCSC-hosted public databases such as TCGA, ICGC, TARGET, GTEx, CCLE, and others. Databases are normalized so they can be combined, linked, filtered, explored and downloaded.
Maintained by Shixiang Wang. Last updated 5 months ago.
106 stars 8.55 score 163 scripts 1 dependentsrundel
parsermd:Formal Parser and Related Tools for R Markdown Documents
An implementation of a formal grammar and parser for R Markdown documents using the Boost Spirit X3 library. It also includes a collection of high level functions for working with the resulting abstract syntax tree.
Maintained by Colin Rundel. Last updated 8 months ago.
84 stars 8.55 score 58 scripts 4 dependentsjpquast
protti:Bottom-Up Proteomics and LiP-MS Quality Control and Data Analysis Tools
Useful functions and workflows for proteomics quality control and data analysis of both limited proteolysis-coupled mass spectrometry (LiP-MS) (Feng et al. (2014) <doi:10.1038/nbt.2999>) and regular bottom-up proteomics experiments. Data generated with search tools such as 'Spectronaut', 'MaxQuant' and 'Proteome Discoverer' can be easily used due to the flexibility of the functions.
Maintained by Jan-Philipp Quast. Last updated 5 months ago.
63 stars 8.51 score 83 scriptsppbds
tutorial.helpers:Helper Functions for Creating Tutorials
Helper functions for creating, editing, and testing tutorials created with the 'learnr' package. Provides a simple method for allowing students to download their answers to tutorial questions. For examples of its use, see the 'r4ds.tutorials' package.
Maintained by David Kane. Last updated 12 days ago.
5 stars 8.50 score 152 scripts 1 dependentsopenbiox
UCSCXenaShiny:Interactive Analysis of UCSC Xena Data
Provides functions and a Shiny application for downloading, analyzing and visualizing datasets from UCSC Xena (<>), which is a collection of UCSC-hosted public databases such as TCGA, ICGC, TARGET, GTEx, CCLE, and others.
Maintained by Shixiang Wang. Last updated 4 months ago.
98 stars 8.47 score 35 scriptsisubirana
compareGroups:Descriptive Analysis by Groups
Create data summaries for quality control, extensive reports for exploring data, as well as publication-ready univariate or bivariate tables in several formats (plain text, HTML, LaTeX, PDF, Word or Excel). Create figures to quickly visualise the distribution of your data (boxplots, barplots, normality-plots, etc.). Display statistics (mean, median, frequencies, incidences, etc.). Perform the appropriate tests (t-test, Analysis of variance, Kruskal-Wallis, Fisher, log-rank, ...) depending on the nature of the described variable (normal, non-normal or qualitative). Summarize genetic data (Single Nucleotide Polymorphisms), displaying Allele Frequencies and performing Hardy-Weinberg Equilibrium tests, among other typical statistics and tests for this kind of data.
Maintained by Isaac Subirana. Last updated 1 month ago.
36 stars 8.46 score 396 scripts 1 dependentsropensci
weathercan:Download Weather Data from Environment and Climate Change Canada
Provides means for downloading historical weather data from the Environment and Climate Change Canada website (<>). Data can be downloaded from multiple stations and over large date ranges and automatically processed into a single dataset. Tools are also provided to identify stations either by name or proximity to a location.
Maintained by Steffi LaZerte. Last updated 4 days ago.
106 stars 8.45 score 189 scriptsbioc
lefser:R implementation of the LEfSE method for microbiome biomarker discovery
lefser is the R implementation of the popular microbiome biomarker discovery tool, LEfSe. It uses the Kruskal-Wallis test, Wilcoxon rank-sum test, and Linear Discriminant Analysis to find biomarkers from two-level classes (and optional sub-classes).
Maintained by Sehyun Oh. Last updated 1 month ago.
56 stars 8.44 score 56 scriptsbioc
sccomp:Tests differences in cell-type proportion for single-cell data, robust to outliers
A robust and outlier-aware method for testing differences in cell-type proportion in single-cell data. This model can infer changes in tissue composition and heterogeneity, and can produce realistic data simulations based on any existing dataset. This model can also transfer knowledge from a large set of integrated datasets to increase accuracy further.
Maintained by Stefano Mangiola. Last updated 14 days ago.
99 stars 8.43 score 69 scriptsdavidhodge931
ggblanket:Simplify 'ggplot2' Visualisation
Simplify 'ggplot2' visualisation with 'ggblanket' wrapper functions.
Maintained by David Hodge. Last updated 10 days ago.
173 stars 8.42 score 45 scriptseblondel
zen4R:Interface to 'Zenodo' REST API
Provides an Interface to 'Zenodo' (<>) REST API, including management of depositions, attribution of DOIs by 'Zenodo' and upload and download of files.
Maintained by Emmanuel Blondel. Last updated 29 days ago.
46 stars 8.41 score 76 scripts 1 dependentstheharmonylab
topics:Creating and Significance Testing Language Features for Visualisation
Implements differential language analysis with statistical tests and offers various language visualization techniques for n-grams and topics. It also supports the 'text' package. For more information, visit <> and <>.
Maintained by Oscar Kjell. Last updated 3 days ago.
5 stars 8.38 score 22 scripts 2 dependentsnlmixr2
nlmixr2:Nonlinear Mixed Effects Models in Population PK/PD
Fit and compare nonlinear mixed-effects models in differential equations with flexible dosing information commonly seen in pharmacokinetics and pharmacodynamics (Almquist, Leander, and Jirstrand 2015 <doi:10.1007/s10928-015-9409-1>). Differential equation solving is by compiled C code provided in the 'rxode2' package (Wang, Hallow, and James 2015 <doi:10.1002/psp4.12052>).
Maintained by Matthew Fidler. Last updated 1 month ago.
52 stars 8.38 score 120 scripts 3 dependentsmucollective
multiverse:Create 'multiverse analysis' in R
Implement 'multiverse' style analyses (Steegen S., Tuerlinckx F, Gelman A., Vanpaemal, W., 2016) <doi:10.1177/1745691616658637> to show the robustness of statistical inference. 'Multiverse analysis' is a philosophy of statistical reporting where paper authors report the outcomes of many different statistical analyses in order to show how fragile or robust their findings are. The 'multiverse' package (Sarma A., Kale A., Moon M., Taback N., Chevalier F., Hullman J., Kay M., 2021) <doi:10.31219/> allows users to concisely and flexibly implement 'multiverse-style' analysis, which involve declaring alternate ways of performing an analysis step, in R and R Notebooks.
Maintained by Abhraneel Sarma. Last updated 4 months ago.
62 stars 8.37 score 42 scriptspecanproject
PEcAn.SIPNET:PEcAn Functions Used for Ecological Forecasts and Reanalysis
The Predictive Ecosystem Carbon Analyzer (PEcAn) is a scientific workflow management tool that is designed to simplify the management of model parameterization, execution, and analysis. The goal of PEcAn is to streamline the interaction between data and models, and to improve the efficacy of scientific investigation.
Maintained by Mike Dietze. Last updated 2 days ago.
216 stars 8.36 score 61 scriptswallaceecomod
wallace:A Modular Platform for Reproducible Modeling of Species Niches and Distributions
The 'shiny' application Wallace is a modular platform for reproducible modeling of species niches and distributions. Wallace guides users through a complete analysis, from the acquisition of species occurrence and environmental data to visualizing model predictions on an interactive map, thus bundling complex workflows into a single, streamlined interface. An extensive vignette, which guides users through most package functionality can be found on the package's GitHub Pages website: <>.
Maintained by Mary E. Blair. Last updated 22 days ago.
133 stars 8.36 score 96 scriptspecanproject
PEcAn.LINKAGES:PEcAn Package for Integration of the LINKAGES Model
This module provides functions to link the LINKAGES model to PEcAn.
Maintained by Ann Raiho. Last updated 2 days ago.
216 stars 8.35 score 59 scriptsalishinski
lavaanPlot:Path Diagrams for 'Lavaan' Models via 'DiagrammeR'
Plots path diagrams from models in 'lavaan' using the plotting functionality from the 'DiagrammeR' package. 'DiagrammeR' provides nice path diagrams via 'Graphviz', and these functions make it easy to generate these diagrams from a 'lavaan' path model without having to write the DOT language graph specification.
Maintained by Alex Lishinski. Last updated 1 year ago.
40 stars 8.33 score 294 scriptsbusiness-science
modeltime.ensemble:Ensemble Algorithms for Time Series Forecasting with Modeltime
A 'modeltime' extension that implements time series ensemble forecasting methods including model averaging, weighted averaging, and stacking. These techniques are popular methods to improve forecast accuracy and stability.
Maintained by Matt Dancho. Last updated 8 months ago.
77 stars 8.30 score 143 scriptsrubenarslan
codebook:Automatic Codebooks from Metadata Encoded in Dataset Attributes
Easily automate the following tasks to describe data frames: Summarise the distributions, and labelled missings of variables graphically and using descriptive statistics. For surveys, compute and summarise reliabilities (internal consistencies, retest, multilevel) for psychological scales. Combine this information with metadata (such as item labels and labelled values) that is derived from R attributes. To do so, the package relies on 'rmarkdown' partials, so you can generate HTML, PDF, and Word documents. Codebooks are also available as tables (CSV, Excel, etc.) and in JSON-LD, so that search engines can find your data and index the metadata. The metadata are also available at your fingertips via RStudio Addins.
Maintained by Ruben Arslan. Last updated 3 months ago.
143 stars 8.29 score 229 scriptsbioc
crisprDesign:Comprehensive design of CRISPR gRNAs for nucleases and base editors
Provides a comprehensive suite of functions to design and annotate CRISPR guide RNA (gRNA) sequences. This includes on- and off-target search, on-target efficiency scoring, off-target scoring, full gene and TSS contextual annotations, and SNP annotation (human only). It currently supports five types of CRISPR modalities (modes of perturbations): CRISPR knockout, CRISPR activation, CRISPR inhibition, CRISPR base editing, and CRISPR knockdown. All types of CRISPR nucleases are supported, including DNA- and RNA-target nucleases such as Cas9, Cas12a, and Cas13d. All types of base editors are also supported. gRNA design can be performed on reference genomes, transcriptomes, and custom DNA and RNA sequences. Both unpaired and paired gRNA designs are enabled.
Maintained by Jean-Philippe Fortin. Last updated 24 days ago.
22 stars 8.28 score 80 scripts 3 dependentsdbosak01
libr:Libraries, Data Dictionaries, and a Data Step for R
Contains a set of functions to create data libraries, generate data dictionaries, and simulate a data step. The libname() function will load a directory of data into a library in one line of code. The dictionary() function will generate data dictionaries for individual data frames or an entire library. And the datastep() function will perform row-by-row data processing.
Maintained by David Bosak. Last updated 3 months ago.
27 stars 8.27 score 48 scripts 2 dependentspik-piam
quitte:Bits and pieces of code to use with quitte-style data frames
A collection of functions for easily dealing with quitte-style data frames, doing multi-model comparisons and plots.
Maintained by Falk Benke. Last updated 4 days ago.
8.26 score 184 scripts 35 dependentsropenspain
climaemet:Climate AEMET Tools
Tools to download the climatic data of the Spanish Meteorological Agency (AEMET) directly from R using their API and create scientific graphs (climate charts, trend analysis of climate time series, temperature and precipitation anomalies maps, warming stripes graphics, climatograms, etc.).
Maintained by Diego Hernangómez. Last updated 4 days ago.
42 stars 8.25 score 59 scriptsradiant-rstats
radiant.data:Data Menu for Radiant: Business Analytics using R and Shiny
The Radiant Data menu includes interfaces for loading, saving, viewing, visualizing, summarizing, transforming, and combining data. It also contains functionality to generate reproducible reports of the analyses conducted in the application.
Maintained by Vincent Nijs. Last updated 5 months ago.
53 stars 8.25 score 146 scripts 6 dependentsopenvolley
datavolley:Reading and Analyzing DataVolley Scout Files
Provides functions for parsing and working with volleyball match files in DataVolley format.
Maintained by Ben Raymond. Last updated 2 months ago.
31 stars 8.24 score 94 scripts 11 dependentsrobjhyndman
demography:Forecasting Mortality, Fertility, Migration and Population Data
Functions for demographic analysis including lifetable calculations; Lee-Carter modelling; functional data analysis of mortality rates, fertility rates, net migration numbers; and stochastic population forecasting.
Maintained by Rob Hyndman. Last updated 4 months ago.
74 stars 8.21 score 241 scripts 6 dependentsnceas
metajam:Easily Download Data and Metadata from 'DataONE'
A set of tools to foster the development of reproducible analytical workflows by simplifying the download of data and metadata from 'DataONE' and easily importing this information into R.
Maintained by Julien Brun. Last updated 7 months ago.
16 stars 8.21 score 75 scriptsdarwin-eu
DrugUtilisation:Summarise Patient-Level Drug Utilisation in Data Mapped to the OMOP Common Data Model
Summarise patient-level drug utilisation cohorts using data mapped to the Observational Medical Outcomes Partnership (OMOP) common data model. New-user and prevalent-user cohorts can be generated, and their characteristics, indication and drug use summarised.
Maintained by Martí Català. Last updated 2 months ago.
8.20 score 156 scripts 2 dependentssafetygraphics
safetyGraphics:Interactive Graphics for Monitoring Clinical Trial Safety
A framework for evaluation of clinical trial safety. Users can interactively explore their data using the included 'Shiny' application.
Maintained by Jeremy Wildfire. Last updated 2 years ago.
99 stars 8.19 score 111 scriptsjonesor
Rage:Life History Metrics from Matrix Population Models
Functions for calculating life history metrics using matrix population models ('MPMs'). Described in Jones et al. (2021) <doi:10.1101/2021.04.26.441330>.
Maintained by Owen Jones. Last updated 3 months ago.
12 stars 8.18 score 62 scripts 1 dependentsrcannood
SCORPIUS:Inferring Developmental Chronologies from Single-Cell RNA Sequencing Data
An accurate and easy tool for performing linear trajectory inference on single cells using single-cell RNA sequencing data. In addition, 'SCORPIUS' provides functions for discovering the most important genes with respect to the reconstructed trajectory, as well as nice visualisation tools. Cannoodt et al. (2016) <doi:10.1101/079509>.
Maintained by Robrecht Cannoodt. Last updated 2 years ago.
59 stars 8.17 score 126 scriptseblondel
geometa:Tools for Reading and Writing ISO/OGC Geographic Metadata
Provides facilities to read, write and validate geographic metadata defined with ISO TC211 / OGC ISO geographic information metadata standards, and encoded using the ISO 19139 and ISO 19115-3 (XML) standard technical specifications. This includes ISO 19110 (Feature cataloguing), 19115 (dataset metadata), 19119 (service metadata) and 19136 (GML). Other interoperable schemas from the OGC are progressively supported as well, such as the Sensor Web Enablement (SWE) Common Data Model, the OGC GML Coverage Implementation Schema (GMLCOV), or the OGC GML Referenceable Grid (GMLRGRID).
Maintained by Emmanuel Blondel. Last updated 5 days ago.
47 stars 8.16 score 109 scripts 7 dependentsrobjhyndman
cricketdata:International Cricket Data
Data on international and other major cricket matches from ESPNCricinfo and Cricsheet. This package provides some functions to download the data into tibbles ready for analysis.
Maintained by Rob Hyndman. Last updated 5 days ago.
88 stars 8.14 score 87 scriptsalinetalhouk
diceR:Diverse Cluster Ensemble in R
Performs cluster analysis using an ensemble clustering framework, Chiu & Talhouk (2018) <doi:10.1186/s12859-017-1996-y>. Results from a diverse set of algorithms are pooled together using methods such as majority voting, K-Modes, LinkCluE, and CSPA. There are options to compare cluster assignments across algorithms using internal and external indices, visualizations such as heatmaps, and significance testing for the existence of clusters.
Maintained by Derek Chiu. Last updated 2 months ago.
37 stars 8.13 score 60 scripts 3 dependentspecanproject
PEcAnAssimSequential:PEcAn Functions Used for Ecological Forecasts and Reanalysis
The Predictive Ecosystem Carbon Analyzer (PEcAn) is a scientific workflow management tool that is designed to simplify the management of model parameterization, execution, and analysis. The goal of PEcAn is to streamline the interaction between data and models, and to improve the efficacy of scientific investigation.
Maintained by Mike Dietze. Last updated 2 days ago.
216 stars 8.12 score 35 scriptsmazamascience
MazamaSpatialUtils:Spatial Data Download and Utility Functions
A suite of conversion functions to create internally standardized spatial polygons data frames. Utility functions use these data sets to return values such as country, state, time zone, watershed, etc. associated with a set of longitude/latitude pairs. (They also make cool maps.)
Maintained by Jonathan Callahan. Last updated 5 months ago.
5 stars 8.09 score 282 scripts 2 dependentschop-cgtinformatics
REDCapTidieR:Extract 'REDCap' Databases into Tidy 'Tibble's
Convert 'REDCap' exports into tidy tables for easy handling of 'REDCap' repeat instruments and event arms.
Maintained by Richard Hanna. Last updated 9 days ago.
35 stars 8.08 score 36 scriptsgfellerlab
SuperCell:Simplification of scRNA-seq data by merging together similar cells
Aggregates large single-cell data into a metacell dataset by merging the gene expression of very similar cells.
Maintained by The package maintainer. Last updated 8 months ago.
72 stars 8.08 score 93 scripts