R-universe search: needs:openssl

r-lib

httr:Tools for Working with URLs and HTTP

Useful tools for working with HTTP organised by HTTP verbs (GET(), POST(), etc). Configuration functions make it easy to control additional request components (authenticate(), add_headers() and so on).

Maintained by Hadley Wickham. Last updated 1 year ago.

api curl http

989 stars 20.56 score 29k scripts 4.3k dependents

tidyverse

tidyverse:Easily Install and Load the 'Tidyverse'

The 'tidyverse' is a set of packages that work in harmony because they share common data representations and 'API' design. This package is designed to make it easy to install and load multiple 'tidyverse' packages in a single step. Learn more about the 'tidyverse' at <https://www.tidyverse.org>.

Maintained by Hadley Wickham. Last updated 5 months ago.

data-science tidyverse

1.7k stars 20.23 score 664k scripts 125 dependents

r-lib

devtools:Tools to Make Developing R Packages Easier

Collection of package development tools.

Maintained by Jennifer Bryan. Last updated 6 months ago.

package-creation

2.4k stars 19.55 score 51k scripts 150 dependents

plotly

plotly:Create Interactive Web Graphics via 'plotly.js'

Create interactive web graphics from 'ggplot2' graphs and/or a custom interface to the (MIT-licensed) JavaScript library 'plotly.js' inspired by the grammar of graphics.

Maintained by Carson Sievert. Last updated 3 months ago.

d3js data-visualization ggplot2 javascript plotly shiny webgl

2.6k stars 19.43 score 93k scripts 797 dependents

tidyverse

rvest:Easily Harvest (Scrape) Web Pages

Wrappers around the 'xml2' and 'httr' packages to make it easy to download, then manipulate, HTML and XML.

Maintained by Hadley Wickham. Last updated 5 months ago.

html web-scraping

1.5k stars 19.33 score 29k scripts 549 dependents

r-lib

pkgdown:Make Static HTML Documentation for a Package

Generate an attractive and useful website from a source package. 'pkgdown' converts your documentation, vignettes, 'README', and more to 'HTML' making it easy to share information about your package online.

Maintained by Hadley Wickham. Last updated 10 days ago.

documentation-tool

734 stars 18.46 score 588 scripts 162 dependents

bioc

Biostrings:Efficient manipulation of biological strings

Memory efficient string containers, string matching algorithms, and other utilities, for fast manipulation of large biological sequences or sets of sequences.

Maintained by Hervé Pagès. Last updated 1 month ago.

sequencematching alignment sequencing genetics dataimport datarepresentation infrastructure bioconductor-package core-package

62 stars 17.77 score 8.6k scripts 1.2k dependents

bioc

GenomicRanges:Representation and manipulation of genomic intervals

The ability to efficiently represent and manipulate genomic annotations and alignments is playing a central role when it comes to analyzing high-throughput sequencing data (a.k.a. NGS data). The GenomicRanges package defines general purpose containers for storing and manipulating genomic intervals and variables defined along a genome. More specialized containers for representing and manipulating short alignments against a reference genome, or a matrix-like summarization of an experiment, are defined in the GenomicAlignments and SummarizedExperiment packages, respectively. Both packages build on top of the GenomicRanges infrastructure.

Maintained by Hervé Pagès. Last updated 4 months ago.

genetics infrastructure datarepresentation sequencing annotation genomeannotation coverage bioconductor-package core-package

44 stars 17.68 score 13k scripts 1.3k dependents

r-lib

httr2:Perform HTTP Requests and Process the Responses

Tools for creating and modifying HTTP requests, then performing them and processing the results. 'httr2' is a modern re-imagining of 'httr' that uses a pipe-based interface and solves more of the problems that API wrapping packages face.

Maintained by Hadley Wickham. Last updated 2 days ago.

http

246 stars 17.64 score 1.9k scripts 1.1k dependents

r-lib

usethis:Automate Package and Project Setup

Automate package and project setup tasks that are otherwise performed manually. This includes setting up unit testing, test coverage, continuous integration, Git, 'GitHub', licenses, 'Rcpp', 'RStudio' projects, and more.

Maintained by Jennifer Bryan. Last updated 24 days ago.

github setup

869 stars 17.54 score 5.6k scripts 336 dependents

davidgohel

flextable:Functions for Tabular Reporting

Use a grammar for creating and customizing pretty tables. The following formats are supported: 'HTML', 'PDF', 'RTF', 'Microsoft Word', 'Microsoft PowerPoint' and R 'Grid Graphics'. 'R Markdown', 'Quarto' and the package 'officer' can be used to produce the result files. The syntax is the same for the user regardless of the type of output to be produced. A set of functions allows the creation, definition of cell arrangement, addition of headers or footers, formatting and definition of cell content with text and/or images. The package also offers a set of high-level functions that allow tabular reporting of statistical models and the creation of complex cross tabulations.

Maintained by David Gohel. Last updated 6 days ago.

docx html5 ms-office-documents rmarkdown table

582 stars 17.09 score 7.3k scripts 124 dependents

bioc

clusterProfiler:A universal enrichment tool for interpreting omics data

This package supports functional characteristics of both coding and non-coding genomics data for thousands of species with up-to-date gene annotation. It provides a universal interface for gene functional annotation from a variety of sources and thus can be applied in diverse scenarios. It provides a tidy interface to access, manipulate, and visualize enrichment results to help users achieve efficient data interpretation. Datasets obtained from multiple treatments and time points can be analyzed and compared in a single run, easily revealing functional consensus and differences among distinct conditions.

Maintained by Guangchuang Yu. Last updated 4 months ago.

annotation clustering genesetenrichment go kegg multiplecomparison pathways reactome visualization enrichment-analysis gsea

1.1k stars 17.03 score 11k scripts 48 dependents

satijalab

Seurat:Tools for Single Cell Genomics

A toolkit for quality control, analysis, and exploration of single cell RNA sequencing data. 'Seurat' aims to enable users to identify and interpret sources of heterogeneity from single cell transcriptomic measurements, and to integrate diverse types of single cell data. See Satija R, Farrell J, Gennert D, et al (2015) <doi:10.1038/nbt.3192>, Macosko E, Basu A, Satija R, et al (2015) <doi:10.1016/j.cell.2015.05.002>, Stuart T, Butler A, et al (2019) <doi:10.1016/j.cell.2019.05.031>, and Hao, Hao, et al (2020) <doi:10.1101/2020.10.12.335331> for more details.

Maintained by Paul Hoffman. Last updated 1 year ago.

human-cell-atlas single-cell-genomics single-cell-rna-seq cpp

2.4k stars 16.86 score 50k scripts 73 dependents

bioc

SummarizedExperiment:A container (S4 class) for matrix-like assays

The SummarizedExperiment container contains one or more assays, each represented by a matrix-like object of numeric or other mode. The rows typically represent genomic ranges of interest and the columns represent samples.

Maintained by Hervé Pagès. Last updated 5 months ago.

genetics infrastructure sequencing annotation coverage genomeannotation bioconductor-package core-package

34 stars 16.84 score 8.6k scripts 1.2k dependents

andrisignorell

DescTools:Tools for Descriptive Statistics

A collection of miscellaneous basic statistic functions and convenience wrappers for efficiently describing data. The author's intention was to create a toolbox that facilitates the (notoriously time consuming) first descriptive tasks in data analysis: calculating descriptive statistics, drawing graphical summaries and reporting the results. The package also contains functions to produce documents using MS Word (or PowerPoint) and functions to import data from Excel. Many of the included functions can be found scattered in other packages and other sources written partly by Titans of R. The reason for collecting them here was primarily to have them consolidated in ONE instead of dozens of packages (which themselves might depend on other packages which are not needed at all), and to provide a common and consistent interface as far as function and argument naming, NA handling, recycling rules etc. are concerned. Google style guides were used as naming rules (in absence of convincing alternatives). The 'BigCamelCase' style was consequently applied to functions borrowed from contributed R packages as well.

Maintained by Andri Signorell. Last updated 20 hours ago.

fortran cpp

86 stars 16.73 score 7.7k scripts 101 dependents

bioc

GenomeInfoDb:Utilities for manipulating chromosome names, including modifying them to follow a particular naming style

Contains data and functions that define and allow translation between different chromosome sequence naming conventions (e.g., "chr1" versus "1"), including a function that attempts to place sequence names in their natural, rather than lexicographic, order.

Maintained by Hervé Pagès. Last updated 2 months ago.

genetics datarepresentation annotation genomeannotation bioconductor-package core-package

32 stars 16.32 score 1.3k scripts 1.7k dependents

bioc

DESeq2:Differential gene expression analysis based on the negative binomial distribution

Estimate variance-mean dependence in count data from high-throughput sequencing assays and test for differential expression based on a model using the negative binomial distribution.

Maintained by Michael Love. Last updated 24 days ago.

sequencing rnaseq chipseq geneexpression transcription normalization differentialexpression bayesian regression principalcomponent clustering immunooncology openblas cpp

375 stars 16.11 score 17k scripts 115 dependents

bioc

biomaRt:Interface to BioMart databases (i.e. Ensembl)

In recent years a wealth of biological data has become available in public data repositories. Easy access to these valuable data resources and firm integration with data analysis is needed for comprehensive bioinformatics data analysis. biomaRt provides an interface to a growing collection of databases implementing the BioMart software suite (<http://www.biomart.org>). The package enables retrieval of large amounts of data in a uniform way without the need to know the underlying database schemas or write complex SQL queries. The most prominent examples of BioMart databases are maintained by Ensembl, which provides biomaRt users direct access to a diverse set of data and enables a wide range of powerful online queries from gene annotation to database mining.

Maintained by Mike Smith. Last updated 15 days ago.

annotation bioconductor biomart ensembl

38 stars 15.99 score 13k scripts 230 dependents

davidgohel

officer:Manipulation of Microsoft Word and PowerPoint Documents

Access and manipulate 'Microsoft Word', 'RTF' and 'Microsoft PowerPoint' documents from R. The package focuses on tabular and graphical reporting from R; it also provides two functions that let users get document content into data objects. A set of functions lets users add and remove images, tables and paragraphs of text in new or existing documents. The package does not require any installation of Microsoft products to be able to write Microsoft files.

Maintained by David Gohel. Last updated 6 days ago.

ms-office-documents powerpoint word

629 stars 15.85 score 4.1k scripts 142 dependents

bioc

enrichplot:Visualization of Functional Enrichment Result

The 'enrichplot' package implements several visualization methods for interpreting functional enrichment results obtained from ORA or GSEA analysis. It is mainly designed to work with the 'clusterProfiler' package suite. All the visualization methods are developed based on 'ggplot2' graphics.

Maintained by Guangchuang Yu. Last updated 3 months ago.

annotation genesetenrichment go kegg pathways software visualization enrichment-analysis pathway-analysis

239 stars 15.71 score 3.1k scripts 58 dependents

r-lib

gh:'GitHub' 'API'

Minimal client to access the 'GitHub' 'API'.

Maintained by Gábor Csárdi. Last updated 2 months ago.

github github-api

224 stars 15.55 score 444 scripts 401 dependents

ropensci

rnaturalearth:World Map Data from Natural Earth

Facilitates mapping by making natural earth map data from <https://www.naturalearthdata.com/> more easily available to R users.

Maintained by Philippe Massicotte. Last updated 13 days ago.

peer-reviewed

234 stars 15.51 score 7.2k scripts 47 dependents

bioc

Rsamtools:Binary alignment (BAM), FASTA, variant call (BCF), and tabix file import

This package provides an interface to the 'samtools', 'bcftools', and 'tabix' utilities for manipulating SAM (Sequence Alignment / Map), FASTA, binary variant call (BCF) and compressed indexed tab-delimited (tabix) files.

Maintained by Bioconductor Package Maintainer. Last updated 4 months ago.

dataimport sequencing coverage alignment qualitycontrol bioconductor-package core-package curl bzip2 xz-utils zlib cpp

28 stars 15.34 score 3.2k scripts 569 dependents

bioc

GenomicFeatures:Query the gene models of a given organism/assembly

Extract the genomic locations of genes, transcripts, exons, introns, and CDS, for the gene models stored in a TxDb object. A TxDb object is a small database that contains the gene models of a given organism/assembly. Bioconductor provides a small collection of TxDb objects in the form of ready-to-install TxDb packages for the most commonly studied organisms. Additionally, the user can easily make a TxDb object (or package) for the organism/assembly of their choice by using the tools from the txdbmaker package.

Maintained by H. Pagès. Last updated 5 months ago.

genetics infrastructure annotation sequencing genomeannotation bioconductor-package core-package

26 stars 15.34 score 5.3k scripts 339 dependents

r-lib

covr:Test Coverage for Packages

Track and report code coverage for your package and (optionally) upload the results to a coverage service like 'Codecov' <https://about.codecov.io> or 'Coveralls' <https://coveralls.io>. Code coverage is a measure of the amount of code being exercised by a set of tests. It is an indirect measure of test quality and completeness. This package is compatible with any testing methodology or framework and tracks coverage of both R code and compiled C/C++/FORTRAN code.

Maintained by Jim Hester. Last updated 2 months ago.

codecov coverage coverage-report travis-ci

337 stars 15.25 score 2.3k scripts 9 dependents

bioc

GenomicAlignments:Representation and manipulation of short genomic alignments

Provides efficient containers for storing and manipulating short genomic alignments (typically obtained by aligning short reads to a reference genome). This includes read counting, computing the coverage, junction detection, and working with the nucleotide content of the alignments.

Maintained by Hervé Pagès. Last updated 5 months ago.

infrastructure dataimport genetics sequencing rnaseq snp coverage alignment immunooncology bioconductor-package core-package

10 stars 15.21 score 3.1k scripts 528 dependents

sparklyr

sparklyr:R Interface to Apache Spark

R interface to Apache Spark, a fast and general engine for big data processing, see <https://spark.apache.org/>. This package supports connecting to local and remote Apache Spark clusters, provides a 'dplyr' compatible back-end, and provides an interface to Spark's built-in machine learning algorithms.

Maintained by Edgar Ruiz. Last updated 11 days ago.

apache-spark distributed dplyr ide livy machine-learning remote-clusters spark sparklyr

959 stars 15.20 score 4.0k scripts 21 dependents

bioc

AnnotationDbi:Manipulation of SQLite-based annotations in Bioconductor

Implements a user-friendly interface for querying SQLite-based annotation data packages.

Maintained by Bioconductor Package Maintainer. Last updated 5 months ago.

annotation microarray sequencing genomeannotation bioconductor-package core-package

9 stars 15.05 score 3.6k scripts 769 dependents

bioc

DOSE:Disease Ontology Semantic and Enrichment analysis

This package implements five methods proposed by Resnik, Schlicker, Jiang, Lin and Wang respectively for measuring semantic similarities among DO terms and gene products. Enrichment analyses including hypergeometric model and gene set enrichment analysis are also implemented for discovering disease associations of high-throughput biological data.

Maintained by Guangchuang Yu. Last updated 5 months ago.

annotation visualization multiplecomparison genesetenrichment pathways software disease-ontology enrichment-analysis semantic-similarity

119 stars 14.97 score 2.0k scripts 61 dependents

tidyverse

googledrive:An Interface to Google Drive

Manage Google Drive files from R.

Maintained by Jennifer Bryan. Last updated 8 months ago.

google-drive

329 stars 14.97 score 2.1k scripts 164 dependents

bioc

MultiAssayExperiment:Software for the integration of multi-omics experiments in Bioconductor

Harmonize data management of multiple experimental assays performed on an overlapping set of specimens. It provides a familiar Bioconductor user experience by extending concepts from SummarizedExperiment, supporting an open-ended mix of standard data classes for individual assays, and allowing subsetting by genomic ranges or rownames. Facilities are provided for reshaping data into wide and long formats for adaptability to graphing and downstream analysis.

Maintained by Marcel Ramos. Last updated 2 months ago.

infrastructure datarepresentation bioconductor bioconductor-package genomics nci-itcr tcga u24ca289073

71 stars 14.95 score 670 scripts 127 dependents

r-lib

gargle:Utilities for Working with Google APIs

Provides utilities for working with Google APIs <https://developers.google.com/apis-explorer>. This includes functions and classes for handling common credential types and for preparing, executing, and processing HTTP requests.

Maintained by Jennifer Bryan. Last updated 2 years ago.

authentication google

113 stars 14.90 score 266 scripts 192 dependents

rstudio

rsconnect:Deploy Docs, Apps, and APIs to 'Posit Connect', 'shinyapps.io', and 'RPubs'

Programmatic deployment interface for 'RPubs', 'shinyapps.io', and 'Posit Connect'. Supported content types include R Markdown documents, Shiny applications, Plumber APIs, plots, and static web content.

Maintained by Aron Atkins. Last updated 29 days ago.

139 stars 14.90 score 3.1k scripts 6 dependents

ropensci

gert:Simple Git Client for R

Simple git client for R based on 'libgit2' <https://libgit2.org> with support for SSH and HTTPS remotes. All functions in 'gert' use basic R data types (such as vectors and data-frames) for their arguments and return values. User credentials are shared with command line 'git' through the git-credential store and ssh keys stored on disk or ssh-agent.

Maintained by Jeroen Ooms. Last updated 4 days ago.

libgit2

154 stars 14.85 score 158 scripts 367 dependents

bioc

GSVA:Gene Set Variation Analysis for Microarray and RNA-Seq Data

Gene Set Variation Analysis (GSVA) is a non-parametric, unsupervised method for estimating variation of gene set enrichment through the samples of an expression data set. GSVA performs a change in coordinate systems, transforming the data from a gene by sample matrix to a gene-set by sample matrix, thereby allowing the evaluation of pathway enrichment for each sample. This new matrix of GSVA enrichment scores facilitates applying standard analytical methods like functional enrichment, survival analysis, clustering, CNV-pathway analysis or cross-tissue pathway analysis, in a pathway-centric manner.

Maintained by Robert Castelo. Last updated 8 days ago.

functionalgenomics microarray rnaseq pathways genesetenrichment gene-set-enrichment genomics pathway-enrichment-analysis

212 stars 14.74 score 1.6k scripts 19 dependents

florianhartig

DHARMa:Residual Diagnostics for Hierarchical (Multi-Level / Mixed) Regression Models

The 'DHARMa' package uses a simulation-based approach to create readily interpretable scaled (quantile) residuals for fitted (generalized) linear mixed models. Currently supported are linear and generalized linear (mixed) models from 'lme4' (classes 'lmerMod', 'glmerMod'), 'glmmTMB', 'GLMMadaptive', and 'spaMM'; phylogenetic linear models from 'phylolm' (classes 'phylolm' and 'phyloglm'); generalized additive models ('gam' from 'mgcv'); 'glm' (including 'negbin' from 'MASS', but excluding quasi-distributions) and 'lm' model classes. Moreover, externally created simulations, e.g. posterior predictive simulations from Bayesian software such as 'JAGS', 'STAN', or 'BUGS' can be processed as well. The resulting residuals are standardized to values between 0 and 1 and can be interpreted as intuitively as residuals from a linear regression. The package also provides a number of plot and test functions for typical model misspecification problems, such as over/underdispersion, zero-inflation, and residual spatial, phylogenetic and temporal autocorrelation.

Maintained by Florian Hartig. Last updated 25 days ago.

glmm regression regression-diagnostics residual

226 stars 14.74 score 2.8k scripts 10 dependents

tidyverse

googlesheets4:Access Google Sheets using the Sheets API V4

Interact with Google Sheets through the Sheets API v4 <https://developers.google.com/sheets/api>. "API" is an acronym for "application programming interface"; the Sheets API allows users to interact with Google Sheets programmatically, instead of via a web browser. The "v4" refers to the fact that the Sheets API is currently at version 4. This package can read and write both the metadata and the cell data in a Sheet.

Maintained by Jennifer Bryan. Last updated 8 months ago.

google-drive google-sheets spreadsheet

363 stars 14.55 score 7.0k scripts 144 dependents

ropensci

osmdata:Import 'OpenStreetMap' Data as Simple Features or Spatial Objects

Download and import of 'OpenStreetMap' ('OSM') data as 'sf' or 'sp' objects. 'OSM' data are extracted from the 'Overpass' web server (<https://overpass-api.de/>) and processed with very fast 'C++' routines for return to 'R'.

Maintained by Mark Padgham. Last updated 1 month ago.

openstreetmap overpass-api osm osm-data peer-reviewed cpp

322 stars 14.53 score 2.8k scripts 14 dependents

bioc

TCGAbiolinks:TCGAbiolinks: An R/Bioconductor package for integrative analysis with GDC data

The aim of TCGAbiolinks is to: i) facilitate the GDC open-access data retrieval, ii) prepare the data using the appropriate pre-processing strategies, iii) provide the means to carry out different standard analyses and iv) easily reproduce earlier research results. In more detail, the package provides multiple methods for analysis (e.g., differential expression analysis, identifying differentially methylated regions) and methods for visualization (e.g., survival plots, volcano plots, starburst plots) in order to easily develop complete analysis pipelines.

Maintained by Tiago Chedraoui Silva. Last updated 1 month ago.

dnamethylation differentialmethylation generegulation geneexpression methylationarray differentialexpression pathways network sequencing survival software bioc bioconductor gdc integrative-analysis tcga tcga-data tcgabiolinks

310 stars 14.47 score 1.6k scripts 6 dependents

bioc

xcms:LC-MS and GC-MS Data Analysis

Framework for processing and visualization of chromatographically separated and single-spectra mass spectral data. Imports from AIA/ANDI NetCDF, mzXML, mzData and mzML files. Preprocesses data for high-throughput, untargeted analyte profiling.

Maintained by Steffen Neumann. Last updated 15 days ago.

immunooncology massspectrometry metabolomics bioconductor feature-detection mass-spectrometry peak-detection cpp

196 stars 14.31 score 984 scripts 11 dependents

talgalili

heatmaply:Interactive Cluster Heat Maps Using 'plotly' and 'ggplot2'

Create interactive cluster 'heatmaps' that can be saved as a stand-alone HTML file, embedded in 'R Markdown' documents or in a 'Shiny' app, and viewed in the 'RStudio' viewer pane. Hover the mouse pointer over a cell to show details or drag a rectangle to zoom. A 'heatmap' is a popular graphical method for visualizing high-dimensional data, in which a table of numbers is encoded as a grid of colored cells. The rows and columns of the matrix are ordered to highlight patterns and are often accompanied by 'dendrograms'. 'Heatmaps' are used in many fields for visualizing observations, correlations, missing-value patterns, and more. Interactive 'heatmaps' allow the inspection of a specific value by hovering the mouse over a cell, as well as zooming into a region of the 'heatmap' by dragging a rectangle around the relevant area. This work is based on the 'ggplot2' and 'plotly.js' engines. It produces 'heatmaps' similar to those of 'heatmap.2', with the advantages of speed ('plotly.js' is able to handle larger matrices), the ability to zoom from the 'dendrogram' panes, and the placing of factor variables at the sides of the 'heatmap'.

Maintained by Tal Galili. Last updated 9 months ago.

d3-heatmap dendextend dendrogram ggplot2 heatmap plotly

386 stars 14.21 score 2.0k scripts 45 dependents

business-science

timetk:A Tool Kit for Working with Time Series

Easy visualization, wrangling, and feature engineering of time series data for forecasting and machine learning prediction. Consolidates and extends time series functionality from packages including 'dplyr', 'stats', 'xts', 'forecast', 'slider', 'padr', 'recipes', and 'rsample'.

Maintained by Matt Dancho. Last updated 1 year ago.

coercion coercion-functions data-mining dplyr forecast forecasting forecasting-models machine-learning series-decomposition series-signature tibble tidy tidyquant tidyverse time time-series timeseries

626 stars 14.20 score 4.0k scripts 16 dependents

rstudio

pins:Pin, Discover, and Share Resources

Publish data sets, models, and other R objects, making it easy to share them across projects and with your colleagues. You can pin objects to a variety of "boards", including local folders (to share on a networked drive or with 'DropBox'), 'Posit Connect', 'AWS S3', and more.

Maintained by Julia Silge. Last updated 2 months ago.

azure gcloud rpins rsconnect s3 storage

321 stars 14.17 score 1.9k scripts 17 dependents

dkahle

ggmap:Spatial Visualization with ggplot2

A collection of functions to visualize spatial data and models on top of static maps from various online sources (e.g., Google Maps and Stamen Maps). It includes tools common to those tasks, including functions for geolocation and routing.

Maintained by David Kahle. Last updated 1 year ago.

770 stars 14.17 score 12k scripts 31 dependents

doi-usgs

dataRetrieval:Retrieval Functions for USGS and EPA Hydrology and Water Quality Data

Collection of functions to help retrieve U.S. Geological Survey and U.S. Environmental Protection Agency water quality and hydrology data from web services. Data are discovered from National Water Information System <https://waterservices.usgs.gov/> and <https://waterdata.usgs.gov/nwis>. Water quality data are obtained from the Water Quality Portal <https://www.waterqualitydata.us/>.

Maintained by Laura DeCicco. Last updated 3 days ago.

usgs

286 stars 14.16 score 1.7k scripts 15 dependents

bioc

GOSemSim:GO-terms Semantic Similarity Measures

The semantic comparisons of Gene Ontology (GO) annotations provide quantitative ways to compute similarities between genes and gene groups, and have become an important basis for many bioinformatics analysis approaches. GOSemSim is an R package for semantic similarity computation among GO terms, sets of GO terms, gene products and gene clusters. GOSemSim implements five methods proposed by Resnik, Schlicker, Jiang, Lin and Wang respectively.

Maintained by Guangchuang Yu. Last updated 5 months ago.

annotation go clustering pathways network software bioinformatics gene-ontology semantic-similarity cpp

63 stars 14.12 score 708 scripts 68 dependents

bioc

BSgenome:Software infrastructure for efficient representation of full genomes and their SNPs

Infrastructure shared by all the Biostrings-based genome data packages.

Maintained by Hervé Pagès. Last updated 2 months ago.

genetics infrastructure datarepresentation sequencematching annotation snp bioconductor-package core-package

9 stars 14.12 score 1.2k scripts 267 dependents

leifeld

texreg:Conversion of R Regression Output to LaTeX or HTML Tables

Converts coefficients, standard errors, significance stars, and goodness-of-fit statistics of statistical models into LaTeX tables or HTML tables/MS Word documents or to nicely formatted screen output for the R console for easy model comparison. A list of several models can be combined in a single table. The output is highly customizable. New model types can be easily implemented. Details can be found in Leifeld (2013), JStatSoft <doi:10.18637/jss.v055.i08>.

Maintained by Philip Leifeld. Last updated 3 months ago.

html-tables latex latex-tables regression reporting table texreg

113 stars 14.09 score 1.8k scripts 67 dependents

bioc

ensembldb:Utilities to create and use Ensembl-based annotation databases

The package provides functions to create and use transcript-centric annotation databases/packages. The annotation for the databases is fetched directly from Ensembl using their Perl API. The functionality and data are similar to those of the TxDb packages from the GenomicFeatures package, but, in addition to retrieving all gene/transcript models and annotations from the database, ensembldb provides a filter framework that allows retrieving annotations for specific entries, such as genes encoded on a chromosome region or transcript models of lincRNA genes. EnsDb databases built with ensembldb also contain protein annotations and mappings between proteins and their encoding transcripts. Finally, ensembldb provides functions to map between genomic, transcript and protein coordinates.

Maintained by Johannes Rainer. Last updated 5 months ago.

genetics annotationdata sequencing coverage annotation bioconductor bioconductor-packages ensembl

35 stars 14.08 score 892 scripts 108 dependents

walkerke

tidycensus:Load US Census Boundary and Attribute Data as 'tidyverse' and 'sf'-Ready Data Frames

An integrated R interface to several United States Census Bureau APIs (<https://www.census.gov/data/developers/data-sets.html>) and the US Census Bureau's geographic boundary files. Allows R users to return Census and ACS data as tidyverse-ready data frames, and optionally returns a list-column with feature geometry for mapping and spatial analysis.

Maintained by Kyle Walker. Last updated 2 months ago.

648 stars 14.02 score 7.5k scripts 10 dependents

bioc

phyloseq:Handling and analysis of high-throughput microbiome census data

phyloseq provides a set of classes and tools to facilitate the import, storage, analysis, and graphical display of microbiome census data.

Maintained by Paul J. McMurdie. Last updated 5 months ago.

immunooncology sequencing microbiome metagenomics clustering classification multiplecomparison geneticvariability

597 stars 13.90 score 8.4k scripts 37 dependents

bioc

AnnotationHub:Client to access AnnotationHub resources

This package provides a client for the Bioconductor AnnotationHub web resource. The AnnotationHub web resource provides a central location where genomic files (e.g., VCF, bed, wig) and other resources from standard locations (e.g., UCSC, Ensembl) can be discovered. The resource includes metadata about each resource, e.g., a textual description, tags, and date of modification. The client creates and manages a local cache of files retrieved by the user, helping with quick and reproducible access.

Maintained by Bioconductor Package Maintainer. Last updated 5 months ago.

infrastructure dataimport gui thirdpartyclient core-package u24ca289073

17 stars 13.88 score 2.7k scripts 104 dependents

config-i1

smooth:Forecasting Using State Space Models

Functions implementing Single Source of Error state space models for purposes of time series analysis and forecasting. The package includes ADAM (Svetunkov, 2023, <https://openforecast.org/adam/>), Exponential Smoothing (Hyndman et al., 2008, <doi: 10.1007/978-3-540-71918-2>), SARIMA (Svetunkov & Boylan, 2019 <doi: 10.1080/00207543.2019.1600764>), Complex Exponential Smoothing (Svetunkov & Kourentzes, 2018, <doi: 10.13140/RG.2.2.24986.29123>), Simple Moving Average (Svetunkov & Petropoulos, 2018 <doi: 10.1080/00207543.2017.1380326>) and several simulation functions. It also allows dealing with intermittent demand based on the iETS framework (Svetunkov & Boylan, 2019, <doi: 10.13140/RG.2.2.35897.06242>).

Maintained by Ivan Svetunkov. Last updated 12 days ago.

arima arima-forecasting ces ets exponential-smoothing forecast state-space time-series openblas cpp

90 stars 13.83 score 412 scripts 25 dependents

bioc

BiocFileCache:Manage Files Across Sessions

This package creates a persistent on-disk cache of files that the user can add, update, and retrieve. It is useful for managing resources (such as custom Txdb objects) that are costly or difficult to create, web resources, and data files used across sessions.

Maintained by Lori Shepherd. Last updated 2 months ago.

dataimport core-package u24ca289073

13 stars 13.76 score 486 scripts 436 dependents

ropensci

rentrez:'Entrez' in R

Provides an R interface to the NCBI's 'EUtils' API, allowing users to search databases like 'GenBank' <https://www.ncbi.nlm.nih.gov/genbank/> and 'PubMed' <https://pubmed.ncbi.nlm.nih.gov/>, process the results of those searches and pull data into their R sessions.

Maintained by David Winter. Last updated 4 years ago.

199 stars 13.72 score 784 scripts 98 dependents

ropensci

taxize:Taxonomic Information from Around the Web

Interacts with a suite of web application programming interfaces (API) for taxonomic tasks, such as getting database specific taxonomic identifiers, verifying species names, getting taxonomic hierarchies, fetching downstream and upstream taxonomic names, getting taxonomic synonyms, converting scientific to common names and vice versa, and more. Some of the services supported include 'NCBI E-utilities' (<https://www.ncbi.nlm.nih.gov/books/NBK25501/>), 'Encyclopedia of Life' (<https://eol.org/docs/what-is-eol/data-services>), 'Global Biodiversity Information Facility' (<https://techdocs.gbif.org/en/openapi/>), and many more. Links to the API documentation for other supported services are available in the documentation for their respective functions in this package.

Maintained by Zachary Foster. Last updated 25 days ago.

taxonomy biology nomenclature json api web api-client identifiers species names api-wrapper biodiversity darwincore data taxize

274 stars 13.63 score 1.6k scripts 23 dependents

bioc

SingleCellExperiment:S4 Classes for Single Cell Data

Defines an S4 class for storing data from single-cell experiments. This includes specialized methods to store and retrieve spike-in information, dimensionality reduction coordinates and size factors for each cell, along with the usual metadata for genes and libraries.

Maintained by Davide Risso. Last updated 22 days ago.

immunooncology datarepresentation dataimport infrastructure singlecell

13.53 score 15k scripts 285 dependents

bioc

KEGGREST:Client-side REST access to the Kyoto Encyclopedia of Genes and Genomes (KEGG)

A package that provides a client interface to the Kyoto Encyclopedia of Genes and Genomes (KEGG) REST API. Only for academic use by academic users belonging to academic institutions (see <https://www.kegg.jp/kegg/rest/>). Note that KEGGREST is based on KEGGSOAP by J. Zhang, R. Gentleman, and Marc Carlson, and KEGG (python package) by Aurelien Mazurie.

Maintained by Bioconductor Package Maintainer. Last updated 5 months ago.

annotation pathways thirdpartyclient kegg bioconductor-package core-package

10 stars 13.50 score 688 scripts 771 dependents

bioc

GEOquery:Get data from NCBI Gene Expression Omnibus (GEO)

The NCBI Gene Expression Omnibus (GEO) is a public repository of microarray data. Given the rich and varied nature of this resource, it is only natural to want to apply Bioconductor tools to these data. GEOquery is the bridge between GEO and Bioconductor.

Maintained by Sean Davis. Last updated 5 months ago.

microarray dataimport onechannel twochannel sage bioconductor bioinformatics data-science genomics ncbi-geo

93 stars 13.48 score 4.1k scripts 45 dependents

bioc

RCy3:Functions to Access and Control Cytoscape

Visualize, analyze and explore networks using Cytoscape via R. Anything you can do using the graphical user interface of Cytoscape, you can now do with a single RCy3 function.

Maintained by Alex Pico. Last updated 2 days ago.

visualization graphandnetwork thirdpartyclient network

52 stars 13.47 score 628 scripts 17 dependents

ropensci

RSelenium:R Bindings for 'Selenium WebDriver'

Provides a set of R bindings for the 'Selenium 2.0 WebDriver' (see <https://www.selenium.dev/documentation/> for more information) using the 'JsonWireProtocol' (see <https://github.com/SeleniumHQ/selenium/wiki/JsonWireProtocol> for more information). 'Selenium 2.0 WebDriver' allows driving a web browser natively, as a user would, either locally or on a remote machine using the Selenium server; it marks a leap forward in terms of web browser automation. Selenium automates web browsers (commonly referred to as browsers). Using RSelenium you can automate browsers locally or remotely.

Maintained by Jonathan Völkle. Last updated 2 years ago.

rselenium selenium webdriver

344 stars 13.38 score 1.9k scripts 12 dependents

business-science

tidyquant:Tidy Quantitative Financial Analysis

Bringing business and financial analysis to the 'tidyverse'. The 'tidyquant' package provides a convenient wrapper to various 'xts', 'zoo', 'quantmod', 'TTR' and 'PerformanceAnalytics' package functions and returns the objects in the tidy 'tibble' format. The main advantage is being able to use quantitative functions with the 'tidyverse' functions including 'purrr', 'dplyr', 'tidyr', 'ggplot2', 'lubridate', etc. See the 'tidyquant' website for more information, documentation and examples.

Maintained by Matt Dancho. Last updated 1 month ago.

dplyr financial-analysis financial-data financial-statements multiple-stocks performance-analysis performanceanalytics quantmod stock stock-exchanges stock-indexes stock-lists stock-performance stock-prices stock-symbol tidyverse time-series timeseries xts

872 stars 13.34 score 5.2k scripts

ropensci

rgbif:Interface to the Global Biodiversity Information Facility API

A programmatic interface to the Web Service methods provided by the Global Biodiversity Information Facility (GBIF; <https://www.gbif.org/developer/summary>). GBIF is a database of species occurrence records from sources all over the globe. rgbif includes functions for searching for taxonomic names, retrieving information on data providers, getting species occurrence records, getting counts of occurrence records, and using the GBIF tile map service to make rasters summarizing huge amounts of data.

Maintained by John Waller. Last updated 16 days ago.

gbif specimens api web-services occurrences species taxonomy biodiversity data lifewatch oscibio spocc

161 stars 13.26 score 2.1k scripts 20 dependents

bioc

dada2:Accurate, high-resolution sample inference from amplicon sequencing data

The dada2 package infers exact amplicon sequence variants (ASVs) from high-throughput amplicon sequencing data, replacing the coarser and less accurate OTU clustering approach. The dada2 pipeline takes as input demultiplexed fastq files, and outputs the sequence variants and their sample-wise abundances after removing substitution and chimera errors. Taxonomic classification is available via a native implementation of the RDP naive Bayesian classifier, and species-level assignment to 16S rRNA gene fragments by exact matching.

Maintained by Benjamin Callahan. Last updated 5 months ago.

immunooncology microbiome sequencing classification metagenomics amplicon bioconductor bioinformatics metabarcoding taxonomy cpp

487 stars 13.17 score 3.0k scripts 4 dependents

bioc

scran:Methods for Single-Cell RNA-Seq Data Analysis

Implements miscellaneous functions for interpretation of single-cell RNA-seq data. Methods are provided for assignment of cell cycle phase, detection of highly variable and significantly correlated genes, identification of marker genes, and other common tasks in routine single-cell analysis workflows.

Maintained by Aaron Lun. Last updated 5 months ago.

immunooncology normalization sequencing rnaseq software geneexpression transcriptomics singlecell clustering bioconductor-package human-cell-atlas single-cell-rna-seq openblas cpp

41 stars 13.05 score 7.6k scripts 37 dependents

bioc

Gviz:Plotting data and annotation information along genomic coordinates

Genomic data analysis requires integrated visualization of known genomic information and new experimental data. Gviz uses the biomaRt and the rtracklayer packages to perform live annotation queries to Ensembl and UCSC and translates this to e.g. gene/transcript structures in viewports of the grid graphics package. This results in genomic information plotted together with your data.

Maintained by Robert Ivanek. Last updated 5 months ago.

visualization microarray sequencing

79 stars 13.05 score 1.4k scripts 46 dependents

bioc

ChIPseeker:ChIPseeker for ChIP peak Annotation, Comparison, and Visualization

This package implements functions to retrieve the nearest genes around the peak, annotate the genomic region of the peak, apply statistical methods for estimating the significance of overlap among ChIP peak data sets, and incorporate the GEO database so that users can compare their own dataset with those deposited in the database. The comparison can be used to infer cooperative regulation and thus can be used to generate hypotheses. Several visualization functions are implemented to summarize the coverage of the peak experiment, average profile and heatmap of peaks binding to TSS regions, genomic annotation, distance to TSS, and overlap of peaks or genes.

Maintained by Guangchuang Yu. Last updated 5 months ago.

annotation chipseq software visualization multiplecomparison atac-seq chip-seq comparison epigenetics epigenomics

233 stars 13.05 score 1.6k scripts 5 dependents

ropensci

piggyback:Managing Larger Data on a GitHub Repository

Helps store files as GitHub release assets, which is a convenient way for large/binary data files to piggyback onto public and private GitHub repositories. Includes functions for file downloads, uploads, and managing releases via the GitHub API.

Maintained by Carl Boettiger. Last updated 4 months ago.

data-store git-lfs peer-reviewed

187 stars 12.98 score 187 scripts 12 dependents

tidyverse

ellmer:Chat with Large Language Models

Chat with large language models from a range of providers including 'Claude' <https://claude.ai>, 'OpenAI' <https://chatgpt.com>, and more. Supports streaming, asynchronous calls, tool calling, and structured data extraction.

Maintained by Hadley Wickham. Last updated 18 hours ago.

407 stars 12.94 score 98 scripts 8 dependents

michaelhallquist

MplusAutomation:An R Package for Facilitating Large-Scale Latent Variable Analyses in Mplus

Leverages the R language to automate latent variable model estimation and interpretation using 'Mplus', a powerful latent variable modeling program developed by Muthen and Muthen (<https://www.statmodel.com>). Specifically, this package provides routines for creating related groups of models, running batches of models, and extracting and tabulating model parameters and fit statistics.

Maintained by Michael Hallquist. Last updated 4 days ago.

86 stars 12.92 score 664 scripts 13 dependents

walkerke

tigris:Load Census TIGER/Line Shapefiles

Download TIGER/Line shapefiles from the United States Census Bureau (<https://www.census.gov/geographies/mapping-files/time-series/geo/tiger-line-file.html>) and load into R as 'sf' objects.

Maintained by Kyle Walker. Last updated 5 months ago.

331 stars 12.87 score 5.3k scripts 16 dependents

rinterface

bs4Dash:A 'Bootstrap 4' Version of 'shinydashboard'

Make 'Bootstrap 4' Shiny dashboards. Use the full power of 'AdminLTE3', a dashboard template built on top of 'Bootstrap 4' <https://github.com/ColorlibHQ/AdminLTE>.

Maintained by David Granjon. Last updated 7 months ago.

bootstrap4 dashboard-templates hacktoberfest2022 shiny shiny-apps shinydashboard

442 stars 12.87 score 1.2k scripts 15 dependents

bioc

iSEE:Interactive SummarizedExperiment Explorer

Create an interactive Shiny-based graphical user interface for exploring data stored in SummarizedExperiment objects, including row- and column-level metadata. The interface supports transmission of selections between plots and tables, code tracking, interactive tours, interactive or programmatic initialization, preservation of app state, and extensibility to new panel types via S4 classes. Special attention is given to single-cell data in a SingleCellExperiment object with visualization of dimensionality reduction results.

Maintained by Kevin Rue-Albrecht. Last updated 23 days ago.

cellbasedassays clustering dimensionreduction featureextraction geneexpression gui immunooncology shinyapps singlecell transcription transcriptomics visualization dimension-reduction feature-extraction gene-expression hacktoberfest human-cell-atlas shiny single-cell

225 stars 12.86 score 380 scripts 9 dependents

markedmondson1234

googleAuthR:Authenticate and Create Google APIs

Create R functions that interact with OAuth2 Google APIs <https://developers.google.com/apis-explorer/> easily, with auto-refresh and Shiny compatibility.

Maintained by Erik Grönroos. Last updated 10 months ago.

api authentication google googleauthr oauth2-flow shiny

178 stars 12.85 score 804 scripts 13 dependents

bioc

SingleR:Reference-Based Single-Cell RNA-Seq Annotation

Performs unbiased cell type recognition from single-cell RNA sequencing data, by leveraging reference transcriptomic datasets of pure cell types to infer the cell of origin of each single cell independently.

Maintained by Aaron Lun. Last updated 1 month ago.

software singlecell geneexpression transcriptomics classification clustering annotation bioconductor singler cpp

184 stars 12.83 score 2.1k scripts 2 dependents

bioc

minfi:Analyze Illumina Infinium DNA methylation arrays

Tools to analyze & visualize Illumina Infinium methylation arrays.

Maintained by Kasper Daniel Hansen. Last updated 4 months ago.

immunooncology dnamethylation differentialmethylation epigenetics microarray methylationarray multichannel twochannel dataimport normalization preprocessing qualitycontrol

60 stars 12.82 score 996 scripts 27 dependents

tkonopka

umap:Uniform Manifold Approximation and Projection

Uniform manifold approximation and projection is a technique for dimension reduction. The algorithm was described by McInnes and Healy (2018) in <arXiv:1802.03426>. This package provides an interface for two implementations. One is written from scratch, including components for nearest-neighbor search and for embedding. The second implementation is a wrapper for 'python' package 'umap-learn' (requires separate installation, see vignette for more details).

Maintained by Tomasz Konopka. Last updated 11 months ago.

dimensionality-reduction umap cpp

132 stars 12.82 score 3.6k scripts 45 dependents

bioc

MSnbase:Base Functions and Classes for Mass Spectrometry and Proteomics

MSnbase provides infrastructure for manipulation, processing and visualisation of mass spectrometry and proteomics data, ranging from raw to quantitative and annotated data.

Maintained by Laurent Gatto. Last updated 15 days ago.

immunooncology infrastructure proteomics massspectrometry qualitycontrol dataimport bioconductor bioinformatics mass-spectrometry proteomics-data visualisation cpp

131 stars 12.76 score 772 scripts 36 dependents

bioc

plyranges:A fluent interface for manipulating GenomicRanges

A dplyr-like interface for interacting with the common Bioconductor classes Ranges and GenomicRanges. By providing a grammatical and consistent way of manipulating these classes, their accessibility for new Bioconductor users is hopefully increased.

Maintained by Michael Love. Last updated 10 days ago.

infrastructure datarepresentation workflowstep coverage bioconductor data-analysis dplyr genomic-ranges genomics tidy-data

144 stars 12.66 score 1.9k scripts 20 dependents

bioc

rtracklayer:R interface to genome annotation files and the UCSC genome browser

Extensible framework for interacting with multiple genome browsers (currently UCSC built-in) and manipulating annotation tracks in various formats (currently GFF, BED, bedGraph, BED15, WIG, BigWig and 2bit built-in). The user may export/import tracks to/from the supported browsers, as well as query and modify the browser state, such as the current viewport.

Maintained by Michael Lawrence. Last updated 4 days ago.

annotation visualization dataimport zlib openssl curl

12.66 score 6.7k scripts 480 dependents

insightsengineering

teal:Exploratory Web Apps for Analyzing Clinical Trials Data

A 'shiny' based interactive exploration framework for analyzing clinical trials data. 'teal' currently provides a dynamic filtering facility and different data viewers. 'teal' 'shiny' applications are built using standard 'shiny' modules.

Maintained by Dawid Kaledkowski. Last updated 1 month ago.

clinical-trials nest shiny webapp

206 stars 12.65 score 176 scripts 5 dependents

bioc

SpatialExperiment:S4 Class for Spatially Resolved -omics Data

Defines an S4 class for storing data from spatial -omics experiments. The class extends SingleCellExperiment to support storage and retrieval of additional information from spot-based and molecule-based platforms, including spatial coordinates, images, and image metadata. A specialized constructor function is included for data from the 10x Genomics Visium platform.

Maintained by Dario Righelli. Last updated 5 months ago.

datarepresentation dataimport infrastructure immunooncology geneexpression transcriptomics singlecell spatial

59 stars 12.63 score 1.8k scripts 71 dependents

hrbrmstr

ggalt:Extra Coordinate Systems, 'Geoms', Statistical Transformations, Scales and Fonts for 'ggplot2'

A compendium of new geometries, coordinate systems, statistical transformations, scales and fonts for 'ggplot2', including splines, 1d and 2d densities, univariate average shifted histograms, a new map coordinate system based on the 'PROJ.4'-library along with geom_cartogram() that mimics the original functionality of geom_map(), formatters for "bytes", a stat_stepribbon() function, increased 'plotly' compatibility and the 'StateFace' open source font 'ProPublica'. Further new functionality includes lollipop charts, dumbbell charts, the ability to encircle points and coordinate-system-based text annotations.

Maintained by Bob Rudis. Last updated 2 years ago.

geom ggplot-extension ggplot2 ggplot2-geom ggplot2-scales

676 stars 12.60 score 2.3k scripts 7 dependents

massimoaria

bibliometrix:Comprehensive Science Mapping Analysis

Tool for quantitative research in scientometrics and bibliometrics. It implements the comprehensive workflow for science mapping analysis proposed in Aria M. and Cuccurullo C. (2017) <doi:10.1016/j.joi.2017.08.007>. 'bibliometrix' provides various routines for importing bibliographic data from 'SCOPUS', 'Clarivate Analytics Web of Science' (<https://www.webofknowledge.com/>), 'Digital Science Dimensions' (<https://www.dimensions.ai/>), 'OpenAlex' (<https://openalex.org/>), 'Cochrane Library' (<https://www.cochranelibrary.com/>), 'Lens' (<https://lens.org>), and 'PubMed' (<https://pubmed.ncbi.nlm.nih.gov/>) databases, performing bibliometric analysis and building networks for co-citation, coupling, scientific collaboration and co-word analysis.

Maintained by Massimo Aria. Last updated 10 days ago.

bibliometric-analysis bibliometrics citation citation-network citations co-authors co-occurence co-word-analysis correspondence-analysis coupling isi-web journal manuscript quantitative-analysis scholars science science-mapping scientific scientometrics scopus

545 stars 12.54 score 518 scripts 2 dependents

bioc

microbiome:Microbiome Analytics

Utilities for microbiome analysis.

Maintained by Leo Lahti. Last updated 5 months ago.

metagenomics microbiome sequencing systemsbiology hitchip hitchip-atlas human-microbiome microbiology microbiome-analysis phyloseq population-study

293 stars 12.51 score 2.0k scripts 5 dependents

r-dbi

bigrquery:An Interface to Google's 'BigQuery' 'API'

Easily talk to Google's 'BigQuery' database from R.

Maintained by Hadley Wickham. Last updated 1 month ago.

bigquery database cpp

520 stars 12.47 score 1.8k scripts 4 dependents

cloudyr

aws.s3:'AWS S3' Client Package

A simple client package for the Amazon Web Services ('AWS') Simple Storage Service ('S3') 'REST' 'API' <https://aws.amazon.com/s3/>.

Maintained by Simon Urbanek. Last updated 5 years ago.

amazon aws aws-s3 cloudyr s3 s3-storage

383 stars 12.47 score 1.4k scripts 17 dependents

r-lib

credentials:Tools for Managing SSH and Git Credentials

Set up and retrieve HTTPS and SSH credentials for use with 'git' and other services. For HTTPS remotes the package interfaces the 'git-credential' utility which 'git' uses to store HTTP usernames and passwords. For SSH remotes we provide convenient functions to find or generate appropriate SSH keys. The package both helps the user to set up a local git installation, and also provides a back-end for git/ssh client libraries to authenticate with existing user credentials.

Maintained by Jeroen Ooms. Last updated 6 months ago.

git password ssh

72 stars 12.40 score 91 scripts 380 dependents

bioc

scDblFinder:scDblFinder

The scDblFinder package gathers various methods for the detection and handling of doublets/multiplets in single-cell sequencing data (i.e. multiple cells captured within the same droplet or reaction volume). It includes methods formerly found in the scran package, the new fast and comprehensive scDblFinder method, and a reimplementation of the Amulet detection method for single-cell ATAC-seq.

Maintained by Pierre-Luc Germain. Last updated 9 days ago.

preprocessing singlecell rnaseq atacseq doublets single-cell

184 stars 12.38 score 888 scripts 1 dependent

bioc

TFBSTools:Software Package for Transcription Factor Binding Site (TFBS) Analysis

TFBSTools is a package for the analysis and manipulation of transcription factor binding sites. It includes matrix conversion between Position Frequency Matrix (PFM), Position Weight Matrix (PWM) and Information Content Matrix (ICM). It can also scan putative TFBS from sequence/alignment, query the JASPAR database, and provides a wrapper of de novo motif discovery software.

Maintained by Ge Tan. Last updated 17 days ago.

motifannotation generegulation motifdiscovery transcription alignment

28 stars 12.36 score 1.1k scripts 18 dependents

ouhscbbmc

REDCapR:Interaction Between R and REDCap

Encapsulates functions to streamline calls from R to the REDCap API. REDCap (Research Electronic Data CAPture) is a web application for building and managing online surveys and databases developed at Vanderbilt University. The Application Programming Interface (API) offers an avenue to access and modify data programmatically, improving the capacity for literate and reproducible programming.

Maintained by Will Beasley. Last updated 3 months ago.

redcap redcap-api

118 stars 12.36 score 438 scripts 6 dependents

ropensci

stplanr:Sustainable Transport Planning

Tools for transport planning with an emphasis on spatial transport data and non-motorized modes. The package was originally developed to support the 'Propensity to Cycle Tool', a publicly available strategic cycle network planning tool (Lovelace et al. 2017) <doi:10.5198/jtlu.2016.862>, but has since been extended to support public transport routing and accessibility analysis (Moreno-Monroy et al. 2017) <doi:10.1016/j.jtrangeo.2017.08.012> and routing with locally hosted routing engines such as 'OSRM' (Lowans et al. 2023) <doi:10.1016/j.enconman.2023.117337>. The main functions are for creating and manipulating geographic "desire lines" from origin-destination (OD) data (building on the 'od' package); calculating routes on the transport network locally and via interfaces to routing services such as <https://cyclestreets.net/> (Desjardins et al. 2021) <doi:10.1007/s11116-021-10197-1>; and calculating route segment attributes such as bearing. The package implements the 'travel flow aggregation' method described in Morgan and Lovelace (2020) <doi:10.1177/2399808320942779> and the 'OD jittering' method described in Lovelace et al. (2022) <doi:10.32866/001c.33873>. Further information on the package's aim and scope can be found in the vignettes and in a paper in the R Journal (Lovelace and Ellison 2018) <doi:10.32614/RJ-2018-053>, and in a paper outlining the landscape of open source software for geographic methods in transport planning (Lovelace, 2021) <doi:10.1007/s10109-020-00342-2>.

Maintained by Robin Lovelace. Last updated 7 months ago.

cycle cycling desire-lines origin-destination peer-reviewed pubic-transport route-network routes routing spatial transport transport-planning transportation walking

427 stars 12.31 score 684 scripts 3 dependents

r-lib

keyring:Access the System Credential Store from R

Platform independent 'API' to access the operating system's credential store. Currently supports: 'Keychain' on 'macOS', Credential Store on 'Windows', the Secret Service 'API' on 'Linux', and simple, platform independent stores implemented with environment variables or encrypted files. Additional storage back-ends can be added easily.

Maintained by Gábor Csárdi. Last updated 27 days ago.

keyring security libsecret glib

198 stars 12.29 score 976 scripts 56 dependents

bioc

bsseq:Analyze, manage and store whole-genome methylation data

A collection of tools for analyzing and visualizing whole-genome methylation data from sequencing. This includes whole-genome bisulfite sequencing and Oxford nanopore data.

Maintained by Kasper Daniel Hansen. Last updated 3 months ago.

dnamethylation cpp

37 stars 12.26 score 676 scripts 15 dependents

bioc

ReactomePA:Reactome Pathway Analysis

This package provides functions for pathway analysis based on REACTOME pathway database. It implements enrichment analysis, gene set enrichment analysis and several functions for visualization. This package is not affiliated with the Reactome team.

Maintained by Guangchuang Yu. Last updated 5 months ago.

pathways visualization annotation multiplecomparison genesetenrichment reactome enrichment-analysis reactome-pathway-analysis reactomepa

40 stars 12.25 score 1.5k scripts 7 dependents

bioc

ggbio:Visualization tools for genomic data

The ggbio package extends and specializes the grammar of graphics for biological data. The graphics are designed to answer common scientific questions, in particular those often asked of high throughput genomics data. All core Bioconductor data structures are supported, where appropriate. The package supports detailed views of particular genomic regions, as well as genome-wide overviews. Supported overviews include ideograms and grand linear views. High-level plots include sequence fragment length, edge-linked interval to data view, mismatch pileup, and several splicing summaries.

Maintained by Michael Lawrence. Last updated 5 months ago.

infrastructure visualization

111 stars 12.23 score 734 scripts 16 dependents

stuart-lab

Signac:Analysis of Single-Cell Chromatin Data

A framework for the analysis and exploration of single-cell chromatin data. The 'Signac' package contains functions for quantifying single-cell chromatin data, computing per-cell quality control metrics, dimension reduction and normalization, visualization, and DNA sequence motif analysis. Reference: Stuart et al. (2021) <doi:10.1038/s41592-021-01282-5>.

Maintained by Tim Stuart. Last updated 7 months ago.

atac bioinformatics single-cell zlib cpp

355 stars 12.18 score 3.7k scripts 1 dependent

bioc

glmGamPoi:Fit a Gamma-Poisson Generalized Linear Model

Fit linear models to overdispersed count data. The package can estimate the overdispersion and fit repeated models for matrix input. It is designed to handle large input datasets as they typically occur in single cell RNA-seq experiments.

Maintained by Constantin Ahlmann-Eltze. Last updated 12 days ago.

regression rnaseq software singlecell gamma-poisson glm negative-binomial-regression on-disk openblas cpp

111 stars 12.16 score 1.0k scripts 4 dependents

rstudio

shinytest2:Testing for Shiny Applications

Automated unit testing of Shiny applications through a headless 'Chromium' browser.

Maintained by Barret Schloerke. Last updated 3 days ago.

cpp

108 stars 12.13 score 704 scripts 1 dependent

bioc

SeqArray:Data management of large-scale whole-genome sequence variant calls using GDS files

Data management of large-scale whole-genome sequencing variant calls with thousands of individuals: genotypic data (e.g., SNVs, indels and structural variation calls) and annotations in SeqArray GDS files are stored in an array-oriented and compressed manner, with efficient data access using the R programming language.

Maintained by Xiuwen Zheng. Last updated 6 days ago.

infrastructure datarepresentation sequencing genetics bioinformatics gds-format snp snv wes wgs cpp

45 stars 12.11 score 1.1k scripts 9 dependents

bioc

ShortRead:FASTQ input and manipulation

This package implements sampling, iteration, and input of FASTQ files. The package includes functions for filtering and trimming reads, and for generating a quality assessment report. Data are represented as DNAStringSet-derived objects, and easily manipulated for a diversity of purposes. The package also contains legacy support for early single-end, ungapped alignment formats.

Maintained by Bioconductor Package Maintainer. Last updated 5 months ago.

dataimport sequencing qualitycontrol bioconductor-package core-package zlib cpp

8 stars 12.08 score 1.8k scripts 49 dependents

ropensci

RefManageR:Straightforward 'BibTeX' and 'BibLaTeX' Bibliography Management

Provides tools for importing and working with bibliographic references. It greatly enhances the 'bibentry' class by providing a class 'BibEntry' which stores 'BibTeX' and 'BibLaTeX' references, supports 'UTF-8' encoding, and can be easily searched by any field, by date ranges, and by various formats for name lists (author by last names, translator by full names, etc.). Entries can be updated, combined, sorted, printed in a number of styles, and exported. 'BibTeX' and 'BibLaTeX' '.bib' files can be read into 'R' and converted to 'BibEntry' objects. Interfaces to 'NCBI Entrez', 'CrossRef', and 'Zotero' are provided for importing references and references can be created from locally stored 'PDF' files using 'Poppler'. Includes functions for citing and generating a bibliography with hyperlinks for documents prepared with 'RMarkdown' or 'RHTML'.

Maintained by Mathew W. McLean. Last updated 4 months ago.

peer-reviewed

115 stars 12.06 score 2.3k scripts 16 dependents

ropensci

rotl:Interface to the 'Open Tree of Life' API

An interface to the 'Open Tree of Life' API to retrieve phylogenetic trees, information about studies used to assemble the synthetic tree, and utilities to match taxonomic names to 'Open Tree identifiers'. The 'Open Tree of Life' aims at assembling a comprehensive phylogenetic tree for all named species.

Maintained by Francois Michonneau. Last updated 2 years ago.

metadata ropensci phylogenetics independant-contrasts biodiversity peer-reviewed phylogeny taxonomy

40 stars 12.05 score 356 scripts 29 dependents

bioc

slingshot:Tools for ordering single-cell sequencing

Provides functions for inferring continuous, branching lineage structures in low-dimensional data. Slingshot was designed to model developmental trajectories in single-cell RNA sequencing data and serve as a component in an analysis pipeline after dimensionality reduction and clustering. It is flexible enough to handle arbitrarily many branching events and allows for the incorporation of prior knowledge through supervised graph construction.

Maintained by Kelly Street. Last updated 5 months ago.

clustering differentialexpression geneexpression rnaseq sequencing software singlecell transcriptomics visualization

283 stars 12.01 score 1.0k scripts 4 dependents

bioc

ExperimentHub:Client to access ExperimentHub resources

This package provides a client for the Bioconductor ExperimentHub web resource. ExperimentHub provides a central location where curated data from experiments, publications or training courses can be accessed. Each resource has associated metadata, tags and date of modification. The client creates and manages a local cache of files retrieved enabling quick and reproducible access.

Maintained by Bioconductor Package Maintainer. Last updated 5 months ago.

infrastructure dataimport gui thirdpartyclient core-package u24ca289073

10 stars 11.94 score 764 scripts 57 dependents

bioc

GenomicDataCommons:NIH / NCI Genomic Data Commons Access

Programmatically access the NIH / NCI Genomic Data Commons RESTful service.

Maintained by Sean Davis. Last updated 2 months ago.

dataimport sequencing api-client bioconductor bioinformatics cancer core-services data-science genomics nci tcga vignette

87 stars 11.94 score 238 scripts 12 dependents

jinghuazhao

gap:Genetic Analysis Package

As first reported [Zhao, J. H. 2007. "gap: Genetic Analysis Package". J Stat Soft 23(8):1-18. <doi:10.18637/jss.v023.i08>], it is designed as an integrated package for genetic data analysis of both population and family data. Currently, it contains functions for sample size calculations of both population-based and family-based designs, probability of familial disease aggregation, kinship calculation, statistics in linkage analysis, and association analysis involving genetic markers including haplotype analysis with or without environmental covariates. Over the years, the package has been developed in between many projects, which is also in keeping with its name (gap).

Maintained by Jing Hua Zhao. Last updated 4 days ago.

genetics imputation lmm fortran

12 stars 11.94 score 448 scripts 16 dependents

pecanproject

PEcAn.DB:PEcAn Functions Used for Ecological Forecasts and Reanalysis

The Predictive Ecosystem Carbon Analyzer (PEcAn) is a scientific workflow management tool that is designed to simplify the management of model parameterization, execution, and analysis. The goal of PEcAn is to streamline the interaction between data and models, and to improve the efficacy of scientific investigation.

Maintained by David LeBauer. Last updated 2 days ago.

bayesian cyberinfrastructure data-assimilation data-science ecosystem-model ecosystem-science forecasting meta-analysis national-science-foundation pecan plants

216 stars 11.89 score 127 scripts 27 dependents

bioc

QFeatures:Quantitative features for mass spectrometry data

The QFeatures infrastructure enables the management and processing of quantitative features for high-throughput mass spectrometry assays. It provides a familiar Bioconductor user experience to manage quantitative data across different assay levels (such as peptide spectrum matches, peptides and proteins) in a coherent and tractable format.

Maintained by Laurent Gatto. Last updated 25 days ago.

infrastructure massspectrometry proteomics metabolomics bioconductor mass-spectrometry

27 stars 11.87 score 278 scripts 49 dependents

bioc

methylKit:DNA methylation analysis from high-throughput bisulfite sequencing results

methylKit is an R package for DNA methylation analysis and annotation from high-throughput bisulfite sequencing. The package is designed to deal with sequencing data from RRBS and its variants, but also target-capture methods and whole genome bisulfite sequencing. It also has functions to analyze base-pair resolution 5hmC data from experimental protocols such as oxBS-Seq and TAB-Seq. Methylation calling can be performed directly from Bismark aligned BAM files.

Maintained by Altuna Akalin. Last updated 29 days ago.

dnamethylation sequencing methylseq genome-biology methylation statistical-analysis visualization curl bzip2 xz-utils zlib cpp

220 stars 11.80 score 578 scripts 3 dependents

mrkaye97

slackr:Send Messages, Images, R Objects and Files to 'Slack' Channels/Users

'Slack' <https://slack.com/> provides a service for teams to collaborate by sharing messages, images, links, files and more. Functions are provided that make it possible to interact with the 'Slack' platform 'API'. When you need to share information or data from R, rather than resort to copy/paste in e-mails or other services like 'Skype' <https://www.skype.com/en/>, you can use this package to send well-formatted output from multiple R objects and expressions to all teammates at the same time with little effort. You can also send images from the current graphics device, R objects, and upload files.

Maintained by Matt Kaye. Last updated 5 months ago.

slack

306 stars 11.66 score 179 scripts

haleyjeppson

ggmosaic:Mosaic Plots in the 'ggplot2' Framework

Mosaic plots in the 'ggplot2' framework. Mosaic plot functionality is provided in a single 'ggplot2' layer by calling the geom 'mosaic'.

Maintained by Haley Jeppson. Last updated 6 months ago.

167 stars 11.63 score 1.8k scripts 4 dependents

pecanproject

PEcAn.data.atmosphere:PEcAn Functions Used for Managing Climate Driver Data

The Predictive Ecosystem Carbon Analyzer (PEcAn) is a scientific workflow management tool that is designed to simplify the management of model parameterization, execution, and analysis. The PEcAn.data.atmosphere package converts climate driver data into a standard format for models integrated into PEcAn. As a standalone package, it provides an interface to access diverse climate data sets.

Maintained by David LeBauer. Last updated 2 days ago.

bayesian cyberinfrastructure data-assimilation data-science ecosystem-model ecosystem-science forecasting meta-analysis national-science-foundation pecan plants

216 stars 11.61 score 64 scripts 14 dependents

bioc

bumphunter:Bump Hunter

Tools for finding bumps in genomic data

Maintained by Tamilselvi Guharaj. Last updated 5 months ago.

dnamethylation epigenetics infrastructure multiplecomparison immunooncology

16 stars 11.61 score 210 scripts 43 dependents

workflowr

workflowr:A Framework for Reproducible and Collaborative Data Science

Provides a workflow for your analysis projects by combining literate programming ('knitr' and 'rmarkdown') and version control ('Git', via 'git2r') to generate a website containing time-stamped, versioned, and documented results.

Maintained by John Blischak. Last updated 4 months ago.

git project-management rmarkdown website workflow

848 stars 11.53 score 566 scripts

urbananalyst

dodgr:Distances on Directed Graphs

Distances on dual-weighted directed graphs using priority-queue shortest paths (Padgham (2019) <doi:10.32866/6945>). Weighted directed graphs have weights from A to B which may differ from those from B to A. Dual-weighted directed graphs have two sets of such weights. A canonical example is a street network to be used for routing in which routes are calculated by weighting distances according to the type of way and mode of transport, yet lengths of routes must be calculated from direct distances.

Maintained by Mark Padgham. Last updated 20 hours ago.

distance openstreetmap router shortest-paths street-networks cpp

129 stars 11.52 score 229 scripts 4 dependents

bioc

systemPipeR:systemPipeR: Workflow Environment for Data Analysis and Report Generation

systemPipeR is a multipurpose data analysis workflow environment that unifies R with command-line tools. It enables scientists to analyze many types of large- or small-scale data on local or distributed computer systems with a high level of reproducibility, scalability and portability. At its core is a command-line interface (CLI) that adopts the Common Workflow Language (CWL). This design allows users to choose for each analysis step the optimal R or command-line software. It supports both end-to-end and partial execution of workflows with built-in restart functionalities. Efficient management of complex analysis tasks is accomplished by a flexible workflow control container class. Handling of large numbers of input samples and experimental designs is facilitated by consistent sample annotation mechanisms. As a multi-purpose workflow toolkit, systemPipeR enables users to run existing workflows, customize them or design entirely new ones while taking advantage of widely adopted data structures within the Bioconductor ecosystem. Another important core functionality is the generation of reproducible scientific analysis and technical reports. For result interpretation, systemPipeR offers a wide range of plotting functionality, while an associated Shiny App offers many useful functionalities for interactive result exploration. The vignettes linked from this page include (1) a general introduction, (2) a description of technical details, and (3) a collection of workflow templates.

Maintained by Thomas Girke. Last updated 5 months ago.

genetics infrastructure dataimport sequencing rnaseq riboseq chipseq methylseq snp geneexpression coverage genesetenrichment alignment qualitycontrol immunooncology reportwriting workflowstep workflowmanagement

53 stars 11.52 score 344 scripts 3 dependents

bioc

mia:Microbiome analysis

mia implements tools for microbiome analysis based on the SummarizedExperiment, SingleCellExperiment and TreeSummarizedExperiment infrastructure. Data wrangling and analysis in the context of taxonomic data is the main scope. Additional functions for common tasks are implemented, such as community indices calculation and summarization.

Maintained by Tuomas Borman. Last updated 2 days ago.

microbiome software dataimport analysis bioconductor cpp

51 stars 11.51 score 316 scripts 5 dependents

r-lib

gmailr:Access the 'Gmail' 'RESTful' API

An interface to the 'Gmail' 'RESTful' API. Allows access to your 'Gmail' messages, threads, drafts and labels.

Maintained by Jennifer Bryan. Last updated 1 year ago.

230 stars 11.50 score 289 scripts 1 dependents

bioc

msa:Multiple Sequence Alignment

The 'msa' package provides a unified R/Bioconductor interface to the multiple sequence alignment algorithms ClustalW, ClustalOmega, and Muscle. All three algorithms are integrated in the package, therefore, they do not depend on any external software tools and are available for all major platforms. The multiple sequence alignment algorithms are complemented by a function for pretty-printing multiple sequence alignments using the LaTeX package TeXshade.

Maintained by Ulrich Bodenhofer. Last updated 1 month ago.

multiplesequencealignment alignment multiplecomparison sequencing cpp

17 stars 11.46 score 744 scripts 6 dependents

bioc

destiny:Creates diffusion maps

Create and plot diffusion maps.

Maintained by Philipp Angerer. Last updated 4 months ago.

cellbiology cellbasedassays clustering software visualization diffusion-maps dimensionality-reduction cpp

82 stars 11.44 score 792 scripts 1 dependents

rolkra

explore:Simplifies Exploratory Data Analysis

Interactive data exploration with one line of code, automated reporting, or an easy-to-remember set of tidy functions for low-code exploratory data analysis.

Maintained by Roland Krasser. Last updated 4 months ago.

data-exploration data-visualisation decision-trees eda rmarkdown shiny tidy

228 stars 11.43 score 221 scripts 1 dependents

bioc

annotate:Annotation for microarrays

Using R environments for annotation.

Maintained by Bioconductor Package Maintainer. Last updated 5 months ago.

annotation pathways go

11.41 score 812 scripts 239 dependents

bioc

VariantAnnotation:Annotation of Genetic Variants

Annotate variants, compute amino acid coding changes, predict coding outcomes.

Maintained by Bioconductor Package Maintainer. Last updated 3 months ago.

dataimport sequencing snp annotation genetics variantannotation curl bzip2 xz-utils zlib

11.39 score 1.9k scripts 152 dependents

bioc

PharmacoGx:Analysis of Large-Scale Pharmacogenomic Data

Contains a set of functions to perform large-scale analysis of pharmaco-genomic data. These include the PharmacoSet object for storing the results of pharmacogenomic experiments, as well as a number of functions for computing common summaries of drug-dose response and correlating them with the molecular features in a cancer cell-line.

Maintained by Benjamin Haibe-Kains. Last updated 3 months ago.

geneexpression pharmacogenetics pharmacogenomics software classification datasets pharmacogenomic pharmacogx cpp

68 stars 11.39 score 442 scripts 3 dependents

r4ss

r4ss:R Code for Stock Synthesis

A collection of R functions for use with Stock Synthesis, a fisheries stock assessment modeling platform written in ADMB by Dr. Richard D. Methot at the NOAA Northwest Fisheries Science Center. The functions include tools for summarizing and plotting results, manipulating files, visualizing model parameterizations, and various other common stock assessment tasks. This version of '{r4ss}' is compatible with Stock Synthesis versions 3.24 through 3.30 (specifically version 3.30.23.1, from December 2024). Support for 3.24 models is only through the core functions for reading output and plotting.

Maintained by Ian G. Taylor. Last updated 17 days ago.

fisheries fisheries-stock-assessment stock-synthesis

43 stars 11.38 score 1.0k scripts 2 dependents

doi-usgs

nhdplusTools:NHDPlus Tools

Tools for traversing and working with National Hydrography Dataset Plus (NHDPlus) data. All methods implemented in 'nhdplusTools' are available in the NHDPlus documentation available from the US Environmental Protection Agency <https://www.epa.gov/waterdata/basic-information>.

Maintained by David Blodgett. Last updated 1 month ago.

87 stars 11.38 score 348 scripts 5 dependents

bioc

pathview:a tool set for pathway based data integration and visualization

Pathview is a tool set for pathway based data integration and visualization. It maps and renders a wide variety of biological data on relevant pathway graphs. All users need to do is supply their data and specify the target pathway. Pathview automatically downloads the pathway graph data, parses the data file, maps user data to the pathway, and renders the pathway graph with the mapped data. In addition, Pathview integrates seamlessly with pathway and gene set (enrichment) analysis tools for large-scale and fully automated analysis.

Maintained by Weijun Luo. Last updated 8 hours ago.

pathways graphandnetwork visualization genesetenrichment differentialexpression geneexpression microarray rnaseq genetics metabolomics proteomics systemsbiology sequencing

40 stars 11.37 score 1.6k scripts 10 dependents

ropensci

biomartr:Genomic Data Retrieval

Perform large scale genomic data retrieval and functional annotation retrieval. This package aims to provide users with a standardized way to automate genome, proteome, 'RNA', coding sequence ('CDS'), 'GFF', and metagenome retrieval from 'NCBI RefSeq', 'NCBI Genbank', 'ENSEMBL', and 'UniProt' databases. Furthermore, an interface to the 'BioMart' database (Smedley et al. (2009) <doi:10.1186/1471-2164-10-22>) allows users to retrieve functional annotation for genomic loci. In addition, users can download entire databases such as 'NCBI RefSeq' (Pruitt et al. (2007) <doi:10.1093/nar/gkl842>), 'NCBI nr', 'NCBI nt', 'NCBI Genbank' (Benson et al. (2013) <doi:10.1093/nar/gks1195>), etc. with only one command.

Maintained by Hajk-Georg Drost. Last updated 2 months ago.

biomart genomic-data-retrieval annotation-retrieval database-retrieval ncbi ensembl biological-data-retrieval ensembl-servers genome genome-annotation genome-retrieval genomics meta-analysis metagenomics ncbi-genbank peer-reviewed proteome sequenced-genomes

218 stars 11.35 score 129 scripts 3 dependents

jessecambon

tidygeocoder:Geocoding Made Easy

An intuitive interface for getting data from geocoding services.

Maintained by Jesse Cambon. Last updated 5 months ago.

geocoding rspatial tidyverse

287 stars 11.35 score 1.0k scripts 9 dependents

dsy109

mixtools:Tools for Analyzing Finite Mixture Models

Analyzes finite mixture models for various parametric and semiparametric settings. This includes mixtures of parametric distributions (normal, multivariate normal, multinomial, gamma), various Reliability Mixture Models (RMMs), mixtures-of-regressions settings (linear regression, logistic regression, Poisson regression, linear regression with changepoints, predictor-dependent mixing proportions, random effects regressions, hierarchical mixtures-of-experts), and tools for selecting the number of components (bootstrapping the likelihood ratio test statistic, mixturegrams, and model selection criteria). Bayesian estimation of mixtures-of-linear-regressions models is available as well as a novel data depth method for obtaining credible bands. This package is based upon work supported by the National Science Foundation under Grant No. SES-0518772 and the Chan Zuckerberg Initiative: Essential Open Source Software for Science (Grant No. 2020-255193).

Maintained by Derek Young. Last updated 10 months ago.

mixture-models mixture-of-experts semiparametric-regression

20 stars 11.34 score 1.4k scripts 56 dependents

r-hub

rhub:Tools for R Package Developers

R-hub v2 uses GitHub Actions to run 'R CMD check' and similar package checks. The 'rhub' package helps you set up R-hub v2 for your R package, and start running checks.

Maintained by Gábor Csárdi. Last updated 22 days ago.

359 stars 11.33 score 191 scripts 1 dependents

ropensci

ssh:Secure Shell (SSH) Client for R

Connect to a remote server over SSH to transfer files via SCP, setup a secure tunnel, or run a command or script on the host while streaming stdout and stderr directly to the client.

Maintained by Jeroen Ooms. Last updated 3 days ago.

libssh ssh ssh-client

129 stars 11.33 score 128 scripts 10 dependents

neuhausi

canvasXpress:Visualization Package for CanvasXpress in R

Enables creation of visualizations using the CanvasXpress framework in R. CanvasXpress is a standalone JavaScript library for reproducible research with complete tracking of data and end-user modifications stored in a single PNG image that can be played back. See <https://www.canvasxpress.org> for more information.

Maintained by Connie Brett. Last updated 21 hours ago.

analytics bioinformatics chart charting dash dashboard data-analytics data-science data-visualization genomics graphs javascript network network-visualization python reproducible-research shiny visualization

297 stars 11.28 score 145 scripts

bioc

MAST:Model-based Analysis of Single Cell Transcriptomics

Methods and models for handling zero-inflated single cell assay data.

Maintained by Andrew McDavid. Last updated 5 months ago.

geneexpression differentialexpression genesetenrichment rnaseq transcriptomics singlecell

232 stars 11.28 score 1.8k scripts 5 dependents

mrcieu

TwoSampleMR:Two Sample MR Functions and Interface to MRC Integrative Epidemiology Unit OpenGWAS Database

A package for performing Mendelian randomization using GWAS summary data. It uses the IEU OpenGWAS database <https://gwas.mrcieu.ac.uk/> to automatically obtain data, and a wide range of methods to run the analysis.

Maintained by Gibran Hemani. Last updated 1 day ago.

476 stars 11.27 score 1.7k scripts 1 dependents

jeroen

mongolite:Fast and Simple 'MongoDB' Client for R

High-performance MongoDB client based on 'mongo-c-driver' and 'jsonlite'. Includes support for aggregation, indexing, map-reduce, streaming, encryption, enterprise authentication, and GridFS. The online user manual provides an overview of the available methods in the package: <https://jeroen.github.io/mongolite/>.

Maintained by Jeroen Ooms. Last updated 7 days ago.

cyrus-sasl2 openssl glibc zlib

285 stars 11.25 score 860 scripts 10 dependents

bioc

zellkonverter:Conversion Between scRNA-seq Objects

Provides methods to convert between Python AnnData objects and SingleCellExperiment objects. These are primarily intended for use by downstream Bioconductor packages that wrap Python methods for single-cell data analysis. It also includes functions to read and write H5AD files used for saving AnnData objects to disk.

Maintained by Luke Zappia. Last updated 20 days ago.

singlecell dataimport datarepresentation bioconductor conversion scrna-seq

159 stars 11.25 score 660 scripts 4 dependents

paws-r

paws:Amazon Web Services Software Development Kit

Interface to Amazon Web Services <https://aws.amazon.com>, including storage, database, and compute services, such as 'Simple Storage Service' ('S3'), 'DynamoDB' 'NoSQL' database, and 'Lambda' functions-as-a-service.

Maintained by Dyfan Jones. Last updated 16 days ago.

aws aws-sdk

332 stars 11.25 score 177 scripts 12 dependents

bioc

karyoploteR:Plot customizable linear genomes displaying arbitrary data

karyoploteR creates karyotype plots of arbitrary genomes and offers a complete set of functions to plot arbitrary data on them. It mimics many R base graphics functions, coupling them with a coordinate change function that automatically maps the chromosome and data coordinates into the plot coordinates. In addition to the provided data plotting functions, it is easy to add new ones.

Maintained by Bernat Gel. Last updated 5 months ago.

visualization copynumbervariation sequencing coverage dnaseq chipseq methylseq dataimport onechannel bioconductor bioinformatics data-visualization genome genomics-visualization plotting-in-r

307 stars 11.25 score 656 scripts 4 dependents

adeverse

adespatial:Multivariate Multiscale Spatial Analysis

Tools for the multiscale spatial analysis of multivariate data. Several methods are based on the use of a spatial weighting matrix and its eigenvector decomposition (Moran's Eigenvectors Maps, MEM). Several approaches are described in the review Dray et al (2012) <doi:10.1890/11-1183.1>.

Maintained by Aurélie Siberchicot. Last updated 9 days ago.

fortran openblas

36 stars 11.16 score 398 scripts 2 dependents

jamiemkass

ENMeval:Automated Tuning and Evaluations of Ecological Niche Models

Runs ecological niche models over all combinations of user-defined settings (i.e., tuning), performs cross validation to evaluate models, and returns data tables to aid in selection of optimal model settings that balance goodness-of-fit and model complexity. Also has functions to partition data spatially (or not) for cross validation, to plot multiple visualizations of results, to run null models to estimate significance and effect sizes of performance metrics, and to calculate range overlap between model predictions, among others. The package was originally built for Maxent models (Phillips et al. 2006, Phillips et al. 2017), but the current version allows possible extensions for any modeling algorithm. The extensive vignette, which guides users through most package functionality but unfortunately has a file size too big for CRAN, can be found here on the package's Github Pages website: <https://jamiemkass.github.io/ENMeval/articles/ENMeval-2.0-vignette.html>.

Maintained by Jamie M. Kass. Last updated 12 hours ago.

49 stars 11.16 score 332 scripts 2 dependents

azure

Microsoft365R:Interface to the 'Microsoft 365' Suite of Cloud Services

An interface to the 'Microsoft 365' (formerly known as 'Office 365') suite of cloud services, building on the framework supplied by the 'AzureGraph' package. Enables access from R to data stored in 'Teams', 'SharePoint Online' and 'OneDrive', including the ability to list drive folder contents, upload and download files, send messages, and retrieve data lists. Also provides a full-featured 'Outlook' email client, with the ability to send emails and manage emails and mail folders.

Maintained by Hong Ooi. Last updated 28 days ago.

azure-sdk-r microsoft-365 microsoft-graph-api office-365 onedrive onedrive-for-business sharepoint-online

325 stars 11.14 score 88 scripts 7 dependents

bioc

genomation:Summary, annotation and visualization of genomic data

A package for summary and annotation of genomic intervals. Users can visualize and quantify genomic intervals over pre-defined functional regions, such as promoters, exons, introns, etc. The genomic intervals represent regions with a defined chromosome position, which may be associated with a score, such as aligned reads from HT-seq experiments, TF binding sites, methylation scores, etc. The package can use any tabular genomic feature data as long as it has minimal information on the locations of genomic intervals. In addition, it can use BAM or BigWig files as input.

Maintained by Altuna Akalin. Last updated 5 months ago.

annotation sequencing visualization cpgisland cpp

76 stars 11.13 score 738 scripts 5 dependents

bioc

genefilter:genefilter: methods for filtering genes from high-throughput experiments

Some basic functions for filtering genes.

Maintained by Bioconductor Package Maintainer. Last updated 5 months ago.

microarray fortran cpp

11.11 score 2.4k scripts 143 dependents

usepa

elevatr:Access Elevation Data from Various APIs

Several web services are available that provide access to elevation data. This package provides access to many of those services and returns elevation data either as an 'sf' simple features object from point elevation services or as a 'raster' object from raster elevation services. In future versions, 'elevatr' will drop support for 'raster' and will instead return 'terra' objects. Currently, the package supports access to the Amazon Web Services Terrain Tiles <https://registry.opendata.aws/terrain-tiles/>, the Open Topography Global Datasets API <https://opentopography.org/developers/>, and the USGS Elevation Point Query Service <https://apps.nationalmap.gov/epqs/>.

Maintained by Jeffrey Hollister. Last updated 7 months ago.

digital-elevation-model elevation-data elevatr epa mapzen-elevation-service r-language

206 stars 11.11 score 1.3k scripts 3 dependents

fmichonneau

phylobase:Base Package for Phylogenetic Structures and Comparative Data

Provides a base S4 class for comparative methods, incorporating one or more trees and trait data.

Maintained by Francois Michonneau. Last updated 1 year ago.

phylogenetics cpp

18 stars 11.10 score 394 scripts 18 dependents

covid19datahub

COVID19:COVID-19 Data Hub

Unified datasets for a better understanding of COVID-19.

Maintained by Emanuele Guidotti. Last updated 1 month ago.

2019-ncov coronavirus covid-19 covid-data covid19-data

252 stars 11.08 score 265 scripts

ropengov

eurostat:Tools for Eurostat Open Data

Tools to download data from the Eurostat database <https://ec.europa.eu/eurostat> together with search and manipulation utilities.

Maintained by Leo Lahti. Last updated 1 month ago.

ropengov eurostat eurostat-data

242 stars 11.07 score 892 scripts 4 dependents

bioc

scater:Single-Cell Analysis Toolkit for Gene Expression Data in R

A collection of tools for doing various analyses of single-cell RNA-seq gene expression data, with a focus on quality control and visualization.

Maintained by Alan OCallaghan. Last updated 22 days ago.

immunooncology singlecell rnaseq qualitycontrol preprocessing normalization visualization dimensionreduction transcriptomics geneexpression sequencing software dataimport datarepresentation infrastructure coverage

11.07 score 12k scripts 43 dependents

paws-r

paws.common:Paws Low-Level Amazon Web Services API

Functions for making low-level API requests to Amazon Web Services <https://aws.amazon.com>. The functions handle building, signing, and sending requests, and receiving responses. They are designed to help build higher-level interfaces to individual services, such as Simple Storage Service (S3).

Maintained by Dyfan Jones. Last updated 16 days ago.

aws aws-sdk cpp

332 stars 11.07 score 39 dependents

ipums

ipumsr:An R Interface for Downloading, Reading, and Handling IPUMS Data

An easy way to work with census, survey, and geographic data provided by IPUMS in R. Generate and download data through the IPUMS API and load IPUMS files into R with their associated metadata to make analysis easier. IPUMS data describing 1.4 billion individuals drawn from over 750 censuses and surveys is available free of charge from the IPUMS website <https://www.ipums.org>.

Maintained by Derek Burk. Last updated 1 month ago.

30 stars 11.05 score 720 scripts 2 dependents

bioc

universalmotif:Import, Modify, and Export Motifs with R

Allows for importing most common motif types into R for use by functions provided by other Bioconductor motif-related packages. Motifs can be exported into most major motif formats from various classes as defined by other Bioconductor packages. A suite of motif and sequence manipulation and analysis functions are included, including enrichment, comparison, P-value calculation, shuffling, trimming, higher-order motifs, and others.

Maintained by Benjamin Jean-Marie Tremblay. Last updated 5 months ago.

motifannotation motifdiscovery dataimport generegulation motif-analysis motif-enrichment-analysis sequence-logo cpp

28 stars 11.04 score 342 scripts 12 dependents

openml

OpenML:Open Machine Learning and Open Data Platform

We provide an R interface to 'OpenML.org', which is an online machine learning platform where researchers can access open data, download and upload data sets, share their machine learning tasks and experiments, and organize them online to work and collaborate with other researchers. The R interface allows querying for data sets with specific properties, as well as downloading and uploading data sets, tasks, flows and runs. See <https://www.openml.org/guide/api> for more information.

Maintained by Giuseppe Casalicchio. Last updated 10 months ago.

arff benchmarking benchmarking-suite classification data-science database dataset datasets machine-learning machine-learning-algorithms open-data open-science opendata openml openscience regression reproducible-research statistics

97 stars 11.04 score 7.1k scripts

config-i1

greybox:Toolbox for Model Building and Forecasting

Implements functions and instruments for regression model building and its application to forecasting. The main scope of the package is variable selection and model specification for time series data. This includes promotional modelling, selection between different dynamic regressions with non-standard distributions of errors, selection based on cross validation, solutions to the fat regression model problem and more. Models developed in the package are tailored specifically for forecasting purposes. As a result, there are several methods for producing forecasts from these models and visualising them.

Maintained by Ivan Svetunkov. Last updated 15 days ago.

forecasting model-selection model-selection-and-evaluation regression regression-models statistics cpp

30 stars 11.03 score 97 scripts 34 dependents

mhahsler

arulesViz:Visualizing Association Rules and Frequent Itemsets

Extends package 'arules' with various visualization techniques for association rules and itemsets. The package also includes several interactive visualizations for rule exploration. Michael Hahsler (2017) <doi:10.32614/RJ-2017-047>.

Maintained by Michael Hahsler. Last updated 7 months ago.

arules association-rules frequent-itemsets interactive-visualizations visualization

54 stars 11.03 score 1.7k scripts 2 dependents

r-lib

jose:JavaScript Object Signing and Encryption

Read and write JSON Web Keys (JWK, rfc7517), generate and verify JSON Web Signatures (JWS, rfc7515) and encode/decode JSON Web Tokens (JWT, rfc7519) <https://datatracker.ietf.org/wg/jose/documents/>. These standards provide modern signing and encryption formats that are natively supported by browsers via the JavaScript WebCryptoAPI <https://www.w3.org/TR/WebCryptoAPI/#jose>, and used by services like OAuth 2.0, Let's Encrypt, and GitHub Apps.

Maintained by Jeroen Ooms. Last updated 6 months ago.

50 stars 11.00 score 63 scripts 35 dependents

bioc

CATALYST:Cytometry dATa anALYSis Tools

CATALYST provides tools for preprocessing of and differential discovery in cytometry data such as FACS, CyTOF, and IMC. Preprocessing includes i) normalization using bead standards, ii) single-cell deconvolution, and iii) bead-based compensation. For differential discovery, the package provides a number of convenient functions for data processing (e.g., clustering, dimension reduction), as well as a suite of visualizations for exploratory data analysis and exploration of results from differential abundance (DA) and state (DS) analysis in order to identify differences in composition and expression profiles at the subpopulation-level, respectively.

Maintained by Helena L. Crowell. Last updated 4 months ago.

clustering dataimport differentialexpression experimentaldesign flowcytometry immunooncology massspectrometry normalization preprocessing singlecell software statisticalmethod visualization

67 stars 10.99 score 362 scripts 2 dependents

davidgohel

officedown:Enhanced 'R Markdown' Format for 'Word' and 'PowerPoint'

Allows production of 'Microsoft' corporate documents from 'R Markdown' by reusing formatting defined in 'Microsoft Word' documents. You can reuse table styles and list styles, and also add column sections and landscape-oriented pages. Table and image captions as well as cross-references are transformed into 'Microsoft Word' fields, allowing documents to be edited and merged without breaking references; the syntax conforms to the 'bookdown' cross-reference definition. Objects generated by the 'officer' package are also supported in the 'knitr' chunks. 'Microsoft PowerPoint' presentations benefit from this too, along with the ability to produce editable vector graphics in 'PowerPoint' and to define placeholders where content is to be added.

Maintained by David Gohel. Last updated 11 days ago.

371 stars 10.93 score 342 scripts 7 dependents

ropensci

CoordinateCleaner:Automated Cleaning of Occurrence Records from Biological Collections

Automated flagging of common spatial and temporal errors in biological and paleontological collection data, for use in conservation, ecology and paleontology. Includes automated tests to easily flag (and exclude) records assigned to country or province centroids, the open ocean, the headquarters of the Global Biodiversity Information Facility, urban areas or the location of biodiversity institutions (museums, zoos, botanical gardens, universities). It also identifies per-species outlier coordinates, zero coordinates, identical latitude/longitude and invalid coordinates, and implements an algorithm to identify data sets with a significant proportion of rounded coordinates. Especially suited for large data sets. The reference for the methodology is: Zizka et al. (2019) <doi:10.1111/2041-210X.13152>.

Maintained by Alexander Zizka. Last updated 1 year ago.

82 stars 10.93 score 306 scripts 3 dependents

bioc

infercnv:Infer Copy Number Variation from Single-Cell RNA-Seq Data

Using single-cell RNA-Seq expression to visualize CNV in cells.

Maintained by Christophe Georgescu. Last updated 5 months ago.

software copynumbervariation variantdetection structuralvariation genomicvariation genetics transcriptomics statisticalmethod bayesian hiddenmarkovmodel singlecell jags cpp

601 stars 10.92 score 674 scripts

bioc

EnrichedHeatmap:Making Enriched Heatmaps

An enriched heatmap is a special type of heatmap that visualizes the enrichment of genomic signals on specific target regions. Here we implement the enriched heatmap with the ComplexHeatmap package. Since this type of heatmap is just a normal heatmap with some special settings, the functionality of ComplexHeatmap makes it much easier to customize the heatmap and to concatenate it to a list of heatmaps to show correspondence between different data sources.

Maintained by Zuguang Gu. Last updated 5 months ago.

software visualization sequencing genomeannotation coverage cpp

190 stars 10.87 score 330 scripts 1 dependents

marce10

warbleR:Streamline Bioacoustic Analysis

Functions aiming to facilitate the analysis of the structure of animal acoustic signals in 'R'. 'warbleR' makes use of the basic sound analysis tools from the packages 'tuneR' and 'seewave', and offers new tools to explore and quantify acoustic signal structure. The package allows users to organize and manipulate multiple sound files, create spectrograms of complete recordings or individual signals in different formats, run several measures of acoustic structure, and characterize different structural levels in acoustic signals.

Maintained by Marcelo Araya-Salas. Last updated 2 months ago.

animal-acoustic-signals audio-processing bioacoustics spectrogram streamline-analysis cpp

56 stars 10.86 score 270 scripts 4 dependents

michelnivard

gptstudio:Use Large Language Models Directly in your Development Environment

Large language models are readily accessible via API. This package lowers the barrier to using the API inside your development environment. For more on the API, see <https://platform.openai.com/docs/introduction>.

Maintained by James Wade. Last updated 3 days ago.

chatgpt gpt-3 rstudio rstudio-addin

930 stars 10.85 score 43 scripts 1 dependents

bioc

ANCOMBC:Microbiome differential abundance and correlation analyses with bias correction

ANCOMBC is a package containing differential abundance (DA) and correlation analyses for microbiome data. Specifically, the package includes Analysis of Compositions of Microbiomes with Bias Correction 2 (ANCOM-BC2), Analysis of Compositions of Microbiomes with Bias Correction (ANCOM-BC), and Analysis of Composition of Microbiomes (ANCOM) for DA analysis, and Sparse Estimation of Correlations among Microbiomes (SECOM) for correlation analysis. Microbiome data are typically subject to two sources of biases: unequal sampling fractions (sample-specific biases) and differential sequencing efficiencies (taxon-specific biases). Methodologies included in the ANCOMBC package are designed to correct these biases and construct statistically consistent estimators.

Maintained by Huang Lin. Last updated 13 days ago.

differentialexpression microbiome normalization sequencing software ancom ancombc ancombc2 correlation differential-abundance-analysis secom

120 stars 10.79 score 406 scripts 1 dependents

azure

AzureStor:Storage Management in 'Azure'

Manage storage in Microsoft's 'Azure' cloud: <https://azure.microsoft.com/en-us/product-categories/storage/>. On the admin side, 'AzureStor' includes features to create, modify and delete storage accounts. On the client side, it includes an interface to blob storage, file storage, and 'Azure Data Lake Storage Gen2': upload and download files and blobs; list containers and files/blobs; create containers; and so on. Authenticated access to storage is supported, via either a shared access key or a shared access signature (SAS). Part of the 'AzureR' family of packages.

Maintained by Hong Ooi. Last updated 2 years ago.

azure-data-lake azure-sdk-r azure-storage azure-storage-blob azure-storage-file

65 stars 10.74 score 298 scripts 4 dependents

bioc

muscat:Multi-sample multi-group scRNA-seq data analysis tools

`muscat` provides various methods and visualization tools for DS analysis in multi-sample, multi-group, multi-(cell-)subpopulation scRNA-seq data, including cell-level mixed models and methods based on aggregated “pseudobulk” data, as well as a flexible simulation platform that mimics both single and multi-sample scRNA-seq data.

Maintained by Helena L. Crowell. Last updated 5 months ago.

immunooncology differentialexpression sequencing singlecell software statisticalmethod visualization

184 stars 10.74 score 686 scripts 1 dependents

rstudio

pointblank:Data Validation and Organization of Metadata for Local and Remote Tables

Validate data in data frames, 'tibble' objects, 'Spark' 'DataFrames', and database tables. Validation pipelines can be made using easily-readable, consecutive validation steps. Upon execution of the validation plan, several reporting options are available. User-defined thresholds for failure rates allow for the determination of appropriate reporting actions. Many other workflows are available including an information management workflow, where the aim is to record, collect, and generate useful information on data tables.

Maintained by Richard Iannone. Last updated 3 days ago.

data-assertions data-checker data-dictionaries data-frames data-inference data-management data-profiler data-quality data-validation data-verification database-tables easy-to-understand reporting-tool schema-validation testing-tools yaml-configuration

942 stars 10.73 score 284 scripts

apache

apache.sedona:R Interface for Apache Sedona

R interface for 'Apache Sedona' based on 'sparklyr' (<https://sedona.apache.org>).

Maintained by Apache Sedona. Last updated 5 hours ago.

cluster-computing geospatial java python scala spatial-analysis spatial-query spatial-sql

2.0k stars 10.72 score 105 scripts

jimmyday12

fitzRoy:Easily Scrape and Process AFL Data

An easy package for scraping and processing Australian Rules Football (AFL) data. 'fitzRoy' provides a range of functions for accessing publicly available data from 'AFL Tables' <https://afltables.com/afl/afl_index.html>, 'Footy Wire' <https://www.footywire.com> and 'The Squiggle' <https://squiggle.com.au>. Further functions allow for easy processing, cleaning and transformation of this data into formats that can be used for analysis.

Maintained by James Day. Last updated 8 days ago.

136 stars 10.72 score 324 scripts

pecanproject

PEcAn.benchmark:PEcAn Functions Used for Benchmarking

The Predictive Ecosystem Carbon Analyzer (PEcAn) is a scientific workflow management tool that is designed to simplify the management of model parameterization, execution, and analysis. The goal of PEcAn is to streamline the interaction between data and models, and to improve the efficacy of scientific investigation. The PEcAn.benchmark package provides utilities for comparing models and data, including a suite of statistical metrics and plots.

Maintained by Mike Dietze. Last updated 2 days ago.

bayesian cyberinfrastructure data-assimilation data-science ecosystem-model ecosystem-science forecasting meta-analysis national-science-foundation pecan plants

216 stars 10.72 score 416 scripts 11 dependents

mrcieu

ieugwasr:Interface to the 'OpenGWAS' Database API

Interface to the 'OpenGWAS' database API <https://api.opengwas.io/api/>. Includes a wrapper to make generic calls to the API, plus convenience functions for specific queries.

Maintained by Gibran Hemani. Last updated 16 days ago.

89 stars 10.71 score 404 scripts 6 dependents

bioc

ALDEx2:Analysis Of Differential Abundance Taking Sample and Scale Variation Into Account

A differential abundance analysis for the comparison of two or more conditions. Useful for analyzing data from standard RNA-seq or meta-RNA-seq assays as well as selected and unselected values from in-vitro sequence selections. Uses a Dirichlet-multinomial model to infer abundance from counts, optimized for three or more experimental replicates. The method infers biological and sampling variation to calculate the expected false discovery rate, given the variation, based on a Wilcoxon Rank Sum test and Welch's t-test (via aldex.ttest), a Kruskal-Wallis test (via aldex.kw), a generalized linear model (via aldex.glm), or a correlation test (via aldex.corr). All tests report predicted p-values and posterior Benjamini-Hochberg corrected p-values. ALDEx2 also calculates expected standardized effect sizes for paired or unpaired study designs. ALDEx2 can now be used to estimate the effect of scale on the results and report on the scale-dependent robustness of results.

Maintained by Greg Gloor. Last updated 5 months ago.

differentialexpression rnaseq transcriptomics geneexpression dnaseq chipseq bayesian sequencing software microbiome metagenomics immunooncology scale simulation posterior p-value

28 stars 10.70 score 424 scripts 3 dependents

cjvanlissa

tidySEM:Tidy Structural Equation Modeling

A tidy workflow for generating, estimating, reporting, and plotting structural equation models using 'lavaan', 'OpenMx', or 'Mplus'. Throughout this workflow, elements of syntax, results, and graphs are represented as 'tidy' data, making them easy to customize. Includes functionality to estimate latent class analyses, and to plot 'dagitty' and 'igraph' objects.

Maintained by Caspar J. van Lissa. Last updated 20 days ago.

58 stars 10.69 score 330 scripts 1 dependents

doi-usgs

EGRET:Exploration and Graphics for RivEr Trends

Statistics and graphics for streamflow history, water quality trends, and the statistical modeling algorithm: Weighted Regressions on Time, Discharge, and Season (WRTDS).

Maintained by Laura DeCicco. Last updated 4 months ago.

usgs water-quality water-quality-data

90 stars 10.67 score 362 scripts 1 dependents

quanteda

readtext:Import and Handling for Plain and Formatted Text Files

Functions for importing and handling text files and formatted text files with additional metadata, including '.csv', '.tab', '.json', '.xml', '.html', '.pdf', '.doc', '.docx', '.rtf', '.xls', '.xlsx', and others.

Maintained by Kenneth Benoit. Last updated 4 months ago.

encoding quanteda text

122 stars 10.66 score 1.2k scripts 5 dependents

neonscience

neonUtilities:Utilities for Working with NEON Data

NEON data packages can be accessed through the NEON Data Portal <https://www.neonscience.org> or through the NEON Data API (see <https://data.neonscience.org/data-api> for documentation). Data delivered from the Data Portal are provided as monthly zip files packaged within a parent zip file, while individual files can be accessed from the API. This package provides tools that aid in discovering, downloading, and reformatting data prior to use in analyses. This includes downloading data via the API, merging data tables by type, and converting formats. For more information, see the readme file at <https://github.com/NEONScience/NEON-utilities>.

Maintained by Claire Lunch. Last updated 2 months ago.

57 stars 10.66 score 944 scripts 15 dependents

r-lum

Luminescence:Comprehensive Luminescence Dating Data Analysis

A collection of various R functions for the purpose of Luminescence dating data analysis. This includes, amongst others, data import, export, application of age models, curve deconvolution, sequence analysis and plotting of equivalent dose distributions.

Maintained by Sebastian Kreutzer. Last updated 19 hours ago.

bayesian-statistics data-science geochronology luminescence luminescence-dating open-science osl plotting radiofluorescence tl xsyg cpp

15 stars 10.66 score 178 scripts 8 dependents

business-science

modeltime:The Tidymodels Extension for Time Series Modeling

The time series forecasting framework for use with the 'tidymodels' ecosystem. Models include ARIMA, Exponential Smoothing, and additional time series models from the 'forecast' and 'prophet' packages. Refer to "Forecasting Principles & Practice, Second edition" (<https://otexts.com/fpp2/>). Refer to "Prophet: forecasting at scale" (<https://research.facebook.com/blog/2017/02/prophet-forecasting-at-scale/>).

Maintained by Matt Dancho. Last updated 5 months ago.

arima data-science deep-learning ets forecasting machine-learning machine-learning-algorithms modeltime prophet tbats tidymodeling tidymodels time time-series time-series-analysis timeseries timeseries-forecasting

551 stars 10.61 score 1.1k scripts 7 dependents

bioc

tximeta:Transcript Quantification Import with Automatic Metadata

Transcript quantification import from Salmon and other quantifiers with automatic attachment of transcript ranges and release information, and other associated metadata. De novo transcriptomes can be linked to the appropriate sources with linkedTxomes and shared for computational reproducibility.

Maintained by Michael Love. Last updated 2 months ago.

annotation genomeannotation dataimport preprocessing rnaseq singlecell transcriptomics transcription geneexpression functionalgenomics reproducibleresearch reportwriting immunooncology

67 stars 10.58 score 466 scripts 1 dependents

bioc

Glimma:Interactive visualizations for gene expression analysis

This package produces interactive visualizations for RNA-seq data analysis, utilizing output from limma, edgeR, or DESeq2. It produces interactive htmlwidgets versions of popular RNA-seq analysis plots to enhance the exploration of analysis results by overlaying interactive features. The plots can be viewed in a web browser or embedded in notebook documents.

Maintained by Shian Su. Last updated 2 months ago.

differentialexpression geneexpression microarray reportwriting rnaseq sequencing visualization differential-expression interactive-visualizations

32 stars 10.58 score 600 scripts 1 dependents

bioc

ORFik:Open Reading Frames in Genomics

R package for analysis of transcript and translation features through manipulation of sequence data and NGS data like Ribo-Seq, RNA-Seq, TCP-Seq and CAGE. It is generalized in the sense that any transcript region can be analysed, although, as the name hints, it was made with the investigation of ribosomal patterns over Open Reading Frames (ORFs) as its primary use case. ORFik is extremely fast through use of C++, data.table and GenomicRanges. The package allows reassigning transcript starts using CAGE-Seq data, automatic shifting of Ribo-Seq reads, finding Open Reading Frames for whole genomes, and much more.

Maintained by Haakon Tjeldnes. Last updated 1 month ago.

immunooncology software sequencing riboseq rnaseq functionalgenomics coverage alignment dataimport cpp

33 stars 10.56 score 115 scripts 2 dependents

bioc

DECIPHER:Tools for curating, analyzing, and manipulating biological sequences

A toolset for deciphering and managing biological sequences.

Maintained by Erik Wright. Last updated 18 days ago.

clustering genetics sequencing dataimport visualization microarray qualitycontrol qpcr alignment wholegenome microbiome immunooncology geneprediction openmp

10.55 score 1.1k scripts 14 dependents

mlverse

chattr:Interact with Large Language Models in 'RStudio'

Enables user interactivity with large-language models ('LLM') inside the 'RStudio' integrated development environment (IDE). The user can interact with the model using the 'shiny' app included in this package, or directly in the 'R' console. It comes with back-ends for 'OpenAI', 'GitHub' 'Copilot', and 'LlamaGPT'.

Maintained by Edgar Ruiz. Last updated 2 months ago.

215 stars 10.55 score 71 scripts 1 dependents

datastorm-open

shinymanager:Authentication Management for 'Shiny' Applications

Simple and secure authentication mechanism for single 'Shiny' applications. Credentials can be stored in an encrypted 'SQLite' database or in your own SQL database (Postgres, MySQL, ...). Source code of the main application is protected until authentication is successful.

Maintained by Benoit Thieurmel. Last updated 11 months ago.

shiny shiny-server shinyapps

391 stars 10.51 score 316 scripts 2 dependents

bioc

ballgown:Flexible, isoform-level differential expression analysis

Tools for statistical analysis of assembled transcriptomes, including flexible differential expression analysis, visualization of transcript structures, and matching of assembled transcripts to annotation.

Maintained by Jack Fu. Last updated 5 months ago.

immunooncology rnaseq statisticalmethod preprocessing differentialexpression

145 stars 10.51 score 338 scripts 1 dependents

ropensci

gutenbergr:Download and Process Public Domain Works from Project Gutenberg

Download and process public domain works in the Project Gutenberg collection <https://www.gutenberg.org/>. Includes metadata for all Project Gutenberg works, so that they can be searched and retrieved.

Maintained by Jon Harmon. Last updated 3 months ago.

peer-reviewed

105 stars 10.50 score 1.1k scripts 1 dependents

bioc

miloR:Differential neighbourhood abundance testing on a graph

Milo performs single-cell differential abundance testing. Cell states are modelled as representative neighbourhoods on a nearest neighbour graph. Hypothesis testing is performed using either a negative binomial generalized linear model or a negative binomial generalized linear mixed model.

Maintained by Mike Morgan. Last updated 5 months ago.

singlecell multiplecomparison functionalgenomics software openblas cpp openmp

362 stars 10.49 score 340 scripts 1 dependents

rstudio

vetiver:Version, Share, Deploy, and Monitor Models

The goal of 'vetiver' is to provide fluent tooling to version, share, deploy, and monitor a trained model. Functions handle both recording and checking the model's input data prototype, and predicting from a remote API endpoint. The 'vetiver' package is extensible, with generics that can support many kinds of models.

Maintained by Julia Silge. Last updated 6 months ago.

185 stars 10.48 score 466 scripts 1 dependents

bioc

celda:CEllular Latent Dirichlet Allocation

Celda is a suite of Bayesian hierarchical models for clustering single-cell RNA-sequencing (scRNA-seq) data. It is able to perform "bi-clustering" and simultaneously cluster genes into gene modules and cells into cell subpopulations. It also contains DecontX, a novel Bayesian method to computationally estimate and remove RNA contamination in individual cells without empty droplet information. A variety of scRNA-seq data visualization functions is also included.

Maintained by Joshua Campbell. Last updated 1 month ago.

singlecell geneexpression clustering sequencing bayesian immunooncology dataimport cpp openmp

147 stars 10.47 score 256 scripts 2 dependents

crunch-io

crunch:Crunch.io Data Tools

The Crunch.io service <https://crunch.io/> provides a cloud-based data store and analytic engine, as well as an intuitive web interface. Using this package, analysts can interact with and manipulate Crunch datasets from within R. Importantly, this allows technical researchers to collaborate naturally with team members, managers, and clients who prefer a point-and-click interface.

Maintained by Greg Freedman Ellis. Last updated 8 days ago.

9 stars 10.47 score 200 scripts 2 dependents

vubiostat

redcapAPI:Interface to 'REDCap'

Access data stored in 'REDCap' databases using the Application Programming Interface (API). 'REDCap' (Research Electronic Data CAPture; <https://projectredcap.org>, Harris, et al. (2009) <doi:10.1016/j.jbi.2008.08.010>, Harris, et al. (2019) <doi:10.1016/j.jbi.2019.103208>) is a web application for building and managing online surveys and databases developed at Vanderbilt University. The API allows users to access data and project metadata (such as the data dictionary) from the web programmatically. The 'redcapAPI' package facilitates the process of accessing data with options to prepare an analysis-ready data set consistent with the definitions in a database's data dictionary.

Maintained by Shawn Garbett. Last updated 22 days ago.

22 stars 10.47 score 134 scripts 2 dependents

bioc

GENESIS:GENetic EStimation and Inference in Structured samples (GENESIS): Statistical methods for analyzing genetic data from samples with population structure and/or relatedness

The GENESIS package provides methodology for estimating, inferring, and accounting for population and pedigree structure in genetic analyses. The current implementation provides functions to perform PC-AiR (Conomos et al., 2015, Gen Epi) and PC-Relate (Conomos et al., 2016, AJHG). PC-AiR performs a Principal Components Analysis on genome-wide SNP data for the detection of population structure in a sample that may contain known or cryptic relatedness. Unlike standard PCA, PC-AiR accounts for relatedness in the sample to provide accurate ancestry inference that is not confounded by family structure. PC-Relate uses ancestry representative principal components to adjust for population structure/ancestry and accurately estimate measures of recent genetic relatedness such as kinship coefficients, IBD sharing probabilities, and inbreeding coefficients. Additionally, functions are provided to perform efficient variance component estimation and mixed model association testing for both quantitative and binary phenotypes.

Maintained by Stephanie M. Gogarten. Last updated 2 months ago.

snp geneticvariability genetics statisticalmethod dimensionreduction principalcomponent genomewideassociation qualitycontrol biocviews

36 stars 10.44 score 342 scripts 1 dependents

bioc

UCell:Rank-based signature enrichment analysis for single-cell data

UCell is a package for evaluating gene signatures in single-cell datasets. UCell signature scores, based on the Mann-Whitney U statistic, are robust to dataset size and heterogeneity, and their calculation demands less computing time and memory than other available methods, enabling the processing of large datasets in a few minutes even on machines with limited computing power. UCell can be applied to any single-cell data matrix, and includes functions to directly interact with SingleCellExperiment and Seurat objects.

Maintained by Massimo Andreatta. Last updated 5 months ago.

singlecell genesetenrichment transcriptomics geneexpression cellbasedassays

143 stars 10.43 score 454 scripts 2 dependents

ropensci

rebird:R Client for the eBird Database of Bird Observations

A programmatic client for the eBird database (<https://ebird.org/home>), including functions for searching for bird observations by geographic location (latitude, longitude), eBird hotspots, location identifiers, by notable sightings, by region, and by taxonomic name.

Maintained by Sebastian Pardo. Last updated 2 months ago.

birds birding ebird database data biology observations sightings ornithology ebird-api ebird-webservices spocc

90 stars 10.43 score 73 scripts 6 dependents

ropensci

robotstxt:A 'robots.txt' Parser and 'Webbot'/'Spider'/'Crawler' Permissions Checker

Provides functions to download and parse 'robots.txt' files. Ultimately the package makes it easy to check whether bots (spiders, crawlers, scrapers, ...) are allowed to access specific resources on a domain.

Maintained by Jordan Bradford. Last updated 4 months ago.

crawler peer-reviewed robotstxt scraper spider webscraping

68 stars 10.43 score 414 scripts 7 dependents

bioc

oligo:Preprocessing tools for oligonucleotide arrays

A package to analyze oligonucleotide arrays (expression/SNP/tiling/exon) at probe-level. It currently supports Affymetrix (CEL files) and NimbleGen arrays (XYS files).

Maintained by Benilton Carvalho. Last updated 21 days ago.

microarray onechannel twochannel preprocessing snp differentialexpression exonarray geneexpression dataimport zlib

3 stars 10.42 score 528 scripts 10 dependents

bioc

scRepertoire:A toolkit for single-cell immune receptor profiling

scRepertoire is a toolkit for processing and analyzing single-cell T-cell receptor (TCR) and immunoglobulin (Ig) data. The scRepertoire framework supports use of 10x, AIRR, BD, MiXCR, Omniscope, TRUST4, and WAT3R single-cell formats. The functionality includes basic clonal analyses, repertoire summaries, distance-based clustering and interaction with the popular Seurat and SingleCellExperiment/Bioconductor R workflows.

Maintained by Nick Borcherding. Last updated 10 days ago.

software immunooncology singlecell classification annotation sequencing cpp

327 stars 10.42 score 240 scripts

azure

AzureAuth:Authentication Services for Azure Active Directory

Provides Azure Active Directory (AAD) authentication functionality for R users of Microsoft's 'Azure' cloud <https://azure.microsoft.com/>. Use this package to obtain 'OAuth' 2.0 tokens for services including Azure Resource Manager, Azure Storage and others. It supports both AAD v1.0 and v2.0, as well as multiple authentication methods, including device code and resource owner grant. Tokens are cached in a user-specific directory obtained using the 'rappdirs' package. The interface is based on the 'OAuth' framework in the 'httr' package, but customised and streamlined for Azure. Part of the 'AzureR' family of packages.

Maintained by Hong Ooi. Last updated 3 years ago.

azure azure-active-directory azure-sdk-r oauth2

44 stars 10.42 score 57 scripts 23 dependents

posit-dev

connectapi:Utilities for Interacting with the 'Posit Connect' Server API

Provides a helpful 'R6' class and methods for interacting with the 'Posit Connect' Server API along with some meaningful utility functions for regular tasks. API documentation varies by 'Posit Connect' installation and version, but the latest documentation is also hosted publicly at <https://docs.posit.co/connect/api/>.

Maintained by Toph Allen. Last updated 4 days ago.

api-client rstudio-connect

47 stars 10.42 score 252 scripts 1 dependents