R-universe search: needs:blob

tidyverse

tidyverse:Easily Install and Load the 'Tidyverse'

The 'tidyverse' is a set of packages that work in harmony because they share common data representations and 'API' design. This package is designed to make it easy to install and load multiple 'tidyverse' packages in a single step. Learn more about the 'tidyverse' at <https://www.tidyverse.org>.

Maintained by Hadley Wickham. Last updated 5 months ago.

data-science tidyverse

1.7k stars 20.23 score 664k scripts 125 dependents

tidyverse

dbplyr:A 'dplyr' Back End for Databases

A 'dplyr' back end for databases that allows you to work with remote database tables as if they are in-memory data frames. Basic features works with any database that has a 'DBI' back end; more advanced features require 'SQL' translation to be provided by the package author.

Maintained by Hadley Wickham. Last updated 4 months ago.

database

481 stars 19.72 score 5.2k scripts 736 dependents

r-dbi

RSQLite:SQLite Interface for R

Embeds the SQLite database engine in R and provides an interface compliant with the DBI package. The source for the SQLite engine and for various extensions in a recent version is included. System libraries will never be consulted because this package relies on static linking for the plugins it includes; this also ensures a consistent experience across all installations.

Maintained by Kirill Müller. Last updated 2 days ago.

database sqlite3 cpp

331 stars 18.78 score 8.1k scripts 1.1k dependents

bioc

clusterProfiler:A universal enrichment tool for interpreting omics data

This package supports functional characteristics of both coding and non-coding genomics data for thousands of species with up-to-date gene annotation. It provides a univeral interface for gene functional annotation from a variety of sources and thus can be applied in diverse scenarios. It provides a tidy interface to access, manipulate, and visualize enrichment results to help users achieve efficient data interpretation. Datasets obtained from multiple treatments and time points can be analyzed and compared in a single run, easily revealing functional consensus and differences among distinct conditions.

Maintained by Guangchuang Yu. Last updated 4 months ago.

annotation clustering genesetenrichment go kegg multiplecomparison pathways reactome visualization enrichment-analysis gsea

1.1k stars 17.03 score 11k scripts 48 dependents

r-dbi

odbc:Connect to ODBC Compatible Databases (using the DBI Interface)

A DBI-compatible interface to ODBC databases.

Maintained by Hadley Wickham. Last updated 4 days ago.

database odbc unixodbc cpp

396 stars 16.31 score 2.9k scripts 23 dependents

bioc

biomaRt:Interface to BioMart databases (i.e. Ensembl)

In recent years a wealth of biological data has become available in public data repositories. Easy access to these valuable data resources and firm integration with data analysis is needed for comprehensive bioinformatics data analysis. biomaRt provides an interface to a growing collection of databases implementing the BioMart software suite (<http://www.biomart.org>). The package enables retrieval of large amounts of data in a uniform way without the need to know the underlying database schemas or write complex SQL queries. The most prominent examples of BioMart databases are maintain by Ensembl, which provides biomaRt users direct access to a diverse set of data and enables a wide range of powerful online queries from gene annotation to database mining.

Maintained by Mike Smith. Last updated 17 days ago.

annotation bioconductor biomart ensembl

38 stars 15.99 score 13k scripts 230 dependents

bioc

enrichplot:Visualization of Functional Enrichment Result

The 'enrichplot' package implements several visualization methods for interpreting functional enrichment results obtained from ORA or GSEA analysis. It is mainly designed to work with the 'clusterProfiler' package suite. All the visualization methods are developed based on 'ggplot2' graphics.

Maintained by Guangchuang Yu. Last updated 3 months ago.

annotation genesetenrichment go kegg pathways software visualization enrichment-analysis pathway-analysis

239 stars 15.71 score 3.1k scripts 58 dependents

bioc

GenomicFeatures:Query the gene models of a given organism/assembly

Extract the genomic locations of genes, transcripts, exons, introns, and CDS, for the gene models stored in a TxDb object. A TxDb object is a small database that contains the gene models of a given organism/assembly. Bioconductor provides a small collection of TxDb objects in the form of ready-to-install TxDb packages for the most commonly studied organisms. Additionally, the user can easily make a TxDb object (or package) for the organism/assembly of their choice by using the tools from the txdbmaker package.

Maintained by H. Pagès. Last updated 5 months ago.

genetics infrastructure annotation sequencing genomeannotation bioconductor-package core-package

26 stars 15.34 score 5.3k scripts 339 dependents

sparklyr

sparklyr:R Interface to Apache Spark

R interface to Apache Spark, a fast and general engine for big data processing, see <https://spark.apache.org/>. This package supports connecting to local and remote Apache Spark clusters, provides a 'dplyr' compatible back-end, and provides an interface to Spark's built-in machine learning algorithms.

Maintained by Edgar Ruiz. Last updated 13 days ago.

apache-spark distributed dplyr ide livy machine-learning remote-clusters spark sparklyr

959 stars 15.20 score 4.0k scripts 21 dependents

bioc

AnnotationDbi:Manipulation of SQLite-based annotations in Bioconductor

Implements a user-friendly interface for querying SQLite-based annotation data packages.

Maintained by Bioconductor Package Maintainer. Last updated 5 months ago.

annotation microarray sequencing genomeannotation bioconductor-package core-package

9 stars 15.05 score 3.6k scripts 769 dependents

bioc

DOSE:Disease Ontology Semantic and Enrichment analysis

This package implements five methods proposed by Resnik, Schlicker, Jiang, Lin and Wang respectively for measuring semantic similarities among DO terms and gene products. Enrichment analyses including hypergeometric model and gene set enrichment analysis are also implemented for discovering disease associations of high-throughput biological data.

Maintained by Guangchuang Yu. Last updated 5 months ago.

annotation visualization multiplecomparison genesetenrichment pathways software disease-ontology enrichment-analysis semantic-similarity

119 stars 14.97 score 2.0k scripts 61 dependents

r-dbi

RPostgres:C++ Interface to PostgreSQL

Fully DBI-compliant C++-backed interface to PostgreSQL <https://www.postgresql.org/>, an open-source relational database.

Maintained by Kirill Müller. Last updated 1 months ago.

database postgres postgresql cpp

338 stars 14.78 score 1.6k scripts 31 dependents

bioc

GSVA:Gene Set Variation Analysis for Microarray and RNA-Seq Data

Gene Set Variation Analysis (GSVA) is a non-parametric, unsupervised method for estimating variation of gene set enrichment through the samples of a expression data set. GSVA performs a change in coordinate systems, transforming the data from a gene by sample matrix to a gene-set by sample matrix, thereby allowing the evaluation of pathway enrichment for each sample. This new matrix of GSVA enrichment scores facilitates applying standard analytical methods like functional enrichment, survival analysis, clustering, CNV-pathway analysis or cross-tissue pathway analysis, in a pathway-centric manner.

Maintained by Robert Castelo. Last updated 10 days ago.

functionalgenomics microarray rnaseq pathways genesetenrichment gene-set-enrichment genomics pathway-enrichment-analysis

212 stars 14.74 score 1.6k scripts 19 dependents

bioc

TCGAbiolinks:TCGAbiolinks: An R/Bioconductor package for integrative analysis with GDC data

The aim of TCGAbiolinks is : i) facilitate the GDC open-access data retrieval, ii) prepare the data using the appropriate pre-processing strategies, iii) provide the means to carry out different standard analyses and iv) to easily reproduce earlier research results. In more detail, the package provides multiple methods for analysis (e.g., differential expression analysis, identifying differentially methylated regions) and methods for visualization (e.g., survival plots, volcano plots, starburst plots) in order to easily develop complete analysis pipelines.

Maintained by Tiago Chedraoui Silva. Last updated 1 months ago.

dnamethylation differentialmethylation generegulation geneexpression methylationarray differentialexpression pathways network sequencing survival software bioc bioconductor gdc integrative-analysis tcga tcga-data tcgabiolinks

310 stars 14.47 score 1.6k scripts 6 dependents

bioc

GOSemSim:GO-terms Semantic Similarity Measures

The semantic comparisons of Gene Ontology (GO) annotations provide quantitative ways to compute similarities between genes and gene groups, and have became important basis for many bioinformatics analysis approaches. GOSemSim is an R package for semantic similarity computation among GO terms, sets of GO terms, gene products and gene clusters. GOSemSim implemented five methods proposed by Resnik, Schlicker, Jiang, Lin and Wang respectively.

Maintained by Guangchuang Yu. Last updated 5 months ago.

annotation go clustering pathways network software bioinformatics gene-ontology semantic-similarity cpp

63 stars 14.12 score 708 scripts 68 dependents

bioc

ensembldb:Utilities to create and use Ensembl-based annotation databases

The package provides functions to create and use transcript centric annotation databases/packages. The annotation for the databases are directly fetched from Ensembl using their Perl API. The functionality and data is similar to that of the TxDb packages from the GenomicFeatures package, but, in addition to retrieve all gene/transcript models and annotations from the database, ensembldb provides a filter framework allowing to retrieve annotations for specific entries like genes encoded on a chromosome region or transcript models of lincRNA genes. EnsDb databases built with ensembldb contain also protein annotations and mappings between proteins and their encoding transcripts. Finally, ensembldb provides functions to map between genomic, transcript and protein coordinates.

Maintained by Johannes Rainer. Last updated 5 months ago.

genetics annotationdata sequencing coverage annotation bioconductor bioconductor-packages ensembl

35 stars 14.08 score 892 scripts 108 dependents

bioc

AnnotationHub:Client to access AnnotationHub resources

This package provides a client for the Bioconductor AnnotationHub web resource. The AnnotationHub web resource provides a central location where genomic files (e.g., VCF, bed, wig) and other resources from standard locations (e.g., UCSC, Ensembl) can be discovered. The resource includes metadata about each resource, e.g., a textual description, tags, and date of modification. The client creates and manages a local cache of files retrieved by the user, helping with quick and reproducible access.

Maintained by Bioconductor Package Maintainer. Last updated 5 months ago.

infrastructure dataimport gui thirdpartyclient core-package u24ca289073

17 stars 13.88 score 2.7k scripts 104 dependents

bioc

BiocFileCache:Manage Files Across Sessions

This package creates a persistent on-disk cache of files that the user can add, update, and retrieve. It is useful for managing resources (such as custom Txdb objects) that are costly or difficult to create, web resources, and data files used across sessions.

Maintained by Lori Shepherd. Last updated 2 months ago.

dataimport core-package u24ca289073

13 stars 13.76 score 486 scripts 436 dependents

bioc

Gviz:Plotting data and annotation information along genomic coordinates

Genomic data analyses requires integrated visualization of known genomic information and new experimental data. Gviz uses the biomaRt and the rtracklayer packages to perform live annotation queries to Ensembl and UCSC and translates this to e.g. gene/transcript structures in viewports of the grid graphics package. This results in genomic information plotted together with your data.

Maintained by Robert Ivanek. Last updated 5 months ago.

visualization microarray sequencing

79 stars 13.05 score 1.4k scripts 46 dependents

bioc

ChIPseeker:ChIPseeker for ChIP peak Annotation, Comparison, and Visualization

This package implements functions to retrieve the nearest genes around the peak, annotate genomic region of the peak, statstical methods for estimate the significance of overlap among ChIP peak data sets, and incorporate GEO database for user to compare the own dataset with those deposited in database. The comparison can be used to infer cooperative regulation and thus can be used to generate hypotheses. Several visualization functions are implemented to summarize the coverage of the peak experiment, average profile and heatmap of peaks binding to TSS regions, genomic annotation, distance to TSS, and overlap of peaks or genes.

Maintained by Guangchuang Yu. Last updated 5 months ago.

annotation chipseq software visualization multiplecomparison atac-seq chip-seq comparison epigenetics epigenomics

233 stars 13.05 score 1.6k scripts 5 dependents

ggrothendieck

sqldf:Manipulate R Data Frames Using SQL

The sqldf() function is typically passed a single argument which is an SQL select statement where the table names are ordinary R data frame names. sqldf() transparently sets up a database, imports the data frames into that database, performs the SQL select or other statement and returns the result using a heuristic to determine which class to assign to each column of the returned data frame. The sqldf() or read.csv.sql() functions can also be used to read filtered files into R even if the original files are larger than R itself can handle. 'RSQLite', 'RH2', 'RMySQL' and 'RPostgreSQL' backends are supported.

Maintained by G. Grothendieck. Last updated 3 years ago.

250 stars 13.04 score 8.1k scripts 52 dependents

bioc

minfi:Analyze Illumina Infinium DNA methylation arrays

Tools to analyze & visualize Illumina Infinium methylation arrays.

Maintained by Kasper Daniel Hansen. Last updated 4 months ago.

immunooncology dnamethylation differentialmethylation epigenetics microarray methylationarray multichannel twochannel dataimport normalization preprocessing qualitycontrol

60 stars 12.82 score 996 scripts 27 dependents

bioc

SpatialExperiment:S4 Class for Spatially Resolved -omics Data

Defines an S4 class for storing data from spatial -omics experiments. The class extends SingleCellExperiment to support storage and retrieval of additional information from spot-based and molecule-based platforms, including spatial coordinates, images, and image metadata. A specialized constructor function is included for data from the 10x Genomics Visium platform.

Maintained by Dario Righelli. Last updated 5 months ago.

datarepresentation dataimport infrastructure immunooncology geneexpression transcriptomics singlecell spatial

59 stars 12.63 score 1.8k scripts 71 dependents

ohdsi

DatabaseConnector:Connecting to Various Database Platforms

An R 'DataBase Interface' ('DBI') compatible interface to various database platforms ('PostgreSQL', 'Oracle', 'Microsoft SQL Server', 'Amazon Redshift', 'Microsoft Parallel Database Warehouse', 'IBM Netezza', 'Apache Impala', 'Google BigQuery', 'Snowflake', 'Spark', 'SQLite', and 'InterSystems IRIS'). Also includes support for fetching data as 'Andromeda' objects. Uses either 'Java Database Connectivity' ('JDBC') or other 'DBI' drivers to connect to databases.

Maintained by Martijn Schuemie. Last updated 2 months ago.

hades openjdk

56 stars 12.63 score 772 scripts 11 dependents

bioc

TFBSTools:Software Package for Transcription Factor Binding Site (TFBS) Analysis

TFBSTools is a package for the analysis and manipulation of transcription factor binding sites. It includes matrices conversion between Position Frequency Matirx (PFM), Position Weight Matirx (PWM) and Information Content Matrix (ICM). It can also scan putative TFBS from sequence/alignment, query JASPAR database and provides a wrapper of de novo motif discovery software.

Maintained by Ge Tan. Last updated 19 days ago.

motifannotation generegulation motifdiscovery transcription alignment

28 stars 12.36 score 1.1k scripts 18 dependents

bioc

ReactomePA:Reactome Pathway Analysis

This package provides functions for pathway analysis based on REACTOME pathway database. It implements enrichment analysis, gene set enrichment analysis and several functions for visualization. This package is not affiliated with the Reactome team.

Maintained by Guangchuang Yu. Last updated 5 months ago.

pathways visualization annotation multiplecomparison genesetenrichment reactome enrichment-analysis reactome-pathway-analysis reactomepa

40 stars 12.25 score 1.5k scripts 7 dependents

bioc

ggbio:Visualization tools for genomic data

The ggbio package extends and specializes the grammar of graphics for biological data. The graphics are designed to answer common scientific questions, in particular those often asked of high throughput genomics data. All core Bioconductor data structures are supported, where appropriate. The package supports detailed views of particular genomic regions, as well as genome-wide overviews. Supported overviews include ideograms and grand linear views. High-level plots include sequence fragment length, edge-linked interval to data view, mismatch pileup, and several splicing summaries.

Maintained by Michael Lawrence. Last updated 5 months ago.

infrastructure visualization

111 stars 12.23 score 734 scripts 16 dependents

r-dbi

RMariaDB:Database Interface and MariaDB Driver

Implements a DBI-compliant interface to MariaDB (<https://mariadb.org/>) and MySQL (<https://www.mysql.com/>) databases.

Maintained by Kirill Müller. Last updated 1 months ago.

database mariadb mysql cpp

133 stars 12.20 score 792 scripts 10 dependents

bioc

ExperimentHub:Client to access ExperimentHub resources

This package provides a client for the Bioconductor ExperimentHub web resource. ExperimentHub provides a central location where curated data from experiments, publications or training courses can be accessed. Each resource has associated metadata, tags and date of modification. The client creates and manages a local cache of files retrieved enabling quick and reproducible access.

Maintained by Bioconductor Package Maintainer. Last updated 5 months ago.

infrastructure dataimport gui thirdpartyclient core-package u24ca289073

10 stars 11.94 score 764 scripts 57 dependents

pecanproject

PEcAn.DB:PEcAn Functions Used for Ecological Forecasts and Reanalysis

The Predictive Ecosystem Carbon Analyzer (PEcAn) is a scientific workflow management tool that is designed to simplify the management of model parameterization, execution, and analysis. The goal of PECAn is to streamline the interaction between data and models, and to improve the efficacy of scientific investigation.

Maintained by David LeBauer. Last updated 19 hours ago.

bayesian cyberinfrastructure data-assimilation data-science ecosystem-model ecosystem-science forecasting meta-analysis national-science-foundation pecan plants

216 stars 11.91 score 127 scripts 27 dependents

pecanproject

PEcAn.data.atmosphere:PEcAn Functions Used for Managing Climate Driver Data

The Predictive Ecosystem Carbon Analyzer (PEcAn) is a scientific workflow management tool that is designed to simplify the management of model parameterization, execution, and analysis. The PECAn.data.atmosphere package converts climate driver data into a standard format for models integrated into PEcAn. As a standalone package, it provides an interface to access diverse climate data sets.

Maintained by David LeBauer. Last updated 19 hours ago.

bayesian cyberinfrastructure data-assimilation data-science ecosystem-model ecosystem-science forecasting meta-analysis national-science-foundation pecan plants

216 stars 11.63 score 64 scripts 14 dependents

bioc

bumphunter:Bump Hunter

Tools for finding bumps in genomic data

Maintained by Tamilselvi Guharaj. Last updated 5 months ago.

dnamethylation epigenetics infrastructure multiplecomparison immunooncology

16 stars 11.61 score 210 scripts 43 dependents

darwin-eu

CDMConnector:Connect to an OMOP Common Data Model

Provides tools for working with observational health data in the Observational Medical Outcomes Partnership (OMOP) Common Data Model format with a pipe friendly syntax. Common data model database table references are stored in a single compound object along with metadata.

Maintained by Adam Black. Last updated 1 months ago.

12 stars 11.43 score 502 scripts 12 dependents

bioc

annotate:Annotation for microarrays

Using R enviroments for annotation.

Maintained by Bioconductor Package Maintainer. Last updated 5 months ago.

annotation pathways go

11.41 score 812 scripts 239 dependents

bioc

VariantAnnotation:Annotation of Genetic Variants

Annotate variants, compute amino acid coding changes, predict coding outcomes.

Maintained by Bioconductor Package Maintainer. Last updated 3 months ago.

dataimport sequencing snp annotation genetics variantannotation curl bzip2 xz-utils zlib

11.39 score 1.9k scripts 152 dependents

bioc

pathview:a tool set for pathway based data integration and visualization

Pathview is a tool set for pathway based data integration and visualization. It maps and renders a wide variety of biological data on relevant pathway graphs. All users need is to supply their data and specify the target pathway. Pathview automatically downloads the pathway graph data, parses the data file, maps user data to the pathway, and render pathway graph with the mapped data. In addition, Pathview also seamlessly integrates with pathway and gene set (enrichment) analysis tools for large-scale and fully automated analysis.

Maintained by Weijun Luo. Last updated 3 days ago.

pathways graphandnetwork visualization genesetenrichment differentialexpression geneexpression microarray rnaseq genetics metabolomics proteomics systemsbiology sequencing

40 stars 11.37 score 1.6k scripts 10 dependents

ropensci

biomartr:Genomic Data Retrieval

Perform large scale genomic data retrieval and functional annotation retrieval. This package aims to provide users with a standardized way to automate genome, proteome, 'RNA', coding sequence ('CDS'), 'GFF', and metagenome retrieval from 'NCBI RefSeq', 'NCBI Genbank', 'ENSEMBL', and 'UniProt' databases. Furthermore, an interface to the 'BioMart' database (Smedley et al. (2009) <doi:10.1186/1471-2164-10-22>) allows users to retrieve functional annotation for genomic loci. In addition, users can download entire databases such as 'NCBI RefSeq' (Pruitt et al. (2007) <doi:10.1093/nar/gkl842>), 'NCBI nr', 'NCBI nt', 'NCBI Genbank' (Benson et al. (2013) <doi:10.1093/nar/gks1195>), etc. with only one command.

Maintained by Hajk-Georg Drost. Last updated 2 months ago.

biomart genomic-data-retrieval annotation-retrieval database-retrieval ncbi ensembl biological-data-retrieval ensembl-servers genome genome-annotation genome-retrieval genomics meta-analysis metagenomics ncbi-genbank peer-reviewed proteome sequenced-genomes

218 stars 11.35 score 129 scripts 3 dependents

bioc

karyoploteR:Plot customizable linear genomes displaying arbitrary data

karyoploteR creates karyotype plots of arbitrary genomes and offers a complete set of functions to plot arbitrary data on them. It mimicks many R base graphics functions coupling them with a coordinate change function automatically mapping the chromosome and data coordinates into the plot coordinates. In addition to the provided data plotting functions, it is easy to add new ones.

Maintained by Bernat Gel. Last updated 5 months ago.

visualization copynumbervariation sequencing coverage dnaseq chipseq methylseq dataimport onechannel bioconductor bioinformatics data-visualization genome genomics-visualization plotting-in-r

307 stars 11.25 score 656 scripts 4 dependents

jamiemkass

ENMeval:Automated Tuning and Evaluations of Ecological Niche Models

Runs ecological niche models over all combinations of user-defined settings (i.e., tuning), performs cross validation to evaluate models, and returns data tables to aid in selection of optimal model settings that balance goodness-of-fit and model complexity. Also has functions to partition data spatially (or not) for cross validation, to plot multiple visualizations of results, to run null models to estimate significance and effect sizes of performance metrics, and to calculate range overlap between model predictions, among others. The package was originally built for Maxent models (Phillips et al. 2006, Phillips et al. 2017), but the current version allows possible extensions for any modeling algorithm. The extensive vignette, which guides users through most package functionality but unfortunately has a file size too big for CRAN, can be found here on the package's Github Pages website: <https://jamiemkass.github.io/ENMeval/articles/ENMeval-2.0-vignette.html>.

Maintained by Jamie M. Kass. Last updated 3 days ago.

49 stars 11.16 score 332 scripts 2 dependents

bioc

genefilter:genefilter: methods for filtering genes from high-throughput experiments

Some basic functions for filtering genes.

Maintained by Bioconductor Package Maintainer. Last updated 5 months ago.

microarray fortran cpp

11.11 score 2.4k scripts 143 dependents

ohdsi

PatientLevelPrediction:Develop Clinical Prediction Models Using the Common Data Model

A user friendly way to create patient level prediction models using the Observational Medical Outcomes Partnership Common Data Model. Given a cohort of interest and an outcome of interest, the package can use data in the Common Data Model to build a large set of features. These features can then be used to fit a predictive model with a number of machine learning algorithms. This is further described in Reps (2017) <doi:10.1093/jamia/ocy032>.

Maintained by Egill Fridgeirsson. Last updated 24 days ago.

hades openjdk

190 stars 10.85 score 297 scripts

apache

apache.sedona:R Interface for Apache Sedona

R interface for 'Apache Sedona' based on 'sparklyr' (<https://sedona.apache.org>).

Maintained by Apache Sedona. Last updated 5 hours ago.

cluster-computing geospatial java python scala spatial-analysis spatial-query spatial-sql

2.0k stars 10.73 score 105 scripts

pecanproject

PEcAn.benchmark:PEcAn Functions Used for Benchmarking

The Predictive Ecosystem Carbon Analyzer (PEcAn) is a scientific workflow management tool that is designed to simplify the management of model parameterization, execution, and analysis. The goal of PEcAn is to streamline the interaction between data and models, and to improve the efficacy of scientific investigation. The PEcAn.benchmark package provides utilities for comparing models and data, including a suite of statistical metrics and plots.

Maintained by Mike Dietze. Last updated 19 hours ago.

bayesian cyberinfrastructure data-assimilation data-science ecosystem-model ecosystem-science forecasting meta-analysis national-science-foundation pecan plants

216 stars 10.73 score 416 scripts 11 dependents

rstudio

pointblank:Data Validation and Organization of Metadata for Local and Remote Tables

Validate data in data frames, 'tibble' objects, 'Spark' 'DataFrames', and database tables. Validation pipelines can be made using easily-readable, consecutive validation steps. Upon execution of the validation plan, several reporting options are available. User-defined thresholds for failure rates allow for the determination of appropriate reporting actions. Many other workflows are available including an information management workflow, where the aim is to record, collect, and generate useful information on data tables.

Maintained by Richard Iannone. Last updated 5 days ago.

data-assertions data-checker data-dictionaries data-frames data-inference data-management data-profiler data-quality data-validation data-verification database-tables easy-to-understand reporting-tool schema-validation testing-tools yaml-configuration

942 stars 10.73 score 284 scripts

bioc

GWASTools:Tools for Genome Wide Association Studies

Classes for storing very large GWAS data sets and annotation, and functions for GWAS data cleaning and analysis.

Maintained by Stephanie M. Gogarten. Last updated 13 days ago.

snp geneticvariability qualitycontrol microarray

17 stars 10.67 score 396 scripts 5 dependents

ohdsi

FeatureExtraction:Generating Features for a Cohort

An R interface for generating features for a cohort using data in the Common Data Model. Features can be constructed using default or custom made feature definitions. Furthermore it's possible to aggregate features and get the summary statistics.

Maintained by Ger Inberg. Last updated 11 days ago.

hades openjdk

62 stars 10.64 score 209 scripts 2 dependents

bioc

tximeta:Transcript Quantification Import with Automatic Metadata

Transcript quantification import from Salmon and other quantifiers with automatic attachment of transcript ranges and release information, and other associated metadata. De novo transcriptomes can be linked to the appropriate sources with linkedTxomes and shared for computational reproducibility.

Maintained by Michael Love. Last updated 2 months ago.

annotation genomeannotation dataimport preprocessing rnaseq singlecell transcriptomics transcription geneexpression functionalgenomics reproducibleresearch reportwriting immunooncology

67 stars 10.58 score 466 scripts 1 dependents

bioc

ORFik:Open Reading Frames in Genomics

R package for analysis of transcript and translation features through manipulation of sequence data and NGS data like Ribo-Seq, RNA-Seq, TCP-Seq and CAGE. It is generalized in the sense that any transcript region can be analysed, as the name hints to it was made with investigation of ribosomal patterns over Open Reading Frames (ORFs) as it's primary use case. ORFik is extremely fast through use of C++, data.table and GenomicRanges. Package allows to reassign starts of the transcripts with the use of CAGE-Seq data, automatic shifting of RiboSeq reads, finding of Open Reading Frames for whole genomes and much more.

Maintained by Haakon Tjeldnes. Last updated 1 months ago.

immunooncology software sequencing riboseq rnaseq functionalgenomics coverage alignment dataimport cpp

33 stars 10.56 score 115 scripts 2 dependents

datastorm-open

shinymanager:Authentication Management for 'Shiny' Applications

Simple and secure authentification mechanism for single 'Shiny' applications. Credentials can be stored in an encrypted 'SQLite' database or on your own SQL Database (Postgres, MySQL, ...). Source code of main application is protected until authentication is successful.

Maintained by Benoit Thieurmel. Last updated 11 months ago.

shiny shiny-server shinyapps

391 stars 10.51 score 316 scripts 2 dependents

bioc

ballgown:Flexible, isoform-level differential expression analysis

Tools for statistical analysis of assembled transcriptomes, including flexible differential expression analysis, visualization of transcript structures, and matching of assembled transcripts to annotation.

Maintained by Jack Fu. Last updated 5 months ago.

immunooncology rnaseq statisticalmethod preprocessing differentialexpression

145 stars 10.51 score 338 scripts 1 dependents

bioc

GENESIS:GENetic EStimation and Inference in Structured samples (GENESIS): Statistical methods for analyzing genetic data from samples with population structure and/or relatedness

The GENESIS package provides methodology for estimating, inferring, and accounting for population and pedigree structure in genetic analyses. The current implementation provides functions to perform PC-AiR (Conomos et al., 2015, Gen Epi) and PC-Relate (Conomos et al., 2016, AJHG). PC-AiR performs a Principal Components Analysis on genome-wide SNP data for the detection of population structure in a sample that may contain known or cryptic relatedness. Unlike standard PCA, PC-AiR accounts for relatedness in the sample to provide accurate ancestry inference that is not confounded by family structure. PC-Relate uses ancestry representative principal components to adjust for population structure/ancestry and accurately estimate measures of recent genetic relatedness such as kinship coefficients, IBD sharing probabilities, and inbreeding coefficients. Additionally, functions are provided to perform efficient variance component estimation and mixed model association testing for both quantitative and binary phenotypes.

Maintained by Stephanie M. Gogarten. Last updated 2 months ago.

snp geneticvariability genetics statisticalmethod dimensionreduction principalcomponent genomewideassociation qualitycontrol biocviews

36 stars 10.44 score 342 scripts 1 dependents

bioc

oligo:Preprocessing tools for oligonucleotide arrays

A package to analyze oligonucleotide arrays (expression/SNP/tiling/exon) at probe-level. It currently supports Affymetrix (CEL files) and NimbleGen arrays (XYS files).

Maintained by Benilton Carvalho. Last updated 23 days ago.

microarray onechannel twochannel preprocessing snp differentialexpression exonarray geneexpression dataimport zlib

3 stars 10.42 score 528 scripts 10 dependents

egeulgen

pathfindR:Enrichment Analysis Utilizing Active Subnetworks

Enrichment analysis enables researchers to uncover mechanisms underlying a phenotype. However, conventional methods for enrichment analysis do not take into account protein-protein interaction information, resulting in incomplete conclusions. 'pathfindR' is a tool for enrichment analysis utilizing active subnetworks. The main function identifies active subnetworks in a protein-protein interaction network using a user-provided list of genes and associated p values. It then performs enrichment analyses on the identified subnetworks, identifying enriched terms (i.e. pathways or, more broadly, gene sets) that possibly underlie the phenotype of interest. 'pathfindR' also offers functionalities to cluster the enriched terms and identify representative terms in each cluster, to score the enriched terms per sample and to visualize analysis results. The enrichment, clustering and other methods implemented in 'pathfindR' are described in detail in Ulgen E, Ozisik O, Sezerman OU. 2019. 'pathfindR': An R Package for Comprehensive Identification of Enriched Pathways in Omics Data Through Active Subnetworks. Front. Genet. <doi:10.3389/fgene.2019.00858>.

Maintained by Ege Ulgen. Last updated 1 months ago.

active-subnetworks enrichment pathway pathway-enrichment-analysis subnetwork

187 stars 10.38 score 138 scripts

bcgov

bcdata:Search and Retrieve Data from the BC Data Catalogue

Search, query, and download tabular and 'geospatial' data from the British Columbia Data Catalogue (<https://catalogue.data.gov.bc.ca/>). Search catalogue data records based on keywords, data licence, sector, data format, and B.C. government organization. View metadata directly in R, download many data formats, and query 'geospatial' data available via the B.C. government Web Feature Service ('WFS') using 'dplyr' syntax.

Maintained by Andy Teucher. Last updated 5 days ago.

bcdc citz data-science env

83 stars 10.36 score 186 scripts 4 dependents

bioc

pRoloc:A unifying bioinformatics framework for spatial proteomics

The pRoloc package implements machine learning and visualisation methods for the analysis and interogation of quantitiative mass spectrometry data to reliably infer protein sub-cellular localisation.

Maintained by Lisa Breckels. Last updated 4 days ago.

immunooncology proteomics massspectrometry classification clustering qualitycontrol bioconductor proteomics-data spatial-proteomics visualisation openblas cpp

15 stars 10.31 score 101 scripts 2 dependents

bioc

GSEABase:Gene set enrichment data structures and methods

This package provides classes and methods to support Gene Set Enrichment Analysis (GSEA).

Maintained by Bioconductor Package Maintainer. Last updated 2 months ago.

geneexpression genesetenrichment graphandnetwork go kegg

10.27 score 1.5k scripts 77 dependents

bioc

graphite:GRAPH Interaction from pathway Topological Environment

Graph objects from pathway topology derived from KEGG, Panther, PathBank, PharmGKB, Reactome SMPDB and WikiPathways databases.

Maintained by Gabriele Sales. Last updated 5 months ago.

pathways thirdpartyclient graphandnetwork network reactome kegg metabolomics bioinformatics mirror pathway-analysis

8 stars 10.24 score 122 scripts 21 dependents

bioc

EDASeq:Exploratory Data Analysis and Normalization for RNA-Seq

Numerical and graphical summaries of RNA-Seq read data. Within-lane normalization procedures to adjust for GC-content effect (or other gene-level effects) on read counts: loess robust local regression, global-scaling, and full-quantile normalization (Risso et al., 2011). Between-lane normalization procedures to adjust for distributional differences between lanes (e.g., sequencing depth): global-scaling and full-quantile normalization (Bullard et al., 2010).

Maintained by Davide Risso. Last updated 5 months ago.

immunooncology sequencing rnaseq preprocessing qualitycontrol differentialexpression

5 stars 10.24 score 594 scripts 9 dependents

idigbio

ridigbio:Interface to the iDigBio Data API

An interface to iDigBio's search API that allows downloading specimen records. Searches are returned as a data.frame. Other functions such as the metadata end points return lists of information. iDigBio is a US project focused on digitizing and serving museum specimen collections on the web. See <https://www.idigbio.org> for information on iDigBio.

Maintained by Jesse Bennett. Last updated 20 days ago.

16 stars 10.23 score 63 scripts 7 dependents

bioc

zinbwave:Zero-Inflated Negative Binomial Model for RNA-Seq Data

Implements a general and flexible zero-inflated negative binomial model that can be used to provide a low-dimensional representations of single-cell RNA-seq data. The model accounts for zero inflation (dropouts), over-dispersion, and the count nature of the data. The model also accounts for the difference in library sizes and optionally for batch effects and/or other covariates, avoiding the need for pre-normalize the data.

Maintained by Davide Risso. Last updated 5 months ago.

immunooncology dimensionreduction geneexpression rnaseq software transcriptomics sequencing singlecell

43 stars 10.21 score 190 scripts 6 dependents

bioc

cBioPortalData:Exposes and Makes Available Data from the cBioPortal Web Resources

The cBioPortalData R package accesses study datasets from the cBio Cancer Genomics Portal. It accesses the data either from the pre-packaged zip / tar files or from the API interface that was recently implemented by the cBioPortal Data Team. The package can provide data in either tabular format or with MultiAssayExperiment object that uses familiar Bioconductor data representations.

Maintained by Marcel Ramos. Last updated 10 days ago.

software infrastructure thirdpartyclient bioconductor-package nci-itcr u24ca289073

33 stars 10.17 score 147 scripts 4 dependents

bioc

singleCellTK:Comprehensive and Interactive Analysis of Single Cell RNA-Seq Data

The Single Cell Toolkit (SCTK) in the singleCellTK package provides an interface to popular tools for importing, quality control, analysis, and visualization of single cell RNA-seq data. SCTK allows users to seamlessly integrate tools from various packages at different stages of the analysis workflow. A general "a la carte" workflow gives users the ability access to multiple methods for data importing, calculation of general QC metrics, doublet detection, ambient RNA estimation and removal, filtering, normalization, batch correction or integration, dimensionality reduction, 2-D embedding, clustering, marker detection, differential expression, cell type labeling, pathway analysis, and data exporting. Curated workflows can be used to run Seurat and Celda. Streamlined quality control can be performed on the command line using the SCTK-QC pipeline. Users can analyze their data using commands in the R console or by using an interactive Shiny Graphical User Interface (GUI). Specific analyses or entire workflows can be summarized and shared with comprehensive HTML reports generated by Rmarkdown. Additional documentation and vignettes can be found at camplab.net/sctk.

Maintained by Joshua David Campbell. Last updated 1 months ago.

singlecell geneexpression differentialexpression alignment clustering immunooncology batcheffect normalization qualitycontrol dataimport gui

182 stars 10.17 score 252 scripts

ropensci

rfishbase:R Interface to 'FishBase'

A programmatic interface to 'FishBase', re-written based on an accompanying 'RESTful' API. Access tables describing over 30,000 species of fish, their biology, ecology, morphology, and more. This package also supports experimental access to 'SeaLifeBase' data, which contains nearly 200,000 species records for all types of aquatic life not covered by 'FishBase.'

Maintained by Carl Boettiger. Last updated 3 months ago.

fish fishbase taxonomy

116 stars 10.11 score 764 scripts 2 dependents

ropensci

spocc:Interface to Species Occurrence Data Sources

A programmatic interface to many species occurrence data sources, including Global Biodiversity Information Facility ('GBIF'), 'iNaturalist', 'eBird', Integrated Digitized 'Biocollections' ('iDigBio'), 'VertNet', Ocean 'Biogeographic' Information System ('OBIS'), and Atlas of Living Australia ('ALA'). Includes functionality for retrieving species occurrence data, and combining those data.

Maintained by Hannah Owens. Last updated 2 months ago.

specimens api web-services occurrences species taxonomy gbif inat vertnet ebird idigbio obis ala antweb bison data ecoengine inaturalist occurrence species-occurrence spocc

118 stars 10.09 score 552 scripts 5 dependents

bioc

sva:Surrogate Variable Analysis

The sva package contains functions for removing batch effects and other unwanted variation in high-throughput experiment. Specifically, the sva package contains functions for the identifying and building surrogate variables for high-dimensional data sets. Surrogate variables are covariates constructed directly from high-dimensional data (like gene expression/RNA sequencing/methylation/brain imaging data) that can be used in subsequent analyses to adjust for unknown, unmodeled, or latent sources of noise. The sva package can be used to remove artifacts in three ways: (1) identifying and estimating surrogate variables for unknown sources of variation in high-throughput experiments (Leek and Storey 2007 PLoS Genetics,2008 PNAS), (2) directly removing known batch effects using ComBat (Johnson et al. 2007 Biostatistics) and (3) removing batch effects with known control probes (Leek 2014 biorXiv). Removing batch effects and using surrogate variables in differential expression analysis have been shown to reduce dependence, stabilize error rate estimates, and improve reproducibility, see (Leek and Storey 2007 PLoS Genetics, 2008 PNAS or Leek et al. 2011 Nat. Reviews Genetics).

Maintained by Jeffrey T. Leek. Last updated 5 months ago.

immunooncology microarray statisticalmethod preprocessing multiplecomparison sequencing rnaseq batcheffect normalization

10.04 score 3.2k scripts 50 dependents

pecanproject

PEcAn.settings:PEcAn Settings package

Contains functions to read PEcAn settings files.

Maintained by David LeBauer. Last updated 19 hours ago.

bayesian cyberinfrastructure data-assimilation data-science ecosystem-model ecosystem-science forecasting meta-analysis national-science-foundation pecan plants

216 stars 10.03 score 54 scripts 17 dependents

bioc

BiocCheck:Bioconductor-specific package checks

BiocCheck guides maintainers through Bioconductor best practicies. It runs Bioconductor-specific package checks by searching through package code, examples, and vignettes. Maintainers are required to address all errors, warnings, and most notes produced.

Maintained by Marcel Ramos. Last updated 1 months ago.

infrastructure bioconductor-package core-services

8 stars 10.03 score 114 scripts 6 dependents

bioc

singscore:Rank-based single-sample gene set scoring method

A simple single-sample gene signature scoring method that uses rank-based statistics to analyze the sample's gene expression profile. It scores the expression activities of gene sets at a single-sample level.

Maintained by Malvika Kharbanda. Last updated 5 months ago.

software geneexpression genesetenrichment bioinformatics

41 stars 10.03 score 124 scripts 4 dependents

bioc

derfinder:Annotation-agnostic differential expression analysis of RNA-seq data at base-pair resolution via the DER Finder approach

This package provides functions for annotation-agnostic differential expression analysis of RNA-seq data. Two implementations of the DER Finder approach are included in this package: (1) single base-level F-statistics and (2) DER identification at the expressed regions-level. The DER Finder approach can also be used to identify differentially bounded ChIP-seq peaks.

Maintained by Leonardo Collado-Torres. Last updated 4 months ago.

differentialexpression sequencing rnaseq chipseq differentialpeakcalling software immunooncology coverage annotation-agnostic bioconductor derfinder

42 stars 10.03 score 78 scripts 6 dependents

pecanproject

PEcAn.assim.batch:PEcAn Functions Used for Ecological Forecasts and Reanalysis

The Predictive Ecosystem Carbon Analyzer (PEcAn) is a scientific workflow management tool that is designed to simplify the management of model parameterization, execution, and analysis. The goal of PECAn is to streamline the interaction between data and models, and to improve the efficacy of scientific investigation.

Maintained by Istem Fer. Last updated 19 hours ago.

bayesian cyberinfrastructure data-assimilation data-science ecosystem-model ecosystem-science forecasting meta-analysis national-science-foundation pecan plants jags cpp

216 stars 9.97 score 20 scripts 2 dependents

darwin-eu

omopgenerics:Methods and Classes for the OMOP Common Data Model

Provides definitions of core classes and methods used by analytic pipelines that query the OMOP (Observational Medical Outcomes Partnership) common data model.

Maintained by Martí Català. Last updated 25 days ago.

9.97 score 193 scripts 16 dependents

bioc

goseq:Gene Ontology analyser for RNA-seq and other length biased data

Detects Gene Ontology and/or other user defined categories which are over/under represented in RNA-seq data.

Maintained by Federico Marini. Last updated 5 months ago.

immunooncology sequencing go geneexpression transcription rnaseq differentialexpression annotation genesetenrichment kegg pathways software

2 stars 9.97 score 636 scripts 9 dependents

darwin-eu

PatientProfiles:Identify Characteristics of Patients in the OMOP Common Data Model

Identify the characteristics of patients in data mapped to the Observational Medical Outcomes Partnership (OMOP) common data model.

Maintained by Marti Catala. Last updated 25 days ago.

1 stars 9.97 score 225 scripts 9 dependents

pecanproject

PEcAn.priors:PEcAn Functions Used to Estimate Priors from Data

Functions to estimate priors from data.

Maintained by David LeBauer. Last updated 19 hours ago.

bayesian cyberinfrastructure data-assimilation data-science ecosystem-model ecosystem-science forecasting meta-analysis national-science-foundation pecan plants jags cpp

216 stars 9.96 score 13 scripts 6 dependents

bioc

rGREAT:GREAT Analysis - Functional Enrichment on Genomic Regions

GREAT (Genomic Regions Enrichment of Annotations Tool) is a type of functional enrichment analysis directly performed on genomic regions. This package implements the GREAT algorithm (the local GREAT analysis), also it supports directly interacting with the GREAT web service (the online GREAT analysis). Both analysis can be viewed by a Shiny application. rGREAT by default supports more than 600 organisms and a large number of gene set collections, as well as self-provided gene sets and organisms from users. Additionally, it implements a general method for dealing with background regions.

Maintained by Zuguang Gu. Last updated 19 days ago.

genesetenrichment go pathways software sequencing wholegenome genomeannotation coverage cpp

86 stars 9.96 score 320 scripts 1 dependents

darwin-eu

CodelistGenerator:Identify Relevant Clinical Codes and Evaluate Their Use

Generate a candidate code list for the Observational Medical Outcomes Partnership (OMOP) common data model based on string matching. For a given search strategy, a candidate code list will be returned.

Maintained by Edward Burn. Last updated 5 days ago.

14 stars 9.94 score 165 scripts 4 dependents

pecanproject

PEcAn.MA:PEcAn Functions Used for Meta-Analysis

The Predictive Ecosystem Carbon Analyzer (PEcAn) is a scientific workflow management tool that is designed to simplify the management of model parameterization, execution, and analysis. The goal of PECAn is to streamline the interaction between data and models, and to improve the efficacy of scientific investigation. The PEcAn.MA package contains the functions used in the Bayesian meta-analysis of trait data.

Maintained by David LeBauer. Last updated 19 hours ago.

bayesian cyberinfrastructure data-assimilation data-science ecosystem-model ecosystem-science forecasting meta-analysis national-science-foundation pecan plants jags cpp

216 stars 9.92 score 7 scripts 7 dependents

bioc

RUVSeq:Remove Unwanted Variation from RNA-Seq Data

This package implements the remove unwanted variation (RUV) methods of Risso et al. (2014) for the normalization of RNA-Seq read counts between samples.

Maintained by Davide Risso. Last updated 5 months ago.

immunooncology differentialexpression preprocessing rnaseq software

13 stars 9.91 score 482 scripts 5 dependents

bioc

OmnipathR:OmniPath web service client and more

A client for the OmniPath web service (https://www.omnipathdb.org) and many other resources. It also includes functions to transform and pretty print some of the downloaded data, functions to access a number of other resources such as BioPlex, ConsensusPathDB, EVEX, Gene Ontology, Guide to Pharmacology (IUPHAR/BPS), Harmonizome, HTRIdb, Human Phenotype Ontology, InWeb InBioMap, KEGG Pathway, Pathway Commons, Ramilowski et al. 2015, RegNetwork, ReMap, TF census, TRRUST and Vinayagam et al. 2011. Furthermore, OmnipathR features a close integration with the NicheNet method for ligand activity prediction from transcriptomics data, and its R implementation `nichenetr` (available only on github).

Maintained by Denes Turei. Last updated 1 months ago.

graphandnetwork network pathways software thirdpartyclient dataimport datarepresentation genesignaling generegulation systemsbiology transcriptomics singlecell annotation kegg complexes enzyme-ptm networks networks-biology omnipath proteins quarto

130 stars 9.90 score 226 scripts 2 dependents

bioc

methylumi:Handle Illumina methylation data

This package provides classes for holding and manipulating Illumina methylation data. Based on eSet, it can contain MIAME information, sample information, feature information, and multiple matrices of data. An "intelligent" import function, methylumiR can read the Illumina text files and create a MethyLumiSet. methylumIDAT can directly read raw IDAT files from HumanMethylation27 and HumanMethylation450 microarrays. Normalization, background correction, and quality control features for GoldenGate, Infinium, and Infinium HD arrays are also included.

Maintained by Sean Davis. Last updated 5 months ago.

dnamethylation twochannel preprocessing qualitycontrol cpgisland

9 stars 9.90 score 89 scripts 9 dependents

bioc

PureCN:Copy number calling and SNV classification using targeted short read sequencing

This package estimates tumor purity, copy number, and loss of heterozygosity (LOH), and classifies single nucleotide variants (SNVs) by somatic status and clonality. PureCN is designed for targeted short read sequencing data, integrates well with standard somatic variant detection and copy number pipelines, and has support for tumor samples without matching normal samples.

Maintained by Markus Riester. Last updated 1 days ago.

copynumbervariation software sequencing variantannotation variantdetection coverage immunooncology bioconductor-package cell-free-dna copy-number loh tumor-heterogeneity tumor-mutational-burden tumor-purity

132 stars 9.88 score 40 scripts

bioc

GenVisR:Genomic Visualizations in R

Produce highly customizable publication quality graphics for genomic data primarily at the cohort level.

Maintained by Zachary Skidmore. Last updated 5 months ago.

infrastructure datarepresentation classification dnaseq

217 stars 9.87 score 76 scripts

bioc

annotatr:Annotation of Genomic Regions to Genomic Annotations

Given a set of genomic sites/regions (e.g. ChIP-seq peaks, CpGs, differentially methylated CpGs or regions, SNPs, etc.) it is often of interest to investigate the intersecting genomic annotations. Such annotations include those relating to gene models (promoters, 5'UTRs, exons, introns, and 3'UTRs), CpGs (CpG islands, CpG shores, CpG shelves), or regulatory sequences such as enhancers. The annotatr package provides an easy way to summarize and visualize the intersection of genomic sites/regions with genomic annotations.

Maintained by Raymond G. Cavalcante. Last updated 5 months ago.

software annotation genomeannotation functionalgenomics visualization genome-annotation

26 stars 9.76 score 246 scripts 5 dependents

bioc

RTCGAToolbox:A new tool for exporting TCGA Firehose data

Managing data from large scale projects such as The Cancer Genome Atlas (TCGA) for further analysis is an important and time consuming step for research projects. Several efforts, such as Firehose project, make TCGA pre-processed data publicly available via web services and data portals but it requires managing, downloading and preparing the data for following steps. We developed an open source and extensible R based data client for Firehose pre-processed data and demonstrated its use with sample case studies. Results showed that RTCGAToolbox could improve data management for researchers who are interested with TCGA data. In addition, it can be integrated with other analysis pipelines for following data analysis.

Maintained by Marcel Ramos. Last updated 3 months ago.

differentialexpression geneexpression sequencing

18 stars 9.75 score 76 scripts 5 dependents

ohdsi

CohortConstructor:Build and Manipulate Study Cohorts Using a Common Data Model

Create and manipulate study cohorts in data mapped to the Observational Medical Outcomes Partnership Common Data Model.

Maintained by Edward Burn. Last updated 9 hours ago.

2 stars 9.74 score 207 scripts 2 dependents

prestodb

RPresto:DBI Connector to Presto

Implements a 'DBI' compliant interface to Presto. Presto is an open source distributed SQL query engine for running interactive analytic queries against data sources of all sizes ranging from gigabytes to petabytes: <https://prestodb.io/>.

Maintained by Jarod G.R. Meng. Last updated 2 months ago.

132 stars 9.73 score 25 scripts 4 dependents

femiguez

apsimx:Inspect, Read, Edit and Run 'APSIM' "Next Generation" and 'APSIM' Classic

The functions in this package inspect, read, edit and run files for 'APSIM' "Next Generation" ('JSON') and 'APSIM' "Classic" ('XML'). The files with an 'apsim' extension correspond to 'APSIM' Classic (7.x) - Windows only - and the ones with an 'apsimx' extension correspond to 'APSIM' "Next Generation". For more information about 'APSIM' see (<https://www.apsim.info/>) and for 'APSIM' next generation (<https://apsimnextgeneration.netlify.app/>).

Maintained by Fernando Miguez. Last updated 12 days ago.

apsim apsimx

59 stars 9.72 score 68 scripts 2 dependents

pecanproject

PEcAnRTM:PEcAn Functions Used for Radiative Transfer Modeling

Functions for performing forward runs and inversions of radiative transfer models (RTMs). Inversions can be performed using maximum likelihood, or more complex hierarchical Bayesian methods. Underlying numerical analyses are optimized for speed using Fortran code.

Maintained by Alexey Shiklomanov. Last updated 19 hours ago.

bayesian cyberinfrastructure data-assimilation data-science ecosystem-model ecosystem-science forecasting meta-analysis national-science-foundation pecan plants fortran jags cpp

216 stars 9.72 score 132 scripts

centerforassessment

SGP:Student Growth Percentiles & Percentile Growth Trajectories

An analytic framework for the calculation of norm- and criterion-referenced academic growth estimates using large scale, longitudinal education assessment data as developed in Betebenner (2009) <doi:10.1111/j.1745-3992.2009.00161.x>.

Maintained by Damian W. Betebenner. Last updated 13 days ago.

percentile-growth-projections quantile-regression sgp sgp-analyses student-growth-percentiles student-growth-projections

20 stars 9.69 score 88 scripts

appsilon

shiny.telemetry:'Shiny' App Usage Telemetry

Enables instrumentation of 'Shiny' apps for tracking user session events such as input changes, browser type, and session duration. These events can be sent to any of the available storage backends and analyzed using the included 'Shiny' app to gain insights about app usage and adoption.

Maintained by André Veríssimo. Last updated 4 months ago.

analytics rhinoverse shiny

67 stars 9.69 score 29 scripts

bioc

txdbmaker:Tools for making TxDb objects from genomic annotations

A set of tools for making TxDb objects from genomic annotations from various sources (e.g. UCSC, Ensembl, and GFF files). These tools allow the user to download the genomic locations of transcripts, exons, and CDS, for a given assembly, and to import them in a TxDb object. TxDb objects are implemented in the GenomicFeatures package, together with flexible methods for extracting the desired features in convenient formats.

Maintained by H. Pagès. Last updated 4 months ago.

infrastructure dataimport annotation genomeannotation genomeassembly genetics sequencing bioconductor-package core-package

3 stars 9.68 score 92 scripts 87 dependents

bioc

TCGAutils:TCGA utility functions for data management

A suite of helper functions for checking and manipulating TCGA data including data obtained from the curatedTCGAData experiment package. These functions aim to simplify and make working with TCGA data more manageable. Exported functions include those that import data from flat files into Bioconductor objects, convert row annotations, and identifier translation via the GDC API.

Maintained by Marcel Ramos. Last updated 4 months ago.

software workflowstep preprocessing dataimport bioconductor-package tcga u24ca289073 utilities

27 stars 9.66 score 210 scripts 10 dependents

plangfelder

WGCNA:Weighted Correlation Network Analysis

Functions necessary to perform Weighted Correlation Network Analysis on high-dimensional data as originally described in Horvath and Zhang (2005) <doi:10.2202/1544-6115.1128> and Langfelder and Horvath (2008) <doi:10.1186/1471-2105-9-559>. Includes functions for rudimentary data cleaning, construction of correlation networks, module identification, summarization, and relating of variables and modules to sample traits. Also includes a number of utility functions for data manipulation and visualization.

Maintained by Peter Langfelder. Last updated 6 months ago.

cpp

54 stars 9.65 score 5.3k scripts 32 dependents

vimc

orderly:Lightweight Reproducible Reporting

Order, create and store reports from R. By defining a lightweight interface around the inputs and outputs of an analysis, a lot of the repetitive work for reproducible research can be automated. We define a simple format for organising and describing work that facilitates collaborative reproducible research and acknowledges that all analyses are run multiple times over their lifespans.

Maintained by Rich FitzJohn. Last updated 2 years ago.

117 stars 9.63 score 94 scripts 4 dependents

bioc

pcaExplorer:Interactive Visualization of RNA-seq Data Using a Principal Components Approach

This package provides functionality for interactive visualization of RNA-seq datasets based on Principal Components Analysis. The methods provided allow for quick information extraction and effective data exploration. A Shiny application encapsulates the whole analysis.

Maintained by Federico Marini. Last updated 3 months ago.

immunooncology visualization rnaseq dimensionreduction principalcomponent qualitycontrol gui reportwriting shinyapps bioconductor principal-components reproducible-research rna-seq-analysis rna-seq-data shiny transcriptome user-friendly

56 stars 9.63 score 180 scripts

bioc

clusterExperiment:Compare Clusterings for Single-Cell Sequencing

Provides functionality for running and comparing many different clusterings of single-cell sequencing data or other large mRNA Expression data sets.

Maintained by Elizabeth Purdom. Last updated 5 months ago.

clustering rnaseq sequencing software singlecell cpp

38 stars 9.62 score 192 scripts 1 dependents

bioc

AnnotationForge:Tools for building SQLite-based annotation data packages

Provides code for generating Annotation packages and their databases. Packages produced are intended to be used with AnnotationDbi.

Maintained by Bioconductor Package Maintainer. Last updated 18 days ago.

annotation infrastructure bioconductor-package core-package

5 stars 9.62 score 143 scripts 19 dependents

bioc

cytomapper:Visualization of highly multiplexed imaging data in R

Highly multiplexed imaging acquires the single-cell expression of selected proteins in a spatially-resolved fashion. These measurements can be visualised across multiple length-scales. First, pixel-level intensities represent the spatial distributions of feature expression with highest resolution. Second, after segmentation, expression values or cell-level metadata (e.g. cell-type information) can be visualised on segmented cell areas. This package contains functions for the visualisation of multiplexed read-outs and cell-level information obtained by multiplexed imaging technologies. The main functions of this package allow 1. the visualisation of pixel-level information across multiple channels, 2. the display of cell-level information (expression and/or metadata) on segmentation masks and 3. gating and visualisation of single cells.

Maintained by Lasse Meyer. Last updated 5 months ago.

immunooncology software singlecell onechannel twochannel multiplecomparison normalization dataimport bioimaging imaging-mass-cytometry single-cell spatial-analysis

32 stars 9.61 score 354 scripts 5 dependents

ropensci

tidyhydat:Extract and Tidy Canadian 'Hydrometric' Data

Provides functions to access historical and real-time national 'hydrometric' data from Water Survey of Canada data sources (<https://dd.weather.gc.ca/hydrometric/csv/> and <https://collaboration.cmc.ec.gc.ca/cmc/hydrometrics/www/>) and then applies tidy data principles.

Maintained by Sam Albers. Last updated 20 days ago.

citz government-data hydrology hydrometrics tidy-data water-resources

71 stars 9.59 score 202 scripts 3 dependents

bioc

recount:Explore and download data from the recount project

Explore and download data from the recount project available at https://jhubiostatistics.shinyapps.io/recount/. Using the recount package you can download RangedSummarizedExperiment objects at the gene, exon or exon-exon junctions level, the raw counts, the phenotype metadata used, the urls to the sample coverage bigWig files or the mean coverage bigWig file for a particular study. The RangedSummarizedExperiment objects can be used by different packages for performing differential expression analysis. Using http://bioconductor.org/packages/derfinder you can perform annotation-agnostic differential expression analyses with the data from the recount project as described at http://www.nature.com/nbt/journal/v35/n4/full/nbt.3838.html.

Maintained by Leonardo Collado-Torres. Last updated 4 months ago.

coverage differentialexpression geneexpression rnaseq sequencing software dataimport immunooncology annotation-agnostic bioconductor count derfinder deseq2 exon gene human illumina junction recount

41 stars 9.57 score 498 scripts 3 dependents

rqtl

qtl2:Quantitative Trait Locus Mapping in Experimental Crosses

Provides a set of tools to perform quantitative trait locus (QTL) analysis in experimental crosses. It is a reimplementation of the 'R/qtl' package to better handle high-dimensional data and complex cross designs. Broman et al. (2019) <doi:10.1534/genetics.118.301595>.

Maintained by Karl W Broman. Last updated 23 days ago.

cpp

34 stars 9.48 score 1.1k scripts 5 dependents

bioc

SpatialFeatureExperiment:Integrating SpatialExperiment with Simple Features in sf

A new S4 class integrating Simple Features with the R package sf to bring geospatial data analysis methods based on vector data to spatial transcriptomics. Also implements management of spatial neighborhood graphs and geometric operations. This pakage builds upon SpatialExperiment and SingleCellExperiment, hence methods for these parent classes can still be used.

Maintained by Lambda Moses. Last updated 2 months ago.

datarepresentation transcriptomics spatial

49 stars 9.40 score 322 scripts 1 dependents

usepa

tcpl:ToxCast Data Analysis Pipeline

The ToxCast Data Analysis Pipeline ('tcpl') is an R package that manages, curve-fits, plots, and stores ToxCast data to populate its linked MySQL database, 'invitrodb'. The package was developed for the chemical screening data curated by the US EPA's Toxicity Forecaster (ToxCast) program, but 'tcpl' can be used to support diverse chemical screening efforts.

Maintained by Jason Brown. Last updated 12 days ago.

ccte comptox ord

36 stars 9.39 score 90 scripts

pecanproject

PEcAn.data.land:PEcAn Functions Used for Ecological Forecasts and Reanalysis

The Predictive Ecosystem Carbon Analyzer (PEcAn) is a scientific workflow management tool that is designed to simplify the management of model parameterization, execution, and analysis. The goal of PECAn is to streamline the interaction between data and models, and to improve the efficacy of scientific investigation.

Maintained by Mike Dietze. Last updated 19 hours ago.

bayesian cyberinfrastructure data-assimilation data-science ecosystem-model ecosystem-science forecasting meta-analysis national-science-foundation pecan plants jags cpp

216 stars 9.35 score 19 scripts 10 dependents

bioc

GenomicInteractions:Utilities for handling genomic interaction data

Utilities for handling genomic interaction data such as ChIA-PET or Hi-C, annotating genomic features with interaction information, and producing plots and summary statistics.

Maintained by Liz Ing-Simmons. Last updated 5 months ago.

software infrastructure dataimport datarepresentation hic

7 stars 9.31 score 162 scripts 5 dependents

ohdsi

Andromeda:Asynchronous Disk-Based Representation of Massive Data

Storing very large data objects on a local drive, while still making it possible to manipulate the data in an efficient manner.

Maintained by Martijn Schuemie. Last updated 7 months ago.

hades

11 stars 9.29 score 57 scripts 8 dependents

bioc

EWCE:Expression Weighted Celltype Enrichment

Used to determine which cell types are enriched within gene lists. The package provides tools for testing enrichments within simple gene lists (such as human disease associated genes) and those resulting from differential expression studies. The package does not depend upon any particular Single Cell Transcriptome dataset and user defined datasets can be loaded in and used in the analyses.

Maintained by Alan Murphy. Last updated 2 months ago.

geneexpression transcription differentialexpression genesetenrichment genetics microarray mrnamicroarray onechannel rnaseq biomedicalinformatics proteomics visualization functionalgenomics singlecell deconvolution single-cell single-cell-rna-seq transcriptomics

56 stars 9.29 score 99 scripts

bioc

CNEr:CNE Detection and Visualization

Large-scale identification and advanced visualization of sets of conserved noncoding elements.

Maintained by Ge Tan. Last updated 5 months ago.

generegulation visualization dataimport

3 stars 9.28 score 35 scripts 19 dependents

bioc

IsoformSwitchAnalyzeR:Identify, Annotate and Visualize Isoform Switches with Functional Consequences from both short- and long-read RNA-seq data

Analysis of alternative splicing and isoform switches with predicted functional consequences (e.g. gain/loss of protein domains etc.) from quantification of all types of RNASeq by tools such as Kallisto, Salmon, StringTie, Cufflinks/Cuffdiff etc.

Maintained by Kristoffer Vitting-Seerup. Last updated 5 months ago.

geneexpression transcription alternativesplicing differentialexpression differentialsplicing visualization statisticalmethod transcriptomevariant biomedicalinformatics functionalgenomics systemsbiology transcriptomics rnaseq annotation functionalprediction geneprediction dataimport multiplecomparison batcheffect immunooncology

108 stars 9.26 score 125 scripts

bioc

RcisTarget:RcisTarget Identify transcription factor binding motifs enriched on a list of genes or genomic regions

RcisTarget identifies transcription factor binding motifs (TFBS) over-represented on a gene list. In a first step, RcisTarget selects DNA motifs that are significantly over-represented in the surroundings of the transcription start site (TSS) of the genes in the gene-set. This is achieved by using a database that contains genome-wide cross-species rankings for each motif. The motifs that are then annotated to TFs and those that have a high Normalized Enrichment Score (NES) are retained. Finally, for each motif and gene-set, RcisTarget predicts the candidate target genes (i.e. genes in the gene-set that are ranked above the leading edge).

Maintained by Gert Hulselmans. Last updated 5 months ago.

generegulation motifannotation transcriptomics transcription genesetenrichment genetarget

37 stars 9.18 score 191 scripts

pecanproject

PEcAn.allometry:PEcAn Allometry Functions

Synthesize allometric equations or fit allometries to data.

Maintained by Mike Dietze. Last updated 19 hours ago.

bayesian cyberinfrastructure data-assimilation data-science ecosystem-model ecosystem-science forecasting meta-analysis national-science-foundation pecan plants

216 stars 9.13 score 34 scripts

bioc

sesame:SEnsible Step-wise Analysis of DNA MEthylation BeadChips

Tools For analyzing Illumina Infinium DNA methylation arrays. SeSAMe provides utilities to support analyses of multiple generations of Infinium DNA methylation BeadChips, including preprocessing, quality control, visualization and inference. SeSAMe features accurate detection calling, intelligent inference of ethnicity, sex and advanced quality control routines.

Maintained by Wanding Zhou. Last updated 3 months ago.

dnamethylation methylationarray preprocessing qualitycontrol bioinformatics dna-methylation microarray

69 stars 9.08 score 258 scripts 1 dependents

bioc

OUTRIDER:OUTRIDER - OUTlier in RNA-Seq fInDER

Identification of aberrant gene expression in RNA-seq data. Read count expectations are modeled by an autoencoder to control for confounders in the data. Given these expectations, the RNA-seq read counts are assumed to follow a negative binomial distribution with a gene-specific dispersion. Outliers are then identified as read counts that significantly deviate from this distribution. Furthermore, OUTRIDER provides useful plotting functions to analyze and visualize the results.

Maintained by Christian Mertes. Last updated 5 months ago.

immunooncology rnaseq transcriptomics alignment sequencing geneexpression genetics count-data diagnostics expression-analysis mendelian-genetics outlier-detection rna-seq openblas cpp

50 stars 9.07 score 110 scripts 1 dependents

pecanproject

PEcAn.qaqc:QAQC

PEcAn integration and model skill testing

Maintained by David LeBauer. Last updated 19 hours ago.

bayesian cyberinfrastructure data-assimilation data-science ecosystem-model ecosystem-science forecasting meta-analysis national-science-foundation pecan plants

216 stars 9.07 score 5 scripts

bioc

BatchQC:Batch Effects Quality Control Software

Sequencing and microarray samples often are collected or processed in multiple batches or at different times. This often produces technical biases that can lead to incorrect results in the downstream analysis. BatchQC is a software tool that streamlines batch preprocessing and evaluation by providing interactive diagnostics, visualizations, and statistical analyses to explore the extent to which batch variation impacts the data. BatchQC diagnostics help determine whether batch adjustment needs to be done, and how correction should be applied before proceeding with a downstream analysis. Moreover, BatchQC interactively applies multiple common batch effect approaches to the data and the user can quickly see the benefits of each method. BatchQC is developed as a Shiny App. The output is organized into multiple tabs and each tab features an important part of the batch effect analysis and visualization of the data. The BatchQC interface has the following analysis groups: Summary, Differential Expression, Median Correlations, Heatmaps, Circular Dendrogram, PCA Analysis, Shape, ComBat and SVA.

Maintained by Jessica Anderson. Last updated 13 days ago.

batcheffect graphandnetwork microarray normalization principalcomponent sequencing software visualization qualitycontrol rnaseq preprocessing differentialexpression immunooncology

7 stars 9.06 score 54 scripts

ohdsi

Cyclops:Cyclic Coordinate Descent for Logistic, Poisson and Survival Analysis

This model fitting tool incorporates cyclic coordinate descent and majorization-minimization approaches to fit a variety of regression models found in large-scale observational healthcare data. Implementations focus on computational optimization and fine-scale parallelization to yield efficient inference in massive datasets. Please see: Suchard, Simpson, Zorych, Ryan and Madigan (2013) <doi:10.1145/2414416.2414791>.

Maintained by Marc A. Suchard. Last updated 4 months ago.

hades cpp

39 stars 9.05 score 73 scripts 4 dependents

bioc

bambu:Context-Aware Transcript Quantification from Long Read RNA-Seq data

bambu is a R package for multi-sample transcript discovery and quantification using long read RNA-Seq data. You can use bambu after read alignment to obtain expression estimates for known and novel transcripts and genes. The output from bambu can directly be used for visualisation and downstream analysis such as differential gene expression or transcript usage.

Maintained by Ying Chen. Last updated 2 months ago.

alignment coverage differentialexpression featureextraction geneexpression genomeannotation genomeassembly immunooncology longread multiplecomparison normalization rnaseq regression sequencing software transcription transcriptomics bambu bioconductor long-reads nanopore nanopore-sequencing rna-seq rna-seq-analysis transcript-quantification transcript-reconstruction cpp

203 stars 9.04 score 91 scripts 1 dependents

bioc

Banksy:Spatial transcriptomic clustering

Banksy is an R package that incorporates spatial information to cluster cells in a feature space (e.g. gene expression). To incorporate spatial information, BANKSY computes the mean neighborhood expression and azimuthal Gabor filters that capture gene expression gradients. These features are combined with the cell's own expression to embed cells in a neighbor-augmented product space which can then be clustered, allowing for accurate and spatially-aware cell typing and tissue domain segmentation.

Maintained by Joseph Lee. Last updated 28 days ago.

clustering spatial singlecell geneexpression dimensionreduction clustering-algorithm single-cell-omics spatial-omics

90 stars 9.03 score 248 scripts

pecanproject

PEcAn.all:PEcAn Functions Used for Ecological Forecasts and Reanalysis

The Predictive Ecosystem Carbon Analyzer (PEcAn) is a scientific workflow management tool that is designed to simplify the management of model parameterization, execution, and analysis. The goal of PEcAn is to streamline the interaction between data and models, and to improve the efficacy of scientific investigation.

Maintained by David LeBauer. Last updated 19 hours ago.

bayesian cyberinfrastructure data-assimilation data-science ecosystem-model ecosystem-science forecasting meta-analysis national-science-foundation pecan plants jags cpp

216 stars 9.02 score 266 scripts

bioc

scPipe:Pipeline for single cell multi-omic data pre-processing

A preprocessing pipeline for single cell RNA-seq/ATAC-seq data that starts from the fastq files and produces a feature count matrix with associated quality control information. It can process fastq data generated by CEL-seq, MARS-seq, Drop-seq, Chromium 10x and SMART-seq protocols.

Maintained by Shian Su. Last updated 4 months ago.

immunooncology software sequencing rnaseq geneexpression singlecell visualization sequencematching preprocessing qualitycontrol genomeannotation dataimport curl bzip2 xz-utils zlib cpp

68 stars 9.02 score 84 scripts

bioc

scone:Single Cell Overview of Normalized Expression data

SCONE is an R package for comparing and ranking the performance of different normalization schemes for single-cell RNA-seq and other high-throughput analyses.

Maintained by Davide Risso. Last updated 1 months ago.

immunooncology normalization preprocessing qualitycontrol geneexpression rnaseq software transcriptomics sequencing singlecell coverage

53 stars 9.00 score 104 scripts

pecanproject

PEcAn.MAAT:PEcAn Package for Integration of the MAAT Model

This module provides functions to wrap the MAAT model into the PEcAn workflows.

Maintained by Shawn Serbin. Last updated 19 hours ago.

bayesian cyberinfrastructure data-assimilation data-science ecosystem-model ecosystem-science forecasting meta-analysis national-science-foundation pecan plants

216 stars 8.97 score 12 scripts

dexter-psychometrics

dexter:Data Management and Analysis of Tests

A system for the management, assessment, and psychometric analysis of data from educational and psychological tests.

Maintained by Jesse Koops. Last updated 21 days ago.

openblas cpp openmp

8 stars 8.97 score 135 scripts 2 dependents

sym33

RecordLinkage:Record Linkage Functions for Linking and Deduplicating Data Sets

Provides functions for linking and deduplicating data sets. Methods based on a stochastic approach are implemented as well as classification algorithms from the machine learning domain. For details, see our paper "The RecordLinkage Package: Detecting Errors in Data" Sariyar M / Borg A (2010) <doi:10.32614/RJ-2010-017>.

Maintained by Murat Sariyar. Last updated 2 years ago.

6 stars 8.96 score 454 scripts 8 dependents

bioc

topGO:Enrichment Analysis for Gene Ontology

topGO package provides tools for testing GO terms while accounting for the topology of the GO graph. Different test statistics and different methods for eliminating local similarities and dependencies between GO terms can be implemented and applied.

Maintained by Adrian Alexa. Last updated 5 months ago.

microarray visualization

8.96 score 2.0k scripts 20 dependents

pecanproject

PEcAn.BIOCRO:PEcAn Package for Integration of the BioCro Model

This module provides functions to link BioCro to PEcAn.

Maintained by David LeBauer. Last updated 19 hours ago.

bayesian cyberinfrastructure data-assimilation data-science ecosystem-model ecosystem-science forecasting meta-analysis national-science-foundation pecan plants jags cpp

216 stars 8.96 score 23 scripts

pecanproject

PEcAn.uncertainty:PEcAn Functions Used for Propagating and Partitioning Uncertainties in Ecological Forecasts and Reanalysis

The Predictive Ecosystem Carbon Analyzer (PEcAn) is a scientific workflow management tool that is designed to simplify the management of model parameterization, execution, and analysis. The goal of PECAn is to streamline the interaction between data and models, and to improve the efficacy of scientific investigation.

Maintained by David LeBauer. Last updated 19 hours ago.

bayesian cyberinfrastructure data-assimilation data-science ecosystem-model ecosystem-science forecasting meta-analysis national-science-foundation pecan plants jags cpp

216 stars 8.95 score 15 scripts 5 dependents

bioc

motifbreakR:A Package For Predicting The Disruptiveness Of Single Nucleotide Polymorphisms On Transcription Factor Binding Sites

We introduce motifbreakR, which allows the biologist to judge in the first place whether the sequence surrounding the polymorphism is a good match, and in the second place how much information is gained or lost in one allele of the polymorphism relative to another. MotifbreakR is both flexible and extensible over previous offerings; giving a choice of algorithms for interrogation of genomes with motifs from public sources that users can choose from; these are 1) a weighted-sum probability matrix, 2) log-probabilities, and 3) weighted by relative entropy. MotifbreakR can predict effects for novel or previously described variants in public databases, making it suitable for tasks beyond the scope of its original design. Lastly, it can be used to interrogate any genome curated within Bioconductor (currently there are 32 species, a total of 109 versions).

Maintained by Simon Gert Coetzee. Last updated 5 months ago.

chipseq visualization motifannotation transcription

28 stars 8.89 score 103 scripts

ericpante

marmap:Import, Plot and Analyze Bathymetric and Topographic Data

Import xyz data from the NOAA (National Oceanic and Atmospheric Administration, <https://www.noaa.gov>), GEBCO (General Bathymetric Chart of the Oceans, <https://www.gebco.net>) and other sources, plot xyz data to prepare publication-ready figures, analyze xyz data to extract transects, get depth / altitude based on geographical coordinates, or calculate z-constrained least-cost paths.

Maintained by Benoit Simon-Bouhet. Last updated 9 months ago.

32 stars 8.86 score 524 scripts 1 dependents

pecanproject

PEcAn.workflow:PEcAn Functions Used for Ecological Forecasts and Reanalysis

The Predictive Ecosystem Carbon Analyzer (PEcAn) is a scientific workflow management tool that is designed to simplify the management of model parameterization, execution, and analysis. The goal of PEcAn is to streamline the interaction between data and models, and to improve the efficacy of scientific investigation. This package provides workhorse functions that can be used to run the major steps of a PEcAn analysis.

Maintained by David LeBauer. Last updated 19 hours ago.

bayesian cyberinfrastructure data-assimilation data-science ecosystem-model ecosystem-science forecasting meta-analysis national-science-foundation pecan plants jags cpp

216 stars 8.85 score 15 scripts 4 dependents

sherrillmix

taxonomizr:Functions to Work with NCBI Accessions and Taxonomy

Functions for assigning taxonomy to NCBI accession numbers and taxon IDs based on NCBI's accession2taxid and taxdump files. This package allows the user to download NCBI data dumps and create a local database for fast and local taxonomic assignment.

Maintained by Scott Sherrill-Mix. Last updated 19 days ago.

72 stars 8.85 score 255 scripts 2 dependents

usdaforestservice

FIESTA:Forest Inventory Estimation and Analysis

A research estimation tool for analysts that work with sample-based inventory data from the U.S. Department of Agriculture, Forest Service, Forest Inventory and Analysis (FIA) Program.

Maintained by Grayson White. Last updated 5 days ago.

30 stars 8.84 score 62 scripts

pbiecek

archivist:Tools for Storing, Restoring and Searching for R Objects

Data exploration and modelling is a process in which a lot of data artifacts are produced. Artifacts like: subsets, data aggregates, plots, statistical models, different versions of data sets and different versions of results. The more projects we work with the more artifacts are produced and the harder it is to manage these artifacts. Archivist helps to store and manage artifacts created in R. Archivist allows you to store selected artifacts as a binary files together with their metadata and relations. Archivist allows to share artifacts with others, either through shared folder or github. Archivist allows to look for already created artifacts by using it's class, name, date of the creation or other properties. Makes it easy to restore such artifacts. Archivist allows to check if new artifact is the exact copy that was produced some time ago. That might be useful either for testing or caching.

Maintained by Przemyslaw Biecek. Last updated 8 months ago.

74 stars 8.81 score 105 scripts 2 dependents

mountainmath

cansim:Accessing Statistics Canada Data Table and Vectors

Searches for, accesses, and retrieves Statistics Canada data tables, as well as individual vectors, as tidy data frames. This package enriches the tables with metadata, deals with encoding issues, allows for bilingual English or French language data retrieval, and bundles convenience functions to make it easier to work with retrieved table data. For more efficient data access the package allows for caching data in a local database and database level filtering, data manipulation and summarizing.

Maintained by Jens von Bergmann. Last updated 17 days ago.

45 stars 8.78 score 446 scripts

pecanproject

PEcAn.data.remote:PEcAn Functions Used for Extracting Remote Sensing Data

PEcAn module for processing remote data. Python module requirements: requests, json, re, ast, panads, sys. If any of these modules are missing, install using pip install <module name>.

Maintained by Bailey Morrison. Last updated 19 hours ago.

bayesian cyberinfrastructure data-assimilation data-science ecosystem-model ecosystem-science forecasting meta-analysis national-science-foundation pecan plants

216 stars 8.77 score 6 scripts 5 dependents

pecanproject

PEcAn.ED2:PEcAn Package for Integration of ED2 Model

The Predictive Ecosystem Carbon Analyzer (PEcAn) is a scientific workflow management tool that is designed to simplify the management of model parameterization, execution, and analysis. The goal of PECAn is to streamline the interaction between data and models, and to improve the efficacy of scientific investigation. This package provides functions to link the Ecosystem Demography Model, version 2, to PEcAn.

Maintained by Mike Dietze. Last updated 19 hours ago.

bayesian cyberinfrastructure data-assimilation data-science ecosystem-model ecosystem-science forecasting meta-analysis national-science-foundation pecan plants jags cpp

216 stars 8.76 score 145 scripts

bioc

ChIPpeakAnno:Batch annotation of the peaks identified from either ChIP-seq, ChIP-chip experiments, or any experiments that result in large number of genomic interval data

The package encompasses a range of functions for identifying the closest gene, exon, miRNA, or custom features—such as highly conserved elements and user-supplied transcription factor binding sites. Additionally, users can retrieve sequences around the peaks and obtain enriched Gene Ontology (GO) or Pathway terms. In version 2.0.5 and beyond, new functionalities have been introduced. These include features for identifying peaks associated with bi-directional promoters along with summary statistics (peaksNearBDP), summarizing motif occurrences in peaks (summarizePatternInPeaks), and associating additional identifiers with annotated peaks or enrichedGO (addGeneIDs). The package integrates with various other packages such as biomaRt, IRanges, Biostrings, BSgenome, GO.db, multtest, and stat to enhance its analytical capabilities.

Maintained by Jianhong Ou. Last updated 3 months ago.

annotation chipseq chipchip

8.75 score 584 scripts 6 dependents

bioc

CellBench:Construct Benchmarks for Single Cell Analysis Methods

This package contains infrastructure for benchmarking analysis methods and access to single cell mixture benchmarking data. It provides a framework for organising analysis methods and testing combinations of methods in a pipeline without explicitly laying out each combination. It also provides utilities for sampling and filtering SingleCellExperiment objects, constructing lists of functions with varying parameters, and multithreaded evaluation of analysis methods.

Maintained by Shian Su. Last updated 5 months ago.

software infrastructure singlecell benchmark bioinformatics

31 stars 8.73 score 98 scripts

bioc

qpgraph:Estimation of Genetic and Molecular Regulatory Networks from High-Throughput Genomics Data

Estimate gene and eQTL networks from high-throughput expression and genotyping assays.

Maintained by Robert Castelo. Last updated 3 days ago.

microarray geneexpression transcription pathways networkinference graphandnetwork generegulation genetics geneticvariability snp software openblas

3 stars 8.72 score 20 scripts 3 dependents

bioc

GenomicScores:Infrastructure to work with genomewide position-specific scores

Provide infrastructure to store and access genomewide position-specific scores within R and Bioconductor.

Maintained by Robert Castelo. Last updated 2 months ago.

infrastructure genetics annotation sequencing coverage annotationhubsoftware

8 stars 8.71 score 83 scripts 6 dependents

bioc

Voyager:From geospatial to spatial omics

SpatialFeatureExperiment (SFE) is a new S4 class for working with spatial single-cell genomics data. The voyager package implements basic exploratory spatial data analysis (ESDA) methods for SFE. Univariate methods include univariate global spatial ESDA methods such as Moran's I, permutation testing for Moran's I, and correlograms. Bivariate methods include Lee's L and cross variogram. Multivariate methods include MULTISPATI PCA and multivariate local Geary's C recently developed by Anselin. The Voyager package also implements plotting functions to plot SFE data and ESDA results.

Maintained by Lambda Moses. Last updated 3 months ago.

geneexpression spatial transcriptomics visualization bioconductor eda esda exploratory-data-analysis omics spatial-statistics spatial-transcriptomics

88 stars 8.71 score 173 scripts

bioc

trackViewer:A R/Bioconductor package with web interface for drawing elegant interactive tracks or lollipop plot to facilitate integrated analysis of multi-omics data

Visualize mapped reads along with annotation as track layers for NGS dataset such as ChIP-seq, RNA-seq, miRNA-seq, DNA-seq, SNPs and methylation data.

Maintained by Jianhong Ou. Last updated 4 days ago.

visualization

8.68 score 145 scripts 2 dependents

bioc

gage:Generally Applicable Gene-set Enrichment for Pathway Analysis

GAGE is a published method for gene set (enrichment or GSEA) or pathway analysis. GAGE is generally applicable independent of microarray or RNA-Seq data attributes including sample sizes, experimental designs, assay platforms, and other types of heterogeneity, and consistently achieves superior performance over other frequently used methods. In gage package, we provide functions for basic GAGE analysis, result processing and presentation. We have also built pipeline routines for of multiple GAGE analyses in a batch, comparison between parallel analyses, and combined analysis of heterogeneous data from different sources/studies. In addition, we provide demo microarray data and commonly used gene set data based on KEGG pathways and GO terms. These funtions and data are also useful for gene set analysis using other methods.

Maintained by Weijun Luo. Last updated 5 months ago.

pathways go differentialexpression microarray onechannel twochannel rnaseq genetics multiplecomparison genesetenrichment geneexpression systemsbiology sequencing

5 stars 8.68 score 784 scripts 1 dependents

bcgov

bcmaps:Map Layers and Spatial Utilities for British Columbia

Various layers of B.C., including administrative boundaries, natural resource management boundaries, census boundaries etc. All layers are available in BC Albers (<https://spatialreference.org/ref/epsg/3005/>) equal-area projection, which is the B.C. government standard. The layers are sourced from the British Columbia and Canadian government under open licenses, including B.C. Data Catalogue (<https://data.gov.bc.ca>), the Government of Canada Open Data Portal (<https://open.canada.ca/en/using-open-data>), and Statistics Canada (<https://www.statcan.gc.ca/en/reference/licence>).

Maintained by Andy Teucher. Last updated 3 months ago.

data-science env

73 stars 8.65 score 254 scripts

bioc

QuasR:Quantify and Annotate Short Reads in R

This package provides a framework for the quantification and analysis of Short Reads. It covers a complete workflow starting from raw sequence reads, over creation of alignments and quality control plots, to the quantification of genomic regions of interest. Read alignments are either generated through Rbowtie (data from DNA/ChIP/ATAC/Bis-seq experiments) or Rhisat2 (data from RNA-seq experiments that require spliced alignments), or can be provided in the form of bam files.

Maintained by Michael Stadler. Last updated 1 months ago.

genetics preprocessing sequencing chipseq rnaseq methylseq coverage alignment qualitycontrol immunooncology curl bzip2 xz-utils zlib cpp

6 stars 8.63 score 79 scripts 1 dependents

ropensci

ckanr:Client for the Comprehensive Knowledge Archive Network ('CKAN') API

Client for 'CKAN' API (<https://ckan.org/>). Includes interface to 'CKAN' 'APIs' for search, list, show for packages, organizations, and resources. In addition, provides an interface to the 'datastore' API.

Maintained by Francisco Alves. Last updated 2 years ago.

database open-data ckan api data dataset api-wrapper ckan-api

98 stars 8.60 score 448 scripts 4 dependents

bioc

AUCell:AUCell: Analysis of 'gene set' activity in single-cell RNA-seq data (e.g. identify cells with specific gene signatures)

AUCell allows to identify cells with active gene sets (e.g. signatures, gene modules...) in single-cell RNA-seq data. AUCell uses the "Area Under the Curve" (AUC) to calculate whether a critical subset of the input gene set is enriched within the expressed genes for each cell. The distribution of AUC scores across all the cells allows exploring the relative expression of the signature. Since the scoring method is ranking-based, AUCell is independent of the gene expression units and the normalization procedure. In addition, since the cells are evaluated individually, it can easily be applied to bigger datasets, subsetting the expression matrix if needed.

Maintained by Gert Hulselmans. Last updated 5 months ago.

singlecell genesetenrichment transcriptomics transcription geneexpression workflowstep normalization

8.59 score 860 scripts 4 dependents

bioc

SPIAT:Spatial Image Analysis of Tissues

SPIAT (**Sp**atial **I**mage **A**nalysis of **T**issues) is an R package with a suite of data processing, quality control, visualization and data analysis tools. SPIAT is compatible with data generated from single-cell spatial proteomics platforms (e.g. OPAL, CODEX, MIBI, cellprofiler). SPIAT reads spatial data in the form of X and Y coordinates of cells, marker intensities and cell phenotypes. SPIAT includes six analysis modules that allow visualization, calculation of cell colocalization, categorization of the immune microenvironment relative to tumor areas, analysis of cellular neighborhoods, and the quantification of spatial heterogeneity, providing a comprehensive toolkit for spatial data analysis.

Maintained by Yuzhou Feng. Last updated 16 days ago.

biomedicalinformatics cellbiology spatial clustering dataimport immunooncology qualitycontrol singlecell software visualization

22 stars 8.59 score 69 scripts

tudo-r

BatchJobs:Batch Computing with R

Provides Map, Reduce and Filter variants to generate jobs on batch computing systems like PBS/Torque, LSF, SLURM and Sun Grid Engine. Multicore and SSH systems are also supported. For further details see the project web page.

Maintained by Bernd Bischl. Last updated 3 years ago.

85 stars 8.57 score 616 scripts 3 dependents

bioc

FRASER:Find RAre Splicing Events in RNA-Seq Data

Detection of rare aberrant splicing events in transcriptome profiles. Read count ratio expectations are modeled by an autoencoder to control for confounding factors in the data. Given these expectations, the ratios are assumed to follow a beta-binomial distribution with a junction specific dispersion. Outlier events are then identified as read-count ratios that deviate significantly from this distribution. FRASER is able to detect alternative splicing, but also intron retention. The package aims to support diagnostics in the field of rare diseases where RNA-seq is performed to identify aberrant splicing defects.

Maintained by Christian Mertes. Last updated 5 months ago.

rnaseq alternativesplicing sequencing software genetics coverage aberrant-splicing diagnostics outlier-detection rare-disease rna-seq splicing openblas cpp

44 stars 8.53 score 155 scripts

cboettig

duckdbfs:High Performance Remote File System, Database and 'Geospatial' Access Using 'duckdb'

Provides friendly wrappers for creating 'duckdb'-backed connections to tabular datasets ('csv', parquet, etc) on local or remote file systems. This mimics the behaviour of "open_dataset" in the 'arrow' package, but in addition to 'S3' file system also generalizes to any list of 'http' URLs.

Maintained by Carl Boettiger. Last updated 20 days ago.

85 stars 8.51 score 41 scripts 16 dependents

ropensci

weatherOz:An API Client for Australian Weather and Climate Data Resources

Provides automated downloading, parsing and formatting of weather data for Australia through API endpoints provided by the Department of Primary Industries and Regional Development ('DPIRD') of Western Australia and by the Science and Technology Division of the Queensland Government's Department of Environment and Science ('DES'). As well as the Bureau of Meteorology ('BOM') of the Australian government precis and coastal forecasts, and downloading and importing radar and satellite imagery files. 'DPIRD' weather data are accessed through public 'APIs' provided by 'DPIRD', <https://www.agric.wa.gov.au/weather-api-20>, providing access to weather station data from the 'DPIRD' weather station network. Australia-wide weather data are based on data from the Australian Bureau of Meteorology ('BOM') data and accessed through 'SILO' (Scientific Information for Land Owners) Jeffrey et al. (2001) <doi:10.1016/S1364-8152(01)00008-1>. 'DPIRD' data are made available under a Creative Commons Attribution 3.0 Licence (CC BY 3.0 AU) license <https://creativecommons.org/licenses/by/3.0/au/deed.en>. SILO data are released under a Creative Commons Attribution 4.0 International licence (CC BY 4.0) <https://creativecommons.org/licenses/by/4.0/>. 'BOM' data are (c) Australian Government Bureau of Meteorology and released under a Creative Commons (CC) Attribution 3.0 licence or Public Access Licence ('PAL') as appropriate, see <http://www.bom.gov.au/other/copyright.shtml> for further details.

Maintained by Rodrigo Pires. Last updated 1 months ago.

dpird bom meteorological-data weather-forecast australia weather weather-data meteorology western-australia australia-bureau-of-meteorology western-australia-agriculture australia-agriculture australia-climate australia-weather api-client climate data rainfall weather-api

31 stars 8.47 score 40 scripts

bioc

TitanCNA:Subclonal copy number and LOH prediction from whole genome sequencing of tumours

Hidden Markov model to segment and predict regions of subclonal copy number alterations (CNA) and loss of heterozygosity (LOH), and estimate cellular prevalence of clonal clusters in tumour whole genome sequencing data.

Maintained by Gavin Ha. Last updated 5 months ago.

sequencing wholegenome dnaseq exomeseq statisticalmethod copynumbervariation hiddenmarkovmodel genetics genomicvariation immunooncology 10x-genomics copy-number-variation genome-sequencing hmm tumor-heterogeneity

97 stars 8.47 score 68 scripts

bioc

ClassifyR:A framework for cross-validated classification problems, with applications to differential variability and differential distribution testing

The software formalises a framework for classification and survival model evaluation in R. There are four stages; Data transformation, feature selection, model training, and prediction. The requirements of variable types and variable order are fixed, but specialised variables for functions can also be provided. The framework is wrapped in a driver loop that reproducibly carries out a number of cross-validation schemes. Functions for differential mean, differential variability, and differential distribution are included. Additional functions may be developed by the user, by creating an interface to the framework.

Maintained by Dario Strbenac. Last updated 7 days ago.

classification survival cpp

6 stars 8.46 score 45 scripts 3 dependents

bioc

BgeeDB:Annotation and gene expression data retrieval from Bgee database. TopAnat, an anatomical entities Enrichment Analysis tool for UBERON ontology

A package for the annotation and gene expression data download from Bgee database, and TopAnat analysis: GO-like enrichment of anatomical terms, mapped to genes by expression patterns.

Maintained by Julien Wollbrett. Last updated 5 months ago.

software dataimport sequencing geneexpression microarray go genesetenrichment bioinformatics enrichment-analysis rna-seq scrna-seq single-cell

15 stars 8.46 score 19 scripts 1 dependents

bioc

multiMiR:Integration of multiple microRNA-target databases with their disease and drug associations

A collection of microRNAs/targets from external resources, including validated microRNA-target databases (miRecords, miRTarBase and TarBase), predicted microRNA-target databases (DIANA-microT, ElMMo, MicroCosm, miRanda, miRDB, PicTar, PITA and TargetScan) and microRNA-disease/drug databases (miR2Disease, Pharmaco-miR VerSe and PhenomiR).

Maintained by Spencer Mahaffey. Last updated 5 months ago.

mirnadata homo_sapiens_data mus_musculus_data rattus_norvegicus_data organismdata microrna-sequence sql

20 stars 8.45 score 141 scripts

bioc

geneplotter:Graphics related functions for Bioconductor

Functions for plotting genomic data

Maintained by Bioconductor Package Maintainer. Last updated 5 months ago.

visualization

8.40 score 249 scripts 10 dependents

bioc

UniProt.ws:R Interface to UniProt Web Services

The Universal Protein Resource (UniProt) is a comprehensive resource for protein sequence and annotation data. This package provides a collection of functions for retrieving, processing, and re-packaging UniProt web services. The package makes use of UniProt's modernized REST API and allows mapping of identifiers accross different databases.

Maintained by Marcel Ramos. Last updated 3 months ago.

annotation infrastructure go kegg biocarta bioconductor-package core-package

4 stars 8.38 score 167 scripts 4 dependents

pecanproject

PEcAn.SIPNET:PEcAn Functions Used for Ecological Forecasts and Reanalysis

The Predictive Ecosystem Carbon Analyzer (PEcAn) is a scientific workflow management tool that is designed to simplify the management of model parameterization, execution, and analysis. The goal of PECAn is to streamline the interaction between data and models, and to improve the efficacy of scientific investigation.

Maintained by Mike Dietze. Last updated 19 hours ago.

bayesian cyberinfrastructure data-assimilation data-science ecosystem-model ecosystem-science forecasting meta-analysis national-science-foundation pecan plants jags cpp

216 stars 8.38 score 61 scripts

bioc

EnrichmentBrowser:Seamless navigation through combined results of set-based and network-based enrichment analysis

The EnrichmentBrowser package implements essential functionality for the enrichment analysis of gene expression data. The analysis combines the advantages of set-based and network-based enrichment analysis in order to derive high-confidence gene sets and biological pathways that are differentially regulated in the expression data under investigation. Besides, the package facilitates the visualization and exploration of such sets and pathways.

Maintained by Ludwig Geistlinger. Last updated 5 months ago.

immunooncology microarray rnaseq geneexpression differentialexpression pathways graphandnetwork network genesetenrichment networkenrichment visualization reportwriting

20 stars 8.37 score 164 scripts 3 dependents

pecanproject

PEcAn.LINKAGES:PEcAn Package for Integration of the LINKAGES Model

This module provides functions to link the (LINKAGES) to PEcAn.

Maintained by Ann Raiho. Last updated 19 hours ago.

bayesian cyberinfrastructure data-assimilation data-science ecosystem-model ecosystem-science forecasting meta-analysis national-science-foundation pecan plants jags cpp

216 stars 8.37 score 59 scripts

wallaceecomod

wallace:A Modular Platform for Reproducible Modeling of Species Niches and Distributions

The 'shiny' application Wallace is a modular platform for reproducible modeling of species niches and distributions. Wallace guides users through a complete analysis, from the acquisition of species occurrence and environmental data to visualizing model predictions on an interactive map, thus bundling complex workflows into a single, streamlined interface. An extensive vignette, which guides users through most package functionality can be found on the package's GitHub Pages website: <https://wallaceecomod.github.io/wallace/articles/tutorial-v2.html>.

Maintained by Mary E. Blair. Last updated 24 days ago.

openjdk

133 stars 8.36 score 96 scripts

bioc

CompoundDb:Creating and Using (Chemical) Compound Annotation Databases

CompoundDb provides functionality to create and use (chemical) compound annotation databases from a variety of different sources such as LipidMaps, HMDB, ChEBI or MassBank. The database format allows to store in addition MS/MS spectra along with compound information. The package provides also a backend for Bioconductor's Spectra package and allows thus to match experimetal MS/MS spectra against MS/MS spectra in the database. Databases can be stored in SQLite format and are thus portable.

Maintained by Johannes Rainer. Last updated 2 months ago.

massspectrometry metabolomics annotation databases mass-spectrometry

17 stars 8.35 score 69 scripts 1 dependents

smouksassi

ggquickeda:Quickly Explore Your Data Using 'ggplot2' and 'table1' Summary Tables

Quickly and easily perform exploratory data analysis by uploading your data as a 'csv' file. Start generating insights using 'ggplot2' plots and 'table1' tables with descriptive stats, all using an easy-to-use point and click 'Shiny' interface.

Maintained by Samer Mouksassi. Last updated 15 days ago.

73 stars 8.34 score 27 scripts

bioc

igvR:igvR: integrative genomics viewer

Access to igv.js, the Integrative Genomics Viewer running in a web browser.

Maintained by Arkadiusz Gladki. Last updated 5 months ago.

visualization thirdpartyclient genomebrowsers

45 stars 8.33 score 118 scripts

surveydown-dev

surveydown:Markdown-Based Surveys Using 'Quarto' and 'shiny'

Generate surveys using markdown and R code chunks. Surveys are composed of two files: a survey.qmd 'Quarto' file defining the survey content (pages, questions, etc), and an app.R file defining a 'shiny' app with global settings (libraries, database configuration, etc.) and server configuration options (e.g., conditional skipping / display, etc.). Survey data collected from respondents is stored in a 'PostgreSQL' database. Features include controls for conditional skip logic (skip to a page based on an answer to a question), conditional display logic (display a question based on an answer to a question), a customizable progress bar, and a wide variety of question types, including multiple choice (single choice and multiple choices), select, text, numeric, multiple choice buttons, text area, and dates. Because the surveys render into a 'shiny' app, designers can also leverage the reactive capabilities of 'shiny' to create dynamic and interactive surveys.

Maintained by John Paul Helveston. Last updated 4 days ago.

markdown postgres postgresql quarto shiny shiny-apps shiny-r supabase survey surveys

97 stars 8.29 score 133 scripts

bioc

GeneTonic:Enjoy Analyzing And Integrating The Results From Differential Expression Analysis And Functional Enrichment Analysis

This package provides functionality to combine the existing pieces of the transcriptome data and results, making it easier to generate insightful observations and hypothesis. Its usage is made easy with a Shiny application, combining the benefits of interactivity and reproducibility e.g. by capturing the features and gene sets of interest highlighted during the live session, and creating an HTML report as an artifact where text, code, and output coexist. Using the GeneTonicList as a standardized container for all the required components, it is possible to simplify the generation of multiple visualizations and summaries.

Maintained by Federico Marini. Last updated 3 months ago.

gui geneexpression software transcription transcriptomics visualization differentialexpression pathways reportwriting genesetenrichment annotation go shinyapps bioconductor bioconductor-package data-exploration data-visualization functional-enrichment-analysis gene-expression pathway-analysis reproducible-research rna-seq-analysis rna-seq-data shiny transcriptome user-friendly

77 stars 8.28 score 37 scripts 1 dependents

bioc

crisprDesign:Comprehensive design of CRISPR gRNAs for nucleases and base editors

Provides a comprehensive suite of functions to design and annotate CRISPR guide RNA (gRNAs) sequences. This includes on- and off-target search, on-target efficiency scoring, off-target scoring, full gene and TSS contextual annotations, and SNP annotation (human only). It currently support five types of CRISPR modalities (modes of perturbations): CRISPR knockout, CRISPR activation, CRISPR inhibition, CRISPR base editing, and CRISPR knockdown. All types of CRISPR nucleases are supported, including DNA- and RNA-target nucleases such as Cas9, Cas12a, and Cas13d. All types of base editors are also supported. gRNA design can be performed on reference genomes, transcriptomes, and custom DNA and RNA sequences. Both unpaired and paired gRNA designs are enabled.

Maintained by Jean-Philippe Fortin. Last updated 26 days ago.

crispr functionalgenomics genetarget bioconductor bioconductor-package crispr-cas9 crispr-design crispr-target genomics-analysis grna grna-sequence grna-sequences sgrna sgrna-design

22 stars 8.28 score 80 scripts 3 dependents

r-dbi

DBItest:Testing DBI Backends

A helper that tests DBI back ends for conformity to the interface.

Maintained by Kirill Müller. Last updated 15 days ago.

database testing

24 stars 8.21 score 11 scripts

darwin-eu

DrugUtilisation:Summarise Patient-Level Drug Utilisation in Data Mapped to the OMOP Common Data Model

Summarise patient-level drug utilisation cohorts using data mapped to the Observational Medical Outcomes Partnership (OMOP) common data model. New users and prevalent users cohorts can be generated and their characteristics, indication and drug use summarised.

Maintained by Martí Català. Last updated 2 months ago.

8.20 score 156 scripts 2 dependents

bioc

POMA:Tools for Omics Data Analysis

The POMA package offers a comprehensive toolkit designed for omics data analysis, streamlining the process from initial visualization to final statistical analysis. Its primary goal is to simplify and unify the various steps involved in omics data processing, making it more accessible and manageable within a single, intuitive R package. Emphasizing on reproducibility and user-friendliness, POMA leverages the standardized SummarizedExperiment class from Bioconductor, ensuring seamless integration and compatibility with a wide array of Bioconductor tools. This approach guarantees maximum flexibility and replicability, making POMA an essential asset for researchers handling omics datasets. See https://github.com/pcastellanoescuder/POMAShiny. Paper: Castellano-Escuder et al. (2021) <doi:10.1371/journal.pcbi.1009148> for more details.

Maintained by Pol Castellano-Escuder. Last updated 4 months ago.

batcheffect classification clustering decisiontree dimensionreduction multidimensionalscaling normalization preprocessing principalcomponent regression rnaseq software statisticalmethod visualization bioconductor bioinformatics data-visualization dimension-reduction exploratory-data-analysis machine-learning omics-data-integration pipeline pre-processing statistical-analysis user-friendly workflow

11 stars 8.16 score 20 scripts 1 dependents

bioc

dreamlet:Scalable differential expression analysis of single cell transcriptomics datasets with complex study designs

Recent advances in single cell/nucleus transcriptomic technology has enabled collection of cohort-scale datasets to study cell type specific gene expression differences associated disease state, stimulus, and genetic regulation. The scale of these data, complex study designs, and low read count per cell mean that characterizing cell type specific molecular mechanisms requires a user-frieldly, purpose-build analytical framework. We have developed the dreamlet package that applies a pseudobulk approach and fits a regression model for each gene and cell cluster to test differential expression across individuals associated with a trait of interest. Use of precision-weighted linear mixed models enables accounting for repeated measures study designs, high dimensional batch effects, and varying sequencing depth or observed cells per biosample.

Maintained by Gabriel Hoffman. Last updated 6 days ago.

rnaseq geneexpression differentialexpression batcheffect qualitycontrol regression genesetenrichment generegulation epigenetics functionalgenomics transcriptomics normalization singlecell preprocessing sequencing immunooncology software cpp

12 stars 8.14 score 128 scripts

pecanproject

PEcAnAssimSequential:PEcAn Functions Used for Ecological Forecasts and Reanalysis

The Predictive Ecosystem Carbon Analyzer (PEcAn) is a scientific workflow management tool that is designed to simplify the management of model parameterization, execution, and analysis. The goal of PECAn is to streamline the interaction between data and models, and to improve the efficacy of scientific investigation.

Maintained by Mike Dietze. Last updated 19 hours ago.

bayesian cyberinfrastructure data-assimilation data-science ecosystem-model ecosystem-science forecasting meta-analysis national-science-foundation pecan plants jags cpp

216 stars 8.14 score 35 scripts

bioc

motifmatchr:Fast Motif Matching in R

Quickly find motif matches for many motifs and many sequences. Wraps C++ code from the MOODS motif calling library, which was developed by Pasi Rastas, Janne Korhonen, and Petri Martinmäki.

Maintained by Alicia Schep. Last updated 5 months ago.

motifannotation cpp

8.11 score 722 scripts 5 dependents

bioc

monaLisa:Binned Motif Enrichment Analysis and Visualization

Useful functions to work with sequence motifs in the analysis of genomics data. These include methods to annotate genomic regions or sequences with predicted motif hits and to identify motifs that drive observed changes in accessibility or expression. Functions to produce informative visualizations of the obtained results are also provided.

Maintained by Michael Stadler. Last updated 8 days ago.

motifannotation visualization featureextraction epigenetics

40 stars 8.10 score 53 scripts

bioc

ggkegg:Analyzing and visualizing KEGG information using the grammar of graphics

This package aims to import, parse, and analyze KEGG data such as KEGG PATHWAY and KEGG MODULE. The package supports visualizing KEGG information using ggplot2 and ggraph through using the grammar of graphics. The package enables the direct visualization of the results from various omics analysis packages.

Maintained by Noriaki Sato. Last updated 2 months ago.

pathways dataimport kegg ggplot2 ggraph pathway tidygraph visualization

225 stars 8.08 score 30 scripts 1 dependents

bioc

STRINGdb:STRINGdb - Protein-Protein Interaction Networks and Functional Enrichment Analysis

The STRINGdb package provides a R interface to the STRING protein-protein interactions database (https://string-db.org).

Maintained by Damian Szklarczyk. Last updated 5 months ago.

network

8.08 score 344 scripts 9 dependents

bioc

rBLAST:R Interface for the Basic Local Alignment Search Tool

Seamlessly interfaces the Basic Local Alignment Search Tool (BLAST) running locally to search genetic sequence data bases. This work was partially supported by grant no. R21HG005912 from the National Human Genome Research Institute.

Maintained by Michael Hahsler. Last updated 3 months ago.

genetics sequencing sequencematching alignment dataimport bioconductor bioinformatics blast-search

106 stars 8.07 score 106 scripts

r-arcgis

arcgislayers:An Interface to ArcGIS Data Services

Enables users of 'ArcGIS Enterprise', 'ArcGIS Online', or 'ArcGIS Platform' to read, write, publish, or manage vector and raster data via ArcGIS location services REST API endpoints <https://developers.arcgis.com/rest/>.

Maintained by Josiah Parry. Last updated 13 days ago.

arcgis esri gis r-spatial

50 stars 8.07 score 38 scripts 4 dependents

bioc

FLAMES:FLAMES: Full Length Analysis of Mutations and Splicing in long read RNA-seq data

Semi-supervised isoform detection and annotation from both bulk and single-cell long read RNA-seq data. Flames provides automated pipelines for analysing isoforms, as well as intermediate functions for manual execution.

Maintained by Changqing Wang. Last updated 20 hours ago.

rnaseq singlecell transcriptomics dataimport differentialsplicing alternativesplicing geneexpression longread zlib curl bzip2 xz-utils cpp

33 stars 8.04 score 12 scripts

darwin-eu

CohortCharacteristics:Summarise and Visualise Characteristics of Patients in the OMOP CDM

Summarise and visualise the characteristics of patients in data mapped to the Observational Medical Outcomes Partnership (OMOP) common data model (CDM).

Maintained by Marti Catala. Last updated 4 months ago.

1 stars 8.03 score 111 scripts 1 dependents

bioc

biovizBase:Basic graphic utilities for visualization of genomic data.

The biovizBase package is designed to provide a set of utilities, color schemes and conventions for genomic data. It serves as the base for various high-level packages for biological data visualization. This saves development effort and encourages consistency.

Maintained by Michael Lawrence. Last updated 5 months ago.

infrastructure visualization preprocessing

8.03 score 273 scripts 74 dependents

bioc

recount3:Explore and download data from the recount3 project

The recount3 package enables access to a large amount of uniformly processed RNA-seq data from human and mouse. You can download RangedSummarizedExperiment objects at the gene, exon or exon-exon junctions level with sample metadata and QC statistics. In addition we provide access to sample coverage BigWig files.

Maintained by Leonardo Collado-Torres. Last updated 4 months ago.

coverage differentialexpression geneexpression rnaseq sequencing software dataimport annotation-agnostic bioconductor count derfinder exon gene human illumina junction mouse recount recount3

33 stars 8.03 score 216 scripts

bioc

simplifyEnrichment:Simplify Functional Enrichment Results

A new clustering algorithm, "binary cut", for clustering similarity matrices of functional terms is implemeted in this package. It also provides functions for visualizing, summarizing and comparing the clusterings.

Maintained by Zuguang Gu. Last updated 5 months ago.

software visualization go clustering genesetenrichment

113 stars 8.02 score 196 scripts

bioc

spicyR:Spatial analysis of in situ cytometry data

The spicyR package provides a framework for performing inference on changes in spatial relationships between pairs of cell types for cell-resolution spatial omics technologies. spicyR consists of three primary steps: (i) summarizing the degree of spatial localization between pairs of cell types for each image; (ii) modelling the variability in localization summary statistics as a function of cell counts and (iii) testing for changes in spatial localizations associated with a response variable.

Maintained by Ellis Patrick. Last updated 27 days ago.

singlecell cellbasedassays spatial

9 stars 8.02 score 57 scripts 1 dependents

bioc

netZooR:Unified methods for the inference and analysis of gene regulatory networks

netZooR unifies the implementations of several Network Zoo methods (netzoo, netzoo.github.io) into a single package by creating interfaces between network inference and network analysis methods. Currently, the package has 3 methods for network inference including PANDA and its optimized implementation OTTER (network reconstruction using mutliple lines of biological evidence), LIONESS (single-sample network inference), and EGRET (genotype-specific networks). Network analysis methods include CONDOR (community detection), ALPACA (differential community detection), CRANE (significance estimation of differential modules), MONSTER (estimation of network transition states). In addition, YARN allows to process gene expresssion data for tissue-specific analyses and SAMBAR infers missing mutation data based on pathway information.

Maintained by Tara Eicher. Last updated 13 days ago.

networkinference network generegulation geneexpression transcription microarray graphandnetwork gene-regulatory-network transcription-factors

105 stars 7.98 score

darwin-eu

IncidencePrevalence:Estimate Incidence and Prevalence using the OMOP Common Data Model

Calculate incidence and prevalence using data mapped to the Observational Medical Outcomes Partnership (OMOP) common data model. Incidence and prevalence can be estimated for the total population in a database or for a stratification cohort.

Maintained by Edward Burn. Last updated 21 days ago.

9 stars 7.96 score 102 scripts 1 dependents

bioc

motifStack:Plot stacked logos for single or multiple DNA, RNA and amino acid sequence

The motifStack package is designed for graphic representation of multiple motifs with different similarity scores. It works with both DNA/RNA sequence motif and amino acid sequence motif. In addition, it provides the flexibility for users to customize the graphic parameters such as the font type and symbol colors.

Maintained by Jianhong Ou. Last updated 3 months ago.

sequencematching visualization sequencing microarray alignment chipchip chipseq motifannotation dataimport

7.93 score 188 scripts 6 dependents

bioc

Category:Category Analysis

A collection of tools for performing category (gene set enrichment) analysis.

Maintained by Bioconductor Package Maintainer. Last updated 5 months ago.

annotation go pathways genesetenrichment

7.93 score 183 scripts 16 dependents

ohdsi

CohortGenerator:Cohort Generation for the OMOP Common Data Model

Generate cohorts and subsets using an Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) Database. Cohorts are defined using 'CIRCE' (<https://github.com/ohdsi/circe-be>) or SQL compatible with 'SqlRender' (<https://github.com/OHDSI/SqlRender>).

Maintained by Anthony Sena. Last updated 6 months ago.

hades openjdk

13 stars 7.91 score 165 scripts

bioc

BayesSpace:Clustering and Resolution Enhancement of Spatial Transcriptomes

Tools for clustering and enhancing the resolution of spatial gene expression experiments. BayesSpace clusters a low-dimensional representation of the gene expression matrix, incorporating a spatial prior to encourage neighboring spots to cluster together. The method can enhance the resolution of the low-dimensional representation into "sub-spots", for which features such as gene expression or cell type composition can be imputed.

Maintained by Matt Stone. Last updated 5 months ago.

software clustering transcriptomics geneexpression singlecell immunooncology dataimport openblas cpp openmp

126 stars 7.90 score 278 scripts 1 dependents

darwin-eu

visOmopResults:Graphs and Tables for OMOP Results

Provides methods to transform omop_result objects into formatted tables and figures, facilitating the visualisation of study results working with the Observational Medical Outcomes Partnership (OMOP) Common Data Model.

Maintained by Núria Mercadé-Besora. Last updated 11 days ago.

7.89 score 53 scripts 3 dependents

bioc

beadarray:Quality assessment and low-level analysis for Illumina BeadArray data

The package is able to read bead-level data (raw TIFFs and text files) output by BeadScan as well as bead-summary data from BeadStudio. Methods for quality assessment and low-level analysis are provided.

Maintained by Mark Dunning. Last updated 5 months ago.

microarray onechannel qualitycontrol preprocessing

7.88 score 70 scripts 4 dependents

adriancorrendo

metrica:Prediction Performance Metrics

A compilation of more than 80 functions designed to quantitatively and visually evaluate prediction performance of regression (continuous variables) and classification (categorical variables) of point-forecast models (e.g. APSIM, DSSAT, DNDC, supervised Machine Learning). For regression, it includes functions to generate plots (scatter, tiles, density, & Bland-Altman plot), and to estimate error metrics (e.g. MBE, MAE, RMSE), error decomposition (e.g. lack of accuracy-precision), model efficiency (e.g. NSE, E1, KGE), indices of agreement (e.g. d, RAC), goodness of fit (e.g. r, R2), adjusted correlation coefficients (e.g. CCC, dcorr), symmetric regression coefficients (intercept, slope), and mean absolute scaled error (MASE) for time series predictions. For classification (binomial and multinomial), it offers functions to generate and plot confusion matrices, and to estimate performance metrics such as accuracy, precision, recall, specificity, F-score, Cohen's Kappa, G-mean, and many more. For more details visit the vignettes <https://adriancorrendo.github.io/metrica/>.

Maintained by Adrian A. Correndo. Last updated 9 months ago.

77 stars 7.88 score 49 scripts

bioc

biodb:biodb, a library and a development framework for connecting to chemical and biological databases

The biodb package provides access to standard remote chemical and biological databases (ChEBI, KEGG, HMDB, ...), as well as to in-house local database files (CSV, SQLite), with easy retrieval of entries, access to web services, search of compounds by mass and/or name, and mass spectra matching for LCMS and MSMS. Its architecture as a development framework facilitates the development of new database connectors for local projects or inside separate published packages.

Maintained by Pierrick Roger. Last updated 5 months ago.

software infrastructure dataimport kegg biology cheminformatics chemistry databases cpp

11 stars 7.85 score 24 scripts 6 dependents

nmecsys

BETS:Brazilian Economic Time Series

It provides access to and information about the most important Brazilian economic time series - from the Getulio Vargas Foundation <http://portal.fgv.br/en>, the Central Bank of Brazil <http://www.bcb.gov.br> and the Brazilian Institute of Geography and Statistics <http://www.ibge.gov.br>. It also presents tools for managing, analysing (e.g. generating dynamic reports with a complete analysis of a series) and exporting these time series.

Maintained by Talitha Speranza. Last updated 4 years ago.

38 stars 7.82 score 108 scripts

bioc

Rcpi:Molecular Informatics Toolkit for Compound-Protein Interaction in Drug Discovery

A molecular informatics toolkit with an integration of bioinformatics and chemoinformatics tools for drug discovery.

Maintained by Nan Xiao. Last updated 5 months ago.

software dataimport datarepresentation featureextraction cheminformatics biomedicalinformatics proteomics go systemsbiology bioconductor bioinformatics drug-discovery feature-extraction fingerprint molecular-descriptors protein-sequences

37 stars 7.81 score 29 scripts

bioc

SRAdb:A compilation of metadata from NCBI SRA and tools

The Sequence Read Archive (SRA) is the largest public repository of sequencing data from the next generation of sequencing platforms including Roche 454 GS System, Illumina Genome Analyzer, Applied Biosystems SOLiD System, Helicos Heliscope, and others. However, finding data of interest can be challenging using current tools. SRAdb is an attempt to make access to the metadata associated with submission, study, sample, experiment and run much more feasible. This is accomplished by parsing all the NCBI SRA metadata into a SQLite database that can be stored and queried locally. Fulltext search in the package make querying metadata very flexible and powerful. fastq and sra files can be downloaded for doing alignment locally. Beside ftp protocol, the SRAdb has funcitons supporting fastp protocol (ascp from Aspera Connect) for faster downloading large data files over long distance. The SQLite database is updated regularly as new data is added to SRA and can be downloaded at will for the most up-to-date metadata.

Maintained by Jack Zhu. Last updated 4 months ago.

infrastructure sequencing dataimport

2 stars 7.81 score 200 scripts

bioc

debrowser:Interactive Differential Expresion Analysis Browser

Bioinformatics platform containing interactive plots and tables for differential gene and region expression studies. Allows visualizing expression data much more deeply in an interactive and faster way. By changing the parameters, users can easily discover different parts of the data that like never have been done before. Manually creating and looking these plots takes time. With DEBrowser users can prepare plots without writing any code. Differential expression, PCA and clustering analysis are made on site and the results are shown in various plots such as scatter, bar, box, volcano, ma plots and Heatmaps.

Maintained by Alper Kucukural. Last updated 5 months ago.

sequencing chipseq rnaseq differentialexpression geneexpression clustering immunooncology

61 stars 7.80 score 65 scripts

bioc

PhyloProfile:PhyloProfile

PhyloProfile is a tool for exploring complex phylogenetic profiles. Phylogenetic profiles, presence/absence patterns of genes over a set of species, are commonly used to trace the functional and evolutionary history of genes across species and time. With PhyloProfile we can enrich regular phylogenetic profiles with further data like sequence/structure similarity, to make phylogenetic profiling more meaningful. Besides the interactive visualisation powered by R-Shiny, the package offers a set of further analysis features to gain insights like the gene age estimation or core gene identification.

Maintained by Vinh Tran. Last updated 10 days ago.

software visualization datarepresentation multiplecomparison functionalprediction dimensionreduction bioinformatics heatmap interactive-visualizations orthologs phylogenetic-profile shiny

33 stars 7.79 score 10 scripts