Showing 200 of total 2018 results (show query)
igraph
igraph:Network Analysis and Visualization
Routines for simple graphs and network analysis. It can handle large graphs very well and provides functions for generating random and regular graphs, graph visualization, centrality methods and much more.
Maintained by Kirill Müller. Last updated 13 hours ago.
complex-networksgraph-algorithmsgraph-theorymathematicsnetwork-analysisnetwork-graphfortranlibxml2glpkopenblascpp
57.0 match 582 stars 21.11 score 31k scripts 1.9k dependentstraminer
TraMineR:Trajectory Miner: a Sequence Analysis Toolkit
Set of sequence analysis tools for manipulating, describing and rendering categorical sequences, and more generally mining sequence data in the field of social sciences. Although this sequence analysis package is primarily intended for state or event sequences that describe time use or life courses such as family formation histories or professional careers, its features also apply to many other kinds of categorical sequence data. It accepts many different sequence representations as input and provides tools for converting sequences from one format to another. It offers several functions for describing and rendering sequences, for computing distances between sequences with different metrics (among which optimal matching), original dissimilarity-based analysis tools, and functions for extracting the most frequent event subsequences and identifying the most discriminating ones among them. A user's guide can be found on the TraMineR web page.
Maintained by Gilbert Ritschard. Last updated 3 months ago.
136.9 match 11 stars 8.24 score 534 scripts 13 dependentsbioc
Biostrings:Efficient manipulation of biological strings
Memory efficient string containers, string matching algorithms, and other utilities, for fast manipulation of large biological sequences or sets of sequences.
Maintained by Hervé Pagès. Last updated 24 days ago.
sequencematchingalignmentsequencinggeneticsdataimportdatarepresentationinfrastructurebioconductor-packagecore-package
51.5 match 61 stars 17.83 score 8.6k scripts 1.2k dependentsemmanuelparadis
ape:Analyses of Phylogenetics and Evolution
Functions for reading, writing, plotting, and manipulating phylogenetic trees, analyses of comparative data in a phylogenetic framework, ancestral character analyses, analyses of diversification and macroevolution, computing distances from DNA sequences, reading and writing nucleotide sequences as well as importing from BioConductor, and several tools such as Mantel's test, generalized skyline plots, graphical exploration of phylogenetic data (alex, trex, kronoviz), estimation of absolute evolutionary rates and clock-like trees using mean path lengths and penalized likelihood, dating trees with non-contemporaneous sequences, translating DNA into AA sequences, and assessing sequence alignments. Phylogeny estimation can be done with the NJ, BIONJ, ME, MVR, SDM, and triangle methods, and several methods handling incomplete distance matrices (NJ*, BIONJ*, MVR*, and the corresponding triangle method). Some functions call external applications (PhyML, Clustal, T-Coffee, Muscle) whose results are returned into R.
Maintained by Emmanuel Paradis. Last updated 14 hours ago.
41.1 match 64 stars 17.22 score 13k scripts 599 dependentsbioc
DECIPHER:Tools for curating, analyzing, and manipulating biological sequences
A toolset for deciphering and managing biological sequences.
Maintained by Erik Wright. Last updated 5 days ago.
clusteringgeneticssequencingdataimportvisualizationmicroarrayqualitycontrolqpcralignmentwholegenomemicrobiomeimmunooncologygenepredictionopenmp
77.0 match 8.40 score 1.1k scripts 14 dependentsbioc
dada2:Accurate, high-resolution sample inference from amplicon sequencing data
The dada2 package infers exact amplicon sequence variants (ASVs) from high-throughput amplicon sequencing data, replacing the coarser and less accurate OTU clustering approach. The dada2 pipeline takes as input demultiplexed fastq files, and outputs the sequence variants and their sample-wise abundances after removing substitution and chimera errors. Taxonomic classification is available via a native implementation of the RDP naive Bayesian classifier, and species-level assignment to 16S rRNA gene fragments by exact matching.
Maintained by Benjamin Callahan. Last updated 5 months ago.
immunooncologymicrobiomesequencingclassificationmetagenomicsampliconbioconductorbioinformaticsmetabarcodingtaxonomycpp
49.0 match 485 stars 13.17 score 3.0k scripts 4 dependentsnanxstats
protr:Generating Various Numerical Representation Schemes for Protein Sequences
Comprehensive toolkit for generating various numerical features of protein sequences described in Xiao et al. (2015) <DOI:10.1093/bioinformatics/btv042>. For full functionality, the software 'ncbi-blast+' is needed, see <https://blast.ncbi.nlm.nih.gov/doc/blast-help/downloadblastdata.html> for more information.
Maintained by Nan Xiao. Last updated 6 months ago.
bioinformaticsfeature-engineeringfeature-extractionmachine-learningpeptidesprotein-sequencessequence-analysis
59.7 match 52 stars 10.02 score 173 scripts 3 dependentsbodkan
slendr:A Simulation Framework for Spatiotemporal Population Genetics
A framework for simulating spatially explicit genomic data which leverages real cartographic information for programmatic and visual encoding of spatiotemporal population dynamics on real geographic landscapes. Population genetic models are then automatically executed by the 'SLiM' software by Haller et al. (2019) <doi:10.1093/molbev/msy228> behind the scenes, using a custom built-in simulation 'SLiM' script. Additionally, fully abstract spatial models not tied to a specific geographic location are supported, and users can also simulate data from standard, non-spatial, random-mating models. These can be simulated either with the 'SLiM' built-in back-end script, or using an efficient coalescent population genetics simulator 'msprime' by Baumdicker et al. (2022) <doi:10.1093/genetics/iyab229> with a custom-built 'Python' script bundled with the R package. Simulated genomic data is saved in a tree-sequence format and can be loaded, manipulated, and summarised using tree-sequence functionality via an R interface to the 'Python' module 'tskit' by Kelleher et al. (2019) <doi:10.1038/s41588-019-0483-y>. Complete model configuration, simulation and analysis pipelines can be therefore constructed without a need to leave the R environment, eliminating friction between disparate tools for population genetic simulations and data analysis.
Maintained by Martin Petr. Last updated 12 days ago.
popgenpopulation-geneticssimulationsspatial-statistics
59.3 match 56 stars 9.15 score 88 scriptsdosorio
Peptides:Calculate Indices and Theoretical Physicochemical Properties of Protein Sequences
Includes functions to calculate several physicochemical properties and indices for amino-acid sequences as well as to read and plot 'XVG' output files from the 'GROMACS' molecular dynamics package.
Maintained by Daniel Osorio. Last updated 1 years ago.
bioinformaticscalculate-indicespeptidesprotein-sequencesqsarcpp
58.8 match 82 stars 9.14 score 245 scripts 7 dependentsbioc
universalmotif:Import, Modify, and Export Motifs with R
Allows for importing most common motif types into R for use by functions provided by other Bioconductor motif-related packages. Motifs can be exported into most major motif formats from various classes as defined by other Bioconductor packages. A suite of motif and sequence manipulation and analysis functions are included, including enrichment, comparison, P-value calculation, shuffling, trimming, higher-order motifs, and others.
Maintained by Benjamin Jean-Marie Tremblay. Last updated 4 months ago.
motifannotationmotifdiscoverydataimportgeneregulationmotif-analysismotif-enrichment-analysissequence-logocpp
46.9 match 28 stars 11.04 score 342 scripts 12 dependentsbrodieg
fansi:ANSI Control Sequence Aware String Functions
Counterparts to R string manipulation functions that account for the effects of ANSI text formatting control sequences.
Maintained by Brodie Gaslam. Last updated 10 months ago.
31.3 match 54 stars 14.18 score 136 scripts 11k dependentsr-lib
clock:Date-Time Types and Tools
Provides a comprehensive library for date-time manipulations using a new family of orthogonal date-time classes (durations, time points, zoned-times, and calendars) that partition responsibilities so that the complexities of time zones are only considered when they are really needed. Capabilities include: date-time parsing, formatting, arithmetic, extraction and updating of components, and rounding.
Maintained by Davis Vaughan. Last updated 2 days ago.
30.6 match 106 stars 14.48 score 296 scripts 407 dependentsfkeck
bioseq:A Toolbox for Manipulating Biological Sequences
Classes and functions to work with biological sequences (DNA, RNA and amino acid sequences). Implements S3 infrastructure to work with biological sequences as described in Keck (2020) <doi:10.1111/2041-210X.13490>. Provides a collection of functions to perform biological conversion among classes (transcription, translation) and basic operations on sequences (detection, selection and replacement based on positions or patterns). The package also provides functions to import and export sequences from and to other package formats.
Maintained by Francois Keck. Last updated 3 years ago.
64.4 match 22 stars 6.72 score 80 scripts 1 dependentsbioc
crisprDesign:Comprehensive design of CRISPR gRNAs for nucleases and base editors
Provides a comprehensive suite of functions to design and annotate CRISPR guide RNA (gRNAs) sequences. This includes on- and off-target search, on-target efficiency scoring, off-target scoring, full gene and TSS contextual annotations, and SNP annotation (human only). It currently support five types of CRISPR modalities (modes of perturbations): CRISPR knockout, CRISPR activation, CRISPR inhibition, CRISPR base editing, and CRISPR knockdown. All types of CRISPR nucleases are supported, including DNA- and RNA-target nucleases such as Cas9, Cas12a, and Cas13d. All types of base editors are also supported. gRNA design can be performed on reference genomes, transcriptomes, and custom DNA and RNA sequences. Both unpaired and paired gRNA designs are enabled.
Maintained by Jean-Philippe Fortin. Last updated 11 days ago.
crisprfunctionalgenomicsgenetargetbioconductorbioconductor-packagecrispr-cas9crispr-designcrispr-targetgenomics-analysisgrnagrna-sequencegrna-sequencessgrnasgrna-design
47.9 match 22 stars 8.28 score 80 scripts 3 dependentsbioc
msa:Multiple Sequence Alignment
The 'msa' package provides a unified R/Bioconductor interface to the multiple sequence alignment algorithms ClustalW, ClustalOmega, and Muscle. All three algorithms are integrated in the package, therefore, they do not depend on any external software tools and are available for all major platforms. The multiple sequence alignment algorithms are complemented by a function for pretty-printing multiple sequence alignments using the LaTeX package TeXshade.
Maintained by Ulrich Bodenhofer. Last updated 1 months ago.
multiplesequencealignmentalignmentmultiplecomparisonsequencingcpp
33.8 match 17 stars 11.51 score 744 scripts 6 dependentsadrientaudiere
MiscMetabar:Miscellaneous Functions for Metabarcoding Analysis
Facilitate the description, transformation, exploration, and reproducibility of metabarcoding analyses. 'MiscMetabar' is mainly built on top of the 'phyloseq', 'dada2' and 'targets' R packages. It helps to build reproducible and robust bioinformatics pipelines in R. 'MiscMetabar' makes ecological analysis of alpha and beta-diversity easier, more reproducible and more powerful by integrating a large number of tools. Important features are described in Taudière A. (2023) <doi:10.21105/joss.06038>.
Maintained by Adrien Taudière. Last updated 26 days ago.
sequencingmicrobiomemetagenomicsclusteringclassificationvisualizationampliconamplicon-sequencingbiodiversity-informaticsecologyilluminametabarcodingngs-analysis
60.3 match 17 stars 6.44 score 23 scriptsbiogenies
tidysq:Tidy Processing and Analysis of Biological Sequences
A tidy approach to analysis of biological sequences. All processing and data-storage functions are heavily optimized to allow the fastest and most efficient data storage.
Maintained by Dominik Rafacz. Last updated 2 months ago.
bioconductorbioinformaticsbiological-sequencesfastas3sequencestibbletidytidyversevctrscpp
50.5 match 40 stars 7.56 score 38 scriptsropensci
biomartr:Genomic Data Retrieval
Perform large scale genomic data retrieval and functional annotation retrieval. This package aims to provide users with a standardized way to automate genome, proteome, 'RNA', coding sequence ('CDS'), 'GFF', and metagenome retrieval from 'NCBI RefSeq', 'NCBI Genbank', 'ENSEMBL', and 'UniProt' databases. Furthermore, an interface to the 'BioMart' database (Smedley et al. (2009) <doi:10.1186/1471-2164-10-22>) allows users to retrieve functional annotation for genomic loci. In addition, users can download entire databases such as 'NCBI RefSeq' (Pruitt et al. (2007) <doi:10.1093/nar/gkl842>), 'NCBI nr', 'NCBI nt', 'NCBI Genbank' (Benson et al. (2013) <doi:10.1093/nar/gks1195>), etc. with only one command.
Maintained by Hajk-Georg Drost. Last updated 1 months ago.
biomartgenomic-data-retrievalannotation-retrievaldatabase-retrievalncbiensemblbiological-data-retrievalensembl-serversgenomegenome-annotationgenome-retrievalgenomicsmeta-analysismetagenomicsncbi-genbankpeer-reviewedproteomesequenced-genomes
32.7 match 218 stars 11.35 score 129 scripts 3 dependentsbioboot
bio3d:Biological Structure Analysis
Utilities to process, organize and explore protein structure, sequence and dynamics data. Features include the ability to read and write structure, sequence and dynamic trajectory data, perform sequence and structure database searches, data summaries, atom selection, alignment, superposition, rigid core identification, clustering, torsion analysis, distance matrix analysis, structure and sequence conservation analysis, normal mode analysis, principal component analysis of heterogeneous structure data, and correlation network analysis from normal mode and molecular dynamics data. In addition, various utility functions are provided to enable the statistical and graphical power of the R environment to work with biological sequence and structural data. Please refer to the URLs below for more information.
Maintained by Barry Grant. Last updated 5 months ago.
43.4 match 5 stars 8.49 score 1.4k scripts 10 dependentsbenrenard
sequenceR:A Simple Sequencer for Data Sonification
A rudimentary sequencer to define, manipulate and mix sound samples. The underlying motivation is to sonify data, as demonstrated in the blog <https://globxblog.github.io/>, the presentation by Renard and Le Bescond (2022, <https://hal.science/hal-03710340v1>) or the poster by Renard et al. (2023, <https://hal.inrae.fr/hal-04388845v1>).
Maintained by Benjamin Renard. Last updated 2 months ago.
70.1 match 3 stars 5.13 score 15 scriptsbioc
Rcpi:Molecular Informatics Toolkit for Compound-Protein Interaction in Drug Discovery
A molecular informatics toolkit with an integration of bioinformatics and chemoinformatics tools for drug discovery.
Maintained by Nan Xiao. Last updated 5 months ago.
softwaredataimportdatarepresentationfeatureextractioncheminformaticsbiomedicalinformaticsproteomicsgosystemsbiologybioconductorbioinformaticsdrug-discoveryfeature-extractionfingerprintmolecular-descriptorsprotein-sequences
43.4 match 37 stars 7.81 score 29 scriptsropensci
phylotaR:Automated Phylogenetic Sequence Cluster Identification from 'GenBank'
A pipeline for the identification, within taxonomic groups, of orthologous sequence clusters from 'GenBank' <https://www.ncbi.nlm.nih.gov/genbank/> as the first step in a phylogenetic analysis. The pipeline depends on a local alignment search tool and is, therefore, not dependent on differences in gene naming conventions and naming errors.
Maintained by Shixiang Wang. Last updated 8 months ago.
blastngenbankpeer-reviewedphylogeneticssequence-alignment
56.2 match 23 stars 5.86 score 156 scriptsbioc
motifStack:Plot stacked logos for single or multiple DNA, RNA and amino acid sequence
The motifStack package is designed for graphic representation of multiple motifs with different similarity scores. It works with both DNA/RNA sequence motif and amino acid sequence motif. In addition, it provides the flexibility for users to customize the graphic parameters such as the font type and symbol colors.
Maintained by Jianhong Ou. Last updated 2 months ago.
sequencematchingvisualizationsequencingmicroarrayalignmentchipchipchipseqmotifannotationdataimport
41.4 match 7.93 score 188 scripts 6 dependentssamuel-marsh
scCustomize:Custom Visualizations & Functions for Streamlined Analyses of Single Cell Sequencing
Collection of functions created and/or curated to aid in the visualization and analysis of single-cell data using 'R'. 'scCustomize' aims to provide 1) Customized visualizations for aid in ease of use and to create more aesthetic and functional visuals. 2) Improve speed/reproducibility of common tasks/pieces of code in scRNA-seq analysis with a single or group of functions. For citation please use: Marsh SE (2021) "Custom Visualizations & Functions for Streamlined Analyses of Single Cell Sequencing" <doi:10.5281/zenodo.5706430> RRID:SCR_024675.
Maintained by Samuel Marsh. Last updated 3 months ago.
customizationggplot2scrna-seqseuratsingle-cellsingle-cell-genomicssingle-cell-rna-seqvisualization
32.8 match 242 stars 8.75 score 1.1k scriptsklausvigo
phangorn:Phylogenetic Reconstruction and Analysis
Allows for estimation of phylogenetic trees and networks using Maximum Likelihood, Maximum Parsimony, distance methods and Hadamard conjugation (Schliep 2011). Offers methods for tree comparison, model selection and visualization of phylogenetic networks as described in Schliep et al. (2017).
Maintained by Klaus Schliep. Last updated 1 months ago.
softwaretechnologyqualitycontrolphylogenetic-analysisphylogeneticsopenblascpp
16.9 match 206 stars 16.69 score 2.5k scripts 135 dependentsbioc
ORFik:Open Reading Frames in Genomics
R package for analysis of transcript and translation features through manipulation of sequence data and NGS data like Ribo-Seq, RNA-Seq, TCP-Seq and CAGE. It is generalized in the sense that any transcript region can be analysed, as the name hints to it was made with investigation of ribosomal patterns over Open Reading Frames (ORFs) as it's primary use case. ORFik is extremely fast through use of C++, data.table and GenomicRanges. Package allows to reassign starts of the transcripts with the use of CAGE-Seq data, automatic shifting of RiboSeq reads, finding of Open Reading Frames for whole genomes and much more.
Maintained by Haakon Tjeldnes. Last updated 28 days ago.
immunooncologysoftwaresequencingriboseqrnaseqfunctionalgenomicscoveragealignmentdataimportcpp
25.9 match 33 stars 10.63 score 115 scripts 2 dependentsbioc
GenomicFeatures:Query the gene models of a given organism/assembly
Extract the genomic locations of genes, transcripts, exons, introns, and CDS, for the gene models stored in a TxDb object. A TxDb object is a small database that contains the gene models of a given organism/assembly. Bioconductor provides a small collection of TxDb objects in the form of ready-to-install TxDb packages for the most commonly studied organisms. Additionally, the user can easily make a TxDb object (or package) for the organism/assembly of their choice by using the tools from the txdbmaker package.
Maintained by H. Pagès. Last updated 4 months ago.
geneticsinfrastructureannotationsequencinggenomeannotationbioconductor-packagecore-package
17.8 match 26 stars 15.34 score 5.3k scripts 339 dependentst-kalinowski
keras:R Interface to 'Keras'
Interface to 'Keras' <https://keras.io>, a high-level neural networks 'API'. 'Keras' was developed with a focus on enabling fast experimentation, supports both convolution based networks and recurrent networks (as well as combinations of the two), and runs seamlessly on both 'CPU' and 'GPU' devices.
Maintained by Tomasz Kalinowski. Last updated 11 months ago.
25.0 match 10.82 score 10k scripts 54 dependentsbioc
metagenomeSeq:Statistical analysis for sparse high-throughput sequencing
metagenomeSeq is designed to determine features (be it Operational Taxanomic Unit (OTU), species, etc.) that are differentially abundant between two or more groups of multiple samples. metagenomeSeq is designed to address the effects of both normalization and under-sampling of microbial communities on disease association detection and the testing of feature correlations.
Maintained by Joseph N. Paulson. Last updated 3 months ago.
immunooncologyclassificationclusteringgeneticvariabilitydifferentialexpressionmicrobiomemetagenomicsnormalizationvisualizationmultiplecomparisonsequencingsoftware
22.4 match 69 stars 12.02 score 494 scripts 7 dependentsrstudio
keras3:R Interface to 'Keras'
Interface to 'Keras' <https://keras.io>, a high-level neural networks API. 'Keras' was developed with a focus on enabling fast experimentation, supports both convolution based networks and recurrent networks (as well as combinations of the two), and runs seamlessly on both CPU and GPU devices.
Maintained by Tomasz Kalinowski. Last updated 4 days ago.
18.9 match 845 stars 13.57 score 264 scripts 2 dependentsbioc
GenomAutomorphism:Compute the automorphisms between DNA's Abelian group representations
This is a R package to compute the automorphisms between pairwise aligned DNA sequences represented as elements from a Genomic Abelian group. In a general scenario, from genomic regions till the whole genomes from a given population (from any species or close related species) can be algebraically represented as a direct sum of cyclic groups or more specifically Abelian p-groups. Basically, we propose the representation of multiple sequence alignments of length N bp as element of a finite Abelian group created by the direct sum of homocyclic Abelian group of prime-power order.
Maintained by Robersy Sanchez. Last updated 3 months ago.
mathematicalbiologycomparativegenomicsfunctionalgenomicsmultiplesequencealignmentwholegenomegenetic-codegenetic-code-algebragenomegenome-algebra
58.6 match 4.30 score 9 scriptsbioc
Rsamtools:Binary alignment (BAM), FASTA, variant call (BCF), and tabix file import
This package provides an interface to the 'samtools', 'bcftools', and 'tabix' utilities for manipulating SAM (Sequence Alignment / Map), FASTA, binary variant call (BCF) and compressed indexed tab-delimited (tabix) files.
Maintained by Bioconductor Package Maintainer. Last updated 4 months ago.
dataimportsequencingcoveragealignmentqualitycontrolbioconductor-packagecore-packagecurlbzip2xz-utilszlibcpp
15.8 match 28 stars 15.42 score 3.2k scripts 566 dependentsbioc
RAIDS:Accurate Inference of Genetic Ancestry from Cancer Sequences
This package implements specialized algorithms that enable genetic ancestry inference from various cancer sequences sources (RNA, Exome and Whole-Genome sequences). This package also implements a simulation algorithm that generates synthetic cancer-derived data. This code and analysis pipeline was designed and developed for the following publication: Belleau, P et al. Genetic Ancestry Inference from Cancer-Derived Molecular Data across Genomic and Transcriptomic Platforms. Cancer Res 1 January 2023; 83 (1): 49–58.
Maintained by Pascal Belleau. Last updated 5 months ago.
geneticssoftwaresequencingwholegenomeprincipalcomponentgeneticvariabilitydimensionreductionbiocviewsancestrycancer-genomicsexome-sequencinggenomicsinferencer-languagerna-seqrna-sequencingwhole-genome-sequencing
38.2 match 5 stars 6.23 score 19 scriptsbioc
Modstrings:Working with modified nucleotide sequences
Representing nucleotide modifications in a nucleotide sequence is usually done via special characters from a number of sources. This represents a challenge to work with in R and the Biostrings package. The Modstrings package implements this functionallity for RNA and DNA sequences containing modified nucleotides by translating the character internally in order to work with the infrastructure of the Biostrings package. For this the ModRNAString and ModDNAString classes and derivates and functions to construct and modify these objects despite the encoding issues are implemenented. In addition the conversion from sequences to list like location information (and the reverse operation) is implemented as well.
Maintained by Felix G.M. Ernst. Last updated 5 months ago.
dataimportdatarepresentationinfrastructuresequencingsoftwarebioconductorbiostringsdnadna-modificationsmodified-nucleotidesnucleotidesrnarna-modification-alphabetrna-modificationssequences
35.5 match 1 stars 6.64 score 5 scripts 8 dependentsbioc
phyloseq:Handling and analysis of high-throughput microbiome census data
phyloseq provides a set of classes and tools to facilitate the import, storage, analysis, and graphical display of microbiome census data.
Maintained by Paul J. McMurdie. Last updated 5 months ago.
immunooncologysequencingmicrobiomemetagenomicsclusteringclassificationmultiplecomparisongeneticvariability
16.7 match 597 stars 13.90 score 8.4k scripts 37 dependentsbioc
seqArchR:Identify Different Architectures of Sequence Elements
seqArchR enables unsupervised discovery of _de novo_ clusters with characteristic sequence architectures characterized by position-specific motifs or composition of stretches of nucleotides, e.g., CG-richness. seqArchR does _not_ require any specifications w.r.t. the number of clusters, the length of any individual motifs, or the distance between motifs if and when they occur in pairs/groups; it directly detects them from the data. seqArchR uses non-negative matrix factorization (NMF) as its backbone, and employs a chunking-based iterative procedure that enables processing of large sequence collections efficiently. Wrapper functions are provided for visualizing cluster architectures as sequence logos.
Maintained by Sarvesh Nikumbh. Last updated 5 months ago.
motifdiscoverygeneregulationmathematicalbiologysystemsbiologytranscriptomicsgeneticsclusteringdimensionreductionfeatureextractiondnaseqnmfnonnegative-matrix-factorizationpromoter-sequence-architecturesscikit-learnsequence-analysissequence-architecturesunsupervised-machine-learning
51.2 match 1 stars 4.48 score 9 scripts 1 dependentsbioc
LymphoSeq:Analyze high-throughput sequencing of T and B cell receptors
This R package analyzes high-throughput sequencing of T and B cell receptor complementarity determining region 3 (CDR3) sequences generated by Adaptive Biotechnologies' ImmunoSEQ assay. Its input comes from tab-separated value (.tsv) files exported from the ImmunoSEQ analyzer.
Maintained by David Coffey. Last updated 5 months ago.
softwaretechnologysequencingtargetedresequencingalignmentmultiplesequencealignment
56.6 match 4.00 score 4 scriptsbioc
kebabs:Kernel-Based Analysis of Biological Sequences
The package provides functionality for kernel-based analysis of DNA, RNA, and amino acid sequences via SVM-based methods. As core functionality, kebabs implements following sequence kernels: spectrum kernel, mismatch kernel, gappy pair kernel, and motif kernel. Apart from an efficient implementation of standard position-independent functionality, the kernels are extended in a novel way to take the position of patterns into account for the similarity measure. Because of the flexibility of the kernel formulation, other kernels like the weighted degree kernel or the shifted weighted degree kernel with constant weighting of positions are included as special cases. An annotation-specific variant of the kernels uses annotation information placed along the sequence together with the patterns in the sequence. The package allows for the generation of a kernel matrix or an explicit feature representation in dense or sparse format for all available kernels which can be used with methods implemented in other R packages. With focus on SVM-based methods, kebabs provides a framework which simplifies the usage of existing SVM implementations in kernlab, e1071, and LiblineaR. Binary and multi-class classification as well as regression tasks can be used in a unified way without having to deal with the different functions, parameters, and formats of the selected SVM. As support for choosing hyperparameters, the package provides cross validation - including grouped cross validation, grid search and model selection functions. For easier biological interpretation of the results, the package computes feature weights for all SVMs and prediction profiles which show the contribution of individual sequence positions to the prediction result and indicate the relevance of sequence sections for the learning result and the underlying biological functions.
Maintained by Ulrich Bodenhofer. Last updated 5 months ago.
supportvectormachineclassificationclusteringregressioncpp
34.0 match 6.58 score 47 scripts 3 dependentsbioc
CellBarcode:Cellular DNA Barcode Analysis toolkit
The package CellBarcode performs Cellular DNA Barcode analysis. It can handle all kinds of DNA barcodes, as long as the barcode is within a single sequencing read and has a pattern that can be matched by a regular expression. \code{CellBarcode} can handle barcodes with flexible lengths, with or without UMI (unique molecular identifier). This tool also can be used for pre-processing some amplicon data such as CRISPR gRNA screening, immune repertoire sequencing, and metagenome data.
Maintained by Wenjie Sun. Last updated 12 days ago.
preprocessingqualitycontrolsequencingcrisprampliconamplicon-sequencingcellular-barcoderustcargo
37.9 match 1 stars 5.86 score 40 scriptslvclark
polyRAD:Genotype Calling with Uncertainty from Sequencing Data in Polyploids and Diploids
Read depth data from genotyping-by-sequencing (GBS) or restriction site-associated DNA sequencing (RAD-seq) are imported and used to make Bayesian probability estimates of genotypes in polyploids or diploids. The genotype probabilities, posterior mean genotypes, or most probable genotypes can then be exported for downstream analysis. 'polyRAD' is described by Clark et al. (2019) <doi:10.1534/g3.118.200913>, and the Hind/He statistic for marker filtering is described by Clark et al. (2022) <doi:10.1186/s12859-022-04635-9>. A variant calling pipeline for highly duplicated genomes is also included and is described by Clark et al. (2020, Version 1) <doi:10.1101/2020.01.11.902890>.
Maintained by Lindsay V. Clark. Last updated 8 days ago.
bioinformaticsdna-sequencinggenotype-likelihoodsgenotyping-by-sequencinghacktoberfestrad-seqrad-sequencingsnp-genotypingcpp
31.5 match 28 stars 6.98 score 85 scriptsshaunpwilkinson
insect:Informatic Sequence Classification Trees
Provides tools for probabilistic taxon assignment with informatic sequence classification trees. See Wilkinson et al (2018) <doi:10.7287/peerj.preprints.26812v1>.
Maintained by Shaun Wilkinson. Last updated 4 years ago.
37.7 match 14 stars 5.80 score 91 scriptsmichbur
biogram:N-Gram Analysis of Biological Sequences
Tools for extraction and analysis of various n-grams (k-mers) derived from biological sequences (proteins or nucleic acids). Contains QuiPT (quick permutation test) for fast feature-filtering of the n-gram data.
Maintained by Michal Burdukiewicz. Last updated 7 months ago.
biological-sequencesngram-analysis
28.0 match 10 stars 7.50 score 87 scripts 3 dependentsmartin3141
spant:MR Spectroscopy Analysis Tools
Tools for reading, visualising and processing Magnetic Resonance Spectroscopy data. The package includes methods for spectral fitting: Wilson (2021) <DOI:10.1002/mrm.28385> and spectral alignment: Wilson (2018) <DOI:10.1002/mrm.27605>.
Maintained by Martin Wilson. Last updated 30 days ago.
brainmrimrsmrshubspectroscopyfortran
24.4 match 25 stars 8.52 score 81 scriptskbhoehn
dowser:B Cell Receptor Phylogenetics Toolkit
Provides a set of functions for inferring, visualizing, and analyzing B cell phylogenetic trees. Provides methods to 1) reconstruct unmutated ancestral sequences, 2) build B cell phylogenetic trees using multiple methods, 3) visualize trees with metadata at the tips, 4) reconstruct intermediate sequences, 5) detect biased ancestor-descendant relationships among metadata types Workflow examples available at documentation site (see URL). Citations: Hoehn et al (2022) <doi:10.1371/journal.pcbi.1009885>, Hoehn et al (2021) <doi:10.1101/2021.01.06.425648>.
Maintained by Kenneth Hoehn. Last updated 2 months ago.
30.6 match 6.81 score 84 scriptsbioc
TitanCNA:Subclonal copy number and LOH prediction from whole genome sequencing of tumours
Hidden Markov model to segment and predict regions of subclonal copy number alterations (CNA) and loss of heterozygosity (LOH), and estimate cellular prevalence of clonal clusters in tumour whole genome sequencing data.
Maintained by Gavin Ha. Last updated 5 months ago.
sequencingwholegenomednaseqexomeseqstatisticalmethodcopynumbervariationhiddenmarkovmodelgeneticsgenomicvariationimmunooncology10x-genomicscopy-number-variationgenome-sequencinghmmtumor-heterogeneity
24.1 match 96 stars 8.47 score 68 scriptsbioc
decontam:Identify Contaminants in Marker-gene and Metagenomics Sequencing Data
Simple statistical identification of contaminating sequence features in marker-gene or metagenomics data. Works on any kind of feature derived from environmental sequencing data (e.g. ASVs, OTUs, taxonomic groups, MAGs,...). Requires DNA quantitation data or sequenced negative control samples.
Maintained by Benjamin Callahan. Last updated 5 months ago.
immunooncologymicrobiomesequencingclassificationmetagenomicsampliconbioinformaticscontaminationmetabarcoding
17.7 match 153 stars 11.36 score 524 scripts 5 dependentsbioc
ShortRead:FASTQ input and manipulation
This package implements sampling, iteration, and input of FASTQ files. The package includes functions for filtering and trimming reads, and for generating a quality assessment report. Data are represented as DNAStringSet-derived objects, and easily manipulated for a diversity of purposes. The package also contains legacy support for early single-end, ungapped alignment formats.
Maintained by Bioconductor Package Maintainer. Last updated 5 months ago.
dataimportsequencingqualitycontrolbioconductor-packagecore-packagezlibcpp
16.5 match 8 stars 12.08 score 1.8k scripts 49 dependentsbioc
immApex:Tools for Adaptive Immune Receptor Sequence-Based Machine and Deep Learning
A set of tools to build tensorflow/keras3-based models in R from amino acid and nucleotide sequences focusing on adaptive immune receptors. The package includes pre-processing of sequences, unifying gene nomenclature usage, encoding sequences, and combining models. This package will serve as the basis of future immune receptor sequence functions/packages/models compatible with the scRepertoire ecosystem.
Maintained by Nick Borcherding. Last updated 19 days ago.
softwareimmunooncologysinglecellclassificationannotationsequencingmotifannotation
32.7 match 8 stars 5.92 score 3 scriptsbioc
crisprBase:Base functions and classes for CRISPR gRNA design
Provides S4 classes for general nucleases, CRISPR nucleases, CRISPR nickases, and base editors.Several CRISPR-specific genome arithmetic functions are implemented to help extract genomic coordinates of spacer and protospacer sequences. Commonly-used CRISPR nuclease objects are provided that can be readily used in other packages. Both DNA- and RNA-targeting nucleases are supported.
Maintained by Jean-Philippe Fortin. Last updated 5 months ago.
crisprfunctionalgenomicsbioconductorbioconductor-packagecrispr-cas9crispr-designcrispr-targetgrnagrna-sequencegrna-sequences
26.7 match 5 stars 7.15 score 52 scripts 6 dependentsbioc
Rsubread:Mapping, quantification and variant analysis of sequencing data
Alignment, quantification and analysis of RNA sequencing data (including both bulk RNA-seq and scRNA-seq) and DNA sequenicng data (including ATAC-seq, ChIP-seq, WGS, WES etc). Includes functionality for read mapping, read counting, SNP calling, structural variant detection and gene fusion discovery. Can be applied to all major sequencing techologies and to both short and long sequence reads.
Maintained by Wei Shi. Last updated 2 days ago.
sequencingalignmentsequencematchingrnaseqchipseqsinglecellgeneexpressiongeneregulationgeneticsimmunooncologysnpgeneticvariabilitypreprocessingqualitycontrolgenomeannotationgenefusiondetectionindeldetectionvariantannotationvariantdetectionmultiplesequencealignmentzlib
20.7 match 9.24 score 892 scripts 10 dependentsgobbios
EloRating:Animal Dominance Hierarchies by Elo Rating
Provides functions to quantify animal dominance hierarchies. The major focus is on Elo rating and its ability to deal with temporal dynamics in dominance interaction sequences. For static data, David's score and de Vries' I&SI are also implemented. In addition, the package provides functions to assess transitivity, linearity and stability of dominance networks. See Neumann et al (2011) <doi:10.1016/j.anbehav.2011.07.016> for an introduction.
Maintained by Christof Neumann. Last updated 8 months ago.
27.7 match 4 stars 6.86 score 61 scripts 1 dependentsbioc
GenomicRanges:Representation and manipulation of genomic intervals
The ability to efficiently represent and manipulate genomic annotations and alignments is playing a central role when it comes to analyzing high-throughput sequencing data (a.k.a. NGS data). The GenomicRanges package defines general purpose containers for storing and manipulating genomic intervals and variables defined along a genome. More specialized containers for representing and manipulating short alignments against a reference genome, or a matrix-like summarization of an experiment, are defined in the GenomicAlignments and SummarizedExperiment packages, respectively. Both packages build on top of the GenomicRanges infrastructure.
Maintained by Hervé Pagès. Last updated 4 months ago.
geneticsinfrastructuredatarepresentationsequencingannotationgenomeannotationcoveragebioconductor-packagecore-package
10.5 match 44 stars 17.75 score 13k scripts 1.3k dependentsspedygiorgio
markovchain:Easy Handling Discrete Time Markov Chains
Functions and S4 methods to create and manage discrete time Markov chains more easily. In addition functions to perform statistical (fitting and drawing random variates) and probabilistic (analysis of their structural proprieties) analysis are provided. See Spedicato (2017) <doi:10.32614/RJ-2017-036>. Some functions for continuous times Markov chains depend on the suggested ctmcd package.
Maintained by Giorgio Alfredo Spedicato. Last updated 4 months ago.
ctmcdtmcmarkov-chainmarkov-modelr-programmingrcppopenblascpp
14.5 match 104 stars 12.78 score 712 scripts 4 dependentsropensci
restez:Create and Query a Local Copy of 'GenBank' in R
Download large sections of 'GenBank' <https://www.ncbi.nlm.nih.gov/genbank/> and generate a local SQL-based database. A user can then query this database using 'restez' functions or through 'rentrez' <https://CRAN.R-project.org/package=rentrez> wrappers.
Maintained by Joel H. Nitta. Last updated 10 days ago.
26.4 match 26 stars 7.01 score 175 scripts 1 dependentsomarwagih
ggseqlogo:A 'ggplot2' Extension for Drawing Publication-Ready Sequence Logos
The extensive range of functions provided by this package makes it possible to draw highly versatile sequence logos. Features include, but not limited to, modifying colour schemes and fonts used to draw the logo, generating multiple logo plots, and aiding the visualisation with annotations. Sequence logos can easily be combined with other plots 'ggplot2' plots.
Maintained by Omar Wagih. Last updated 5 months ago.
16.1 match 211 stars 11.48 score 786 scripts 13 dependentsbioc
ensembldb:Utilities to create and use Ensembl-based annotation databases
The package provides functions to create and use transcript centric annotation databases/packages. The annotation for the databases are directly fetched from Ensembl using their Perl API. The functionality and data is similar to that of the TxDb packages from the GenomicFeatures package, but, in addition to retrieve all gene/transcript models and annotations from the database, ensembldb provides a filter framework allowing to retrieve annotations for specific entries like genes encoded on a chromosome region or transcript models of lincRNA genes. EnsDb databases built with ensembldb contain also protein annotations and mappings between proteins and their encoding transcripts. Finally, ensembldb provides functions to map between genomic, transcript and protein coordinates.
Maintained by Johannes Rainer. Last updated 5 months ago.
geneticsannotationdatasequencingcoverageannotationbioconductorbioconductor-packagesensembl
13.0 match 35 stars 14.08 score 892 scripts 108 dependentsbioc
GenomicAlignments:Representation and manipulation of short genomic alignments
Provides efficient containers for storing and manipulating short genomic alignments (typically obtained by aligning short reads to a reference genome). This includes read counting, computing the coverage, junction detection, and working with the nucleotide content of the alignments.
Maintained by Hervé Pagès. Last updated 5 months ago.
infrastructuredataimportgeneticssequencingrnaseqsnpcoveragealignmentimmunooncologybioconductor-packagecore-package
13.4 match 10 stars 13.61 score 3.1k scripts 529 dependentsbioc
SeqArray:Data Management of Large-Scale Whole-Genome Sequence Variant Calls
Data management of large-scale whole-genome sequencing variant calls with thousands of individuals: genotypic data (e.g., SNVs, indels and structural variation calls) and annotations in SeqArray GDS files are stored in an array-oriented and compressed manner, with efficient data access using the R programming language.
Maintained by Xiuwen Zheng. Last updated 10 days ago.
infrastructuredatarepresentationsequencinggeneticsbioinformaticsgds-formatsnpsnvweswgscpp
15.0 match 45 stars 12.08 score 1.1k scripts 9 dependentsbioc
ALDEx2:Analysis Of Differential Abundance Taking Sample and Scale Variation Into Account
A differential abundance analysis for the comparison of two or more conditions. Useful for analyzing data from standard RNA-seq or meta-RNA-seq assays as well as selected and unselected values from in-vitro sequence selections. Uses a Dirichlet-multinomial model to infer abundance from counts, optimized for three or more experimental replicates. The method infers biological and sampling variation to calculate the expected false discovery rate, given the variation, based on a Wilcoxon Rank Sum test and Welch's t-test (via aldex.ttest), a Kruskal-Wallis test (via aldex.kw), a generalized linear model (via aldex.glm), or a correlation test (via aldex.corr). All tests report predicted p-values and posterior Benjamini-Hochberg corrected p-values. ALDEx2 also calculates expected standardized effect sizes for paired or unpaired study designs. ALDEx2 can now be used to estimate the effect of scale on the results and report on the scale-dependent robustness of results.
Maintained by Greg Gloor. Last updated 5 months ago.
differentialexpressionrnaseqtranscriptomicsgeneexpressiondnaseqchipseqbayesiansequencingsoftwaremicrobiomemetagenomicsimmunooncologyscale simulationposterior p-value
16.7 match 28 stars 10.70 score 424 scripts 3 dependentszhanxw
seqminer:Efficiently Read Sequence Data (VCF Format, BCF Format, METAL Format and BGEN Format) into R
Integrate sequencing data (Variant call format, e.g. VCF or BCF) or meta-analysis results in R. This package can help you (1) read VCF/BCF/BGEN files by chromosomal ranges (e.g. 1:100-200); (2) read RareMETAL summary statistics files; (3) read tables from a tabix-indexed files; (4) annotate VCF/BCF files; (5) create customized workflow based on Makefile.
Maintained by Xiaowei Zhan. Last updated 6 months ago.
annotationbcfbgenmeta-analysisnext-generation-sequencingplinksequencingtabixvcfworkflowzlibbzip2libzstdsqlite3cpp
21.5 match 30 stars 8.29 score 111 scripts 6 dependentsbioc
methylKit:DNA methylation analysis from high-throughput bisulfite sequencing results
methylKit is an R package for DNA methylation analysis and annotation from high-throughput bisulfite sequencing. The package is designed to deal with sequencing data from RRBS and its variants, but also target-capture methods and whole genome bisulfite sequencing. It also has functions to analyze base-pair resolution 5hmC data from experimental protocols such as oxBS-Seq and TAB-Seq. Methylation calling can be performed directly from Bismark aligned BAM files.
Maintained by Altuna Akalin. Last updated 16 days ago.
dnamethylationsequencingmethylseqgenome-biologymethylationstatistical-analysisvisualizationcurlbzip2xz-utilszlibcpp
15.0 match 220 stars 11.80 score 578 scripts 3 dependentsbioc
scRepertoire:A toolkit for single-cell immune receptor profiling
scRepertoire is a toolkit for processing and analyzing single-cell T-cell receptor (TCR) and immunoglobulin (Ig). The scRepertoire framework supports use of 10x, AIRR, BD, MiXCR, Omniscope, TRUST4, and WAT3R single-cell formats. The functionality includes basic clonal analyses, repertoire summaries, distance-based clustering and interaction with the popular Seurat and SingleCellExperiment/Bioconductor R workflows.
Maintained by Nick Borcherding. Last updated 2 months ago.
softwareimmunooncologysinglecellclassificationannotationsequencingcpp
16.8 match 326 stars 10.49 score 240 scriptsohdsi
CohortSymmetry:Sequence Symmetry Analysis Using the Observational Medical Outcomes Partnership Common Data Model
Calculating crude sequence ratio, adjusted sequence ratio and confidence intervals using data mapped to the Observational Medical Outcomes Partnership Common Data Model.
Maintained by Xihang Chen. Last updated 15 days ago.
23.3 match 1 stars 7.53 score 73 scriptsthackl
gggenomes:A Grammar of Graphics for Comparative Genomics
An extension of 'ggplot2' for creating complex genomic maps. It builds on the power of 'ggplot2' and 'tidyverse' adding new 'ggplot2'-style geoms & positions and 'dplyr'-style verbs to manipulate the underlying data. It implements a layout concept inspired by 'ggraph' and introduces tracks to bring tidiness to the mess that is genomics data.
Maintained by Thomas Hackl. Last updated 2 months ago.
biological-datacomparative-genomicsgenomics-visualizationggplot-extensionggplot2
18.3 match 650 stars 9.56 score 123 scriptsbioc
hiReadsProcessor:Functions to process LM-PCR reads from 454/Illumina data
hiReadsProcessor contains set of functions which allow users to process LM-PCR products sequenced using any platform. Given an excel/txt file containing parameters for demultiplexing and sample metadata, the functions automate trimming of adaptors and identification of the genomic product. Genomic products are further processed for QC and abundance quantification.
Maintained by Nirav V Malani. Last updated 5 months ago.
41.6 match 4.18 score 7 scriptsmhahsler
rMSA:Interface for Popular Multiple Sequence Alignment Tools
Seamlessly interfaces the Multiple Sequence Alignment software packages ClustalW, MAFFT, MUSCLE and Kalign (downloaded separately) and provides support to calcualte distances between sequences. This work was partially supported by grant no. R21HG005912 from the National Human Genome Research Institute.
Maintained by Michael Hahsler. Last updated 10 months ago.
geneticssequencinginfrastructurealignmentbioinformaticssequence-alignment
45.8 match 12 stars 3.78 score 7 scriptsbioc
pwalign:Perform pairwise sequence alignments
The two main functions in the package are pairwiseAlignment() and stringDist(). The former solves (Needleman-Wunsch) global alignment, (Smith-Waterman) local alignment, and (ends-free) overlap alignment problems. The latter computes the Levenshtein edit distance or pairwise alignment score matrix for a set of strings.
Maintained by Hervé Pagès. Last updated 2 months ago.
alignmentsequencematchingsequencinggeneticsbioconductor-package
20.3 match 1 stars 8.42 score 27 scripts 105 dependentsbioc
DESeq2:Differential gene expression analysis based on the negative binomial distribution
Estimate variance-mean dependence in count data from high-throughput sequencing assays and test for differential expression based on a model using the negative binomial distribution.
Maintained by Michael Love. Last updated 11 days ago.
sequencingrnaseqchipseqgeneexpressiontranscriptionnormalizationdifferentialexpressionbayesianregressionprincipalcomponentclusteringimmunooncologyopenblascpp
10.5 match 375 stars 16.11 score 17k scripts 115 dependentsbioc
ComplexHeatmap:Make Complex Heatmaps
Complex heatmaps are efficient to visualize associations between different sources of data sets and reveal potential patterns. Here the ComplexHeatmap package provides a highly flexible way to arrange multiple heatmaps and supports various annotation graphics.
Maintained by Zuguang Gu. Last updated 5 months ago.
softwarevisualizationsequencingclusteringcomplex-heatmapsheatmap
10.0 match 1.3k stars 16.93 score 16k scripts 151 dependentsbioc
bambu:Context-Aware Transcript Quantification from Long Read RNA-Seq data
bambu is a R package for multi-sample transcript discovery and quantification using long read RNA-Seq data. You can use bambu after read alignment to obtain expression estimates for known and novel transcripts and genes. The output from bambu can directly be used for visualisation and downstream analysis such as differential gene expression or transcript usage.
Maintained by Ying Chen. Last updated 1 months ago.
alignmentcoveragedifferentialexpressionfeatureextractiongeneexpressiongenomeannotationgenomeassemblyimmunooncologylongreadmultiplecomparisonnormalizationrnaseqregressionsequencingsoftwaretranscriptiontranscriptomicsbambubioconductorlong-readsnanoporenanopore-sequencingrna-seqrna-seq-analysistranscript-quantificationtranscript-reconstructioncpp
18.7 match 197 stars 9.03 score 91 scripts 1 dependentsbioc
SummarizedExperiment:A container (S4 class) for matrix-like assays
The SummarizedExperiment container contains one or more assays, each represented by a matrix-like object of numeric or other mode. The rows typically represent genomic ranges of interest and the columns represent samples.
Maintained by Hervé Pagès. Last updated 5 months ago.
geneticsinfrastructuresequencingannotationcoveragegenomeannotationbioconductor-packagecore-package
10.0 match 34 stars 16.85 score 8.6k scripts 1.2k dependentsbioc
DSS:Dispersion shrinkage for sequencing data
DSS is an R library performing differntial analysis for count-based sequencing data. It detectes differentially expressed genes (DEGs) from RNA-seq, and differentially methylated loci or regions (DML/DMRs) from bisulfite sequencing (BS-seq). The core of DSS is a new dispersion shrinkage method for estimating the dispersion parameter from Gamma-Poisson or Beta-Binomial distributions.
Maintained by Hao Wu. Last updated 5 months ago.
sequencingrnaseqdnamethylationgeneexpressiondifferentialexpressiondifferentialmethylation
24.0 match 7.02 score 248 scripts 5 dependentsambuvjyn
baseq:Basic Sequence Processing Tool for Biological Data
Primarily created as an easy and understanding way to do basic sequences surrounding the central dogma of molecular biology.
Maintained by Ambu Vijayan. Last updated 2 years ago.
41.4 match 2 stars 4.00 scoreaalfons
robustHD:Robust Methods for High-Dimensional Data
Robust methods for high-dimensional data, in particular linear model selection techniques based on least angle regression and sparse regression. Specifically, the package implements robust least angle regression (Khan, Van Aelst & Zamar, 2007; <doi:10.1198/016214507000000950>), (robust) groupwise least angle regression (Alfons, Croux & Gelper, 2016; <doi:10.1016/j.csda.2015.02.007>), and sparse least trimmed squares regression (Alfons, Croux & Gelper, 2013; <doi:10.1214/12-AOAS575>).
Maintained by Andreas Alfons. Last updated 9 months ago.
23.2 match 10 stars 7.06 score 174 scripts 8 dependentsbioc
edgeR:Empirical Analysis of Digital Gene Expression Data in R
Differential expression analysis of sequence count data. Implements a range of statistical methodology based on the negative binomial distributions, including empirical Bayes estimation, exact tests, generalized linear models, quasi-likelihood, and gene set enrichment. Can perform differential analyses of any type of omics data that produces read counts, including RNA-seq, ChIP-seq, ATAC-seq, Bisulfite-seq, SAGE, CAGE, metabolomics, or proteomics spectral counts. RNA-seq analyses can be conducted at the gene or isoform level, and tests can be conducted for differential exon or transcript usage.
Maintained by Yunshun Chen. Last updated 6 days ago.
alternativesplicingbatcheffectbayesianbiomedicalinformaticscellbiologychipseqclusteringcoveragedifferentialexpressiondifferentialmethylationdifferentialsplicingdnamethylationepigeneticsfunctionalgenomicsgeneexpressiongenesetenrichmentgeneticsimmunooncologymultiplecomparisonnormalizationpathwaysproteomicsqualitycontrolregressionrnaseqsagesequencingsinglecellsystemsbiologytimecoursetranscriptiontranscriptomicsopenblas
12.2 match 13.40 score 17k scripts 255 dependentscnuge
debar:A Post-Clustering Denoiser for COI-5P Barcode Data
The 'debar' sequence processing pipeline is designed for denoising high throughput sequencing data for the animal DNA barcode marker cytochrome c oxidase I (COI). The package is designed to detect and correct insertion and deletion errors within sequencer outputs. This is accomplished through comparison of input sequences against a profile hidden Markov model (PHMM) using the Viterbi algorithm (for algorithm details see Durbin et al. 1998, ISBN: 9780521629713). Inserted base pairs are removed and deleted base pairs are accounted for through the introduction of a placeholder character. Since the PHMM is a probabilistic representation of the COI barcode, corrections are not always perfect. For this reason 'debar' censors base pairs adjacent to reported indel sites, turning them into placeholder characters (default is 7 base pairs in either direction, this feature can be disabled). Testing has shown that this censorship results in the correct sequence length being restored, and erroneous base pairs being masked the vast majority of the time (>95%).
Maintained by Cameron M. Nugent. Last updated 1 years ago.
bioinformaticsdenoisingdna-barcodingdna-sequencinghidden-markov-modelmachine-learning
40.9 match 1 stars 4.00 score 8 scriptsbioc
slingshot:Tools for ordering single-cell sequencing
Provides functions for inferring continuous, branching lineage structures in low-dimensional data. Slingshot was designed to model developmental trajectories in single-cell RNA sequencing data and serve as a component in an analysis pipeline after dimensionality reduction and clustering. It is flexible enough to handle arbitrarily many branching events and allows for the incorporation of prior knowledge through supervised graph construction.
Maintained by Kelly Street. Last updated 5 months ago.
clusteringdifferentialexpressiongeneexpressionrnaseqsequencingsoftwaresinglecelltranscriptomicsvisualization
13.5 match 283 stars 12.01 score 1.0k scripts 4 dependentscristianetaniguti
onemap:Construction of Genetic Maps in Experimental Crosses
Analysis of molecular marker data from model (backcrosses, F2 and recombinant inbred lines) and non-model systems (i. e. outcrossing species). For the later, it allows statistical analysis by simultaneously estimating linkage and linkage phases (genetic map construction) according to Wu et al. (2002) <doi:10.1006/tpbi.2002.1577>. All analysis are based on multipoint approaches using hidden Markov models.
Maintained by Cristiane Taniguti. Last updated 2 months ago.
24.6 match 3 stars 6.58 score 183 scriptsbioc
TCGAbiolinks:TCGAbiolinks: An R/Bioconductor package for integrative analysis with GDC data
The aim of TCGAbiolinks is : i) facilitate the GDC open-access data retrieval, ii) prepare the data using the appropriate pre-processing strategies, iii) provide the means to carry out different standard analyses and iv) to easily reproduce earlier research results. In more detail, the package provides multiple methods for analysis (e.g., differential expression analysis, identifying differentially methylated regions) and methods for visualization (e.g., survival plots, volcano plots, starburst plots) in order to easily develop complete analysis pipelines.
Maintained by Tiago Chedraoui Silva. Last updated 27 days ago.
dnamethylationdifferentialmethylationgeneregulationgeneexpressionmethylationarraydifferentialexpressionpathwaysnetworksequencingsurvivalsoftwarebiocbioconductorgdcintegrative-analysistcgatcga-datatcgabiolinks
11.1 match 305 stars 14.45 score 1.6k scripts 6 dependentsbioc
QDNAseq:Quantitative DNA Sequencing for Chromosomal Aberrations
Quantitative DNA sequencing for chromosomal aberrations. The genome is divided into non-overlapping fixed-sized bins, number of sequence reads in each counted, adjusted with a simultaneous two-dimensional loess correction for sequence mappability and GC content, and filtered to remove spurious regions in the genome. Downstream steps of segmentation and calling are also implemented via packages DNAcopy and CGHcall, respectively.
Maintained by Daoud Sie. Last updated 5 months ago.
copynumbervariationdnaseqgeneticsgenomeannotationpreprocessingqualitycontrolsequencing
15.7 match 49 stars 10.10 score 177 scripts 4 dependentslhdjung
scrutiny:Error Detection in Science
Test published summary statistics for consistency (Brown and Heathers, 2017, <doi:10.1177/1948550616673876>; Allard, 2018, <https://aurelienallard.netlify.app/post/anaytic-grimmer-possibility-standard-deviations/>; Heathers and Brown, 2019, <https://osf.io/5vb3u/>). The package also provides infrastructure for implementing new error detection techniques.
Maintained by Lukas Jung. Last updated 6 months ago.
24.3 match 8 stars 6.52 score 38 scriptsbioc
tRNA:Analyzing tRNA sequences and structures
The tRNA package allows tRNA sequences and structures to be accessed and used for subsetting. In addition, it provides visualization tools to compare feature parameters of multiple tRNA sets and correlate them to additional data. The tRNA package uses GRanges objects as inputs requiring only few additional column data sets.
Maintained by Felix GM Ernst. Last updated 5 months ago.
softwarevisualizationbioconductorsequencesstructurestrna
23.2 match 1 stars 6.81 score 3 scripts 3 dependentsleelabsg
SKAT:SNP-Set (Sequence) Kernel Association Test
Functions for kernel-regression-based association tests including Burden test, SKAT and SKAT-O. These methods aggregate individual SNP score statistics in a SNP set and efficiently compute SNP-set level p-values.
Maintained by Seunggeun (Shawn) Lee. Last updated 1 months ago.
16.3 match 45 stars 9.70 score 268 scripts 16 dependentsbioc
CrispRVariants:Tools for counting and visualising mutations in a target location
CrispRVariants provides tools for analysing the results of a CRISPR-Cas9 mutagenesis sequencing experiment, or other sequencing experiments where variants within a given region are of interest. These tools allow users to localize variant allele combinations with respect to any genomic location (e.g. the Cas9 cut site), plot allele combinations and calculate mutation rates with flexible filtering of unrelated variants.
Maintained by Helen Lindsay. Last updated 5 months ago.
immunooncologycrisprgenomicvariationvariantdetectiongeneticvariabilitydatarepresentationvisualizationsequencing
28.6 match 5.51 score 32 scriptstidyverse
magrittr:A Forward-Pipe Operator for R
Provides a mechanism for chaining commands with a new forward-pipe operator, %>%. This operator will forward a value, or the result of an expression, into the next function call/expression. There is flexible support for the type of right-hand side expressions. For more information, see package vignette. To quote Rene Magritte, "Ceci n'est pas un pipe."
Maintained by Lionel Henry. Last updated 2 years ago.
7.3 match 961 stars 21.06 score 82k scripts 14k dependentsbioc
scifer:Scifer: Single-Cell Immunoglobulin Filtering of Sanger Sequences
Have you ever index sorted cells in a 96 or 384-well plate and then sequenced using Sanger sequencing? If so, you probably had some struggles to either check the electropherogram of each cell sequenced manually, or when you tried to identify which cell was sorted where after sequencing the plate. Scifer was developed to solve this issue by performing basic quality control of Sanger sequences and merging flow cytometry data from probed single-cell sorted B cells with sequencing data. scifer can export summary tables, 'fasta' files, electropherograms for visual inspection, and generate reports.
Maintained by Rodrigo Arcoverde Cerveira. Last updated 3 months ago.
preprocessingqualitycontrolsangerseqsequencingsoftwareflowcytometrysinglecell
27.9 match 5 stars 5.54 score 9 scriptsbioc
ngsReports:Load FastqQC reports and other NGS related files
This package provides methods and object classes for parsing FastQC reports and output summaries from other NGS tools into R. As well as parsing files, multiple plotting methods have been implemented for visualising the parsed data. Plots can be generated as static ggplot objects or interactive plotly objects.
Maintained by Stevie Pederson. Last updated 5 months ago.
19.6 match 22 stars 7.89 score 99 scriptsmaraab23
ggseqplot:Render Sequence Plots using 'ggplot2'
A set of wrapper functions that mainly re-produces most of the sequence plots rendered with TraMineR::seqplot(). Whereas 'TraMineR' uses base R to produce the plots this library draws on 'ggplot2'. The plots are produced on the basis of a sequence object defined with TraMineR::seqdef(). The package automates the reshaping and plotting of sequence data. Resulting plots are of class 'ggplot', i.e. components can be added and tweaked using '+' and regular 'ggplot2' functions.
Maintained by Marcel Raab. Last updated 4 months ago.
ggplot2sequence-analysistraminervisualization
27.1 match 14 stars 5.70 score 18 scriptsbioc
maftools:Summarize, Analyze and Visualize MAF Files
Analyze and visualize Mutation Annotation Format (MAF) files from large scale sequencing studies. This package provides various functions to perform most commonly used analyses in cancer genomics and to create feature rich customizable visualzations with minimal effort.
Maintained by Anand Mayakonda. Last updated 5 months ago.
datarepresentationdnaseqvisualizationdrivermutationvariantannotationfeatureextractionclassificationsomaticmutationsequencingfunctionalgenomicssurvivalbioinformaticscancer-genome-atlascancer-genomicsgenomicsmaf-filestcgacurlbzip2xz-utilszlib
10.5 match 459 stars 14.63 score 948 scripts 18 dependentsbusiness-science
timetk:A Tool Kit for Working with Time Series
Easy visualization, wrangling, and feature engineering of time series data for forecasting and machine learning prediction. Consolidates and extends time series functionality from packages including 'dplyr', 'stats', 'xts', 'forecast', 'slider', 'padr', 'recipes', and 'rsample'.
Maintained by Matt Dancho. Last updated 1 years ago.
coercioncoercion-functionsdata-miningdplyrforecastforecastingforecasting-modelsmachine-learningseries-decompositionseries-signaturetibbletidytidyquanttidyversetimetime-seriestimeseries
10.8 match 625 stars 14.15 score 4.0k scripts 16 dependentsbioc
Repitools:Epigenomic tools
Tools for the analysis of enrichment-based epigenomic data. Features include summarization and visualization of epigenomic data across promoters according to gene expression context, finding regions of differential methylation/binding, BayMeth for quantifying methylation etc.
Maintained by Mark Robinson. Last updated 5 months ago.
dnamethylationgeneexpressionmethylseq
25.9 match 5.90 score 267 scriptsbioc
cardelino:Clone Identification from Single Cell Data
Methods to infer clonal tree configuration for a population of cells using single-cell RNA-seq data (scRNA-seq), and possibly other data modalities. Methods are also provided to assign cells to inferred clones and explore differences in gene expression between clones. These methods can flexibly integrate information from imperfect clonal trees inferred based on bulk exome-seq data, and sparse variant alleles expressed in scRNA-seq data. A flexible beta-binomial error model that accounts for stochastic dropout events as well as systematic allelic imbalance is used.
Maintained by Davis McCarthy. Last updated 5 months ago.
singlecellrnaseqvisualizationtranscriptomicsgeneexpressionsequencingsoftwareexomeseqclonal-clusteringgibbs-samplingscrna-seqsingle-cellsomatic-mutations
21.6 match 61 stars 7.05 score 62 scriptsbioc
seqLogo:Sequence logos for DNA sequence alignments
seqLogo takes the position weight matrix of a DNA sequence motif and plots the corresponding sequence logo as introduced by Schneider and Stephens (1990).
Maintained by Robert Ivanek. Last updated 5 months ago.
14.4 match 4 stars 10.57 score 304 scripts 29 dependentsbioc
PWMEnrich:PWM enrichment analysis
A toolkit of high-level functions for DNA motif scanning and enrichment analysis built upon Biostrings. The main functionality is PWM enrichment analysis of already known PWMs (e.g. from databases such as MotifDb), but the package also implements high-level functions for PWM scanning and visualisation. The package does not perform "de novo" motif discovery, but is instead focused on using motifs that are either experimentally derived or computationally constructed by other tools.
Maintained by Diego Diez. Last updated 5 months ago.
motifannotationsequencematchingsoftware
29.9 match 5.08 score 60 scriptsbioc
Rhtslib:HTSlib high-throughput sequencing library as an R package
This package provides version 1.18 of the 'HTSlib' C library for high-throughput sequence analysis. The package is primarily useful to developers of other R packages who wish to make use of HTSlib. Motivation and instructions for use of this package are in the vignette, vignette(package="Rhtslib", "Rhtslib").
Maintained by Hervé Pagès. Last updated 12 days ago.
dataimportsequencingbioconductor-packagecore-packagecurlbzip2xz-utilszlib
13.4 match 11 stars 11.25 score 3 scripts 591 dependentsbioc
AnnotationDbi:Manipulation of SQLite-based annotations in Bioconductor
Implements a user-friendly interface for querying SQLite-based annotation data packages.
Maintained by Bioconductor Package Maintainer. Last updated 5 months ago.
annotationmicroarraysequencinggenomeannotationbioconductor-packagecore-package
10.0 match 9 stars 15.05 score 3.6k scripts 769 dependentsmatthias-studer
WeightedCluster:Clustering of Weighted Data
Clusters state sequences and weighted data. It provides an optimized weighted PAM algorithm as well as functions for aggregating replicated cases, computing cluster quality measures for a range of clustering solutions and plotting (fuzzy) clusters of state sequences. Parametric bootstraps methods to validate typology of sequences are also provided. Finally, it provides a fuzzy and crisp CLARA algorithm to cluster large database with sequence analysis.
Maintained by Matthias Studer. Last updated 3 months ago.
26.8 match 5.55 score 106 scripts 4 dependentsbioc
Structstrings:Implementation of the dot bracket annotations with Biostrings
The Structstrings package implements the widely used dot bracket annotation for storing base pairing information in structured RNA. Structstrings uses the infrastructure provided by the Biostrings package and derives the DotBracketString and related classes from the BString class. From these, base pair tables can be produced for in depth analysis. In addition, the loop indices of the base pairs can be retrieved as well. For better efficiency, information conversion is implemented in C, inspired to a large extend by the ViennaRNA package.
Maintained by Felix G.M. Ernst. Last updated 5 months ago.
dataimportdatarepresentationinfrastructuresequencingsoftwarealignmentsequencematchingbioconductorrnarna-structural-analysisrna-structuresequencesstructures
22.9 match 4 stars 6.46 score 3 scripts 4 dependentsboopsboops
spider:Species Identity and Evolution in R
Analysis of species limits and DNA barcoding data. Included are functions for generating important summary statistics from DNA barcode data, assessing specimen identification efficacy, testing and optimizing divergence threshold limits, assessment of diagnostic nucleotides, and calculation of the probability of reciprocal monophyly. Additionally, a sliding window function offers opportunities to analyse information across a gene, often used for marker design in degraded DNA studies. Further information on the package has been published in Brown et al (2012) <doi:10.1111/j.1755-0998.2011.03108.x>.
Maintained by Rupert A. Collins. Last updated 6 years ago.
dna-barcodeednaevolutionspecies-delimitationspecies-identity
28.5 match 2 stars 5.20 score 66 scripts 1 dependentsbioc
tidySingleCellExperiment:Brings SingleCellExperiment to the Tidyverse
'tidySingleCellExperiment' is an adapter that abstracts the 'SingleCellExperiment' container in the form of a 'tibble'. This allows *tidy* data manipulation, nesting, and plotting. For example, a 'tidySingleCellExperiment' is directly compatible with functions from 'tidyverse' packages `dplyr` and `tidyr`, as well as plotting with `ggplot2` and `plotly`. In addition, the package provides various utility functions specific to single-cell omics data analysis (e.g., aggregation of cell-level data to pseudobulks).
Maintained by Stefano Mangiola. Last updated 5 months ago.
assaydomaininfrastructurernaseqdifferentialexpressionsinglecellgeneexpressionnormalizationclusteringqualitycontrolsequencingbioconductordplyrggplot2plotlysingle-cell-rna-seqsingle-cell-sequencingsinglecellexperimenttibbletidyrtidyverse
16.7 match 36 stars 8.86 score 125 scripts 2 dependentsbioc
Rqc:Quality Control Tool for High-Throughput Sequencing Data
Rqc is an optimised tool designed for quality control and assessment of high-throughput sequencing data. It performs parallel processing of entire files and produces a report which contains a set of high-resolution graphics.
Maintained by Welliton Souza. Last updated 5 months ago.
sequencingqualitycontroldataimportcpp
24.3 match 6.00 score 67 scriptsbioc
mixOmics:Omics Data Integration Project
Multivariate methods are well suited to large omics data sets where the number of variables (e.g. genes, proteins, metabolites) is much larger than the number of samples (patients, cells, mice). They have the appealing properties of reducing the dimension of the data by using instrumental variables (components), which are defined as combinations of all variables. Those components are then used to produce useful graphical outputs that enable better understanding of the relationships and correlation structures between the different data sets that are integrated. mixOmics offers a wide range of multivariate methods for the exploration and integration of biological datasets with a particular focus on variable selection. The package proposes several sparse multivariate models we have developed to identify the key variables that are highly correlated, and/or explain the biological outcome of interest. The data that can be analysed with mixOmics may come from high throughput sequencing technologies, such as omics data (transcriptomics, metabolomics, proteomics, metagenomics etc) but also beyond the realm of omics (e.g. spectral imaging). The methods implemented in mixOmics can also handle missing values without having to delete entire rows with missing data. A non exhaustive list of methods include variants of generalised Canonical Correlation Analysis, sparse Partial Least Squares and sparse Discriminant Analysis. Recently we implemented integrative methods to combine multiple data sets: N-integration with variants of Generalised Canonical Correlation Analysis and P-integration with variants of multi-group Partial Least Squares.
Maintained by Eva Hamrud. Last updated 4 days ago.
immunooncologymicroarraysequencingmetabolomicsmetagenomicsproteomicsgenepredictionmultiplecomparisonclassificationregressionbioconductorgenomicsgenomics-datagenomics-visualizationmultivariate-analysismultivariate-statisticsomicsr-pkgr-project
10.5 match 182 stars 13.71 score 1.3k scripts 22 dependentsbioc
deepSNV:Detection of subclonal SNVs in deep sequencing data.
This package provides provides quantitative variant callers for detecting subclonal mutations in ultra-deep (>=100x coverage) sequencing experiments. The deepSNV algorithm is used for a comparative setup with a control experiment of the same loci and uses a beta-binomial model and a likelihood ratio test to discriminate sequencing errors and subclonal SNVs. The shearwater algorithm computes a Bayes classifier based on a beta-binomial model for variant calling with multiple samples for precisely estimating model parameters - such as local error rates and dispersion - and prior knowledge, e.g. from variation data bases such as COSMIC.
Maintained by Moritz Gerstung. Last updated 5 months ago.
geneticvariabilitysnpsequencinggeneticsdataimportcurlbzip2xz-utilszlibcpp
21.9 match 6.53 score 38 scripts 1 dependentsbioc
RNAmodR:Detection of post-transcriptional modifications in high throughput sequencing data
RNAmodR provides classes and workflows for loading/aggregation data from high througput sequencing aimed at detecting post-transcriptional modifications through analysis of specific patterns. In addition, utilities are provided to validate and visualize the results. The RNAmodR package provides a core functionality from which specific analysis strategies can be easily implemented as a seperate package.
Maintained by Felix G.M. Ernst. Last updated 5 months ago.
softwareinfrastructureworkflowstepvisualizationsequencingalkanilineseqbioconductormodificationsribomethseqrnarnamodr
21.8 match 3 stars 6.51 score 9 scripts 3 dependentsbioc
BASiCS:Bayesian Analysis of Single-Cell Sequencing data
Single-cell mRNA sequencing can uncover novel cell-to-cell heterogeneity in gene expression levels in seemingly homogeneous populations of cells. However, these experiments are prone to high levels of technical noise, creating new challenges for identifying genes that show genuine heterogeneous expression within the population of cells under study. BASiCS (Bayesian Analysis of Single-Cell Sequencing data) is an integrated Bayesian hierarchical model to perform statistical analyses of single-cell RNA sequencing datasets in the context of supervised experiments (where the groups of cells of interest are known a priori, e.g. experimental conditions or cell types). BASiCS performs built-in data normalisation (global scaling) and technical noise quantification (based on spike-in genes). BASiCS provides an intuitive detection criterion for highly (or lowly) variable genes within a single group of cells. Additionally, BASiCS can compare gene expression patterns between two or more pre-specified groups of cells. Unlike traditional differential expression tools, BASiCS quantifies changes in expression that lie beyond comparisons of means, also allowing the study of changes in cell-to-cell heterogeneity. The latter can be quantified via a biological over-dispersion parameter that measures the excess of variability that is observed with respect to Poisson sampling noise, after normalisation and technical noise removal. Due to the strong mean/over-dispersion confounding that is typically observed for scRNA-seq datasets, BASiCS also tests for changes in residual over-dispersion, defined by residual values with respect to a global mean/over-dispersion trend.
Maintained by Catalina Vallejos. Last updated 5 months ago.
immunooncologynormalizationsequencingrnaseqsoftwaregeneexpressiontranscriptomicssinglecelldifferentialexpressionbayesiancellbiologybioconductor-packagegene-expressionrcpprcpparmadilloscrna-seqsingle-cellopenblascppopenmp
13.8 match 83 stars 10.26 score 368 scripts 1 dependentsropensci
bold:Interface to Bold Systems API
A programmatic interface to the Web Service methods provided by Bold Systems (<http://www.boldsystems.org/>) for genetic 'barcode' data. Functions include methods for searching by sequences by taxonomic names, ids, collectors, and institutions; as well as a function for searching for specimens, and downloading trace files.
Maintained by Salix Dubois. Last updated 3 months ago.
biodiversitybarcodednasequencesfastaapi-wrapperbarcodestaxize
24.6 match 18 stars 5.74 score 57 scriptsbioc
ProteoDisco:Generation of customized protein variant databases from genomic variants, splice-junctions and manual sequences
ProteoDisco is an R package to facilitate proteogenomics studies. It houses functions to create customized (variant) protein databases based on user-submitted genomic variants, splice-junctions, fusion genes and manual transcript sequences. The flexible workflow can be adopted to suit a myriad of research and experimental settings.
Maintained by Job van Riet. Last updated 5 months ago.
softwareproteomicsrnaseqsnpsequencingvariantannotationdataimport
26.5 match 5 stars 5.30 score 4 scriptshelske
seqHMM:Mixture Hidden Markov Models for Social Sequence Data and Other Multivariate, Multichannel Categorical Time Series
Designed for fitting hidden (latent) Markov models and mixture hidden Markov models for social sequence data and other categorical time series. Also some more restricted versions of these type of models are available: Markov models, mixture Markov models, and latent class models. The package supports models for one or multiple subjects with one or multiple parallel sequences (channels). External covariates can be added to explain cluster membership in mixture models. The package provides functions for evaluating and comparing models, as well as functions for visualizing of multichannel sequence data and hidden Markov models. Models are estimated using maximum likelihood via the EM algorithm and/or direct numerical maximization with analytical gradients. All main algorithms are written in C++ with support for parallel computation. Documentation is available via several vignettes in this page, and the paper by Helske and Helske (2019, <doi:10.18637/jss.v088.i03>).
Maintained by Jouni Helske. Last updated 2 years ago.
categorical-dataem-algorithmhidden-markov-modelshmmmixture-markov-modelstime-seriesopenblascppopenmp
16.5 match 97 stars 8.51 score 92 scripts 1 dependentsbioc
GenomicSuperSignature:Interpretation of RNA-seq experiments through robust, efficient comparison to public databases
This package provides a novel method for interpreting new transcriptomic datasets through near-instantaneous comparison to public archives without high-performance computing requirements. Through the pre-computed index, users can identify public resources associated with their dataset such as gene sets, MeSH term, and publication. Functions to identify interpretable annotations and intuitive visualization options are implemented in this package.
Maintained by Sehyun Oh. Last updated 5 months ago.
transcriptomicssystemsbiologyprincipalcomponentrnaseqsequencingpathwaysclusteringbioconductor-packageexploratory-data-analysisgseameshprincipal-component-analysisrna-sequencing-profilestransferlearning
20.0 match 16 stars 6.97 score 59 scriptsbioc
limma:Linear Models for Microarray and Omics Data
Data analysis, linear models and differential expression for omics data.
Maintained by Gordon Smyth. Last updated 6 days ago.
exonarraygeneexpressiontranscriptionalternativesplicingdifferentialexpressiondifferentialsplicinggenesetenrichmentdataimportbayesianclusteringregressiontimecoursemicroarraymicrornaarraymrnamicroarrayonechannelproprietaryplatformstwochannelsequencingrnaseqbatcheffectmultiplecomparisonnormalizationpreprocessingqualitycontrolbiomedicalinformaticscellbiologycheminformaticsepigeneticsfunctionalgenomicsgeneticsimmunooncologymetabolomicsproteomicssystemsbiologytranscriptomics
10.0 match 13.81 score 16k scripts 585 dependentsimmunomind
immunarch:Bioinformatics Analysis of T-Cell and B-Cell Immune Repertoires
A comprehensive framework for bioinformatics exploratory analysis of bulk and single-cell T-cell receptor and antibody repertoires. It provides seamless data loading, analysis and visualisation for AIRR (Adaptive Immune Receptor Repertoire) data, both bulk immunosequencing (RepSeq) and single-cell sequencing (scRNAseq). Immunarch implements most of the widely used AIRR analysis methods, such as: clonality analysis, estimation of repertoire similarities in distribution of clonotypes and gene segments, repertoire diversity analysis, annotation of clonotypes using external immune receptor databases and clonotype tracking in vaccination and cancer studies. A successor to our previously published 'tcR' immunoinformatics package (Nazarov 2015) <doi:10.1186/s12859-015-0613-1>.
Maintained by Vadim I. Nazarov. Last updated 12 months ago.
airr-analysisb-cell-receptorbcrbcr-repertoirebioinformaticsigig-repertoireimmune-repertoireimmune-repertoire-analysisimmune-repertoire-dataimmunoglobulinimmunoinformaticsimmunologyrep-seqrepertoire-analysissingle-cellsingle-cell-analysist-cell-receptortcrtcr-repertoirecpp
14.5 match 315 stars 9.49 score 203 scriptsklvoje
evoTS:Analyses of Evolutionary Time-Series
Facilitates univariate and multivariate analysis of evolutionary sequences of phenotypic change. The package extends the modeling framework available in the 'paleoTS' package. Please see <https://klvoje.github.io/evoTS/index.html> for information about the package and the implemented models.
Maintained by Kjetil Lysne Voje. Last updated 9 months ago.
32.2 match 1 stars 4.26 score 184 scriptsbioc
VariantAnnotation:Annotation of Genetic Variants
Annotate variants, compute amino acid coding changes, predict coding outcomes.
Maintained by Bioconductor Package Maintainer. Last updated 2 months ago.
dataimportsequencingsnpannotationgeneticsvariantannotationcurlbzip2xz-utilszlib
12.0 match 11.39 score 1.9k scripts 152 dependentsbioc
crisprBowtie:Bowtie-based alignment of CRISPR gRNA spacer sequences
Provides a user-friendly interface to map on-targets and off-targets of CRISPR gRNA spacer sequences using bowtie. The alignment is fast, and can be performed using either commonly-used or custom CRISPR nucleases. The alignment can work with any reference or custom genomes. Both DNA- and RNA-targeting nucleases are supported.
Maintained by Jean-Philippe Fortin. Last updated 5 months ago.
crisprfunctionalgenomicsalignmentalignerbioconductorbioconductor-packagebowtiecrispr-analysiscrispr-cas9crispr-designcrispr-targetgrnagrna-sequencegrna-sequencessgrnasgrna-design
23.0 match 3 stars 5.86 score 7 scripts 4 dependentstylermorganwall
spacefillr:Space-Filling Random and Quasi-Random Sequences
Generates random and quasi-random space-filling sequences. Supports the following sequences: 'Halton', 'Sobol', 'Owen'-scrambled 'Sobol', 'Owen'-scrambled 'Sobol' with errors distributed as blue noise, progressive jittered, progressive multi-jittered ('PMJ'), 'PMJ' with blue noise, 'PMJ02', and 'PMJ02' with blue noise. Includes a 'C++' 'API'. Methods derived from "Constructing Sobol sequences with better two-dimensional projections" (2012) <doi:10.1137/070709359> S. Joe and F. Y. Kuo, "Progressive Multi-Jittered Sample Sequences" (2018) <https://graphics.pixar.com/library/ProgressiveMultiJitteredSampling/paper.pdf> Christensen, P., Kensler, A. and Kilpatrick, C., and "A Low-Discrepancy Sampler that Distributes Monte Carlo Errors as a Blue Noise in Screen Space" (2019) E. Heitz, B. Laurent, O. Victor, C. David and I. Jean-Claude, <doi:10.1145/3306307.3328191>.
Maintained by Tyler Morgan-Wall. Last updated 21 days ago.
halton-sequencequasi-random-generatorsobol-sequencecpp
18.9 match 7 stars 7.07 score 3 scripts 45 dependentsbioc
idpr:Profiling and Analyzing Intrinsically Disordered Proteins in R
‘idpr’ aims to integrate tools for the computational analysis of intrinsically disordered proteins (IDPs) within R. This package is used to identify known characteristics of IDPs for a sequence of interest with easily reported and dynamic results. Additionally, this package includes tools for IDP-based sequence analysis to be used in conjunction with other R packages. Described in McFadden WM & Yanowitz JL (2022). "idpr: A package for profiling and analyzing Intrinsically Disordered Proteins in R." PloS one, 17(4), e0266929. <https://doi.org/10.1371/journal.pone.0266929>.
Maintained by William M. McFadden. Last updated 5 months ago.
structuralpredictionproteomicscellbiology
21.6 match 4 stars 6.16 score 20 scriptsbnosac
udpipe:Tokenization, Parts of Speech Tagging, Lemmatization and Dependency Parsing with the 'UDPipe' 'NLP' Toolkit
This natural language processing toolkit provides language-agnostic 'tokenization', 'parts of speech tagging', 'lemmatization' and 'dependency parsing' of raw text. Next to text parsing, the package also allows you to train annotation models based on data of 'treebanks' in 'CoNLL-U' format as provided at <https://universaldependencies.org/format.html>. The techniques are explained in detail in the paper: 'Tokenizing, POS Tagging, Lemmatizing and Parsing UD 2.0 with UDPipe', available at <doi:10.18653/v1/K17-3009>. The toolkit also contains functionalities for commonly used data manipulations on texts which are enriched with the output of the parser. Namely functionalities and algorithms for collocations, token co-occurrence, document term matrix handling, term frequency inverse document frequency calculations, information retrieval metrics (Okapi BM25), handling of multi-word expressions, keyword detection (Rapid Automatic Keyword Extraction, noun phrase extraction, syntactical patterns) sentiment scoring and semantic similarity analysis.
Maintained by Jan Wijffels. Last updated 2 years ago.
conlldependency-parserlemmatizationnatural-language-processingnlppos-taggingr-pkgrcpptext-miningtokenizerudpipecpp
11.2 match 215 stars 11.83 score 1.2k scripts 9 dependentsbioc
splatter:Simple Simulation of Single-cell RNA Sequencing Data
Splatter is a package for the simulation of single-cell RNA sequencing count data. It provides a simple interface for creating complex simulations that are reproducible and well-documented. Parameters can be estimated from real data and functions are provided for comparing real and simulated datasets.
Maintained by Luke Zappia. Last updated 4 months ago.
singlecellrnaseqtranscriptomicsgeneexpressionsequencingsoftwareimmunooncologybioconductorbioinformaticsscrna-seqsimulation
13.4 match 224 stars 9.92 score 424 scripts 1 dependentsbioc
segmentSeq:Methods for identifying small RNA loci from high-throughput sequencing data
High-throughput sequencing technologies allow the production of large volumes of short sequences, which can be aligned to the genome to create a set of matches to the genome. By looking for regions of the genome which to which there are high densities of matches, we can infer a segmentation of the genome into regions of biological significance. The methods in this package allow the simultaneous segmentation of data from multiple samples, taking into account replicate data, in order to create a consensus segmentation. This has obvious applications in a number of classes of sequencing experiments, particularly in the discovery of small RNA loci and novel mRNA transcriptome discovery.
Maintained by Samuel Granjeaud. Last updated 4 months ago.
multiplecomparisonsequencingalignmentdifferentialexpressionqualitycontroldataimport
21.5 match 6.17 score 42 scriptsbioc
HDF5Array:HDF5 datasets as array-like objects in R
The HDF5Array package is an HDF5 backend for DelayedArray objects. It implements the HDF5Array, H5SparseMatrix, H5ADMatrix, and TENxMatrix classes, 4 convenient and memory-efficient array-like containers for representing and manipulating either: (1) a conventional (a.k.a. dense) HDF5 dataset, (2) an HDF5 sparse matrix (stored in CSR/CSC/Yale format), (3) the central matrix of an h5ad file (or any matrix in the /layers group), or (4) a 10x Genomics sparse matrix. All these containers are DelayedArray extensions and thus support all operations (delayed or block-processed) supported by DelayedArray objects.
Maintained by Hervé Pagès. Last updated 26 days ago.
infrastructuredatarepresentationdataimportsequencingrnaseqcoverageannotationgenomeannotationsinglecellimmunooncologybioconductor-packagecore-packageu24ca289073
10.0 match 12 stars 13.19 score 844 scripts 123 dependentsbioc
scran:Methods for Single-Cell RNA-Seq Data Analysis
Implements miscellaneous functions for interpretation of single-cell RNA-seq data. Methods are provided for assignment of cell cycle phase, detection of highly variable and significantly correlated genes, identification of marker genes, and other common tasks in routine single-cell analysis workflows.
Maintained by Aaron Lun. Last updated 5 months ago.
immunooncologynormalizationsequencingrnaseqsoftwaregeneexpressiontranscriptomicssinglecellclusteringbioconductor-packagehuman-cell-atlassingle-cell-rna-seqopenblascpp
10.0 match 41 stars 13.14 score 7.6k scripts 36 dependentsbioc
baySeq:Empirical Bayesian analysis of patterns of differential expression in count data
This package identifies differential expression in high-throughput 'count' data, such as that derived from next-generation sequencing machines, calculating estimated posterior likelihoods of differential expression (or more complex hypotheses) via empirical Bayesian methods.
Maintained by Samuel Granjeaud. Last updated 5 months ago.
sequencingdifferentialexpressionmultiplecomparisonsagebayesiancoverage
16.9 match 7.75 score 79 scripts 3 dependentsbioc
Gviz:Plotting data and annotation information along genomic coordinates
Genomic data analyses requires integrated visualization of known genomic information and new experimental data. Gviz uses the biomaRt and the rtracklayer packages to perform live annotation queries to Ensembl and UCSC and translates this to e.g. gene/transcript structures in viewports of the grid graphics package. This results in genomic information plotted together with your data.
Maintained by Robert Ivanek. Last updated 5 months ago.
visualizationmicroarraysequencing
10.0 match 79 stars 13.08 score 1.4k scripts 48 dependentsbioc
clusterExperiment:Compare Clusterings for Single-Cell Sequencing
Provides functionality for running and comparing many different clusterings of single-cell sequencing data or other large mRNA Expression data sets.
Maintained by Elizabeth Purdom. Last updated 5 months ago.
clusteringrnaseqsequencingsoftwaresinglecellcpp
13.5 match 39 stars 9.63 score 192 scripts 1 dependentsbioc
sitePath:Phylogeny-based sequence clustering with site polymorphism
Using site polymorphism is one of the ways to cluster DNA/protein sequences but it is possible for the sequences with the same polymorphism on a single site to be genetically distant. This package is aimed at clustering sequences using site polymorphism and their corresponding phylogenetic trees. By considering their location on the tree, only the structurally adjacent sequences will be clustered. However, the adjacent sequences may not necessarily have the same polymorphism. So a branch-and-bound like algorithm is used to minimize the entropy representing the purity of site polymorphism of each cluster.
Maintained by Chengyang Ji. Last updated 5 months ago.
alignmentmultiplesequencealignmentphylogeneticssnpsoftwaremutationcpp
25.0 match 8 stars 5.20 score 9 scriptsbioc
tradeSeq:trajectory-based differential expression analysis for sequencing data
tradeSeq provides a flexible method for fitting regression models that can be used to find genes that are differentially expressed along one or multiple lineages in a trajectory. Based on the fitted models, it uses a variety of tests suited to answer different questions of interest, e.g. the discovery of genes for which expression is associated with pseudotime, or which are differentially expressed (in a specific region) along the trajectory. It fits a negative binomial generalized additive model (GAM) for each gene, and performs inference on the parameters of the GAM.
Maintained by Hector Roux de Bezieux. Last updated 5 months ago.
clusteringregressiontimecoursedifferentialexpressiongeneexpressionrnaseqsequencingsoftwaresinglecelltranscriptomicsmultiplecomparisonvisualization
12.9 match 247 stars 10.06 score 440 scriptsgreat-northern-diver
loon:Interactive Statistical Data Visualization
An extendable toolkit for interactive data visualization and exploration.
Maintained by R. Wayne Oldford. Last updated 2 years ago.
data-analysisdata-sciencedata-visualizationexploratory-analysisexploratory-data-analysishigh-dimensional-datainteractive-graphicsinteractive-visualizationsloonpythonstatistical-analysisstatistical-graphicsstatisticstcl-extensiontk
14.4 match 48 stars 9.00 score 93 scripts 5 dependentsbioc
PureCN:Copy number calling and SNV classification using targeted short read sequencing
This package estimates tumor purity, copy number, and loss of heterozygosity (LOH), and classifies single nucleotide variants (SNVs) by somatic status and clonality. PureCN is designed for targeted short read sequencing data, integrates well with standard somatic variant detection and copy number pipelines, and has support for tumor samples without matching normal samples.
Maintained by Markus Riester. Last updated 2 months ago.
copynumbervariationsoftwaresequencingvariantannotationvariantdetectioncoverageimmunooncologybioconductor-packagecell-free-dnacopy-numberlohtumor-heterogeneitytumor-mutational-burdentumor-purity
13.3 match 132 stars 9.72 score 40 scriptsbioc
seqTools:Analysis of nucleotide, sequence and quality content on fastq files
Analyze read length, phred scores and alphabet frequency and DNA k-mers on uncompressed and compressed fastq files.
Maintained by Wolfgang Kaisers. Last updated 5 months ago.
23.1 match 5.57 score 52 scripts 1 dependentscran
localScore:Package for Sequence Analysis by Local Score
Functionalities for calculating the local score and calculating statistical relevance (p-value) to find a local Score in a sequence of given distribution (S. Mercier and J.-J. Daudin (2001) <https://hal.science/hal-00714174/>) ; S. Karlin and S. Altschul (1990) <https://pmc.ncbi.nlm.nih.gov/articles/PMC53667/> ; S. Mercier, D. Cellier and F. Charlot (2003) <https://hal.science/hal-00937529v1/> ; A. Lagnoux, S. Mercier and P. Valois (2017) <doi:10.1093/bioinformatics/btw699> ).
Maintained by David Robelin. Last updated 21 days ago.
55.3 match 2.30 score 6 scriptsbioc
monaLisa:Binned Motif Enrichment Analysis and Visualization
Useful functions to work with sequence motifs in the analysis of genomics data. These include methods to annotate genomic regions or sequences with predicted motif hits and to identify motifs that drive observed changes in accessibility or expression. Functions to produce informative visualizations of the obtained results are also provided.
Maintained by Michael Stadler. Last updated 2 months ago.
motifannotationvisualizationfeatureextractionepigenetics
15.8 match 40 stars 8.06 score 53 scriptsbioc
HTSFilter:Filter replicated high-throughput transcriptome sequencing data
This package implements a filtering procedure for replicated transcriptome sequencing data based on a global Jaccard similarity index in order to identify genes with low, constant levels of expression across one or more experimental conditions.
Maintained by Andrea Rau. Last updated 5 months ago.
sequencingrnaseqpreprocessingdifferentialexpressiongeneexpressionnormalizationimmunooncology
20.4 match 6.24 score 58 scripts 1 dependentsbioc
XNAString:Efficient Manipulation of Modified Oligonucleotide Sequences
The XNAString package allows for description of base sequences and associated chemical modifications in a single object. XNAString is able to capture single stranded, as well as double stranded molecules. Chemical modifications are represented as independent strings associated with different features of the molecules (base sequence, sugar sequence, backbone sequence, modifications) and can be read or written to a HELM notation. It also enables secondary structure prediction using RNAfold from ViennaRNA. XNAString is designed to be efficient representation of nucleic-acid based therapeutics, therefore it stores information about target sequences and provides interface for matching and alignment functions from Biostrings and pwalign packages.
Maintained by Marianna Plucinska. Last updated 5 months ago.
sequencematchingalignmentsequencinggeneticscpp
30.4 match 4.18 score 4 scriptsbioc
microbiome:Microbiome Analytics
Utilities for microbiome analysis.
Maintained by Leo Lahti. Last updated 5 months ago.
metagenomicsmicrobiomesequencingsystemsbiologyhitchiphitchip-atlashuman-microbiomemicrobiologymicrobiome-analysisphyloseqpopulation-study
10.0 match 290 stars 12.50 score 2.0k scripts 5 dependentsbioc
genomation:Summary, annotation and visualization of genomic data
A package for summary and annotation of genomic intervals. Users can visualize and quantify genomic intervals over pre-defined functional regions, such as promoters, exons, introns, etc. The genomic intervals represent regions with a defined chromosome position, which may be associated with a score, such as aligned reads from HT-seq experiments, TF binding sites, methylation scores, etc. The package can use any tabular genomic feature data as long as it has minimal information on the locations of genomic intervals. In addition, It can use BAM or BigWig files as input.
Maintained by Altuna Akalin. Last updated 5 months ago.
annotationsequencingvisualizationcpgislandcpp
11.3 match 75 stars 11.09 score 738 scripts 5 dependentsbioc
DNAfusion:Identification of gene fusions using paired-end sequencing
DNAfusion can identify gene fusions such as EML4-ALK based on paired-end sequencing results. This package was developed using position deduplicated BAM files generated with the AVENIO Oncology Analysis Software. These files are made using the AVENIO ctDNA surveillance kit and Illumina Nextseq 500 sequencing. This is a targeted hybridization NGS approach and includes ALK-specific but not EML4-specific probes.
Maintained by Christoffer Trier Maansson. Last updated 5 months ago.
targetedresequencinggeneticsgenefusiondetectionsequencingbioconductor-packagecirculating-tumor-dnagene-fusionliquid-biopsynext-generation-sequencingtargeted-sequencingvariant-calling
27.8 match 3 stars 4.48 score 10 scriptsbioc
FastqCleaner:A Shiny Application for Quality Control, Filtering and Trimming of FASTQ Files
An interactive web application for quality control, filtering and trimming of FASTQ files. This user-friendly tool combines a pipeline for data processing based on Biostrings and ShortRead infrastructure, with a cutting-edge visual environment. Single-Read and Paired-End files can be locally processed. Diagnostic interactive plots (CG content, per-base sequence quality, etc.) are provided for both the input and output files.
Maintained by Leandro Roser. Last updated 5 months ago.
qualitycontrolsequencingsoftwaresangerseqsequencematchingcpp
30.8 match 4.00 score 4 scriptsbioc
qckitfastq:FASTQ Quality Control
Assessment of FASTQ file format with multiple metrics including quality score, sequence content, overrepresented sequence and Kmers.
Maintained by August Guang. Last updated 5 months ago.
softwarequalitycontrolsequencingzlibcpp
28.1 match 4.38 score 24 scriptsbioc
sva:Surrogate Variable Analysis
The sva package contains functions for removing batch effects and other unwanted variation in high-throughput experiment. Specifically, the sva package contains functions for the identifying and building surrogate variables for high-dimensional data sets. Surrogate variables are covariates constructed directly from high-dimensional data (like gene expression/RNA sequencing/methylation/brain imaging data) that can be used in subsequent analyses to adjust for unknown, unmodeled, or latent sources of noise. The sva package can be used to remove artifacts in three ways: (1) identifying and estimating surrogate variables for unknown sources of variation in high-throughput experiments (Leek and Storey 2007 PLoS Genetics,2008 PNAS), (2) directly removing known batch effects using ComBat (Johnson et al. 2007 Biostatistics) and (3) removing batch effects with known control probes (Leek 2014 biorXiv). Removing batch effects and using surrogate variables in differential expression analysis have been shown to reduce dependence, stabilize error rate estimates, and improve reproducibility, see (Leek and Storey 2007 PLoS Genetics, 2008 PNAS or Leek et al. 2011 Nat. Reviews Genetics).
Maintained by Jeffrey T. Leek. Last updated 5 months ago.
immunooncologymicroarraystatisticalmethodpreprocessingmultiplecomparisonsequencingrnaseqbatcheffectnormalization
12.2 match 10.05 score 3.2k scripts 50 dependentschaoliu-cl
conversim:Conversation Similarity Analysis Package
A comprehensive toolkit for analyzing and comparing conversations. This package provides functions to calculate various similarity measures between conversations, including topic, lexical, semantic, structural, stylistic, sentiment, participant, and timing similarities. It supports both pairwise conversation comparisons and analysis of multiple dyads.
Maintained by Chao Liu. Last updated 6 months ago.
28.5 match 4.30 score 10 scriptssjmack
HLAtools:Toolkit for HLA Immunogenomics
A toolkit for the analysis and management of data for genes in the so-called "Human Leukocyte Antigen" (HLA) region. Functions extract reference data from the Anthony Nolan HLA Informatics Group/ImmunoGeneTics HLA 'GitHub' repository (ANHIG/IMGTHLA) <https://github.com/ANHIG/IMGTHLA>, validate Genotype List (GL) Strings, convert between UNIFORMAT and GL String Code (GLSC) formats, translate HLA alleles and GLSCs across ImmunoPolymorphism Database (IPD) IMGT/HLA Database release versions, identify differences between pairs of alleles at a locus, generate customized, multi-position sequence alignments, trim and convert allele-names across nomenclature epochs, and extend existing data-analysis methods.
Maintained by Steven Mack. Last updated 13 days ago.
19.7 match 4 stars 6.21 score 7 scripts 1 dependentsbioc
DropletUtils:Utilities for Handling Single-Cell Droplet Data
Provides a number of utility functions for handling single-cell (RNA-seq) data from droplet technologies such as 10X Genomics. This includes data loading from count matrices or molecule information files, identification of cells from empty droplets, removal of barcode-swapped pseudo-cells, and downsampling of the count matrix.
Maintained by Jonathan Griffiths. Last updated 3 months ago.
immunooncologysinglecellsequencingrnaseqgeneexpressiontranscriptomicsdataimportcoveragezlibcpp
12.0 match 10.08 score 2.7k scripts 9 dependentsmt1022
cubar:Codon Usage Bias Analysis
A suite of functions for rapid and flexible analysis of codon usage bias. It provides in-depth analysis at the codon level, including relative synonymous codon usage (RSCU), tRNA weight calculations, machine learning predictions for optimal or preferred codons, and visualization of codon-anticodon pairing. Additionally, it can calculate various gene- specific codon indices such as codon adaptation index (CAI), effective number of codons (ENC), fraction of optimal codons (Fop), tRNA adaptation index (tAI), mean codon stabilization coefficients (CSCg), and GC contents (GC/GC3s/GC4d). It also supports both standard and non-standard genetic code tables found in NCBI, as well as custom genetic code tables.
Maintained by Hong Zhang. Last updated 3 months ago.
bioinformaticscodon-usagemachine-learningsequence-analysis
20.6 match 6 stars 5.82 score 8 scriptsbioc
circRNAprofiler:circRNAprofiler: An R-Based Computational Framework for the Downstream Analysis of Circular RNAs
R-based computational framework for a comprehensive in silico analysis of circRNAs. This computational framework allows to combine and analyze circRNAs previously detected by multiple publicly available annotation-based circRNA detection tools. It covers different aspects of circRNAs analysis from differential expression analysis, evolutionary conservation, biogenesis to functional analysis.
Maintained by Simona Aufiero. Last updated 5 months ago.
annotationstructuralpredictionfunctionalpredictiongenepredictiongenomeassemblydifferentialexpression
20.7 match 10 stars 5.78 score 5 scriptsbioc
GenomicDataCommons:NIH / NCI Genomic Data Commons Access
Programmatically access the NIH / NCI Genomic Data Commons RESTful service.
Maintained by Sean Davis. Last updated 1 months ago.
dataimportsequencingapi-clientbioconductorbioinformaticscancercore-servicesdata-sciencegenomicsncitcgavignette
10.0 match 87 stars 11.94 score 238 scripts 12 dependentsstuart-lab
Signac:Analysis of Single-Cell Chromatin Data
A framework for the analysis and exploration of single-cell chromatin data. The 'Signac' package contains functions for quantifying single-cell chromatin data, computing per-cell quality control metrics, dimension reduction and normalization, visualization, and DNA sequence motif analysis. Reference: Stuart et al. (2021) <doi:10.1038/s41592-021-01282-5>.
Maintained by Tim Stuart. Last updated 7 months ago.
atacbioinformaticssingle-cellzlibcpp
9.7 match 349 stars 12.19 score 3.7k scripts 1 dependentsbioc
derfinder:Annotation-agnostic differential expression analysis of RNA-seq data at base-pair resolution via the DER Finder approach
This package provides functions for annotation-agnostic differential expression analysis of RNA-seq data. Two implementations of the DER Finder approach are included in this package: (1) single base-level F-statistics and (2) DER identification at the expressed regions-level. The DER Finder approach can also be used to identify differentially bounded ChIP-seq peaks.
Maintained by Leonardo Collado-Torres. Last updated 3 months ago.
differentialexpressionsequencingrnaseqchipseqdifferentialpeakcallingsoftwareimmunooncologycoverageannotation-agnosticbioconductorderfinder
11.8 match 42 stars 10.03 score 78 scripts 6 dependentsmolevolepid
SEEPS:Sequence evolution and epidemiological process simulator
A modular, modern simulation suite and toolkit for simulating transmission networks, phylogenies, and evolutionary pairwise distance matrices under different models and assumptions for viral/sequence evolution. While intially developed for HIV, SEEPS offers modular utilities for custom workflows for extension beyond HIV.
Maintained by Michael Kupperman. Last updated 2 months ago.
biological-sequencesepidemiologyevolutionhivsimulation-framework
29.8 match 1 stars 3.95 score 6 scriptsbioc
spiky:Spike-in calibration for cell-free MeDIP
spiky implements methods and model generation for cfMeDIP (cell-free methylated DNA immunoprecipitation) with spike-in controls. CfMeDIP is an enrichment protocol which avoids destructive conversion of scarce template, making it ideal as a "liquid biopsy," but creating certain challenges in comparing results across specimens, subjects, and experiments. The use of synthetic spike-in standard oligos allows diagnostics performed with cfMeDIP to quantitatively compare samples across subjects, experiments, and time points in both relative and absolute terms.
Maintained by Tim Triche. Last updated 5 months ago.
differentialmethylationdnamethylationnormalizationpreprocessingqualitycontrolsequencing
23.9 match 2 stars 4.90 score 3 scriptsbioc
cfDNAPro:cfDNAPro extracts and Visualises biological features from whole genome sequencing data of cell-free DNA
cfDNA fragments carry important features for building cancer sample classification ML models, such as fragment size, and fragment end motif etc. Analyzing and visualizing fragment size metrics, as well as other biological features in a curated, standardized, scalable, well-documented, and reproducible way might be time intensive. This package intends to resolve these problems and simplify the process. It offers two sets of functions for cfDNA feature characterization and visualization.
Maintained by Haichao Wang. Last updated 5 months ago.
visualizationsequencingwholegenomebioinformaticscancer-genomicscancer-researchcell-free-dnaearly-detectiongenomics-visualizationliquid-biopsyswgswhole-genome-sequencing
19.4 match 28 stars 6.04 score 13 scriptscran
ips:Interfaces to Phylogenetic Software in R
Functions that wrap popular phylogenetic software for sequence alignment, masking of sequence alignments, and estimation of phylogenies and ancestral character states.
Maintained by Christoph Heibl. Last updated 11 months ago.
27.2 match 4.28 score 128 scripts 1 dependentsbioc
systemPipeR:systemPipeR: Workflow Environment for Data Analysis and Report Generation
systemPipeR is a multipurpose data analysis workflow environment that unifies R with command-line tools. It enables scientists to analyze many types of large- or small-scale data on local or distributed computer systems with a high level of reproducibility, scalability and portability. At its core is a command-line interface (CLI) that adopts the Common Workflow Language (CWL). This design allows users to choose for each analysis step the optimal R or command-line software. It supports both end-to-end and partial execution of workflows with built-in restart functionalities. Efficient management of complex analysis tasks is accomplished by a flexible workflow control container class. Handling of large numbers of input samples and experimental designs is facilitated by consistent sample annotation mechanisms. As a multi-purpose workflow toolkit, systemPipeR enables users to run existing workflows, customize them or design entirely new ones while taking advantage of widely adopted data structures within the Bioconductor ecosystem. Another important core functionality is the generation of reproducible scientific analysis and technical reports. For result interpretation, systemPipeR offers a wide range of plotting functionality, while an associated Shiny App offers many useful functionalities for interactive result exploration. The vignettes linked from this page include (1) a general introduction, (2) a description of technical details, and (3) a collection of workflow templates.
Maintained by Thomas Girke. Last updated 5 months ago.
geneticsinfrastructuredataimportsequencingrnaseqriboseqchipseqmethylseqsnpgeneexpressioncoveragegenesetenrichmentalignmentqualitycontrolimmunooncologyreportwritingworkflowstepworkflowmanagement
10.0 match 53 stars 11.56 score 344 scripts 3 dependentsbioc
muscle:Multiple Sequence Alignment with MUSCLE
MUSCLE performs multiple sequence alignments of nucleotide or amino acid sequences.
Maintained by Alex T. Kalinka. Last updated 5 months ago.
multiplesequencealignmentalignmentsequencinggeneticssequencematchingdataimportcpp
22.0 match 5.21 score 81 scriptsbioc
methInheritSim:Simulating Whole-Genome Inherited Bisulphite Sequencing Data
Simulate a multigeneration methylation case versus control experiment with inheritance relation using a real control dataset.
Maintained by Pascal Belleau. Last updated 5 months ago.
biologicalquestionepigeneticsdnamethylationdifferentialmethylationmethylseqsoftwareimmunooncologystatisticalmethodwholegenomesequencingbisulphite-sequencinginheritancemethylationsimulation
24.9 match 1 stars 4.60 score 1 scriptsdsy109
mixtools:Tools for Analyzing Finite Mixture Models
Analyzes finite mixture models for various parametric and semiparametric settings. This includes mixtures of parametric distributions (normal, multivariate normal, multinomial, gamma), various Reliability Mixture Models (RMMs), mixtures-of-regressions settings (linear regression, logistic regression, Poisson regression, linear regression with changepoints, predictor-dependent mixing proportions, random effects regressions, hierarchical mixtures-of-experts), and tools for selecting the number of components (bootstrapping the likelihood ratio test statistic, mixturegrams, and model selection criteria). Bayesian estimation of mixtures-of-linear-regressions models is available as well as a novel data depth method for obtaining credible bands. This package is based upon work supported by the National Science Foundation under Grant No. SES-0518772 and the Chan Zuckerberg Initiative: Essential Open Source Software for Science (Grant No. 2020-255193).
Maintained by Derek Young. Last updated 9 months ago.
mixture-modelsmixture-of-expertssemiparametric-regression
10.1 match 20 stars 11.34 score 1.4k scripts 56 dependentsbioc
bsseq:Analyze, manage and store whole-genome methylation data
A collection of tools for analyzing and visualizing whole-genome methylation data from sequencing. This includes whole-genome bisulfite sequencing and Oxford nanopore data.
Maintained by Kasper Daniel Hansen. Last updated 3 months ago.
9.3 match 37 stars 12.26 score 676 scripts 15 dependentsbioc
TAPseq:Targeted scRNA-seq primer design for TAP-seq
Design primers for targeted single-cell RNA-seq used by TAP-seq. Create sequence templates for target gene panels and design gene-specific primers using Primer3. Potential off-targets can be estimated with BLAST. Requires working installations of Primer3 and BLASTn.
Maintained by Andreas R. Gschwind. Last updated 5 months ago.
singlecellsequencingtechnologycrisprpooledscreens
21.2 match 4 stars 5.38 score 9 scriptsbioc
crisprScore:On-Target and Off-Target Scoring Algorithms for CRISPR gRNAs
Provides R wrappers of several on-target and off-target scoring methods for CRISPR guide RNAs (gRNAs). The following nucleases are supported: SpCas9, AsCas12a, enAsCas12a, and RfxCas13d (CasRx). The available on-target cutting efficiency scoring methods are RuleSet1, Azimuth, DeepHF, DeepCpf1, enPAM+GB, and CRISPRscan. Both the CFD and MIT scoring methods are available for off-target specificity prediction. The package also provides a Lindel-derived score to predict the probability of a gRNA to produce indels inducing a frameshift for the Cas9 nuclease. Note that DeepHF, DeepCpf1 and enPAM+GB are not available on Windows machines.
Maintained by Jean-Philippe Fortin. Last updated 5 months ago.
crisprfunctionalgenomicsfunctionalpredictionbioconductorbioconductor-packagecrispr-cas9crispr-designcrispr-targetgenomicsgrnagrna-sequencegrna-sequencesscoring-algorithmsgrnasgrna-design
15.0 match 16 stars 7.52 score 19 scripts 4 dependentsbioc
pathview:a tool set for pathway based data integration and visualization
Pathview is a tool set for pathway based data integration and visualization. It maps and renders a wide variety of biological data on relevant pathway graphs. All users need is to supply their data and specify the target pathway. Pathview automatically downloads the pathway graph data, parses the data file, maps user data to the pathway, and render pathway graph with the mapped data. In addition, Pathview also seamlessly integrates with pathway and gene set (enrichment) analysis tools for large-scale and fully automated analysis.
Maintained by Weijun Luo. Last updated 5 months ago.
pathwaysgraphandnetworkvisualizationgenesetenrichmentdifferentialexpressiongeneexpressionmicroarrayrnaseqgeneticsmetabolomicsproteomicssystemsbiologysequencing
10.0 match 40 stars 11.24 score 1.6k scripts 10 dependentsbioc
karyoploteR:Plot customizable linear genomes displaying arbitrary data
karyoploteR creates karyotype plots of arbitrary genomes and offers a complete set of functions to plot arbitrary data on them. It mimicks many R base graphics functions coupling them with a coordinate change function automatically mapping the chromosome and data coordinates into the plot coordinates. In addition to the provided data plotting functions, it is easy to add new ones.
Maintained by Bernat Gel. Last updated 5 months ago.
visualizationcopynumbervariationsequencingcoveragednaseqchipseqmethylseqdataimportonechannelbioconductorbioinformaticsdata-visualizationgenomegenomics-visualizationplotting-in-r
10.0 match 306 stars 11.22 score 656 scripts 4 dependentsbioc
StructuralVariantAnnotation:Variant annotations for structural variants
StructuralVariantAnnotation provides a framework for analysis of structural variants within the Bioconductor ecosystem. This package contains contains useful helper functions for dealing with structural variants in VCF format. The packages contains functions for parsing VCFs from a number of popular callers as well as functions for dealing with breakpoints involving two separate genomic loci encoded as GRanges objects.
Maintained by Daniel Cameron. Last updated 5 months ago.
dataimportsequencingannotationgeneticsvariantannotation
17.8 match 6.26 score 102 scripts 2 dependentsmlverse
torch:Tensors and Neural Networks with 'GPU' Acceleration
Provides functionality to define and train neural networks similar to 'PyTorch' by Paszke et al (2019) <doi:10.48550/arXiv.1912.01703> but written entirely in R using the 'libtorch' library. Also supports low-level tensor operations and 'GPU' acceleration.
Maintained by Daniel Falbel. Last updated 6 days ago.
6.8 match 520 stars 16.52 score 1.4k scripts 38 dependentsxytangtang
ProcData:Process Data Analysis
Provides tools for exploratory process data analysis. Process data refers to the data describing participants' problem-solving processes in computer-based assessments. It is often recorded in computer log files. This package provides functions to read, process, and write process data. It also implements two feature extraction methods to compress the information stored in process data into standard numerical vectors. This package also provides recurrent neural network based models that relate response processes with other binary or scale variables of interest. The functions that involve training and evaluating neural networks are wrappers of functions in 'keras'.
Maintained by Xueying Tang. Last updated 4 years ago.
30.1 match 10 stars 3.70 score 2 scriptsbioc
scater:Single-Cell Analysis Toolkit for Gene Expression Data in R
A collection of tools for doing various analyses of single-cell RNA-seq gene expression data, with a focus on quality control and visualization.
Maintained by Alan OCallaghan. Last updated 9 days ago.
immunooncologysinglecellrnaseqqualitycontrolpreprocessingnormalizationvisualizationdimensionreductiontranscriptomicsgeneexpressionsequencingsoftwaredataimportdatarepresentationinfrastructurecoverage
10.0 match 11.07 score 12k scripts 43 dependentsbioc
celda:CEllular Latent Dirichlet Allocation
Celda is a suite of Bayesian hierarchical models for clustering single-cell RNA-sequencing (scRNA-seq) data. It is able to perform "bi-clustering" and simultaneously cluster genes into gene modules and cells into cell subpopulations. It also contains DecontX, a novel Bayesian method to computationally estimate and remove RNA contamination in individual cells without empty droplet information. A variety of scRNA-seq data visualization functions is also included.
Maintained by Joshua Campbell. Last updated 28 days ago.
singlecellgeneexpressionclusteringsequencingbayesianimmunooncologydataimportcppopenmp
10.5 match 147 stars 10.47 score 256 scripts 2 dependentsbioc
DirichletMultinomial:Dirichlet-Multinomial Mixture Model Machine Learning for Microbiome Data
Dirichlet-multinomial mixture models can be used to describe variability in microbial metagenomic data. This package is an interface to code originally made available by Holmes, Harris, and Quince, 2012, PLoS ONE 7(2): 1-15, as discussed further in the man page for this package, ?DirichletMultinomial.
Maintained by Martin Morgan. Last updated 5 months ago.
immunooncologymicrobiomesequencingclusteringclassificationmetagenomicsgsl
10.0 match 11 stars 10.97 score 125 scripts 26 dependentsbioc
ORFhunteR:Predict open reading frames in nucleotide sequences
The ORFhunteR package is a R and C++ library for an automatic determination and annotation of open reading frames (ORF) in a large set of RNA molecules. It efficiently implements the machine learning model based on vectorization of nucleotide sequences and the random forest classification algorithm. The ORFhunteR package consists of a set of functions written in the R language in conjunction with C++. The efficiency of the package was confirmed by the examples of the analysis of RNA molecules from the NCBI RefSeq and Ensembl databases. The package can be used in basic and applied biomedical research related to the study of the transcriptome of normal as well as altered (for example, cancer) human cells.
Maintained by Vasily V. Grinev. Last updated 5 months ago.
technologystatisticalmethodsequencingrnaseqclassificationfeatureextractioncpp
24.5 match 1 stars 4.48 scorebioc
IsoformSwitchAnalyzeR:Identify, Annotate and Visualize Isoform Switches with Functional Consequences from both short- and long-read RNA-seq data
Analysis of alternative splicing and isoform switches with predicted functional consequences (e.g. gain/loss of protein domains etc.) from quantification of all types of RNASeq by tools such as Kallisto, Salmon, StringTie, Cufflinks/Cuffdiff etc.
Maintained by Kristoffer Vitting-Seerup. Last updated 5 months ago.
geneexpressiontranscriptionalternativesplicingdifferentialexpressiondifferentialsplicingvisualizationstatisticalmethodtranscriptomevariantbiomedicalinformaticsfunctionalgenomicssystemsbiologytranscriptomicsrnaseqannotationfunctionalpredictiongenepredictiondataimportmultiplecomparisonbatcheffectimmunooncology
11.8 match 108 stars 9.26 score 125 scriptsbioc
EnrichedHeatmap:Making Enriched Heatmaps
Enriched heatmap is a special type of heatmap which visualizes the enrichment of genomic signals on specific target regions. Here we implement enriched heatmap by ComplexHeatmap package. Since this type of heatmap is just a normal heatmap but with some special settings, with the functionality of ComplexHeatmap, it would be much easier to customize the heatmap as well as concatenating to a list of heatmaps to show correspondance between different data sources.
Maintained by Zuguang Gu. Last updated 5 months ago.
softwarevisualizationsequencinggenomeannotationcoveragecpp
10.0 match 190 stars 10.87 score 330 scripts 1 dependentsbioc
Dino:Normalization of Single-Cell mRNA Sequencing Data
Dino normalizes single-cell, mRNA sequencing data to correct for technical variation, particularly sequencing depth, prior to downstream analysis. The approach produces a matrix of corrected expression for which the dependency between sequencing depth and the full distribution of normalized expression; many existing methods aim to remove only the dependency between sequencing depth and the mean of the normalized expression. This is particuarly useful in the context of highly sparse datasets such as those produced by 10X genomics and other uninque molecular identifier (UMI) based microfluidics protocols for which the depth-dependent proportion of zeros in the raw expression data can otherwise present a challenge.
Maintained by Jared Brown. Last updated 5 months ago.
softwarenormalizationrnaseqsinglecellsequencinggeneexpressiontranscriptomicsregressioncellbasedassays
18.0 match 9 stars 6.02 score 13 scriptsbioc
LOLA:Locus overlap analysis for enrichment of genomic ranges
Provides functions for testing overlap of sets of genomic regions with public and custom region set (genomic ranges) databases. This makes it possible to do automated enrichment analysis for genomic region sets, thus facilitating interpretation of functional genomics and epigenomics data.
Maintained by Nathan Sheffield. Last updated 5 months ago.
genesetenrichmentgeneregulationgenomeannotationsystemsbiologyfunctionalgenomicschipseqmethylseqsequencing
11.6 match 76 stars 9.34 score 160 scriptsbioc
SRAdb:A compilation of metadata from NCBI SRA and tools
The Sequence Read Archive (SRA) is the largest public repository of sequencing data from the next generation of sequencing platforms including Roche 454 GS System, Illumina Genome Analyzer, Applied Biosystems SOLiD System, Helicos Heliscope, and others. However, finding data of interest can be challenging using current tools. SRAdb is an attempt to make access to the metadata associated with submission, study, sample, experiment and run much more feasible. This is accomplished by parsing all the NCBI SRA metadata into a SQLite database that can be stored and queried locally. Fulltext search in the package make querying metadata very flexible and powerful. fastq and sra files can be downloaded for doing alignment locally. Beside ftp protocol, the SRAdb has funcitons supporting fastp protocol (ascp from Aspera Connect) for faster downloading large data files over long distance. The SQLite database is updated regularly as new data is added to SRA and can be downloaded at will for the most up-to-date metadata.
Maintained by Jack Zhu. Last updated 3 months ago.
infrastructuresequencingdataimport
13.8 match 2 stars 7.81 score 200 scriptsbioc
EDASeq:Exploratory Data Analysis and Normalization for RNA-Seq
Numerical and graphical summaries of RNA-Seq read data. Within-lane normalization procedures to adjust for GC-content effect (or other gene-level effects) on read counts: loess robust local regression, global-scaling, and full-quantile normalization (Risso et al., 2011). Between-lane normalization procedures to adjust for distributional differences between lanes (e.g., sequencing depth): global-scaling and full-quantile normalization (Bullard et al., 2010).
Maintained by Davide Risso. Last updated 5 months ago.
immunooncologysequencingrnaseqpreprocessingqualitycontroldifferentialexpression
10.5 match 5 stars 10.24 score 594 scripts 9 dependentsbioc
R453Plus1Toolbox:A package for importing and analyzing data from Roche's Genome Sequencer System
The R453Plus1 Toolbox comprises useful functions for the analysis of data generated by Roche's 454 sequencing platform. It adds functions for quality assurance as well as for annotation and visualization of detected variants, complementing the software tools shipped by Roche with their product. Further, a pipeline for the detection of structural variants is provided.
Maintained by Hans-Ulrich Klein. Last updated 5 months ago.
sequencinginfrastructuredataimportdatarepresentationvisualizationqualitycontrolreportwriting
30.8 match 3.48 score 10 scriptsbioc
SPsimSeq:Semi-parametric simulation tool for bulk and single-cell RNA sequencing data
SPsimSeq uses a specially designed exponential family for density estimation to constructs the distribution of gene expression levels from a given real RNA sequencing data (single-cell or bulk), and subsequently simulates a new dataset from the estimated marginal distributions using Gaussian-copulas to retain the dependence between genes. It allows simulation of multiple groups and batches with any required sample size and library size.
Maintained by Joris Meys. Last updated 5 months ago.
geneexpressionrnaseqsinglecellsequencingdnaseq
14.9 match 10 stars 7.14 score 29 scripts 1 dependentsbioc
Rbwa:R wrapper for BWA-backtrack and BWA-MEM aligners
Provides an R wrapper for BWA alignment algorithms. Both BWA-backtrack and BWA-MEM are available. Convenience function to build a BWA index from a reference genome is also provided. Currently not supported for Windows machines.
Maintained by Jean-Philippe Fortin. Last updated 5 months ago.
sequencingalignmentbwabwa-memdna-sequencesdna-sequencing
25.0 match 1 stars 4.26 score 10 scripts 1 dependentsbioc
dmrseq:Detection and inference of differentially methylated regions from Whole Genome Bisulfite Sequencing
This package implements an approach for scanning the genome to detect and perform accurate inference on differentially methylated regions from Whole Genome Bisulfite Sequencing data. The method is based on comparing detected regions to a pooled null distribution, that can be implemented even when as few as two samples per population are available. Region-level statistics are obtained by fitting a generalized least squares (GLS) regression model with a nested autoregressive correlated error structure for the effect of interest on transformed methylation proportions.
Maintained by Keegan Korthauer. Last updated 5 months ago.
immunooncologydnamethylationepigeneticsmultiplecomparisonsoftwaresequencingdifferentialmethylationwholegenomeregressionfunctionalgenomics
16.6 match 6.39 score 59 scripts 1 dependentsbioc
Glimma:Interactive visualizations for gene expression analysis
This package produces interactive visualizations for RNA-seq data analysis, utilizing output from limma, edgeR, or DESeq2. It produces interactive htmlwidgets versions of popular RNA-seq analysis plots to enhance the exploration of analysis results by overlaying interactive features. The plots can be viewed in a web browser or embedded in notebook documents.
Maintained by Shian Su. Last updated 1 months ago.
differentialexpressiongeneexpressionmicroarrayreportwritingrnaseqsequencingvisualizationdifferential-expressioninteractive-visualizations
10.0 match 32 stars 10.58 score 600 scripts 1 dependentsbioc
zinbwave:Zero-Inflated Negative Binomial Model for RNA-Seq Data
Implements a general and flexible zero-inflated negative binomial model that can be used to provide a low-dimensional representations of single-cell RNA-seq data. The model accounts for zero inflation (dropouts), over-dispersion, and the count nature of the data. The model also accounts for the difference in library sizes and optionally for batch effects and/or other covariates, avoiding the need for pre-normalize the data.
Maintained by Davide Risso. Last updated 5 months ago.
immunooncologydimensionreductiongeneexpressionrnaseqsoftwaretranscriptomicssequencingsinglecell
10.0 match 43 stars 10.53 score 190 scripts 6 dependentsbioc
scds:In-Silico Annotation of Doublets for Single Cell RNA Sequencing Data
In single cell RNA sequencing (scRNA-seq) data combinations of cells are sometimes considered a single cell (doublets). The scds package provides methods to annotate doublets in scRNA-seq data computationally.
Maintained by Dennis Kostka. Last updated 5 months ago.
singlecellrnaseqqualitycontrolpreprocessingtranscriptomicsgeneexpressionsequencingsoftwareclassification
16.0 match 6.57 score 176 scripts 1 dependentsbioc
YAPSA:Yet Another Package for Signature Analysis
This package provides functions and routines for supervised analyses of mutational signatures (i.e., the signatures have to be known, cf. L. Alexandrov et al., Nature 2013 and L. Alexandrov et al., Bioaxiv 2018). In particular, the family of functions LCD (LCD = linear combination decomposition) can use optimal signature-specific cutoffs which takes care of different detectability of the different signatures. Moreover, the package provides different sets of mutational signatures, including the COSMIC and PCAWG SNV signatures and the PCAWG Indel signatures; the latter infering that with YAPSA, the concept of supervised analysis of mutational signatures is extended to Indel signatures. YAPSA also provides confidence intervals as computed by profile likelihoods and can perform signature analysis on a stratified mutational catalogue (SMC = stratify mutational catalogue) in order to analyze enrichment and depletion patterns for the signatures in different strata.
Maintained by Zuguang Gu. Last updated 5 months ago.
sequencingdnaseqsomaticmutationvisualizationclusteringgenomicvariationstatisticalmethodbiologicalquestion
16.4 match 6.41 score 57 scriptszachcp
msaR:Multiple Sequence Alignment for R Shiny
Visualises multiple sequence alignments dynamically within the Shiny web application framework.
Maintained by Zachary Charlop-Powers. Last updated 3 years ago.
alignmenthtmlwidgetsmultiplesequence
19.6 match 8 stars 5.37 score 58 scriptsbioc
OmnipathR:OmniPath web service client and more
A client for the OmniPath web service (https://www.omnipathdb.org) and many other resources. It also includes functions to transform and pretty print some of the downloaded data, functions to access a number of other resources such as BioPlex, ConsensusPathDB, EVEX, Gene Ontology, Guide to Pharmacology (IUPHAR/BPS), Harmonizome, HTRIdb, Human Phenotype Ontology, InWeb InBioMap, KEGG Pathway, Pathway Commons, Ramilowski et al. 2015, RegNetwork, ReMap, TF census, TRRUST and Vinayagam et al. 2011. Furthermore, OmnipathR features a close integration with the NicheNet method for ligand activity prediction from transcriptomics data, and its R implementation `nichenetr` (available only on github).
Maintained by Denes Turei. Last updated 19 days ago.
graphandnetworknetworkpathwayssoftwarethirdpartyclientdataimportdatarepresentationgenesignalinggeneregulationsystemsbiologytranscriptomicssinglecellannotationkeggcomplexesenzyme-ptmnetworksnetworks-biologyomnipathproteinsquarto
10.6 match 126 stars 9.90 score 226 scripts 2 dependentscausal-lda
TrialEmulation:Causal Analysis of Observational Time-to-Event Data
Implements target trial emulation methods to apply randomized clinical trial design and analysis in an observational setting. Using marginal structural models, it can estimate intention-to-treat and per-protocol effects in emulated trials using electronic health records. A description and application of the method can be found in Danaei et al (2013) <doi:10.1177/0962280211403603>.
Maintained by Isaac Gravestock. Last updated 23 days ago.
causal-inferencelongitudinal-datasurvival-analysiscpp
13.5 match 25 stars 7.72 score 29 scriptsbioc
AneuFinder:Analysis of Copy Number Variation in Single-Cell-Sequencing Data
AneuFinder implements functions for copy-number detection, breakpoint detection, and karyotype and heterogeneity analysis in single-cell whole genome sequencing and strand-seq data.
Maintained by Aaron Taudt. Last updated 5 months ago.
immunooncologysoftwaresequencingsinglecellcopynumbervariationgenomicvariationhiddenmarkovmodelwholegenomecpp
13.3 match 17 stars 7.70 score 37 scriptsbioc
muscat:Multi-sample multi-group scRNA-seq data analysis tools
`muscat` provides various methods and visualization tools for DS analysis in multi-sample, multi-group, multi-(cell-)subpopulation scRNA-seq data, including cell-level mixed models and methods based on aggregated “pseudobulk” data, as well as a flexible simulation platform that mimics both single and multi-sample scRNA-seq data.
Maintained by Helena L. Crowell. Last updated 5 months ago.
immunooncologydifferentialexpressionsequencingsinglecellsoftwarestatisticalmethodvisualization
10.0 match 181 stars 10.26 score 686 scriptsbioc
QuasR:Quantify and Annotate Short Reads in R
This package provides a framework for the quantification and analysis of Short Reads. It covers a complete workflow starting from raw sequence reads, over creation of alignments and quality control plots, to the quantification of genomic regions of interest. Read alignments are either generated through Rbowtie (data from DNA/ChIP/ATAC/Bis-seq experiments) or Rhisat2 (data from RNA-seq experiments that require spliced alignments), or can be provided in the form of bam files.
Maintained by Michael Stadler. Last updated 24 days ago.
geneticspreprocessingsequencingchipseqrnaseqmethylseqcoveragealignmentqualitycontrolimmunooncologycurlbzip2xz-utilszlibcpp
11.8 match 6 stars 8.70 score 79 scripts 1 dependentsbioc
scone:Single Cell Overview of Normalized Expression data
SCONE is an R package for comparing and ranking the performance of different normalization schemes for single-cell RNA-seq and other high-throughput analyses.
Maintained by Davide Risso. Last updated 25 days ago.
immunooncologynormalizationpreprocessingqualitycontrolgeneexpressionrnaseqsoftwaretranscriptomicssequencingsinglecellcoverage
11.2 match 53 stars 9.12 score 104 scriptsbioc
scuttle:Single-Cell RNA-Seq Analysis Utilities
Provides basic utility functions for performing single-cell analyses, focusing on simple normalization, quality control and data transformations. Also provides some helper functions to assist development of other packages.
Maintained by Aaron Lun. Last updated 5 months ago.
immunooncologysinglecellrnaseqqualitycontrolpreprocessingnormalizationtranscriptomicsgeneexpressionsequencingsoftwaredataimportopenblascpp
10.0 match 10.21 score 1.7k scripts 80 dependentsbioc
CircSeqAlignTk:A toolkit for end-to-end analysis of RNA-seq data for circular genomes
CircSeqAlignTk is designed for end-to-end RNA-Seq data analysis of circular genome sequences, from alignment to visualization. It mainly targets viroids which are composed of 246-401 nt circular RNAs. In addition, CircSeqAlignTk implements a tidy interface to generate synthetic sequencing data that mimic real RNA-Seq data, allowing developers to evaluate the performance of alignment tools and workflows.
Maintained by Jianqiang Sun. Last updated 5 months ago.
sequencingsmallrnaalignmentsoftware
23.1 match 4.40 score 3 scriptsbioc
clustifyr:Classifier for Single-cell RNA-seq Using Cell Clusters
Package designed to aid in classifying cells from single-cell RNA sequencing data using external reference data (e.g., bulk RNA-seq, scRNA-seq, microarray, gene lists). A variety of correlation based methods and gene list enrichment methods are provided to assist cell type assignment.
Maintained by Rui Fu. Last updated 5 months ago.
singlecellannotationsequencingmicroarraygeneexpressionassign-identitiesclustersmarker-genesrna-seqsingle-cell-rna-seq
10.5 match 119 stars 9.63 score 296 scriptsbioc
QSutils:Quasispecies Diversity
Set of utility functions for viral quasispecies analysis with NGS data. Most functions are equally useful for metagenomic studies. There are three main types: (1) data manipulation and exploration—functions useful for converting reads to haplotypes and frequencies, repairing reads, intersecting strand haplotypes, and visualizing haplotype alignments. (2) diversity indices—functions to compute diversity and entropy, in which incidence, abundance, and functional indices are considered. (3) data simulation—functions useful for generating random viral quasispecies data.
Maintained by Mercedes Guerrero-Murillo. Last updated 5 months ago.
softwaregeneticsdnaseqgeneticvariabilitysequencingalignmentsequencematchingdataimport
18.1 match 5.56 score 8 scripts 1 dependentsbioc
ModCon:Modifying splice site usage by changing the mRNP code, while maintaining the genetic code
Collection of functions to calculate a nucleotide sequence surrounding for splice donors sites to either activate or repress donor usage. The proposed alternative nucleotide sequence encodes the same amino acid and could be applied e.g. in reporter systems to silence or activate cryptic splice donor sites.
Maintained by Johannes Ptok. Last updated 5 months ago.
functionalgenomicsalternativesplicing
25.1 match 1 stars 4.00 score 2 scriptsbioc
SOMNiBUS:Smooth modeling of bisulfite sequencing
This package aims to analyse count-based methylation data on predefined genomic regions, such as those obtained by targeted sequencing, and thus to identify differentially methylated regions (DMRs) that are associated with phenotypes or traits. The method is built a rich flexible model that allows for the effects, on the methylation levels, of multiple covariates to vary smoothly along genomic regions. At the same time, this method also allows for sequencing errors and can adjust for variability in cell type mixture.
Maintained by Kathleen Klein. Last updated 3 months ago.
dnamethylationregressionepigeneticsdifferentialmethylationsequencingfunctionalprediction
23.3 match 1 stars 4.30 score 3 scriptsbioc
CAGEr:Analysis of CAGE (Cap Analysis of Gene Expression) sequencing data for precise mapping of transcription start sites and promoterome mining
The _CAGEr_ package identifies transcription start sites (TSS) and their usage frequency from CAGE (Cap Analysis Gene Expression) sequencing data. It normalises raw CAGE tag count, clusters TSSs into tag clusters (TC) and aggregates them across multiple CAGE experiments to construct consensus clusters (CC) representing the promoterome. CAGEr provides functions to profile expression levels of these clusters by cumulative expression and rarefaction analysis, and outputs the plots in ggplot2 format for further facetting and customisation. After clustering, CAGEr performs analyses of promoter width and detects differential usage of TSSs (promoter shifting) between samples. CAGEr also exports its data as genome browser tracks, and as R objects for downsteam expression analysis by other Bioconductor packages such as DESeq2, CAGEfightR, or seqArchR.
Maintained by Charles Plessy. Last updated 5 months ago.
preprocessingsequencingnormalizationfunctionalgenomicstranscriptiongeneexpressionclusteringvisualization
16.4 match 6.12 score 73 scriptsbioc
tidybulk:Brings transcriptomics to the tidyverse
This is a collection of utility functions that allow to perform exploration of and calculations to RNA sequencing data, in a modular, pipe-friendly and tidy fashion.
Maintained by Stefano Mangiola. Last updated 5 months ago.
assaydomaininfrastructurernaseqdifferentialexpressiongeneexpressionnormalizationclusteringqualitycontrolsequencingtranscriptiontranscriptomicsbioconductorbulk-transcriptional-analysesdeseq2differential-expressionedgerensembl-idsentrezgene-symbolsgseamds-dimensionspcapiperedundancytibbletidytidy-datatidyversetranscriptstsne
10.5 match 168 stars 9.48 score 172 scripts 1 dependentsbioc
rGREAT:GREAT Analysis - Functional Enrichment on Genomic Regions
GREAT (Genomic Regions Enrichment of Annotations Tool) is a type of functional enrichment analysis directly performed on genomic regions. This package implements the GREAT algorithm (the local GREAT analysis), also it supports directly interacting with the GREAT web service (the online GREAT analysis). Both analysis can be viewed by a Shiny application. rGREAT by default supports more than 600 organisms and a large number of gene set collections, as well as self-provided gene sets and organisms from users. Additionally, it implements a general method for dealing with background regions.
Maintained by Zuguang Gu. Last updated 4 days ago.
genesetenrichmentgopathwayssoftwaresequencingwholegenomegenomeannotationcoveragecpp
10.0 match 86 stars 9.96 score 320 scripts 1 dependentsdsstoffer
astsa:Applied Statistical Time Series Analysis
Contains data sets and scripts for analyzing time series in both the frequency and time domains including state space modeling as well as supporting the texts Time Series Analysis and Its Applications: With R Examples (5th ed), by R.H. Shumway and D.S. Stoffer. Springer Texts in Statistics, 2025, <https://link.springer.com/book/9783031705830>, and Time Series: A Data Analysis Approach Using R. Chapman-Hall, 2019, <DOI:10.1201/9780429273285>.
Maintained by David Stoffer. Last updated 2 months ago.
12.4 match 7 stars 7.88 score 2.2k scripts 8 dependents