R-universe search: sequences

igraph

igraph:Network Analysis and Visualization

Routines for simple graphs and network analysis. It can handle large graphs very well and provides functions for generating random and regular graphs, graph visualization, centrality methods and much more.

Maintained by Kirill Müller. Last updated 13 hours ago.

complex-networks graph-algorithms graph-theory mathematics network-analysis network-graph fortran libxml2 glpk openblas cpp

57.0 match 582 stars 21.11 score 31k scripts 1.9k dependents

traminer

TraMineR:Trajectory Miner: a Sequence Analysis Toolkit

Set of sequence analysis tools for manipulating, describing and rendering categorical sequences, and more generally mining sequence data in the field of social sciences. Although this sequence analysis package is primarily intended for state or event sequences that describe time use or life courses such as family formation histories or professional careers, its features also apply to many other kinds of categorical sequence data. It accepts many different sequence representations as input and provides tools for converting sequences from one format to another. It offers several functions for describing and rendering sequences, for computing distances between sequences with different metrics (among which optimal matching), original dissimilarity-based analysis tools, and functions for extracting the most frequent event subsequences and identifying the most discriminating ones among them. A user's guide can be found on the TraMineR web page.

Maintained by Gilbert Ritschard. Last updated 3 months ago.

cpp

136.9 match 11 stars 8.24 score 534 scripts 13 dependents

bioc

Biostrings:Efficient manipulation of biological strings

Memory efficient string containers, string matching algorithms, and other utilities, for fast manipulation of large biological sequences or sets of sequences.

Maintained by Hervé Pagès. Last updated 24 days ago.

sequencematching alignment sequencing genetics dataimport datarepresentation infrastructure bioconductor-package core-package

51.5 match 61 stars 17.83 score 8.6k scripts 1.2k dependents

emmanuelparadis

ape:Analyses of Phylogenetics and Evolution

Functions for reading, writing, plotting, and manipulating phylogenetic trees, analyses of comparative data in a phylogenetic framework, ancestral character analyses, analyses of diversification and macroevolution, computing distances from DNA sequences, reading and writing nucleotide sequences as well as importing from BioConductor, and several tools such as Mantel's test, generalized skyline plots, graphical exploration of phylogenetic data (alex, trex, kronoviz), estimation of absolute evolutionary rates and clock-like trees using mean path lengths and penalized likelihood, dating trees with non-contemporaneous sequences, translating DNA into AA sequences, and assessing sequence alignments. Phylogeny estimation can be done with the NJ, BIONJ, ME, MVR, SDM, and triangle methods, and several methods handling incomplete distance matrices (NJ*, BIONJ*, MVR*, and the corresponding triangle method). Some functions call external applications (PhyML, Clustal, T-Coffee, Muscle) whose results are returned into R.

Maintained by Emmanuel Paradis. Last updated 14 hours ago.

openblas cpp

41.1 match 64 stars 17.22 score 13k scripts 599 dependents

bioc

DECIPHER:Tools for curating, analyzing, and manipulating biological sequences

A toolset for deciphering and managing biological sequences.

Maintained by Erik Wright. Last updated 5 days ago.

clustering genetics sequencing dataimport visualization microarray qualitycontrol qpcr alignment wholegenome microbiome immunooncology geneprediction openmp

77.0 match 8.40 score 1.1k scripts 14 dependents

bioc

dada2:Accurate, high-resolution sample inference from amplicon sequencing data

The dada2 package infers exact amplicon sequence variants (ASVs) from high-throughput amplicon sequencing data, replacing the coarser and less accurate OTU clustering approach. The dada2 pipeline takes as input demultiplexed fastq files, and outputs the sequence variants and their sample-wise abundances after removing substitution and chimera errors. Taxonomic classification is available via a native implementation of the RDP naive Bayesian classifier, and species-level assignment to 16S rRNA gene fragments by exact matching.

Maintained by Benjamin Callahan. Last updated 5 months ago.

immunooncology microbiome sequencing classification metagenomics amplicon bioconductor bioinformatics metabarcoding taxonomy cpp

49.0 match 485 stars 13.17 score 3.0k scripts 4 dependents

nanxstats

protr:Generating Various Numerical Representation Schemes for Protein Sequences

Comprehensive toolkit for generating various numerical features of protein sequences described in Xiao et al. (2015) <DOI:10.1093/bioinformatics/btv042>. For full functionality, the software 'ncbi-blast+' is needed, see <https://blast.ncbi.nlm.nih.gov/doc/blast-help/downloadblastdata.html> for more information.

Maintained by Nan Xiao. Last updated 6 months ago.

bioinformatics feature-engineering feature-extraction machine-learning peptides protein-sequences sequence-analysis

59.7 match 52 stars 10.02 score 173 scripts 3 dependents

bodkan

slendr:A Simulation Framework for Spatiotemporal Population Genetics

A framework for simulating spatially explicit genomic data which leverages real cartographic information for programmatic and visual encoding of spatiotemporal population dynamics on real geographic landscapes. Population genetic models are then automatically executed by the 'SLiM' software by Haller et al. (2019) <doi:10.1093/molbev/msy228> behind the scenes, using a custom built-in simulation 'SLiM' script. Additionally, fully abstract spatial models not tied to a specific geographic location are supported, and users can also simulate data from standard, non-spatial, random-mating models. These can be simulated either with the 'SLiM' built-in back-end script, or using an efficient coalescent population genetics simulator 'msprime' by Baumdicker et al. (2022) <doi:10.1093/genetics/iyab229> with a custom-built 'Python' script bundled with the R package. Simulated genomic data is saved in a tree-sequence format and can be loaded, manipulated, and summarised using tree-sequence functionality via an R interface to the 'Python' module 'tskit' by Kelleher et al. (2019) <doi:10.1038/s41588-019-0483-y>. Complete model configuration, simulation and analysis pipelines can be therefore constructed without a need to leave the R environment, eliminating friction between disparate tools for population genetic simulations and data analysis.

Maintained by Martin Petr. Last updated 12 days ago.

popgen population-genetics simulations spatial-statistics

59.3 match 56 stars 9.15 score 88 scripts

dosorio

Peptides:Calculate Indices and Theoretical Physicochemical Properties of Protein Sequences

Includes functions to calculate several physicochemical properties and indices for amino-acid sequences as well as to read and plot 'XVG' output files from the 'GROMACS' molecular dynamics package.

Maintained by Daniel Osorio. Last updated 1 years ago.

bioinformatics calculate-indices peptides protein-sequences qsar cpp

58.8 match 82 stars 9.14 score 245 scripts 7 dependents

bioc

universalmotif:Import, Modify, and Export Motifs with R

Allows for importing most common motif types into R for use by functions provided by other Bioconductor motif-related packages. Motifs can be exported into most major motif formats from various classes as defined by other Bioconductor packages. A suite of motif and sequence manipulation and analysis functions are included, including enrichment, comparison, P-value calculation, shuffling, trimming, higher-order motifs, and others.

Maintained by Benjamin Jean-Marie Tremblay. Last updated 4 months ago.

motifannotation motifdiscovery dataimport generegulation motif-analysis motif-enrichment-analysis sequence-logo cpp

46.9 match 28 stars 11.04 score 342 scripts 12 dependents

brodieg

fansi:ANSI Control Sequence Aware String Functions

Counterparts to R string manipulation functions that account for the effects of ANSI text formatting control sequences.

Maintained by Brodie Gaslam. Last updated 10 months ago.

string-manipulation

31.3 match 54 stars 14.18 score 136 scripts 11k dependents

r-lib

clock:Date-Time Types and Tools

Provides a comprehensive library for date-time manipulations using a new family of orthogonal date-time classes (durations, time points, zoned-times, and calendars) that partition responsibilities so that the complexities of time zones are only considered when they are really needed. Capabilities include: date-time parsing, formatting, arithmetic, extraction and updating of components, and rounding.

Maintained by Davis Vaughan. Last updated 2 days ago.

cpp

30.6 match 106 stars 14.48 score 296 scripts 407 dependents

fkeck

bioseq:A Toolbox for Manipulating Biological Sequences

Classes and functions to work with biological sequences (DNA, RNA and amino acid sequences). Implements S3 infrastructure to work with biological sequences as described in Keck (2020) <doi:10.1111/2041-210X.13490>. Provides a collection of functions to perform biological conversion among classes (transcription, translation) and basic operations on sequences (detection, selection and replacement based on positions or patterns). The package also provides functions to import and export sequences from and to other package formats.

Maintained by Francois Keck. Last updated 3 years ago.

64.4 match 22 stars 6.72 score 80 scripts 1 dependents

bioc

crisprDesign:Comprehensive design of CRISPR gRNAs for nucleases and base editors

Provides a comprehensive suite of functions to design and annotate CRISPR guide RNA (gRNAs) sequences. This includes on- and off-target search, on-target efficiency scoring, off-target scoring, full gene and TSS contextual annotations, and SNP annotation (human only). It currently support five types of CRISPR modalities (modes of perturbations): CRISPR knockout, CRISPR activation, CRISPR inhibition, CRISPR base editing, and CRISPR knockdown. All types of CRISPR nucleases are supported, including DNA- and RNA-target nucleases such as Cas9, Cas12a, and Cas13d. All types of base editors are also supported. gRNA design can be performed on reference genomes, transcriptomes, and custom DNA and RNA sequences. Both unpaired and paired gRNA designs are enabled.

Maintained by Jean-Philippe Fortin. Last updated 11 days ago.

crispr functionalgenomics genetarget bioconductor bioconductor-package crispr-cas9 crispr-design crispr-target genomics-analysis grna grna-sequence grna-sequences sgrna sgrna-design

47.9 match 22 stars 8.28 score 80 scripts 3 dependents

bioc

msa:Multiple Sequence Alignment

The 'msa' package provides a unified R/Bioconductor interface to the multiple sequence alignment algorithms ClustalW, ClustalOmega, and Muscle. All three algorithms are integrated in the package, therefore, they do not depend on any external software tools and are available for all major platforms. The multiple sequence alignment algorithms are complemented by a function for pretty-printing multiple sequence alignments using the LaTeX package TeXshade.

Maintained by Ulrich Bodenhofer. Last updated 1 months ago.

multiplesequencealignment alignment multiplecomparison sequencing cpp

33.8 match 17 stars 11.51 score 744 scripts 6 dependents

adrientaudiere

MiscMetabar:Miscellaneous Functions for Metabarcoding Analysis

Facilitate the description, transformation, exploration, and reproducibility of metabarcoding analyses. 'MiscMetabar' is mainly built on top of the 'phyloseq', 'dada2' and 'targets' R packages. It helps to build reproducible and robust bioinformatics pipelines in R. 'MiscMetabar' makes ecological analysis of alpha and beta-diversity easier, more reproducible and more powerful by integrating a large number of tools. Important features are described in Taudière A. (2023) <doi:10.21105/joss.06038>.

Maintained by Adrien Taudière. Last updated 26 days ago.

sequencing microbiome metagenomics clustering classification visualization amplicon amplicon-sequencing biodiversity-informatics ecology illumina metabarcoding ngs-analysis

60.3 match 17 stars 6.44 score 23 scripts

biogenies

tidysq:Tidy Processing and Analysis of Biological Sequences

A tidy approach to analysis of biological sequences. All processing and data-storage functions are heavily optimized to allow the fastest and most efficient data storage.

Maintained by Dominik Rafacz. Last updated 2 months ago.

bioconductor bioinformatics biological-sequences fasta s3 sequences tibble tidy tidyverse vctrs cpp

50.5 match 40 stars 7.56 score 38 scripts

ropensci

biomartr:Genomic Data Retrieval

Perform large scale genomic data retrieval and functional annotation retrieval. This package aims to provide users with a standardized way to automate genome, proteome, 'RNA', coding sequence ('CDS'), 'GFF', and metagenome retrieval from 'NCBI RefSeq', 'NCBI Genbank', 'ENSEMBL', and 'UniProt' databases. Furthermore, an interface to the 'BioMart' database (Smedley et al. (2009) <doi:10.1186/1471-2164-10-22>) allows users to retrieve functional annotation for genomic loci. In addition, users can download entire databases such as 'NCBI RefSeq' (Pruitt et al. (2007) <doi:10.1093/nar/gkl842>), 'NCBI nr', 'NCBI nt', 'NCBI Genbank' (Benson et al. (2013) <doi:10.1093/nar/gks1195>), etc. with only one command.

Maintained by Hajk-Georg Drost. Last updated 1 months ago.

biomart genomic-data-retrieval annotation-retrieval database-retrieval ncbi ensembl biological-data-retrieval ensembl-servers genome genome-annotation genome-retrieval genomics meta-analysis metagenomics ncbi-genbank peer-reviewed proteome sequenced-genomes

32.7 match 218 stars 11.35 score 129 scripts 3 dependents

bioboot

bio3d:Biological Structure Analysis

Utilities to process, organize and explore protein structure, sequence and dynamics data. Features include the ability to read and write structure, sequence and dynamic trajectory data, perform sequence and structure database searches, data summaries, atom selection, alignment, superposition, rigid core identification, clustering, torsion analysis, distance matrix analysis, structure and sequence conservation analysis, normal mode analysis, principal component analysis of heterogeneous structure data, and correlation network analysis from normal mode and molecular dynamics data. In addition, various utility functions are provided to enable the statistical and graphical power of the R environment to work with biological sequence and structural data. Please refer to the URLs below for more information.

Maintained by Barry Grant. Last updated 5 months ago.

zlib cpp

43.4 match 5 stars 8.49 score 1.4k scripts 10 dependents

benrenard

sequenceR:A Simple Sequencer for Data Sonification

A rudimentary sequencer to define, manipulate and mix sound samples. The underlying motivation is to sonify data, as demonstrated in the blog <https://globxblog.github.io/>, the presentation by Renard and Le Bescond (2022, <https://hal.science/hal-03710340v1>) or the poster by Renard et al. (2023, <https://hal.inrae.fr/hal-04388845v1>).

Maintained by Benjamin Renard. Last updated 2 months ago.

audio sonification

70.1 match 3 stars 5.13 score 15 scripts

bioc

Rcpi:Molecular Informatics Toolkit for Compound-Protein Interaction in Drug Discovery

A molecular informatics toolkit with an integration of bioinformatics and chemoinformatics tools for drug discovery.

Maintained by Nan Xiao. Last updated 5 months ago.

software dataimport datarepresentation featureextraction cheminformatics biomedicalinformatics proteomics go systemsbiology bioconductor bioinformatics drug-discovery feature-extraction fingerprint molecular-descriptors protein-sequences

43.4 match 37 stars 7.81 score 29 scripts

ropensci

phylotaR:Automated Phylogenetic Sequence Cluster Identification from 'GenBank'

A pipeline for the identification, within taxonomic groups, of orthologous sequence clusters from 'GenBank' <https://www.ncbi.nlm.nih.gov/genbank/> as the first step in a phylogenetic analysis. The pipeline depends on a local alignment search tool and is, therefore, not dependent on differences in gene naming conventions and naming errors.

Maintained by Shixiang Wang. Last updated 8 months ago.

blastn genbank peer-reviewed phylogenetics sequence-alignment

56.2 match 23 stars 5.86 score 156 scripts

bioc

motifStack:Plot stacked logos for single or multiple DNA, RNA and amino acid sequence

The motifStack package is designed for graphic representation of multiple motifs with different similarity scores. It works with both DNA/RNA sequence motif and amino acid sequence motif. In addition, it provides the flexibility for users to customize the graphic parameters such as the font type and symbol colors.

Maintained by Jianhong Ou. Last updated 2 months ago.

sequencematching visualization sequencing microarray alignment chipchip chipseq motifannotation dataimport

41.4 match 7.93 score 188 scripts 6 dependents

samuel-marsh

scCustomize:Custom Visualizations & Functions for Streamlined Analyses of Single Cell Sequencing

Collection of functions created and/or curated to aid in the visualization and analysis of single-cell data using 'R'. 'scCustomize' aims to provide 1) Customized visualizations for aid in ease of use and to create more aesthetic and functional visuals. 2) Improve speed/reproducibility of common tasks/pieces of code in scRNA-seq analysis with a single or group of functions. For citation please use: Marsh SE (2021) "Custom Visualizations & Functions for Streamlined Analyses of Single Cell Sequencing" <doi:10.5281/zenodo.5706430> RRID:SCR_024675.

Maintained by Samuel Marsh. Last updated 3 months ago.

customization ggplot2 scrna-seq seurat single-cell single-cell-genomics single-cell-rna-seq visualization

32.8 match 242 stars 8.75 score 1.1k scripts

klausvigo

phangorn:Phylogenetic Reconstruction and Analysis

Allows for estimation of phylogenetic trees and networks using Maximum Likelihood, Maximum Parsimony, distance methods and Hadamard conjugation (Schliep 2011). Offers methods for tree comparison, model selection and visualization of phylogenetic networks as described in Schliep et al. (2017).

Maintained by Klaus Schliep. Last updated 1 months ago.

software technology qualitycontrol phylogenetic-analysis phylogenetics openblas cpp

16.9 match 206 stars 16.69 score 2.5k scripts 135 dependents

bioc

ORFik:Open Reading Frames in Genomics

R package for analysis of transcript and translation features through manipulation of sequence data and NGS data like Ribo-Seq, RNA-Seq, TCP-Seq and CAGE. It is generalized in the sense that any transcript region can be analysed, as the name hints to it was made with investigation of ribosomal patterns over Open Reading Frames (ORFs) as it's primary use case. ORFik is extremely fast through use of C++, data.table and GenomicRanges. Package allows to reassign starts of the transcripts with the use of CAGE-Seq data, automatic shifting of RiboSeq reads, finding of Open Reading Frames for whole genomes and much more.

Maintained by Haakon Tjeldnes. Last updated 28 days ago.

immunooncology software sequencing riboseq rnaseq functionalgenomics coverage alignment dataimport cpp

25.9 match 33 stars 10.63 score 115 scripts 2 dependents

bioc

GenomicFeatures:Query the gene models of a given organism/assembly

Extract the genomic locations of genes, transcripts, exons, introns, and CDS, for the gene models stored in a TxDb object. A TxDb object is a small database that contains the gene models of a given organism/assembly. Bioconductor provides a small collection of TxDb objects in the form of ready-to-install TxDb packages for the most commonly studied organisms. Additionally, the user can easily make a TxDb object (or package) for the organism/assembly of their choice by using the tools from the txdbmaker package.

Maintained by H. Pagès. Last updated 4 months ago.

genetics infrastructure annotation sequencing genomeannotation bioconductor-package core-package

17.8 match 26 stars 15.34 score 5.3k scripts 339 dependents

t-kalinowski

keras:R Interface to 'Keras'

Interface to 'Keras' <https://keras.io>, a high-level neural networks 'API'. 'Keras' was developed with a focus on enabling fast experimentation, supports both convolution based networks and recurrent networks (as well as combinations of the two), and runs seamlessly on both 'CPU' and 'GPU' devices.

Maintained by Tomasz Kalinowski. Last updated 11 months ago.

25.0 match 10.82 score 10k scripts 54 dependents

bioc

metagenomeSeq:Statistical analysis for sparse high-throughput sequencing

metagenomeSeq is designed to determine features (be it Operational Taxanomic Unit (OTU), species, etc.) that are differentially abundant between two or more groups of multiple samples. metagenomeSeq is designed to address the effects of both normalization and under-sampling of microbial communities on disease association detection and the testing of feature correlations.

Maintained by Joseph N. Paulson. Last updated 3 months ago.

immunooncology classification clustering geneticvariability differentialexpression microbiome metagenomics normalization visualization multiplecomparison sequencing software

22.4 match 69 stars 12.02 score 494 scripts 7 dependents

rstudio

keras3:R Interface to 'Keras'

Interface to 'Keras' <https://keras.io>, a high-level neural networks API. 'Keras' was developed with a focus on enabling fast experimentation, supports both convolution based networks and recurrent networks (as well as combinations of the two), and runs seamlessly on both CPU and GPU devices.

Maintained by Tomasz Kalinowski. Last updated 4 days ago.

18.9 match 845 stars 13.57 score 264 scripts 2 dependents

bioc

GenomAutomorphism:Compute the automorphisms between DNA's Abelian group representations

This is a R package to compute the automorphisms between pairwise aligned DNA sequences represented as elements from a Genomic Abelian group. In a general scenario, from genomic regions till the whole genomes from a given population (from any species or close related species) can be algebraically represented as a direct sum of cyclic groups or more specifically Abelian p-groups. Basically, we propose the representation of multiple sequence alignments of length N bp as element of a finite Abelian group created by the direct sum of homocyclic Abelian group of prime-power order.

Maintained by Robersy Sanchez. Last updated 3 months ago.

mathematicalbiology comparativegenomics functionalgenomics multiplesequencealignment wholegenome genetic-code genetic-code-algebra genome genome-algebra

58.6 match 4.30 score 9 scripts

bioc

Rsamtools:Binary alignment (BAM), FASTA, variant call (BCF), and tabix file import

This package provides an interface to the 'samtools', 'bcftools', and 'tabix' utilities for manipulating SAM (Sequence Alignment / Map), FASTA, binary variant call (BCF) and compressed indexed tab-delimited (tabix) files.

Maintained by Bioconductor Package Maintainer. Last updated 4 months ago.

dataimport sequencing coverage alignment qualitycontrol bioconductor-package core-package curl bzip2 xz-utils zlib cpp

15.8 match 28 stars 15.42 score 3.2k scripts 566 dependents

bioc

RAIDS:Accurate Inference of Genetic Ancestry from Cancer Sequences

This package implements specialized algorithms that enable genetic ancestry inference from various cancer sequences sources (RNA, Exome and Whole-Genome sequences). This package also implements a simulation algorithm that generates synthetic cancer-derived data. This code and analysis pipeline was designed and developed for the following publication: Belleau, P et al. Genetic Ancestry Inference from Cancer-Derived Molecular Data across Genomic and Transcriptomic Platforms. Cancer Res 1 January 2023; 83 (1): 49–58.

Maintained by Pascal Belleau. Last updated 5 months ago.

genetics software sequencing wholegenome principalcomponent geneticvariability dimensionreduction biocviews ancestry cancer-genomics exome-sequencing genomics inference r-language rna-seq rna-sequencing whole-genome-sequencing

38.2 match 5 stars 6.23 score 19 scripts

bioc

Modstrings:Working with modified nucleotide sequences

Representing nucleotide modifications in a nucleotide sequence is usually done via special characters from a number of sources. This represents a challenge to work with in R and the Biostrings package. The Modstrings package implements this functionallity for RNA and DNA sequences containing modified nucleotides by translating the character internally in order to work with the infrastructure of the Biostrings package. For this the ModRNAString and ModDNAString classes and derivates and functions to construct and modify these objects despite the encoding issues are implemenented. In addition the conversion from sequences to list like location information (and the reverse operation) is implemented as well.

Maintained by Felix G.M. Ernst. Last updated 5 months ago.

dataimport datarepresentation infrastructure sequencing software bioconductor biostrings dna dna-modifications modified-nucleotides nucleotides rna rna-modification-alphabet rna-modifications sequences

35.5 match 1 stars 6.64 score 5 scripts 8 dependents

bioc

phyloseq:Handling and analysis of high-throughput microbiome census data

phyloseq provides a set of classes and tools to facilitate the import, storage, analysis, and graphical display of microbiome census data.

Maintained by Paul J. McMurdie. Last updated 5 months ago.

immunooncology sequencing microbiome metagenomics clustering classification multiplecomparison geneticvariability

16.7 match 597 stars 13.90 score 8.4k scripts 37 dependents

bioc

seqArchR:Identify Different Architectures of Sequence Elements

seqArchR enables unsupervised discovery of _de novo_ clusters with characteristic sequence architectures characterized by position-specific motifs or composition of stretches of nucleotides, e.g., CG-richness. seqArchR does _not_ require any specifications w.r.t. the number of clusters, the length of any individual motifs, or the distance between motifs if and when they occur in pairs/groups; it directly detects them from the data. seqArchR uses non-negative matrix factorization (NMF) as its backbone, and employs a chunking-based iterative procedure that enables processing of large sequence collections efficiently. Wrapper functions are provided for visualizing cluster architectures as sequence logos.

Maintained by Sarvesh Nikumbh. Last updated 5 months ago.

motifdiscovery generegulation mathematicalbiology systemsbiology transcriptomics genetics clustering dimensionreduction featureextraction dnaseq nmf nonnegative-matrix-factorization promoter-sequence-architectures scikit-learn sequence-analysis sequence-architectures unsupervised-machine-learning

51.2 match 1 stars 4.48 score 9 scripts 1 dependents

bioc

LymphoSeq:Analyze high-throughput sequencing of T and B cell receptors

This R package analyzes high-throughput sequencing of T and B cell receptor complementarity determining region 3 (CDR3) sequences generated by Adaptive Biotechnologies' ImmunoSEQ assay. Its input comes from tab-separated value (.tsv) files exported from the ImmunoSEQ analyzer.

Maintained by David Coffey. Last updated 5 months ago.

software technology sequencing targetedresequencing alignment multiplesequencealignment

56.6 match 4.00 score 4 scripts

bioc

kebabs:Kernel-Based Analysis of Biological Sequences

The package provides functionality for kernel-based analysis of DNA, RNA, and amino acid sequences via SVM-based methods. As core functionality, kebabs implements following sequence kernels: spectrum kernel, mismatch kernel, gappy pair kernel, and motif kernel. Apart from an efficient implementation of standard position-independent functionality, the kernels are extended in a novel way to take the position of patterns into account for the similarity measure. Because of the flexibility of the kernel formulation, other kernels like the weighted degree kernel or the shifted weighted degree kernel with constant weighting of positions are included as special cases. An annotation-specific variant of the kernels uses annotation information placed along the sequence together with the patterns in the sequence. The package allows for the generation of a kernel matrix or an explicit feature representation in dense or sparse format for all available kernels which can be used with methods implemented in other R packages. With focus on SVM-based methods, kebabs provides a framework which simplifies the usage of existing SVM implementations in kernlab, e1071, and LiblineaR. Binary and multi-class classification as well as regression tasks can be used in a unified way without having to deal with the different functions, parameters, and formats of the selected SVM. As support for choosing hyperparameters, the package provides cross validation - including grouped cross validation, grid search and model selection functions. For easier biological interpretation of the results, the package computes feature weights for all SVMs and prediction profiles which show the contribution of individual sequence positions to the prediction result and indicate the relevance of sequence sections for the learning result and the underlying biological functions.

Maintained by Ulrich Bodenhofer. Last updated 5 months ago.

supportvectormachine classification clustering regression cpp

34.0 match 6.58 score 47 scripts 3 dependents

snoweye

phyclust:Phylogenetic Clustering (Phyloclustering)

Phylogenetic clustering (phyloclustering) is an evolutionary Continuous Time Markov Chain model-based approach to identify population structure from molecular data without assuming linkage equilibrium. The package phyclust (Chen 2011) provides a convenient implementation of phyloclustering for DNA and SNP data, capable of clustering individuals into subpopulations and identifying molecular sequences representative of those subpopulations. It is designed in C for performance, interfaced with R for visualization, and incorporates other popular open source programs including ms (Hudson 2002) <doi:10.1093/bioinformatics/18.2.337>, seq-gen (Rambaut and Grassly 1997) <doi:10.1093/bioinformatics/13.3.235>, Hap-Clustering (Tzeng 2005) <doi:10.1002/gepi.20063> and PAML baseml (Yang 1997, 2007) <doi:10.1093/bioinformatics/13.5.555>, <doi:10.1093/molbev/msm088>, for simulating data, additional analyses, and searching the best tree. See the phyclust website for more information, documentations and examples.

Maintained by Wei-Chen Chen. Last updated 2 years ago.

26.4 match 9 stars 8.45 score 126 scripts 8 dependents

bioc

CellBarcode:Cellular DNA Barcode Analysis toolkit

The package CellBarcode performs Cellular DNA Barcode analysis. It can handle all kinds of DNA barcodes, as long as the barcode is within a single sequencing read and has a pattern that can be matched by a regular expression. \code{CellBarcode} can handle barcodes with flexible lengths, with or without UMI (unique molecular identifier). This tool also can be used for pre-processing some amplicon data such as CRISPR gRNA screening, immune repertoire sequencing, and metagenome data.

Maintained by Wenjie Sun. Last updated 12 days ago.

preprocessing qualitycontrol sequencing crispr amplicon amplicon-sequencing cellular-barcode rust cargo

37.9 match 1 stars 5.86 score 40 scripts

lvclark

polyRAD:Genotype Calling with Uncertainty from Sequencing Data in Polyploids and Diploids

Read depth data from genotyping-by-sequencing (GBS) or restriction site-associated DNA sequencing (RAD-seq) are imported and used to make Bayesian probability estimates of genotypes in polyploids or diploids. The genotype probabilities, posterior mean genotypes, or most probable genotypes can then be exported for downstream analysis. 'polyRAD' is described by Clark et al. (2019) <doi:10.1534/g3.118.200913>, and the Hind/He statistic for marker filtering is described by Clark et al. (2022) <doi:10.1186/s12859-022-04635-9>. A variant calling pipeline for highly duplicated genomes is also included and is described by Clark et al. (2020, Version 1) <doi:10.1101/2020.01.11.902890>.

Maintained by Lindsay V. Clark. Last updated 8 days ago.

bioinformatics dna-sequencing genotype-likelihoods genotyping-by-sequencing hacktoberfest rad-seq rad-sequencing snp-genotyping cpp

31.5 match 28 stars 6.98 score 85 scripts

shaunpwilkinson

insect:Informatic Sequence Classification Trees

Provides tools for probabilistic taxon assignment with informatic sequence classification trees. See Wilkinson et al (2018) <doi:10.7287/peerj.preprints.26812v1>.

Maintained by Shaun Wilkinson. Last updated 4 years ago.

37.7 match 14 stars 5.80 score 91 scripts

michbur

biogram:N-Gram Analysis of Biological Sequences

Tools for extraction and analysis of various n-grams (k-mers) derived from biological sequences (proteins or nucleic acids). Contains QuiPT (quick permutation test) for fast feature-filtering of the n-gram data.

Maintained by Michal Burdukiewicz. Last updated 7 months ago.

biological-sequences ngram-analysis

28.0 match 10 stars 7.50 score 87 scripts 3 dependents

martin3141

spant:MR Spectroscopy Analysis Tools

Tools for reading, visualising and processing Magnetic Resonance Spectroscopy data. The package includes methods for spectral fitting: Wilson (2021) <DOI:10.1002/mrm.28385> and spectral alignment: Wilson (2018) <DOI:10.1002/mrm.27605>.

Maintained by Martin Wilson. Last updated 30 days ago.

brain mri mrs mrshub spectroscopy fortran

24.4 match 25 stars 8.52 score 81 scripts

kbhoehn

dowser:B Cell Receptor Phylogenetics Toolkit

Provides a set of functions for inferring, visualizing, and analyzing B cell phylogenetic trees. Provides methods to 1) reconstruct unmutated ancestral sequences, 2) build B cell phylogenetic trees using multiple methods, 3) visualize trees with metadata at the tips, 4) reconstruct intermediate sequences, 5) detect biased ancestor-descendant relationships among metadata types Workflow examples available at documentation site (see URL). Citations: Hoehn et al (2022) <doi:10.1371/journal.pcbi.1009885>, Hoehn et al (2021) <doi:10.1101/2021.01.06.425648>.

Maintained by Kenneth Hoehn. Last updated 2 months ago.

30.6 match 6.81 score 84 scripts

bioc

TitanCNA:Subclonal copy number and LOH prediction from whole genome sequencing of tumours

Hidden Markov model to segment and predict regions of subclonal copy number alterations (CNA) and loss of heterozygosity (LOH), and estimate cellular prevalence of clonal clusters in tumour whole genome sequencing data.

Maintained by Gavin Ha. Last updated 5 months ago.

sequencing wholegenome dnaseq exomeseq statisticalmethod copynumbervariation hiddenmarkovmodel genetics genomicvariation immunooncology 10x-genomics copy-number-variation genome-sequencing hmm tumor-heterogeneity

24.1 match 96 stars 8.47 score 68 scripts

bioc

decontam:Identify Contaminants in Marker-gene and Metagenomics Sequencing Data

Simple statistical identification of contaminating sequence features in marker-gene or metagenomics data. Works on any kind of feature derived from environmental sequencing data (e.g. ASVs, OTUs, taxonomic groups, MAGs,...). Requires DNA quantitation data or sequenced negative control samples.

Maintained by Benjamin Callahan. Last updated 5 months ago.

immunooncology microbiome sequencing classification metagenomics amplicon bioinformatics contamination metabarcoding

17.7 match 153 stars 11.36 score 524 scripts 5 dependents

bioc

ShortRead:FASTQ input and manipulation

This package implements sampling, iteration, and input of FASTQ files. The package includes functions for filtering and trimming reads, and for generating a quality assessment report. Data are represented as DNAStringSet-derived objects, and easily manipulated for a diversity of purposes. The package also contains legacy support for early single-end, ungapped alignment formats.

Maintained by Bioconductor Package Maintainer. Last updated 5 months ago.

dataimport sequencing qualitycontrol bioconductor-package core-package zlib cpp

16.5 match 8 stars 12.08 score 1.8k scripts 49 dependents

bioc

immApex:Tools for Adaptive Immune Receptor Sequence-Based Machine and Deep Learning

A set of tools to build tensorflow/keras3-based models in R from amino acid and nucleotide sequences focusing on adaptive immune receptors. The package includes pre-processing of sequences, unifying gene nomenclature usage, encoding sequences, and combining models. This package will serve as the basis of future immune receptor sequence functions/packages/models compatible with the scRepertoire ecosystem.

Maintained by Nick Borcherding. Last updated 19 days ago.

software immunooncology singlecell classification annotation sequencing motifannotation

32.7 match 8 stars 5.92 score 3 scripts

bioc

crisprBase:Base functions and classes for CRISPR gRNA design

Provides S4 classes for general nucleases, CRISPR nucleases, CRISPR nickases, and base editors.Several CRISPR-specific genome arithmetic functions are implemented to help extract genomic coordinates of spacer and protospacer sequences. Commonly-used CRISPR nuclease objects are provided that can be readily used in other packages. Both DNA- and RNA-targeting nucleases are supported.

Maintained by Jean-Philippe Fortin. Last updated 5 months ago.

crispr functionalgenomics bioconductor bioconductor-package crispr-cas9 crispr-design crispr-target grna grna-sequence grna-sequences

26.7 match 5 stars 7.15 score 52 scripts 6 dependents

bioc

Rsubread:Mapping, quantification and variant analysis of sequencing data

Alignment, quantification and analysis of RNA sequencing data (including both bulk RNA-seq and scRNA-seq) and DNA sequenicng data (including ATAC-seq, ChIP-seq, WGS, WES etc). Includes functionality for read mapping, read counting, SNP calling, structural variant detection and gene fusion discovery. Can be applied to all major sequencing techologies and to both short and long sequence reads.

Maintained by Wei Shi. Last updated 2 days ago.

sequencing alignment sequencematching rnaseq chipseq singlecell geneexpression generegulation genetics immunooncology snp geneticvariability preprocessing qualitycontrol genomeannotation genefusiondetection indeldetection variantannotation variantdetection multiplesequencealignment zlib

20.7 match 9.24 score 892 scripts 10 dependents

gobbios

EloRating:Animal Dominance Hierarchies by Elo Rating

Provides functions to quantify animal dominance hierarchies. The major focus is on Elo rating and its ability to deal with temporal dynamics in dominance interaction sequences. For static data, David's score and de Vries' I&SI are also implemented. In addition, the package provides functions to assess transitivity, linearity and stability of dominance networks. See Neumann et al (2011) <doi:10.1016/j.anbehav.2011.07.016> for an introduction.

Maintained by Christof Neumann. Last updated 8 months ago.

cpp

27.7 match 4 stars 6.86 score 61 scripts 1 dependents

bioc

GenomicRanges:Representation and manipulation of genomic intervals

The ability to efficiently represent and manipulate genomic annotations and alignments is playing a central role when it comes to analyzing high-throughput sequencing data (a.k.a. NGS data). The GenomicRanges package defines general purpose containers for storing and manipulating genomic intervals and variables defined along a genome. More specialized containers for representing and manipulating short alignments against a reference genome, or a matrix-like summarization of an experiment, are defined in the GenomicAlignments and SummarizedExperiment packages, respectively. Both packages build on top of the GenomicRanges infrastructure.

Maintained by Hervé Pagès. Last updated 4 months ago.

genetics infrastructure datarepresentation sequencing annotation genomeannotation coverage bioconductor-package core-package

10.5 match 44 stars 17.75 score 13k scripts 1.3k dependents

spedygiorgio

markovchain:Easy Handling Discrete Time Markov Chains

Functions and S4 methods to create and manage discrete time Markov chains more easily. In addition functions to perform statistical (fitting and drawing random variates) and probabilistic (analysis of their structural proprieties) analysis are provided. See Spedicato (2017) <doi:10.32614/RJ-2017-036>. Some functions for continuous times Markov chains depend on the suggested ctmcd package.

Maintained by Giorgio Alfredo Spedicato. Last updated 4 months ago.

ctmc dtmc markov-chain markov-model r-programming rcpp openblas cpp

14.5 match 104 stars 12.78 score 712 scripts 4 dependents

ropensci

restez:Create and Query a Local Copy of 'GenBank' in R

Download large sections of 'GenBank' <https://www.ncbi.nlm.nih.gov/genbank/> and generate a local SQL-based database. A user can then query this database using 'restez' functions or through 'rentrez' <https://CRAN.R-project.org/package=rentrez> wrappers.

Maintained by Joel H. Nitta. Last updated 10 days ago.

dna entrez genbank sequence

26.4 match 26 stars 7.01 score 175 scripts 1 dependents

omarwagih

ggseqlogo:A 'ggplot2' Extension for Drawing Publication-Ready Sequence Logos

The extensive range of functions provided by this package makes it possible to draw highly versatile sequence logos. Features include, but not limited to, modifying colour schemes and fonts used to draw the logo, generating multiple logo plots, and aiding the visualisation with annotations. Sequence logos can easily be combined with other plots 'ggplot2' plots.

Maintained by Omar Wagih. Last updated 5 months ago.

16.1 match 211 stars 11.48 score 786 scripts 13 dependents

bioc

ensembldb:Utilities to create and use Ensembl-based annotation databases

The package provides functions to create and use transcript centric annotation databases/packages. The annotation for the databases are directly fetched from Ensembl using their Perl API. The functionality and data is similar to that of the TxDb packages from the GenomicFeatures package, but, in addition to retrieve all gene/transcript models and annotations from the database, ensembldb provides a filter framework allowing to retrieve annotations for specific entries like genes encoded on a chromosome region or transcript models of lincRNA genes. EnsDb databases built with ensembldb contain also protein annotations and mappings between proteins and their encoding transcripts. Finally, ensembldb provides functions to map between genomic, transcript and protein coordinates.

Maintained by Johannes Rainer. Last updated 5 months ago.

genetics annotationdata sequencing coverage annotation bioconductor bioconductor-packages ensembl

13.0 match 35 stars 14.08 score 892 scripts 108 dependents

bioc

GenomicAlignments:Representation and manipulation of short genomic alignments

Provides efficient containers for storing and manipulating short genomic alignments (typically obtained by aligning short reads to a reference genome). This includes read counting, computing the coverage, junction detection, and working with the nucleotide content of the alignments.

Maintained by Hervé Pagès. Last updated 5 months ago.

infrastructure dataimport genetics sequencing rnaseq snp coverage alignment immunooncology bioconductor-package core-package

13.4 match 10 stars 13.61 score 3.1k scripts 529 dependents

bioc

SeqArray:Data Management of Large-Scale Whole-Genome Sequence Variant Calls

Data management of large-scale whole-genome sequencing variant calls with thousands of individuals: genotypic data (e.g., SNVs, indels and structural variation calls) and annotations in SeqArray GDS files are stored in an array-oriented and compressed manner, with efficient data access using the R programming language.

Maintained by Xiuwen Zheng. Last updated 10 days ago.

infrastructure datarepresentation sequencing genetics bioinformatics gds-format snp snv wes wgs cpp

15.0 match 45 stars 12.08 score 1.1k scripts 9 dependents

bioc

ALDEx2:Analysis Of Differential Abundance Taking Sample and Scale Variation Into Account

A differential abundance analysis for the comparison of two or more conditions. Useful for analyzing data from standard RNA-seq or meta-RNA-seq assays as well as selected and unselected values from in-vitro sequence selections. Uses a Dirichlet-multinomial model to infer abundance from counts, optimized for three or more experimental replicates. The method infers biological and sampling variation to calculate the expected false discovery rate, given the variation, based on a Wilcoxon Rank Sum test and Welch's t-test (via aldex.ttest), a Kruskal-Wallis test (via aldex.kw), a generalized linear model (via aldex.glm), or a correlation test (via aldex.corr). All tests report predicted p-values and posterior Benjamini-Hochberg corrected p-values. ALDEx2 also calculates expected standardized effect sizes for paired or unpaired study designs. ALDEx2 can now be used to estimate the effect of scale on the results and report on the scale-dependent robustness of results.

Maintained by Greg Gloor. Last updated 5 months ago.

differentialexpression rnaseq transcriptomics geneexpression dnaseq chipseq bayesian sequencing software microbiome metagenomics immunooncology scale simulation posterior p-value

16.7 match 28 stars 10.70 score 424 scripts 3 dependents

zhanxw

seqminer:Efficiently Read Sequence Data (VCF Format, BCF Format, METAL Format and BGEN Format) into R

Integrate sequencing data (Variant call format, e.g. VCF or BCF) or meta-analysis results in R. This package can help you (1) read VCF/BCF/BGEN files by chromosomal ranges (e.g. 1:100-200); (2) read RareMETAL summary statistics files; (3) read tables from a tabix-indexed files; (4) annotate VCF/BCF files; (5) create customized workflow based on Makefile.

Maintained by Xiaowei Zhan. Last updated 6 months ago.

annotation bcf bgen meta-analysis next-generation-sequencing plink sequencing tabix vcf workflow zlib bzip2 libzstd sqlite3 cpp

21.5 match 30 stars 8.29 score 111 scripts 6 dependents

bioc

methylKit:DNA methylation analysis from high-throughput bisulfite sequencing results

methylKit is an R package for DNA methylation analysis and annotation from high-throughput bisulfite sequencing. The package is designed to deal with sequencing data from RRBS and its variants, but also target-capture methods and whole genome bisulfite sequencing. It also has functions to analyze base-pair resolution 5hmC data from experimental protocols such as oxBS-Seq and TAB-Seq. Methylation calling can be performed directly from Bismark aligned BAM files.

Maintained by Altuna Akalin. Last updated 16 days ago.

dnamethylation sequencing methylseq genome-biology methylation statistical-analysis visualization curl bzip2 xz-utils zlib cpp

15.0 match 220 stars 11.80 score 578 scripts 3 dependents

bioc

scRepertoire:A toolkit for single-cell immune receptor profiling

scRepertoire is a toolkit for processing and analyzing single-cell T-cell receptor (TCR) and immunoglobulin (Ig). The scRepertoire framework supports use of 10x, AIRR, BD, MiXCR, Omniscope, TRUST4, and WAT3R single-cell formats. The functionality includes basic clonal analyses, repertoire summaries, distance-based clustering and interaction with the popular Seurat and SingleCellExperiment/Bioconductor R workflows.

Maintained by Nick Borcherding. Last updated 2 months ago.

software immunooncology singlecell classification annotation sequencing cpp

16.8 match 326 stars 10.49 score 240 scripts

ohdsi

CohortSymmetry:Sequence Symmetry Analysis Using the Observational Medical Outcomes Partnership Common Data Model

Calculating crude sequence ratio, adjusted sequence ratio and confidence intervals using data mapped to the Observational Medical Outcomes Partnership Common Data Model.

Maintained by Xihang Chen. Last updated 15 days ago.

23.3 match 1 stars 7.53 score 73 scripts

thackl

gggenomes:A Grammar of Graphics for Comparative Genomics

An extension of 'ggplot2' for creating complex genomic maps. It builds on the power of 'ggplot2' and 'tidyverse' adding new 'ggplot2'-style geoms & positions and 'dplyr'-style verbs to manipulate the underlying data. It implements a layout concept inspired by 'ggraph' and introduces tracks to bring tidiness to the mess that is genomics data.

Maintained by Thomas Hackl. Last updated 2 months ago.

biological-data comparative-genomics genomics-visualization ggplot-extension ggplot2

18.3 match 650 stars 9.56 score 123 scripts

bioc

hiReadsProcessor:Functions to process LM-PCR reads from 454/Illumina data

hiReadsProcessor contains set of functions which allow users to process LM-PCR products sequenced using any platform. Given an excel/txt file containing parameters for demultiplexing and sample metadata, the functions automate trimming of adaptors and identification of the genomic product. Genomic products are further processed for QC and abundance quantification.

Maintained by Nirav V Malani. Last updated 5 months ago.

sequencing preprocessing

41.6 match 4.18 score 7 scripts

mhahsler

rMSA:Interface for Popular Multiple Sequence Alignment Tools

Seamlessly interfaces the Multiple Sequence Alignment software packages ClustalW, MAFFT, MUSCLE and Kalign (downloaded separately) and provides support to calcualte distances between sequences. This work was partially supported by grant no. R21HG005912 from the National Human Genome Research Institute.

Maintained by Michael Hahsler. Last updated 10 months ago.

genetics sequencing infrastructure alignment bioinformatics sequence-alignment

45.8 match 12 stars 3.78 score 7 scripts

bioc

pwalign:Perform pairwise sequence alignments

The two main functions in the package are pairwiseAlignment() and stringDist(). The former solves (Needleman-Wunsch) global alignment, (Smith-Waterman) local alignment, and (ends-free) overlap alignment problems. The latter computes the Levenshtein edit distance or pairwise alignment score matrix for a set of strings.

Maintained by Hervé Pagès. Last updated 2 months ago.

alignment sequencematching sequencing genetics bioconductor-package

20.3 match 1 stars 8.42 score 27 scripts 105 dependents

bioc

DESeq2:Differential gene expression analysis based on the negative binomial distribution

Estimate variance-mean dependence in count data from high-throughput sequencing assays and test for differential expression based on a model using the negative binomial distribution.

Maintained by Michael Love. Last updated 11 days ago.

sequencing rnaseq chipseq geneexpression transcription normalization differentialexpression bayesian regression principalcomponent clustering immunooncology openblas cpp

10.5 match 375 stars 16.11 score 17k scripts 115 dependents

bioc

ComplexHeatmap:Make Complex Heatmaps

Complex heatmaps are efficient to visualize associations between different sources of data sets and reveal potential patterns. Here the ComplexHeatmap package provides a highly flexible way to arrange multiple heatmaps and supports various annotation graphics.

Maintained by Zuguang Gu. Last updated 5 months ago.

software visualization sequencing clustering complex-heatmaps heatmap

10.0 match 1.3k stars 16.93 score 16k scripts 151 dependents

bioc

bambu:Context-Aware Transcript Quantification from Long Read RNA-Seq data

bambu is a R package for multi-sample transcript discovery and quantification using long read RNA-Seq data. You can use bambu after read alignment to obtain expression estimates for known and novel transcripts and genes. The output from bambu can directly be used for visualisation and downstream analysis such as differential gene expression or transcript usage.

Maintained by Ying Chen. Last updated 1 months ago.

alignment coverage differentialexpression featureextraction geneexpression genomeannotation genomeassembly immunooncology longread multiplecomparison normalization rnaseq regression sequencing software transcription transcriptomics bambu bioconductor long-reads nanopore nanopore-sequencing rna-seq rna-seq-analysis transcript-quantification transcript-reconstruction cpp

18.7 match 197 stars 9.03 score 91 scripts 1 dependents

bioc

SummarizedExperiment:A container (S4 class) for matrix-like assays

The SummarizedExperiment container contains one or more assays, each represented by a matrix-like object of numeric or other mode. The rows typically represent genomic ranges of interest and the columns represent samples.

Maintained by Hervé Pagès. Last updated 5 months ago.

genetics infrastructure sequencing annotation coverage genomeannotation bioconductor-package core-package

10.0 match 34 stars 16.85 score 8.6k scripts 1.2k dependents

bioc

DSS:Dispersion shrinkage for sequencing data

DSS is an R library performing differntial analysis for count-based sequencing data. It detectes differentially expressed genes (DEGs) from RNA-seq, and differentially methylated loci or regions (DML/DMRs) from bisulfite sequencing (BS-seq). The core of DSS is a new dispersion shrinkage method for estimating the dispersion parameter from Gamma-Poisson or Beta-Binomial distributions.

Maintained by Hao Wu. Last updated 5 months ago.

sequencing rnaseq dnamethylation geneexpression differentialexpression differentialmethylation

24.0 match 7.02 score 248 scripts 5 dependents

ambuvjyn

baseq:Basic Sequence Processing Tool for Biological Data

Primarily created as an easy and understanding way to do basic sequences surrounding the central dogma of molecular biology.

Maintained by Ambu Vijayan. Last updated 2 years ago.

bioinformatics sequencing

41.4 match 2 stars 4.00 score

aalfons

robustHD:Robust Methods for High-Dimensional Data

Robust methods for high-dimensional data, in particular linear model selection techniques based on least angle regression and sparse regression. Specifically, the package implements robust least angle regression (Khan, Van Aelst & Zamar, 2007; <doi:10.1198/016214507000000950>), (robust) groupwise least angle regression (Alfons, Croux & Gelper, 2016; <doi:10.1016/j.csda.2015.02.007>), and sparse least trimmed squares regression (Alfons, Croux & Gelper, 2013; <doi:10.1214/12-AOAS575>).

Maintained by Andreas Alfons. Last updated 9 months ago.

openblas cpp openmp

23.2 match 10 stars 7.06 score 174 scripts 8 dependents

bioc

edgeR:Empirical Analysis of Digital Gene Expression Data in R

Differential expression analysis of sequence count data. Implements a range of statistical methodology based on the negative binomial distributions, including empirical Bayes estimation, exact tests, generalized linear models, quasi-likelihood, and gene set enrichment. Can perform differential analyses of any type of omics data that produces read counts, including RNA-seq, ChIP-seq, ATAC-seq, Bisulfite-seq, SAGE, CAGE, metabolomics, or proteomics spectral counts. RNA-seq analyses can be conducted at the gene or isoform level, and tests can be conducted for differential exon or transcript usage.

Maintained by Yunshun Chen. Last updated 6 days ago.

alternativesplicing batcheffect bayesian biomedicalinformatics cellbiology chipseq clustering coverage differentialexpression differentialmethylation differentialsplicing dnamethylation epigenetics functionalgenomics geneexpression genesetenrichment genetics immunooncology multiplecomparison normalization pathways proteomics qualitycontrol regression rnaseq sage sequencing singlecell systemsbiology timecourse transcription transcriptomics openblas

12.2 match 13.40 score 17k scripts 255 dependents

cnuge

debar:A Post-Clustering Denoiser for COI-5P Barcode Data

The 'debar' sequence processing pipeline is designed for denoising high throughput sequencing data for the animal DNA barcode marker cytochrome c oxidase I (COI). The package is designed to detect and correct insertion and deletion errors within sequencer outputs. This is accomplished through comparison of input sequences against a profile hidden Markov model (PHMM) using the Viterbi algorithm (for algorithm details see Durbin et al. 1998, ISBN: 9780521629713). Inserted base pairs are removed and deleted base pairs are accounted for through the introduction of a placeholder character. Since the PHMM is a probabilistic representation of the COI barcode, corrections are not always perfect. For this reason 'debar' censors base pairs adjacent to reported indel sites, turning them into placeholder characters (default is 7 base pairs in either direction, this feature can be disabled). Testing has shown that this censorship results in the correct sequence length being restored, and erroneous base pairs being masked the vast majority of the time (>95%).

Maintained by Cameron M. Nugent. Last updated 1 years ago.

bioinformatics denoising dna-barcoding dna-sequencing hidden-markov-model machine-learning

40.9 match 1 stars 4.00 score 8 scripts

bioc

slingshot:Tools for ordering single-cell sequencing

Provides functions for inferring continuous, branching lineage structures in low-dimensional data. Slingshot was designed to model developmental trajectories in single-cell RNA sequencing data and serve as a component in an analysis pipeline after dimensionality reduction and clustering. It is flexible enough to handle arbitrarily many branching events and allows for the incorporation of prior knowledge through supervised graph construction.

Maintained by Kelly Street. Last updated 5 months ago.

clustering differentialexpression geneexpression rnaseq sequencing software singlecell transcriptomics visualization

13.5 match 283 stars 12.01 score 1.0k scripts 4 dependents

cristianetaniguti

onemap:Construction of Genetic Maps in Experimental Crosses

Analysis of molecular marker data from model (backcrosses, F2 and recombinant inbred lines) and non-model systems (i. e. outcrossing species). For the later, it allows statistical analysis by simultaneously estimating linkage and linkage phases (genetic map construction) according to Wu et al. (2002) <doi:10.1006/tpbi.2002.1577>. All analysis are based on multipoint approaches using hidden Markov models.

Maintained by Cristiane Taniguti. Last updated 2 months ago.

cpp

24.6 match 3 stars 6.58 score 183 scripts

bioc

TCGAbiolinks:TCGAbiolinks: An R/Bioconductor package for integrative analysis with GDC data

The aim of TCGAbiolinks is : i) facilitate the GDC open-access data retrieval, ii) prepare the data using the appropriate pre-processing strategies, iii) provide the means to carry out different standard analyses and iv) to easily reproduce earlier research results. In more detail, the package provides multiple methods for analysis (e.g., differential expression analysis, identifying differentially methylated regions) and methods for visualization (e.g., survival plots, volcano plots, starburst plots) in order to easily develop complete analysis pipelines.

Maintained by Tiago Chedraoui Silva. Last updated 27 days ago.

dnamethylation differentialmethylation generegulation geneexpression methylationarray differentialexpression pathways network sequencing survival software bioc bioconductor gdc integrative-analysis tcga tcga-data tcgabiolinks

11.1 match 305 stars 14.45 score 1.6k scripts 6 dependents

bioc

QDNAseq:Quantitative DNA Sequencing for Chromosomal Aberrations

Quantitative DNA sequencing for chromosomal aberrations. The genome is divided into non-overlapping fixed-sized bins, number of sequence reads in each counted, adjusted with a simultaneous two-dimensional loess correction for sequence mappability and GC content, and filtered to remove spurious regions in the genome. Downstream steps of segmentation and calling are also implemented via packages DNAcopy and CGHcall, respectively.

Maintained by Daoud Sie. Last updated 5 months ago.

copynumbervariation dnaseq genetics genomeannotation preprocessing qualitycontrol sequencing

15.7 match 49 stars 10.10 score 177 scripts 4 dependents

lhdjung

scrutiny:Error Detection in Science

Test published summary statistics for consistency (Brown and Heathers, 2017, <doi:10.1177/1948550616673876>; Allard, 2018, <https://aurelienallard.netlify.app/post/anaytic-grimmer-possibility-standard-deviations/>; Heathers and Brown, 2019, <https://osf.io/5vb3u/>). The package also provides infrastructure for implementing new error detection techniques.

Maintained by Lukas Jung. Last updated 6 months ago.

24.3 match 8 stars 6.52 score 38 scripts

bioc

tRNA:Analyzing tRNA sequences and structures

The tRNA package allows tRNA sequences and structures to be accessed and used for subsetting. In addition, it provides visualization tools to compare feature parameters of multiple tRNA sets and correlate them to additional data. The tRNA package uses GRanges objects as inputs requiring only few additional column data sets.

Maintained by Felix GM Ernst. Last updated 5 months ago.

software visualization bioconductor sequences structures trna

23.2 match 1 stars 6.81 score 3 scripts 3 dependents

leelabsg

SKAT:SNP-Set (Sequence) Kernel Association Test

Functions for kernel-regression-based association tests including Burden test, SKAT and SKAT-O. These methods aggregate individual SNP score statistics in a SNP set and efficiently compute SNP-set level p-values.

Maintained by Seunggeun (Shawn) Lee. Last updated 1 months ago.

sequence cpp

16.3 match 45 stars 9.70 score 268 scripts 16 dependents

bioc

CrispRVariants:Tools for counting and visualising mutations in a target location

CrispRVariants provides tools for analysing the results of a CRISPR-Cas9 mutagenesis sequencing experiment, or other sequencing experiments where variants within a given region are of interest. These tools allow users to localize variant allele combinations with respect to any genomic location (e.g. the Cas9 cut site), plot allele combinations and calculate mutation rates with flexible filtering of unrelated variants.

Maintained by Helen Lindsay. Last updated 5 months ago.

immunooncology crispr genomicvariation variantdetection geneticvariability datarepresentation visualization sequencing

28.6 match 5.51 score 32 scripts

tidyverse

magrittr:A Forward-Pipe Operator for R

Provides a mechanism for chaining commands with a new forward-pipe operator, %>%. This operator will forward a value, or the result of an expression, into the next function call/expression. There is flexible support for the type of right-hand side expressions. For more information, see package vignette. To quote Rene Magritte, "Ceci n'est pas un pipe."

Maintained by Lionel Henry. Last updated 2 years ago.

pipe

7.3 match 961 stars 21.06 score 82k scripts 14k dependents

bioc

scifer:Scifer: Single-Cell Immunoglobulin Filtering of Sanger Sequences

Have you ever index sorted cells in a 96 or 384-well plate and then sequenced using Sanger sequencing? If so, you probably had some struggles to either check the electropherogram of each cell sequenced manually, or when you tried to identify which cell was sorted where after sequencing the plate. Scifer was developed to solve this issue by performing basic quality control of Sanger sequences and merging flow cytometry data from probed single-cell sorted B cells with sequencing data. scifer can export summary tables, 'fasta' files, electropherograms for visual inspection, and generate reports.

Maintained by Rodrigo Arcoverde Cerveira. Last updated 3 months ago.

preprocessing qualitycontrol sangerseq sequencing software flowcytometry singlecell

27.9 match 5 stars 5.54 score 9 scripts

bioc

ngsReports:Load FastqQC reports and other NGS related files

This package provides methods and object classes for parsing FastQC reports and output summaries from other NGS tools into R. As well as parsing files, multiple plotting methods have been implemented for visualising the parsed data. Plots can be generated as static ggplot objects or interactive plotly objects.

Maintained by Stevie Pederson. Last updated 5 months ago.

qualitycontrol reportwriting

19.6 match 22 stars 7.89 score 99 scripts

maraab23

ggseqplot:Render Sequence Plots using 'ggplot2'

A set of wrapper functions that mainly re-produces most of the sequence plots rendered with TraMineR::seqplot(). Whereas 'TraMineR' uses base R to produce the plots this library draws on 'ggplot2'. The plots are produced on the basis of a sequence object defined with TraMineR::seqdef(). The package automates the reshaping and plotting of sequence data. Resulting plots are of class 'ggplot', i.e. components can be added and tweaked using '+' and regular 'ggplot2' functions.

Maintained by Marcel Raab. Last updated 4 months ago.

ggplot2 sequence-analysis traminer visualization

27.1 match 14 stars 5.70 score 18 scripts

bioc

maftools:Summarize, Analyze and Visualize MAF Files

Analyze and visualize Mutation Annotation Format (MAF) files from large scale sequencing studies. This package provides various functions to perform most commonly used analyses in cancer genomics and to create feature rich customizable visualzations with minimal effort.

Maintained by Anand Mayakonda. Last updated 5 months ago.

datarepresentation dnaseq visualization drivermutation variantannotation featureextraction classification somaticmutation sequencing functionalgenomics survival bioinformatics cancer-genome-atlas cancer-genomics genomics maf-files tcga curl bzip2 xz-utils zlib

10.5 match 459 stars 14.63 score 948 scripts 18 dependents

business-science

timetk:A Tool Kit for Working with Time Series

Easy visualization, wrangling, and feature engineering of time series data for forecasting and machine learning prediction. Consolidates and extends time series functionality from packages including 'dplyr', 'stats', 'xts', 'forecast', 'slider', 'padr', 'recipes', and 'rsample'.

Maintained by Matt Dancho. Last updated 1 years ago.

coercion coercion-functions data-mining dplyr forecast forecasting forecasting-models machine-learning series-decomposition series-signature tibble tidy tidyquant tidyverse time time-series timeseries

10.8 match 625 stars 14.15 score 4.0k scripts 16 dependents

bioc

Repitools:Epigenomic tools

Tools for the analysis of enrichment-based epigenomic data. Features include summarization and visualization of epigenomic data across promoters according to gene expression context, finding regions of differential methylation/binding, BayMeth for quantifying methylation etc.

Maintained by Mark Robinson. Last updated 5 months ago.

dnamethylation geneexpression methylseq

25.9 match 5.90 score 267 scripts

bioc

cardelino:Clone Identification from Single Cell Data

Methods to infer clonal tree configuration for a population of cells using single-cell RNA-seq data (scRNA-seq), and possibly other data modalities. Methods are also provided to assign cells to inferred clones and explore differences in gene expression between clones. These methods can flexibly integrate information from imperfect clonal trees inferred based on bulk exome-seq data, and sparse variant alleles expressed in scRNA-seq data. A flexible beta-binomial error model that accounts for stochastic dropout events as well as systematic allelic imbalance is used.

Maintained by Davis McCarthy. Last updated 5 months ago.

singlecell rnaseq visualization transcriptomics geneexpression sequencing software exomeseq clonal-clustering gibbs-sampling scrna-seq single-cell somatic-mutations

21.6 match 61 stars 7.05 score 62 scripts

bioc

seqLogo:Sequence logos for DNA sequence alignments

seqLogo takes the position weight matrix of a DNA sequence motif and plots the corresponding sequence logo as introduced by Schneider and Stephens (1990).

Maintained by Robert Ivanek. Last updated 5 months ago.

sequencematching

14.4 match 4 stars 10.57 score 304 scripts 29 dependents

bioc

PWMEnrich:PWM enrichment analysis

A toolkit of high-level functions for DNA motif scanning and enrichment analysis built upon Biostrings. The main functionality is PWM enrichment analysis of already known PWMs (e.g. from databases such as MotifDb), but the package also implements high-level functions for PWM scanning and visualisation. The package does not perform "de novo" motif discovery, but is instead focused on using motifs that are either experimentally derived or computationally constructed by other tools.

Maintained by Diego Diez. Last updated 5 months ago.

motifannotation sequencematching software

29.9 match 5.08 score 60 scripts

bioc

Rhtslib:HTSlib high-throughput sequencing library as an R package

This package provides version 1.18 of the 'HTSlib' C library for high-throughput sequence analysis. The package is primarily useful to developers of other R packages who wish to make use of HTSlib. Motivation and instructions for use of this package are in the vignette, vignette(package="Rhtslib", "Rhtslib").

Maintained by Hervé Pagès. Last updated 12 days ago.

dataimport sequencing bioconductor-package core-package curl bzip2 xz-utils zlib

13.4 match 11 stars 11.25 score 3 scripts 591 dependents

bioc

AnnotationDbi:Manipulation of SQLite-based annotations in Bioconductor

Implements a user-friendly interface for querying SQLite-based annotation data packages.

Maintained by Bioconductor Package Maintainer. Last updated 5 months ago.

annotation microarray sequencing genomeannotation bioconductor-package core-package

10.0 match 9 stars 15.05 score 3.6k scripts 769 dependents

matthias-studer

WeightedCluster:Clustering of Weighted Data

Clusters state sequences and weighted data. It provides an optimized weighted PAM algorithm as well as functions for aggregating replicated cases, computing cluster quality measures for a range of clustering solutions and plotting (fuzzy) clusters of state sequences. Parametric bootstraps methods to validate typology of sequences are also provided. Finally, it provides a fuzzy and crisp CLARA algorithm to cluster large database with sequence analysis.

Maintained by Matthias Studer. Last updated 3 months ago.

cpp

26.8 match 5.55 score 106 scripts 4 dependents

bioc

Structstrings:Implementation of the dot bracket annotations with Biostrings

The Structstrings package implements the widely used dot bracket annotation for storing base pairing information in structured RNA. Structstrings uses the infrastructure provided by the Biostrings package and derives the DotBracketString and related classes from the BString class. From these, base pair tables can be produced for in depth analysis. In addition, the loop indices of the base pairs can be retrieved as well. For better efficiency, information conversion is implemented in C, inspired to a large extend by the ViennaRNA package.

Maintained by Felix G.M. Ernst. Last updated 5 months ago.

dataimport datarepresentation infrastructure sequencing software alignment sequencematching bioconductor rna rna-structural-analysis rna-structure sequences structures

22.9 match 4 stars 6.46 score 3 scripts 4 dependents

boopsboops

spider:Species Identity and Evolution in R

Analysis of species limits and DNA barcoding data. Included are functions for generating important summary statistics from DNA barcode data, assessing specimen identification efficacy, testing and optimizing divergence threshold limits, assessment of diagnostic nucleotides, and calculation of the probability of reciprocal monophyly. Additionally, a sliding window function offers opportunities to analyse information across a gene, often used for marker design in degraded DNA studies. Further information on the package has been published in Brown et al (2012) <doi:10.1111/j.1755-0998.2011.03108.x>.

Maintained by Rupert A. Collins. Last updated 6 years ago.

dna-barcode edna evolution species-delimitation species-identity

28.5 match 2 stars 5.20 score 66 scripts 1 dependents

bioc

tidySingleCellExperiment:Brings SingleCellExperiment to the Tidyverse

'tidySingleCellExperiment' is an adapter that abstracts the 'SingleCellExperiment' container in the form of a 'tibble'. This allows *tidy* data manipulation, nesting, and plotting. For example, a 'tidySingleCellExperiment' is directly compatible with functions from 'tidyverse' packages `dplyr` and `tidyr`, as well as plotting with `ggplot2` and `plotly`. In addition, the package provides various utility functions specific to single-cell omics data analysis (e.g., aggregation of cell-level data to pseudobulks).

Maintained by Stefano Mangiola. Last updated 5 months ago.

assaydomain infrastructure rnaseq differentialexpression singlecell geneexpression normalization clustering qualitycontrol sequencing bioconductor dplyr ggplot2 plotly single-cell-rna-seq single-cell-sequencing singlecellexperiment tibble tidyr tidyverse

16.7 match 36 stars 8.86 score 125 scripts 2 dependents

bioc

Rqc:Quality Control Tool for High-Throughput Sequencing Data

Rqc is an optimised tool designed for quality control and assessment of high-throughput sequencing data. It performs parallel processing of entire files and produces a report which contains a set of high-resolution graphics.

Maintained by Welliton Souza. Last updated 5 months ago.

sequencing qualitycontrol dataimport cpp

24.3 match 6.00 score 67 scripts

bioc

mixOmics:Omics Data Integration Project

Multivariate methods are well suited to large omics data sets where the number of variables (e.g. genes, proteins, metabolites) is much larger than the number of samples (patients, cells, mice). They have the appealing properties of reducing the dimension of the data by using instrumental variables (components), which are defined as combinations of all variables. Those components are then used to produce useful graphical outputs that enable better understanding of the relationships and correlation structures between the different data sets that are integrated. mixOmics offers a wide range of multivariate methods for the exploration and integration of biological datasets with a particular focus on variable selection. The package proposes several sparse multivariate models we have developed to identify the key variables that are highly correlated, and/or explain the biological outcome of interest. The data that can be analysed with mixOmics may come from high throughput sequencing technologies, such as omics data (transcriptomics, metabolomics, proteomics, metagenomics etc) but also beyond the realm of omics (e.g. spectral imaging). The methods implemented in mixOmics can also handle missing values without having to delete entire rows with missing data. A non exhaustive list of methods include variants of generalised Canonical Correlation Analysis, sparse Partial Least Squares and sparse Discriminant Analysis. Recently we implemented integrative methods to combine multiple data sets: N-integration with variants of Generalised Canonical Correlation Analysis and P-integration with variants of multi-group Partial Least Squares.

Maintained by Eva Hamrud. Last updated 4 days ago.

immunooncology microarray sequencing metabolomics metagenomics proteomics geneprediction multiplecomparison classification regression bioconductor genomics genomics-data genomics-visualization multivariate-analysis multivariate-statistics omics r-pkg r-project

10.5 match 182 stars 13.71 score 1.3k scripts 22 dependents

bioc

deepSNV:Detection of subclonal SNVs in deep sequencing data.

This package provides provides quantitative variant callers for detecting subclonal mutations in ultra-deep (>=100x coverage) sequencing experiments. The deepSNV algorithm is used for a comparative setup with a control experiment of the same loci and uses a beta-binomial model and a likelihood ratio test to discriminate sequencing errors and subclonal SNVs. The shearwater algorithm computes a Bayes classifier based on a beta-binomial model for variant calling with multiple samples for precisely estimating model parameters - such as local error rates and dispersion - and prior knowledge, e.g. from variation data bases such as COSMIC.

Maintained by Moritz Gerstung. Last updated 5 months ago.

geneticvariability snp sequencing genetics dataimport curl bzip2 xz-utils zlib cpp

21.9 match 6.53 score 38 scripts 1 dependents

bioc

RNAmodR:Detection of post-transcriptional modifications in high throughput sequencing data

RNAmodR provides classes and workflows for loading/aggregation data from high througput sequencing aimed at detecting post-transcriptional modifications through analysis of specific patterns. In addition, utilities are provided to validate and visualize the results. The RNAmodR package provides a core functionality from which specific analysis strategies can be easily implemented as a seperate package.

Maintained by Felix G.M. Ernst. Last updated 5 months ago.

software infrastructure workflowstep visualization sequencing alkanilineseq bioconductor modifications ribomethseq rna rnamodr

21.8 match 3 stars 6.51 score 9 scripts 3 dependents

bioc

BASiCS:Bayesian Analysis of Single-Cell Sequencing data

Single-cell mRNA sequencing can uncover novel cell-to-cell heterogeneity in gene expression levels in seemingly homogeneous populations of cells. However, these experiments are prone to high levels of technical noise, creating new challenges for identifying genes that show genuine heterogeneous expression within the population of cells under study. BASiCS (Bayesian Analysis of Single-Cell Sequencing data) is an integrated Bayesian hierarchical model to perform statistical analyses of single-cell RNA sequencing datasets in the context of supervised experiments (where the groups of cells of interest are known a priori, e.g. experimental conditions or cell types). BASiCS performs built-in data normalisation (global scaling) and technical noise quantification (based on spike-in genes). BASiCS provides an intuitive detection criterion for highly (or lowly) variable genes within a single group of cells. Additionally, BASiCS can compare gene expression patterns between two or more pre-specified groups of cells. Unlike traditional differential expression tools, BASiCS quantifies changes in expression that lie beyond comparisons of means, also allowing the study of changes in cell-to-cell heterogeneity. The latter can be quantified via a biological over-dispersion parameter that measures the excess of variability that is observed with respect to Poisson sampling noise, after normalisation and technical noise removal. Due to the strong mean/over-dispersion confounding that is typically observed for scRNA-seq datasets, BASiCS also tests for changes in residual over-dispersion, defined by residual values with respect to a global mean/over-dispersion trend.

Maintained by Catalina Vallejos. Last updated 5 months ago.

immunooncology normalization sequencing rnaseq software geneexpression transcriptomics singlecell differentialexpression bayesian cellbiology bioconductor-package gene-expression rcpp rcpparmadillo scrna-seq single-cell openblas cpp openmp

13.8 match 83 stars 10.26 score 368 scripts 1 dependents

ropensci

bold:Interface to Bold Systems API

A programmatic interface to the Web Service methods provided by Bold Systems (<http://www.boldsystems.org/>) for genetic 'barcode' data. Functions include methods for searching by sequences by taxonomic names, ids, collectors, and institutions; as well as a function for searching for specimens, and downloading trace files.

Maintained by Salix Dubois. Last updated 3 months ago.

biodiversity barcode dna sequences fasta api-wrapper barcodes taxize

24.6 match 18 stars 5.74 score 57 scripts

bioc

ProteoDisco:Generation of customized protein variant databases from genomic variants, splice-junctions and manual sequences

ProteoDisco is an R package to facilitate proteogenomics studies. It houses functions to create customized (variant) protein databases based on user-submitted genomic variants, splice-junctions, fusion genes and manual transcript sequences. The flexible workflow can be adopted to suit a myriad of research and experimental settings.

Maintained by Job van Riet. Last updated 5 months ago.

software proteomics rnaseq snp sequencing variantannotation dataimport

26.5 match 5 stars 5.30 score 4 scripts

helske

seqHMM:Mixture Hidden Markov Models for Social Sequence Data and Other Multivariate, Multichannel Categorical Time Series

Designed for fitting hidden (latent) Markov models and mixture hidden Markov models for social sequence data and other categorical time series. Also some more restricted versions of these type of models are available: Markov models, mixture Markov models, and latent class models. The package supports models for one or multiple subjects with one or multiple parallel sequences (channels). External covariates can be added to explain cluster membership in mixture models. The package provides functions for evaluating and comparing models, as well as functions for visualizing of multichannel sequence data and hidden Markov models. Models are estimated using maximum likelihood via the EM algorithm and/or direct numerical maximization with analytical gradients. All main algorithms are written in C++ with support for parallel computation. Documentation is available via several vignettes in this page, and the paper by Helske and Helske (2019, <doi:10.18637/jss.v088.i03>).

Maintained by Jouni Helske. Last updated 2 years ago.

categorical-data em-algorithm hidden-markov-models hmm mixture-markov-models time-series openblas cpp openmp

16.5 match 97 stars 8.51 score 92 scripts 1 dependents

bioc

GenomicSuperSignature:Interpretation of RNA-seq experiments through robust, efficient comparison to public databases

This package provides a novel method for interpreting new transcriptomic datasets through near-instantaneous comparison to public archives without high-performance computing requirements. Through the pre-computed index, users can identify public resources associated with their dataset such as gene sets, MeSH term, and publication. Functions to identify interpretable annotations and intuitive visualization options are implemented in this package.

Maintained by Sehyun Oh. Last updated 5 months ago.

transcriptomics systemsbiology principalcomponent rnaseq sequencing pathways clustering bioconductor-package exploratory-data-analysis gsea mesh principal-component-analysis rna-sequencing-profiles transferlearning

20.0 match 16 stars 6.97 score 59 scripts

bioc

limma:Linear Models for Microarray and Omics Data

Data analysis, linear models and differential expression for omics data.

Maintained by Gordon Smyth. Last updated 6 days ago.

exonarray geneexpression transcription alternativesplicing differentialexpression differentialsplicing genesetenrichment dataimport bayesian clustering regression timecourse microarray micrornaarray mrnamicroarray onechannel proprietaryplatforms twochannel sequencing rnaseq batcheffect multiplecomparison normalization preprocessing qualitycontrol biomedicalinformatics cellbiology cheminformatics epigenetics functionalgenomics genetics immunooncology metabolomics proteomics systemsbiology transcriptomics

10.0 match 13.81 score 16k scripts 585 dependents

immunomind

immunarch:Bioinformatics Analysis of T-Cell and B-Cell Immune Repertoires

A comprehensive framework for bioinformatics exploratory analysis of bulk and single-cell T-cell receptor and antibody repertoires. It provides seamless data loading, analysis and visualisation for AIRR (Adaptive Immune Receptor Repertoire) data, both bulk immunosequencing (RepSeq) and single-cell sequencing (scRNAseq). Immunarch implements most of the widely used AIRR analysis methods, such as: clonality analysis, estimation of repertoire similarities in distribution of clonotypes and gene segments, repertoire diversity analysis, annotation of clonotypes using external immune receptor databases and clonotype tracking in vaccination and cancer studies. A successor to our previously published 'tcR' immunoinformatics package (Nazarov 2015) <doi:10.1186/s12859-015-0613-1>.

Maintained by Vadim I. Nazarov. Last updated 12 months ago.

airr-analysis b-cell-receptor bcr bcr-repertoire bioinformatics ig ig-repertoire immune-repertoire immune-repertoire-analysis immune-repertoire-data immunoglobulin immunoinformatics immunology rep-seq repertoire-analysis single-cell single-cell-analysis t-cell-receptor tcr tcr-repertoire cpp

14.5 match 315 stars 9.49 score 203 scripts

klvoje

evoTS:Analyses of Evolutionary Time-Series

Facilitates univariate and multivariate analysis of evolutionary sequences of phenotypic change. The package extends the modeling framework available in the 'paleoTS' package. Please see <https://klvoje.github.io/evoTS/index.html> for information about the package and the implemented models.

Maintained by Kjetil Lysne Voje. Last updated 9 months ago.

32.2 match 1 stars 4.26 score 184 scripts

bioc

VariantAnnotation:Annotation of Genetic Variants

Annotate variants, compute amino acid coding changes, predict coding outcomes.

Maintained by Bioconductor Package Maintainer. Last updated 2 months ago.

dataimport sequencing snp annotation genetics variantannotation curl bzip2 xz-utils zlib

12.0 match 11.39 score 1.9k scripts 152 dependents

bioc

crisprBowtie:Bowtie-based alignment of CRISPR gRNA spacer sequences

Provides a user-friendly interface to map on-targets and off-targets of CRISPR gRNA spacer sequences using bowtie. The alignment is fast, and can be performed using either commonly-used or custom CRISPR nucleases. The alignment can work with any reference or custom genomes. Both DNA- and RNA-targeting nucleases are supported.

Maintained by Jean-Philippe Fortin. Last updated 5 months ago.

crispr functionalgenomics alignment aligner bioconductor bioconductor-package bowtie crispr-analysis crispr-cas9 crispr-design crispr-target grna grna-sequence grna-sequences sgrna sgrna-design

23.0 match 3 stars 5.86 score 7 scripts 4 dependents

tylermorganwall

spacefillr:Space-Filling Random and Quasi-Random Sequences

Generates random and quasi-random space-filling sequences. Supports the following sequences: 'Halton', 'Sobol', 'Owen'-scrambled 'Sobol', 'Owen'-scrambled 'Sobol' with errors distributed as blue noise, progressive jittered, progressive multi-jittered ('PMJ'), 'PMJ' with blue noise, 'PMJ02', and 'PMJ02' with blue noise. Includes a 'C++' 'API'. Methods derived from "Constructing Sobol sequences with better two-dimensional projections" (2012) <doi:10.1137/070709359> S. Joe and F. Y. Kuo, "Progressive Multi-Jittered Sample Sequences" (2018) <https://graphics.pixar.com/library/ProgressiveMultiJitteredSampling/paper.pdf> Christensen, P., Kensler, A. and Kilpatrick, C., and "A Low-Discrepancy Sampler that Distributes Monte Carlo Errors as a Blue Noise in Screen Space" (2019) E. Heitz, B. Laurent, O. Victor, C. David and I. Jean-Claude, <doi:10.1145/3306307.3328191>.

Maintained by Tyler Morgan-Wall. Last updated 21 days ago.

halton-sequence quasi-random-generator sobol-sequence cpp

18.9 match 7 stars 7.07 score 3 scripts 45 dependents

bioc

idpr:Profiling and Analyzing Intrinsically Disordered Proteins in R

‘idpr’ aims to integrate tools for the computational analysis of intrinsically disordered proteins (IDPs) within R. This package is used to identify known characteristics of IDPs for a sequence of interest with easily reported and dynamic results. Additionally, this package includes tools for IDP-based sequence analysis to be used in conjunction with other R packages. Described in McFadden WM & Yanowitz JL (2022). "idpr: A package for profiling and analyzing Intrinsically Disordered Proteins in R." PloS one, 17(4), e0266929. <https://doi.org/10.1371/journal.pone.0266929>.

Maintained by William M. McFadden. Last updated 5 months ago.

structuralprediction proteomics cellbiology

21.6 match 4 stars 6.16 score 20 scripts

bnosac

udpipe:Tokenization, Parts of Speech Tagging, Lemmatization and Dependency Parsing with the 'UDPipe' 'NLP' Toolkit

This natural language processing toolkit provides language-agnostic 'tokenization', 'parts of speech tagging', 'lemmatization' and 'dependency parsing' of raw text. Next to text parsing, the package also allows you to train annotation models based on data of 'treebanks' in 'CoNLL-U' format as provided at <https://universaldependencies.org/format.html>. The techniques are explained in detail in the paper: 'Tokenizing, POS Tagging, Lemmatizing and Parsing UD 2.0 with UDPipe', available at <doi:10.18653/v1/K17-3009>. The toolkit also contains functionalities for commonly used data manipulations on texts which are enriched with the output of the parser. Namely functionalities and algorithms for collocations, token co-occurrence, document term matrix handling, term frequency inverse document frequency calculations, information retrieval metrics (Okapi BM25), handling of multi-word expressions, keyword detection (Rapid Automatic Keyword Extraction, noun phrase extraction, syntactical patterns) sentiment scoring and semantic similarity analysis.

Maintained by Jan Wijffels. Last updated 2 years ago.

conll dependency-parser lemmatization natural-language-processing nlp pos-tagging r-pkg rcpp text-mining tokenizer udpipe cpp

11.2 match 215 stars 11.83 score 1.2k scripts 9 dependents

bioc

splatter:Simple Simulation of Single-cell RNA Sequencing Data

Splatter is a package for the simulation of single-cell RNA sequencing count data. It provides a simple interface for creating complex simulations that are reproducible and well-documented. Parameters can be estimated from real data and functions are provided for comparing real and simulated datasets.

Maintained by Luke Zappia. Last updated 4 months ago.

singlecell rnaseq transcriptomics geneexpression sequencing software immunooncology bioconductor bioinformatics scrna-seq simulation

13.4 match 224 stars 9.92 score 424 scripts 1 dependents

bioc

segmentSeq:Methods for identifying small RNA loci from high-throughput sequencing data

High-throughput sequencing technologies allow the production of large volumes of short sequences, which can be aligned to the genome to create a set of matches to the genome. By looking for regions of the genome which to which there are high densities of matches, we can infer a segmentation of the genome into regions of biological significance. The methods in this package allow the simultaneous segmentation of data from multiple samples, taking into account replicate data, in order to create a consensus segmentation. This has obvious applications in a number of classes of sequencing experiments, particularly in the discovery of small RNA loci and novel mRNA transcriptome discovery.

Maintained by Samuel Granjeaud. Last updated 4 months ago.

multiplecomparison sequencing alignment differentialexpression qualitycontrol dataimport

21.5 match 6.17 score 42 scripts

bioc

HDF5Array:HDF5 datasets as array-like objects in R

The HDF5Array package is an HDF5 backend for DelayedArray objects. It implements the HDF5Array, H5SparseMatrix, H5ADMatrix, and TENxMatrix classes, 4 convenient and memory-efficient array-like containers for representing and manipulating either: (1) a conventional (a.k.a. dense) HDF5 dataset, (2) an HDF5 sparse matrix (stored in CSR/CSC/Yale format), (3) the central matrix of an h5ad file (or any matrix in the /layers group), or (4) a 10x Genomics sparse matrix. All these containers are DelayedArray extensions and thus support all operations (delayed or block-processed) supported by DelayedArray objects.

Maintained by Hervé Pagès. Last updated 26 days ago.

infrastructure datarepresentation dataimport sequencing rnaseq coverage annotation genomeannotation singlecell immunooncology bioconductor-package core-package u24ca289073

10.0 match 12 stars 13.19 score 844 scripts 123 dependents

bioc

scran:Methods for Single-Cell RNA-Seq Data Analysis

Implements miscellaneous functions for interpretation of single-cell RNA-seq data. Methods are provided for assignment of cell cycle phase, detection of highly variable and significantly correlated genes, identification of marker genes, and other common tasks in routine single-cell analysis workflows.

Maintained by Aaron Lun. Last updated 5 months ago.

immunooncology normalization sequencing rnaseq software geneexpression transcriptomics singlecell clustering bioconductor-package human-cell-atlas single-cell-rna-seq openblas cpp

10.0 match 41 stars 13.14 score 7.6k scripts 36 dependents

bioc

baySeq:Empirical Bayesian analysis of patterns of differential expression in count data

This package identifies differential expression in high-throughput 'count' data, such as that derived from next-generation sequencing machines, calculating estimated posterior likelihoods of differential expression (or more complex hypotheses) via empirical Bayesian methods.

Maintained by Samuel Granjeaud. Last updated 5 months ago.

sequencing differentialexpression multiplecomparison sage bayesian coverage

16.9 match 7.75 score 79 scripts 3 dependents

bioc

Gviz:Plotting data and annotation information along genomic coordinates

Genomic data analyses requires integrated visualization of known genomic information and new experimental data. Gviz uses the biomaRt and the rtracklayer packages to perform live annotation queries to Ensembl and UCSC and translates this to e.g. gene/transcript structures in viewports of the grid graphics package. This results in genomic information plotted together with your data.

Maintained by Robert Ivanek. Last updated 5 months ago.

visualization microarray sequencing

10.0 match 79 stars 13.08 score 1.4k scripts 48 dependents

bioc

clusterExperiment:Compare Clusterings for Single-Cell Sequencing

Provides functionality for running and comparing many different clusterings of single-cell sequencing data or other large mRNA Expression data sets.

Maintained by Elizabeth Purdom. Last updated 5 months ago.

clustering rnaseq sequencing software singlecell cpp

13.5 match 39 stars 9.63 score 192 scripts 1 dependents

bioc

sitePath:Phylogeny-based sequence clustering with site polymorphism

Using site polymorphism is one of the ways to cluster DNA/protein sequences but it is possible for the sequences with the same polymorphism on a single site to be genetically distant. This package is aimed at clustering sequences using site polymorphism and their corresponding phylogenetic trees. By considering their location on the tree, only the structurally adjacent sequences will be clustered. However, the adjacent sequences may not necessarily have the same polymorphism. So a branch-and-bound like algorithm is used to minimize the entropy representing the purity of site polymorphism of each cluster.

Maintained by Chengyang Ji. Last updated 5 months ago.

alignment multiplesequencealignment phylogenetics snp software mutation cpp

25.0 match 8 stars 5.20 score 9 scripts

bioc

tradeSeq:trajectory-based differential expression analysis for sequencing data

tradeSeq provides a flexible method for fitting regression models that can be used to find genes that are differentially expressed along one or multiple lineages in a trajectory. Based on the fitted models, it uses a variety of tests suited to answer different questions of interest, e.g. the discovery of genes for which expression is associated with pseudotime, or which are differentially expressed (in a specific region) along the trajectory. It fits a negative binomial generalized additive model (GAM) for each gene, and performs inference on the parameters of the GAM.

Maintained by Hector Roux de Bezieux. Last updated 5 months ago.

clustering regression timecourse differentialexpression geneexpression rnaseq sequencing software singlecell transcriptomics multiplecomparison visualization

12.9 match 247 stars 10.06 score 440 scripts

great-northern-diver

loon:Interactive Statistical Data Visualization

An extendable toolkit for interactive data visualization and exploration.

Maintained by R. Wayne Oldford. Last updated 2 years ago.

data-analysis data-science data-visualization exploratory-analysis exploratory-data-analysis high-dimensional-data interactive-graphics interactive-visualizations loon python statistical-analysis statistical-graphics statistics tcl-extension tk

14.4 match 48 stars 9.00 score 93 scripts 5 dependents

bioc

PureCN:Copy number calling and SNV classification using targeted short read sequencing

This package estimates tumor purity, copy number, and loss of heterozygosity (LOH), and classifies single nucleotide variants (SNVs) by somatic status and clonality. PureCN is designed for targeted short read sequencing data, integrates well with standard somatic variant detection and copy number pipelines, and has support for tumor samples without matching normal samples.

Maintained by Markus Riester. Last updated 2 months ago.

copynumbervariation software sequencing variantannotation variantdetection coverage immunooncology bioconductor-package cell-free-dna copy-number loh tumor-heterogeneity tumor-mutational-burden tumor-purity

13.3 match 132 stars 9.72 score 40 scripts

bioc

seqTools:Analysis of nucleotide, sequence and quality content on fastq files

Analyze read length, phred scores and alphabet frequency and DNA k-mers on uncompressed and compressed fastq files.

Maintained by Wolfgang Kaisers. Last updated 5 months ago.

qualitycontrol sequencing zlib

23.1 match 5.57 score 52 scripts 1 dependents

cran

localScore:Package for Sequence Analysis by Local Score

Functionalities for calculating the local score and calculating statistical relevance (p-value) to find a local Score in a sequence of given distribution (S. Mercier and J.-J. Daudin (2001) <https://hal.science/hal-00714174/>) ; S. Karlin and S. Altschul (1990) <https://pmc.ncbi.nlm.nih.gov/articles/PMC53667/> ; S. Mercier, D. Cellier and F. Charlot (2003) <https://hal.science/hal-00937529v1/> ; A. Lagnoux, S. Mercier and P. Valois (2017) <doi:10.1093/bioinformatics/btw699> ).

Maintained by David Robelin. Last updated 21 days ago.

cpp

55.3 match 2.30 score 6 scripts

bioc

monaLisa:Binned Motif Enrichment Analysis and Visualization

Useful functions to work with sequence motifs in the analysis of genomics data. These include methods to annotate genomic regions or sequences with predicted motif hits and to identify motifs that drive observed changes in accessibility or expression. Functions to produce informative visualizations of the obtained results are also provided.

Maintained by Michael Stadler. Last updated 2 months ago.

motifannotation visualization featureextraction epigenetics

15.8 match 40 stars 8.06 score 53 scripts

bioc

HTSFilter:Filter replicated high-throughput transcriptome sequencing data

This package implements a filtering procedure for replicated transcriptome sequencing data based on a global Jaccard similarity index in order to identify genes with low, constant levels of expression across one or more experimental conditions.

Maintained by Andrea Rau. Last updated 5 months ago.

sequencing rnaseq preprocessing differentialexpression geneexpression normalization immunooncology

20.4 match 6.24 score 58 scripts 1 dependents

bioc

XNAString:Efficient Manipulation of Modified Oligonucleotide Sequences

The XNAString package allows for description of base sequences and associated chemical modifications in a single object. XNAString is able to capture single stranded, as well as double stranded molecules. Chemical modifications are represented as independent strings associated with different features of the molecules (base sequence, sugar sequence, backbone sequence, modifications) and can be read or written to a HELM notation. It also enables secondary structure prediction using RNAfold from ViennaRNA. XNAString is designed to be efficient representation of nucleic-acid based therapeutics, therefore it stores information about target sequences and provides interface for matching and alignment functions from Biostrings and pwalign packages.

Maintained by Marianna Plucinska. Last updated 5 months ago.

sequencematching alignment sequencing genetics cpp

30.4 match 4.18 score 4 scripts

bioc

microbiome:Microbiome Analytics

Utilities for microbiome analysis.

Maintained by Leo Lahti. Last updated 5 months ago.

metagenomics microbiome sequencing systemsbiology hitchip hitchip-atlas human-microbiome microbiology microbiome-analysis phyloseq population-study

10.0 match 290 stars 12.50 score 2.0k scripts 5 dependents

bioc

genomation:Summary, annotation and visualization of genomic data

A package for summary and annotation of genomic intervals. Users can visualize and quantify genomic intervals over pre-defined functional regions, such as promoters, exons, introns, etc. The genomic intervals represent regions with a defined chromosome position, which may be associated with a score, such as aligned reads from HT-seq experiments, TF binding sites, methylation scores, etc. The package can use any tabular genomic feature data as long as it has minimal information on the locations of genomic intervals. In addition, It can use BAM or BigWig files as input.

Maintained by Altuna Akalin. Last updated 5 months ago.

annotation sequencing visualization cpgisland cpp

11.3 match 75 stars 11.09 score 738 scripts 5 dependents

bioc

DNAfusion:Identification of gene fusions using paired-end sequencing

DNAfusion can identify gene fusions such as EML4-ALK based on paired-end sequencing results. This package was developed using position deduplicated BAM files generated with the AVENIO Oncology Analysis Software. These files are made using the AVENIO ctDNA surveillance kit and Illumina Nextseq 500 sequencing. This is a targeted hybridization NGS approach and includes ALK-specific but not EML4-specific probes.

Maintained by Christoffer Trier Maansson. Last updated 5 months ago.

targetedresequencing genetics genefusiondetection sequencing bioconductor-package circulating-tumor-dna gene-fusion liquid-biopsy next-generation-sequencing targeted-sequencing variant-calling

27.8 match 3 stars 4.48 score 10 scripts

bioc

FastqCleaner:A Shiny Application for Quality Control, Filtering and Trimming of FASTQ Files

An interactive web application for quality control, filtering and trimming of FASTQ files. This user-friendly tool combines a pipeline for data processing based on Biostrings and ShortRead infrastructure, with a cutting-edge visual environment. Single-Read and Paired-End files can be locally processed. Diagnostic interactive plots (CG content, per-base sequence quality, etc.) are provided for both the input and output files.

Maintained by Leandro Roser. Last updated 5 months ago.

qualitycontrol sequencing software sangerseq sequencematching cpp

30.8 match 4.00 score 4 scripts

bioc

qckitfastq:FASTQ Quality Control

Assessment of FASTQ file format with multiple metrics including quality score, sequence content, overrepresented sequence and Kmers.

Maintained by August Guang. Last updated 5 months ago.

software qualitycontrol sequencing zlib cpp

28.1 match 4.38 score 24 scripts

bioc

sva:Surrogate Variable Analysis

The sva package contains functions for removing batch effects and other unwanted variation in high-throughput experiment. Specifically, the sva package contains functions for the identifying and building surrogate variables for high-dimensional data sets. Surrogate variables are covariates constructed directly from high-dimensional data (like gene expression/RNA sequencing/methylation/brain imaging data) that can be used in subsequent analyses to adjust for unknown, unmodeled, or latent sources of noise. The sva package can be used to remove artifacts in three ways: (1) identifying and estimating surrogate variables for unknown sources of variation in high-throughput experiments (Leek and Storey 2007 PLoS Genetics,2008 PNAS), (2) directly removing known batch effects using ComBat (Johnson et al. 2007 Biostatistics) and (3) removing batch effects with known control probes (Leek 2014 biorXiv). Removing batch effects and using surrogate variables in differential expression analysis have been shown to reduce dependence, stabilize error rate estimates, and improve reproducibility, see (Leek and Storey 2007 PLoS Genetics, 2008 PNAS or Leek et al. 2011 Nat. Reviews Genetics).

Maintained by Jeffrey T. Leek. Last updated 5 months ago.

immunooncology microarray statisticalmethod preprocessing multiplecomparison sequencing rnaseq batcheffect normalization

12.2 match 10.05 score 3.2k scripts 50 dependents

chaoliu-cl

conversim:Conversation Similarity Analysis Package

A comprehensive toolkit for analyzing and comparing conversations. This package provides functions to calculate various similarity measures between conversations, including topic, lexical, semantic, structural, stylistic, sentiment, participant, and timing similarities. It supports both pairwise conversation comparisons and analysis of multiple dyads.

Maintained by Chao Liu. Last updated 6 months ago.

28.5 match 4.30 score 10 scripts

sjmack

HLAtools:Toolkit for HLA Immunogenomics

A toolkit for the analysis and management of data for genes in the so-called "Human Leukocyte Antigen" (HLA) region. Functions extract reference data from the Anthony Nolan HLA Informatics Group/ImmunoGeneTics HLA 'GitHub' repository (ANHIG/IMGTHLA) <https://github.com/ANHIG/IMGTHLA>, validate Genotype List (GL) Strings, convert between UNIFORMAT and GL String Code (GLSC) formats, translate HLA alleles and GLSCs across ImmunoPolymorphism Database (IPD) IMGT/HLA Database release versions, identify differences between pairs of alleles at a locus, generate customized, multi-position sequence alignments, trim and convert allele-names across nomenclature epochs, and extend existing data-analysis methods.

Maintained by Steven Mack. Last updated 13 days ago.

19.7 match 4 stars 6.21 score 7 scripts 1 dependents

bioc

DropletUtils:Utilities for Handling Single-Cell Droplet Data

Provides a number of utility functions for handling single-cell (RNA-seq) data from droplet technologies such as 10X Genomics. This includes data loading from count matrices or molecule information files, identification of cells from empty droplets, removal of barcode-swapped pseudo-cells, and downsampling of the count matrix.

Maintained by Jonathan Griffiths. Last updated 3 months ago.

immunooncology singlecell sequencing rnaseq geneexpression transcriptomics dataimport coverage zlib cpp

12.0 match 10.08 score 2.7k scripts 9 dependents

mt1022

cubar:Codon Usage Bias Analysis

A suite of functions for rapid and flexible analysis of codon usage bias. It provides in-depth analysis at the codon level, including relative synonymous codon usage (RSCU), tRNA weight calculations, machine learning predictions for optimal or preferred codons, and visualization of codon-anticodon pairing. Additionally, it can calculate various gene- specific codon indices such as codon adaptation index (CAI), effective number of codons (ENC), fraction of optimal codons (Fop), tRNA adaptation index (tAI), mean codon stabilization coefficients (CSCg), and GC contents (GC/GC3s/GC4d). It also supports both standard and non-standard genetic code tables found in NCBI, as well as custom genetic code tables.

Maintained by Hong Zhang. Last updated 3 months ago.

bioinformatics codon-usage machine-learning sequence-analysis

20.6 match 6 stars 5.82 score 8 scripts

bioc

circRNAprofiler:circRNAprofiler: An R-Based Computational Framework for the Downstream Analysis of Circular RNAs

R-based computational framework for a comprehensive in silico analysis of circRNAs. This computational framework allows to combine and analyze circRNAs previously detected by multiple publicly available annotation-based circRNA detection tools. It covers different aspects of circRNAs analysis from differential expression analysis, evolutionary conservation, biogenesis to functional analysis.

Maintained by Simona Aufiero. Last updated 5 months ago.

annotation structuralprediction functionalprediction geneprediction genomeassembly differentialexpression

20.7 match 10 stars 5.78 score 5 scripts

bioc

GenomicDataCommons:NIH / NCI Genomic Data Commons Access

Programmatically access the NIH / NCI Genomic Data Commons RESTful service.

Maintained by Sean Davis. Last updated 1 months ago.

dataimport sequencing api-client bioconductor bioinformatics cancer core-services data-science genomics nci tcga vignette

10.0 match 87 stars 11.94 score 238 scripts 12 dependents

stuart-lab

Signac:Analysis of Single-Cell Chromatin Data

A framework for the analysis and exploration of single-cell chromatin data. The 'Signac' package contains functions for quantifying single-cell chromatin data, computing per-cell quality control metrics, dimension reduction and normalization, visualization, and DNA sequence motif analysis. Reference: Stuart et al. (2021) <doi:10.1038/s41592-021-01282-5>.

Maintained by Tim Stuart. Last updated 7 months ago.

atac bioinformatics single-cell zlib cpp

9.7 match 349 stars 12.19 score 3.7k scripts 1 dependents

bioc

derfinder:Annotation-agnostic differential expression analysis of RNA-seq data at base-pair resolution via the DER Finder approach

This package provides functions for annotation-agnostic differential expression analysis of RNA-seq data. Two implementations of the DER Finder approach are included in this package: (1) single base-level F-statistics and (2) DER identification at the expressed regions-level. The DER Finder approach can also be used to identify differentially bounded ChIP-seq peaks.

Maintained by Leonardo Collado-Torres. Last updated 3 months ago.

differentialexpression sequencing rnaseq chipseq differentialpeakcalling software immunooncology coverage annotation-agnostic bioconductor derfinder

11.8 match 42 stars 10.03 score 78 scripts 6 dependents

molevolepid

SEEPS:Sequence evolution and epidemiological process simulator

A modular, modern simulation suite and toolkit for simulating transmission networks, phylogenies, and evolutionary pairwise distance matrices under different models and assumptions for viral/sequence evolution. While intially developed for HIV, SEEPS offers modular utilities for custom workflows for extension beyond HIV.

Maintained by Michael Kupperman. Last updated 2 months ago.

biological-sequences epidemiology evolution hiv simulation-framework

29.8 match 1 stars 3.95 score 6 scripts

bioc

spiky:Spike-in calibration for cell-free MeDIP

spiky implements methods and model generation for cfMeDIP (cell-free methylated DNA immunoprecipitation) with spike-in controls. CfMeDIP is an enrichment protocol which avoids destructive conversion of scarce template, making it ideal as a "liquid biopsy," but creating certain challenges in comparing results across specimens, subjects, and experiments. The use of synthetic spike-in standard oligos allows diagnostics performed with cfMeDIP to quantitatively compare samples across subjects, experiments, and time points in both relative and absolute terms.

Maintained by Tim Triche. Last updated 5 months ago.

differentialmethylation dnamethylation normalization preprocessing qualitycontrol sequencing

23.9 match 2 stars 4.90 score 3 scripts

bioc

cfDNAPro:cfDNAPro extracts and Visualises biological features from whole genome sequencing data of cell-free DNA

cfDNA fragments carry important features for building cancer sample classification ML models, such as fragment size, and fragment end motif etc. Analyzing and visualizing fragment size metrics, as well as other biological features in a curated, standardized, scalable, well-documented, and reproducible way might be time intensive. This package intends to resolve these problems and simplify the process. It offers two sets of functions for cfDNA feature characterization and visualization.

Maintained by Haichao Wang. Last updated 5 months ago.

visualization sequencing wholegenome bioinformatics cancer-genomics cancer-research cell-free-dna early-detection genomics-visualization liquid-biopsy swgs whole-genome-sequencing

19.4 match 28 stars 6.04 score 13 scripts

cran

ips:Interfaces to Phylogenetic Software in R

Functions that wrap popular phylogenetic software for sequence alignment, masking of sequence alignments, and estimation of phylogenies and ancestral character states.

Maintained by Christoph Heibl. Last updated 11 months ago.

27.2 match 4.28 score 128 scripts 1 dependents

bioc

systemPipeR:systemPipeR: Workflow Environment for Data Analysis and Report Generation

systemPipeR is a multipurpose data analysis workflow environment that unifies R with command-line tools. It enables scientists to analyze many types of large- or small-scale data on local or distributed computer systems with a high level of reproducibility, scalability and portability. At its core is a command-line interface (CLI) that adopts the Common Workflow Language (CWL). This design allows users to choose for each analysis step the optimal R or command-line software. It supports both end-to-end and partial execution of workflows with built-in restart functionalities. Efficient management of complex analysis tasks is accomplished by a flexible workflow control container class. Handling of large numbers of input samples and experimental designs is facilitated by consistent sample annotation mechanisms. As a multi-purpose workflow toolkit, systemPipeR enables users to run existing workflows, customize them or design entirely new ones while taking advantage of widely adopted data structures within the Bioconductor ecosystem. Another important core functionality is the generation of reproducible scientific analysis and technical reports. For result interpretation, systemPipeR offers a wide range of plotting functionality, while an associated Shiny App offers many useful functionalities for interactive result exploration. The vignettes linked from this page include (1) a general introduction, (2) a description of technical details, and (3) a collection of workflow templates.

Maintained by Thomas Girke. Last updated 5 months ago.

genetics infrastructure dataimport sequencing rnaseq riboseq chipseq methylseq snp geneexpression coverage genesetenrichment alignment qualitycontrol immunooncology reportwriting workflowstep workflowmanagement

10.0 match 53 stars 11.56 score 344 scripts 3 dependents

bioc

muscle:Multiple Sequence Alignment with MUSCLE

MUSCLE performs multiple sequence alignments of nucleotide or amino acid sequences.

Maintained by Alex T. Kalinka. Last updated 5 months ago.

multiplesequencealignment alignment sequencing genetics sequencematching dataimport cpp

22.0 match 5.21 score 81 scripts

bioc

methInheritSim:Simulating Whole-Genome Inherited Bisulphite Sequencing Data

Simulate a multigeneration methylation case versus control experiment with inheritance relation using a real control dataset.

Maintained by Pascal Belleau. Last updated 5 months ago.

biologicalquestion epigenetics dnamethylation differentialmethylation methylseq software immunooncology statisticalmethod wholegenome sequencing bisulphite-sequencing inheritance methylation simulation

24.9 match 1 stars 4.60 score 1 scripts

dsy109

mixtools:Tools for Analyzing Finite Mixture Models

Analyzes finite mixture models for various parametric and semiparametric settings. This includes mixtures of parametric distributions (normal, multivariate normal, multinomial, gamma), various Reliability Mixture Models (RMMs), mixtures-of-regressions settings (linear regression, logistic regression, Poisson regression, linear regression with changepoints, predictor-dependent mixing proportions, random effects regressions, hierarchical mixtures-of-experts), and tools for selecting the number of components (bootstrapping the likelihood ratio test statistic, mixturegrams, and model selection criteria). Bayesian estimation of mixtures-of-linear-regressions models is available as well as a novel data depth method for obtaining credible bands. This package is based upon work supported by the National Science Foundation under Grant No. SES-0518772 and the Chan Zuckerberg Initiative: Essential Open Source Software for Science (Grant No. 2020-255193).

Maintained by Derek Young. Last updated 9 months ago.

mixture-models mixture-of-experts semiparametric-regression

10.1 match 20 stars 11.34 score 1.4k scripts 56 dependents

bioc

bsseq:Analyze, manage and store whole-genome methylation data

A collection of tools for analyzing and visualizing whole-genome methylation data from sequencing. This includes whole-genome bisulfite sequencing and Oxford nanopore data.

Maintained by Kasper Daniel Hansen. Last updated 3 months ago.

dnamethylation cpp

9.3 match 37 stars 12.26 score 676 scripts 15 dependents

bioc

TAPseq:Targeted scRNA-seq primer design for TAP-seq

Design primers for targeted single-cell RNA-seq used by TAP-seq. Create sequence templates for target gene panels and design gene-specific primers using Primer3. Potential off-targets can be estimated with BLAST. Requires working installations of Primer3 and BLASTn.

Maintained by Andreas R. Gschwind. Last updated 5 months ago.

singlecell sequencing technology crispr pooledscreens

21.2 match 4 stars 5.38 score 9 scripts

bioc

crisprScore:On-Target and Off-Target Scoring Algorithms for CRISPR gRNAs

Provides R wrappers of several on-target and off-target scoring methods for CRISPR guide RNAs (gRNAs). The following nucleases are supported: SpCas9, AsCas12a, enAsCas12a, and RfxCas13d (CasRx). The available on-target cutting efficiency scoring methods are RuleSet1, Azimuth, DeepHF, DeepCpf1, enPAM+GB, and CRISPRscan. Both the CFD and MIT scoring methods are available for off-target specificity prediction. The package also provides a Lindel-derived score to predict the probability of a gRNA to produce indels inducing a frameshift for the Cas9 nuclease. Note that DeepHF, DeepCpf1 and enPAM+GB are not available on Windows machines.

Maintained by Jean-Philippe Fortin. Last updated 5 months ago.

crispr functionalgenomics functionalprediction bioconductor bioconductor-package crispr-cas9 crispr-design crispr-target genomics grna grna-sequence grna-sequences scoring-algorithm sgrna sgrna-design

15.0 match 16 stars 7.52 score 19 scripts 4 dependents

bioc

pathview:a tool set for pathway based data integration and visualization

Pathview is a tool set for pathway based data integration and visualization. It maps and renders a wide variety of biological data on relevant pathway graphs. All users need is to supply their data and specify the target pathway. Pathview automatically downloads the pathway graph data, parses the data file, maps user data to the pathway, and render pathway graph with the mapped data. In addition, Pathview also seamlessly integrates with pathway and gene set (enrichment) analysis tools for large-scale and fully automated analysis.

Maintained by Weijun Luo. Last updated 5 months ago.

pathways graphandnetwork visualization genesetenrichment differentialexpression geneexpression microarray rnaseq genetics metabolomics proteomics systemsbiology sequencing

10.0 match 40 stars 11.24 score 1.6k scripts 10 dependents

bioc

karyoploteR:Plot customizable linear genomes displaying arbitrary data

karyoploteR creates karyotype plots of arbitrary genomes and offers a complete set of functions to plot arbitrary data on them. It mimicks many R base graphics functions coupling them with a coordinate change function automatically mapping the chromosome and data coordinates into the plot coordinates. In addition to the provided data plotting functions, it is easy to add new ones.

Maintained by Bernat Gel. Last updated 5 months ago.

visualization copynumbervariation sequencing coverage dnaseq chipseq methylseq dataimport onechannel bioconductor bioinformatics data-visualization genome genomics-visualization plotting-in-r

10.0 match 306 stars 11.22 score 656 scripts 4 dependents

bioc

StructuralVariantAnnotation:Variant annotations for structural variants

StructuralVariantAnnotation provides a framework for analysis of structural variants within the Bioconductor ecosystem. This package contains contains useful helper functions for dealing with structural variants in VCF format. The packages contains functions for parsing VCFs from a number of popular callers as well as functions for dealing with breakpoints involving two separate genomic loci encoded as GRanges objects.

Maintained by Daniel Cameron. Last updated 5 months ago.

dataimport sequencing annotation genetics variantannotation

17.8 match 6.26 score 102 scripts 2 dependents

mlverse

torch:Tensors and Neural Networks with 'GPU' Acceleration

Provides functionality to define and train neural networks similar to 'PyTorch' by Paszke et al (2019) <doi:10.48550/arXiv.1912.01703> but written entirely in R using the 'libtorch' library. Also supports low-level tensor operations and 'GPU' acceleration.

Maintained by Daniel Falbel. Last updated 6 days ago.

autograd deep-learning torch cpp

6.8 match 520 stars 16.52 score 1.4k scripts 38 dependents

xytangtang

ProcData:Process Data Analysis

Provides tools for exploratory process data analysis. Process data refers to the data describing participants' problem-solving processes in computer-based assessments. It is often recorded in computer log files. This package provides functions to read, process, and write process data. It also implements two feature extraction methods to compress the information stored in process data into standard numerical vectors. This package also provides recurrent neural network based models that relate response processes with other binary or scale variables of interest. The functions that involve training and evaluating neural networks are wrappers of functions in 'keras'.

Maintained by Xueying Tang. Last updated 4 years ago.

cpp

30.1 match 10 stars 3.70 score 2 scripts

bioc

scater:Single-Cell Analysis Toolkit for Gene Expression Data in R

A collection of tools for doing various analyses of single-cell RNA-seq gene expression data, with a focus on quality control and visualization.

Maintained by Alan OCallaghan. Last updated 9 days ago.

immunooncology singlecell rnaseq qualitycontrol preprocessing normalization visualization dimensionreduction transcriptomics geneexpression sequencing software dataimport datarepresentation infrastructure coverage

10.0 match 11.07 score 12k scripts 43 dependents

bioc

celda:CEllular Latent Dirichlet Allocation

Celda is a suite of Bayesian hierarchical models for clustering single-cell RNA-sequencing (scRNA-seq) data. It is able to perform "bi-clustering" and simultaneously cluster genes into gene modules and cells into cell subpopulations. It also contains DecontX, a novel Bayesian method to computationally estimate and remove RNA contamination in individual cells without empty droplet information. A variety of scRNA-seq data visualization functions is also included.

Maintained by Joshua Campbell. Last updated 28 days ago.

singlecell geneexpression clustering sequencing bayesian immunooncology dataimport cpp openmp

10.5 match 147 stars 10.47 score 256 scripts 2 dependents

bioc

DirichletMultinomial:Dirichlet-Multinomial Mixture Model Machine Learning for Microbiome Data

Dirichlet-multinomial mixture models can be used to describe variability in microbial metagenomic data. This package is an interface to code originally made available by Holmes, Harris, and Quince, 2012, PLoS ONE 7(2): 1-15, as discussed further in the man page for this package, ?DirichletMultinomial.

Maintained by Martin Morgan. Last updated 5 months ago.

immunooncology microbiome sequencing clustering classification metagenomics gsl

10.0 match 11 stars 10.97 score 125 scripts 26 dependents

bioc

ORFhunteR:Predict open reading frames in nucleotide sequences

The ORFhunteR package is a R and C++ library for an automatic determination and annotation of open reading frames (ORF) in a large set of RNA molecules. It efficiently implements the machine learning model based on vectorization of nucleotide sequences and the random forest classification algorithm. The ORFhunteR package consists of a set of functions written in the R language in conjunction with C++. The efficiency of the package was confirmed by the examples of the analysis of RNA molecules from the NCBI RefSeq and Ensembl databases. The package can be used in basic and applied biomedical research related to the study of the transcriptome of normal as well as altered (for example, cancer) human cells.

Maintained by Vasily V. Grinev. Last updated 5 months ago.

technology statisticalmethod sequencing rnaseq classification featureextraction cpp

24.5 match 1 stars 4.48 score

bioc

IsoformSwitchAnalyzeR:Identify, Annotate and Visualize Isoform Switches with Functional Consequences from both short- and long-read RNA-seq data

Analysis of alternative splicing and isoform switches with predicted functional consequences (e.g. gain/loss of protein domains etc.) from quantification of all types of RNASeq by tools such as Kallisto, Salmon, StringTie, Cufflinks/Cuffdiff etc.

Maintained by Kristoffer Vitting-Seerup. Last updated 5 months ago.

geneexpression transcription alternativesplicing differentialexpression differentialsplicing visualization statisticalmethod transcriptomevariant biomedicalinformatics functionalgenomics systemsbiology transcriptomics rnaseq annotation functionalprediction geneprediction dataimport multiplecomparison batcheffect immunooncology

11.8 match 108 stars 9.26 score 125 scripts

bioc

EnrichedHeatmap:Making Enriched Heatmaps

Enriched heatmap is a special type of heatmap which visualizes the enrichment of genomic signals on specific target regions. Here we implement enriched heatmap by ComplexHeatmap package. Since this type of heatmap is just a normal heatmap but with some special settings, with the functionality of ComplexHeatmap, it would be much easier to customize the heatmap as well as concatenating to a list of heatmaps to show correspondance between different data sources.

Maintained by Zuguang Gu. Last updated 5 months ago.

software visualization sequencing genomeannotation coverage cpp

10.0 match 190 stars 10.87 score 330 scripts 1 dependents

bioc

Dino:Normalization of Single-Cell mRNA Sequencing Data

Dino normalizes single-cell, mRNA sequencing data to correct for technical variation, particularly sequencing depth, prior to downstream analysis. The approach produces a matrix of corrected expression for which the dependency between sequencing depth and the full distribution of normalized expression; many existing methods aim to remove only the dependency between sequencing depth and the mean of the normalized expression. This is particuarly useful in the context of highly sparse datasets such as those produced by 10X genomics and other uninque molecular identifier (UMI) based microfluidics protocols for which the depth-dependent proportion of zeros in the raw expression data can otherwise present a challenge.

Maintained by Jared Brown. Last updated 5 months ago.

software normalization rnaseq singlecell sequencing geneexpression transcriptomics regression cellbasedassays

18.0 match 9 stars 6.02 score 13 scripts

bioc

LOLA:Locus overlap analysis for enrichment of genomic ranges

Provides functions for testing overlap of sets of genomic regions with public and custom region set (genomic ranges) databases. This makes it possible to do automated enrichment analysis for genomic region sets, thus facilitating interpretation of functional genomics and epigenomics data.

Maintained by Nathan Sheffield. Last updated 5 months ago.

genesetenrichment generegulation genomeannotation systemsbiology functionalgenomics chipseq methylseq sequencing

11.6 match 76 stars 9.34 score 160 scripts

bioc

SRAdb:A compilation of metadata from NCBI SRA and tools

The Sequence Read Archive (SRA) is the largest public repository of sequencing data from the next generation of sequencing platforms including Roche 454 GS System, Illumina Genome Analyzer, Applied Biosystems SOLiD System, Helicos Heliscope, and others. However, finding data of interest can be challenging using current tools. SRAdb is an attempt to make access to the metadata associated with submission, study, sample, experiment and run much more feasible. This is accomplished by parsing all the NCBI SRA metadata into a SQLite database that can be stored and queried locally. Fulltext search in the package make querying metadata very flexible and powerful. fastq and sra files can be downloaded for doing alignment locally. Beside ftp protocol, the SRAdb has funcitons supporting fastp protocol (ascp from Aspera Connect) for faster downloading large data files over long distance. The SQLite database is updated regularly as new data is added to SRA and can be downloaded at will for the most up-to-date metadata.

Maintained by Jack Zhu. Last updated 3 months ago.

infrastructure sequencing dataimport

13.8 match 2 stars 7.81 score 200 scripts

bioc

EDASeq:Exploratory Data Analysis and Normalization for RNA-Seq

Numerical and graphical summaries of RNA-Seq read data. Within-lane normalization procedures to adjust for GC-content effect (or other gene-level effects) on read counts: loess robust local regression, global-scaling, and full-quantile normalization (Risso et al., 2011). Between-lane normalization procedures to adjust for distributional differences between lanes (e.g., sequencing depth): global-scaling and full-quantile normalization (Bullard et al., 2010).

Maintained by Davide Risso. Last updated 5 months ago.

immunooncology sequencing rnaseq preprocessing qualitycontrol differentialexpression

10.5 match 5 stars 10.24 score 594 scripts 9 dependents

bioc

R453Plus1Toolbox:A package for importing and analyzing data from Roche's Genome Sequencer System

The R453Plus1 Toolbox comprises useful functions for the analysis of data generated by Roche's 454 sequencing platform. It adds functions for quality assurance as well as for annotation and visualization of detected variants, complementing the software tools shipped by Roche with their product. Further, a pipeline for the detection of structural variants is provided.

Maintained by Hans-Ulrich Klein. Last updated 5 months ago.

sequencing infrastructure dataimport datarepresentation visualization qualitycontrol reportwriting

30.8 match 3.48 score 10 scripts

bioc

SPsimSeq:Semi-parametric simulation tool for bulk and single-cell RNA sequencing data

SPsimSeq uses a specially designed exponential family for density estimation to constructs the distribution of gene expression levels from a given real RNA sequencing data (single-cell or bulk), and subsequently simulates a new dataset from the estimated marginal distributions using Gaussian-copulas to retain the dependence between genes. It allows simulation of multiple groups and batches with any required sample size and library size.

Maintained by Joris Meys. Last updated 5 months ago.

geneexpression rnaseq singlecell sequencing dnaseq

14.9 match 10 stars 7.14 score 29 scripts 1 dependents

bioc

Rbwa:R wrapper for BWA-backtrack and BWA-MEM aligners

Provides an R wrapper for BWA alignment algorithms. Both BWA-backtrack and BWA-MEM are available. Convenience function to build a BWA index from a reference genome is also provided. Currently not supported for Windows machines.

Maintained by Jean-Philippe Fortin. Last updated 5 months ago.

sequencing alignment bwa bwa-mem dna-sequences dna-sequencing

25.0 match 1 stars 4.26 score 10 scripts 1 dependents

bioc

dmrseq:Detection and inference of differentially methylated regions from Whole Genome Bisulfite Sequencing

This package implements an approach for scanning the genome to detect and perform accurate inference on differentially methylated regions from Whole Genome Bisulfite Sequencing data. The method is based on comparing detected regions to a pooled null distribution, that can be implemented even when as few as two samples per population are available. Region-level statistics are obtained by fitting a generalized least squares (GLS) regression model with a nested autoregressive correlated error structure for the effect of interest on transformed methylation proportions.

Maintained by Keegan Korthauer. Last updated 5 months ago.

immunooncology dnamethylation epigenetics multiplecomparison software sequencing differentialmethylation wholegenome regression functionalgenomics

16.6 match 6.39 score 59 scripts 1 dependents

bioc

Glimma:Interactive visualizations for gene expression analysis

This package produces interactive visualizations for RNA-seq data analysis, utilizing output from limma, edgeR, or DESeq2. It produces interactive htmlwidgets versions of popular RNA-seq analysis plots to enhance the exploration of analysis results by overlaying interactive features. The plots can be viewed in a web browser or embedded in notebook documents.

Maintained by Shian Su. Last updated 1 months ago.

differentialexpression geneexpression microarray reportwriting rnaseq sequencing visualization differential-expression interactive-visualizations

10.0 match 32 stars 10.58 score 600 scripts 1 dependents

bioc

zinbwave:Zero-Inflated Negative Binomial Model for RNA-Seq Data

Implements a general and flexible zero-inflated negative binomial model that can be used to provide a low-dimensional representations of single-cell RNA-seq data. The model accounts for zero inflation (dropouts), over-dispersion, and the count nature of the data. The model also accounts for the difference in library sizes and optionally for batch effects and/or other covariates, avoiding the need for pre-normalize the data.

Maintained by Davide Risso. Last updated 5 months ago.

immunooncology dimensionreduction geneexpression rnaseq software transcriptomics sequencing singlecell

10.0 match 43 stars 10.53 score 190 scripts 6 dependents

bioc

scds:In-Silico Annotation of Doublets for Single Cell RNA Sequencing Data

In single cell RNA sequencing (scRNA-seq) data combinations of cells are sometimes considered a single cell (doublets). The scds package provides methods to annotate doublets in scRNA-seq data computationally.

Maintained by Dennis Kostka. Last updated 5 months ago.

singlecell rnaseq qualitycontrol preprocessing transcriptomics geneexpression sequencing software classification

16.0 match 6.57 score 176 scripts 1 dependents

bioc

YAPSA:Yet Another Package for Signature Analysis

This package provides functions and routines for supervised analyses of mutational signatures (i.e., the signatures have to be known, cf. L. Alexandrov et al., Nature 2013 and L. Alexandrov et al., Bioaxiv 2018). In particular, the family of functions LCD (LCD = linear combination decomposition) can use optimal signature-specific cutoffs which takes care of different detectability of the different signatures. Moreover, the package provides different sets of mutational signatures, including the COSMIC and PCAWG SNV signatures and the PCAWG Indel signatures; the latter infering that with YAPSA, the concept of supervised analysis of mutational signatures is extended to Indel signatures. YAPSA also provides confidence intervals as computed by profile likelihoods and can perform signature analysis on a stratified mutational catalogue (SMC = stratify mutational catalogue) in order to analyze enrichment and depletion patterns for the signatures in different strata.

Maintained by Zuguang Gu. Last updated 5 months ago.

sequencing dnaseq somaticmutation visualization clustering genomicvariation statisticalmethod biologicalquestion

16.4 match 6.41 score 57 scripts

zachcp

msaR:Multiple Sequence Alignment for R Shiny

Visualises multiple sequence alignments dynamically within the Shiny web application framework.

Maintained by Zachary Charlop-Powers. Last updated 3 years ago.

alignment htmlwidgets multiple sequence

19.6 match 8 stars 5.37 score 58 scripts

bioc

OmnipathR:OmniPath web service client and more

A client for the OmniPath web service (https://www.omnipathdb.org) and many other resources. It also includes functions to transform and pretty print some of the downloaded data, functions to access a number of other resources such as BioPlex, ConsensusPathDB, EVEX, Gene Ontology, Guide to Pharmacology (IUPHAR/BPS), Harmonizome, HTRIdb, Human Phenotype Ontology, InWeb InBioMap, KEGG Pathway, Pathway Commons, Ramilowski et al. 2015, RegNetwork, ReMap, TF census, TRRUST and Vinayagam et al. 2011. Furthermore, OmnipathR features a close integration with the NicheNet method for ligand activity prediction from transcriptomics data, and its R implementation `nichenetr` (available only on github).

Maintained by Denes Turei. Last updated 19 days ago.

graphandnetwork network pathways software thirdpartyclient dataimport datarepresentation genesignaling generegulation systemsbiology transcriptomics singlecell annotation kegg complexes enzyme-ptm networks networks-biology omnipath proteins quarto

10.6 match 126 stars 9.90 score 226 scripts 2 dependents

causal-lda

TrialEmulation:Causal Analysis of Observational Time-to-Event Data

Implements target trial emulation methods to apply randomized clinical trial design and analysis in an observational setting. Using marginal structural models, it can estimate intention-to-treat and per-protocol effects in emulated trials using electronic health records. A description and application of the method can be found in Danaei et al (2013) <doi:10.1177/0962280211403603>.

Maintained by Isaac Gravestock. Last updated 23 days ago.

causal-inference longitudinal-data survival-analysis cpp

13.5 match 25 stars 7.72 score 29 scripts

bioc

AneuFinder:Analysis of Copy Number Variation in Single-Cell-Sequencing Data

AneuFinder implements functions for copy-number detection, breakpoint detection, and karyotype and heterogeneity analysis in single-cell whole genome sequencing and strand-seq data.

Maintained by Aaron Taudt. Last updated 5 months ago.

immunooncology software sequencing singlecell copynumbervariation genomicvariation hiddenmarkovmodel wholegenome cpp

13.3 match 17 stars 7.70 score 37 scripts

bioc

muscat:Multi-sample multi-group scRNA-seq data analysis tools

`muscat` provides various methods and visualization tools for DS analysis in multi-sample, multi-group, multi-(cell-)subpopulation scRNA-seq data, including cell-level mixed models and methods based on aggregated “pseudobulk” data, as well as a flexible simulation platform that mimics both single and multi-sample scRNA-seq data.

Maintained by Helena L. Crowell. Last updated 5 months ago.

immunooncology differentialexpression sequencing singlecell software statisticalmethod visualization

10.0 match 181 stars 10.26 score 686 scripts

bioc

QuasR:Quantify and Annotate Short Reads in R

This package provides a framework for the quantification and analysis of Short Reads. It covers a complete workflow starting from raw sequence reads, over creation of alignments and quality control plots, to the quantification of genomic regions of interest. Read alignments are either generated through Rbowtie (data from DNA/ChIP/ATAC/Bis-seq experiments) or Rhisat2 (data from RNA-seq experiments that require spliced alignments), or can be provided in the form of bam files.

Maintained by Michael Stadler. Last updated 24 days ago.

genetics preprocessing sequencing chipseq rnaseq methylseq coverage alignment qualitycontrol immunooncology curl bzip2 xz-utils zlib cpp

11.8 match 6 stars 8.70 score 79 scripts 1 dependents

bioc

scone:Single Cell Overview of Normalized Expression data

SCONE is an R package for comparing and ranking the performance of different normalization schemes for single-cell RNA-seq and other high-throughput analyses.

Maintained by Davide Risso. Last updated 25 days ago.

immunooncology normalization preprocessing qualitycontrol geneexpression rnaseq software transcriptomics sequencing singlecell coverage

11.2 match 53 stars 9.12 score 104 scripts

bioc

scuttle:Single-Cell RNA-Seq Analysis Utilities

Provides basic utility functions for performing single-cell analyses, focusing on simple normalization, quality control and data transformations. Also provides some helper functions to assist development of other packages.

Maintained by Aaron Lun. Last updated 5 months ago.

immunooncology singlecell rnaseq qualitycontrol preprocessing normalization transcriptomics geneexpression sequencing software dataimport openblas cpp

10.0 match 10.21 score 1.7k scripts 80 dependents

bioc

CircSeqAlignTk:A toolkit for end-to-end analysis of RNA-seq data for circular genomes

CircSeqAlignTk is designed for end-to-end RNA-Seq data analysis of circular genome sequences, from alignment to visualization. It mainly targets viroids which are composed of 246-401 nt circular RNAs. In addition, CircSeqAlignTk implements a tidy interface to generate synthetic sequencing data that mimic real RNA-Seq data, allowing developers to evaluate the performance of alignment tools and workflows.

Maintained by Jianqiang Sun. Last updated 5 months ago.

sequencing smallrna alignment software

23.1 match 4.40 score 3 scripts

bioc

clustifyr:Classifier for Single-cell RNA-seq Using Cell Clusters

Package designed to aid in classifying cells from single-cell RNA sequencing data using external reference data (e.g., bulk RNA-seq, scRNA-seq, microarray, gene lists). A variety of correlation based methods and gene list enrichment methods are provided to assist cell type assignment.

Maintained by Rui Fu. Last updated 5 months ago.

singlecell annotation sequencing microarray geneexpression assign-identities clusters marker-genes rna-seq single-cell-rna-seq

10.5 match 119 stars 9.63 score 296 scripts

bioc

QSutils:Quasispecies Diversity

Set of utility functions for viral quasispecies analysis with NGS data. Most functions are equally useful for metagenomic studies. There are three main types: (1) data manipulation and exploration—functions useful for converting reads to haplotypes and frequencies, repairing reads, intersecting strand haplotypes, and visualizing haplotype alignments. (2) diversity indices—functions to compute diversity and entropy, in which incidence, abundance, and functional indices are considered. (3) data simulation—functions useful for generating random viral quasispecies data.

Maintained by Mercedes Guerrero-Murillo. Last updated 5 months ago.

software genetics dnaseq geneticvariability sequencing alignment sequencematching dataimport

18.1 match 5.56 score 8 scripts 1 dependents

bioc

ModCon:Modifying splice site usage by changing the mRNP code, while maintaining the genetic code

Collection of functions to calculate a nucleotide sequence surrounding for splice donors sites to either activate or repress donor usage. The proposed alternative nucleotide sequence encodes the same amino acid and could be applied e.g. in reporter systems to silence or activate cryptic splice donor sites.

Maintained by Johannes Ptok. Last updated 5 months ago.

functionalgenomics alternativesplicing

25.1 match 1 stars 4.00 score 2 scripts

bioc

SOMNiBUS:Smooth modeling of bisulfite sequencing

This package aims to analyse count-based methylation data on predefined genomic regions, such as those obtained by targeted sequencing, and thus to identify differentially methylated regions (DMRs) that are associated with phenotypes or traits. The method is built a rich flexible model that allows for the effects, on the methylation levels, of multiple covariates to vary smoothly along genomic regions. At the same time, this method also allows for sequencing errors and can adjust for variability in cell type mixture.

Maintained by Kathleen Klein. Last updated 3 months ago.

dnamethylation regression epigenetics differentialmethylation sequencing functionalprediction

23.3 match 1 stars 4.30 score 3 scripts

bioc

CAGEr:Analysis of CAGE (Cap Analysis of Gene Expression) sequencing data for precise mapping of transcription start sites and promoterome mining

The _CAGEr_ package identifies transcription start sites (TSS) and their usage frequency from CAGE (Cap Analysis Gene Expression) sequencing data. It normalises raw CAGE tag count, clusters TSSs into tag clusters (TC) and aggregates them across multiple CAGE experiments to construct consensus clusters (CC) representing the promoterome. CAGEr provides functions to profile expression levels of these clusters by cumulative expression and rarefaction analysis, and outputs the plots in ggplot2 format for further facetting and customisation. After clustering, CAGEr performs analyses of promoter width and detects differential usage of TSSs (promoter shifting) between samples. CAGEr also exports its data as genome browser tracks, and as R objects for downsteam expression analysis by other Bioconductor packages such as DESeq2, CAGEfightR, or seqArchR.

Maintained by Charles Plessy. Last updated 5 months ago.

preprocessing sequencing normalization functionalgenomics transcription geneexpression clustering visualization

16.4 match 6.12 score 73 scripts

bioc

tidybulk:Brings transcriptomics to the tidyverse

This is a collection of utility functions that allow to perform exploration of and calculations to RNA sequencing data, in a modular, pipe-friendly and tidy fashion.

Maintained by Stefano Mangiola. Last updated 5 months ago.

assaydomain infrastructure rnaseq differentialexpression geneexpression normalization clustering qualitycontrol sequencing transcription transcriptomics bioconductor bulk-transcriptional-analyses deseq2 differential-expression edger ensembl-ids entrez gene-symbols gsea mds-dimensions pca pipe redundancy tibble tidy tidy-data tidyverse transcripts tsne

10.5 match 168 stars 9.48 score 172 scripts 1 dependents

bioc

rGREAT:GREAT Analysis - Functional Enrichment on Genomic Regions

GREAT (Genomic Regions Enrichment of Annotations Tool) is a type of functional enrichment analysis directly performed on genomic regions. This package implements the GREAT algorithm (the local GREAT analysis), also it supports directly interacting with the GREAT web service (the online GREAT analysis). Both analysis can be viewed by a Shiny application. rGREAT by default supports more than 600 organisms and a large number of gene set collections, as well as self-provided gene sets and organisms from users. Additionally, it implements a general method for dealing with background regions.

Maintained by Zuguang Gu. Last updated 4 days ago.

genesetenrichment go pathways software sequencing wholegenome genomeannotation coverage cpp

10.0 match 86 stars 9.96 score 320 scripts 1 dependents

dsstoffer

astsa:Applied Statistical Time Series Analysis

Contains data sets and scripts for analyzing time series in both the frequency and time domains including state space modeling as well as supporting the texts Time Series Analysis and Its Applications: With R Examples (5th ed), by R.H. Shumway and D.S. Stoffer. Springer Texts in Statistics, 2025, <https://link.springer.com/book/9783031705830>, and Time Series: A Data Analysis Approach Using R. Chapman-Hall, 2019, <DOI:10.1201/9780429273285>.

Maintained by David Stoffer. Last updated 2 months ago.

12.4 match 7 stars 7.88 score 2.2k scripts 8 dependents

bioc

ChIPpeakAnno:Batch annotation of the peaks identified from either ChIP-seq, ChIP-chip experiments, or any experiments that result in large number of genomic interval data

The package encompasses a range of functions for identifying the closest gene, exon, miRNA, or custom features—such as highly conserved elements and user-supplied transcription factor binding sites. Additionally, users can retrieve sequences around the peaks and obtain enriched Gene Ontology (GO) or Pathway terms. In version 2.0.5 and beyond, new functionalities have been introduced. These include features for identifying peaks associated with bi-directional promoters along with summary statistics (peaksNearBDP), summarizing motif occurrences in peaks (summarizePatternInPeaks), and associating additional identifiers with annotated peaks or enrichedGO (addGeneIDs). The package integrates with various other packages such as biomaRt, IRanges, Biostrings, BSgenome, GO.db, multtest, and stat to enhance its analytical capabilities.

Maintained by Jianhong Ou. Last updated 2 months ago.

annotation chipseq chipchip

11.2 match 8.75 score 584 scripts 6 dependents