Showing 200 of 1185 results
ropensci
biomartr:Genomic Data Retrieval
Perform large scale genomic data retrieval and functional annotation retrieval. This package aims to provide users with a standardized way to automate genome, proteome, 'RNA', coding sequence ('CDS'), 'GFF', and metagenome retrieval from 'NCBI RefSeq', 'NCBI Genbank', 'ENSEMBL', and 'UniProt' databases. Furthermore, an interface to the 'BioMart' database (Smedley et al. (2009) <doi:10.1186/1471-2164-10-22>) allows users to retrieve functional annotation for genomic loci. In addition, users can download entire databases such as 'NCBI RefSeq' (Pruitt et al. (2007) <doi:10.1093/nar/gkl842>), 'NCBI nr', 'NCBI nt', 'NCBI Genbank' (Benson et al. (2013) <doi:10.1093/nar/gks1195>), etc. with only one command.
Maintained by Hajk-Georg Drost. Last updated 2 months ago.
biomart, genomic-data-retrieval, annotation-retrieval, database-retrieval, ncbi, ensembl, biological-data-retrieval, ensembl-servers, genome, genome-annotation, genome-retrieval, genomics, meta-analysis, metagenomics, ncbi-genbank, peer-reviewed, proteome, sequenced-genomes
113.7 match 218 stars 11.35 score 129 scripts 3 dependents
bioc
genomation:Summary, annotation and visualization of genomic data
A package for summary and annotation of genomic intervals. Users can visualize and quantify genomic intervals over pre-defined functional regions, such as promoters, exons, introns, etc. The genomic intervals represent regions with a defined chromosome position, which may be associated with a score, such as aligned reads from HT-seq experiments, TF binding sites, methylation scores, etc. The package can use any tabular genomic feature data as long as it has minimal information on the locations of genomic intervals. In addition, it can use BAM or BigWig files as input.
Maintained by Altuna Akalin. Last updated 5 months ago.
annotation, sequencing, visualization, cpgisland, cpp
80.7 match 75 stars 11.09 score 738 scripts 5 dependents
shixiangwang
sigminer:Extract, Analyze and Visualize Mutational Signatures for Genomic Variations
Genomic alterations including single nucleotide substitution, copy number alteration, etc. are the major force for cancer initialization and development. Due to the specificity of molecular lesions caused by genomic alterations, we can generate characteristic alteration spectra, called 'signature' (Wang, Shixiang, et al. (2021) <DOI:10.1371/journal.pgen.1009557> & Alexandrov, Ludmil B., et al. (2020) <DOI:10.1038/s41586-020-1943-3> & Steele Christopher D., et al. (2022) <DOI:10.1038/s41586-022-04738-6>). This package helps users to extract, analyze and visualize signatures from genomic alteration records, thus providing new insight into cancer study.
Maintained by Shixiang Wang. Last updated 6 months ago.
bayesian-nmf, bioinformatics, cancer-research, cnv, copynumber-signatures, cosmic-signatures, dbs, easy-to-use, indel, mutational-signatures, nmf, nmf-extraction, sbs, signature-extraction, somatic-mutations, somatic-variants, visualization, cpp
80.4 match 150 stars 9.48 score 123 scripts 2 dependents
stuart-lab
Signac:Analysis of Single-Cell Chromatin Data
A framework for the analysis and exploration of single-cell chromatin data. The 'Signac' package contains functions for quantifying single-cell chromatin data, computing per-cell quality control metrics, dimension reduction and normalization, visualization, and DNA sequence motif analysis. Reference: Stuart et al. (2021) <doi:10.1038/s41592-021-01282-5>.
Maintained by Tim Stuart. Last updated 7 months ago.
atac, bioinformatics, single-cell, zlib, cpp
58.3 match 355 stars 12.18 score 3.7k scripts 1 dependent
thackl
gggenomes:A Grammar of Graphics for Comparative Genomics
An extension of 'ggplot2' for creating complex genomic maps. It builds on the power of 'ggplot2' and 'tidyverse' adding new 'ggplot2'-style geoms & positions and 'dplyr'-style verbs to manipulate the underlying data. It implements a layout concept inspired by 'ggraph' and introduces tracks to bring tidiness to the mess that is genomics data.
Maintained by Thomas Hackl. Last updated 2 months ago.
biological-data, comparative-genomics, genomics-visualization, ggplot-extension, ggplot2
57.2 match 650 stars 9.56 score 123 scripts
jokergoo
circlize:Circular Visualization
Circular layout is an efficient way for the visualization of huge amounts of information. Here this package provides an implementation of circular layout generation in R as well as an enhancement of available software. The flexibility of the package is based on the usage of low-level graphics functions such that self-defined high-level graphics can be easily implemented by users for specific purposes. Together with the seamless connection between the powerful computational and visual environment in R, it gives users more convenience and freedom to design figures for better understanding complex patterns behind multiple dimensional data. The package is described in Gu et al. 2014 <doi:10.1093/bioinformatics/btu393>.
Maintained by Zuguang Gu. Last updated 1 year ago.
34.6 match 983 stars 15.62 score 10k scripts 213 dependents
kbroman
qtl:Tools for Analyzing QTL Experiments
Analysis of experimental crosses to identify genes (called quantitative trait loci, QTLs) contributing to variation in quantitative traits. Broman et al. (2003) <doi:10.1093/bioinformatics/btg112>.
Maintained by Karl W Broman. Last updated 7 months ago.
38.6 match 80 stars 12.79 score 2.4k scripts 29 dependents
bioc
rtracklayer:R interface to genome annotation files and the UCSC genome browser
Extensible framework for interacting with multiple genome browsers (currently UCSC built-in) and manipulating annotation tracks in various formats (currently GFF, BED, bedGraph, BED15, WIG, BigWig and 2bit built-in). The user may export/import tracks to/from the supported browsers, as well as query and modify the browser state, such as the current viewport.
Maintained by Michael Lawrence. Last updated 3 days ago.
annotation, visualization, dataimport, zlib, openssl, curl
36.7 match 12.66 score 6.7k scripts 480 dependents
bioc
GenomicRanges:Representation and manipulation of genomic intervals
The ability to efficiently represent and manipulate genomic annotations and alignments is playing a central role when it comes to analyzing high-throughput sequencing data (a.k.a. NGS data). The GenomicRanges package defines general purpose containers for storing and manipulating genomic intervals and variables defined along a genome. More specialized containers for representing and manipulating short alignments against a reference genome, or a matrix-like summarization of an experiment, are defined in the GenomicAlignments and SummarizedExperiment packages, respectively. Both packages build on top of the GenomicRanges infrastructure.
Maintained by Hervé Pagès. Last updated 4 months ago.
genetics, infrastructure, datarepresentation, sequencing, annotation, genomeannotation, coverage, bioconductor-package, core-package
26.2 match 44 stars 17.68 score 13k scripts 1.3k dependents
bioc
maftools:Summarize, Analyze and Visualize MAF Files
Analyze and visualize Mutation Annotation Format (MAF) files from large scale sequencing studies. This package provides various functions to perform most commonly used analyses in cancer genomics and to create feature-rich customizable visualizations with minimal effort.
Maintained by Anand Mayakonda. Last updated 5 months ago.
datarepresentation, dnaseq, visualization, drivermutation, variantannotation, featureextraction, classification, somaticmutation, sequencing, functionalgenomics, survival, bioinformatics, cancer-genome-atlas, cancer-genomics, genomics, maf-files, tcga, curl, bzip2, xz-utils, zlib
28.9 match 459 stars 14.63 score 948 scripts 18 dependents
bioc
mixOmics:Omics Data Integration Project
Multivariate methods are well suited to large omics data sets where the number of variables (e.g. genes, proteins, metabolites) is much larger than the number of samples (patients, cells, mice). They have the appealing properties of reducing the dimension of the data by using instrumental variables (components), which are defined as combinations of all variables. Those components are then used to produce useful graphical outputs that enable better understanding of the relationships and correlation structures between the different data sets that are integrated. mixOmics offers a wide range of multivariate methods for the exploration and integration of biological datasets with a particular focus on variable selection. The package proposes several sparse multivariate models we have developed to identify the key variables that are highly correlated, and/or explain the biological outcome of interest. The data that can be analysed with mixOmics may come from high throughput sequencing technologies, such as omics data (transcriptomics, metabolomics, proteomics, metagenomics etc) but also beyond the realm of omics (e.g. spectral imaging). The methods implemented in mixOmics can also handle missing values without having to delete entire rows with missing data. A non-exhaustive list of methods includes variants of generalised Canonical Correlation Analysis, sparse Partial Least Squares and sparse Discriminant Analysis. Recently we implemented integrative methods to combine multiple data sets: N-integration with variants of Generalised Canonical Correlation Analysis and P-integration with variants of multi-group Partial Least Squares.
Maintained by Eva Hamrud. Last updated 16 days ago.
immunooncology, microarray, sequencing, metabolomics, metagenomics, proteomics, geneprediction, multiplecomparison, classification, regression, bioconductor, genomics, genomics-data, genomics-visualization, multivariate-analysis, multivariate-statistics, omics, r-pkg, r-project
25.0 match 182 stars 13.71 score 1.3k scripts 22 dependents
bioc
GenomicFeatures:Query the gene models of a given organism/assembly
Extract the genomic locations of genes, transcripts, exons, introns, and CDS, for the gene models stored in a TxDb object. A TxDb object is a small database that contains the gene models of a given organism/assembly. Bioconductor provides a small collection of TxDb objects in the form of ready-to-install TxDb packages for the most commonly studied organisms. Additionally, the user can easily make a TxDb object (or package) for the organism/assembly of their choice by using the tools from the txdbmaker package.
Maintained by H. Pagès. Last updated 5 months ago.
genetics, infrastructure, annotation, sequencing, genomeannotation, bioconductor-package, core-package
22.3 match 26 stars 15.34 score 5.3k scripts 339 dependents
bioc
ORFik:Open Reading Frames in Genomics
R package for analysis of transcript and translation features through manipulation of sequence data and NGS data like Ribo-Seq, RNA-Seq, TCP-Seq and CAGE. It is generalized in the sense that any transcript region can be analysed, although, as the name hints, it was made with investigation of ribosomal patterns over Open Reading Frames (ORFs) as its primary use case. ORFik is extremely fast through use of C++, data.table and GenomicRanges. The package allows reassigning the starts of transcripts using CAGE-Seq data, automatic shifting of RiboSeq reads, finding Open Reading Frames for whole genomes, and much more.
Maintained by Haakon Tjeldnes. Last updated 1 month ago.
immunooncology, software, sequencing, riboseq, rnaseq, functionalgenomics, coverage, alignment, dataimport, cpp
31.9 match 33 stars 10.56 score 115 scripts 2 dependents
tickingclock1992
RIdeogram:Drawing SVG Graphics to Visualize and Map Genome-Wide Data on Idiograms
For whole-genome analysis, idiograms are virtually the most intuitive and effective way to map and visualize the genome-wide information. RIdeogram was developed to visualize and map whole-genome data on idiograms with no restriction of species.
Maintained by Zhaodong Hao. Last updated 5 years ago.
39.7 match 169 stars 7.97 score 62 scripts
bioc
karyoploteR:Plot customizable linear genomes displaying arbitrary data
karyoploteR creates karyotype plots of arbitrary genomes and offers a complete set of functions to plot arbitrary data on them. It mimics many R base graphics functions, coupling them with a coordinate change function that automatically maps the chromosome and data coordinates into the plot coordinates. In addition to the provided data plotting functions, it is easy to add new ones.
Maintained by Bernat Gel. Last updated 5 months ago.
visualization, copynumbervariation, sequencing, coverage, dnaseq, chipseq, methylseq, dataimport, onechannel, bioconductor, bioinformatics, data-visualization, genome, genomics-visualization, plotting-in-r
26.4 match 306 stars 11.22 score 656 scripts 4 dependents
bioc
plyranges:A fluent interface for manipulating GenomicRanges
A dplyr-like interface for interacting with the common Bioconductor classes Ranges and GenomicRanges. By providing a grammatical and consistent way of manipulating these classes, their accessibility for new Bioconductor users is hopefully increased.
Maintained by Michael Love. Last updated 9 days ago.
infrastructure, datarepresentation, workflowstep, coverage, bioconductor, data-analysis, dplyr, genomic-ranges, genomics, tidy-data
23.0 match 144 stars 12.66 score 1.9k scripts 20 dependents
bioc
plotgardener:Coordinate-Based Genomic Visualization Package for R
Coordinate-based genomic visualization package for R. It grants users the ability to programmatically produce complex, multi-paneled figures. Tailored for genomics, plotgardener allows users to visualize large complex genomic datasets and provides exquisite control over how plots are placed and arranged on a page.
Maintained by Nicole Kramer. Last updated 5 months ago.
visualization, genomeannotation, functionalgenomics, genomeassembly, hic, cpp
28.5 match 309 stars 10.17 score 167 scripts 3 dependents
satijalab
Seurat:Tools for Single Cell Genomics
A toolkit for quality control, analysis, and exploration of single cell RNA sequencing data. 'Seurat' aims to enable users to identify and interpret sources of heterogeneity from single cell transcriptomic measurements, and to integrate diverse types of single cell data. See Satija R, Farrell J, Gennert D, et al (2015) <doi:10.1038/nbt.3192>, Macosko E, Basu A, Satija R, et al (2015) <doi:10.1016/j.cell.2015.05.002>, Stuart T, Butler A, et al (2019) <doi:10.1016/j.cell.2019.05.031>, and Hao, Hao, et al (2020) <doi:10.1101/2020.10.12.335331> for more details.
Maintained by Paul Hoffman. Last updated 1 year ago.
human-cell-atlas, single-cell-genomics, single-cell-rna-seq, cpp
16.8 match 2.4k stars 16.86 score 50k scripts 73 dependents
bioc
annotatr:Annotation of Genomic Regions to Genomic Annotations
Given a set of genomic sites/regions (e.g. ChIP-seq peaks, CpGs, differentially methylated CpGs or regions, SNPs, etc.) it is often of interest to investigate the intersecting genomic annotations. Such annotations include those relating to gene models (promoters, 5'UTRs, exons, introns, and 3'UTRs), CpGs (CpG islands, CpG shores, CpG shelves), or regulatory sequences such as enhancers. The annotatr package provides an easy way to summarize and visualize the intersection of genomic sites/regions with genomic annotations.
Maintained by Raymond G. Cavalcante. Last updated 5 months ago.
software, annotation, genomeannotation, functionalgenomics, visualization, genome-annotation
28.4 match 26 stars 9.76 score 246 scripts 5 dependents
hzhanghenry
RCircos:Circos 2D Track Plot
A simple and flexible way to generate Circos 2D track plot images for genomic data visualization is implemented in this package. The types of plots include heatmap, histogram, lines, scatterplot, and tiles; plot items for further decoration include connector, link (lines and ribbons), and text (gene) label. All functions require only the R graphics package that comes with the base R installation.
Maintained by Hongen Zhang. Last updated 3 years ago.
35.9 match 6 stars 7.21 score 298 scripts 3 dependents
bioc
igvR:igvR: integrative genomics viewer
Access to igv.js, the Integrative Genomics Viewer running in a web browser.
Maintained by Arkadiusz Gladki. Last updated 5 months ago.
visualization, thirdpartyclient, genomebrowsers
30.3 match 45 stars 8.33 score 118 scripts
bioc
ensembldb:Utilities to create and use Ensembl-based annotation databases
The package provides functions to create and use transcript-centric annotation databases/packages. The annotation for the databases is directly fetched from Ensembl using their Perl API. The functionality and data are similar to those of the TxDb packages from the GenomicFeatures package, but, in addition to retrieving all gene/transcript models and annotations from the database, ensembldb provides a filter framework that allows retrieving annotations for specific entries like genes encoded on a chromosome region or transcript models of lincRNA genes. EnsDb databases built with ensembldb also contain protein annotations and mappings between proteins and their encoding transcripts. Finally, ensembldb provides functions to map between genomic, transcript and protein coordinates.
Maintained by Johannes Rainer. Last updated 5 months ago.
genetics, annotationdata, sequencing, coverage, annotation, bioconductor, bioconductor-packages, ensembl
17.7 match 35 stars 14.08 score 892 scripts 108 dependents
bioc
GenomicAlignments:Representation and manipulation of short genomic alignments
Provides efficient containers for storing and manipulating short genomic alignments (typically obtained by aligning short reads to a reference genome). This includes read counting, computing the coverage, junction detection, and working with the nucleotide content of the alignments.
Maintained by Hervé Pagès. Last updated 5 months ago.
infrastructure, dataimport, genetics, sequencing, rnaseq, snp, coverage, alignment, immunooncology, bioconductor-package, core-package
16.2 match 10 stars 15.21 score 3.1k scripts 528 dependents
knausb
vcfR:Manipulate and Visualize VCF Data
Facilitates easy manipulation of variant call format (VCF) data. Functions are provided to rapidly read from and write to VCF files. Once VCF data is read into R a parser function extracts matrices of data. This information can then be used for quality control or other purposes. Additional functions provide visualization of genomic data. Once processing is complete data may be written to a VCF file (*.vcf.gz). It also may be converted into other popular R objects (e.g., genlight, DNAbin). VcfR provides a link between VCF data and familiar R software.
Maintained by Brian J. Knaus. Last updated 1 month ago.
genomics, population-genetics, population-genomics, rcpp, vcf-data, visualization, zlib, cpp
18.0 match 256 stars 13.66 score 3.1k scripts 19 dependents
rnabioco
valr:Genome Interval Arithmetic
Read and manipulate genome intervals and signals. Provides functionality similar to command-line tool suites within R, enabling interactive analysis and visualization of genome-scale data. Riemondy et al. (2017) <doi:10.12688/f1000research.11997.1>.
Maintained by Kent Riemondy. Last updated 19 days ago.
bedtools, genome, interval-arithmetic, cpp
25.3 match 90 stars 9.69 score 227 scripts
bioc
GenomicDistributions:GenomicDistributions: fast analysis of genomic intervals with Bioconductor
If you have a set of genomic ranges, this package can help you with visualization and comparison. It produces several kinds of plots, for example: chromosome distribution plots, which visualize how your regions are distributed over chromosomes; feature distance distribution plots, which visualize how your regions are distributed relative to a feature of interest, like Transcription Start Sites (TSSs); genomic partition plots, which visualize how your regions overlap given genomic features such as promoters, introns, exons, or intergenic regions. It also makes it easy to compare one set of ranges to another.
Maintained by Kristyna Kupkova. Last updated 5 months ago.
software, genomeannotation, genomeassembly, datarepresentation, sequencing, coverage, functionalgenomics, visualization
32.4 match 26 stars 7.44 score 25 scripts
tanaylab
misha:Toolkit for Analysis of Genomic Data
A toolkit for analysis of genomic data. The 'misha' package implements an efficient data structure for storing genomic data, and provides a set of functions for data extraction, manipulation and analysis. Some of the 2D genome algorithms were described in Yaffe and Tanay (2011) <doi:10.1038/ng.947>.
Maintained by Aviezer Lifshitz. Last updated 17 days ago.
37.2 match 4 stars 5.86 score
bioc
genomes:Genome sequencing project metadata
Download genome and assembly reports from NCBI
Maintained by Chris Stubben. Last updated 5 months ago.
59.5 match 3.48 score 15 scripts
bioc
gdsfmt:R Interface to CoreArray Genomic Data Structure (GDS) Files
Provides a high-level R interface to CoreArray Genomic Data Structure (GDS) data files. GDS is portable across platforms with hierarchical structure to store multiple scalable array-oriented data sets with metadata information. It is suited for large-scale datasets, especially for data which are much larger than the available random-access memory. The gdsfmt package offers efficient operations specifically designed for integers of less than 8 bits, since a diploid genotype, like single-nucleotide polymorphism (SNP), usually occupies fewer bits than a byte. Data compression and decompression are available with relatively efficient random access. A GDS file can also be read in parallel by multiple R processes, as supported by the parallel package.
Maintained by Xiuwen Zheng. Last updated 14 days ago.
infrastructure, dataimport, bioinformatics, gds-format, genomics, cpp
17.8 match 18 stars 11.34 score 920 scripts 29 dependents
bioc
GenomicPlot:Plot profiles of next generation sequencing data in genomic features
Visualization of next generation sequencing (NGS) data is essential for interpreting high-throughput genomics experiment results. 'GenomicPlot' facilitates plotting of NGS data in various formats (bam, bed, wig and bigwig); both coverage and enrichment over input can be computed and displayed with respect to genomic features (such as UTR, CDS, enhancer), and user defined genomic loci or regions. Statistical tests on signal intensity within user defined regions of interest can be performed and represented as boxplots or bar graphs. Parallel processing is used to speed up computation on multicore platforms. In addition to genomic plots, which are suitable for displaying coverage of genomic DNA (such as ChIPseq data), metagenomic (without introns) plots can also be made for RNAseq or CLIPseq data.
Maintained by Shuye Pu. Last updated 2 months ago.
alternativesplicing, chipseq, coverage, geneexpression, rnaseq, sequencing, software, transcription, visualization, annotation
35.2 match 3 stars 5.62 score 4 scripts
bioc
MultiAssayExperiment:Software for the integration of multi-omics experiments in Bioconductor
Harmonize data management of multiple experimental assays performed on an overlapping set of specimens. It provides a familiar Bioconductor user experience by extending concepts from SummarizedExperiment, supporting an open-ended mix of standard data classes for individual assays, and allowing subsetting by genomic ranges or rownames. Facilities are provided for reshaping data into wide and long formats for adaptability to graphing and downstream analysis.
Maintained by Marcel Ramos. Last updated 2 months ago.
infrastructure, datarepresentation, bioconductor, bioconductor-package, genomics, nci-itcr, tcga, u24ca289073
13.1 match 71 stars 14.95 score 670 scripts 127 dependents
bioc
derfinder:Annotation-agnostic differential expression analysis of RNA-seq data at base-pair resolution via the DER Finder approach
This package provides functions for annotation-agnostic differential expression analysis of RNA-seq data. Two implementations of the DER Finder approach are included in this package: (1) single base-level F-statistics and (2) DER identification at the expressed regions-level. The DER Finder approach can also be used to identify differentially bound ChIP-seq peaks.
Maintained by Leonardo Collado-Torres. Last updated 4 months ago.
differentialexpression, sequencing, rnaseq, chipseq, differentialpeakcalling, software, immunooncology, coverage, annotation-agnostic, bioconductor, derfinder
18.6 match 42 stars 10.03 score 78 scripts 6 dependents
rqtl
qtl2:Quantitative Trait Locus Mapping in Experimental Crosses
Provides a set of tools to perform quantitative trait locus (QTL) analysis in experimental crosses. It is a reimplementation of the 'R/qtl' package to better handle high-dimensional data and complex cross designs. Broman et al. (2019) <doi:10.1534/genetics.118.301595>.
Maintained by Karl W Broman. Last updated 20 days ago.
19.6 match 34 stars 9.48 score 1.1k scripts 5 dependents
bioc
BSgenome:Software infrastructure for efficient representation of full genomes and their SNPs
Infrastructure shared by all the Biostrings-based genome data packages.
Maintained by Hervé Pagès. Last updated 2 months ago.
genetics, infrastructure, datarepresentation, sequencematching, annotation, snp, bioconductor-package, core-package
12.7 match 9 stars 14.12 score 1.2k scripts 267 dependents
bioc
cogeqc:Systematic quality checks on comparative genomics analyses
cogeqc aims to facilitate systematic quality checks on standard comparative genomics analyses to help researchers detect issues and select the most suitable parameters for each data set. cogeqc can be used to assess: i. genome assembly and annotation quality with BUSCOs and comparisons of statistics with publicly available genomes on the NCBI; ii. orthogroup inference using a protein domain-based approach; and iii. synteny detection using synteny network properties. There are also data visualization functions to explore QC summary statistics.
Maintained by Fabrício Almeida-Silva. Last updated 5 months ago.
software, genomeassembly, comparativegenomics, functionalgenomics, phylogenetics, qualitycontrol, network, comparative-genomics, evolutionary-genomics
29.4 match 10 stars 6.08 score 20 scripts
bioc
syntenet:Inference And Analysis Of Synteny Networks
syntenet can be used to infer synteny networks from whole-genome protein sequences and analyze them. Anchor pairs are detected with the MCScanX algorithm, which was ported to this package with the Rcpp framework for R and C++ integration. Anchor pairs from synteny analyses are treated as an undirected unweighted graph (i.e., a synteny network), and users can perform: i. network clustering; ii. phylogenomic profiling (by identifying which species contain which clusters) and; iii. microsynteny-based phylogeny reconstruction with maximum likelihood.
Maintained by Fabrício Almeida-Silva. Last updated 3 months ago.
software, networkinference, functionalgenomics, comparativegenomics, phylogenetics, systemsbiology, graphandnetwork, wholegenome, network, comparative-genomics, evolutionary-genomics, network-science, phylogenomics, synteny, synteny-network, cpp
25.9 match 28 stars 6.70 score 12 scripts 1 dependent
gaynorr
AlphaSimR:Breeding Program Simulations
The successor to the 'AlphaSim' software for breeding program simulation [Faux et al. (2016) <doi:10.3835/plantgenome2016.02.0013>]. Used for stochastic simulations of breeding programs to the level of DNA sequence for every individual. Contained is a wide range of functions for modeling common tasks in a breeding program, such as selection and crossing. These functions allow for constructing simulations of highly complex plant and animal breeding programs via scripting in the R software environment. Such simulations can be used to evaluate overall breeding program performance and conduct research into breeding program design, such as implementation of genomic selection. Included is the 'Markovian Coalescent Simulator' ('MaCS') for fast simulation of biallelic sequences according to a population demographic history [Chen et al. (2009) <doi:10.1101/gr.083634.108>].
Maintained by Chris Gaynor. Last updated 5 months ago.
breeding, genomics, simulation, openblas, cpp, openmp
16.6 match 47 stars 10.22 score 534 scripts 2 dependents
bioc
TitanCNA:Subclonal copy number and LOH prediction from whole genome sequencing of tumours
Hidden Markov model to segment and predict regions of subclonal copy number alterations (CNA) and loss of heterozygosity (LOH), and estimate cellular prevalence of clonal clusters in tumour whole genome sequencing data.
Maintained by Gavin Ha. Last updated 5 months ago.
sequencing, wholegenome, dnaseq, exomeseq, statisticalmethod, copynumbervariation, hiddenmarkovmodel, genetics, genomicvariation, immunooncology, 10x-genomics, copy-number-variation, genome-sequencing, hmm, tumor-heterogeneity
19.9 match 97 stars 8.47 score 68 scripts
bioc
GEOquery:Get data from NCBI Gene Expression Omnibus (GEO)
The NCBI Gene Expression Omnibus (GEO) is a public repository of microarray data. Given the rich and varied nature of this resource, it is only natural to want to apply BioConductor tools to these data. GEOquery is the bridge between GEO and BioConductor.
Maintained by Sean Davis. Last updated 5 months ago.
microarray, dataimport, onechannel, twochannel, sage, bioconductor, bioinformatics, data-science, genomics, ncbi-geo
11.7 match 92 stars 14.46 score 4.1k scripts 44 dependents
cbiit
LDlinkR:Calculating Linkage Disequilibrium (LD) in Human Population Groups of Interest
Provides access to the 'LDlink' API (<https://ldlink.nih.gov/?tab=apiaccess>) using the R console. This programmatic access facilitates researchers who are interested in performing batch queries in 1000 Genomes Project (2015) <doi:10.1038/nature15393> data using 'LDlink'. 'LDlink' is an interactive and powerful suite of web-based tools for querying germline variants in human population groups of interest. For more details, please see Machiela et al. (2015) <doi:10.1093/bioinformatics/btv402>.
Maintained by Timothy A. Myers. Last updated 12 months ago.
ld-calculator, ldlink, ldlink-api, ldlink-webtool, linkage-disequilibrium, population-genetics
18.2 match 58 stars 9.21 score 206 scripts 1 dependent
bioc
RAIDS:Accurate Inference of Genetic Ancestry from Cancer Sequences
This package implements specialized algorithms that enable genetic ancestry inference from various cancer sequences sources (RNA, Exome and Whole-Genome sequences). This package also implements a simulation algorithm that generates synthetic cancer-derived data. This code and analysis pipeline was designed and developed for the following publication: Belleau, P et al. Genetic Ancestry Inference from Cancer-Derived Molecular Data across Genomic and Transcriptomic Platforms. Cancer Res 1 January 2023; 83 (1): 49–58.
Maintained by Pascal Belleau. Last updated 5 months ago.
genetics, software, sequencing, wholegenome, principalcomponent, geneticvariability, dimensionreduction, biocviews, ancestry, cancer-genomics, exome-sequencing, genomics, inference, r-language, rna-seq, rna-sequencing, whole-genome-sequencing
26.1 match 5 stars 6.23 score 19 scripts
bioc
GenomicDataCommons:NIH / NCI Genomic Data Commons Access
Programmatically access the NIH / NCI Genomic Data Commons RESTful service.
Maintained by Sean Davis. Last updated 2 months ago.
dataimport, sequencing, api-client, bioconductor, bioinformatics, cancer, core-services, data-science, genomics, nci, tcga, vignette
13.5 match 87 stars 11.94 score 238 scripts 12 dependents
bioc
GenomAutomorphism:Compute the automorphisms between DNA's Abelian group representations
This is an R package to compute the automorphisms between pairwise aligned DNA sequences represented as elements from a Genomic Abelian group. In a general scenario, anything from genomic regions up to whole genomes from a given population (from any species or closely related species) can be algebraically represented as a direct sum of cyclic groups or, more specifically, Abelian p-groups. Basically, we propose the representation of multiple sequence alignments of length N bp as elements of a finite Abelian group created by the direct sum of homocyclic Abelian groups of prime-power order.
Maintained by Robersy Sanchez. Last updated 3 months ago.
mathematicalbiology, comparativegenomics, functionalgenomics, multiplesequencealignment, wholegenome, genetic-code, genetic-code-algebra, genome, genome-algebra
36.7 match 4.30 score 9 scripts
bioc
GenomeInfoDb:Utilities for manipulating chromosome names, including modifying them to follow a particular naming style
Contains data and functions that define and allow translation between different chromosome sequence naming conventions (e.g., "chr1" versus "1"), including a function that attempts to place sequence names in their natural, rather than lexicographic, order.
Maintained by Hervé Pagès. Last updated 2 months ago.
geneticsdatarepresentationannotationgenomeannotationbioconductor-packagecore-package
9.6 match 32 stars 16.32 score 1.3k scripts 1.7k dependents
bioc
bsseq:Analyze, manage and store whole-genome methylation data
A collection of tools for analyzing and visualizing whole-genome methylation data from sequencing. This includes whole-genome bisulfite sequencing and Oxford Nanopore data.
Maintained by Kasper Daniel Hansen. Last updated 3 months ago.
12.2 match 37 stars 12.26 score 676 scripts 15 dependents
bioc
GenomicScores:Infrastructure to work with genomewide position-specific scores
Provide infrastructure to store and access genomewide position-specific scores within R and Bioconductor.
Maintained by Robert Castelo. Last updated 2 months ago.
infrastructuregeneticsannotationsequencingcoverageannotationhubsoftware
17.2 match 8 stars 8.71 score 83 scripts 6 dependents
bioc
GSVA:Gene Set Variation Analysis for Microarray and RNA-Seq Data
Gene Set Variation Analysis (GSVA) is a non-parametric, unsupervised method for estimating variation of gene set enrichment across the samples of an expression data set. GSVA performs a change in coordinate systems, transforming the data from a gene by sample matrix to a gene-set by sample matrix, thereby allowing the evaluation of pathway enrichment for each sample. This new matrix of GSVA enrichment scores facilitates applying standard analytical methods like functional enrichment, survival analysis, clustering, CNV-pathway analysis or cross-tissue pathway analysis, in a pathway-centric manner.
Maintained by Robert Castelo. Last updated 7 days ago.
functionalgenomicsmicroarrayrnaseqpathwaysgenesetenrichmentgene-set-enrichmentgenomicspathway-enrichment-analysis
10.0 match 212 stars 14.74 score 1.6k scripts 19 dependents
bioc
cfDNAPro:cfDNAPro extracts and Visualises biological features from whole genome sequencing data of cell-free DNA
cfDNA fragments carry important features for building cancer sample classification ML models, such as fragment size and fragment end motif. Analyzing and visualizing fragment size metrics, as well as other biological features, in a curated, standardized, scalable, well-documented, and reproducible way can be time-intensive. This package intends to resolve these problems and simplify the process. It offers two sets of functions for cfDNA feature characterization and visualization.
Maintained by Haichao Wang. Last updated 5 months ago.
visualizationsequencingwholegenomebioinformaticscancer-genomicscancer-researchcell-free-dnaearly-detectiongenomics-visualizationliquid-biopsyswgswhole-genome-sequencing
24.4 match 28 stars 6.04 score 13 scripts
bioc
genomeIntervals:Operations on genomic intervals
This package defines classes for representing genomic intervals and provides functions and methods for working with these. Note: The package provides the basic infrastructure for and is enhanced by the package 'girafe'.
Maintained by Julien Gagneur. Last updated 5 months ago.
dataimportinfrastructuregenetics
26.8 match 5.43 score 45 scripts 2 dependents
bioc
coMethDMR:Accurate identification of co-methylated and differentially methylated regions in epigenome-wide association studies
coMethDMR identifies genomic regions associated with continuous phenotypes by optimally leveraging covariations among CpGs within predefined genomic regions. Instead of testing all CpGs within a genomic region, coMethDMR carries out an additional step that selects co-methylated sub-regions first without using any outcome information. Next, coMethDMR tests the association between methylation within the sub-region and the continuous phenotype using a random coefficient mixed effects model, which models both variations between CpG sites within the region and differential methylation simultaneously.
Maintained by Fernanda Veitzman. Last updated 5 months ago.
dnamethylationepigeneticsmethylationarraydifferentialmethylationgenomewideassociation
21.9 match 7 stars 6.47 score 42 scripts
bioc
musicatk:Mutational Signature Comprehensive Analysis Toolkit
Mutational signatures are carcinogenic exposures or aberrant cellular processes that can cause alterations to the genome. We created musicatk (MUtational SIgnature Comprehensive Analysis ToolKit) to address shortcomings in versatility and ease of use in other pre-existing computational tools. Although many different types of mutational data have been generated, current software packages do not have a flexible framework to allow users to mix and match different types of mutations in the mutational signature inference process. Musicatk enables users to count and combine multiple mutation types, including SBS, DBS, and indels. Musicatk calculates replication strand, transcription strand and combinations of these features along with discovery from unique and proprietary genomic features associated with any mutation type. Musicatk also implements several methods for discovery of new signatures as well as methods to infer exposure given an existing set of signatures. Musicatk provides functions for visualization and downstream exploratory analysis including the ability to compare signatures between cohorts and find matching signatures in COSMIC V2 or COSMIC V3.
Maintained by Joshua D. Campbell. Last updated 5 months ago.
softwarebiologicalquestionsomaticmutationvariantannotation
20.1 match 13 stars 6.97 score 20 scripts
bioc
orthogene:Interspecies gene mapping
`orthogene` is an R package for easy mapping of orthologous genes across hundreds of species. It pulls up-to-date gene ortholog mappings across **700+ organisms**. It also provides various utility functions to aggregate/expand common objects (e.g. data.frames, gene expression matrices, lists) using **1:1**, **many:1**, **1:many** or **many:many** gene mappings, both within- and between-species.
Maintained by Brian Schilder. Last updated 5 months ago.
geneticscomparativegenomicspreprocessingphylogeneticstranscriptomicsgeneexpressionanimal-modelsbioconductorbioconductor-packagebioinformaticsbiomedicinecomparative-genomicsevolutionary-biologygenesgenomicsontologiestranslational-research
17.5 match 42 stars 7.85 score 31 scripts 2 dependents
bioc
iClusterPlus:Integrative clustering of multi-type genomic data
Integrative clustering of multiple genomic data using a joint latent variable model.
Maintained by Qianxing Mo. Last updated 4 months ago.
multi-omicsclusteringfortranopenblas
23.7 match 5.76 score 190 scripts
larssnip
micropan:Microbial Pan-Genome Analysis
A collection of functions for computations and visualizations of microbial pan-genomes.
Maintained by Lars Snipen. Last updated 3 years ago.
21.7 match 21 stars 6.15 score 67 scripts
bioc
nullranges:Generation of null ranges via bootstrapping or covariate matching
Modular package for generation of sets of ranges representing the null hypothesis. These can take the form of bootstrap samples of ranges (using the block bootstrap framework of Bickel et al 2010), or sets of control ranges that are matched across one or more covariates. nullranges is designed to be inter-operable with other packages for analysis of genomic overlap enrichment, including the plyranges Bioconductor package.
Maintained by Michael Love. Last updated 5 months ago.
visualizationgenesetenrichmentfunctionalgenomicsepigeneticsgeneregulationgenetargetgenomeannotationannotationgenomewideassociationhistonemodificationchipseqatacseqdnaseseqrnaseqhiddenmarkovmodelbioconductorbootstrapgenomicsmatchingstatistics
16.2 match 27 stars 8.16 score 50 scripts 1 dependents
bioc
Gviz:Plotting data and annotation information along genomic coordinates
Genomic data analysis requires integrated visualization of known genomic information and new experimental data. Gviz uses the biomaRt and the rtracklayer packages to perform live annotation queries to Ensembl and UCSC and translates the results into, e.g., gene/transcript structures in viewports of the grid graphics package. This results in genomic information plotted together with your data.
Maintained by Robert Ivanek. Last updated 5 months ago.
visualizationmicroarraysequencing
9.8 match 79 stars 13.08 score 1.4k scripts 48 dependents
bioc
recoup:An R package for the creation of complex genomic profile plots
recoup calculates and plots signal profiles created from short sequence reads derived from Next Generation Sequencing technologies. The profiles provided are either summarized curve profiles or heatmap profiles. Currently, recoup supports genomic profile plots for reads derived from ChIP-Seq and RNA-Seq experiments. The package uses ggplot2 and ComplexHeatmap graphics facilities for curve and heatmap coverage profiles respectively.
Maintained by Panagiotis Moulos. Last updated 5 months ago.
immunooncologysoftwaregeneexpressionpreprocessingqualitycontrolrnaseqchipseqsequencingcoverageatacseqchiponchipalignmentdataimport
25.5 match 1 stars 5.02 score 2 scripts
nanxstats
ggsci:Scientific Journal and Sci-Fi Themed Color Palettes for 'ggplot2'
A collection of 'ggplot2' color palettes inspired by plots in scientific journals, data visualization libraries, science fiction movies, and TV shows.
Maintained by Nan Xiao. Last updated 9 months ago.
color-palettesdata-visualizationggplot2ggscisci-fiscientific-journalsvisualization
7.1 match 680 stars 18.00 score 26k scripts 438 dependents
bioc
doubletrouble:Identification and classification of duplicated genes
doubletrouble aims to identify duplicated genes from whole-genome protein sequences and classify them based on their modes of duplication. The duplication modes are i. segmental duplication (SD); ii. tandem duplication (TD); iii. proximal duplication (PD); iv. transposed duplication (TRD) and; v. dispersed duplication (DD). Transposon-derived duplicates (TRD) can be further subdivided into rTRD (retrotransposon-derived duplication) and dTRD (DNA transposon-derived duplication). If users want a simpler classification scheme, duplicates can also be classified into SD- and SSD-derived (small-scale duplication) gene pairs. Besides classifying gene pairs, users can also classify genes, so that each gene is assigned a unique mode of duplication. Users can also calculate substitution rates per substitution site (i.e., Ka and Ks) from duplicate pairs, find peaks in Ks distributions with Gaussian Mixture Models (GMMs), and classify gene pairs into age groups based on Ks peaks.
Maintained by Fabrício Almeida-Silva. Last updated 15 days ago.
softwarewholegenomecomparativegenomicsfunctionalgenomicsphylogeneticsnetworkclassificationbioinformaticscomparative-genomicsgene-duplicationmolecular-evolutionwhole-genome-duplication
19.8 match 23 stars 6.44 score 17 scripts
bioc
AneuFinder:Analysis of Copy Number Variation in Single-Cell-Sequencing Data
AneuFinder implements functions for copy-number detection, breakpoint detection, and karyotype and heterogeneity analysis in single-cell whole genome sequencing and strand-seq data.
Maintained by Aaron Taudt. Last updated 2 days ago.
immunooncologysoftwaresequencingsinglecellcopynumbervariationgenomicvariationhiddenmarkovmodelwholegenomecpp
16.0 match 18 stars 7.90 score 37 scripts
italo-granato
snpReady:Preparing Genotypic Datasets in Order to Run Genomic Analysis
Three functions to clean, summarize and prepare genomic datasets for Genome Selection and Genome Association analysis and to estimate population genetic parameters.
Maintained by Italo Granato. Last updated 5 years ago.
21.3 match 4 stars 5.90 score 33 scripts
stephenturner
qqman:Q-Q and Manhattan Plots for GWAS Data
Create Q-Q and manhattan plots for GWAS data from PLINK results.
Maintained by Stephen Turner. Last updated 2 years ago.
10.0 match 165 stars 12.51 score 2.4k scripts 20 dependents
bioc
EpiCompare:Comparison, Benchmarking & QC of Epigenomic Datasets
EpiCompare is used to compare and analyse epigenetic datasets for quality control and benchmarking purposes. The package outputs an HTML report consisting of three sections: (1) General metrics: metrics on peaks (percentage of blacklisted and non-standard peaks, and peak widths) and fragments (duplication rate) of samples; (2) Peak overlap: percentage and statistical significance of overlapping and non-overlapping peaks, including an upset plot; and (3) Functional annotation: functional annotation (ChromHMM, ChIPseeker and enrichment analysis) of peaks, including peak enrichment around TSS.
Maintained by Hiranyamaya Dash. Last updated 1 months ago.
epigeneticsgeneticsqualitycontrolchipseqmultiplecomparisonfunctionalgenomicsatacseqdnaseseqbenchmarkbenchmarkingbioconductorbioconductor-packagecomparisonhtmlinteractive-reporting
16.7 match 15 stars 7.49 score 46 scripts
kosukehamazaki
RAINBOWR:Genome-Wide Association Study with SNP-Set Methods
By using 'RAINBOWR' (Reliable Association INference By Optimizing Weights with R), users can test multiple SNPs (Single Nucleotide Polymorphisms) simultaneously by kernel-based (SNP-set) methods. This package can also be applied to haplotype-based GWAS (Genome-Wide Association Study). Users can test not only additive effects but also dominance and epistatic effects. For details, please see our paper in PLOS Computational Biology: Kosuke Hamazaki and Hiroyoshi Iwata (2020) <doi:10.1371/journal.pcbi.1007663>.
Maintained by Kosuke Hamazaki. Last updated 4 months ago.
20.6 match 22 stars 5.99 score 22 scripts
igordot
msigdbr:MSigDB Gene Sets for Multiple Organisms in a Tidy Data Format
Provides the 'Molecular Signatures Database' (MSigDB) gene sets typically used with the 'Gene Set Enrichment Analysis' (GSEA) software (Subramanian et al. 2005 <doi:10.1073/pnas.0506580102>, Liberzon et al. 2015 <doi:10.1016/j.cels.2015.12.004>, Castanza et al. 2023 <doi:10.1038/s41592-023-02014-7>) as an R data frame. The package includes the human genes as listed in MSigDB as well as the corresponding symbols and IDs for frequently studied model organisms such as mouse, rat, pig, fly, and yeast.
Maintained by Igor Dolgalev. Last updated 9 days ago.
enrichment-analysisgene-setsgenomicsgseamsigdbpathway-analysispathways
10.0 match 73 stars 12.20 score 3.6k scripts 20 dependents
bioc
methylKit:DNA methylation analysis from high-throughput bisulfite sequencing results
methylKit is an R package for DNA methylation analysis and annotation from high-throughput bisulfite sequencing. The package is designed to deal with sequencing data from RRBS and its variants, but also target-capture methods and whole genome bisulfite sequencing. It also has functions to analyze base-pair resolution 5hmC data from experimental protocols such as oxBS-Seq and TAB-Seq. Methylation calling can be performed directly from Bismark aligned BAM files.
Maintained by Altuna Akalin. Last updated 28 days ago.
dnamethylationsequencingmethylseqgenome-biologymethylationstatistical-analysisvisualizationcurlbzip2xz-utilszlibcpp
10.0 match 220 stars 11.80 score 578 scripts 3 dependents
bioc
VariantAnnotation:Annotation of Genetic Variants
Annotate variants, compute amino acid coding changes, predict coding outcomes.
Maintained by Bioconductor Package Maintainer. Last updated 3 months ago.
dataimportsequencingsnpannotationgeneticsvariantannotationcurlbzip2xz-utilszlib
10.3 match 11.39 score 1.9k scripts 152 dependents
bioc
PREDA:Position Related Data Analysis
Package for the position related analysis of quantitative functional genomics data.
Maintained by Francesco Ferrari. Last updated 5 months ago.
softwarecopynumbervariationgeneexpressiongenetics
27.1 match 4.30 score 9 scripts
const-ae
tidygenomics:Tidy Verbs for Dealing with Genomic Data Frames
Handle genomic data within data frames just as you would with 'GRanges'. This package provides methods to deal with genomic intervals the "tidy way", which makes them simpler to integrate into the general data munging process. The API is inspired by the popular 'bedtools' and the genome_join() method from the 'fuzzyjoin' package.
Maintained by Constantin Ahlmann-Eltze. Last updated 4 years ago.
17.5 match 103 stars 6.49 score 30 scripts
neuhausi
canvasXpress:Visualization Package for CanvasXpress in R
Enables creation of visualizations using the CanvasXpress framework in R. CanvasXpress is a standalone JavaScript library for reproducible research with complete tracking of data and end-user modifications stored in a single PNG image that can be played back. See <https://www.canvasxpress.org> for more information.
Maintained by Connie Brett. Last updated 17 hours ago.
analyticsbioinformaticschartchartingdashdashboarddata-analyticsdata-sciencedata-visualizationgenomicsgraphsjavascriptnetworknetwork-visualizationpythonreproducible-researchshinyvisualization
10.0 match 297 stars 11.28 score 145 scripts
kharchenkolab
numbat:Haplotype-Aware CNV Analysis from scRNA-Seq
A computational method that infers copy number variations (CNVs) in cancer scRNA-seq data and reconstructs the tumor phylogeny. 'numbat' integrates signals from gene expression, allelic ratio, and population haplotype structures to accurately infer allele-specific CNVs in single cells and reconstruct their lineage relationship. 'numbat' can be used to: 1. detect allele-specific copy number variations from single cells; 2. differentiate tumor versus normal cells in the tumor microenvironment; 3. infer the clonal architecture and evolutionary history of profiled tumors. 'numbat' does not require tumor/normal-paired DNA or genotype data, but operates solely on the donor scRNA-seq data (for example, 10x Cell Ranger output). Additional examples and documentation are available at <https://kharchenkolab.github.io/numbat/>. For details on the method please see Gao et al. Nature Biotechnology (2022) <doi:10.1038/s41587-022-01468-y>.
Maintained by Teng Gao. Last updated 9 days ago.
cancer-genomicscnv-detectionlineage-tracingphylogenysingle-cellsingle-cell-analysissingle-cell-rna-seqspatial-transcriptomicscpp
15.0 match 180 stars 7.48 score 120 scripts
chr1swallace
genomic.autocorr:Models Dealing with Spatial Dependency in Genomic Data
Local structure in genomic data often induces dependence between observations taken at different genomic locations. Ignoring this dependence leads to underestimation of the standard error of parameter estimates. This package uses block bootstrapping to estimate asymptotically correct standard errors of parameters from any standard generalised linear model that may be fit by the glm() function.
Maintained by Chris Wallace. Last updated 7 years ago.
41.2 match 2.70 score 4 scripts
stephenturner
kgp:1000 Genomes Project Metadata
Metadata about populations and data about samples from the 1000 Genomes Project, including the 2,504 samples sequenced for the Phase 3 release and the expanded collection of 3,202 samples with 602 additional trios. The data is described in Auton et al. (2015) <doi:10.1038/nature15393> and Byrska-Bishop et al. (2022) <doi:10.1016/j.cell.2022.08.004>, and raw data is available at <http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/>. See Turner (2022) <doi:10.48550/arXiv.2210.00539> for more details.
Maintained by Stephen Turner. Last updated 2 years ago.
1000genomesbioinformaticsgeneticsgenomicsmetadatapopulation-geneticssequencing
27.7 match 20 stars 4.00 score 3 scripts
bioc
CopyNumberPlots:Create Copy-Number Plots using karyoploteR functionality
CopyNumberPlots has a set of functions extending karyoploteR's functionality to create beautiful, customizable and flexible plots of copy-number-related data.
Maintained by Bernat Gel. Last updated 5 months ago.
visualizationcopynumbervariationcoverageonechanneldataimportsequencingdnaseqbioconductorbioconductor-packagebioinformaticscopy-number-variationgenomicsgenomics-visualization
17.5 match 6 stars 6.24 score 16 scripts 2 dependents
isglobal-brge
SNPassoc:SNPs-Based Whole Genome Association Studies
Functions to perform most of the common analyses in genome association studies are implemented. These analyses include descriptive statistics and exploratory analysis of missing values, calculation of Hardy-Weinberg equilibrium, analysis of association based on generalized linear models (either for quantitative or binary traits), and analysis of multiple SNPs (haplotype and epistasis analysis). Permutation tests and related tests (sum statistic and truncated product) are also implemented. Exact distributions of the max-statistic and the genetic risk-allele score can also be estimated. The methods are described in Gonzalez JR et al., 2007 <doi:10.1093/bioinformatics/btm025>.
Maintained by Dolors Pelegri. Last updated 5 months ago.
11.8 match 16 stars 9.11 score 89 scripts 6 dependents
gmod
JBrowseR:An R Interface to the JBrowse 2 Genome Browser
Provides an R interface to the JBrowse 2 genome browser. Enables embedding a JB2 genome browser in a Shiny app or R Markdown document. The browser can also be launched from an interactive R console. The browser can be loaded with a variety of common genomics data types, and can be used with a custom theme.
Maintained by Colin Diesh. Last updated 1 years ago.
genomicsreactjsrmarkdownshinyvisualization
15.6 match 35 stars 6.81 score 31 scripts 1 dependents
bioc
idr2d:Irreproducible Discovery Rate for Genomic Interactions Data
A tool to measure reproducibility between genomic experiments that produce two-dimensional peaks (interactions between peaks), such as ChIA-PET, HiChIP, and HiC. idr2d is an extension of the original idr package, which is intended for (one-dimensional) ChIP-seq peaks.
Maintained by Konstantin Krismer. Last updated 5 months ago.
dna3dstructuregeneregulationpeakdetectionepigeneticsfunctionalgenomicsclassificationhic
24.5 match 4.30 score 6 scripts
bioc
CAGEfightR:Analysis of Cap Analysis of Gene Expression (CAGE) data using Bioconductor
CAGE is a widely used high throughput assay for measuring transcription start site (TSS) activity. CAGEfightR is an R/Bioconductor package for performing a wide range of common data analysis tasks for CAGE and 5'-end data in general. Core functionality includes: import of CAGE TSSs (CTSSs), tag (or unidirectional) clustering for TSS identification, bidirectional clustering for enhancer identification, annotation with transcript and gene models, correlation of TSS and enhancer expression, calculation of TSS shapes, quantification of CAGE expression as expression matrices and genome browser visualization.
Maintained by Malte Thodberg. Last updated 5 months ago.
softwaretranscriptioncoveragegeneexpressiongeneregulationpeakdetectiondataimportdatarepresentationtranscriptomicssequencingannotationgenomebrowsersnormalizationpreprocessingvisualization
14.1 match 8 stars 7.46 score 67 scripts 1 dependents
jinghuazhao
gap:Genetic Analysis Package
As first reported [Zhao, J. H. 2007. "gap: Genetic Analysis Package". J Stat Soft 23(8):1-18. <doi:10.18637/jss.v023.i08>], it is designed as an integrated package for genetic data analysis of both population and family data. Currently, it contains functions for sample size calculations of both population-based and family-based designs, probability of familial disease aggregation, kinship calculation, statistics in linkage analysis, and association analysis involving genetic markers including haplotype analysis with or without environmental covariates. Over the years, the package has been developed in between many projects, hence also in line with the name (gap).
Maintained by Jing Hua Zhao. Last updated 3 days ago.
8.8 match 12 stars 11.94 score 448 scripts 16 dependents
bioc
GenomicTuples:Representation and Manipulation of Genomic Tuples
GenomicTuples defines general purpose containers for storing genomic tuples. It aims to provide functionality for tuples of genomic co-ordinates that is analogous to that available for genomic ranges in the GenomicRanges Bioconductor package.
Maintained by Peter Hickey. Last updated 5 months ago.
infrastructuredatarepresentationsequencingcpp
19.2 match 4 stars 5.48 score 7 scripts
bioc
decompTumor2Sig:Decomposition of individual tumors into mutational signatures by signature refitting
Uses quadratic programming for signature refitting, i.e., to decompose the mutation catalog from an individual tumor sample into a set of given mutational signatures (either Alexandrov-model signatures or Shiraishi-model signatures), computing weights that reflect the contributions of the signatures to the mutation load of the tumor.
Maintained by Rosario M. Piro. Last updated 5 months ago.
softwaresnpsequencingdnaseqgenomicvariationsomaticmutationbiomedicalinformaticsgeneticsbiologicalquestionstatisticalmethod
21.8 match 1 stars 4.78 score 10 scripts 1 dependents
highlanderlab
SIMplyBee:'AlphaSimR' Extension for Simulating Honeybee Populations and Breeding Programmes
An extension of the 'AlphaSimR' package (<https://cran.r-project.org/package=AlphaSimR>) for stochastic simulations of honeybee populations and breeding programmes. 'SIMplyBee' enables simulation of individual bees that form a colony, which includes a queen, fathers (drones the queen mated with), virgin queens, workers, and drones. Multiple colonies can be merged into a population of colonies, such as an apiary or a whole country of colonies. Functions enable operations on castes, a colony, or colonies, to ease 'R' scripting of whole populations. All 'AlphaSimR' functionality with respect to genomes and genetic and phenotype values is available and further extended for honeybees, including haplo-diploidy, complementary sex determiner locus, colony events (swarming, supersedure, etc.), and colony phenotype values.
Maintained by Jana Obšteter. Last updated 6 months ago.
16.5 match 2 stars 6.24 score 18 scripts
bioc
systemPipeR:systemPipeR: Workflow Environment for Data Analysis and Report Generation
systemPipeR is a multipurpose data analysis workflow environment that unifies R with command-line tools. It enables scientists to analyze many types of large- or small-scale data on local or distributed computer systems with a high level of reproducibility, scalability and portability. At its core is a command-line interface (CLI) that adopts the Common Workflow Language (CWL). This design allows users to choose for each analysis step the optimal R or command-line software. It supports both end-to-end and partial execution of workflows with built-in restart functionalities. Efficient management of complex analysis tasks is accomplished by a flexible workflow control container class. Handling of large numbers of input samples and experimental designs is facilitated by consistent sample annotation mechanisms. As a multi-purpose workflow toolkit, systemPipeR enables users to run existing workflows, customize them or design entirely new ones while taking advantage of widely adopted data structures within the Bioconductor ecosystem. Another important core functionality is the generation of reproducible scientific analysis and technical reports. For result interpretation, systemPipeR offers a wide range of plotting functionality, while an associated Shiny App offers many useful functionalities for interactive result exploration. The vignettes linked from this page include (1) a general introduction, (2) a description of technical details, and (3) a collection of workflow templates.
Maintained by Thomas Girke. Last updated 5 months ago.
geneticsinfrastructuredataimportsequencingrnaseqriboseqchipseqmethylseqsnpgeneexpressioncoveragegenesetenrichmentalignmentqualitycontrolimmunooncologyreportwritingworkflowstepworkflowmanagement
8.9 match 53 stars 11.52 score 344 scripts 3 dependents
bioc
regioneR:Association analysis of genomic regions based on permutation tests
regioneR offers a statistical framework based on customizable permutation tests to assess the association between genomic region sets and other genomic features.
Maintained by Bernat Gel. Last updated 5 months ago.
geneticschipseqdnaseqmethylseqcopynumbervariation
11.4 match 9.00 score 2.7k scripts 21 dependents
bioc
txdbmaker:Tools for making TxDb objects from genomic annotations
A set of tools for making TxDb objects from genomic annotations from various sources (e.g. UCSC, Ensembl, and GFF files). These tools allow the user to download the genomic locations of transcripts, exons, and CDS, for a given assembly, and to import them in a TxDb object. TxDb objects are implemented in the GenomicFeatures package, together with flexible methods for extracting the desired features in convenient formats.
Maintained by H. Pagès. Last updated 4 months ago.
infrastructuredataimportannotationgenomeannotationgenomeassemblygeneticssequencingbioconductor-packagecore-package
10.5 match 3 stars 9.68 score 92 scripts 87 dependents
quantgen
BGData:A Suite of Packages for Analysis of Big Genomic Data
An umbrella package providing a phenotype/genotype data structure and scalable and efficient computational methods for large genomic datasets in combination with several other packages: 'BEDMatrix', 'LinkedMatrix', and 'symDMatrix'.
Maintained by Alexander Grueneberg. Last updated 2 months ago.
geneticsgenomicsgwasr-pkgopenmp
18.9 match 34 stars 5.34 score 43 scripts
bioc
SeqArray:Data management of large-scale whole-genome sequence variant calls using GDS files
Data management of large-scale whole-genome sequencing variant calls with thousands of individuals: genotypic data (e.g., SNVs, indels and structural variation calls) and annotations in SeqArray GDS files are stored in an array-oriented and compressed manner, with efficient data access using the R programming language.
Maintained by Xiuwen Zheng. Last updated 5 days ago.
infrastructuredatarepresentationsequencinggeneticsbioinformaticsgds-formatsnpsnvweswgscpp
8.3 match 45 stars 12.11 score 1.1k scripts 9 dependents
bioc
aCGH:Classes and functions for Array Comparative Genomic Hybridization data
Functions for reading aCGH data from image analysis output files and clone information files, creation of aCGH S3 objects for storing these data. Basic methods for accessing/replacing, subsetting, printing and plotting aCGH objects.
Maintained by Peter Dimitrov. Last updated 5 months ago.
copynumbervariationdataimportgeneticscpp
18.4 match 5.38 score 9 scripts 4 dependents
bioc
quantsmooth:Quantile smoothing and genomic visualization of array data
Implements quantile smoothing as introduced in: Quantile smoothing of array CGH data; Eilers PH, de Menezes RX; Bioinformatics. 2005 Apr 1;21(7):1146-53.
Maintained by Jan Oosting. Last updated 5 months ago.
visualizationcopynumbervariation
17.0 match 5.78 score 40 scripts 7 dependents
bioc
sevenC:Computational Chromosome Conformation Capture by Correlation of ChIP-seq at CTCF motifs
Chromatin looping is an essential feature of eukaryotic genomes and can bring regulatory sequences, such as enhancers or transcription factor binding sites, into close physical proximity with regulated target genes. Here, we provide sevenC, an R package that uses protein binding signals from ChIP-seq and sequence motif information to predict chromatin looping events. Cross-linking of proteins that bind close to loop anchors results in ChIP-seq signals at both anchor loci. These signals are used at CTCF motif pairs together with their distance and orientation to each other to predict whether they interact or not. The resulting chromatin loops might be used to associate enhancers or transcription factor binding sites (e.g., ChIP-seq peaks) to regulated target genes.
Maintained by Jonas Ibn-Salem. Last updated 5 months ago.
dna3dstructurechipchipcoveragedataimportepigeneticsfunctionalgenomicsclassificationregressionchipseqhicannotation3d-genomechip-seqchromatin-interactionhi-cpredictionsequence-motiftranscription-factors
18.3 match 12 stars 5.38 score 3 scripts
bioc
consensusSeekeR:Detection of consensus regions inside a group of experiences using genomic positions and genomic ranges
This package compares genomic positions and genomic ranges from multiple experiments to extract common regions. The size of the analyzed region is adjustable as well as the number of experiments in which a feature must be present in a potential region to tag this region as a consensus region. In genomic analysis where feature identification generates a position value surrounded by a genomic range, such as ChIP-Seq peaks and nucleosome positions, the replication of an experiment may result in slight differences between predicted values. This package enables the conciliation of the results into consensus regions.
Maintained by Astrid Deschênes. Last updated 5 months ago.
biologicalquestionchipseqgeneticsmultiplecomparisontranscriptionpeakdetectionsequencingcoveragechip-seq-analysisgenomic-data-analysisnucleosome-positioning
18.6 match 1 stars 5.26 score 5 scripts 1 dependentssamilhll
macrosyntR:Draw Ordered Oxford Grids
Use standard genomics file format (BED) and a table of orthologs to illustrate synteny conservation at the genome-wide scale. Significantly conserved linkage groups are identified as described in Simakov et al. (2020) <doi:10.1038/s41559-020-1156-z> and displayed on an Oxford Grid (Edwards (1991) <doi:10.1111/j.1469-1809.1991.tb00394.x>) or a chord diagram as in Simakov et al. (2022) <doi:10.1126/sciadv.abi5884>. The package provides a function that uses a network-based greedy algorithm to find communities (Clauset et al. (2004) <doi:10.1103/PhysRevE.70.066111>) and so automatically order the chromosomes on the plot to improve interpretability.
Maintained by Sami El Hilali. Last updated 10 months ago.
bioinformaticsgenomic-visualizationsgenomics
20.1 match 14 stars 4.85 score 5 scriptsbioc
spiky:Spike-in calibration for cell-free MeDIP
spiky implements methods and model generation for cfMeDIP (cell-free methylated DNA immunoprecipitation) with spike-in controls. CfMeDIP is an enrichment protocol which avoids destructive conversion of scarce template, making it ideal as a "liquid biopsy," but creating certain challenges in comparing results across specimens, subjects, and experiments. The use of synthetic spike-in standard oligos allows diagnostics performed with cfMeDIP to quantitatively compare samples across subjects, experiments, and time points in both relative and absolute terms.
Maintained by Tim Triche. Last updated 5 months ago.
differentialmethylationdnamethylationnormalizationpreprocessingqualitycontrolsequencing
19.6 match 2 stars 4.90 score 3 scriptsbioc
RcisTarget:RcisTarget Identify transcription factor binding motifs enriched on a list of genes or genomic regions
RcisTarget identifies transcription factor binding motifs (TFBS) over-represented on a gene list. In a first step, RcisTarget selects DNA motifs that are significantly over-represented in the surroundings of the transcription start site (TSS) of the genes in the gene-set. This is achieved by using a database that contains genome-wide cross-species rankings for each motif. The motifs are then annotated to TFs, and those that have a high Normalized Enrichment Score (NES) are retained. Finally, for each motif and gene-set, RcisTarget predicts the candidate target genes (i.e. genes in the gene-set that are ranked above the leading edge).
Maintained by Gert Hulselmans. Last updated 5 months ago.
generegulationmotifannotationtranscriptomicstranscriptiongenesetenrichmentgenetarget
10.1 match 37 stars 9.47 score 191 scriptsbioc
gtrellis:Genome Level Trellis Layout
Genome level Trellis graph visualizes genomic data conditioned by genomic categories (e.g. chromosomes). For each genomic category, multiple dimensional data which are represented as tracks describe different features from different aspects. This package provides high flexibility to arrange genomic categories and to add self-defined graphics in the plot.
Maintained by Zuguang Gu. Last updated 5 months ago.
softwarevisualizationsequencing
11.6 match 39 stars 8.24 score 37 scripts 1 dependentsbioc
crisprDesign:Comprehensive design of CRISPR gRNAs for nucleases and base editors
Provides a comprehensive suite of functions to design and annotate CRISPR guide RNA (gRNA) sequences. This includes on- and off-target search, on-target efficiency scoring, off-target scoring, full gene and TSS contextual annotations, and SNP annotation (human only). It currently supports five types of CRISPR modalities (modes of perturbations): CRISPR knockout, CRISPR activation, CRISPR inhibition, CRISPR base editing, and CRISPR knockdown. All types of CRISPR nucleases are supported, including DNA- and RNA-target nucleases such as Cas9, Cas12a, and Cas13d. All types of base editors are also supported. gRNA design can be performed on reference genomes, transcriptomes, and custom DNA and RNA sequences. Both unpaired and paired gRNA designs are enabled.
Maintained by Jean-Philippe Fortin. Last updated 23 days ago.
crisprfunctionalgenomicsgenetargetbioconductorbioconductor-packagecrispr-cas9crispr-designcrispr-targetgenomics-analysisgrnagrna-sequencegrna-sequencessgrnasgrna-design
11.5 match 22 stars 8.28 score 80 scripts 3 dependentsbioc
igvShiny:igvShiny: a wrapper of Integrative Genomics Viewer (IGV - an interactive tool for visualization and exploration integrated genomic data)
This package is a wrapper of Integrative Genomics Viewer (IGV). It comprises an htmlwidget version of IGV. It can be used as a module in Shiny apps.
Maintained by Arkadiusz Gladki. Last updated 5 months ago.
softwareshinyappssequencingcoverage
12.9 match 37 stars 7.40 score 120 scriptsbnprks
BPCells:Single Cell Counts Matrices to PCA
Efficient operations for single cell ATAC-seq fragments and RNA counts matrices. Interoperable with standard file formats, and introduces efficient bit-packed formats that allow large storage savings and increased read speeds.
Maintained by Benjamin Parks. Last updated 2 months ago.
12.7 match 184 stars 7.48 score 172 scriptsbioc
GenomicInteractions:Utilities for handling genomic interaction data
Utilities for handling genomic interaction data such as ChIA-PET or Hi-C, annotating genomic features with interaction information, and producing plots and summary statistics.
Maintained by Liz Ing-Simmons. Last updated 5 months ago.
softwareinfrastructuredataimportdatarepresentationhic
10.1 match 7 stars 9.31 score 162 scripts 5 dependentsbioc
methodical:Discovering genomic regions where methylation is strongly associated with transcriptional activity
DNA methylation is generally considered to be associated with transcriptional silencing. However, comprehensive, genome-wide investigation of this relationship requires the evaluation of potentially millions of correlation values between the methylation of individual genomic loci and expression of associated transcripts in a relatively large number of samples. Methodical makes this process quick and easy while keeping a low memory footprint. It also provides a novel method for identifying regions where a number of methylation sites are consistently strongly associated with transcriptional expression. In addition, Methodical enables housing DNA methylation data from diverse sources (e.g. WGBS, RRBS and methylation arrays) with a common framework, lifting over DNA methylation data between different genome builds and creating base-resolution plots of the association between DNA methylation and transcriptional activity at transcriptional start sites.
Maintained by Richard Heery. Last updated 2 months ago.
dnamethylationmethylationarraytranscriptiongenomewideassociationsoftwareopenjdk
20.3 match 4.65 score 14 scriptsbioc
minfi:Analyze Illumina Infinium DNA methylation arrays
Tools to analyze & visualize Illumina Infinium methylation arrays.
Maintained by Kasper Daniel Hansen. Last updated 4 months ago.
immunooncologydnamethylationdifferentialmethylationepigeneticsmicroarraymethylationarraymultichanneltwochanneldataimportnormalizationpreprocessingqualitycontrol
7.3 match 60 stars 12.82 score 996 scripts 27 dependentsbioc
ComplexHeatmap:Make Complex Heatmaps
Complex heatmaps are efficient to visualize associations between different sources of data sets and reveal potential patterns. Here the ComplexHeatmap package provides a highly flexible way to arrange multiple heatmaps and supports various annotation graphics.
Maintained by Zuguang Gu. Last updated 5 months ago.
softwarevisualizationsequencingclusteringcomplex-heatmapsheatmap
5.4 match 1.3k stars 16.93 score 16k scripts 151 dependentsbioc
VisiumIO:Import Visium data from the 10X Space Ranger pipeline
The package allows users to readily import spatial data obtained from either the 10X website or from the Space Ranger pipeline. Supported formats include tar.gz, h5, and mtx files. Multiple files can be imported at once with *List type of functions. The package represents data mainly as SpatialExperiment objects.
Maintained by Marcel Ramos. Last updated 2 months ago.
softwareinfrastructuredataimportsinglecellspatialbioconductor-packagegenomicsu24ca289073
16.3 match 5.50 score 14 scripts 1 dependentsbioc
GWASTools:Tools for Genome Wide Association Studies
Classes for storing very large GWAS data sets and annotation, and functions for GWAS data cleaning and analysis.
Maintained by Stephanie M. Gogarten. Last updated 10 days ago.
snpgeneticvariabilityqualitycontrolmicroarray
8.3 match 17 stars 10.67 score 396 scripts 5 dependentsenblacar
SCpubr:Generate Publication Ready Visualizations of Single Cell Transcriptomics Data
A system that provides a streamlined way of generating publication-ready plots for known single-cell transcriptomics data. That is, the goal is to automatically generate plots with the highest quality possible, which can be used right away or with minimal modifications for a research article.
Maintained by Enrique Blanco-Carmona. Last updated 1 month ago.
softwaresinglecellvisualizationdata-visualizationggplot2publication-quality-plotsseuratsingle-cellsingle-cell-genomicssingle-cell-rna-seq
10.2 match 178 stars 8.71 score 194 scriptsbioc
CaDrA:Candidate Driver Analysis
Performs both stepwise and backward heuristic search for candidate (epi)genetic drivers based on a binary multi-omics dataset. CaDrA's main objective is to identify features which, together, are significantly skewed or enriched pertaining to a given vector of continuous scores (e.g. sample-specific scores representing a phenotypic readout of interest, such as protein expression, pathway activity, etc.), based on the union occurrence (i.e. logical OR) of the events.
Maintained by Reina Chau. Last updated 5 months ago.
microarrayrnaseqgeneexpressionsoftwarefeatureextraction
12.2 match 24 stars 7.19 score 12 scriptstomkellygenetics
graphsim:Simulate Expression Data from 'igraph' Networks
Functions to develop simulated continuous data (e.g., gene expression) from a sigma covariance matrix derived from a graph structure in 'igraph' objects. Intended to extend 'mvtnorm' to take 'igraph' structures rather than sigma matrices as input. This allows the use of simulated data that correctly accounts for pathway relationships and correlations. Here we present a versatile statistical framework to simulate correlated gene expression data from biological pathways, by sampling from a multivariate normal distribution derived from a graph structure. This package allows the simulation of biological pathways from a graph structure based on a statistical model of gene expression. For example, methods to infer biological pathways and gene regulatory networks from gene expression data can be tested on simulated datasets using this framework. This also allows for pathway structures to be considered as a confounding variable when simulating gene expression data to test the performance of genomic analyses.
Maintained by S. Thomas Kelly. Last updated 3 years ago.
benchmarkinggene-expressiongene-regulatory-networksgeneticsgenomic-data-analysisgenomicsgraph-algorithmsigraph-networksjossngs-analysissimulated-datasimulation-modeling
17.2 match 24 stars 5.08 score 2 scriptsitalo-granato
BGGE:Bayesian Genomic Linear Models Applied to GE Genome Selection
Application of genome prediction for a continuous variable, focused on genotype by environment (GE) genomic selection models (GS). It consists of a group of functions that help to create regression kernels for some GE genomic models proposed by Jarquín et al. (2014) <doi:10.1007/s00122-013-2243-1> and Lopez-Cruz et al. (2015) <doi:10.1534/g3.114.016097>. Also, it computes genomic predictions based on Bayesian approaches. The prediction function uses an orthogonal transformation of the data and specific priors presented by Cuevas et al. (2014) <doi:10.1534/g3.114.013094>.
Maintained by Italo Granato. Last updated 6 years ago.
bayesian-inferencege-genomic-modelsgenomicgenotype-by-environmentpredictionstatistics
24.1 match 1 stars 3.60 score 5 scriptsbioc
mobileRNA:mobileRNA: Investigate the RNA mobilome & population-scale changes
Genomic analysis can be utilised to identify differences between RNA populations in two conditions, both in production and abundance. This includes the identification of RNAs produced by multiple genomes within a biological system. For example, RNA produced by pathogens within a host or mobile RNAs in plant graft systems. The mobileRNA package provides methods to pre-process, analyse and visualise the sRNA and mRNA populations based on the premise of mapping reads to all genotypes at the same time.
Maintained by Katie Jeynes-Cupper. Last updated 5 months ago.
visualizationrnaseqsequencingsmallrnagenomeassemblyclusteringexperimentaldesignqualitycontrolworkflowstepalignmentpreprocessingbioinformaticsplant-science
17.2 match 4 stars 5.00 score 2 scriptscidm-ph
phylepic:Combined Visualisation of Phylogenetic and Epidemiological Data
A collection of utilities and 'ggplot2' extensions to assist with visualisations in genomic epidemiology. This includes the 'phylepic' chart, a visual combination of a phylogenetic tree and a matched epidemic curve. The included 'ggplot2' extensions such as date axes binned by week are relevant for other applications in epidemiology and beyond. The approach is described in Suster et al. (2024) <doi:10.1101/2024.04.02.24305229>.
Maintained by Carl Suster. Last updated 3 months ago.
genomicsgenomics-visualizationpublic-health
18.0 match 4.65 score 4 scriptsjgx65
hierfstat:Estimation and Tests of Hierarchical F-Statistics
Estimates hierarchical F-statistics from haploid or diploid genetic data with any number of levels in the hierarchy, following the algorithm of Yang (Evolution(1998), 52:950). Tests, via randomisations, the significance of each F and variance component, using the likelihood-ratio statistics G (Goudet et al. (1996) <https://academic.oup.com/genetics/article/144/4/1933/6017091>). Estimates genetic diversity statistics for haploid and diploid genetic datasets in various formats, including inbreeding and coancestry coefficients, and population specific F-statistics following Weir and Goudet (2017) <https://academic.oup.com/genetics/article/206/4/2085/6072590>.
Maintained by Jerome Goudet. Last updated 4 months ago.
devtoolsfstatisticsgwashierfstatkinshippopulation-geneticspopulation-genomicsquantitative-geneticssimulations
7.5 match 25 stars 11.06 score 560 scripts 5 dependentsbioc
CAGEr:Analysis of CAGE (Cap Analysis of Gene Expression) sequencing data for precise mapping of transcription start sites and promoterome mining
The _CAGEr_ package identifies transcription start sites (TSS) and their usage frequency from CAGE (Cap Analysis Gene Expression) sequencing data. It normalises raw CAGE tag count, clusters TSSs into tag clusters (TC) and aggregates them across multiple CAGE experiments to construct consensus clusters (CC) representing the promoterome. CAGEr provides functions to profile expression levels of these clusters by cumulative expression and rarefaction analysis, and outputs the plots in ggplot2 format for further facetting and customisation. After clustering, CAGEr performs analyses of promoter width and detects differential usage of TSSs (promoter shifting) between samples. CAGEr also exports its data as genome browser tracks, and as R objects for downstream expression analysis by other Bioconductor packages such as DESeq2, CAGEfightR, or seqArchR.
Maintained by Charles Plessy. Last updated 5 months ago.
preprocessingsequencingnormalizationfunctionalgenomicstranscriptiongeneexpressionclusteringvisualization
13.5 match 6.12 score 73 scriptsbioc
epidecodeR:epidecodeR: a functional exploration tool for epigenetic and epitranscriptomic regulation
epidecodeR is a package capable of analysing the impact of the degree of DNA/RNA epigenetic chemical modifications on the dysregulation of genes or proteins. This package integrates chemical modification data generated from a host of epigenomic or epitranscriptomic techniques such as ChIP-seq, ATAC-seq, m6A-seq, etc. and dysregulated gene lists in the form of differential gene expression, ribosome occupancy or differential protein translation, and identifies the impact of varying degrees of chemical modification on the dysregulation of the associated genes. epidecodeR generates cumulative distribution function (CDF) plots showing shifts in trend of overall log2FC between genes divided into groups based on the degree of modification associated with the genes. The tool also tests for significance of difference in log2FC between groups of genes.
Maintained by Kandarp Joshi. Last updated 5 months ago.
differentialexpressiongeneregulationhistonemodificationfunctionalpredictiontranscriptiongeneexpressionepitranscriptomicsepigeneticsfunctionalgenomicssystemsbiologytranscriptomicschiponchipdifferential-expressiongenomicsgenomics-visualization
17.5 match 5 stars 4.70 score 1 scriptsbioc
QuasR:Quantify and Annotate Short Reads in R
This package provides a framework for the quantification and analysis of Short Reads. It covers a complete workflow starting from raw sequence reads, over creation of alignments and quality control plots, to the quantification of genomic regions of interest. Read alignments are either generated through Rbowtie (data from DNA/ChIP/ATAC/Bis-seq experiments) or Rhisat2 (data from RNA-seq experiments that require spliced alignments), or can be provided in the form of bam files.
Maintained by Michael Stadler. Last updated 1 months ago.
geneticspreprocessingsequencingchipseqrnaseqmethylseqcoveragealignmentqualitycontrolimmunooncologycurlbzip2xz-utilszlibcpp
9.5 match 6 stars 8.63 score 79 scripts 1 dependentsdwinter
pafr:Read, Manipulate and Visualize 'Pairwise mApping Format' Data
Provides functions to read, process and visualize pairwise sequence alignments in the 'PAF' format used by 'minimap2' and other whole-genome aligners. 'minimap2' is described by Li H. (2018) <doi:10.1093/bioinformatics/bty191>.
Maintained by David Winter. Last updated 4 years ago.
12.0 match 71 stars 6.73 score 75 scriptsbioc
Rsubread:Mapping, quantification and variant analysis of sequencing data
Alignment, quantification and analysis of RNA sequencing data (including both bulk RNA-seq and scRNA-seq) and DNA sequencing data (including ATAC-seq, ChIP-seq, WGS, WES etc). Includes functionality for read mapping, read counting, SNP calling, structural variant detection and gene fusion discovery. Can be applied to all major sequencing technologies and to both short and long sequence reads.
Maintained by Wei Shi. Last updated 8 days ago.
sequencingalignmentsequencematchingrnaseqchipseqsinglecellgeneexpressiongeneregulationgeneticsimmunooncologysnpgeneticvariabilitypreprocessingqualitycontrolgenomeannotationgenefusiondetectionindeldetectionvariantannotationvariantdetectionmultiplesequencealignmentzlib
8.8 match 9.24 score 892 scripts 10 dependentsbioc
BUSpaRse:kallisto | bustools R utilities
The kallisto | bustools pipeline is a fast and modular set of tools to convert single cell RNA-seq reads in fastq files into gene count or transcript compatibility counts (TCC) matrices for downstream analysis. Central to this pipeline is the barcode, UMI, and set (BUS) file format. This package serves the following purposes: First, this package allows users to manipulate BUS format files as data frames in R and then convert them into gene count or TCC matrices. Furthermore, since R and Rcpp code is easier to handle than pure C++ code, users are encouraged to tweak the source code of this package to experiment with new uses of BUS format and different ways to convert the BUS file into gene count matrix. Second, this package can conveniently generate files required to generate gene count matrices for spliced and unspliced transcripts for RNA velocity. Here biotypes can be filtered and scaffolds and haplotypes can be removed, and the filtered transcriptome can be extracted and written to disk. Third, this package implements utility functions to get transcripts and associated genes required to convert BUS files to gene count matrices, to write the transcript to gene information in the format required by bustools, and to read output of bustools into R as sparse matrices.
Maintained by Lambda Moses. Last updated 5 months ago.
singlecellrnaseqworkflowstepcpp
10.9 match 9 stars 7.35 score 165 scriptsbioc
segmentSeq:Methods for identifying small RNA loci from high-throughput sequencing data
High-throughput sequencing technologies allow the production of large volumes of short sequences, which can be aligned to the genome to create a set of matches to the genome. By looking for regions of the genome to which there are high densities of matches, we can infer a segmentation of the genome into regions of biological significance. The methods in this package allow the simultaneous segmentation of data from multiple samples, taking into account replicate data, in order to create a consensus segmentation. This has obvious applications in a number of classes of sequencing experiments, particularly in the discovery of small RNA loci and novel mRNA transcriptome discovery.
Maintained by Samuel Granjeaud. Last updated 5 months ago.
multiplecomparisonsequencingalignmentdifferentialexpressionqualitycontroldataimport
13.0 match 6.17 score 42 scriptsbioc
TnT:Interactive Visualization for Genomic Features
An R interface to the TnT javascript library (https://github.com/tntvis) to provide interactive and flexible visualization of track-based genomic data.
Maintained by Jialin Ma. Last updated 5 months ago.
infrastructurevisualizationbioconductorgenome-browserhtmlwidgetsshiny
11.2 match 14 stars 7.15 score 17 scriptsbioc
chipenrich:Gene Set Enrichment For ChIP-seq Peak Data
ChIP-Enrich and Poly-Enrich perform gene set enrichment testing using peaks called from a ChIP-seq experiment. The method empirically corrects for confounding factors such as the length of genes, and the mappability of the sequence surrounding genes.
Maintained by Kai Wang. Last updated 17 days ago.
immunooncologychipseqepigeneticsfunctionalgenomicsgenesetenrichmenthistonemodificationregression
16.1 match 4.94 score 29 scriptsbioc
planttfhunter:Identification and classification of plant transcription factors
planttfhunter is used to identify plant transcription factors (TFs) from protein sequence data and classify them into families and subfamilies using the classification scheme implemented in PlantTFDB. TFs are identified using pre-built hidden Markov model profiles for DNA-binding domains. Then, auxiliary and forbidden domains are used with DNA-binding domains to classify TFs into families and subfamilies (when applicable). Currently, TFs can be classified in 58 different TF families/subfamilies.
Maintained by Fabrício Almeida-Silva. Last updated 5 months ago.
softwaretranscriptionfunctionalpredictiongenomeannotationfunctionalgenomicshiddenmarkovmodelsequencingclassificationfunctional-genomicsgene-familieshidden-markov-modelsplant-genomicsplantsprotein-domainstranscription-factors
19.7 match 4.00 score 5 scriptseriqande
gscramble:Simulating Admixed Genotypes Without Replacement
A genomic simulation approach for creating biologically informed individual genotypes from empirical data that 1) samples alleles from populations without replacement, 2) segregates alleles based on species-specific recombination rates. 'gscramble' is a flexible simulation approach that allows users to create pedigrees of varying complexity in order to simulate admixed genotypes. Furthermore, it allows users to track haplotype blocks from the source populations through the pedigrees.
Maintained by Eric C. Anderson. Last updated 1 year ago.
18.0 match 4.35 score 15 scriptsbioc
fishpond:Fishpond: downstream methods and tools for expression data
Fishpond contains methods for differential transcript and gene expression analysis of RNA-seq data using inferential replicates for uncertainty of abundance quantification, as generated by Gibbs sampling or bootstrap sampling. Also the package contains a number of utilities for working with Salmon and Alevin quantification files.
Maintained by Michael Love. Last updated 5 months ago.
sequencingrnaseqgeneexpressiontranscriptionnormalizationregressionmultiplecomparisonbatcheffectvisualizationdifferentialexpressiondifferentialsplicingalternativesplicingsinglecellbioconductorgene-expressiongenomicssalmonscrnaseqstatisticstranscriptomics
10.0 match 28 stars 7.83 score 150 scriptsbioc
podkat:Position-Dependent Kernel Association Test
This package provides an association test that is capable of dealing with very rare and even private variants. This is accomplished by a kernel-based approach that takes the positions of the variants into account. The test can be used for pre-processed matrix data, but also directly for variant data stored in VCF files. Association testing can be performed whole-genome, whole-exome, or restricted to pre-defined regions of interest. The test is complemented by tools for analyzing and visualizing the results.
Maintained by Ulrich Bodenhofer. Last updated 5 months ago.
geneticswholegenomeannotationvariantannotationsequencingdataimportcurlbzip2xz-utilszlibcpp
15.6 match 5.02 score 6 scriptshenrikbengtsson
TopDom:An Efficient and Deterministic Method for Identifying Topological Domains in Genomes
The 'TopDom' method identifies topological domains in genomes from Hi-C sequence data (Shin et al., 2016 <doi:10.1093/nar/gkv1505>). The authors published an implementation of their method as an R script (two different versions; also available in this package). This package originates from those original 'TopDom' R scripts and provides help pages adapted from the original 'TopDom' PDF documentation. It also provides a small number of bug fixes to the original code.
Maintained by Henrik Bengtsson. Last updated 4 years ago.
genomicshictopological-domains
13.4 match 21 stars 5.80 score 20 scripts 1 dependentsquantgen
BEDMatrix:Extract Genotypes from a PLINK .bed File
A matrix-like data structure that allows for efficient, convenient, and scalable subsetting of binary genotype/phenotype files generated by PLINK (<https://www.cog-genomics.org/plink2>), the whole genome association analysis toolset, without loading the entire file into memory.
Maintained by Alexander Grueneberg. Last updated 7 months ago.
bedgeneticsgenomicsplinkplink2r-pkg
10.8 match 18 stars 7.13 score 196 scripts 6 dependentsr-forge
genoPlotR:Plot Publication-Grade Gene and Genome Maps
Draws gene or genome maps and comparisons between these, in a publication-grade manner. Starting from simple, common files, it will draw postscript or PDF files that can be sent as such to journals.
Maintained by Lionel Guy. Last updated 4 years ago.
14.4 match 5.33 score 106 scriptsabbvie-external
OmicNavigator:Open-Source Software for 'Omic' Data Analysis and Visualization
A tool for interactive exploration of the results from 'omics' experiments to facilitate novel discoveries from high-throughput biology. The software includes R functions for the 'bioinformatician' to deposit study metadata and the outputs from statistical analyses (e.g. differential expression, enrichment). These results are then exported to an interactive JavaScript dashboard that can be interrogated on the user's local machine or deployed online to be explored by collaborators. The dashboard includes 'sortable' tables, interactive plots including network visualization, and fine-grained filtering based on statistical significance.
Maintained by John Blischak. Last updated 15 days ago.
bioinformaticsgenomicsomicsopencpu
10.0 match 34 stars 7.68 score 31 scriptsbioc
martini:GWAS Incorporating Networks
martini deals with the low power inherent to GWAS studies by using prior knowledge represented as a network. SNPs are the vertices of the network, and the edges represent biological relationships between them (genomic adjacency, belonging to the same gene, physical interaction between protein products). The network is scanned using SConES, which looks for groups of SNPs maximally associated with the phenotype, that form a close subnetwork.
Maintained by Hector Climente-Gonzalez. Last updated 5 months ago.
softwaregenomewideassociationsnpgeneticvariabilitygeneticsfeatureextractiongraphandnetworknetworkbioinformaticsgenomicsgwasnetwork-analysissnpssystems-biologycpp
12.4 match 4 stars 6.16 score 30 scriptscovaruber
sommer:Solving Mixed Model Equations in R
Structural multivariate-univariate linear mixed model solver for estimation of multiple random effects with unknown variance-covariance structures (e.g., heterogeneous and unstructured) and known covariance among levels of random effects (e.g., pedigree and genomic relationship matrices) (Covarrubias-Pazaran, 2016 <doi:10.1371/journal.pone.0156744>; Maier et al., 2015 <doi:10.1016/j.ajhg.2014.12.006>; Jensen et al., 1997). REML estimates can be obtained using the Direct-Inversion Newton-Raphson and Direct-Inversion Average Information algorithms for the problems r x r (r being the number of records) or using the Henderson-based average information algorithm for the problem c x c (c being the number of coefficients to estimate). Spatial models can also be fitted using the two-dimensional spline functionality available.
Maintained by Giovanny Covarrubias-Pazaran. Last updated 2 days ago.
average-informationmixed-modelsrcpparmadilloopenblascppopenmp
6.0 match 44 stars 12.63 score 300 scripts 10 dependentsbioc
crisprScore:On-Target and Off-Target Scoring Algorithms for CRISPR gRNAs
Provides R wrappers of several on-target and off-target scoring methods for CRISPR guide RNAs (gRNAs). The following nucleases are supported: SpCas9, AsCas12a, enAsCas12a, and RfxCas13d (CasRx). The available on-target cutting efficiency scoring methods are RuleSet1, Azimuth, DeepHF, DeepCpf1, enPAM+GB, and CRISPRscan. Both the CFD and MIT scoring methods are available for off-target specificity prediction. The package also provides a Lindel-derived score to predict the probability of a gRNA to produce indels inducing a frameshift for the Cas9 nuclease. Note that DeepHF, DeepCpf1 and enPAM+GB are not available on Windows machines.
Maintained by Jean-Philippe Fortin. Last updated 5 months ago.
crisprfunctionalgenomicsfunctionalpredictionbioconductorbioconductor-packagecrispr-cas9crispr-designcrispr-targetgenomicsgrnagrna-sequencegrna-sequencesscoring-algorithmsgrnasgrna-design
10.0 match 16 stars 7.52 score 19 scripts 4 dependentsoumarkme
TSDFGS:Training Set Determination For Genomic Selection
We propose an optimality criterion to determine the required training set, r-score, which is derived directly from Pearson's correlation between the genomic estimated breeding values and phenotypic values of the test set <doi:10.1007/s00122-019-03387-0>. This package provides two main functions to determine a good training set and its size.
Maintained by Jen-Hsiang Ou. Last updated 1 year ago.
genomic-predictiongenomic-selectioncpp
20.3 match 5 stars 3.70 score 7 scripts
bioc
UMI4Cats:UMI4Cats: Processing, analysis and visualization of UMI-4C chromatin contact data
UMI-4C is a technique that allows characterization of 3D chromatin interactions with a bait of interest, taking advantage of a sonication step to produce unique molecular identifiers (UMIs) that help remove duplication bias, thus allowing a better differential comparison of chromatin interactions between conditions. This package allows processing of UMI-4C data, starting from FastQ files provided by the sequencing facility. It provides two statistical methods for detecting differential contacts and includes a visualization function to plot integrated information from a UMI-4C assay.
Maintained by Mireia Ramos-Rodriguez. Last updated 5 months ago.
qualitycontrolpreprocessingalignmentnormalizationvisualizationsequencingcoveragechromatinchromatin-interactiongenomicsumi4c
13.3 match 5 stars 5.57 score 7 scripts
bioc
netSmooth:Network smoothing for scRNAseq
netSmooth is an R package for network smoothing of single cell RNA sequencing data. Using biological networks such as protein-protein interactions as priors for gene co-expression, netSmooth improves cell type identification from noisy, sparse scRNAseq data.
Maintained by Jonathan Ronen. Last updated 5 months ago.
networkgraphandnetworksinglecellrnaseqgeneexpressionsequencingtranscriptomicsnormalizationpreprocessingclusteringdimensionreductionbioinformaticsgenomicssingle-cell
10.0 match 27 stars 7.41 score 4 scripts
bioc
ggbio:Visualization tools for genomic data
The ggbio package extends and specializes the grammar of graphics for biological data. The graphics are designed to answer common scientific questions, in particular those often asked of high throughput genomics data. All core Bioconductor data structures are supported, where appropriate. The package supports detailed views of particular genomic regions, as well as genome-wide overviews. Supported overviews include ideograms and grand linear views. High-level plots include sequence fragment length, edge-linked interval to data view, mismatch pileup, and several splicing summaries.
Maintained by Michael Lawrence. Last updated 5 months ago.
6.0 match 111 stars 12.26 score 734 scripts 17 dependents
bioc
rGREAT:GREAT Analysis - Functional Enrichment on Genomic Regions
GREAT (Genomic Regions Enrichment of Annotations Tool) is a type of functional enrichment analysis performed directly on genomic regions. This package implements the GREAT algorithm (the local GREAT analysis) and also supports direct interaction with the GREAT web service (the online GREAT analysis). Both analyses can be viewed in a Shiny application. By default, rGREAT supports more than 600 organisms and a large number of gene set collections, as well as gene sets and organisms provided by users. Additionally, it implements a general method for dealing with background regions.
Maintained by Zuguang Gu. Last updated 15 days ago.
genesetenrichmentgopathwayssoftwaresequencingwholegenomegenomeannotationcoveragecpp
7.4 match 86 stars 9.96 score 320 scripts 1 dependent
bioc
CleanUpRNAseq:Detect and Correct Genomic DNA Contamination in RNA-seq Data
RNA-seq data generated by some library preparation methods, such as the rRNA-depletion-based method and the SMART-seq method, might be contaminated by genomic DNA (gDNA) if DNase I digestion is not performed properly during RNA preparation. CleanUpRNAseq is developed to check whether RNA-seq data suffers from gDNA contamination. If so, it can correct for gDNA contamination and reduce the false discovery rate of differentially expressed genes.
Maintained by Haibo Liu. Last updated 4 months ago.
qualitycontrolsequencinggeneexpression
13.3 match 5 stars 5.44 score 4 scripts
bioc
pwalign:Perform pairwise sequence alignments
The two main functions in the package are pairwiseAlignment() and stringDist(). The former solves (Needleman-Wunsch) global alignment, (Smith-Waterman) local alignment, and (ends-free) overlap alignment problems. The latter computes the Levenshtein edit distance or pairwise alignment score matrix for a set of strings.
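As a rough illustration of the Levenshtein edit distance mentioned above, here is a minimal Python sketch of the standard dynamic program. It is not pwalign's code or API; the function name `levenshtein` is hypothetical and chosen only for this example.

```python
def levenshtein(a: str, b: str) -> int:
    """Levenshtein edit distance using a two-row dynamic program.

    Hypothetical helper for illustration; not part of pwalign.
    """
    # prev[j] holds the distance between the first i-1 chars of a
    # and the first j chars of b.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]
```

Each insertion, deletion, and substitution costs 1 here; weighted schemes such as substitution matrices are outside the scope of this sketch.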
Maintained by Hervé Pagès. Last updated 10 days ago.
alignmentsequencematchingsequencinggeneticsbioconductor-package
8.4 match 1 star 8.48 score 27 scripts 104 dependents
bioc
bumphunter:Bump Hunter
Tools for finding bumps in genomic data
Maintained by Tamilselvi Guharaj. Last updated 5 months ago.
dnamethylationepigeneticsinfrastructuremultiplecomparisonimmunooncology
6.1 match 16 stars 11.61 score 210 scripts 43 dependents
karissawhiting
cbioportalR:Browse and Query Clinical and Genomic Data from cBioPortal
Provides R users with direct access to genomic and clinical data from the 'cBioPortal' web resource via user-friendly functions that wrap 'cBioPortal's' existing API endpoints <https://www.cbioportal.org/api/swagger-ui/index.html>. Users can browse and query genomic data on mutations, copy number alterations and fusions, as well as data on tumor mutational burden ('TMB'), microsatellite instability status ('MSI'), 'FACETS' and select clinical data points (depending on the study). See <https://www.cbioportal.org/> and Gao et al., (2013) <doi:10.1126/scisignal.2004088> for more information on the cBioPortal web resource.
Maintained by Karissa Whiting. Last updated 4 months ago.
10.4 match 22 stars 6.72 score 20 scripts
psoerensen
qgg:Statistical Tools for Quantitative Genetic Analyses
Provides an infrastructure for efficient processing of large-scale genetic and phenotypic data including core functions for: 1) fitting linear mixed models, 2) constructing marker-based genomic relationship matrices, 3) estimating genetic parameters (heritability and correlation), 4) performing genomic prediction and genetic risk profiling, and 5) single or multi-marker association analyses. Rohde et al. (2019) <doi:10.1101/503631>.
Maintained by Peter Soerensen. Last updated 10 days ago.
10.0 match 36 stars 7.01 score 47 scripts
bioc
HicAggR:Set of 3D genomic interaction analysis tools
This package provides a set of functions useful in the analysis of 3D genomic interactions. It includes the import of standard HiC data formats into R and HiC normalisation procedures. The main objective of this package is to improve the visualization and quantification of the analysis of HiC contacts through aggregation. The package allows the import of 1D genomics data, such as peaks from ATACSeq or ChIPSeq, to create potential couples between features of interest under user-defined parameters such as the distance between pairs of features of interest. It then allows the extraction of contact values from the HiC data for these couples, Aggregated Peak Analysis (APA) for visualization, and the comparison of normalized contact values between conditions. Overall, the package allows the integration of 1D genomics data with 3D genomics data, providing easy access to HiC contact values.
Maintained by Olivier Cuvier. Last updated 5 months ago.
softwarehicdataimportdatarepresentationnormalizationvisualizationdna3dstructureatacseqchipseqdnaseseqrnaseq
14.6 match 4.70 score 3 scripts
bioc
BaalChIP:BaalChIP: Bayesian analysis of allele-specific transcription factor binding in cancer genomes
The package offers functions to process multiple ChIP-seq BAM files and detect allele-specific events. It computes allele counts at individual variants (SNPs/SNVs), implements extensive QC steps to remove problematic variants, and utilizes a Bayesian framework to identify statistically significant allele-specific events. BaalChIP is able to account for copy number differences between the two alleles, a known phenotypical feature of cancer samples.
Maintained by Ines de Santiago. Last updated 5 months ago.
softwarechipseqbayesiansequencing
17.0 match 4.00 score 5 scripts
mrcieu
ieugwasr:Interface to the 'OpenGWAS' Database API
Interface to the 'OpenGWAS' database API <https://api.opengwas.io/api/>. Includes a wrapper to make generic calls to the API, plus convenience functions for specific queries.
Maintained by Gibran Hemani. Last updated 15 days ago.
6.3 match 89 stars 10.71 score 404 scripts 6 dependents
moosa-r
rbioapi:User-Friendly R Interface to Biologic Web Services' API
Currently fully supports Enrichr, JASPAR, miEAA, PANTHER, Reactome, STRING, and UniProt! The goal of rbioapi is to provide a user-friendly and consistent interface to biological databases and services, in a way that insulates the user from the technicalities of using web service APIs and creates a unified and easy-to-use interface to biological and medical web services. This is an ongoing project; new databases and services will be added periodically. Feel free to suggest any databases or services you often use.
Maintained by Moosa Rezwani. Last updated 2 months ago.
api-clientbioinformaticsbiologyenrichmentenrichment-analysisenrichrjasparmieaaover-representation-analysispantherreactomestringuniprot
8.9 match 20 stars 7.60 score 55 scripts
bioc
methylPipe:Base resolution DNA methylation data analysis
Memory efficient analysis of base resolution DNA methylation data in both the CpG and non-CpG sequence context. Integration of DNA methylation data derived from any methodology providing base- or low-resolution data.
Maintained by Mattia Furlan. Last updated 5 months ago.
methylseqdnamethylationcoveragesequencing
14.3 match 4.73 score 1 script 1 dependent
bioc
IsoformSwitchAnalyzeR:Identify, Annotate and Visualize Isoform Switches with Functional Consequences from both short- and long-read RNA-seq data
Analysis of alternative splicing and isoform switches with predicted functional consequences (e.g. gain/loss of protein domains etc.) from quantification of all types of RNASeq by tools such as Kallisto, Salmon, StringTie, Cufflinks/Cuffdiff etc.
Maintained by Kristoffer Vitting-Seerup. Last updated 5 months ago.
geneexpressiontranscriptionalternativesplicingdifferentialexpressiondifferentialsplicingvisualizationstatisticalmethodtranscriptomevariantbiomedicalinformaticsfunctionalgenomicssystemsbiologytranscriptomicsrnaseqannotationfunctionalpredictiongenepredictiondataimportmultiplecomparisonbatcheffectimmunooncology
7.2 match 108 stars 9.26 score 125 scripts
sperfu
findGSEP:Estimate Genome Size of Polyploid Species Using k-Mer Frequencies
Provides tools to estimate the genome size of polyploid species using k-mer frequencies. This package includes functions to process k-mer frequency data and perform genome size estimation by fitting k-mer frequencies with a normal distribution model. It supports handling of complex polyploid genomes and offers various options for customizing the estimation process. The basic method 'findGSE' is detailed in Sun, Hequan, et al. (2018) <doi:10.1093/bioinformatics/btx637>.
Maintained by Laiyi Fu. Last updated 8 months ago.
13.4 match 4 stars 5.00 score 1 script
bioc
ontoProc:processing of ontologies of anatomy, cell lines, and so on
Support harvesting of diverse bioinformatic ontologies, making particular use of the ontologyIndex package on CRAN. We provide snapshots of key ontologies for terms about cells, cell lines, chemical compounds, and anatomy, to help analyze genome-scale experiments, particularly cell x compound screens. Another purpose is to strengthen development of compelling use cases for richer interfaces to emerging ontologies.
Maintained by Vincent Carey. Last updated 15 days ago.
infrastructuregobioinformaticsgenomicsontology
10.5 match 3 stars 6.37 score 75 scripts 2 dependents
bioc
proActiv:Estimate Promoter Activity from RNA-Seq data
Most human genes have multiple promoters that control the expression of different isoforms. The use of these alternative promoters enables the regulation of isoform expression pre-transcriptionally. Alternative promoters have been found to be important in a wide number of cell types and diseases. proActiv is an R package that enables the analysis of promoters from RNA-seq data. proActiv uses aligned reads as input, and generates counts and normalized promoter activity estimates for each annotated promoter. In particular, proActiv accepts junction files from TopHat2 or STAR or BAM files as inputs. These estimates can then be used to identify which promoter is active, which promoter is inactive, and which promoters change their activity across conditions. proActiv also allows visualization of promoter activity across conditions.
Maintained by Joseph Lee. Last updated 5 months ago.
rnaseqgeneexpressiontranscriptionalternativesplicinggeneregulationdifferentialsplicingfunctionalgenomicsepigeneticstranscriptomicspreprocessingalternative-promotersgenomicspromoter-activitypromoter-annotationrna-seq-data
10.0 match 51 stars 6.66 score 15 scripts
bioc
RTCGA:The Cancer Genome Atlas Data Integration
The Cancer Genome Atlas (TCGA) Data Portal provides a platform for researchers to search, download, and analyze data sets generated by TCGA. It contains clinical information, genomic characterization data, and high level sequence analysis of the tumor genomes. The key is to understand genomics to improve cancer care. The RTCGA package offers download and integration of the variety and volume of TCGA data using the patient barcode as a key, which makes the data easier to obtain. This may have a beneficial influence on the development of science and the improvement of patients' treatment. Furthermore, the RTCGA package transforms TCGA data into a tidy form that is convenient to use.
Maintained by Marcin Kosinski. Last updated 5 months ago.
immunooncologysoftwaredataimportdatarepresentationpreprocessingrnaseqsurvivaldnamethylationprincipalcomponentvisualization
7.5 match 51 stars 8.91 score 106 scripts 1 dependent
bioc
cn.mops:cn.mops - Mixture of Poissons for CNV detection in NGS data
cn.mops (Copy Number estimation by a Mixture Of PoissonS) is a data processing pipeline for copy number variations and aberrations (CNVs and CNAs) from next generation sequencing (NGS) data. The package supplies functions to convert BAM files into read count matrices or genomic ranges objects, which are the input objects for cn.mops. cn.mops models the depths of coverage across samples at each genomic position. Therefore, it does not suffer from read count biases along chromosomes. Using a Bayesian approach, cn.mops decomposes read variations across samples into integer copy numbers and noise by its mixture components and Poisson distributions, respectively. cn.mops guarantees a low FDR because wrong detections are indicated by high noise and filtered out. cn.mops is very fast and written in C++.
Maintained by Gundula Povysil. Last updated 3 months ago.
sequencingcopynumbervariationhomo_sapienscellbiologyhapmapgeneticscpp
12.4 match 5.35 score 94 scripts 4 dependents
bioc
cbpManager:Generate, manage, and edit data and metadata files suitable for the import in cBioPortal for Cancer Genomics
This R package provides an R Shiny application that enables the user to generate, manage, and edit data and metadata files suitable for import into cBioPortal for Cancer Genomics. Create cancer studies and edit their metadata. Upload mutation data of a patient that will be concatenated to the data_mutation_extended.txt file of the study. Create and edit clinical patient data, sample data, and timeline data. Create custom timeline tracks for patients.
Maintained by Arsenij Ustjanzew. Last updated 5 months ago.
immunooncologydataimportdatarepresentationguithirdpartyclientpreprocessingvisualizationcancer-genomicscbioportalclinical-datafilegeneratormutation-datapatient-data
12.0 match 8 stars 5.51 score 1 script
bioc
dmrseq:Detection and inference of differentially methylated regions from Whole Genome Bisulfite Sequencing
This package implements an approach for scanning the genome to detect and perform accurate inference on differentially methylated regions from Whole Genome Bisulfite Sequencing data. The method is based on comparing detected regions to a pooled null distribution, that can be implemented even when as few as two samples per population are available. Region-level statistics are obtained by fitting a generalized least squares (GLS) regression model with a nested autoregressive correlated error structure for the effect of interest on transformed methylation proportions.
Maintained by Keegan Korthauer. Last updated 5 months ago.
immunooncologydnamethylationepigeneticsmultiplecomparisonsoftwaresequencingdifferentialmethylationwholegenomeregressionfunctionalgenomics
10.2 match 6.39 score 59 scripts 1 dependent
blasseigne
ProliferativeIndex:Calculates and Analyzes the Proliferative Index
Provides functions for calculating and analyzing the proliferative index (PI) from an RNA-seq dataset. As described in Ramaker & Lasseigne, et al. bioRxiv, 2016 <doi:10.1101/063057>.
Maintained by Brittany Lasseigne. Last updated 7 years ago.
cancercancer-genomicsgene-expressiongenomicsindexmetagene
17.5 match 3.70 score 10 scripts
genie-bpc
genieBPC:Project GENIE BioPharma Collaborative Data Processing Pipeline
The American Association for Cancer Research (AACR) Project Genomics Evidence Neoplasia Information Exchange (GENIE) BioPharma Collaborative represents a multi-year, multi-institution effort to build a pan-cancer repository of linked clinico-genomic data. The genomic and clinical data are provided in multiple releases (separate releases for each cancer cohort with updates following data corrections), which are stored on the data sharing platform 'Synapse' <https://www.synapse.org/>. The 'genieBPC' package provides a seamless way to obtain the data corresponding to each release from 'Synapse' and to prepare datasets for analysis.
Maintained by Jessica A. Lavery. Last updated 9 months ago.
8.5 match 9 stars 7.57 score 26 scripts
bioc
ProteoDisco:Generation of customized protein variant databases from genomic variants, splice-junctions and manual sequences
ProteoDisco is an R package to facilitate proteogenomics studies. It houses functions to create customized (variant) protein databases based on user-submitted genomic variants, splice-junctions, fusion genes and manual transcript sequences. The flexible workflow can be adapted to suit a myriad of research and experimental settings.
Maintained by Job van Riet. Last updated 5 months ago.
softwareproteomicsrnaseqsnpsequencingvariantannotationdataimport
12.1 match 5 stars 5.30 score 4 scripts
bioc
gemma.R:A wrapper for Gemma's Restful API to access curated gene expression data and differential expression analyses
Low- and high-level wrappers for Gemma's RESTful API. They enable access to curated expression and differential expression data from over 10,000 published studies. Gemma is a web site, database and a set of tools for the meta-analysis, re-use and sharing of genomics data, currently primarily targeted at the analysis of gene expression profiles.
Maintained by Ogan Mancarci. Last updated 4 months ago.
softwaredataimportmicroarraysinglecellthirdpartyclientdifferentialexpressiongeneexpressionbayesianannotationexperimentaldesignnormalizationbatcheffectpreprocessingbioinformaticsgemmagenomicstranscriptomics
10.5 match 10 stars 5.99 score 26 scripts
kfarleigh
PopGenHelpR:Streamline Population Genomic and Genetic Analyses
Estimate commonly used population genomic statistics and generate publication quality figures. 'PopGenHelpR' uses vcf, 'geno' (012), and csv files to generate output.
Maintained by Keaka Farleigh. Last updated 8 months ago.
diversityfstheterozygosityinterpolationneispopulation-geneticspopulation-genomicsprivate-allelessnmfstructurevcf
12.5 match 3 stars 5.02 score 14 scripts
bioc
rCGH:Comprehensive Pipeline for Analyzing and Visualizing Array-Based CGH Data
A comprehensive pipeline for analyzing and interactively visualizing genomic profiles generated through commercial or custom aCGH arrays. As inputs, rCGH supports Agilent dual-color Feature Extraction files (.txt), from 44 to 400K, Affymetrix SNP6.0 and cytoScanHD probeset.txt, cychp.txt, and cnchp.txt files exported from ChAS or Affymetrix Power Tools. rCGH also supports custom arrays, provided data complies with the expected format. This package takes over all the steps required for individual genomic profiles analysis, from reading files to profiles segmentation and gene annotations. This package also provides several visualization functions (static or interactive) which facilitate individual profiles interpretation. Input files can be in compressed format, e.g. .bz2 or .gz.
Maintained by Frederic Commo. Last updated 5 months ago.
acghcopynumbervariationpreprocessingfeatureextraction
12.3 match 4 stars 5.10 score 26 scripts 1 dependent
bioinformatics-ptp
detectRUNS:Detect Runs of Homozygosity and Runs of Heterozygosity in Diploid Genomes
Detection of runs of homozygosity and of heterozygosity in diploid genomes using two methods: sliding windows (Purcell et al (2007) <doi:10.1086/519795>) and consecutive runs (Marras et al (2015) <doi:10.1111/age.12259>).
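To make the "consecutive runs" idea concrete, the following is a simplified Python sketch that scans a genotype vector and reports maximal stretches of homozygous markers. It is an assumption-laden illustration, not detectRUNS code: the 0/1/2 genotype coding, the function name `consecutive_runs`, and the single `min_snps` threshold are all hypothetical, and no tolerance for missing or opposite genotypes, length in base pairs, or gaps between markers is modelled.

```python
def consecutive_runs(genotypes, min_snps=3):
    """Find maximal stretches of homozygous markers.

    genotypes: sequence of allele counts per SNP (assumed coding:
               0 or 2 = homozygous, 1 = heterozygous).
    Returns a list of (start, end) index pairs, both inclusive.
    Hypothetical helper for illustration only.
    """
    runs = []
    start = None  # index where the current homozygous stretch began
    for i, g in enumerate(genotypes):
        if g in (0, 2):
            if start is None:
                start = i
        else:
            # A heterozygous marker closes the current stretch.
            if start is not None and i - start >= min_snps:
                runs.append((start, i - 1))
            start = None
    # Close a stretch that reaches the end of the chromosome.
    if start is not None and len(genotypes) - start >= min_snps:
        runs.append((start, len(genotypes) - 1))
    return runs
```

Swapping the homozygous test for a heterozygous one would give the analogous scan for runs of heterozygosity.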
Maintained by Filippo Biscarini. Last updated 3 years ago.
9.5 match 9 stars 6.50 score 35 scripts
bioc
REMP:Repetitive Element Methylation Prediction
Machine learning-based tools to predict DNA methylation of locus-specific repetitive elements (RE) by learning surrounding genetic and epigenetic information. These tools provide genomewide and single-base resolution of DNA methylation prediction on RE that are difficult to measure using array-based or sequencing-based platforms, which enables epigenome-wide association study (EWAS) and differentially methylated region (DMR) analysis on RE.
Maintained by Yinan Zheng. Last updated 5 months ago.
dnamethylationmicroarraymethylationarraysequencinggenomewideassociationepigeneticspreprocessingmultichanneltwochanneldifferentialmethylationqualitycontroldataimport
10.4 match 2 stars 5.94 score 18 scripts
bioc
tidyomics:Easily install and load the tidyomics ecosystem
The tidyomics ecosystem is a set of packages for ’omic data analysis that work together in harmony; they share common data representations and API design, consistent with the tidyverse ecosystem. The tidyomics package is designed to make it easy to install and load core packages from the tidyomics ecosystem with a single command.
Maintained by Stefano Mangiola. Last updated 5 months ago.
assaydomaininfrastructurernaseqdifferentialexpressiongeneexpressionnormalizationclusteringqualitycontrolsequencingtranscriptiontranscriptomicscytometrygenomicstidyverse
10.0 match 64 stars 6.11 score 5 scripts
adeverse
ade4:Analysis of Ecological Data: Exploratory and Euclidean Methods in Environmental Sciences
Tools for multivariate data analysis. Several methods are provided for the analysis (i.e., ordination) of one-table (e.g., principal component analysis, correspondence analysis), two-table (e.g., coinertia analysis, redundancy analysis), three-table (e.g., RLQ analysis) and K-table (e.g., STATIS, multiple coinertia analysis). The philosophy of the package is described in Dray and Dufour (2007) <doi:10.18637/jss.v022.i04>.
Maintained by Aurélie Siberchicot. Last updated 8 days ago.
4.0 match 40 stars 15.10 score 2.2k scripts 257 dependents
bioc
celda:CEllular Latent Dirichlet Allocation
Celda is a suite of Bayesian hierarchical models for clustering single-cell RNA-sequencing (scRNA-seq) data. It is able to perform "bi-clustering" and simultaneously cluster genes into gene modules and cells into cell subpopulations. It also contains DecontX, a novel Bayesian method to computationally estimate and remove RNA contamination in individual cells without empty droplet information. A variety of scRNA-seq data visualization functions is also included.
Maintained by Joshua Campbell. Last updated 1 month ago.
singlecellgeneexpressionclusteringsequencingbayesianimmunooncologydataimportcppopenmp
5.7 match 147 stars 10.47 score 256 scripts 2 dependents
bioc
DECIPHER:Tools for curating, analyzing, and manipulating biological sequences
A toolset for deciphering and managing biological sequences.
Maintained by Erik Wright. Last updated 17 days ago.
clusteringgeneticssequencingdataimportvisualizationmicroarrayqualitycontrolqpcralignmentwholegenomemicrobiomeimmunooncologygenepredictionopenmp
5.7 match 10.55 score 1.1k scripts 14 dependents
bioc
RankProd:Rank Product method for identifying differentially expressed genes with application in meta-analysis
Non-parametric method for identifying differentially expressed (up- or down- regulated) genes based on the estimated percentage of false predictions (pfp). The method can combine data sets from different origins (meta-analysis) to increase the power of the identification.
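For readers unfamiliar with the statistic behind the method's name, here is a small Python sketch of a rank product: each gene is ranked within every replicate, and the geometric mean of its ranks is taken across replicates. This is only a sketch under stated assumptions, not RankProd's implementation. The function name `rank_products` and the input layout are hypothetical, ties are broken arbitrarily, and the permutation step used to estimate the pfp is omitted.

```python
import math

def rank_products(fold_changes):
    """Geometric mean of per-replicate ranks for each gene.

    fold_changes: list of replicates, each a list with one value per gene
                  (assumed layout; rank 1 = largest value in a replicate).
    Hypothetical helper for illustration only.
    """
    n_genes = len(fold_changes[0])
    log_rank_sum = [0.0] * n_genes
    for replicate in fold_changes:
        # Gene indices ordered from largest to smallest value.
        order = sorted(range(n_genes), key=lambda g: replicate[g], reverse=True)
        for rank, gene in enumerate(order, start=1):
            log_rank_sum[gene] += math.log(rank)
    k = len(fold_changes)
    # exp(mean(log(rank))) is the geometric mean of the ranks.
    return [math.exp(s / k) for s in log_rank_sum]
```

Genes with a small rank product are consistently near the top of every replicate, which is what makes them candidates for up-regulation in this scheme.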
Maintained by Francesco Del Carratore. Last updated 5 months ago.
differentialexpressionstatisticalmethodsoftwareresearchfieldmetabolomicslipidomicsproteomicssystemsbiologygeneexpressionmicroarraygenesignaling
9.4 match 6.39 score 81 scripts 5 dependents
bioc
motifbreakR:A Package For Predicting The Disruptiveness Of Single Nucleotide Polymorphisms On Transcription Factor Binding Sites
We introduce motifbreakR, which allows the biologist to judge in the first place whether the sequence surrounding the polymorphism is a good match, and in the second place how much information is gained or lost in one allele of the polymorphism relative to another. MotifbreakR is both flexible and extensible over previous offerings; giving a choice of algorithms for interrogation of genomes with motifs from public sources that users can choose from; these are 1) a weighted-sum probability matrix, 2) log-probabilities, and 3) weighted by relative entropy. MotifbreakR can predict effects for novel or previously described variants in public databases, making it suitable for tasks beyond the scope of its original design. Lastly, it can be used to interrogate any genome curated within Bioconductor (currently there are 32 species, a total of 109 versions).
Maintained by Simon Gert Coetzee. Last updated 5 months ago.
chipseqvisualizationmotifannotationtranscription
6.7 match 28 stars 8.89 score 103 scripts
bioc
switchde:Switch-like differential expression across single-cell trajectories
Inference and detection of switch-like differential expression across single-cell RNA-seq trajectories.
Maintained by Kieran Campbell. Last updated 5 months ago.
immunooncologysoftwaretranscriptomicsgeneexpressionrnaseqregressiondifferentialexpressionsinglecellgene-expressiongenomicssingle-cell
10.0 match 19 stars 5.98 score 7 scripts
jendelman
rrBLUP:Ridge Regression and Other Kernels for Genomic Selection
Software for genomic prediction with the RR-BLUP mixed model (Endelman 2011, <doi:10.3835/plantgenome2011.08.0024>). One application is to estimate marker effects by ridge regression; alternatively, BLUPs can be calculated based on an additive relationship matrix or a Gaussian kernel.
Maintained by Jeffrey Endelman. Last updated 1 year ago.
9.0 match 13 stars 6.55 score 568 scripts 6 dependents
husson
FactoMineR:Multivariate Exploratory Data Analysis and Data Mining
Exploratory data analysis methods to summarize, visualize and describe datasets. The main principal component methods are available, those with the largest potential in terms of applications: principal component analysis (PCA) when variables are quantitative, correspondence analysis (CA) and multiple correspondence analysis (MCA) when variables are categorical, Multiple Factor Analysis when variables are structured in groups, etc. and hierarchical cluster analysis. F. Husson, S. Le and J. Pages (2017).
Maintained by Francois Husson. Last updated 4 months ago.
4.0 match 47 stars 14.71 score 5.6k scripts 112 dependents
bioc
gmoviz:Seamless visualization of complex genomic variations in GMOs and edited cell lines
Genetically modified organisms (GMOs) and cell lines are widely used models in all kinds of biological research. As part of characterising these models, DNA sequencing technology and bioinformatics analyses are used systematically to study their genomes. Therefore, large volumes of data are generated and various algorithms are applied to analyse this data, which introduces a challenge on representing all findings in an informative and concise manner. `gmoviz` provides users with an easy way to visualise and facilitate the explanation of complex genomic editing events on a larger, biologically-relevant scale.
Maintained by Kathleen Zeglinski. Last updated 5 months ago.
visualizationsequencinggeneticvariabilitygenomicvariationcoverage
13.7 match 4.30 score 9 scripts
bhklab
mRMRe:Parallelized Minimum Redundancy, Maximum Relevance (mRMR)
Computes mutual information matrices from continuous, categorical and survival variables, as well as feature selection with minimum redundancy, maximum relevance (mRMR) and a new ensemble mRMR technique. Published in De Jay et al. (2013) <doi:10.1093/bioinformatics/btt383>.
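The greedy selection at the heart of minimum redundancy, maximum relevance can be sketched in a few lines of Python. This is an illustrative sketch, not mRMRe's implementation or API. It assumes the mutual information values are already computed, uses the common "relevance minus mean redundancy" criterion, and the names `mrmr_select`, `relevance`, and `redundancy` are hypothetical.

```python
def mrmr_select(relevance, redundancy, k):
    """Greedy mRMR feature selection from precomputed mutual information.

    relevance:  relevance[f] = MI between feature f and the target (assumed given).
    redundancy: redundancy[f][g] = MI between features f and g (assumed given).
    k:          number of features to select.
    Hypothetical helper for illustration only.
    """
    selected = []
    remaining = set(range(len(relevance)))

    def score(f):
        # The first pick is driven by relevance alone.
        if not selected:
            return relevance[f]
        mean_redundancy = sum(redundancy[f][s] for s in selected) / len(selected)
        return relevance[f] - mean_redundancy

    while remaining and len(selected) < k:
        # sorted() makes tie-breaking deterministic (lowest index wins).
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

For example, with relevances `[0.9, 0.85, 0.3]` and features 0 and 1 strongly redundant with each other, the second pick is feature 2 rather than the more relevant but redundant feature 1.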
Maintained by Benjamin Haibe-Kains. Last updated 4 years ago.
6.5 match 19 stars 8.95 score 105 scripts 2 dependents
core-bioinformatics
ClustAssess:Tools for Assessing Clustering
A set of tools for evaluating clustering robustness using proportion of ambiguously clustered pairs (Senbabaoglu et al. (2014) <doi:10.1038/srep06207>), as well as similarity across methods and method stability using element-centric clustering comparison (Gates et al. (2019) <doi:10.1038/s41598-019-44892-y>). Additionally, this package enables stability-based parameter assessment for graph-based clustering pipelines typical in single-cell data analysis.
Maintained by Andi Munteanu. Last updated 1 month ago.
softwaresinglecellrnaseqatacseqnormalizationpreprocessingdimensionreductionvisualizationqualitycontrolclusteringclassificationannotationgeneexpressiondifferentialexpressionbioinformaticsgenomicsmachine-learningparameter-optimizationrobustnesssingle-cellunsupervised-learningcpp
10.0 match 23 stars 5.70 score 18 scripts
bioc
SCOPE:A normalization and copy number estimation method for single-cell DNA sequencing
Whole genome single-cell DNA sequencing (scDNA-seq) enables characterization of copy number profiles at the cellular level. This circumvents the averaging effects associated with bulk-tissue sequencing and has increased resolution yet decreased ambiguity in deconvolving cancer subclones and elucidating cancer evolutionary history. ScDNA-seq data is, however, sparse, noisy, and highly variable even within a homogeneous cell population, due to the biases and artifacts that are introduced during the library preparation and sequencing procedure. Here, we propose SCOPE, a normalization and copy number estimation method for scDNA-seq data. The distinguishing features of SCOPE include: (i) utilization of cell-specific Gini coefficients for quality controls and for identification of normal/diploid cells, which are further used as negative control samples in a Poisson latent factor model for normalization; (ii) modeling of GC content bias using an expectation-maximization algorithm embedded in the Poisson generalized linear models, which accounts for the different copy number states along the genome; (iii) a cross-sample iterative segmentation procedure to identify breakpoints that are shared across cells from the same genetic background.
Maintained by Rujin Wang. Last updated 5 months ago.
singlecellnormalizationcopynumbervariationsequencingwholegenomecoveragealignmentqualitycontroldataimportdnaseq
9.6 match 5.92 score 84 scripts
bioc
multicrispr:Multi-locus multi-purpose Crispr/Cas design
This package is for designing Crispr/Cas9 and Prime Editing experiments. It contains functions to (1) define and transform genomic targets, (2) find spacers, (4) count offtarget (mis)matches, and (5) compute Doench2016/2014 targeting efficiency. Care has been taken for multicrispr to scale well towards large target sets, enabling the design of large Crispr/Cas9 libraries.
Maintained by Aditya Bhagwat. Last updated 4 months ago.
10.0 match 5.65 score 2 scripts
samuel-marsh
scCustomize:Custom Visualizations & Functions for Streamlined Analyses of Single Cell Sequencing
Collection of functions created and/or curated to aid in the visualization and analysis of single-cell data using 'R'. 'scCustomize' aims to provide 1) Customized visualizations for aid in ease of use and to create more aesthetic and functional visuals. 2) Improve speed/reproducibility of common tasks/pieces of code in scRNA-seq analysis with a single or group of functions. For citation please use: Marsh SE (2021) "Custom Visualizations & Functions for Streamlined Analyses of Single Cell Sequencing" <doi:10.5281/zenodo.5706430> RRID:SCR_024675.
Maintained by Samuel Marsh. Last updated 3 months ago.
customization, ggplot2, scrna-seq, seurat, single-cell, single-cell-genomics, single-cell-rna-seq, visualization
6.7 match 246 stars 8.45 score 1.1k scripts
ramiromagno
gwasrapidd:'REST' 'API' Client for the 'NHGRI'-'EBI' 'GWAS' Catalog
'GWAS' R 'API' Data Download. This package provides easy access to the 'NHGRI'-'EBI' 'GWAS' Catalog data by accessing the 'REST' 'API' <https://www.ebi.ac.uk/gwas/rest/docs/api/>.
Maintained by Ramiro Magno. Last updated 1 year ago.
thirdpartyclient, biomedicalinformatics, genomewideassociation, snp, association-studies, gwas-catalog, human, rest-client, trait, trait-ontology
6.9 match 95 stars 8.10 score 49 scripts 1 dependents
bioc
CNEr:CNE Detection and Visualization
Large-scale identification and advanced visualization of sets of conserved noncoding elements.
Maintained by Ge Tan. Last updated 5 months ago.
generegulation, visualization, dataimport
6.0 match 3 stars 9.28 score 35 scripts 19 dependents
bioc
MungeSumstats:Standardise summary statistics from GWAS
The *MungeSumstats* package is designed to facilitate the standardisation of GWAS summary statistics. It reformats inputted summary statistics to include SNP, CHR, BP and can look up these values if any are missing. It also performs dozens of QC and filtering steps to ensure high data quality and minimise inter-study differences.
Maintained by Alan Murphy. Last updated 3 months ago.
snp, wholegenome, genetics, comparativegenomics, genomewideassociation, genomicvariation, preprocessing
9.0 match 3 stars 6.23 score 91 scripts
bioc
VanillaICE:A Hidden Markov Model for high throughput genotyping arrays
Hidden Markov Models for characterizing chromosomal alteration in high throughput SNP arrays.
Maintained by Robert Scharpf. Last updated 5 months ago.
10.4 match 5.36 score 63 scripts 1 dependents
bioc
InTAD:Search for correlation between epigenetic signals and gene expression in TADs
The package is focused on the detection of correlation between expressed genes and selected epigenomic signals (i.e. enhancers obtained from ChIP-seq data) either within topologically associated domains (TADs) or between chromatin contact loop anchors. Various parameters can be controlled to investigate the influence of external factors and visualization plots are available for each analysis step.
Maintained by Konstantin Okonechnikov. Last updated 5 months ago.
epigenetics, sequencing, chipseq, rnaseq, hic, geneexpression, immunooncology
12.8 match 4.30 score 6 scripts
bioc
LOLA:Locus overlap analysis for enrichment of genomic ranges
Provides functions for testing overlap of sets of genomic regions with public and custom region set (genomic ranges) databases. This makes it possible to do automated enrichment analysis for genomic region sets, thus facilitating interpretation of functional genomics and epigenomics data.
Maintained by Nathan Sheffield. Last updated 5 months ago.
genesetenrichment, generegulation, genomeannotation, systemsbiology, functionalgenomics, chipseq, methylseq, sequencing
5.9 match 76 stars 9.34 score 160 scripts
bioc
TENxIO:Import methods for 10X Genomics files
Provides a structured S4 approach to importing data files from the 10X pipelines. It mainly supports Single Cell Multiome ATAC + Gene Expression data among other data types. The main Bioconductor data representations used are SingleCellExperiment and RaggedExperiment.
Maintained by Marcel Ramos. Last updated 4 months ago.
software, infrastructure, dataimport, singlecell, bioconductor-package, u24ca289073
9.5 match 5.77 score 7 scripts 3 dependents
bioc
CNVrd2:CNVrd2: a read depth-based method to detect and genotype complex common copy number variants from next generation sequencing data.
CNVrd2 uses next-generation sequencing data to measure human gene copy number for multiple samples, identify SNPs tagging copy number variants and detect copy number polymorphic genomic regions.
Maintained by Hoang Tan Nguyen. Last updated 5 months ago.
copynumbervariation, snp, sequencing, software, coverage, linkagedisequilibrium, clustering, .jags, cpp
11.0 match 3 stars 4.92 score
henrikbengtsson
PSCBS:Analysis of Parent-Specific DNA Copy Numbers
Segmentation of allele-specific DNA copy number data and detection of regions with abnormal copy number within each parental chromosome. Both tumor-normal paired and tumor-only analyses are supported.
Maintained by Henrik Bengtsson. Last updated 1 year ago.
acgh, copynumbervariants, snp, microarray, onechannel, twochannel, genetics
7.1 match 7 stars 7.63 score 34 scripts 9 dependents
bioc
UCSC.utils:Low-level utilities to retrieve data from the UCSC Genome Browser
A set of low-level utilities to retrieve data from the UCSC Genome Browser. Most functions in the package access the data via the UCSC REST API but some of them query the UCSC MySQL server directly. Note that the primary purpose of the package is to support higher-level functionalities implemented in downstream packages like GenomeInfoDb or txdbmaker.
Maintained by Hervé Pagès. Last updated 2 months ago.
infrastructure, genomeassembly, annotation, genomeannotation, dataimport, bioconductor-package, core-package
5.3 match 1 stars 10.09 score 4 scripts 1.7k dependents
bioc
DropletUtils:Utilities for Handling Single-Cell Droplet Data
Provides a number of utility functions for handling single-cell (RNA-seq) data from droplet technologies such as 10X Genomics. This includes data loading from count matrices or molecule information files, identification of cells from empty droplets, removal of barcode-swapped pseudo-cells, and downsampling of the count matrix.
Maintained by Jonathan Griffiths. Last updated 4 months ago.
immunooncology, singlecell, sequencing, rnaseq, geneexpression, transcriptomics, dataimport, coverage, zlib, cpp
5.4 match 10.01 score 2.7k scripts 9 dependents
clandere
AnaCoDa:Analysis of Codon Data under Stationarity using a Bayesian Framework
Is a collection of models to analyze genome scale codon data using a Bayesian framework. Provides visualization routines and checkpointing for model fittings. Currently published models to analyze gene data for selection on codon usage based on Ribosome Overhead Cost (ROC) are: ROC (Gilchrist et al. (2015) <doi:10.1093/gbe/evv087>), and ROC with phi (Wallace & Drummond (2013) <doi:10.1093/molbev/mst051>). In addition 'AnaCoDa' contains three currently unpublished models. The FONSE (First order approximation On NonSense Error) model analyzes gene data for selection on codon usage against nonsense error rates. The PA (PAusing time) and PANSE (PAusing time + NonSense Error) models use ribosome footprinting data to estimate ribosome pausing times with and without nonsense error rates.
Maintained by Cedric Landerer. Last updated 4 years ago.
13.4 match 1 stars 4.00 score 100 scripts
cran
avidaR:A Computational Biologist’s Toolkit To Get Data From 'avidaDB'
Easy-to-use tools for performing complex queries on 'avidaDB', a semantic database that stores genomic and transcriptomic data of self-replicating computer programs (known as digital organisms) that mutate and evolve within a user-defined computational environment.
Maintained by Raúl Ortega. Last updated 9 months ago.
31.4 match 1.70 score
bioc
SIM:Integrated Analysis on two human genomic datasets
Finds associations between two human genomic datasets.
Maintained by Renee X. de Menezes. Last updated 5 months ago.
12.4 match 4.30 score 3 scripts
bioc
SeqSQC:A bioconductor package for sample quality check with next generation sequencing data
SeqSQC is designed to identify problematic samples in NGS data, including samples with gender mismatch, contamination, cryptic relatedness, and population outliers.
Maintained by Qian Liu. Last updated 5 months ago.
experiment data, homo_sapiens_data, sequencing data, project1000genomes, genome
10.0 match 5.26 score 2 scripts
bioc
InteractionSet:Base Classes for Storing Genomic Interaction Data
Provides the GInteractions, InteractionSet and ContactMatrix objects and associated methods for storing and manipulating genomic interaction data from Hi-C and ChIA-PET experiments.
Maintained by Aaron Lun. Last updated 5 months ago.
infrastructure, datarepresentation, software, hic, cpp
6.6 match 7.97 score 250 scripts 36 dependents
bioc
Repitools:Epigenomic tools
Tools for the analysis of enrichment-based epigenomic data. Features include summarization and visualization of epigenomic data across promoters according to gene expression context, finding regions of differential methylation/binding, BayMeth for quantifying methylation etc.
Maintained by Mark Robinson. Last updated 5 months ago.
dnamethylation, geneexpression, methylseq
8.8 match 5.90 score 267 scripts
g3viz
g3viz:Interactively Visualize Genetic Mutation Data using a Lollipop-Diagram
Interface for 'g3-lollipop' 'JavaScript' library. Visualize genetic mutation data using an interactive lollipop diagram in 'RStudio' or your web browser.
Maintained by Xin Guo. Last updated 7 months ago.
bioinformatics, genomics-visualization, lollipop-plot, variants, visualize-mutation-data
9.3 match 31 stars 5.61 score 22 scripts
alenxav
NAM:Nested Association Mapping
Designed for association studies in nested association mapping (NAM) panels, experimental and random panels. The method is described by Xavier et al. (2015) <doi:10.1093/bioinformatics/btv448>. It includes tools for genome-wide associations of multiple populations, marker quality control, population genetics analysis, genome-wide prediction, solving mixed models and finding variance components through likelihood and Bayesian methods.
Maintained by Alencar Xavier. Last updated 5 years ago.
9.1 match 2 stars 5.72 score 44 scripts 1 dependents