R-universe search: topic:genomics

bioc

MultiAssayExperiment:Software for the integration of multi-omics experiments in Bioconductor

Harmonize data management of multiple experimental assays performed on an overlapping set of specimens. It provides a familiar Bioconductor user experience by extending concepts from SummarizedExperiment, supporting an open-ended mix of standard data classes for individual assays, and allowing subsetting by genomic ranges or rownames. Facilities are provided for reshaping data into wide and long formats for adaptability to graphing and downstream analysis.

Maintained by Marcel Ramos. Last updated 2 months ago.

infrastructure datarepresentation bioconductor bioconductor-package genomics nci-itcr tcga u24ca289073

71 stars 14.94 score 670 scripts 126 dependents

bioc

GSVA:Gene Set Variation Analysis for Microarray and RNA-Seq Data

Gene Set Variation Analysis (GSVA) is a non-parametric, unsupervised method for estimating variation of gene set enrichment across the samples of an expression data set. GSVA performs a change in coordinate systems, transforming the data from a gene by sample matrix to a gene-set by sample matrix, thereby allowing the evaluation of pathway enrichment for each sample. This new matrix of GSVA enrichment scores facilitates applying standard analytical methods like functional enrichment, survival analysis, clustering, CNV-pathway analysis or cross-tissue pathway analysis, in a pathway-centric manner.

Maintained by Robert Castelo. Last updated 11 days ago.

functionalgenomics microarray rnaseq pathways genesetenrichment gene-set-enrichment genomics pathway-enrichment-analysis

212 stars 14.74 score 1.6k scripts 19 dependents
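
The change of coordinates described for GSVA, from a gene by sample matrix to a gene-set by sample matrix, can be illustrated with a minimal Python sketch. This is not the GSVA statistic: as a simplifying assumption, each gene is z-scored across samples and a gene set's score in a sample is the mean z-score of its member genes. The function names `zscores` and `gene_set_scores` are hypothetical and only serve the illustration.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize one gene's expression values across samples."""
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        # A gene that is constant across samples carries no signal.
        return [0.0] * len(values)
    return [(v - mu) / sd for v in values]


def gene_set_scores(expr, gene_sets):
    """Map {gene: [value per sample]} to {gene set: [score per sample]}.

    Stand-in scoring (assumption): mean z-score of the member genes.
    Gene sets with no measured genes are skipped.
    """
    z = {gene: zscores(vals) for gene, vals in expr.items()}
    scores = {}
    for name, genes in gene_sets.items():
        present = [g for g in genes if g in z]
        if not present:
            continue
        n_samples = len(z[present[0]])
        scores[name] = [mean(z[g][j] for g in present) for j in range(n_samples)]
    return scores


# Toy data: three genes measured in three samples, grouped into two sets.
expr = {"A": [1, 2, 3], "B": [2, 4, 6], "C": [5, 5, 5]}
gene_sets = {"S1": ["A", "B"], "S2": ["C"]}
scores = gene_set_scores(expr, gene_sets)
```

The result has one row per gene set rather than per gene, which is what lets downstream methods such as clustering or survival analysis operate at the pathway level.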

bioc

maftools:Summarize, Analyze and Visualize MAF Files

Analyze and visualize Mutation Annotation Format (MAF) files from large-scale sequencing studies. This package provides various functions to perform the most commonly used analyses in cancer genomics and to create feature-rich, customizable visualizations with minimal effort.

Maintained by Anand Mayakonda. Last updated 5 months ago.

datarepresentation dnaseq visualization drivermutation variantannotation featureextraction classification somaticmutation sequencing functionalgenomics survival bioinformatics cancer-genome-atlas cancer-genomics genomics maf-files tcga curl bzip2 xz-utils zlib

461 stars 14.59 score 948 scripts 18 dependents

bioc

mixOmics:Omics Data Integration Project

Multivariate methods are well suited to large omics data sets where the number of variables (e.g. genes, proteins, metabolites) is much larger than the number of samples (patients, cells, mice). They have the appealing property of reducing the dimension of the data by using instrumental variables (components), which are defined as combinations of all variables. Those components are then used to produce useful graphical outputs that enable better understanding of the relationships and correlation structures between the different data sets that are integrated. mixOmics offers a wide range of multivariate methods for the exploration and integration of biological datasets with a particular focus on variable selection. The package proposes several sparse multivariate models we have developed to identify the key variables that are highly correlated, and/or explain the biological outcome of interest. The data that can be analysed with mixOmics may come from high-throughput sequencing technologies, such as omics data (transcriptomics, metabolomics, proteomics, metagenomics, etc.), but also from beyond the realm of omics (e.g. spectral imaging). The methods implemented in mixOmics can also handle missing values without having to delete entire rows with missing data. A non-exhaustive list of methods includes variants of generalised Canonical Correlation Analysis, sparse Partial Least Squares and sparse Discriminant Analysis. Recently we implemented integrative methods to combine multiple data sets: N-integration with variants of Generalised Canonical Correlation Analysis and P-integration with variants of multi-group Partial Least Squares.

Maintained by Eva Hamrud. Last updated 5 days ago.

immunooncology microarray sequencing metabolomics metagenomics proteomics geneprediction multiplecomparison classification regression bioconductor genomics genomics-data genomics-visualization multivariate-analysis multivariate-statistics omics r-pkg r-project

185 stars 13.75 score 1.3k scripts 22 dependents

knausb

vcfR:Manipulate and Visualize VCF Data

Facilitates easy manipulation of variant call format (VCF) data. Functions are provided to rapidly read from and write to VCF files. Once VCF data is read into R a parser function extracts matrices of data. This information can then be used for quality control or other purposes. Additional functions provide visualization of genomic data. Once processing is complete data may be written to a VCF file (*.vcf.gz). It also may be converted into other popular R objects (e.g., genlight, DNAbin). VcfR provides a link between VCF data and familiar R software.

Maintained by Brian J. Knaus. Last updated 1 month ago.

genomics population-genetics population-genomics rcpp vcf-data visualization zlib cpp

256 stars 13.66 score 3.1k scripts 19 dependents

bioc

GEOquery:Get data from NCBI Gene Expression Omnibus (GEO)

The NCBI Gene Expression Omnibus (GEO) is a public repository of microarray data. Given the rich and varied nature of this resource, it is only natural to want to apply Bioconductor tools to these data. GEOquery is the bridge between GEO and Bioconductor.

Maintained by Sean Davis. Last updated 5 months ago.

microarray dataimport onechannel twochannel sage bioconductor bioinformatics data-science genomics ncbi-geo

93 stars 13.48 score 4.1k scripts 45 dependents

bioc

plyranges:A fluent interface for manipulating GenomicRanges

A dplyr-like interface for interacting with the common Bioconductor classes Ranges and GenomicRanges. By providing a grammatical and consistent way of manipulating these classes, their accessibility for new Bioconductor users is hopefully increased.

Maintained by Michael Love. Last updated 13 days ago.

infrastructure datarepresentation workflowstep coverage bioconductor data-analysis dplyr genomic-ranges genomics tidy-data

144 stars 12.66 score 1.9k scripts 20 dependents

stephenturner

qqman:Q-Q and Manhattan Plots for GWAS Data

Create Q-Q and Manhattan plots for GWAS data from PLINK results.

Maintained by Stephen Turner. Last updated 2 years ago.

genomics gwas

165 stars 12.51 score 2.4k scripts 20 dependents

igordot

msigdbr:MSigDB Gene Sets for Multiple Organisms in a Tidy Data Format

Provides the 'Molecular Signatures Database' (MSigDB) gene sets typically used with the 'Gene Set Enrichment Analysis' (GSEA) software (Subramanian et al. 2005 <doi:10.1073/pnas.0506580102>, Liberzon et al. 2015 <doi:10.1016/j.cels.2015.12.004>, Castanza et al. 2023 <doi:10.1038/s41592-023-02014-7>) as an R data frame. The package includes the human genes as listed in MSigDB as well as the corresponding symbols and IDs for frequently studied model organisms such as mouse, rat, pig, fly, and yeast.

Maintained by Igor Dolgalev. Last updated 13 days ago.

enrichment-analysis gene-sets genomics gsea msigdb pathway-analysis pathways

73 stars 12.20 score 3.6k scripts 20 dependents

bioc

GenomicDataCommons:NIH / NCI Genomic Data Commons Access

Programmatically access the NIH / NCI Genomic Data Commons RESTful service.

Maintained by Sean Davis. Last updated 2 months ago.

dataimport sequencing api-client bioconductor bioinformatics cancer core-services data-science genomics nci tcga vignette

87 stars 11.94 score 238 scripts 12 dependents

ropensci

biomartr:Genomic Data Retrieval

Perform large scale genomic data retrieval and functional annotation retrieval. This package aims to provide users with a standardized way to automate genome, proteome, 'RNA', coding sequence ('CDS'), 'GFF', and metagenome retrieval from 'NCBI RefSeq', 'NCBI Genbank', 'ENSEMBL', and 'UniProt' databases. Furthermore, an interface to the 'BioMart' database (Smedley et al. (2009) <doi:10.1186/1471-2164-10-22>) allows users to retrieve functional annotation for genomic loci. In addition, users can download entire databases such as 'NCBI RefSeq' (Pruitt et al. (2007) <doi:10.1093/nar/gkl842>), 'NCBI nr', 'NCBI nt', 'NCBI Genbank' (Benson et al. (2013) <doi:10.1093/nar/gks1195>), etc. with only one command.

Maintained by Hajk-Georg Drost. Last updated 2 months ago.

biomart genomic-data-retrieval annotation-retrieval database-retrieval ncbi ensembl biological-data-retrieval ensembl-servers genome genome-annotation genome-retrieval genomics meta-analysis metagenomics ncbi-genbank peer-reviewed proteome sequenced-genomes

218 stars 11.35 score 129 scripts 3 dependents

bioc

gdsfmt:R Interface to CoreArray Genomic Data Structure (GDS) Files

Provides a high-level R interface to CoreArray Genomic Data Structure (GDS) data files. GDS is portable across platforms and uses a hierarchical structure to store multiple scalable array-oriented data sets with metadata information. It is suited for large-scale datasets, especially for data which are much larger than the available random-access memory. The gdsfmt package offers efficient operations specifically designed for integers of less than 8 bits, since a diploid genotype, like a single-nucleotide polymorphism (SNP), usually occupies fewer bits than a byte. Data compression and decompression are available with relatively efficient random access. A GDS file can also be read in parallel by multiple R processes, as supported by the package parallel.

Maintained by Xiuwen Zheng. Last updated 18 days ago.

infrastructure dataimport bioinformatics gds-format genomics cpp

18 stars 11.34 score 920 scripts 29 dependents
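
The gdsfmt description notes that a diploid SNP genotype needs fewer bits than a byte. A minimal Python sketch of that idea follows, packing four 2-bit genotype codes into each byte. The bit order (lowest bits first) and the helper names `pack_genotypes` and `unpack_genotypes` are assumptions made for illustration; they do not describe the actual GDS on-disk layout or the gdsfmt API.

```python
def pack_genotypes(genotypes):
    """Pack genotype codes in the range 0..3 into bytes, four per byte.

    Assumed layout: the first genotype occupies the two lowest bits.
    """
    packed = bytearray((len(genotypes) + 3) // 4)
    for i, g in enumerate(genotypes):
        if not 0 <= g <= 3:
            raise ValueError(f"genotype code {g} does not fit in 2 bits")
        packed[i // 4] |= g << (2 * (i % 4))
    return bytes(packed)


def unpack_genotypes(packed, n):
    """Recover the first n genotype codes from the packed bytes."""
    return [(packed[i // 4] >> (2 * (i % 4))) & 0b11 for i in range(n)]


# Six genotypes fit in two bytes instead of six.
codes = [0, 1, 2, 3, 2, 0]
packed = pack_genotypes(codes)
restored = unpack_genotypes(packed, len(codes))
```

Because every genotype sits at a fixed bit offset, a single value can be read back without unpacking its neighbours, which is the property that makes random access cheap in a packed store.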

neuhausi

canvasXpress:Visualization Package for CanvasXpress in R

Enables creation of visualizations using the CanvasXpress framework in R. CanvasXpress is a standalone JavaScript library for reproducible research with complete tracking of data and end-user modifications stored in a single PNG image that can be played back. See <https://www.canvasxpress.org> for more information.

Maintained by Connie Brett. Last updated 17 hours ago.

analytics bioinformatics chart charting dash dashboard data-analytics data-science data-visualization genomics graphs javascript network network-visualization python reproducible-research shiny visualization

298 stars 11.28 score 145 scripts

gaynorr

AlphaSimR:Breeding Program Simulations

The successor to the 'AlphaSim' software for breeding program simulation [Faux et al. (2016) <doi:10.3835/plantgenome2016.02.0013>]. Used for stochastic simulations of breeding programs to the level of DNA sequence for every individual. Contained is a wide range of functions for modeling common tasks in a breeding program, such as selection and crossing. These functions allow for constructing simulations of highly complex plant and animal breeding programs via scripting in the R software environment. Such simulations can be used to evaluate overall breeding program performance and conduct research into breeding program design, such as implementation of genomic selection. Included is the 'Markovian Coalescent Simulator' ('MaCS') for fast simulation of biallelic sequences according to a population demographic history [Chen et al. (2009) <doi:10.1101/gr.083634.108>].

Maintained by Chris Gaynor. Last updated 5 months ago.

breeding genomics simulation openblas cpp openmp

47 stars 10.09 score 534 scripts 2 dependents

bioc

nullranges:Generation of null ranges via bootstrapping or covariate matching

Modular package for generation of sets of ranges representing the null hypothesis. These can take the form of bootstrap samples of ranges (using the block bootstrap framework of Bickel et al 2010), or sets of control ranges that are matched across one or more covariates. nullranges is designed to be inter-operable with other packages for analysis of genomic overlap enrichment, including the plyranges Bioconductor package.

Maintained by Michael Love. Last updated 5 months ago.

visualization genesetenrichment functionalgenomics epigenetics generegulation genetarget genomeannotation annotation genomewideassociation histonemodification chipseq atacseq dnaseseq rnaseq hiddenmarkovmodel bioconductor bootstrap genomics matching statistics

27 stars 8.16 score 50 scripts 1 dependent

bioc

orthogene:Interspecies gene mapping

`orthogene` is an R package for easy mapping of orthologous genes across hundreds of species. It pulls up-to-date gene ortholog mappings across **700+ organisms**. It also provides various utility functions to aggregate/expand common objects (e.g. data.frames, gene expression matrices, lists) using **1:1**, **many:1**, **1:many** or **many:many** gene mappings, both within- and between-species.

Maintained by Brian Schilder. Last updated 5 months ago.

genetics comparativegenomics preprocessing phylogenetics transcriptomics geneexpression animal-models bioconductor bioconductor-package bioinformatics biomedicine comparative-genomics evolutionary-biology genes genomics ontologies translational-research

43 stars 7.86 score 31 scripts 2 dependents

bioc

fishpond:Fishpond: downstream methods and tools for expression data

Fishpond contains methods for differential transcript and gene expression analysis of RNA-seq data using inferential replicates for uncertainty of abundance quantification, as generated by Gibbs sampling or bootstrap sampling. The package also contains a number of utilities for working with Salmon and Alevin quantification files.

Maintained by Michael Love. Last updated 5 months ago.

sequencing rnaseq geneexpression transcription normalization regression multiplecomparison batcheffect visualization differentialexpression differentialsplicing alternativesplicing singlecell bioconductor gene-expression genomics salmon scrnaseq statistics transcriptomics

28 stars 7.83 score 150 scripts

abbvie-external

OmicNavigator:Open-Source Software for 'Omic' Data Analysis and Visualization

A tool for interactive exploration of the results from 'omics' experiments to facilitate novel discoveries from high-throughput biology. The software includes R functions for the 'bioinformatician' to deposit study metadata and the outputs from statistical analyses (e.g. differential expression, enrichment). These results are then exported to an interactive JavaScript dashboard that can be interrogated on the user's local machine or deployed online to be explored by collaborators. The dashboard includes 'sortable' tables, interactive plots including network visualization, and fine-grained filtering based on statistical significance.

Maintained by John Blischak. Last updated 19 days ago.

bioinformatics genomics omics opencpu

34 stars 7.68 score 31 scripts

bioc

crisprScore:On-Target and Off-Target Scoring Algorithms for CRISPR gRNAs

Provides R wrappers of several on-target and off-target scoring methods for CRISPR guide RNAs (gRNAs). The following nucleases are supported: SpCas9, AsCas12a, enAsCas12a, and RfxCas13d (CasRx). The available on-target cutting efficiency scoring methods are RuleSet1, Azimuth, DeepHF, DeepCpf1, enPAM+GB, and CRISPRscan. Both the CFD and MIT scoring methods are available for off-target specificity prediction. The package also provides a Lindel-derived score to predict the probability of a gRNA to produce indels inducing a frameshift for the Cas9 nuclease. Note that DeepHF, DeepCpf1 and enPAM+GB are not available on Windows machines.

Maintained by Jean-Philippe Fortin. Last updated 5 months ago.

crispr functionalgenomics functionalprediction bioconductor bioconductor-package crispr-cas9 crispr-design crispr-target genomics grna grna-sequence grna-sequences scoring-algorithm sgrna sgrna-design

16 stars 7.44 score 19 scripts 4 dependents

bioc

netSmooth:Network smoothing for scRNAseq

netSmooth is an R package for network smoothing of single cell RNA sequencing data. Using biological networks such as protein-protein interactions as priors for gene co-expression, netSmooth improves cell type identification from noisy, sparse scRNAseq data.

Maintained by Jonathan Ronen. Last updated 5 months ago.

network graphandnetwork singlecell rnaseq geneexpression sequencing transcriptomics normalization preprocessing clustering dimensionreduction bioinformatics genomics single-cell

27 stars 7.41 score 4 scripts

quantgen

BEDMatrix:Extract Genotypes from a PLINK .bed File

A matrix-like data structure that allows for efficient, convenient, and scalable subsetting of binary genotype/phenotype files generated by PLINK (<https://www.cog-genomics.org/plink2>), the whole genome association analysis toolset, without loading the entire file into memory.

Maintained by Alexander Grueneberg. Last updated 7 months ago.

bed genetics genomics plink plink2 r-pkg

18 stars 7.13 score 196 scripts 6 dependents

gmod

JBrowseR:An R Interface to the JBrowse 2 Genome Browser

Provides an R interface to the JBrowse 2 genome browser. Enables embedding a JB2 genome browser in a Shiny app or R Markdown document. The browser can also be launched from an interactive R console. The browser can be loaded with a variety of common genomics data types, and can be used with a custom theme.

Maintained by Colin Diesh. Last updated 1 year ago.

genomics reactjs rmarkdown shiny visualization

35 stars 6.81 score 31 scripts 1 dependent

bioc

proActiv:Estimate Promoter Activity from RNA-Seq data

Most human genes have multiple promoters that control the expression of different isoforms. The use of these alternative promoters enables the regulation of isoform expression pre-transcriptionally. Alternative promoters have been found to be important in a wide number of cell types and diseases. proActiv is an R package that enables the analysis of promoters from RNA-seq data. proActiv uses aligned reads as input, and generates counts and normalized promoter activity estimates for each annotated promoter. In particular, proActiv accepts junction files from TopHat2 or STAR or BAM files as inputs. These estimates can then be used to identify which promoter is active, which promoter is inactive, and which promoters change their activity across conditions. proActiv also allows visualization of promoter activity across conditions.

Maintained by Joseph Lee. Last updated 5 months ago.

rnaseq geneexpression transcription alternativesplicing generegulation differentialsplicing functionalgenomics epigenetics transcriptomics preprocessing alternative-promoters genomics promoter-activity promoter-annotation rna-seq-data

51 stars 6.66 score 15 scripts

const-ae

tidygenomics:Tidy Verbs for Dealing with Genomic Data Frames

Handle genomic data within data frames just as you would with 'GRanges'. This package provides methods to deal with genomic intervals the "tidy way", which makes them simpler to integrate into the general data munging process. The API is inspired by the popular 'bedtools' and the genome_join() method from the 'fuzzyjoin' package.

Maintained by Constantin Ahlmann-Eltze. Last updated 4 years ago.

genomics intervals tidy cpp

103 stars 6.49 score 30 scripts

bioc

ontoProc:processing of ontologies of anatomy, cell lines, and so on

Support harvesting of diverse bioinformatic ontologies, making particular use of the ontologyIndex package on CRAN. We provide snapshots of key ontologies for terms about cells, cell lines, chemical compounds, and anatomy, to help analyze genome-scale experiments, particularly cell x compound screens. Another purpose is to strengthen development of compelling use cases for richer interfaces to emerging ontologies.

Maintained by Vincent Carey. Last updated 19 days ago.

infrastructure go bioinformatics genomics ontology

3 stars 6.37 score 75 scripts 2 dependents

bioc

CopyNumberPlots:Create Copy-Number Plots using karyoploteR functionality

CopyNumberPlots provides a set of functions extending karyoploteR's functionality to create beautiful, customizable and flexible plots of copy-number related data.

Maintained by Bernat Gel. Last updated 5 months ago.

visualization copynumbervariation coverage onechannel dataimport sequencing dnaseq bioconductor bioconductor-package bioinformatics copy-number-variation genomics genomics-visualization

6 stars 6.24 score 16 scripts 2 dependents

bioc

RAIDS:Accurate Inference of Genetic Ancestry from Cancer Sequences

This package implements specialized algorithms that enable genetic ancestry inference from various cancer sequence sources (RNA, Exome and Whole-Genome sequences). This package also implements a simulation algorithm that generates synthetic cancer-derived data. This code and analysis pipeline was designed and developed for the following publication: Belleau, P et al. Genetic Ancestry Inference from Cancer-Derived Molecular Data across Genomic and Transcriptomic Platforms. Cancer Res 1 January 2023; 83 (1): 49–58.

Maintained by Pascal Belleau. Last updated 5 months ago.

genetics software sequencing wholegenome principalcomponent geneticvariability dimensionreduction biocviews ancestry cancer-genomics exome-sequencing genomics inference r-language rna-seq rna-sequencing whole-genome-sequencing

5 stars 6.23 score 19 scripts

bioc

martini:GWAS Incorporating Networks

martini deals with the low power inherent to GWAS studies by using prior knowledge represented as a network. SNPs are the vertices of the network, and the edges represent biological relationships between them (genomic adjacency, belonging to the same gene, physical interaction between protein products). The network is scanned using SConES, which looks for groups of SNPs maximally associated with the phenotype, that form a close subnetwork.

Maintained by Hector Climente-Gonzalez. Last updated 5 months ago.

software genomewideassociation snp geneticvariability genetics featureextraction graphandnetwork network bioinformatics genomics gwas network-analysis snps systems-biology cpp

4 stars 6.16 score 30 scripts

bioc

tidyomics:Easily install and load the tidyomics ecosystem

The tidyomics ecosystem is a set of packages for ’omic data analysis that work together in harmony; they share common data representations and API design, consistent with the tidyverse ecosystem. The tidyomics package is designed to make it easy to install and load core packages from the tidyomics ecosystem with a single command.

Maintained by Stefano Mangiola. Last updated 5 months ago.

assaydomain infrastructure rnaseq differentialexpression geneexpression normalization clustering qualitycontrol sequencing transcription transcriptomics cytometry genomics tidyverse

67 stars 6.13 score 5 scripts

bioc

gemma.R:A wrapper for Gemma's RESTful API to access curated gene expression data and differential expression analyses

Low- and high-level wrappers for Gemma's RESTful API. They enable access to curated expression and differential expression data from over 10,000 published studies. Gemma is a web site, database and a set of tools for the meta-analysis, re-use and sharing of genomics data, currently primarily targeted at the analysis of gene expression profiles.

Maintained by Ogan Mancarci. Last updated 4 months ago.

software dataimport microarray singlecell thirdpartyclient differentialexpression geneexpression bayesian annotation experimentaldesign normalization batcheffect preprocessing bioinformatics gemma genomics transcriptomics

10 stars 5.99 score 26 scripts

bioc

switchde:Switch-like differential expression across single-cell trajectories

Inference and detection of switch-like differential expression across single-cell RNA-seq trajectories.

Maintained by Kieran Campbell. Last updated 5 months ago.

immunooncology software transcriptomics geneexpression rnaseq regression differentialexpression singlecell gene-expression genomics single-cell

19 stars 5.98 score 7 scripts

henrikbengtsson

TopDom:An Efficient and Deterministic Method for Identifying Topological Domains in Genomes

The 'TopDom' method identifies topological domains in genomes from Hi-C sequence data (Shin et al., 2016 <doi:10.1093/nar/gkv1505>). The authors published an implementation of their method as an R script (two different versions; also available in this package). This package originates from those original 'TopDom' R scripts and provides help pages adopted from the original 'TopDom' PDF documentation. It also provides a small number of bug fixes to the original code.

Maintained by Henrik Bengtsson. Last updated 4 years ago.

genomics hic topological-domains

21 stars 5.80 score 20 scripts 1 dependent

core-bioinformatics

ClustAssess:Tools for Assessing Clustering

A set of tools for evaluating clustering robustness using proportion of ambiguously clustered pairs (Senbabaoglu et al. (2014) <doi:10.1038/srep06207>), as well as similarity across methods and method stability using element-centric clustering comparison (Gates et al. (2019) <doi:10.1038/s41598-019-44892-y>). Additionally, this package enables stability-based parameter assessment for graph-based clustering pipelines typical in single-cell data analysis.

Maintained by Andi Munteanu. Last updated 2 months ago.

software singlecell rnaseq atacseq normalization preprocessing dimensionreduction visualization qualitycontrol clustering classification annotation geneexpression differentialexpression bioinformatics genomics machine-learning parameter-optimization robustness single-cell unsupervised-learning cpp

23 stars 5.70 score 18 scripts

bioc

VisiumIO:Import Visium data from the 10X Space Ranger pipeline

The package allows users to readily import spatial data obtained from either the 10X website or from the Space Ranger pipeline. Supported formats include tar.gz, h5, and mtx files. Multiple files can be imported at once with *List type of functions. The package represents data mainly as SpatialExperiment objects.

Maintained by Marcel Ramos. Last updated 2 months ago.

software infrastructure dataimport singlecell spatial bioconductor-package genomics u24ca289073

5.50 score 14 scripts 1 dependent

bioc

UMI4Cats:UMI4Cats: Processing, analysis and visualization of UMI-4C chromatin contact data

UMI-4C is a technique that allows characterization of 3D chromatin interactions with a bait of interest, taking advantage of a sonication step to produce unique molecular identifiers (UMIs) that help remove duplication bias, thus allowing a better differential comparison of chromatin interactions between conditions. This package allows processing of UMI-4C data, starting from FastQ files provided by the sequencing facility. It provides two statistical methods for detecting differential contacts and includes a visualization function to plot integrated information from a UMI-4C assay.

Maintained by Mireia Ramos-Rodriguez. Last updated 5 months ago.

qualitycontrol preprocessing alignment normalization visualization sequencing coverage chromatin chromatin-interaction genomics umi4c

5 stars 5.40 score 7 scripts

quantgen

BGData:A Suite of Packages for Analysis of Big Genomic Data

An umbrella package providing a phenotype/genotype data structure and scalable and efficient computational methods for large genomic datasets in combination with several other packages: 'BEDMatrix', 'LinkedMatrix', and 'symDMatrix'.

Maintained by Alexander Grueneberg. Last updated 2 months ago.

genetics genomics gwas r-pkg openmp

34 stars 5.34 score 43 scripts

cbg-ethz

clustNet:Network-Based Clustering

Network-based clustering using a Bayesian network mixture model with optional covariate adjustment.

Maintained by Fritz Bayer. Last updated 1 year ago.

bayesian-network bayesian-networks clustering dag genomics mixture-model network-clustering

7 stars 5.16 score 41 scripts

eppicenter

moire:Multiplicity of Infection and Allele Frequency Recovery from Noisy Polyallelic Genetics Data

A Markov Chain Monte Carlo (MCMC) based approach to Bayesian estimation of individual level multiplicity of infection, within host relatedness, and population allele frequencies from polyallelic genetic data.

Maintained by Maxwell Murphy. Last updated 5 months ago.

genomics malaria mcmc cpp openmp

7 stars 5.14 score 22 scripts

tomkellygenetics

graphsim:Simulate Expression Data from 'igraph' Networks

Functions to develop simulated continuous data (e.g., gene expression) from a sigma covariance matrix derived from a graph structure in 'igraph' objects. Intended to extend 'mvtnorm' to take 'igraph' structures rather than sigma matrices as input. This allows the use of simulated data that correctly accounts for pathway relationships and correlations. Here we present a versatile statistical framework to simulate correlated gene expression data from biological pathways, by sampling from a multivariate normal distribution derived from a graph structure. This package allows the simulation of biological pathways from a graph structure based on a statistical model of gene expression. For example, methods to infer biological pathways and gene regulatory networks from gene expression data can be tested on simulated datasets using this framework. This also allows for pathway structures to be considered as a confounding variable when simulating gene expression data to test the performance of genomic analyses.

Maintained by S. Thomas Kelly. Last updated 3 years ago.

benchmarking gene-expression gene-regulatory-networks genetics genomic-data-analysis genomics graph-algorithms igraph-networks joss ngs-analysis simulated-data simulation-modeling

24 stars 5.08 score 2 scripts

acabassi

coca:Cluster-of-Clusters Analysis

Contains the R functions needed to perform Cluster-Of-Clusters Analysis (COCA) and Consensus Clustering (CC). For further details please see Cabassi and Kirk (2020) <doi:10.1093/bioinformatics/btaa593>.

Maintained by Alessandra Cabassi. Last updated 5 years ago.

cluster-analysis cluster-of-clusters clustering coca genomics integrative-clustering multi-omics

6 stars 5.03 score 12 scripts 1 dependent

samilhll

macrosyntR:Draw Ordered Oxford Grids

Use standard genomics file format (BED) and a table of orthologs to illustrate synteny conservation at the genome-wide scale. Significantly conserved linkage groups are identified as described in Simakov et al. (2020) <doi:10.1038/s41559-020-1156-z> and displayed on an Oxford Grid (Edwards (1991) <doi:10.1111/j.1469-1809.1991.tb00394.x>) or a chord diagram as in Simakov et al. (2022) <doi:10.1126/sciadv.abi5884>. The package provides a function that uses a network-based greedy algorithm to find communities (Clauset et al. (2004) <doi:10.1103/PhysRevE.70.066111>) and so automatically order the chromosomes on the plot to improve interpretability.

Maintained by Sami El Hilali. Last updated 10 months ago.

bioinformatics genomic-visualizations genomics

14 stars 4.85 score 5 scripts

bioc

epidecodeR:epidecodeR: a functional exploration tool for epigenetic and epitranscriptomic regulation

epidecodeR is a package for analysing the impact of the degree of DNA/RNA epigenetic chemical modification on the dysregulation of genes or proteins. It integrates chemical modification data generated by a host of epigenomic or epitranscriptomic techniques, such as ChIP-seq, ATAC-seq, m6A-seq, etc., with dysregulated gene lists in the form of differential gene expression, ribosome occupancy or differential protein translation, and identifies the impact on gene dysregulation of varying degrees of chemical modification associated with the genes. epidecodeR generates cumulative distribution function (CDF) plots showing shifts in the trend of overall log2FC between genes divided into groups based on the degree of modification associated with the genes. The tool also tests for significance of difference in log2FC between groups of genes.

Maintained by Kandarp Joshi. Last updated 5 months ago.

differentialexpression generegulation histonemodification functionalprediction transcription geneexpression epitranscriptomics epigenetics functionalgenomics systemsbiology transcriptomics chiponchip differential-expression genomics genomics-visualization

5 stars 4.70 score 1 script

cidm-ph

phylepic:Combined Visualisation of Phylogenetic and Epidemiological Data

A collection of utilities and 'ggplot2' extensions to assist with visualisations in genomic epidemiology. This includes the 'phylepic' chart, a visual combination of a phylogenetic tree and a matched epidemic curve. The included 'ggplot2' extensions such as date axes binned by week are relevant for other applications in epidemiology and beyond. The approach is described in Suster et al. (2024) <doi:10.1101/2024.04.02.24305229>.

Maintained by Carl Suster. Last updated 3 months ago.

genomics genomics-visualization public-health

4.65 score 4 scripts

acabassi

klic:Kernel Learning Integrative Clustering

Kernel Learning Integrative Clustering (KLIC) is an algorithm that combines multiple kernels, each representing a different measure of the similarity between a set of observations. The contribution of each kernel to the final clustering is weighted according to the amount of information it carries. As well as providing the functions required to perform the kernel-based clustering, this package also allows the user to simply give the data as input: the kernels are then built using consensus clustering. Different strategies for choosing the best number of clusters are also available. For further details please see Cabassi and Kirk (2020) <doi:10.1093/bioinformatics/btaa593>.

Maintained by Alessandra Cabassi. Last updated 5 years ago.

cluster-analysis clustering coca genomics integrative-clustering kernel-methods multi-omics

5 stars 4.40 score 10 scripts

piyalkarum

rCNV:Detect Copy Number Variants from SNPs Data

Functions in this package import filtered variant call format (VCF) files of SNP data and generate data sets to detect copy number variants, visualize them and perform downstream analyses with copy number variants (e.g. environmental association analyses).

Maintained by Piyal Karunarathne. Last updated 28 days ago.

cnv-analysis copy-number-variation gene-duplication genetics genomics landscape-genetics snps cpp

6 stars 4.26 score 4 scripts

bioc

getDEE2:Programmatic access to the DEE2 RNA expression dataset

Digital Expression Explorer 2 (or DEE2 for short) is a repository of processed RNA-seq data in the form of counts. It was designed so that researchers could undertake re-analysis and meta-analysis of published RNA-seq studies quickly and easily. As of April 2020, over 1 million SRA datasets have been processed. This package provides an R interface to access these expression data. More information about the DEE2 project can be found at the project homepage (http://dee2.io) and main publication (https://doi.org/10.1093/gigascience/giz022).

Maintained by Mark Ziemann. Last updated 3 months ago.

geneexpression transcriptomics sequencing bioinformatics data-mining genomics rna-expression rna-seq

4 stars 4.20 score 5 scripts

stephenturner

kgp:1000 Genomes Project Metadata

Metadata about populations and data about samples from the 1000 Genomes Project, including the 2,504 samples sequenced for the Phase 3 release and the expanded collection of 3,202 samples with 602 additional trios. The data is described in Auton et al. (2015) <doi:10.1038/nature15393> and Byrska-Bishop et al. (2022) <doi:10.1016/j.cell.2022.08.004>, and raw data is available at <http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/>. See Turner (2022) <doi:10.48550/arXiv.2210.00539> for more details.

Maintained by Stephen Turner. Last updated 2 years ago.

1000genomes bioinformatics genetics genomics metadata population-genetics sequencing

20 stars 4.00 score 3 scripts

blasseigne

ProliferativeIndex:Calculates and Analyzes the Proliferative Index

Provides functions for calculating and analyzing the proliferative index (PI) from an RNA-seq dataset, as described in Ramaker & Lasseigne et al., bioRxiv, 2016 <doi:10.1101/063057>.

Maintained by Brittany Lasseigne. Last updated 7 years ago.

cancer cancer-genomics gene-expression genomics index metagene

3.70 score 10 scripts

spacecowboy-71

xadmix:Subsetting and Plotting Optimized for Admixture Data

A few functions which provide a quick way of subsetting genomic admixture data and generating customizable stacked barplots.

Maintained by Lukas Schönmann. Last updated 3 years ago.

admixture genomics ggplot2 plotting

3.70 score 6 scripts

quantgen

symDMatrix:Partitioned Symmetric Matrices

A matrix-like class to represent a symmetric matrix partitioned into file-backed blocks.

Maintained by Alexander Grueneberg. Last updated 5 years ago.

genetics genomics r-pkg

2 stars 3.48 score 6 scripts 1 dependent