R-universe search: mutation

bioc

MutationalPatterns:Comprehensive genome-wide analysis of mutational processes

Mutational processes leave characteristic footprints in genomic DNA. This package provides a comprehensive set of flexible functions that allow researchers to easily evaluate and visualize a multitude of mutational patterns in base substitution catalogues of e.g. healthy samples, tumour samples, or DNA-repair deficient cells. The package covers a wide range of patterns including: mutational signatures, transcriptional and replicative strand bias, lesion segregation, genomic distribution and association with genomic features, which are collectively meaningful for studying the activity of mutational processes. The package works with single nucleotide variants (SNVs), insertions and deletions (Indels), double base substitutions (DBSs) and larger multi base substitutions (MBSs). The package provides functionalities for both extracting mutational signatures de novo and determining the contribution of previously identified mutational signatures on a single sample level. MutationalPatterns integrates with common R genomic analysis workflows and allows easy association with (publicly available) annotation data.

Maintained by Mark van Roosmalen. Last updated 5 months ago.

genetics somaticmutation

76.0 match 7.27 score 251 scripts 1 dependent

bioc

YAPSA:Yet Another Package for Signature Analysis

This package provides functions and routines for supervised analyses of mutational signatures (i.e., the signatures have to be known, cf. L. Alexandrov et al., Nature 2013 and L. Alexandrov et al., bioRxiv 2018). In particular, the LCD family of functions (LCD = linear combination decomposition) can use optimal signature-specific cutoffs, which account for the differing detectability of the individual signatures. Moreover, the package provides several sets of mutational signatures, including the COSMIC and PCAWG SNV signatures and the PCAWG Indel signatures; the latter means that YAPSA extends the concept of supervised mutational signature analysis to Indel signatures. YAPSA also provides confidence intervals computed from profile likelihoods and can perform signature analysis on a stratified mutational catalogue (SMC = stratified mutational catalogue) in order to analyze enrichment and depletion patterns of the signatures across different strata.

Maintained by Zuguang Gu. Last updated 5 months ago.

sequencing dnaseq somaticmutation visualization clustering genomicvariation statisticalmethod biologicalquestion

77.2 match 6.41 score 57 scripts

bioc

maftools:Summarize, Analyze and Visualize MAF Files

Analyze and visualize Mutation Annotation Format (MAF) files from large scale sequencing studies. This package provides various functions to perform the most commonly used analyses in cancer genomics and to create feature-rich, customizable visualizations with minimal effort.

Maintained by Anand Mayakonda. Last updated 5 months ago.

datarepresentation dnaseq visualization drivermutation variantannotation featureextraction classification somaticmutation sequencing functionalgenomics survival bioinformatics cancer-genome-atlas cancer-genomics genomics maf-files tcga curl bzip2 xz-utils zlib

25.0 match 459 stars 14.63 score 948 scripts 18 dependents

tidyverse

dplyr:A Grammar of Data Manipulation

A fast, consistent tool for working with data frame like objects, both in memory and out of memory.

Maintained by Hadley Wickham. Last updated 12 days ago.

data-manipulation grammar cpp

14.3 match 4.8k stars 24.68 score 659k scripts 7.8k dependents

dami82

mutSignatures:Decipher Mutational Signatures from Somatic Mutational Catalogs

Cancer cells accumulate DNA mutations as a result of DNA damage and DNA repair processes. This computational framework is aimed at deciphering DNA mutational signatures operating in cancer. The framework includes modules that support raw data import and processing, mutational signature extraction, and results interpretation and visualization. The framework accepts widely used file formats storing information about DNA variants, such as Variant Call Format files. The framework performs Non-Negative Matrix Factorization to extract mutational signatures explaining the observed set of DNA mutations. Bootstrapping is performed as part of the analysis. The framework supports parallelization and is optimized for use on multi-core systems. The software was described by Fantini D et al (2020) <doi:10.1038/s41598-020-75062-0> and is based on a custom R-based implementation of the original MATLAB WTSI framework by Alexandrov LB et al (2013) <doi:10.1016/j.celrep.2012.12.008>.
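mutSignatures extracts signatures by non-negative matrix factorization (NMF). As a rough illustration of that core step only, the Python sketch below factors a small channels-by-samples count matrix V into signatures W and exposures H using the classic Lee-Seung multiplicative updates for squared error. It is not mutSignatures' implementation, which adds bootstrapping and parallelization and may use different update rules; the function names and the toy catalog are hypothetical.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2 for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, rank, iterations=500, seed=0, eps=1e-12):
    """Factor a non-negative catalog V (channels x samples) into W (channels x rank)
    and H (rank x samples) with Lee-Seung multiplicative updates."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    # Strictly positive random start; multiplicative updates keep entries non-negative.
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + eps for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)] for i in range(n)]
    return w, h


# Toy catalog: 4 mutation channels x 3 samples (hypothetical counts).
catalog = [[10, 0, 5], [8, 1, 4], [0, 12, 6], [1, 9, 5]]
signatures, exposures = nmf(catalog, rank=2)
print(frobenius_error(catalog, signatures, exposures))
```

Columns of W play the role of signatures and columns of H the per-sample exposures; real tools typically also normalize W and rerun from several starts.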

Maintained by Damiano Fantini. Last updated 2 years ago.

52.2 match 14 stars 5.83 score 48 scripts

bioc

HiLDA:Conducting statistical inference on comparing the mutational exposures of mutational signatures by using hierarchical latent Dirichlet allocation

A package that applies hierarchical latent Dirichlet allocation within a Bayesian framework. It statistically tests whether the mutational exposures of mutational signatures (Shiraishi-model signatures) differ between two groups. The package also provides inference and visualization.

Maintained by Zhi Yang. Last updated 5 months ago.

software somaticmutation sequencing statisticalmethod bayesian mutational-signatures rjags somatic-mutations cpp jags

54.0 match 3 stars 5.56 score 7 scripts 1 dependents

jakobbossek

ecr:Evolutionary Computation in R

Framework for building evolutionary algorithms for both single- and multi-objective continuous or discrete optimization problems. A set of predefined evolutionary building blocks and operators is included. Moreover, the user can easily set up custom objective functions, operators, building blocks and representations sticking to few conventions. The package allows both a black-box approach for standard tasks (plug-and-play style) and a much more flexible white-box approach where the evolutionary cycle is written by hand.

Maintained by Jakob Bossek. Last updated 1 year ago.

combinatorial-optimization evolutionary-algorithm evolutionary-algorithms evolutionary-strategy genetic-algorithm-framework metaheuristics multi-objective-optimization optimization optimization-framework cpp

25.3 match 43 stars 7.36 score 89 scripts 2 dependents

bioc

canceR:A Graphical User Interface for accessing and modeling the Cancer Genomics Data of MSKCC

The package is a user-friendly interface, based on cgdsr and other modeling packages, to explore, compare, and analyse all available Cancer Data (Clinical data, Gene Mutation, Gene Methylation, Gene Expression, Protein Phosphorylation, Copy Number Alteration) hosted by the Computational Biology Center at Memorial Sloan Kettering Cancer Center (MSKCC).

Maintained by Karim Mezhoud. Last updated 5 months ago.

gui geneexpression clustering go genesetenrichment kegg multiplecomparison cancer cancer-data gene gene-expression gene-methylation gene-mutation gene-sets methylation mskcc mutations tcltk

34.8 match 7 stars 5.25 score 17 scripts

msq-123

CovidMutations:Mutation Analysis and Assay Validation Toolkit for COVID-19 (Coronavirus Disease 2019)

A feasible framework for mutation analysis and reverse transcription polymerase chain reaction (RT-PCR) assay evaluation of COVID-19, including mutation profile visualization, statistics and the mutation ratio of each assay. The mutation ratio is helpful for evaluating the coverage of RT-PCR assays in large sample sets <doi:10.20944/preprints202004.0529.v1>.

Maintained by Shaoqian Ma. Last updated 5 years ago.

41.2 match 4 stars 4.30 score 6 scripts

shixiangwang

sigminer:Extract, Analyze and Visualize Mutational Signatures for Genomic Variations

Genomic alterations, including single nucleotide substitutions, copy number alterations, etc., are the major force behind cancer initiation and development. Due to the specificity of molecular lesions caused by genomic alterations, we can generate characteristic alteration spectra, called 'signatures' (Wang, Shixiang, et al. (2021) <DOI:10.1371/journal.pgen.1009557> & Alexandrov, Ludmil B., et al. (2020) <DOI:10.1038/s41586-020-1943-3> & Steele Christopher D., et al. (2022) <DOI:10.1038/s41586-022-04738-6>). This package helps users to extract, analyze and visualize signatures from genomic alteration records, thus providing new insight into cancer study.

Maintained by Shixiang Wang. Last updated 5 months ago.

bayesian-nmf bioinformatics cancer-research cnv copynumber-signatures cosmic-signatures dbs easy-to-use indel mutational-signatures nmf nmf-extraction sbs signature-extraction somatic-mutations somatic-variants visualization cpp

17.9 match 150 stars 9.48 score 123 scripts 2 dependents

hneth

riskyr:Rendering Risk Literacy more Transparent

Risk-related information (like the prevalence of conditions, the sensitivity and specificity of diagnostic tests, or the effectiveness of interventions or treatments) can be expressed in terms of frequencies or probabilities. By providing a toolbox of corresponding metrics and representations, 'riskyr' computes, translates, and visualizes risk-related information in a variety of ways. Adopting multiple complementary perspectives provides insights into the interplay between key parameters and renders teaching and training programs on risk literacy more transparent.
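The frequency-based translation described above can be illustrated with natural frequencies: starting from a population size, prevalence, sensitivity and specificity, split the population into the four cells of a 2x2 table and read off the positive predictive value (PPV). The Python sketch below is a hypothetical illustration of that arithmetic only; it is not riskyr's API (riskyr is an R package), and the function and key names are assumptions.

```python
def natural_frequencies(n, prevalence, sensitivity, specificity):
    """Split a population of size n into the four cells of a 2x2 table."""
    affected = n * prevalence
    unaffected = n - affected
    hits = affected * sensitivity                     # true positives
    misses = affected - hits                          # false negatives
    correct_rejections = unaffected * specificity     # true negatives
    false_alarms = unaffected - correct_rejections    # false positives
    return {"hits": hits, "misses": misses,
            "false_alarms": false_alarms, "correct_rejections": correct_rejections}


def positive_predictive_value(freqs):
    """Share of positive test results that are true positives."""
    return freqs["hits"] / (freqs["hits"] + freqs["false_alarms"])


# 1000 people, 1% prevalence, 90% sensitivity, 90% specificity (illustrative values).
freqs = natural_frequencies(1000, 0.01, 0.9, 0.9)
ppv = positive_predictive_value(freqs)  # 9 / (9 + 99), roughly 0.083
```

Expressed as counts (9 true positives among 108 positive tests), the low PPV is easier to see than from the three probabilities alone, which is the kind of perspective shift the description refers to.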

Maintained by Hansjoerg Neth. Last updated 10 months ago.

2x2-matrix bayesian-inference contingency-table representation risk risk-literacy visualization

20.6 match 19 stars 7.36 score 80 scripts

bioc

Moonlight2R:Identify oncogenes and tumor suppressor genes from omics data

Understanding cancer mechanisms requires identifying the genes involved in the development of the pathology and characterizing their roles (notably as oncogenes and tumor suppressors). We present an updated version of the R/Bioconductor package MoonlightR, namely Moonlight2R, which returns a list of candidate driver genes for specific cancer types on the basis of omics data integration. The Moonlight framework contains a primary layer where gene expression data and information about biological processes are integrated to predict genes called oncogenic mediators, divided into putative tumor suppressors and putative oncogenes. This is done through functional enrichment analyses, gene regulatory networks and upstream regulator analyses that score the importance of well-known biological processes with respect to the studied cancer type. By evaluating the effect of the oncogenic mediators on biological processes or through random forests, the primary layer predicts two putative roles for the oncogenic mediators: i) tumor suppressor genes (TSGs) and ii) oncogenes (OCGs). As gene expression data alone is not enough to explain the deregulation of the genes, a second layer of evidence is needed. We have automated the integration of a secondary mutational layer through new functionalities in Moonlight2R. These functionalities analyze mutations in the cancer cohort and classify them into driver and passenger mutations using the driver mutation prediction tool CScape-somatic. Oncogenic mediators with at least one driver mutation are retained as driver genes. As a consequence, this methodology not only identifies genes playing a dual role (e.g. TSG in one cancer type and OCG in another) but also helps elucidate the biological processes underlying their specific roles. In particular, Moonlight2R can be used to discover OCGs and TSGs in the same cancer type.
This may for instance help in answering the question whether some genes change role between early stages (I, II) and late stages (III, IV). In the future, this analysis could be useful to determine the causes of different resistances to chemotherapeutic treatments. An additional mechanistic layer evaluates if there are mutations affecting the protein stability of the transcription factors (TFs) of the TSGs and OCGs, as that may have an effect on the expression of the genes.

Maintained by Matteo Tiberti. Last updated 2 months ago.

dnamethylation differentialmethylation generegulation geneexpression methylationarray differentialexpression pathways network survival genesetenrichment networkenrichment

22.7 match 5 stars 6.59 score 43 scripts

business-science

tidyquant:Tidy Quantitative Financial Analysis

Bringing business and financial analysis to the 'tidyverse'. The 'tidyquant' package provides a convenient wrapper to various 'xts', 'zoo', 'quantmod', 'TTR' and 'PerformanceAnalytics' package functions and returns the objects in the tidy 'tibble' format. The main advantage is being able to use quantitative functions with the 'tidyverse' functions including 'purrr', 'dplyr', 'tidyr', 'ggplot2', 'lubridate', etc. See the 'tidyquant' website for more information, documentation and examples.

Maintained by Matt Dancho. Last updated 1 month ago.

dplyr financial-analysis financial-data financial-statements multiple-stocks performance-analysis performanceanalytics quantmod stock stock-exchanges stock-indexes stock-lists stock-performance stock-prices stock-symbol tidyverse time-series timeseries xts

11.2 match 872 stars 13.34 score 5.2k scripts

mano-b

MicroSEC:Sequence Error Filter for Formalin-Fixed and Paraffin-Embedded Samples

Clinical tumor sequencing is usually performed on formalin-fixed and paraffin-embedded samples, which contain many sequencing errors. We found that the majority of these errors are detected in chimeric reads caused by single-strand DNA with micro-homology. Our filtering pipeline focuses on the uneven distribution of the artifacts in each read and removes such errors in formalin-fixed and paraffin-embedded samples without over-eliminating the true mutations detected in fresh frozen samples.

Maintained by Masachika Ikegami. Last updated 3 months ago.

25.4 match 7 stars 5.66 score 8 scripts

rozen-lab

cosmicsig:Mutational Signatures from COSMIC (Catalogue of Somatic Mutations in Cancer)

A data package with 2 main package variables: 'signature' and 'etiology'. The 'signature' variable contains the latest mutational signature profiles released on COSMIC <https://cancer.sanger.ac.uk/signatures/> for 3 mutation types: * Single base substitutions in the context of preceding and following bases, * Doublet base substitutions, and * Small insertions and deletions. The 'etiology' variable provides the known or hypothesized causes of signatures. 'cosmicsig' stands for COSMIC signatures. Please run ?'cosmicsig' for more information.

Maintained by Steven Rozen. Last updated 2 years ago.

41.3 match 1 star 3.04 score 22 scripts

g3viz

g3viz:Interactively Visualize Genetic Mutation Data using a Lollipop-Diagram

Interface for 'g3-lollipop' 'JavaScript' library. Visualize genetic mutation data using an interactive lollipop diagram in 'RStudio' or your web browser.

Maintained by Xin Guo. Last updated 6 months ago.

bioinformatics genomics-visualization lollipop-plot variants visualize-mutation-data

21.7 match 31 stars 5.61 score 22 scripts

bioc

tidySummarizedExperiment:Brings SummarizedExperiment to the Tidyverse

The tidySummarizedExperiment package provides a set of tools for creating and manipulating tidy data representations of SummarizedExperiment objects. SummarizedExperiment is a widely used data structure in bioinformatics for storing high-throughput genomic data, such as gene expression or DNA sequencing data. The tidySummarizedExperiment package introduces a tidy framework for working with SummarizedExperiment objects. It allows users to convert their data into a tidy format, where each observation is a row and each variable is a column. This tidy representation simplifies data manipulation, integration with other tidyverse packages, and enables seamless integration with the broader ecosystem of tidy tools for data analysis.

Maintained by Stefano Mangiola. Last updated 5 months ago.

assaydomain infrastructure rnaseq differentialexpression geneexpression normalization clustering qualitycontrol sequencing transcription transcriptomics

13.5 match 26 stars 8.44 score 196 scripts 1 dependent

bioc

sitePath:Phylogeny-based sequence clustering with site polymorphism

Using site polymorphism is one way to cluster DNA/protein sequences, but sequences sharing the same polymorphism at a single site can still be genetically distant. This package aims to cluster sequences using site polymorphism together with their corresponding phylogenetic trees. By considering their location on the tree, only structurally adjacent sequences are clustered. However, adjacent sequences do not necessarily share the same polymorphism, so a branch-and-bound-like algorithm is used to minimize the entropy representing the purity of site polymorphism within each cluster.
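The purity criterion mentioned above can be illustrated with Shannon entropy: a cluster in which every sequence carries the same residue at a site has entropy 0, and mixed clusters score higher. The Python sketch below computes that measure for one site and a size-weighted average over clusters. It illustrates the entropy score only, not sitePath's phylogeny-aware branch-and-bound search, and the function names and the size weighting are assumptions.

```python
import math
from collections import Counter


def site_entropy(residues):
    """Shannon entropy (in bits) of the residues observed at one alignment site."""
    counts = Counter(residues)
    total = len(residues)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def clustering_entropy(clusters):
    """Size-weighted mean site entropy over clusters; 0 means every cluster is pure."""
    total = sum(len(cluster) for cluster in clusters)
    return sum(len(cluster) / total * site_entropy(cluster) for cluster in clusters)


# Residues at one site, grouped into two hypothetical clusters.
pure = clustering_entropy(["AAA", "TT"])   # each cluster is homogeneous
mixed = clustering_entropy(["AT", "AT"])   # each cluster is a 50/50 mix
```

A search procedure would compare candidate groupings of adjacent tree tips by a score like this and prefer the lowest one.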

Maintained by Chengyang Ji. Last updated 5 months ago.

alignment multiplesequencealignment phylogenetics snp software mutation cpp

21.6 match 8 stars 5.20 score 9 scripts

bioc

decompTumor2Sig:Decomposition of individual tumors into mutational signatures by signature refitting

Uses quadratic programming for signature refitting, i.e., to decompose the mutation catalog from an individual tumor sample into a set of given mutational signatures (either Alexandrov-model signatures or Shiraishi-model signatures), computing weights that reflect the contributions of the signatures to the mutation load of the tumor.
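Signature refitting asks for non-negative weights w such that the signature matrix S times w reproduces a tumor's mutation catalog as closely as possible. decompTumor2Sig solves this as a quadratic program and may impose further constraints (for example, weights summing to one); the Python sketch below instead uses simple projected gradient descent on plain non-negative least squares, so it illustrates the objective rather than the package's solver. The function name, step-size rule and toy data are assumptions.

```python
def refit(catalog, signatures, iterations=2000):
    """Estimate non-negative weights w minimizing ||catalog - S w||^2.

    catalog: mutation counts, one entry per channel
    signatures: list of signatures, each a list of per-channel probabilities
    """
    k = len(signatures)
    n = len(catalog)
    # Safe step size: ||S||_F^2 is an upper bound on the largest eigenvalue of S^T S.
    step = 1.0 / sum(p * p for sig in signatures for p in sig)
    w = [0.0] * k
    for _ in range(iterations):
        # Residual r = S w - catalog
        residual = [sum(w[j] * signatures[j][i] for j in range(k)) - catalog[i]
                    for i in range(n)]
        # Gradient of 0.5 * ||S w - catalog||^2 with respect to w is S^T r
        grad = [sum(signatures[j][i] * residual[i] for i in range(n)) for j in range(k)]
        # Gradient step, then projection onto the constraint w >= 0
        w = [max(0.0, w[j] - step * grad[j]) for j in range(k)]
    return w


# Three hypothetical 4-channel signatures and a catalog built as 3*s1 + 5*s2.
signatures = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5], [0.7, 0.1, 0.1, 0.1]]
catalog = [1.5, 1.5, 2.5, 2.5]
weights = refit(catalog, signatures)
```

Dividing the weights by their sum would give relative contributions to the tumor's mutation load, which is closer to how refitting results are usually reported.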

Maintained by Rosario M. Piro. Last updated 5 months ago.

software snp sequencing dnaseq genomicvariation somaticmutation biomedicalinformatics genetics biologicalquestion statisticalmethod

23.5 match 1 star 4.78 score 10 scripts 1 dependent

bioc

musicatk:Mutational Signature Comprehensive Analysis Toolkit

Mutational signatures are the characteristic patterns of mutations left in the genome by carcinogenic exposures or aberrant cellular processes. We created musicatk (MUtational SIgnature Comprehensive Analysis ToolKit) to address shortcomings in versatility and ease of use in other pre-existing computational tools. Although many different types of mutational data have been generated, current software packages do not have a flexible framework to allow users to mix and match different types of mutations in the mutational signature inference process. Musicatk enables users to count and combine multiple mutation types, including SBS, DBS, and indels. Musicatk calculates replication strand, transcription strand and combinations of these features, along with discovery from unique and proprietary genomic features associated with any mutation type. Musicatk also implements several methods for discovery of new signatures as well as methods to infer exposure given an existing set of signatures. Musicatk provides functions for visualization and downstream exploratory analysis, including the ability to compare signatures between cohorts and find matching signatures in COSMIC V2 or COSMIC V3.

Maintained by Joshua D. Campbell. Last updated 5 months ago.

software biologicalquestion somaticmutation variantannotation

15.7 match 13 stars 7.02 score 20 scripts

rich-iannone

DiagrammeR:Graph/Network Visualization

Build graph/network structures using functions for stepwise addition and deletion of nodes and edges. Work with data available in tables for bulk addition of nodes, edges, and associated metadata. Use graph selections and traversals to apply changes to specific nodes or edges. A wide selection of graph algorithms allow for the analysis of graphs. Visualize the graphs and take advantage of any aesthetic properties assigned to nodes and edges.

Maintained by Richard Iannone. Last updated 2 months ago.

graph graph-functions network-graph property-graph visualization

7.1 match 1.7k stars 15.18 score 3.8k scripts 87 dependents

ssnn-airr

shazam:Immunoglobulin Somatic Hypermutation Analysis

Provides a computational framework for analyzing mutations in immunoglobulin (Ig) sequences. Includes methods for Bayesian estimation of antigen-driven selection pressure, mutational load quantification, building of somatic hypermutation (SHM) models, and model-dependent distance calculations. Also includes empirically derived models of SHM for both mice and humans. Citations: Gupta and Vander Heiden, et al (2015) <doi:10.1093/bioinformatics/btv359>, Yaari, et al (2012) <doi:10.1093/nar/gks457>, Yaari, et al (2013) <doi:10.3389/fimmu.2013.00358>, Cui, et al (2016) <doi:10.4049/jimmunol.1502263>.

Maintained by Susanna Marquez. Last updated 2 months ago.

14.5 match 7.43 score 222 scripts 2 dependents

magnusdv

pedmut:Mutation Models for Pedigree Likelihood Computations

A collection of functions for modelling mutations in pedigrees with marker data, as used e.g. in likelihood computations with microsatellite data. Implemented models include equal, proportional and stepwise models, as well as random models for experimental work, and custom models allowing the user to apply any valid mutation matrix. Allele lumping is done following the lumpability criteria of Kemeny and Snell (1976), ISBN:0387901922.
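In the equal model, as it is commonly defined, an allele mutates with total probability r and every one of the other n - 1 alleles is an equally likely target, so off-diagonal entries are r / (n - 1) and diagonal entries are 1 - r. The Python sketch below builds that matrix as a nested dictionary; it is a hypothetical illustration, not pedmut's R interface, and the allele labels and rate are made up.

```python
def equal_mutation_matrix(alleles, rate):
    """Equal model: an allele mutates with probability `rate`,
    spread evenly over all other alleles."""
    n = len(alleles)
    if n < 2:
        raise ValueError("need at least two alleles")
    off_diagonal = rate / (n - 1)
    # Row = parental allele, column = allele transmitted to the child.
    return {a: {b: (1 - rate if a == b else off_diagonal) for b in alleles}
            for a in alleles}


# Four hypothetical microsatellite alleles with a 0.1% mutation rate.
model = equal_mutation_matrix(["15", "16", "17", "18"], rate=0.001)
```

Each row is a probability distribution over the transmitted allele, which is the property any valid mutation matrix must satisfy; the proportional and stepwise models differ only in how the off-diagonal mass is distributed.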

Maintained by Magnus Dehli Vigeland. Last updated 1 year ago.

21.6 match 2 stars 4.76 score 5 scripts 19 dependents

bioc

PureCN:Copy number calling and SNV classification using targeted short read sequencing

This package estimates tumor purity, copy number, and loss of heterozygosity (LOH), and classifies single nucleotide variants (SNVs) by somatic status and clonality. PureCN is designed for targeted short read sequencing data, integrates well with standard somatic variant detection and copy number pipelines, and has support for tumor samples without matching normal samples.

Maintained by Markus Riester. Last updated 2 months ago.

copynumbervariation software sequencing variantannotation variantdetection coverage immunooncology bioconductor-package cell-free-dna copy-number loh tumor-heterogeneity tumor-mutational-burden tumor-purity

10.5 match 132 stars 9.72 score 40 scripts

bioc

selectKSigs:Selecting the number of mutational signatures using a perplexity-based measure and cross-validation

A package to suggest the number of mutational signatures in a collection of somatic mutations by calculating the cross-validated perplexity score.

Maintained by Zhi Yang. Last updated 5 months ago.

software somaticmutation sequencing statisticalmethod clustering mutational-signatures rjags somatic-mutations cpp jags

24.7 match 3 stars 4.08 score 1 script

bioc

TCGAbiolinks:TCGAbiolinks: An R/Bioconductor package for integrative analysis with GDC data

The aim of TCGAbiolinks is to: i) facilitate GDC open-access data retrieval, ii) prepare the data using the appropriate pre-processing strategies, iii) provide the means to carry out different standard analyses and iv) easily reproduce earlier research results. In more detail, the package provides multiple methods for analysis (e.g., differential expression analysis, identifying differentially methylated regions) and methods for visualization (e.g., survival plots, volcano plots, starburst plots) in order to easily develop complete analysis pipelines.

Maintained by Tiago Chedraoui Silva. Last updated 26 days ago.

dnamethylation differentialmethylation generegulation geneexpression methylationarray differentialexpression pathways network sequencing survival software bioc bioconductor gdc integrative-analysis tcga tcga-data tcgabiolinks

6.8 match 305 stars 14.45 score 1.6k scripts 6 dependents

hanjunwei-lab

ssMutPA:Single-Sample Mutation-Based Pathway Analysis

A systematic bioinformatics tool to perform single-sample mutation-based pathway analysis by integrating somatic mutation data with the Protein-Protein Interaction (PPI) network. In this method, we use local and global weighted strategies to evaluate the effects of mutations on network genes according to the network topology and then calculate the mutation-based pathway enrichment score (ssMutPES) to reflect the accumulated effect of mutations of each pathway. Subsequently, the ssMutPES profiles are used for unsupervised spectral clustering to identify cancer subtypes.

Maintained by Junwei Han. Last updated 5 months ago.

23.9 match 4.00 score 9 scripts

bioc

RaggedExperiment:Representation of Sparse Experiments and Assays Across Samples

This package provides a flexible representation of copy number, mutation, and other data that fit into the ragged array schema for genomic location data. The basic representation of such data provides a rectangular flat table interface to the user with range information in the rows and samples/specimen in the columns. The RaggedExperiment class derives from a GRangesList representation and provides a semblance of a rectangular dataset.

Maintained by Marcel Ramos. Last updated 4 months ago.

infrastructure datarepresentation copynumber core-package data-structure mutations u24ca289073

10.5 match 4 stars 8.96 score 76 scripts 15 dependents

bioc

CaMutQC:An R Package for Comprehensive Filtration and Selection of Cancer Somatic Mutations

CaMutQC can filter false positive mutations arising from technical issues and select candidate cancer mutations through a series of well-structured functions that label mutations with various flags. A detailed filter report is produced after each complete filtration or selection step. CaMutQC also integrates several methods and gene panels for Tumor Mutational Burden (TMB) estimation.
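TMB is conventionally reported as the number of qualifying mutations per megabase of sequenced territory. The Python sketch below shows only that final ratio; deciding which mutations count and how large a panel's callable territory is are exactly the choices that CaMutQC's methods and gene panels encode, so here they are simply inputs. The function name and the example numbers are hypothetical.

```python
def tumor_mutational_burden(n_mutations, covered_bases):
    """Mutations per megabase of sequenced (callable) territory."""
    if covered_bases <= 0:
        raise ValueError("covered_bases must be positive")
    return n_mutations / (covered_bases / 1_000_000)


# 30 qualifying mutations on a hypothetical 1.5 Mb panel.
tmb = tumor_mutational_burden(30, 1_500_000)  # 20 mutations per Mb
```

Because the denominator is the covered territory rather than the genome, the same tumor can yield different TMB values on different panels, which is why panel-aware estimation matters.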

Maintained by Xin Wang. Last updated 5 months ago.

software qualitycontrol genetarget cancer-genomics somatic-mutations

15.8 match 7 stars 5.92 score 1 script

bioc

mitoClone2:Clonal Population Identification in Single-Cell RNA-Seq Data using Mitochondrial and Somatic Mutations

This package primarily identifies variants in mitochondrial genomes from BAM alignment files. It filters these variants to remove RNA editing events, then estimates their evolutionary relationship (i.e. their phylogenetic tree) and groups single cells into clones. It also visualizes the mutations and provides additional genomic context.

Maintained by Benjamin Story. Last updated 5 months ago.

annotation dataimport genetics snp software singlecell alignment curl bzip2 xz-utils zlib cpp

20.8 match 1 star 4.48 score 9 scripts

cran

podcleaner:Legacy Scottish Post Office Directories Cleaner

Attempts to clean optical character recognition (OCR) errors in legacy Scottish Post Office Directories. Further attempts to match records from trades and general directories.

Maintained by Olivier Bautheac. Last updated 3 years ago.

54.0 match 1.70 score

sparklyr

sparklyr:R Interface to Apache Spark

R interface to Apache Spark, a fast and general engine for big data processing, see <https://spark.apache.org/>. This package supports connecting to local and remote Apache Spark clusters, provides a 'dplyr' compatible back-end, and provides an interface to Spark's built-in machine learning algorithms.

Maintained by Edgar Ruiz. Last updated 9 days ago.

apache-spark distributed dplyr ide livy machine-learning remote-clusters spark sparklyr

6.0 match 959 stars 15.16 score 4.0k scripts 21 dependents

stemangiola

tidyseurat:Brings Seurat to the Tidyverse

It creates an invisible layer that allows the 'Seurat' object to be viewed as a tibble and to interact seamlessly with the tidyverse.

Maintained by Stefano Mangiola. Last updated 8 months ago.

assaydomain infrastructure rnaseq differentialexpression geneexpression normalization clustering qualitycontrol sequencing transcription transcriptomics dplyr ggplot2 pca purrr sct seurat single-cell single-cell-rna-seq tibble tidyr tidyverse transcripts tsne umap

9.0 match 158 stars 9.66 score 398 scripts 1 dependent

hadley

plyr:Tools for Splitting, Applying and Combining Data

A set of tools that solves a common set of problems: you need to break a big problem down into manageable pieces, operate on each piece and then put all the pieces back together. For example, you might want to fit a model to each spatial location or time point in your study, summarise data by panels or collapse high-dimensional arrays to simpler summary statistics. The development of 'plyr' has been generously supported by 'Becton Dickinson'.

Maintained by Hadley Wickham. Last updated 4 months ago.

cpp

4.7 match 500 stars 18.16 score 83k scripts 3.3k dependents

bioc

tidySingleCellExperiment:Brings SingleCellExperiment to the Tidyverse

'tidySingleCellExperiment' is an adapter that abstracts the 'SingleCellExperiment' container in the form of a 'tibble'. This allows *tidy* data manipulation, nesting, and plotting. For example, a 'tidySingleCellExperiment' is directly compatible with functions from 'tidyverse' packages `dplyr` and `tidyr`, as well as plotting with `ggplot2` and `plotly`. In addition, the package provides various utility functions specific to single-cell omics data analysis (e.g., aggregation of cell-level data to pseudobulks).

Maintained by Stefano Mangiola. Last updated 5 months ago.

assaydomain infrastructure rnaseq differentialexpression singlecell geneexpression normalization clustering qualitycontrol sequencing bioconductor dplyr ggplot2 plotly single-cell-rna-seq single-cell-sequencing singlecellexperiment tibble tidyr tidyverse

9.0 match 36 stars 8.86 score 125 scripts 2 dependents

bioc

iPAC:Identification of Protein Amino acid Clustering

iPAC is a novel tool to identify somatic amino acid mutation clustering within proteins while taking into account protein structure.

Maintained by Gregory Ryslik. Last updated 2 days ago.

clustering proteomics

14.2 match 5.56 score 4 scripts 3 dependents

bioc

SparseSignatures:SparseSignatures

Point mutations occurring in a genome can be divided into 96 categories based on the base being mutated, the base it is mutated into and its two flanking bases. Therefore, for any patient, it is possible to represent all the point mutations occurring in that patient's tumor as a vector of length 96, where each element represents the count of mutations for a given category in the patient. A mutational signature represents the pattern of mutations produced by a mutagen or mutagenic process inside the cell. Each signature can also be represented by a vector of length 96, where each element represents the probability that this particular mutagenic process generates a mutation of the 96 above mentioned categories. In this R package, we provide a set of functions to extract and visualize the mutational signatures that best explain the mutation counts of a large number of patients.
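The 96-category encoding described above can be sketched in Python: each single base substitution is reported on the pyrimidine (C or T) strand, so a purine reference is reverse-complemented together with its flanks, giving 6 substitution classes x 4 upstream x 4 downstream bases. The A[C>T]G label style and the helper names are assumptions for illustration, not SparseSignatures' functions.

```python
BASES = "ACGT"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
# The six pyrimidine-centred substitution classes.
SUBSTITUTIONS = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]
# 6 substitutions x 4 upstream bases x 4 downstream bases = 96 channels.
CHANNELS = [f"{up}[{sub}]{down}" for sub in SUBSTITUTIONS for up in BASES for down in BASES]


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def channel(context, alt):
    """Map a trinucleotide reference context (e.g. 'ACG') and the alternative
    base to its 96-channel label, folding purine references onto the
    pyrimidine strand."""
    ref = context[1]
    if ref in "AG":  # purine reference: describe the mutation on the opposite strand
        context, alt = revcomp(context), COMPLEMENT[alt]
        ref = context[1]
    return f"{context[0]}[{ref}>{alt}]{context[2]}"


def count_vector(mutations):
    """Return the length-96 count vector for a list of (context, alt) pairs."""
    index = {label: i for i, label in enumerate(CHANNELS)}
    counts = [0] * len(CHANNELS)
    for context, alt in mutations:
        counts[index[channel(context, alt)]] += 1
    return counts


# The first two mutations are the same event seen from opposite strands.
patient = count_vector([("ACG", "T"), ("CGT", "A"), ("TTA", "G")])
```

Stacking one such vector per patient yields the patients x 96 count matrix from which signatures are extracted.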

Maintained by Luca De Sano. Last updated 5 months ago.

biomedicalinformatics somaticmutation

12.0 match 11 stars 6.42 score 4 scripts

zhangrenl

geneHapR:Gene Haplotype Statistics, Phenotype Association and Visualization

Import genome variant data and perform gene haplotype statistics, visualization and phenotype association with 'R'.

Maintained by Zhang Renliang. Last updated 6 months ago.

nucleosomepositioning dataimport

14.6 match 13 stars 5.11 score 8 scripts

r-forge

ROI:R Optimization Infrastructure

The R Optimization Infrastructure ('ROI') <doi:10.18637/jss.v094.i15> is a sophisticated framework for handling optimization problems in R. Additional information can be found on the 'ROI' homepage <http://roi.r-forge.r-project.org/>.

Maintained by Stefan Theussl. Last updated 2 years ago.

9.4 match 7.68 score 506 scripts 47 dependents

bioc

netZooR:Unified methods for the inference and analysis of gene regulatory networks

netZooR unifies the implementations of several Network Zoo methods (netzoo, netzoo.github.io) into a single package by creating interfaces between network inference and network analysis methods. Currently, the package has 3 methods for network inference, including PANDA and its optimized implementation OTTER (network reconstruction using multiple lines of biological evidence), LIONESS (single-sample network inference), and EGRET (genotype-specific networks). Network analysis methods include CONDOR (community detection), ALPACA (differential community detection), CRANE (significance estimation of differential modules), and MONSTER (estimation of network transition states). In addition, YARN processes gene expression data for tissue-specific analyses and SAMBAR infers missing mutation data based on pathway information.

Maintained by Tara Eicher. Last updated 8 days ago.

networkinference network generegulation geneexpression transcription microarray graphandnetwork gene-regulatory-network transcription-factors

8.9 match 105 stars 7.98 score

bioc

DriverNet:Drivernet: uncovering somatic driver mutations modulating transcriptional networks in cancer

DriverNet is a package to predict functionally important driver genes in cancer by integrating genome data (mutation and copy number variation data) and transcriptome data (gene expression data). The different kinds of data are combined by an influence graph, which is a gene-gene interaction network deduced from pathway data. A greedy algorithm is used to find the possible driver genes, which may be mutated in a larger number of patients and whose mutations push the expression values of connected genes to extreme values.

Maintained by Jiarui Ding. Last updated 5 months ago.

network

16.4 match 4.30 score 7 scripts
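The greedy driver search mentioned in the DriverNet summary can be illustrated as a set-cover style loop: each mutated gene explains the expression outliers of its neighbours in the influence graph within the same patient, and the gene explaining the most still-unexplained outliers is picked repeatedly. This Python sketch is a simplification under those assumptions, not DriverNet's implementation; it omits any significance testing the package may perform, and all names are hypothetical.

```python
def greedy_drivers(mutations, outliers, graph):
    """
    mutations: patient -> set of mutated genes
    outliers:  patient -> set of genes with extreme expression in that patient
    graph:     gene -> set of neighbouring genes in the influence graph
    Returns the selected driver genes in the order they were picked.
    """
    # Each mutated gene can "explain" the (patient, neighbour) outlier events
    # that lie next to it in the influence graph.
    covers = {}
    for patient, genes in mutations.items():
        patient_outliers = outliers.get(patient, set())
        for gene in genes:
            events = {(patient, nb) for nb in graph.get(gene, set())
                      if nb in patient_outliers}
            covers.setdefault(gene, set()).update(events)

    unexplained = set().union(*covers.values())
    drivers = []
    while unexplained:
        # Greedy step: the gene explaining the most still-unexplained events
        # (ties broken by gene name so the result is deterministic).
        best = max(covers, key=lambda g: (len(covers[g] & unexplained), g))
        drivers.append(best)
        unexplained -= covers[best]
    return drivers
```

Because every remaining event is covered by at least one gene, each greedy step removes at least one event and the loop always terminates.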

bioc

signeR:Empirical Bayesian approach to mutational signature discovery

The signeR package provides an empirical Bayesian approach to mutational signature discovery. It is designed to analyze single nucleotide variation (SNV) counts in cancer genomes, but can also be applied to other features. Functionalities to characterize signatures or genome samples according to exposure patterns are also provided.

Maintained by Renan Valieris. Last updated 5 months ago.

genomicvariation somaticmutation statisticalmethod visualization bioconductor bioinformatics openblas cpp

9.2 match 13 stars 7.67 score 22 scripts

bioc

GenomAutomorphism:Compute the automorphisms between DNA's Abelian group representations

This is an R package to compute the automorphisms between pairwise aligned DNA sequences represented as elements from a Genomic Abelian group. In a general scenario, anything from genomic regions to whole genomes from a given population (of any species or closely related species) can be algebraically represented as a direct sum of cyclic groups or, more specifically, Abelian p-groups. Basically, we propose the representation of multiple sequence alignments of length N bp as elements of a finite Abelian group created by the direct sum of homocyclic Abelian groups of prime-power order.

Maintained by Robersy Sanchez. Last updated 3 months ago.

mathematicalbiology comparativegenomics functionalgenomics multiplesequencealignment wholegenome genetic-code genetic-code-algebra genome genome-algebra

16.1 match 4.30 score 9 scripts

openbiox

UCSCXenaShiny:Interactive Analysis of UCSC Xena Data

Provides functions and a Shiny application for downloading, analyzing and visualizing datasets from UCSC Xena (<http://xena.ucsc.edu/>), which is a collection of UCSC-hosted public databases such as TCGA, ICGC, TARGET, GTEx, CCLE, and others.

Maintained by Shixiang Wang. Last updated 4 months ago.

cancer-dataset shiny-apps ucsc-xena

8.1 match 96 stars 8.54 score 35 scripts

elbersb

tidylog:Logging for 'dplyr' and 'tidyr' Functions

Provides feedback about 'dplyr' and 'tidyr' operations.

Maintained by Benjamin Elbers. Last updated 9 months ago.

dplyr tidyr tidyverse wrapper-functions

6.7 match 593 stars 10.23 score 1.7k scripts

talegari

tidier:Enhanced 'mutate'

Provides 'Apache Spark' style window aggregation for R dataframes and remote 'dbplyr' tables via 'mutate' in 'dplyr' flavour.

Maintained by Srikanth Komala Sheshachala. Last updated 2 years ago.

dbplyr dplyr mutate spark-sql tidyverse

21.3 match 3 stars 3.18 score 1 scripts

bioc

bioCancer:Interactive Multi-Omics Cancers Data Visualization and Analysis

This package is a Shiny app to interactively visualize and analyse multi-assay cancer genomic data.

Maintained by Karim Mezhoud. Last updated 5 months ago.

gui datarepresentation network multiplecomparison pathways reactome visualization geneexpression genetarget analysis biocancer-interface cancer cancer-studies rmarkdown

11.2 match 20 stars 5.95 score 7 scripts

dieghernan

tidyterra:'tidyverse' Methods and 'ggplot2' Helpers for 'terra' Objects

Extension of the 'tidyverse' for 'SpatRaster' and 'SpatVector' objects of the 'terra' package. It also includes new 'geom_' functions that provide a convenient way of visualizing 'terra' objects with 'ggplot2'.

Maintained by Diego Hernangómez. Last updated 8 hours ago.

terra ggplot-extension r-spatial rspatial

4.9 match 191 stars 13.62 score 1.9k scripts 25 dependents

bioc

SigsPack:Mutational Signature Estimation for Single Samples

Single sample estimation of exposure to mutational signatures. Exposures to known mutational signatures are estimated for single samples, based on quadratic programming algorithms. Bootstrapping the input mutational catalogues provides estimations on the stability of these exposures. The effect of the sequence composition of mutational context can be taken into account by normalising the catalogues.

Maintained by Franziska Schumann. Last updated 5 months ago.

somaticmutation snp variantannotation biomedicalinformatics dnaseq

15.3 match 2 stars 4.30 score 4 scripts
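Estimating exposures to known signatures for one sample amounts to a quadratic program: find non-negative weights whose weighted sum of signatures is closest, in squared error, to the sample's catalogue. The Python sketch below solves that non-negative least-squares problem by cyclic coordinate descent, a swapped-in solver chosen only because it fits in a few lines. It is not SigsPack's algorithm, it leaves out the bootstrapping and normalisation steps, and the function name is hypothetical.

```python
def estimate_exposures(catalogue, signatures, n_iter=500):
    """
    catalogue:  mutation counts for one sample (length K, e.g. K = 96)
    signatures: N known signatures, each a list of K probabilities
    Returns N non-negative exposures minimising
        || catalogue - sum_n exposure[n] * signatures[n] ||^2
    by cyclic coordinate descent with clipping at zero.
    """
    n, k = len(signatures), len(catalogue)
    # Quadratic-program data: Gram matrix G[i][j] = <s_i, s_j>, b[i] = <s_i, m>.
    gram = [[sum(si[t] * sj[t] for t in range(k)) for sj in signatures]
            for si in signatures]
    proj = [sum(s[t] * catalogue[t] for t in range(k)) for s in signatures]

    x = [0.0] * n
    for _ in range(n_iter):
        for i in range(n):
            if gram[i][i] == 0:
                continue  # an all-zero signature cannot be estimated
            # Exact minimiser along coordinate i, then projection onto x_i >= 0.
            rest = sum(gram[i][j] * x[j] for j in range(n) if j != i)
            x[i] = max(0.0, (proj[i] - rest) / gram[i][i])
    return x
```

With two overlapping signatures `[0.5, 0.5, 0]` and `[0, 0.5, 0.5]`, the catalogue `[1, 4, 3]` is recovered as exposures close to 2 and 6.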

bdj34

cloneRate:Estimate Growth Rates from Phylogenetic Trees

Quickly estimate the net growth rate of a population or clone whose growth can be approximated by a birth-death branching process. Input should be phylogenetic tree(s) of clone(s) with edge lengths corresponding to either time or mutations. Based on coalescent results in Johnson et al. (2023) <doi:10.1093/bioinformatics/btad561>. Simulation techniques as well as growth rate methods build on prior work from Lambert A. (2018) <doi:10.1016/j.tpb.2018.04.005> and Stadler T. (2009) <doi:10.1016/j.jtbi.2009.07.018>.

Maintained by Brian Johnson. Last updated 11 months ago.

cpp

13.0 match 4 stars 4.90 score 8 scripts

billdenney

PKNCA:Perform Pharmacokinetic Non-Compartmental Analysis

Compute standard Non-Compartmental Analysis (NCA) parameters for typical pharmacokinetic analyses and summarize them.

Maintained by Bill Denney. Last updated 16 days ago.

nca noncompartmental-analysis pharmacokinetics

4.8 match 73 stars 12.61 score 214 scripts 4 dependents

bioc

SomaticSignatures:Somatic Signatures

The SomaticSignatures package identifies mutational signatures of single nucleotide variants (SNVs). It provides an infrastructure related to the methodology described in Nik-Zainal (2012, Cell), with flexibility in the matrix decomposition algorithms.

Maintained by Julian Gehring. Last updated 5 months ago.

sequencing somaticmutation visualization clustering genomicvariation statisticalmethod

8.8 match 22 stars 6.85 score 54 scripts 1 dependents
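SomaticSignatures leaves the choice of matrix decomposition open; non-negative matrix factorization (NMF) is one common choice for signature extraction. As an illustration only (in Python, not the package's R interface), the sketch below factors a categories-by-samples count matrix V into signatures W and exposures H using the Lee-Seung multiplicative updates for squared error. The function names, random initialization and fixed iteration count are assumptions.

```python
import random


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def nmf(V, rank, n_iter=200, seed=0, eps=1e-12):
    """Factor non-negative V (K categories x M samples) as V ~ W H, with
    W (K x rank) holding signatures and H (rank x M) holding exposures."""
    rng = random.Random(seed)
    K, M = len(V), len(V[0])
    W = [[rng.random() + eps for _ in range(rank)] for _ in range(K)]
    H = [[rng.random() + eps for _ in range(M)] for _ in range(rank)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H); eps guards against division by zero.
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[h * nu / (d + eps) for h, nu, d in zip(hr, nr, dr)]
             for hr, nr, dr in zip(H, num, den)]
        # W <- W * (V H^T) / (W H H^T)
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[w * nu / (d + eps) for w, nu, d in zip(wr, nr, dr)]
             for wr, nr, dr in zip(W, num, den)]
    return W, H


def frobenius_error(V, W, H):
    """Squared Frobenius norm of V - W H."""
    WH = matmul(W, H)
    return sum((v - x) ** 2 for vr, xr in zip(V, WH) for v, x in zip(vr, xr))
```

Multiplicative updates keep every entry non-negative and do not increase the squared error, which is why they are a popular, if slow, baseline; in practice the rank is chosen by comparing fits across several values.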

stan-dev

posterior:Tools for Working with Posterior Distributions

Provides useful tools for both users and developers of packages for fitting Bayesian models or working with output from Bayesian models. The primary goals of the package are to: (a) Efficiently convert between many different useful formats of draws (samples) from posterior or prior distributions. (b) Provide consistent methods for operations commonly performed on draws, for example, subsetting, binding, or mutating draws. (c) Provide various summaries of draws in convenient formats. (d) Provide lightweight implementations of state of the art posterior inference diagnostics. References: Vehtari et al. (2021) <doi:10.1214/20-BA1221>.

Maintained by Paul-Christian Bürkner. Last updated 10 days ago.

bayes bayesian mcmc

3.7 match 168 stars 16.13 score 3.3k scripts 342 dependents

bioc

scMitoMut:Single-cell Mitochondrial Mutation Analysis Tool

This package is designed for calling lineage-informative mitochondrial mutations using single-cell sequencing data, such as scRNASeq and scATACSeq (preferably the latter due to RNA editing issues). It includes functions for mutation calling and visualization. Mutation calling is done using a beta-binomial distribution.

Maintained by Wenjie Sun. Last updated 3 months ago.

preprocessing sequencing singlecell openblas cpp

12.2 match 2 stars 4.90 score 5 scripts
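To show what beta-binomial mutation calling involves, the sketch below computes the beta-binomial probability of a given number of mutant reads and an upper-tail p-value against a background model. It is a Python illustration that assumes the background parameters `alpha` and `beta` have already been estimated; how scMitoMut fits them and thresholds the result is not shown, and the function names are hypothetical.

```python
from math import exp, lgamma


def log_beta(a, b):
    """log B(a, b) via log-gamma, numerically stable for large arguments."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def betabinom_pmf(k, n, alpha, beta):
    """P(K = k) for K ~ BetaBinomial(n, alpha, beta)."""
    log_choose = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_choose + log_beta(k + alpha, n - k + beta) - log_beta(alpha, beta))


def upper_tail_pvalue(k, n, alpha, beta):
    """P(K >= k): chance of seeing at least k mutant reads out of n under the
    background (non-mutant) model with parameters alpha and beta."""
    return sum(betabinom_pmf(j, n, alpha, beta) for j in range(k, n + 1))
```

Compared with a plain binomial, the beta-binomial lets the per-cell error rate vary, so a site needs more supporting reads before it looks surprising; a small `upper_tail_pvalue` marks a candidate mutation.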

cran

adana:Adaptive Nature-Inspired Algorithms for Hybrid Genetic Optimization

The Genetic Algorithm (GA) is an optimization method from the family of Evolutionary Algorithms. It uses biologically inspired operators such as mutation, crossover, selection and replacement. Because of their global search and robustness abilities, GAs have been widely utilized in machine learning, expert systems, data science, engineering, life sciences and many other areas of research and business. However, regular GAs need techniques that improve their computing time and their performance in finding the global optimum, using adaptation and hybridization strategies. Adaptive GAs (AGA) increase the convergence speed and success of regular GAs by setting the crossover and mutation probabilities dynamically. Hybrid GAs combine the exploration strength of stochastic GAs with the convergence ability of local search algorithms such as simulated annealing, and with other nature-inspired algorithms such as ant colony optimization and particle swarm optimization. The package 'adana' provides a rich working environment, with many functions for building and running regular GAs, adaptive GAs, hybrid GAs and hybrid adaptive GAs for any kind of optimization problem. Cebeci, Z. (2021, ISBN: 9786254397448).

Maintained by Erkut Tekeli. Last updated 3 years ago.

59.3 match 1.00 score 9 scripts
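The adaptive idea described for 'adana' (adjusting crossover and mutation probabilities during the run) can be sketched on a toy bit-string problem. The Python code below follows a simplified version of the Srinivas-Patnaik scheme, in which above-average individuals are disrupted less. It is not adana's API; the constants `k_c` and `k_m`, the tournament selection and the OneMax fitness are illustrative assumptions.

```python
import random


def onemax(bits):
    """Toy fitness: the number of 1s in a bit string (maximum = its length)."""
    return sum(bits)


def adaptive_ga(n_bits=30, pop_size=40, generations=60, seed=1, k_c=0.9, k_m=0.05):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    initial_best = max(onemax(ind) for ind in pop)

    for _ in range(generations):
        fits = [onemax(ind) for ind in pop]
        f_max, f_avg = max(fits), sum(fits) / pop_size
        spread = max(f_max - f_avg, 1e-9)

        def rates(f):
            # Above-average individuals get smaller crossover/mutation
            # probabilities (zero for the current best); below-average
            # individuals get the full rates k_c and k_m.
            if f >= f_avg:
                scale = max(0.0, (f_max - f) / spread)
                return k_c * scale, k_m * scale
            return k_c, k_m

        def tournament():
            a, b = rng.sample(range(pop_size), 2)
            return pop[a] if fits[a] >= fits[b] else pop[b]

        children = [pop[fits.index(f_max)][:]]  # elitism: keep the best unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            p_c, _ = rates(max(onemax(p1), onemax(p2)))
            if rng.random() < p_c:
                cut = rng.randrange(1, n_bits)  # one-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            _, p_m = rates(onemax(child))
            child = [1 - b if rng.random() < p_m else b for b in child]
            children.append(child)
        pop = children

    return initial_best, max(pop, key=onemax)
```

Elitism guarantees the best fitness never decreases, while the adaptive rates keep poor individuals exploring; a hybrid GA would additionally run a local search on some offspring.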

martinzaefferer

CEGO:Combinatorial Efficient Global Optimization

Model building, surrogate model based optimization and Efficient Global Optimization in combinatorial or mixed search spaces.

Maintained by Martin Zaefferer. Last updated 2 months ago.

19.4 match 1 stars 3.04 score 73 scripts

bioc

LACE:Longitudinal Analysis of Cancer Evolution (LACE)

LACE is an algorithmic framework that processes single-cell somatic mutation profiles from cancer samples collected at different time points and in distinct experimental settings, to produce longitudinal models of cancer evolution. The approach solves a Boolean Matrix Factorization problem with phylogenetic constraints, by maximizing a weighted likelihood function computed on multiple time points.

Maintained by Davide Maspero. Last updated 5 months ago.

biomedicalinformatics singlecell somaticmutation

7.7 match 15 stars 7.65 score 3 scripts

plotly

plotly:Create Interactive Web Graphics via 'plotly.js'

Create interactive web graphics from 'ggplot2' graphs and/or a custom interface to the (MIT-licensed) JavaScript library 'plotly.js' inspired by the grammar of graphics.

Maintained by Carson Sievert. Last updated 3 months ago.

d3js data-visualization ggplot2 javascript plotly shiny webgl

3.0 match 2.6k stars 19.43 score 93k scripts 797 dependents

dreamrs

shinyWidgets:Custom Inputs Widgets for Shiny

Collection of custom input controls and user interface components for 'Shiny' applications. Give your applications a unique and colorful style!

Maintained by Victor Perrier. Last updated 11 days ago.

shiny

3.4 match 849 stars 17.05 score 8.1k scripts 218 dependents

kharchenkolab

numbat:Haplotype-Aware CNV Analysis from scRNA-Seq

A computational method that infers copy number variations (CNVs) in cancer scRNA-seq data and reconstructs the tumor phylogeny. 'numbat' integrates signals from gene expression, allelic ratio, and population haplotype structures to accurately infer allele-specific CNVs in single cells and reconstruct their lineage relationship. 'numbat' can be used to: 1. detect allele-specific copy number variations from single-cells; 2. differentiate tumor versus normal cells in the tumor microenvironment; 3. infer the clonal architecture and evolutionary history of profiled tumors. 'numbat' does not require tumor/normal-paired DNA or genotype data, but operates solely on the donor scRNA-seq data (for example, 10x Cell Ranger output). Additional examples and documentation are available at <https://kharchenkolab.github.io/numbat/>. For details on the method please see Gao et al. Nature Biotechnology (2022) <doi:10.1038/s41587-022-01468-y>.

Maintained by Teng Gao. Last updated 16 days ago.

cancer-genomics cnv-detection lineage-tracing phylogeny single-cell single-cell-analysis single-cell-rna-seq spatial-transcriptomics cpp

7.8 match 179 stars 7.41 score 120 scripts

lightbluetitan

OncoDataSets:A Comprehensive Collection of Cancer Types and Cancer-related DataSets

Offers a rich collection of data focused on cancer research, covering survival rates, genetic studies, biomarkers, and epidemiological insights. Designed for researchers, analysts, and bioinformatics practitioners, the package includes datasets on various cancer types such as melanoma, leukemia, breast, ovarian, and lung cancer, among others. It aims to facilitate advanced research, analysis, and understanding of cancer epidemiology, genetics, and treatment outcomes.

Maintained by Renzo Caceres Rossi. Last updated 3 months ago.

13.7 match 3 stars 4.18 score 6 scripts

nathaneastwood

poorman:A Poor Man's Dependency Free Recreation of 'dplyr'

A replication of key functionality from 'dplyr' and the wider 'tidyverse' using only 'base'.

Maintained by Nathan Eastwood. Last updated 1 year ago.

base-r data-manipulation grammar

5.3 match 341 stars 10.79 score 156 scripts 27 dependents

bioc

QSutils:Quasispecies Diversity

Set of utility functions for viral quasispecies analysis with NGS data. Most functions are equally useful for metagenomic studies. There are three main types: (1) data manipulation and exploration—functions useful for converting reads to haplotypes and frequencies, repairing reads, intersecting strand haplotypes, and visualizing haplotype alignments. (2) diversity indices—functions to compute diversity and entropy, in which incidence, abundance, and functional indices are considered. (3) data simulation—functions useful for generating random viral quasispecies data.

Maintained by Mercedes Guerrero-Murillo. Last updated 5 months ago.

software genetics dnaseq geneticvariability sequencing alignment sequencematching dataimport

9.8 match 5.56 score 8 scripts 1 dependents

thackl

gggenomes:A Grammar of Graphics for Comparative Genomics

An extension of 'ggplot2' for creating complex genomic maps. It builds on the power of 'ggplot2' and 'tidyverse' adding new 'ggplot2'-style geoms & positions and 'dplyr'-style verbs to manipulate the underlying data. It implements a layout concept inspired by 'ggraph' and introduces tracks to bring tidiness to the mess that is genomics data.

Maintained by Thomas Hackl. Last updated 1 month ago.

biological-data comparative-genomics genomics-visualization ggplot-extension ggplot2

5.6 match 650 stars 9.56 score 123 scripts

bioc

cardelino:Clone Identification from Single Cell Data

Methods to infer clonal tree configuration for a population of cells using single-cell RNA-seq data (scRNA-seq), and possibly other data modalities. Methods are also provided to assign cells to inferred clones and explore differences in gene expression between clones. These methods can flexibly integrate information from imperfect clonal trees inferred based on bulk exome-seq data, and sparse variant alleles expressed in scRNA-seq data. A flexible beta-binomial error model that accounts for stochastic dropout events as well as systematic allelic imbalance is used.

Maintained by Davis McCarthy. Last updated 5 months ago.

singlecell rnaseq visualization transcriptomics geneexpression sequencing software exomeseq clonal-clustering gibbs-sampling scrna-seq single-cell somatic-mutations

7.5 match 61 stars 7.05 score 62 scripts

mikldk

malan:MAle Lineage ANalysis

MAle Lineage ANalysis by simulating genealogies backwards and imposing short tandem repeats (STR) mutations forwards. Intended for forensic Y chromosomal STR (Y-STR) haplotype analyses. Numerous analyses are possible, e.g. number of matches and meiotic distance to matches. Refer to papers mentioned in citation("malan") (DOIs: <doi:10.1371/journal.pgen.1007028>, <doi:10.21105/joss.00684> and <doi:10.1016/j.fsigen.2018.10.004>).

Maintained by Mikkel Meyer Andersen. Last updated 1 year ago.

openblas cpp

11.8 match 4.48 score 6 scripts

bioc

TRONCO:TRONCO, an R package for TRanslational ONCOlogy

The TRONCO (TRanslational ONCOlogy) R package collects algorithms to infer progression models via the approach of Suppes-Bayes Causal Network, both from an ensemble of tumors (cross-sectional samples) and within an individual patient (multi-region or single-cell samples). The package provides parallel implementations of algorithms that process binary matrices where each row represents a tumor sample and each column a single-nucleotide or a structural variant driving the progression; a 0/1 value models the absence/presence of that alteration in the sample. The tool can import data from plain, MAF or GISTIC format files, and can fetch it from the cBioPortal for cancer genomics. Functions for data manipulation and visualization are provided, as well as functions to import/export such data to other bioinformatics tools for, e.g., clustering or detection of mutually exclusive alterations. Inferred models can be visualized and tested for their confidence via bootstrap and cross-validation. TRONCO is used for the implementation of the Pipeline for Cancer Inference (PICNIC).

Maintained by Luca De Sano. Last updated 5 months ago.

biomedicalinformatics bayesian graphandnetwork somaticmutation networkinference network clustering dataimport singlecell immunooncology algorithms cancer-inference tumors

8.0 match 30 stars 6.50 score 38 scripts
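TRONCO's Suppes-Bayes approach starts from probabilistic causation on the binary sample-by-alteration matrix described above. The Python sketch below checks only Suppes' two prima facie conditions for each ordered pair of alterations: temporal priority, approximated here by a higher marginal frequency, and probability raising. It assumes that simplification; the package's algorithms add further steps on top of this screening that are not shown, and the function name is hypothetical.

```python
def suppes_edges(matrix, genes):
    """
    matrix: list of 0/1 rows, one per tumor sample, one column per alteration
    genes:  column names
    Returns (cause, effect) pairs meeting both prima facie conditions:
      temporal priority     P(cause) > P(effect)
      probability raising   P(effect | cause) > P(effect | not cause)
    """
    n = len(matrix)
    cols = list(zip(*matrix))
    marg = [sum(c) / n for c in cols]
    edges = []
    for i, a in enumerate(cols):
        # Both conditional probabilities need the cause present in some
        # samples and absent in others.
        if not 0 < marg[i] < 1:
            continue
        for j, b in enumerate(cols):
            if i == j:
                continue
            with_a = [y for x, y in zip(a, b) if x]
            without_a = [y for x, y in zip(a, b) if not x]
            p_b_given_a = sum(with_a) / len(with_a)
            p_b_given_not_a = sum(without_a) / len(without_a)
            if marg[i] > marg[j] and p_b_given_a > p_b_given_not_a:
                edges.append((genes[i], genes[j]))
    return edges
```

Probability raising alone is symmetric in direction, so the marginal-frequency condition is what orients each candidate edge from the earlier (more frequent) alteration to the later one.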

emmanuelparadis

pegas:Population and Evolutionary Genetics Analysis System

Functions for reading, writing, plotting, analysing, and manipulating allelic and haplotypic data, including from VCF files, and for the analysis of population nucleotide sequences and micro-satellites including coalescent analyses, linkage disequilibrium, population structure (Fst, Amova) and equilibrium (HWE), haplotype networks, minimum spanning tree and network, and median-joining networks.

Maintained by Emmanuel Paradis. Last updated 1 year ago.

6.9 match 7.53 score 576 scripts 18 dependents

rdinnager

slimr:Create, Run and Post-Process 'SLiM' Population Genetics Forward Simulations

Lets you write 'SLiM' scripts (population genomics simulation) using your favourite R IDE, with a syntax as close as possible to the original 'SLiM' language. It offers many tools to manipulate those scripts, run them in the 'SLiM' software from R, and capture and post-process their output, after or even during a simulation.

Maintained by Russell Dinnage. Last updated 4 months ago.

11.0 match 8 stars 4.70 score 42 scripts

bioc

cbpManager:Generate, manage, and edit data and metadata files suitable for the import in cBioPortal for Cancer Genomics

This R package provides an R Shiny application that enables the user to generate, manage, and edit data and metadata files suitable for the import in cBioPortal for Cancer Genomics. Create cancer studies and edit their metadata. Upload mutation data of a patient that will be concatenated to the data_mutation_extended.txt file of the study. Create and edit clinical patient data, sample data, and timeline data. Create custom timeline tracks for patients.

Maintained by Arsenij Ustjanzew. Last updated 5 months ago.

immunooncology dataimport datarepresentation gui thirdpartyclient preprocessing visualization cancer-genomics cbioportal clinical-data filegenerator mutation-data patient-data

9.3 match 8 stars 5.51 score 1 scripts

fcampelo

ExpDE:Modular Differential Evolution for Experimenting with Operators

Modular implementation of the Differential Evolution algorithm for experimenting with different types of operators.

Maintained by Felipe Campelo. Last updated 6 years ago.

13.9 match 2 stars 3.70 score 25 scripts

gaynorr

AlphaSimR:Breeding Program Simulations

The successor to the 'AlphaSim' software for breeding program simulation [Faux et al. (2016) <doi:10.3835/plantgenome2016.02.0013>]. Used for stochastic simulations of breeding programs to the level of DNA sequence for every individual. Contained is a wide range of functions for modeling common tasks in a breeding program, such as selection and crossing. These functions allow for constructing simulations of highly complex plant and animal breeding programs via scripting in the R software environment. Such simulations can be used to evaluate overall breeding program performance and conduct research into breeding program design, such as implementation of genomic selection. Included is the 'Markovian Coalescent Simulator' ('MaCS') for fast simulation of biallelic sequences according to a population demographic history [Chen et al. (2009) <doi:10.1101/gr.083634.108>].

Maintained by Chris Gaynor. Last updated 5 months ago.

breeding genomics simulation openblas cpp openmp

5.0 match 47 stars 10.22 score 534 scripts 2 dependents

bioc

clusterProfiler:A universal enrichment tool for interpreting omics data

This package supports functional characterization of both coding and non-coding genomics data for thousands of species with up-to-date gene annotation. It provides a universal interface for gene functional annotation from a variety of sources and thus can be applied in diverse scenarios. It provides a tidy interface to access, manipulate, and visualize enrichment results to help users achieve efficient data interpretation. Datasets obtained from multiple treatments and time points can be analyzed and compared in a single run, easily revealing functional consensus and differences among distinct conditions.

Maintained by Guangchuang Yu. Last updated 4 months ago.

annotation clustering genesetenrichment go kegg multiplecomparison pathways reactome visualization enrichment-analysis gsea

3.0 match 1.1k stars 17.03 score 11k scripts 48 dependents

ddsjoberg

gtsummary:Presentation-Ready Data Summary and Analytic Result Tables

Creates presentation-ready tables summarizing data sets, regression models, and more. The code to create the tables is concise and highly customizable. Data frames can be summarized with any function, e.g. mean(), median(), even user-written functions. Regression models are summarized and include the reference rows for categorical variables. Common regression models, such as logistic regression and Cox proportional hazards regression, are automatically identified and the tables are pre-filled with appropriate column headers.

Maintained by Daniel D. Sjoberg. Last updated 2 days ago.

easy-to-use gt html5 regression-models reproducibility reproducible-research statistics summary-statistics summary-tables table1 tableone

3.0 match 1.1k stars 17.00 score 8.2k scripts 15 dependents

hanjunwei-lab

pathwayTMB:Pathway Based Tumor Mutational Burden

A systematic bioinformatics tool to develop a new pathway-based gene panel for tumor mutational burden (TMB) assessment (pathway-based tumor mutational burden, PTMB), efficiently using somatic mutation files from either The Cancer Genome Atlas or any in-house study, as long as the data are in Mutation Annotation Format (MAF). In addition, we developed a multiple machine learning method that uses samples' PTMB profiles to identify cancer-specific dysfunctional pathways, which can serve as prognostic and predictive biomarkers for cancer immunotherapy.

Maintained by Junwei Han. Last updated 3 years ago.

20.2 match 2.48 score 2 scripts 1 dependents

kassambara

ggpubr:'ggplot2' Based Publication Ready Plots

The 'ggplot2' package is excellent and flexible for elegant data visualization in R. However the default generated plots require some formatting before we can send them for publication. Furthermore, to customize a 'ggplot', the syntax is opaque and this raises the level of difficulty for researchers with no advanced R programming skills. 'ggpubr' provides some easy-to-use functions for creating and customizing 'ggplot2'- based publication ready plots.

Maintained by Alboukadel Kassambara. Last updated 2 years ago.

3.0 match 1.2k stars 16.68 score 65k scripts 409 dependents

bioc

FLAMES:FLAMES: Full Length Analysis of Mutations and Splicing in long read RNA-seq data

Semi-supervised isoform detection and annotation from both bulk and single-cell long read RNA-seq data. Flames provides automated pipelines for analysing isoforms, as well as intermediate functions for manual execution.

Maintained by Changqing Wang. Last updated 5 days ago.

rnaseq singlecell transcriptomics dataimport differentialsplicing alternativesplicing geneexpression longread zlib curl bzip2 xz-utils cpp

6.2 match 31 stars 7.95 score 12 scripts

bioc

RESOLVE:RESOLVE: An R package for the efficient analysis of mutational signatures from cancer genomes

Cancer is a genetic disease caused by somatic mutations in genes controlling key biological functions such as cellular growth and division. Such mutations may arise through both cell-intrinsic and exogenous processes, generating characteristic mutational patterns over the genome named mutational signatures. The study of mutational signatures has become a standard component of modern genomics studies, since it can reveal which (environmental and endogenous) mutagenic processes are active in a tumor, and may highlight markers for therapeutic response. The computational analysis of mutational signatures presents many pitfalls. First, determining the number of signatures is very complex and depends on heuristics. Second, several signatures have no clear etiology, raising the possibility that they are computational artifacts rather than the result of mutagenic processes. Last, approaches for signature assignment are greatly influenced by the set of signatures used for the analysis. To overcome these limitations, we developed RESOLVE (Robust EStimation Of mutationaL signatures Via rEgularization), a framework that allows the efficient extraction and assignment of mutational signatures. RESOLVE implements a novel algorithm that enables (i) efficient extraction, (ii) exposure estimation, and (iii) confidence assessment during the computational inference of mutational signatures.

Maintained by Luca De Sano. Last updated 5 months ago.

biomedicalinformatics somaticmutation

10.7 match 1 stars 4.60 score 3 scripts

hojsgaard

doBy:Groupwise Statistics, LSmeans, Linear Estimates, Utilities

Utility package containing: 1) Facilities for working with grouped data: 'do' something to data stratified 'by' some variables. 2) LSmeans (least-squares means), general linear estimates. 3) Restrict functions to a smaller domain. 4) Miscellaneous other utilities.

Maintained by Søren Højsgaard. Last updated 4 days ago.

3.3 match 1 stars 14.94 score 3.2k scripts 939 dependents

bioc

supersigs:Supervised mutational signatures

Generate SuperSigs (supervised mutational signatures) from single nucleotide variants in the cancer genome. Functions included in the package allow the user to learn supervised mutational signatures from their data and apply them to new data. The methodology is based on the one described in Afsari (2021, ELife).

Maintained by Albert Kuo. Last updated 5 months ago.

featureextraction classification regression sequencing wholegenome somaticmutation

9.9 match 3 stars 4.78 score 3 scripts

kharchenkolab

scistreer:Maximum-Likelihood Perfect Phylogeny Inference at Scale

Fast maximum-likelihood phylogeny inference from noisy single-cell data using the 'ScisTree' algorithm by Yufeng Wu (2019) <doi:10.1093/bioinformatics/btz676>. 'scistreer' provides an 'R' interface and improves speed via 'Rcpp' and 'RcppParallel', making the method applicable to massive single-cell datasets (>10,000 cells).

Maintained by Teng Gao. Last updated 2 years ago.

evolution phylogenetics single-cell cpp

11.8 match 7 stars 4.02 score 2 scripts 1 dependents

snoweye

phyclust:Phylogenetic Clustering (Phyloclustering)

Phylogenetic clustering (phyloclustering) is an evolutionary Continuous Time Markov Chain model-based approach to identify population structure from molecular data without assuming linkage equilibrium. The package phyclust (Chen 2011) provides a convenient implementation of phyloclustering for DNA and SNP data, capable of clustering individuals into subpopulations and identifying molecular sequences representative of those subpopulations. It is designed in C for performance, interfaced with R for visualization, and incorporates other popular open source programs including ms (Hudson 2002) <doi:10.1093/bioinformatics/18.2.337>, seq-gen (Rambaut and Grassly 1997) <doi:10.1093/bioinformatics/13.3.235>, Hap-Clustering (Tzeng 2005) <doi:10.1002/gepi.20063> and PAML baseml (Yang 1997, 2007) <doi:10.1093/bioinformatics/13.5.555>, <doi:10.1093/molbev/msm088>, for simulating data, additional analyses, and searching the best tree. See the phyclust website for more information, documentation and examples.

Maintained by Wei-Chen Chen. Last updated 2 years ago.

5.5 match 9 stars 8.45 score 126 scripts 8 dependents

asardaes

table.express:Build 'data.table' Expressions with Data Manipulation Verbs

A specialization of 'dplyr' data manipulation verbs that parse and build expressions which are ultimately evaluated by 'data.table', letting it handle all optimizations. A set of additional verbs is also provided to facilitate some common operations on a subset of the data.

Maintained by Alexis Sarda-Espinosa. Last updated 2 years ago.

8.0 match 65 stars 5.81 score 8 scripts

bioc

SynMut:SynMut: Designing Synonymously Mutated Sequences with Different Genomic Signatures

There are increasing demands on designing virus mutants with specific dinucleotide or codon composition. This tool can take both dinucleotide preference and/or codon usage bias into account while designing mutants. It is a powerful tool for in silico designs of DNA sequence mutants.

Maintained by Haogao Gu. Last updated 5 months ago.

sequencematching experimentaldesign preprocessing

10.8 match 2 stars 4.30 score 1 scripts

bioc

CIMICE:CIMICE-R: (Markov) Chain Method to Infer Cancer Evolution

CIMICE is a tool in the field of tumor phylogenetics whose goal is to build a Markov chain (called the Cancer Progression Markov Chain, CPMC) that models the evolution of tumor subtypes. The input of CIMICE is a mutational matrix, that is, a Boolean matrix representing altered genes in a collection of samples. These samples are assumed to be obtained with single-cell DNA analysis techniques, and the tool is specifically written to exploit the peculiarities of this data for the CPMC construction.

Maintained by Nicolò Rossi. Last updated 5 months ago.

software biologicalquestion networkinference researchfield phylogenetics statisticalmethod graphandnetwork technology singlecell

10.8 match 4.30 score 5 scripts

kassambara

rstatix:Pipe-Friendly Framework for Basic Statistical Tests

Provides a simple and intuitive pipe-friendly framework, coherent with the 'tidyverse' design philosophy, for performing basic statistical tests, including t-test, Wilcoxon test, ANOVA, Kruskal-Wallis and correlation analyses. The output of each test is automatically transformed into a tidy data frame to facilitate visualization. Additional functions are available for reshaping, reordering, manipulating and visualizing correlation matrix. Functions are also included to facilitate the analysis of factorial experiments, including purely 'within-Ss' designs (repeated measures), purely 'between-Ss' designs, and mixed 'within-and-between-Ss' designs. It's also possible to compute several effect size metrics, including "eta squared" for ANOVA, "Cohen's d" for t-test and 'Cramer V' for the association between categorical variables. The package contains helper functions for identifying univariate and multivariate outliers, assessing normality and homogeneity of variances.

Maintained by Alboukadel Kassambara. Last updated 2 years ago.

3.0 match 456 stars 15.16 score 11k scripts 420 dependents

cynkra

dm:Relational Data Models

Provides tools for working with multiple related tables, stored as data frames or in a relational database. Multiple tables (data and metadata) are stored in a compound object, which can then be manipulated with a pipe-friendly syntax.

Maintained by Kirill Müller. Last updated 2 months ago.

data-model data-warehousing datawarehousing dbi dbplyr relational-databases

3.0 match 511 stars 14.81 score 410 scripts 8 dependents

thomasp85

tidygraph:A Tidy API for Graph Manipulation

A graph, while not "tidy" in itself, can be thought of as two tidy data frames describing node and edge data respectively. 'tidygraph' provides an approach to manipulate these two virtual data frames using the API defined in the 'dplyr' package, as well as tidy interfaces to many common graph algorithms.

Maintained by Thomas Lin Pedersen. Last updated 1 month ago.

graph-algorithms graph-manipulation igraph network-analysis tidyverse cpp

3.0 match 553 stars 14.74 score 4.6k scripts 136 dependents

skranz

dplyrExtras:extra functionality for dplyr like mutate_rows for mutation of a subset of rows

Some extra functionality that is not (yet) in dplyr, e.g. mutate_rows (mutation of a subset of rows), xsummarise_each (summarise_each with more flexible alignment of results), or s_filter, s_arrange, ..., which allow string arguments.

Maintained by Sebastian Kranz. Last updated 5 years ago.

dplyr

9.1 match 20 stars 4.85 score 59 scripts 4 dependents

bioc

plasmut:Stratifying mutations observed in cell-free DNA and white blood cells as germline, hematopoietic, or somatic

A Bayesian method for quantifying the likelihood that a given plasma mutation arises from clonal hematopoiesis or the underlying tumor. It requires sequencing data of the mutation in plasma and white blood cells with the number of distinct and mutant reads in both tissues. We implement a Monte Carlo importance sampling method to assess the likelihood that a mutation arises from the tumor relative to non-tumor origin.

Maintained by Adith Arun. Last updated 5 months ago.

bayesian somaticmutation germlinemutation sequencing

10.9 match 4.00 score 2 scripts

bioc

compSPOT:compSPOT: Tool for identifying and comparing significantly mutated genomic hotspots

Clonal cell groups share common mutations within cancer, precancer, and even clinically normal appearing tissues. The frequency and location of these mutations may predict prognosis and cancer risk. It has also been well established that certain genomic regions have increased sensitivity to acquiring mutations. Mutation-sensitive genomic regions may therefore serve as markers for predicting cancer risk. This package contains multiple functions to establish significantly mutated hotspots, compare hotspot mutation burden between samples, and perform exploratory data analysis of the correlation between hotspot mutation burden and personal risk factors for cancer, such as age, gender, and history of carcinogen exposure. This package allows users to identify robust genomic markers to help establish cancer risk.

Maintained by Sydney Grant. Last updated 5 months ago.

software technology sequencing dnaseq wholegenome classification singlecell survival multiplecomparison

10.7 match 4.00 score 3 scripts

luca-scr

GA:Genetic Algorithms

Flexible general-purpose toolbox implementing genetic algorithms (GAs) for stochastic optimisation. Binary, real-valued, and permutation representations are available to optimize a fitness function, i.e. a function provided by users depending on their objective function. Several genetic operators are available and can be combined to explore the best settings for the current task. Furthermore, users can define new genetic operators and easily evaluate their performances. Local search using general-purpose optimisation algorithms can be applied stochastically to exploit interesting regions. GAs can be run sequentially or in parallel, using an explicit master-slave parallelisation or a coarse-grain islands approach. For more details see Scrucca (2013) <doi:10.18637/jss.v053.i04> and Scrucca (2017) <doi:10.32614/RJ-2017-008>.

Maintained by Luca Scrucca. Last updated 6 months ago.

genetic-algorithm optimisation cpp

3.7 match 93 stars 11.58 score 624 scripts 52 dependents

gergness

srvyr:'dplyr'-Like Syntax for Summary Statistics of Survey Data

Use piping, verbs like 'group_by' and 'summarize', and other 'dplyr' inspired syntactic style when calculating summary statistics on survey data using functions from the 'survey' package.

Maintained by Greg Freedman Ellis. Last updated 1 month ago.

survey

3.0 match 215 stars 13.88 score 1.8k scripts 15 dependents

jakobbossek

mcMST:A Toolbox for the Multi-Criteria Minimum Spanning Tree Problem

Algorithms to approximate the Pareto-front of multi-criteria minimum spanning tree problems.

Maintained by Jakob Bossek. Last updated 2 years ago.

evolutionary-algorithms mcmst minimum-spanning-trees multi-objective-optimization spanningtrees

8.7 match 4 stars 4.73 score 27 scripts

jbytecode

mcga:Machine Coded Genetic Algorithms for Real-Valued Optimization Problems

Machine coded genetic algorithm (MCGA) is a fast tool for real-valued optimization problems. It uses the byte representation of variables rather than real values. It performs the classical crossover operation (uniform) on these byte representations. The mutation operator is also similar to the classical mutation operator: it changes a randomly selected byte value of a chromosome by +1 or -1 with probability 1/2. In MCGAs there is no need for an encoding-decoding process, and the classical operators are directly applicable on real values. It is fast and can handle a wide search space with high precision. Using a 256-unary alphabet is the main disadvantage of this algorithm, but a moderate-size population is convenient for many problems. The package also includes a multi_mcga function for multi-objective optimization problems. This function sorts the chromosomes using their ranks calculated from the non-dominated sorting algorithm.

Maintained by Mehmet Hakan Satman. Last updated 1 year ago.

cpp

15.0 match 2.72 score 52 scripts

bioc

SUITOR:Selecting the number of mutational signatures through cross-validation

An unsupervised cross-validation method to select the optimal number of mutational signatures. A data set of mutational counts is split into training and validation data. Signatures are estimated in the training data and then used to predict the mutations in the validation data.

Maintained by Bill Wheeler. Last updated 5 months ago.

genetics software somaticmutation

10.1 match 4.00 score 2 scripts

statgenlmu

coala:A Framework for Coalescent Simulation

Coalescent simulators can rapidly simulate biological sequences evolving according to a given model of evolution. You can use this package to specify such models, to conduct the simulations and to calculate additional statistics from the results (Staab, Metzler, 2016 <doi:10.1093/bioinformatics/btw098>). It relies on existing simulators for doing the simulation, and currently supports the programs 'ms', 'msms' and 'scrm'. It also supports finite-sites mutation models by combining the simulators with the program 'seq-gen'. Coala provides functions for calculating certain summary statistics, which can also be applied to actual biological data. One possibility to import data is through the 'PopGenome' package (<https://github.com/pievos101/PopGenome>).

Maintained by Dirk Metzler. Last updated 1 year ago.

coalescent dna evolution popgen simulation cpp

5.7 match 23 stars 7.06 score 84 scripts

bioc

survClust:Identification Of Clinically Relevant Genomic Subtypes Using Outcome Weighted Learning

survClust is an outcome-weighted integrative clustering algorithm used to classify multi-omic samples on their available time-to-event information. The resulting clusters are cross-validated to avoid overfitting and output a classification of samples that are molecularly distinct and clinically meaningful. It takes in binary (mutation) as well as continuous data (other omic types).

Maintained by Arshi Arora. Last updated 5 months ago.

software clustering survival classification cpp

8.5 match 16 stars 4.74 score 17 scripts

yulab-smu

tidytree:A Tidy Tool for Phylogenetic Tree Data Manipulation

A phylogenetic tree generally contains multiple components including node, edge, branch and associated data. 'tidytree' provides an approach to convert a tree object to a tidy data frame, as well as tidy interfaces to manipulate tree data.

Maintained by Guangchuang Yu. Last updated 8 months ago.

phylogenetic-tree tidyverse tree-data

3.0 match 54 stars 13.25 score 584 scripts 128 dependents

bioc

tidySpatialExperiment:SpatialExperiment with tidy principles

tidySpatialExperiment provides a bridge between the SpatialExperiment package and the tidyverse ecosystem. It creates an invisible layer that allows you to interact with a SpatialExperiment object as if it were a tibble; enabling the use of functions from dplyr, tidyr, ggplot2 and plotly. But, underneath, your data remains a SpatialExperiment object.

Maintained by William Hutchison. Last updated 5 months ago.

infrastructure rnaseq geneexpression sequencing spatial transcriptomics singlecell

6.8 match 6 stars 5.88 score 12 scripts

hanjunwei-lab

ProgModule:Identification of Prognosis-Related Mutually Exclusive Modules

A novel tool to identify candidate driver modules for predicting the prognosis of patients by integrating exclusive coverage of mutations with clinical characteristics in cancer.

Maintained by Junwei Han. Last updated 3 months ago.

10.3 match 3.70 score 1 scripts

bioc

plyranges:A fluent interface for manipulating GenomicRanges

A dplyr-like interface for interacting with the common Bioconductor classes Ranges and GenomicRanges. By providing a grammatical and consistent way of manipulating these classes, their accessibility for new Bioconductor users is hopefully increased.

Maintained by Michael Love. Last updated 5 months ago.

infrastructure datarepresentation workflowstep coverage bioconductor data-analysis dplyr genomic-ranges genomics tidy-data

3.0 match 143 stars 12.60 score 1.9k scripts 20 dependents

ropensci

skimr:Compact and Flexible Summaries of Data

A simple to use summary function that can be used with pipes and displays nicely in the console. The default summary statistics may be modified by the user as can the default formatting. Support for data frames and vectors is included, and users can implement their own skim methods for specific object types as described in a vignette. Default summaries include support for inline spark graphs. Instructions for managing these on specific operating systems are given in the "Using skimr" vignette and the README.

Maintained by Elin Waring. Last updated 2 months ago.

peer-reviewed ropensci summary-statistics unconf unconf17

2.3 match 1.1k stars 16.80 score 18k scripts 14 dependents

danymukesha

genetic.algo.optimizeR:Genetic Algorithm Optimization

Genetic algorithms are a class of optimization algorithms inspired by the process of natural selection and genetics. This package is for learning purposes and allows users to optimize various functions or parameters by mimicking biological evolution processes such as selection, crossover, and mutation. Ideal for tasks like machine learning parameter tuning, mathematical function optimization, and solving optimization problems that involve finding the best solution in a discrete space.

Maintained by Dany Mukesha. Last updated 5 months ago.

experimentaldesign technology

8.8 match 4.30 score 10 scripts

complexgenome

GARCOM:Gene and Region Counting of Mutations ("GARCOM")

The Gene and Region Counting of Mutations (GARCOM) package computes mutation (or allele) counts per gene per individual based on gene annotation or genomic base-pair boundaries. It accepts data in plink (.raw) and VCF formats. It gives users the flexibility to extract and filter individuals, mutations and genes of interest.

Maintained by Sanjeev Sariya. Last updated 2 years ago.

genetics mutation

13.9 match 2.70 score 2 scripts

bioc

goProfiles:goProfiles: an R package for the statistical analysis of functional profiles

The package implements methods to compare lists of genes based on comparing the corresponding 'functional profiles'.

Maintained by Alex Sanchez. Last updated 5 months ago.

annotation go geneexpression genesetenrichment graphandnetwork microarray multiplecomparison pathways software

6.9 match 5.48 score 6 scripts 1 dependents

bioc

mslp:Predict synthetic lethal partners of tumour mutations

An integrated pipeline to predict the potential synthetic lethality partners (SLPs) of tumour mutations, based on gene expression, mutation profiling and cell line genetic screens data. It has built-in support for data from cBioPortal. The primary SLPs correlating with mutations in WT and compensating for the loss of function of mutations are predicted by random forest based methods (GENIE3) and Rank Products, respectively. Genetic screens are employed to identify consensus SLPs whose perturbation leads to reduced cell viability.

Maintained by Chunxuan Shao. Last updated 5 months ago.

pharmacogenetics pharmacogenomics

11.3 match 3.30 score 1 scripts

bioc

GenomicDataCommons:NIH / NCI Genomic Data Commons Access

Programmatically access the NIH / NCI Genomic Data Commons RESTful service.

Maintained by Sean Davis. Last updated 1 month ago.

dataimport sequencing api-client bioconductor bioinformatics cancer core-services data-science genomics nci tcga vignette

3.1 match 87 stars 11.94 score 238 scripts 12 dependents

cobrbra

ICBioMark:Data-Driven Design of Targeted Gene Panels for Estimating Immunotherapy Biomarkers

Implementation of the methodology proposed in 'Data-driven design of targeted gene panels for estimating immunotherapy biomarkers', Bradley and Cannings (2021) <arXiv:2102.04296>. This package allows the user to fit generative models of mutation from an annotated mutation dataset, and then further to produce tunable linear estimators of exome-wide biomarkers. It also contains functions to simulate mutation annotated format (MAF) data, as well as to analyse the output and performance of models.

Maintained by Jacob R. Bradley. Last updated 2 years ago.

13.8 match 2.70 score 2 scripts

bioc

CrispRVariants:Tools for counting and visualising mutations in a target location

CrispRVariants provides tools for analysing the results of a CRISPR-Cas9 mutagenesis sequencing experiment, or other sequencing experiments where variants within a given region are of interest. These tools allow users to localize variant allele combinations with respect to any genomic location (e.g. the Cas9 cut site), plot allele combinations and calculate mutation rates with flexible filtering of unrelated variants.

Maintained by Helen Lindsay. Last updated 5 months ago.

immunooncology crispr genomicvariation variantdetection geneticvariability datarepresentation visualization sequencing

6.8 match 5.51 score 32 scripts

bioc

GraphPAC:Identification of Mutational Clusters in Proteins via a Graph Theoretical Approach.

Identifies mutational clusters of amino acids in a protein while utilizing the protein's tertiary structure via a graph theoretical model.

Maintained by Gregory Ryslik. Last updated 2 days ago.

clustering proteomics

8.0 match 4.65 score 1 scripts 1 dependents

r-spatial

stars:Spatiotemporal Arrays, Raster and Vector Data Cubes

Reading, manipulating, writing and plotting spatiotemporal arrays (raster and vector data cubes) in 'R', using 'GDAL' bindings provided by 'sf', and 'NetCDF' bindings by 'ncmeta' and 'RNetCDF'.

Maintained by Edzer Pebesma. Last updated 30 days ago.

raster satellite-images spatial

2.0 match 568 stars 18.26 score 7.2k scripts 135 dependents

bioc

SpacePAC:Identification of Mutational Clusters in 3D Protein Space via Simulation.

Identifies clustering of somatic mutations in proteins via a simulation approach while considering the protein's tertiary structure.

Maintained by Gregory Ryslik. Last updated 2 days ago.

clustering proteomics

7.8 match 4.65 score 2 scripts 1 dependents

bioc

QuartPAC:Identification of mutational clusters in protein quaternary structures

Identifies clustering of somatic mutations in proteins over the entire quaternary structure.

Maintained by Gregory Ryslik. Last updated 2 days ago.

clustering proteomics somaticmutation

10.0 match 3.60 score 2 scripts

mli171

changepointGA:Changepoint Detection via Modified Genetic Algorithm

The Genetic Algorithm (GA) is used to perform changepoint analysis in time series data. The package also includes an extended island version of GA, as described in Lu, Lund, and Lee (2010, <doi:10.1214/09-AOAS289>). By mimicking the principles of natural selection and evolution, GA provides a powerful stochastic search technique for solving combinatorial optimization problems. In 'changepointGA', each chromosome represents a changepoint configuration, including the number and locations of changepoints, hyperparameters, and model parameters. The package employs genetic operators—selection, crossover, and mutation—to iteratively improve solutions based on the given fitness (objective) function. Key features of 'changepointGA' include encoding changepoint configurations in an integer format, enabling dynamic and simultaneous estimation of model hyperparameters, changepoint configurations, and associated parameters. The detailed algorithmic implementation can be found in the package vignettes and in the paper of Li (2024, <doi:10.48550/arXiv.2410.15571>).

Maintained by Mo Li. Last updated 22 days ago.

cpp

7.0 match 4.95 score

green-striped-gecko

PopGenReport:A Simple Framework to Analyse Population and Landscape Genetic Data

Provides a beginner-friendly framework to analyse population genetic data. Based on 'adegenet' objects, it uses 'knitr' to create comprehensive reports on spatial genetic data. For detailed information on how to use the package, refer to the comprehensive tutorials or visit <http://www.popgenreport.org/>.

Maintained by Bernd Gruber. Last updated 1 year ago.

4.8 match 5 stars 7.27 score 82 scripts 1 dependents

hanjunwei-lab

SMDIC:Identification of Somatic Mutation-Driven Immune Cells

A computing tool developed to automatically identify somatic mutation-driven immune cells. The operation modes include: i) inferring the relative abundance matrix of tumor-infiltrating immune cells and integrating it with a particular gene mutation status, ii) detecting differential immune cells with respect to the gene mutation status and converting the abundance matrix of significant differential immune cells into two binary matrices (one for up-regulated and one for down-regulated), iii) identifying somatic mutation-driven immune cells by comparing the gene mutation status with each immune cell in the binary matrices across all samples, and iv) visualization of immune cell abundance of samples in different mutation statuses.

Maintained by Junwei Han. Last updated 5 months ago.

8.6 match 2 stars 4.00 score 5 scripts

jthomasmock

gtExtras:Extending 'gt' for Beautiful HTML Tables

Provides additional functions for creating beautiful tables with 'gt'. The functions are generally wrappers around boilerplate or add opinionated niche capabilities and helper functions.

Maintained by Thomas Mock. Last updated 12 months ago.

data-science data-visualization datascience ggplot2 gt plots sparkline sparkline-graphs sparklines tables

3.0 match 199 stars 11.45 score 2.4k scripts 3 dependents

markfairbanks

tidytable:Tidy Interface to 'data.table'

A tidy interface to 'data.table', giving users the speed of 'data.table' while using tidyverse-like syntax.

Maintained by Mark Fairbanks. Last updated 2 months ago.

3.0 match 458 stars 11.41 score 732 scripts 10 dependents

rstudio

promises:Abstractions for Promise-Based Asynchronous Programming

Provides fundamental abstractions for doing asynchronous programming in R using promises. Asynchronous programming is useful for allowing a single R process to orchestrate multiple tasks in the background while also attending to something else. Semantics are similar to 'JavaScript' promises, but with a syntax that is idiomatic R.

Maintained by Joe Cheng. Last updated 1 month ago.

cpp

2.0 match 204 stars 17.10 score 688 scripts 2.6k dependents

ropensci

BaseSet:Working with Sets the Tidy Way

Implements a class and methods to work with sets, doing intersection, union, complementary sets, power sets, cartesian product and other set operations in a "tidy" way. These set operations are available for both classical sets and fuzzy sets. Import sets from several formats or from several other data structures.

Maintained by Lluís Revilla Sancho. Last updated 25 days ago.

bioconductor bioconductor-package sets

6.0 match 11 stars 5.69 score 5 scripts

tidymodels

recipes:Preprocessing and Feature Engineering Steps for Modeling

A recipe prepares your data for modeling. We provide an extensible framework for pipeable sequences of feature engineering steps that provides preprocessing tools to be applied to data. Statistical parameters for the steps can be estimated from an initial data set and then applied to other data sets. The resulting processed output can then be used as inputs for statistical or machine learning models.

Maintained by Max Kuhn. Last updated 5 days ago.

1.8 match 584 stars 18.71 score 7.2k scripts 380 dependents

bioc

RCy3:Functions to Access and Control Cytoscape

Visualize, analyze and explore networks using Cytoscape via R. Anything you can do using the graphical user interface of Cytoscape, you can now do with a single RCy3 function.

Maintained by Alex Pico. Last updated 5 months ago.

visualization graphandnetwork thirdpartyclient network

2.5 match 52 stars 13.39 score 628 scripts 15 dependents

uupharmacometrics

xpose:Diagnostics for Pharmacometric Models

Diagnostics for non-linear mixed-effects (population) models from 'NONMEM' <https://www.iconplc.com/solutions/technologies/nonmem/>. 'xpose' facilitates data import and the creation of numerical run summaries, and provides 'ggplot2'-based graphics for data exploration and model diagnostics.

Maintained by Benjamin Guiastrennec. Last updated 2 months ago.

diagnostics ggplot2 nonmem pharmacometrics xpose

3.0 match 62 stars 11.02 score 183 scripts 6 dependents

tidyverse

stringr:Simple, Consistent Wrappers for Common String Operations

A consistent, simple and easy to use set of wrappers around the fantastic 'stringi' package. All function and argument names (and positions) are consistent, all functions deal with "NA"'s and zero length vectors in the same way, and the output from one function is easy to feed into the input of another.

Maintained by Hadley Wickham. Last updated 7 months ago.

regular-expression strings

1.5 match 622 stars 21.97 score 164k scripts 8.2k dependents

bioc

h5vc:Managing alignment tallies using a hdf5 backend

This package contains functions to interact with tally data from NGS experiments that is stored in HDF5 files.

Maintained by Paul Theodor Pyl. Last updated 2 months ago.

curl bzip2 xz-utils zlib cpp

7.3 match 4.48 score 2 scripts

mitchelloharawild

vitae:Curriculum Vitae for R Markdown

Provides templates and functions to simplify the production and maintenance of curriculum vitae.

Maintained by Mitchell O'Hara-Wild. Last updated 9 months ago.

cv ozunconf18 resume unconf

3.0 match 1.2k stars 10.78 score 556 scripts

cran

Ease:Simulating Explicit Population Genetics Models

Implements fully customisable population genetics simulations in a simple and efficient way, considering multiple loci that have epistatic interactions. Although specifically suited to the modelling of multilocus nucleocytoplasmic systems (with both diploid and haploid loci), it can also simulate purely diploid (or purely haploid) genetic models. Numerous models can be simulated with Ease, for example models of genetic incompatibilities as presented by Marie-Orleach et al. (2022) <doi:10.1101/2022.07.25.501356>. Many others are conceivable, although few have actually been explored; Ease was developed in particular so that these kinds of models can be simulated simply.

Maintained by Ehouarn Le Faou. Last updated 2 years ago.

cpp

16.0 match 2.00 score

bodkan

slendr:A Simulation Framework for Spatiotemporal Population Genetics

A framework for simulating spatially explicit genomic data which leverages real cartographic information for programmatic and visual encoding of spatiotemporal population dynamics on real geographic landscapes. Population genetic models are then automatically executed by the 'SLiM' software by Haller et al. (2019) <doi:10.1093/molbev/msy228> behind the scenes, using a custom built-in simulation 'SLiM' script. Additionally, fully abstract spatial models not tied to a specific geographic location are supported, and users can also simulate data from standard, non-spatial, random-mating models. These can be simulated either with the 'SLiM' built-in back-end script, or using an efficient coalescent population genetics simulator 'msprime' by Baumdicker et al. (2022) <doi:10.1093/genetics/iyab229> with a custom-built 'Python' script bundled with the R package. Simulated genomic data is saved in a tree-sequence format and can be loaded, manipulated, and summarised using tree-sequence functionality via an R interface to the 'Python' module 'tskit' by Kelleher et al. (2019) <doi:10.1038/s41588-019-0483-y>. Complete model configuration, simulation and analysis pipelines can therefore be constructed without a need to leave the R environment, eliminating friction between disparate tools for population genetic simulations and data analysis.

Maintained by Martin Petr. Last updated 12 days ago.

popgen population-genetics simulations spatial-statistics

3.5 match 56 stars 9.15 score 88 scripts

jmsigner

amt:Animal Movement Tools

Manage and analyze animal movement data. The functionality of 'amt' includes methods to calculate home ranges, track statistics (e.g. step lengths, speed, or turning angles), prepare data for fitting habitat selection analyses, and simulation of space-use from fitted step-selection functions.

Maintained by Johannes Signer. Last updated 4 months ago.

3.0 match 41 stars 10.54 score 418 scripts

cran

ClusteredMutations:Location and Visualization of Clustered Somatic Mutations

Identification and visualization of groups of closely spaced mutations in the DNA sequence of a cancer genome. Highly mutated zones are searched for in the symmetric dissimilarity matrix using the anti-Robinson matrix properties. Different data sets are obtained to describe and plot information on the clustered mutations.

Maintained by David Lora. Last updated 9 years ago.

15.8 match 2.00 score

hanjunwei-lab

PMAPscore:Identify Prognosis-Related Pathways Altered by Somatic Mutation

We innovatively defined a pathway mutation accumulate perturbation score (PMAPscore) to reflect the position and the cumulative effect of the genetic mutations at the pathway level. Based on the PMAPscore of pathways, the package identifies prognosis-related pathways altered by somatic mutation and predicts immunotherapy efficacy by constructing a multiple-pathway-based risk model (Tarca, Adi Laurentiu et al (2008) <doi:10.1093/bioinformatics/btn577>).

Maintained by Junwei Han. Last updated 3 years ago.

8.5 match 3.70 score 2 scripts

steverozen

ICAMS:In-Depth Characterization and Analysis of Mutational Signatures ('ICAMS')

Analysis and visualization of experimentally elucidated mutational signatures -- the kind of analysis and visualization in Boot et al., "In-depth characterization of the cisplatin mutational signature in human cell lines and in esophageal and liver tumors", Genome Research 2018, <doi:10.1101/gr.230219.117> and "Characterization of colibactin-associated mutational signature in an Asian oral squamous cell carcinoma and in other mucosal tumor types", Genome Research 2020 <doi:10.1101/gr.255620.119>. 'ICAMS' stands for In-depth Characterization and Analysis of Mutational Signatures. 'ICAMS' has functions to read in variant call files (VCFs) and to collate the corresponding catalogs of mutational spectra and to analyze and plot catalogs of mutational spectra and signatures. Handles both "counts-based" and "density-based" (i.e. representation as mutations per megabase) mutational spectra or signatures.

Maintained by Steve Rozen. Last updated 3 years ago.

5.7 match 8 stars 5.41 score 128 scripts

bcgov

bcdata:Search and Retrieve Data from the BC Data Catalogue

Search, query, and download tabular and 'geospatial' data from the British Columbia Data Catalogue (<https://catalogue.data.gov.bc.ca/>). Search catalogue data records based on keywords, data licence, sector, data format, and B.C. government organization. View metadata directly in R, download many data formats, and query 'geospatial' data available via the B.C. government Web Feature Service ('WFS') using 'dplyr' syntax.

Maintained by Andy Teucher. Last updated 1 month ago.

bcdc citz data-science env

3.0 match 83 stars 10.29 score 186 scripts 4 dependents

pik-piam

quitte:Bits and pieces of code to use with quitte-style data frames

A collection of functions for easily dealing with quitte-style data frames, doing multi-model comparisons and plots.

Maintained by Michaja Pehl. Last updated 2 days ago.

3.8 match 8.22 score 184 scripts 35 dependents

cynkra

munch:Functions for working with the historicized list of communes of Switzerland

Contains historicized municipality data for Switzerland from 1960 onwards, from the "Historisiertes Gemeindeverzeichnis" of the Swiss Federal Statistical Office.

Maintained by Kirill Müller. Last updated 3 months ago.

5.6 match 6 stars 5.43 score 2 scripts

hojsgaard

gRbase:A Package for Graphical Modelling in R

The 'gRbase' package provides graphical modelling features used by e.g. the packages 'gRain', 'gRim' and 'gRc'. 'gRbase' implements graph algorithms including (i) maximum cardinality search (for marked and unmarked graphs), (ii) moralization, (iii) triangulation, and (iv) creation of junction trees. 'gRbase' facilitates array operations and implements functions for testing for conditional independence. 'gRbase' illustrates how hierarchical log-linear models may be implemented and describes the concept of graphical meta data. The facilities of the package are documented in the book by Højsgaard, Edwards and Lauritzen (2012, <doi:10.1007/978-1-4614-2299-0>) and in the paper by Dethlefsen and Højsgaard, (2005, <doi:10.18637/jss.v014.i17>). Please see 'citation("gRbase")' for citation details.

Maintained by Søren Højsgaard. Last updated 4 months ago.

openblas cpp

3.3 match 3 stars 9.24 score 241 scripts 20 dependents

karissawhiting

cbioportalR:Browse and Query Clinical and Genomic Data from cBioPortal

Provides R users with direct access to genomic and clinical data from the 'cBioPortal' web resource via user-friendly functions that wrap 'cBioPortal's' existing API endpoints <https://www.cbioportal.org/api/swagger-ui/index.html>. Users can browse and query genomic data on mutations, copy number alterations and fusions, as well as data on tumor mutational burden ('TMB'), microsatellite instability status ('MSI'), 'FACETS' and select clinical data points (depending on the study). See <https://www.cbioportal.org/> and Gao et al., (2013) <doi:10.1126/scisignal.2004088> for more information on the cBioPortal web resource.

Maintained by Karissa Whiting. Last updated 4 months ago.

4.5 match 21 stars 6.70 score 20 scripts

bioc

plyinteractions:Extending tidy verbs to genomic interactions

Operate on `GInteractions` objects as tabular data using `dplyr`-like verbs. The functions and methods in `plyinteractions` provide a grammatical approach to manipulate `GInteractions`, to facilitate their integration in genomic analysis workflows.

Maintained by Jacques Serizay. Last updated 5 months ago.

software infrastructure

6.4 match 4.75 score 14 scripts

jprybylski

xpose.xtras:Extra Functionality for the 'xpose' Package

Adds functionality that is currently missing from, or unlikely to be added to, the base 'xpose' package. This includes some diagnostic plots that went missing in the translation from 'xpose4', as well as useful features that genuinely extend what can be done with 'xpose'. These extensions include the concept of a set of 'xpose' objects, and diagnostics for likelihood-based models.

Maintained by John Prybylski. Last updated 4 months ago.

5.0 match 6.01 score 5 scripts

jonesor

Rcompadre:Utilities for using the 'COM(P)ADRE' Matrix Model Database

Utility functions for interacting with the 'COMPADRE' and 'COMADRE' databases of matrix population models. Described in Jones et al. (2021) <doi:10.1101/2021.04.26.441330>.

Maintained by Owen Jones. Last updated 5 months ago.

3.9 match 11 stars 7.74 score 55 scripts 2 dependents

r-lib

lintr:A 'Linter' for R Code

Checks adherence to a given style and flags syntax errors and possible semantic issues. Supports on-the-fly checking of R code edited with 'RStudio IDE', 'Emacs', 'Vim', 'Sublime Text', 'Atom' and 'Visual Studio Code'.

Maintained by Michael Chirico. Last updated 8 days ago.

linter

1.8 match 1.2k stars 17.00 score 916 scripts 33 dependents

bioc

MicrobiotaProcess:A comprehensive R package for managing and analyzing microbiome and other ecological data within the tidy framework

MicrobiotaProcess is an R package for analysis, visualization and biomarker discovery of microbial datasets. It introduces the MPSE class, which makes it more interoperable with the existing computing ecosystem. Moreover, it introduces a tidy microbiome data structure paradigm and analysis grammar. It provides a wide variety of microbiome data analysis procedures within a unified, tidy-like framework.

Maintained by Shuangbin Xu. Last updated 5 months ago.

visualization microbiome software multiplecomparison featureextraction microbiome-analysis microbiome-data

3.0 match 183 stars 9.70 score 126 scripts 1 dependents

shixiangwang

sigminer.prediction:Train and Predict Cancer Subtype with Keras Model based on Mutational Signatures

Mutational signatures represent the mutational processes that occur during cancer evolution and are therefore stable genetic features for subtyping. This tool provides functions, built on the 'keras' and 'sigminer' packages, for training neural network models that predict the subtype a sample belongs to.

Maintained by Shixiang Wang. Last updated 3 years ago.

keras mutational-signatures prostate-cancer sigminer

11.1 match 8 stars 2.60 score 2 scripts

pierreroudier

spectacles:Storing, Manipulating and Analysis Spectroscopy and Associated Data

Stores and eases the manipulation of spectra and associated data, with dedicated classes for spatial and soil-related data.

Maintained by Pierre Roudier. Last updated 2 years ago.

4.6 match 11 stars 6.17 score 45 scripts 1 dependents

bioc

tidybulk:Brings transcriptomics to the tidyverse

This is a collection of utility functions for exploring and performing calculations on RNA sequencing data in a modular, pipe-friendly and tidy fashion.

Maintained by Stefano Mangiola. Last updated 5 months ago.

assaydomain infrastructure rnaseq differentialexpression geneexpression normalization clustering qualitycontrol sequencing transcription transcriptomics bioconductor bulk-transcriptional-analyses deseq2 differential-expression edger ensembl-ids entrez gene-symbols gsea mds-dimensions pca pipe redundancy tibble tidy tidy-data tidyverse transcripts tsne

3.0 match 168 stars 9.48 score 172 scripts 1 dependents

nepem-ufsc

metan:Multi Environment Trials Analysis

Performs stability analysis of multi-environment trial data using parametric and non-parametric methods. Parametric methods include Additive Main Effects and Multiplicative Interaction (AMMI) analysis by Gauch (2013) <doi:10.2135/cropsci2013.04.0241>, Ecovalence by Wricke (1965), Genotype plus Genotype-Environment (GGE) biplot analysis by Yan & Kang (2003) <doi:10.1201/9781420040371>, geometric adaptability index by Mohammadi & Amri (2008) <doi:10.1007/s10681-007-9600-6>, joint regression analysis by Eberhart & Russell (1966) <doi:10.2135/cropsci1966.0011183X000600010011x>, genotypic confidence index by Annicchiarico (1992), Murakami & Cruz's (2004) method, power law residuals (POLAR) statistics by Doring et al. (2015) <doi:10.1016/j.fcr.2015.08.005>, scale-adjusted coefficient of variation by Doring & Reckling (2018) <doi:10.1016/j.eja.2018.06.007>, stability variance by Shukla (1972) <doi:10.1038/hdy.1972.87>, weighted average of absolute scores by Olivoto et al. (2019a) <doi:10.2134/agronj2019.03.0220>, and multi-trait stability index by Olivoto et al. (2019b) <doi:10.2134/agronj2019.03.0221>. Non-parametric methods include the superiority index by Lin & Binns (1988) <doi:10.4141/cjps88-018>, nonparametric measures of phenotypic stability by Huehn (1990) <doi:10.1007/BF00024241>, and the TOP third statistic by Fox et al. (1990) <doi:10.1007/BF00040364>. Functions for biometrical analyses such as path analysis, canonical correlation, partial correlation and clustering analysis, and tools for inspecting, manipulating, summarizing and plotting typical multi-environment trial data, are also provided.

Maintained by Tiago Olivoto. Last updated 9 days ago.

3.0 match 2 stars 9.48 score 1.3k scripts 2 dependents

davidbolin

MetricGraph:Random Fields on Metric Graphs

Facilitates creation and manipulation of metric graphs, such as street or river networks. Further facilitates operations and visualizations of data on metric graphs, and the creation of a large class of random fields and stochastic partial differential equations on such spaces. These random fields can be used for simulation, prediction and inference. In particular, linear mixed effects models including random field components can be fitted to data based on computationally efficient sparse matrix representations. Interfaces to the R packages 'INLA' and 'inlabru' are also provided, which facilitate working with Bayesian statistical models on metric graphs. The main references for the methods are Bolin, Simas and Wallin (2024) <doi:10.3150/23-BEJ1647>, Bolin, Kovacs, Kumar and Simas (2023) <doi:10.1090/mcom/3929> and Bolin, Simas and Wallin (2023) <doi:10.48550/arXiv.2304.03190> and <doi:10.48550/arXiv.2304.10372>.

Maintained by David Bolin. Last updated 6 days ago.

cpp

4.7 match 14 stars 6.06 score 275 scripts

fcampelo

MOEADr:Component-Wise MOEA/D Implementation

Modular implementation of Multiobjective Evolutionary Algorithms based on Decomposition (MOEA/D) [Zhang and Li (2007), <DOI:10.1109/TEVC.2007.892759>] for quick assembling and testing of new algorithmic components, as well as easy replication of published MOEA/D proposals. The full framework is documented in a paper published in the Journal of Statistical Software [<doi:10.18637/jss.v092.i06>].

Maintained by Felipe Campelo. Last updated 2 years ago.

moead multiobjective-optimization

4.5 match 20 stars 6.30 score 40 scripts

bioc

MesKit:A tool kit for dissecting cancer evolution from multi-region derived tumor biopsies via somatic alterations

MesKit provides commonly used analysis and visualization modules based on mutational data generated by multi-region sequencing (MRS). This package allows users to depict mutational profiles, measure heterogeneity within or between tumors from the same patient, track evolutionary dynamics, and characterize mutational patterns at different levels. A Shiny application is also provided for GUI-based analysis. As a handy tool, MesKit can facilitate the interpretation of tumor heterogeneity and the understanding of evolutionary relationships between regions in MRS studies.

Maintained by Mengni Liu. Last updated 5 months ago.

5.9 match 4.73 score 18 scripts 1 dependents

greymonroe

genemodel:Gene Model Plotting in R

Using simple input, this package creates plots of gene models. Users can create plots of alternatively spliced gene variants and the positions of mutations and other gene features.

Maintained by J Grey Monroe. Last updated 8 years ago.

6.4 match 4 stars 4.30 score 9 scripts

bupaverse

bupaR:Business Process Analysis in R

Comprehensive Business Process Analysis toolkit. Creates an S3 class for event log objects, along with related handler functions. Imports related packages for filtering event data, computing descriptive statistics, handling 'Petri Net' objects and visualizing process maps. See also packages 'edeaR', 'processmapR', 'eventdataR' and 'processmonitR'.

Maintained by Gert Janssenswillen. Last updated 2 years ago.

3.0 match 55 stars 9.07 score 389 scripts 11 dependents

bioc

GenVisR:Genomic Visualizations in R

Produce highly customizable publication quality graphics for genomic data primarily at the cohort level.

Maintained by Zachary Skidmore. Last updated 5 months ago.

infrastructure datarepresentation classification dnaseq

2.8 match 215 stars 9.87 score 76 scripts

cmmr

rbiom:Read/Write, Analyze, and Visualize 'BIOM' Data

A toolkit for working with Biological Observation Matrix ('BIOM') files. Read/write all 'BIOM' formats. Compute rarefaction, alpha diversity, and beta diversity (including 'UniFrac'). Summarize counts by taxonomic level. Subset based on metadata. Generate visualizations and statistical analyses. CPU intensive operations are coded in C for speed.

Maintained by Daniel P. Smith. Last updated 6 days ago.

3.0 match 15 stars 9.02 score 117 scripts 6 dependents

ysosirius

windfarmGA:Genetic Algorithm for Wind Farm Layout Optimization

The genetic algorithm is designed to optimize wind farms of any shape. It requires a predefined number of turbines, a uniform rotor radius and an average wind speed value for each incoming wind direction. A terrain effect model can be included that downloads an 'SRTM' elevation model and loads a Corine Land Cover raster to approximate surface roughness.

Maintained by Sebastian Gatscha. Last updated 2 months ago.

windfarm-layout optimization genetic-algorithm renewable-energy cpp

5.3 match 27 stars 5.06 score 17 scripts

business-science

timetk:A Tool Kit for Working with Time Series

Easy visualization, wrangling, and feature engineering of time series data for forecasting and machine learning prediction. Consolidates and extends time series functionality from packages including 'dplyr', 'stats', 'xts', 'forecast', 'slider', 'padr', 'recipes', and 'rsample'.

Maintained by Matt Dancho. Last updated 1 year ago.

coercion coercion-functions data-mining dplyr forecast forecasting forecasting-models machine-learning series-decomposition series-signature tibble tidy tidyquant tidyverse time time-series timeseries

1.9 match 625 stars 14.15 score 4.0k scripts 16 dependents

adibender

pammtools:Piece-Wise Exponential Additive Mixed Modeling Tools for Survival Analysis

The Piece-wise exponential (Additive Mixed) Model (PAMM; Bender and others (2018) <doi:10.1177/1471082X17748083>) is a powerful model class for the analysis of survival (or time-to-event) data, based on Generalized Additive (Mixed) Models (GA(M)Ms). It offers intuitive specification and robust estimation of complex survival models with stratified baseline hazards, random effects, time-varying effects, time-dependent covariates and cumulative effects (Bender and others (2019)), as well as support for left-truncated, competing risks and recurrent events data. pammtools provides a tidy workflow for survival analysis with PAMMs, including data simulation, transformation and other functions for data preprocessing and model post-processing, as well as visualization.

Maintained by Andreas Bender. Last updated 2 months ago.

additive-models pamm pammtools piece-wise-exponential survival-analysis

3.0 match 48 stars 8.78 score 310 scripts 8 dependents

bioc

OncoSimulR:Forward Genetic Simulation of Cancer Progression with Epistasis

Functions for forward population genetic simulation in asexual populations, with special focus on cancer progression. Fitness can be an arbitrary function of genetic interactions between multiple genes or modules of genes, including epistasis, order restrictions in mutation accumulation, and order effects. Fitness (including just birth, just death, or both birth and death) can also be a function of the relative and absolute frequencies of other genotypes (i.e., frequency-dependent fitness). Mutation rates can differ between genes, and we can include mutator/antimutator genes (to model mutator phenotypes). Simulating multi-species scenarios and therapeutic interventions, including adaptive therapy, is also possible. Simulations use continuous-time models and can include driver and passenger genes and modules. Also included are functions for: simulating random DAGs of the type found in Oncogenetic Trees, Conjunctive Bayesian Networks, and other cancer progression models; plotting and sampling from single or multiple realizations of the simulations, including single-cell sampling; plotting the parent-child relationships of the clones; generating random fitness landscapes (Rough Mount Fuji, House of Cards, additive, NK, Ising, and Eggbox models) and plotting them.

Maintained by Ramon Diaz-Uriarte. Last updated 11 days ago.

biologicalquestion somaticmutation cpp

4.3 match 7 stars 6.06 score 68 scripts

r-tidy-remote-sensing

tidyrgee:'tidyverse' Methods for 'Earth Engine'

Provides 'tidyverse' methods for wrangling and analyzing 'Earth Engine' <https://earthengine.google.com/> data. These methods help the user with filtering, joining and summarising 'Earth Engine' image collections.

Maintained by Zack Arno. Last updated 2 years ago.

4.7 match 48 stars 5.53 score 140 scripts

rozen-lab

mSigTools:Mutational Signature Analysis Tools

Utility functions for mutational signature analysis as described in Alexandrov, L. B. (2020) <doi:10.1038/s41586-020-1943-3>. This package provides two groups of functions. One is for dealing with mutational signature "exposures" (i.e. the counts of mutations in a sample that are due to each mutational signature). The other group of functions is for matching or comparing sets of mutational signatures. 'mSigTools' stands for mutational Signature analysis Tools.

Maintained by Steven Rozen. Last updated 2 years ago.

8.5 match 2 stars 3.00 score 9 scripts

sapfluxnet

sapfluxnetr:Working with 'Sapfluxnet' Project Data

Access, modify, aggregate and plot data from the 'Sapfluxnet' project (<http://sapfluxnet.creaf.cat>), the first global database of sap flow measurements.

Maintained by Victor Granda. Last updated 2 years ago.

3.9 match 25 stars 6.57 score 49 scripts

langendorfr

netcom:NETwork COMparison Inference

Infer system functioning with empirical NETwork COMparisons. These methods are part of a growing paradigm in network science that uses relative comparisons of networks to infer mechanistic classifications and predict systemic interventions. They have been developed and applied in Langendorf and Burgess (2021) <doi:10.1038/s41598-021-99251-7>, Langendorf (2020) <doi:10.1201/9781351190831-6>, and Langendorf and Goldberg (2019) <arXiv:1912.12551>.

Maintained by Ryan Langendorf. Last updated 8 months ago.

5.6 match 5 stars 4.46 score 115 scripts

matthewwolak

nadiv:(Non)Additive Genetic Relatedness Matrices

Constructs (non)additive genetic relationship matrices, and their inverses, from a pedigree to be used in linear mixed effect models (A.K.A. the 'animal model'). Also includes other functions to facilitate the use of animal models. Some functions have been created to be used in conjunction with the R package 'asreml' for the 'ASReml' software, which can be obtained upon purchase from 'VSN' international (<https://vsni.co.uk/software/asreml>).

Maintained by Matthew Wolak. Last updated 10 months ago.

cpp

3.5 match 20 stars 7.13 score 151 scripts 3 dependents

bioc

plyxp:Data masks for SummarizedExperiment enabling dplyr-like manipulation

The package provides `rlang` data masks for the SummarizedExperiment class. This enables the evaluation of unquoted expressions in different contexts of the SummarizedExperiment object, with optional access to other contexts. The goal of `plyxp` is for evaluation to feel like working with a data.frame object without ever needing to unwind to a rectangular data.frame.

Maintained by Justin Landis. Last updated 5 months ago.

annotation genomeannotation transcriptomics

5.0 match 4 stars 4.81 score 6 scripts

thibautjombart

adegenet:Exploratory Analysis of Genetic and Genomic Data

Toolset for the exploration of genetic and genomic data. Adegenet provides formal (S4) classes for storing and handling various genetic data, including genetic markers with varying ploidy and hierarchical population structure ('genind' class), allele counts by population ('genpop'), and genome-wide SNP data ('genlight'). It also implements original multivariate methods (DAPC, sPCA), graphics, statistical tests, simulation tools, distance and similarity measures, and several spatial methods. A range of both empirical and simulated datasets is also provided to illustrate various methods.

Maintained by Zhian N. Kamvar. Last updated 1 months ago.

1.9 match 182 stars 12.60 score 1.9k scripts 29 dependents

yulab-smu

ggfun:Miscellaneous Functions for 'ggplot2'

Useful functions and utilities for 'ggplot' objects (e.g., geometric layers, themes, and utilities to edit the object).

Maintained by Guangchuang Yu. Last updated 2 months ago.

2.3 match 18 stars 10.41 score 58 scripts 151 dependents

hopkinsidd

phylosamp:Sample Size Calculations for Molecular and Phylogenetic Studies

Implements novel tools for estimating sample sizes needed for phylogenetic studies, including studies focused on estimating the probability of true pathogen transmission between two cases given phylogenetic linkage and studies focused on tracking pathogen variants at a population level. Methods are described in Wohl, Giles, and Lessler (2021) and in Wohl, Lee, DiPrete, and Lessler (2023).

Maintained by Justin Lessler. Last updated 2 years ago.

phylogenetics sampling

3.5 match 12 stars 6.65 score 25 scripts

somalogic

SomaDataIO:Input/Output 'SomaScan' Data

Load and export 'SomaScan' data via the 'Standard BioTools, Inc.' structured text file called an ADAT ('*.adat'). For file format see <https://github.com/SomaLogic/SomaLogic-Data/blob/main/README.md>. The package also exports auxiliary functions for manipulating, wrangling, and extracting relevant information from an ADAT object once in memory.

Maintained by Caleb Scheidel. Last updated 1 months ago.

adat proteomics proteomics-data-analysis somascan

3.0 match 26 stars 7.71 score 132 scripts

molgenis

dsTidyverseClient:'DataSHIELD' 'Tidyverse' Clientside Package

Implementation of selected 'Tidyverse' functions within 'DataSHIELD', an open-source federated analysis solution in R. Currently, 'DataSHIELD' contains very limited tools for data manipulation, so the aim of this package is to improve the researcher experience by implementing essential functions for data manipulation, including subsetting, filtering, grouping, and renaming variables. This is the clientside package, which should be installed locally and is used in conjunction with the serverside package 'dsTidyverse', which is installed on the remote server holding the data. For more information, see <https://www.tidyverse.org/>, <https://datashield.org/> and <https://github.com/molgenis/ds-tidyverse>.

Maintained by Tim Cadman. Last updated 17 days ago.

4.3 match 1 stars 5.43 score 2 scripts

reconverse

incidence2:Compute, Handle and Plot Incidence of Dated Events

Provides functions and classes to compute, handle and visualise incidence from dated events for a defined time interval. Dates can be provided in various standard formats. The class 'incidence2' is used to store computed incidence and can be easily manipulated, subsetted, and plotted.

Maintained by Tim Taylor. Last updated 5 days ago.

3.0 match 17 stars 7.67 score 104 scripts 1 dependents

asgr

imager:Image Processing Library Based on 'CImg'

Fast image processing for images in up to 4 dimensions (two spatial dimensions, one time/depth dimension, one colour dimension). Provides most traditional image processing tools (filtering, morphology, transformations, etc.) as well as various functions for easily analysing image data using R. The package wraps 'CImg', <http://cimg.eu>, a simple, modern C++ library for image processing.

Maintained by Aaron Robotham. Last updated 26 days ago.

libx11 fftw3 tiff cpp openmp

1.7 match 17 stars 13.62 score 2.4k scripts 45 dependents

bioc

ELMER:Inferring Regulatory Element Landscapes and Transcription Factor Networks Using Cancer Methylomes

ELMER is designed to use DNA methylation and gene expression data from a large number of samples to infer the regulatory element landscape and transcription factor networks in primary tissue.

Maintained by Tiago Chedraoui Silva. Last updated 5 months ago.

dnamethylation geneexpression motifannotation software generegulation transcription network

3.1 match 7.42 score 176 scripts

bioc

safe:Significance Analysis of Function and Expression

SAFE is a resampling-based method for testing functional categories in gene expression experiments. SAFE can be applied to 2-sample and multi-class comparisons, or simple linear regressions. Other experimental designs can also be accommodated through user-defined functions.

Maintained by Ludwig Geistlinger. Last updated 5 months ago.

differentialexpression pathways genesetenrichment statisticalmethod software

4.0 match 5.60 score 32 scripts 5 dependents

bioc

VERSO:Viral Evolution ReconStructiOn (VERSO)

Mutations that rapidly accumulate in viral genomes during a pandemic can be used to track the evolution of the virus and, accordingly, unravel the viral infection network. To this end, sequencing samples of the virus can be employed to estimate models from genomic epidemiology and may serve, for instance, to estimate the proportion of undetected infected people by uncovering cryptic transmissions, as well as to predict likely trends in the number of infected, hospitalized, dead and recovered people. VERSO is an algorithmic framework that processes variant profiles from viral samples to produce phylogenetic models of viral evolution. The approach solves a Boolean Matrix Factorization problem with phylogenetic constraints by maximizing a log-likelihood function. VERSO includes two separate and subsequent steps; in this package we provide an R implementation of VERSO STEP 1.

Maintained by Davide Maspero. Last updated 5 months ago.

biomedicalinformatics sequencing somaticmutation

3.7 match 7 stars 6.05 score

momx

Momocs:Morphometrics using R

The goal of 'Momocs' is to provide a complete, convenient, reproducible and open-source toolkit for 2D morphometrics. It includes most common 2D morphometrics approaches on outlines, open outlines, configurations of landmarks, traditional morphometrics, and facilities for data preparation, manipulation and visualization with a consistent grammar throughout. It allows reproducible, complex morphometrics analyses, and other morphometrics approaches should be easy to plug in, or develop, on top of this canvas.

Maintained by Vincent Bonhomme. Last updated 1 year ago.

morphometrics

3.0 match 51 stars 7.42 score 346 scripts

bioc

seq.hotSPOT:Targeted sequencing panel design based on mutation hotspots

seq.hotSPOT provides a resource for designing effective sequencing panels to help improve mutation capture efficacy for ultradeep sequencing projects. Using SNV datasets, this package designs custom panels for any tissue of interest and identifies the genomic regions likely to contain the most mutations. Establishing efficient targeted sequencing panels can allow researchers to study mutation burden in tissues at high depth without the economic burden of whole-exome or whole-genome sequencing. This tool was developed to make high-depth sequencing panels to study low-frequency clonal mutations in clinically normal and cancerous tissues.

Maintained by Sydney Grant. Last updated 5 months ago.

software technology sequencing dnaseq wholegenome

5.6 match 4.00 score 3 scripts

thomasp85

particles:A Graph Based Particle Simulator Based on D3-Force

Simulating particle movement in 2D space has many applications. The 'particles' package implements a particle simulator based on the ideas behind the 'd3-force' 'JavaScript' library. 'particles' implements all forces defined in 'd3-force' as well as others such as vector fields, traps, and attractors.

Maintained by Thomas Lin Pedersen. Last updated 3 months ago.

d3js graph-layout network network-visualization particles simulation cpp

3.0 match 119 stars 7.19 score 43 scripts

strohne

volker:High-Level Functions for Tabulating, Charting and Reporting Survey Data

Craft polished tables and plots in Markdown reports. Simply choose whether to treat your data as counts or metrics, and the package will automatically generate well-designed default tables and plots for you. Boiled down to the basics, with labeling features and simple interactive reports. All functions are 'tidyverse' compatible.

Maintained by Jakob Jünger. Last updated 2 days ago.

3.0 match 5 stars 7.16 score 125 scripts

fawda123

rStrava:Access the 'Strava' API

Functions to access data from the 'Strava v3 API' <https://developers.strava.com/>.

Maintained by Marcus W. Beck. Last updated 5 months ago.

3.0 match 155 stars 7.15 score 57 scripts

statisfactions

simpr:Flexible 'Tidyverse'-Friendly Simulations

A general, 'tidyverse'-friendly framework for simulation studies, design analysis, and power analysis. Specify data generation, define varying parameters, generate data, fit models, and tidy model results in a single pipeline, without needing loops or custom functions.

Maintained by Ethan Brown. Last updated 8 months ago.

3.0 match 43 stars 6.89 score 30 scripts

ycroissant

dfidx:Indexed Data Frames

Provides extended data frames with a special data frame column that contains two indexes, potentially with a nesting structure.

Maintained by Yves Croissant. Last updated 7 months ago.

3.0 match 2 stars 6.85 score 44 scripts 18 dependents

bioc

proActiv:Estimate Promoter Activity from RNA-Seq data

Most human genes have multiple promoters that control the expression of different isoforms. The use of these alternative promoters enables the regulation of isoform expression pre-transcriptionally. Alternative promoters have been found to be important in a wide range of cell types and diseases. proActiv is an R package that enables the analysis of promoters from RNA-seq data. proActiv uses aligned reads as input, and generates counts and normalized promoter activity estimates for each annotated promoter. In particular, proActiv accepts junction files from TopHat2 or STAR, or BAM files, as input. These estimates can then be used to identify which promoter is active, which promoter is inactive, and which promoters change their activity across conditions. proActiv also allows visualization of promoter activity across conditions.

Maintained by Joseph Lee. Last updated 5 months ago.

rnaseq geneexpression transcription alternativesplicing generegulation differentialsplicing functionalgenomics epigenetics transcriptomics preprocessing alternative-promoters genomics promoter-activity promoter-annotation rna-seq-data

3.0 match 51 stars 6.66 score 15 scripts

bioc

survtype:Subtype Identification with Survival Data

Subtypes are defined as groups of samples that have distinct molecular and clinical features. Genomic data can be analyzed to discover patient subtypes associated with clinical data, especially survival information. This package aims to identify subtypes that are both clinically relevant and biologically meaningful.

Maintained by Dongmin Jung. Last updated 5 months ago.

software statisticalmethod geneexpression survival clustering sequencing coverage

5.0 match 4.00 score 3 scripts

craddm

eegUtils:Utilities for Electroencephalographic (EEG) Analysis

Electroencephalography data processing and visualization tools. Includes import functions for 'BioSemi' (.BDF), 'Neuroscan' (.CNT), 'Brain Vision Analyzer' (.VHDR), 'EEGLAB' (.set) and 'Fieldtrip' (.mat). Many preprocessing functions such as referencing, epoching, filtering, and ICA are available. There are a variety of visualizations possible, including timecourse and topographical plotting.

Maintained by Matt Craddock. Last updated 5 months ago.

eeg eeg-analysis eeg-data eeg-signals eeg-signals-processing openblas cpp openmp

3.0 match 106 stars 6.54 score 82 scripts

buriom

denoiSeq:Differential Expression Analysis Using a Bottom-Up Model

Given count data from two conditions, it determines which transcripts are differentially expressed across the two conditions using Bayesian inference of the parameters of a bottom-up model for PCR amplification. This model is developed in Ndifon Wilfred, Hilah Gal, Eric Shifrut, Rina Aharoni, Nissan Yissachar, Nir Waysbort, Shlomit Reich Zeliger, Ruth Arnon, and Nir Friedman (2012), <http://www.pnas.org/content/109/39/15865.full>, and results in a distribution for the counts that is a superposition of the binomial and negative binomial distribution.

Maintained by Gershom Buri. Last updated 7 years ago.

5.3 match 3.70 score 10 scripts

liamrevell

learnPopGen:Population Genetic Simulations & Numerical Analysis

Conducts various numerical analyses and simulations in population genetics and evolutionary theory, primarily for the purpose of teaching (and learning about) key concepts in population & quantitative genetics, and evolutionary theory.

Maintained by Liam J. Revell. Last updated 2 years ago.

4.0 match 26 stars 4.82 score 51 scripts

stocnet

manynet:Many Ways to Make, Modify, Map, Mark, and Measure Myriad Networks

Many tools for making, modifying, mapping, marking, measuring, and motifs and memberships of many different types of networks. All functions operate with matrices, edge lists, and 'igraph', 'network', and 'tidygraph' objects, and on one-mode, two-mode (bipartite), and sometimes three-mode networks. The package includes functions for importing and exporting, creating and generating networks, modifying networks and node and tie attributes, and describing and visualizing networks with sensible defaults.

Maintained by James Hollway. Last updated 3 months ago.

diffusion-models graphs network-analysis

3.0 match 13 stars 6.41 score 35 scripts 1 dependents

bioc

DNABarcodes:A tool for creating and analysing DNA barcodes used in Next Generation Sequencing multiplexing experiments

The package offers a function to create DNA barcode sets capable of correcting insertion, deletion, and substitution errors. Existing barcodes can be analysed regarding their minimal, maximal and average distances between barcodes. Finally, reads that start with a (possibly mutated) barcode can be demultiplexed, i.e., assigned to their original reference barcode.

Maintained by Tilo Buschmann. Last updated 5 months ago.

preprocessing sequencing cpp openmp

4.3 match 4.51 score 27 scripts

qile0317

FastUtils:Fast, Readable Utility Functions

A wide variety of tools for general data analysis, wrangling, spelling, statistics, visualizations, package development, and more. All functions have vectorized implementations whenever possible. Exported names are designed to be readable, with longer names possessing short aliases.

Maintained by Qile Yang. Last updated 4 months ago.

scientific-computing utilities utility cpp

3.9 match 2 stars 4.95 score 2 scripts

mattheaphy

actxps:Create Actuarial Experience Studies: Prepare Data, Summarize Results, and Create Reports

Experience studies are used by actuaries to explore historical experience across blocks of business and to inform assumption setting activities. This package provides functions for preparing data, creating studies, visualizing results, and beginning assumption development. Experience study methods, including exposure calculations, are described in: Atkinson & McGarry (2016) "Experience Study Calculations" <https://www.soa.org/49378a/globalassets/assets/files/research/experience-study-calculations.pdf>. The limited fluctuation credibility method used by the 'exp_stats()' function is described in: Herzog (1999, ISBN:1-56698-374-6) "Introduction to Credibility Theory".

Maintained by Matt Heaphy. Last updated 2 months ago.

3.0 match 14 stars 6.38 score 23 scripts

henningte

ir:Functions to Handle and Preprocess Infrared Spectra

Functions to import and handle infrared spectra (import from '.csv' and Thermo Galactic's '.spc', baseline correction, binning, clipping, interpolating, smoothing, averaging, adding, subtracting, dividing, multiplying, plotting).

Maintained by Henning Teickner. Last updated 3 years ago.

chemometrics infrared infrared-spectra ir-package mid-infrared-spectra spectroscopy

3.6 match 6 stars 5.32 score 35 scripts

hope-data-science

tidyfst:Tidy Verbs for Fast Data Manipulation

A toolkit of tidy data manipulation verbs with 'data.table' as the backend. Combining the elegant syntax of 'dplyr' with the computing performance of 'data.table', 'tidyfst' aims to give users state-of-the-art data manipulation tools with minimal effort. This package is an extension of 'data.table'. While offering a tidy syntax, it also wraps combinations of efficient functions to facilitate frequently used data operations.

Maintained by Tian-Yuan Huang. Last updated 6 months ago.

1.9 match 98 stars 10.09 score 118 scripts 4 dependents

reimand0

ActiveDriver:Finding Cancer Driver Proteins with Enriched Mutations in Post-Translational Modification Sites

A mutation analysis tool that discovers cancer driver genes with frequent mutations in protein signalling sites such as post-translational modifications (phosphorylation, ubiquitination, etc.). The Poisson generalised linear regression model identifies genes where cancer mutations in signalling sites are more frequent than expected from the sequence of the entire gene. Integrating mutations with signalling information helps find new driver genes and propose candidate mechanisms for known drivers. Reference: Systematic analysis of somatic mutations in phosphorylation signaling predicts novel cancer drivers. Juri Reimand and Gary D Bader. Molecular Systems Biology (2013) 9:637 <doi:10.1038/msb.2012.68>.

Maintained by Juri Reimand. Last updated 8 years ago.

9.4 match 2.00 score 6 scripts

hope-data-science

tidyft:Fast and Memory Efficient Data Operations in Tidy Syntax

Tidy syntax for 'data.table', using modification by reference whenever possible. This toolkit is designed for big data analysis on high-performance desktop or laptop computers. The syntax of the package is similar or identical to that of the 'tidyverse'. It is user-friendly, memory efficient and time saving. For more information, see its ancestor package 'tidyfst'.

Maintained by Tian-Yuan Huang. Last updated 6 months ago.

3.0 match 35 stars 6.25 score 34 scripts

kbhoehn

dowser:B Cell Receptor Phylogenetics Toolkit

Provides a set of functions for inferring, visualizing, and analyzing B cell phylogenetic trees. Provides methods to 1) reconstruct unmutated ancestral sequences, 2) build B cell phylogenetic trees using multiple methods, 3) visualize trees with metadata at the tips, 4) reconstruct intermediate sequences, and 5) detect biased ancestor-descendant relationships among metadata types. Workflow examples are available at the documentation site (see URL). Citations: Hoehn et al (2022) <doi:10.1371/journal.pcbi.1009885>, Hoehn et al (2021) <doi:10.1101/2021.01.06.425648>.

Maintained by Kenneth Hoehn. Last updated 2 months ago.

2.8 match 6.81 score 84 scripts

poissonconsulting

mcmcdata:Manipulate MCMC Samples and Data Frames

Manipulates Markov chain Monte Carlo (MCMC) samples and associated data frames.

Maintained by Joe Thorley. Last updated 2 months ago.

5.3 match 1 stars 3.56 score 4 scripts 4 dependents

r-lib

slider:Sliding Window Functions

Provides type-stable rolling window functions over any R data type. Cumulative and expanding windows are also supported. For more advanced usage, an index can be used as a secondary vector that defines how sliding windows are to be created.

Maintained by Davis Vaughan. Last updated 1 month ago.

1.3 match 302 stars 13.92 score 848 scripts 99 dependents

mhahsler

rMSA:Interface for Popular Multiple Sequence Alignment Tools

Seamlessly interfaces with the Multiple Sequence Alignment software packages ClustalW, MAFFT, MUSCLE and Kalign (which must be downloaded separately) and provides support to calculate distances between sequences. This work was partially supported by grant no. R21HG005912 from the National Human Genome Research Institute.

Maintained by Michael Hahsler. Last updated 10 months ago.

genetics sequencing infrastructure alignment bioinformatics sequence-alignment

4.9 match 12 stars 3.78 score 7 scripts

bioc

scoup:Simulate Codons with Darwinian Selection Modelled as an OU Process

An elaborate molecular evolutionary framework that facilitates straightforward simulation of codon genetic sequences subjected to different degrees and/or patterns of Darwinian selection. The model is built upon the fitness landscape paradigm of Sewall Wright, as popularised by the mutation-selection model of Halpern and Bruno. This allows realistic evolutionary processes of living organisms to be reproduced seamlessly. For example, an Ornstein-Uhlenbeck fitness update algorithm is incorporated. Consequently, otherwise complex biological processes, such as the effect of the interplay between genetic drift and fitness landscape fluctuations on the inference of diversifying selection, can now be investigated with minimal effort. Frequency-dependent and stochastic fitness landscape update techniques are available.

Maintained by Hassan Sadiq. Last updated 2 months ago.

alignment classification comparativegenomics dataimport genetics mathematicalbiology researchfield sequencing sequencematching software statisticalmethod workflowstep

4.0 match 4.60 score 8 scripts

bioc

TCGAutils:TCGA utility functions for data management

A suite of helper functions for checking and manipulating TCGA data, including data obtained from the curatedTCGAData experiment package. These functions aim to simplify working with TCGA data and make it more manageable. Exported functions include those that import data from flat files into Bioconductor objects, convert row annotations, and translate identifiers via the GDC API.

Maintained by Marcel Ramos. Last updated 3 months ago.

software workflowstep preprocessing dataimport bioconductor-package tcga u24ca289073 utilities

1.9 match 26 stars 9.68 score 210 scripts 10 dependents