R-universe search: nucleotides

bioc

Modstrings:Working with modified nucleotide sequences

Representing nucleotide modifications in a nucleotide sequence is usually done via special characters from a number of sources. This is challenging to work with in R and the Biostrings package. The Modstrings package implements this functionality for RNA and DNA sequences containing modified nucleotides by translating the characters internally in order to work with the infrastructure of the Biostrings package. For this, the ModRNAString and ModDNAString classes, their derivatives, and functions to construct and modify these objects despite the encoding issues are implemented. In addition, the conversion from sequences to list-like location information (and the reverse operation) is implemented as well.

Maintained by Felix G.M. Ernst. Last updated 5 months ago.

dataimport datarepresentation infrastructure sequencing software bioconductor biostrings dna dna-modifications modified-nucleotides nucleotides rna rna-modification-alphabet rna-modifications sequences

33.7 match 1 stars 6.64 score 5 scripts 8 dependents

snoweye

phyclust:Phylogenetic Clustering (Phyloclustering)

Phylogenetic clustering (phyloclustering) is an evolutionary Continuous Time Markov Chain model-based approach to identify population structure from molecular data without assuming linkage equilibrium. The package phyclust (Chen 2011) provides a convenient implementation of phyloclustering for DNA and SNP data, capable of clustering individuals into subpopulations and identifying molecular sequences representative of those subpopulations. It is designed in C for performance, interfaced with R for visualization, and incorporates other popular open source programs including ms (Hudson 2002) <doi:10.1093/bioinformatics/18.2.337>, seq-gen (Rambaut and Grassly 1997) <doi:10.1093/bioinformatics/13.3.235>, Hap-Clustering (Tzeng 2005) <doi:10.1002/gepi.20063> and PAML baseml (Yang 1997, 2007) <doi:10.1093/bioinformatics/13.5.555>, <doi:10.1093/molbev/msm088>, for simulating data, additional analyses, and searching for the best tree. See the phyclust website for more information, documentation and examples.

Maintained by Wei-Chen Chen. Last updated 2 years ago.

17.1 match 9 stars 8.45 score 126 scripts 8 dependents

bioc

ginmappeR:Gene Identifier Mapper

Provides functionalities to translate gene or protein identifiers between state-of-art biological databases: CARD (<https://card.mcmaster.ca/>), NCBI Protein, Nucleotide and Gene (<https://www.ncbi.nlm.nih.gov/>), UniProt (<https://www.uniprot.org/>) and KEGG (<https://www.kegg.jp>). Also offers complementary functionality like NCBI identical proteins or UniProt similar genes clusters retrieval.

Maintained by Fernando Sola. Last updated 3 months ago.

annotation kegg genetics thirdpartyclient software

17.5 match 4.88 score 7 scripts

simonlabcode

bakR:Analyze and Compare Nucleotide Recoding RNA Sequencing Datasets

Several implementations of a novel Bayesian hierarchical statistical model of nucleotide recoding RNA-seq experiments (NR-seq; TimeLapse-seq, SLAM-seq, TUC-seq, etc.) for analyzing and comparing NR-seq datasets (see 'Vock and Simon' (2023) <doi:10.1261/rna.079451.122>). NR-seq is a powerful extension of RNA-seq that provides information about the kinetics of RNA metabolism (e.g., RNA degradation rate constants), which is notably lacking in standard RNA-seq data. The statistical model makes maximal use of these high-throughput datasets by sharing information across transcripts to significantly improve uncertainty quantification and increase statistical power. 'bakR' includes a maximally efficient implementation of this model for conservative initial investigations of datasets. 'bakR' also provides more highly powered implementations using the probabilistic programming language 'Stan' to sample from the full posterior distribution. 'bakR' performs multiple-test adjusted statistical inference with the output of these model implementations to help biologists separate signal from background. Methods to automatically visualize key results and detect batch effects are also provided.

Maintained by Isaac Vock. Last updated 4 months ago.

cpp

13.8 match 6 stars 6.12 score 21 scripts

dsstoffer

astsa:Applied Statistical Time Series Analysis

Contains data sets and scripts for analyzing time series in both the frequency and time domains including state space modeling as well as supporting the texts Time Series Analysis and Its Applications: With R Examples (5th ed), by R.H. Shumway and D.S. Stoffer. Springer Texts in Statistics, 2025, <https://link.springer.com/book/9783031705830>, and Time Series: A Data Analysis Approach Using R. Chapman-Hall, 2019, <DOI:10.1201/9780429273285>.

Maintained by David Stoffer. Last updated 2 months ago.

10.6 match 7 stars 7.88 score 2.2k scripts 8 dependents

sareameri

ftrCOOL:Feature Extraction from Biological Sequences

Extracts features from biological sequences. It contains most features which are presented in related work and also includes features which have never been introduced before. It extracts numerous features from nucleotide and peptide sequences. Each feature converts the input sequences to discrete numbers in order to use them as predictors in machine learning models. There are many features and information which are hidden inside a sequence. Utilizing the package, users can convert biological sequences to discrete models based on chosen properties. References: 'iLearn' 'Z. Chen et al.' (2019) <DOI:10.1093/bib/bbz041>. 'iFeature' 'Z. Chen et al.' (2018) <DOI:10.1093/bioinformatics/bty140>. <https://CRAN.R-project.org/package=rDNAse>. 'PseKRAAC' 'Y. Zuo et al.' 'PseKRAAC: a flexible web server for generating pseudo K-tuple reduced amino acids composition' (2017) <DOI:10.1093/bioinformatics/btw564>. 'iDNA6mA-PseKNC' 'P. Feng et al.' 'iDNA6mA-PseKNC: Identifying DNA N6-methyladenosine sites by incorporating nucleotide physicochemical properties into PseKNC' (2019) <DOI:10.1016/j.ygeno.2018.01.005>. 'I. Dubchak et al.' 'Prediction of protein folding class using global description of amino acid sequence' (1995) <DOI:10.1073/pnas.92.19.8700>. 'W. Chen et al.' 'Identification and analysis of the N6-methyladenosine in the Saccharomyces cerevisiae transcriptome' (2015) <DOI:10.1038/srep13859>.

Maintained by Sare Amerifar. Last updated 3 years ago.

31.4 match 2 stars 2.26 score 1 scripts 3 dependents

thibautjombart

adegenet:Exploratory Analysis of Genetic and Genomic Data

Toolset for the exploration of genetic and genomic data. Adegenet provides formal (S4) classes for storing and handling various genetic data, including genetic markers with varying ploidy and hierarchical population structure ('genind' class), alleles counts by populations ('genpop'), and genome-wide SNP data ('genlight'). It also implements original multivariate methods (DAPC, sPCA), graphics, statistical tests, simulation tools, distance and similarity measures, and several spatial methods. A range of both empirical and simulated datasets is also provided to illustrate various methods.

Maintained by Zhian N. Kamvar. Last updated 1 months ago.

5.3 match 182 stars 12.60 score 1.9k scripts 29 dependents

bioc

ShortRead:FASTQ input and manipulation

This package implements sampling, iteration, and input of FASTQ files. The package includes functions for filtering and trimming reads, and for generating a quality assessment report. Data are represented as DNAStringSet-derived objects, and easily manipulated for a diversity of purposes. The package also contains legacy support for early single-end, ungapped alignment formats.

Maintained by Bioconductor Package Maintainer. Last updated 5 months ago.

dataimport sequencing qualitycontrol bioconductor-package core-package zlib cpp

5.2 match 8 stars 12.08 score 1.8k scripts 49 dependents

sjmack

HLAtools:Toolkit for HLA Immunogenomics

A toolkit for the analysis and management of data for genes in the so-called "Human Leukocyte Antigen" (HLA) region. Functions extract reference data from the Anthony Nolan HLA Informatics Group/ImmunoGeneTics HLA 'GitHub' repository (ANHIG/IMGTHLA) <https://github.com/ANHIG/IMGTHLA>, validate Genotype List (GL) Strings, convert between UNIFORMAT and GL String Code (GLSC) formats, translate HLA alleles and GLSCs across ImmunoPolymorphism Database (IPD) IMGT/HLA Database release versions, identify differences between pairs of alleles at a locus, generate customized, multi-position sequence alignments, trim and convert allele-names across nomenclature epochs, and extend existing data-analysis methods.

Maintained by Steven Mack. Last updated 13 days ago.

9.6 match 4 stars 6.21 score 7 scripts 1 dependents

boopsboops

spider:Species Identity and Evolution in R

Analysis of species limits and DNA barcoding data. Included are functions for generating important summary statistics from DNA barcode data, assessing specimen identification efficacy, testing and optimizing divergence threshold limits, assessment of diagnostic nucleotides, and calculation of the probability of reciprocal monophyly. Additionally, a sliding window function offers opportunities to analyse information across a gene, often used for marker design in degraded DNA studies. Further information on the package has been published in Brown et al (2012) <doi:10.1111/j.1755-0998.2011.03108.x>.

Maintained by Rupert A. Collins. Last updated 6 years ago.

dna-barcode edna evolution species-delimitation species-identity

9.8 match 2 stars 5.20 score 66 scripts 1 dependents

bioc

motifbreakR:A Package For Predicting The Disruptiveness Of Single Nucleotide Polymorphisms On Transcription Factor Binding Sites

We introduce motifbreakR, which allows the biologist to judge in the first place whether the sequence surrounding the polymorphism is a good match, and in the second place how much information is gained or lost in one allele of the polymorphism relative to another. MotifbreakR is both flexible and extensible over previous offerings; giving a choice of algorithms for interrogation of genomes with motifs from public sources that users can choose from; these are 1) a weighted-sum probability matrix, 2) log-probabilities, and 3) weighted by relative entropy. MotifbreakR can predict effects for novel or previously described variants in public databases, making it suitable for tasks beyond the scope of its original design. Lastly, it can be used to interrogate any genome curated within Bioconductor (currently there are 32 species, a total of 109 versions).

Maintained by Simon Gert Coetzee. Last updated 5 months ago.

chipseq visualization motifannotation transcription

5.6 match 28 stars 8.96 score 103 scripts

bioc

seqTools:Analysis of nucleotide, sequence and quality content on fastq files

Analyze read length, Phred scores, alphabet frequency and DNA k-mers on uncompressed and compressed fastq files.

Maintained by Wolfgang Kaisers. Last updated 5 months ago.

qualitycontrol sequencing zlib

8.9 match 5.57 score 52 scripts 1 dependents
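
The DNA k-mer counting mentioned in the seqTools entry can be illustrated with a minimal sketch. This is a generic Python illustration, not the seqTools API; the function name `count_kmers` is hypothetical.

```python
from collections import Counter


def count_kmers(sequence: str, k: int) -> Counter:
    """Count every overlapping k-mer in a DNA sequence (hypothetical helper)."""
    if k <= 0:
        raise ValueError("k must be positive")
    seq = sequence.upper()
    # Slide a window of width k across the sequence, one base at a time.
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


if __name__ == "__main__":
    print(count_kmers("ACGTACGT", 2).most_common())
```

A sequence shorter than `k` simply yields an empty counter, since no full window fits.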

bioc

YAPSA:Yet Another Package for Signature Analysis

This package provides functions and routines for supervised analyses of mutational signatures (i.e., the signatures have to be known, cf. L. Alexandrov et al., Nature 2013 and L. Alexandrov et al., bioRxiv 2018). In particular, the family of functions LCD (LCD = linear combination decomposition) can use optimal signature-specific cutoffs, which take care of the different detectability of the different signatures. Moreover, the package provides different sets of mutational signatures, including the COSMIC and PCAWG SNV signatures and the PCAWG Indel signatures; the latter implies that, with YAPSA, the concept of supervised analysis of mutational signatures is extended to Indel signatures. YAPSA also provides confidence intervals as computed by profile likelihoods and can perform signature analysis on a stratified mutational catalogue (SMC = stratify mutational catalogue) in order to analyze enrichment and depletion patterns for the signatures in different strata.

Maintained by Zuguang Gu. Last updated 5 months ago.

sequencing dnaseq somaticmutation visualization clustering genomicvariation statisticalmethod biologicalquestion

7.3 match 6.41 score 57 scripts

tathey

VLF:Frequency Matrix Approach for Assessing Very Low Frequency Variants in Sequence Records

Using frequency matrices, very low frequency variants (VLFs) are assessed for amino acid and nucleotide sequences. The VLFs are then compared to see if they occur in only one member of a species, singleton VLFs, or if they occur in multiple members of a species, shared VLFs. The amino acid and nucleotide VLFs are then compared to see if they are concordant with one another. Amino acid VLFs are also assessed to determine if they lead to a change in amino acid residue type, and potential changes to protein structures. Based on Stoeckle and Kerr (2012) <doi:10.1371/journal.pone.0043992>.

Maintained by Taryn B. T. Athey. Last updated 3 years ago.

21.1 match 2.16 score 48 scripts 1 dependents

bioc

maftools:Summarize, Analyze and Visualize MAF Files

Analyze and visualize Mutation Annotation Format (MAF) files from large scale sequencing studies. This package provides various functions to perform most commonly used analyses in cancer genomics and to create feature rich customizable visualizations with minimal effort.

Maintained by Anand Mayakonda. Last updated 5 months ago.

datarepresentation dnaseq visualization drivermutation variantannotation featureextraction classification somaticmutation sequencing functionalgenomics survival bioinformatics cancer-genome-atlas cancer-genomics genomics maf-files tcga curl bzip2 xz-utils zlib

2.9 match 459 stars 14.63 score 948 scripts 18 dependents

bioc

atSNP:Affinity test for identifying regulatory SNPs

atSNP performs affinity tests of motif matches with the SNP or the reference genomes and SNP-led changes in motif matches.

Maintained by Sunyoung Shin. Last updated 5 months ago.

software chipseq genomeannotation motifannotation visualization cpp

6.8 match 1 stars 5.73 score 36 scripts

mvesuviusc

primerTree:Visually Assessing the Specificity and Informativeness of Primer Pairs

Identifies potential target sequences for a given set of primers and generates phylogenetic trees annotated with the taxonomies of the predicted amplification products.

Maintained by Matt Cannon. Last updated 1 years ago.

6.9 match 51 stars 5.56 score 16 scripts

bioc

ORFhunteR:Predict open reading frames in nucleotide sequences

The ORFhunteR package is an R and C++ library for the automatic determination and annotation of open reading frames (ORF) in a large set of RNA molecules. It efficiently implements the machine learning model based on vectorization of nucleotide sequences and the random forest classification algorithm. The ORFhunteR package consists of a set of functions written in the R language in conjunction with C++. The efficiency of the package was confirmed by the examples of the analysis of RNA molecules from the NCBI RefSeq and Ensembl databases. The package can be used in basic and applied biomedical research related to the study of the transcriptome of normal as well as altered (for example, cancer) human cells.

Maintained by Vasily V. Grinev. Last updated 5 months ago.

technology statisticalmethod sequencing rnaseq classification featureextraction cpp

8.6 match 1 stars 4.48 score
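
As background for the ORFhunteR entry, the sketch below enumerates candidate open reading frames by a plain start-to-stop scan in the three forward frames. It is an illustrative Python sketch only: it does not reproduce the package's random-forest model, and the function name `find_orfs` and the `min_codons` parameter are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(sequence: str, min_codons: int = 2) -> list[tuple[int, int]]:
    """Return (start, end) pairs of ATG...stop stretches, end exclusive.

    min_codons counts the start and stop codons as well (an assumption of
    this sketch, not a convention taken from any package).
    """
    seq = sequence.upper().replace("U", "T")  # accept RNA input too
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens a candidate
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # close the candidate and keep scanning
    return sorted(orfs)


if __name__ == "__main__":
    print(find_orfs("CCATGAAATAGGG"))
```

A real predictor would then score such candidates, for example with a trained classifier, rather than accept every one.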

openintrostat

openintro:Datasets and Supplemental Functions from 'OpenIntro' Textbooks and Labs

Supplemental functions and data for 'OpenIntro' resources, which includes open-source textbooks and resources for introductory statistics (<https://www.openintro.org/>). The package contains datasets used in our open-source textbooks along with custom plotting functions for reproducing book figures. Note that many functions and examples include color transparency; some plotting elements may not show up properly (or at all) when run in some versions of Windows operating system.

Maintained by Mine Çetinkaya-Rundel. Last updated 3 months ago.

data openintro

3.3 match 240 stars 11.39 score 6.0k scripts

bioc

deepSNV:Detection of subclonal SNVs in deep sequencing data.

This package provides quantitative variant callers for detecting subclonal mutations in ultra-deep (>=100x coverage) sequencing experiments. The deepSNV algorithm is used for a comparative setup with a control experiment of the same loci and uses a beta-binomial model and a likelihood ratio test to discriminate sequencing errors and subclonal SNVs. The shearwater algorithm computes a Bayes classifier based on a beta-binomial model for variant calling with multiple samples for precisely estimating model parameters - such as local error rates and dispersion - and prior knowledge, e.g. from variation data bases such as COSMIC.

Maintained by Moritz Gerstung. Last updated 5 months ago.

geneticvariability snp sequencing genetics dataimport curl bzip2 xz-utils zlib cpp

5.6 match 6.53 score 38 scripts 1 dependents

bioc

G4SNVHunter:Evaluating SNV-Induced Disruption of G-Quadruplex Structures

G-quadruplexes (G4s) are unique nucleic acid secondary structures predominantly found in guanine-rich regions and have been shown to be involved in various biological regulatory processes. G4SNVHunter is an R package designed to rapidly identify genomic sequences with G4-forming potential and accurately screen user-provided single nucleotide variants (also applicable to single nucleotide polymorphisms) that may destabilize these structures. This enables users to screen key variants for further experimental study, investigating how these variants may influence biological functions, such as gene regulation, by altering G4 formation.

Maintained by Rongxin Zhang. Last updated 3 months ago.

epigenetics snp cpp

8.0 match 4.48 score 4 scripts

bioc

qckitfastq:FASTQ Quality Control

Assessment of FASTQ file format with multiple metrics including quality score, sequence content, overrepresented sequence and Kmers.

Maintained by August Guang. Last updated 5 months ago.

software qualitycontrol sequencing zlib cpp

7.9 match 4.38 score 24 scripts
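
The quality-score metric in the qckitfastq entry rests on decoding FASTQ quality strings. The Python sketch below assumes the common Phred+33 ASCII encoding; the function names are hypothetical and are not part of qckitfastq.

```python
def phred_scores(quality: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string into Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality]


def mean_quality(quality: str) -> float:
    """Average Phred score of one read."""
    scores = phred_scores(quality)
    if not scores:
        raise ValueError("empty quality string")
    return sum(scores) / len(scores)


def error_probability(q: float) -> float:
    """Base-call error probability implied by Phred score q: 10^(-q/10)."""
    return 10 ** (-q / 10)


if __name__ == "__main__":
    print(phred_scores("II5!"), mean_quality("II5!"))
```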

bioc

immApex:Tools for Adaptive Immune Receptor Sequence-Based Machine and Deep Learning

A set of tools to build tensorflow/keras3-based models in R from amino acid and nucleotide sequences focusing on adaptive immune receptors. The package includes pre-processing of sequences, unifying gene nomenclature usage, encoding sequences, and combining models. This package will serve as the basis of future immune receptor sequence functions/packages/models compatible with the scRepertoire ecosystem.

Maintained by Nick Borcherding. Last updated 19 days ago.

software immunooncology singlecell classification annotation sequencing motifannotation

5.7 match 8 stars 5.92 score 3 scripts

eriqande

whoa:Evaluation of Genotyping Error in Genotype-by-Sequencing Data

This is a small, lightweight package that lets users investigate the distribution of genotypes in genotype-by-sequencing (GBS) data where they expect (by and large) Hardy-Weinberg equilibrium, in order to assess rates of genotyping errors and the dependence of those rates on read depth. It implements a Markov chain Monte Carlo (MCMC) sampler using 'Rcpp' to compute a Bayesian estimate of what we call the heterozygote miscall rate for restriction-associated digest (RAD) sequencing data and other types of reduced representation GBS data. It also provides functions to generate plots of expected and observed genotype frequencies. Some background on these topics can be found in a recent paper "Recent advances in conservation and population genomics data analysis" by Hendricks et al. (2018) <doi:10.1111/eva.12659>, and another paper describing the MCMC approach is in preparation with Gordon Luikart and Thierry Gosselin.

Maintained by Eric C. Anderson. Last updated 4 years ago.

cpp

6.5 match 6 stars 5.16 score 24 scripts
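
The expected genotype frequencies under Hardy-Weinberg equilibrium referred to in the whoa entry follow from the allele frequency alone. A minimal Python sketch, with hypothetical function names that are not taken from the package:

```python
def allele_frequency(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Frequency of allele A from observed genotype counts."""
    total = n_aa + n_ab + n_bb
    if total <= 0:
        raise ValueError("need at least one genotype")
    # Each AA individual carries two copies of A, each AB carries one.
    return (2 * n_aa + n_ab) / (2 * total)


def hwe_expected(p: float) -> tuple[float, float, float]:
    """Expected (AA, AB, BB) frequencies: p^2, 2pq, q^2."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    q = 1.0 - p
    return p * p, 2 * p * q, q * q


if __name__ == "__main__":
    p = allele_frequency(30, 40, 30)
    print(p, hwe_expected(p))
```

Comparing these expectations with the observed genotype proportions is the kind of check that can point to heterozygote miscalls.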

bioc

BUMHMM:Computational pipeline for computing probability of modification from structure probing experiment data

This is a probabilistic modelling pipeline for computing per-nucleotide posterior probabilities of modification from the data collected in structure probing experiments. The model supports multiple experimental replicates and empirically corrects coverage- and sequence-dependent biases. The model utilises the measure of a "drop-off rate" for each nucleotide, which is compared between replicates through a log-ratio (LDR). The LDRs between control replicates define a null distribution of variability in drop-off rate observed by chance, and LDRs between treatment and control replicates are compared to this distribution. The resulting empirical p-values (probability of being "drawn" from the null distribution) are used as observations in a Hidden Markov Model with a Beta-Uniform Mixture model used as an emission model. The resulting posterior probabilities indicate the probability of a nucleotide having been modified in a structure probing experiment.

Maintained by Alina Selega. Last updated 5 months ago.

immunooncology geneticvariability transcription geneexpression generegulation coverage genetics structuralprediction transcriptomics bayesian classification featureextraction hiddenmarkovmodel regression rnaseq sequencing

7.9 match 4.15 score 14 scripts
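
The first steps of the BUMHMM description (drop-off rate, log-ratio, empirical p-value) can be sketched as follows. This Python sketch makes two assumptions not stated in the entry: the drop-off rate is taken as drop-off count divided by coverage, and the empirical p-value is the fraction of null LDRs at least as large as the observed one. All function names are hypothetical, and the bias corrections and the HMM step are omitted.

```python
import math


def dropoff_rate(dropoff_count: int, coverage: int) -> float:
    """Drop-off rate at one nucleotide (assumed: count / coverage)."""
    if coverage <= 0:
        raise ValueError("coverage must be positive")
    return dropoff_count / coverage


def log_dropoff_ratio(rate_a: float, rate_b: float) -> float:
    """Log-ratio (LDR) of two drop-off rates, e.g. treatment vs control."""
    return math.log(rate_a / rate_b)


def empirical_p_value(null_ldrs: list[float], observed_ldr: float) -> float:
    """Fraction of null LDRs >= the observed LDR (simplified assumption)."""
    exceed = sum(1 for x in null_ldrs if x >= observed_ldr)
    return exceed / len(null_ldrs)


if __name__ == "__main__":
    ldr = log_dropoff_ratio(dropoff_rate(20, 100), dropoff_rate(10, 100))
    print(ldr, empirical_p_value([-1.0, 0.0, 1.0, 2.0], ldr))
```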

bioc

QSutils:Quasispecies Diversity

Set of utility functions for viral quasispecies analysis with NGS data. Most functions are equally useful for metagenomic studies. There are three main types: (1) data manipulation and exploration—functions useful for converting reads to haplotypes and frequencies, repairing reads, intersecting strand haplotypes, and visualizing haplotype alignments. (2) diversity indices—functions to compute diversity and entropy, in which incidence, abundance, and functional indices are considered. (3) data simulation—functions useful for generating random viral quasispecies data.

Maintained by Mercedes Guerrero-Murillo. Last updated 5 months ago.

software genetics dnaseq geneticvariability sequencing alignment sequencematching dataimport

5.7 match 5.56 score 8 scripts 1 dependents
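
One of the abundance-based indices of the kind the QSutils entry mentions is Shannon entropy over haplotype frequencies. The Python sketch below is a generic illustration; the function name is hypothetical and the natural logarithm is an arbitrary choice here.

```python
import math


def shannon_entropy(counts: list[int]) -> float:
    """Shannon entropy (natural log) of haplotype read counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    # Haplotypes with zero reads contribute nothing to the sum.
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


if __name__ == "__main__":
    print(shannon_entropy([50, 50]))   # two equally abundant haplotypes
    print(shannon_entropy([97, 2, 1])) # one dominant haplotype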

rdinnager

slimr:Create, Run and Post-Process 'SLiM' Population Genetics Forward Simulations

Lets you write 'SLiM' scripts (population genomics simulation) using your favourite R IDE, using a syntax as close as possible to the original 'SLiM' language. It offers many tools to manipulate those scripts, to run them in the 'SLiM' software from R, and to capture and post-process their output, after or even during a simulation.

Maintained by Russell Dinnage. Last updated 4 months ago.

6.8 match 8 stars 4.70 score 42 scripts

bioc

ModCon:Modifying splice site usage by changing the mRNP code, while maintaining the genetic code

Collection of functions to calculate a nucleotide sequence surrounding splice donor sites that either activates or represses donor usage. The proposed alternative nucleotide sequence encodes the same amino acids and could be applied, e.g., in reporter systems to silence or activate cryptic splice donor sites.

Maintained by Johannes Ptok. Last updated 5 months ago.

functionalgenomics alternativesplicing

7.9 match 1 stars 4.00 score 2 scripts

ropensci

rsnps:Get 'SNP' ('Single-Nucleotide' 'Polymorphism') Data on the Web

A programmatic interface to various 'SNP' 'datasets' on the web: 'OpenSNP' (<https://opensnp.org>), and 'NCBI's' 'dbSNP' database (<https://www.ncbi.nlm.nih.gov/projects/SNP/>). Functions are included for searching 'NCBI'. For 'OpenSNP', functions are included for getting 'SNPs', and data for 'genotypes', 'phenotypes', annotations, and bulk downloads of data by user.

Maintained by Julia Gustavsen. Last updated 2 years ago.

gene snp sequence api web api-client species dbsnp opensnp ncbi genotype data snps web-api

4.6 match 52 stars 6.59 score 63 scripts

ropensci

beautier:'BEAUti' from R

'BEAST2' (<https://www.beast2.org>) is a widely used Bayesian phylogenetic tool, that uses DNA/RNA/protein data and many model priors to create a posterior of jointly estimated phylogenies and parameters. 'BEAUti 2' (which is part of 'BEAST2') is a GUI tool that allows users to specify the many possible setups and generates the XML file 'BEAST2' needs to run. This package provides a way to create 'BEAST2' input files without active user input, but using R function calls instead.

Maintained by Richèl J.C. Bilderbeek. Last updated 23 days ago.

bayesian beast beast2 beauti phylogenetic-inference phylogenetics

3.4 match 13 stars 8.76 score 198 scripts 5 dependents

prabinameher

EncDNA:Encoding of Nucleotide Sequences into Numeric Feature Vectors

We describe fifteen different splice site sequence encoding schemes that have been used in earlier studies for mapping of splice site sequences into numeric feature vectors. These encoding schemes will also be helpful for transforming other nucleotide sequences into numeric forms, provided they are of equal length. These encoding schemes will help the computational biologist working in the field of classification (binary or multiclass) or prediction involving nucleic acid sequences of equal length.

Maintained by Prabina Kumar Meher. Last updated 6 years ago.

29.6 match 1 stars 1.00 score

cran

KRIS:Keen and Reliable Interface Subroutines for Bioinformatic Analysis

Provides useful functions which are needed for bioinformatic analysis such as calculating linear principal components from numeric data and single-nucleotide polymorphism (SNP) datasets, calculating the fixation index (Fst) using the Hudson method, creating scatter plots in 3 views, handling the PLINK binary file format, detecting rough structures and outliers using unsupervised clustering, and calculating matrix multiplication in a faster way for big data.

Maintained by Kridsadakorn Chaichoompu. Last updated 4 years ago.

10.4 match 2.73 score 18 scripts 2 dependents

uclahs-cds

BoutrosLab.plotting.general:Functions to Create Publication-Quality Plots

Contains several plotting functions such as barplots, scatterplots, heatmaps, as well as functions to combine plots and assist in the creation of these plots. These functions will give users great ease of use and customization options in broad use for biomedical applications, as well as general purpose plotting. Each of the functions also provides valid default settings to make plotting data more efficient and producing high quality plots with standard colour schemes simpler. All functions within this package are capable of producing plots that are of the quality to be presented in scientific publications and journals. P'ng et al.; BPG: Seamless, automated and interactive visualization of scientific data; BMC Bioinformatics 2019 <doi:10.1186/s12859-019-2610-2>.

Maintained by Paul Boutros. Last updated 5 months ago.

3.4 match 12 stars 8.36 score 414 scripts 6 dependents

bioc

mitoClone2:Clonal Population Identification in Single-Cell RNA-Seq Data using Mitochondrial and Somatic Mutations

This package primarily identifies variants in mitochondrial genomes from BAM alignment files. It filters these variants to remove RNA editing events, then estimates their evolutionary relationship (i.e. their phylogenetic tree) and groups single cells into clones. It also visualizes the mutations and provides additional genomic context.

Maintained by Benjamin Story. Last updated 5 months ago.

annotation dataimport genetics snp software singlecell alignment curl bzip2 xz-utils zlib cpp

6.3 match 1 stars 4.48 score 9 scripts

bioc

rhinotypeR:Rhinovirus genotyping

"rhinotypeR" is designed to automate the comparison of sequence data against prototype strains, streamlining the genotype assignment process. By implementing predefined pairwise distance thresholds, this package makes genotype assignment accessible to researchers and public health professionals. This tool enhances our epidemiological toolkit by enabling more efficient surveillance and analysis of rhinoviruses (RVs) and other viral pathogens with complex genomic landscapes. Additionally, "rhinotypeR" supports comprehensive visualization and analysis of single nucleotide polymorphisms (SNPs) and amino acid substitutions, facilitating in-depth genetic and evolutionary studies.

Maintained by Martha Luka. Last updated 5 months ago.

sequencing genetics phylogenetics

4.3 match 4 stars 6.28 score 2 scripts

bioc

ORFik:Open Reading Frames in Genomics

R package for analysis of transcript and translation features through manipulation of sequence data and NGS data like Ribo-Seq, RNA-Seq, TCP-Seq and CAGE. It is generalized in the sense that any transcript region can be analysed, although, as the name hints, it was made with investigation of ribosomal patterns over Open Reading Frames (ORFs) as its primary use case. ORFik is extremely fast through use of C++, data.table and GenomicRanges. The package allows reassigning transcript starts using CAGE-Seq data, automatic shifting of Ribo-Seq reads, finding Open Reading Frames for whole genomes and much more.

Maintained by Haakon Tjeldnes. Last updated 28 days ago.

immunooncology software sequencing riboseq rnaseq functionalgenomics coverage alignment dataimport cpp

2.4 match 33 stars 10.63 score 115 scripts 2 dependents

junhuili1017

TmCalculator:Melting Temperature of Nucleic Acid Sequences

This tool extends methods from Bio.SeqUtils.MeltingTemp in Python. The melting temperature of nucleic acid sequences can be calculated by three methods: the Wallace rule (Thein & Wallace (1986) <doi:10.1016/S0140-6736(86)90739-7>), empirical formulas based on G and C content (Marmur J. (1962) <doi:10.1016/S0022-2836(62)80066-7>, Schildkraut C. (2010) <doi:10.1002/bip.360030207>, Wetmur J G (1991) <doi:10.3109/10409239109114069>, Untergasser,A. (2012) <doi:10.1093/nar/gks596>, von Ahsen N (2001) <doi:10.1093/clinchem/47.11.1956>) and nearest neighbor thermodynamics (Breslauer K J (1986) <doi:10.1073/pnas.83.11.3746>, Sugimoto N (1996) <doi:10.1093/nar/24.22.4501>, Allawi H (1998) <doi:10.1093/nar/26.11.2694>, SantaLucia J (2004) <doi:10.1146/annurev.biophys.32.110601.141800>, Freier S (1986) <doi:10.1073/pnas.83.24.9373>, Xia T (1998) <doi:10.1021/bi9809425>, Chen JL (2012) <doi:10.1021/bi3002709>, Bommarito S (2000) <doi:10.1093/nar/28.9.1929>, Turner D H (2010) <doi:10.1093/nar/gkp892>, Sugimoto N (1995) <doi:10.1016/S0048-9697(98)00088-6>, Allawi H T (1997) <doi:10.1021/bi962590c>, Santalucia N (2005) <doi:10.1093/nar/gki918>). The result can also be corrected for salt ions and chemical compounds (SantaLucia J (1996) <doi:10.1021/bi951907q>, SantaLucia J(1998) <doi:10.1073/pnas.95.4.1460>, Owczarzy R (2004) <doi:10.1021/bi034621r>, Owczarzy R (2008) <doi:10.1021/bi702363u>).

Maintained by Junhui Li. Last updated 8 days ago.

5.3 match 4 stars 4.75 score 47 scripts 1 dependents
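
Of the three methods named in the TmCalculator entry, the Wallace rule is the simplest: Tm = 2 °C per A or T plus 4 °C per G or C. The Python sketch below is a generic illustration with a hypothetical function name; it is not the TmCalculator interface and applies no salt or chemical corrections.

```python
def wallace_tm(sequence: str) -> float:
    """Melting temperature in degrees Celsius by the Wallace rule."""
    seq = sequence.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    if at + gc != len(seq):
        raise ValueError("sequence must contain only A, C, G and T")
    # 2 degrees per A/T base, 4 degrees per G/C base.
    return 2.0 * at + 4.0 * gc


if __name__ == "__main__":
    print(wallace_tm("ATGCATGC"))
```

The rule is a rough estimate intended for short oligonucleotides; the GC-content and nearest-neighbor methods cited in the entry are the usual choice for longer sequences.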

bioc

CSAR:Statistical tools for the analysis of ChIP-seq data

Statistical tools for ChIP-seq data analysis. The package includes the statistical method described in Kaufmann et al. (2009) PLoS Biology: 7(4):e1000090. Briefly, taking the average DNA fragment size subjected to sequencing into account, the software calculates genomic single-nucleotide read-enrichment values. After normalization, sample and control are compared using a test based on the Poisson distribution. Test statistic thresholds to control the false discovery rate are obtained through random permutation.

Maintained by Jose M Muino. Last updated 5 months ago.

chipseq transcription genetics

5.6 match 4.30 score 6 scripts

erhard-lab

grandR:Comprehensive Analysis of Nucleotide Conversion Sequencing Data

Nucleotide conversion sequencing experiments have been developed to add a temporal dimension to RNA-seq and single-cell RNA-seq. Such experiments require specialized tools for primary processing such as GRAND-SLAM, (see 'Jürges et al' <doi:10.1093/bioinformatics/bty256>) and specialized tools for downstream analyses. 'grandR' provides a comprehensive toolbox for quality control, kinetic modeling, differential gene expression analysis and visualization of such data.

Maintained by Florian Erhard. Last updated 1 month ago.

3.4 match 11 stars 7.03 score 18 scripts 1 dependents

bioc

lumi:BeadArray Specific Methods for Illumina Methylation and Expression Microarrays

The lumi package provides an integrated solution for the Illumina microarray data analysis. It includes functions of Illumina BeadStudio (GenomeStudio) data input, quality control, BeadArray-specific variance stabilization, normalization and gene annotation at the probe level. It also includes the functions of processing Illumina methylation microarrays, especially Illumina Infinium methylation microarrays.

Maintained by Lei Huang. Last updated 5 months ago.

microarray onechannel preprocessing dnamethylation qualitycontrol twochannel

3.8 match 6.27 score 294 scripts 5 dependents

bioc

TFBSTools:Software Package for Transcription Factor Binding Site (TFBS) Analysis

TFBSTools is a package for the analysis and manipulation of transcription factor binding sites. It includes matrix conversion between Position Frequency Matrix (PFM), Position Weight Matrix (PWM) and Information Content Matrix (ICM). It can also scan putative TFBS from sequence/alignment, query the JASPAR database and provides a wrapper of de novo motif discovery software.

Maintained by Ge Tan. Last updated 4 days ago.

motifannotation generegulation motifdiscovery transcription alignment

1.9 match 28 stars 12.36 score 1.1k scripts 18 dependents
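
The PFM-to-PWM conversion mentioned in the description above is commonly done by turning counts into pseudocount-smoothed probabilities and taking the log2 ratio against a background frequency. The Python sketch below shows that common scheme; the function name, the pseudocount value and the uniform background are illustrative assumptions, not a statement of how the package implements it.

```python
import math

BASES = ("A", "C", "G", "T")


def pfm_to_pwm(pfm, pseudocount=0.8, background=None):
    """Convert a position frequency matrix to a log2-odds weight matrix.

    pfm maps each base to a list of counts, one per motif position.
    pseudocount and the uniform background are assumptions of this sketch.
    """
    bg = background or {b: 0.25 for b in BASES}
    length = len(pfm["A"])
    pwm = {b: [] for b in BASES}
    for j in range(length):
        total = sum(pfm[b][j] for b in BASES)
        for b in BASES:
            # Smooth the count so zero counts do not give log2(0).
            prob = (pfm[b][j] + pseudocount * bg[b]) / (total + pseudocount)
            pwm[b].append(math.log2(prob / bg[b]))
    return pwm


if __name__ == "__main__":
    counts = {"A": [8, 0], "C": [0, 2], "G": [0, 6], "T": [0, 0]}
    for base, scores in pfm_to_pwm(counts).items():
        print(base, [round(s, 2) for s in scores])
```

Positive scores mark bases that are over-represented relative to the background at that position, negative scores mark under-represented ones.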

adrientaudiere

MiscMetabar:Miscellaneous Functions for Metabarcoding Analysis

Facilitate the description, transformation, exploration, and reproducibility of metabarcoding analyses. 'MiscMetabar' is mainly built on top of the 'phyloseq', 'dada2' and 'targets' R packages. It helps to build reproducible and robust bioinformatics pipelines in R. 'MiscMetabar' makes ecological analysis of alpha and beta-diversity easier, more reproducible and more powerful by integrating a large number of tools. Important features are described in Taudière A. (2023) <doi:10.21105/joss.06038>.

Maintained by Adrien Taudière. Last updated 26 days ago.

sequencing microbiome metagenomics clustering classification visualization amplicon amplicon-sequencing biodiversity-informatics ecology illumina metabarcoding ngs-analysis

3.6 match 17 stars 6.44 score 23 scripts

dami82

mutSignatures:Decipher Mutational Signatures from Somatic Mutational Catalogs

Cancer cells accumulate DNA mutations as a result of DNA damage and DNA repair processes. This computational framework is aimed at deciphering DNA mutational signatures operating in cancer. The framework includes modules that support raw data import and processing, mutational signature extraction, and results interpretation and visualization. The framework accepts widely used file formats storing information about DNA variants, such as Variant Call Format files. The framework performs Non-Negative Matrix Factorization to extract mutational signatures explaining the observed set of DNA mutations. Bootstrapping is performed as part of the analysis. The framework supports parallelization and is optimized for use on multi-core systems. The software was described by Fantini D et al (2020) <doi:10.1038/s41598-020-75062-0> and is based on a custom R-based implementation of the original MATLAB WTSI framework by Alexandrov LB et al (2013) <doi:10.1016/j.celrep.2012.12.008>.

Maintained by Damiano Fantini. Last updated 2 years ago.

3.9 match 14 stars 5.83 score 48 scripts

jinghuazhao

gap:Genetic Analysis Package

As first reported [Zhao, J. H. 2007. "gap: Genetic Analysis Package". J Stat Soft 23(8):1-18. <doi:10.18637/jss.v023.i08>], it is designed as an integrated package for genetic data analysis of both population and family data. Currently, it contains functions for sample size calculations of both population-based and family-based designs, probability of familial disease aggregation, kinship calculation, statistics in linkage analysis, and association analysis involving genetic markers including haplotype analysis with or without environmental covariates. Over the years, the package has been developed in between many projects, hence the name (gap).

Maintained by Jing Hua Zhao. Last updated 16 days ago.

genetics imputation lmm fortran

1.9 match 12 stars 11.88 score 448 scripts 16 dependents

msq-123

CovidMutations:Mutation Analysis and Assay Validation Toolkit for COVID-19 (Coronavirus Disease 2019)

A feasible framework for mutation analysis and reverse transcription polymerase chain reaction (RT-PCR) assay evaluation of COVID-19, including mutation profile visualization, statistics and mutation ratio of each assay. The mutation ratio is conducive to evaluating the coverage of RT-PCR assays in large-sized samples <doi:10.20944/preprints202004.0529.v1>.

Maintained by Shaoqian Ma. Last updated 5 years ago.

5.0 match 4 stars 4.30 score 6 scripts

statgenlmu

coala:A Framework for Coalescent Simulation

Coalescent simulators can rapidly simulate biological sequences evolving according to a given model of evolution. You can use this package to specify such models, to conduct the simulations and to calculate additional statistics from the results (Staab, Metzler, 2016 <doi:10.1093/bioinformatics/btw098>). It relies on existing simulators for doing the simulation, and currently supports the programs 'ms', 'msms' and 'scrm'. It also supports finite-sites mutation models by combining the simulators with the program 'seq-gen'. Coala provides functions for calculating certain summary statistics, which can also be applied to actual biological data. One possibility to import data is through the 'PopGenome' package (<https://github.com/pievos101/PopGenome>).

Maintained by Dirk Metzler. Last updated 1 year ago.

coalescent dna evolution popgen simulation cpp

3.0 match 23 stars 7.06 score 84 scripts

bioc

VarCon:VarCon: an R package for retrieving neighboring nucleotides of an SNV

VarCon is an R package which converts the positional information from the annotation of a single nucleotide variation (SNV) (either referring to the coding sequence or the reference genomic sequence). It retrieves the genomic reference sequence around the position of the single nucleotide variation. To assess whether the SNV could potentially influence binding of splicing regulatory proteins, VarCon calculates the HEXplorer score as an estimation. In addition, VarCon reports splice site strengths of splice sites within the retrieved genomic sequence and any changes due to the SNV.

Maintained by Johannes Ptok. Last updated 5 months ago.

functionalgenomics alternativesplicing

5.3 match 4.00 score 5 scripts

emmanuelparadis

pegas:Population and Evolutionary Genetics Analysis System

Functions for reading, writing, plotting, analysing, and manipulating allelic and haplotypic data, including from VCF files, and for the analysis of population nucleotide sequences and micro-satellites including coalescent analyses, linkage disequilibrium, population structure (Fst, Amova) and equilibrium (HWE), haplotype networks, minimum spanning tree and network, and median-joining networks.

Maintained by Emmanuel Paradis. Last updated 1 year ago.

2.8 match 7.53 score 576 scripts 18 dependents

bioc

DropletUtils:Utilities for Handling Single-Cell Droplet Data

Provides a number of utility functions for handling single-cell (RNA-seq) data from droplet technologies such as 10X Genomics. This includes data loading from count matrices or molecule information files, identification of cells from empty droplets, removal of barcode-swapped pseudo-cells, and downsampling of the count matrix.

Maintained by Jonathan Griffiths. Last updated 3 months ago.

immunooncology singlecell sequencing rnaseq geneexpression transcriptomics dataimport coverage zlib cpp

2.0 match 10.08 score 2.7k scripts 9 dependents

bioc

oligo:Preprocessing tools for oligonucleotide arrays

A package to analyze oligonucleotide arrays (expression/SNP/tiling/exon) at probe-level. It currently supports Affymetrix (CEL files) and NimbleGen arrays (XYS files).

Maintained by Benilton Carvalho. Last updated 8 days ago.

microarray onechannel twochannel preprocessing snp differentialexpression exonarray geneexpression dataimport zlib

1.9 match 3 stars 10.42 score 528 scripts 10 dependents

ropensci

bold:Interface to Bold Systems API

A programmatic interface to the Web Service methods provided by Bold Systems (<http://www.boldsystems.org/>) for genetic 'barcode' data. Functions include methods for searching for sequences by taxonomic names, ids, collectors, and institutions; as well as a function for searching for specimens, and downloading trace files.

Maintained by Salix Dubois. Last updated 3 months ago.

biodiversity barcode dna sequences fasta api-wrapper barcodes taxize

3.4 match 18 stars 5.74 score 57 scripts

bioc

TCGAbiolinks:TCGAbiolinks: An R/Bioconductor package for integrative analysis with GDC data

The aim of TCGAbiolinks is to: i) facilitate GDC open-access data retrieval, ii) prepare the data using the appropriate pre-processing strategies, iii) provide the means to carry out different standard analyses and iv) easily reproduce earlier research results. In more detail, the package provides multiple methods for analysis (e.g., differential expression analysis, identifying differentially methylated regions) and methods for visualization (e.g., survival plots, volcano plots, starburst plots) in order to easily develop complete analysis pipelines.

Maintained by Tiago Chedraoui Silva. Last updated 27 days ago.

dnamethylation differentialmethylation generegulation geneexpression methylationarray differentialexpression pathways network sequencing survival software bioc bioconductor gdc integrative-analysis tcga tcga-data tcgabiolinks

1.3 match 305 stars 14.45 score 1.6k scripts 6 dependents

jgx65

hierfstat:Estimation and Tests of Hierarchical F-Statistics

Estimates hierarchical F-statistics from haploid or diploid genetic data with any number of levels in the hierarchy, following the algorithm of Yang (Evolution(1998), 52:950). Tests the significance of each F and variance component via randomisations, using the likelihood-ratio statistic G (Goudet et al. (1996) <https://academic.oup.com/genetics/article/144/4/1933/6017091>). Estimates genetic diversity statistics for haploid and diploid genetic datasets in various formats, including inbreeding and coancestry coefficients, and population specific F-statistics following Weir and Goudet (2017) <https://academic.oup.com/genetics/article/206/4/2085/6072590>.

Maintained by Jerome Goudet. Last updated 4 months ago.

devtools fstatistics gwas hierfstat kinship population-genetics population-genomics quantitative-genetics simulations

1.8 match 25 stars 10.94 score 560 scripts 4 dependents

systemsbioinformatics

parcr:Construct Parsers for Structured Text Files

Construct parser combinator functions, higher order functions that parse input. Construction of such parsers is transparent and easy. Their main application is the parsing of structured text files like those generated by laboratory instruments. Based on a paper by Hutton (1992) <doi:10.1017/S0956796800000411>.

Maintained by Douwe Molenaar. Last updated 9 months ago.

combinators higher-order-functions parser parsing

3.8 match 4 stars 5.08 score 8 scripts

lucasnell

jackalope:A Swift, Versatile Phylogenomic and High-Throughput Sequencing Simulator

Simply and efficiently simulates (i) variants from reference genomes and (ii) reads from both Illumina <https://www.illumina.com/> and Pacific Biosciences (PacBio) <https://www.pacb.com/> platforms. It can either read reference genomes from FASTA files or simulate new ones. Genomic variants can be simulated using summary statistics, phylogenies, Variant Call Format (VCF) files, and coalescent simulations—the latter of which can include selection, recombination, and demographic fluctuations. 'jackalope' can simulate single, paired-end, or mate-pair Illumina reads, as well as PacBio reads. These simulations include sequencing errors, mapping qualities, multiplexing, and optical/polymerase chain reaction (PCR) duplicates. Simulating Illumina sequencing is based on ART by Huang et al. (2012) <doi:10.1093/bioinformatics/btr708>. PacBio sequencing simulation is based on SimLoRD by Stöcker et al. (2016) <doi:10.1093/bioinformatics/btw286>. All outputs can be written to standard file formats.

Maintained by Lucas A. Nell. Last updated 1 year ago.

zlib openblas curl bzip2 xz-utils cpp

3.3 match 8 stars 5.28 score 24 scripts

nakarinp

longreadvqs:Viral Quasispecies Comparison from Long-Read Sequencing Data

Performs a variety of viral quasispecies diversity analyses [see Pamornchainavakul et al. (2024) <doi:10.21203/rs.3.rs-4637890/v1>] based on long-read sequence alignment. Main functions include 1) sequencing error and other noise minimization and read sampling, 2) single nucleotide variant (SNV) profile comparison, and 3) viral quasispecies profile comparison and visualization.

Maintained by Nakarin Pamornchainavakul. Last updated 7 months ago.

3.7 match 4.65 score 4 scripts

bioc

Rsubread:Mapping, quantification and variant analysis of sequencing data

Alignment, quantification and analysis of RNA sequencing data (including both bulk RNA-seq and scRNA-seq) and DNA sequencing data (including ATAC-seq, ChIP-seq, WGS, WES etc). Includes functionality for read mapping, read counting, SNP calling, structural variant detection and gene fusion discovery. Can be applied to all major sequencing technologies and to both short and long sequence reads.

Maintained by Wei Shi. Last updated 2 days ago.

sequencing alignment sequencematching rnaseq chipseq singlecell geneexpression generegulation genetics immunooncology snp geneticvariability preprocessing qualitycontrol genomeannotation genefusiondetection indeldetection variantannotation variantdetection multiplesequencealignment zlib

1.9 match 9.24 score 892 scripts 10 dependents

immunomind

immunarch:Bioinformatics Analysis of T-Cell and B-Cell Immune Repertoires

A comprehensive framework for bioinformatics exploratory analysis of bulk and single-cell T-cell receptor and antibody repertoires. It provides seamless data loading, analysis and visualisation for AIRR (Adaptive Immune Receptor Repertoire) data, both bulk immunosequencing (RepSeq) and single-cell sequencing (scRNAseq). Immunarch implements most of the widely used AIRR analysis methods, such as: clonality analysis, estimation of repertoire similarities in distribution of clonotypes and gene segments, repertoire diversity analysis, annotation of clonotypes using external immune receptor databases and clonotype tracking in vaccination and cancer studies. A successor to our previously published 'tcR' immunoinformatics package (Nazarov 2015) <doi:10.1186/s12859-015-0613-1>.

Maintained by Vadim I. Nazarov. Last updated 12 months ago.

airr-analysis b-cell-receptor bcr bcr-repertoire bioinformatics ig ig-repertoire immune-repertoire immune-repertoire-analysis immune-repertoire-data immunoglobulin immunoinformatics immunology rep-seq repertoire-analysis single-cell single-cell-analysis t-cell-receptor tcr tcr-repertoire cpp

1.8 match 315 stars 9.49 score 203 scripts

bioc

DECIPHER:Tools for curating, analyzing, and manipulating biological sequences

A toolset for deciphering and managing biological sequences.

Maintained by Erik Wright. Last updated 5 days ago.

clustering genetics sequencing dataimport visualization microarray qualitycontrol qpcr alignment wholegenome microbiome immunooncology geneprediction openmp

2.0 match 8.40 score 1.1k scripts 14 dependents

bioc

SeqVarTools:Tools for variant data

An interface to the fast-access storage format for VCF data provided in SeqArray, with tools for common operations and analysis.

Maintained by Stephanie M. Gogarten. Last updated 5 months ago.

snp geneticvariability sequencing genetics

1.9 match 3 stars 8.76 score 384 scripts 2 dependents

yulab-smu

TDbook:Companion Package for the Book "Data Integration, Manipulation and Visualization of Phylogenetic Trees" by Guangchuang Yu (2022, ISBN:9781032233574, doi:10.1201/9781003279242)

The companion package that provides all the datasets used in the book "Data Integration, Manipulation and Visualization of Phylogenetic Trees" by Guangchuang Yu (2022, ISBN:9781032233574, doi:10.1201/9781003279242).

Maintained by Guangchuang Yu. Last updated 3 years ago.

3.3 match 13 stars 4.88 score 59 scripts

cran

ips:Interfaces to Phylogenetic Software in R

Functions that wrap popular phylogenetic software for sequence alignment, masking of sequence alignments, and estimation of phylogenies and ancestral character states.

Maintained by Christoph Heibl. Last updated 11 months ago.

3.7 match 4.28 score 128 scripts 1 dependents

thomaschln

snplinkage:Single Nucleotide Polymorphisms Linkage Disequilibrium Visualizations

Linkage disequilibrium visualizations of up to several hundreds of single nucleotide polymorphisms (SNPs), annotated with chromosomal positions and gene names. Two types of plots are available for small numbers of SNPs (<40) and for large numbers (tested up to 500). Both can be extended by combining other ggplots, e.g. association studies results, and functions make it possible to directly visualize the effect of SNP selection methods, such as minor allele frequency filtering and TagSNP selection, with a second correlation heatmap. The SNP correlations are computed on Genotype Data objects from the 'GWASTools' package using the 'SNPRelate' package, and the plots are customizable 'ggplot2' and 'gtable' objects and are annotated using the 'biomaRt' package. Usage is detailed in the vignette with example data and results from up to 500 SNPs of 1,200 scans are in Charlon T. (2019) <doi:10.13097/archive-ouverte/unige:161795>.

Maintained by Thomas Charlon. Last updated 4 months ago.

geneticvariability microarray snp

3.4 match 4.62 score 14 scripts

hmorlon

RPANDA:Phylogenetic ANalyses of DiversificAtion

Implements macroevolutionary analyses on phylogenetic trees. See Morlon et al. (2010) <DOI:10.1371/journal.pbio.1000493>, Morlon et al. (2011) <DOI:10.1073/pnas.1102543108>, Condamine et al. (2013) <DOI:10.1111/ele.12062>, Morlon et al. (2014) <DOI:10.1111/ele.12251>, Manceau et al. (2015) <DOI:10.1111/ele.12415>, Lewitus & Morlon (2016) <DOI:10.1093/sysbio/syv116>, Drury et al. (2016) <DOI:10.1093/sysbio/syw020>, Manceau et al. (2016) <DOI:10.1093/sysbio/syw115>, Morlon et al. (2016) <DOI:10.1111/2041-210X.12526>, Clavel & Morlon (2017) <DOI:10.1073/pnas.1606868114>, Drury et al. (2017) <DOI:10.1093/sysbio/syx079>, Lewitus & Morlon (2017) <DOI:10.1093/sysbio/syx095>, Drury et al. (2018) <DOI:10.1371/journal.pbio.2003563>, Clavel et al. (2019) <DOI:10.1093/sysbio/syy045>, Maliet et al. (2019) <DOI:10.1038/s41559-019-0908-0>, Billaud et al. (2019) <DOI:10.1093/sysbio/syz057>, Lewitus et al. (2019) <DOI:10.1093/sysbio/syz061>, Aristide & Morlon (2019) <DOI:10.1111/ele.13385>, Maliet et al. (2020) <DOI:10.1111/ele.13592>, Drury et al. (2021) <DOI:10.1371/journal.pbio.3001270>, Perez-Lamarque & Morlon (2022) <DOI:10.1111/mec.16478>, Perez-Lamarque et al. (2022) <DOI:10.1101/2021.08.30.458192>, Mazet et al. (2023) <DOI:10.1111/2041-210X.14195>, Drury et al. (2024) <DOI:10.1016/j.cub.2023.12.055>.

Maintained by Hélène Morlon. Last updated 2 months ago.

1.8 match 24 stars 8.50 score 255 scripts

jlp-bioinf

RRNA:Secondary Structure Plotting for RNA

Functions for creating and manipulating RNA secondary structure plots.

Maintained by JP Bida. Last updated 11 months ago.

3.4 match 1 stars 4.33 score 47 scripts 1 dependents

bioc

m6Aboost:m6Aboost

This package helps users run the m6Aboost model on their own miCLIP2 data. The package includes functions to assign the read counts and get the features to run the m6Aboost model. The miCLIP2 data should be stored in a GRanges object. More details can be found in the vignette.

Maintained by You Zhou. Last updated 5 months ago.

sequencing epigenetics genetics experimenthubsoftware

3.3 match 2 stars 4.30 score 5 scripts

bioc

seq.hotSPOT:Targeted sequencing panel design based on mutation hotspots

seq.hotSPOT provides a resource for designing effective sequencing panels to help improve mutation capture efficacy for ultradeep sequencing projects. Using SNV datasets, this package designs custom panels for any tissue of interest and identifies the genomic regions likely to contain the most mutations. Establishing efficient targeted sequencing panels can allow researchers to study mutation burden in tissues at high depth without the economic burden of whole-exome or whole-genome sequencing. This tool was developed to make high-depth sequencing panels to study low-frequency clonal mutations in clinically normal and cancerous tissues.

Maintained by Sydney Grant. Last updated 5 months ago.

software technology sequencing dnaseq wholegenome

3.5 match 4.00 score 3 scripts

bioc

EDASeq:Exploratory Data Analysis and Normalization for RNA-Seq

Numerical and graphical summaries of RNA-Seq read data. Within-lane normalization procedures to adjust for GC-content effect (or other gene-level effects) on read counts: loess robust local regression, global-scaling, and full-quantile normalization (Risso et al., 2011). Between-lane normalization procedures to adjust for distributional differences between lanes (e.g., sequencing depth): global-scaling and full-quantile normalization (Bullard et al., 2010).

Maintained by Davide Risso. Last updated 5 months ago.

immunooncology sequencing rnaseq preprocessing qualitycontrol differentialexpression

1.3 match 5 stars 10.24 score 594 scripts 9 dependents

william-swl

plutor:Useful Functions for Visualization

In ancient Roman mythology, 'Pluto' was the ruler of the underworld and presided over the afterlife. 'Pluto' was frequently conflated with 'Plutus', the god of wealth, because mineral wealth was found underground. When plotting with R, you try once, twice, practice again and again, and finally you get the pretty figure you want. It's a 'plot tour', a tour about repetition and reward. Hope 'plutor' helps you on the tour!

Maintained by William Song. Last updated 1 year ago.

3.8 match 3 stars 3.62 score 28 scripts

bioc

compSPOT:compSPOT: Tool for identifying and comparing significantly mutated genomic hotspots

Clonal cell groups share common mutations within cancer, precancer, and even clinically normal appearing tissues. The frequency and location of these mutations may predict prognosis and cancer risk. It has also been well established that certain genomic regions have increased sensitivity to acquiring mutations. Mutation-sensitive genomic regions may therefore serve as markers for predicting cancer risk. This package contains multiple functions to establish significantly mutated hotspots, compare hotspot mutation burden between samples, and perform exploratory data analysis of the correlation between hotspot mutation burden and personal risk factors for cancer, such as age, gender, and history of carcinogen exposure. This package allows users to identify robust genomic markers to help establish cancer risk.

Maintained by Sydney Grant. Last updated 5 months ago.

software technology sequencing dnaseq wholegenome classification singlecell survival multiplecomparison

3.4 match 4.00 score 3 scripts

bioc

SigsPack:Mutational Signature Estimation for Single Samples

Single sample estimation of exposure to mutational signatures. Exposures to known mutational signatures are estimated for single samples, based on quadratic programming algorithms. Bootstrapping the input mutational catalogues provides estimations on the stability of these exposures. The effect of the sequence composition of mutational context can be taken into account by normalising the catalogues.

Maintained by Franziska Schumann. Last updated 5 months ago.

somaticmutation snp variantannotation biomedicalinformatics dnaseq

3.0 match 2 stars 4.30 score 4 scripts

pneuvial

adjclust:Adjacency-Constrained Clustering of a Block-Diagonal Similarity Matrix

Implements a constrained version of hierarchical agglomerative clustering, in which each observation is associated to a position, and only adjacent clusters can be merged. Typical application fields in bioinformatics include Genome-Wide Association Studies or Hi-C data analysis, where the similarity between items is a decreasing function of their genomic distance. Taking advantage of this feature, the implemented algorithm is time and memory efficient. This algorithm is described in Ambroise et al (2019) <doi:10.1186/s13015-019-0157-4>.

Maintained by Pierre Neuvial. Last updated 5 months ago.

clustering featureextraction gwas hi-c hierarchical-clustering linkage-disequilibrium cpp openmp

1.8 match 16 stars 7.35 score 13 scripts 2 dependents

bioc

isomiRs:Analyze isomiRs and miRNAs from small RNA-seq

Characterization of miRNAs and isomiRs, clustering and differential expression.

Maintained by Lorena Pantano. Last updated 5 months ago.

mirna rnaseq differentialexpression clustering immunooncology analyze-isomirs bioconductor isomirs

1.8 match 8 stars 7.09 score 43 scripts

ubcxzhang

iimi:Identifying Infection with Machine Intelligence

A novel machine learning method for plant virus diagnostics using genome sequencing data. This package includes three different machine learning models, random forest, XGBoost, and elastic net, to train and predict mapped genome samples. Mappability profile and unreliable regions are introduced to the algorithm, and users can build a mappability profile from scratch with functions included in the package. Plotting mapped sample coverage information is provided.

Maintained by Xuekui Zhang. Last updated 5 months ago.

4.9 match 2.60 score 5 scripts

kbhoehn

dowser:B Cell Receptor Phylogenetics Toolkit

Provides a set of functions for inferring, visualizing, and analyzing B cell phylogenetic trees. Provides methods to 1) reconstruct unmutated ancestral sequences, 2) build B cell phylogenetic trees using multiple methods, 3) visualize trees with metadata at the tips, 4) reconstruct intermediate sequences, 5) detect biased ancestor-descendant relationships among metadata types. Workflow examples are available at the documentation site (see URL). Citations: Hoehn et al (2022) <doi:10.1371/journal.pcbi.1009885>, Hoehn et al (2021) <doi:10.1101/2021.01.06.425648>.

Maintained by Kenneth Hoehn. Last updated 2 months ago.

1.7 match 6.81 score 84 scripts

bioc

Structstrings:Implementation of the dot bracket annotations with Biostrings

The Structstrings package implements the widely used dot bracket annotation for storing base pairing information in structured RNA. Structstrings uses the infrastructure provided by the Biostrings package and derives the DotBracketString and related classes from the BString class. From these, base pair tables can be produced for in depth analysis. In addition, the loop indices of the base pairs can be retrieved as well. For better efficiency, information conversion is implemented in C, inspired to a large extent by the ViennaRNA package.

Maintained by Felix G.M. Ernst. Last updated 5 months ago.

dataimport datarepresentation infrastructure sequencing software alignment sequencematching bioconductor rna rna-structural-analysis rna-structure sequences structures

1.8 match 4 stars 6.46 score 3 scripts 4 dependents
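
The conversion from a dot bracket string to a base pair table, as mentioned in the description above, is usually done with a stack: each opening bracket is pushed, and each closing bracket is paired with the most recent unmatched opener. The Python sketch below shows that general technique for round brackets only; the function name is hypothetical and the package's own C implementation and supported bracket types may differ.

```python
def dot_bracket_pairs(structure: str):
    """Return sorted 1-based (open, close) index pairs for a dot-bracket string.

    Only '(' , ')' and '.' are handled in this sketch.
    """
    stack = []
    pairs = []
    for pos, ch in enumerate(structure, start=1):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.append((stack.pop(), pos))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {pos}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


if __name__ == "__main__":
    print(dot_bracket_pairs("((..))..()"))
```

Each returned pair gives the positions of two nucleotides that are base paired; unpaired positions (dots) do not appear in the table.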

bioc

h5vc:Managing alignment tallies using a hdf5 backend

This package contains functions to interact with tally data from NGS experiments that is stored in HDF5 files.

Maintained by Paul Theodor Pyl. Last updated 2 months ago.

curl bzip2 xz-utils zlib cpp

2.5 match 4.48 score 2 scripts

bioc

IsoformSwitchAnalyzeR:Identify, Annotate and Visualize Isoform Switches with Functional Consequences from both short- and long-read RNA-seq data

Analysis of alternative splicing and isoform switches with predicted functional consequences (e.g. gain/loss of protein domains etc.) from quantification of all types of RNASeq by tools such as Kallisto, Salmon, StringTie, Cufflinks/Cuffdiff etc.

Maintained by Kristoffer Vitting-Seerup. Last updated 5 months ago.

geneexpression transcription alternativesplicing differentialexpression differentialsplicing visualization statisticalmethod transcriptomevariant biomedicalinformatics functionalgenomics systemsbiology transcriptomics rnaseq annotation functionalprediction geneprediction dataimport multiplecomparison batcheffect immunooncology

1.2 match 108 stars 9.26 score 125 scripts

bioc

motifmatchr:Fast Motif Matching in R

Quickly find motif matches for many motifs and many sequences. Wraps C++ code from the MOODS motif calling library, which was developed by Pasi Rastas, Janne Korhonen, and Petri Martinmäki.

Maintained by Alicia Schep. Last updated 5 months ago.

motifannotation cpp

1.3 match 8.12 score 722 scripts 5 dependents

r-forge

genoPlotR:Plot Publication-Grade Gene and Genome Maps

Draws gene or genome maps and comparisons between these, in a publication-grade manner. Starting from simple, common files, it will draw postscript or PDF files that can be sent as such to journals.

Maintained by Lionel Guy. Last updated 4 years ago.

1.9 match 5.33 score 106 scripts

bioc

SNPRelate:Parallel Computing Toolset for Relatedness and Principal Component Analysis of SNP Data

Genome-wide association studies (GWAS) are widely used to investigate the genetic basis of diseases and traits, but they pose many computational challenges. We developed an R package SNPRelate to provide a binary format for single-nucleotide polymorphism (SNP) data in GWAS utilizing CoreArray Genomic Data Structure (GDS) data files. The GDS format offers efficient operations specifically designed for integers with two bits, since a SNP could occupy only two bits. SNPRelate is also designed to accelerate two key computations on SNP data using parallel computing for multi-core symmetric multiprocessing computer architectures: Principal Component Analysis (PCA) and relatedness analysis using Identity-By-Descent measures. The SNP GDS format is also used by the GWASTools package with the support of S4 classes and generic functions. The extended GDS format is implemented in the SeqArray package to support the storage of single nucleotide variations (SNVs), insertion/deletion polymorphism (indel) and structural variation calls in whole-genome and whole-exome variant data.

Maintained by Xiuwen Zheng. Last updated 5 months ago.

infrastructure genetics statisticalmethod principalcomponent bioinformatics gds-format pca simd snp openblas cpp

0.8 match 104 stars 12.69 score 1.6k scripts 18 dependents
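
The description above notes that a SNP genotype can occupy only two bits. To illustrate why that matters for storage, the Python sketch below packs four genotype codes (0 to 3) into each byte and unpacks them again. The bit order and the function names are assumptions made for this example; this is not the GDS on-disk layout.

```python
def pack_genotypes(genotypes):
    """Pack genotype codes 0..3 into bytes, four per byte, lowest bits first."""
    out = bytearray((len(genotypes) + 3) // 4)
    for i, g in enumerate(genotypes):
        if not 0 <= g <= 3:
            raise ValueError(f"genotype code {g} does not fit in two bits")
        out[i // 4] |= g << (2 * (i % 4))
    return bytes(out)


def unpack_genotypes(packed, n):
    """Recover the first n two-bit genotype codes from packed bytes."""
    return [(packed[i // 4] >> (2 * (i % 4))) & 0b11 for i in range(n)]


if __name__ == "__main__":
    codes = [0, 1, 2, 3, 2]
    packed = pack_genotypes(codes)
    print(len(packed), unpack_genotypes(packed, len(codes)))
```

Compared with one byte per genotype, this layout uses a quarter of the space, at the cost of a shift and mask on every access.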

bioc

R4RNA:An R package for RNA visualization and analysis

A package for RNA basepair analysis, including the visualization of basepairs as arc diagrams for easy comparison and annotation of sequence and structure. Arc diagrams can additionally be projected onto multiple sequence alignments to assess basepair conservation and covariation, with numerical methods for computing statistics for each.

Maintained by Daniel Lai. Last updated 5 months ago.

alignment multiplesequencealignment preprocessing visualization dataimport datarepresentation multiplecomparison

1.8 match 5.36 score 19 scripts 4 dependents

jdieramon

refseqR:Common Computational Operations Working with RefSeq Entries (GenBank)

Fetches NCBI data (RefSeq <https://www.ncbi.nlm.nih.gov/refseq/> database) and provides an environment to extract information at the level of gene, mRNA or protein accessions.

Maintained by Jose V. Die. Last updated 3 months ago.

1.8 match 4 stars 5.34 score 5 scripts

bioc

wavClusteR:Sensitive and highly resolved identification of RNA-protein interaction sites in PAR-CLIP data

The package provides an integrated pipeline for the analysis of PAR-CLIP data. PAR-CLIP-induced transitions are first discriminated from sequencing errors, SNPs and additional non-experimental sources by a non-parametric mixture model. The protein binding sites (clusters) are then resolved at high resolution and cluster statistics are estimated using a rigorous Bayesian framework. Post-processing of the results, data export for UCSC genome browser visualization and motif search analysis are provided. In addition, the package allows integration of RNA-Seq data to estimate the False Discovery Rate of cluster detection. Key functions support parallel multicore computing. Note: while wavClusteR was designed for PAR-CLIP data analysis, it can be applied to the analysis of other NGS data obtained from experimental procedures that induce nucleotide substitutions (e.g. BisSeq).

Maintained by Federico Comoglio. Last updated 5 months ago.

immunooncology sequencing technology ripseq rnaseq bayesian

2.0 match 4.60 score 3 scripts

empiricalbayes

SNVLFDR:Empirical Bayes Single Nucleotide Variant Calling

Identifies single nucleotide variants in next-generation sequencing data by estimating their local false discovery rates. For more details, see Karimnezhad, A. and Perkins, T. J. (2024) <doi:10.1038/s41598-024-51958-z>.

Maintained by Ali Karimnezhad. Last updated 1 year ago.

3.4 match 2.70 score

arindamroychoudhury

rapidphylo:Rapidly Estimates Phylogeny from Large Allele Frequency Data Using Root Distances Method

Rapidly estimates tree-topology from large allele frequency data using Root Distances Method, under a Brownian Motion Model. See Peng et al. (2021) <doi:10.1016/j.ympev.2021.107142>.

Maintained by Arindam RoyChoudhury. Last updated 2 years ago.

3.4 match 2.70 score

bioc

PWMEnrich:PWM enrichment analysis

A toolkit of high-level functions for DNA motif scanning and enrichment analysis built upon Biostrings. The main functionality is PWM enrichment analysis of already known PWMs (e.g. from databases such as MotifDb), but the package also implements high-level functions for PWM scanning and visualisation. The package does not perform "de novo" motif discovery, but is instead focused on using motifs that are either experimentally derived or computationally constructed by other tools.

Maintained by Diego Diez. Last updated 5 months ago.

motifannotation sequencematching software

1.8 match 5.08 score 60 scripts

emmanuelparadis

ape:Analyses of Phylogenetics and Evolution

Functions for reading, writing, plotting, and manipulating phylogenetic trees, analyses of comparative data in a phylogenetic framework, ancestral character analyses, analyses of diversification and macroevolution, computing distances from DNA sequences, reading and writing nucleotide sequences as well as importing from BioConductor, and several tools such as Mantel's test, generalized skyline plots, graphical exploration of phylogenetic data (alex, trex, kronoviz), estimation of absolute evolutionary rates and clock-like trees using mean path lengths and penalized likelihood, dating trees with non-contemporaneous sequences, translating DNA into AA sequences, and assessing sequence alignments. Phylogeny estimation can be done with the NJ, BIONJ, ME, MVR, SDM, and triangle methods, and several methods handling incomplete distance matrices (NJ*, BIONJ*, MVR*, and the corresponding triangle method). Some functions call external applications (PhyML, Clustal, T-Coffee, Muscle) whose results are returned into R.

Maintained by Emmanuel Paradis. Last updated 14 hours ago.

openblas cpp

0.5 match 64 stars 17.22 score 13k scripts 599 dependents

bioc

MassArray:Analytical Tools for MassArray Data

This package is designed for the import, quality control, analysis, and visualization of methylation data generated using Sequenom's MassArray platform. The tools herein contain a highly detailed amplicon prediction for optimal assay design. Also included are quality control measures of data, such as primer dimer and bisulfite conversion efficiency estimation. Methylation data are calculated using the same algorithms contained in the EpiTyper software package. Additionally, automatic SNP-detection can be used to flag potentially confounded data from specific CG sites. Visualization includes barplots of methylation data as well as UCSC Genome Browser-compatible BED tracks. Multiple assays can be positionally combined for integrated analysis.

Maintained by Reid F. Thompson. Last updated 5 months ago.

immunooncology dnamethylation snp massspectrometry genetics dataimport visualization

2.0 match 4.30 score 1 script

bioc

mobileRNA:mobileRNA: Investigate the RNA mobilome & population-scale changes

Genomic analysis can be utilised to identify differences between RNA populations in two conditions, both in production and abundance. This includes the identification of RNAs produced by multiple genomes within a biological system. For example, RNA produced by pathogens within a host or mobile RNAs in plant graft systems. The mobileRNA package provides methods to pre-process, analyse and visualise the sRNA and mRNA populations based on the premise of mapping reads to all genotypes at the same time.

Maintained by Katie Jeynes-Cupper. Last updated 5 months ago.

visualization rnaseq sequencing smallrna genomeassembly clustering experimentaldesign qualitycontrol workflowstep alignment preprocessing bioinformatics plant-science

1.7 match 4 stars 5.00 score 2 scripts

connor-reid-tiffany

omu:A Metabolomics Analysis Tool for Intuitive Figures and Convenient Metadata Collection

Facilitates the creation of intuitive figures to describe metabolomics data by utilizing Kyoto Encyclopedia of Genes and Genomes (KEGG) hierarchy data, and gathers functional orthology and gene data from the KEGG-REST API.

Maintained by Connor Tiffany. Last updated 1 year ago.

1.8 match 3 stars 4.89 score 52 scripts

bioc

dStruct:Identifying differentially reactive regions from RNA structurome profiling data

dStruct identifies differentially reactive regions from RNA structurome profiling data. dStruct is compatible with a broad range of structurome profiling technologies, e.g., SHAPE-MaP, DMS-MaPseq, Structure-Seq, SHAPE-Seq, etc. See Choudhary et al., Genome Biology, 2019 for the underlying method.

Maintained by Krishna Choudhary. Last updated 5 months ago.

statisticalmethod structuralprediction sequencing software

1.8 match 2 stars 4.86 score 12 scripts

bioc

VariantTools:Tools for Exploratory Analysis of Variant Calls

Explore, diagnose, and compare variant calls using filters.

Maintained by Michael Lawrence. Last updated 5 months ago.

genetics geneticvariability sequencing

2.0 match 4.09 score 41 scripts

bioc

SynMut:SynMut: Designing Synonymously Mutated Sequences with Different Genomic Signatures

There is increasing demand for designing virus mutants with specific dinucleotide or codon composition. This tool can take dinucleotide preference and/or codon usage bias into account while designing mutants. It is a powerful tool for in silico design of DNA sequence mutants.

Maintained by Haogao Gu. Last updated 5 months ago.

sequencematching experimentaldesign preprocessing

1.9 match 2 stars 4.30 score 1 script

pboutros

SeqKat:Detection of Kataegis

Kataegis is a localized hypermutation occurring when a region is enriched in somatic SNVs. Kataegis can result from multiple cytosine deaminations catalyzed by the AID/APOBEC family of proteins. This package contains functions to detect kataegis from SNVs in BED format. This package reports two scores per kataegic event, a hypermutation score and an APOBEC-mediated kataegic score. Yousif, F. et al.; The Origins and Consequences of Localized and Global Somatic Hypermutation; bioRxiv 2018 <doi:10.1101/287839>.

Maintained by Paul C. Boutros. Last updated 5 years ago.

cpp

3.8 match 2.11 score 13 scripts

bioc

hiReadsProcessor:Functions to process LM-PCR reads from 454/Illumina data

hiReadsProcessor contains a set of functions which allow users to process LM-PCR products sequenced using any platform. Given an Excel/txt file containing parameters for demultiplexing and sample metadata, the functions automate trimming of adaptors and identification of the genomic product. Genomic products are further processed for QC and abundance quantification.

Maintained by Nirav V Malani. Last updated 5 months ago.

sequencing preprocessing

1.9 match 4.18 score 7 scripts

bioc

R453Plus1Toolbox:A package for importing and analyzing data from Roche's Genome Sequencer System

The R453Plus1 Toolbox comprises useful functions for the analysis of data generated by Roche's 454 sequencing platform. It adds functions for quality assurance as well as for annotation and visualization of detected variants, complementing the software tools shipped by Roche with their product. Further, a pipeline for the detection of structural variants is provided.

Maintained by Hans-Ulrich Klein. Last updated 5 months ago.

sequencing infrastructure dataimport datarepresentation visualization qualitycontrol reportwriting

2.3 match 3.48 score 10 scripts

greifflab

immuneSIM:Tunable Simulation of B- And T-Cell Receptor Repertoires

Simulate full B-cell and T-cell receptor repertoires using an in silico recombination process that includes a wide variety of tunable parameters to introduce noise and biases. Additional post-simulation modification functions allow the user to implant motifs or codon biases as well as to remodel sequence similarity architecture. The output repertoires contain records of all relevant repertoire dimensions and can be analyzed using provided repertoire analysis functions. Preprint is available at bioRxiv (Weber et al., 2019 <doi:10.1101/759795>).

Maintained by Cédric R. Weber. Last updated 1 year ago.

1.8 match 37 stars 4.44 score 15 scripts

stitam

webseq:Access data from biological sequence databases like NCBI, ENA, MGnify

This package interacts with online biological sequence databases. It provides functions to search for sequences, convert identifiers and download sequences and associated metadata.

Maintained by Tamas Stirling. Last updated 1 months ago.

1.7 match 3 stars 4.13 score 1 script

bioc

GenomicAlignments:Representation and manipulation of short genomic alignments

Provides efficient containers for storing and manipulating short genomic alignments (typically obtained by aligning short reads to a reference genome). This includes read counting, computing the coverage, junction detection, and working with the nucleotide content of the alignments.

Maintained by Hervé Pagès. Last updated 5 months ago.

infrastructure dataimport genetics sequencing rnaseq snp coverage alignment immunooncology bioconductor-package core-package

0.5 match 10 stars 13.61 score 3.1k scripts 529 dependents

bioc

Motif2Site:Detect binding sites from motifs and ChIP-seq experiments, and compare binding sites across conditions

Detect binding sites using motif IUPAC sequences or BED coordinates and ChIP-seq experiments in BED or BAM format. Combine/compare binding sites across experiments, tissues, or conditions. All normalization and differential steps are done using the TMM-GLM method. Signal decomposition is done by setting motifs as the centers of the mixture of normal distribution curves.

Maintained by Peyman Zarrineh. Last updated 5 months ago.

software sequencing chipseq differentialpeakcalling epigenetics sequencematching

1.7 match 4.00 score 3 scripts

cran

IPCAPS:Iterative Pruning to Capture Population Structure

An unsupervised clustering algorithm based on iterative pruning for capturing population structure. This version supports ordinal data, which can be applied directly to SNP data to identify fine-level population structure, and it is built on the iterative pruning Principal Component Analysis ('ipPCA') algorithm as explained in Intarapanich et al. (2009) <doi:10.1186/1471-2105-10-382>. 'IPCAPS' involves an iterative process using multiple splits based on multivariate Gaussian mixture modeling of principal components and 'Expectation-Maximization' clustering as explained in Lebret et al. (2015) <doi:10.18637/jss.v067.i06>. In each iteration, rough clusters and outliers are also identified using the function rubikclust() from the R package 'KRIS'.

Maintained by Kridsadakorn Chaichoompu. Last updated 4 years ago.

3.4 match 2.00 score 10 scripts

bioc

GeoTcgaData:Processing Various Types of Data on GEO and TCGA

Gene Expression Omnibus (GEO) and The Cancer Genome Atlas (TCGA) provide us with a wealth of data, such as RNA-seq, DNA methylation, SNP and copy number variation data. It's easy to download data from TCGA using the gdc tool, but processing these data into a format suitable for bioinformatics analysis requires more work. This R package was developed to handle these data.

Maintained by Erqiang Hu. Last updated 5 months ago.

geneexpression differentialexpression rnaseq copynumbervariation microarray software dnamethylation differentialmethylation snp atacseq methylationarray

1.2 match 25 stars 5.85 score 19 scripts

bioc

crisprShiny:Exploring curated CRISPR gRNAs via Shiny

Provides means to interactively visualize guide RNAs (gRNAs) in GuideSet objects via a Shiny application. This GUI can be self-contained or used as a module within a larger Shiny app. The content of the app reflects the annotations present in the passed GuideSet object, and includes intuitive tools to examine, filter, and export gRNAs, thereby making gRNA design more user-friendly.

Maintained by Jean-Philippe Fortin. Last updated 5 months ago.

crispr functionalgenomics genetarget gui crispr-analysis crispr-design shiny

1.5 match 2 stars 4.48 score 8 scripts

stephenturner

string2dna:Encode/Decode Strings as Nucleotide Sequences

Encode strings as nucleotide sequences and decode nucleotide sequences into strings.

Maintained by Stephen Turner. Last updated 2 years ago.

3.9 match 1 star 1.70 score

cnuge

coil:Contextualization and Evaluation of COI-5P Barcode Data

Designed for the cleaning, contextualization and assessment of cytochrome c oxidase I DNA barcode data (COI-5P, or the five prime portion of COI). It contains functions for placing COI-5P barcode sequences into a common reading frame, translating DNA sequences to amino acids and for assessing the likelihood that a given barcode sequence includes an insertion or deletion error. The error assessment relies on the comparison of input sequences against nucleotide and amino acid profile hidden Markov models (PHMMs) (for details see Durbin et al. 1998, ISBN: 9780521629713) trained on a taxonomically diverse set of reference sequences. The functions are provided as a complete pipeline and are also available individually for efficient and targeted analysis of barcode data.

Maintained by Cameron M. Nugent. Last updated 1 year ago.

2.3 match 2.88 score 15 scripts

bioc

XNAString:Efficient Manipulation of Modified Oligonucleotide Sequences

The XNAString package allows for description of base sequences and associated chemical modifications in a single object. XNAString is able to capture single-stranded as well as double-stranded molecules. Chemical modifications are represented as independent strings associated with different features of the molecules (base sequence, sugar sequence, backbone sequence, modifications) and can be read from or written to HELM notation. It also enables secondary structure prediction using RNAfold from ViennaRNA. XNAString is designed to be an efficient representation of nucleic-acid-based therapeutics; it therefore stores information about target sequences and provides an interface for matching and alignment functions from the Biostrings and pwalign packages.

Maintained by Marianna Plucinska. Last updated 5 months ago.

sequencematching alignment sequencing genetics cpp

1.5 match 4.18 score 4 scripts

bioc

gdsfmt:R Interface to CoreArray Genomic Data Structure (GDS) Files

Provides a high-level R interface to CoreArray Genomic Data Structure (GDS) data files. GDS is portable across platforms with a hierarchical structure to store multiple scalable array-oriented data sets with metadata information. It is suited for large-scale datasets, especially for data which are much larger than the available random-access memory. The gdsfmt package offers efficient operations specifically designed for integers of less than 8 bits, since a diploid genotype, like a single-nucleotide polymorphism (SNP), usually occupies fewer bits than a byte. Data compression and decompression are available with relatively efficient random access. A GDS file can also be read in parallel by multiple R processes, supported by the package parallel.

Maintained by Xiuwen Zheng. Last updated 2 days ago.

infrastructure dataimport bioinformatics gds-format genomics cpp

0.5 match 18 stars 11.34 score 920 scripts 29 dependents

urodelan

LocaTT:Geographically-Conscious Taxonomic Assignment for Metabarcoding

A bioinformatics pipeline for performing taxonomic assignment of DNA metabarcoding sequence data while considering geographic location. A detailed tutorial is available at <https://urodelan.github.io/Local_Taxa_Tool_Tutorial/>. A manuscript describing these methods is in preparation.

Maintained by Kenen Goodwin. Last updated 12 months ago.

1.8 match 3.00 score

bioc

PureCN:Copy number calling and SNV classification using targeted short read sequencing

This package estimates tumor purity, copy number, and loss of heterozygosity (LOH), and classifies single nucleotide variants (SNVs) by somatic status and clonality. PureCN is designed for targeted short read sequencing data, integrates well with standard somatic variant detection and copy number pipelines, and has support for tumor samples without matching normal samples.

Maintained by Markus Riester. Last updated 2 months ago.

copynumbervariation software sequencing variantannotation variantdetection coverage immunooncology bioconductor-package cell-free-dna copy-number loh tumor-heterogeneity tumor-mutational-burden tumor-purity

0.5 match 132 stars 9.72 score 40 scripts

shixiangwang

sigminer:Extract, Analyze and Visualize Mutational Signatures for Genomic Variations

Genomic alterations, including single nucleotide substitution, copy number alteration, etc., are the major force for cancer initiation and development. Due to the specificity of molecular lesions caused by genomic alterations, we can generate characteristic alteration spectra, called 'signatures' (Wang, Shixiang, et al. (2021) <DOI:10.1371/journal.pgen.1009557> & Alexandrov, Ludmil B., et al. (2020) <DOI:10.1038/s41586-020-1943-3> & Steele Christopher D., et al. (2022) <DOI:10.1038/s41586-022-04738-6>). This package helps users to extract, analyze and visualize signatures from genomic alteration records, thus providing new insight into cancer study.

Maintained by Shixiang Wang. Last updated 5 months ago.

bayesian-nmf bioinformatics cancer-research cnv copynumber-signatures cosmic-signatures dbs easy-to-use indel mutational-signatures nmf nmf-extraction sbs signature-extraction somatic-mutations somatic-variants visualization cpp

0.5 match 150 stars 9.48 score 123 scripts 2 dependents

bioc

seqArchRplus:Downstream analyses of promoter sequence architectures and HTML report generation

seqArchRplus facilitates downstream analyses of promoter sequence architectures/clusters identified by seqArchR (or any other tool/method). With additional available information such as the TPM values and interquantile widths (IQWs) of the CAGE tag clusters, seqArchRplus can order the input promoter clusters by their shape (IQWs), and write the cluster information as browser/IGV track files. Provided visualizations are of two kinds: per sample/stage and per cluster visualizations. Those of the first kind include plot panels for each sample showing per cluster shape, TPM and other score distributions, sequence logos, and peak annotations. Those of the second kind include per cluster chromosome-wise and strand distributions, motif occurrence heatmaps and GO term enrichments. Additionally, seqArchRplus can generate HTML reports for easy viewing and comparison of promoter architectures between samples/stages.

Maintained by Sarvesh Nikumbh. Last updated 5 months ago.

annotation visualization reportwriting go motifannotation clustering

1.2 match 1 star 4.00 score 2 scripts

grafxzahl

genBaRcode:Analysis and Visualization Tools for Genetic Barcode Data

Provides the necessary functions to identify and extract a selection of already available barcode constructs (Cornils, K. et al. (2014) <doi:10.1093/nar/gku081>) and freely choosable barcode designs from next-generation sequencing (NGS) data. Furthermore, it can account for sequence errors, calculates barcode similarities and provides a variety of visualisation tools (Thielecke, L. et al. (2017) <doi:10.1038/srep43249>).

Maintained by Lars Thielecke. Last updated 6 days ago.

2.0 match 2.30 score 6 scripts

green-striped-gecko

dartR.popgen:Analysing 'SNP' and 'Silicodart' Data Generated by Genome-Wide Restriction Fragment Analysis

Facilitates the analysis of SNP (single nucleotide polymorphism) and silicodart (presence/absence) data. 'dartR.popgen' provides a suite of functions to analyse such data in a population genetics context. It provides several functions to calculate population genetic metrics and to study population structure. Several functions need additional software to be able to run (gl.run.structure(), gl.blast(), gl.LDNe()). The help pages describe in detail how to download and link the packages so that these functions can run the software. 'dartR.popgen' is part of the 'dartRverse' suite of packages. Gruber et al. (2018) <doi:10.1111/1755-0998.12745>. Mijangos et al. (2022) <doi:10.1111/2041-210X.13918>.

Maintained by Bernd Gruber. Last updated 9 months ago.

2.2 match 2.00 score 9 scripts

fischuu

hoardeR:Collect and Retrieve Annotation Data for Various Genomic Data Using Different Webservices

Cross-species identification of novel gene candidates using the NCBI web service is provided. Further, sets of miRNA target genes can be identified by using the targetscan.org API.

Maintained by Daniel Fischer. Last updated 11 months ago.

1.2 match 1 star 3.70 score 6 scripts

isubirana

compareGroups:Descriptive Analysis by Groups

Create data summaries for quality control, extensive reports for exploring data, as well as publication-ready univariate or bivariate tables in several formats (plain text, HTML, LaTeX, PDF, Word or Excel). Create figures to quickly visualise the distribution of your data (boxplots, barplots, normality-plots, etc.). Display statistics (mean, median, frequencies, incidences, etc.). Perform the appropriate tests (t-test, Analysis of variance, Kruskal-Wallis, Fisher, log-rank, ...) depending on the nature of the described variable (normal, non-normal or qualitative). Summarize genetic data (Single Nucleotide Polymorphisms), displaying allele frequencies and performing Hardy-Weinberg Equilibrium tests among other typical statistics and tests for this kind of data.

Maintained by Isaac Subirana. Last updated 19 days ago.

comparegroups descriptive-statistics plot report table

0.5 match 35 stars 8.51 score 396 scripts 1 dependent

nirmalaruban

geneNR:Automated Gene Identification for Post-GWAS Analysis

Facilitates post-Genome Wide Association Studies (GWAS) analysis by identifying candidate genes within a user-defined search window, based on the identified Single Nucleotide Polymorphisms (SNPs), as given by Mazumder AK (2024) <doi:10.1038/s41598-024-66903-3>. It supports candidate gene analysis for wheat and rice. Import your GWAS result as explained in the sample_data file, and the function performs the otherwise manual search and retrieves candidate genes for you, exporting the results into ready-to-use output.

Maintained by Rajamani Nirmalaruban. Last updated 6 days ago.

2.1 match 2.00 score

cran

GenomicSig:Computation of Genomic Signatures

Genomic signatures represent unique features within a species' DNA, enabling the differentiation of species and offering broad applications across various fields. This package provides essential tools for calculating these specific signatures, streamlining the process for researchers and offering a comprehensive and time-saving solution for genomic analysis. The amino acid contents are identified based on the work published by Sandberg et al. (2003) <doi:10.1016/s0378-1119(03)00581-x> and Xiao et al. (2015) <doi:10.1093/bioinformatics/btv042>. The Average Mutual Information Profiles (AMIP) values are calculated based on the work of Bauer et al. (2008) <doi:10.1186/1471-2105-9-48>. The Chaos Game Representation (CGR) plot visualization was done based on the work of Deschavanne et al. (1999) <doi:10.1093/oxfordjournals.molbev.a026048> and Jeffrey et al. (1990) <doi:10.1093/nar/18.8.2163>. The GC content is calculated based on the work published by Nakabachi et al. (2006) <doi:10.1126/science.1134196> and Barbu et al. (1956) <https://pubmed.ncbi.nlm.nih.gov/13363015>. The Oligonucleotide Frequency Derived Error Gradient (OFDEG) values are computed based on the work published by Saeed et al. (2009) <doi:10.1186/1471-2164-10-S3-S10>. The Relative Synonymous Codon Usage (RSCU) values are calculated based on the work published by Elek (2018) <https://urn.nsk.hr/urn:nbn:hr:217:686131>.

Maintained by Anu Sharma. Last updated 6 months ago.

4.0 match 1.00 score

bioc

signeR:Empirical Bayesian approach to mutational signature discovery

The signeR package provides an empirical Bayesian approach to mutational signature discovery. It is designed to analyze single nucleotide variation (SNV) counts in cancer genomes, but can also be applied to other features. Functionalities to characterize signatures or genome samples according to exposure patterns are also provided.

Maintained by Renan Valieris. Last updated 5 months ago.

genomicvariation somaticmutation statisticalmethod visualization bioconductor bioinformatics openblas cpp

0.5 match 13 stars 7.67 score 22 scripts

cran

SeqFeatR:A Tool to Associate FASTA Sequences and Features

Provides user-friendly methods for the identification of sequence patterns that are statistically significantly associated with a property of the sequence. For instance, SeqFeatR allows the identification of viral immune escape mutations for hosts of given HLA types. The underlying statistical method is Fisher's exact test, with appropriate corrections for multiple testing, or Bayes. Patterns may be point mutations or n-tuples of mutations. SeqFeatR offers several ways to visualize the results of the statistical analyses; see Budeus (2016) <doi:10.1371/journal.pone.0146409>.

Maintained by Bettina Budeus. Last updated 6 years ago.

jags cpp

1.7 match 2.30 score

luisekuehn

quiddich:QUick IDentification of DIagnostic CHaracters

Provides tools for the automated identification of diagnostic molecular characters, i.e. columns in a given nucleotide or amino acid alignment that allow taxa to be distinguished from each other. These characters can then be used to complement the formal descriptions of the taxa, which are often based on morphological and anatomical features. This is especially helpful for morphologically cryptic species. QUIDDICH distinguishes between four different types of diagnostic characters. For more information, see "Kuehn, A.L., Haase, M. 2019. QUIDDICH: QUick IDentification of DIagnostic CHaracters."

Maintained by A. Luise Kuehn. Last updated 6 years ago.

3.8 match 1.00 score

steffenmoritz

ridge:Ridge Regression with Automatic Selection of the Penalty Parameter

Linear and logistic ridge regression functions. Additionally includes special functions for genome-wide single-nucleotide polymorphism (SNP) data. More details can be found in <doi:10.1002/gepi.21750> and <doi:10.1186/1471-2105-12-372>.

Maintained by Steffen Moritz. Last updated 3 years ago.

regression ridge-regression gsl

0.5 match 18 stars 7.24 score 124 scripts 2 dependents

bioc

MutationalPatterns:Comprehensive genome-wide analysis of mutational processes

Mutational processes leave characteristic footprints in genomic DNA. This package provides a comprehensive set of flexible functions that allows researchers to easily evaluate and visualize a multitude of mutational patterns in base substitution catalogues of e.g. healthy samples, tumour samples, or DNA-repair deficient cells. The package covers a wide range of patterns including: mutational signatures, transcriptional and replicative strand bias, lesion segregation, genomic distribution and association with genomic features, which are collectively meaningful for studying the activity of mutational processes. The package works with single nucleotide variants (SNVs), insertions and deletions (Indels), double base substitutions (DBSs) and larger multi base substitutions (MBSs). The package provides functionalities for both extracting mutational signatures de novo and determining the contribution of previously identified mutational signatures on a single sample level. MutationalPatterns integrates with common R genomic analysis workflows and allows easy association with (publicly available) annotation data.

Maintained by Mark van Roosmalen. Last updated 5 months ago.

genetics somaticmutation

0.5 match 7.27 score 251 scripts 1 dependent

ikwak2

aSPU:Adaptive Sum of Powered Score Test

R code for the (adaptive) Sum of Powered Score ('SPU' and 'aSPU') tests, inverse variance weighted Sum of Powered Score ('SPUw' and 'aSPUw') tests, and gene-based and some pathway-based association tests (pathway-based Sum of Powered Score tests ('SPUpath'), adaptive 'SPUpath' ('aSPUpath') test, 'GEEaSPU' test for multiple traits-single 'SNP' (single nucleotide polymorphism) association in generalized estimation equations, 'MTaSPUs' test for multiple traits-single 'SNP' association with Genome Wide Association Studies ('GWAS') summary statistics, Gene-based Association Test that uses an extended 'Simes' procedure ('GATES'), Hybrid Set-based Test ('HYST') and an extended version of the 'GATES' test for pathway-based association testing ('GATES-Simes')). The tests can be used with genetic and other data sets with covariates. The response variable is binary or quantitative. Summary: (1) Single trait-'SNP' set association with individual-level data ('aSPU', 'aSPUw', 'aSPUr'), (2) Single trait-'SNP' set association with summary statistics ('aSPUs'), (3) Single trait-pathway association with individual-level data ('aSPUpath'), (4) Single trait-pathway association with summary statistics ('aSPUsPath'), (5) Multiple traits-single 'SNP' association with individual-level data ('GEEaSPU'), (6) Multiple traits-single 'SNP' association with summary statistics ('MTaSPUs'), (7) Multiple traits-'SNP' set association with summary statistics ('MTaSPUsSet'), (8) Multiple traits-pathway association with summary statistics ('MTaSPUsSetPath').

Maintained by Il-Youp Kwak. Last updated 4 years ago.

0.5 match 12 stars 7.18 score 42 scripts 1 dependent

bioc

SomaticSignatures:Somatic Signatures

The SomaticSignatures package identifies mutational signatures of single nucleotide variants (SNVs). It provides an infrastructure related to the methodology described in Nik-Zainal (2012, Cell), with flexibility in the matrix decomposition algorithms.

Maintained by Julian Gehring. Last updated 5 months ago.

sequencing somaticmutation visualization clustering genomicvariation statisticalmethod

0.5 match 22 stars 6.85 score 54 scripts 1 dependent

cran

spgs:Statistical Patterns in Genomic Sequences

A collection of statistical hypothesis tests and other techniques for identifying certain spatial relationships/phenomena in DNA sequences. In particular, it provides tests and graphical methods for determining whether or not DNA sequences comply with Chargaff's second parity rule or exhibit purine-pyrimidine parity. In addition, there are functions for efficiently simulating discrete state space Markov chains and testing arbitrary symbolic sequences for the presence of first-order Markovianness. It also has functions for counting words/k-mers (and cylinder patterns) in arbitrary symbolic sequences. Functions which take a DNA sequence as input can handle sequences stored as SeqFastadna objects from the 'seqinr' package.

Maintained by Andrew Hart. Last updated 1 year ago.

1.7 match 1.98 score 96 scripts

mpierrejean

jointseg:Joint Segmentation of Multivariate (Copy Number) Signals

Methods for fast segmentation of multivariate signals into piecewise constant profiles and for generating realistic copy-number profiles. A typical application is the joint segmentation of total DNA copy numbers and allelic ratios obtained from Single Nucleotide Polymorphism (SNP) microarrays in cancer studies. The methods are described in Pierre-Jean, Rigaill and Neuvial (2015) <doi:10.1093/bib/bbu026>.

Maintained by Morgane Pierre-Jean. Last updated 6 years ago.

cpp

0.5 match 6 stars 6.50 score 44 scripts 2 dependents

bioc

TRONCO:TRONCO, an R package for TRanslational ONCOlogy

The TRONCO (TRanslational ONCOlogy) R package collects algorithms to infer progression models via the approach of Suppes-Bayes Causal Network, both from an ensemble of tumors (cross-sectional samples) and within an individual patient (multi-region or single-cell samples). The package provides parallel implementation of algorithms that process binary matrices where each row represents a tumor sample and each column a single-nucleotide or a structural variant driving the progression; a 0/1 value models the absence/presence of that alteration in the sample. The tool can import data from plain, MAF or GISTIC format files, and can fetch it from the cBioPortal for cancer genomics. Functions for data manipulation and visualization are provided, as well as functions to import/export such data to other bioinformatics tools for, e.g., clustering or detection of mutually exclusive alterations. Inferred models can be visualized and tested for their confidence via bootstrap and cross-validation. TRONCO is used for the implementation of the Pipeline for Cancer Inference (PICNIC).

Maintained by Luca De Sano. Last updated 5 months ago.

biomedicalinformatics bayesian graphandnetwork somaticmutation networkinference network clustering dataimport singlecell immunooncology algorithms cancer-inference tumors

0.5 match 30 stars 6.50 score 38 scripts

pbreheny

plmmr:Penalized Linear Mixed Models for Correlated Data

Fits penalized linear mixed models that correct for unobserved confounding factors. 'plmmr' infers and corrects for the presence of unobserved confounding effects such as population stratification and environmental heterogeneity. It then fits a linear model via penalized maximum likelihood. Originally designed for the multivariate analysis of single nucleotide polymorphisms (SNPs) measured in a genome-wide association study (GWAS), 'plmmr' eliminates the need for subpopulation-specific analyses and post-analysis p-value adjustments. Functions for the appropriate processing of 'PLINK' files are also supplied. For examples, see the package homepage. <https://pbreheny.github.io/plmmr/>.

Maintained by Patrick J. Breheny. Last updated 11 days ago.

cpp openmp

0.5 match 4 stars 6.31 score 10 scripts

evolecolgroup

tidypopgen:Tidy Population Genetics

We provide a tidy grammar of population genetics, facilitating the manipulation and analysis of data on biallelic single nucleotide polymorphisms (SNPs).

Maintained by Andrea Manica. Last updated 3 days ago.

openblas zlib cpp openmp

0.5 match 4 stars 5.83 score 8 scripts

kosukehamazaki

RAINBOWR:Genome-Wide Association Study with SNP-Set Methods

By using 'RAINBOWR' (Reliable Association INference By Optimizing Weights with R), users can test multiple SNPs (Single Nucleotide Polymorphisms) simultaneously by kernel-based (SNP-set) methods. This package can also be applied to haplotype-based GWAS (Genome-Wide Association Study). Users can test not only additive effects but also dominance and epistatic effects. For details, please see our paper in PLOS Computational Biology: Kosuke Hamazaki and Hiroyoshi Iwata (2020) <doi:10.1371/journal.pcbi.1007663>.

Maintained by Kosuke Hamazaki. Last updated 3 months ago.

cpp

0.5 match 22 stars 5.99 score 22 scripts

bioc

muscle:Multiple Sequence Alignment with MUSCLE

MUSCLE performs multiple sequence alignments of nucleotide or amino acid sequences.

Maintained by Alex T. Kalinka. Last updated 5 months ago.

multiplesequencealignment alignment sequencing genetics sequencematching dataimport cpp

0.6 match 5.21 score 81 scripts

bioc

CNVfilteR:Identifies false positives of CNV calling tools by using SNV calls

CNVfilteR identifies those CNVs that can be discarded by using the single nucleotide variant (SNV) calls that are usually obtained in common NGS pipelines.

Maintained by Jose Marcos Moreno-Cabrera. Last updated 5 months ago.

copynumbervariation sequencing dnaseq visualization dataimport

0.5 match 5 stars 5.18 score 1 scripts

bioc

periodicDNA:Set of tools to identify periodic occurrences of k-mers in DNA sequences

This R package helps the user identify k-mers (e.g. di- or tri-nucleotides) present periodically in a set of genomic loci (typically regulatory elements). The functions of this package provide a straightforward approach to find periodic occurrences of k-mers in DNA sequences, such as regulatory elements. It is not aimed at identifying motifs separated by a conserved distance; for this type of analysis, please visit the MEME website.

Maintained by Jacques Serizay. Last updated 5 months ago.

sequencematching motifdiscovery motifannotation sequencing coverage alignment dataimport

0.5 match 6 stars 5.26 score 5 scripts

signaturescience

skater:Utilities for SNP-Based Kinship Analysis

Utilities for single nucleotide polymorphism (SNP) based kinship analysis testing and evaluation. The 'skater' package contains functions for importing, parsing, and analyzing pedigree data, performing relationship degree inference, benchmarking relationship degree classification, and summarizing identity by descent (IBD) segment data. Package functions and methods are described in Turner et al. (2021) "skater: An R package for SNP-based Kinship Analysis, Testing, and Evaluation" <doi:10.1101/2021.07.21.453083>.

Maintained by Stephen Turner. Last updated 2 years ago.

0.5 match 9 stars 5.26 score 7 scripts

bioc

supersigs:Supervised mutational signatures

Generate SuperSigs (supervised mutational signatures) from single nucleotide variants in the cancer genome. Functions included in the package allow the user to learn supervised mutational signatures from their data and apply them to new data. The methodology is based on the one described in Afsari (2021, ELife).

Maintained by Albert Kuo. Last updated 5 months ago.

featureextraction classification regression sequencing wholegenome somaticmutation

0.5 match 3 stars 4.78 score 3 scripts

bioc

customProDB:Generate customized protein database from NGS data, with a focus on RNA-Seq data, for proteomics search

Database search is the most widely used approach for peptide and protein identification in mass spectrometry-based proteomics studies. Our previous study showed that sample-specific protein databases derived from RNA-Seq data can better approximate the real protein pools in the samples and thus improve protein identification. More importantly, single nucleotide variations, short insertions and deletions and novel junctions identified from RNA-Seq data make the protein database more complete and sample-specific. Here, we report an R package customProDB that enables the easy generation of customized databases from RNA-Seq data for proteomics search. This work bridges genomics and proteomics studies and facilitates cross-omics data integration.

Maintained by Xiaojing Wang. Last updated 5 months ago.

immunooncology sequencing massspectrometry proteomics snp rnaseq software transcription alternativesplicing functionalgenomics

0.5 match 4.72 score 15 scripts

bioc

GEOfastq:Downloads ENA Fastqs With GEO Accessions

GEOfastq is used to download fastq files from the European Nucleotide Archive (ENA) starting with an accession from the Gene Expression Omnibus (GEO). To do this, sample metadata is retrieved from GEO and the Sequence Read Archive (SRA). SRA run accessions are then used to construct FTP and aspera download links for fastq files generated by the ENA.

Maintained by Alex Pickering. Last updated 5 months ago.

rnaseq dataimport bioinformatics fastq gene-expression geo rna-seq

0.5 match 4 stars 4.60 score 6 scripts

cran

rehh:Searching for Footprints of Selection using 'Extended Haplotype Homozygosity' Based Tests

Population genetic data such as 'Single Nucleotide Polymorphisms' (SNPs) is often used to identify genomic regions that have been under recent natural or artificial selection and might provide clues about the molecular mechanisms of adaptation. One approach, the concept of an 'Extended Haplotype Homozygosity' (EHH), introduced by (Sabeti 2002) <doi:10.1038/nature01140>, has given rise to several statistics designed for whole genome scans. The package provides functions to compute three of these, namely: 'iHS' (Voight 2006) <doi:10.1371/journal.pbio.0040072> for detecting positive or 'Darwinian' selection within a single population as well as 'Rsb' (Tang 2007) <doi:10.1371/journal.pbio.0050171> and 'XP-EHH' (Sabeti 2007) <doi:10.1038/nature06250>, targeted at differential selection between two populations. Various plotting functions are included to facilitate visualization and interpretation of these statistics.

Maintained by Alexander Klassmann. Last updated 4 years ago.

openmp

0.5 match 8 stars 4.68 score 1 dependents

wenlongren

ScoreEB:Score Test Integrated with Empirical Bayes for Association Study

Performs association tests within a linear mixed model framework using a score test integrated with empirical Bayes for genome-wide association studies. First, a score test is conducted for each single nucleotide polymorphism (SNP) under the linear mixed model framework, taking into account genetic relatedness and population structure. Then, all potentially associated SNPs are selected with a less stringent criterion. Finally, the selected SNPs are analyzed with empirical Bayes in a multi-locus model to identify the true quantitative trait nucleotides (QTNs).

Maintained by Wenlong Ren. Last updated 3 years ago.

0.8 match 2 stars 3.00 score 1 scripts

kbroman

mbmixture:Microbiome Mixture Analysis

Evaluate whether a microbiome sample is a mixture of two samples, by fitting a model for the number of read counts as a function of single nucleotide polymorphism (SNP) allele and the genotypes of two potential source samples. Lobo et al. (2021) <doi:10.1093/g3journal/jkab308>.

Maintained by Karl W Broman. Last updated 4 months ago.

0.5 match 6 stars 4.48 score 5 scripts

tacazares

SeedMatchR:Find Matches to Canonical SiRNA Seeds in Genomic Features

On-target gene knockdown using siRNA ideally results from binding fully complementary regions in mRNA transcripts to induce cleavage. Off-target siRNA gene knockdown can occur through several modes, one being a seed-mediated mechanism mimicking miRNA gene regulation. Seed-mediated off-target effects occur when the ~8 nucleotides at the 5’ end of the guide strand, called a seed region, bind the 3’ untranslated regions of mRNA, causing reduced translation. Experiments using siRNA knockdown paired with RNA-seq can be used to detect siRNA sequences with potential off-target effects driven by the seed region. 'SeedMatchR' provides tools for exploring and detecting potential seed-mediated off-target effects of siRNA in RNA-seq experiments. 'SeedMatchR' is designed to extend current differential expression analysis tools, such as 'DESeq2', by annotating results with predicted seed matches. Using publicly available data, we demonstrate the ability of 'SeedMatchR' to detect cumulative changes in differential gene expression attributed to siRNA seed regions.

Maintained by Tareian Cazares. Last updated 1 year ago.

deseq2-analysis mirna rna-seq sirna transcriptomics

0.5 match 7 stars 4.54 score 7 scripts

bioc

gmapR:An R interface to the GMAP/GSNAP/GSTRUCT suite

GSNAP and GMAP are a pair of tools to align short-read data written by Tom Wu. This package provides convenience methods to work with GMAP and GSNAP from within R. In addition, it provides methods to tally alignment results on a per-nucleotide basis using the bam_tally tool.

Maintained by Michael Lawrence. Last updated 16 days ago.

alignment zlib

0.5 match 4.43 score 45 scripts

bioc

seqArchR:Identify Different Architectures of Sequence Elements

seqArchR enables unsupervised discovery of _de novo_ clusters with characteristic sequence architectures characterized by position-specific motifs or composition of stretches of nucleotides, e.g., CG-richness. seqArchR does _not_ require any specifications w.r.t. the number of clusters, the length of any individual motifs, or the distance between motifs if and when they occur in pairs/groups; it directly detects them from the data. seqArchR uses non-negative matrix factorization (NMF) as its backbone, and employs a chunking-based iterative procedure that enables processing of large sequence collections efficiently. Wrapper functions are provided for visualizing cluster architectures as sequence logos.

Maintained by Sarvesh Nikumbh. Last updated 5 months ago.

motifdiscovery generegulation mathematicalbiology systemsbiology transcriptomics genetics clustering dimensionreduction featureextraction dnaseq nmf nonnegative-matrix-factorization promoter-sequence-architectures scikit-learn sequence-analysis sequence-architectures unsupervised-machine-learning

0.5 match 1 stars 4.48 score 9 scripts 1 dependents

bioc

tLOH:Assessment of evidence for LOH in spatial transcriptomics pre-processed data using Bayes factor calculations

tLOH, or transcriptomicsLOH, assesses evidence for loss of heterozygosity (LOH) in pre-processed spatial transcriptomics data. This tool requires spatial transcriptomics cluster and allele count information at likely heterozygous single-nucleotide polymorphism (SNP) positions in VCF format. Bayes factors are calculated at each SNP to determine the likelihood of a potential loss of heterozygosity event. Two plotting functions are included to visualize allele fraction and aggregated Bayes factor per chromosome. Data generated with the 10X Genomics Visium Spatial Gene Expression platform must be pre-processed to obtain an individual sample VCF with columns for each cluster. Required fields are allele depth (AD) with counts for reference/alternative alleles and read depth (DP).

Maintained by Michelle Webb. Last updated 5 months ago.

copynumbervariation transcription snp geneexpression transcriptomics

0.5 match 3 stars 4.48 score 4 scripts

bioc

motifcounter:R package for analysing TFBSs in DNA sequences

'motifcounter' provides motif matching, motif counting and motif enrichment functionality based on position frequency matrices. The main features of the package include the use of higher-order background models and accounting for self-overlapping motif matches when determining motif enrichment. The background model makes it possible to capture dinucleotide (or higher-order nucleotide) composition adequately, which may reduce model biases and misleading results compared to using simple GC background models. When conducting a motif enrichment analysis based on the motif match count, the package relies on a compound Poisson distribution or alternatively a combinatorial model. These distributions account for self-overlapping motif structures as exemplified by repeat-like or palindromic motifs, and make it possible to determine the p-value and fold-enrichment for a set of observed motif matches.

Maintained by Wolfgang Kopp. Last updated 5 months ago.

transcription motifannotation sequencematching software openmp

0.5 match 4.30 score 7 scripts

delomast

tripsAndDipR:Inference of Ploidy from Sequencing Data

Uses read counts for biallelic single nucleotide polymorphisms (SNPs) to infer ploidy. It allows parameters to be specified to account for sequencing error rates and allelic bias. For details of the algorithms, please see Delomas (2019) <doi:10.1111/1755-0998.13073> and Delomas et al. (2021) <doi:10.1111/1755-0998.13431>.

Maintained by Thomas Delomas. Last updated 2 years ago.

cpp

0.5 match 3 stars 4.18 score 4 scripts

bioc

SNPhood:SNPhood: Investigate, quantify and visualise the epigenomic neighbourhood of SNPs using NGS data

To date, thousands of single nucleotide polymorphisms (SNPs) have been found to be associated with complex traits and diseases. However, the vast majority of these disease-associated SNPs lie in the non-coding part of the genome, and are likely to affect regulatory elements, such as enhancers and promoters, rather than the function of a protein. Thus, to understand the molecular mechanisms underlying genetic traits and diseases, it becomes increasingly important to study the effect of a SNP on nearby molecular traits such as chromatin environment or transcription factor (TF) binding. Towards this aim, we developed SNPhood, a user-friendly *Bioconductor* R package to investigate and visualize the local neighborhood of a set of SNPs of interest for NGS data such as chromatin marks or transcription factor binding sites from ChIP-Seq or RNA-Seq experiments. SNPhood comprises a set of easy-to-use functions to extract, normalize and summarize reads for a genomic region, perform various data quality checks, normalize read counts using additional input files, and to cluster and visualize the regions according to the binding pattern. The regions around each SNP can be binned in a user-defined fashion to allow for analysis of very broad patterns as well as a detailed investigation of specific binding shapes. Furthermore, SNPhood supports the integration with genotype information to investigate and visualize genotype-specific binding patterns. Finally, SNPhood can be employed for determining, investigating, and visualizing allele-specific binding patterns around the SNPs of interest.

Maintained by Christian Arnold. Last updated 5 months ago.

software

0.5 match 3.90 score 1 scripts

green-striped-gecko

dartR.base:Analysing 'SNP' and 'Silicodart' Data - Basic Functions

Facilitates the import and analysis of 'SNP' (single nucleotide polymorphism) and 'silicodart' (presence/absence) data. The main focus is on data generated by 'DarT' (Diversity Arrays Technology); however, data from other sequencing platforms can be used once 'SNP' or related fragment presence/absence data from any source is imported. Genetic datasets are stored in a derived 'genlight' format (package 'adegenet'), which allows for very compact storage of data and metadata. Functions are available for importing and exporting 'SNP' and 'silicodart' data, and for reporting and filtering on various criteria (e.g. 'callrate', 'heterozygosity', 'reproducibility', maximum allele frequency). Additional functions are available for visualization (e.g. Principal Coordinates Analysis) and for creating a spatial representation using maps. 'dartR.base' is the 'base' package of the 'dartRverse' suite of packages. To install the other packages, we recommend installing the 'dartRverse' package, which supports the installation of all packages in the 'dartRverse'. If you want to cite 'dartR', you can find the information by typing citation('dartR.base') in the console.

Maintained by Bernd Gruber. Last updated 13 days ago.

0.5 match 3.84 score 17 scripts 5 dependents

yuchaojiang

Canopy:Accessing Intra-Tumor Heterogeneity and Tracking Longitudinal and Spatial Clonal Evolutionary History by Next-Generation Sequencing

A statistical framework and computational procedure for identifying the sub-populations within a tumor, determining the mutation profiles of each subpopulation, and inferring the tumor's phylogenetic history. The inputs are variant allele frequencies (VAFs) of somatic single nucleotide alterations (SNAs) along with allele-specific coverage ratios between the tumor and matched normal sample for somatic copy number alterations (CNAs). These quantities can be directly taken from the output of existing software. Canopy provides a general mathematical framework for pooling data across samples and sites to infer the underlying parameters. For SNAs that fall within CNA regions, Canopy infers their temporal ordering and resolves their phase. When there are multiple evolutionary configurations consistent with the data, Canopy outputs all configurations along with their confidence assessment.

Maintained by Yuchao Jiang. Last updated 7 years ago.

0.5 match 3.81 score 65 scripts

cran

disprose:Discriminating Probes Selection

Set of tools for molecular probe selection and microarray design, e.g. the assessment of physical and chemical properties, BLAST performance, and selection according to sensitivity and selectivity. Methods used in the package are described in: Lorenz R., Stephan H.B., Höner zu Siederdissen C. et al. (2011) <doi:10.1186/1748-7188-6-26>; Camacho C., Coulouris G., Avagyan V. et al. (2009) <doi:10.1186/1471-2105-10-421>.

Maintained by Elena Filatova. Last updated 3 years ago.

1.9 match 1.00 score

arunabhacodes

MPGE:A Two-Step Approach to Testing Overall Effect of Gene-Environment Interaction for Multiple Phenotypes

Interaction between a genetic variant (e.g., a single nucleotide polymorphism) and an environmental variable (e.g., physical activity) can have a shared effect on multiple phenotypes (e.g., blood lipids). We implement a two-step method to test for an overall interaction effect on multiple phenotypes. In the first step, the method tests for an overall marginal genetic association between the genetic variant and the multivariate phenotype. The genetic variants which show evidence of a marginal overall genetic effect in the first step are prioritized while testing for an overall gene-environment interaction effect in the second step. Methodology is available from: A Majumdar, KS Burch, S Sankararaman, B Pasaniuc, WJ Gauderman, JS Witte (2020) <doi:10.1101/2020.07.06.190256>.

Maintained by Arunabha Majumdar. Last updated 4 years ago.

0.5 match 1 stars 3.70 score 1 scripts

jmanitz

kangar00:Kernel Approaches for Nonlinear Genetic Association Regression

Methods to extract information on pathways, genes and various single-nucleotide polymorphisms (SNPs) from online databases. It provides functions for data preparation and evaluation of genetic influence on a binary outcome using the logistic kernel machine test (LKMT). Three different kernel functions are offered to analyze genotype information in this variance component test: a linear kernel, a size-adjusted kernel and a network-based kernel.

Maintained by Juliane Manitz. Last updated 6 months ago.

0.5 match 2 stars 3.62 score 21 scripts

bioc

rfPred:Assign rfPred functional prediction scores to a missense variants list

Based on numerous external data files where rfPred scores are pre-calculated on all genomic positions of the human exome, the package gives rfPred scores to missense variants identified by the chromosome, the position (hg19 version), the reference and alternative nucleotides and the UniProt identifier of the protein. Note that to use the package, the user has to download the TabixFile and index (approximately 3.3 GB).

Maintained by Hugo Varet. Last updated 5 months ago.

software annotation classification

0.5 match 3.60 score 4 scripts

soroushmdg

gwid:Genome-Wide Identity-by-Descent

Methods and tools for the analysis of Genome Wide Identity-by-Descent ('gwid') mapping data, focusing on testing whether there is a higher occurrence of Identity-By-Descent (IBD) segments around potential causal variants in cases compared to controls, which is crucial for identifying rare variants. To enhance its analytical power, 'gwid' incorporates a Sliding Window Approach, allowing for the detection and analysis of signals from multiple Single Nucleotide Polymorphisms (SNPs).

Maintained by Soroush Mahmoudiandehkordi. Last updated 6 months ago.

0.5 match 1 stars 3.60 score 4 scripts

jphill01

HACSim:Iterative Extrapolation of Species' Haplotype Accumulation Curves for Genetic Diversity Assessment

Performs iterative extrapolation of species' haplotype accumulation curves using a nonparametric stochastic (Monte Carlo) optimization method for assessment of specimen sampling completeness based on the approach of Phillips et al. (2015) <doi:10.1515/dna-2015-0008>, Phillips et al. (2019) <doi:10.1002/ece3.4757> and Phillips et al. (2020) <doi:10.7717/peerj-cs.243>. 'HACSim' outputs a number of useful summary statistics of sampling coverage ("Measures of Sampling Closeness"), including an estimate of the likely required sample size (along with confidence intervals at the desired level) necessary to recover a given number/proportion of observed unique species' haplotypes. Any genomic marker can be targeted to assess likely required specimen sample sizes for genetic diversity assessment. The method is particularly well-suited to assess sampling sufficiency for DNA barcoding initiatives. Users can also simulate their own DNA sequences according to various models of nucleotide substitution. A Shiny app is also available.

Maintained by Jarrett D. Phillips. Last updated 6 months ago.

dna-barcoding haplotype-accumulation-curves cpp

0.5 match 3.48 score 5 scripts

bioc

CNViz:Copy Number Visualization

CNViz takes probe, gene, and segment-level log2 copy number ratios and launches a Shiny app to visualize your sample's copy number profile. You can also integrate loss of heterozygosity (LOH) and single nucleotide variant (SNV) data.

Maintained by Rebecca Greenblatt. Last updated 5 months ago.

visualization copynumbervariation sequencing dnaseq

0.5 match 3.30 score 1 scripts

bioc

triplex:Search and visualize intramolecular triplex-forming sequences in DNA

This package provides functions for identification and visualization of potential intramolecular triplex patterns in DNA sequence. The main functionality is to detect the positions of subsequences capable of folding into an intramolecular triplex (H-DNA) in a much larger sequence. The potential H-DNA (triplexes) should be made of as many canonical nucleotide triplets as possible. The package includes visualization showing the exact base-pairing in 1D, 2D or 3D.

Maintained by Jiri Hon. Last updated 5 months ago.

sequencematching generegulation

0.5 match 3.30 score 2 scripts

bioc

dyebias:The GASSCO method for correcting for slide-dependent gene-specific dye bias

Many two-colour hybridizations suffer from a dye bias that is both gene-specific and slide-specific. The former depends on the content of the nucleotide used for labeling; the latter depends on the labeling percentage. The slide-dependency was hitherto not recognized, and made addressing the artefact impossible. Given a reasonable number of dye-swapped pairs of hybridizations, or of same vs. same hybridizations, both the gene- and slide-biases can be estimated and corrected using the GASSCO method (Margaritis et al., Mol. Sys. Biol. 5:266 (2009), doi:10.1038/msb.2009.21)

Maintained by Philip Lijnzaad. Last updated 5 months ago.

microarray twochannel qualitycontrol preprocessing

0.5 match 3.30 score 10 scripts

noramvillanueva

seq2R:Simple Method to Detect Compositional Changes in Genomic Sequences

This software is useful for loading '.fasta' or '.gbk' files, and for retrieving sequences from the 'GenBank' database <https://www.ncbi.nlm.nih.gov/genbank/>. This package makes it possible to detect differences or asymmetries based on nucleotide composition by using local linear kernel smoothers. It is also possible to draw inference about critical points (i.e., maximum or minimum points) related to the derivative curves. Additionally, bootstrap methods are used for estimating confidence intervals, and computational speed-up techniques (binning techniques) are implemented in 'seq2R'.

Maintained by Nora M. Villanueva. Last updated 4 months ago.

bootstrap change-points dna-sequences genome-analysis machine-learning nonparametric-statistics regression fortran

0.5 match 3.00 score 10 scripts

dcibioinformatics

survSNP:Power Calculations for SNP Studies with Censored Outcomes

Conduct asymptotic and empirical power and sample size calculations for Single-Nucleotide Polymorphism (SNP) association studies with right censored time to event outcomes.

Maintained by Alexander Sibley. Last updated 2 years ago.

gsl cpp

0.5 match 2.81 score 16 scripts

srika1919

pPCA:Partial Principal Component Analysis of Partitioned Large Sparse Matrices

Performs partial principal component analysis of a large sparse matrix. The matrix may be stored as a list of matrices to be concatenated (implicitly) horizontally. Useful applications include cases where the total number of nonzero entries exceeds the capacity of 32-bit integers (e.g., with large Single Nucleotide Polymorphism data).

Maintained by Srika Raja. Last updated 5 months ago.

cpp

0.5 match 2.78 score

matveevdaniil

RPatternJoin:String Similarity Joins for Hamming and Levenshtein Distances

This project is a tool for edit-similarity joins of words (a.k.a. all-pairs similarity search) under small (< 3) edit distance constraints. It works for Levenshtein/Hamming distances and words from any alphabet. The software was originally developed for joining amino-acid/nucleotide sequences from Adaptive Immune Repertoires, where the number of words is relatively large (10^5-10^6) and the average length of words is relatively small (10-100).

Maintained by Daniil Matveev. Last updated 5 months ago.

cpp

0.5 match 2.78 score 5 scripts 1 dependents
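
To make the idea of a similarity join under a Hamming-distance constraint concrete, here is a naive Python sketch. It compares all pairs directly, so it is quadratic in the number of words and is not the algorithm or interface used by 'RPatternJoin'; the names `hamming` and `hamming_join` are hypothetical.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length words differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs words of equal length")
    return sum(x != y for x, y in zip(a, b))


def hamming_join(words, max_dist=1):
    """Return every pair of words within `max_dist` substitutions.

    Naive all-pairs scan: fine for small inputs, but repertoires with
    10^5-10^6 words need an index-based method instead.
    """
    pairs = []
    for a, b in combinations(words, 2):
        # Words of different length can never match under Hamming distance.
        if len(a) == len(b) and hamming(a, b) <= max_dist:
            pairs.append((a, b))
    return pairs


if __name__ == "__main__":
    # Toy sequences for illustration only.
    print(hamming_join(["ACGT", "ACGA", "TTTT", "ACG"], max_dist=1))
```

A Levenshtein join would additionally allow insertions and deletions, so words of different lengths could also be paired.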

leipzig

asciiruler:Render an ASCII Ruler

An ASCII ruler is for measuring text and is especially useful for sequence analysis. Included in this package are methods to create ASCII rulers and associated GenBank sequence blocks, multi-column text displays that make it easy for viewers to locate nucleotides by position.

Maintained by Jeremy Leipzig. Last updated 3 years ago.

0.5 match 2.70 score 5 scripts

cran

HTRX:Haplotype Trend Regression with eXtra Flexibility (HTRX)

Detection of haplotype patterns that include single nucleotide polymorphisms (SNPs) and non-contiguous haplotypes that are associated with a phenotype. Methods for implementing HTRX are described in Yang Y, Lawson DJ (2023) <doi:10.1093/bioadv/vbad038> and Barrie W, Yang Y, Irving-Pease E.K, et al (2024) <doi:10.1038/s41586-023-06618-z>.

Maintained by Yaoling Yang. Last updated 1 year ago.

0.5 match 2.70 score

gastonquero

haplotyper:Tool for Clustering Genotypes in Haplotypes

Function to identify haplotypes within QTL (Quantitative Trait Loci). A haplotype is a combination of SNPs (Single Nucleotide Polymorphisms) within the QTL. This function groups together all individuals of a population with the same haplotype. Each group contains individuals with the same allele at each SNP, whether or not data are missing. Thus, haplotyper groups individuals that, if imputed, have a non-zero probability of having the same alleles across the entire sequence of SNPs. Moreover, haplotyper calculates this probability from relative frequencies.

Maintained by Gaston Quero. Last updated 9 years ago.

0.5 match 2.70 score 3 scripts

henrikbengtsson

calmate:Improved Allele-Specific Copy Number of SNP Microarrays for Downstream Segmentation

The CalMaTe method calibrates preprocessed allele-specific copy number estimates (ASCNs) from DNA microarrays by controlling for single-nucleotide polymorphism-specific allelic crosstalk. The resulting ASCNs are on average more accurate, which increases the power of segmentation methods for detecting changes between copy number states in tumor studies including copy neutral loss of heterozygosity. CalMaTe applies to any ASCNs regardless of preprocessing method and microarray technology, e.g. Affymetrix and Illumina.

Maintained by Henrik Bengtsson. Last updated 3 years ago.

acgh copynumbervariants snp microarray onechannel twochannel genetics

0.5 match 1 stars 2.70 score 6 scripts

pboutros

ApplyPolygenicScore:Utilities for the Application of a Polygenic Score to a VCF

Simple and transparent parsing of genotype/dosage data from an input Variant Call Format (VCF) file, matching of genotype coordinates to the component Single Nucleotide Polymorphisms (SNPs) of an existing polygenic score (PGS), and application of SNP weights to dosages for the calculation of a polygenic score for each individual in accordance with the additive weighted sum of dosages model. Methods are designed in reference to best practices described by Collister, Liu, and Clifton (2022) <doi:10.3389/fgene.2022.818574>.

Maintained by Paul Boutros. Last updated 11 days ago.

0.5 match 2.70 score

mngar

simulMGF:Simulate SNP Matrix, Phenotype and Genotypic Effects

Simulate genotypes in a SNP (single nucleotide polymorphism) matrix as random numbers from a uniform distribution, for diploid organisms (coded by 0, 1, 2), Sikorska et al. (2013) <doi:10.1186/1471-2105-14-166>, or a half-sib/full-sib SNP matrix from real or simulated parental SNP data, assuming Mendelian segregation. Simulate phenotypic traits for real or simulated SNP data, controlled by a specific number of quantitative trait loci and their effects, sampled from a Normal or a Uniform distribution, assuming a pure additive model. This is useful for testing association and genomic prediction models or for educational purposes.
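
As a rough sketch of the idea (not the package's own functions), the following draws a 0/1/2 genotype matrix uniformly at random and builds a purely additive phenotype from a chosen number of QTL with Normal effects. The function names, the discrete-uniform reading of "uniform distribution", and the absence of residual noise are assumptions for illustration.

```python
import random


def simulate_snp_matrix(n_ind, n_snp, rng):
    """Rows are individuals, columns are SNPs, each coded 0, 1 or 2."""
    return [[rng.randint(0, 2) for _ in range(n_snp)] for _ in range(n_ind)]


def simulate_phenotype(geno, n_qtl, rng):
    """Pick n_qtl SNP columns as QTL, draw Normal(0, 1) effects,
    and return the additive genotypic value of each individual."""
    n_snp = len(geno[0])
    qtl = rng.sample(range(n_snp), n_qtl)
    effects = {j: rng.gauss(0.0, 1.0) for j in qtl}
    pheno = [sum(effects[j] * row[j] for j in qtl) for row in geno]
    return pheno, effects


rng = random.Random(1)  # fixed seed so the sketch is reproducible
geno = simulate_snp_matrix(n_ind=5, n_snp=10, rng=rng)
pheno, effects = simulate_phenotype(geno, n_qtl=3, rng=rng)
```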

Maintained by Martin Nahuel Garcia. Last updated 2 years ago.

0.5 match 2.70 score 1 scripts

cran

hiphop:Parentage Assignment using Bi-Allelic Genetic Markers

Can be used for paternity and maternity assignment and outperforms conventional methods where closely related individuals occur in the pool of possible parents. The method compares the genotypes of offspring with any combination of potential parents and scores the number of mismatches of these individuals at bi-allelic genetic markers (e.g. Single Nucleotide Polymorphisms). It elaborates on a prior exclusion method based on the Homozygous Opposite Test (HOT; Huisman 2017 <doi:10.1111/1755-0998.12665>) by introducing the additional exclusion criterion HIPHOP (Homozygous Identical Parents, Heterozygous Offspring are Precluded; Cockburn et al., in revision). Potential parents are excluded if they have more mismatches than can be expected due to genotyping error and mutation, and thereby one can identify the true genetic parents and detect situations where one (or both) of the true parents is not sampled. Package 'hiphop' can deal with (a) the case where there is contextual information about parentage of the mother (i.e. a female has been seen to be involved in reproductive tasks such as nest building), but paternity is unknown (e.g. due to promiscuity), (b) where both parents need to be assigned, because there is no contextual information on which female laid eggs and which male fertilized them (e.g. polygynandrous mating system where multiple females and males deposit young in a common nest, or organisms with external fertilisation that breed in aggregations). For details: Cockburn, A., Penalba, J.V., Jaccoud, D., Kilian, A., Brouwer, L., Double, M.C., Margraf, N., Osmond, H.L., van de Pol, M. and Kruuk, L.E.B. (in revision). HIPHOP: improved paternity assignment among close relatives using a simple exclusion method for bi-allelic markers. Molecular Ecology Resources, DOI to be added upon acceptance.
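
The two exclusion criteria named above can be illustrated with simple mismatch counters. This is a sketch under assumptions, not the package's implementation: genotypes are taken to be coded 0/1/2 (copies of one allele) with `None` for missing calls, and the function names are hypothetical.

```python
def hot_mismatches(offspring, parent):
    """Homozygous Opposite Test: count loci where offspring and candidate
    parent are homozygous for opposite alleles (0 versus 2)."""
    return sum(
        1
        for o, p in zip(offspring, parent)
        if o is not None and p is not None and {o, p} == {0, 2}
    )


def hiphop_mismatches(offspring, dam, sire):
    """HIPHOP criterion: count loci where both candidate parents are
    homozygous for the same allele but the offspring is heterozygous."""
    return sum(
        1
        for o, d, s in zip(offspring, dam, sire)
        if o == 1 and d == s and d in (0, 2)
    )


# Hypothetical genotypes at five bi-allelic markers.
offspring = [0, 1, 2, 1, None]
dam = [2, 0, 2, 2, 0]
sire = [0, 0, 0, 2, 2]
```

A candidate would then be excluded when its mismatch count exceeds what genotyping error and mutation could plausibly produce; that threshold is study-specific and is not shown here.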

Maintained by Martijn van de Pol. Last updated 5 years ago.

0.5 match 1 stars 2.70 score

ryansunwork

ICSKAT:Interval-Censored Sequence Kernel Association Test

Implements the Interval-Censored Sequence Kernel Association (ICSKAT) test for testing the association between interval-censored time-to-event outcomes and groups of single nucleotide polymorphisms (SNPs). Interval-censored time-to-event data occur when the event time is not known exactly but can be deduced to fall within a given interval. For example, some medical conditions like bone mineral density deficiency are generally only diagnosed at clinical visits. If a patient goes for clinical checkups yearly and is diagnosed at, say, age 30, then the onset of the deficiency is only known to fall between the date of their age 29 checkup and the date of the age 30 checkup. Interval-censored data include right- and left-censored data as special cases. This package also implements the interval-censored Burden test and the ICSKATO test, which is the optimal combination of the ICSKAT and Burden tests. Please see the vignette for a quickstart guide.

Maintained by Ryan Sun. Last updated 3 years ago.

cpp

0.5 match 2.48 score 3 scripts 1 dependents

liqgroup

AssocTests:Genetic Association Studies

Some procedures including EIGENSTRAT (a procedure for detecting and correcting for population stratification through searching for the eigenvectors in genetic association studies), PCoC (a procedure for correcting for population stratification through calculating the principal coordinates and the clustering of the subjects), Tracy-Widom test (a procedure for detecting the significant eigenvalues of a matrix), distance regression (a procedure for detecting the association between a distance matrix and some independent variants of interest), single-marker test (a procedure for identifying the association between the genotype at a biallelic marker and a trait using the Wald test or the Fisher's exact test), MAX3 (a procedure for testing for the association between a single nucleotide polymorphism and a binary phenotype using the maximum value of the three test statistics derived for the recessive, additive, and dominant models), nonparametric trend test (a procedure for testing for the association between a genetic variant and a non-normally distributed quantitative trait based on the nonparametric risk), and nonparametric MAX3 (a procedure for testing for the association between a biallelic single nucleotide polymorphism and a quantitative trait using the maximum value of the three nonparametric trend tests derived for the recessive, additive, and dominant models), which are commonly used in genetic association studies. To cite this package in publications use: Lin Wang, Wei Zhang, and Qizhai Li. AssocTests: An R Package for Genetic Association Studies. Journal of Statistical Software. 2020; 94(5): 1-26.

Maintained by Lin Wang. Last updated 4 years ago.

0.8 match 1 stars 1.64 score 11 scripts

cran

dartR.sexlinked:Analysing SNP Data to Identify Sex-Linked Markers

Identifies, filters and exports sex-linked markers using 'SNP' (single nucleotide polymorphism) data. To install the other packages, we recommend installing the 'dartRverse' package, which supports the installation of all packages in the 'dartRverse'. If you want to understand the rationale applied to identify sex-linked markers and/or want to cite 'dartR.sexlinked', you can find the information by typing citation('dartR.sexlinked') in the console.

Maintained by Diana Robledo-Ruiz. Last updated 9 months ago.

0.5 match 2.00 score 4 scripts

green-striped-gecko

dartR.captive:Analysing 'SNP' Data to Support Captive Breeding

Functions are provided that facilitate the analysis of SNP (single nucleotide polymorphism) data to answer questions regarding captive breeding and relatedness between individuals. 'dartR.captive' is part of the 'dartRverse' suite of packages. Gruber et al. (2018) <doi:10.1111/1755-0998.12745>. Mijangos et al. (2022) <doi:10.1111/2041-210X.13918>.

Maintained by Bernd Gruber. Last updated 27 days ago.

0.5 match 1 stars 2.00 score 3 scripts

gastonquero

clusterhap:Clustering Genotypes in Haplotypes

A haplotype is a combination of SNPs (Single Nucleotide Polymorphisms) within the QTL (Quantitative Trait Loci). clusterhap groups together all individuals of a population with the same haplotype. Each group contains individuals with the same allele at each SNP, whether or not data are missing. Thus, clusterhap groups individuals that, if their missing data were imputed, have a non-zero probability of having the same alleles across the entire sequence of SNPs. Moreover, clusterhap calculates this probability from relative frequencies.

Maintained by Gaston Quero. Last updated 9 years ago.

0.5 match 2.00 score 3 scripts

empiricalbayes

LFDREmpiricalBayes:Estimating Local False Discovery Rates Using Empirical Bayes Methods

New empirical Bayes methods aiming at analyzing the association of single nucleotide polymorphisms (SNPs) to some particular disease are implemented in this package. The package uses local false discovery rate (LFDR) estimates of SNPs within a sample population defined as a "reference class" and discovers if SNPs are associated with the corresponding disease. Although SNPs are used throughout this document, other biological data such as protein data and other gene data can be used. Karimnezhad, Ali and Bickel, D. R. (2016) <http://hdl.handle.net/10393/34889>.

Maintained by Ali Karimnezhad. Last updated 7 years ago.

bayesian mathematicalbiology multiplecomparison

0.5 match 2.00 score 5 scripts

cran

GEVACO:Joint Test of Gene and GxE Interactions via Varying Coefficients

A novel statistical model to detect the joint genetic and dynamic gene-environment (GxE) interaction with continuous traits in genetic association studies. It uses varying-coefficient models to account for different GxE trajectories, regardless of whether the relationship is linear or not. The package includes one function, GxEtest(), to test a single genetic variant (e.g., a single nucleotide polymorphism or SNP), and another function, GxEscreen(), to test for a set of genetic variants. The method involves a likelihood ratio test described in Crainiceanu, C. M., and Ruppert, D. (2004) <doi:10.1111/j.1467-9868.2004.00438.x>.

Maintained by Sydney Manning. Last updated 3 years ago.

0.5 match 2.00 score 2 scripts

wittenburg

hsrecombi:Estimation of Recombination Rate and Maternal LD in Half-Sibs

Paternal recombination rate and maternal linkage disequilibrium (LD) are estimated for pairs of biallelic markers such as single nucleotide polymorphisms (SNPs) from progeny genotypes and sire haplotypes. The implementation relies on paternal half-sib families. If maternal half-sib families are used, the roles of sire/dam are swapped. Multiple families can be considered. For parameter estimation, at least one sire has to be double heterozygous at the investigated pairs of SNPs. Based on recombination rates, genetic distances between markers can be estimated. Markers with unusually large recombination rate to markers in close proximity (i.e. putatively misplaced markers) shall be discarded in this derivation. A workflow description is attached as vignette. *A pipeline is available at GitHub* <https://github.com/wittenburg/hsrecombi> Hampel, Teuscher, Gomez-Raya, Doschoris, Wittenburg (2018) "Estimation of recombination rate and maternal linkage disequilibrium in half-sibs" <doi:10.3389/fgene.2018.00186>. Gomez-Raya (2012) "Maximum likelihood estimation of linkage disequilibrium in half-sib families" <doi:10.1534/genetics.111.137521>.

Maintained by Dörte Wittenburg. Last updated 2 years ago.

cpp

0.5 match 2.00 score 7 scripts

wpihongzhang

CKAT:Composite Kernel Association Test for Pharmacogenetics Studies

Composite Kernel Association Test (CKAT) is a flexible and robust kernel machine based approach to jointly test the genetic main effect and gene-treatment interaction effect for a set of single-nucleotide polymorphisms (SNPs) in pharmacogenetics (PGx) assessments embedded within randomized clinical trials.

Maintained by Hong Zhang. Last updated 5 years ago.

0.5 match 1.78 score

cran

LDAandLDAS:Linkage Disequilibrium of Ancestry (LDA) and LDA Score (LDAS)

Computation of linkage disequilibrium of ancestry (LDA) and linkage disequilibrium of ancestry score (LDAS). LDA calculates the pairwise linkage disequilibrium of ancestry between single nucleotide polymorphisms (SNPs). LDAS calculates the LDA score of SNPs. The methods are described in Barrie W, Yang Y, Irving-Pease E.K, et al (2024) <doi:10.1038/s41586-023-06618-z>.

Maintained by Yaoling Yang. Last updated 1 year ago.

cpp

0.5 match 1.70 score

cran

FILEST:Fine-Level Structure Simulator

A population genetic simulator, which is able to generate synthetic datasets for single-nucleotide polymorphisms (SNPs) for multiple populations. The genetic distances among populations can be set according to the Fixation Index (Fst) as explained in Balding and Nichols (1995) <doi:10.1007/BF01441146>. This tool is able to simulate outlying individuals, and missing SNPs can be specified. For genome-wide association studies (GWAS), disease status can be set at a desired level according to the risk ratio.
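
One common way to tie subpopulation allele frequencies to Fst is the Balding-Nichols model, in which a subpopulation's frequency is drawn from a Beta distribution centred on the ancestral frequency. The sketch below assumes that model and Hardy-Weinberg sampling of genotypes; the function names are hypothetical and it does not reproduce the package's own simulator or its settings.

```python
import random


def subpop_frequency(p, fst, rng):
    """Draw a subpopulation allele frequency from
    Beta(p(1-F)/F, (1-p)(1-F)/F), with 0 < p < 1 and 0 < F < 1."""
    a = p * (1 - fst) / fst
    b = (1 - p) * (1 - fst) / fst
    return rng.betavariate(a, b)


def draw_genotype(freq, rng):
    """Diploid genotype (0, 1 or 2) as two independent allele draws."""
    return sum(rng.random() < freq for _ in range(2))


rng = random.Random(42)  # fixed seed for reproducibility
ancestral_p = 0.3
# One frequency per population; a larger Fst spreads them further apart.
freqs = [subpop_frequency(ancestral_p, fst=0.05, rng=rng) for _ in range(3)]
genotypes = [[draw_genotype(f, rng) for _ in range(10)] for f in freqs]
```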

Maintained by Kridsadakorn Chaichoompu. Last updated 4 years ago.

0.5 match 1.70 score

largon-denayah

read.gb:Open GenBank Files

Opens complete record(s) with .gb extension from the NCBI/GenBank Nucleotide database and returns a list containing shaped record(s). These kinds of files contain detailed records of DNA samples (locus, organism, type of sequence, source of the sequence...). An example record can be found at <https://www.ncbi.nlm.nih.gov/nuccore/HE799070>.

Maintained by Robin Mercier. Last updated 4 years ago.

0.5 match 1.48 score 5 scripts 1 dependents

cran

FunctanSNP:Functional Analysis (with Interactions) for Dense SNP Data

An implementation of revised functional regression models for multiple genetic variation data, such as single nucleotide polymorphism (SNP) data, which provides revised functional linear regression models, partially functional interaction regression analysis with penalty-based techniques and corresponding drawing functions, etc. (Ruzong Fan, Yifan Wang, James L. Mills, Alexander F. Wilson, Joan E. Bailey-Wilson, and Momiao Xiong (2013) <doi:10.1002/gepi.21757>).

Maintained by Rui Ren. Last updated 2 years ago.

0.5 match 1.40 score 25 scripts

l0ka

TPES:Tumor Purity Estimation using SNVs

A bioinformatics tool for the estimation of the tumor purity from sequencing data. It uses the set of putative clonal somatic single nucleotide variants within copy number neutral segments to call tumor cellularity.

Maintained by Alessio Locallo. Last updated 6 years ago.

0.5 match 2 stars 1.30 score 10 scripts

cran

PlasmaMutationDetector:Tumor Mutation Detection in Plasma

Aims at detecting single nucleotide variation (SNV) and insertion/deletion (INDEL) in circulating tumor DNA (ctDNA), used as a surrogate marker for tumor, at each base position of a Next Generation Sequencing (NGS) analysis. Mutations are assessed by comparing the minor-allele frequency at each position to the measured PER in control samples.

Maintained by Yves Rozenholc. Last updated 7 years ago.

0.5 match 1.30 score

cran

genomicper:Circular Genomic Permutation using Genome Wide Association p-Values

The circular genomic permutation approach uses genome wide association studies (GWAS) results to establish the significance of pathway/gene-set associations whilst accounting for genomic structure (Cabrera et al (2012) <doi:10.1534/g3.112.002618>). All single nucleotide polymorphisms (SNPs) in the GWAS are placed in a 'circular genome' according to their location. Then the complete set of SNP association p-values is permuted by rotation with respect to the SNPs' genomic locations. Two testing frameworks are available: permutations at the gene level, and permutations at the SNP level. The permutation at the gene level uses Fisher's combination test to calculate a single gene p-value, followed by the hypergeometric test. The SNP count methodology maps each SNP to pathways/gene-sets and calculates the proportion of SNPs for the real and the permuted datasets above a pre-defined threshold. Genomicper requires a matrix of GWAS association p-values and SNPs annotation to genes. Pathways can be obtained from within the package or can be provided by the user.
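
The rotation step can be pictured as shifting the whole vector of p-values around a ring while the SNP positions stay fixed, which keeps neighbouring p-values together. The following is a minimal sketch of that one step only; the function name and the example values are hypothetical, and the gene-level and SNP-level tests described above are not shown.

```python
import random


def rotate_pvalues(pvalues, shift):
    """Rotate p-values (ordered by genomic location) by `shift` positions:
    the SNP at index i receives the p-value from index (i - shift) mod n."""
    n = len(pvalues)
    shift %= n
    if shift == 0:
        return list(pvalues)
    return pvalues[-shift:] + pvalues[:-shift]


# Hypothetical p-values for four SNPs in genomic order.
observed = [0.1, 0.2, 0.3, 0.4]
rng = random.Random(7)
# One permuted dataset: a random non-trivial rotation of the observed values.
permuted = rotate_pvalues(observed, rng.randrange(1, len(observed)))
```

Because only the starting point of the ring changes, each permuted dataset keeps the local correlation pattern of the original p-values.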

Maintained by Claudia P Cabrera. Last updated 4 years ago.

0.5 match 1 stars 1.15 score 14 scripts

cran

PdPDB:Pattern Discovery in PDB Structures of Metalloproteins

Looks for amino acid and/or nucleotide patterns and/or small ligands coordinated to a given prosthetic centre. Files have to be in the local file system and contain proper extension.

Maintained by Luca Belmonte. Last updated 7 years ago.

0.5 match 1.00 score

alenazia

NGBVS:Bayesian Variable Selection for SNP Data using Normal-Gamma

Posterior distribution of case-control fine-mapping. Specifically, Bayesian variable selection for single-nucleotide polymorphism (SNP) data using the normal-gamma prior. Alenazi A.A., Cox A., Juarez M., Lin W-Y. and Walters, K. (2019) Bayesian variable selection using partially observed categorical prior information in fine-mapping association studies, Genetic Epidemiology. <doi:10.1002/gepi.22213>.

Maintained by Abdulaziz Alenazi. Last updated 2 years ago.

0.5 match 1.00 score

cran

RHclust:Vector in Partition

Non-parametric clustering of joint pattern multi-genetic/epigenetic factors. This package contains functions designed to cluster subjects based on gene features including single nucleotide polymorphisms (SNPs), DNA methylation (CPG), gene expression (GE), and covariate data. The novel concept follows the general K-means (Hartigan and Wong (1979) <doi:10.2307/2346830>) framework but uses weighted Euclidean distances across the gene features to cluster subjects. This approach is unique in that it attempts to capture all pairwise interactions in an effort to cluster based on their complex biological interactions.

Maintained by Joseph Handwerker. Last updated 2 years ago.

0.5 match 1.00 score

cran

PlasmaMutationDetector2:Tumor Mutation Detection in Plasma using Barcoding

Aims at detecting single nucleotide variation (SNV) and insertion/deletion (INDEL) in circulating tumor DNA (ctDNA), used as a surrogate marker for tumor, at each base position of a Next Generation Sequencing (NGS) analysis using barcoding. Mutations are assessed by comparing the minor-allele frequency at each position to the measured PER in control samples. This package has been used for Kjersti Tjensvoll, Morten Lapin, Bjørnar Gilje, Herish Garresori, Satu Oltedal, Rakel Brendsdal Forthun, Anders Molven, Yves Rozenholc and Oddmund Nordgård (2022) <https://www.nature.com/articles/s41598-022-09698-5>.

Maintained by Rozenholc. Last updated 3 years ago.

0.5 match 1.00 score

cran

CLONETv2:Clonality Estimates in Tumor

Analyze data from next-generation sequencing experiments on genomic samples. 'CLONETv2' offers a set of functions to compute allele specific copy number and clonality from segmented data and SNPs position pileup. The package also calculates the clonality of single nucleotide variants given read counts at mutated positions. The package has been developed at the laboratory of Computational and Functional Oncology, Department of CIBIO, University of Trento (Italy), under the supervision of prof Francesca Demichelis. References: Prandi et al. (2014) <doi:10.1186/s13059-014-0439-6>; Carreira et al. (2014) <doi:10.1126/scitranslmed.3009448>; Romanel et al. (2015) <doi:10.1126/scitranslmed.aac9511>.

Maintained by Yari Ciani. Last updated 3 years ago.

0.5 match 1.00 score

wpihongzhang

cwot:Cauchy Weighted Joint Test for Pharmacogenetics Analysis

A flexible and robust joint test of the single nucleotide polymorphism (SNP) main effect and genotype-by-treatment interaction effect for continuous and binary endpoints. Two analytic procedures, Cauchy weighted joint test (CWOT) and adaptively weighted joint test (AWOT), are proposed to accurately calculate the joint test p-value. The proposed methods are evaluated through extensive simulations under various scenarios. The results show that the proposed AWOT and CWOT control type I error well and outperform existing methods in detecting the most interesting signal patterns in pharmacogenetics (PGx) association studies. For reference, see Hong Zhang, Devan Mehrotra and Judong Shen (2022) <doi:10.13140/RG.2.2.28323.53280>.

Maintained by Hong Zhang. Last updated 2 years ago.

0.5 match 1.00 score

kkunji

AssocAFC:Allele Frequency Comparison

When doing association analysis one does not always have the genotypes for the control population. In such cases it may be necessary to fall back on frequency based tests using well known sources for the frequencies in the control population, for instance, from the 1000 Genomes Project. The Allele Frequency Comparison ('AssocAFC') package performs multiple rare variant association analyses in both population and family-based GWAS (Genome-Wide Association Study) designs. It includes three score tests that are based on the difference of the sum of allele frequencies between cases and controls. Two of these tests, Wcorrected() and Wqls(), are collapsing-based tests and suffer from having protective and risk variants. The third test, afcSKAT(), is a score test that overcomes the mix of SNP (Single-Nucleotide Polymorphism) effect directions. For more details see Saad M and Wijsman EM (2017) <doi:10.1093/bib/bbx107>.

Maintained by Khalid B. Kunji. Last updated 7 years ago.

0.5 match 1.00 score 9 scripts

cran

SADEG:Stability Analysis in Differentially Expressed Genes

We analyzed the nucleotide composition of genes with a special emphasis on stability of DNA sequences. Besides, in a variety of different organisms unequal use of synonymous codons, or codon usage bias, occurs, which also shows variation among genes in the same genome. Seemingly, codon usage bias is affected by both selective constraints and mutation bias, which enables us to examine and detect changes in these two evolutionary forces between genomes or along one genome. Therefore, we determined the codon adaptation index (CAI), effective number of codons (ENC) and codon usage analysis with calculation of the relative synonymous codon usage (RSCU), and subsequently predicted the translation efficiency and accuracy through GC-rich codon usages. Furthermore, we estimated the relative stability of the DNA sequence following calculation of the average free energy (Delta G) and Dimer base-stacking energy level.

Maintained by Babak Khorsand. Last updated 8 years ago.

0.5 match 1.00 score

wittenburg

hscovar:Calculation of Covariance Between Markers for Half-Sib Families

The theoretical covariance between pairs of markers is calculated from either paternal haplotypes and maternal linkage disequilibrium (LD) or vice versa. A genetic map is required. Grouping of markers is based on the correlation matrix and a representative marker is suggested for each group. Employing the correlation matrix, optimal sample size can be derived for association studies based on a SNP-BLUP approach. The implementation relies on paternal half-sib families and biallelic markers. If maternal half-sib families are used, the roles of sire/dam are swapped. Multiple families can be considered. Wittenburg, Bonk, Doschoris, Reyer (2020) "Design of Experiments for Fine-Mapping Quantitative Trait Loci in Livestock Populations" <doi:10.1186/s12863-020-00871-1>. Carlson, Eberle, Rieder, Yi, Kruglyak, Nickerson (2004) "Selecting a maximally informative set of single-nucleotide polymorphisms for association analyses using linkage disequilibrium" <doi:10.1086/381000>.

Maintained by Dörte Wittenburg. Last updated 4 years ago.

0.5 match 1.00 score 2 scripts