Showing 195 of 195 results
bioc
Modstrings:Working with modified nucleotide sequences
Representing nucleotide modifications in a nucleotide sequence is usually done via special characters from a number of sources. This represents a challenge to work with in R and the Biostrings package. The Modstrings package implements this functionality for RNA and DNA sequences containing modified nucleotides by translating the characters internally in order to work with the infrastructure of the Biostrings package. For this, the ModRNAString and ModDNAString classes and derivatives, along with functions to construct and modify these objects despite the encoding issues, are implemented. In addition, the conversion from sequences to list-like location information (and the reverse operation) is implemented as well.
Maintained by Felix G.M. Ernst. Last updated 5 months ago.
dataimport datarepresentation infrastructure sequencing software bioconductor biostrings dna dna-modifications modified-nucleotides nucleotides rna rna-modification-alphabet rna-modifications sequences
33.7 match 1 stars 6.64 score 5 scripts 8 dependents
bioc
ginmappeR:Gene Identifier Mapper
Provides functionalities to translate gene or protein identifiers between state-of-the-art biological databases: CARD (<https://card.mcmaster.ca/>), NCBI Protein, Nucleotide and Gene (<https://www.ncbi.nlm.nih.gov/>), UniProt (<https://www.uniprot.org/>) and KEGG (<https://www.kegg.jp>). Also offers complementary functionality such as retrieval of NCBI identical proteins or UniProt similar gene clusters.
Maintained by Fernando Sola. Last updated 3 months ago.
annotation kegg genetics thirdpartyclient software
17.5 match 4.88 score 7 scripts
dsstoffer
astsa:Applied Statistical Time Series Analysis
Contains data sets and scripts for analyzing time series in both the frequency and time domains including state space modeling as well as supporting the texts Time Series Analysis and Its Applications: With R Examples (5th ed), by R.H. Shumway and D.S. Stoffer. Springer Texts in Statistics, 2025, <https://link.springer.com/book/9783031705830>, and Time Series: A Data Analysis Approach Using R. Chapman-Hall, 2019, <DOI:10.1201/9780429273285>.
Maintained by David Stoffer. Last updated 2 months ago.
10.6 match 7 stars 7.88 score 2.2k scripts 8 dependents
thibautjombart
adegenet:Exploratory Analysis of Genetic and Genomic Data
Toolset for the exploration of genetic and genomic data. Adegenet provides formal (S4) classes for storing and handling various genetic data, including genetic markers with varying ploidy and hierarchical population structure ('genind' class), allele counts by populations ('genpop'), and genome-wide SNP data ('genlight'). It also implements original multivariate methods (DAPC, sPCA), graphics, statistical tests, simulation tools, distance and similarity measures, and several spatial methods. A range of both empirical and simulated datasets is also provided to illustrate various methods.
Maintained by Zhian N. Kamvar. Last updated 1 months ago.
5.3 match 182 stars 12.60 score 1.9k scripts 29 dependents
bioc
ShortRead:FASTQ input and manipulation
This package implements sampling, iteration, and input of FASTQ files. The package includes functions for filtering and trimming reads, and for generating a quality assessment report. Data are represented as DNAStringSet-derived objects, and easily manipulated for a diversity of purposes. The package also contains legacy support for early single-end, ungapped alignment formats.
Maintained by Bioconductor Package Maintainer. Last updated 5 months ago.
dataimport sequencing qualitycontrol bioconductor-package core-package zlib cpp
5.2 match 8 stars 12.08 score 1.8k scripts 49 dependents
sjmack
HLAtools:Toolkit for HLA Immunogenomics
A toolkit for the analysis and management of data for genes in the so-called "Human Leukocyte Antigen" (HLA) region. Functions extract reference data from the Anthony Nolan HLA Informatics Group/ImmunoGeneTics HLA 'GitHub' repository (ANHIG/IMGTHLA) <https://github.com/ANHIG/IMGTHLA>, validate Genotype List (GL) Strings, convert between UNIFORMAT and GL String Code (GLSC) formats, translate HLA alleles and GLSCs across ImmunoPolymorphism Database (IPD) IMGT/HLA Database release versions, identify differences between pairs of alleles at a locus, generate customized, multi-position sequence alignments, trim and convert allele-names across nomenclature epochs, and extend existing data-analysis methods.
Maintained by Steven Mack. Last updated 13 days ago.
9.6 match 4 stars 6.21 score 7 scripts 1 dependents
boopsboops
spider:Species Identity and Evolution in R
Analysis of species limits and DNA barcoding data. Included are functions for generating important summary statistics from DNA barcode data, assessing specimen identification efficacy, testing and optimizing divergence threshold limits, assessment of diagnostic nucleotides, and calculation of the probability of reciprocal monophyly. Additionally, a sliding window function offers opportunities to analyse information across a gene, often used for marker design in degraded DNA studies. Further information on the package has been published in Brown et al (2012) <doi:10.1111/j.1755-0998.2011.03108.x>.
Maintained by Rupert A. Collins. Last updated 6 years ago.
dna-barcode edna evolution species-delimitation species-identity
9.8 match 2 stars 5.20 score 66 scripts 1 dependents
bioc
motifbreakR:A Package For Predicting The Disruptiveness Of Single Nucleotide Polymorphisms On Transcription Factor Binding Sites
We introduce motifbreakR, which allows the biologist to judge in the first place whether the sequence surrounding the polymorphism is a good match, and in the second place how much information is gained or lost in one allele of the polymorphism relative to another. MotifbreakR is both flexible and extensible over previous offerings; giving a choice of algorithms for interrogation of genomes with motifs from public sources that users can choose from; these are 1) a weighted-sum probability matrix, 2) log-probabilities, and 3) weighted by relative entropy. MotifbreakR can predict effects for novel or previously described variants in public databases, making it suitable for tasks beyond the scope of its original design. Lastly, it can be used to interrogate any genome curated within Bioconductor (currently there are 32 species, a total of 109 versions).
Maintained by Simon Gert Coetzee. Last updated 5 months ago.
chipseq visualization motifannotation transcription
5.6 match 28 stars 8.96 score 103 scripts
bioc
seqTools:Analysis of nucleotide, sequence and quality content on fastq files
Analyze read length, phred scores and alphabet frequency and DNA k-mers on uncompressed and compressed fastq files.
Maintained by Wolfgang Kaisers. Last updated 5 months ago.
8.9 match 5.57 score 52 scripts 1 dependents
bioc
YAPSA:Yet Another Package for Signature Analysis
This package provides functions and routines for supervised analyses of mutational signatures (i.e., the signatures have to be known, cf. L. Alexandrov et al., Nature 2013 and L. Alexandrov et al., bioRxiv 2018). In particular, the family of functions LCD (LCD = linear combination decomposition) can use optimal signature-specific cutoffs, which take care of the different detectability of the different signatures. Moreover, the package provides different sets of mutational signatures, including the COSMIC and PCAWG SNV signatures and the PCAWG Indel signatures; the latter implying that with YAPSA, the concept of supervised analysis of mutational signatures is extended to Indel signatures. YAPSA also provides confidence intervals as computed by profile likelihoods and can perform signature analysis on a stratified mutational catalogue (SMC = stratify mutational catalogue) in order to analyze enrichment and depletion patterns for the signatures in different strata.
Maintained by Zuguang Gu. Last updated 5 months ago.
sequencing dnaseq somaticmutation visualization clustering genomicvariation statisticalmethod biologicalquestion
7.3 match 6.41 score 57 scripts
tathey
VLF:Frequency Matrix Approach for Assessing Very Low Frequency Variants in Sequence Records
Using frequency matrices, very low frequency variants (VLFs) are assessed for amino acid and nucleotide sequences. The VLFs are then compared to see if they occur in only one member of a species, singleton VLFs, or if they occur in multiple members of a species, shared VLFs. The amino acid and nucleotide VLFs are then compared to see if they are concordant with one another. Amino acid VLFs are also assessed to determine if they lead to a change in amino acid residue type, and potential changes to protein structures. Based on Stoeckle and Kerr (2012) <doi:10.1371/journal.pone.0043992>.
Maintained by Taryn B. T. Athey. Last updated 3 years ago.
21.1 match 2.16 score 48 scripts 1 dependents
bioc
maftools:Summarize, Analyze and Visualize MAF Files
Analyze and visualize Mutation Annotation Format (MAF) files from large-scale sequencing studies. This package provides various functions to perform the most commonly used analyses in cancer genomics and to create feature-rich, customizable visualizations with minimal effort.
Maintained by Anand Mayakonda. Last updated 5 months ago.
datarepresentation dnaseq visualization drivermutation variantannotation featureextraction classification somaticmutation sequencing functionalgenomics survival bioinformatics cancer-genome-atlas cancer-genomics genomics maf-files tcga curl bzip2 xz-utils zlib
2.9 match 459 stars 14.63 score 948 scripts 18 dependents
bioc
atSNP:Affinity test for identifying regulatory SNPs
atSNP performs affinity tests of motif matches with the SNP or the reference genomes and SNP-led changes in motif matches.
Maintained by Sunyoung Shin. Last updated 5 months ago.
software chipseq genomeannotation motifannotation visualization cpp
6.8 match 1 stars 5.73 score 36 scripts
mvesuviusc
primerTree:Visually Assessing the Specificity and Informativeness of Primer Pairs
Identifies potential target sequences for a given set of primers and generates phylogenetic trees annotated with the taxonomies of the predicted amplification products.
Maintained by Matt Cannon. Last updated 1 years ago.
6.9 match 51 stars 5.56 score 16 scripts
bioc
ORFhunteR:Predict open reading frames in nucleotide sequences
The ORFhunteR package is an R and C++ library for an automatic determination and annotation of open reading frames (ORF) in a large set of RNA molecules. It efficiently implements the machine learning model based on vectorization of nucleotide sequences and the random forest classification algorithm. The ORFhunteR package consists of a set of functions written in the R language in conjunction with C++. The efficiency of the package was confirmed by the examples of the analysis of RNA molecules from the NCBI RefSeq and Ensembl databases. The package can be used in basic and applied biomedical research related to the study of the transcriptome of normal as well as altered (for example, cancer) human cells.
Maintained by Vasily V. Grinev. Last updated 5 months ago.
technology statisticalmethod sequencing rnaseq classification featureextraction cpp
8.6 match 1 stars 4.48 score
openintrostat
openintro:Datasets and Supplemental Functions from 'OpenIntro' Textbooks and Labs
Supplemental functions and data for 'OpenIntro' resources, which includes open-source textbooks and resources for introductory statistics (<https://www.openintro.org/>). The package contains datasets used in our open-source textbooks along with custom plotting functions for reproducing book figures. Note that many functions and examples include color transparency; some plotting elements may not show up properly (or at all) when run in some versions of Windows operating system.
Maintained by Mine Çetinkaya-Rundel. Last updated 3 months ago.
3.3 match 240 stars 11.39 score 6.0k scripts
bioc
deepSNV:Detection of subclonal SNVs in deep sequencing data.
This package provides quantitative variant callers for detecting subclonal mutations in ultra-deep (>=100x coverage) sequencing experiments. The deepSNV algorithm is used for a comparative setup with a control experiment of the same loci and uses a beta-binomial model and a likelihood ratio test to discriminate sequencing errors and subclonal SNVs. The shearwater algorithm computes a Bayes classifier based on a beta-binomial model for variant calling with multiple samples for precisely estimating model parameters - such as local error rates and dispersion - and prior knowledge, e.g. from variation databases such as COSMIC.
Maintained by Moritz Gerstung. Last updated 5 months ago.
geneticvariability snp sequencing genetics dataimport curl bzip2 xz-utils zlib cpp
5.6 match 6.53 score 38 scripts 1 dependents
bioc
G4SNVHunter:Evaluating SNV-Induced Disruption of G-Quadruplex Structures
G-quadruplexes (G4s) are unique nucleic acid secondary structures predominantly found in guanine-rich regions and have been shown to be involved in various biological regulatory processes. G4SNVHunter is an R package designed to rapidly identify genomic sequences with G4-forming potential and accurately screen user-provided single nucleotide variants (also applicable to single nucleotide polymorphisms) that may destabilize these structures. This enables users to screen key variants for further experimental study, investigating how these variants may influence biological functions, such as gene regulation, by altering G4 formation.
Maintained by Rongxin Zhang. Last updated 3 months ago.
8.0 match 4.48 score 4 scripts
bioc
qckitfastq:FASTQ Quality Control
Assessment of FASTQ file format with multiple metrics including quality score, sequence content, overrepresented sequence and Kmers.
Maintained by August Guang. Last updated 5 months ago.
software qualitycontrol sequencing zlib cpp
7.9 match 4.38 score 24 scripts
bioc
immApex:Tools for Adaptive Immune Receptor Sequence-Based Machine and Deep Learning
A set of tools to build tensorflow/keras3-based models in R from amino acid and nucleotide sequences focusing on adaptive immune receptors. The package includes pre-processing of sequences, unifying gene nomenclature usage, encoding sequences, and combining models. This package will serve as the basis of future immune receptor sequence functions/packages/models compatible with the scRepertoire ecosystem.
Maintained by Nick Borcherding. Last updated 19 days ago.
software immunooncology singlecell classification annotation sequencing motifannotation
5.7 match 8 stars 5.92 score 3 scripts
bioc
BUMHMM:Computational pipeline for computing probability of modification from structure probing experiment data
This is a probabilistic modelling pipeline for computing per-nucleotide posterior probabilities of modification from the data collected in structure probing experiments. The model supports multiple experimental replicates and empirically corrects coverage- and sequence-dependent biases. The model utilises the measure of a "drop-off rate" for each nucleotide, which is compared between replicates through a log-ratio (LDR). The LDRs between control replicates define a null distribution of variability in drop-off rate observed by chance, and LDRs between treatment and control replicates are compared to this distribution. Resulting empirical p-values (probability of being "drawn" from the null distribution) are used as observations in a Hidden Markov Model with a Beta-Uniform Mixture model used as an emission model. The resulting posterior probabilities indicate the probability of a nucleotide having been modified in a structure probing experiment.
Maintained by Alina Selega. Last updated 5 months ago.
immunooncology geneticvariability transcription geneexpression generegulation coverage genetics structuralprediction transcriptomics bayesian classification featureextraction hiddenmarkovmodel regression rnaseq sequencing
7.9 match 4.15 score 14 scripts
bioc
QSutils:Quasispecies Diversity
Set of utility functions for viral quasispecies analysis with NGS data. Most functions are equally useful for metagenomic studies. There are three main types: (1) data manipulation and exploration—functions useful for converting reads to haplotypes and frequencies, repairing reads, intersecting strand haplotypes, and visualizing haplotype alignments. (2) diversity indices—functions to compute diversity and entropy, in which incidence, abundance, and functional indices are considered. (3) data simulation—functions useful for generating random viral quasispecies data.
Maintained by Mercedes Guerrero-Murillo. Last updated 5 months ago.
software genetics dnaseq geneticvariability sequencing alignment sequencematching dataimport
5.7 match 5.56 score 8 scripts 1 dependents
rdinnager
slimr:Create, Run and Post-Process 'SLiM' Population Genetics Forward Simulations
Lets you write 'SLiM' scripts (population genomics simulation) using your favourite R IDE, using a syntax as close as possible to the original 'SLiM' language. It offers many tools to manipulate those scripts, run them in the 'SLiM' software from R, and capture and post-process their output, after or even during a simulation.
Maintained by Russell Dinnage. Last updated 4 months ago.
6.8 match 8 stars 4.70 score 42 scripts
bioc
ModCon:Modifying splice site usage by changing the mRNP code, while maintaining the genetic code
Collection of functions to calculate a surrounding nucleotide sequence for splice donor sites to either activate or repress donor usage. The proposed alternative nucleotide sequence encodes the same amino acid and could be applied, e.g., in reporter systems to silence or activate cryptic splice donor sites.
Maintained by Johannes Ptok. Last updated 5 months ago.
functionalgenomics alternativesplicing
7.9 match 1 stars 4.00 score 2 scripts
ropensci
rsnps:Get 'SNP' ('Single-Nucleotide' 'Polymorphism') Data on the Web
A programmatic interface to various 'SNP' 'datasets' on the web: 'OpenSNP' (<https://opensnp.org>), and the 'NCBI' 'dbSNP' database (<https://www.ncbi.nlm.nih.gov/projects/SNP/>). Functions are included for searching for 'NCBI'. For 'OpenSNP', functions are included for getting 'SNPs', and data for 'genotypes', 'phenotypes', annotations, and bulk downloads of data by user.
Maintained by Julia Gustavsen. Last updated 2 years ago.
gene snp sequence api web api-client species dbsnp opensnp ncbi genotype data snps web-api
4.6 match 52 stars 6.59 score 63 scripts
ropensci
beautier:'BEAUti' from R
'BEAST2' (<https://www.beast2.org>) is a widely used Bayesian phylogenetic tool, that uses DNA/RNA/protein data and many model priors to create a posterior of jointly estimated phylogenies and parameters. 'BEAUti 2' (which is part of 'BEAST2') is a GUI tool that allows users to specify the many possible setups and generates the XML file 'BEAST2' needs to run. This package provides a way to create 'BEAST2' input files without active user input, but using R function calls instead.
Maintained by Richèl J.C. Bilderbeek. Last updated 22 days ago.
bayesian beast beast2 beauti phylogenetic-inference phylogenetics
3.4 match 13 stars 8.76 score 198 scripts 5 dependents
prabinameher
EncDNA:Encoding of Nucleotide Sequences into Numeric Feature Vectors
We describe fifteen different splice site sequence encoding schemes that have been used in earlier studies for mapping of splice site sequences into numeric feature vectors. These encoding schemes will also be helpful for transforming other nucleotide sequences into numeric forms, provided they are of equal length. These encoding schemes will help the computational biologist working in the field of classification (binary or multiclass) or prediction involving nucleic acid sequences of equal length.
Maintained by Prabina Kumar Meher. Last updated 6 years ago.
29.6 match 1 stars 1.00 score
cran
KRIS:Keen and Reliable Interface Subroutines for Bioinformatic Analysis
Provides useful functions which are needed for bioinformatic analysis such as calculating linear principal components from numeric data and single-nucleotide polymorphism (SNP) datasets, calculating the fixation index (Fst) using the Hudson method, creating scatter plots in 3 views, handling the PLINK binary file format, detecting rough structures and outliers using unsupervised clustering, and calculating matrix multiplication in a faster way for big data.
Maintained by Kridsadakorn Chaichoompu. Last updated 4 years ago.
10.4 match 2.73 score 18 scripts 2 dependents
uclahs-cds
BoutrosLab.plotting.general:Functions to Create Publication-Quality Plots
Contains several plotting functions such as barplots, scatterplots, heatmaps, as well as functions to combine plots and assist in the creation of these plots. These functions will give users great ease of use and customization options in broad use for biomedical applications, as well as general purpose plotting. Each of the functions also provides valid default settings to make plotting data more efficient and producing high quality plots with standard colour schemes simpler. All functions within this package are capable of producing plots that are of the quality to be presented in scientific publications and journals. P'ng et al.; BPG: Seamless, automated and interactive visualization of scientific data; BMC Bioinformatics 2019 <doi:10.1186/s12859-019-2610-2>.
Maintained by Paul Boutros. Last updated 5 months ago.
3.4 match 12 stars 8.36 score 414 scripts 6 dependents
bioc
mitoClone2:Clonal Population Identification in Single-Cell RNA-Seq Data using Mitochondrial and Somatic Mutations
This package primarily identifies variants in mitochondrial genomes from BAM alignment files. It filters these variants to remove RNA editing events, then estimates their evolutionary relationship (i.e. their phylogenetic tree) and groups single cells into clones. It also visualizes the mutations and provides additional genomic context.
Maintained by Benjamin Story. Last updated 5 months ago.
annotation dataimport genetics snp software singlecell alignment curl bzip2 xz-utils zlib cpp
6.3 match 1 stars 4.48 score 9 scripts
bioc
rhinotypeR:Rhinovirus genotyping
"rhinotypeR" is designed to automate the comparison of sequence data against prototype strains, streamlining the genotype assignment process. By implementing predefined pairwise distance thresholds, this package makes genotype assignment accessible to researchers and public health professionals. This tool enhances our epidemiological toolkit by enabling more efficient surveillance and analysis of rhinoviruses (RVs) and other viral pathogens with complex genomic landscapes. Additionally, "rhinotypeR" supports comprehensive visualization and analysis of single nucleotide polymorphisms (SNPs) and amino acid substitutions, facilitating in-depth genetic and evolutionary studies.
Maintained by Martha Luka. Last updated 5 months ago.
sequencing genetics phylogenetics
4.3 match 4 stars 6.28 score 2 scripts
bioc
ORFik:Open Reading Frames in Genomics
R package for analysis of transcript and translation features through manipulation of sequence data and NGS data like Ribo-Seq, RNA-Seq, TCP-Seq and CAGE. It is generalized in the sense that any transcript region can be analysed; as the name hints, it was made with the investigation of ribosomal patterns over Open Reading Frames (ORFs) as its primary use case. ORFik is extremely fast through use of C++, data.table and GenomicRanges. The package allows reassigning the starts of transcripts using CAGE-Seq data, automatic shifting of RiboSeq reads, finding Open Reading Frames for whole genomes and much more.
Maintained by Haakon Tjeldnes. Last updated 27 days ago.
immunooncology software sequencing riboseq rnaseq functionalgenomics coverage alignment dataimport cpp
2.4 match 33 stars 10.63 score 115 scripts 2 dependents
bioc
CSAR:Statistical tools for the analysis of ChIP-seq data
Statistical tools for ChIP-seq data analysis. The package includes the statistical method described in Kaufmann et al. (2009) PLoS Biology: 7(4):e1000090. Briefly, taking the average DNA fragment size subjected to sequencing into account, the software calculates genomic single-nucleotide read-enrichment values. After normalization, sample and control are compared using a test based on the Poisson distribution. Test statistic thresholds to control the false discovery rate are obtained through random permutation.
Maintained by Jose M Muino. Last updated 5 months ago.
5.6 match 4.30 score 6 scripts
erhard-lab
grandR:Comprehensive Analysis of Nucleotide Conversion Sequencing Data
Nucleotide conversion sequencing experiments have been developed to add a temporal dimension to RNA-seq and single-cell RNA-seq. Such experiments require specialized tools for primary processing such as GRAND-SLAM (see 'Jürges et al' <doi:10.1093/bioinformatics/bty256>) and specialized tools for downstream analyses. 'grandR' provides a comprehensive toolbox for quality control, kinetic modeling, differential gene expression analysis and visualization of such data.
Maintained by Florian Erhard. Last updated 1 months ago.
3.4 match 11 stars 7.03 score 18 scripts 1 dependents
bioc
lumi:BeadArray Specific Methods for Illumina Methylation and Expression Microarrays
The lumi package provides an integrated solution for the Illumina microarray data analysis. It includes functions of Illumina BeadStudio (GenomeStudio) data input, quality control, BeadArray-specific variance stabilization, normalization and gene annotation at the probe level. It also includes the functions of processing Illumina methylation microarrays, especially Illumina Infinium methylation microarrays.
Maintained by Lei Huang. Last updated 5 months ago.
microarray onechannel preprocessing dnamethylation qualitycontrol twochannel
3.8 match 6.27 score 294 scripts 5 dependents
bioc
TFBSTools:Software Package for Transcription Factor Binding Site (TFBS) Analysis
TFBSTools is a package for the analysis and manipulation of transcription factor binding sites. It includes matrix conversion between Position Frequency Matrix (PFM), Position Weight Matrix (PWM) and Information Content Matrix (ICM). It can also scan putative TFBS from sequence/alignment, query the JASPAR database and provide a wrapper of de novo motif discovery software.
Maintained by Ge Tan. Last updated 4 days ago.
motifannotation generegulation motifdiscovery transcription alignment
1.9 match 28 stars 12.36 score 1.1k scripts 18 dependents
adrientaudiere
MiscMetabar:Miscellaneous Functions for Metabarcoding Analysis
Facilitate the description, transformation, exploration, and reproducibility of metabarcoding analyses. 'MiscMetabar' is mainly built on top of the 'phyloseq', 'dada2' and 'targets' R packages. It helps to build reproducible and robust bioinformatics pipelines in R. 'MiscMetabar' makes ecological analysis of alpha and beta-diversity easier, more reproducible and more powerful by integrating a large number of tools. Important features are described in Taudière A. (2023) <doi:10.21105/joss.06038>.
Maintained by Adrien Taudière. Last updated 25 days ago.
sequencing microbiome metagenomics clustering classification visualization amplicon amplicon-sequencing biodiversity-informatics ecology illumina metabarcoding ngs-analysis
3.6 match 17 stars 6.44 score 23 scripts
jinghuazhao
gap:Genetic Analysis Package
As first reported [Zhao, J. H. 2007. "gap: Genetic Analysis Package". J Stat Soft 23(8):1-18. <doi:10.18637/jss.v023.i08>], it is designed as an integrated package for genetic data analysis of both population and family data. Currently, it contains functions for sample size calculations of both population-based and family-based designs, probability of familial disease aggregation, kinship calculation, statistics in linkage analysis, and association analysis involving genetic markers including haplotype analysis with or without environmental covariates. Over the years, the package has been developed in-between many projects hence also in line with the name (gap).
Maintained by Jing Hua Zhao. Last updated 16 days ago.
1.9 match 12 stars 11.88 score 448 scripts 16 dependents
msq-123
CovidMutations:Mutation Analysis and Assay Validation Toolkit for COVID-19 (Coronavirus Disease 2019)
A feasible framework for mutation analysis and reverse transcription polymerase chain reaction (RT-PCR) assay evaluation of COVID-19, including mutation profile visualization, statistics and mutation ratio of each assay. The mutation ratio is conducive to evaluating the coverage of RT-PCR assays in large-sized samples <doi:10.20944/preprints202004.0529.v1>.
Maintained by Shaoqian Ma. Last updated 5 years ago.
5.0 match 4 stars 4.30 score 6 scripts
statgenlmu
coala:A Framework for Coalescent Simulation
Coalescent simulators can rapidly simulate biological sequences evolving according to a given model of evolution. You can use this package to specify such models, to conduct the simulations and to calculate additional statistics from the results (Staab, Metzler, 2016 <doi:10.1093/bioinformatics/btw098>). It relies on existing simulators for doing the simulation, and currently supports the programs 'ms', 'msms' and 'scrm'. It also supports finite-sites mutation models by combining the simulators with the program 'seq-gen'. Coala provides functions for calculating certain summary statistics, which can also be applied to actual biological data. One possibility to import data is through the 'PopGenome' package (<https://github.com/pievos101/PopGenome>).
Maintained by Dirk Metzler. Last updated 1 years ago.
coalescent dna evolution popgen simulation cpp
3.0 match 23 stars 7.06 score 84 scripts
bioc
VarCon:VarCon: an R package for retrieving neighboring nucleotides of an SNV
VarCon is an R package which converts the positional information from the annotation of a single nucleotide variation (SNV) (either referring to the coding sequence or the reference genomic sequence). It retrieves the genomic reference sequence around the position of the single nucleotide variation. To assess whether the SNV could potentially influence binding of splicing regulatory proteins, VarCon calculates the HEXplorer score as an estimation. Besides, VarCon additionally reports splice site strengths of splice sites within the retrieved genomic sequence and any changes due to the SNV.
Maintained by Johannes Ptok. Last updated 5 months ago.
functionalgenomics alternativesplicing
5.3 match 4.00 score 5 scripts
emmanuelparadis
pegas:Population and Evolutionary Genetics Analysis System
Functions for reading, writing, plotting, analysing, and manipulating allelic and haplotypic data, including from VCF files, and for the analysis of population nucleotide sequences and micro-satellites including coalescent analyses, linkage disequilibrium, population structure (Fst, Amova) and equilibrium (HWE), haplotype networks, minimum spanning tree and network, and median-joining networks.
Maintained by Emmanuel Paradis. Last updated 1 years ago.
2.8 match 7.53 score 576 scripts 18 dependents
bioc
DropletUtils:Utilities for Handling Single-Cell Droplet Data
Provides a number of utility functions for handling single-cell (RNA-seq) data from droplet technologies such as 10X Genomics. This includes data loading from count matrices or molecule information files, identification of cells from empty droplets, removal of barcode-swapped pseudo-cells, and downsampling of the count matrix.
Maintained by Jonathan Griffiths. Last updated 3 months ago.
immunooncology singlecell sequencing rnaseq geneexpression transcriptomics dataimport coverage zlib cpp
2.0 match 10.08 score 2.7k scripts 9 dependents
bioc
oligo:Preprocessing tools for oligonucleotide arrays
A package to analyze oligonucleotide arrays (expression/SNP/tiling/exon) at probe-level. It currently supports Affymetrix (CEL files) and NimbleGen arrays (XYS files).
Maintained by Benilton Carvalho. Last updated 8 days ago.
microarray onechannel twochannel preprocessing snp differentialexpression exonarray geneexpression dataimport zlib
1.9 match 3 stars 10.42 score 528 scripts 10 dependents
ropensci
bold:Interface to Bold Systems API
A programmatic interface to the Web Service methods provided by Bold Systems (<http://www.boldsystems.org/>) for genetic 'barcode' data. Functions include methods for searching for sequences by taxonomic names, ids, collectors, and institutions, as well as functions for searching for specimens and downloading trace files.
Maintained by Salix Dubois. Last updated 3 months ago.
biodiversitybarcodednasequencesfastaapi-wrapperbarcodestaxize
3.4 match 18 stars 5.74 score 57 scripts
bioc
TCGAbiolinks:TCGAbiolinks: An R/Bioconductor package for integrative analysis with GDC data
The aims of TCGAbiolinks are to: i) facilitate retrieval of GDC open-access data, ii) prepare the data using the appropriate pre-processing strategies, iii) provide the means to carry out different standard analyses, and iv) easily reproduce earlier research results. In more detail, the package provides multiple methods for analysis (e.g., differential expression analysis, identifying differentially methylated regions) and methods for visualization (e.g., survival plots, volcano plots, starburst plots) in order to easily develop complete analysis pipelines.
Maintained by Tiago Chedraoui Silva. Last updated 27 days ago.
dnamethylationdifferentialmethylationgeneregulationgeneexpressionmethylationarraydifferentialexpressionpathwaysnetworksequencingsurvivalsoftwarebiocbioconductorgdcintegrative-analysistcgatcga-datatcgabiolinks
1.3 match 305 stars 14.45 score 1.6k scripts 6 dependents
jgx65
hierfstat:Estimation and Tests of Hierarchical F-Statistics
Estimates hierarchical F-statistics from haploid or diploid genetic data with any number of levels in the hierarchy, following the algorithm of Yang (Evolution (1998), 52:950). Tests the significance of each F and variance component via randomisation, using the likelihood-ratio statistic G (Goudet et al. (1996) <https://academic.oup.com/genetics/article/144/4/1933/6017091>). Estimates genetic diversity statistics for haploid and diploid genetic datasets in various formats, including inbreeding and coancestry coefficients, and population-specific F-statistics following Weir and Goudet (2017) <https://academic.oup.com/genetics/article/206/4/2085/6072590>.
Maintained by Jerome Goudet. Last updated 4 months ago.
devtoolsfstatisticsgwashierfstatkinshippopulation-geneticspopulation-genomicsquantitative-geneticssimulations
1.8 match 25 stars 10.94 score 560 scripts 4 dependents
systemsbioinformatics
parcr:Construct Parsers for Structured Text Files
Construct parser combinator functions, higher order functions that parse input. Construction of such parsers is transparent and easy. Their main application is the parsing of structured text files like those generated by laboratory instruments. Based on a paper by Hutton (1992) <doi:10.1017/S0956796800000411>.
Maintained by Douwe Molenaar. Last updated 9 months ago.
combinatorshigher-order-functionsparserparsing
3.8 match 4 stars 5.08 score 8 scripts
lucasnell
jackalope:A Swift, Versatile Phylogenomic and High-Throughput Sequencing Simulator
Simply and efficiently simulates (i) variants from reference genomes and (ii) reads from both Illumina <https://www.illumina.com/> and Pacific Biosciences (PacBio) <https://www.pacb.com/> platforms. It can either read reference genomes from FASTA files or simulate new ones. Genomic variants can be simulated using summary statistics, phylogenies, Variant Call Format (VCF) files, and coalescent simulations—the latter of which can include selection, recombination, and demographic fluctuations. 'jackalope' can simulate single, paired-end, or mate-pair Illumina reads, as well as PacBio reads. These simulations include sequencing errors, mapping qualities, multiplexing, and optical/polymerase chain reaction (PCR) duplicates. Simulating Illumina sequencing is based on ART by Huang et al. (2012) <doi:10.1093/bioinformatics/btr708>. PacBio sequencing simulation is based on SimLoRD by Stöcker et al. (2016) <doi:10.1093/bioinformatics/btw286>. All outputs can be written to standard file formats.
Maintained by Lucas A. Nell. Last updated 1 year ago.
zlibopenblascurlbzip2xz-utilscpp
3.3 match 8 stars 5.28 score 24 scripts
nakarinp
longreadvqs:Viral Quasispecies Comparison from Long-Read Sequencing Data
Performs a variety of viral quasispecies diversity analyses [see Pamornchainavakul et al. (2024) <doi:10.21203/rs.3.rs-4637890/v1>] based on long-read sequence alignment. Main functions include 1) minimization of sequencing error and other noise, and read sampling, 2) comparison of single nucleotide variant (SNV) profiles, and 3) comparison and visualization of viral quasispecies profiles.
Maintained by Nakarin Pamornchainavakul. Last updated 7 months ago.
3.7 match 4.65 score 4 scripts
bioc
Rsubread:Mapping, quantification and variant analysis of sequencing data
Alignment, quantification and analysis of RNA sequencing data (including both bulk RNA-seq and scRNA-seq) and DNA sequencing data (including ATAC-seq, ChIP-seq, WGS, WES etc). Includes functionality for read mapping, read counting, SNP calling, structural variant detection and gene fusion discovery. Can be applied to all major sequencing technologies and to both short and long sequence reads.
Maintained by Wei Shi. Last updated 2 days ago.
sequencingalignmentsequencematchingrnaseqchipseqsinglecellgeneexpressiongeneregulationgeneticsimmunooncologysnpgeneticvariabilitypreprocessingqualitycontrolgenomeannotationgenefusiondetectionindeldetectionvariantannotationvariantdetectionmultiplesequencealignmentzlib
1.9 match 9.24 score 892 scripts 10 dependents
immunomind
immunarch:Bioinformatics Analysis of T-Cell and B-Cell Immune Repertoires
A comprehensive framework for bioinformatics exploratory analysis of bulk and single-cell T-cell receptor and antibody repertoires. It provides seamless data loading, analysis and visualisation for AIRR (Adaptive Immune Receptor Repertoire) data, both bulk immunosequencing (RepSeq) and single-cell sequencing (scRNAseq). Immunarch implements most of the widely used AIRR analysis methods, such as: clonality analysis, estimation of repertoire similarities in distribution of clonotypes and gene segments, repertoire diversity analysis, annotation of clonotypes using external immune receptor databases and clonotype tracking in vaccination and cancer studies. A successor to our previously published 'tcR' immunoinformatics package (Nazarov 2015) <doi:10.1186/s12859-015-0613-1>.
Maintained by Vadim I. Nazarov. Last updated 12 months ago.
airr-analysisb-cell-receptorbcrbcr-repertoirebioinformaticsigig-repertoireimmune-repertoireimmune-repertoire-analysisimmune-repertoire-dataimmunoglobulinimmunoinformaticsimmunologyrep-seqrepertoire-analysissingle-cellsingle-cell-analysist-cell-receptortcrtcr-repertoirecpp
1.8 match 315 stars 9.49 score 203 scripts
bioc
DECIPHER:Tools for curating, analyzing, and manipulating biological sequences
A toolset for deciphering and managing biological sequences.
Maintained by Erik Wright. Last updated 5 days ago.
clusteringgeneticssequencingdataimportvisualizationmicroarrayqualitycontrolqpcralignmentwholegenomemicrobiomeimmunooncologygenepredictionopenmp
2.0 match 8.40 score 1.1k scripts 14 dependents
bioc
SeqVarTools:Tools for variant data
An interface to the fast-access storage format for VCF data provided in SeqArray, with tools for common operations and analysis.
Maintained by Stephanie M. Gogarten. Last updated 5 months ago.
snpgeneticvariabilitysequencinggenetics
1.9 match 3 stars 8.76 score 384 scripts 2 dependents
yulab-smu
TDbook:Companion Package for the Book "Data Integration, Manipulation and Visualization of Phylogenetic Trees" by Guangchuang Yu (2022, ISBN:9781032233574, doi:10.1201/9781003279242)
The companion package that provides all the datasets used in the book "Data Integration, Manipulation and Visualization of Phylogenetic Trees" by Guangchuang Yu (2022, ISBN:9781032233574, doi:10.1201/9781003279242).
Maintained by Guangchuang Yu. Last updated 3 years ago.
3.3 match 13 stars 4.88 score 59 scripts
cran
ips:Interfaces to Phylogenetic Software in R
Functions that wrap popular phylogenetic software for sequence alignment, masking of sequence alignments, and estimation of phylogenies and ancestral character states.
Maintained by Christoph Heibl. Last updated 11 months ago.
3.7 match 4.28 score 128 scripts 1 dependents
thomaschln
snplinkage:Single Nucleotide Polymorphisms Linkage Disequilibrium Visualizations
Linkage disequilibrium visualizations of up to several hundred single nucleotide polymorphisms (SNPs), annotated with chromosomal positions and gene names. Two types of plots are available: one for small numbers of SNPs (<40) and one for large numbers (tested up to 500). Both can be extended by combining them with other ggplots, e.g. association study results, and functions make it possible to directly visualize the effect of SNP selection methods, such as minor allele frequency filtering and TagSNP selection, with a second correlation heatmap. The SNP correlations are computed on Genotype Data objects from the 'GWASTools' package using the 'SNPRelate' package, and the plots are customizable 'ggplot2' and 'gtable' objects and are annotated using the 'biomaRt' package. Usage is detailed in the vignette with example data, and results from up to 500 SNPs of 1,200 scans are in Charlon T. (2019) <doi:10.13097/archive-ouverte/unige:161795>.
Maintained by Thomas Charlon. Last updated 4 months ago.
geneticvariabilitymicroarraysnp
3.4 match 4.62 score 14 scripts
jlp-bioinf
RRNA:Secondary Structure Plotting for RNA
Functions for creating and manipulating RNA secondary structure plots.
Maintained by JP Bida. Last updated 11 months ago.
3.4 match 1 stars 4.33 score 47 scripts 1 dependents
bioc
m6Aboost:m6Aboost
This package helps users run the m6Aboost model on their own miCLIP2 data. The package includes functions to assign the read counts and get the features needed to run the m6Aboost model. The miCLIP2 data should be stored in a GRanges object. More details can be found in the vignette.
Maintained by You Zhou. Last updated 5 months ago.
sequencingepigeneticsgeneticsexperimenthubsoftware
3.3 match 2 stars 4.30 score 5 scripts
bioc
seq.hotSPOT:Targeted sequencing panel design based on mutation hotspots
seq.hotSPOT provides a resource for designing effective sequencing panels to help improve mutation capture efficacy for ultradeep sequencing projects. Using SNV datasets, this package designs custom panels for any tissue of interest and identifies the genomic regions likely to contain the most mutations. Establishing efficient targeted sequencing panels can allow researchers to study mutation burden in tissues at high depth without the economic burden of whole-exome or whole-genome sequencing. This tool was developed to make high-depth sequencing panels to study low-frequency clonal mutations in clinically normal and cancerous tissues.
Maintained by Sydney Grant. Last updated 5 months ago.
softwaretechnologysequencingdnaseqwholegenome
3.5 match 4.00 score 3 scripts
bioc
EDASeq:Exploratory Data Analysis and Normalization for RNA-Seq
Numerical and graphical summaries of RNA-Seq read data. Within-lane normalization procedures to adjust for GC-content effect (or other gene-level effects) on read counts: loess robust local regression, global-scaling, and full-quantile normalization (Risso et al., 2011). Between-lane normalization procedures to adjust for distributional differences between lanes (e.g., sequencing depth): global-scaling and full-quantile normalization (Bullard et al., 2010).
Maintained by Davide Risso. Last updated 5 months ago.
immunooncologysequencingrnaseqpreprocessingqualitycontroldifferentialexpression
1.3 match 5 stars 10.24 score 594 scripts 9 dependents
william-swl
plutor:Useful Functions for Visualization
In ancient Roman mythology, 'Pluto' was the ruler of the underworld and presided over the afterlife. 'Pluto' was frequently conflated with 'Plutus', the god of wealth, because mineral wealth was found underground. When plotting with R, you try once, twice, practice again and again, and finally you get the pretty figure you want. It's a 'plot tour', a tour about repetition and reward. Hope 'plutor' helps you on the tour!
Maintained by William Song. Last updated 1 year ago.
3.8 match 3 stars 3.62 score 28 scripts
bioc
compSPOT:compSPOT: Tool for identifying and comparing significantly mutated genomic hotspots
Clonal cell groups share common mutations within cancer, precancer, and even clinically normal appearing tissues. The frequency and location of these mutations may predict prognosis and cancer risk. It has also been well established that certain genomic regions have increased sensitivity to acquiring mutations. Mutation-sensitive genomic regions may therefore serve as markers for predicting cancer risk. This package contains multiple functions to establish significantly mutated hotspots, compare hotspot mutation burden between samples, and perform exploratory data analysis of the correlation between hotspot mutation burden and personal risk factors for cancer, such as age, gender, and history of carcinogen exposure. This package allows users to identify robust genomic markers to help establish cancer risk.
Maintained by Sydney Grant. Last updated 5 months ago.
softwaretechnologysequencingdnaseqwholegenomeclassificationsinglecellsurvivalmultiplecomparison
3.4 match 4.00 score 3 scripts
bioc
SigsPack:Mutational Signature Estimation for Single Samples
Single sample estimation of exposure to mutational signatures. Exposures to known mutational signatures are estimated for single samples, based on quadratic programming algorithms. Bootstrapping the input mutational catalogues provides estimations on the stability of these exposures. The effect of the sequence composition of mutational context can be taken into account by normalising the catalogues.
Maintained by Franziska Schumann. Last updated 5 months ago.
somaticmutationsnpvariantannotationbiomedicalinformaticsdnaseq
3.0 match 2 stars 4.30 score 4 scripts
pneuvial
adjclust:Adjacency-Constrained Clustering of a Block-Diagonal Similarity Matrix
Implements a constrained version of hierarchical agglomerative clustering, in which each observation is associated to a position, and only adjacent clusters can be merged. Typical application fields in bioinformatics include Genome-Wide Association Studies or Hi-C data analysis, where the similarity between items is a decreasing function of their genomic distance. Taking advantage of this feature, the implemented algorithm is time and memory efficient. This algorithm is described in Ambroise et al (2019) <doi:10.1186/s13015-019-0157-4>.
Maintained by Pierre Neuvial. Last updated 5 months ago.
clusteringfeatureextractiongwashi-chierarchical-clusteringlinkage-disequilibriumcppopenmp
1.8 match 16 stars 7.35 score 13 scripts 2 dependents
bioc
isomiRs:Analyze isomiRs and miRNAs from small RNA-seq
Characterization of miRNAs and isomiRs, clustering and differential expression.
Maintained by Lorena Pantano. Last updated 5 months ago.
mirnarnaseqdifferentialexpressionclusteringimmunooncologyanalyze-isomirsbioconductorisomirs
1.8 match 8 stars 7.09 score 43 scripts
ubcxzhang
iimi:Identifying Infection with Machine Intelligence
A novel machine learning method for plant virus diagnostics using genome sequencing data. This package includes three different machine learning models (random forest, XGBoost, and elastic net) to train on and make predictions for mapped genome samples. A mappability profile and unreliable regions are introduced to the algorithm, and users can build a mappability profile from scratch with functions included in the package. Plotting of mapped sample coverage information is also provided.
Maintained by Xuekui Zhang. Last updated 5 months ago.
4.9 match 2.60 score 5 scripts
kbhoehn
dowser:B Cell Receptor Phylogenetics Toolkit
Provides a set of functions for inferring, visualizing, and analyzing B cell phylogenetic trees. Provides methods to 1) reconstruct unmutated ancestral sequences, 2) build B cell phylogenetic trees using multiple methods, 3) visualize trees with metadata at the tips, 4) reconstruct intermediate sequences, and 5) detect biased ancestor-descendant relationships among metadata types. Workflow examples are available at the documentation site (see URL). Citations: Hoehn et al (2022) <doi:10.1371/journal.pcbi.1009885>, Hoehn et al (2021) <doi:10.1101/2021.01.06.425648>.
Maintained by Kenneth Hoehn. Last updated 2 months ago.
1.7 match 6.81 score 84 scripts
bioc
Structstrings:Implementation of the dot bracket annotations with Biostrings
The Structstrings package implements the widely used dot bracket annotation for storing base pairing information in structured RNA. Structstrings uses the infrastructure provided by the Biostrings package and derives the DotBracketString and related classes from the BString class. From these, base pair tables can be produced for in-depth analysis. In addition, the loop indices of the base pairs can be retrieved. For better efficiency, information conversion is implemented in C, inspired to a large extent by the ViennaRNA package.
Maintained by Felix G.M. Ernst. Last updated 5 months ago.
dataimportdatarepresentationinfrastructuresequencingsoftwarealignmentsequencematchingbioconductorrnarna-structural-analysisrna-structuresequencesstructures
1.8 match 4 stars 6.46 score 3 scripts 4 dependents
bioc
IsoformSwitchAnalyzeR:Identify, Annotate and Visualize Isoform Switches with Functional Consequences from both short- and long-read RNA-seq data
Analysis of alternative splicing and isoform switches with predicted functional consequences (e.g. gain/loss of protein domains etc.) from quantification of all types of RNASeq by tools such as Kallisto, Salmon, StringTie, Cufflinks/Cuffdiff etc.
Maintained by Kristoffer Vitting-Seerup. Last updated 5 months ago.
geneexpressiontranscriptionalternativesplicingdifferentialexpressiondifferentialsplicingvisualizationstatisticalmethodtranscriptomevariantbiomedicalinformaticsfunctionalgenomicssystemsbiologytranscriptomicsrnaseqannotationfunctionalpredictiongenepredictiondataimportmultiplecomparisonbatcheffectimmunooncology
1.2 match 108 stars 9.26 score 125 scripts
bioc
motifmatchr:Fast Motif Matching in R
Quickly find motif matches for many motifs and many sequences. Wraps C++ code from the MOODS motif calling library, which was developed by Pasi Rastas, Janne Korhonen, and Petri Martinmäki.
Maintained by Alicia Schep. Last updated 5 months ago.
1.3 match 8.12 score 722 scripts 5 dependents
r-forge
genoPlotR:Plot Publication-Grade Gene and Genome Maps
Draws gene or genome maps and comparisons between these, in a publication-grade manner. Starting from simple, common files, it will draw postscript or PDF files that can be sent as such to journals.
Maintained by Lionel Guy. Last updated 4 years ago.
1.9 match 5.33 score 106 scripts
bioc
SNPRelate:Parallel Computing Toolset for Relatedness and Principal Component Analysis of SNP Data
Genome-wide association studies (GWAS) are widely used to investigate the genetic basis of diseases and traits, but they pose many computational challenges. We developed an R package, SNPRelate, to provide a binary format for single-nucleotide polymorphism (SNP) data in GWAS utilizing CoreArray Genomic Data Structure (GDS) data files. The GDS format offers efficient operations specifically designed for two-bit integers, since a SNP can occupy only two bits. SNPRelate is also designed to accelerate two key computations on SNP data using parallel computing for multi-core symmetric multiprocessing computer architectures: Principal Component Analysis (PCA) and relatedness analysis using Identity-By-Descent measures. The SNP GDS format is also used by the GWASTools package with the support of S4 classes and generic functions. The extended GDS format is implemented in the SeqArray package to support the storage of single nucleotide variations (SNVs), insertion/deletion polymorphisms (indels) and structural variation calls in whole-genome and whole-exome variant data.
Maintained by Xiuwen Zheng. Last updated 5 months ago.
infrastructuregeneticsstatisticalmethodprincipalcomponentbioinformaticsgds-formatpcasimdsnpopenblascpp
0.8 match 104 stars 12.69 score 1.6k scripts 18 dependents
bioc
R4RNA:An R package for RNA visualization and analysis
A package for RNA basepair analysis, including the visualization of basepairs as arc diagrams for easy comparison and annotation of sequence and structure. Arc diagrams can additionally be projected onto multiple sequence alignments to assess basepair conservation and covariation, with numerical methods for computing statistics for each.
Maintained by Daniel Lai. Last updated 5 months ago.
alignmentmultiplesequencealignmentpreprocessingvisualizationdataimportdatarepresentationmultiplecomparison
1.8 match 5.36 score 19 scripts 4 dependents
jdieramon
refseqR:Common Computational Operations Working with RefSeq Entries (GenBank)
Fetches NCBI data (RefSeq <https://www.ncbi.nlm.nih.gov/refseq/> database) and provides an environment to extract information at the level of gene, mRNA or protein accessions.
Maintained by Jose V. Die. Last updated 3 months ago.
1.8 match 4 stars 5.34 score 5 scripts
bioc
wavClusteR:Sensitive and highly resolved identification of RNA-protein interaction sites in PAR-CLIP data
The package provides an integrated pipeline for the analysis of PAR-CLIP data. PAR-CLIP-induced transitions are first discriminated from sequencing errors, SNPs and additional non-experimental sources by a non-parametric mixture model. The protein binding sites (clusters) are then resolved at high resolution and cluster statistics are estimated using a rigorous Bayesian framework. Post-processing of the results, data export for UCSC genome browser visualization and motif search analysis are provided. In addition, the package allows RNA-Seq data to be integrated to estimate the False Discovery Rate of cluster detection. Key functions support parallel multicore computing. Note: while wavClusteR was designed for PAR-CLIP data analysis, it can be applied to the analysis of other NGS data obtained from experimental procedures that induce nucleotide substitutions (e.g. BisSeq).
Maintained by Federico Comoglio. Last updated 5 months ago.
immunooncologysequencingtechnologyripseqrnaseqbayesian
2.0 match 4.60 score 3 scripts
empiricalbayes
SNVLFDR:Empirical Bayes Single Nucleotide Variant Calling
Identifies single nucleotide variants in next-generation sequencing data by estimating their local false discovery rates. For more details, see Karimnezhad, A. and Perkins, T. J. (2024) <doi:10.1038/s41598-024-51958-z>.
Maintained by Ali Karimnezhad. Last updated 1 year ago.
3.4 match 2.70 score
arindamroychoudhury
rapidphylo:Rapidly Estimates Phylogeny from Large Allele Frequency Data Using Root Distances Method
Rapidly estimates tree-topology from large allele frequency data using Root Distances Method, under a Brownian Motion Model. See Peng et al. (2021) <doi:10.1016/j.ympev.2021.107142>.
Maintained by Arindam RoyChoudhury. Last updated 2 years ago.
3.4 match 2.70 score
bioc
PWMEnrich:PWM enrichment analysis
A toolkit of high-level functions for DNA motif scanning and enrichment analysis built upon Biostrings. The main functionality is PWM enrichment analysis of already known PWMs (e.g. from databases such as MotifDb), but the package also implements high-level functions for PWM scanning and visualisation. The package does not perform "de novo" motif discovery, but is instead focused on using motifs that are either experimentally derived or computationally constructed by other tools.
Maintained by Diego Diez. Last updated 5 months ago.
motifannotationsequencematchingsoftware
1.8 match 5.08 score 60 scripts
emmanuelparadis
ape:Analyses of Phylogenetics and Evolution
Functions for reading, writing, plotting, and manipulating phylogenetic trees, analyses of comparative data in a phylogenetic framework, ancestral character analyses, analyses of diversification and macroevolution, computing distances from DNA sequences, reading and writing nucleotide sequences as well as importing from Bioconductor, and several tools such as Mantel's test, generalized skyline plots, graphical exploration of phylogenetic data (alex, trex, kronoviz), estimation of absolute evolutionary rates and clock-like trees using mean path lengths and penalized likelihood, dating trees with non-contemporaneous sequences, translating DNA into AA sequences, and assessing sequence alignments. Phylogeny estimation can be done with the NJ, BIONJ, ME, MVR, SDM, and triangle methods, and several methods handling incomplete distance matrices (NJ*, BIONJ*, MVR*, and the corresponding triangle method). Some functions call external applications (PhyML, Clustal, T-Coffee, Muscle) whose results are returned into R.
Maintained by Emmanuel Paradis. Last updated 10 hours ago.
0.5 match 64 stars 17.22 score 13k scripts 599 dependents
bioc
MassArray:Analytical Tools for MassArray Data
This package is designed for the import, quality control, analysis, and visualization of methylation data generated using Sequenom's MassArray platform. The tools herein contain a highly detailed amplicon prediction for optimal assay design. Also included are quality control measures of data, such as primer dimer and bisulfite conversion efficiency estimation. Methylation data are calculated using the same algorithms contained in the EpiTyper software package. Additionally, automatic SNP-detection can be used to flag potentially confounded data from specific CG sites. Visualization includes barplots of methylation data as well as UCSC Genome Browser-compatible BED tracks. Multiple assays can be positionally combined for integrated analysis.
Maintained by Reid F. Thompson. Last updated 5 months ago.
immunooncologydnamethylationsnpmassspectrometrygeneticsdataimportvisualization
2.0 match 4.30 score 1 scripts
bioc
mobileRNA:mobileRNA: Investigate the RNA mobilome & population-scale changes
Genomic analysis can be utilised to identify differences between RNA populations in two conditions, both in production and abundance. This includes the identification of RNAs produced by multiple genomes within a biological system. For example, RNA produced by pathogens within a host or mobile RNAs in plant graft systems. The mobileRNA package provides methods to pre-process, analyse and visualise the sRNA and mRNA populations based on the premise of mapping reads to all genotypes at the same time.
Maintained by Katie Jeynes-Cupper. Last updated 5 months ago.
visualizationrnaseqsequencingsmallrnagenomeassemblyclusteringexperimentaldesignqualitycontrolworkflowstepalignmentpreprocessingbioinformaticsplant-science
1.7 match 4 stars 5.00 score 2 scripts
connor-reid-tiffany
omu:A Metabolomics Analysis Tool for Intuitive Figures and Convenient Metadata Collection
Facilitates the creation of intuitive figures to describe metabolomics data by utilizing Kyoto Encyclopedia of Genes and Genomes (KEGG) hierarchy data, and gathers functional orthology and gene data from the KEGG-REST API.
Maintained by Connor Tiffany. Last updated 1 year ago.
1.8 match 3 stars 4.89 score 52 scripts
bioc
dStruct:Identifying differentially reactive regions from RNA structurome profiling data
dStruct identifies differentially reactive regions from RNA structurome profiling data. dStruct is compatible with a broad range of structurome profiling technologies, e.g., SHAPE-MaP, DMS-MaPseq, Structure-Seq, SHAPE-Seq, etc. See Choudhary et al., Genome Biology, 2019 for the underlying method.
Maintained by Krishna Choudhary. Last updated 5 months ago.
statisticalmethodstructuralpredictionsequencingsoftware
1.8 match 2 stars 4.86 score 12 scripts
bioc
VariantTools:Tools for Exploratory Analysis of Variant Calls
Explore, diagnose, and compare variant calls using filters.
Maintained by Michael Lawrence. Last updated 5 months ago.
geneticsgeneticvariabilitysequencing
2.0 match 4.09 score 41 scripts
bioc
SynMut:SynMut: Designing Synonymously Mutated Sequences with Different Genomic Signatures
There is increasing demand for designing virus mutants with specific dinucleotide or codon composition. This tool can take dinucleotide preference, codon usage bias, or both into account while designing mutants. It is a powerful tool for in silico design of DNA sequence mutants.
Maintained by Haogao Gu. Last updated 5 months ago.
sequencematchingexperimentaldesignpreprocessing
1.9 match 2 stars 4.30 score 1 scripts
pboutros
SeqKat:Detection of Kataegis
Kataegis is a localized hypermutation occurring when a region is enriched in somatic SNVs. Kataegis can result from multiple cytosine deaminations catalyzed by the AID/APOBEC family of proteins. This package contains functions to detect kataegis from SNVs in BED format. This package reports two scores per kataegic event, a hypermutation score and an APOBEC mediated kataegic score. Yousif, F. et al.; The Origins and Consequences of Localized and Global Somatic Hypermutation; Biorxiv 2018 <doi:10.1101/287839>.
Maintained by Paul C. Boutros. Last updated 5 years ago.
3.8 match 2.11 score 13 scripts
bioc
hiReadsProcessor:Functions to process LM-PCR reads from 454/Illumina data
hiReadsProcessor contains a set of functions which allow users to process LM-PCR products sequenced using any platform. Given an Excel/txt file containing parameters for demultiplexing and sample metadata, the functions automate trimming of adaptors and identification of the genomic product. Genomic products are further processed for QC and abundance quantification.
Maintained by Nirav V Malani. Last updated 5 months ago.
1.9 match 4.18 score 7 scripts
bioc
R453Plus1Toolbox:A package for importing and analyzing data from Roche's Genome Sequencer System
The R453Plus1 Toolbox comprises useful functions for the analysis of data generated by Roche's 454 sequencing platform. It adds functions for quality assurance as well as for annotation and visualization of detected variants, complementing the software tools shipped by Roche with their product. Further, a pipeline for the detection of structural variants is provided.
Maintained by Hans-Ulrich Klein. Last updated 5 months ago.
sequencinginfrastructuredataimportdatarepresentationvisualizationqualitycontrolreportwriting
2.3 match 3.48 score 10 scripts
greifflab
immuneSIM:Tunable Simulation of B- And T-Cell Receptor Repertoires
Simulate full B-cell and T-cell receptor repertoires using an in silico recombination process that includes a wide variety of tunable parameters to introduce noise and biases. Additional post-simulation modification functions allow the user to implant motifs or codon biases as well as to remodel the sequence similarity architecture. The output repertoires contain records of all relevant repertoire dimensions and can be analyzed using the provided repertoire analysis functions. A preprint is available at bioRxiv (Weber et al., 2019 <doi:10.1101/759795>).
Maintained by Cédric R. Weber. Last updated 1 year ago.
1.8 match 37 stars 4.44 score 15 scripts
stitam
webseq:Access data from biological sequence databases like NCBI, ENA, MGnify
This package interacts with online biological sequence databases. It provides functions to search for sequences, convert identifiers and download sequences and associated metadata.
Maintained by Tamas Stirling. Last updated 1 month ago.
1.7 match 3 stars 4.13 score 1 scripts
bioc
GenomicAlignments:Representation and manipulation of short genomic alignments
Provides efficient containers for storing and manipulating short genomic alignments (typically obtained by aligning short reads to a reference genome). This includes read counting, computing the coverage, junction detection, and working with the nucleotide content of the alignments.
Maintained by Hervé Pagès. Last updated 5 months ago.
infrastructuredataimportgeneticssequencingrnaseqsnpcoveragealignmentimmunooncologybioconductor-packagecore-package
0.5 match 10 stars 13.61 score 3.1k scripts 529 dependents
cran
IPCAPS:Iterative Pruning to Capture Population Structure
An unsupervised clustering algorithm based on iterative pruning for capturing population structure. This version supports ordinal data, so it can be applied directly to SNP data to identify fine-level population structure. It is built on the iterative pruning Principal Component Analysis ('ipPCA') algorithm as explained in Intarapanich et al. (2009) <doi:10.1186/1471-2105-10-382>. 'IPCAPS' involves an iterative process using multiple splits based on multivariate Gaussian mixture modeling of principal components and 'Expectation-Maximization' clustering as explained in Lebret et al. (2015) <doi:10.18637/jss.v067.i06>. In each iteration, rough clusters and outliers are also identified using the function rubikclust() from the R package 'KRIS'.
Maintained by Kridsadakorn Chaichoompu. Last updated 4 years ago.
3.4 match 2.00 score 10 scriptsbioc
Motif2Site:Detect binding sites from motifs and ChIP-seq experiments, and compare binding sites across conditions
Detect binding sites using motif IUPAC sequences or BED coordinates and ChIP-seq experiments in BED or BAM format. Combine/compare binding sites across experiments, tissues, or conditions. All normalization and differential steps are done using the TMM-GLM method. Signal decomposition is done by setting motifs as the centers of a mixture of normal distribution curves.
Maintained by Peyman Zarrineh. Last updated 5 months ago.
softwaresequencingchipseqdifferentialpeakcallingepigeneticssequencematching
1.7 match 4.00 score 3 scriptsbioc
GeoTcgaData:Processing Various Types of Data on GEO and TCGA
Gene Expression Omnibus (GEO) and The Cancer Genome Atlas (TCGA) provide us with a wealth of data, such as RNA-seq, DNA methylation, SNP and copy number variation data. It's easy to download data from TCGA using the gdc tool, but processing these data into a format suitable for bioinformatics analysis requires more work. This R package was developed to handle these data.
Maintained by Erqiang Hu. Last updated 5 months ago.
geneexpressiondifferentialexpressionrnaseqcopynumbervariationmicroarraysoftwarednamethylationdifferentialmethylationsnpatacseqmethylationarray
1.2 match 25 stars 5.85 score 19 scriptsbioc
crisprShiny:Exploring curated CRISPR gRNAs via Shiny
Provides means to interactively visualize guide RNAs (gRNAs) in GuideSet objects via a Shiny application. This GUI can be self-contained or used as a module within a larger Shiny app. The content of the app reflects the annotations present in the passed GuideSet object, and includes intuitive tools to examine, filter, and export gRNAs, thereby making gRNA design more user-friendly.
Maintained by Jean-Philippe Fortin. Last updated 5 months ago.
crisprfunctionalgenomicsgenetargetguicrispr-analysiscrispr-designshiny
1.5 match 2 stars 4.48 score 8 scriptsstephenturner
string2dna:Encode/Decode Strings as Nucleotide Sequences
Encode strings as nucleotide sequences and decode nucleotide sequences into strings.
Maintained by Stephen Turner. Last updated 2 years ago.
3.9 match 1 stars 1.70 scorecnuge
coil:Contextualization and Evaluation of COI-5P Barcode Data
Designed for the cleaning, contextualization and assessment of cytochrome c oxidase I DNA barcode data (COI-5P, or the five prime portion of COI). It contains functions for placing COI-5P barcode sequences into a common reading frame, translating DNA sequences to amino acids and for assessing the likelihood that a given barcode sequence includes an insertion or deletion error. The error assessment relies on the comparison of input sequences against nucleotide and amino acid profile hidden Markov models (PHMMs) (for details see Durbin et al. 1998, ISBN: 9780521629713) trained on a taxonomically diverse set of reference sequences. The functions are provided as a complete pipeline and are also available individually for efficient and targeted analysis of barcode data.
Maintained by Cameron M. Nugent. Last updated 1 year ago.
2.3 match 2.88 score 15 scriptsbioc
XNAString:Efficient Manipulation of Modified Oligonucleotide Sequences
The XNAString package allows for description of base sequences and associated chemical modifications in a single object. XNAString is able to capture single-stranded as well as double-stranded molecules. Chemical modifications are represented as independent strings associated with different features of the molecules (base sequence, sugar sequence, backbone sequence, modifications) and can be read from or written to HELM notation. It also enables secondary structure prediction using RNAfold from ViennaRNA. XNAString is designed to be an efficient representation of nucleic-acid-based therapeutics; it therefore stores information about target sequences and provides an interface to the matching and alignment functions from the Biostrings and pwalign packages.
Maintained by Marianna Plucinska. Last updated 5 months ago.
sequencematchingalignmentsequencinggeneticscpp
1.5 match 4.18 score 4 scriptsbioc
gdsfmt:R Interface to CoreArray Genomic Data Structure (GDS) Files
Provides a high-level R interface to CoreArray Genomic Data Structure (GDS) data files. GDS is portable across platforms and uses a hierarchical structure to store multiple scalable array-oriented data sets with metadata information. It is suited for large-scale datasets, especially for data which are much larger than the available random-access memory. The gdsfmt package offers efficient operations specifically designed for integers of less than 8 bits, since a diploid genotype, like a single-nucleotide polymorphism (SNP), usually occupies fewer bits than a byte. Data compression and decompression are available with relatively efficient random access. A GDS file can also be read in parallel by multiple R processes, supported by the package parallel.
Maintained by Xiuwen Zheng. Last updated 2 days ago.
infrastructuredataimportbioinformaticsgds-formatgenomicscpp
0.5 match 18 stars 11.34 score 920 scripts 29 dependentsurodelan
LocaTT:Geographically-Conscious Taxonomic Assignment for Metabarcoding
A bioinformatics pipeline for performing taxonomic assignment of DNA metabarcoding sequence data while considering geographic location. A detailed tutorial is available at <https://urodelan.github.io/Local_Taxa_Tool_Tutorial/>. A manuscript describing these methods is in preparation.
Maintained by Kenen Goodwin. Last updated 12 months ago.
1.8 match 3.00 scorebioc
PureCN:Copy number calling and SNV classification using targeted short read sequencing
This package estimates tumor purity, copy number, and loss of heterozygosity (LOH), and classifies single nucleotide variants (SNVs) by somatic status and clonality. PureCN is designed for targeted short read sequencing data, integrates well with standard somatic variant detection and copy number pipelines, and has support for tumor samples without matching normal samples.
Maintained by Markus Riester. Last updated 2 months ago.
copynumbervariationsoftwaresequencingvariantannotationvariantdetectioncoverageimmunooncologybioconductor-packagecell-free-dnacopy-numberlohtumor-heterogeneitytumor-mutational-burdentumor-purity
0.5 match 132 stars 9.72 score 40 scriptsshixiangwang
sigminer:Extract, Analyze and Visualize Mutational Signatures for Genomic Variations
Genomic alterations including single nucleotide substitution, copy number alteration, etc. are the major force for cancer initialization and development. Due to the specificity of molecular lesions caused by genomic alterations, we can generate characteristic alteration spectra, called 'signature' (Wang, Shixiang, et al. (2021) <DOI:10.1371/journal.pgen.1009557> & Alexandrov, Ludmil B., et al. (2020) <DOI:10.1038/s41586-020-1943-3> & Steele Christopher D., et al. (2022) <DOI:10.1038/s41586-022-04738-6>). This package helps users to extract, analyze and visualize signatures from genomic alteration records, thus providing new insight into cancer study.
Maintained by Shixiang Wang. Last updated 5 months ago.
bayesian-nmfbioinformaticscancer-researchcnvcopynumber-signaturescosmic-signaturesdbseasy-to-useindelmutational-signaturesnmfnmf-extractionsbssignature-extractionsomatic-mutationssomatic-variantsvisualizationcpp
0.5 match 150 stars 9.48 score 123 scripts 2 dependentsbioc
seqArchRplus:Downstream analyses of promoter sequence architectures and HTML report generation
seqArchRplus facilitates downstream analyses of promoter sequence architectures/clusters identified by seqArchR (or any other tool/method). With additional available information such as the TPM values and interquantile widths (IQWs) of the CAGE tag clusters, seqArchRplus can order the input promoter clusters by their shape (IQWs), and write the cluster information as browser/IGV track files. Provided visualizations are of two kinds: per sample/stage and per cluster visualizations. Those of the first kind include plot panels for each sample showing per cluster shape, TPM and other score distributions, sequence logos, and peak annotations. Those of the second kind include per cluster chromosome-wise and strand distributions, motif occurrence heatmaps and GO term enrichments. Additionally, seqArchRplus can generate HTML reports for easy viewing and comparison of promoter architectures between samples/stages.
Maintained by Sarvesh Nikumbh. Last updated 5 months ago.
annotationvisualizationreportwritinggomotifannotationclustering
1.2 match 1 stars 4.00 score 2 scriptsgrafxzahl
genBaRcode:Analysis and Visualization Tools for Genetic Barcode Data
Provides the necessary functions to identify and extract a selection of already available barcode constructs (Cornils, K. et al. (2014) <doi:10.1093/nar/gku081>) and freely choosable barcode designs from next generation sequence (NGS) data. Furthermore, it offers the possibility to account for sequence errors, the calculation of barcode similarities and provides a variety of visualisation tools (Thielecke, L. et al. (2017) <doi:10.1038/srep43249>).
Maintained by Lars Thielecke. Last updated 6 days ago.
2.0 match 2.30 score 6 scriptsgreen-striped-gecko
dartR.popgen:Analysing 'SNP' and 'Silicodart' Data Generated by Genome-Wide Restriction Fragment Analysis
Facilitates the analysis of SNP (single nucleotide polymorphism) and silicodart (presence/absence) data. 'dartR.popgen' provides a suite of functions to analyse such data in a population genetics context. It provides several functions to calculate population genetic metrics and to study population structure. Several functions (gl.run.structure(), gl.blast(), gl.LDNe()) need additional software to run. The help pages describe in detail how to download and link that software so the functions can run it. 'dartR.popgen' is part of the 'dartRverse' suite of packages. Gruber et al. (2018) <doi:10.1111/1755-0998.12745>. Mijangos et al. (2022) <doi:10.1111/2041-210X.13918>.
Maintained by Bernd Gruber. Last updated 9 months ago.
2.2 match 2.00 score 9 scriptsfischuu
hoardeR:Collect and Retrieve Annotation Data for Various Genomic Data Using Different Webservices
Cross-species identification of novel gene candidates using the NCBI web service is provided. Further, sets of miRNA target genes can be identified by using the targetscan.org API.
Maintained by Daniel Fischer. Last updated 11 months ago.
1.2 match 1 stars 3.70 score 6 scriptsisubirana
compareGroups:Descriptive Analysis by Groups
Create data summaries for quality control, extensive reports for exploring data, as well as publication-ready univariate or bivariate tables in several formats (plain text, HTML, LaTeX, PDF, Word or Excel). Create figures to quickly visualise the distribution of your data (boxplots, barplots, normality-plots, etc.). Display statistics (mean, median, frequencies, incidences, etc.). Perform the appropriate tests (t-test, Analysis of variance, Kruskal-Wallis, Fisher, log-rank, ...) depending on the nature of the described variable (normal, non-normal or qualitative). Summarize genetic data (Single Nucleotide Polymorphisms), displaying allele frequencies and performing Hardy-Weinberg equilibrium tests, among other typical statistics and tests for this kind of data.
Maintained by Isaac Subirana. Last updated 18 days ago.
comparegroupsdescriptive-statisticsplotreporttable
0.5 match 35 stars 8.51 score 396 scripts 1 dependentsnirmalaruban
geneNR:Automated Gene Identification for Post-GWAS Analysis
Facilitates post-Genome Wide Association Studies (GWAS) analysis by identifying candidate genes within a user-defined search window around the identified Single Nucleotide Polymorphisms (SNPs), as described by Mazumder AK (2024) <doi:10.1038/s41598-024-66903-3>. It supports candidate gene analysis for wheat and rice. Import your GWAS result as explained in the sample_data file, and the function performs the otherwise manual search, retrieves the candidate genes, and exports the results as ready-to-use output.
Maintained by Rajamani Nirmalaruban. Last updated 6 days ago.
2.1 match 2.00 scorebioc
signeR:Empirical Bayesian approach to mutational signature discovery
The signeR package provides an empirical Bayesian approach to mutational signature discovery. It is designed to analyze single nucleotide variation (SNV) counts in cancer genomes, but can also be applied to other features. Functionalities to characterize signatures or genome samples according to exposure patterns are also provided.
Maintained by Renan Valieris. Last updated 5 months ago.
genomicvariationsomaticmutationstatisticalmethodvisualizationbioconductorbioinformaticsopenblascpp
0.5 match 13 stars 7.67 score 22 scriptscran
SeqFeatR:A Tool to Associate FASTA Sequences and Features
Provides user-friendly methods for the identification of sequence patterns that are statistically significantly associated with a property of the sequence. For instance, SeqFeatR can identify viral immune escape mutations for hosts of given HLA types. The underlying statistical method is Fisher's exact test, with appropriate corrections for multiple testing, or Bayes. Patterns may be point mutations or n-tuples of mutations. SeqFeatR offers several ways to visualize the results of the statistical analyses, see Budeus (2016) <doi:10.1371/journal.pone.0146409>.
Maintained by Bettina Budeus. Last updated 6 years ago.
1.7 match 2.30 scoreluisekuehn
quiddich:QUick IDentification of DIagnostic CHaracters
Provides tools for the automated identification of diagnostic molecular characters, i.e. those columns in a given nucleotide or amino acid alignment that allow taxa to be distinguished from each other. These characters can then be used to complement the formal descriptions of the taxa, which are often based on morphological and anatomical features. This is especially helpful for morphologically cryptic species. QUIDDICH distinguishes between four different types of diagnostic characters. For more information, see "Kuehn, A.L., Haase, M. 2019. QUIDDICH: QUick IDentification of DIagnostic CHaracters."
Maintained by A. Luise Kuehn. Last updated 6 years ago.
3.8 match 1.00 scoresteffenmoritz
ridge:Ridge Regression with Automatic Selection of the Penalty Parameter
Linear and logistic ridge regression functions. Additionally includes special functions for genome-wide single-nucleotide polymorphism (SNP) data. More details can be found in <doi:10.1002/gepi.21750> and <doi:10.1186/1471-2105-12-372>.
Maintained by Steffen Moritz. Last updated 3 years ago.
0.5 match 18 stars 7.24 score 124 scripts 2 dependentsbioc
SomaticSignatures:Somatic Signatures
The SomaticSignatures package identifies mutational signatures of single nucleotide variants (SNVs). It provides an infrastructure related to the methodology described in Nik-Zainal (2012, Cell), with flexibility in the matrix decomposition algorithms.
Maintained by Julian Gehring. Last updated 5 months ago.
sequencingsomaticmutationvisualizationclusteringgenomicvariationstatisticalmethod
0.5 match 22 stars 6.85 score 54 scripts 1 dependentscran
spgs:Statistical Patterns in Genomic Sequences
A collection of statistical hypothesis tests and other techniques for identifying certain spatial relationships/phenomena in DNA sequences. In particular, it provides tests and graphical methods for determining whether or not DNA sequences comply with Chargaff's second parity rule or exhibit purine-pyrimidine parity. In addition, there are functions for efficiently simulating discrete state space Markov chains and testing arbitrary symbolic sequences for the presence of first-order Markovianness. It also has functions for counting words/k-mers (and cylinder patterns) in arbitrary symbolic sequences. Functions which take a DNA sequence as input can handle sequences stored as SeqFastadna objects from the 'seqinr' package.
Maintained by Andrew Hart. Last updated 1 year ago.
1.7 match 1.98 score 96 scriptsmpierrejean
jointseg:Joint Segmentation of Multivariate (Copy Number) Signals
Methods for fast segmentation of multivariate signals into piecewise constant profiles and for generating realistic copy-number profiles. A typical application is the joint segmentation of total DNA copy numbers and allelic ratios obtained from Single Nucleotide Polymorphism (SNP) microarrays in cancer studies. The methods are described in Pierre-Jean, Rigaill and Neuvial (2015) <doi:10.1093/bib/bbu026>.
Maintained by Morgane Pierre-Jean. Last updated 6 years ago.
0.5 match 6 stars 6.50 score 44 scripts 2 dependentsbioc
TRONCO:TRONCO, an R package for TRanslational ONCOlogy
The TRONCO (TRanslational ONCOlogy) R package collects algorithms to infer progression models via the approach of Suppes-Bayes Causal Network, both from an ensemble of tumors (cross-sectional samples) and within an individual patient (multi-region or single-cell samples). The package provides parallel implementations of algorithms that process binary matrices where each row represents a tumor sample and each column a single-nucleotide or a structural variant driving the progression; a 0/1 value models the absence/presence of that alteration in the sample. The tool can import data from plain, MAF or GISTIC format files, and can fetch it from the cBioPortal for cancer genomics. Functions for data manipulation and visualization are provided, as well as functions to import/export such data to other bioinformatics tools for, e.g., clustering or detection of mutually exclusive alterations. Inferred models can be visualized and tested for their confidence via bootstrap and cross-validation. TRONCO is used for the implementation of the Pipeline for Cancer Inference (PICNIC).
Maintained by Luca De Sano. Last updated 5 months ago.
biomedicalinformaticsbayesiangraphandnetworksomaticmutationnetworkinferencenetworkclusteringdataimportsinglecellimmunooncologyalgorithmscancer-inferencetumors
0.5 match 30 stars 6.50 score 38 scriptspbreheny
plmmr:Penalized Linear Mixed Models for Correlated Data
Fits penalized linear mixed models that correct for unobserved confounding factors. 'plmmr' infers and corrects for the presence of unobserved confounding effects such as population stratification and environmental heterogeneity. It then fits a linear model via penalized maximum likelihood. Originally designed for the multivariate analysis of single nucleotide polymorphisms (SNPs) measured in a genome-wide association study (GWAS), 'plmmr' eliminates the need for subpopulation-specific analyses and post-analysis p-value adjustments. Functions for the appropriate processing of 'PLINK' files are also supplied. For examples, see the package homepage. <https://pbreheny.github.io/plmmr/>.
Maintained by Patrick J. Breheny. Last updated 11 days ago.
0.5 match 4 stars 6.31 score 10 scriptsevolecolgroup
tidypopgen:Tidy Population Genetics
We provide a tidy grammar of population genetics, facilitating the manipulation and analysis of data on biallelic single nucleotide polymorphisms (SNPs).
Maintained by Andrea Manica. Last updated 3 days ago.
0.5 match 4 stars 5.83 score 8 scriptskosukehamazaki
RAINBOWR:Genome-Wide Association Study with SNP-Set Methods
By using 'RAINBOWR' (Reliable Association INference By Optimizing Weights with R), users can test multiple SNPs (Single Nucleotide Polymorphisms) simultaneously by kernel-based (SNP-set) methods. This package can also be applied to haplotype-based GWAS (Genome-Wide Association Study). Users can test not only additive effects but also dominance and epistatic effects. For details, see our paper in PLOS Computational Biology: Kosuke Hamazaki and Hiroyoshi Iwata (2020) <doi:10.1371/journal.pcbi.1007663>.
Maintained by Kosuke Hamazaki. Last updated 3 months ago.
0.5 match 22 stars 5.99 score 22 scriptsbioc
muscle:Multiple Sequence Alignment with MUSCLE
MUSCLE performs multiple sequence alignments of nucleotide or amino acid sequences.
Maintained by Alex T. Kalinka. Last updated 5 months ago.
multiplesequencealignmentalignmentsequencinggeneticssequencematchingdataimportcpp
0.6 match 5.21 score 81 scriptsbioc
CNVfilteR:Identifies false positives of CNV calling tools by using SNV calls
CNVfilteR identifies those CNVs that can be discarded by using the single nucleotide variant (SNV) calls that are usually obtained in common NGS pipelines.
Maintained by Jose Marcos Moreno-Cabrera. Last updated 5 months ago.
copynumbervariationsequencingdnaseqvisualizationdataimport
0.5 match 5 stars 5.18 score 1 scriptsbioc
periodicDNA:Set of tools to identify periodic occurrences of k-mers in DNA sequences
This R package helps the user identify k-mers (e.g. di- or tri-nucleotides) present periodically in a set of genomic loci (typically regulatory elements). The functions of this package provide a straightforward approach to find periodic occurrences of k-mers in DNA sequences, such as regulatory elements. It is not aimed at identifying motifs separated by a conserved distance; for this type of analysis, please visit the MEME website.
Maintained by Jacques Serizay. Last updated 5 months ago.
sequencematchingmotifdiscoverymotifannotationsequencingcoveragealignmentdataimport
0.5 match 6 stars 5.26 score 5 scriptssignaturescience
skater:Utilities for SNP-Based Kinship Analysis
Utilities for single nucleotide polymorphism (SNP) based kinship analysis testing and evaluation. The 'skater' package contains functions for importing, parsing, and analyzing pedigree data, performing relationship degree inference, benchmarking relationship degree classification, and summarizing identity by descent (IBD) segment data. Package functions and methods are described in Turner et al. (2021) "skater: An R package for SNP-based Kinship Analysis, Testing, and Evaluation" <doi:10.1101/2021.07.21.453083>.
Maintained by Stephen Turner. Last updated 2 years ago.
0.5 match 9 stars 5.26 score 7 scriptsbioc
supersigs:Supervised mutational signatures
Generate SuperSigs (supervised mutational signatures) from single nucleotide variants in the cancer genome. Functions included in the package allow the user to learn supervised mutational signatures from their data and apply them to new data. The methodology is based on the one described in Afsari (2021, ELife).
Maintained by Albert Kuo. Last updated 5 months ago.
featureextractionclassificationregressionsequencingwholegenomesomaticmutation
0.5 match 3 stars 4.78 score 3 scriptsbioc
customProDB:Generate customized protein database from NGS data, with a focus on RNA-Seq data, for proteomics search
Database search is the most widely used approach for peptide and protein identification in mass spectrometry-based proteomics studies. Our previous study showed that sample-specific protein databases derived from RNA-Seq data can better approximate the real protein pools in the samples and thus improve protein identification. More importantly, single nucleotide variations, short insertions and deletions, and novel junctions identified from RNA-Seq data make the protein database more complete and sample-specific. Here, we report an R package, customProDB, that enables the easy generation of customized databases from RNA-Seq data for proteomics search. This work bridges genomics and proteomics studies and facilitates cross-omics data integration.
Maintained by Xiaojing Wang. Last updated 5 months ago.
immunooncologysequencingmassspectrometryproteomicssnprnaseqsoftwaretranscriptionalternativesplicingfunctionalgenomics
0.5 match 4.72 score 15 scriptsbioc
GEOfastq:Downloads ENA Fastqs With GEO Accessions
GEOfastq is used to download fastq files from the European Nucleotide Archive (ENA) starting with an accession from the Gene Expression Omnibus (GEO). To do this, sample metadata is retrieved from GEO and the Sequence Read Archive (SRA). SRA run accessions are then used to construct FTP and aspera download links for fastq files generated by the ENA.
Maintained by Alex Pickering. Last updated 5 months ago.
rnaseqdataimportbioinformaticsfastqgene-expressiongeorna-seq
0.5 match 4 stars 4.60 score 6 scriptswenlongren
ScoreEB:Score Test Integrated with Empirical Bayes for Association Study
Performs association tests within a linear mixed model framework using a score test integrated with empirical Bayes for genome-wide association studies. First, a score test is conducted for each single nucleotide polymorphism (SNP) under the linear mixed model framework, taking into account genetic relatedness and population structure. Then, all potentially associated SNPs are selected with a less stringent criterion. Finally, the selected SNPs are analyzed with empirical Bayes in a multi-locus model to identify the true quantitative trait nucleotides (QTNs).
Maintained by Wenlong Ren. Last updated 3 years ago.
0.8 match 2 stars 3.00 score 1 scriptskbroman
mbmixture:Microbiome Mixture Analysis
Evaluate whether a microbiome sample is a mixture of two samples, by fitting a model for the number of read counts as a function of single nucleotide polymorphism (SNP) allele and the genotypes of two potential source samples. Lobo et al. (2021) <doi:10.1093/g3journal/jkab308>.
Maintained by Karl W Broman. Last updated 4 months ago.
0.5 match 6 stars 4.48 score 5 scriptstacazares
SeedMatchR:Find Matches to Canonical SiRNA Seeds in Genomic Features
On-target gene knockdown using siRNA ideally results from binding fully complementary regions in mRNA transcripts to induce cleavage. Off-target siRNA gene knockdown can occur through several modes, one being a seed-mediated mechanism mimicking miRNA gene regulation. Seed-mediated off-target effects occur when the ~8 nucleotides at the 5’ end of the guide strand, called a seed region, bind the 3’ untranslated regions of mRNA, causing reduced translation. Experiments using siRNA knockdown paired with RNA-seq can be used to detect siRNA sequences with potential off-target effects driven by the seed region. 'SeedMatchR' provides tools for exploring and detecting potential seed-mediated off-target effects of siRNA in RNA-seq experiments. 'SeedMatchR' is designed to extend current differential expression analysis tools, such as 'DESeq2', by annotating results with predicted seed matches. Using publicly available data, we demonstrate the ability of 'SeedMatchR' to detect cumulative changes in differential gene expression attributed to siRNA seed regions.
Maintained by Tareian Cazares. Last updated 1 year ago.
deseq2-analysismirnarna-seqsirnatranscriptomics
0.5 match 7 stars 4.54 score 7 scriptsbioc
gmapR:An R interface to the GMAP/GSNAP/GSTRUCT suite
GSNAP and GMAP are a pair of tools to align short-read data written by Tom Wu. This package provides convenience methods to work with GMAP and GSNAP from within R. In addition, it provides methods to tally alignment results on a per-nucleotide basis using the bam_tally tool.
Maintained by Michael Lawrence. Last updated 16 days ago.
0.5 match 4.43 score 45 scriptsbioc
seqArchR:Identify Different Architectures of Sequence Elements
seqArchR enables unsupervised discovery of _de novo_ clusters with characteristic sequence architectures characterized by position-specific motifs or composition of stretches of nucleotides, e.g., CG-richness. seqArchR does _not_ require any specifications w.r.t. the number of clusters, the length of any individual motifs, or the distance between motifs if and when they occur in pairs/groups; it directly detects them from the data. seqArchR uses non-negative matrix factorization (NMF) as its backbone, and employs a chunking-based iterative procedure that enables processing of large sequence collections efficiently. Wrapper functions are provided for visualizing cluster architectures as sequence logos.
Maintained by Sarvesh Nikumbh. Last updated 5 months ago.
motifdiscoverygeneregulationmathematicalbiologysystemsbiologytranscriptomicsgeneticsclusteringdimensionreductionfeatureextractiondnaseqnmfnonnegative-matrix-factorizationpromoter-sequence-architecturesscikit-learnsequence-analysissequence-architecturesunsupervised-machine-learning
0.5 match 1 stars 4.48 score 9 scripts 1 dependentsbioc
tLOH:Assessment of evidence for LOH in spatial transcriptomics pre-processed data using Bayes factor calculations
tLOH, or transcriptomicsLOH, assesses evidence for loss of heterozygosity (LOH) in pre-processed spatial transcriptomics data. This tool requires spatial transcriptomics cluster and allele count information at likely heterozygous single-nucleotide polymorphism (SNP) positions in VCF format. Bayes factors are calculated at each SNP to determine the likelihood of a potential loss-of-heterozygosity event. Two plotting functions are included to visualize allele fraction and aggregated Bayes factor per chromosome. Data generated with the 10X Genomics Visium Spatial Gene Expression platform must be pre-processed to obtain an individual sample VCF with columns for each cluster. Required fields are allele depth (AD) with counts for reference/alternative alleles and read depth (DP).
Maintained by Michelle Webb. Last updated 5 months ago.
copynumbervariationtranscriptionsnpgeneexpressiontranscriptomics
0.5 match 3 stars 4.48 score 4 scriptsbioc
motifcounter:R package for analysing TFBSs in DNA sequences
'motifcounter' provides motif matching, motif counting and motif enrichment functionality based on position frequency matrices. The main features of the package include the use of higher-order background models and accounting for self-overlapping motif matches when determining motif enrichment. The background model captures dinucleotide (or higher-order nucleotide) composition adequately, which may reduce model biases and misleading results compared to using simple GC background models. When conducting a motif enrichment analysis based on the motif match count, the package relies on a compound Poisson distribution or, alternatively, a combinatorial model. These distributions account for self-overlapping motif structures, as exemplified by repeat-like or palindromic motifs, and make it possible to determine the p-value and fold-enrichment for a set of observed motif matches.
Maintained by Wolfgang Kopp. Last updated 5 months ago.
transcriptionmotifannotationsequencematchingsoftwareopenmp
0.5 match 4.30 score 7 scriptsdelomast
tripsAndDipR:Inference of Ploidy from Sequencing Data
Uses read counts for biallelic single nucleotide polymorphisms (SNPs) to infer ploidy. It allows parameters to be specified to account for sequencing error rates and allelic bias. For details of the algorithms, please see Delomas (2019) <doi:10.1111/1755-0998.13073> and Delomas et al. (2021) <doi:10.1111/1755-0998.13431>.
Maintained by Thomas Delomas. Last updated 2 years ago.
0.5 match 3 stars 4.18 score 4 scriptscran
disprose:Discriminating Probes Selection
Set of tools for molecular probe selection and microarray design, e.g. the assessment of physical and chemical properties, blast performance, and selection according to sensitivity and selectivity. Methods used in the package are described in: Lorenz R., Stephan H.B., Höner zu Siederdissen C. et al. (2011) <doi:10.1186/1748-7188-6-26>; Camacho C., Coulouris G., Avagyan V. et al. (2009) <doi:10.1186/1471-2105-10-421>.
Maintained by Elena Filatova. Last updated 3 years ago.
1.9 match 1.00 score
arunabhacodes
MPGE:A Two-Step Approach to Testing Overall Effect of Gene-Environment Interaction for Multiple Phenotypes
Interaction between a genetic variant (e.g., a single nucleotide polymorphism) and an environmental variable (e.g., physical activity) can have a shared effect on multiple phenotypes (e.g., blood lipids). We implement a two-step method to test for an overall interaction effect on multiple phenotypes. In the first step, the method tests for an overall marginal genetic association between the genetic variant and the multivariate phenotype. The genetic variants that show evidence of a marginal overall genetic effect in the first step are prioritized when testing for an overall gene-environment interaction effect in the second step. Methodology is available from: A Majumdar, KS Burch, S Sankararaman, B Pasaniuc, WJ Gauderman, JS Witte (2020) <doi:10.1101/2020.07.06.190256>.
Maintained by Arunabha Majumdar. Last updated 4 years ago.
0.5 match 1 stars 3.70 score 1 scripts
jmanitz
kangar00:Kernel Approaches for Nonlinear Genetic Association Regression
Methods to extract information on pathways, genes and various single-nucleotide polymorphisms (SNPs) from online databases. It provides functions for data preparation and evaluation of genetic influence on a binary outcome using the logistic kernel machine test (LKMT). Three different kernel functions are offered to analyze genotype information in this variance component test: a linear kernel, a size-adjusted kernel and a network-based kernel.
Maintained by Juliane Manitz. Last updated 6 months ago.
0.5 match 2 stars 3.62 score 21 scripts
bioc
rfPred:Assign rfPred functional prediction scores to a missense variants list
Based on numerous external data files in which rfPred scores are pre-calculated for all genomic positions of the human exome, the package assigns rfPred scores to missense variants identified by the chromosome, the position (hg19 version), the reference and alternative nucleotides and the UniProt identifier of the protein. Note that to use the package, the user has to download the TabixFile and index (approximately 3.3 GB).
Maintained by Hugo Varet. Last updated 5 months ago.
software annotation classification
0.5 match 3.60 score 4 scripts
soroushmdg
gwid:Genome-Wide Identity-by-Descent
Methods and tools for the analysis of Genome Wide Identity-by-Descent ('gwid') mapping data, focusing on testing whether there is a higher occurrence of Identity-By-Descent (IBD) segments around potential causal variants in cases compared to controls, which is crucial for identifying rare variants. To enhance its analytical power, 'gwid' incorporates a Sliding Window Approach, allowing for the detection and analysis of signals from multiple Single Nucleotide Polymorphisms (SNPs).
Maintained by Soroush Mahmoudiandehkordi. Last updated 6 months ago.
0.5 match 1 stars 3.60 score 4 scripts
jphill01
HACSim:Iterative Extrapolation of Species' Haplotype Accumulation Curves for Genetic Diversity Assessment
Performs iterative extrapolation of species' haplotype accumulation curves using a nonparametric stochastic (Monte Carlo) optimization method for assessment of specimen sampling completeness based on the approach of Phillips et al. (2015) <doi:10.1515/dna-2015-0008>, Phillips et al. (2019) <doi:10.1002/ece3.4757> and Phillips et al. (2020) <doi:10.7717/peerj-cs.243>. 'HACSim' outputs a number of useful summary statistics of sampling coverage ("Measures of Sampling Closeness"), including an estimate of the likely required sample size (along with confidence intervals at the desired level) necessary to recover a given number/proportion of observed unique species' haplotypes. Any genomic marker can be targeted to assess likely required specimen sample sizes for genetic diversity assessment. The method is particularly well-suited to assess sampling sufficiency for DNA barcoding initiatives. Users can also simulate their own DNA sequences according to various models of nucleotide substitution. A Shiny app is also available.
Maintained by Jarrett D. Phillips. Last updated 6 months ago.
dna-barcoding haplotype-accumulation-curves cpp
0.5 match 3.48 score 5 scripts
bioc
CNViz:Copy Number Visualization
CNViz takes probe, gene, and segment-level log2 copy number ratios and launches a Shiny app to visualize your sample's copy number profile. You can also integrate loss of heterozygosity (LOH) and single nucleotide variant (SNV) data.
Maintained by Rebecca Greenblatt. Last updated 5 months ago.
visualization copynumbervariation sequencing dnaseq
0.5 match 3.30 score 1 scripts
bioc
triplex:Search and visualize intramolecular triplex-forming sequences in DNA
This package provides functions for identification and visualization of potential intramolecular triplex patterns in a DNA sequence. The main functionality is to detect the positions of subsequences capable of folding into an intramolecular triplex (H-DNA) in a much larger sequence. The potential H-DNA (triplexes) should be made of as many canonical nucleotide triplets as possible. The package includes visualization showing the exact base-pairing in 1D, 2D or 3D.
Maintained by Jiri Hon. Last updated 5 months ago.
sequencematching generegulation
0.5 match 3.30 score 2 scripts
bioc
dyebias:The GASSCO method for correcting for slide-dependent gene-specific dye bias
Many two-colour hybridizations suffer from a dye bias that is both gene-specific and slide-specific. The former depends on the content of the nucleotide used for labeling; the latter depends on the labeling percentage. The slide-dependency was hitherto not recognized, and made addressing the artefact impossible. Given a reasonable number of dye-swapped pairs of hybridizations, or of same vs. same hybridizations, both the gene- and slide-biases can be estimated and corrected using the GASSCO method (Margaritis et al., Mol. Sys. Biol. 5:266 (2009), doi:10.1038/msb.2009.21)
Maintained by Philip Lijnzaad. Last updated 5 months ago.
microarray twochannel qualitycontrol preprocessing
0.5 match 3.30 score 10 scripts
noramvillanueva
seq2R:Simple Method to Detect Compositional Changes in Genomic Sequences
This software is useful for loading '.fasta' or '.gbk' files, and for retrieving sequences from the 'GenBank' dataset <https://www.ncbi.nlm.nih.gov/genbank/>. The package detects differences or asymmetries in nucleotide composition by using local linear kernel smoothers. It is also possible to draw inference about critical points (i.e. maximum or minimum points) of the derivative curves. Additionally, bootstrap methods are used for estimating confidence intervals, and computational speed-up techniques (binning techniques) are implemented in 'seq2R'.
Maintained by Nora M. Villanueva. Last updated 4 months ago.
bootstrap change-points dna-sequences genome-analysis machine-learning nonparametric-statistics regression fortran
0.5 match 3.00 score 10 scripts
dcibioinformatics
survSNP:Power Calculations for SNP Studies with Censored Outcomes
Conduct asymptotic and empirical power and sample size calculations for Single-Nucleotide Polymorphism (SNP) association studies with right censored time to event outcomes.
Maintained by Alexander Sibley. Last updated 2 years ago.
0.5 match 2.81 score 16 scripts
srika1919
pPCA:Partial Principal Component Analysis of Partitioned Large Sparse Matrices
Performs partial principal component analysis of a large sparse matrix. The matrix may be stored as a list of matrices to be concatenated (implicitly) horizontally. Useful applications include cases where the total number of nonzero entries exceeds the capacity of 32-bit integers (e.g., with large Single Nucleotide Polymorphism data).
Maintained by Srika Raja. Last updated 5 months ago.
0.5 match 2.78 score
matveevdaniil
RPatternJoin:String Similarity Joins for Hamming and Levenshtein Distances
This project is a tool for edit similarity joins of words (a.k.a. all-pairs similarity search) under small (< 3) edit distance constraints. It works for Levenshtein/Hamming distances and words from any alphabet. The software was originally developed for joining amino-acid/nucleotide sequences from Adaptive Immune Repertoires, where the number of words is relatively large (10^5-10^6) and the average length of words is relatively small (10-100).
Maintained by Daniil Matveev. Last updated 5 months ago.
0.5 match 2.78 score 5 scripts 1 dependents
leipzig
asciiruler:Render an ASCII Ruler
An ASCII ruler is for measuring text and is especially useful for sequence analysis. Included in this package are methods to create ASCII rulers and associated GenBank sequence blocks, multi-column text displays that make it easy for viewers to locate nucleotides by position.
Maintained by Jeremy Leipzig. Last updated 3 years ago.
0.5 match 2.70 score 5 scripts
cran
HTRX:Haplotype Trend Regression with eXtra Flexibility (HTRX)
Detection of haplotype patterns that include single nucleotide polymorphisms (SNPs) and non-contiguous haplotypes that are associated with a phenotype. Methods for implementing HTRX are described in Yang Y, Lawson DJ (2023) <doi:10.1093/bioadv/vbad038> and Barrie W, Yang Y, Irving-Pease E.K, et al (2024) <doi:10.1038/s41586-023-06618-z>.
Maintained by Yaoling Yang. Last updated 1 year ago.
0.5 match 2.70 score
gastonquero
haplotyper:Tool for Clustering Genotypes in Haplotypes
Function to identify haplotypes within QTL (Quantitative Trait Loci). A haplotype is a combination of SNPs (Single Nucleotide Polymorphisms) within the QTL. This function groups together all individuals of a population with the same haplotype. Each group contains individuals with the same allele at each SNP, whether or not data are missing. Thus, haplotyper groups individuals that, if imputed, have a non-zero probability of having the same alleles across the entire sequence of SNPs. Moreover, haplotyper calculates this probability from relative frequencies.
Maintained by Gaston Quero. Last updated 9 years ago.
0.5 match 2.70 score 3 scripts
henrikbengtsson
calmate:Improved Allele-Specific Copy Number of SNP Microarrays for Downstream Segmentation
The CalMaTe method calibrates preprocessed allele-specific copy number estimates (ASCNs) from DNA microarrays by controlling for single-nucleotide polymorphism-specific allelic crosstalk. The resulting ASCNs are on average more accurate, which increases the power of segmentation methods for detecting changes between copy number states in tumor studies including copy neutral loss of heterozygosity. CalMaTe applies to any ASCNs regardless of preprocessing method and microarray technology, e.g. Affymetrix and Illumina.
Maintained by Henrik Bengtsson. Last updated 3 years ago.
acgh copynumbervariants snp microarray onechannel twochannel genetics
0.5 match 1 stars 2.70 score 6 scripts
pboutros
ApplyPolygenicScore:Utilities for the Application of a Polygenic Score to a VCF
Simple and transparent parsing of genotype/dosage data from an input Variant Call Format (VCF) file, matching of genotype coordinates to the component Single Nucleotide Polymorphisms (SNPs) of an existing polygenic score (PGS), and application of SNP weights to dosages for the calculation of a polygenic score for each individual in accordance with the additive weighted sum of dosages model. Methods are designed in reference to best practices described by Collister, Liu, and Clifton (2022) <doi:10.3389/fgene.2022.818574>.
Maintained by Paul Boutros. Last updated 11 days ago.
0.5 match 2.70 score
mngar
simulMGF:Simulate SNP Matrix, Phenotype and Genotypic Effects
Simulate genotypes in a SNP (single nucleotide polymorphism) matrix as random numbers from a uniform distribution, for diploid organisms (coded by 0, 1, 2), Sikorska et al. (2013) <doi:10.1186/1471-2105-14-166>, or a half-sib/full-sib SNP matrix from real or simulated parental SNP data, assuming Mendelian segregation. Simulate phenotypic traits for real or simulated SNP data, controlled by a specific number of quantitative trait loci and their effects, sampled from a Normal or a Uniform distribution, assuming a pure additive model. This is useful for testing association and genomic prediction models or for educational purposes.
Maintained by Martin Nahuel Garcia. Last updated 2 years ago.
0.5 match 2.70 score 1 scripts
cran
dartR.sexlinked:Analysing SNP Data to Identify Sex-Linked Markers
Identifies, filters and exports sex-linked markers using 'SNP' (single nucleotide polymorphism) data. To install the other packages, we recommend installing the 'dartRverse' package, which supports the installation of all packages in the 'dartRverse'. If you want to understand the rationale applied to identify sex-linked markers and/or want to cite 'dartR.sexlinked', you can find the information by typing citation('dartR.sexlinked') in the console.
Maintained by Diana Robledo-Ruiz. Last updated 9 months ago.
0.5 match 2.00 score 4 scripts
green-striped-gecko
dartR.captive:Analysing 'SNP' Data to Support Captive Breeding
Functions are provided that facilitate the analysis of SNP (single nucleotide polymorphism) data to answer questions regarding captive breeding and relatedness between individuals. 'dartR.captive' is part of the 'dartRverse' suite of packages. Gruber et al. (2018) <doi:10.1111/1755-0998.12745>. Mijangos et al. (2022) <doi:10.1111/2041-210X.13918>.
Maintained by Bernd Gruber. Last updated 26 days ago.
0.5 match 1 stars 2.00 score 3 scripts
gastonquero
clusterhap:Clustering Genotypes in Haplotypes
A haplotype is a combination of SNPs (Single Nucleotide Polymorphisms) within the QTL (Quantitative Trait Loci). clusterhap groups together all individuals of a population with the same haplotype. Each group contains individuals with the same allele at each SNP, whether or not data are missing. Thus, clusterhap groups individuals that, if imputed, have a non-zero probability of having the same alleles across the entire sequence of SNPs. Moreover, clusterhap calculates this probability from relative frequencies.
Maintained by Gaston Quero. Last updated 9 years ago.
0.5 match 2.00 score 3 scripts
empiricalbayes
LFDREmpiricalBayes:Estimating Local False Discovery Rates Using Empirical Bayes Methods
New empirical Bayes methods aiming at analyzing the association of single nucleotide polymorphisms (SNPs) to some particular disease are implemented in this package. The package uses local false discovery rate (LFDR) estimates of SNPs within a sample population defined as a "reference class" and discovers if SNPs are associated with the corresponding disease. Although SNPs are used throughout this document, other biological data such as protein data and other gene data can be used. Karimnezhad, Ali and Bickel, D. R. (2016) <http://hdl.handle.net/10393/34889>.
Maintained by Ali Karimnezhad. Last updated 7 years ago.
bayesian mathematicalbiology multiplecomparison
0.5 match 2.00 score 5 scripts
cran
GEVACO:Joint Test of Gene and GxE Interactions via Varying Coefficients
A novel statistical model to detect the joint genetic and dynamic gene-environment (GxE) interaction with continuous traits in genetic association studies. It uses varying-coefficient models to account for different GxE trajectories, regardless of whether the relationship is linear or not. The package includes one function, GxEtest(), to test a single genetic variant (e.g., a single nucleotide polymorphism or SNP), and another function, GxEscreen(), to test a set of genetic variants. The method involves a likelihood ratio test described in Crainiceanu, C. M., and Ruppert, D. (2004) <doi:10.1111/j.1467-9868.2004.00438.x>.
Maintained by Sydney Manning. Last updated 3 years ago.
0.5 match 2.00 score 2 scripts
wpihongzhang
CKAT:Composite Kernel Association Test for Pharmacogenetics Studies
Composite Kernel Association Test (CKAT) is a flexible and robust kernel machine based approach to jointly test the genetic main effect and gene-treatment interaction effect for a set of single-nucleotide polymorphisms (SNPs) in pharmacogenetics (PGx) assessments embedded within randomized clinical trials.
Maintained by Hong Zhang. Last updated 5 years ago.
0.5 match 1.78 score
cran
LDAandLDAS:Linkage Disequilibrium of Ancestry (LDA) and LDA Score (LDAS)
Computation of linkage disequilibrium of ancestry (LDA) and linkage disequilibrium of ancestry score (LDAS). LDA calculates the pairwise linkage disequilibrium of ancestry between single nucleotide polymorphisms (SNPs). LDAS calculates the LDA score of SNPs. The methods are described in Barrie W, Yang Y, Irving-Pease E.K, et al (2024) <doi:10.1038/s41586-023-06618-z>.
Maintained by Yaoling Yang. Last updated 1 year ago.
0.5 match 1.70 score
cran
FILEST:Fine-Level Structure Simulator
A population genetic simulator, which is able to generate synthetic datasets of single-nucleotide polymorphisms (SNPs) for multiple populations. The genetic distances among populations can be set according to the Fixation Index (Fst) as explained in Balding and Nichols (1995) <doi:10.1007/BF01441146>. This tool is able to simulate outlying individuals, and missing SNPs can be specified. For genome-wide association studies (GWAS), disease status can be set at a desired level according to the risk ratio.
Maintained by Kridsadakorn Chaichoompu. Last updated 4 years ago.
0.5 match 1.70 score
largon-denayah
read.gb:Open GenBank Files
Opens complete record(s) with the .gb extension from the NCBI/GenBank Nucleotide database and returns a list containing shaped record(s). These kinds of files contain detailed records of DNA samples (locus, organism, type of sequence, source of the sequence...). An example record can be found at <https://www.ncbi.nlm.nih.gov/nuccore/HE799070>.
Maintained by Robin Mercier. Last updated 4 years ago.
0.5 match 1.48 score 5 scripts 1 dependents
cran
FunctanSNP:Functional Analysis (with Interactions) for Dense SNP Data
An implementation of revised functional regression models for multiple genetic variation data, such as single nucleotide polymorphism (SNP) data, which provides revised functional linear regression models, partially functional interaction regression analysis with penalty-based techniques and corresponding plotting functions, etc. (Ruzong Fan, Yifan Wang, James L. Mills, Alexander F. Wilson, Joan E. Bailey-Wilson, and Momiao Xiong (2013) <doi:10.1002/gepi.21757>).
Maintained by Rui Ren. Last updated 2 years ago.
0.5 match 1.40 score 25 scripts
l0ka
TPES:Tumor Purity Estimation using SNVs
A bioinformatics tool for the estimation of the tumor purity from sequencing data. It uses the set of putative clonal somatic single nucleotide variants within copy number neutral segments to call tumor cellularity.
Maintained by Alessio Locallo. Last updated 6 years ago.
0.5 match 2 stars 1.30 score 10 scripts
cran
PlasmaMutationDetector:Tumor Mutation Detection in Plasma
Aims at detecting single nucleotide variation (SNV) and insertion/deletion (INDEL) in circulating tumor DNA (ctDNA), used as a surrogate marker for tumor, at each base position of a Next Generation Sequencing (NGS) analysis. Mutations are assessed by comparing the minor-allele frequency at each position to the measured PER in control samples.
Maintained by Yves Rozenholc. Last updated 7 years ago.
0.5 match 1.30 score
cran
PdPDB:Pattern Discovery in PDB Structures of Metalloproteins
Looks for amino acid and/or nucleotide patterns and/or small ligands coordinated to a given prosthetic centre. Files have to be in the local file system and have the proper extension.
Maintained by Luca Belmonte. Last updated 7 years ago.
0.5 match 1.00 score
alenazia
NGBVS:Bayesian Variable Selection for SNP Data using Normal-Gamma
Posterior distribution of case-control fine-mapping. Specifically, Bayesian variable selection for single-nucleotide polymorphism (SNP) data using the normal-gamma prior. Alenazi A.A., Cox A., Juarez M., Lin W-Y. and Walters, K. (2019) Bayesian variable selection using partially observed categorical prior information in fine-mapping association studies, Genetic Epidemiology. <doi:10.1002/gepi.22213>.
Maintained by Abdulaziz Alenazi. Last updated 2 years ago.
0.5 match 1.00 score
cran
RHclust:Vector in Partition
Non-parametric clustering of joint pattern multi-genetic/epigenetic factors. This package contains functions designed to cluster subjects based on gene features including single nucleotide polymorphisms (SNPs), DNA methylation (CPG), gene expression (GE), and covariate data. The novel concept follows the general K-means (Hartigan and Wong (1979) <doi:10.2307/2346830>) framework but uses weighted Euclidean distances across the gene features to cluster subjects. This approach is unique in that it attempts to capture all pairwise interactions in an effort to cluster based on their complex biological interactions.
Maintained by Joseph Handwerker. Last updated 2 years ago.
0.5 match 1.00 score