Showing 55 of 55 results
bioc
ShortRead:FASTQ input and manipulation
This package implements sampling, iteration, and input of FASTQ files. The package includes functions for filtering and trimming reads, and for generating a quality assessment report. Data are represented as DNAStringSet-derived objects, and easily manipulated for a diversity of purposes. The package also contains legacy support for early single-end, ungapped alignment formats.
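Sampling reads from a FASTQ file that is too large to hold in memory is commonly done with reservoir sampling. The sketch below is a generic Python illustration of that technique, not ShortRead's implementation; the function name `reservoir_sample` and its parameters are hypothetical.

```python
import random


def reservoir_sample(records, n, seed=None):
    """Return up to n records drawn uniformly at random from an iterable.

    The stream is read once and only n records are kept in memory
    (Algorithm R). `records` can be any iterable, e.g. parsed FASTQ records.
    """
    rng = random.Random(seed)
    sample = []
    for i, rec in enumerate(records):
        if i < n:
            # Fill the reservoir with the first n records.
            sample.append(rec)
        else:
            # Keep the new record with probability n / (i + 1).
            j = rng.randint(0, i)
            if j < n:
                sample[j] = rec
    return sample
```

If the stream holds fewer than `n` records, all of them are returned in their original order.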
Maintained by Bioconductor Package Maintainer. Last updated 5 months ago.
dataimport sequencing qualitycontrol bioconductor-package core-package zlib cpp
16.2 match 8 stars 12.08 score 1.8k scripts 49 dependents
bioc
dada2:Accurate, high-resolution sample inference from amplicon sequencing data
The dada2 package infers exact amplicon sequence variants (ASVs) from high-throughput amplicon sequencing data, replacing the coarser and less accurate OTU clustering approach. The dada2 pipeline takes as input demultiplexed fastq files, and outputs the sequence variants and their sample-wise abundances after removing substitution and chimera errors. Taxonomic classification is available via a native implementation of the RDP naive Bayesian classifier, and species-level assignment to 16S rRNA gene fragments by exact matching.
Maintained by Benjamin Callahan. Last updated 5 months ago.
immunooncology microbiome sequencing classification metagenomics amplicon bioconductor bioinformatics metabarcoding taxonomy cpp
11.3 match 485 stars 13.17 score 3.0k scripts 4 dependents
bioc
seqTools:Analysis of nucleotide, sequence and quality content on fastq files
Analyze read length, Phred scores, alphabet frequency and DNA k-mers in uncompressed and compressed FASTQ files.
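As a rough illustration of the metrics named in this description, here is a minimal Python sketch. It is not seqTools code. It assumes four-line FASTQ records and Phred+33 quality encoding, and the names `read_fastq`, `summarize` and `open_fastq` are hypothetical.

```python
import gzip
import io
from collections import Counter


def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")


def read_fastq(handle):
    """Yield (header, sequence, quality) from a 4-line-per-record stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual


def summarize(handle, k=2, offset=33):
    """Collect read-length counts, mean Phred score and k-mer counts."""
    lengths, kmers = Counter(), Counter()
    phred_sum = n_bases = 0
    for _, seq, qual in read_fastq(handle):
        lengths[len(seq)] += 1
        phred_sum += sum(ord(c) - offset for c in qual)
        n_bases += len(qual)
        for i in range(len(seq) - k + 1):
            kmers[seq[i:i + k]] += 1
    mean_phred = phred_sum / n_bases if n_bases else 0.0
    return {"lengths": dict(lengths), "mean_phred": mean_phred, "kmers": dict(kmers)}


# Tiny in-memory example; a real file would be passed via open_fastq(path).
example = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nACG\n+\n!!!\n")
stats = summarize(example, k=2)
```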
Maintained by Wolfgang Kaisers. Last updated 5 months ago.
21.6 match 5.57 score 52 scripts 1 dependent
bioc
ORFik:Open Reading Frames in Genomics
R package for analysis of transcript and translation features through manipulation of sequence data and NGS data like Ribo-Seq, RNA-Seq, TCP-Seq and CAGE. It is generalized in the sense that any transcript region can be analysed; as the name hints, it was made with investigation of ribosomal patterns over Open Reading Frames (ORFs) as its primary use case. ORFik is extremely fast through use of C++, data.table and GenomicRanges. The package allows reassigning transcript start sites using CAGE-Seq data, automatic shifting of Ribo-Seq reads, finding Open Reading Frames across whole genomes, and much more.
Maintained by Haakon Tjeldnes. Last updated 27 days ago.
immunooncology software sequencing riboseq rnaseq functionalgenomics coverage alignment dataimport cpp
7.7 match 33 stars 10.63 score 115 scripts 2 dependents
bioc
Biostrings:Efficient manipulation of biological strings
Memory efficient string containers, string matching algorithms, and other utilities, for fast manipulation of large biological sequences or sets of sequences.
Maintained by Hervé Pagès. Last updated 23 days ago.
sequencematching alignment sequencing genetics dataimport datarepresentation infrastructure bioconductor-package core-package
4.5 match 61 stars 17.83 score 8.6k scripts 1.2k dependents
bioc
GEOfastq:Downloads ENA Fastqs With GEO Accessions
GEOfastq is used to download fastq files from the European Nucleotide Archive (ENA) starting with an accession from the Gene Expression Omnibus (GEO). To do this, sample metadata is retrieved from GEO and the Sequence Read Archive (SRA). SRA run accessions are then used to construct FTP and aspera download links for fastq files generated by the ENA.
Maintained by Alex Pickering. Last updated 5 months ago.
rnaseq dataimport bioinformatics fastq gene-expression geo rna-seq
16.5 match 4 stars 4.60 score 6 scripts
bioc
Rfastp:An Ultra-Fast and All-in-One Fastq Preprocessor (Quality Control, Adapter, Low Quality and PolyX Trimming, and UMI Sequence Parsing)
Rfastp is an R wrapper for fastp, which is developed in C++. fastp performs quality control for FASTQ files, including low-quality base trimming, polyX trimming, adapter auto-detection and trimming, paired-end read merging, and UMI sequence/ID handling. Rfastp can concatenate multiple files into one file (like the shell command cat) and accepts multiple files as input.
Maintained by Thomas Carroll. Last updated 5 months ago.
qualitycontrol sequencing preprocessing software zlib cpp
17.8 match 3.82 score 33 scripts
bioc
scPipe:Pipeline for single cell multi-omic data pre-processing
A preprocessing pipeline for single cell RNA-seq/ATAC-seq data that starts from the fastq files and produces a feature count matrix with associated quality control information. It can process fastq data generated by CEL-seq, MARS-seq, Drop-seq, Chromium 10x and SMART-seq protocols.
Maintained by Shian Su. Last updated 3 months ago.
immunooncology software sequencing rnaseq geneexpression singlecell visualization sequencematching preprocessing qualitycontrol genomeannotation dataimport curl bzip2 xz-utils zlib cpp
7.2 match 68 stars 9.02 score 84 scripts
bioc
UMI4Cats:UMI4Cats: Processing, analysis and visualization of UMI-4C chromatin contact data
UMI-4C is a technique that allows characterization of 3D chromatin interactions with a bait of interest, taking advantage of a sonication step to produce unique molecular identifiers (UMIs) that help remove duplication bias, thus allowing a better differential comparison of chromatin interactions between conditions. This package allows processing of UMI-4C data, starting from FASTQ files provided by the sequencing facility. It provides two statistical methods for detecting differential contacts and includes a visualization function to plot integrated information from a UMI-4C assay.
Maintained by Mireia Ramos-Rodriguez. Last updated 5 months ago.
qualitycontrol preprocessing alignment normalization visualization sequencing coverage chromatin chromatin-interaction genomics umi4c
11.3 match 5 stars 5.57 score 7 scripts
adrientaudiere
MiscMetabar:Miscellaneous Functions for Metabarcoding Analysis
Facilitate the description, transformation, exploration, and reproducibility of metabarcoding analyses. 'MiscMetabar' is mainly built on top of the 'phyloseq', 'dada2' and 'targets' R packages. It helps to build reproducible and robust bioinformatics pipelines in R. 'MiscMetabar' makes ecological analysis of alpha and beta-diversity easier, more reproducible and more powerful by integrating a large number of tools. Important features are described in Taudière A. (2023) <doi:10.21105/joss.06038>.
Maintained by Adrien Taudière. Last updated 25 days ago.
sequencing microbiome metagenomics clustering classification visualization amplicon amplicon-sequencing biodiversity-informatics ecology illumina metabarcoding ngs-analysis
7.2 match 17 stars 6.44 score 23 scripts
bioc
qckitfastq:FASTQ Quality Control
Assessment of FASTQ files with multiple metrics, including quality score, sequence content, overrepresented sequences and k-mers.
Maintained by August Guang. Last updated 5 months ago.
software qualitycontrol sequencing zlib cpp
10.4 match 4.38 score 24 scripts
bioc
SRAdb:A compilation of metadata from NCBI SRA and tools
The Sequence Read Archive (SRA) is the largest public repository of sequencing data from the next generation of sequencing platforms including Roche 454 GS System, Illumina Genome Analyzer, Applied Biosystems SOLiD System, Helicos Heliscope, and others. However, finding data of interest can be challenging using current tools. SRAdb is an attempt to make access to the metadata associated with submission, study, sample, experiment and run much more feasible. This is accomplished by parsing all the NCBI SRA metadata into a SQLite database that can be stored and queried locally. Full-text search in the package makes querying metadata very flexible and powerful. FASTQ and SRA files can be downloaded for local alignment. Besides the FTP protocol, SRAdb has functions supporting the fasp protocol (ascp from Aspera Connect) for faster downloading of large data files over long distances. The SQLite database is updated regularly as new data is added to SRA and can be downloaded at will for the most up-to-date metadata.
Maintained by Jack Zhu. Last updated 3 months ago.
infrastructure sequencing dataimport
5.5 match 2 stars 7.81 score 200 scripts
emmanuelparadis
ape:Analyses of Phylogenetics and Evolution
Functions for reading, writing, plotting, and manipulating phylogenetic trees, analyses of comparative data in a phylogenetic framework, ancestral character analyses, analyses of diversification and macroevolution, computing distances from DNA sequences, reading and writing nucleotide sequences as well as importing from BioConductor, and several tools such as Mantel's test, generalized skyline plots, graphical exploration of phylogenetic data (alex, trex, kronoviz), estimation of absolute evolutionary rates and clock-like trees using mean path lengths and penalized likelihood, dating trees with non-contemporaneous sequences, translating DNA into AA sequences, and assessing sequence alignments. Phylogeny estimation can be done with the NJ, BIONJ, ME, MVR, SDM, and triangle methods, and several methods handling incomplete distance matrices (NJ*, BIONJ*, MVR*, and the corresponding triangle method). Some functions call external applications (PhyML, Clustal, T-Coffee, Muscle) whose results are returned into R.
Maintained by Emmanuel Paradis. Last updated 1 month ago.
2.3 match 64 stars 17.18 score 13k scripts 601 dependents
bioc
systemPipeR:systemPipeR: Workflow Environment for Data Analysis and Report Generation
systemPipeR is a multipurpose data analysis workflow environment that unifies R with command-line tools. It enables scientists to analyze many types of large- or small-scale data on local or distributed computer systems with a high level of reproducibility, scalability and portability. At its core is a command-line interface (CLI) that adopts the Common Workflow Language (CWL). This design allows users to choose for each analysis step the optimal R or command-line software. It supports both end-to-end and partial execution of workflows with built-in restart functionalities. Efficient management of complex analysis tasks is accomplished by a flexible workflow control container class. Handling of large numbers of input samples and experimental designs is facilitated by consistent sample annotation mechanisms. As a multi-purpose workflow toolkit, systemPipeR enables users to run existing workflows, customize them or design entirely new ones while taking advantage of widely adopted data structures within the Bioconductor ecosystem. Another important core functionality is the generation of reproducible scientific analysis and technical reports. For result interpretation, systemPipeR offers a wide range of plotting functionality, while an associated Shiny App offers many useful functionalities for interactive result exploration. The vignettes linked from this page include (1) a general introduction, (2) a description of technical details, and (3) a collection of workflow templates.
Maintained by Thomas Girke. Last updated 5 months ago.
genetics infrastructure dataimport sequencing rnaseq riboseq chipseq methylseq snp geneexpression coverage genesetenrichment alignment qualitycontrol immunooncology reportwriting workflowstep workflowmanagement
3.2 match 53 stars 11.56 score 344 scripts 3 dependents
bioc
IONiseR:Quality Assessment Tools for Oxford Nanopore MinION data
IONiseR provides tools for the quality assessment of Oxford Nanopore MinION data. It extracts summary statistics from a set of fast5 files and can be used either before or after base calling. In addition to standard summaries of the read-types produced, it provides a number of plots for visualising metrics relative to experiment run time or spatially over the surface of a flowcell.
Maintained by Mike Smith. Last updated 5 months ago.
qualitycontrol dataimport sequencing
8.3 match 4.30 score 5 scripts
ambuvjyn
baseq:Basic Sequence Processing Tool for Biological Data
Primarily created as an easy and understandable way to carry out basic sequence operations surrounding the central dogma of molecular biology.
Maintained by Ambu Vijayan. Last updated 2 years ago.
8.9 match 2 stars 4.00 score
bioc
CrispRVariants:Tools for counting and visualising mutations in a target location
CrispRVariants provides tools for analysing the results of a CRISPR-Cas9 mutagenesis sequencing experiment, or other sequencing experiments where variants within a given region are of interest. These tools allow users to localize variant allele combinations with respect to any genomic location (e.g. the Cas9 cut site), plot allele combinations and calculate mutation rates with flexible filtering of unrelated variants.
Maintained by Helen Lindsay. Last updated 5 months ago.
immunooncology crispr genomicvariation variantdetection geneticvariability datarepresentation visualization sequencing
6.1 match 5.51 score 32 scripts
shaunpwilkinson
insect:Informatic Sequence Classification Trees
Provides tools for probabilistic taxon assignment with informatic sequence classification trees. See Wilkinson et al (2018) <doi:10.7287/peerj.preprints.26812v1>.
Maintained by Shaun Wilkinson. Last updated 4 years ago.
5.6 match 14 stars 5.80 score 91 scripts
bioc
FastqCleaner:A Shiny Application for Quality Control, Filtering and Trimming of FASTQ Files
An interactive web application for quality control, filtering and trimming of FASTQ files. This user-friendly tool combines a pipeline for data processing based on Biostrings and ShortRead infrastructure, with a cutting-edge visual environment. Single-Read and Paired-End files can be locally processed. Diagnostic interactive plots (CG content, per-base sequence quality, etc.) are provided for both the input and output files.
Maintained by Leandro Roser. Last updated 5 months ago.
qualitycontrol sequencing software sangerseq sequencematching cpp
8.1 match 4.00 score 4 scripts
bioc
icetea:Integrating Cap Enrichment with Transcript Expression Analysis
icetea (Integrating Cap Enrichment with Transcript Expression Analysis) provides functions for end-to-end analysis of multiple 5'-profiling methods such as CAGE, RAMPAGE and MAPCap, from raw reads to detection of transcription start sites using replicates. It also allows differential TSS detection between groups of samples, thereby integrating mRNA cap enrichment information with transcript expression analysis.
Maintained by Vivek Bhardwaj. Last updated 5 months ago.
immunooncology transcription geneexpression sequencing rnaseq transcriptomics differentialexpression cage expression rna-seq
6.3 match 2 stars 5.08 score 7 scripts
bioc
ChIPsim:Simulation of ChIP-seq experiments
A general framework for the simulation of ChIP-seq data. Although currently focused on nucleosome positioning the package is designed to support different types of experiments.
Maintained by Peter Humburg. Last updated 5 months ago.
7.0 match 4.00 score 3 scripts
bioc
esATAC:An Easy-to-use Systematic pipeline for ATACseq data analysis
This package provides a framework and complete preset pipeline for quantification and analysis of ATAC-seq reads. It covers preprocessing of raw sequencing reads (FASTQ files), read alignment (Rbowtie2), operations on aligned read files (SAM, BAM, and BED files), peak calling (F-seq), genome annotation (motif, GO, SNP analysis) and a quality control report. The package is managed by a dataflow graph, which makes it easy for users to pass variables seamlessly between processes and to understand the workflow. Users can process FASTQ files through the end-to-end preset pipeline, which produces an HTML report with quality control and preliminary statistical results, or customize the workflow starting from any intermediate stage with esATAC functions.
Maintained by Zheng Wei. Last updated 5 months ago.
immunooncology sequencing dnaseq qualitycontrol alignment preprocessing coverage atacseq dnaseseq atac-seq bioconductor pipeline cpp openjdk
4.5 match 23 stars 6.11 score 3 scripts
urodelan
LocaTT:Geographically-Conscious Taxonomic Assignment for Metabarcoding
A bioinformatics pipeline for performing taxonomic assignment of DNA metabarcoding sequence data while considering geographic location. A detailed tutorial is available at <https://urodelan.github.io/Local_Taxa_Tool_Tutorial/>. A manuscript describing these methods is in preparation.
Maintained by Kenen Goodwin. Last updated 12 months ago.
8.5 match 3.00 score
bioc
edgeR:Empirical Analysis of Digital Gene Expression Data in R
Differential expression analysis of sequence count data. Implements a range of statistical methodology based on the negative binomial distributions, including empirical Bayes estimation, exact tests, generalized linear models, quasi-likelihood, and gene set enrichment. Can perform differential analyses of any type of omics data that produces read counts, including RNA-seq, ChIP-seq, ATAC-seq, Bisulfite-seq, SAGE, CAGE, metabolomics, or proteomics spectral counts. RNA-seq analyses can be conducted at the gene or isoform level, and tests can be conducted for differential exon or transcript usage.
Maintained by Yunshun Chen. Last updated 5 days ago.
alternativesplicing batcheffect bayesian biomedicalinformatics cellbiology chipseq clustering coverage differentialexpression differentialmethylation differentialsplicing dnamethylation epigenetics functionalgenomics geneexpression genesetenrichment genetics immunooncology multiplecomparison normalization pathways proteomics qualitycontrol regression rnaseq sage sequencing singlecell systemsbiology timecourse transcription transcriptomics openblas
1.7 match 13.40 score 17k scripts 255 dependents
svilsen
STRMPS:Analysis of Short Tandem Repeat (STR) Massively Parallel Sequencing (MPS) Data
Loading, identifying, aggregating, manipulating, and analysing short tandem repeat regions of massively parallel sequencing data in forensic genetics. The analyses and framework implemented in this package rely on the papers of Vilsen et al. (2017) <doi:10.1016/j.fsigen.2017.01.017> and Vilsen et al. (2018) <doi:10.1016/j.fsigen.2018.04.003>. Note that the parallelisation in the package relies on mclapply() and, thus, speed-ups will only be seen on UNIX-based systems.
Maintained by Søren B. Vilsen. Last updated 2 days ago.
biostrings pwalign shortread iranges
5.1 match 4.30 score
lucasnell
jackalope:A Swift, Versatile Phylogenomic and High-Throughput Sequencing Simulator
Simply and efficiently simulates (i) variants from reference genomes and (ii) reads from both Illumina <https://www.illumina.com/> and Pacific Biosciences (PacBio) <https://www.pacb.com/> platforms. It can either read reference genomes from FASTA files or simulate new ones. Genomic variants can be simulated using summary statistics, phylogenies, Variant Call Format (VCF) files, and coalescent simulations, the latter of which can include selection, recombination, and demographic fluctuations. 'jackalope' can simulate single, paired-end, or mate-pair Illumina reads, as well as PacBio reads. These simulations include sequencing errors, mapping qualities, multiplexing, and optical/polymerase chain reaction (PCR) duplicates. Simulating Illumina sequencing is based on ART by Huang et al. (2012) <doi:10.1093/bioinformatics/btr708>. PacBio sequencing simulation is based on SimLoRD by Stöcker et al. (2016) <doi:10.1093/bioinformatics/btw286>. All outputs can be written to standard file formats.
Maintained by Lucas A. Nell. Last updated 1 year ago.
zlib openblas curl bzip2 xz-utils cpp
3.4 match 8 stars 5.28 score 24 scripts
guokai8
microbial:Do 16s Data Analysis and Generate Figures
Provides functions to enhance the available statistical analysis procedures in R by providing simple functions to analyze and visualize 16S rRNA data. Here we present a tutorial with minimum working examples to demonstrate usage and dependencies.
Maintained by Kai Guo. Last updated 5 months ago.
software graphandnetwork microbiome microbiome-analysis
3.1 match 13 stars 5.81 score 25 scripts
fischuu
GenomicTools.fileHandler:File Handlers for Genomic Data Analysis
A collection of I/O tools for handling the most commonly used genomic datafiles, like fasta/-q, bed, gff, gtf, ped/map and vcf.
Maintained by Daniel Fischer. Last updated 1 month ago.
3.8 match 4.48 score 4 scripts 2 dependents
bioc
KnowSeq:KnowSeq R/Bioc package: The Smart Transcriptomic Pipeline
KnowSeq proposes a novel methodology that comprises the most relevant steps in transcriptomic gene expression analysis. KnowSeq is intended to serve as an integrative tool for processing data and extracting relevant biomarkers, as well as for assessing them through machine learning approaches. The final objective of KnowSeq is biological knowledge extraction from the biomarkers (Gene Ontology enrichment, pathway listing and visualization, and evidence related to the addressed disease). Although the package allows analyzing all the data manually, the main strength of KnowSeq is the possibility of generating an automatic and intelligent HTML report that collects all the involved steps in one document. It is important to highlight that the pipeline is totally modular and flexible, so it can be started from any of the different steps. KnowSeq is intended to help experts in the field acquire robust knowledge and conclusions about the data and diseases under study.
Maintained by Daniel Castillo-Secilla. Last updated 5 months ago.
geneexpression differentialexpression genesetenrichment dataimport classification featureextraction sequencing rnaseq batcheffect normalization preprocessing qualitycontrol genetics transcriptomics microarray alignment pathways systemsbiology go immunooncology
4.9 match 3.30 score 5 scripts
bioc
SELEX:Functions for analyzing SELEX-seq data
Tools for quantifying DNA binding specificities based on SELEX-seq data.
Maintained by Harmen J. Bussemaker. Last updated 5 months ago.
software motifdiscovery motifannotation generegulation transcription openjdk
3.6 match 4.30 score 8 scripts
bioc
DECIPHER:Tools for curating, analyzing, and manipulating biological sequences
A toolset for deciphering and managing biological sequences.
Maintained by Erik Wright. Last updated 5 days ago.
clustering genetics sequencing dataimport visualization microarray qualitycontrol qpcr alignment wholegenome microbiome immunooncology geneprediction openmp
1.8 match 8.40 score 1.1k scripts 14 dependents
c5sire
ace2fastq:ACE File to FASTQ Converter
The ACE file format is used in genomics to store contigs from sequencing machines. This tool converts it into FASTQ format. Both formats contain the sequence characters and their corresponding quality information. Unlike the FASTQ file, the ACE file stores the quality values numerically. The conversion algorithm uses the standard Sanger formula. The package facilitates insertion into pipelines, and content inspection.
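To make the numeric-to-character conversion concrete, here is a minimal Python sketch. It assumes that "the standard Sanger formula" refers to the usual Sanger FASTQ encoding, where a Phred score Q is written as the ASCII character with code Q + 33. It is not the package's code, and the names `phred_to_sanger` and `to_fastq_record` are hypothetical.

```python
def phred_to_sanger(quals, offset=33):
    """Encode numeric Phred scores as a Sanger (Phred+33) quality string."""
    chars = []
    for q in quals:
        # Sanger FASTQ covers Phred 0..93, i.e. ASCII '!' (33) to '~' (126).
        if not 0 <= q <= 93:
            raise ValueError(f"Phred score out of range: {q}")
        chars.append(chr(q + offset))
    return "".join(chars)


def to_fastq_record(name, seq, quals):
    """Build one four-line FASTQ record from a sequence and numeric scores."""
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lengths differ")
    return f"@{name}\n{seq}\n+\n{phred_to_sanger(quals)}\n"


record = to_fastq_record("contig1", "ACG", [0, 40, 93])
```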
Maintained by Reinhard Simon. Last updated 2 years ago.
3.9 match 3.70 score 7 scripts
cnuge
debar:A Post-Clustering Denoiser for COI-5P Barcode Data
The 'debar' sequence processing pipeline is designed for denoising high throughput sequencing data for the animal DNA barcode marker cytochrome c oxidase I (COI). The package is designed to detect and correct insertion and deletion errors within sequencer outputs. This is accomplished through comparison of input sequences against a profile hidden Markov model (PHMM) using the Viterbi algorithm (for algorithm details see Durbin et al. 1998, ISBN: 9780521629713). Inserted base pairs are removed and deleted base pairs are accounted for through the introduction of a placeholder character. Since the PHMM is a probabilistic representation of the COI barcode, corrections are not always perfect. For this reason 'debar' censors base pairs adjacent to reported indel sites, turning them into placeholder characters (default is 7 base pairs in either direction, this feature can be disabled). Testing has shown that this censorship results in the correct sequence length being restored, and erroneous base pairs being masked the vast majority of the time (>95%).
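The censoring step described above can be sketched in a few lines of Python. This is an illustration only, not debar's implementation: it assumes 0-based indel positions, uses "N" as the placeholder character, and masks the reported site itself along with its neighbours. The function name `censor` is hypothetical.

```python
def censor(seq, indel_sites, width=7, placeholder="N"):
    """Mask bases within `width` positions of each reported indel site."""
    chars = list(seq)
    for site in indel_sites:
        # Clamp the masked window to the ends of the sequence.
        start = max(0, site - width)
        end = min(len(chars), site + width + 1)
        for i in range(start, end):
            chars[i] = placeholder
    return "".join(chars)


masked = censor("ACGTACGTAC", [5], width=2)
```

Because bases are overwritten rather than removed, the sequence length is unchanged.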
Maintained by Cameron M. Nugent. Last updated 1 year ago.
bioinformatics denoising dna-barcoding dna-sequencing hidden-markov-model machine-learning
3.5 match 1 star 4.00 score 8 scripts
bioc
ngsReports:Load FastQC reports and other NGS related files
This package provides methods and object classes for parsing FastQC reports and output summaries from other NGS tools into R. As well as parsing files, multiple plotting methods have been implemented for visualising the parsed data. Plots can be generated as static ggplot objects or interactive plotly objects.
Maintained by Stevie Pederson. Last updated 5 months ago.
1.7 match 22 stars 7.89 score 99 scripts
bioc
chimeraviz:Visualization tools for gene fusions
chimeraviz manages data from fusion gene finders and provides useful visualization tools.
Maintained by Stian Lågstad. Last updated 5 months ago.
1.9 match 37 stars 6.71 score 14 scripts
bioc
alabaster.files:Wrappers to Save Common File Formats
Save common bioinformatics file formats within the alabaster framework. This includes BAM, BED, VCF, bigWig, bigBed, FASTQ, FASTA and so on. We save and load additional metadata for each file, and we support linkage between each file and its corresponding index.
Maintained by Aaron Lun. Last updated 5 months ago.
2.5 match 4.50 score 21 scripts
bioc
HiCool:HiCool
HiCool provides an R interface to process and normalize Hi-C paired-end fastq reads into .(m)cool files. .(m)cool is a compact, indexed HDF5 file format specifically tailored for efficiently storing HiC-based data. On top of processing fastq reads, HiCool provides a convenient reporting function to generate shareable reports summarizing Hi-C experiments and including quality controls.
Maintained by Jacques Serizay. Last updated 5 months ago.
2.5 match 2 stars 4.34 score 11 scripts
bioc
ramwas:Fast Methylome-Wide Association Study Pipeline for Enrichment Platforms
A complete toolset for methylome-wide association studies (MWAS). It is specifically designed for data from enrichment-based methylation assays, but can be applied to other data as well. The analysis pipeline includes seven steps: (1) scanning aligned reads from BAM files, (2) calculation of quality control measures, (3) creation of methylation score (coverage) matrix, (4) principal component analysis for capturing batch effects and detection of outliers, (5) association analysis with respect to phenotypes of interest while correcting for top PCs and known covariates, (6) annotation of significant findings, and (7) multi-marker analysis (methylation risk score) using elastic net. Additionally, RaMWAS includes tools for joint analysis of methylation and genotype data. This work is published in Bioinformatics, Shabalin et al. (2018) <doi:10.1093/bioinformatics/bty069>.
Maintained by Andrey A Shabalin. Last updated 5 months ago.
dnamethylation sequencing qualitycontrol coverage preprocessing normalization batcheffect principalcomponent differentialmethylation visualization
1.8 match 10 stars 6.08 score 85 scripts
wheretrue
exonr:Scientific Data Processing
This package provides a set of tools for processing scientific data. It's based on the exon Rust package.
Maintained by Trent Hauck. Last updated 2 months ago.
arrow bioinformatics datafusion ngs proteomics rust sql cargo
2.0 match 56 stars 5.27 score 2 scripts
larssnip
microseq:Basic Biological Sequence Handling
Basic functions for microbial sequence data analysis. The idea is to use generic R data structures as much as possible, making R data wrangling possible also for sequence data.
Maintained by Lars Snipen. Last updated 10 months ago.
1.9 match 3 stars 5.46 score 54 scripts 3 dependents
bioc
CellBarcode:Cellular DNA Barcode Analysis toolkit
The package CellBarcode performs Cellular DNA Barcode analysis. It can handle all kinds of DNA barcodes, as long as the barcode is within a single sequencing read and has a pattern that can be matched by a regular expression. CellBarcode can handle barcodes with flexible lengths, with or without a UMI (unique molecular identifier). The tool can also be used for pre-processing some amplicon data such as CRISPR gRNA screening, immune repertoire sequencing, and metagenome data.
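The idea of pulling a variable-length barcode and a UMI out of a read with a regular expression can be sketched as follows. This is a generic Python illustration, not the CellBarcode API; the flanking sequences `TTAG` and `CCGA`, the 4-base UMI, the 6 to 8 base barcode and the name `count_barcodes` are all made-up assumptions.

```python
import re
from collections import Counter

# Hypothetical read layout: 4-base UMI, fixed flank, 6-8 base barcode, fixed flank.
PATTERN = re.compile(r"(?P<umi>[ACGT]{4})TTAG(?P<barcode>[ACGT]{6,8})CCGA")


def count_barcodes(reads, pattern=PATTERN):
    """Count (UMI, barcode) pairs in reads that match the pattern."""
    counts = Counter()
    for read in reads:
        match = pattern.search(read)
        if match:  # reads without the expected layout are skipped
            counts[(match.group("umi"), match.group("barcode"))] += 1
    return counts


reads = [
    "AAAATTAGACGTACGTCCGA",  # 8-base barcode
    "AAAATTAGACGTACGTCCGA",  # same UMI and barcode again
    "GGGGTTAGACGTACCCGA",    # 6-base barcode
    "TTTTTTTT",              # no match
]
counts = count_barcodes(reads)
```

Dropping the `umi` group from the pattern would cover the barcode-only case.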
Maintained by Wenjie Sun. Last updated 11 days ago.
preprocessing qualitycontrol sequencing crispr amplicon amplicon-sequencing cellular-barcode rust cargo
1.7 match 1 star 5.86 score 40 scripts
wanglabcsu
blit:Bioinformatics Library for Integrated Tools
An all-encompassing R toolkit designed to streamline the process of calling various bioinformatics software and then performing data analysis and visualization in R. With 'blit', users can easily integrate a wide array of bioinformatics command line tools into their workflows, leveraging the power of R for sophisticated data manipulation and graphical representation.
Maintained by Yun Peng. Last updated 19 hours ago.
2.3 match 3 stars 4.38 score 3 scripts
bioc
psichomics:Graphical Interface for Alternative Splicing Quantification, Analysis and Visualisation
Interactive R package with an intuitive Shiny-based graphical interface for alternative splicing quantification and integrative analyses of alternative splicing and gene expression based on The Cancer Genome Atlas (TCGA), the Genotype-Tissue Expression project (GTEx), Sequence Read Archive (SRA) and user-provided data. The tool interactively performs survival, dimensionality reduction and median- and variance-based differential splicing and gene expression analyses that benefit from the incorporation of clinical and molecular sample-associated features (such as tumour stage or survival). Interactive visual access to genomic mapping and functional annotation of selected alternative splicing events is also included.
Maintained by Nuno Saraiva-Agostinho. Last updated 5 months ago.
sequencing rnaseq alternativesplicing differentialsplicing transcription gui principalcomponent survival biomedicalinformatics transcriptomics immunooncology visualization multiplecomparison geneexpression differentialexpression alternative-splicing bioconductor data-analyses differential-gene-expression differential-splicing-analysis gene-expression gtex recount2 rna-seq-data splicing-quantification sra tcga vast-tools cpp
1.3 match 36 stars 6.95 score 31 scripts
cran
varitas:Variant Calling in Targeted Analysis Sequencing Data
Multi-caller variant analysis pipeline for targeted analysis sequencing (TAS) data. Features a modular, automated workflow that can start with raw reads and produces a user-friendly PDF summary and a spreadsheet containing consensus variant information.
Maintained by Adam Mills. Last updated 4 years ago.
3.7 match 2.30 score
bioc
methodical:Discovering genomic regions where methylation is strongly associated with transcriptional activity
DNA methylation is generally considered to be associated with transcriptional silencing. However, comprehensive, genome-wide investigation of this relationship requires the evaluation of potentially millions of correlation values between the methylation of individual genomic loci and expression of associated transcripts in a relatively large number of samples. Methodical makes this process quick and easy while keeping a low memory footprint. It also provides a novel method for identifying regions where a number of methylation sites are consistently strongly associated with transcriptional expression. In addition, Methodical enables housing DNA methylation data from diverse sources (e.g. WGBS, RRBS and methylation arrays) within a common framework, lifting over DNA methylation data between different genome builds and creating base-resolution plots of the association between DNA methylation and transcriptional activity at transcriptional start sites.
Maintained by Richard Heery. Last updated 2 months ago.
dnamethylation methylationarray transcription genomewideassociation software openjdk
1.7 match 4.65 score 14 scripts
bioc
CircSeqAlignTk:A toolkit for end-to-end analysis of RNA-seq data for circular genomes
CircSeqAlignTk is designed for end-to-end RNA-Seq data analysis of circular genome sequences, from alignment to visualization. It mainly targets viroids which are composed of 246-401 nt circular RNAs. In addition, CircSeqAlignTk implements a tidy interface to generate synthetic sequencing data that mimic real RNA-Seq data, allowing developers to evaluate the performance of alignment tools and workflows.
Maintained by Jianqiang Sun. Last updated 5 months ago.
sequencing smallrna alignment software
1.8 match 4.40 score 3 scripts
bioc
SpliceWiz:interactive analysis and visualization of alternative splicing in R
The analysis and visualization of alternative splicing (AS) events from RNA sequencing data remains challenging. SpliceWiz is a user-friendly and performance-optimized R package for AS analysis. It processes alignment BAM files to quantify read counts across splice junctions, performs IRFinder-based intron retention quantitation, and supports novel splicing event identification. We introduce a novel visualization for AS using normalized coverage, thereby allowing visualization of differential AS across conditions. SpliceWiz features a shiny-based GUI facilitating interactive data exploration of results including gene ontology enrichment. It is performance optimized with multi-threaded processing of BAM files and a new COV file format for fast recall of sequencing coverage. Overall, SpliceWiz streamlines AS analysis, enabling reliable identification of functionally relevant AS events for further characterization.
Maintained by Alex Chit Hei Wong. Last updated 3 days ago.
software transcriptomics rnaseq alternativesplicing coverage differentialsplicing differentialexpression gui sequencing cpp openmp
1.2 match 16 stars 6.41 score 8 scripts
bioc
hiReadsProcessor:Functions to process LM-PCR reads from 454/Illumina data
hiReadsProcessor contains a set of functions which allow users to process LM-PCR products sequenced using any platform. Given an excel/txt file containing parameters for demultiplexing and sample metadata, the functions automate trimming of adaptors and identification of the genomic product. Genomic products are further processed for QC and abundance quantification.
Maintained by Nirav V Malani. Last updated 5 months ago.
1.7 match 4.18 score 7 scripts
bioc
basecallQC:Working with Illumina Basecalling and Demultiplexing input and output files
The basecallQC package provides tools to work with Illumina bcl2Fastq (versions >= 2.1.7) software. Prior to basecalling and demultiplexing using the bcl2Fastq software, basecallQC functions allow the user to update Illumina sample sheets from versions <= 1.8.9 to >= 2.1.7 standards, clean sample sheets of common problems such as invalid sample names and IDs, create read and index basemasks and the bcl2Fastq command. Following the generation of basecalled and demultiplexed data, the basecallQC package allows the user to generate HTML tables, plots and a self-contained report of summary metrics from Illumina XML output files.
Maintained by Thomas Carroll. Last updated 5 months ago.
sequencing infrastructure dataimport qualitycontrol
1.6 match 4.32 score 21 scripts
bioc
R453Plus1Toolbox:A package for importing and analyzing data from Roche's Genome Sequencer System
The R453Plus1 Toolbox comprises useful functions for the analysis of data generated by Roche's 454 sequencing platform. It adds functions for quality assurance as well as for annotation and visualization of detected variants, complementing the software tools shipped by Roche with their product. Further, a pipeline for the detection of structural variants is provided.
Maintained by Hans-Ulrich Klein. Last updated 5 months ago.
sequencing infrastructure dataimport datarepresentation visualization qualitycontrol reportwriting
1.8 match 3.48 score 10 scripts
bioc
BUSpaRse:kallisto | bustools R utilities
The kallisto | bustools pipeline is a fast and modular set of tools to convert single cell RNA-seq reads in fastq files into gene count or transcript compatibility counts (TCC) matrices for downstream analysis. Central to this pipeline is the barcode, UMI, and set (BUS) file format. This package serves the following purposes: First, this package allows users to manipulate BUS format files as data frames in R and then convert them into gene count or TCC matrices. Furthermore, since R and Rcpp code is easier to handle than pure C++ code, users are encouraged to tweak the source code of this package to experiment with new uses of the BUS format and different ways to convert the BUS file into a gene count matrix. Second, this package can conveniently generate files required to generate gene count matrices for spliced and unspliced transcripts for RNA velocity. Here biotypes can be filtered and scaffolds and haplotypes can be removed, and the filtered transcriptome can be extracted and written to disk. Third, this package implements utility functions to get transcripts and associated genes required to convert BUS files to gene count matrices, to write the transcript to gene information in the format required by bustools, and to read output of bustools into R as sparse matrices.
Maintained by Lambda Moses. Last updated 5 months ago.
singlecell rnaseq workflowstep cpp
0.5 match 9 stars 7.35 score 165 scripts
bioc
metabinR:Abundance and Compositional Based Binning of Metagenomes
Provide functions for performing abundance and compositional based binning on metagenomic samples, directly from FASTA or FASTQ files. Functions are implemented in Java and called via rJava. Parallel implementation that operates directly on input FASTA/FASTQ files for fast execution.
Maintained by Anestis Gkanogiannis. Last updated 5 months ago.
classification clustering microbiome sequencing software openjdk
0.8 match 4.18 score 2 scripts
bioc
scruff:Single Cell RNA-Seq UMI Filtering Facilitator (scruff)
A pipeline which processes single cell RNA-seq (scRNA-seq) reads from CEL-seq and CEL-seq2 protocols. Demultiplex scRNA-seq FASTQ files, align reads to reference genome using Rsubread, and generate UMI filtered count matrix. Also provide visualizations of read alignments and pre- and post-alignment QC metrics.
Maintained by Zhe Wang. Last updated 5 months ago.
software technology sequencing alignment rnaseq singlecell workflowstep preprocessing qualitycontrol visualization immunooncology bioinformatics scrna-seq single-cell umi
0.5 match 8 stars 6.20 score 22 scripts
grafxzahl
genBaRcode:Analysis and Visualization Tools for Genetic Barcode Data
Provides the necessary functions to identify and extract a selection of already available barcode constructs (Cornils, K. et al. (2014) <doi:10.1093/nar/gku081>) and freely choosable barcode designs from next generation sequence (NGS) data. Furthermore, it offers the possibility to account for sequence errors, the calculation of barcode similarities and provides a variety of visualisation tools (Thielecke, L. et al. (2017) <doi:10.1038/srep43249>).
Maintained by Lars Thielecke. Last updated 6 days ago.
1.2 match 2.30 score 6 scripts