R-universe search: fastq

bioc

ShortRead:FASTQ input and manipulation

This package implements sampling, iteration, and input of FASTQ files. The package includes functions for filtering and trimming reads, and for generating a quality assessment report. Data are represented as DNAStringSet-derived objects, and easily manipulated for a diversity of purposes. The package also contains legacy support for early single-end, ungapped alignment formats.

Maintained by Bioconductor Package Maintainer. Last updated 5 months ago.

dataimport sequencing qualitycontrol bioconductor-package core-package zlib cpp

16.2 match 8 stars 12.08 score 1.8k scripts 49 dependents

bioc

dada2:Accurate, high-resolution sample inference from amplicon sequencing data

The dada2 package infers exact amplicon sequence variants (ASVs) from high-throughput amplicon sequencing data, replacing the coarser and less accurate OTU clustering approach. The dada2 pipeline takes as input demultiplexed fastq files, and outputs the sequence variants and their sample-wise abundances after removing substitution and chimera errors. Taxonomic classification is available via a native implementation of the RDP naive Bayesian classifier, and species-level assignment to 16S rRNA gene fragments by exact matching.

Maintained by Benjamin Callahan. Last updated 5 months ago.

immunooncology microbiome sequencing classification metagenomics amplicon bioconductor bioinformatics metabarcoding taxonomy cpp

11.3 match 485 stars 13.17 score 3.0k scripts 4 dependents

bioc

seqTools:Analysis of nucleotide, sequence and quality content on fastq files

Analyze read length, phred scores and alphabet frequency and DNA k-mers on uncompressed and compressed fastq files.

Maintained by Wolfgang Kaisers. Last updated 5 months ago.

qualitycontrol sequencing zlib

21.6 match 5.57 score 52 scripts 1 dependents

bioc

ORFik:Open Reading Frames in Genomics

R package for analysis of transcript and translation features through manipulation of sequence data and NGS data like Ribo-Seq, RNA-Seq, TCP-Seq and CAGE. It is generalized in the sense that any transcript region can be analysed, as the name hints to it was made with investigation of ribosomal patterns over Open Reading Frames (ORFs) as it's primary use case. ORFik is extremely fast through use of C++, data.table and GenomicRanges. Package allows to reassign starts of the transcripts with the use of CAGE-Seq data, automatic shifting of RiboSeq reads, finding of Open Reading Frames for whole genomes and much more.

Maintained by Haakon Tjeldnes. Last updated 27 days ago.

immunooncology software sequencing riboseq rnaseq functionalgenomics coverage alignment dataimport cpp

7.7 match 33 stars 10.63 score 115 scripts 2 dependents

bioc

Biostrings:Efficient manipulation of biological strings

Memory efficient string containers, string matching algorithms, and other utilities, for fast manipulation of large biological sequences or sets of sequences.

Maintained by Hervé Pagès. Last updated 23 days ago.

sequencematching alignment sequencing genetics dataimport datarepresentation infrastructure bioconductor-package core-package

4.5 match 61 stars 17.83 score 8.6k scripts 1.2k dependents

bioc

GEOfastq:Downloads ENA Fastqs With GEO Accessions

GEOfastq is used to download fastq files from the European Nucleotide Archive (ENA) starting with an accession from the Gene Expression Omnibus (GEO). To do this, sample metadata is retrieved from GEO and the Sequence Read Archive (SRA). SRA run accessions are then used to construct FTP and aspera download links for fastq files generated by the ENA.

Maintained by Alex Pickering. Last updated 5 months ago.

rnaseq dataimport bioinformatics fastq gene-expression geo rna-seq

16.5 match 4 stars 4.60 score 6 scripts

bioc

Rfastp:An Ultra-Fast and All-in-One Fastq Preprocessor (Quality Control, Adapter, low quality and polyX trimming) and UMI Sequence Parsing).

Rfastp is an R wrapper of fastp developed in c++. fastp performs quality control for fastq files. including low quality bases trimming, polyX trimming, adapter auto-detection and trimming, paired-end reads merging, UMI sequence/id handling. Rfastp can concatenate multiple files into one file (like shell command cat) and accept multiple files as input.

Maintained by Thomas Carroll. Last updated 5 months ago.

qualitycontrol sequencing preprocessing software zlib cpp

17.8 match 3.82 score 33 scripts

bioc

scPipe:Pipeline for single cell multi-omic data pre-processing

A preprocessing pipeline for single cell RNA-seq/ATAC-seq data that starts from the fastq files and produces a feature count matrix with associated quality control information. It can process fastq data generated by CEL-seq, MARS-seq, Drop-seq, Chromium 10x and SMART-seq protocols.

Maintained by Shian Su. Last updated 3 months ago.

immunooncology software sequencing rnaseq geneexpression singlecell visualization sequencematching preprocessing qualitycontrol genomeannotation dataimport curl bzip2 xz-utils zlib cpp

7.2 match 68 stars 9.02 score 84 scripts

bioc

UMI4Cats:UMI4Cats: Processing, analysis and visualization of UMI-4C chromatin contact data

UMI-4C is a technique that allows characterization of 3D chromatin interactions with a bait of interest, taking advantage of a sonication step to produce unique molecular identifiers (UMIs) that help remove duplication bias, thus allowing a better differential comparsion of chromatin interactions between conditions. This package allows processing of UMI-4C data, starting from FastQ files provided by the sequencing facility. It provides two statistical methods for detecting differential contacts and includes a visualization function to plot integrated information from a UMI-4C assay.

Maintained by Mireia Ramos-Rodriguez. Last updated 5 months ago.

qualitycontrol preprocessing alignment normalization visualization sequencing coverage chromatin chromatin-interaction genomics umi4c

11.3 match 5 stars 5.57 score 7 scripts

adrientaudiere

MiscMetabar:Miscellaneous Functions for Metabarcoding Analysis

Facilitate the description, transformation, exploration, and reproducibility of metabarcoding analyses. 'MiscMetabar' is mainly built on top of the 'phyloseq', 'dada2' and 'targets' R packages. It helps to build reproducible and robust bioinformatics pipelines in R. 'MiscMetabar' makes ecological analysis of alpha and beta-diversity easier, more reproducible and more powerful by integrating a large number of tools. Important features are described in Taudière A. (2023) <doi:10.21105/joss.06038>.

Maintained by Adrien Taudière. Last updated 25 days ago.

sequencing microbiome metagenomics clustering classification visualization amplicon amplicon-sequencing biodiversity-informatics ecology illumina metabarcoding ngs-analysis

7.2 match 17 stars 6.44 score 23 scripts

bioc

qckitfastq:FASTQ Quality Control

Assessment of FASTQ file format with multiple metrics including quality score, sequence content, overrepresented sequence and Kmers.

Maintained by August Guang. Last updated 5 months ago.

software qualitycontrol sequencing zlib cpp

10.4 match 4.38 score 24 scripts

bioc

SRAdb:A compilation of metadata from NCBI SRA and tools

The Sequence Read Archive (SRA) is the largest public repository of sequencing data from the next generation of sequencing platforms including Roche 454 GS System, Illumina Genome Analyzer, Applied Biosystems SOLiD System, Helicos Heliscope, and others. However, finding data of interest can be challenging using current tools. SRAdb is an attempt to make access to the metadata associated with submission, study, sample, experiment and run much more feasible. This is accomplished by parsing all the NCBI SRA metadata into a SQLite database that can be stored and queried locally. Fulltext search in the package make querying metadata very flexible and powerful. fastq and sra files can be downloaded for doing alignment locally. Beside ftp protocol, the SRAdb has funcitons supporting fastp protocol (ascp from Aspera Connect) for faster downloading large data files over long distance. The SQLite database is updated regularly as new data is added to SRA and can be downloaded at will for the most up-to-date metadata.

Maintained by Jack Zhu. Last updated 3 months ago.

infrastructure sequencing dataimport

5.5 match 2 stars 7.81 score 200 scripts

emmanuelparadis

ape:Analyses of Phylogenetics and Evolution

Functions for reading, writing, plotting, and manipulating phylogenetic trees, analyses of comparative data in a phylogenetic framework, ancestral character analyses, analyses of diversification and macroevolution, computing distances from DNA sequences, reading and writing nucleotide sequences as well as importing from BioConductor, and several tools such as Mantel's test, generalized skyline plots, graphical exploration of phylogenetic data (alex, trex, kronoviz), estimation of absolute evolutionary rates and clock-like trees using mean path lengths and penalized likelihood, dating trees with non-contemporaneous sequences, translating DNA into AA sequences, and assessing sequence alignments. Phylogeny estimation can be done with the NJ, BIONJ, ME, MVR, SDM, and triangle methods, and several methods handling incomplete distance matrices (NJ*, BIONJ*, MVR*, and the corresponding triangle method). Some functions call external applications (PhyML, Clustal, T-Coffee, Muscle) whose results are returned into R.

Maintained by Emmanuel Paradis. Last updated 1 months ago.

openblas cpp

2.3 match 64 stars 17.18 score 13k scripts 601 dependents

bioc

systemPipeR:systemPipeR: Workflow Environment for Data Analysis and Report Generation

systemPipeR is a multipurpose data analysis workflow environment that unifies R with command-line tools. It enables scientists to analyze many types of large- or small-scale data on local or distributed computer systems with a high level of reproducibility, scalability and portability. At its core is a command-line interface (CLI) that adopts the Common Workflow Language (CWL). This design allows users to choose for each analysis step the optimal R or command-line software. It supports both end-to-end and partial execution of workflows with built-in restart functionalities. Efficient management of complex analysis tasks is accomplished by a flexible workflow control container class. Handling of large numbers of input samples and experimental designs is facilitated by consistent sample annotation mechanisms. As a multi-purpose workflow toolkit, systemPipeR enables users to run existing workflows, customize them or design entirely new ones while taking advantage of widely adopted data structures within the Bioconductor ecosystem. Another important core functionality is the generation of reproducible scientific analysis and technical reports. For result interpretation, systemPipeR offers a wide range of plotting functionality, while an associated Shiny App offers many useful functionalities for interactive result exploration. The vignettes linked from this page include (1) a general introduction, (2) a description of technical details, and (3) a collection of workflow templates.

Maintained by Thomas Girke. Last updated 5 months ago.

genetics infrastructure dataimport sequencing rnaseq riboseq chipseq methylseq snp geneexpression coverage genesetenrichment alignment qualitycontrol immunooncology reportwriting workflowstep workflowmanagement

3.2 match 53 stars 11.56 score 344 scripts 3 dependents

bioc

IONiseR:Quality Assessment Tools for Oxford Nanopore MinION data

IONiseR provides tools for the quality assessment of Oxford Nanopore MinION data. It extracts summary statistics from a set of fast5 files and can be used either before or after base calling. In addition to standard summaries of the read-types produced, it provides a number of plots for visualising metrics relative to experiment run time or spatially over the surface of a flowcell.

Maintained by Mike Smith. Last updated 5 months ago.

qualitycontrol dataimport sequencing

8.3 match 4.30 score 5 scripts

ambuvjyn

baseq:Basic Sequence Processing Tool for Biological Data

Primarily created as an easy and understanding way to do basic sequences surrounding the central dogma of molecular biology.

Maintained by Ambu Vijayan. Last updated 2 years ago.

bioinformatics sequencing

8.9 match 2 stars 4.00 score

bioc

CrispRVariants:Tools for counting and visualising mutations in a target location

CrispRVariants provides tools for analysing the results of a CRISPR-Cas9 mutagenesis sequencing experiment, or other sequencing experiments where variants within a given region are of interest. These tools allow users to localize variant allele combinations with respect to any genomic location (e.g. the Cas9 cut site), plot allele combinations and calculate mutation rates with flexible filtering of unrelated variants.

Maintained by Helen Lindsay. Last updated 5 months ago.

immunooncology crispr genomicvariation variantdetection geneticvariability datarepresentation visualization sequencing

6.1 match 5.51 score 32 scripts

shaunpwilkinson

insect:Informatic Sequence Classification Trees

Provides tools for probabilistic taxon assignment with informatic sequence classification trees. See Wilkinson et al (2018) <doi:10.7287/peerj.preprints.26812v1>.

Maintained by Shaun Wilkinson. Last updated 4 years ago.

5.6 match 14 stars 5.80 score 91 scripts

bioc

FastqCleaner:A Shiny Application for Quality Control, Filtering and Trimming of FASTQ Files

An interactive web application for quality control, filtering and trimming of FASTQ files. This user-friendly tool combines a pipeline for data processing based on Biostrings and ShortRead infrastructure, with a cutting-edge visual environment. Single-Read and Paired-End files can be locally processed. Diagnostic interactive plots (CG content, per-base sequence quality, etc.) are provided for both the input and output files.

Maintained by Leandro Roser. Last updated 5 months ago.

qualitycontrol sequencing software sangerseq sequencematching cpp

8.1 match 4.00 score 4 scripts

bioc

icetea:Integrating Cap Enrichment with Transcript Expression Analysis

icetea (Integrating Cap Enrichment with Transcript Expression Analysis) provides functions for end-to-end analysis of multiple 5'-profiling methods such as CAGE, RAMPAGE and MAPCap, beginning from raw reads to detection of transcription start sites using replicates. It also allows performing differential TSS detection between group of samples, therefore, integrating the mRNA cap enrichment information with transcript expression analysis.

Maintained by Vivek Bhardwaj. Last updated 5 months ago.

immunooncology transcription geneexpression sequencing rnaseq transcriptomics differentialexpression cage expression rna-seq

6.3 match 2 stars 5.08 score 7 scripts

bioc

ChIPsim:Simulation of ChIP-seq experiments

A general framework for the simulation of ChIP-seq data. Although currently focused on nucleosome positioning the package is designed to support different types of experiments.

Maintained by Peter Humburg. Last updated 5 months ago.

infrastructure chipseq

7.0 match 4.00 score 3 scripts

bioc

esATAC:An Easy-to-use Systematic pipeline for ATACseq data analysis

This package provides a framework and complete preset pipeline for quantification and analysis of ATAC-seq Reads. It covers raw sequencing reads preprocessing (FASTQ files), reads alignment (Rbowtie2), aligned reads file operations (SAM, BAM, and BED files), peak calling (F-seq), genome annotations (Motif, GO, SNP analysis) and quality control report. The package is managed by dataflow graph. It is easy for user to pass variables seamlessly between processes and understand the workflow. Users can process FASTQ files through end-to-end preset pipeline which produces a pretty HTML report for quality control and preliminary statistical results, or customize workflow starting from any intermediate stages with esATAC functions easily and flexibly.

Maintained by Zheng Wei. Last updated 5 months ago.

immunooncology sequencing dnaseq qualitycontrol alignment preprocessing coverage atacseq dnaseseq atac-seq bioconductor pipeline cpp openjdk

4.5 match 23 stars 6.11 score 3 scripts

urodelan

LocaTT:Geographically-Conscious Taxonomic Assignment for Metabarcoding

A bioinformatics pipeline for performing taxonomic assignment of DNA metabarcoding sequence data while considering geographic location. A detailed tutorial is available at <https://urodelan.github.io/Local_Taxa_Tool_Tutorial/>. A manuscript describing these methods is in preparation.

Maintained by Kenen Goodwin. Last updated 12 months ago.

8.5 match 3.00 score

bioc

edgeR:Empirical Analysis of Digital Gene Expression Data in R

Differential expression analysis of sequence count data. Implements a range of statistical methodology based on the negative binomial distributions, including empirical Bayes estimation, exact tests, generalized linear models, quasi-likelihood, and gene set enrichment. Can perform differential analyses of any type of omics data that produces read counts, including RNA-seq, ChIP-seq, ATAC-seq, Bisulfite-seq, SAGE, CAGE, metabolomics, or proteomics spectral counts. RNA-seq analyses can be conducted at the gene or isoform level, and tests can be conducted for differential exon or transcript usage.

Maintained by Yunshun Chen. Last updated 5 days ago.

alternativesplicing batcheffect bayesian biomedicalinformatics cellbiology chipseq clustering coverage differentialexpression differentialmethylation differentialsplicing dnamethylation epigenetics functionalgenomics geneexpression genesetenrichment genetics immunooncology multiplecomparison normalization pathways proteomics qualitycontrol regression rnaseq sage sequencing singlecell systemsbiology timecourse transcription transcriptomics openblas

1.7 match 13.40 score 17k scripts 255 dependents

svilsen

STRMPS:Analysis of Short Tandem Repeat (STR) Massively Parallel Sequencing (MPS) Data

Loading, identifying, aggregating, manipulating, and analysing short tandem repeat regions of massively parallel sequencing data in forensic genetics. The analyses and framework implemented in this package relies on the papers of Vilsen et al. (2017) <doi:10.1016/j.fsigen.2017.01.017> and Vilsen et al. (2018) <doi:10.1016/j.fsigen.2018.04.003>. Note: that the parallelisation in the package relies on mclapply() and, thus, speed-ups will only be seen on UNIX based systems.

Maintained by Søren B. Vilsen. Last updated 2 days ago.

biostrings pwalign shortread iranges

5.1 match 4.30 score

lucasnell

jackalope:A Swift, Versatile Phylogenomic and High-Throughput Sequencing Simulator

Simply and efficiently simulates (i) variants from reference genomes and (ii) reads from both Illumina <https://www.illumina.com/> and Pacific Biosciences (PacBio) <https://www.pacb.com/> platforms. It can either read reference genomes from FASTA files or simulate new ones. Genomic variants can be simulated using summary statistics, phylogenies, Variant Call Format (VCF) files, and coalescent simulations—the latter of which can include selection, recombination, and demographic fluctuations. 'jackalope' can simulate single, paired-end, or mate-pair Illumina reads, as well as PacBio reads. These simulations include sequencing errors, mapping qualities, multiplexing, and optical/polymerase chain reaction (PCR) duplicates. Simulating Illumina sequencing is based on ART by Huang et al. (2012) <doi:10.1093/bioinformatics/btr708>. PacBio sequencing simulation is based on SimLoRD by Stöcker et al. (2016) <doi:10.1093/bioinformatics/btw286>. All outputs can be written to standard file formats.

Maintained by Lucas A. Nell. Last updated 1 years ago.

zlib openblas curl bzip2 xz-utils cpp

3.4 match 8 stars 5.28 score 24 scripts

guokai8

microbial:Do 16s Data Analysis and Generate Figures

Provides functions to enhance the available statistical analysis procedures in R by providing simple functions to analysis and visualize the 16S rRNA data.Here we present a tutorial with minimum working examples to demonstrate usage and dependencies.

Maintained by Kai Guo. Last updated 5 months ago.

software graphandnetwork microbiome microbiome-analysis

3.1 match 13 stars 5.81 score 25 scripts

fischuu

GenomicTools.fileHandler:File Handlers for Genomic Data Analysis

A collection of I/O tools for handling the most commonly used genomic datafiles, like fasta/-q, bed, gff, gtf, ped/map and vcf.

Maintained by Daniel Fischer. Last updated 1 months ago.

3.8 match 4.48 score 4 scripts 2 dependents

bioc

KnowSeq:KnowSeq R/Bioc package: The Smart Transcriptomic Pipeline

KnowSeq proposes a novel methodology that comprises the most relevant steps in the Transcriptomic gene expression analysis. KnowSeq expects to serve as an integrative tool that allows to process and extract relevant biomarkers, as well as to assess them through a Machine Learning approaches. Finally, the last objective of KnowSeq is the biological knowledge extraction from the biomarkers (Gene Ontology enrichment, Pathway listing and Visualization and Evidences related to the addressed disease). Although the package allows analyzing all the data manually, the main strenght of KnowSeq is the possibilty of carrying out an automatic and intelligent HTML report that collect all the involved steps in one document. It is important to highligh that the pipeline is totally modular and flexible, hence it can be started from whichever of the different steps. KnowSeq expects to serve as a novel tool to help to the experts in the field to acquire robust knowledge and conclusions for the data and diseases to study.

Maintained by Daniel Castillo-Secilla. Last updated 5 months ago.

geneexpression differentialexpression genesetenrichment dataimport classification featureextraction sequencing rnaseq batcheffect normalization preprocessing qualitycontrol genetics transcriptomics microarray alignment pathways systemsbiology go immunooncology

4.9 match 3.30 score 5 scripts

bioc

SELEX:Functions for analyzing SELEX-seq data

Tools for quantifying DNA binding specificities based on SELEX-seq data.

Maintained by Harmen J. Bussemaker. Last updated 5 months ago.

software motifdiscovery motifannotation generegulation transcription openjdk

3.6 match 4.30 score 8 scripts

bioc

DECIPHER:Tools for curating, analyzing, and manipulating biological sequences

A toolset for deciphering and managing biological sequences.

Maintained by Erik Wright. Last updated 5 days ago.

clustering genetics sequencing dataimport visualization microarray qualitycontrol qpcr alignment wholegenome microbiome immunooncology geneprediction openmp

1.8 match 8.40 score 1.1k scripts 14 dependents

c5sire

ace2fastq:ACE File to FASTQ Converter

The ACE file format is used in genomics to store contigs from sequencing machines. This tools converts it into FASTQ format. Both formats contain the sequence characters and their corresponding quality information. Unlike the FASTQ file, the ace file stores the quality values numerically. The conversion algorithm uses the standard Sanger formula. The package facilitates insertion into pipelines, and content inspection.

Maintained by Reinhard Simon. Last updated 2 years ago.

3.9 match 3.70 score 7 scripts

cnuge

debar:A Post-Clustering Denoiser for COI-5P Barcode Data

The 'debar' sequence processing pipeline is designed for denoising high throughput sequencing data for the animal DNA barcode marker cytochrome c oxidase I (COI). The package is designed to detect and correct insertion and deletion errors within sequencer outputs. This is accomplished through comparison of input sequences against a profile hidden Markov model (PHMM) using the Viterbi algorithm (for algorithm details see Durbin et al. 1998, ISBN: 9780521629713). Inserted base pairs are removed and deleted base pairs are accounted for through the introduction of a placeholder character. Since the PHMM is a probabilistic representation of the COI barcode, corrections are not always perfect. For this reason 'debar' censors base pairs adjacent to reported indel sites, turning them into placeholder characters (default is 7 base pairs in either direction, this feature can be disabled). Testing has shown that this censorship results in the correct sequence length being restored, and erroneous base pairs being masked the vast majority of the time (>95%).

Maintained by Cameron M. Nugent. Last updated 1 years ago.

bioinformatics denoising dna-barcoding dna-sequencing hidden-markov-model machine-learning

3.5 match 1 stars 4.00 score 8 scripts

bioc

ngsReports:Load FastqQC reports and other NGS related files

This package provides methods and object classes for parsing FastQC reports and output summaries from other NGS tools into R. As well as parsing files, multiple plotting methods have been implemented for visualising the parsed data. Plots can be generated as static ggplot objects or interactive plotly objects.

Maintained by Stevie Pederson. Last updated 5 months ago.

qualitycontrol reportwriting

1.7 match 22 stars 7.89 score 99 scripts

bioc

chimeraviz:Visualization tools for gene fusions

chimeraviz manages data from fusion gene finders and provides useful visualization tools.

Maintained by Stian Lågstad. Last updated 5 months ago.

infrastructure alignment

1.9 match 37 stars 6.71 score 14 scripts

bioc

alabaster.files:Wrappers to Save Common File Formats

Save common bioinformatics file formats within the alabaster framework. This includes BAM, BED, VCF, bigWig, bigBed, FASTQ, FASTA and so on. We save and load additional metadata for each file, and we support linkage between each file and its corresponding index.

Maintained by Aaron Lun. Last updated 5 months ago.

datarepresentation dataimport

2.5 match 4.50 score 21 scripts

bioc

HiCool:HiCool

HiCool provides an R interface to process and normalize Hi-C paired-end fastq reads into .(m)cool files. .(m)cool is a compact, indexed HDF5 file format specifically tailored for efficiently storing HiC-based data. On top of processing fastq reads, HiCool provides a convenient reporting function to generate shareable reports summarizing Hi-C experiments and including quality controls.

Maintained by Jacques Serizay. Last updated 5 months ago.

hic dna3dstructure dataimport

2.5 match 2 stars 4.34 score 11 scripts

bioc

ramwas:Fast Methylome-Wide Association Study Pipeline for Enrichment Platforms

A complete toolset for methylome-wide association studies (MWAS). It is specifically designed for data from enrichment based methylation assays, but can be applied to other data as well. The analysis pipeline includes seven steps: (1) scanning aligned reads from BAM files, (2) calculation of quality control measures, (3) creation of methylation score (coverage) matrix, (4) principal component analysis for capturing batch effects and detection of outliers, (5) association analysis with respect to phenotypes of interest while correcting for top PCs and known covariates, (6) annotation of significant findings, and (7) multi-marker analysis (methylation risk score) using elastic net. Additionally, RaMWAS include tools for joint analysis of methlyation and genotype data. This work is published in Bioinformatics, Shabalin et al. (2018) <doi:10.1093/bioinformatics/bty069>.

Maintained by Andrey A Shabalin. Last updated 5 months ago.

dnamethylation sequencing qualitycontrol coverage preprocessing normalization batcheffect principalcomponent differentialmethylation visualization

1.8 match 10 stars 6.08 score 85 scripts

wheretrue

exonr:Scientific Data Processing

This package provides a set of tools for processing scientific data. It's based on the exon Rust package.

Maintained by Trent Hauck. Last updated 2 months ago.

arrow bioinformatics datafusion ngs proteomics rust sql cargo

2.0 match 56 stars 5.27 score 2 scripts

larssnip

microseq:Basic Biological Sequence Handling

Basic functions for microbial sequence data analysis. The idea is to use generic R data structures as much as possible, making R data wrangling possible also for sequence data.

Maintained by Lars Snipen. Last updated 10 months ago.

cpp

1.9 match 3 stars 5.46 score 54 scripts 3 dependents

bioc

CellBarcode:Cellular DNA Barcode Analysis toolkit

The package CellBarcode performs Cellular DNA Barcode analysis. It can handle all kinds of DNA barcodes, as long as the barcode is within a single sequencing read and has a pattern that can be matched by a regular expression. \code{CellBarcode} can handle barcodes with flexible lengths, with or without UMI (unique molecular identifier). This tool also can be used for pre-processing some amplicon data such as CRISPR gRNA screening, immune repertoire sequencing, and metagenome data.

Maintained by Wenjie Sun. Last updated 11 days ago.

preprocessing qualitycontrol sequencing crispr amplicon amplicon-sequencing cellular-barcode rust cargo

1.7 match 1 stars 5.86 score 40 scripts

wanglabcsu

blit:Bioinformatics Library for Integrated Tools

An all-encompassing R toolkit designed to streamline the process of calling various bioinformatics software and then performing data analysis and visualization in R. With 'blit', users can easily integrate a wide array of bioinformatics command line tools into their workflows, leveraging the power of R for sophisticated data manipulation and graphical representation.

Maintained by Yun Peng. Last updated 19 hours ago.

2.3 match 3 stars 4.38 score 3 scripts

hugheylab

seeker:Simplified Fetching and Processing of Microarray and RNA-Seq Data

Wrapper around various existing tools and command-line interfaces, providing a standard interface, simple parallelization, and detailed logging. For microarray data, maps probe sets to standard gene IDs, building on 'GEOquery' Davis and Meltzer (2007) <doi:10.1093/bioinformatics/btm254>, 'ArrayExpress' Kauffmann et al. (2009) <doi:10.1093/bioinformatics/btp354>, Robust multi-array average 'RMA' Irizarry et al. (2003) <doi:10.1093/biostatistics/4.2.249>, and 'BrainArray' Dai et al. (2005) <doi:10.1093/nar/gni179>. For RNA-seq data, fetches metadata and raw reads from National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA), performs standard adapter and quality trimming using 'TrimGalore' Krueger <https://github.com/FelixKrueger/TrimGalore>, performs quality control checks using 'FastQC' Andrews <https://github.com/s-andrews/FastQC>, quantifies transcript abundances using 'salmon' Patro et al. (2017) <doi:10.1038/nmeth.4197> and potentially 'refgenie' Stolarczyk et al. (2020) <doi:10.1093/gigascience/giz149>, aggregates the results using 'MultiQC' Ewels et al. (2016) <doi:10.1093/bioinformatics/btw354>, maps transcripts to genes using 'biomaRt' Durinkck et al. (2009) <doi:10.1038/nprot.2009.97>, and summarizes transcript-level quantifications for gene-level analyses using 'tximport' Soneson et al. (2015) <doi:10.12688/f1000research.7563.2>.

Maintained by Jake Hughey. Last updated 7 months ago.

2.0 match 3 stars 4.78 score 1 scripts

bioc

psichomics:Graphical Interface for Alternative Splicing Quantification, Analysis and Visualisation

Interactive R package with an intuitive Shiny-based graphical interface for alternative splicing quantification and integrative analyses of alternative splicing and gene expression based on The Cancer Genome Atlas (TCGA), the Genotype-Tissue Expression project (GTEx), Sequence Read Archive (SRA) and user-provided data. The tool interactively performs survival, dimensionality reduction and median- and variance-based differential splicing and gene expression analyses that benefit from the incorporation of clinical and molecular sample-associated features (such as tumour stage or survival). Interactive visual access to genomic mapping and functional annotation of selected alternative splicing events is also included.

Maintained by Nuno Saraiva-Agostinho. Last updated 5 months ago.

sequencing rnaseq alternativesplicing differentialsplicing transcription gui principalcomponent survival biomedicalinformatics transcriptomics immunooncology visualization multiplecomparison geneexpression differentialexpression alternative-splicing bioconductor data-analyses differential-gene-expression differential-splicing-analysis gene-expression gtex recount2 rna-seq-data splicing-quantification sra tcga vast-tools cpp

1.3 match 36 stars 6.95 score 31 scripts

cran

varitas:Variant Calling in Targeted Analysis Sequencing Data

Multi-caller variant analysis pipeline for targeted analysis sequencing (TAS) data. Features a modular, automated workflow that can start with raw reads and produces a user-friendly PDF summary and a spreadsheet containing consensus variant information.

Maintained by Adam Mills. Last updated 4 years ago.

3.7 match 2.30 score

bioc

methodical:Discovering genomic regions where methylation is strongly associated with transcriptional activity

DNA methylation is generally considered to be associated with transcriptional silencing. However, comprehensive, genome-wide investigation of this relationship requires the evaluation of potentially millions of correlation values between the methylation of individual genomic loci and expression of associated transcripts in a relatively large numbers of samples. Methodical makes this process quick and easy while keeping a low memory footprint. It also provides a novel method for identifying regions where a number of methylation sites are consistently strongly associated with transcriptional expression. In addition, Methodical enables housing DNA methylation data from diverse sources (e.g. WGBS, RRBS and methylation arrays) with a common framework, lifting over DNA methylation data between different genome builds and creating base-resolution plots of the association between DNA methylation and transcriptional activity at transcriptional start sites.

Maintained by Richard Heery. Last updated 2 months ago.

dnamethylation methylationarray transcription genomewideassociation software openjdk

1.7 match 4.65 score 14 scripts

bioc

CircSeqAlignTk:A toolkit for end-to-end analysis of RNA-seq data for circular genomes

CircSeqAlignTk is designed for end-to-end RNA-Seq data analysis of circular genome sequences, from alignment to visualization. It mainly targets viroids which are composed of 246-401 nt circular RNAs. In addition, CircSeqAlignTk implements a tidy interface to generate synthetic sequencing data that mimic real RNA-Seq data, allowing developers to evaluate the performance of alignment tools and workflows.

Maintained by Jianqiang Sun. Last updated 5 months ago.

sequencing smallrna alignment software

1.8 match 4.40 score 3 scripts

bioc

SpliceWiz:interactive analysis and visualization of alternative splicing in R

The analysis and visualization of alternative splicing (AS) events from RNA sequencing data remains challenging. SpliceWiz is a user-friendly and performance-optimized R package for AS analysis, by processing alignment BAM files to quantify read counts across splice junctions, IRFinder-based intron retention quantitation, and supports novel splicing event identification. We introduce a novel visualization for AS using normalized coverage, thereby allowing visualization of differential AS across conditions. SpliceWiz features a shiny-based GUI facilitating interactive data exploration of results including gene ontology enrichment. It is performance optimized with multi-threaded processing of BAM files and a new COV file format for fast recall of sequencing coverage. Overall, SpliceWiz streamlines AS analysis, enabling reliable identification of functionally relevant AS events for further characterization.

Maintained by Alex Chit Hei Wong. Last updated 3 days ago.

software transcriptomics rnaseq alternativesplicing coverage differentialsplicing differentialexpression gui sequencing cpp openmp

1.2 match 16 stars 6.41 score 8 scripts

bioc

hiReadsProcessor:Functions to process LM-PCR reads from 454/Illumina data

hiReadsProcessor contains set of functions which allow users to process LM-PCR products sequenced using any platform. Given an excel/txt file containing parameters for demultiplexing and sample metadata, the functions automate trimming of adaptors and identification of the genomic product. Genomic products are further processed for QC and abundance quantification.

Maintained by Nirav V Malani. Last updated 5 months ago.

sequencing preprocessing

1.7 match 4.18 score 7 scripts

bioc

basecallQC:Working with Illumina Basecalling and Demultiplexing input and output files

The basecallQC package provides tools to work with Illumina bcl2Fastq (versions >= 2.1.7) software.Prior to basecalling and demultiplexing using the bcl2Fastq software, basecallQC functions allow the user to update Illumina sample sheets from versions <= 1.8.9 to >= 2.1.7 standards, clean sample sheets of common problems such as invalid sample names and IDs, create read and index basemasks and the bcl2Fastq command. Following the generation of basecalled and demultiplexed data, the basecallQC packages allows the user to generate HTML tables, plots and a self contained report of summary metrics from Illumina XML output files.

Maintained by Thomas Carroll. Last updated 5 months ago.

sequencing infrastructure dataimport qualitycontrol

1.6 match 4.32 score 21 scripts

bioc

R453Plus1Toolbox:A package for importing and analyzing data from Roche's Genome Sequencer System

The R453Plus1 Toolbox comprises useful functions for the analysis of data generated by Roche's 454 sequencing platform. It adds functions for quality assurance as well as for annotation and visualization of detected variants, complementing the software tools shipped by Roche with their product. Further, a pipeline for the detection of structural variants is provided.

Maintained by Hans-Ulrich Klein. Last updated 5 months ago.

sequencing infrastructure dataimport datarepresentation visualization qualitycontrol reportwriting

1.8 match 3.48 score 10 scripts

bioc

BUSpaRse:kallisto | bustools R utilities

The kallisto | bustools pipeline is a fast and modular set of tools to convert single cell RNA-seq reads in fastq files into gene count or transcript compatibility counts (TCC) matrices for downstream analysis. Central to this pipeline is the barcode, UMI, and set (BUS) file format. This package serves the following purposes: First, this package allows users to manipulate BUS format files as data frames in R and then convert them into gene count or TCC matrices. Furthermore, since R and Rcpp code is easier to handle than pure C++ code, users are encouraged to tweak the source code of this package to experiment with new uses of BUS format and different ways to convert the BUS file into gene count matrix. Second, this package can conveniently generate files required to generate gene count matrices for spliced and unspliced transcripts for RNA velocity. Here biotypes can be filtered and scaffolds and haplotypes can be removed, and the filtered transcriptome can be extracted and written to disk. Third, this package implements utility functions to get transcripts and associated genes required to convert BUS files to gene count matrices, to write the transcript to gene information in the format required by bustools, and to read output of bustools into R as sparses matrices.

Maintained by Lambda Moses. Last updated 5 months ago.

singlecell rnaseq workflowstep cpp

0.5 match 9 stars 7.35 score 165 scripts

bioc

metabinR:Abundance and Compositional Based Binning of Metagenomes

Provide functions for performing abundance and compositional based binning on metagenomic samples, directly from FASTA or FASTQ files. Functions are implemented in Java and called via rJava. Parallel implementation that operates directly on input FASTA/FASTQ files for fast execution.

Maintained by Anestis Gkanogiannis. Last updated 5 months ago.

classification clustering microbiome sequencing software openjdk

0.8 match 4.18 score 2 scripts

bioc

scruff:Single Cell RNA-Seq UMI Filtering Facilitator (scruff)

A pipeline which processes single cell RNA-seq (scRNA-seq) reads from CEL-seq and CEL-seq2 protocols. Demultiplex scRNA-seq FASTQ files, align reads to reference genome using Rsubread, and generate UMI filtered count matrix. Also provide visualizations of read alignments and pre- and post-alignment QC metrics.

Maintained by Zhe Wang. Last updated 5 months ago.

software technology sequencing alignment rnaseq singlecell workflowstep preprocessing qualitycontrol visualization immunooncology bioinformatics scrna-seq single-cell umi

0.5 match 8 stars 6.20 score 22 scripts

grafxzahl

genBaRcode:Analysis and Visualization Tools for Genetic Barcode Data

Provides the necessary functions to identify and extract a selection of already available barcode constructs (Cornils, K. et al. (2014) <doi:10.1093/nar/gku081>) and freely choosable barcode designs from next generation sequence (NGS) data. Furthermore, it offers the possibility to account for sequence errors, the calculation of barcode similarities and provides a variety of visualisation tools (Thielecke, L. et al. (2017) <doi:10.1038/srep43249>).

Maintained by Lars Thielecke. Last updated 6 days ago.

1.2 match 2.30 score 6 scripts