Showing 88 of 88 results
tidyverse
haven: Import and Export 'SPSS', 'Stata' and 'SAS' Files
Import foreign statistical formats into R via the embedded 'ReadStat' C library, <https://github.com/WizardMac/ReadStat>.
Maintained by Hadley Wickham. Last updated 6 months ago.
427 stars 18.63 score 18k scripts 682 dependents
bioc
rhdf5: R Interface to HDF5
This package provides an interface between HDF5 and R. HDF5's main features are the ability to store and access very large and/or complex datasets and a wide variety of metadata on mass storage (disk) through a completely portable file format. The rhdf5 package is thus suited for the exchange of large and/or complex datasets between R and other software packages, and for letting R applications work on datasets that are larger than the available RAM.
Maintained by Mike Smith. Last updated 4 days ago.
infrastructure dataimport hdf5 rhdf5 openssl curl zlib cpp
62 stars 15.87 score 4.2k scripts 232 dependents
ropensci
writexl: Export Data Frames to Excel 'xlsx' Format
Zero-dependency data frame to xlsx exporter based on 'libxlsxwriter' <https://libxlsxwriter.github.io>. Fast and no Java or Excel required.
Maintained by Jeroen Ooms. Last updated 6 days ago.
212 stars 15.56 score 14k scripts 212 dependents
bioc
Rsamtools: Binary alignment (BAM), FASTA, variant call (BCF), and tabix file import
This package provides an interface to the 'samtools', 'bcftools', and 'tabix' utilities for manipulating SAM (Sequence Alignment / Map), FASTA, binary variant call (BCF) and compressed indexed tab-delimited (tabix) files.
Maintained by Bioconductor Package Maintainer. Last updated 4 months ago.
dataimport sequencing coverage alignment qualitycontrol bioconductor-package core-package curl bzip2 xz-utils zlib cpp
28 stars 15.34 score 3.2k scripts 569 dependents
bioc
maftools: Summarize, Analyze and Visualize MAF Files
Analyze and visualize Mutation Annotation Format (MAF) files from large-scale sequencing studies. This package provides various functions to perform the most commonly used analyses in cancer genomics and to create feature-rich, customizable visualizations with minimal effort.
Maintained by Anand Mayakonda. Last updated 5 months ago.
datarepresentation dnaseq visualization drivermutation variantannotation featureextraction classification somaticmutation sequencing functionalgenomics survival bioinformatics cancer-genome-atlas cancer-genomics genomics maf-files tcga curl bzip2 xz-utils zlib
459 stars 14.63 score 948 scripts 18 dependents
knausb
vcfR: Manipulate and Visualize VCF Data
Facilitates easy manipulation of variant call format (VCF) data. Functions are provided to rapidly read from and write to VCF files. Once VCF data is read into R a parser function extracts matrices of data. This information can then be used for quality control or other purposes. Additional functions provide visualization of genomic data. Once processing is complete data may be written to a VCF file (*.vcf.gz). It also may be converted into other popular R objects (e.g., genlight, DNAbin). VcfR provides a link between VCF data and familiar R software.
Maintained by Brian J. Knaus. Last updated 1 month ago.
genomics population-genetics population-genomics rcpp vcf-data visualization zlib cpp
256 stars 13.66 score 3.1k scripts 19 dependents
bioc
mzR: parser for netCDF, mzXML and mzML and mzIdentML files (mass spectrometry data)
mzR provides a unified API to the common file formats and parsers available for mass spectrometry data. It comes with a subset of the proteowizard library for mzXML, mzML and mzIdentML. The netCDF reading code has previously been used in XCMS.
Maintained by Steffen Neumann. Last updated 2 months ago.
immunooncology infrastructure dataimport proteomics metabolomics massspectrometry zlib cpp
45 stars 12.77 score 204 scripts 44 dependents
bioc
rtracklayer: R interface to genome annotation files and the UCSC genome browser
Extensible framework for interacting with multiple genome browsers (currently UCSC built-in) and manipulating annotation tracks in various formats (currently GFF, BED, bedGraph, BED15, WIG, BigWig and 2bit built-in). The user may export/import tracks to/from the supported browsers, as well as query and modify the browser state, such as the current viewport.
Maintained by Michael Lawrence. Last updated 3 days ago.
annotation visualization dataimport zlib openssl curl
12.66 score 6.7k scripts 480 dependents
stuart-lab
Signac: Analysis of Single-Cell Chromatin Data
A framework for the analysis and exploration of single-cell chromatin data. The 'Signac' package contains functions for quantifying single-cell chromatin data, computing per-cell quality control metrics, dimension reduction and normalization, visualization, and DNA sequence motif analysis. Reference: Stuart et al. (2021) <doi:10.1038/s41592-021-01282-5>.
Maintained by Tim Stuart. Last updated 7 months ago.
atac bioinformatics single-cell zlib cpp
355 stars 12.18 score 3.7k scripts 1 dependent
bioc
ShortRead: FASTQ input and manipulation
This package implements sampling, iteration, and input of FASTQ files. The package includes functions for filtering and trimming reads, and for generating a quality assessment report. Data are represented as DNAStringSet-derived objects, and easily manipulated for a diversity of purposes. The package also contains legacy support for early single-end, ungapped alignment formats.
Maintained by Bioconductor Package Maintainer. Last updated 5 months ago.
dataimport sequencing qualitycontrol bioconductor-package core-package zlib cpp
8 stars 12.08 score 1.8k scripts 49 dependents
bioc
methylKit: DNA methylation analysis from high-throughput bisulfite sequencing results
methylKit is an R package for DNA methylation analysis and annotation from high-throughput bisulfite sequencing. The package is designed to deal with sequencing data from RRBS and its variants, but also target-capture methods and whole genome bisulfite sequencing. It also has functions to analyze base-pair resolution 5hmC data from experimental protocols such as oxBS-Seq and TAB-Seq. Methylation calling can be performed directly from Bismark aligned BAM files.
Maintained by Altuna Akalin. Last updated 28 days ago.
dnamethylation sequencing methylseq genome-biology methylation statistical-analysis visualization curl bzip2 xz-utils zlib cpp
220 stars 11.80 score 578 scripts 3 dependents
bioc
Rgraphviz: Provides plotting capabilities for R graph objects
Interfaces R with the AT&T Graphviz library for plotting R graph objects from the graph package.
Maintained by Kasper Daniel Hansen. Last updated 2 days ago.
graphandnetwork visualization zlib
11.51 score 1.2k scripts 107 dependents
privefl
bigsnpr: Analysis of Massive SNP Arrays
Easy-to-use, efficient, flexible and scalable tools for analyzing massive SNP arrays. Privé et al. (2018) <doi:10.1093/bioinformatics/bty185>.
Maintained by Florian Privé. Last updated 22 days ago.
big-data bioinformatics memory-mapped-file parallel-computing polygenic-scores population-structure-inference snp-data statistical-methods openblas zlib cpp openmp
200 stars 11.44 score 1.5k scripts 3 dependents
bioc
VariantAnnotation: Annotation of Genetic Variants
Annotate variants, compute amino acid coding changes, predict coding outcomes.
Maintained by Bioconductor Package Maintainer. Last updated 3 months ago.
dataimport sequencing snp annotation genetics variantannotation curl bzip2 xz-utils zlib
11.39 score 1.9k scripts 152 dependents
bioc
XVector: Foundation of external vector representation and manipulation in Bioconductor
Provides memory efficient S4 classes for storing sequences "externally" (e.g. behind an R external pointer, or on disk).
Maintained by Hervé Pagès. Last updated 3 months ago.
infrastructure datarepresentation bioconductor-package core-package zlib
2 stars 11.36 score 67 scripts 1.7k dependents
jeroen
mongolite: Fast and Simple 'MongoDB' Client for R
High-performance MongoDB client based on 'mongo-c-driver' and 'jsonlite'. Includes support for aggregation, indexing, map-reduce, streaming, encryption, enterprise authentication, and GridFS. The online user manual provides an overview of the available methods in the package: <https://jeroen.github.io/mongolite/>.
Maintained by Jeroen Ooms. Last updated 6 days ago.
285 stars 11.25 score 860 scripts 10 dependents
bioc
Rhtslib: HTSlib high-throughput sequencing library as an R package
This package provides version 1.18 of the 'HTSlib' C library for high-throughput sequence analysis. The package is primarily useful to developers of other R packages who wish to make use of HTSlib. Motivation and instructions for use of this package are in the vignette, vignette(package="Rhtslib", "Rhtslib").
Maintained by Hervé Pagès. Last updated 24 days ago.
dataimport sequencing bioconductor-package core-package curl bzip2 xz-utils zlib
11 stars 11.25 score 3 scripts 591 dependents
bioc
Rhdf5lib: hdf5 library as an R package
Provides C and C++ hdf5 libraries.
Maintained by Mike Smith. Last updated 5 days ago.
infrastructure bioconductor hdf5 hdf5-library fortran zlib
6 stars 11.22 score 26 scripts 341 dependents
ropensci
qpdf: Split, Combine and Compress PDF Files
Content-preserving transformations of PDF files such as split, combine, and compress. This package interfaces directly to the 'qpdf' C++ library <https://qpdf.sourceforge.io/> and does not require any command line utilities. Note that 'qpdf' does not read actual content from PDF files: to extract text and data you need the 'pdftools' package.
Maintained by Jeroen Ooms. Last updated 6 days ago.
57 stars 10.65 score 203 scripts 76 dependents
jonclayden
RNifti: Fast R and C++ Access to NIfTI Images
Provides very fast read and write access to images stored in the NIfTI-1, NIfTI-2 and ANALYZE-7.5 formats, with seamless synchronisation of in-memory image objects between compiled C and interpreted R code. Also provides a simple image viewer, and a C/C++ API that can be used by other packages. Not to be confused with 'RNiftyReg', which performs image registration and applies spatial transformations.
Maintained by Jon Clayden. Last updated 1 month ago.
medical-imaging nifti-format zlib cpp
49 stars 10.48 score 522 scripts 56 dependents
bioc
oligo: Preprocessing tools for oligonucleotide arrays
A package to analyze oligonucleotide arrays (expression/SNP/tiling/exon) at probe-level. It currently supports Affymetrix (CEL files) and NimbleGen arrays (XYS files).
Maintained by Benilton Carvalho. Last updated 20 days ago.
microarray onechannel twochannel preprocessing snp differentialexpression exonarray geneexpression dataimport zlib
3 stars 10.42 score 528 scripts 10 dependents
bioc
DropletUtils: Utilities for Handling Single-Cell Droplet Data
Provides a number of utility functions for handling single-cell (RNA-seq) data from droplet technologies such as 10X Genomics. This includes data loading from count matrices or molecule information files, identification of cells from empty droplets, removal of barcode-swapped pseudo-cells, and downsampling of the count matrix.
Maintained by Jonathan Griffiths. Last updated 4 months ago.
immunooncology singlecell sequencing rnaseq geneexpression transcriptomics dataimport coverage zlib cpp
10.01 score 2.7k scripts 9 dependents
coolbutuseless
yyjsonr: Fast 'JSON', 'NDJSON' and 'GeoJSON' Parser and Generator
A fast 'JSON' parser, generator and validator which converts 'JSON', 'NDJSON' (Newline Delimited 'JSON') and 'GeoJSON' (Geographic 'JSON') data to/from R objects. The standard R data types are supported (e.g. logical, numeric, integer) with configurable handling of NULL and NA values. Data frames, atomic vectors and lists are all supported as data containers translated to/from 'JSON'. 'GeoJSON' data is read in as 'simple features' objects. This implementation wraps the 'yyjson' 'C' library which is available from <https://github.com/ibireme/yyjson>.
Maintained by Mike Cheng. Last updated 5 months ago.
147 stars 9.56 score 22 scripts 9 dependents
bioc
snpStats: SnpMatrix and XSnpMatrix classes and methods
Classes and statistical methods for large SNP association studies. This extends the earlier snpMatrix package, allowing for uncertainty in genotypes.
Maintained by David Clayton. Last updated 5 months ago.
microarray snp geneticvariability zlib
9.48 score 674 scripts 20 dependents
bioc
Rsubread: Mapping, quantification and variant analysis of sequencing data
Alignment, quantification and analysis of RNA sequencing data (including both bulk RNA-seq and scRNA-seq) and DNA sequencing data (including ATAC-seq, ChIP-seq, WGS, WES etc). Includes functionality for read mapping, read counting, SNP calling, structural variant detection and gene fusion discovery. Can be applied to all major sequencing technologies and to both short and long sequence reads.
Maintained by Wei Shi. Last updated 8 days ago.
sequencing alignment sequencematching rnaseq chipseq singlecell geneexpression generegulation genetics immunooncology snp geneticvariability preprocessing qualitycontrol genomeannotation genefusiondetection indeldetection variantannotation variantdetection multiplesequencealignment zlib
9.24 score 892 scripts 10 dependents
bioc
affyio: Tools for parsing Affymetrix data files
Routines for parsing Affymetrix data files based upon file format information. Primary focus is on accessing the CEL and CDF file formats.
Maintained by Ben Bolstad. Last updated 2 months ago.
microarray dataimport infrastructure zlib
4 stars 9.07 score 40 scripts 110 dependents
bioc
scPipe: Pipeline for single cell multi-omic data pre-processing
A preprocessing pipeline for single cell RNA-seq/ATAC-seq data that starts from the fastq files and produces a feature count matrix with associated quality control information. It can process fastq data generated by CEL-seq, MARS-seq, Drop-seq, Chromium 10x and SMART-seq protocols.
Maintained by Shian Su. Last updated 3 months ago.
immunooncology software sequencing rnaseq geneexpression singlecell visualization sequencematching preprocessing qualitycontrol genomeannotation dataimport curl bzip2 xz-utils zlib cpp
68 stars 9.02 score 84 scripts
bioc
bamsignals: Extract read count signals from bam files
This package efficiently obtains count vectors from indexed bam files. It counts the number of reads in given genomic ranges and computes read profiles and coverage profiles. It also handles paired-end data.
Maintained by Johannes Helmuth. Last updated 5 months ago.
dataimport sequencing coverage alignment curl bzip2 xz-utils zlib cpp
15 stars 8.95 score 31 scripts 11 dependents
bioc
QuasR: Quantify and Annotate Short Reads in R
This package provides a framework for the quantification and analysis of Short Reads. It covers a complete workflow starting from raw sequence reads, over creation of alignments and quality control plots, to the quantification of genomic regions of interest. Read alignments are either generated through Rbowtie (data from DNA/ChIP/ATAC/Bis-seq experiments) or Rhisat2 (data from RNA-seq experiments that require spliced alignments), or can be provided in the form of bam files.
Maintained by Michael Stadler. Last updated 1 month ago.
genetics preprocessing sequencing chipseq rnaseq methylseq coverage alignment qualitycontrol immunooncology curl bzip2 xz-utils zlib cpp
6 stars 8.63 score 79 scripts 1 dependent
bioboot
bio3d: Biological Structure Analysis
Utilities to process, organize and explore protein structure, sequence and dynamics data. Features include the ability to read and write structure, sequence and dynamic trajectory data, perform sequence and structure database searches, data summaries, atom selection, alignment, superposition, rigid core identification, clustering, torsion analysis, distance matrix analysis, structure and sequence conservation analysis, normal mode analysis, principal component analysis of heterogeneous structure data, and correlation network analysis from normal mode and molecular dynamics data. In addition, various utility functions are provided to enable the statistical and graphical power of the R environment to work with biological sequence and structural data. Please refer to the URLs below for more information.
Maintained by Barry Grant. Last updated 5 months ago.
5 stars 8.49 score 1.4k scripts 10 dependents
bioc
alabaster.base: Save Bioconductor Objects to File
Save Bioconductor data structures into file artifacts, and load them back into memory. This is a more robust and portable alternative to serialization of such objects into RDS files. Each artifact is associated with metadata for further interpretation; downstream applications can enrich this metadata with context-specific properties.
Maintained by Aaron Lun. Last updated 24 days ago.
datarepresentation dataimport zlib cpp
3 stars 8.47 score 60 scripts 15 dependents
eddelbuettel
RcppCNPy: Read-Write Support for 'NumPy' Files via 'Rcpp'
The 'cnpy' library written by Carl Rogers provides read and write facilities for files created with (or for) the 'NumPy' extension for 'Python'. Vectors and matrices of numeric types can be read or written to and from files as well as compressed files. Support for integer files is available if the package has been built with C++11, which should be the default on all platforms since the release of R 3.3.0.
Maintained by Dirk Eddelbuettel. Last updated 14 days ago.
27 stars 8.38 score 448 scripts
hanchenphd
GMMAT: Generalized Linear Mixed Model Association Tests
Perform association tests using generalized linear mixed models (GLMMs) in genome-wide association studies (GWAS) and sequencing association studies. First, GMMAT fits a GLMM with covariate adjustment and random effects to account for population structure and familial or cryptic relatedness. For GWAS, GMMAT performs score tests for each genetic variant as proposed in Chen et al. (2016) <DOI:10.1016/j.ajhg.2016.02.012>. For candidate gene studies, GMMAT can also perform Wald tests to get the effect size estimate for each genetic variant. For rare variant analysis from sequencing association studies, GMMAT performs the variant Set Mixed Model Association Tests (SMMAT) as proposed in Chen et al. (2019) <DOI:10.1016/j.ajhg.2018.12.012>, including the burden test, the sequence kernel association test (SKAT), SKAT-O and an efficient hybrid test of the burden test and SKAT, based on user-defined variant sets.
Maintained by Han Chen. Last updated 1 year ago.
openblas zlib bzip2 libzstd libdeflate cpp
41 stars 8.37 score 96 scripts 2 dependents
bioc
csaw: ChIP-Seq Analysis with Windows
Detection of differentially bound regions in ChIP-seq data with sliding windows, with methods for normalization and proper FDR control.
Maintained by Aaron Lun. Last updated 2 months ago.
multiplecomparison chipseq normalization sequencing coverage genetics annotation differentialpeakcalling curl bzip2 xz-utils zlib cpp
8.32 score 498 scripts 7 dependents
zhanxw
seqminer: Efficiently Read Sequence Data (VCF Format, BCF Format, METAL Format and BGEN Format) into R
Integrate sequencing data (Variant call format, e.g. VCF or BCF) or meta-analysis results in R. This package can help you (1) read VCF/BCF/BGEN files by chromosomal ranges (e.g. 1:100-200); (2) read RareMETAL summary statistics files; (3) read tables from tabix-indexed files; (4) annotate VCF/BCF files; (5) create customized workflows based on Makefile.
Maintained by Xiaowei Zhan. Last updated 6 months ago.
annotation bcf bgen meta-analysis next-generation-sequencing plink sequencing tabix vcf workflow zlib bzip2 libzstd sqlite3 cpp
30 stars 8.29 score 111 scripts 6 dependents
aidenlab
strawr: Fast Implementation of Reading/Dump for .hic Files
API for fast data extraction for .hic files that provides programmatic access to the matrices. It doesn't store the pointer data for all the matrices, only the one queried, and currently we are only supporting matrices (not vectors).
Maintained by Neva Cherniavsky Durand. Last updated 12 months ago.
67 stars 8.23 score 49 scripts 11 dependents
plfjohnson
devEMF: EMF Graphics Output Device
Output graphics to EMF+/EMF.
Maintained by Philip Johnson. Last updated 2 days ago.
6 stars 8.09 score 214 scripts 11 dependents
nx10
unigd: Universal Graphics Device
A unified R graphics backend. Render R graphics fast and easy to many common file formats. Provides a thread safe 'C' interface for asynchronous rendering of R graphics.
Maintained by Florian Rupprecht. Last updated 13 days ago.
23 stars 8.07 score 6 scripts 2 dependents
bioc
FLAMES: FLAMES: Full Length Analysis of Mutations and Splicing in long read RNA-seq data
Semi-supervised isoform detection and annotation from both bulk and single-cell long read RNA-seq data. Flames provides automated pipelines for analysing isoforms, as well as intermediate functions for manual execution.
Maintained by Changqing Wang. Last updated 18 days ago.
rnaseq singlecell transcriptomics dataimport differentialsplicing alternativesplicing geneexpression longread zlib curl bzip2 xz-utils cpp
31 stars 7.95 score 12 scripts
bioc
flowWorkspace: Infrastructure for representing and interacting with gated and ungated cytometry data sets.
This package is designed to facilitate comparison of automated gating methods against manual gating done in flowJo. This package allows you to import basic flowJo workspaces into BioConductor and replicate the gating from flowJo using the flowCore functionality. Gating hierarchies, groups of samples, compensation, and transformation are performed so that the output matches the flowJo analysis.
Maintained by Greg Finak. Last updated 22 days ago.
immunooncology flowcytometry dataimport preprocessing datarepresentation zlib openblas cpp
7.89 score 576 scripts 10 dependents
hrbrmstr
ndjson: Wicked-Fast Streaming 'JSON' ('ndjson') Reader
Streaming 'JSON' ('ndjson') has one 'JSON' record per-line and many modern 'ndjson' files contain large numbers of records. These constructs may not be columnar in nature, but it is often useful to read in these files and "flatten" the structure out to enable working with the data in an R 'data.frame'-like context. Functions are provided that make it possible to read in plain 'ndjson' files or compressed ('gz') 'ndjson' files and either validate the format of the records or create "flat" 'data.table' structures from them.
Maintained by Bob Rudis. Last updated 2 years ago.
57 stars 7.88 score 125 scripts 7 dependents
bioc
CytoML: A GatingML Interface for Cross Platform Cytometry Data Sharing
Uses platform-specific implementations of the GatingML2.0 standard to exchange gated cytometry data with other software platforms.
Maintained by Mike Jiang. Last updated 22 days ago.
immunooncology flowcytometry dataimport datarepresentation zlib openblas libxml2 cpp
30 stars 7.60 score 132 scripts
bioc
ncdfFlow: ncdfFlow: A package that provides HDF5 based storage for flow cytometry data.
Provides HDF5 storage based methods and functions for manipulation of flow cytometry data.
Maintained by Mike Jiang. Last updated 2 months ago.
immunooncology flowcytometry zlib cpp
7.56 score 96 scripts 11 dependents
bnprks
BPCells: Single Cell Counts Matrices to PCA
Efficient operations for single cell ATAC-seq fragments and RNA counts matrices. Interoperable with standard file formats, and introduces efficient bit-packed formats that allow large storage savings and increased read speeds.
Maintained by Benjamin Parks. Last updated 2 months ago.
184 stars 7.48 score 172 scripts
bioc
DiffBind: Differential Binding Analysis of ChIP-Seq Peak Data
Compute differentially bound sites from multiple ChIP-seq experiments using affinity (quantitative) data. Also enables occupancy (overlap) analysis and plotting functions.
Maintained by Rory Stark. Last updated 2 months ago.
sequencing chipseq atacseq dnaseseq methylseq ripseq differentialpeakcalling differentialmethylation generegulation histonemodification peakdetection biomedicalinformatics cellbiology multiplecomparison normalization reportwriting epigenetics functionalgenomics curl bzip2 xz-utils zlib cpp
7.13 score 512 scripts 2 dependents
bioc
affyPLM: Methods for fitting probe-level models
A package that extends and improves the functionality of the base affy package. Routines that make heavy use of compiled code for speed. Central focus is on implementation of methods for fitting probe-level models and tools using these models. PLM based quality assessment tools.
Maintained by Ben Bolstad. Last updated 2 months ago.
microarray onechannel preprocessing qualitycontrol openblas zlib
6.99 score 206 scripts 4 dependents
bioc
h5mread: A fast HDF5 reader
The main function in the h5mread package is h5mread(), which allows reading arbitrary data from an HDF5 dataset into R, similarly to what the h5read() function from the rhdf5 package does. In the case of h5mread(), the implementation has been optimized to make it as fast and memory-efficient as possible.
Maintained by Hervé Pagès. Last updated 2 months ago.
infrastructure datarepresentation dataimport openssl curl zlib
1 star 6.98 score 4 scripts 127 dependents
bioc
NanoMethViz: Visualise methylation data from Oxford Nanopore sequencing
NanoMethViz is a toolkit for visualising methylation data from Oxford Nanopore sequencing. It can be used to explore methylation patterns from reads derived from Oxford Nanopore direct DNA sequencing with methylation called by callers including nanopolish, f5c and megalodon. The plots in this package allow the visualisation of methylation profiles aggregated over experimental groups and across classes of genomic features.
Maintained by Shian Su. Last updated 19 days ago.
software longread visualization differentialmethylation dnamethylation epigenetics dataimport zlib cpp
26 stars 6.95 score 11 scripts
zilong-li
vcfppR: Rapid Manipulation of the Variant Call Format (VCF)
The 'vcfpp.h' (<https://github.com/Zilong-Li/vcfpp>) provides an easy-to-use 'C++' 'API' of 'htslib', offering full functionality for manipulating Variant Call Format (VCF) files. The 'vcfppR' package serves as the R bindings of the 'vcfpp.h' library, enabling rapid processing of both compressed and uncompressed VCF files. Explore a range of powerful features for efficient VCF data manipulation.
Maintained by Zilong Li. Last updated 15 days ago.
bioinformatics fastrhtslib population-genetics population-genomics vcf vcf-data visulization libdeflate zlib bzip2 xz-utils curl cpp
13 stars 6.70 score 16 scripts
adokter
vol2birdR: Vertical Profiles of Biological Signals in Weather Radar Data
'R' implementation of the 'vol2bird' software for generating vertical profiles of birds and other biological signals in weather radar data. See Dokter et al. (2011) <doi:10.1098/rsif.2010.0116> for a paper describing the methodology.
Maintained by Adriaan M. Dokter. Last updated 8 hours ago.
6 stars 6.58 score 19 scripts
bioc
deepSNV: Detection of subclonal SNVs in deep sequencing data.
This package provides quantitative variant callers for detecting subclonal mutations in ultra-deep (>=100x coverage) sequencing experiments. The deepSNV algorithm is used for a comparative setup with a control experiment of the same loci and uses a beta-binomial model and a likelihood ratio test to discriminate sequencing errors and subclonal SNVs. The shearwater algorithm computes a Bayes classifier based on a beta-binomial model for variant calling with multiple samples for precisely estimating model parameters - such as local error rates and dispersion - and prior knowledge, e.g. from variation databases such as COSMIC.
Maintained by Moritz Gerstung. Last updated 5 months ago.
geneticvariability snp sequencing genetics dataimport curl bzip2 xz-utils zlib cpp
6.53 score 38 scripts 1 dependent
bioc
knowYourCG: Functional analysis of DNA methylome datasets
KnowYourCG (KYCG) is a supervised learning framework designed for the functional analysis of DNA methylation data. Unlike existing tools that focus on genes or genomic intervals, KnowYourCG directly targets CpG dinucleotides, featuring automated supervised screenings of diverse biological and technical influences, including sequence motifs, transcription factor binding, histone modifications, replication timing, cell-type-specific methylation, and trait-epigenome associations. KnowYourCG addresses the challenges of data sparsity in various methylation datasets, including low-pass Nanopore sequencing, single-cell DNA methylomes, 5-hydroxymethylation profiles, spatial DNA methylation maps, and array-based datasets for epigenome-wide association studies and epigenetic clocks.
Maintained by Goldberg David. Last updated 3 months ago.
epigenetics dnamethylation sequencing singlecell spatial methylationarray zlib
2 stars 6.10 score 4 scripts
cran
gaston: Genetic Data Handling (QC, GRM, LD, PCA) & Linear Mixed Models
Manipulation of genetic data (SNPs). Computation of GRM and dominance matrix, LD, heritability with efficient algorithms for linear mixed model (AIREML). Dandine et al <doi:10.1159/000488519>.
Maintained by Hervé Perdry. Last updated 1 year ago.
5 stars 6.02 score 12 dependents
bioc
raer: RNA editing tools in R
Toolkit for identification and statistical testing of RNA editing signals from within R. Provides support for identifying sites from bulk-RNA and single cell RNA-seq datasets, and general methods for extraction of allelic read counts from alignment files. Facilitates annotation and exploratory analysis of editing signals using Bioconductor packages and resources.
Maintained by Kent Riemondy. Last updated 5 months ago.
multiplecomparison rnaseq singlecell sequencing coverage epitranscriptomics featureextraction annotation alignment bioconductor-package rna-seq-analysis single-cell-analysis single-cell-rna-seq curl bzip2 xz-utils zlib
8 stars 5.98 score 6 scripts
gams-dev
gamstransfer: A Data Interface Between 'GAMS' and R
Read, analyze, modify, and write 'GAMS' (General Algebraic Modeling System) data. The main focus of 'gamstransfer' is the highly efficient transfer of data with 'GAMS' <https://www.gams.com/>, while keeping these operations as simple as possible for the user. The transfer of data usually takes place via an intermediate GDX (GAMS Data Exchange) file. Additionally, 'gamstransfer' provides utility functions to get an overview of 'GAMS' data and to check its validity.
Maintained by Atharv Bhosekar. Last updated 6 months ago.
2 stars 5.96 score 26 scripts 13 dependents
bioc
epialleleR: Fast, Epiallele-Aware Methylation Caller and Reporter
Epialleles are specific DNA methylation patterns that are mitotically and/or meiotically inherited. This package calls and reports cytosine methylation as well as frequencies of hypermethylated epialleles at the level of genomic regions or individual cytosines in next-generation sequencing data using binary alignment map (BAM) files as an input. Among other things, this package can also extract and visualise methylation patterns and assess allele specificity of methylation.
Maintained by Oleksii Nikolaienko. Last updated 24 days ago.
dnamethylation epigenetics methylseq longread bioconductor dna-methylation epiallele next-generation-sequencing samtools curl bzip2 xz-utils zlib cpp
4 stars 5.94 score 5 scripts
stewid
rmatio: Read and Write 'Matlab' Files
Read and write 'Matlab' MAT files from R. The 'rmatio' package supports reading MAT version 4, MAT version 5 and MAT compressed version 5. The 'rmatio' package can write version 5 MAT files and version 5 files with variable compression.
Maintained by Stefan Widgren. Last updated 2 years ago.
11 stars 5.90 score 115 scripts 2 dependents
evolecolgroup
tidypopgen: Tidy Population Genetics
We provide a tidy grammar of population genetics, facilitating the manipulation and analysis of data on biallelic single nucleotide polymorphisms (SNPs). `tidypopgen` scales to very large genetic datasets by storing genotypes on disk, and performing operations on them in chunks, without ever loading all data in memory.
Maintained by Andrea Manica. Last updated 7 days ago.
4 stars 5.84 score 8 scripts
bioc
makecdfenv: CDF Environment Maker
This package has two functions. One reads an Affymetrix chip description file (CDF) and creates a hash table environment containing the location/probe set membership mapping. The other creates a package that automatically loads that environment.
Maintained by James W. MacDonald. Last updated 3 months ago.
onechanneldataimportpreprocessingzlib
5.81 score 36 scripts 2 dependentsbioc
diffHic:Differential Analysis of Hi-C Data
Detects differential interactions across biological conditions in a Hi-C experiment. Methods are provided for read alignment and data pre-processing into interaction counts. Statistical analysis is based on edgeR and supports normalization and filtering. Several visualization options are also available.
Maintained by Aaron Lun. Last updated 3 months ago.
multiplecomparisonpreprocessingsequencingcoveragealignmentnormalizationclusteringhiccurlbzip2xz-utilszlibcpp
5.58 score 38 scriptsbioc
seqTools:Analysis of nucleotide, sequence and quality content on fastq files
Analyze read length, Phred scores, alphabet frequency and DNA k-mers in uncompressed and compressed FASTQ files.
Maintained by Wolfgang Kaisers. Last updated 5 months ago.
5.57 score 52 scripts 1 dependentspachadotdev
cpp11qpdf:Split, Combine and Compress PDF Files
Bindings to 'qpdf': 'qpdf' (<https://qpdf.sourceforge.io/>) is an open-source PDF rendering library that performs content-preserving transformations of PDF files, such as splitting, combining, and compressing them.
Maintained by Mauricio Vargas Sepulveda. Last updated 3 months ago.
3 stars 5.56 score 4 scriptsreedacartwright
rbedrock:Analysis and Manipulation of Data from Minecraft Bedrock Edition
Implements an interface to Minecraft (Bedrock Edition) worlds. Supports the analysis and management of these worlds and game saves.
Maintained by Reed Cartwright. Last updated 3 days ago.
43 stars 5.29 score 3 scriptslucasnell
jackalope:A Swift, Versatile Phylogenomic and High-Throughput Sequencing Simulator
Simply and efficiently simulates (i) variants from reference genomes and (ii) reads from both Illumina <https://www.illumina.com/> and Pacific Biosciences (PacBio) <https://www.pacb.com/> platforms. It can either read reference genomes from FASTA files or simulate new ones. Genomic variants can be simulated using summary statistics, phylogenies, Variant Call Format (VCF) files, and coalescent simulations—the latter of which can include selection, recombination, and demographic fluctuations. 'jackalope' can simulate single, paired-end, or mate-pair Illumina reads, as well as PacBio reads. These simulations include sequencing errors, mapping qualities, multiplexing, and optical/polymerase chain reaction (PCR) duplicates. Simulating Illumina sequencing is based on ART by Huang et al. (2012) <doi:10.1093/bioinformatics/btr708>. PacBio sequencing simulation is based on SimLoRD by Stöcker et al. (2016) <doi:10.1093/bioinformatics/btw286>. All outputs can be written to standard file formats.
Maintained by Lucas A. Nell. Last updated 1 year ago.
zlibopenblascurlbzip2xz-utilscpp
8 stars 5.28 score 24 scriptstimothy-barry
ondisc:Algorithms and data structures for large single-cell expression matrices
Single-cell datasets are growing in size, posing challenges as well as opportunities for genomics researchers. `ondisc` is an R package that facilitates analysis of large-scale single-cell data out-of-core on a laptop or distributed across tens to hundreds of processors on a cluster or cloud. In both of these settings, `ondisc` requires only a few gigabytes of memory, even if the input data are tens of gigabytes in size. `ondisc` is mainly oriented toward single-cell CRISPR screen analysis, but it can also be used for single-cell differential expression and single-cell co-expression analyses. `ondisc` is powered by several new, efficient algorithms for manipulating and querying large, sparse expression matrices.
Maintained by Timothy Barry. Last updated 12 months ago.
dataimportsinglecelldifferentialexpressioncrisprzlibcpp
11 stars 5.13 score 62 scriptsbioc
podkat:Position-Dependent Kernel Association Test
This package provides an association test that is capable of dealing with very rare and even private variants. This is accomplished by a kernel-based approach that takes the positions of the variants into account. The test can be used for pre-processed matrix data, but also directly for variant data stored in VCF files. Association testing can be performed whole-genome, whole-exome, or restricted to pre-defined regions of interest. The test is complemented by tools for analyzing and visualizing the results.
Maintained by Ulrich Bodenhofer. Last updated 5 months ago.
geneticswholegenomeannotationvariantannotationsequencingdataimportcurlbzip2xz-utilszlibcpp
5.02 score 6 scriptsdeploid-dev
DEploid:Deconvolute Mixed Genomes with Unknown Proportions
Traditional phasing programs are limited to diploid organisms. Our method modifies the Li and Stephens algorithm with Markov chain Monte Carlo (MCMC) approaches, and builds a generic framework that allows haplotype searches in a multiple infection setting. This package is primarily developed as part of the Pf3k project, which is a global collaboration using the latest sequencing technologies to provide a high-resolution view of natural variation in the malaria parasite Plasmodium falciparum. Parasite DNA is extracted from patient blood samples, which often contain more than one parasite strain, with unknown proportions. This package is used for deconvoluting mixed haplotypes, and reporting the mixture proportions from each sample.
Maintained by Joe Zhu. Last updated 2 months ago.
deconvoluting-mixed-genomeshmmmalariamcmcparasitesphasingunknown-proportionszlibcpp
1 star 4.99 score 39 scriptslcrawlab
smer:Sparse Marginal Epistasis Test
The Sparse Marginal Epistasis Test is a computationally efficient genetics method which detects statistical epistasis in complex traits; see Stamp et al. (2025, <doi:10.1101/2025.01.11.632557>) for details.
Maintained by Julian Stamp. Last updated 2 months ago.
genomewideassociationepistasisgeneticssnplinearmixedmodelcppepistasis-analysisepistatisgwasgwas-toolsmapitzlibcppopenmp
1 star 4.95 score 8 scriptsbioc
epigraHMM:Epigenomic R-based analysis with hidden Markov models
epigraHMM provides a set of tools for the analysis of epigenomic data based on hidden Markov Models. It contains two separate peak callers, one for consensus peaks from biological or technical replicates, and one for differential peaks from multi-replicate multi-condition experiments. In differential peak calling, epigraHMM provides window-specific posterior probabilities associated with every possible combinatorial pattern of read enrichment across conditions.
Maintained by Pedro Baldoni. Last updated 5 months ago.
chipseqatacseqdnaseseqhiddenmarkovmodelepigeneticszlibopenblascppopenmp
4.94 score 88 scriptsbioc
beachmat.hdf5:beachmat bindings for HDF5-backed matrices
Extends beachmat to support initialization of tatami matrices from HDF5-backed arrays. This allows C++ code in downstream packages to directly call the HDF5 C/C++ library to access array data, without the need for block processing via DelayedArray. Some utilities are also provided for direct creation of an in-memory tatami matrix from an HDF5 file.
Maintained by Aaron Lun. Last updated 5 months ago.
datarepresentationdataimportinfrastructurezlibcpp
4.88 score 6 scriptsbioc
transmogR:Modify a set of reference sequences using a set of variants
transmogR provides the tools needed to create a new reference genome or reference transcriptome, using a set of variants. Variants can be any combination of SNPs, insertions and deletions. The intended use case is to enable creation of variant-modified reference transcriptomes for incorporation into transcriptomic pseudo-alignment workflows, such as salmon.
Maintained by Stevie Pederson. Last updated 13 days ago.
alignmentgenomicvariationsequencingtranscriptomevariantvariantannotationzlib
4.74 score 2 scriptsbioc
screenCounter:Counting Reads in High-Throughput Sequencing Screens
Provides functions for counting reads from high-throughput sequencing screen data (e.g., CRISPR, shRNA) to quantify barcode abundance. Currently supports single barcodes in single- or paired-end data, and combinatorial barcodes in paired-end data.
Maintained by Aaron Lun. Last updated 4 months ago.
crispralignmentfunctionalgenomicsfunctionalpredictionzlibcpp
3 stars 4.65 score 10 scriptsbioc
gmapR:An R interface to the GMAP/GSNAP/GSTRUCT suite
GSNAP and GMAP are a pair of tools to align short-read data written by Tom Wu. This package provides convenience methods to work with GMAP and GSNAP from within R. In addition, it provides methods to tally alignment results on a per-nucleotide basis using the bam_tally tool.
Maintained by Michael Lawrence. Last updated 10 days ago.
4.65 score 45 scriptsbioc
mitoClone2:Clonal Population Identification in Single-Cell RNA-Seq Data using Mitochondrial and Somatic Mutations
This package primarily identifies variants in mitochondrial genomes from BAM alignment files. It filters these variants to remove RNA editing events, then estimates their evolutionary relationship (i.e. their phylogenetic tree) and groups single cells into clones. It also visualizes the mutations and provides additional genomic context.
Maintained by Benjamin Story. Last updated 5 months ago.
annotationdataimportgeneticssnpsoftwaresinglecellalignmentcurlbzip2xz-utilszlibcpp
1 star 4.48 score 9 scriptsbioc
chihaya:Save Delayed Operations to a HDF5 File
Saves the delayed operations of a DelayedArray to an HDF5 file. This enables efficient recovery of the DelayedArray's contents in other languages and analysis frameworks.
Maintained by Aaron Lun. Last updated 5 months ago.
dataimportdatarepresentationzlibcpp
4.38 score 16 scriptsbioc
qckitfastq:FASTQ Quality Control
Assessment of FASTQ files with multiple metrics, including quality score, sequence content, overrepresented sequences and k-mers.
Maintained by August Guang. Last updated 5 months ago.
softwarequalitycontrolsequencingzlibcpp
4.38 score 24 scriptsbioc
HiCDCPlus:Hi-C Direct Caller Plus
Systematic 3D interaction calls and differential analysis for Hi-C and HiChIP. The HiC-DC+ (Hi-C/HiChIP direct caller plus) package enables principled statistical analysis of Hi-C and HiChIP data sets – including calling significant interactions within a single experiment and performing differential analysis between conditions given replicate experiments – to facilitate global integrative studies. HiC-DC+ estimates significant interactions in a Hi-C or HiChIP experiment directly from the raw contact matrix for each chromosome up to a specified genomic distance, binned by uniform genomic intervals or restriction enzyme fragments, by training a background model to account for random polymer ligation and systematic sources of read count variation.
Maintained by Merve Sahin. Last updated 5 months ago.
hicdna3dstructuresoftwarenormalizationzlibcpp
4.20 score 16 scriptsr-spark
sparkwarc:Load WARC Files into Apache Spark
Load WARC (Web ARChive) files into Apache Spark using 'sparklyr'. This allows reading files from the Common Crawl project <http://commoncrawl.org/>.
Maintained by Edgar Ruiz. Last updated 3 years ago.
13 stars 3.89 score 12 scriptsbioc
Rfastp:An Ultra-Fast and All-in-One Fastq Preprocessor (Quality Control, Adapter, low quality and polyX trimming, and UMI Sequence Parsing)
Rfastp is an R wrapper of fastp, which is developed in C++. fastp performs quality control for FASTQ files, including low-quality base trimming, polyX trimming, adapter auto-detection and trimming, paired-end read merging, and UMI sequence/ID handling. Rfastp can concatenate multiple files into one file (like the shell command cat) and accept multiple files as input.
Maintained by Thomas Carroll. Last updated 5 months ago.
qualitycontrolsequencingpreprocessingsoftwarezlibcpp
3.82 score 33 scriptsbioc
Rmmquant:RNA-Seq multi-mapping Reads Quantification Tool
RNA-Seq is currently used routinely, and it provides accurate information on gene transcription. However, the method cannot accurately estimate the expression of duplicated genes. Several strategies have been previously used, but all of them provide biased results. With Rmmquant, if a read maps at different positions, the tool detects that the corresponding genes are duplicated; it merges the genes and creates a merged gene. The count of ambiguous reads is then based on the input genes and the merged genes. Rmmquant is a drop-in replacement for the widely used tools findOverlaps and featureCounts that handles multi-mapping reads in an unbiased way.
Maintained by Zytnicki Matthias. Last updated 5 months ago.
geneexpressiontranscriptionzlibcpp
3.30 score 5 scriptschrchang
pgenlibr:PLINK 2 Binary (.pgen) Reader
A thin wrapper over PLINK 2's core libraries which provides an R interface for reading .pgen files. A minimal .pvar loader is also included. Chang et al. (2015) <doi:10.1186/s13742-015-0047-8>.
Maintained by Christopher Chang. Last updated 2 months ago.
2.98 score 64 scriptscran
hipread:Read Hierarchical Fixed Width Files
Read hierarchical fixed width files like those commonly used by many census data providers. Also allows for reading of data in chunks, and reading 'gzipped' files without storing the full file in memory.
Maintained by Derek Burk. Last updated 1 year ago.
2.87 score 3 dependentsbioc
TransView:Read density map construction and accession. Visualization of ChIPSeq and RNASeq data sets
This package provides efficient tools to generate, access and display read densities of sequencing based data sets such as from RNA-Seq and ChIP-Seq.
Maintained by Julius Muller. Last updated 2 months ago.
immunooncologydnamethylationgeneexpressiontranscriptionmicroarraysequencingchipseqrnaseqmethylseqdataimportvisualizationclusteringmultiplecomparisoncurlbzip2xz-utilszlib
2.60 scoreshajoezhu
DEploid.utils:'DEploid' Data Analysis and Results Interpretation
'DEploid' (Zhu et al. 2018 <doi:10.1093/bioinformatics/btx530>) is designed for deconvoluting mixed genomes with unknown proportions. Traditional phasing programs are limited to diploid organisms. Our method modifies the Li and Stephens algorithm with Markov chain Monte Carlo (MCMC) approaches, and builds a generic framework that allows haplotype searches in a multiple infection setting. This package provides R functions to support data analysis and results interpretation.
Maintained by Joe Zhu. Last updated 3 months ago.
2.18 score 1 dependentscran
milorGWAS:Mixed Logistic Regression for Genome-Wide Analysis Studies (GWAS)
Fast approximate methods for mixed logistic regression in genome-wide analysis studies (GWAS). Two computationally efficient methods are proposed for obtaining effect size estimates (beta) in Mixed Logistic Regression in GWAS: the Approximate Maximum Likelihood Estimate (AMLE), and the Offset method. The Wald test obtained with AMLE is identical to the score test. Data can be genotype matrices in plink format, or dosages (VCF files). The methods are described in detail in Milet et al. (2020) <doi:10.1101/2020.01.17.910109>.
Maintained by Hervé Perdry. Last updated 9 months ago.
2.00 scorecipollinifabrizio-01
TAQMNGR:Manage Tick-by-Tick Transaction Data
Manager of tick-by-tick transaction data that performs 'cleaning', 'aggregation' and 'import' in an efficient and fast way. The package engine, written in C++, exploits the 'zlib' and 'gzstream' libraries to handle gzipped data without needing to decompress them. 'Cleaning' and 'aggregation' are performed according to Brownlees and Gallo (2006) <DOI:10.1016/j.csda.2006.09.030>. Currently, TAQMNGR processes raw data from WRDS (Wharton Research Data Service, <https://wrds-web.wharton.upenn.edu/wrds/>).
Maintained by Fabrizio Cipollini. Last updated 7 years ago.
1.00 score 2 scripts