Showing 200 of total 756 results (show query)
rcppcore
Rcpp:Seamless R and C++ Integration
The 'Rcpp' package provides R functions as well as C++ classes which offer a seamless integration of R and C++. Many R data types and objects can be mapped back and forth to C++ equivalents which facilitates both writing of new code as well as easier integration of third-party libraries. Documentation about 'Rcpp' is provided by several vignettes included in this package, via the 'Rcpp Gallery' site at <https://gallery.rcpp.org>, the paper by Eddelbuettel and Francois (2011, <doi:10.18637/jss.v040.i08>), the book by Eddelbuettel (2013, <doi:10.1007/978-1-4614-6868-4>) and the paper by Eddelbuettel and Balamuta (2018, <doi:10.1080/00031305.2017.1375990>); see 'citation("Rcpp")' for details.
Maintained by Dirk Eddelbuettel. Last updated 2 days ago.
c-plus-plusc-plus-plus-11c-plus-plus-14c-plus-plus-17c-plus-plus-20rcppcpp
755 stars 22.62 score 11k scripts 13k dependentstidyverse
lubridate:Make Dealing with Dates a Little Easier
Functions to work with date-times and time-spans: fast and user friendly parsing of date-time data, extraction and updating of components of a date-time (years, months, days, hours, minutes, and seconds), algebraic manipulation on date-time and time-span objects. The 'lubridate' package has a consistent and memorable syntax that makes working with dates easy and fun.
Maintained by Vitalie Spinu. Last updated 4 months ago.
757 stars 20.95 score 135k scripts 1.9k dependentsr-dbi
DBI:R Database Interface
A database interface definition for communication between R and relational database management systems. All classes in this package are virtual and need to be extended by the various R/DBMS implementations.
Maintained by Kirill Müller. Last updated 11 days ago.
302 stars 20.87 score 19k scripts 2.9k dependentslme4
lme4:Linear Mixed-Effects Models using 'Eigen' and S4
Fit linear and generalized linear mixed-effects models. The models and their components are represented using S4 classes and methods. The core computational algorithms are implemented using the 'Eigen' C++ library for numerical linear algebra and 'RcppEigen' "glue".
Maintained by Ben Bolker. Last updated 4 days ago.
647 stars 20.68 score 35k scripts 1.5k dependentsstan-dev
rstan:R Interface to Stan
User-facing R functions are provided to parse, compile, test, estimate, and analyze Stan models by accessing the header-only Stan library provided by the 'StanHeaders' package. The Stan project develops a probabilistic programming language that implements full Bayesian statistical inference via Markov Chain Monte Carlo, rough Bayesian inference via 'variational' approximation, and (optionally penalized) maximum likelihood estimation via optimization. In all three cases, automatic differentiation is used to quickly and accurately evaluate gradients without burdening the user with the need to derive the partial derivatives.
Maintained by Ben Goodrich. Last updated 4 days ago.
bayesian-data-analysisbayesian-inferencebayesian-statisticsmcmcstancpp
1.1k stars 18.84 score 14k scripts 281 dependentsr-dbi
RSQLite:SQLite Interface for R
Embeds the SQLite database engine in R and provides an interface compliant with the DBI package. The source for the SQLite engine and for various extensions in a recent version is included. System libraries will never be consulted because this package relies on static linking for the plugins it includes; this also ensures a consistent experience across all installations.
Maintained by Kirill Müller. Last updated 1 months ago.
331 stars 18.76 score 8.1k scripts 1.1k dependentsedzer
sp:Classes and Methods for Spatial Data
Classes and methods for spatial data; the classes document where the spatial location information resides, for 2D or 3D data. Utility functions are provided, e.g. for plotting data as maps, spatial selection, as well as methods for retrieving coordinates, for subsetting, print, summary, etc. From this version, 'rgdal', 'maptools', and 'rgeos' are no longer used at all, see <https://r-spatial.org/r/2023/05/15/evolution4.html> for details.
Maintained by Edzer Pebesma. Last updated 2 months ago.
127 stars 18.63 score 35k scripts 1.3k dependentsbioc
Biostrings:Efficient manipulation of biological strings
Memory efficient string containers, string matching algorithms, and other utilities, for fast manipulation of large biological sequences or sets of sequences.
Maintained by Hervé Pagès. Last updated 1 months ago.
sequencematchingalignmentsequencinggeneticsdataimportdatarepresentationinfrastructurebioconductor-packagecore-package
62 stars 17.77 score 8.6k scripts 1.2k dependentsbioc
GenomicRanges:Representation and manipulation of genomic intervals
The ability to efficiently represent and manipulate genomic annotations and alignments is playing a central role when it comes to analyzing high-throughput sequencing data (a.k.a. NGS data). The GenomicRanges package defines general purpose containers for storing and manipulating genomic intervals and variables defined along a genome. More specialized containers for representing and manipulating short alignments against a reference genome, or a matrix-like summarization of an experiment, are defined in the GenomicAlignments and SummarizedExperiment packages, respectively. Both packages build on top of the GenomicRanges infrastructure.
Maintained by Hervé Pagès. Last updated 4 months ago.
geneticsinfrastructuredatarepresentationsequencingannotationgenomeannotationcoveragebioconductor-packagecore-package
44 stars 17.68 score 13k scripts 1.3k dependentsbioc
BiocParallel:Bioconductor facilities for parallel evaluation
This package provides modified versions and novel implementation of functions for parallel evaluation, tailored to use with Bioconductor objects.
Maintained by Martin Morgan. Last updated 1 months ago.
infrastructurebioconductor-packagecore-packageu24ca289073cpp
67 stars 17.31 score 7.3k scripts 1.1k dependentsdaattali
shinyjs:Easily Improve the User Experience of Your Shiny Apps in Seconds
Perform common useful JavaScript operations in Shiny apps that will greatly improve your apps without having to know any JavaScript. Examples include: hiding an element, disabling an input, resetting an input back to its original value, delaying code execution by a few seconds, and many more useful functions for both the end user and the developer. 'shinyjs' can also be used to easily call your own custom JavaScript functions from R.
Maintained by Dean Attali. Last updated 7 months ago.
740 stars 17.28 score 8.9k scripts 400 dependentsr-forge
Matrix:Sparse and Dense Matrix Classes and Methods
A rich hierarchy of sparse and dense matrix classes, including general, symmetric, triangular, and diagonal matrices with numeric, logical, or pattern entries. Efficient methods for operating on such matrices, often wrapping the 'BLAS', 'LAPACK', and 'SuiteSparse' libraries.
Maintained by Martin Maechler. Last updated 19 days ago.
1 stars 17.23 score 33k scripts 12k dependentsbioc
SummarizedExperiment:A container (S4 class) for matrix-like assays
The SummarizedExperiment container contains one or more assays, each represented by a matrix-like object of numeric or other mode. The rows typically represent genomic ranges of interest and the columns represent samples.
Maintained by Hervé Pagès. Last updated 5 months ago.
geneticsinfrastructuresequencingannotationcoveragegenomeannotationbioconductor-packagecore-package
34 stars 16.84 score 8.6k scripts 1.2k dependentsbioc
Biobase:Biobase: Base functions for Bioconductor
Functions that are needed by many other packages or which replace R functions.
Maintained by Bioconductor Package Maintainer. Last updated 5 months ago.
infrastructurebioconductor-packagecore-package
9 stars 16.45 score 6.6k scripts 1.8k dependentsbioc
GenomeInfoDb:Utilities for manipulating chromosome names, including modifying them to follow a particular naming style
Contains data and functions that define and allow translation between different chromosome sequence naming conventions (e.g., "chr1" versus "1"), including a function that attempts to place sequence names in their natural, rather than lexicographic, order.
Maintained by Hervé Pagès. Last updated 2 months ago.
geneticsdatarepresentationannotationgenomeannotationbioconductor-packagecore-package
32 stars 16.32 score 1.3k scripts 1.7k dependentsr-dbi
odbc:Connect to ODBC Compatible Databases (using the DBI Interface)
A DBI-compatible interface to ODBC databases.
Maintained by Hadley Wickham. Last updated 1 days ago.
396 stars 16.31 score 2.9k scripts 23 dependentsjoshuaulrich
quantmod:Quantitative Financial Modelling Framework
Specify, build, trade, and analyse quantitative financial trading strategies.
Maintained by Joshua M. Ulrich. Last updated 26 days ago.
algorithmic-tradingchartingdata-importfinancetime-series
839 stars 16.17 score 8.1k scripts 343 dependentsbioc
DESeq2:Differential gene expression analysis based on the negative binomial distribution
Estimate variance-mean dependence in count data from high-throughput sequencing assays and test for differential expression based on a model using the negative binomial distribution.
Maintained by Michael Love. Last updated 23 days ago.
sequencingrnaseqchipseqgeneexpressiontranscriptionnormalizationdifferentialexpressionbayesianregressionprincipalcomponentclusteringimmunooncologyopenblascpp
375 stars 16.11 score 17k scripts 115 dependentsbioc
IRanges:Foundation of integer range manipulation in Bioconductor
Provides efficient low-level and highly reusable S4 classes for storing, manipulating and aggregating over annotated ranges of integers. Implements an algebra of range operations, including efficient algorithms for finding overlaps and nearest neighbors. Defines efficient list-like classes for storing, transforming and aggregating large grouped data, i.e., collections of atomic vectors and DataFrames.
Maintained by Hervé Pagès. Last updated 2 months ago.
infrastructuredatarepresentationbioconductor-packagecore-package
22 stars 16.09 score 2.1k scripts 1.8k dependentsbioc
S4Vectors:Foundation of vector-like and list-like containers in Bioconductor
The S4Vectors package defines the Vector and List virtual classes and a set of generic functions that extend the semantic of ordinary vectors and lists in R. Package developers can easily implement vector-like or list-like objects as concrete subclasses of Vector or List. In addition, a few low-level concrete subclasses of general interest (e.g. DataFrame, Rle, Factor, and Hits) are implemented in the S4Vectors package itself (many more are implemented in the IRanges package and in other Bioconductor infrastructure packages).
Maintained by Hervé Pagès. Last updated 2 months ago.
infrastructuredatarepresentationbioconductor-packagecore-package
18 stars 16.05 score 1.0k scripts 1.9k dependentsbioc
biomaRt:Interface to BioMart databases (i.e. Ensembl)
In recent years a wealth of biological data has become available in public data repositories. Easy access to these valuable data resources and firm integration with data analysis is needed for comprehensive bioinformatics data analysis. biomaRt provides an interface to a growing collection of databases implementing the BioMart software suite (<http://www.biomart.org>). The package enables retrieval of large amounts of data in a uniform way without the need to know the underlying database schemas or write complex SQL queries. The most prominent examples of BioMart databases are maintain by Ensembl, which provides biomaRt users direct access to a diverse set of data and enables a wide range of powerful online queries from gene annotation to database mining.
Maintained by Mike Smith. Last updated 14 days ago.
annotationbioconductorbiomartensembl
38 stars 15.99 score 13k scripts 230 dependentsbioc
rhdf5:R Interface to HDF5
This package provides an interface between HDF5 and R. HDF5's main features are the ability to store and access very large and/or complex datasets and a wide variety of metadata on mass storage (disk) through a completely portable file format. The rhdf5 package is thus suited for the exchange of large and/or complex datasets between R and other software package, and for letting R applications work on datasets that are larger than the available RAM.
Maintained by Mike Smith. Last updated 4 days ago.
infrastructuredataimporthdf5rhdf5opensslcurlzlibcpp
62 stars 15.87 score 4.2k scripts 232 dependentsbioc
DelayedArray:A unified framework for working transparently with on-disk and in-memory array-like datasets
Wrapping an array-like object (typically an on-disk object) in a DelayedArray object allows one to perform common array operations on it without loading the object in memory. In order to reduce memory usage and optimize performance, operations on the object are either delayed or executed using a block processing mechanism. Note that this also works on in-memory array-like objects like DataFrame objects (typically with Rle columns), Matrix objects, ordinary arrays and, data frames.
Maintained by Hervé Pagès. Last updated 1 months ago.
infrastructuredatarepresentationannotationgenomeannotationbioconductor-packagecore-packageu24ca289073
27 stars 15.59 score 538 scripts 1.2k dependentsbioc
Rsamtools:Binary alignment (BAM), FASTA, variant call (BCF), and tabix file import
This package provides an interface to the 'samtools', 'bcftools', and 'tabix' utilities for manipulating SAM (Sequence Alignment / Map), FASTA, binary variant call (BCF) and compressed indexed tab-delimited (tabix) files.
Maintained by Bioconductor Package Maintainer. Last updated 4 months ago.
dataimportsequencingcoveragealignmentqualitycontrolbioconductor-packagecore-packagecurlbzip2xz-utilszlibcpp
28 stars 15.34 score 3.2k scripts 569 dependentsbioc
GenomicFeatures:Query the gene models of a given organism/assembly
Extract the genomic locations of genes, transcripts, exons, introns, and CDS, for the gene models stored in a TxDb object. A TxDb object is a small database that contains the gene models of a given organism/assembly. Bioconductor provides a small collection of TxDb objects in the form of ready-to-install TxDb packages for the most commonly studied organisms. Additionally, the user can easily make a TxDb object (or package) for the organism/assembly of their choice by using the tools from the txdbmaker package.
Maintained by H. Pagès. Last updated 5 months ago.
geneticsinfrastructureannotationsequencinggenomeannotationbioconductor-packagecore-package
26 stars 15.34 score 5.3k scripts 339 dependentsbioc
GenomicAlignments:Representation and manipulation of short genomic alignments
Provides efficient containers for storing and manipulating short genomic alignments (typically obtained by aligning short reads to a reference genome). This includes read counting, computing the coverage, junction detection, and working with the nucleotide content of the alignments.
Maintained by Hervé Pagès. Last updated 5 months ago.
infrastructuredataimportgeneticssequencingrnaseqsnpcoveragealignmentimmunooncologybioconductor-packagecore-package
10 stars 15.21 score 3.1k scripts 528 dependentsbioc
AnnotationDbi:Manipulation of SQLite-based annotations in Bioconductor
Implements a user-friendly interface for querying SQLite-based annotation data packages.
Maintained by Bioconductor Package Maintainer. Last updated 5 months ago.
annotationmicroarraysequencinggenomeannotationbioconductor-packagecore-package
9 stars 15.05 score 3.6k scripts 769 dependentsbioc
DOSE:Disease Ontology Semantic and Enrichment analysis
This package implements five methods proposed by Resnik, Schlicker, Jiang, Lin and Wang respectively for measuring semantic similarities among DO terms and gene products. Enrichment analyses including hypergeometric model and gene set enrichment analysis are also implemented for discovering disease associations of high-throughput biological data.
Maintained by Guangchuang Yu. Last updated 5 months ago.
annotationvisualizationmultiplecomparisongenesetenrichmentpathwayssoftwaredisease-ontologyenrichment-analysissemantic-similarity
119 stars 14.97 score 2.0k scripts 61 dependentsbioc
MultiAssayExperiment:Software for the integration of multi-omics experiments in Bioconductor
Harmonize data management of multiple experimental assays performed on an overlapping set of specimens. It provides a familiar Bioconductor user experience by extending concepts from SummarizedExperiment, supporting an open-ended mix of standard data classes for individual assays, and allowing subsetting by genomic ranges or rownames. Facilities are provided for reshaping data into wide and long formats for adaptability to graphing and downstream analysis.
Maintained by Marcel Ramos. Last updated 2 months ago.
infrastructuredatarepresentationbioconductorbioconductor-packagegenomicsnci-itcrtcgau24ca289073
71 stars 14.95 score 670 scripts 127 dependentsr-dbi
RPostgres:C++ Interface to PostgreSQL
Fully DBI-compliant C++-backed interface to PostgreSQL <https://www.postgresql.org/>, an open-source relational database.
Maintained by Kirill Müller. Last updated 1 months ago.
338 stars 14.78 score 1.6k scripts 31 dependentsbioc
GEOquery:Get data from NCBI Gene Expression Omnibus (GEO)
The NCBI Gene Expression Omnibus (GEO) is a public repository of microarray data. Given the rich and varied nature of this resource, it is only natural to want to apply BioConductor tools to these data. GEOquery is the bridge between GEO and BioConductor.
Maintained by Sean Davis. Last updated 5 months ago.
microarraydataimportonechanneltwochannelsagebioconductorbioinformaticsdata-sciencegenomicsncbi-geo
92 stars 14.46 score 4.1k scripts 44 dependentsbioc
xcms:LC-MS and GC-MS Data Analysis
Framework for processing and visualization of chromatographically separated and single-spectra mass spectral data. Imports from AIA/ANDI NetCDF, mzXML, mzData and mzML files. Preprocesses data for high-throughput, untargeted analyte profiling.
Maintained by Steffen Neumann. Last updated 14 days ago.
immunooncologymassspectrometrymetabolomicsbioconductorfeature-detectionmass-spectrometrypeak-detectioncpp
196 stars 14.31 score 984 scripts 11 dependentsbioc
BSgenome:Software infrastructure for efficient representation of full genomes and their SNPs
Infrastructure shared by all the Biostrings-based genome data packages.
Maintained by Hervé Pagès. Last updated 2 months ago.
geneticsinfrastructuredatarepresentationsequencematchingannotationsnpbioconductor-packagecore-package
9 stars 14.12 score 1.2k scripts 267 dependentsleifeld
texreg:Conversion of R Regression Output to LaTeX or HTML Tables
Converts coefficients, standard errors, significance stars, and goodness-of-fit statistics of statistical models into LaTeX tables or HTML tables/MS Word documents or to nicely formatted screen output for the R console for easy model comparison. A list of several models can be combined in a single table. The output is highly customizable. New model types can be easily implemented. Details can be found in Leifeld (2013), JStatSoft <doi:10.18637/jss.v055.i08>.)
Maintained by Philip Leifeld. Last updated 3 months ago.
html-tableslatexlatex-tablesregressionreportingtabletexreg
113 stars 14.09 score 1.8k scripts 67 dependentsbioc
ensembldb:Utilities to create and use Ensembl-based annotation databases
The package provides functions to create and use transcript centric annotation databases/packages. The annotation for the databases are directly fetched from Ensembl using their Perl API. The functionality and data is similar to that of the TxDb packages from the GenomicFeatures package, but, in addition to retrieve all gene/transcript models and annotations from the database, ensembldb provides a filter framework allowing to retrieve annotations for specific entries like genes encoded on a chromosome region or transcript models of lincRNA genes. EnsDb databases built with ensembldb contain also protein annotations and mappings between proteins and their encoding transcripts. Finally, ensembldb provides functions to map between genomic, transcript and protein coordinates.
Maintained by Johannes Rainer. Last updated 5 months ago.
geneticsannotationdatasequencingcoverageannotationbioconductorbioconductor-packagesensembl
35 stars 14.08 score 892 scripts 108 dependentsedzer
hexbin:Hexagonal Binning Routines
Binning and plotting functions for hexagonal bins.
Maintained by Edzer Pebesma. Last updated 5 months ago.
37 stars 14.00 score 2.4k scripts 114 dependentsmhahsler
arules:Mining Association Rules and Frequent Itemsets
Provides the infrastructure for representing, manipulating and analyzing transaction data and patterns (frequent itemsets and association rules). Also provides C implementations of the association mining algorithms Apriori and Eclat. Hahsler, Gruen and Hornik (2005) <doi:10.18637/jss.v014.i15>.
Maintained by Michael Hahsler. Last updated 2 months ago.
arulesassociation-rulesfrequent-itemsets
194 stars 13.99 score 3.3k scripts 28 dependentsbioc
phyloseq:Handling and analysis of high-throughput microbiome census data
phyloseq provides a set of classes and tools to facilitate the import, storage, analysis, and graphical display of microbiome census data.
Maintained by Paul J. McMurdie. Last updated 5 months ago.
immunooncologysequencingmicrobiomemetagenomicsclusteringclassificationmultiplecomparisongeneticvariability
597 stars 13.90 score 8.4k scripts 37 dependentsbioc
AnnotationHub:Client to access AnnotationHub resources
This package provides a client for the Bioconductor AnnotationHub web resource. The AnnotationHub web resource provides a central location where genomic files (e.g., VCF, bed, wig) and other resources from standard locations (e.g., UCSC, Ensembl) can be discovered. The resource includes metadata about each resource, e.g., a textual description, tags, and date of modification. The client creates and manages a local cache of files retrieved by the user, helping with quick and reproducible access.
Maintained by Bioconductor Package Maintainer. Last updated 5 months ago.
infrastructuredataimportguithirdpartyclientcore-packageu24ca289073
17 stars 13.88 score 2.7k scripts 104 dependentsbiomodhub
biomod2:Ensemble Platform for Species Distribution Modeling
Functions for species distribution modeling, calibration and evaluation, ensemble of models, ensemble forecasting and visualization. The package permits to run consistently up to 10 single models on a presence/absences (resp presences/pseudo-absences) dataset and to combine them in ensemble models and ensemble projections. Some bench of other evaluation and visualisation tools are also available within the package.
Maintained by Maya Guéguen. Last updated 2 days ago.
95 stars 13.85 score 536 scripts 7 dependentsbioc
limma:Linear Models for Microarray and Omics Data
Data analysis, linear models and differential expression for omics data.
Maintained by Gordon Smyth. Last updated 10 days ago.
exonarraygeneexpressiontranscriptionalternativesplicingdifferentialexpressiondifferentialsplicinggenesetenrichmentdataimportbayesianclusteringregressiontimecoursemicroarraymicrornaarraymrnamicroarrayonechannelproprietaryplatformstwochannelsequencingrnaseqbatcheffectmultiplecomparisonnormalizationpreprocessingqualitycontrolbiomedicalinformaticscellbiologycheminformaticsepigeneticsfunctionalgenomicsgeneticsimmunooncologymetabolomicsproteomicssystemsbiologytranscriptomics
13.81 score 16k scripts 586 dependentsduckdb
duckdb:DBI Package for the DuckDB Database Management System
The DuckDB project is an embedded analytical data management system with support for the Structured Query Language (SQL). This package includes all of DuckDB and an R Database Interface (DBI) connector.
Maintained by Kirill Müller. Last updated 10 days ago.
159 stars 13.80 score 1.7k scripts 46 dependentsbioc
BiocFileCache:Manage Files Across Sessions
This package creates a persistent on-disk cache of files that the user can add, update, and retrieve. It is useful for managing resources (such as custom Txdb objects) that are costly or difficult to create, web resources, and data files used across sessions.
Maintained by Lori Shepherd. Last updated 2 months ago.
dataimportcore-packageu24ca289073
13 stars 13.76 score 486 scripts 436 dependentssimsem
semTools:Useful Tools for Structural Equation Modeling
Provides miscellaneous tools for structural equation modeling, many of which extend the 'lavaan' package. For example, latent interactions can be estimated using product indicators (Lin et al., 2010, <doi:10.1080/10705511.2010.488999>) and simple effects probed; analytical power analyses can be conducted (Jak et al., 2021, <doi:10.3758/s13428-020-01479-0>); and scale reliability can be estimated based on estimated factor-model parameters.
Maintained by Terrence D. Jorgensen. Last updated 15 days ago.
79 stars 13.74 score 1.1k scripts 31 dependentsr-dbi
RMySQL:Database Interface and 'MySQL' Driver for R
Legacy 'DBI' interface to 'MySQL' / 'MariaDB' based on old code ported from S-PLUS. A modern 'MySQL' client written in 'C++' is available from the 'RMariaDB' package.
Maintained by Jeroen Ooms. Last updated 2 months ago.
209 stars 13.68 score 3.7k scripts 15 dependentsbioc
SingleCellExperiment:S4 Classes for Single Cell Data
Defines a S4 class for storing data from single-cell experiments. This includes specialized methods to store and retrieve spike-in information, dimensionality reduction coordinates and size factors for each cell, along with the usual metadata for genes and libraries.
Maintained by Davide Risso. Last updated 21 days ago.
immunooncologydatarepresentationdataimportinfrastructuresinglecell
13.53 score 15k scripts 285 dependentsbioc
edgeR:Empirical Analysis of Digital Gene Expression Data in R
Differential expression analysis of sequence count data. Implements a range of statistical methodology based on the negative binomial distributions, including empirical Bayes estimation, exact tests, generalized linear models, quasi-likelihood, and gene set enrichment. Can perform differential analyses of any type of omics data that produces read counts, including RNA-seq, ChIP-seq, ATAC-seq, Bisulfite-seq, SAGE, CAGE, metabolomics, or proteomics spectral counts. RNA-seq analyses can be conducted at the gene or isoform level, and tests can be conducted for differential exon or transcript usage.
Maintained by Yunshun Chen. Last updated 18 days ago.
alternativesplicingbatcheffectbayesianbiomedicalinformaticscellbiologychipseqclusteringcoveragedifferentialexpressiondifferentialmethylationdifferentialsplicingdnamethylationepigeneticsfunctionalgenomicsgeneexpressiongenesetenrichmentgeneticsimmunooncologymultiplecomparisonnormalizationpathwaysproteomicsqualitycontrolregressionrnaseqsagesequencingsinglecellsystemsbiologytimecoursetranscriptiontranscriptomicsopenblas
13.40 score 17k scripts 255 dependentsyulab-smu
tidytree:A Tidy Tool for Phylogenetic Tree Data Manipulation
Phylogenetic tree generally contains multiple components including node, edge, branch and associated data. 'tidytree' provides an approach to convert tree object to tidy data frame as well as provides tidy interfaces to manipulate tree data.
Maintained by Guangchuang Yu. Last updated 8 months ago.
phylogenetic-treetidyversetree-data
56 stars 13.36 score 584 scripts 128 dependentsbioc
HDF5Array:HDF5 datasets as array-like objects in R
The HDF5Array package is an HDF5 backend for DelayedArray objects. It implements the HDF5Array, H5SparseMatrix, H5ADMatrix, and TENxMatrix classes, 4 convenient and memory-efficient array-like containers for representing and manipulating either: (1) a conventional (a.k.a. dense) HDF5 dataset, (2) an HDF5 sparse matrix (stored in CSR/CSC/Yale format), (3) the central matrix of an h5ad file (or any matrix in the /layers group), or (4) a 10x Genomics sparse matrix. All these containers are DelayedArray extensions and thus support all operations (delayed or block-processed) supported by DelayedArray objects.
Maintained by Hervé Pagès. Last updated 9 days ago.
infrastructuredatarepresentationdataimportsequencingrnaseqcoverageannotationgenomeannotationsinglecellimmunooncologybioconductor-packagecore-packageu24ca289073
12 stars 13.20 score 844 scripts 126 dependentsbioc
Gviz:Plotting data and annotation information along genomic coordinates
Genomic data analyses requires integrated visualization of known genomic information and new experimental data. Gviz uses the biomaRt and the rtracklayer packages to perform live annotation queries to Ensembl and UCSC and translates this to e.g. gene/transcript structures in viewports of the grid graphics package. This results in genomic information plotted together with your data.
Maintained by Robert Ivanek. Last updated 5 months ago.
visualizationmicroarraysequencing
79 stars 13.08 score 1.4k scripts 48 dependentsbioc
ChIPseeker:ChIPseeker for ChIP peak Annotation, Comparison, and Visualization
This package implements functions to retrieve the nearest genes around the peak, annotate genomic region of the peak, statstical methods for estimate the significance of overlap among ChIP peak data sets, and incorporate GEO database for user to compare the own dataset with those deposited in database. The comparison can be used to infer cooperative regulation and thus can be used to generate hypotheses. Several visualization functions are implemented to summarize the coverage of the peak experiment, average profile and heatmap of peaks binding to TSS regions, genomic annotation, distance to TSS, and overlap of peaks or genes.
Maintained by Guangchuang Yu. Last updated 5 months ago.
annotationchipseqsoftwarevisualizationmultiplecomparisonatac-seqchip-seqcomparisonepigeneticsepigenomics
234 stars 13.02 score 1.6k scripts 5 dependentsbioc
Spectra:Spectra Infrastructure for Mass Spectrometry Data
The Spectra package defines an efficient infrastructure for storing and handling mass spectrometry spectra and functionality to subset, process, visualize and compare spectra data. It provides different implementations (backends) to store mass spectrometry data. These comprise backends tuned for fast data access and processing and backends for very large data sets ensuring a small memory footprint.
Maintained by RforMassSpectrometry Package Maintainer. Last updated 21 days ago.
infrastructureproteomicsmassspectrometrymetabolomicsbioconductorhacktoberfestmass-spectrometry
41 stars 13.01 score 254 scripts 35 dependentsbioc
iSEE:Interactive SummarizedExperiment Explorer
Create an interactive Shiny-based graphical user interface for exploring data stored in SummarizedExperiment objects, including row- and column-level metadata. The interface supports transmission of selections between plots and tables, code tracking, interactive tours, interactive or programmatic initialization, preservation of app state, and extensibility to new panel types via S4 classes. Special attention is given to single-cell data in a SingleCellExperiment object with visualization of dimensionality reduction results.
Maintained by Kevin Rue-Albrecht. Last updated 22 days ago.
cellbasedassaysclusteringdimensionreductionfeatureextractiongeneexpressionguiimmunooncologyshinyappssinglecelltranscriptiontranscriptomicsvisualizationdimension-reductionfeature-extractiongene-expressionhacktoberfesthuman-cell-atlasshinysingle-cell
225 stars 12.86 score 380 scripts 9 dependentsbioc
minfi:Analyze Illumina Infinium DNA methylation arrays
Tools to analyze & visualize Illumina Infinium methylation arrays.
Maintained by Kasper Daniel Hansen. Last updated 4 months ago.
immunooncologydnamethylationdifferentialmethylationepigeneticsmicroarraymethylationarraymultichanneltwochanneldataimportnormalizationpreprocessingqualitycontrol
60 stars 12.82 score 996 scripts 27 dependentscsgillespie
poweRlaw:Analysis of Heavy Tailed Distributions
An implementation of maximum likelihood estimators for a variety of heavy tailed distributions, including both the discrete and continuous power law distributions. Additionally, a goodness-of-fit based approach is used to estimate the lower cut-off for the scaling region.
Maintained by Colin Gillespie. Last updated 2 months ago.
112 stars 12.79 score 332 scripts 32 dependentsspedygiorgio
markovchain:Easy Handling Discrete Time Markov Chains
Functions and S4 methods to create and manage discrete time Markov chains more easily. In addition functions to perform statistical (fitting and drawing random variates) and probabilistic (analysis of their structural proprieties) analysis are provided. See Spedicato (2017) <doi:10.32614/RJ-2017-036>. Some functions for continuous times Markov chains depend on the suggested ctmcd package.
Maintained by Giorgio Alfredo Spedicato. Last updated 5 months ago.
ctmcdtmcmarkov-chainmarkov-modelr-programmingrcppopenblascpp
104 stars 12.78 score 712 scripts 4 dependentsbioc
EBImage:Image processing and analysis toolbox for R
EBImage provides general purpose functionality for image processing and analysis. In the context of (high-throughput) microscopy-based cellular assays, EBImage offers tools to segment cells and extract quantitative cellular descriptors. This allows the automation of such tasks using the R programming language and facilitates the use of other tools in the R environment for signal processing, statistical modeling, machine learning and visualization with image data.
Maintained by Andrzej Oleś. Last updated 5 months ago.
visualizationbioinformaticsimage-analysisimage-processingcpp
71 stars 12.77 score 1.5k scripts 33 dependentsbioc
MSnbase:Base Functions and Classes for Mass Spectrometry and Proteomics
MSnbase provides infrastructure for manipulation, processing and visualisation of mass spectrometry and proteomics data, ranging from raw to quantitative and annotated data.
Maintained by Laurent Gatto. Last updated 14 days ago.
immunooncologyinfrastructureproteomicsmassspectrometryqualitycontroldataimportbioconductorbioinformaticsmass-spectrometryproteomics-datavisualisationcpp
131 stars 12.76 score 772 scripts 36 dependentsboost-r
mboost:Model-Based Boosting
Functional gradient descent algorithm (boosting) for optimizing general risk functions utilizing component-wise (penalised) least squares estimates or regression trees as base-learners for fitting generalized linear, additive and interaction models to potentially high-dimensional data. Models and algorithms are described in <doi:10.1214/07-STS242>, a hands-on tutorial is available from <doi:10.1007/s00180-012-0382-5>. The package allows user-specified loss functions and base-learners.
Maintained by Torsten Hothorn. Last updated 5 months ago.
boosting-algorithmsgamglmmachine-learningmboostmodellingr-languagetutorialsvariable-selectionopenblas
72 stars 12.70 score 540 scripts 27 dependentsbioc
rtracklayer:R interface to genome annotation files and the UCSC genome browser
Extensible framework for interacting with multiple genome browsers (currently UCSC built-in) and manipulating annotation tracks in various formats (currently GFF, BED, bedGraph, BED15, WIG, BigWig and 2bit built-in). The user may export/import tracks to/from the supported browsers, as well as query and modify the browser state, such as the current viewport.
Maintained by Michael Lawrence. Last updated 3 days ago.
annotationvisualizationdataimportzlibopensslcurl
12.66 score 6.7k scripts 480 dependentsr-dbi
bigrquery:An Interface to Google's 'BigQuery' 'API'
Easily talk to Google's 'BigQuery' database from R.
Maintained by Hadley Wickham. Last updated 1 months ago.
520 stars 12.47 score 1.8k scripts 4 dependentsbioc
SparseArray:High-performance sparse data representation and manipulation in R
The SparseArray package provides array-like containers for efficient in-memory representation of multidimensional sparse data in R (arrays and matrices). The package defines the SparseArray virtual class and two concrete subclasses: COO_SparseArray and SVT_SparseArray. Each subclass uses its own internal representation of the nonzero multidimensional data: the "COO layout" and the "SVT layout", respectively. SVT_SparseArray objects mimic as much as possible the behavior of ordinary matrix and array objects in base R. In particular, they suppport most of the "standard matrix and array API" defined in base R and in the matrixStats package from CRAN.
Maintained by Hervé Pagès. Last updated 10 days ago.
infrastructuredatarepresentationbioconductor-packagecore-packageopenmp
9 stars 12.47 score 79 scripts 1.2k dependentssuyusung
arm:Data Analysis Using Regression and Multilevel/Hierarchical Models
Functions to accompany A. Gelman and J. Hill, Data Analysis Using Regression and Multilevel/Hierarchical Models, Cambridge University Press, 2007.
Maintained by Yu-Sung Su. Last updated 5 months ago.
25 stars 12.38 score 3.3k scripts 89 dependentsasardaes
dtwclust:Time Series Clustering Along with Optimizations for the Dynamic Time Warping Distance
Time series clustering along with optimized techniques related to the Dynamic Time Warping distance and its corresponding lower bounds. Implementations of partitional, hierarchical, fuzzy, k-Shape and TADPole clustering are available. Functionality can be easily extended with custom distance measures and centroid definitions. Implementations of DTW barycenter averaging, a distance based on global alignment kernels, and the soft-DTW distance and centroid routines are also provided. All included distance functions have custom loops optimized for the calculation of cross-distance matrices, including parallelization support. Several cluster validity indices are included.
Maintained by Alexis Sarda. Last updated 8 months ago.
clusteringdtwtime-seriesopenblascpp
262 stars 12.35 score 406 scripts 14 dependentsmelff
memisc:Management of Survey Data and Presentation of Analysis Results
An infrastructure for the management of survey data including value labels, definable missing values, recoding of variables, production of code books, and import of (subsets of) 'SPSS' and 'Stata' files is provided. Further, the package allows to produce tables and data frames of arbitrary descriptive statistics and (almost) publication-ready tables of regression model estimates, which can be exported to 'LaTeX' and HTML.
Maintained by Martin Elff. Last updated 23 days ago.
46 stars 12.34 score 1.2k scripts 13 dependentsgaospecial
ggVennDiagram:A 'ggplot2' Implement of Venn Diagram
Easy-to-use functions to generate 2-7 sets Venn or upset plot in publication quality. 'ggVennDiagram' plot Venn or upset using well-defined geometry dataset and 'ggplot2'. The shapes of 2-4 sets Venn use circles and ellipses, while the shapes of 4-7 sets Venn use irregular polygons (4 has both forms), which are developed and imported from another package 'venn', authored by Adrian Dusa. We provided internal functions to integrate shape data with user provided sets data, and calculated the geometry of every regions/intersections of them, then separately plot Venn in four components, set edges/labels, and region edges/labels. From version 1.0, it is possible to customize these components as you demand in ordinary 'ggplot2' grammar. From version 1.4.4, it supports unlimited number of sets, as it can draw a plain upset plot automatically when number of sets is more than 7.
Maintained by Chun-Hui Gao. Last updated 5 months ago.
set-operationsupsetupsetplotvenn-diagramvenn-plot
292 stars 12.31 score 1.3k scripts 4 dependentsmiraisolutions
XLConnect:Excel Connector for R
Provides comprehensive functionality to read, write and format Excel data.
Maintained by Martin Studer. Last updated 30 days ago.
cross-platformexcelr-languagexlconnectopenjdk
130 stars 12.28 score 1.2k scripts 1 dependentsalexkz
kernlab:Kernel-Based Machine Learning Lab
Kernel-based machine learning methods for classification, regression, clustering, novelty detection, quantile regression and dimensionality reduction. Among other methods 'kernlab' includes Support Vector Machines, Spectral Clustering, Kernel PCA, Gaussian Processes and a QP solver.
Maintained by Alexandros Karatzoglou. Last updated 8 months ago.
21 stars 12.26 score 7.8k scripts 487 dependentsbioc
bsseq:Analyze, manage and store whole-genome methylation data
A collection of tools for analyzing and visualizing whole-genome methylation data from sequencing. This includes whole-genome bisulfite sequencing and Oxford nanopore data.
Maintained by Kasper Daniel Hansen. Last updated 3 months ago.
37 stars 12.26 score 676 scripts 15 dependentsmarkmfredrickson
optmatch:Functions for Optimal Matching
Distance based bipartite matching using minimum cost flow, oriented to matching of treatment and control groups in observational studies ('Hansen' and 'Klopfer' 2006 <doi:10.1198/106186006X137047>). Routines are provided to generate distances from generalised linear models (propensity score matching), formulas giving variables on which to limit matched distances, stratified or exact matching directives, or calipers, alone or in combination.
Maintained by Josh Errickson. Last updated 4 months ago.
47 stars 12.22 score 588 scripts 5 dependentsr-dbi
RMariaDB:Database Interface and MariaDB Driver
Implements a DBI-compliant interface to MariaDB (<https://mariadb.org/>) and MySQL (<https://www.mysql.com/>) databases.
Maintained by Kirill Müller. Last updated 1 months ago.
133 stars 12.20 score 792 scripts 10 dependentsalexiosg
rugarch:Univariate GARCH Models
ARFIMA, in-mean, external regressors and various GARCH flavors, with methods for fit, forecast, simulation, inference and plotting.
Maintained by Alexios Galanos. Last updated 3 months ago.
26 stars 12.13 score 1.3k scripts 15 dependentstomoakin
RPostgreSQL:R Interface to the 'PostgreSQL' Database System
Database interface and 'PostgreSQL' driver for 'R'. This package provides a Database Interface 'DBI' compliant driver for 'R' to access 'PostgreSQL' database systems. In order to build and install this package from source, 'PostgreSQL' itself must be present your system to provide 'PostgreSQL' functionality via its libraries and header files. These files are provided as 'postgresql-devel' package under some Linux distributions. On 'macOS' and 'Microsoft Windows' system the attached 'libpq' library source will be used.
Maintained by Tomoaki Nishiyama. Last updated 15 hours ago.
66 stars 12.11 score 4.5k scripts 19 dependentsbioc
BiocSingular:Singular Value Decomposition for Bioconductor Packages
Implements exact and approximate methods for singular value decomposition and principal components analysis, in a framework that allows them to be easily switched within Bioconductor packages or workflows. Where possible, parallelization is achieved using the BiocParallel framework.
Maintained by Aaron Lun. Last updated 5 months ago.
softwaredimensionreductionprincipalcomponentbioconductor-packagehuman-cell-atlassingular-value-decompositioncpp
7 stars 12.10 score 1.2k scripts 103 dependentsbioc
ShortRead:FASTQ input and manipulation
This package implements sampling, iteration, and input of FASTQ files. The package includes functions for filtering and trimming reads, and for generating a quality assessment report. Data are represented as DNAStringSet-derived objects, and easily manipulated for a diversity of purposes. The package also contains legacy support for early single-end, ungapped alignment formats.
Maintained by Bioconductor Package Maintainer. Last updated 5 months ago.
dataimportsequencingqualitycontrolbioconductor-packagecore-packagezlibcpp
8 stars 12.08 score 1.8k scripts 49 dependentsbioc
slingshot:Tools for ordering single-cell sequencing
Provides functions for inferring continuous, branching lineage structures in low-dimensional data. Slingshot was designed to model developmental trajectories in single-cell RNA sequencing data and serve as a component in an analysis pipeline after dimensionality reduction and clustering. It is flexible enough to handle arbitrarily many branching events and allows for the incorporation of prior knowledge through supervised graph construction.
Maintained by Kelly Street. Last updated 5 months ago.
clusteringdifferentialexpressiongeneexpressionrnaseqsequencingsoftwaresinglecelltranscriptomicsvisualization
283 stars 12.01 score 1.0k scripts 4 dependentsbioc
QFeatures:Quantitative features for mass spectrometry data
The QFeatures infrastructure enables the management and processing of quantitative features for high-throughput mass spectrometry assays. It provides a familiar Bioconductor user experience to manages quantitative data across different assay levels (such as peptide spectrum matches, peptides and proteins) in a coherent and tractable format.
Maintained by Laurent Gatto. Last updated 24 days ago.
infrastructuremassspectrometryproteomicsmetabolomicsbioconductormass-spectrometry
27 stars 11.87 score 278 scripts 49 dependentsbioc
graph:graph: A package to handle graph data structures
A package that implements some simple graph handling capabilities.
Maintained by Bioconductor Package Maintainer. Last updated 9 days ago.
11.86 score 764 scripts 339 dependentsr-forge
copula:Multivariate Dependence with Copulas
Classes (S4) of commonly used elliptical, Archimedean, extreme-value and other copula families, as well as their rotations, mixtures and asymmetrizations. Nested Archimedean copulas, related tools and special functions. Methods for density, distribution, random number generation, bivariate dependence measures, Rosenblatt transform, Kendall distribution function, perspective and contour plots. Fitting of copula models with potentially partly fixed parameters, including standard errors. Serial independence tests, copula specification tests (independence, exchangeability, radial symmetry, extreme-value dependence, goodness-of-fit) and model selection based on cross-validation. Empirical copula, smoothed versions, and non-parametric estimators of the Pickands dependence function.
Maintained by Martin Maechler. Last updated 23 days ago.
11.83 score 1.2k scripts 86 dependentstiledb-inc
tiledb:Modern Database Engine for Complex Data Based on Multi-Dimensional Arrays
The modern database 'TileDB' introduces a powerful on-disk format for storing and accessing any complex data based on multi-dimensional arrays. It supports dense and sparse arrays, dataframes and key-values stores, cloud storage ('S3', 'GCS', 'Azure'), chunked arrays, multiple compression, encryption and checksum filters, uses a fully multi-threaded implementation, supports parallel I/O, data versioning ('time travel'), metadata and groups. It is implemented as an embeddable cross-platform C++ library with APIs from several languages, and integrations. This package provides the R support.
Maintained by Isaiah Norton. Last updated 21 hours ago.
arrayhdfss3storage-managertiledbcpp
108 stars 11.79 score 306 scripts 4 dependentskingaa
pomp:Statistical Inference for Partially Observed Markov Processes
Tools for data analysis with partially observed Markov process (POMP) models (also known as stochastic dynamical systems, hidden Markov models, and nonlinear, non-Gaussian, state-space models). The package provides facilities for implementing POMP models, simulating them, and fitting them to time series data by a variety of frequentist and Bayesian methods. It is also a versatile platform for implementation of inference methods for general POMP models.
Maintained by Aaron A. King. Last updated 8 days ago.
abcb-splinedifferential-equationsdynamical-systemsiterated-filteringlikelihoodlikelihood-freemarkov-chain-monte-carlomarkov-modelmathematical-modellingmeasurement-errorparticle-filtersequential-monte-carlosimulation-based-inferencesobol-sequencestate-spacestatistical-inferencestochastic-processestime-seriesopenblas
114 stars 11.74 score 1.3k scripts 4 dependentsr-forge
coin:Conditional Inference Procedures in a Permutation Test Framework
Conditional inference procedures for the general independence problem including two-sample, K-sample (non-parametric ANOVA), correlation, censored, ordered and multivariate problems described in <doi:10.18637/jss.v028.i08>.
Maintained by Torsten Hothorn. Last updated 9 months ago.
11.70 score 1.6k scripts 73 dependentssatijalab
SeuratObject:Data Structures for Single Cell Data
Defines S4 classes for single-cell genomic data and associated information, such as dimensionality reduction embeddings, nearest-neighbor graphs, and spatially-resolved coordinates. Provides data access methods and R-native hooks to ensure the Seurat object is familiar to other R users. See Satija R, Farrell J, Gennert D, et al (2015) <doi:10.1038/nbt.3192>, Macosko E, Basu A, Satija R, et al (2015) <doi:10.1016/j.cell.2015.05.002>, and Stuart T, Butler A, et al (2019) <doi:10.1016/j.cell.2019.05.031> for more details.
Maintained by Paul Hoffman. Last updated 2 years ago.
25 stars 11.69 score 1.2k scripts 88 dependentsluca-scr
GA:Genetic Algorithms
Flexible general-purpose toolbox implementing genetic algorithms (GAs) for stochastic optimisation. Binary, real-valued, and permutation representations are available to optimize a fitness function, i.e. a function provided by users depending on their objective function. Several genetic operators are available and can be combined to explore the best settings for the current task. Furthermore, users can define new genetic operators and easily evaluate their performances. Local search using general-purpose optimisation algorithms can be applied stochastically to exploit interesting regions. GAs can be run sequentially or in parallel, using an explicit master-slave parallelisation or a coarse-grain islands approach. For more details see Scrucca (2013) <doi:10.18637/jss.v053.i04> and Scrucca (2017) <doi:10.32614/RJ-2017-008>.
Maintained by Luca Scrucca. Last updated 7 months ago.
genetic-algorithmoptimisationcpp
93 stars 11.58 score 624 scripts 52 dependentsbioc
systemPipeR:systemPipeR: Workflow Environment for Data Analysis and Report Generation
systemPipeR is a multipurpose data analysis workflow environment that unifies R with command-line tools. It enables scientists to analyze many types of large- or small-scale data on local or distributed computer systems with a high level of reproducibility, scalability and portability. At its core is a command-line interface (CLI) that adopts the Common Workflow Language (CWL). This design allows users to choose for each analysis step the optimal R or command-line software. It supports both end-to-end and partial execution of workflows with built-in restart functionalities. Efficient management of complex analysis tasks is accomplished by a flexible workflow control container class. Handling of large numbers of input samples and experimental designs is facilitated by consistent sample annotation mechanisms. As a multi-purpose workflow toolkit, systemPipeR enables users to run existing workflows, customize them or design entirely new ones while taking advantage of widely adopted data structures within the Bioconductor ecosystem. Another important core functionality is the generation of reproducible scientific analysis and technical reports. For result interpretation, systemPipeR offers a wide range of plotting functionality, while an associated Shiny App offers many useful functionalities for interactive result exploration. The vignettes linked from this page include (1) a general introduction, (2) a description of technical details, and (3) a collection of workflow templates.
Maintained by Thomas Girke. Last updated 5 months ago.
geneticsinfrastructuredataimportsequencingrnaseqriboseqchipseqmethylseqsnpgeneexpressioncoveragegenesetenrichmentalignmentqualitycontrolimmunooncologyreportwritingworkflowstepworkflowmanagement
53 stars 11.52 score 344 scripts 3 dependentsbioc
Rgraphviz:Provides plotting capabilities for R graph objects
Interfaces R with the AT and T graphviz library for plotting R graph objects from the graph package.
Maintained by Kasper Daniel Hansen. Last updated 2 days ago.
graphandnetworkvisualizationzlib
11.51 score 1.2k scripts 107 dependentsrudjer
SparseM:Sparse Linear Algebra
Some basic linear algebra functionality for sparse matrices is provided: including Cholesky decomposition and backsolving as well as standard R subsetting and Kronecker products.
Maintained by Roger Koenker. Last updated 8 months ago.
3 stars 11.47 score 306 scripts 1.5k dependentsbioc
msa:Multiple Sequence Alignment
The 'msa' package provides a unified R/Bioconductor interface to the multiple sequence alignment algorithms ClustalW, ClustalOmega, and Muscle. All three algorithms are integrated in the package, therefore, they do not depend on any external software tools and are available for all major platforms. The multiple sequence alignment algorithms are complemented by a function for pretty-printing multiple sequence alignments using the LaTeX package TeXshade.
Maintained by Ulrich Bodenhofer. Last updated 1 months ago.
multiplesequencealignmentalignmentmultiplecomparisonsequencingcpp
17 stars 11.46 score 744 scripts 6 dependentseddelbuettel
RProtoBuf:R Interface to the 'Protocol Buffers' 'API' (Version 2 or 3)
Protocol Buffers are a way of encoding structured data in an efficient yet extensible format. Google uses Protocol Buffers for almost all of its internal 'RPC' protocols and file formats. Additional documentation is available in two included vignettes one of which corresponds to our 'JSS' paper (2016, <doi:10.18637/jss.v071.i02>. A sufficiently recent version of 'Protocol Buffers' library is required; currently version 3.3.0 from 2017 is the stated minimum.
Maintained by Dirk Eddelbuettel. Last updated 12 days ago.
c-plus-plusprotocol-buffersprotobufcpp
73 stars 11.44 score 126 scripts 21 dependentsbioc
destiny:Creates diffusion maps
Create and plot diffusion maps.
Maintained by Philipp Angerer. Last updated 4 months ago.
cellbiologycellbasedassaysclusteringsoftwarevisualizationdiffusion-mapsdimensionality-reductioncpp
82 stars 11.44 score 792 scripts 1 dependentsbioc
annotate:Annotation for microarrays
Using R enviroments for annotation.
Maintained by Bioconductor Package Maintainer. Last updated 5 months ago.
11.41 score 812 scripts 239 dependentsbioc
PharmacoGx:Analysis of Large-Scale Pharmacogenomic Data
Contains a set of functions to perform large-scale analysis of pharmaco-genomic data. These include the PharmacoSet object for storing the results of pharmacogenomic experiments, as well as a number of functions for computing common summaries of drug-dose response and correlating them with the molecular features in a cancer cell-line.
Maintained by Benjamin Haibe-Kains. Last updated 3 months ago.
geneexpressionpharmacogeneticspharmacogenomicssoftwareclassificationdatasetspharmacogenomicpharmacogxcpp
68 stars 11.39 score 442 scripts 3 dependentsbioc
biomformat:An interface package for the BIOM file format
This is an R package for interfacing with the BIOM format. This package includes basic tools for reading biom-format files, accessing and subsetting data tables from a biom object (which is more complex than a single table), as well as limited support for writing a biom-object back to a biom-format file. The design of this API is intended to match the python API and other tools included with the biom-format project, but with a decidedly "R flavor" that should be familiar to R users. This includes S4 classes and methods, as well as extensions of common core functions/methods.
Maintained by Paul J. McMurdie. Last updated 5 months ago.
immunooncologydataimportmetagenomicsmicrobiome
7 stars 11.38 score 416 scripts 39 dependentsbioc
XVector:Foundation of external vector representation and manipulation in Bioconductor
Provides memory efficient S4 classes for storing sequences "externally" (e.g. behind an R external pointer, or on disk).
Maintained by Hervé Pagès. Last updated 3 months ago.
infrastructuredatarepresentationbioconductor-packagecore-packagezlib
2 stars 11.36 score 67 scripts 1.7k dependentsbioc
gdsfmt:R Interface to CoreArray Genomic Data Structure (GDS) Files
Provides a high-level R interface to CoreArray Genomic Data Structure (GDS) data files. GDS is portable across platforms with hierarchical structure to store multiple scalable array-oriented data sets with metadata information. It is suited for large-scale datasets, especially for data which are much larger than the available random-access memory. The gdsfmt package offers the efficient operations specifically designed for integers of less than 8 bits, since a diploid genotype, like single-nucleotide polymorphism (SNP), usually occupies fewer bits than a byte. Data compression and decompression are available with relatively efficient random access. It is also allowed to read a GDS file in parallel with multiple R processes supported by the package parallel.
Maintained by Xiuwen Zheng. Last updated 14 days ago.
infrastructuredataimportbioinformaticsgds-formatgenomicscpp
18 stars 11.34 score 920 scripts 29 dependentsr-forge
Rmpfr:Interface R to MPFR - Multiple Precision Floating-Point Reliable
Arithmetic (via S4 classes and methods) for arbitrary precision floating point numbers, including transcendental ("special") functions. To this end, the package interfaces to the 'LGPL' licensed 'MPFR' (Multiple Precision Floating-Point Reliable) Library which itself is based on the 'GMP' (GNU Multiple Precision) Library.
Maintained by Martin Maechler. Last updated 4 months ago.
11.30 score 316 scripts 141 dependentsbioc
MAST:Model-based Analysis of Single Cell Transcriptomics
Methods and models for handling zero-inflated single cell assay data.
Maintained by Andrew McDavid. Last updated 5 months ago.
geneexpressiondifferentialexpressiongenesetenrichmentrnaseqtranscriptomicssinglecell
232 stars 11.28 score 1.8k scripts 5 dependentsbioc
ggcyto:Visualize Cytometry data with ggplot
With the dedicated fortify method implemented for flowSet, ncdfFlowSet and GatingSet classes, both raw and gated flow cytometry data can be plotted directly with ggplot. ggcyto wrapper and some customed layers also make it easy to add gates and population statistics to the plot.
Maintained by Mike Jiang. Last updated 5 months ago.
immunooncologyflowcytometrycellbasedassaysinfrastructurevisualization
58 stars 11.25 score 362 scripts 5 dependentspik-piam
magclass:Data Class and Tools for Handling Spatial-Temporal Data
Data class for increased interoperability working with spatial-temporal data together with corresponding functions and methods (conversions, basic calculations and basic data manipulation). The class distinguishes between spatial, temporal and other dimensions to facilitate the development and interoperability of tools build for it. Additional features are name-based addressing of data and internal consistency checks (e.g. checking for the right data order in calculations).
Maintained by Jan Philipp Dietrich. Last updated 22 days ago.
5 stars 11.16 score 412 scripts 56 dependentsbioc
affy:Methods for Affymetrix Oligonucleotide Arrays
The package contains functions for exploratory oligonucleotide array analysis. The dependence on tkWidgets only concerns few convenience functions. 'affy' is fully functional without it.
Maintained by Robert D. Shear. Last updated 3 months ago.
microarrayonechannelpreprocessing
11.12 score 2.5k scripts 98 dependentsbioc
genefilter:genefilter: methods for filtering genes from high-throughput experiments
Some basic functions for filtering genes.
Maintained by Bioconductor Package Maintainer. Last updated 5 months ago.
11.11 score 2.4k scripts 143 dependentsfmichonneau
phylobase:Base Package for Phylogenetic Structures and Comparative Data
Provides a base S4 class for comparative methods, incorporating one or more trees and trait data.
Maintained by Francois Michonneau. Last updated 1 years ago.
18 stars 11.10 score 394 scripts 18 dependentsrkillick
changepoint:Methods for Changepoint Detection
Implements various mainstream and specialised changepoint methods for finding single and multiple changepoints within data. Many popular non-parametric and frequentist methods are included. The cpt.mean(), cpt.var(), cpt.meanvar() functions should be your first point of call.
Maintained by Rebecca Killick. Last updated 4 months ago.
133 stars 11.05 score 736 scripts 40 dependentsbioc
S4Arrays:Foundation of array-like containers in Bioconductor
The S4Arrays package defines the Array virtual class to be extended by other S4 classes that wish to implement a container with an array-like semantic. It also provides: (1) low-level functionality meant to help the developer of such container to implement basic operations like display, subsetting, or coercion of their array-like objects to an ordinary matrix or array, and (2) a framework that facilitates block processing of array-like objects (typically on-disk objects).
Maintained by Hervé Pagès. Last updated 2 months ago.
infrastructuredatarepresentationbioconductor-packagecore-package
5 stars 10.99 score 8 scripts 1.2k dependentsbioc
DirichletMultinomial:Dirichlet-Multinomial Mixture Model Machine Learning for Microbiome Data
Dirichlet-multinomial mixture models can be used to describe variability in microbial metagenomic data. This package is an interface to code originally made available by Holmes, Harris, and Quince, 2012, PLoS ONE 7(2): 1-15, as discussed further in the man page for this package, ?DirichletMultinomial.
Maintained by Martin Morgan. Last updated 5 months ago.
immunooncologymicrobiomesequencingclusteringclassificationmetagenomicsgsl
10 stars 10.91 score 125 scripts 26 dependentseddelbuettel
nanotime:Nanosecond-Resolution Time Support for R
Full 64-bit resolution date and time functionality with nanosecond granularity is provided, with easy transition to and from the standard 'POSIXct' type. Three additional classes offer interval, period and duration functionality for nanosecond-resolution timestamps.
Maintained by Dirk Eddelbuettel. Last updated 2 months ago.
datetimedatetimesnanosecond-resolutionnanosecondscpp
53 stars 10.91 score 134 scripts 17 dependentsr-forge
MatrixModels:Modelling with Sparse and Dense Matrices
Generalized Linear Modelling with sparse and dense 'Matrix' matrices, using modular prediction and response module classes.
Maintained by Martin Maechler. Last updated 19 hours ago.
1 stars 10.90 score 1.5k dependentsmetrumresearchgroup
mrgsolve:Simulate from ODE-Based Models
Fast simulation from ordinary differential equation (ODE) based models typically employed in quantitative pharmacology and systems biology.
Maintained by Kyle T Baron. Last updated 9 days ago.
138 stars 10.90 score 1.2k scripts 3 dependentsecmerkle
blavaan:Bayesian Latent Variable Analysis
Fit a variety of Bayesian latent variable models, including confirmatory factor analysis, structural equation models, and latent growth curve models. References: Merkle & Rosseel (2018) <doi:10.18637/jss.v085.i04>; Merkle et al. (2021) <doi:10.18637/jss.v100.i06>.
Maintained by Edgar Merkle. Last updated 9 days ago.
bayesian-statisticsfactor-analysisgrowth-curve-modelslatent-variablesmissing-datamultilevel-modelsmultivariate-analysispath-analysispsychometricsstatistical-modelingstructural-equation-modelingcpp
92 stars 10.84 score 183 scripts 3 dependentswelch-lab
rliger:Linked Inference of Genomic Experimental Relationships
Uses an extension of nonnegative matrix factorization to identify shared and dataset-specific factors. See Welch J, Kozareva V, et al (2019) <doi:10.1016/j.cell.2019.05.006>, and Liu J, Gao C, Sodicoff J, et al (2020) <doi:10.1038/s41596-020-0391-8> for more details.
Maintained by Yichen Wang. Last updated 2 months ago.
nonnegative-matrix-factorizationsingle-cellopenblascpp
408 stars 10.77 score 334 scripts 1 dependentsbioc
GWASTools:Tools for Genome Wide Association Studies
Classes for storing very large GWAS data sets and annotation, and functions for GWAS data cleaning and analysis.
Maintained by Stephanie M. Gogarten. Last updated 10 days ago.
snpgeneticvariabilityqualitycontrolmicroarray
17 stars 10.67 score 396 scripts 5 dependentsr-lum
Luminescence:Comprehensive Luminescence Dating Data Analysis
A collection of various R functions for the purpose of Luminescence dating data analysis. This includes, amongst others, data import, export, application of age models, curve deconvolution, sequence analysis and plotting of equivalent dose distributions.
Maintained by Sebastian Kreutzer. Last updated 1 days ago.
bayesian-statisticsdata-sciencegeochronologyluminescenceluminescence-datingopen-scienceoslplottingradiofluorescencetlxsygcpp
15 stars 10.67 score 178 scripts 8 dependentszdebruine
RcppML:Rcpp Machine Learning Library
Fast machine learning algorithms including matrix factorization and divisive clustering for large sparse and dense matrices.
Maintained by Zach DeBruine. Last updated 2 years ago.
clusteringmatrix-factorizationnmfrcpprcppeigensparse-matrixcppopenmp
107 stars 10.66 score 125 scripts 50 dependentsohdsi
FeatureExtraction:Generating Features for a Cohort
An R interface for generating features for a cohort using data in the Common Data Model. Features can be constructed using default or custom made feature definitions. Furthermore it's possible to aggregate features and get the summary statistics.
Maintained by Ger Inberg. Last updated 8 days ago.
62 stars 10.64 score 209 scripts 2 dependentspredictiveecology
SpaDES.core:Core Utilities for Developing and Running Spatially Explicit Discrete Event Models
Provides the core framework for a discrete event system to implement a complete data-to-decisions, reproducible workflow. The core components facilitate the development of modular pieces, and enable the user to include additional functionality by running user-built modules. Includes conditional scheduling, restart after interruption, packaging of reusable modules, tools for developing arbitrary automated workflows, automated interweaving of modules of different temporal resolution, and tools for visualizing and understanding the within-project dependencies. The suggested package 'NLMR' can be installed from the repository (<https://PredictiveEcology.r-universe.dev>).
Maintained by Eliot J B McIntire. Last updated 1 months ago.
discrete-events-simulationssimulation-frameworksimulation-modeling
10 stars 10.61 score 142 scripts 6 dependentsvalentint
rrcov:Scalable Robust Estimators with High Breakdown Point
Robust Location and Scatter Estimation and Robust Multivariate Analysis with High Breakdown Point: principal component analysis (Filzmoser and Todorov (2013), <doi:10.1016/j.ins.2012.10.017>), linear and quadratic discriminant analysis (Todorov and Pires (2007)), multivariate tests (Todorov and Filzmoser (2010) <doi:10.1016/j.csda.2009.08.015>), outlier detection (Todorov et al. (2010) <doi:10.1007/s11634-010-0075-2>). See also Todorov and Filzmoser (2009) <urn:isbn:978-3838108148>, Todorov and Filzmoser (2010) <doi:10.18637/jss.v032.i03> and Boudt et al. (2019) <doi:10.1007/s11222-019-09869-x>.
Maintained by Valentin Todorov. Last updated 7 months ago.
2 stars 10.57 score 484 scripts 96 dependentsbioc
seqLogo:Sequence logos for DNA sequence alignments
seqLogo takes the position weight matrix of a DNA sequence motif and plots the corresponding sequence logo as introduced by Schneider and Stephens (1990).
Maintained by Robert Ivanek. Last updated 5 months ago.
4 stars 10.57 score 304 scripts 29 dependentsbioc
ORFik:Open Reading Frames in Genomics
R package for analysis of transcript and translation features through manipulation of sequence data and NGS data like Ribo-Seq, RNA-Seq, TCP-Seq and CAGE. It is generalized in the sense that any transcript region can be analysed, as the name hints to it was made with investigation of ribosomal patterns over Open Reading Frames (ORFs) as it's primary use case. ORFik is extremely fast through use of C++, data.table and GenomicRanges. Package allows to reassign starts of the transcripts with the use of CAGE-Seq data, automatic shifting of RiboSeq reads, finding of Open Reading Frames for whole genomes and much more.
Maintained by Haakon Tjeldnes. Last updated 1 months ago.
immunooncologysoftwaresequencingriboseqrnaseqfunctionalgenomicscoveragealignmentdataimportcpp
33 stars 10.56 score 115 scripts 2 dependentswrathematics
float:32-Bit Floats
R comes with a suite of utilities for linear algebra with "numeric" (double precision) vectors/matrices. However, sometimes single precision (or less!) is more than enough for a particular task. This package extends R's linear algebra facilities to include 32-bit float (single precision) data. Float vectors/matrices have half the precision of their "numeric"-type counterparts but are generally faster to numerically operate on, for a performance vs accuracy trade-off. The internal representation is an S4 class, which allows us to keep the syntax identical to that of base R's. Interaction between floats and base types for binary operators is generally possible; in these cases, type promotion always defaults to the higher precision. The package ships with copies of the single precision 'BLAS' and 'LAPACK', which are automatically built in the event they are not available on the system.
Maintained by Drew Schmidt. Last updated 19 days ago.
float-matrixhpclinear-algebramatrixfortranopenblasopenmp
46 stars 10.53 score 228 scripts 42 dependentsbioc
miloR:Differential neighbourhood abundance testing on a graph
Milo performs single-cell differential abundance testing. Cell states are modelled as representative neighbourhoods on a nearest neighbour graph. Hypothesis testing is performed using either a negative bionomial generalized linear model or negative binomial generalized linear mixed model.
Maintained by Mike Morgan. Last updated 5 months ago.
singlecellmultiplecomparisonfunctionalgenomicssoftwareopenblascppopenmp
357 stars 10.49 score 340 scripts 1 dependentscrunch-io
crunch:Crunch.io Data Tools
The Crunch.io service <https://crunch.io/> provides a cloud-based data store and analytic engine, as well as an intuitive web interface. Using this package, analysts can interact with and manipulate Crunch datasets from within R. Importantly, this allows technical researchers to collaborate naturally with team members, managers, and clients who prefer a point-and-click interface.
Maintained by Greg Freedman Ellis. Last updated 7 days ago.
9 stars 10.47 score 200 scripts 2 dependentswrathematics
ngram:Fast n-Gram 'Tokenization'
An n-gram is a sequence of n "words" taken, in order, from a body of text. This is a collection of utilities for creating, displaying, summarizing, and "babbling" n-grams. The 'tokenization' and "babbling" are handled by very efficient C code, which can even be built as its own standalone library. The babbler is a simple Markov chain. The package also offers a vignette with complete example 'workflows' and information about the utilities offered in the package.
Maintained by Drew Schmidt. Last updated 1 years ago.
71 stars 10.45 score 844 scripts 7 dependentsbioc
ChemmineR:Cheminformatics Toolkit for R
ChemmineR is a cheminformatics package for analyzing drug-like small molecule data in R. Its latest version contains functions for efficient processing of large numbers of molecules, physicochemical/structural property predictions, structural similarity searching, classification and clustering of compound libraries with a wide spectrum of algorithms. In addition, it offers visualization functions for compound clustering results and chemical structures.
Maintained by Thomas Girke. Last updated 5 months ago.
cheminformaticsbiomedicalinformaticspharmacogeneticspharmacogenomicsmicrotitreplateassaycellbasedassaysvisualizationinfrastructuredataimportclusteringproteomicsmetabolomicscpp
15 stars 10.45 score 253 scripts 12 dependentsbioc
oligo:Preprocessing tools for oligonucleotide arrays
A package to analyze oligonucleotide arrays (expression/SNP/tiling/exon) at probe-level. It currently supports Affymetrix (CEL files) and NimbleGen arrays (XYS files).
Maintained by Benilton Carvalho. Last updated 20 days ago.
microarrayonechanneltwochannelpreprocessingsnpdifferentialexpressionexonarraygeneexpressiondataimportzlib
3 stars 10.42 score 528 scripts 10 dependentsmhahsler
recommenderlab:Lab for Developing and Testing Recommender Algorithms
Provides a research infrastructure to develop and evaluate collaborative filtering recommender algorithms. This includes a sparse representation for user-item matrices, many popular algorithms, top-N recommendations, and cross-validation. Hahsler (2022) <doi:10.48550/arXiv.2205.12371>.
Maintained by Michael Hahsler. Last updated 3 days ago.
collaborative-filteringrecommender-system
214 stars 10.42 score 840 scripts 2 dependentsadeverse
adegraphics:An S4 Lattice-Based Package for the Representation of Multivariate Data
Graphical functionalities for the representation of multivariate data. It is a complete re-implementation of the functions available in the 'ade4' package.
Maintained by Aurélie Siberchicot. Last updated 8 months ago.
9 stars 10.37 score 386 scripts 6 dependentsbioc
flowCore:flowCore: Basic structures for flow cytometry data
Provides S4 data structures and basic functions to deal with flow cytometry data.
Maintained by Mike Jiang. Last updated 5 months ago.
immunooncologyinfrastructureflowcytometrycellbasedassayscpp
10.34 score 1.7k scripts 59 dependentsagrdatasci
gdistance:Distances and Routes on Geographical Grids
Provides classes and functions to calculate various distance measures and routes in heterogeneous geographic spaces represented as grids. The package implements measures to model dispersal histories first presented by van Etten and Hijmans (2010) <doi:10.1371/journal.pone.0012060>. Least-cost distances as well as more complex distances based on (constrained) random walks can be calculated. The distances implemented in the package are used in geographical genetics, accessibility indicators, and may also have applications in other fields of geospatial analysis.
Maintained by Andrew Marx. Last updated 1 years ago.
17 stars 10.34 score 478 scripts 23 dependentsbioc
GSEABase:Gene set enrichment data structures and methods
This package provides classes and methods to support Gene Set Enrichment Analysis (GSEA).
Maintained by Bioconductor Package Maintainer. Last updated 2 months ago.
geneexpressiongenesetenrichmentgraphandnetworkgokegg
10.27 score 1.5k scripts 77 dependentsbioc
CAMERA:Collection of annotation related methods for mass spectrometry data
Annotation of peaklists generated by xcms, rule based annotation of isotopes and adducts, isotope validation, EIC correlation based tagging of unknown adducts and fragments
Maintained by Steffen Neumann. Last updated 5 months ago.
immunooncologymassspectrometrymetabolomics
11 stars 10.27 score 175 scripts 6 dependentsbioc
BASiCS:Bayesian Analysis of Single-Cell Sequencing data
Single-cell mRNA sequencing can uncover novel cell-to-cell heterogeneity in gene expression levels in seemingly homogeneous populations of cells. However, these experiments are prone to high levels of technical noise, creating new challenges for identifying genes that show genuine heterogeneous expression within the population of cells under study. BASiCS (Bayesian Analysis of Single-Cell Sequencing data) is an integrated Bayesian hierarchical model to perform statistical analyses of single-cell RNA sequencing datasets in the context of supervised experiments (where the groups of cells of interest are known a priori, e.g. experimental conditions or cell types). BASiCS performs built-in data normalisation (global scaling) and technical noise quantification (based on spike-in genes). BASiCS provides an intuitive detection criterion for highly (or lowly) variable genes within a single group of cells. Additionally, BASiCS can compare gene expression patterns between two or more pre-specified groups of cells. Unlike traditional differential expression tools, BASiCS quantifies changes in expression that lie beyond comparisons of means, also allowing the study of changes in cell-to-cell heterogeneity. The latter can be quantified via a biological over-dispersion parameter that measures the excess of variability that is observed with respect to Poisson sampling noise, after normalisation and technical noise removal. Due to the strong mean/over-dispersion confounding that is typically observed for scRNA-seq datasets, BASiCS also tests for changes in residual over-dispersion, defined by residual values with respect to a global mean/over-dispersion trend.
Maintained by Catalina Vallejos. Last updated 5 months ago.
immunooncologynormalizationsequencingrnaseqsoftwaregeneexpressiontranscriptomicssinglecelldifferentialexpressionbayesiancellbiologybioconductor-packagegene-expressionrcpprcpparmadilloscrna-seqsingle-cellopenblascppopenmp
83 stars 10.26 score 368 scripts 1 dependentsbioc
zinbwave:Zero-Inflated Negative Binomial Model for RNA-Seq Data
Implements a general and flexible zero-inflated negative binomial model that can be used to provide a low-dimensional representations of single-cell RNA-seq data. The model accounts for zero inflation (dropouts), over-dispersion, and the count nature of the data. The model also accounts for the difference in library sizes and optionally for batch effects and/or other covariates, avoiding the need for pre-normalize the data.
Maintained by Davide Risso. Last updated 5 months ago.
immunooncologydimensionreductiongeneexpressionrnaseqsoftwaretranscriptomicssequencingsinglecell
43 stars 10.21 score 190 scripts 6 dependentsbioc
BiocIO:Standard Input and Output for Bioconductor Packages
The `BiocIO` package contains high-level abstract classes and generics used by developers to build IO funcionality within the Bioconductor suite of packages. Implements `import()` and `export()` standard generics for importing and exporting biological data formats. `import()` supports whole-file as well as chunk-wise iterative import. The `import()` interface optionally provides a standard mechanism for 'lazy' access via `filter()` (on row or element-like components of the file resource), `select()` (on column-like components of the file resource) and `collect()`. The `import()` interface optionally provides transparent access to remote (e.g. via https) as well as local access. Developers can register a file extension, e.g., `.loom` for dispatch from character-based URIs to specific `import()` / `export()` methods based on classes representing file types, e.g., `LoomFile()`.
Maintained by Marcel Ramos. Last updated 4 months ago.
annotationdataimportbioconductor-packagecore-package
1 stars 10.20 score 19 scripts 492 dependentsbioc
AnnotationFilter:Facilities for Filtering Bioconductor Annotation Resources
This package provides class and other infrastructure to implement filters for manipulating Bioconductor annotation resources. The filters will be used by ensembldb, Organism.dplyr, and other packages.
Maintained by Bioconductor Package Maintainer. Last updated 5 months ago.
annotationinfrastructuresoftwarebioconductor-packagecore-package
5 stars 10.19 score 45 scripts 160 dependentsbioc
BiocNeighbors:Nearest Neighbor Detection for Bioconductor Packages
Implements exact and approximate methods for nearest neighbor detection, in a framework that allows them to be easily switched within Bioconductor packages or workflows. Exact searches can be performed using the k-means for k-nearest neighbors algorithm or with vantage point trees. Approximate searches can be performed using the Annoy or HNSW libraries. Searching on either Euclidean or Manhattan distances is supported. Parallelization is achieved for all methods by using BiocParallel. Functions are also provided to search for all neighbors within a given distance.
Maintained by Aaron Lun. Last updated 24 days ago.
10.14 score 646 scripts 89 dependentsgeoffjentry
twitteR:R Based Twitter Client
Provides an interface to the Twitter web API.
Maintained by Jeff Gentry. Last updated 9 years ago.
254 stars 10.12 score 2.0k scripts 1 dependentsstewid
SimInf:A Framework for Data-Driven Stochastic Disease Spread Simulations
Provides an efficient and very flexible framework to conduct data-driven epidemiological modeling in realistic large scale disease spread simulations. The framework integrates infection dynamics in subpopulations as continuous-time Markov chains using the Gillespie stochastic simulation algorithm and incorporates available data such as births, deaths and movements as scheduled events at predefined time-points. Using C code for the numerical solvers and 'OpenMP' (if available) to divide work over multiple processors ensures high performance when simulating a sample outcome. One of our design goals was to make the package extendable and enable usage of the numerical solvers from other R extension packages in order to facilitate complex epidemiological research. The package contains template models and can be extended with user-defined models. For more details see the paper by Widgren, Bauer, Eriksson and Engblom (2019) <doi:10.18637/jss.v091.i12>. The package also provides functionality to fit models to time series data using the Approximate Bayesian Computation Sequential Monte Carlo ('ABC-SMC') algorithm of Toni and others (2009) <doi:10.1098/rsif.2008.0172>.
Maintained by Stefan Widgren. Last updated 17 days ago.
data-drivenepidemiologyhigh-performance-computingmarkov-chainmathematical-modellinggslopenmp
35 stars 10.09 score 227 scriptsmages
ChainLadder:Statistical Methods and Models for Claims Reserving in General Insurance
Various statistical methods and models which are typically used for the estimation of outstanding claims reserves in general insurance, including those to estimate the claims development result as required under Solvency II.
Maintained by Markus Gesmann. Last updated 2 months ago.
82 stars 10.04 score 196 scripts 2 dependentsinsightsengineering
teal.data:Data Model for 'teal' Applications
Provides a 'teal_data' class as a unified data model for 'teal' applications focusing on reproducibility and relational data.
Maintained by Dawid Kaledkowski. Last updated 2 months ago.
11 stars 9.93 score 44 scripts 8 dependentsbioc
methylumi:Handle Illumina methylation data
This package provides classes for holding and manipulating Illumina methylation data. Based on eSet, it can contain MIAME information, sample information, feature information, and multiple matrices of data. An "intelligent" import function, methylumiR can read the Illumina text files and create a MethyLumiSet. methylumIDAT can directly read raw IDAT files from HumanMethylation27 and HumanMethylation450 microarrays. Normalization, background correction, and quality control features for GoldenGate, Infinium, and Infinium HD arrays are also included.
Maintained by Sean Davis. Last updated 5 months ago.
dnamethylationtwochannelpreprocessingqualitycontrolcpgisland
9 stars 9.90 score 89 scripts 9 dependentsshinra-dev
memuse:Memory Estimation Utilities
How much ram do you need to store a 100,000 by 100,000 matrix? How much ram is your current R session using? How much ram do you even have? Learn the scintillating answer to these and many more such questions with the 'memuse' package.
Maintained by Drew Schmidt. Last updated 2 years ago.
46 stars 9.84 score 142 scripts 33 dependentsubod
apcluster:Affinity Propagation Clustering
Implements Affinity Propagation clustering introduced by Frey and Dueck (2007) <DOI:10.1126/science.1136800>. The algorithms are largely analogous to the 'Matlab' code published by Frey and Dueck. The package further provides leveraged affinity propagation and an algorithm for exemplar-based agglomerative clustering that can also be used to join clusters obtained from affinity propagation. Various plotting functions are available for analyzing clustering results.
Maintained by Ulrich Bodenhofer. Last updated 11 months ago.
10 stars 9.81 score 270 scripts 25 dependentsprestodb
RPresto:DBI Connector to Presto
Implements a 'DBI' compliant interface to Presto. Presto is an open source distributed SQL query engine for running interactive analytic queries against data sources of all sizes ranging from gigabytes to petabytes: <https://prestodb.io/>.
Maintained by Jarod G.R. Meng. Last updated 1 months ago.
132 stars 9.73 score 25 scripts 4 dependentsbioc
biocViews:Categorized views of R package repositories
Infrastructure to support 'views' used to classify Bioconductor packages. 'biocViews' are directed acyclic graphs of terms from a controlled vocabulary. There are three major classifications, corresponding to 'software', 'annotation', and 'experiment data' packages.
Maintained by Bioconductor Package Maintainer. Last updated 5 months ago.
infrastructurebioconductor-packagecore-package
4 stars 9.71 score 30 scripts 14 dependentsbioc
MicrobiotaProcess:A comprehensive R package for managing and analyzing microbiome and other ecological data within the tidy framework
MicrobiotaProcess is an R package for analysis, visualization and biomarker discovery of microbial datasets. It introduces MPSE class, this make it more interoperable with the existing computing ecosystem. Moreover, it introduces a tidy microbiome data structure paradigm and analysis grammar. It provides a wide variety of microbiome data analysis procedures under the unified and common framework (tidy-like framework).
Maintained by Shuangbin Xu. Last updated 5 months ago.
visualizationmicrobiomesoftwaremultiplecomparisonfeatureextractionmicrobiome-analysismicrobiome-data
183 stars 9.70 score 126 scripts 1 dependentssdctools
sdcMicro:Statistical Disclosure Control Methods for Anonymization of Data and Risk Estimation
Data from statistical agencies and other institutions are mostly confidential. This package, introduced in Templ, Kowarik and Meindl (2017) <doi:10.18637/jss.v067.i04>, can be used for the generation of anonymized (micro)data, i.e. for the creation of public- and scientific-use files. The theoretical basis for the methods implemented can be found in Templ (2017) <doi:10.1007/978-3-319-50272-4>. Various risk estimation and anonymization methods are included. Note that the package includes a graphical user interface published in Meindl and Templ (2019) <doi:10.3390/a12090191> that allows to use various methods of this package.
Maintained by Matthias Templ. Last updated 1 months ago.
84 stars 9.63 score 258 scriptsbioc
clusterExperiment:Compare Clusterings for Single-Cell Sequencing
Provides functionality for running and comparing many different clusterings of single-cell sequencing data or other large mRNA Expression data sets.
Maintained by Elizabeth Purdom. Last updated 5 months ago.
clusteringrnaseqsequencingsoftwaresinglecellcpp
38 stars 9.62 score 192 scripts 1 dependentsbioc
cytomapper:Visualization of highly multiplexed imaging data in R
Highly multiplexed imaging acquires the single-cell expression of selected proteins in a spatially-resolved fashion. These measurements can be visualised across multiple length-scales. First, pixel-level intensities represent the spatial distributions of feature expression with highest resolution. Second, after segmentation, expression values or cell-level metadata (e.g. cell-type information) can be visualised on segmented cell areas. This package contains functions for the visualisation of multiplexed read-outs and cell-level information obtained by multiplexed imaging technologies. The main functions of this package allow 1. the visualisation of pixel-level information across multiple channels, 2. the display of cell-level information (expression and/or metadata) on segmentation masks and 3. gating and visualisation of single cells.
Maintained by Lasse Meyer. Last updated 5 months ago.
immunooncologysoftwaresinglecellonechanneltwochannelmultiplecomparisonnormalizationdataimportbioimagingimaging-mass-cytometrysingle-cellspatial-analysis
32 stars 9.61 score 354 scripts 5 dependentsedzer
intervals:Tools for Working with Points and Intervals
Tools for working with and comparing sets of points and intervals.
Maintained by Edzer Pebesma. Last updated 7 months ago.
11 stars 9.50 score 122 scripts 98 dependentsbioc
snpStats:SnpMatrix and XSnpMatrix classes and methods
Classes and statistical methods for large SNP association studies. This extends the earlier snpMatrix package, allowing for uncertainty in genotypes.
Maintained by David Clayton. Last updated 5 months ago.
microarraysnpgeneticvariabilityzlib
9.48 score 674 scripts 20 dependentsbioc
RcisTarget:RcisTarget Identify transcription factor binding motifs enriched on a list of genes or genomic regions
RcisTarget identifies transcription factor binding motifs (TFBS) over-represented on a gene list. In a first step, RcisTarget selects DNA motifs that are significantly over-represented in the surroundings of the transcription start site (TSS) of the genes in the gene-set. This is achieved by using a database that contains genome-wide cross-species rankings for each motif. The motifs that are then annotated to TFs and those that have a high Normalized Enrichment Score (NES) are retained. Finally, for each motif and gene-set, RcisTarget predicts the candidate target genes (i.e. genes in the gene-set that are ranked above the leading edge).
Maintained by Gert Hulselmans. Last updated 5 months ago.
generegulationmotifannotationtranscriptomicstranscriptiongenesetenrichmentgenetarget
37 stars 9.47 score 191 scriptsbioc
bluster:Clustering Algorithms for Bioconductor
Wraps common clustering algorithms in an easily extended S4 framework. Backends are implemented for hierarchical, k-means and graph-based clustering. Several utilities are also provided to compare and evaluate clustering results.
Maintained by Aaron Lun. Last updated 5 months ago.
immunooncologysoftwaregeneexpressiontranscriptomicssinglecellclusteringcpp
9.43 score 636 scripts 51 dependentseagerai
fastai:Interface to 'fastai'
The 'fastai' <https://docs.fast.ai/index.html> library simplifies training fast and accurate neural networks using modern best practices. It is based on research in to deep learning best practices undertaken at 'fast.ai', including 'out of the box' support for vision, text, tabular, audio, time series, and collaborative filtering models.
Maintained by Turgut Abdullayev. Last updated 12 months ago.
audiocollaborative-filteringdarknetdarknet-image-classificationfastaimedicalobject-detectiontabulartextvision
118 stars 9.40 score 76 scriptsbioc
SpatialFeatureExperiment:Integrating SpatialExperiment with Simple Features in sf
A new S4 class integrating Simple Features with the R package sf to bring geospatial data analysis methods based on vector data to spatial transcriptomics. Also implements management of spatial neighborhood graphs and geometric operations. This pakage builds upon SpatialExperiment and SingleCellExperiment, hence methods for these parent classes can still be used.
Maintained by Lambda Moses. Last updated 2 months ago.
datarepresentationtranscriptomicsspatial
49 stars 9.40 score 322 scripts 1 dependentsbioc
ggmsa:Plot Multiple Sequence Alignment using 'ggplot2'
A visual exploration tool for multiple sequence alignment and associated data. Supports MSA of DNA, RNA, and protein sequences using 'ggplot2'. Multiple sequence alignment can easily be combined with other 'ggplot2' plots, such as phylogenetic tree Visualized by 'ggtree', boxplot, genome map and so on. More features: visualization of sequence logos, sequence bundles, RNA secondary structures and detection of sequence recombinations.
Maintained by Guangchuang Yu. Last updated 3 months ago.
softwarevisualizationalignmentannotationmultiplesequencealignment
210 stars 9.35 score 196 scripts 2 dependentshauselin
ollamar:'Ollama' Language Models
An interface to easily run local language models with 'Ollama' <https://ollama.com> server and API endpoints (see <https://github.com/ollama/ollama/blob/main/docs/api.md> for details). It lets you run open-source large language models locally on your machine.
Maintained by Hause Lin. Last updated 4 days ago.
89 stars 9.32 score 74 scripts 5 dependentsohdsi
Andromeda:Asynchronous Disk-Based Representation of Massive Data
Storing very large data objects on a local drive, while still making it possible to manipulate the data in an efficient manner.
Maintained by Martijn Schuemie. Last updated 7 months ago.
11 stars 9.29 score 57 scripts 8 dependentsotoomet
maxLik:Maximum Likelihood Estimation and Related Tools
Functions for Maximum Likelihood (ML) estimation, non-linear optimization, and related tools. It includes a unified way to call different optimizers, and classes and methods to handle the results from the Maximum Likelihood viewpoint. It also includes a number of convenience tools for testing and developing your own models.
Maintained by Ott Toomet. Last updated 1 years ago.
9.14 score 480 scripts 110 dependentsdavid-cortes
MatrixExtra:Extra Methods for Sparse Matrices
Extends sparse matrix and vector classes from the 'Matrix' package by providing: (a) Methods and operators that work natively on CSR formats (compressed sparse row, a.k.a. 'RsparseMatrix') such as slicing/sub-setting, assignment, rbind(), mathematical operators for CSR and COO such as addition ("+") or sqrt(), and methods such as diag(); (b) Multi-threaded matrix multiplication and cross-product for many <sparse, dense> types, including the 'float32' type from 'float'; (c) Coercion methods between pairs of classes which are not present in 'Matrix', such as 'dgCMatrix' -> 'ngRMatrix', as well as convenience conversion functions; (d) Utility functions for sparse matrices such as sorting the indices or removing zero-valued entries; (e) Fast transposes that work by outputting in the opposite storage format; (f) Faster replacements for many 'Matrix' methods for all sparse types, such as slicing and elementwise multiplication. (g) Convenience functions for sparse objects, such as 'mapSparse' or a shorter 'show' method.
Maintained by David Cortes. Last updated 9 months ago.
csrsparse-matrixopenblascppopenmp
20 stars 9.08 score 84 scripts 29 dependentsinsightsengineering
teal.code:Code Storage and Execution Class for 'teal' Applications
Introduction of 'qenv' S4 class, that facilitates code execution and reproducibility in 'teal' applications.
Maintained by Dawid Kaledkowski. Last updated 1 months ago.
12 stars 9.03 score 11 scripts 9 dependentsbioc
topGO:Enrichment Analysis for Gene Ontology
topGO package provides tools for testing GO terms while accounting for the topology of the GO graph. Different test statistics and different methods for eliminating local similarities and dependencies between GO terms can be implemented and applied.
Maintained by Adrian Alexa. Last updated 5 months ago.
8.96 score 2.0k scripts 20 dependentsbpfaff
urca:Unit Root and Cointegration Tests for Time Series Data
Unit root and cointegration tests encountered in applied econometric analysis are implemented.
Maintained by Bernhard Pfaff. Last updated 10 months ago.
6 stars 8.95 score 1.4k scripts 270 dependentsbioc
RaggedExperiment:Representation of Sparse Experiments and Assays Across Samples
This package provides a flexible representation of copy number, mutation, and other data that fit into the ragged array schema for genomic location data. The basic representation of such data provides a rectangular flat table interface to the user with range information in the rows and samples/specimen in the columns. The RaggedExperiment class derives from a GRangesList representation and provides a semblance of a rectangular dataset.
Maintained by Marcel Ramos. Last updated 4 months ago.
infrastructuredatarepresentationcopynumbercore-packagedata-structuremutationsu24ca289073
4 stars 8.93 score 76 scripts 14 dependentsbioc
marray:Exploratory analysis for two-color spotted microarray data
Class definitions for two-color spotted microarray data. Fuctions for data input, diagnostic plots, normalization and quality checking.
Maintained by Yee Hwa (Jean) Yang. Last updated 5 months ago.
microarraytwochannelpreprocessing
8.92 score 222 scripts 38 dependentsquanteda
quanteda.textstats:Textual Statistics for the Quantitative Analysis of Textual Data
Textual statistics functions formerly in the 'quanteda' package. Textual statistics for characterizing and comparing textual data. Includes functions for measuring term and document frequency, the co-occurrence of words, similarity and distance between features and documents, feature entropy, keyword occurrence, readability, and lexical diversity. These functions extend the 'quanteda' package and are specially designed for sparse textual data.
Maintained by Kenneth Benoit. Last updated 7 months ago.
15 stars 8.91 score 916 scripts 10 dependentscran
XML:Tools for Parsing and Generating XML Within R and S-Plus
Many approaches for both reading and creating XML (and HTML) documents (including DTDs), both local and accessible via HTTP or FTP. Also offers access to an 'XPath' "interpreter".
Maintained by CRAN Team. Last updated 3 months ago.
3 stars 8.87 score 1.3k dependentsbioc
AnVIL:Bioconductor on the AnVIL compute environment
The AnVIL is a cloud computing resource developed in part by the National Human Genome Research Institute. The AnVIL package provides end-user and developer functionality. For the end-user, AnVIL provides fast binary package installation, utitlities for working with Terra / AnVIL table and data resources, and convenient functions for file movement to and from Google cloud storage. For developers, AnVIL provides programatic access to the Terra, Leonardo, Rawls, and Dockstore RESTful programming interface, including helper functions to transform JSON responses to formats more amenable to manipulation in R.
Maintained by Marcel Ramos. Last updated 1 months ago.
8.85 score 250 scripts 11 dependentskollerma
robustlmm:Robust Linear Mixed Effects Models
Implements the Robust Scoring Equations estimator to fit linear mixed effects models robustly. Robustness is achieved by modification of the scoring equations combined with the Design Adaptive Scale approach.
Maintained by Manuel Koller. Last updated 1 years ago.
28 stars 8.79 score 138 scriptsflr
FLCore:Core Package of FLR, Fisheries Modelling in R
Core classes and methods for FLR, a framework for fisheries modelling and management strategy simulation in R. Developed by a team of fisheries scientists in various countries. More information can be found at <http://flr-project.org/>.
Maintained by Iago Mosqueira. Last updated 8 days ago.
fisheriesflrfisheries-modelling
16 stars 8.78 score 956 scripts 23 dependentsr-forge
distr:Object Oriented Implementation of Distributions
S4-classes and methods for distributions.
Maintained by Peter Ruckdeschel. Last updated 2 months ago.
8.77 score 327 scripts 32 dependentsbioc
SeqVarTools:Tools for variant data
An interface to the fast-access storage format for VCF data provided in SeqArray, with tools for common operations and analysis.
Maintained by Stephanie M. Gogarten. Last updated 5 months ago.
snpgeneticvariabilitysequencinggenetics
3 stars 8.76 score 384 scripts 2 dependentsbart1
move:Visualizing and Analyzing Animal Track Data
Contains functions to access movement data stored in 'movebank.org' as well as tools to visualize and statistically analyze animal movement data, among others functions to calculate dynamic Brownian Bridge Movement Models. Move helps addressing movement ecology questions.
Maintained by Bart Kranstauber. Last updated 4 months ago.
8.76 score 690 scripts 3 dependentsclementcalenge
adehabitatHR:Home Range Estimation
A collection of tools for the estimation of animals home range.
Maintained by Clement Calenge. Last updated 7 months ago.
6 stars 8.73 score 752 scripts 9 dependentsbioc
GenomicScores:Infrastructure to work with genomewide position-specific scores
Provide infrastructure to store and access genomewide position-specific scores within R and Bioconductor.
Maintained by Robert Castelo. Last updated 2 months ago.
infrastructuregeneticsannotationsequencingcoverageannotationhubsoftware
8 stars 8.71 score 83 scripts 6 dependentsbioc
trackViewer:A R/Bioconductor package with web interface for drawing elegant interactive tracks or lollipop plot to facilitate integrated analysis of multi-omics data
Visualize mapped reads along with annotation as track layers for NGS dataset such as ChIP-seq, RNA-seq, miRNA-seq, DNA-seq, SNPs and methylation data.
Maintained by Jianhong Ou. Last updated 3 months ago.
8.71 score 145 scripts 2 dependentspilaboratory
sads:Maximum Likelihood Models for Species Abundance Distributions
Maximum likelihood tools to fit and compare models of species abundance distributions and of species rank-abundance distributions.
Maintained by Paulo I. Prado. Last updated 1 years ago.
23 stars 8.66 score 244 scripts 3 dependentsbioc
pRoloc:A unifying bioinformatics framework for spatial proteomics
The pRoloc package implements machine learning and visualisation methods for the analysis and interogation of quantitiative mass spectrometry data to reliably infer protein sub-cellular localisation.
Maintained by Lisa Breckels. Last updated 1 months ago.
immunooncologyproteomicsmassspectrometryclassificationclusteringqualitycontrolbioconductorproteomics-dataspatial-proteomicsvisualisationopenblascpp
15 stars 8.64 score 101 scripts 2 dependentsbioc
QuasR:Quantify and Annotate Short Reads in R
This package provides a framework for the quantification and analysis of Short Reads. It covers a complete workflow starting from raw sequence reads, over creation of alignments and quality control plots, to the quantification of genomic regions of interest. Read alignments are either generated through Rbowtie (data from DNA/ChIP/ATAC/Bis-seq experiments) or Rhisat2 (data from RNA-seq experiments that require spliced alignments), or can be provided in the form of bam files.
Maintained by Michael Stadler. Last updated 1 months ago.
geneticspreprocessingsequencingchipseqrnaseqmethylseqcoveragealignmentqualitycontrolimmunooncologycurlbzip2xz-utilszlibcpp
6 stars 8.63 score 79 scripts 1 dependentsdjvanderlaan
LaF:Fast Access to Large ASCII Files
Methods for fast access to large ASCII files. Currently the following file formats are supported: comma separated format (CSV) and fixed width format. It is assumed that the files are too large to fit into memory, although the package can also be used to efficiently access files that do fit into memory. Methods are provided to access and process files blockwise. Furthermore, an opened file can be accessed as one would an ordinary data.frame. The LaF vignette gives an overview of the functionality provided.
Maintained by Jan van der Laan. Last updated 4 months ago.
54 stars 8.62 score 61 scripts 5 dependentsactuaryzhang
cplm:Compound Poisson Linear Models
Likelihood-based and Bayesian methods for various compound Poisson linear models based on Zhang, Yanwei (2013) <doi:10.1007/s11222-012-9343-7>.
Maintained by Yanwei (Wayne) Zhang. Last updated 1 years ago.
16 stars 8.55 score 75 scripts 10 dependentsbioc
MsExperiment:Infrastructure for Mass Spectrometry Experiments
Infrastructure to store and manage all aspects related to a complete proteomics or metabolomics mass spectrometry (MS) experiment. The MsExperiment package provides light-weight and flexible containers for MS experiments building on the new MS infrastructure provided by the Spectra, QFeatures and related packages. Along with raw data representations, links to original data files and sample annotations, additional metadata or annotations can also be stored within the MsExperiment container. To guarantee maximum flexibility only minimal constraints are put on the type and content of the data within the containers.
Maintained by Laurent Gatto. Last updated 2 months ago.
infrastructureproteomicsmassspectrometrymetabolomicsexperimentaldesigndataimport
5 stars 8.51 score 126 scripts 14 dependentsbioc
ReactomeGSA:Client for the Reactome Analysis Service for comparative multi-omics gene set analysis
The ReactomeGSA packages uses Reactome's online analysis service to perform a multi-omics gene set analysis. The main advantage of this package is, that the retrieved results can be visualized using REACTOME's powerful webapplication. Since Reactome's analysis service also uses R to perfrom the actual gene set analysis you will get similar results when using the same packages (such as limma and edgeR) locally. Therefore, if you only require a gene set analysis, different packages are more suited.
Maintained by Johannes Griss. Last updated 4 months ago.
genesetenrichmentproteomicstranscriptomicssystemsbiologygeneexpressionreactome
22 stars 8.50 score 67 scripts 1 dependentsbioc
vsn:Variance stabilization and calibration for microarray data
The package implements a method for normalising microarray intensities from single- and multiple-color arrays. It can also be used for data from other technologies, as long as they have similar format. The method uses a robust variant of the maximum-likelihood estimator for an additive-multiplicative error model and affine calibration. The model incorporates data calibration step (a.k.a. normalization), a model for the dependence of the variance on the mean intensity and a variance stabilizing data transformation. Differences between transformed intensities are analogous to "normalized log-ratios". However, in contrast to the latter, their variance is independent of the mean, and they are usually more sensitive and specific in detecting differential transcription.
Maintained by Wolfgang Huber. Last updated 5 months ago.
microarrayonechanneltwochannelpreprocessing
8.49 score 924 scripts 51 dependentsbioc
pwalign:Perform pairwise sequence alignments
The two main functions in the package are pairwiseAlignment() and stringDist(). The former solves (Needleman-Wunsch) global alignment, (Smith-Waterman) local alignment, and (ends-free) overlap alignment problems. The latter computes the Levenshtein edit distance or pairwise alignment score matrix for a set of strings.
Maintained by Hervé Pagès. Last updated 10 days ago.
alignmentsequencematchingsequencinggeneticsbioconductor-package
1 stars 8.48 score 27 scripts 104 dependentsbioc
ClassifyR:A framework for cross-validated classification problems, with applications to differential variability and differential distribution testing
The software formalises a framework for classification and survival model evaluation in R. There are four stages; Data transformation, feature selection, model training, and prediction. The requirements of variable types and variable order are fixed, but specialised variables for functions can also be provided. The framework is wrapped in a driver loop that reproducibly carries out a number of cross-validation schemes. Functions for differential mean, differential variability, and differential distribution are included. Additional functions may be developed by the user, by creating an interface to the framework.
Maintained by Dario Strbenac. Last updated 4 days ago.
6 stars 8.46 score 45 scripts 3 dependentsbioc
multiMiR:Integration of multiple microRNA-target databases with their disease and drug associations
A collection of microRNAs/targets from external resources, including validated microRNA-target databases (miRecords, miRTarBase and TarBase), predicted microRNA-target databases (DIANA-microT, ElMMo, MicroCosm, miRanda, miRDB, PicTar, PITA and TargetScan) and microRNA-disease/drug databases (miR2Disease, Pharmaco-miR VerSe and PhenomiR).
Maintained by Spencer Mahaffey. Last updated 5 months ago.
mirnadatahomo_sapiens_datamus_musculus_datarattus_norvegicus_dataorganismdatamicrorna-sequencesql
20 stars 8.45 score 141 scriptsbioc
ScaledMatrix:Creating a DelayedMatrix of Scaled and Centered Values
Provides delayed computation of a matrix of scaled and centered values. The result is equivalent to using the scale() function but avoids explicit realization of a dense matrix during block processing. This permits greater efficiency in common operations, most notably matrix multiplication.
Maintained by Aaron Lun. Last updated 2 months ago.
8.44 score 10 scripts 105 dependentsbioc
UniProt.ws:R Interface to UniProt Web Services
The Universal Protein Resource (UniProt) is a comprehensive resource for protein sequence and annotation data. This package provides a collection of functions for retrieving, processing, and re-packaging UniProt web services. The package makes use of UniProt's modernized REST API and allows mapping of identifiers accross different databases.
Maintained by Marcel Ramos. Last updated 3 months ago.
annotationinfrastructuregokeggbiocartabioconductor-packagecore-package
4 stars 8.38 score 167 scripts 4 dependentsbioc
CompoundDb:Creating and Using (Chemical) Compound Annotation Databases
CompoundDb provides functionality to create and use (chemical) compound annotation databases from a variety of different sources such as LipidMaps, HMDB, ChEBI or MassBank. The database format allows to store in addition MS/MS spectra along with compound information. The package provides also a backend for Bioconductor's Spectra package and allows thus to match experimetal MS/MS spectra against MS/MS spectra in the database. Databases can be stored in SQLite format and are thus portable.
Maintained by Johannes Rainer. Last updated 2 months ago.
massspectrometrymetabolomicsannotationdatabasesmass-spectrometry
17 stars 8.35 score 69 scripts 1 dependentsbioc
csaw:ChIP-Seq Analysis with Windows
Detection of differentially bound regions in ChIP-seq data with sliding windows, with methods for normalization and proper FDR control.
Maintained by Aaron Lun. Last updated 2 months ago.
multiplecomparisonchipseqnormalizationsequencingcoveragegeneticsannotationdifferentialpeakcallingcurlbzip2xz-utilszlibcpp
8.32 score 498 scripts 7 dependentsbioc
rols:An R interface to the Ontology Lookup Service
The rols package is an interface to the Ontology Lookup Service (OLS) to access and query hundred of ontolgies directly from R.
Maintained by Laurent Gatto. Last updated 5 months ago.
immunooncologysoftwareannotationmassspectrometrygo
11 stars 8.30 score 89 scripts 5 dependentstomasfryda
h2o:R Interface for the 'H2O' Scalable Machine Learning Platform
R interface for 'H2O', the scalable open source machine learning platform that offers parallelized implementations of many supervised and unsupervised machine learning algorithms such as Generalized Linear Models (GLM), Gradient Boosting Machines (including XGBoost), Random Forests, Deep Neural Networks (Deep Learning), Stacked Ensembles, Naive Bayes, Generalized Additive Models (GAM), ANOVA GLM, Cox Proportional Hazards, K-Means, PCA, ModelSelection, Word2Vec, as well as a fully automatic machine learning algorithm (H2O AutoML).
Maintained by Tomas Fryda. Last updated 1 years ago.
3 stars 8.20 score 7.8k scripts 11 dependentscran
flexmix:Flexible Mixture Modeling
A general framework for finite mixtures of regression models using the EM algorithm is implemented. The E-step and all data handling are provided, while the M-step can be supplied by the user to easily define new models. Existing drivers implement mixtures of standard linear models, generalized linear models and model-based clustering.
Maintained by Bettina Gruen. Last updated 29 days ago.
5 stars 8.19 score 113 dependentsrdpeng
filehash:Simple Key-Value Database
Implements a simple key-value style database where character string keys are associated with data values that are stored on the disk. A simple interface is provided for inserting, retrieving, and deleting data from the database. Utilities are provided that allow 'filehash' databases to be treated much like environments and lists are already used in R. These utilities are provided to encourage interactive and exploratory analysis on large datasets. Three different file formats for representing the database are currently available and new formats can easily be incorporated by third parties for use in the 'filehash' framework.
Maintained by Roger D. Peng. Last updated 2 years ago.
24 stars 8.16 score 78 scripts 11 dependentsbioc
nullranges:Generation of null ranges via bootstrapping or covariate matching
Modular package for generation of sets of ranges representing the null hypothesis. These can take the form of bootstrap samples of ranges (using the block bootstrap framework of Bickel et al 2010), or sets of control ranges that are matched across one or more covariates. nullranges is designed to be inter-operable with other packages for analysis of genomic overlap enrichment, including the plyranges Bioconductor package.
Maintained by Michael Love. Last updated 5 months ago.
visualizationgenesetenrichmentfunctionalgenomicsepigeneticsgeneregulationgenetargetgenomeannotationannotationgenomewideassociationhistonemodificationchipseqatacseqdnaseseqrnaseqhiddenmarkovmodelbioconductorbootstrapgenomicsmatchingstatistics
27 stars 8.16 score 50 scripts 1 dependentsbioc
BioQC:Detect tissue heterogeneity in expression profiles with gene sets
BioQC performs quality control of high-throughput expression data based on tissue gene signatures. It can detect tissue heterogeneity in gene expression data. The core algorithm is a Wilcoxon-Mann-Whitney test that is optimised for high performance.
Maintained by Jitao David Zhang. Last updated 5 months ago.
geneexpressionqualitycontrolstatisticalmethodgenesetenrichmentcpp
5 stars 8.16 score 86 scriptsbioc
dreamlet:Scalable differential expression analysis of single cell transcriptomics datasets with complex study designs
Recent advances in single cell/nucleus transcriptomic technology has enabled collection of cohort-scale datasets to study cell type specific gene expression differences associated disease state, stimulus, and genetic regulation. The scale of these data, complex study designs, and low read count per cell mean that characterizing cell type specific molecular mechanisms requires a user-frieldly, purpose-build analytical framework. We have developed the dreamlet package that applies a pseudobulk approach and fits a regression model for each gene and cell cluster to test differential expression across individuals associated with a trait of interest. Use of precision-weighted linear mixed models enables accounting for repeated measures study designs, high dimensional batch effects, and varying sequencing depth or observed cells per biosample.
Maintained by Gabriel Hoffman. Last updated 3 days ago.
rnaseqgeneexpressiondifferentialexpressionbatcheffectqualitycontrolregressiongenesetenrichmentgeneregulationepigeneticsfunctionalgenomicstranscriptomicsnormalizationsinglecellpreprocessingsequencingimmunooncologysoftwarecpp
12 stars 8.14 score 128 scripts