Biostrings:Efficient manipulation of biological strings
Memory efficient string containers, string matching algorithms, and other utilities, for fast manipulation of large biological sequences or sets of sequences.
Maintained by Hervé Pagès. Last updated 1 month ago.
62 stars 17.77 score 8.6k scripts 1.2k dependents
bioc
GenomicRanges:Representation and manipulation of genomic intervals
The ability to efficiently represent and manipulate genomic annotations and alignments is playing a central role when it comes to analyzing high-throughput sequencing data (a.k.a. NGS data). The GenomicRanges package defines general purpose containers for storing and manipulating genomic intervals and variables defined along a genome. More specialized containers for representing and manipulating short alignments against a reference genome, or a matrix-like summarization of an experiment, are defined in the GenomicAlignments and SummarizedExperiment packages, respectively. Both packages build on top of the GenomicRanges infrastructure.
Maintained by Hervé Pagès. Last updated 4 months ago.
44 stars 17.68 score 13k scripts 1.3k dependents
bioc
SummarizedExperiment:A container (S4 class) for matrix-like assays
The SummarizedExperiment container contains one or more assays, each represented by a matrix-like object of numeric or other mode. The rows typically represent genomic ranges of interest and the columns represent samples.
Maintained by Hervé Pagès. Last updated 5 months ago.
34 stars 16.84 score 8.6k scripts 1.2k dependents
bioc
IRanges:Foundation of integer range manipulation in Bioconductor
Provides efficient low-level and highly reusable S4 classes for storing, manipulating and aggregating over annotated ranges of integers. Implements an algebra of range operations, including efficient algorithms for finding overlaps and nearest neighbors. Defines efficient list-like classes for storing, transforming and aggregating large grouped data, i.e., collections of atomic vectors and DataFrames.
Maintained by Hervé Pagès. Last updated 2 months ago.
22 stars 16.09 score 2.1k scripts 1.8k dependents
bioc
GenomicAlignments:Representation and manipulation of short genomic alignments
Provides efficient containers for storing and manipulating short genomic alignments (typically obtained by aligning short reads to a reference genome). This includes read counting, computing the coverage, junction detection, and working with the nucleotide content of the alignments.
Maintained by Hervé Pagès. Last updated 5 months ago.
10 stars 15.21 score 3.1k scripts 528 dependents
mhahsler
arules:Mining Association Rules and Frequent Itemsets
Provides the infrastructure for representing, manipulating and analyzing transaction data and patterns (frequent itemsets and association rules). Also provides C implementations of the association mining algorithms Apriori and Eclat. Hahsler, Gruen and Hornik (2005) <doi:10.18637/jss.v014.i15>.
Maintained by Michael Hahsler. Last updated 2 months ago.
194 stars 13.99 score 3.3k scripts 28 dependents
bioc
microbiome:Microbiome Analytics
Utilities for microbiome analysis.
Maintained by Leo Lahti. Last updated 5 months ago.
293 stars 12.51 score 2.0k scripts 5 dependents
stuart-lab
Signac:Analysis of Single-Cell Chromatin Data
A framework for the analysis and exploration of single-cell chromatin data. The 'Signac' package contains functions for quantifying single-cell chromatin data, computing per-cell quality control metrics, dimension reduction and normalization, visualization, and DNA sequence motif analysis. Reference: Stuart et al. (2021) <doi:10.1038/s41592-021-01282-5>.
Maintained by Tim Stuart. Last updated 7 months ago.
355 stars 12.18 score 3.7k scripts 1 dependent
bioc
ShortRead:FASTQ input and manipulation
This package implements sampling, iteration, and input of FASTQ files. The package includes functions for filtering and trimming reads, and for generating a quality assessment report. Data are represented as DNAStringSet-derived objects, and easily manipulated for a diversity of purposes. The package also contains legacy support for early single-end, ungapped alignment formats.
Maintained by Bioconductor Package Maintainer. Last updated 5 months ago.
8 stars 12.08 score 1.8k scripts 49 dependents
emf-creaf
indicspecies:Relationship Between Species and Groups of Sites
Functions to assess the strength and statistical significance of the relationship between species occurrence/abundance and groups of sites [De Caceres & Legendre (2009) <doi:10.1890/08-1823.1>]. Also includes functions to measure species niche breadth using resource categories [De Caceres et al. (2011) <doi:10.1111/J.1600-0706.2011.19679.x>].
Maintained by Miquel De Cáceres. Last updated 1 month ago.
10 stars 9.66 score 386 scripts 4 dependents
vigou3
actuar:Actuarial Functions and Heavy Tailed Distributions
Functions and data sets for actuarial science: modeling of loss distributions; risk theory and ruin theory; simulation of compound models, discrete mixtures and compound hierarchical models; credibility theory. Support for many additional probability distributions to model insurance loss size and frequency: 23 continuous heavy tailed distributions; the Poisson-inverse Gaussian discrete distribution; zero-truncated and zero-modified extensions of the standard discrete distributions. Support for phase-type distributions commonly used to compute ruin probabilities. Main reference: <doi:10.18637/jss.v025.i07>. Implementation of the Feller-Pareto family of distributions: <doi:10.18637/jss.v103.i06>.
Maintained by Vincent Goulet. Last updated 3 months ago.
12 stars 9.44 score 732 scripts 35 dependents
ohdsi
Cyclops:Cyclic Coordinate Descent for Logistic, Poisson and Survival Analysis
This model fitting tool incorporates cyclic coordinate descent and majorization-minimization approaches to fit a variety of regression models found in large-scale observational healthcare data. Implementations focus on computational optimization and fine-scale parallelization to yield efficient inference in massive datasets. Please see: Suchard, Simpson, Zorych, Ryan and Madigan (2013) <doi:10.1145/2414416.2414791>.
Maintained by Marc A. Suchard. Last updated 4 months ago.
39 stars 9.05 score 73 scripts 4 dependents
bioc
pwalign:Perform pairwise sequence alignments
The two main functions in the package are pairwiseAlignment() and stringDist(). The former solves (Needleman-Wunsch) global alignment, (Smith-Waterman) local alignment, and (ends-free) overlap alignment problems. The latter computes the Levenshtein edit distance or pairwise alignment score matrix for a set of strings.
Maintained by Hervé Pagès. Last updated 10 days ago.
1 star 8.48 score 27 scripts 104 dependents
ropensci
ritis:Integrated Taxonomic Information System Client
An interface to the Integrated Taxonomic Information System ('ITIS'). Includes functions to work with the 'ITIS' REST API methods, as well as the 'Solr' web service.
Maintained by Julia Blum. Last updated 2 months ago.
16 stars 7.72 score 64 scripts 24 dependents
projectmosaic
mosaicCore:Common Utilities for Other MOSAIC-Family Packages
Common utilities used in other MOSAIC-family packages are collected here.
Maintained by Randall Pruim. Last updated 1 year ago.
1 star 7.07 score 113 scripts 26 dependents
fcharte
mldr:Exploratory Data Analysis and Manipulation of Multi-Label Data Sets
Exploratory data analysis and manipulation functions for multi-label data sets along with an interactive Shiny application to ease their use.
Maintained by David Charte. Last updated 5 years ago.
23 stars 7.07 score 168 scripts 2 dependents
bioc
GenomicFiles:Distributed computing by file or by range
This package provides infrastructure for parallel computations distributed 'by file' or 'by range'. User defined MAPPER and REDUCER functions provide added flexibility for data combination and manipulation.
Maintained by Bioconductor Package Maintainer. Last updated 5 months ago.
6.86 score 89 scripts 16 dependents
chelbert
DiceDesign:Designs of Computer Experiments
Space-filling designs and space-filling criteria (distance-based and uniformity-based), with emphasis on computer experiments; <doi:10.18637/jss.v065.i11>.
Maintained by Celine Helbert. Last updated 1 year ago.
6.12 score 231 scripts 64 dependents
bioc
HiContacts:Analysing cool files in R with HiContacts
HiContacts provides a collection of tools to analyse and visualize Hi-C datasets imported in R by HiCExperiment.
Maintained by Jacques Serizay. Last updated 10 days ago.
12 stars 6.07 score 49 scripts
njtierney
maxcovr:A Set of Tools For Solving The Maximal Covering Location Problem
Solving the "maximal covering location problem" as described by Church can be difficult for users not familiar with linear programming. maxcovr provides functions to make it easy to solve this problem, and tools to calculate facility coverage.
Maintained by Nicholas Tierney. Last updated 4 months ago.
43 stars 6.05 score 43 scripts
bioc
GenomicTuples:Representation and Manipulation of Genomic Tuples
GenomicTuples defines general purpose containers for storing genomic tuples. It aims to provide functionality for tuples of genomic co-ordinates that are analogous to those available for genomic ranges in the GenomicRanges Bioconductor package.
Maintained by Peter Hickey. Last updated 5 months ago.
4 stars 5.48 score 7 scripts
emilio-berti
GHCNr:Download Weather Station Data from GHCNd
The goal of 'GHCNr' is to provide a fast and friendly interface with the Global Historical Climatology Network daily (GHCNd) database, which contains daily summaries of weather station data worldwide. GHCNd is accessed through its web API. The main functionalities of 'GHCNr' are downloading data from GHCNd, filtering it, and aggregating it at monthly and annual scales.
Maintained by Emilio Berti. Last updated 3 months ago.
2 stars 4.95 score 3 scripts
martirm
clustAnalytics:Cluster Evaluation on Graphs
Evaluates the stability and significance of clusters on 'igraph' graphs. Supports weighted and unweighted graphs. Implements the cluster evaluation methods defined by Arratia A, Renedo M (2021) <doi:10.7717/peerj-cs.600>. Also includes an implementation of the Reduced Mutual Information introduced by Newman et al. (2020) <doi:10.1103/PhysRevE.101.042304>.
Maintained by Martí Renedo Mirambell. Last updated 1 year ago.
5 stars 4.92 score 33 scripts
ericmarcon
divent:Entropy Partitioning to Measure Diversity
Measurement and partitioning of diversity, based on Tsallis entropy, following Marcon and Herault (2015) <doi:10.18637/jss.v067.i08>. 'divent' provides functions to estimate alpha, beta and gamma diversity of communities, including phylogenetic and functional diversity.
Maintained by Eric Marcon. Last updated 1 month ago.
1 star 4.78 score 1 script
bioc
scmeth:Functions to conduct quality control analysis in methylation data
Functions to analyze methylation data can be found here. Some functions are specific to single-cell methylation data, but most can be used for any methylation data. A highlight of this workflow is the comprehensive quality control report.
Maintained by Divy Kangeyan. Last updated 5 months ago.
4.70 score 5 scripts
koenniem
mpathsenser:Process and Analyse Data from m-Path Sense
Overcomes one of the major challenges in mobile (passive) sensing, namely being able to pre-process the raw data that comes from a mobile sensing app, specifically 'm-Path Sense'. The main task of 'mpathsenser' is therefore to read 'm-Path Sense' JSON files into a database and provide several convenience functions to aid in data processing.
Maintained by Koen Niemeijer. Last updated 1 month ago.
1 star 4.30 score 6 scripts
cran
BAT:Biodiversity Assessment Tools
Includes algorithms to assess alpha and beta diversity in all their dimensions (taxonomic, phylogenetic and functional). It allows performing a number of analyses based on species identities/abundances, phylogenetic/functional distances, trees, convex-hulls or kernel density n-dimensional hypervolumes depicting species relationships. Cardoso et al. (2015) <doi:10.1111/2041-210X.12310>.
Maintained by Pedro Cardoso. Last updated 1 year ago.
3.17 score 3 dependents
jeinbeck-code
LPCM:Local Principal Curve Methods
Fitting multivariate data patterns with local principal curves, including tools for data compression (projection) and measuring goodness-of-fit; with some additional functions for mean shift clustering. See Einbeck, Tutz and Evers (2005) <doi:10.1007/s11222-005-4073-8> and Ameijeiras-Alonso and Einbeck (2023) <doi:10.1007/s11634-023-00575-1>.
Maintained by Jochen Einbeck. Last updated 7 months ago.
3.09 score 35 scripts 1 dependent
rwparsons
simMetric:Metrics (with Uncertainty) for Simulation Studies that Evaluate Statistical Methods
Allows users to quickly apply individual or multiple metrics to evaluate Monte Carlo simulation studies.
Maintained by Rex Parsons. Last updated 2 years ago.
2.70 score 2 scripts
cran
arulesSequences:Mining Frequent Sequences
Add-on for arules to handle and mine frequent sequences. Provides interfaces to the C++ implementation of cSPADE by Mohammed J. Zaki.
Maintained by Christian Buchta. Last updated 7 months ago.
12 stars 2.63 score
mhoehle
binomSamSize:Confidence Intervals and Sample Size Determination for a Binomial Proportion under Simple Random Sampling and Pooled Sampling
A suite of functions to compute confidence intervals and necessary sample sizes for the parameter p of the Bernoulli B(p) distribution under simple random sampling or under pooled sampling. Such computations are of interest, for example, when investigating incidence or prevalence in populations. The package contains functions to compute coverage probabilities and coverage coefficients of the provided confidence interval procedures. Sample size calculations are based on expected length.
Maintained by Michael Hoehle. Last updated 1 year ago.
2 stars 2.18 score 15 scripts
andrea-fasulo
SAEval:Small Area Estimation Evaluation
Allows users to produce diagnostic procedures and graphic tools for the evaluation of Small Area estimators.
Maintained by Andrea Fasulo. Last updated 2 years ago.
1.00 score