R-universe search: exports:summary

stan-dev

rstan:R Interface to Stan

User-facing R functions are provided to parse, compile, test, estimate, and analyze Stan models by accessing the header-only Stan library provided by the 'StanHeaders' package. The Stan project develops a probabilistic programming language that implements full Bayesian statistical inference via Markov Chain Monte Carlo, rough Bayesian inference via 'variational' approximation, and (optionally penalized) maximum likelihood estimation via optimization. In all three cases, automatic differentiation is used to quickly and accurately evaluate gradients without burdening the user with the need to derive the partial derivatives.

Maintained by Ben Goodrich. Last updated 4 days ago.

bayesian-data-analysis bayesian-inference bayesian-statistics mcmc stan cpp

1.1k stars 18.84 score 14k scripts 281 dependents

edzer

sp:Classes and Methods for Spatial Data

Classes and methods for spatial data; the classes document where the spatial location information resides, for 2D or 3D data. Utility functions are provided, e.g. for plotting data as maps, spatial selection, as well as methods for retrieving coordinates, for subsetting, print, summary, etc. From this version, 'rgdal', 'maptools', and 'rgeos' are no longer used at all; see <https://r-spatial.org/r/2023/05/15/evolution4.html> for details.

Maintained by Edzer Pebesma. Last updated 2 months ago.

127 stars 18.63 score 35k scripts 1.3k dependents

bioc

Biostrings:Efficient manipulation of biological strings

Memory efficient string containers, string matching algorithms, and other utilities, for fast manipulation of large biological sequences or sets of sequences.

Maintained by Hervé Pagès. Last updated 1 month ago.

sequencematching alignment sequencing genetics dataimport datarepresentation infrastructure bioconductor-package core-package

62 stars 17.77 score 8.6k scripts 1.2k dependents

bioc

GenomicRanges:Representation and manipulation of genomic intervals

The ability to efficiently represent and manipulate genomic annotations and alignments is playing a central role when it comes to analyzing high-throughput sequencing data (a.k.a. NGS data). The GenomicRanges package defines general purpose containers for storing and manipulating genomic intervals and variables defined along a genome. More specialized containers for representing and manipulating short alignments against a reference genome, or a matrix-like summarization of an experiment, are defined in the GenomicAlignments and SummarizedExperiment packages, respectively. Both packages build on top of the GenomicRanges infrastructure.

Maintained by Hervé Pagès. Last updated 4 months ago.

genetics infrastructure datarepresentation sequencing annotation genomeannotation coverage bioconductor-package core-package

44 stars 17.68 score 13k scripts 1.3k dependents

rspatial

terra:Spatial Data Analysis

Methods for spatial data analysis with vector (points, lines, polygons) and raster (grid) data. Methods for vector data include geometric operations such as intersect and buffer. Raster methods include local, focal, global, zonal and geometric operations. The predict and interpolate methods facilitate the use of regression type (interpolation, machine learning) models for spatial prediction, including with satellite remote sensing data. Processing of very large files is supported. See the manual and tutorials on <https://rspatial.org/> to get started. 'terra' replaces the 'raster' package ('terra' can do more, and it is faster and easier to use).

Maintained by Robert J. Hijmans. Last updated 2 days ago.

geospatial raster spatial vector onetbb proj gdal geos cpp

559 stars 17.64 score 17k scripts 855 dependents

rspatial

raster:Geographic Data Analysis and Modeling

Reading, writing, manipulating, analyzing and modeling of spatial data. This package has been superseded by the "terra" package <https://CRAN.R-project.org/package=terra>.

Maintained by Robert J. Hijmans. Last updated 18 hours ago.

cpp

163 stars 17.23 score 58k scripts 562 dependents

r-forge

Matrix:Sparse and Dense Matrix Classes and Methods

A rich hierarchy of sparse and dense matrix classes, including general, symmetric, triangular, and diagonal matrices with numeric, logical, or pattern entries. Efficient methods for operating on such matrices, often wrapping the 'BLAS', 'LAPACK', and 'SuiteSparse' libraries.

Maintained by Martin Maechler. Last updated 19 days ago.

openblas

1 star 17.23 score 33k scripts 12k dependents

yrosseel

lavaan:Latent Variable Analysis

Fit a variety of latent variable models, including confirmatory factor analysis, structural equation modeling and latent growth curve models.

Maintained by Yves Rosseel. Last updated 2 days ago.

factor-analysis growth-curve-models latent-variables missing-data multilevel-models multivariate-analysis path-analysis psychometrics statistical-modeling structural-equation-modeling

454 stars 16.82 score 8.4k scripts 218 dependents

bioc

GenomeInfoDb:Utilities for manipulating chromosome names, including modifying them to follow a particular naming style

Contains data and functions that define and allow translation between different chromosome sequence naming conventions (e.g., "chr1" versus "1"), including a function that attempts to place sequence names in their natural, rather than lexicographic, order.

Maintained by Hervé Pagès. Last updated 2 months ago.

genetics datarepresentation annotation genomeannotation bioconductor-package core-package

32 stars 16.32 score 1.3k scripts 1.7k dependents

joshuaulrich

quantmod:Quantitative Financial Modelling Framework

Specify, build, trade, and analyse quantitative financial trading strategies.

Maintained by Joshua M. Ulrich. Last updated 26 days ago.

algorithmic-trading charting data-import finance time-series

839 stars 16.17 score 8.1k scripts 343 dependents

bioc

DESeq2:Differential gene expression analysis based on the negative binomial distribution

Estimate variance-mean dependence in count data from high-throughput sequencing assays and test for differential expression based on a model using the negative binomial distribution.

Maintained by Michael Love. Last updated 23 days ago.

sequencing rnaseq chipseq geneexpression transcription normalization differentialexpression bayesian regression principalcomponent clustering immunooncology openblas cpp

375 stars 16.11 score 17k scripts 115 dependents

bioc

IRanges:Foundation of integer range manipulation in Bioconductor

Provides efficient low-level and highly reusable S4 classes for storing, manipulating and aggregating over annotated ranges of integers. Implements an algebra of range operations, including efficient algorithms for finding overlaps and nearest neighbors. Defines efficient list-like classes for storing, transforming and aggregating large grouped data, i.e., collections of atomic vectors and DataFrames.

Maintained by Hervé Pagès. Last updated 2 months ago.

infrastructure datarepresentation bioconductor-package core-package

22 stars 16.09 score 2.1k scripts 1.8k dependents

bioc

S4Vectors:Foundation of vector-like and list-like containers in Bioconductor

The S4Vectors package defines the Vector and List virtual classes and a set of generic functions that extend the semantics of ordinary vectors and lists in R. Package developers can easily implement vector-like or list-like objects as concrete subclasses of Vector or List. In addition, a few low-level concrete subclasses of general interest (e.g. DataFrame, Rle, Factor, and Hits) are implemented in the S4Vectors package itself (many more are implemented in the IRanges package and in other Bioconductor infrastructure packages).

Maintained by Hervé Pagès. Last updated 2 months ago.

infrastructure datarepresentation bioconductor-package core-package

18 stars 16.05 score 1.0k scripts 1.9k dependents

bioc

DelayedArray:A unified framework for working transparently with on-disk and in-memory array-like datasets

Wrapping an array-like object (typically an on-disk object) in a DelayedArray object allows one to perform common array operations on it without loading the object in memory. In order to reduce memory usage and optimize performance, operations on the object are either delayed or executed using a block processing mechanism. Note that this also works on in-memory array-like objects like DataFrame objects (typically with Rle columns), Matrix objects, ordinary arrays, and data frames.

Maintained by Hervé Pagès. Last updated 1 month ago.

infrastructure datarepresentation annotation genomeannotation bioconductor-package core-package u24ca289073

27 stars 15.59 score 538 scripts 1.2k dependents

dankelley

oce:Analysis of Oceanographic Data

Supports the analysis of Oceanographic data, including 'ADCP' measurements, measurements made with 'argo' floats, 'CTD' measurements, sectional data, sea-level time series, coastline and topographic data, etc. Provides specialized functions for calculating seawater properties such as potential temperature in either the 'UNESCO' or 'TEOS-10' equation of state. Produces graphical displays that conform to the conventions of the Oceanographic literature. This package is discussed extensively by Kelley (2018) "Oceanographic Analysis with R" <doi:10.1007/978-1-4939-8844-0>.

Maintained by Dan Kelley. Last updated 1 day ago.

oceanography fortran cpp

146 stars 15.34 score 4.2k scripts 18 dependents

bioc

AnnotationDbi:Manipulation of SQLite-based annotations in Bioconductor

Implements a user-friendly interface for querying SQLite-based annotation data packages.

Maintained by Bioconductor Package Maintainer. Last updated 5 months ago.

annotation microarray sequencing genomeannotation bioconductor-package core-package

9 stars 15.05 score 3.6k scripts 769 dependents

bioc

DOSE:Disease Ontology Semantic and Enrichment analysis

This package implements five methods proposed by Resnik, Schlicker, Jiang, Lin and Wang respectively for measuring semantic similarities among DO terms and gene products. Enrichment analyses including hypergeometric model and gene set enrichment analysis are also implemented for discovering disease associations of high-throughput biological data.

Maintained by Guangchuang Yu. Last updated 5 months ago.

annotation visualization multiplecomparison genesetenrichment pathways software disease-ontology enrichment-analysis semantic-similarity

119 stars 14.97 score 2.0k scripts 61 dependents

philchalmers

mirt:Multidimensional Item Response Theory

Analysis of discrete response data using unidimensional and multidimensional item analysis models under the Item Response Theory paradigm (Chalmers (2012) <doi:10.18637/jss.v048.i06>). Exploratory and confirmatory item factor analysis models are estimated with quadrature (EM) or stochastic (MHRM) methods. Confirmatory bi-factor and two-tier models are available for modeling item testlets using dimension reduction EM algorithms, while multiple group analyses and mixed effects designs are included for detecting differential item, bundle, and test functioning, and for modeling item and person covariates. Finally, latent class models such as the DINA, DINO, multidimensional latent class, mixture IRT models, and zero-inflated response models are supported, as well as a wide family of probabilistic unfolding models.

Maintained by Phil Chalmers. Last updated 1 day ago.

irt mirt openblas cpp openmp

212 stars 14.93 score 2.5k scripts 40 dependents

edzer

hexbin:Hexagonal Binning Routines

Binning and plotting functions for hexagonal bins.

Maintained by Edzer Pebesma. Last updated 5 months ago.

fortran

37 stars 14.00 score 2.4k scripts 114 dependents

mhahsler

arules:Mining Association Rules and Frequent Itemsets

Provides the infrastructure for representing, manipulating and analyzing transaction data and patterns (frequent itemsets and association rules). Also provides C implementations of the association mining algorithms Apriori and Eclat. Hahsler, Gruen and Hornik (2005) <doi:10.18637/jss.v014.i15>.

Maintained by Michael Hahsler. Last updated 2 months ago.

arules association-rules frequent-itemsets

194 stars 13.99 score 3.3k scripts 28 dependents

biomodhub

biomod2:Ensemble Platform for Species Distribution Modeling

Functions for species distribution modeling, calibration and evaluation, ensemble of models, ensemble forecasting and visualization. The package can consistently run up to 10 single models on a presence/absence (or presence/pseudo-absence) dataset and combine them into ensemble models and ensemble projections. A number of other evaluation and visualisation tools are also available within the package.

Maintained by Maya Guéguen. Last updated 2 days ago.

95 stars 13.85 score 536 scripts 7 dependents

r-dbi

RMySQL:Database Interface and 'MySQL' Driver for R

Legacy 'DBI' interface to 'MySQL' / 'MariaDB' based on old code ported from S-PLUS. A modern 'MySQL' client written in 'C++' is available from the 'RMariaDB' package.

Maintained by Jeroen Ooms. Last updated 2 months ago.

database mysql

209 stars 13.68 score 3.7k scripts 15 dependents

bbolker

bbmle:Tools for General Maximum Likelihood Estimation

Methods and functions for fitting maximum likelihood models in R. This package modifies and extends the 'mle' classes in the 'stats4' package.

Maintained by Ben Bolker. Last updated 1 month ago.

25 stars 13.36 score 1.4k scripts 117 dependents

brodieg

diffobj:Diffs for R Objects

Generate a colorized diff of two R objects for an intuitive visualization of their differences.

Maintained by Brodie Gaslam. Last updated 3 years ago.

diff

231 stars 13.17 score 107 scripts 494 dependents

biodiverse

unmarked:Models for Data from Unmarked Animals

Fits hierarchical models of animal abundance and occurrence to data collected using survey methods such as point counts, site occupancy sampling, distance sampling, removal sampling, and double observer sampling. Parameters governing the state and observation processes can be modeled as functions of covariates. References: Kellner et al. (2023) <doi:10.1111/2041-210X.14123>, Fiske and Chandler (2011) <doi:10.18637/jss.v043.i10>.

Maintained by Ken Kellner. Last updated 9 days ago.

openblas cpp openmp

4 stars 13.02 score 652 scripts 12 dependents

spedygiorgio

markovchain:Easy Handling Discrete Time Markov Chains

Functions and S4 methods to create and manage discrete time Markov chains more easily. In addition, functions to perform statistical (fitting and drawing random variates) and probabilistic (analysis of their structural properties) analysis are provided. See Spedicato (2017) <doi:10.32614/RJ-2017-036>. Some functions for continuous-time Markov chains depend on the suggested ctmcd package.

Maintained by Giorgio Alfredo Spedicato. Last updated 5 months ago.

ctmc dtmc markov-chain markov-model r-programming rcpp openblas cpp

104 stars 12.78 score 712 scripts 4 dependents

bioc

rtracklayer:R interface to genome annotation files and the UCSC genome browser

Extensible framework for interacting with multiple genome browsers (currently UCSC built-in) and manipulating annotation tracks in various formats (currently GFF, BED, bedGraph, BED15, WIG, BigWig and 2bit built-in). The user may export/import tracks to/from the supported browsers, as well as query and modify the browser state, such as the current viewport.

Maintained by Michael Lawrence. Last updated 3 days ago.

annotation visualization dataimport zlib openssl curl

12.66 score 6.7k scripts 480 dependents

thibautjombart

adegenet:Exploratory Analysis of Genetic and Genomic Data

Toolset for the exploration of genetic and genomic data. Adegenet provides formal (S4) classes for storing and handling various genetic data, including genetic markers with varying ploidy and hierarchical population structure ('genind' class), allele counts by population ('genpop'), and genome-wide SNP data ('genlight'). It also implements original multivariate methods (DAPC, sPCA), graphics, statistical tests, simulation tools, distance and similarity measures, and several spatial methods. A range of both empirical and simulated datasets is also provided to illustrate various methods.

Maintained by Zhian N. Kamvar. Last updated 2 months ago.

182 stars 12.60 score 1.9k scripts 29 dependents

data-cleaning

validate:Data Validation Infrastructure

Declare data validation rules and data quality indicators; confront data with them and analyze or visualize the results. The package supports rules that are per-field, in-record, cross-record or cross-dataset. Rules can be automatically analyzed for rule type and connectivity. Supports checks implied by an SDMX DSD file as well. See also Van der Loo and De Jonge (2018) <doi:10.1002/9781118897126>, Chapter 6 and the JSS paper (2021) <doi:10.18637/jss.v097.i10>.

Maintained by Mark van der Loo. Last updated 24 days ago.

data-cleaning validation

419 stars 12.39 score 448 scripts 8 dependents

melff

memisc:Management of Survey Data and Presentation of Analysis Results

An infrastructure for the management of survey data including value labels, definable missing values, recoding of variables, production of code books, and import of (subsets of) 'SPSS' and 'Stata' files is provided. Further, the package can produce tables and data frames of arbitrary descriptive statistics and (almost) publication-ready tables of regression model estimates, which can be exported to 'LaTeX' and HTML.

Maintained by Martin Elff. Last updated 23 days ago.

survey-data

46 stars 12.34 score 1.2k scripts 13 dependents

miraisolutions

XLConnect:Excel Connector for R

Provides comprehensive functionality to read, write and format Excel data.

Maintained by Martin Studer. Last updated 30 days ago.

cross-platform excel r-language xlconnect openjdk

130 stars 12.28 score 1.2k scripts 1 dependent

tomoakin

RPostgreSQL:R Interface to the 'PostgreSQL' Database System

Database interface and 'PostgreSQL' driver for 'R'. This package provides a Database Interface 'DBI' compliant driver for 'R' to access 'PostgreSQL' database systems. In order to build and install this package from source, 'PostgreSQL' itself must be present on your system to provide 'PostgreSQL' functionality via its libraries and header files. These files are provided as the 'postgresql-devel' package under some Linux distributions. On 'macOS' and 'Microsoft Windows' systems, the attached 'libpq' library source will be used.

Maintained by Tomoaki Nishiyama. Last updated 15 hours ago.

postgresql

66 stars 12.11 score 4.5k scripts 19 dependents

r-forge

copula:Multivariate Dependence with Copulas

Classes (S4) of commonly used elliptical, Archimedean, extreme-value and other copula families, as well as their rotations, mixtures and asymmetrizations. Nested Archimedean copulas, related tools and special functions. Methods for density, distribution, random number generation, bivariate dependence measures, Rosenblatt transform, Kendall distribution function, perspective and contour plots. Fitting of copula models with potentially partly fixed parameters, including standard errors. Serial independence tests, copula specification tests (independence, exchangeability, radial symmetry, extreme-value dependence, goodness-of-fit) and model selection based on cross-validation. Empirical copula, smoothed versions, and non-parametric estimators of the Pickands dependence function.

Maintained by Martin Maechler. Last updated 23 days ago.

11.83 score 1.2k scripts 86 dependents

kingaa

pomp:Statistical Inference for Partially Observed Markov Processes

Tools for data analysis with partially observed Markov process (POMP) models (also known as stochastic dynamical systems, hidden Markov models, and nonlinear, non-Gaussian, state-space models). The package provides facilities for implementing POMP models, simulating them, and fitting them to time series data by a variety of frequentist and Bayesian methods. It is also a versatile platform for implementation of inference methods for general POMP models.

Maintained by Aaron A. King. Last updated 8 days ago.

abc b-spline differential-equations dynamical-systems iterated-filtering likelihood likelihood-free markov-chain-monte-carlo markov-model mathematical-modelling measurement-error particle-filter sequential-monte-carlo simulation-based-inference sobol-sequence state-space statistical-inference stochastic-processes time-series openblas

114 stars 11.74 score 1.3k scripts 4 dependents

luca-scr

GA:Genetic Algorithms

Flexible general-purpose toolbox implementing genetic algorithms (GAs) for stochastic optimisation. Binary, real-valued, and permutation representations are available to optimize a fitness function, i.e. a function provided by users depending on their objective function. Several genetic operators are available and can be combined to explore the best settings for the current task. Furthermore, users can define new genetic operators and easily evaluate their performances. Local search using general-purpose optimisation algorithms can be applied stochastically to exploit interesting regions. GAs can be run sequentially or in parallel, using an explicit master-slave parallelisation or a coarse-grain islands approach. For more details see Scrucca (2013) <doi:10.18637/jss.v053.i04> and Scrucca (2017) <doi:10.32614/RJ-2017-008>.

Maintained by Luca Scrucca. Last updated 7 months ago.

genetic-algorithm optimisation cpp

93 stars 11.58 score 624 scripts 52 dependents

bioc

mia:Microbiome analysis

mia implements tools for microbiome analysis based on the SummarizedExperiment, SingleCellExperiment and TreeSummarizedExperiment infrastructure. Data wrangling and analysis in the context of taxonomic data is the main scope. Additional functions for common tasks, such as community indices calculation and summarization, are also implemented.

Maintained by Tuomas Borman. Last updated 14 days ago.

microbiome software dataimport analysis bioconductor

52 stars 11.50 score 316 scripts 5 dependents

r-forge

Rmpfr:Interface R to MPFR - Multiple Precision Floating-Point Reliable

Arithmetic (via S4 classes and methods) for arbitrary precision floating point numbers, including transcendental ("special") functions. To this end, the package interfaces to the 'LGPL' licensed 'MPFR' (Multiple Precision Floating-Point Reliable) Library which itself is based on the 'GMP' (GNU Multiple Precision) Library.

Maintained by Martin Maechler. Last updated 4 months ago.

mpfr4 gmp

11.30 score 316 scripts 141 dependents

bioc

MAST:Model-based Analysis of Single Cell Transcriptomics

Methods and models for handling zero-inflated single cell assay data.

Maintained by Andrew McDavid. Last updated 5 months ago.

geneexpression differentialexpression genesetenrichment rnaseq transcriptomics singlecell

232 stars 11.28 score 1.8k scripts 5 dependents

fmichonneau

phylobase:Base Package for Phylogenetic Structures and Comparative Data

Provides a base S4 class for comparative methods, incorporating one or more trees and trait data.

Maintained by Francois Michonneau. Last updated 1 year ago.

phylogenetics cpp

18 stars 11.10 score 394 scripts 18 dependents

mclements

rstpm2:Smooth Survival Models, Including Generalized Survival Models

R implementation of generalized survival models (GSMs), smooth accelerated failure time (AFT) models and Markov multi-state models. For the GSMs, g(S(t|x))=eta(t,x) for a link function g, survival S at time t with covariates x and a linear predictor eta(t,x). The main assumption is that the time effect(s) are smooth <doi:10.1177/0962280216664760>. For fully parametric models with natural splines, this re-implements Stata's 'stpm2' function, which fits the flexible parametric survival models developed by Royston and colleagues. We have extended the parametric models to include any smooth parametric smoothers for time. We have also extended the model to include any smooth penalized smoothers from the 'mgcv' package, using penalized likelihood. These models include left truncation, right censoring, interval censoring, gamma frailties and normal random effects <doi:10.1002/sim.7451>, and copulas. For the smooth AFTs, S(t|x) = S_0(t*eta(t,x)), where the baseline survival function S_0(t)=exp(-exp(eta_0(t))) is modelled using natural splines for eta_0, and the time-dependent cumulative acceleration factor eta(t,x)=\int_0^t exp(eta_1(u,x)) du for log acceleration factor eta_1(u,x). The Markov multi-state models allow for a range of models with smooth transitions to predict transition probabilities, length of stay, utilities and costs, with differences, ratios and standardisation.

Maintained by Mark Clements. Last updated 5 months ago.

fortran openblas cpp

27 stars 11.09 score 137 scripts 52 dependents

rkillick

changepoint:Methods for Changepoint Detection

Implements various mainstream and specialised changepoint methods for finding single and multiple changepoints within data. Many popular non-parametric and frequentist methods are included. The cpt.mean(), cpt.var(), cpt.meanvar() functions should be your first port of call.

Maintained by Rebecca Killick. Last updated 4 months ago.

changepoint segmentation

133 stars 11.05 score 736 scripts 40 dependents

bioc

DirichletMultinomial:Dirichlet-Multinomial Mixture Model Machine Learning for Microbiome Data

Dirichlet-multinomial mixture models can be used to describe variability in microbial metagenomic data. This package is an interface to code originally made available by Holmes, Harris, and Quince, 2012, PLoS ONE 7(2): 1-15, as discussed further in the man page for this package, ?DirichletMultinomial.

Maintained by Martin Morgan. Last updated 5 months ago.

immunooncology microbiome sequencing clustering classification metagenomics gsl

10 stars 10.91 score 125 scripts 26 dependents

ecmerkle

blavaan:Bayesian Latent Variable Analysis

Fit a variety of Bayesian latent variable models, including confirmatory factor analysis, structural equation models, and latent growth curve models. References: Merkle & Rosseel (2018) <doi:10.18637/jss.v085.i04>; Merkle et al. (2021) <doi:10.18637/jss.v100.i06>.

Maintained by Edgar Merkle. Last updated 9 days ago.

bayesian-statistics factor-analysis growth-curve-models latent-variables missing-data multilevel-models multivariate-analysis path-analysis psychometrics statistical-modeling structural-equation-modeling cpp

92 stars 10.84 score 183 scripts 3 dependents

tyee001

VGAM:Vector Generalized Linear and Additive Models

An implementation of about 6 major classes of statistical regression models. The central algorithm is Fisher scoring and iteratively reweighted least squares. At the heart of this package are the vector generalized linear and additive model (VGLM/VGAM) classes. VGLMs can be loosely thought of as multivariate GLMs. VGAMs are data-driven VGLMs that use smoothing. The book "Vector Generalized Linear and Additive Models: With an Implementation in R" (Yee, 2015) <DOI:10.1007/978-1-4939-2818-7> gives details of the statistical framework and the package. Currently only fixed-effects models are implemented. Many (100+) models and distributions are estimated by maximum likelihood estimation (MLE) or penalized MLE. The other classes are RR-VGLMs (reduced-rank VGLMs), quadratic RR-VGLMs, doubly constrained RR-VGLMs, reduced-rank VGAMs, and RCIMs (row-column interaction models); these classes perform constrained and unconstrained quadratic ordination (CQO/UQO) models in ecology, as well as constrained additive ordination (CAO). Hauck-Donner effect detection is implemented. Note that these functions are subject to change; see the NEWS and ChangeLog files for latest changes.

Maintained by Thomas Yee. Last updated 1 month ago.

fortran

10 stars 10.67 score 3.6k scripts 169 dependents

zdebruine

RcppML:Rcpp Machine Learning Library

Fast machine learning algorithms including matrix factorization and divisive clustering for large sparse and dense matrices.

Maintained by Zach DeBruine. Last updated 2 years ago.

clustering matrix-factorization nmf rcpp rcppeigen sparse-matrix cpp openmp

107 stars 10.66 score 125 scripts 50 dependents

ohdsi

FeatureExtraction:Generating Features for a Cohort

An R interface for generating features for a cohort using data in the Common Data Model. Features can be constructed using default or custom-made feature definitions. Furthermore, it is possible to aggregate features and get summary statistics.

Maintained by Ger Inberg. Last updated 8 days ago.

hades openjdk

62 stars 10.64 score 209 scripts 2 dependents

valentint

rrcov:Scalable Robust Estimators with High Breakdown Point

Robust Location and Scatter Estimation and Robust Multivariate Analysis with High Breakdown Point: principal component analysis (Filzmoser and Todorov (2013), <doi:10.1016/j.ins.2012.10.017>), linear and quadratic discriminant analysis (Todorov and Pires (2007)), multivariate tests (Todorov and Filzmoser (2010) <doi:10.1016/j.csda.2009.08.015>), outlier detection (Todorov et al. (2010) <doi:10.1007/s11634-010-0075-2>). See also Todorov and Filzmoser (2009) <urn:isbn:978-3838108148>, Todorov and Filzmoser (2010) <doi:10.18637/jss.v032.i03> and Boudt et al. (2019) <doi:10.1007/s11222-019-09869-x>.

Maintained by Valentin Todorov. Last updated 7 months ago.

fortran openblas

2 stars 10.57 score 484 scripts 96 dependents

bioc

seqLogo:Sequence logos for DNA sequence alignments

seqLogo takes the position weight matrix of a DNA sequence motif and plots the corresponding sequence logo as introduced by Schneider and Stephens (1990).

Maintained by Robert Ivanek. Last updated 5 months ago.

sequencematching

4 stars 10.57 score 304 scripts 29 dependents

mhahsler

recommenderlab:Lab for Developing and Testing Recommender Algorithms

Provides a research infrastructure to develop and evaluate collaborative filtering recommender algorithms. This includes a sparse representation for user-item matrices, many popular algorithms, top-N recommendations, and cross-validation. Hahsler (2022) <doi:10.48550/arXiv.2205.12371>.

Maintained by Michael Hahsler. Last updated 3 days ago.

collaborative-filtering recommender-system

214 stars 10.42 score 840 scripts 2 dependents

bioc

flowCore:Basic structures for flow cytometry data

Provides S4 data structures and basic functions to deal with flow cytometry data.

Maintained by Mike Jiang. Last updated 5 months ago.

immunooncology infrastructure flowcytometry cellbasedassays cpp

10.34 score 1.7k scripts 59 dependents

agrdatasci

gdistance:Distances and Routes on Geographical Grids

Provides classes and functions to calculate various distance measures and routes in heterogeneous geographic spaces represented as grids. The package implements measures to model dispersal histories first presented by van Etten and Hijmans (2010) <doi:10.1371/journal.pone.0012060>. Least-cost distances as well as more complex distances based on (constrained) random walks can be calculated. The distances implemented in the package are used in geographical genetics, accessibility indicators, and may also have applications in other fields of geospatial analysis.

Maintained by Andrew Marx. Last updated 1 year ago.

17 stars 10.34 score 478 scripts 23 dependents

stewid

SimInf:A Framework for Data-Driven Stochastic Disease Spread Simulations

Provides an efficient and very flexible framework to conduct data-driven epidemiological modeling in realistic large scale disease spread simulations. The framework integrates infection dynamics in subpopulations as continuous-time Markov chains using the Gillespie stochastic simulation algorithm and incorporates available data such as births, deaths and movements as scheduled events at predefined time-points. Using C code for the numerical solvers and 'OpenMP' (if available) to divide work over multiple processors ensures high performance when simulating a sample outcome. One of our design goals was to make the package extendable and enable usage of the numerical solvers from other R extension packages in order to facilitate complex epidemiological research. The package contains template models and can be extended with user-defined models. For more details see the paper by Widgren, Bauer, Eriksson and Engblom (2019) <doi:10.18637/jss.v091.i12>. The package also provides functionality to fit models to time series data using the Approximate Bayesian Computation Sequential Monte Carlo ('ABC-SMC') algorithm of Toni and others (2009) <doi:10.1098/rsif.2008.0172>.

Maintained by Stefan Widgren. Last updated 17 days ago.

data-driven epidemiology high-performance-computing markov-chain mathematical-modelling gsl openmp

35 stars 10.09 score 227 scripts
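
The Gillespie stochastic simulation algorithm named in the SimInf entry above can be sketched for a single closed SIR population. This is a minimal Python illustration under stated assumptions, not SimInf's C solver: the function name `gillespie_sir`, its parameters, and the frequency-dependent infection rate are all hypothetical choices, and scheduled events (births, deaths, movements) are left out.

```python
import random


def gillespie_sir(s, i, r, beta, gamma, t_end, seed=0):
    """Simulate one SIR trajectory with the Gillespie direct method.

    Hypothetical helper for illustration only. Assumes gamma > 0 so the
    total event rate is positive whenever infected individuals remain.
    Returns a list of (time, S, I, R) tuples, one per event.
    """
    rng = random.Random(seed)
    t = 0.0
    path = [(t, s, i, r)]
    n = s + i + r  # closed population: the total never changes
    while i > 0:
        rate_infect = beta * s * i / n   # S -> I transition rate
        rate_recover = gamma * i         # I -> R transition rate
        total = rate_infect + rate_recover
        # Waiting time to the next event is exponential with rate `total`.
        t += rng.expovariate(total)
        if t > t_end:
            break
        # Pick which event fires, with probability proportional to its rate.
        if rng.random() * total < rate_infect:
            s, i = s - 1, i + 1
        else:
            i, r = i - 1, r + 1
        path.append((t, s, i, r))
    return path


if __name__ == "__main__":
    trajectory = gillespie_sir(s=99, i=1, r=0, beta=0.3, gamma=0.1, t_end=100.0)
    print(len(trajectory), "states; final:", trajectory[-1])
```

Each loop iteration draws one waiting time and one event, which is what makes the process a continuous-time Markov chain; a framework such as SimInf additionally interleaves data-driven scheduled events at predefined time points.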

mages

ChainLadder:Statistical Methods and Models for Claims Reserving in General Insurance

Various statistical methods and models which are typically used for the estimation of outstanding claims reserves in general insurance, including those to estimate the claims development result as required under Solvency II.

Maintained by Markus Gesmann. Last updated 2 months ago.

82 stars 10.04 score 196 scripts 2 dependents

ropensci

RNeXML:Semantically Rich I/O for the 'NeXML' Format

Provides access to phyloinformatic data in 'NeXML' format. The package adds new functionality to R, such as the ability to manipulate 'NeXML' objects in more varied and refined ways, and compatibility with 'ape' objects.

Maintained by Carl Boettiger. Last updated 11 months ago.

metadata nexml phylogenetics linked-data

13 stars 9.97 score 100 scripts 19 dependents

bioc

methylumi:Handle Illumina methylation data

This package provides classes for holding and manipulating Illumina methylation data. Based on eSet, it can contain MIAME information, sample information, feature information, and multiple matrices of data. An "intelligent" import function, methylumiR, can read the Illumina text files and create a MethyLumiSet. methylumIDAT can directly read raw IDAT files from HumanMethylation27 and HumanMethylation450 microarrays. Normalization, background correction, and quality control features for GoldenGate, Infinium, and Infinium HD arrays are also included.

Maintained by Sean Davis. Last updated 5 months ago.

dnamethylation twochannel preprocessing qualitycontrol cpgisland

9 stars 9.90 score 89 scripts 9 dependents

bioc

snpStats:SnpMatrix and XSnpMatrix classes and methods

Classes and statistical methods for large SNP association studies. This extends the earlier snpMatrix package, allowing for uncertainty in genotypes.

Maintained by David Clayton. Last updated 5 months ago.

microarray snp geneticvariability zlib

9.48 score 674 scripts 20 dependents

sizespectrum

mizer:Dynamic Multi-Species Size Spectrum Modelling

A set of classes and methods to set up and run multi-species, trait based and community size spectrum ecological models, focused on the marine environment.

Maintained by Gustav Delius. Last updated 2 months ago.

ecosystem-model fish-population-dynamics fisheries fisheries-management marine-ecosystem population-dynamics simulation size-structure species-interactions transport-equation cpp

39 stars 9.41 score 207 scripts

reinhardfurrer

spam:SPArse Matrix

Set of functions for sparse matrix algebra. Differences with other sparse matrix packages are: (1) we only support (essentially) one sparse matrix format, (2) based on transparent and simple structure(s), (3) tailored for MCMC calculations within G(M)RF, and (4) it is fast and scalable (with the extension package spam64). Documentation about 'spam' is provided by vignettes included in this package, see also Furrer and Sain (2010) <doi:10.18637/jss.v036.i10>; see 'citation("spam")' for details.

Maintained by Reinhard Furrer. Last updated 2 months ago.

fortran openblas cpp

1 star 9.36 score 420 scripts 439 dependents

bioc

multtest:Resampling-based multiple hypothesis testing

Non-parametric bootstrap and permutation resampling-based multiple testing procedures (including empirical Bayes methods) for controlling the family-wise error rate (FWER), generalized family-wise error rate (gFWER), tail probability of the proportion of false positives (TPPFP), and false discovery rate (FDR). Several choices of bootstrap-based null distribution are implemented (centered, centered and scaled, quantile-transformed). Single-step and step-wise methods are available. Tests based on a variety of t- and F-statistics (including t-statistics based on regression parameters from linear and survival models as well as those based on correlation parameters) are included. When probing hypotheses with t-statistics, users may also select a potentially faster null distribution which is multivariate normal with mean zero and variance covariance matrix derived from the vector influence function. Results are reported in terms of adjusted p-values, confidence regions and test statistic cutoffs. The procedures are directly applicable to identifying differentially expressed genes in DNA microarray experiments.

Maintained by Katherine S. Pollard. Last updated 5 months ago.

microarray differentialexpression multiplecomparison

9.34 score 932 scripts 136 dependents

bioc

CNEr:CNE Detection and Visualization

Large-scale identification and advanced visualization of sets of conserved noncoding elements.

Maintained by Ge Tan. Last updated 5 months ago.

generegulation visualization dataimport

3 stars 9.28 score 35 scripts 19 dependents

bpfaff

urca:Unit Root and Cointegration Tests for Time Series Data

Unit root and cointegration tests encountered in applied econometric analysis are implemented.

Maintained by Bernhard Pfaff. Last updated 10 months ago.

fortran

6 stars 8.95 score 1.4k scripts 270 dependents

bioc

marray:Exploratory analysis for two-color spotted microarray data

Class definitions for two-color spotted microarray data. Functions for data input, diagnostic plots, normalization and quality checking.

Maintained by Yee Hwa (Jean) Yang. Last updated 5 months ago.

microarray twochannel preprocessing

8.92 score 222 scripts 38 dependents

flr

FLCore:Core Package of FLR, Fisheries Modelling in R

Core classes and methods for FLR, a framework for fisheries modelling and management strategy simulation in R. Developed by a team of fisheries scientists in various countries. More information can be found at <http://flr-project.org/>.

Maintained by Iago Mosqueira. Last updated 8 days ago.

fisheries flr fisheries-modelling

16 stars 8.78 score 956 scripts 23 dependents

bart1

move:Visualizing and Analyzing Animal Track Data

Contains functions to access movement data stored in 'movebank.org' as well as tools to visualize and statistically analyze animal movement data, including functions to calculate dynamic Brownian Bridge Movement Models. Move helps address movement ecology questions.

Maintained by Bart Kranstauber. Last updated 4 months ago.

cpp

8.76 score 690 scripts 3 dependents

pilaboratory

sads:Maximum Likelihood Models for Species Abundance Distributions

Maximum likelihood tools to fit and compare models of species abundance distributions and of species rank-abundance distributions.

Maintained by Paulo I. Prado. Last updated 1 year ago.

23 stars 8.66 score 244 scripts 3 dependents

jniedballa

camtrapR:Camera Trap Data Management and Preparation of Occupancy and Spatial Capture-Recapture Analyses

Management of and data extraction from camera trap data in wildlife studies. The package provides a workflow for storing and sorting camera trap photos (and videos), tabulates records of species and individuals, and creates detection/non-detection matrices for occupancy and spatial capture-recapture analyses with great flexibility. In addition, it can visualise species activity data and provides simple mapping functions with GIS export.

Maintained by Juergen Niedballa. Last updated 4 months ago.

occupancy-modeling spatial-capture-recapture wildlife

35 stars 8.65 score 178 scripts

actuaryzhang

cplm:Compound Poisson Linear Models

Likelihood-based and Bayesian methods for various compound Poisson linear models based on Zhang, Yanwei (2013) <doi:10.1007/s11222-012-9343-7>.

Maintained by Yanwei (Wayne) Zhang. Last updated 1 year ago.

openblas

16 stars 8.55 score 75 scripts 10 dependents

r-forge

ClassDiscovery:Classes and Methods for "Class Discovery" with Microarrays or Proteomics

Defines the classes used for "class discovery" problems in the OOMPA project (<http://oompa.r-forge.r-project.org/>). Class discovery primarily consists of unsupervised clustering methods with attempts to assess their statistical significance.

Maintained by Kevin R. Coombes. Last updated 2 months ago.

microarray clustering

8.53 score 85 scripts 9 dependents

bioc

pwalign:Perform pairwise sequence alignments

The two main functions in the package are pairwiseAlignment() and stringDist(). The former solves (Needleman-Wunsch) global alignment, (Smith-Waterman) local alignment, and (ends-free) overlap alignment problems. The latter computes the Levenshtein edit distance or pairwise alignment score matrix for a set of strings.

Maintained by Hervé Pagès. Last updated 10 days ago.

alignment sequencematching sequencing genetics bioconductor-package

1 star 8.48 score 27 scripts 104 dependents
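
The Levenshtein edit distance that the pwalign entry above attributes to stringDist() can be sketched with the standard dynamic-programming recurrence. This is a minimal Python illustration assuming unit costs for insertion, deletion, and substitution; the function name `levenshtein` is hypothetical and the code is not pwalign's implementation.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the unit-cost edit distance between strings a and b.

    Hypothetical helper for illustration only. Keeps just two rows of the
    dynamic-programming table, so memory is O(min(len(a), len(b))).
    """
    if len(a) < len(b):
        a, b = b, a  # make b the shorter string so the rows stay small
    # previous[j] = distance between the empty prefix of a and b[:j]
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance between a[:i] and the empty string
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,                      # delete char_a
                    current[j - 1] + 1,                   # insert char_b
                    previous[j - 1] + substitution_cost,  # match or substitute
                )
            )
        previous = current
    return previous[-1]


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))
```

Applying the function to every pair in a set of strings would give a pairwise distance matrix, which is the kind of result the entry describes for a set of strings.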

tomasfryda

h2o:R Interface for the 'H2O' Scalable Machine Learning Platform

R interface for 'H2O', the scalable open source machine learning platform that offers parallelized implementations of many supervised and unsupervised machine learning algorithms such as Generalized Linear Models (GLM), Gradient Boosting Machines (including XGBoost), Random Forests, Deep Neural Networks (Deep Learning), Stacked Ensembles, Naive Bayes, Generalized Additive Models (GAM), ANOVA GLM, Cox Proportional Hazards, K-Means, PCA, ModelSelection, Word2Vec, as well as a fully automatic machine learning algorithm (H2O AutoML).

Maintained by Tomas Fryda. Last updated 1 year ago.

3 stars 8.20 score 7.8k scripts 11 dependents

cran

flexmix:Flexible Mixture Modeling

A general framework for finite mixtures of regression models using the EM algorithm is implemented. The E-step and all data handling are provided, while the M-step can be supplied by the user to easily define new models. Existing drivers implement mixtures of standard linear models, generalized linear models and model-based clustering.

Maintained by Bettina Gruen. Last updated 29 days ago.

5 stars 8.19 score 113 dependents

r-hyperspec

hyperSpec:Work with Hyperspectral Data, i.e. Spectra + Meta Information (Spatial, Time, Concentration, ...)

Comfortable ways to work with hyperspectral data sets, i.e. spatially or time-resolved spectra, or spectra with any other kind of information associated with each of the spectra. The spectra can be data as obtained in XRF, UV/VIS, Fluorescence, AES, NIR, IR, Raman, NMR, MS, etc. More generally, any data that is recorded over a discretized variable, e.g. absorbance = f(wavelength), stored as a vector of absorbance values for discrete wavelengths is suitable.

Maintained by Claudia Beleites. Last updated 10 months ago.

data-wrangling hyperspectral imaging infrared nmr raman spectroscopy uv-vis xrf

16 stars 8.10 score 233 scripts 2 dependents

polmine

polmineR:Verbs and Nouns for Corpus Analysis

Package for corpus analysis using the Corpus Workbench ('CWB', <https://cwb.sourceforge.io>) as an efficient back end for indexing and querying large corpora. The package offers functionality to flexibly create subcorpora and to carry out basic statistical operations (count, co-occurrences etc.). The original full text of documents can be reconstructed and inspected at any time. Beyond that, the package is intended to serve as an interface to packages implementing advanced statistical procedures. Respective data structures (document-term matrices, term-co-occurrence matrices etc.) can be created based on the indexed corpora.

Maintained by Andreas Blaette. Last updated 1 year ago.

49 stars 7.96 score 311 scripts

bioc

Category:Category Analysis

A collection of tools for performing category (gene set enrichment) analysis.

Maintained by Bioconductor Package Maintainer. Last updated 5 months ago.

annotation go pathways genesetenrichment

7.93 score 183 scripts 16 dependents

r-forge

tuneR:Analysis of Music and Speech

Analyze music and speech, extract features like MFCCs, handle wave files and their representation in various ways, read mp3, read midi, perform steps of a transcription, ... Also contains functions ported from the 'rastamat' 'Matlab' package.

Maintained by Uwe Ligges. Last updated 12 months ago.

7.93 score 1.1k scripts 44 dependents

biodiverse

ubms:Bayesian Models for Data from Unmarked Animals using 'Stan'

Fit Bayesian hierarchical models of animal abundance and occurrence via the 'rstan' package, the R interface to the 'Stan' C++ library. Supported models include single-season occupancy, dynamic occupancy, and N-mixture abundance models. Covariates on model parameters are specified using a formula-based interface similar to package 'unmarked', while also allowing for estimation of random slope and intercept terms. References: Carpenter et al. (2017) <doi:10.18637/jss.v076.i01>; Fiske and Chandler (2011) <doi:10.18637/jss.v043.i10>.

Maintained by Ken Kellner. Last updated 30 days ago.

distance-sampling hierarchical-models n-mixture-model occupancy stan openblas cpp

36 stars 7.90 score 73 scripts

bioc

siggenes:Multiple Testing using SAM and Efron's Empirical Bayes Approaches

Identification of differentially expressed genes and estimation of the False Discovery Rate (FDR) using both the Significance Analysis of Microarrays (SAM) and the Empirical Bayes Analyses of Microarrays (EBAM).

Maintained by Holger Schwender. Last updated 5 months ago.

multiplecomparison microarray geneexpression snp exonarray differentialexpression

7.87 score 74 scripts 34 dependents

lvclark

polysat:Tools for Polyploid Microsatellite Analysis

A collection of tools to handle microsatellite data of any ploidy (and samples of mixed ploidy) where allele copy number is not known in partially heterozygous genotypes. It can import and export data in ABI 'GeneMapper', 'Structure', 'ATetra', 'Tetrasat'/'Tetra', 'GenoDive', 'SPAGeDi', 'POPDIST', 'STRand', and binary presence/absence formats. It can calculate pairwise distances between individuals using a stepwise mutation model or infinite alleles model, with or without taking ploidies and allele frequencies into account. These distances can be used for the calculation of clonal diversity statistics or used for further analysis in R. Allelic diversity statistics and Polymorphic Information Content are also available. polysat can assist the user in estimating the ploidy of samples, and it can estimate allele frequencies in populations, calculate pairwise or global differentiation statistics based on those frequencies, and export allele frequencies to 'SPAGeDi' and 'adegenet'. Functions are also included for assigning alleles to isoloci in cases where one pair of microsatellite primers amplifies alleles from two or more independently segregating isoloci. polysat is described by Clark and Jasieniuk (2011) <doi:10.1111/j.1755-0998.2011.02985.x> and Clark and Schreier (2017) <doi:10.1111/1755-0998.12639>.

Maintained by Lindsay V. Clark. Last updated 3 months ago.

cpp

12 stars 7.80 score 72 scripts 1 dependent

bioc

hermes:Preprocessing, analyzing, and reporting of RNA-seq data

Provides classes and functions for quality control, filtering, normalization and differential expression analysis of pre-processed `RNA-seq` data. Data can be imported from `SummarizedExperiment` as well as `matrix` objects and can be annotated from `BioMart`. Filtering for genes with sufficiently high expression or with required annotations, as well as filtering for samples with sufficient correlation to other samples or a sufficient total number of reads, is supported. The standard normalization methods including cpm, rpkm and tpm can be used, and `DESeq2` as well as voom differential expression analyses are available.

Maintained by Daniel Sabanés Bové. Last updated 5 months ago.

rnaseq differentialexpression normalization preprocessing qualitycontrol rna-seq statistical-engineering

11 stars 7.77 score 48 scripts 1 dependent

bioc

edge:Extraction of Differential Gene Expression

The edge package implements methods for carrying out differential expression analyses of genome-wide gene expression studies. Significance testing using the optimal discovery procedure and generalized likelihood ratio tests (equivalent to F-tests and t-tests) is implemented for general study designs. Special functions are available to facilitate the analysis of common study designs, including time course experiments. Other packages such as sva and qvalue are integrated in edge to provide a wide range of tools for gene expression analysis.

Maintained by John D. Storey. Last updated 5 months ago.

multiplecomparison differentialexpression timecourse regression geneexpression dataimport

21 stars 7.77 score 62 scripts

openpharma

crmPack:Object-Oriented Implementation of CRM Designs

Implements a wide range of model-based dose escalation designs, ranging from classical and modern continual reassessment methods (CRMs) based on dose-limiting toxicity endpoints to dual-endpoint designs taking into account a biomarker/efficacy outcome. The focus is on Bayesian inference, making it very easy to set up a new design with its own JAGS code. However, it is also possible to implement 3+3 designs for comparison or models with non-Bayesian estimation. The whole package is written in a modular form in the S4 class system, making it very flexible for adaptation to new models, escalation or stopping rules. Further details are presented in Sabanes Bove et al. (2019) <doi:10.18637/jss.v089.i10>.

Maintained by Daniel Sabanes Bove. Last updated 2 months ago.

jags cpp

21 stars 7.76 score 208 scripts

trackage

trip:Tracking Data

Access and manipulate spatial tracking data, with straightforward coercion from and to other formats. Filter for speed and create time spent maps from tracking data. There are coercion methods to convert between 'trip' and 'ltraj' from 'adehabitatLT', and between 'trip' and 'psp' and 'ppp' from 'spatstat'. Trip objects can be created from raw or grouped data frames, and from types in the 'sp', 'sf', 'amt', 'trackeR', 'mousetrap', and other packages; see Sumner, MD (2011) <https://figshare.utas.edu.au/articles/thesis/The_tag_location_problem/23209538>.

Maintained by Michael D. Sumner. Last updated 9 months ago.

13 stars 7.72 score 137 scripts 1 dependent

blue-matter

MSEtool:Management Strategy Evaluation Toolkit

Development, simulation testing, and implementation of management procedures for fisheries (see Carruthers & Hordyk (2018) <doi:10.1111/2041-210X.13081>).

Maintained by Adrian Hordyk. Last updated 3 days ago.

cpp

8 stars 7.71 score 163 scripts 3 dependents

bsaul

geex:An API for M-Estimation

Provides a general, flexible framework for estimating parameters and computing the empirical sandwich variance estimator from a set of unbiased estimating equations (i.e., M-estimation in the vein of Stefanski & Boos (2002) <doi:10.1198/000313002753631330>). All examples from Stefanski & Boos (2002) are published in the corresponding Journal of Statistical Software paper "The Calculus of M-Estimation in R with geex" by Saul & Hudgens (2020) <doi:10.18637/jss.v092.i02>. Also provides an API to compute finite-sample variance corrections.

Maintained by Bradley Saul. Last updated 11 months ago.

asymptotics covariance-estimates covariance-estimation estimate-parameters estimating-equations estimation inference m-estimation robust sandwich

8 stars 7.70 score 131 scripts 2 dependents

adamlilith

fasterRaster:Faster Raster and Spatial Vector Processing Using 'GRASS GIS'

Processing of large-in-memory/large-on-disk rasters and spatial vectors using 'GRASS GIS' <https://grass.osgeo.org/>. Most functions in the 'terra' package are recreated. Processing of medium-sized and smaller spatial objects will nearly always be faster using 'terra' or 'sf', but for large-in-memory/large-on-disk objects, 'fasterRaster' may be faster. To use most of the functions, you must have the stand-alone version (not the 'OSGeo4W' installer version) of 'GRASS GIS' 8.0 or higher.

Maintained by Adam B. Smith. Last updated 2 days ago.

aspect distance fragmentation fragmentation-indices gis grass grass-gis raster raster-projection rasterize slope topography vectorization

57 stars 7.68 score 8 scripts

ltorgo

DMwR2:Functions and Data for the Second Edition of "Data Mining with R"

Functions and data accompanying the second edition of the book "Data Mining with R, learning with case studies" by Luis Torgo, published by CRC Press.

Maintained by Luis Torgo. Last updated 8 years ago.

27 stars 7.64 score 380 scripts 2 dependents

wenjie2wang

reda:Recurrent Event Data Analysis

Contains implementations of recurrent event data analysis routines including (1) survival and recurrent event data simulation from a stochastic process point of view by the thinning method proposed by Lewis and Shedler (1979) <doi:10.1002/nav.3800260304> and the inversion method introduced in Cinlar (1975, ISBN:978-0486497976), (2) the mean cumulative function (MCF) estimation by the Nelson-Aalen estimator of the cumulative hazard rate function, (3) two-sample recurrent event responses comparison with the pseudo-score tests proposed by Lawless and Nadeau (1995) <doi:10.2307/1269617>, and (4) gamma frailty model with spline rate function following Fu, et al. (2016) <doi:10.1080/10543406.2014.992524>.

Maintained by Wenjie Wang. Last updated 1 year ago.

mcf mean-cumulative-function recurrent-event survival-analysis cpp

15 stars 7.52 score 55 scripts 3 dependents

tpetzoldt

growthrates:Estimate Growth Rates from Experimental Data

A collection of methods to determine growth rates from experimental data, in particular from batch experiments and plate reader trials.

Maintained by Thomas Petzoldt. Last updated 2 years ago.

27 stars 7.52 score 102 scripts

cran

sn:The Skew-Normal and Related Distributions Such as the Skew-t and the SUN

Build and manipulate probability distributions of the skew-normal family and some related ones, notably the skew-t and the SUN families. For the skew-normal and the skew-t distributions, statistical methods are provided for data fitting and model diagnostics, in the univariate and the multivariate case.

Maintained by Adelchi Azzalini. Last updated 2 years ago.

3 stars 7.44 score 92 dependents

ssnn-airr

shazam:Immunoglobulin Somatic Hypermutation Analysis

Provides a computational framework for analyzing mutations in immunoglobulin (Ig) sequences. Includes methods for Bayesian estimation of antigen-driven selection pressure, mutational load quantification, building of somatic hypermutation (SHM) models, and model-dependent distance calculations. Also includes empirically derived models of SHM for both mice and humans. Citations: Gupta and Vander Heiden, et al (2015) <doi:10.1093/bioinformatics/btv359>, Yaari, et al (2012) <doi:10.1093/nar/gks457>, Yaari, et al (2013) <doi:10.3389/fimmu.2013.00358>, Cui, et al (2016) <doi:10.4049/jimmunol.1502263>.

Maintained by Susanna Marquez. Last updated 3 months ago.

7.43 score 222 scripts 2 dependents

bioc

cogena:co-expressed gene-set enrichment analysis

cogena is a workflow for co-expressed gene-set enrichment analysis. It aims to discover smaller-scale but highly correlated cellular events that may be of great biological relevance. A novel pipeline for drug discovery and drug repositioning based on the cogena workflow is proposed. In particular, candidate drugs can be predicted based on the gene expression of disease-related data, or other similar drugs can be identified based on the gene expression of drug-related data. Moreover, the drug mode of action can be disclosed by the associated pathway analysis. In summary, cogena is a flexible workflow for various gene set enrichment analyses of co-expressed genes, with a focus on pathway/GO analysis and drug repositioning.

Maintained by Zhilong Jia. Last updated 5 months ago.

clustering genesetenrichment geneexpression visualization pathways kegg go microarray sequencing systemsbiology datarepresentation dataimport bioconductor bioinformatics

12 stars 7.36 score 32 scripts

choi-phd

TestDesign:Optimal Test Design Approach to Fixed and Adaptive Test Construction

Uses the optimal test design approach by Birnbaum (1968, ISBN:9781593119348) and van der Linden (2018) <doi:10.1201/9781315117430> to construct fixed, adaptive, and parallel tests. Supports the following mixed-integer programming (MIP) solver packages: 'Rsymphony', 'highs', 'gurobi', 'lpSolve', and 'Rglpk'. The 'gurobi' package is not available from CRAN; see <https://www.gurobi.com/downloads/>.

Maintained by Seung W. Choi. Last updated 6 months ago.

openblas cpp

3 stars 7.34 score 37 scripts 2 dependents

argocanada

argoFloats:Analysis of Oceanographic Argo Floats

Supports the analysis of oceanographic data recorded by Argo autonomous drifting profiling floats. Functions are provided to (a) download and cache data files, (b) subset data in various ways, (c) handle quality-control flags and (d) plot the results according to oceanographic conventions. A shiny app is provided for easy exploration of datasets. The package is designed to work well with the 'oce' package, providing a wide range of processing capabilities that are particular to oceanographic analysis. See Kelley, Harbin, and Richards (2021) <doi:10.3389/fmars.2021.635922> for more on the scientific context and applications.

Maintained by Dan Kelley. Last updated 1 month ago.

17 stars 7.32 score 203 scripts

bioc

flowClust:Clustering for Flow Cytometry

Robust model-based clustering using a t-mixture model with Box-Cox transformation. Note: users should have GSL installed. Windows users: 'consult the README file available in the inst directory of the source distribution for necessary configuration instructions'.

Maintained by Greg Finak. Last updated 5 months ago.

immunooncology clustering visualization flowcytometry

7.30 score 83 scripts 6 dependents

r-forge

pcalg:Methods for Graphical Models and Causal Inference

Functions for causal structure learning and causal inference using graphical models. The main algorithms for causal structure learning are PC (for observational data without hidden variables), FCI and RFCI (for observational data with hidden variables), and GIES (for a mix of data from observational studies (i.e. observational data) and data from experiments involving interventions (i.e. interventional data) without hidden variables). For causal inference the IDA algorithm, the Generalized Backdoor Criterion (GBC), the Generalized Adjustment Criterion (GAC) and some related functions are implemented. Functions for incorporating background knowledge are provided.

Maintained by Markus Kalisch. Last updated 7 months ago.

openblas cpp

7.30 score 700 scripts 19 dependents

bioc

qpgraph:Estimation of Genetic and Molecular Regulatory Networks from High-Throughput Genomics Data

Estimate gene and eQTL networks from high-throughput expression and genotyping assays.

Maintained by Robert Castelo. Last updated 3 days ago.

microarray geneexpression transcription pathways networkinference graphandnetwork generegulation genetics geneticvariability snp software openblas

3 stars 7.24 score 20 scripts 3 dependents

ropensci

melt:Multiple Empirical Likelihood Tests

Performs multiple empirical likelihood tests. It offers an easy-to-use interface and flexibility in specifying hypotheses and calibration methods, extending the framework to simultaneous inferences. The core computational routines are implemented using the 'Eigen' 'C++' library and 'RcppEigen' interface, with 'OpenMP' for parallel computation. Details of the testing procedures are provided in Kim, MacEachern, and Peruggia (2023) <doi:10.1080/10485252.2023.2206919>. A companion paper by Kim, MacEachern, and Peruggia (2024) <doi:10.18637/jss.v108.i05> is available for further information. This work was supported by the U.S. National Science Foundation under Grants No. SES-1921523 and DMS-2015552.

Maintained by Eunseop Kim. Last updated 11 months ago.

cpp openmp

12 stars 7.24 score 84 scripts

vpihur

clValid:Validation of Clustering Results

Statistical and biological validation of clustering results. This package implements Dunn Index, Silhouette, Connectivity, Stability, BHI and BSI. Further information can be found in Brock, G et al. (2008) <doi:10.18637/jss.v025.i04>.

Maintained by Vasyl Pihur. Last updated 4 years ago.

5 stars 7.24 score 422 scripts 14 dependents

dankelley

plan:Tools for Project Planning

Supports the creation of 'burndown' charts and 'gantt' diagrams.

Maintained by Dan Kelley. Last updated 2 years ago.

33 stars 7.23 score 103 scripts

jhorzek

lessSEM:Non-Smooth Regularization for Structural Equation Models

Provides regularized structural equation modeling (regularized SEM) with non-smooth penalty functions (e.g., lasso) building on 'lavaan'. The package is heavily inspired by the 'regsem' (<https://github.com/Rjacobucci/regsem>) and 'lslx' (<https://github.com/psyphh/lslx>) packages.

Maintained by Jannik H. Orzek. Last updated 1 year ago.

lasso psychometrics regularization regularized-structural-equation-model sem structural-equation-modeling openblas cpp openmp

7 stars 7.19 score 223 scripts

sahirbhatnagar

casebase:Fitting Flexible Smooth-in-Time Hazards and Risk Functions via Logistic and Multinomial Regression

Fit flexible and fully parametric hazard regression models to survival data with single event type or multiple competing causes via logistic and multinomial regression. Our formulation allows for arbitrary functional forms of time and its interactions with other predictors for time-dependent hazards and hazard ratios. From the fitted hazard model, we provide functions to readily calculate and plot cumulative incidence and survival curves for a given covariate profile. This approach accommodates any log-linear hazard function of prognostic time, treatment, and covariates, and readily allows for non-proportionality. We also provide a plot method for visualizing incidence density via population time plots. Based on the case-base sampling approach of Hanley and Miettinen (2009) <DOI:10.2202/1557-4679.1125>, Saarela and Arjas (2015) <DOI:10.1111/sjos.12125>, and Saarela (2015) <DOI:10.1007/s10985-015-9352-x>.

Maintained by Sahir Bhatnagar. Last updated 7 months ago.

competing-risks cox-regression regression-models survival-analysis

9 stars 7.16 score 94 scripts

optad

adoptr:Adaptive Optimal Two-Stage Designs

Optimize one- or two-arm, two-stage designs for clinical trials with respect to several implemented objective criteria or custom objectives. Optimization under uncertainty and conditional (given stage-one outcome) constraints are supported. See Pilz et al. (2019) <doi:10.1002/sim.8291> and Kunzmann et al. (2021) <doi:10.18637/jss.v098.i09> for details.

Maintained by Maximilian Pilz. Last updated 6 months ago.

1 star 7.09 score 39 scripts 1 dependent

ropensci

taxlist:Handling Taxonomic Lists

Handling taxonomic lists through objects of class 'taxlist'. This package provides functions to import species lists from 'Turboveg' (<https://www.synbiosys.alterra.nl/turboveg/>) and the possibility to create backups from resulting R objects. Quick displays are also implemented as summary methods.

Maintained by Miguel Alvarez. Last updated 6 months ago.

12 stars 7.07 score 81 scripts 2 dependents

spedygiorgio

lifecontingencies:Financial and Actuarial Mathematics for Life Contingencies

Classes and methods that allow the user to manage life tables and actuarial tables (including multiple-decrement tables). It also contains functions to easily perform demographic, financial and actuarial calculations for life contingencies insurance. See Spedicato (2013) <doi:10.18637/jss.v055.i10>.

Maintained by Giorgio Alfredo Spedicato. Last updated 6 months ago.

actuarial financial life-contingencies life-insurance cpp

61 stars 7.06 score 156 scripts

leifeld

btergm:Temporal Exponential Random Graph Models by Bootstrapped Pseudolikelihood

Temporal Exponential Random Graph Models (TERGM) estimated by maximum pseudolikelihood with bootstrapped confidence intervals or Markov Chain Monte Carlo maximum likelihood. Goodness of fit assessment for ERGMs, TERGMs, and SAOMs. Micro-level interpretation of ERGMs and TERGMs. The methods are described in Leifeld, Cranmer and Desmarais (2018), JStatSoft <doi:10.18637/jss.v083.i06>.

Maintained by Philip Leifeld. Last updated 10 days ago.

complex-networks dynamic-analysis ergm estimation goodness-of-fit inference longitudinal-data network-analysis prediction tergm

18 stars 7.03 score 83 scripts 2 dependents

doccstat

fastcpd:Fast Change Point Detection via Sequential Gradient Descent

Implements a fast change point detection algorithm based on the paper "Sequential Gradient Descent and Quasi-Newton's Method for Change-Point Analysis" by Xianyang Zhang, Trisha Dawn <https://proceedings.mlr.press/v206/zhang23b.html>. The algorithm is based on dynamic programming with pruning and sequential gradient descent. It is able to detect change points an order of magnitude faster than the vanilla Pruned Exact Linear Time (PELT) algorithm. The package includes examples of linear regression, logistic regression, Poisson regression, and penalized linear regression data, as well as many more examples with custom cost functions in case the user wants to use their own cost function.

Maintained by Xingchi Li. Last updated 11 days ago.

change-point-detection cpp custom-function gradient-descent lasso linear-regression logistic-regression offline pelt penalized-regression poisson-regression quasi-newton statistics time-series warm-start fortran openblas cpp openmp

22 stars 7.00 score 7 scripts

roustant

DiceKriging:Kriging Methods for Computer Experiments

Estimation, validation and prediction of kriging models. Important functions: km, print.km, plot.km, predict.km.

Maintained by Olivier Roustant. Last updated 4 years ago.

4 stars 6.99 score 526 scripts 37 dependents

bioc

affyPLM:Methods for fitting probe-level models

A package that extends and improves the functionality of the base affy package. Routines that make heavy use of compiled code for speed. Central focus is on implementation of methods for fitting probe-level models and tools using these models. PLM based quality assessment tools.

Maintained by Ben Bolstad. Last updated 2 months ago.

microarray onechannel preprocessing qualitycontrol openblas zlib

6.99 score 206 scripts 4 dependents

r-forge

oompaBase:Class Unions, Matrix Operations, and Color Schemes for OOMPA

Provides the class unions that must be preloaded in order for the basic tools in the OOMPA (Object-Oriented Microarray and Proteomics Analysis) project to be defined and loaded. It also includes vectorized operations for row-by-row means, variances, and t-tests. Finally, it provides new color schemes. Details on the packages in the OOMPA project can be found at <http://oompa.r-forge.r-project.org/>.

Maintained by Kevin R. Coombes. Last updated 2 months ago.

infrastructure

6.97 score 29 scripts 18 dependents

skoval

RISmed:Download Content from NCBI Databases

A set of tools to extract bibliographic content from the National Center for Biotechnology Information (NCBI) databases, including PubMed. The name RISmed is a portmanteau of RIS (for Research Information Systems, a common tag format for bibliographic data) and PubMed.

Maintained by Stephanie Kovalchik. Last updated 3 years ago.

38 stars 6.94 score 252 scripts 3 dependents

bioc

GOstats:Tools for manipulating GO and microarrays

A set of tools for interacting with GO and microarray data. A variety of basic manipulation tools for graphs, hypothesis testing and other simple calculations.

Maintained by Bioconductor Package Maintainer. Last updated 5 months ago.

annotation go multiplecomparison geneexpression microarray pathways genesetenrichment graphandnetwork

6.93 score 528 scripts 12 dependents

archaeostat

ArchaeoPhases:Post-Processing of Markov Chain Monte Carlo Simulations for Chronological Modelling

Statistical analysis of archaeological dates and groups of dates. This package post-processes Markov Chain Monte Carlo (MCMC) simulations from 'ChronoModel' <https://chronomodel.com/>, 'Oxcal' <https://c14.arch.ox.ac.uk/oxcal.html> or 'BCal' <https://bcal.shef.ac.uk/>. It provides functions for the study of rhythms of the long term from the posterior distribution of a series of dates (tempo and activity plot). It also allows the estimation and visualization of time ranges from the posterior distribution of groups of dates (e.g. duration, transition and hiatus between successive phases) as described in Philippe and Vibet (2020) <doi:10.18637/jss.v093.c01>.

Maintained by Anne Philippe. Last updated 12 months ago.

archaeology bayesian-statistics geochronology markov-chain radiocarbon-dates

10 stars 6.90 score 66 scripts

kingaa

ouch:Ornstein-Uhlenbeck Models for Phylogenetic Comparative Hypotheses

Fit and compare Ornstein-Uhlenbeck models for evolution along a phylogenetic tree.

Maintained by Aaron A. King. Last updated 5 months ago.

adaptive-regime brownian-motion ornstein-uhlenbeck ornstein-uhlenbeck-models ouch phylogenetic-comparative-hypotheses phylogenetic-comparative-methods phylogenetic-data react

15 stars 6.87 score 68 scripts 4 dependents

flr

FLasher:Projection and Forecasting of Fish Populations, Stocks and Fleets

Projection of future population and fishery dynamics is carried out for a given set of management targets. A system of equations is solved, using Automatic Differentiation (AD), for the levels of effort by fishery (fleet) that will result in the required abundances, catches or fishing mortalities.

Maintained by Iago Mosqueira. Last updated 21 days ago.

forecast fisheries flr cpp

2 stars 6.86 score 254 scripts 6 dependents

bioc

GenomicFiles:Distributed computing by file or by range

This package provides infrastructure for parallel computations distributed 'by file' or 'by range'. User defined MAPPER and REDUCER functions provide added flexibility for data combination and manipulation.

Maintained by Bioconductor Package Maintainer. Last updated 5 months ago.

genetics infrastructure dataimport sequencing coverage

6.86 score 89 scripts 16 dependents

ingmarvisser

depmixS4:Dependent Mixture Models - Hidden Markov Models of GLMs and Other Distributions in S4

Fits latent (hidden) Markov models on mixed categorical and continuous (time series) data, otherwise known as dependent mixture models, see Visser & Speekenbrink (2010, <DOI:10.18637/jss.v036.i07>).

Maintained by Ingmar Visser. Last updated 4 years ago.

12 stars 6.85 score 308 scripts 4 dependents

pcruniversum

chipPCR:Toolkit of Helper Functions to Pre-Process Amplification Data

A collection of functions to pre-process amplification curve data from polymerase chain reaction (PCR) or isothermal amplification reactions. Contains functions to normalize and baseline amplification curves, to detect both the start and end of an amplification reaction, several smoothers (e.g., LOWESS, moving average, cubic splines, Savitzky-Golay), a function to detect false positive amplification reactions and a function to determine the amplification efficiency. Quantification point (Cq) methods include the first (FDM) and second approximate derivative maximum (SDM) methods (calculated by a 5-point-stencil) and the cycle threshold method. Data sets of experimental nucleic acid amplification systems ('VideoScan HCU', capillary convective PCR (ccPCR)) and commercial systems are included. Amplification curves were generated by helicase dependent amplification (HDA), ccPCR or PCR. Intercalating dyes (EvaGreen, SYBR Green) and hydrolysis probes (TaqMan) were used as detection systems. For more information see: Roediger et al. (2015) <doi:10.1093/bioinformatics/btv205>.

Maintained by Stefan Roediger. Last updated 4 years ago.

8 stars 6.84 score 97 scripts 1 dependents

ludovikcoba

rrecsys:Environment for Evaluating Recommender Systems

Processes standard recommendation datasets (e.g., a user-item rating matrix) as input and generates rating predictions and lists of recommended items. Standard algorithm implementations which are included in this package are the following: Global/Item/User-Average baselines, Weighted Slope One, Item-Based KNN, User-Based KNN, FunkSVD, BPR and weighted ALS. They can be assessed according to the standard offline evaluation methodology (Shani et al. (2011) <doi:10.1007/978-0-387-85820-3_8>) for recommender systems using measures such as MAE, RMSE, Precision, Recall, F1, AUC, NDCG, RankScore and coverage measures. The package (Coba et al. (2017) <doi:10.1007/978-3-319-60042-0_36>) is intended for rapid prototyping of recommendation algorithms and education purposes.

Maintained by Ludovik Çoba. Last updated 3 years ago.

cpp

23 stars 6.84 score 25 scripts

bioc

maser:Mapping Alternative Splicing Events to pRoteins

This package provides functionalities for downstream analysis, annotation and visualization of alternative splicing events generated by rMATS.

Maintained by Diogo F.T. Veiga. Last updated 5 months ago.

alternativesplicing transcriptomics visualization

17 stars 6.74 score 18 scripts

dgerlanc

portfolio:Analysing Equity Portfolios

Classes for analysing and implementing equity portfolios, including routines for generating tradelists and calculating exposures to user-specified risk factors.

Maintained by Daniel Gerlanc. Last updated 7 months ago.

finance portfolio-construction risk-modelling

16 stars 6.71 score 106 scripts

bioc

doppelgangR:Identify likely duplicate samples from genomic or meta-data

The main function is doppelgangR(), which takes as minimal input a list of ExpressionSet objects, and searches all list pairs for duplicated samples. The search is based on the genomic data (exprs(eset)), phenotype/clinical data (pData(eset)), and "smoking guns" - supposedly unique identifiers found in pData(eset).

Maintained by Levi Waldron. Last updated 5 months ago.

immunooncology rnaseq microarray geneexpression qualitycontrol bioconductor-package

5 stars 6.67 score 31 scripts

bioc

LEA:LEA: an R package for Landscape and Ecological Association Studies

LEA is an R package dedicated to population genomics, landscape genomics and genotype-environment association tests. LEA can run analyses of population structure and genome-wide tests for local adaptation, and also performs imputation of missing genotypes. The package includes statistical methods for estimating ancestry coefficients from large genotypic matrices and for evaluating the number of ancestral populations (snmf). It performs statistical tests using latent factor mixed models for identifying genetic polymorphisms that exhibit association with environmental gradients or phenotypic traits (lfmm2). In addition, LEA computes values of genetic offset statistics based on new or predicted environments (genetic.gap, genetic.offset). LEA is mainly based on optimized programs that can scale with the dimensions of large data sets.

Maintained by Olivier Francois. Last updated 18 days ago.

software statistical method clustering regression openblas

6.63 score 534 scripts

robinhankin

spray:Sparse Arrays and Multivariate Polynomials

Sparse arrays interpreted as multivariate polynomials. Uses 'disordR' discipline (Hankin, 2022, <doi:10.48550/ARXIV.2210.03856>). To cite the package in publications please use Hankin (2022) <doi:10.48550/ARXIV.2210.10848>.

Maintained by Robin K. S. Hankin. Last updated 2 months ago.

cpp

2 stars 6.62 score 35 scripts 4 dependents

robinhankin

disordR:Non-Ordered Vectors

Functionality for manipulating values of associative maps. The package is a dependency for mvp-type packages that use the STL map class: it traps plausible idiom that is ill-defined (implementation-specific) and returns an informative error, rather than returning a possibly incorrect result. To cite the package in publications please use Hankin (2022) <doi:10.48550/ARXIV.2210.03856>.

Maintained by Robin K. S. Hankin. Last updated 5 months ago.

1 stars 6.59 score 20 dependents

flr

FLBRP:Reference Points for Fisheries Management

Calculates a range of biological reference points based upon yield per recruit and stock recruit based equilibrium calculations. These include F based reference points like F0.1, FMSY and biomass based reference points like BMSY.

Maintained by Iago Mosqueira. Last updated 4 months ago.

reference points fisheries flr cpp

2 stars 6.58 score 350 scripts 4 dependents

spkaluzny

splus2R:Supplemental S-PLUS Functionality in R

Currently there are many functions in S-PLUS that are missing in R. To facilitate the conversion of S-PLUS packages to R packages, this package provides some missing S-PLUS functionality in R.

Maintained by Stephen Kaluzny. Last updated 1 years ago.

1 stars 6.56 score 82 scripts 30 dependents

fbertran

Cascade:Selection, Reverse-Engineering and Prediction in Cascade Networks

A modeling tool allowing gene selection, reverse engineering, and prediction in cascade networks. Jung, N., Bertrand, F., Bahram, S., Vallat, L., and Maumy-Bertrand, M. (2014) <doi:10.1093/bioinformatics/btt705>.

Maintained by Frederic Bertrand. Last updated 2 years ago.

1 stars 6.56 score 40 scripts 2 dependents

bioc

deepSNV:Detection of subclonal SNVs in deep sequencing data.

This package provides quantitative variant callers for detecting subclonal mutations in ultra-deep (>=100x coverage) sequencing experiments. The deepSNV algorithm is used for a comparative setup with a control experiment of the same loci and uses a beta-binomial model and a likelihood ratio test to discriminate sequencing errors and subclonal SNVs. The shearwater algorithm computes a Bayes classifier based on a beta-binomial model for variant calling with multiple samples for precisely estimating model parameters - such as local error rates and dispersion - and prior knowledge, e.g. from variation databases such as COSMIC.

Maintained by Moritz Gerstung. Last updated 5 months ago.

geneticvariability snp sequencing genetics dataimport curl bzip2 xz-utils zlib cpp

6.53 score 38 scripts 1 dependents

yangjasp

optimall:Allocate Samples Among Strata

Functions for the design process of survey sampling, with specific tools for multi-wave and multi-phase designs. Perform optimum allocation using Neyman (1934) <doi:10.2307/2342192> or Wright (2012) <doi:10.1080/00031305.2012.733679> allocation, split strata based on quantiles or values of known variables, randomly select samples from strata, allocate sampling waves iteratively, and organize a complex survey design. Also includes a Shiny application for observing the effects of different strata splits.

Maintained by Jasper Yang. Last updated 1 months ago.

5 stars 6.49 score 39 scripts
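
The Neyman allocation named in the optimall entry above assigns each stratum a share of the sample proportional to its size times its standard deviation: n_h = n * N_h * S_h / sum_k(N_k * S_k). The sketch below is plain Python, not the package's interface; the function name `neyman_allocation` and its arguments are hypothetical, and the results are left unrounded, so integer rounding is not handled here.

```python
def neyman_allocation(total_n, stratum_sizes, stratum_sds):
    """Split total_n across strata in proportion to N_h * S_h.

    stratum_sizes: population size N_h of each stratum.
    stratum_sds:   standard deviation S_h of the study variable in each stratum.
    Returns unrounded (possibly fractional) sample sizes, one per stratum.
    """
    if len(stratum_sizes) != len(stratum_sds):
        raise ValueError("stratum_sizes and stratum_sds must have equal length")
    # Weight of each stratum: size times within-stratum variability.
    weights = [size * sd for size, sd in zip(stratum_sizes, stratum_sds)]
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("at least one stratum needs positive size and spread")
    return [total_n * w / total_weight for w in weights]


if __name__ == "__main__":
    # Illustrative numbers only: a small, variable stratum and a large, stable one.
    print(neyman_allocation(100, [100, 300], [3.0, 1.0]))
```

In this illustration the smaller stratum receives as many samples as the larger one, because its higher variability offsets its smaller size.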

r-forge

ClassComparison:Classes and Methods for "Class Comparison" Problems on Microarrays

Defines the classes used for "class comparison" problems in the OOMPA project (<http://oompa.r-forge.r-project.org/>). Class comparison includes tests for differential expression; see Simon's book for details on typical problem types.

Maintained by Kevin R. Coombes. Last updated 2 months ago.

microarray differentialexpression multiplecomparisons

6.46 score 44 scripts 3 dependents

bstatcomp

bayes4psy:User Friendly Bayesian Data Analysis for Psychology

Contains several Bayesian models for data analysis of psychological tests. A user friendly interface for these models should enable students and researchers to perform professional level Bayesian data analysis without advanced knowledge in programming and Bayesian statistics. This package is based on the Stan platform (Carpenter et al. 2017 <doi:10.18637/jss.v076.i01>).

Maintained by Jure Demšar. Last updated 1 years ago.

cpp

14 stars 6.44 score 33 scripts

bioc

PICS:Probabilistic inference of ChIP-seq

Probabilistic inference of ChIP-Seq using an empirical Bayes mixture model approach.

Maintained by Renan Sauteraud. Last updated 2 days ago.

clustering visualization sequencing chipseq gsl

6.43 score 7 scripts 1 dependents

bioc

quantro:A test for when to use quantile normalization

A data-driven test for the assumptions of quantile normalization using raw data such as objects that inherit eSets (e.g. ExpressionSet, MethylSet). Group level information about each sample (such as Tumor / Normal status) must also be provided because the test assesses if there are global differences in the distributions between the user-defined groups.

Maintained by Stephanie Hicks. Last updated 5 months ago.

normalization preprocessing multiplecomparison microarray sequencing

6.40 score 69 scripts 2 dependents

blue-matter

SAMtool:Stock Assessment Methods Toolkit

Simulation tools for closed-loop simulation are provided for the 'MSEtool' operating model to inform data-rich fisheries. 'SAMtool' provides a conditioning model, assessment models of varying complexity with standardized reporting, model-based management procedures, and diagnostic tools for evaluating assessments inside closed-loop simulation.

Maintained by Quang Huynh. Last updated 1 months ago.

cpp

3 stars 6.39 score 36 scripts 1 dependents

r-forge

TailRank:The Tail-Rank Statistic

Implements the tail-rank statistic for selecting biomarkers from a microarray data set, an efficient nonparametric test focused on the distributional tails. See <https://gitlab.com/krcoombes/coombeslab/-/blob/master/doc/papers/tolstoy-new.pdf>.

Maintained by Kevin R. Coombes. Last updated 2 months ago.

6.38 score 37 scripts 3 dependents

ropensci

QuadratiK:Collection of Methods Constructed using Kernel-Based Quadratic Distances

Includes tests for multivariate normality, tests for uniformity on the d-dimensional sphere, non-parametric two- and k-sample tests, random generation of points from the Poisson kernel-based density, and a clustering algorithm for spherical data. For more information see Saraceno G., Markatou M., Mukhopadhyay R. and Golzy M. (2024) <doi:10.48550/arXiv.2402.02290>, Markatou, M. and Saraceno, G. (2024) <doi:10.48550/arXiv.2407.16374>, Ding, Y., Markatou, M. and Saraceno, G. (2023) <doi:10.5705/ss.202022.0347>, and Golzy, M. and Markatou, M. (2020) <doi:10.1080/10618600.2020.1740713>.

Maintained by Giovanni Saraceno. Last updated 2 months ago.

cpp

1 stars 6.36 score 27 scripts

bioc

NanoStringNCTools:NanoString nCounter Tools

Tools for NanoString Technologies nCounter Technology. Provides support for reading RCC files into an ExpressionSet derived object. Also includes methods for QC and normalization of NanoString data.

Maintained by Maddy Griswold. Last updated 5 months ago.

geneexpression transcription cellbasedassays dataimport transcriptomics proteomics mrnamicroarray proprietaryplatforms rnaseq

6.35 score 94 scripts 4 dependents

stc04003

reReg:Recurrent Event Regression

A comprehensive collection of practical and easy-to-use tools for regression analysis of recurrent events, with or without the presence of a (possibly) informative terminal event described in Chiou et al. (2023) <doi:10.18637/jss.v105.i05>. The modeling framework is based on a joint frailty scale-change model, that includes models described in Wang et al. (2001) <doi:10.1198/016214501753209031>, Huang and Wang (2004) <doi:10.1198/016214504000001033>, Xu et al. (2017) <doi:10.1080/01621459.2016.1173557>, and Xu et al. (2019) <doi:10.5705/SS.202018.0224> as special cases. The implemented estimating procedure does not require any parametric assumption on the frailty distribution. The package also allows the users to specify different model forms for both the recurrent event process and the terminal event.

Maintained by Sy Han (Steven) Chiou. Last updated 2 months ago.

openblas cpp

23 stars 6.35 score 36 scripts 1 dependents

dfriend21

quadtree:Region Quadtrees for Spatial Data

Provides functionality for working with raster-like quadtrees (also called “region quadtrees”), which allow for variable-sized cells. The package allows for flexibility in the quadtree creation process. Several functions defining how to split and aggregate cells are provided, and custom functions can be written for both of these processes. In addition, quadtrees can be created using other quadtrees as “templates”, so that the new quadtree's structure is identical to the template quadtree. The package also includes functionality for modifying quadtrees, querying values, saving quadtrees to a file, and calculating least-cost paths using the quadtree as a resistance surface.

Maintained by Derek Friend. Last updated 2 years ago.

cpp

19 stars 6.34 score 58 scripts

cran

fGarch:Rmetrics - Autoregressive Conditional Heteroskedastic Modelling

Analyze and model heteroskedastic behavior in financial time series.

Maintained by Georgi N. Boshnakov. Last updated 1 years ago.

fortran

7 stars 6.33 score 51 dependents

smoeding

usl:Analyze System Scalability with the Universal Scalability Law

The Universal Scalability Law (Gunther 2007) <doi:10.1007/978-3-540-31010-5> is a model to predict hardware and software scalability. It uses system capacity as a function of load to forecast the scalability for the system.

Maintained by Stefan Moeding. Last updated 3 years ago.

scalability universal-scalability-law usl

36 stars 6.32 score 117 scripts
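
The Universal Scalability Law behind the usl entry above is commonly written as C(N) = N / (1 + alpha * (N - 1) + beta * N * (N - 1)), where N is the load, alpha a contention coefficient and beta a coherency coefficient. The sketch below is plain Python and does not reflect the package's interface; the function names and the coefficient values are illustrative assumptions.

```python
import math


def usl_capacity(n, alpha, beta):
    """Relative capacity at load n under the Universal Scalability Law.

    alpha: contention coefficient (serialized fraction of the work).
    beta:  coherency coefficient (cost of keeping shared state consistent).
    """
    if n < 1:
        raise ValueError("load n must be at least 1")
    return n / (1.0 + alpha * (n - 1) + beta * n * (n - 1))


def usl_peak_load(alpha, beta):
    """Load at which capacity peaks, from setting dC/dN = 0: sqrt((1 - alpha) / beta)."""
    if beta <= 0:
        raise ValueError("capacity has no finite peak when beta <= 0")
    return math.sqrt((1.0 - alpha) / beta)


if __name__ == "__main__":
    # Hypothetical coefficients, chosen only to show the shape of the curve.
    alpha, beta = 0.05, 0.001
    for n in (1, 8, 32, 64):
        print(n, round(usl_capacity(n, alpha, beta), 2))
    print("peak near N =", round(usl_peak_load(alpha, beta), 1))
```

With beta equal to zero the expression reduces to Amdahl's law; a positive beta makes capacity decline once the load passes the peak.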

jensharbers

agricolaeplotr:Visualization of Design of Experiments from the 'agricolae' Package

Visualization of Design of Experiments from the 'agricolae' package with the 'ggplot2' framework. The user provides an experiment design from the 'agricolae' package, calls the corresponding function and will receive a visualization with 'ggplot2' based functions that are specific for each design. As there are many different designs, each design is tested on its type. The output can be modified with standard 'ggplot2' commands or with other packages with 'ggplot2' function extensions.

Maintained by Jens Harbers. Last updated 2 months ago.

8 stars 6.27 score 78 scripts

larmarange

prevR:Estimating Regional Trends of a Prevalence from a DHS and Similar Surveys

Spatial estimation of a prevalence surface or a relative risks surface, using data from a Demographic and Health Survey (DHS) or an analog survey, see Larmarange et al. (2011) <doi:10.4000/cybergeo.24606>.

Maintained by Joseph Larmarange. Last updated 6 months ago.

5 stars 6.26 score 46 scripts

bioc

lumi:BeadArray Specific Methods for Illumina Methylation and Expression Microarrays

The lumi package provides an integrated solution for the Illumina microarray data analysis. It includes functions of Illumina BeadStudio (GenomeStudio) data input, quality control, BeadArray-specific variance stabilization, normalization and gene annotation at the probe level. It also includes the functions of processing Illumina methylation microarrays, especially Illumina Infinium methylation microarrays.

Maintained by Lei Huang. Last updated 5 months ago.

microarray onechannel preprocessing dnamethylation qualitycontrol twochannel

6.26 score 294 scripts 5 dependents

bioc

VariantFiltering:Filtering of coding and non-coding genetic variants

Filter genetic variants using different criteria such as inheritance model, amino acid change consequence, minor allele frequencies across human populations, splice site strength, conservation, etc.

Maintained by Robert Castelo. Last updated 2 months ago.

genetics homo_sapiens annotation snp sequencing highthroughputsequencing

4 stars 6.23 score 21 scripts

mu-sigma

HVT:Constructing Hierarchical Voronoi Tessellations and Overlay Heatmaps for Data Analysis

Facilitates building topology preserving maps for data analysis.

Maintained by "Mu Sigma, Inc.". Last updated 1 days ago.

4 stars 6.20 score 1 scripts

clarahapp

funData:An S4 Class for Functional Data

S4 classes for univariate and multivariate functional data with utility functions. See <doi:10.18637/jss.v093.i05> for a detailed description of the package functionalities and its interplay with the MFPCA package for multivariate functional principal component analysis <https://CRAN.R-project.org/package=MFPCA>.

Maintained by Clara Happ-Kurz. Last updated 1 years ago.

14 stars 6.15 score 111 scripts 6 dependents

bioc

Pedixplorer:Pedigree Functions

Routines to handle family data with a Pedigree object. The initial purpose was to create correlation structures that describe family relationships such as kinship and identity-by-descent, which can be used to model family data in mixed effects models, such as in the coxme function. Also includes a tool for Pedigree drawing which is focused on producing compact layouts without intervention. Recent additions include utilities to trim the Pedigree object with various criteria, and kinship for the X chromosome.

Maintained by Louis Le Nezet. Last updated 13 days ago.

software datarepresentation genetics graphandnetwork visualization kinship pedigree

2 stars 6.08 score 10 scripts

ltorgo

performanceEstimation:An Infra-Structure for Performance Estimation of Predictive Models

An infra-structure for estimating the predictive performance of predictive models. In this context, it can also be used to compare and/or select among different alternative ways of solving one or more predictive tasks. The main goal of the package is to provide a generic infra-structure to estimate the values of different metrics of predictive performance using different estimation procedures. These estimation tasks can be applied to any solutions (workflows) to the predictive tasks. The package provides easy to use standard workflows that allow the usage of any available R modeling algorithm together with some pre-defined data pre-processing steps and also prediction post-processing methods. It also provides means for addressing issues related to the statistical significance of the observed differences.

Maintained by Luis Torgo. Last updated 8 years ago.

16 stars 5.97 score 195 scripts 1 dependents

bioc

normr:Normalization and difference calling in ChIP-seq data

Robust normalization and difference calling procedures for ChIP-seq and similar data. Read counts are modeled jointly as a binomial mixture model with a user-specified number of components. A fitted background estimate accounts for the effect of enrichment in certain regions and, therefore, represents an appropriate null hypothesis. This robust background is used to identify significantly enriched or depleted regions.

Maintained by Johannes Helmuth. Last updated 5 months ago.

bayesian differentialpeakcalling classification dataimport chipseq ripseq functionalgenomics genetics multiplecomparison normalization peakdetection preprocessing alignment cpp openmp

11 stars 5.93 score 13 scripts

bozenne

BuyseTest:Generalized Pairwise Comparisons

Implementation of the Generalized Pairwise Comparisons (GPC) as defined in Buyse (2010) <doi:10.1002/sim.3923> for complete observations, and extended in Peron (2018) <doi:10.1177/0962280216658320> to deal with right-censoring. GPC compares two groups of observations (intervention vs. control group) regarding several prioritized endpoints to estimate the probability that a random observation drawn from one group performs better/worse/equivalently than a random observation drawn from the other group. Summary statistics such as the net treatment benefit, win ratio, or win odds are then deduced from these probabilities. Confidence intervals and p-values are obtained based on asymptotic results (Ozenne 2021 <doi:10.1177/09622802211037067>), non-parametric bootstrap, or permutations. The software enables the use of thresholds of minimal importance difference, stratification, non-prioritized endpoints (O'Brien test), and can handle right-censoring and competing-risks.

Maintained by Brice Ozenne. Last updated 16 days ago.

generalized-pairwise-comparisons non-parametric statistics cpp

5 stars 5.91 score 90 scripts
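
The pairwise logic described in the BuyseTest entry above can be illustrated for the simplest case: one endpoint, complete (uncensored) observations, and higher values counting as better. The sketch below is plain Python under those assumptions and is not the package's interface; the function name `pairwise_summary` and the `threshold` argument are hypothetical. It uses the usual definitions: net benefit = P(win) - P(loss), win ratio = P(win) / P(loss), and win odds = (P(win) + 0.5 P(tie)) / (P(loss) + 0.5 P(tie)).

```python
import math


def pairwise_summary(treatment, control, threshold=0.0):
    """Compare every treatment observation with every control observation.

    A pair is a win if treatment exceeds control by more than threshold,
    a loss if control exceeds treatment by more than threshold, else a tie.
    """
    if not treatment or not control:
        raise ValueError("both groups must contain at least one observation")
    wins = losses = ties = 0
    for t in treatment:
        for c in control:
            diff = t - c
            if diff > threshold:
                wins += 1
            elif diff < -threshold:
                losses += 1
            else:
                ties += 1
    n_pairs = len(treatment) * len(control)
    p_win, p_loss, p_tie = wins / n_pairs, losses / n_pairs, ties / n_pairs
    odds_denominator = p_loss + 0.5 * p_tie
    return {
        "net_benefit": p_win - p_loss,
        # Ratios are reported as infinite when their denominator is zero.
        "win_ratio": p_win / p_loss if losses else math.inf,
        "win_odds": (p_win + 0.5 * p_tie) / odds_denominator
        if odds_denominator > 0 else math.inf,
    }


if __name__ == "__main__":
    # Illustrative values only.
    print(pairwise_summary([3, 5], [1, 4]))
```

Several prioritized endpoints, censoring, and inference (confidence intervals, p-values) are outside the scope of this sketch.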

bioc

scanMiR:scanMiR

A set of tools for working with miRNA affinity models (KdModels), efficiently scanning for miRNA binding sites, and predicting target repression. It supports scanning using miRNA seeds, full miRNA sequences (enabling 3' alignment) and KdModels, and includes the prediction of slicing and TDMD sites. Finally, it includes utility and plotting functions (e.g. for the visual representation of miRNA-target alignment).

Maintained by Pierre-Luc Germain. Last updated 5 months ago.

mirna sequencematching alignment

5.89 score 52 scripts 1 dependents

bioc

globaltest:Testing Groups of Covariates/Features for Association with a Response Variable, with Applications to Gene Set Testing

The global test tests groups of covariates (or features) for association with a response variable. This package implements the test with diagnostic plots and multiple testing utilities, along with several functions to facilitate the use of this test for gene set testing of GO and KEGG terms.

Maintained by Jelle Goeman. Last updated 5 months ago.

microarray onechannel bioinformatics differentialexpression go pathways

5.89 score 79 scripts 6 dependents

ropensci

phylotaR:Automated Phylogenetic Sequence Cluster Identification from 'GenBank'

A pipeline for the identification, within taxonomic groups, of orthologous sequence clusters from 'GenBank' <https://www.ncbi.nlm.nih.gov/genbank/> as the first step in a phylogenetic analysis. The pipeline depends on a local alignment search tool and is, therefore, not dependent on differences in gene naming conventions and naming errors.

Maintained by Shixiang Wang. Last updated 8 months ago.

blastn genbank peer-reviewed phylogenetics sequence-alignment

23 stars 5.86 score 156 scripts

bioc

fabia:FABIA: Factor Analysis for Bicluster Acquisition

Biclustering by "Factor Analysis for Bicluster Acquisition" (FABIA). FABIA is a model-based technique for biclustering, that is clustering rows and columns simultaneously. Biclusters are found by factor analysis where both the factors and the loading matrix are sparse. FABIA is a multiplicative model that extracts linear dependencies between samples and feature patterns. It captures realistic non-Gaussian data distributions with heavy tails as observed in gene expression measurements. FABIA utilizes well understood model selection techniques like the EM algorithm and variational approaches and is embedded into a Bayesian framework. FABIA ranks biclusters according to their information content and separates spurious biclusters from true biclusters. The code is written in C.

Maintained by Andreas Mitterecker. Last updated 5 months ago.

statisticalmethod microarray differentialexpression multiplecomparison clustering visualization

5.84 score 32 scripts 6 dependents

cran

flexclust:Flexible Cluster Algorithms

The main function kcca implements a general framework for k-centroids cluster analysis supporting arbitrary distance measures and centroid computation. Further cluster methods include hard competitive learning, neural gas, and QT clustering. There are numerous visualization methods for cluster results (neighborhood graphs, convex cluster hulls, barcharts of centroids, ...), and bootstrap methods for the analysis of cluster stability.

Maintained by Bettina Grün. Last updated 29 days ago.

3 stars 5.81 score 52 dependents

bioc

BindingSiteFinder:Binding site definition based on iCLIP data

Precise knowledge of the binding sites of an RNA-binding protein (RBP) is key to understanding (post-)transcriptional regulatory processes. Here we present a workflow that describes how exact binding sites can be defined from iCLIP data. The package provides functions for binding site definition and result visualization. For details please see the vignette.

Maintained by Mirko Brüggemann. Last updated 9 days ago.

sequencing geneexpression generegulation functionalgenomics coverage dataimport binding-site-classification binding-sites bioconductor-package iclip rna-binding-proteins

6 stars 5.80 score 3 scripts

sylvainschmitt

rcontroll:Individual-Based Forest Growth Simulator 'TROLL'

'TROLL' is coded in C++ and it typically simulates hundreds of thousands of individuals over hundreds of years. The 'rcontroll' R package is a wrapper of 'TROLL'. 'rcontroll' includes functions that generate inputs for simulations and run simulations. Finally, it is possible to analyse the 'TROLL' outputs through tables, figures, and maps taking advantage of other R visualisation packages. 'rcontroll' also offers the possibility to generate a virtual LiDAR point cloud that corresponds to a snapshot of the simulated forest.

Maintained by Sylvain Schmitt. Last updated 6 months ago.

gsl cpp

5 stars 5.76 score 19 scripts

bioc

demuxmix:Demultiplexing oligo-barcoded scRNA-seq data using regression mixture models

A package for demultiplexing single-cell sequencing experiments of pooled cells labeled with barcode oligonucleotides. The package implements methods to fit regression mixture models for a probabilistic classification of cells, including multiplet detection. Demultiplexing error rates can be estimated, and methods for quality control are provided.

Maintained by Hans-Ulrich Klein. Last updated 5 months ago.

singlecell sequencing preprocessing classification regression

5 stars 5.76 score 19 scripts 1 dependents

bioc

qusage:qusage: Quantitative Set Analysis for Gene Expression

This package is an implementation of the Quantitative Set Analysis for Gene Expression (QuSAGE) method described in (Yaari G. et al, Nucl Acids Res, 2013). This is a novel Gene Set Enrichment-type test, which is designed to provide a faster, more accurate, and easier to understand test for gene expression studies. qusage accounts for inter-gene correlations using the Variance Inflation Factor technique proposed by Wu et al. (Nucleic Acids Res, 2012). In addition, rather than simply evaluating the deviation from a null hypothesis with a single number (a P value), qusage quantifies gene set activity with a complete probability density function (PDF). From this PDF, P values and confidence intervals can be easily extracted. Preserving the PDF also allows for post-hoc analysis (e.g., pair-wise comparisons of gene set activity) while maintaining statistical traceability. Finally, while qusage is compatible with individual gene statistics from existing methods (e.g., LIMMA), a Welch-based method is implemented that is shown to improve specificity. The QuSAGE package also includes a mixed effects model implementation, as described in (Turner JA et al, BMC Bioinformatics, 2015), and a meta-analysis framework as described in (Meng H, et al. PLoS Comput Biol. 2019). For questions, contact Chris Bolen (cbolen1@gmail.com) or Steven Kleinstein (steven.kleinstein@yale.edu).

Maintained by Christopher Bolen. Last updated 5 months ago.

genesetenrichment microarray rnaseq software immunooncology

5.65 score 185 scripts 1 dependents

bioc

flowMeans:Non-parametric Flow Cytometry Data Gating

Identifies cell populations in Flow Cytometry data using non-parametric clustering and segmented-regression-based change point detection. Note: R 2.11.0 or newer is required.

Maintained by Nima Aghaeepour. Last updated 5 months ago.

immunooncology flowcytometry cellbiology clustering

5.64 score 36 scripts 2 dependents

graemeleehickey

bayesDP:Implementation of the Bayesian Discount Prior Approach for Clinical Trials

Functions for data augmentation using the Bayesian discount prior method for single arm and two-arm clinical trials, as described in Haddad et al. (2017) <doi:10.1080/10543406.2017.1300907>. The discount power prior methodology was developed in collaboration with the Medical Device Innovation Consortium (MDIC) Computer Modeling & Simulation Working Group.

Maintained by Graeme L. Hickey. Last updated 3 months ago.

bayesian bayesian-inference bayesian-statistics clinical-trials mdic posterior-predictive posterior-probability prior-distribution openblas cpp

5.56 score 20 scripts 1 dependents

quantsulting

ghyp:Generalized Hyperbolic Distribution and Its Special Cases

Detailed functionality for working with the univariate and multivariate Generalized Hyperbolic distribution and its special cases (Hyperbolic (hyp), Normal Inverse Gaussian (NIG), Variance Gamma (VG), skewed Student-t and Gaussian distribution). In particular, it contains fitting procedures, an AIC-based model selection routine, and functions for the computation of density, quantile, probability, random variates, expected shortfall and some portfolio optimization and plotting routines as well as the likelihood ratio test. In addition, it contains the Generalized Inverse Gaussian distribution. See Chapter 3 of A. J. McNeil, R. Frey, and P. Embrechts. Quantitative risk management: Concepts, techniques and tools. Princeton University Press, Princeton (2005).

Maintained by Marc Weibel. Last updated 7 months ago.

5.55 score 90 scripts 8 dependents

bayesplay

bayesplay:The Bayes Factor Playground

A lightweight modelling syntax for defining likelihoods and priors and for computing Bayes factors for simple one parameter models. It includes functionality for computing and plotting priors, likelihoods, and model predictions. Additional functionality is included for computing and plotting posteriors.

Maintained by Lincoln John Colling. Last updated 1 years ago.

bayes bayesian bayesian-statistics

6 stars 5.54 score 23 scripts

jeffreyhanson

raptr:Representative and Adequate Prioritization Toolkit in R

Biodiversity is in crisis. The overarching aim of conservation is to preserve biodiversity patterns and processes. To this end, protected areas are established to buffer species and preserve biodiversity processes. But resources are limited and so protected areas must be cost-effective. This package contains tools to generate plans for protected areas (prioritizations), using spatially explicit targets for biodiversity patterns and processes. To obtain solutions in a feasible amount of time, this package uses the commercial 'Gurobi' software (obtained from <https://www.gurobi.com/>). For more information on using this package, see Hanson et al. (2018) <doi:10.1111/2041-210X.12862>.

Maintained by Jeffrey O Hanson. Last updated 1 years ago.

cpp

8 stars 5.52 score 83 scripts

sleire

etrm:Energy Trading and Risk Management

Provides a collection of functions to perform core tasks within Energy Trading and Risk Management (ETRM). Calculation of maximum smoothness forward price curves for electricity and natural gas contracts with flow delivery, as presented in F. E. Benth, S. Koekebakker, and F. Ollmar (2007) <doi:10.3905/jod.2007.694791> and F. E. Benth, J. S. Benth, and S. Koekebakker (2008) <doi:10.1142/6811>. Portfolio insurance trading strategies for price risk management in the forward market, see F. Black (1976) <doi:10.1016/0304-405X(76)90024-6>, T. Bjork (2009) <https://EconPapers.repec.org/RePEc:oxp:obooks:9780199574742>, F. Black and R. W. Jones (1987) <doi:10.3905/jpm.1987.409131> and H. E. Leland (1980) <http://www.jstor.org/stable/2327419>.

Maintained by Anders D. Sleire. Last updated 2 years ago.

commodities energy-trading risk-management trading-strategies

33 stars 5.52 score 10 scripts

dgerlanc

backtest:Exploring Portfolio-Based Conjectures About Financial Instruments

The backtest package provides facilities for exploring portfolio-based conjectures about financial instruments (stocks, bonds, swaps, options, et cetera).

Maintained by Daniel Gerlanc. Last updated 10 years ago.

20 stars 5.52 score 33 scripts

luciu5

antitrust:Tools for Antitrust Practitioners

A collection of tools for antitrust practitioners, including the ability to calibrate different consumer demand systems and simulate the effects of mergers under different competitive regimes.

Maintained by Charles Taragin. Last updated 6 months ago.

5 stars 5.51 score 36 scripts 2 dependents

fmmgroupva

FMM:Rhythmic Patterns Modeling by FMM Models

Provides a collection of functions to fit and explore single, multi-component and restricted Frequency Modulated Moebius (FMM) models. 'FMM' is a nonlinear parametric regression model capable of fitting non-sinusoidal shapes in rhythmic patterns. Details about the mathematical formulation of 'FMM' models can be found in Rueda et al. (2019) <doi:10.1038/s41598-019-54569-1>.

Maintained by Itziar Fernandez. Last updated 3 days ago.

2 stars 5.48 score

blasif

cocons:Covariate-Based Covariance Functions for Nonstationary Spatial Modeling

Estimation, prediction, and simulation of nonstationary Gaussian processes with modular covariate-based covariance functions. Sources of nonstationarity, such as spatial mean, variance, geometric anisotropy, smoothness, and nugget, can be considered based on spatial characteristics. An induced compact-supported nonstationary covariance function is provided, enabling fast and memory-efficient computations when handling densely sampled domains.

Maintained by Federico Blasi. Last updated 2 months ago.

covariance-matrix cpp estimation gaussian-processes large-dataset nonstationarity optimization prediction

3 stars 5.48 score 1 scripts

bioc

specL:specL - Prepare Peptide Spectrum Matches for Use in Targeted Proteomics

Provides functions for generating spectral libraries that can be used for MRM SRM MS workflows in proteomics. The package provides a BiblioSpec reader, a function which can add the protein information using a FASTA formatted amino acid file, and an export method for using the created library in the Spectronaut software. The package is developed, tested and used at the Functional Genomics Center Zurich <https://fgcz.ch>.

Maintained by Christian Panse. Last updated 5 months ago.

massspectrometry proteomics dda dia mass-spectrometry

1 stars 5.46 score 12 scripts

r-forge

fRegression:Rmetrics - Regression Based Decision and Prediction

A collection of functions for linear and non-linear regression modelling. It implements a wrapper for several regression models available in the base and contributed packages of R.

Maintained by Paul J. Northrop. Last updated 9 days ago.

1 stars 5.44 score 23 scripts

ssnn-airr

scoper:Spectral Clustering-Based Method for Identifying B Cell Clones

Provides a computational framework for identification of B cell clones from Adaptive Immune Receptor Repertoire sequencing (AIRR-Seq) data. Three main functions are included (identicalClones, hierarchicalClones, and spectralClones) that perform clustering among sequences of BCRs/IGs (B cell receptors/immunoglobulins) which share the same V gene, J gene and junction length. Nouri N and Kleinstein SH (2018) <doi:10.1093/bioinformatics/bty235>. Nouri N and Kleinstein SH (2019) <doi:10.1101/788620>. Gupta NT, et al. (2017) <doi:10.4049/jimmunol.1601850>.

Maintained by Susanna Marquez. Last updated 2 months ago.

cpp

5.43 score 89 scripts

staffanbetner

rethinking:Statistical Rethinking book package

Utilities for fitting and comparing models

Maintained by Richard McElreath. Last updated 4 months ago.

5.42 score 4.4k scripts

simonmoulds

lulcc:Land Use Change Modelling in R

Classes and methods for spatially explicit land use change modelling in R.

Maintained by Simon Moulds. Last updated 5 years ago.

41 stars 5.37 score 38 scripts

r-forge

R2MLwiN:Running 'MLwiN' from Within R

An R command interface to the 'MLwiN' multilevel modelling software package.

Maintained by Zhengzheng Zhang. Last updated 9 days ago.

5.35 score 125 scripts

neotomadb

neotoma2:Working with the Neotoma Paleoecology Database

Access and manipulation of data using the Neotoma Paleoecology Database. <https://api.neotomadb.org/api-docs/>.

Maintained by Dominguez Vidana Socorro. Last updated 8 months ago.

earthcube neotoma nsf paleoecology

8 stars 5.35 score 56 scripts

centerforstatistics-ugent

pim:Fit Probabilistic Index Models

Fit a probabilistic index model as described in Thas et al, 2012: <doi:10.1111/j.1467-9868.2011.01020.x>. The interface to the modeling function has changed in this new version. The old version is still available at R-Forge.

Maintained by Joris Meys. Last updated 3 months ago.

modelling statistics

10 stars 5.33 score 43 scripts

eglenn

acs:Download, Manipulate, and Present American Community Survey and Decennial Data from the US Census

Provides a general toolkit for downloading, managing, analyzing, and presenting data from the U.S. Census (<https://www.census.gov/data/developers/data-sets.html>), including SF1 (Decennial short-form), SF3 (Decennial long-form), and the American Community Survey (ACS). Confidence intervals provided with ACS data are converted to standard errors to be bundled with estimates in complex acs objects. Package provides new methods to conduct standard operations on acs objects and present/plot data in statistically appropriate ways.

Maintained by Ezra Haber Glenn. Last updated 6 years ago.

11 stars 5.33 score 430 scripts 3 dependents

fukayak

occumb:Site Occupancy Modeling for Environmental DNA Metabarcoding

Fits multispecies site occupancy models to environmental DNA metabarcoding data collected using a spatially replicated survey design. Model fitting results can be used to evaluate and compare the effectiveness of species detection to find an efficient survey design. Reference: Fukaya et al. (2022) <doi:10.1111/2041-210X.13732>.

Maintained by Keiichi Fukaya. Last updated 2 months ago.

jags cpp

2 stars 5.30 score 10 scripts

pedersen-fisheries-lab

sspm:Spatial Surplus Production Model Framework for Northern Shrimp Populations

Implements a GAM-based (Generalized Additive Models) spatial surplus production model (spatial SPM), aimed at modeling northern shrimp populations in Atlantic Canada but potentially applicable to any stock in any location. The package is opinionated in its implementation of SPMs as it internally makes the choice to use penalized spatial GAMs with time lags. However, it also aims to provide options for the user to customize their model. The methods are described in Pedersen et al. (2022, <https://www.dfo-mpo.gc.ca/csas-sccs/Publications/ResDocs-DocRech/2022/2022_062-eng.html>).

Maintained by Valentin Lucet. Last updated 2 months ago.

gam model spatial surplus

3 stars 5.28 score 21 scripts

kkawato

rdlearn:Safe Policy Learning under Regression Discontinuity Design with Multiple Cutoffs

Implements safe policy learning under regression discontinuity designs with multiple cutoffs, based on Zhang et al. (2022) <doi:10.48550/arXiv.2208.13323>. The learned cutoffs are guaranteed to perform no worse than the existing cutoffs in terms of overall outcomes. The 'rdlearn' package also includes features for visualizing the learned cutoffs relative to the baseline and conducting sensitivity analyses.

Maintained by Kentaro Kawato. Last updated 1 months ago.

1 stars 5.23 score 4 scripts

bioc

HiTC:High Throughput Chromosome Conformation Capture analysis

The HiTC package was developed to explore high-throughput 'C' data such as 5C or Hi-C. Dedicated R classes as well as standard methods for quality controls, normalization, visualization, and further analysis are also provided.

Maintained by Nicolas Servant. Last updated 5 months ago.

sequencing highthroughputsequencing hic

5.23 score 42 scripts

cran

ICS:Tools for Exploring Multivariate Data via ICS/ICA

Implementation of Tyler, Critchley, Duembgen and Oja's (JRSS B, 2009, <doi:10.1111/j.1467-9868.2009.00706.x>) and Oja, Sirkia and Eriksson's (AJS, 2006, <https://www.ajs.or.at/index.php/ajs/article/view/vol35,%20no2%263%20-%207>) method of two different scatter matrices to obtain an invariant coordinate system or independent components, depending on the underlying assumptions.

Maintained by Klaus Nordhausen. Last updated 10 days ago.

5.20 score 17 dependents

bioc

ASICS:Automatic Statistical Identification in Complex Spectra

With a set of pure metabolite reference spectra, ASICS quantifies concentration of metabolites in a complex spectrum. The identification of metabolites is performed by fitting a mixture model to the spectra of the library with a sparse penalty. The method and its statistical properties are described in Tardivel et al. (2017) <doi:10.1007/s11306-017-1244-5>.

Maintained by Gaëlle Lefort. Last updated 5 months ago.

software dataimport cheminformatics metabolomics

5.18 score 30 scripts

cran

aod:Analysis of Overdispersed Data

Provides a set of functions to analyse overdispersed counts or proportions. Most of the methods are already available elsewhere but are scattered in different packages. The proposed functions should be considered as complements to more sophisticated methods such as generalized estimating equations (GEE) or generalized linear mixed effect models (GLMM).

Maintained by Renaud Lancelot. Last updated 1 years ago.

3 stars 5.15 score 15 dependents

jaganmn

flint:Fast Library for Number Theory

An R interface to 'FLINT' <https://flintlib.org/>, a C library for number theory. 'FLINT' extends GNU 'MPFR' <https://www.mpfr.org/> and GNU 'MP' <https://gmplib.org/> with support for operations on standard rings (the integers, the integers modulo n, finite fields, the rational, p-adic, real, and complex numbers) as well as matrices and polynomials over rings. 'FLINT' implements midpoint-radius interval arithmetic, also known as ball arithmetic, in the real and complex numbers, enabling computation in arbitrary precision with rigorous propagation of rounding errors; see Johansson (2017) <doi:10.1109/TC.2017.2690633>. Finally, 'FLINT' provides ball arithmetic implementations of many special mathematical functions, with high coverage of reference works such as the NIST Digital Library of Mathematical Functions <https://dlmf.nist.gov/>. The R interface defines S4 classes, generic functions, and methods for representation and basic operations as well as plain R functions mirroring and vectorizing entry points in the C library.

Maintained by Mikael Jagan. Last updated 4 days ago.

flint mpfr4 gmp

5 stars 5.11 score 20 scripts

bioc

CMA:Synthesis of microarray-based classification

This package provides a comprehensive collection of various microarray-based classification algorithms both from Machine Learning and Statistics. Variable Selection, Hyperparameter tuning, Evaluation and Comparison can be performed combined or stepwise in a user-friendly environment.

Maintained by Roman Hornung. Last updated 5 months ago.

classification decisiontree

5.09 score 61 scripts

bioc

topdownr:Investigation of Fragmentation Conditions in Top-Down Proteomics

The topdownr package allows automatic and systemic investigation of fragment conditions. It creates Thermo Orbitrap Fusion Lumos method files to test hundreds of fragmentation conditions. Additionally it provides functions to analyse and process the generated MS data and determine the best conditions to maximise overall fragment coverage.

Maintained by Sebastian Gibb. Last updated 5 months ago.

immunooncology infrastructure proteomics massspectrometry coverage mass-spectrometry topdown

1 stars 5.08 score

statistikat

x12:Interface to 'X12-ARIMA'/'X13-ARIMA-SEATS' and Structure for Batch Processing of Seasonal Adjustment

The 'X13-ARIMA-SEATS' <https://www.census.gov/data/software/x13as.html> methodology and software is widely used and was developed by the US Census Bureau. It can be accessed from 'R' with this package, and 'X13-ARIMA-SEATS' binaries are provided by the 'R' package 'x13binary'.

Maintained by Alexander Kowarik. Last updated 3 years ago.

18 stars 5.06 score 57 scripts

yanrong-stacy-song

creditr:Credit Default Swaps

Price credit default swaps using 'C' code from the International Swaps and Derivatives Association CDS Standard Model. See <https://www.cdsmodel.com/cdsmodel/documentation.html> for more information about the model and <https://www.cdsmodel.com/cdsmodel/cds-disclaimer.html> for license details for the 'C' code.

Maintained by Yanrong Song. Last updated 5 days ago.

5.05 score 32 scripts

modal-inria

Rankcluster:Model-Based Clustering for Multivariate Partial Ranking Data

Implementation of a model-based clustering algorithm for ranking data (C. Biernacki, J. Jacques (2013) <doi:10.1016/j.csda.2012.08.008>). Multivariate rankings as well as partial rankings are taken into account. This algorithm is based on an extension of the Insertion Sorting Rank (ISR) model for ranking data, which is a meaningful and effective model parametrized by a position parameter (the modal ranking, quoted by mu) and a dispersion parameter (quoted by pi). The heterogeneity of the rank population is modelled by a mixture of ISR, whereas conditional independence assumption is considered for multivariate rankings.

Maintained by Quentin Grimonprez. Last updated 2 years ago.

clustering hacktoberfest rank cpp

1 stars 5.05 score 37 scripts 1 dependents

emanuelsommer

portvine:Vine Based (Un)Conditional Portfolio Risk Measure Estimation

Following Sommer (2022) <https://mediatum.ub.tum.de/1658240>, portfolio-level risk estimates (e.g. Value at Risk, Expected Shortfall) are obtained by modeling each asset univariately by an ARMA-GARCH model and then their cross dependence via a Vine Copula model in a rolling window fashion. One can even condition on variables/time series at certain quantile levels to stress test the risk measure estimates.

Maintained by Emanuel Sommer. Last updated 1 years ago.

expected-shortfall garch-models value-at-risk vine-copulas cpp

22 stars 5.04 score 6 scripts

alexzwanenburg

familiar:End-to-End Automated Machine Learning and Model Evaluation

Single unified interface for end-to-end modelling of regression, categorical and time-to-event (survival) outcomes. Models created using familiar are self-contained, and their use does not require additional information such as baseline survival, feature clustering, or feature transformation and normalisation parameters. Model performance, calibration, risk group stratification, (permutation) variable importance, individual conditional expectation, partial dependence, and more, are assessed automatically as part of the evaluation process and exported in tabular format and plotted, and may also be computed manually using export and plot functions. Where possible, metrics and values obtained during the evaluation process come with confidence intervals.

Maintained by Alex Zwanenburg. Last updated 6 months ago.

ai explainable-ai machine-learning survival-analysis tabular-data

30 stars 5.03 score 18 scripts

bioc

podkat:Position-Dependent Kernel Association Test

This package provides an association test that is capable of dealing with very rare and even private variants. This is accomplished by a kernel-based approach that takes the positions of the variants into account. The test can be used for pre-processed matrix data, but also directly for variant data stored in VCF files. Association testing can be performed whole-genome, whole-exome, or restricted to pre-defined regions of interest. The test is complemented by tools for analyzing and visualizing the results.

Maintained by Ulrich Bodenhofer. Last updated 5 months ago.

genetics wholegenome annotation variantannotation sequencing dataimport curl bzip2 xz-utils zlib cpp

5.02 score 6 scripts

evolutionary-optimization-laboratory

rmoo:Multi-Objective Optimization in R

The 'rmoo' package is a framework for multi- and many-objective optimization, which allows researchers and users versatility in parameter configuration, as well as tools for analysis, replication and visualization of results. The 'rmoo' package was built as a fork of the 'GA' package by Luca Scrucca (2017) <DOI:10.32614/RJ-2017-008> and implements the Non-Dominated Sorting Genetic Algorithms proposed by K. Deb.

Maintained by Francisco Benitez. Last updated 5 months ago.

metaheuristics multiobjective multiobjective-optimization nsga nsga2 nsga3 optimization pareto-front

30 stars 5.01 score 23 scripts

parksw3

fitode:Tools for Ordinary Differential Equations Model Fitting

Methods and functions for fitting ordinary differential equation (ODE) models in 'R'. Sensitivity equations are used to compute the gradients of ODE trajectories with respect to underlying parameters, which in turn allows for more stable fitting. Other fitting methods, such as MCMC (Markov chain Monte Carlo), are also available.

Maintained by Sang Woo Park. Last updated 1 months ago.

6 stars 5.01 score 34 scripts

bioc

fmrs:Variable Selection in Finite Mixture of AFT Regression and FMR Models

The package obtains parameter estimates, i.e., maximum likelihood estimators (MLE), via the Expectation-Maximization (EM) algorithm for the Finite Mixture of Regression (FMR) models with Normal distribution, and MLE for the Finite Mixture of Accelerated Failure Time Regression (FMAFTR) subject to right censoring with Log-Normal and Weibull distributions via the EM algorithm and the Newton-Raphson algorithm (for Weibull distribution). More importantly, the package obtains the maximum penalized likelihood estimators (MPLE) for both FMR and FMAFTR models (collectively called FMRs). A component-wise tuning parameter selection based on a component-wise BIC is implemented in the package. Furthermore, this package provides Ridge Regression and Elastic Net.

Maintained by Farhad Shokoohi. Last updated 5 months ago.

survival regression dimensionreduction

3 stars 5.00 score 55 scripts 1 dependents

r-forge

plasma:Partial LeAst Squares for Multiomic Analysis

Contains tools for supervised analyses of incomplete, overlapping multiomics datasets. Applies partial least squares in multiple steps to find models that predict survival outcomes. See Yamaguchi et al. (2023) <doi:10.1101/2023.03.10.532096>.

Maintained by Kevin R. Coombes. Last updated 2 months ago.

4.97 score 13 scripts

feiyoung

ILSE:Linear Regression Based on 'ILSE' for Missing Data

Linear regression when covariates include missing values, by embedding the correlation information between covariates. It works especially well for block missing data. 'ILSE' conducts imputation and regression simultaneously and iteratively. More details can be found in Huazhen Lin, Wei Liu and Wei Lan (2021) <doi:10.1080/07350015.2019.1635486>.

Maintained by Wei Liu. Last updated 1 years ago.

fiml ilse linear-regression missing-data openblas cpp

2 stars 4.95 score 3 scripts