Showing 200 of total 707 results
rcppcore
RcppArmadillo:'Rcpp' Integration for the 'Armadillo' Templated Linear Algebra Library
'Armadillo' is a templated C++ linear algebra library (by Conrad Sanderson) that aims towards a good balance between speed and ease of use. Integer, floating point and complex numbers are supported, as well as a subset of trigonometric and statistics functions. Various matrix decompositions are provided through optional integration with LAPACK and ATLAS libraries. The 'RcppArmadillo' package includes the header files from the templated 'Armadillo' library. Thus users do not need to install 'Armadillo' itself in order to use 'RcppArmadillo'. From release 7.800.0 on, 'Armadillo' is licensed under Apache License 2; previous releases were licensed under MPL 2.0 from version 3.800.0 onwards and LGPL-3 prior to that; 'RcppArmadillo' (the 'Rcpp' bindings/bridge to Armadillo) is licensed under the GNU GPL version 2 or later, as is the rest of 'Rcpp'.
Maintained by Dirk Eddelbuettel. Last updated 2 hours ago.
armadillo c-plus-plus rcpp rcpparmadillo openblas cpp openmp
200 stars 18.85 score 1.9k scripts 3.4k dependents
dmlc
xgboost:Extreme Gradient Boosting
Extreme Gradient Boosting, which is an efficient implementation of the gradient boosting framework from Chen & Guestrin (2016) <doi:10.1145/2939672.2939785>. This package is its R interface. The package includes an efficient linear model solver and tree learning algorithms. The package can automatically do parallel computation on a single machine which could be more than 10 times faster than existing gradient boosting packages. It supports various objective functions, including regression, classification and ranking. The package is made to be extensible, so that users are also allowed to define their own objectives easily.
Maintained by Jiaming Yuan. Last updated 1 day ago.
distributed-systems gbdt gbm gbrt machine-learning xgboost cpp openmp
27k stars 17.45 score 115 dependents
glmmtmb
glmmTMB:Generalized Linear Mixed Models using Template Model Builder
Fit linear and generalized linear mixed models with various extensions, including zero-inflation. The models are fitted using maximum likelihood estimation via 'TMB' (Template Model Builder). Random effects are assumed to be Gaussian on the scale of the linear predictor and are integrated out using the Laplace approximation. Gradients are calculated using automatic differentiation.
Maintained by Mollie Brooks. Last updated 2 days ago.
314 stars 16.84 score 3.7k scripts 25 dependents
sebkrantz
collapse:Advanced and Fast Data Transformation
A C/C++ based package for advanced data transformation and statistical computing in R that is extremely fast, class-agnostic, robust and programmer friendly. Core functionality includes a rich set of S3 generic grouped and weighted statistical functions for vectors, matrices and data frames, which provide efficient low-level vectorizations, OpenMP multithreading, and skip missing values by default. These are integrated with fast grouping and ordering algorithms (also callable from C), and efficient data manipulation functions. The package also provides a flexible and rigorous approach to time series and panel data in R. It further includes fast functions for common statistical procedures, detailed (grouped, weighted) summary statistics, powerful tools to work with nested data, fast data object conversions, functions for memory efficient R programming, and helpers to effectively deal with variable labels, attributes, and missing data. It is well integrated with base R classes, 'dplyr'/'tibble', 'data.table', 'sf', 'units', 'plm' (panel-series and data frames), and 'xts'/'zoo'.
Maintained by Sebastian Krantz. Last updated 6 days ago.
data-aggregation data-analysis data-manipulation data-processing data-science data-transformation econometrics high-performance panel-data scientific-computing statistics time-series weighted weights cpp openmp
672 stars 16.68 score 708 scripts 99 dependents
markvanderloo
stringdist:Approximate String Matching, Fuzzy Text Search, and String Distance Functions
Implements an approximate string matching version of R's native 'match' function. Also offers fuzzy text search based on various string distance measures. Can calculate various string distances based on edits (Damerau-Levenshtein, Hamming, Levenshtein, optimal string alignment), qgrams (q-gram, cosine, jaccard distance) or heuristic metrics (Jaro, Jaro-Winkler). An implementation of soundex is provided as well. Distances can be computed between character vectors while taking proper care of encoding or between integer vectors representing generic sequences. This package is built for speed and runs in parallel by using 'openMP'. An API for C or C++ is exposed as well. Reference: MPJ van der Loo (2014) <doi:10.32614/RJ-2014-011>.
Maintained by Mark van der Loo. Last updated 4 months ago.
327 stars 15.54 score 2.0k scripts 179 dependents
kosukeimai
MatchIt:Nonparametric Preprocessing for Parametric Causal Inference
Selects matched samples of the original treated and control groups with similar covariate distributions -- can be used to match exactly on covariates, to match on propensity scores, or perform a variety of other matching procedures. The package also implements a series of recommendations offered in Ho, Imai, King, and Stuart (2007) <DOI:10.1093/pan/mpl013>. (The 'gurobi' package, which is not on CRAN, is optional and comes with an installation of the Gurobi Optimizer, available at <https://www.gurobi.com>.)
Maintained by Noah Greifer. Last updated 14 days ago.
220 stars 15.03 score 2.4k scripts 21 dependents
philchalmers
mirt:Multidimensional Item Response Theory
Analysis of discrete response data using unidimensional and multidimensional item analysis models under the Item Response Theory paradigm (Chalmers (2012) <doi:10.18637/jss.v048.i06>). Exploratory and confirmatory item factor analysis models are estimated with quadrature (EM) or stochastic (MHRM) methods. Confirmatory bi-factor and two-tier models are available for modeling item testlets using dimension reduction EM algorithms, while multiple group analyses and mixed effects designs are included for detecting differential item, bundle, and test functioning, and for modeling item and person covariates. Finally, latent class models such as the DINA, DINO, multidimensional latent class, mixture IRT models, and zero-inflated response models are supported, as well as a wide family of probabilistic unfolding models.
Maintained by Phil Chalmers. Last updated 22 hours ago.
212 stars 14.93 score 2.5k scripts 40 dependents
lrberge
fixest:Fast Fixed-Effects Estimations
Fast and user-friendly estimation of econometric models with multiple fixed-effects. Includes ordinary least squares (OLS), generalized linear models (GLM) and the negative binomial. The core of the package is based on optimized parallel C++ code, scaling especially well for large data sets. The method to obtain the fixed-effects coefficients is based on Berge (2018) <https://github.com/lrberge/fixest/blob/master/_DOCS/FENmlm_paper.pdf>. Further provides tools to export and view the results of several estimations with intuitive design to cluster the standard-errors.
Maintained by Laurent Berge. Last updated 7 months ago.
394 stars 14.69 score 3.8k scripts 26 dependents
r-lidar
lidR:Airborne LiDAR Data Manipulation and Visualization for Forestry Applications
Airborne LiDAR (Light Detection and Ranging) interface for data manipulation and visualization. Read/write 'las' and 'laz' files, computation of metrics in area based approach, point filtering, artificial point reduction, classification from geographic data, normalization, individual tree segmentation and other manipulations.
Maintained by Jean-Romain Roussel. Last updated 2 months ago.
als forestry las laz lidar point-cloud remote-sensing openblas cpp openmp
623 stars 14.47 score 844 scripts 8 dependents
jkrijthe
Rtsne:T-Distributed Stochastic Neighbor Embedding using a Barnes-Hut Implementation
An R wrapper around the fast T-distributed Stochastic Neighbor Embedding implementation by Van der Maaten (see <https://github.com/lvdmaaten/bhtsne/> for more information on the original implementation).
Maintained by Jesse Krijthe. Last updated 10 months ago.
256 stars 14.01 score 4.4k scripts 233 dependents
asgr
imager:Image Processing Library Based on 'CImg'
Fast image processing for images in up to 4 dimensions (two spatial dimensions, one time/depth dimension, one colour dimension). Provides most traditional image processing tools (filtering, morphology, transformations, etc.) as well as various functions for easily analysing image data using R. The package wraps 'CImg', <http://cimg.eu>, a simple, modern C++ library for image processing.
Maintained by Aaron Robotham. Last updated 4 days ago.
17 stars 13.53 score 2.4k scripts 44 dependents
biodiverse
unmarked:Models for Data from Unmarked Animals
Fits hierarchical models of animal abundance and occurrence to data collected using survey methods such as point counts, site occupancy sampling, distance sampling, removal sampling, and double observer sampling. Parameters governing the state and observation processes can be modeled as functions of covariates. References: Kellner et al. (2023) <doi:10.1111/2041-210X.14123>, Fiske and Chandler (2011) <doi:10.18637/jss.v043.i10>.
Maintained by Ken Kellner. Last updated 9 days ago.
4 stars 13.02 score 652 scripts 12 dependents
cran
mgcv:Mixed GAM Computation Vehicle with Automatic Smoothness Estimation
Generalized additive (mixed) models, some of their extensions and other generalized ridge regression with multiple smoothing parameter estimation by (Restricted) Marginal Likelihood, Generalized Cross Validation and similar, or using iterated nested Laplace approximation for fully Bayesian inference. See Wood (2017) <doi:10.1201/9781315370279> for an overview. Includes a gam() function, a wide variety of smoothers, 'JAGS' support and distributions beyond the exponential family.
Maintained by Simon Wood. Last updated 1 year ago.
32 stars 12.71 score 17k scripts 7.8k dependents
covaruber
sommer:Solving Mixed Model Equations in R
Structural multivariate-univariate linear mixed model solver for estimation of multiple random effects with unknown variance-covariance structures (e.g., heterogeneous and unstructured) and known covariance among levels of random effects (e.g., pedigree and genomic relationship matrices) (Covarrubias-Pazaran, 2016 <doi:10.1371/journal.pone.0156744>; Maier et al., 2015 <doi:10.1016/j.ajhg.2014.12.006>; Jensen et al., 1997). REML estimates can be obtained using the Direct-Inversion Newton-Raphson and Direct-Inversion Average Information algorithms for the problems r x r (r being the number of records) or using the Henderson-based average information algorithm for the problem c x c (c being the number of coefficients to estimate). Spatial models can also be fitted using the two-dimensional spline functionality available.
Maintained by Giovanny Covarrubias-Pazaran. Last updated 2 days ago.
average-information mixed-models rcpparmadillo openblas cpp openmp
44 stars 12.63 score 300 scripts 10 dependents
bioc
SparseArray:High-performance sparse data representation and manipulation in R
The SparseArray package provides array-like containers for efficient in-memory representation of multidimensional sparse data in R (arrays and matrices). The package defines the SparseArray virtual class and two concrete subclasses: COO_SparseArray and SVT_SparseArray. Each subclass uses its own internal representation of the nonzero multidimensional data: the "COO layout" and the "SVT layout", respectively. SVT_SparseArray objects mimic as much as possible the behavior of ordinary matrix and array objects in base R. In particular, they support most of the "standard matrix and array API" defined in base R and in the matrixStats package from CRAN.
Maintained by Hervé Pagès. Last updated 10 days ago.
infrastructure datarepresentation bioconductor-package core-package openmp
9 stars 12.47 score 79 scripts 1.2k dependents
privefl
bigsnpr:Analysis of Massive SNP Arrays
Easy-to-use, efficient, flexible and scalable tools for analyzing massive SNP arrays. Privé et al. (2018) <doi:10.1093/bioinformatics/bty185>.
Maintained by Florian Privé. Last updated 22 days ago.
big-data bioinformatics memory-mapped-file parallel-computing polygenic-scores population-structure-inference snp-data statistical-methods openblas zlib cpp openmp
200 stars 11.44 score 1.5k scripts 3 dependents
nlmixr2
rxode2:Facilities for Simulating from ODE-Based Models
Facilities for running simulations from ordinary differential equation ('ODE') models, such as pharmacometrics and other compartmental models. A compilation manager translates the ODE model into C, compiles it, and dynamically loads the object code into R for improved computational efficiency. An event table object facilitates the specification of complex dosing regimens (optional) and sampling schedules. NB: The use of this package requires both C and Fortran compilers; for details on their use with R please see Section 6.3, Appendix A, and Appendix D in the "R Installation and Administration" manual. The code is mostly released under the GPL; 'VODE' and 'LSODA' are in the public domain. This information is available in inst/COPYRIGHTS.
Maintained by Matthew L. Fidler. Last updated 1 month ago.
40 stars 11.24 score 220 scripts 13 dependents
markvanderloo
gower:Gower's Distance
Compute Gower's distance (or similarity) coefficient between records. Compute the top-n matches between records. Core algorithms are executed in parallel on systems supporting OpenMP.
Maintained by Mark van der Loo. Last updated 10 months ago.
29 stars 11.19 score 66 scripts 391 dependents
mlampros
ClusterR:Gaussian Mixture Models, K-Means, Mini-Batch-Kmeans, K-Medoids and Affinity Propagation Clustering
Gaussian mixture models, k-means, mini-batch-kmeans, k-medoids and affinity propagation clustering with the option to plot, validate, predict (new data) and estimate the optimal number of clusters. The package takes advantage of 'RcppArmadillo' to speed up the computationally intensive parts of the functions. For more information, see (i) "Clustering in an Object-Oriented Environment" by Anja Struyf, Mia Hubert, Peter Rousseeuw (1997), Journal of Statistical Software, <doi:10.18637/jss.v001.i04>; (ii) "Web-scale k-means clustering" by D. Sculley (2010), ACM Digital Library, <doi:10.1145/1772690.1772862>; (iii) "Armadillo: a template-based C++ library for linear algebra" by Sanderson et al (2016), The Journal of Open Source Software, <doi:10.21105/joss.00026>; (iv) "Clustering by Passing Messages Between Data Points" by Brendan J. Frey and Delbert Dueck, Science 16 Feb 2007: Vol. 315, Issue 5814, pp. 972-976, <doi:10.1126/science.1136800>.
Maintained by Lampros Mouselimis. Last updated 9 months ago.
affinity-propagation cpp11 gmm kmeans kmedoids-clustering mini-batch-kmeans rcpparmadillo openblas cpp openmp
84 stars 11.08 score 640 scripts 24 dependents
stephenslab
mashr:Multivariate Adaptive Shrinkage
Implements the multivariate adaptive shrinkage (mash) method of Urbut et al (2019) <DOI:10.1038/s41588-018-0268-8> for estimating and testing large numbers of effects in many conditions (or many outcomes). Mash takes an empirical Bayes approach to testing and effect estimation; it estimates patterns of similarity among conditions, then exploits these patterns to improve accuracy of the effect estimates. The core linear algebra is implemented in C++ for fast model fitting and posterior computation.
Maintained by Peter Carbonetto. Last updated 5 months ago.
91 stars 11.04 score 624 scripts 3 dependents
grunwaldlab
poppr:Genetic Analysis of Populations with Mixed Reproduction
Population genetic analyses for hierarchical analysis of partially clonal populations built upon the architecture of the 'adegenet' package. Originally described in Kamvar, Tabima, and Grünwald (2014) <doi:10.7717/peerj.281> with version 2.0 described in Kamvar, Brooks, and Grünwald (2015) <doi:10.3389/fgene.2015.00208>.
Maintained by Zhian N. Kamvar. Last updated 11 months ago.
clonality genetic-analysis genetic-distances minimum-spanning-networks multilocus-genotypes multilocus-lineages population-genetics populations openmp
69 stars 10.84 score 672 scripts
zdebruine
RcppML:Rcpp Machine Learning Library
Fast machine learning algorithms including matrix factorization and divisive clustering for large sparse and dense matrices.
Maintained by Zach DeBruine. Last updated 2 years ago.
clustering matrix-factorization nmf rcpp rcppeigen sparse-matrix cpp openmp
107 stars 10.66 score 125 scripts 50 dependents
norskregnesentral
shapr:Prediction Explanation with Dependence-Aware Shapley Values
Complex machine learning models are often hard to interpret. However, in many situations it is crucial to understand and explain why a model made a specific prediction. Shapley values are the only such prediction explanation framework with a solid theoretical foundation. Previously known methods for estimating the Shapley values do, however, assume feature independence. This package implements methods which account for any feature dependence, and thereby produce more accurate estimates of the true Shapley values. An accompanying 'Python' wrapper ('shaprpy') is available through the GitHub repository.
Maintained by Martin Jullum. Last updated 1 day ago.
explainable-ai explainable-ml rcpp rcpparmadillo shapley openblas cpp openmp
154 stars 10.59 score 175 scripts 1 dependents
privefl
bigstatsr:Statistical Tools for Filebacked Big Matrices
Easy-to-use, efficient, flexible and scalable statistical tools. Package bigstatsr provides and uses Filebacked Big Matrices via memory-mapping. It provides for instance matrix operations, Principal Component Analysis, sparse linear supervised models, utility functions and more <doi:10.1093/bioinformatics/bty185>.
Maintained by Florian Privé. Last updated 7 months ago.
big-data large-matrices memory-mapped-file parallel-computing statistical-methods openblas cpp openmp
180 stars 10.59 score 394 scripts 16 dependents
jenniniku
gllvm:Generalized Linear Latent Variable Models
Analysis of multivariate data using generalized linear latent variable models (gllvm). Estimation is performed using either the Laplace method, variational approximations, or extended variational approximations, implemented via TMB (Kristensen et al. (2016), <doi:10.18637/jss.v070.i05>).
Maintained by Jenni Niku. Last updated 1 hour ago.
52 stars 10.56 score 176 scripts 1 dependents
bioc
DECIPHER:Tools for curating, analyzing, and manipulating biological sequences
A toolset for deciphering and managing biological sequences.
Maintained by Erik Wright. Last updated 17 days ago.
clustering genetics sequencing dataimport visualization microarray qualitycontrol qpcr alignment wholegenome microbiome immunooncology geneprediction openmp
10.55 score 1.1k scripts 14 dependents
wrathematics
float:32-Bit Floats
R comes with a suite of utilities for linear algebra with "numeric" (double precision) vectors/matrices. However, sometimes single precision (or less!) is more than enough for a particular task. This package extends R's linear algebra facilities to include 32-bit float (single precision) data. Float vectors/matrices have half the precision of their "numeric"-type counterparts but are generally faster to numerically operate on, for a performance vs accuracy trade-off. The internal representation is an S4 class, which allows us to keep the syntax identical to that of base R's. Interaction between floats and base types for binary operators is generally possible; in these cases, type promotion always defaults to the higher precision. The package ships with copies of the single precision 'BLAS' and 'LAPACK', which are automatically built in the event they are not available on the system.
Maintained by Drew Schmidt. Last updated 19 days ago.
float-matrix hpc linear-algebra matrix fortran openblas openmp
46 stars 10.53 score 228 scripts 42 dependents
bioc
miloR:Differential neighbourhood abundance testing on a graph
Milo performs single-cell differential abundance testing. Cell states are modelled as representative neighbourhoods on a nearest neighbour graph. Hypothesis testing is performed using either a negative binomial generalized linear model or negative binomial generalized linear mixed model.
Maintained by Mike Morgan. Last updated 5 months ago.
singlecell multiplecomparison functionalgenomics software openblas cpp openmp
357 stars 10.49 score 340 scripts 1 dependents
bioc
celda:CEllular Latent Dirichlet Allocation
Celda is a suite of Bayesian hierarchical models for clustering single-cell RNA-sequencing (scRNA-seq) data. It is able to perform "bi-clustering" and simultaneously cluster genes into gene modules and cells into cell subpopulations. It also contains DecontX, a novel Bayesian method to computationally estimate and remove RNA contamination in individual cells without empty droplet information. A variety of scRNA-seq data visualization functions is also included.
Maintained by Joshua Campbell. Last updated 1 month ago.
singlecell geneexpression clustering sequencing bayesian immunooncology dataimport cpp openmp
147 stars 10.47 score 256 scripts 2 dependents
david-cortes
isotree:Isolation-Based Outlier Detection
Fast and multi-threaded implementation of isolation forest (Liu, Ting, Zhou (2008) <doi:10.1109/ICDM.2008.17>), extended isolation forest (Hariri, Kind, Brunner (2018) <doi:10.48550/arXiv.1811.02141>), SCiForest (Liu, Ting, Zhou (2010) <doi:10.1007/978-3-642-15883-4_18>), fair-cut forest (Cortes (2021) <doi:10.48550/arXiv.2110.13402>), robust random-cut forest (Guha, Mishra, Roy, Schrijvers (2016) <http://proceedings.mlr.press/v48/guha16.html>), and customizable variations of them, for isolation-based outlier detection, clustered outlier detection, distance or similarity approximation (Cortes (2019) <doi:10.48550/arXiv.1910.12362>), isolation kernel calculation (Ting, Zhu, Zhou (2018) <doi:10.1145/3219819.3219990>), and imputation of missing values (Cortes (2019) <doi:10.48550/arXiv.1911.06646>), based on random or guided decision tree splitting, and providing different metrics for scoring anomalies based on isolation depth or density (Cortes (2021) <doi:10.48550/arXiv.2111.11639>). Provides simple heuristics for fitting the model to categorical columns and handling missing data, and offers options for varying between random and guided splits, and for using different splitting criteria.
Maintained by David Cortes. Last updated 7 days ago.
anomaly-detection imputation isolation-forest outlier-detection cpp openmp
206 stars 10.43 score 115 scripts 6 dependents
bioc
BASiCS:Bayesian Analysis of Single-Cell Sequencing data
Single-cell mRNA sequencing can uncover novel cell-to-cell heterogeneity in gene expression levels in seemingly homogeneous populations of cells. However, these experiments are prone to high levels of technical noise, creating new challenges for identifying genes that show genuine heterogeneous expression within the population of cells under study. BASiCS (Bayesian Analysis of Single-Cell Sequencing data) is an integrated Bayesian hierarchical model to perform statistical analyses of single-cell RNA sequencing datasets in the context of supervised experiments (where the groups of cells of interest are known a priori, e.g. experimental conditions or cell types). BASiCS performs built-in data normalisation (global scaling) and technical noise quantification (based on spike-in genes). BASiCS provides an intuitive detection criterion for highly (or lowly) variable genes within a single group of cells. Additionally, BASiCS can compare gene expression patterns between two or more pre-specified groups of cells. Unlike traditional differential expression tools, BASiCS quantifies changes in expression that lie beyond comparisons of means, also allowing the study of changes in cell-to-cell heterogeneity. The latter can be quantified via a biological over-dispersion parameter that measures the excess of variability that is observed with respect to Poisson sampling noise, after normalisation and technical noise removal. Due to the strong mean/over-dispersion confounding that is typically observed for scRNA-seq datasets, BASiCS also tests for changes in residual over-dispersion, defined by residual values with respect to a global mean/over-dispersion trend.
Maintained by Catalina Vallejos. Last updated 5 months ago.
immunooncology normalization sequencing rnaseq software geneexpression transcriptomics singlecell differentialexpression bayesian cellbiology bioconductor-package gene-expression rcpp rcpparmadillo scrna-seq single-cell openblas cpp openmp
83 stars 10.26 score 368 scripts 1 dependents
gaynorr
AlphaSimR:Breeding Program Simulations
The successor to the 'AlphaSim' software for breeding program simulation [Faux et al. (2016) <doi:10.3835/plantgenome2016.02.0013>]. Used for stochastic simulations of breeding programs to the level of DNA sequence for every individual. Contained is a wide range of functions for modeling common tasks in a breeding program, such as selection and crossing. These functions allow for constructing simulations of highly complex plant and animal breeding programs via scripting in the R software environment. Such simulations can be used to evaluate overall breeding program performance and conduct research into breeding program design, such as implementation of genomic selection. Included is the 'Markovian Coalescent Simulator' ('MaCS') for fast simulation of biallelic sequences according to a population demographic history [Chen et al. (2009) <doi:10.1101/gr.083634.108>].
Maintained by Chris Gaynor. Last updated 5 months ago.
breeding genomics simulation openblas cpp openmp
47 stars 10.22 score 534 scripts 2 dependents
l-ramirez-lopez
prospectr:Miscellaneous Functions for Processing and Sample Selection of Spectroscopic Data
Functions to preprocess spectroscopic data and conduct (representative) sample selection/calibration sampling.
Maintained by Leonardo Ramirez-Lopez. Last updated 11 days ago.
chemometrics derivatives infrared near-infrared nir pedometrics preprocessing resample sampling signal soil-spectroscopy spectroscopy openblas cpp openmp
42 stars 10.22 score 326 scripts 4 dependents
zarquon42b
Rvcg:Manipulations of Triangular Meshes Based on the 'VCGLIB' API
Operations on triangular meshes based on 'VCGLIB'. This package integrates nicely with the R-package 'rgl' to render the meshes processed by 'Rvcg'. The Visualization and Computer Graphics Library (VCG for short) is an open source portable C++ templated library for manipulation, processing and displaying with OpenGL of triangle and tetrahedral meshes. The library, composed of more than 100k lines of code, is released under the GPL license, and it is the base of most of the software tools of the Visual Computing Lab of the Italian National Research Council Institute ISTI <https://vcg.isti.cnr.it/>, like 'metro' and 'MeshLab'. The 'VCGLIB' source is pulled from trunk <https://github.com/cnr-isti-vclab/vcglib> and patched to work with options determined by the configure script as well as to work with the header files included by 'RcppEigen'.
Maintained by Stefan Schlager. Last updated 5 months ago.
25 stars 10.20 score 195 scripts 29 dependents
stewid
SimInf:A Framework for Data-Driven Stochastic Disease Spread Simulations
Provides an efficient and very flexible framework to conduct data-driven epidemiological modeling in realistic large scale disease spread simulations. The framework integrates infection dynamics in subpopulations as continuous-time Markov chains using the Gillespie stochastic simulation algorithm and incorporates available data such as births, deaths and movements as scheduled events at predefined time-points. Using C code for the numerical solvers and 'OpenMP' (if available) to divide work over multiple processors ensures high performance when simulating a sample outcome. One of our design goals was to make the package extendable and enable usage of the numerical solvers from other R extension packages in order to facilitate complex epidemiological research. The package contains template models and can be extended with user-defined models. For more details see the paper by Widgren, Bauer, Eriksson and Engblom (2019) <doi:10.18637/jss.v091.i12>. The package also provides functionality to fit models to time series data using the Approximate Bayesian Computation Sequential Monte Carlo ('ABC-SMC') algorithm of Toni and others (2009) <doi:10.1098/rsif.2008.0172>.
Maintained by Stefan Widgren. Last updated 16 days ago.
data-driven epidemiology high-performance-computing markov-chain mathematical-modelling gsl openmp
35 stars 10.09 score 227 scripts
zarquon42b
Morpho:Calculations and Visualisations Related to Geometric Morphometrics
A toolset for Geometric Morphometrics and mesh processing. This includes (among other stuff) mesh deformations based on reference points, permutation tests, detection of outliers, processing of sliding semi-landmarks and semi-automated surface landmark placement.
Maintained by Stefan Schlager. Last updated 5 months ago.
51 stars 10.01 score 218 scripts 13 dependents
kogalur
randomForestSRC:Fast Unified Random Forests for Survival, Regression, and Classification (RF-SRC)
Fast OpenMP parallel computing of Breiman's random forests for univariate, multivariate, unsupervised, survival, competing risks, class imbalanced classification and quantile regression. New Mahalanobis splitting for correlated outcomes. Extreme random forests and randomized splitting. Suite of imputation methods for missing data. Fast random forests using subsampling. Confidence regions and standard errors for variable importance. New improved holdout importance. Case-specific importance. Minimal depth variable importance. Visualize trees on your Safari or Google Chrome browser. Anonymous random forests for data privacy.
Maintained by Udaya B. Kogalur. Last updated 6 days ago.
124 stars 10.01 score 1.2k scripts 11 dependents
mlampros
OpenImageR:An Image Processing Toolkit
Incorporates functions for image preprocessing, filtering and image recognition. The package takes advantage of 'RcppArmadillo' to speed up computationally intensive functions. The histogram of oriented gradients descriptor is a modification of the 'findHOGFeatures' function of the 'SimpleCV' computer vision platform, the average_hash(), dhash() and phash() functions are based on the 'ImageHash' python library. The Gabor Feature Extraction functions are based on 'Matlab' code of the paper, "CloudID: Trustworthy cloud-based and cross-enterprise biometric identification" by M. Haghighat, S. Zonouz, M. Abdel-Mottaleb, Expert Systems with Applications, vol. 42, no. 21, pp. 7905-7916, 2015, <doi:10.1016/j.eswa.2015.06.025>. The 'SLIC' and 'SLICO' superpixel algorithms were explained in detail in (i) "SLIC Superpixels Compared to State-of-the-art Superpixel Methods", Radhakrishna Achanta, Appu Shaji, Kevin Smith, Aurelien Lucchi, Pascal Fua, and Sabine Suesstrunk, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, num. 11, p. 2274-2282, May 2012, <doi:10.1109/TPAMI.2012.120> and (ii) "SLIC Superpixels", Radhakrishna Achanta, Appu Shaji, Kevin Smith, Aurelien Lucchi, Pascal Fua, and Sabine Suesstrunk, EPFL Technical Report no. 149300, June 2010.
Maintained by Lampros Mouselimis. Last updated 2 years ago.
filtering gabor-feature-extraction gabor-filters hog-features image image-hashing processing rcpparmadillo recognition slic slico superpixels openblas cpp openmp
60 stars 9.86 score 358 scripts 8 dependents
pbreheny
biglasso:Extending Lasso Model Fitting to Big Data
Extend lasso and elastic-net model fitting for large data sets that cannot be loaded into memory. Designed to be more memory- and computation-efficient than existing lasso-fitting packages like 'glmnet' and 'ncvreg', thus allowing the user to analyze big data with limited RAM <doi:10.32614/RJ-2021-001>.
Maintained by Patrick Breheny. Last updated 23 days ago.
bigdatalassoout-of-coreparallel-computingcppopenmp
113 stars 9.84 score 74 scripts 1 dependents
jasonjfoster
roll:Rolling and Expanding Statistics
Fast and efficient computation of rolling and expanding statistics for time-series data.
Maintained by Jason Foster. Last updated 2 months ago.
algorithmsrcppstatisticsopenblascppopenmp
116 stars 9.76 score 318 scripts 13 dependents
coatless-rpkg
sitmo:Parallel Pseudo Random Number Generator (PPRNG) 'sitmo' Header Files
Provided within are two high quality and fast PPRNGs that may be used in an 'OpenMP' parallel environment. In addition, there is a generator for a one-dimensional low-discrepancy sequence. The objective of this library is to consolidate the distribution of the 'sitmo' (C++98 & C++11), 'threefry' and 'vandercorput' (C++11-only) engines on CRAN by enabling others to link to the header files inside of 'sitmo' instead of including a copy of each engine within their individual package. Lastly, the package contains example implementations using the 'sitmo' package and three accompanying vignettes that provide additional information.
Maintained by James Balamuta. Last updated 1 year ago.
parallelrandom-generationrcppcppopenmp
7 stars 9.75 score 15 scripts 201 dependents
jolars
SLOPE:Sorted L1 Penalized Estimation
Efficient implementations for Sorted L-One Penalized Estimation (SLOPE): generalized linear models regularized with the sorted L1-norm (Bogdan et al. 2015). Supported models include ordinary least-squares regression, binomial regression, multinomial regression, and Poisson regression. Both dense and sparse predictor matrices are supported. In addition, the package features predictor screening rules that enable fast and efficient solutions to high-dimensional problems.
Maintained by Johan Larsson. Last updated 14 days ago.
generalized-linear-modelsslopesparse-regressioncppopenmp
17 stars 9.62 score 75 scripts 3 dependents
donaldrwilliams
BGGM:Bayesian Gaussian Graphical Models
Fit Bayesian Gaussian graphical models. The methods are separated into two Bayesian approaches for inference: hypothesis testing and estimation. There are extensions for confirmatory hypothesis testing, comparing Gaussian graphical models, and node-wise predictability. These methods were recently introduced in the Gaussian graphical model literature, including Williams (2019) <doi:10.31234/osf.io/x8dpr>, Williams and Mulder (2019) <doi:10.31234/osf.io/ypxd8>, Williams, Rast, Pericchi, and Mulder (2019) <doi:10.31234/osf.io/yt386>.
Maintained by Philippe Rast. Last updated 3 months ago.
bayes-factorsbayesian-hypothesis-testinggaussian-graphical-modelsopenblascppopenmp
55 stars 9.61 score 102 scripts 1 dependents
schochastics
netrankr:Analyzing Partial Rankings in Networks
Implements methods for centrality related analyses of networks. While the package includes the possibility to build more than 20 indices, its main focus lies on index-free assessment of centrality via partial rankings obtained by neighborhood-inclusion or positional dominance. These partial rankings can be analyzed with different methods, including probabilistic methods like computing expected node ranks and relative rank probabilities (how likely is it that a node is more central than another?). The methodology is described in depth in the vignettes and in Schoch (2018) <doi:10.1016/j.socnet.2017.12.003>.
Maintained by David Schoch. Last updated 2 months ago.
network-analysisnetwork-centralityopenblascppopenmp
49 stars 9.56 score 91 scripts 2 dependents
ropensci
aorsf:Accelerated Oblique Random Forests
Fit, interpret, and compute predictions with oblique random forests. Includes support for partial dependence, variable importance, passing customized functions for variable importance and identification of linear combinations of features. Methods for the oblique random survival forest are described in Jaeger et al. (2023) <DOI:10.1080/10618600.2023.2231048>.
Maintained by Byron Jaeger. Last updated 7 days ago.
data-scienceobliquerandom-forestsurvivalopenblascppopenmp
58 stars 9.29 score 60 scripts 1 dependents
alarm-redist
redist:Simulation Methods for Legislative Redistricting
Enables researchers to sample redistricting plans from a pre-specified target distribution using Sequential Monte Carlo and Markov Chain Monte Carlo algorithms. The package allows for the implementation of various constraints in the redistricting process such as geographic compactness and population parity requirements. Tools for analysis such as computation of various summary statistics and plotting functionality are also included. The package implements the SMC algorithm of McCartan and Imai (2023) <doi:10.1214/23-AOAS1763>, the enumeration algorithm of Fifield, Imai, Kawahara, and Kenny (2020) <doi:10.1080/2330443X.2020.1791773>, the Flip MCMC algorithm of Fifield, Higgins, Imai and Tarr (2020) <doi:10.1080/10618600.2020.1739532>, the Merge-split/Recombination algorithms of Carter et al. (2019) <arXiv:1911.01503> and DeFord et al. (2021) <doi:10.1162/99608f92.eb30390f>, and the Short-burst optimization algorithm of Cannon et al. (2020) <arXiv:2011.02288>.
Maintained by Christopher T. Kenny. Last updated 2 months ago.
geospatialgerrymanderingredistrictingsamplingopenblascppopenmp
68 stars 9.17 score 259 scripts
mlampros
KernelKnn:Kernel k Nearest Neighbors
Extends the simple k-nearest neighbors algorithm by incorporating numerous kernel functions and a variety of distance metrics. The package takes advantage of 'RcppArmadillo' to speed up the calculation of distances between observations.
Maintained by Lampros Mouselimis. Last updated 2 years ago.
cpp11distance-metrickernel-methodsknnrcpparmadilloopenblascppopenmp
17 stars 9.16 score 54 scripts 13 dependents
2005m
kit:Data Manipulation Functions Implemented in C
Basic functions, implemented in C, for large data manipulation. Fast vectorised ifelse()/nested if()/switch() functions, psum()/pprod() functions equivalent to pmin()/pmax() plus others which are missing from base R. Most of these functions are callable at C level.
Maintained by Morgan Jacob. Last updated 7 months ago.
58 stars 9.11 score 92 scripts 5 dependents
david-cortes
MatrixExtra:Extra Methods for Sparse Matrices
Extends sparse matrix and vector classes from the 'Matrix' package by providing: (a) Methods and operators that work natively on CSR formats (compressed sparse row, a.k.a. 'RsparseMatrix') such as slicing/sub-setting, assignment, rbind(), mathematical operators for CSR and COO such as addition ("+") or sqrt(), and methods such as diag(); (b) Multi-threaded matrix multiplication and cross-product for many <sparse, dense> types, including the 'float32' type from 'float'; (c) Coercion methods between pairs of classes which are not present in 'Matrix', such as 'dgCMatrix' -> 'ngRMatrix', as well as convenience conversion functions; (d) Utility functions for sparse matrices such as sorting the indices or removing zero-valued entries; (e) Fast transposes that work by outputting in the opposite storage format; (f) Faster replacements for many 'Matrix' methods for all sparse types, such as slicing and elementwise multiplication; (g) Convenience functions for sparse objects, such as 'mapSparse' or a shorter 'show' method.
Maintained by David Cortes. Last updated 9 months ago.
csrsparse-matrixopenblascppopenmp
20 stars 9.08 score 84 scripts 29 dependents
dexter-psychometrics
dexter:Data Management and Analysis of Tests
A system for the management, assessment, and psychometric analysis of data from educational and psychological tests.
Maintained by Jesse Koops. Last updated 17 days ago.
8 stars 8.97 score 135 scripts 2 dependents
bhklab
mRMRe:Parallelized Minimum Redundancy, Maximum Relevance (mRMR)
Computes mutual information matrices from continuous, categorical and survival variables, as well as feature selection with minimum redundancy, maximum relevance (mRMR) and a new ensemble mRMR technique. Published in De Jay et al. (2013) <doi:10.1093/bioinformatics/btt383>.
Maintained by Benjamin Haibe-Kains. Last updated 4 years ago.
19 stars 8.95 score 105 scripts 2 dependents
graemeleehickey
joineRML:Joint Modelling of Multivariate Longitudinal Data and Time-to-Event Outcomes
Fits the joint model proposed by Henderson and colleagues (2000) <doi:10.1093/biostatistics/1.4.465>, but extended to the case of multiple continuous longitudinal measures. The time-to-event data is modelled using a Cox proportional hazards regression model with time-varying covariates. The multiple longitudinal outcomes are modelled using a multivariate version of the Laird and Ware linear mixed model. The association is captured by a multivariate latent Gaussian process. The model is estimated using a Monte Carlo Expectation Maximization algorithm. This project was funded by the Medical Research Council (Grant number MR/M013227/1).
Maintained by Graeme L. Hickey. Last updated 2 months ago.
armadillobiostatisticsclinical-trialscoxdynamicjoint-modelslongitudinal-datamultivariate-analysismultivariate-datamultivariate-longitudinal-datapredictionrcppregression-modelsstatisticssurvivalopenblascppopenmp
30 stars 8.93 score 146 scripts 1 dependents
wrathematics
coop:Co-Operation: Fast Covariance, Correlation, and Cosine Similarity Operations
Fast implementations of the co-operations: covariance, correlation, and cosine similarity. The implementations are fast and memory-efficient and their use is resolved automatically based on the input data, handled by R's S3 methods. Full descriptions of the algorithms and benchmarks are available in the package vignettes.
Maintained by Drew Schmidt. Last updated 3 years ago.
35 stars 8.92 score 214 scripts 16 dependents
bioc
BayesSpace:Clustering and Resolution Enhancement of Spatial Transcriptomes
Tools for clustering and enhancing the resolution of spatial gene expression experiments. BayesSpace clusters a low-dimensional representation of the gene expression matrix, incorporating a spatial prior to encourage neighboring spots to cluster together. The method can enhance the resolution of the low-dimensional representation into "sub-spots", for which features such as gene expression or cell type composition can be imputed.
Maintained by Matt Stone. Last updated 5 months ago.
softwareclusteringtranscriptomicsgeneexpressionsinglecellimmunooncologydataimportopenblascppopenmp
123 stars 8.89 score 278 scripts 1 dependents
usccana
netdiffuseR:Analysis of Diffusion and Contagion Processes on Networks
Empirical statistical analysis, visualization and simulation of diffusion and contagion processes on networks. The package implements algorithms for calculating network diffusion statistics such as transmission rate, hazard rates, exposure models, network threshold levels, infectiousness (contagion), and susceptibility. The package is inspired by work published in Valente et al. (2015) <DOI:10.1016/j.socscimed.2015.10.001>; Valente (1995) <ISBN: 9781881303213>, Myers (2000) <DOI:10.1086/303110>, Iyengar and others (2011) <DOI:10.1287/mksc.1100.0566>, Burt (1987) <DOI:10.1086/228667>; among others.
Maintained by George Vega Yon. Last updated 4 months ago.
contagiondiffusion-networknetwork-analysisnetwork-visualizationopenblascppopenmp
88 stars 8.88 score 217 scripts
atmoschem
vein:Vehicular Emissions Inventories
Elaboration of vehicular emissions inventories, consisting of four stages: pre-processing activity data, preparing emissions factors, estimating the emissions, and post-processing of emissions in maps and databases. More details in Ibarra-Espinosa et al. (2018) <doi:10.5194/gmd-11-2209-2018>. Before using VEIN you need to know the vehicular composition of your study area, in other words, the combination of vehicle types, sizes and fuels of the fleet. Then, it is recommended to start with the project to download a template to create a structure of directories and scripts.
Maintained by Sergio Ibarra-Espinosa. Last updated 21 hours ago.
atmoschematmospheric-chemistryatmospheric-scienceatmospheric-sciencesemissionsemissions-modelvehicular-emissions-inventoriesveinfortranopenmp
46 stars 8.73 score 137 scripts
helske
seqHMM:Mixture Hidden Markov Models for Social Sequence Data and Other Multivariate, Multichannel Categorical Time Series
Designed for fitting hidden (latent) Markov models and mixture hidden Markov models for social sequence data and other categorical time series. Some more restricted versions of these types of models are also available: Markov models, mixture Markov models, and latent class models. The package supports models for one or multiple subjects with one or multiple parallel sequences (channels). External covariates can be added to explain cluster membership in mixture models. The package provides functions for evaluating and comparing models, as well as functions for visualizing multichannel sequence data and hidden Markov models. Models are estimated using maximum likelihood via the EM algorithm and/or direct numerical maximization with analytical gradients. All main algorithms are written in C++ with support for parallel computation. Documentation is available via several vignettes on this page and in the paper by Helske and Helske (2019, <doi:10.18637/jss.v088.i03>).
Maintained by Jouni Helske. Last updated 2 years ago.
categorical-dataem-algorithmhidden-markov-modelshmmmixture-markov-modelstime-seriesopenblascppopenmp
98 stars 8.52 score 92 scripts 1 dependents
dcgerard
updog:Flexible Genotyping for Polyploids
Implements empirical Bayes approaches to genotype polyploids from next generation sequencing data while accounting for allele bias, overdispersion, and sequencing error. The main functions are flexdog() and multidog(), which allow the specification of many different genotype distributions. Also provided are functions to simulate genotypes, rgeno(), and read-counts, rflexdog(), as well as functions to calculate oracle genotyping error rates, oracle_mis(), and correlation with the true genotypes, oracle_cor(). These latter two functions are useful for read depth calculations. Run browseVignettes(package = "updog") in R for example usage. See Gerard et al. (2018) <doi:10.1534/genetics.118.301468> and Gerard and Ferrao (2020) <doi:10.1093/bioinformatics/btz852> for details on the implemented methods.
Maintained by David Gerard. Last updated 1 year ago.
28 stars 8.45 score 83 scripts 2 dependents
collinerickson
GauPro:Gaussian Process Fitting
Fits a Gaussian process model to data. Gaussian processes are commonly used in computer experiments to fit an interpolating model. The model is stored as an 'R6' object and can be easily updated with new data. There are options to run in parallel, and 'Rcpp' has been used to speed up calculations. For more info about Gaussian process software, see Erickson et al. (2018) <doi:10.1016/j.ejor.2017.10.002>.
Maintained by Collin Erickson. Last updated 12 days ago.
16 stars 8.44 score 104 scripts 1 dependents
byoungman
evgam:Generalised Additive Extreme Value Models
Methods for fitting various extreme value distributions with parameters of generalised additive model (GAM) form are provided. For details of distributions see Coles, S.G. (2001) <doi:10.1007/978-1-4471-3675-0>, GAMs see Wood, S.N. (2017) <doi:10.1201/9781315370279>, and the fitting approach see Wood, S.N., Pya, N. & Safken, B. (2016) <doi:10.1080/01621459.2016.1180986>. Details of how evgam works and various examples are given in Youngman, B.D. (2022) <doi:10.18637/jss.v103.i03>.
Maintained by Ben Youngman. Last updated 15 days ago.
6 stars 8.43 score 82 scripts 12 dependents
kss2k
modsem:Latent Interaction (and Moderation) Analysis in Structural Equation Models (SEM)
Estimation of interaction (i.e., moderation) effects between latent variables in structural equation models (SEM). The supported methods are: the constrained approach (Algina & Moulder, 2001); the unconstrained approach (Marsh et al., 2004); the residual centering approach (Little et al., 2006); the double centering approach (Lin et al., 2010); the latent moderated structural equations (LMS) approach (Klein & Moosbrugger, 2000); and the quasi-maximum likelihood (QML) approach (Klein & Muthén, 2007) (temporarily unavailable). The constrained, unconstrained, residual centering and double centering approaches are estimated via 'lavaan' (Rosseel, 2012), whilst the LMS and QML approaches are estimated via 'modsem' itself. Alternatively, models can be estimated via 'Mplus' (Muthén & Muthén, 1998-2017). References: Algina, J., & Moulder, B. C. (2001). <doi:10.1207/S15328007SEM0801_3>. "A note on estimating the Jöreskog-Yang model for latent variable interaction using 'LISREL' 8.3." Klein, A., & Moosbrugger, H. (2000). <doi:10.1007/BF02296338>. "Maximum likelihood estimation of latent interaction effects with the LMS method." Klein, A. G., & Muthén, B. O. (2007). <doi:10.1080/00273170701710205>. "Quasi-maximum likelihood estimation of structural equation models with multiple interaction and quadratic effects." Lin, G. C., Wen, Z., Marsh, H. W., & Lin, H. S. (2010). <doi:10.1080/10705511.2010.488999>. "Structural equation models of latent interactions: Clarification of orthogonalizing and double-mean-centering strategies." Little, T. D., Bovaird, J. A., & Widaman, K. F. (2006). <doi:10.1207/s15328007sem1304_1>. "On the merits of orthogonalizing powered and product terms: Implications for modeling interactions among latent variables." Marsh, H. W., Wen, Z., & Hau, K. T. (2004). <doi:10.1037/1082-989X.9.3.275>. "Structural equation models of latent interactions: evaluation of alternative estimation strategies and indicator construction." Muthén, L.K. and Muthén, B.O. (1998-2017). "'Mplus' User’s Guide. Eighth Edition." <https://www.statmodel.com/>. Rosseel Y (2012). <doi:10.18637/jss.v048.i02>. "'lavaan': An R Package for Structural Equation Modeling."
Maintained by Kjell Solem Slupphaug. Last updated 1 day ago.
interaction-effectinteraction-effectslatent-moderated-structural-equationslavaan-syntaxlmsmoderationqmlquasi-maximum-likelihoodrlangrlanguagesemstructural-equation-modelingstructural-equation-modelsopenblascppopenmp
6 stars 8.41 score 54 scripts
kisungyou
Rdimtools:Dimension Reduction and Estimation Methods
We provide linear and nonlinear dimension reduction techniques. Intrinsic dimension estimation methods for exploratory analysis are also provided. For more details on the package, see the paper by You and Shung (2022) <doi:10.1016/j.simpa.2022.100414>.
Maintained by Kisung You. Last updated 2 years ago.
dimension-estimationdimension-reductionmanifold-learningsubspace-learningopenblascppopenmp
52 stars 8.37 score 186 scripts 8 dependents
nlmixr2
nlmixr2est:Nonlinear Mixed Effects Models in Population PK/PD, Estimation Routines
Fit and compare nonlinear mixed-effects models in differential equations with flexible dosing information commonly seen in pharmacokinetics and pharmacodynamics (Almquist, Leander, and Jirstrand 2015 <doi:10.1007/s10928-015-9409-1>). Differential equation solving is by compiled C code provided in the 'rxode2' package (Wang, Hallow, and James 2015 <doi:10.1002/psp4.12052>).
Maintained by Matthew Fidler. Last updated 11 days ago.
9 stars 8.33 score 26 scripts 9 dependents
uofuepibio
epiworldR:Fast Agent-Based Epi Models
A flexible framework for Agent-Based Models (ABM), the 'epiworldR' package provides methods for prototyping disease outbreaks and transmission models using a 'C++' backend, making it very fast. It supports multiple epidemiological models, including the Susceptible-Infected-Susceptible (SIS), Susceptible-Infected-Removed (SIR), Susceptible-Exposed-Infected-Removed (SEIR), and others, involving arbitrary mitigation policies and multiple-disease models. Users can specify infectiousness/susceptibility rates as a function of agents' features, providing great complexity for the model dynamics. Furthermore, 'epiworldR' is ideal for simulation studies featuring large populations.
Maintained by Andrew Pulsipher. Last updated 23 days ago.
abmagent-based-modelingcovid-19epidemicsepidemiologyr-programmingrpackrpkgseirseir-modelsimulationsirsir-modelcppopenmp
9 stars 8.33 score 58 scripts 1 dependents
jsilve24
fido:Bayesian Multinomial Logistic Normal Regression
Provides methods for fitting and inspection of Bayesian Multinomial Logistic Normal Models using MAP estimation and Laplace Approximation as developed in Silverman et al. (2022) <https://www.jmlr.org/papers/v23/19-882.html>. Key functionality is implemented in C++ for scalability. 'fido' replaces the previous package 'stray'.
Maintained by Justin Silverman. Last updated 30 days ago.
20 stars 8.31 score 103 scripts
drizopoulos
JMbayes2:Extended Joint Models for Longitudinal and Time-to-Event Data
Fit joint models for longitudinal and time-to-event data under the Bayesian approach. Multiple longitudinal outcomes of mixed type (continuous/categorical) and multiple event times (competing risks and multi-state processes) are accommodated. Rizopoulos (2012, ISBN:9781439872864).
Maintained by Dimitris Rizopoulos. Last updated 23 days ago.
competing-riskslongitudinal-analysismixed-modelsmulti-statepersonalized-medicineprecision-medicineprediction-modelsurvival-modelsopenblascppopenmp
84 stars 8.27 score 264 scripts 2 dependents
valentint
tclust:Robust Trimmed Clustering
Provides functions for robust trimmed clustering. The methods are described in Garcia-Escudero (2008) <doi:10.1214/07-AOS515>, Fritz et al. (2012) <doi:10.18637/jss.v047.i12>, Garcia-Escudero et al. (2011) <doi:10.1007/s11222-010-9194-z> and others.
Maintained by Valentin Todorov. Last updated 1 month ago.
3 stars 8.26 score 72 scripts 3 dependents
lbelzile
mev:Modelling of Extreme Values
Various tools for the analysis of univariate, multivariate and functional extremes. Exact simulation from max-stable processes [Dombry, Engelke and Oesting (2016) <doi:10.1093/biomet/asw008>] and R-Pareto processes for various parametric models, including Brown-Resnick (Wadsworth and Tawn, 2014, <doi:10.1093/biomet/ast042>) and Extremal Student (Thibaud and Opitz, 2015, <doi:10.1093/biomet/asv045>). Threshold selection methods, including Wadsworth (2016) <doi:10.1080/00401706.2014.998345>, and Northrop and Coleman (2014) <doi:10.1007/s10687-014-0183-z>. Multivariate extreme diagnostics. Estimation and likelihoods for univariate extremes, e.g., Coles (2001) <doi:10.1007/978-1-4471-3675-0>.
Maintained by Leo Belzile. Last updated 5 months ago.
extreme-value-statisticslikelihood-functionsmax-stablesimulationthreshold-selectionopenblascppopenmp
14 stars 8.21 score 94 scripts 4 dependents
alexiosg
rmgarch:Multivariate GARCH Models
Feasible multivariate GARCH models including DCC, GO-GARCH and Copula-GARCH.
Maintained by Alexios Galanos. Last updated 3 months ago.
14 stars 8.11 score 294 scripts 1 dependents
rfastofficial
Rfast2:A Collection of Efficient and Extremely Fast R Functions II
A collection of fast statistical and utility functions for data analysis. Functions for regression, maximum likelihood, column-wise statistics and many more have been included. C++ has been utilized to speed up the functions. References: Tsagris M., Papadakis M. (2018). Taking R to its limits: 70+ tips. PeerJ Preprints 6:e26605v1 <doi:10.7287/peerj.preprints.26605v1>.
Maintained by Manos Papadakis. Last updated 1 year ago.
38 stars 8.09 score 75 scripts 26 dependents
lbelzile
TruncatedNormal:Truncated Multivariate Normal and Student Distributions
A collection of functions to deal with the truncated univariate and multivariate normal and Student distributions, described in Botev (2017) <doi:10.1111/rssb.12162> and Botev and L'Ecuyer (2015) <doi:10.1109/WSC.2015.7408180>.
Maintained by Leo Belzile. Last updated 29 days ago.
gaussianstudent-distributionstruncatedopenblascppopenmp
8 stars 8.08 score 116 scripts 18 dependents
mrc-ide
dust:Iterate Multiple Realisations of Stochastic Models
An engine for simulation of stochastic models. Includes support for running stochastic models in parallel, either with shared or varying parameters. Simulations are run efficiently in compiled code and can be run with a fraction of simulated states returned to R, allowing control over memory usage. Support is provided for building a bootstrap particle filter for performing Sequential Monte Carlo (e.g., Gordon et al. 1993 <doi:10.1049/ip-f-2.1993.0015>). The core of the simulation engine is the 'xoshiro256**' algorithm (Blackman and Vigna <arXiv:1805.01407>), and the package is further described in FitzJohn et al. (2021) <doi:10.12688/wellcomeopenres.16466.2>.
Maintained by Rich FitzJohn. Last updated 8 days ago.
18 stars 8.07 score 60 scripts 3 dependents
dschuhmacher
transport:Computation of Optimal Transport Plans and Wasserstein Distances
Solve optimal transport problems. Compute Wasserstein distances (a.k.a. Kantorovitch, Fortet-Mourier, Mallows, Earth Mover's, or minimal L_p distances), return the corresponding transference plans, and display them graphically. Objects that can be compared include grey-scale images, (weighted) point patterns, and mass vectors.
Maintained by Dominic Schuhmacher. Last updated 6 months ago.
5 stars 8.07 score 414 scripts 22 dependents
acorg
Racmacs:Antigenic Cartography Macros
A toolkit for making antigenic maps from immunological assay data, in order to quantify and visualize antigenic differences between different pathogen strains as described in Smith et al. (2004) <doi:10.1126/science.1097211> and used in the World Health Organization influenza vaccine strain selection process. Additional functions allow for the diagnostic evaluation of antigenic maps and an interactive viewer is provided to explore antigenic relationships amongst several strains and incorporate the visualization of associated genetic information.
Maintained by Sam Wilks. Last updated 9 months ago.
21 stars 8.06 score 362 scripts
rsparapa
BART:Bayesian Additive Regression Trees
Bayesian Additive Regression Trees (BART) provide flexible nonparametric modeling of covariates for continuous, binary, categorical and time-to-event outcomes. For more information see Sparapani, Spanbauer and McCulloch <doi:10.18637/jss.v097.i01>.
Maintained by Rodney Sparapani. Last updated 9 months ago.
14 stars 8.05 score 474 scripts 10 dependents
kharchenkolab
pagoda2:Single Cell Analysis and Differential Expression
Analyzing and interactively exploring large-scale single-cell RNA-seq datasets. 'pagoda2' primarily performs normalization and differential gene expression analysis, with an interactive application for exploring single-cell RNA-seq datasets. It performs basic tasks such as cell size normalization, gene variance normalization, and can be used to identify subpopulations and run differential expression within individual samples. 'pagoda2' was written to rapidly process modern large-scale scRNAseq datasets of approximately 1e6 cells. The companion web application allows users to explore which gene expression patterns form the different subpopulations within their data. The package also serves as the primary method for preprocessing data for conos, <https://github.com/kharchenkolab/conos>. This package interacts with data available through the 'p2data' package, which is available in a 'drat' repository. To access this data package, see the instructions at <https://github.com/kharchenkolab/pagoda2>. The size of the 'p2data' package is approximately 6 MB.
Maintained by Evan Biederstedt. Last updated 1 year ago.
scrna-seqsingle-cellsingle-cell-rna-seqtranscriptomicsopenblascppopenmp
223 stars 8.00 score 282 scripts
kosukeimai
fastLink:Fast Probabilistic Record Linkage with Missing Data
Implements a Fellegi-Sunter probabilistic record linkage model that allows for missing data and the inclusion of auxiliary information. This includes functionalities to conduct a merge of two datasets under the Fellegi-Sunter model using the Expectation-Maximization algorithm. In addition, tools for preparing, adjusting, and summarizing data merges are included. The package implements methods described in Enamorado, Fifield, and Imai (2019) "Using a Probabilistic Model to Assist Merging of Large-scale Administrative Records" <doi:10.1017/S0003055418000783> and is available at <https://imai.fas.harvard.edu/research/linkage.html>.
Maintained by Ted Enamorado. Last updated 1 year ago.
279 stars 7.98 score 95 scripts 1 dependents
yixuan
recosystem:Recommender System using Matrix Factorization
R wrapper of the 'libmf' library <https://www.csie.ntu.edu.tw/~cjlin/libmf/> for recommender systems using matrix factorization. It is typically used to approximate an incomplete matrix using the product of two matrices in a latent space. Other common names for this task include "collaborative filtering", "matrix completion", "matrix recovery", etc. High performance multi-core parallel computing is supported in this package.
Maintained by Yixuan Qiu. Last updated 2 years ago.
matrix-factorizationrecommender-systemcppopenmp
84 stars 7.97 score 101 scripts 6 dependents
ocbe-uio
BayesMallows:Bayesian Preference Learning with the Mallows Rank Model
An implementation of the Bayesian version of the Mallows rank model (Vitelli et al., Journal of Machine Learning Research, 2018 <https://jmlr.org/papers/v18/15-481.html>; Crispino et al., Annals of Applied Statistics, 2019 <doi:10.1214/18-AOAS1203>; Sorensen et al., R Journal, 2020 <doi:10.32614/RJ-2020-026>; Stein, PhD Thesis, 2023 <https://eprints.lancs.ac.uk/id/eprint/195759>). Both Metropolis-Hastings and sequential Monte Carlo algorithms for estimating the models are available. Cayley, footrule, Hamming, Kendall, Spearman, and Ulam distances are supported in the models. The rank data to be analyzed can be in the form of complete rankings, top-k rankings, partially missing rankings, as well as consistent and inconsistent pairwise preferences. Several functions for plotting and studying the posterior distributions of parameters are provided. The package also provides functions for estimating the partition function (normalizing constant) of the Mallows rank model, both with the importance sampling algorithm of Vitelli et al. and asymptotic approximation with the IPFP algorithm (Mukherjee, Annals of Statistics, 2016 <doi:10.1214/15-AOS1389>).
Maintained by Oystein Sorensen. Last updated 2 months ago.
mallows-modelopenblascppopenmp
21 stars 7.91 score 36 scripts 1 dependents
stocnet
goldfish:Statistical Network Models for Dynamic Network Data
Tools for fitting statistical network models to dynamic network data. Can be used for fitting both dynamic network actor models ('DyNAMs') and relational event models ('REMs'). Stadtfeld, Hollway, and Block (2017a) <doi:10.1177/0081175017709295>, Stadtfeld, Hollway, and Block (2017b) <doi:10.1177/0081175017733457>, Stadtfeld and Block (2017) <doi:10.15195/v4.a14>, Hoffman et al. (2020) <doi:10.1017/nws.2020.3>.
Maintained by Alvaro Uzaheta. Last updated 7 months ago.
dynamnetwork-modellingremstatistical-network-analysisopenblascppopenmp
61 stars 7.91 score 44 scripts
davidrusi
mombf:Model Selection with Bayesian Methods and Information Criteria
Model selection and averaging for regression and mixtures, including Bayesian model selection and information criteria (BIC, EBIC, AIC, GIC).
Maintained by David Rossell. Last updated 2 months ago.
7 stars 7.89 score 73 scripts 1 dependents
jdtuck
fdasrvf:Elastic Functional Data Analysis
Performs alignment, PCA, and modeling of multidimensional and unidimensional functions using the square-root velocity framework (Srivastava et al., 2011 <doi:10.48550/arXiv.1103.3817> and Tucker et al., 2014 <DOI:10.1016/j.csda.2012.12.001>). This framework allows for elastic analysis of functional data through phase and amplitude separation.
Maintained by J. Derek Tucker. Last updated 1 month ago.
13 stars 7.79 score 83 scripts 3 dependents
cristiancastiglione
sgdGMF:Estimation of Generalized Matrix Factorization Models via Stochastic Gradient Descent
Efficient framework to estimate high-dimensional generalized matrix factorization models using penalized maximum likelihood under a dispersion exponential family specification. Both deterministic and stochastic methods are implemented for the numerical maximization. In particular, the package implements the stochastic gradient descent algorithm with a block-wise mini-batch strategy to speed up the computations and an efficient adaptive learning rate schedule to stabilize the convergence. All the theoretical details can be found in Castiglione, Segers, Clement, Risso (2024, <https://arxiv.org/abs/2412.20509>). Other methods considered for the optimization are the alternated iterative re-weighted least squares and the quasi-Newton method with diagonal approximation of the Fisher information matrix discussed in Kidzinski, Hui, Warton, Hastie (2022, <http://jmlr.org/papers/v23/20-1104.html>).
Maintained by Cristian Castiglione. Last updated 23 days ago.
10 stars 7.75 score 108 scripts
eco-hydro
phenofit:Extract Remote Sensing Vegetation Phenology
The merits of 'TIMESAT' and 'phenopix' are adopted. In addition, a simple growing season dividing method and a practical snow elimination method based on Whittaker are proposed. Seven curve fitting methods and four phenology extraction methods are provided. Parameter boundaries are considered for every curve fitting method according to their ecological meaning, and 'optimx' is used to select the best optimization method for each curve fitting method. Reference: Kong, D., (2020). R package: A state-of-the-art Vegetation Phenology extraction package, phenofit version 0.3.1, <doi:10.5281/zenodo.5150204>; Kong, D., Zhang, Y., Wang, D., Chen, J., & Gu, X. (2020). Photoperiod Explains the Asynchronization Between Vegetation Carbon Phenology and Vegetation Greenness Phenology. Journal of Geophysical Research: Biogeosciences, 125(8), e2020JG005636. <doi:10.1029/2020JG005636>; Kong, D., Zhang, Y., Gu, X., & Wang, D. (2019). A robust method for reconstructing global MODIS EVI time series on the Google Earth Engine. ISPRS Journal of Photogrammetry and Remote Sensing, 155, 13–24; Zhang, Q., Kong, D., Shi, P., Singh, V.P., Sun, P., 2018. Vegetation phenology on the Qinghai-Tibetan Plateau and its response to climate change (1982–2013). Agric. For. Meteorol. 248, 408–417. <doi:10.1016/j.agrformet.2017.10.026>.
Maintained by Dongdong Kong. Last updated 2 months ago.
phenologyremote-sensingopenblascppopenmp
78 stars 7.71 score 332 scripts
jonclayden
RNiftyReg:Image Registration Using the 'NiftyReg' Library
Provides an 'R' interface to the 'NiftyReg' image registration tools <https://github.com/KCL-BMEIS/niftyreg>. Linear and nonlinear registration are supported, in two and three dimensions.
Maintained by Jon Clayden. Last updated 6 months ago.
image-registrationmedical-imagingtransformationscppopenmp
43 stars 7.69 score 50 scripts 5 dependents
coatless-rpkg
RcppEnsmallen:Header-Only C++ Mathematical Optimization Library for 'Armadillo'
'Ensmallen' is a templated C++ mathematical optimization library (by the 'MLPACK' team) that provides a simple set of abstractions for writing an objective function to optimize. Provided within are various standard and cutting-edge optimizers that include full-batch gradient descent techniques, small-batch techniques, gradient-free optimizers, and constrained optimization. The 'RcppEnsmallen' package includes the header files from the 'Ensmallen' library and pairs the appropriate header files from 'armadillo' through the 'RcppArmadillo' package. Therefore, users do not need to install either 'Ensmallen' or 'Armadillo' to use 'RcppEnsmallen'. Note that 'Ensmallen' is licensed under 3-Clause BSD, 'Armadillo' starting from 7.800.0 is licensed under Apache License 2, 'RcppArmadillo' (the 'Rcpp' bindings/bridge to 'Armadillo') is licensed under the GNU GPL version 2 or later. Thus, 'RcppEnsmallen' is also licensed under similar terms. Note that 'Ensmallen' requires a compiler that supports 'C++14' and 'Armadillo' 10.8.2 or later.
Maintained by James Joseph Balamuta. Last updated 4 months ago.
armadillocpp11ensmallenoptimizationrcpprcpparmadilloopenblascppopenmp
31 stars 7.67 score 1 scripts 14 dependents
nsaph-software
CausalGPS:Matching on Generalized Propensity Scores with Continuous Exposures
Provides a framework for estimating causal effects of a continuous exposure using observational data, and implementing matching and weighting on the generalized propensity score. Wu, X., Mealli, F., Kioumourtzoglou, M.A., Dominici, F. and Braun, D., 2022. Matching on generalized propensity scores with continuous exposures. Journal of the American Statistical Association, pp.1-29.
Maintained by Naeem Khoshnevis. Last updated 9 months ago.
24 stars 7.67 score 39 scripts
bsvars
bsvars:Bayesian Estimation of Structural Vector Autoregressive Models
Provides fast and efficient procedures for Bayesian analysis of Structural Vector Autoregressions. This package estimates a wide range of models, including homo-, heteroskedastic, and non-normal specifications. Structural models can be identified by adjustable exclusion restrictions, time-varying volatility, or non-normality. They all include a flexible three-level equation-specific local-global hierarchical prior distribution for the estimated level of shrinkage for autoregressive and structural parameters. Additionally, the package facilitates predictive and structural analyses such as impulse responses, forecast error variance and historical decompositions, forecasting, verification of heteroskedasticity, non-normality, and hypotheses on autoregressive parameters, as well as analyses of structural shocks, volatilities, and fitted values. Beautiful plots, informative summary functions, and extensive documentation including the vignette by Woźniak (2024) <doi:10.48550/arXiv.2410.15090> complement all this. The implemented techniques align closely with those presented in Lütkepohl, Shang, Uzeda, & Woźniak (2024) <doi:10.48550/arXiv.2404.11057>, Lütkepohl & Woźniak (2020) <doi:10.1016/j.jedc.2020.103862>, and Song & Woźniak (2021) <doi:10.1093/acrefore/9780190625979.013.174>. The 'bsvars' package is aligned regarding objects, workflows, and code structure with the R package 'bsvarSIGNs' by Wang & Woźniak (2024) <doi:10.32614/CRAN.package.bsvarSIGNs>, and they constitute an integrated toolset.
Maintained by Tomasz Woźniak. Last updated 2 months ago.
bayesian-inferenceeconometricsvector-autoregressionopenblascppopenmp
46 stars 7.67 score 32 scripts 1 dependents
nlmixr2
lotri:A Simple Way to Specify Symmetric, Block Diagonal Matrices
Provides a simple mechanism to specify symmetric block diagonal matrices (often used for covariance matrices). This is based on the domain specific language implemented in 'nlmixr2' but expanded to create matrices in R generally instead of specifying parts of matrices to estimate. It has expanded to include some matrix manipulation functions that are generally useful for 'rxode2' and 'nlmixr2'.
Maintained by Matthew L. Fidler. Last updated 6 months ago.
6 stars 7.67 score 18 scripts 14 dependents
mlampros
textTinyR:Text Processing for Small or Big Data Files
It offers functions for splitting, parsing, tokenizing and creating a vocabulary for big text data files. Moreover, it includes functions for building a document-term matrix and extracting information from those (term-associations, most frequent terms). It also embodies functions for calculating token statistics (collocations, look-up tables, string dissimilarities) and functions to work with sparse matrices. Lastly, it includes functions for Word Vector Representations (i.e. 'GloVe', 'fasttext') and incorporates functions for the calculation of (pairwise) text document dissimilarities. The source code is based on 'C++11' and exported in R through the 'Rcpp', 'RcppArmadillo' and 'BH' packages.
Maintained by Lampros Mouselimis. Last updated 1 year ago.
bhboostcpp11processingrcpprcpparmadillotextopenblascppopenmp
39 stars 7.64 score 244 scripts 1 dependents
bioc
ggsc:Visualizing Single Cell and Spatial Transcriptomics
Useful functions to visualize single cell and spatial data. It supports visualizing 'Seurat', 'SingleCellExperiment' and 'SpatialExperiment' objects through grammar of graphics syntax implemented in 'ggplot2'.
Maintained by Guangchuang Yu. Last updated 5 months ago.
dimensionreductiongeneexpressionsinglecellsoftwarespatialtranscriptomicsvisualizationopenblascppopenmp
47 stars 7.59 score 18 scripts
bioc
scde:Single Cell Differential Expression
The scde package implements a set of statistical methods for analyzing single-cell RNA-seq data. scde fits individual error models for single-cell RNA-seq measurements. These models can then be used for assessment of differential expression between groups of cells, as well as other types of analysis. The scde package also contains the pagoda framework which applies pathway and gene set overdispersion analysis to identify and characterize putative cell subpopulations based on transcriptional signatures. The overall approach to the differential expression analysis is detailed in the following publication: "Bayesian approach to single-cell differential expression analysis" (Kharchenko PV, Silberstein L, Scadden DT, Nature Methods, doi: 10.1038/nmeth.2967). The overall approach to subpopulation identification and characterization is detailed in the following pre-print: "Characterizing transcriptional heterogeneity through pathway and gene set overdispersion analysis" (Fan J, Salathia N, Liu R, Kaeser G, Yung Y, Herman J, Kaper F, Fan JB, Zhang K, Chun J, and Kharchenko PV, Nature Methods, doi:10.1038/nmeth.3734).
Maintained by Evan Biederstedt. Last updated 5 months ago.
immunooncologyrnaseqstatisticalmethoddifferentialexpressionbayesiantranscriptionsoftwareanalysisbioinformaticsheterogenityngssingle-celltranscriptomicsopenblascppopenmp
173 stars 7.53 score 141 scripts
rezamoammadi
BDgraph:Bayesian Structure Learning in Graphical Models using Birth-Death MCMC
Advanced statistical tools for Bayesian structure learning in undirected graphical models, accommodating continuous, ordinal, discrete, count, and mixed data. It integrates recent advancements in Bayesian graphical models as presented in the literature, including the works of Mohammadi and Wit (2015) <doi:10.1214/14-BA889>, Mohammadi et al. (2021) <doi:10.1080/01621459.2021.1996377>, Dobra and Mohammadi (2018) <doi:10.1214/18-AOAS1164>, and Mohammadi et al. (2023) <doi:10.48550/arXiv.2307.00127>.
Maintained by Reza Mohammadi. Last updated 7 months ago.
8 stars 7.46 score 223 scripts 7 dependents
blasern
rdist:Calculate Pairwise Distances
A common framework for calculating distance matrices.
Maintained by Nello Blaser. Last updated 2 years ago.
17 stars 7.45 score 203 scripts 11 dependents
jonclayden
mmand:Mathematical Morphology in Any Number of Dimensions
Provides tools for performing mathematical morphology operations, such as erosion and dilation, on data of arbitrary dimensionality. Can also be used for finding connected components, resampling, filtering, smoothing and other image processing-style operations.
Maintained by Jon Clayden. Last updated 1 year ago.
image-processingmorphologyresamplingcppopenmp
37 stars 7.42 score 223 scripts 9 dependents
jonathancornelissen
highfrequency:Tools for Highfrequency Data Analysis
Provides functionality to manage, clean and match highfrequency trades and quotes data, calculate various liquidity measures, estimate and forecast volatility, detect price jumps and investigate microstructure noise and intraday periodicity. A detailed vignette can be found in the paper "Analyzing Intraday Financial Data in R: The highfrequency Package" by Boudt, Kleen, and Sjoerup (2022, <doi:10.18637/jss.v104.i08>). The DOI in the CITATION is for a new Journal of Statistical Software publication that will be registered after publication on CRAN. A working paper version can be found on SSRN: <doi:10.2139/ssrn.3917548>.
Maintained by Kris Boudt. Last updated 2 years ago.
152 stars 7.37 score 286 scripts
tommyjones
tidylda:Latent Dirichlet Allocation Using 'tidyverse' Conventions
Implements an algorithm for Latent Dirichlet Allocation (LDA), Blei et al. (2003) <https://www.jmlr.org/papers/volume3/blei03a/blei03a.pdf>, using style conventions from the 'tidyverse', Wickham et al. (2019) <doi:10.21105/joss.01686>, and 'tidymodels', Kuhn et al. <https://tidymodels.github.io/model-implementation-principles/>. Fitting is done via collapsed Gibbs sampling. Also implements several novel features for LDA such as guided models and transfer learning.
Maintained by Tommy Jones. Last updated 2 months ago.
41 stars 7.36 score 53 scripts
pneuvial
adjclust:Adjacency-Constrained Clustering of a Block-Diagonal Similarity Matrix
Implements a constrained version of hierarchical agglomerative clustering, in which each observation is associated to a position, and only adjacent clusters can be merged. Typical application fields in bioinformatics include Genome-Wide Association Studies or Hi-C data analysis, where the similarity between items is a decreasing function of their genomic distance. Taking advantage of this feature, the implemented algorithm is time and memory efficient. This algorithm is described in Ambroise et al (2019) <doi:10.1186/s13015-019-0157-4>.
Maintained by Pierre Neuvial. Last updated 6 months ago.
clusteringfeatureextractiongwashi-chierarchical-clusteringlinkage-disequilibriumcppopenmp
16 stars 7.35 score 13 scripts 2 dependents
david-cortes
outliertree:Explainable Outlier Detection Through Decision Tree Conditioning
Outlier detection method that flags suspicious values within observations, contrasting them against the normal values in a user-readable format, potentially describing conditions within the data that make a given outlier more rare. Full procedure is described in Cortes (2020) <doi:10.48550/arXiv.2001.00636>. Loosely based on the 'GritBot' <https://www.rulequest.com/gritbot-info.html> software.
Maintained by David Cortes. Last updated 3 months ago.
anomaly-detectionoutlier-detectioncppopenmp
58 stars 7.34 score 21 scripts 2 dependents
gagolews
genieclust:Fast and Robust Hierarchical Clustering with Noise Points Detection
A retake on the Genie algorithm (Gagolewski, 2021 <DOI:10.1016/j.softx.2021.100722>), which is a robust hierarchical clustering method (Gagolewski, Bartoszuk, Cena, 2016 <DOI:10.1016/j.ins.2016.05.003>). It is now faster and more memory efficient; determining the whole cluster hierarchy for datasets of 10M points in low dimensional Euclidean spaces or 100K points in high-dimensional ones takes only a minute or so. Allows clustering with respect to mutual reachability distances so that it can act as a noise point detector or a robustified version of 'HDBSCAN*' (that is able to detect a predefined number of clusters and hence it does not depend on the somewhat fragile 'eps' parameter). The package also features an implementation of inequality indices (e.g., Gini and Bonferroni), external cluster validity measures (e.g., the normalised clustering accuracy, the adjusted Rand index, the Fowlkes-Mallows index, and normalised mutual information), and internal cluster validity indices (e.g., the Calinski-Harabasz, Davies-Bouldin, Ball-Hall, Silhouette, and generalised Dunn indices). See also the 'Python' version of 'genieclust' available on 'PyPI', which supports sparse data, more metrics, and even larger datasets.
Maintained by Marek Gagolewski. Last updated 8 days ago.
cluster-analysisclusteringclustering-algorithmdata-analysisdata-miningdata-sciencegeniehdbscanhierarchical-clusteringhierarchical-clustering-algorithmmachine-learningmachine-learning-algorithmsmlpacknmslibpythonpython3sparsecppopenmp
61 stars 7.33 score 13 scripts 5 dependents
kharchenkolab
conos:Clustering on Network of Samples
Wires together large collections of single-cell RNA-seq datasets, which allows for both the identification of recurrent cell clusters and the propagation of information between datasets in multi-sample or atlas-scale collections. 'Conos' focuses on the uniform mapping of homologous cell types across heterogeneous sample collections. For instance, users could investigate a collection of dozens of peripheral blood samples from cancer patients combined with dozens of controls, which perhaps includes samples of a related tissue such as lymph nodes. This package interacts with data available through the 'conosPanel' package, which is available in a 'drat' repository. To access this data package, see the instructions at <https://github.com/kharchenkolab/conos>. The size of the 'conosPanel' package is approximately 12 MB.
Maintained by Evan Biederstedt. Last updated 1 year ago.
batch-correctionscrna-seqsingle-cell-rna-seqopenblascppopenmp
205 stars 7.33 score 258 scripts
ropensci
melt:Multiple Empirical Likelihood Tests
Performs multiple empirical likelihood tests. It offers an easy-to-use interface and flexibility in specifying hypotheses and calibration methods, extending the framework to simultaneous inferences. The core computational routines are implemented using the 'Eigen' 'C++' library and 'RcppEigen' interface, with 'OpenMP' for parallel computation. Details of the testing procedures are provided in Kim, MacEachern, and Peruggia (2023) <doi:10.1080/10485252.2023.2206919>. A companion paper by Kim, MacEachern, and Peruggia (2024) <doi:10.18637/jss.v108.i05> is available for further information. This work was supported by the U.S. National Science Foundation under Grants No. SES-1921523 and DMS-2015552.
Maintained by Eunseop Kim. Last updated 11 months ago.
12 stars 7.24 score 84 scripts
nicchr
cheapr:Simple Functions to Save Time and Memory
Fast and memory-efficient (or 'cheap') tools to facilitate efficient programming, saving time and memory. It aims to provide 'cheaper' alternatives to common base R functions, as well as some additional functions.
Maintained by Nick Christofides. Last updated 1 days ago.
19 stars 7.21 score 73 scripts 2 dependents
kkholst
targeted:Targeted Inference
Various methods for targeted and semiparametric inference including augmented inverse probability weighted (AIPW) estimators for missing data and causal inference (Bang and Robins (2005) <doi:10.1111/j.1541-0420.2005.00377.x>), variable importance and conditional average treatment effects (CATE) (van der Laan (2006) <doi:10.2202/1557-4679.1008>), estimators for risk differences and relative risks (Richardson et al. (2017) <doi:10.1080/01621459.2016.1192546>), assumption lean inference for generalized linear model parameters (Vansteelandt et al. (2022) <doi:10.1111/rssb.12504>).
Maintained by Klaus K. Holst. Last updated 2 months ago.
causal-inferencedouble-robustestimationsemiparametric-estimationstatisticsopenblascppopenmp
11 stars 7.20 score 30 scripts 1 dependents
jhorzek
lessSEM:Non-Smooth Regularization for Structural Equation Models
Provides regularized structural equation modeling (regularized SEM) with non-smooth penalty functions (e.g., lasso) building on 'lavaan'. The package is heavily inspired by the ['regsem'](<https://github.com/Rjacobucci/regsem>) and ['lslx'](<https://github.com/psyphh/lslx>) packages.
Maintained by Jannik H. Orzek. Last updated 1 year ago.
lassopsychometricsregularizationregularized-structural-equation-modelsemstructural-equation-modelingopenblascppopenmp
7 stars 7.19 score 223 scripts
jalilian
ETAS:Modeling Earthquake Data Using 'ETAS' Model
Fits the space-time Epidemic Type Aftershock Sequence ('ETAS') model to earthquake catalogs using a stochastic 'declustering' approach. The 'ETAS' model is a 'spatio-temporal' marked point process model and a special case of the 'Hawkes' process. The package is based on a Fortran program by 'Jiancang Zhuang' (available at <http://bemlar.ism.ac.jp/zhuang/software.html>), which is modified and translated into C++ and C such that it can be called from R. Parallel computing with 'OpenMP' is possible on supported platforms.
Maintained by Abdollah Jalilian. Last updated 7 months ago.
24 stars 7.18 score 21 scripts 1 dependents
aalfons
robustHD:Robust Methods for High-Dimensional Data
Robust methods for high-dimensional data, in particular linear model selection techniques based on least angle regression and sparse regression. Specifically, the package implements robust least angle regression (Khan, Van Aelst & Zamar, 2007; <doi:10.1198/016214507000000950>), (robust) groupwise least angle regression (Alfons, Croux & Gelper, 2016; <doi:10.1016/j.csda.2015.02.007>), and sparse least trimmed squares regression (Alfons, Croux & Gelper, 2013; <doi:10.1214/12-AOAS575>).
Maintained by Andreas Alfons. Last updated 9 months ago.
10 stars 7.10 score 174 scripts 8 dependents
mlampros
elmNNRcpp:The Extreme Learning Machine Algorithm
Training and prediction functions for Single Hidden-layer Feedforward Neural Networks (SLFN) using the Extreme Learning Machine (ELM) algorithm. The ELM algorithm differs from traditional gradient-based algorithms in its very short training times (it needs no iterative tuning, which makes learning very fast) and in requiring no other parameters such as learning rate, momentum, or epochs. This is a reimplementation of the 'elmNN' package using 'RcppArmadillo' after the 'elmNN' package was archived. For more information, see "Extreme learning machine: Theory and applications" by Guang-Bin Huang, Qin-Yu Zhu, Chee-Kheong Siew (2006), Elsevier B.V, <doi:10.1016/j.neucom.2005.12.126>.
Maintained by Lampros Mouselimis. Last updated 2 years ago.
armadilloelmextreme-learning-machinercpparmadilloopenblascppopenmp
14 stars 7.06 score 39 scripts 7 dependents
rwehrens
kohonen:Supervised and Unsupervised Self-Organising Maps
Functions to train self-organising maps (SOMs). Interrogation of the maps and prediction using trained maps are also supported. The name of the package refers to Teuvo Kohonen, the inventor of the SOM.
Maintained by Ron Wehrens. Last updated 2 years ago.
9 stars 7.05 score 724 scripts 14 dependents
loelschlaeger
fHMM:Fitting Hidden Markov Models to Financial Data
Fitting (hierarchical) hidden Markov models to financial data via maximum likelihood estimation. See Oelschläger, L. and Adam, T. "Detecting Bearish and Bullish Markets in Financial Time Series Using Hierarchical Hidden Markov Models" (2021, Statistical Modelling) <doi:10.1177/1471082X211034048> for a reference on the method. A user guide is provided by the accompanying software paper "fHMM: Hidden Markov Models for Financial Time Series in R", Oelschläger, L., Adam, T., and Michels, R. (2024, Journal of Statistical Software) <doi:10.18637/jss.v109.i09>.
Maintained by Lennart Oelschläger. Last updated 5 days ago.
financehidden-markov-modelscppopenmp
17 stars 7.04 score 5 scripts
doccstat
fastcpd:Fast Change Point Detection via Sequential Gradient Descent
Implements a fast change point detection algorithm based on the paper "Sequential Gradient Descent and Quasi-Newton's Method for Change-Point Analysis" by Xianyang Zhang, Trisha Dawn <https://proceedings.mlr.press/v206/zhang23b.html>. The algorithm is based on dynamic programming with pruning and sequential gradient descent. It can detect change points an order of magnitude faster than the vanilla Pruned Exact Linear Time (PELT) method. The package includes examples for linear regression, logistic regression, Poisson regression, and penalized linear regression data, along with further examples that use a custom cost function for users who want to supply their own.
Maintained by Xingchi Li. Last updated 10 days ago.
change-point-detectioncppcustom-functiongradient-descentlassolinear-regressionlogistic-regressionofflinepeltpenalized-regressionpoisson-regressionquasi-newtonstatisticstime-serieswarm-startfortranopenblascppopenmp
22 stars 7.00 score 7 scripts
drizopoulos
JMbayes:Joint Modeling of Longitudinal and Time-to-Event Data under a Bayesian Approach
Shared parameter models for the joint modeling of longitudinal and time-to-event data using MCMC; Dimitris Rizopoulos (2016) <doi:10.18637/jss.v072.i07>.
Maintained by Dimitris Rizopoulos. Last updated 4 years ago.
joint-modelslongitudinal-responsesprediction-modelsurvival-analysisopenblascppopenmpjags
59 stars 6.97 score 80 scripts
insitro
AllelicSeries:Allelic Series Test
Implementation of gene-level rare variant association tests targeting allelic series: genes where increasingly deleterious mutations have increasingly large phenotypic effects. The COding-variant Allelic Series Test (COAST) operates on the benign missense variants (BMVs), deleterious missense variants (DMVs), and protein truncating variants (PTVs) within a gene. COAST uses a set of adjustable weights that tailor the test towards rejecting the null hypothesis for genes where the average magnitude of effect increases monotonically from BMVs to DMVs to PTVs. See McCaw ZR, O’Dushlaine C, Somineni H, Bereket M, Klein C, Karaletsos T, Casale FP, Koller D, Soare TW. (2023) "An allelic series rare variant association test for candidate gene discovery" <doi:10.1016/j.ajhg.2023.07.001>.
Maintained by Zachary McCaw. Last updated 2 months ago.
13 stars 6.97 score 8 scripts
james-thorson-noaa
dsem:Fit Dynamic Structural Equation Models
Applies dynamic structural equation models to time-series data with generic and simplified specification for simultaneous and lagged effects. Methods are described in Thorson et al. (2024) "Dynamic structural equation models synthesize ecosystem dynamics constrained by ecological mechanisms."
Maintained by James Thorson. Last updated 17 days ago.
11 stars 6.90 score 24 scripts
lcrawlab
mvMAPIT:Multivariate Genome Wide Marginal Epistasis Test
Epistasis, commonly defined as the interaction between genetic loci, is known to play an important role in the phenotypic variation of complex traits. As a result, many statistical methods have been developed to identify genetic variants that are involved in epistasis, and nearly all of these approaches carry out this task by focusing on analyzing one trait at a time. Previous studies have shown that jointly modeling multiple phenotypes can often dramatically increase statistical power for association mapping. In this package, we present the 'multivariate MArginal ePIstasis Test' ('mvMAPIT') – a multi-outcome generalization of a recently proposed epistatic detection method which seeks to detect marginal epistasis or the combined pairwise interaction effects between a given variant and all other variants. By searching for marginal epistatic effects, one can identify genetic variants that are involved in epistasis without the need to identify the exact partners with which the variants interact – thus, potentially alleviating much of the statistical and computational burden associated with conventional explicit search based methods. Our proposed 'mvMAPIT' builds upon this strategy by taking advantage of correlation structure between traits to improve the identification of variants involved in epistasis. We formulate 'mvMAPIT' as a multivariate linear mixed model and develop a multi-trait variance component estimation algorithm for efficient parameter inference and P-value computation. Together with reasonable model approximations, our proposed approach is scalable to moderately sized genome-wide association studies. Crawford et al. (2017) <doi:10.1371/journal.pgen.1006869>. Stamp et al. (2023) <doi:10.1093/g3journal/jkad118>.
Maintained by Julian Stamp. Last updated 5 months ago.
cppepistasisepistasis-analysisgwasgwas-toolslinear-mixed-modelsmapitmvmapitvariance-componentsopenblascppopenmp
11 stars 6.90 score 17 scripts 1 dependents
furrer-lab
abn:Modelling Multivariate Data with Additive Bayesian Networks
The 'abn' R package facilitates Bayesian network analysis, a probabilistic graphical model that derives from empirical data a directed acyclic graph (DAG). This DAG describes the dependency structure between random variables. The R package 'abn' provides routines to help determine optimal Bayesian network models for a given data set. These models are used to identify statistical dependencies in messy, complex data. Their additive formulation is equivalent to multivariate generalised linear modelling, including mixed models with independent and identically distributed (iid) random effects. The core functionality of the 'abn' package revolves around model selection, also known as structure discovery. It supports both exact and heuristic structure learning algorithms and does not restrict the data distribution of parent-child combinations, providing flexibility in model creation and analysis. The 'abn' package uses Laplace approximations for metric estimation and includes wrappers to the 'INLA' package. It also employs 'JAGS' for data simulation purposes. For more resources and information, visit the 'abn' website.
Maintained by Matteo Delucchi. Last updated 17 days ago.
bayesian-networkbinomialcategorical-datagaussiangrouped-datasetsmixed-effectsmultinomialmultivariatepoissonstructure-learninggslopenblascppopenmpjags
6 stars 6.88 score 90 scripts
mrc-ide
dust2:Next Generation dust
Experimental sources for the next generation of dust, which will properly adopt the particle filter, have support for partial parameter updates, support for multiple parameter sets and hopefully better GPU/MPI support.
Maintained by Rich FitzJohn. Last updated 21 hours ago.
6.87 score 32 scripts 2 dependents
david-cortes
cmfrec:Collective Matrix Factorization for Recommender Systems
Collective matrix factorization (a.k.a. multi-view or multi-way factorization, Singh, Gordon, (2008) <doi:10.1145/1401890.1401969>) tries to approximate a (potentially very sparse or having many missing values) matrix 'X' as the product of two low-dimensional matrices, optionally aided with secondary information matrices about rows and/or columns of 'X', which are also factorized using the same latent components. The intended usage is for recommender systems, dimensionality reduction, and missing value imputation. Implements extensions of the original model (Cortes, (2018) <arXiv:1809.00366>) and can produce different factorizations such as the weighted 'implicit-feedback' model (Hu, Koren, Volinsky, (2008) <doi:10.1109/ICDM.2008.22>), the 'weighted-lambda-regularization' model, (Zhou, Wilkinson, Schreiber, Pan, (2008) <doi:10.1007/978-3-540-68880-8_32>), or the enhanced model with 'implicit features' (Rendle, Zhang, Koren, (2019) <arXiv:1905.01395>), with or without side information. Can use gradient-based procedures or alternating-least squares procedures (Koren, Bell, Volinsky, (2009) <doi:10.1109/MC.2009.263>), with either a Cholesky solver, a faster conjugate gradient solver (Takacs, Pilaszy, Tikk, (2011) <doi:10.1145/2043932.2043987>), or a non-negative coordinate descent solver (Franc, Hlavac, Navara, (2005) <doi:10.1007/11556121_50>), providing efficient methods for sparse and dense data, and mixtures thereof. Supports L1 and L2 regularization in the main models, offers alternative most-popular and content-based models, and implements functionality for cold-start recommendations and imputation of 2D data.
Maintained by David Cortes. Last updated 3 months ago.
cold-startcollaborative-filteringcollective-matrix-factorizationopenblasopenmp
120 stars 6.84 score 23 scripts
nepem-ufsc
pliman:Tools for Plant Image Analysis
Tools for both single and batch image manipulation and analysis (Olivoto, 2022 <doi:10.1111/2041-210X.13803>) and phytopathometry (Olivoto et al., 2022 <doi:10.1007/S40858-021-00487-5>). The tools can be used for the quantification of leaf area, object counting, extraction of image indexes, shape measurement, object landmark identification, and Elliptical Fourier Analysis of object outlines (Claude (2008) <doi:10.1007/978-0-387-77789-4>). The package also provides a comprehensive pipeline for generating shapefiles with complex layouts and supports high-throughput phenotyping of RGB, multispectral, and hyperspectral orthomosaics. This functionality facilitates field phenotyping using UAV- or satellite-based imagery.
Maintained by Tiago Olivoto. Last updated 16 hours ago.
11 stars 6.76 score 476 scripts
r-lidar
lasR:Fast and Pipeable Airborne LiDAR Data Tools
Fast and pipeable airborne lidar processing tools. Read/write 'las' and 'laz' files, computation of metrics in area based approach, point filtering, normalization, individual tree segmentation and other manipulations in a powerful and versatile processing chain.
Maintained by Jean-Romain Roussel. Last updated 1 month ago.
19 stars 6.75 score 26 scripts
s3alfisc
fwildclusterboot:Fast Wild Cluster Bootstrap Inference for Linear Models
Implementation of fast algorithms for wild cluster bootstrap inference developed in 'Roodman et al' (2019, 'STATA' Journal, <doi:10.1177/1536867X19830877>) and 'MacKinnon et al' (2022), which makes it feasible to quickly calculate bootstrap test statistics based on a large number of bootstrap draws even for large samples. Multiple bootstrap types as described in 'MacKinnon, Nielsen & Webb' (2022) are supported. Further, 'multiway' clustering, regression weights, bootstrap weights, fixed effects and 'subcluster' bootstrapping are supported. Further, both restricted ('WCR') and unrestricted ('WCU') bootstrap are supported. Methods are provided for a variety of fitted models, including 'lm()', 'feols()' (from package 'fixest') and 'felm()' (from package 'lfe'). Additionally implements a 'heteroskedasticity-robust' ('HC1') wild bootstrap. Last, the package provides an R binding to 'WildBootTests.jl', which provides additional speed gains and functionality, including the 'WRE' bootstrap for instrumental variable models (based on models of type 'ivreg()' from package 'ivreg') and hypotheses with q > 1.
Maintained by Alexander Fischer. Last updated 2 years ago.
clustered-standard-errorslinear-regression-modelswild-bootstrapwild-cluster-bootstrapopenblascppopenmp
25 stars 6.69 score 109 scripts 2 dependents
mlizhangx
NAIR:Network Analysis of Immune Repertoire
Pipelines for studying the adaptive immune repertoire of T cells and B cells via network analysis based on receptor sequence similarity. Relate clinical outcomes to immune repertoires based on their network properties, or to particular clusters and clones within a repertoire. Yang et al. (2023) <doi:10.3389/fimmu.2023.1181825>.
Maintained by Brian Neal. Last updated 2 months ago.
7 stars 6.66 score 27 scripts
e-nakama
RhpcBLASctl:Control the Number of Threads on 'BLAS'
Control the number of threads used by 'BLAS' (e.g. 'GotoBLAS', 'OpenBLAS', 'ACML', 'BLIS' and 'MKL'). It is also possible to control the number of threads in 'OpenMP' and, where feasible, to get the number of logical and physical cores.
Maintained by Ei-ji Nakama. Last updated 2 years ago.
6.65 score 552 scripts 103 dependents
davidbolin
excursions:Excursion Sets and Contour Credibility Regions for Random Fields
Functions that compute probabilistic excursion sets, contour credibility regions, contour avoiding regions, and simultaneous confidence bands for latent Gaussian random processes and fields. The package also contains functions that calculate these quantities for models estimated with the INLA package. The main references for excursions are Bolin and Lindgren (2015) <doi:10.1111/rssb.12055>, Bolin and Lindgren (2017) <doi:10.1080/10618600.2016.1228537>, and Bolin and Lindgren (2018) <doi:10.18637/jss.v086.i05>. These can be generated by the citation function in R.
Maintained by David Bolin. Last updated 10 days ago.
3 stars 6.64 score 40 scripts 1 dependents
bioc
bayNorm:Single-cell RNA sequencing data normalization
bayNorm is used for normalizing single-cell RNA-seq data.
Maintained by Wenhao Tang. Last updated 5 months ago.
immunooncologynormalizationrnaseqsinglecellsequencingscrnaseqcppopenmp
9 stars 6.59 score 36 scripts
stscl
sdsfun:Spatial Data Science Complementary Features
Wraps and supplements commonly used functions in the R ecosystem related to spatial data science, while serving as a basis for other packages maintained by Wenbo Lv.
Maintained by Wenbo Lv. Last updated 26 days ago.
geoinformaticsspatial-data-analysisspatial-data-sciencespatial-statisticsopenblascppopenmp
16 stars 6.58 score 6 scripts 8 dependents
agnesdeng
mixgb:Multiple Imputation Through 'XGBoost'
Multiple imputation using 'XGBoost', subsampling, and predictive mean matching as described in Deng and Lumley (2023) <doi:10.1080/10618600.2023.2252501>. The package supports various types of variables, offers flexible settings, and enables saving an imputation model to impute new data. Data processing and memory usage have been optimised to speed up the imputation process.
Maintained by Yongshi Deng. Last updated 2 months ago.
23 stars 6.58 score 82 scripts
mdsteiner
EFAtools:Fast and Flexible Implementations of Exploratory Factor Analysis Tools
Provides functions to perform exploratory factor analysis (EFA) procedures and compare their solutions. The goal is to provide state-of-the-art factor retention methods and a high degree of flexibility in the EFA procedures. This way, for example, implementations from R 'psych' and 'SPSS' can be compared. Moreover, functions for Schmid-Leiman transformation and the computation of omegas are provided. To speed up the analyses, some of the iterative procedures, like principal axis factoring (PAF), are implemented in C++.
Maintained by Markus Steiner. Last updated 3 months ago.
10 stars 6.58 score 83 scripts 1 dependents
craddm
eegUtils:Utilities for Electroencephalographic (EEG) Analysis
Electroencephalography data processing and visualization tools. Includes import functions for 'BioSemi' (.BDF), 'Neuroscan' (.CNT), 'Brain Vision Analyzer' (.VHDR), 'EEGLAB' (.set) and 'Fieldtrip' (.mat). Many preprocessing functions such as referencing, epoching, filtering, and ICA are available. A variety of visualizations are available, including timecourse and topographical plotting.
Maintained by Matt Craddock. Last updated 5 months ago.
eegeeg-analysiseeg-dataeeg-signalseeg-signals-processingopenblascppopenmp
106 stars 6.54 score 82 scripts
american-institutes-for-research
wCorr:Weighted Correlations
Calculates Pearson, Spearman, polychoric, and polyserial correlation coefficients, in weighted or unweighted form. The package implements tetrachoric correlation as a special case of the polychoric and biserial correlation as a specific case of the polyserial.
Maintained by Paul Bailey. Last updated 2 years ago.
6.54 score 118 scripts 8 dependents
jmgirard
circumplex:Analysis and Visualization of Circular Data
Circumplex models, which organize constructs in a circle around two underlying dimensions, are popular for studying interpersonal functioning, mood/affect, and vocational preferences/environments. This package provides tools for analyzing and visualizing circular data, including scoring functions for relevant instruments, a generalization of the bootstrapped structural summary method from Zimmermann & Wright (2017) <doi:10.1177/1073191115621795>, and functions for creating publication-ready tables and figures from the results.
Maintained by Jeffrey Girard. Last updated 5 months ago.
circularcircumplexdata-analysisggplot2interpersonalpsychologyrcpparmadillotidyverseopenblascppopenmp
11 stars 6.54 score 52 scripts
ausgis
geocomplexity:Mitigating Spatial Bias Through Geographical Complexity
The geographical complexity of individual variables can be characterized by the differences in local attribute variables, while the common geographical complexity of multiple variables can be represented by fluctuations in the similarity of vectors composed of multiple variables. In spatial regression tasks, the goodness of fit can be improved by incorporating a geographical complexity representation vector during modeling, using a geographical complexity-weighted spatial weight matrix, or employing local geographical complexity kernel density. Similarly, in spatial sampling tasks, samples can be selected more effectively by using a method that weights based on geographical complexity. By optimizing performance in spatial regression and spatial sampling tasks, the spatial bias of the model can be effectively reduced.
Maintained by Wenbo Lv. Last updated 5 months ago.
geographical-complexitygeospatial-analysisspatial-regressionspatial-relationsspatial-samplingspatial-statisticsopenblascppopenmp
19 stars 6.53 score 12 scripts
scmethods
scregclust:Reconstructing the Regulatory Programs of Target Genes in scRNA-Seq Data
Implementation of the scregclust algorithm described in Larsson, Held, et al. (2024) <doi:10.1038/s41467-024-53954-3> which reconstructs regulatory programs of target genes in scRNA-seq data. Target genes are clustered into modules and each module is associated with a linear model describing the regulatory program.
Maintained by Felix Held. Last updated 2 months ago.
clusteringregulatory-programsscrna-seq-analysiscppopenmp
10 stars 6.50 score 21 scripts
tvpham
iq:Protein Quantification in Mass Spectrometry-Based Proteomics
An implementation of the MaxLFQ algorithm by Cox et al. (2014) <doi:10.1074/mcp.M113.031591> in a comprehensive pipeline for processing proteomics data in data-independent acquisition mode (Pham et al. 2020 <doi:10.1093/bioinformatics/btz961>). It offers additional options for protein quantification using the N most intense fragment ions, using all fragment ions, and a wrapper for the median polish algorithm by Tukey (1977, ISBN:0201076160). In general, the tool can be used to integrate multiple proportional observations into a single quantitative value.
Maintained by Thang Pham. Last updated 27 days ago.
27 stars 6.49 score 25 scripts
thongphamthe
PAFit:Generative Mechanism Estimation in Temporal Complex Networks
Statistical methods for estimating preferential attachment and node fitness generative mechanisms in temporal complex networks are provided. Thong Pham et al. (2015) <doi:10.1371/journal.pone.0137796>. Thong Pham et al. (2016) <doi:10.1038/srep32558>. Thong Pham et al. (2020) <doi:10.18637/jss.v092.i03>. Thong Pham et al. (2021) <doi:10.1093/comnet/cnab024>.
Maintained by Thong Pham. Last updated 1 year ago.
complex-networksfit-get-richergeneral-preferential-attachmentminorize-maximizationpreferential-attachmentrich-get-richerscale-freetemporal-networkscppopenmp
17 stars 6.47 score 70 scripts
bachmannpatrick
CLVTools:Tools for Customer Lifetime Value Estimation
A set of state-of-the-art probabilistic modeling approaches to derive estimates of individual customer lifetime values (CLV). Commonly, probabilistic approaches focus on modelling three processes: individuals' attrition, transaction, and spending processes. Latent customer attrition models, which are also known as "buy-'til-you-die models", model the attrition as well as the transaction process. They are used to make inferences and predictions about transactional patterns of individual customers such as their future purchase behavior. Moreover, these models have also been used to predict individuals’ long-term engagement in activities such as playing an online game or posting to a social media platform. The spending process is usually modelled by a separate probabilistic model. Combining these results yields lifetime value estimates for individual customers. This package includes fast and accurate implementations of various probabilistic models for non-contractual settings (e.g., grocery purchases or hotel visits). All implementations support time-invariant covariates, which can be used to control for, e.g., socio-demographics. If such an extension has been proposed in the literature, we further provide the possibility to include time-varying covariates to control for, e.g., seasonal patterns. Currently, the package includes the following latent attrition models to model individuals' attrition and transaction process: [1] the Pareto/NBD model (Pareto/Negative-Binomial-Distribution), [2] the Extended Pareto/NBD model (Pareto/Negative-Binomial-Distribution with time-varying covariates), [3] the BG/NBD model (Beta-Geometric/Negative-Binomial-Distribution) and [4] the GGom/NBD model (Gamma-Gompertz/Negative-Binomial-Distribution). Further, we provide an implementation of the Gamma/Gamma model to model the spending process of individuals.
Maintained by Patrick Bachmann. Last updated 4 months ago.
clvcustomer-lifetime-valuecustomer-relationship-managementopenblasgslcppopenmp
55 stars 6.47 score 12 scripts
tilburgnetworkgroup
remify:Processing and Transforming Relational Event History Data
Efficiently processes relational event history data and transforms them into formats suitable for other packages. The primary objective of this package is to convert event history data into a format that integrates with the packages in 'remverse' and is compatible with various analytical tools (e.g., computing network statistics, estimating tie-oriented or actor-oriented social network models). Second, it can also transform the data into formats compatible with packages outside 'remverse'. The package processes the data for two types of temporal social network models: the tie-oriented modeling framework (Butts, C., 2008, <doi:10.1111/j.1467-9531.2008.00203.x>) and the actor-oriented modeling framework (Stadtfeld, C., & Block, P., 2017, <doi:10.15195/v4.a14>).
Maintained by Giuseppe Arena. Last updated 2 months ago.
4 stars 6.45 score 39 scripts 1 dependents
kaskr
RTMBp:'R' Bindings for 'TMB'
Native 'R' interface to 'TMB' (Template Model Builder) so models can be written entirely in 'R' rather than 'C++'. Automatic differentiation, to any order, is available for a rich subset of 'R' features, including linear algebra for dense and sparse matrices, complex arithmetic, Fast Fourier Transform, probability distributions and special functions. 'RTMBp' provides easy access to model fitting and validation following the principles of Kristensen, K., Nielsen, A., Berg, C. W., Skaug, H., & Bell, B. M. (2016) <DOI:10.18637/jss.v070.i05> and Thygesen, U.H., Albertsen, C.M., Berg, C.W. et al. (2017) <DOI:10.1007/s10651-017-0372-4>.
Maintained by Kasper Kristensen. Last updated 1 month ago.
51 stars 6.44 score 1 scripts
helske
bssm:Bayesian Inference of Non-Linear and Non-Gaussian State Space Models
Efficient methods for Bayesian inference of state space models via Markov chain Monte Carlo (MCMC) based on parallel importance sampling type weighted estimators (Vihola, Helske, and Franks, 2020, <doi:10.1111/sjos.12492>), particle MCMC, and its delayed acceptance version. Gaussian, Poisson, binomial, negative binomial, and Gamma observation densities and basic stochastic volatility models with linear-Gaussian state dynamics, as well as general non-linear Gaussian models and discretised diffusion models are supported. See Helske and Vihola (2021, <doi:10.32614/RJ-2021-103>) for details.
Maintained by Jouni Helske. Last updated 7 months ago.
bayesian-inferencecppmarkov-chain-monte-carloparticle-filterstate-spacetime-seriesopenblascppopenmp
42 stars 6.43 score 11 scripts
bioc
SynExtend:Tools for Working With Synteny Objects
Shared order between genomic sequences provides a great deal of information. Synteny objects produced by the R package DECIPHER provide quantitative information about that shared order. SynExtend provides tools for extracting information from Synteny objects.
Maintained by Nicholas Cooley. Last updated 15 days ago.
geneticsclusteringcomparativegenomicsdataimportfortranopenmp
1 stars 6.42 score 77 scripts
ygeunkim
bvhar:Bayesian Vector Heterogeneous Autoregressive Modeling
Tools to model and forecast multivariate time series including Bayesian Vector heterogeneous autoregressive (VHAR) model by Kim & Baek (2023) (<doi:10.1080/00949655.2023.2281644>). 'bvhar' can model Vector Autoregressive (VAR), VHAR, Bayesian VAR (BVAR), and Bayesian VHAR (BVHAR) models.
Maintained by Young Geun Kim. Last updated 28 days ago.
bayesianbayesian-econometricsbvareigenforecastingharpybind11pythonrcppeigentime-seriesvector-autoregressioncppopenmp
6 stars 6.42 score 25 scripts
bioc
SpliceWiz:interactive analysis and visualization of alternative splicing in R
The analysis and visualization of alternative splicing (AS) events from RNA sequencing data remains challenging. SpliceWiz is a user-friendly and performance-optimized R package for AS analysis. It processes alignment BAM files to quantify read counts across splice junctions, performs IRFinder-based intron retention quantitation, and supports novel splicing event identification. We introduce a novel visualization for AS using normalized coverage, thereby allowing visualization of differential AS across conditions. SpliceWiz features a shiny-based GUI facilitating interactive data exploration of results including gene ontology enrichment. It is performance optimized with multi-threaded processing of BAM files and a new COV file format for fast recall of sequencing coverage. Overall, SpliceWiz streamlines AS analysis, enabling reliable identification of functionally relevant AS events for further characterization.
Maintained by Alex Chit Hei Wong. Last updated 16 days ago.
softwaretranscriptomicsrnaseqalternativesplicingcoveragedifferentialsplicingdifferentialexpressionguisequencingcppopenmp
16 stars 6.41 score 8 scripts
pbreheny
plmmr:Penalized Linear Mixed Models for Correlated Data
Fits penalized linear mixed models that correct for unobserved confounding factors. 'plmmr' infers and corrects for the presence of unobserved confounding effects such as population stratification and environmental heterogeneity. It then fits a linear model via penalized maximum likelihood. Originally designed for the multivariate analysis of single nucleotide polymorphisms (SNPs) measured in a genome-wide association study (GWAS), 'plmmr' eliminates the need for subpopulation-specific analyses and post-analysis p-value adjustments. Functions for the appropriate processing of 'PLINK' files are also supplied. For examples, see the package homepage <https://pbreheny.github.io/plmmr/>.
Maintained by Patrick J. Breheny. Last updated 8 days ago.
5 stars 6.41 score 10 scripts
lbb220
GWmodel:Geographically-Weighted Models
Techniques from a particular branch of spatial statistics, termed geographically-weighted (GW) models. GW models suit situations when data are not described well by some global model, but where there are spatial regions where a suitably localised calibration provides a better description. 'GWmodel' includes functions to calibrate: GW summary statistics (Brunsdon et al., 2002) <doi:10.1016/s0198-9715(01)00009-6>, GW principal components analysis (Harris et al., 2011) <doi:10.1080/13658816.2011.554838>, GW discriminant analysis (Brunsdon et al., 2007) <doi:10.1111/j.1538-4632.2007.00709.x> and various forms of GW regression (Brunsdon et al., 1996) <doi:10.1111/j.1538-4632.1996.tb00936.x>; some of which are provided in basic and robust (outlier resistant) forms.
Maintained by Binbin Lu. Last updated 7 months ago.
18 stars 6.38 score 266 scripts 4 dependents
ailich
GLCMTextures:GLCM Textures of Raster Layers
Calculates grey level co-occurrence matrix (GLCM) based texture measures (Hall-Beyer (2017) <https://prism.ucalgary.ca/bitstream/handle/1880/51900/texture%20tutorial%20v%203_0%20180206.pdf>; Haralick et al. (1973) <doi:10.1109/TSMC.1973.4309314>) of raster layers using a sliding rectangular window. It also includes functions to quantize a raster into grey levels as well as tabulate a GLCM and calculate GLCM texture metrics for a matrix.
Maintained by Alexander Ilich. Last updated 2 months ago.
12 stars 6.33 score 20 scripts 2 dependents
pachadotdev
economiccomplexity:Computational Methods for Economic Complexity
A wrapper of different methods from Linear Algebra for the equations introduced in The Atlas of Economic Complexity and related literature. This package provides standard matrix and graph output that can be used seamlessly with other packages. See <doi:10.21105/joss.01866> for a summary of these methods and their evolution in the literature.
Maintained by Mauricio Vargas Sepulveda. Last updated 3 months ago.
economic-complexityeigenvalueseigenvectorsgraphsinternational-tradematrixnetworksrecursive-algorithmopenblascppopenmp
39 stars 6.32 score 18 scripts
spatialnous
alcyon:Spatial Network Analysis
Interface package for 'sala', the spatial network analysis library from the 'depthmapX' software application. The R parts of the code are based on the 'rdepthmap' package. Allows for the analysis of urban and building-scale networks and provides metrics and methods usually found within the Space Syntax domain. Methods in this package are described by K. Al-Sayed, A. Turner, B. Hillier, S. Iida and A. Penn (2014) "Space Syntax methodology", and also by A. Turner (2004) <https://discovery.ucl.ac.uk/id/eprint/2651> "Depthmap 4: a researcher's handbook".
Maintained by Petros Koutsolampros. Last updated 10 hours ago.
2 stars 6.32 score 13 scripts
ibidat
nn2poly:Neural Network Weights Transformation into Polynomial Coefficients
Implements a method that builds the coefficients of a polynomial model that performs almost equivalently to a given (densely connected) neural network. This is achieved using Taylor expansion at the activation functions. The obtained polynomial coefficients can be used to explain the importance of features (and their interactions) in the neural network, therefore working as a tool for interpretability or eXplainable Artificial Intelligence (XAI). See Morala et al. 2021 <doi:10.1016/j.neunet.2021.04.036>, and 2023 <doi:10.1109/TNNLS.2023.3330328>.
Maintained by Pablo Morala. Last updated 2 months ago.
10 stars 6.32 score 23 scripts
hughparsonage
hutilscpp:Miscellaneous Functions in C++
Provides utility functions that are simple and frequently used, but may require higher performance than what can be obtained from base R. Incidentally provides support for 'reverse geocoding', such as matching a point with its nearest neighbour in another array. Used as a complement to package 'hutils' by sacrificing compilation or installation time for higher running speeds. The name is a portmanteau of the author and 'Rcpp'.
Maintained by Hugh Parsonage. Last updated 8 days ago.
10 stars 6.31 score 113 scripts 2 dependents
bsvars
bsvarSIGNs:Bayesian SVARs with Sign, Zero, and Narrative Restrictions
Implements state-of-the-art algorithms for the Bayesian analysis of Structural Vector Autoregressions (SVARs) identified by sign, zero, and narrative restrictions. The core model is based on a flexible Vector Autoregression with estimated hyper-parameters of the Minnesota prior and the dummy observation priors as in Giannone, Lenza, Primiceri (2015) <doi:10.1162/REST_a_00483>. The sign restrictions are implemented employing the methods proposed by Rubio-Ramírez, Waggoner & Zha (2010) <doi:10.1111/j.1467-937X.2009.00578.x>, while identification through sign and zero restrictions follows the approach developed by Arias, Rubio-Ramírez, & Waggoner (2018) <doi:10.3982/ECTA14468>. Furthermore, our tool provides algorithms for identification via sign and narrative restrictions, in line with the methods introduced by Antolín-Díaz and Rubio-Ramírez (2018) <doi:10.1257/aer.20161852>. Users can also estimate a model with sign, zero, and narrative restrictions imposed at once. The package facilitates predictive and structural analyses using impulse responses, forecast error variance and historical decompositions, forecasting and conditional forecasting, as well as analyses of structural shocks and fitted values. All this is complemented by colourful plots, user-friendly summary functions, and comprehensive documentation including the vignette by Wang & Woźniak (2024) <doi:10.48550/arXiv.2501.16711>. The 'bsvarSIGNs' package is aligned regarding objects, workflows, and code structure with the R package 'bsvars' by Woźniak (2024) <doi:10.32614/CRAN.package.bsvars>, and they constitute an integrated toolset. It was granted the Di Cook Open-Source Statistical Software Award by the Statistical Society of Australia in 2024.
Maintained by Xiaolei Wang. Last updated 2 months ago.
bayesian-inferenceeconometricsvector-autoregressionopenblascppopenmp
13 stars 6.21 score 10 scripts
jacobseedorff21
BranchGLM:Efficient Best Subset Selection for GLMs via Branch and Bound Algorithms
Performs efficient and scalable glm best subset selection using a novel implementation of a branch and bound algorithm. To speed up the model fitting process, a range of optimization methods are implemented in 'RcppArmadillo'. Parallel computation is available using 'OpenMP'.
Maintained by Jacob Seedorff. Last updated 6 months ago.
generalized-linear-modelsregressionstatisticssubset-selectionvariable-selectionopenblascppopenmp
7 stars 6.20 score 30 scripts
gzt
MixMatrix:Classification with Matrix Variate Normal and t Distributions
Provides sampling and density functions for matrix variate normal, t, and inverted t distributions; ML estimation for matrix variate normal and t distributions using the EM algorithm, including some restrictions on the parameters; and classification by linear and quadratic discriminant analysis for matrix variate normal and t distributions described in Thompson et al. (2019) <doi:10.1080/10618600.2019.1696208>. Performs clustering with matrix variate normal and t mixture models.
Maintained by Geoffrey Thompson. Last updated 6 months ago.
3 stars 6.19 score 29 scripts 3 dependents
stscl
spEDM:Spatial Empirical Dynamic Modeling
Inferring causal associations in cross-sectional earth system data through empirical dynamic modeling (EDM), with extensions to convergent cross mapping from Sugihara et al. (2012) <doi:10.1126/science.1227079>, partial cross mapping as outlined in Leng et al. (2020) <doi:10.1038/s41467-020-16238-0>, and cross mapping cardinality as described in Tao et al. (2023) <doi:10.1016/j.fmre.2023.01.007>.
Maintained by Wenbo Lv. Last updated 3 days ago.
causal-inferencecppempirical-dynamic-modelinggeoinformaticsgeospatial-causalityspatial-statisticsopenblascppopenmp
17 stars 6.16 score 2 scripts
joeguinness
GpGp:Fast Gaussian Process Computation Using Vecchia's Approximation
Functions for fitting and doing predictions with Gaussian process models using Vecchia's (1988) approximation. Package also includes functions for reordering input locations, finding ordered nearest neighbors (with help from 'FNN' package), grouping operations, and conditional simulations. Covariance functions for spatial and spatial-temporal data on Euclidean domains and spheres are provided. The original approximation is due to Vecchia (1988) <http://www.jstor.org/stable/2345768>, and the reordering and grouping methods are from Guinness (2018) <doi:10.1080/00401706.2018.1437476>. Model fitting employs a Fisher scoring algorithm described in Guinness (2019) <doi:10.48550/arXiv.1905.08374>.
Maintained by Joseph Guinness. Last updated 5 months ago.
10 stars 6.16 score 160 scripts 6 dependents
biodiverse
spAbundance:Univariate and Multivariate Spatial Modeling of Species Abundance
Fits single-species (univariate) and multi-species (multivariate) non-spatial and spatial abundance models in a Bayesian framework using Markov Chain Monte Carlo (MCMC). Spatial models are fit using Nearest Neighbor Gaussian Processes (NNGPs). Details on NNGP models are given in Datta, Banerjee, Finley, and Gelfand (2016) <doi:10.1080/01621459.2015.1044091> and Finley, Datta, and Banerjee (2022) <doi:10.18637/jss.v103.i05>. Fits single-species and multi-species spatial and non-spatial versions of generalized linear mixed models (Gaussian, Poisson, Negative Binomial), N-mixture models (Royle 2004 <doi:10.1111/j.0006-341X.2004.00142.x>) and hierarchical distance sampling models (Royle, Dawson, Bates (2004) <doi:10.1890/03-3127>). Multi-species spatial models are fit using a spatial factor modeling approach with NNGPs for computational efficiency.
Maintained by Jeffrey Doser. Last updated 3 days ago.
17 stars 6.15 score 43 scripts 1 dependents
joliencremers
bpnreg:Bayesian Projected Normal Regression Models for Circular Data
Fitting Bayesian multiple and mixed-effect regression models for circular data based on the projected normal distribution. Both continuous and categorical predictors can be included. Sampling from the posterior is performed via an MCMC algorithm. Posterior descriptives of all parameters, model fit statistics and Bayes factors for hypothesis tests for inequality constrained hypotheses are provided. See Cremers, Mulder & Klugkist (2018) <doi:10.1111/bmsp.12108> and Nuñez-Antonio & Gutiérrez-Peña (2014) <doi:10.1016/j.csda.2012.07.025>.
Maintained by Jolien Cremers. Last updated 1 year ago.
14 stars 6.15 score 101 scripts
helske
Rlibeemd:Ensemble Empirical Mode Decomposition (EEMD) and Its Complete Variant (CEEMDAN)
An R interface for libeemd (Luukko, Helske, Räsänen, 2016) <doi:10.1007/s00180-015-0603-9>, a C library of highly efficient parallelizable functions for performing the ensemble empirical mode decomposition (EEMD), its complete variant (CEEMDAN), the regular empirical mode decomposition (EMD), and bivariate EMD (BEMD). Due to possible portability issues, the CRAN version no longer supports OpenMP; an OpenMP-supported version can be installed from GitHub: <https://github.com/helske/Rlibeemd/>.
Maintained by Jouni Helske. Last updated 2 years ago.
cdecompositioneemdemdtime-seriesgslcppopenmp
39 stars 6.14 score 17 scripts 14 dependents
dylanb95
statespacer:State Space Modelling in 'R'
A tool that makes estimating models in state space form a breeze. See "Time Series Analysis by State Space Methods" by Durbin and Koopman (2012, ISBN: 978-0-19-964117-8) for details about the algorithms implemented.
Maintained by Dylan Beijers. Last updated 2 years ago.
cppdynamic-linear-modelforecastinggaussian-modelskalman-filtermathematical-modellingstate-spacestatistical-inferencestatistical-modelsstructural-analysistime-seriesopenblascppopenmp
15 stars 6.14 score 37 scripts
jwiley
brmsmargins:Bayesian Marginal Effects for 'brms' Models
Calculate Bayesian marginal effects, average marginal effects, and marginal coefficients (also called population averaged coefficients) for models fit using the 'brms' package including fixed effects, mixed effects, and location scale models. These are based on marginal predictions that integrate out random effects if necessary (see for example <doi:10.1186/s12874-015-0046-6> and <doi:10.1111/biom.12707>).
Maintained by Joshua F. Wiley. Last updated 2 months ago.
20 stars 6.14 score 42 scripts
biometris
statgenGWAS:Genome Wide Association Studies
Fast single trait Genome Wide Association Studies (GWAS) following the method described in Kang et al. (2010), <doi:10.1038/ng.548>. One of a series of statistical genetic packages for streamlining the analysis of typical plant breeding experiments developed by Biometris.
Maintained by Bart-Jan van Rossum. Last updated 4 months ago.
14 stars 6.14 score 15 scripts 3 dependents
sales-lab
parmigene:Parallel Mutual Information Estimation for Gene Network Reconstruction
Parallel estimation of the mutual information based on entropy estimates from k-nearest neighbors distances and algorithms for the reconstruction of gene regulatory networks (Sales et al, 2011 <doi:10.1093/bioinformatics/btr274>).
Maintained by Gabriele Sales. Last updated 5 months ago.
6 stars 6.14 score 38 scripts 4 dependents
mbant
BayesSUR:Bayesian Seemingly Unrelated Regression Models in High-Dimensional Settings
Bayesian seemingly unrelated regression with general variable selection and dense/sparse covariance matrix. The sparse seemingly unrelated regression is described in Bottolo et al. (2021) <doi:10.1111/rssc.12490>, the software paper is in Zhao et al. (2021) <doi:10.18637/jss.v100.i11>, and the model with random effects is described in Zhao et al. (2024) <doi:10.1093/jrsssc/qlad102>.
Maintained by Zhi Zhao. Last updated 1 days ago.
8 stars 6.11 score 3 scripts
mkln
meshed:Bayesian Regression with Meshed Gaussian Processes
Fits Bayesian regression models based on latent Meshed Gaussian Processes (MGP) as described in Peruzzi, Banerjee, Finley (2020) <doi:10.1080/01621459.2020.1833889>, Peruzzi, Banerjee, Dunson, and Finley (2021) <arXiv:2101.03579>, Peruzzi and Dunson (2024) <arXiv:2201.10080>. Funded by ERC grant 856506 and NIH grant R01ES028804.
Maintained by Michele Peruzzi. Last updated 8 months ago.
bayesianmcmcmultivariateregressionspatialspatiotemporalopenblascppopenmp
13 stars 6.11 score 49 scripts
bioc
QUBIC:An R package for qualitative biclustering in support of gene co-expression analyses
The core function of this R package is to provide the implementation of the well-cited and well-reviewed QUBIC algorithm, aiming to deliver an effective and efficient biclustering capability. This package also includes the following related functions: (i) a qualitative representation of the input gene expression data, through a well-designed discretization method that considers the underlying data properties, which can be directly used in other biclustering programs; (ii) visualization of identified biclusters using heatmaps in support of overall expression pattern analysis; (iii) bicluster-based co-expression network elucidation and visualization, where different correlation coefficient scores between a pair of genes are provided; and (iv) a generalized output format for biclusters and the corresponding network, which can be freely downloaded so that a user can easily perform subsequent comprehensive functional enrichment analysis (e.g. DAVID) and advanced network visualization (e.g. Cytoscape).
Maintained by Yu Zhang. Last updated 5 months ago.
statisticalmethodmicroarraydifferentialexpressionmultiplecomparisonclusteringvisualizationgeneexpressionnetworkbioconductor-packagebioconductor-packagescppopenmp
3 stars 6.10 score 14 scripts 1 dependents
jeremygelb
geocmeans:Implementing Methods for Spatial Fuzzy Unsupervised Classification
Provides functions to apply spatial fuzzy unsupervised classification, and to visualize and interpret the results. This method is well suited when the user wants to analyze data with a fuzzy clustering algorithm and to account for the spatial dimension of the dataset. In addition, indexes for estimating the spatial consistency and classification quality are proposed. The methods were originally proposed in the field of brain imagery (see Cai et al. 2007 <doi:10.1016/j.patcog.2006.07.011> and Zhao et al. 2013 <doi:10.1016/j.dsp.2012.09.016>) and recently applied in geography (see Gelb and Apparicio <doi:10.4000/cybergeo.36414>).
Maintained by Jeremy Gelb. Last updated 4 months ago.
clusteringcmeansfuzzy-classification-algorithmsspatial-analysisspatial-fuzzy-cmeansunsupervised-learningcppopenmp
28 stars 6.10 score 90 scripts
sentometricsresearch
sentometrics:An Integrated Framework for Textual Sentiment Time Series Aggregation and Prediction
Optimized prediction based on textual sentiment, accounting for the intrinsic challenge that sentiment can be computed and pooled across texts and time in various ways. See Ardia et al. (2021) <doi:10.18637/jss.v099.i02>.
Maintained by Samuel Borms. Last updated 4 years ago.
nlppredictionsentiment-analysistext-miningtime-seriesopenblascppopenmp
83 stars 6.09 score 49 scripts
kisungyou
ADMM:Algorithms using Alternating Direction Method of Multipliers
Provides algorithms to solve popular optimization problems in statistics such as regression or denoising based on Alternating Direction Method of Multipliers (ADMM). See Boyd et al. (2010) <doi:10.1561/2200000016> for a complete introduction to the method.
Maintained by Kisung You. Last updated 4 years ago.
6 stars 6.08 score 15 scripts 9 dependents
bioc
mgsa:Model-based gene set analysis
Model-based Gene Set Analysis (MGSA) is a Bayesian modeling approach for gene set enrichment. The package mgsa implements MGSA and tools to use MGSA together with the Gene Ontology.
Maintained by Sebastian Bauer. Last updated 5 months ago.
pathwaysgogenesetenrichmentopenmp
5 stars 6.08 score 12 scripts
pachadotdev
capybara:Fast and Memory Efficient Fitting of Linear Models with High-Dimensional Fixed Effects
Fast and user-friendly estimation of generalized linear models with multiple fixed effects and clustered standard errors. The method to obtain the estimated fixed-effects coefficients is based on Stammann (2018) <doi:10.48550/arXiv.1707.01815> and Gaure (2013) <doi:10.1016/j.csda.2013.03.024>.
Maintained by Mauricio Vargas Sepulveda. Last updated 3 days ago.
cpp11econometricslinear-modelsopenblascppopenmp
13 stars 6.07 score
xiaooupan
conquer:Convolution-Type Smoothed Quantile Regression
Estimation and inference for conditional linear quantile regression models using a convolution smoothed approach. In the low-dimensional setting, efficient gradient-based methods are employed for fitting both a single model and a regression process over a quantile range. Normal-based and (multiplier) bootstrap confidence intervals for all slope coefficients are constructed. In high dimensions, the conquer method is complemented with flexible types of penalties (Lasso, elastic-net, group lasso, sparse group lasso, scad and mcp) to deal with complex low-dimensional structures.
Maintained by Xiaoou Pan. Last updated 2 years ago.
21 stars 6.06 score 17 scripts 5 dependents
dakep
pense:Penalized Elastic Net S/MM-Estimator of Regression
Robust penalized (adaptive) elastic net S and M estimators for linear regression. The methods are proposed in Cohen Freue, G. V., Kepplinger, D., Salibián-Barrera, M., and Smucler, E. (2019) <https://projecteuclid.org/euclid.aoas/1574910036>. The package implements the extensions and algorithms described in Kepplinger, D. (2020) <doi:10.14288/1.0392915>.
Maintained by David Kepplinger. Last updated 8 months ago.
linear-regressionpenseregressionrobust-regresssionrobust-statisticsopenblascppopenmp
4 stars 6.06 score 48 scripts
jakobraymaekers
cellWise:Analyzing Data with Cellwise Outliers
Tools for detecting cellwise outliers and robust methods to analyze data which may contain them. Contains the implementation of the algorithms described in Rousseeuw and Van den Bossche (2018) <doi:10.1080/00401706.2017.1340909> (open access), Hubert et al. (2019) <doi:10.1080/00401706.2018.1562989> (open access), Raymaekers and Rousseeuw (2021) <doi:10.1080/00401706.2019.1677270> (open access), Raymaekers and Rousseeuw (2021) <doi:10.1007/s10994-021-05960-5> (open access), Raymaekers and Rousseeuw (2021) <doi:10.52933/jdssv.v1i3.18> (open access), Raymaekers and Rousseeuw (2022) <arXiv:2207.13493> (open access), and Rousseeuw (2022) <doi:10.1016/j.ecosta.2023.01.007> (open access). Examples can be found in the vignettes: "DDC_examples", "MacroPCA_examples", "wrap_examples", "transfo_examples", "DI_examples", "cellMCD_examples", "Correspondence_analysis_examples", and "cellwise_weights_examples".
Maintained by Jakob Raymaekers. Last updated 1 year ago.
2 stars 6.06 score 54 scripts 16 dependents
hughparsonage
grattan:Australian Tax Policy Analysis
Utilities to cost and evaluate Australian tax policy, including fast projections of personal income tax collections, high-performance tax and transfer calculators, and an interface to common indices from the Australian Bureau of Statistics. Written to support Grattan Institute's Australian Perspectives program, and related projects. Access to the Australian Taxation Office's sample files of personal income tax returns is assumed.
Maintained by Hugh Parsonage. Last updated 1 year ago.
25 stars 6.04 score 124 scripts
ahaeusser
echos:Echo State Networks for Time Series Modeling and Forecasting
Provides a lightweight implementation of functions and methods for fast and fully automatic time series modeling and forecasting using Echo State Networks (ESNs).
Maintained by Alexander Häußer. Last updated 24 days ago.
echo-state-networksfablefabletoolsforecastforecastingrecurrent-neural-networksreservoir-computingridge-regressiontime-seriesopenblascppopenmp
12 stars 6.03 score 8 scripts
jaredhuling
oem:Orthogonalizing EM: Penalized Regression for Big Tall Data
Solves penalized least squares problems for big tall data using the orthogonalizing EM algorithm of Xiong et al. (2016) <doi:10.1080/00401706.2015.1054436>. The main fitting function is oem() and the functions cv.oem() and xval.oem() are for cross validation, the latter being an accelerated cross validation function for linear models. The big.oem() function allows for out of memory fitting. The underlying methods and code interface are described in Huling and Chien (2022) <doi:10.18637/jss.v104.i06>.
Maintained by Jared Huling. Last updated 8 months ago.
group-lassolassomachine-learningmcpoemoem-algorithmpenalized-regressionscadvariable-selectionopenblascppopenmp
27 stars 6.02 score 26 scripts 1 dependents
thiloklein
matchingMarkets:Analysis of Stable Matchings
Implements structural estimators to correct for the sample selection bias from observed outcomes in matching markets. This includes one-sided matching of agents into groups as well as two-sided matching of students to schools. The package also contains algorithms to find stable matchings in the three most common matching problems: the stable roommates problem, the college admissions problem, and the house allocation problem.
Maintained by Thilo Klein. Last updated 5 years ago.
40 stars 5.99 score 49 scripts
xiaolei-lab
simer:Data Simulation for Life Science and Breeding
Data simulator including genotype, phenotype, pedigree, selection and reproduction in R. It simulates most of the reproduction process of animals or plants and provides data for GS (Genomic Selection), GWAS (Genome-Wide Association Study), and Breeding. For ADI model, please see Kao C and Zeng Z (2002) <doi:10.1093/genetics/160.3.1243>. For build.cov, please see B. D. Ripley (1987) <ISBN:9780470009604>.
Maintained by Xiaolei Liu. Last updated 1 day ago.
36 stars 5.96 score 2 scripts
jameel-institute
daedalus:Model Health, Social, and Economic Costs of a Pandemic
Model the health, education, and economic costs of directly transmitted respiratory virus pandemics, under different scenarios of prior vaccine investment and reactive interventions, using the 'DAEDALUS' integrated health-economics model adapted from Haw et al. (2022) <doi:10.1038/s43588-022-00233-0>.
Maintained by Pratik Gupte. Last updated 1 hour ago.
decision-supportepidemiological-modelshealth-economicspandemic-preparednesspublic-healthrcppsdg-3cppopenmp
4 stars 5.95 score 8 scripts
polkas
miceFast:Fast Imputations Using 'Rcpp' and 'Armadillo'
Fast imputations under the object-oriented programming paradigm. It also offers a few functions built to work with popular R packages such as 'data.table' or 'dplyr'. The biggest improvement in time performance can be achieved for calculations where a grouping variable has to be used. A single evaluation of a quantitative model for the multiple imputations is another major enhancement. A new major improvement is one of the fastest predictive mean matching implementations in the R world, thanks to presorting and binary search.
Maintained by Maciej Nasinski. Last updated 2 months ago.
cppfastfast-imputationsgroupingimputationimputationsmatrixmromultiple-imputationrcpprcpparmadillovifweightingopenblascppopenmp
20 stars 5.94 score 29 scripts
comeetie
greed:Clustering and Model Selection with the Integrated Classification Likelihood
An ensemble of algorithms that enable the clustering of networks and data matrices (such as counts, categorical or continuous) with different types of generative models. Model selection and clustering are performed jointly by optimizing the Integrated Classification Likelihood (which is equivalent to minimizing the description length). Several models are available, such as: Stochastic Block Model, degree corrected Stochastic Block Model, Mixtures of Multinomial, Latent Block Model. The optimization is performed using a combination of greedy local search and a genetic algorithm (see <arXiv:2002.11577> for more details).
Maintained by Etienne Côme. Last updated 2 years ago.
14 stars 5.94 score 41 scripts
bioc
normr:Normalization and difference calling in ChIP-seq data
Robust normalization and difference calling procedures for ChIP-seq and similar data. Read counts are modeled jointly as a binomial mixture model with a user-specified number of components. A fitted background estimate accounts for the effect of enrichment in certain regions and, therefore, represents an appropriate null hypothesis. This robust background is used to identify significantly enriched or depleted regions.
Maintained by Johannes Helmuth. Last updated 5 months ago.
bayesiandifferentialpeakcallingclassificationdataimportchipseqripseqfunctionalgenomicsgeneticsmultiplecomparisonnormalizationpeakdetectionpreprocessingalignmentcppopenmp
11 stars 5.93 score 13 scripts
caetanods
ratematrix:Bayesian Estimation of the Evolutionary Rate Matrix
The Evolutionary Rate Matrix is a variance-covariance matrix which describes both the rates of trait evolution and the evolutionary correlation among multiple traits. This package has functions to estimate these parameters using Bayesian MCMC. It is possible to test if the pattern of evolutionary correlations among traits has changed between predictive regimes painted along the branches of the phylogenetic tree. Regimes can be created a priori or estimated as part of the MCMC under a joint estimation approach. The package has functions to run MCMC chains, plot results, evaluate convergence, and summarize posterior distributions.
Maintained by Daniel Caetano. Last updated 2 years ago.
10 stars 5.91 score 18 scripts 1 dependents
l-ramirez-lopez
resemble:Memory-Based Learning in Spectral Chemometrics
Functions for dissimilarity analysis and memory-based learning (MBL, a.k.a. local modeling) in complex spectral data sets. Most of these functions are based on the methods presented in Ramirez-Lopez et al. (2013) <doi:10.1016/j.geoderma.2012.12.014>.
Maintained by Leonardo Ramirez-Lopez. Last updated 2 years ago.
chemoinformaticschemometricsinfrared-spectroscopylazy-learninglocal-regressionmachine-learningmemory-based-learningnirpedometricssoil-spectroscopyspectral-dataspectral-libraryspectroscopyopenblascppopenmp
20 stars 5.91 score 27 scripts
cran
dotCall64:Enhanced Foreign Function Interface Supporting Long Vectors
Provides .C64(), which is an enhanced version of .C() and .Fortran() from the foreign function interface. .C64() supports long vectors, arguments of type 64-bit integer, and provides a mechanism to avoid unnecessary copies of read-only and write-only arguments. This makes it a convenient and fast interface to C/C++ and Fortran code.
Maintained by Reinhard Furrer. Last updated 6 months ago.
5.90 score 439 dependents
hmjianggatech
SAM:Sparse Additive Modelling
Computationally efficient tools for high dimensional predictive modeling (regression and classification). SAM is short for sparse additive modeling, and adopts the computationally efficient basis spline technique. We solve the optimization problems by various computational algorithms including the block coordinate descent algorithm, the fast iterative soft-thresholding algorithm, and Newton's method. The computation is further accelerated by warm-start and active-set tricks.
Maintained by Haoming Jiang. Last updated 3 years ago.
6 stars 5.86 score 20 scripts 4 dependents
jamesyang007
adelie:Group Lasso and Elastic Net Solver for Generalized Linear Models
Extremely efficient procedures for fitting the entire group lasso and group elastic net regularization path for GLMs, multinomial, the Cox model and multi-task Gaussian models. Similar to the R package 'glmnet' in scope of models, and in computational speed. This package provides R bindings to the C++ code underlying the corresponding Python package 'adelie'. These bindings offer a general purpose group elastic net solver, a wide range of matrix classes that can exploit special structure to allow large-scale inputs, and an assortment of generalized linear model classes for fitting various types of data. The package is an implementation of Yang, J. and Hastie, T. (2024) <doi:10.48550/arXiv.2405.08631>.
Maintained by Trevor Hastie. Last updated 28 days ago.
6 stars 5.86 score 3 scripts
vdblab
FLORAL:Fit Log-Ratio Lasso Regression for Compositional Data
Log-ratio Lasso regression for continuous, binary, and survival outcomes with (longitudinal) compositional features. See Fei and others (2024) <doi:10.1016/j.crmeth.2024.100899>.
Maintained by Teng Fei. Last updated 1 month ago.
12 stars 5.85 score 13 scripts