R-universe search: randomization

igraph

igraph:Network Analysis and Visualization

Routines for simple graphs and network analysis. It can handle large graphs very well and provides functions for generating random and regular graphs, graph visualization, centrality methods and much more.

Maintained by Kirill Müller. Last updated 1 day ago.

complex-networks graph-algorithms graph-theory mathematics network-analysis network-graph fortran libxml2 glpk openblas cpp

76.8 match 581 stars 21.10 score 31k scripts 1.9k dependents

spatstat

spatstat.random:Random Generation Functionality for the 'spatstat' Family

Functionality for random generation of spatial data in the 'spatstat' family of packages. Generates random spatial patterns of points according to many simple rules (complete spatial randomness, Poisson, binomial, random grid, systematic, cell), randomised alteration of patterns (thinning, random shift, jittering), simulated realisations of random point processes including simple sequential inhibition, Matern inhibition models, Neyman-Scott cluster processes (using direct, Brix-Kendall, or hybrid algorithms), log-Gaussian Cox processes, product shot noise cluster processes and Gibbs point processes (using Metropolis-Hastings birth-death-shift algorithm, alternating Gibbs sampler, or coupling-from-the-past perfect simulation). Also generates random spatial patterns of line segments, random tessellations, and random images (random noise, random mosaics). Excludes random generation on a linear network, which is covered by the separate package 'spatstat.linnet'.

Maintained by Adrian Baddeley. Last updated 6 months ago.

point-processes random-generation simulation spatial-sampling spatial-simulation cpp

92.2 match 5 stars 10.77 score 84 scripts 173 dependents

trinker

wakefield:Generate Random Data Sets

Generates random data sets including: data.frames, lists, and vectors.

Maintained by Tyler Rinker. Last updated 5 years ago.

data-generation wakefield

118.8 match 256 stars 7.13 score 209 scripts

eddelbuettel

random:True Random Numbers using RANDOM.ORG

The true random number service provided by the RANDOM.ORG website created by Mads Haahr samples atmospheric noise via radio tuned to an unused broadcasting frequency together with a skew correction algorithm due to John von Neumann. More background is available in the included vignette based on an essay by Mads Haahr. In its current form, the package offers functions to retrieve random integers, randomized sequences and random strings.

Maintained by Dirk Eddelbuettel. Last updated 28 days ago.

random-number-generators

82.4 match 9 stars 9.47 score 1.4k scripts 2 dependents
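
The skew correction attributed to John von Neumann in the 'random' entry above can be illustrated with a short Python sketch. This is not the RANDOM.ORG implementation or code from the package; the function name is hypothetical, and the input is assumed to be a list of independent 0/1 values with a possibly unequal but constant probability of 1.

```python
def von_neumann_debias(bits):
    """Return unbiased bits from independent, possibly biased input bits.

    Bits are read in non-overlapping pairs: the pair (0, 1) emits 0,
    the pair (1, 0) emits 1, and equal pairs are discarded.
    A trailing unpaired bit is ignored.
    """
    out = []
    for i in range(0, len(bits) - 1, 2):
        first, second = bits[i], bits[i + 1]
        if first != second:
            # Both unequal pairs occur with probability p * (1 - p),
            # so the emitted bit is 0 or 1 with equal probability.
            out.append(first)
    return out
```

The price of removing the bias is throughput: every equal pair is thrown away, so the output is always shorter than half the input.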

alexpghayes

distributions3:Probability Distributions as S3 Objects

Tools to create and manipulate probability distributions using S3. Generics pdf(), cdf(), quantile(), and random() provide replacements for base R's d/p/q/r style functions. Functions and arguments have been named carefully to minimize confusion for students in intro stats courses. The documentation for each distribution contains detailed mathematical notes.

Maintained by Alex Hayes. Last updated 6 months ago.

66.1 match 101 stars 11.31 score 118 scripts 7 dependents

stan-dev

posterior:Tools for Working with Posterior Distributions

Provides useful tools for both users and developers of packages for fitting Bayesian models or working with output from Bayesian models. The primary goals of the package are to: (a) Efficiently convert between many different useful formats of draws (samples) from posterior or prior distributions. (b) Provide consistent methods for operations commonly performed on draws, for example, subsetting, binding, or mutating draws. (c) Provide various summaries of draws in convenient formats. (d) Provide lightweight implementations of state of the art posterior inference diagnostics. References: Vehtari et al. (2021) <doi:10.1214/20-BA1221>.

Maintained by Paul-Christian Bürkner. Last updated 9 days ago.

bayes bayesian mcmc

45.3 match 168 stars 16.13 score 3.3k scripts 342 dependents

daqana

dqrng:Fast Pseudo Random Number Generators

Several fast random number generators are provided as C++ header only libraries: The PCG family by O'Neill (2014 <https://www.cs.hmc.edu/tr/hmc-cs-2014-0905.pdf>) as well as the Xoroshiro / Xoshiro family by Blackman and Vigna (2021 <doi:10.1145/3460772>). In addition fast functions for generating random numbers according to a uniform, normal and exponential distribution are included. The latter two use the Ziggurat algorithm originally proposed by Marsaglia and Tsang (2000, <doi:10.18637/jss.v005.i08>). The fast sampling methods support unweighted sampling both with and without replacement. These functions are exported to R and as a C++ interface and are enabled for use with the default 64 bit generator from the PCG family, Xoroshiro128+/++/** and Xoshiro256+/++/** as well as the 64 bit version of the 20 rounds Threefry engine (Salmon et al., 2011, <doi:10.1145/2063384.2063405>) as provided by the package 'sitmo'.

Maintained by Ralf Stubner. Last updated 6 months ago.

random random-distributions random-generation random-sampling rng cpp

51.6 match 42 stars 13.12 score 188 scripts 183 dependents

declaredesign

randomizr:Easy-to-Use Tools for Common Forms of Random Assignment and Sampling

Generates random assignments for common experimental designs and random samples for common sampling designs.

Maintained by Alexander Coppock. Last updated 1 month ago.

63.1 match 37 stars 9.90 score 396 scripts 13 dependents

rstudio

keras3:R Interface to 'Keras'

Interface to 'Keras' <https://keras.io>, a high-level neural networks API. 'Keras' was developed with a focus on enabling fast experimentation, supports both convolution based networks and recurrent networks (as well as combinations of the two), and runs seamlessly on both CPU and GPU devices.

Maintained by Tomasz Kalinowski. Last updated 3 days ago.

40.8 match 845 stars 13.57 score 264 scripts 2 dependents

sparklyr

sparklyr:R Interface to Apache Spark

R interface to Apache Spark, a fast and general engine for big data processing, see <https://spark.apache.org/>. This package supports connecting to local and remote Apache Spark clusters, provides a 'dplyr' compatible back-end, and provides an interface to Spark's built-in machine learning algorithms.

Maintained by Edgar Ruiz. Last updated 8 days ago.

apache-spark distributed dplyr ide livy machine-learning remote-clusters spark sparklyr

34.2 match 959 stars 15.16 score 4.0k scripts 21 dependents

spsanderson

TidyDensity:Functions for Tidy Analysis and Generation of Random Data

Makes it easy to generate random numbers based upon the underlying stats distribution functions. All data is returned in a tidy and structured format, making working with the data simple and straightforward. Because the data is returned in a tidy 'tibble', it lends itself to working with the rest of the 'tidyverse'.

Maintained by Steven Sanderson. Last updated 5 months ago.

bootstrap density distributions ggplot2 probability r-language simulation statistics tibble tidy

66.2 match 34 stars 7.78 score 66 scripts 1 dependent

kkholst

mets:Analysis of Multivariate Event Times

Implementation of various statistical models for multivariate event history data <doi:10.1007/s10985-013-9244-x>. Including multivariate cumulative incidence models <doi:10.1002/sim.6016>, and bivariate random effects probit models (Liability models) <doi:10.1016/j.csda.2015.01.014>. Modern methods for survival analysis, including regression modelling (Cox, Fine-Gray, Ghosh-Lin, Binomial regression) with fast computation of influence functions.

Maintained by Klaus K. Holst. Last updated 1 day ago.

multivariate-time-to-event survival-analysis time-to-event fortran openblas cpp

36.7 match 14 stars 13.47 score 236 scripts 42 dependents

unuran

Runuran:R Interface to the 'UNU.RAN' Random Variate Generators

Interface to the 'UNU.RAN' library for Universal Non-Uniform RANdom variate generators. It allows building non-uniform random number generators from quite arbitrary distributions. In particular, it provides an algorithm for fast numerical inversion for distributions with a given density function. In addition, the package contains densities, distribution functions and quantiles for a number of distributions.

Maintained by Josef Leydold. Last updated 5 months ago.

60.8 match 6.87 score 180 scripts 8 dependents

datastorm-open

rAmCharts:JavaScript Charts Tool

Provides an R interface for using the 'AmCharts' library. Based on 'htmlwidgets', it provides a global architecture to generate 'JavaScript' source code for charts. Most classes in the library have S4 class equivalents in R; for those classes, not all properties have been referenced, but they can easily be added in the constructors. Complex properties (e.g. a 'JavaScript' object) can be passed as a named list. See examples at <https://datastorm-open.github.io/introduction_ramcharts/> and <https://www.amcharts.com/> for more information about the library. The package includes the free version of the 'AmCharts' library. Its only limitation is a small link to the web site displayed on your charts. If you enjoy this library, do not hesitate to refer to this page <https://www.amcharts.com/online-store/> to purchase a licence, and thus support its creators and get a period of Priority Support. See also <https://www.amcharts.com/about/> for more information about the 'AmCharts' company.

Maintained by Benoit Thieurmel. Last updated 2 months ago.

53.9 match 49 stars 7.17 score 153 scripts 4 dependents

easystats

insight:Easy Access to Model Information for Various Model Objects

A tool to provide easy, intuitive and consistent access to information contained in various R models, like model formulas, model terms, information about random effects, data that was used to fit the model or data from response variables. 'insight' mainly revolves around two types of functions: Functions that find (the names of) information, starting with 'find_', and functions that get the underlying data, starting with 'get_'. The package has a consistent syntax and works with many different model objects, where otherwise functions to access this information are missing.

Maintained by Daniel Lüdecke. Last updated 4 days ago.

easystats hacktoberfest insight models names predictors random

22.1 match 412 stars 17.24 score 568 scripts 210 dependents

insightsengineering

random.cdisc.data:Create Random ADaM Datasets

A set of functions to create random Analysis Data Model (ADaM) datasets and cached datasets. ADaM dataset specifications are described by the Clinical Data Interchange Standards Consortium (CDISC) Analysis Data Model Team.

Maintained by Joe Zhu. Last updated 5 months ago.

cdisc dataset

43.7 match 33 stars 8.60 score 52 scripts

cran

randomizeR:Randomization for Clinical Trials

This tool enables the user to choose a randomization procedure based on sound scientific criteria. It comprises the generation of randomization sequences as well as the assessment of randomization procedures based on carefully selected criteria. Furthermore, 'randomizeR' provides a function for the comparison of randomization procedures.

Maintained by Ralf-Dieter Hilgers. Last updated 1 year ago.

110.2 match 2 stars 3.38 score 1 dependent

openintrostat

openintro:Datasets and Supplemental Functions from 'OpenIntro' Textbooks and Labs

Supplemental functions and data for 'OpenIntro' resources, which include open-source textbooks and resources for introductory statistics (<https://www.openintro.org/>). The package contains datasets used in our open-source textbooks along with custom plotting functions for reproducing book figures. Note that many functions and examples include color transparency; some plotting elements may not show up properly (or at all) when run in some versions of the Windows operating system.

Maintained by Mine Çetinkaya-Rundel. Last updated 2 months ago.

data openintro

28.1 match 240 stars 11.39 score 6.0k scripts

jknowles

merTools:Tools for Analyzing Mixed Effect Regression Models

Provides methods for extracting results from mixed-effect model objects fit with the 'lme4' package. Allows construction of prediction intervals efficiently from large scale linear and generalized linear mixed-effects models. This method draws from the simulation framework used in the Gelman and Hill (2007) textbook: Data Analysis Using Regression and Multilevel/Hierarchical Models.

Maintained by Jared E. Knowles. Last updated 1 year ago.

29.1 match 105 stars 10.49 score 768 scripts

statnet

ergm:Fit, Simulate and Diagnose Exponential-Family Models for Networks

An integrated set of tools to analyze and simulate networks based on exponential-family random graph models (ERGMs). 'ergm' is a part of the Statnet suite of packages for network analysis. See Hunter, Handcock, Butts, Goodreau, and Morris (2008) <doi:10.18637/jss.v024.i03> and Krivitsky, Hunter, Morris, and Klumb (2023) <doi:10.18637/jss.v105.i06>.

Maintained by Pavel N. Krivitsky. Last updated 5 days ago.

18.8 match 100 stars 15.36 score 1.4k scripts 36 dependents

covaruber

sommer:Solving Mixed Model Equations in R

Structural multivariate-univariate linear mixed model solver for estimation of multiple random effects with unknown variance-covariance structures (e.g., heterogeneous and unstructured) and known covariance among levels of random effects (e.g., pedigree and genomic relationship matrices) (Covarrubias-Pazaran, 2016 <doi:10.1371/journal.pone.0156744>; Maier et al., 2015 <doi:10.1016/j.ajhg.2014.12.006>; Jensen et al., 1997). REML estimates can be obtained using the Direct-Inversion Newton-Raphson and Direct-Inversion Average Information algorithms for the problems r x r (r being the number of records) or using the Henderson-based average information algorithm for the problem c x c (c being the number of coefficients to estimate). Spatial models can also be fitted using the two-dimensional spline functionality available.

Maintained by Giovanny Covarrubias-Pazaran. Last updated 20 days ago.

average-information mixed-models rcpparmadillo openblas cpp openmp

22.7 match 43 stars 12.70 score 300 scripts 9 dependents

mlr-org

mlr3extralearners:Extra Learners For mlr3

Extra learners for use in mlr3.

Maintained by Sebastian Fischer. Last updated 4 months ago.

machine-learning mlr3

29.0 match 94 stars 9.16 score 474 scripts

stochastictree

stochtree:Stochastic Tree Ensembles (XBART and BART) for Supervised Learning and Causal Inference

Flexible stochastic tree ensemble software. Robust implementations of Bayesian Additive Regression Trees (BART) Chipman, George, McCulloch (2010) <doi:10.1214/09-AOAS285> for supervised learning and Bayesian Causal Forests (BCF) Hahn, Murray, Carvalho (2020) <doi:10.1214/19-BA1195> for causal inference. Enables model serialization and parallel sampling and provides a low-level interface for custom stochastic forest samplers.

Maintained by Drew Herren. Last updated 16 days ago.

bart bayesian-machine-learning bayesian-methods decision-trees gradient-boosted-trees machine-learning probabilistic-models tree-ensembles cpp

30.6 match 20 stars 8.52 score 40 scripts

t-kalinowski

keras:R Interface to 'Keras'

Interface to 'Keras' <https://keras.io>, a high-level neural networks 'API'. 'Keras' was developed with a focus on enabling fast experimentation, supports both convolution based networks and recurrent networks (as well as combinations of the two), and runs seamlessly on both 'CPU' and 'GPU' devices.

Maintained by Tomasz Kalinowski. Last updated 11 months ago.

23.1 match 10.82 score 10k scripts 54 dependents

erichson

rsvd:Randomized Singular Value Decomposition

Low-rank matrix decompositions are fundamental tools and widely used for data analysis, dimension reduction, and data compression. Classically, highly accurate deterministic matrix algorithms are used for this task. However, the emergence of large-scale data has severely challenged our computational ability to analyze big data. The concept of randomness has been demonstrated as an effective strategy to quickly produce approximate answers to familiar problems such as the singular value decomposition (SVD). The rsvd package provides several randomized matrix algorithms such as the randomized singular value decomposition (rsvd), randomized principal component analysis (rpca), randomized robust principal component analysis (rrpca), randomized interpolative decomposition (rid), and the randomized CUR decomposition (rcur). In addition several plot functions are provided.

Maintained by N. Benjamin Erichson. Last updated 4 years ago.

dimension-reduction matrix-approximation pca principal-component-analysis probabilistic-algorithms randomized-algorithm singular-value-decomposition svd

22.7 match 98 stars 10.80 score 408 scripts 119 dependents

modeloriented

randomForestExplainer:Explaining and Visualizing Random Forests in Terms of Variable Importance

A set of tools to help explain which variables are most important in a random forest. Various variable importance measures are calculated and visualized in different settings in order to get an idea of how their importance changes depending on our criteria (Hemant Ishwaran and Udaya B. Kogalur and Eiran Z. Gorodeski and Andy J. Minn and Michael S. Lauer (2010) <doi:10.1198/jasa.2009.tm08622>, Leo Breiman (2001) <doi:10.1023/A:1010933404324>).

Maintained by Yue Jiang. Last updated 12 months ago.

random-forest

23.1 match 231 stars 9.82 score 236 scripts

andrisignorell

DescTools:Tools for Descriptive Statistics

A collection of miscellaneous basic statistic functions and convenience wrappers for efficiently describing data. The author's intention was to create a toolbox that facilitates the (notoriously time-consuming) first descriptive tasks in data analysis: calculating descriptive statistics, drawing graphical summaries and reporting the results. The package also contains functions to produce documents using MS Word (or PowerPoint) and functions to import data from Excel. Many of the included functions can be found scattered in other packages and other sources written partly by Titans of R. The reason for collecting them here was primarily to have them consolidated in ONE instead of dozens of packages (which themselves might depend on other packages which are not needed at all), and to provide a common and consistent interface as far as function and argument naming, NA handling, recycling rules etc. are concerned. Google style guides were used as naming rules (in absence of convincing alternatives). The 'BigCamelCase' style was consistently applied to functions borrowed from contributed R packages as well.

Maintained by Andri Signorell. Last updated 9 days ago.

fortran cpp

13.4 match 87 stars 16.68 score 7.7k scripts 99 dependents

rfastofficial

Rfast:A Collection of Efficient and Extremely Fast R Functions

A collection of fast (utility) functions for data analysis. Column and row wise means, medians, variances, minimums, maximums, many t, F and G-square tests, many regressions (normal, logistic, Poisson), are some of the many fast functions. References: a) Tsagris M., Papadakis M. (2018). Taking R to its limits: 70+ tips. PeerJ Preprints 6:e26605v1 <doi:10.7287/peerj.preprints.26605v1>. b) Tsagris M. and Papadakis M. (2018). Forward regression in R: from the extreme slow to the extreme fast. Journal of Data Science, 16(4): 771--780. <doi:10.6339/JDS.201810_16(4).00006>. c) Chatzipantsiou C., Dimitriadis M., Papadakis M. and Tsagris M. (2020). Extremely Efficient Permutation and Bootstrap Hypothesis Tests Using R. Journal of Modern Applied Statistical Methods, 18(2), eP2898. <doi:10.48550/arXiv.1806.10947>. d) Tsagris M., Papadakis M., Alenazi A. and Alzeley O. (2024). Computationally Efficient Outlier Detection for High-Dimensional Data Using the MDP Algorithm. Computation, 12(9): 185. <doi:10.3390/computation12090185>. e) Tsagris M. and Papadakis M. (2025). Fast and light-weight energy statistics using the R package Rfast. <doi:10.48550/arXiv.2501.02849>.

Maintained by Manos Papadakis. Last updated 16 days ago.

openblas cpp openmp

17.5 match 147 stars 12.54 score 1.2k scripts 166 dependents

bioc

DelayedRandomArray:Delayed Arrays of Random Values

Implements a DelayedArray of random values where the realization of the sampled values is delayed until they are needed. Reproducible sampling within any subarray is achieved by chunking where each chunk is initialized with a different random seed and stream. The usual distributions in the stats package are supported, along with scalar, vector and arrays for the parameters.

Maintained by Aaron Lun. Last updated 2 months ago.

datarepresentation cpp

41.7 match 5.26 score 6 scripts 1 dependent
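
The chunking idea in the DelayedRandomArray entry above (each chunk seeded separately so that any subarray can be realized reproducibly and on demand) can be sketched in Python. This is an illustration only, not the package's implementation: the function names, the chunk length and the seed derivation are all hypothetical, the array is one-dimensional, and only per-chunk seeds are used rather than separate streams.

```python
import random

CHUNK = 4  # hypothetical chunk length


def chunk_values(base_seed, chunk_index, chunk=CHUNK):
    """Generate every value of one chunk from a seed derived from its index."""
    rng = random.Random(base_seed * 1_000_003 + chunk_index)
    return [rng.random() for _ in range(chunk)]


def lazy_slice(base_seed, start, stop, chunk=CHUNK):
    """Realize elements start..stop-1 of a virtual 1-D random array."""
    out = []
    # Visit only the chunks that overlap the requested range.
    for c in range(start // chunk, (stop - 1) // chunk + 1):
        values = chunk_values(base_seed, c, chunk)
        lo = max(start, c * chunk) - c * chunk
        hi = min(stop, (c + 1) * chunk) - c * chunk
        out.extend(values[lo:hi])
    return out
```

Because a chunk's values depend only on the base seed and the chunk index, `lazy_slice(42, 3, 9)` returns the same numbers as elements 3 to 8 of `lazy_slice(42, 0, 12)`, without the full array ever being stored.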

r-forge

copula:Multivariate Dependence with Copulas

Classes (S4) of commonly used elliptical, Archimedean, extreme-value and other copula families, as well as their rotations, mixtures and asymmetrizations. Nested Archimedean copulas, related tools and special functions. Methods for density, distribution, random number generation, bivariate dependence measures, Rosenblatt transform, Kendall distribution function, perspective and contour plots. Fitting of copula models with potentially partly fixed parameters, including standard errors. Serial independence tests, copula specification tests (independence, exchangeability, radial symmetry, extreme-value dependence, goodness-of-fit) and model selection based on cross-validation. Empirical copula, smoothed versions, and non-parametric estimators of the Pickands dependence function.

Maintained by Martin Maechler. Last updated 10 days ago.

18.0 match 11.83 score 1.2k scripts 86 dependents

poissonconsulting

extras:Helper Functions for Bayesian Analyses

Functions to 'numericise' 'R' objects (coerce to numeric objects), summarise 'MCMC' (Markov Chain Monte Carlo) samples and calculate deviance residuals as well as 'R' translations of some 'BUGS' (Bayesian inference Using Gibbs Sampling), 'JAGS' (Just Another Gibbs Sampler), 'STAN' and 'TMB' (Template Model Builder) functions.

Maintained by Nicole Hill. Last updated 2 months ago.

24.8 match 9 stars 8.49 score 15 scripts 16 dependents

rolkra

explore:Simplifies Exploratory Data Analysis

Interactive data exploration with one line of code, automated reporting, or an easy-to-remember set of tidy functions for low-code exploratory data analysis.

Maintained by Roland Krasser. Last updated 3 months ago.

data-exploration data-visualisation decision-trees eda rmarkdown shiny tidy

18.3 match 228 stars 11.43 score 221 scripts 1 dependent

mlverse

torchvision:Models, Datasets and Transformations for Images

Provides access to datasets, models and preprocessing facilities for deep learning with images. Integrates seamlessly with the 'torch' package, and its 'API' borrows heavily from the 'PyTorch' vision package.

Maintained by Daniel Falbel. Last updated 6 months ago.

21.0 match 65 stars 9.75 score 313 scripts 6 dependents

sollano

forestmangr:Forest Mensuration and Management

Processing forest inventory data with methods such as simple random sampling, stratified random sampling and systematic sampling. There are also functions for yield and growth predictions and model fitting, linear and nonlinear grouped data fitting, and statistical tests. References: Kershaw Jr., Ducey, Beers and Husch (2016). <doi:10.1002/9781118902028>.

Maintained by Sollano Rabelo Braga. Last updated 3 months ago.

25.4 match 17 stars 7.97 score 378 scripts

centerforassessment

randomNames:Generate Random Given and Surnames

Function for generating random gender- and ethnicity-correct first and/or last names. Names are chosen proportionally based upon their probability of appearing in a large-scale database of real names.

Maintained by Damian W. Betebenner. Last updated 3 months ago.

random-name-generators random-names

21.7 match 32 stars 9.24 score 297 scripts 5 dependents

munterfi

eRTG3D:Empirically Informed Random Trajectory Generation in 3-D

Creates realistic random trajectories in a 3-D space between two given fixed points, so-called conditional empirical random walks (CERWs). The trajectory generation is based on empirical distribution functions extracted from observed trajectories (training data) and thus reflects the geometrical movement characteristics of the mover. A digital elevation model (DEM), representing the Earth's surface, and a background layer of probabilities (e.g. food sources, uplift potential, waterbodies, etc.) can be used to influence the trajectories. Unterfinger M (2018). "3-D Trajectory Simulation in Movement Ecology: Conditional Empirical Random Walk". Master's thesis, University of Zurich. <https://www.geo.uzh.ch/dam/jcr:6194e41e-055c-4635-9807-53c5a54a3be7/MasterThesis_Unterfinger_2018.pdf>. Technitis G, Weibel R, Kranstauber B, Safi K (2016). "An algorithm for empirically informed random trajectory generation between two endpoints". GIScience 2016: Ninth International Conference on Geographic Information Science, 9, online. <doi:10.5167/uzh-130652>.

Maintained by Merlin Unterfinger. Last updated 3 years ago.

3d birds conditional-empirical-random-walk gliding-and-soaring machine-learning movement-ecology random-trajectory-generator random-walk simulation trajectory-generation

34.0 match 6 stars 5.71 score 19 scripts

glmmtmb

glmmTMB:Generalized Linear Mixed Models using Template Model Builder

Fit linear and generalized linear mixed models with various extensions, including zero-inflation. The models are fitted using maximum likelihood estimation via 'TMB' (Template Model Builder). Random effects are assumed to be Gaussian on the scale of the linear predictor and are integrated out using the Laplace approximation. Gradients are calculated using automatic differentiation.

Maintained by Mollie Brooks. Last updated 10 days ago.

cpp openmp

11.5 match 312 stars 16.77 score 3.7k scripts 24 dependents

braverock

PortfolioAnalytics:Portfolio Analysis, Including Numerical Methods for Optimization of Portfolios

Portfolio optimization and analysis routines and graphics.

Maintained by Brian G. Peterson. Last updated 3 months ago.

16.7 match 81 stars 11.49 score 626 scripts 2 dependents

coatless-rpkg

sitmo:Parallel Pseudo Random Number Generator (PPRNG) 'sitmo' Header Files

Provided within are two high quality and fast PPRNGs that may be used in an 'OpenMP' parallel environment. In addition, there is a generator for a one-dimensional low-discrepancy sequence. The objective of this library is to consolidate the distribution of the 'sitmo' (C++98 & C++11), 'threefry' and 'vandercorput' (C++11-only) engines on CRAN by enabling others to link to the header files inside of 'sitmo' instead of including a copy of each engine within their individual package. Lastly, the package contains example implementations using the 'sitmo' package and three accompanying vignettes that provide additional information.

Maintained by James Balamuta. Last updated 1 year ago.

parallel random-generation rcpp cpp openmp

19.1 match 7 stars 9.75 score 15 scripts 201 dependents

scumdogsteev

mlsjunkgen:Use the MLS Junk Generator Algorithm to Generate a Stream of Pseudo-Random Numbers

Generates a stream of pseudo-random numbers using the MLS Junk Generator algorithm. Functions exist to generate single pseudo-random numbers as well as a vector, data frame, or matrix of pseudo-random numbers.

Maintained by Steve Myles. Last updated 4 years ago.

mls-junk-generator mlsjunkgen random-generation random-number random-number-generator random-number-generators random-quote-machine rng rpackages

46.9 match 3.95 score 18 scripts

gadget-framework

gadget3:Globally-Applicable Area Disaggregated General Ecosystem Toolbox V3

A framework to assist creation of marine ecosystem models, generating either 'R' or 'C++' code which can then be optimised using the 'TMB' package and standard 'R' tools. Principally designed to reproduce gadget2 models in 'TMB', but can be extended beyond gadget2's capabilities. Kasper Kristensen, Anders Nielsen, Casper W. Berg, Hans Skaug, Bradley M. Bell (2016) <doi:10.18637/jss.v070.i05> "TMB: Automatic Differentiation and Laplace Approximation.". Begley, J., & Howell, D. (2004) <https://core.ac.uk/download/pdf/225936648.pdf> "An overview of Gadget, the globally applicable area-disaggregated general ecosystem toolbox. ICES.".

Maintained by Jamie Lentin. Last updated 29 days ago.

21.3 match 8 stars 8.69 score 170 scripts

lme4

lme4:Linear Mixed-Effects Models using 'Eigen' and S4

Fit linear and generalized linear mixed-effects models. The models and their components are represented using S4 classes and methods. The core computational algorithms are implemented using the 'Eigen' C++ library for numerical linear algebra and 'RcppEigen' "glue".

Maintained by Ben Bolker. Last updated 1 day ago.

cpp

8.9 match 647 stars 20.69 score 35k scripts 1.5k dependents

richardli

SUMMER:Small-Area-Estimation Unit/Area Models and Methods for Estimation in R

Provides methods for spatial and spatio-temporal smoothing of demographic and health indicators using survey data, with particular focus on estimating and projecting under-five mortality rates, described in Mercer et al. (2015) <doi:10.1214/15-AOAS872>, Li et al. (2019) <doi:10.1371/journal.pone.0210645>, Wu et al. (DHS Spatial Analysis Reports No. 21, 2021), and Li et al. (2023) <doi:10.48550/arXiv.2007.05117>.

Maintained by Zehang R Li. Last updated 2 months ago.

bayesian-inference small-area-estimation space-time

17.7 match 23 stars 10.28 score 134 scripts 2 dependents

insightsengineering

teal.data:Data Model for 'teal' Applications

Provides a 'teal_data' class as a unified data model for 'teal' applications focusing on reproducibility and relational data.

Maintained by Dawid Kaledkowski. Last updated 2 months ago.

data-model nest

18.3 match 11 stars 9.93 score 44 scripts 8 dependents

hemingnm

SESraster:Raster Randomization for Null Hypothesis Testing

Randomization of presence/absence species distribution raster data, with or without spatial structure, for calculating standardized effect sizes and testing null hypotheses. The randomization algorithms are based on classical algorithms for matrices (Gotelli 2000, <doi:10.2307/177478>) implemented for raster data.

Maintained by Neander Marcel Heming. Last updated 5 months ago.

null-models randomization raster spatial spatial-analysis species-distribution-modelling

27.3 match 7 stars 6.61 score 32 scripts 2 dependents

spsanderson

RandomWalker:Generate Random Walks Compatible with the 'tidyverse'

Generates random walks of various types by providing a set of functions that are compatible with the 'tidyverse'. The functions provided in the package make it simple to create random walks with a variety of properties, such as how many simulations to run, how many steps to take, and the distribution of the random walk itself.

Maintained by Steven Sanderson. Last updated 1 month ago.

random-walk random-walks rpackages

29.3 match 5 stars 6.05 score 5 scripts 1 dependent

alexkowa

EnvStats:Package for Environmental Statistics, Including US EPA Guidance

Graphical and statistical analyses of environmental data, with focus on analyzing chemical concentrations and physical parameters, usually in the context of mandated environmental monitoring. Major environmental statistical methods found in the literature and regulatory guidance documents, with extensive help that explains what these methods do, how to use them, and where to find them in the literature. Numerous built-in data sets from regulatory guidance documents and environmental statistics literature. Includes scripts reproducing analyses presented in the book "EnvStats: An R Package for Environmental Statistics" (Millard, 2013, Springer, ISBN 978-1-4614-8455-4, <doi:10.1007/978-1-4614-8456-1>).

Maintained by Alexander Kowarik. Last updated 15 days ago.

13.7 match 26 stars 12.80 score 2.4k scripts 46 dependents

didiermurillof

FielDHub:A Shiny App for Design of Experiments in Life Sciences

A shiny design of experiments (DOE) app that aids in the creation of traditional, un-replicated, augmented and partially-replicated designs applied to agriculture, plant breeding, forestry, animal and biological sciences.

Maintained by Didier Murillo. Last updated 8 months ago.

agricultural breeding design doe experimental plantbreeding shiny

19.1 match 48 stars 9.10 score 70 scripts 1 dependents

cran

RRTCS:Randomized Response Techniques for Complex Surveys

Point and interval estimation of linear parameters with data obtained from complex surveys (including stratified and clustered samples) when randomization techniques are used. The randomized response technique was developed to obtain estimates that are more valid when studying sensitive topics. Estimators and variances for 14 randomized response methods for qualitative variables and 7 randomized response methods for quantitative variables are also implemented. In addition, some data sets from surveys with these randomization methods are included in the package.

Maintained by Beatriz Cobo Rodríguez. Last updated 4 years ago.

87.1 match 2.00 score

tmlange

optRF:Optimising Random Forest Stability by Determining the Optimal Number of Trees

Calculating the stability of random forest with certain numbers of trees. The non-linear relationship between stability and numbers of trees is described using a logistic regression model and used to estimate the optimal number of trees.

Maintained by Thomas Martin Lange. Last updated 1 month ago.

36.3 match 4.78 score

alanarnholt

BSDA:Basic Statistics and Data Analysis

Data sets for book "Basic Statistics and Data Analysis" by Larry J. Kitchens.

Maintained by Alan T. Arnholt. Last updated 2 years ago.

18.8 match 7 stars 9.11 score 1.3k scripts 6 dependents

pachadotdev

cpp11armadillo:An 'Armadillo' Interface

Provides function declarations and inline function definitions that facilitate communication between R and the 'Armadillo' 'C++' library for linear algebra and scientific computing. This implementation is detailed in Vargas Sepulveda and Schneider Malamud (2024) <doi:10.48550/arXiv.2408.11074>.

Maintained by Mauricio Vargas Sepulveda. Last updated 24 days ago.

armadillo cpp cpp11 hacktoberfest linear-algebra

18.5 match 9 stars 9.14 score 1 scripts 16 dependents

cran

mgcv:Mixed GAM Computation Vehicle with Automatic Smoothness Estimation

Generalized additive (mixed) models, some of their extensions and other generalized ridge regression with multiple smoothing parameter estimation by (Restricted) Marginal Likelihood, Generalized Cross Validation and similar, or using iterated nested Laplace approximation for fully Bayesian inference. See Wood (2017) <doi:10.1201/9781315370279> for an overview. Includes a gam() function, a wide variety of smoothers, 'JAGS' support and distributions beyond the exponential family.

Maintained by Simon Wood. Last updated 1 year ago.

openblas openmp

13.0 match 32 stars 12.71 score 17k scripts 7.8k dependents

r-forge

mlogit:Multinomial Logit Models

Maximum Likelihood estimation of random utility discrete choice models, as described in Kenneth Train (2009) Discrete Choice Methods with Simulations <doi:10.1017/CBO9780511805271>.

Maintained by Yves Croissant. Last updated 5 years ago.

16.8 match 9.81 score 1.2k scripts 14 dependents

michaellli

evalITR:Evaluating Individualized Treatment Rules

Provides various statistical methods for evaluating Individualized Treatment Rules under randomized data. The provided metrics include Population Average Value (PAV), Population Average Prescription Effect (PAPE), Area Under Prescription Effect Curve (AUPEC). It also provides the tools to analyze Individualized Treatment Rules under budget constraints. Detailed reference in Imai and Li (2019) <arXiv:1905.05389>.

Maintained by Michael Lingzhi Li. Last updated 2 years ago.

24.0 match 14 stars 6.78 score 36 scripts

jinli22

spm:Spatial Predictive Modeling

Introduction to some novel accurate hybrid methods of geostatistical and machine learning methods for spatial predictive modelling. It contains two commonly used geostatistical methods, two machine learning methods, four hybrid methods and two averaging methods. For each method, two functions are provided. One function is for assessing the predictive errors and accuracy of the method based on cross-validation. The other one is for generating spatial predictions using the method. For details please see: Li, J., Potter, A., Huang, Z., Daniell, J. J. and Heap, A. (2010) <https://www.ga.gov.au/metadata-gateway/metadata/record/gcat_71407>; Li, J., Heap, A. D., Potter, A., Huang, Z. and Daniell, J. (2011) <doi:10.1016/j.csr.2011.05.015>; Li, J., Heap, A. D., Potter, A. and Daniell, J. (2011) <doi:10.1016/j.envsoft.2011.07.004>; Li, J., Potter, A., Huang, Z. and Heap, A. (2012) <https://www.ga.gov.au/metadata-gateway/metadata/record/74030>.

Maintained by Jin Li. Last updated 3 years ago.

29.7 match 3 stars 5.46 score 107 scripts 3 dependents

reside-ic

ids:Generate Random Identifiers

Generate random or human readable and pronounceable identifiers.

Maintained by Rich FitzJohn. Last updated 3 years ago.

12.0 match 94 stars 13.27 score 175 scripts 165 dependents

ycroissant

plm:Linear Models for Panel Data

A set of estimators for models and (robust) covariance matrices, and tests for panel data econometrics, including within/fixed effects, random effects, between, first-difference, nested random effects as well as instrumental-variable (IV) and Hausman-Taylor-style models, panel generalized method of moments (GMM) and general FGLS models, mean groups (MG), demeaned MG, and common correlated effects (CCEMG) and pooled (CCEP) estimators with common factors, variable coefficients and limited dependent variables models. Test functions include model specification, serial correlation, cross-sectional dependence, panel unit root and panel Granger (non-)causality. Typical references are general econometrics text books such as Baltagi (2021), Econometric Analysis of Panel Data (<doi:10.1007/978-3-030-53953-5>), Hsiao (2014), Analysis of Panel Data (<doi:10.1017/CBO9781139839327>), and Croissant and Millo (2018), Panel Data Econometrics with R (<doi:10.1002/9781119504641>).

Maintained by Kevin Tappe. Last updated 9 hours ago.

13.2 match 59 stars 12.07 score 39 dependents

revelle

psych:Procedures for Psychological, Psychometric, and Personality Research

A general purpose toolbox developed originally for personality, psychometric theory and experimental psychology. Functions are primarily for multivariate analysis and scale construction using factor analysis, principal component analysis, cluster analysis and reliability analysis, although others provide basic descriptive statistics. Item Response Theory is done using factor analysis of tetrachoric and polychoric correlations. Functions for analyzing data at multiple levels include within and between group statistics, including correlations and factor analysis. Validation and cross validation of scales developed using basic machine learning algorithms are provided, as are functions for simulating and testing particular item and test structures. Several functions serve as a useful front end for structural equation modeling. Graphical displays of path diagrams, including mediation models, factor analysis and structural equation models are created using basic graphics. Some of the functions are written to support a book on psychometric theory as well as publications in personality research. For more information, see the <https://personality-project.org/r/> web page.

Maintained by William Revelle. Last updated 3 months ago.

11.4 match 52 stars 13.94 score 29k scripts 317 dependents

imbs-hl

ranger:A Fast Implementation of Random Forests

A fast implementation of Random Forests, particularly suited for high dimensional data. Ensembles of classification, regression, survival and probability prediction trees are supported. Data from genome-wide association studies can be analyzed efficiently. In addition to data frames, datasets of class 'gwaa.data' (R package 'GenABEL') and 'dgCMatrix' (R package 'Matrix') can be directly analyzed.

Maintained by Marvin N. Wright. Last updated 4 months ago.

cpp

9.5 match 783 stars 16.22 score 9.2k scripts 189 dependents

usepa

spmodel:Spatial Statistical Modeling and Prediction

Fit, summarize, and predict for a variety of spatial statistical models applied to point-referenced and areal (lattice) data. Parameters are estimated using various methods. Additional modeling features include anisotropy, non-spatial random effects, partition factors, big data approaches, and more. Model-fit statistics are used to summarize, visualize, and compare models. Predictions at unobserved locations are readily obtainable. For additional details, see Dumelle et al. (2023) <doi:10.1371/journal.pone.0282524>.

Maintained by Michael Dumelle. Last updated 2 days ago.

19.9 match 15 stars 7.66 score 112 scripts 3 dependents

cran

nlme:Linear and Nonlinear Mixed Effects Models

Fit and compare Gaussian linear and nonlinear mixed-effects models.

Maintained by R Core Team. Last updated 2 months ago.

fortran

11.7 match 6 stars 13.00 score 13k scripts 8.7k dependents

bedapub

designit:Blocking and Randomization for Experimental Design

Intelligently assign samples to batches in order to reduce batch effects. Batch effects can have a significant impact on data analysis, especially when the assignment of samples to batches coincides with the contrast groups being studied. By defining a batch container and a scoring function that reflects the contrasts, this package allows users to assign samples in a way that minimizes the potential impact of batch effects on the comparison of interest. Among other functionality, we provide an implementation for OSAT score by Yan et al. (2012, <doi:10.1186/1471-2164-13-689>).

Maintained by Iakov I. Davydov. Last updated 4 months ago.

design-of-experiments randomization

20.5 match 8 stars 7.28 score 24 scripts

mrc-ide

dust:Iterate Multiple Realisations of Stochastic Models

An Engine for simulation of stochastic models. Includes support for running stochastic models in parallel, either with shared or varying parameters. Simulations are run efficiently in compiled code and can be run with a fraction of simulated states returned to R, allowing control over memory usage. Support is provided for building bootstrap particle filter for performing Sequential Monte Carlo (e.g., Gordon et al. 1993 <doi:10.1049/ip-f-2.1993.0015>). The core of the simulation engine is the 'xoshiro256**' algorithm (Blackman and Vigna <arXiv:1805.01407>), and the package is further described in FitzJohn et al 2021 <doi:10.12688/wellcomeopenres.16466.2>.

Maintained by Rich FitzJohn. Last updated 5 months ago.

cpp openmp

18.8 match 18 stars 7.84 score 60 scripts 3 dependents

neurodata

lolR:Linear Optimal Low-Rank Projection

Supervised learning techniques designed for the situation when the dimensionality exceeds the sample size have a tendency to overfit as the dimensionality of the data increases. To remedy this high-dimensionality, low-sample-size (HDLSS) situation, we attempt to learn a lower-dimensional representation of the data before learning a classifier. That is, we project the data to a situation where the dimensionality is more manageable, and then are able to better apply standard classification or clustering techniques since we will have fewer dimensions to overfit. A number of previous works have focused on how to strategically reduce dimensionality in the unsupervised case, yet in the supervised HDLSS regime, few works have attempted to devise dimensionality reduction techniques that leverage the labels associated with the data. In this package and the associated manuscript Vogelstein et al. (2017) <arXiv:1709.01233>, we provide several methods for feature extraction, some utilizing labels and some not, along with easily extensible utilities to simplify cross-validative efforts to identify the best feature extraction method. Additionally, we include a series of adaptable benchmark simulations to serve as a standard for future investigative efforts into supervised HDLSS. Finally, we produce a comprehensive comparison of the included algorithms across a range of benchmark simulations and real data applications.

Maintained by Eric Bridgeford. Last updated 4 years ago.

20.0 match 20 stars 7.28 score 80 scripts

liamrevell

phytools:Phylogenetic Tools for Comparative Biology (and Other Things)

A wide range of methods for phylogenetic analysis - concentrated in phylogenetic comparative biology, but also including numerous techniques for visualizing, analyzing, manipulating, reading or writing, and even inferring phylogenetic trees. Included among the functions in phylogenetic comparative biology are various for ancestral state reconstruction, model-fitting, and simulation of phylogenies and trait data. A broad range of plotting methods for phylogenies and comparative data include (but are not restricted to) methods for mapping trait evolution on trees, for projecting trees into phenotype space or onto a geographic map, and for visualizing correlated speciation between trees. Lastly, numerous functions are designed for reading, writing, analyzing, inferring, simulating, and manipulating phylogenetic trees and comparative data. For instance, there are functions for computing consensus phylogenies from a set, for simulating phylogenetic trees and data under a range of models, for randomly or non-randomly attaching species or clades to a tree, as well as for a wide range of other manipulations and analyses that phylogenetic biologists might find useful in their research.

Maintained by Liam J. Revell. Last updated 26 days ago.

10.3 match 218 stars 13.85 score 4.8k scripts 76 dependents

simsem

semTools:Useful Tools for Structural Equation Modeling

Provides miscellaneous tools for structural equation modeling, many of which extend the 'lavaan' package. For example, latent interactions can be estimated using product indicators (Lin et al., 2010, <doi:10.1080/10705511.2010.488999>) and simple effects probed; analytical power analyses can be conducted (Jak et al., 2021, <doi:10.3758/s13428-020-01479-0>); and scale reliability can be estimated based on estimated factor-model parameters.

Maintained by Terrence D. Jorgensen. Last updated 2 days ago.

10.3 match 79 stars 13.74 score 1.1k scripts 31 dependents

briencj

dae:Functions Useful in the Design and ANOVA of Experiments

The content falls into the following groupings: (i) Data, (ii) Factor manipulation functions, (iii) Design functions, (iv) ANOVA functions, (v) Matrix functions, (vi) Projector and canonical efficiency functions, and (vii) Miscellaneous functions. A vignette called 'DesignNotes' describes how to use the design functions for randomizing and assessing designs. The ANOVA functions facilitate the extraction of information when the 'Error' function has been used in the call to 'aov'. The package 'dae' can also be installed from <http://chris.brien.name/rpackages/>.

Maintained by Chris Brien. Last updated 3 months ago.

16.4 match 1 stars 8.62 score 356 scripts 7 dependents

ropensci

aorsf:Accelerated Oblique Random Forests

Fit, interpret, and compute predictions with oblique random forests. Includes support for partial dependence, variable importance, passing customized functions for variable importance and identification of linear combinations of features. Methods for the oblique random survival forest are described in Jaeger et al. (2023) <DOI:10.1080/10618600.2023.2231048>.

Maintained by Byron Jaeger. Last updated 2 days ago.

data-science oblique random-forest survival openblas cpp openmp

15.3 match 58 stars 9.21 score 60 scripts 1 dependents

daijiang

phyr:Model Based Phylogenetic Analysis

A collection of functions to do model-based phylogenetic analysis. It includes functions to calculate community phylogenetic diversity, to estimate correlations among functional traits while accounting for phylogenetic relationships, and to fit phylogenetic generalized linear mixed models. The Bayesian phylogenetic generalized linear mixed models are fitted with the 'INLA' package (<https://www.r-inla.org>).

Maintained by Daijiang Li. Last updated 1 year ago.

bayesian glmm inla phylogeny species-distribution-modeling openblas cpp

16.1 match 31 stars 8.67 score 107 scripts 2 dependents

gaynorr

AlphaSimR:Breeding Program Simulations

The successor to the 'AlphaSim' software for breeding program simulation [Faux et al. (2016) <doi:10.3835/plantgenome2016.02.0013>]. Used for stochastic simulations of breeding programs to the level of DNA sequence for every individual. Contained is a wide range of functions for modeling common tasks in a breeding program, such as selection and crossing. These functions allow for constructing simulations of highly complex plant and animal breeding programs via scripting in the R software environment. Such simulations can be used to evaluate overall breeding program performance and conduct research into breeding program design, such as implementation of genomic selection. Included is the 'Markovian Coalescent Simulator' ('MaCS') for fast simulation of biallelic sequences according to a population demographic history [Chen et al. (2009) <doi:10.1101/gr.083634.108>].

Maintained by Chris Gaynor. Last updated 4 months ago.

breeding genomics simulation openblas cpp openmp

13.4 match 47 stars 10.22 score 534 scripts 2 dependents

blasbenito

spatialRF:Easy Spatial Modeling with Random Forest

Automatic generation and selection of spatial predictors for spatial regression with Random Forest. Spatial predictors are surrogates of variables driving the spatial structure of a response variable. The package offers two methods to generate spatial predictors from a distance matrix among training cases: 1) Moran's Eigenvector Maps (MEMs; Dray, Legendre, and Peres-Neto 2006 <DOI:10.1016/j.ecolmodel.2006.02.015>): computed as the eigenvectors of a weighted matrix of distances; 2) RFsp (Hengl et al. <DOI:10.7717/peerj.5518>): columns of the distance matrix used as spatial predictors. Spatial predictors help minimize the spatial autocorrelation of the model residuals and facilitate an honest assessment of the importance scores of the non-spatial predictors. Additionally, functions to reduce multicollinearity, identify relevant variable interactions, tune random forest hyperparameters, assess model transferability via spatial cross-validation, and explore model results via partial dependence curves and interaction surfaces are included in the package. The modelling functions are built around the highly efficient 'ranger' package (Wright and Ziegler 2017 <DOI:10.18637/jss.v077.i01>).

Maintained by Blas M. Benito. Last updated 3 years ago.

random-forest spatial-analysis spatial-regression

25.2 match 114 stars 5.45 score 49 scripts

rsbivand

splancs:Spatial and Space-Time Point Pattern Analysis

The Splancs package was written as an enhancement to S-Plus for display and analysis of spatial point pattern data; it has been ported to R and is in "maintenance mode".

Maintained by Roger Bivand. Last updated 10 months ago.

fortran

15.7 match 1 stars 8.72 score 592 scripts 53 dependents

kenaho1

asbio:A Collection of Statistical Tools for Biologists

Contains functions from: Aho, K. (2014) Foundational and Applied Statistics for Biologists using R. CRC/Taylor and Francis, Boca Raton, FL, ISBN: 978-1-4398-7338-0.

Maintained by Ken Aho. Last updated 2 months ago.

19.1 match 5 stars 7.09 score 310 scripts 3 dependents

emmanuelparadis

ape:Analyses of Phylogenetics and Evolution

Functions for reading, writing, plotting, and manipulating phylogenetic trees, analyses of comparative data in a phylogenetic framework, ancestral character analyses, analyses of diversification and macroevolution, computing distances from DNA sequences, reading and writing nucleotide sequences as well as importing from BioConductor, and several tools such as Mantel's test, generalized skyline plots, graphical exploration of phylogenetic data (alex, trex, kronoviz), estimation of absolute evolutionary rates and clock-like trees using mean path lengths and penalized likelihood, dating trees with non-contemporaneous sequences, translating DNA into AA sequences, and assessing sequence alignments. Phylogeny estimation can be done with the NJ, BIONJ, ME, MVR, SDM, and triangle methods, and several methods handling incomplete distance matrices (NJ*, BIONJ*, MVR*, and the corresponding triangle method). Some functions call external applications (PhyML, Clustal, T-Coffee, Muscle) whose results are returned into R.

Maintained by Emmanuel Paradis. Last updated 1 month ago.

openblas cpp

7.8 match 64 stars 17.18 score 13k scripts 601 dependents

kogalur

randomForestSRC:Fast Unified Random Forests for Survival, Regression, and Classification (RF-SRC)

Fast OpenMP parallel computing of Breiman's random forests for univariate, multivariate, unsupervised, survival, competing risks, class imbalanced classification and quantile regression. New Mahalanobis splitting for correlated outcomes. Extreme random forests and randomized splitting. Suite of imputation methods for missing data. Fast random forests using subsampling. Confidence regions and standard errors for variable importance. New improved holdout importance. Case-specific importance. Minimal depth variable importance. Visualize trees on your Safari or Google Chrome browser. Anonymous random forests for data privacy.

Maintained by Udaya B. Kogalur. Last updated 2 months ago.

openmp

16.6 match 10 stars 7.90 score 1.2k scripts 12 dependents

andyliaw-mrk

randomForest:Breiman and Cutler's Random Forests for Classification and Regression

Classification and regression based on a forest of trees using random inputs, based on Breiman (2001) <DOI:10.1023/A:1010933404324>.

Maintained by Andy Liaw. Last updated 6 months ago.

fortran

10.8 match 47 stars 12.11 score 35k scripts 282 dependents

tilltnet

egor:Import and Analyse Ego-Centered Network Data

Tools for importing, analyzing and visualizing ego-centered network data. Supports several data formats, including the export formats of 'EgoNet', 'EgoWeb 2.0' and 'openeddi'. An interactive (shiny) app for the intuitive visualization of ego-centered networks is provided. Also included are procedures for creating and visualizing Clustered Graphs (Lerner 2008 <DOI:10.1109/PACIFICVIS.2008.4475458>).

Maintained by Till Krenz. Last updated 11 days ago.

ego-centered egonet egor network-analysis sna

15.1 match 24 stars 8.64 score 76 scripts 2 dependents

jlmelville

rnndescent:Nearest Neighbor Descent Method for Approximate Nearest Neighbors

The Nearest Neighbor Descent method for finding approximate nearest neighbors by Dong and co-workers (2010) <doi:10.1145/1963405.1963487>. Based on the 'Python' package 'PyNNDescent' <https://github.com/lmcinnes/pynndescent>.

Maintained by James Melville. Last updated 8 months ago.

approximate-nearest-neighbor-search cpp

17.7 match 11 stars 7.31 score 75 scripts

ewenharrison

finalfit:Quickly Create Elegant Regression Results Tables and Plots when Modelling

Generate regression results tables and plots in final format for publication. Explore models and export directly to PDF and 'Word' using 'RMarkdown'.

Maintained by Ewen Harrison. Last updated 6 months ago.

11.3 match 270 stars 11.43 score 1.0k scripts

lebebr01

simglm:Simulate Models Based on the Generalized Linear Model

Simulates regression models, including both simple regression and generalized linear mixed models with up to three levels of nesting. Flexible power simulations, allowing the specification of missing data, unbalanced designs, and different random error distributions, are built into the package.

Maintained by Brandon LeBeau. Last updated 10 months ago.

power simulation

16.4 match 43 stars 7.87 score 87 scripts

egenn

rtemis:Machine Learning and Visualization

Advanced Machine Learning and Visualization. Unsupervised Learning (Clustering, Decomposition), Supervised Learning (Classification, Regression), Cross-Decomposition, Bagging, Boosting, Meta-models. Static and interactive graphics.

Maintained by E.D. Gennatas. Last updated 1 month ago.

data-science data-visualization machine-learning machine-learning-library visualization

18.2 match 145 stars 7.09 score 50 scripts 2 dependents

ropensci

spatsoc:Group Animal Relocation Data by Spatial and Temporal Relationship

Detects spatial and temporal groups in GPS relocations (Robitaille et al. (2019) <doi:10.1111/2041-210X.13215>). It can be used to convert GPS relocations to gambit-of-the-group format to build proximity-based social networks. In addition, the randomizations function provides data-stream randomization methods suitable for GPS data.

Maintained by Alec L. Robitaille. Last updated 1 month ago.

animal gps network social spatial

12.9 match 24 stars 9.97 score 145 scripts 3 dependents

cran

randomUniformForest:Random Uniform Forests for Classification, Regression and Unsupervised Learning

Ensemble model, for classification, regression and unsupervised learning, based on a forest of unpruned and randomized binary decision trees. Each tree is grown by sampling, with replacement, a set of variables at each node. Each cut-point is generated randomly, according to the continuous Uniform distribution. For each tree, data are either bootstrapped or subsampled. The unsupervised mode introduces clustering, dimension reduction and variable importance, using a three-layer engine. Random Uniform Forests are mainly aimed to lower correlation between trees (or trees residuals), to provide a deep analysis of variable importance and to allow native distributed and incremental learning.

Maintained by Saip Ciss. Last updated 3 years ago.

cpp

34.0 match 3 stars 3.77 score 99 scripts

r-forge

randtoolbox:Toolbox for Pseudo and Quasi Random Number Generation and Random Generator Tests

Provides (1) pseudo random generators - general linear congruential generators, multiple recursive generators and generalized feedback shift register (SF-Mersenne Twister algorithm (<doi:10.1007/978-3-540-74496-2_36>) and WELL (<doi:10.1145/1132973.1132974>) generators); (2) quasi random generators - the Torus algorithm, the Sobol sequence, the Halton sequence (including the Van der Corput sequence) and (3) some generator tests - the gap test, the serial test, the poker test, see, e.g., Gentle (2003) <doi:10.1007/b97336>. Take a look at the Distribution task view of types and tests of random number generators. The package can be provided without the 'rngWELL' dependency on demand. Package in Memoriam of Diethelm and Barbara Wuertz.

Maintained by Christophe Dutang. Last updated 3 months ago.

12.5 match 1 stars 10.23 score 578 scripts 80 dependents

thinkr-open

shinipsum:Lorem-Ipsum-like Helpers for fast Shiny Prototyping

Prototype your shiny apps quickly with these Lorem-Ipsum-like Helpers.

Maintained by Colin Fay. Last updated 1 year ago.

dygraph ggplot golemverse hacktoberfest lorem-ipsum

19.5 match 125 stars 6.45 score 50 scripts 1 dependents

mayer79

missRanger:Fast Imputation of Missing Values

Alternative implementation of the beautiful 'MissForest' algorithm used to impute mixed-type data sets by chaining random forests, introduced by Stekhoven, D.J. and Buehlmann, P. (2012) <doi:10.1093/bioinformatics/btr597>. Under the hood, it uses the lightning fast random forest package 'ranger'. Between the iterative model fitting, we offer the option of using predictive mean matching. This firstly avoids imputation with values not already present in the original data (like a value 0.3334 in a 0-1 coded variable). Secondly, predictive mean matching tries to raise the variance in the resulting conditional distributions to a realistic level. This would allow, e.g., to do multiple imputation when repeating the call to missRanger(). Out-of-sample application is supported as well.

Maintained by Michael Mayer. Last updated 3 months ago.

imputation machine-learning missing-values random-forest

11.3 match 69 stars 11.07 score 208 scripts 6 dependents

bioc

seqsetvis:Set Based Visualizations for Next-Gen Sequencing Data

seqsetvis enables the visualization and analysis of sets of genomic sites in next gen sequencing data. Although seqsetvis was designed for the comparison of multiple ChIP-seq samples, this package is domain-agnostic and allows the processing of multiple genomic coordinate files (bed-like files) and signal files (bigwig files, pileups from bam files). seqsetvis has multiple functions for fetching data from regions into a tidy format for analysis in data.table or tidyverse and visualization via ggplot2.

Maintained by Joseph R Boyd. Last updated 3 months ago.

software chipseq multiplecomparison sequencing visualization

21.4 match 5.82 score 82 scripts

nicholasjclark

MRFcov:Markov Random Fields with Additional Covariates

Approximate node interaction parameters of Markov Random Fields graphical networks. Models can incorporate additional covariates, allowing users to estimate how interactions between nodes in the graph are predicted to change across covariate gradients. The general methods implemented in this package are described in Clark et al. (2018) <doi:10.1002/ecy.2221>.

Maintained by Nicholas J Clark. Last updated 12 months ago.

conditional-random-fields graphical-models machine-learning markov-random-field multivariate-analysis multivariate-statistics network-analysis networks

20.5 match 24 stars 6.03 score 30 scripts

bayesiandemography

bage:Bayesian Estimation and Forecasting of Age-Specific Rates

Fast Bayesian estimation and forecasting of age-specific rates, probabilities, and means, based on 'Template Model Builder'.

Maintained by John Bryant. Last updated 2 months ago.

cpp

16.9 match 3 stars 7.30 score 39 scripts

gamlss-dev

gamlss:Generalized Additive Models for Location Scale and Shape

Functions for fitting the Generalized Additive Models for Location Scale and Shape introduced by Rigby and Stasinopoulos (2005), <doi:10.1111/j.1467-9876.2005.00510.x>. The models use a distributional regression approach where all the parameters of the conditional distribution of the response variable are modelled using explanatory variables.

Maintained by Mikis Stasinopoulos. Last updated 4 months ago.

11.0 match 16 stars 11.23 score 2.0k scripts 49 dependents

briencj

asremlPlus:Augments 'ASReml-R' in Fitting Mixed Models and Packages Generally in Exploring Prediction Differences

Assists in automating the selection of terms to include in mixed models when 'asreml' is used to fit the models. Procedures are available for choosing models that conform to the hierarchy or marginality principle, for fitting and choosing between two-dimensional spatial models using correlation, natural cubic smoothing spline and P-spline models. A history of the fitting of a sequence of models is kept in a data frame. Also used to compute functions and contrasts of, to investigate differences between and to plot predictions obtained using any model fitting function. The content falls into the following natural groupings: (i) Data, (ii) Model modification functions, (iii) Model selection and description functions, (iv) Model diagnostics and simulation functions, (v) Prediction production and presentation functions, (vi) Response transformation functions, (vii) Object manipulation functions, and (viii) Miscellaneous functions (for further details see 'asremlPlus-package' in help). The 'asreml' package provides a computationally efficient algorithm for fitting a wide range of linear mixed models using Residual Maximum Likelihood. It is a commercial package and a license for it can be purchased from 'VSNi' <https://vsni.co.uk/> as 'asreml-R', who will supply a zip file for local installation/updating (see <https://asreml.kb.vsni.co.uk/>). It is not needed for functions that are methods for 'alldiffs' and 'data.frame' objects. The package 'asremlPlus' can also be installed from <http://chris.brien.name/rpackages/>.

Maintained by Chris Brien. Last updated 26 days ago.

asreml mixed-models

13.2 match 19 stars 9.34 score 200 scripts

jakubnowicki

fixtuRes:Mock Data Generator

Generate mock data in R using YAML configuration.

Maintained by Jakub Nowicki. Last updated 3 years ago.

fixtures mock-data mock-data-generator test-data-generator yaml-configuration

24.7 match 16 stars 4.98 score 12 scripts

chrhennig

fpc:Flexible Procedures for Clustering

Various methods for clustering and cluster validation. Fixed point clustering. Linear regression clustering. Clustering by merging Gaussian mixture components. Symmetric and asymmetric discriminant projections for visualisation of the separation of groupings. Cluster validation statistics for distance based clustering including corrected Rand index. Standardisation of cluster validation statistics by random clusterings and comparison between many clustering methods and numbers of clusters based on this. Cluster-wise cluster stability assessment. Methods for estimation of the number of clusters: Calinski-Harabasz, Tibshirani and Walther's prediction strength, Fang and Wang's bootstrap stability. Gaussian/multinomial mixture fitting for mixed continuous/categorical variables. Variable-wise statistics for cluster interpretation. DBSCAN clustering. Interface functions for many clustering methods implemented in R, including estimating the number of clusters with kmeans, pam and clara. Modality diagnosis for Gaussian mixtures. For an overview see package?fpc.

Maintained by Christian Hennig. Last updated 6 months ago.

13.2 match 11 stars 9.25 score 2.6k scripts 70 dependents

bayesball

LearnBayes:Learning Bayesian Inference

Contains functions for summarizing basic one and two parameter posterior distributions and predictive distributions. It contains MCMC algorithms for summarizing posterior distributions defined by the user. It also contains functions for regression models, hierarchical models, Bayesian tests, and illustrations of Gibbs sampling.

Maintained by Jim Albert. Last updated 7 years ago.

10.7 match 38 stars 11.34 score 690 scripts 31 dependents

shangzhi-hong

RfEmpImp:Multiple Imputation using Chained Random Forests

An R package for multiple imputation using chained random forests. Implemented methods can handle missing data in mixed types of variables by using prediction-based or node-based conditional distributions constructed using random forests. For prediction-based imputation, the method based on the empirical distribution of out-of-bag prediction errors of random forests and the method based on normality assumption for prediction errors of random forests are provided for imputing continuous variables, and the method based on predicted probabilities is provided for imputing categorical variables. For node-based imputation, the method based on the conditional distribution formed by the predicting nodes of random forests, and the method based on proximity measures of random forests are provided. More details of the statistical methods can be found in Hong et al. (2020) <arXiv:2004.14823>.

Maintained by Shangzhi Hong. Last updated 2 years ago.

imputation missing-data random-forest

27.3 match 5 stars 4.40 score 8 scripts

jeroen

openssl:Toolkit for Encryption, Signatures and Certificates Based on OpenSSL

Bindings to OpenSSL libssl and libcrypto, plus custom SSH key parsers. Supports RSA, DSA and EC curves P-256, P-384, P-521, and curve25519. Cryptographic signatures can either be created and verified manually or via x509 certificates. AES can be used in cbc, ctr or gcm mode for symmetric encryption; RSA for asymmetric (public key) encryption or EC for Diffie Hellman. High-level envelope functions combine RSA and AES for encrypting arbitrary sized data. Other utilities include key generators, hash functions (md5, sha1, sha256, etc), base64 encoder, a secure random number generator, and 'bignum' math methods for manually performing crypto calculations on large multibyte integers.

Maintained by Jeroen Ooms. Last updated 1 month ago.

openssl

6.6 match 65 stars 18.00 score 632 scripts 5.0k dependents

jeffreyevans

spatialEco:Spatial Analysis and Modelling Utilities

Utilities to support spatial data manipulation, query, sampling and modelling in ecological applications. Functions include models for species population density, spatial smoothing, multivariate separability, point process model for creating pseudo-absences and sub-sampling, Quadrant-based sampling and analysis, auto-logistic modeling, sampling models, cluster optimization, statistical exploratory tools and raster-based metrics.

Maintained by Jeffrey S. Evans. Last updated 12 days ago.

biodiversity conservation ecology r-spatial raster spatial vector

12.5 match 110 stars 9.55 score 736 scripts 2 dependents

jenniniku

gllvm:Generalized Linear Latent Variable Models

Analysis of multivariate data using generalized linear latent variable models (gllvm). Estimation is performed using either the Laplace method, variational approximations, or extended variational approximations, implemented via TMB (Kristensen et al. (2016), <doi:10.18637/jss.v070.i05>).

Maintained by Jenni Niku. Last updated 18 hours ago.

cpp openmp

11.3 match 51 stars 10.52 score 176 scripts 1 dependent

mhahsler

arules:Mining Association Rules and Frequent Itemsets

Provides the infrastructure for representing, manipulating and analyzing transaction data and patterns (frequent itemsets and association rules). Also provides C implementations of the association mining algorithms Apriori and Eclat. Hahsler, Gruen and Hornik (2005) <doi:10.18637/jss.v014.i15>.

Maintained by Michael Hahsler. Last updated 1 month ago.

arules association-rules frequent-itemsets

8.5 match 194 stars 13.99 score 3.3k scripts 28 dependents

zdebruine

RcppML:Rcpp Machine Learning Library

Fast machine learning algorithms including matrix factorization and divisive clustering for large sparse and dense matrices.

Maintained by Zach DeBruine. Last updated 2 years ago.

clustering matrix-factorization nmf rcpp rcppeigen sparse-matrix cpp openmp

11.2 match 104 stars 10.53 score 125 scripts 46 dependents

kosukeimai

experiment:R Package for Designing and Analyzing Randomized Experiments

Provides various statistical methods for designing and analyzing randomized experiments. One functionality of the package is the implementation of randomized-block and matched-pair designs based on possibly multivariate pre-treatment covariates. The package also provides the tools to analyze various randomized experiments including cluster randomized experiments, two-stage randomized experiments, randomized experiments with noncompliance, and randomized experiments with missing data.

Maintained by Kosuke Imai. Last updated 3 years ago.

openblas

22.3 match 14 stars 5.29 score 23 scripts

metinbulus

PowerUpR:Power Analysis Tools for Multilevel Randomized Experiments

Includes tools to calculate statistical power, minimum detectable effect size (MDES), MDES difference (MDESD), and minimum required sample size for various multilevel randomized experiments with continuous outcomes. Some of the functions can assist with planning multilevel randomized experiments sensitive enough to detect multilevel moderation (2-1-1, 2-1-2, 2-2-1, and 2-2-2 designs) and multilevel mediation (2-1-1, 2-2-1, 3-1-1, 3-2-1, and 3-3-1 designs). See 'PowerUp!' Excel series at <https://www.causalevaluation.org/>.

Maintained by Metin Bulus. Last updated 4 years ago.

25.0 match 2 stars 4.68 score 24 scripts

bnosac

crfsuite:Conditional Random Fields for Labelling Sequential Data in Natural Language Processing

Wraps the 'CRFsuite' library <https://github.com/chokkan/crfsuite> allowing users to fit a Conditional Random Field model and to apply it on existing data. The focus of the implementation is in the area of Natural Language Processing where this R package allows you to easily build and apply models for named entity recognition, text chunking, part of speech tagging, intent recognition or classification of any category you have in mind. Next to training, a small web application is included in the package to allow you to easily construct training data.

Maintained by Jan Wijffels. Last updated 1 year ago.

chunking conditional-random-fields crf crfsuite data-science intent-classification natural-language-processing ner nlp cpp

18.5 match 63 stars 6.34 score 35 scripts

davidbolin

rSPDE:Rational Approximations of Fractional Stochastic Partial Differential Equations

Functions that compute rational approximations of fractional elliptic stochastic partial differential equations. The package also contains functions for common statistical usage of these approximations. The main references for rSPDE are Bolin, Simas and Xiong (2023) <doi:10.1080/10618600.2023.2231051> for the covariance-based method and Bolin and Kirchner (2020) <doi:10.1080/10618600.2019.1665537> for the operator-based rational approximation. These can be generated by the citation function in R.

Maintained by David Bolin. Last updated 8 days ago.

15.3 match 11 stars 7.57 score 188 scripts 3 dependents

twolodzko

extraDistr:Additional Univariate and Multivariate Distributions

Density, distribution function, quantile function and random generation for a number of univariate and multivariate distributions. This package implements the following distributions: Bernoulli, beta-binomial, beta-negative binomial, beta prime, Bhattacharjee, Birnbaum-Saunders, bivariate normal, bivariate Poisson, categorical, Dirichlet, Dirichlet-multinomial, discrete gamma, discrete Laplace, discrete normal, discrete uniform, discrete Weibull, Frechet, gamma-Poisson, generalized extreme value, Gompertz, generalized Pareto, Gumbel, half-Cauchy, half-normal, half-t, Huber density, inverse chi-squared, inverse-gamma, Kumaraswamy, Laplace, location-scale t, logarithmic, Lomax, multivariate hypergeometric, multinomial, negative hypergeometric, non-standard beta, normal mixture, Poisson mixture, Pareto, power, reparametrized beta, Rayleigh, shifted Gompertz, Skellam, slash, triangular, truncated binomial, truncated normal, truncated Poisson, Tukey lambda, Wald, zero-inflated binomial, zero-inflated negative binomial, zero-inflated Poisson.

Maintained by Tymoteusz Wolodzko. Last updated 10 days ago.

c-plus-plus c-plus-plus-11 distribution multivariate-distributions probability random-generation rcpp statistics cpp

9.9 match 53 stars 11.60 score 1.5k scripts 107 dependents

bioc

sccomp:Tests differences in cell-type proportion for single-cell data, robust to outliers

A robust and outlier-aware method for testing differences in cell-type proportion in single-cell data. This model can infer changes in tissue composition and heterogeneity, and can produce realistic data simulations based on any existing dataset. This model can also transfer knowledge from a large set of integrated datasets to increase accuracy further.

Maintained by Stefano Mangiola. Last updated 15 days ago.

bayesian regression differentialexpression singlecell batch-correction composition cytof differential-proportion microbiome multilevel proportions random-effects single-cell unwanted-variation

13.6 match 99 stars 8.41 score 69 scripts

predictiveecology

NetLogoR:Build and Run Spatially Explicit Agent-Based Models

Build and run spatially explicit agent-based models using only the R platform. 'NetLogoR' follows the same framework as the 'NetLogo' software (Wilensky (1999) <http://ccl.northwestern.edu/netlogo/>) and is a translation in R of the structure and functions of 'NetLogo'. 'NetLogoR' provides new R classes to define model agents and functions to implement spatially explicit agent-based models in the R environment. This package lets users benefit from the fast and easy coding phase of the highly developed 'NetLogo' framework, coupled with the versatility, power and massive resources of the R software. Examples of two models from the NetLogo software repository (Ants <http://ccl.northwestern.edu/netlogo/models/Ants> and Wolf-Sheep-Predation <http://ccl.northwestern.edu/netlogo/models/WolfSheepPredation>), and a third, Butterfly, from Railsback and Grimm (2012) <https://www.railsback-grimm-abm-book.com/>, all written using 'NetLogoR' are available. The 'NetLogo' code of the original version of these models is provided alongside. A programming guide inspired by the 'NetLogo' Programming Guide (<https://ccl.northwestern.edu/netlogo/docs/programming.html>) and a dictionary of 'NetLogo' primitives (<https://ccl.northwestern.edu/netlogo/docs/dictionary.html>) equivalences are also available. NOTE: To increment 'time', these functions can use a for loop or can be integrated with a discrete event simulator, such as 'SpaDES' (<https://cran.r-project.org/package=SpaDES>). The suggested package 'fastshp' can be installed with 'install.packages("fastshp", repos = ("<https://rforge.net>"), type = "source")'.

Maintained by Eliot J B McIntire. Last updated 4 months ago.

16.3 match 38 stars 6.94 score 19 scripts

tylermorganwall

spacefillr:Space-Filling Random and Quasi-Random Sequences

Generates random and quasi-random space-filling sequences. Supports the following sequences: 'Halton', 'Sobol', 'Owen'-scrambled 'Sobol', 'Owen'-scrambled 'Sobol' with errors distributed as blue noise, progressive jittered, progressive multi-jittered ('PMJ'), 'PMJ' with blue noise, 'PMJ02', and 'PMJ02' with blue noise. Includes a 'C++' 'API'. Methods derived from "Constructing Sobol sequences with better two-dimensional projections" (2012) <doi:10.1137/070709359> S. Joe and F. Y. Kuo, "Progressive Multi-Jittered Sample Sequences" (2018) <https://graphics.pixar.com/library/ProgressiveMultiJitteredSampling/paper.pdf> Christensen, P., Kensler, A. and Kilpatrick, C., and "A Low-Discrepancy Sampler that Distributes Monte Carlo Errors as a Blue Noise in Screen Space" (2019) E. Heitz, B. Laurent, O. Victor, C. David and I. Jean-Claude, <doi:10.1145/3306307.3328191>.

Maintained by Tyler Morgan-Wall. Last updated 19 days ago.

halton-sequence quasi-random-generator sobol-sequence cpp

16.0 match 7 stars 7.07 score 3 scripts 45 dependents

pbs-assess

sdmTMB:Spatial and Spatiotemporal SPDE-Based GLMMs with 'TMB'

Implements spatial and spatiotemporal GLMMs (Generalized Linear Mixed Effect Models) using 'TMB', 'fmesher', and the SPDE (Stochastic Partial Differential Equation) Gaussian Markov random field approximation to Gaussian random fields. One common application is for spatially explicit species distribution models (SDMs). See Anderson et al. (2024) <doi:10.1101/2022.03.24.485545>.

Maintained by Sean C. Anderson. Last updated 8 hours ago.

ecology glmm spatial-analysis species-distribution-modelling tmb cpp

10.4 match 203 stars 10.71 score 848 scripts 1 dependent

jmsigner

amt:Animal Movement Tools

Manage and analyze animal movement data. The functionality of 'amt' includes methods to calculate home ranges, track statistics (e.g. step lengths, speed, or turning angles), prepare data for fitting habitat selection analyses, and simulation of space-use from fitted step-selection functions.

Maintained by Johannes Signer. Last updated 4 months ago.

10.5 match 41 stars 10.54 score 418 scripts

tychelab

CoSMoS:Complete Stochastic Modelling Solution

Makes univariate, multivariate, or random fields simulations precise and simple. Just select the desired time series or random fields’ properties and it will do the rest. CoSMoS is based on the framework described in Papalexiou (2018, <doi:10.1016/j.advwatres.2018.02.013>), extended for random fields in Papalexiou and Serinaldi (2020, <doi:10.1029/2019WR026331>), and further advanced in Papalexiou et al. (2021, <doi:10.1029/2020WR029466>) to allow fine-scale space-time simulation of storms (or even cyclone-mimicking fields).

Maintained by Kevin Shook. Last updated 4 years ago.

15.5 match 11 stars 7.10 score 77 scripts

miraisolutions

rTRNG:Advanced and Parallel Random Number Generation via 'TRNG'

Embeds sources and headers from Tina's Random Number Generator ('TRNG') C++ library. Exposes some functionality for easier access, testing and benchmarking into R. Provides examples of how to use parallel RNG with 'RcppParallel'. The methods and techniques behind 'TRNG' are illustrated in the package vignettes and examples. Full documentation is available in Bauke (2021) <https://github.com/rabauke/trng4/blob/v4.23.1/doc/trng.pdf>.

Maintained by Riccardo Porreca. Last updated 1 year ago.

hpc parallel rcpp trng cpp

19.6 match 19 stars 5.63 score 15 scripts

hesim-dev

hesim:Health Economic Simulation Modeling and Decision Analysis

A modular and computationally efficient R package for parameterizing, simulating, and analyzing health economic simulation models. The package supports cohort discrete time state transition models (Briggs et al. 1998) <doi:10.2165/00019053-199813040-00003>, N-state partitioned survival models (Glasziou et al. 1990) <doi:10.1002/sim.4780091106>, and individual-level continuous time state transition models (Siebert et al. 2012) <doi:10.1016/j.jval.2012.06.014>, encompassing both Markov (time-homogeneous and time-inhomogeneous) and semi-Markov processes. Decision uncertainty from a cost-effectiveness analysis is quantified with standard graphical and tabular summaries of a probabilistic sensitivity analysis (Claxton et al. 2005, Barton et al. 2008) <doi:10.1002/hec.985>, <doi:10.1111/j.1524-4733.2008.00358.x>. Use of C++ and data.table make individual-patient simulation, probabilistic sensitivity analysis, and incorporation of patient heterogeneity fast.

Maintained by Devin Incerti. Last updated 6 months ago.

health-economic-evaluation microsimulation simulation-modeling cpp

13.6 match 67 stars 8.12 score 41 scripts

blawson-bates

simEd:Simulation Education

Contains various functions to be used for simulation education, including simple Monte Carlo simulation functions, queueing simulation functions, variate generation functions capable of producing independent streams and antithetic variates, functions for illustrating random variate generation for various discrete and continuous distributions, and functions to compute time-persistent statistics. Also contains functions for visualizing: event-driven details of a single-server queue model; a Lehmer random number generator; variate generation via acceptance-rejection; and generation of a non-homogeneous Poisson process via thinning. Also contains two queueing data sets (one fabricated, one real-world) to facilitate input modeling. More details on the use of these functions can be found in Lawson and Leemis (2015) <doi:10.1109/WSC.2017.8248124>, in Kudlay, Lawson, and Leemis (2020) <doi:10.1109/WSC48552.2020.9384010>, and in Lawson and Leemis (2021) <doi:10.1109/WSC52266.2021.9715299>.

Maintained by Barry Lawson. Last updated 1 year ago.

32.7 match 3.35 score 45 scripts

spatstat

spatstat.geom:Geometrical Functionality of the 'spatstat' Family

Defines spatial data types and supports geometrical operations on them. Data types include point patterns, windows (domains), pixel images, line segment patterns, tessellations and hyperframes. Capabilities include creation and manipulation of data (using command line or graphical interaction), plotting, geometrical operations (rotation, shift, rescale, affine transformation), convex hull, discretisation and pixellation, Dirichlet tessellation, Delaunay triangulation, pairwise distances, nearest-neighbour distances, distance transform, morphological operations (erosion, dilation, closing, opening), quadrat counting, geometrical measurement, geometrical covariance, colour maps, calculus on spatial domains, Gaussian blur, level sets of images, transects of images, intersections between objects, minimum distance matching. (Excludes spatial data on a network, which are supported by the package 'spatstat.linnet'.)

Maintained by Adrian Baddeley. Last updated 4 hours ago.

classes-and-objects distance-calculation geometry geometry-processing images mensuration plotting point-patterns spatial-data spatial-data-analysis

9.1 match 7 stars 12.11 score 241 scripts 227 dependents

skgrange

rmweather:Tools to Conduct Meteorological Normalisation and Counterfactual Modelling for Air Quality Data

An integrated set of tools to allow data users to conduct meteorological normalisation and counterfactual modelling for air quality data. The meteorological normalisation technique uses predictive random forest models to remove variation of pollutant concentrations so trends and interventions can be explored in a robust way. For examples, see Grange et al. (2018) <doi:10.5194/acp-18-6223-2018> and Grange and Carslaw (2019) <doi:10.1016/j.scitotenv.2018.10.344>. The random forest models can also be used for counterfactual or business as usual (BAU) modelling by using the models to predict, from the model's perspective, the future. For an example, see Grange et al. (2021) <doi:10.5194/acp-2020-1171>.

Maintained by Stuart K. Grange. Last updated 22 days ago.

17.5 match 49 stars 6.24 score 239 scripts

pdhoff

rstiefel:Random Orthonormal Matrix Generation and Optimization on the Stiefel Manifold

Simulation of random orthonormal matrices from linear and quadratic exponential family distributions on the Stiefel manifold. The most general type of distribution covered is the matrix-variate Bingham-von Mises-Fisher distribution. Most of the simulation methods are presented in Hoff (2009) "Simulation of the Matrix Bingham-von Mises-Fisher Distribution, With Applications to Multivariate and Relational Data" <doi:10.1198/jcgs.2009.07177>. The package also includes functions for optimization on the Stiefel manifold based on algorithms described in Wen and Yin (2013) "A feasible method for optimization with orthogonality constraints" <doi:10.1007/s10107-012-0584-1>.

Maintained by Peter Hoff. Last updated 4 years ago.

16.4 match 3 stars 6.53 score 95 scripts 8 dependents

statnet

latentnet:Latent Position and Cluster Models for Statistical Networks

Fit and simulate latent position and cluster models for statistical networks. See Krivitsky and Handcock (2008) <doi:10.18637/jss.v024.i05> and Krivitsky, Handcock, Raftery, and Hoff (2009) <doi:10.1016/j.socnet.2009.04.001>.

Maintained by Pavel N. Krivitsky. Last updated 4 days ago.

openblas

12.8 match 19 stars 8.36 score 191 scripts 4 dependents

wilkelab

ggridges:Ridgeline Plots in 'ggplot2'

Ridgeline plots provide a convenient way of visualizing changes in distributions over time or space. This package enables the creation of such plots in 'ggplot2'.

Maintained by Claus O. Wilke. Last updated 3 months ago.

6.4 match 418 stars 16.71 score 14k scripts 285 dependents

bioc

BiocParallel:Bioconductor facilities for parallel evaluation

This package provides modified versions and novel implementation of functions for parallel evaluation, tailored to use with Bioconductor objects.

Maintained by Martin Morgan. Last updated 24 days ago.

infrastructure bioconductor-package core-package u24ca289073 cpp

6.1 match 67 stars 17.40 score 7.3k scripts 1.1k dependents

bbolker

broom.mixed:Tidying Methods for Mixed Models

Convert fitted objects from various R mixed-model packages into tidy data frames along the lines of the 'broom' package. The package provides three S3 generics for each model: tidy(), which summarizes a model's statistical findings such as coefficients of a regression; augment(), which adds columns to the original data such as predictions, residuals and cluster assignments; and glance(), which provides a one-row summary of model-level statistics.

Maintained by Ben Bolker. Last updated 3 months ago.

7.0 match 231 stars 15.22 score 4.0k scripts 37 dependents

jinghuazhao

gap:Genetic Analysis Package

As first reported [Zhao, J. H. 2007. "gap: Genetic Analysis Package". J Stat Soft 23(8):1-18. <doi:10.18637/jss.v023.i08>], it is designed as an integrated package for genetic data analysis of both population and family data. Currently, it contains functions for sample size calculations of both population-based and family-based designs, probability of familial disease aggregation, kinship calculation, statistics in linkage analysis, and association analysis involving genetic markers including haplotype analysis with or without environmental covariates. Over the years, the package has been developed in between many projects, hence also in line with the name (gap).

Maintained by Jing Hua Zhao. Last updated 15 days ago.

genetics imputation lmm fortran

8.8 match 12 stars 11.88 score 448 scripts 16 dependents

mobiodiv

mobsim:Spatial Simulation and Scale-Dependent Analysis of Biodiversity Changes

Simulation, analysis and sampling of spatial biodiversity data (May, Gerstner, McGlinn, Xiao & Chase 2017) <doi:10.1111/2041-210x.12986>. In the simulation tools, users define the numbers of species and individuals, the species abundance distribution and species aggregation. Functions for analysis include species rarefaction and accumulation curves, species-area relationships and the distance decay of similarity.

Maintained by Felix May. Last updated 3 months ago.

biodiversity macroecology point-pattern-analysis rarefaction simulation species species-abundance-distributions cpp

13.3 match 20 stars 7.84 score 76 scripts

rudeboybert

resampledata:Data Sets for Mathematical Statistics with Resampling in R

Package of data sets from "Mathematical Statistics with Resampling in R" (1st Ed. 2011, 2nd Ed. 2018) by Laura Chihara and Tim Hesterberg.

Maintained by Albert Y. Kim. Last updated 4 months ago.

20.2 match 15 stars 5.15 score 187 scripts

mlr-org

mlr3torch:Deep Learning with 'mlr3'

Deep Learning library that extends the mlr3 framework by building upon the 'torch' package. It allows users to conveniently build, train, and evaluate deep learning models without having to worry about low-level details. Custom architectures can be created using the graph language defined in 'mlr3pipelines'.

Maintained by Sebastian Fischer. Last updated 30 days ago.

data-science deep-learning machine-learning mlr3 torch

13.6 match 42 stars 7.63 score 78 scripts

cran

sna:Tools for Social Network Analysis

A range of tools for social network analysis, including node and graph-level indices, structural distance and covariance methods, structural equivalence detection, network regression, random graph generation, and 2D/3D network visualization.

Maintained by Carter T. Butts. Last updated 6 months ago.

15.2 match 8 stars 6.78 score 94 dependents

olink-proteomics

OlinkAnalyze:Facilitate Analysis of Proteomic Data from Olink

A collection of functions to facilitate analysis of proteomic data from Olink, primarily NPX data that has been exported from Olink Software. The functions also work on QUANT data from Olink by log-transforming the QUANT data. The functions are focused on reading data, facilitating data wrangling and quality control analysis, performing statistical analysis and generating figures to visualize the results of the statistical analysis. The goal of this package is to help users extract biological insights from proteomic data run on the Olink platform.

Maintained by Kathleen Nevola. Last updated 19 days ago.

olink proteomics proteomics-data-analysis

10.6 match 104 stars 9.72 score 61 scripts

ehrlinger

ggRandomForests:Visually Exploring Random Forests

Graphic elements for exploring Random Forests using the 'randomForest' or 'randomForestSRC' package for survival, regression and classification forests and 'ggplot2' package plotting.

Maintained by John Ehrlinger. Last updated 4 days ago.

11.5 match 148 stars 8.94 score 197 scripts

melff

mclogit:Multinomial Logit Models, with or without Random Effects or Overdispersion

Provides estimators for multinomial logit models in their conditional logit and baseline logit variants, with or without random effects, with or without overdispersion. Random effects models are estimated using the PQL technique (based on a Laplace approximation) or the MQL technique (based on a Solomon-Cox approximation). Estimates should be treated with caution if the group sizes are small.

Maintained by Martin Elff. Last updated 3 months ago.

9.3 match 23 stars 11.03 score 262 scripts 4 dependents

hwborchers

pracma:Practical Numerical Math Functions

Provides a large number of functions from numerical analysis and linear algebra, numerical optimization, differential equations, time series, plus some well-known special mathematical functions. Uses 'MATLAB' function names where appropriate to simplify porting.

Maintained by Hans W. Borchers. Last updated 1 year ago.

8.3 match 29 stars 12.34 score 6.6k scripts 931 dependents

dgbonett

statpsych:Statistical Methods for Psychologists

Implements confidence interval and sample size methods that are especially useful in psychological research. The methods can be applied in 1-group, 2-group, paired-samples, and multiple-group designs and to a variety of parameters including means, medians, proportions, slopes, standardized mean differences, standardized linear contrasts of means, plus several measures of correlation and association. Confidence interval and sample size functions are given for single parameters as well as differences, ratios, and linear contrasts of parameters. The sample size functions can be used to approximate the sample size needed to estimate a parameter or function of parameters with desired confidence interval precision or to perform a variety of hypothesis tests (directional two-sided, equivalence, superiority, noninferiority) with desired power. For details see: Statistical Methods for Psychologists, Volumes 1 – 4, <https://dgbonett.sites.ucsc.edu/>.

Maintained by Douglas G. Bonett. Last updated 3 months ago.

21.2 match 6 stars 4.83 score 15 scripts 1 dependent

amalan-constat

fitODBOD:Modeling Over Dispersed Binomial Outcome Data Using BMD and ABD

Contains Probability Mass Functions, Cumulative Mass Functions, Negative Log Likelihood value, parameter estimation and modeling data using Binomial Mixture Distributions (BMD) (Manoj et al. (2013) <doi:10.5539/ijsp.v2n2p24>) and Alternate Binomial Distributions (ABD) (Paul (1985) <doi:10.1080/03610928508828990>). See also the journal article on using the package (<doi:10.21105/joss.01505>).

Maintained by Amalan Mahendran. Last updated 4 months ago.

binomial-outcome-data overdispersion

22.7 match 1 star 4.44 score 139 scripts

miriamesteve

eat:Efficiency Analysis Trees

Functions are provided to determine production frontiers and technical efficiency measures through non-parametric techniques based upon regression trees. The package includes code for estimating radial input, output, directional and additive measures, plotting graphical representations of the scores and the production frontiers by means of trees, and determining rankings of importance of input variables in the analysis. Additionally, an adaptation of Random Forest by a set of individual Efficiency Analysis Trees for estimating technical efficiency is also included. More details in: <doi:10.1016/j.eswa.2020.113783>.

Maintained by Miriam Esteve. Last updated 3 years ago.

21.5 match 5 stars 4.68 score 19 scripts

biodiverse

unmarked:Models for Data from Unmarked Animals

Fits hierarchical models of animal abundance and occurrence to data collected using survey methods such as point counts, site occupancy sampling, distance sampling, removal sampling, and double observer sampling. Parameters governing the state and observation processes can be modeled as functions of covariates. References: Kellner et al. (2023) <doi:10.1111/2041-210X.14123>, Fiske and Chandler (2011) <doi:10.18637/jss.v043.i10>.

Maintained by Ken Kellner. Last updated 1 month ago.

openblas cpp openmp

7.7 match 4 stars 13.02 score 652 scripts 12 dependents

ropensci

canaper:Categorical Analysis of Neo- And Paleo-Endemism

Provides functions to analyze the spatial distribution of biodiversity, in particular categorical analysis of neo- and paleo-endemism (CANAPE) as described in Mishler et al (2014) <doi:10.1038/ncomms5473>. 'canaper' conducts statistical tests to determine the types of endemism that occur in a study area while accounting for the evolutionary relationships of species.

Maintained by Joel H. Nitta. Last updated 2 years ago.

biodiversity canape

18.5 match 7 stars 5.38 score 23 scripts

stekhoven

missForest:Nonparametric Missing Value Imputation using Random Forest

The function 'missForest' in this package is used to impute missing values particularly in the case of mixed-type data. It uses a random forest trained on the observed values of a data matrix to predict the missing values. It can be used to impute continuous and/or categorical data including complex interactions and non-linear relations. It yields an out-of-bag (OOB) imputation error estimate without the need of a test set or elaborate cross-validation. It can be run in parallel to save computation time.

Maintained by Daniel J. Stekhoven. Last updated 1 years ago.

8.6 match 92 stars 11.53 score 1.1k scripts 32 dependents

klvoje

evoTS:Analyses of Evolutionary Time-Series

Facilitates univariate and multivariate analysis of evolutionary sequences of phenotypic change. The package extends the modeling framework available in the 'paleoTS' package. Please see <https://klvoje.github.io/evoTS/index.html> for information about the package and the implemented models.

Maintained by Kjetil Lysne Voje. Last updated 9 months ago.

23.1 match 1 stars 4.26 score 184 scripts

cjerzak

fastrerandomize:Hardware-Accelerated Rerandomization for Improved Balance

Provides hardware-accelerated tools for performing rerandomization and randomization testing in experimental research. Using a 'JAX' backend, the package enables exact rerandomization inference even for large experiments with hundreds of billions of possible randomizations. Key functionalities include generating pools of acceptable rerandomizations based on covariate balance, conducting exact randomization tests, and performing pre-analysis evaluations to determine optimal rerandomization acceptance thresholds. The package supports various hardware acceleration frameworks including 'CPU', 'CUDA', and 'METAL', making it versatile across accelerated computing environments. This allows researchers to efficiently implement stringent rerandomization designs and conduct valid inference even with large sample sizes. The package is partly based on Jerzak and Goldstein (2023) <doi:10.48550/arXiv.2310.00861>.

Maintained by Connor Jerzak. Last updated 1 months ago.

balance experimental-design hardware-acceleration

17.4 match 8 stars 5.64 score 1 scripts

epiforecasts

epinowcast:Flexible Hierarchical Nowcasting

Tools to enable flexible and efficient hierarchical nowcasting of right-truncated epidemiological time-series using a semi-mechanistic Bayesian model with support for a range of reporting and generative processes. Nowcasting, in this context, is gaining situational awareness using currently available observations and the reporting patterns of historical observations. This can be useful when tracking the spread of infectious disease in real-time: without nowcasting, changes in trends can be obfuscated by partial reporting or their detection may be delayed due to the use of simpler methods like truncation. While the package has been designed with epidemiological applications in mind, it could be applied to any set of right-truncated time-series count data.

Maintained by Sam Abbott. Last updated 11 months ago.

cmdstanr effective-reproduction-number-estimation epidemiology infectious-disease-surveillance nowcasting outbreak-analysis pandemic-preparedness real-time-infectious-disease-modelling stan

12.4 match 61 stars 7.88 score 65 scripts

melodyaowen

crt2power:Designing Cluster-Randomized Trials with Two Continuous Co-Primary Outcomes

Provides methods for powering cluster-randomized trials with two continuous co-primary outcomes using five key design techniques. Includes functions for calculating required sample size and statistical power. For more details on methodology, see Owen et al. (2025) <doi:10.1002/sim.70015>, Yang et al. (2022) <doi:10.1111/biom.13692>, Pocock et al. (1987) <doi:10.2307/2531989>, Vickerstaff et al. (2019) <doi:10.1186/s12874-019-0754-4>, and Li et al. (2020) <doi:10.1111/biom.13212>.

Maintained by Melody Owen. Last updated 1 days ago.

27.2 match 3.60 score 2 scripts

eikeluedeling

decisionSupport:Quantitative Support of Decision Making under Uncertainty

Supporting the quantitative analysis of binary welfare based decision making processes using Monte Carlo simulations. Decision support is given on two levels: (i) The actual decision level is to choose between two alternatives under probabilistic uncertainty. This package calculates the optimal decision based on maximizing expected welfare. (ii) The meta decision level is to allocate resources to reduce the uncertainty in the underlying decision problem, i.e., to increase the current information to improve the actual decision making process. This problem is dealt with using the Value of Information Analysis. The Expected Value of Information for arbitrary prospective estimates can be calculated as well as Individual Expected Value of Perfect Information. The probabilistic calculations are done via Monte Carlo simulations. This Monte Carlo functionality can be used on its own.

Maintained by Eike Luedeling. Last updated 11 months ago.

18.9 match 6 stars 5.17 score 123 scripts

vmoprojs

GeoModels:Procedures for Gaussian and Non Gaussian Geostatistical (Large) Data Analysis

Functions for Gaussian and Non Gaussian (bivariate) spatial and spatio-temporal data analysis are provided for a) (fast) simulation of random fields, b) inference for random fields using standard likelihood and a likelihood approximation method called weighted composite likelihood based on pairs and c) prediction using (local) best linear unbiased prediction. Weighted composite likelihood can be very efficient for estimating massive datasets. Both regression and spatial (temporal) dependence analysis can be jointly performed. Flexible covariance models for spatial and spatial-temporal data on Euclidean domains and spheres are provided. There are also many useful functions for plotting and performing diagnostic analysis. Different non Gaussian random fields can be considered in the analysis. Among them, random fields with marginal distributions such as Skew-Gaussian, Student-t, Tukey-h, Sin-Arcsin, Two-piece, Weibull, Gamma, Log-Gaussian, Binomial, Negative Binomial and Poisson. See the URL for the papers associated with this package, as for instance, Bevilacqua and Gaetan (2015) <doi:10.1007/s11222-014-9460-6>, Bevilacqua et al. (2016) <doi:10.1007/s13253-016-0256-3>, Vallejos et al. (2020) <doi:10.1007/978-3-030-56681-4>, Bevilacqua et al. (2020) <doi:10.1002/env.2632>, Bevilacqua et al. (2021) <doi:10.1111/sjos.12447>, Bevilacqua et al. (2022) <doi:10.1016/j.jmva.2022.104949>, Morales-Navarrete et al. (2023) <doi:10.1080/01621459.2022.2140053>, and a large class of examples and tutorials.

Maintained by Moreno Bevilacqua. Last updated 2 months ago.

fortran openblas glibc

23.4 match 3 stars 4.17 score 83 scripts

f-rousset

spaMM:Mixed-Effect Models, with or without Spatial Random Effects

Inference based on models with or without spatially-correlated random effects, multivariate responses, or non-Gaussian random effects (e.g., Beta). Variation in residual variance (heteroscedasticity) can itself be represented by a mixed-effect model. Both classical geostatistical models (Rousset and Ferdy 2014 <doi:10.1111/ecog.00566>), and Markov random field models on irregular grids (as considered in the 'INLA' package, <https://www.r-inla.org>), can be fitted, with distinct computational procedures exploiting the sparse matrix representations for the latter case and other autoregressive models. Laplace approximations are used for likelihood or restricted likelihood. Penalized quasi-likelihood and other variants discussed in the h-likelihood literature (Lee and Nelder 2001 <doi:10.1093/biomet/88.4.987>) are also implemented.

Maintained by François Rousset. Last updated 9 months ago.

gsl cpp openmp

19.7 match 4.94 score 208 scripts 5 dependents

epinowcast

epinowcast:Flexible Hierarchical Nowcasting

Tools to enable flexible and efficient hierarchical nowcasting of right-truncated epidemiological time-series using a semi-mechanistic Bayesian model with support for a range of reporting and generative processes. Nowcasting, in this context, is gaining situational awareness using currently available observations and the reporting patterns of historical observations. This can be useful when tracking the spread of infectious disease in real-time: without nowcasting, changes in trends can be obfuscated by partial reporting or their detection may be delayed due to the use of simpler methods like truncation. While the package has been designed with epidemiological applications in mind, it could be applied to any set of right-truncated time-series count data.

Maintained by Sam Abbott. Last updated 11 months ago.

cmdstanr effective-reproduction-number-estimation epidemiology infectious-disease-surveillance nowcasting outbreak-analysis pandemic-preparedness real-time-infectious-disease-modelling stan

12.4 match 61 stars 7.79 score 71 scripts

crj32

MLeval:Machine Learning Model Evaluation

Straightforward and detailed evaluation of machine learning models. 'MLeval' can produce receiver operating characteristic (ROC) curves, precision-recall (PR) curves, calibration curves, and PR gain curves. 'MLeval' accepts a data frame of class probabilities and ground truth labels, or, it can automatically interpret the Caret train function results from repeated cross validation, then select the best model and analyse the results. 'MLeval' produces a range of evaluation metrics with confidence intervals.

Maintained by Christopher R John. Last updated 5 years ago.

16.9 match 6 stars 5.71 score 144 scripts

r-forge

RandVar:Implementation of Random Variables

Implements random variables by means of S4 classes and methods.

Maintained by Matthias Kohl. Last updated 2 months ago.

16.0 match 6.03 score 43 scripts 7 dependents

danheck

RRreg:Correlation and Regression Analyses for Randomized Response Data

Univariate and multivariate methods to analyze randomized response (RR) survey designs (e.g., Warner, S. L. (1965). Randomized response: A survey technique for eliminating evasive answer bias. Journal of the American Statistical Association, 60, 63–69, <doi:10.2307/2283137>). Besides univariate estimates of true proportions, RR variables can be used for correlations, as dependent variable in a logistic regression (with or without random effects), or as predictors in a linear regression (Heck, D. W., & Moshagen, M. (2018). RRreg: An R package for correlation and regression analyses of randomized response data. Journal of Statistical Software, 85(2), 1–29, <doi:10.18637/jss.v085.i02>). For simulations and the estimation of statistical power, RR data can be generated according to several models. The implemented methods also allow testing the link between continuous covariates and dishonesty in cheating paradigms such as the coin-toss or dice-roll task (Moshagen, M., & Hilbig, B. E. (2017). The statistical analysis of cheating paradigms. Behavior Research Methods, 49, 724–732, <doi:10.3758/s13428-016-0729-x>).

Maintained by Daniel W. Heck. Last updated 2 years ago.

17.7 match 3 stars 5.46 score 48 scripts

flr

FLasher:Projection and Forecasting of Fish Populations, Stocks and Fleets

Projection of future population and fishery dynamics is carried out for a given set of management targets. A system of equations is solved, using Automatic Differentiation (AD), for the levels of effort by fishery (fleet) that will result in the required abundances, catches or fishing mortalities.

Maintained by Iago Mosqueira. Last updated 8 days ago.

forecast fisheries flr cpp

14.0 match 2 stars 6.86 score 254 scripts 6 dependents

kearutherford

BerkeleyForestsAnalytics:Compute and Summarize Core Forest Metrics from Field Data

A suite of open-source R functions designed to produce standard metrics for forest management and ecology from forest inventory data. The overarching goal is to minimize potential inconsistencies introduced by the algorithms used to compute and summarize core forest metrics. Learn more about the purpose of the package and the specific algorithms used in the package at <https://github.com/kearutherford/BerkeleyForestsAnalytics>.

Maintained by Kea Rutherford. Last updated 2 months ago.

17.4 match 7 stars 5.50 score 4 scripts

claudioagostinelli

CircStats:Circular Statistics, from "Topics in Circular Statistics" (2001)

Circular Statistics, from "Topics in Circular Statistics" (2001) S. Rao Jammalamadaka and A. SenGupta, World Scientific.

Maintained by Claudio Agostinelli. Last updated 7 years ago.

14.5 match 2 stars 6.60 score 261 scripts 36 dependents

paulhendricks

generator:Generate Data Containing Fake Personally Identifiable Information

Allows users to quickly and easily generate fake data containing Personally Identifiable Information (PII) through convenience functions.

Maintained by Paul Hendricks. Last updated 8 years ago.

16.0 match 24 stars 5.99 score 81 scripts

adeverse

adespatial:Multivariate Multiscale Spatial Analysis

Tools for the multiscale spatial analysis of multivariate data. Several methods are based on the use of a spatial weighting matrix and its eigenvector decomposition (Moran's Eigenvectors Maps, MEM). Several approaches are described in the review Dray et al (2012) <doi:10.1890/11-1183.1>.

Maintained by Aurélie Siberchicot. Last updated 11 days ago.

openblas

8.6 match 36 stars 11.06 score 398 scripts 2 dependents

gavinsimpson

gratia:Graceful 'ggplot'-Based Graphics and Other Functions for GAMs Fitted Using 'mgcv'

Graceful 'ggplot'-based graphics and utility functions for working with generalized additive models (GAMs) fitted using the 'mgcv' package. Provides a reimplementation of the plot() method for GAMs that 'mgcv' provides, as well as 'tidyverse' compatible representations of estimated smooths.

Maintained by Gavin L. Simpson. Last updated 4 days ago.

distributional-regression gam gamm generalized-additive-mixed-models generalized-additive-models ggplot2 glm lm mgcv penalized-spline random-effects smoothing splines

7.5 match 216 stars 12.68 score 1.6k scripts 1 dependents

cran

PracTools:Designing and Weighting Survey Samples

Functions and datasets to support Valliant, Dever, and Kreuter (2018), <doi:10.1007/978-3-319-93632-1>, "Practical Tools for Designing and Weighting Survey Samples". Contains functions for sample size calculation for survey samples using stratified or clustered one-, two-, and three-stage sample designs, and single-stage audit sample designs. Functions are included that will group geographic units accounting for distances apart and measures of size. Other functions compute variance components for multistage designs and sample sizes in two-phase designs. A number of example data sets are included.

Maintained by Richard Valliant. Last updated 9 months ago.

21.2 match 1 stars 4.48 score 101 scripts 1 dependents

inbo

inlatools:Diagnostic Tools for INLA Models

Several functions which can be useful to choose sensible priors and diagnose the fitted model.

Maintained by Thierry Onkelinx. Last updated 5 months ago.

bayesian-statistics gplv3 inla mixed-models model-checking model-validation

21.2 match 4 stars 4.41 score 43 scripts

gi0na

ghypernet:Fit and Simulate Generalised Hypergeometric Ensembles of Graphs

Provides functions for model fitting and selection of generalised hypergeometric ensembles of random graphs (gHypEG). To learn how to use it, check the vignettes for a quick tutorial. Please reference its use as Casiraghi, G., Nanumyan, V. (2019) <doi:10.5281/zenodo.2555300> together with the relevant references from those listed below. The package is based on the research developed at the Chair of Systems Design, ETH Zurich. Casiraghi, G., Nanumyan, V., Scholtes, I., Schweitzer, F. (2016) <arXiv:1607.02441>. Casiraghi, G., Nanumyan, V., Scholtes, I., Schweitzer, F. (2017) <doi:10.1007/978-3-319-67256-4_11>. Casiraghi, G. (2017) <arXiv:1702.02048>. Brandenberger, L., Casiraghi, G., Nanumyan, V., Schweitzer, F. (2019) <doi:10.1145/3341161.3342926>. Casiraghi, G. (2019) <doi:10.1007/s41109-019-0241-1>. Casiraghi, G., Nanumyan, V. (2021) <doi:10.1038/s41598-021-92519-y>. Casiraghi, G. (2021) <doi:10.1088/2632-072X/ac0493>.

Maintained by Giona Casiraghi. Last updated 11 months ago.

data-mining data-science graphs network network-analysis random-graph-generation random-graphs

16.5 match 8 stars 5.68 score 20 scripts

r-forge

surveillance:Temporal and Spatio-Temporal Modeling and Monitoring of Epidemic Phenomena

Statistical methods for the modeling and monitoring of time series of counts, proportions and categorical data, as well as for the modeling of continuous-time point processes of epidemic phenomena. The monitoring methods focus on aberration detection in count data time series from public health surveillance of communicable diseases, but applications could just as well originate from environmetrics, reliability engineering, econometrics, or social sciences. The package implements many typical outbreak detection procedures such as the (improved) Farrington algorithm, or the negative binomial GLR-CUSUM method of Hoehle and Paul (2008) <doi:10.1016/j.csda.2008.02.015>. A novel CUSUM approach combining logistic and multinomial logistic modeling is also included. The package contains several real-world data sets, the ability to simulate outbreak data, and to visualize the results of the monitoring in a temporal, spatial or spatio-temporal fashion. A recent overview of the available monitoring procedures is given by Salmon et al. (2016) <doi:10.18637/jss.v070.i10>. For the retrospective analysis of epidemic spread, the package provides three endemic-epidemic modeling frameworks with tools for visualization, likelihood inference, and simulation. hhh4() estimates models for (multivariate) count time series following Paul and Held (2011) <doi:10.1002/sim.4177> and Meyer and Held (2014) <doi:10.1214/14-AOAS743>. twinSIR() models the susceptible-infectious-recovered (SIR) event history of a fixed population, e.g., epidemics across farms or networks, as a multivariate point process as proposed by Hoehle (2009) <doi:10.1002/bimj.200900050>. twinstim() estimates self-exciting point process models for a spatio-temporal point pattern of infective events, e.g., time-stamped geo-referenced surveillance data, as proposed by Meyer et al. (2012) <doi:10.1111/j.1541-0420.2011.01684.x>. A recent overview of the implemented space-time modeling frameworks for epidemic phenomena is given by Meyer et al. (2017) <doi:10.18637/jss.v077.i11>.

Maintained by Sebastian Meyer. Last updated 17 hours ago.

cpp

8.7 match 2 stars 10.68 score 446 scripts 3 dependents

amices

mice:Multivariate Imputation by Chained Equations

Multiple imputation using Fully Conditional Specification (FCS) implemented by the MICE algorithm as described in Van Buuren and Groothuis-Oudshoorn (2011) <doi:10.18637/jss.v045.i03>. Each variable has its own imputation model. Built-in imputation models are provided for continuous data (predictive mean matching, normal), binary data (logistic regression), unordered categorical data (polytomous logistic regression) and ordered categorical data (proportional odds). MICE can also impute continuous two-level data (normal model, pan, second-level variables). Passive imputation can be used to maintain consistency between variables. Various diagnostic plots are available to inspect the quality of the imputations.

Maintained by Stef van Buuren. Last updated 5 days ago.

chained-equations fcs imputation mice missing-data missing-values multiple-imputation multivariate-data cpp

5.6 match 462 stars 16.50 score 10k scripts 154 dependents

quanteda

quanteda:Quantitative Analysis of Textual Data

A fast, flexible, and comprehensive framework for quantitative text analysis in R. Provides functionality for corpus management, creating and manipulating tokens and n-grams, exploring keywords in context, forming and manipulating sparse matrices of documents by features and feature co-occurrences, analyzing keywords, computing feature similarities and distances, applying content dictionaries, applying supervised and unsupervised machine learning, visually representing text and text analyses, and more.

Maintained by Kenneth Benoit. Last updated 2 months ago.

corpus natural-language-processing quanteda text-analytics onetbb cpp

5.5 match 851 stars 16.68 score 5.4k scripts 51 dependents

statnet

tergm:Fit, Simulate and Diagnose Models for Network Evolution Based on Exponential-Family Random Graph Models

An integrated set of extensions to the 'ergm' package to analyze and simulate network evolution based on exponential-family random graph models (ERGM). 'tergm' is a part of the 'statnet' suite of packages for network analysis. See Krivitsky and Handcock (2014) <doi:10.1111/rssb.12014> and Carnegie, Krivitsky, Hunter, and Goodreau (2015) <doi:10.1080/10618600.2014.903087>.

Maintained by Pavel N. Krivitsky. Last updated 4 months ago.

10.0 match 27 stars 9.29 score 78 scripts 3 dependents

bioc

randRotation:Random Rotation Methods for High Dimensional Data with Batch Structure

A collection of methods for performing random rotations on high-dimensional, normally distributed data (e.g. microarray or RNA-seq data) with batch structure. The random rotation approach allows exact testing of dependent test statistics with linear models following arbitrary batch effect correction methods.

Maintained by Peter Hettegger. Last updated 5 months ago.

software sequencing batcheffect biomedicalinformatics rnaseq preprocessing microarray differentialexpression geneexpression genetics micrornaarray normalization statisticalmethod

25.2 match 3.60 score 3 scripts

yayayaoyaoyao

RARtrials:Response-Adaptive Randomization in Clinical Trials

Some response-adaptive randomization methods commonly found in literature are included in this package. These methods include the randomized play-the-winner rule for binary endpoint (Wei and Durham (1978) <doi:10.2307/2286290>), the doubly adaptive biased coin design with minimal variance strategy for binary endpoint (Atkinson and Biswas (2013) <doi:10.1201/b16101>, Rosenberger and Lachin (2015) <doi:10.1002/9781118742112>) and maximal power strategy targeting Neyman allocation for binary endpoint (Tymofyeyev, Rosenberger, and Hu (2007) <doi:10.1198/016214506000000906>) and RSIHR allocation with each letter representing the first character of the names of the individuals who first proposed this rule (Youngsook and Hu (2010) <doi:10.1198/sbr.2009.0056>, Bello and Sabo (2016) <doi:10.1080/00949655.2015.1114116>), A-optimal Allocation for continuous endpoint (Sverdlov and Rosenberger (2013) <doi:10.1080/15598608.2013.783726>), Aa-optimal Allocation for continuous endpoint (Sverdlov and Rosenberger (2013) <doi:10.1080/15598608.2013.783726>), generalized RSIHR allocation for continuous endpoint (Atkinson and Biswas (2013) <doi:10.1201/b16101>), Bayesian response-adaptive randomization with a control group using the Thall & Wathen method for binary and continuous endpoints (Thall and Wathen (2007) <doi:10.1016/j.ejca.2007.01.006>) and the forward-looking Gittins index rule for binary and continuous endpoints (Villar, Wason, and Bowden (2015) <doi:10.1111/biom.12337>, Williamson and Villar (2019) <doi:10.1111/biom.13119>).

Maintained by Chuyao Xu. Last updated 2 months ago.

19.5 match 4.65 score

christianroever

bayesmeta:Bayesian Random-Effects Meta-Analysis and Meta-Regression

A collection of functions for deriving the posterior distribution of the model parameters in random-effects meta-analysis or meta-regression, and providing functionality to evaluate joint and marginal posterior probability distributions, predictive distributions, shrinkage effects, posterior predictive p-values, etc. For more details, see also Roever C (2020) <doi:10.18637/jss.v093.i06>, or Roever C and Friede T (2022) <doi:10.1016/j.cmpb.2022.107303>.

Maintained by Christian Roever. Last updated 1 years ago.

16.8 match 3 stars 5.40 score 73 scripts 1 dependents

farrellday

miceRanger:Multiple Imputation by Chained Equations with Random Forests

Multiple Imputation has been shown to be a flexible method to impute missing values by Van Buuren (2007) <doi:10.1177/0962280206074463>. Expanding on this, random forests have been shown to be an accurate model by Stekhoven and Buhlmann <arXiv:1105.0828> to impute missing values in datasets. They have the added benefits of returning out of bag error and variable importance estimates, as well as being simple to run in parallel.

Maintained by Sam Wilson. Last updated 3 years ago.

imputation-methods machine-learning mice missing-data missing-values random-forests

12.7 match 67 stars 7.09 score 41 scripts 1 dependents

psirusteam

TeachingSampling:Selection of Samples and Parameter Estimation in Finite Population

Allows the user to draw probabilistic samples and make inferences from a finite population based on several sampling designs.

Maintained by Hugo Andres Gutierrez Rojas. Last updated 5 years ago.

15.4 match 4 stars 5.80 score 217 scripts 4 dependents

tomasfryda

h2o:R Interface for the 'H2O' Scalable Machine Learning Platform

R interface for 'H2O', the scalable open source machine learning platform that offers parallelized implementations of many supervised and unsupervised machine learning algorithms such as Generalized Linear Models (GLM), Gradient Boosting Machines (including XGBoost), Random Forests, Deep Neural Networks (Deep Learning), Stacked Ensembles, Naive Bayes, Generalized Additive Models (GAM), ANOVA GLM, Cox Proportional Hazards, K-Means, PCA, ModelSelection, Word2Vec, as well as a fully automatic machine learning algorithm (H2O AutoML).

Maintained by Tomas Fryda. Last updated 1 years ago.

10.8 match 3 stars 8.20 score 7.8k scripts 11 dependents

pennchopmicrobiomeprogram

ZIBR:A Zero-Inflated Beta Random Effect Model

A two-part zero-inflated Beta regression model with random effects (ZIBR) for testing the association between microbial abundance and clinical covariates for longitudinal microbiome data. Eric Z. Chen and Hongzhe Li (2016) <doi:10.1093/bioinformatics/btw308>.

Maintained by Charlie Bushman. Last updated 1 years ago.

15.1 match 30 stars 5.86 score 24 scripts

fndemarqui

rsurv:Random Generation of Survival Data

Random generation of survival data from a wide range of regression models, including accelerated failure time (AFT), proportional hazards (PH), proportional odds (PO), accelerated hazard (AH), Yang and Prentice (YP), and extended hazard (EH) models. The package 'rsurv' also stands out by its ability to generate survival data from an unlimited number of baseline distributions provided that an implementation of the quantile function of the chosen baseline distribution is available in R. Another nice feature of the package 'rsurv' lies in the fact that linear predictors are specified via a formula-based approach, facilitating the inclusion of categorical variables and interaction terms. The functions implemented in the package 'rsurv' can also be employed to simulate survival data with more complex structures, such as survival data with different types of censoring mechanisms, survival data with cure fraction, survival data with random effects (frailties), multivariate survival data, and competing risks survival data. Details about the R package 'rsurv' can be found in Demarqui (2024) <doi:10.48550/arXiv.2406.01750>.

Maintained by Fabio Demarqui. Last updated 1 months ago.

19.4 match 1 stars 4.56 score 24 scripts

bioc

graph:graph: A package to handle graph data structures

A package that implements some simple graph handling capabilities.

Maintained by Bioconductor Package Maintainer. Last updated 9 days ago.

graphandnetwork

7.5 match 11.78 score 764 scripts 342 dependents

insightsengineering

osprey:R Package to Create TLGs

Community effort to collect TLG code and create a catalogue.

Maintained by Nina Qi. Last updated 19 days ago.

catalog graphs listings nest tables

16.3 match 4 stars 5.41 score 1 dependents

csafe-isu

handwriterRF:Handwriting Analysis with Random Forests

Perform forensic handwriting analysis of two scanned handwritten documents. This package implements the statistical method described by Madeline Johnson and Danica Ommen (2021) <doi:10.1002/sam.11566>. Similarity measures and a random forest produce a score-based likelihood ratio that quantifies the strength of the evidence in favor of the documents being written by the same writer or different writers.

Maintained by Stephanie Reinders. Last updated 7 days ago.

jags cpp

14.1 match 2 stars 6.18 score 15 scripts 1 dependents

rcppcore

RcppArmadillo:'Rcpp' Integration for the 'Armadillo' Templated Linear Algebra Library

'Armadillo' is a templated C++ linear algebra library (by Conrad Sanderson) that aims towards a good balance between speed and ease of use. Integer, floating point and complex numbers are supported, as well as a subset of trigonometric and statistics functions. Various matrix decompositions are provided through optional integration with LAPACK and ATLAS libraries. The 'RcppArmadillo' package includes the header files from the templated 'Armadillo' library. Thus users do not need to install 'Armadillo' itself in order to use 'RcppArmadillo'. From release 7.800.0 on, 'Armadillo' is licensed under Apache License 2; previous releases were licensed under MPL 2.0 from version 3.800.0 onwards and LGPL-3 prior to that; 'RcppArmadillo' (the 'Rcpp' bindings/bridge to Armadillo) is licensed under the GNU GPL version 2 or later, as is the rest of 'Rcpp'.

Maintained by Dirk Eddelbuettel. Last updated 4 days ago.

armadillo c-plus-plus rcpp rcpparmadillo openblas cpp openmp

4.6 match 197 stars 18.77 score 1.9k scripts 3.4k dependents

debruine

faux:Simulation for Factorial Designs

Create datasets with factorial structure through simulation by specifying variable parameters. Extended documentation at <https://debruine.github.io/faux/>. Described in DeBruine (2020) <doi:10.5281/zenodo.2669586>.

Maintained by Lisa DeBruine. Last updated 2 months ago.

data simulation

9.2 match 98 stars 9.35 score 716 scripts 1 dependents

mjlajeunesse

metagear:Comprehensive Research Synthesis Tools for Systematic Reviews and Meta-Analysis

Functionalities for facilitating systematic reviews, data extractions, and meta-analyses. It includes a GUI (graphical user interface) to help screen the abstracts and titles of bibliographic data; tools to assign screening effort across multiple collaborators/reviewers and to assess inter-reviewer reliability; tools to help automate the download and retrieval of journal PDF articles from online databases; figure and image extractions from PDFs; web scraping of citations; automated and manual data extraction from scatter-plot and bar-plot images; PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) flow diagrams; simple imputation tools to fill gaps in incomplete or missing study parameters; generation of random effect sizes for Hedges' d, log response ratio, odds ratio, and correlation coefficients for Monte Carlo experiments; covariance equations for modelling dependencies among multiple effect sizes (e.g., effect sizes with a common control); and finally summaries that replicate analyses and outputs from widely used but no longer updated meta-analysis software (i.e., metawin). Funding for this package was supported by National Science Foundation (NSF) grants DBI-1262545 and DEB-1451031. CITE: Lajeunesse, M.J. (2016) Facilitating systematic reviews, data extraction and meta-analysis with the metagear package for R. Methods in Ecology and Evolution 7, 323-330 <doi:10.1111/2041-210X.12472>.

Maintained by Marc J. Lajeunesse. Last updated 4 years ago.

12.8 match 14 stars 6.71 score 91 scripts

moviedo5

fda.usc:Functional Data Analysis and Utilities for Statistical Computing

Routines for exploratory and descriptive analysis of functional data such as depth measurements, atypical curves detection, regression models, supervised classification, unsupervised classification and functional analysis of variance.

Maintained by Manuel Oviedo de la Fuente. Last updated 4 months ago.

functional-data-analysis fortran

8.8 match 12 stars 9.72 score 560 scripts 22 dependents

mrcieu

TwoSampleMR:Two Sample MR Functions and Interface to MRC Integrative Epidemiology Unit OpenGWAS Database

A package for performing Mendelian randomization using GWAS summary data. It uses the IEU OpenGWAS database <https://gwas.mrcieu.ac.uk/> to automatically obtain data, and a wide range of methods to run the analysis.

Maintained by Gibran Hemani. Last updated 9 days ago.

7.6 match 467 stars 11.23 score 1.7k scripts 1 dependents

mrc-ide

monty:Monte Carlo Models

Experimental sources for the next generation of mcstate, now called 'monty', which will support much of the old mcstate functionality but new things like better parameter interfaces, Hamiltonian Monte Carlo, and other features.

Maintained by Rich FitzJohn. Last updated 1 month ago.

cpp

11.2 match 3 stars 7.52 score 29 scripts 3 dependents

techtonique

nnetsauce:Randomized and Quasi-Randomized networks for Statistical/Machine Learning

Randomized and Quasi-Randomized networks for Statistical/Machine Learning

Maintained by T. Moudiki. Last updated 7 months ago.

deep-learning machine-learning neural-networks randomized-algorithms statistical-learning

32.3 match 2 stars 2.60 score 6 scripts

paulnorthrop

revdbayes:Ratio-of-Uniforms Sampling for Bayesian Extreme Value Analysis

Provides functions for the Bayesian analysis of extreme value models. The 'rust' package <https://cran.r-project.org/package=rust> is used to simulate a random sample from the required posterior distribution. The functionality of 'revdbayes' is similar to the 'evdbayes' package <https://cran.r-project.org/package=evdbayes>, which uses Markov Chain Monte Carlo ('MCMC') methods for posterior simulation. In addition, there are functions for making inferences about the extremal index, using the models for threshold inter-exceedance times of Suveges and Davison (2010) <doi:10.1214/09-AOAS292> and Holesovsky and Fusek (2020) <doi:10.1007/s10687-020-00374-3>. Also provided are d,p,q,r functions for the Generalised Extreme Value ('GEV') and Generalised Pareto ('GP') distributions that deal appropriately with cases where the shape parameter is very close to zero.

Maintained by Paul J. Northrop. Last updated 7 months ago.

analysis bayesian extreme extreme-value-statistics extremes generalized-pareto-distribution gev inference nhpp point-process posterior predictive rcpp value openblas cpp

11.0 match 4 stars 7.62 score 58 scripts 4 dependents

mrcieu

OneSampleMR:One Sample Mendelian Randomization and Instrumental Variable Analyses

Useful functions for one-sample (individual level data) Mendelian randomization and instrumental variable analyses. The package includes implementations of: the Sanderson and Windmeijer (2016) <doi:10.1016/j.jeconom.2015.06.004> conditional F-statistic, the multiplicative structural mean model of Hernán and Robins (2006) <doi:10.1097/01.ede.0000222409.00878.37>, and two-stage predictor substitution and two-stage residual inclusion estimators explained by Terza et al. (2008) <doi:10.1016/j.jhealeco.2007.09.009>.

Maintained by Tom Palmer. Last updated 19 days ago.

instrumental-variable instrumental-variables mendelian-randomisation mendelian-randomization mendelianrandomisation mendelianrandomization

12.6 match 19 stars 6.69 score 16 scripts

daijiang

megatrees:Subsets of randomly selected phylogenies from existing mega-phylogenies

There is an increasing number of mega-phylogenies available nowadays, with many of them being sets of thousands of posterior distribution phylogenies. For ecological studies, we may need to randomly select many such posterior phylogenies to conduct analyses. This data package serves this purpose by providing a small number (100) of randomly selected posterior phylogenies (if available) so that we can readily use them for our downstream analyses without repeating the downloading and selecting processes.

Maintained by Daijiang Li. Last updated 2 months ago.

27.2 match 4 stars 3.08 score 2 scripts 1 dependents

cran

kendallRandomWalks:Simulate and Visualize Kendall Random Walks and Related Distributions

Kendall random walks are continuous-space Markov chains generated by the Kendall generalized convolution. This package provides tools for simulating these random walks and studying distributions related to them. For more information about Kendall random walks see Jasiulis-Gołdyn (2014) <arXiv:1412.0220>.

Maintained by Mateusz Staniak. Last updated 7 years ago.

25.7 match 3.26 score 18 scripts

bioc

regioneR:Association analysis of genomic regions based on permutation tests

regioneR offers a statistical framework based on customizable permutation tests to assess the association between genomic region sets and other genomic features.

Maintained by Bernat Gel. Last updated 5 months ago.

genetics chipseq dnaseq methylseq copynumbervariation

9.3 match 9.00 score 2.7k scripts 21 dependents

dsy109

mixtools:Tools for Analyzing Finite Mixture Models

Analyzes finite mixture models for various parametric and semiparametric settings. This includes mixtures of parametric distributions (normal, multivariate normal, multinomial, gamma), various Reliability Mixture Models (RMMs), mixtures-of-regressions settings (linear regression, logistic regression, Poisson regression, linear regression with changepoints, predictor-dependent mixing proportions, random effects regressions, hierarchical mixtures-of-experts), and tools for selecting the number of components (bootstrapping the likelihood ratio test statistic, mixturegrams, and model selection criteria). Bayesian estimation of mixtures-of-linear-regressions models is available as well as a novel data depth method for obtaining credible bands. This package is based upon work supported by the National Science Foundation under Grant No. SES-0518772 and the Chan Zuckerberg Initiative: Essential Open Source Software for Science (Grant No. 2020-255193).

Maintained by Derek Young. Last updated 9 months ago.

mixture-models mixture-of-experts semiparametric-regression

7.3 match 20 stars 11.34 score 1.4k scripts 56 dependents

ropensci

coder:Deterministic Categorization of Items Based on External Code Data

Fast categorization of items based on external code data identified by regular expressions. A typical use case considers patients with medically coded data, such as codes from the International Classification of Diseases ('ICD') or the Anatomic Therapeutic Chemical ('ATC') classification system. Functions of the package rely on a triad of objects: (1) case data with unit IDs and possible dates of interest; (2) external code data for corresponding units in (1), with optional dates of interest; and (3) a classification scheme ('classcodes' object) with regular expressions to identify and categorize relevant codes from (2). It is easy to introduce new classification schemes ('classcodes' objects) or to use default schemes included in the package. Use cases include patient categorization based on 'comorbidity indices' such as 'Charlson', 'Elixhauser', 'RxRisk V', or the 'comorbidity-polypharmacy' score (CPS), as well as adverse events after hip and knee replacement surgery.

Maintained by Erik Bulow. Last updated 2 years ago.

classification icd-10

13.1 match 22 stars 6.31 score 23 scripts

adamlilith

fasterRaster:Faster Raster and Spatial Vector Processing Using 'GRASS GIS'

Processing of large-in-memory/large-on-disk rasters and spatial vectors using 'GRASS GIS' <https://grass.osgeo.org/>. Most functions in the 'terra' package are recreated. Processing of medium-sized and smaller spatial objects will nearly always be faster using 'terra' or 'sf', but for large-in-memory/large-on-disk objects, 'fasterRaster' may be faster. To use most of the functions, you must have the stand-alone version (not the 'OSGeo4W' installer version) of 'GRASS GIS' 8.0 or higher.

Maintained by Adam B. Smith. Last updated 17 days ago.

aspect distance fragmentation fragmentation-indices gis grass grass-gis raster raster-projection rasterize slope topography vectorization

10.7 match 58 stars 7.69 score 8 scripts

opengeos

whitebox:'WhiteboxTools' R Frontend

An R frontend for the 'WhiteboxTools' library, which is an advanced geospatial data analysis platform developed by Prof. John Lindsay at the University of Guelph's Geomorphometry and Hydrogeomatics Research Group. 'WhiteboxTools' can be used to perform common geographical information systems (GIS) analysis operations, such as cost-distance analysis, distance buffering, and raster reclassification. Remote sensing and image processing tasks include image enhancement (e.g. panchromatic sharpening, contrast adjustments), image mosaicing, numerous filtering operations, simple classification (k-means), and common image transformations. 'WhiteboxTools' also contains advanced tooling for spatial hydrological analysis (e.g. flow-accumulation, watershed delineation, stream network analysis, sink removal), terrain analysis (e.g. common terrain indices such as slope, curvatures, wetness index, hillshading; hypsometric analysis; multi-scale topographic position analysis), and LiDAR data processing. Suggested citation: Lindsay (2016) <doi:10.1016/j.cageo.2016.07.003>.

Maintained by Andrew Brown. Last updated 5 months ago.

geomorphometry geoprocessing geospatial gis hydrology remote-sensing rstudio

8.5 match 173 stars 9.65 score 203 scripts 2 dependents

sb452

MendelianRandomization:Mendelian Randomization Package

Encodes several methods for performing Mendelian randomization analyses with summarized data. Summarized data on genetic associations with the exposure and with the outcome can be obtained from large consortia. These data can be used for obtaining causal estimates using instrumental variable methods.

Maintained by Stephen Burgess. Last updated 2 years ago.

openblas cpp

12.0 match 1 stars 6.83 score 940 scripts 1 dependents

sdctools

sdcMicro:Statistical Disclosure Control Methods for Anonymization of Data and Risk Estimation

Data from statistical agencies and other institutions are mostly confidential. This package, introduced in Templ, Kowarik and Meindl (2017) <doi:10.18637/jss.v067.i04>, can be used for the generation of anonymized (micro)data, i.e. for the creation of public- and scientific-use files. The theoretical basis for the methods implemented can be found in Templ (2017) <doi:10.1007/978-3-319-50272-4>. Various risk estimation and anonymization methods are included. Note that the package includes a graphical user interface, published in Meindl and Templ (2019) <doi:10.3390/a12090191>, that gives access to various methods of this package.

Maintained by Matthias Templ. Last updated 25 days ago.

cpp

8.3 match 83 stars 9.89 score 258 scripts

mlr-org

mlr3fselect:Feature Selection for 'mlr3'

Feature selection package of the 'mlr3' ecosystem. It selects the optimal feature set for any 'mlr3' learner. The package works with several optimization algorithms, e.g. Random Search, Recursive Feature Elimination, and Genetic Search. Moreover, it can automatically optimize learners and estimate the performance of optimized feature sets with nested resampling.

Maintained by Marc Becker. Last updated 2 months ago.

evolutionary-algorithms exhaustive-search feature-selection machine-learning mlr3 optimization random-search recursive-feature-elimination sequential-feature-selection

9.9 match 23 stars 8.25 score 70 scripts 2 dependents

nalzok

tree.interpreter:Random Forest Prediction Decomposition and Feature Importance Measure

An R re-implementation of the 'treeinterpreter' package on PyPI <https://pypi.org/project/treeinterpreter/>. Each prediction can be decomposed as 'prediction = bias + feature_1_contribution + ... + feature_n_contribution'. This decomposition is then used to calculate the Mean Decrease Impurity (MDI) and Mean Decrease Impurity using out-of-bag samples (MDI-oob) feature importance measures based on the work of Li et al. (2019) <arXiv:1906.10845>.

Maintained by Qingyao Sun. Last updated 5 years ago.

data-science datascience interpretability machine-learning random-forest cpp

14.1 match 12 stars 5.79 score 6 scripts

mlr-org

mlr3mbo:Flexible Bayesian Optimization

A modern and flexible approach to Bayesian Optimization / Model Based Optimization building on the 'bbotk' package. 'mlr3mbo' is a toolbox providing both ready-to-use optimization algorithms as well as their fundamental building blocks allowing for straightforward implementation of custom algorithms. Single- and multi-objective optimization is supported as well as mixed continuous, categorical and conditional search spaces. Moreover, using 'mlr3mbo' for hyperparameter optimization of machine learning models within the 'mlr3' ecosystem is straightforward via 'mlr3tuning'. Examples of ready-to-use optimization algorithms include Efficient Global Optimization by Jones et al. (1998) <doi:10.1023/A:1008306431147>, ParEGO by Knowles (2006) <doi:10.1109/TEVC.2005.851274> and SMS-EGO by Ponweiser et al. (2008) <doi:10.1007/978-3-540-87700-4_78>.

Maintained by Lennart Schneider. Last updated 11 days ago.

automl bayesian-optimization bbotk black-box-optimization gaussian-process hpo hyperparameter hyperparameter-optimization hyperparameter-tuning machine-learning mlr3 model-based-optimization optimization optimizer random-forest tuning

9.5 match 25 stars 8.57 score 120 scripts 3 dependents

epimodel

EpiModel:Mathematical Modeling of Infectious Disease Dynamics

Tools for simulating mathematical models of infectious disease dynamics. Epidemic model classes include deterministic compartmental models, stochastic individual-contact models, and stochastic network models. Network models use the robust statistical methods of exponential-family random graph models (ERGMs) from the Statnet suite of software packages in R. Standard templates for epidemic modeling include SI, SIR, and SIS disease types. EpiModel features an API for extending these templates to address novel scientific research aims. Full methods for EpiModel are detailed in Jenness et al. (2018, <doi:10.18637/jss.v084.i08>).

Maintained by Samuel Jenness. Last updated 2 months ago.

agent-based-modeling epidemics epidemiology infectious-diseases network-graph cpp

7.0 match 250 stars 11.57 score 315 scripts

sebastian-engelke

graphicalExtremes:Statistical Methodology for Graphical Extreme Value Models

Statistical methodology for sparse multivariate extreme value models. Methods are provided for exact simulation and statistical inference for multivariate Pareto distributions on graphical structures as described in the paper 'Graphical Models for Extremes' by Engelke and Hitz (2020) <doi:10.1111/rssb.12355>.

Maintained by Sebastian Engelke. Last updated 2 months ago.

10.9 match 16 stars 7.38 score 28 scripts 1 dependents

gzt

CholWishart:Cholesky Decomposition of the Wishart Distribution

Sampling from the Cholesky factorization of a Wishart random variable, sampling from the inverse Wishart distribution, sampling from the Cholesky factorization of an inverse Wishart random variable, sampling from the pseudo Wishart distribution, sampling from the generalized inverse Wishart distribution, computing densities for the Wishart and inverse Wishart distributions, and computing the multivariate gamma and digamma functions. Provides a header file so the C functions can be called directly from other programs.

Maintained by Geoffrey Thompson. Last updated 6 months ago.

cholesky-decomposition cholesky-factorization digamma-functions gamma multivariate pseudo-wishart wishart wishart-distributions openblas

11.4 match 7 stars 7.05 score 41 scripts 13 dependents

r-lidar

lidR:Airborne LiDAR Data Manipulation and Visualization for Forestry Applications

Airborne LiDAR (Light Detection and Ranging) interface for data manipulation and visualization. Read/write 'las' and 'laz' files, computation of metrics in area based approach, point filtering, artificial point reduction, classification from geographic data, normalization, individual tree segmentation and other manipulations.

Maintained by Jean-Romain Roussel. Last updated 1 month ago.

als forestry las laz lidar point-cloud remote-sensing openblas cpp openmp

5.6 match 623 stars 14.47 score 844 scripts 8 dependents

liuyu-star

ODRF:Oblique Decision Random Forest for Classification and Regression

The oblique decision tree (ODT) uses linear combinations of predictors as partitioning variables in a decision tree. Oblique Decision Random Forest (ODRF) is an ensemble of multiple ODTs generated by feature bagging. Oblique Decision Boosting Tree (ODBT) applies feature bagging during the training process of ODT-based boosting trees to ensemble multiple boosting trees. All three methods can be used for classification and regression, and ODT and ODRF serve as supplements to the classical CART of Breiman (1984) <DOI:10.1201/9781315139470> and Random Forest of Breiman (2001) <DOI:10.1023/A:1010933404324> respectively.

Maintained by Yu Liu. Last updated 5 months ago.

cpp

15.6 match 7 stars 5.10 score 18 scripts

bayesiandemography

rvec:Vector Representing a Random Variable

Random vectors, called rvecs. An rvec holds multiple draws, but tries to behave like a standard R vector, including working well in data frames. Rvecs are useful for working with output from a simulation or a Bayesian analysis.

Maintained by John Bryant. Last updated 6 months ago.

14.5 match 2 stars 5.46 score 24 scripts 2 dependents

cran

Directional:A Collection of Functions for Directional Data Analysis

A collection of functions for the analysis of directional data (including massive data, with millions of observations). Hypothesis testing, discriminant and regression analysis, MLE of distributions and more are included. The standard textbook for such data is "Directional Statistics" by Mardia, K. V. and Jupp, P. E. (2000). Other references include: a) Paine J.P., Preston S.P., Tsagris M. and Wood A.T.A. (2018). "An elliptically symmetric angular Gaussian distribution". Statistics and Computing 28(3): 689-697. <doi:10.1007/s11222-017-9756-4>. b) Tsagris M. and Alenazi A. (2019). "Comparison of discriminant analysis methods on the sphere". Communications in Statistics: Case Studies, Data Analysis and Applications 5(4): 467-491. <doi:10.1080/23737484.2019.1684854>. c) Paine J.P., Preston S.P., Tsagris M. and Wood A.T.A. (2020). "Spherical regression models with general covariates and anisotropic errors". Statistics and Computing 30(1): 153-165. <doi:10.1007/s11222-019-09872-2>. d) Tsagris M. and Alenazi A. (2024). "An investigation of hypothesis testing procedures for circular and spherical mean vectors". Communications in Statistics-Simulation and Computation, 53(3): 1387-1408. <doi:10.1080/03610918.2022.2045499>. e) Yu Z. and Huang X. (2024). "A new parameterization for elliptically symmetric angular Gaussian distributions of arbitrary dimension". Electronic Journal of Statistics, 18(1): 301-334. <doi:10.1214/23-EJS2210>. f) Tsagris M. and Alzeley O. (2024). "Circular and spherical projected Cauchy distributions: A Novel Framework for Circular and Directional Data Modeling". Australian & New Zealand Journal of Statistics (Accepted for publication). <doi:10.1111/anzs.12434>. g) Tsagris M., Papastamoulis P. and Kato S. (2024). "Directional data analysis: spherical Cauchy or Poisson kernel-based distribution". Statistics and Computing (Accepted for publication). <doi:10.48550/arXiv.2409.03292>.

Maintained by Michail Tsagris. Last updated 1 month ago.

19.4 match 3 stars 4.06 score 3 dependents