R-universe search: tune

mlr-org

mlr3tuning:Hyperparameter Optimization for 'mlr3'

Hyperparameter optimization package of the 'mlr3' ecosystem. It features highly configurable search spaces via the 'paradox' package and finds optimal hyperparameter configurations for any 'mlr3' learner. 'mlr3tuning' works with several optimization algorithms e.g. Random Search, Iterated Racing, Bayesian Optimization (in 'mlr3mbo') and Hyperband (in 'mlr3hyperband'). Moreover, it can automatically optimize learners and estimate the performance of optimized models with nested resampling.

Maintained by Marc Becker. Last updated 3 months ago.

bbotk hyperparameter-optimization hyperparameter-tuning machine-learning mlr3 optimization tune tuning

103.2 match 55 stars 11.59 score 384 scripts 11 dependents

tidymodels

tune:Tidy Tuning Tools

The ability to tune models is important. 'tune' contains functions and classes to be used in conjunction with other 'tidymodels' packages for finding reasonable values of hyper-parameters in models, pre-processing methods, and post-processing steps.

Maintained by Max Kuhn. Last updated 11 days ago.

79.0 match 293 stars 14.27 score 756 scripts 39 dependents

bioc

mixOmics:Omics Data Integration Project

Multivariate methods are well suited to large omics data sets where the number of variables (e.g. genes, proteins, metabolites) is much larger than the number of samples (patients, cells, mice). They have the appealing properties of reducing the dimension of the data by using instrumental variables (components), which are defined as combinations of all variables. Those components are then used to produce useful graphical outputs that enable better understanding of the relationships and correlation structures between the different data sets that are integrated. mixOmics offers a wide range of multivariate methods for the exploration and integration of biological datasets with a particular focus on variable selection. The package proposes several sparse multivariate models we have developed to identify the key variables that are highly correlated, and/or explain the biological outcome of interest. The data that can be analysed with mixOmics may come from high throughput sequencing technologies, such as omics data (transcriptomics, metabolomics, proteomics, metagenomics etc) but also beyond the realm of omics (e.g. spectral imaging). The methods implemented in mixOmics can also handle missing values without having to delete entire rows with missing data. A non exhaustive list of methods include variants of generalised Canonical Correlation Analysis, sparse Partial Least Squares and sparse Discriminant Analysis. Recently we implemented integrative methods to combine multiple data sets: N-integration with variants of Generalised Canonical Correlation Analysis and P-integration with variants of multi-group Partial Least Squares.

Maintained by Eva Hamrud. Last updated 2 days ago.

immunooncology microarray sequencing metabolomics metagenomics proteomics geneprediction multiplecomparison classification regression bioconductor genomics genomics-data genomics-visualization multivariate-analysis multivariate-statistics omics r-pkg r-project

47.0 match 182 stars 13.71 score 1.3k scripts 22 dependents

cran

e1071:Misc Functions of the Department of Statistics, Probability Theory Group (Formerly: E1071), TU Wien

Functions for latent class analysis, short time Fourier transform, fuzzy clustering, support vector machines, shortest path computation, bagged clustering, naive Bayes classifier, generalized k-nearest neighbour ...

Maintained by David Meyer. Last updated 6 months ago.

cpp

28.8 match 28 stars 14.46 score 19k scripts 2.0k dependents

cran

Compositional:Compositional Data Analysis

Regression, classification, contour plots, hypothesis testing and fitting of distributions for compositional data are some of the functions included. We further include functions for percentages (or proportions). The standard textbook for such data is John Aitchison's (1986) "The statistical analysis of compositional data". Relevant papers include: a) Tsagris M.T., Preston S. and Wood A.T.A. (2011). "A data-based power transformation for compositional data". Fourth International International Workshop on Compositional Data Analysis. <doi:10.48550/arXiv.1106.1451> b) Tsagris M. (2014). "The k-NN algorithm for compositional data: a revised approach with and without zero values present". Journal of Data Science, 12(3): 519--534. <doi:10.6339/JDS.201407_12(3).0008>. c) Tsagris M. (2015). "A novel, divergence based, regression for compositional data". Proceedings of the 28th Panhellenic Statistics Conference, 15-18 April 2015, Athens, Greece, 430--444. <doi:10.48550/arXiv.1511.07600>. d) Tsagris M. (2015). "Regression analysis with compositional data containing zero values". Chilean Journal of Statistics, 6(2): 47--57. <https://soche.cl/chjs/volumes/06/02/Tsagris(2015).pdf>. e) Tsagris M., Preston S. and Wood A.T.A. (2016). "Improved supervised classification for compositional data using the alpha-transformation". Journal of Classification, 33(2): 243--261. <doi:10.1007/s00357-016-9207-5>. f) Tsagris M., Preston S. and Wood A.T.A. (2017). "Nonparametric hypothesis testing for equality of means on the simplex". Journal of Statistical Computation and Simulation, 87(2): 406--422. <doi:10.1080/00949655.2016.1216554>. g) Tsagris M. and Stewart C. (2018). "A Dirichlet regression model for compositional data with zeros". Lobachevskii Journal of Mathematics, 39(3): 398--412. <doi:10.1134/S1995080218030198>. h) Alenazi A. (2019). "Regression for compositional data with compositional data as predictor variables with or without zero values". Journal of Data Science, 17(1): 219--238. <doi:10.6339/JDS.201901_17(1).0010>. i) Tsagris M. and Stewart C. (2020). "A folded model for compositional data analysis". Australian and New Zealand Journal of Statistics, 62(2): 249--277. <doi:10.1111/anzs.12289>. j) Alenazi A.A. (2022). "f-divergence regression models for compositional data". Pakistan Journal of Statistics and Operation Research, 18(4): 867--882. <doi:10.18187/pjsor.v18i4.3969>. k) Tsagris M. and Stewart C. (2022). "A Review of Flexible Transformations for Modeling Compositional Data". In Advances and Innovations in Statistics and Data Science, pp. 225--234. <doi:10.1007/978-3-031-08329-7_10>. l) Alenazi A. (2023). "A review of compositional data analysis and recent advances". Communications in Statistics--Theory and Methods, 52(16): 5535--5567. <doi:10.1080/03610926.2021.2014890>. m) Tsagris M., Alenazi A. and Stewart C. (2023). "Flexible non-parametric regression models for compositional response data with zeros". Statistics and Computing, 33(106). <doi:10.1007/s11222-023-10277-5>. n) Tsagris. M. (2025). "Constrained least squares simplicial-simplicial regression". Statistics and Computing, 35(27). <doi:10.1007/s11222-024-10560-z>. o) Sevinc V. and Tsagris. M. (2024). "Energy Based Equality of Distributions Testing for Compositional Data". <doi:10.48550/arXiv.2412.05199>.

Maintained by Michail Tsagris. Last updated 2 months ago.

77.5 match 3 stars 3.64 score 4 dependents

mlr-org

mlr3hyperband:Hyperband for 'mlr3'

Successive Halving (Jamieson and Talwalkar (2016) <doi:10.48550/arXiv.1502.07943>) and Hyperband (Li et al. 2018 <doi:10.48550/arXiv.1603.06560>) optimization algorithm for the mlr3 ecosystem. The implementation in mlr3hyperband features improved scheduling and parallelizes the evaluation of configurations. The package includes tuners for hyperparameter optimization in mlr3tuning and optimizers for black-box optimization in bbotk.

Maintained by Marc Becker. Last updated 9 months ago.

automl bbotk hyperband hyperparameter-tuning machine-learning mlr3 optimization tune tuning

30.4 match 18 stars 7.48 score 44 scripts 3 dependents

tidymodels

dials:Tools for Creating Tuning Parameter Values

Many models contain tuning parameters (i.e. parameters that cannot be directly estimated from the data). These tools can be used to define objects for creating, simulating, or validating values for such parameters.

Maintained by Hannah Frick. Last updated 28 days ago.

12.0 match 114 stars 14.22 score 426 scripts 52 dependents

r-forge

robustbase:Basic Robust Statistics

"Essential" Robust Statistics. Tools allowing to analyze data with robust methods. This includes regression methodology including model selections and multivariate statistics where we strive to cover the book "Robust Statistics, Theory and Methods" by 'Maronna, Martin and Yohai'; Wiley 2006.

Maintained by Martin Maechler. Last updated 4 months ago.

fortran openblas

11.6 match 13.33 score 1.7k scripts 480 dependents

mlr-org

mlr3mbo:Flexible Bayesian Optimization

A modern and flexible approach to Bayesian Optimization / Model Based Optimization building on the 'bbotk' package. 'mlr3mbo' is a toolbox providing both ready-to-use optimization algorithms as well as their fundamental building blocks allowing for straightforward implementation of custom algorithms. Single- and multi-objective optimization is supported as well as mixed continuous, categorical and conditional search spaces. Moreover, using 'mlr3mbo' for hyperparameter optimization of machine learning models within the 'mlr3' ecosystem is straightforward via 'mlr3tuning'. Examples of ready-to-use optimization algorithms include Efficient Global Optimization by Jones et al. (1998) <doi:10.1023/A:1008306431147>, ParEGO by Knowles (2006) <doi:10.1109/TEVC.2005.851274> and SMS-EGO by Ponweiser et al. (2008) <doi:10.1007/978-3-540-87700-4_78>.

Maintained by Lennart Schneider. Last updated 11 days ago.

automl bayesian-optimization bbotk black-box-optimization gaussian-process hpo hyperparameter hyperparameter-optimization hyperparameter-tuning machine-learning mlr3 model-based-optimization optimization optimizer random-forest tuning

17.5 match 25 stars 8.57 score 120 scripts 3 dependents

msalibian

RobStatTM:Robust Statistics: Theory and Methods

Companion package for the book: "Robust Statistics: Theory and Methods, second edition", <http://www.wiley.com/go/maronna/robust>. This package contains code that implements the robust estimators discussed in the recent second edition of the book above, as well as the scripts reproducing all the examples in the book.

Maintained by Matias Salibian-Barrera. Last updated 1 days ago.

robust robust-estimation robust-regression robust-statistics robustness statistics fortran openblas

14.2 match 17 stars 10.23 score 84 scripts 8 dependents

brian-j-smith

MachineShop:Machine Learning Models and Tools

Meta-package for statistical and machine learning with a unified interface for model fitting, prediction, performance assessment, and presentation of results. Approaches for model fitting and prediction of numerical, categorical, or censored time-to-event outcomes include traditional regression models, regularization methods, tree-based methods, support vector machines, neural networks, ensembles, data preprocessing, filtering, and model tuning and selection. Performance metrics are provided for model assessment and can be estimated with independent test sets, split sampling, cross-validation, or bootstrap resampling. Resample estimation can be executed in parallel for faster processing and nested in cases of model tuning and selection. Modeling results can be summarized with descriptive statistics; calibration curves; variable importance; partial dependence plots; confusion matrices; and ROC, lift, and other performance curves.

Maintained by Brian J Smith. Last updated 7 months ago.

classification-models machine-learning predictive-modeling regression-models survival-models

17.8 match 61 stars 7.95 score 121 scripts

jamiemkass

ENMeval:Automated Tuning and Evaluations of Ecological Niche Models

Runs ecological niche models over all combinations of user-defined settings (i.e., tuning), performs cross validation to evaluate models, and returns data tables to aid in selection of optimal model settings that balance goodness-of-fit and model complexity. Also has functions to partition data spatially (or not) for cross validation, to plot multiple visualizations of results, to run null models to estimate significance and effect sizes of performance metrics, and to calculate range overlap between model predictions, among others. The package was originally built for Maxent models (Phillips et al. 2006, Phillips et al. 2017), but the current version allows possible extensions for any modeling algorithm. The extensive vignette, which guides users through most package functionality but unfortunately has a file size too big for CRAN, can be found here on the package's Github Pages website: <https://jamiemkass.github.io/ENMeval/articles/ENMeval-2.0-vignette.html>.

Maintained by Jamie M. Kass. Last updated 2 months ago.

12.2 match 49 stars 11.25 score 332 scripts 2 dependents

business-science

modeltime:The Tidymodels Extension for Time Series Modeling

The time series forecasting framework for use with the 'tidymodels' ecosystem. Models include ARIMA, Exponential Smoothing, and additional time series models from the 'forecast' and 'prophet' packages. Refer to "Forecasting Principles & Practice, Second edition" (<https://otexts.com/fpp2/>). Refer to "Prophet: forecasting at scale" (<https://research.facebook.com/blog/2017/02/prophet-forecasting-at-scale/>.).

Maintained by Matt Dancho. Last updated 5 months ago.

arima data-science deep-learning ets forecasting machine-learning machine-learning-algorithms modeltime prophet tbats tidymodeling tidymodels time time-series time-series-analysis timeseries timeseries-forecasting

12.8 match 549 stars 10.57 score 1.1k scripts 7 dependents

yuting-he

RobustPrediction:Robust Tuning and Training for Cross-Source Prediction

Provides robust parameter tuning and model training for predictive models applied across data sources where the data distribution varies slightly from source to source. This package implements three primary tuning methods: cross-validation-based internal tuning, external tuning, and the 'RobustTuneC' method. External tuning includes a conservative option where parameters are tuned internally on the training data and validating on an external dataset, providing a slightly pessimistic AUC estimate. It supports Lasso, Ridge, Random Forest, Boosting, and Support Vector Machine classifiers. Currently, only binary classification is supported. The response variable must be the first column of the dataset and a factor with exactly two levels. The tuning methods are based on the paper by Nicole Ellenbach, Anne-Laure Boulesteix, Bernd Bischl, Kristian Unger, and Roman Hornung (2021) "Improved Outcome Prediction Across Data Sources Through Robust Parameter Tuning" <doi:10.1007/s00357-020-09368-z>.

Maintained by Yuting He. Last updated 3 months ago.

44.6 match 2.85 score

mlopez-ibanez

irace:Iterated Racing for Automatic Algorithm Configuration

Iterated race is an extension of the Iterated F-race method for the automatic configuration of optimization algorithms, that is, (offline) tuning their parameters by finding the most appropriate settings given a set of instances of an optimization problem. M. López-Ibáñez, J. Dubois-Lacoste, L. Pérez Cáceres, T. Stützle, and M. Birattari (2016) <doi:10.1016/j.orp.2016.09.002>.

Maintained by Manuel López-Ibáñez. Last updated 29 days ago.

algorithm-configuration hyperparameter-tuning irace optimization-algorithms

11.6 match 63 stars 10.28 score 103 scripts 1 dependents

consbiol-unibern

SDMtune:Species Distribution Model Selection

User-friendly framework that enables the training and the evaluation of species distribution models (SDMs). The package implements functions for data driven variable selection and model tuning and includes numerous utilities to display the results. All the functions used to select variables or to tune model hyperparameters have an interactive real-time chart displayed in the 'RStudio' viewer pane during their execution.

Maintained by Sergio Vignali. Last updated 3 months ago.

hyperparameter-tuning species-distribution-modelling variable-selection cpp

15.7 match 25 stars 7.37 score 155 scripts

topepo

caret:Classification and Regression Training

Misc functions for training and plotting classification and regression models.

Maintained by Max Kuhn. Last updated 3 months ago.

5.8 match 1.6k stars 19.24 score 61k scripts 303 dependents

rstudio

tfruns:Training Run Tools for 'TensorFlow'

Create and manage unique directories for each 'TensorFlow' training run. Provides a unique, time stamped directory for each run along with functions to retrieve the directory of the latest run or latest several runs.

Maintained by Tomasz Kalinowski. Last updated 11 months ago.

8.6 match 34 stars 11.80 score 325 scripts 77 dependents

citoverse

cito:Building and Training Neural Networks

The 'cito' package provides a user-friendly interface for training and interpreting deep neural networks (DNN). 'cito' simplifies the fitting of DNNs by supporting the familiar formula syntax, hyperparameter tuning under cross-validation, and helps to detect and handle convergence problems. DNNs can be trained on CPU, GPU and MacOS GPUs. In addition, 'cito' has many downstream functionalities such as various explainable AI (xAI) metrics (e.g. variable importance, partial dependence plots, accumulated local effect plots, and effect estimates) to interpret trained DNNs. 'cito' optionally provides confidence intervals (and p-values) for all xAI metrics and predictions. At the same time, 'cito' is computationally efficient because it is based on the deep learning framework 'torch'. The 'torch' package is native to R, so no Python installation or other API is required for this package.

Maintained by Maximilian Pichler. Last updated 2 months ago.

machine-learning neural-network

10.8 match 42 stars 9.10 score 129 scripts 1 dependents

elsa-yang98

ConformalSmallest:Efficient Tuning-Free Conformal Prediction

An implementation of efficiency first conformal prediction (EFCP) and validity first conformal prediction (VFCP) that demonstrates both validity (coverage guarantee) and efficiency (width guarantee). To learn how to use it, check the vignettes for a quick tutorial. The package is based on the work by Yang Y., Kuchibhotla A.,(2021) <arxiv:2104.13871>.

Maintained by Yachong Yang. Last updated 4 years ago.

22.6 match 2 stars 4.30 score 5 scripts

kogalur

randomForestSRC:Fast Unified Random Forests for Survival, Regression, and Classification (RF-SRC)

Fast OpenMP parallel computing of Breiman's random forests for univariate, multivariate, unsupervised, survival, competing risks, class imbalanced classification and quantile regression. New Mahalanobis splitting for correlated outcomes. Extreme random forests and randomized splitting. Suite of imputation methods for missing data. Fast random forests using subsampling. Confidence regions and standard errors for variable importance. New improved holdout importance. Case-specific importance. Minimal depth variable importance. Visualize trees on your Safari or Google Chrome browser. Anonymous random forests for data privacy.

Maintained by Udaya B. Kogalur. Last updated 2 months ago.

openmp

11.5 match 10 stars 7.90 score 1.2k scripts 12 dependents

biomodhub

biomod2:Ensemble Platform for Species Distribution Modeling

Functions for species distribution modeling, calibration and evaluation, ensemble of models, ensemble forecasting and visualization. The package permits to run consistently up to 10 single models on a presence/absences (resp presences/pseudo-absences) dataset and to combine them in ensemble models and ensemble projections. Some bench of other evaluation and visualisation tools are also available within the package.

Maintained by Maya Gueguen. Last updated 4 days ago.

6.5 match 95 stars 13.88 score 536 scripts 7 dependents

cefet-rj-dal

daltoolbox:Leveraging Experiment Lines to Data Analytics

The natural increase in the complexity of current research experiments and data demands better tools to enhance productivity in Data Analytics. The package is a framework designed to address the modern challenges in data analytics workflows. The package is inspired by Experiment Line concepts. It aims to provide seamless support for users in developing their data mining workflows by offering a uniform data model and method API. It enables the integration of various data mining activities, including data preprocessing, classification, regression, clustering, and time series prediction. It also offers options for hyper-parameter tuning and supports integration with existing libraries and languages. Overall, the package provides researchers with a comprehensive set of functionalities for data science, promoting ease of use, extensibility, and integration with various tools and libraries. Information on Experiment Line is based on Ogasawara et al. (2009) <doi:10.1007/978-3-642-02279-1_20>.

Maintained by Eduardo Ogasawara. Last updated 1 months ago.

13.4 match 1 stars 6.65 score 536 scripts 4 dependents

bioc

CMA:Synthesis of microarray-based classification

This package provides a comprehensive collection of various microarray-based classification algorithms both from Machine Learning and Statistics. Variable Selection, Hyperparameter tuning, Evaluation and Comparison can be performed combined or stepwise in a user-friendly environment.

Maintained by Roman Hornung. Last updated 5 months ago.

classification decisiontree

16.7 match 5.09 score 61 scripts

retowuest

autoMrP:Improving MrP with Ensemble Learning

A tool that improves the prediction performance of multilevel regression with post-stratification (MrP) by combining a number of machine learning methods. For information on the method, please refer to Broniecki, Wüest, Leemann (2020) ''Improving Multilevel Regression with Post-Stratification Through Machine Learning (autoMrP)'' in the 'Journal of Politics'. Final pre-print version: <https://lucasleemann.files.wordpress.com/2020/07/automrp-r2pa.pdf>.

Maintained by Philipp Broniecki. Last updated 5 months ago.

14.9 match 27 stars 5.61 score

irudnyts

openai:R Wrapper for OpenAI API

An R wrapper of OpenAI API endpoints (see <https://platform.openai.com/docs/introduction> for details). This package covers Models, Completions, Chat, Edits, Images, Embeddings, Audio, Files, Fine-tunes, Moderations, and legacy Engines endpoints.

Maintained by Iegor Rudnytskyi. Last updated 4 months ago.

api ml nlp openai

10.4 match 172 stars 8.05 score 336 scripts 5 dependents

tidymodels

hardhat:Construct Modeling Packages

Building modeling packages is hard. A large amount of effort generally goes into providing an implementation for a new method that is efficient, fast, and correct, but often less emphasis is put on the user interface. A good interface requires specialized knowledge about S3 methods and formulas, which the average package developer might not have. The goal of 'hardhat' is to reduce the burden around building new modeling packages by providing functionality for preprocessing, predicting, and validating input.

Maintained by Hannah Frick. Last updated 1 months ago.

5.3 match 103 stars 14.88 score 175 scripts 436 dependents

r4ss

r4ss:R Code for Stock Synthesis

A collection of R functions for use with Stock Synthesis, a fisheries stock assessment modeling platform written in ADMB by Dr. Richard D. Methot at the NOAA Northwest Fisheries Science Center. The functions include tools for summarizing and plotting results, manipulating files, visualizing model parameterizations, and various other common stock assessment tasks. This version of '{r4ss}' is compatible with Stock Synthesis versions 3.24 through 3.30 (specifically version 3.30.23.1, from December 2024). Support for 3.24 models is only through the core functions for reading output and plotting.

Maintained by Ian G. Taylor. Last updated 3 days ago.

fisheries fisheries-stock-assessment stock-synthesis

6.7 match 43 stars 11.38 score 1.0k scripts 2 dependents

mlr-org

bbotk:Black-Box Optimization Toolkit

Features highly configurable search spaces via the 'paradox' package and optimizes every user-defined objective function. The package includes several optimization algorithms e.g. Random Search, Iterated Racing, Bayesian Optimization (in 'mlr3mbo') and Hyperband (in 'mlr3hyperband'). bbotk is the base package of 'mlr3tuning', 'mlr3fselect' and 'miesmuschel'.

Maintained by Marc Becker. Last updated 3 months ago.

bbotk black-box-optimization data-science hyperparameter-optimization hyperparameter-tuning machine-learning mlr3 optimization

7.5 match 22 stars 9.87 score 166 scripts 14 dependents

kapsner

mlexperiments:Machine Learning Experiments

Provides 'R6' objects to perform parallelized hyperparameter optimization and cross-validation. Hyperparameter optimization can be performed with Bayesian optimization (via 'ParBayesianOptimization' <https://cran.r-project.org/package=ParBayesianOptimization>) and grid search. The optimized hyperparameters can be validated using k-fold cross-validation. Alternatively, hyperparameter optimization and validation can be performed with nested cross-validation. While 'mlexperiments' focuses on core wrappers for machine learning experiments, additional learner algorithms can be supplemented by inheriting from the provided learner base class.

Maintained by Lorenz A. Kapsner. Last updated 10 days ago.

cross-validation experiment hyperparameter-optimization hyperparameter-tuning machine-learning nested

9.3 match 5 stars 7.64 score 49 scripts 2 dependents

haotianxu

changepoints:A Collection of Change-Point Detection Methods

Performs a series of offline and/or online change-point detection algorithms for 1) univariate mean: <doi:10.1214/20-EJS1710>, <arXiv:2006.03283>; 2) univariate polynomials: <doi:10.1214/21-EJS1963>; 3) univariate and multivariate nonparametric settings: <doi:10.1214/21-EJS1809>, <doi:10.1109/TIT.2021.3130330>; 4) high-dimensional covariances: <doi:10.3150/20-BEJ1249>; 5) high-dimensional networks with and without missing values: <doi:10.1214/20-AOS1953>, <arXiv:2101.05477>, <arXiv:2110.06450>; 6) high-dimensional linear regression models: <arXiv:2010.10410>, <arXiv:2207.12453>; 7) high-dimensional vector autoregressive models: <arXiv:1909.06359>; 8) high-dimensional self exciting point processes: <arXiv:2006.03572>; 9) dependent dynamic nonparametric random dot product graphs: <arXiv:1911.07494>; 10) univariate mean against adversarial attacks: <arXiv:2105.10417>.

Maintained by Haotian Xu. Last updated 1 years ago.

openblas cpp

12.1 match 12 stars 5.78 score 25 scripts

cran

Directional:A Collection of Functions for Directional Data Analysis

A collection of functions for directional data (including massive data, with millions of observations) analysis. Hypothesis testing, discriminant and regression analysis, MLE of distributions and more are included. The standard textbook for such data is the "Directional Statistics" by Mardia, K. V. and Jupp, P. E. (2000). Other references include: a) Paine J.P., Preston S.P., Tsagris M. and Wood A.T.A. (2018). "An elliptically symmetric angular Gaussian distribution". Statistics and Computing 28(3): 689-697. <doi:10.1007/s11222-017-9756-4>. b) Tsagris M. and Alenazi A. (2019). "Comparison of discriminant analysis methods on the sphere". Communications in Statistics: Case Studies, Data Analysis and Applications 5(4):467--491. <doi:10.1080/23737484.2019.1684854>. c) Paine J.P., Preston S.P., Tsagris M. and Wood A.T.A. (2020). "Spherical regression models with general covariates and anisotropic errors". Statistics and Computing 30(1): 153--165. <doi:10.1007/s11222-019-09872-2>. d) Tsagris M. and Alenazi A. (2024). "An investigation of hypothesis testing procedures for circular and spherical mean vectors". Communications in Statistics-Simulation and Computation, 53(3): 1387--1408. <doi:10.1080/03610918.2022.2045499>. e) Yu Z. and Huang X. (2024). A new parameterization for elliptically symmetric angular Gaussian distributions of arbitrary dimension. Electronic Journal of Statistics, 18(1): 301--334. <doi:10.1214/23-EJS2210>. f) Tsagris M. and Alzeley O. (2024). "Circular and spherical projected Cauchy distributions: A Novel Framework for Circular and Directional Data Modeling". Australian & New Zealand Journal of Statistics (Accepted for publication). <doi:10.1111/anzs.12434>. g) Tsagris M., Papastamoulis P. and Kato S. (2024). "Directional data analysis: spherical Cauchy or Poisson kernel-based distribution". Statistics and Computing (Accepted for publication). <doi:10.48550/arXiv.2409.03292>.

Maintained by Michail Tsagris. Last updated 1 months ago.

15.6 match 3 stars 4.06 score 3 dependents

ngreifer

cobalt:Covariate Balance Tables and Plots

Generate balance tables and plots for covariates of groups preprocessed through matching, weighting or subclassification, for example, using propensity scores. Includes integration with 'MatchIt', 'WeightIt', 'MatchThem', 'twang', 'Matching', 'optmatch', 'CBPS', 'ebal', 'cem', 'sbw', and 'designmatch' for assessing balance on the output of their preprocessing functions. Users can also specify data for balance assessment not generated through the above packages. Also included are methods for assessing balance in clustered or multiply imputed data sets or data sets with multi-category, continuous, or longitudinal treatments.

Maintained by Noah Greifer. Last updated 11 months ago.

causal-inference propensity-scores

4.5 match 75 stars 12.98 score 1.0k scripts 8 dependents

tidymodels

parsnip:A Common API to Modeling and Analysis Functions

A common interface is provided to allow users to specify a model without having to remember the different argument names across different functions or computational engines (e.g. 'R', 'Spark', 'Stan', 'H2O', etc).

Maintained by Max Kuhn. Last updated 3 days ago.

3.3 match 612 stars 16.37 score 3.4k scripts 69 dependents

foucher-y

survivalSL:Super Learner for Survival Prediction from Censored Data

Several functions and S3 methods to construct a super learner in the presence of censored times-to-event and to evaluate its prognostic capacities.

Maintained by Yohann Foucher. Last updated 1 months ago.

14.3 match 2 stars 3.70 score

centerforstatistics-ugent

xnet:Two-Step Kernel Ridge Regression for Network Predictions

Fit a two-step kernel ridge regression model for predicting edges in networks, and carry out cross-validation using shortcuts for swift and accurate performance assessment (Stock et al, 2018 <doi:10.1093/bib/bby095> ).

Maintained by Joris Meys. Last updated 4 years ago.

10.0 match 11 stars 5.30 score 12 scripts

mlr-org

paradox:Define and Work with Parameter Spaces for Complex Algorithms

Define parameter spaces, constraints and dependencies for arbitrary algorithms, to program on such spaces. Also includes statistical designs and random samplers. Objects are implemented as 'R6' classes.

Maintained by Martin Binder. Last updated 8 months ago.

experimental-design hyperparameters mlr3 transformations

4.5 match 29 stars 11.56 score 316 scripts 38 dependents

schlosslab

mikropml:User-Friendly R Package for Supervised Machine Learning Pipelines

An interface to build machine learning models for classification and regression problems. 'mikropml' implements the ML pipeline described by Topçuoğlu et al. (2020) <doi:10.1128/mBio.00434-20> with reasonable default options for data preprocessing, hyperparameter tuning, cross-validation, testing, model evaluation, and interpretation steps. See the website <https://www.schlosslab.org/mikropml/> for more information, documentation, and examples.

Maintained by Kelly Sovacool. Last updated 2 years ago.

machine-learning

6.4 match 56 stars 7.83 score 86 scripts

tidymodels

finetune:Additional Functions for Model Tuning

The ability to tune models is important. 'finetune' enhances the 'tune' package by providing more specialized methods for finding reasonable values of model tuning parameters. Two racing methods described by Kuhn (2014) <arXiv:1405.6974> are included. An iterative search method using generalized simulated annealing (Bohachevsky, Johnson and Stein, 1986) <doi:10.1080/00401706.1986.10488128> is also included.

Maintained by Max Kuhn. Last updated 7 months ago.

5.9 match 62 stars 8.36 score 704 scripts 1 dependents

eagerai

kerastuneR:Interface to 'Keras Tuner'

'Keras Tuner' <https://keras-team.github.io/keras-tuner/> is a hypertuning framework made for humans. It aims at making the life of AI practitioners, hypertuner algorithm creators and model designers as simple as possible by providing them with a clean and easy to use API for hypertuning. 'Keras Tuner' makes moving from a base model to a hypertuned one quick and easy by only requiring you to change a few lines of code.

Maintained by Turgut Abdullayev. Last updated 11 months ago.

hyperparameter-tuning hypertuning keras keras-tuner tensorflow trial

7.5 match 34 stars 6.61 score 48 scripts

marjoleinf

pre:Prediction Rule Ensembles

Derives prediction rule ensembles (PREs). Largely follows the procedure for deriving PREs as described in Friedman & Popescu (2008; <DOI:10.1214/07-AOAS148>), with adjustments and improvements. The main function pre() derives prediction rule ensembles consisting of rules and/or linear terms for continuous, binary, count, multinomial, and multivariate continuous responses. Function gpe() derives generalized prediction ensembles, consisting of rules, hinge and linear functions of the predictor variables.

Maintained by Marjolein Fokkema. Last updated 9 months ago.

5.8 match 58 stars 8.49 score 98 scripts 1 dependents

cran

datarobot:'DataRobot' Predictive Modeling API

For working with the 'DataRobot' predictive modeling platform's API <https://www.datarobot.com/>.

Maintained by AJ Alon. Last updated 1 years ago.

14.1 match 2 stars 3.48 score

nikita-moor

ldatuning:Tuning of the Latent Dirichlet Allocation Models Parameters

This library estimates the best fitting number of topics.

Maintained by Nathan Chaney. Last updated 10 months ago.

4.7 match 75 stars 9.76 score 356 scripts 5 dependents

theoreticalecology

sjSDM:Scalable Joint Species Distribution Modeling

A scalable and fast method for estimating joint Species Distribution Models (jSDMs) for big community data, including eDNA data. The package estimates a full (i.e. non-latent) jSDM with different response distributions (including the traditional multivariate probit model). The package allows to perform variation partitioning (VP) / ANOVA on the fitted models to separate the contribution of environmental, spatial, and biotic associations. In addition, the total R-squared can be further partitioned per species and site to reveal the internal metacommunity structure, see Leibold et al., <doi:10.1111/oik.08618>. The internal structure can then be regressed against environmental and spatial distinctiveness, richness, and traits to analyze metacommunity assembly processes. The package includes support for accounting for spatial autocorrelation and the option to fit responses using deep neural networks instead of a standard linear predictor. As described in Pichler & Hartig (2021) <doi:10.1111/2041-210X.13687>, scalability is achieved by using a Monte Carlo approximation of the joint likelihood implemented via 'PyTorch' and 'reticulate', which can be run on CPUs or GPUs.

Maintained by Maximilian Pichler. Last updated 22 days ago.

deep-learning gpu-acceleration machine-learning species-distribution-modelling species-interactions

5.9 match 69 stars 7.64 score 70 scripts

tidymodels

shinymodels:Interactive Assessments of Models

Launch a 'shiny' application for 'tidymodels' results. For classification or regression models, the app can be used to determine if there is lack of fit or poorly predicted points.

Maintained by Simon Couch. Last updated 5 months ago.

shiny

7.1 match 48 stars 6.21 score 48 scripts

azure

azuremlsdk:Interface to the 'Azure Machine Learning' 'SDK'

Interface to the 'Azure Machine Learning' Software Development Kit ('SDK'). Data scientists can use the 'SDK' to train, deploy, automate, and manage machine learning models on the 'Azure Machine Learning' service. To learn more about 'Azure Machine Learning' visit the website: <https://docs.microsoft.com/en-us/azure/machine-learning/service/overview-what-is-azure-ml>.

Maintained by Diondra Peck. Last updated 3 years ago.

amlcompute azure azure-machine-learning azureml dsi machine-learning rstudio sdk-r

4.9 match 106 stars 8.91 score 221 scripts

zjph602xtc

WeightSVM:Subject Weighted Support Vector Machines

Functions for subject/instance weighted support vector machines (SVM). It uses a modified version of 'libsvm' and is compatible with package 'e1071'. It also allows user defined kernel matrix.

Maintained by Tianchen Xu. Last updated 5 months ago.

cpp

7.8 match 3 stars 5.54 score 11 scripts 7 dependents

auto-optimization

iraceplot:Plots for Visualizing the Data Produced by the 'irace' Package

Graphical visualization tools for analyzing the data produced by 'irace'. The 'iraceplot' package enables users to analyze the performance and the parameter space data sampled by the configuration during the search process. It provides a set of functions that generate different plots to visualize the configurations sampled during the execution of 'irace' and their performance. The functions just require the log file generated by 'irace' and, in some cases, they can be used with user-provided data.

Maintained by Manuel López-Ibáñez. Last updated 1 months ago.

irace parameter-tuning

7.5 match 5 stars 5.70 score 7 scripts

tkonopka

umap:Uniform Manifold Approximation and Projection

Uniform manifold approximation and projection is a technique for dimension reduction. The algorithm was described by McInnes and Healy (2018) in <arXiv:1802.03426>. This package provides an interface for two implementations. One is written from scratch, including components for nearest-neighbor search and for embedding. The second implementation is a wrapper for 'python' package 'umap-learn' (requires separate installation, see vignette for more details).

Maintained by Tomasz Konopka. Last updated 11 months ago.

dimensionality-reduction umap cpp

3.3 match 132 stars 12.74 score 3.6k scripts 43 dependents

rstudio

keras3:R Interface to 'Keras'

Interface to 'Keras' <https://keras.io>, a high-level neural networks API. 'Keras' was developed with a focus on enabling fast experimentation, supports both convolution based networks and recurrent networks (as well as combinations of the two), and runs seamlessly on both CPU and GPU devices.

Maintained by Tomasz Kalinowski. Last updated 3 days ago.

3.1 match 845 stars 13.57 score 264 scripts 2 dependents

nliulab

AutoScore:An Interpretable Machine Learning-Based Automatic Clinical Score Generator

A novel interpretable machine learning-based framework to automate the development of a clinical scoring model for predefined outcomes. Our novel framework consists of six modules: variable ranking with machine learning, variable transformation, score derivation, model selection, domain knowledge-based score fine-tuning, and performance evaluation.The details are described in our research paper<doi:10.2196/21798>. Users or clinicians could seamlessly generate parsimonious sparse-score risk models (i.e., risk scores), which can be easily implemented and validated in clinical practice. We hope to see its application in various medical case studies.

Maintained by Feng Xie. Last updated 13 days ago.

5.3 match 32 stars 7.70 score 30 scripts

nixtla

nixtlar:A Software Development Kit for 'Nixtla''s 'TimeGPT'

A Software Development Kit for working with 'Nixtla''s 'TimeGPT', a foundation model for time series forecasting. 'API' is an acronym for 'application programming interface'; this package allows users to interact with 'TimeGPT' via the 'API'. You can set and validate 'API' keys and generate forecasts via 'API' calls. It is compatible with 'tsibble' and base R. For more details visit <https://docs.nixtla.io/>.

Maintained by Mariana Menchero. Last updated 26 days ago.

5.0 match 30 stars 8.16 score 38 scripts

cran

astrochron:A Computational Tool for Astrochronology

Routines for astrochronologic testing, astronomical time scale construction, and time series analysis <doi:10.1016/j.earscirev.2018.11.015>. Also included are a range of statistical analysis and modeling routines that are relevant to time scale development and paleoclimate analysis.

Maintained by Stephen Meyers. Last updated 6 months ago.

fortran

10.2 match 5 stars 3.85 score 141 scripts

r-forge

DPQ:Density, Probability, Quantile ('DPQ') Computations

Computations for approximations and alternatives for the 'DPQ' (Density (pdf), Probability (cdf) and Quantile) functions for probability distributions in R. Primary focus is on (central and non-central) beta, gamma and related distributions such as the chi-squared, F, and t. -- For several distribution functions, provide functions implementing formulas from Johnson, Kotz, and Kemp (1992) <doi:10.1002/bimj.4710360207> and Johnson, Kotz, and Balakrishnan (1995) for discrete or continuous distributions respectively. This is for the use of researchers in these numerical approximation implementations, notably for my own use in order to improve standard R pbeta(), qgamma(), ..., etc: {'"dpq"'-functions}.

Maintained by Martin Maechler. Last updated 1 months ago.

fortran

6.8 match 5.75 score 43 scripts 1 dependents

wanchanglin

mt:Metabolomics Data Analysis Toolbox

Functions for metabolomics data analysis: data preprocessing, orthogonal signal correction, PCA analysis, PCA-DA analysis, PLS-DA analysis, classification, feature selection, correlation analysis, data visualisation and re-sampling strategies.

Maintained by Wanchang Lin. Last updated 1 years ago.

8.6 match 3 stars 4.57 score 50 scripts

bioc

iClusterPlus:Integrative clustering of multi-type genomic data

Integrative clustering of multiple genomic data using a joint latent variable model.

Maintained by Qianxing Mo. Last updated 4 months ago.

multi-omics clustering fortran openblas

6.8 match 5.76 score 190 scripts

bssherwood

rqPen:Penalized Quantile Regression

Performs penalized quantile regression with LASSO, elastic net, SCAD and MCP penalty functions including group penalties. In addition, offers a group penalty that provides consistent variable selection across quantiles. Provides a function that automatically generates lambdas and evaluates different models with cross validation or BIC, including a large p version of BIC. Below URL provides a link to a work in progress vignette.

Maintained by Ben Sherwood. Last updated 30 days ago.

cpp

5.4 match 17 stars 7.19 score 100 scripts 3 dependents

robingenuer

VSURF:Variable Selection Using Random Forests

Three steps variable selection procedure based on random forests. Initially developed to handle high dimensional data (for which number of variables largely exceeds number of observations), the package is very versatile and can treat most dimensions of data, for regression and supervised classification problems. First step is dedicated to eliminate irrelevant variables from the dataset. Second step aims to select all variables related to the response for interpretation purpose. Third step refines the selection by eliminating redundancy in the set of variables selected by the second step, for prediction purpose. Genuer, R. Poggi, J.-M. and Tuleau-Malot, C. (2015) <https://journal.r-project.org/archive/2015-2/genuer-poggi-tuleaumalot.pdf>.

Maintained by Robin Genuer. Last updated 8 months ago.

5.1 match 36 stars 7.49 score 192 scripts 1 dependents

tidymodels

tidyclust:A Common API to Clustering

A common interface to specifying clustering models, in the same style as 'parsnip'. Creates unified interface across different functions and computational engines.

Maintained by Emil Hvitfeldt. Last updated 2 months ago.

5.1 match 111 stars 7.45 score 139 scripts

nanxstats

stackgbm:Stacked Gradient Boosting Machines

A minimalist implementation of model stacking by Wolpert (1992) <doi:10.1016/S0893-6080(05)80023-1> for boosted tree models. A classic, two-layer stacking model is implemented, where the first layer generates features using gradient boosting trees, and the second layer employs a logistic regression model that uses these features as inputs. Utilities for training the base models and parameters tuning are provided, allowing users to experiment with different ensemble configurations easily. It aims to provide a simple and efficient way to combine multiple gradient boosting models to improve predictive model performance and robustness.

Maintained by Nan Xiao. Last updated 11 months ago.

automl catboost decision-trees ensemble-learning gbdt gbm gradient-boosting lightgbm machine-learning model-stacking xgboost

6.9 match 25 stars 5.40 score 3 scripts

nelson-gon

manymodelr:Build and Tune Several Models

Frequently one needs a convenient way to build and tune several models in one go.The goal is to provide a number of machine learning convenience functions. It provides the ability to build, tune and obtain predictions of several models in one function. The models are built using functions from 'caret' with easier to read syntax. Kuhn(2014) <arXiv:1405.6974>.

Maintained by Nelson Gonzabato. Last updated 3 years ago.

analysis-of-variance anova correlation correlation-coefficient generalized-linear-models gradient-boosting-decision-trees knn-classification linear-models linear-regression machine-learning missing-values models r-programming random-forest-algorithm regression-models

6.9 match 2 stars 5.30 score 50 scripts

bioc

tidytof:Analyze High-dimensional Cytometry Data Using Tidy Data Principles

This package implements an interactive, scientific analysis pipeline for high-dimensional cytometry data built using tidy data principles. It is specifically designed to play well with both the tidyverse and Bioconductor software ecosystems, with functionality for reading/writing data files, data cleaning, preprocessing, clustering, visualization, modeling, and other quality-of-life functions. tidytof implements a "grammar" of high-dimensional cytometry data analysis.

Maintained by Timothy Keyes. Last updated 5 months ago.

singlecell flowcytometry bioinformatics cytometry data-science single-cell tidyverse cpp

5.0 match 19 stars 7.26 score 35 scripts

bnaras

PMA:Penalized Multivariate Analysis

Performs Penalized Multivariate Analysis: a penalized matrix decomposition, sparse principal components analysis, and sparse canonical correlation analysis, described in Witten, Tibshirani and Hastie (2009) <doi:10.1093/biostatistics/kxp008> and Witten and Tibshirani (2009) Extensions of sparse canonical correlation analysis, with applications to genomic data <doi:10.2202/1544-6115.1470>.

Maintained by Balasubramanian Narasimhan. Last updated 1 years ago.

cpp

5.0 match 4 stars 7.24 score 254 scripts 11 dependents

antonio-pgarcia

evoper:Evolutionary Parameter Estimation for 'Repast Simphony' Models

The EvoPER, Evolutionary Parameter Estimation for Individual-based Models is an extensible package providing optimization driven parameter estimation methods using metaheuristics and evolutionary computation techniques (Particle Swarm Optimization, Simulated Annealing, Ant Colony Optimization for continuous domains, Tabu Search, Evolutionary Strategies, ...) which could be more efficient and require, in some cases, fewer model evaluations than alternatives relying on experimental design. Currently there are built in support for models developed with 'Repast Simphony' Agent-Based framework (<https://repast.github.io/>) and with NetLogo (<https://ccl.northwestern.edu/netlogo/>) which are the most used frameworks for Agent-based modeling.

Maintained by Antonio Prestes Garcia. Last updated 5 years ago.

openjdk

8.7 match 6 stars 3.92 score 28 scripts

e-sensing

sits:Satellite Image Time Series Analysis for Earth Observation Data Cubes

An end-to-end toolkit for land use and land cover classification using big Earth observation data, based on machine learning methods applied to satellite image data cubes, as described in Simoes et al (2021) <doi:10.3390/rs13132428>. Builds regular data cubes from collections in AWS, Microsoft Planetary Computer, Brazil Data Cube, Copernicus Data Space Environment (CDSE), Digital Earth Africa, Digital Earth Australia, NASA HLS using the Spatio-temporal Asset Catalog (STAC) protocol (<https://stacspec.org/>) and the 'gdalcubes' R package developed by Appel and Pebesma (2019) <doi:10.3390/data4030092>. Supports visualization methods for images and time series and smoothing filters for dealing with noisy time series. Includes functions for quality assessment of training samples using self-organized maps as presented by Santos et al (2021) <doi:10.1016/j.isprsjprs.2021.04.014>. Includes methods to reduce training samples imbalance proposed by Chawla et al (2002) <doi:10.1613/jair.953>. Provides machine learning methods including support vector machines, random forests, extreme gradient boosting, multi-layer perceptrons, temporal convolutional neural networks proposed by Pelletier et al (2019) <doi:10.3390/rs11050523>, and temporal attention encoders by Garnot and Landrieu (2020) <doi:10.48550/arXiv.2007.00586>. Supports GPU processing of deep learning models using torch <https://torch.mlverse.org/>. Performs efficient classification of big Earth observation data cubes and includes functions for post-classification smoothing based on Bayesian inference as described by Camara et al (2024) <doi:10.3390/rs16234572>, and methods for active learning and uncertainty assessment. Supports region-based time series analysis using package supercells <https://jakubnowosad.com/supercells/>. Enables best practices for estimating area and assessing accuracy of land change as recommended by Olofsson et al (2014) <doi:10.1016/j.rse.2014.02.015>. Minimum recommended requirements: 16 GB RAM and 4 CPU dual-core.

Maintained by Gilberto Camara. Last updated 30 days ago.

big-earth-data cbers earth-observation eo-datacubes geospatial image-time-series land-cover-classification landsat planetary-computer r-spatial remote-sensing rspatial satellite-image-time-series satellite-imagery sentinel-2 stac-api stac-catalog cpp

3.5 match 494 stars 9.50 score 384 scripts

kelvynbladen

randomForestVIP:Tune Random Forests Based on Variable Importance and Plot Results

Functions for assessing variable relations and associations prior to modeling with a Random Forest algorithm (although these are relevant for any predictive model). Metrics such as partial correlations and variance inflation factors are tabulated as well as plotted for the user. A function is available for tuning the main Random Forest hyper-parameter based on model performance and variable importance metrics. This grid-search technique provides tables and plots showing the effect of the main hyper-parameter on each of the assessment metrics. It also returns each of the evaluated models to the user. The package also provides superior variable importance plots for individual models. All of the plots are developed so that the user has the ability to edit and improve further upon the plots. Derivations and methodology are described in Bladen (2022) <https://digitalcommons.usu.edu/etd/8587/>.

Maintained by Kelvyn Bladen. Last updated 1 years ago.

8.8 match 3.78 score 12 scripts

bioc

XDE:XDE: a Bayesian hierarchical model for cross-study analysis of differential gene expression

Multi-level model for cross-study detection of differential gene expression.

Maintained by Robert Scharpf. Last updated 5 months ago.

microarray differentialexpression cpp

7.8 match 4.20 score 10 scripts

gdurif

plsgenomics:PLS Analyses for Genomics

Routines for PLS-based genomic analyses, implementing PLS methods for classification with microarray data and prediction of transcription factor activities from combined ChIP-chip analysis. The >=1.2-1 versions include two new classification methods for microarray data: GSIM and Ridge PLS. The >=1.3 versions includes a new classification method combining variable selection and compression in logistic regression context: logit-SPLS; and an adaptive version of the sparse PLS.

Maintained by Ghislain Durif. Last updated 12 months ago.

5.9 match 5.55 score 140 scripts 2 dependents

thomasp85

ggraph:An Implementation of Grammar of Graphics for Graphs and Networks

The grammar of graphics as implemented in ggplot2 is a poor fit for graph and network visualizations due to its reliance on tabular data input. ggraph is an extension of the ggplot2 API tailored to graph visualizations and provides the same flexible approach to building up plots layer by layer.

Maintained by Thomas Lin Pedersen. Last updated 1 years ago.

ggplot-extension ggplot2 graph-visualization network-visualization visualization cpp

1.9 match 1.1k stars 16.96 score 9.2k scripts 111 dependents

chrhennig

fpc:Flexible Procedures for Clustering

Various methods for clustering and cluster validation. Fixed point clustering. Linear regression clustering. Clustering by merging Gaussian mixture components. Symmetric and asymmetric discriminant projections for visualisation of the separation of groupings. Cluster validation statistics for distance based clustering including corrected Rand index. Standardisation of cluster validation statistics by random clusterings and comparison between many clustering methods and numbers of clusters based on this. Cluster-wise cluster stability assessment. Methods for estimation of the number of clusters: Calinski-Harabasz, Tibshirani and Walther's prediction strength, Fang and Wang's bootstrap stability. Gaussian/multinomial mixture fitting for mixed continuous/categorical variables. Variable-wise statistics for cluster interpretation. DBSCAN clustering. Interface functions for many clustering methods implemented in R, including estimating the number of clusters with kmeans, pam and clara. Modality diagnosis for Gaussian mixtures. For an overview see package?fpc.

Maintained by Christian Hennig. Last updated 6 months ago.

3.4 match 11 stars 9.25 score 2.6k scripts 70 dependents

leonawicz

tabr:Music Notation Syntax, Manipulation, Analysis and Transcription in R

Provides a music notation syntax and a collection of music programming functions for generating, manipulating, organizing, and analyzing musical information in R. Music syntax can be entered directly in character strings, for example to quickly transcribe short pieces of music. The package contains functions for directly performing various mathematical, logical and organizational operations and musical transformations on special object classes that facilitate working with music data and notation. The same music data can be organized in tidy data frames for a familiar and powerful approach to the analysis of large amounts of structured music data. Functions are available for mapping seamlessly between these formats and their representations of musical information. The package also provides an API to 'LilyPond' (<https://lilypond.org/>) for transcribing musical representations in R into tablature ("tabs") and sheet music. 'LilyPond' is open source music engraving software for generating high quality sheet music based on markup syntax. The package generates 'LilyPond' files from R code and can pass them to the 'LilyPond' command line interface to be rendered into sheet music PDF files or inserted into R markdown documents. The package offers nominal MIDI file output support in conjunction with rendering sheet music. The package can read MIDI files and attempts to structure the MIDI data to integrate as best as possible with the data structures and functionality found throughout the package.

Maintained by Matthew Leonawicz. Last updated 6 months ago.

guitar-tablature lilypond lilypond-api music-analysis music-data music-notation music-programming music-syntax music-transcription sheet-music

4.0 match 132 stars 7.87 score 94 scripts

tidymodels

rsample:General Resampling Infrastructure

Classes and functions to create and summarize different types of resampling objects (e.g. bootstrap, cross-validation).

Maintained by Hannah Frick. Last updated 4 days ago.

1.9 match 341 stars 16.72 score 5.2k scripts 79 dependents

kollerma

robustlmm:Robust Linear Mixed Effects Models

Implements the Robust Scoring Equations estimator to fit linear mixed effects models robustly. Robustness is achieved by modification of the scoring equations combined with the Design Adaptive Scale approach.

Maintained by Manuel Koller. Last updated 1 years ago.

openblas cpp

3.5 match 28 stars 8.79 score 138 scripts

hkestler

TunePareto:Multi-Objective Parameter Tuning for Classifiers

Generic methods for parameter tuning of classification algorithms using multiple scoring functions (Muessel et al. (2012), <doi:10.18637/jss.v046.i05>).

Maintained by Hans Kestler. Last updated 1 years ago.

8.8 match 1 stars 3.52 score 92 scripts 2 dependents

dfalbel

cloudml:Interface to the Google Cloud Machine Learning Platform

Interface to the Google Cloud Machine Learning Platform <https://cloud.google.com/ml-engine>, which provides cloud tools for training machine learning models.

Maintained by Daniel Falbel. Last updated 6 years ago.

8.1 match 3.85 score 141 scripts

sparklyr

sparklyr:R Interface to Apache Spark

R interface to Apache Spark, a fast and general engine for big data processing, see <https://spark.apache.org/>. This package supports connecting to local and remote Apache Spark clusters, provides a 'dplyr' compatible back-end, and provides an interface to Spark's built-in machine learning algorithms.

Maintained by Edgar Ruiz. Last updated 8 days ago.

apache-spark distributed dplyr ide livy machine-learning remote-clusters spark sparklyr

2.0 match 959 stars 15.16 score 4.0k scripts 21 dependents

mlr-org

mlr3pipelines:Preprocessing Operators and Pipelines for 'mlr3'

Dataflow programming toolkit that enriches 'mlr3' with a diverse set of pipelining operators ('PipeOps') that can be composed into graphs. Operations exist for data preprocessing, model fitting, and ensemble learning. Graphs can themselves be treated as 'mlr3' 'Learners' and can therefore be resampled, benchmarked, and tuned.

Maintained by Martin Binder. Last updated 7 days ago.

bagging data-science dataflow-programming ensemble-learning machine-learning mlr3 pipelines preprocessing stacking

2.4 match 141 stars 12.36 score 448 scripts 7 dependents

microsoft

finnts:Microsoft Finance Time Series Forecasting Framework

Automated time series forecasting developed by Microsoft Finance. The Microsoft Finance Time Series Forecasting Framework, aka Finn, can be used to forecast any component of the income statement, balance sheet, or any other area of interest by finance. Any numerical quantity over time, Finn can be used to forecast it. While it can be applied outside of the finance domain, Finn was built to meet the needs of financial analysts to better forecast their businesses within a company, and has a lot of built in features that are specific to the needs of financial forecasters. Happy forecasting!

Maintained by Mike Tokic. Last updated 24 days ago.

business data-science feature-selection finance finnts forecasting machine-learning microsoft time-series

3.1 match 193 stars 9.45 score 39 scripts

adriancorrendo

soiltestcorr:Soil Test Correlation and Calibration

A compilation of functions designed to assist users on the correlation analysis of crop yield and soil test values. Functions to estimate crop response patterns to soil nutrient availability and critical soil test values using various approaches such as: 1) the modified arcsine-log calibration curve (Correndo et al. (2017) <doi:10.1071/CP16444>); 2) the graphical Cate-Nelson quadrants analysis (Cate & Nelson (1965)), 3) the statistical Cate-Nelson quadrants analysis (Cate & Nelson (1971) <doi:10.2136/sssaj1971.03615995003500040048x>), 4) the linear-plateau regression (Anderson & Nelson (1975) <doi:10.2307/2529422>), 5) the quadratic-plateau regression (Bullock & Bullock (1994) <doi:10.2134/agronj1994.00021962008600010033x>), and 6) the Mitscherlich-type exponential regression (Melsted & Peck (1977) <doi:10.2134/asaspecpub29.c1>). The package development stemmed from ongoing work with the Fertilizer Recommendation Support Tool (FRST) and Feed the Future Innovation Lab for Collaborative Research on Sustainable Intensification (SIIL) projects.

Maintained by Adrian A. Correndo. Last updated 9 months ago.

4.8 match 7 stars 6.04 score 31 scripts

mlr-org

mlr3verse:Easily Install and Load the 'mlr3' Package Family

The 'mlr3' package family is a set of packages for machine-learning purposes built in a modular fashion. This wrapper package is aimed to simplify the installation and loading of the core 'mlr3' packages. Get more information about the 'mlr3' project at <https://mlr3book.mlr-org.com/>.

Maintained by Marc Becker. Last updated 2 months ago.

machine-learning meta mlr3

3.3 match 55 stars 8.36 score 720 scripts 1 dependents

mlverse

tabnet:Fit 'TabNet' Models for Classification and Regression

Implements the 'TabNet' model by Sercan O. Arik et al. (2019) <doi:10.48550/arXiv.1908.07442> with 'Coherent Hierarchical Multi-label Classification Networks' by Giunchiglia et al. <doi:10.48550/arXiv.2010.10151> and provides a consistent interface for fitting and creating predictions. It's also fully compatible with the 'tidymodels' ecosystem.

Maintained by Christophe Regouby. Last updated 6 months ago.

tabnet

3.0 match 109 stars 9.00 score 65 scripts

anothersamwilson

ParBayesianOptimization:Parallel Bayesian Optimization of Hyperparameters

Fast, flexible framework for implementing Bayesian optimization of model hyperparameters according to the methods described in Snoek et al. <arXiv:1206.2944>. The package allows the user to run scoring function in parallel, save intermediary results, and tweak other aspects of the process to fully utilize the computing resources available to the user.

Maintained by Samuel Wilson. Last updated 2 years ago.

bayesian-inference machine-learning

3.8 match 108 stars 7.19 score 86 scripts

nutterb

pixiedust:Tables so Beautifully Fine-Tuned You Will Believe It's Magic

The introduction of the 'broom' package has made converting model objects into data frames as simple as a single function. While the 'broom' package focuses on providing tidy data frames that can be used in advanced analysis, it deliberately stops short of providing functionality for reporting models in publication-ready tables. 'pixiedust' provides this functionality with a programming interface intended to be similar to 'ggplot2's system of layers with fine tuned control over each cell of the table. Options for output include printing to the console and to the common markdown formats (markdown, HTML, and LaTeX). With a little 'pixiedust' (and happy thoughts) tables can really fly.

Maintained by Benjamin Nutter. Last updated 1 years ago.

3.4 match 180 stars 8.01 score 94 scripts

statnet

ergm:Fit, Simulate and Diagnose Exponential-Family Models for Networks

An integrated set of tools to analyze and simulate networks based on exponential-family random graph models (ERGMs). 'ergm' is a part of the Statnet suite of packages for network analysis. See Hunter, Handcock, Butts, Goodreau, and Morris (2008) <doi:10.18637/jss.v024.i03> and Krivitsky, Hunter, Morris, and Klumb (2023) <doi:10.18637/jss.v105.i06>.

Maintained by Pavel N. Krivitsky. Last updated 5 days ago.

1.8 match 100 stars 15.36 score 1.4k scripts 36 dependents

agorstras

ahaz:Regularization for Semiparametric Additive Hazards Regression

Computationally efficient procedures for regularized estimation with the semiparametric additive hazards regression model.

Maintained by Anders Gorst-Rasmussen. Last updated 4 months ago.

openblas

9.9 match 1 stars 2.70 score 20 scripts 6 dependents

lbelzile

BMAmevt:Multivariate Extremes: Bayesian Estimation of the Spectral Measure

Toolkit for Bayesian estimation of the dependence structure in multivariate extreme value parametric models, following Sabourin and Naveau (2014) <doi:10.1016/j.csda.2013.04.021> and Sabourin, Naveau and Fougeres (2013) <doi:10.1007/s10687-012-0163-0>.

Maintained by Leo Belzile. Last updated 2 years ago.

6.8 match 3.90 score 16 scripts

rgcca-factory

RGCCA:Regularized and Sparse Generalized Canonical Correlation Analysis for Multiblock Data

Multi-block data analysis concerns the analysis of several sets of variables (blocks) observed on the same group of individuals. The main aims of the RGCCA package are: to study the relationships between blocks and to identify subsets of variables of each block which are active in their relationships with the other blocks. This package allows to (i) run R/SGCCA and related methods, (ii) help the user to find out the optimal parameters for R/SGCCA such as regularization parameters (tau or sparsity), (iii) evaluate the stability of the RGCCA results and their significance, (iv) build predictive models from the R/SGCCA. (v) Generic print() and plot() functions apply to all these functionalities.

Maintained by Arthur Tenenhaus. Last updated 8 months ago.

3.5 match 12 stars 7.43 score 74 scripts

r-lib

generics:Common S3 Generics not Provided by Base R Methods Related to Model Fitting

In order to reduce potential package dependencies and conflicts, generics provides a number of commonly used S3 generics.

Maintained by Hadley Wickham. Last updated 1 years ago.

1.9 match 61 stars 14.00 score 131 scripts 9.8k dependents

fcampelo

MOEADr:Component-Wise MOEA/D Implementation

Modular implementation of Multiobjective Evolutionary Algorithms based on Decomposition (MOEA/D) [Zhang and Li (2007), <DOI:10.1109/TEVC.2007.892759>] for quick assembling and testing of new algorithmic components, as well as easy replication of published MOEA/D proposals. The full framework is documented in a paper published in the Journal of Statistical Software [<doi:10.18637/jss.v092.i06>].

Maintained by Felipe Campelo. Last updated 2 years ago.

moead multiobjective-optimization

4.1 match 20 stars 6.30 score 40 scripts

andrisignorell

ModTools:Building Regression and Classification Models

Consistent user interface to the most common regression and classification algorithms, such as random forest, neural networks, C5 trees and support vector machines, complemented with a handful of auxiliary functions, such as variable importance and a tuning function for the parameters.

Maintained by Andri Signorell. Last updated 2 months ago.

6.1 match 2 stars 4.20 score 3 scripts

yanyachen

FinCovRegularization:Covariance Matrix Estimation and Regularization for Finance

Estimation and regularization for covariance matrix of asset returns. For covariance matrix estimation, three major types of factor models are included: macroeconomic factor model, fundamental factor model and statistical factor model. For covariance matrix regularization, four regularized estimators are included: banding, tapering, hard-thresholding and soft- thresholding. The tuning parameters of these regularized estimators are selected via cross-validation.

Maintained by YaChen Yan. Last updated 8 years ago.

5.7 match 7 stars 4.30 score 19 scripts 1 dependents

mlr-org

mlr3viz:Visualizations for 'mlr3'

Visualization package of the 'mlr3' ecosystem. It features plots for mlr3 objects such as tasks, learners, predictions, benchmark results, tuning instances and filters via the 'autoplot()' generic of 'ggplot2'. The package draws plots with the 'viridis' color palette and applies the minimal theme. Visualizations include barplots, boxplots, histograms, ROC curves, and Precision-Recall curves.

Maintained by Marc Becker. Last updated 4 months ago.

ggplot2 mlr3 visualization visualizations

2.5 match 45 stars 9.58 score 364 scripts 4 dependents

yunuuuu

ggalign:A 'ggplot2' Extension for Consistent Axis Alignment

A 'ggplot2' extension offers various tools the creation of complex, multi-plot visualizations. Built on the familiar grammar of graphics, it provides intuitive tools to align and organize plots, making it ideal for complex visualizations. It excels in multi-omics research—such as genomics and microbiomes—by simplifying the visualization of intricate relationships between datasets, for example, linking genes to pathways. Whether you need to stack plots, arrange them around a central figure, or use a circular layout, 'ggalign' delivers flexibility and accuracy with minimal effort.

Maintained by Yun Peng. Last updated 14 hours ago.

complex-heatmaps dendrogram dendrogram-heatmap ggplot ggplot-extension ggplot2 heatmap heatmap-visualization heatmaps marginal-plots oncoplot oncoprint tanglegram upset upsetplot

3.3 match 267 stars 7.08 score 27 scripts

flr

mse:Tools for Running Management Strategy Evaluations using FLR

A set of functions and methods to enable the development and running of Management Strategy Evaluation (MSE) analyses, using the FLR packages and classes and the a4a methods and algorithms.

Maintained by Iago Mosqueira. Last updated 20 days ago.

simulation mse fisheries flr a4a

3.3 match 4 stars 7.04 score 137 scripts 3 dependents

aalfons

robmed:(Robust) Mediation Analysis

Perform mediation analysis via the fast-and-robust bootstrap test ROBMED (Alfons, Ates & Groenen, 2022a; <doi:10.1177/1094428121999096>), as well as various other methods. Details on the implementation and code examples can be found in Alfons, Ates, and Groenen (2022b) <doi:10.18637/jss.v103.i13>. Further discussion on robust mediation analysis can be found in Alfons & Schley (2024) <doi:10.31234/osf.io/2hqdy>.

Maintained by Andreas Alfons. Last updated 14 days ago.

3.6 match 6 stars 6.35 score 31 scripts 1 dependents

miriamesteve

eat:Efficiency Analysis Trees

Functions are provided to determine production frontiers and technical efficiency measures through non-parametric techniques based upon regression trees. The package includes code for estimating radial input, output, directional and additive measures, plotting graphical representations of the scores and the production frontiers by means of trees, and determining rankings of importance of input variables in the analysis. Additionally, an adaptation of Random Forest by a set of individual Efficiency Analysis Trees for estimating technical efficiency is also included. More details in: <doi:10.1016/j.eswa.2020.113783>.

Maintained by Miriam Esteve. Last updated 3 years ago.

4.8 match 5 stars 4.68 score 19 scripts

philipppro

tuneRanger:Tune Random Forest of the 'ranger' Package

Tuning random forest with one line. The package is mainly based on the packages 'ranger' and 'mlrMBO'.

Maintained by Philipp Probst. Last updated 12 months ago.

3.5 match 33 stars 6.38 score 121 scripts 1 dependents

tomkellygenetics

leiden:R Implementation of Leiden Clustering Algorithm

Implements the 'Python leidenalg' module to be called in R. Enables clustering using the leiden algorithm for partition a graph into communities. See the 'Python' repository for more details: <https://github.com/vtraag/leidenalg> Traag et al (2018) From Louvain to Leiden: guaranteeing well-connected communities. <arXiv:1810.08473>.

Maintained by S. Thomas Kelly. Last updated 10 months ago.

2.5 match 38 stars 8.90 score 180 scripts 3 dependents

edhofman

ReSurv:Machine Learning Models For Predicting Claim Counts

Prediction of claim counts using the feature based development factors introduced in the manuscript <doi:10.48550/arXiv.2312.14549>. Implementation of Neural Networks, Extreme Gradient Boosting, and Cox model with splines to optimise the partial log-likelihood of proportional hazard models.

Maintained by Emil Hofman. Last updated 4 months ago.

3.8 match 2 stars 5.87 score 21 scripts

andyliaw-mrk

randomForest:Breiman and Cutlers Random Forests for Classification and Regression

Classification and regression based on a forest of trees using random inputs, based on Breiman (2001) <DOI:10.1023/A:1010933404324>.

Maintained by Andy Liaw. Last updated 6 months ago.

fortran

1.8 match 47 stars 12.11 score 35k scripts 282 dependents

alec-stashevsky

blocklength:Select an Optimal Block-Length to Bootstrap Dependent Data (Block Bootstrap)

A set of functions to select the optimal block-length for a dependent bootstrap (block-bootstrap). Includes the Hall, Horowitz, and Jing (1995) <doi:10.1093/biomet/82.3.561> subsampling-based cross-validation method, the Politis and White (2004) <doi:10.1081/ETC-120028836> Spectral Density Plug-in method, including the Patton, Politis, and White (2009) <doi:10.1080/07474930802459016> correction, and the Lahiri, Furukawa, and Lee (2007) <doi:10.1016/j.stamet.2006.08.002> nonparametric plug-in method, with a corresponding set of S3 plot methods.

Maintained by Alec Stashevsky. Last updated 6 days ago.

block-bootstrap block-resampling blocklength boot bootstrap depedent-bootstrap dependent horowitz inference meboot politis resample stats time time-series time-series-analysis tseries

4.5 match 4 stars 4.78 score 8 scripts

blasbenito

spatialRF:Easy Spatial Modeling with Random Forest

Automatic generation and selection of spatial predictors for spatial regression with Random Forest. Spatial predictors are surrogates of variables driving the spatial structure of a response variable. The package offers two methods to generate spatial predictors from a distance matrix among training cases: 1) Moran's Eigenvector Maps (MEMs; Dray, Legendre, and Peres-Neto 2006 <DOI:10.1016/j.ecolmodel.2006.02.015>): computed as the eigenvectors of a weighted matrix of distances; 2) RFsp (Hengl et al. <DOI:10.7717/peerj.5518>): columns of the distance matrix used as spatial predictors. Spatial predictors help minimize the spatial autocorrelation of the model residuals and facilitate an honest assessment of the importance scores of the non-spatial predictors. Additionally, functions to reduce multicollinearity, identify relevant variable interactions, tune random forest hyperparameters, assess model transferability via spatial cross-validation, and explore model results via partial dependence curves and interaction surfaces are included in the package. The modelling functions are built around the highly efficient 'ranger' package (Wright and Ziegler 2017 <DOI:10.18637/jss.v077.i01>).

Maintained by Blas M. Benito. Last updated 3 years ago.

random-forest spatial-analysis spatial-regression

3.9 match 114 stars 5.45 score 49 scripts

hugogogo

varband:Variable Banding of Large Precision Matrices

Implementation of the variable banding procedure for modeling local dependence and estimating precision matrices that is introduced in Yu & Bien (2016) and is available at <https://arxiv.org/abs/1604.07451>.

Maintained by Guo Yu. Last updated 7 years ago.

openblas cpp

5.3 match 2 stars 4.00 score 10 scripts

cran

glmnetr:Nested Cross Validation for the Relaxed Lasso and Other Machine Learning Models

Cross validation informed Relaxed LASSO, Artificial Neural Network (ANN), gradient boosting machine ('xgboost'), Random Forest ('RandomForestSRC'), Oblique Random Forest ('aorsf'), Recursive Partitioning ('RPART') or step wise regression models are fit. Cross validation leave out samples (leading to nested cross validation) or bootstrap out-of-bag samples are used to evaluate and compare performances between these models with results presented in tabular or graphical means. Calibration plots can also be generated, again based upon (outer nested) cross validation or bootstrap leave out (out of bag) samples. For some datasets, for example when the design matrix is not of full rank, 'glmnet' may have very long run times when fitting the relaxed lasso model, from our experience when fitting Cox models on data with many predictors and many patients, making it difficult to get solutions from either glmnet() or cv.glmnet(). This may be remedied by using the 'path=TRUE' option when calling glmnet() and cv.glmnet(). Within the glmnetr package the approach of path=TRUE is taken by default. When fitting not a relaxed lasso model but an elastic-net model, then the R-packages 'nestedcv' <https://cran.r-project.org/package=nestedcv>, 'glmnetSE' <https://cran.r-project.org/package=glmnetSE> or others may provide greater functionality when performing a nested CV. Use of the 'glmnetr' has many similarities to the 'glmnet' package and it is recommended that the user of 'glmnetr' also become familiar with the 'glmnet' package <https://cran.r-project.org/package=glmnet>, with the "An Introduction to 'glmnet'" and "The Relaxed Lasso" being especially useful in this regard.

Maintained by Walter K Kremers. Last updated 2 months ago.

5.8 match 3.67 score 2 scripts

business-science

timetk:A Tool Kit for Working with Time Series

Easy visualization, wrangling, and feature engineering of time series data for forecasting and machine learning prediction. Consolidates and extends time series functionality from packages including 'dplyr', 'stats', 'xts', 'forecast', 'slider', 'padr', 'recipes', and 'rsample'.

Maintained by Matt Dancho. Last updated 1 years ago.

coercion coercion-functions data-mining dplyr forecast forecasting forecasting-models machine-learning series-decomposition series-signature tibble tidy tidyquant tidyverse time time-series timeseries

1.5 match 625 stars 14.15 score 4.0k scripts 16 dependents

gesistsa

grafzahl:Supervised Machine Learning for Textual Data Using Transformers and 'Quanteda'

Duct tape the 'quanteda' ecosystem (Benoit et al., 2018) <doi:10.21105/joss.00774> to modern Transformer-based text classification models (Wolf et al., 2020) <doi:10.18653/v1/2020.emnlp-demos.6>, in order to facilitate supervised machine learning for textual data. This package mimics the behaviors of 'quanteda.textmodels' and provides a function to setup the 'Python' environment to use the pretrained models from 'Hugging Face' <https://huggingface.co/>. More information: <doi:10.5117/CCR2023.1.003.CHAN>.

Maintained by Chung-hong Chan. Last updated 24 days ago.

3.5 match 41 stars 5.91 score 3 scripts

ssnn-airr

shazam:Immunoglobulin Somatic Hypermutation Analysis

Provides a computational framework for analyzing mutations in immunoglobulin (Ig) sequences. Includes methods for Bayesian estimation of antigen-driven selection pressure, mutational load quantification, building of somatic hypermutation (SHM) models, and model-dependent distance calculations. Also includes empirically derived models of SHM for both mice and humans. Citations: Gupta and Vander Heiden, et al (2015) <doi:10.1093/bioinformatics/btv359>, Yaari, et al (2012) <doi:10.1093/nar/gks457>, Yaari, et al (2013) <doi:10.3389/fimmu.2013.00358>, Cui, et al (2016) <doi:10.4049/jimmunol.1502263>.

Maintained by Susanna Marquez. Last updated 2 months ago.

2.8 match 7.43 score 222 scripts 2 dependents

cran

catalytic:Tools for Applying Catalytic Priors in Statistical Modeling

To improve estimation accuracy and stability in statistical modeling, catalytic prior distributions are employed, integrating observed data with synthetic data generated from a simpler model's predictive distribution. This approach enhances model robustness, stability, and flexibility in complex data scenarios. The catalytic prior distributions are introduced by 'Huang et al.' (2020, <doi:10.1073/pnas.1920913117>), Li and Huang (2023, <doi:10.48550/arXiv.2312.01411>).

Maintained by Dongming Huang. Last updated 3 months ago.

6.6 match 3.18 score

molaison

MantaID:A Machine-Learning Based Tool to Automate the Identification of Biological Database IDs

The number of biological databases is growing rapidly, but different databases use different IDs to refer to the same biological entity. The inconsistency in IDs impedes the integration of various types of biological data. To resolve the problem, we developed 'MantaID', a data-driven, machine-learning based approach that automates identifying IDs on a large scale. The 'MantaID' model's prediction accuracy was proven to be 99%, and it correctly and effectively predicted 100,000 ID entries within two minutes. 'MantaID' supports the discovery and exploitation of ID patterns from large quantities of databases. (e.g., up to 542 biological databases). An easy-to-use freely available open-source software R package, a user-friendly web application, and API were also developed for 'MantaID' to improve applicability. To our knowledge, 'MantaID' is the first tool that enables an automatic, quick, accurate, and comprehensive identification of large quantities of IDs, and can therefore be used as a starting point to facilitate the complex assimilation and aggregation of biological data across diverse databases.

Maintained by Zhengpeng Zeng. Last updated 6 months ago.

5.5 match 3.78 score 2 scripts

mikkelvembye

AIscreenR:AI Screening Tools in R for Systematic Reviewing

Provides functions to conduct title and abstract screening in systematic reviews using large language models, such as the Generative Pre-trained Transformer (GPT) models from 'OpenAI' <https://platform.openai.com/>. These functions can enhance the quality of title and abstract screenings while reducing the total screening time significantly. In addition, the package includes tools for quality assessment of title and abstract screenings, as described in Vembye, Christensen, Mølgaard, and Schytt (2024) <DOI:10.31219/osf.io/yrhzm>.

Maintained by Mikkel H. Vembye. Last updated 2 months ago.

gpt openai screening systematic-review

3.4 match 10 stars 6.11 score 7 scripts

xiaooupan

FarmTest:Factor-Adjusted Robust Multiple Testing

Performs robust multiple testing for means in the presence of known and unknown latent factors presented in Fan et al.(2019) "FarmTest: Factor-Adjusted Robust Multiple Testing With Approximate False Discovery Control" <doi:10.1080/01621459.2018.1527700>. Implements a series of adaptive Huber methods combined with fast data-drive tuning schemes proposed in Ke et al.(2019) "User-Friendly Covariance Estimation for Heavy-Tailed Distributions" <doi:10.1214/19-STS711> to estimate model parameters and construct test statistics that are robust against heavy-tailed and/or asymmetric error distributions. Extensions to two-sample simultaneous mean comparison problems are also included. As by-products, this package contains functions that compute adaptive Huber mean, covariance and regression estimators that are of independent interest.

Maintained by Xiaoou Pan. Last updated 4 years ago.

openblas cpp openmp

5.9 match 4 stars 3.48 score 15 scripts

tidymodels

workflowsets:Create a Collection of 'tidymodels' Workflows

A workflow is a combination of a model and preprocessors (e.g, a formula, recipe, etc.) (Kuhn and Silge (2021) <https://www.tmwr.org/>). In order to try different combinations of these, an object can be created that contains many workflows. There are functions to create workflows en masse as well as training them and visualizing the results.

Maintained by Simon Couch. Last updated 5 months ago.

1.7 match 93 stars 12.21 score 294 scripts 19 dependents

zachmayer

caretEnsemble:Ensembles of Caret Models

Functions for creating ensembles of caret models: caretList() and caretStack(). caretList() is a convenience function for fitting multiple caret::train() models to the same dataset. caretStack() will make linear or non-linear combinations of these models, using a caret::train() model as a meta-model.

Maintained by Zachary A. Deane-Mayer. Last updated 3 months ago.

1.7 match 226 stars 11.92 score 780 scripts 1 dependents

kaz-yos

tableone:Create 'Table 1' to Describe Baseline Characteristics with or without Propensity Score Weights

Creates 'Table 1', i.e., description of baseline patient characteristics, which is essential in every medical research. Supports both continuous and categorical variables, as well as p-values and standardized mean differences. Weighted data are supported via the 'survey' package.

Maintained by Kazuki Yoshida. Last updated 3 years ago.

baseline-characteristics descriptive-statistics statistics

1.5 match 221 stars 13.55 score 2.3k scripts 12 dependents

r-lum

Luminescence:Comprehensive Luminescence Dating Data Analysis

A collection of various R functions for the purpose of Luminescence dating data analysis. This includes, amongst others, data import, export, application of age models, curve deconvolution, sequence analysis and plotting of equivalent dose distributions.

Maintained by Sebastian Kreutzer. Last updated 14 hours ago.

bayesian-statistics data-science geochronology luminescence luminescence-dating open-science osl plotting radiofluorescence tl xsyg cpp

1.9 match 15 stars 10.77 score 178 scripts 8 dependents

aleksandarsekulic

meteo:RFSI & STRK Interpolation for Meteo and Environmental Variables

Random Forest Spatial Interpolation (RFSI, Sekulić et al. (2020) <doi:10.3390/rs12101687>) and spatio-temporal geostatistical (spatio-temporal regression Kriging (STRK)) interpolation for meteorological (Kilibarda et al. (2014) <doi:10.1002/2013JD020803>, Sekulić et al. (2020) <doi:10.1007/s00704-019-03077-3>) and other environmental variables. Contains global spatio-temporal models calculated using publicly available data.

Maintained by Aleksandar Sekulić. Last updated 5 months ago.

4.0 match 18 stars 5.06 score 64 scripts

albertofranzin

bnstruct:Bayesian Network Structure Learning from Data with Missing Values

Bayesian Network Structure Learning from Data with Missing Values. The package implements the Silander-Myllymaki complete search, the Max-Min Parents-and-Children, the Hill-Climbing, the Max-Min Hill-climbing heuristic searches, and the Structural Expectation-Maximization algorithm. Available scoring functions are BDeu, AIC, BIC. The package also implements methods for generating and using bootstrap samples, imputed data, inference.

Maintained by Alberto Franzin. Last updated 1 years ago.

3.7 match 1 stars 5.40 score 111 scripts 3 dependents

harveyklyne

drape:Doubly Robust Average Partial Effects

Doubly robust average partial effect estimation. This implementation contains methods for adding additional smoothness to plug-in regression procedures and for estimating score functions using smoothing splines. Details of the method can be found in Harvey Klyne and Rajen D. Shah (2023) <doi:10.48550/arXiv.2308.09207>.

Maintained by Harvey Klyne. Last updated 4 months ago.

4.9 match 2 stars 4.00 score 4 scripts

qile0317

APackOfTheClones:Visualization of Clonal Expansion for Single Cell Immune Profiles

Visualize clonal expansion via circle-packing. 'APackOfTheClones' extends 'scRepertoire' to produce a publication-ready visualization of clonal expansion at a single cell resolution, by representing expanded clones as differently sized circles. The method was originally implemented by Murray Christian and Ben Murrell in the following immunology study: Ma et al. (2021) <doi:10.1126/sciimmunol.abg6356>.

Maintained by Qile Yang. Last updated 4 months ago.

clonal-analysis immune-repertoire immune-system scrna-seq scrnaseq seurat single-cell single-cell-genomics cpp

3.0 match 15 stars 6.45 score 15 scripts

hmjianggatech

huge:High-Dimensional Undirected Graph Estimation

Provides a general framework for high-dimensional undirected graph estimation. It integrates data preprocessing, neighborhood screening, graph estimation, and model selection techniques into a pipeline. In preprocessing stage, the nonparanormal(npn) transformation is applied to help relax the normality assumption. In the graph estimation stage, the graph structure is estimated by Meinshausen-Buhlmann graph estimation or the graphical lasso, and both methods can be further accelerated by the lossy screening rule preselecting the neighborhood of each variable by correlation thresholding. We target on high-dimensional data analysis usually d >> n, and the computation is memory-optimized using the sparse matrix output. We also provide a computationally efficient approach, correlation thresholding graph estimation. Three regularization/thresholding parameter selection methods are included in this package: (1)stability approach for regularization selection (2) rotation information criterion (3) extended Bayesian information criterion which is only available for the graphical lasso.

Maintained by Haoming Jiang. Last updated 3 years ago.

cpp openmp

1.9 match 12 stars 10.29 score 608 scripts 19 dependents

mstrimas

colorist:Coloring Wildlife Distributions in Space-Time

Color and visualize wildlife distributions in space-time using raster data. In addition to enabling display of sequential change in distributions through the use of small multiples, 'colorist' provides functions for extracting several features of interest from a sequence of distributions and for visualizing those features using HCL (hue-chroma-luminance) color palettes. Resulting maps allow for "fair" visual comparison of intensity values (e.g., occurrence, abundance, or density) across space and time and can be used to address questions about where, when, and how consistently a species, group, or individual is likely to be found.

Maintained by Matthew Strimas-Mackey. Last updated 11 months ago.

3.3 match 14 stars 5.60 score 19 scripts

mannau

boilerpipeR:Interface to the Boilerpipe Java Library

Generic Extraction of main text content from HTML files; removal of ads, sidebars and headers using the boilerpipe <https://github.com/kohlschutter/boilerpipe> Java library. The extraction heuristics from boilerpipe show a robust performance for a wide range of web site templates.

Maintained by Mario Annau. Last updated 4 years ago.

openjdk

3.4 match 22 stars 5.52 score 30 scripts

topepo

Cubist:Rule- And Instance-Based Regression Modeling

Regression modeling using rules with added instance-based corrections.

Maintained by Max Kuhn. Last updated 9 months ago.

1.5 match 40 stars 12.38 score 2.8k scripts 18 dependents

r-forge

tramnet:Penalized Transformation Models

Partially penalized versions of specific transformation models implemented in package 'mlt'. Available models include a fully parametric version of the Cox model, other parametric survival models (Weibull, etc.), models for binary and ordered categorical variables, normal and transformed-normal (Box-Cox type) linear models, and continuous outcome logistic regression. Hyperparameter tuning is facilitated through model-based optimization functionalities from package 'mlr3MBO'. The methodology is described in Kook et al. (2021) <doi:10.32614/RJ-2021-054>. Transformation models and model-based optimization are described in Hothorn et al. (2019) <doi:10.1111/sjos.12291> and Bischl et al. (2016) <arxiv:1703.03373>, respectively.

Maintained by Lucas Kook. Last updated 3 days ago.

4.5 match 4.12 score 2 scripts

cran

npreg:Nonparametric Regression via Smoothing Splines

Multiple and generalized nonparametric regression using smoothing spline ANOVA models and generalized additive models, as described in Helwig (2020) <doi:10.4135/9781526421036885885>. Includes support for Gaussian and non-Gaussian responses, smoothers for multiple types of predictors (including random intercepts), interactions between smoothers of mixed types, eight different methods for smoothing parameter selection, and flexible tools for diagnostics, inference, and prediction.

Maintained by Nathaniel E. Helwig. Last updated 12 months ago.

18.3 match 1.00 score

mfasiolo

qgam:Smooth Additive Quantile Regression Models

Smooth additive quantile regression models, fitted using the methods of Fasiolo et al. (2020) <doi:10.1080/01621459.2020.1725521>. See Fasiolo at al. (2021) <doi:10.18637/jss.v100.i09> for an introduction to the package. Differently from 'quantreg', the smoothing parameters are estimated automatically by marginal loss minimization, while the regression coefficients are estimated using either PIRLS or Newton algorithm. The learning rate is determined so that the Bayesian credible intervals of the estimated effects have approximately the correct coverage. The main function is qgam() which is similar to gam() in 'mgcv', but fits non-parametric quantile regression models.

Maintained by Matteo Fasiolo. Last updated 4 days ago.

1.8 match 33 stars 10.13 score 133 scripts 15 dependents

jolars

SLOPE:Sorted L1 Penalized Estimation

Efficient implementations for Sorted L-One Penalized Estimation (SLOPE): generalized linear models regularized with the sorted L1-norm (Bogdan et al. 2015). Supported models include ordinary least-squares regression, binomial regression, multinomial regression, and Poisson regression. Both dense and sparse predictor matrices are supported. In addition, the package features predictor screening rules that enable fast and efficient solutions to high-dimensional problems.

Maintained by Johan Larsson. Last updated 17 hours ago.

generalized-linear-models slope sparse-regression cpp openmp

1.9 match 17 stars 9.62 score 75 scripts 3 dependents

eogasawara

tspredit:Time Series Prediction Integrated Tuning

Prediction is one of the most important activities while working with time series. There are many alternative ways to model the time series. Finding the right one is challenging to model them. Most data-driven models (either statistical or machine learning) demand tuning. Setting them right is mandatory for good predictions. It is even more complex since time series prediction also demands choosing a data pre-processing that complies with the chosen model. Many time series frameworks have features to build and tune models. The package differs as it provides a framework that seamlessly integrates tuning data pre-processing activities with the building of models. The package provides functions for defining and conducting time series prediction, including data pre(post)processing, decomposition, tuning, modeling, prediction, and accuracy assessment. More information is available at Izau et al. <doi:10.5753/sbbd.2022.224330>.

Maintained by Eduardo Ogasawara. Last updated 3 months ago.

6.0 match 2.92 score 56 scripts

dcomtois

summarytools:Tools to Quickly and Neatly Summarize Data

Data frame summaries, cross-tabulations, weight-enabled frequency tables and common descriptive (univariate) statistics in concise tables available in a variety of formats (plain ASCII, Markdown and HTML). A good point-of-entry for exploring data, both for experienced and new R users.

Maintained by Dominic Comtois. Last updated 1 days ago.

descriptive-statistics frequency-table html-report markdown pander pandoc pandoc-markdown rmarkdown rstudio

1.2 match 526 stars 14.52 score 2.9k scripts 6 dependents

hjeglinton

riskscores:Optimized Integer Risk Score Models

Implements an optimized approach to learning risk score models, where sparsity and integer constraints are integrated into the model-fitting process.

Maintained by Hannah Eglinton. Last updated 5 months ago.

3.5 match 1 stars 4.95 score 7 scripts

blue-matter

MSEtool:Management Strategy Evaluation Toolkit

Development, simulation testing, and implementation of management procedures for fisheries (see Carruthers & Hordyk (2018) <doi:10.1111/2041-210X.13081>).

Maintained by Adrian Hordyk. Last updated 24 days ago.

cpp

2.3 match 8 stars 7.69 score 163 scripts 3 dependents

sperfu

findGSEP:Estimate Genome Size of Polyploid Species Using k-Mer Frequencies

Provides tools to estimate the genome size of polyploid species using k-mer frequencies. This package includes functions to process k-mer frequency data and perform genome size estimation by fitting k-mer frequencies with a normal distribution model. It supports handling of complex polyploid genomes and offers various options for customizing the estimation process. The basic method 'findGSE' is detailed in Sun, Hequan, et al. (2018) <doi:10.1093/bioinformatics/btx637>.

Maintained by Laiyi Fu. Last updated 8 months ago.

3.5 match 3 stars 4.88 score 1 scripts

cezarykuran

oaii:'OpenAI' API R Interface

A comprehensive set of helpers that streamline data transmission and processing, making it effortless to interact with the 'OpenAI' API.

Maintained by Cezary Kuran. Last updated 1 years ago.

16.9 match 1.00 score 1 scripts

jillbo1000

EZtune:Tunes AdaBoost, Elastic Net, Support Vector Machines, and Gradient Boosting Machines

Contains two functions that are intended to make tuning supervised learning methods easy. The eztune function uses a genetic algorithm or Hooke-Jeeves optimizer to find the best set of tuning parameters. The user can choose the optimizer, the learning method, and if optimization will be based on accuracy obtained through validation error, cross validation, or resubstitution. The function eztune_cv will compute a cross validated error rate. The purpose of eztune_cv is to provide a cross validated accuracy or MSE when resubstitution or validation data are used for optimization because error measures from both approaches can be misleading.

Maintained by Jill Lundell. Last updated 3 years ago.

3.5 match 4.76 score 38 scripts 1 dependents

jrodu

qqboxplot:Implementation of the Q-Q Boxplot

A system to implement the Q-Q boxplot. It is implemented as an extension to 'ggplot2'. The Q-Q boxplot is an amalgam of the boxplot and the Q-Q plot and allows the user to rapidly examine summary statistics and tail behavior for multiple distributions in the same pane. As an extension of the 'ggplot2' implementation of the boxplot, possible modifications to the boxplot extend to the Q-Q boxplot.

Maintained by Jordan Rodu. Last updated 2 years ago.

3.5 match 2 stars 4.76 score 29 scripts

mrkellermann

eiPack:Ecological Inference and Higher-Dimension Data Management

Provides methods for analyzing R by C ecological contingency tables using the extreme case analysis, ecological regression, and Multinomial-Dirichlet ecological inference models. Also provides tools for manipulating higher-dimension data objects.

Maintained by Michael Kellermann. Last updated 2 years ago.

8.6 match 1.92 score 28 scripts 1 dependents

mlr-org

mlr3fselect:Feature Selection for 'mlr3'

Feature selection package of the 'mlr3' ecosystem. It selects the optimal feature set for any 'mlr3' learner. The package works with several optimization algorithms e.g. Random Search, Recursive Feature Elimination, and Genetic Search. Moreover, it can automatically optimize learners and estimate the performance of optimized feature sets with nested resampling.

Maintained by Marc Becker. Last updated 2 months ago.

evolutionary-algorithms exhaustive-search feature-selection machine-learning mlr3 optimization random-search recursive-feature-elimination sequential-feature-selection

2.0 match 23 stars 8.25 score 70 scripts 2 dependents

tianxili

randnet:Random Network Model Estimation, Selection and Parameter Tuning

Model selection and parameter tuning procedures for a class of random network models. The model selection can be done by a general cross-validation framework called ECV from Li et. al. (2016) <arXiv:1612.04717> . Several other model-based and task-specific methods are also included, such as NCV from Chen and Lei (2016) <arXiv:1411.1715>, likelihood ratio method from Wang and Bickel (2015) <arXiv:1502.02069>, spectral methods from Le and Levina (2015) <arXiv:1507.00827>. Many network analysis methods are also implemented, such as the regularized spectral clustering (Amini et. al. 2013 <doi:10.1214/13-AOS1138>) and its degree corrected version and graphon neighborhood smoothing (Zhang et. al. 2015 <arXiv:1509.08588>). It also includes the consensus clustering of Gao et. al. (2014) <arXiv:1410.5837>, the method of moments estimation of nomination SBM of Li et. al. (2020) <arxiv:2008.03652>, and the network mixing method of Li and Le (2021) <arxiv:2106.02803>. It also includes the informative core-periphery data processing method of Miao and Li (2021) <arXiv:2101.06388>. The work to build and improve this package is partially supported by the NSF grants DMS-2015298 and DMS-2015134.

Maintained by Tianxi Li. Last updated 2 years ago.

6.7 match 2.45 score 47 scripts 2 dependents

computationalstylistics

stylo:Stylometric Multivariate Analyses

Supervised and unsupervised multivariate methods, supplemented by GUI and some visualizations, to perform various analyses in the field of computational stylistics, authorship attribution, etc. For further reference, see Eder et al. (2016), <https://journal.r-project.org/archive/2016/RJ-2016-007/index.html>. You are also encouraged to visit the Computational Stylistics Group's website <https://computationalstylistics.github.io/>, where a reasonable amount of information about the package and related projects are provided.

Maintained by Maciej Eder. Last updated 2 months ago.

1.9 match 186 stars 8.59 score 462 scripts

bioc

MassSpecWavelet:Peak Detection for Mass Spectrometry data using wavelet-based algorithms

Peak Detection in Mass Spectrometry data is one of the important preprocessing steps. The performance of peak detection affects subsequent processes, including protein identification, profile alignment and biomarker identification. Using Continuous Wavelet Transform (CWT), this package provides a reliable algorithm for peak detection that does not require any type of smoothing or previous baseline correction method, providing more consistent results for different spectra. See <doi:10.1093/bioinformatics/btl355} for further details.

Maintained by Sergio Oller Moreno. Last updated 3 months ago.

immunooncology massspectrometry proteomics peakdetection

1.7 match 9 stars 9.38 score 37 scripts 17 dependents

cran

sgPLS:Sparse Group Partial Least Square Methods

Regularized version of partial least square approaches providing sparse, group, and sparse group versions of partial least square regression models (Liquet, B., Lafaye de Micheaux, P., Hejblum B., Thiebaut, R. (2016) <doi:10.1093/bioinformatics/btv535>). Version of PLS Discriminant analysis is also provided.

Maintained by Benoit Liquet. Last updated 1 years ago.

10.9 match 1 stars 1.48 score 7 scripts

mariaguilleng

boostingDEA:A Boosting Approach to Data Envelopment Analysis

Includes functions to estimate production frontiers and make ideal output predictions in the Data Envelopment Analysis (DEA) context using both standard models from DEA and Free Disposal Hull (FDH) and boosting techniques. In particular, EATBoosting (Guillen et al., 2023 <doi:10.1016/j.eswa.2022.119134>) and MARSBoosting. Moreover, the package includes code for estimating several technical efficiency measures using different models such as the input and output-oriented radial measures, the input and output-oriented Russell measures, the Directional Distance Function (DDF), the Weighted Additive Measure (WAM) and the Slacks-Based Measure (SBM).

Maintained by Maria D. Guillen. Last updated 2 years ago.

4.0 match 2 stars 4.00 score 3 scripts

wenjie2wang

abclass:Angle-Based Large-Margin Classifiers

Multi-category angle-based large-margin classifiers. See Zhang and Liu (2014) <doi:10.1093/biomet/asu017> for details.

Maintained by Wenjie Wang. Last updated 1 years ago.

openblas cpp openmp

5.3 match 2 stars 3.04 score 11 scripts

yixuan

recosystem:Recommender System using Matrix Factorization

R wrapper of the 'libmf' library <https://www.csie.ntu.edu.tw/~cjlin/libmf/> for recommender system using matrix factorization. It is typically used to approximate an incomplete matrix using the product of two matrices in a latent space. Other common names for this task include "collaborative filtering", "matrix completion", "matrix recovery", etc. High performance multi-core parallel computing is supported in this package.

Maintained by Yixuan Qiu. Last updated 2 years ago.

matrix-factorization recommender-system cpp openmp

2.0 match 84 stars 7.97 score 101 scripts 6 dependents

duckmayr

bggum:Bayesian Estimation of Generalized Graded Unfolding Model Parameters

Provides a Metropolis-coupled Markov chain Monte Carlo sampler, post-processing and parameter estimation functions, and plotting utilities for the generalized graded unfolding model of Roberts, Donoghue, and Laughlin (2000) <doi:10.1177/01466216000241001>.

Maintained by JBrandon Duck-Mayr. Last updated 5 years ago.

cpp

3.3 match 4 stars 4.78 score 6 scripts

myles-lewis

nestedcv:Nested Cross-Validation with 'glmnet' and 'caret'

Implements nested k*l-fold cross-validation for lasso and elastic-net regularised linear models via the 'glmnet' package and other machine learning models via the 'caret' package <doi:10.1093/bioadv/vbad048>. Cross-validation of 'glmnet' alpha mixing parameter and embedded fast filter functions for feature selection are provided. Described as double cross-validation by Stone (1977) <doi:10.1111/j.2517-6161.1977.tb01603.x>. Also implemented is a method using outer CV to measure unbiased model performance metrics when fitting Bayesian linear and logistic regression shrinkage models using the horseshoe prior over parameters to encourage a sparse model as described by Piironen & Vehtari (2017) <doi:10.1214/17-EJS1337SI>.

Maintained by Myles Lewis. Last updated 4 days ago.

2.0 match 12 stars 7.92 score 46 scripts

cran

grf:Generalized Random Forests

Forest-based statistical estimation and inference. GRF provides non-parametric methods for heterogeneous treatment effects estimation (optionally using right-censored outcomes, multiple treatment arms or outcomes, or instrumental variables), as well as least-squares regression, quantile regression, and survival regression, all with support for missing covariates.

Maintained by Erik Sverdrup. Last updated 4 months ago.

cpp

2.7 match 5.83 score 1.2k scripts 14 dependents

mikejohnson51

AHGestimation:An R package for Computing Robust, Mass Preserving Hydraulic Geometries and Rating Curves

Compute mass preserving 'At a station Hydraulic Geometry' (AHG) fits from river measurements.

Maintained by Mike Johnson. Last updated 3 months ago.

3.1 match 6 stars 5.02 score 10 scripts

bioc

nethet:A bioconductor package for high-dimensional exploration of biological network heterogeneity

Package nethet is an implementation of statistical solid methodology enabling the analysis of network heterogeneity from high-dimensional data. It combines several implementations of recent statistical innovations useful for estimation and comparison of networks in a heterogeneous, high-dimensional setting. In particular, we provide code for formal two-sample testing in Gaussian graphical models (differential network and GGM-GSA; Stadler and Mukherjee, 2013, 2014) and make a novel network-based clustering algorithm available (mixed graphical lasso, Stadler and Mukherjee, 2013).

Maintained by Nicolas Staedler. Last updated 5 months ago.

clustering graphandnetwork

3.6 match 4.30 score 7 scripts

fabsig

gpboost:Combining Tree-Boosting with Gaussian Process and Mixed Effects Models

An R package that allows for combining tree-boosting with Gaussian process and mixed effects models. It also allows for independently doing tree-boosting as well as inference and prediction for Gaussian process and mixed effects models. See <https://github.com/fabsig/GPBoost> for more information on the software and Sigrist (2022, JMLR) <https://www.jmlr.org/papers/v23/20-322.html> and Sigrist (2023, TPAMI) <doi:10.1109/TPAMI.2022.3168152> for more information on the methodology.

Maintained by Fabio Sigrist. Last updated 24 days ago.

cpp openmp

3.7 match 4.20 score 212 scripts

pecanproject

PEcAn.emulator:Gausian Process Emulator

Implementation of a Gaussian Process model (both likelihood and bayesian approaches) for kriging and model emulation. Includes functions for sampling design and prediction.

Maintained by Mike Dietze. Last updated 13 hours ago.

bayesian cyberinfrastructure data-assimilation data-science ecosystem-model ecosystem-science forecasting meta-analysis national-science-foundation pecan plants

1.8 match 216 stars 8.82 score 1 scripts 6 dependents

tomkellygenetics

vioplot:Violin Plot

A violin plot is a combination of a box plot and a kernel density plot. This package allows extensive customisation of violin plots.

Maintained by S. Thomas Kelly. Last updated 19 days ago.

boxplot colours customisation dataviz formula plotting violin-plot violinplot vioplot

1.3 match 26 stars 12.32 score 2.0k scripts 8 dependents

dcnorris

DTAT:Dose Titration Algorithm Tuning

Dose Titration Algorithm Tuning (DTAT) is a methodologic framework allowing dose individualization to be conceived as a continuous learning process that begins in early-phase clinical trials and continues throughout drug development, on into clinical practice. This package includes code that researchers may use to reproduce or extend key results of the DTAT research programme, plus tools for trialists to design and simulate a '3+3/PC' dose-finding study. Please see Norris (2017a) <doi:10.12688/f1000research.10624.3> and Norris (2017c) <doi:10.1101/240846>.

Maintained by David C. Norris. Last updated 10 months ago.

5.3 match 2.90 score 20 scripts

jackmwolf

tehtuner:Fit and Tune Models to Detect Treatment Effect Heterogeneity

Implements methods to fit Virtual Twins models (Foster et al. (2011) <doi:10.1002/sim.4322>) for identifying subgroups with differential effects in the context of clinical trials while controlling the probability of falsely detecting a differential effect when the conditional average treatment effect is uniform across the study population using parameter selection methods proposed in Wolf et al. (2022) <doi:10.1177/17407745221095855>.

Maintained by Jack Wolf. Last updated 2 years ago.

clinical-trials heterogeneity-of-treatment-effect subgroup-identification

4.7 match 4 stars 3.30 score 6 scripts

bioc

BayesSpace:Clustering and Resolution Enhancement of Spatial Transcriptomes

Tools for clustering and enhancing the resolution of spatial gene expression experiments. BayesSpace clusters a low-dimensional representation of the gene expression matrix, incorporating a spatial prior to encourage neighboring spots to cluster together. The method can enhance the resolution of the low-dimensional representation into "sub-spots", for which features such as gene expression or cell type composition can be imputed.

Maintained by Matt Stone. Last updated 5 months ago.

software clustering transcriptomics geneexpression singlecell immunooncology dataimport openblas cpp openmp

1.7 match 123 stars 8.89 score 278 scripts 1 dependents

ramikrispin

TSstudio:Functions for Time Series Analysis and Forecasting

Provides a set of tools for descriptive and predictive analysis of time series data. That includes functions for interactive visualization of time series objects and as well utility functions for automation time series forecasting.

Maintained by Rami Krispin. Last updated 2 years ago.

forecasting time-series timeseries tsstudio visualization

1.7 match 425 stars 9.02 score 656 scripts

richarddmorey

BayesFactor:Computation of Bayes Factors for Common Designs

A suite of functions for computing various Bayes factors for simple designs, including contingency tables, one- and two-sample designs, one-way designs, general ANOVA designs, and linear regression.

Maintained by Richard D. Morey. Last updated 1 years ago.

cpp

1.1 match 133 stars 13.70 score 1.7k scripts 21 dependents

zhuwang46

mpath:Regularized Linear Models

Algorithms compute robust estimators for loss functions in the concave convex (CC) family by the iteratively reweighted convex optimization (IRCO), an extension of the iteratively reweighted least squares (IRLS). The IRCO reduces the weight of the observation that leads to a large loss; it also provides weights to help identify outliers. Applications include robust (penalized) generalized linear models and robust support vector machines. The package also contains penalized Poisson, negative binomial, zero-inflated Poisson, zero-inflated negative binomial regression models and robust models with non-convex loss functions. Wang et al. (2014) <doi:10.1002/sim.6314>, Wang et al. (2015) <doi:10.1002/bimj.201400143>, Wang et al. (2016) <doi:10.1177/0962280214530608>, Wang (2021) <doi:10.1007/s11749-021-00770-2>, Wang (2020) <arXiv:2010.02848>.

Maintained by Zhu Wang. Last updated 3 years ago.

fortran openblas

2.3 match 1 stars 6.67 score 131 scripts 4 dependents

wlandau

crew:A Distributed Worker Launcher Framework

In computationally demanding analysis projects, statisticians and data scientists asynchronously deploy long-running tasks to distributed systems, ranging from traditional clusters to cloud services. The 'NNG'-powered 'mirai' R package by Gao (2023) <doi:10.5281/zenodo.7912722> is a sleek and sophisticated scheduler that efficiently processes these intense workloads. The 'crew' package extends 'mirai' with a unifying interface for third-party worker launchers. Inspiration also comes from packages. 'future' by Bengtsson (2021) <doi:10.32614/RJ-2021-048>, 'rrq' by FitzJohn and Ashton (2023) <https://github.com/mrc-ide/rrq>, 'clustermq' by Schubert (2019) <doi:10.1093/bioinformatics/btz284>), and 'batchtools' by Lang, Bischel, and Surmann (2017) <doi:10.21105/joss.00135>.

Maintained by William Michael Landau. Last updated 11 hours ago.

high-performance-computing

1.3 match 136 stars 11.19 score 243 scripts 2 dependents

evolecolgroup

tidysdm:Species Distribution Models with Tidymodels

Fit species distribution models (SDMs) using the 'tidymodels' framework, which provides a standardised interface to define models and process their outputs. 'tidysdm' expands 'tidymodels' by providing methods for spatial objects, models and metrics specific to SDMs, as well as a number of specialised functions to process occurrences for contemporary and palaeo datasets. The full functionalities of the package are described in Leonardi et al. (2023) <doi:10.1101/2023.07.24.550358>.

Maintained by Andrea Manica. Last updated 8 days ago.

species-distribution-modelling tidymodels

1.7 match 31 stars 8.82 score 51 scripts

haijiangq

EFAfactors:Determining the Number of Factors in Exploratory Factor Analysis

Provides a collection of standard factor retention methods in Exploratory Factor Analysis (EFA), making it easier to determine the number of factors. Traditional methods such as the scree plot by Cattell (1966) <doi:10.1207/s15327906mbr0102_10>, Kaiser-Guttman Criterion (KGC) by Guttman (1954) <doi:10.1007/BF02289162> and Kaiser (1960) <doi:10.1177/001316446002000116>, and flexible Parallel Analysis (PA) by Horn (1965) <doi:10.1007/BF02289447> based on eigenvalues form PCA or EFA are readily available. This package also implements several newer methods, such as the Empirical Kaiser Criterion (EKC) by Braeken and van Assen (2017) <doi:10.1037/met0000074>, Comparison Data (CD) by Ruscio and Roche (2012) <doi:10.1037/a0025697>, and Hull method by Lorenzo-Seva et al. (2011) <doi:10.1080/00273171.2011.564527>, as well as some AI-based methods like Comparison Data Forest (CDF) by Goretzko and Ruscio (2024) <doi:10.3758/s13428-023-02122-4> and Factor Forest (FF) by Goretzko and Buhner (2020) <doi:10.1037/met0000262>. Additionally, it includes a deep neural network (DNN) trained on large-scale datasets that can efficiently and reliably determine the number of factors.

Maintained by Haijiang Qin. Last updated 26 days ago.

openblas cpp openmp

8.7 match 1.70 score

bips-hb

CVN:Covariate-Varying Networks

Inferring high-dimensional Gaussian graphical networks that change with multiple discrete covariates. Louis Dijkstra, Arne Godt, Ronja Foraita (2024) <arXiv:2407.19978>.

Maintained by Ronja Foraita. Last updated 1 months ago.

graphical-models high-dimensional-statistics network-analysis cpp

4.0 match 3.70 score 7 scripts

jonathancornelissen

highfrequency:Tools for Highfrequency Data Analysis

Provide functionality to manage, clean and match highfrequency trades and quotes data, calculate various liquidity measures, estimate and forecast volatility, detect price jumps and investigate microstructure noise and intraday periodicity. A detailed vignette can be found in the paper "Analyzing Intraday Financial Data in R: The highfrequency Package" by Boudt, Kleen, and Sjoerup (2022, <doi:10.18637/jss.v104.i08>). The DOI in the CITATION is for a new Journal of Statistical Software publication that will be registered after publication on CRAN. A working paper version can be found on SSRN: <doi:10.2139/ssrn.3917548>.

Maintained by Kris Boudt. Last updated 2 years ago.

openblas cpp openmp

2.0 match 152 stars 7.37 score 286 scripts

b-thi

FuncNN:Functional Neural Networks

A collection of functions which fit functional neural network models. In other words, this package will allow users to build deep learning models that have either functional or scalar responses paired with functional and scalar covariates. We implement the theoretical discussion found in Thind, Multani and Cao (2020) <arXiv:2006.09590> through the help of a main fitting and prediction function as well as a number of helper functions to assist with cross-validation, tuning, and the display of estimated functional weights.

Maintained by Barinder Thind. Last updated 5 years ago.

4.6 match 3 stars 3.18 score 5 scripts

yunanwu123

TFRE:A Tuning-Free Robust and Efficient Approach to High-Dimensional Regression

Provide functions to estimate the coefficients in high-dimensional linear regressions via a tuning-free and robust approach. The method was published in Wang, L., Peng, B., Bradic, J., Li, R. and Wu, Y. (2020), "A Tuning-free Robust and Efficient Approach to High-dimensional Regression", Journal of the American Statistical Association, 115:532, 1700-1714(JASA’s discussion paper), <doi:10.1080/01621459.2020.1840989>. See also Wang, L., Peng, B., Bradic, J., Li, R. and Wu, Y. (2020), "Rejoinder to “A tuning-free robust and efficient approach to high-dimensional regression". Journal of the American Statistical Association, 115, 1726-1729, <doi:10.1080/01621459.2020.1843865>; Peng, B. and Wang, L. (2015), "An Iterative Coordinate Descent Algorithm for High-Dimensional Nonconvex Penalized Quantile Regression", Journal of Computational and Graphical Statistics, 24:3, 676-694, <doi:10.1080/10618600.2014.913516>; Clémençon, S., Colin, I., and Bellet, A. (2016), "Scaling-up empirical risk minimization: optimization of incomplete u-statistics", The Journal of Machine Learning Research, 17(1):2682–2717; Fan, J. and Li, R. (2001), "Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties", Journal of the American Statistical Association, 96:456, 1348-1360, <doi:10.1198/016214501753382273>.

Maintained by Yunan Wu. Last updated 1 years ago.

cpp

5.5 match 1 stars 2.70 score

rohelab

fastadi:Self-Tuning Data Adaptive Matrix Imputation

Implements the AdaptiveImpute matrix completion algorithm of 'Intelligent Initialization and Adaptive Thresholding for Iterative Matrix Completion', <https://amstat.tandfonline.com/doi/abs/10.1080/10618600.2018.1518238>. AdaptiveImpute is useful for embedding sparsely observed matrices, often out performs competing matrix completion algorithms, and self-tunes its hyperparameter, making usage easy.

Maintained by Alex Hayes. Last updated 9 months ago.

openblas cpp openmp

3.4 match 9 stars 4.26 score 6 scripts

cran

sparcl:Perform Sparse Hierarchical Clustering and Sparse K-Means Clustering

Implements the sparse clustering methods of Witten and Tibshirani (2010): "A framework for feature selection in clustering"; published in Journal of the American Statistical Association 105(490): 713-726.

Maintained by Daniela Witten. Last updated 6 years ago.

fortran

3.5 match 1 stars 4.20 score 133 scripts 4 dependents

crj32

Spectrum:Fast Adaptive Spectral Clustering for Single and Multi-View Data

A self-tuning spectral clustering method for single or multi-view data. 'Spectrum' uses a new type of adaptive density aware kernel that strengthens connections in the graph based on common nearest neighbours. It uses a tensor product graph data integration and diffusion procedure to integrate different data sources and reduce noise. 'Spectrum' uses either the eigengap or multimodality gap heuristics to determine the number of clusters. The method is sufficiently flexible so that a wide range of Gaussian and non-Gaussian structures can be clustered with automatic selection of K.

Maintained by Christopher R John. Last updated 5 years ago.

clustering spectral-clustering

2.3 match 7 stars 5.99 score 47 scripts 1 dependents

daniel-jg

BeviMed:Bayesian Evaluation of Variant Involvement in Mendelian Disease

A fast integrative genetic association test for rare diseases based on a model for disease status given allele counts at rare variant sites. Probability of association, mode of inheritance and probability of pathogenicity for individual variants are all inferred in a Bayesian framework - 'A Fast Association Test for Identifying Pathogenic Variants Involved in Rare Diseases', Greene et al 2017 <doi:10.1016/j.ajhg.2017.05.015>.

Maintained by Daniel Greene. Last updated 10 months ago.

cpp

4.0 match 1 stars 3.41 score 17 scripts

ropensci

DoOR.functions:Integrating Heterogeneous Odorant Response Data into a Common Response Model: A DoOR to the Complete Olfactome

This is a function package providing functions to perform data manipulations and visualizations for DoOR.data. See the URLs for the original and the DoOR 2.0 publication.

Maintained by Daniel Münch. Last updated 1 years ago.

peer-reviewed

2.5 match 8 stars 5.40 score 52 scripts

ly129

ktweedie:'Tweedie' Compound Poisson Model in the Reproducing Kernel Hilbert Space

Kernel-based 'Tweedie' compound Poisson gamma model using high-dimensional predictors for the analyses of zero-inflated response variables. The package features built-in estimation, prediction and cross-validation tools and supports choice of different kernel functions. For more details, please see Yi Lian, Archer Yi Yang, Boxiang Wang, Peng Shi & Robert William Platt (2023) <doi:10.1080/00401706.2022.2156615>.

Maintained by Yi Lian. Last updated 1 years ago.

fortran

3.3 match 2 stars 4.00 score 5 scripts

rjacobucci

regsem:Regularized Structural Equation Modeling

Uses both ridge and lasso penalties (and extensions) to penalize specific parameters in structural equation models. The package offers additional cost functions, cross validation, and other extensions beyond traditional structural equation models. Also contains a function to perform exploratory mediation (XMed).

Maintained by Ross Jacobucci. Last updated 2 years ago.

openblas cpp

2.0 match 14 stars 6.63 score 77 scripts

cran

cosso:Fit Regularized Nonparametric Regression Models Using COSSO Penalty

The COSSO regularization method automatically estimates and selects important function components by a soft-thresholding penalty in the context of smoothing spline ANOVA models. Implemented models include mean regression, quantile regression, logistic regression and the Cox regression models.

Maintained by Isaac Ray. Last updated 2 years ago.

10.3 match 1 stars 1.28 score 19 scripts

aalfons

cvTools:Cross-validation tools for regression models

Tools that allow developers to write functions for cross-validation with minimal programming effort and assist users with model selection.

Maintained by Andreas Alfons. Last updated 13 years ago.

1.8 match 8 stars 7.26 score 460 scripts 18 dependents

sth1402

GGMridge:Gaussian Graphical Models Using Ridge Penalty Followed by Thresholding and Reestimation

Estimation of partial correlation matrix using ridge penalty followed by thresholding and reestimation. Under multivariate Gaussian assumption, the matrix constitutes an Gaussian graphical model (GGM).

Maintained by Shannon T. Holloway. Last updated 1 years ago.

6.7 match 1.89 score 13 scripts 2 dependents

qiantang0326

hdqr:Fast Algorithm for Penalized Quantile Regression

Implements an efficient algorithm to fit and tune penalized quantile regression models using the generalized coordinate descent algorithm. Designed to handle high-dimensional datasets effectively, with emphasis on precision and computational efficiency. This package implements the algorithms proposed in Tang, Q., Zhang, Y., & Wang, B. (2022) <https://openreview.net/pdf?id=RvwMTDYTOb>.

Maintained by Qian Tang. Last updated 1 months ago.

fortran

5.5 match 2.30 score

t-kalinowski

keras:R Interface to 'Keras'

Interface to 'Keras' <https://keras.io>, a high-level neural networks 'API'. 'Keras' was developed with a focus on enabling fast experimentation, supports both convolution based networks and recurrent networks (as well as combinations of the two), and runs seamlessly on both 'CPU' and 'GPU' devices.

Maintained by Tomasz Kalinowski. Last updated 11 months ago.

1.2 match 10.82 score 10k scripts 54 dependents

josetamezpena

FRESA.CAD:Feature Selection Algorithms for Computer Aided Diagnosis

Contains a set of utilities for building and testing statistical models (linear, logistic,ordinal or COX) for Computer Aided Diagnosis/Prognosis applications. Utilities include data adjustment, univariate analysis, model building, model-validation, longitudinal analysis, reporting and visualization.

Maintained by Jose Gerardo Tamez-Pena. Last updated 1 months ago.

openblas cpp openmp

2.3 match 7 stars 5.59 score 31 scripts

bioc

ClassifyR:A framework for cross-validated classification problems, with applications to differential variability and differential distribution testing

The software formalises a framework for classification and survival model evaluation in R. There are four stages; Data transformation, feature selection, model training, and prediction. The requirements of variable types and variable order are fixed, but specialised variables for functions can also be provided. The framework is wrapped in a driver loop that reproducibly carries out a number of cross-validation schemes. Functions for differential mean, differential variability, and differential distribution are included. Additional functions may be developed by the user, by creating an interface to the framework.

Maintained by Dario Strbenac. Last updated 5 days ago.

classification survival cpp

1.5 match 5 stars 8.36 score 45 scripts 3 dependents

sgiddens

DPpack:Differentially Private Statistical Analysis and Machine Learning

An implementation of common statistical analysis and models with differential privacy (Dwork et al., 2006a) <doi:10.1007/11681878_14> guarantees. The package contains, for example, functions providing differentially private computations of mean, variance, median, histograms, and contingency tables. It also implements some statistical models and machine learning algorithms such as linear regression (Kifer et al., 2012) <https://proceedings.mlr.press/v23/kifer12.html> and SVM (Chaudhuri et al., 2011) <https://jmlr.org/papers/v12/chaudhuri11a.html>. In addition, it implements some popular randomization mechanisms, including the Laplace mechanism (Dwork et al., 2006a) <doi:10.1007/11681878_14>, Gaussian mechanism (Dwork et al., 2006b) <doi:10.1007/11761679_29>, analytic Gaussian mechanism (Balle & Wang, 2018) <https://proceedings.mlr.press/v80/balle18a.html>, and exponential mechanism (McSherry & Talwar, 2007) <doi:10.1109/FOCS.2007.66>.

Maintained by Spencer Giddens. Last updated 5 months ago.

3.4 match 3 stars 3.65 score 3 scripts

cran

CSCNet:Fitting and Tuning Regularized Cause-Specific Cox Models with Elastic-Net Penalty

Flexible tools to fit, tune and obtain absolute risk predictions from regularized cause-specific cox models with elastic-net penalty.

Maintained by Shahin Roshani. Last updated 2 years ago.

4.6 match 2.70 score 3 scripts

syeonkang

rrMixture:Reduced-Rank Mixture Models

We implement full-ranked, rank-penalized, and adaptive nuclear norm penalized estimation methods using multivariate mixture models proposed by Kang, Chen, and Yao (2022+).

Maintained by Suyeon Kang. Last updated 3 years ago.

openblas cpp

6.2 match 2.00 score 5 scripts

cran

CompositionalML:Machine Learning with Compositional Data

Machine learning algorithms for predictor variables that are compositional data and the response variable is either continuous or categorical. Specifically, the Boruta variable selection algorithm, random forest, support vector machines and projection pursuit regression are included. Relevant papers include: Tsagris M.T., Preston S. and Wood A.T.A. (2011). "A data-based power transformation for compositional data". Fourth International International Workshop on Compositional Data Analysis. <doi:10.48550/arXiv.1106.1451> and Alenazi, A. (2023). "A review of compositional data analysis and recent advances". Communications in Statistics--Theory and Methods, 52(16): 5535--5567. <doi:10.1080/03610926.2021.2014890>.

Maintained by Michail Tsagris. Last updated 1 years ago.

12.4 match 1.00 score

jhorzek

lessSEM:Non-Smooth Regularization for Structural Equation Models

Provides regularized structural equation modeling (regularized SEM) with non-smooth penalty functions (e.g., lasso) building on 'lavaan'. The package is heavily inspired by the ['regsem'](<https://github.com/Rjacobucci/regsem>) and ['lslx'](<https://github.com/psyphh/lslx>) packages.

Maintained by Jannik H. Orzek. Last updated 1 years ago.

lasso psychometrics regularization regularized-structural-equation-model sem structural-equation-modeling openblas cpp openmp

1.7 match 7 stars 7.19 score 223 scripts

alarm-redist

redist:Simulation Methods for Legislative Redistricting

Enables researchers to sample redistricting plans from a pre-specified target distribution using Sequential Monte Carlo and Markov Chain Monte Carlo algorithms. The package allows for the implementation of various constraints in the redistricting process such as geographic compactness and population parity requirements. Tools for analysis such as computation of various summary statistics and plotting functionality are also included. The package implements the SMC algorithm of McCartan and Imai (2023) <doi:10.1214/23-AOAS1763>, the enumeration algorithm of Fifield, Imai, Kawahara, and Kenny (2020) <doi:10.1080/2330443X.2020.1791773>, the Flip MCMC algorithm of Fifield, Higgins, Imai and Tarr (2020) <doi:10.1080/10618600.2020.1739532>, the Merge-split/Recombination algorithms of Carter et al. (2019) <arXiv:1911.01503> and DeFord et al. (2021) <doi:10.1162/99608f92.eb30390f>, and the Short-burst optimization algorithm of Cannon et al. (2020) <arXiv:2011.02288>.

Maintained by Christopher T. Kenny. Last updated 2 months ago.

geospatial gerrymandering redistricting sampling openblas cpp openmp

1.3 match 68 stars 9.17 score 259 scripts

philipp-baumann

simplerspec:Soil and plant spectroscopic model building and prediction

Functions that cover reading of spectral data, outlier removal, spectral preprocessing, calibration sampling, PLS regression using caret, and model diagnostic statistics and plots.

Maintained by Philipp Baumann. Last updated 1 years ago.

3.5 match 33 stars 3.52 score 10 scripts

haghish

shapley:Weighted Mean SHAP and CI for Robust Feature Selection in ML Grid

This R package introduces Weighted Mean SHapley Additive exPlanations (WMSHAP), an innovative method for calculating SHAP values for a grid of fine-tuned base-learner machine learning models as well as stacked ensembles, a method not previously available due to the common reliance on single best-performing models. By integrating the weighted mean SHAP values from individual base-learners comprising the ensemble or individual base-learners in a tuning grid search, the package weights SHAP contributions according to each model's performance, assessed by multiple either R squared (for both regression and classification models). alternatively, this software also offers weighting SHAP values based on the area under the precision-recall curve (AUCPR), the area under the curve (AUC), and F2 measures for binary classifiers. It further extends this framework to implement weighted confidence intervals for weighted mean SHAP values, offering a more comprehensive and robust feature importance evaluation over a grid of machine learning models, instead of solely computing SHAP values for the best model. This methodology is particularly beneficial for addressing the severe class imbalance (class rarity) problem by providing a transparent, generalized measure of feature importance that mitigates the risk of reporting SHAP values for an overfitted or biased model and maintains robustness under severe class imbalance, where there is no universal criteria of identifying the absolute best model. Furthermore, the package implements hypothesis testing to ascertain the statistical significance of SHAP values for individual features, as well as comparative significance testing of SHAP contributions between features. Additionally, it tackles a critical gap in feature selection literature by presenting criteria for the automatic feature selection of the most important features across a grid of models or stacked ensembles, eliminating the need for arbitrary determination of the number of top features to be extracted. This utility is invaluable for researchers analyzing feature significance, particularly within severely imbalanced outcomes where conventional methods fall short. Moreover, it is also expected to report democratic feature importance across a grid of models, resulting in a more comprehensive and generalizable feature selection. The package further implements a novel method for visualizing SHAP values both at subject level and feature level as well as a plot for feature selection based on the weighted mean SHAP ratios.

Maintained by E. F. Haghish. Last updated 1 days ago.

class-imbalance class-imbalance-problem feature-extraction feature-importance feature-selection machine-learning machine-learning-algorithms shap shap-analysis shap-values shapely shapley-additive-explanations shapley-decomposition shapley-value shapley-values shapleyvalue weighted-shap weighted-shap-confidence-interval weighted-shapley weighted-shapley-ci

2.3 match 14 stars 5.19 score 17 scripts

egenn

rtemis:Machine Learning and Visualization

Advanced Machine Learning and Visualization. Unsupervised Learning (Clustering, Decomposition), Supervised Learning (Classification, Regression), Cross-Decomposition, Bagging, Boosting, Meta-models. Static and interactive graphics.

Maintained by E.D. Gennatas. Last updated 1 months ago.

data-science data-visualization machine-learning machine-learning-library visualization

1.7 match 145 stars 7.09 score 50 scripts 2 dependents

cran

evreg:Evidential Regression

An implementation of the 'Evidential Neural Network for Regression' model recently introduced in Denoeux (2023) <doi:10.1109/TFUZZ.2023.3268200>. In this model, prediction uncertainty is quantified by Gaussian random fuzzy numbers as introduced in Denoeux (2023) <doi:10.1016/j.fss.2022.06.004>. The package contains functions for training the network, tuning hyperparameters by cross-validation or the hold-out method, and making predictions. It also contains utilities for making calculations with Gaussian random fuzzy numbers (such as, e.g., computing the degrees of belief and plausibility of an interval, or combining Gaussian random fuzzy numbers).

Maintained by Thierry Denoeux. Last updated 10 months ago.

5.3 match 2.30 score

martakarass

adept:Adaptive Empirical Pattern Transformation

Designed for optimal use in performing fast, accurate walking strides segmentation from high-density data collected from a wearable accelerometer worn during continuous walking activity.

Maintained by Marta Karas. Last updated 4 years ago.

2.2 match 7 stars 5.35 score 16 scripts

civisanalytics

civis:R Client for the 'Civis Platform API'

A convenient interface for making requests directly to the 'Civis Platform API' <https://www.civisanalytics.com/platform/>. Full documentation available 'here' <https://civisanalytics.github.io/civis-r/>.

Maintained by Peter Cooman. Last updated 2 months ago.

1.5 match 16 stars 7.84 score 144 scripts

ucl

rmcmc:Robust Markov Chain Monte Carlo Methods

Functions for simulating Markov chains using the Barker proposal to compute Markov chain Monte Carlo (MCMC) estimates of expectations with respect to a target distribution on a real-valued vector space. The Barker proposal, described in Livingstone and Zanella (2022) <doi:10.1111/rssb.12482>, is a gradient-based MCMC algorithm inspired by the Barker accept-reject rule. It combines the robustness of simpler MCMC schemes, such as random-walk Metropolis, with the efficiency of gradient-based methods, such as the Metropolis adjusted Langevin algorithm. The key function provided by the package is sample_chain(), which allows sampling a Markov chain with a specified target distribution as its stationary distribution. The chain is sampled by generating proposals and accepting or rejecting them using a Metropolis-Hasting acceptance rule. During an initial warm-up stage, the parameters of the proposal distribution can be adapted, with adapters available to both: tune the scale of the proposals by coercing the average acceptance rate to a target value; tune the shape of the proposals to match covariance estimates under the target distribution. As well as the default Barker proposal, the package also provides implementations of alternative proposal distributions, such as (Gaussian) random walk and Langevin proposals. Optionally, if 'BridgeStan's R interface <https://roualdes.github.io/bridgestan/latest/languages/r.html>, available on GitHub <https://github.com/roualdes/bridgestan>, is installed, then 'BridgeStan' can be used to specify the target distribution to sample from.

Maintained by Matthew M. Graham. Last updated 12 days ago.

approximate-inference mcmc

2.0 match 5 stars 5.85 score 8 scripts

r-forge

PST:Probabilistic Suffix Trees and Variable Length Markov Chains

Provides a framework for analysing state sequences with probabilistic suffix trees (PST), the construction that stores variable length Markov chains (VLMC). Besides functions for learning and optimizing VLMC models, the PST library includes many additional tools to analyse sequence data with these models: visualization tools, functions for sequence prediction and artificial sequences generation, as well as for context and pattern mining. The package is specifically adapted to the field of social sciences by allowing to learn VLMC models from sets of individual sequences possibly containing missing values, and by accounting for case weights. The library also allows to compute probabilistic divergence between two models, and to fit segmented VLMC, where sub-models fitted to distinct strata of the learning sample are stored in a single PST. This software results from research work executed within the framework of the Swiss National Centre of Competence in Research LIVES, which is financed by the Swiss National Science Foundation. The authors are grateful to the Swiss National Science Foundation for its financial support.

Maintained by Alexis Gabadinho. Last updated 4 years ago.

3.3 match 3.56 score 36 scripts

tidymodels

agua:'tidymodels' Integration with 'h2o'

Create and evaluate models using 'tidymodels' and 'h2o' <https://h2o.ai/>. The package enables users to specify 'h2o' as an engine for several modeling methods.

Maintained by Qiushi Yan. Last updated 9 months ago.

1.7 match 22 stars 6.88 score 80 scripts

ropensci

QuadratiK:Collection of Methods Constructed using Kernel-Based Quadratic Distances

It includes test for multivariate normality, test for uniformity on the d-dimensional Sphere, non-parametric two- and k-sample tests, random generation of points from the Poisson kernel-based density and clustering algorithm for spherical data. For more information see Saraceno G., Markatou M., Mukhopadhyay R. and Golzy M. (2024) <doi:10.48550/arXiv.2402.02290> Markatou, M. and Saraceno, G. (2024) <doi:10.48550/arXiv.2407.16374>, Ding, Y., Markatou, M. and Saraceno, G. (2023) <doi:10.5705/ss.202022.0347>, and Golzy, M. and Markatou, M. (2020) <doi:10.1080/10618600.2020.1740713>.

Maintained by Giovanni Saraceno. Last updated 1 months ago.

cpp

1.8 match 1 stars 6.36 score 27 scripts

thomasferte

reservoirnet:Reservoir Computing and Echo State Networks

A simple user-friendly library based on the 'python' module 'reservoirpy'. It provides a flexible interface to implement efficient Reservoir Computing (RC) architectures with a particular focus on Echo State Networks (ESN). Some of its features are: offline and online training, parallel implementation, sparse matrix computation, fast spectral initialization, advanced learning rules (e.g. Intrinsic Plasticity) etc. It also makes possible to easily create complex architectures with multiple reservoirs (e.g. deep reservoirs), readouts, and complex feedback loops. Moreover, graphical tools are included to easily explore hyperparameters. Finally, it includes several tutorials exploring time series forecasting, classification and hyperparameter tuning. For more information about 'reservoirpy', please see Trouvain et al. (2020) <doi:10.1007/978-3-030-61616-8_40>. This package was developed in the framework of the University of Bordeaux’s IdEx "Investments for the Future" program / RRI PHDS.

Maintained by Thomas Ferte. Last updated 2 years ago.

3.5 match 3.26 score 12 scripts

cjvanlissa

metaforest:Exploring Heterogeneity in Meta-Analysis using Random Forests

Conduct random forests-based meta-analysis, obtain partial dependence plots for metaforest and classic meta-analyses, and cross-validate and tune metaforest- and classic meta-analyses in conjunction with the caret package. A requirement of classic meta-analysis is that the studies being aggregated are conceptually similar, and ideally, close replications. However, in many fields, there is substantial heterogeneity between studies on the same topic. Classic meta-analysis lacks the power to assess more than a handful of univariate moderators. MetaForest, by contrast, has substantial power to explore heterogeneity in meta-analysis. It can identify important moderators from a larger set of potential candidates (Van Lissa, 2020). This is an appealing quality, because many meta-analyses have small sample sizes. Moreover, MetaForest yields a measure of variable importance which can be used to identify important moderators, and offers partial prediction plots to explore the shape of the marginal relationship between moderators and effect size.

Maintained by Caspar J. van Lissa. Last updated 2 months ago.

2.0 match 1 stars 5.68 score 80 scripts

bioc

basilisk:Freezing Python Dependencies Inside Bioconductor Packages

Installs a self-contained conda instance that is managed by the R/Bioconductor installation machinery. This aims to provide a consistent Python environment that can be used reliably by Bioconductor packages. Functions are also provided to enable smooth interoperability of multiple Python environments in a single R session.

Maintained by Aaron Lun. Last updated 1 months ago.

infrastructure

1.3 match 9.10 score 75 scripts 38 dependents

ajmolstad

MatrixLDA:Penalized Matrix-Normal Linear Discriminant Analysis

Fits the penalized matrix-normal model to be used for linear discriminant analysis with matrix-valued predictors. For a description of the method, see Molstad and Rothman (2018) <doi:10.1080/10618600.2018.1476249>.

Maintained by Aaron J. Molstad. Last updated 1 years ago.

openblas cpp

4.2 match 1 stars 2.70 score 4 scripts

rstudio

tfhub:Interface to 'TensorFlow' Hub

'TensorFlow' Hub is a library for the publication, discovery, and consumption of reusable parts of machine learning models. A module is a self-contained piece of a 'TensorFlow' graph, along with its weights and assets, that can be reused across different tasks in a process known as transfer learning. Transfer learning train a model with a smaller dataset, improve generalization, and speed up training.

Maintained by Tomasz Kalinowski. Last updated 3 years ago.

1.5 match 29 stars 7.46 score 73 scripts 1 dependents