Showing 200 of 528 results
mlr-org
mlr3tuning:Hyperparameter Optimization for 'mlr3'
Hyperparameter optimization package of the 'mlr3' ecosystem. It features highly configurable search spaces via the 'paradox' package and finds optimal hyperparameter configurations for any 'mlr3' learner. 'mlr3tuning' works with several optimization algorithms, e.g. Random Search, Iterated Racing, Bayesian Optimization (in 'mlr3mbo') and Hyperband (in 'mlr3hyperband'). Moreover, it can automatically optimize learners and estimate the performance of optimized models with nested resampling.
Maintained by Marc Becker. Last updated 3 months ago.
bbotk, hyperparameter-optimization, hyperparameter-tuning, machine-learning, mlr3, optimization, tune, tuning
103.2 match 55 stars 11.59 score 384 scripts 11 dependents
tidymodels
tune:Tidy Tuning Tools
The ability to tune models is important. 'tune' contains functions and classes to be used in conjunction with other 'tidymodels' packages for finding reasonable values of hyper-parameters in models, pre-processing methods, and post-processing steps.
Maintained by Max Kuhn. Last updated 11 days ago.
79.0 match 293 stars 14.27 score 756 scripts 39 dependents
bioc
mixOmics:Omics Data Integration Project
Multivariate methods are well suited to large omics data sets where the number of variables (e.g. genes, proteins, metabolites) is much larger than the number of samples (patients, cells, mice). They have the appealing property of reducing the dimension of the data by using instrumental variables (components), which are defined as combinations of all variables. Those components are then used to produce useful graphical outputs that enable better understanding of the relationships and correlation structures between the different data sets that are integrated. mixOmics offers a wide range of multivariate methods for the exploration and integration of biological datasets with a particular focus on variable selection. The package proposes several sparse multivariate models we have developed to identify the key variables that are highly correlated and/or explain the biological outcome of interest. The data that can be analysed with mixOmics may come from high-throughput sequencing technologies, such as omics data (transcriptomics, metabolomics, proteomics, metagenomics, etc.), but also from beyond the realm of omics (e.g. spectral imaging). The methods implemented in mixOmics can also handle missing values without having to delete entire rows with missing data. A non-exhaustive list of methods includes variants of generalised Canonical Correlation Analysis, sparse Partial Least Squares and sparse Discriminant Analysis. Recently we implemented integrative methods to combine multiple data sets: N-integration with variants of Generalised Canonical Correlation Analysis and P-integration with variants of multi-group Partial Least Squares.
Maintained by Eva Hamrud. Last updated 2 days ago.
immunooncology, microarray, sequencing, metabolomics, metagenomics, proteomics, geneprediction, multiplecomparison, classification, regression, bioconductor, genomics, genomics-data, genomics-visualization, multivariate-analysis, multivariate-statistics, omics, r-pkg, r-project
47.0 match 182 stars 13.71 score 1.3k scripts 22 dependents
cran
e1071:Misc Functions of the Department of Statistics, Probability Theory Group (Formerly: E1071), TU Wien
Functions for latent class analysis, short time Fourier transform, fuzzy clustering, support vector machines, shortest path computation, bagged clustering, naive Bayes classifier, generalized k-nearest neighbour ...
Maintained by David Meyer. Last updated 6 months ago.
28.8 match 28 stars 14.46 score 19k scripts 2.0k dependents
mlr-org
mlr3hyperband:Hyperband for 'mlr3'
Successive Halving (Jamieson and Talwalkar (2016) <doi:10.48550/arXiv.1502.07943>) and Hyperband (Li et al. 2018 <doi:10.48550/arXiv.1603.06560>) optimization algorithms for the mlr3 ecosystem. The implementation in mlr3hyperband features improved scheduling and parallelizes the evaluation of configurations. The package includes tuners for hyperparameter optimization in mlr3tuning and optimizers for black-box optimization in bbotk.
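The successive-halving step named above can be sketched in a few lines of Python. This is a minimal illustration, not mlr3hyperband's R implementation (which adds scheduling and parallel evaluation). It uses the common variant that keeps the best 1/eta of the configurations at each rung and multiplies the per-configuration budget by eta; `evaluate` is a hypothetical callback that returns a loss for a configuration trained with a given budget (for example, a number of epochs).

```python
def successive_halving(configs, evaluate, min_budget=1, eta=2):
    """Return the single configuration that survives successive halving.

    configs  : list of candidate configurations (treated as opaque values)
    evaluate : hypothetical callback evaluate(config, budget) -> loss, lower is better
    eta      : after each rung only the best 1/eta of the configurations are kept
    """
    survivors = list(configs)
    budget = min_budget
    while len(survivors) > 1:
        # Evaluate every surviving configuration with the current budget.
        scored = [(evaluate(c, budget), c) for c in survivors]
        # Sort by loss only, so configurations themselves never need to be comparable.
        scored.sort(key=lambda pair: pair[0])
        # Keep the best 1/eta (at least one) and give the survivors eta times more budget.
        keep = max(1, len(survivors) // eta)
        survivors = [c for _, c in scored[:keep]]
        budget *= eta
    return survivors[0]
```

For example, `successive_halving([i / 10 for i in range(10)], lambda c, b: abs(c - 0.3) + 1.0 / b)` keeps 5, then 2, then 1 of the ten candidates and returns `0.3`. Hyperband repeats this procedure over several brackets that trade off the number of starting configurations against the minimum budget.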
Maintained by Marc Becker. Last updated 9 months ago.
automl, bbotk, hyperband, hyperparameter-tuning, machine-learning, mlr3, optimization, tune, tuning
30.4 match 18 stars 7.48 score 44 scripts 3 dependents
tidymodels
dials:Tools for Creating Tuning Parameter Values
Many models contain tuning parameters (i.e. parameters that cannot be directly estimated from the data). These tools can be used to define objects for creating, simulating, or validating values for such parameters.
Maintained by Hannah Frick. Last updated 28 days ago.
12.0 match 114 stars 14.22 score 426 scripts 52 dependents
r-forge
robustbase:Basic Robust Statistics
"Essential" Robust Statistics. Tools for analyzing data with robust methods. This includes regression methodology, including model selection, and multivariate statistics, where we strive to cover the book "Robust Statistics, Theory and Methods" by Maronna, Martin and Yohai (Wiley, 2006).
Maintained by Martin Maechler. Last updated 4 months ago.
11.6 match 13.33 score 1.7k scripts 480 dependents
mlr-org
mlr3mbo:Flexible Bayesian Optimization
A modern and flexible approach to Bayesian Optimization / Model Based Optimization building on the 'bbotk' package. 'mlr3mbo' is a toolbox providing both ready-to-use optimization algorithms as well as their fundamental building blocks allowing for straightforward implementation of custom algorithms. Single- and multi-objective optimization is supported as well as mixed continuous, categorical and conditional search spaces. Moreover, using 'mlr3mbo' for hyperparameter optimization of machine learning models within the 'mlr3' ecosystem is straightforward via 'mlr3tuning'. Examples of ready-to-use optimization algorithms include Efficient Global Optimization by Jones et al. (1998) <doi:10.1023/A:1008306431147>, ParEGO by Knowles (2006) <doi:10.1109/TEVC.2005.851274> and SMS-EGO by Ponweiser et al. (2008) <doi:10.1007/978-3-540-87700-4_78>.
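Efficient Global Optimization (Jones et al., 1998), cited above, chooses the next point to evaluate by maximizing the expected improvement (EI) of a Gaussian-process surrogate over the best value seen so far. The function below is a minimal Python sketch of that closed-form acquisition for minimization; fitting the surrogate, which supplies `mu` and `sigma`, is assumed to happen elsewhere, and this is not the mlr3mbo API.

```python
import math


def expected_improvement(mu, sigma, f_min):
    """Closed-form expected improvement over the incumbent f_min (minimization).

    mu, sigma : surrogate posterior mean and standard deviation at the candidate point
    f_min     : best (lowest) objective value observed so far
    """
    improvement = f_min - mu
    if sigma <= 0.0:
        # No predictive uncertainty: EI reduces to the deterministic improvement.
        return max(improvement, 0.0)
    z = improvement / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF at z
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal PDF at z
    # EI = (f_min - mu) * Phi(z) + sigma * phi(z)
    return improvement * cdf + sigma * pdf
```

With `mu` equal to the incumbent and `sigma = 1`, the improvement term vanishes and the result is the standard normal density at zero (about 0.3989); with `sigma = 0` it reduces to `max(f_min - mu, 0)`. The first term rewards points predicted to be good, the second rewards uncertain points, which is how EI balances exploitation and exploration.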
Maintained by Lennart Schneider. Last updated 11 days ago.
automl, bayesian-optimization, bbotk, black-box-optimization, gaussian-process, hpo, hyperparameter, hyperparameter-optimization, hyperparameter-tuning, machine-learning, mlr3, model-based-optimization, optimization, optimizer, random-forest, tuning
17.5 match 25 stars 8.57 score 120 scripts 3 dependents
msalibian
RobStatTM:Robust Statistics: Theory and Methods
Companion package for the book: "Robust Statistics: Theory and Methods, second edition", <http://www.wiley.com/go/maronna/robust>. This package contains code that implements the robust estimators discussed in the recent second edition of the book above, as well as the scripts reproducing all the examples in the book.
Maintained by Matias Salibian-Barrera. Last updated 2 days ago.
robust, robust-estimation, robust-regression, robust-statistics, robustness, statistics, fortran, openblas
14.2 match 17 stars 10.23 score 84 scripts 8 dependents
brian-j-smith
MachineShop:Machine Learning Models and Tools
Meta-package for statistical and machine learning with a unified interface for model fitting, prediction, performance assessment, and presentation of results. Approaches for model fitting and prediction of numerical, categorical, or censored time-to-event outcomes include traditional regression models, regularization methods, tree-based methods, support vector machines, neural networks, ensembles, data preprocessing, filtering, and model tuning and selection. Performance metrics are provided for model assessment and can be estimated with independent test sets, split sampling, cross-validation, or bootstrap resampling. Resample estimation can be executed in parallel for faster processing and nested in cases of model tuning and selection. Modeling results can be summarized with descriptive statistics; calibration curves; variable importance; partial dependence plots; confusion matrices; and ROC, lift, and other performance curves.
Maintained by Brian J Smith. Last updated 7 months ago.
classification-models, machine-learning, predictive-modeling, regression-models, survival-models
17.8 match 61 stars 7.95 score 121 scripts
jamiemkass
ENMeval:Automated Tuning and Evaluations of Ecological Niche Models
Runs ecological niche models over all combinations of user-defined settings (i.e., tuning), performs cross validation to evaluate models, and returns data tables to aid in selection of optimal model settings that balance goodness-of-fit and model complexity. Also has functions to partition data spatially (or not) for cross validation, to plot multiple visualizations of results, to run null models to estimate significance and effect sizes of performance metrics, and to calculate range overlap between model predictions, among others. The package was originally built for Maxent models (Phillips et al. 2006, Phillips et al. 2017), but the current version can be extended to any modeling algorithm. The extensive vignette, which guides users through most package functionality but unfortunately has a file size too big for CRAN, can be found on the package's GitHub Pages website: <https://jamiemkass.github.io/ENMeval/articles/ENMeval-2.0-vignette.html>.
Maintained by Jamie M. Kass. Last updated 2 months ago.
12.2 match 49 stars 11.25 score 332 scripts 2 dependents
business-science
modeltime:The Tidymodels Extension for Time Series Modeling
The time series forecasting framework for use with the 'tidymodels' ecosystem. Models include ARIMA, Exponential Smoothing, and additional time series models from the 'forecast' and 'prophet' packages. Refer to "Forecasting Principles & Practice, Second edition" (<https://otexts.com/fpp2/>). Refer to "Prophet: forecasting at scale" (<https://research.facebook.com/blog/2017/02/prophet-forecasting-at-scale/>).
Maintained by Matt Dancho. Last updated 5 months ago.
arima, data-science, deep-learning, ets, forecasting, machine-learning, machine-learning-algorithms, modeltime, prophet, tbats, tidymodeling, tidymodels, time, time-series, time-series-analysis, timeseries, timeseries-forecasting
12.8 match 549 stars 10.57 score 1.1k scripts 7 dependents
mlopez-ibanez
irace:Iterated Racing for Automatic Algorithm Configuration
Iterated race is an extension of the Iterated F-race method for the automatic configuration of optimization algorithms, that is, (offline) tuning their parameters by finding the most appropriate settings given a set of instances of an optimization problem. M. López-Ibáñez, J. Dubois-Lacoste, L. Pérez Cáceres, T. Stützle, and M. Birattari (2016) <doi:10.1016/j.orp.2016.09.002>.
Maintained by Manuel López-Ibáñez. Last updated 29 days ago.
algorithm-configuration, hyperparameter-tuning, irace, optimization-algorithms
11.6 match 63 stars 10.28 score 103 scripts 1 dependent
consbiol-unibern
SDMtune:Species Distribution Model Selection
User-friendly framework that enables the training and the evaluation of species distribution models (SDMs). The package implements functions for data driven variable selection and model tuning and includes numerous utilities to display the results. All the functions used to select variables or to tune model hyperparameters have an interactive real-time chart displayed in the 'RStudio' viewer pane during their execution.
Maintained by Sergio Vignali. Last updated 3 months ago.
hyperparameter-tuning, species-distribution-modelling, variable-selection, cpp
15.7 match 25 stars 7.37 score 155 scripts
topepo
caret:Classification and Regression Training
Misc functions for training and plotting classification and regression models.
Maintained by Max Kuhn. Last updated 3 months ago.
5.8 match 1.6k stars 19.24 score 61k scripts 303 dependents
rstudio
tfruns:Training Run Tools for 'TensorFlow'
Create and manage unique directories for each 'TensorFlow' training run. Provides a unique, time stamped directory for each run along with functions to retrieve the directory of the latest run or latest several runs.
Maintained by Tomasz Kalinowski. Last updated 11 months ago.
8.6 match 34 stars 11.80 score 325 scripts 77 dependents
citoverse
cito:Building and Training Neural Networks
The 'cito' package provides a user-friendly interface for training and interpreting deep neural networks (DNN). 'cito' simplifies the fitting of DNNs by supporting the familiar formula syntax and hyperparameter tuning under cross-validation, and it helps to detect and handle convergence problems. DNNs can be trained on CPU, GPU and MacOS GPUs. In addition, 'cito' has many downstream functionalities such as various explainable AI (xAI) metrics (e.g. variable importance, partial dependence plots, accumulated local effect plots, and effect estimates) to interpret trained DNNs. 'cito' optionally provides confidence intervals (and p-values) for all xAI metrics and predictions. At the same time, 'cito' is computationally efficient because it is based on the deep learning framework 'torch'. The 'torch' package is native to R, so no Python installation or other API is required for this package.
Maintained by Maximilian Pichler. Last updated 2 months ago.
machine-learning, neural-network
10.8 match 42 stars 9.10 score 129 scripts 1 dependent
elsa-yang98
ConformalSmallest:Efficient Tuning-Free Conformal Prediction
An implementation of efficiency-first conformal prediction (EFCP) and validity-first conformal prediction (VFCP) that demonstrates both validity (coverage guarantee) and efficiency (width guarantee). To learn how to use it, see the vignettes for a quick tutorial. The package is based on the work by Yang Y. and Kuchibhotla A. (2021) <arxiv:2104.13871>.
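As background for the coverage guarantee mentioned above, the standard split conformal interval can be sketched in Python as follows. This is plain split conformal prediction, shown only for orientation; it is not the package's EFCP or VFCP procedure, and `predict` stands for any hypothetical point predictor fitted on a separate training split.

```python
import math


def split_conformal_interval(predict, x_cal, y_cal, x_new, alpha=0.1):
    """Split conformal prediction interval for x_new.

    Assuming calibration and test points are exchangeable, the interval covers
    the true response with probability at least 1 - alpha (marginally).
    predict      : point predictor already fitted on a separate training split
    x_cal, y_cal : held-out calibration data, not used for fitting
    """
    # Absolute residuals on the calibration split serve as conformity scores.
    scores = sorted(abs(y - predict(x)) for x, y in zip(x_cal, y_cal))
    n = len(scores)
    # Finite-sample quantile: the ceil((n + 1) * (1 - alpha))-th smallest score.
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: only an unbounded interval is valid.
        return (-math.inf, math.inf)
    q = scores[k - 1]
    y_hat = predict(x_new)
    return (y_hat - q, y_hat + q)
```

For example, with `predict = lambda x: 2 * x`, nine calibration points whose absolute residuals are 1 through 9, and `alpha = 0.1`, the rank k is 9, so the half-width is 9 and the interval at `x_new = 10` is `(11, 29)`. Using the ceil((n + 1)(1 - alpha)) rank rather than a plain empirical quantile is what yields the finite-sample guarantee; the interval width is then the efficiency that EFCP and VFCP trade off against validity.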
Maintained by Yachong Yang. Last updated 4 years ago.
22.6 match 2 stars 4.30 score 5 scripts
kogalur
randomForestSRC:Fast Unified Random Forests for Survival, Regression, and Classification (RF-SRC)
Fast OpenMP parallel computing of Breiman's random forests for univariate, multivariate, unsupervised, survival, competing risks, class imbalanced classification and quantile regression. New Mahalanobis splitting for correlated outcomes. Extreme random forests and randomized splitting. Suite of imputation methods for missing data. Fast random forests using subsampling. Confidence regions and standard errors for variable importance. New improved holdout importance. Case-specific importance. Minimal depth variable importance. Visualize trees in your Safari or Google Chrome browser. Anonymous random forests for data privacy.
Maintained by Udaya B. Kogalur. Last updated 2 months ago.
11.5 match 10 stars 7.90 score 1.2k scripts 12 dependents
biomodhub
biomod2:Ensemble Platform for Species Distribution Modeling
Functions for species distribution modeling, calibration and evaluation, ensembles of models, ensemble forecasting and visualization. The package makes it possible to run up to 10 single models consistently on a presence/absence (or presence/pseudo-absence) dataset and to combine them in ensemble models and ensemble projections. A range of other evaluation and visualisation tools is also available within the package.
Maintained by Maya Gueguen. Last updated 4 days ago.
6.5 match 95 stars 13.88 score 536 scripts 7 dependents
cefet-rj-dal
daltoolbox:Leveraging Experiment Lines to Data Analytics
The natural increase in the complexity of current research experiments and data demands better tools to enhance productivity in Data Analytics. The package is a framework designed to address the modern challenges in data analytics workflows. The package is inspired by Experiment Line concepts. It aims to provide seamless support for users in developing their data mining workflows by offering a uniform data model and method API. It enables the integration of various data mining activities, including data preprocessing, classification, regression, clustering, and time series prediction. It also offers options for hyper-parameter tuning and supports integration with existing libraries and languages. Overall, the package provides researchers with a comprehensive set of functionalities for data science, promoting ease of use, extensibility, and integration with various tools and libraries. Information on Experiment Line is based on Ogasawara et al. (2009) <doi:10.1007/978-3-642-02279-1_20>.
Maintained by Eduardo Ogasawara. Last updated 1 month ago.
13.4 match 1 star 6.65 score 536 scripts 4 dependents
bioc
CMA:Synthesis of microarray-based classification
This package provides a comprehensive collection of various microarray-based classification algorithms both from Machine Learning and Statistics. Variable Selection, Hyperparameter tuning, Evaluation and Comparison can be performed combined or stepwise in a user-friendly environment.
Maintained by Roman Hornung. Last updated 5 months ago.
16.7 match 5.09 score 61 scripts
retowuest
autoMrP:Improving MrP with Ensemble Learning
A tool that improves the prediction performance of multilevel regression with post-stratification (MrP) by combining a number of machine learning methods. For information on the method, please refer to Broniecki, Wüest, Leemann (2020) ''Improving Multilevel Regression with Post-Stratification Through Machine Learning (autoMrP)'' in the 'Journal of Politics'. Final pre-print version: <https://lucasleemann.files.wordpress.com/2020/07/automrp-r2pa.pdf>.
Maintained by Philipp Broniecki. Last updated 5 months ago.
14.9 match 27 stars 5.61 score
irudnyts
openai:R Wrapper for OpenAI API
An R wrapper of OpenAI API endpoints (see <https://platform.openai.com/docs/introduction> for details). This package covers Models, Completions, Chat, Edits, Images, Embeddings, Audio, Files, Fine-tunes, Moderations, and legacy Engines endpoints.
Maintained by Iegor Rudnytskyi. Last updated 4 months ago.
10.4 match 172 stars 8.05 score 336 scripts 5 dependents
tidymodels
hardhat:Construct Modeling Packages
Building modeling packages is hard. A large amount of effort generally goes into providing an implementation for a new method that is efficient, fast, and correct, but often less emphasis is put on the user interface. A good interface requires specialized knowledge about S3 methods and formulas, which the average package developer might not have. The goal of 'hardhat' is to reduce the burden around building new modeling packages by providing functionality for preprocessing, predicting, and validating input.
Maintained by Hannah Frick. Last updated 1 month ago.
5.3 match 103 stars 14.88 score 175 scripts 436 dependents
r4ss
r4ss:R Code for Stock Synthesis
A collection of R functions for use with Stock Synthesis, a fisheries stock assessment modeling platform written in ADMB by Dr. Richard D. Methot at the NOAA Northwest Fisheries Science Center. The functions include tools for summarizing and plotting results, manipulating files, visualizing model parameterizations, and various other common stock assessment tasks. This version of '{r4ss}' is compatible with Stock Synthesis versions 3.24 through 3.30 (specifically version 3.30.23.1, from December 2024). Support for 3.24 models is only through the core functions for reading output and plotting.
Maintained by Ian G. Taylor. Last updated 3 days ago.
fisheries, fisheries-stock-assessment, stock-synthesis
6.7 match 43 stars 11.38 score 1.0k scripts 2 dependents
mlr-org
bbotk:Black-Box Optimization Toolkit
Features highly configurable search spaces via the 'paradox' package and can optimize any user-defined objective function. The package includes several optimization algorithms, e.g. Random Search, Iterated Racing, Bayesian Optimization (in 'mlr3mbo') and Hyperband (in 'mlr3hyperband'). bbotk is the base package of 'mlr3tuning', 'mlr3fselect' and 'miesmuschel'.
Maintained by Marc Becker. Last updated 3 months ago.
bbotk, black-box-optimization, data-science, hyperparameter-optimization, hyperparameter-tuning, machine-learning, mlr3, optimization
7.5 match 22 stars 9.87 score 166 scripts 14 dependents
kapsner
mlexperiments:Machine Learning Experiments
Provides 'R6' objects to perform parallelized hyperparameter optimization and cross-validation. Hyperparameter optimization can be performed with Bayesian optimization (via 'ParBayesianOptimization' <https://cran.r-project.org/package=ParBayesianOptimization>) and grid search. The optimized hyperparameters can be validated using k-fold cross-validation. Alternatively, hyperparameter optimization and validation can be performed with nested cross-validation. While 'mlexperiments' focuses on core wrappers for machine learning experiments, additional learner algorithms can be supplemented by inheriting from the provided learner base class.
Maintained by Lorenz A. Kapsner. Last updated 10 days ago.
cross-validation, experiment, hyperparameter-optimization, hyperparameter-tuning, machine-learning, nested
9.3 match 5 stars 7.64 score 49 scripts 2 dependents
ngreifer
cobalt:Covariate Balance Tables and Plots
Generate balance tables and plots for covariates of groups preprocessed through matching, weighting or subclassification, for example, using propensity scores. Includes integration with 'MatchIt', 'WeightIt', 'MatchThem', 'twang', 'Matching', 'optmatch', 'CBPS', 'ebal', 'cem', 'sbw', and 'designmatch' for assessing balance on the output of their preprocessing functions. Users can also specify data for balance assessment not generated through the above packages. Also included are methods for assessing balance in clustered or multiply imputed data sets or data sets with multi-category, continuous, or longitudinal treatments.
Maintained by Noah Greifer. Last updated 11 months ago.
causal-inference, propensity-scores
4.5 match 75 stars 12.98 score 1.0k scripts 8 dependents
tidymodels
parsnip:A Common API to Modeling and Analysis Functions
A common interface is provided to allow users to specify a model without having to remember the different argument names across different functions or computational engines (e.g. 'R', 'Spark', 'Stan', 'H2O', etc.).
Maintained by Max Kuhn. Last updated 3 days ago.
3.3 match 612 stars 16.37 score 3.4k scripts 69 dependents
foucher-y
survivalSL:Super Learner for Survival Prediction from Censored Data
Several functions and S3 methods to construct a super learner in the presence of censored times-to-event and to evaluate its prognostic capacities.
Maintained by Yohann Foucher. Last updated 1 month ago.
14.3 match 2 stars 3.70 score
centerforstatistics-ugent
xnet:Two-Step Kernel Ridge Regression for Network Predictions
Fit a two-step kernel ridge regression model for predicting edges in networks, and carry out cross-validation using shortcuts for swift and accurate performance assessment (Stock et al., 2018 <doi:10.1093/bib/bby095>).
Maintained by Joris Meys. Last updated 4 years ago.
10.0 match 11 stars 5.30 score 12 scripts
mlr-org
paradox:Define and Work with Parameter Spaces for Complex Algorithms
Define parameter spaces, constraints and dependencies for arbitrary algorithms, to program on such spaces. Also includes statistical designs and random samplers. Objects are implemented as 'R6' classes.
Maintained by Martin Binder. Last updated 8 months ago.
experimental-design, hyperparameters, mlr3, transformations
4.5 match 29 stars 11.56 score 316 scripts 38 dependents
schlosslab
mikropml:User-Friendly R Package for Supervised Machine Learning Pipelines
An interface to build machine learning models for classification and regression problems. 'mikropml' implements the ML pipeline described by Topçuoğlu et al. (2020) <doi:10.1128/mBio.00434-20> with reasonable default options for data preprocessing, hyperparameter tuning, cross-validation, testing, model evaluation, and interpretation steps. See the website <https://www.schlosslab.org/mikropml/> for more information, documentation, and examples.
Maintained by Kelly Sovacool. Last updated 2 years ago.
6.4 match 56 stars 7.83 score 86 scripts
tidymodels
finetune:Additional Functions for Model Tuning
The ability to tune models is important. 'finetune' enhances the 'tune' package by providing more specialized methods for finding reasonable values of model tuning parameters. Two racing methods described by Kuhn (2014) <arXiv:1405.6974> are included. An iterative search method using generalized simulated annealing (Bohachevsky, Johnson and Stein, 1986) <doi:10.1080/00401706.1986.10488128> is also included.
Maintained by Max Kuhn. Last updated 7 months ago.
5.9 match 62 stars 8.36 score 704 scripts 1 dependent
eagerai
kerastuneR:Interface to 'Keras Tuner'
'Keras Tuner' <https://keras-team.github.io/keras-tuner/> is a hypertuning framework made for humans. It aims at making the life of AI practitioners, hypertuner algorithm creators and model designers as simple as possible by providing them with a clean and easy to use API for hypertuning. 'Keras Tuner' makes moving from a base model to a hypertuned one quick and easy by only requiring you to change a few lines of code.
Maintained by Turgut Abdullayev. Last updated 11 months ago.
hyperparameter-tuning, hypertuning, keras, keras-tuner, tensorflow, trial
7.5 match 34 stars 6.61 score 48 scripts
marjoleinf
pre:Prediction Rule Ensembles
Derives prediction rule ensembles (PREs). Largely follows the procedure for deriving PREs as described in Friedman & Popescu (2008; <DOI:10.1214/07-AOAS148>), with adjustments and improvements. The main function pre() derives prediction rule ensembles consisting of rules and/or linear terms for continuous, binary, count, multinomial, and multivariate continuous responses. Function gpe() derives generalized prediction ensembles, consisting of rules, hinge and linear functions of the predictor variables.
Maintained by Marjolein Fokkema. Last updated 9 months ago.
5.8 match 58 stars 8.49 score 98 scripts 1 dependent
cran
datarobot:'DataRobot' Predictive Modeling API
For working with the 'DataRobot' predictive modeling platform's API <https://www.datarobot.com/>.
Maintained by AJ Alon. Last updated 1 year ago.
14.1 match 2 stars 3.48 score
nikita-moor
ldatuning:Tuning of the Latent Dirichlet Allocation Models Parameters
This library estimates the best-fitting number of topics.
Maintained by Nathan Chaney. Last updated 10 months ago.
4.7 match 75 stars 9.76 score 356 scripts 5 dependents
theoreticalecology
sjSDM:Scalable Joint Species Distribution Modeling
A scalable and fast method for estimating joint Species Distribution Models (jSDMs) for big community data, including eDNA data. The package estimates a full (i.e. non-latent) jSDM with different response distributions (including the traditional multivariate probit model). The package allows users to perform variation partitioning (VP) / ANOVA on the fitted models to separate the contribution of environmental, spatial, and biotic associations. In addition, the total R-squared can be further partitioned per species and site to reveal the internal metacommunity structure, see Leibold et al., <doi:10.1111/oik.08618>. The internal structure can then be regressed against environmental and spatial distinctiveness, richness, and traits to analyze metacommunity assembly processes. The package includes support for accounting for spatial autocorrelation and the option to fit responses using deep neural networks instead of a standard linear predictor. As described in Pichler & Hartig (2021) <doi:10.1111/2041-210X.13687>, scalability is achieved by using a Monte Carlo approximation of the joint likelihood implemented via 'PyTorch' and 'reticulate', which can be run on CPUs or GPUs.
Maintained by Maximilian Pichler. Last updated 22 days ago.
deep-learning, gpu-acceleration, machine-learning, species-distribution-modelling, species-interactions
5.9 match 69 stars 7.64 score 70 scripts
tidymodels
shinymodels:Interactive Assessments of Models
Launch a 'shiny' application for 'tidymodels' results. For classification or regression models, the app can be used to determine if there is lack of fit or poorly predicted points.
Maintained by Simon Couch. Last updated 5 months ago.
7.1 match 48 stars 6.21 score 48 scripts
azure
azuremlsdk:Interface to the 'Azure Machine Learning' 'SDK'
Interface to the 'Azure Machine Learning' Software Development Kit ('SDK'). Data scientists can use the 'SDK' to train, deploy, automate, and manage machine learning models on the 'Azure Machine Learning' service. To learn more about 'Azure Machine Learning' visit the website: <https://docs.microsoft.com/en-us/azure/machine-learning/service/overview-what-is-azure-ml>.
Maintained by Diondra Peck. Last updated 3 years ago.
amlcompute, azure, azure-machine-learning, azureml, dsi, machine-learning, rstudio, sdk-r
4.9 match 106 stars 8.91 score 221 scripts
zjph602xtc
WeightSVM:Subject Weighted Support Vector Machines
Functions for subject/instance weighted support vector machines (SVM). It uses a modified version of 'libsvm' and is compatible with package 'e1071'. It also allows a user-defined kernel matrix.
Maintained by Tianchen Xu. Last updated 5 months ago.
7.8 match 3 stars 5.54 score 11 scripts 7 dependents
auto-optimization
iraceplot:Plots for Visualizing the Data Produced by the 'irace' Package
Graphical visualization tools for analyzing the data produced by 'irace'. The 'iraceplot' package enables users to analyze the performance and the parameter space data sampled by the configuration during the search process. It provides a set of functions that generate different plots to visualize the configurations sampled during the execution of 'irace' and their performance. The functions just require the log file generated by 'irace' and, in some cases, they can be used with user-provided data.
Maintained by Manuel López-Ibáñez. Last updated 1 month ago.
7.5 match 5 stars 5.70 score 7 scripts
tkonopka
umap:Uniform Manifold Approximation and Projection
Uniform manifold approximation and projection is a technique for dimension reduction. The algorithm was described by McInnes and Healy (2018) in <arXiv:1802.03426>. This package provides an interface for two implementations. One is written from scratch, including components for nearest-neighbor search and for embedding. The second implementation is a wrapper for 'python' package 'umap-learn' (requires separate installation, see vignette for more details).
Maintained by Tomasz Konopka. Last updated 11 months ago.
dimensionality-reduction, umap, cpp
3.3 match 132 stars 12.74 score 3.6k scripts 43 dependents
rstudio
keras3:R Interface to 'Keras'
Interface to 'Keras' <https://keras.io>, a high-level neural networks API. 'Keras' was developed with a focus on enabling fast experimentation, supports both convolution based networks and recurrent networks (as well as combinations of the two), and runs seamlessly on both CPU and GPU devices.
Maintained by Tomasz Kalinowski. Last updated 3 days ago.
3.1 match 845 stars 13.57 score 264 scripts 2 dependents
nliulab
AutoScore:An Interpretable Machine Learning-Based Automatic Clinical Score Generator
A novel interpretable machine learning-based framework to automate the development of a clinical scoring model for predefined outcomes. Our novel framework consists of six modules: variable ranking with machine learning, variable transformation, score derivation, model selection, domain knowledge-based score fine-tuning, and performance evaluation. The details are described in our research paper <doi:10.2196/21798>. Users or clinicians can seamlessly generate parsimonious sparse-score risk models (i.e., risk scores), which can be easily implemented and validated in clinical practice. We hope to see its application in various medical case studies.
Maintained by Feng Xie. Last updated 13 days ago.
5.3 match 32 stars 7.70 score 30 scripts
nixtla
nixtlar:A Software Development Kit for 'Nixtla''s 'TimeGPT'
A Software Development Kit for working with 'Nixtla''s 'TimeGPT', a foundation model for time series forecasting. 'API' is an acronym for 'application programming interface'; this package allows users to interact with 'TimeGPT' via the 'API'. You can set and validate 'API' keys and generate forecasts via 'API' calls. It is compatible with 'tsibble' and base R. For more details visit <https://docs.nixtla.io/>.
Maintained by Mariana Menchero. Last updated 26 days ago.
5.0 match 30 stars 8.16 score 38 scripts
cran
astrochron:A Computational Tool for Astrochronology
Routines for astrochronologic testing, astronomical time scale construction, and time series analysis <doi:10.1016/j.earscirev.2018.11.015>. Also included are a range of statistical analysis and modeling routines that are relevant to time scale development and paleoclimate analysis.
Maintained by Stephen Meyers. Last updated 6 months ago.
10.2 match 5 stars 3.85 score 141 scripts
r-forge
DPQ:Density, Probability, Quantile ('DPQ') Computations
Computations for approximations and alternatives for the 'DPQ' (Density (pdf), Probability (cdf) and Quantile) functions for probability distributions in R. Primary focus is on (central and non-central) beta, gamma and related distributions such as the chi-squared, F, and t. -- For several distribution functions, provide functions implementing formulas from Johnson, Kotz, and Kemp (1992) <doi:10.1002/bimj.4710360207> and Johnson, Kotz, and Balakrishnan (1995) for discrete or continuous distributions respectively. This is for the use of researchers in these numerical approximation implementations, notably for my own use in order to improve standard R pbeta(), qgamma(), ..., etc: {'"dpq"'-functions}.
Maintained by Martin Maechler. Last updated 1 month ago.
6.8 match 5.75 score 43 scripts 1 dependent
wanchanglin
mt:Metabolomics Data Analysis Toolbox
Functions for metabolomics data analysis: data preprocessing, orthogonal signal correction, PCA analysis, PCA-DA analysis, PLS-DA analysis, classification, feature selection, correlation analysis, data visualisation and re-sampling strategies.
Maintained by Wanchang Lin. Last updated 1 year ago.
8.6 match 3 stars 4.57 score 50 scripts
bioc
iClusterPlus:Integrative clustering of multi-type genomic data
Integrative clustering of multiple genomic data using a joint latent variable model.
Maintained by Qianxing Mo. Last updated 4 months ago.
multi-omics, clustering, fortran, openblas
6.8 match 5.76 score 190 scripts
bssherwood
rqPen:Penalized Quantile Regression
Performs penalized quantile regression with LASSO, elastic net, SCAD and MCP penalty functions, including group penalties. In addition, offers a group penalty that provides consistent variable selection across quantiles. Provides a function that automatically generates lambdas and evaluates different models with cross-validation or BIC, including a large p version of BIC. The URL below links to a work-in-progress vignette.
Maintained by Ben Sherwood. Last updated 30 days ago.
5.4 match 17 stars 7.19 score 100 scripts 3 dependents
robingenuer
VSURF:Variable Selection Using Random Forests
Three-step variable selection procedure based on random forests. Initially developed to handle high-dimensional data (in which the number of variables largely exceeds the number of observations), the package is very versatile and can treat most dimensions of data, for regression and supervised classification problems. The first step eliminates irrelevant variables from the dataset. The second step aims to select all variables related to the response, for interpretation purposes. The third step refines the selection by eliminating redundancy in the set of variables selected by the second step, for prediction purposes. Genuer, R., Poggi, J.-M. and Tuleau-Malot, C. (2015) <https://journal.r-project.org/archive/2015-2/genuer-poggi-tuleaumalot.pdf>.
Maintained by Robin Genuer. Last updated 8 months ago.
5.1 match 36 stars 7.49 score 192 scripts 1 dependent
tidymodels
tidyclust:A Common API to Clustering
A common interface for specifying clustering models, in the same style as 'parsnip'. Creates a unified interface across different functions and computational engines.
Maintained by Emil Hvitfeldt. Last updated 2 months ago.
5.1 match 111 stars 7.45 score 139 scripts
nanxstats
stackgbm:Stacked Gradient Boosting Machines
A minimalist implementation of model stacking by Wolpert (1992) <doi:10.1016/S0893-6080(05)80023-1> for boosted tree models. A classic, two-layer stacking model is implemented, where the first layer generates features using gradient boosting trees, and the second layer employs a logistic regression model that uses these features as inputs. Utilities for training the base models and for parameter tuning are provided, allowing users to experiment with different ensemble configurations easily. It aims to provide a simple and efficient way to combine multiple gradient boosting models to improve predictive model performance and robustness.
Maintained by Nan Xiao. Last updated 11 months ago.
automl, catboost, decision-trees, ensemble-learning, gbdt, gbm, gradient-boosting, lightgbm, machine-learning, model-stacking, xgboost
6.9 match 25 stars 5.40 score 3 scripts
nelson-gon
manymodelr:Build and Tune Several Models
Frequently one needs a convenient way to build and tune several models in one go. The goal is to provide a number of machine learning convenience functions. It provides the ability to build, tune and obtain predictions of several models in one function. The models are built using functions from 'caret' with easier-to-read syntax. Kuhn (2014) <arXiv:1405.6974>.
Maintained by Nelson Gonzabato. Last updated 3 years ago.
analysis-of-variance, anova, correlation, correlation-coefficient, generalized-linear-models, gradient-boosting-decision-trees, knn-classification, linear-models, linear-regression, machine-learning, missing-values, models, r-programming, random-forest-algorithm, regression-models
6.9 match 2 stars 5.30 score 50 scripts
bioc
tidytof:Analyze High-dimensional Cytometry Data Using Tidy Data Principles
This package implements an interactive, scientific analysis pipeline for high-dimensional cytometry data built using tidy data principles. It is specifically designed to play well with both the tidyverse and Bioconductor software ecosystems, with functionality for reading/writing data files, data cleaning, preprocessing, clustering, visualization, modeling, and other quality-of-life functions. tidytof implements a "grammar" of high-dimensional cytometry data analysis.
Maintained by Timothy Keyes. Last updated 5 months ago.
singlecell, flowcytometry, bioinformatics, cytometry, data-science, single-cell, tidyverse, cpp
5.0 match 19 stars 7.26 score 35 scripts
bnaras
PMA:Penalized Multivariate Analysis
Performs Penalized Multivariate Analysis: a penalized matrix decomposition, sparse principal components analysis, and sparse canonical correlation analysis, described in Witten, Tibshirani and Hastie (2009) <doi:10.1093/biostatistics/kxp008> and Witten and Tibshirani (2009) Extensions of sparse canonical correlation analysis, with applications to genomic data <doi:10.2202/1544-6115.1470>.
Maintained by Balasubramanian Narasimhan. Last updated 1 year ago.
5.0 match 4 stars 7.24 score 254 scripts 11 dependents
antonio-pgarcia
evoper:Evolutionary Parameter Estimation for 'Repast Simphony' Models
EvoPER (Evolutionary Parameter Estimation for Individual-based Models) is an extensible package providing optimization-driven parameter estimation methods using metaheuristics and evolutionary computation techniques (Particle Swarm Optimization, Simulated Annealing, Ant Colony Optimization for continuous domains, Tabu Search, Evolutionary Strategies, ...), which could be more efficient and require, in some cases, fewer model evaluations than alternatives relying on experimental design. Currently there is built-in support for models developed with the 'Repast Simphony' Agent-Based framework (<https://repast.github.io/>) and with NetLogo (<https://ccl.northwestern.edu/netlogo/>), which are the most widely used frameworks for Agent-based modeling.
Maintained by Antonio Prestes Garcia. Last updated 5 years ago.
8.7 match 6 stars 3.92 score 28 scripts
e-sensing
sits:Satellite Image Time Series Analysis for Earth Observation Data Cubes
An end-to-end toolkit for land use and land cover classification using big Earth observation data, based on machine learning methods applied to satellite image data cubes, as described in Simoes et al (2021) <doi:10.3390/rs13132428>. Builds regular data cubes from collections in AWS, Microsoft Planetary Computer, Brazil Data Cube, Copernicus Data Space Environment (CDSE), Digital Earth Africa, Digital Earth Australia, NASA HLS using the Spatio-temporal Asset Catalog (STAC) protocol (<https://stacspec.org/>) and the 'gdalcubes' R package developed by Appel and Pebesma (2019) <doi:10.3390/data4030092>. Supports visualization methods for images and time series and smoothing filters for dealing with noisy time series. Includes functions for quality assessment of training samples using self-organized maps as presented by Santos et al (2021) <doi:10.1016/j.isprsjprs.2021.04.014>. Includes methods to reduce training samples imbalance proposed by Chawla et al (2002) <doi:10.1613/jair.953>. Provides machine learning methods including support vector machines, random forests, extreme gradient boosting, multi-layer perceptrons, temporal convolutional neural networks proposed by Pelletier et al (2019) <doi:10.3390/rs11050523>, and temporal attention encoders by Garnot and Landrieu (2020) <doi:10.48550/arXiv.2007.00586>. Supports GPU processing of deep learning models using torch <https://torch.mlverse.org/>. Performs efficient classification of big Earth observation data cubes and includes functions for post-classification smoothing based on Bayesian inference as described by Camara et al (2024) <doi:10.3390/rs16234572>, and methods for active learning and uncertainty assessment. Supports region-based time series analysis using package supercells <https://jakubnowosad.com/supercells/>. Enables best practices for estimating area and assessing accuracy of land change as recommended by Olofsson et al (2014) <doi:10.1016/j.rse.2014.02.015>. 
Minimum recommended requirements: 16 GB RAM and 4 CPU dual-core.
Maintained by Gilberto Camara. Last updated 30 days ago.
big-earth-data, cbers, earth-observation, eo-datacubes, geospatial, image-time-series, land-cover-classification, landsat, planetary-computer, r-spatial, remote-sensing, rspatial, satellite-image-time-series, satellite-imagery, sentinel-2, stac-api, stac-catalog, cpp
3.5 match 494 stars 9.50 score 384 scripts
bioc
XDE:XDE: a Bayesian hierarchical model for cross-study analysis of differential gene expression
Multi-level model for cross-study detection of differential gene expression.
Maintained by Robert Scharpf. Last updated 5 months ago.
microarray, differentialexpression, cpp
7.8 match 4.20 score 10 scripts
gdurif
plsgenomics:PLS Analyses for Genomics
Routines for PLS-based genomic analyses, implementing PLS methods for classification with microarray data and prediction of transcription factor activities from combined ChIP-chip analysis. Versions >=1.2-1 include two new classification methods for microarray data: GSIM and Ridge PLS. Versions >=1.3 include a new classification method combining variable selection and compression in a logistic regression context, logit-SPLS, and an adaptive version of the sparse PLS.
Maintained by Ghislain Durif. Last updated 12 months ago.
5.9 match 5.55 score 140 scripts 2 dependents
thomasp85
ggraph:An Implementation of Grammar of Graphics for Graphs and Networks
The grammar of graphics as implemented in ggplot2 is a poor fit for graph and network visualizations due to its reliance on tabular data input. ggraph is an extension of the ggplot2 API tailored to graph visualizations and provides the same flexible approach to building up plots layer by layer.
Maintained by Thomas Lin Pedersen. Last updated 1 year ago.
ggplot-extension, ggplot2, graph-visualization, network-visualization, visualization, cpp
1.9 match 1.1k stars 16.96 score 9.2k scripts 111 dependents
chrhennig
fpc:Flexible Procedures for Clustering
Various methods for clustering and cluster validation. Fixed point clustering. Linear regression clustering. Clustering by merging Gaussian mixture components. Symmetric and asymmetric discriminant projections for visualisation of the separation of groupings. Cluster validation statistics for distance based clustering including corrected Rand index. Standardisation of cluster validation statistics by random clusterings and comparison between many clustering methods and numbers of clusters based on this. Cluster-wise cluster stability assessment. Methods for estimation of the number of clusters: Calinski-Harabasz, Tibshirani and Walther's prediction strength, Fang and Wang's bootstrap stability. Gaussian/multinomial mixture fitting for mixed continuous/categorical variables. Variable-wise statistics for cluster interpretation. DBSCAN clustering. Interface functions for many clustering methods implemented in R, including estimating the number of clusters with kmeans, pam and clara. Modality diagnosis for Gaussian mixtures. For an overview see package?fpc.
Maintained by Christian Hennig. Last updated 6 months ago.
3.4 match 11 stars 9.25 score 2.6k scripts 70 dependents
leonawicz
tabr:Music Notation Syntax, Manipulation, Analysis and Transcription in R
Provides a music notation syntax and a collection of music programming functions for generating, manipulating, organizing, and analyzing musical information in R. Music syntax can be entered directly in character strings, for example to quickly transcribe short pieces of music. The package contains functions for directly performing various mathematical, logical and organizational operations and musical transformations on special object classes that facilitate working with music data and notation. The same music data can be organized in tidy data frames for a familiar and powerful approach to the analysis of large amounts of structured music data. Functions are available for mapping seamlessly between these formats and their representations of musical information. The package also provides an API to 'LilyPond' (<https://lilypond.org/>) for transcribing musical representations in R into tablature ("tabs") and sheet music. 'LilyPond' is open source music engraving software for generating high quality sheet music based on markup syntax. The package generates 'LilyPond' files from R code and can pass them to the 'LilyPond' command line interface to be rendered into sheet music PDF files or inserted into R markdown documents. The package offers nominal MIDI file output support in conjunction with rendering sheet music. The package can read MIDI files and attempts to structure the MIDI data to integrate as best as possible with the data structures and functionality found throughout the package.
Maintained by Matthew Leonawicz. Last updated 6 months ago.
guitar-tablature, lilypond, lilypond-api, music-analysis, music-data, music-notation, music-programming, music-syntax, music-transcription, sheet-music
4.0 match 132 stars 7.87 score 94 scripts
tidymodels
rsample:General Resampling Infrastructure
Classes and functions to create and summarize different types of resampling objects (e.g. bootstrap, cross-validation).
Maintained by Hannah Frick. Last updated 4 days ago.
1.9 match 341 stars 16.72 score 5.2k scripts 79 dependents
kollerma
robustlmm:Robust Linear Mixed Effects Models
Implements the Robust Scoring Equations estimator to fit linear mixed effects models robustly. Robustness is achieved by modification of the scoring equations combined with the Design Adaptive Scale approach.
Maintained by Manuel Koller. Last updated 1 year ago.
3.5 match 28 stars 8.79 score 138 scripts
hkestler
TunePareto:Multi-Objective Parameter Tuning for Classifiers
Generic methods for parameter tuning of classification algorithms using multiple scoring functions (Muessel et al. (2012), <doi:10.18637/jss.v046.i05>).
Maintained by Hans Kestler. Last updated 1 year ago.
8.8 match 1 star 3.52 score 92 scripts 2 dependents
dfalbel
cloudml:Interface to the Google Cloud Machine Learning Platform
Interface to the Google Cloud Machine Learning Platform <https://cloud.google.com/ml-engine>, which provides cloud tools for training machine learning models.
Maintained by Daniel Falbel. Last updated 6 years ago.
8.1 match 3.85 score 141 scripts
sparklyr
sparklyr:R Interface to Apache Spark
R interface to Apache Spark, a fast and general engine for big data processing, see <https://spark.apache.org/>. This package supports connecting to local and remote Apache Spark clusters, provides a 'dplyr' compatible back-end, and provides an interface to Spark's built-in machine learning algorithms.
Maintained by Edgar Ruiz. Last updated 8 days ago.
apache-spark, distributed, dplyr, ide, livy, machine-learning, remote-clusters, spark, sparklyr
2.0 match 959 stars 15.16 score 4.0k scripts 21 dependents
mlr-org
mlr3pipelines:Preprocessing Operators and Pipelines for 'mlr3'
Dataflow programming toolkit that enriches 'mlr3' with a diverse set of pipelining operators ('PipeOps') that can be composed into graphs. Operations exist for data preprocessing, model fitting, and ensemble learning. Graphs can themselves be treated as 'mlr3' 'Learners' and can therefore be resampled, benchmarked, and tuned.
Maintained by Martin Binder. Last updated 7 days ago.
bagging, data-science, dataflow-programming, ensemble-learning, machine-learning, mlr3, pipelines, preprocessing, stacking
2.4 match 141 stars 12.36 score 448 scripts 7 dependents
microsoft
finnts:Microsoft Finance Time Series Forecasting Framework
Automated time series forecasting developed by Microsoft Finance. The Microsoft Finance Time Series Forecasting Framework, aka Finn, can be used to forecast any component of the income statement, balance sheet, or any other area of interest to finance. Finn can forecast any numerical quantity over time. While it can be applied outside of the finance domain, Finn was built to meet the needs of financial analysts to better forecast their businesses within a company, and has many built-in features that are specific to the needs of financial forecasters. Happy forecasting!
Maintained by Mike Tokic. Last updated 24 days ago.
business, data-science, feature-selection, finance, finnts, forecasting, machine-learning, microsoft, time-series
3.1 match 193 stars 9.45 score 39 scripts
mlr-org
mlr3verse:Easily Install and Load the 'mlr3' Package Family
The 'mlr3' package family is a set of packages for machine-learning purposes built in a modular fashion. This wrapper package aims to simplify the installation and loading of the core 'mlr3' packages. Get more information about the 'mlr3' project at <https://mlr3book.mlr-org.com/>.
Maintained by Marc Becker. Last updated 2 months ago.
3.3 match 55 stars 8.36 score 720 scripts 1 dependent
mlverse
tabnet:Fit 'TabNet' Models for Classification and Regression
Implements the 'TabNet' model by Sercan O. Arik et al. (2019) <doi:10.48550/arXiv.1908.07442> with 'Coherent Hierarchical Multi-label Classification Networks' by Giunchiglia et al. <doi:10.48550/arXiv.2010.10151> and provides a consistent interface for fitting and creating predictions. It's also fully compatible with the 'tidymodels' ecosystem.
Maintained by Christophe Regouby. Last updated 6 months ago.
3.0 match 109 stars 9.00 score 65 scripts
anothersamwilson
ParBayesianOptimization:Parallel Bayesian Optimization of Hyperparameters
Fast, flexible framework for implementing Bayesian optimization of model hyperparameters according to the methods described in Snoek et al. <arXiv:1206.2944>. The package allows the user to run the scoring function in parallel, save intermediary results, and tweak other aspects of the process to fully utilize the computing resources available to the user.
Maintained by Samuel Wilson. Last updated 2 years ago.
bayesian-inference, machine-learning
3.8 match 108 stars 7.19 score 86 scripts
nutterb
pixiedust:Tables so Beautifully Fine-Tuned You Will Believe It's Magic
The introduction of the 'broom' package has made converting model objects into data frames as simple as a single function. While the 'broom' package focuses on providing tidy data frames that can be used in advanced analysis, it deliberately stops short of providing functionality for reporting models in publication-ready tables. 'pixiedust' provides this functionality with a programming interface intended to be similar to 'ggplot2's system of layers with fine tuned control over each cell of the table. Options for output include printing to the console and to the common markdown formats (markdown, HTML, and LaTeX). With a little 'pixiedust' (and happy thoughts) tables can really fly.
Maintained by Benjamin Nutter. Last updated 1 year ago.
3.4 match 180 stars 8.01 score 94 scripts
statnet
ergm:Fit, Simulate and Diagnose Exponential-Family Models for Networks
An integrated set of tools to analyze and simulate networks based on exponential-family random graph models (ERGMs). 'ergm' is a part of the Statnet suite of packages for network analysis. See Hunter, Handcock, Butts, Goodreau, and Morris (2008) <doi:10.18637/jss.v024.i03> and Krivitsky, Hunter, Morris, and Klumb (2023) <doi:10.18637/jss.v105.i06>.
Maintained by Pavel N. Krivitsky. Last updated 5 days ago.
1.8 match 100 stars 15.36 score 1.4k scripts 36 dependents
agorstras
ahaz:Regularization for Semiparametric Additive Hazards Regression
Computationally efficient procedures for regularized estimation with the semiparametric additive hazards regression model.
Maintained by Anders Gorst-Rasmussen. Last updated 4 months ago.
9.9 match 1 star 2.70 score 20 scripts 6 dependents
lbelzile
BMAmevt:Multivariate Extremes: Bayesian Estimation of the Spectral Measure
Toolkit for Bayesian estimation of the dependence structure in multivariate extreme value parametric models, following Sabourin and Naveau (2014) <doi:10.1016/j.csda.2013.04.021> and Sabourin, Naveau and Fougeres (2013) <doi:10.1007/s10687-012-0163-0>.
Maintained by Leo Belzile. Last updated 2 years ago.
6.8 match 3.90 score 16 scripts
rgcca-factory
RGCCA:Regularized and Sparse Generalized Canonical Correlation Analysis for Multiblock Data
Multi-block data analysis concerns the analysis of several sets of variables (blocks) observed on the same group of individuals. The main aims of the RGCCA package are to study the relationships between blocks and to identify subsets of variables of each block which are active in their relationships with the other blocks. This package allows users to (i) run R/SGCCA and related methods, (ii) find the optimal parameters for R/SGCCA, such as regularization parameters (tau or sparsity), (iii) evaluate the stability of the RGCCA results and their significance, (iv) build predictive models from the R/SGCCA, and (v) apply generic print() and plot() functions to all these functionalities.
Maintained by Arthur Tenenhaus. Last updated 8 months ago.
3.5 match 12 stars 7.43 score 74 scripts
r-lib
generics:Common S3 Generics not Provided by Base R Methods Related to Model Fitting
In order to reduce potential package dependencies and conflicts, generics provides a number of commonly used S3 generics.
Maintained by Hadley Wickham. Last updated 1 year ago.
1.9 match 61 stars 14.00 score 131 scripts 9.8k dependents
fcampelo
MOEADr:Component-Wise MOEA/D Implementation
Modular implementation of Multiobjective Evolutionary Algorithms based on Decomposition (MOEA/D) [Zhang and Li (2007), <DOI:10.1109/TEVC.2007.892759>] for quick assembling and testing of new algorithmic components, as well as easy replication of published MOEA/D proposals. The full framework is documented in a paper published in the Journal of Statistical Software [<doi:10.18637/jss.v092.i06>].
Maintained by Felipe Campelo. Last updated 2 years ago.
moead, multiobjective-optimization
4.1 match 20 stars 6.30 score 40 scripts
andrisignorell
ModTools:Building Regression and Classification Models
Consistent user interface to the most common regression and classification algorithms, such as random forest, neural networks, C5 trees and support vector machines, complemented with a handful of auxiliary functions, such as variable importance and a tuning function for the parameters.
Maintained by Andri Signorell. Last updated 2 months ago.
6.1 match 2 stars 4.20 score 3 scripts
yanyachen
FinCovRegularization:Covariance Matrix Estimation and Regularization for Finance
Estimation and regularization for covariance matrix of asset returns. For covariance matrix estimation, three major types of factor models are included: macroeconomic factor model, fundamental factor model and statistical factor model. For covariance matrix regularization, four regularized estimators are included: banding, tapering, hard-thresholding and soft-thresholding. The tuning parameters of these regularized estimators are selected via cross-validation.
Maintained by YaChen Yan. Last updated 8 years ago.
5.7 match 7 stars 4.30 score 19 scripts 1 dependent
mlr-org
mlr3viz:Visualizations for 'mlr3'
Visualization package of the 'mlr3' ecosystem. It features plots for mlr3 objects such as tasks, learners, predictions, benchmark results, tuning instances and filters via the 'autoplot()' generic of 'ggplot2'. The package draws plots with the 'viridis' color palette and applies the minimal theme. Visualizations include barplots, boxplots, histograms, ROC curves, and Precision-Recall curves.
Maintained by Marc Becker. Last updated 4 months ago.
ggplot2, mlr3, visualization, visualizations
2.5 match 45 stars 9.58 score 364 scripts 4 dependents
yunuuuu
ggalign:A 'ggplot2' Extension for Consistent Axis Alignment
A 'ggplot2' extension that offers various tools for the creation of complex, multi-plot visualizations. Built on the familiar grammar of graphics, it provides intuitive tools to align and organize plots, making it ideal for complex visualizations. It excels in multi-omics research (such as genomics and microbiomes) by simplifying the visualization of intricate relationships between datasets, for example, linking genes to pathways. Whether you need to stack plots, arrange them around a central figure, or use a circular layout, 'ggalign' delivers flexibility and accuracy with minimal effort.
Maintained by Yun Peng. Last updated 14 hours ago.
complex-heatmaps, dendrogram, dendrogram-heatmap, ggplot, ggplot-extension, ggplot2, heatmap, heatmap-visualization, heatmaps, marginal-plots, oncoplot, oncoprint, tanglegram, upset, upsetplot
3.3 match 267 stars 7.08 score 27 scripts
flr
mse:Tools for Running Management Strategy Evaluations using FLR
A set of functions and methods to enable the development and running of Management Strategy Evaluation (MSE) analyses, using the FLR packages and classes and the a4a methods and algorithms.
Maintained by Iago Mosqueira. Last updated 20 days ago.
3.3 match 4 stars 7.04 score 137 scripts 3 dependents
aalfons
robmed:(Robust) Mediation Analysis
Perform mediation analysis via the fast-and-robust bootstrap test ROBMED (Alfons, Ates & Groenen, 2022a; <doi:10.1177/1094428121999096>), as well as various other methods. Details on the implementation and code examples can be found in Alfons, Ates, and Groenen (2022b) <doi:10.18637/jss.v103.i13>. Further discussion on robust mediation analysis can be found in Alfons & Schley (2024) <doi:10.31234/osf.io/2hqdy>.
Maintained by Andreas Alfons. Last updated 14 days ago.
3.6 match 6 stars 6.35 score 31 scripts 1 dependent
miriamesteve
eat:Efficiency Analysis Trees
Functions are provided to determine production frontiers and technical efficiency measures through non-parametric techniques based upon regression trees. The package includes code for estimating radial input, output, directional and additive measures, plotting graphical representations of the scores and the production frontiers by means of trees, and determining rankings of importance of input variables in the analysis. Additionally, an adaptation of Random Forest by a set of individual Efficiency Analysis Trees for estimating technical efficiency is also included. More details in: <doi:10.1016/j.eswa.2020.113783>.
Maintained by Miriam Esteve. Last updated 3 years ago.
4.8 match 5 stars 4.68 score 19 scripts
philipppro
tuneRanger:Tune Random Forest of the 'ranger' Package
Tunes a random forest with one line of code. The package is mainly based on the packages 'ranger' and 'mlrMBO'.
Maintained by Philipp Probst. Last updated 12 months ago.
3.5 match 33 stars 6.38 score 121 scripts 1 dependent
tomkellygenetics
leiden:R Implementation of Leiden Clustering Algorithm
Implements the 'Python leidenalg' module to be called in R. Enables clustering using the Leiden algorithm for partitioning a graph into communities. See the 'Python' repository for more details: <https://github.com/vtraag/leidenalg>. Traag et al. (2018) From Louvain to Leiden: guaranteeing well-connected communities. <arXiv:1810.08473>.
Maintained by S. Thomas Kelly. Last updated 10 months ago.
2.5 match 38 stars 8.90 score 180 scripts 3 dependents
edhofman
ReSurv:Machine Learning Models For Predicting Claim Counts
Prediction of claim counts using the feature based development factors introduced in the manuscript <doi:10.48550/arXiv.2312.14549>. Implementation of Neural Networks, Extreme Gradient Boosting, and Cox model with splines to optimise the partial log-likelihood of proportional hazard models.
Maintained by Emil Hofman. Last updated 4 months ago.
3.8 match 2 stars 5.87 score 21 scripts
andyliaw-mrk
randomForest:Breiman and Cutler's Random Forests for Classification and Regression
Classification and regression based on a forest of trees using random inputs, based on Breiman (2001) <DOI:10.1023/A:1010933404324>.
Maintained by Andy Liaw. Last updated 6 months ago.
1.8 match 47 stars 12.11 score 35k scripts 282 dependents
alec-stashevsky
blocklength:Select an Optimal Block-Length to Bootstrap Dependent Data (Block Bootstrap)
A set of functions to select the optimal block-length for a dependent bootstrap (block-bootstrap). Includes the Hall, Horowitz, and Jing (1995) <doi:10.1093/biomet/82.3.561> subsampling-based cross-validation method, the Politis and White (2004) <doi:10.1081/ETC-120028836> Spectral Density Plug-in method, including the Patton, Politis, and White (2009) <doi:10.1080/07474930802459016> correction, and the Lahiri, Furukawa, and Lee (2007) <doi:10.1016/j.stamet.2006.08.002> nonparametric plug-in method, with a corresponding set of S3 plot methods.
Maintained by Alec Stashevsky. Last updated 6 days ago.
block-bootstrap, block-resampling, blocklength, boot, bootstrap, depedent-bootstrap, dependent, horowitz, inference, meboot, politis, resample, stats, time, time-series, time-series-analysis, tseries
4.5 match 4 stars 4.78 score 8 scripts
blasbenito
spatialRF:Easy Spatial Modeling with Random Forest
Automatic generation and selection of spatial predictors for spatial regression with Random Forest. Spatial predictors are surrogates of variables driving the spatial structure of a response variable. The package offers two methods to generate spatial predictors from a distance matrix among training cases: 1) Moran's Eigenvector Maps (MEMs; Dray, Legendre, and Peres-Neto 2006 <DOI:10.1016/j.ecolmodel.2006.02.015>): computed as the eigenvectors of a weighted matrix of distances; 2) RFsp (Hengl et al. <DOI:10.7717/peerj.5518>): columns of the distance matrix used as spatial predictors. Spatial predictors help minimize the spatial autocorrelation of the model residuals and facilitate an honest assessment of the importance scores of the non-spatial predictors. Additionally, functions to reduce multicollinearity, identify relevant variable interactions, tune random forest hyperparameters, assess model transferability via spatial cross-validation, and explore model results via partial dependence curves and interaction surfaces are included in the package. The modelling functions are built around the highly efficient 'ranger' package (Wright and Ziegler 2017 <DOI:10.18637/jss.v077.i01>).
Maintained by Blas M. Benito. Last updated 3 years ago.
random-forest, spatial-analysis, spatial-regression
3.9 match 114 stars 5.45 score 49 scripts
hugogogo
varband:Variable Banding of Large Precision Matrices
Implementation of the variable banding procedure for modeling local dependence and estimating precision matrices that is introduced in Yu & Bien (2016) and is available at <https://arxiv.org/abs/1604.07451>.
Maintained by Guo Yu. Last updated 7 years ago.
5.3 match 2 stars 4.00 score 10 scripts
business-science
timetk:A Tool Kit for Working with Time Series
Easy visualization, wrangling, and feature engineering of time series data for forecasting and machine learning prediction. Consolidates and extends time series functionality from packages including 'dplyr', 'stats', 'xts', 'forecast', 'slider', 'padr', 'recipes', and 'rsample'.
Maintained by Matt Dancho. Last updated 1 year ago.
coercion, coercion-functions, data-mining, dplyr, forecast, forecasting, forecasting-models, machine-learning, series-decomposition, series-signature, tibble, tidy, tidyquant, tidyverse, time, time-series, timeseries
1.5 match 625 stars 14.15 score 4.0k scripts 16 dependents
gesistsa
grafzahl:Supervised Machine Learning for Textual Data Using Transformers and 'Quanteda'
Duct tape the 'quanteda' ecosystem (Benoit et al., 2018) <doi:10.21105/joss.00774> to modern Transformer-based text classification models (Wolf et al., 2020) <doi:10.18653/v1/2020.emnlp-demos.6>, in order to facilitate supervised machine learning for textual data. This package mimics the behavior of 'quanteda.textmodels' and provides a function to set up the 'Python' environment needed to use the pretrained models from 'Hugging Face' <https://huggingface.co/>. More information: <doi:10.5117/CCR2023.1.003.CHAN>.
Maintained by Chung-hong Chan. Last updated 24 days ago.
3.5 match 41 stars 5.91 score 3 scripts
ssnn-airr
shazam:Immunoglobulin Somatic Hypermutation Analysis
Provides a computational framework for analyzing mutations in immunoglobulin (Ig) sequences. Includes methods for Bayesian estimation of antigen-driven selection pressure, mutational load quantification, building of somatic hypermutation (SHM) models, and model-dependent distance calculations. Also includes empirically derived models of SHM for both mice and humans. Citations: Gupta and Vander Heiden et al. (2015) <doi:10.1093/bioinformatics/btv359>, Yaari et al. (2012) <doi:10.1093/nar/gks457>, Yaari et al. (2013) <doi:10.3389/fimmu.2013.00358>, Cui et al. (2016) <doi:10.4049/jimmunol.1502263>.
Maintained by Susanna Marquez. Last updated 2 months ago.
2.8 match 7.43 score 222 scripts 2 dependents
cran
catalytic:Tools for Applying Catalytic Priors in Statistical Modeling
To improve estimation accuracy and stability in statistical modeling, catalytic prior distributions are employed, integrating observed data with synthetic data generated from a simpler model's predictive distribution. This approach enhances model robustness, stability, and flexibility in complex data scenarios. Catalytic prior distributions are described in Huang et al. (2020, <doi:10.1073/pnas.1920913117>) and Li and Huang (2023, <doi:10.48550/arXiv.2312.01411>).
Maintained by Dongming Huang. Last updated 3 months ago.
6.6 match 3.18 score
mikkelvembye
AIscreenR:AI Screening Tools in R for Systematic Reviewing
Provides functions to conduct title and abstract screening in systematic reviews using large language models, such as the Generative Pre-trained Transformer (GPT) models from 'OpenAI' <https://platform.openai.com/>. These functions can enhance the quality of title and abstract screenings while reducing the total screening time significantly. In addition, the package includes tools for quality assessment of title and abstract screenings, as described in Vembye, Christensen, Mølgaard, and Schytt (2024) <DOI:10.31219/osf.io/yrhzm>.
Maintained by Mikkel H. Vembye. Last updated 2 months ago.
gpt, openai, screening, systematic-review
3.4 match 10 stars 6.11 score 7 scripts
xiaooupan
FarmTest:Factor-Adjusted Robust Multiple Testing
Performs robust multiple testing for means in the presence of known and unknown latent factors, as presented in Fan et al. (2019) "FarmTest: Factor-Adjusted Robust Multiple Testing With Approximate False Discovery Control" <doi:10.1080/01621459.2018.1527700>. Implements a series of adaptive Huber methods combined with fast data-driven tuning schemes proposed in Ke et al. (2019) "User-Friendly Covariance Estimation for Heavy-Tailed Distributions" <doi:10.1214/19-STS711> to estimate model parameters and construct test statistics that are robust against heavy-tailed and/or asymmetric error distributions. Extensions to two-sample simultaneous mean comparison problems are also included. As by-products, this package contains functions that compute adaptive Huber mean, covariance and regression estimators that are of independent interest.
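A minimal Python sketch of a Huber M-estimator of the mean, computed by iteratively reweighted averaging. It assumes a fixed, user-supplied robustification parameter `tau`. The data-driven tuning of `tau` that makes the package's estimators adaptive is not reproduced, and `huber_mean` is a hypothetical name.

```python
def huber_mean(xs, tau, tol=1e-10, max_iter=500):
    """Huber M-estimate of location for a fixed robustification parameter tau.

    Each iteration is a weighted average: observations within tau of the
    current estimate get weight 1, and those farther away get weight
    tau / |x - mu| < 1, so outliers are downweighted.
    """
    if tau <= 0:
        raise ValueError("tau must be positive")
    mu = sorted(xs)[len(xs) // 2]  # start from the (upper) median
    for _ in range(max_iter):
        weights = [1.0 if abs(x - mu) <= tau else tau / abs(x - mu) for x in xs]
        new_mu = sum(w * x for w, x in zip(weights, xs)) / sum(weights)
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu
```

With a very large `tau` every weight is 1 and the result is the sample mean. As `tau` shrinks, distant observations are increasingly downweighted and the estimate moves toward the bulk of the data.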
Maintained by Xiaoou Pan. Last updated 4 years ago.
5.9 match 4 stars 3.48 score 15 scripts
tidymodels
workflowsets:Create a Collection of 'tidymodels' Workflows
A workflow is a combination of a model and preprocessors (e.g., a formula or recipe) (Kuhn and Silge (2021) <https://www.tmwr.org/>). To try different combinations of these, an object can be created that contains many workflows. There are functions to create workflows en masse as well as to train them and visualize the results.
Maintained by Simon Couch. Last updated 5 months ago.
1.7 match 93 stars 12.21 score 294 scripts 19 dependents
zachmayer
caretEnsemble:Ensembles of Caret Models
Functions for creating ensembles of caret models: caretList() and caretStack(). caretList() is a convenience function for fitting multiple caret::train() models to the same dataset. caretStack() will make linear or non-linear combinations of these models, using a caret::train() model as a meta-model.
Maintained by Zachary A. Deane-Mayer. Last updated 3 months ago.
1.7 match 226 stars 11.92 score 780 scripts 1 dependent
kaz-yos
tableone:Create 'Table 1' to Describe Baseline Characteristics with or without Propensity Score Weights
Creates 'Table 1', i.e., a description of baseline patient characteristics, which is an essential element of virtually all medical research. Supports both continuous and categorical variables, as well as p-values and standardized mean differences. Weighted data are supported via the 'survey' package.
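For a continuous variable, one common definition of the standardized mean difference divides the difference in group means by the square root of the average of the two group variances. The Python sketch below assumes that definition and unweighted data. Categorical variables and survey weights need different formulas, and the function name is hypothetical.

```python
from statistics import mean, variance


def standardized_mean_difference(group_a, group_b):
    """SMD for a continuous variable: difference in means divided by the
    square root of the average of the two sample variances."""
    pooled_sd = ((variance(group_a) + variance(group_b)) / 2) ** 0.5
    if pooled_sd == 0:
        raise ValueError("both groups have zero variance")
    return (mean(group_a) - mean(group_b)) / pooled_sd
```

Unlike a p-value, this quantity does not grow with sample size, which is why it is often preferred for describing baseline balance.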
Maintained by Kazuki Yoshida. Last updated 3 years ago.
baseline-characteristics, descriptive-statistics, statistics
1.5 match 221 stars 13.55 score 2.3k scripts 12 dependents
r-lum
Luminescence:Comprehensive Luminescence Dating Data Analysis
A collection of R functions for luminescence dating data analysis. This includes, amongst others, data import and export, application of age models, curve deconvolution, sequence analysis and plotting of equivalent dose distributions.
Maintained by Sebastian Kreutzer. Last updated 14 hours ago.
bayesian-statistics, data-science, geochronology, luminescence, luminescence-dating, open-science, osl, plotting, radiofluorescence, tl, xsyg, cpp
1.9 match 15 stars 10.77 score 178 scripts 8 dependents
aleksandarsekulic
meteo:RFSI & STRK Interpolation for Meteo and Environmental Variables
Random Forest Spatial Interpolation (RFSI, Sekulić et al. (2020) <doi:10.3390/rs12101687>) and spatio-temporal geostatistical (spatio-temporal regression Kriging (STRK)) interpolation for meteorological (Kilibarda et al. (2014) <doi:10.1002/2013JD020803>, Sekulić et al. (2020) <doi:10.1007/s00704-019-03077-3>) and other environmental variables. Contains global spatio-temporal models calculated using publicly available data.
Maintained by Aleksandar Sekulić. Last updated 5 months ago.
4.0 match 18 stars 5.06 score 64 scripts
albertofranzin
bnstruct:Bayesian Network Structure Learning from Data with Missing Values
Bayesian Network Structure Learning from Data with Missing Values. The package implements the Silander-Myllymaki complete search, the Max-Min Parents-and-Children, Hill-Climbing and Max-Min Hill-Climbing heuristic searches, and the Structural Expectation-Maximization algorithm. Available scoring functions are BDeu, AIC and BIC. The package also implements methods for generating and using bootstrap samples and imputed data, and for inference.
Maintained by Alberto Franzin. Last updated 1 year ago.
3.7 match 1 star 5.40 score 111 scripts 3 dependents
harveyklyne
drape:Doubly Robust Average Partial Effects
Doubly robust average partial effect estimation. This implementation contains methods for adding smoothness to plug-in regression procedures and for estimating score functions using smoothing splines. Details of the method can be found in Harvey Klyne and Rajen D. Shah (2023) <doi:10.48550/arXiv.2308.09207>.
Maintained by Harvey Klyne. Last updated 4 months ago.
4.9 match 2 stars 4.00 score 4 scripts
qile0317
APackOfTheClones:Visualization of Clonal Expansion for Single Cell Immune Profiles
Visualize clonal expansion via circle-packing. 'APackOfTheClones' extends 'scRepertoire' to produce a publication-ready visualization of clonal expansion at single-cell resolution, by representing expanded clones as differently sized circles. The method was originally implemented by Murray Christian and Ben Murrell in the following immunology study: Ma et al. (2021) <doi:10.1126/sciimmunol.abg6356>.
Maintained by Qile Yang. Last updated 4 months ago.
clonal-analysis, immune-repertoire, immune-system, scrna-seq, scrnaseq, seurat, single-cell, single-cell-genomics, cpp
3.0 match 15 stars 6.45 score 15 scripts
mstrimas
colorist:Coloring Wildlife Distributions in Space-Time
Color and visualize wildlife distributions in space-time using raster data. In addition to enabling display of sequential change in distributions through the use of small multiples, 'colorist' provides functions for extracting several features of interest from a sequence of distributions and for visualizing those features using HCL (hue-chroma-luminance) color palettes. Resulting maps allow for "fair" visual comparison of intensity values (e.g., occurrence, abundance, or density) across space and time and can be used to address questions about where, when, and how consistently a species, group, or individual is likely to be found.
Maintained by Matthew Strimas-Mackey. Last updated 11 months ago.
3.3 match 14 stars 5.60 score 19 scripts
mannau
boilerpipeR:Interface to the Boilerpipe Java Library
Generic extraction of main text content from HTML files, with removal of ads, sidebars and headers, using the boilerpipe <https://github.com/kohlschutter/boilerpipe> Java library. The extraction heuristics from boilerpipe perform robustly across a wide range of website templates.
Maintained by Mario Annau. Last updated 4 years ago.
3.4 match 22 stars 5.52 score 30 scripts
topepo
Cubist:Rule- And Instance-Based Regression Modeling
Regression modeling using rules with added instance-based corrections.
Maintained by Max Kuhn. Last updated 9 months ago.
1.5 match 40 stars 12.38 score 2.8k scripts 18 dependents
r-forge
tramnet:Penalized Transformation Models
Partially penalized versions of specific transformation models implemented in package 'mlt'. Available models include a fully parametric version of the Cox model, other parametric survival models (Weibull, etc.), models for binary and ordered categorical variables, normal and transformed-normal (Box-Cox type) linear models, and continuous outcome logistic regression. Hyperparameter tuning is facilitated through model-based optimization functionalities from package 'mlr3MBO'. The methodology is described in Kook et al. (2021) <doi:10.32614/RJ-2021-054>. Transformation models and model-based optimization are described in Hothorn et al. (2019) <doi:10.1111/sjos.12291> and Bischl et al. (2016) <arxiv:1703.03373>, respectively.
Maintained by Lucas Kook. Last updated 3 days ago.
4.5 match 4.12 score 2 scripts
cran
npreg:Nonparametric Regression via Smoothing Splines
Multiple and generalized nonparametric regression using smoothing spline ANOVA models and generalized additive models, as described in Helwig (2020) <doi:10.4135/9781526421036885885>. Includes support for Gaussian and non-Gaussian responses, smoothers for multiple types of predictors (including random intercepts), interactions between smoothers of mixed types, eight different methods for smoothing parameter selection, and flexible tools for diagnostics, inference, and prediction.
Maintained by Nathaniel E. Helwig. Last updated 12 months ago.
18.3 match 1.00 score
mfasiolo
qgam:Smooth Additive Quantile Regression Models
Smooth additive quantile regression models, fitted using the methods of Fasiolo et al. (2020) <doi:10.1080/01621459.2020.1725521>. See Fasiolo et al. (2021) <doi:10.18637/jss.v100.i09> for an introduction to the package. Unlike in 'quantreg', the smoothing parameters are estimated automatically by marginal loss minimization, while the regression coefficients are estimated using either PIRLS or Newton's algorithm. The learning rate is determined so that the Bayesian credible intervals of the estimated effects have approximately the correct coverage. The main function is qgam(), which is similar to gam() in 'mgcv' but fits non-parametric quantile regression models.
Maintained by Matteo Fasiolo. Last updated 4 days ago.
1.8 match 33 stars 10.13 score 133 scripts 15 dependents
jolars
SLOPE:Sorted L1 Penalized Estimation
Efficient implementations for Sorted L-One Penalized Estimation (SLOPE): generalized linear models regularized with the sorted L1-norm (Bogdan et al. 2015). Supported models include ordinary least-squares regression, binomial regression, multinomial regression, and Poisson regression. Both dense and sparse predictor matrices are supported. In addition, the package features predictor screening rules that enable fast and efficient solutions to high-dimensional problems.
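The sorted L1 norm assigns the j-th largest weight in a non-increasing sequence lambda_1 >= lambda_2 >= ... >= lambda_p >= 0 to the j-th largest absolute coefficient. Below is a minimal Python sketch of the penalty itself. It is not the package's solver, and the function name is hypothetical.

```python
def sorted_l1_norm(beta, lambdas):
    """Sorted L1 norm: sum_j lambdas[j] * |beta|_(j), where |beta|_(1) >= |beta|_(2) >= ...

    lambdas must be non-negative and non-increasing, so the largest
    coefficient in absolute value receives the largest weight.
    """
    if len(beta) != len(lambdas):
        raise ValueError("beta and lambdas must have the same length")
    if any(a < b for a, b in zip(lambdas, lambdas[1:])) or lambdas[-1] < 0:
        raise ValueError("lambdas must be non-negative and non-increasing")
    abs_sorted = sorted((abs(b) for b in beta), reverse=True)
    return sum(lam * b for lam, b in zip(lambdas, abs_sorted))
```

When all weights are equal, the penalty reduces to the ordinary lasso (L1) penalty.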
Maintained by Johan Larsson. Last updated 17 hours ago.
generalized-linear-models, slope, sparse-regression, cpp, openmp
1.9 match 17 stars 9.62 score 75 scripts 3 dependents
dcomtois
summarytools:Tools to Quickly and Neatly Summarize Data
Data frame summaries, cross-tabulations, weight-enabled frequency tables and common descriptive (univariate) statistics in concise tables available in a variety of formats (plain ASCII, Markdown and HTML). A good point-of-entry for exploring data, both for experienced and new R users.
Maintained by Dominic Comtois. Last updated 1 day ago.
descriptive-statistics, frequency-table, html-report, markdown, pander, pandoc, pandoc-markdown, rmarkdown, rstudio
1.2 match 526 stars 14.52 score 2.9k scripts 6 dependents
hjeglinton
riskscores:Optimized Integer Risk Score Models
Implements an optimized approach to learning risk score models, where sparsity and integer constraints are integrated into the model-fitting process.
Maintained by Hannah Eglinton. Last updated 5 months ago.
3.5 match 1 star 4.95 score 7 scripts
blue-matter
MSEtool:Management Strategy Evaluation Toolkit
Development, simulation testing, and implementation of management procedures for fisheries (see Carruthers & Hordyk (2018) <doi:10.1111/2041-210X.13081>).
Maintained by Adrian Hordyk. Last updated 24 days ago.
2.3 match 8 stars 7.69 score 163 scripts 3 dependents
sperfu
findGSEP:Estimate Genome Size of Polyploid Species Using k-Mer Frequencies
Provides tools to estimate the genome size of polyploid species using k-mer frequencies. This package includes functions to process k-mer frequency data and perform genome size estimation by fitting k-mer frequencies with a normal distribution model. It supports handling of complex polyploid genomes and offers various options for customizing the estimation process. The basic method 'findGSE' is detailed in Sun, Hequan, et al. (2018) <doi:10.1093/bioinformatics/btx637>.
Maintained by Laiyi Fu. Last updated 8 months ago.
3.5 match 3 stars 4.88 score 1 script
cezarykuran
oaii:'OpenAI' API R Interface
A comprehensive set of helpers that streamline data transmission and processing, making it effortless to interact with the 'OpenAI' API.
Maintained by Cezary Kuran. Last updated 1 year ago.
16.9 match 1.00 score 1 script
jillbo1000
EZtune:Tunes AdaBoost, Elastic Net, Support Vector Machines, and Gradient Boosting Machines
Contains two functions that are intended to make tuning supervised learning methods easy. The eztune function uses a genetic algorithm or Hooke-Jeeves optimizer to find the best set of tuning parameters. The user can choose the optimizer, the learning method, and whether optimization will be based on accuracy obtained through validation error, cross-validation, or resubstitution. The function eztune_cv computes a cross-validated error rate. Its purpose is to provide a cross-validated accuracy or MSE when resubstitution or validation data are used for optimization, because error measures from both approaches can be misleading.
Maintained by Jill Lundell. Last updated 3 years ago.
3.5 match 4.76 score 38 scripts 1 dependent
jrodu
qqboxplot:Implementation of the Q-Q Boxplot
A system to implement the Q-Q boxplot. It is implemented as an extension to 'ggplot2'. The Q-Q boxplot is an amalgam of the boxplot and the Q-Q plot and allows the user to rapidly examine summary statistics and tail behavior for multiple distributions in the same pane. As an extension of the 'ggplot2' implementation of the boxplot, possible modifications to the boxplot extend to the Q-Q boxplot.
Maintained by Jordan Rodu. Last updated 2 years ago.
3.5 match 2 stars 4.76 score 29 scripts
mrkellermann
eiPack:Ecological Inference and Higher-Dimension Data Management
Provides methods for analyzing R by C ecological contingency tables using the extreme case analysis, ecological regression, and Multinomial-Dirichlet ecological inference models. Also provides tools for manipulating higher-dimension data objects.
Maintained by Michael Kellermann. Last updated 2 years ago.
8.6 match 1.92 score 28 scripts 1 dependent
mlr-org
mlr3fselect:Feature Selection for 'mlr3'
Feature selection package of the 'mlr3' ecosystem. It selects the optimal feature set for any 'mlr3' learner. The package works with several optimization algorithms e.g. Random Search, Recursive Feature Elimination, and Genetic Search. Moreover, it can automatically optimize learners and estimate the performance of optimized feature sets with nested resampling.
Maintained by Marc Becker. Last updated 2 months ago.
evolutionary-algorithms, exhaustive-search, feature-selection, machine-learning, mlr3, optimization, random-search, recursive-feature-elimination, sequential-feature-selection
2.0 match 23 stars 8.25 score 70 scripts 2 dependents
computationalstylistics
stylo:Stylometric Multivariate Analyses
Supervised and unsupervised multivariate methods, supplemented by a GUI and some visualizations, to perform various analyses in the field of computational stylistics, authorship attribution, etc. For further reference, see Eder et al. (2016), <https://journal.r-project.org/archive/2016/RJ-2016-007/index.html>. You are also encouraged to visit the Computational Stylistics Group's website <https://computationalstylistics.github.io/>, where a reasonable amount of information about the package and related projects is provided.
Maintained by Maciej Eder. Last updated 2 months ago.
1.9 match 186 stars 8.59 score 462 scripts
bioc
MassSpecWavelet:Peak Detection for Mass Spectrometry data using wavelet-based algorithms
Peak detection in mass spectrometry data is an important preprocessing step. The performance of peak detection affects subsequent processes, including protein identification, profile alignment and biomarker identification. Using the Continuous Wavelet Transform (CWT), this package provides a reliable algorithm for peak detection that does not require any type of smoothing or previous baseline correction, providing more consistent results for different spectra. See <doi:10.1093/bioinformatics/btl355> for further details.
Maintained by Sergio Oller Moreno. Last updated 3 months ago.
immunooncology, massspectrometry, proteomics, peakdetection
1.7 match 9 stars 9.38 score 37 scripts 17 dependents
cran
sgPLS:Sparse Group Partial Least Square Methods
Regularized versions of partial least squares approaches, providing sparse, group, and sparse group versions of partial least squares regression models (Liquet, B., Lafaye de Micheaux, P., Hejblum, B., Thiebaut, R. (2016) <doi:10.1093/bioinformatics/btv535>). A version of PLS discriminant analysis is also provided.
Maintained by Benoit Liquet. Last updated 1 year ago.
10.9 match 1 star 1.48 score 7 scripts
mariaguilleng
boostingDEA:A Boosting Approach to Data Envelopment Analysis
Includes functions to estimate production frontiers and make ideal output predictions in the Data Envelopment Analysis (DEA) context using both standard models from DEA and Free Disposal Hull (FDH) and boosting techniques. In particular, EATBoosting (Guillen et al., 2023 <doi:10.1016/j.eswa.2022.119134>) and MARSBoosting. Moreover, the package includes code for estimating several technical efficiency measures using different models such as the input and output-oriented radial measures, the input and output-oriented Russell measures, the Directional Distance Function (DDF), the Weighted Additive Measure (WAM) and the Slacks-Based Measure (SBM).
Maintained by Maria D. Guillen. Last updated 2 years ago.
4.0 match 2 stars 4.00 score 3 scripts
wenjie2wang
abclass:Angle-Based Large-Margin Classifiers
Multi-category angle-based large-margin classifiers. See Zhang and Liu (2014) <doi:10.1093/biomet/asu017> for details.
Maintained by Wenjie Wang. Last updated 1 year ago.
5.3 match 2 stars 3.04 score 11 scripts
yixuan
recosystem:Recommender System using Matrix Factorization
R wrapper of the 'libmf' library <https://www.csie.ntu.edu.tw/~cjlin/libmf/> for recommender system using matrix factorization. It is typically used to approximate an incomplete matrix using the product of two matrices in a latent space. Other common names for this task include "collaborative filtering", "matrix completion", "matrix recovery", etc. High performance multi-core parallel computing is supported in this package.
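As a sketch of the underlying idea only, the Python code below fits user factors P and item factors Q by stochastic gradient descent on the observed entries, with L2 regularization, so that the product P Q^T approximates the incomplete matrix. It is plain single-threaded SGD, not libmf's parallel solver. All function names and default values are illustrative assumptions.

```python
import random


def factorize(ratings, n_users, n_items, k=2, lr=0.05, reg=0.02, epochs=200, seed=0):
    """Approximate an incomplete matrix R by P @ Q^T using SGD on observed entries only.

    `ratings` is a list of (user, item, value) triples; missing entries are
    simply absent and never contribute to the loss.
    """
    rng = random.Random(seed)
    P = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]
    data = list(ratings)
    for _ in range(epochs):
        rng.shuffle(data)
        for u, i, r in data:
            err = r - sum(P[u][f] * Q[i][f] for f in range(k))
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on squared error plus L2 penalty on both factors.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


def predict(P, Q, u, i):
    """Predicted value for (user u, item i), including unobserved cells."""
    return sum(p * q for p, q in zip(P[u], Q[i]))
```

After fitting, `predict` fills in the cells that were never observed. This is the matrix-completion use case the description refers to.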
Maintained by Yixuan Qiu. Last updated 2 years ago.
matrix-factorization, recommender-system, cpp, openmp
2.0 match 84 stars 7.97 score 101 scripts 6 dependents
duckmayr
bggum:Bayesian Estimation of Generalized Graded Unfolding Model Parameters
Provides a Metropolis-coupled Markov chain Monte Carlo sampler, post-processing and parameter estimation functions, and plotting utilities for the generalized graded unfolding model of Roberts, Donoghue, and Laughlin (2000) <doi:10.1177/01466216000241001>.
Maintained by JBrandon Duck-Mayr. Last updated 5 years ago.
3.3 match 4 stars 4.78 score 6 scripts
myles-lewis
nestedcv:Nested Cross-Validation with 'glmnet' and 'caret'
Implements nested k*l-fold cross-validation for lasso and elastic-net regularised linear models via the 'glmnet' package and other machine learning models via the 'caret' package <doi:10.1093/bioadv/vbad048>. Cross-validation of 'glmnet' alpha mixing parameter and embedded fast filter functions for feature selection are provided. Described as double cross-validation by Stone (1977) <doi:10.1111/j.2517-6161.1977.tb01603.x>. Also implemented is a method using outer CV to measure unbiased model performance metrics when fitting Bayesian linear and logistic regression shrinkage models using the horseshoe prior over parameters to encourage a sparse model as described by Piironen & Vehtari (2017) <doi:10.1214/17-EJS1337SI>.
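A minimal Python sketch of the index bookkeeping behind nested k*l-fold cross-validation. Each outer training set is split again into l inner folds for tuning, and the outer test fold is used only for the final performance estimate. This is a generic illustration, not the package's interface, and the function names are hypothetical.

```python
import random


def kfold_indices(n, k, rng):
    """Shuffle 0..n-1 and split into k nearly equal folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[f::k] for f in range(k)]


def nested_cv_splits(n, k_outer=5, l_inner=3, seed=0):
    """Yield (outer_train, outer_test, inner_splits) for nested k*l-fold CV.

    Inner (fit, validation) splits are drawn only from outer_train, so the
    outer test fold never influences tuning or model selection.
    """
    rng = random.Random(seed)
    for test in kfold_indices(n, k_outer, rng):
        test_set = set(test)
        train = [i for i in range(n) if i not in test_set]
        inner = []
        for inner_fold in kfold_indices(len(train), l_inner, rng):
            val = [train[j] for j in inner_fold]
            val_set = set(val)
            fit = [i for i in train if i not in val_set]
            inner.append((fit, val))
        yield train, test, inner
```

Hyperparameters such as the glmnet alpha would be chosen using only the inner splits. The model would then be refit on the outer training set and scored once on its outer test fold.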
Maintained by Myles Lewis. Last updated 4 days ago.
2.0 match 12 stars 7.92 score 46 scripts
cran
grf:Generalized Random Forests
Forest-based statistical estimation and inference. GRF provides non-parametric methods for heterogeneous treatment effects estimation (optionally using right-censored outcomes, multiple treatment arms or outcomes, or instrumental variables), as well as least-squares regression, quantile regression, and survival regression, all with support for missing covariates.
Maintained by Erik Sverdrup. Last updated 4 months ago.
2.7 match 5.83 score 1.2k scripts 14 dependents
mikejohnson51
AHGestimation:An R package for Computing Robust, Mass Preserving Hydraulic Geometries and Rating Curves
Compute mass preserving 'At a station Hydraulic Geometry' (AHG) fits from river measurements.
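In the standard at-a-station hydraulic geometry formulation, width w, mean depth d and mean velocity v are power functions of discharge Q. Continuity requires Q = w d v, and for this to hold at every discharge the coefficients and exponents must satisfy the constraints below. This is presumably what "mass preserving" refers to, though the package's fitting procedure is not shown here.

```latex
\begin{aligned}
w &= a\,Q^{b}, \qquad d = c\,Q^{f}, \qquad v = k\,Q^{m} \\
Q &= w\,d\,v = (a\,c\,k)\,Q^{\,b+f+m} \\
&\Longrightarrow\; a\,c\,k = 1, \qquad b + f + m = 1
\end{aligned}
```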
Maintained by Mike Johnson. Last updated 3 months ago.
3.1 match 6 stars 5.02 score 10 scripts
bioc
nethet:A bioconductor package for high-dimensional exploration of biological network heterogeneity
Package nethet implements statistically sound methodology for analyzing network heterogeneity in high-dimensional data. It combines several implementations of recent statistical innovations useful for estimation and comparison of networks in a heterogeneous, high-dimensional setting. In particular, we provide code for formal two-sample testing in Gaussian graphical models (differential network and GGM-GSA; Stadler and Mukherjee, 2013, 2014) and make a novel network-based clustering algorithm available (mixed graphical lasso, Stadler and Mukherjee, 2013).
Maintained by Nicolas Staedler. Last updated 5 months ago.
3.6 match 4.30 score 7 scripts
fabsig
gpboost:Combining Tree-Boosting with Gaussian Process and Mixed Effects Models
An R package that allows for combining tree-boosting with Gaussian process and mixed effects models. It also allows for independently doing tree-boosting as well as inference and prediction for Gaussian process and mixed effects models. See <https://github.com/fabsig/GPBoost> for more information on the software and Sigrist (2022, JMLR) <https://www.jmlr.org/papers/v23/20-322.html> and Sigrist (2023, TPAMI) <doi:10.1109/TPAMI.2022.3168152> for more information on the methodology.
Maintained by Fabio Sigrist. Last updated 24 days ago.
3.7 match 4.20 score 212 scripts
pecanproject
PEcAn.emulator:Gaussian Process Emulator
Implementation of a Gaussian Process model (both likelihood and Bayesian approaches) for kriging and model emulation. Includes functions for sampling design and prediction.
Maintained by Mike Dietze. Last updated 13 hours ago.
bayesian, cyberinfrastructure, data-assimilation, data-science, ecosystem-model, ecosystem-science, forecasting, meta-analysis, national-science-foundation, pecan, plants
1.8 match 216 stars 8.82 score 1 script 6 dependents
tomkellygenetics
vioplot:Violin Plot
A violin plot is a combination of a box plot and a kernel density plot. This package allows extensive customisation of violin plots.
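A minimal Python sketch of the two ingredients a violin combines: box-plot quartiles, and a Gaussian kernel density estimate evaluated on a grid whose values give the violin's half-width on each side of the axis. The bandwidth is a caller-supplied assumption, and the quartiles use the default rule of Python's `statistics.quantiles`, which may differ from what vioplot uses. Function names are hypothetical.

```python
import math
from statistics import quantiles


def gaussian_kde(data, bandwidth):
    """Return f(x) = 1/(n*h) * sum_i phi((x - x_i) / h) with a standard normal kernel phi."""
    n = len(data)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data)

    return density


def violin_summary(data, bandwidth, grid_points=50):
    """Box-plot quartiles plus a KDE on an even grid from min(data) to max(data).

    Drawing the density mirrored on both sides of the axis gives the violin
    outline; the quartiles give the embedded box plot.
    """
    q1, q2, q3 = quantiles(data, n=4)
    lo, hi = min(data), max(data)
    f = gaussian_kde(data, bandwidth)
    step = (hi - lo) / (grid_points - 1)
    grid = [lo + j * step for j in range(grid_points)]
    return {"quartiles": (q1, q2, q3), "grid": grid, "half_width": [f(x) for x in grid]}
```

A smaller bandwidth shows more modes in the outline, and a larger one smooths them away. The box-plot part does not depend on the bandwidth at all.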
Maintained by S. Thomas Kelly. Last updated 19 days ago.
boxplot, colours, customisation, dataviz, formula, plotting, violin-plot, violinplot, vioplot
1.3 match 26 stars 12.32 score 2.0k scripts 8 dependents
dcnorris
DTAT:Dose Titration Algorithm Tuning
Dose Titration Algorithm Tuning (DTAT) is a methodologic framework allowing dose individualization to be conceived as a continuous learning process that begins in early-phase clinical trials and continues throughout drug development, on into clinical practice. This package includes code that researchers may use to reproduce or extend key results of the DTAT research programme, plus tools for trialists to design and simulate a '3+3/PC' dose-finding study. Please see Norris (2017a) <doi:10.12688/f1000research.10624.3> and Norris (2017c) <doi:10.1101/240846>.
Maintained by David C. Norris. Last updated 10 months ago.
5.3 match 2.90 score 20 scripts
jackmwolf
tehtuner:Fit and Tune Models to Detect Treatment Effect Heterogeneity
Implements methods to fit Virtual Twins models (Foster et al. (2011) <doi:10.1002/sim.4322>) for identifying subgroups with differential effects in the context of clinical trials while controlling the probability of falsely detecting a differential effect when the conditional average treatment effect is uniform across the study population using parameter selection methods proposed in Wolf et al. (2022) <doi:10.1177/17407745221095855>.
Maintained by Jack Wolf. Last updated 2 years ago.
clinical-trials, heterogeneity-of-treatment-effect, subgroup-identification
4.7 match 4 stars 3.30 score 6 scripts
bioc
BayesSpace:Clustering and Resolution Enhancement of Spatial Transcriptomes
Tools for clustering and enhancing the resolution of spatial gene expression experiments. BayesSpace clusters a low-dimensional representation of the gene expression matrix, incorporating a spatial prior to encourage neighboring spots to cluster together. The method can enhance the resolution of the low-dimensional representation into "sub-spots", for which features such as gene expression or cell type composition can be imputed.
Maintained by Matt Stone. Last updated 5 months ago.
software, clustering, transcriptomics, geneexpression, singlecell, immunooncology, dataimport, openblas, cpp, openmp
1.7 match 123 stars 8.89 score 278 scripts 1 dependent
ramikrispin
TSstudio:Functions for Time Series Analysis and Forecasting
Provides a set of tools for descriptive and predictive analysis of time series data. These include functions for interactive visualization of time series objects as well as utility functions for automating time series forecasting.
Maintained by Rami Krispin. Last updated 2 years ago.
forecasting, time-series, timeseries, tsstudio, visualization
1.7 match 425 stars 9.02 score 656 scripts
richarddmorey
BayesFactor:Computation of Bayes Factors for Common Designs
A suite of functions for computing various Bayes factors for simple designs, including contingency tables, one- and two-sample designs, one-way designs, general ANOVA designs, and linear regression.
Maintained by Richard D. Morey. Last updated 1 year ago.
1.1 match 133 stars 13.70 score 1.7k scripts 21 dependents
zhuwang46
mpath:Regularized Linear Models
Algorithms compute robust estimators for loss functions in the concave convex (CC) family by the iteratively reweighted convex optimization (IRCO), an extension of the iteratively reweighted least squares (IRLS). The IRCO reduces the weight of the observation that leads to a large loss; it also provides weights to help identify outliers. Applications include robust (penalized) generalized linear models and robust support vector machines. The package also contains penalized Poisson, negative binomial, zero-inflated Poisson, zero-inflated negative binomial regression models and robust models with non-convex loss functions. Wang et al. (2014) <doi:10.1002/sim.6314>, Wang et al. (2015) <doi:10.1002/bimj.201400143>, Wang et al. (2016) <doi:10.1177/0962280214530608>, Wang (2021) <doi:10.1007/s11749-021-00770-2>, Wang (2020) <arXiv:2010.02848>.
Maintained by Zhu Wang. Last updated 3 years ago.
2.3 match 1 star 6.67 score 131 scripts 4 dependents
wlandau
crew:A Distributed Worker Launcher Framework
In computationally demanding analysis projects, statisticians and data scientists asynchronously deploy long-running tasks to distributed systems, ranging from traditional clusters to cloud services. The 'NNG'-powered 'mirai' R package by Gao (2023) <doi:10.5281/zenodo.7912722> is a sleek and sophisticated scheduler that efficiently processes these intense workloads. The 'crew' package extends 'mirai' with a unifying interface for third-party worker launchers. Inspiration also comes from the packages 'future' by Bengtsson (2021) <doi:10.32614/RJ-2021-048>, 'rrq' by FitzJohn and Ashton (2023) <https://github.com/mrc-ide/rrq>, 'clustermq' by Schubert (2019) <doi:10.1093/bioinformatics/btz284>, and 'batchtools' by Lang, Bischl, and Surmann (2017) <doi:10.21105/joss.00135>.
Maintained by William Michael Landau. Last updated 11 hours ago.
1.3 match 136 stars 11.19 score 243 scripts 2 dependents
evolecolgroup
tidysdm:Species Distribution Models with Tidymodels
Fit species distribution models (SDMs) using the 'tidymodels' framework, which provides a standardised interface to define models and process their outputs. 'tidysdm' expands 'tidymodels' by providing methods for spatial objects, models and metrics specific to SDMs, as well as a number of specialised functions to process occurrences for contemporary and palaeo datasets. The full functionalities of the package are described in Leonardi et al. (2023) <doi:10.1101/2023.07.24.550358>.
Maintained by Andrea Manica. Last updated 8 days ago.
species-distribution-modelling, tidymodels
1.7 match 31 stars 8.82 score 51 scripts
bips-hb
CVN:Covariate-Varying Networks
Inferring high-dimensional Gaussian graphical networks that change with multiple discrete covariates. Louis Dijkstra, Arne Godt, Ronja Foraita (2024) <arXiv:2407.19978>.
Maintained by Ronja Foraita. Last updated 1 month ago.
graphical-models, high-dimensional-statistics, network-analysis, cpp
4.0 match 3.70 score 7 scripts
jonathancornelissen
highfrequency:Tools for Highfrequency Data Analysis
Provides functionality to manage, clean, and match high-frequency trades and quotes data, calculate various liquidity measures, estimate and forecast volatility, detect price jumps, and investigate microstructure noise and intraday periodicity. A detailed vignette can be found in the paper "Analyzing Intraday Financial Data in R: The highfrequency Package" by Boudt, Kleen, and Sjoerup (2022, <doi:10.18637/jss.v104.i08>). The DOI in the CITATION is for a new Journal of Statistical Software publication that will be registered after publication on CRAN. A working paper version can be found on SSRN: <doi:10.2139/ssrn.3917548>.
Maintained by Kris Boudt. Last updated 2 years ago.
2.0 match 152 stars 7.37 score 286 scripts
b-thi
FuncNN:Functional Neural Networks
A collection of functions that fit functional neural network models. The package allows users to build deep learning models that have either functional or scalar responses paired with functional and scalar covariates. We implement the theoretical discussion in Thind, Multani and Cao (2020) <arXiv:2006.09590> through a main fitting and prediction function, as well as a number of helper functions for cross-validation, tuning, and the display of estimated functional weights.
Maintained by Barinder Thind. Last updated 5 years ago.
4.6 match 3 stars 3.18 score 5 scripts
rohelab
fastadi:Self-Tuning Data Adaptive Matrix Imputation
Implements the AdaptiveImpute matrix completion algorithm of 'Intelligent Initialization and Adaptive Thresholding for Iterative Matrix Completion', <https://amstat.tandfonline.com/doi/abs/10.1080/10618600.2018.1518238>. AdaptiveImpute is useful for embedding sparsely observed matrices, often outperforms competing matrix completion algorithms, and self-tunes its hyperparameter, making usage easy.
Maintained by Alex Hayes. Last updated 9 months ago.
3.4 match 9 stars 4.26 score 6 scripts
cran
sparcl:Perform Sparse Hierarchical Clustering and Sparse K-Means Clustering
Implements the sparse clustering methods of Witten and Tibshirani (2010), "A framework for feature selection in clustering", Journal of the American Statistical Association 105(490): 713-726.
Maintained by Daniela Witten. Last updated 6 years ago.
3.5 match 1 star 4.20 score 133 scripts 4 dependents
crj32
Spectrum:Fast Adaptive Spectral Clustering for Single and Multi-View Data
A self-tuning spectral clustering method for single or multi-view data. 'Spectrum' uses a new type of adaptive, density-aware kernel that strengthens connections in the graph based on common nearest neighbours. It uses a tensor product graph data integration and diffusion procedure to integrate different data sources and reduce noise. 'Spectrum' uses either the eigengap or the multimodality gap heuristic to determine the number of clusters. The method is flexible enough that a wide range of Gaussian and non-Gaussian structures can be clustered with automatic selection of K.
Maintained by Christopher R John. Last updated 5 years ago.
2.3 match 7 stars 5.99 score 47 scripts 1 dependent
daniel-jg
BeviMed:Bayesian Evaluation of Variant Involvement in Mendelian Disease
A fast integrative genetic association test for rare diseases based on a model for disease status given allele counts at rare variant sites. Probability of association, mode of inheritance, and probability of pathogenicity for individual variants are all inferred in a Bayesian framework: 'A Fast Association Test for Identifying Pathogenic Variants Involved in Rare Diseases', Greene et al. (2017) <doi:10.1016/j.ajhg.2017.05.015>.
Maintained by Daniel Greene. Last updated 10 months ago.
4.0 match 1 star 3.41 score 17 scripts
ropensci
DoOR.functions:Integrating Heterogeneous Odorant Response Data into a Common Response Model: A DoOR to the Complete Olfactome
A package of functions for data manipulation and visualization of DoOR.data. See the URLs for the original and the DoOR 2.0 publication.
Maintained by Daniel Münch. Last updated 1 year ago.
2.5 match 8 stars 5.40 score 52 scripts
ly129
ktweedie:'Tweedie' Compound Poisson Model in the Reproducing Kernel Hilbert Space
Kernel-based 'Tweedie' compound Poisson gamma model using high-dimensional predictors for the analysis of zero-inflated response variables. The package features built-in estimation, prediction, and cross-validation tools and supports a choice of different kernel functions. For more details, please see Yi Lian, Archer Yi Yang, Boxiang Wang, Peng Shi & Robert William Platt (2023) <doi:10.1080/00401706.2022.2156615>.
Maintained by Yi Lian. Last updated 1 year ago.
3.3 match 2 stars 4.00 score 5 scripts
rjacobucci
regsem:Regularized Structural Equation Modeling
Uses both ridge and lasso penalties (and extensions) to penalize specific parameters in structural equation models. The package offers additional cost functions, cross-validation, and other extensions beyond traditional structural equation models. It also contains a function to perform exploratory mediation (XMed).
Maintained by Ross Jacobucci. Last updated 2 years ago.
2.0 match 14 stars 6.63 score 77 scripts
cran
cosso:Fit Regularized Nonparametric Regression Models Using COSSO Penalty
The COSSO regularization method automatically estimates and selects important function components by a soft-thresholding penalty in the context of smoothing spline ANOVA models. Implemented models include mean regression, quantile regression, logistic regression, and Cox regression models.
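Soft thresholding itself is easy to state. The Python sketch below shows the generic operator S(z, lambda) = sign(z) * max(|z| - lambda, 0). It shrinks every value toward zero and sets any value with |z| <= lambda exactly to zero, and that exact-zero behaviour is what lets a penalty of this kind drop components. This is an illustration only, not COSSO's fitting algorithm, and the input values are made up.

```python
def soft_threshold(z, lam):
    """Shrink z toward 0 by lam; values with |z| <= lam become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

# Hypothetical component-wise estimates before penalization.
raw = [2.5, -0.3, 0.8, -1.7]
print([soft_threshold(z, 1.0) for z in raw])
```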
Maintained by Isaac Ray. Last updated 2 years ago.
10.3 match 1 star 1.28 score 19 scripts
aalfons
cvTools:Cross-validation tools for regression models
Tools that allow developers to write functions for cross-validation with minimal programming effort and assist users with model selection.
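To show what such tools automate, here is a minimal K-fold cross-validation sketch in Python. It does not reproduce the cvTools API. The function names and the mean-squared-error criterion are assumptions chosen for illustration. The data are shuffled once and split into K folds, and each fold is held out in turn while the model is fit on the rest.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Randomly partition range(n) into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    # Fold j takes every k-th shuffled index starting at position j.
    return [idx[j::k] for j in range(k)]

def cross_validate(xs, ys, k, fit, predict):
    """Return the mean squared prediction error over all held-out points."""
    folds = k_fold_indices(len(xs), k)
    sq_errors = []
    for test in folds:
        test_set = set(test)
        train = [i for i in range(len(xs)) if i not in test_set]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        sq_errors += [(ys[i] - predict(model, xs[i])) ** 2 for i in test]
    return sum(sq_errors) / len(sq_errors)

# A deliberately simple "model": predict the training mean of y.
def fit_mean(xs, ys):
    return sum(ys) / len(ys)

def predict_mean(model, x):
    return model

xs = list(range(10))
ys = [2.0 * x + 1.0 for x in xs]
print(cross_validate(xs, ys, k=5, fit=fit_mean, predict=predict_mean))
```

To compare candidate models by their cross-validated error, only `fit` and `predict` need to change.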
Maintained by Andreas Alfons. Last updated 13 years ago.
1.8 match 8 stars 7.26 score 460 scripts 18 dependents
sth1402
GGMridge:Gaussian Graphical Models Using Ridge Penalty Followed by Thresholding and Reestimation
Estimation of the partial correlation matrix using a ridge penalty followed by thresholding and reestimation. Under a multivariate Gaussian assumption, the matrix constitutes a Gaussian graphical model (GGM).
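The ridge-then-threshold idea can be sketched in plain Python in four steps. First add a ridge term lambda to the diagonal of a correlation matrix. Then invert it to get a regularized precision matrix Omega. Next convert Omega to partial correlations with rho_ij = -Omega_ij / sqrt(Omega_ii * Omega_jj). Finally, zero out small entries. This is a simplified illustration, not GGMridge's implementation: lambda and the cutoff are fixed by hand here (hypothetical values), and the reestimation step is omitted.

```python
import math

def invert(m):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    # Augment m with the identity matrix.
    a = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [row[n:] for row in a]

def ridge_partial_correlations(corr, lam, threshold):
    """Partial correlations from the ridge-regularized inverse, then hard thresholding."""
    n = len(corr)
    # Ridge step: add lam to the diagonal before inverting.
    omega = invert([[corr[i][j] + (lam if i == j else 0.0) for j in range(n)]
                    for i in range(n)])
    p = [[1.0 if i == j else -omega[i][j] / math.sqrt(omega[i][i] * omega[j][j])
          for j in range(n)] for i in range(n)]
    # Thresholding step: small partial correlations become missing edges (0).
    return [[v if i == j or abs(v) >= threshold else 0.0 for j, v in enumerate(row)]
            for i, row in enumerate(p)]

corr = [[1.0, 0.5, 0.0],
        [0.5, 1.0, 0.0],
        [0.0, 0.0, 1.0]]
for row in ridge_partial_correlations(corr, lam=0.1, threshold=0.1):
    print([round(v, 3) for v in row])
```

With lambda = 0.1 the first two variables keep a partial correlation of 0.5/1.1 (about 0.455), while the third variable stays disconnected.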
Maintained by Shannon T. Holloway. Last updated 1 year ago.
6.7 match 1.89 score 13 scripts 2 dependents
qiantang0326
hdqr:Fast Algorithm for Penalized Quantile Regression
Implements an efficient algorithm to fit and tune penalized quantile regression models using the generalized coordinate descent algorithm. Designed to handle high-dimensional datasets effectively, with emphasis on precision and computational efficiency. This package implements the algorithms proposed in Tang, Q., Zhang, Y., & Wang, B. (2022) <https://openreview.net/pdf?id=RvwMTDYTOb>.
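The loss being penalized is the quantile check loss rho_tau(u) = u * (tau - 1{u < 0}). The Python sketch below does not implement the package's generalized coordinate descent or any penalty. It only shows, for an intercept-only model, that minimizing total check loss picks out an empirical tau-quantile, and that the result does not depend on how extreme the largest observation is. Searching only over the data points is a simplification for illustration.

```python
def check_loss(u, tau):
    """Quantile (pinball) loss rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))

def best_constant(ys, tau):
    """Constant c minimizing the summed check loss, searched over the data points."""
    return min(ys, key=lambda c: sum(check_loss(y - c, tau) for y in ys))

ys = [1.0, 2.0, 3.0, 4.0, 100.0]
print(best_constant(ys, 0.5), best_constant(ys, 0.7))
```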
Maintained by Qian Tang. Last updated 1 month ago.
5.5 match 2.30 score
t-kalinowski
keras:R Interface to 'Keras'
Interface to 'Keras' <https://keras.io>, a high-level neural networks 'API'. 'Keras' was developed with a focus on enabling fast experimentation, supports both convolution-based networks and recurrent networks (as well as combinations of the two), and runs seamlessly on both 'CPU' and 'GPU' devices.
Maintained by Tomasz Kalinowski. Last updated 11 months ago.
1.2 match 10.82 score 10k scripts 54 dependents
josetamezpena
FRESA.CAD:Feature Selection Algorithms for Computer Aided Diagnosis
Contains a set of utilities for building and testing statistical models (linear, logistic, ordinal, or Cox) for Computer Aided Diagnosis/Prognosis applications. Utilities include data adjustment, univariate analysis, model building, model validation, longitudinal analysis, reporting, and visualization.
Maintained by Jose Gerardo Tamez-Pena. Last updated 1 month ago.
2.3 match 7 stars 5.59 score 31 scripts
bioc
ClassifyR:A framework for cross-validated classification problems, with applications to differential variability and differential distribution testing
The software formalises a framework for classification and survival model evaluation in R. There are four stages: data transformation, feature selection, model training, and prediction. The requirements of variable types and variable order are fixed, but specialised variables for functions can also be provided. The framework is wrapped in a driver loop that reproducibly carries out a number of cross-validation schemes. Functions for differential mean, differential variability, and differential distribution are included. Additional functions may be developed by the user by creating an interface to the framework.
Maintained by Dario Strbenac. Last updated 5 days ago.
1.5 match 5 stars 8.36 score 45 scripts 3 dependents
cran
CSCNet:Fitting and Tuning Regularized Cause-Specific Cox Models with Elastic-Net Penalty
Flexible tools to fit, tune, and obtain absolute risk predictions from regularized cause-specific Cox models with elastic-net penalty.
Maintained by Shahin Roshani. Last updated 2 years ago.
4.6 match 2.70 score 3 scripts
syeonkang
rrMixture:Reduced-Rank Mixture Models
We implement full-ranked, rank-penalized, and adaptive nuclear norm penalized estimation methods using multivariate mixture models proposed by Kang, Chen, and Yao (2022+).
Maintained by Suyeon Kang. Last updated 3 years ago.
6.2 match 2.00 score 5 scripts
jhorzek
lessSEM:Non-Smooth Regularization for Structural Equation Models
Provides regularized structural equation modeling (regularized SEM) with non-smooth penalty functions (e.g., lasso) building on 'lavaan'. The package is heavily inspired by the ['regsem'](<https://github.com/Rjacobucci/regsem>) and ['lslx'](<https://github.com/psyphh/lslx>) packages.
Maintained by Jannik H. Orzek. Last updated 1 year ago.
lasso, psychometrics, regularization, regularized-structural-equation-model, sem, structural-equation-modeling, openblas, cpp, openmp
1.7 match 7 stars 7.19 score 223 scripts
alarm-redist
redist:Simulation Methods for Legislative Redistricting
Enables researchers to sample redistricting plans from a pre-specified target distribution using Sequential Monte Carlo and Markov Chain Monte Carlo algorithms. The package allows for the implementation of various constraints in the redistricting process such as geographic compactness and population parity requirements. Tools for analysis such as computation of various summary statistics and plotting functionality are also included. The package implements the SMC algorithm of McCartan and Imai (2023) <doi:10.1214/23-AOAS1763>, the enumeration algorithm of Fifield, Imai, Kawahara, and Kenny (2020) <doi:10.1080/2330443X.2020.1791773>, the Flip MCMC algorithm of Fifield, Higgins, Imai and Tarr (2020) <doi:10.1080/10618600.2020.1739532>, the Merge-split/Recombination algorithms of Carter et al. (2019) <arXiv:1911.01503> and DeFord et al. (2021) <doi:10.1162/99608f92.eb30390f>, and the Short-burst optimization algorithm of Cannon et al. (2020) <arXiv:2011.02288>.
Maintained by Christopher T. Kenny. Last updated 2 months ago.
geospatial, gerrymandering, redistricting, sampling, openblas, cpp, openmp
1.3 match 68 stars 9.17 score 259 scripts
philipp-baumann
simplerspec:Soil and plant spectroscopic model building and prediction
Functions that cover reading of spectral data, outlier removal, spectral preprocessing, calibration sampling, PLS regression using caret, and model diagnostic statistics and plots.
Maintained by Philipp Baumann. Last updated 1 year ago.
3.5 match 33 stars 3.52 score 10 scripts
haghish
shapley:Weighted Mean SHAP and CI for Robust Feature Selection in ML Grid
This R package introduces Weighted Mean SHapley Additive exPlanations (WMSHAP), an innovative method for calculating SHAP values for a grid of fine-tuned base-learner machine learning models as well as stacked ensembles, a method not previously available due to the common reliance on single best-performing models. By integrating the weighted mean SHAP values from the individual base-learners comprising the ensemble, or from individual base-learners in a tuning grid search, the package weights SHAP contributions according to each model's performance, assessed by R squared (for both regression and classification models). Alternatively, the software also offers weighting SHAP values based on the area under the precision-recall curve (AUCPR), the area under the curve (AUC), and F2 measures for binary classifiers. It further extends this framework to implement weighted confidence intervals for weighted mean SHAP values, offering a more comprehensive and robust feature importance evaluation over a grid of machine learning models instead of solely computing SHAP values for the best model. This methodology is particularly beneficial for addressing severe class imbalance (class rarity). It provides a transparent, generalized measure of feature importance that mitigates the risk of reporting SHAP values for an overfitted or biased model and remains robust under severe class imbalance, where there is no universal criterion for identifying the absolute best model. Furthermore, the package implements hypothesis testing to ascertain the statistical significance of SHAP values for individual features, as well as comparative significance testing of SHAP contributions between features.
Additionally, it tackles a critical gap in the feature selection literature by presenting criteria for automatically selecting the most important features across a grid of models or stacked ensembles, eliminating the need to choose the number of top features arbitrarily. This utility is invaluable for researchers analyzing feature significance, particularly with severely imbalanced outcomes where conventional methods fall short. Moreover, it is expected to report democratic feature importance across a grid of models, resulting in more comprehensive and generalizable feature selection. The package further implements a novel method for visualizing SHAP values at both the subject level and the feature level, as well as a plot for feature selection based on the weighted mean SHAP ratios.
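A minimal Python sketch of performance-weighted SHAP aggregation follows. It assumes that a model's importance for a feature is its mean absolute SHAP value and that the model weights are performance scores (such as R squared) normalized to sum to one. The package's exact formula, confidence intervals and significance tests are not reproduced, and all numbers are hypothetical.

```python
def weighted_mean_shap(shap_by_model, performance):
    """Performance-weighted average of per-model mean |SHAP| for each feature."""
    total = sum(performance)
    weights = [p / total for p in performance]
    result = {}
    for feature in shap_by_model[0]:
        # Per-model importance: mean absolute SHAP value of this feature.
        per_model = [sum(abs(v) for v in shap[feature]) / len(shap[feature])
                     for shap in shap_by_model]
        result[feature] = sum(w * imp for w, imp in zip(weights, per_model))
    return result

# Hypothetical SHAP values (one entry per observation) from two models in a grid.
model_a = {"age": [0.4, -0.2, 0.3], "dose": [0.1, 0.0, -0.2]}
model_b = {"age": [0.1, 0.1, -0.1], "dose": [0.5, -0.4, 0.3]}
print(weighted_mean_shap([model_a, model_b], performance=[0.9, 0.3]))
```

The better-scoring model gets weight 0.75 and dominates the result. As a consequence, "age" ranks above "dose" even though the weaker model favours "dose".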
Maintained by E. F. Haghish. Last updated 1 day ago.
class-imbalance, class-imbalance-problem, feature-extraction, feature-importance, feature-selection, machine-learning, machine-learning-algorithms, shap, shap-analysis, shap-values, shapely, shapley-additive-explanations, shapley-decomposition, shapley-value, shapley-values, shapleyvalue, weighted-shap, weighted-shap-confidence-interval, weighted-shapley, weighted-shapley-ci
2.3 match 14 stars 5.19 score 17 scripts
egenn
rtemis:Machine Learning and Visualization
Advanced Machine Learning and Visualization. Unsupervised Learning (Clustering, Decomposition), Supervised Learning (Classification, Regression), Cross-Decomposition, Bagging, Boosting, Meta-models. Static and interactive graphics.
Maintained by E.D. Gennatas. Last updated 1 month ago.
data-science, data-visualization, machine-learning, machine-learning-library, visualization
1.7 match 145 stars 7.09 score 50 scripts 2 dependents
cran
evreg:Evidential Regression
An implementation of the 'Evidential Neural Network for Regression' model recently introduced in Denoeux (2023) <doi:10.1109/TFUZZ.2023.3268200>. In this model, prediction uncertainty is quantified by Gaussian random fuzzy numbers as introduced in Denoeux (2023) <doi:10.1016/j.fss.2022.06.004>. The package contains functions for training the network, tuning hyperparameters by cross-validation or the hold-out method, and making predictions. It also contains utilities for making calculations with Gaussian random fuzzy numbers (e.g., computing the degrees of belief and plausibility of an interval, or combining Gaussian random fuzzy numbers).
Maintained by Thierry Denoeux. Last updated 10 months ago.
5.3 match 2.30 score
martakarass
adept:Adaptive Empirical Pattern Transformation
Designed for fast, accurate segmentation of walking strides from high-density data collected by a wearable accelerometer worn during continuous walking activity.
Maintained by Marta Karas. Last updated 4 years ago.
2.2 match 7 stars 5.35 score 16 scripts
civisanalytics
civis:R Client for the 'Civis Platform API'
A convenient interface for making requests directly to the 'Civis Platform API' <https://www.civisanalytics.com/platform/>. Full documentation available 'here' <https://civisanalytics.github.io/civis-r/>.
Maintained by Peter Cooman. Last updated 2 months ago.
1.5 match 16 stars 7.84 score 144 scripts
tidymodels
agua:'tidymodels' Integration with 'h2o'
Create and evaluate models using 'tidymodels' and 'h2o' <https://h2o.ai/>. The package enables users to specify 'h2o' as an engine for several modeling methods.
Maintained by Qiushi Yan. Last updated 9 months ago.
1.7 match 22 stars 6.88 score 80 scripts
ropensci
QuadratiK:Collection of Methods Constructed using Kernel-Based Quadratic Distances
It includes tests for multivariate normality and for uniformity on the d-dimensional sphere, non-parametric two- and k-sample tests, random generation of points from the Poisson kernel-based density, and a clustering algorithm for spherical data. For more information see Saraceno G., Markatou M., Mukhopadhyay R. and Golzy M. (2024) <doi:10.48550/arXiv.2402.02290>, Markatou, M. and Saraceno, G. (2024) <doi:10.48550/arXiv.2407.16374>, Ding, Y., Markatou, M. and Saraceno, G. (2023) <doi:10.5705/ss.202022.0347>, and Golzy, M. and Markatou, M. (2020) <doi:10.1080/10618600.2020.1740713>.
Maintained by Giovanni Saraceno. Last updated 1 month ago.
1.8 match 1 star 6.36 score 27 scripts
bioc
basilisk:Freezing Python Dependencies Inside Bioconductor Packages
Installs a self-contained conda instance that is managed by the R/Bioconductor installation machinery. This aims to provide a consistent Python environment that can be used reliably by Bioconductor packages. Functions are also provided to enable smooth interoperability of multiple Python environments in a single R session.
Maintained by Aaron Lun. Last updated 1 month ago.
1.3 match 9.10 score 75 scripts 38 dependents
ajmolstad
MatrixLDA:Penalized Matrix-Normal Linear Discriminant Analysis
Fits the penalized matrix-normal model to be used for linear discriminant analysis with matrix-valued predictors. For a description of the method, see Molstad and Rothman (2018) <doi:10.1080/10618600.2018.1476249>.
Maintained by Aaron J. Molstad. Last updated 1 year ago.
4.2 match 1 star 2.70 score 4 scripts
rstudio
tfhub:Interface to 'TensorFlow' Hub
'TensorFlow' Hub is a library for the publication, discovery, and consumption of reusable parts of machine learning models. A module is a self-contained piece of a 'TensorFlow' graph, along with its weights and assets, that can be reused across different tasks in a process known as transfer learning. Transfer learning can train a model with a smaller dataset, improve generalization, and speed up training.
Maintained by Tomasz Kalinowski. Last updated 3 years ago.
1.5 match 29 stars 7.46 score 73 scripts 1 dependent