Showing 17 of 17 results
parsnip: A Common API to Modeling and Analysis Functions
A common interface is provided to allow users to specify a model without having to remember the different argument names across different functions or computational engines (e.g. 'R', 'Spark', 'Stan', 'H2O').
Maintained by Max Kuhn. Last updated 16 days ago.
612 stars 16.37 score 3.4k scripts 69 dependents
tidymodels
hardhat: Construct Modeling Packages
Building modeling packages is hard. A large amount of effort generally goes into providing an implementation for a new method that is efficient, fast, and correct, but often less emphasis is put on the user interface. A good interface requires specialized knowledge about S3 methods and formulas, which the average package developer might not have. The goal of 'hardhat' is to reduce the burden around building new modeling packages by providing functionality for preprocessing, predicting, and validating input.
Maintained by Hannah Frick. Last updated 2 months ago.
103 stars 14.88 score 175 scripts 436 dependents
tidymodels
tune: Tidy Tuning Tools
The ability to tune models is important. 'tune' contains functions and classes to be used in conjunction with other 'tidymodels' packages for finding reasonable values of hyper-parameters in models, pre-processing methods, and post-processing steps.
Maintained by Max Kuhn. Last updated 24 days ago.
293 stars 14.27 score 756 scripts 39 dependents
bioc
mixOmics: Omics Data Integration Project
Multivariate methods are well suited to large omics data sets where the number of variables (e.g. genes, proteins, metabolites) is much larger than the number of samples (patients, cells, mice). They have the appealing properties of reducing the dimension of the data by using instrumental variables (components), which are defined as combinations of all variables. Those components are then used to produce useful graphical outputs that enable better understanding of the relationships and correlation structures between the different data sets that are integrated. mixOmics offers a wide range of multivariate methods for the exploration and integration of biological datasets with a particular focus on variable selection. The package proposes several sparse multivariate models we have developed to identify the key variables that are highly correlated, and/or explain the biological outcome of interest. The data that can be analysed with mixOmics may come from high-throughput sequencing technologies, such as omics data (transcriptomics, metabolomics, proteomics, metagenomics, etc.) but also beyond the realm of omics (e.g. spectral imaging). The methods implemented in mixOmics can also handle missing values without having to delete entire rows with missing data. A non-exhaustive list of methods includes variants of generalised Canonical Correlation Analysis, sparse Partial Least Squares and sparse Discriminant Analysis. Recently we implemented integrative methods to combine multiple data sets: N-integration with variants of Generalised Canonical Correlation Analysis and P-integration with variants of multi-group Partial Least Squares.
Maintained by Eva Hamrud. Last updated 16 days ago.
182 stars 13.71 score 1.3k scripts 22 dependents
mlr-org
mlr3tuning: Hyperparameter Optimization for 'mlr3'
Hyperparameter optimization package of the 'mlr3' ecosystem. It features highly configurable search spaces via the 'paradox' package and finds optimal hyperparameter configurations for any 'mlr3' learner. 'mlr3tuning' works with several optimization algorithms, e.g. Random Search, Iterated Racing, Bayesian Optimization (in 'mlr3mbo') and Hyperband (in 'mlr3hyperband'). Moreover, it can automatically optimize learners and estimate the performance of optimized models with nested resampling.
Maintained by Marc Becker. Last updated 3 months ago.
55 stars 11.53 score 384 scripts 11 dependents
cran
e1071: Misc Functions of the Department of Statistics, Probability Theory Group (Formerly: E1071), TU Wien
Functions for latent class analysis, short time Fourier transform, fuzzy clustering, support vector machines, shortest path computation, bagged clustering, naive Bayes classifier, generalized k-nearest neighbour ...
Maintained by David Meyer. Last updated 6 months ago.
29 stars 11.26 score 2.0k dependents
kogalur
randomForestSRC: Fast Unified Random Forests for Survival, Regression, and Classification (RF-SRC)
Fast OpenMP parallel computing of Breiman's random forests for univariate, multivariate, unsupervised, survival, competing risks, class imbalanced classification and quantile regression. New Mahalanobis splitting for correlated outcomes. Extreme random forests and randomized splitting. Suite of imputation methods for missing data. Fast random forests using subsampling. Confidence regions and standard errors for variable importance. New improved holdout importance. Case-specific importance. Minimal depth variable importance. Visualize trees on your Safari or Google Chrome browser. Anonymous random forests for data privacy.
Maintained by Udaya B. Kogalur. Last updated 5 hours ago.
124 stars 10.10 score 1.2k scripts 11 dependents
citoverse
cito: Building and Training Neural Networks
The 'cito' package provides a user-friendly interface for training and interpreting deep neural networks (DNN). 'cito' simplifies the fitting of DNNs by supporting the familiar formula syntax and hyperparameter tuning under cross-validation, and helps to detect and handle convergence problems. DNNs can be trained on CPUs, GPUs and macOS GPUs. In addition, 'cito' has many downstream functionalities such as various explainable AI (xAI) metrics (e.g. variable importance, partial dependence plots, accumulated local effect plots, and effect estimates) to interpret trained DNNs. 'cito' optionally provides confidence intervals (and p-values) for all xAI metrics and predictions. At the same time, 'cito' is computationally efficient because it is based on the deep learning framework 'torch'. The 'torch' package is native to R, so no Python installation or other API is required for this package.
Maintained by Maximilian Pichler. Last updated 2 months ago.
42 stars 9.07 score 129 scripts 1 dependents
mlr-org
mlr3verse: Easily Install and Load the 'mlr3' Package Family
The 'mlr3' package family is a set of packages for machine-learning purposes built in a modular fashion. This wrapper package aims to simplify the installation and loading of the core 'mlr3' packages. Get more information about the 'mlr3' project at <>.
Maintained by Marc Becker. Last updated 2 months ago.
55 stars 8.32 score 720 scripts 1 dependents
robingenuer
VSURF: Variable Selection Using Random Forests
A three-step variable selection procedure based on random forests. Initially developed to handle high-dimensional data (for which the number of variables greatly exceeds the number of observations), the package is very versatile and can treat most dimensions of data, for regression and supervised classification problems. The first step eliminates irrelevant variables from the dataset. The second step aims to select all variables related to the response, for interpretation purposes. The third step refines the selection by eliminating redundancy in the set of variables selected by the second step, for prediction purposes. Genuer, R., Poggi, J.-M. and Tuleau-Malot, C. (2015) <>.
Maintained by Robin Genuer. Last updated 9 months ago.
36 stars 7.49 score 192 scripts 1 dependents
tidymodels
tidyclust: A Common API to Clustering
A common interface for specifying clustering models, in the same style as 'parsnip'. Creates a unified interface across different functions and computational engines.
Maintained by Emil Hvitfeldt. Last updated 2 months ago.
111 stars 7.45 score 139 scripts
yunuuuu
ggalign: A 'ggplot2' Extension for Consistent Axis Alignment
A 'ggplot2' extension that offers various tools for the creation of complex, multi-plot visualizations. Built on the familiar grammar of graphics, it provides intuitive tools to align and organize plots. It excels in multi-omics research, such as genomics and microbiomes, by simplifying the visualization of intricate relationships between datasets, for example, linking genes to pathways. Whether you need to stack plots, arrange them around a central figure, or create a circular layout, 'ggalign' delivers flexibility and accuracy with minimal effort.
Maintained by Yun Peng. Last updated 13 days ago.
267 stars 7.08 score 27 scripts
centerforstatistics-ugent
xnet: Two-Step Kernel Ridge Regression for Network Predictions
Fit a two-step kernel ridge regression model for predicting edges in networks, and carry out cross-validation using shortcuts for swift and accurate performance assessment (Stock et al., 2018 <doi:10.1093/bib/bby095>).
Maintained by Joris Meys. Last updated 4 years ago.
11 stars 5.30 score 12 scripts
bioc
CMA: Synthesis of microarray-based classification
This package provides a comprehensive collection of various microarray-based classification algorithms both from Machine Learning and Statistics. Variable Selection, Hyperparameter tuning, Evaluation and Comparison can be performed combined or stepwise in a user-friendly environment.
Maintained by Roman Hornung. Last updated 5 months ago.
5.09 score 61 scripts
cran
astrochron: A Computational Tool for Astrochronology
Routines for astrochronologic testing, astronomical time scale construction, and time series analysis <doi:10.1016/j.earscirev.2018.11.015>. Also included are a range of statistical analysis and modeling routines that are relevant to time scale development and paleoclimate analysis.
Maintained by Stephen Meyers. Last updated 6 months ago.
5 stars 2.70 score
rarabzadeh
RHMS: Hydrologic Modelling System for R Users
The hydrologic modelling system is an object-oriented tool for simulation and analysis of hydrologic events. The package proposes functions and methods for construction, simulation, visualization, and calibration of a hydrologic model.
Maintained by Rezgar Arabzadeh. Last updated 4 years ago.
1 star 1.30 score 10 scripts