Showing 200 of total 845 results (show query)
rstudio
keras3:R Interface to 'Keras'
Interface to 'Keras' <https://keras.io>, a high-level neural networks API. 'Keras' was developed with a focus on enabling fast experimentation, supports both convolution based networks and recurrent networks (as well as combinations of the two), and runs seamlessly on both CPU and GPU devices.
Maintained by Tomasz Kalinowski. Last updated 6 hours ago.
49.0 match 845 stars 13.60 score 264 scripts 2 dependentstopepo
caret:Classification and Regression Training
Misc functions for training and plotting classification and regression models.
Maintained by Max Kuhn. Last updated 3 months ago.
29.3 match 1.6k stars 19.24 score 61k scripts 303 dependentst-kalinowski
keras:R Interface to 'Keras'
Interface to 'Keras' <https://keras.io>, a high-level neural networks 'API'. 'Keras' was developed with a focus on enabling fast experimentation, supports both convolution based networks and recurrent networks (as well as combinations of the two), and runs seamlessly on both 'CPU' and 'GPU' devices.
Maintained by Tomasz Kalinowski. Last updated 11 months ago.
49.5 match 10.93 score 10k scripts 55 dependentsrstudio
tfruns:Training Run Tools for 'TensorFlow'
Create and manage unique directories for each 'TensorFlow' training run. Provides a unique, time stamped directory for each run along with functions to retrieve the directory of the latest run or latest several runs.
Maintained by Tomasz Kalinowski. Last updated 11 months ago.
31.1 match 34 stars 11.80 score 325 scripts 77 dependentswjbraun
DAAG:Data Analysis and Graphics Data and Functions
Functions and data sets used in examples and exercises in the text Maindonald, J.H. and Braun, W.J. (2003, 2007, 2010) "Data Analysis and Graphics Using R", and in an upcoming Maindonald, Braun, and Andrews text that builds on this earlier text.
Maintained by W. John Braun. Last updated 11 months ago.
28.1 match 8.25 score 1.2k scripts 1 dependentsbioc
PDATK:Pancreatic Ductal Adenocarcinoma Tool-Kit
Pancreatic ductal adenocarcinoma (PDA) has a relatively poor prognosis and is one of the most lethal cancers. Molecular classification of gene expression profiles holds the potential to identify meaningful subtypes which can inform therapeutic strategy in the clinical setting. The Pancreatic Cancer Adenocarcinoma Tool-Kit (PDATK) provides an S4 class-based interface for performing unsupervised subtype discovery, cross-cohort meta-clustering, gene-expression-based classification, and subsequent survival analysis to identify prognostically useful subtypes in pancreatic cancer and beyond. Two novel methods, Consensus Subtypes in Pancreatic Cancer (CSPC) and Pancreatic Cancer Overall Survival Predictor (PCOSP) are included for consensus-based meta-clustering and overall-survival prediction, respectively. Additionally, four published subtype classifiers and three published prognostic gene signatures are included to allow users to easily recreate published results, apply existing classifiers to new data, and benchmark the relative performance of new methods. The use of existing Bioconductor classes as input to all PDATK classes and methods enables integration with existing Bioconductor datasets, including the 21 pancreatic cancer patient cohorts available in the MetaGxPancreas data package. PDATK has been used to replicate results from Sandhu et al (2019) [https://doi.org/10.1200/cci.18.00102] and an additional paper is in the works using CSPC to validate subtypes from the included published classifiers, both of which use the data available in MetaGxPancreas. The inclusion of subtype centroids and prognostic gene signatures from these and other publications will enable researchers and clinicians to classify novel patient gene expression data, allowing the direct clinical application of the classifiers included in PDATK. Overall, PDATK provides a rich set of tools to identify and validate useful prognostic and molecular subtypes based on gene-expression data, benchmark new classifiers against existing ones, and apply discovered classifiers on novel patient data to inform clinical decision making.
Maintained by Benjamin Haibe-Kains. Last updated 5 months ago.
geneexpressionpharmacogeneticspharmacogenomicssoftwareclassificationsurvivalclusteringgeneprediction
53.0 match 1 stars 4.31 score 17 scriptsrstudio
tfestimators:Interface to 'TensorFlow' Estimators
Interface to 'TensorFlow' Estimators <https://www.tensorflow.org/guide/estimator>, a high-level API that provides implementations of many different model types including linear models and deep neural networks.
Maintained by Tomasz Kalinowski. Last updated 3 years ago.
25.8 match 57 stars 8.42 score 170 scriptsazure
azuremlsdk:Interface to the 'Azure Machine Learning' 'SDK'
Interface to the 'Azure Machine Learning' Software Development Kit ('SDK'). Data scientists can use the 'SDK' to train, deploy, automate, and manage machine learning models on the 'Azure Machine Learning' service. To learn more about 'Azure Machine Learning' visit the website: <https://docs.microsoft.com/en-us/azure/machine-learning/service/overview-what-is-azure-ml>.
Maintained by Diondra Peck. Last updated 3 years ago.
amlcomputeazureazure-machine-learningazuremldsimachine-learningrstudiosdk-r
24.3 match 106 stars 8.91 score 221 scriptscitoverse
cito:Building and Training Neural Networks
The 'cito' package provides a user-friendly interface for training and interpreting deep neural networks (DNN). 'cito' simplifies the fitting of DNNs by supporting the familiar formula syntax, hyperparameter tuning under cross-validation, and helps to detect and handle convergence problems. DNNs can be trained on CPU, GPU and MacOS GPUs. In addition, 'cito' has many downstream functionalities such as various explainable AI (xAI) metrics (e.g. variable importance, partial dependence plots, accumulated local effect plots, and effect estimates) to interpret trained DNNs. 'cito' optionally provides confidence intervals (and p-values) for all xAI metrics and predictions. At the same time, 'cito' is computationally efficient because it is based on the deep learning framework 'torch'. The 'torch' package is native to R, so no Python installation or other API is required for this package.
Maintained by Maximilian Pichler. Last updated 2 months ago.
machine-learningneural-network
22.7 match 42 stars 9.07 score 129 scripts 1 dependentsjackdunnnz
iai:Interface to 'Interpretable AI' Modules
An interface to the algorithms of 'Interpretable AI' <https://www.interpretable.ai> from the R programming language. 'Interpretable AI' provides various modules, including 'Optimal Trees' for classification, regression, prescription and survival analysis, 'Optimal Imputation' for missing data imputation and outlier detection, and 'Optimal Feature Selection' for exact sparse regression. The 'iai' package is an open-source project. The 'Interpretable AI' software modules are proprietary products, but free academic and evaluation licenses are available.
Maintained by Jack Dunn. Last updated 5 months ago.
96.8 match 1 stars 2.00 score 7 scriptstomasfryda
h2o:R Interface for the 'H2O' Scalable Machine Learning Platform
R interface for 'H2O', the scalable open source machine learning platform that offers parallelized implementations of many supervised and unsupervised machine learning algorithms such as Generalized Linear Models (GLM), Gradient Boosting Machines (including XGBoost), Random Forests, Deep Neural Networks (Deep Learning), Stacked Ensembles, Naive Bayes, Generalized Additive Models (GAM), ANOVA GLM, Cox Proportional Hazards, K-Means, PCA, ModelSelection, Word2Vec, as well as a fully automatic machine learning algorithm (H2O AutoML).
Maintained by Tomas Fryda. Last updated 1 years ago.
23.3 match 3 stars 8.20 score 7.8k scripts 11 dependentsbioc
MOFA2:Multi-Omics Factor Analysis v2
The MOFA2 package contains a collection of tools for training and analysing multi-omic factor analysis (MOFA). MOFA is a probabilistic factor model that aims to identify principal axes of variation from data sets that can comprise multiple omic layers and/or groups of samples. Additional time or space information on the samples can be incorporated using the MEFISTO framework, which is part of MOFA2. Downstream analysis functions to inspect molecular features underlying each factor, vizualisation, imputation etc are available.
Maintained by Ricard Argelaguet. Last updated 5 months ago.
dimensionreductionbayesianvisualizationfactor-analysismofamulti-omics
18.7 match 319 stars 10.02 score 502 scriptstrivialfis
xgboost:Extreme Gradient Boosting
Extreme Gradient Boosting, which is an efficient implementation of the gradient boosting framework from Chen & Guestrin (2016) <doi:10.1145/2939672.2939785>. This package is its R interface. The package includes efficient linear model solver and tree learning algorithms. The package can automatically do parallel computation on a single machine which could be more than 10 times faster than existing gradient boosting packages. It supports various objective functions, including regression, classification and ranking. The package is made to be extensible, so that users are also allowed to define their own objectives easily.
Maintained by Jiaming Yuan. Last updated 8 months ago.
15.2 match 6 stars 11.70 score 13k scripts 112 dependentse-sensing
sits:Satellite Image Time Series Analysis for Earth Observation Data Cubes
An end-to-end toolkit for land use and land cover classification using big Earth observation data, based on machine learning methods applied to satellite image data cubes, as described in Simoes et al (2021) <doi:10.3390/rs13132428>. Builds regular data cubes from collections in AWS, Microsoft Planetary Computer, Brazil Data Cube, Copernicus Data Space Environment (CDSE), Digital Earth Africa, Digital Earth Australia, NASA HLS using the Spatio-temporal Asset Catalog (STAC) protocol (<https://stacspec.org/>) and the 'gdalcubes' R package developed by Appel and Pebesma (2019) <doi:10.3390/data4030092>. Supports visualization methods for images and time series and smoothing filters for dealing with noisy time series. Includes functions for quality assessment of training samples using self-organized maps as presented by Santos et al (2021) <doi:10.1016/j.isprsjprs.2021.04.014>. Includes methods to reduce training samples imbalance proposed by Chawla et al (2002) <doi:10.1613/jair.953>. Provides machine learning methods including support vector machines, random forests, extreme gradient boosting, multi-layer perceptrons, temporal convolutional neural networks proposed by Pelletier et al (2019) <doi:10.3390/rs11050523>, and temporal attention encoders by Garnot and Landrieu (2020) <doi:10.48550/arXiv.2007.00586>. Supports GPU processing of deep learning models using torch <https://torch.mlverse.org/>. Performs efficient classification of big Earth observation data cubes and includes functions for post-classification smoothing based on Bayesian inference as described by Camara et al (2024) <doi:10.3390/rs16234572>, and methods for active learning and uncertainty assessment. Supports region-based time series analysis using package supercells <https://jakubnowosad.com/supercells/>. Enables best practices for estimating area and assessing accuracy of land change as recommended by Olofsson et al (2014) <doi:10.1016/j.rse.2014.02.015>. Minimum recommended requirements: 16 GB RAM and 4 CPU dual-core.
Maintained by Gilberto Camara. Last updated 1 months ago.
big-earth-datacbersearth-observationeo-datacubesgeospatialimage-time-seriesland-cover-classificationlandsatplanetary-computerr-spatialremote-sensingrspatialsatellite-image-time-seriessatellite-imagerysentinel-2stac-apistac-catalogcpp
18.6 match 494 stars 9.50 score 384 scriptstrackerproject
trackeR:Infrastructure for Running, Cycling and Swimming Data from GPS-Enabled Tracking Devices
Provides infrastructure for handling running, cycling and swimming data from GPS-enabled tracking devices within R. The package provides methods to extract, clean and organise workout and competition data into session-based and unit-aware data objects of class 'trackeRdata' (S3 class). The information can then be visualised, summarised, and analysed through flexible and extensible methods. Frick and Kosmidis (2017) <doi: 10.18637/jss.v082.i07>, which is updated and maintained as one of the vignettes, provides detailed descriptions of the package and its methods, and real-data demonstrations of the package functionality.
Maintained by Ioannis Kosmidis. Last updated 1 years ago.
27.6 match 90 stars 6.37 score 58 scripts 1 dependentsgavinsimpson
analogue:Analogue and Weighted Averaging Methods for Palaeoecology
Fits Modern Analogue Technique and Weighted Averaging transfer function models for prediction of environmental data from species data, and related methods used in palaeoecology.
Maintained by Gavin L. Simpson. Last updated 6 months ago.
19.4 match 14 stars 8.96 score 185 scripts 4 dependentsbusiness-science
modeltime:The Tidymodels Extension for Time Series Modeling
The time series forecasting framework for use with the 'tidymodels' ecosystem. Models include ARIMA, Exponential Smoothing, and additional time series models from the 'forecast' and 'prophet' packages. Refer to "Forecasting Principles & Practice, Second edition" (<https://otexts.com/fpp2/>). Refer to "Prophet: forecasting at scale" (<https://research.facebook.com/blog/2017/02/prophet-forecasting-at-scale/>.).
Maintained by Matt Dancho. Last updated 5 months ago.
arimadata-sciencedeep-learningetsforecastingmachine-learningmachine-learning-algorithmsmodeltimeprophettbatstidymodelingtidymodelstimetime-seriestime-series-analysistimeseriestimeseries-forecasting
16.1 match 549 stars 10.57 score 1.1k scripts 7 dependentsoscarkjell
text:Analyses of Text using Transformers Models from HuggingFace, Natural Language Processing and Machine Learning
Link R with Transformers from Hugging Face to transform text variables to word embeddings; where the word embeddings are used to statistically test the mean difference between set of texts, compute semantic similarity scores between texts, predict numerical variables, and visual statistically significant words according to various dimensions etc. For more information see <https://www.r-text.org>.
Maintained by Oscar Kjell. Last updated 4 days ago.
deep-learningmachine-learningnlptransformersopenjdk
12.7 match 146 stars 13.16 score 436 scripts 1 dependentsnredell
forecastML:Time Series Forecasting with Machine Learning Methods
The purpose of 'forecastML' is to simplify the process of multi-step-ahead forecasting with standard machine learning algorithms. 'forecastML' supports lagged, dynamic, static, and grouping features for modeling single and grouped numeric or factor/sequence time series. In addition, simple wrapper functions are used to support model-building with most R packages. This approach to forecasting is inspired by Bergmeir, Hyndman, and Koo's (2018) paper "A note on the validity of cross-validation for evaluating autoregressive time series prediction" <doi:10.1016/j.csda.2017.11.003>.
Maintained by Nickalus Redell. Last updated 5 years ago.
deep-learningdirect-forecastingforecastforecastingmachine-learningmulti-step-ahead-forecastingneural-networkpythontime-series
21.4 match 131 stars 7.64 score 134 scriptsahaeusser
echos:Echo State Networks for Time Series Modeling and Forecasting
Provides a lightweight implementation of functions and methods for fast and fully automatic time series modeling and forecasting using Echo State Networks (ESNs).
Maintained by Alexander Häußer. Last updated 13 days ago.
echo-state-networksfablefabletoolsforecastforecastingrecurrent-neural-networksreservoir-computingridge-regressiontime-seriesopenblascppopenmp
26.2 match 12 stars 6.03 score 8 scriptsohdsi
PatientLevelPrediction:Develop Clinical Prediction Models Using the Common Data Model
A user friendly way to create patient level prediction models using the Observational Medical Outcomes Partnership Common Data Model. Given a cohort of interest and an outcome of interest, the package can use data in the Common Data Model to build a large set of features. These features can then be used to fit a predictive model with a number of machine learning algorithms. This is further described in Reps (2017) <doi:10.1093/jamia/ocy032>.
Maintained by Egill Fridgeirsson. Last updated 9 days ago.
14.2 match 190 stars 10.85 score 297 scriptsmyeomans
politeness:Detecting Politeness Features in Text
Detecting markers of politeness in English natural language. This package allows researchers to easily visualize and quantify politeness between groups of documents. This package combines prior research on the linguistic markers of politeness. We thank the Spencer Foundation, the Hewlett Foundation, and Harvard's Institute for Quantitative Social Science for support.
Maintained by Mike Yeomans. Last updated 1 months ago.
18.4 match 25 stars 7.49 score 41 scripts 1 dependentsbioc
switchBox:Utilities to train and validate classifiers based on pair switching using the K-Top-Scoring-Pair (KTSP) algorithm
The package offer different classifiers based on comparisons of pair of features (TSP), using various decision rules (e.g., majority wins principle).
Maintained by Bahman Afsari. Last updated 5 months ago.
softwarestatisticalmethodclassification
31.7 match 4.30 score 11 scripts 1 dependentsconsbiol-unibern
SDMtune:Species Distribution Model Selection
User-friendly framework that enables the training and the evaluation of species distribution models (SDMs). The package implements functions for data driven variable selection and model tuning and includes numerous utilities to display the results. All the functions used to select variables or to tune model hyperparameters have an interactive real-time chart displayed in the 'RStudio' viewer pane during their execution.
Maintained by Sergio Vignali. Last updated 3 months ago.
hyperparameter-tuningspecies-distribution-modellingvariable-selectioncpp
18.4 match 25 stars 7.37 score 155 scriptsbioc
SIAMCAT:Statistical Inference of Associations between Microbial Communities And host phenoTypes
Pipeline for Statistical Inference of Associations between Microbial Communities And host phenoTypes (SIAMCAT). A primary goal of analyzing microbiome data is to determine changes in community composition that are associated with environmental factors. In particular, linking human microbiome composition to host phenotypes such as diseases has become an area of intense research. For this, robust statistical modeling and biomarker extraction toolkits are crucially needed. SIAMCAT provides a full pipeline supporting data preprocessing, statistical association testing, statistical modeling (LASSO logistic regression) including tools for evaluation and interpretation of these models (such as cross validation, parameter selection, ROC analysis and diagnostic model plots).
Maintained by Jakob Wirbel. Last updated 5 months ago.
immunooncologymetagenomicsclassificationmicrobiomesequencingpreprocessingclusteringfeatureextractiongeneticvariabilitymultiplecomparisonregression
20.1 match 6.72 score 147 scriptsdiegommcc
SpatialDDLS:Deconvolution of Spatial Transcriptomics Data Based on Neural Networks
Deconvolution of spatial transcriptomics data based on neural networks and single-cell RNA-seq data. SpatialDDLS implements a workflow to create neural network models able to make accurate estimates of cell composition of spots from spatial transcriptomics data using deep learning and the meaningful information provided by single-cell RNA-seq data. See Torroja and Sanchez-Cabo (2019) <doi:10.3389/fgene.2019.00978> and Mañanes et al. (2024) <doi:10.1093/bioinformatics/btae072> to get an overview of the method and see some examples of its performance.
Maintained by Diego Mañanes. Last updated 5 months ago.
deconvolutiondeep-learningneural-networkspatial-transcriptomics
27.0 match 5 stars 5.00 score 1 scriptsrvalavi
blockCV:Spatial and Environmental Blocking for K-Fold and LOO Cross-Validation
Creating spatially or environmentally separated folds for cross-validation to provide a robust error estimation in spatially structured environments; Investigating and visualising the effective range of spatial autocorrelation in continuous raster covariates and point samples to find an initial realistic distance band to separate training and testing datasets spatially described in Valavi, R. et al. (2019) <doi:10.1111/2041-210X.13107>.
Maintained by Roozbeh Valavi. Last updated 5 months ago.
cross-validationspatialspatial-cross-validationspatial-modellingspecies-distribution-modellingcpp
12.3 match 113 stars 10.49 score 302 scripts 3 dependentsschlosslab
mikropml:User-Friendly R Package for Supervised Machine Learning Pipelines
An interface to build machine learning models for classification and regression problems. 'mikropml' implements the ML pipeline described by Topçuoğlu et al. (2020) <doi:10.1128/mBio.00434-20> with reasonable default options for data preprocessing, hyperparameter tuning, cross-validation, testing, model evaluation, and interpretation steps. See the website <https://www.schlosslab.org/mikropml/> for more information, documentation, and examples.
Maintained by Kelly Sovacool. Last updated 2 years ago.
15.5 match 56 stars 7.83 score 86 scriptsbioc
TCGAbiolinks:TCGAbiolinks: An R/Bioconductor package for integrative analysis with GDC data
The aim of TCGAbiolinks is : i) facilitate the GDC open-access data retrieval, ii) prepare the data using the appropriate pre-processing strategies, iii) provide the means to carry out different standard analyses and iv) to easily reproduce earlier research results. In more detail, the package provides multiple methods for analysis (e.g., differential expression analysis, identifying differentially methylated regions) and methods for visualization (e.g., survival plots, volcano plots, starburst plots) in order to easily develop complete analysis pipelines.
Maintained by Tiago Chedraoui Silva. Last updated 27 days ago.
dnamethylationdifferentialmethylationgeneregulationgeneexpressionmethylationarraydifferentialexpressionpathwaysnetworksequencingsurvivalsoftwarebiocbioconductorgdcintegrative-analysistcgatcga-datatcgabiolinks
8.2 match 305 stars 14.45 score 1.6k scripts 6 dependentsjameslamb
lightgbm:Light Gradient Boosting Machine
Tree based algorithms can be improved by introducing boosting frameworks. 'LightGBM' is one such framework, based on Ke, Guolin et al. (2017) <https://papers.nips.cc/paper/6907-lightgbm-a-highly-efficient-gradient-boosting-decision>. This package offers an R interface to work with it. It is designed to be distributed and efficient with the following advantages: 1. Faster training speed and higher efficiency. 2. Lower memory usage. 3. Better accuracy. 4. Parallel learning supported. 5. Capable of handling large-scale data. In recognition of these advantages, 'LightGBM' has been widely-used in many winning solutions of machine learning competitions. Comparison experiments on public datasets suggest that 'LightGBM' can outperform existing boosting frameworks on both efficiency and accuracy, with significantly lower memory consumption. In addition, parallel experiments suggest that in certain circumstances, 'LightGBM' can achieve a linear speed-up in training time by using multiple machines.
Maintained by James Lamb. Last updated 1 months ago.
13.2 match 1 stars 8.47 score 1.6k scripts 6 dependentsbioc
CPSM:CPSM: Cancer patient survival model
The CPSM package provides a comprehensive computational pipeline for predicting the survival probability of cancer patients. It offers a series of steps including data processing, splitting data into training and test subsets, and normalization of data. The package enables the selection of significant features based on univariate survival analysis and generates a LASSO prognostic index score. It supports the development of predictive models for survival probability using various features and provides visualization tools to draw survival curves based on predicted survival probabilities. Additionally, SPM includes functionalities for generating bar plots that depict the predicted mean and median survival times of patients, making it a versatile tool for survival analysis in cancer research.
Maintained by Harpreet Kaur. Last updated 5 days ago.
geneexpressionnormalizationsurvival
28.7 match 3.90 scorebioc
SPONGE:Sparse Partial Correlations On Gene Expression
This package provides methods to efficiently detect competitive endogeneous RNA interactions between two genes. Such interactions are mediated by one or several miRNAs such that both gene and miRNA expression data for a larger number of samples is needed as input. The SPONGE package now also includes spongEffects: ceRNA modules offer patient-specific insights into the miRNA regulatory landscape.
Maintained by Markus List. Last updated 5 months ago.
geneexpressiontranscriptiongeneregulationnetworkinferencetranscriptomicssystemsbiologyregressionrandomforestmachinelearning
20.5 match 5.36 score 38 scripts 1 dependentsimbs-hl
fuseMLR:Fusing Machine Learning in R
Recent technological advances have enable the simultaneous collection of multi-omics data i.e., different types or modalities of molecular data, presenting challenges for integrative prediction modeling due to the heterogeneous, high-dimensional nature and possible missing modalities of some individuals. We introduce this package for late integrative prediction modeling, enabling modality-specific variable selection and prediction modeling, followed by the aggregation of the modality-specific predictions to train a final meta-model. This package facilitates conducting late integration predictive modeling in a systematic, structured, and reproducible way.
Maintained by Cesaire J. K. Fouodo. Last updated 6 days ago.
18.9 match 6 stars 5.80 score 3 scriptstidymodels
rsample:General Resampling Infrastructure
Classes and functions to create and summarize different types of resampling objects (e.g. bootstrap, cross-validation).
Maintained by Hannah Frick. Last updated 6 days ago.
6.5 match 341 stars 16.72 score 5.2k scripts 79 dependentskjhealy
gssrdoc:Document General Social Survey Variable
The General Social Survey (GSS) is a long-running, mostly annual survey of US households. It is administered by the National Opinion Research Center (NORC). This package contains the a tibble with information on the survey variables, together with every variable documented as an R help page. For more information on the GSS see \url{http://gss.norc.org}.
Maintained by Kieran Healy. Last updated 11 months ago.
48.0 match 2.28 score 38 scriptspecanproject
PEcAn.data.atmosphere:PEcAn Functions Used for Managing Climate Driver Data
The Predictive Ecosystem Carbon Analyzer (PEcAn) is a scientific workflow management tool that is designed to simplify the management of model parameterization, execution, and analysis. The PECAn.data.atmosphere package converts climate driver data into a standard format for models integrated into PEcAn. As a standalone package, it provides an interface to access diverse climate data sets.
Maintained by David LeBauer. Last updated 3 days ago.
bayesiancyberinfrastructuredata-assimilationdata-scienceecosystem-modelecosystem-scienceforecastingmeta-analysisnational-science-foundationpecanplants
9.3 match 216 stars 11.59 score 64 scripts 14 dependentsmlverse
luz:Higher Level 'API' for 'torch'
A high level interface for 'torch' providing utilities to reduce the the amount of code needed for common tasks, abstract away torch details and make the same code work on both the 'CPU' and 'GPU'. It's flexible enough to support expressing a large range of models. It's heavily inspired by 'fastai' by Howard et al. (2020) <arXiv:2002.04688>, 'Keras' by Chollet et al. (2015) and 'PyTorch Lightning' by Falcon et al. (2019) <doi:10.5281/zenodo.3828935>.
Maintained by Daniel Falbel. Last updated 6 months ago.
10.8 match 89 stars 9.86 score 318 scripts 4 dependentstidymodels
recipes:Preprocessing and Feature Engineering Steps for Modeling
A recipe prepares your data for modeling. We provide an extensible framework for pipeable sequences of feature engineering steps provides preprocessing tools to be applied to data. Statistical parameters for the steps can be estimated from an initial data set and then applied to other data sets. The resulting processed output can then be used as inputs for statistical or machine learning models.
Maintained by Max Kuhn. Last updated 6 days ago.
5.5 match 584 stars 18.71 score 7.2k scripts 380 dependentseagerai
fastai:Interface to 'fastai'
The 'fastai' <https://docs.fast.ai/index.html> library simplifies training fast and accurate neural networks using modern best practices. It is based on research in to deep learning best practices undertaken at 'fast.ai', including 'out of the box' support for vision, text, tabular, audio, time series, and collaborative filtering models.
Maintained by Turgut Abdullayev. Last updated 11 months ago.
audiocollaborative-filteringdarknetdarknet-image-classificationfastaimedicalobject-detectiontabulartextvision
10.8 match 118 stars 9.40 score 76 scriptsbioc
tidytof:Analyze High-dimensional Cytometry Data Using Tidy Data Principles
This package implements an interactive, scientific analysis pipeline for high-dimensional cytometry data built using tidy data principles. It is specifically designed to play well with both the tidyverse and Bioconductor software ecosystems, with functionality for reading/writing data files, data cleaning, preprocessing, clustering, visualization, modeling, and other quality-of-life functions. tidytof implements a "grammar" of high-dimensional cytometry data analysis.
Maintained by Timothy Keyes. Last updated 5 months ago.
singlecellflowcytometrybioinformaticscytometrydata-sciencesingle-celltidyversecpp
13.5 match 18 stars 7.24 score 35 scriptsbioc
scAnnotatR:Pretrained learning models for cell type prediction on single cell RNA-sequencing data
The package comprises a set of pretrained machine learning models to predict basic immune cell types. This enables all users to quickly get a first annotation of the cell types present in their dataset without requiring prior knowledge. scAnnotatR also allows users to train their own models to predict new cell types based on specific research needs.
Maintained by Johannes Griss. Last updated 5 months ago.
singlecelltranscriptomicsgeneexpressionsupportvectormachineclassificationsoftware
14.5 match 15 stars 6.73 score 20 scriptscran
datarobot:'DataRobot' Predictive Modeling API
For working with the 'DataRobot' predictive modeling platform's API <https://www.datarobot.com/>.
Maintained by AJ Alon. Last updated 1 years ago.
28.1 match 2 stars 3.48 scoremlverse
cuda.ml:R Interface for the RAPIDS cuML Suite of Libraries
R interface for RAPIDS cuML (<https://github.com/rapidsai/cuml>), a suite of GPU-accelerated machine learning libraries powered by CUDA (<https://en.wikipedia.org/wiki/CUDA>).
Maintained by Daniel Falbel. Last updated 3 years ago.
18.3 match 33 stars 5.27 score 57 scriptshannameyer
CAST:'caret' Applications for Spatial-Temporal Models
Supporting functionality to run 'caret' with spatial or spatial-temporal data. 'caret' is a frequently used package for model training and prediction using machine learning. CAST includes functions to improve spatial or spatial-temporal modelling tasks using 'caret'. It includes the newly suggested 'Nearest neighbor distance matching' cross-validation to estimate the performance of spatial prediction models and allows for spatial variable selection to selects suitable predictor variables in view to their contribution to the spatial model performance. CAST further includes functionality to estimate the (spatial) area of applicability of prediction models. Methods are described in Meyer et al. (2018) <doi:10.1016/j.envsoft.2017.12.001>; Meyer et al. (2019) <doi:10.1016/j.ecolmodel.2019.108815>; Meyer and Pebesma (2021) <doi:10.1111/2041-210X.13650>; Milà et al. (2022) <doi:10.1111/2041-210X.13851>; Meyer and Pebesma (2022) <doi:10.1038/s41467-022-29838-9>; Linnenbrink et al. (2023) <doi:10.5194/egusphere-2023-1308>; Schumacher et al. (2024) <doi:10.5194/egusphere-2024-2730>. The package is described in detail in Meyer et al. (2024) <doi:10.48550/arXiv.2404.06978>.
Maintained by Hanna Meyer. Last updated 2 months ago.
autocorrelationcaretfeature-selectionmachine-learningoverfittingpredictive-modelingspatialspatio-temporalvariable-selection
8.0 match 114 stars 11.97 score 298 scripts 1 dependentszachmayer
caretEnsemble:Ensembles of Caret Models
Functions for creating ensembles of caret models: caretList() and caretStack(). caretList() is a convenience function for fitting multiple caret::train() models to the same dataset. caretStack() will make linear or non-linear combinations of these models, using a caret::train() model as a meta-model.
Maintained by Zachary A. Deane-Mayer. Last updated 3 months ago.
7.9 match 226 stars 11.98 score 780 scripts 1 dependentsbioc
TBSignatureProfiler:Profile RNA-Seq Data Using TB Pathway Signatures
Gene signatures of TB progression, TB disease, and other TB disease states have been validated and published previously. This package aggregates known signatures and provides computational tools to enlist their usage on other datasets. The TBSignatureProfiler makes it easy to profile RNA-Seq data using these signatures and includes common signature profiling tools including ASSIGN, GSVA, and ssGSEA. Original models for some gene signatures are also available. A shiny app provides some functionality alongside for detailed command line accessibility.
Maintained by Aubrey R. Odom. Last updated 3 months ago.
geneexpressiondifferentialexpressionbioconductor-packagebiomarkersgene-signaturestuberculosis
12.9 match 12 stars 7.25 score 23 scriptsiiasa
ibis.iSDM:Modelling framework for integrated biodiversity distribution scenarios
Integrated framework of modelling the distribution of species and ecosystems in a suitability framing. This package allows the estimation of integrated species distribution models (iSDM) based on several sources of evidence and provided presence-only and presence-absence datasets. It makes heavy use of point-process models for estimating habitat suitability and allows to include spatial latent effects and priors in the estimation. To do so 'ibis.iSDM' supports a number of engines for Bayesian and more non-parametric machine learning estimation. Further, the 'ibis.iSDM' is specifically customized to support spatial-temporal projections of habitat suitability into the future.
Maintained by Martin Jung. Last updated 4 months ago.
bayesianbiodiversityintegrated-frameworkpoisson-processscenariossdmspatial-grainspatial-predictionsspecies-distribution-modelling
21.5 match 21 stars 4.36 score 12 scripts 1 dependentsvzhomeexperiments
lazytrade:Learn Computer and Data Science using Algorithmic Trading
Provide sets of functions and methods to learn and practice data science using idea of algorithmic trading. Main goal is to process information within "Decision Support System" to come up with analysis or predictions. There are several utilities such as dynamic and adaptive risk management using reinforcement learning and even functions to generate predictions of price changes using pattern recognition deep regression learning. Summary of Methods used: Awesome H2O tutorials: <https://github.com/h2oai/awesome-h2o>, Market Type research of Van Tharp Institute: <https://vantharp.com/>, Reinforcement Learning R package: <https://CRAN.R-project.org/package=ReinforcementLearning>.
Maintained by Vladimir Zhbanko. Last updated 8 months ago.
16.7 match 23 stars 5.58 score 333 scriptscbailiss
pivottabler:Create Pivot Tables
Create regular pivot tables with just a few lines of R. More complex pivot tables can also be created, e.g. pivot tables with irregular layouts, multiple calculations and/or derived calculations based on multiple data frames. Pivot tables are constructed using R only and can be written to a range of output formats (plain text, 'HTML', 'Latex' and 'Excel'), including with styling/formatting.
Maintained by Christopher Bailiss. Last updated 1 years ago.
calculationshtmlhtmlwidgetlatexpivot-tablesvisualization
11.4 match 122 stars 8.08 score 358 scripts 1 dependentsmlverse
tabnet:Fit 'TabNet' Models for Classification and Regression
Implements the 'TabNet' model by Sercan O. Arik et al. (2019) <doi:10.48550/arXiv.1908.07442> with 'Coherent Hierarchical Multi-label Classification Networks' by Giunchiglia et al. <doi:10.48550/arXiv.2010.10151> and provides a consistent interface for fitting and creating predictions. It's also fully compatible with the 'tidymodels' ecosystem.
Maintained by Christophe Regouby. Last updated 6 months ago.
9.9 match 109 stars 9.00 score 65 scriptsjimbrig
rtraining:R Training Resources, Guides, Tips, and Knowledge Base
Houses variouse material realted to teaching R.
Maintained by Jimmy Briggs. Last updated 2 years ago.
best-practicescurationdeveloper-toolsdevelopmentdevelopment-environmentguideknowledgepackage-developmentsetupshiny-appstips-and-trickstrainingtraining-materialswalkthrough
23.5 match 4 stars 3.60 score 6 scriptsbips-hb
neuralnet:Training of Neural Networks
Training of neural networks using backpropagation, resilient backpropagation with (Riedmiller, 1994) or without weight backtracking (Riedmiller and Braun, 1993) or the modified globally convergent version by Anastasiadis et al. (2005). The package allows flexible settings through custom-choice of error and activation function. Furthermore, the calculation of generalized weights (Intrator O & Intrator N, 1993) is implemented.
Maintained by Marvin N. Wright. Last updated 4 years ago.
7.8 match 32 stars 10.73 score 2.9k scripts 38 dependentsbnosac
udpipe:Tokenization, Parts of Speech Tagging, Lemmatization and Dependency Parsing with the 'UDPipe' 'NLP' Toolkit
This natural language processing toolkit provides language-agnostic 'tokenization', 'parts of speech tagging', 'lemmatization' and 'dependency parsing' of raw text. Next to text parsing, the package also allows you to train annotation models based on data of 'treebanks' in 'CoNLL-U' format as provided at <https://universaldependencies.org/format.html>. The techniques are explained in detail in the paper: 'Tokenizing, POS Tagging, Lemmatizing and Parsing UD 2.0 with UDPipe', available at <doi:10.18653/v1/K17-3009>. The toolkit also contains functionalities for commonly used data manipulations on texts which are enriched with the output of the parser. Namely functionalities and algorithms for collocations, token co-occurrence, document term matrix handling, term frequency inverse document frequency calculations, information retrieval metrics (Okapi BM25), handling of multi-word expressions, keyword detection (Rapid Automatic Keyword Extraction, noun phrase extraction, syntactical patterns) sentiment scoring and semantic similarity analysis.
Maintained by Jan Wijffels. Last updated 2 years ago.
conlldependency-parserlemmatizationnatural-language-processingnlppos-taggingr-pkgrcpptext-miningtokenizerudpipecpp
7.1 match 215 stars 11.83 score 1.2k scripts 9 dependentsjakobraymaekers
classmap:Visualizing Classification Results
Tools to visualize the results of a classification of cases. The graphical displays include stacked plots, silhouette plots, quasi residual plots, and class maps. Implements the techniques described and illustrated in Raymaekers, Rousseeuw and Hubert (2021), Class maps for visualizing classification results, Technometrics, appeared online. <doi:10.1080/00401706.2021.1927849> (open access) and Raymaekers and Rousseeuw (2021), Silhouettes and quasi residual plots for neural nets and tree-based classifiers, <arXiv:2106.08814>. Examples can be found in the vignettes: "Discriminant_analysis_examples","K_nearest_neighbors_examples", "Support_vector_machine_examples", "Rpart_examples", "Random_forest_examples", and "Neural_net_examples".
Maintained by Jakob Raymaekers. Last updated 2 years ago.
26.4 match 3.08 score 20 scriptscsafe-isu
handwriterRF:Handwriting Analysis with Random Forests
Perform forensic handwriting analysis of two scanned handwritten documents. This package implements the statistical method described by Madeline Johnson and Danica Ommen (2021) <doi:10.1002/sam.11566>. Similarity measures and a random forest produce a score-based likelihood ratio that quantifies the strength of the evidence in favor of the documents being written by the same writer or different writers.
Maintained by Stephanie Reinders. Last updated 9 days ago.
13.1 match 2 stars 6.18 score 15 scripts 1 dependentsvgherard
sbo:Text Prediction via Stupid Back-Off N-Gram Models
Utilities for training and evaluating text predictors based on Stupid Back-Off N-gram models (Brants et al., 2007, <https://www.aclweb.org/anthology/D07-1090/>).
Maintained by Valerio Gherardi. Last updated 4 years ago.
natural-language-processingngram-modelspredictive-textsbocpp
16.8 match 10 stars 4.78 score 12 scriptsnorskregnesentral
shapr:Prediction Explanation with Dependence-Aware Shapley Values
Complex machine learning models are often hard to interpret. However, in many situations it is crucial to understand and explain why a model made a specific prediction. Shapley values is the only method for such prediction explanation framework with a solid theoretical foundation. Previously known methods for estimating the Shapley values do, however, assume feature independence. This package implements methods which accounts for any feature dependence, and thereby produces more accurate estimates of the true Shapley values. An accompanying 'Python' wrapper ('shaprpy') is available through the GitHub repository.
Maintained by Martin Jullum. Last updated 1 months ago.
explainable-aiexplainable-mlrcpprcpparmadilloshapleyopenblascppopenmp
7.4 match 153 stars 10.62 score 175 scripts 1 dependentsbioc
scGPS:A complete analysis of single cell subpopulations, from identifying subpopulations to analysing their relationship (scGPS = single cell Global Predictions of Subpopulation)
The package implements two main algorithms to answer two key questions: a SCORE (Stable Clustering at Optimal REsolution) to find subpopulations, followed by scGPS to investigate the relationships between subpopulations.
Maintained by Quan Nguyen. Last updated 5 months ago.
singlecellclusteringdataimportsequencingcoverageopenblascpp
14.8 match 4 stars 5.20 score 7 scriptsbioc
iterativeBMAsurv:The Iterative Bayesian Model Averaging (BMA) Algorithm For Survival Analysis
The iterative Bayesian Model Averaging (BMA) algorithm for survival analysis is a variable selection method for applying survival analysis to microarray data.
Maintained by Ka Yee Yeung. Last updated 5 months ago.
23.2 match 3.30 score 8 scriptsnickch-k
causaldata:Example Data Sets for Causal Inference Textbooks
Example data sets to run the example problems from causal inference textbooks. Currently, contains data sets for Huntington-Klein, Nick (2021) "The Effect" <https://theeffectbook.net>, first and second edition, Cunningham, Scott (2021, ISBN-13: 978-0-300-25168-5) "Causal Inference: The Mixtape", and Hernán, Miguel and James Robins (2020) "Causal Inference: What If" <https://www.hsph.harvard.edu/miguel-hernan/causal-inference-book/>.
Maintained by Nick Huntington-Klein. Last updated 4 months ago.
10.3 match 136 stars 7.43 score 144 scripts 1 dependentsmicrosoft
finnts:Microsoft Finance Time Series Forecasting Framework
Automated time series forecasting developed by Microsoft Finance. The Microsoft Finance Time Series Forecasting Framework, aka Finn, can be used to forecast any component of the income statement, balance sheet, or any other area of interest by finance. Any numerical quantity over time, Finn can be used to forecast it. While it can be applied outside of the finance domain, Finn was built to meet the needs of financial analysts to better forecast their businesses within a company, and has a lot of built in features that are specific to the needs of financial forecasters. Happy forecasting!
Maintained by Mike Tokic. Last updated 26 days ago.
businessdata-sciencefeature-selectionfinancefinntsforecastingmachine-learningmicrosofttime-series
8.0 match 193 stars 9.45 score 39 scriptsbflammers
ANN2:Artificial Neural Networks for Anomaly Detection
Training of neural networks for classification and regression tasks using mini-batch gradient descent. Special features include a function for training autoencoders, which can be used to detect anomalies, and some related plotting functions. Multiple activation functions are supported, including tanh, relu, step and ramp. For the use of the step and ramp activation functions in detecting anomalies using autoencoders, see Hawkins et al. (2002) <doi:10.1007/3-540-46145-0_17>. Furthermore, several loss functions are supported, including robust ones such as Huber and pseudo-Huber loss, as well as L1 and L2 regularization. The possible options for optimization algorithms are RMSprop, Adam and SGD with momentum. The package contains a vectorized C++ implementation that facilitates fast training through mini-batch learning.
Maintained by Bart Lammers. Last updated 4 years ago.
anomaly-detectionartificial-neural-networksautoencodersneural-networksrobust-statisticsopenblascppopenmp
13.5 match 13 stars 5.59 score 60 scriptsrunxiao
deepnet:Deep Learning Toolkit in R
Implement some deep learning architectures and neural network algorithms, including BP,RBM,DBN,Deep autoencoder and so on.
Maintained by Xiao Rong. Last updated 3 years ago.
15.7 match 24 stars 4.79 score 131 scripts 1 dependentsmyles-lewis
nestedcv:Nested Cross-Validation with 'glmnet' and 'caret'
Implements nested k*l-fold cross-validation for lasso and elastic-net regularised linear models via the 'glmnet' package and other machine learning models via the 'caret' package <doi:10.1093/bioadv/vbad048>. Cross-validation of 'glmnet' alpha mixing parameter and embedded fast filter functions for feature selection are provided. Described as double cross-validation by Stone (1977) <doi:10.1111/j.2517-6161.1977.tb01603.x>. Also implemented is a method using outer CV to measure unbiased model performance metrics when fitting Bayesian linear and logistic regression shrinkage models using the horseshoe prior over parameters to encourage a sparse model as described by Piironen & Vehtari (2017) <doi:10.1214/17-EJS1337SI>.
Maintained by Myles Lewis. Last updated 6 days ago.
9.4 match 12 stars 7.92 score 46 scriptsyulab-smu
MMINP:Microbe-Metabolite Interactions-Based Metabolic Profiles Predictor
Implements a computational framework to predict microbial community-based metabolic profiles with 'O2PLS' model. It provides procedures of model training and prediction. Paired microbiome and metabolome data are needed for modeling, and the trained model can be applied to predict metabolites of analogous environments using new microbial feature abundances.
Maintained by Wenli Tang. Last updated 2 years ago.
metabolite-predictionmetabolitesmicrobes
15.5 match 13 stars 4.81 score 9 scriptsr-lib
scales:Scale Functions for Visualization
Graphical scales map data to aesthetics, and provide methods for automatically determining breaks and labels for axes and legends.
Maintained by Thomas Lin Pedersen. Last updated 5 months ago.
3.8 match 419 stars 19.88 score 88k scripts 7.9k dependentsmmi-codex
Xcertainty:Estimating Lengths and Uncertainty from Photogrammetric Imagery
Implementation of Bayesian models for estimating object lengths and morphological relationships between object lengths using photographic data collected from drones. The Bayesian model is described in "Bayesian approach for predicting photogrammetric uncertainty in morphometric measurements derived from drones" (Bierlich et al., 2021, <doi:10.3354/meps13814>).
Maintained by K.C. Bierlich. Last updated 5 months ago.
12.4 match 3 stars 5.95 score 10 scriptsbusiness-science
timetk:A Tool Kit for Working with Time Series
Easy visualization, wrangling, and feature engineering of time series data for forecasting and machine learning prediction. Consolidates and extends time series functionality from packages including 'dplyr', 'stats', 'xts', 'forecast', 'slider', 'padr', 'recipes', and 'rsample'.
Maintained by Matt Dancho. Last updated 1 years ago.
coercioncoercion-functionsdata-miningdplyrforecastforecastingforecasting-modelsmachine-learningseries-decompositionseries-signaturetibbletidytidyquanttidyversetimetime-seriestimeseries
5.2 match 625 stars 14.15 score 4.0k scripts 16 dependentsbioc
scClassify:scClassify: single-cell Hierarchical Classification
scClassify is a multiscale classification framework for single-cell RNA-seq data based on ensemble learning and cell type hierarchies, enabling sample size estimation required for accurate cell type classification and joint classification of cells using multiple references.
Maintained by Yingxin Lin. Last updated 5 months ago.
singlecellgeneexpressionclassification
10.6 match 23 stars 6.92 score 30 scriptsbioc
iterativeBMA:The Iterative Bayesian Model Averaging (BMA) algorithm
The iterative Bayesian Model Averaging (BMA) algorithm is a variable selection and classification algorithm with an application of classifying 2-class microarray samples, as described in Yeung, Bumgarner and Raftery (Bioinformatics 2005, 21: 2394-2402).
Maintained by Ka Yee Yeung. Last updated 5 months ago.
18.3 match 3.78 score 1 scriptsmyeomans
doc2concrete:Measuring Concreteness in Natural Language
Models for detecting concreteness in natural language. This package is built in support of Yeomans (2021) <doi:10.1016/j.obhdp.2020.10.008>, which reviews linguistic models of concreteness in several domains. Here, we provide an implementation of the best-performing domain-general model (from Brysbaert et al., (2014) <doi:10.3758/s13428-013-0403-5>) as well as two pre-trained models for the feedback and plan-making domains.
Maintained by Mike Yeomans. Last updated 1 years ago.
12.3 match 13 stars 5.59 score 20 scripts 1 dependentsgesistsa
oolong:Create Validation Tests for Automated Content Analysis
Intended to create standard human-in-the-loop validity tests for typical automated content analysis such as topic modeling and dictionary-based methods. This package offers a standard workflow with functions to prepare, administer and evaluate a human-in-the-loop validity test. This package provides functions for validating topic models using word intrusion, topic intrusion (Chang et al. 2009, <https://papers.nips.cc/paper/3700-reading-tea-leaves-how-humans-interpret-topic-models>) and word set intrusion (Ying et al. 2021) <doi:10.1017/pan.2021.33> tests. This package also provides functions for generating gold-standard data which are useful for validating dictionary-based methods. The default settings of all generated tests match those suggested in Chang et al. (2009) and Song et al. (2020) <doi:10.1080/10584609.2020.1723752>.
Maintained by Chung-hong Chan. Last updated 20 days ago.
textanalysistopicmodelingvalidation
8.8 match 54 stars 7.57 score 23 scriptscbailiss
basictabler:Construct Rich Tables for Output to 'HTML'/'Excel'
Easily create tables from data frames/matrices. Create/manipulate tables row-by-row, column-by-column or cell-by-cell. Use common formatting/styling to output rich tables as 'HTML', 'HTML widgets' or to 'Excel'.
Maintained by Christopher Bailiss. Last updated 4 years ago.
htmlhtmlwidgettablesvisualization
9.2 match 37 stars 7.09 score 94 scriptsr-forge
mlogit:Multinomial Logit Models
Maximum Likelihood estimation of random utility discrete choice models, as described in Kenneth Train (2009) Discrete Choice Methods with Simulations <doi:10.1017/CBO9780511805271>.
Maintained by Yves Croissant. Last updated 5 years ago.
6.5 match 9.81 score 1.2k scripts 14 dependentsnliulab
AutoScore:An Interpretable Machine Learning-Based Automatic Clinical Score Generator
A novel interpretable machine learning-based framework to automate the development of a clinical scoring model for predefined outcomes. Our novel framework consists of six modules: variable ranking with machine learning, variable transformation, score derivation, model selection, domain knowledge-based score fine-tuning, and performance evaluation.The details are described in our research paper<doi:10.2196/21798>. Users or clinicians could seamlessly generate parsimonious sparse-score risk models (i.e., risk scores), which can be easily implemented and validated in clinical practice. We hope to see its application in various medical case studies.
Maintained by Feng Xie. Last updated 15 days ago.
8.2 match 32 stars 7.70 score 30 scriptsmabelc
ssc:Semi-Supervised Classification Methods
Provides a collection of self-labeled techniques for semi-supervised classification. In semi-supervised classification, both labeled and unlabeled data are used to train a classifier. This learning paradigm has obtained promising results, specifically in the presence of a reduced set of labeled examples. This package implements a collection of self-labeled techniques to construct a classification model. This family of techniques enlarges the original labeled set using the most confident predictions to classify unlabeled data. The techniques implemented can be applied to classification problems in several domains by the specification of a supervised base classifier. At low ratios of labeled data, it can be shown to perform better than classical supervised classifiers.
Maintained by Christoph Bergmeir. Last updated 5 years ago.
12.0 match 9 stars 5.22 score 62 scripts 1 dependentsgabrielodom
mvMonitoring:Multi-State Adaptive Dynamic Principal Component Analysis for Multivariate Process Monitoring
Use multi-state splitting to apply Adaptive-Dynamic PCA (ADPCA) to data generated from a continuous-time multivariate industrial or natural process. Employ PCA-based dimension reduction to extract linear combinations of relevant features, reducing computational burdens. For a description of ADPCA, see <doi:10.1007/s00477-016-1246-2>, the 2016 paper from Kazor et al. The multi-state application of ADPCA is from a manuscript under current revision entitled "Multi-State Multivariate Statistical Process Control" by Odom, Newhart, Cath, and Hering, and is expected to appear in Q1 of 2018.
Maintained by Gabriel Odom. Last updated 1 years ago.
11.8 match 4 stars 5.24 score 29 scriptsschochastics
networkdata:Repository of Network Datasets
The package contains a large collection of network dataset with different context. This includes social networks, animal networks and movie networks. All datasets are in 'igraph' format.
Maintained by David Schoch. Last updated 12 months ago.
12.3 match 143 stars 5.01 score 143 scriptsshaunpwilkinson
aphid:Analysis with Profile Hidden Markov Models
Designed for the development and application of hidden Markov models and profile HMMs for biological sequence analysis. Contains functions for multiple and pairwise sequence alignment, model construction and parameter optimization, file import/export, implementation of the forward, backward and Viterbi algorithms for conditional sequence probabilities, tree-based sequence weighting, and sequence simulation. Features a wide variety of potential applications including database searching, gene-finding and annotation, phylogenetic analysis and sequence classification. Based on the models and algorithms described in Durbin et al (1998, ISBN: 9780521629713).
Maintained by Shaun Wilkinson. Last updated 8 months ago.
9.3 match 22 stars 6.58 score 38 scripts 3 dependentsthomasp85
lime:Local Interpretable Model-Agnostic Explanations
When building complex models, it is often difficult to explain why the model should be trusted. While global measures such as accuracy are useful, they cannot be used for explaining why a model made a specific prediction. 'lime' (a port of the 'lime' 'Python' package) is a method for explaining the outcome of black box models by fitting a local model around the point in question an perturbations of this point. The approach is described in more detail in the article by Ribeiro et al. (2016) <arXiv:1602.04938>.
Maintained by Emil Hvitfeldt. Last updated 3 years ago.
caretmodel-checkingmodel-evaluationmodelingcpp
5.5 match 485 stars 11.07 score 732 scripts 1 dependentsbioc
DECIPHER:Tools for curating, analyzing, and manipulating biological sequences
A toolset for deciphering and managing biological sequences.
Maintained by Erik Wright. Last updated 6 days ago.
clusteringgeneticssequencingdataimportvisualizationmicroarrayqualitycontrolqpcralignmentwholegenomemicrobiomeimmunooncologygenepredictionopenmp
7.2 match 8.40 score 1.1k scripts 14 dependentsbioc
GenomicSuperSignature:Interpretation of RNA-seq experiments through robust, efficient comparison to public databases
This package provides a novel method for interpreting new transcriptomic datasets through near-instantaneous comparison to public archives without high-performance computing requirements. Through the pre-computed index, users can identify public resources associated with their dataset such as gene sets, MeSH term, and publication. Functions to identify interpretable annotations and intuitive visualization options are implemented in this package.
Maintained by Sehyun Oh. Last updated 5 months ago.
transcriptomicssystemsbiologyprincipalcomponentrnaseqsequencingpathwaysclusteringbioconductor-packageexploratory-data-analysisgseameshprincipal-component-analysisrna-sequencing-profilestransferlearning
8.6 match 16 stars 6.97 score 59 scriptschoonghyunryu
alookr:Model Classifier for Binary Classification
A collection of tools that support data splitting, predictive modeling, and model evaluation. A typical function is to split a dataset into a training dataset and a test dataset. Then compare the data distribution of the two datasets. Another feature is to support the development of predictive models and to compare the performance of several predictive models, helping to select the best model.
Maintained by Choonghyun Ryu. Last updated 1 years ago.
11.1 match 12 stars 5.38 score 9 scriptsrueda-lab
iC10TrainingData:Training Datasets for iC10 Package
Training datasets for iC10; which implements the classifier described in the paper 'Genome-driven integrated classification of breast cancer validated in over 7,500 samples' (Ali HR et al., Genome Biology 2014). It uses copy number and/or expression form breast cancer data, trains a pamr classifier (Tibshirani et al.) with the features available and predicts the iC10 group. Genomic annotation for the training dataset has been obtained from Mark Dunning's lluminaHumanv3.db package.
Maintained by Oscar M. Rueda. Last updated 8 months ago.
27.4 match 2.18 score 7 scripts 5 dependentspokotylo
ddalpha:Depth-Based Classification and Calculation of Data Depth
Contains procedures for depth-based supervised learning, which are entirely non-parametric, in particular the DDalpha-procedure (Lange, Mosler and Mozharovskyi, 2014 <doi:10.1007/s00362-012-0488-4>). The training data sample is transformed by a statistical depth function to a compact low-dimensional space, where the final classification is done. It also offers an extension to functional data and routines for calculating certain notions of statistical depth functions. 50 multivariate and 5 functional classification problems are included. (Pokotylo, Mozharovskyi and Dyckerhoff, 2019 <doi:10.18637/jss.v091.i05>).
Maintained by Oleksii Pokotylo. Last updated 6 months ago.
13.5 match 2 stars 4.40 score 211 scripts 7 dependentsrebeccasalles
TSPred:Functions for Benchmarking Time Series Prediction
Functions for defining and conducting a time series prediction process including pre(post)processing, decomposition, modelling, prediction and accuracy assessment. The generated models and its yielded prediction errors can be used for benchmarking other time series prediction methods and for creating a demand for the refinement of such methods. For this purpose, benchmark data from prediction competitions may be used.
Maintained by Rebecca Pontes Salles. Last updated 4 years ago.
benchmarkinglinear-modelsmachine-learningnonstationaritytime-series-forecasttime-series-prediction
10.6 match 24 stars 5.53 score 94 scripts 1 dependentsaviralvijay-gslab
nonet:Weighted Average Ensemble without Training Labels
It provides ensemble capabilities to supervised and unsupervised learning models predictions without using training labels. It decides the relative weights of the different models predictions by using best models predictions as response variable and rest of the mo. User can decide the best model, therefore, It provides freedom to user to ensemble models based on their design solutions.
Maintained by Aviral Vijay. Last updated 6 years ago.
16.9 match 1 stars 3.41 score 17 scriptsfabsig
gpboost:Combining Tree-Boosting with Gaussian Process and Mixed Effects Models
An R package that allows for combining tree-boosting with Gaussian process and mixed effects models. It also allows for independently doing tree-boosting as well as inference and prediction for Gaussian process and mixed effects models. See <https://github.com/fabsig/GPBoost> for more information on the software and Sigrist (2022, JMLR) <https://www.jmlr.org/papers/v23/20-322.html> and Sigrist (2023, TPAMI) <doi:10.1109/TPAMI.2022.3168152> for more information on the methodology.
Maintained by Fabio Sigrist. Last updated 26 days ago.
13.2 match 4.20 score 212 scriptssym33
RecordLinkage:Record Linkage Functions for Linking and Deduplicating Data Sets
Provides functions for linking and deduplicating data sets. Methods based on a stochastic approach are implemented as well as classification algorithms from the machine learning domain. For details, see our paper "The RecordLinkage Package: Detecting Errors in Data" Sariyar M / Borg A (2010) <doi:10.32614/RJ-2010-017>.
Maintained by Murat Sariyar. Last updated 2 years ago.
6.1 match 6 stars 8.96 score 454 scripts 8 dependentsstochastictree
stochtree:Stochastic Tree Ensembles (XBART and BART) for Supervised Learning and Causal Inference
Flexible stochastic tree ensemble software. Robust implementations of Bayesian Additive Regression Trees (BART) Chipman, George, McCulloch (2010) <doi:10.1214/09-AOAS285> for supervised learning and Bayesian Causal Forests (BCF) Hahn, Murray, Carvalho (2020) <doi:10.1214/19-BA1195> for causal inference. Enables model serialization and parallel sampling and provides a low-level interface for custom stochastic forest samplers.
Maintained by Drew Herren. Last updated 7 hours ago.
bartbayesian-machine-learningbayesian-methodsdecision-treesgradient-boosted-treesmachine-learningprobabilistic-modelstree-ensemblescpp
6.4 match 22 stars 8.57 score 40 scriptssbgraves237
Ecdat:Data Sets for Econometrics
Data sets for econometrics, including political science.
Maintained by Spencer Graves. Last updated 4 months ago.
7.3 match 2 stars 7.25 score 740 scripts 3 dependentsbioc
peco:A Supervised Approach for **P**r**e**dicting **c**ell Cycle Pr**o**gression using scRNA-seq data
Our approach provides a way to assign continuous cell cycle phase using scRNA-seq data, and consequently, allows to identify cyclic trend of gene expression levels along the cell cycle. This package provides method and training data, which includes scRNA-seq data collected from 6 individual cell lines of induced pluripotent stem cells (iPSCs), and also continuous cell cycle phase derived from FUCCI fluorescence imaging data.
Maintained by Chiaowen Joyce Hsiao. Last updated 5 months ago.
sequencingrnaseqgeneexpressiontranscriptomicssinglecellsoftwarestatisticalmethodclassificationvisualizationcell-cyclesingle-cell-rna-seq
8.7 match 12 stars 6.09 score 34 scriptsdfalbel
cloudml:Interface to the Google Cloud Machine Learning Platform
Interface to the Google Cloud Machine Learning Platform <https://cloud.google.com/ml-engine>, which provides cloud tools for training machine learning models.
Maintained by Daniel Falbel. Last updated 6 years ago.
13.6 match 3.85 score 141 scriptsbioc
SGCP:SGCP: A semi-supervised pipeline for gene clustering using self-training approach in gene co-expression networks
SGC is a semi-supervised pipeline for gene clustering in gene co-expression networks. SGC consists of multiple novel steps that enable the computation of highly enriched modules in an unsupervised manner. But unlike all existing frameworks, it further incorporates a novel step that leverages Gene Ontology information in a semi-supervised clustering method that further improves the quality of the computed modules.
Maintained by Niloofar AghaieAbiane. Last updated 5 months ago.
geneexpressiongenesetenrichmentnetworkenrichmentsystemsbiologyclassificationclusteringdimensionreductiongraphandnetworkneuralnetworknetworkmrnamicroarrayrnaseqvisualizationbioinformaticsgenecoexpressionnetworkgraphsnetworkclusteringnetworksself-trainingsemi-supervised-learningunsupervised-learning
10.2 match 2 stars 5.12 score 44 scriptsskgrange
rmweather:Tools to Conduct Meteorological Normalisation and Counterfactual Modelling for Air Quality Data
An integrated set of tools to allow data users to conduct meteorological normalisation and counterfactual modelling for air quality data. The meteorological normalisation technique uses predictive random forest models to remove variation of pollutant concentrations so trends and interventions can be explored in a robust way. For examples, see Grange et al. (2018) <doi:10.5194/acp-18-6223-2018> and Grange and Carslaw (2019) <doi:10.1016/j.scitotenv.2018.10.344>. The random forest models can also be used for counterfactual or business as usual (BAU) modelling by using the models to predict, from the model's perspective, the future. For an example, see Grange et al. (2021) <doi:10.5194/acp-2020-1171>.
Maintained by Stuart K. Grange. Last updated 24 days ago.
8.3 match 49 stars 6.24 score 239 scriptscran
SSLR:Semi-Supervised Classification, Regression and Clustering Methods
Providing a collection of techniques for semi-supervised classification, regression and clustering. In semi-supervised problem, both labeled and unlabeled data are used to train a classifier. The package includes a collection of semi-supervised learning techniques: self-training, co-training, democratic, decision tree, random forest, 'S3VM' ... etc, with a fairly intuitive interface that is easy to use.
Maintained by Francisco Jesús Palomares Alabarce. Last updated 4 years ago.
13.9 match 1 stars 3.64 score 73 scriptsrstudio
tensorflow:R Interface to 'TensorFlow'
Interface to 'TensorFlow' <https://www.tensorflow.org/>, an open source software library for numerical computation using data flow graphs. Nodes in the graph represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) communicated between them. The flexible architecture allows you to deploy computation to one or more 'CPUs' or 'GPUs' in a desktop, server, or mobile device with a single 'API'. 'TensorFlow' was originally developed by researchers and engineers working on the Google Brain Team within Google's Machine Intelligence research organization for the purposes of conducting machine learning and deep neural networks research, but the system is general enough to be applicable in a wide variety of other domains as well.
Maintained by Tomasz Kalinowski. Last updated 16 days ago.
3.3 match 1.3k stars 15.35 score 3.2k scripts 74 dependentsjpmml
r2pmml:Convert R Models to PMML
R wrapper for the JPMML-R library <https://github.com/jpmml/jpmml-r>, which converts R models to Predictive Model Markup Language (PMML).
Maintained by Villu Ruusmann. Last updated 13 days ago.
8.0 match 74 stars 6.29 score 35 scriptsklausvigo
kknn:Weighted k-Nearest Neighbors
Weighted k-Nearest Neighbors for Classification, Regression and Clustering.
Maintained by Klaus Schliep. Last updated 4 years ago.
4.5 match 23 stars 11.08 score 4.6k scripts 41 dependentsbnosac
crfsuite:Conditional Random Fields for Labelling Sequential Data in Natural Language Processing
Wraps the 'CRFsuite' library <https://github.com/chokkan/crfsuite> allowing users to fit a Conditional Random Field model and to apply it on existing data. The focus of the implementation is in the area of Natural Language Processing where this R package allows you to easily build and apply models for named entity recognition, text chunking, part of speech tagging, intent recognition or classification of any category you have in mind. Next to training, a small web application is included in the package to allow you to easily construct training data.
Maintained by Jan Wijffels. Last updated 2 years ago.
chunkingconditional-random-fieldscrfcrfsuitedata-scienceintent-classificationnatural-language-processingnernlpcpp
7.8 match 63 stars 6.34 score 35 scriptsegenn
rtemis:Machine Learning and Visualization
Advanced Machine Learning and Visualization. Unsupervised Learning (Clustering, Decomposition), Supervised Learning (Classification, Regression), Cross-Decomposition, Bagging, Boosting, Meta-models. Static and interactive graphics.
Maintained by E.D. Gennatas. Last updated 1 months ago.
data-sciencedata-visualizationmachine-learningmachine-learning-libraryvisualization
6.9 match 145 stars 7.09 score 50 scripts 2 dependentsbioc
ASSIGN:Adaptive Signature Selection and InteGratioN (ASSIGN)
ASSIGN is a computational tool to evaluate the pathway deregulation/activation status in individual patient samples. ASSIGN employs a flexible Bayesian factor analysis approach that adapts predetermined pathway signatures derived either from knowledge-based literature or from perturbation experiments to the cell-/tissue-specific pathway signatures. The deregulation/activation level of each context-specific pathway is quantified to a score, which represents the extent to which a patient sample encompasses the pathway deregulation/activation signature.
Maintained by Ying Shen. Last updated 5 months ago.
softwaregeneexpressionpathwaysbayesian
6.6 match 2 stars 7.37 score 65 scripts 1 dependentsmdsr-book
mdsr:Complement to 'Modern Data Science with R'
A complement to all editions of *Modern Data Science with R* (ISBN: 978-0367191498, publisher URL: <https://www.routledge.com/Modern-Data-Science-with-R/Baumer-Kaplan-Horton/p/book/9780367191498>). This package contains data and code to complete exercises and reproduce examples from the text. It also facilitates connections to the SQL database server used in the book. All editions of the book are supported by this package.
Maintained by Benjamin S. Baumer. Last updated 7 months ago.
6.8 match 38 stars 7.21 score 504 scriptsphilips-software
latrend:A Framework for Clustering Longitudinal Data
A framework for clustering longitudinal datasets in a standardized way. The package provides an interface to existing R packages for clustering longitudinal univariate trajectories, facilitating reproducible and transparent analyses. Additionally, standard tools are provided to support cluster analyses, including repeated estimation, model validation, and model assessment. The interface enables users to compare results between methods, and to implement and evaluate new methods with ease. The 'akmedoids' package is available from <https://github.com/MAnalytics/akmedoids>.
Maintained by Niek Den Teuling. Last updated 2 months ago.
cluster-analysisclustering-evaluationclustering-methodsdata-sciencelongitudinal-clusteringlongitudinal-datamixture-modelstime-series-analysis
7.0 match 30 stars 6.77 score 26 scriptsbnosac
word2vec:Distributed Representations of Words
Learn vector representations of words by continuous bag of words and skip-gram implementations of the 'word2vec' algorithm. The techniques are detailed in the paper "Distributed Representations of Words and Phrases and their Compositionality" by Mikolov et al. (2013), available at <arXiv:1310.4546>.
Maintained by Jan Wijffels. Last updated 1 years ago.
embeddingsnatural-language-processingword2veccpp
5.6 match 70 stars 8.36 score 227 scripts 6 dependentsbioc
CellNOptR:Training of boolean logic models of signalling networks using prior knowledge networks and perturbation data
This package does optimisation of boolean logic networks of signalling pathways based on a previous knowledge network and a set of data upon perturbation of the nodes in the network.
Maintained by Attila Gabor. Last updated 5 months ago.
cellbasedassayscellbiologyproteomicspathwaysnetworktimecourseimmunooncology
6.9 match 6.72 score 98 scripts 6 dependentsndphillips
FFTrees:Generate, Visualise, and Evaluate Fast-and-Frugal Decision Trees
Create, visualize, and test fast-and-frugal decision trees (FFTs) using the algorithms and methods described by Phillips, Neth, Woike & Gaissmaier (2017), <doi:10.1017/S1930297500006239>. FFTs are simple and transparent decision trees for solving binary classification problems. FFTs can be preferable to more complex algorithms because they require very little information, are easy to understand and communicate, and are robust against overfitting.
Maintained by Hansjoerg Neth. Last updated 5 months ago.
4.9 match 136 stars 9.53 score 144 scriptsr-lib
generics:Common S3 Generics not Provided by Base R Methods Related to Model Fitting
In order to reduce potential package dependencies and conflicts, generics provides a number of commonly used S3 generics.
Maintained by Hadley Wickham. Last updated 1 years ago.
3.3 match 61 stars 14.00 score 131 scripts 9.8k dependentsbioc
ClassifyR:A framework for cross-validated classification problems, with applications to differential variability and differential distribution testing
The software formalises a framework for classification and survival model evaluation in R. There are four stages; Data transformation, feature selection, model training, and prediction. The requirements of variable types and variable order are fixed, but specialised variables for functions can also be provided. The framework is wrapped in a driver loop that reproducibly carries out a number of cross-validation schemes. Functions for differential mean, differential variability, and differential distribution are included. Additional functions may be developed by the user, by creating an interface to the framework.
Maintained by Dario Strbenac. Last updated 7 days ago.
5.5 match 5 stars 8.36 score 45 scripts 3 dependentsbioc
xCell2:A Tool for Generic Cell Type Enrichment Analysis
xCell2 provides methods for cell type enrichment analysis using cell type signatures. It includes three main functions - 1. xCell2Train for training custom references objects from bulk or single-cell RNA-seq datasets. 2. xCell2Analysis for conducting the cell type enrichment analysis using the custom reference. 3. xCell2GetLineage for identifying dependencies between different cell types using ontology.
Maintained by Almog Angel. Last updated 1 days ago.
geneexpressiontranscriptomicsmicroarrayrnaseqsinglecelldifferentialexpressionimmunooncologygenesetenrichment
7.3 match 6 stars 6.16 score 15 scriptsgjmvanboxtel
gsignal:Signal Processing
R implementation of the 'Octave' package 'signal', containing a variety of signal processing tools, such as signal generation and measurement, correlation and convolution, filtering, filter design, filter analysis and conversion, power spectrum analysis, system identification, decimation and sample rate change, and windowing.
Maintained by Geert van Boxtel. Last updated 2 months ago.
4.5 match 24 stars 10.03 score 133 scripts 34 dependentsbioc
HIBAG:HLA Genotype Imputation with Attribute Bagging
Imputes HLA classical alleles using GWAS SNP data, and it relies on a training set of HLA and SNP genotypes. HIBAG can be used by researchers with published parameter estimates instead of requiring access to large training sample datasets. It combines the concepts of attribute bagging, an ensemble classifier method, with haplotype inference for SNPs and HLA types. Attribute bagging is a technique which improves the accuracy and stability of classifier ensembles using bootstrap aggregating and random variable selection.
Maintained by Xiuwen Zheng. Last updated 4 months ago.
geneticsstatisticalmethodbioinformaticsgpuhlaimputationmhcsnpcpp
5.5 match 30 stars 8.24 score 48 scriptsbnaras
pamr:Pam: Prediction Analysis for Microarrays
Some functions for sample classification in microarrays.
Maintained by Balasubramanian Narasimhan. Last updated 9 months ago.
5.7 match 7.90 score 256 scripts 14 dependentsgluc
data.tree:General Purpose Hierarchical Data Structure
Create tree structures from hierarchical data, and traverse the tree in various orders. Aggregate, cumulate, print, plot, convert to and from data.frame and more. Useful for decision trees, machine learning, finance, conversion from and to JSON, and many other applications.
Maintained by Christoph Glur. Last updated 5 months ago.
3.5 match 209 stars 12.84 score 1.1k scripts 88 dependentstidymodels
butcher:Model Butcher
Provides a set of S3 generics to axe components of fitted model objects and help reduce the size of model objects saved to disk.
Maintained by Julia Silge. Last updated 14 days ago.
3.9 match 132 stars 11.54 score 146 scripts 13 dependentspsychbruce
PsychWordVec:Word Embedding Research Framework for Psychological Science
An integrative toolbox of word embedding research that provides: (1) a collection of 'pre-trained' static word vectors in the '.RData' compressed format <https://psychbruce.github.io/WordVector_RData.pdf>; (2) a series of functions to process, analyze, and visualize word vectors; (3) a range of tests to examine conceptual associations, including the Word Embedding Association Test <doi:10.1126/science.aal4230> and the Relative Norm Distance <doi:10.1073/pnas.1720347115>, with permutation test of significance; (4) a set of training methods to locally train (static) word vectors from text corpora, including 'Word2Vec' <arXiv:1301.3781>, 'GloVe' <doi:10.3115/v1/D14-1162>, and 'FastText' <arXiv:1607.04606>; (5) a group of functions to download 'pre-trained' language models (e.g., 'GPT', 'BERT') and extract contextualized (dynamic) word vectors (based on the R package 'text').
Maintained by Han-Wu-Shuang Bao. Last updated 1 years ago.
bertcosine-similarityfasttextglovegptlanguage-modelnatural-language-processingnlppretrained-modelspsychologysemantic-analysistext-analysistext-miningtsneword-embeddingsword-vectorsword2vecopenjdk
11.1 match 22 stars 4.04 score 10 scriptsbioc
TitanCNA:Subclonal copy number and LOH prediction from whole genome sequencing of tumours
Hidden Markov model to segment and predict regions of subclonal copy number alterations (CNA) and loss of heterozygosity (LOH), and estimate cellular prevalence of clonal clusters in tumour whole genome sequencing data.
Maintained by Gavin Ha. Last updated 5 months ago.
sequencingwholegenomednaseqexomeseqstatisticalmethodcopynumbervariationhiddenmarkovmodelgeneticsgenomicvariationimmunooncology10x-genomicscopy-number-variationgenome-sequencinghmmtumor-heterogeneity
5.3 match 96 stars 8.47 score 68 scriptsnanxstats
stackgbm:Stacked Gradient Boosting Machines
A minimalist implementation of model stacking by Wolpert (1992) <doi:10.1016/S0893-6080(05)80023-1> for boosted tree models. A classic, two-layer stacking model is implemented, where the first layer generates features using gradient boosting trees, and the second layer employs a logistic regression model that uses these features as inputs. Utilities for training the base models and parameters tuning are provided, allowing users to experiment with different ensemble configurations easily. It aims to provide a simple and efficient way to combine multiple gradient boosting models to improve predictive model performance and robustness.
Maintained by Nan Xiao. Last updated 11 months ago.
automlcatboostdecision-treesensemble-learninggbdtgbmgradient-boostinglightgbmmachine-learningmodel-stackingxgboost
8.1 match 25 stars 5.40 score 3 scriptssmaakage85
modelgrid:A Framework for Creating, Managing and Training Multiple Caret Models
A minimalistic but flexible framework that facilitates the creation, management and training of multiple 'caret' models. A model grid consists of two components: (1) a set of settings that is shared by all models by default, and (2) specifications that apply only to the individual models. When the model grid is trained, model and training specifications are first consolidated from the shared and the model specific settings into complete 'caret' model configurations. These models are then trained with the 'train' function from the 'caret' package.
Maintained by Lars Kjeldgaard. Last updated 6 years ago.
caretmachine-learningpredictive-analyticspredictive-modeling
8.1 match 23 stars 5.34 score 19 scriptsandy-iskauskas
hmer:History Matching and Emulation Package
A set of objects and functions for Bayes Linear emulation and history matching. Core functionality includes automated training of emulators to data, diagnostic functions to ensure suitability, and a variety of proposal methods for generating 'waves' of points. For details on the mathematical background, there are many papers available on the topic (see references attached to function help files or the below references); for details of the functions in this package, consult the manual or help files. Iskauskas, A, et al. (2024) <doi:10.18637/jss.v109.i10>. Bower, R.G., Goldstein, M., and Vernon, I. (2010) <doi:10.1214/10-BA524>. Craig, P.S., Goldstein, M., Seheult, A.H., and Smith, J.A. (1997) <doi:10.1007/978-1-4612-2290-3_2>.
Maintained by Andrew Iskauskas. Last updated 12 days ago.
6.0 match 16 stars 7.19 score 37 scriptsdselivanov
mlapi:Abstract Classes for Building 'scikit-learn' Like API
Provides 'R6' abstract classes for building machine learning models with 'scikit-learn' like API. <https://scikit-learn.org/> is a popular module for 'Python' programming language which design became de facto a standard in industry for machine learning tasks.
Maintained by Dmitriy Selivanov. Last updated 3 years ago.
8.0 match 5.36 score 5 scripts 24 dependentsopenintrostat
openintro:Datasets and Supplemental Functions from 'OpenIntro' Textbooks and Labs
Supplemental functions and data for 'OpenIntro' resources, which includes open-source textbooks and resources for introductory statistics (<https://www.openintro.org/>). The package contains datasets used in our open-source textbooks along with custom plotting functions for reproducing book figures. Note that many functions and examples include color transparency; some plotting elements may not show up properly (or at all) when run in some versions of Windows operating system.
Maintained by Mine Çetinkaya-Rundel. Last updated 3 months ago.
3.8 match 240 stars 11.39 score 6.0k scriptsrstudio
vetiver:Version, Share, Deploy, and Monitor Models
The goal of 'vetiver' is to provide fluent tooling to version, share, deploy, and monitor a trained model. Functions handle both recording and checking the model's input data prototype, and predicting from a remote API endpoint. The 'vetiver' package is extensible, with generics that can support many kinds of models.
Maintained by Julia Silge. Last updated 5 months ago.
4.0 match 185 stars 10.48 score 466 scripts 1 dependentsbilldenney
PKNCA:Perform Pharmacokinetic Non-Compartmental Analysis
Compute standard Non-Compartmental Analysis (NCA) parameters for typical pharmacokinetic analyses and summarize them.
Maintained by Bill Denney. Last updated 17 days ago.
ncanoncompartmental-analysispharmacokinetics
3.3 match 73 stars 12.61 score 214 scripts 4 dependentsblasbenito
spatialRF:Easy Spatial Modeling with Random Forest
Automatic generation and selection of spatial predictors for spatial regression with Random Forest. Spatial predictors are surrogates of variables driving the spatial structure of a response variable. The package offers two methods to generate spatial predictors from a distance matrix among training cases: 1) Moran's Eigenvector Maps (MEMs; Dray, Legendre, and Peres-Neto 2006 <DOI:10.1016/j.ecolmodel.2006.02.015>): computed as the eigenvectors of a weighted matrix of distances; 2) RFsp (Hengl et al. <DOI:10.7717/peerj.5518>): columns of the distance matrix used as spatial predictors. Spatial predictors help minimize the spatial autocorrelation of the model residuals and facilitate an honest assessment of the importance scores of the non-spatial predictors. Additionally, functions to reduce multicollinearity, identify relevant variable interactions, tune random forest hyperparameters, assess model transferability via spatial cross-validation, and explore model results via partial dependence curves and interaction surfaces are included in the package. The modelling functions are built around the highly efficient 'ranger' package (Wright and Ziegler 2017 <DOI:10.18637/jss.v077.i01>).
Maintained by Blas M. Benito. Last updated 3 years ago.
random-forestspatial-analysisspatial-regression
7.7 match 114 stars 5.45 score 49 scriptscran
soundgen:Sound Synthesis and Acoustic Analysis
Performs parametric synthesis of sounds with harmonic and noise components such as animal vocalizations or human voice. Also offers tools for audio manipulation and acoustic analysis, including pitch tracking, spectral analysis, audio segmentation, pitch and formant shifting, etc. Includes four interactive web apps for synthesizing and annotating audio, manually correcting pitch contours, and measuring formant frequencies. Reference: Anikin (2019) <doi:10.3758/s13428-018-1095-7>.
Maintained by Andrey Anikin. Last updated 2 months ago.
8.6 match 1 stars 4.86 score 110 scripts 2 dependentshmjianggatech
SAM:Sparse Additive Modelling
Computationally efficient tools for high dimensional predictive modeling (regression and classification). SAM is short for sparse additive modeling, and adopts the computationally efficient basis spline technique. We solve the optimization problems by various computational algorithms including the block coordinate descent algorithm, fast iterative soft-thresholding algorithm, and newton method. The computation is further accelerated by warm-start and active-set tricks.
Maintained by Haoming Jiang. Last updated 3 years ago.
7.1 match 6 stars 5.86 score 20 scripts 4 dependentspaulhendricks
titanic:Titanic Passenger Survival Data Set
This data set provides information on the fate of passengers on the fatal maiden voyage of the ocean liner "Titanic", with variables such as economic status (class), sex, age, and survival. Whereas the base R Titanic data found by calling data("Titanic") is an array resulting from cross-tabulating 2201 observations, these data sets are individual non-aggregated observations and formatted in a machine learning context with a training sample, a testing sample, and two additional data sets that can be used for deeper machine learning analysis. These data sets are used in a very well known Kaggle competition; formatting the raw data sets in a package hopefully lowers the barrier to entry for users new to R and machine learning.
Maintained by Paul Hendricks. Last updated 8 years ago.
4.5 match 10 stars 8.95 score 804 scripts 2 dependentsltorgo
DMwR2:Functions and Data for the Second Edition of "Data Mining with R"
Functions and data accompanying the second edition of the book "Data Mining with R, learning with case studies" by Luis Torgo, published by CRC Press.
Maintained by Luis Torgo. Last updated 8 years ago.
5.3 match 27 stars 7.46 score 380 scripts 2 dependentsben519
mltools:Machine Learning Tools
A collection of machine learning helper functions, particularly assisting in the Exploratory Data Analysis phase. Makes heavy use of the 'data.table' package for optimal speed and memory efficiency. Highlights include a versatile bin_data() function, sparsify() for converting a data.table to sparse matrix format with one-hot encoding, fast evaluation metrics, and empirical_cdf() for calculating empirical Multivariate Cumulative Distribution Functions.
Maintained by Ben Gorman. Last updated 3 years ago.
exploratory-data-analysismachine-learning
4.0 match 72 stars 9.58 score 1.2k scripts 13 dependentsmanalytics
opitools:Analyzing the Opinions in a Big Text Document
Designed for performing impact analysis of opinions in a digital text document (DTD). The package allows a user to assess the extent to which a theme or subject within a document impacts the overall opinion expressed in the document. The package can be applied to a wide range of opinion-based DTD, including commentaries on social media platforms (such as 'Facebook', 'Twitter' and 'Youtube'), online products reviews, and so on. The utility of 'opitools' was originally demonstrated in Adepeju and Jimoh (2021) <doi:10.31235/osf.io/c32qh> in the assessment of COVID-19 impacts on neighbourhood policing using Twitter data. Further examples can be found in the vignette of the package.
Maintained by Monsuru Adepeju. Last updated 2 years ago.
7.2 match 12 stars 5.30 score 11 scriptsbioc
bambu:Context-Aware Transcript Quantification from Long Read RNA-Seq data
bambu is a R package for multi-sample transcript discovery and quantification using long read RNA-Seq data. You can use bambu after read alignment to obtain expression estimates for known and novel transcripts and genes. The output from bambu can directly be used for visualisation and downstream analysis such as differential gene expression or transcript usage.
Maintained by Ying Chen. Last updated 1 months ago.
alignmentcoveragedifferentialexpressionfeatureextractiongeneexpressiongenomeannotationgenomeassemblyimmunooncologylongreadmultiplecomparisonnormalizationrnaseqregressionsequencingsoftwaretranscriptiontranscriptomicsbambubioconductorlong-readsnanoporenanopore-sequencingrna-seqrna-seq-analysistranscript-quantificationtranscript-reconstructioncpp
4.2 match 197 stars 9.03 score 91 scripts 1 dependentsrolkra
explore:Simplifies Exploratory Data Analysis
Interactive data exploration with one line of code, automated reporting or use an easy to remember set of tidy functions for low code exploratory data analysis.
Maintained by Roland Krasser. Last updated 3 months ago.
data-explorationdata-visualisationdecision-treesedarmarkdownshinytidy
3.3 match 228 stars 11.43 score 221 scripts 1 dependentsandyliaw-mrk
locfit:Local Regression, Likelihood and Density Estimation
Local regression, likelihood and density estimation methods as described in the 1999 book by Loader.
Maintained by Andy Liaw. Last updated 12 days ago.
4.0 match 1 stars 9.40 score 428 scripts 606 dependentsloelschlaeger
RprobitB:Bayesian Probit Choice Modeling
Bayes estimation of probit choice models, both in the cross-sectional and panel setting. The package can analyze binary, multivariate, ordered, and ranked choices, as well as heterogeneity of choice behavior among deciders. The main functionality includes model fitting via Markov chain Monte Carlo m ethods, tools for convergence diagnostic, choice data simulation, in-sample and out-of-sample choice prediction, and model selection using information criteria and Bayes factors. The latent class model extension facilitates preference-based decider classification, where the number of latent classes can be inferred via the Dirichlet process or a weight-based updating heuristic. This allows for flexible modeling of choice behavior without the need to impose structural constraints. For a reference on the method see Oelschlaeger and Bauer (2021) <https://trid.trb.org/view/1759753>.
Maintained by Lennart Oelschläger. Last updated 5 months ago.
bayesdiscrete-choiceprobitopenblascppopenmp
6.8 match 4 stars 5.45 score 1 scriptsgorelab
waves:Vis-NIR Spectral Analysis Wrapper
Originally designed application in the context of resource-limited plant research and breeding programs, 'waves' provides an open-source solution to spectral data processing and model development by bringing useful packages together into a streamlined pipeline. This package is wrapper for functions related to the analysis of point visible and near-infrared reflectance measurements. It includes visualization, filtering, aggregation, preprocessing, cross-validation set formation, model training, and prediction functions to enable open-source association of spectral and reference data. This package is documented in a peer-reviewed manuscript in the Plant Phenome Journal <doi:10.1002/ppj2.20012>. Specialized cross-validation schemes are described in detail in Jarquín et al. (2017) <doi:10.3835/plantgenome2016.12.0130>. Example data is from Ikeogu et al. (2017) <doi:10.1371/journal.pone.0188918>.
Maintained by Jenna Hershberger. Last updated 11 months ago.
6.1 match 6 stars 5.98 score 53 scriptsmidasverse
rMIDAS:Multiple Imputation with Denoising Autoencoders
A tool for multiply imputing missing data using 'MIDAS', a deep learning method based on denoising autoencoder neural networks. This algorithm offers significant accuracy and efficiency advantages over other multiple imputation strategies, particularly when applied to large datasets with complex features. Alongside interfacing with 'Python' to run the core algorithm, this package contains functions for processing data before and after model training, running imputation model diagnostics, generating multiple completed datasets, and estimating regression models on these datasets.
Maintained by Thomas Robinson. Last updated 1 years ago.
deep-learningimputation-methodsneural-networkreticulatetensorflow
5.6 match 34 stars 6.53 score 33 scriptssmartdata-analysis-and-statistics
precmed:Precision Medicine
A doubly robust precision medicine approach to fit, cross-validate and visualize prediction models for the conditional average treatment effect (CATE). It implements doubly robust estimation and semiparametric modeling approach of treatment-covariate interactions as proposed by Yadlowsky et al. (2020) <doi:10.1080/01621459.2020.1772080>.
Maintained by Thomas Debray. Last updated 5 months ago.
8.7 match 4 stars 4.20 score 4 scriptsips-lmu
emuR:Main Package of the EMU Speech Database Management System
Provide the EMU Speech Database Management System (EMU-SDMS) with database management, data extraction, data preparation and data visualization facilities. See <https://ips-lmu.github.io/The-EMU-SDMS-Manual/> for more details.
Maintained by Markus Jochim. Last updated 1 years ago.
5.3 match 24 stars 6.89 score 135 scripts 1 dependentsdtkaplan
LSTbook:Data and Software for "Lessons in Statistical Thinking"
"Lessons in Statistical Thinking" D.T. Kaplan (2014) <https://dtkaplan.github.io/Lessons-in-statistical-thinking/> is a textbook for a first or second course in statistics that embraces data wrangling, causal reasoning, modeling, statistical adjustment, and simulation. 'LSTbook' supports the student-centered, tidy, pipeline-oriented computing style featured in the book.
Maintained by Daniel Kaplan. Last updated 2 days ago.
5.8 match 4 stars 6.29 score 27 scriptsbioc
cellity:Quality Control for Single-Cell RNA-seq Data
A support vector machine approach to identifying and filtering low quality cells from single-cell RNA-seq datasets.
Maintained by Tomislav Ilicic. Last updated 5 months ago.
immunooncologyrnaseqqualitycontrolpreprocessingnormalizationvisualizationdimensionreductiontranscriptomicsgeneexpressionsequencingsoftwaresupportvectormachine
9.0 match 4.00 score 9 scriptstarnduong
ks:Kernel Smoothing
Kernel smoothers for univariate and multivariate data, with comprehensive visualisation and bandwidth selection capabilities, including for densities, density derivatives, cumulative distributions, clustering, classification, density ridges, significant modal regions, and two-sample hypothesis tests. Chacon & Duong (2018) <doi:10.1201/9780429485572>.
Maintained by Tarn Duong. Last updated 6 months ago.
3.5 match 6 stars 10.14 score 920 scripts 262 dependentstrangdata
treeheatr:Heatmap-Integrated Decision Tree Visualizations
Creates interpretable decision tree visualizations with the data represented as a heatmap at the tree's leaf nodes. 'treeheatr' utilizes the customizable 'ggparty' package for drawing decision trees.
Maintained by Trang Le. Last updated 2 years ago.
datavizdecision-treesggplotheatmapvisualization
6.2 match 57 stars 5.71 score 18 scriptsgokmenzararsiz
dtComb:Statistical Combination of Diagnostic Tests
A system for combining two diagnostic tests using various approaches that include statistical and machine-learning-based methodologies. These approaches are divided into four groups: linear combination methods, non-linear combination methods, mathematical operators, and machine learning algorithms. See the <https://biotools.erciyes.edu.tr/dtComb/> website for more information, documentation, and examples.
Maintained by Gokmen Zararsiz. Last updated 5 months ago.
7.5 match 4.70 score 7 scriptsrrwen
nbc4va:Bayes Classifier for Verbal Autopsy Data
An implementation of the Naive Bayes Classifier (NBC) algorithm used for Verbal Autopsy (VA) built on code from Miasnikof et al (2015) <DOI:10.1186/s12916-015-0521-2>.
Maintained by Richard Wen. Last updated 3 years ago.
autopsybayescauseclassifiercodedcomputerdeathestimateimputationlearningmachinemdsmillionnaivenbcprobabilitystudytheoryvaverbal
7.6 match 4.60 score 79 scriptsmsainsburydale
NeuralEstimators:Likelihood-Free Parameter Estimation using Neural Networks
An 'R' interface to the 'Julia' package 'NeuralEstimators.jl'. The package facilitates the user-friendly development of neural Bayes estimators, which are neural networks that map data to a point summary of the posterior distribution (Sainsbury-Dale et al., 2024, <doi:10.1080/00031305.2023.2249522>). These estimators are likelihood-free and amortised, in the sense that, once the neural networks are trained on simulated data, inference from observed data can be made in a fraction of the time required by conventional approaches. The package also supports amortised Bayesian or frequentist inference using neural networks that approximate the posterior or likelihood-to-evidence ratio (Zammit-Mangion et al., 2025, Sec. 3.2, 5.2, <doi:10.48550/arXiv.2404.12484>). The package accommodates any model for which simulation is feasible by allowing users to define models implicitly through simulated data.
Maintained by Matthew Sainsbury-Dale. Last updated 20 hours ago.
5.8 match 9 stars 6.00 score 3 scriptsemeyers
NeuroDecodeR:Decode Information from Neural Activity
Neural decoding is method of analyzing neural data that uses a pattern classifiers to predict experimental conditions based on neural activity. 'NeuroDecodeR' is a system of objects that makes it easy to run neural decoding analyses. For more information on neural decoding see Meyers & Kreiman (2011) <doi:10.7551/mitpress/8404.003.0024>.
Maintained by Ethan Meyers. Last updated 1 years ago.
5.4 match 12 stars 6.49 score 17 scriptsbioc
infinityFlow:Augmenting Massively Parallel Cytometry Experiments Using Multivariate Non-Linear Regressions
Pipeline to analyze and merge data files produced by BioLegend's LEGENDScreen or BD Human Cell Surface Marker Screening Panel (BD Lyoplates).
Maintained by Etienne Becht. Last updated 5 months ago.
softwareflowcytometrycellbasedassayssinglecellproteomics
9.6 match 3.60 score 4 scriptsbioc
MetaNeighbor:Single cell replicability analysis
MetaNeighbor allows users to quantify cell type replicability across datasets using neighbor voting.
Maintained by Stephan Fischer. Last updated 5 months ago.
immunooncologygeneexpressiongomultiplecomparisonsinglecelltranscriptomics
5.8 match 5.89 score 78 scriptsmothur
phylotypr:Classifying DNA Sequences to Taxonomic Groupings
Classification based analysis of DNA sequences to taxonomic groupings. This package primarily implements Naive Bayesian Classifier from the Ribosomal Database Project. This approach has traditionally been used to classify 16S rRNA gene sequences to bacterial taxonomic outlines; however, it can be used for any type of gene sequence. The method was originally described by Wang, Garrity, Tiedje, and Cole in Applied and Environmental Microbiology 73(16):5261-7 <doi:10.1128/AEM.00062-07>. The package also provides functions to read in 'FASTA'-formatted sequence data.
Maintained by Pat Schloss. Last updated 24 days ago.
5.6 match 8 stars 6.08 score 5 scriptsbioc
MLSeq:Machine Learning Interface for RNA-Seq Data
This package applies several machine learning methods, including SVM, bagSVM, Random Forest and CART to RNA-Seq data.
Maintained by Gokmen Zararsiz. Last updated 5 months ago.
immunooncologysequencingrnaseqclassificationclustering
7.0 match 4.81 score 27 scripts 1 dependentstlverse
sl3:Pipelines for Machine Learning and Super Learning
A modern implementation of the Super Learner prediction algorithm, coupled with a general purpose framework for composing arbitrary pipelines for machine learning tasks.
Maintained by Jeremy Coyle. Last updated 4 months ago.
data-scienceensemble-learningensemble-modelmachine-learningmodel-selectionregressionstackingstatistics
3.4 match 100 stars 9.94 score 748 scripts 7 dependentsklarsen1
Information:Data Exploration with Information Theory (Weight-of-Evidence and Information Value)
Performs exploratory data analysis and variable screening for binary classification models using weight-of-evidence (WOE) and information value (IV). In order to make the package as efficient as possible, aggregations are done in data.table and creation of WOE vectors can be distributed across multiple cores. The package also supports exploration for uplift models (NWOE and NIV).
Maintained by Larsen Kim. Last updated 9 years ago.
4.5 match 44 stars 7.45 score 118 scriptsmlr-org
mlr3pipelines:Preprocessing Operators and Pipelines for 'mlr3'
Dataflow programming toolkit that enriches 'mlr3' with a diverse set of pipelining operators ('PipeOps') that can be composed into graphs. Operations exist for data preprocessing, model fitting, and ensemble learning. Graphs can themselves be treated as 'mlr3' 'Learners' and can therefore be resampled, benchmarked, and tuned.
Maintained by Martin Binder. Last updated 9 days ago.
baggingdata-sciencedataflow-programmingensemble-learningmachine-learningmlr3pipelinespreprocessingstacking
2.7 match 141 stars 12.36 score 448 scripts 7 dependentsverbal-autopsy-software
InSilicoVA:Probabilistic Verbal Autopsy Coding with 'InSilicoVA' Algorithm
Computes individual causes of death and population cause-specific mortality fractions using the 'InSilicoVA' algorithm from McCormick et al. (2016) <DOI:10.1080/01621459.2016.1152191>. It uses data derived from verbal autopsy (VA) interviews, in a format similar to the input of the widely used 'InterVA' method. This package provides general model fitting and customization for 'InSilicoVA' algorithm and basic graphical visualization of the output.
Maintained by Zehang Richard Li. Last updated 1 months ago.
5.8 match 3 stars 5.67 score 35 scripts 1 dependentsnhs-r-community
NHSRdatasets:NHS and Healthcare-Related Data for Education and Training
Free United Kingdom National Health Service (NHS) and other healthcare, or population health-related data for education and training purposes. This package contains synthetic data based on real healthcare datasets, or cuts of open-licenced official data. This package exists to support skills development in the NHS-R community: <https://nhsrcommunity.com/>.
Maintained by Zoë Turner. Last updated 3 months ago.
3.4 match 70 stars 9.26 score 166 scriptsnmecsys
BETS:Brazilian Economic Time Series
It provides access to and information about the most important Brazilian economic time series - from the Getulio Vargas Foundation <http://portal.fgv.br/en>, the Central Bank of Brazil <http://www.bcb.gov.br> and the Brazilian Institute of Geography and Statistics <http://www.ibge.gov.br>. It also presents tools for managing, analysing (e.g. generating dynamic reports with a complete analysis of a series) and exporting these time series.
Maintained by Talitha Speranza. Last updated 4 years ago.
4.0 match 38 stars 7.82 score 108 scriptsdcauseur
FADA:Variable Selection for Supervised Classification in High Dimension
The functions provided in the FADA (Factor Adjusted Discriminant Analysis) package aim at performing supervised classification of high-dimensional and correlated profiles. The procedure combines a decorrelation step based on a factor modeling of the dependence among covariates and a classification method. The available methods are Lasso regularized logistic model (see Friedman et al. (2010)), sparse linear discriminant analysis (see Clemmensen et al. (2011)), shrinkage linear and diagonal discriminant analysis (see M. Ahdesmaki et al. (2010)). More methods of classification can be used on the decorrelated data provided by the package FADA.
Maintained by David Causeur. Last updated 5 years ago.
16.6 match 1.90 score 6 scriptsubcxzhang
iimi:Identifying Infection with Machine Intelligence
A novel machine learning method for plant viruses diagnostic using genome sequencing data. This package includes three different machine learning models, random forest, XGBoost, and elastic net, to train and predict mapped genome samples. Mappability profile and unreliable regions are introduced to the algorithm, and users can build a mappability profile from scratch with functions included in the package. Plotting mapped sample coverage information is provided.
Maintained by Xuekui Zhang. Last updated 5 months ago.
12.0 match 2.60 score 5 scriptswinvector
vtreat:A Statistically Sound 'data.frame' Processor/Conditioner
A 'data.frame' processor/conditioner that prepares real-world data for predictive modeling in a statistically sound manner. 'vtreat' prepares variables so that data has fewer exceptional cases, making it easier to safely use models in production. Common problems 'vtreat' defends against: 'Inf', 'NA', too many categorical levels, rare categorical levels, and new categorical levels (levels seen during application, but not during training). Reference: "'vtreat': a data.frame Processor for Predictive Modeling", Zumel, Mount, 2016, <DOI:10.5281/zenodo.1173313>.
Maintained by John Mount. Last updated 2 months ago.
categorical-variablesmachine-learning-algorithmsnested-modelsprepare-data
2.8 match 285 stars 11.19 score 328 scripts 1 dependentsbioc
structToolbox:Data processing & analysis tools for Metabolomics and other omics
An extensive set of data (pre-)processing and analysis methods and tools for metabolomics and other omics, with a strong emphasis on statistics and machine learning. This toolbox allows the user to build extensive and standardised workflows for data analysis. The methods and tools have been implemented using class-based templates provided by the struct (Statistics in R Using Class-based Templates) package. The toolbox includes pre-processing methods (e.g. signal drift and batch correction, normalisation, missing value imputation and scaling), univariate (e.g. ttest, various forms of ANOVA, Kruskal–Wallis test and more) and multivariate statistical methods (e.g. PCA and PLS, including cross-validation and permutation testing) as well as machine learning methods (e.g. Support Vector Machines). The STATistics Ontology (STATO) has been integrated and implemented to provide standardised definitions for the different methods, inputs and outputs.
Maintained by Gavin Rhys Lloyd. Last updated 26 days ago.
workflowstepmetabolomicsbioconductor-packagedimslc-msmachine-learningmultivariate-analysisstatisticsunivariate
4.9 match 10 stars 6.26 score 12 scriptsevolecolgroup
tidysdm:Species Distribution Models with Tidymodels
Fit species distribution models (SDMs) using the 'tidymodels' framework, which provides a standardised interface to define models and process their outputs. 'tidysdm' expands 'tidymodels' by providing methods for spatial objects, models and metrics specific to SDMs, as well as a number of specialised functions to process occurrences for contemporary and palaeo datasets. The full functionalities of the package are described in Leonardi et al. (2023) <doi:10.1101/2023.07.24.550358>.
Maintained by Andrea Manica. Last updated 10 days ago.
species-distribution-modellingtidymodels
3.5 match 31 stars 8.82 score 51 scriptsai-jyc
GENEAclassify:Segmentation and Classification of Accelerometer Data
Segmentation and classification procedures for data from the 'Activinsights GENEActiv' <https://activinsights.com/technology/geneactiv/> accelerometer that provides the user with a model to guess behaviour from test data where behaviour is missing. Includes a step counting algorithm, a function to create segmented data with custom features and a function to use recursive partitioning provided in the function rpart() of the 'rpart' package to create classification models.
Maintained by Jia Ying Chua. Last updated 1 years ago.
7.9 match 1 stars 3.88 score 51 scriptsmlverse
torch:Tensors and Neural Networks with 'GPU' Acceleration
Provides functionality to define and train neural networks similar to 'PyTorch' by Paszke et al (2019) <doi:10.48550/arXiv.1912.01703> but written entirely in R using the 'libtorch' library. Also supports low-level tensor operations and 'GPU' acceleration.
Maintained by Daniel Falbel. Last updated 7 days ago.
1.8 match 520 stars 16.52 score 1.4k scripts 38 dependentsemilhvitfeldt
fastTextR:An Interface to the 'fastText' Library
An interface to the 'fastText' library <https://github.com/facebookresearch/fastText>. The package can be used for text classification and to learn word vectors. An example how to use 'fastTextR' can be found in the 'README' file.
Maintained by Emil Hvitfeldt. Last updated 1 years ago.
5.5 match 4 stars 5.50 score 44 scripts 2 dependentsramikrispin
TSstudio:Functions for Time Series Analysis and Forecasting
Provides a set of tools for descriptive and predictive analysis of time series data. That includes functions for interactive visualization of time series objects and as well utility functions for automation time series forecasting.
Maintained by Rami Krispin. Last updated 2 years ago.
forecastingtime-seriestimeseriestsstudiovisualization
3.4 match 425 stars 9.02 score 656 scriptsdrordas
D2MCS:Data Driving Multiple Classifier System
Provides a novel framework to able to automatically develop and deploy an accurate Multiple Classifier System based on the feature-clustering distribution achieved from an input dataset. 'D2MCS' was developed focused on four main aspects: (i) the ability to determine an effective method to evaluate the independence of features, (ii) the identification of the optimal number of feature clusters, (iii) the training and tuning of ML models and (iv) the execution of voting schemes to combine the outputs of each classifier comprising the Multiple Classifier System.
Maintained by Miguel Ferreiro-Díaz. Last updated 3 years ago.
8.2 match 3.70 scorealanarnholt
BSDA:Basic Statistics and Data Analysis
Data sets for book "Basic Statistics and Data Analysis" by Larry J. Kitchens.
Maintained by Alan T. Arnholt. Last updated 2 years ago.
3.3 match 7 stars 9.11 score 1.3k scripts 6 dependentsarthurleroy
MagmaClustR:Clustering and Prediction using Multi-Task Gaussian Processes with Common Mean
An implementation for the multi-task Gaussian processes with common mean framework. Two main algorithms, called 'Magma' and 'MagmaClust', are available to perform predictions for supervised learning problems, in particular for time series or any functional/continuous data applications. The corresponding articles has been respectively proposed by Arthur Leroy, Pierre Latouche, Benjamin Guedj and Servane Gey (2022) <doi:10.1007/s10994-022-06172-1>, and Arthur Leroy, Pierre Latouche, Benjamin Guedj and Servane Gey (2023) <https://jmlr.org/papers/v24/20-1321.html>. Theses approaches leverage the learning of cluster-specific mean processes, which are common across similar tasks, to provide enhanced prediction performances (even far from data) at a linear computational cost (in the number of tasks). 'MagmaClust' is a generalisation of 'Magma' where the tasks are simultaneously clustered into groups, each being associated to a specific mean process. User-oriented functions in the package are decomposed into training, prediction and plotting functions. Some basic features (classic kernels, training, prediction) of standard Gaussian processes are also implemented.
Maintained by Arthur Leroy. Last updated 3 months ago.
gaussian-processesmulti-task-learningmulti-task-predictioncpp
6.2 match 14 stars 4.80 score 15 scriptseagerai
tfaddons:Interface to 'TensorFlow SIG Addons'
'TensorFlow SIG Addons' <https://www.tensorflow.org/addons> is a repository of community contributions that conform to well-established API patterns, but implement new functionality not available in core 'TensorFlow'. 'TensorFlow' natively supports a large number of operators, layers, metrics, losses, optimizers, and more. However, in a fast moving field like Machine Learning, there are many interesting new developments that cannot be integrated into core 'TensorFlow' (because their broad applicability is not yet clear, or it is mostly used by a smaller subset of the community).
Maintained by Turgut Abdullayev. Last updated 3 years ago.
deep-learningkerasneural-networkstensorflowtensorflow-addonstfa
5.7 match 20 stars 5.20 score 16 scriptscbergmeir
RSNNS:Neural Networks using the Stuttgart Neural Network Simulator (SNNS)
The Stuttgart Neural Network Simulator (SNNS) is a library containing many standard implementations of neural networks. This package wraps the SNNS functionality to make it available from within R. Using the 'RSNNS' low-level interface, all of the algorithmic functionality and flexibility of SNNS can be accessed. Furthermore, the package contains a convenient high-level interface, so that the most common neural network topologies and learning algorithms integrate seamlessly into R.
Maintained by Christoph Bergmeir. Last updated 1 years ago.
3.3 match 26 stars 8.90 score 426 scripts 9 dependentsbioc
DeProViR:A Deep-Learning Framework Based on Pre-trained Sequence Embeddings for Predicting Host-Viral Protein-Protein Interactions
Emerging infectious diseases, exemplified by the zoonotic COVID-19 pandemic caused by SARS-CoV-2, are grave global threats. Understanding protein-protein interactions (PPIs) between host and viral proteins is essential for therapeutic targets and insights into pathogen replication and immune evasion. While experimental methods like yeast two-hybrid screening and mass spectrometry provide valuable insights, they are hindered by experimental noise and costs, yielding incomplete interaction maps. Computational models, notably DeProViR, predict PPIs from amino acid sequences, incorporating semantic information with GloVe embeddings. DeProViR employs a Siamese neural network, integrating convolutional and Bi-LSTM networks to enhance accuracy. It overcomes the limitations of feature engineering, offering an efficient means to predict host-virus interactions, which holds promise for antiviral therapies and advancing our understanding of infectious diseases.
Maintained by Matineh Rahmatbakhsh. Last updated 5 months ago.
proteomicssystemsbiologynetworkinferenceneuralnetworknetwork
9.8 match 1 stars 3.00 score 1 scriptsbusiness-science
modeltime.resample:Resampling Tools for Time Series Forecasting
A 'modeltime' extension that implements forecast resampling tools that assess time-based model performance and stability for a single time series, panel data, and cross-sectional time series analysis.
Maintained by Matt Dancho. Last updated 1 years ago.
accuracy-metricsbacktestingbootstrapbootstrappingcross-validationforecastingmodeltimemodeltime-resampleresamplingstatisticstidymodelstime-series
4.4 match 19 stars 6.64 score 38 scripts 1 dependentspgiraudoux
pgirmess:Spatial Analysis and Data Mining for Field Ecologists
Set of tools for reading, writing and transforming spatial and seasonal data, model selection and specific statistical tests for ecologists. It includes functions to interpolate regular positions of points between landmarks, to discretize polylines into regular point positions, link distant observations to points and convert a bounding box in a spatial object. It also provides miscellaneous functions for field ecologists such as spatial statistics and inference on diversity indexes, writing data.frame with Chinese characters.
Maintained by Patrick Giraudoux. Last updated 1 years ago.
4.0 match 5 stars 7.32 score 422 scripts 2 dependentsdavid-cortes
recometrics:Evaluation Metrics for Implicit-Feedback Recommender Systems
Calculates evaluation metrics for implicit-feedback recommender systems that are based on low-rank matrix factorization models, given the fitted model matrices and data, thus allowing to compare models from a variety of libraries. Metrics include P@K (precision-at-k, for top-K recommendations), R@K (recall at k), AP@K (average precision at k), NDCG@K (normalized discounted cumulative gain at k), Hit@K (from which the 'Hit Rate' is calculated), RR@K (reciprocal rank at k, from which the 'MRR' or 'mean reciprocal rank' is calculated), ROC-AUC (area under the receiver-operating characteristic curve), and PR-AUC (area under the precision-recall curve). These are calculated on a per-user basis according to the ranking of items induced by the model, using efficient multi-threaded routines. Also provides functions for creating train-test splits for model fitting and evaluation.
Maintained by David Cortes. Last updated 2 months ago.
implicit-feedbackmatrix-factorizationrecommender-systemsopenblascppopenmp
5.3 match 28 stars 5.45 scoretlverse
origami:Generalized Framework for Cross-Validation
A general framework for the application of cross-validation schemes to particular functions. By allowing arbitrary lists of results, origami accommodates a range of cross-validation applications. This implementation was first described by Coyle and Hejazi (2018) <doi:10.21105/joss.00512>.
Maintained by Jeremy Coyle. Last updated 3 years ago.
cross-validationmachine-learning
3.0 match 27 stars 9.68 score 492 scripts 20 dependentsliuyanguu
SHAPforxgboost:SHAP Plots for 'XGBoost'
Aid in visual data investigations using SHAP (SHapley Additive exPlanation) visualization plots for 'XGBoost' and 'LightGBM'. It provides summary plot, dependence plot, interaction plot, and force plot and relies on the SHAP implementation provided by 'XGBoost' and 'LightGBM'. Please refer to 'slundberg/shap' for the original implementation of SHAP in 'Python'.
Maintained by Yang Liu. Last updated 12 months ago.
3.2 match 113 stars 9.02 score 284 scripts 1 dependentstagteam
riskRegression:Risk Regression Models and Prediction Scores for Survival Analysis with Competing Risks
Implementation of the following methods for event history analysis. Risk regression models for survival endpoints also in the presence of competing risks are fitted using binomial regression based on a time sequence of binary event status variables. A formula interface for the Fine-Gray regression model and an interface for the combination of cause-specific Cox regression models. A toolbox for assessing and comparing performance of risk predictions (risk markers and risk prediction models). Prediction performance is measured by the Brier score and the area under the ROC curve for binary possibly time-dependent outcome. Inverse probability of censoring weighting and pseudo values are used to deal with right censored data. Lists of risk markers and lists of risk models are assessed simultaneously. Cross-validation repeatedly splits the data, trains the risk prediction models on one part of each split and then summarizes and compares the performance across splits.
Maintained by Thomas Alexander Gerds. Last updated 18 days ago.
2.2 match 46 stars 13.00 score 736 scripts 35 dependentstidyverse
modelr:Modelling Functions that Work with the Pipe
Functions for modelling that help you seamlessly integrate modelling into a pipeline of data manipulation and visualisation.
Maintained by Hadley Wickham. Last updated 1 years ago.
1.8 match 401 stars 16.44 score 6.9k scripts 1.0k dependentsh56cho
forestRK:Implements the Forest-R.K. Algorithm for Classification Problems
Provides functions that calculates common types of splitting criteria used in random forests for classification problems, as well as functions that make predictions based on a single tree or a Forest-R.K. model; the package also provides functions to generate importance plot for a Forest-R.K. model, as well as the 2D multidimensional-scaling plot of data points that are colour coded by their predicted class types by the Forest-R.K. model. This package is based on: Bernard, S., Heutte, L., Adam, S., (2008, ISBN:978-3-540-85983-3) "Forest-R.K.: A New Random Forest Induction Method", Fourth International Conference on Intelligent Computing, September 2008, Shanghai, China, pp.430-437.
Maintained by Hyunjin Cho. Last updated 6 years ago.
6.8 match 4.24 score 35 scriptscrj32
MLeval:Machine Learning Model Evaluation
Straightforward and detailed evaluation of machine learning models. 'MLeval' can produce receiver operating characteristic (ROC) curves, precision-recall (PR) curves, calibration curves, and PR gain curves. 'MLeval' accepts a data frame of class probabilities and ground truth labels, or, it can automatically interpret the Caret train function results from repeated cross validation, then select the best model and analyse the results. 'MLeval' produces a range of evaluation metrics with confidence intervals.
Maintained by Christopher R John. Last updated 5 years ago.
5.0 match 6 stars 5.71 score 144 scriptstransbiozi
RMTL:Regularized Multi-Task Learning
Efficient solvers for 10 regularized multi-task learning algorithms applicable for regression, classification, joint feature selection, task clustering, low-rank learning, sparse learning and network incorporation. Based on the accelerated gradient descent method, the algorithms feature a state-of-art computational complexity O(1/k^2). Sparse model structure is induced by the solving the proximal operator. The detail of the package is described in the paper of Han Cao and Emanuel Schwarz (2018) <doi:10.1093/bioinformatics/bty831>.
Maintained by Han Cao. Last updated 6 years ago.
low-rank-representaionmulti-task-learningregularizationsparse-coding
5.0 match 19 stars 5.60 score 21 scriptsrafaeljm
LibOPF:Design of Optimum-Path Forest Classifiers
The 'LibOPF' is a framework to develop pattern recognition techniques based on optimum-path forests (OPF), João P. Papa and Alexandre X. Falcão (2008) <doi:10.1007/978-3-540-89639-5_89>, with methods for supervised learning and data clustering.
Maintained by Rafael Junqueira Martarelli. Last updated 4 years ago.
8.8 match 1 stars 3.18 scoredavid-cortes
outliertree:Explainable Outlier Detection Through Decision Tree Conditioning
Outlier detection method that flags suspicious values within observations, constrasting them against the normal values in a user-readable format, potentially describing conditions within the data that make a given outlier more rare. Full procedure is described in Cortes (2020) <doi:10.48550/arXiv.2001.00636>. Loosely based on the 'GritBot' <https://www.rulequest.com/gritbot-info.html> software.
Maintained by David Cortes. Last updated 2 months ago.
anomaly-detectionoutlier-detectioncppopenmp
3.8 match 58 stars 7.34 score 21 scripts 2 dependentscobrbra
ICBioMark:Data-Driven Design of Targeted Gene Panels for Estimating Immunotherapy Biomarkers
Implementation of the methodology proposed in 'Data-driven design of targeted gene panels for estimating immunotherapy biomarkers', Bradley and Cannings (2021) <arXiv:2102.04296>. This package allows the user to fit generative models of mutation from an annotated mutation dataset, and then further to produce tunable linear estimators of exome-wide biomarkers. It also contains functions to simulate mutation annotated format (MAF) data, as well as to analyse the output and performance of models.
Maintained by Jacob R. Bradley. Last updated 2 years ago.
10.2 match 2.70 score 2 scriptsjedazard
superpc:Supervised Principal Components
Does prediction in the case of a censored survival outcome, or a regression outcome, using the "supervised principal component" approach. 'Superpc' is especially useful for high-dimensional data when the number of features p dominates the number of samples n (p >> n paradigm), as generated, for instance, by high-throughput technologies.
Maintained by Jean-Eudes Dazard. Last updated 3 years ago.
4.0 match 7 stars 6.96 score 80 scripts 2 dependentstidymodels
workflows:Modeling Workflows
Managing both a 'parsnip' model and a preprocessor, such as a model formula or recipe from 'recipes', can often be challenging. The goal of 'workflows' is to streamline this process by bundling the model alongside the preprocessor, all within the same object.
Maintained by Simon Couch. Last updated 26 days ago.
2.0 match 207 stars 13.80 score 876 scripts 43 dependentsbioc
mistyR:Multiview Intercellular SpaTial modeling framework
mistyR is an implementation of the Multiview Intercellular SpaTialmodeling framework (MISTy). MISTy is an explainable machine learning framework for knowledge extraction and analysis of single-cell, highly multiplexed, spatially resolved data. MISTy facilitates an in-depth understanding of marker interactions by profiling the intra- and intercellular relationships. MISTy is a flexible framework able to process a custom number of views. Each of these views can describe a different spatial context, i.e., define a relationship among the observed expressions of the markers, such as intracellular regulation or paracrine regulation, but also, the views can also capture cell-type specific relationships, capture relations between functional footprints or focus on relations between different anatomical regions. Each MISTy view is considered as a potential source of variability in the measured marker expressions. Each MISTy view is then analyzed for its contribution to the total expression of each marker and is explained in terms of the interactions with other measurements that led to the observed contribution.
Maintained by Jovan Tanevski. Last updated 5 months ago.
softwarebiomedicalinformaticscellbiologysystemsbiologyregressiondecisiontreesinglecellspatialbioconductorbiologyintercellularmachine-learningmodularmolecular-biologymultiviewspatial-transcriptomics
3.5 match 51 stars 7.87 score 160 scripts