Showing 200 of 482 total results.
topepo
caret:Classification and Regression Training
Misc functions for training and plotting classification and regression models.
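A minimal usage sketch (the data and model method below are illustrative; the 'rpart' package must be installed for this particular method):

  library(caret)
  # 5-fold cross-validation of a CART classifier on the built-in iris data
  ctrl <- trainControl(method = "cv", number = 5)
  fit <- train(Species ~ ., data = iris, method = "rpart", trControl = ctrl)
  print(fit)                # resampled accuracy across tuning values
  predict(fit, head(iris))  # class predictions from the final model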
Maintained by Max Kuhn. Last updated 4 months ago.
1.6k stars 19.24 score 61k scripts 303 dependents
tidymodels
recipes:Preprocessing and Feature Engineering Steps for Modeling
A recipe prepares your data for modeling. We provide an extensible framework for pipeable sequences of feature engineering steps that provide preprocessing tools to be applied to data. Statistical parameters for the steps can be estimated from an initial data set and then applied to other data sets. The resulting processed output can then be used as inputs for statistical or machine learning models.
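A minimal sketch of the prep/bake workflow (mtcars and the chosen steps are illustrative):

  library(recipes)
  rec <- recipe(mpg ~ ., data = mtcars) |>
    step_normalize(all_numeric_predictors()) |>
    step_pca(all_numeric_predictors(), num_comp = 3)
  prepped <- prep(rec, training = mtcars)   # estimate means, SDs, and PCA loadings
  head(bake(prepped, new_data = mtcars))    # apply the estimated steps to data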
Maintained by Max Kuhn. Last updated 9 hours ago.
586 stars 18.82 score 7.2k scripts 386 dependents
tidymodels
tidymodels:Easily Install and Load the 'Tidymodels' Packages
The tidy modeling "verse" is a collection of packages for modeling and statistical analysis that share the underlying design philosophy, grammar, and data structures of the tidyverse.
Maintained by Max Kuhn. Last updated 1 month ago.
783 stars 16.52 score 66k scripts 15 dependents
tidymodels
tune:Tidy Tuning Tools
The ability to tune models is important. 'tune' contains functions and classes to be used in conjunction with other 'tidymodels' packages for finding reasonable values of hyper-parameters in models, pre-processing methods, and post-processing steps.
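A minimal grid-search sketch (assumes the wider tidymodels packages are installed; the data and grid size are illustrative):

  library(tidymodels)
  folds <- vfold_cv(mtcars, v = 5)
  spec <- decision_tree(cost_complexity = tune()) |>
    set_engine("rpart") |>
    set_mode("regression")
  wf <- workflow() |> add_model(spec) |> add_formula(mpg ~ .)
  res <- tune_grid(wf, resamples = folds, grid = 10)  # resample each candidate value
  show_best(res, metric = "rmse")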
Maintained by Max Kuhn. Last updated 30 days ago.
297 stars 14.25 score 756 scripts 39 dependents
business-science
timetk:A Tool Kit for Working with Time Series
Easy visualization, wrangling, and feature engineering of time series data for forecasting and machine learning prediction. Consolidates and extends time series functionality from packages including 'dplyr', 'stats', 'xts', 'forecast', 'slider', 'padr', 'recipes', and 'rsample'.
Maintained by Matt Dancho. Last updated 1 year ago.
coercion, coercion-functions, data-mining, dplyr, forecast, forecasting, forecasting-models, machine-learning, series-decomposition, series-signature, tibble, tidy, tidyquant, tidyverse, time, time-series, timeseries
626 stars 14.20 score 4.0k scripts 16 dependents
tidymodels
workflows:Modeling Workflows
Managing both a 'parsnip' model and a preprocessor, such as a model formula or recipe from 'recipes', can often be challenging. The goal of 'workflows' is to streamline this process by bundling the model alongside the preprocessor, all within the same object.
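A minimal sketch of bundling a recipe and a parsnip model (mtcars is illustrative):

  library(workflows)
  library(parsnip)
  library(recipes)
  rec <- recipe(mpg ~ ., data = mtcars) |> step_normalize(all_numeric_predictors())
  wf <- workflow() |>
    add_recipe(rec) |>                           # preprocessor
    add_model(linear_reg() |> set_engine("lm"))  # model specification
  fit_wf <- fit(wf, data = mtcars)   # prep the recipe and fit the model together
  predict(fit_wf, new_data = head(mtcars))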
Maintained by Simon Couch. Last updated 1 months ago.
207 stars 13.97 score 876 scripts 43 dependents
kkholst
mets:Analysis of Multivariate Event Times
Implementation of various statistical models for multivariate event history data <doi:10.1007/s10985-013-9244-x>. Including multivariate cumulative incidence models <doi:10.1002/sim.6016>, and bivariate random effects probit models (Liability models) <doi:10.1016/j.csda.2015.01.014>. Modern methods for survival analysis, including regression modelling (Cox, Fine-Gray, Ghosh-Lin, Binomial regression) with fast computation of influence functions.
Maintained by Klaus K. Holst. Last updated 21 hours ago.
multivariate-time-to-event, survival-analysis, time-to-event, fortran, openblas, cpp
14 stars 13.44 score 236 scripts 42 dependents
business-science
tidyquant:Tidy Quantitative Financial Analysis
Bringing business and financial analysis to the 'tidyverse'. The 'tidyquant' package provides a convenient wrapper to various 'xts', 'zoo', 'quantmod', 'TTR' and 'PerformanceAnalytics' package functions and returns the objects in the tidy 'tibble' format. The main advantage is being able to use quantitative functions with the 'tidyverse' functions including 'purrr', 'dplyr', 'tidyr', 'ggplot2', 'lubridate', etc. See the 'tidyquant' website for more information, documentation and examples.
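A minimal sketch (requires an internet connection; the ticker and dates are illustrative):

  library(tidyquant)
  aapl <- tq_get("AAPL", get = "stock.prices", from = "2023-01-01")  # daily prices as a tibble
  aapl |>
    tq_transmute(select = adjusted, mutate_fun = periodReturn,
                 period = "monthly", col_rename = "monthly_return")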
Maintained by Matt Dancho. Last updated 2 months ago.
dplyr, financial-analysis, financial-data, financial-statements, multiple-stocks, performance-analysis, performanceanalytics, quantmod, stock, stock-exchanges, stock-indexes, stock-lists, stock-performance, stock-prices, stock-symbol, tidyverse, time-series, timeseries, xts
872 stars 13.34 score 5.2k scripts
oscarkjell
text:Analyses of Text using Transformers Models from HuggingFace, Natural Language Processing and Machine Learning
Link R with Transformers from Hugging Face to transform text variables to word embeddings; where the word embeddings are used to statistically test the mean difference between set of texts, compute semantic similarity scores between texts, predict numerical variables, and visual statistically significant words according to various dimensions etc. For more information see <https://www.r-text.org>.
Maintained by Oscar Kjell. Last updated 12 days ago.
deep-learning, machine-learning, nlp, transformers, openjdk
145 stars 13.21 score 436 scripts 1 dependents
tagteam
prodlim:Product-Limit Estimation for Censored Event History Analysis
Fast and user-friendly implementation of nonparametric estimators for censored event history (survival) analysis. Kaplan-Meier and Aalen-Johansen methods.
Maintained by Thomas A. Gerds. Last updated 30 days ago.
7 stars 12.25 score 1000 scripts 468 dependents
tidymodels
probably:Tools for Post-Processing Predicted Values
Models can be improved by post-processing class probabilities: recalibration, conversion to hard probabilities, assessment of equivocal zones, and other activities. 'probably' contains tools for conducting these operations as well as calibration tools and conformal inference techniques for regression models.
Maintained by Max Kuhn. Last updated 6 months ago.
115 stars 12.09 score 21k scripts 1 dependents
tidymodels
workflowsets:Create a Collection of 'tidymodels' Workflows
A workflow is a combination of a model and preprocessors (e.g., a formula, recipe, etc.) (Kuhn and Silge (2021) <https://www.tmwr.org/>). In order to try different combinations of these, an object can be created that contains many workflows. There are functions to create workflows en masse, train them, and visualize the results.
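A minimal sketch crossing two preprocessors with two model specifications (data and metric are illustrative):

  library(tidymodels)
  library(workflowsets)
  folds <- vfold_cv(mtcars, v = 5)
  wf_set <- workflow_set(
    preproc = list(raw = mpg ~ .,
                   scaled = recipe(mpg ~ ., data = mtcars) |>
                     step_normalize(all_numeric_predictors())),
    models = list(lm = linear_reg(),
                  tree = decision_tree(mode = "regression"))
  )
  res <- workflow_map(wf_set, "fit_resamples", resamples = folds)  # 4 workflows, resampled
  rank_results(res, rank_metric = "rmse")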
Maintained by Simon Couch. Last updated 5 months ago.
94 stars 12.04 score 294 scripts 19 dependents
zachmayer
caretEnsemble:Ensembles of Caret Models
Functions for creating ensembles of caret models: caretList() and caretStack(). caretList() is a convenience function for fitting multiple caret::train() models to the same dataset. caretStack() will make linear or non-linear combinations of these models, using a caret::train() model as a meta-model.
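A minimal stacking sketch (the methods and data are illustrative; caretStack() argument details may differ across package versions):

  library(caret)
  library(caretEnsemble)
  ctrl <- trainControl(method = "cv", number = 5, savePredictions = "final")
  models <- caretList(Sepal.Length ~ ., data = iris,
                      trControl = ctrl, methodList = c("lm", "rpart"))
  stack <- caretStack(models, method = "glm")  # linear meta-model over the base models
  summary(stack)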
Maintained by Zachary A. Deane-Mayer. Last updated 4 months ago.
226 stars 11.98 score 780 scripts 1 dependents
hannameyer
CAST:'caret' Applications for Spatial-Temporal Models
Supporting functionality to run 'caret' with spatial or spatial-temporal data. 'caret' is a frequently used package for model training and prediction using machine learning. CAST includes functions to improve spatial or spatial-temporal modelling tasks using 'caret'. It includes the newly suggested 'Nearest neighbor distance matching' cross-validation to estimate the performance of spatial prediction models and allows for spatial variable selection to select suitable predictor variables in view of their contribution to the spatial model performance. CAST further includes functionality to estimate the (spatial) area of applicability of prediction models. Methods are described in Meyer et al. (2018) <doi:10.1016/j.envsoft.2017.12.001>; Meyer et al. (2019) <doi:10.1016/j.ecolmodel.2019.108815>; Meyer and Pebesma (2021) <doi:10.1111/2041-210X.13650>; Milà et al. (2022) <doi:10.1111/2041-210X.13851>; Meyer and Pebesma (2022) <doi:10.1038/s41467-022-29838-9>; Linnenbrink et al. (2023) <doi:10.5194/egusphere-2023-1308>; Schumacher et al. (2024) <doi:10.5194/egusphere-2024-2730>. The package is described in detail in Meyer et al. (2024) <doi:10.48550/arXiv.2404.06978>.
Maintained by Hanna Meyer. Last updated 2 months ago.
autocorrelation, caret, feature-selection, machine-learning, overfitting, predictive-modeling, spatial, spatio-temporal, variable-selection
114 stars 11.85 score 298 scripts 1 dependents
tidymodels
stacks:Tidy Model Stacking
Model stacking is an ensemble technique that involves training a model to combine the outputs of many diverse statistical models, and has been shown to improve predictive performance in a variety of settings. 'stacks' implements a grammar for 'tidymodels'-aligned model stacking.
Maintained by Simon Couch. Last updated 5 months ago.
298 stars 11.46 score 840 scripts
tidymodels
textrecipes:Extra 'Recipes' for Text Processing
Converting text to numerical features requires specifically created procedures, which are implemented as steps according to the 'recipes' package. These steps allow for tokenization, filtering, counting (tf and tfidf) and feature hashing.
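A minimal tokenize/tf-idf sketch (the toy documents are illustrative):

  library(recipes)
  library(textrecipes)
  docs <- data.frame(text = c("tidy text processing", "feature hashing and tf-idf"),
                     label = c("a", "b"))
  rec <- recipe(label ~ text, data = docs) |>
    step_tokenize(text) |>                       # split into tokens
    step_tokenfilter(text, max_tokens = 100) |>  # keep the most frequent tokens
    step_tfidf(text)                             # one tf-idf column per retained token
  bake(prep(rec, training = docs), new_data = NULL)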
Maintained by Emil Hvitfeldt. Last updated 15 days ago.
160 stars 10.86 score 964 scripts 1 dependents
thothorn
ipred:Improved Predictors
Improved predictive models by indirect classification and bagging for classification, regression and survival problems as well as resampling based estimators of prediction error.
Maintained by Torsten Hothorn. Last updated 9 months ago.
10.76 score 3.3k scripts 411 dependents
business-science
modeltime:The Tidymodels Extension for Time Series Modeling
The time series forecasting framework for use with the 'tidymodels' ecosystem. Models include ARIMA, Exponential Smoothing, and additional time series models from the 'forecast' and 'prophet' packages. Refer to "Forecasting Principles & Practice, Second edition" (<https://otexts.com/fpp2/>). Refer to "Prophet: forecasting at scale" (<https://research.facebook.com/blog/2017/02/prophet-forecasting-at-scale/>.).
Maintained by Matt Dancho. Last updated 5 months ago.
arima, data-science, deep-learning, ets, forecasting, machine-learning, machine-learning-algorithms, modeltime, prophet, tbats, tidymodeling, tidymodels, time, time-series, time-series-analysis, timeseries, timeseries-forecasting
551 stars 10.61 score 1.1k scripts 7 dependents
scheike
timereg:Flexible Regression Models for Survival Data
Programs for Martinussen and Scheike (2006), `Dynamic Regression Models for Survival Data', Springer Verlag. Plus more recent developments. Additive survival model, semiparametric proportional odds model, fast cumulative residuals, excess risk models and more. Flexible competing risks regression including GOF-tests. Two-stage frailty modelling. PLS for the additive risk model. Lasso in the 'ahaz' package.
Maintained by Thomas Scheike. Last updated 7 months ago.
31 stars 10.42 score 289 scripts 44 dependents
tidymodels
themis:Extra Recipes Steps for Dealing with Unbalanced Data
A dataset with an uneven number of cases in each class is said to be unbalanced. Many models produce a subpar performance on unbalanced datasets. A dataset can be balanced by increasing the number of minority cases using SMOTE 2011 <doi:10.48550/arXiv.1106.1813>, BorderlineSMOTE 2005 <doi:10.1007/11538059_91> and ADASYN 2008 <https://ieeexplore.ieee.org/document/4633969>. Or by decreasing the number of majority cases using NearMiss 2003 <https://www.site.uottawa.ca/~nat/Workshop2003/jzhang.pdf> or Tomek link removal 1976 <https://ieeexplore.ieee.org/document/4309452>.
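A minimal SMOTE sketch (the imbalanced toy data is illustrative; step_smote() requires numeric predictors):

  library(recipes)
  library(themis)
  set.seed(1)
  df <- data.frame(class = factor(rep(c("rare", "common"), times = c(15, 85))),
                   x1 = rnorm(100), x2 = rnorm(100))
  rec <- recipe(class ~ ., data = df) |>
    step_smote(class)                             # oversample the minority class
  table(bake(prep(rec), new_data = NULL)$class)   # classes are now balanced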
Maintained by Emil Hvitfeldt. Last updated 2 months ago.
143 stars 10.37 score 1.3k scripts 2 dependents
ludvigolsen
cvms:Cross-Validation for Model Selection
Cross-validate one or multiple regression and classification models and get relevant evaluation metrics in a tidy format. Validate the best model on a test set and compare it to a baseline evaluation. Alternatively, evaluate predictions from an external model. Currently supports regression and classification (binary and multiclass). Described in chp. 5 of Jeyaraman, B. P., Olsen, L. R., & Wambugu M. (2019, ISBN: 9781838550134).
Maintained by Ludvig Renbo Olsen. Last updated 27 days ago.
39 stars 10.31 score 492 scripts 5 dependents
bioc
pRoloc:A unifying bioinformatics framework for spatial proteomics
The pRoloc package implements machine learning and visualisation methods for the analysis and interrogation of quantitative mass spectrometry data to reliably infer protein sub-cellular localisation.
Maintained by Lisa Breckels. Last updated 7 days ago.
immunooncology, proteomics, massspectrometry, classification, clustering, qualitycontrol, bioconductor, proteomics-data, spatial-proteomics, visualisation, openblas, cpp
15 stars 10.31 score 101 scripts 2 dependents
tagteam
Publish:Format Output of Various Routines in a Suitable Way for Reports and Publication
A bunch of convenience functions that transform the results of some basic statistical analyses into table format nearly ready for publication. This includes descriptive tables, tables of logistic regression and Cox regression results as well as forest plots.
Maintained by Thomas A. Gerds. Last updated 28 days ago.
15 stars 10.11 score 274 scripts 36 dependents
bleutner
RStoolbox:Remote Sensing Data Analysis
Toolbox for remote sensing image processing and analysis such as calculating spectral indexes, principal component transformation, unsupervised and supervised classification or fractional cover analyses.
Maintained by Konstantin Mueller. Last updated 2 months ago.
ggplot2, land-cover-mapping, remote-sensing, spectral-unmixing, supervised-classification, unsupervised-classification, openblas, cpp
275 stars 10.10 score 1.1k scripts
pecanproject
PEcAn.assim.batch:PEcAn Functions Used for Ecological Forecasts and Reanalysis
The Predictive Ecosystem Carbon Analyzer (PEcAn) is a scientific workflow management tool that is designed to simplify the management of model parameterization, execution, and analysis. The goal of PEcAn is to streamline the interaction between data and models, and to improve the efficacy of scientific investigation.
Maintained by Istem Fer. Last updated 2 days ago.
bayesian, cyberinfrastructure, data-assimilation, data-science, ecosystem-model, ecosystem-science, forecasting, meta-analysis, national-science-foundation, pecan, plants, jags, cpp
216 stars 9.96 score 20 scripts 2 dependents
tlverse
sl3:Pipelines for Machine Learning and Super Learning
A modern implementation of the Super Learner prediction algorithm, coupled with a general purpose framework for composing arbitrary pipelines for machine learning tasks.
Maintained by Jeremy Coyle. Last updated 5 months ago.
data-science, ensemble-learning, ensemble-model, machine-learning, model-selection, regression, stacking, statistics
100 stars 9.94 score 748 scripts 7 dependents
pecanproject
PEcAnRTM:PEcAn Functions Used for Radiative Transfer Modeling
Functions for performing forward runs and inversions of radiative transfer models (RTMs). Inversions can be performed using maximum likelihood, or more complex hierarchical Bayesian methods. Underlying numerical analyses are optimized for speed using Fortran code.
Maintained by Alexey Shiklomanov. Last updated 2 days ago.
bayesian, cyberinfrastructure, data-assimilation, data-science, ecosystem-model, ecosystem-science, forecasting, meta-analysis, national-science-foundation, pecan, plants, fortran, jags, cpp
216 stars 9.70 score 132 scripts
bblonder
hypervolume:High Dimensional Geometry, Set Operations, Projection, and Inference Using Kernel Density Estimation, Support Vector Machines, and Convex Hulls
Estimates the shape and volume of high-dimensional datasets and performs set operations: intersection / overlap, union, unique components, inclusion test, and hole detection. Uses stochastic geometry approach to high-dimensional kernel density estimation, support vector machine delineation, and convex hull generation. Applications include modeling trait and niche hypervolumes and species distribution modeling.
Maintained by Benjamin Blonder. Last updated 3 months ago.
23 stars 9.69 score 211 scripts 7 dependents
business-science
anomalize:Tidy Anomaly Detection
The 'anomalize' package enables a "tidy" workflow for detecting anomalies in data. The main functions are time_decompose(), anomalize(), and time_recompose(). When combined, it's quite simple to decompose time series, detect anomalies, and create bands separating the "normal" data from the anomalous data at scale (i.e. for multiple time series). Time series decomposition is used to remove trend and seasonal components via the time_decompose() function and methods include seasonal decomposition of time series by Loess ("stl") and seasonal decomposition by piecewise medians ("twitter"). The anomalize() function implements two methods for anomaly detection of residuals including using an inner quartile range ("iqr") and generalized extreme studentized deviation ("gesd"). These methods are based on those used in the 'forecast' package and the Twitter 'AnomalyDetection' package. Refer to the associated functions for specific references for these methods.
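A minimal sketch of the three-step workflow (the synthetic daily series and injected outlier are illustrative):

  library(dplyr)
  library(anomalize)
  set.seed(1)
  df <- tibble(date = seq.Date(as.Date("2023-01-01"), by = "day", length.out = 200),
               value = rnorm(200) + c(rep(0, 100), 8, rep(0, 99)))  # one spike at day 101
  df |>
    time_decompose(value, method = "stl") |>   # trend + season + remainder
    anomalize(remainder, method = "iqr") |>    # flag anomalous remainders
    time_recompose() |>
    plot_anomalies(time_recomposed = TRUE)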
Maintained by Matt Dancho. Last updated 1 year ago.
anomaly, anomaly-detection, decomposition, detect-anomalies, iqr, time-series
339 stars 9.56 score 332 scripts
ndphillips
FFTrees:Generate, Visualise, and Evaluate Fast-and-Frugal Decision Trees
Create, visualize, and test fast-and-frugal decision trees (FFTs) using the algorithms and methods described by Phillips, Neth, Woike & Gaissmaier (2017), <doi:10.1017/S1930297500006239>. FFTs are simple and transparent decision trees for solving binary classification problems. FFTs can be preferable to more complex algorithms because they require very little information, are easy to understand and communicate, and are robust against overfitting.
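A minimal sketch (heart.train and heart.test are example datasets shipped with the package):

  library(FFTrees)
  fft <- FFTrees(formula = diagnosis ~ .,
                 data = heart.train,
                 data.test = heart.test,
                 decision.labels = c("Healthy", "Disease"))
  plot(fft, data = "test")   # tree diagram plus test-set accuracy statistics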
Maintained by Hansjoerg Neth. Last updated 6 months ago.
136 stars 9.53 score 144 scripts
adibender
pammtools:Piece-Wise Exponential Additive Mixed Modeling Tools for Survival Analysis
The Piece-wise exponential (Additive Mixed) Model (PAMM; Bender and others (2018) <doi: 10.1177/1471082X17748083>) is a powerful model class for the analysis of survival (or time-to-event) data, based on Generalized Additive (Mixed) Models (GA(M)Ms). It offers intuitive specification and robust estimation of complex survival models with stratified baseline hazards, random effects, time-varying effects, time-dependent covariates and cumulative effects (Bender and others (2019)), as well as support for left-truncated data, competing risks, recurrent events and multi-state settings. pammtools provides a tidy workflow for survival analysis with PAMMs, including data simulation, transformation and other functions for data preprocessing and model post-processing as well as visualization.
Maintained by Andreas Bender. Last updated 10 days ago.
additive-models, pamm, pammtools, piece-wise-exponential, survival-analysis
48 stars 9.32 score 310 scripts 8 dependents
microsoft
finnts:Microsoft Finance Time Series Forecasting Framework
Automated time series forecasting developed by Microsoft Finance. The Microsoft Finance Time Series Forecasting Framework, aka Finn, can be used to forecast any component of the income statement, balance sheet, or any other area of interest by finance. Finn can be used to forecast any numerical quantity over time. While it can be applied outside of the finance domain, Finn was built to meet the needs of financial analysts to better forecast their businesses within a company, and has a lot of built-in features that are specific to the needs of financial forecasters. Happy forecasting!
Maintained by Mike Tokic. Last updated 1 month ago.
business, data-science, feature-selection, finance, finnts, forecasting, machine-learning, microsoft, time-series
194 stars 9.30 score 39 scripts
business-science
sweep:Tidy Tools for Forecasting
Tidies up the forecasting modeling and prediction work flow, extends the 'broom' package with 'sw_tidy', 'sw_glance', 'sw_augment', and 'sw_tidy_decomp' functions for various forecasting models, and enables converting 'forecast' objects to "tidy" data frames with 'sw_sweep'.
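A minimal sketch using a base R time series and the 'forecast' package:

  library(forecast)
  library(sweep)
  fit <- auto.arima(USAccDeaths)     # built-in monthly series
  sw_tidy(fit)                       # model coefficients as a tibble
  sw_glance(fit)                     # one-row model summary (AIC, errors, ...)
  sw_sweep(forecast(fit, h = 12))    # tidy data frame of forecasts with intervals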
Maintained by Matt Dancho. Last updated 1 year ago.
broom, forecast, forecasting-models, prediction, tidy, tidyverse, time, time-series, timeseries
155 stars 9.23 score 399 scripts 1 dependents
tidymodels
embed:Extra Recipes for Encoding Predictors
Predictors can be converted to one or more numeric representations using a variety of methods. Effect encodings using simple generalized linear models <doi:10.48550/arXiv.1611.09477> or nonlinear models <doi:10.48550/arXiv.1604.06737> can be used. There are also functions for dimension reduction and other approaches.
Maintained by Emil Hvitfeldt. Last updated 2 months ago.
142 stars 9.18 score 1.1k scripts
mlverse
tabnet:Fit 'TabNet' Models for Classification and Regression
Implements the 'TabNet' model by Sercan O. Arik et al. (2019) <doi:10.48550/arXiv.1908.07442> with 'Coherent Hierarchical Multi-label Classification Networks' by Giunchiglia et al. <doi:10.48550/arXiv.2010.10151> and provides a consistent interface for fitting and creating predictions. It's also fully compatible with the 'tidymodels' ecosystem.
Maintained by Christophe Regouby. Last updated 4 days ago.
109 stars 9.05 score 65 scripts
pecanproject
PEcAn.all:PEcAn Functions Used for Ecological Forecasts and Reanalysis
The Predictive Ecosystem Carbon Analyzer (PEcAn) is a scientific workflow management tool that is designed to simplify the management of model parameterization, execution, and analysis. The goal of PEcAn is to streamline the interaction between data and models, and to improve the efficacy of scientific investigation.
Maintained by David LeBauer. Last updated 2 days ago.
bayesian, cyberinfrastructure, data-assimilation, data-science, ecosystem-model, ecosystem-science, forecasting, meta-analysis, national-science-foundation, pecan, plants, jags, cpp
216 stars 9.01 score 266 scripts
sym33
RecordLinkage:Record Linkage Functions for Linking and Deduplicating Data Sets
Provides functions for linking and deduplicating data sets. Methods based on a stochastic approach are implemented as well as classification algorithms from the machine learning domain. For details, see our paper "The RecordLinkage Package: Detecting Errors in Data" Sariyar M / Borg A (2010) <doi:10.32614/RJ-2010-017>.
Maintained by Murat Sariyar. Last updated 2 years ago.
6 stars 8.96 score 454 scripts 8 dependents
pedrohcgs
DRDID:Doubly Robust Difference-in-Differences Estimators
Implements the locally efficient doubly robust difference-in-differences (DiD) estimators for the average treatment effect proposed by Sant'Anna and Zhao (2020) <doi:10.1016/j.jeconom.2020.06.003>. The estimator combines inverse probability weighting and outcome regression estimators (also implemented in the package) to form estimators with more attractive statistical properties. Two different estimation methods can be used to estimate the nuisance functions.
Maintained by Pedro H. C. SantAnna. Last updated 6 months ago.
92 stars 8.88 score 133 scripts 5 dependents
evolecolgroup
tidysdm:Species Distribution Models with Tidymodels
Fit species distribution models (SDMs) using the 'tidymodels' framework, which provides a standardised interface to define models and process their outputs. 'tidysdm' expands 'tidymodels' by providing methods for spatial objects, models and metrics specific to SDMs, as well as a number of specialised functions to process occurrences for contemporary and palaeo datasets. The full functionalities of the package are described in Leonardi et al. (2023) <doi:10.1101/2023.07.24.550358>.
Maintained by Andrea Manica. Last updated 27 days ago.
species-distribution-modelling, tidymodels
31 stars 8.82 score 51 scripts
jinseob2kim
jsmodule:'RStudio' Addins and 'Shiny' Modules for Medical Research
'RStudio' addins and 'Shiny' modules for descriptive statistics, regression and survival analysis.
Maintained by Jinseob Kim. Last updated 16 days ago.
medical, rstudio-addins, shiny, shiny-modules, statistics
21 stars 8.69 score 61 scripts
tidymodels
censored:'parsnip' Engines for Survival Models
Engines for survival models from the 'parsnip' package. These include parametric models (e.g., Jackson (2016) <doi:10.18637/jss.v070.i08>), semi-parametric (e.g., Simon et al (2011) <doi:10.18637/jss.v039.i05>), and tree-based models (e.g., Buehlmann and Hothorn (2007) <doi:10.1214/07-STS242>).
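A minimal sketch fitting a Cox model through the parsnip interface (the survival::lung data and predictors are illustrative):

  library(survival)
  library(censored)   # also attaches parsnip
  spec <- proportional_hazards() |>
    set_engine("survival") |>
    set_mode("censored regression")
  fit <- fit(spec, Surv(time, status) ~ age + ph.ecog, data = lung)
  predict(fit, new_data = head(lung), type = "time")  # predicted survival times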
Maintained by Hannah Frick. Last updated 2 months ago.
122 stars 8.68 score 254 scripts 1 dependents
bioc
survcomp:Performance Assessment and Comparison for Survival Analysis
Assessment and Comparison for Performance of Risk Prediction (Survival) Models.
Maintained by Benjamin Haibe-Kains. Last updated 5 months ago.
geneexpression, differentialexpression, visualization, cpp
8.46 score 448 scripts 12 dependents
tidymodels
tidyposterior:Bayesian Analysis to Compare Models using Resampling Statistics
Bayesian analysis used here to answer the question: "when looking at resampling results, are the differences between models 'real'?" To answer this, a model can be created where the performance statistic is the resampling statistic (e.g. accuracy or RMSE). These values are explained by the model types. In doing this, we can get parameter estimates for each model's effect on performance and make statistical (and practical) comparisons between models. The methods included here are similar to Benavoli et al (2017) <https://jmlr.org/papers/v18/16-305.html>.
Maintained by Max Kuhn. Last updated 6 months ago.
102 stars 8.44 score 273 scripts
modeloriented
survex:Explainable Machine Learning in Survival Analysis
Survival analysis models are commonly used in medicine and other areas. Many of them are too complex to be interpreted by humans. Exploration and explanation are needed, but standard methods do not give a broad enough picture. 'survex' provides easy-to-apply methods for explaining survival models, both complex black-boxes and simpler statistical models. They include methods specific to survival analysis such as SurvSHAP(t) introduced in Krzyzinski et al., (2023) <doi:10.1016/j.knosys.2022.110234>, SurvLIME described in Kovalev et al., (2020) <doi:10.1016/j.knosys.2020.106164> as well as extensions of existing ones described in Biecek et al., (2021) <doi:10.1201/9780429027192>.
Maintained by Mikołaj Spytek. Last updated 10 months ago.
biostatistics, brier-scores, censored-data, cox-model, cox-regression, explainable-ai, explainable-machine-learning, explainable-ml, explanatory-model-analysis, interpretable-machine-learning, interpretable-ml, machine-learning, probabilistic-machine-learning, shap, survival-analysis, time-to-event, variable-importance, xai
110 stars 8.40 score 114 scripts
tidymodels
finetune:Additional Functions for Model Tuning
The ability to tune models is important. 'finetune' enhances the 'tune' package by providing more specialized methods for finding reasonable values of model tuning parameters. Two racing methods described by Kuhn (2014) <arXiv:1405.6974> are included. An iterative search method using generalized simulated annealing (Bohachevsky, Johnson and Stein, 1986) <doi:10.1080/00401706.1986.10488128> is also included.
Maintained by Max Kuhn. Last updated 8 months ago.
62 stars 8.36 score 704 scripts 1 dependents
cefet-rj-dal
harbinger:A Unified Time Series Event Detection Framework
By analyzing time series, it is possible to observe significant changes in the behavior of observations that frequently characterize events. Events present themselves as anomalies, change points, or motifs. In the literature, there are several methods for detecting events. However, searching for a suitable time series method is a complex task, especially considering that the nature of events is often unknown. This work presents Harbinger, a framework for integrating and analyzing event detection methods. Harbinger contains several state-of-the-art methods described in Salles et al. (2020) <doi:10.5753/sbbd.2020.13626>.
Maintained by Eduardo Ogasawara. Last updated 4 months ago.
18 stars 8.32 score 216 scripts
business-science
modeltime.ensemble:Ensemble Algorithms for Time Series Forecasting with Modeltime
A 'modeltime' extension that implements time series ensemble forecasting methods including model averaging, weighted averaging, and stacking. These techniques are popular methods to improve forecast accuracy and stability.
Maintained by Matt Dancho. Last updated 9 months ago.
ensemble, ensemble-learning, forecast, forecasting, modeltime, stacking, stacking-ensemble, tidymodels, time, time-series, timeseries
77 stars 8.30 score 143 scripts
grosssbm
sbm:Stochastic Blockmodels
A collection of tools and functions to adjust a variety of stochastic blockmodels (SBM). Supports at the moment Simple, Bipartite, 'Multipartite' and Multiplex SBM (undirected or directed with Bernoulli, Poisson or Gaussian emission laws on the edges, and possibly covariate for Simple and Bipartite SBM). See Léger (2016) <doi:10.48550/arXiv.1602.07587>, 'Barbillon et al.' (2020) <doi:10.1111/rssa.12193> and 'Bar-Hen et al.' (2020) <doi:10.48550/arXiv.1807.10138>.
Maintained by Julien Chiquet. Last updated 7 months ago.
network-analysis, sbm, stochastic-block-model, cpp
16 stars 8.27 score 98 scripts 2 dependents
bioc
POMA:Tools for Omics Data Analysis
The POMA package offers a comprehensive toolkit designed for omics data analysis, streamlining the process from initial visualization to final statistical analysis. Its primary goal is to simplify and unify the various steps involved in omics data processing, making it more accessible and manageable within a single, intuitive R package. Emphasizing on reproducibility and user-friendliness, POMA leverages the standardized SummarizedExperiment class from Bioconductor, ensuring seamless integration and compatibility with a wide array of Bioconductor tools. This approach guarantees maximum flexibility and replicability, making POMA an essential asset for researchers handling omics datasets. See https://github.com/pcastellanoescuder/POMAShiny. Paper: Castellano-Escuder et al. (2021) <doi:10.1371/journal.pcbi.1009148> for more details.
Maintained by Pol Castellano-Escuder. Last updated 4 months ago.
batcheffect, classification, clustering, decisiontree, dimensionreduction, multidimensionalscaling, normalization, preprocessing, principalcomponent, regression, rnaseq, software, statisticalmethod, visualization, bioconductor, bioinformatics, data-visualization, dimension-reduction, exploratory-data-analysis, machine-learning, omics-data-integration, pipeline, pre-processing, statistical-analysis, user-friendly, workflow
11 stars 8.16 score 20 scripts 1 dependents
brian-j-smith
MachineShop:Machine Learning Models and Tools
Meta-package for statistical and machine learning with a unified interface for model fitting, prediction, performance assessment, and presentation of results. Approaches for model fitting and prediction of numerical, categorical, or censored time-to-event outcomes include traditional regression models, regularization methods, tree-based methods, support vector machines, neural networks, ensembles, data preprocessing, filtering, and model tuning and selection. Performance metrics are provided for model assessment and can be estimated with independent test sets, split sampling, cross-validation, or bootstrap resampling. Resample estimation can be executed in parallel for faster processing and nested in cases of model tuning and selection. Modeling results can be summarized with descriptive statistics; calibration curves; variable importance; partial dependence plots; confusion matrices; and ROC, lift, and other performance curves.
Maintained by Brian J Smith. Last updated 8 months ago.
classification-models, machine-learning, predictive-modeling, regression-models, survival-models
62 stars 7.95 score 121 scripts
bcallaway11
BMisc:Miscellaneous Functions for Panel Data, Quantiles, and Printing Results
These are miscellaneous functions for working with panel data, quantiles, and printing results. For panel data, the package includes functions for making a panel data balanced (that is, dropping missing individuals that have missing observations in any time period), converting id numbers to row numbers, and to treat repeated cross sections as panel data under the assumption of rank invariance. For quantiles, there are functions to make distribution functions from a set of data points (this is particularly useful when a distribution function is created in several steps), to combine distribution functions based on some external weights, and to invert distribution functions. Finally, there are several other miscellaneous functions for obtaining weighted means, weighted distribution functions, and weighted quantiles; to generate summary statistics and their differences for two groups; and to add or drop covariates from formulas.
Maintained by Brantly Callaway. Last updated 2 months ago.
7 stars 7.92 score 110 scripts 8 dependents
tlverse
tmle3:The Extensible TMLE Framework
A general framework supporting the implementation of targeted maximum likelihood estimators (TMLEs) of a diverse range of statistical target parameters through a unified interface. The goal is that the exposed framework be as general as the mathematical framework upon which it draws.
Maintained by Jeremy Coyle. Last updated 5 months ago.
causal-inference, machine-learning, targeted-learning, variable-importance
38 stars 7.91 score 286 scripts 5 dependents
myles-lewis
nestedcv:Nested Cross-Validation with 'glmnet' and 'caret'
Implements nested k*l-fold cross-validation for lasso and elastic-net regularised linear models via the 'glmnet' package and other machine learning models via the 'caret' package <doi:10.1093/bioadv/vbad048>. Cross-validation of 'glmnet' alpha mixing parameter and embedded fast filter functions for feature selection are provided. Described as double cross-validation by Stone (1977) <doi:10.1111/j.2517-6161.1977.tb01603.x>. Also implemented is a method using outer CV to measure unbiased model performance metrics when fitting Bayesian linear and logistic regression shrinkage models using the horseshoe prior over parameters to encourage a sparse model as described by Piironen & Vehtari (2017) <doi:10.1214/17-EJS1337SI>.
Maintained by Myles Lewis. Last updated 14 days ago.
12 stars 7.90 score 46 scripts
kylebutts
did2s:Two-Stage Difference-in-Differences Following Gardner (2021)
Estimates Two-way Fixed Effects difference-in-differences/event-study models using the approach proposed by Gardner (2021) <doi:10.48550/arXiv.2207.05943>. To avoid the problems caused by OLS estimation of the Two-way Fixed Effects model, this function first estimates the fixed effects and covariates using untreated observations and then in a second stage, estimates the treatment effects.
Maintained by Kyle Butts. Last updated 18 days ago.
97 stars 7.89 score 134 scripts
bioc
mistyR:Multiview Intercellular SpaTial modeling framework
mistyR is an implementation of the Multiview Intercellular SpaTial modeling framework (MISTy). MISTy is an explainable machine learning framework for knowledge extraction and analysis of single-cell, highly multiplexed, spatially resolved data. MISTy facilitates an in-depth understanding of marker interactions by profiling the intra- and intercellular relationships. MISTy is a flexible framework able to process a custom number of views. Each of these views can describe a different spatial context, i.e., define a relationship among the observed expressions of the markers, such as intracellular regulation or paracrine regulation; the views can also capture cell-type-specific relationships, relations between functional footprints, or relations between different anatomical regions. Each MISTy view is considered as a potential source of variability in the measured marker expressions. Each MISTy view is then analyzed for its contribution to the total expression of each marker and is explained in terms of the interactions with other measurements that led to the observed contribution.
Maintained by Jovan Tanevski. Last updated 5 months ago.
software, biomedicalinformatics, cellbiology, systemsbiology, regression, decisiontree, singlecell, spatial, bioconductor, biology, intercellular, machine-learning, modular, molecular-biology, multiview, spatial-transcriptomics
52 stars 7.87 score 160 scripts
schlosslab
mikropml:User-Friendly R Package for Supervised Machine Learning Pipelines
An interface to build machine learning models for classification and regression problems. 'mikropml' implements the ML pipeline described by Topçuoğlu et al. (2020) <doi:10.1128/mBio.00434-20> with reasonable default options for data preprocessing, hyperparameter tuning, cross-validation, testing, model evaluation, and interpretation steps. See the website <https://www.schlosslab.org/mikropml/> for more information, documentation, and examples.
Maintained by Kelly Sovacool. Last updated 2 years ago.
56 stars 7.83 score 86 scripts
matloff
dsld:Data Science Looks at Discrimination
Statistical and graphical tools for detecting and measuring discrimination and bias, be it racial, gender, age or other. Detection and remediation of bias in machine learning algorithms. 'Python' interfaces available.
Maintained by Norm Matloff. Last updated 2 months ago.
12 stars 7.81 score 35 scripts
nsaph-software
CausalGPS:Matching on Generalized Propensity Scores with Continuous Exposures
Provides a framework for estimating causal effects of a continuous exposure using observational data, and implementing matching and weighting on the generalized propensity score. Wu, X., Mealli, F., Kioumourtzoglou, M.A., Dominici, F. and Braun, D., 2022. Matching on generalized propensity scores with continuous exposures. Journal of the American Statistical Association, pp.1-29.
Maintained by Naeem Khoshnevis. Last updated 10 months ago.
24 stars 7.67 score 39 scripts
spsanderson
healthyR.ts:The Time Series Modeling Companion to 'healthyR'
Hospital time series data analysis workflow tools, modeling, and automations. This library provides many useful tools to review common administrative time series hospital data. Some of these include average length of stay, and readmission rates. The aim is to provide a simple and consistent verb framework that takes the guesswork out of everything.
Maintained by Steven Sanderson. Last updated 6 months ago.
ai, arima-forecasting, arima-model, ets, forecasting, ggplot2, machine-learning, modeling, prophet, time-series, time-series-analysis, workflows
19 stars 7.58 score 56 scripts 1 dependents
risktoollib
RTL:Risk Tool Library - Trading, Risk, Analytics for Commodities
A toolkit for Commodities 'analytics', risk management and trading professionals. Includes functions for API calls to <https://commodities.morningstar.com/#/>, <https://developer.genscape.com/>, and <https://www.bankofcanada.ca/valet/docs>.
Maintained by Philippe Cote. Last updated 1 month ago.
analytics, api, commodities, commodities-api, finance, genscape, morningstar, python, risk-management, cpp
30 stars 7.51 score 198 scripts
tagteam
pec:Prediction Error Curves for Risk Prediction Models in Survival Analysis
Validation of risk predictions obtained from survival models and competing risk models based on censored data using inverse weighting and cross-validation. Most of the 'pec' functionality has been moved to 'riskRegression'.
Maintained by Thomas A. Gerds. Last updated 2 years ago.
7.47 score 512 scripts 28 dependents
rgcca-factory
RGCCA:Regularized and Sparse Generalized Canonical Correlation Analysis for Multiblock Data
Multi-block data analysis concerns the analysis of several sets of variables (blocks) observed on the same group of individuals. The main aims of the RGCCA package are: to study the relationships between blocks and to identify subsets of variables of each block which are active in their relationships with the other blocks. This package allows to (i) run R/SGCCA and related methods, (ii) help the user to find out the optimal parameters for R/SGCCA such as regularization parameters (tau or sparsity), (iii) evaluate the stability of the RGCCA results and their significance, (iv) build predictive models from the R/SGCCA. (v) Generic print() and plot() functions apply to all these functionalities.
Maintained by Arthur Tenenhaus. Last updated 9 months ago.
12 stars 7.43 score 74 scripts
bioc
genefu:Computation of Gene Expression-Based Signatures in Breast Cancer
This package contains functions implementing various tasks usually required by gene expression analysis, especially in breast cancer studies: gene mapping between different microarray platforms, identification of molecular subtypes, implementation of published gene signatures, gene selection, and survival analysis.
Maintained by Benjamin Haibe-Kains. Last updated 4 months ago.
differentialexpression, geneexpression, visualization, clustering, classification
7.42 score 193 scripts 3 dependents
spsanderson
healthyR.ai:The Machine Learning and AI Modeling Companion to 'healthyR'
Hospital machine learning and ai data analysis workflow tools, modeling, and automations. This library provides many useful tools to review common administrative hospital data. Some of these include predicting length of stay, and readmits. The aim is to provide a simple and consistent verb framework that takes the guesswork out of everything.
Maintained by Steven Sanderson. Last updated 2 months ago.
ai, artificial-intelligence, healthcare, analytics, healthyr, healthyverse, machine-learning
16 stars 7.37 score 36 scripts 1 dependents
spsanderson
healthyR:Hospital Data Analysis Workflow Tools
Hospital data analysis workflow tools, modeling, and automations. This library provides many useful tools to review common administrative hospital data. Some of these include average length of stay, readmission rates, average net pay amounts by service lines just to name a few. The aim is to provide a simple and consistent verb framework that takes the guesswork out of everything.
Maintained by Steven Sanderson. Last updated 9 months ago.
analysis, analytics, healthcare, healthyr
30 stars 7.27 score 103 scripts 1 dependents
bioc
tidytof:Analyze High-dimensional Cytometry Data Using Tidy Data Principles
This package implements an interactive, scientific analysis pipeline for high-dimensional cytometry data built using tidy data principles. It is specifically designed to play well with both the tidyverse and Bioconductor software ecosystems, with functionality for reading/writing data files, data cleaning, preprocessing, clustering, visualization, modeling, and other quality-of-life functions. tidytof implements a "grammar" of high-dimensional cytometry data analysis.
Maintained by Timothy Keyes. Last updated 5 months ago.
singlecell, flowcytometry, bioinformatics, cytometry, data-science, single-cell, tidyverse, cpp
18 stars 7.24 score 35 scripts
tidymodels
tidyclust:A Common API to Clustering
A common interface to specifying clustering models, in the same style as 'parsnip'. Creates a unified interface across different functions and computational engines.
Maintained by Emil Hvitfeldt. Last updated 2 months ago.
112 stars 7.21 score 139 scripts
kkholst
targeted:Targeted Inference
Various methods for targeted and semiparametric inference including augmented inverse probability weighted (AIPW) estimators for missing data and causal inference (Bang and Robins (2005) <doi:10.1111/j.1541-0420.2005.00377.x>), variable importance and conditional average treatment effects (CATE) (van der Laan (2006) <doi:10.2202/1557-4679.1008>), estimators for risk differences and relative risks (Richardson et al. (2017) <doi:10.1080/01621459.2016.1192546>), assumption lean inference for generalized linear model parameters (Vansteelandt et al. (2022) <doi:10.1111/rssb.12504>).
Maintained by Klaus K. Holst. Last updated 2 months ago.
causal-inference, double-robust, estimation, semiparametric-estimation, statistics, openblas, cpp, openmp
11 stars 7.20 score 30 scripts 1 dependents
business-science
correlationfunnel:Speed Up Exploratory Data Analysis (EDA) with the Correlation Funnel
Speeds up exploratory data analysis (EDA) by providing a succinct workflow and interactive visualization tools for understanding which features have relationships to target (response). Uses binary correlation analysis to determine relationship. Default correlation method is the Pearson method. Lian Duan, W Nick Street, Yanchi Liu, Songhua Xu, and Brook Wu (2014) <doi:10.1145/2637484>.
Maintained by Matt Dancho. Last updated 1 year ago.
correlation, exploratory-analysis, exploratory-data-analysis, exploratory-data-visualizations, tidyverse
137 stars 7.20 score 115 scripts
cardiomoon
autoReg:Automatic Linear and Logistic Regression and Survival Analysis
Make summary tables for descriptive statistics and select explanatory variables automatically in various regression models. Support linear models, generalized linear models and Cox proportional hazards models. Generate publication-ready tables summarizing results of regression analysis and plots. The tables and plots can be exported in "HTML", "pdf('LaTex')", "docx('MS Word')" and "pptx('MS Powerpoint')" documents.
Maintained by Keon-Woong Moon. Last updated 1 year ago.
49 stars 7.13 score 69 scripts
bioc
animalcules:Interactive microbiome analysis toolkit
animalcules is an R package for utilizing up-to-date data analytics, visualization methods, and machine learning models to provide users an easy-to-use interactive microbiome analysis framework. It can be used as a standalone software package or users can explore their data with the accompanying interactive R Shiny application. Traditional microbiome analysis such as alpha/beta diversity and differential abundance analysis are enhanced, while new methods like biomarker identification are introduced by animalcules. Powerful interactive and dynamic figures generated by animalcules enable users to understand their data better and discover new insights.
Maintained by Jessica McClintock. Last updated 5 months ago.
microbiome, metagenomics, coverage, visualization
55 stars 6.95 score 23 scripts
bioc
pRolocGUI:Interactive visualisation of spatial proteomics data
The package pRolocGUI comprises functions to interactively visualise spatial proteomics data on the basis of pRoloc, pRolocdata and shiny.
Maintained by Lisa Breckels. Last updated 5 months ago.
8 stars 6.90 score 3 scripts
cytomining
cytominer:Methods for Image-Based Cell Profiling
`cytominer` is a suite of common functions used to process high-dimensional readouts from image-based cell profiling experiments.
Maintained by Shantanu Singh. Last updated 2 years ago.
50 stars 6.89 score 44 scripts
tidymodels
agua:'tidymodels' Integration with 'h2o'
Create and evaluate models using 'tidymodels' and 'h2o' <https://h2o.ai/>. The package enables users to specify 'h2o' as an engine for several modeling methods.
Maintained by Qiushi Yan. Last updated 10 months ago.
22 stars 6.88 score 80 scripts
tidymodels
usemodels:Boilerplate Code for 'Tidymodels' Analyses
Code snippets to fit models using the tidymodels framework can be easily created for a given data set.
Maintained by Max Kuhn. Last updated 6 months ago.
84 stars 6.88 score 128 scripts
nmfs-ost
asar:Build NOAA Stock Assessment Report
Build a full or update stock assessment report for any stock assessment model. Parameterization allows the user to call a template based on their regional science center, species, area, etc.
Maintained by Samantha Schiano. Last updated 14 hours ago.
latex, noaa-nsap, quarto, stock-assessment, stock-assessment-reports
22 stars 6.85 score 3 scripts
raymondbalise
rUM:R Templates from the University of Miami
This holds some R Markdown and Quarto templates and a template to create a research project in "RStudio".
Maintained by Raymond Balise. Last updated 13 days ago.
9 stars 6.84 score 16 scripts
kozodoi
fairness:Algorithmic Fairness Metrics
Offers calculation, visualization and comparison of algorithmic fairness metrics. Fair machine learning is an emerging topic with the overarching aim to critically assess whether ML algorithms reinforce existing social biases. Unfair algorithms can propagate such biases and produce predictions with a disparate impact on various sensitive groups of individuals (defined by sex, gender, ethnicity, religion, income, socioeconomic status, physical or mental disabilities). Fair algorithms possess the underlying foundation that these groups should be treated similarly or have similar prediction outcomes. The fairness R package offers the calculation and comparisons of commonly and less commonly used fairness metrics in population subgroups. These methods are described by Calders and Verwer (2010) <doi:10.1007/s10618-010-0190-x>, Chouldechova (2017) <doi:10.1089/big.2016.0047>, Feldman et al. (2015) <doi:10.1145/2783258.2783311> , Friedler et al. (2018) <doi:10.1145/3287560.3287589> and Zafar et al. (2017) <doi:10.1145/3038912.3052660>. The package also offers convenient visualizations to help understand fairness metrics.
Maintained by Nikita Kozodoi. Last updated 2 years ago.
algorithmic-discrimination, algorithmic-fairness, discrimination, disparate-impact, fairness, fairness-ai, fairness-ml, machine-learning
32 stars 6.82 score 69 scripts 1 dependents
michaellli
evalITR:Evaluating Individualized Treatment Rules
Provides various statistical methods for evaluating Individualized Treatment Rules under randomized data. The provided metrics include Population Average Value (PAV), Population Average Prescription Effect (PAPE), Area Under Prescription Effect Curve (AUPEC). It also provides the tools to analyze Individualized Treatment Rules under budget constraints. Detailed reference in Imai and Li (2019) <arXiv:1905.05389>.
Maintained by Michael Lingzhi Li. Last updated 2 years ago.
14 stars 6.78 score 36 scripts
harrison4192
autostats:Auto Stats
Automatically do statistical exploration. Create formulas using 'tidyselect' syntax, and then determine cross-validated model accuracy and variable contributions using 'glm' and 'xgboost'. Contains additional helper functions to create and modify formulas. Has a flagship function to quickly determine relationships between categorical and continuous variables in the data set.
Maintained by Harrison Tietze. Last updated 29 days ago.
6 stars 6.76 score 5 scripts 2 dependents
bioc
SPONGE:Sparse Partial Correlations On Gene Expression
This package provides methods to efficiently detect competitive endogenous RNA interactions between two genes. Such interactions are mediated by one or several miRNAs such that both gene and miRNA expression data for a larger number of samples is needed as input. The SPONGE package now also includes spongEffects: ceRNA modules offer patient-specific insights into the miRNA regulatory landscape.
Maintained by Markus List. Last updated 5 months ago.
geneexpression, transcription, generegulation, networkinference, transcriptomics, systemsbiology, regression, randomforest, machinelearning
6.66 score 38 scripts 1 dependents
business-science
modeltime.resample:Resampling Tools for Time Series Forecasting
A 'modeltime' extension that implements forecast resampling tools that assess time-based model performance and stability for a single time series, panel data, and cross-sectional time series analysis.
Maintained by Matt Dancho. Last updated 1 year ago.
accuracy-metrics, backtesting, bootstrap, bootstrapping, cross-validation, forecasting, modeltime, modeltime-resample, resampling, statistics, tidymodels, time-series
19 stars 6.64 score 38 scripts 1 dependents
bioc
scAnnotatR:Pretrained learning models for cell type prediction on single cell RNA-sequencing data
The package comprises a set of pretrained machine learning models to predict basic immune cell types. This enables all users to quickly get a first annotation of the cell types present in their dataset without requiring prior knowledge. scAnnotatR also allows users to train their own models to predict new cell types based on specific research needs.
Maintained by Johannes Griss. Last updated 5 months ago.
singlecell, transcriptomics, geneexpression, supportvectormachine, classification, software
15 stars 6.61 score 20 scripts
spsanderson
tidyAML:Automatic Machine Learning with 'tidymodels'
The goal of this package is to provide a simple interface for automatic machine learning that fits the 'tidymodels' framework. The intention is to work for regression and classification problems with a simple verb framework.
Maintained by Steven Sanderson. Last updated 11 months ago.
automatic-machine-learning, automl, classification, machine-learning, parsnip, r-language, r-programming, regression, tidy, tidymodels, tidyverse
68 stars 6.56 score 36 scripts 1 dependents
tsailintung
fastdid:Fast Staggered Difference-in-Difference Estimators
A fast and flexible implementation of Callaway and Sant'Anna's (2021)<doi:10.1016/j.jeconom.2020.12.001> staggered Difference-in-Differences (DiD) estimators, 'fastdid' reduces the computation time from hours to seconds, and incorporates extensions such as time-varying covariates and multiple events.
Maintained by Lin-Tung Tsai. Last updated 4 months ago.
difference-in-differences, event-study, staggered-did
28 stars 6.56 score 4 scripts
bioc
condiments:Differential Topology, Progression and Differentiation
This package encapsulates many functions to conduct a differential topology analysis. It focuses on analyzing an 'omic dataset with multiple conditions. While the package is mostly geared toward scRNASeq, it does not place any restriction on the actual input format.
Maintained by Hector Roux de Bezieux. Last updated 4 months ago.
rnaseq, sequencing, software, singlecell, transcriptomics, multiplecomparison, visualization
27 stars 6.54 score 17 scripts
paulowhite
timeROC:Time-Dependent ROC Curve and AUC for Censored Survival Data
Estimation of time-dependent ROC curve and area under time dependent ROC curve (AUC) in the presence of censored data, with or without competing risks. Confidence intervals of AUCs and tests for comparing AUCs of two rival markers measured on the same subjects can be computed, using the iid-representation of the AUC estimator. Plot functions for time-dependent ROC curves and AUC curves are provided. Time-dependent Positive Predictive Values (PPV) and Negative Predictive Values (NPV) can also be computed. See Blanche et al. (2013) <doi:10.1002/sim.5958> and references therein for the details of the methods implemented in the package.
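A minimal sketch on the survival::lung data (the marker, event coding, and evaluation times are illustrative):

  library(survival)
  library(timeROC)
  roc <- timeROC(T = lung$time,
                 delta = as.numeric(lung$status == 2),  # 1 = event, 0 = censored
                 marker = lung$age,
                 cause = 1,
                 times = c(180, 365, 730),
                 iid = TRUE)
  roc$AUC        # time-dependent AUC at each requested time
  confint(roc)   # confidence intervals from the iid representation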
Maintained by Paul Blanche. Last updated 5 years ago.
9 stars 6.46 score 342 scripts 9 dependents
statsgary
OddsPlotty:Odds Plot to Visualise a Logistic Regression Model
Uses the outputs of a logistic regression model, from caret <https://CRAN.R-project.org/package=caret>, to build an odds plot. This allows for the rapid visualisation of odds plot ratios and works best with the outputs of CARET's GLM model class, by returning the final trained model.
Maintained by Gary Hutson. Last updated 2 months ago.
17 stars 6.39 score 48 scripts 1 dependents
mattheaphy
actxps:Create Actuarial Experience Studies: Prepare Data, Summarize Results, and Create Reports
Experience studies are used by actuaries to explore historical experience across blocks of business and to inform assumption setting activities. This package provides functions for preparing data, creating studies, visualizing results, and beginning assumption development. Experience study methods, including exposure calculations, are described in: Atkinson & McGarry (2016) "Experience Study Calculations" <https://www.soa.org/49378a/globalassets/assets/files/research/experience-study-calculations.pdf>. The limited fluctuation credibility method used by the 'exp_stats()' function is described in: Herzog (1999, ISBN:1-56698-374-6) "Introduction to Credibility Theory".
Maintained by Matt Heaphy. Last updated 3 months ago.
14 stars 6.38 score 23 scripts
egeulgen
driveR:Prioritizing Cancer Driver Genes Using Genomics Data
Cancer genomes contain large numbers of somatic alterations but few genes drive tumor development. Identifying cancer driver genes is critical for precision oncology. Most current approaches identify driver genes either based on mutational recurrence or using estimated scores predicting the functional consequences of mutations. 'driveR' is a tool for personalized or batch analysis of genomic data for driver gene prioritization by combining genomic information and prior biological knowledge. As features, 'driveR' uses coding impact metaprediction scores, non-coding impact scores, somatic copy number alteration scores, hotspot gene/double-hit gene condition, 'phenolyzer' gene scores and memberships to cancer-related KEGG pathways. It uses these features to estimate cancer-type-specific probability for each gene of being a cancer driver using the related task of a multi-task learning classification model. The method is described in detail in Ulgen E, Sezerman OU. 2021. driveR: a novel method for prioritizing cancer driver genes using somatic genomics data. BMC Bioinformatics <doi:10.1186/s12859-021-04203-7>.
Maintained by Ege Ulgen. Last updated 2 years ago.
cancer-driverness, driver, driver-gene-prioritization, identify-driver-genes, ranking-genes, scoring
15 stars 6.29 score 260 scripts
bioc
signifinder:Collection and implementation of public transcriptional cancer signatures
signifinder is an R package for computing and exploring a compendium of tumor signatures. It allows users to compute a variety of signatures, based on gene expression values, and returns single-sample scores. Currently, signifinder contains more than 60 distinct signatures collected from the literature, relating to multiple tumors and multiple cancer processes.
Maintained by Stefania Pirrotta. Last updated 3 months ago.
geneexpressiongenetargetimmunooncologybiomedicalinformaticsrnaseqmicroarrayreportwritingvisualizationsinglecellspatialgenesignaling
7 stars 6.28 score 15 scriptsbioc
iNETgrate:Integrates DNA methylation data with gene expression in a single gene network
The iNETgrate package provides functions to build a correlation network in which nodes are genes. DNA methylation and gene expression data are integrated to define the connections between genes. This network is used to identify modules (clusters) of genes. The biological information in each of the resulting modules is represented by an eigengene. These biological signatures can be used as features e.g., for classification of patients into risk categories. The resulting biological signatures are very robust and give a holistic view of the underlying molecular changes.
Maintained by Habil Zare. Last updated 5 months ago.
geneexpressionrnaseqdnamethylationnetworkinferencenetworkgraphandnetworkbiomedicalinformaticssystemsbiologytranscriptomicsclassificationclusteringdimensionreductionprincipalcomponentmrnamicroarraynormalizationgenepredictionkeggsurvivalcore-services
74 stars 6.21 score 1 scriptstidymodels
shinymodels:Interactive Assessments of Models
Launch a 'shiny' application for 'tidymodels' results. For classification or regression models, the app can be used to determine if there is lack of fit or poorly predicted points.
Maintained by Simon Couch. Last updated 5 months ago.
48 stars 6.21 score 48 scriptsanthonydevaux
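A minimal sketch, assuming shinymodels exposes an explore() launcher for resampled tidymodels results; the toy model below is illustrative.
library(tidymodels)
library(shinymodels)
set.seed(1)
res <- fit_resamples(linear_reg(), mpg ~ .,
                     resamples = vfold_cv(mtcars, v = 5),
                     control = control_resamples(save_pred = TRUE))
explore(res)   # launches the Shiny app in an interactive session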
DynForest:Random Forest with Multivariate Longitudinal Predictors
Based on the random forest principle, 'DynForest' can include multiple longitudinal predictors to provide individual predictions. Longitudinal predictors are modeled within the random forest. The methodology is fully described for a survival outcome in: Devaux, Helmer, Genuer & Proust-Lima (2023) <doi: 10.1177/09622802231206477>.
Maintained by Anthony Devaux. Last updated 5 months ago.
16 stars 6.20 score 8 scriptsmatildabrown
rWCVP:Generating Summaries, Reports and Plots from the World Checklist of Vascular Plants
A companion to the World Checklist of Vascular Plants (WCVP). It includes functions to generate maps and species lists, as well as match names to the WCVP. For more details and to cite the package, see: Brown M.J.M., Walker B.E., Black N., Govaerts R., Ondo I., Turner R., Nic Lughadha E. (in press). "rWCVP: A companion R package to the World Checklist of Vascular Plants". New Phytologist.
Maintained by Matilda Brown. Last updated 1 years ago.
22 stars 6.17 score 45 scripts 1 dependentsipd-tools
ipd:Inference on Predicted Data
Performs valid statistical inference on predicted data (IPD) using recent methods, where for a subset of the data, the outcomes have been predicted by an algorithm. Provides a wrapper function with specified defaults for the type of model and method to be used for estimation and inference. Further provides methods for tidying and summarizing results. Salerno et al., (2024) <doi:10.48550/arXiv.2410.09665>.
Maintained by Stephen Salerno. Last updated 3 months ago.
8 stars 6.13 score 5 scriptsandreanini
idiolect:Forensic Authorship Analysis
Carry out comparative authorship analysis of disputed and undisputed texts within the Likelihood Ratio Framework for expressing evidence in forensic science. This package contains implementations of well-known algorithms for comparative authorship analysis, such as Smith and Aldridge's (2011) Cosine Delta <doi:10.1080/09296174.2011.533591> or Koppel and Winter's (2014) Impostors Method <doi:10.1002/asi.22954>, as well as functions to measure their performance and to calibrate their outputs into Log-Likelihood Ratios.
Maintained by Andrea Nini. Last updated 27 days ago.
14 stars 6.12 score 3 scriptssentometricsresearch
sentometrics:An Integrated Framework for Textual Sentiment Time Series Aggregation and Prediction
Optimized prediction based on textual sentiment, accounting for the intrinsic challenge that sentiment can be computed and pooled across texts and time in various ways. See Ardia et al. (2021) <doi:10.18637/jss.v099.i02>.
Maintained by Samuel Borms. Last updated 4 years ago.
nlppredictionsentiment-analysistext-miningtime-seriesopenblascppopenmp
83 stars 6.09 score 49 scriptsbioc
metaseqR2:An R package for the analysis and result reporting of RNA-Seq data by combining multiple statistical algorithms
Provides an interface to several normalization and statistical testing packages for RNA-Seq gene expression data. Additionally, it creates several diagnostic plots, performs meta-analysis by combining the results of several statistical tests and reports the results in an interactive way.
Maintained by Panagiotis Moulos. Last updated 22 days ago.
softwaregeneexpressiondifferentialexpressionworkflowsteppreprocessingqualitycontrolnormalizationreportwritingrnaseqtranscriptionsequencingtranscriptomicsbayesianclusteringcellbiologybiomedicalinformaticsfunctionalgenomicssystemsbiologyimmunooncologyalternativesplicingdifferentialsplicingmultiplecomparisontimecoursedataimportatacseqepigeneticsregressionproprietaryplatformsgenesetenrichmentbatcheffectchipseq
7 stars 6.05 score 3 scriptsnicholasjclark
MRFcov:Markov Random Fields with Additional Covariates
Approximate node interaction parameters of Markov Random Fields graphical networks. Models can incorporate additional covariates, allowing users to estimate how interactions between nodes in the graph are predicted to change across covariate gradients. The general methods implemented in this package are described in Clark et al. (2018) <doi:10.1002/ecy.2221>.
Maintained by Nicholas J Clark. Last updated 1 years ago.
conditional-random-fieldsgraphical-modelsmachine-learningmarkov-random-fieldmultivariate-analysismultivariate-statisticsnetwork-analysisnetworks
24 stars 6.03 score 30 scriptsjacekbialek
PriceIndices:Calculating Bilateral and Multilateral Price Indexes
Preparing a scanner data set for price dynamics calculations (data selecting, data classification, data matching, data filtering). Computing bilateral and multilateral indexes. For details on these methods see: Diewert and Fox (2020) <doi:10.1080/07350015.2020.1816176>, Białek (2019) <doi:10.2478/jos-2019-0014> or Białek (2020) <doi:10.2478/jos-2020-0037>.
Maintained by Jacek Białek. Last updated 2 months ago.
11 stars 6.02 score 16 scriptsbioc
EventPointer:An effective identification of alternative splicing events using junction arrays and RNA-Seq data
EventPointer is an R package to identify alternative splicing events that involve either simple (case-control experiment) or complex experimental designs such as time course experiments and studies including paired-samples. The algorithm can be used to analyze data from either junction arrays (Affymetrix Arrays) or sequencing data (RNA-Seq). The software returns a data.frame with the detected alternative splicing events: gene name, type of event (cassette, alternative 3',...,etc), genomic position, statistical significance and increment of the percent spliced in (Delta PSI) for all the events. The algorithm can generate a series of files to visualize the detected alternative splicing events in IGV. This eases the interpretation of results and the design of primers for standard PCR validation.
Maintained by Juan A. Ferrer-Bonsoms. Last updated 5 months ago.
alternativesplicingdifferentialsplicingmrnamicroarrayrnaseqtranscriptionsequencingtimecourseimmunooncology
4 stars 6.00 score 6 scriptsbusiness-science
alphavantager:Lightweight Interface to the Alpha Vantage API
Alpha Vantage has free historical financial information. All you need to do is get a free API key at <https://www.alphavantage.co>. Then you can use the R interface to retrieve free equity information. Refer to the Alpha Vantage website for more information.
Maintained by Matt Dancho. Last updated 2 years ago.
alpha-vantagefinancial-datahistorical-financial-data
70 stars 5.98 score 64 scriptsbioc
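A short sketch of the described usage; the ticker and the API function requested are illustrative.
library(alphavantager)
av_api_key("YOUR_FREE_API_KEY")            # set your free key once per session
av_get(symbol = "AAPL",
       av_fun = "TIME_SERIES_DAILY",
       outputsize = "compact")             # returns a tibble of daily prices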
consensusOV:Gene expression-based subtype classification for high-grade serous ovarian cancer
This package implements four major subtype classifiers for high-grade serous (HGS) ovarian cancer as described by Helland et al. (PLoS One, 2011), Bentink et al. (PLoS One, 2012), Verhaak et al. (J Clin Invest, 2013), and Konecny et al. (J Natl Cancer Inst, 2014). In addition, the package implements a consensus classifier, which consolidates and improves on the robustness of the proposed subtype classifiers, thereby providing reliable stratification of patients with HGS ovarian tumors of clearly defined subtype.
Maintained by Benjamin Haibe-Kains. Last updated 5 months ago.
classificationclusteringdifferentialexpressiongeneexpressionmicroarraytranscriptomicscancer-datacancer-genomicscancer-researchexpression-databaseovarian-cancer
3 stars 5.98 score 15 scripts 1 dependentsrajohansen
waterquality:Satellite Derived Water Quality Detection Algorithms
The main purpose of waterquality is to quickly and easily convert satellite-based reflectance imagery into one or many well-known water quality algorithms designed for the detection of harmful algal blooms or the following pigment proxies: chlorophyll-a, blue-green algae (phycocyanin), and turbidity. Johansen et al. (2019) <doi:10.21079/11681/35053>.
Maintained by Richard Johansen. Last updated 1 years ago.
algal-bloomalgorithmslandsat-8merismodisolciremote-sensingsentinel-2water-quality
44 stars 5.97 score 21 scriptsbozenne
BuyseTest:Generalized Pairwise Comparisons
Implementation of the Generalized Pairwise Comparisons (GPC) as defined in Buyse (2010) <doi:10.1002/sim.3923> for complete observations, and extended in Peron (2018) <doi:10.1177/0962280216658320> to deal with right-censoring. GPC compare two groups of observations (intervention vs. control group) regarding several prioritized endpoints to estimate the probability that a random observation drawn from one group performs better/worse/equivalently than a random observation drawn from the other group. Summary statistics such as the net treatment benefit, win ratio, or win odds are then deduced from these probabilities. Confidence intervals and p-values are obtained based on asymptotic results (Ozenne 2021 <doi:10.1177/09622802211037067>), non-parametric bootstrap, or permutations. The software enables the use of thresholds of minimal importance difference, stratification, non-prioritized endpoints (O'Brien test), and can handle right-censoring and competing risks.
Maintained by Brice Ozenne. Last updated 3 days ago.
generalized-pairwise-comparisonsnon-parametricstatisticscpp
5 stars 5.95 score 90 scriptsbioc
REMP:Repetitive Element Methylation Prediction
Machine learning-based tools to predict DNA methylation of locus-specific repetitive elements (RE) by learning surrounding genetic and epigenetic information. These tools provide genomewide and single-base resolution of DNA methylation prediction on RE that are difficult to measure using array-based or sequencing-based platforms, which enables epigenome-wide association study (EWAS) and differentially methylated region (DMR) analysis on RE.
Maintained by Yinan Zheng. Last updated 5 months ago.
dnamethylationmicroarraymethylationarraysequencinggenomewideassociationepigeneticspreprocessingmultichanneltwochanneldifferentialmethylationqualitycontroldataimport
2 stars 5.94 score 18 scriptsbioc
miRspongeR:Identification and analysis of miRNA sponge regulation
This package provides several functions to explore miRNA sponge (also called ceRNA or miRNA decoy) regulation from putative miRNA-target interactions and/or transcriptomics data (including bulk, single-cell and spatial gene expression data). It provides eight popular methods for identifying miRNA sponge interactions, and an integrative method to combine miRNA sponge interactions from different methods, as well as functions to validate miRNA sponge interactions, infer miRNA sponge modules, conduct enrichment analysis of miRNA sponge modules, and conduct survival analysis of miRNA sponge modules. By using a sample control variable strategy, it provides a function to infer sample-specific miRNA sponge interactions. For sample-specific miRNA sponge interactions, it implements three similarity methods to construct a sample-sample correlation network.
Maintained by Junpeng Zhang. Last updated 5 months ago.
geneexpressionbiomedicalinformaticsnetworkenrichmentsurvivalmicroarraysoftwaresinglecellspatialrnaseqcernamirnasponge
5 stars 5.88 score 8 scriptsphytoclass
phytoclass:Estimate Chla Concentrations of Phytoplankton Groups
Determine the chlorophyll a (Chl a) concentrations of different phytoplankton groups based on their pigment biomarkers. The method uses non-negative matrix factorisation and simulated annealing to minimise error between the observed and estimated values of pigment concentrations (Hayward et al. (2023) <doi:10.1002/lom3.10541>). The approach is similar to the widely used 'CHEMTAX' program (Mackey et al. 1996) <doi:10.3354/meps144265>, but is more straightforward, accurate, and not reliant on initial guesses for the pigment to Chl a ratios for phytoplankton groups.
Maintained by Alexander Hayward. Last updated 28 days ago.
2 stars 5.88 score 9 scriptsbcallaway11
qte:Quantile Treatment Effects
Provides several methods for computing the Quantile Treatment Effect (QTE) and Quantile Treatment Effect on the Treated (QTT). The main cases covered are (i) Treatment is randomly assigned, (ii) Treatment is as good as randomly assigned after conditioning on some covariates (also called conditional independence or selection on observables) using the methods developed in Firpo (2007) <doi:10.1111/j.1468-0262.2007.00738.x>, (iii) Identification is based on a Difference in Differences assumption (several varieties are available in the package e.g. Athey and Imbens (2006) <doi:10.1111/j.1468-0262.2006.00668.x> Callaway and Li (2019) <doi:10.3982/QE935>, Callaway, Li, and Oka (2018) <doi:10.1016/j.jeconom.2018.06.008>).
Maintained by Brantly Callaway. Last updated 11 months ago.
9 stars 5.87 score 55 scriptsbioc
CytoGLMM:Conditional Differential Analysis for Flow and Mass Cytometry Experiments
The CytoGLMM R package implements two multiple regression strategies: A bootstrapped generalized linear model (GLM) and a generalized linear mixed model (GLMM). Most current data analysis tools compare expressions across many computationally discovered cell types. CytoGLMM focuses on just one cell type. Our narrower field of application allows us to define a more specific statistical model with easier-to-control statistical guarantees. As a result, CytoGLMM finds differential proteins in flow and mass cytometry data while reducing biases arising from marker correlations and safeguarding against false discoveries induced by patient heterogeneity.
Maintained by Christof Seiler. Last updated 16 hours ago.
flowcytometryproteomicssinglecellcellbasedassayscellbiologyimmunooncologyregressionstatisticalmethodsoftware
2 stars 5.86 score 1 scripts 1 dependentsddebeer
permimp:Conditional Permutation Importance
An add-on to the 'party' package, with a faster implementation of the partial-conditional permutation importance for random forests. The standard permutation importance is implemented exactly the same as in the 'party' package. The conditional permutation importance can be computed faster, with an option to be backward compatible to the 'party' implementation. The package is compatible with random forests fit using the 'party' and the 'randomForest' package. The methods are described in Strobl et al. (2007) <doi:10.1186/1471-2105-8-25> and Debeer and Strobl (2020) <doi:10.1186/s12859-020-03622-2>.
Maintained by Dries Debeer. Last updated 6 days ago.
4 stars 5.85 score 39 scripts 1 dependentsvdblab
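A hedged sketch of conditional permutation importance for a 'randomForest' fit, assuming permimp()'s default interface; the keep.forest/keep.inbag settings reflect our understanding of what permimp needs from randomForest objects.
library(randomForest)
library(permimp)
set.seed(1)
rf <- randomForest(mpg ~ ., data = mtcars, ntree = 500,
                   keep.forest = TRUE, keep.inbag = TRUE)
cpi <- permimp(rf, conditional = TRUE)   # conditional permutation importance
cpi$values                               # per-predictor importance values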
FLORAL:Fit Log-Ratio Lasso Regression for Compositional Data
Log-ratio Lasso regression for continuous, binary, and survival outcomes with (longitudinal) compositional features. See Fei and others (2024) <doi:10.1016/j.crmeth.2024.100899>.
Maintained by Teng Fei. Last updated 2 months ago.
12 stars 5.85 score 13 scriptsandreasnordland
polle:Policy Learning
Package for learning and evaluating (subgroup) policies via doubly robust loss functions. Policy learning methods include doubly robust blip/conditional average treatment effect learning and sequential policy tree learning. Methods for (subgroup) policy evaluation include doubly robust cross-fitting and online estimation/sequential validation. See Nordland and Holst (2022) <doi:10.48550/arXiv.2212.02335> for documentation and references.
Maintained by Andreas Nordland. Last updated 6 days ago.
4 stars 5.80 score 6 scriptsnelson-gon
manymodelr:Build and Tune Several Models
Frequently one needs a convenient way to build and tune several models in one go. The goal is to provide a number of machine learning convenience functions: the package can build, tune and obtain predictions from several models in one function. The models are built using functions from 'caret' with easier-to-read syntax. Kuhn (2014) <doi:10.48550/arXiv.1405.6974>.
Maintained by Nelson Gonzabato. Last updated 14 days ago.
analysis-of-varianceanovacorrelationcorrelation-coefficientgeneralized-linear-modelsgradient-boosting-decision-treesknn-classificationlinear-modelslinear-regressionmachine-learningmissing-valuesmodelsr-programmingrandom-forest-algorithmregression-models
2 stars 5.78 score 50 scriptsriccardo-df
causalQual:Causal Inference for Qualitative Outcomes
Implements the framework introduced in Di Francesco and Mellace (2025) <doi:10.48550/arXiv.2502.11691>, shifting the focus to well-defined and interpretable estimands that quantify how treatment affects the probability distribution over outcome categories. It supports selection-on-observables, instrumental variables, regression discontinuity, and difference-in-differences designs.
Maintained by Riccardo Di Francesco. Last updated 22 hours ago.
11 stars 5.74 scoreisobelbarrott
Landmarking:Analysis using Landmark Models
The landmark approach allows survival predictions to be updated dynamically as new measurements from an individual are recorded. The idea is to set predefined time points, known as "landmark times", and form a model at each landmark time using only the individuals in the risk set. This package allows the longitudinal data to be modelled either using the last observation carried forward or linear mixed effects modelling. There is also the option to model competing risks, either through cause-specific Cox regression or Fine-Gray regression. To find out more about the methods in this package, please see <https://isobelbarrott.github.io/Landmarking/articles/Landmarking>.
Maintained by Isobel Barrott. Last updated 2 years ago.
6 stars 5.72 score 44 scriptsstatsgary
MLDataR:Collection of Machine Learning Datasets for Supervised Machine Learning
Contains a collection of datasets for working with machine learning tasks. It contains datasets for supervised machine learning, Jiang (2020) <doi:10.1016/j.beth.2020.05.002>, including datasets for classification and regression. The aim of this package is to use data generated around health and other domains.
Maintained by Gary Hutson. Last updated 1 years ago.
53 stars 5.70 score 19 scriptsbioc
bandle:An R package for the Bayesian analysis of differential subcellular localisation experiments
The Bandle package enables the analysis and visualisation of differential localisation experiments using mass-spectrometry data. Experimental methods supported include dynamic LOPIT-DC, hyperLOPIT, Dynamic Organellar Maps, Dynamic PCP. It provides Bioconductor infrastructure to analyse these data.
Maintained by Oliver M. Crook. Last updated 17 hours ago.
bayesianclassificationclusteringimmunooncologyqualitycontroldataimportproteomicsmassspectrometryopenblascppopenmp
4 stars 5.68 score 3 scriptsbioc
metabCombiner:Method for Combining LC-MS Metabolomics Feature Measurements
This package aligns LC-HRMS metabolomics datasets acquired from biologically similar specimens analyzed under similar, but not necessarily identical, conditions. Peak-picked and simply aligned metabolomics feature tables (consisting of m/z, rt, and per-sample abundance measurements, plus optional identifiers & adduct annotations) are accepted as input. The package outputs a combined table of feature pair alignments, organized into groups of similar m/z, and ranked by a similarity score. Input tables are assumed to be acquired using similar (but not necessarily identical) analytical methods.
Maintained by Hani Habra. Last updated 5 months ago.
softwaremassspectrometrymetabolomicsmass-spectrometry
10 stars 5.65 score 5 scriptsakai01
caretForecast:Conformal Time Series Forecasting Using State of Art Machine Learning Algorithms
Conformal time series forecasting using the caret infrastructure. It provides access to state-of-the-art machine learning models for forecasting applications. The hyperparameters of each model are selected based on time series cross-validation, and forecasting is done recursively.
Maintained by Resul Akay. Last updated 2 years ago.
caretconformal-predictiondata-scienceeconometricsforecastforecastingforecasting-modelsmachine-learningmacroeconometricsmicroeconometricstime-seriestime-series-forcastingtime-series-prediction
25 stars 5.62 score 28 scripts 4 dependentsgrosssbm
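A hedged sketch, assuming ARml() is the main fitting function and that forecasting uses the familiar forecast() generic; the argument names and caret method are assumptions.
library(caretForecast)
fit <- ARml(AirPassengers, max_lag = 12, caret_method = "lm")
fc  <- forecast(fit, h = 12)   # recursive forecasts for the next 12 periods
plot(fc)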
missSBM:Handling Missing Data in Stochastic Block Models
When a network is partially observed (here, NAs in the adjacency matrix rather than 1 or 0 due to missing information between node pairs), it is possible to account for the underlying process that generates those NAs. 'missSBM', presented in 'Barbillon, Chiquet and Tabouy' (2022) <doi:10.18637/jss.v101.i12>, adjusts the popular stochastic block model from network data sampled under various missing data conditions, as described in 'Tabouy, Barbillon and Chiquet' (2019) <doi:10.1080/01621459.2018.1562934>.
Maintained by Julien Chiquet. Last updated 21 days ago.
missing-datanasnetwork-analysisnetwork-datasetstochastic-block-modelcpp
12 stars 5.53 score 19 scriptstechtonique
learningmachine:Machine Learning with Explanations and Uncertainty Quantification
Regression-based Machine Learning with explanations and uncertainty quantification.
Maintained by T. Moudiki. Last updated 4 months ago.
conformal-predictionmachine-learningmachine-learning-algorithmsmachinelearningstatistical-learninguncertainty-quantificationcpp
5 stars 5.53 score 21 scriptsrobindenz1
contsurvplot:Visualize the Effect of a Continuous Variable on a Time-to-Event Outcome
Graphically display the (causal) effect of a continuous variable on a time-to-event outcome using multiple different types of plots based on g-computation. These functions include, among others, survival area plots, survival contour plots, survival quantile plots and 3D surface plots. Due to the use of g-computation, all plots naturally allow confounder adjustment. For details, see Robin Denz, Nina Timmesfeld (2023) <doi:10.1097/EDE.0000000000001630>.
Maintained by Robin Denz. Last updated 2 days ago.
causal-inferencecontinuousg-computationsurvival-analysisvisualization
12 stars 5.53 score 56 scriptsjeffreyhanson
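A hedged sketch, assuming plot_surv_area() is one of the plotting functions and that it takes a fitted Cox model for the g-computation step; the argument names and example data are assumptions.
library(survival)
library(contsurvplot)
lung2 <- transform(lung, status = status - 1)   # recode status to 0/1
# Cox model with x = TRUE so the design matrix is kept for g-computation
mod <- coxph(Surv(time, status) ~ age + sex, data = lung2, x = TRUE)
plot_surv_area(time = "time", status = "status", variable = "age",
               data = lung2, model = mod)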
raptr:Representative and Adequate Prioritization Toolkit in R
Biodiversity is in crisis. The overarching aim of conservation is to preserve biodiversity patterns and processes. To this end, protected areas are established to buffer species and preserve biodiversity processes. But resources are limited and so protected areas must be cost-effective. This package contains tools to generate plans for protected areas (prioritizations), using spatially explicit targets for biodiversity patterns and processes. To obtain solutions in a feasible amount of time, this package uses the commercial 'Gurobi' software (obtained from <https://www.gurobi.com/>). For more information on using this package, see Hanson et al. (2018) <doi:10.1111/2041-210X.12862>.
Maintained by Jeffrey O Hanson. Last updated 1 years ago.
8 stars 5.52 score 83 scriptsjonathan-g
datafsm:Estimating Finite State Machine Models from Data
Automatic generation of finite state machine models of dynamic decision-making that both have strong predictive power and are interpretable in human terms. We use an efficient model representation and a genetic algorithm-based estimation process to generate simple deterministic approximations that explain most of the structure of complex stochastic processes. We have applied the software to empirical data and demonstrated its ability to recover known data-generating processes by simulating data with agent-based models and correctly deriving the underlying decision models for multiple agent models and degrees of stochasticity.
Maintained by Jonathan M. Gilligan. Last updated 4 years ago.
11 stars 5.52 score 30 scriptsfrankiethull
maize:Specialty Kernels for SVMs
Bindings for SVM kernels via 'kernlab' for use with the 'parsnip' package, specifically specialty kernels for support vector machines that are not available in 'parsnip'. The package includes interfaces for various 'kernlab' kernels as well as custom kernels.
Maintained by Frankie T. Hull. Last updated 1 days ago.
10 stars 5.51 score 3 scriptshenryspatialanalysis
mbg:Model-Based Geostatistics
Modern model-based geostatistics for point-referenced data. This package provides a simple interface to run spatial machine learning models and geostatistical models that estimate a continuous (raster) surface from point-referenced outcomes and, optionally, a set of raster covariates. The package also includes functions to summarize raster outcomes by (polygon) region while preserving uncertainty.
Maintained by Nathaniel Henry. Last updated 11 hours ago.
1 stars 5.48 scorehehta
RESIDE:Rapid Easy Synthesis to Inform Data Extraction
Developed to assist researchers with planning analyses prior to obtaining data from Trusted Research Environments (TREs), also known as safe havens. Provides functionality to export and import marginal distributions and to synthesise data from them, with or without correlations, using a multivariate cumulative distribution (copula). Additionally, the International Stroke Trial (IST) is included as an example dataset under the ODC-By licence: Sandercock et al. (2011) <doi:10.7488/ds/104>, Sandercock et al. (2011) <doi:10.1186/1745-6215-12-101>.
Maintained by Ryan Field. Last updated 28 days ago.
5.44 score 5 scriptspersimune
explainer:Machine Learning Model Explainer
It enables detailed interpretation of complex classification and regression models through Shapley analysis including data-driven characterization of subgroups of individuals. Furthermore, it facilitates multi-measure model evaluation, model fairness, and decision curve analysis. Additionally, it offers enhanced visualizations with interactive elements.
Maintained by Ramtin Zargari Marandi. Last updated 6 months ago.
aiclassificationclinical-researchexplainabilityexplainable-aiinterpretabilitymachine-learningregressionshapstatistics
15 stars 5.43 score 12 scriptsjaipizgon
NeuralSens:Sensitivity Analysis of Neural Networks
Analysis functions to quantify inputs importance in neural network models. Functions are available for calculating and plotting the inputs importance and obtaining the activation function of each neuron layer and its derivatives. The importance of a given input is defined as the distribution of the derivatives of the output with respect to that input in each training data point <doi:10.18637/jss.v102.i07>.
Maintained by Jaime Pizarroso Gonzalo. Last updated 6 months ago.
15 stars 5.43 score 24 scriptsflaviomoc
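A minimal sketch, assuming SensAnalysisMLP() accepts a fitted 'nnet' model together with the training data; the trData/output_name argument names are assumptions.
library(nnet)
library(NeuralSens)
set.seed(1)
fit <- nnet(mpg ~ ., data = mtcars, size = 5, linout = TRUE,
            trace = FALSE, maxit = 500)
sens <- SensAnalysisMLP(fit, trData = mtcars, output_name = "mpg")
sens   # distribution of output derivatives with respect to each input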
divraster:Diversity Metrics Calculations for Rasterized Data
Alpha and beta diversity for taxonomic (TD), functional (FD), and phylogenetic (PD) dimensions based on rasters. Spatial and temporal beta diversity can be partitioned into replacement and richness difference components. It also calculates standardized effect size for FD and PD alpha diversity and the average individual traits across multilayer rasters. The layers of the raster represent species, while the cells represent communities. Methods details can be found at Cardoso et al. 2022 <https://CRAN.R-project.org/package=BAT> and Heming et al. 2023 <https://CRAN.R-project.org/package=SESraster>.
Maintained by Flávio M. M. Mota. Last updated 17 days ago.
10 stars 5.40 score 7 scriptsfawda123
WRTDStidal:Weighted Regression for Water Quality Evaluation in Tidal Waters
An adaptation for estuaries (tidal waters) of weighted regression on time, discharge, and season to evaluate trends in water quality time series. Please see Beck and Hagy (2015) <doi:10.1007/s10666-015-9452-8> for details.
Maintained by Marcus W. Beck. Last updated 1 years ago.
4 stars 5.38 score 119 scriptssmaakage85
modelgrid:A Framework for Creating, Managing and Training Multiple Caret Models
A minimalistic but flexible framework that facilitates the creation, management and training of multiple 'caret' models. A model grid consists of two components: (1) a set of settings that is shared by all models by default, and (2) specifications that apply only to the individual models. When the model grid is trained, model and training specifications are first consolidated from the shared and the model specific settings into complete 'caret' model configurations. These models are then trained with the 'train' function from the 'caret' package.
Maintained by Lars Kjeldgaard. Last updated 6 years ago.
caretmachine-learningpredictive-analyticspredictive-modeling
23 stars 5.34 score 19 scriptsadrientaudiere
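A hedged sketch of the shared-settings / model-specific-settings split described above, assuming the model_grid(), share_settings() and add_model() constructors and a train() method for model grids.
library(caret)
library(modelgrid)
library(magrittr)
mg <- model_grid() %>%
  share_settings(y = iris$Species,
                 x = iris[, 1:4],
                 trControl = trainControl(method = "cv", number = 5)) %>%
  add_model(model_name = "CART", method = "rpart") %>%
  add_model(model_name = "Random forest", method = "rf")
mg <- train(mg)          # trains every model in the grid via caret::train
mg$model_fits            # list of fitted caret models (name is an assumption)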
cati:Community Assembly by Traits: Individuals and Beyond
Detect and quantify community assembly processes using trait values of individuals or populations, the T-statistics and other metrics, and dedicated null models.
Maintained by Adrien Taudiere. Last updated 5 months ago.
12 stars 5.33 score 15 scriptstlverse
tmle3shift:Targeted Learning of the Causal Effects of Stochastic Interventions
Targeted maximum likelihood estimation (TMLE) of population-level causal effects under stochastic treatment regimes and related nonparametric variable importance analyses. Tools are provided for TML estimation of the counterfactual mean under a stochastic intervention characterized as a modified treatment policy, such as treatment policies that shift the natural value of the exposure. The causal parameter and estimation were described in Díaz and van der Laan (2013) <doi:10.1111/j.1541-0420.2011.01685.x> and an improved estimation approach was given by Díaz and van der Laan (2018) <doi:10.1007/978-3-319-65304-4_14>.
Maintained by Nima Hejazi. Last updated 6 months ago.
causal-inferencemachine-learningmarginal-structural-modelsstochastic-interventionstargeted-learningtreatment-effectsvariable-importance
17 stars 5.33 score 42 scripts 1 dependentsalexkychen
assignPOP:Population Assignment using Genetic, Non-Genetic or Integrated Data in a Machine Learning Framework
Use Monte-Carlo and K-fold cross-validation coupled with machine-learning classification algorithms to perform population assignment, with functionalities of evaluating discriminatory power of independent training samples, identifying informative loci, reducing data dimensionality for genomic data, integrating genetic and non-genetic data, and visualizing results.
Maintained by Kuan-Yu (Alex) Chen. Last updated 1 years ago.
cross-validationdata-integrationgbsmachine-learningpopulation-assignmentpopulation-genomicsradseq
17 stars 5.33 score 25 scriptsbioc
DaMiRseq:Data Mining for RNA-seq data: normalization, feature selection and classification
The DaMiRseq package offers a tidy pipeline of data mining procedures to identify transcriptional biomarkers and exploit them for both binary and multi-class classification purposes. The package accepts any kind of data presented as a table of raw counts and allows including both continuous and factorial variables that occur with the experimental setting. A series of functions enable the user to clean up the data by filtering genomic features and samples, to adjust data by identifying and removing the unwanted source of variation (i.e. batches and confounding factors) and to select the best predictors for modeling. Finally, a "stacking" ensemble learning technique is applied to build a robust classification model. Every step includes a checkpoint that the user may exploit to assess the effects of data management by looking at diagnostic plots, such as clustering and heatmaps, RLE boxplots, MDS or correlation plot.
Maintained by Mattia Chiesa. Last updated 5 months ago.
sequencingrnaseqclassificationimmunooncologyopenjdk
5.32 score 7 scripts 1 dependentsbioc
preciseTAD:preciseTAD: A machine learning framework for precise TAD boundary prediction
preciseTAD provides functions to predict the location of boundaries of topologically associated domains (TADs) and chromatin loops at base-level resolution. As an input, it takes BED-formatted genomic coordinates of domain boundaries detected from low-resolution Hi-C data, and coordinates of high-resolution genomic annotations from ENCODE or other consortia. preciseTAD employs several feature engineering strategies and resampling techniques to address class imbalance, and trains an optimized random forest model for predicting low-resolution domain boundaries. Translated on a base-level, preciseTAD predicts the probability for each base to be a boundary. Density-based clustering and scalable partitioning techniques are used to detect precise boundary regions and summit points. Compared with low-resolution boundaries, preciseTAD boundaries are highly enriched for CTCF, RAD21, SMC3, and ZNF143 signal and more conserved across cell lines. The pre-trained model can accurately predict boundaries in another cell line using CTCF, RAD21, SMC3, and ZNF143 annotation data for this cell line.
Maintained by Mikhail Dozmorov. Last updated 5 months ago.
softwarehicsequencingclusteringclassificationfunctionalgenomicsfeatureextraction
7 stars 5.29 score 14 scriptsrandcorporation
optic:Simulation Tool for Causal Inference Using Longitudinal Data
Implements a simulation study to assess the strengths and weaknesses of causal inference methods for estimating policy effects using panel data. See Griffin et al. (2021) <doi:10.1007/s10742-022-00284-w> and Griffin et al. (2022) <doi:10.1186/s12874-021-01471-y> for a description of our methods.
Maintained by Pedro Nascimento de Lima. Last updated 3 months ago.
causal-inferencediff-in-difflongitudinal-datasimulation
9 stars 5.26 score 6 scriptsfranciscomartinezdelrio
utsf:Univariate Time Series Forecasting
An engine for univariate time series forecasting using different regression models in an autoregressive way. The engine provides a uniform interface for applying the different models. Furthermore, it is extensible so that users can easily apply their own regression models to univariate time series forecasting and benefit from all the features of the engine, such as preprocessing or estimation of forecast accuracy.
Maintained by Francisco Martinez. Last updated 2 months ago.
2 stars 5.23 score 4 scriptsbioc
scGPS:A complete analysis of single cell subpopulations, from identifying subpopulations to analysing their relationship (scGPS = single cell Global Predictions of Subpopulation)
The package implements two main algorithms to answer two key questions: a SCORE (Stable Clustering at Optimal REsolution) to find subpopulations, followed by scGPS to investigate the relationships between subpopulations.
Maintained by Quan Nguyen. Last updated 5 months ago.
singlecellclusteringdataimportsequencingcoverageopenblascpp
4 stars 5.20 score 7 scriptsbioc
squallms:Speedy quality assurance via lasso labeling for LC-MS data
squallms is a Bioconductor R package that implements a "semi-labeled" approach to untargeted mass spectrometry data. It pulls in raw data from mass-spec files to calculate several metrics that are then used to label MS features in bulk as high or low quality. These metrics of peak quality are then passed to a simple logistic model that produces a fully-labeled dataset suitable for downstream analysis.
Maintained by William Kumler. Last updated 5 months ago.
massspectrometrymetabolomicsproteomicslipidomicsshinyappsclassificationclusteringfeatureextractionprincipalcomponentregressionpreprocessingqualitycontrolvisualization
3 stars 5.13 score 5 scriptsfbertran
plsRcox:Partial Least Squares Regression for Cox Models and Related Techniques
Provides Partial least squares Regression and various regular, sparse or kernel, techniques for fitting Cox models in high dimensional settings <doi:10.1093/bioinformatics/btu660>, Bastien, P., Bertrand, F., Meyer N., Maumy-Bertrand, M. (2015), Deviance residuals-based sparse PLS and sparse kernel PLS regression for censored data, Bioinformatics, 31(3):397-404. Cross validation criteria were studied in <arXiv:1810.02962>, Bertrand, F., Bastien, Ph. and Maumy-Bertrand, M. (2018), Cross validating extensions of kernel, sparse or regular partial least squares regression models to censored data.
Maintained by Frederic Bertrand. Last updated 2 years ago.
4 stars 5.13 score 56 scripts 2 dependentsbioc
SGCP:SGCP: A semi-supervised pipeline for gene clustering using self-training approach in gene co-expression networks
SGC is a semi-supervised pipeline for gene clustering in gene co-expression networks. SGC consists of multiple novel steps that enable the computation of highly enriched modules in an unsupervised manner. But unlike all existing frameworks, it further incorporates a novel step that leverages Gene Ontology information in a semi-supervised clustering method that further improves the quality of the computed modules.
Maintained by Niloofar AghaieAbiane. Last updated 5 months ago.
geneexpressiongenesetenrichmentnetworkenrichmentsystemsbiologyclassificationclusteringdimensionreductiongraphandnetworkneuralnetworknetworkmrnamicroarrayrnaseqvisualizationbioinformaticsgenecoexpressionnetworkgraphsnetworkclusteringnetworksself-trainingsemi-supervised-learningunsupervised-learning
2 stars 5.12 score 44 scriptsspsanderson
healthyverse:Easily Install and Load the 'healthyverse'
The 'healthyverse' is a set of packages that work in harmony because they share common data representations and 'API' design. This package is designed to make it easy to install and load multiple 'healthyverse' packages in a single step.
Maintained by Steven Sanderson. Last updated 7 months ago.
analyticshealthcarehealthcare-applicationinstallationinstallermetapackages
11 stars 5.12 score 24 scriptspromidat
loadeR:Load Data for Analysis System
Provides a framework to load text and Excel files through a 'shiny' graphical interface. It allows renaming, transforming, ordering and removing variables. It includes basic exploratory methods such as the mean, median, mode, normality test, histogram and correlation.
Maintained by Oldemar Rodriguez. Last updated 2 years ago.
5.09 score 275 scripts 3 dependentsbioc
fobitools:Tools for Manipulating the FOBI Ontology
A set of tools for interacting with the Food-Biomarker Ontology (FOBI). A collection of basic manipulation tools for biological significance analysis, graphs, and text mining strategies for annotating nutritional data.
Maintained by Pol Castellano-Escuder. Last updated 4 months ago.
massspectrometrymetabolomicssoftwarevisualizationbiomedicalinformaticsgraphandnetworkannotationcheminformaticspathwaysgenesetenrichmentbiological-intrerpretationbiological-knowledgebiological-significance-analysisenrichment-analysisfood-biomarker-ontologyknowledge-graphnutritionobofoundryontologytext-mining
1 stars 5.08 score 5 scriptslanl
NEONiso:Tools to Calibrate and Work with NEON Atmospheric Isotope Data
Functions for downloading, calibrating, and analyzing atmospheric isotope data bundled into the eddy covariance data products of the National Ecological Observatory Network (NEON) <https://www.neonscience.org>. Calibration tools are provided for carbon and water isotope products. Carbon isotope calibration details are found in Fiorella et al. (2021) <doi:10.1029/2020JG005862>, and the readme file at <https://github.com/lanl/NEONiso>. Tools for calibrating water isotope products have been added as of 0.6.0, but have known deficiencies and should be considered experimental and unsupported.
Maintained by Rich Fiorella. Last updated 1 months ago.
2 stars 5.08 score 6 scriptsadefazio
classifierplots:Generates a Visualization of Classifier Performance as a Grid of Diagnostic Plots
Generates a visualization of binary classifier performance as a grid of diagnostic plots with just one function call. Includes ROC curves, prediction density, accuracy, precision, recall and calibration plots, all using ggplot2 for easy modification. Debug your binary classifiers faster and easier!
Maintained by Aaron Defazio. Last updated 4 years ago.
50 stars 5.08 score 16 scriptsff1201
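A short sketch, assuming the single classifierplots() call takes observed 0/1 labels and predicted probabilities; the simulated data and argument names are assumptions.
library(classifierplots)
set.seed(1)
y <- rbinom(1000, 1, 0.4)                          # observed 0/1 outcomes
p <- plogis(qlogis(0.4) + rnorm(1000) + 1.2 * y)   # imperfect predicted probabilities
classifierplots(test.y = y, pred.prob = p)         # grid of diagnostic plots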
sgs:Sparse-Group SLOPE: Adaptive Bi-Level Selection with FDR Control
Implementation of Sparse-group SLOPE (SGS) (Feser and Evangelou (2023) <doi:10.48550/arXiv.2305.09467>) models. Linear and logistic regression models are supported, both of which can be fit using k-fold cross-validation. Dense and sparse input matrices are supported. In addition, a general Adaptive Three Operator Splitting (ATOS) (Pedregosa and Gidel (2018) <doi:10.48550/arXiv.1804.02339>) implementation is provided. Group SLOPE (gSLOPE) (Brzyski et al. (2019) <doi:10.1080/01621459.2017.1411269>) and group-based OSCAR models (Feser and Evangelou (2024) <doi:10.48550/arXiv.2405.15357>) are also implemented. All models are available with strong screening rules (Feser and Evangelou (2024) <doi:10.48550/arXiv.2405.15357>) for computational speed-up.
Maintained by Fabio Feser. Last updated 9 days ago.
1 stars 5.07 score 13 scripts 1 dependentsjameshwade
measure:A Recipes-style Interface to Tidymodels for Analytical Measurements
Analytical measurements...
Maintained by James Wade. Last updated 2 months ago.
5 stars 5.06 score 58 scriptsaleksandarsekulic
meteo:RFSI & STRK Interpolation for Meteo and Environmental Variables
Random Forest Spatial Interpolation (RFSI, Sekulić et al. (2020) <doi:10.3390/rs12101687>) and spatio-temporal geostatistical (spatio-temporal regression Kriging (STRK)) interpolation for meteorological (Kilibarda et al. (2014) <doi:10.1002/2013JD020803>, Sekulić et al. (2020) <doi:10.1007/s00704-019-03077-3>) and other environmental variables. Contains global spatio-temporal models calculated using publicly available data.
Maintained by Aleksandar Sekulić. Last updated 6 months ago.
18 stars 5.06 score 64 scriptspyanglab
AdaSampling:Adaptive Sampling for Positive Unlabeled and Label Noise Learning
Implements the adaptive sampling procedure, a framework for both positive unlabeled learning and learning with class label noise. Yang, P., Ormerod, J., Liu, W., Ma, C., Zomaya, A., Yang, J. (2018) <doi:10.1109/TCYB.2018.2816984>.
Maintained by Pengyi Yang. Last updated 6 years ago.
11 stars 5.04 score 10 scriptssstoeckl
FFdownload:Download Data from Kenneth French's Website
Downloads all the datasets (you can exclude the daily ones or specify a list of those you are targeting specifically) from Kenneth French's Website at <https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html>, process them and convert them to list of 'xts' (time series).
Maintained by Sebastian Stoeckl. Last updated 10 months ago.
9 stars 5.03 score 12 scriptsacabassi
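A hedged sketch, assuming FFdownload()'s output_file, exclude_daily and inputlist arguments; the downloaded data is saved as an .RData file containing the processed 'xts' series.
library(FFdownload)
tmp <- tempfile(fileext = ".RData")
FFdownload(output_file = tmp, exclude_daily = TRUE,
           inputlist = c("F-F_Research_Data_Factors"))
load(tmp)   # loads the processed list of xts objects into the workspace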
coca:Cluster-of-Clusters Analysis
Contains the R functions needed to perform Cluster-Of-Clusters Analysis (COCA) and Consensus Clustering (CC). For further details please see Cabassi and Kirk (2020) <doi:10.1093/bioinformatics/btaa593>.
Maintained by Alessandra Cabassi. Last updated 5 years ago.
cluster-analysiscluster-of-clustersclusteringcocagenomicsintegrative-clusteringmulti-omics
6 stars 5.03 score 12 scripts 1 dependentsjobnmadu
Dyn4cast:Dynamic Modeling and Machine Learning Environment
Estimates, predicts and forecasts dynamic models as well as machine learning metrics, which assists in model selection for further analysis. The package also provides tools and metrics that are useful in machine learning and modeling, for example quick summaries, percent signs, and Mallow's Cp. The ecosystem of this package is the analysis of economic data for national development. The package is stable and offers high reliability, efficiency and time savings.
Maintained by Job Nmadu. Last updated 18 days ago.
data-scienceequal-lenght-forecastforecastingknotsmachine-learningnigeriapredictionregression-modelsspline-modelsstatisticstime-series
4 stars 5.03 score 38 scriptscaranathunge
promor:Proteomics Data Analysis and Modeling Tools
A comprehensive, user-friendly package for label-free proteomics data analysis and machine learning-based modeling. Data generated from 'MaxQuant' can be easily used to conduct differential expression analysis, build predictive models with top protein candidates, and assess model performance. promor includes a suite of tools for quality control, visualization, missing data imputation (Lazar et. al. (2016) <doi:10.1021/acs.jproteome.5b00981>), differential expression analysis (Ritchie et. al. (2015) <doi:10.1093/nar/gkv007>), and machine learning-based modeling (Kuhn (2008) <doi:10.18637/jss.v028.i05>).
Maintained by Chathurani Ranathunge. Last updated 2 years ago.
biomarkersdifferential-expressionlfqmachine-learningmass-spectrometrymodelingproteomics
15 stars 5.02 score 14 scriptsbioc
MAI:Mechanism-Aware Imputation
A two-step approach to imputing missing data in metabolomics. Step 1 uses a random forest classifier to classify missing values as either Missing Completely at Random/Missing At Random (MCAR/MAR) or Missing Not At Random (MNAR). MCAR/MAR are combined because it is often difficult to distinguish these two missing types in metabolomics data. Step 2 imputes the missing values based on the classified missing mechanisms, using the appropriate imputation algorithms. Imputation algorithms tested and available for MCAR/MAR include Bayesian Principal Component Analysis (BPCA), Multiple Imputation No-Skip K-Nearest Neighbors (Multi_nsKNN), and Random Forest. Imputation algorithms tested and available for MNAR include nsKNN and a single imputation approach for imputation of metabolites where left-censoring is present.
Maintained by Jonathan Dekermanjian. Last updated 5 months ago.
softwaremetabolomicsstatisticalmethodclassificationimputation-methodsmachine-learningmissing-data
2 stars 5.00 score 6 scriptslanedrew
ldmppr:Estimate and Simulate from Location Dependent Marked Point Processes
A suite of tools for estimating, assessing model fit, simulating from, and visualizing location dependent marked point processes characterized by regularity in the pattern. You provide a reference marked point process, a set of raster images containing location specific covariates, and select the estimation algorithm and type of mark model. 'ldmppr' estimates the process and mark models and allows you to check the appropriateness of the model using a variety of diagnostic tools. Once a satisfactory model fit is obtained, you can simulate from the model and visualize the results. Documentation for the package 'ldmppr' is available in the form of a vignette.
Maintained by Lane Drew. Last updated 1 months ago.
1 stars 5.00 score 2 scriptsbioc
jazzPanda:Finding spatially relevant marker genes in image based spatial transcriptomics data
This package contains functions to find marker genes for image-based spatial transcriptomics data. There are functions to create spatial vectors from the cell and transcript coordinates, which are passed as inputs to find marker genes. Marker genes are detected for every cluster by two approaches. The first approach is permutation testing, which is implemented in parallel for finding marker genes in one-sample studies. The other approach is to build a linear model for every gene. This approach can account for multiple samples and background noise.
Maintained by Melody Jin. Last updated 1 months ago.
spatialgeneexpressiondifferentialexpressionstatisticalmethodtranscriptomicscorrelationlinear-modelsmarker-genesspatial-transcriptomics
2 stars 5.00 scorebioc
GARS:GARS: Genetic Algorithm for the identification of Robust Subsets of variables in high-dimensional and challenging datasets
Feature selection aims to identify and remove redundant, irrelevant and noisy variables from high-dimensional datasets. Selecting informative features affects the subsequent classification and regression analyses by improving their overall performances. Several methods have been proposed to perform feature selection: most of them rely on univariate statistics, correlation, entropy measurements or the use of backward/forward regressions. Herein, we propose an efficient, robust and fast method that adopts stochastic optimization approaches for high-dimensional datasets. GARS is an innovative implementation of a genetic algorithm that selects robust features in high-dimensional and challenging datasets.
Maintained by Mattia Chiesa. Last updated 5 months ago.
classificationfeatureextractionclusteringopenjdk
5.00 score 2 scriptsr-forge
plasma:Partial LeAst Squares for Multiomic Analysis
Contains tools for supervised analyses of incomplete, overlapping multiomics datasets. Applies partial least squares in multiple steps to find models that predict survival outcomes. See Yamaguchi et al. (2023) <doi:10.1101/2023.03.10.532096>.
Maintained by Kevin R. Coombes. Last updated 2 months ago.
4.97 score 13 scriptsshanpengli
FastJM:Semi-Parametric Joint Modeling of Longitudinal and Survival Data
Maximum likelihood estimation for the semi-parametric joint modeling of competing risks and longitudinal data applying customized linear scan algorithms, proposed by Li and colleagues (2022) <doi:10.1155/2022/1362913>. The time-to-event data is modelled using a (cause-specific) Cox proportional hazards regression model with time-fixed covariates. The longitudinal outcome is modelled using a linear mixed effects model. The association is captured by shared random effects. The model is estimated using an Expectation Maximization algorithm.
Maintained by Shanpeng Li. Last updated 12 days ago.
5 stars 4.95 score 2 scripts 2 dependentstylerjpike
OOS:Out-of-Sample Time Series Forecasting
A comprehensive and cohesive API for the out-of-sample forecasting workflow: data preparation, forecasting - including both traditional econometric time series models and modern machine learning techniques - forecast combination, model and error analysis, and forecast visualization.
Maintained by Tyler J. Pike. Last updated 4 years ago.
econometricsforecast-combinationforecastingmachine-learning
9 stars 4.95 score 5 scriptsbioc
HPiP:Host-Pathogen Interaction Prediction
HPiP (Host-Pathogen Interaction Prediction) uses an ensemble learning algorithm for prediction of host-pathogen protein-protein interactions (HP-PPIs) using structural and physicochemical descriptors computed from the amino acid composition of host and pathogen proteins. The proposed package can effectively address data shortages and data unavailability for HP-PPI network reconstructions. Moreover, establishing computational frameworks in that regard will reveal mechanistic insights into infectious diseases and suggest potential HP-PPI targets, thus narrowing down the range of possible candidates for subsequent wet-lab experimental validations.
Maintained by Matineh Rahmatbakhsh. Last updated 5 months ago.
proteomicssystemsbiologynetworkinferencestructuralpredictiongenepredictionnetwork
3 stars 4.95 score 6 scriptstjetka
SLEMI:Statistical Learning Based Estimation of Mutual Information
The implementation of the algorithm for estimation of mutual information and channel capacity from experimental data by classification procedures (logistic regression). Technically, it allows estimation of information-theoretic measures between a finite-state input and a multivariate, continuous output. Method described in Jetka et al. (2019) <doi:10.1371/journal.pcbi.1007132>.
Maintained by Tomasz Jetka. Last updated 1 years ago.
channel-capacityinformation-theorylogistic-regressionmutual-information-estimation
4 stars 4.92 score 21 scriptsconnor-reid-tiffany
omu:A Metabolomics Analysis Tool for Intuitive Figures and Convenient Metadata Collection
Facilitates the creation of intuitive figures to describe metabolomics data by utilizing Kyoto Encyclopedia of Genes and Genomes (KEGG) hierarchy data, and gathers functional orthology and gene data from the KEGG-REST API.
Maintained by Connor Tiffany. Last updated 1 years ago.
3 stars 4.89 score 52 scriptsellessenne
KMunicate:KMunicate-Style Kaplan–Meier Plots
Produce Kaplan–Meier plots in the style recommended following the KMunicate study by Morris et al. (2019) <doi:10.1136/bmjopen-2019-030215>. The KMunicate style consists of Kaplan-Meier curves with confidence intervals to quantify uncertainty and an extended risk table (per treatment arm) depicting the number of study subjects at risk, events, and censored observations over time. The resulting plots are built using 'ggplot2' and can be further customised to a certain extent, including themes, fonts, and colour scales.
Maintained by Alessandro Gasparini. Last updated 11 months ago.
7 stars 4.89 score 11 scriptsfmgarciadiaz
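A minimal sketch, assuming the package exposes a single KMunicate() function that takes a survfit object and a vector of time points for the extended risk table.
library(survival)
library(KMunicate)
fit <- survfit(Surv(time, status) ~ sex, data = lung)
KMunicate(fit = fit, time_scale = seq(0, 1000, by = 200))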
PortalHacienda:Access Data from Argentina's Portal de Hacienda with R
Obtain a listing of datasets, and access and extend series from the Portal de Datos de Hacienda. Search, download and forecast time series from the Ministry of Economy of Argentina. Forecasts are built with the 'forecast' package, Hyndman RJ, Khandakar Y (2008) <doi:10.18637/jss.v027.i03>.
Maintained by Fernando Garcia Diaz. Last updated 2 years ago.
apiargentinaeconomiaministerio-de-economiaseries-de-tiempo
15 stars 4.88 score 7 scriptsbdwilliamson
flevr:Flexible, Ensemble-Based Variable Selection with Potentially Missing Data
Perform variable selection in settings with possibly missing data based on extrinsic (algorithm-specific) and intrinsic (population-level) variable importance. Uses a Super Learner ensemble to estimate the underlying prediction functions that give rise to estimates of variable importance. For more information about the methods, please see Williamson and Huang (2023+) <arXiv:2202.12989>.
Maintained by Brian D. Williamson. Last updated 1 years ago.
5 stars 4.88 score 2 scriptsrobson-fernandes
bnviewer:Bayesian Networks Interactive Visualization and Explainable Artificial Intelligence
Bayesian networks provide an intuitive framework for probabilistic reasoning and its graphical nature can be interpreted quite clearly. Graph based methods of machine learning are becoming more popular because they offer a richer model of knowledge that can be understood by a human in a graphical format. The 'bnviewer' is an R Package that allows the interactive visualization of Bayesian Networks. The aim of this package is to improve the Bayesian Networks visualization over the basic and static views offered by existing packages.
Maintained by Robson Fernandes. Last updated 5 years ago.
bayesian-inferencebayesian-networkbayesian-networksprobabilistic-graphical-models
7 stars 4.86 score 69 scripts 1 dependentshectorrdb
Ecume:Equality of 2 (or k) Continuous Univariate and Multivariate Distributions
We implement (or re-implement in R) a variety of statistical tools. They are focused on non-parametric two-sample (or k-sample) distribution comparisons in the univariate or multivariate case. See the vignette for more info.
Maintained by Hector Roux de Bezieux. Last updated 10 months ago.
1 stars 4.86 score 16 scripts 3 dependentsgokmenzararsiz
dtComb:Statistical Combination of Diagnostic Tests
A system for combining two diagnostic tests using various approaches that include statistical and machine-learning-based methodologies. These approaches are divided into four groups: linear combination methods, non-linear combination methods, mathematical operators, and machine learning algorithms. See the <https://biotools.erciyes.edu.tr/dtComb/> website for more information, documentation, and examples.
Maintained by Gokmen Zararsiz. Last updated 7 days ago.
4.85 score 7 scriptsepivec
TDLM:Systematic Comparison of Trip Distribution Laws and Models
The main purpose of this package is to propose a rigorous framework to fairly compare trip distribution laws and models as described in Lenormand et al. (2016) <doi:10.1016/j.jtrangeo.2015.12.008>.
Maintained by Maxime Lenormand. Last updated 28 days ago.
2 stars 4.85 score 3 scriptsbioc
MLSeq:Machine Learning Interface for RNA-Seq Data
This package applies several machine learning methods, including SVM, bagSVM, Random Forest, and CART, to RNA-Seq data.
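A condensed sketch of the typical workflow (raw counts in, trained classifier out); the classify() arguments follow the package vignette as I recall it and should be verified:

  library(DESeq2)
  library(MLSeq)
  library(caret)

  # Simulated count matrix: 200 genes x 20 samples, two conditions
  set.seed(1)
  counts <- matrix(rpois(200 * 20, lambda = 30), nrow = 200,
                   dimnames = list(NULL, paste0("s", 1:20)))
  coldata <- data.frame(condition = factor(rep(c("control", "case"), each = 10)),
                        row.names = colnames(counts))
  dds <- DESeqDataSetFromMatrix(countData = counts, colData = coldata,
                                design = ~ condition)

  # Radial-kernel SVM on variance-stabilised counts (argument names assumed)
  fit <- classify(data = dds, method = "svmRadial", preProcessing = "deseq-vst",
                  ref = "control",
                  control = trainControl(method = "repeatedcv", number = 5, repeats = 2))
  predict(fit, dds)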
Maintained by Gokmen Zararsiz. Last updated 5 months ago.
immunooncologysequencingrnaseqclassificationclustering
4.81 score 27 scripts 1 dependentsharrison4192
tidybins:Make Tidy Bins
Multiple ways to bin numeric columns with a tidy output. Wraps a variety of existing binning methods into one function, and includes a new method for binning by equal value, which is useful for sales data. Provides a function to automatically summarize the properties of the binned columns.
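A small sketch, assuming the main entry point is bin_cols() with n_bins and bin_type arguments as suggested by the package README (unverified):

  library(dplyr)
  library(tidybins)

  # Bin a numeric column into 5 groups of (roughly) equal total value,
  # returning the original data with the new binned column appended
  mtcars %>%
    bin_cols(mpg, n_bins = 5, bin_type = "value")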
Maintained by Harrison Tietze. Last updated 10 months ago.
4 stars 4.78 score 2 scripts 1 dependentsocbe-uio
BayesSurvive:Bayesian Survival Models for High-Dimensional Data
An implementation of Bayesian survival models with graph-structured selection priors for sparse identification of omics features predictive of survival (Madjar et al., 2021 <doi:10.1186/s12859-021-04483-z>). The package also extends this model to use a fixed graph via a Markov Random Field (MRF) prior that captures known structure among omics features, e.g. disease-specific pathways from the Kyoto Encyclopedia of Genes and Genomes database (Hermansen et al., 2025 <doi:10.48550/arXiv.2503.13078>).
Maintained by Zhi Zhao. Last updated 13 days ago.
bayesian-cox-modelsbayesian-variable-selectiongraph-learninghigh-dimensional-statisticsomics-data-integrationsurvival-analysisopenblascppopenmp
4.78 score 1 scriptsbioc
supersigs:Supervised mutational signatures
Generate SuperSigs (supervised mutational signatures) from single nucleotide variants in the cancer genome. Functions included in the package allow the user to learn supervised mutational signatures from their data and apply them to new data. The methodology is based on the one described in Afsari (2021, eLife).
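An assumed workflow sketch; get_signature()/predict_signature() and the example_dt dataset are recalled from the package vignette and may differ in detail:

  library(supersigs)

  # Learn a supervised signature for an exposure factor from annotated variants
  sig <- get_signature(data = example_dt, factor = "Smoking")

  # Apply the learned signature to (new) samples
  predict_signature(sig, newdata = example_dt, factor = "Smoking")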
Maintained by Albert Kuo. Last updated 5 months ago.
featureextractionclassificationregressionsequencingwholegenomesomaticmutation
3 stars 4.78 score 3 scriptsjoelcuerrier
cdid:The Chained Difference-in-Differences
Extends the 'did' package to improve efficiency and handling of unbalanced panel data. Bellego, Benatia, and Dortet-Bernadet (2024), "The Chained Difference-in-Differences", Journal of Econometrics, <doi:10.1016/j.jeconom.2024.105783>.
Maintained by David Benatia. Last updated 2 months ago.
2 stars 4.78 score 3 scriptsadimajo
glmtree:Logistic Regression Trees
A logistic regression tree is a decision tree with logistic regressions at its leaves. A particular stochastic expectation-maximization algorithm is used to draw a few good trees, which are then assessed via the user's criterion of choice among BIC, AIC, or test-set Gini. The formal development is given in a PhD thesis chapter; see Ehrhardt (2019) <https://github.com/adimajo/manuscrit_these/releases/>.
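A short sketch, assuming the main entry point is glmtree(x, y, ...) taking a predictor data frame and a binary response; the K and criterion arguments are assumptions:

  library(glmtree)

  set.seed(1)
  x <- data.frame(x1 = rnorm(500), x2 = rnorm(500))
  y <- rbinom(500, 1, plogis(ifelse(x$x2 > 0, x$x1, -x$x1)))  # effect flips across a split

  # Draw several candidate logistic regression trees and keep the best by BIC
  fit <- glmtree(x = x, y = y, K = 5, criterion = "bic")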
Maintained by Adrien Ehrhardt. Last updated 1 years ago.
6 stars 4.78 score 3 scriptscefet-rj-dal
heimdall:Drift Adaptable Models
In streaming datasets, the data distribution or a model's accuracy can change significantly over the course of prediction (concept drift). The goal of 'heimdall' is to detect when concept drift occurs. The package makes several state-of-the-art methods available and also addresses how to adapt models in a nonstationary context. Some concept drift methods are described in Tavares (2022) <doi:10.1007/s12530-021-09415-z>.
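As a concept-level illustration in plain R (not the 'heimdall' API): one simple drift check compares a sliding window of recent observations against a reference window with a two-sample test and flags drift when they diverge.

  set.seed(1)
  reference <- rnorm(500)                            # data the model was trained on
  stream    <- c(rnorm(300), rnorm(200, mean = 2))   # stream whose distribution shifts

  window <- 100
  drift_at <- NA
  for (i in seq(window, length(stream), by = 10)) {
    recent <- stream[(i - window + 1):i]
    if (ks.test(reference, recent)$p.value < 0.01) { drift_at <- i; break }
  }
  drift_at  # first index at which drift is flagged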
Maintained by Eduardo Ogasawara. Last updated 2 months ago.
2 stars 4.77 score 45 scriptshknd23
DeepLearningCausal:Causal Inference with Super Learner and Deep Neural Networks
Functions to estimate Conditional Average Treatment Effects (CATE) and Population Average Treatment Effects on the Treated (PATT) from experimental or observational data using the Super Learner (SL) ensemble method and deep neural networks. The package first provides functions to implement meta-learners such as the Single-learner (S-learner) and Two-learner (T-learner) described in Künzel et al. (2019) <doi:10.1073/pnas.1804597116> for estimating the CATE. The S- and T-learner are each estimated using the SL ensemble method and deep neural networks. It then provides functions to implement the Ottoboni and Poulos (2020) <doi:10.1515/jci-2018-0035> PATT-C estimator to obtain the PATT from experimental data with noncompliance by using the SL ensemble method and deep neural networks.
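To make the meta-learner idea concrete, here is a generic T-learner sketch in plain R (not the package's own API): fit one outcome model per treatment arm and take the difference of their predictions as the CATE estimate.

  set.seed(1)
  n <- 500
  x <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
  w <- rbinom(n, 1, 0.5)                                # treatment indicator
  y <- 1 + x$x1 + w * (0.5 + 0.5 * x$x2) + rnorm(n)     # heterogeneous treatment effect

  # T-learner: separate outcome models for treated and control units
  dat  <- cbind(x, y = y)
  fit1 <- lm(y ~ x1 + x2, data = dat[w == 1, ])
  fit0 <- lm(y ~ x1 + x2, data = dat[w == 0, ])
  cate_hat <- predict(fit1, newdata = x) - predict(fit0, newdata = x)
  summary(cate_hat)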
Maintained by Nguyen K. Huynh. Last updated 2 months ago.
causal-inferencedeep-neural-networksmachine-learning
2 stars 4.73 score 5 scripts