R-universe search: roc

xrobin

pROC:Display and Analyze ROC Curves

Tools for visualizing, smoothing and comparing receiver operating characteristic (ROC) curves. (Partial) area under the curve (AUC) can be compared with statistical tests based on U-statistics or bootstrap. Confidence intervals can be computed for (p)AUC or ROC curves.

Maintained by Xavier Robin. Last updated 4 months ago.

bootstrapping covariance hypothesis-testing machine-learning plot plotting roc roc-curve variance cpp

66.5 match 125 stars 15.18 score 16k scripts 445 dependents
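
pROC's description mentions tests based on U-statistics. The underlying link is that the empirical AUC equals the Mann-Whitney U statistic divided by the number of (positive, negative) pairs. The Python sketch below illustrates only that identity. It is not pROC's code, and the function name `empirical_auc` and the toy scores are hypothetical.

```python
from itertools import product


def empirical_auc(pos_scores, neg_scores):
    """Empirical AUC computed as a normalized Mann-Whitney U statistic.

    Every (positive, negative) pair contributes 1 if the positive case has the
    higher score, 0.5 for a tie, and 0 otherwise; the sum is divided by the
    number of pairs.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    u = 0.0
    for p, n in product(pos_scores, neg_scores):
        if p > n:
            u += 1.0
        elif p == n:
            u += 0.5
    return u / (len(pos_scores) * len(neg_scores))


# Hypothetical scores for diseased (positive) and healthy (negative) cases.
diseased = [0.9, 0.8, 0.4]
healthy = [0.7, 0.4, 0.1]
print(empirical_auc(diseased, healthy))  # 7.5 concordant pairs out of 9, about 0.833
```

Ties count as half a concordant pair, the usual convention for the empirical AUC. The pairwise loop costs O(n*m), so a rank-based formula is preferable for large samples.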

bioc

ROC:utilities for ROC, with microarray focus

Provide utilities for ROC, with microarray focus.

Maintained by Vince Carey. Last updated 5 months ago.

differentialexpression

71.0 match 6.97 score 70 scripts 8 dependents

sachsmc

plotROC:Generate Useful ROC Curve Charts for Print and Interactive Use

Most ROC curve plots obscure the cutoff values and inhibit interpretation and comparison of multiple curves. This package attempts to address those shortcomings by providing plotting and interactive tools. Functions are provided to generate an interactive ROC curve plot for web use, as well as print versions. A Shiny application implementing the functions is also included.

Maintained by Michael C. Sachs. Last updated 4 months ago.

36.5 match 87 stars 10.93 score 932 scripts 7 dependents

dpc10ster

RJafroc:Artificial Intelligence Systems and Observer Performance

Analyzing the performance of artificial intelligence (AI) systems/algorithms characterized by a 'search-and-report' strategy. Historically, observer performance research has dealt with measuring radiologists' performance in search tasks, e.g., searching for lesions in medical images and reporting them, but the implicit location information has been ignored. The implemented methods apply to analyzing the absolute and relative performances of AI systems, comparing AI performance to a group of human readers or optimizing the reporting threshold of an AI system. In addition to performing historical receiver operating characteristic (ROC) analysis (localization information ignored), the software also performs free-response receiver operating characteristic (FROC) analysis, where lesion localization information is used. A book using the software has been published: Chakraborty DP: Observer Performance Methods for Diagnostic Imaging - Foundations, Modeling, and Applications with R-Based Examples, Taylor-Francis LLC; 2017: <https://www.routledge.com/Observer-Performance-Methods-for-Diagnostic-Imaging-Foundations-Modeling/Chakraborty/p/book/9781482214840>. Online updates to this book, which use the software, are at <https://dpc10ster.github.io/RJafrocQuickStart/>, <https://dpc10ster.github.io/RJafrocRocBook/> and <https://dpc10ster.github.io/RJafrocFrocBook/>. Supported data collection paradigms are ROC, FROC and location ROC (LROC). ROC data consist of a single rating per image, where a rating is the perceived confidence level that the image is that of a diseased patient. An ROC curve is a plot of true positive fraction vs. false positive fraction. FROC data consist of a variable number (zero or more) of mark-rating pairs per image, where a mark is the location of a reported suspicious region and the rating is the confidence level that it is a real lesion. LROC data consist of a rating and the location of the most suspicious region for every image.
Four models of observer performance, with curve-fitting software, are implemented: the binormal model (BM), the contaminated binormal model (CBM), the correlated contaminated binormal model (CORCBM), and the radiological search model (RSM). Unlike the binormal model, CBM, CORCBM and RSM predict 'proper' ROC curves that do not inappropriately cross the chance diagonal. Additionally, RSM parameters are related to search performance (not measured in conventional ROC analysis) and classification performance. Search performance refers to finding lesions, i.e., true positives, while simultaneously not finding false positive locations. Classification performance measures the ability to distinguish between true and false positive locations. Knowing these separate performances allows principled optimization of reader or AI system performance. This package supersedes Windows JAFROC (jackknife alternative FROC) software V4.2.1, <https://github.com/dpc10ster/WindowsJafroc>. Package functions are organized as follows. Data-file-related function names are prefixed with 'Df', curve fitting functions with 'Fit', included data sets with 'dataset', plotting functions with 'Plot', significance testing functions with 'St', sample size related functions with 'Ss', data simulation functions with 'Simulate' and utility functions with 'Util'. Implemented are figures of merit (FOMs) for quantifying performance and functions for visualizing empirical or fitted operating characteristics: e.g., ROC, FROC, alternative FROC (AFROC) and weighted AFROC (wAFROC) curves. For fully crossed study designs, significance testing of reader-averaged FOM differences between modalities is implemented via either the Dorfman-Berbaum-Metz or the Obuchowski-Rockette method. Also implemented is single modality analysis, which allows comparison of the performance of a group of radiologists to a specified value, or comparison of AI to a group of radiologists interpreting the same cases.
Crossed-modality analysis is implemented for designs with two crossed modality factors, where the aim is to determine performance at each level of one modality factor, averaged over all levels of the other factor. Sample size estimation tools are provided for ROC and FROC studies; these use estimates of the relevant variances from a pilot study to predict the numbers of readers and cases required in a pivotal study to achieve the desired power. Utility and data file manipulation functions allow data to be read in any of the currently used input formats, including Excel, and the results of the analysis can be viewed in text or Excel output files. The methods are illustrated with several included datasets from the author's collaborations. This update includes improvements to the code, some resulting from user-reported bugs and new feature requests, and others discovered during ongoing testing and code simplification.

Maintained by Dev Chakraborty. Last updated 5 months ago.

ai-optimization artificial-intelligence-algorithms computer-aided-diagnosis froc-analysis roc-analysis target-classification target-localization cpp

67.4 match 19 stars 5.69 score 65 scripts

thie1e

cutpointr:Determine and Evaluate Optimal Cutpoints in Binary Classification Tasks

Estimate cutpoints that optimize a specified metric in binary classification tasks and validate performance using bootstrapping. Some methods for more robust cutpoint estimation are supported, e.g. a parametric method assuming normal distributions, bootstrapped cutpoints, and smoothing of the metric values per cutpoint using Generalized Additive Models. Various plotting functions are included. For an overview of the package see Thiele and Hirschfeld (2021) <doi:10.18637/jss.v098.i11>.

Maintained by Christian Thiele. Last updated 3 months ago.

bootstrapping cutpoint-optimization roc-curve cpp

26.8 match 88 stars 10.44 score 322 scripts 1 dependent
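
cutpointr estimates cutpoints that optimize a chosen metric. As a hedged illustration of that idea (not cutpointr's API), the Python sketch below scans the observed scores and keeps the cutpoint that maximizes Youden's J = sensitivity + specificity - 1. The rule "positive when score >= cutpoint", the function name `youden_cutpoint`, and the example data are assumptions made for this sketch.

```python
def youden_cutpoint(scores, labels):
    """Return (cutpoint, J) maximizing Youden's J = sensitivity + specificity - 1.

    labels are 1 (positive) or 0 (negative). A case is predicted positive when
    score >= cutpoint, and every observed score is tried as a candidate.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    best_cut, best_j = None, float("-inf")
    for cut in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cut and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < cut and y == 0)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:  # strict '>' keeps the lowest cutpoint among ties
            best_cut, best_j = cut, j
    return best_cut, best_j


scores = [0.1, 0.3, 0.35, 0.6, 0.8, 0.9]
labels = [0, 0, 1, 0, 1, 1]
cut, j = youden_cutpoint(scores, labels)
print(cut, round(j, 3))  # 0.35 0.667
```

Here 0.35 and 0.8 tie on J, and the sketch keeps the lower one. A cutpoint chosen by in-sample maximization like this tends to look optimistic, which is one reason to validate it with bootstrapping, as the description mentions.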

spatstat

spatstat.explore:Exploratory Data Analysis for the 'spatstat' Family

Functionality for exploratory data analysis and nonparametric analysis of spatial data, mainly spatial point patterns, in the 'spatstat' family of packages. (Excludes analysis of spatial data on a linear network, which is covered by the separate package 'spatstat.linnet'.) Methods include quadrat counts, K-functions and their simulation envelopes, nearest neighbour distance and empty space statistics, Fry plots, pair correlation function, kernel smoothed intensity, relative risk estimation with cross-validated bandwidth selection, mark correlation functions, segregation indices, mark dependence diagnostics, and kernel estimates of covariate effects. Formal hypothesis tests of random pattern (chi-squared, Kolmogorov-Smirnov, Monte Carlo, Diggle-Cressie-Loosmore-Ford, Dao-Genton, two-stage Monte Carlo) and tests for covariate effects (Cox-Berman-Waller-Lawson, Kolmogorov-Smirnov, ANOVA) are also supported.

Maintained by Adrian Baddeley. Last updated 1 month ago.

cluster-detection confidence-intervals hypothesis-testing k-function roc-curves scan-statistics significance-testing simulation-envelopes spatial-analysis spatial-data-analysis spatial-sharpening spatial-smoothing spatial-statistics

15.1 match 1 star 10.17 score 67 scripts 148 dependents

erikpeter

fbroc:Fast Algorithms to Bootstrap Receiver Operating Characteristics Curves

Implements a very fast C++ algorithm to bootstrap receiver operating characteristic (ROC) curves and derived performance metrics, including the area under the curve (AUC) and the partial area under the curve, as well as the true and false positive rates. The analysis of paired ROC curves is also supported, so that two predictors can be compared. You can also plot the results and calculate confidence intervals. On a typical desktop computer, calculating 100,000 bootstrap replicates for 500 observations takes on the order of one second.

Maintained by Erik Peter. Last updated 6 years ago.

cpp

33.0 match 7 stars 4.28 score 18 scripts
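
fbroc's description refers to bootstrapping ROC curves and the AUC. The plain Python sketch below shows a percentile bootstrap confidence interval for the AUC. It does not reproduce fbroc's fast C++ algorithm; stratified resampling (positives and negatives resampled separately), the 2000 replicates, the function names, and the fixed seed are assumptions chosen for illustration.

```python
import random


def empirical_auc(pos, neg):
    """Share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(pos, neg, n_boot=2000, level=0.95, seed=1):
    """Point estimate and percentile bootstrap interval for the AUC.

    Positives and negatives are resampled separately (stratified bootstrap),
    so every replicate contains both classes.
    """
    rng = random.Random(seed)
    reps = []
    for _ in range(n_boot):
        boot_pos = rng.choices(pos, k=len(pos))
        boot_neg = rng.choices(neg, k=len(neg))
        reps.append(empirical_auc(boot_pos, boot_neg))
    reps.sort()
    alpha = (1 - level) / 2
    lower = reps[int(alpha * (n_boot - 1))]
    upper = reps[int((1 - alpha) * (n_boot - 1))]
    return empirical_auc(pos, neg), lower, upper


pos = [0.9, 0.8, 0.75, 0.6, 0.4]
neg = [0.7, 0.5, 0.3, 0.2, 0.1]
auc, lower, upper = bootstrap_auc_ci(pos, neg)
print(f"AUC = {auc:.2f}, 95% CI [{lower:.2f}, {upper:.2f}]")
```

Each replicate recomputes the AUC from scratch in O(n*m) time. That is why a naive approach like this becomes slow when many replicates or large samples are needed.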

veseshan

clinfun:Clinical Trial Design and Data Analysis Functions

Utilities to make your clinical collaborations easier if not fun. It contains functions for designing studies such as Simon 2-stage and group sequential designs and for data analysis such as Jonckheere-Terpstra test and estimating survival quantiles.

Maintained by Venkatraman E. Seshan. Last updated 1 year ago.

fortran

17.2 match 5 stars 7.86 score 124 scripts 8 dependents

riazakhan94

ROCit:Performance Assessment of Binary Classifier with Visualization

Sensitivity (or recall or true positive rate), false positive rate, specificity, precision (or positive predictive value), negative predictive value, misclassification rate, accuracy, F-score: these are popular metrics for assessing the performance of a binary classifier at a given threshold. The receiver operating characteristic (ROC) curve is a common tool for assessing the overall diagnostic ability of a binary classifier. Rather than depending on a single threshold, the area under the ROC curve (AUC) is a summary statistic of how well a binary classifier performs overall on the classification task. The ROCit package makes it easy to evaluate threshold-bound metrics. ROC curves, along with AUC, can also be obtained using different methods, such as empirical, binormal and non-parametric. ROCit encompasses a wide variety of methods for constructing confidence intervals for the ROC curve and AUC. ROCit can also construct an empirical gains table, which is a handy tool for direct marketing. The package offers commonly used visualizations, such as the ROC curve, KS plot and lift plot. Along with built-in default graphics settings, there is room for manual tweaks by supplying the necessary values as function arguments. ROCit offers a wide range of functionality, yet it is easy to use.

Maintained by Md Riaz Ahmed Khan. Last updated 3 years ago.

16.6 match 7.66 score 332 scripts 6 dependents
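
ROCit's description lists several threshold-bound metrics. The Python sketch below computes them from the confusion matrix at a single threshold. It is not ROCit's API; the rule "positive when score >= threshold", the function name `threshold_metrics`, the use of F1 (the harmonic mean of precision and sensitivity) as the F-score, and the example data are assumptions. Ratios with a zero denominator are returned as NaN.

```python
import math


def threshold_metrics(scores, labels, threshold):
    """Confusion-matrix metrics when a case is called positive if score >= threshold.

    labels are 1 (positive) or 0 (negative). Ratios with a zero denominator
    are returned as NaN.
    """
    tp = fp = tn = fn = 0
    for s, y in zip(scores, labels):
        predicted_positive = s >= threshold
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1

    def ratio(num, den):
        return num / den if den else math.nan

    sensitivity = ratio(tp, tp + fn)  # recall, true positive rate
    precision = ratio(tp, tp + fp)    # positive predictive value
    n = tp + fp + tn + fn
    return {
        "sensitivity": sensitivity,
        "false_positive_rate": ratio(fp, fp + tn),
        "specificity": ratio(tn, tn + fp),
        "precision": precision,
        "npv": ratio(tn, tn + fn),
        "accuracy": ratio(tp + tn, n),
        "misclassification_rate": ratio(fp + fn, n),
        "f_score": ratio(2 * precision * sensitivity, precision + sensitivity),
    }


metrics = threshold_metrics([0.2, 0.4, 0.55, 0.7, 0.9], [0, 1, 0, 1, 1], 0.5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Sweeping the threshold over all observed scores and plotting sensitivity against the false positive rate traces the empirical ROC curve that the description contrasts with these single-threshold metrics.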

foucher-y

RISCA:Causal Inference and Prediction in Cohort-Based Analyses

Numerous functions for cohort-based analyses, either for prediction or causal inference. For causal inference, it includes Inverse Probability Weighting and G-computation for marginal estimation of an exposure effect when confounders are expected. We deal with binary outcomes, times-to-events, competing events, and multi-state data. For multi-state data, semi-Markov models with interval censoring may be considered, and we offer the possibility of accounting for the excess mortality related to the disease compared with reference life tables. For predictive studies, we propose a set of functions to estimate time-dependent receiver operating characteristic (ROC) curves, possibly accounting for right-censored times-to-events or the presence of confounders. Finally, several functions are available to assess time-dependent ROC curves or survival curves from aggregated data.

Maintained by Yohann Foucher. Last updated 23 days ago.

28.0 match 1 star 4.33 score 47 scripts

ledell

cvAUC:Cross-Validated Area Under the ROC Curve Confidence Intervals

Tools for working with and evaluating cross-validated area under the ROC curve (AUC) estimators. The primary functions of the package are ci.cvAUC and ci.pooled.cvAUC, which report cross-validated AUC and compute confidence intervals for cross-validated AUC estimates based on influence curves for i.i.d. and pooled repeated measures data, respectively. One benefit to using influence curve based confidence intervals is that they require much less computation time than bootstrapping methods. The utility functions, AUC and cvAUC, are simple wrappers for functions from the ROCR package.

Maintained by Erin LeDell. Last updated 3 years ago.

auc confidence-intervals cross-validation machine-learning statistics variance

12.1 match 23 stars 9.17 score 317 scripts 40 dependents
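
cvAUC's cross-validated AUC is the mean of the AUCs computed separately on each validation fold. The Python sketch below shows only that point estimate; the influence-curve confidence intervals are not sketched. It is not cvAUC's API, and the function names, the fold encoding, and the toy out-of-fold predictions are assumptions.

```python
def empirical_auc(scores, labels):
    """Empirical AUC for one set of scores with 0/1 labels; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("each fold needs both classes")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def cross_validated_auc(scores, labels, folds):
    """Mean of per-fold AUCs; folds[i] is the validation fold of observation i.

    scores are assumed to be out-of-fold predictions, i.e. each score came
    from a model that did not see that observation during training.
    """
    fold_aucs = []
    for f in sorted(set(folds)):
        idx = [i for i, fi in enumerate(folds) if fi == f]
        fold_aucs.append(empirical_auc([scores[i] for i in idx],
                                       [labels[i] for i in idx]))
    return sum(fold_aucs) / len(fold_aucs), fold_aucs


scores = [0.9, 0.2, 0.7, 0.4, 0.6, 0.5, 0.3, 0.8]
labels = [1, 0, 1, 0, 1, 0, 1, 0]
folds = [1, 1, 1, 1, 2, 2, 2, 2]
print(cross_validated_auc(scores, labels, folds))  # (0.625, [1.0, 0.25])
```

Averaging per-fold AUCs generally gives a different number than pooling all out-of-fold predictions into a single AUC, so it is worth being explicit about which estimator a report uses.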

tidymodels

yardstick:Tidy Characterizations of Model Performance

Tidy tools for quantifying how well a model fits a data set, such as confusion matrices, class probability curve summaries, and regression metrics (e.g., RMSE).

Maintained by Emil Hvitfeldt. Last updated 4 days ago.

6.8 match 387 stars 15.47 score 2.2k scripts 60 dependents

brian-j-smith

MRMCaov:Multi-Reader Multi-Case Analysis of Variance

Estimation and comparison of the performances of diagnostic tests in multi-reader multi-case studies where true case statuses (or ground truths) are known and one or more readers provide test ratings for multiple cases. Reader performance metrics are provided for area under and expected utility of ROC curves, likelihood ratio of positive or negative tests, and sensitivity and specificity. ROC curves can be estimated empirically or with binormal or binormal likelihood-ratio models. Statistical comparisons of diagnostic tests are based on the ANOVA model of Obuchowski-Rockette and the unified framework of Hillis (2005) <doi:10.1002/sim.2024>. The ANOVA can be conducted with data from a full factorial, nested, or partially paired study design; with random or fixed readers or cases; and covariances estimated with the DeLong method, jackknifing, or an unbiased method. Smith and Hillis (2020) <doi:10.1117/12.2549075>.

Maintained by Brian J Smith. Last updated 2 years ago.

18.9 match 12 stars 5.26 score 8 scripts 1 dependent

leifeld

btergm:Temporal Exponential Random Graph Models by Bootstrapped Pseudolikelihood

Temporal Exponential Random Graph Models (TERGM) estimated by maximum pseudolikelihood with bootstrapped confidence intervals or Markov Chain Monte Carlo maximum likelihood. Goodness of fit assessment for ERGMs, TERGMs, and SAOMs. Micro-level interpretation of ERGMs and TERGMs. The methods are described in Leifeld, Cranmer and Desmarais (2018), JStatSoft <doi:10.18637/jss.v083.i06>.

Maintained by Philip Leifeld. Last updated 12 months ago.

complex-networks dynamic-analysis ergm estimation goodness-of-fit inference longitudinal-data network-analysis prediction tergm

13.5 match 17 stars 6.70 score 83 scripts 2 dependents

bioc

iCOBRA:Comparison and Visualization of Ranking and Assignment Methods

This package provides functions for calculation and visualization of performance metrics for evaluation of ranking and binary classification (assignment) methods. Various types of performance plots can be generated programmatically. The package also contains a shiny application for interactive exploration of results.

Maintained by Charlotte Soneson. Last updated 3 months ago.

classification visualization

10.1 match 14 stars 8.86 score 192 scripts 1 dependent

jinseob2kim

jsmodule:'RStudio' Addins and 'Shiny' Modules for Medical Research

'RStudio' addins and 'Shiny' modules for descriptive statistics, regression and survival analysis.

Maintained by Jinseob Kim. Last updated 3 days ago.

medical rstudio-addins shiny shiny-modules statistics

10.2 match 21 stars 8.68 score 61 scripts

evalclass

precrec:Calculate Accurate Precision-Recall and ROC (Receiver Operator Characteristics) Curves

Accurate calculations and visualization of precision-recall and ROC (Receiver Operator Characteristics) curves. Saito and Rehmsmeier (2015) <doi:10.1371/journal.pone.0118432>.

Maintained by Takaya Saito. Last updated 1 year ago.

cpp

9.2 match 45 stars 9.59 score 496 scripts 5 dependents

cran

trinROC:Statistical Tests for Assessing Trinormal ROC Data

Several statistical test functions as well as a function for exploratory data analysis to investigate classifiers allocating individuals to one of three disjoint and ordered classes. In a single classifier assessment the discriminatory power is compared to classification by chance. In a comparison of two classifiers the null hypothesis corresponds to equal discriminatory power of the two classifiers. See also "ROC Analysis for Classification and Prediction in Practice" by Nakas, Bantis and Gatsonis (2023), ISBN 9781482233704.

Maintained by Reinhard Furrer. Last updated 5 months ago.

29.5 match 2.70 score

spatstat

spatstat.model:Parametric Statistical Modelling and Inference for the 'spatstat' Family

Functionality for parametric statistical modelling and inference for spatial data, mainly spatial point patterns, in the 'spatstat' family of packages. (Excludes analysis of spatial data on a linear network, which is covered by the separate package 'spatstat.linnet'.) Supports parametric modelling, formal statistical inference, and model validation. Parametric models include Poisson point processes, Cox point processes, Neyman-Scott cluster processes, Gibbs point processes and determinantal point processes. Models can be fitted to data using maximum likelihood, maximum pseudolikelihood, maximum composite likelihood and the method of minimum contrast. Fitted models can be simulated and predicted. Formal inference includes hypothesis tests (quadrat counting tests, Cressie-Read tests, Clark-Evans test, Berman test, Diggle-Cressie-Loosmore-Ford test, scan test, studentised permutation test, segregation test, ANOVA tests of fitted models, adjusted composite likelihood ratio test, envelope tests, Dao-Genton test, balanced independent two-stage test), confidence intervals for parameters, and prediction intervals for point counts. Model validation techniques include leverage, influence, partial residuals, added variable plots, diagnostic plots, pseudoscore residual plots, model compensators and Q-Q plots.

Maintained by Adrian Baddeley. Last updated 7 days ago.

analysis-of-variance cluster-process confidence-intervals cox-process determinantal-point-processes gibbs-process influence leverage model-diagnostics neyman-scott parameter-estimation poisson-process spatial-analysis spatial-modelling spatial-point-processes statistical-inference

8.8 match 5 stars 9.09 score 6 scripts 46 dependents

jgraux

PRROC:Precision-Recall and ROC Curves for Weighted and Unweighted Data

Computes the areas under the precision-recall (PR) and ROC curve for weighted (e.g., soft-labeled) and unweighted data. In contrast to other implementations, the interpolation between points of the PR curve is done by a non-linear piecewise function. In addition to the areas under the curves, the curves themselves can also be computed and plotted by a specific S3-method. References: Davis and Goadrich (2006) <doi:10.1145/1143844.1143874>; Keilwagen et al. (2014) <doi:10.1371/journal.pone.0092209>; Grau et al. (2015) <doi:10.1093/bioinformatics/btv153>.

Maintained by Jan Grau. Last updated 7 years ago.

9.5 match 8.35 score 1.2k scripts 56 dependents

hmjianggatech

huge:High-Dimensional Undirected Graph Estimation

Provides a general framework for high-dimensional undirected graph estimation. It integrates data preprocessing, neighborhood screening, graph estimation, and model selection techniques into a pipeline. In the preprocessing stage, the nonparanormal (npn) transformation is applied to help relax the normality assumption. In the graph estimation stage, the graph structure is estimated by Meinshausen-Buhlmann graph estimation or the graphical lasso, and both methods can be further accelerated by the lossy screening rule, which preselects the neighborhood of each variable by correlation thresholding. We target high-dimensional data analysis, usually with d >> n, and the computation is memory-optimized using sparse matrix output. We also provide a computationally efficient approach, correlation thresholding graph estimation. Three regularization/thresholding parameter selection methods are included in this package: (1) the stability approach for regularization selection, (2) the rotation information criterion, and (3) the extended Bayesian information criterion, which is only available for the graphical lasso.

Maintained by Haoming Jiang. Last updated 3 years ago.

cpp openmp

7.6 match 12 stars 10.29 score 608 scripts 19 dependents

paramita-sc

risksetROC:Riskset ROC Curve Estimation from Censored Survival Data

Compute time-dependent incident/dynamic accuracy measures (ROC curve, AUC, integrated AUC) from censored survival data under proportional or non-proportional hazards assumptions, following Heagerty & Zheng (Biometrics, Vol 61 No 1, 2005, pp 92-105).

Maintained by Paramita Saha-Chaudhuri. Last updated 3 years ago.

21.1 match 3.71 score 57 scripts 3 dependents

winvector

WVPlots:Common Plots for Analysis

Select data analysis plots, under a standardized calling interface implemented on top of 'ggplot2' and 'plotly'. Plots of interest include: 'ROC', gain curve, scatter plot with marginal distributions, conditioned scatter plot with marginal densities, box and stem with matching theoretical distribution, and density with matching theoretical distribution.

Maintained by John Mount. Last updated 11 months ago.

9.4 match 85 stars 8.00 score 280 scripts

amsantac

TOC:Total Operating Characteristic Curve and ROC Curve

Construction of the Total Operating Characteristic (TOC) Curve and the Receiver (aka Relative) Operating Characteristic (ROC) Curve for spatial and non-spatial data. The TOC method is a modification of the ROC method which measures the ability of an index variable to diagnose either presence or absence of a characteristic. The diagnosis depends on whether the value of an index variable is above a threshold. Each threshold generates a two-by-two contingency table, which contains four entries: hits (H), misses (M), false alarms (FA), and correct rejections (CR). While ROC shows for each threshold only two ratios, H/(H + M) and FA/(FA + CR), TOC reveals the size of every entry in the contingency table for each threshold (Pontius Jr., R.G., Si, K. 2014. <doi:10.1080/13658816.2013.862623>).

Maintained by Ali Santacruz. Last updated 1 year ago.

16.4 match 4 stars 4.48 score 15 scripts
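
The TOC description defines, for each threshold, a two-by-two table of hits (H), misses (M), false alarms (FA) and correct rejections (CR), of which ROC keeps only H/(H + M) and FA/(FA + CR). The Python sketch below tabulates all four counts per threshold alongside those two ratios. It is not the TOC package's API; treating "above the threshold" as strictly greater, the function name `contingency_by_threshold`, and the toy data are assumptions, and both classes are assumed to be present.

```python
def contingency_by_threshold(scores, labels, thresholds):
    """Hits, misses, false alarms and correct rejections for each threshold.

    labels are 1 (characteristic present) or 0 (absent); the characteristic
    is diagnosed as present when the index value is strictly above the
    threshold. Both classes must be present to avoid division by zero.
    """
    rows = []
    for t in thresholds:
        h = m = fa = cr = 0
        for s, y in zip(scores, labels):
            diagnosed = s > t
            if y == 1 and diagnosed:
                h += 1
            elif y == 1:
                m += 1
            elif diagnosed:
                fa += 1
            else:
                cr += 1
        rows.append({
            "threshold": t, "H": h, "M": m, "FA": fa, "CR": cr,
            "hit_rate": h / (h + m),             # ROC y-axis, H/(H + M)
            "false_alarm_rate": fa / (fa + cr),  # ROC x-axis, FA/(FA + CR)
        })
    return rows


index = [1, 2, 3, 4, 5, 6]
presence = [0, 0, 1, 0, 1, 1]
for row in contingency_by_threshold(index, presence, [0, 2.5, 4.5, 6]):
    print(row)
```

The H and FA columns carry the absolute sizes that the two ROC ratios discard, which is the extra information TOC is described as revealing.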

laresbernardo

lares:Analytics & Machine Learning Sidekick

Auxiliary package for better/faster analytics, visualization, data mining, and machine learning tasks. With a wide variety of family functions, like Machine Learning, Data Wrangling, Marketing Mix Modeling (Robyn), Exploratory, API, and Scraper, it helps the analyst or data scientist to get quick and robust results, without the need for repetitive coding or advanced R programming skills.

Maintained by Bernardo Lares. Last updated 24 days ago.

analytics api automation automl data-science descriptive-statistics h2o machine-learning marketing mmm predictive-modeling puzzle rlanguage robyn visualization

7.2 match 233 stars 9.84 score 185 scripts 1 dependent

rezamoammadi

BDgraph:Bayesian Structure Learning in Graphical Models using Birth-Death MCMC

Advanced statistical tools for Bayesian structure learning in undirected graphical models, accommodating continuous, ordinal, discrete, count, and mixed data. It integrates recent advancements in Bayesian graphical models as presented in the literature, including the works of Mohammadi and Wit (2015) <doi:10.1214/14-BA889>, Mohammadi et al. (2021) <doi:10.1080/01621459.2021.1996377>, Dobra and Mohammadi (2018) <doi:10.1214/18-AOAS1164>, and Mohammadi et al. (2023) <doi:10.48550/arXiv.2307.00127>.

Maintained by Reza Mohammadi. Last updated 7 months ago.

openblas cpp openmp

9.4 match 8 stars 7.45 score 223 scripts 7 dependents

coffeemuggler

caTools:Tools: Moving Window Statistics, GIF, Base64, ROC AUC, etc

Contains several basic utility functions including: moving (rolling, running) window statistic functions, read/write for GIF and ENVI binary files, fast calculation of AUC, LogitBoost classifier, base64 encoder/decoder, round-off-error-free sum and cumsum, etc.

Maintained by Michael Dietze. Last updated 6 months ago.

cpp

6.2 match 8 stars 11.17 score 9.1k scripts 566 dependents

resplab

predtools:Prediction Model Tools

Provides additional functions for evaluating predictive models, including plotting calibration curves and model-based Receiver Operating Characteristic (mROC) based on Sadatsafavi et al (2021) <arXiv:2003.00316>.

Maintained by Amin Adibi. Last updated 2 years ago.

cpp

9.5 match 9 stars 6.74 score 77 scripts

gavinsimpson

analogue:Analogue and Weighted Averaging Methods for Palaeoecology

Fits Modern Analogue Technique and Weighted Averaging transfer function models for prediction of environmental data from species data, and related methods used in palaeoecology.

Maintained by Gavin L. Simpson. Last updated 6 months ago.

7.1 match 14 stars 8.96 score 185 scripts 4 dependents

cran

verification:Weather Forecast Verification Utilities

Utilities for verifying discrete, continuous and probabilistic forecasts, and forecasts expressed as parametric distributions are included.

Maintained by Eric Gilleland. Last updated 4 months ago.

14.7 match 3 stars 4.19 score 6 dependents

irinagain

iglu:Interpreting Glucose Data from Continuous Glucose Monitors

Implements a wide range of metrics for measuring glucose control and glucose variability based on continuous glucose monitoring data. The list of implemented metrics is summarized in Rodbard (2009) <doi:10.1089/dia.2009.0015>. Additional visualization tools include time-series plots, lasagna plots and ambulatory glucose profile report.

Maintained by Irina Gaynanova. Last updated 10 days ago.

6.8 match 26 stars 9.00 score 39 scripts

spatstat

spatstat.linnet:Linear Networks Functionality of the 'spatstat' Family

Defines types of spatial data on a linear network and provides functionality for geometrical operations, data analysis and modelling of data on a linear network, in the 'spatstat' family of packages. Contains definitions and support for linear networks, including creation of networks, geometrical measurements, topological connectivity, geometrical operations such as inserting and deleting vertices, intersecting a network with another object, and interactive editing of networks. Data types defined on a network include point patterns, pixel images, functions, and tessellations. Exploratory methods include kernel estimation of intensity on a network, K-functions and pair correlation functions on a network, simulation envelopes, nearest neighbour distance and empty space distance, relative risk estimation with cross-validated bandwidth selection. Formal hypothesis tests of random pattern (chi-squared, Kolmogorov-Smirnov, Monte Carlo, Diggle-Cressie-Loosmore-Ford, Dao-Genton, two-stage Monte Carlo) and tests for covariate effects (Cox-Berman-Waller-Lawson, Kolmogorov-Smirnov, ANOVA) are also supported. Parametric models can be fitted to point pattern data using the function lppm() similar to glm(). Only Poisson models are implemented so far. Models may involve dependence on covariates and dependence on marks. Models are fitted by maximum likelihood. Fitted point process models can be simulated, automatically. Formal hypothesis tests of a fitted model are supported (likelihood ratio test, analysis of deviance, Monte Carlo tests) along with basic tools for model selection (stepwise(), AIC()) and variable selection (sdr). Tools for validating the fitted model include simulation envelopes, residuals, residual plots and Q-Q plots, leverage and influence diagnostics, partial residuals, and added variable plots. Random point patterns on a network can be generated using a variety of models.

Maintained by Adrian Baddeley. Last updated 2 months ago.

density-estimation heat-equation kernel-density-estimation network-analysis point-processes spatial-data-analysis statistical-analysis statistical-inference statistical-models

6.3 match 6 stars 9.64 score 35 scripts 43 dependents

jacobseedorff21

BranchGLM:Efficient Best Subset Selection for GLMs via Branch and Bound Algorithms

Performs efficient and scalable glm best subset selection using a novel implementation of a branch and bound algorithm. To speed up the model fitting process, a range of optimization methods are implemented in 'RcppArmadillo'. Parallel computation is available using 'OpenMP'.

Maintained by Jacob Seedorff. Last updated 6 months ago.

generalized-linear-models regression statistics subset-selection variable-selection openblas cpp openmp

9.6 match 7 stars 6.20 score 30 scripts

ohdsi

PatientLevelPrediction:Develop Clinical Prediction Models Using the Common Data Model

A user friendly way to create patient level prediction models using the Observational Medical Outcomes Partnership Common Data Model. Given a cohort of interest and an outcome of interest, the package can use data in the Common Data Model to build a large set of features. These features can then be used to fit a predictive model with a number of machine learning algorithms. This is further described in Reps (2017) <doi:10.1093/jamia/ocy032>.

Maintained by Egill Fridgeirsson. Last updated 9 days ago.

hades openjdk

5.4 match 190 stars 10.85 score 297 scripts

modeloriented

auditor:Model Audit - Verification, Validation, and Error Analysis

Provides an easy-to-use unified interface for creating validation plots for any model. The 'auditor' package helps avoid the repetitive work of writing the code needed to create residual plots. These visualizations allow users to assess and compare the goodness of fit, performance, and similarity of models.

Maintained by Alicja Gosiewska. Last updated 1 year ago.

classification error-analysis explainable-artificial-intelligence machine-learning model-validation regression-models residuals xai

6.6 match 58 stars 8.76 score 94 scripts 2 dependents

paulowhite

timeROC:Time-Dependent ROC Curve and AUC for Censored Survival Data

Estimation of time-dependent ROC curve and area under time dependent ROC curve (AUC) in the presence of censored data, with or without competing risks. Confidence intervals of AUCs and tests for comparing AUCs of two rival markers measured on the same subjects can be computed, using the iid-representation of the AUC estimator. Plot functions for time-dependent ROC curves and AUC curves are provided. Time-dependent Positive Predictive Values (PPV) and Negative Predictive Values (NPV) can also be computed. See Blanche et al. (2013) <doi:10.1002/sim.5958> and references therein for the details of the methods implemented in the package.

Maintained by Paul Blanche. Last updated 5 years ago.

9.0 match 9 stars 6.24 score 342 scripts 8 dependents

brandon-gallas

iMRMC:Multi-Reader, Multi-Case Analysis Methods (ROC, Agreement, and Other Metrics)

This software does Multi-Reader, Multi-Case (MRMC) analyses of data from imaging studies where clinicians (readers) evaluate patient images (cases). What does this mean? ... Many imaging studies are designed so that every reader reads every case in all modalities, a fully-crossed study. In this case, the data is cross-correlated, and we consider the readers and cases to be cross-correlated random effects. An MRMC analysis accounts for the variability and correlations from the readers and cases when estimating variances, confidence intervals, and p-values. The functions in this package can treat arbitrary study designs and studies with missing data, not just fully-crossed study designs. An overview of this software, including references presenting details on the methods, can be found here: <https://www.fda.gov/medical-devices/science-and-research-medical-devices/imrmc-software-do-multi-reader-multi-case-statistical-analysis-reader-studies>.

Maintained by Brandon Gallas. Last updated 7 months ago.

16.7 match 3.32 score 58 scripts 1 dependent

trevorhastie

glmnet:Lasso and Elastic-Net Regularized Generalized Linear Models

Extremely efficient procedures for fitting the entire lasso or elastic-net regularization path for linear regression, logistic and multinomial regression models, Poisson regression, Cox model, multiple-response Gaussian, and the grouped multinomial regression; see <doi:10.18637/jss.v033.i01> and <doi:10.18637/jss.v039.i05>. There are two new and important additions. The family argument can be a GLM family object, which opens the door to any programmed family (<doi:10.18637/jss.v106.i01>). This comes with a modest computational cost, so when the built-in families suffice, they should be used instead. The other novelty is the relax option, which refits each of the active sets in the path unpenalized. The algorithm uses cyclical coordinate descent in a path-wise fashion, as described in the papers cited.

Maintained by Trevor Hastie. Last updated 2 years ago.

fortran cpp

3.5 match 82 stars 15.15 score 22k scripts 736 dependents
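The glmnet description mentions cyclical coordinate descent along the regularization path. As a rough illustration only, and not glmnet's actual Fortran implementation (which adds warm starts over a lambda path, standardization, screening rules and more), the following Python sketch fits a plain Gaussian lasso for a single penalty value by cycling through the coordinates and applying soft-thresholding. It assumes centred data and no intercept; the function names are hypothetical.

```python
def soft_threshold(z, gamma):
    """Soft-thresholding operator S(z, gamma) = sign(z) * max(|z| - gamma, 0)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200, tol=1e-8):
    """Minimise (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by cyclical coordinate descent.

    X is a list of n rows with p features each; y is a list of n responses.
    No intercept is fitted, so y and the columns of X should be centred first.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    residual = list(y)  # r = y - X b, and b starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):  # visit the coordinates in a fixed cyclic order
            if col_sq[j] == 0.0:
                continue  # an all-zero column never enters the model
            # Correlation of column j with the partial residual (b_j added back in)
            rho = sum(X[i][j] * (residual[i] + X[i][j] * b[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):  # keep the residual in sync with the new b_j
                    residual[i] -= X[i][j] * delta
                b[j] = new_bj
            max_change = max(max_change, abs(delta))
        if max_change < tol:  # stop once a full sweep barely moves any coefficient
            break
    return b


X = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
y = [2.0, -2.0, 1.0, -1.0]
print(lasso_coordinate_descent(X, y, lam=0.25))  # → [1.5, 0.5]
```

With this orthogonal design each coefficient is simply the least-squares estimate (2 and 1) shrunk toward zero by the soft-threshold, which makes the output easy to check by hand; a penalty of 0 recovers least squares.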

wraff

wrProteo:Proteomics Data Analysis Functions

Data analysis of proteomics experiments by mass spectrometry is supported by this collection of functions mostly dedicated to the analysis of (bottom-up) quantitative (XIC) data. Fasta-formatted proteomes (e.g. from UniProt Consortium <doi:10.1093/nar/gky1049>) can be read with automatic parsing and multiple annotation types (like species origin, abbreviated gene names, etc.) extracted. Initial results from multiple software packages for protein (and peptide) quantitation can be imported (to a common format): MaxQuant (Tyanova et al 2016 <doi:10.1038/nprot.2016.136>), Dia-NN (Demichev et al 2020 <doi:10.1038/s41592-019-0638-x>), Fragpipe (da Veiga et al 2020 <doi:10.1038/s41592-020-0912-y>), ionbot (Degroeve et al 2021 <doi:10.1101/2021.07.02.450686>), MassChroq (Valot et al 2011 <doi:10.1002/pmic.201100120>), OpenMS (Strauss et al 2021 <doi:10.1038/nmeth.3959>), ProteomeDiscoverer (Orsburn 2021 <doi:10.3390/proteomes9010015>), Proline (Bouyssie et al 2020 <doi:10.1093/bioinformatics/btaa118>), AlphaPept (preprint Strauss et al <doi:10.1101/2021.07.23.453379>) and Wombat-P (Bouyssie et al 2023 <doi:10.1021/acs.jproteome.3c00636>). Meta-data provided by initial analysis software and/or in sdrf format can be integrated into the analysis. Quantitative proteomics measurements frequently contain multiple NA values, due to physical absence of given peptides in some samples, limitations in sensitivity or other reasons. Help is provided to inspect the data graphically to investigate the nature of NA-values via their respective replicate measurements and to help/confirm the choice of NA-replacement algorithms. Meta-data in sdrf-format (Perez-Riverol et al 2020 <doi:10.1021/acs.jproteome.0c00376>) or similar tabular formats can be imported and included. Missing values can be inspected and imputed based on the concept of NA-neighbours or other methods.
Dedicated filtering and statistical testing using the framework of package 'limma' <doi:10.18129/B9.bioc.limma> can be run, enhanced by multiple rounds of NA-replacements to provide robustness towards rare stochastic events. Multi-species samples, as frequently used in benchmark tests (e.g. Navarro et al 2016 <doi:10.1038/nbt.3685>, Ramus et al 2016 <doi:10.1016/j.jprot.2015.11.011>), can be run with special options considering such sub-groups during normalization and testing. Subsequently, ROC curves (Hand and Till 2001 <doi:10.1023/A:1010920819831>) can be constructed to compare multiple analysis approaches. As a detailed example, the data set from Ramus et al 2016 <doi:10.1016/j.jprot.2015.11.011>, quantified by MaxQuant, ProteomeDiscoverer, and Proline, is provided with a detailed analysis of heterologous spike-in proteins.

Maintained by Wolfgang Raffelsberger. Last updated 4 months ago.

14.4 match 3.67 score 17 scripts 1 dependent

tdhock

directlabels:Direct Labels for Multicolor Plots

An extensible framework for automatically placing direct labels onto multicolor 'lattice' or 'ggplot2' plots. Label positions are described using Positioning Methods which can be re-used across several different plots. There are heuristics for examining "trellis" and "ggplot" objects and inferring an appropriate Positioning Method.

Maintained by Toby Dylan Hocking. Last updated 11 months ago.

4.9 match 83 stars 10.62 score 1.8k scripts 16 dependents

mlcoding

flare:Family of Lasso Regression

Provides implementations of a family of Lasso variants, including the Dantzig Selector, LAD Lasso, SQRT Lasso and Lq Lasso, for estimating high-dimensional sparse linear models. We adopt the alternating direction method of multipliers and convert the original optimization problem into a sequential L1-penalized least-squares minimization problem, which can be efficiently solved by a linearization algorithm. A multi-stage screening approach is adopted for further acceleration. Besides sparse linear model estimation, we also provide the extension of these Lasso variants to sparse Gaussian graphical model estimation, including TIGER and CLIME, using either L1 or adaptive penalty. Missing values can be tolerated for the Dantzig selector and CLIME. The computation is memory-optimized using sparse matrix output. For more information, please refer to <https://www.jmlr.org/papers/volume16/li15a/li15a.pdf>.

Maintained by Xingguo Li. Last updated 4 months ago.

12.1 match 1 star 4.31 score 141 scripts 4 dependents

nliulab

AutoScore:An Interpretable Machine Learning-Based Automatic Clinical Score Generator

A novel interpretable machine learning-based framework to automate the development of a clinical scoring model for predefined outcomes. Our novel framework consists of six modules: variable ranking with machine learning, variable transformation, score derivation, model selection, domain knowledge-based score fine-tuning, and performance evaluation. The details are described in our research paper <doi:10.2196/21798>. Users or clinicians can seamlessly generate parsimonious sparse-score risk models (i.e., risk scores), which can be easily implemented and validated in clinical practice. We hope to see its application in various medical case studies.

Maintained by Feng Xie. Last updated 15 days ago.

6.7 match 32 stars 7.70 score 30 scripts

babaknaimi

sdm:Species Distribution Modelling

An extensible framework for developing species distribution models using individual and community-based approaches, generating ensembles of models, evaluating the models, and predicting species' potential distributions in space and time. For more information, please see the following paper: Naimi, B., Araujo, M.B. (2016) <doi:10.1111/ecog.01881>.

Maintained by Babak Naimi. Last updated 2 months ago.

5.3 match 24 stars 9.53 score 312 scripts 1 dependent

joshuaulrich

TTR:Technical Trading Rules

A collection of over 50 technical indicators for creating technical trading rules. The package also provides fast implementations of common rolling-window functions, and several volatility calculations.

Maintained by Joshua Ulrich. Last updated 1 year ago.

algorithmic-trading finance technical-analysis

3.3 match 338 stars 15.11 score 2.8k scripts 359 dependents

hopkinsidd

phylosamp:Sample Size Calculations for Molecular and Phylogenetic Studies

Implements novel tools for estimating sample sizes needed for phylogenetic studies, including studies focused on estimating the probability of true pathogen transmission between two cases given phylogenetic linkage and studies focused on tracking pathogen variants at a population level. Methods described in Wohl, Giles, and Lessler (2021) and in Wohl, Lee, DiPrete, and Lessler (2023).

Maintained by Justin Lessler. Last updated 2 years ago.

phylogenetics sampling

7.4 match 12 stars 6.65 score 25 scripts

bxc147

Epi:Statistical Analysis in Epidemiology

Functions for demographic and epidemiological analysis in the Lexis diagram, i.e. register and cohort follow-up data. In particular: representation, manipulation, rate estimation and simulation for multistate data, through the Lexis suite of functions, which includes interfaces to the 'mstate', 'etm' and 'cmprsk' packages. Contains functions for Age-Period-Cohort and Lee-Carter modeling, a function for interval-censored data, some useful functions for tabulation and plotting, and a number of epidemiological data sets.

Maintained by Bendix Carstensen. Last updated 2 months ago.

5.1 match 4 stars 9.65 score 708 scripts 11 dependents

ehrlinger

ggRandomForests:Visually Exploring Random Forests

Graphic elements for exploring Random Forests using the 'randomForest' or 'randomForestSRC' package for survival, regression and classification forests and 'ggplot2' package plotting.

Maintained by John Ehrlinger. Last updated 5 days ago.

5.3 match 148 stars 8.94 score 197 scripts

mxrodriguezuvigo

ROCnReg:ROC Curve Inference with and without Covariates

Estimates the pooled (unadjusted) Receiver Operating Characteristic (ROC) curve, the covariate-adjusted ROC (AROC) curve, and the covariate-specific/conditional ROC (cROC) curve by different methods, both Bayesian and frequentist. It also provides functions to obtain ROC-based optimal cutpoints using several criteria. Based on Erkanli, A. et al. (2006) <doi:10.1002/sim.2496>; Faraggi, D. (2003) <doi:10.1111/1467-9884.00350>; Gu, J. et al. (2008) <doi:10.1002/sim.3366>; Inacio de Carvalho, V. et al. (2013) <doi:10.1214/13-BA825>; Inacio de Carvalho, V., and Rodriguez-Alvarez, M.X. (2022) <doi:10.1214/21-STS839>; Janes, H., and Pepe, M.S. (2009) <doi:10.1093/biomet/asp002>; Pepe, M.S. (1998) <http://www.jstor.org/stable/2534001?seq=1>; Rodriguez-Alvarez, M.X. et al. (2011a) <doi:10.1016/j.csda.2010.07.018>; Rodriguez-Alvarez, M.X. et al. (2011b) <doi:10.1007/s11222-010-9184-1>. Please see Rodriguez-Alvarez, M.X. and Inacio, V. (2021) <doi:10.32614/RJ-2021-066> for more details.

Maintained by Maria Xose Rodriguez-Alvarez. Last updated 10 months ago.

28.1 match 1 star 1.66 score 46 scripts

toduckhanh

bcROCsurface:Bias-Corrected Methods for Estimating the ROC Surface of Continuous Diagnostic Tests

The bias-corrected estimation methods for the receiver operating characteristic (ROC) surface and the volume under the ROC surface (VUS) under the missing at random (MAR) assumption.

Maintained by Duc-Khanh To. Last updated 1 year ago.

openblas cpp

13.5 match 3.45 score 14 scripts

flr

FLCore:Core Package of FLR, Fisheries Modelling in R

Core classes and methods for FLR, a framework for fisheries modelling and management strategy simulation in R. Developed by a team of fisheries scientists in various countries. More information can be found at <http://flr-project.org/>.

Maintained by Iago Mosqueira. Last updated 9 days ago.

fisheries flr fisheries-modelling

5.2 match 16 stars 8.78 score 956 scripts 23 dependents

cran

PresenceAbsence:Presence-Absence Model Evaluation

Provides a set of functions useful when evaluating the results of presence-absence models. The package includes functions for calculating threshold-dependent measures such as confusion matrices, PCC (percent correctly classified), sensitivity, specificity, and Kappa, and produces plots of each measure as the threshold is varied. It calculates optimal thresholds according to a choice of optimization criteria. It also includes functions to plot threshold-independent ROC curves along with the associated AUC (area under the curve).

Maintained by Elizabeth Freeman. Last updated 2 years ago.

8.5 match 1 star 5.32 score 224 scripts 9 dependents
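To make the threshold-dependent measures listed for PresenceAbsence concrete, here is a small Python sketch (not the package's R API; the function name is hypothetical) that builds a 2x2 confusion matrix at one threshold and derives PCC, sensitivity, specificity and Cohen's kappa. It assumes presence is predicted when the probability is greater than or equal to the threshold; the package may use a different convention.

```python
def threshold_measures(observed, predicted, threshold=0.5):
    """Confusion-matrix measures at a single threshold.

    observed: 1 for presence, 0 for absence.
    predicted: predicted probabilities of presence.
    """
    tp = fp = fn = tn = 0
    for obs, prob in zip(observed, predicted):
        pred = 1 if prob >= threshold else 0  # assumed convention for ties at the threshold
        if obs == 1 and pred == 1:
            tp += 1
        elif obs == 0 and pred == 1:
            fp += 1
        elif obs == 1 and pred == 0:
            fn += 1
        else:
            tn += 1
    n = tp + fp + fn + tn
    nan = float("nan")
    pcc = (tp + tn) / n  # proportion correctly classified
    sensitivity = tp / (tp + fn) if tp + fn else nan
    specificity = tn / (tn + fp) if tn + fp else nan
    # Cohen's kappa: observed agreement corrected for agreement expected by chance
    p_expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (pcc - p_expected) / (1 - p_expected) if p_expected != 1 else nan
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn, "pcc": pcc,
            "sensitivity": sensitivity, "specificity": specificity, "kappa": kappa}


obs = [1, 1, 1, 0, 0, 0, 0, 1]
prob = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.2, 0.7]
print(threshold_measures(obs, prob))
```

Repeating the call over a grid of thresholds gives the kind of measure-versus-threshold curves the description mentions.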

tjmahr

wisclabmisc:Tools to Support the 'WiscLab'

A collection of 'R' functions for use (and re-use) across 'WiscLab' projects. These are analysis- or presentation-oriented functions; that is, they are not for data reading or data cleaning.

Maintained by Tristan Mahr. Last updated 4 days ago.

10.8 match 3.95 score 4 scripts

mthrun

DataVisualizations:Visualizations of High-Dimensional Data

Gives access to data visualisation methods that are relevant from the data scientist's point of view. The flagship idea of 'DataVisualizations' is the mirrored density plot (MD-plot) for either classified or non-classified multivariate data published in Thrun, M.C. et al.: "Analyzing the Fine Structure of Distributions" (2020), PLoS ONE, <DOI:10.1371/journal.pone.0238835>. The MD-plot outperforms the box-and-whisker diagram (box plot), violin plot and bean plot and geom_violin plot of ggplot2. Furthermore, a collection of various visualization methods for univariate data is provided. In the case of exploratory data analysis, 'DataVisualizations' makes it possible to inspect the distribution of each feature of a dataset visually through a combination of four methods. One of these methods is the Pareto density estimation (PDE) of the probability density function (pdf). Additionally, visualizations of the distribution of distances using PDE, the scatter-density plot using PDE for two variables as well as the Shepard density plot and the Bland-Altman plot are presented here. Pertaining to classified high-dimensional data, a number of visualizations are described, such as the heat map and silhouette plot. A political map of the world or Germany can be visualized with the additional information defined by a classification of countries or regions. Extending the political map further, a simple function for choropleth maps is available, which is useful for measurements across a geographic area. For categorical features, pie charts, slope charts and fan plots, improved by the ABC analysis, become usable. More detailed explanations are found in the book by Thrun, M.C.: "Projection-Based Clustering through Self-Organization and Swarm Intelligence" (2018) <DOI:10.1007/978-3-658-20540-9>.

Maintained by Michael Thrun. Last updated 2 months ago.

cpp

5.6 match 7 stars 7.72 score 118 scripts 7 dependents

paramita-sc

survivalROC:Time-Dependent ROC Curve Estimation from Censored Survival Data

Computes the time-dependent ROC curve from censored survival data using the Kaplan-Meier (KM) or Nearest Neighbor Estimation (NNE) method of Heagerty, Lumley & Pepe (Biometrics, Vol 56 No 2, 2000, pp. 337-344).

Maintained by Paramita Saha-Chaudhuri. Last updated 2 years ago.

6.7 match 6 stars 6.37 score 266 scripts 16 dependents

egenn

rtemis:Machine Learning and Visualization

Advanced Machine Learning and Visualization. Unsupervised Learning (Clustering, Decomposition), Supervised Learning (Classification, Regression), Cross-Decomposition, Bagging, Boosting, Meta-models. Static and interactive graphics.

Maintained by E.D. Gennatas. Last updated 1 month ago.

data-science data-visualization machine-learning machine-learning-library visualization

6.0 match 145 stars 7.09 score 50 scripts 2 dependents

josetamezpena

FRESA.CAD:Feature Selection Algorithms for Computer Aided Diagnosis

Contains a set of utilities for building and testing statistical models (linear, logistic, ordinal, or Cox) for Computer Aided Diagnosis/Prognosis applications. Utilities include data adjustment, univariate analysis, model building, model validation, longitudinal analysis, reporting and visualization.

Maintained by Jose Gerardo Tamez-Pena. Last updated 1 month ago.

openblas cpp openmp

7.6 match 7 stars 5.59 score 31 scripts

kenaho1

asbio:A Collection of Statistical Tools for Biologists

Contains functions from: Aho, K. (2014) Foundational and Applied Statistics for Biologists using R. CRC/Taylor and Francis, Boca Raton, FL, ISBN: 978-1-4398-7338-0.

Maintained by Ken Aho. Last updated 2 months ago.

5.6 match 5 stars 7.32 score 310 scripts 3 dependents

rsquaredacademy

blorr:Tools for Developing Binary Logistic Regression Models

Tools designed to make it easier for beginner and intermediate users to build and validate binary logistic regression models. Includes bivariate analysis, comprehensive regression output, model fit statistics, variable selection procedures, model validation techniques and a 'shiny' app for interactive model building.

Maintained by Aravind Hebbali. Last updated 4 months ago.

logistic-regression-models regression cpp

5.8 match 17 stars 7.13 score 144 scripts 1 dependent

ben519

mltools:Machine Learning Tools

A collection of machine learning helper functions, particularly assisting in the Exploratory Data Analysis phase. Makes heavy use of the 'data.table' package for optimal speed and memory efficiency. Highlights include a versatile bin_data() function, sparsify() for converting a data.table to sparse matrix format with one-hot encoding, fast evaluation metrics, and empirical_cdf() for calculating empirical Multivariate Cumulative Distribution Functions.

Maintained by Ben Gorman. Last updated 3 years ago.

exploratory-data-analysis machine-learning

4.3 match 72 stars 9.58 score 1.2k scripts 13 dependents

xfim

ggmcmc:Tools for Analyzing MCMC Simulations from Bayesian Inference

Tools for assessing and diagnosing convergence of Markov Chain Monte Carlo simulations, as well as for graphically displaying results from full MCMC analysis. The package also facilitates the graphical interpretation of models by providing flexible functions to plot the results against observed variables, and functions to work with hierarchical/multilevel batches of parameters (Fernández-i-Marín, 2016 <doi:10.18637/jss.v070.i09>).

Maintained by Xavier Fernández i Marín. Last updated 2 years ago.

bayesian-data-analysis ggplot2 graphical jags mcmc stan

3.4 match 112 stars 12.02 score 1.6k scripts 8 dependents

tidymodels

broom:Convert Statistical Objects into Tidy Tibbles

Summarizes key information about statistical objects in tidy tibbles. This makes it easy to report results, create plots and consistently work with large numbers of models at once. Broom provides three verbs that each provide different types of information about a model. tidy() summarizes information about model components such as coefficients of a regression. glance() reports information about an entire model, such as goodness of fit measures like AIC and BIC. augment() adds information about individual observations to a dataset, such as fitted values or influence measures.

Maintained by Simon Couch. Last updated 4 months ago.

modeling tidy-data

1.9 match 1.5k stars 21.56 score 37k scripts 1.4k dependents

jinseob2kim

jstable:Create Tables from Different Types of Regression

Create regression tables from generalized linear model (GLM), generalized estimating equation (GEE), generalized linear mixed-effects model (GLMM), Cox proportional hazards model, survey-weighted generalized linear model (svyglm) and survey-weighted Cox model results for publication.

Maintained by Jinseob Kim. Last updated 12 days ago.

label regression table

4.0 match 26 stars 9.98 score 199 scripts 1 dependent

yizhenxu

TGST:Targeted Gold Standard Testing

Functions for implementing the targeted gold standard (GS) testing. You provide the true disease or treatment failure status and the risk score, tell 'TGST' the availability of GS tests and which method to use, and it returns the optimal tripartite rules. Please refer to Liu et al. (2013) <doi:10.1080/01621459.2013.810149> for more details.

Maintained by Yizhen Xu. Last updated 4 years ago.

10.8 match 3.70 score

tdhock

WeightedROC:Fast, Weighted ROC Curves

Fast computation of Receiver Operating Characteristic (ROC) curves and Area Under the Curve (AUC) for weighted binary classification problems (weights are example-specific cost values).

Maintained by Toby Dylan Hocking. Last updated 3 years ago.

6.8 match 29 stars 5.86 score 125 scripts
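The idea behind a weighted ROC curve can be sketched in a few lines of Python: sort examples by score, accumulate the weights of positives and negatives as the threshold decreases, and integrate the resulting curve with the trapezoidal rule. This is an illustrative sketch under those assumptions, not WeightedROC's R implementation; the function names are hypothetical. Tied scores are processed as one group so that they produce a single diagonal segment.

```python
def weighted_roc(labels, scores, weights):
    """Return (fpr, tpr) lists for a weighted ROC curve.

    labels are 1 (positive) or 0 (negative); weights are non-negative
    per-example costs. Both classes must have positive total weight.
    """
    total_pos = sum(w for y, w in zip(labels, weights) if y == 1)
    total_neg = sum(w for y, w in zip(labels, weights) if y == 0)
    if total_pos <= 0 or total_neg <= 0:
        raise ValueError("both classes need positive total weight")
    # Visit examples from the highest score to the lowest (threshold decreasing).
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0.0
    k = 0
    while k < len(order):
        current = scores[order[k]]
        # Absorb every example tied at this score before emitting a point.
        while k < len(order) and scores[order[k]] == current:
            i = order[k]
            if labels[i] == 1:
                tp += weights[i]
            else:
                fp += weights[i]
            k += 1
        fpr.append(fp / total_neg)
        tpr.append(tp / total_pos)
    return fpr, tpr


def trapezoid_auc(fpr, tpr):
    """Area under the piecewise-linear curve through the (fpr, tpr) points."""
    return sum(
        (fpr[k] - fpr[k - 1]) * (tpr[k] + tpr[k - 1]) / 2 for k in range(1, len(fpr))
    )


fpr, tpr = weighted_roc([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.6], [1, 1, 3, 1])
print(trapezoid_auc(fpr, tpr))  # → 0.625
```

With all weights equal to 1 this reduces to the ordinary empirical AUC (0.75 for the same labels and scores); up-weighting the lower-scored positive lowers the area because that positive is out-ranked by a negative.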

fcharte

mldr:Exploratory Data Analysis and Manipulation of Multi-Label Data Sets

Exploratory data analysis and manipulation functions for multi-label data sets along with an interactive Shiny application to ease their use.

Maintained by David Charte. Last updated 5 years ago.

5.6 match 23 stars 7.07 score 168 scripts 2 dependents

minatonakazawa

fmsb:Functions for Medical Statistics Book with some Demographic Data

Several utility functions for the book entitled "Practices of Medical and Health Data Analysis using R" (Pearson Education Japan, 2007) with Japanese demographic data and some demographic analysis related functions.

Maintained by Minato Nakazawa. Last updated 1 year ago.

5.1 match 3 stars 7.74 score 1.9k scripts 23 dependents

aiparragirre

svyROC:Estimation of the ROC Curve and the AUC for Complex Survey Data

Estimate the receiver operating characteristic (ROC) curve, area under the curve (AUC) and optimal cut-off points for individual classification taking into account complex sampling designs when working with complex survey data. Methods implemented in this package are described in: A. Iparragirre, I. Barrio, I. Arostegui (2024) <doi:10.1002/sta4.635>; A. Iparragirre, I. Barrio, J. Aramendi, I. Arostegui (2022) <doi:10.2436/20.8080.02.121>; A. Iparragirre, I. Barrio (2024) <doi:10.1007/978-3-031-65723-8_7>.

Maintained by Amaia Iparragirre. Last updated 4 months ago.

auc auc-optimism-correction complex-survey-data optimal-cut-off-points roc-curve sampling-weights

14.2 match 2.70 score

simondedman

gbm.auto:Automated Boosted Regression Tree Modelling and Mapping Suite

Automates delta log-normal boosted regression tree abundance prediction. Loops through parameters provided (LR (learning rate), TC (tree complexity), BF (bag fraction)), chooses best, simplifies, & generates line, dot & bar plots, & outputs these & predictions & a report, makes predicted abundance maps, and Unrepresentativeness surfaces. Package core built around 'gbm' (gradient boosting machine) functions in 'dismo' (Hijmans, Phillips, Leathwick & Jane Elith, 2020 & ongoing), itself built around 'gbm' (Greenwell, Boehmke, Cunningham & Metcalfe, 2020 & ongoing, originally by Ridgeway). Indebted to Elith/Leathwick/Hastie 2008 'Working Guide' <doi:10.1111/j.1365-2656.2008.01390.x>; workflow follows Appendix S3. See <https://www.simondedman.com/> for published guides and papers using this package.

Maintained by Simon Dedman. Last updated 6 days ago.

6.6 match 18 stars 5.77 score 13 scripts

danny-ldc

RcmdrPlugin.ROC:Rcmdr Receiver Operator Characteristic Plug-in Package

Rcmdr GUI extension plug-in for receiver operating characteristic (ROC) tools from the pROC package. It also adds an Rcmdr GUI extension for the Hosmer-Lemeshow goodness-of-fit (GOF) test from the ResourceSelection package.

Maintained by Daniel-Corneliu Leucuta. Last updated 3 years ago.

37.5 match 1.00 score 1 script

haghish

adjROC:Computing Sensitivity at a Fix Value of Specificity and Vice Versa as Well as Bootstrap Metrics for ROC Curves

This software assesses the receiver operating characteristic (ROC) curve at adjusted thresholds, enabling the comparison of sensitivity and specificity across multiple binary classification models. Instead of comparing different models with varied cutoff values in their risk thresholds, all models can be compared at a fixed threshold of sensitivity, a fixed threshold of specificity, or the crossing point between sensitivity and specificity. If a threshold for specificity is given (e.g., specificity = 0.9), sensitivity and its confidence interval are computed, and vice versa. If the threshold for either sensitivity or specificity is not provided, the crossing point between the sensitivity and specificity curves is returned, along with their confidence intervals. For bootstrap procedures, the software evaluates the mean and CI bootstrap values for sensitivity, specificity, and the crossing point between specificity and sensitivity. This allows users to discern whether the performance of a model (based on adjusted sensitivity or adjusted specificity) is significantly different from other models. This software addresses the issue of comparing different classification models with varying predefined cutoff thresholds, which often leads to inconclusive results due to the fluctuating values of both sensitivity and specificity.

Maintained by E. F. Haghish. Last updated 9 months ago.

12.2 match 3.00 score
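One simple empirical way to read sensitivity at a fixed specificity off an ROC analysis is to scan candidate cut-offs, keep those whose specificity meets the target, and report the best sensitivity among them. The Python sketch below does exactly that; it is a hypothetical helper, not adjROC's method, which according to the description also provides confidence intervals and bootstrap estimates. It assumes a case is called positive when its score is at least the cut-off.

```python
def sensitivity_at_specificity(labels, scores, target_spec):
    """Best empirical sensitivity among cut-offs whose specificity >= target_spec.

    labels: 1 (diseased/positive) or 0 (healthy/negative).
    Returns (sensitivity, specificity, cutoff).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    best = None
    # Candidate cut-offs: every observed score, plus +inf (nothing called positive).
    for t in sorted(set(scores)) + [float("inf")]:
        spec = sum(s < t for s in neg) / len(neg)
        if spec >= target_spec:
            sens = sum(s >= t for s in pos) / len(pos)
            if best is None or sens > best[0]:
                best = (sens, spec, t)
    return best


labels = [0, 0, 0, 0, 1, 1, 1, 1]
scores = [0.1, 0.2, 0.3, 0.6, 0.4, 0.7, 0.8, 0.9]
print(sensitivity_at_specificity(labels, scores, 0.75))  # → (1.0, 0.75, 0.4)
```

Swapping the roles of the two classes in the same loop gives specificity at a fixed sensitivity.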

ballings

AUC:Threshold Independent Performance Measures for Probabilistic Classifiers

Various functions to compute the area under the curve of selected measures: The area under the sensitivity curve (AUSEC), the area under the specificity curve (AUSPC), the area under the accuracy curve (AUACC), and the area under the receiver operating characteristic curve (AUROC). Support for visualization and partial areas is included.

Maintained by Michel Ballings. Last updated 3 years ago.

6.8 match 5.37 score 424 scripts 7 dependents
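The AUROC reported by packages such as AUC has a well-known rank interpretation: it equals the normalized Mann-Whitney U statistic, that is, the probability that a randomly chosen positive scores higher than a randomly chosen negative, counting ties as one half. A minimal Python sketch of that identity, with a hypothetical function name and labels assumed to be coded 0/1:

```python
def auroc_mann_whitney(labels, scores):
    """AUROC as U / (n_pos * n_neg), using average ranks for tied scores."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    k = 0
    while k < n:
        j = k
        # Extend j over the run of scores tied with scores[order[k]].
        while j + 1 < n and scores[order[j + 1]] == scores[order[k]]:
            j += 1
        avg_rank = (k + j) / 2 + 1  # 1-based average rank shared by the tie group
        for m in range(k, j + 1):
            ranks[order[m]] = avg_rank
        k = j + 1
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = n - n_pos  # assumes every other label is a negative (0)
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both positive and negative cases")
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2  # Mann-Whitney U for the positives
    return u / (n_pos * n_neg)


print(auroc_mann_whitney([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.6]))  # → 0.75
```

The rank form avoids building the curve explicitly and gives the same value as trapezoidal integration of the empirical ROC curve.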

bioc

DirichletMultinomial:Dirichlet-Multinomial Mixture Model Machine Learning for Microbiome Data

Dirichlet-multinomial mixture models can be used to describe variability in microbial metagenomic data. This package is an interface to code originally made available by Holmes, Harris, and Quince, 2012, PLoS ONE 7(2): 1-15, as discussed further in the man page for this package, ?DirichletMultinomial.

Maintained by Martin Morgan. Last updated 5 months ago.

immunooncology microbiome sequencing clustering classification metagenomics gsl

3.3 match 11 stars 10.97 score 125 scripts 26 dependents

animint

animint2:Animated Interactive Grammar of Graphics

Functions are provided for defining animated, interactive data visualizations in R code, and rendering on a web page. The 2018 Journal of Computational and Graphical Statistics paper <doi:10.1080/10618600.2018.1513367> describes the concepts implemented.

Maintained by Toby Hocking. Last updated 27 days ago.

4.0 match 64 stars 8.87 score 173 scripts

tarnduong

ks:Kernel Smoothing

Kernel smoothers for univariate and multivariate data, with comprehensive visualisation and bandwidth selection capabilities, including for densities, density derivatives, cumulative distributions, clustering, classification, density ridges, significant modal regions, and two-sample hypothesis tests. Chacon & Duong (2018) <doi:10.1201/9780429485572>.

Maintained by Tarn Duong. Last updated 6 months ago.

3.4 match 6 stars 10.14 score 920 scripts 262 dependents

barbarabodinier

fake:Flexible Data Simulation Using the Multivariate Normal Distribution

This R package can be used to generate artificial data conditionally on pre-specified (simulated or user-defined) relationships between the variables and/or observations. Each observation is drawn from a multivariate Normal distribution where the mean vector and covariance matrix reflect the desired relationships. Outputs can be used to evaluate the performances of variable selection, graphical modelling, or clustering approaches by comparing the true and estimated structures (B Bodinier et al (2021) <arXiv:2106.02521>).

Maintained by Barbara Bodinier. Last updated 2 years ago.

7.0 match 6 stars 4.86 score 81 scripts 1 dependent

nomahi

dmetatools:Computational tools for meta-analysis of diagnostic accuracy test

Computational tools for meta-analysis of diagnostic accuracy tests. This package enables computation of confidence intervals for the AUC of the summary ROC curve and some related AUC-based inference methods.

Maintained by Hisashi Noma. Last updated 3 years ago.

auc bootstrap diagnostic-tests meta-analysis summary-roc-curve

12.3 match 2.70 score 2 scripts

promidat

traineR:Predictive (Classification and Regression) Models Homologator

Methods to unify the different ways of creating predictive models and their different predictive formats for classification and regression. It includes methods such as K-Nearest Neighbors Schliep, K. P. (2004) <doi:10.5282/ubm/epub.1769>, Decision Trees Leo Breiman, Jerome H. Friedman, Richard A. Olshen, Charles J. Stone (2017) <doi:10.1201/9781315139470>, ADA Boosting Esteban Alfaro, Matias Gamez, Noelia García (2013) <doi:10.18637/jss.v054.i02>, Extreme Gradient Boosting Chen & Guestrin (2016) <doi:10.1145/2939672.2939785>, Random Forest Breiman (2001) <doi:10.1023/A:1010933404324>, Neural Networks Venables, W. N., & Ripley, B. D. (2002) <ISBN:0-387-95457-0>, Support Vector Machines Bennett, K. P. & Campbell, C. (2000) <doi:10.1145/380995.380999>, Bayesian Methods Gelman, A., Carlin, J. B., Stern, H. S., & Rubin, D. B. (1995) <doi:10.1201/9780429258411>, Linear Discriminant Analysis Venables, W. N., & Ripley, B. D. (2002) <ISBN:0-387-95457-0>, Quadratic Discriminant Analysis Venables, W. N., & Ripley, B. D. (2002) <ISBN:0-387-95457-0>, Logistic Regression Dobson, A. J., & Barnett, A. G. (2018) <doi:10.1201/9781315182780> and Penalized Logistic Regression Friedman, J. H., Hastie, T., & Tibshirani, R. (2010) <doi:10.18637/jss.v033.i01>.

Maintained by Oldemar Rodriguez R. Last updated 1 year ago.

9.0 match 3.64 score 36 scripts 2 dependents

tagteam

riskRegression:Risk Regression Models and Prediction Scores for Survival Analysis with Competing Risks

Implementation of the following methods for event history analysis. Risk regression models for survival endpoints also in the presence of competing risks are fitted using binomial regression based on a time sequence of binary event status variables. A formula interface for the Fine-Gray regression model and an interface for the combination of cause-specific Cox regression models. A toolbox for assessing and comparing performance of risk predictions (risk markers and risk prediction models). Prediction performance is measured by the Brier score and the area under the ROC curve for binary possibly time-dependent outcome. Inverse probability of censoring weighting and pseudo values are used to deal with right censored data. Lists of risk markers and lists of risk models are assessed simultaneously. Cross-validation repeatedly splits the data, trains the risk prediction models on one part of each split and then summarizes and compares the performance across splits.

Maintained by Thomas Alexander Gerds. Last updated 17 days ago.

openblas cpp

2.5 match 46 stars 13.00 score 736 scripts 35 dependents
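For a binary outcome without censoring, the Brier score mentioned in the riskRegression description is the mean squared difference between predicted risk and observed outcome. The Python sketch below computes it together with a common benchmark, a null model that predicts the observed prevalence for everyone. It deliberately ignores censoring: riskRegression's inverse-probability-of-censoring weighting for right-censored data is not reproduced here, and the function names are hypothetical.

```python
def brier_score(outcomes, risks):
    """Mean squared error between predicted risks and observed 0/1 outcomes."""
    if not outcomes or len(outcomes) != len(risks):
        raise ValueError("outcomes and risks must be non-empty and equally long")
    return sum((p - y) ** 2 for y, p in zip(outcomes, risks)) / len(outcomes)


def null_model_brier(outcomes):
    """Brier score of a benchmark that predicts the observed prevalence for everyone."""
    prevalence = sum(outcomes) / len(outcomes)
    return brier_score(outcomes, [prevalence] * len(outcomes))


y = [1, 0, 1, 1]
risk = [0.8, 0.2, 0.6, 0.9]
print(brier_score(y, risk), null_model_brier(y))
```

Lower is better; a model whose Brier score does not beat the prevalence-only benchmark adds little predictive information.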

easystats

performance:Assessment of Regression Models Performance

Utilities for computing measures to assess model quality, which are not directly provided by R's 'base' or 'stats' packages. These include e.g. measures like r-squared, intraclass correlation coefficient (Nakagawa, Johnson & Schielzeth (2017) <doi:10.1098/rsif.2017.0213>), root mean squared error or functions to check models for overdispersion, singularity or zero-inflation and more. Functions apply to a large variety of regression models, including generalized linear models, mixed effects models and Bayesian models. References: Lüdecke et al. (2021) <doi:10.21105/joss.03139>.

Maintained by Daniel Lüdecke. Last updated 19 days ago.

aic easystats hacktoberfest loo machine-learning mixed-models models performance r2 statistics

2.0 match 1.1k stars 16.17 score 4.3k scripts 47 dependents

ziyili20

caROC:Continuous Biomarker Evaluation with Adjustment of Covariates

Computes covariate-adjusted specificity at a controlled sensitivity level, covariate-adjusted sensitivity at a controlled specificity level, the covariate-adjusted receiver operating characteristic curve, or covariate-adjusted thresholds at a controlled sensitivity/specificity level. All statistics can also be computed for specific sub-populations given their covariate values. Methods are described in Ziyi Li, Yijian Huang, Datta Patil, Martin G. Sanda (2021+) "Covariate adjustment in continuous biomarker assessment".

Maintained by Ziyi Li. Last updated 4 years ago.

16.0 match 2.00 score 5 scripts

haakoneidemhaakstad

betafunctions:Functions for Working with Two- And Four-Parameter Beta Probability Distributions and Psychometric Analysis of Classifications

Package providing a number of functions for working with Two- and Four-parameter Beta and closely related distributions (i.e., the Gamma, Binomial, and Beta-Binomial distributions). Includes, among other things: - d/p/q/r functions for Four-Parameter Beta distributions and Generalized "Binomial" (continuous) distributions, and d/p/r functions for Beta-Binomial distributions. - d/p/q/r functions for Two- and Four-Parameter Beta distributions parameterized in terms of their means and variances rather than their shape-parameters. - Moment generating functions for Binomial distributions, Beta-Binomial distributions, and observed value distributions. - Functions for estimating classification accuracy and consistency, making use of the Classical Test-Theory based 'Livingston and Lewis' (L&L) and 'Hanson and Brennan' approaches. A shiny app is available, providing a GUI for the L&L approach when used for binary classifications. For the URL of the app, see the documentation for the LL.CA() function. Livingston and Lewis (1995) <doi:10.1111/j.1745-3984.1995.tb00462.x>. Lord (1965) <doi:10.1007/BF02289490>. Hanson (1991) <https://files.eric.ed.gov/fulltext/ED344945.pdf>.

Maintained by Haakon Eidem Haakstad. Last updated 4 months ago.

10.1 match 3.18 score 8 scripts 1 dependents

cran

nsROC:Non-Standard ROC Curve Analysis

Tools for estimating Receiver Operating Characteristic (ROC) curves, building confidence bands, comparing several curves both for dependent and independent data, estimating the cumulative-dynamic ROC curve in presence of censored data, and performing meta-analysis studies, among others.

Maintained by Sonia Perez Fernandez. Last updated 7 years ago.

20.1 match 1 stars 1.58 score 19 scripts

mxrodriguezuvigo

npROCRegression:Kernel-Based Nonparametric ROC Regression Modelling

Implements several nonparametric regression approaches for the inclusion of covariate information on the receiver operating characteristic (ROC) framework.

Maintained by Maria Xose Rodriguez-Alvarez. Last updated 2 years ago.

fortran

12.7 match 1 stars 2.48 score 15 scripts

jackdunnnz

iai:Interface to 'Interpretable AI' Modules

An interface to the algorithms of 'Interpretable AI' <https://www.interpretable.ai> from the R programming language. 'Interpretable AI' provides various modules, including 'Optimal Trees' for classification, regression, prescription and survival analysis, 'Optimal Imputation' for missing data imputation and outlier detection, and 'Optimal Feature Selection' for exact sparse regression. The 'iai' package is an open-source project. The 'Interpretable AI' software modules are proprietary products, but free academic and evaluation licenses are available.

Maintained by Jack Dunn. Last updated 5 months ago.

15.7 match 1 stars 2.00 score 7 scripts

nicolalunardon

ROSE:Random Over-Sampling Examples

Functions to deal with binary classification problems in the presence of imbalanced classes. Synthetic balanced samples are generated according to ROSE (Menardi and Torelli, 2013). Functions that implement more traditional remedies to the class imbalance are also provided, as well as different metrics to evaluate a learner's accuracy. These are estimated by holdout, bootstrap or cross-validation methods.

Maintained by Nicola Lunardon. Last updated 4 years ago.

4.5 match 4 stars 6.86 score 1.6k scripts 3 dependents

bioc

ROCpAI:Receiver Operating Characteristic Partial Area Indexes for evaluating classifiers

The package analyzes the ROC curve, identifies which of several types of ROC curve it is, and calculates the area under the curve using the most accurate method for that type. It can standardize both proper and improper pAUC.

Maintained by Juan-Pedro Garcia. Last updated 5 months ago.

software statisticalmethod classification

9.2 match 3.30 score 2 scripts

andrisignorell

ModTools:Building Regression and Classification Models

Consistent user interface to the most common regression and classification algorithms, such as random forest, neural networks, C5 trees and support vector machines, complemented with a handful of auxiliary functions, such as variable importance and a tuning function for the parameters.

Maintained by Andri Signorell. Last updated 2 months ago.

7.2 match 2 stars 4.20 score 3 scripts

andrisignorell

DescTools:Tools for Descriptive Statistics

A collection of miscellaneous basic statistical functions and convenience wrappers for efficiently describing data. The author's intention was to create a toolbox that facilitates the (notoriously time-consuming) first descriptive tasks in data analysis: calculating descriptive statistics, drawing graphical summaries and reporting the results. The package also contains functions to produce documents using MS Word (or PowerPoint) and functions to import data from Excel. Many of the included functions can be found scattered in other packages and other sources written partly by Titans of R. The reason for collecting them here was primarily to consolidate them in ONE package instead of dozens (which themselves might depend on other packages that are not needed at all), and to provide a common and consistent interface with respect to function and argument naming, NA handling, recycling rules, etc. Google style guides were used as naming rules (in the absence of convincing alternatives). The 'BigCamelCase' style was consistently applied to functions borrowed from contributed R packages as well.

Maintained by Andri Signorell. Last updated 7 hours ago.

fortran cpp

1.8 match 87 stars 16.70 score 7.7k scripts 99 dependents

mlr-org

mlr3:Machine Learning in R - Next Generation

Efficient, object-oriented programming on the building blocks of machine learning. Provides 'R6' objects for tasks, learners, resamplings, and measures. The package is geared towards scalability and larger datasets by supporting parallelization and out-of-memory data-backends like databases. While 'mlr3' focuses on the core computational operations, add-on packages provide additional functionality.

Maintained by Marc Becker. Last updated 4 days ago.

classification data-science machine-learning mlr3 regression

2.0 match 972 stars 14.86 score 2.3k scripts 35 dependents

bioc

survcomp:Performance Assessment and Comparison for Survival Analysis

Assessment and Comparison for Performance of Risk Prediction (Survival) Models.

Maintained by Benjamin Haibe-Kains. Last updated 5 months ago.

geneexpression differentialexpression visualization cpp

3.4 match 8.46 score 448 scripts 12 dependents

glsnow

TeachingDemos:Demonstrations for Teaching and Learning

Demonstration functions that can be used in a classroom to demonstrate statistical concepts, or on your own to better understand the concepts or the programming.

Maintained by Greg Snow. Last updated 1 year ago.

4.0 match 7.18 score 760 scripts 13 dependents

schlosslab

mikropml:User-Friendly R Package for Supervised Machine Learning Pipelines

An interface to build machine learning models for classification and regression problems. 'mikropml' implements the ML pipeline described by Topçuoğlu et al. (2020) <doi:10.1128/mBio.00434-20> with reasonable default options for data preprocessing, hyperparameter tuning, cross-validation, testing, model evaluation, and interpretation steps. See the website <https://www.schlosslab.org/mikropml/> for more information, documentation, and examples.

Maintained by Kelly Sovacool. Last updated 2 years ago.

machine-learning

3.6 match 56 stars 7.83 score 86 scripts

tgno3

DJL:Distance Measure Based Judgment and Learning

Implements various decision support tools related to econometrics and technometrics. Subroutines include correlation reliability test, Mahalanobis distance measure for outlier detection, combinatorial search (all possible subset regression), non-parametric efficiency analysis measures: DDF (directional distance function), DEA (data envelopment analysis), HDF (hyperbolic distance function), SBM (slack-based measure), and SF (shortage function), benchmarking, Malmquist productivity analysis, risk analysis, technology adoption model, new product target setting, network DEA, dynamic DEA, intertemporal budgeting, etc.

Maintained by Dong-Joon Lim. Last updated 2 years ago.

14.3 match 1 stars 1.97 score 93 scripts

winvector

sigr:Succinct and Correct Statistical Summaries for Reports

Succinctly and correctly format statistical summaries of various models and tests (F-test, Chi-Sq-test, Fisher-test, T-test, and rank-significance). This package also includes empirical tests, such as Monte Carlo and bootstrap distribution estimates.

Maintained by John Mount. Last updated 2 years ago.

3.9 match 28 stars 7.18 score 97 scripts 1 dependents

gbm-developers

gbm:Generalized Boosted Regression Models

An implementation of extensions to Freund and Schapire's AdaBoost algorithm and Friedman's gradient boosting machine. Includes regression methods for least squares, absolute loss, t-distribution loss, quantile regression, logistic, multinomial logistic, Poisson, Cox proportional hazards partial likelihood, AdaBoost exponential loss, Huberized hinge loss, and Learning to Rank measures (LambdaMart). Originally developed by Greg Ridgeway. Newer version available at github.com/gbm-developers/gbm3.

Maintained by Greg Ridgeway. Last updated 9 months ago.

cpp

2.0 match 52 stars 13.85 score 6.8k scripts 91 dependents

bioc

minet:Mutual Information NETworks

This package implements various algorithms for inferring mutual information networks from data.

Maintained by Patrick E. Meyer. Last updated 5 months ago.

microarray graphandnetwork network networkinference cpp

4.5 match 6.15 score 114 scripts 16 dependents

da-zar

ROCket:Simple and Fast ROC Curves

A set of functions for receiver operating characteristic (ROC) curve estimation and area under the curve (AUC) calculation. All functions are designed to work with aggregated data; nevertheless, they can also handle raw samples. In 'ROCket', we distinguish two types of ROC curve representations: 1) parametric curves - the true positive rate (TPR) and the false positive rate (FPR) are functions of a parameter (the score), 2) functions - TPR is a function of FPR. There are several ROC curve estimation methods available. An introduction to the mathematical background of the implemented methods (and much more) can be found in de Zea Bermudez, Gonçalves, Oliveira & Subtil (2014) <https://www.ine.pt/revstat/pdf/rs140101.pdf> and Cai & Pepe (2004) <doi:10.1111/j.0006-341X.2004.00200.x>.

Maintained by Daniel Lazar. Last updated 4 years ago.

10.2 match 1 stars 2.70 score 6 scripts
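ROCket's description stresses working with aggregated data. The following Python sketch illustrates that idea; it is not ROCket's R API. Each distinct score comes with a count of positives and negatives. The sketch sweeps the threshold from the highest score down to trace the empirical ROC curve and sums trapezoids to get the AUC.

It rests on two assumptions: higher scores indicate the positive class, and both classes are present. The function name `roc_from_counts` is hypothetical.

```python
def roc_from_counts(scores, pos_counts, neg_counts):
    """Empirical ROC points and trapezoidal AUC from aggregated counts.

    scores:     distinct score values (higher = more likely positive, an assumption)
    pos_counts: number of positive cases observed at each score
    neg_counts: number of negative cases observed at each score
    """
    rows = sorted(zip(scores, pos_counts, neg_counts), key=lambda r: r[0], reverse=True)
    n_pos = sum(pos_counts)
    n_neg = sum(neg_counts)
    tp = fp = 0
    points = [(0.0, 0.0)]  # (FPR, TPR) when nothing is called positive
    for _, p, n in rows:
        # Lowering the threshold to this score flags every case with this score at once
        tp += p
        fp += n
        points.append((fp / n_neg, tp / n_pos))
    # Trapezoidal rule; tied scores produce a diagonal segment (ties count 1/2)
    auc = sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))
    return points, auc


points, auc = roc_from_counts([3, 2, 1], [2, 1, 0], [0, 1, 2])
print(points)
print(auc)
```

Because all cases sharing a score enter together, the trapezoidal AUC here equals the Mann-Whitney estimate with ties counted as one half.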

abichat

evabic:Evaluation of Binary Classifiers

Evaluates the performance of binary classifiers. Computes confusion measures (TP, TN, FP, FN), derived measures (TPR, FDR, accuracy, F1, DOR, ..), and area under the curve. Outputs are well suited for nested dataframes.

Maintained by Antoine Bichat. Last updated 3 years ago.

classifier measures predictors roc-curve statistics

7.5 match 6 stars 3.62 score 14 scripts
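The confusion measures listed for evabic follow standard definitions. This Python sketch, which is not evabic's R interface, computes them from paired predicted/true labels. The names `confusion_measures` and `_ratio` are hypothetical. Returning NaN when a denominator is zero is an assumption; it is not evabic's documented behavior.

```python
import math


def _ratio(num, den):
    # Assumption: undefined ratios (zero denominator) are reported as NaN
    return num / den if den else math.nan


def confusion_measures(predicted, truth):
    """Confusion counts and derived measures for a binary classifier.

    predicted, truth: equal-length sequences of booleans or 0/1 values.
    """
    pairs = list(zip(predicted, truth))
    tp = sum(1 for p, t in pairs if p and t)
    fp = sum(1 for p, t in pairs if p and not t)
    tn = sum(1 for p, t in pairs if not p and not t)
    fn = sum(1 for p, t in pairs if not p and t)
    return {
        "TP": tp, "FP": fp, "TN": tn, "FN": fn,
        "TPR": _ratio(tp, tp + fn),             # sensitivity / recall
        "FDR": _ratio(fp, tp + fp),             # false discovery rate
        "accuracy": _ratio(tp + tn, len(pairs)),
        "F1": _ratio(2 * tp, 2 * tp + fp + fn),
        "DOR": _ratio(tp * tn, fp * fn),        # diagnostic odds ratio
    }


m = confusion_measures([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
print(m)
```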

myles-lewis

nestedcv:Nested Cross-Validation with 'glmnet' and 'caret'

Implements nested k*l-fold cross-validation for lasso and elastic-net regularised linear models via the 'glmnet' package and other machine learning models via the 'caret' package <doi:10.1093/bioadv/vbad048>. Cross-validation of 'glmnet' alpha mixing parameter and embedded fast filter functions for feature selection are provided. Described as double cross-validation by Stone (1977) <doi:10.1111/j.2517-6161.1977.tb01603.x>. Also implemented is a method using outer CV to measure unbiased model performance metrics when fitting Bayesian linear and logistic regression shrinkage models using the horseshoe prior over parameters to encourage a sparse model as described by Piironen & Vehtari (2017) <doi:10.1214/17-EJS1337SI>.

Maintained by Myles Lewis. Last updated 6 days ago.

3.4 match 12 stars 7.92 score 46 scripts

cran

datarobot:'DataRobot' Predictive Modeling API

For working with the 'DataRobot' predictive modeling platform's API <https://www.datarobot.com/>.

Maintained by AJ Alon. Last updated 1 year ago.

7.5 match 2 stars 3.48 score

iangow

farr:Data and Code for Financial Accounting Research

Handy functions and data to support a course book for accounting research. Gow, Ian D. and Tongqing Ding (2024) 'Empirical Research in Accounting: Tools and Methods' <https://iangow.github.io/far_book/>.

Maintained by Ian Gow. Last updated 1 months ago.

accounting finance

5.1 match 17 stars 5.05 score 66 scripts

bioc

structToolbox:Data processing & analysis tools for Metabolomics and other omics

An extensive set of data (pre-)processing and analysis methods and tools for metabolomics and other omics, with a strong emphasis on statistics and machine learning. This toolbox allows the user to build extensive and standardised workflows for data analysis. The methods and tools have been implemented using class-based templates provided by the struct (Statistics in R Using Class-based Templates) package. The toolbox includes pre-processing methods (e.g. signal drift and batch correction, normalisation, missing value imputation and scaling), univariate (e.g. ttest, various forms of ANOVA, Kruskal–Wallis test and more) and multivariate statistical methods (e.g. PCA and PLS, including cross-validation and permutation testing) as well as machine learning methods (e.g. Support Vector Machines). The STATistics Ontology (STATO) has been integrated and implemented to provide standardised definitions for the different methods, inputs and outputs.

Maintained by Gavin Rhys Lloyd. Last updated 25 days ago.

workflowstep metabolomics bioconductor-package dims lc-ms machine-learning multivariate-analysis statistics univariate

4.0 match 10 stars 6.26 score 12 scripts

easystats

see:Model Visualisation Toolbox for 'easystats' and 'ggplot2'

Provides plotting utilities supporting packages in the 'easystats' ecosystem (<https://github.com/easystats/easystats>) and some extra themes, geoms, and scales for 'ggplot2'. Color scales are based on <https://materialui.co/>. References: Lüdecke et al. (2021) <doi:10.21105/joss.03393>.

Maintained by Indrajeet Patil. Last updated 5 days ago.

data-visualization easystats ggplot2 hacktoberfest plotting see statistics visualisation visualization

1.9 match 902 stars 13.22 score 2.0k scripts 3 dependents

bioc

TBSignatureProfiler:Profile RNA-Seq Data Using TB Pathway Signatures

Gene signatures of TB progression, TB disease, and other TB disease states have been validated and published previously. This package aggregates known signatures and provides computational tools to apply them to other datasets. The TBSignatureProfiler makes it easy to profile RNA-Seq data using these signatures and includes common signature profiling tools, including ASSIGN, GSVA, and ssGSEA. Original models for some gene signatures are also available. A Shiny app provides some of this functionality alongside the more detailed command-line interface.

Maintained by Aubrey R. Odom. Last updated 3 months ago.

geneexpression differentialexpression bioconductor-package biomarkers gene-signatures tuberculosis

3.4 match 12 stars 7.25 score 23 scripts

cran

epiDisplay:Epidemiological Data Display Package

Package for data exploration and result presentation. Full 'epicalc' package with data management functions is available at '<https://medipe.psu.ac.th/epicalc/>'.

Maintained by Virasakdi Chongsuvivatwong. Last updated 3 years ago.

4.5 match 1 stars 5.44 score 758 scripts 2 dependents

mfrasco

Metrics:Evaluation Metrics for Machine Learning

An implementation of evaluation metrics in R that are commonly used in supervised machine learning. It implements metrics for regression, time series, binary classification, classification, and information retrieval problems. It has zero dependencies and a consistent, simple interface for all functions.

Maintained by Michael Frasco. Last updated 6 years ago.

1.9 match 99 stars 13.02 score 6.1k scripts 51 dependents

ipa-tys

ROCR:Visualizing the Performance of Scoring Classifiers

ROC graphs, sensitivity/specificity curves, lift charts, and precision/recall plots are popular examples of trade-off visualizations for specific pairs of performance measures. ROCR is a flexible tool for creating cutoff-parameterized 2D performance curves by freely combining two from over 25 performance measures (new performance measures can be added using a standard interface). Curves from different cross-validation or bootstrapping runs can be averaged by different methods, and standard deviations, standard errors or box plots can be used to visualize the variability across the runs. The parameterization can be visualized by printing cutoff values at the corresponding curve positions, or by coloring the curve according to cutoff. All components of a performance plot can be quickly adjusted using a flexible parameter dispatching mechanism. Despite its flexibility, ROCR is easy to use, with only three commands and reasonable default values for all optional parameters.

Maintained by Felix G.M. Ernst. Last updated 12 months ago.

1.7 match 38 stars 14.29 score 9.2k scripts 217 dependents

stc04003

rocTree:Receiver Operating Characteristic (ROC)-Guided Classification and Survival Tree

Receiver Operating Characteristic (ROC)-guided survival trees and ensemble algorithms are implemented, providing a unified framework for tree-structured analysis with censored survival outcomes. A time-invariant partition scheme on the survivor population was considered to incorporate time-dependent covariates. Motivated by ideas of randomized tests, generalized time-dependent ROC curves were used to evaluate the performance of survival trees and establish the optimality of the target hazard/survival function. The optimality of the target hazard function motivates us to use a weighted average of the time-dependent area under the curve (AUC) on a set of time points to evaluate the prediction performance of survival trees and to guide splitting and pruning. A detailed description of the implemented methods can be found in Sun et al. (2019) <arXiv:1809.05627>.

Maintained by Sy Han Chiou. Last updated 4 years ago.

decision-trees cpp

7.1 match 5 stars 3.40 score 7 scripts

bioc

gCrisprTools:Suite of Functions for Pooled Crispr Screen QC and Analysis

Set of tools for evaluating pooled high-throughput screening experiments, typically employing CRISPR/Cas9 or shRNA expression cassettes. Contains methods for interrogating library and cassette behavior within an experiment, identifying differentially abundant cassettes, aggregating signals to identify candidate targets for empirical validation, hypothesis testing, and comprehensive reporting. Version 2.0 extends these applications to include a variety of tools for contextualizing and integrating signals across many experiments, incorporates extended signal enrichment methodologies via the "sparrow" package, and streamlines many formal requirements to aid in interpretability.

Maintained by Russell Bainer. Last updated 5 months ago.

immunooncology crispr pooledscreens experimentaldesign biomedicalinformatics cellbiology functionalgenomics pharmacogenomics pharmacogenetics systemsbiology differentialexpression genesetenrichment genetics multiplecomparison normalization preprocessing qualitycontrol rnaseq regression software visualization

5.0 match 4.78 score 8 scripts

izmirlig

pwrFDR:FDR Power

Computes average and TPX power under various BH-FDR-type sequential procedures. All of these procedures involve control of some summary of the distribution of the FDP, e.g. the proportion of discoveries which are false in a given experiment. The most widely known of these, the BH-FDR procedure, controls the FDR which is the mean of the FDP. A lesser known procedure, due to Lehmann and Romano, controls the FDX, or probability that the FDP exceeds a user provided threshold. This is less conservative than FWE control procedures but much more conservative than the BH-FDR procedure. This package and the references supporting it introduce a new procedure for controlling the FDX which we call the BH-FDX procedure. This procedure iteratively identifies, given alpha and lower threshold delta, an alpha* less than alpha at which BH-FDR guarantees FDX control. This uses asymptotic approximation and is only slightly more conservative than the BH-FDR procedure. Likewise, we can think of the power in multiple testing experiments in terms of a summary of the distribution of the True Positive Proportion (TPP), the portion of tests truly non-null distributed that are called significant. The package will compute power, sample size or any other missing parameter required for power defined as (i) the mean of the TPP (average power) or (ii) the probability that the TPP exceeds a given value, lambda, (TPX power) via asymptotic approximation. All supplied theoretical results are also obtainable via simulation. The suggested approach is to narrow in on a design via the theoretical approaches and then make final adjustments/verify the results by simulation. The theoretical results are described in Izmirlian, G (2020) Statistics and Probability letters, "<doi:10.1016/j.spl.2020.108713>", and an applied paper describing the methodology with a simulation study is in preparation. See citation("pwrFDR").

Maintained by Grant Izmirlian. Last updated 2 months ago.

9.3 match 2.58 score 19 scripts
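pwrFDR computes power for BH-FDR-type procedures. For reference, here is a Python sketch of the underlying Benjamini-Hochberg step-up rule. It does not implement the package's power calculations or its BH-FDX procedure. Sort the m p-values, find the largest rank k with p_(k) <= k * alpha / m, and reject the k smallest. The function name `benjamini_hochberg` is hypothetical.

```python
def benjamini_hochberg(pvalues, alpha):
    """Return a list of booleans: True where the BH step-up procedure rejects."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p-value
    k_max = 0
    for rank, i in enumerate(order, start=1):
        # Step-up: remember the LARGEST rank that passes, even if smaller ranks fail
        if pvalues[i] <= rank * alpha / m:
            k_max = rank
    rejected = [False] * m
    for i in order[:k_max]:
        rejected[i] = True
    return rejected


# 0.02 fails its own threshold (0.05/3), but is rejected because 0.024 passes at rank 2
print(benjamini_hochberg([0.02, 0.024, 0.5], 0.05))
```

The step-up search is the reason the smallest p-value in the example is rejected even though it exceeds its own rank-1 threshold.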

fvafrcu

HandTill2001:Multiple Class Area under ROC Curve

An S4 implementation of Eq. (3) and Eq. (7) by David J. Hand and Robert J. Till (2001) <DOI:10.1023/A:1010920819831>.

Maintained by Andreas Dominik Cullmann. Last updated 4 years ago.

4.8 match 4.95 score 59 scripts 1 dependents

cran

ROCaggregator:Aggregate Multiple ROC Curves into One Global ROC

Aggregates multiple Receiver Operating Characteristic (ROC) curves obtained from different sources into one global ROC. Additionally, it’s also possible to calculate the aggregated precision-recall (PR) curve.

Maintained by Pedro Mateus. Last updated 4 years ago.

8.8 match 2.70 score

wkostelecki

ezplot:Functions for Common Chart Types

Wrapper for the 'ggplot2' package that creates a variety of common charts (e.g. bar, line, area, ROC, waterfall, pie) while aiming to reduce typing.

Maintained by Wojtek Kostelecki. Last updated 7 months ago.

3.8 match 5 stars 6.16 score 116 scripts

rstudio

keras3:R Interface to 'Keras'

Interface to 'Keras' <https://keras.io>, a high-level neural networks API. 'Keras' was developed with a focus on enabling fast experimentation, supports both convolution based networks and recurrent networks (as well as combinations of the two), and runs seamlessly on both CPU and GPU devices.

Maintained by Tomasz Kalinowski. Last updated 4 days ago.

1.7 match 845 stars 13.57 score 264 scripts 2 dependents

kozodoi

fairness:Algorithmic Fairness Metrics

Offers calculation, visualization and comparison of algorithmic fairness metrics. Fair machine learning is an emerging topic with the overarching aim to critically assess whether ML algorithms reinforce existing social biases. Unfair algorithms can propagate such biases and produce predictions with a disparate impact on various sensitive groups of individuals (defined by sex, gender, ethnicity, religion, income, socioeconomic status, physical or mental disabilities). Fair algorithms possess the underlying foundation that these groups should be treated similarly or have similar prediction outcomes. The fairness R package offers the calculation and comparisons of commonly and less commonly used fairness metrics in population subgroups. These methods are described by Calders and Verwer (2010) <doi:10.1007/s10618-010-0190-x>, Chouldechova (2017) <doi:10.1089/big.2016.0047>, Feldman et al. (2015) <doi:10.1145/2783258.2783311> , Friedler et al. (2018) <doi:10.1145/3287560.3287589> and Zafar et al. (2017) <doi:10.1145/3038912.3052660>. The package also offers convenient visualizations to help understand fairness metrics.

Maintained by Nikita Kozodoi. Last updated 2 years ago.

algorithmic-discrimination algorithmic-fairness discrimination disparate-impact fairness fairness-ai fairness-ml machine-learning

3.3 match 32 stars 6.82 score 69 scripts 1 dependents

r-forge

tram:Transformation Models

Formula-based user-interfaces to specific transformation models implemented in package 'mlt' (<DOI:10.32614/CRAN.package.mlt>, <DOI:10.32614/CRAN.package.mlt.docreg>). Available models include Cox models, some parametric survival models (Weibull, etc.), models for ordered categorical variables, normal and non-normal (Box-Cox type) linear models, and continuous outcome logistic regression (Lohse et al., 2017, <DOI:10.12688/f1000research.12934.1>). The underlying theory is described in Hothorn et al. (2018) <DOI:10.1111/sjos.12291>. An extension to transformation models for clustered data is provided (Barbanti and Hothorn, 2022, <DOI:10.1093/biostatistics/kxac048>). Multivariate conditional transformation models (Klein et al, 2022, <DOI:10.1111/sjos.12501>) and shift-scale transformation models (Siegfried et al, 2023, <DOI:10.1080/00031305.2023.2203177>) can be fitted as well. The package contains an implementation of a doubly robust score test, described in Kook et al. (2024, <DOI:10.1080/01621459.2024.2395588>).

Maintained by Torsten Hothorn. Last updated 4 days ago.

3.3 match 6.87 score 97 scripts 6 dependents

bioc

mixOmics:Omics Data Integration Project

Multivariate methods are well suited to large omics data sets where the number of variables (e.g. genes, proteins, metabolites) is much larger than the number of samples (patients, cells, mice). They have the appealing properties of reducing the dimension of the data by using instrumental variables (components), which are defined as combinations of all variables. Those components are then used to produce useful graphical outputs that enable better understanding of the relationships and correlation structures between the different data sets that are integrated. mixOmics offers a wide range of multivariate methods for the exploration and integration of biological datasets with a particular focus on variable selection. The package proposes several sparse multivariate models we have developed to identify the key variables that are highly correlated, and/or explain the biological outcome of interest. The data that can be analysed with mixOmics may come from high throughput sequencing technologies, such as omics data (transcriptomics, metabolomics, proteomics, metagenomics etc) but also beyond the realm of omics (e.g. spectral imaging). The methods implemented in mixOmics can also handle missing values without having to delete entire rows with missing data. A non-exhaustive list of methods includes variants of generalised Canonical Correlation Analysis, sparse Partial Least Squares and sparse Discriminant Analysis. Recently we implemented integrative methods to combine multiple data sets: N-integration with variants of Generalised Canonical Correlation Analysis and P-integration with variants of multi-group Partial Least Squares.

Maintained by Eva Hamrud. Last updated 4 days ago.

immunooncology microarray sequencing metabolomics metagenomics proteomics geneprediction multiplecomparison classification regression bioconductor genomics genomics-data genomics-visualization multivariate-analysis multivariate-statistics omics r-pkg r-project

1.7 match 182 stars 13.71 score 1.3k scripts 22 dependents

cran

multiROC:Calculating and Visualizing ROC and PR Curves Across Multi-Class Classifications

Tools to solve real-world problems with multiple classes classifications by computing the areas under ROC and PR curve via micro-averaging and macro-averaging. The vignettes of this package can be found via <https://github.com/WandeRum/multiROC>. The methodology is described in V. Van Asch (2013) <https://www.clips.uantwerpen.be/~vincent/pdf/microaverage.pdf> and Pedregosa et al. (2011) <http://scikit-learn.org/stable/auto_examples/model_selection/plot_roc.html>.

Maintained by Runmin Wei. Last updated 7 years ago.

12.7 match 1.78 score
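multiROC summarizes multi-class performance by micro- and macro-averaging. The Python sketch below, which is not the package's R API, shows the difference using one-vs-rest binarization:

- **Micro average:** pools every (score, is-this-class) pair into a single binary problem, as in the ravel step of the linked scikit-learn example.
- **Macro average:** here taken as the unweighted mean of the per-class AUCs. This is an assumption; the scikit-learn example instead averages interpolated TPR curves, which can give slightly different numbers.

Function names are hypothetical.

```python
def binary_auc(scores, labels):
    """Mann-Whitney AUC for 0/1 labels; ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def macro_micro_auc(probs, labels, n_classes):
    """probs[k][c] = score of sample k for class c; labels[k] = true class index."""
    per_class = []
    pooled_scores, pooled_labels = [], []
    for c in range(n_classes):
        # One-vs-rest binarization for class c
        s = [row[c] for row in probs]
        y = [1 if lab == c else 0 for lab in labels]
        per_class.append(binary_auc(s, y))
        pooled_scores += s
        pooled_labels += y
    macro = sum(per_class) / n_classes
    micro = binary_auc(pooled_scores, pooled_labels)
    return macro, micro


# Two classes, three samples: macro = 0.5, micro = 2/3
print(macro_micro_auc([[0.6, 0.4], [0.7, 0.3], [0.2, 0.8]], [0, 1, 1], 2))
```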

bioc

scAnnotatR:Pretrained learning models for cell type prediction on single cell RNA-sequencing data

The package comprises a set of pretrained machine learning models to predict basic immune cell types. This enables all users to quickly get a first annotation of the cell types present in their dataset without requiring prior knowledge. scAnnotatR also allows users to train their own models to predict new cell types based on specific research needs.

Maintained by Johannes Griss. Last updated 5 months ago.

singlecell transcriptomics geneexpression supportvectormachine classification software

3.3 match 15 stars 6.73 score 20 scripts

snoweye

cubfits:Codon Usage Bias Fits

Estimating mutation and selection coefficients on synonymous codon usage bias based on models of ribosome overhead cost (ROC). Multinomial logistic regression and Markov Chain Monte Carlo are used to estimate and predict protein production rates with/without the presence of expressions and measurement errors. Workflows with examples for simulation, estimation and prediction processes are also provided with parallelization speedup. The whole framework is tested with yeast genome and gene expression data of Yassour, et al. (2009) <doi:10.1073/pnas.0812841106>.

Maintained by Wei-Chen Chen. Last updated 3 years ago.

4.5 match 7 stars 4.83 score 32 scripts

numbersman77

logcondens:Estimate a Log-Concave Probability Density from Iid Observations

Given independent and identically distributed observations X(1), ..., X(n), compute the maximum likelihood estimator (MLE) of a density as well as a smoothed version of it under the assumption that the density is log-concave, see Rufibach (2007) and Duembgen and Rufibach (2009). The main function of the package is 'logConDens' that allows computation of the log-concave MLE and its smoothed version. In addition, we provide functions to compute (1) the value of the density and distribution function estimates (MLE and smoothed) at a given point, (2) the characterizing functions of the estimator, (3) samples from the estimated distribution, (4) a two-sample permutation test based on log-concave densities, (5) the ROC curve based on log-concave estimates within cases and controls, including confidence intervals for given values of false positive fractions, and (6) a confidence interval for the value of the true density at a fixed point. Finally, three datasets that have been used to illustrate log-concave density estimation are made available.

Maintained by Kaspar Rufibach. Last updated 2 years ago.

6.4 match 3.31 score 31 scripts 1 dependents

aaamini

nett:Network Analysis and Community Detection

Features tools for network data analysis and community detection. Provides multiple methods for fitting, model selection and goodness-of-fit testing in degree-corrected stochastic block models. Most of the computations are fast and scalable for sparse networks, especially for Poisson versions of the models. Implements the following: Amini, Chen, Bickel and Levina (2013) <doi:10.1214/13-AOS1138>; Bickel and Sarkar (2015) <doi:10.1111/rssb.12117>; Lei (2016) <doi:10.1214/15-AOS1370>; Wang and Bickel (2017) <doi:10.1214/16-AOS1457>; Zhang and Amini (2020) <arXiv:2012.15047>; Le and Levina (2022) <doi:10.1214/21-EJS1971>.

Maintained by Arash A. Amini. Last updated 2 years ago.

cpp

3.8 match 8 stars 5.48 score 19 scripts

bioc

genefilter:genefilter: methods for filtering genes from high-throughput experiments

Some basic functions for filtering genes.

Maintained by Bioconductor Package Maintainer. Last updated 5 months ago.

microarray fortran cpp

1.9 match 11.10 score 2.4k scripts 142 dependents

sqyu

genscore:Generalized Score Matching Estimators

Implementation of the Generalized Score Matching estimator in Yu et al. (2019) <http://jmlr.org/papers/v20/18-278.html> for non-negative graphical models (truncated Gaussian, exponential square-root, gamma, a-b models) and univariate truncated Gaussian distributions. Also includes the original estimator for untruncated Gaussian graphical models from Lin et al. (2016) <doi:10.1214/16-EJS1126>, with the addition of a diagonal multiplier.

Maintained by Shiqing Yu. Last updated 5 years ago.

density-estimation graphical-models interaction-models score-matching undirected-graphs

4.9 match 1 stars 4.18 score 3 scripts 1 dependents

singmann

MPTinR:Analyze Multinomial Processing Tree Models

Provides a user-friendly way for the analysis of multinomial processing tree (MPT) models (e.g., Riefer, D. M., and Batchelder, W. H. [1988]. Multinomial modeling and the measurement of cognitive processes. Psychological Review, 95, 318-339) for single and multiple datasets. The main functions perform model fitting and model selection. Model selection can be done using AIC, BIC, or the Fisher Information Approximation (FIA) a measure based on the Minimum Description Length (MDL) framework. The model and restrictions can be specified in external files or within an R script in an intuitive syntax or using the context-free language for MPTs. The 'classical' .EQN file format for model files is also supported. Besides MPTs, this package can fit a wide variety of other cognitive models such as SDT models (see fit.model). It also supports multicore fitting and FIA calculation (using the snowfall package), can generate or bootstrap data for simulations, and plot predicted versus observed data.

Maintained by Henrik Singmann. Last updated 4 years ago.

cpp

5.1 match 3 stars 3.99 score 27 scripts 1 dependents
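AIC and BIC, two of the model-selection criteria named in the MPTinR entry, follow standard formulas. The sketch below is a generic illustration under the assumption that each candidate model is summarized by its maximized log-likelihood and number of free parameters; it is not MPTinR's API, and the FIA criterion is not shown. The model names and numbers in the usage note are hypothetical.

```python
from math import log


def aic(log_lik: float, k: int) -> float:
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * k - 2 * log_lik


def bic(log_lik: float, k: int, n: int) -> float:
    """Bayesian information criterion: k ln n - 2 ln L (lower is better)."""
    return k * log(n) - 2 * log_lik


def select(models: dict, n: int) -> tuple:
    """Return (best model by AIC, best model by BIC).

    models maps a model name to (maximized log-likelihood, number of free parameters);
    n is the number of observations used in the fit.
    """
    best_aic = min(models, key=lambda m: aic(*models[m]))
    best_bic = min(models, key=lambda m: bic(*models[m], n))
    return best_aic, best_bic
```

With hypothetical fits `{"small": (-105.0, 2), "large": (-100.0, 6)}` and `n=1000`, AIC prefers `"large"` while BIC, whose penalty grows with `ln n`, prefers `"small"`, which is why the two criteria can disagree.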

hanjunwei-lab

PMAPscore:Identify Prognosis-Related Pathways Altered by Somatic Mutation

We define a pathway mutation accumulate perturbation score (PMAPscore) to reflect the position and the cumulative effect of genetic mutations at the pathway level. Based on the PMAPscores of pathways, the package identifies prognosis-related pathways altered by somatic mutation and predicts immunotherapy efficacy by constructing a multiple-pathway-based risk model (Tarca, Adi Laurentiu et al (2008) <doi:10.1093/bioinformatics/btn577>).

Maintained by Junwei Han. Last updated 3 years ago.

5.4 match 3.70 score 2 scripts

bioc

netprioR:A model for network-based prioritisation of genes

A model for semi-supervised prioritisation of genes integrating network data, phenotypes and additional prior knowledge about TP and TN gene labels from the literature or experts.

Maintained by Fabian Schmich. Last updated 5 months ago.

immunooncology cellbasedassays preprocessing network

5.0 match 4.00 score 1 scripts

mlverse

luz:Higher Level 'API' for 'torch'

A high-level interface for 'torch' providing utilities to reduce the amount of code needed for common tasks, abstract away torch details and make the same code work on both the 'CPU' and 'GPU'. It's flexible enough to support expressing a large range of models. It's heavily inspired by 'fastai' by Howard et al. (2020) <arXiv:2002.04688>, 'Keras' by Chollet et al. (2015) and 'PyTorch Lightning' by Falcon et al. (2019) <doi:10.5281/zenodo.3828935>.

Maintained by Daniel Falbel. Last updated 6 months ago.

2.0 match 89 stars 9.86 score 318 scripts 4 dependents

anabraga

Comp2ROC:Compare Two ROC Curves that Intersect

Comparison of two ROC curves through the methodology proposed by Ana C. Braga.

Maintained by Ana C. Braga. Last updated 9 years ago.

19.6 match 1.00 score 9 scripts

cran

rocsvm.path:The Entire Solution Paths for ROC-SVM

We develop the entire solution paths for ROC-SVM presented by Rakotomamonjy. The ROC-SVM solution path algorithm greatly facilitates tuning of the regularization parameter lambda in ROC-SVM by avoiding a grid search, which may be computationally too intensive. For more information on ROC-SVM, see the report in the ROC Analysis in AI workshop (ROCAI-2004): Hernández-Orallo, José, et al. (2004) <doi:10.1145/1046456.1046489>.

Maintained by Seung Jun Shin. Last updated 6 years ago.

17.1 match 1.15 score 14 scripts

yuerany

fullROC:Plot Full ROC Curves using Eyewitness Lineup Data

Enable researchers to adjust identification rates using the 1/(lineup size) method, generate the full receiver operating characteristic (ROC) curves, and statistically compare the area under the curves (AUC). References: Yueran Yang & Andrew Smith. (2022). "fullROC: An R package for generating and analyzing eyewitness-lineup ROC curves". Behavior Research Methods. <doi:10.3758/s13428-022-01807-6>, Andrew Smith, Yueran Yang, & Gary Wells. (2020). "Distinguishing between investigator discriminability and eyewitness discriminability: A method for creating full receiver operating characteristic curves of lineup identification performance". Perspectives on Psychological Science, 15(3), 589-607. <doi:10.1177/1745691620902426>.

Maintained by Yueran Yang. Last updated 2 years ago.

9.7 match 2.00 score

adrianantico

AutoPlots:Creating Echarts Visualizations as Easy as Possible

Create beautiful and interactive visualizations in a single function call. The 'data.table' package is utilized to perform the data wrangling necessary to prepare your data for the plot types you wish to build, along with allowing fast processing for big data. There are two broad classes of plots available: standard plots and machine learning evaluation plots. There are lots of parameters available in each plot type function for customizing the plots (such as faceting) and data wrangling (such as variable transformations and aggregation).

Maintained by Adrian Antico. Last updated 10 months ago.

4.5 match 21 stars 4.32 score

rmojab63

ldt:Automated Uncertainty Analysis

Methods and tools for model selection and multi-model inference (Burnham and Anderson (2002) <doi:10.1007/b97636>, among others). 'SUR' (for parameter estimation), 'logit'/'probit' (for binary classification), and 'VARMA' (for time-series forecasting) are implemented. Evaluations are both in-sample and out-of-sample. It is designed to be efficient in terms of CPU usage and memory consumption.

Maintained by Ramin Mojab. Last updated 8 months ago.

openblas cpp openmp

7.8 match 2.48 score 7 scripts

shanascogin

BayesPostEst:Generate Postestimation Quantities for Bayesian MCMC Estimation

An implementation of functions to generate and plot postestimation quantities after estimating Bayesian regression models using Markov chain Monte Carlo (MCMC). Functionality includes the estimation of the Precision-Recall curves (see Beger, 2016 <doi:10.2139/ssrn.2765419>), the implementation of the observed values method of calculating predicted probabilities by Hanmer and Kalkan (2013) <doi:10.1111/j.1540-5907.2012.00602.x>, the implementation of the average value method of calculating predicted probabilities (see King, Tomz, and Wittenberg, 2000 <doi:10.2307/2669316>), and the generation and plotting of first differences to summarize typical effects across covariates (see Long 1997, ISBN:9780803973749; King, Tomz, and Wittenberg, 2000 <doi:10.2307/2669316>). This package can be used with MCMC output generated by any Bayesian estimation tool including 'JAGS', 'BUGS', 'MCMCpack', and 'Stan'.

Maintained by Shana Scogin. Last updated 3 years ago.

jags cpp

3.4 match 12 stars 5.71 score 17 scripts

toduckhanh

ClusROC:ROC Analysis in Three-Class Classification Problems for Clustered Data

Statistical methods for ROC surface analysis in three-class classification problems for clustered data and in the presence of covariates. In particular, the package allows users to obtain covariate-specific point and interval estimates for: (i) true class fractions (TCFs) at fixed pairs of thresholds; (ii) the ROC surface; (iii) the volume under ROC surface (VUS); (iv) the optimal pairs of thresholds. Methods considered in points (i), (ii) and (iv) are proposed and discussed in To et al. (2022) <doi:10.1177/09622802221089029>. Referring to point (iv), three different selection criteria are implemented: Generalized Youden Index (GYI), Closest to Perfection (CtP) and Maximum Volume (MV). Methods considered in point (iii) are proposed and discussed in Xiong et al. (2018) <doi:10.1177/0962280217742539>. Visualization tools are also provided. We refer readers to the articles cited above for all details.

Maintained by Duc-Khanh To. Last updated 2 years ago.

biomaker box-cox-transformation mixed-effects-models optimal-threshold reciver-operating-characteristics cpp

7.1 match 2.70 score 6 scripts

jamesmurray7

gmvjoint:Joint Models of Survival and Multivariate Longitudinal Data

Fit joint models of survival and multivariate longitudinal data. The longitudinal data is specified by generalised linear mixed models. The joint models are fit via maximum likelihood using an approximate expectation maximisation algorithm. Bernhardt (2015) <doi:10.1016/j.csda.2014.11.011>.

Maintained by James Murray. Last updated 5 months ago.

glmm joint-models longitudinal mixed-models model prediction survival survival-analysis openblas cpp openmp

5.1 match 3 stars 3.78 score 20 scripts

cran

tdROC:Nonparametric Estimation of Time-Dependent ROC, Brier Score, and Survival Difference from Right Censored Time-to-Event Data with or without Competing Risks

The tdROC package facilitates the estimation of time-dependent ROC (Receiver Operating Characteristic) curves and the Area Under the time-dependent ROC Curve (AUC) in the context of survival data, accommodating scenarios with right censored data and the option to account for competing risks. In addition to the ROC/AUC estimation, the package also estimates time-dependent Brier score and survival difference. Confidence intervals of various estimated quantities can be obtained from bootstrap. The package also offers plotting functions for visualizing time-dependent ROC curves.

Maintained by Xiaoyang Li. Last updated 1 year ago.

cpp

10.3 match 1.85 score

yanyachen

MLmetrics:Machine Learning Evaluation Metrics

A collection of evaluation metrics, including loss, score and utility functions, that measure regression, classification and ranking performance.

Maintained by Yachen Yan. Last updated 11 months ago.

1.7 match 69 stars 11.09 score 2.2k scripts 20 dependents

cran

SeaVal:Validation of Seasonal Weather Forecasts

Provides tools for processing and evaluating seasonal weather forecasts, with an emphasis on tercile forecasts. We follow the World Meteorological Organization's "Guidance on Verification of Operational Seasonal Climate Forecasts", S.J.Mason (2018, ISBN: 978-92-63-11220-0, URL: <https://library.wmo.int/idurl/4/56227>). The development was supported by the European Union’s Horizon 2020 research and innovation programme under grant agreement no. 869730 (CONFER). A comprehensive online tutorial is available at <https://seasonalforecastingengine.github.io/SeaValDoc/>.

Maintained by Claudio Heinrich-Mertsching. Last updated 9 months ago.

10.9 match 1.70 score

t-kalinowski

keras:R Interface to 'Keras'

Interface to 'Keras' <https://keras.io>, a high-level neural networks 'API'. 'Keras' was developed with a focus on enabling fast experimentation, supports both convolution based networks and recurrent networks (as well as combinations of the two), and runs seamlessly on both 'CPU' and 'GPU' devices.

Maintained by Tomasz Kalinowski. Last updated 11 months ago.

1.7 match 10.82 score 10k scripts 54 dependents

foucher-y

survivalSL:Super Learner for Survival Prediction from Censored Data

Several functions and S3 methods to construct a super learner in the presence of censored times-to-event and to evaluate its prognostic capacities.

Maintained by Yohann Foucher. Last updated 2 months ago.

5.0 match 2 stars 3.70 score

rwehrens

BioMark:Find Biomarkers in Two-Class Discrimination Problems

Variable selection methods are provided for several classification methods: the lasso/elastic net, PCLDA, PLSDA, and several t-tests. Two approaches for selecting cutoffs can be used, one based on the stability of model coefficients under perturbation, and the other on higher criticism.

Maintained by Ron Wehrens. Last updated 10 years ago.

7.8 match 2.32 score 21 scripts

ertansu

ROCsurf:ROC Surface Analysis Under the Three-Class Problems

Receiver Operating Characteristic (ROC) analysis is performed assuming samples are from the proposed distributions. In addition, the volume under the ROC surface and true positive fraction values are evaluated by ROC surface analysis.

Maintained by Ertan Akgenç. Last updated 8 months ago.

5.7 match 3.18 score 4 scripts

bioc

matter:Out-of-core statistical computing and signal processing

Toolbox for larger-than-memory scientific computing and visualization, providing efficient out-of-core data structures using files or shared memory, for dense and sparse vectors, matrices, and arrays, with applications to nonuniformly sampled signals and images.

Maintained by Kylie A. Bemis. Last updated 3 months ago.

infrastructure datarepresentation dataimport dimensionreduction preprocessing cpp

1.9 match 57 stars 9.52 score 64 scripts 2 dependents

youyifong

kyotil:Utility Functions for Statistical Analysis Report Generation and Monte Carlo Studies

Helper functions for creating formatted summary of regression models, writing publication-ready tables to latex files, and running Monte Carlo experiments.

Maintained by Youyi Fong. Last updated 8 days ago.

openblas

2.3 match 7.87 score 236 scripts 7 dependents

bioc

SC3:Single-Cell Consensus Clustering

A tool for unsupervised clustering and analysis of single cell RNA-Seq data.

Maintained by Vladimir Kiselev. Last updated 5 months ago.

immunooncology singlecell software classification clustering dimensionreduction supportvectormachine rnaseq visualization transcriptomics datarepresentation gui differentialexpression transcription bioconductor-package human-cell-atlas single-cell-rna-seq openblas cpp

1.8 match 122 stars 10.09 score 374 scripts 1 dependents

bioc

CMA:Synthesis of microarray-based classification

This package provides a comprehensive collection of various microarray-based classification algorithms both from Machine Learning and Statistics. Variable Selection, Hyperparameter tuning, Evaluation and Comparison can be performed combined or stepwise in a user-friendly environment.

Maintained by Roman Hornung. Last updated 5 months ago.

classification decisiontree

3.3 match 5.09 score 61 scripts

ndphillips

FFTrees:Generate, Visualise, and Evaluate Fast-and-Frugal Decision Trees

Create, visualize, and test fast-and-frugal decision trees (FFTs) using the algorithms and methods described by Phillips, Neth, Woike & Gaissmaier (2017), <doi:10.1017/S1930297500006239>. FFTs are simple and transparent decision trees for solving binary classification problems. FFTs can be preferable to more complex algorithms because they require very little information, are easy to understand and communicate, and are robust against overfitting.

Maintained by Hansjoerg Neth. Last updated 5 months ago.

1.8 match 135 stars 9.58 score 144 scripts
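A fast-and-frugal tree, as described in the FFTrees entry, checks cues one at a time: every node except the last has a single exit, and the last node exits both ways. The sketch below shows only that prediction structure, as a generic illustration rather than FFTrees' API. The cue names and thresholds are hypothetical, and how FFTrees learns cue order and thresholds from data is not shown.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Sequence


@dataclass(frozen=True)
class Node:
    cue: str                        # hypothetical name of the cue inspected at this level
    fires: Callable[[float], bool]  # condition under which this node exits
    exit_class: bool                # decision returned when `fires` is True


def fft_predict(tree: Sequence[Node], case: Mapping[str, float]) -> bool:
    """Walk the cues in order and stop at the first node whose condition fires.

    The final node exits both ways: it returns the opposite decision when its
    condition does not fire. `tree` must contain at least one node.
    """
    for node in tree[:-1]:
        if node.fires(case[node.cue]):
            return node.exit_class
    last = tree[-1]
    return last.exit_class if last.fires(case[last.cue]) else not last.exit_class


# Hypothetical three-cue tree: exit positive, then negative, then decide both ways.
tree = [
    Node("a", lambda v: v > 5, True),
    Node("b", lambda v: v < 2, False),
    Node("c", lambda v: v > 0.5, True),
]
```

The design keeps each decision traceable: the prediction for any case is explained by the single cue at which it exited.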

bioc

Rtpca:Thermal proximity co-aggregation with R

R package for performing thermal proximity co-aggregation analysis with thermal proteome profiling datasets to analyse protein complex assembly and (differential) protein-protein interactions across conditions.

Maintained by Nils Kurzawa. Last updated 5 months ago.

software proteomics dataimport

3.8 match 4.46 score 29 scripts

aijordan

triptych:Diagnostic Graphics to Evaluate Forecast Performance

Overall predictive performance is measured by a mean score (or loss), which decomposes into miscalibration, discrimination, and uncertainty components. The main focus is visualization of these distinct and complementary aspects in joint displays. See Dimitriadis, Gneiting, Jordan, Vogel (2024) <doi:10.1016/j.ijforecast.2023.09.007>.

Maintained by Alexander I. Jordan. Last updated 9 months ago.

cpp

5.1 match 3.28 score 19 scripts

adriancorrendo

metrica:Prediction Performance Metrics

A compilation of more than 80 functions designed to quantitatively and visually evaluate prediction performance of regression (continuous variables) and classification (categorical variables) of point-forecast models (e.g. APSIM, DSSAT, DNDC, supervised Machine Learning). For regression, it includes functions to generate plots (scatter, tiles, density, & Bland-Altman plot), and to estimate error metrics (e.g. MBE, MAE, RMSE), error decomposition (e.g. lack of accuracy-precision), model efficiency (e.g. NSE, E1, KGE), indices of agreement (e.g. d, RAC), goodness of fit (e.g. r, R2), adjusted correlation coefficients (e.g. CCC, dcorr), symmetric regression coefficients (intercept, slope), and mean absolute scaled error (MASE) for time series predictions. For classification (binomial and multinomial), it offers functions to generate and plot confusion matrices, and to estimate performance metrics such as accuracy, precision, recall, specificity, F-score, Cohen's Kappa, G-mean, and many more. For more details visit the vignettes <https://adriancorrendo.github.io/metrica/>.

Maintained by Adrian A. Correndo. Last updated 9 months ago.

2.0 match 77 stars 8.18 score 49 scripts
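Three of the regression error metrics listed for metrica (MBE, MAE, RMSE) can be computed as below. This is a plain-Python sketch of the textbook definitions, not metrica's implementation. The sign of MBE is an assumption here (error = predicted minus observed); conventions differ between sources.

```python
from math import sqrt
from typing import Dict, Sequence


def error_metrics(obs: Sequence[float], pred: Sequence[float]) -> Dict[str, float]:
    """Mean bias error, mean absolute error and root mean squared error."""
    if len(obs) != len(pred) or not obs:
        raise ValueError("obs and pred must be non-empty and of equal length")
    errors = [p - o for o, p in zip(obs, pred)]  # assumed sign: predicted minus observed
    n = len(errors)
    return {
        "MBE": sum(errors) / n,
        "MAE": sum(abs(e) for e in errors) / n,
        "RMSE": sqrt(sum(e * e for e in errors) / n),
    }
```

For example, `error_metrics([1, 2, 3], [2, 2, 5])` gives MBE 1.0, MAE 1.0 and RMSE `sqrt(5/3)`: RMSE exceeds MAE because it weights the larger error more heavily.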

cran

HUM:Compute HUM Value and Visualize ROC Curves

Tools for computing the HUM (Hypervolume Under the Manifold) value, which estimates the ability of features to discriminate between class labels, and for visualizing the ROC curve for two or three class labels (Natalia Novoselova, Cristina Della Beffa, Junxi Wang, Jialiang Li, Frank Pessler, Frank Klawonn (2014) <doi:10.1093/bioinformatics/btu086>).

Maintained by Natalia Novoselova. Last updated 3 years ago.

cpp

9.2 match 1 stars 1.78 score

aigorahub

sensR:Thurstonian Models for Sensory Discrimination

Provides methods for the following sensory discrimination protocols: duotrio, tetrad, triangle, 2-AFC, 3-AFC, A-not A, same-different, 2-AC and degree-of-difference. This enables the calculation of d-primes, standard errors of d-primes, sample size and power computations, and comparisons of different d-primes. Methods for profile likelihood confidence intervals and plotting are included. Most methods are described in Brockhoff, P.B. and Christensen, R.H.B. (2010) <doi:10.1016/j.foodqual.2009.04.003>.

Maintained by Dominik Rafacz. Last updated 1 year ago.

3.3 match 7 stars 4.92 score 77 scripts

wzhang17

sorocs:A Bayesian Semiparametric Approach to Correlated ROC Surfaces

A Bayesian semiparametric Dirichlet process mixture model to estimate correlated receiver operating characteristic (ROC) surfaces and the associated volume under the surface (VUS) with stochastic order constraints. The reference paper is: Zhen Chen, Beom Seuk Hwang, (2018) "A Bayesian semiparametric approach to correlated ROC surfaces with stochastic order constraints". Biometrics, 75, 539-550. <doi:10.1111/biom.12997>.

Maintained by Weimin Zhang. Last updated 5 years ago.

5.3 match 3.00 score 2 scripts

brian-j-smith

MachineShop:Machine Learning Models and Tools

Meta-package for statistical and machine learning with a unified interface for model fitting, prediction, performance assessment, and presentation of results. Approaches for model fitting and prediction of numerical, categorical, or censored time-to-event outcomes include traditional regression models, regularization methods, tree-based methods, support vector machines, neural networks, ensembles, data preprocessing, filtering, and model tuning and selection. Performance metrics are provided for model assessment and can be estimated with independent test sets, split sampling, cross-validation, or bootstrap resampling. Resample estimation can be executed in parallel for faster processing and nested in cases of model tuning and selection. Modeling results can be summarized with descriptive statistics; calibration curves; variable importance; partial dependence plots; confusion matrices; and ROC, lift, and other performance curves.

Maintained by Brian J Smith. Last updated 7 months ago.

classification-models machine-learning predictive-modeling regression-models survival-models

2.0 match 61 stars 7.95 score 121 scripts

bioc

puma:Propagating Uncertainty in Microarray Analysis (including Affymetrix traditional 3' arrays and exon arrays and Human Transcriptome Array 2.0)

Most analyses of Affymetrix GeneChip data (including traditional 3' arrays and exon arrays and Human Transcriptome Array 2.0) are based on point estimates of expression levels and ignore the uncertainty of such estimates. By propagating uncertainty to downstream analyses we can improve results from microarray analyses. For the first time, the puma package makes a suite of uncertainty propagation methods available to a general audience. In addition to calculating gene expression from Affymetrix 3' arrays, puma also provides methods to process exon arrays and produces gene and isoform expression for alternative splicing studies. puma also offers improvements in terms of scope and speed of execution over previously available uncertainty propagation methods. Included are summarisation, differential expression detection, clustering and PCA methods, together with useful plotting functions.

Maintained by Xuejun Liu. Last updated 5 months ago.

microarray onechannel preprocessing differentialexpression clustering exonarray geneexpression mrnamicroarray chiponchip alternativesplicing differentialsplicing bayesian twochannel dataimport hta2.0

3.5 match 4.53 score 17 scripts

tripartio

staccuracy:Standardized Accuracy and Other Model Performance Metrics

Standardized accuracy (staccuracy) is a framework for expressing accuracy scores such that 50% represents a reference level of performance and 100% is a perfect prediction. The 'staccuracy' package provides tools for creating staccuracy functions as well as some recommended staccuracy measures. It also provides functions for some classic performance metrics such as mean absolute error (MAE), root mean squared error (RMSE), and area under the receiver operating characteristic curve (AUCROC), as well as their winsorized versions when applicable.

Maintained by Chitu Okoli. Last updated 21 days ago.

3.8 match 1 stars 4.18 score 4 scripts 2 dependents

marsdu1989

reportROC:An Easy Way to Report ROC Analysis

Provides an easy way to report the results of ROC analysis, including: 1. an ROC curve. 2. the value of Cutoff, AUC (Area Under Curve), ACC (accuracy), SEN (sensitivity), SPE (specificity), PLR (positive likelihood ratio), NLR (negative likelihood ratio), PPV (positive predictive value), NPV (negative predictive value), PPA (percentage of positive accordance), NPA (percentage of negative accordance), TPA (percentage of total accordance), KAPPA (kappa value).

Maintained by Zhicheng Du. Last updated 3 years ago.

5.6 match 2.77 score 33 scripts 2 dependents
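Most of the quantities that reportROC reports at a given cutoff follow from a single 2x2 confusion matrix. The sketch below uses the standard textbook definitions in plain Python; it is not reportROC's code, the `Confusion` type is a hypothetical helper, and cutoff selection and the PPA/NPA/TPA measures are not covered.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    tp: int  # diseased, test positive
    fp: int  # healthy, test positive
    fn: int  # diseased, test negative
    tn: int  # healthy, test negative


def diagnostic_metrics(c: Confusion) -> dict:
    n = c.tp + c.fp + c.fn + c.tn
    sen = c.tp / (c.tp + c.fn)  # sensitivity (true positive rate)
    spe = c.tn / (c.tn + c.fp)  # specificity (true negative rate)
    fpr = c.fp / (c.fp + c.tn)  # 1 - specificity
    fnr = c.fn / (c.tp + c.fn)  # 1 - sensitivity
    p_obs = (c.tp + c.tn) / n   # observed agreement = accuracy
    # Chance agreement from the predicted and actual marginals (for Cohen's kappa).
    p_exp = ((c.tp + c.fp) * (c.tp + c.fn) + (c.fn + c.tn) * (c.fp + c.tn)) / n ** 2
    return {
        "ACC": p_obs,
        "SEN": sen,
        "SPE": spe,
        "PPV": c.tp / (c.tp + c.fp),
        "NPV": c.tn / (c.tn + c.fn),
        "PLR": sen / fpr if fpr > 0 else math.inf,
        "NLR": fnr / spe if spe > 0 else math.inf,
        "KAPPA": (p_obs - p_exp) / (1 - p_exp),
    }
```

With 40 true positives, 10 false positives, 10 false negatives and 40 true negatives, sensitivity and specificity are both 0.8, PLR is 4, NLR is 0.25 and kappa is 0.6.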

tesselle

kairos:Analysis of Chronological Patterns from Archaeological Count Data

A toolkit for absolute and relative dating and analysis of chronological patterns. This package includes functions for chronological modeling and dating of archaeological assemblages from count data. It provides methods for matrix seriation. It also allows users to compute time point estimates and density estimates of the occupation and duration of an archaeological site.

Maintained by Nicolas Frerebeau. Last updated 13 days ago.

chronology matrix-seriation archaeology archaeological-science

3.3 match 4.66 score 11 scripts 1 dependents

mwheymans

psfmi:Prediction Model Pooling, Selection and Performance Evaluation Across Multiply Imputed Datasets

Pooling, backward and forward selection of linear, logistic and Cox regression models in multiply imputed datasets. Backward and forward selection can be done from the pooled model using Rubin's Rules (RR), the D1, D2, D3, D4 and the median p-values method. This is also possible for mixed models. The models can contain continuous, dichotomous, categorical and restricted cubic spline predictors and interaction terms between all these types of predictors. The stability of the models can be evaluated using (cluster) bootstrapping. The package further contains functions to pool model performance measures such as ROC/AUC, Reclassification, R-squared, scaled Brier score, H&L test and calibration plots for logistic regression models. Internal validation can be done across multiply imputed datasets with cross-validation or bootstrapping. The adjusted intercept after shrinkage of pooled regression coefficients can be obtained. Backward and forward selection as part of internal validation is possible. A function to externally validate logistic prediction models in multiply imputed datasets is available, as is a function to compare models. For Cox models a strata variable can be included. Eekhout (2017) <doi:10.1186/s12874-017-0404-7>. Wiel (2009) <doi:10.1093/biostatistics/kxp011>. Marshall (2009) <doi:10.1186/1471-2288-9-57>.

Maintained by Martijn Heymans. Last updated 2 years ago.

cox-regression imputation imputed-datasets logistic multiple-imputation pool predictor regression selection spline spline-predictors

2.1 match 10 stars 7.17 score 70 scripts
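Rubin's Rules, which psfmi uses for pooling, combine one estimate and its sampling variance per imputed dataset into a single estimate and total variance. The sketch below covers only the scalar case (pooled mean, within- plus inflated between-imputation variance) and omits degrees of freedom and the D1 to D4 multi-parameter tests; it is a textbook illustration, not psfmi's API.

```python
from statistics import mean, variance
from typing import Sequence, Tuple


def pool_rubin(estimates: Sequence[float], variances: Sequence[float]) -> Tuple[float, float]:
    """Pool one scalar parameter over m imputed datasets with Rubin's Rules.

    estimates[i] and variances[i] are the point estimate and squared standard
    error from the model fitted to imputed dataset i.
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations with matching variances")
    q_bar = mean(estimates)        # pooled point estimate
    within = mean(variances)       # average within-imputation variance
    between = variance(estimates)  # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return q_bar, total
```

For estimates 1, 2 and 3, each with variance 0.5, the pooled estimate is 2 and the total variance is 0.5 + (4/3)(1.0), about 1.83, so the spread between imputations dominates the uncertainty.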

bioc

PDATK:Pancreatic Ductal Adenocarcinoma Tool-Kit

Pancreatic ductal adenocarcinoma (PDA) has a relatively poor prognosis and is one of the most lethal cancers. Molecular classification of gene expression profiles holds the potential to identify meaningful subtypes which can inform therapeutic strategy in the clinical setting. The Pancreatic Ductal Adenocarcinoma Tool-Kit (PDATK) provides an S4 class-based interface for performing unsupervised subtype discovery, cross-cohort meta-clustering, gene-expression-based classification, and subsequent survival analysis to identify prognostically useful subtypes in pancreatic cancer and beyond. Two novel methods, Consensus Subtypes in Pancreatic Cancer (CSPC) and Pancreatic Cancer Overall Survival Predictor (PCOSP) are included for consensus-based meta-clustering and overall-survival prediction, respectively. Additionally, four published subtype classifiers and three published prognostic gene signatures are included to allow users to easily recreate published results, apply existing classifiers to new data, and benchmark the relative performance of new methods. The use of existing Bioconductor classes as input to all PDATK classes and methods enables integration with existing Bioconductor datasets, including the 21 pancreatic cancer patient cohorts available in the MetaGxPancreas data package. PDATK has been used to replicate results from Sandhu et al (2019) [https://doi.org/10.1200/cci.18.00102] and an additional paper is in the works using CSPC to validate subtypes from the included published classifiers, both of which use the data available in MetaGxPancreas. The inclusion of subtype centroids and prognostic gene signatures from these and other publications will enable researchers and clinicians to classify novel patient gene expression data, allowing the direct clinical application of the classifiers included in PDATK. 
Overall, PDATK provides a rich set of tools to identify and validate useful prognostic and molecular subtypes based on gene-expression data, benchmark new classifiers against existing ones, and apply discovered classifiers on novel patient data to inform clinical decision making.

Maintained by Benjamin Haibe-Kains. Last updated 5 months ago.

geneexpression pharmacogenetics pharmacogenomics software classification survival clustering geneprediction

3.5 match 1 stars 4.31 score 17 scripts

modeloriented

survex:Explainable Machine Learning in Survival Analysis

Survival analysis models are commonly used in medicine and other areas. Many of them are too complex to be interpreted by humans. Exploration and explanation are needed, but standard methods do not give a broad enough picture. 'survex' provides easy-to-apply methods for explaining survival models, both complex black-boxes and simpler statistical models. They include methods specific to survival analysis such as SurvSHAP(t) introduced in Krzyzinski et al., (2023) <doi:10.1016/j.knosys.2022.110234>, SurvLIME described in Kovalev et al., (2020) <doi:10.1016/j.knosys.2020.106164> as well as extensions of existing ones described in Biecek et al., (2021) <doi:10.1201/9780429027192>.

Maintained by Mikołaj Spytek. Last updated 9 months ago.

biostatistics brier-scores censored-data cox-model cox-regression explainable-ai explainable-machine-learning explainable-ml explanatory-model-analysis interpretable-machine-learning interpretable-ml machine-learning probabilistic-machine-learning shap survival-analysis time-to-event variable-importance xai

1.8 match 110 stars 8.40 score 114 scripts

cran

randomUniformForest:Random Uniform Forests for Classification, Regression and Unsupervised Learning

Ensemble model, for classification, regression and unsupervised learning, based on a forest of unpruned and randomized binary decision trees. Each tree is grown by sampling, with replacement, a set of variables at each node. Each cut-point is generated randomly, according to the continuous Uniform distribution. For each tree, data are either bootstrapped or subsampled. The unsupervised mode introduces clustering, dimension reduction and variable importance, using a three-layer engine. Random Uniform Forests are mainly aimed at lowering correlation between trees (or tree residuals), providing a deep analysis of variable importance and allowing native distributed and incremental learning.

Maintained by Saip Ciss. Last updated 3 years ago.

cpp

4.0 match 3 stars 3.77 score 99 scripts

consbiol-unibern

SDMtune:Species Distribution Model Selection

User-friendly framework that enables the training and the evaluation of species distribution models (SDMs). The package implements functions for data driven variable selection and model tuning and includes numerous utilities to display the results. All the functions used to select variables or to tune model hyperparameters have an interactive real-time chart displayed in the 'RStudio' viewer pane during their execution.

Maintained by Sergio Vignali. Last updated 3 months ago.

hyperparameter-tuning species-distribution-modelling variable-selection cpp

2.0 match 25 stars 7.37 score 155 scripts

lau-mel

rocc:ROC Based Classification

Functions for a classification method based on receiver operating characteristics (ROC). Briefly, features are selected according to their ranked AUC value in the training set. The selected features are merged by the mean value to form a meta-gene. The samples are ranked by their meta-gene value and the meta-gene threshold that has the highest accuracy in splitting the training samples is determined. A new sample is classified by its meta-gene value relative to the threshold. The package is primarily aimed at two-class problems in gene expression data, but may also apply to other problems.

Maintained by Martin Lauss. Last updated 5 years ago.

9.3 match 1.56 score 36 scripts
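The rocc procedure described above (rank features by training AUC, average the top ones into a meta-gene, choose the most accurate threshold) can be sketched in plain Python as follows. This is an illustration of the described steps, not rocc's code. The function names and the number of selected features `k` are hypothetical. Features are assumed to be oriented so that higher values indicate class 1 (features with AUC below 0.5 are not flipped), and samples with meta-gene value at or above the threshold are assigned to class 1.

```python
from statistics import mean
from typing import List, Sequence, Tuple


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Mann-Whitney AUC: share of (class 1, class 0) pairs ranked correctly; ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def fit_rocc(X: Sequence[Sequence[float]], y: Sequence[int], k: int) -> Tuple[List[int], float]:
    n_features = len(X[0])
    # 1. Rank features by their training-set AUC and keep the top k.
    aucs = [auc([row[j] for row in X], y) for j in range(n_features)]
    selected = sorted(range(n_features), key=lambda j: aucs[j], reverse=True)[:k]
    # 2. Merge the selected features into a meta-gene by averaging them per sample.
    meta = [mean(row[j] for j in selected) for row in X]
    # 3. Keep the threshold with the highest training accuracy (meta >= t -> class 1).
    best_t, best_acc = meta[0], -1.0
    for t in sorted(set(meta)):
        acc = mean(1.0 if (m >= t) == (label == 1) else 0.0 for m, label in zip(meta, y))
        if acc > best_acc:
            best_t, best_acc = t, acc
    return selected, best_t


def predict_rocc(model: Tuple[List[int], float], x: Sequence[float]) -> int:
    """Classify a new sample by its meta-gene value relative to the threshold."""
    selected, threshold = model
    return 1 if mean(x[j] for j in selected) >= threshold else 0
```

Because only the ranking and averaging steps touch the features, the fitted model reduces to a list of feature indices plus one number, which keeps the classifier easy to inspect.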

yangfengstat

nproc:Neyman-Pearson (NP) Classification Algorithms and NP Receiver Operating Characteristic (NP-ROC) Curves

In many binary classification applications, such as disease diagnosis and spam detection, practitioners commonly face the need to limit type I error (i.e., the conditional probability of misclassifying a class 0 observation as class 1) so that it remains below a desired threshold. To address this need, the Neyman-Pearson (NP) classification paradigm is a natural choice; it minimizes type II error (i.e., the conditional probability of misclassifying a class 1 observation as class 0) while enforcing an upper bound, alpha, on the type I error. Although the NP paradigm has a century-long history in hypothesis testing, it has not been well recognized and implemented in classification schemes. Common practices that directly limit the empirical type I error to no more than alpha do not satisfy the type I error control objective because the resulting classifiers are still likely to have type I errors much larger than alpha. As a result, the NP paradigm has not been properly implemented for many classification scenarios in practice. In this work, we develop the first umbrella algorithm that implements the NP paradigm for all scoring-type classification methods, including popular methods such as logistic regression, support vector machines and random forests. Powered by this umbrella algorithm, we propose a novel graphical tool for NP classification methods: NP receiver operating characteristic (NP-ROC) bands, motivated by the popular receiver operating characteristic (ROC) curves. NP-ROC bands help users choose among and compare different NP classifiers in a data-adaptive way.

Maintained by Yang Feng. Last updated 5 years ago.

6.4 match 2.23 score 17 scripts
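The core of an NP umbrella algorithm, as described for nproc, is choosing a score threshold from held-out class 0 scores so that the probability of the true type I error exceeding alpha stays small. The sketch below follows the order-statistic rule as we understand it from the NP umbrella algorithm literature (Tong, Feng and Li): threshold at the k-th smallest of n held-out class 0 scores, with k the smallest index whose binomial tail bound is at most a tolerance `delta`. The name `delta`, its default, and the helper names are assumptions not stated in the entry, and nproc's actual implementation (sample splitting, ensembling) may differ.

```python
from math import comb
from typing import Sequence


def violation_bound(n: int, k: int, alpha: float) -> float:
    """Upper bound on P(type I error > alpha) when thresholding at the k-th
    smallest of n held-out class 0 scores (a binomial tail probability)."""
    return sum(comb(n, j) * (1 - alpha) ** j * alpha ** (n - j) for j in range(k, n + 1))


def np_order_index(n: int, alpha: float, delta: float) -> int:
    """Smallest k whose violation bound is at most delta (lowest threshold, so lowest type II error)."""
    for k in range(1, n + 1):
        if violation_bound(n, k, alpha) <= delta:
            return k
    raise ValueError(f"{n} class-0 scores are too few for alpha={alpha}, delta={delta}")


def np_threshold(class0_scores: Sequence[float], alpha: float = 0.05, delta: float = 0.05) -> float:
    """Score threshold; a new observation is classified as class 1 when its score exceeds it."""
    scores = sorted(class0_scores)
    return scores[np_order_index(len(scores), alpha, delta) - 1]
```

The bound shrinks as k grows, so the search returns the least conservative admissible threshold. With alpha = delta = 0.05 the rule needs at least 59 held-out class 0 scores, because 0.95^59 is about 0.049 while 0.95^58 is about 0.051.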

tomasfryda

h2o:R Interface for the 'H2O' Scalable Machine Learning Platform

R interface for 'H2O', the scalable open source machine learning platform that offers parallelized implementations of many supervised and unsupervised machine learning algorithms such as Generalized Linear Models (GLM), Gradient Boosting Machines (including XGBoost), Random Forests, Deep Neural Networks (Deep Learning), Stacked Ensembles, Naive Bayes, Generalized Additive Models (GAM), ANOVA GLM, Cox Proportional Hazards, K-Means, PCA, ModelSelection, Word2Vec, as well as a fully automatic machine learning algorithm (H2O AutoML).

Maintained by Tomas Fryda. Last updated 1 year ago.

1.8 match 3 stars 8.20 score 7.8k scripts 11 dependents

mathijsdeen

MDMA:Mathijs Deen's Miscellaneous Auxiliaries

Provides a variety of functions useful for data analysis, selection, manipulation, and graphics.

Maintained by Mathijs Deen. Last updated 11 months ago.

5.3 match 2.70 score

cran

VUROCS:Volume under the ROC Surface for Multi-Class ROC Analysis

Calculates the volume under the ROC surface and its (co)variance for ordered multi-class ROC analysis as well as certain bivariate ordinal measures of association.

Maintained by Hannes Kazianka. Last updated 5 years ago.

cpp

14.2 match 1.00 score 2 scripts

desanou

mglasso:Multiscale Graphical Lasso

Inference of multiscale graphical models with a neighborhood selection approach. The method is based on solving a convex optimization problem combining Lasso and fused-group Lasso penalties. This allows simultaneous inference of a conditional independence graph and a clustering partition. The optimization is based on the Continuation with Nesterov smoothing in a Shrinkage-Thresholding Algorithm solver (Hadj-Selem et al. 2018) <doi:10.1109/TMI.2018.2829802> implemented in Python.

Maintained by Edmond Sanou. Last updated 2 years ago.

3.5 match 2 stars 4.11 score 13 scripts

lozalojo

mem:The Moving Epidemic Method

The Moving Epidemic Method, created by T Vega and JE Lozano (2012, 2015) <doi:10.1111/j.1750-2659.2012.00422.x>, <doi:10.1111/irv.12330>, allows weekly assessment of epidemic and intensity status to help with routine surveillance of respiratory infections in health systems. It allows comparison of different epidemic indicators, timing and shape with past epidemics and across regions or countries with different surveillance systems. It also gives a measure of the method's performance in terms of the sensitivity and specificity of the alert week.

Maintained by Jose E. Lozano. Last updated 2 years ago.

influenza mem

2.3 match 14 stars 6.24 score 82 scripts 1 dependent

ddimmery

tidyhte:Tidy Estimation of Heterogeneous Treatment Effects

Estimates heterogeneous treatment effects using tidy semantics on experimental or observational data. Methods are based on the doubly-robust learner of Kennedy (n.d.) <arXiv:2004.14497>. You provide a simple recipe for what machine learning algorithms to use in estimating the nuisance functions and 'tidyhte' will take care of cross-validation, estimation, model selection, diagnostics and construction of relevant quantities of interest about the variability of treatment effects.

Maintained by Drew Dimmery. Last updated 2 years ago.

2.5 match 14 stars 5.36 score 11 scripts

cran

rocbc:Statistical Inference for Box-Cox Based Receiver Operating Characteristic Curves

Generation of Box-Cox based ROC curves and several aspects of inferences and hypothesis testing. Can be used when inferences for one biomarker (Bantis LE, Nakas CT, Reiser B. (2018)<doi:10.1002/bimj.201700107>) are of interest or when comparisons of two correlated biomarkers (Bantis LE, Nakas CT, Reiser B. (2021)<doi:10.1002/bimj.202000128>) are of interest. Provides inferences and comparisons around the AUC, the Youden index, the sensitivity at a given specificity level (and vice versa), the optimal operating point of the ROC curve (in the Youden sense), and the Youden based cutoff.

Maintained by Benjamin Brewer. Last updated 11 months ago.

5.8 match 2.30 score

cran

longROC:Time-Dependent Prognostic Accuracy with Multiply Evaluated Bio Markers or Scores

Time-dependent Receiver Operating Characteristic curves, Area Under the Curve, and Net Reclassification Indexes for repeated measures. It is based on methods in Barbati and Farcomeni (2017) <doi:10.1007/s10260-017-0410-2>.

Maintained by Alessio Farcomeni. Last updated 7 years ago.

12.3 match 1.08 score 12 scripts

colintredoux

r4lineups:Statistical Inference on Lineup Fairness

Since the early 1970s eyewitness testimony researchers have recognised the importance of estimating properties such as lineup bias (is the lineup biased against the suspect, leading to a rate of choosing higher than one would expect by chance?), and lineup size (how many reasonable choices are in fact available to the witness? A lineup is supposed to consist of a suspect and a number of additional members, or foils, whom a poor-quality witness might mistake for the perpetrator). Lineup measures are descriptive, in the first instance, but since the earliest articles in the literature researchers have recognised the importance of reasoning inferentially about them. This package contains functions to compute various properties of laboratory or police lineups, and is intended for use by researchers in forensic psychology and/or eyewitness testimony research. Among others, the r4lineups package includes functions for calculating lineup proportion, functional size, various estimates of effective size, diagnosticity ratio, homogeneity of the diagnosticity ratio, ROC curves for confidence x accuracy data and the degree of similarity of faces in a lineup.

Maintained by Colin Tredoux. Last updated 7 years ago.

5.1 match 2.58 score 38 scripts

bioc

kebabs:Kernel-Based Analysis of Biological Sequences

The package provides functionality for kernel-based analysis of DNA, RNA, and amino acid sequences via SVM-based methods. As core functionality, kebabs implements the following sequence kernels: spectrum kernel, mismatch kernel, gappy pair kernel, and motif kernel. Apart from an efficient implementation of standard position-independent functionality, the kernels are extended in a novel way to take the position of patterns into account for the similarity measure. Because of the flexibility of the kernel formulation, other kernels like the weighted degree kernel or the shifted weighted degree kernel with constant weighting of positions are included as special cases. An annotation-specific variant of the kernels uses annotation information placed along the sequence together with the patterns in the sequence. The package allows for the generation of a kernel matrix or an explicit feature representation in dense or sparse format for all available kernels which can be used with methods implemented in other R packages. With focus on SVM-based methods, kebabs provides a framework which simplifies the usage of existing SVM implementations in kernlab, e1071, and LiblineaR. Binary and multi-class classification as well as regression tasks can be used in a unified way without having to deal with the different functions, parameters, and formats of the selected SVM. As support for choosing hyperparameters, the package provides cross validation - including grouped cross validation, grid search and model selection functions. For easier biological interpretation of the results, the package computes feature weights for all SVMs and prediction profiles which show the contribution of individual sequence positions to the prediction result and indicate the relevance of sequence sections for the learning result and the underlying biological functions.

Maintained by Ulrich Bodenhofer. Last updated 5 months ago.

supportvectormachine classification clustering regression cpp

2.0 match 6.58 score 47 scripts 3 dependents

ludovikcoba

rrecsys:Environment for Evaluating Recommender Systems

Processes standard recommendation datasets (e.g., a user-item rating matrix) as input and generates rating predictions and lists of recommended items. The standard algorithm implementations included in this package are: Global/Item/User-Average baselines, Weighted Slope One, Item-Based KNN, User-Based KNN, FunkSVD, BPR and weighted ALS. They can be assessed according to the standard offline evaluation methodology (Shani et al. (2011) <doi:10.1007/978-0-387-85820-3_8>) for recommender systems using measures such as MAE, RMSE, Precision, Recall, F1, AUC, NDCG, RankScore and coverage measures. The package (Coba et al. (2017) <doi:10.1007/978-3-319-60042-0_36>) is intended for rapid prototyping of recommendation algorithms and for educational purposes.

Maintained by Ludovik Çoba. Last updated 3 years ago.

cpp

1.9 match 23 stars 6.84 score 25 scripts

bioc

cardelino:Clone Identification from Single Cell Data

Methods to infer clonal tree configuration for a population of cells using single-cell RNA-seq data (scRNA-seq), and possibly other data modalities. Methods are also provided to assign cells to inferred clones and explore differences in gene expression between clones. These methods can flexibly integrate information from imperfect clonal trees inferred based on bulk exome-seq data, and sparse variant alleles expressed in scRNA-seq data. A flexible beta-binomial error model that accounts for stochastic dropout events as well as systematic allelic imbalance is used.

Maintained by Davis McCarthy. Last updated 5 months ago.

singlecell rnaseq visualization transcriptomics geneexpression sequencing software exomeseq clonal-clustering gibbs-sampling scrna-seq single-cell somatic-mutations

1.8 match 61 stars 7.05 score 62 scripts

mjuraska

CoRpower:Power Calculations for Assessing Correlates of Risk in Clinical Efficacy Trials

Calculates power for assessment of intermediate biomarker responses as correlates of risk in the active treatment group in clinical efficacy trials, as described in Gilbert, Janes, and Huang, Power/Sample Size Calculations for Assessing Correlates of Risk in Clinical Efficacy Trials (2016, Statistics in Medicine). The methods differ from past approaches by accounting for the level of clinical treatment efficacy overall and in biomarker response subgroups, which enables the correlates of risk results to be interpreted in terms of potential correlates of efficacy/protection. The methods also account for inter-individual variability of the observed biomarker response that is not biologically relevant (e.g., due to technical measurement error of the laboratory assay used to measure the biomarker response), which is important because power to detect a specified correlate of risk effect size is heavily affected by the biomarker's measurement error. The methods can be used for a general binary clinical endpoint model with a univariate dichotomous, trichotomous, or continuous biomarker response measured in active treatment recipients at a fixed timepoint after randomization, with either case-cohort Bernoulli sampling or case-control without-replacement sampling of the biomarker (a baseline biomarker is handled as a trivial special case). In a specified two-group trial design, the computeN() function can initially be used for calculating additional requisite design parameters pertaining to the target population of active treatment recipients observed to be at risk at the biomarker sampling timepoint. Subsequently, the power calculation employs an inverse probability weighted logistic regression model fitted by the tps() function in the 'osDesign' package. Power results as well as the relationship between the correlate of risk effect size and treatment efficacy can be visualized using various plotting functions. 
To link power calculations for detecting a correlate of risk and a correlate of treatment efficacy, a baseline immunogenicity predictor (BIP) can be simulated according to a specified classification rule (for dichotomous or trichotomous BIPs) or correlation with the biomarker response (for continuous BIPs), then outputted along with biomarker response data under assignment to treatment, and clinical endpoint data for both treatment and placebo groups.

Maintained by Michal Juraska. Last updated 4 years ago.

3.0 match 4.15 score 14 scripts

bioc

wateRmelon:Illumina DNA methylation array normalization and metrics

15 flavours of betas and three performance metrics, with methods for objects produced by methylumi and minfi packages.

Maintained by Leo C Schalkwyk. Last updated 4 months ago.

dnamethylation microarray twochannel preprocessing qualitycontrol

1.6 match 7.75 score 247 scripts 2 dependents

pwwang

plotthis:High-Level Plotting Built Upon 'ggplot2' and Other Plotting Packages

Provides high-level API and a wide range of options to create stunning, publication-quality plots effortlessly. It is built upon 'ggplot2' and other plotting packages, and is designed to be easy to use and to work seamlessly with 'ggplot2' objects. It is particularly useful for creating complex plots with multiple layers, facets, and annotations. It also provides a set of functions to create plots for specific types of data, such as Venn diagrams, alluvial diagrams, and phylogenetic trees. The package is designed to be flexible and customizable, and to work well with the 'ggplot2' ecosystem. The API can be found at <https://pwwang.github.io/plotthis/reference/index.html>.

Maintained by Panwen Wang. Last updated 13 hours ago.

ggplot2 plotting single-cell

2.3 match 36 stars 5.51 score 2 scripts

tdhock

penaltyLearning:Penalty Learning

Implementations of algorithms from Learning Sparse Penalties for Change-point Detection using Max Margin Interval Regression, by Hocking, Rigaill, Vert, Bach <http://proceedings.mlr.press/v28/hocking13.html> published in proceedings of ICML2013.

Maintained by Toby Dylan Hocking. Last updated 6 months ago.

cpp

2.0 match 16 stars 6.13 score 129 scripts 2 dependents

drizopoulos

JMbayes:Joint Modeling of Longitudinal and Time-to-Event Data under a Bayesian Approach

Shared parameter models for the joint modeling of longitudinal and time-to-event data using MCMC; Dimitris Rizopoulos (2016) <doi:10.18637/jss.v072.i07>.

Maintained by Dimitris Rizopoulos. Last updated 4 years ago.

joint-models longitudinal-responses prediction-model survival-analysis openblas cpp openmp jags

1.8 match 60 stars 6.98 score 80 scripts

maizhou

emplik:Empirical Likelihood Ratio for Censored/Truncated Data

Empirical likelihood ratio tests and confidence intervals for means/quantiles/hazards from possibly censored and/or truncated data. In particular, the empirical likelihood for the Kaplan-Meier/Nelson-Aalen estimator. Now does AFT regression too.

Maintained by Mai Zhou. Last updated 3 months ago.

3.6 match 3.37 score 39 scripts 13 dependents

bioc

tidytof:Analyze High-dimensional Cytometry Data Using Tidy Data Principles

This package implements an interactive, scientific analysis pipeline for high-dimensional cytometry data built using tidy data principles. It is specifically designed to play well with both the tidyverse and Bioconductor software ecosystems, with functionality for reading/writing data files, data cleaning, preprocessing, clustering, visualization, modeling, and other quality-of-life functions. tidytof implements a "grammar" of high-dimensional cytometry data analysis.

Maintained by Timothy Keyes. Last updated 5 months ago.

singlecell flowcytometry bioinformatics cytometry data-science single-cell tidyverse cpp

1.7 match 19 stars 7.26 score 35 scripts

jonasbhend

easyVerification:Ensemble Forecast Verification for Large Data Sets

Set of tools to simplify application of atomic forecast verification metrics for (comparative) verification of ensemble forecasts to large data sets. The forecast metrics are imported from the 'SpecsVerification' package, and additional forecast metrics are provided with this package. Alternatively, new user-defined forecast scores can be implemented using the example scores provided and applied using the functionality of this package.

Maintained by Jonas Bhend. Last updated 2 years ago.

cpp

2.0 match 1 star 6.04 score 61 scripts 4 dependents

echasnovski

pdqr:Work with Custom Distribution Functions

Create, transform, and summarize custom random variables with distribution functions (analogues of 'p*()', 'd*()', 'q*()', and 'r*()' functions from base R). Two types of distributions are supported: "discrete" (the random variable has a finite number of output values) and "continuous" (an infinite number of values in the form of a continuous random variable). Functions for distribution transformations and summaries are available. Implemented approaches often emphasize approximate and numerical solutions: all distributions assume finite support and finite values of the density function; some methods are implemented with simulation techniques.

Maintained by Evgeni Chasnovski. Last updated 2 years ago.

1.9 match 15 stars 6.37 score 26 scripts 1 dependent

bioc

PathoStat:PathoStat Statistical Microbiome Analysis Package

The purpose of this package is to perform Statistical Microbiome Analysis on metagenomics results from sequencing data samples. In particular, it supports analyses on the PathoScope generated report files. PathoStat provides various functionalities including Relative Abundance charts, Diversity estimates and plots, tests of Differential Abundance, Time Series visualization, and Core OTU analysis.

Maintained by Solaiappan Manimaran. Last updated 5 months ago.

microbiome metagenomics graphandnetwork microarray patternlogic principalcomponent sequencing software visualization rnaseq immunooncology

2.0 match 8 stars 5.90 score 8 scripts

bblodfon

usefun:A Collection of Useful Functions by John

A set of general functions that I have used in various projects and other R packages. Miscellaneous operations on data frames, matrices and vectors, ROC and PR statistics.

Maintained by John Zobolas. Last updated 6 months ago.

functions

2.5 match 4 stars 4.61 score 102 scripts

modeloriented

fairmodels:Flexible Tool for Bias Detection, Visualization, and Mitigation

Measure fairness metrics in one place for many models. Check how large a model's bias is toward different races, sexes, nationalities, etc. Use measures such as Statistical Parity and Equal odds to detect discrimination against unprivileged groups. Visualize the bias using a heatmap, radar plot, biplot, bar chart (and more!). Various pre-processing and post-processing bias mitigation algorithms are implemented. The package also supports calculating fairness metrics for regression models. Find more details in Wiśniewski and Biecek (2021) <arXiv:2104.00507>.

Maintained by Jakub Wiśniewski. Last updated 1 month ago.

explain-classifiers explainable-ml fairness fairness-comparison fairness-ml model-evaluation

1.5 match 86 stars 7.72 score 51 scripts 1 dependent

bioc

metaseqR2:An R package for the analysis and result reporting of RNA-Seq data by combining multiple statistical algorithms

Provides an interface to several normalization and statistical testing packages for RNA-Seq gene expression data. Additionally, it creates several diagnostic plots, performs meta-analysis by combining the results of several statistical tests and reports the results in an interactive way.

Maintained by Panagiotis Moulos. Last updated 5 days ago.

software geneexpression differentialexpression workflowstep preprocessing qualitycontrol normalization reportwriting rnaseq transcription sequencing transcriptomics bayesian clustering cellbiology biomedicalinformatics functionalgenomics systemsbiology immunooncology alternativesplicing differentialsplicing multiplecomparison timecourse dataimport atacseq epigenetics regression proprietaryplatforms genesetenrichment batcheffect chipseq

1.9 match 7 stars 6.05 score 3 scripts

caranathunge

promor:Proteomics Data Analysis and Modeling Tools

A comprehensive, user-friendly package for label-free proteomics data analysis and machine learning-based modeling. Data generated from 'MaxQuant' can be easily used to conduct differential expression analysis, build predictive models with top protein candidates, and assess model performance. promor includes a suite of tools for quality control, visualization, missing data imputation (Lazar et al. (2016) <doi:10.1021/acs.jproteome.5b00981>), differential expression analysis (Ritchie et al. (2015) <doi:10.1093/nar/gkv007>), and machine learning-based modeling (Kuhn (2008) <doi:10.18637/jss.v028.i05>).

Maintained by Chathurani Ranathunge. Last updated 2 years ago.

biomarkers differential-expression lfq machine-learning mass-spectrometry modeling proteomics

2.3 match 15 stars 5.02 score 14 scripts

bioc

omicsViewer:Interactive and explorative visualization of SummarizedExperssionSet or ExpressionSet using omicsViewer

omicsViewer visualizes ExpressionSet (or SummarizedExperiment) objects interactively. omicsViewer has separate back- and front-ends. In the back-end, users prepare an ExpressionSet that contains all the information needed for downstream data interpretation. Some extra requirements are imposed on the headers of the phenotype data or feature data so that the front-end can clearly recognize the provided information, while keeping modifications to the existing ExpressionSet object to a minimum. Relying purely on R/Bioconductor guarantees maximum flexibility in the statistical analysis in the back-end. Once the ExpressionSet is prepared, it can be visualized using the front-end, implemented with shiny and plotly. Both features and samples can be selected from (data) tables or graphs (scatter plot/heatmap). Different types of analyses, such as enrichment analysis (using the Bioconductor package fgsea or Fisher's exact test) and STRING network analysis, are performed on the fly and the results are visualized simultaneously. When a subset of samples and a phenotype variable are selected, a significance test on means (t-test or rank-based test; when the phenotype variable is quantitative) or a test of independence (chi-square or Fisher's exact test; when the phenotype data are categorical) is performed to test the association between the phenotype of interest and the selected samples. Additionally, other analyses can easily be added as extra shiny modules. omicsViewer therefore greatly facilitates data exploration: many different hypotheses can be explored in a short time without knowledge of R. In addition, the resulting data can easily be shared using a shiny server. Alternatively, a standalone version of omicsViewer bundled with designated omics data can easily be created by integrating it with portable R, which can be shared with collaborators or submitted as supplementary data with a manuscript.

Maintained by Chen Meng. Last updated 2 months ago.

software visualization genesetenrichment differentialexpression motifdiscovery network networkenrichment

1.9 match 4 stars 6.02 score 22 scripts

suman762

PredictABEL:Assessment of Risk Prediction Models

We included functions to assess the performance of risk models. The package contains functions for the various measures that are used in empirical studies, including univariate and multivariate odds ratios (OR) of the predictors, the c-statistic (or area under the receiver operating characteristic (ROC) curve (AUC)), Hosmer-Lemeshow goodness of fit test, reclassification table, net reclassification improvement (NRI) and integrated discrimination improvement (IDI). Also included are functions to create plots, such as risk distributions, ROC curves, calibration plot, discrimination box plot and predictiveness curves. In addition to functions to assess the performance of risk models, the package includes functions to obtain weighted and unweighted risk scores as well as predicted risks using logistic regression analysis. These logistic regression functions are specifically written for models that include genetic variables, but they can also be applied to models that are based on non-genetic risk factors only. Finally, the package includes a function to construct a simulated dataset with genotypes, genetic risks, and disease status for a hypothetical population, which is used for the evaluation of genetic risk models.

Maintained by Suman Kundu. Last updated 5 years ago.

3.4 match 2 stars 3.33 score 91 scripts

mdbrown

rmda:Risk Model Decision Analysis

Provides tools to evaluate the value of using a risk prediction instrument to decide treatment or intervention (versus no treatment or intervention). Given one or more risk prediction instruments (risk models) that estimate the probability of a binary outcome, rmda provides functions to estimate and display decision curves and other figures that help assess the population impact of using a risk model for clinical decision making. Here, "population" refers to the relevant patient population. Decision curves display estimates of the (standardized) net benefit over a range of probability thresholds used to categorize observations as 'high risk'. The curves help evaluate a treatment policy that recommends treatment for patients who are estimated to be 'high risk' by comparing the population impact of a risk-based policy to "treat all" and "treat none" intervention policies. Curves can be estimated using data from a prospective cohort. In addition, rmda can estimate decision curves using data from a case-control study if an estimate of the population outcome prevalence is available. Version 1.4 of the package provides an alternative framing of the decision problem for situations where treatment is the standard-of-care and a risk model might be used to recommend that low-risk patients (i.e., patients below some risk threshold) opt out of treatment. Confidence intervals calculated using the bootstrap can be computed and displayed. A wrapper function to calculate cross-validated curves using k-fold cross-validation is also provided.

Maintained by Marshall Brown. Last updated 6 years ago.

1.7 match 28 stars 6.56 score 96 scripts

bioc

ClassifyR:A framework for cross-validated classification problems, with applications to differential variability and differential distribution testing

The software formalises a framework for classification and survival model evaluation in R. There are four stages: data transformation, feature selection, model training, and prediction. The requirements of variable types and variable order are fixed, but specialised variables for functions can also be provided. The framework is wrapped in a driver loop that reproducibly carries out a number of cross-validation schemes. Functions for differential mean, differential variability, and differential distribution are included. Additional functions may be developed by the user, by creating an interface to the framework.

Maintained by Dario Strbenac. Last updated 7 days ago.

classification survival cpp

1.3 match 5 stars 8.36 score 45 scripts 3 dependents

cran

intcensROC:AUC Estimation of Interval Censored Survival Data

The kernel of this 'Rcpp' based package is an efficient implementation of the generalized gradient projection method for spline function based constrained maximum likelihood estimator for interval censored survival data (Wu, Yuan; Zhang, Ying. Partially monotone tensor spline estimation of the joint distribution function with bivariate current status data. Ann. Statist. 40, 2012, 1609-1636 <doi:10.1214/12-AOS1016>). The key function computes the density function of the joint distribution of event time and the marker and returns the receiver operating characteristic (ROC) curve for the interval censored survival data as well as area under the curve (AUC).

Maintained by Yuan Wu. Last updated 4 years ago.

cpp

5.5 match 2.00 score

donadelnal

RQdeltaCT:Relative Quantification of Gene Expression using Delta Ct Methods

The commonly used methods for relative quantification of gene expression levels obtained in real-time PCR (Polymerase Chain Reaction) experiments are the delta Ct methods, encompassing 2^-dCt and 2^-ddCt methods, originally proposed by Kenneth J. Livak and Thomas D. Schmittgen (2001) <doi:10.1006/meth.2001.1262>. The main idea is to normalise gene expression values using an endogenous control gene, present gene expression levels in linear form by using the 2^-(value)^ transformation, and calculate differences in gene expression levels between groups of samples (or technical replicates of a single sample). The 'RQdeltaCT' package offers functions that cover both methods for comparison of either independent groups of samples or groups with paired samples, together with importing expression datasets, performing multi-step quality control of data, enabling numerous data visualisations, enriching the standard workflow with additional useful analyses (correlation analysis, Receiver Operating Characteristic analysis, logistic regression), and conveniently exporting the obtained results in table and image formats. The package has been designed to be friendly to non-experts in R programming.

Maintained by Daniel Zalewski. Last updated 1 month ago.

2.3 match 4.70 score 4 scripts
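The delta Ct arithmetic described for RQdeltaCT can be sketched in a few lines of Python. This is an illustrative sketch, not the package's interface (the function names are hypothetical), and it assumes roughly 100% amplification efficiency, so that each PCR cycle doubles the product; that is the assumption behind the 2^-dCt and 2^-ddCt formulas.

```python
def delta_ct(ct_target: float, ct_reference: float) -> float:
    """dCt: target-gene Ct normalised to the endogenous control gene."""
    return ct_target - ct_reference


def relative_expression(ct_target: float, ct_reference: float) -> float:
    """2^-dCt: expression in linear form, relative to the control gene."""
    return 2.0 ** -delta_ct(ct_target, ct_reference)


def fold_change(sample: tuple[float, float], calibrator: tuple[float, float]) -> float:
    """2^-ddCt; each argument is (ct_target, ct_reference), ddCt = dCt_sample - dCt_calibrator."""
    ddct = delta_ct(*sample) - delta_ct(*calibrator)
    return 2.0 ** -ddct


# Sample: target Ct 22, control-gene Ct 18. Calibrator: target Ct 25, control-gene Ct 18.
print(relative_expression(22.0, 18.0))           # 0.0625
print(fold_change((22.0, 18.0), (25.0, 18.0)))   # 8.0
```

A lower Ct means more starting template, so in this example the sample expresses the target gene about eight times more than the calibrator.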

loelschlaeger

RprobitB:Bayesian Probit Choice Modeling

Bayes estimation of probit choice models, both in the cross-sectional and panel setting. The package can analyze binary, multivariate, ordered, and ranked choices, as well as heterogeneity of choice behavior among deciders. The main functionality includes model fitting via Markov chain Monte Carlo methods, tools for convergence diagnostic, choice data simulation, in-sample and out-of-sample choice prediction, and model selection using information criteria and Bayes factors. The latent class model extension facilitates preference-based decider classification, where the number of latent classes can be inferred via the Dirichlet process or a weight-based updating heuristic. This allows for flexible modeling of choice behavior without the need to impose structural constraints. For a reference on the method see Oelschlaeger and Bauer (2021) <https://trid.trb.org/view/1759753>.

Maintained by Lennart Oelschläger. Last updated 5 months ago.

bayes discrete-choice probit openblas cpp openmp

2.0 match 4 stars 5.45 score 1 script

blasbenito

spatialRF:Easy Spatial Modeling with Random Forest

Automatic generation and selection of spatial predictors for spatial regression with Random Forest. Spatial predictors are surrogates of variables driving the spatial structure of a response variable. The package offers two methods to generate spatial predictors from a distance matrix among training cases: 1) Moran's Eigenvector Maps (MEMs; Dray, Legendre, and Peres-Neto 2006 <DOI:10.1016/j.ecolmodel.2006.02.015>): computed as the eigenvectors of a weighted matrix of distances; 2) RFsp (Hengl et al. <DOI:10.7717/peerj.5518>): columns of the distance matrix used as spatial predictors. Spatial predictors help minimize the spatial autocorrelation of the model residuals and facilitate an honest assessment of the importance scores of the non-spatial predictors. Additionally, functions to reduce multicollinearity, identify relevant variable interactions, tune random forest hyperparameters, assess model transferability via spatial cross-validation, and explore model results via partial dependence curves and interaction surfaces are included in the package. The modelling functions are built around the highly efficient 'ranger' package (Wright and Ziegler 2017 <DOI:10.18637/jss.v077.i01>).

Maintained by Blas M. Benito. Last updated 3 years ago.

random-forest spatial-analysis spatial-regression

2.0 match 114 stars 5.45 score 49 scripts