Showing 200 of 319 total results
pROC:Display and Analyze ROC Curves
Tools for visualizing, smoothing and comparing receiver operating characteristic (ROC) curves. (Partial) area under the curve (AUC) can be compared with statistical tests based on U-statistics or bootstrap. Confidence intervals can be computed for (p)AUC or ROC curves.
Maintained by Xavier Robin. Last updated 4 months ago.
bootstrapping, covariance, hypothesis-testing, machine-learning, plot, plotting, roc, roc-curve, variance, cpp
66.5 match 125 stars 15.18 score 16k scripts 445 dependents
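A minimal pROC sketch (hedged: it uses the aSAH example data that ships with pROC; substitute your own outcome and marker columns):
  library(pROC)
  data(aSAH)                                  # example data bundled with pROC
  r <- roc(aSAH$outcome, aSAH$s100b)          # build the ROC curve
  auc(r)                                      # area under the curve
  ci.auc(r, method = "bootstrap")             # bootstrap confidence interval
  plot(r, print.auc = TRUE)                   # plot with the AUC printed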
ROC:utilities for ROC, with microarray focus
Provide utilities for ROC, with microarray focus.
Maintained by Vince Carey. Last updated 5 months ago.
71.0 match 6.97 score 70 scripts 8 dependents
plotROC:Generate Useful ROC Curve Charts for Print and Interactive Use
Most ROC curve plots obscure the cutoff values and inhibit interpretation and comparison of multiple curves. This attempts to address those shortcomings by providing plotting and interactive tools. Functions are provided to generate an interactive ROC curve plot for web use, and print versions. A Shiny application implementing the functions is also included.
Maintained by Michael C. Sachs. Last updated 4 months ago.
36.5 match 87 stars 10.93 score 932 scripts 7 dependents
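A hedged plotROC sketch with simulated data (the column names D and M are made up for illustration):
  library(ggplot2)
  library(plotROC)
  set.seed(1)
  df <- data.frame(D = rbinom(200, 1, 0.5))
  df$M <- rnorm(200, mean = df$D)             # marker correlated with the outcome
  p <- ggplot(df, aes(d = D, m = M)) +
    geom_roc(n.cuts = 10) +                   # ROC curve with labelled cutoffs
    style_roc()
  p
  calc_auc(p)                                 # AUC of the plotted curve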
RJafroc:Artificial Intelligence Systems and Observer Performance
Analyzing the performance of artificial intelligence (AI) systems/algorithms characterized by a 'search-and-report' strategy. Historically, observer performance has dealt with measuring radiologists' performances in search tasks, e.g., searching for lesions in medical images and reporting them, but the implicit location information has been ignored. The implemented methods apply to analyzing the absolute and relative performances of AI systems, comparing AI performance to a group of human readers or optimizing the reporting threshold of an AI system. In addition to performing historical receiver operating characteristic (ROC) analysis (localization information ignored), the software also performs free-response receiver operating characteristic (FROC) analysis, where lesion localization information is used. A book using the software has been published: Chakraborty DP: Observer Performance Methods for Diagnostic Imaging - Foundations, Modeling, and Applications with R-Based Examples, Taylor-Francis LLC; 2017: <https://www.routledge.com/Observer-Performance-Methods-for-Diagnostic-Imaging-Foundations-Modeling/Chakraborty/p/book/9781482214840>. Online updates to this book, which use the software, are at <https://dpc10ster.github.io/RJafrocQuickStart/>, <https://dpc10ster.github.io/RJafrocRocBook/> and at <https://dpc10ster.github.io/RJafrocFrocBook/>. Supported data collection paradigms are the ROC, FROC and the location ROC (LROC). ROC data consists of a single rating per image, where a rating is the perceived confidence level that the image is that of a diseased patient. An ROC curve is a plot of true positive fraction vs. false positive fraction. FROC data consists of a variable number (zero or more) of mark-rating pairs per image, where a mark is the location of a reported suspicious region and the rating is the confidence level that it is a real lesion. LROC data consists of a rating and a location of the most suspicious region, for every image. Four models of observer performance, and curve-fitting software, are implemented: the binormal model (BM), the contaminated binormal model (CBM), the correlated contaminated binormal model (CORCBM), and the radiological search model (RSM). Unlike the binormal model, CBM, CORCBM and RSM predict 'proper' ROC curves that do not inappropriately cross the chance diagonal. Additionally, RSM parameters are related to search performance (not measured in conventional ROC analysis) and classification performance. Search performance refers to finding lesions, i.e., true positives, while simultaneously not finding false positive locations. Classification performance measures the ability to distinguish between true and false positive locations. Knowing these separate performances allows principled optimization of reader or AI system performance. This package supersedes Windows JAFROC (jackknife alternative FROC) software V4.2.1, <https://github.com/dpc10ster/WindowsJafroc>. Package functions are organized as follows. Data file related function names are preceded by 'Df', curve fitting functions by 'Fit', included data sets by 'dataset', plotting functions by 'Plot', significance testing functions by 'St', sample size related functions by 'Ss', data simulation functions by 'Simulate' and utility functions by 'Util'. Implemented are figures of merit (FOMs) for quantifying performance and functions for visualizing empirical or fitted operating characteristics: e.g., ROC, FROC, alternative FROC (AFROC) and weighted AFROC (wAFROC) curves.
For fully crossed study designs, significance testing of reader-averaged FOM differences between modalities is implemented via either the Dorfman-Berbaum-Metz or the Obuchowski-Rockette methods. Also implemented is single modality analysis, which allows comparison of performance of a group of radiologists to a specified value, or comparison of AI to a group of radiologists interpreting the same cases. Crossed-modality analysis is implemented wherein there are two crossed modality factors and the aim is to determine performance in each modality factor averaged over all levels of the second factor. Sample size estimation tools are provided for ROC and FROC studies; these use estimates of the relevant variances from a pilot study to predict required numbers of readers and cases in a pivotal study to achieve the desired power. Utility and data file manipulation functions allow data to be read in any of the currently used input formats, including Excel, and the results of the analysis can be viewed in text or Excel output files. The methods are illustrated with several included datasets from the author's collaborations. This update includes improvements to the code, some as a result of user-reported bugs and new feature requests, and others discovered during ongoing testing and code simplification.
Maintained by Dev Chakraborty. Last updated 5 months ago.
ai-optimization, artificial-intelligence-algorithms, computer-aided-diagnosis, froc-analysis, roc-analysis, target-classification, target-localization, cpp
67.4 match 19 stars 5.69 score 65 scripts
cutpointr:Determine and Evaluate Optimal Cutpoints in Binary Classification Tasks
Estimate cutpoints that optimize a specified metric in binary classification tasks and validate performance using bootstrapping. Some methods for more robust cutpoint estimation are supported, e.g. a parametric method assuming normal distributions, bootstrapped cutpoints, and smoothing of the metric values per cutpoint using Generalized Additive Models. Various plotting functions are included. For an overview of the package see Thiele and Hirschfeld (2021) <doi:10.18637/jss.v098.i11>.
Maintained by Christian Thiele. Last updated 3 months ago.
bootstrapping, cutpoint-optimization, roc-curve, cpp
26.8 match 88 stars 10.44 score 322 scripts 1 dependents
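A short cutpointr sketch, assuming the suicide example data bundled with the package (predictor dsi, outcome suicide):
  library(cutpointr)
  cp <- cutpointr(suicide, dsi, suicide,
                  method = maximize_metric, metric = sum_sens_spec)
  summary(cp)                                 # optimal cutpoint, AUC, sensitivity/specificity
  plot(cp)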
spatstat.explore:Exploratory Data Analysis for the 'spatstat' Family
Functionality for exploratory data analysis and nonparametric analysis of spatial data, mainly spatial point patterns, in the 'spatstat' family of packages. (Excludes analysis of spatial data on a linear network, which is covered by the separate package 'spatstat.linnet'.) Methods include quadrat counts, K-functions and their simulation envelopes, nearest neighbour distance and empty space statistics, Fry plots, pair correlation function, kernel smoothed intensity, relative risk estimation with cross-validated bandwidth selection, mark correlation functions, segregation indices, mark dependence diagnostics, and kernel estimates of covariate effects. Formal hypothesis tests of random pattern (chi-squared, Kolmogorov-Smirnov, Monte Carlo, Diggle-Cressie-Loosmore-Ford, Dao-Genton, two-stage Monte Carlo) and tests for covariate effects (Cox-Berman-Waller-Lawson, Kolmogorov-Smirnov, ANOVA) are also supported.
Maintained by Adrian Baddeley. Last updated 1 months ago.
cluster-detection, confidence-intervals, hypothesis-testing, k-function, roc-curves, scan-statistics, significance-testing, simulation-envelopes, spatial-analysis, spatial-data-analysis, spatial-sharpening, spatial-smoothing, spatial-statistics
15.1 match 1 stars 10.17 score 67 scripts 148 dependents
fbroc:Fast Algorithms to Bootstrap Receiver Operating Characteristics Curves
Implements a very fast C++ algorithm to quickly bootstrap receiver operating characteristic (ROC) curves and derived performance metrics, including the area under the curve (AUC) and the partial area under the curve as well as the true and false positive rate. The analysis of paired ROC curves is supported as well, so that a comparison of two predictors is possible. You can also plot the results and calculate confidence intervals. On a typical desktop computer, calculating 100,000 bootstrap replicates for 500 observations takes on the order of one second.
Maintained by Erik Peter. Last updated 6 years ago.
33.0 match 7 stars 4.28 score 18 scripts
clinfun:Clinical Trial Design and Data Analysis Functions
Utilities to make your clinical collaborations easier if not fun. It contains functions for designing studies such as Simon 2-stage and group sequential designs and for data analysis such as Jonckheere-Terpstra test and estimating survival quantiles.
Maintained by Venkatraman E. Seshan. Last updated 1 years ago.
17.2 match 5 stars 7.86 score 124 scripts 8 dependents
cvAUC:Cross-Validated Area Under the ROC Curve Confidence Intervals
Tools for working with and evaluating cross-validated area under the ROC curve (AUC) estimators. The primary functions of the package are ci.cvAUC and ci.pooled.cvAUC, which report cross-validated AUC and compute confidence intervals for cross-validated AUC estimates based on influence curves for i.i.d. and pooled repeated measures data, respectively. One benefit to using influence curve based confidence intervals is that they require much less computation time than bootstrapping methods. The utility functions, AUC and cvAUC, are simple wrappers for functions from the ROCR package.
Maintained by Erin LeDell. Last updated 3 years ago.
auc, confidence-intervals, cross-validation, machine-learning, statistics, variance
12.1 match 23 stars 9.17 score 317 scripts 40 dependents
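A self-contained sketch of influence-curve confidence intervals for cross-validated AUC; the simulated scores stand in for out-of-fold predictions from your own CV loop:
  library(cvAUC)
  set.seed(1)
  n <- 500
  folds <- sample(rep(1:5, length.out = n))   # fold ids
  labels <- rbinom(n, 1, 0.5)
  predictions <- 0.3 * labels + 0.7 * runif(n)
  out <- ci.cvAUC(predictions, labels, folds = folds, confidence = 0.95)
  out$cvAUC                                   # cross-validated AUC
  out$ci                                      # influence-curve based confidence interval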
yardstick:Tidy Characterizations of Model Performance
Tidy tools for quantifying how well a model fits a data set, such as confusion matrices, class probability curve summaries, and regression metrics (e.g., RMSE).
Maintained by Emil Hvitfeldt. Last updated 4 days ago.
6.8 match 387 stars 15.47 score 2.2k scripts 60 dependents
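A yardstick sketch using its bundled two_class_example data (truth column truth, predicted probability Class1):
  library(yardstick)
  roc_auc(two_class_example, truth, Class1)    # AUC as a tidy tibble
  roc_curve(two_class_example, truth, Class1)  # sensitivity/specificity per threshold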
MRMCaov:Multi-Reader Multi-Case Analysis of Variance
Estimation and comparison of the performances of diagnostic tests in multi-reader multi-case studies where true case statuses (or ground truths) are known and one or more readers provide test ratings for multiple cases. Reader performance metrics are provided for area under and expected utility of ROC curves, likelihood ratio of positive or negative tests, and sensitivity and specificity. ROC curves can be estimated empirically or with binormal or binormal likelihood-ratio models. Statistical comparisons of diagnostic tests are based on the ANOVA model of Obuchowski-Rockette and the unified framework of Hillis (2005) <doi:10.1002/sim.2024>. The ANOVA can be conducted with data from a full factorial, nested, or partially paired study design; with random or fixed readers or cases; and covariances estimated with the DeLong method, jackknifing, or an unbiased method. Smith and Hillis (2020) <doi:10.1117/12.2549075>.
Maintained by Brian J Smith. Last updated 2 years ago.
18.9 match 12 stars 5.26 score 8 scripts 1 dependents
btergm:Temporal Exponential Random Graph Models by Bootstrapped Pseudolikelihood
Temporal Exponential Random Graph Models (TERGM) estimated by maximum pseudolikelihood with bootstrapped confidence intervals or Markov Chain Monte Carlo maximum likelihood. Goodness of fit assessment for ERGMs, TERGMs, and SAOMs. Micro-level interpretation of ERGMs and TERGMs. The methods are described in Leifeld, Cranmer and Desmarais (2018), JStatSoft <doi:10.18637/jss.v083.i06>.
Maintained by Philip Leifeld. Last updated 12 months ago.
complex-networks, dynamic-analysis, ergm, estimation, goodness-of-fit, inference, longitudinal-data, network-analysis, prediction, tergm
13.5 match 17 stars 6.70 score 83 scripts 2 dependents
iCOBRA:Comparison and Visualization of Ranking and Assignment Methods
This package provides functions for calculation and visualization of performance metrics for evaluation of ranking and binary classification (assignment) methods. Various types of performance plots can be generated programmatically. The package also contains a shiny application for interactive exploration of results.
Maintained by Charlotte Soneson. Last updated 3 months ago.
10.1 match 14 stars 8.86 score 192 scripts 1 dependents
jsmodule:'RStudio' Addins and 'Shiny' Modules for Medical Research
'RStudio' addins and 'Shiny' modules for descriptive statistics, regression and survival analysis.
Maintained by Jinseob Kim. Last updated 3 days ago.
medical, rstudio-addins, shiny, shiny-modules, statistics
10.2 match 21 stars 8.68 score 61 scripts
precrec:Calculate Accurate Precision-Recall and ROC (Receiver Operator Characteristics) Curves
Accurate calculations and visualization of precision-recall and ROC (Receiver Operator Characteristics) curves. Saito and Rehmsmeier (2015) <doi:10.1371/journal.pone.0118432>.
Maintained by Takaya Saito. Last updated 1 years ago.
9.2 match 45 stars 9.59 score 496 scripts 5 dependents
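A precrec sketch with simulated scores and labels; one evalmod() call covers both ROC and precision-recall curves:
  library(precrec)
  library(ggplot2)
  set.seed(1)
  scores <- rnorm(200)
  labels <- rbinom(200, 1, plogis(scores))
  curves <- evalmod(scores = scores, labels = labels)
  auc(curves)        # AUCs for both ROC and PRC
  autoplot(curves)   # plots both curve types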
trinROC:Statistical Tests for Assessing Trinormal ROC Data
Several statistical test functions as well as a function for exploratory data analysis to investigate classifiers allocating individuals to one of three disjoint and ordered classes. In a single classifier assessment the discriminatory power is compared to classification by chance. In a comparison of two classifiers the null hypothesis corresponds to equal discriminatory power of the two classifiers. See also "ROC Analysis for Classification and Prediction in Practice" by Nakas, Bantis and Gatsonis (2023), ISBN 9781482233704.
Maintained by Reinhard Furrer. Last updated 5 months ago.
29.5 match 2.70 score
spatstat.model:Parametric Statistical Modelling and Inference for the 'spatstat' Family
Functionality for parametric statistical modelling and inference for spatial data, mainly spatial point patterns, in the 'spatstat' family of packages. (Excludes analysis of spatial data on a linear network, which is covered by the separate package 'spatstat.linnet'.) Supports parametric modelling, formal statistical inference, and model validation. Parametric models include Poisson point processes, Cox point processes, Neyman-Scott cluster processes, Gibbs point processes and determinantal point processes. Models can be fitted to data using maximum likelihood, maximum pseudolikelihood, maximum composite likelihood and the method of minimum contrast. Fitted models can be simulated and predicted. Formal inference includes hypothesis tests (quadrat counting tests, Cressie-Read tests, Clark-Evans test, Berman test, Diggle-Cressie-Loosmore-Ford test, scan test, studentised permutation test, segregation test, ANOVA tests of fitted models, adjusted composite likelihood ratio test, envelope tests, Dao-Genton test, balanced independent two-stage test), confidence intervals for parameters, and prediction intervals for point counts. Model validation techniques include leverage, influence, partial residuals, added variable plots, diagnostic plots, pseudoscore residual plots, model compensators and Q-Q plots.
Maintained by Adrian Baddeley. Last updated 7 days ago.
analysis-of-variance, cluster-process, confidence-intervals, cox-process, determinantal-point-processes, gibbs-process, influence, leverage, model-diagnostics, neyman-scott, parameter-estimation, poisson-process, spatial-analysis, spatial-modelling, spatial-point-processes, statistical-inference
8.8 match 5 stars 9.09 score 6 scripts 46 dependents
PRROC:Precision-Recall and ROC Curves for Weighted and Unweighted Data
Computes the areas under the precision-recall (PR) and ROC curve for weighted (e.g., soft-labeled) and unweighted data. In contrast to other implementations, the interpolation between points of the PR curve is done by a non-linear piecewise function. In addition to the areas under the curves, the curves themselves can also be computed and plotted by a specific S3-method. References: Davis and Goadrich (2006) <doi:10.1145/1143844.1143874>; Keilwagen et al. (2014) <doi:10.1371/journal.pone.0092209>; Grau et al. (2015) <doi:10.1093/bioinformatics/btv153>.
Maintained by Jan Grau. Last updated 7 years ago.
9.5 match 8.35 score 1.2k scripts 56 dependents
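A PRROC sketch with simulated class-wise score vectors (fg = positives, bg = negatives):
  library(PRROC)
  set.seed(1)
  fg <- rnorm(300, mean = 1)                  # scores of the positive class
  bg <- rnorm(300)                            # scores of the negative class
  roc <- roc.curve(scores.class0 = fg, scores.class1 = bg, curve = TRUE)
  pr  <- pr.curve(scores.class0 = fg, scores.class1 = bg, curve = TRUE)
  roc$auc                                     # area under the ROC curve
  pr$auc.integral                             # PR AUC with non-linear interpolation
  plot(pr)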
risksetROC:Riskset ROC Curve Estimation from Censored Survival Data
Compute time-dependent incident/dynamic accuracy measures (ROC curve, AUC, integrated AUC) from censored survival data under the proportional or non-proportional hazard assumption of Heagerty & Zheng (Biometrics, Vol 61 No 1, 2005, pp. 92-105).
Maintained by Paramita Saha-Chaudhuri. Last updated 3 years ago.
21.1 match 3.71 score 57 scripts 3 dependents
WVPlots:Common Plots for Analysis
Select data analysis plots, under a standardized calling interface implemented on top of 'ggplot2' and 'plotly'. Plots of interest include: 'ROC', gain curve, scatter plot with marginal distributions, conditioned scatter plot with marginal densities, box and stem with matching theoretical distribution, and density with matching theoretical distribution.
Maintained by John Mount. Last updated 11 months ago.
9.4 match 85 stars 8.00 score 280 scripts
TOC:Total Operating Characteristic Curve and ROC Curve
Construction of the Total Operating Characteristic (TOC) Curve and the Receiver (aka Relative) Operating Characteristic (ROC) Curve for spatial and non-spatial data. The TOC method is a modification of the ROC method which measures the ability of an index variable to diagnose either presence or absence of a characteristic. The diagnosis depends on whether the value of an index variable is above a threshold. Each threshold generates a two-by-two contingency table, which contains four entries: hits (H), misses (M), false alarms (FA), and correct rejections (CR). While ROC shows for each threshold only two ratios, H/(H + M) and FA/(FA + CR), TOC reveals the size of every entry in the contingency table for each threshold (Pontius Jr., R.G., Si, K. 2014. <doi:10.1080/13658816.2013.862623>).
Maintained by Ali Santacruz. Last updated 1 years ago.
16.4 match 4 stars 4.48 score 15 scripts
lares:Analytics & Machine Learning Sidekick
Auxiliary package for better/faster analytics, visualization, data mining, and machine learning tasks. With a wide variety of family functions, like Machine Learning, Data Wrangling, Marketing Mix Modeling (Robyn), Exploratory, API, and Scraper, it helps the analyst or data scientist get quick and robust results, without the need for repetitive coding or advanced R programming skills.
Maintained by Bernardo Lares. Last updated 24 days ago.
analytics, api, automation, automl, data-science, descriptive-statistics, h2o, machine-learning, marketing, mmm, predictive-modeling, puzzle, rlanguage, robyn, visualization
7.2 match 233 stars 9.84 score 185 scripts 1 dependents
BDgraph:Bayesian Structure Learning in Graphical Models using Birth-Death MCMC
Advanced statistical tools for Bayesian structure learning in undirected graphical models, accommodating continuous, ordinal, discrete, count, and mixed data. It integrates recent advancements in Bayesian graphical models as presented in the literature, including the works of Mohammadi and Wit (2015) <doi:10.1214/14-BA889>, Mohammadi et al. (2021) <doi:10.1080/01621459.2021.1996377>, Dobra and Mohammadi (2018) <doi:10.1214/18-AOAS1164>, and Mohammadi et al. (2023) <doi:10.48550/arXiv.2307.00127>.
Maintained by Reza Mohammadi. Last updated 7 months ago.
9.4 match 8 stars 7.45 score 223 scripts 7 dependents
caTools:Tools: Moving Window Statistics, GIF, Base64, ROC AUC, etc
Contains several basic utility functions including: moving (rolling, running) window statistic functions, read/write for GIF and ENVI binary files, fast calculation of AUC, LogitBoost classifier, base64 encoder/decoder, round-off-error-free sum and cumsum, etc.
Maintained by Michael Dietze. Last updated 6 months ago.
6.2 match 8 stars 11.17 score 9.1k scripts 566 dependents
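A caTools sketch: colAUC() scores several predictor columns at once (here the iris measurements against the species labels, compared pairwise across the three classes):
  library(caTools)
  colAUC(iris[, -5], iris$Species, plotROC = TRUE)   # AUC per column, with ROC plots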
predtools:Prediction Model Tools
Provides additional functions for evaluating predictive models, including plotting calibration curves and model-based Receiver Operating Characteristic (mROC) based on Sadatsafavi et al (2021) <arXiv:2003.00316>.
Maintained by Amin Adibi. Last updated 2 years ago.
9.5 match 9 stars 6.74 score 77 scripts
analogue:Analogue and Weighted Averaging Methods for Palaeoecology
Fits Modern Analogue Technique and Weighted Averaging transfer function models for prediction of environmental data from species data, and related methods used in palaeoecology.
Maintained by Gavin L. Simpson. Last updated 6 months ago.
7.1 match 14 stars 8.96 score 185 scripts 4 dependents
verification:Weather Forecast Verification Utilities
Utilities for verifying discrete, continuous and probabilistic forecasts, and forecasts expressed as parametric distributions are included.
Maintained by Eric Gilleland. Last updated 4 months ago.
14.7 match 3 stars 4.19 score 6 dependents
iglu:Interpreting Glucose Data from Continuous Glucose Monitors
Implements a wide range of metrics for measuring glucose control and glucose variability based on continuous glucose monitoring data. The list of implemented metrics is summarized in Rodbard (2009) <doi:10.1089/dia.2009.0015>. Additional visualization tools include time-series plots, lasagna plots and ambulatory glucose profile report.
Maintained by Irina Gaynanova. Last updated 10 days ago.
6.8 match 26 stars 9.00 score 39 scripts
spatstat.linnet:Linear Networks Functionality of the 'spatstat' Family
Defines types of spatial data on a linear network and provides functionality for geometrical operations, data analysis and modelling of data on a linear network, in the 'spatstat' family of packages. Contains definitions and support for linear networks, including creation of networks, geometrical measurements, topological connectivity, geometrical operations such as inserting and deleting vertices, intersecting a network with another object, and interactive editing of networks. Data types defined on a network include point patterns, pixel images, functions, and tessellations. Exploratory methods include kernel estimation of intensity on a network, K-functions and pair correlation functions on a network, simulation envelopes, nearest neighbour distance and empty space distance, relative risk estimation with cross-validated bandwidth selection. Formal hypothesis tests of random pattern (chi-squared, Kolmogorov-Smirnov, Monte Carlo, Diggle-Cressie-Loosmore-Ford, Dao-Genton, two-stage Monte Carlo) and tests for covariate effects (Cox-Berman-Waller-Lawson, Kolmogorov-Smirnov, ANOVA) are also supported. Parametric models can be fitted to point pattern data using the function lppm() similar to glm(). Only Poisson models are implemented so far. Models may involve dependence on covariates and dependence on marks. Models are fitted by maximum likelihood. Fitted point process models can be simulated, automatically. Formal hypothesis tests of a fitted model are supported (likelihood ratio test, analysis of deviance, Monte Carlo tests) along with basic tools for model selection (stepwise(), AIC()) and variable selection (sdr). Tools for validating the fitted model include simulation envelopes, residuals, residual plots and Q-Q plots, leverage and influence diagnostics, partial residuals, and added variable plots. Random point patterns on a network can be generated using a variety of models.
Maintained by Adrian Baddeley. Last updated 2 months ago.
density-estimation, heat-equation, kernel-density-estimation, network-analysis, point-processes, spatial-data-analysis, statistical-analysis, statistical-inference, statistical-models
6.3 match 6 stars 9.64 score 35 scripts 43 dependents
BranchGLM:Efficient Best Subset Selection for GLMs via Branch and Bound Algorithms
Performs efficient and scalable glm best subset selection using a novel implementation of a branch and bound algorithm. To speed up the model fitting process, a range of optimization methods are implemented in 'RcppArmadillo'. Parallel computation is available using 'OpenMP'.
Maintained by Jacob Seedorff. Last updated 6 months ago.
generalized-linear-models, regression, statistics, subset-selection, variable-selection, openblas, cpp, openmp
9.6 match 7 stars 6.20 score 30 scripts
PatientLevelPrediction:Develop Clinical Prediction Models Using the Common Data Model
A user friendly way to create patient level prediction models using the Observational Medical Outcomes Partnership Common Data Model. Given a cohort of interest and an outcome of interest, the package can use data in the Common Data Model to build a large set of features. These features can then be used to fit a predictive model with a number of machine learning algorithms. This is further described in Reps (2017) <doi:10.1093/jamia/ocy032>.
Maintained by Egill Fridgeirsson. Last updated 9 days ago.
5.4 match 190 stars 10.85 score 297 scripts
auditor:Model Audit - Verification, Validation, and Error Analysis
Provides an easy-to-use unified interface for creating validation plots for any model. The 'auditor' helps avoid the repetitive work of writing the code needed to create residual plots. These visualizations allow you to assess and compare the goodness of fit, performance, and similarity of models.
Maintained by Alicja Gosiewska. Last updated 1 years ago.
classification, error-analysis, explainable-artificial-intelligence, machine-learning, model-validation, regression-models, residuals, xai
6.6 match 58 stars 8.76 score 94 scripts 2 dependents
timeROC:Time-Dependent ROC Curve and AUC for Censored Survival Data
Estimation of time-dependent ROC curve and area under time dependent ROC curve (AUC) in the presence of censored data, with or without competing risks. Confidence intervals of AUCs and tests for comparing AUCs of two rival markers measured on the same subjects can be computed, using the iid-representation of the AUC estimator. Plot functions for time-dependent ROC curves and AUC curves are provided. Time-dependent Positive Predictive Values (PPV) and Negative Predictive Values (NPV) can also be computed. See Blanche et al. (2013) <doi:10.1002/sim.5958> and references therein for the details of the methods implemented in the package.
Maintained by Paul Blanche. Last updated 5 years ago.
9.0 match 9 stars 6.24 score 342 scripts 8 dependents
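A hedged timeROC sketch; time, status (0 = censored) and marker are placeholder vectors standing in for your own survival data:
  library(timeROC)
  fit <- timeROC(T = time, delta = status, marker = marker,
                 cause = 1, times = c(365, 730), iid = TRUE)
  fit$AUC                                     # time-dependent AUC at each horizon
  confint(fit)                                # confidence intervals (needs iid = TRUE)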
iMRMC:Multi-Reader, Multi-Case Analysis Methods (ROC, Agreement, and Other Metrics)
This software does Multi-Reader, Multi-Case (MRMC) analyses of data from imaging studies where clinicians (readers) evaluate patient images (cases). What does this mean? ... Many imaging studies are designed so that every reader reads every case in all modalities, a fully-crossed study. In this case, the data is cross-correlated, and we consider the readers and cases to be cross-correlated random effects. An MRMC analysis accounts for the variability and correlations from the readers and cases when estimating variances, confidence intervals, and p-values. The functions in this package can treat arbitrary study designs and studies with missing data, not just fully-crossed study designs. An overview of this software, including references presenting details on the methods, can be found here: <https://www.fda.gov/medical-devices/science-and-research-medical-devices/imrmc-software-do-multi-reader-multi-case-statistical-analysis-reader-studies>.
Maintained by Brandon Gallas. Last updated 7 months ago.
16.7 match 3.32 score 58 scripts 1 dependents
glmnet:Lasso and Elastic-Net Regularized Generalized Linear Models
Extremely efficient procedures for fitting the entire lasso or elastic-net regularization path for linear regression, logistic and multinomial regression models, Poisson regression, Cox model, multiple-response Gaussian, and the grouped multinomial regression; see <doi:10.18637/jss.v033.i01> and <doi:10.18637/jss.v039.i05>. There are two new and important additions. The family argument can be a GLM family object, which opens the door to any programmed family (<doi:10.18637/jss.v106.i01>). This comes with a modest computational cost, so when the built-in families suffice, they should be used instead. The other novelty is the relax option, which refits each of the active sets in the path unpenalized. The algorithm uses cyclical coordinate descent in a path-wise fashion, as described in the papers cited.
Maintained by Trevor Hastie. Last updated 2 years ago.
3.5 match 82 stars 15.15 score 22k scripts 736 dependents
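A self-contained glmnet sketch: cross-validated lasso logistic regression with AUC as the tuning criterion:
  library(glmnet)
  set.seed(1)
  x <- matrix(rnorm(100 * 20), 100, 20)
  y <- rbinom(100, 1, plogis(x[, 1] - x[, 2]))
  cvfit <- cv.glmnet(x, y, family = "binomial", type.measure = "auc")
  plot(cvfit)                                  # CV curve (AUC vs. lambda)
  coef(cvfit, s = "lambda.min")                # coefficients at the best lambda
  predict(cvfit, newx = x, s = "lambda.1se", type = "response")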
directlabels:Direct Labels for Multicolor Plots
An extensible framework for automatically placing direct labels onto multicolor 'lattice' or 'ggplot2' plots. Label positions are described using Positioning Methods which can be re-used across several different plots. There are heuristics for examining "trellis" and "ggplot" objects and inferring an appropriate Positioning Method.
Maintained by Toby Dylan Hocking. Last updated 11 months ago.
4.9 match 83 stars 10.62 score 1.8k scripts 16 dependents
flare:Family of Lasso Regression
Provide the implementation of a family of Lasso variants including Dantzig Selector, LAD Lasso, SQRT Lasso, Lq Lasso for estimating high dimensional sparse linear model. We adopt the alternating direction method of multipliers and convert the original optimization problem into a sequential L1 penalized least square minimization problem, which can be efficiently solved by linearization algorithm. A multi-stage screening approach is adopted for further acceleration. Besides the sparse linear model estimation, we also provide the extension of these Lasso variants to sparse Gaussian graphical model estimation including TIGER and CLIME using either L1 or adaptive penalty. Missing values can be tolerated for Dantzig selector and CLIME. The computation is memory-optimized using the sparse matrix output. For more information, please refer to <https://www.jmlr.org/papers/volume16/li15a/li15a.pdf>.
Maintained by Xingguo Li. Last updated 4 months ago.
12.1 match 1 stars 4.31 score 141 scripts 4 dependents
AutoScore:An Interpretable Machine Learning-Based Automatic Clinical Score Generator
A novel interpretable machine learning-based framework to automate the development of a clinical scoring model for predefined outcomes. Our novel framework consists of six modules: variable ranking with machine learning, variable transformation, score derivation, model selection, domain knowledge-based score fine-tuning, and performance evaluation. The details are described in our research paper <doi:10.2196/21798>. Users or clinicians could seamlessly generate parsimonious sparse-score risk models (i.e., risk scores), which can be easily implemented and validated in clinical practice. We hope to see its application in various medical case studies.
Maintained by Feng Xie. Last updated 15 days ago.
6.7 match 32 stars 7.70 score 30 scripts
sdm:Species Distribution Modelling
An extensible framework for developing species distribution models using individual and community-based approaches, generating ensembles of models, evaluating the models, and predicting species' potential distributions in space and time. For more information, please check the following paper: Naimi, B., Araujo, M.B. (2016) <doi:10.1111/ecog.01881>.
Maintained by Babak Naimi. Last updated 2 months ago.
5.3 match 24 stars 9.53 score 312 scripts 1 dependents
TTR:Technical Trading Rules
A collection of over 50 technical indicators for creating technical trading rules. The package also provides fast implementations of common rolling-window functions, and several volatility calculations.
Maintained by Joshua Ulrich. Last updated 1 years ago.
algorithmic-trading, finance, technical-analysis
3.3 match 338 stars 15.11 score 2.8k scripts 359 dependents
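A brief TTR sketch, assuming the ttrc example price series bundled with the package:
  library(TTR)
  data(ttrc)
  sma20 <- SMA(ttrc[, "Close"], n = 20)        # 20-period simple moving average
  rsi14 <- RSI(ttrc[, "Close"], n = 14)        # 14-period relative strength index
  macd  <- MACD(ttrc[, "Close"])               # MACD and signal line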
phylosamp:Sample Size Calculations for Molecular and Phylogenetic Studies
Implements novel tools for estimating sample sizes needed for phylogenetic studies, including studies focused on estimating the probability of true pathogen transmission between two cases given phylogenetic linkage and studies focused on tracking pathogen variants at a population level. Methods described in Wohl, Giles, and Lessler (2021) and in Wohl, Lee, DiPrete, and Lessler (2023).
Maintained by Justin Lessler. Last updated 2 years ago.
7.4 match 12 stars 6.65 score 25 scripts
Epi:Statistical Analysis in Epidemiology
Functions for demographic and epidemiological analysis in the Lexis diagram, i.e. register and cohort follow-up data. In particular representation, manipulation, rate estimation and simulation for multistate data - the Lexis suite of functions, which includes interfaces to 'mstate', 'etm' and 'cmprsk' packages. Contains functions for Age-Period-Cohort and Lee-Carter modeling and a function for interval censored data and some useful functions for tabulation and plotting, as well as a number of epidemiological data sets.
Maintained by Bendix Carstensen. Last updated 2 months ago.
5.1 match 4 stars 9.65 score 708 scripts 11 dependents
ggRandomForests:Visually Exploring Random Forests
Graphic elements for exploring Random Forests using the 'randomForest' or 'randomForestSRC' package for survival, regression and classification forests and 'ggplot2' package plotting.
Maintained by John Ehrlinger. Last updated 5 days ago.
5.3 match 148 stars 8.94 score 197 scripts
ROCnReg:ROC Curve Inference with and without Covariates
Estimates the pooled (unadjusted) Receiver Operating Characteristic (ROC) curve, the covariate-adjusted ROC (AROC) curve, and the covariate-specific/conditional ROC (cROC) curve by different methods, both Bayesian and frequentist. Also, it provides functions to obtain ROC-based optimal cutpoints utilizing several criteria. Based on Erkanli, A. et al. (2006) <doi:10.1002/sim.2496>; Faraggi, D. (2003) <doi:10.1111/1467-9884.00350>; Gu, J. et al. (2008) <doi:10.1002/sim.3366>; Inacio de Carvalho, V. et al. (2013) <doi:10.1214/13-BA825>; Inacio de Carvalho, V., and Rodriguez-Alvarez, M.X. (2022) <doi:10.1214/21-STS839>; Janes, H., and Pepe, M.S. (2009) <doi:10.1093/biomet/asp002>; Pepe, M.S. (1998) <http://www.jstor.org/stable/2534001?seq=1>; Rodriguez-Alvarez, M.X. et al. (2011a) <doi:10.1016/j.csda.2010.07.018>; Rodriguez-Alvarez, M.X. et al. (2011a) <doi:10.1007/s11222-010-9184-1>. Please see Rodriguez-Alvarez, M.X. and Inacio, V. (2021) <doi:10.32614/RJ-2021-066> for more details.
Maintained by Maria Xose Rodriguez-Alvarez. Last updated 10 months ago.
28.1 match 1 stars 1.66 score 46 scripts
bcROCsurface:Bias-Corrected Methods for Estimating the ROC Surface of Continuous Diagnostic Tests
Bias-corrected estimation methods for the receiver operating characteristic (ROC) surface and the volume under the ROC surface (VUS) under the missing at random (MAR) assumption.
Maintained by Duc-Khanh To. Last updated 1 years ago.
13.5 match 3.45 score 14 scripts
FLCore:Core Package of FLR, Fisheries Modelling in R
Core classes and methods for FLR, a framework for fisheries modelling and management strategy simulation in R. Developed by a team of fisheries scientists in various countries. More information can be found at <http://flr-project.org/>.
Maintained by Iago Mosqueira. Last updated 9 days ago.
fisheries, flr, fisheries-modelling
5.2 match 16 stars 8.78 score 956 scripts 23 dependents
PresenceAbsence:Presence-Absence Model Evaluation
Provides a set of functions useful when evaluating the results of presence-absence models. Package includes functions for calculating threshold dependent measures such as confusion matrices, pcc, sensitivity, specificity, and Kappa, and produces plots of each measure as the threshold is varied. It will calculate optimal threshold choice according to a choice of optimization criteria. It also includes functions to plot the threshold independent ROC curves along with the associated AUC (area under the curve).
Maintained by Elizabeth Freeman. Last updated 2 years ago.
8.5 match 1 stars 5.32 score 224 scripts 9 dependents
wisclabmisc:Tools to Support the 'WiscLab'
A collection of 'R' functions for use (and re-use) across 'WiscLab' projects. These are analysis or presentation oriented functions--that is, they are not for data reading or data cleaning.
Maintained by Tristan Mahr. Last updated 4 days ago.
10.8 match 3.95 score 4 scripts
survivalROC:Time-Dependent ROC Curve Estimation from Censored Survival Data
Compute time-dependent ROC curve from censored survival data using Kaplan-Meier (KM) or Nearest Neighbor Estimation (NNE) method of Heagerty, Lumley & Pepe (Biometrics, Vol 56 No 2, 2000, PP 337-344).
Maintained by Paramita Saha-Chaudhuri. Last updated 2 years ago.
6.7 match 6 stars 6.37 score 266 scripts 16 dependents
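A hedged survivalROC sketch, assuming the mayo example data bundled with the package (columns time, censor, mayoscore5, if memory serves):
  library(survivalROC)
  data(mayo)
  roc365 <- survivalROC(Stime = mayo$time, status = mayo$censor,
                        marker = mayo$mayoscore5,
                        predict.time = 365, method = "KM")
  roc365$AUC                                           # AUC at one year
  plot(roc365$FP, roc365$TP, type = "l", xlab = "FP", ylab = "TP")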
rtemis:Machine Learning and Visualization
Advanced Machine Learning and Visualization. Unsupervised Learning (Clustering, Decomposition), Supervised Learning (Classification, Regression), Cross-Decomposition, Bagging, Boosting, Meta-models. Static and interactive graphics.
Maintained by E.D. Gennatas. Last updated 1 months ago.
data-science, data-visualization, machine-learning, machine-learning-library, visualization
6.0 match 145 stars 7.09 score 50 scripts 2 dependents
FRESA.CAD:Feature Selection Algorithms for Computer Aided Diagnosis
Contains a set of utilities for building and testing statistical models (linear, logistic, ordinal or Cox) for Computer Aided Diagnosis/Prognosis applications. Utilities include data adjustment, univariate analysis, model building, model validation, longitudinal analysis, reporting and visualization.
Maintained by Jose Gerardo Tamez-Pena. Last updated 1 months ago.
7.6 match 7 stars 5.59 score 31 scripts
asbio:A Collection of Statistical Tools for Biologists
Contains functions from: Aho, K. (2014) Foundational and Applied Statistics for Biologists using R. CRC/Taylor and Francis, Boca Raton, FL, ISBN: 978-1-4398-7338-0.
Maintained by Ken Aho. Last updated 2 months ago.
5.6 match 5 stars 7.32 score 310 scripts 3 dependents
blorr:Tools for Developing Binary Logistic Regression Models
Tools designed to make it easier for beginner and intermediate users to build and validate binary logistic regression models. Includes bivariate analysis, comprehensive regression output, model fit statistics, variable selection procedures, model validation techniques and a 'shiny' app for interactive model building.
Maintained by Aravind Hebbali. Last updated 4 months ago.
logistic-regression-models, regression, cpp
5.8 match 17 stars 7.13 score 144 scripts 1 dependents
mltools:Machine Learning Tools
A collection of machine learning helper functions, particularly assisting in the Exploratory Data Analysis phase. Makes heavy use of the 'data.table' package for optimal speed and memory efficiency. Highlights include a versatile bin_data() function, sparsify() for converting a data.table to sparse matrix format with one-hot encoding, fast evaluation metrics, and empirical_cdf() for calculating empirical Multivariate Cumulative Distribution Functions.
Maintained by Ben Gorman. Last updated 3 years ago.
exploratory-data-analysis, machine-learning
4.3 match 72 stars 9.58 score 1.2k scripts 13 dependents
ggmcmc:Tools for Analyzing MCMC Simulations from Bayesian Inference
Tools for assessing and diagnosing convergence of Markov Chain Monte Carlo simulations, as well as for graphically displaying results from full MCMC analysis. The package also facilitates the graphical interpretation of models by providing flexible functions to plot the results against observed variables, and functions to work with hierarchical/multilevel batches of parameters (Fernández-i-Marín, 2016 <doi:10.18637/jss.v070.i09>).
Maintained by Xavier Fernández i Marín. Last updated 2 years ago.
bayesian-data-analysis, ggplot2, graphical, jags, mcmc, stan
3.4 match 112 stars 12.02 score 1.6k scripts 8 dependents
broom:Convert Statistical Objects into Tidy Tibbles
Summarizes key information about statistical objects in tidy tibbles. This makes it easy to report results, create plots and consistently work with large numbers of models at once. Broom provides three verbs that each provide different types of information about a model. tidy() summarizes information about model components such as coefficients of a regression. glance() reports information about an entire model, such as goodness of fit measures like AIC and BIC. augment() adds information about individual observations to a dataset, such as fitted values or influence measures.
Maintained by Simon Couch. Last updated 4 months ago.
1.9 match 1.5k stars 21.56 score 37k scripts 1.4k dependents
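A small broom sketch showing the three verbs on an ordinary regression fit:
  library(broom)
  fit <- lm(mpg ~ wt + hp, data = mtcars)
  tidy(fit)      # one row per coefficient
  glance(fit)    # one-row model summary (R-squared, AIC, ...)
  augment(fit)   # per-observation fitted values and diagnostics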
jstable:Create Tables from Different Types of Regression
Create regression tables from generalized linear model (GLM), generalized estimating equation (GEE), generalized linear mixed-effects model (GLMM), Cox proportional hazards model, survey-weighted generalized linear model (svyglm) and survey-weighted Cox model results for publication.
Maintained by Jinseob Kim. Last updated 12 days ago.
4.0 match 26 stars 9.98 score 199 scripts 1 dependents
TGST:Targeted Gold Standard Testing
Functions for implementing the targeted gold standard (GS) testing. You provide the true disease or treatment failure status and the risk score, tell 'TGST' the availability of GS tests and which method to use, and it returns the optimal tripartite rules. Please refer to Liu et al. (2013) <doi:10.1080/01621459.2013.810149> for more details.
Maintained by Yizhen Xu. Last updated 4 years ago.
10.8 match 3.70 score
WeightedROC:Fast, Weighted ROC Curves
Fast computation of Receiver Operating Characteristic (ROC) curves and Area Under the Curve (AUC) for weighted binary classification problems (weights are example-specific cost values).
Maintained by Toby Dylan Hocking. Last updated 3 years ago.
6.8 match 29 stars 5.86 score 125 scripts
mldr:Exploratory Data Analysis and Manipulation of Multi-Label Data Sets
Exploratory data analysis and manipulation functions for multi-label data sets, along with an interactive Shiny application to ease their use.
Maintained by David Charte. Last updated 5 years ago.
5.6 match 23 stars 7.07 score 168 scripts 2 dependents
fmsb:Functions for Medical Statistics Book with some Demographic Data
Several utility functions for the book entitled "Practices of Medical and Health Data Analysis using R" (Pearson Education Japan, 2007) with Japanese demographic data and some demographic analysis related functions.
Maintained by Minato Nakazawa. Last updated 1 years ago.
5.1 match 3 stars 7.74 score 1.9k scripts 23 dependents
svyROC:Estimation of the ROC Curve and the AUC for Complex Survey Data
Estimate the receiver operating characteristic (ROC) curve, area under the curve (AUC) and optimal cut-off points for individual classification taking into account complex sampling designs when working with complex survey data. Methods implemented in this package are described in: A. Iparragirre, I. Barrio, I. Arostegui (2024) <doi:10.1002/sta4.635>; A. Iparragirre, I. Barrio, J. Aramendi, I. Arostegui (2022) <doi:10.2436/20.8080.02.121>; A. Iparragirre, I. Barrio (2024) <doi:10.1007/978-3-031-65723-8_7>.
Maintained by Amaia Iparragirre. Last updated 4 months ago.
auc, auc-optimism-correction, complex-survey-data, optimal-cut-off-points, roc-curve, sampling-weights
14.2 match 2.70 score
gbm.auto:Automated Boosted Regression Tree Modelling and Mapping Suite
Automates delta log-normal boosted regression tree abundance prediction. Loops through parameters provided (LR (learning rate), TC (tree complexity), BF (bag fraction)), chooses best, simplifies, & generates line, dot & bar plots, & outputs these & predictions & a report, makes predicted abundance maps, and Unrepresentativeness surfaces. Package core built around 'gbm' (gradient boosting machine) functions in 'dismo' (Hijmans, Phillips, Leathwick & Jane Elith, 2020 & ongoing), itself built around 'gbm' (Greenwell, Boehmke, Cunningham & Metcalfe, 2020 & ongoing, originally by Ridgeway). Indebted to Elith/Leathwick/Hastie 2008 'Working Guide' <doi:10.1111/j.1365-2656.2008.01390.x>; workflow follows Appendix S3. See <https://www.simondedman.com/> for published guides and papers using this package.
Maintained by Simon Dedman. Last updated 6 days ago.
6.6 match 18 stars 5.77 score 13 scripts
RcmdrPlugin.ROC:Rcmdr Receiver Operator Characteristic Plug-in Package
Rcmdr GUI extension plug-in for Receiver Operating Characteristic tools from the pROC package. It also adds an Rcmdr GUI extension for the Hosmer and Lemeshow GOF test from the ResourceSelection package.
Maintained by Daniel-Corneliu Leucuta. Last updated 3 years ago.
37.5 match 1.00 score 1 scripts
AUC:Threshold Independent Performance Measures for Probabilistic Classifiers
Various functions to compute the area under the curve of selected measures: The area under the sensitivity curve (AUSEC), the area under the specificity curve (AUSPC), the area under the accuracy curve (AUACC), and the area under the receiver operating characteristic curve (AUROC). Support for visualization and partial areas is included.
Maintained by Michel Ballings. Last updated 3 years ago.
6.8 match 5.37 score 424 scripts 7 dependents
DirichletMultinomial:Dirichlet-Multinomial Mixture Model Machine Learning for Microbiome Data
Dirichlet-multinomial mixture models can be used to describe variability in microbial metagenomic data. This package is an interface to code originally made available by Holmes, Harris, and Quince, 2012, PLoS ONE 7(2): 1-15, as discussed further in the man page for this package, ?DirichletMultinomial.
Maintained by Martin Morgan. Last updated 5 months ago.
immunooncology, microbiome, sequencing, clustering, classification, metagenomics, gsl
3.3 match 11 stars 10.97 score 125 scripts 26 dependents
animint2:Animated Interactive Grammar of Graphics
Functions are provided for defining animated, interactive data visualizations in R code, and rendering on a web page. The 2018 Journal of Computational and Graphical Statistics paper, <doi:10.1080/10618600.2018.1513367> describes the concepts implemented.
Maintained by Toby Hocking. Last updated 27 days ago.
4.0 match 64 stars 8.87 score 173 scripts
ks:Kernel Smoothing
Kernel smoothers for univariate and multivariate data, with comprehensive visualisation and bandwidth selection capabilities, including for densities, density derivatives, cumulative distributions, clustering, classification, density ridges, significant modal regions, and two-sample hypothesis tests. Chacon & Duong (2018) <doi:10.1201/9780429485572>.
Maintained by Tarn Duong. Last updated 6 months ago.
3.4 match 6 stars 10.14 score 920 scripts 262 dependents
fake:Flexible Data Simulation Using the Multivariate Normal Distribution
This R package can be used to generate artificial data conditionally on pre-specified (simulated or user-defined) relationships between the variables and/or observations. Each observation is drawn from a multivariate Normal distribution where the mean vector and covariance matrix reflect the desired relationships. Outputs can be used to evaluate the performances of variable selection, graphical modelling, or clustering approaches by comparing the true and estimated structures (B Bodinier et al (2021) <arXiv:2106.02521>).
Maintained by Barbara Bodinier. Last updated 2 years ago.
7.0 match 6 stars 4.86 score 81 scripts 1 dependents
dmetatools:Computational tools for meta-analysis of diagnostic accuracy tests
Computational tools for meta-analysis of diagnostic accuracy tests. This package enables computation of confidence intervals for the AUC of the summary ROC curve and some related AUC-based inference methods.
Maintained by Hisashi Noma. Last updated 3 years ago.
auc, bootstrap, diagnostic-tests, meta-analysis, summary-roc-curve
12.3 match 2.70 score 2 scripts
riskRegression:Risk Regression Models and Prediction Scores for Survival Analysis with Competing Risks
Implementation of the following methods for event history analysis. Risk regression models for survival endpoints also in the presence of competing risks are fitted using binomial regression based on a time sequence of binary event status variables. A formula interface for the Fine-Gray regression model and an interface for the combination of cause-specific Cox regression models. A toolbox for assessing and comparing performance of risk predictions (risk markers and risk prediction models). Prediction performance is measured by the Brier score and the area under the ROC curve for binary possibly time-dependent outcome. Inverse probability of censoring weighting and pseudo values are used to deal with right censored data. Lists of risk markers and lists of risk models are assessed simultaneously. Cross-validation repeatedly splits the data, trains the risk prediction models on one part of each split and then summarizes and compares the performance across splits.
Maintained by Thomas Alexander Gerds. Last updated 17 days ago.
2.5 match 46 stars 13.00 score 736 scripts 35 dependents
performance:Assessment of Regression Models Performance
Utilities for computing measures to assess model quality, which are not directly provided by R's 'base' or 'stats' packages. These include e.g. measures like r-squared, intraclass correlation coefficient (Nakagawa, Johnson & Schielzeth (2017) <doi:10.1098/rsif.2017.0213>), root mean squared error or functions to check models for overdispersion, singularity or zero-inflation and more. Functions apply to a large variety of regression models, including generalized linear models, mixed effects models and Bayesian models. References: Lüdecke et al. (2021) <doi:10.21105/joss.03139>.
Maintained by Daniel Lüdecke. Last updated 19 days ago.
aic, easystats, hacktoberfest, loo, machine-learning, mixed-models, models, performance, r2, statistics
2.0 match 1.1k stars 16.17 score 4.3k scripts 47 dependents
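A short performance sketch on a logistic regression:
  library(performance)
  fit <- glm(am ~ wt + hp, data = mtcars, family = binomial)
  model_performance(fit)   # AIC, Tjur's R2, RMSE, log-loss, ...
  r2(fit)
  check_model(fit)         # diagnostic plot panel (needs the 'see' package)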
caROC:Continuous Biomarker Evaluation with Adjustment of Covariates
Compute covariate-adjusted specificity at a controlled sensitivity level, or covariate-adjusted sensitivity at a controlled specificity level, or the covariate-adjusted receiver operating characteristic curve, or covariate-adjusted thresholds at a controlled sensitivity/specificity level. All statistics can also be computed for specific sub-populations given their covariate values. Methods are described in Ziyi Li, Yijian Huang, Datta Patil, Martin G. Sanda (2021+) "Covariate adjustment in continuous biomarker assessment".
Maintained by Ziyi Li. Last updated 4 years ago.
16.0 match 2.00 score 5 scripts
nsROC:Non-Standard ROC Curve Analysis
Tools for estimating Receiver Operating Characteristic (ROC) curves, building confidence bands, comparing several curves both for dependent and independent data, estimating the cumulative-dynamic ROC curve in presence of censored data, and performing meta-analysis studies, among others.
Maintained by Sonia Perez Fernandez. Last updated 7 years ago.
20.1 match 1 stars 1.58 score 19 scripts
npROCRegression:Kernel-Based Nonparametric ROC Regression Modelling
Implements several nonparametric regression approaches for the inclusion of covariate information on the receiver operating characteristic (ROC) framework.
Maintained by Maria Xose Rodriguez-Alvarez. Last updated 2 years ago.
12.7 match 1 stars 2.48 score 15 scripts
iai:Interface to 'Interpretable AI' Modules
An interface to the algorithms of 'Interpretable AI' <https://www.interpretable.ai> from the R programming language. 'Interpretable AI' provides various modules, including 'Optimal Trees' for classification, regression, prescription and survival analysis, 'Optimal Imputation' for missing data imputation and outlier detection, and 'Optimal Feature Selection' for exact sparse regression. The 'iai' package is an open-source project. The 'Interpretable AI' software modules are proprietary products, but free academic and evaluation licenses are available.
Maintained by Jack Dunn. Last updated 5 months ago.
15.7 match 1 stars 2.00 score 7 scripts
ROSE:Random Over-Sampling Examples
Functions to deal with binary classification problems in the presence of imbalanced classes. Synthetic balanced samples are generated according to ROSE (Menardi and Torelli, 2013). Functions that implement more traditional remedies to the class imbalance are also provided, as well as different metrics to evaluate a learner accuracy. These are estimated by holdout, bootstrap or cross-validation methods.
Maintained by Nicola Lunardon. Last updated 4 years ago.
4.5 match 4 stars 6.86 score 1.6k scripts 3 dependents
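A hedged ROSE sketch, assuming the hacide example data bundled with the package (binary outcome cls):
  library(ROSE)
  data(hacide)
  balanced <- ROSE(cls ~ ., data = hacide.train, seed = 1)$data
  table(balanced$cls)                           # roughly balanced classes
  fit  <- glm(cls ~ ., data = balanced, family = binomial)
  pred <- predict(fit, newdata = hacide.test, type = "response")
  roc.curve(hacide.test$cls, pred)              # AUC on the untouched test set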
ROCpAI:Receiver Operating Characteristic Partial Area Indexes for evaluating classifiers
The package analyzes the ROC curve, identifies its type among the different kinds of ROC curves, and calculates the area under the curve by the most accurate method. The package is also able to standardize proper and improper pAUC.
Maintained by Juan-Pedro Garcia. Last updated 5 months ago.
software, statisticalmethod, classification
9.2 match 3.30 score 2 scripts
ModTools:Building Regression and Classification Models
Consistent user interface to the most common regression and classification algorithms, such as random forest, neural networks, C5 trees and support vector machines, complemented with a handful of auxiliary functions, such as variable importance and a tuning function for the parameters.
Maintained by Andri Signorell. Last updated 2 months ago.
7.2 match 2 stars 4.20 score 3 scripts
mlr3:Machine Learning in R - Next Generation
Efficient, object-oriented programming on the building blocks of machine learning. Provides 'R6' objects for tasks, learners, resamplings, and measures. The package is geared towards scalability and larger datasets by supporting parallelization and out-of-memory data-backends like databases. While 'mlr3' focuses on the core computational operations, add-on packages provide additional functionality.
Maintained by Marc Becker. Last updated 4 days ago.
classification, data-science, machine-learning, mlr3, regression
2.0 match 972 stars 14.86 score 2.3k scripts 35 dependents
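A compact mlr3 sketch: resample a probabilistic learner and aggregate the AUC (uses the built-in sonar task and the rpart learner shipped with mlr3):
  library(mlr3)
  task    <- tsk("sonar")
  learner <- lrn("classif.rpart", predict_type = "prob")
  rr      <- resample(task, learner, rsmp("cv", folds = 5))
  rr$aggregate(msr("classif.auc"))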
survcomp:Performance Assessment and Comparison for Survival Analysis
Assessment and Comparison for Performance of Risk Prediction (Survival) Models.
Maintained by Benjamin Haibe-Kains. Last updated 5 months ago.
geneexpression, differentialexpression, visualization, cpp
3.4 match 8.46 score 448 scripts 12 dependents
TeachingDemos:Demonstrations for Teaching and Learning
Demonstration functions that can be used in a classroom to demonstrate statistical concepts, or on your own to better understand the concepts or the programming.
Maintained by Greg Snow. Last updated 1 years ago.
4.0 match 7.18 score 760 scripts 13 dependents
mikropml:User-Friendly R Package for Supervised Machine Learning Pipelines
An interface to build machine learning models for classification and regression problems. 'mikropml' implements the ML pipeline described by Topçuoğlu et al. (2020) <doi:10.1128/mBio.00434-20> with reasonable default options for data preprocessing, hyperparameter tuning, cross-validation, testing, model evaluation, and interpretation steps. See the website <https://www.schlosslab.org/mikropml/> for more information, documentation, and examples.
Maintained by Kelly Sovacool. Last updated 2 years ago.
3.6 match 56 stars 7.83 score 86 scripts
DJL:Distance Measure Based Judgment and Learning
Implements various decision support tools related to econometrics and technometrics. Subroutines include correlation reliability test, Mahalanobis distance measure for outlier detection, combinatorial search (all possible subset regression), non-parametric efficiency analysis measures: DDF (directional distance function), DEA (data envelopment analysis), HDF (hyperbolic distance function), SBM (slack-based measure), and SF (shortage function), benchmarking, Malmquist productivity analysis, risk analysis, technology adoption model, new product target setting, network DEA, dynamic DEA, intertemporal budgeting, etc.
Maintained by Dong-Joon Lim. Last updated 2 years ago.
14.3 match 1 stars 1.97 score 93 scriptswinvector
sigr:Succinct and Correct Statistical Summaries for Reports
Succinctly and correctly format statistical summaries of various models and tests (F-test, Chi-Sq-test, Fisher-test, T-test, and rank-significance). This package also includes empirical tests, such as Monte Carlo and bootstrap distribution estimates.
Maintained by John Mount. Last updated 2 years ago.
3.9 match 28 stars 7.18 score 97 scripts 1 dependentsgbm-developers
gbm:Generalized Boosted Regression Models
An implementation of extensions to Freund and Schapire's AdaBoost algorithm and Friedman's gradient boosting machine. Includes regression methods for least squares, absolute loss, t-distribution loss, quantile regression, logistic, multinomial logistic, Poisson, Cox proportional hazards partial likelihood, AdaBoost exponential loss, Huberized hinge loss, and Learning to Rank measures (LambdaMart). Originally developed by Greg Ridgeway. Newer version available at github.com/gbm-developers/gbm3.
Maintained by Greg Ridgeway. Last updated 9 months ago.
2.0 match 52 stars 13.85 score 6.8k scripts 91 dependentsbioc
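A minimal gbm sketch on simulated data for illustration; Bernoulli deviance corresponds to boosting a logistic regression:

    library(gbm)
    set.seed(1)
    d <- data.frame(x1 = rnorm(500), x2 = rnorm(500))
    d$y <- rbinom(500, 1, plogis(d$x1 - d$x2))                        # synthetic binary outcome
    fit <- gbm(y ~ x1 + x2, data = d, distribution = "bernoulli",
               n.trees = 200, interaction.depth = 2, shrinkage = 0.05)
    p <- predict(fit, newdata = d, n.trees = 200, type = "response")  # predicted probabilities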
minet:Mutual Information NETworks
This package implements various algorithms for inferring mutual information networks from data.
Maintained by Patrick E. Meyer. Last updated 5 months ago.
microarraygraphandnetworknetworknetworkinferencecpp
4.5 match 6.15 score 114 scripts 16 dependentsda-zar
ROCket:Simple and Fast ROC Curves
A set of functions for receiver operating characteristic (ROC) curve estimation and area under the curve (AUC) calculation. All functions are designed to work with aggregated data; nevertheless, they can also handle raw samples. In 'ROCket', we distinguish two types of ROC curve representations: 1) parametric curves - the true positive rate (TPR) and the false positive rate (FPR) are functions of a parameter (the score), 2) functions - TPR is a function of FPR. There are several ROC curve estimation methods available. An introduction to the mathematical background of the implemented methods (and much more) can be found in de Zea Bermudez, Gonçalves, Oliveira & Subtil (2014) <https://www.ine.pt/revstat/pdf/rs140101.pdf> and Cai & Pepe (2004) <doi:10.1111/j.0006-341X.2004.00200.x>.
Maintained by Daniel Lazar. Last updated 4 years ago.
10.2 match 1 stars 2.70 score 6 scriptsabichat
evabic:Evaluation of Binary Classifiers
Evaluates the performance of binary classifiers. Computes confusion measures (TP, TN, FP, FN), derived measures (TPR, FDR, accuracy, F1, DOR, ..), and area under the curve. Outputs are well suited for nested dataframes.
Maintained by Antoine Bichat. Last updated 3 years ago.
classifiermeasurespredictorsroc-curvestatistics
7.5 match 6 stars 3.62 score 14 scriptsmyles-lewis
nestedcv:Nested Cross-Validation with 'glmnet' and 'caret'
Implements nested k*l-fold cross-validation for lasso and elastic-net regularised linear models via the 'glmnet' package and other machine learning models via the 'caret' package <doi:10.1093/bioadv/vbad048>. Cross-validation of 'glmnet' alpha mixing parameter and embedded fast filter functions for feature selection are provided. Described as double cross-validation by Stone (1977) <doi:10.1111/j.2517-6161.1977.tb01603.x>. Also implemented is a method using outer CV to measure unbiased model performance metrics when fitting Bayesian linear and logistic regression shrinkage models using the horseshoe prior over parameters to encourage a sparse model as described by Piironen & Vehtari (2017) <doi:10.1214/17-EJS1337SI>.
Maintained by Myles Lewis. Last updated 6 days ago.
3.4 match 12 stars 7.92 score 46 scriptscran
datarobot:'DataRobot' Predictive Modeling API
For working with the 'DataRobot' predictive modeling platform's API <https://www.datarobot.com/>.
Maintained by AJ Alon. Last updated 1 year ago.
7.5 match 2 stars 3.48 scoreiangow
farr:Data and Code for Financial Accounting Research
Handy functions and data to support a course book for accounting research. Gow, Ian D. and Tongqing Ding (2024) 'Empirical Research in Accounting: Tools and Methods' <https://iangow.github.io/far_book/>.
Maintained by Ian Gow. Last updated 1 month ago.
5.1 match 17 stars 5.05 score 66 scriptsbioc
structToolbox:Data processing & analysis tools for Metabolomics and other omics
An extensive set of data (pre-)processing and analysis methods and tools for metabolomics and other omics, with a strong emphasis on statistics and machine learning. This toolbox allows the user to build extensive and standardised workflows for data analysis. The methods and tools have been implemented using class-based templates provided by the struct (Statistics in R Using Class-based Templates) package. The toolbox includes pre-processing methods (e.g. signal drift and batch correction, normalisation, missing value imputation and scaling), univariate (e.g. ttest, various forms of ANOVA, Kruskal–Wallis test and more) and multivariate statistical methods (e.g. PCA and PLS, including cross-validation and permutation testing) as well as machine learning methods (e.g. Support Vector Machines). The STATistics Ontology (STATO) has been integrated and implemented to provide standardised definitions for the different methods, inputs and outputs.
Maintained by Gavin Rhys Lloyd. Last updated 25 days ago.
workflowstepmetabolomicsbioconductor-packagedimslc-msmachine-learningmultivariate-analysisstatisticsunivariate
4.0 match 10 stars 6.26 score 12 scriptseasystats
see:Model Visualisation Toolbox for 'easystats' and 'ggplot2'
Provides plotting utilities supporting packages in the 'easystats' ecosystem (<https://github.com/easystats/easystats>) and some extra themes, geoms, and scales for 'ggplot2'. Color scales are based on <https://materialui.co/>. References: Lüdecke et al. (2021) <doi:10.21105/joss.03393>.
Maintained by Indrajeet Patil. Last updated 5 days ago.
data-visualizationeasystatsggplot2hacktoberfestplottingseestatisticsvisualisationvisualization
1.9 match 902 stars 13.22 score 2.0k scripts 3 dependentsbioc
TBSignatureProfiler:Profile RNA-Seq Data Using TB Pathway Signatures
Gene signatures of TB progression, TB disease, and other TB disease states have been validated and published previously. This package aggregates known signatures and provides computational tools to apply them to other datasets. The TBSignatureProfiler makes it easy to profile RNA-Seq data using these signatures and includes common signature profiling tools such as ASSIGN, GSVA, and ssGSEA. Original models for some gene signatures are also available. A Shiny app provides some of this functionality alongside the detailed command-line interface.
Maintained by Aubrey R. Odom. Last updated 3 months ago.
geneexpressiondifferentialexpressionbioconductor-packagebiomarkersgene-signaturestuberculosis
3.4 match 12 stars 7.25 score 23 scriptscran
epiDisplay:Epidemiological Data Display Package
Package for data exploration and result presentation. Full 'epicalc' package with data management functions is available at '<https://medipe.psu.ac.th/epicalc/>'.
Maintained by Virasakdi Chongsuvivatwong. Last updated 3 years ago.
4.5 match 1 stars 5.44 score 758 scripts 2 dependentsmfrasco
Metrics:Evaluation Metrics for Machine Learning
An implementation of evaluation metrics in R that are commonly used in supervised machine learning. It implements metrics for regression, time series, binary classification, classification, and information retrieval problems. It has zero dependencies and a consistent, simple interface for all functions.
Maintained by Michael Frasco. Last updated 6 years ago.
1.9 match 99 stars 13.02 score 6.1k scripts 51 dependentsipa-tys
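A small Metrics sketch for illustration, using toy labels and scores:

    library(Metrics)
    actual    <- c(1, 1, 0, 0, 1, 0)
    predicted <- c(0.9, 0.8, 0.3, 0.1, 0.6, 0.4)
    auc(actual, predicted)        # area under the ROC curve
    logLoss(actual, predicted)    # binary log loss
    mae(actual, predicted)        # mean absolute error of the probabilities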
ROCR:Visualizing the Performance of Scoring Classifiers
ROC graphs, sensitivity/specificity curves, lift charts, and precision/recall plots are popular examples of trade-off visualizations for specific pairs of performance measures. ROCR is a flexible tool for creating cutoff-parameterized 2D performance curves by freely combining two from over 25 performance measures (new performance measures can be added using a standard interface). Curves from different cross-validation or bootstrapping runs can be averaged by different methods, and standard deviations, standard errors or box plots can be used to visualize the variability across the runs. The parameterization can be visualized by printing cutoff values at the corresponding curve positions, or by coloring the curve according to cutoff. All components of a performance plot can be quickly adjusted using a flexible parameter dispatching mechanism. Despite its flexibility, ROCR is easy to use, with only three commands and reasonable default values for all optional parameters.
Maintained by Felix G.M. Ernst. Last updated 12 months ago.
1.7 match 38 stars 14.29 score 9.2k scripts 217 dependentsbioc
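A minimal ROCR sketch for illustration, using the ROCR.simple example data shipped with the package:

    library(ROCR)
    data(ROCR.simple)                           # example scores and 0/1 labels
    pred <- prediction(ROCR.simple$predictions, ROCR.simple$labels)
    perf <- performance(pred, "tpr", "fpr")     # ROC curve coordinates
    plot(perf, colorize = TRUE)                 # color the curve by cutoff
    performance(pred, "auc")@y.values[[1]]      # scalar AUC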
gCrisprTools:Suite of Functions for Pooled Crispr Screen QC and Analysis
Set of tools for evaluating pooled high-throughput screening experiments, typically employing CRISPR/Cas9 or shRNA expression cassettes. Contains methods for interrogating library and cassette behavior within an experiment, identifying differentially abundant cassettes, aggregating signals to identify candidate targets for empirical validation, hypothesis testing, and comprehensive reporting. Version 2.0 extends these applications to include a variety of tools for contextualizing and integrating signals across many experiments, incorporates extended signal enrichment methodologies via the "sparrow" package, and streamlines many formal requirements to aid interpretability.
Maintained by Russell Bainer. Last updated 5 months ago.
immunooncologycrisprpooledscreensexperimentaldesignbiomedicalinformaticscellbiologyfunctionalgenomicspharmacogenomicspharmacogeneticssystemsbiologydifferentialexpressiongenesetenrichmentgeneticsmultiplecomparisonnormalizationpreprocessingqualitycontrolrnaseqregressionsoftwarevisualization
5.0 match 4.78 score 8 scriptsfvafrcu
HandTill2001:Multiple Class Area under ROC Curve
An S4 implementation of Eq. (3) and Eq. (7) by David J. Hand and Robert J. Till (2001) <DOI:10.1023/A:1010920819831>.
Maintained by Andreas Dominik Cullmann. Last updated 4 years ago.
4.8 match 4.95 score 59 scripts 1 dependentscran
ROCaggregator:Aggregate Multiple ROC Curves into One Global ROC
Aggregates multiple Receiver Operating Characteristic (ROC) curves obtained from different sources into one global ROC. Additionally, it’s also possible to calculate the aggregated precision-recall (PR) curve.
Maintained by Pedro Mateus. Last updated 4 years ago.
8.8 match 2.70 scorewkostelecki
ezplot:Functions for Common Chart Types
Wrapper for the 'ggplot2' package that creates a variety of common charts (e.g. bar, line, area, ROC, waterfall, pie) while aiming to reduce typing.
Maintained by Wojtek Kostelecki. Last updated 7 months ago.
3.8 match 5 stars 6.16 score 116 scriptsrstudio
keras3:R Interface to 'Keras'
Interface to 'Keras' <https://keras.io>, a high-level neural networks API. 'Keras' was developed with a focus on enabling fast experimentation, supports both convolution based networks and recurrent networks (as well as combinations of the two), and runs seamlessly on both CPU and GPU devices.
Maintained by Tomasz Kalinowski. Last updated 4 days ago.
1.7 match 845 stars 13.57 score 264 scripts 2 dependentskozodoi
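A binary-classifier sketch with keras3 for illustration; it assumes a working Keras/TensorFlow backend is installed and the layer sizes are arbitrary:

    library(keras3)
    x <- matrix(rnorm(2000), ncol = 20)                  # 100 samples, 20 features
    y <- rbinom(100, 1, 0.5)
    model <- keras_model_sequential(input_shape = 20) |>
      layer_dense(units = 16, activation = "relu") |>
      layer_dense(units = 1, activation = "sigmoid")
    model |> compile(optimizer = "adam", loss = "binary_crossentropy",
                     metrics = "accuracy")
    model |> fit(x, y, epochs = 2, batch_size = 32, verbose = 0)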
fairness:Algorithmic Fairness Metrics
Offers calculation, visualization and comparison of algorithmic fairness metrics. Fair machine learning is an emerging topic with the overarching aim to critically assess whether ML algorithms reinforce existing social biases. Unfair algorithms can propagate such biases and produce predictions with a disparate impact on various sensitive groups of individuals (defined by sex, gender, ethnicity, religion, income, socioeconomic status, physical or mental disabilities). Fair algorithms possess the underlying foundation that these groups should be treated similarly or have similar prediction outcomes. The fairness R package offers the calculation and comparisons of commonly and less commonly used fairness metrics in population subgroups. These methods are described by Calders and Verwer (2010) <doi:10.1007/s10618-010-0190-x>, Chouldechova (2017) <doi:10.1089/big.2016.0047>, Feldman et al. (2015) <doi:10.1145/2783258.2783311> , Friedler et al. (2018) <doi:10.1145/3287560.3287589> and Zafar et al. (2017) <doi:10.1145/3038912.3052660>. The package also offers convenient visualizations to help understand fairness metrics.
Maintained by Nikita Kozodoi. Last updated 2 years ago.
algorithmic-discriminationalgorithmic-fairnessdiscriminationdisparate-impactfairnessfairness-aifairness-mlmachine-learning
3.3 match 32 stars 6.82 score 69 scripts 1 dependentsbioc
mixOmics:Omics Data Integration Project
Multivariate methods are well suited to large omics data sets where the number of variables (e.g. genes, proteins, metabolites) is much larger than the number of samples (patients, cells, mice). They have the appealing properties of reducing the dimension of the data by using instrumental variables (components), which are defined as combinations of all variables. Those components are then used to produce useful graphical outputs that enable better understanding of the relationships and correlation structures between the different data sets that are integrated. mixOmics offers a wide range of multivariate methods for the exploration and integration of biological datasets with a particular focus on variable selection. The package proposes several sparse multivariate models we have developed to identify the key variables that are highly correlated, and/or explain the biological outcome of interest. The data that can be analysed with mixOmics may come from high throughput sequencing technologies, such as omics data (transcriptomics, metabolomics, proteomics, metagenomics etc) but also beyond the realm of omics (e.g. spectral imaging). The methods implemented in mixOmics can also handle missing values without having to delete entire rows with missing data. A non exhaustive list of methods include variants of generalised Canonical Correlation Analysis, sparse Partial Least Squares and sparse Discriminant Analysis. Recently we implemented integrative methods to combine multiple data sets: N-integration with variants of Generalised Canonical Correlation Analysis and P-integration with variants of multi-group Partial Least Squares.
Maintained by Eva Hamrud. Last updated 4 days ago.
immunooncologymicroarraysequencingmetabolomicsmetagenomicsproteomicsgenepredictionmultiplecomparisonclassificationregressionbioconductorgenomicsgenomics-datagenomics-visualizationmultivariate-analysismultivariate-statisticsomicsr-pkgr-project
1.7 match 182 stars 13.71 score 1.3k scripts 22 dependentscran
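A sparse PLS-DA sketch with mixOmics for illustration; it assumes the srbct example data bundled with the package (gene expression matrix plus tumour class factor):

    library(mixOmics)
    data(srbct)
    X <- srbct$gene                                      # gene expression matrix
    Y <- srbct$class                                     # tumour class factor
    fit <- splsda(X, Y, ncomp = 2, keepX = c(50, 50))    # keep 50 genes per component
    plotIndiv(fit, legend = TRUE)                        # sample plot on the two components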
multiROC:Calculating and Visualizing ROC and PR Curves Across Multi-Class Classifications
Tools to solve real-world multi-class classification problems by computing the areas under the ROC and PR curves via micro-averaging and macro-averaging. The vignettes of this package can be found via <https://github.com/WandeRum/multiROC>. The methodology is described in V. Van Asch (2013) <https://www.clips.uantwerpen.be/~vincent/pdf/microaverage.pdf> and Pedregosa et al. (2011) <http://scikit-learn.org/stable/auto_examples/model_selection/plot_roc.html>.
Maintained by Runmin Wei. Last updated 7 years ago.
12.7 match 1.78 scorebioc
scAnnotatR:Pretrained learning models for cell type prediction on single cell RNA-sequencing data
The package comprises a set of pretrained machine learning models to predict basic immune cell types. This enables all users to quickly get a first annotation of the cell types present in their dataset without requiring prior knowledge. scAnnotatR also allows users to train their own models to predict new cell types based on specific research needs.
Maintained by Johannes Griss. Last updated 5 months ago.
singlecelltranscriptomicsgeneexpressionsupportvectormachineclassificationsoftware
3.3 match 15 stars 6.73 score 20 scriptssnoweye
cubfits:Codon Usage Bias Fits
Estimating mutation and selection coefficients on synonymous codon bias usage based on models of ribosome overhead cost (ROC). Multinomial logistic regression and Markov Chain Monte Carlo are used to estimate and predict protein production rates with/without the presence of expressions and measurement errors. Workflows with examples for simulation, estimation and prediction processes are also provided with parallelization speedup. The whole framework is tested with yeast genome and gene expression data of Yassour et al. (2009) <doi:10.1073/pnas.0812841106>.
Maintained by Wei-Chen Chen. Last updated 3 years ago.
4.5 match 7 stars 4.83 score 32 scriptsaaamini
nett:Network Analysis and Community Detection
Provides tools for network data analysis and community detection, including multiple methods for fitting, model selection and goodness-of-fit testing in degree-corrected stochastic block models. Most of the computations are fast and scalable for sparse networks, especially for Poisson versions of the models. Implements the following: Amini, Chen, Bickel and Levina (2013) <doi:10.1214/13-AOS1138> Bickel and Sarkar (2015) <doi:10.1111/rssb.12117> Lei (2016) <doi:10.1214/15-AOS1370> Wang and Bickel (2017) <doi:10.1214/16-AOS1457> Zhang and Amini (2020) <arXiv:2012.15047> Le and Levina (2022) <doi:10.1214/21-EJS1971>.
Maintained by Arash A. Amini. Last updated 2 years ago.
3.8 match 8 stars 5.48 score 19 scriptsbioc
genefilter:genefilter: methods for filtering genes from high-throughput experiments
Some basic functions for filtering genes.
Maintained by Bioconductor Package Maintainer. Last updated 5 months ago.
1.9 match 11.10 score 2.4k scripts 142 dependentssqyu
genscore:Generalized Score Matching Estimators
Implementation of the Generalized Score Matching estimator in Yu et al. (2019) <http://jmlr.org/papers/v20/18-278.html> for non-negative graphical models (truncated Gaussian, exponential square-root, gamma, a-b models) and univariate truncated Gaussian distributions. Also includes the original estimator for untruncated Gaussian graphical models from Lin et al. (2016) <doi:10.1214/16-EJS1126>, with the addition of a diagonal multiplier.
Maintained by Shiqing Yu. Last updated 5 years ago.
density-estimationgraphical-modelsinteraction-modelsscore-matchingundirected-graphs
4.9 match 1 stars 4.18 score 3 scripts 1 dependentssingmann
MPTinR:Analyze Multinomial Processing Tree Models
Provides a user-friendly way for the analysis of multinomial processing tree (MPT) models (e.g., Riefer, D. M., and Batchelder, W. H. [1988]. Multinomial modeling and the measurement of cognitive processes. Psychological Review, 95, 318-339) for single and multiple datasets. The main functions perform model fitting and model selection. Model selection can be done using AIC, BIC, or the Fisher Information Approximation (FIA) a measure based on the Minimum Description Length (MDL) framework. The model and restrictions can be specified in external files or within an R script in an intuitive syntax or using the context-free language for MPTs. The 'classical' .EQN file format for model files is also supported. Besides MPTs, this package can fit a wide variety of other cognitive models such as SDT models (see fit.model). It also supports multicore fitting and FIA calculation (using the snowfall package), can generate or bootstrap data for simulations, and plot predicted versus observed data.
Maintained by Henrik Singmann. Last updated 4 years ago.
5.1 match 3 stars 3.99 score 27 scripts 1 dependentshanjunwei-lab
PMAPscore:Identify Prognosis-Related Pathways Altered by Somatic Mutation
We innovatively defined a pathway mutation accumulate perturbation score (PMAPscore) to reflect the position and cumulative effect of genetic mutations at the pathway level. Based on the PMAPscore of pathways, the package identifies prognosis-related pathways altered by somatic mutation and predicts immunotherapy efficacy by constructing a multiple-pathway-based risk model (Tarca, Adi Laurentiu et al. (2008) <doi:10.1093/bioinformatics/btn577>).
Maintained by Junwei Han. Last updated 3 years ago.
5.4 match 3.70 score 2 scriptsbioc
netprioR:A model for network-based prioritisation of genes
A model for semi-supervised prioritisation of genes integrating network data, phenotypes and additional prior knowledge about TP and TN gene labels from the literature or experts.
Maintained by Fabian Schmich. Last updated 5 months ago.
immunooncologycellbasedassayspreprocessingnetwork
5.0 match 4.00 score 1 scriptsmlverse
luz:Higher Level 'API' for 'torch'
A high level interface for 'torch' providing utilities to reduce the amount of code needed for common tasks, abstract away torch details and make the same code work on both the 'CPU' and 'GPU'. It's flexible enough to support expressing a large range of models. It's heavily inspired by 'fastai' by Howard et al. (2020) <arXiv:2002.04688>, 'Keras' by Chollet et al. (2015) and 'PyTorch Lightning' by Falcon et al. (2019) <doi:10.5281/zenodo.3828935>.
Maintained by Daniel Falbel. Last updated 6 months ago.
2.0 match 89 stars 9.86 score 318 scripts 4 dependentsanabraga
Comp2ROC:Compare Two ROC Curves that Intersect
Comparison of two ROC curves through the methodology proposed by Ana C. Braga.
Maintained by Ana C. Braga. Last updated 9 years ago.
19.6 match 1.00 score 9 scriptscran
rocsvm.path:The Entire Solution Paths for ROC-SVM
Computes the entire solution path for the ROC-SVM presented by Rakotomamonjy. The solution path algorithm greatly facilitates tuning of the regularization parameter lambda in the ROC-SVM by avoiding a grid search, which may be computationally too intensive. For more information on the ROC-SVM, see the report from the ROC Analysis in AI workshop (ROCAI-2004): Hernàndez-Orallo, José, et al. (2004) <doi:10.1145/1046456.1046489>.
Maintained by Seung Jun Shin. Last updated 6 years ago.
17.1 match 1.15 score 14 scriptsadrianantico
AutoPlots:Creating Echarts Visualizations as Easy as Possible
Create beautiful and interactive visualizations in a single function call. The 'data.table' package is utilized to perform the data wrangling necessary to prepare your data for the plot types you wish to build, along with allowing fast processing for big data. There are two broad classes of plots available: standard plots and machine learning evaluation plots. There are lots of parameters available in each plot type function for customizing the plots (such as faceting) and data wrangling (such as variable transformations and aggregation).
Maintained by Adrian Antico. Last updated 10 months ago.
4.5 match 21 stars 4.32 scorermojab63
ldt:Automated Uncertainty Analysis
Methods and tools for model selection and multi-model inference (Burnham and Anderson (2002) <doi:10.1007/b97636>, among others). 'SUR' (for parameter estimation), 'logit'/'probit' (for binary classification), and 'VARMA' (for time-series forecasting) are implemented. Evaluations are both in-sample and out-of-sample. It is designed to be efficient in terms of CPU usage and memory consumption.
Maintained by Ramin Mojab. Last updated 8 months ago.
7.8 match 2.48 score 7 scriptstoduckhanh
ClusROC:ROC Analysis in Three-Class Classification Problems for Clustered Data
Statistical methods for ROC surface analysis in three-class classification problems for clustered data and in presence of covariates. In particular, the package allows to obtain covariate-specific point and interval estimation for: (i) true class fractions (TCFs) at fixed pairs of thresholds; (ii) the ROC surface; (iii) the volume under ROC surface (VUS); (iv) the optimal pairs of thresholds. Methods considered in points (i), (ii) and (iv) are proposed and discussed in To et al. (2022) <doi:10.1177/09622802221089029>. Referring to point (iv), three different selection criteria are implemented: Generalized Youden Index (GYI), Closest to Perfection (CtP) and Maximum Volume (MV). Methods considered in point (iii) are proposed and discussed in Xiong et al. (2018) <doi:10.1177/0962280217742539>. Visualization tools are also provided. We refer readers to the articles cited above for all details.
Maintained by Duc-Khanh To. Last updated 2 years ago.
biomakerbox-cox-transformationmixed-effects-modelsoptimal-thresholdreciver-operating-characteristicscpp
7.1 match 2.70 score 6 scriptsjamesmurray7
gmvjoint:Joint Models of Survival and Multivariate Longitudinal Data
Fit joint models of survival and multivariate longitudinal data. The longitudinal data is specified by generalised linear mixed models. The joint models are fit via maximum likelihood using an approximate expectation maximisation algorithm. Bernhardt (2015) <doi:10.1016/j.csda.2014.11.011>.
Maintained by James Murray. Last updated 5 months ago.
glmmjoint-modelslongitudinalmixed-modelsmodelpredictionsurvivalsurvival-analysisopenblascppopenmp
5.1 match 3 stars 3.78 score 20 scriptsyanyachen
MLmetrics:Machine Learning Evaluation Metrics
A collection of evaluation metrics, including loss, score and utility functions, that measure regression, classification and ranking performance.
Maintained by Yachen Yan. Last updated 11 months ago.
1.7 match 69 stars 11.09 score 2.2k scripts 20 dependentscran
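A small MLmetrics sketch for illustration, using toy labels and scores:

    library(MLmetrics)
    y_true <- c(1, 1, 0, 0, 1, 0)
    y_prob <- c(0.9, 0.7, 0.4, 0.2, 0.6, 0.3)
    AUC(y_pred = y_prob, y_true = y_true)                          # ranking quality
    LogLoss(y_pred = y_prob, y_true = y_true)                      # probabilistic loss
    Accuracy(y_pred = as.numeric(y_prob > 0.5), y_true = y_true)   # thresholded accuracy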
SeaVal:Validation of Seasonal Weather Forecasts
Provides tools for processing and evaluating seasonal weather forecasts, with an emphasis on tercile forecasts. We follow the World Meteorological Organization's "Guidance on Verification of Operational Seasonal Climate Forecasts", S.J.Mason (2018, ISBN: 978-92-63-11220-0, URL: <https://library.wmo.int/idurl/4/56227>). The development was supported by the European Union’s Horizon 2020 research and innovation programme under grant agreement no. 869730 (CONFER). A comprehensive online tutorial is available at <https://seasonalforecastingengine.github.io/SeaValDoc/>.
Maintained by Claudio Heinrich-Mertsching. Last updated 9 months ago.
10.9 match 1.70 scoret-kalinowski
keras:R Interface to 'Keras'
Interface to 'Keras' <https://keras.io>, a high-level neural networks 'API'. 'Keras' was developed with a focus on enabling fast experimentation, supports both convolution based networks and recurrent networks (as well as combinations of the two), and runs seamlessly on both 'CPU' and 'GPU' devices.
Maintained by Tomasz Kalinowski. Last updated 11 months ago.
1.7 match 10.82 score 10k scripts 54 dependentsfoucher-y
survivalSL:Super Learner for Survival Prediction from Censored Data
Several functions and S3 methods to construct a super learner in the presence of censored times-to-event and to evaluate its prognostic capacities.
Maintained by Yohann Foucher. Last updated 2 months ago.
5.0 match 2 stars 3.70 scorerwehrens
BioMark:Find Biomarkers in Two-Class Discrimination Problems
Variable selection methods are provided for several classification methods: the lasso/elastic net, PCLDA, PLSDA, and several t-tests. Two approaches for selecting cutoffs can be used, one based on the stability of model coefficients under perturbation, and the other on higher criticism.
Maintained by Ron Wehrens. Last updated 10 years ago.
7.8 match 2.32 score 21 scriptsertansu
ROCsurf:ROC Surface Analysis Under the Three-Class Problems
Receiver Operating Characteristic (ROC) analysis is performed assuming samples are from the proposed distributions. In addition, the volume under the ROC surface and true positive fraction values are evaluated via ROC surface analysis.
Maintained by Ertan Akgenç. Last updated 8 months ago.
5.7 match 3.18 score 4 scriptsbioc
matter:Out-of-core statistical computing and signal processing
Toolbox for larger-than-memory scientific computing and visualization, providing efficient out-of-core data structures using files or shared memory, for dense and sparse vectors, matrices, and arrays, with applications to nonuniformly sampled signals and images.
Maintained by Kylie A. Bemis. Last updated 3 months ago.
infrastructuredatarepresentationdataimportdimensionreductionpreprocessingcpp
1.9 match 57 stars 9.52 score 64 scripts 2 dependentsyouyifong
kyotil:Utility Functions for Statistical Analysis Report Generation and Monte Carlo Studies
Helper functions for creating formatted summary of regression models, writing publication-ready tables to latex files, and running Monte Carlo experiments.
Maintained by Youyi Fong. Last updated 8 days ago.
2.3 match 7.87 score 236 scripts 7 dependentsbioc
SC3:Single-Cell Consensus Clustering
A tool for unsupervised clustering and analysis of single cell RNA-Seq data.
Maintained by Vladimir Kiselev. Last updated 5 months ago.
immunooncologysinglecellsoftwareclassificationclusteringdimensionreductionsupportvectormachinernaseqvisualizationtranscriptomicsdatarepresentationguidifferentialexpressiontranscriptionbioconductor-packagehuman-cell-atlassingle-cell-rna-seqopenblascpp
1.8 match 122 stars 10.09 score 374 scripts 1 dependentsbioc
CMA:Synthesis of microarray-based classification
This package provides a comprehensive collection of various microarray-based classification algorithms both from Machine Learning and Statistics. Variable Selection, Hyperparameter tuning, Evaluation and Comparison can be performed combined or stepwise in a user-friendly environment.
Maintained by Roman Hornung. Last updated 5 months ago.
3.3 match 5.09 score 61 scriptsndphillips
FFTrees:Generate, Visualise, and Evaluate Fast-and-Frugal Decision Trees
Create, visualize, and test fast-and-frugal decision trees (FFTs) using the algorithms and methods described by Phillips, Neth, Woike & Gaissmaier (2017), <doi:10.1017/S1930297500006239>. FFTs are simple and transparent decision trees for solving binary classification problems. FFTs can be preferable to more complex algorithms because they require very little information, are easy to understand and communicate, and are robust against overfitting.
Maintained by Hansjoerg Neth. Last updated 5 months ago.
1.8 match 135 stars 9.58 score 144 scriptsbioc
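A brief FFTrees sketch for illustration; it assumes the heart.train/heart.test example data and the binary 'diagnosis' criterion documented with the package:

    library(FFTrees)
    fft <- FFTrees(formula = diagnosis ~ .,
                   data = heart.train,
                   data.test = heart.test)   # build and cross-apply fast-and-frugal trees
    plot(fft, data = "test")                 # tree diagram with accuracy and ROC summary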
Rtpca:Thermal proximity co-aggregation with R
R package for performing thermal proximity co-aggregation analysis with thermal proteome profiling datasets to analyse protein complex assembly and (differential) protein-protein interactions across conditions.
Maintained by Nils Kurzawa. Last updated 5 months ago.
3.8 match 4.46 score 29 scriptsaijordan
triptych:Diagnostic Graphics to Evaluate Forecast Performance
Overall predictive performance is measured by a mean score (or loss), which decomposes into miscalibration, discrimination, and uncertainty components. The main focus is visualization of these distinct and complementary aspects in joint displays. See Dimitriadis, Gneiting, Jordan, Vogel (2024) <doi:10.1016/j.ijforecast.2023.09.007>.
Maintained by Alexander I. Jordan. Last updated 9 months ago.
5.1 match 3.28 score 19 scriptscran
HUM:Compute HUM Value and Visualize ROC Curves
Tools for computing HUM (Hypervolume Under the Manifold) value to estimate features ability to discriminate the class labels, visualizing the ROC curve for two or three class labels (Natalia Novoselova, Cristina Della Beffa, Junxi Wang, Jialiang Li, Frank Pessler, Frank Klawonn (2014) <doi:10.1093/bioinformatics/btu086>).
Maintained by Natalia Novoselova. Last updated 3 years ago.
9.2 match 1 stars 1.78 scoreaigorahub
sensR:Thurstonian Models for Sensory Discrimination
Provides methods for sensory discrimination methods; duotrio, tetrad, triangle, 2-AFC, 3-AFC, A-not A, same-different, 2-AC and degree-of-difference. This enables the calculation of d-primes, standard errors of d-primes, sample size and power computations, and comparisons of different d-primes. Methods for profile likelihood confidence intervals and plotting are included. Most methods are described in Brockhoff, P.B. and Christensen, R.H.B. (2010) <doi:10.1016/j.foodqual.2009.04.003>.
Maintained by Dominik Rafacz. Last updated 1 year ago.
3.3 match 7 stars 4.92 score 77 scriptswzhang17
sorocs:A Bayesian Semiparametric Approach to Correlated ROC Surfaces
A Bayesian semiparametric Dirichlet process mixture approach to estimate correlated receiver operating characteristic (ROC) surfaces and the associated volume under the surface (VUS) with stochastic order constraints. The reference paper is: Zhen Chen, Beom Seuk Hwang (2018) "A Bayesian semiparametric approach to correlated ROC surfaces with stochastic order constraints". Biometrics, 75, 539-550. <doi:10.1111/biom.12997>.
Maintained by Weimin Zhang. Last updated 5 years ago.
5.3 match 3.00 score 2 scriptsbrian-j-smith
MachineShop:Machine Learning Models and Tools
Meta-package for statistical and machine learning with a unified interface for model fitting, prediction, performance assessment, and presentation of results. Approaches for model fitting and prediction of numerical, categorical, or censored time-to-event outcomes include traditional regression models, regularization methods, tree-based methods, support vector machines, neural networks, ensembles, data preprocessing, filtering, and model tuning and selection. Performance metrics are provided for model assessment and can be estimated with independent test sets, split sampling, cross-validation, or bootstrap resampling. Resample estimation can be executed in parallel for faster processing and nested in cases of model tuning and selection. Modeling results can be summarized with descriptive statistics; calibration curves; variable importance; partial dependence plots; confusion matrices; and ROC, lift, and other performance curves.
Maintained by Brian J Smith. Last updated 7 months ago.
classification-modelsmachine-learningpredictive-modelingregression-modelssurvival-models
2.0 match 61 stars 7.95 score 121 scriptsbioc
puma:Propagating Uncertainty in Microarray Analysis (including Affymetrix traditional 3' arrays, exon arrays and Human Transcriptome Array 2.0)
Most analyses of Affymetrix GeneChip data (including traditional 3' arrays, exon arrays and Human Transcriptome Array 2.0) are based on point estimates of expression levels and ignore the uncertainty of such estimates. By propagating uncertainty to downstream analyses we can improve results from microarray analyses. For the first time, the puma package makes a suite of uncertainty propagation methods available to a general audience. In addition to calculating gene expression from Affymetrix 3' arrays, puma also provides methods to process exon arrays and produces gene and isoform expression estimates for alternative splicing studies. puma also offers improvements in terms of scope and speed of execution over previously available uncertainty propagation methods. Included are summarisation, differential expression detection, clustering and PCA methods, together with useful plotting functions.
Maintained by Xuejun Liu. Last updated 5 months ago.
microarrayonechannelpreprocessingdifferentialexpressionclusteringexonarraygeneexpressionmrnamicroarraychiponchipalternativesplicingdifferentialsplicingbayesiantwochanneldataimporthta2.0
3.5 match 4.53 score 17 scriptstripartio
staccuracy:Standardized Accuracy and Other Model Performance Metrics
Standardized accuracy (staccuracy) is a framework for expressing accuracy scores such that 50% represents a reference level of performance and 100% is a perfect prediction. The 'staccuracy' package provides tools for creating staccuracy functions as well as some recommended staccuracy measures. It also provides functions for some classic performance metrics such as mean absolute error (MAE), root mean squared error (RMSE), and area under the receiver operating characteristic curve (AUCROC), as well as their winsorized versions when applicable.
Maintained by Chitu Okoli. Last updated 21 days ago.
3.8 match 1 stars 4.18 score 4 scripts 2 dependentsmarsdu1989
reportROC:An Easy Way to Report ROC Analysis
Provides an easy way to report the results of ROC analysis, including: 1. an ROC curve. 2. the value of Cutoff, AUC (Area Under Curve), ACC (accuracy), SEN (sensitivity), SPE (specificity), PLR (positive likelihood ratio), NLR (negative likelihood ratio), PPV (positive predictive value), NPV (negative predictive value), PPA (percentage of positive accordance), NPA (percentage of negative accordance), TPA (percentage of total accordance), KAPPA (kappa value).
Maintained by Zhicheng Du. Last updated 3 years ago.
5.6 match 2.77 score 33 scripts 2 dependentstesselle
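A short reportROC sketch for illustration on simulated data; the gold/predictor argument names follow the package's documented single-call interface (an assumption here):

    library(reportROC)
    gold   <- rbinom(200, 1, 0.5)               # 0/1 reference standard (simulated)
    marker <- rnorm(200) + gold                 # continuous marker, higher in cases
    reportROC(gold = gold, predictor = marker, plot = TRUE)   # cutoff, AUC, SEN, SPE, etc.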
kairos:Analysis of Chronological Patterns from Archaeological Count Data
A toolkit for absolute and relative dating and analysis of chronological patterns. This package includes functions for chronological modeling and dating of archaeological assemblages from count data. It provides methods for matrix seriation. It also allows computing time point estimates and density estimates of the occupation and duration of an archaeological site.
Maintained by Nicolas Frerebeau. Last updated 13 days ago.
chronologymatrix-seriationarchaeologyarchaeological-science
3.3 match 4.66 score 11 scripts 1 dependentsmwheymans
psfmi:Prediction Model Pooling, Selection and Performance Evaluation Across Multiply Imputed Datasets
Pooling, backward and forward selection of linear, logistic and Cox regression models in multiply imputed datasets. Backward and forward selection can be done from the pooled model using Rubin's Rules (RR), the D1, D2, D3, D4 and the median p-values method. This is also possible for Mixed models. The models can contain continuous, dichotomous, categorical and restricted cubic spline predictors and interaction terms between all these types of predictors. The stability of the models can be evaluated using (cluster) bootstrapping. The package further contains functions to pool model performance measures such as ROC/AUC, Reclassification, R-squared, scaled Brier score, H&L test and calibration plots for logistic regression models. Internal validation can be done across multiply imputed datasets with cross-validation or bootstrapping. The adjusted intercept after shrinkage of pooled regression coefficients can be obtained. Backward and forward selection as part of internal validation is possible. A function to externally validate logistic prediction models in multiply imputed datasets is available, as is a function to compare models. For Cox models a strata variable can be included. Eekhout (2017) <doi:10.1186/s12874-017-0404-7>. Wiel (2009) <doi:10.1093/biostatistics/kxp011>. Marshall (2009) <doi:10.1186/1471-2288-9-57>.
Maintained by Martijn Heymans. Last updated 2 years ago.
cox-regressionimputationimputed-datasetslogisticmultiple-imputationpoolpredictorregressionselectionsplinespline-predictors
2.1 match 10 stars 7.17 score 70 scriptsbioc
PDATK:Pancreatic Ductal Adenocarcinoma Tool-Kit
Pancreatic ductal adenocarcinoma (PDA) has a relatively poor prognosis and is one of the most lethal cancers. Molecular classification of gene expression profiles holds the potential to identify meaningful subtypes which can inform therapeutic strategy in the clinical setting. The Pancreatic Cancer Adenocarcinoma Tool-Kit (PDATK) provides an S4 class-based interface for performing unsupervised subtype discovery, cross-cohort meta-clustering, gene-expression-based classification, and subsequent survival analysis to identify prognostically useful subtypes in pancreatic cancer and beyond. Two novel methods, Consensus Subtypes in Pancreatic Cancer (CSPC) and Pancreatic Cancer Overall Survival Predictor (PCOSP) are included for consensus-based meta-clustering and overall-survival prediction, respectively. Additionally, four published subtype classifiers and three published prognostic gene signatures are included to allow users to easily recreate published results, apply existing classifiers to new data, and benchmark the relative performance of new methods. The use of existing Bioconductor classes as input to all PDATK classes and methods enables integration with existing Bioconductor datasets, including the 21 pancreatic cancer patient cohorts available in the MetaGxPancreas data package. PDATK has been used to replicate results from Sandhu et al (2019) [https://doi.org/10.1200/cci.18.00102] and an additional paper is in the works using CSPC to validate subtypes from the included published classifiers, both of which use the data available in MetaGxPancreas. The inclusion of subtype centroids and prognostic gene signatures from these and other publications will enable researchers and clinicians to classify novel patient gene expression data, allowing the direct clinical application of the classifiers included in PDATK. Overall, PDATK provides a rich set of tools to identify and validate useful prognostic and molecular subtypes based on gene-expression data, benchmark new classifiers against existing ones, and apply discovered classifiers on novel patient data to inform clinical decision making.
Maintained by Benjamin Haibe-Kains. Last updated 5 months ago.
geneexpressionpharmacogeneticspharmacogenomicssoftwareclassificationsurvivalclusteringgeneprediction
3.5 match 1 stars 4.31 score 17 scriptsmodeloriented
survex:Explainable Machine Learning in Survival Analysis
Survival analysis models are commonly used in medicine and other areas. Many of them are too complex to be interpreted by humans. Exploration and explanation are needed, but standard methods do not give a broad enough picture. 'survex' provides easy-to-apply methods for explaining survival models, both complex black boxes and simpler statistical models. They include methods specific to survival analysis such as SurvSHAP(t) introduced in Krzyzinski et al. (2023) <doi:10.1016/j.knosys.2022.110234>, SurvLIME described in Kovalev et al. (2020) <doi:10.1016/j.knosys.2020.106164> as well as extensions of existing ones described in Biecek et al. (2021) <doi:10.1201/9780429027192>.
Maintained by Mikołaj Spytek. Last updated 9 months ago.
biostatisticsbrier-scorescensored-datacox-modelcox-regressionexplainable-aiexplainable-machine-learningexplainable-mlexplanatory-model-analysisinterpretable-machine-learninginterpretable-mlmachine-learningprobabilistic-machine-learningshapsurvival-analysistime-to-eventvariable-importancexai
1.8 match 110 stars 8.40 score 114 scriptscran
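A survex sketch for illustration, following the pattern in the package documentation: a Cox model on the veteran data from the survival package is wrapped in an explainer (x = TRUE, model = TRUE keep the pieces the explainer needs):

    library(survex)
    library(survival)
    cph <- coxph(Surv(time, status) ~ ., data = veteran,
                 x = TRUE, model = TRUE)
    expl <- explain(cph)            # wrap the Cox model in a survex explainer
    model_parts(expl)               # permutation-based variable importance over time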
randomUniformForest:Random Uniform Forests for Classification, Regression and Unsupervised Learning
Ensemble model for classification, regression and unsupervised learning, based on a forest of unpruned and randomized binary decision trees. Each tree is grown by sampling, with replacement, a set of variables at each node. Each cut-point is generated randomly, according to the continuous Uniform distribution. For each tree, data are either bootstrapped or subsampled. The unsupervised mode introduces clustering, dimension reduction and variable importance, using a three-layer engine. Random Uniform Forests are mainly aimed at lowering correlation between trees (or tree residuals), providing a deep analysis of variable importance and allowing native distributed and incremental learning.
Maintained by Saip Ciss. Last updated 3 years ago.
4.0 match 3 stars 3.77 score 99 scriptsconsbiol-unibern
SDMtune:Species Distribution Model Selection
User-friendly framework that enables the training and the evaluation of species distribution models (SDMs). The package implements functions for data driven variable selection and model tuning and includes numerous utilities to display the results. All the functions used to select variables or to tune model hyperparameters have an interactive real-time chart displayed in the 'RStudio' viewer pane during their execution.
Maintained by Sergio Vignali. Last updated 3 months ago.
hyperparameter-tuningspecies-distribution-modellingvariable-selectioncpp
2.0 match 25 stars 7.37 score 155 scriptslau-mel
rocc:ROC Based Classification
Functions for a classification method based on receiver operating characteristics (ROC). Briefly, features are selected according to their ranked AUC value in the training set. The selected features are merged by the mean value to form a meta-gene. The samples are ranked by their meta-gene value and the meta-gene threshold that has the highest accuracy in splitting the training samples is determined. A new sample is classified by its meta-gene value relative to the threshold. Primarily, the package is aimed at two-class problems in gene expression data, but it might also apply to other problems.
Maintained by Martin Lauss. Last updated 5 years ago.
9.3 match 1.56 score 36 scriptstomasfryda
h2o:R Interface for the 'H2O' Scalable Machine Learning Platform
R interface for 'H2O', the scalable open source machine learning platform that offers parallelized implementations of many supervised and unsupervised machine learning algorithms such as Generalized Linear Models (GLM), Gradient Boosting Machines (including XGBoost), Random Forests, Deep Neural Networks (Deep Learning), Stacked Ensembles, Naive Bayes, Generalized Additive Models (GAM), ANOVA GLM, Cox Proportional Hazards, K-Means, PCA, ModelSelection, Word2Vec, as well as a fully automatic machine learning algorithm (H2O AutoML).
Maintained by Tomas Fryda. Last updated 1 year ago.
1.8 match 3 stars 8.20 score 7.8k scripts 11 dependentsmathijsdeen
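An h2o sketch for illustration; it assumes a local Java runtime so h2o.init() can start a cluster, and uses the iris data purely as a toy multi-class example:

    library(h2o)
    h2o.init()                                   # start a local H2O cluster
    hf  <- as.h2o(iris)                          # push a data frame into the cluster
    fit <- h2o.gbm(x = 1:4, y = "Species", training_frame = hf, nfolds = 3)
    h2o.performance(fit, xval = TRUE)            # cross-validated performance metrics
    h2o.shutdown(prompt = FALSE)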
MDMA:Mathijs Deen's Miscellaneous Auxiliaries
Provides a variety of functions useful for data analysis, selection, manipulation, and graphics.
Maintained by Mathijs Deen. Last updated 11 months ago.
5.3 match 2.70 scorecran
VUROCS:Volume under the ROC Surface for Multi-Class ROC Analysis
Calculates the volume under the ROC surface and its (co)variance for ordered multi-class ROC analysis as well as certain bivariate ordinal measures of association.
Maintained by Hannes Kazianka. Last updated 5 years ago.
14.2 match 1.00 score 2 scriptsdesanou
mglasso:Multiscale Graphical Lasso
Inference of multiscale graphical models with a neighborhood selection approach. The method is based on solving a convex optimization problem combining Lasso and fused-group Lasso penalties. This allows simultaneously inferring a conditional independence graph and a clustering partition. The optimization is based on the Continuation with Nesterov smoothing in a Shrinkage-Thresholding Algorithm solver (Hadj-Selem et al. 2018) <doi:10.1109/TMI.2018.2829802> implemented in Python.
Maintained by Edmond Sanou. Last updated 2 years ago.
3.5 match 2 stars 4.11 score 13 scriptslozalojo
mem:The Moving Epidemic Method
The Moving Epidemic Method, created by T Vega and JE Lozano (2012, 2015) <doi:10.1111/j.1750-2659.2012.00422.x>, <doi:10.1111/irv.12330>, allows the weekly assessment of the epidemic and intensity status to help in routine respiratory infections surveillance in health systems. Allows the comparison of different epidemic indicators, timing and shape with past epidemics and across different regions or countries with different surveillance systems. Also, it gives a measure of the performance of the method in terms of sensitivity and specificity of the alert week.
Maintained by Jose E. Lozano. Last updated 2 years ago.
2.3 match 14 stars 6.24 score 82 scripts 1 dependentsddimmery
tidyhte:Tidy Estimation of Heterogeneous Treatment Effects
Estimates heterogeneous treatment effects using tidy semantics on experimental or observational data. Methods are based on the doubly-robust learner of Kennedy (n.d.) <arXiv:2004.14497>. You provide a simple recipe for what machine learning algorithms to use in estimating the nuisance functions and 'tidyhte' will take care of cross-validation, estimation, model selection, diagnostics and construction of relevant quantities of interest about the variability of treatment effects.
Maintained by Drew Dimmery. Last updated 2 years ago.
2.5 match 14 stars 5.36 score 11 scriptscran
rocbc:Statistical Inference for Box-Cox Based Receiver Operating Characteristic Curves
Generation of Box-Cox based ROC curves and several aspects of inferences and hypothesis testing. Can be used when inferences for one biomarker (Bantis LE, Nakas CT, Reiser B. (2018)<doi:10.1002/bimj.201700107>) are of interest or when comparisons of two correlated biomarkers (Bantis LE, Nakas CT, Reiser B. (2021)<doi:10.1002/bimj.202000128>) are of interest. Provides inferences and comparisons around the AUC, the Youden index, the sensitivity at a given specificity level (and vice versa), the optimal operating point of the ROC curve (in the Youden sense), and the Youden based cutoff.
Maintained by Benjamin Brewer. Last updated 11 months ago.
5.8 match 2.30 scorecran
longROC:Time-Dependent Prognostic Accuracy with Multiply Evaluated Bio Markers or Scores
Time-dependent Receiver Operating Characteristic curves, Area Under the Curve, and Net Reclassification Indexes for repeated measures. It is based on methods in Barbati and Farcomeni (2017) <doi:10.1007/s10260-017-0410-2>.
Maintained by Alessio Farcomeni. Last updated 7 years ago.
12.3 match 1.08 score 12 scriptsbioc
kebabs:Kernel-Based Analysis of Biological Sequences
The package provides functionality for kernel-based analysis of DNA, RNA, and amino acid sequences via SVM-based methods. As core functionality, kebabs implements the following sequence kernels: spectrum kernel, mismatch kernel, gappy pair kernel, and motif kernel. Apart from an efficient implementation of standard position-independent functionality, the kernels are extended in a novel way to take the position of patterns into account for the similarity measure. Because of the flexibility of the kernel formulation, other kernels like the weighted degree kernel or the shifted weighted degree kernel with constant weighting of positions are included as special cases. An annotation-specific variant of the kernels uses annotation information placed along the sequence together with the patterns in the sequence. The package allows for the generation of a kernel matrix or an explicit feature representation in dense or sparse format for all available kernels which can be used with methods implemented in other R packages. With a focus on SVM-based methods, kebabs provides a framework which simplifies the usage of existing SVM implementations in kernlab, e1071, and LiblineaR. Binary and multi-class classification as well as regression tasks can be used in a unified way without having to deal with the different functions, parameters, and formats of the selected SVM. As support for choosing hyperparameters, the package provides cross validation - including grouped cross validation, grid search and model selection functions. For easier biological interpretation of the results, the package computes feature weights for all SVMs and prediction profiles which show the contribution of individual sequence positions to the prediction result and indicate the relevance of sequence sections for the learning result and the underlying biological functions.
Maintained by Ulrich Bodenhofer. Last updated 5 months ago.
supportvectormachineclassificationclusteringregressioncpp
2.0 match 6.58 score 47 scripts 3 dependentsludovikcoba
rrecsys:Environment for Evaluating Recommender Systems
Processes standard recommendation datasets (e.g., a user-item rating matrix) as input and generates rating predictions and lists of recommended items. Standard algorithm implementations which are included in this package are the following: Global/Item/User-Average baselines, Weighted Slope One, Item-Based KNN, User-Based KNN, FunkSVD, BPR and weighted ALS. They can be assessed according to the standard offline evaluation methodology (Shani, et al. (2011) <doi:10.1007/978-0-387-85820-3_8>) for recommender systems using measures such as MAE, RMSE, Precision, Recall, F1, AUC, NDCG, RankScore and coverage measures. The package (Coba, et al.(2017) <doi: 10.1007/978-3-319-60042-0_36>) is intended for rapid prototyping of recommendation algorithms and education purposes.
Maintained by Ludovik Çoba. Last updated 3 years ago.
1.9 match 23 stars 6.84 score 25 scriptsbioc
cardelino:Clone Identification from Single Cell Data
Methods to infer clonal tree configuration for a population of cells using single-cell RNA-seq data (scRNA-seq), and possibly other data modalities. Methods are also provided to assign cells to inferred clones and explore differences in gene expression between clones. These methods can flexibly integrate information from imperfect clonal trees inferred based on bulk exome-seq data, and sparse variant alleles expressed in scRNA-seq data. A flexible beta-binomial error model that accounts for stochastic dropout events as well as systematic allelic imbalance is used.
Maintained by Davis McCarthy. Last updated 5 months ago.
singlecellrnaseqvisualizationtranscriptomicsgeneexpressionsequencingsoftwareexomeseqclonal-clusteringgibbs-samplingscrna-seqsingle-cellsomatic-mutations
1.8 match 61 stars 7.05 score 62 scriptsbioc
wateRmelon:Illumina DNA methylation array normalization and metrics
15 flavours of betas and three performance metrics, with methods for objects produced by methylumi and minfi packages.
Maintained by Leo C Schalkwyk. Last updated 4 months ago.
dnamethylationmicroarraytwochannelpreprocessingqualitycontrol
1.6 match 7.75 score 247 scripts 2 dependentspwwang
plotthis:High-Level Plotting Built Upon 'ggplot2' and Other Plotting Packages
Provides high-level API and a wide range of options to create stunning, publication-quality plots effortlessly. It is built upon 'ggplot2' and other plotting packages, and is designed to be easy to use and to work seamlessly with 'ggplot2' objects. It is particularly useful for creating complex plots with multiple layers, facets, and annotations. It also provides a set of functions to create plots for specific types of data, such as Venn diagrams, alluvial diagrams, and phylogenetic trees. The package is designed to be flexible and customizable, and to work well with the 'ggplot2' ecosystem. The API can be found at <https://pwwang.github.io/plotthis/reference/index.html>.
Maintained by Panwen Wang. Last updated 13 hours ago.
2.3 match 36 stars 5.51 score 2 scriptstdhock
penaltyLearning:Penalty Learning
Implementations of algorithms from Learning Sparse Penalties for Change-point Detection using Max Margin Interval Regression, by Hocking, Rigaill, Vert, Bach <http://proceedings.mlr.press/v28/hocking13.html> published in proceedings of ICML2013.
Maintained by Toby Dylan Hocking. Last updated 6 months ago.
2.0 match 16 stars 6.13 score 129 scripts 2 dependentsdrizopoulos
JMbayes:Joint Modeling of Longitudinal and Time-to-Event Data under a Bayesian Approach
Shared parameter models for the joint modeling of longitudinal and time-to-event data using MCMC; Dimitris Rizopoulos (2016) <doi:10.18637/jss.v072.i07>.
Maintained by Dimitris Rizopoulos. Last updated 4 years ago.
joint-modelslongitudinal-responsesprediction-modelsurvival-analysisopenblascppopenmpjags
1.8 match 60 stars 6.98 score 80 scriptsmaizhou
emplik:Empirical Likelihood Ratio for Censored/Truncated Data
Empirical likelihood ratio tests and confidence intervals for means/quantiles/hazards from possibly censored and/or truncated data. In particular, the empirical likelihood for the Kaplan-Meier/Nelson-Aalen estimator. Now does AFT regression too.
Maintained by Mai Zhou. Last updated 3 months ago.
3.6 match 3.37 score 39 scripts 13 dependentsbioc
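A minimal emplik sketch for illustration, testing a mean constraint on an uncensored sample with el.test():

    library(emplik)
    x <- rexp(50)             # uncensored sample
    el.test(x, mu = 1)        # empirical likelihood ratio test that the mean equals 1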
tidytof:Analyze High-dimensional Cytometry Data Using Tidy Data Principles
This package implements an interactive, scientific analysis pipeline for high-dimensional cytometry data built using tidy data principles. It is specifically designed to play well with both the tidyverse and Bioconductor software ecosystems, with functionality for reading/writing data files, data cleaning, preprocessing, clustering, visualization, modeling, and other quality-of-life functions. tidytof implements a "grammar" of high-dimensional cytometry data analysis.
Maintained by Timothy Keyes. Last updated 5 months ago.
singlecellflowcytometrybioinformaticscytometrydata-sciencesingle-celltidyversecpp
1.7 match 19 stars 7.26 score 35 scriptsjonasbhend
easyVerification:Ensemble Forecast Verification for Large Data Sets
Set of tools to simplify application of atomic forecast verification metrics for (comparative) verification of ensemble forecasts to large data sets. The forecast metrics are imported from the 'SpecsVerification' package, and additional forecast metrics are provided with this package. Alternatively, new user-defined forecast scores can be implemented using the example scores provided and applied using the functionality of this package.
Maintained by Jonas Bhend. Last updated 2 years ago.
2.0 match 1 stars 6.04 score 61 scripts 4 dependentsechasnovski
pdqr:Work with Custom Distribution Functions
Create, transform, and summarize custom random variables with distribution functions (analogues of 'p*()', 'd*()', 'q*()', and 'r*()' functions from base R). Two types of distributions are supported: "discrete" (random variable has finite number of output values) and "continuous" (infinite number of values in the form of continuous random variable). Functions for distribution transformations and summaries are available. Implemented approaches often emphasize approximate and numerical solutions: all distributions assume finite support and finite values of density function; some methods implemented with simulation techniques.
Maintained by Evgeni Chasnovski. Last updated 2 years ago.
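A short sketch of the intended workflow, building a distribution function from a sample, converting it, and summarizing it:
library(pdqr)
d_mpg <- new_d(mtcars$mpg, type = "continuous")  # density-like distribution function from data
r_mpg <- as_r(d_mpg)                             # convert to a random generator
r_mpg(5)                                         # draw 5 values
summ_mean(d_mpg)                                 # numeric summary of the distribution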
1.9 match 15 stars 6.37 score 26 scripts 1 dependents
bioc
PathoStat:PathoStat Statistical Microbiome Analysis Package
The purpose of this package is to perform Statistical Microbiome Analysis on metagenomics results from sequencing data samples. In particular, it supports analyses on the PathoScope generated report files. PathoStat provides various functionalities including Relative Abundance charts, Diversity estimates and plots, tests of Differential Abundance, Time Series visualization, and Core OTU analysis.
Maintained by Solaiappan Manimaran. Last updated 5 months ago.
microbiomemetagenomicsgraphandnetworkmicroarraypatternlogicprincipalcomponentsequencingsoftwarevisualizationrnaseqimmunooncology
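The package is driven by a Shiny app; a one-line launch sketch, assuming runPathoStat() is the launcher described in the package vignette:
library(PathoStat)
runPathoStat()  # open the interactive PathoStat app in the browser (assumed launcher)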
2.0 match 8 stars 5.90 score 8 scripts
bblodfon
usefun:A Collection of Useful Functions by John
A set of general functions that I have used in various projects and other R packages. Miscellaneous operations on data frames, matrices and vectors, ROC and PR statistics.
Maintained by John Zobolas. Last updated 6 months ago.
2.5 match 4 stars 4.61 score 102 scripts
modeloriented
fairmodels:Flexible Tool for Bias Detection, Visualization, and Mitigation
Measure fairness metrics in one place for many models. Check how large a model's bias is towards different races, sexes, nationalities, etc. Use measures such as Statistical Parity and Equal Odds to detect discrimination against unprivileged groups. Visualize the bias using heatmaps, radar plots, biplots, bar charts (and more!). Various pre-processing and post-processing bias mitigation algorithms are implemented. The package also supports calculating fairness metrics for regression models. Find more details in Wiśniewski and Biecek (2021) <arXiv:2104.00507>.
Maintained by Jakub Wiśniewski. Last updated 1 months ago.
explain-classifiersexplainable-mlfairnessfairness-comparisonfairness-mlmodel-evaluation
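A hedged sketch following the package README, using the bundled 'german' credit data and a DALEX explainer:
library(fairmodels)
library(DALEX)
data("german")
y <- as.numeric(german$Risk) - 1                                  # 0/1 outcome
model <- glm(Risk ~ ., data = german, family = binomial())
explainer <- explain(model, data = german[, -1], y = y)
fobject <- fairness_check(explainer,
                          protected  = german$Sex,
                          privileged = "male")
plot(fobject)  # fairness check plot across the implemented metrics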
1.5 match 86 stars 7.72 score 51 scripts 1 dependents
bioc
metaseqR2:An R package for the analysis and result reporting of RNA-Seq data by combining multiple statistical algorithms
Provides an interface to several normalization and statistical testing packages for RNA-Seq gene expression data. Additionally, it creates several diagnostic plots, performs meta-analysis by combining the results of several statistical tests and reports the results in an interactive way.
Maintained by Panagiotis Moulos. Last updated 5 days ago.
softwaregeneexpressiondifferentialexpressionworkflowsteppreprocessingqualitycontrolnormalizationreportwritingrnaseqtranscriptionsequencingtranscriptomicsbayesianclusteringcellbiologybiomedicalinformaticsfunctionalgenomicssystemsbiologyimmunooncologyalternativesplicingdifferentialsplicingmultiplecomparisontimecoursedataimportatacseqepigeneticsregressionproprietaryplatformsgenesetenrichmentbatcheffectchipseq
1.9 match 7 stars 6.05 score 3 scripts
caranathunge
promor:Proteomics Data Analysis and Modeling Tools
A comprehensive, user-friendly package for label-free proteomics data analysis and machine learning-based modeling. Data generated from 'MaxQuant' can be easily used to conduct differential expression analysis, build predictive models with top protein candidates, and assess model performance. promor includes a suite of tools for quality control, visualization, missing data imputation (Lazar et. al. (2016) <doi:10.1021/acs.jproteome.5b00981>), differential expression analysis (Ritchie et. al. (2015) <doi:10.1093/nar/gkv007>), and machine learning-based modeling (Kuhn (2008) <doi:10.18637/jss.v028.i05>).
Maintained by Chathurani Ranathunge. Last updated 2 years ago.
biomarkersdifferential-expressionlfqmachine-learningmass-spectrometrymodelingproteomics
2.3 match 15 stars 5.02 score 14 scripts
bioc
omicsViewer:Interactive and explorative visualization of SummarizedExperiment or ExpressionSet using omicsViewer
omicsViewer visualizes an ExpressionSet (or SummarizedExperiment) interactively. The package has separate back and front ends. In the back end, users prepare an ExpressionSet that contains all the information needed for downstream data interpretation. A few extra requirements are imposed on the headers of the phenotype or feature data so that the front end can recognize the provided information, while keeping modifications to the existing ExpressionSet object to a minimum. The pure dependency on R/Bioconductor gives maximum flexibility for statistical analysis in the back end. Once the ExpressionSet is prepared, it can be visualized using the front end, implemented with shiny and plotly. Both features and samples can be selected from data tables or graphs (scatter plot/heatmap). Different types of analyses, such as enrichment analysis (using the Bioconductor package fgsea or Fisher's exact test) and STRING network analysis, are performed on the fly and the results visualized simultaneously. When a subset of samples and a phenotype variable are selected, a significance test on means (t-test or rank-based test, when the phenotype variable is quantitative) or a test of independence (chi-square or Fisher's exact test, when the phenotype is categorical) is performed to test the association between the phenotype of interest and the selected samples. Additionally, other analyses can be added as extra shiny modules. omicsViewer therefore greatly facilitates data exploration: many different hypotheses can be explored in a short time without any knowledge of R. The resulting data can be shared via a shiny server; alternatively, a standalone version of omicsViewer bundled with the designated omics data can be created by integrating it with portable R and shared with collaborators or submitted as supplementary data with a manuscript.
Maintained by Chen Meng. Last updated 2 months ago.
softwarevisualizationgenesetenrichmentdifferentialexpressionmotifdiscoverynetworknetworkenrichment
1.9 match 4 stars 6.02 score 22 scripts
bioc
ClassifyR:A framework for cross-validated classification problems, with applications to differential variability and differential distribution testing
The software formalises a framework for classification and survival model evaluation in R. There are four stages: data transformation, feature selection, model training, and prediction. The requirements of variable types and variable order are fixed, but specialised variables for functions can also be provided. The framework is wrapped in a driver loop that reproducibly carries out a number of cross-validation schemes. Functions for differential mean, differential variability, and differential distribution are included. Additional functions may be developed by the user by creating an interface to the framework.
Maintained by Dario Strbenac. Last updated 7 days ago.
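A hedged minimal sketch of the cross-validated driver loop; crossValidate() and its defaults are assumed from the package documentation, and the data are simulated:
library(ClassifyR)
set.seed(1)
measurements <- matrix(rnorm(100 * 20), nrow = 100,                 # 100 samples x 20 features
                       dimnames = list(NULL, paste0("feature", 1:20)))
classes <- factor(rep(c("Healthy", "Disease"), each = 50))
result <- crossValidate(measurements, classes)  # default feature selection and classifier assumed
result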
1.3 match 5 stars 8.36 score 45 scripts 3 dependents
loelschlaeger
RprobitB:Bayesian Probit Choice Modeling
Bayes estimation of probit choice models, both in the cross-sectional and panel setting. The package can analyze binary, multivariate, ordered, and ranked choices, as well as heterogeneity of choice behavior among deciders. The main functionality includes model fitting via Markov chain Monte Carlo methods, tools for convergence diagnostics, choice data simulation, in-sample and out-of-sample choice prediction, and model selection using information criteria and Bayes factors. The latent class model extension facilitates preference-based decider classification, where the number of latent classes can be inferred via the Dirichlet process or a weight-based updating heuristic. This allows for flexible modeling of choice behavior without the need to impose structural constraints. For a reference on the method see Oelschlaeger and Bauer (2021) <https://trid.trb.org/view/1759753>.
Maintained by Lennart Oelschläger. Last updated 5 months ago.
bayesdiscrete-choiceprobitopenblascppopenmp
2.0 match 4 stars 5.45 score 1 scripts
blasbenito
spatialRF:Easy Spatial Modeling with Random Forest
Automatic generation and selection of spatial predictors for spatial regression with Random Forest. Spatial predictors are surrogates of variables driving the spatial structure of a response variable. The package offers two methods to generate spatial predictors from a distance matrix among training cases: 1) Moran's Eigenvector Maps (MEMs; Dray, Legendre, and Peres-Neto 2006 <DOI:10.1016/j.ecolmodel.2006.02.015>): computed as the eigenvectors of a weighted matrix of distances; 2) RFsp (Hengl et al. <DOI:10.7717/peerj.5518>): columns of the distance matrix used as spatial predictors. Spatial predictors help minimize the spatial autocorrelation of the model residuals and facilitate an honest assessment of the importance scores of the non-spatial predictors. Additionally, functions to reduce multicollinearity, identify relevant variable interactions, tune random forest hyperparameters, assess model transferability via spatial cross-validation, and explore model results via partial dependence curves and interaction surfaces are included in the package. The modelling functions are built around the highly efficient 'ranger' package (Wright and Ziegler 2017 <DOI:10.18637/jss.v077.i01>).
Maintained by Blas M. Benito. Last updated 3 years ago.
random-forestspatial-analysisspatial-regression
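A hedged sketch of the main entry point; the bundled example objects (plant_richness_df, distance_matrix) and the argument and column names below are assumed from the package README:
library(spatialRF)
data(plant_richness_df)
data(distance_matrix)
m <- rf_spatial(
  data = plant_richness_df,
  dependent.variable.name  = "richness_species_vascular",          # assumed response column
  predictor.variable.names = colnames(plant_richness_df)[5:21],    # assumed predictor columns
  distance.matrix     = distance_matrix,
  distance.thresholds = c(0, 1000, 2000)
)
print(m)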
2.0 match 114 stars 5.45 score 49 scripts