R-universe search: misclassification

pcbrendel

multibias:Simultaneous Multi-Bias Adjustment

Quantify the causal effect of a binary exposure on a binary outcome with adjustment for multiple biases. The functions can simultaneously adjust for any combination of uncontrolled confounding, exposure/outcome misclassification, and selection bias. The underlying method generalizes the concept of combining inverse probability of selection weighting with predictive value weighting. Simultaneous multi-bias analysis can be used to enhance the validity and transparency of real-world evidence obtained from observational, longitudinal studies. Based on the work from Paul Brendel, Aracelis Torres, and Onyebuchi Arah (2023) <doi:10.1093/ije/dyad001>.

Maintained by Paul Brendel. Last updated 21 days ago.

causal-inference causal-models epidemiology

54.8 match 5.34 score 7 scripts

kimberlywebb

COMBO:Correcting Misclassified Binary Outcomes in Association Studies

Use frequentist and Bayesian methods to estimate parameters from a binary outcome misclassification model. These methods correct for the problem of "label switching" by assuming that the sum of outcome sensitivity and specificity is at least 1. A description of the analysis methods is available in Hochstedler and Wells (2023) <doi:10.48550/arXiv.2303.10215>.

Maintained by Kimberly Hochstedler Webb. Last updated 20 days ago.

jags cpp

33.5 match 1 star 5.08 score 4 scripts
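Looking only at the probability of an observed positive outcome shows why COMBO needs a constraint such as Se + Sp >= 1. The minimal Python sketch below assumes the simplest case: a single true prevalence p and no covariates. The function name is illustrative and is not part of COMBO. Relabelling the true outcome turns (p, Se, Sp) into (1 - p, 1 - Sp, 1 - Se), and that gives exactly the same probability of an observed positive, so the data alone cannot distinguish the two solutions. The relabelled pair has Se + Sp equal to 2 minus the original sum. Requiring the sum to be at least 1 therefore keeps only one of the two whenever the original sum is above 1.

```python
import math

def observed_positive_prob(p, se, sp):
    """P(recorded outcome = 1) when the true outcome has prevalence p and is
    recorded with sensitivity se and specificity sp (no covariates)."""
    return se * p + (1 - sp) * (1 - p)

# Parameters that satisfy the constraint: Se + Sp = 1.7
p, se, sp = 0.3, 0.9, 0.8
# "Label-switched" parameters: the meanings of outcome 1 and 0 are swapped
p_sw, se_sw, sp_sw = 1 - p, 1 - sp, 1 - se   # Se + Sp = 0.3

prob = observed_positive_prob(p, se, sp)
prob_sw = observed_positive_prob(p_sw, se_sw, sp_sw)

# Both parameter sets imply the same distribution of the recorded outcome
print(prob, prob_sw, math.isclose(prob, prob_sw))
print("constraint satisfied:", se + sp >= 1, se_sw + sp_sw >= 1)
```

With covariates the same symmetry applies to the regression parameters, which is what a constraint on sensitivity and specificity resolves.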

kimberlywebb

COMMA:Correcting Misclassified Mediation Analysis

Use three methods to estimate parameters from a mediation analysis with a binary misclassified mediator. These methods correct for the problem of "label switching" using Youden's J criterion. A detailed description of the analysis methods is available in Webb and Wells (2024), "Effect estimation in the presence of a misclassified binary mediator" <doi:10.48550/arXiv.2407.06970>.

Maintained by Kimberly Webb. Last updated 3 months ago.

25.7 match 5.18 score 7 scripts

dhaine

episensr:Basic Sensitivity Analysis of Epidemiological Results

Basic sensitivity analysis of observed relative risks, adjusting for unmeasured confounding, misclassification of the exposure or outcome, or both. It follows the bias analysis methods and examples from the book by Lash T.L., Fox M.P., and Fink A.K., "Applying Quantitative Bias Analysis to Epidemiologic Data" ('Springer', 2021).

Maintained by Denis Haine. Last updated 1 year ago.

bias epidemiology sensitivity-analysis statistics

17.3 match 13 stars 6.48 score 39 scripts 1 dependent
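On the misclassification side, the basic correction in the Lash, Fox and Fink approach back-calculates the expected true counts from assumed bias parameters. Below is a minimal Python sketch for non-differential exposure misclassification in a cohort 2x2 table. The counts, sensitivity and specificity are invented, and the function names are hypothetical, not episensr's interface. Within an outcome stratum of size N, the observed exposed count is a = Se*A + (1 - Sp)*(N - A). Solving for A gives A = (a - (1 - Sp)*N) / (Se + Sp - 1).

```python
def correct_exposure_counts(exposed, unexposed, se, sp):
    """Expected true (exposed, unexposed) counts in one outcome stratum, given the
    observed counts and the exposure classification's sensitivity and specificity."""
    total = exposed + unexposed
    true_exposed = (exposed - (1 - sp) * total) / (se + sp - 1)
    return true_exposed, total - true_exposed

def risk_ratio(a, b, c, d):
    """a, b = exposed / unexposed cases; c, d = exposed / unexposed non-cases."""
    return (a / (a + c)) / (b / (b + d))

# Hypothetical observed cohort table
a, b = 40, 20        # cases: exposed, unexposed
c, d = 60, 80        # non-cases: exposed, unexposed
se, sp = 0.90, 0.95  # assumed non-differential bias parameters

A, B = correct_exposure_counts(a, b, se, sp)   # corrected cases
C, D = correct_exposure_counts(c, d, se, sp)   # corrected non-cases

rr_observed = risk_ratio(a, b, c, d)
rr_corrected = risk_ratio(A, B, C, D)
print(f"observed RR = {rr_observed:.2f}, corrected RR = {rr_corrected:.2f}")
```

With these inputs the observed risk ratio is 2.00 and the corrected one is about 2.35. That direction is expected, because non-differential misclassification of a binary exposure tends to bias estimates toward the null. If the assumed Se and Sp are implausible for the data, the corrected counts can come out negative.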

dcgerard

updog:Flexible Genotyping for Polyploids

Implements empirical Bayes approaches to genotype polyploids from next generation sequencing data while accounting for allele bias, overdispersion, and sequencing error. The main functions are flexdog() and multidog(), which allow the specification of many different genotype distributions. Also provided are functions to simulate genotypes, rgeno(), and read-counts, rflexdog(), as well as functions to calculate oracle genotyping error rates, oracle_mis(), and correlation with the true genotypes, oracle_cor(). These latter two functions are useful for read depth calculations. Run browseVignettes(package = "updog") in R for example usage. See Gerard et al. (2018) <doi:10.1534/genetics.118.301468> and Gerard and Ferrao (2020) <doi:10.1093/bioinformatics/btz852> for details on the implemented methods.

Maintained by David Gerard. Last updated 1 year ago.

openblas cpp openmp

8.2 match 28 stars 8.45 score 83 scripts 2 dependents

mayamathur

EValue:Sensitivity Analyses for Unmeasured Confounding and Other Biases in Observational Studies and Meta-Analyses

Conducts sensitivity analyses for unmeasured confounding, selection bias, and measurement error (individually or in combination; VanderWeele & Ding (2017) <doi:10.7326/M16-2607>; Smith & VanderWeele (2019) <doi:10.1097/EDE.0000000000001032>; VanderWeele & Li (2019) <doi:10.1093/aje/kwz133>; Smith & VanderWeele (2021) <arXiv:2005.02908>). Also conducts sensitivity analyses for unmeasured confounding in meta-analyses (Mathur & VanderWeele (2020a) <doi:10.1080/01621459.2018.1529598>; Mathur & VanderWeele (2020b) <doi:10.1097/EDE.0000000000001180>) and for additive measures of effect modification (Mathur et al., under review).

Maintained by Maya B. Mathur. Last updated 3 years ago.

6.0 match 3 stars 6.35 score 99 scripts 1 dependent
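The basic E-value for a risk ratio (VanderWeele & Ding, 2017) is one of the quantities the EValue package reports, and it has a closed form. For RR > 1 it is RR + sqrt(RR * (RR - 1)); protective estimates are inverted first. The Python sketch below handles only risk ratios, without the odds-ratio or hazard-ratio approximations. Its function names are hypothetical and do not reflect the package's API. For a confidence interval, the same formula is applied to the limit closest to the null, and the E-value is 1 when the interval includes 1.

```python
import math

def e_value(rr):
    """E-value for a risk-ratio point estimate (VanderWeele & Ding, 2017)."""
    if rr <= 0:
        raise ValueError("risk ratio must be positive")
    if rr < 1:
        rr = 1 / rr  # protective estimates: work on the inverted scale
    return rr + math.sqrt(rr * (rr - 1))

def e_value_ci(lower, upper):
    """E-value for the confidence limit closest to the null (1 if the CI covers 1)."""
    if lower <= 1 <= upper:
        return 1.0
    return e_value(lower if lower > 1 else upper)

print(e_value(2.0), e_value_ci(1.2, 3.0))
```

For example, an observed RR of 2 gives an E-value of 2 + sqrt(2), about 3.41.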

wolfganglederer

simex:SIMEX- And MCSIMEX-Algorithm for Measurement Error Models

Implementation of the SIMEX-Algorithm by Cook & Stefanski (1994) <doi:10.1080/01621459.1994.10476871> and MCSIMEX by Küchenhoff, Mwalili & Lesaffre (2006) <doi:10.1111/j.1541-0420.2005.00396.x>.

Maintained by Wolfgang Lederer. Last updated 6 years ago.

5.3 match 11 stars 6.68 score 75 scripts 7 dependents
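SIMEX has two steps. In the simulation step, extra measurement error is added at increasing levels lambda and the naive estimate is recomputed each time. In the extrapolation step, the trend is extended back to lambda = -1, the level that corresponds to no measurement error. Below is a minimal Python (numpy) sketch for the slope of a simple linear regression. It assumes additive normal error with known standard deviation sigma_u and uses a quadratic extrapolant. The names, the lambda grid and the number of replicates are illustrative choices and do not reflect the simex package's interface.

```python
import numpy as np

def naive_slope(w, y):
    """Ordinary least-squares slope of y on w (intercept included)."""
    return np.polyfit(w, y, 1)[0]

def simex_slope(w, y, sigma_u, lambdas=(0.5, 1.0, 1.5, 2.0), n_sim=50, seed=0):
    """SIMEX slope estimate when w = x + u with u ~ N(0, sigma_u**2), sigma_u known."""
    rng = np.random.default_rng(seed)
    grid = [0.0]
    estimates = [naive_slope(w, y)]
    for lam in lambdas:
        # Simulation step: raise the error variance to (1 + lam) * sigma_u**2 by
        # adding extra noise, then average the naive slope over n_sim replicates.
        sims = [
            naive_slope(w + np.sqrt(lam) * sigma_u * rng.standard_normal(w.size), y)
            for _ in range(n_sim)
        ]
        grid.append(lam)
        estimates.append(np.mean(sims))
    # Extrapolation step: fit a quadratic in lambda and evaluate it at lambda = -1,
    # which corresponds to zero measurement-error variance.
    coef = np.polyfit(grid, estimates, 2)
    return float(np.polyval(coef, -1.0))

# Simulated data: true slope 1, Var(x) = 1, measurement-error variance 0.5
rng = np.random.default_rng(42)
n = 5000
x = rng.standard_normal(n)
y = x + 0.5 * rng.standard_normal(n)
sigma_u = np.sqrt(0.5)
w = x + sigma_u * rng.standard_normal(n)  # error-prone version of x

naive_est = float(naive_slope(w, y))
simex_est = simex_slope(w, y, sigma_u)
print(f"naive slope: {naive_est:.3f}, SIMEX slope: {simex_est:.3f}")
```

Here the naive slope is attenuated to roughly 0.67 (the reliability ratio 1/1.5). In this setting the quadratic extrapolant typically recovers much of that attenuation, though not all of it.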

aryanrzn

ATE.ERROR:Estimating ATE with Misclassified Outcomes and Mismeasured Covariates

Addressing measurement error in covariates and misclassification in binary outcome variables within causal inference, the 'ATE.ERROR' package implements inverse probability weighted estimation methods proposed by Shu and Yi (2017, <doi:10.1177/0962280217743777>; 2019, <doi:10.1002/sim.8073>). These methods correct errors to accurately estimate average treatment effects (ATE). The package includes two main functions: ATE.ERROR.Y() for handling misclassification in the outcome variable and ATE.ERROR.XY() for correcting both outcome misclassification and covariate measurement error. It employs logistic regression for treatment assignment and uses bootstrap sampling to calculate standard errors and confidence intervals, with simulated datasets provided for practical demonstration.

Maintained by Aryan Rezanezhad. Last updated 6 months ago.

9.6 match 3.71 score 16 scripts

chrhennig

fpc:Flexible Procedures for Clustering

Various methods for clustering and cluster validation. Fixed point clustering. Linear regression clustering. Clustering by merging Gaussian mixture components. Symmetric and asymmetric discriminant projections for visualisation of the separation of groupings. Cluster validation statistics for distance based clustering including corrected Rand index. Standardisation of cluster validation statistics by random clusterings and comparison between many clustering methods and numbers of clusters based on this. Cluster-wise cluster stability assessment. Methods for estimation of the number of clusters: Calinski-Harabasz, Tibshirani and Walther's prediction strength, Fang and Wang's bootstrap stability. Gaussian/multinomial mixture fitting for mixed continuous/categorical variables. Variable-wise statistics for cluster interpretation. DBSCAN clustering. Interface functions for many clustering methods implemented in R, including estimating the number of clusters with kmeans, pam and clara. Modality diagnosis for Gaussian mixtures. For an overview see package?fpc.

Maintained by Christian Hennig. Last updated 6 months ago.

3.8 match 11 stars 9.25 score 2.6k scripts 70 dependents

ejikeugba

gofcat:Goodness-of-Fit Measures for Categorical Response Models

A post-estimation method for categorical response models (CRM). Inputs from objects of class serp(), clm(), polr(), multinom(), mlogit(), vglm() and glm() are currently supported. Available tests include the Hosmer-Lemeshow tests for binary, multinomial and ordinal logistic regression, and the Lipsitz and Pulkstenis-Robinson tests for ordinal models. The proportional odds, adjacent-category, and constrained continuation-ratio models are particularly supported at the ordinal level. The proportional odds assumption in ordinal models can also be tested with the Brant and likelihood-ratio tests. Moreover, several summary measures of predictive strength (pseudo R-squared) and some useful error metrics, including the Brier score, misclassification rate and log loss, are available for binary, multinomial and ordinal models. Ugba, E. R. and Gertheiss, J. (2018) <http://www.statmod.org/workshops_archive_proceedings_2018.html>.

Maintained by Ejike R. Ugba. Last updated 2 years ago.

brant-test brier-scores hosmer-lemeshow-test likelihood-ratio-test lipsitz-test log-loss-score-metric logistic-regression misclassification ordinal-regression proportional-odds-test pseudo-r2 pulkstenis-robinson-test

10.5 match 2 stars 3.18 score 15 scripts

formidify

BayesSenMC:Different Models of Posterior Distributions of Adjusted Odds Ratio

Generates posterior distributions of the adjusted odds ratio under different priors for sensitivity and specificity, and plots the models for comparison. It also provides estimates for the model specifications using diagnostics of exposure status with a non-linear mixed effects model. It implements the methods first proposed in <doi:10.1016/j.annepidem.2006.04.001> and <doi:10.1177/0272989X09353452>.

Maintained by Jinhui Yang. Last updated 4 years ago.

cpp

11.3 match 2.70 score

mblumuga

abc:Tools for Approximate Bayesian Computation (ABC)

Implements several ABC algorithms for performing parameter estimation, model selection, and goodness-of-fit. Cross-validation tools are also available for measuring the accuracy of ABC estimates and for calculating the misclassification probabilities of different models.

Maintained by Blum Michael. Last updated 3 months ago.

4.3 match 1 star 6.93 score 410 scripts 9 dependents

valentint

robust:Port of the S+ "Robust Library"

Methods for robust statistics, a state of the art in the early 2000s, notably for robust regression and robust multivariate analysis.

Maintained by Valentin Todorov. Last updated 7 months ago.

fortran openblas

3.8 match 7.52 score 572 scripts 8 dependents

psobczyk

varclust:Variables Clustering

Performs clustering of quantitative variables, assuming that clusters lie in low-dimensional subspaces. The segmentation of variables, the number of clusters and their dimensions are selected based on BIC. Candidate models are identified from many runs of the K-means algorithm with different random initializations of cluster centers.

Maintained by Piotr Sobczyk. Last updated 4 years ago.

5.0 match 3 stars 4.32 score 14 scripts

thie1e

cutpointr:Determine and Evaluate Optimal Cutpoints in Binary Classification Tasks

Estimate cutpoints that optimize a specified metric in binary classification tasks and validate performance using bootstrapping. Some methods for more robust cutpoint estimation are supported, e.g. a parametric method assuming normal distributions, bootstrapped cutpoints, and smoothing of the metric values per cutpoint using Generalized Additive Models. Various plotting functions are included. For an overview of the package see Thiele and Hirschfeld (2021) <doi:10.18637/jss.v098.i11>.

Maintained by Christian Thiele. Last updated 3 months ago.

bootstrapping cutpoint-optimization roc-curve cpp

2.0 match 88 stars 10.44 score 322 scripts 1 dependent
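cutpointr searches over candidate cutpoints for the one that optimizes a user-chosen metric. The minimal Python sketch below illustrates that core idea. It assumes Youden's J (sensitivity + specificity - 1) as the metric, treats scores at or above the cutpoint as positive, and runs a plain search over the observed scores with no bootstrapping or smoothing. The function name and the toy data are hypothetical; this is not cutpointr's interface.

```python
def youden_cutpoint(scores, labels):
    """Return (cutpoint, J) maximizing Youden's J = sensitivity + specificity - 1,
    classifying score >= cutpoint as positive; labels are 0/1."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_cut, best_j = None, -1.0
    for cut in sorted(set(scores)):  # candidate cutpoints: the observed scores
        tp = sum(1 for s, y in zip(scores, labels) if s >= cut and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < cut and y == 0)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:
            best_cut, best_j = cut, j
    return best_cut, best_j

scores = [0.1, 0.3, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 1, 0, 1, 1, 1, 1]
cut, j = youden_cutpoint(scores, labels)
print(cut, j)
```

For the toy data the best cutpoint is 0.6, with J of about 0.8 (sensitivity 4/5, specificity 3/3).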

jingxuanh

xtune:Regularized Regression with Feature-Specific Penalties Integrating External Information

Extends standard penalized regression (Lasso, Ridge, and Elastic-net) to allow feature-specific shrinkage based on external information, with the goal of achieving better prediction accuracy and variable selection. Examples of external information include the grouping of predictors, prior knowledge of biological importance, external p-values, function annotations, etc. The choice of multiple tuning parameters is done using an Empirical Bayes approach. A majorization-minimization algorithm is employed for implementation.

Maintained by Jingxuan He. Last updated 2 years ago.

5.0 match 3.90 score 16 scripts

santagos

dad:Three-Way / Multigroup Data Analysis Through Densities

The data consist of a set of variables measured on several groups of individuals. Each group is associated with an estimated probability density function. The package provides tools to create or manage such data, and functional methods (principal component analysis, multidimensional scaling, cluster analysis, discriminant analysis...) for such probability densities.

Maintained by Pierre Santagostini. Last updated 4 months ago.

3.4 match 5.33 score 92 scripts

danheck

RRreg:Correlation and Regression Analyses for Randomized Response Data

Univariate and multivariate methods to analyze randomized response (RR) survey designs (e.g., Warner, S. L. (1965). Randomized response: A survey technique for eliminating evasive answer bias. Journal of the American Statistical Association, 60, 63–69, <doi:10.2307/2283137>). Besides univariate estimates of true proportions, RR variables can be used for correlations, as the dependent variable in a logistic regression (with or without random effects), or as predictors in a linear regression (Heck, D. W., & Moshagen, M. (2018). RRreg: An R package for correlation and regression analyses of randomized response data. Journal of Statistical Software, 85(2), 1–29, <doi:10.18637/jss.v085.i02>). For simulations and the estimation of statistical power, RR data can be generated according to several models. The implemented methods also allow testing the link between continuous covariates and dishonesty in cheating paradigms such as the coin-toss or dice-roll task (Moshagen, M., & Hilbig, B. E. (2017). The statistical analysis of cheating paradigms. Behavior Research Methods, 49, 724–732, <doi:10.3758/s13428-016-0729-x>).

Maintained by Daniel W. Heck. Last updated 2 years ago.

3.3 match 3 stars 5.46 score 48 scripts
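Warner's original design, cited above, gives a closed-form prevalence estimate. In this minimal Python sketch, each respondent is assumed to answer the sensitive question with known probability p and its negation otherwise, with p != 0.5. The function name is hypothetical and is not RRreg's API. The observed "yes" rate satisfies lambda = p*pi + (1 - p)*(1 - pi). Solving gives pi_hat = (lambda_hat - (1 - p)) / (2p - 1), with standard error sqrt(lambda_hat*(1 - lambda_hat)/n) / |2p - 1|.

```python
import math

def warner_estimate(n_yes, n, p):
    """Prevalence estimate and standard error under Warner's (1965) design, where
    each respondent answers the direct question with probability p (p != 0.5)."""
    if math.isclose(p, 0.5):
        raise ValueError("p = 0.5 makes the prevalence unidentifiable")
    lam = n_yes / n                          # observed share of "yes" answers
    pi_hat = (lam - (1 - p)) / (2 * p - 1)   # invert lam = p*pi + (1-p)*(1-pi)
    se = math.sqrt(lam * (1 - lam) / n) / abs(2 * p - 1)
    return pi_hat, se

pi_hat, se = warner_estimate(n_yes=380, n=1000, p=0.7)
print(f"estimated prevalence {pi_hat:.3f} (SE {se:.3f})")
```

With 380 "yes" answers out of 1,000 and p = 0.7, the estimate is 0.2 with a standard error of about 0.038. In small samples the estimate can fall outside [0, 1].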

johnnyzhz

logistic4p:Logistic Regression with Misclassification in Dependent Variables

Error in a binary dependent variable, also known as misclassification, has not drawn much attention in psychology. Ignoring misclassification in logistic regression can result in misleading parameter estimates and statistical inference. This package conducts logistic regression analysis with misclassification in outcome variables.

Maintained by Zhiyong Zhang. Last updated 1 year ago.

16.3 match 1.00 score 8 scripts

chjackson

msmbayes:Bayesian Multi-State Models for Intermittently-Observed Data

Bayesian multi-state models for intermittently-observed data. Markov and phase-type semi-Markov models, and misclassification hidden Markov models.

Maintained by Christopher Jackson. Last updated 4 months ago.

3.7 match 4 stars 4.26 score 3 scripts

svkucheryavski

mdatools:Multivariate Data Analysis for Chemometrics

Projection based methods for preprocessing, exploring and analysis of multivariate data used in chemometrics. S. Kucheryavskiy (2020) <doi:10.1016/j.chemolab.2020.103937>.

Maintained by Sergey Kucheryavskiy. Last updated 8 months ago.

2.0 match 35 stars 7.37 score 220 scripts 1 dependent

cran

SAMBA:Selection and Misclassification Bias Adjustment for Logistic Regression Models

Health research using data from electronic health records (EHR) has gained popularity, but misclassification of EHR-derived disease status and lack of representativeness of the study sample can result in substantial bias in effect estimates and can impact power and type I error for association tests. Here, the assumed target of inference is the relationship between binary disease status and predictors modeled using a logistic regression model. 'SAMBA' implements several methods for obtaining bias-corrected point estimates along with valid standard errors as proposed in Beesley and Mukherjee (2020) <doi:10.1101/2019.12.26.19015859>, currently under review.

Maintained by Alexander Rix. Last updated 5 years ago.

3.4 match 4.18 score 1 dependent

bioc

MiPP:Misclassification Penalized Posterior Classification

This package finds optimal sets of genes that separate samples into two or more classes.

Maintained by Sukwoo Kim. Last updated 5 months ago.

microarray classification

3.1 match 3.60 score 1 script

andrewtitman

nhm:Non-Homogeneous Markov and Hidden Markov Multistate Models

Fits non-homogeneous Markov multistate models and misclassification-type hidden Markov models in continuous time to intermittently observed data. Implements the methods in Titman (2011) <doi:10.1111/j.1541-0420.2010.01550.x>. Uses direct numerical solution of the Kolmogorov forward equations to calculate the transition probabilities.

Maintained by Andrew Titman. Last updated 1 year ago.

5.6 match 1 star 2.00 score 4 scripts

modal-inria

RMixtCompUtilities:Utility Functions for 'MixtComp' Outputs

Mixture Composer <https://github.com/modal-inria/MixtComp> is a project to build mixture models with heterogeneous data sets and partially missing data management. This package contains graphical, getter and some utility functions to facilitate the analysis of 'MixtComp' output.

Maintained by Quentin Grimonprez. Last updated 10 months ago.

clustering cpp heterogeneous-data missing-data mixed-data mixture-model statistics

2.0 match 13 stars 5.19 score 2 scripts 1 dependent

cran

tree:Classification and Regression Trees

Classification and regression trees.

Maintained by Brian Ripley. Last updated 3 months ago.

2.0 match 1 star 4.76 score 13 dependents

cran

ordinalNet:Penalized Ordinal Regression

Fits ordinal regression models with an elastic net penalty. Supported model families include cumulative probability, stopping ratio, continuation ratio, and adjacent category. These families are a subset of vector GLMs which belong to a model class we call the elementwise link multinomial-ordinal (ELMO) class. Each family in this class links a vector of covariates to a vector of class probabilities. Each of these families has a parallel form, which is appropriate for ordinal response data, as well as a nonparallel form that is appropriate for an unordered categorical response, or as a more flexible model for ordinal data. The parallel model has a single set of coefficients, whereas the nonparallel model has a set of coefficients for each response category except the baseline category. It is also possible to fit a model with both parallel and nonparallel terms, which we call the semi-parallel model. The semi-parallel model has the flexibility of the nonparallel model, but the elastic net penalty shrinks it toward the parallel model. For details, refer to Wurm, Hanlon, and Rathouz (2021) <doi:10.18637/jss.v099.i06>.

Maintained by Michael Wurm. Last updated 3 years ago.

3.2 match 1 star 2.84 score 29 scripts 4 dependents

moran79

folda:Forward Stepwise Discriminant Analysis with Pillai's Trace

A novel forward stepwise discriminant analysis framework that integrates Pillai's trace with Uncorrelated Linear Discriminant Analysis (ULDA), providing an improvement over traditional stepwise LDA methods that rely on Wilks' Lambda. A stand-alone ULDA implementation is also provided, offering a more general solution than the one available in the 'MASS' package. It automatically handles missing values and provides visualization tools. For more details, see Wang (2024) <doi:10.48550/arXiv.2409.03136>.

Maintained by Siyu Wang. Last updated 5 months ago.

cpp

1.8 match 2 stars 5.18 score 6 scripts 1 dependent

jaredhuling

oem:Orthogonalizing EM: Penalized Regression for Big Tall Data

Solves penalized least squares problems for big tall data using the orthogonalizing EM algorithm of Xiong et al. (2016) <doi:10.1080/00401706.2015.1054436>. The main fitting function is oem() and the functions cv.oem() and xval.oem() are for cross validation, the latter being an accelerated cross validation function for linear models. The big.oem() function allows for out of memory fitting. A description of the underlying methods and code interface is described in Huling and Chien (2022) <doi:10.18637/jss.v104.i06>.

Maintained by Jared Huling. Last updated 8 months ago.

group-lasso lasso machine-learning mcp oem oem-algorithm penalized-regression scad variable-selection openblas cpp openmp

1.5 match 27 stars 6.02 score 26 scripts 1 dependent

philipppro

measures:Performance Measures for Statistical Learning

Provides a large collection of statistical performance measures, covering regression, (multiclass) classification and multilabel classification. The measures come mainly from the 'mlr' package and were programmed by several 'mlr' developers.

Maintained by Philipp Probst. Last updated 4 years ago.

2.0 match 1 star 4.47 score 88 scripts 2 dependents

lindanab

mecor:Measurement Error Correction in Linear Models with a Continuous Outcome

Covariate measurement error correction is implemented by means of regression calibration by Carroll RJ, Ruppert D, Stefanski LA & Crainiceanu CM (2006, ISBN:1584886331), efficient regression calibration by Spiegelman D, Carroll RJ & Kipnis V (2001) <doi:10.1002/1097-0258(20010115)20:1%3C139::AID-SIM644%3E3.0.CO;2-K> and maximum likelihood estimation by Bartlett JW, Stavola DBL & Frost C (2009) <doi:10.1002/sim.3713>. Outcome measurement error correction is implemented by means of the method of moments by Buonaccorsi JP (2010, ISBN:1420066560) and efficient method of moments by Keogh RH, Carroll RJ, Tooze JA, Kirkpatrick SI & Freedman LS (2014) <doi:10.1002/sim.7011>. Standard error estimation of the corrected estimators is implemented by means of the Delta method by Rosner B, Spiegelman D & Willett WC (1990) <doi:10.1093/oxfordjournals.aje.a115715> and Rosner B, Spiegelman D & Willett WC (1992) <doi:10.1093/oxfordjournals.aje.a116453>, the Fieller method described by Buonaccorsi JP (2010, ISBN:1420066560), and the Bootstrap by Carroll RJ, Ruppert D, Stefanski LA & Crainiceanu CM (2006, ISBN:1584886331).

Maintained by Linda Nab. Last updated 3 years ago.

linear-models measurement-error statistics

1.8 match 6 stars 5.07 score 13 scripts

christinaheinze

CondIndTests:Nonlinear Conditional Independence Tests

Code for a variety of nonlinear conditional independence tests: Kernel conditional independence test (Zhang et al., UAI 2011, <arXiv:1202.3775>), Residual Prediction test (based on Shah and Buehlmann, <arXiv:1511.03334>), Invariant environment prediction, Invariant target prediction, Invariant residual distribution test, Invariant conditional quantile prediction (all from Heinze-Deml et al., <arXiv:1706.08576>).

Maintained by Christina Heinze-Deml. Last updated 5 years ago.

1.8 match 17 stars 4.91 score 32 scripts 1 dependent

cran

hmeasure:The H-Measure and Other Scalar Classification Performance Metrics

Classification performance metrics that are derived from the ROC curve of a classifier. The package includes the H-measure performance metric as described in <http://link.springer.com/article/10.1007/s10994-009-5119-5>, which computes the minimum total misclassification cost, integrating over any uncertainty about the relative misclassification costs, as per a user-defined prior. It also offers a one-stop-shop for other scalar metrics of performance, including sensitivity, specificity and many others, and also offers plotting tools for ROC curves and related statistics.

Maintained by Christoforos Anagnostopoulos. Last updated 6 years ago.

2.4 match 3.48 score 1 dependent

o1iv3r

FeatureImpCluster:Feature Importance for Partitional Clustering

Implements a novel approach for measuring feature importance in k-means clustering. Importance of a feature is measured by the misclassification rate relative to the baseline cluster assignment due to a random permutation of feature values. An explanation of permutation feature importance in general can be found here: <https://christophm.github.io/interpretable-ml-book/feature-importance.html>.

Maintained by Oliver Pfaffel. Last updated 3 years ago.

2.3 match 4 stars 3.58 score 19 scripts
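The permutation idea described above can be sketched in Python with numpy. Fit k-means, record each observation's cluster, randomly permute one feature, reassign observations to the fixed centers, and report the share of observations whose cluster changed. The sketch makes three assumptions: a plain Lloyd's k-means, a single permutation per feature, and toy data. FeatureImpCluster's implementation may differ, for example in how it repeats and averages permutations, and all names here are hypothetical.

```python
import numpy as np

def assign(X, centers):
    """Index of the nearest center (squared Euclidean distance) for each row of X."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def kmeans(X, k, iters=50, seed=0):
    """Plain Lloyd's algorithm; returns the final cluster centers."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        labels = assign(X, centers)
        for j in range(k):
            if np.any(labels == j):  # keep the old center if a cluster empties
                centers[j] = X[labels == j].mean(axis=0)
    return centers

def permutation_misclassification(X, centers, feature, seed=0):
    """Share of observations whose cluster changes after permuting one feature."""
    rng = np.random.default_rng(seed)
    baseline = assign(X, centers)
    X_perm = X.copy()
    X_perm[:, feature] = rng.permutation(X_perm[:, feature])
    return float(np.mean(assign(X_perm, centers) != baseline))

# Two clusters separated only along feature 0; feature 1 is pure noise
rng = np.random.default_rng(1)
n = 300
group = rng.integers(0, 2, n)
X = np.column_stack([
    np.where(group == 1, 3.0, -3.0) + rng.standard_normal(n),
    rng.standard_normal(n),
])
centers = kmeans(X, k=2)
importance = [permutation_misclassification(X, centers, j) for j in range(X.shape[1])]
print(importance)
```

Feature 0 should come out near 0.5, because a random permutation scrambles roughly half of the assignments. Feature 1 should come out near 0.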

metabocomp

MUVR2:Multivariate Methods with Unbiased Variable Selection

Predictive multivariate modelling for metabolomics. Types: classification and regression. Methods: Partial Least Squares, Random Forest and Elastic Net. Data structures: paired and unpaired. Validation: repeated double cross-validation (Westerhuis et al. (2008) <doi:10.1007/s11306-007-0099-6>, Filzmoser et al. (2009) <doi:10.1002/cem.1225>). Variable selection: performed internally, through tuning in the inner cross-validation loop.

Maintained by Yingxiao Yan. Last updated 6 months ago.

2.0 match 2 stars 4.04 score 1 script

mrmarjan

mri:Modified Rand and Wallace Indices

It provides functions to compute the values of different modifications of the Rand and Wallace indices. The indices are used to measure the stability or similarity of two partitions obtained on two different sets of units with a non-empty intersection. Depending on the selected index, splitting and merging of clusters can have different effects on the index values. The indices are proposed in Cugmas and Ferligoj (2018) <http://ibmi.mf.uni-lj.si/mz/2018/no-1/Cugmas2018.pdf>.

Maintained by Marjan Cugmas. Last updated 6 years ago.

4.0 match 2.00 score 4 scripts
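The mri package modifies the classic Rand and Wallace indices so that they apply to partitions of different, overlapping sets of units. For reference, the classic versions for two partitions of the same units can be sketched in Python by counting unit pairs. This sketch does not implement the modifications from Cugmas and Ferligoj (2018), and the function names are hypothetical. The Rand index is the share of pairs on which the two partitions agree. The two Wallace indices are the shares of pairs clustered together in one partition that are also clustered together in the other.

```python
from itertools import combinations

def pair_counts(p1, p2):
    """Count unit pairs: (together in both, only in p1, only in p2, apart in both)."""
    both = only1 = only2 = neither = 0
    for i, j in combinations(range(len(p1)), 2):
        same1, same2 = p1[i] == p1[j], p2[i] == p2[j]
        if same1 and same2:
            both += 1
        elif same1:
            only1 += 1
        elif same2:
            only2 += 1
        else:
            neither += 1
    return both, only1, only2, neither

def rand_index(p1, p2):
    both, only1, only2, neither = pair_counts(p1, p2)
    return (both + neither) / (both + only1 + only2 + neither)

def wallace_indices(p1, p2):
    """(W for p1 -> p2, W for p2 -> p1); undefined if a partition is all singletons."""
    both, only1, only2, _ = pair_counts(p1, p2)
    return both / (both + only1), both / (both + only2)

p1 = [1, 1, 1, 2, 2, 2]
p2 = [1, 1, 2, 2, 3, 3]
print(rand_index(p1, p2), wallace_indices(p1, p2))
```

For these two partitions of six units, 10 of the 15 pairs agree, so the Rand index is 2/3. The Wallace indices are 1/3 and 2/3.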

cran

rpartScore:Classification Trees for Ordinal Responses

Recursive partitioning methods to build classification trees for ordinal responses within the CART framework. Trees are grown using the Generalized Gini impurity function, where the misclassification costs are given by the absolute or squared differences in scores assigned to the categories of the response. Pruning is based on the total misclassification rate or on the total misclassification cost.

Maintained by Giuliano Galimberti. Last updated 3 years ago.

4.4 match 1.76 score 19 scripts 1 dependent
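The Generalized Gini impurity mentioned above weights each pair of response categories in a node by its misclassification cost. This minimal Python sketch takes node impurity to be the sum over category pairs (j, k) of cost(j, k) * p_j * p_k, with costs |s_j - s_k| or (s_j - s_k)^2. Some presentations differ by a constant factor. The sketch shows why mixing distant categories is penalized more than mixing adjacent ones. It is not rpartScore's code, and the function name is hypothetical. With a unit cost for every pair of distinct categories, the same sum reduces to the ordinary Gini impurity 1 - sum(p_j^2).

```python
def generalized_gini(counts, scores, squared=False):
    """Generalized Gini impurity of a node with the given category counts, using
    costs |s_j - s_k| (or squared differences) between category scores."""
    total = sum(counts)
    probs = [c / total for c in counts]
    impurity = 0.0
    for sj, pj in zip(scores, probs):
        for sk, pk in zip(scores, probs):
            cost = (sj - sk) ** 2 if squared else abs(sj - sk)
            impurity += cost * pj * pk
    return impurity

scores = [1, 2, 3]  # ordinal scores for the categories low / medium / high
print(generalized_gini([10, 0, 0], scores))  # pure node -> 0.0
print(generalized_gini([5, 5, 0], scores))   # adjacent categories mixed -> 0.5
print(generalized_gini([5, 0, 5], scores))   # extreme categories mixed -> 1.0
```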

vsousa

poolABC:Approximate Bayesian Computation with Pooled Sequencing Data

Provides functions to simulate Pool-seq data under models of demographic formation and to import Pool-seq data from real populations. Implements two ABC algorithms for performing parameter estimation and model selection using Pool-seq data. Cross-validation can also be performed to assess the accuracy of ABC estimates and model choice. Carvalho et al. (2022) <doi:10.1111/1755-0998.13834>.

Maintained by João Carvalho. Last updated 2 years ago.

2.0 match 1 star 3.70 score 3 scripts

nutriverse

sleacr:Simplified Lot Quality Assurance Sampling Evaluation of Access and Coverage (SLEAC) Tools

In the recent past, measurement of coverage has been mainly through two-stage cluster sampled surveys either as part of a nutrition assessment or through a specific coverage survey known as Centric Systematic Area Sampling (CSAS). However, such methods are resource intensive and often only used for final programme evaluation meaning results arrive too late for programme adaptation. SLEAC, which stands for Simplified Lot Quality Assurance Sampling Evaluation of Access and Coverage, is a low resource method designed specifically to address this limitation and is used regularly for monitoring, planning and importantly, timely improvement to programme quality, both for agency and Ministry of Health (MoH) led programmes. SLEAC is designed to complement the Semi-quantitative Evaluation of Access and Coverage (SQUEAC) method. This package provides functions for use in conducting a SLEAC assessment.

Maintained by Ernest Guevarra. Last updated 1 month ago.

acute-malnutrition cmam coverage nutrition sleac wasting

2.0 match 1 star 3.48 score 5 scripts

cran

augSIMEX:Analysis of Data with Mixed Measurement Error and Misclassification in Covariates

Implementation of the augmented Simulation-Extrapolation (SIMEX) algorithm proposed by Yi et al. (2015) <doi:10.1080/01621459.2014.922777> for analyzing data with mixed measurement error and misclassification. The main function provides summary output similar to that of the glm() function. Both parametric and empirical SIMEX are considered in the package.

Maintained by Qihuang Zhang. Last updated 5 years ago.

cpp

6.8 match 1.00 score

koendebock

CustomerScoringMetrics:Evaluation Metrics for Customer Scoring Models Depending on Binary Classifiers

Functions for evaluating and visualizing predictive model performance (specifically: binary classifiers) in the field of customer scoring. These metrics include lift, lift index, gain percentage, top-decile lift, F1-score, expected misclassification cost and absolute misclassification cost. See Berry & Linoff (2004, ISBN:0-471-47064-3), Witten and Frank (2005, ISBN:0-12-088407-0) and Blattberg, Kim & Neslin (2008, ISBN:978-0-387-72578-9) for details. Visualization functions are included for lift charts and gain percentage charts. All metrics that require class predictions offer the possibility to dynamically determine cutoff values for transforming real-valued probability predictions into class predictions.

Maintained by Koen W. De Bock. Last updated 7 years ago.

4.6 match 1.40 score 25 scripts

larskotthoff

llama:Leveraging Learning to Automatically Manage Algorithms

Provides functionality to train and evaluate algorithm selection models for portfolios.

Maintained by Lars Kotthoff. Last updated 4 years ago.

openjdk

2.3 match 4 stars 2.80 score 53 scripts 1 dependent

lcougnaud

nlcv:Nested Loop Cross Validation

Nested loop cross-validation for estimating misclassification error rates in classification. The package supports several methodologies for feature selection (random forest, Student's t-test, limma) and provides an interface to the following classification methods in the 'MLInterfaces' package: linear and quadratic discriminant analysis, random forest, bagging, prediction analysis for microarrays, generalized linear models, and support vector machines (svm and ksvm). Visualizations to assess the quality of the classifier are included: a plot of the ranks of the features, a scores plot for a specific classification algorithm and number of features, the misclassification rate for the different numbers of features and classification algorithms tested, and an ROC plot. For further details about the methodology, please check: Markus Ruschhaupt, Wolfgang Huber, Annemarie Poustka, and Ulrich Mansmann (2004) <doi:10.2202/1544-6115.1078>.

Maintained by Laure Cougnaud. Last updated 7 years ago.

2.8 match 2.00 score 8 scripts

skranz

RoundingMatters:Tools for adjusting for rounding problems in metastudies about p-hacking and publication bias

Tools for adjusting for rounding problems in metastudies about p-hacking and publication bias

Maintained by Sebastian Kranz. Last updated 4 years ago.

3.2 match 1.70 score 8 scripts

cran

MLDS:Maximum Likelihood Difference Scaling

Difference scaling is a method for scaling perceived supra-threshold differences. The package contains functions that allow the user to design and run a difference scaling experiment, to fit the resulting data by maximum likelihood and test the internal validity of the estimated scale.

Maintained by Kenneth Knoblauch. Last updated 2 years ago.

1.8 match 2.70 score

michlau

logicDT:Identifying Interactions Between Binary Predictors

A statistical learning method that tries to find the best set of predictors and interactions between predictors for modeling binary or quantitative response data in a decision tree. Several search algorithms and ensembling techniques are implemented, allowing the method to be fine-tuned to the specific problem. Interactions with quantitative covariables can be properly taken into account by fitting local regression models. Moreover, a variable importance measure for assessing marginal and interaction effects is provided. Implements the procedures proposed by Lau et al. (2024, <doi:10.1007/s10994-023-06488-6>).

Maintained by Michael Lau. Last updated 6 months ago.

2.0 match 2 stars 2.00 score 2 scripts

riazakhan94

ROCit:Performance Assessment of Binary Classifier with Visualization

Sensitivity (or recall, or true positive rate), false positive rate, specificity, precision (or positive predictive value), negative predictive value, misclassification rate, accuracy, F-score: these are popular metrics for assessing the performance of a binary classifier at a given threshold. The receiver operating characteristic (ROC) curve is a common tool for assessing the overall diagnostic ability of a binary classifier. Unlike these threshold-dependent metrics, the area under the ROC curve (AUC) summarizes how well a binary classifier performs overall on the classification task. The ROCit package makes it easy to evaluate threshold-bound metrics. The ROC curve, along with the AUC, can be obtained using different methods, such as empirical, binormal and non-parametric. ROCit offers a wide variety of methods for constructing confidence intervals for the ROC curve and AUC. ROCit can also construct an empirical gains table, which is a handy tool for direct marketing. The package provides commonly used visualizations, such as the ROC curve, KS plot and lift plot. Along with built-in default graphics settings, plots can be tweaked manually by passing the necessary values as function arguments. ROCit offers a wide range of features, yet it is easy to use.

Maintained by Md Riaz Ahmed Khan. Last updated 3 years ago.

0.5 match 7.66 score 332 scripts 6 dependents

cran

noisemodel:Noise Models for Classification Datasets

Implementation of models for the controlled introduction of errors in classification datasets. This package contains the noise models described in Saez (2022) <doi:10.3390/math10203736>, which allow corrupting class labels, attributes, or both simultaneously.
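To make "corrupting class labels" concrete, the sketch below injects uniform (symmetric) label noise. A fixed fraction of examples has its label replaced by a different class chosen at random. This is only one simple noise model, written in Python with a hypothetical function name. It is not noisemodel's API and does not reproduce the specific models catalogued in Saez (2022).

```python
import random
from typing import Hashable, List, Optional, Sequence


def symmetric_label_noise(labels: Sequence[Hashable], rate: float,
                          seed: Optional[int] = None) -> List[Hashable]:
    """Hypothetical helper: relabel round(rate * n) examples to another class."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be in [0, 1]")
    classes = list(dict.fromkeys(labels))  # distinct classes, first-seen order
    if len(classes) < 2:
        raise ValueError("need at least two classes to swap labels")
    rng = random.Random(seed)
    noisy = list(labels)
    n_noisy = round(rate * len(noisy))
    # Pick distinct positions, then move each to a uniformly chosen other class.
    for i in rng.sample(range(len(noisy)), n_noisy):
        noisy[i] = rng.choice([c for c in classes if c != noisy[i]])
    return noisy


clean = ["a"] * 50 + ["b"] * 30 + ["c"] * 20
noisy = symmetric_label_noise(clean, rate=0.1, seed=1)
print(sum(x != y for x, y in zip(clean, noisy)))  # 10
```

Sampling positions without replacement guarantees that exactly `round(rate * n)` labels change. Class-dependent (asymmetric) noise would instead use a per-class transition matrix.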

Maintained by José A. Sáez. Last updated 2 years ago.

1.9 match 2.00 score

mkhondoker

optBiomarker:Estimation of Optimal Number of Biomarkers for Two-Group Microarray Based Classifications at a Given Error Tolerance Level for Various Classification Rules

Estimates the optimal number of biomarkers for two-group classification based on microarray data.

Maintained by Mizanur Khondoker. Last updated 4 years ago.

1.6 match 2.00 score 1 scripts

xzhu20

ManlyMix:Manly Mixture Modeling and Model-Based Clustering

The utility of this package includes finite mixture modeling and model-based clustering through the Manly mixture models of Zhu and Melnykov (2016) <DOI:10.1016/j.csda.2016.01.015>. It also provides forward and backward model selection procedures.

Maintained by Xuwen Zhu. Last updated 6 months ago.

openblas

1.8 match 1.65 score 15 scripts 1 dependents

benjilu

forestError:A Unified Framework for Random Forest Prediction Error Estimation

Estimates the conditional error distributions of random forest predictions and common parameters of those distributions, including conditional misclassification rates, conditional mean squared prediction errors, conditional biases, and conditional quantiles, by out-of-bag weighting of out-of-bag prediction errors as proposed by Lu and Hardin (2021). This package is compatible with several existing packages that implement random forests in R.

Maintained by Benjamin Lu. Last updated 4 years ago.

inference intervals machine-learning machinelearning prediction random-forest randomforest statistics

0.5 match 26 stars 4.62 score 16 scripts

snoweye

MixSim:Simulating Data to Study Performance of Clustering Algorithms

The utility of this package is in simulating mixtures of Gaussian distributions with different levels of overlap between mixture components. Pairwise overlap, defined as the sum of two misclassification probabilities, measures the degree of interaction between components and can be readily employed to control the clustering complexity of datasets simulated from mixtures. These datasets can then be used for systematic performance investigation of clustering and finite mixture modeling algorithms. Other capabilities of 'MixSim' include computing the exact overlap for Gaussian mixtures, simulating Gaussian and non-Gaussian data, simulating outliers and noise variables, calculating various measures of agreement between two partitionings, and constructing parallel distribution plots for the graphical display of finite mixture models.
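The overlap measure can be illustrated in its simplest case: two univariate Gaussian components with a common standard deviation σ, where each observation is assigned to the component with the larger weighted density. Under that assumption, points above c = (μ1 + μ2)/2 + σ² ln(π1/π2)/(μ2 − μ1) (for μ1 < μ2) go to component 2, and both misclassification probabilities have closed forms through the normal CDF. 'MixSim' itself handles multivariate components with general covariance matrices. The Python function below is a hypothetical illustration, not part of its API.

```python
import math


def normal_cdf(z: float) -> float:
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def pairwise_overlap_1d(pi1: float, mu1: float, pi2: float, mu2: float,
                        sigma: float) -> float:
    """Hypothetical helper: overlap of two equal-variance 1-D Gaussian components.

    Overlap = P(assigned to 2 | from 1) + P(assigned to 1 | from 2), using the
    Bayes rule that picks the component with the larger pi_k * density_k(x).
    """
    if sigma <= 0 or pi1 <= 0 or pi2 <= 0:
        raise ValueError("sigma and mixing proportions must be positive")
    if mu1 == mu2:
        raise ValueError("component means must differ")
    if mu1 > mu2:  # relabel so that mu1 < mu2; the sum is symmetric
        pi1, mu1, pi2, mu2 = pi2, mu2, pi1, mu1
    # Decision boundary: points above c are assigned to component 2.
    c = (mu1 + mu2) / 2 + sigma ** 2 * math.log(pi1 / pi2) / (mu2 - mu1)
    omega_2_given_1 = 1.0 - normal_cdf((c - mu1) / sigma)
    omega_1_given_2 = normal_cdf((c - mu2) / sigma)
    return omega_2_given_1 + omega_1_given_2


# Equal weights, means two standard deviations apart: 2 * (1 - Phi(1)).
print(round(pairwise_overlap_1d(0.5, 0.0, 0.5, 2.0, 1.0), 4))  # 0.3173
```

Moving the means further apart shrinks the overlap, which is the knob a simulator can turn to control clustering difficulty.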

Maintained by Wei-Chen Chen. Last updated 8 months ago.

openblas

0.5 match 1 stars 4.48 score 84 scripts 3 dependents

viroli

quantileDA:Quantile Classifier

Code for centroid, median and quantile classifiers.

Maintained by Cinzia Viroli. Last updated 12 months ago.

2.3 match 1.00 score 10 scripts

dhaine

apisensr:Interface to 'episensr' for Sensitivity Analysis of Epidemiological Results

An API for using 'episensr': basic sensitivity analysis of observed relative risks, adjusting for unmeasured confounding, misclassification of the exposure or outcome, or both. See <https://cran.r-project.org/package=episensr>.
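A classic example of this kind of bias analysis is correcting a 2×2 table for nondifferential exposure misclassification, given assumed values of sensitivity and specificity, and then recomputing the risk ratio. The Python sketch below shows that textbook back-calculation. It is not the 'episensr' or 'apisensr' interface, and the function name and argument layout are hypothetical.

```python
def corrected_risk_ratio(a: float, b: float, c: float, d: float,
                         se: float, sp: float) -> float:
    """Hypothetical helper: risk ratio after correcting exposure misclassification.

    Observed table (exposure as classified):
        a = exposed cases,    b = unexposed cases,
        c = exposed noncases, d = unexposed noncases.
    Assumes the same sensitivity (se) and specificity (sp) in cases and
    noncases (nondifferential misclassification).
    """
    denom = se + sp - 1
    if denom <= 0:
        raise ValueError("requires se + sp > 1")
    cases, noncases = a + b, c + d
    # Observed exposed = se * true_exposed + (1 - sp) * true_unexposed; solve.
    true_a = (a - cases * (1 - sp)) / denom
    true_c = (c - noncases * (1 - sp)) / denom
    true_b, true_d = cases - true_a, noncases - true_c
    if min(true_a, true_b, true_c, true_d) <= 0:
        raise ValueError("se/sp are incompatible with the observed counts")
    risk_exposed = true_a / (true_a + true_c)
    risk_unexposed = true_b / (true_b + true_d)
    return risk_exposed / risk_unexposed


# Expected counts after misclassifying a true table whose risk ratio is 2.
print(corrected_risk_ratio(92.5, 57.5, 382.5, 467.5, se=0.9, sp=0.95))
```

Here the uncorrected risk ratio is about 1.78, and the correction recovers a value of about 2.0. Probabilistic bias analysis repeats this calculation over draws of se and sp instead of fixed values.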

Maintained by Denis Haine. Last updated 2 years ago.

0.5 match 3 stars 4.18 score 5 scripts

gjjvdburg

gensvm:A Generalized Multiclass Support Vector Machine

The GenSVM classifier is a generalized multiclass support vector machine (SVM). This classifier aims to find decision boundaries that separate the classes with as wide a margin as possible. In GenSVM, the loss function is very flexible in how misclassifications are penalized. This allows the user to tune the classifier to the dataset at hand and potentially obtain higher classification accuracy than alternative multiclass SVMs. Moreover, this flexibility means that GenSVM has a number of other multiclass SVMs as special cases. Another advantage of GenSVM is that it is trained in the primal space, allowing the use of warm starts during optimization. This means that for common tasks such as cross-validation or repeated model fitting, GenSVM can be trained very quickly. Based on: G.J.J. van den Burg and P.J.F. Groenen (2018) <https://www.jmlr.org/papers/v17/14-526.html>.

Maintained by Gertjan van den Burg. Last updated 2 years ago.

classification machine-learning machine-learning-algorithms multiclass-classification support-vector-machine

0.5 match 7 stars 3.96 score 26 scripts

ashipunov

shipunov:Miscellaneous Functions from Alexey Shipunov

A collection of functions for data manipulation, plotting and statistical computing, to use separately or with the book "Visual Statistics. Use R!": Shipunov (2020) <http://ashipunov.info/shipunov/software/r/r-en.htm>. Dr Alexey Shipunov died in December 2022. Most useful functions: Bclust(), Jclust() and BootA(), which bootstrap hierarchical clustering; Recode(), which does multiple recoding in a fast, simple and flexible way; Misclass(), which outputs a confusion matrix even if classes are not concerted; Overlap(), which measures group separation on any projection; Biarrows(), which converts any scatterplot into a biplot; and Pleiad(), which is a fast and flexible correlogram.

Maintained by ORPHANED. Last updated 2 years ago.

2.0 match 1.00 score 9 scripts

sdlugosz

misclassGLM:Computation of Generalized Linear Models with Misclassified Covariates Using Side Information

Estimates models that extend the standard GLM to take misclassification into account. The models require side information from a secondary data set on the misclassification process, i.e., some form of misclassification probabilities conditional on common covariates. A detailed description of the algorithm can be found in Dlugosz, Mammen and Wilke (2015) <https://www.zew.de/publikationen/generalised-partially-linear-regression-with-misclassified-data-and-an-application-to-labour-market-transitions>.

Maintained by Stephan Dlugosz. Last updated 1 year ago.

0.9 match 1.81 score 13 scripts

nicolas-schmidt

BayesMFSurv:Bayesian Misclassified-Failure Survival Model

Contains a split population survival estimator that models the misclassification probability of failure versus right-censored events. The split population survival estimator is described in Bagozzi et al. (2019) <doi:10.1017/pan.2019.6>.

Maintained by Nicolas Schmidt. Last updated 5 years ago.

misclassified-failure-estimates survival cpp

0.5 match 1 stars 3.00 score

dkahle

poisDoubleSamp:Confidence Intervals with Poisson Double Sampling

Functions to create confidence intervals for ratios of Poisson rates under misclassification using double sampling.

Maintained by David Kahle. Last updated 10 years ago.

cpp

0.5 match 1 stars 2.00 score 7 scripts

andrewtitman

miscIC:Misclassified Interval Censored Time-to-Event Data

Estimation of the survivor function for interval censored time-to-event data subject to misclassification using nonparametric maximum likelihood estimation, implementing the methods of Titman (2017) <doi:10.1007/s11222-016-9705-7>. Misclassification probabilities can either be specified as fixed or estimated. Models with time dependent misclassification may also be fitted.

Maintained by Andrew Titman. Last updated 5 years ago.

0.9 match 1.00 score

yuliangxu

mgee2:Marginal Analysis of Misclassified Longitudinal Ordinal Data

Three estimating equation methods are provided in this package for marginal analysis of longitudinal ordinal data with misclassified responses and covariates. The naive analysis, which is based solely on the observed data without adjustment, may lead to bias. The corrected generalized estimating equations (GEE2) method, which is unbiased, requires the misclassification parameters to be known beforehand. The corrected GEE2 method with a validation subsample estimates the misclassification parameters from a given validation set. This package is an implementation of Chen (2013) <doi:10.1002/bimj.201200195>.

Maintained by Yuliang Xu. Last updated 4 months ago.

0.8 match 1.00 score 3 scripts

meintraumus

AFFECT:Accelerated Functional Failure Time Model with Error-Contaminated Survival Times

This package deals with data that have measurement error in the response and misclassified censoring status under an AFT model. It primarily contains three functions, which generate artificial data, correct error-prone data, and estimate the functional covariates for an AFT model.

Maintained by Hsiao-Ting Huang. Last updated 2 years ago.

0.5 match 1.00 score

shu-d

ipwErrorY:Inverse Probability Weighted Estimation of Average Treatment Effect with Misclassified Binary Outcome

An implementation of the correction methods proposed by Shu and Yi (2017) <doi:10.1177/0962280217743777> for the inverse probability weighted (IPW) estimation of the average treatment effect (ATE) with misclassified binary outcomes. A logistic regression model is assumed for the treatment model in all implemented correction methods, and for the outcome model in the implemented doubly robust correction method. The misclassification probability given the true value of the outcome is assumed to be the same for all individuals.
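Under the assumption stated above (the same misclassification probabilities for everyone), one simple correction replaces each observed outcome Y* with (Y* − (1 − sp)) / (se + sp − 1). The expectation of this quantity given the true outcome Y equals Y, so it can be plugged into the usual IPW estimator. The Python sketch below illustrates that idea with known sensitivity, specificity and propensity scores. It is a hypothetical helper, not ipwErrorY's interface. Shu and Yi's estimators, including how they estimate the misclassification probabilities and fit the logistic treatment model, may differ in detail.

```python
from typing import Sequence


def corrected_ipw_ate(y_observed: Sequence[int], treated: Sequence[int],
                      propensity: Sequence[float], sensitivity: float,
                      specificity: float) -> float:
    """Hypothetical helper: IPW ATE using misclassification-corrected outcomes.

    Assumes sensitivity P(Y*=1 | Y=1) and specificity P(Y*=0 | Y=0) are known
    and identical for all individuals, and that propensity[i] = P(T=1 | X_i)
    has already been estimated (e.g. by logistic regression).
    """
    denom = sensitivity + specificity - 1
    if denom <= 0:
        raise ValueError("requires sensitivity + specificity > 1")
    n = len(y_observed)
    if n == 0 or not (n == len(treated) == len(propensity)):
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for y_star, t, e in zip(y_observed, treated, propensity):
        if not 0 < e < 1:
            raise ValueError("propensity scores must lie strictly in (0, 1)")
        # E[Y* | Y] = (1 - sp) + (se + sp - 1) * Y, so E[y_adj | Y] = Y.
        y_adj = (y_star - (1 - specificity)) / denom
        total += t * y_adj / e - (1 - t) * y_adj / (1 - e)
    return total / n


y_obs, t, e = [1, 1, 0, 0], [1, 1, 0, 0], [0.5] * 4
print(corrected_ipw_ate(y_obs, t, e, 1.0, 1.0))  # 1.0 (no correction)
print(corrected_ipw_ate(y_obs, t, e, 0.9, 0.8))  # about 1.43
```

With perfect classification (se = sp = 1), the helper reduces to the ordinary Horvitz-Thompson-style IPW estimator. The correction inflates variance as se + sp approaches 1, which is why the sum is required to exceed 1.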

Maintained by Di Shu. Last updated 6 years ago.

0.5 match 1.00 score 4 scripts

cran

abcrlda:Asymptotically Bias-Corrected Regularized Linear Discriminant Analysis

Offers methods to perform asymptotically bias-corrected regularized linear discriminant analysis (ABC_RLDA) for cost-sensitive binary classification. The bias correction is an estimate of the bias term added to regularized discriminant analysis (RLDA) that minimizes the overall risk. By default, the misclassification costs are equal and set to 0.5; however, the package also offers options to set them to predetermined values or, alternatively, to treat them as hyperparameters to tune. A. Zollanvari, M. Abdirash, A. Dadlani and B. Abibullaev (2019) <doi:10.1109/LSP.2019.2918485>.

Maintained by Dmitriy Fedorov. Last updated 5 years ago.

0.5 match 1.00 score

cran

SequentialDesign:Observational Database Study Planning using Exact Sequential Analysis for Poisson and Binomial Data

Functions, to be used in conjunction with the 'Sequential' package, for planning observational database studies that will be analyzed with exact sequential analysis. This package supports Poisson- and binomial-based data. The primary function, seq_wrapper(...), accepts parameters for simulation of a simple exposure pattern and for the 'Sequential' package setup and analysis functions. The exposure matrix is used to simulate the true and false positive and negative populations (Green (1983) <doi:10.1093/oxfordjournals.aje.a113521>, Brenner (1993) <doi:10.1093/oxfordjournals.aje.a116805>). Functions from the 'Sequential' package are then run on these populations, which allows for the exploration of outcome misclassification in the data.

Maintained by Judith Maro. Last updated 7 years ago.

0.5 match 1.00 score