R-universe search: topic:missing-data

yrosseel

lavaan:Latent Variable Analysis

Fit a variety of latent variable models, including confirmatory factor analysis, structural equation modeling and latent growth curve models.

Maintained by Yves Rosseel. Last updated 3 days ago.

factor-analysis growth-curve-models latent-variables missing-data multilevel-models multivariate-analysis path-analysis psychometrics statistical-modeling structural-equation-modeling

454 stars 16.82 score 8.4k scripts 218 dependents

amices

mice:Multivariate Imputation by Chained Equations

Multiple imputation using Fully Conditional Specification (FCS) implemented by the MICE algorithm as described in Van Buuren and Groothuis-Oudshoorn (2011) <doi:10.18637/jss.v045.i03>. Each variable has its own imputation model. Built-in imputation models are provided for continuous data (predictive mean matching, normal), binary data (logistic regression), unordered categorical data (polytomous logistic regression) and ordered categorical data (proportional odds). MICE can also impute continuous two-level data (normal model, pan, second-level variables). Passive imputation can be used to maintain consistency between variables. Various diagnostic plots are available to inspect the quality of the imputations.

Maintained by Stef van Buuren. Last updated 1 days ago.

chained-equations fcs imputation mice missing-data missing-values multiple-imputation multivariate-data cpp

462 stars 16.64 score 10k scripts 154 dependents

njtierney

naniar:Data Structures, Summaries, and Visualisations for Missing Data

Missing values are ubiquitous in data and need to be explored and handled in the initial stages of analysis. 'naniar' provides data structures and functions that facilitate the plotting of missing values and examination of imputations. This allows missing data dependencies to be explored with minimal deviation from the common work patterns of 'ggplot2' and tidy data. The work is fully discussed at Tierney & Cook (2023) <doi:10.18637/jss.v105.i07>.

Maintained by Nicholas Tierney. Last updated 17 days ago.

data-visualisation ggplot2 missing-data missingness tidy-data

657 stars 15.63 score 5.1k scripts 9 dependents

simongrund1

mitml:Tools for Multiple Imputation in Multilevel Modeling

Provides tools for multiple imputation of missing data in multilevel modeling. Includes a user-friendly interface to the packages 'pan' and 'jomo', and several functions for visualization, data management and the analysis of multiply imputed data sets.

Maintained by Simon Grund. Last updated 1 years ago.

imputation missing-data mixed-effects multilevel-data multilevel-models

29 stars 12.36 score 246 scripts 153 dependents

steffenmoritz

imputeTS:Time Series Missing Value Imputation

Imputation (replacement) of missing values in univariate time series. Offers several imputation functions and missing data plots. Available imputation algorithms include: 'Mean', 'LOCF', 'Interpolation', 'Moving Average', 'Seasonal Decomposition', 'Kalman Smoothing on Structural Time Series models', 'Kalman Smoothing on ARIMA models'. Published in Moritz and Bartz-Beielstein (2017) <doi:10.32614/RJ-2017-009>.

Maintained by Steffen Moritz. Last updated 3 years ago.

data-visualization imputation imputation-algorithm imputets missing-data time-series cpp

162 stars 12.18 score 1.9k scripts 27 dependents

ecmerkle

blavaan:Bayesian Latent Variable Analysis

Fit a variety of Bayesian latent variable models, including confirmatory factor analysis, structural equation models, and latent growth curve models. References: Merkle & Rosseel (2018) <doi:10.18637/jss.v085.i04>; Merkle et al. (2021) <doi:10.18637/jss.v100.i06>.

Maintained by Edgar Merkle. Last updated 10 days ago.

bayesian-statistics factor-analysis growth-curve-models latent-variables missing-data multilevel-models multivariate-analysis path-analysis psychometrics statistical-modeling structural-equation-modeling cpp

92 stars 10.84 score 183 scripts 3 dependents

mikewlcheung

metaSEM:Meta-Analysis using Structural Equation Modeling

A collection of functions for conducting meta-analysis using a structural equation modeling (SEM) approach via the 'OpenMx' and 'lavaan' packages. It also implements various procedures to perform meta-analytic structural equation modeling on the correlation and covariance matrices, see Cheung (2015) <doi:10.3389/fpsyg.2014.01521>.

Maintained by Mike Cheung. Last updated 23 days ago.

meta-analysis meta-analytic-sem missing-data multilevel-models multivariate-analysis structural-equation-modeling structural-equation-models

30 stars 9.43 score 208 scripts 1 dependents

alexanderrobitzsch

miceadds:Some Additional Multiple Imputation Functions, Especially for 'mice'

Contains functions for multiple imputation which complements existing functionality in R. In particular, several imputation methods for the mice package (van Buuren & Groothuis-Oudshoorn, 2011, <doi:10.18637/jss.v045.i03>) are implemented. Main features of the miceadds package include plausible value imputation (Mislevy, 1991, <doi:10.1007/BF02294457>), multilevel imputation for variables at any level or with any number of hierarchical and non-hierarchical levels (Grund, Luedtke & Robitzsch, 2018, <doi:10.1177/1094428117703686>; van Buuren, 2018, Ch.7, <doi:10.1201/9780429492259>), imputation using partial least squares (PLS) for high dimensional predictors (Robitzsch, Pham & Yanagida, 2016), nested multiple imputation (Rubin, 2003, <doi:10.1111/1467-9574.00217>), substantive model compatible imputation (Bartlett et al., 2015, <doi:10.1177/0962280214521348>), and features for the generation of synthetic datasets (Reiter, 2005, <doi:10.1111/j.1467-985X.2004.00343.x>; Nowok, Raab, & Dibben, 2016, <doi:10.18637/jss.v074.i11>).

Maintained by Alexander Robitzsch. Last updated 29 days ago.

missing-data multiple-imputation openblas cpp

16 stars 9.16 score 542 scripts 9 dependents

japal

zCompositions:Treatment of Zeros, Left-Censored and Missing Values in Compositional Data Sets

Principled methods for the imputation of zeros, left-censored and missing data in compositional data sets (Palarea-Albaladejo and Martin-Fernandez (2015) <doi:10.1016/j.chemolab.2015.02.019>).

Maintained by Javier Palarea-Albaladejo. Last updated 9 months ago.

censored-data compositional-data imputation-methods missing-data nondetection

7 stars 8.40 score 370 scripts 11 dependents

nerler

JointAI:Joint Analysis and Imputation of Incomplete Data

Joint analysis and imputation of incomplete data in the Bayesian framework, using (generalized) linear (mixed) models and extensions there of, survival models, or joint models for longitudinal and survival data, as described in Erler, Rizopoulos and Lesaffre (2021) <doi:10.18637/jss.v100.i20>. Incomplete covariates, if present, are automatically imputed. The package performs some preprocessing of the data and creates a 'JAGS' model, which will then automatically be passed to 'JAGS' <https://mcmc-jags.sourceforge.io/> with the help of the package 'rjags'.

Maintained by Nicole S. Erler. Last updated 12 months ago.

bayesian generalized-linear-models glm glmm imputation imputations jags joint-analysis linear-mixed-models linear-regression-models mcmc-sample mcmc-sampling missing-data missing-values survival cpp

28 stars 7.30 score 59 scripts 1 dependents

farrellday

miceRanger:Multiple Imputation by Chained Equations with Random Forests

Multiple Imputation has been shown to be a flexible method to impute missing values by Van Buuren (2007) <doi:10.1177/0962280206074463>. Expanding on this, random forests have been shown to be an accurate model by Stekhoven and Buhlmann <arXiv:1105.0828> to impute missing values in datasets. They have the added benefits of returning out of bag error and variable importance estimates, as well as being simple to run in parallel.

Maintained by Sam Wilson. Last updated 3 years ago.

imputation-methods machine-learning mice missing-data missing-values random-forests

67 stars 7.09 score 41 scripts 1 dependents

modal-inria

RMixtComp:Mixture Models with Heterogeneous and (Partially) Missing Data

Mixture Composer (Biernacki (2015) <https://inria.hal.science/hal-01253393v1>) is a project to perform clustering using mixture models with heterogeneous data and partially missing data. Mixture models are fitted using a SEM algorithm. It includes 8 models for real, categorical, counting, functional and ranking data.

Maintained by Quentin Grimonprez. Last updated 11 months ago.

clustering cpp heterogeneous-data missing-data mixed-data mixture-model statistics

13 stars 6.10 score 12 scripts

bioc

BEclear:Correction of batch effects in DNA methylation data

Provides functions to detect and correct for batch effects in DNA methylation data. The core function is based on latent factor models and can also be used to predict missing values in any other matrix containing real numbers.

Maintained by Livia Rasp. Last updated 5 months ago.

batcheffect dnamethylation software preprocessing statisticalmethod batch-effects bioconductor-package dna-methylation latent-factor-model methylation missing-data missing-values stochastic-gradient-descent cpp

4 stars 5.90 score 11 scripts

tirgit

missCompare:Intuitive Missing Data Imputation Framework

Offers a convenient pipeline to test and compare various missing data imputation algorithms on simulated and real data. These include simpler methods, such as mean and median imputation and random replacement, but also include more sophisticated algorithms already implemented in popular R packages, such as 'mi', described by Su et al. (2011) <doi:10.18637/jss.v045.i02>; 'mice', described by van Buuren and Groothuis-Oudshoorn (2011) <doi:10.18637/jss.v045.i03>; 'missForest', described by Stekhoven and Buhlmann (2012) <doi:10.1093/bioinformatics/btr597>; 'missMDA', described by Josse and Husson (2016) <doi:10.18637/jss.v070.i01>; and 'pcaMethods', described by Stacklies et al. (2007) <doi:10.1093/bioinformatics/btm069>. The central assumption behind 'missCompare' is that structurally different datasets (e.g. larger datasets with a large number of correlated variables vs. smaller datasets with non correlated variables) will benefit differently from different missing data imputation algorithms. 'missCompare' takes measurements of your dataset and sets up a sandbox to try a curated list of standard and sophisticated missing data imputation algorithms and compares them assuming custom missingness patterns. 'missCompare' will also impute your real-life dataset for you after the selection of the best performing algorithm in the simulations. The package also provides various post-imputation diagnostics and visualizations to help you assess imputation performance.

Maintained by Tibor V. Varga. Last updated 4 years ago.

comparison comparison-benchmarks imputation imputation-algorithm imputation-methods imputations kolmogorov-smirnov missing missing-data missing-data-imputation missing-status-check missing-values missingness post-imputation-diagnostics rmse

39 stars 5.89 score 40 scripts

nelson-gon

mde:Missing Data Explorer

Correct identification and handling of missing data is one of the most important steps in any analysis. To aid this process, 'mde' provides a very easy to use yet robust framework to quickly get an idea of where the missing data lies and therefore find the most appropriate action to take. Graham WJ (2009) <doi:10.1146/annurev.psych.58.110405.085530>.

Maintained by Nelson Gonzabato. Last updated 3 years ago.

data-analysis data-cleaning data-exploration data-science datacleaner datacleaning exploratory-data-analysis missing missing-data missing-value-treatment missing-values missingness omit recode replace statistics

4 stars 5.61 score 34 scripts

grosssbm

missSBM:Handling Missing Data in Stochastic Block Models

When a network is partially observed (here, NAs in the adjacency matrix rather than 1 or 0 due to missing information between node pairs), it is possible to account for the underlying process that generates those NAs. 'missSBM', presented in 'Barbillon, Chiquet and Tabouy' (2022) <doi:10.18637/jss.v101.i12>, adjusts the popular stochastic block model from network data sampled under various missing data conditions, as described in 'Tabouy, Barbillon and Chiquet' (2019) <doi:10.1080/01621459.2018.1562934>.

Maintained by Julien Chiquet. Last updated 17 days ago.

missing-data nas network-analysis network-dataset stochastic-block-model cpp

12 stars 5.53 score 19 scripts

angabrio

missingHE:Missing Outcome Data in Health Economic Evaluation

Contains a suite of functions for health economic evaluations with missing outcome data. The package can fit different types of statistical models under a fully Bayesian approach using the software 'JAGS' (which should be installed locally and which is loaded in 'missingHE' via the 'R' package 'R2jags'). Three classes of models can be fitted under a variety of missing data assumptions: selection models, pattern mixture models and hurdle models. In addition to model fitting, 'missingHE' provides a set of specialised functions to assess model convergence and fit, and to summarise the statistical and economic results using different types of measures and graphs. The methods implemented are described in Mason (2018) <doi:10.1002/hec.3793>, Molenberghs (2000) <doi:10.1007/978-1-4419-0300-6_18> and Gabrio (2019) <doi:10.1002/sim.8045>.

Maintained by Andrea Gabrio. Last updated 2 years ago.

cost-effectiveness-analysis health-economic-evaluation individual-level-data jags missing-data parametric-modelling sensitivity-analysis cpp

5 stars 5.38 score 24 scripts

elliecurnow

midoc:A Decision-Making System for Multiple Imputation

A guidance system for analysis with missing data. It incorporates expert, up-to-date methodology to help researchers choose the most appropriate analysis approach when some data are missing. You provide the available data and the assumed causal structure, including the likely causes of missing data. 'midoc' will advise which analysis approaches can be used, and how best to perform them. 'midoc' follows the framework for the treatment and reporting of missing data in observational studies (TARMOS). Lee et al (2021). <doi:10.1016/j.jclinepi.2021.01.008>.

Maintained by Elinor Curnow. Last updated 6 months ago.

missing-data multiple-imputation

6 stars 5.32 score 8 scripts

modal-inria

RMixtCompUtilities:Utility Functions for 'MixtComp' Outputs

Mixture Composer <https://github.com/modal-inria/MixtComp> is a project to build mixture models with heterogeneous data sets and partially missing data management. This package contains graphical, getter and some utility functions to facilitate the analysis of 'MixtComp' output.

Maintained by Quentin Grimonprez. Last updated 11 months ago.

clustering cpp heterogeneous-data missing-data mixed-data mixture-model statistics

13 stars 5.19 score 2 scripts 1 dependents

bioc

MAI:Mechanism-Aware Imputation

A two-step approach to imputing missing data in metabolomics. Step 1 uses a random forest classifier to classify missing values as either Missing Completely at Random/Missing At Random (MCAR/MAR) or Missing Not At Random (MNAR). MCAR/MAR are combined because it is often difficult to distinguish these two missing types in metabolomics data. Step 2 imputes the missing values based on the classified missing mechanisms, using the appropriate imputation algorithms. Imputation algorithms tested and available for MCAR/MAR include Bayesian Principal Component Analysis (BPCA), Multiple Imputation No-Skip K-Nearest Neighbors (Multi_nsKNN), and Random Forest. Imputation algorithms tested and available for MNAR include nsKNN and a single imputation approach for imputation of metabolites where left-censoring is present.

Maintained by Jonathan Dekermanjian. Last updated 5 months ago.

software metabolomics statisticalmethod classification imputation-methods machine-learning missing-data

2 stars 5.00 score 6 scripts

feiyoung

ILSE:Linear Regression Based on 'ILSE' for Missing Data

Linear regression when covariates include missing values by embedding the correlation information between covariates. Especially for block missing data, it works well. 'ILSE' conducts imputation and regression simultaneously and iteratively. More details can be referred to Huazhen Lin, Wei Liu and Wei Lan. (2021) <doi:10.1080/07350015.2019.1635486>.

Maintained by Wei Liu. Last updated 1 years ago.

fiml ilse linear-regression missing-data openblas cpp

2 stars 4.95 score 3 scripts

steffenmoritz

imputeR:A General Multivariate Imputation Framework

Multivariate Expectation-Maximization (EM) based imputation framework that offers several different algorithms. These include regularisation methods like Lasso and Ridge regression, tree-based models and dimensionality reduction methods like PCA and PLS.

Maintained by Steffen Moritz. Last updated 4 years ago.

missing-data

16 stars 4.94 score 54 scripts

haghish

mlim:Single and Multiple Imputation with Automated Machine Learning

Machine learning algorithms have been used for performing single missing data imputation and most recently, multiple imputations. However, this is the first attempt for using automated machine learning algorithms for performing both single and multiple imputation. Automated machine learning is a procedure for fine-tuning the model automatic, performing a random search for a model that results in less error, without overfitting the data. The main idea is to allow the model to set its own parameters for imputing each variable separately instead of setting fixed predefined parameters to impute all variables of the dataset. Using automated machine learning, the package fine-tunes an Elastic Net (default) or Gradient Boosting, Random Forest, Deep Learning, Extreme Gradient Boosting, or Stacked Ensemble machine learning model (from one or a combination of other supported algorithms) for imputing the missing observations. This procedure has been implemented for the first time by this package and is expected to outperform other packages for imputing missing data that do not fine-tune their models. The multiple imputation is implemented via bootstrapping without letting the duplicated observations to harm the cross-validation procedure, which is the way imputed variables are evaluated. Most notably, the package implements automated procedure for handling imputing imbalanced data (class rarity problem), which happens when a factor variable has a level that is far more prevalent than the other(s). This is known to result in biased predictions, hence, biased imputation of missing data. However, the autobalancing procedure ensures that instead of focusing on maximizing accuracy (classification error) in imputing factor variables, a fairer procedure and imputation method is practiced.

Maintained by E. F. Haghish. Last updated 8 months ago.

automatic-machine-learning automl classimbalance data-science elastic-net extreme-gradient-boosting gbm glm gradient-boosting gradient-boosting-machine imputation imputation-algorithm imputation-methods machine-learning missing-data multipleimputation stack-ensemble

31 stars 4.49 score 7 scripts

shangzhi-hong

RfEmpImp:Multiple Imputation using Chained Random Forests

An R package for multiple imputation using chained random forests. Implemented methods can handle missing data in mixed types of variables by using prediction-based or node-based conditional distributions constructed using random forests. For prediction-based imputation, the method based on the empirical distribution of out-of-bag prediction errors of random forests and the method based on normality assumption for prediction errors of random forests are provided for imputing continuous variables. And the method based on predicted probabilities is provided for imputing categorical variables. For node-based imputation, the method based on the conditional distribution formed by the predicting nodes of random forests, and the method based on proximity measures of random forests are provided. More details of the statistical methods can be found in Hong et al. (2020) <arXiv:2004.14823>.

Maintained by Shangzhi Hong. Last updated 2 years ago.

imputation missing-data random-forest

5 stars 4.40 score 8 scripts

xsswang

remiod:Reference-Based Multiple Imputation for Ordinal/Binary Response

Reference-based multiple imputation of ordinal and binary responses under Bayesian framework, as described in Wang and Liu (2022) <arXiv:2203.02771>. Methods for missing-not-at-random include Jump-to-Reference (J2R), Copy Reference (CR), and Delta Adjustment which can generate tipping point analysis.

Maintained by Tony Wang. Last updated 2 years ago.

bayesian control-based copy-reference delta-adjustment generalized-linear-models glm jags jump-to-reference mcmc missing-at-random missing-data missing-not-at-random multiple-imputation non-ignorable ordinal-regression pattern-mixture-model reference-based statistics cpp

4.30 score 3 scripts

alexanderrobitzsch

mdmb:Model Based Treatment of Missing Data

Contains model-based treatment of missing data for regression models with missing values in covariates or the dependent variable using maximum likelihood or Bayesian estimation (Ibrahim et al., 2005; <doi:10.1198/016214504000001844>; Luedtke, Robitzsch, & West, 2020a, 2020b; <doi:10.1080/00273171.2019.1640104><doi:10.1037/met0000233>). The regression model can be nonlinear (e.g., interaction effects, quadratic effects or B-spline functions). Multilevel models with missing data in predictors are available for Bayesian estimation. Substantive-model compatible multiple imputation can be also conducted.

Maintained by Alexander Robitzsch. Last updated 9 months ago.

missing-data multiple-imputation openblas cpp

4 stars 3.78 score 26 scripts

hlorenzo

ddsPLS:Data-Driven Sparse Partial Least Squares

A sparse Partial Least Squares implementation which uses soft-threshold estimation of the covariance matrices and therein introduces sparsity. Number of components and regularization coefficients are automatically set.

Maintained by Hadrien Lorenzo. Last updated 1 years ago.

missing-data multi-block pls supervised-learning svd variable-selection cpp

3.70 score 7 scripts

yixiao-zeng

missoNet:Missingness in Multi-Task Regression with Network Estimation

Efficient procedures for fitting conditional graphical lasso models that link a set of predictor variables to a set of response variables (or tasks), even when the response data may contain missing values. 'missoNet' simultaneously estimates the predictor coefficients for all tasks by leveraging information from one another, in order to provide more accurate predictions in comparison to modeling them individually. Additionally, 'missoNet' estimates the response network structure influenced by conditioning predictor variables using a L1-regularized conditional Gaussian graphical model. Unlike most penalized multi-task regression methods (e.g., MRCE), 'missoNet' is capable of obtaining estimates even when the response data is corrupted by missing values. The method automatically enjoys the theoretical and computational benefits of convexity, and returns solutions that are comparable to the estimates obtained without missingness.

Maintained by Yixiao Zeng. Last updated 2 years ago.

conditional-graphical-lasso missing-data multi-task-regression openblas cpp

1 stars 3.70 score 2 scripts

dsalfran

ImputeRobust:Robust Multiple Imputation with Generalized Additive Models for Location Scale and Shape

Provides new imputation methods for the 'mice' package based on generalized additive models for location, scale, and shape (GAMLSS) as described in de Jong, van Buuren and Spiess <doi:10.1080/03610918.2014.911894>.

Maintained by Daniel Salfran. Last updated 6 years ago.

imputation missing-data multiple-imputation

9 stars 3.65 score 4 scripts

indenkun

MissMech:Testing Homoscedasticity, Multivariate Normality, and Missing Completely at Random

To test whether the missing data mechanism, in a set of incompletely observed data, is one of missing completely at random (MCAR). For detailed description see Jamshidian, M. Jalal, S., and Jansen, C. (2014). "MissMech: An R Package for Testing Homoscedasticity, Multivariate Normality, and Missing Completely at Random (MCAR)", Journal of Statistical Software, 56(6), 1-31. <https://www.jstatsoft.org/v56/i06/> <doi:10.18637/jss.v056.i06>.

Maintained by Mao Kobayashi. Last updated 1 years ago.

missing-data

3.54 score 54 scripts

mrcieu

tmsens:Sensitivity Analysis Using the Trimmed Means Estimator

Sensitivity analysis using the trimmed means estimator.

Maintained by Audinga-Dea Hazewinkel. Last updated 7 months ago.

missing-data sensitivity-analysis trimmed-means

1 stars 2.70 score 1 scripts

bbartholdy

hitchr:A random sample generator based on The Hitchhiker's Guide to the Galaxy

Generates random samples containing races described in The Hitchhiker's Guide to the Galaxy.

Maintained by Bjørn Peare Bartholdy. Last updated 3 years ago.

hitchhikers-guide missing-data sample-generation

1.70 score