Showing 200 of 636 results
amices
mice: Multivariate Imputation by Chained Equations
Multiple imputation using Fully Conditional Specification (FCS) implemented by the MICE algorithm as described in Van Buuren and Groothuis-Oudshoorn (2011) <doi:10.18637/jss.v045.i03>. Each variable has its own imputation model. Built-in imputation models are provided for continuous data (predictive mean matching, normal), binary data (logistic regression), unordered categorical data (polytomous logistic regression) and ordered categorical data (proportional odds). MICE can also impute continuous two-level data (normal model, pan, second-level variables). Passive imputation can be used to maintain consistency between variables. Various diagnostic plots are available to inspect the quality of the imputations.
Maintained by Stef van Buuren. Last updated 5 days ago.
chained-equations fcs imputation mice missing-data missing-values multiple-imputation multivariate-data cpp
178.5 match 462 stars 16.50 score 10k scripts 154 dependents
alexanderrobitzsch
miceadds: Some Additional Multiple Imputation Functions, Especially for 'mice'
Contains functions for multiple imputation that complement existing functionality in R. In particular, several imputation methods for the mice package (van Buuren & Groothuis-Oudshoorn, 2011, <doi:10.18637/jss.v045.i03>) are implemented. Main features of the miceadds package include plausible value imputation (Mislevy, 1991, <doi:10.1007/BF02294457>), multilevel imputation for variables at any level or with any number of hierarchical and non-hierarchical levels (Grund, Luedtke & Robitzsch, 2018, <doi:10.1177/1094428117703686>; van Buuren, 2018, Ch.7, <doi:10.1201/9780429492259>), imputation using partial least squares (PLS) for high dimensional predictors (Robitzsch, Pham & Yanagida, 2016), nested multiple imputation (Rubin, 2003, <doi:10.1111/1467-9574.00217>), substantive model compatible imputation (Bartlett et al., 2015, <doi:10.1177/0962280214521348>), and features for the generation of synthetic datasets (Reiter, 2005, <doi:10.1111/j.1467-985X.2004.00343.x>; Nowok, Raab, & Dibben, 2016, <doi:10.18637/jss.v074.i11>).
Maintained by Alexander Robitzsch. Last updated 14 days ago.
missing-data multiple-imputation openblas cpp
171.0 match 16 stars 9.16 score 542 scripts 9 dependents
statistikat
VIM: Visualization and Imputation of Missing Values
New tools for the visualization of missing and/or imputed values are introduced, which can be used for exploring the data and the structure of the missing and/or imputed values. Depending on this structure of the missing values, the corresponding methods may help to identify the mechanism generating the missing values and allow exploration of the data, including missing values. In addition, the quality of imputation can be visually explored using various univariate, bivariate, multiple and multivariate plot methods. A graphical user interface available in the separate package VIMGUI allows easy handling of the implemented plot methods.
Maintained by Matthias Templ. Last updated 7 months ago.
hotdeck imputation-methods model-predictions visualization cpp
84.4 match 85 stars 14.44 score 2.6k scripts 19 dependents
matteo21q
jomo: Multilevel Joint Modelling Multiple Imputation
Similarly to Schafer's package 'pan', 'jomo' is a package for multilevel joint modelling multiple imputation (Carpenter and Kenward, 2013) <doi:10.1002/9781119942283>. Novel aspects of 'jomo' are the possibility of handling binary and categorical data through latent normal variables, the option to use cluster-specific covariance matrices and to impute compatibly with the substantive model.
Maintained by Matteo Quartagno. Last updated 2 years ago.
81.4 match 3 stars 9.58 score 126 scripts 154 dependents
pharmaverse
admiral: ADaM in R Asset Library
A toolbox for programming Clinical Data Interchange Standards Consortium (CDISC) compliant Analysis Data Model (ADaM) datasets in R. ADaM datasets are a mandatory part of any New Drug or Biologics License Application submitted to the United States Food and Drug Administration (FDA). Analysis derivations are implemented in accordance with the "Analysis Data Model Implementation Guide" (CDISC Analysis Data Model Team, 2021, <https://www.cdisc.org/standards/foundational/adam>).
Maintained by Ben Straub. Last updated 3 days ago.
cdisc clinical-trials open-source
40.0 match 236 stars 13.89 score 486 scripts 4 dependents
simongrund1
mitml: Tools for Multiple Imputation in Multilevel Modeling
Provides tools for multiple imputation of missing data in multilevel modeling. Includes a user-friendly interface to the packages 'pan' and 'jomo', and several functions for visualization, data management and the analysis of multiply imputed data sets.
Maintained by Simon Grund. Last updated 1 year ago.
imputation missing-data mixed-effects multilevel-data multilevel-models
44.7 match 29 stars 12.36 score 246 scripts 153 dependents
bioc
impute: impute: Imputation for microarray data
Imputation for microarray data (currently KNN only)
Maintained by Balasubramanian Narasimhan. Last updated 5 months ago.
60.3 match 9.04 score 952 scripts 131 dependents
tirgit
missCompare: Intuitive Missing Data Imputation Framework
Offers a convenient pipeline to test and compare various missing data imputation algorithms on simulated and real data. These include simpler methods, such as mean and median imputation and random replacement, but also include more sophisticated algorithms already implemented in popular R packages, such as 'mi', described by Su et al. (2011) <doi:10.18637/jss.v045.i02>; 'mice', described by van Buuren and Groothuis-Oudshoorn (2011) <doi:10.18637/jss.v045.i03>; 'missForest', described by Stekhoven and Buhlmann (2012) <doi:10.1093/bioinformatics/btr597>; 'missMDA', described by Josse and Husson (2016) <doi:10.18637/jss.v070.i01>; and 'pcaMethods', described by Stacklies et al. (2007) <doi:10.1093/bioinformatics/btm069>. The central assumption behind 'missCompare' is that structurally different datasets (e.g. larger datasets with a large number of correlated variables vs. smaller datasets with non correlated variables) will benefit differently from different missing data imputation algorithms. 'missCompare' takes measurements of your dataset and sets up a sandbox to try a curated list of standard and sophisticated missing data imputation algorithms and compares them assuming custom missingness patterns. 'missCompare' will also impute your real-life dataset for you after the selection of the best performing algorithm in the simulations. The package also provides various post-imputation diagnostics and visualizations to help you assess imputation performance.
Maintained by Tibor V. Varga. Last updated 4 years ago.
comparison comparison-benchmarks imputation imputation-algorithm imputation-methods imputations kolmogorov-smirnov missing missing-data missing-data-imputation missing-status-check missing-values missingness post-imputation-diagnostics rmse
90.5 match 39 stars 5.89 score 40 scripts
steffenmoritz
imputeTS: Time Series Missing Value Imputation
Imputation (replacement) of missing values in univariate time series. Offers several imputation functions and missing data plots. Available imputation algorithms include: 'Mean', 'LOCF', 'Interpolation', 'Moving Average', 'Seasonal Decomposition', 'Kalman Smoothing on Structural Time Series models', 'Kalman Smoothing on ARIMA models'. Published in Moritz and Bartz-Beielstein (2017) <doi:10.32614/RJ-2017-009>.
Maintained by Steffen Moritz. Last updated 3 years ago.
data-visualization imputation imputation-algorithm imputets missing-data time-series cpp
39.9 match 162 stars 12.18 score 1.9k scripts 27 dependents
njtierney
naniar: Data Structures, Summaries, and Visualisations for Missing Data
Missing values are ubiquitous in data and need to be explored and handled in the initial stages of analysis. 'naniar' provides data structures and functions that facilitate the plotting of missing values and examination of imputations. This allows missing data dependencies to be explored with minimal deviation from the common work patterns of 'ggplot2' and tidy data. The work is fully discussed at Tierney & Cook (2023) <doi:10.18637/jss.v105.i07>.
Maintained by Nicholas Tierney. Last updated 2 days ago.
data-visualisation ggplot2 missing-data missingness tidy-data
27.6 match 657 stars 15.63 score 5.1k scripts 9 dependents
steffenmoritz
imputeR: A General Multivariate Imputation Framework
Multivariate Expectation-Maximization (EM) based imputation framework that offers several different algorithms. These include regularisation methods like Lasso and Ridge regression, tree-based models and dimensionality reduction methods like PCA and PLS.
Maintained by Steffen Moritz. Last updated 4 years ago.
85.3 match 16 stars 4.94 score 54 scripts
mwheymans
psfmi: Prediction Model Pooling, Selection and Performance Evaluation Across Multiply Imputed Datasets
Pooling, backward and forward selection of linear, logistic and Cox regression models in multiply imputed datasets. Backward and forward selection can be done from the pooled model using Rubin's Rules (RR), the D1, D2, D3, D4 and the median p-values method. This is also possible for mixed models. The models can contain continuous, dichotomous, categorical and restricted cubic spline predictors and interaction terms between all these types of predictors. The stability of the models can be evaluated using (cluster) bootstrapping. The package further contains functions to pool model performance measures such as ROC/AUC, Reclassification, R-squared, scaled Brier score, H&L test and calibration plots for logistic regression models. Internal validation can be done across multiply imputed datasets with cross-validation or bootstrapping. The adjusted intercept after shrinkage of pooled regression coefficients can be obtained. Backward and forward selection as part of internal validation is possible. A function to externally validate logistic prediction models in multiply imputed datasets is available, as is a function to compare models. For Cox models a strata variable can be included. Eekhout (2017) <doi:10.1186/s12874-017-0404-7>. Wiel (2009) <doi:10.1093/biostatistics/kxp011>. Marshall (2009) <doi:10.1186/1471-2288-9-57>.
Maintained by Martijn Heymans. Last updated 2 years ago.
cox-regression imputation imputed-datasets logistic multiple-imputation pool predictor regression selection spline spline-predictors
53.5 match 10 stars 7.17 score 70 scripts
insightsengineering
rbmi: Reference Based Multiple Imputation
Implements standard and reference based multiple imputation methods for continuous longitudinal endpoints (Gower-Page et al. (2022) <doi:10.21105/joss.04251>). In particular, this package supports deterministic conditional mean imputation and jackknifing as described in Wolbers et al. (2022) <doi:10.1002/pst.2234>, Bayesian multiple imputation as described in Carpenter et al. (2013) <doi:10.1080/10543406.2013.834911>, and bootstrapped maximum likelihood imputation as described in von Hippel and Bartlett (2021) <doi:10.1214/20-STS793>.
Maintained by Isaac Gravestock. Last updated 22 days ago.
42.3 match 18 stars 8.78 score 33 scripts 1 dependent
bioc
snpStats: SnpMatrix and XSnpMatrix classes and methods
Classes and statistical methods for large SNP association studies. This extends the earlier snpMatrix package, allowing for uncertainty in genotypes.
Maintained by David Clayton. Last updated 5 months ago.
microarray snp geneticvariability zlib
35.8 match 9.41 score 674 scripts 17 dependents
jeffreyevans
yaImpute: Nearest Neighbor Observation Imputation and Evaluation Tools
Performs nearest neighbor-based imputation using one or more alternative approaches to processing multivariate data. These include methods based on canonical correlation analysis, canonical correspondence analysis, and a multivariate adaptation of the random forest classification and regression techniques of Leo Breiman and Adele Cutler. Additional methods are also offered. The package includes functions for comparing the results from running alternative techniques, detecting imputation targets that are notably distant from reference observations, detecting and correcting for bias, bootstrapping and building ensemble imputations, and mapping results.
Maintained by Jeffrey S. Evans. Last updated 6 months ago.
42.5 match 3 stars 7.40 score 94 scripts 12 dependents
polkas
miceFast: Fast Imputations Using 'Rcpp' and 'Armadillo'
Fast imputations under the object-oriented programming paradigm. In addition, a few functions are offered that work with popular R packages such as 'data.table' or 'dplyr'. The biggest improvement in time performance can be achieved for calculations in which a grouping variable has to be used. A single evaluation of a quantitative model for the multiple imputations is another major enhancement. A new major improvement is one of the fastest predictive mean matching implementations in the R world, thanks to presorting and binary search.
Maintained by Maciej Nasinski. Last updated 1 month ago.
cpp fast fast-imputations grouping imputation imputations matrix mro multiple-imputation rcpp rcpparmadillo vif weighting openblas cpp openmp
49.1 match 20 stars 5.94 score 29 scripts
matthewblackwell
Amelia: A Program for Missing Data
A tool that "multiply imputes" missing data in a single cross-section (such as a survey), from a time series (like variables collected for each year in a country), or from a time-series-cross-sectional data set (such as collected by years for each of several countries). Amelia II implements our bootstrapping-based algorithm that gives essentially the same answers as the standard IP or EMis approaches, is usually considerably faster than existing approaches and can handle many more variables. Unlike Amelia I and other statistically rigorous imputation software, it virtually never crashes (but please let us know if you find to the contrary!). The program also generalizes existing approaches by allowing for trends in time series across observations within a cross-sectional unit, as well as priors that allow experts to incorporate beliefs they have about the values of missing cells in their data. Amelia II also includes useful diagnostics of the fit of multiple imputation models. The program works from the R command line or via a graphical user interface that does not require users to know R.
Maintained by Matthew Blackwell. Last updated 4 months ago.
31.7 match 1 star 9.06 score 1.4k scripts 7 dependents
harrelfe
Hmisc: Harrell Miscellaneous
Contains many functions useful for data analysis, high-level graphics, utility operations, functions for computing sample size and power, simulation, importing and annotating datasets, imputing missing values, advanced table making, variable clustering, character string manipulation, conversion of R objects to LaTeX and html code, recoding variables, caching, simplified parallel computing, encrypting and decrypting data using a safe workflow, general moving window statistical estimation, and assistance in interpreting principal component analysis.
Maintained by Frank E Harrell Jr. Last updated 2 days ago.
16.1 match 210 stars 17.61 score 17k scripts 750 dependents
nerler
JointAI: Joint Analysis and Imputation of Incomplete Data
Joint analysis and imputation of incomplete data in the Bayesian framework, using (generalized) linear (mixed) models and extensions thereof, survival models, or joint models for longitudinal and survival data, as described in Erler, Rizopoulos and Lesaffre (2021) <doi:10.18637/jss.v100.i20>. Incomplete covariates, if present, are automatically imputed. The package performs some preprocessing of the data and creates a 'JAGS' model, which will then automatically be passed to 'JAGS' <https://mcmc-jags.sourceforge.io/> with the help of the package 'rjags'.
Maintained by Nicole S. Erler. Last updated 12 months ago.
bayesian generalized-linear-models glm glmm imputation imputations jags joint-analysis linear-mixed-models linear-regression-models mcmc-sample mcmc-sampling missing-data missing-values survival cpp
37.9 match 28 stars 7.30 score 59 scripts 1 dependent
tidymodels
recipes: Preprocessing and Feature Engineering Steps for Modeling
A recipe prepares your data for modeling. We provide an extensible framework for pipeable sequences of feature engineering steps that provide preprocessing tools to be applied to data. Statistical parameters for the steps can be estimated from an initial data set and then applied to other data sets. The resulting processed output can then be used as inputs for statistical or machine learning models.
Maintained by Max Kuhn. Last updated 4 days ago.
14.3 match 584 stars 18.71 score 7.2k scripts 380 dependents
bluefoxr
COINr: Composite Indicator Construction and Analysis
A comprehensive high-level package, for composite indicator construction and analysis. It is a "development environment" for composite indicators and scoreboards, which includes utilities for construction (indicator selection, denomination, imputation, data treatment, normalisation, weighting and aggregation) and analysis (multivariate analysis, correlation plotting, short cuts for principal component analysis, global sensitivity analysis, and more). A composite indicator is completely encapsulated inside a single hierarchical list called a "coin". This allows a fast and efficient work flow, as well as making quick copies, testing methodological variations and making comparisons. It also includes many plotting options, both statistical (scatter plots, distribution plots) as well as for presenting results.
Maintained by William Becker. Last updated 2 months ago.
28.8 match 26 stars 9.07 score 73 scripts 1 dependent
mayer79
missRanger: Fast Imputation of Missing Values
Alternative implementation of the beautiful 'MissForest' algorithm used to impute mixed-type data sets by chaining random forests, introduced by Stekhoven, D.J. and Buehlmann, P. (2012) <doi:10.1093/bioinformatics/btr597>. Under the hood, it uses the lightning fast random forest package 'ranger'. Between the iterative model fitting, we offer the option of using predictive mean matching. This firstly avoids imputation with values not already present in the original data (like a value 0.3334 in a 0-1 coded variable). Secondly, predictive mean matching tries to raise the variance in the resulting conditional distributions to a realistic level. This would allow, e.g., to do multiple imputation when repeating the call to missRanger(). Out-of-sample application is supported as well.
Maintained by Michael Mayer. Last updated 3 months ago.
imputation machine-learning missing-values random-forest
23.0 match 69 stars 11.07 score 208 scripts 6 dependents
shangzhi-hong
RfEmpImp: Multiple Imputation using Chained Random Forests
An R package for multiple imputation using chained random forests. Implemented methods can handle missing data in mixed types of variables by using prediction-based or node-based conditional distributions constructed using random forests. For prediction-based imputation, the method based on the empirical distribution of out-of-bag prediction errors of random forests and the method based on normality assumption for prediction errors of random forests are provided for imputing continuous variables. And the method based on predicted probabilities is provided for imputing categorical variables. For node-based imputation, the method based on the conditional distribution formed by the predicting nodes of random forests, and the method based on proximity measures of random forests are provided. More details of the statistical methods can be found in Hong et al. (2020) <arXiv:2004.14823>.
Maintained by Shangzhi Hong. Last updated 2 years ago.
imputation missing-data random-forest
51.2 match 5 stars 4.40 score 8 scripts
bioc
DAPAR: Tools for the Differential Analysis of Proteins Abundance with R
The package DAPAR is a Bioconductor distributed R package which provides all the necessary functions to analyze quantitative data from label-free proteomics experiments. Contrary to most other similar R packages, it is endowed with rich and user-friendly graphical interfaces, so that no programming skill is required (see `Prostar` package).
Maintained by Samuel Wieczorek. Last updated 5 months ago.
proteomics normalization preprocessing massspectrometry qualitycontrol go dataimport prostar1
41.3 match 2 stars 5.42 score 22 scripts 1 dependent
mlr-org
mlr3pipelines: Preprocessing Operators and Pipelines for 'mlr3'
Dataflow programming toolkit that enriches 'mlr3' with a diverse set of pipelining operators ('PipeOps') that can be composed into graphs. Operations exist for data preprocessing, model fitting, and ensemble learning. Graphs can themselves be treated as 'mlr3' 'Learners' and can therefore be resampled, benchmarked, and tuned.
Maintained by Martin Binder. Last updated 7 days ago.
bagging data-science dataflow-programming ensemble-learning machine-learning mlr3 pipelines preprocessing stacking
17.8 match 141 stars 12.36 score 448 scripts 7 dependents
billdenney
PKNCA: Perform Pharmacokinetic Non-Compartmental Analysis
Compute standard Non-Compartmental Analysis (NCA) parameters for typical pharmacokinetic analyses and summarize them.
Maintained by Bill Denney. Last updated 15 days ago.
nca noncompartmental-analysis pharmacokinetics
17.2 match 73 stars 12.61 score 214 scripts 4 dependents
farrellday
miceRanger: Multiple Imputation by Chained Equations with Random Forests
Multiple Imputation has been shown to be a flexible method to impute missing values by Van Buuren (2007) <doi:10.1177/0962280206074463>. Expanding on this, random forests have been shown to be an accurate model by Stekhoven and Buhlmann <arXiv:1105.0828> to impute missing values in datasets. They have the added benefits of returning out of bag error and variable importance estimates, as well as being simple to run in parallel.
Maintained by Sam Wilson. Last updated 3 years ago.
imputation-methods machine-learning mice missing-data missing-values random-forests
28.3 match 67 stars 7.09 score 41 scripts 1 dependent
jwb133
smcfcs: Multiple Imputation of Covariates by Substantive Model Compatible Fully Conditional Specification
Implements multiple imputation of missing covariates by Substantive Model Compatible Fully Conditional Specification. This is a modification of the popular FCS/chained equations multiple imputation approach, and allows imputation of missing covariate values from models which are compatible with the user specified substantive model.
Maintained by Jonathan Bartlett. Last updated 15 hours ago.
21.5 match 11 stars 9.00 score 59 scripts 1 dependent
bioc
HIBAG: HLA Genotype Imputation with Attribute Bagging
Imputes HLA classical alleles using GWAS SNP data, and it relies on a training set of HLA and SNP genotypes. HIBAG can be used by researchers with published parameter estimates instead of requiring access to large training sample datasets. It combines the concepts of attribute bagging, an ensemble classifier method, with haplotype inference for SNPs and HLA types. Attribute bagging is a technique which improves the accuracy and stability of classifier ensembles using bootstrap aggregating and random variable selection.
Maintained by Xiuwen Zheng. Last updated 4 months ago.
genetics statisticalmethod bioinformatics gpu hla imputation mhc snp cpp
23.3 match 30 stars 8.24 score 48 scripts
markvanderloo
simputation: Simple Imputation
Easy to use interfaces to a number of imputation methods that fit in the not-a-pipe operator of the 'magrittr' package.
Maintained by Mark van der Loo. Last updated 8 months ago.
data-science imputation officialstatistics
22.3 match 92 stars 8.38 score 350 scripts
samwieczorek
imputeLCMD: A Collection of Methods for Left-Censored Missing Data Imputation
A collection of functions for left-censored missing data imputation. Left-censoring is a special case of the missing not at random (MNAR) mechanism that generates non-responses in proteomics experiments. The package also contains functions to artificially generate peptide/protein expression data (log-transformed) as random draws from a multivariate Gaussian distribution as well as a function to generate missing data (both randomly and non-randomly). For comparison purposes, the package also contains several wrapper functions for the imputation of non-responses that are missing at random. New functionality has been added: a hybrid method that allows the imputation of missing values in a more complex scenario where the missing data are both MAR and MNAR.
Maintained by Samuel Wieczorek. Last updated 3 years ago.
40.9 match 2 stars 4.55 score 93 scripts 5 dependents
choonghyunryu
dlookr: Tools for Data Diagnosis, Exploration, Transformation
A collection of tools that support data diagnosis, exploration, and transformation. Data diagnostics provides information and visualization of missing values, outliers, and unique and negative values to help you understand the distribution and quality of your data. Data exploration provides information and visualization of the descriptive statistics of univariate variables, normality tests and outliers, correlation of two variables, and the relationship between the target variable and predictor. Data transformation supports binning for categorizing continuous variables, imputes missing values and outliers, and resolves skewness. And it creates automated reports that support these three tasks.
Maintained by Choonghyun Ryu. Last updated 9 months ago.
16.0 match 212 stars 11.05 score 748 scripts 2 dependents
torockel
missMethods: Methods for Missing Data
Supplies functions for the creation and handling of missing data as well as tools to evaluate missing data methods. Nearly all possibilities of generating missing data discussed by Santos et al. (2019) <doi:10.1109/ACCESS.2019.2891360>, and some additional ones, are implemented. Functions are supplied to compare parameter estimates and imputed values to true values to evaluate missing data methods. Evaluations of these types are done, for example, by Cetin-Berber et al. (2019) <doi:10.1177/0013164418805532> and Kim et al. (2005) <doi:10.1093/bioinformatics/bth499>.
Maintained by Tobias Rockel. Last updated 2 years ago.
24.6 match 7 stars 6.98 score 113 scripts 4 dependents
albertofranzin
bnstruct: Bayesian Network Structure Learning from Data with Missing Values
Bayesian Network Structure Learning from Data with Missing Values. The package implements the Silander-Myllymaki complete search, the Max-Min Parents-and-Children, the Hill-Climbing, the Max-Min Hill-climbing heuristic searches, and the Structural Expectation-Maximization algorithm. Available scoring functions are BDeu, AIC, BIC. The package also implements methods for generating and using bootstrap samples, imputed data, inference.
Maintained by Alberto Franzin. Last updated 1 year ago.
31.0 match 1 star 5.40 score 111 scripts 3 dependents
agnesdeng
mixgb: Multiple Imputation Through 'XGBoost'
Multiple imputation using 'XGBoost', subsampling, and predictive mean matching as described in Deng and Lumley (2023) <doi:10.1080/10618600.2023.2252501>. The package supports various types of variables, offers flexible settings, and enables saving an imputation model to impute new data. Data processing and memory usage have been optimised to speed up the imputation process.
Maintained by Yongshi Deng. Last updated 2 months ago.
25.1 match 23 stars 6.58 score 82 scripts
inbo
multimput: Using Multiple Imputation to Address Missing Data
Accompanying package for the paper: Working with population totals in the presence of missing data comparing imputation methods in terms of bias and precision. Published in 2017 in the Journal of Ornithology volume 158 page 603–615 (<doi:10.1007/s10336-016-1404-9>).
Maintained by Thierry Onkelinx. Last updated 16 days ago.
45.4 match 1 star 3.62 score 14 scripts 1 dependent
midasverse
rMIDAS: Multiple Imputation with Denoising Autoencoders
A tool for multiply imputing missing data using 'MIDAS', a deep learning method based on denoising autoencoder neural networks. This algorithm offers significant accuracy and efficiency advantages over other multiple imputation strategies, particularly when applied to large datasets with complex features. Alongside interfacing with 'Python' to run the core algorithm, this package contains functions for processing data before and after model training, running imputation model diagnostics, generating multiple completed datasets, and estimating regression models on these datasets.
Maintained by Thomas Robinson. Last updated 1 year ago.
deep-learning imputation-methods neural-network reticulate tensorflow
24.7 match 34 stars 6.53 score 33 scripts
dsalfran
ImputeRobust: Robust Multiple Imputation with Generalized Additive Models for Location Scale and Shape
Provides new imputation methods for the 'mice' package based on generalized additive models for location, scale, and shape (GAMLSS) as described in de Jong, van Buuren and Spiess <doi:10.1080/03610918.2014.911894>.
Maintained by Daniel Salfran. Last updated 6 years ago.
imputation missing-data multiple-imputation
44.2 match 9 stars 3.65 score 4 scripts
farhadpishgar
MatchThem: Matching and Weighting Multiply Imputed Datasets
Provides essential tools for the pre-processing techniques of matching and weighting multiply imputed datasets. The package includes functions for matching within and across multiply imputed datasets using various methods, estimating weights for units in the imputed datasets using multiple weighting methods, calculating causal effect estimates in each matched or weighted dataset using parametric or non-parametric statistical models, and pooling the resulting estimates according to Rubin's rules (please see <https://journal.r-project.org/archive/2021/RJ-2021-073/> for more details).
Maintained by Farhad Pishgar. Last updated 5 months ago.
21.8 match 16 stars 7.34 score 112 scripts
welch-lab
cytosignal: What the Package Does (One Line, Title Case)
What the package does (one paragraph).
Maintained by Jialin Liu. Last updated 5 days ago.
26.7 match 16 stars 5.95 score 6 scripts
haghish
mlim: Single and Multiple Imputation with Automated Machine Learning
Machine learning algorithms have been used for performing single missing data imputation and, most recently, multiple imputation. However, this is the first attempt at using automated machine learning algorithms for performing both single and multiple imputation. Automated machine learning is a procedure for fine-tuning the model automatically, performing a random search for a model that results in less error without overfitting the data. The main idea is to allow the model to set its own parameters for imputing each variable separately instead of setting fixed predefined parameters to impute all variables of the dataset. Using automated machine learning, the package fine-tunes an Elastic Net (default) or Gradient Boosting, Random Forest, Deep Learning, Extreme Gradient Boosting, or Stacked Ensemble machine learning model (from one or a combination of other supported algorithms) for imputing the missing observations. This procedure has been implemented for the first time by this package and is expected to outperform other packages for imputing missing data that do not fine-tune their models. The multiple imputation is implemented via bootstrapping without letting the duplicated observations harm the cross-validation procedure, which is the way imputed variables are evaluated. Most notably, the package implements an automated procedure for handling the imputation of imbalanced data (class rarity problem), which happens when a factor variable has a level that is far more prevalent than the other(s). This is known to result in biased predictions and hence biased imputation of missing data. However, the autobalancing procedure ensures that instead of focusing on maximizing accuracy (classification error) in imputing factor variables, a fairer procedure and imputation method is practiced.
Maintained by E. F. Haghish. Last updated 8 months ago.
automatic-machine-learningautomlclassimbalancedata-scienceelastic-netextreme-gradient-boostinggbmglmgradient-boostinggradient-boosting-machineimputationimputation-algorithmimputation-methodsmachine-learningmissing-datamultipleimputationstack-ensemble
35.0 match 31 stars 4.49 score 7 scripts
david-cortes
isotree:Isolation-Based Outlier Detection
Fast and multi-threaded implementation of isolation forest (Liu, Ting, Zhou (2008) <doi:10.1109/ICDM.2008.17>), extended isolation forest (Hariri, Kind, Brunner (2018) <doi:10.48550/arXiv.1811.02141>), SCiForest (Liu, Ting, Zhou (2010) <doi:10.1007/978-3-642-15883-4_18>), fair-cut forest (Cortes (2021) <doi:10.48550/arXiv.2110.13402>), robust random-cut forest (Guha, Mishra, Roy, Schrijvers (2016) <http://proceedings.mlr.press/v48/guha16.html>), and customizable variations of them, for isolation-based outlier detection, clustered outlier detection, distance or similarity approximation (Cortes (2019) <doi:10.48550/arXiv.1910.12362>), isolation kernel calculation (Ting, Zhu, Zhou (2018) <doi:10.1145/3219819.3219990>), and imputation of missing values (Cortes (2019) <doi:10.48550/arXiv.1911.06646>), based on random or guided decision tree splitting, and providing different metrics for scoring anomalies based on isolation depth or density (Cortes (2021) <doi:10.48550/arXiv.2111.11639>). Provides simple heuristics for fitting the model to categorical columns and handling missing data, and offers options for varying between random and guided splits, and for using different splitting criteria.
Maintained by David Cortes. Last updated 13 days ago.
anomaly-detectionimputationisolation-forestoutlier-detectioncppopenmp
14.2 match 203 stars 10.41 score 115 scripts 6 dependents
jinghuazhao
gap:Genetic Analysis Package
As first reported [Zhao, J. H. 2007. "gap: Genetic Analysis Package". J Stat Soft 23(8):1-18. <doi:10.18637/jss.v023.i08>], it is designed as an integrated package for genetic data analysis of both population and family data. Currently, it contains functions for sample size calculations of both population-based and family-based designs, probability of familial disease aggregation, kinship calculation, statistics in linkage analysis, and association analysis involving genetic markers, including haplotype analysis with or without environmental covariates. Over the years, the package has been developed in between many projects, hence also in line with the name (gap).
Maintained by Jing Hua Zhao. Last updated 15 days ago.
11.9 match 12 stars 11.88 score 448 scripts 16 dependents
jinghuazhao
pan:Multiple Imputation for Multivariate Panel or Clustered Data
It provides functions and examples for maximum likelihood estimation for generalized linear mixed models and a Gibbs sampler for multivariate linear mixed models with incomplete data, as described in Schafer JL (1997) "Imputation of missing covariates under a multivariate linear mixed model". Technical report 97-04, Dept. of Statistics, The Pennsylvania State University.
Maintained by Jing Hua Zhao. Last updated 2 years ago.
17.4 match 1 star 7.86 score 65 scripts 155 dependents
bioc
xcms:LC-MS and GC-MS Data Analysis
Framework for processing and visualization of chromatographically separated and single-spectra mass spectral data. Imports from AIA/ANDI NetCDF, mzXML, mzData and mzML files. Preprocesses data for high-throughput, untargeted analyte profiling.
Maintained by Steffen Neumann. Last updated 1 month ago.
immunooncologymassspectrometrymetabolomicsbioconductorfeature-detectionmass-spectrometrypeak-detectioncpp
9.4 match 192 stars 14.32 score 984 scripts 11 dependents
xsswang
remiod:Reference-Based Multiple Imputation for Ordinal/Binary Response
Reference-based multiple imputation of ordinal and binary responses under a Bayesian framework, as described in Wang and Liu (2022) <arXiv:2203.02771>. Methods for missing-not-at-random data include Jump-to-Reference (J2R), Copy Reference (CR), and Delta Adjustment, which can be used to generate a tipping point analysis.
Maintained by Tony Wang. Last updated 2 years ago.
bayesiancontrol-basedcopy-referencedelta-adjustmentgeneralized-linear-modelsglmjagsjump-to-referencemcmcmissing-at-randommissing-datamissing-not-at-randommultiple-imputationnon-ignorableordinal-regressionpattern-mixture-modelreference-basedstatisticscpp
30.2 match 4.30 score 3 scripts
mwheymans
miceafter:Data and Statistical Analyses after Multiple Imputation
Statistical Analyses and Pooling after Multiple Imputation. A large variety of repeated statistical analyses can be performed and finally pooled. Available statistical analyses include, among others, Levene's test, odds and risk ratios, one-sample proportions, differences between proportions, and linear and logistic regression models. Functions can also be used in combination with the pipe operator. More statistical analyses and pooling functions will be added over time. Heymans (2007) <doi:10.1186/1471-2288-7-33>. Eekhout (2017) <doi:10.1186/s12874-017-0404-7>. Wiel (2009) <doi:10.1093/biostatistics/kxp011>. Marshall (2009) <doi:10.1186/1471-2288-9-57>. Sidi (2021) <doi:10.1080/00031305.2021.1898468>. Lott (2018) <doi:10.1080/00031305.2018.1473796>. Grund (2021) <doi:10.31234/osf.io/d459g>.
Maintained by Martijn Heymans. Last updated 2 years ago.
26.6 match 2 stars 4.84 score 23 scripts
tdjorgensen
lavaan.mi:Fit Structural Equation Models to Multiply Imputed Data
The primary purpose of 'lavaan.mi' is to extend the functionality of the R package 'lavaan', which implements structural equation modeling (SEM). When incomplete data have been multiply imputed, the imputed data sets can be analyzed by 'lavaan' using complete-data estimation methods, but results must be pooled across imputations (Rubin, 1987, <doi:10.1002/9780470316696>). The 'lavaan.mi' package automates the pooling of point and standard-error estimates, as well as a variety of test statistics, using a familiar interface that allows users to fit an SEM to multiple imputations as they would to a single data set using the 'lavaan' package.
Maintained by Terrence D. Jorgensen. Last updated 3 days ago.
23.4 match 3 stars 5.45 score
bioc
ADImpute:Adaptive Dropout Imputer (ADImpute)
Single-cell RNA sequencing (scRNA-seq) methods are typically unable to quantify the expression levels of all genes in a cell, creating a need for the computational prediction of missing values (‘dropout imputation’). Most existing dropout imputation methods are limited in the sense that they exclusively use the scRNA-seq dataset at hand and do not exploit external gene-gene relationship information. Here we propose two novel methods: a gene regulatory network-based approach using gene-gene relationships learnt from external data and a baseline approach corresponding to a sample-wide average. ADImpute can implement these novel methods and also combine them with existing imputation methods (currently supported: DrImpute, SAVER). ADImpute can learn the best performing method per gene and combine the results from different methods into an ensemble.
Maintained by Ana Carolina Leote. Last updated 5 months ago.
geneexpressionnetworkpreprocessingsequencingsinglecelltranscriptomics
29.6 match 4.30 score 7 scripts
phargarten2
miWQS:Multiple Imputation Using Weighted Quantile Sum Regression
The miWQS package handles the uncertainty due to values below the detection limit in a correlated component mixture problem. Researchers want to determine whether a set/mixture of continuous and correlated components/chemicals is associated with an outcome and, if so, which components are important in that mixture. These components share a common outcome but are interval-censored between zero and low thresholds, or detection limits, that may differ across the components. This package applies the multiple imputation (MI) procedure to the weighted quantile sum regression (WQS) methodology for continuous, binary, or count outcomes (Hargarten & Wheeler (2020) <doi:10.1016/j.envres.2020.109466>). The imputation models are: bootstrapping imputation (Lubin et al (2004) <doi:10.1289/ehp.7199>), univariate Bayesian imputation (Hargarten & Wheeler (2020) <doi:10.1016/j.envres.2020.109466>), and multivariate Bayesian regression imputation.
Maintained by Paul M. Hargarten. Last updated 1 year ago.
26.2 match 2 stars 4.78 score 20 scripts 1 dependent
bioc
DEP:Differential Enrichment analysis of Proteomics data
This package provides an integrated analysis workflow for robust and reproducible analysis of mass spectrometry proteomics data for differential protein expression or differential enrichment. It requires tabular input (e.g. txt files) as generated by quantitative analysis software for raw mass spectrometry data, such as MaxQuant or IsobarQuant. Functions are provided for data preparation, filtering, variance normalization and imputation of missing values, as well as statistical testing of differentially enriched / expressed proteins. It also includes tools to check intermediate steps in the workflow, such as normalization and missing value imputation. Finally, visualization tools are provided to explore the results, including heatmap, volcano plot and barplot representations. For scientists with limited experience in R, the package also contains wrapper functions that cover the complete analysis workflow and generate a report. Even easier to use are the interactive Shiny apps that are provided by the package.
Maintained by Arne Smits. Last updated 5 months ago.
immunooncologyproteomicsmassspectrometrydifferentialexpressiondatarepresentation
17.6 match 7.10 score 628 scripts
vaudigier
micemd:Multiple Imputation by Chained Equations with Multilevel Data
Addons for the 'mice' package to perform multiple imputation using chained equations with two-level data. Includes imputation methods dedicated to sporadically and systematically missing values. Imputation of continuous, binary or count variables is available. Following the recommendations of Audigier, V. et al (2018) <doi:10.1214/18-STS646>, the choice of the imputation method for each variable can be facilitated by a default choice tuned according to the structure of the incomplete dataset. Allows parallel calculation and overimputation for 'mice'.
Maintained by Vincent Audigier. Last updated 1 year ago.
40.3 match 1 star 3.08 score 80 scripts 1 dependent
mariechion
mi4p:Multiple Imputation for Proteomics
A framework for multiple imputation for proteomics is proposed by Marie Chion, Christine Carapito and Frederic Bertrand (2021) <doi:10.1371/journal.pcbi.1010420>. It is dedicated to dealing with multiple imputation for proteomics.
Maintained by Frederic Bertrand. Last updated 5 months ago.
23.9 match 6 stars 4.91 score 27 scripts
jwb133
InformativeCensoring:Multiple Imputation for Informative Censoring
Multiple Imputation for Informative Censoring. This package implements two methods: Gamma Imputation, described in <DOI:10.1002/sim.6274>, and Risk Score Imputation, described in <DOI:10.1002/sim.3480>.
Maintained by Jonathan Bartlett. Last updated 2 years ago.
24.2 match 4.78 score 9 scripts 1 dependent
elliecurnow
midoc:A Decision-Making System for Multiple Imputation
A guidance system for analysis with missing data. It incorporates expert, up-to-date methodology to help researchers choose the most appropriate analysis approach when some data are missing. You provide the available data and the assumed causal structure, including the likely causes of missing data. 'midoc' will advise which analysis approaches can be used, and how best to perform them. 'midoc' follows the framework for the treatment and reporting of missing data in observational studies (TARMOS). Lee et al (2021). <doi:10.1016/j.jclinepi.2021.01.008>.
Maintained by Elinor Curnow. Last updated 5 months ago.
missing-datamultiple-imputation
21.6 match 6 stars 5.32 score 8 scripts
amices
ggmice:Visualizations for 'mice' with 'ggplot2'
Enhance a 'mice' imputation workflow with visualizations for incomplete and/or imputed data. The plotting functions produce 'ggplot' objects which may be easily manipulated or extended. Use 'ggmice' to inspect missing data, develop imputation models, evaluate algorithmic convergence, or compare observed versus imputed data.
Maintained by Hanne Oberman. Last updated 8 months ago.
15.4 match 32 stars 7.42 score 165 scripts
bioc
pcaMethods:A collection of PCA methods
Provides Bayesian PCA, Probabilistic PCA, Nipals PCA, Inverse Non-Linear PCA and the conventional SVD PCA. A cluster based method for missing value estimation is included for comparison. BPCA, PPCA and NipalsPCA may be used to perform PCA on incomplete data as well as for accurate missing value estimation. A set of methods for printing and plotting the results is also provided. All PCA methods make use of the same data structure (pcaRes) to provide a common interface to the PCA results. Initiated at the Max-Planck Institute for Molecular Plant Physiology, Golm, Germany.
Maintained by Henning Redestig. Last updated 5 months ago.
8.7 match 49 stars 13.10 score 538 scripts 73 dependents
tslumley
mitools:Tools for Multiple Imputation of Missing Data
Tools to perform analyses and combine results from multiple-imputation datasets.
Maintained by Thomas Lumley. Last updated 6 years ago.
11.9 match 1 star 9.50 score 716 scripts 249 dependents
mroman-ibs
FuzzyImputationTest:Imputation Procedures and Quality Tests for Fuzzy Data
Special procedures for the imputation of missing fuzzy numbers are still underdeveloped. The goal of the package is to provide the new d-imputation method (DIMP for short, Romaniuk, M. and Grzegorzewski, P. (2023) "Fuzzy Data Imputation with DIMP and FGAIN" RB/23/2023) and to convert some classical methods applied in R packages ('missForest','miceRanger','knn') for use with fuzzy datasets. Additionally, specially tailored benchmarking tests are provided to check and compare these imputation procedures with fuzzy datasets.
Maintained by Maciej Romaniuk. Last updated 2 months ago.
21.9 match 5.06 score 2 scripts
jaredlander
useful:A Collection of Handy, Useful Functions
A set of little functions that have been found useful to do little odds and ends such as plotting the results of K-means clustering, substituting special text characters, viewing parts of a data.frame, constructing formulas from text and building design and response matrices.
Maintained by Jared P. Lander. Last updated 1 year ago.
14.3 match 8 stars 7.75 score 748 scripts 6 dependents
bioc
methyLImp2:Missing value estimation of DNA methylation data
This package estimates missing values in DNA methylation data. The methyLImp method is based on linear regression, since methylation levels show a high degree of inter-sample correlation. The implementation is parallelised over chromosomes, since probes on different chromosomes are usually independent. A mini-batch approach is available to reduce the runtime when the number of samples is large.
Maintained by Anna Plaksienko. Last updated 1 month ago.
dnamethylationmicroarraysoftwaremethylationarrayregressionimputationmethylationmissing-value-imputation
18.5 match 6 stars 5.62 score 3 scripts
stekhoven
missForest:Nonparametric Missing Value Imputation using Random Forest
The function 'missForest' in this package is used to impute missing values particularly in the case of mixed-type data. It uses a random forest trained on the observed values of a data matrix to predict the missing values. It can be used to impute continuous and/or categorical data including complex interactions and non-linear relations. It yields an out-of-bag (OOB) imputation error estimate without the need of a test set or elaborate cross-validation. It can be run in parallel to save computation time.
Maintained by Daniel J. Stekhoven. Last updated 1 year ago.
9.0 match 92 stars 11.53 score 1.1k scripts 32 dependents
alexwhitworth
imputeMulti:Imputation Methods for Multivariate Multinomial Data
Implements imputation methods using EM and Data Augmentation for multinomial data following the work of Schafer 1997 <ISBN: 978-0-412-04061-0>.
Maintained by Alex Whitworth. Last updated 3 years ago.
imputationimputation-methodsmultivariate-multinomial-datacpp
24.6 match 5 stars 4.15 score 28 scripts
ngreifer
cobalt:Covariate Balance Tables and Plots
Generate balance tables and plots for covariates of groups preprocessed through matching, weighting or subclassification, for example, using propensity scores. Includes integration with 'MatchIt', 'WeightIt', 'MatchThem', 'twang', 'Matching', 'optmatch', 'CBPS', 'ebal', 'cem', 'sbw', and 'designmatch' for assessing balance on the output of their preprocessing functions. Users can also specify data for balance assessment not generated through the above packages. Also included are methods for assessing balance in clustered or multiply imputed data sets or data sets with multi-category, continuous, or longitudinal treatments.
Maintained by Noah Greifer. Last updated 11 months ago.
causal-inferencepropensity-scores
7.6 match 75 stars 12.98 score 1.0k scripts 8 dependents
bgoodri
mi:Missing Data Imputation and Model Checking
The mi package provides functions for data manipulation, imputing missing values in an approximate Bayesian framework, diagnostics of the models used to generate the imputations, confidence-building mechanisms to validate some of the assumptions of the imputation algorithm, and functions to analyze multiply imputed data sets with the appropriate degree of sampling uncertainty.
Maintained by Ben Goodrich. Last updated 3 years ago.
11.8 match 2 stars 8.25 score 244 scripts 47 dependents
rezakj
iCellR:Analyzing High-Throughput Single Cell Sequencing Data
A toolkit that allows scientists to work with data from single cell sequencing technologies such as scRNA-seq, scVDJ-seq, scATAC-seq, CITE-Seq and Spatial Transcriptomics (ST). Single (i) Cell R package ('iCellR') provides unprecedented flexibility at every step of the analysis pipeline, including normalization, clustering, dimensionality reduction, imputation, visualization, and so on. Users can design both unsupervised and supervised models to best suit their research. In addition, the toolkit provides 2D and 3D interactive visualizations, differential expression analysis, filters based on cells, genes and clusters, data merging, normalizing for dropouts, data imputation methods, correcting for batch differences, pathway analysis, tools to find marker genes for clusters and conditions, predict cell types and pseudotime analysis. See Khodadadi-Jamayran, et al (2020) <doi:10.1101/2020.05.05.078550> and Khodadadi-Jamayran, et al (2020) <doi:10.1101/2020.03.31.019109> for more details.
Maintained by Alireza Khodadadi-Jamayran. Last updated 8 months ago.
10xgenomics3dbatch-normalizationcell-type-classificationcite-seqclusteringclustering-algorithmdiffusion-mapsdropouticellrimputationintractive-graphnormalizationpseudotimescrna-seqscvdj-seqsingel-cell-sequencingumapcpp
17.3 match 121 stars 5.56 score 7 scripts 1 dependent
bips-hb
micd:Multiple Imputation in Causal Graph Discovery
Modified functions of the package 'pcalg' and some additional functions to run the PC and the FCI (Fast Causal Inference) algorithm for constraint-based causal discovery in incomplete and multiply imputed datasets. Foraita R, Friemel J, Günther K, Behrens T, Bullerdiek J, Nimzyk R, Ahrens W, Didelez V (2020) <doi:10.1111/rssa.12565>; Andrews RM, Foraita R, Didelez V, Witte J (2021) <arXiv:2108.13395>; Witte J, Foraita R, Didelez V (2022) <doi:10.1002/sim.9535>.
Maintained by Ronja Foraita. Last updated 2 years ago.
causal-discoverygraphical-modelsmultiple-imputation
24.2 match 5 stars 3.70 score 20 scripts
rqtl
qtl2:Quantitative Trait Locus Mapping in Experimental Crosses
Provides a set of tools to perform quantitative trait locus (QTL) analysis in experimental crosses. It is a reimplementation of the 'R/qtl' package to better handle high-dimensional data and complex cross designs. Broman et al. (2019) <doi:10.1534/genetics.118.301595>.
Maintained by Karl W Broman. Last updated 7 days ago.
9.2 match 34 stars 9.48 score 1.1k scripts 5 dependents
jacky11
imp4p:Imputation for Proteomics
Functions to analyse missing value mechanisms and to impute data sets in the context of bottom-up MS-based proteomics.
Maintained by Quentin Giai Gianetto. Last updated 4 years ago.
43.4 match 1 star 2.00 score 33 scripts 1 dependent
bioc
MSnbase:Base Functions and Classes for Mass Spectrometry and Proteomics
MSnbase provides infrastructure for manipulation, processing and visualisation of mass spectrometry and proteomics data, ranging from raw to quantitative and annotated data.
Maintained by Laurent Gatto. Last updated 17 hours ago.
immunooncologyinfrastructureproteomicsmassspectrometryqualitycontroldataimportbioconductorbioinformaticsmass-spectrometryproteomics-datavisualisationcpp
6.8 match 130 stars 12.81 score 772 scripts 36 dependents
privefl
bigsnpr:Analysis of Massive SNP Arrays
Easy-to-use, efficient, flexible and scalable tools for analyzing massive SNP arrays. Privé et al. (2018) <doi:10.1093/bioinformatics/bty185>.
Maintained by Florian Privé. Last updated 9 days ago.
big-databioinformaticsmemory-mapped-fileparallel-computingpolygenic-scorespopulation-structure-inferencesnp-datastatistical-methodsopenblaszlibcppopenmp
7.5 match 200 stars 11.44 score 1.5k scripts 3 dependents
sibipx
missForestPredict:Missing Value Imputation using Random Forest for Prediction Settings
Missing data imputation based on the 'missForest' algorithm (Stekhoven, Daniel J (2012) <doi:10.1093/bioinformatics/btr597>) with adaptations for prediction settings. The function missForest() is used to impute a (training) dataset with missing values and to learn imputation models that can be later used for imputing new observations. The function missForestPredict() is used to impute one or multiple new observations (test set) using the models learned on the training data.
Maintained by Elena Albu. Last updated 1 year ago.
21.3 match 4.00 score 3 scripts
bioc
MAI:Mechanism-Aware Imputation
A two-step approach to imputing missing data in metabolomics. Step 1 uses a random forest classifier to classify missing values as either Missing Completely at Random/Missing At Random (MCAR/MAR) or Missing Not At Random (MNAR). MCAR/MAR are combined because it is often difficult to distinguish these two missing types in metabolomics data. Step 2 imputes the missing values based on the classified missing mechanisms, using the appropriate imputation algorithms. Imputation algorithms tested and available for MCAR/MAR include Bayesian Principal Component Analysis (BPCA), Multiple Imputation No-Skip K-Nearest Neighbors (Multi_nsKNN), and Random Forest. Imputation algorithms tested and available for MNAR include nsKNN and a single imputation approach for imputation of metabolites where left-censoring is present.
Maintained by Jonathan Dekermanjian. Last updated 5 months ago.
softwaremetabolomicsstatisticalmethodclassificationimputation-methodsmachine-learningmissing-data
16.9 match 2 stars 5.00 score 6 scripts
randel
MixRF:A Random-Forest-Based Approach for Imputing Clustered Incomplete Data
It offers random-forest-based functions to impute clustered incomplete data. The package is tailored for but not limited to imputing multitissue expression data, in which a gene's expression is measured on the collected tissues of an individual but missing on the uncollected tissues.
Maintained by Jiebiao Wang. Last updated 8 years ago.
gene-expressionimputationmixed-modelsrandom-forest
19.2 match 35 stars 4.39 score 14 scripts
bioc
rexposome:Exposome exploration and outcome data analysis
Package for exploring the exposome and performing association analyses between exposures and health outcomes.
Maintained by Xavier Escribà Montagut. Last updated 5 months ago.
softwarebiologicalquestioninfrastructuredataimportdatarepresentationbiomedicalinformaticsexperimentaldesignmultiplecomparisonclassificationclustering
14.6 match 5.70 score 28 scripts 1 dependent
bioc
QFeatures:Quantitative features for mass spectrometry data
The QFeatures infrastructure enables the management and processing of quantitative features for high-throughput mass spectrometry assays. It provides a familiar Bioconductor user experience to manage quantitative data across different assay levels (such as peptide spectrum matches, peptides and proteins) in a coherent and tractable format.
Maintained by Laurent Gatto. Last updated 11 days ago.
infrastructuremassspectrometryproteomicsmetabolomicsbioconductormass-spectrometry
6.9 match 27 stars 11.87 score 278 scripts 49 dependents
bioc
Melissa:Bayesian clustering and imputation of single cell methylomes
Melissa is a Bayesian probabilistic model for jointly clustering and imputing single cell methylomes. This is done by taking into account local correlations via a Generalised Linear Model approach and global similarities using a mixture modelling approach.
Maintained by C. A. Kapourani. Last updated 5 months ago.
immunooncologydnamethylationgeneexpressiongeneregulationepigeneticsgeneticsclusteringfeatureextractionregressionrnaseqbayesiankeggsequencingcoveragesinglecell
16.1 match 4.90 score 7 scripts
zjg540066169
SBMTrees:Sequential Imputation with Bayesian Trees Mixed-Effects Models for Longitudinal Data
Implements a sequential imputation framework using Bayesian Mixed-Effects Trees ('SBMTrees') for handling missing data in longitudinal studies. The package supports a variety of models, including non-linear relationships and non-normal random effects and residuals, leveraging Dirichlet Process priors for increased flexibility. Key features include handling Missing at Random (MAR) longitudinal data, imputation of both covariates and outcomes, and generating posterior predictive samples for further analysis. The methodology is designed for applications in epidemiology, biostatistics, and other fields requiring robust handling of missing data in longitudinal settings.
Maintained by Jungang Zou. Last updated 3 months ago.
bayesian-machine-learninglongitudinal-datamissing-data-imputationopenblascpp
17.1 match 1 star 4.40 score 10 scripts
alexpkeil1
qgcomp:Quantile G-Computation
G-computation for a set of time-fixed exposures with quantile-based basis functions, possibly under linearity and homogeneity assumptions. This approach estimates a regression line corresponding to the expected change in the outcome (on the link basis) given a simultaneous increase in the quantile-based category for all exposures. Works with continuous, binary, and right-censored time-to-event outcomes. Reference: Alexander P. Keil, Jessie P. Buckley, Katie M. O'Brien, Kelly K. Ferguson, Shanshan Zhao, and Alexandra J. White (2019) A quantile-based g-computation approach to addressing the effects of exposure mixtures; <doi:10.1289/EHP5838>.
Maintained by Alexander Keil. Last updated 3 days ago.
exposureexposure-mixtureexposure-mixturesquantile-gcomputationsurvival
8.5 match 37 stars 8.73 score 70 scripts 2 dependents
bioc
MOFA2:Multi-Omics Factor Analysis v2
The MOFA2 package contains a collection of tools for training and analysing multi-omic factor analysis (MOFA). MOFA is a probabilistic factor model that aims to identify principal axes of variation from data sets that can comprise multiple omic layers and/or groups of samples. Additional time or space information on the samples can be incorporated using the MEFISTO framework, which is part of MOFA2. Downstream analysis functions are available for inspecting the molecular features underlying each factor, visualisation, imputation, etc.
Maintained by Ricard Argelaguet. Last updated 5 months ago.
dimensionreductionbayesianvisualizationfactor-analysismofamulti-omics
7.3 match 319 stars 10.02 score 502 scripts
bioc
gwasurvivr:gwasurvivr: an R package for genome wide survival analysis
gwasurvivr is a package to perform survival analysis using Cox proportional hazard models on imputed genetic data.
Maintained by Abbas Rizvi. Last updated 5 months ago.
genomewideassociationsurvivalregressiongeneticssnpgeneticvariabilitypharmacogenomicsbiomedicalinformatics
11.3 match 12 stars 6.43 score 75 scripts
bioc
Pirat:Precursor or Peptide Imputation under Random Truncation
Pirat enables the imputation of missing values (either MNARs or MCARs) in bottom-up LC-MS/MS proteomics data using a penalized maximum likelihood strategy. It does not require any parameter tuning; it models the instrument censorship from the available data. It accounts for correlations between sibling peptides and can leverage complementary transcriptomics measurements.
Maintained by Samuel Wieczorek. Last updated 5 months ago.
proteomicsmassspectrometrypreprocessingsoftware
14.9 match 4.81 score 9 scripts
boennecd
mdgc:Missing Data Imputation Using Gaussian Copulas
Provides functions to impute missing values using Gaussian copulas for mixed data types as described by Christoffersen et al. (2021) <arXiv:2102.02642>. The method is related to Hoff (2007) <doi:10.1214/07-AOAS107> and Zhao and Udell (2019) <arXiv:1910.12845> but differs by making a direct approximation of the log marginal likelihood using an extended version of the Fortran code created by Genz and Bretz (2002) <doi:10.1198/106186002394>, and by also supporting multinomial variables.
Maintained by Benjamin Christoffersen. Last updated 2 years ago.
binarygaussian-copulaimputationmultinomial-variablesordinalsemi-parametricfortranopenblascppopenmp
19.0 match 10 stars 3.78 score 12 scripts
jwb133
dejaVu:Multiple Imputation for Recurrent Events
Performs reference based multiple imputation of recurrent event data based on a negative binomial regression model, as described by Keene et al (2014) <doi:10.1002/pst.1624>.
Maintained by Jonathan Bartlett. Last updated 8 months ago.
15.0 match 4.68 score 24 scripts
marberts
piar:Price Index Aggregation
Most price indexes are made with a two-step procedure, where period-over-period elemental indexes are first calculated for a collection of elemental aggregates at each point in time, and then aggregated according to a price index aggregation structure. These indexes can then be chained together to form a time series that gives the evolution of prices with respect to a fixed base period. This package contains a collection of functions that revolve around this workflow, making it easy to build standard price indexes and implement the methods described by Balk (2008, <doi:10.1017/CBO9780511720758>), von der Lippe (2007, <doi:10.3726/978-3-653-01120-3>), and the CPI manual (2020, <doi:10.5089/9781484354841.069>) for bilateral price indexes.
Maintained by Steve Martin. Last updated 13 days ago.
economicsinflationofficial-statisticsstatistics
9.6 match 4 stars 7.32 score 25 scripts
wadpac
GGIR:Raw Accelerometer Data Analysis
A tool to process and analyse data collected with wearable raw acceleration sensors as described in Migueles and colleagues (JMPB 2019), and van Hees and colleagues (JApplPhysiol 2014; PLoSONE 2015). The package has been developed and tested for binary data from 'GENEActiv' <https://activinsights.com/>, binary (.gt3x) and .csv-export data from 'Actigraph' <https://theactigraph.com> devices, and binary (.cwa) and .csv-export data from 'Axivity' <https://axivity.com>. These devices are currently widely used in research on human daily physical activity. Further, the package can handle accelerometer data files from any other sensor brand, provided that the data are stored in csv format. The package also allows for external function embedding.
Maintained by Vincent T van Hees. Last updated 15 hours ago.
accelerometeractivity-recognitioncircadian-rhythmmovement-sensorsleep
5.3 match 109 stars 13.20 score 342 scripts 3 dependents
japal
zCompositions:Treatment of Zeros, Left-Censored and Missing Values in Compositional Data Sets
Principled methods for the imputation of zeros, left-censored and missing data in compositional data sets (Palarea-Albaladejo and Martin-Fernandez (2015) <doi:10.1016/j.chemolab.2015.02.019>).
Maintained by Javier Palarea-Albaladejo. Last updated 9 months ago.
censored-datacompositional-dataimputation-methodsmissing-datanondetection
8.0 match 7 stars 8.55 score 370 scripts 11 dependents
jackdunnnz
iai:Interface to 'Interpretable AI' Modules
An interface to the algorithms of 'Interpretable AI' <https://www.interpretable.ai> from the R programming language. 'Interpretable AI' provides various modules, including 'Optimal Trees' for classification, regression, prescription and survival analysis, 'Optimal Imputation' for missing data imputation and outlier detection, and 'Optimal Feature Selection' for exact sparse regression. The 'iai' package is an open-source project. The 'Interpretable AI' software modules are proprietary products, but free academic and evaluation licenses are available.
Maintained by Jack Dunn. Last updated 5 months ago.
33.9 match 1 stars 2.00 score 7 scripts
marjoleinf
pre:Prediction Rule Ensembles
Derives prediction rule ensembles (PREs). Largely follows the procedure for deriving PREs as described in Friedman & Popescu (2008; <DOI:10.1214/07-AOAS148>), with adjustments and improvements. The main function pre() derives prediction rule ensembles consisting of rules and/or linear terms for continuous, binary, count, multinomial, and multivariate continuous responses. Function gpe() derives generalized prediction ensembles, consisting of rules, hinge and linear functions of the predictor variables.
Maintained by Marjolein Fokkema. Last updated 9 months ago.
7.8 match 58 stars 8.49 score 98 scripts 1 dependents
bioc
aCGH:Classes and functions for Array Comparative Genomic Hybridization data
Functions for reading aCGH data from image analysis output files and clone information files, creation of aCGH S3 objects for storing these data. Basic methods for accessing/replacing, subsetting, printing and plotting aCGH objects.
Maintained by Peter Dimitrov. Last updated 5 months ago.
copynumbervariationdataimportgeneticscpp
12.3 match 5.38 score 9 scripts 4 dependents
martinster
modi:Multivariate Outlier Detection and Imputation for Incomplete Survey Data
Algorithms for multivariate outlier detection when missing values occur. Algorithms are based on Mahalanobis distance or data depth. Imputation is based on the multivariate normal model or uses nearest neighbour donors. The algorithms take sample designs, in particular weighting, into account. The methods are described in Bill and Hulliger (2016) <doi:10.17713/ajs.v45i1.86>.
Maintained by Beat Hulliger. Last updated 2 years ago.
10.7 match 4 stars 6.02 score 88 scripts 1 dependents
stamats
MKinfer:Inferential Statistics
Computation of various confidence intervals (Altman et al. (2000), ISBN:978-0-727-91375-3; Hedderich and Sachs (2018), ISBN:978-3-662-56657-2) including bootstrapped versions (Davison and Hinkley (1997), ISBN:978-0-511-80284-3) as well as Hsu (Hedderich and Sachs (2018), ISBN:978-3-662-56657-2), permutation (Janssen (1997), <doi:10.1016/S0167-7152(97)00043-6>), bootstrap (Davison and Hinkley (1997), ISBN:978-0-511-80284-3), intersection-union (Sozu et al. (2015), ISBN:978-3-319-22005-5) and multiple imputation (Barnard and Rubin (1999), <doi:10.1093/biomet/86.4.948>) t-test; furthermore, computation of intersection-union z-test as well as multiple imputation Wilcoxon tests. Graphical visualization by volcano and Bland-Altman plots (Bland and Altman (1986), <doi:10.1016/S0140-6736(86)90837-8>; Shieh (2018), <doi:10.1186/s12874-018-0505-y>).
Maintained by Matthias Kohl. Last updated 11 months ago.
9.7 match 6 stars 6.56 score 71 scripts 4 dependents
jwb133
gFormulaMI:G-Formula for Causal Inference via Multiple Imputation
Implements the G-Formula method for causal inference with time-varying treatments and confounders using Bayesian multiple imputation methods, as described by Bartlett, Olarte Parra and Daniel (2023) <arXiv:2301.12026>. It creates multiple synthetic imputed datasets under treatment regimes of interest using the 'mice' package. These can then be analysed using rules developed for analysing multiple synthetic datasets.
Maintained by Jonathan Bartlett. Last updated 1 year ago.
14.0 match 7 stars 4.54 score 7 scripts
kogalur
randomForestSRC:Fast Unified Random Forests for Survival, Regression, and Classification (RF-SRC)
Fast OpenMP parallel computing of Breiman's random forests for univariate, multivariate, unsupervised, survival, competing risks, class imbalanced classification and quantile regression. New Mahalanobis splitting for correlated outcomes. Extreme random forests and randomized splitting. Suite of imputation methods for missing data. Fast random forests using subsampling. Confidence regions and standard errors for variable importance. New improved holdout importance. Case-specific importance. Minimal depth variable importance. Visualize trees on your Safari or Google Chrome browser. Anonymous random forests for data privacy.
Maintained by Udaya B. Kogalur. Last updated 2 months ago.
8.0 match 10 stars 7.90 score 1.2k scripts 12 dependents
bioc
msImpute:Imputation of label-free mass spectrometry peptides
MsImpute is a package for imputation of peptide intensity in proteomics experiments. It additionally contains tools for MAR/MNAR diagnosis and assessment of distortions to the probability distribution of the data post imputation. The missing values are imputed by low-rank approximation of the underlying data matrix if they are MAR (method = "v2"), by Barycenter approach if missingness is MNAR ("v2-mnar"), or by Peptide Identity Propagation (PIP).
Maintained by Soroor Hediyeh-zadeh. Last updated 5 months ago.
massspectrometryproteomicssoftwarelabel-free-proteomicslow-rank-approximation
12.1 match 14 stars 5.15 score 7 scripts
bioc
sesame:SEnsible Step-wise Analysis of DNA MEthylation BeadChips
Tools for analyzing Illumina Infinium DNA methylation arrays. SeSAMe provides utilities to support analyses of multiple generations of Infinium DNA methylation BeadChips, including preprocessing, quality control, visualization and inference. SeSAMe features accurate detection calling, intelligent inference of ethnicity, sex and advanced quality control routines.
Maintained by Wanding Zhou. Last updated 2 months ago.
dnamethylationmethylationarraypreprocessingqualitycontrolbioinformaticsdna-methylationmicroarray
6.8 match 69 stars 9.08 score 258 scripts 1 dependents
bioc
MAST:Model-based Analysis of Single Cell Transcriptomics
Methods and models for handling zero-inflated single cell assay data.
Maintained by Andrew McDavid. Last updated 5 months ago.
geneexpressiondifferentialexpressiongenesetenrichmentrnaseqtranscriptomicssinglecell
4.8 match 230 stars 12.75 score 1.8k scripts 5 dependents
ohdsi
PatientLevelPrediction:Develop Clinical Prediction Models Using the Common Data Model
A user friendly way to create patient level prediction models using the Observational Medical Outcomes Partnership Common Data Model. Given a cohort of interest and an outcome of interest, the package can use data in the Common Data Model to build a large set of features. These features can then be used to fit a predictive model with a number of machine learning algorithms. This is further described in Reps (2017) <doi:10.1093/jamia/ocy032>.
Maintained by Egill Fridgeirsson. Last updated 7 days ago.
5.5 match 190 stars 10.85 score 297 scripts
jpquast
protti:Bottom-Up Proteomics and LiP-MS Quality Control and Data Analysis Tools
Useful functions and workflows for proteomics quality control and data analysis of both limited proteolysis-coupled mass spectrometry (LiP-MS) (Feng et al. (2014) <doi:10.1038/nbt.2999>) and regular bottom-up proteomics experiments. Data generated with search tools such as 'Spectronaut', 'MaxQuant' and 'Proteome Discoverer' can be easily used thanks to the flexibility of the functions.
Maintained by Jan-Philipp Quast. Last updated 5 months ago.
data-analysislip-msmass-spectrometryomicsproteinproteomicssystems-biology
7.0 match 61 stars 8.58 score 83 scripts
jinghuazhao
tdthap:TDT Tests for Extended Haplotypes
Functions and examples are provided for Transmission/disequilibrium tests for extended marker haplotypes, as in Clayton, D. and Jones, H. (1999) "Transmission/disequilibrium tests for extended marker haplotypes". Amer. J. Hum. Genet., 65:1161-1169, <doi:10.1086/302566>.
Maintained by Jing Hua Zhao. Last updated 15 days ago.
10.0 match 12 stars 5.92 score 6 scripts
olssol
idem:Inference in Randomized Controlled Trials with Death and Missingness
In randomized studies involving severely ill patients, functional outcomes are often unobserved due to missed clinic visits, premature withdrawal or death. It is well known that if these unobserved functional outcomes are not handled properly, biased treatment comparisons can be produced. In this package, we implement a procedure for comparing treatments that is based on the composite endpoint of both the functional outcome and survival. The procedure was proposed in Wang et al. (2016) <DOI:10.1111/biom.12594> and Wang et al. (2020) <DOI:10.18637/jss.v093.i12>. It considers missing data imputation with different sensitivity analysis strategies to handle the unobserved functional outcomes not due to death.
Maintained by Chenguang Wang. Last updated 2 years ago.
16.7 match 3.51 score 16 scripts
pharmaverse
admiralonco:Oncology Extension Package for ADaM in 'R' Asset Library
Programming oncology specific Clinical Data Interchange Standards Consortium (CDISC) compliant Analysis Data Model (ADaM) datasets in 'R'. ADaM datasets are a mandatory part of any New Drug or Biologics License Application submitted to the United States Food and Drug Administration (FDA). Analysis derivations are implemented in accordance with the "Analysis Data Model Implementation Guide" (CDISC Analysis Data Model Team (2021), <https://www.cdisc.org/standards/foundational/adam>). The package is an extension package of the 'admiral' package.
Maintained by Stefan Bundfuss. Last updated 2 months ago.
6.8 match 32 stars 8.66 score 30 scripts
bioc
ccImpute:ccImpute: an accurate and scalable consensus clustering based approach to impute dropout events in the single-cell RNA-seq data (https://doi.org/10.1186/s12859-022-04814-8)
Dropout events make the lowly expressed genes indistinguishable from true zero expression and different than the low expression present in cells of the same type. This issue makes any subsequent downstream analysis difficult. ccImpute is an imputation algorithm that uses cell similarity established by consensus clustering to impute the most probable dropout events in the scRNA-seq datasets. ccImpute demonstrated performance which exceeds the performance of existing imputation approaches while introducing the least amount of new noise as measured by clustering performance characteristics on datasets with known cell identities.
Maintained by Marcin Malec. Last updated 5 months ago.
singlecellsequencingprincipalcomponentdimensionreductionclusteringrnaseqtranscriptomicsopenblascppopenmp
12.7 match 2 stars 4.48 score 2 scripts
bioc
PhosR:A set of methods and tools for comprehensive analysis of phosphoproteomics data
PhosR is a package for the comprehensive analysis of phosphoproteomic data. There are two major components to PhosR: processing and downstream analysis. PhosR consists of various processing tools for phosphoproteomics data including filtering, imputation, normalisation, and functional analysis for inferring active kinases and signalling pathways.
Maintained by Taiyun Kim. Last updated 5 months ago.
softwareresearchfieldproteomics
12.1 match 4.71 score 51 scripts
belayb
drugprepr:Prepare Electronic Prescription Record Data to Estimate Drug Exposure
Prepare prescription data (such as from the Clinical Practice Research Datalink) into an analysis-ready format, with start and stop dates for each patient's prescriptions. Based on Pye et al (2018) <doi:10.1002/pds.4440>.
Maintained by David Selby. Last updated 3 years ago.
15.4 match 1 stars 3.70 score 3 scripts
bioc
AffiXcan:A Functional Approach To Impute Genetically Regulated Expression
Impute a GReX (Genetically Regulated Expression) for a set of genes in a sample of individuals, using a method based on the Total Binding Affinity (TBA). Statistical models to impute GReX can be trained with a training dataset where the real total expression values are known.
Maintained by Alessandro Lussana. Last updated 5 months ago.
geneexpressiontranscriptiongeneregulationdimensionreductionregressionprincipalcomponent
14.1 match 4.00 score
missvalteam
Iscores:Proper Scoring Rules for Missing Value Imputation
Implementation of a KL-based scoring rule to assess the quality of different missing value imputations in the broad sense as introduced in Michel et al. (2021) <arXiv:2106.03742>.
Maintained by Loris Michel. Last updated 2 years ago.
imputation-methodsmachine-learningmissing-valuesrandom-forest
14.4 match 7 stars 3.91 score 23 scripts
tetratech
baytrends:Long Term Water Quality Trend Analysis
Enable users to evaluate long-term trends using a Generalized Additive Modeling (GAM) approach. The model development includes selecting a GAM structure to describe nonlinear seasonally-varying changes over time, incorporation of hydrologic variability via either a river flow or salinity, the use of an intervention to deal with method or laboratory changes suspected to impact data values, and representation of left- and interval-censored data. The approach has been applied to water quality data in the Chesapeake Bay, a major estuary on the east coast of the United States to provide insights to a range of management- and research-focused questions. Methodology described in Murphy (2019) <doi:10.1016/j.envsoft.2019.03.027>.
Maintained by Erik W Leppo. Last updated 5 months ago.
8.4 match 12 stars 6.67 score 97 scripts
bioc
mixOmics:Omics Data Integration Project
Multivariate methods are well suited to large omics data sets where the number of variables (e.g. genes, proteins, metabolites) is much larger than the number of samples (patients, cells, mice). They have the appealing properties of reducing the dimension of the data by using instrumental variables (components), which are defined as combinations of all variables. Those components are then used to produce useful graphical outputs that enable better understanding of the relationships and correlation structures between the different data sets that are integrated. mixOmics offers a wide range of multivariate methods for the exploration and integration of biological datasets with a particular focus on variable selection. The package proposes several sparse multivariate models we have developed to identify the key variables that are highly correlated, and/or explain the biological outcome of interest. The data that can be analysed with mixOmics may come from high throughput sequencing technologies, such as omics data (transcriptomics, metabolomics, proteomics, metagenomics etc) but also beyond the realm of omics (e.g. spectral imaging). The methods implemented in mixOmics can also handle missing values without having to delete entire rows with missing data. A non exhaustive list of methods include variants of generalised Canonical Correlation Analysis, sparse Partial Least Squares and sparse Discriminant Analysis. Recently we implemented integrative methods to combine multiple data sets: N-integration with variants of Generalised Canonical Correlation Analysis and P-integration with variants of multi-group Partial Least Squares.
Maintained by Eva Hamrud. Last updated 2 days ago.
immunooncologymicroarraysequencingmetabolomicsmetagenomicsproteomicsgenepredictionmultiplecomparisonclassificationregressionbioconductorgenomicsgenomics-datagenomics-visualizationmultivariate-analysismultivariate-statisticsomicsr-pkgr-project
4.0 match 182 stars 13.71 score 1.3k scripts 22 dependents
evolecolgroup
tidypopgen:Tidy Population Genetics
We provide a tidy grammar of population genetics, facilitating the manipulation and analysis of data on biallelic single nucleotide polymorphisms (SNPs).
Maintained by Andrea Manica. Last updated 2 days ago.
9.4 match 4 stars 5.83 score 8 scripts
philchalmers
mirt:Multidimensional Item Response Theory
Analysis of discrete response data using unidimensional and multidimensional item analysis models under the Item Response Theory paradigm (Chalmers (2012) <doi:10.18637/jss.v048.i06>). Exploratory and confirmatory item factor analysis models are estimated with quadrature (EM) or stochastic (MHRM) methods. Confirmatory bi-factor and two-tier models are available for modeling item testlets using dimension reduction EM algorithms, while multiple group analyses and mixed effects designs are included for detecting differential item, bundle, and test functioning, and for modeling item and person covariates. Finally, latent class models such as the DINA, DINO, multidimensional latent class, mixture IRT models, and zero-inflated response models are supported, as well as a wide family of probabilistic unfolding models.
Maintained by Phil Chalmers. Last updated 10 days ago.
3.6 match 210 stars 14.98 score 2.5k scripts 40 dependents
ikwak2
DrImpute:Imputing Dropout Events in Single-Cell RNA-Sequencing Data
R codes for imputing dropout events. Many statistical methods in cell type identification, visualization and lineage reconstruction do not account for dropout events ('PCAreduce', 'SC3', 'PCA', 't-SNE', 'Monocle', 'TSCAN', etc). 'DrImpute' can improve the performance of such software by imputing dropout events.
Maintained by Il-Youp Kwak. Last updated 8 years ago.
8.1 match 4 stars 6.66 score 77 scripts 1 dependents
bioc
MSPrep:Package for Summarizing, Filtering, Imputing, and Normalizing Metabolomics Data
Package performs summarization of replicates, filtering by frequency, several different options for imputing missing data, and a variety of options for transforming, batch correcting, and normalizing data.
Maintained by Max McGrath. Last updated 5 months ago.
metabolomicsmassspectrometrypreprocessing
10.3 match 10 stars 5.20 score 4 scripts
ropengov
regions:Processing Regional Statistics
Validating sub-national statistical typologies, re-coding across standard typologies of sub-national statistics, and making valid aggregate level imputation, re-aggregation, re-weighting and projection down to lower hierarchical levels to create meaningful data panels and time series.
Maintained by Daniel Antal. Last updated 2 years ago.
observatoryregionsropengovstatistics
6.0 match 12 stars 8.81 score 67 scripts 5 dependents
billpetti
baseballr:Acquiring and Analyzing Baseball Data
Provides numerous utilities for acquiring and analyzing baseball data from online sources such as 'Baseball Reference' <https://www.baseball-reference.com/>, 'FanGraphs' <https://www.fangraphs.com/>, and the 'MLB Stats' API <https://www.mlb.com/>.
Maintained by Saiem Gilani. Last updated 4 months ago.
baseballpitchfxsabermetricsstatcast
5.9 match 380 stars 8.98 score 582 scripts
rrwen
nbc4va:Bayes Classifier for Verbal Autopsy Data
An implementation of the Naive Bayes Classifier (NBC) algorithm used for Verbal Autopsy (VA) built on code from Miasnikof et al (2015) <DOI:10.1186/s12916-015-0521-2>.
Maintained by Richard Wen. Last updated 3 years ago.
autopsybayescauseclassifiercodedcomputerdeathestimateimputationlearningmachinemdsmillionnaivenbcprobabilitystudytheoryvaverbal
11.3 match 4.60 score 79 scripts
business-science
timetk:A Tool Kit for Working with Time Series
Easy visualization, wrangling, and feature engineering of time series data for forecasting and machine learning prediction. Consolidates and extends time series functionality from packages including 'dplyr', 'stats', 'xts', 'forecast', 'slider', 'padr', 'recipes', and 'rsample'.
Maintained by Matt Dancho. Last updated 1 year ago.
coercioncoercion-functionsdata-miningdplyrforecastforecastingforecasting-modelsmachine-learningseries-decompositionseries-signaturetibbletidytidyquanttidyversetimetime-seriestimeseries
3.6 match 625 stars 14.15 score 4.0k scripts 16 dependents
larsenlab
hlaR:Tools for HLA Data
A streamlined tool for eplet analysis of donor and recipient HLA (human leukocyte antigen) mismatch. Messy, low-resolution HLA typing data is cleaned, and imputed to high-resolution using the NMDP (National Marrow Donor Program) haplotype reference database <https://haplostats.org/haplostats>. High resolution data is analyzed for overall or single antigen eplet mismatch using a reference table (currently supporting 'HLAMatchMaker' <http://www.epitopes.net> versions 2 and 3). Data can enter or exit the workflow at different points depending on the user's aims and initial data quality.
Maintained by Joan Zhang. Last updated 2 years ago.
9.8 match 7 stars 5.15 score 9 scripts
jgill22
hot.deck:Multiple Hot Deck Imputation
Performs multiple hot-deck imputation of categorical and continuous variables in a data frame.
Maintained by Jeff Gill. Last updated 4 years ago.
17.5 match 2.80 score 21 scripts 1 dependents
bioc
ptairMS:Pre-processing PTR-TOF-MS Data
This package implements a suite of methods to preprocess data from PTR-TOF-MS instruments (HDF5 format) and generates the 'sample by features' table of peak intensities in addition to the sample and feature metadata (as a single ExpressionSet object for subsequent statistical analysis). This package also provides useful tools for cohort management, such as progressive data analysis, visualization tools and quality control. The steps include calibration, expiration detection, peak detection and quantification, feature alignment, missing value imputation and feature annotation. Applications to exhaled air and cell culture in headspace are described in the vignettes and examples. This package was used for data analysis of the Gassin Delyle study on adults undergoing invasive mechanical ventilation in the intensive care unit due to severe COVID-19 or non-COVID-19 acute respiratory distress syndrome (ARDS), where it made it possible to identify four potential biomarkers of the infection.
Maintained by Camille Roquencourt. Last updated 5 months ago.
softwaremassspectrometrypreprocessingmetabolomicspeakdetectionalignmentcpp
9.5 match 7 stars 5.15 score 3 scripts
thevaachandereng
bayesCT:Simulation and Analysis of Adaptive Bayesian Clinical Trials
Simulation and analysis of Bayesian adaptive clinical trials for binomial, continuous, and time-to-event data types, incorporates historical data and allows early stopping for futility or early success. The package uses novel and efficient Monte Carlo methods for estimating Bayesian posterior probabilities, evaluation of loss to follow up, and imputation of incomplete data. The package has the functionality for dynamically incorporating historical data into the analysis via the power prior or non-informative priors.
Maintained by Thevaa Chandereng. Last updated 5 years ago.
adaptivebayesian-methodsbayesian-trialclinical-trialsstatistical-power
7.6 match 14 stars 6.30 score 36 scripts
bmcclintock
momentuHMM:Maximum Likelihood Analysis of Animal Movement Behavior Using Multivariate Hidden Markov Models
Extended tools for analyzing telemetry data using generalized hidden Markov models. Features of momentuHMM (pronounced "momentum") include data pre-processing and visualization, fitting HMMs to location and auxiliary biotelemetry or environmental data, biased and correlated random walk movement models, discrete- or continuous-time HMMs, continuous- or discrete-space movement models, approximate Langevin diffusion models, hierarchical HMMs, multiple imputation for incorporating location measurement error and missing data, user-specified design matrices and constraints for covariate modelling of parameters, random effects, decoding of the state process, visualization of fitted models, model checking and selection, and simulation. See McClintock and Michelot (2018) <doi:10.1111/2041-210X.12995>.
Maintained by Brett McClintock. Last updated 29 days ago.
5.7 match 43 stars 8.47 score 162 scripts
covaruber
sommer:Solving Mixed Model Equations in R
Structural multivariate-univariate linear mixed model solver for estimation of multiple random effects with unknown variance-covariance structures (e.g., heterogeneous and unstructured) and known covariance among levels of random effects (e.g., pedigree and genomic relationship matrices) (Covarrubias-Pazaran, 2016 <doi:10.1371/journal.pone.0156744>; Maier et al., 2015 <doi:10.1016/j.ajhg.2014.12.006>; Jensen et al., 1997). REML estimates can be obtained using the Direct-Inversion Newton-Raphson and Direct-Inversion Average Information algorithms for the problems r x r (r being the number of records) or using the Henderson-based average information algorithm for the problem c x c (c being the number of coefficients to estimate). Spatial models can also be fitted using the two-dimensional spline functionality available.
Maintained by Giovanny Covarrubias-Pazaran. Last updated 20 days ago.
average-informationmixed-modelsrcpparmadilloopenblascppopenmp
3.8 match 43 stars 12.70 score 300 scripts 9 dependents
laresbernardo
lares:Analytics & Machine Learning Sidekick
Auxiliary package for better/faster analytics, visualization, data mining, and machine learning tasks. With a wide variety of function families, like Machine Learning, Data Wrangling, Marketing Mix Modeling (Robyn), Exploratory, API, and Scraper, it helps the analyst or data scientist to get quick and robust results, without the need for repetitive coding or advanced R programming skills.
Maintained by Bernardo Lares. Last updated 23 days ago.
analyticsapiautomationautomldata-sciencedescriptive-statisticsh2omachine-learningmarketingmmmpredictive-modelingpuzzlerlanguagerobynvisualization
4.8 match 233 stars 9.84 score 185 scripts 1 dependents
simsem
semTools:Useful Tools for Structural Equation Modeling
Provides miscellaneous tools for structural equation modeling, many of which extend the 'lavaan' package. For example, latent interactions can be estimated using product indicators (Lin et al., 2010, <doi:10.1080/10705511.2010.488999>) and simple effects probed; analytical power analyses can be conducted (Jak et al., 2021, <doi:10.3758/s13428-020-01479-0>); and scale reliability can be estimated based on estimated factor-model parameters.
Maintained by Terrence D. Jorgensen. Last updated 2 days ago.
3.4 match 79 stars 13.74 score 1.1k scripts 31 dependents
teebusch
mifa:Multiple Imputation for Exploratory Factor Analysis
Impute the covariance matrix of incomplete data so that factor analysis can be performed. Imputations are made using multiple imputation by Multivariate Imputation with Chained Equations (MICE) and combined with Rubin's rules. Parametric Fieller confidence intervals and nonparametric bootstrap confidence intervals can be obtained for the variance explained by different numbers of principal components. The method is described in Nassiri et al. (2018) <doi:10.3758/s13428-017-1013-4>.
Maintained by Tobias Busch. Last updated 4 years ago.
15.7 match 2 stars 3.00 score 5 scripts
o1iv3r
ClustImpute:K-Means Clustering with Built-in Missing Data Imputation
This k-means algorithm is able to cluster data with missing values and as a by-product completes the data set. The implementation can deal with missing values in multiple variables and is computationally efficient since it iteratively uses the current cluster assignment to define a plausible distribution for missing value imputation. Weights are used to shrink early random draws for missing values (i.e., draws based on the cluster assignments after few iterations) towards the global mean of each feature. This shrinkage slowly fades out after a fixed number of iterations to reflect the increasing credibility of cluster assignments. See the vignette for details.
Maintained by Oliver Pfaffel. Last updated 3 years ago.
9.4 match 7 stars 4.96 score 13 scripts
openpharma
brms.mmrm:Bayesian MMRMs using 'brms'
The mixed model for repeated measures (MMRM) is a popular model for longitudinal clinical trial data with continuous endpoints, and 'brms' is a powerful and versatile package for fitting Bayesian regression models. The 'brms.mmrm' R package leverages 'brms' to run MMRMs, and it supports a simplified interface to reduce difficulty and align with the best practices of the life sciences. References: Bürkner (2017) <doi:10.18637/jss.v080.i01>, Mallinckrodt (2008) <doi:10.1177/009286150804200402>.
Maintained by William Michael Landau. Last updated 5 months ago.
brmslife-sciencesmc-stanmmrmstanstatistics
5.2 match 21 stars 8.80 score 13 scripts
eliocamp
metR:Tools for Easier Analysis of Meteorological Fields
Many useful functions and extensions for dealing with meteorological data in the tidy data framework. Extends 'ggplot2' for better plotting of scalar and vector fields and provides commonly used analysis methods in the atmospheric sciences.
Maintained by Elio Campitelli. Last updated 19 days ago.
atmospheric-scienceggplot2visualization
3.8 match 144 stars 12.19 score 1000 scripts 22 dependents
andyliaw-mrk
randomForest:Breiman and Cutler's Random Forests for Classification and Regression
Classification and regression based on a forest of trees using random inputs, based on Breiman (2001) <DOI:10.1023/A:1010933404324>.
Maintained by Andy Liaw. Last updated 6 months ago.
3.8 match 47 stars 12.11 score 35k scripts 282 dependents
cran
CALIBERrfimpute:Multiple Imputation Using MICE and Random Forest
Functions to impute using random forest under full conditional specifications (multivariate imputation by chained equations). The methods are described in Shah and others (2014) <doi:10.1093/aje/kwt312>.
Maintained by Anoop Shah. Last updated 2 years ago.
17.4 match 2 stars 2.60 score
ropensci
dynamite:Bayesian Modeling and Causal Inference for Multivariate Longitudinal Data
Easy-to-use and efficient interface for Bayesian inference of complex panel (time series) data using dynamic multivariate panel models by Helske and Tikka (2024) <doi:10.1016/j.alcr.2024.100617>. The package supports joint modeling of multiple measurements per individual, time-varying and time-invariant effects, and a wide range of discrete and continuous distributions. Estimation of these dynamic multivariate panel models is carried out via 'Stan'. For an in-depth tutorial of the package, see (Tikka and Helske, 2024) <doi:10.48550/arXiv.2302.01607>.
Maintained by Santtu Tikka. Last updated 18 days ago.
bayesian-inferencepanel-datastanstatistical-models
5.7 match 29 stars 7.92 score 20 scripts
stamats
MKmisc:Miscellaneous Functions from M. Kohl
Contains several functions for statistical data analysis; e.g. for sample size and power calculations, computation of confidence intervals and tests, and generation of similarity matrices.
Maintained by Matthias Kohl. Last updated 2 years ago.
6.0 match 11 stars 7.40 score 129 scripts 1 dependents
cran
e1071:Misc Functions of the Department of Statistics, Probability Theory Group (Formerly: E1071), TU Wien
Functions for latent class analysis, short time Fourier transform, fuzzy clustering, support vector machines, shortest path computation, bagged clustering, naive Bayes classifier, generalized k-nearest neighbour ...
Maintained by David Meyer. Last updated 6 months ago.
3.0 match 28 stars 14.46 score 19k scripts 2.0k dependents
umich-cphds
lodi:Limit of Detection Imputation for Single-Pollutant Models
Impute observed values below the limit of detection (LOD) via censored likelihood multiple imputation (CLMI) in single-pollutant models, developed by Boss et al (2019) <doi:10.1097/EDE.0000000000001052>. CLMI handles exposure detection limits that may change throughout the course of exposure assessment. 'lodi' provides functions for imputing and pooling for this method.
Maintained by Alexander Rix. Last updated 5 years ago.
11.7 match 1 stars 3.70 score 10 scripts
jeffreyevans
spatialEco:Spatial Analysis and Modelling Utilities
Utilities to support spatial data manipulation, query, sampling and modelling in ecological applications. Functions include models for species population density, spatial smoothing, multivariate separability, point process model for creating pseudo-absences and sub-sampling, Quadrant-based sampling and analysis, auto-logistic modeling, sampling models, cluster optimization, statistical exploratory tools and raster-based metrics.
Maintained by Jeffrey S. Evans. Last updated 12 days ago.
biodiversityconservationecologyr-spatialrasterspatialvector
4.5 match 110 stars 9.55 score 736 scripts 2 dependentsemmaskarstein
inlamemi:Missing Data and Measurement Error Modelling in INLA
Facilitates fitting measurement error and missing data imputation models using integrated nested Laplace approximations, according to the method described in Skarstein, Martino and Muff (2023) <doi:10.1002/bimj.202300078>. See Skarstein and Muff (2024) <doi:10.48550/arXiv.2406.08172> for details on using the package.
Maintained by Emma Skarstein. Last updated 4 months ago.
7.1 match 5.97 score 19 scriptsbips-hb
arf:Adversarial Random Forests
Adversarial random forests (ARFs) recursively partition data into fully factorized leaves, where features are jointly independent. The procedure is iterative, with alternating rounds of generation and discrimination. Data becomes increasingly realistic at each round, until original and synthetic samples can no longer be reliably distinguished. This is useful for several unsupervised learning tasks, such as density estimation and data synthesis. Methods for both are implemented in this package. ARFs naturally handle unstructured data with mixed continuous and categorical covariates. They inherit many of the benefits of random forests, including speed, flexibility, and solid performance with default parameters. For details, see Watson et al. (2023) <https://proceedings.mlr.press/v206/watson23a.html>.
Maintained by Marvin N. Wright. Last updated 18 days ago.
6.4 match 14 stars 6.65 score 16 scriptsluca-scr
mclust:Gaussian Mixture Modelling for Model-Based Clustering, Classification, and Density Estimation
Gaussian finite mixture models fitted via EM algorithm for model-based clustering, classification, and density estimation, including Bayesian regularization, dimension reduction for visualisation, and resampling-based inference.
Maintained by Luca Scrucca. Last updated 11 months ago.
3.5 match 21 stars 12.23 score 6.6k scripts 587 dependentsandrija-djurovic
monobinShiny:Shiny User Interface for 'monobin' Package
This is an add-on package to the 'monobin' package that simplifies its use. It provides a shiny-based user interface (UI) that is especially handy for less experienced 'R' users as well as for those who intend to perform quick scanning of numeric risk factors when building credit rating models. The additional functions implemented in 'monobinShiny' that do not exist in the 'monobin' package are: descriptive statistics, special case and outlier imputation. The descriptive statistics function is exported and can be used in 'R' sessions independently from the user interface, while the special case and outlier imputation functions are written to be used with the shiny UI.
Maintained by Andrija Djurovic. Last updated 3 years ago.
15.5 match 1 stars 2.70 score 6 scriptscardiomoon
autoReg:Automatic Linear and Logistic Regression and Survival Analysis
Make summary tables for descriptive statistics and select explanatory variables automatically in various regression models. Supports linear models, generalized linear models and Cox proportional hazards models. Generates publication-ready tables summarizing the results of regression analysis and plots. The tables and plots can be exported in "HTML", "pdf('LaTex')", "docx('MS Word')" and "pptx('MS Powerpoint')" documents.
Maintained by Keon-Woong Moon. Last updated 1 years ago.
5.9 match 47 stars 7.00 score 69 scriptsbranchlab
metasnf:Meta Clustering with Similarity Network Fusion
Framework to facilitate patient subtyping with similarity network fusion and meta clustering. The similarity network fusion (SNF) algorithm was introduced by Wang et al. (2014) in <doi:10.1038/nmeth.2810>. SNF is a data integration approach that can transform high-dimensional and diverse data types into a single similarity network suitable for clustering with minimal loss of information from each initial data source. The meta clustering approach was introduced by Caruana et al. (2006) in <doi:10.1109/ICDM.2006.103>. Meta clustering involves generating a wide range of cluster solutions by adjusting clustering hyperparameters, then clustering the solutions themselves into a manageable number of qualitatively similar solutions, and finally characterizing representative solutions to find ones that are best for the user's specific context. This package provides a framework to easily transform multi-modal data into a wide range of similarity network fusion-derived cluster solutions as well as to visualize, characterize, and validate those solutions. Core package functionality includes easy customization of distance metrics, clustering algorithms, and SNF hyperparameters to generate diverse clustering solutions; calculation and plotting of associations between features, between patients, and between cluster solutions; and standard cluster validation approaches including resampled measures of cluster stability, standard metrics of cluster quality, and label propagation to evaluate generalizability in unseen data. Associated vignettes guide the user through using the package to identify patient subtypes while adhering to best practices for unsupervised learning.
Maintained by Prashanth S Velayudhan. Last updated 3 days ago.
bioinformaticsclusteringmetaclusteringsnf
5.0 match 8 stars 8.21 score 30 scriptsmichaelwrobbins
gerbil:Generalized Efficient Regression-Based Imputation with Latent Processes
Implements a new multiple imputation method that draws imputations from a latent joint multivariate normal model which underpins generally structured data. This model is constructed using a sequence of flexible conditional linear models that enables the resulting procedure to be efficiently implemented on high dimensional datasets in practice. See Robbins (2021) <arXiv:2008.02243>.
Maintained by Michael Robbins. Last updated 2 years ago.
15.0 match 2.70 score 3 scriptsbioc
RNAseqCovarImpute:Impute Covariate Data in RNA Sequencing Studies
The RNAseqCovarImpute package makes linear model analysis for RNA sequencing read counts compatible with multiple imputation (MI) of missing covariates. A major problem with implementing MI in RNA sequencing studies is that the outcome data must be included in the imputation prediction models to avoid bias. This is difficult in omics studies with high-dimensional data. The first method we developed in the RNAseqCovarImpute package surmounts the problem of high-dimensional outcome data by binning genes into smaller groups to analyze pseudo-independently. This method implements covariate MI in gene expression studies by 1) randomly binning genes into smaller groups, 2) creating M imputed datasets separately within each bin, where the imputation predictor matrix includes all covariates and the log counts per million (CPM) for the genes within each bin, 3) estimating gene expression changes using `limma::voom` followed by `limma::lmFit` functions, separately on each M imputed dataset within each gene bin, 4) un-binning the gene sets and stacking the M sets of model results before applying the `limma::squeezeVar` function to apply a variance shrinking Bayesian procedure to each M set of model results, 5) pooling the results with Rubin’s rules to produce combined coefficients, standard errors, and P-values, and 6) adjusting P-values for multiplicity to account for false discovery rate (FDR). A faster method uses principal component analysis (PCA) to avoid binning genes while still retaining outcome information in the MI models. Binning genes into smaller groups requires that the MI and limma-voom analysis is run many times (typically hundreds).
The more computationally efficient MI PCA method implements covariate MI in gene expression studies by 1) performing PCA on the log CPM values for all genes using the Bioconductor `PCAtools` package, 2) creating M imputed datasets where the imputation predictor matrix includes all covariates and the optimum number of PCs to retain (e.g., based on Horn’s parallel analysis or the number of PCs that account for >80% explained variation), 3) conducting the standard limma-voom pipeline with the `voom` followed by `lmFit` followed by `eBayes` functions on each M imputed dataset, 4) pooling the results with Rubin’s rules to produce combined coefficients, standard errors, and P-values, and 5) adjusting P-values for multiplicity to account for false discovery rate (FDR).
Maintained by Brennan Baker. Last updated 5 months ago.
rnaseqgeneexpressiondifferentialexpressionsequencing
9.0 match 1 stars 4.48 score 6 scriptsdppalomar
imputeFin:Imputation of Financial Time Series with Missing Values and/or Outliers
Missing values often occur in financial data due to a variety of reasons (errors in the collection process or in the processing stage, lack of asset liquidity, lack of reporting of funds, etc.). However, most data analysis methods expect complete data and cannot be employed with missing values. One convenient way to deal with this issue without having to redesign the data analysis method is to impute the missing values. This package provides an efficient way to impute the missing values based on modeling the time series with a random walk or an autoregressive (AR) model, convenient to model log-prices and log-volumes in financial data. In the current version, the imputation is univariate-based (so no asset correlation is used). In addition, outliers can be detected and removed. The package is based on the paper: J. Liu, S. Kumar, and D. P. Palomar (2019). Parameter Estimation of Heavy-Tailed AR Model With Missing Data Via Stochastic EM. IEEE Trans. on Signal Processing, vol. 67, no. 8, pp. 2159-2172. <doi:10.1109/TSP.2019.2899816>.
Maintained by Daniel P. Palomar. Last updated 3 years ago.
financial-datamissing-valuesoutlierstime-series
6.9 match 25 stars 5.80 score 25 scriptstorockel
imputeGeneric:Ease the Implementation of Imputation Methods
The general workflow of most imputation methods is quite similar. The aim of this package is to provide parts of this general workflow to make the implementation of imputation methods easier. The heart of an imputation method is normally the model used. These models can be defined using the 'parsnip' package or customized specifications. The rest of an imputation method consists of more technical specifications, e.g. which columns and rows should be used for imputation and in which order. These technical specifications can be set inside the imputation functions.
Maintained by Tobias Rockel. Last updated 3 years ago.
14.7 match 1 stars 2.70 score 2 scriptsneerajdhanraj
imputeTestbench:Test Bench for the Comparison of Imputation Methods
Provides a test bench for the comparison of missing data imputation methods in univariate time series. Imputation methods are compared using different error metrics. Proposed imputation methods and alternative error metrics can be used.
Maintained by Marcus W. Beck. Last updated 8 years ago.
8.0 match 5 stars 4.94 score 20 scripts 2 dependentswjunger
mtsdi:Multivariate Time Series Data Imputation
This is an EM algorithm based method for imputation of missing values in multivariate normal time series. The imputation algorithm accounts for both spatial and temporal correlation structures. Temporal patterns can be modeled using an ARIMA(p,d,q), optionally with seasonal components, a non-parametric cubic spline or generalized additive models with exogenous covariates. This algorithm is specially tailored for climate data with missing measurements from several monitors along a given region.
Maintained by Washington Junger. Last updated 3 years ago.
9.8 match 1 stars 4.00 score 22 scripts 3 dependentsjwb133
mlmi:Maximum Likelihood Multiple Imputation
Implements so-called Maximum Likelihood Multiple Imputation as described by von Hippel and Bartlett (2021) <doi:10.1214/20-STS793>. A number of different imputations are available, by utilising the 'norm', 'cat' and 'mix' packages. Inferences can be performed either using combination rules similar to Rubin's or using a likelihood score based approach based on theory by Wang and Robins (1998) <doi:10.1093/biomet/85.4.935>.
Maintained by Jonathan Bartlett. Last updated 2 years ago.
14.4 match 2.70 score 10 scriptstrevorhastie
softImpute:Matrix Completion via Iterative Soft-Thresholded SVD
Iterative methods for matrix completion that use nuclear-norm regularization. There are two main approaches. One approach uses iterative soft-thresholded SVDs to impute the missing values. The second approach uses alternating least squares. Both have an 'EM' flavor, in that at each iteration the matrix is completed with the current estimate. For large matrices there is a special sparse-matrix class named "Incomplete" that efficiently handles all computations. The package includes procedures for centering and scaling rows, columns or both, and for computing low-rank SVDs on large sparse centered matrices (i.e. principal components).
Maintained by Trevor Hastie. Last updated 4 years ago.
5.2 match 10 stars 7.47 score 253 scripts 22 dependentsnicokubi
penetrance:Methods for Penetrance Estimation in Family-Based Studies
Implements statistical methods for estimating disease penetrance in family-based studies. Penetrance refers to the probability of disease manifestation in individuals carrying specific genetic variants. The package provides tools for age-specific penetrance estimation, handling missing data, and accounting for ascertainment bias in family studies. Cite as: Kubista, N., Braun, D. & Parmigiani, G. (2024) <doi:10.48550/arXiv.2411.18816>.
Maintained by Nicolas Kubista. Last updated 16 days ago.
7.1 match 5.41 scorebioc
POMA:Tools for Omics Data Analysis
The POMA package offers a comprehensive toolkit designed for omics data analysis, streamlining the process from initial visualization to final statistical analysis. Its primary goal is to simplify and unify the various steps involved in omics data processing, making it more accessible and manageable within a single, intuitive R package. Emphasizing reproducibility and user-friendliness, POMA leverages the standardized SummarizedExperiment class from Bioconductor, ensuring seamless integration and compatibility with a wide array of Bioconductor tools. This approach guarantees maximum flexibility and replicability, making POMA an essential asset for researchers handling omics datasets. See https://github.com/pcastellanoescuder/POMAShiny. Paper: Castellano-Escuder et al. (2021) <doi:10.1371/journal.pcbi.1009148> for more details.
Maintained by Pol Castellano-Escuder. Last updated 4 months ago.
batcheffectclassificationclusteringdecisiontreedimensionreductionmultidimensionalscalingnormalizationpreprocessingprincipalcomponentregressionrnaseqsoftwarestatisticalmethodvisualizationbioconductorbioinformaticsdata-visualizationdimension-reductionexploratory-data-analysismachine-learningomics-data-integrationpipelinepre-processingstatistical-analysisuser-friendlyworkflow
4.7 match 11 stars 8.23 score 20 scripts 1 dependentsmikejseo
bipd:Bayesian Individual Patient Data Meta-Analysis using 'JAGS'
We use a Bayesian approach to run individual patient data meta-analysis and network meta-analysis using 'JAGS'. The methods incorporate shrinkage methods and calculate patient-specific treatment effects as described in Seo et al. (2021) <DOI:10.1002/sim.8859>. This package also includes user-friendly functions that impute missing data in individual patient data using mice-related packages.
Maintained by Michael Seo. Last updated 3 years ago.
9.0 match 3 stars 4.26 score 20 scriptscran
hsphase:Phasing, Pedigree Reconstruction, Sire Imputation and Recombination Events Identification of Half-sib Families Using SNP Data
Identification of recombination events, haplotype reconstruction, sire imputation and pedigree reconstruction using half-sib family SNP data.
Maintained by Mohammad Ferdosi. Last updated 1 years ago.
11.5 match 1 stars 3.32 score 1 dependentsbioc
MAPFX:MAssively Parallel Flow cytometry Xplorer (MAPFX): A Toolbox for Analysing Data from the Massively-Parallel Cytometry Experiments
MAPFX is an end-to-end toolbox that pre-processes the raw data from MPC experiments (e.g., BioLegend's LEGENDScreen and BD Lyoplates assays), and further imputes the ‘missing’ infinity markers in the wells without those measurements. The pipeline starts by performing background correction on raw intensities to remove the noise from electronic baseline restoration and fluorescence compensation by adapting a normal-exponential convolution model. Unwanted technical variation, from sources such as well effects, is then removed using a log-normal model with plate, column, and row factors, after which infinity markers are imputed using the informative backbone markers as predictors. The completed dataset can then be used for clustering and other statistical analyses. Additionally, MAPFX can be used to normalise data from FFC assays as well.
Maintained by Hsiao-Chi Liao. Last updated 5 months ago.
softwareflowcytometrycellbasedassayssinglecellproteomicsclustering
8.3 match 1 stars 4.54 scorejmpsteen
medflex:Flexible Mediation Analysis Using Natural Effect Models
Run flexible mediation analyses using natural effect models as described in Lange, Vansteelandt and Bekaert (2012) <DOI:10.1093/aje/kwr525>, Vansteelandt, Bekaert and Lange (2012) <DOI:10.1515/2161-962X.1014> and Loeys, Moerkerke, De Smet, Buysse, Steen and Vansteelandt (2013) <DOI:10.1080/00273171.2013.832132>.
Maintained by Johan Steen. Last updated 2 years ago.
causal-inferenceflexible-modelingmediation-analysis
5.3 match 23 stars 7.09 score 54 scriptskjhealy
gssrdoc:Document General Social Survey Variable
The General Social Survey (GSS) is a long-running, mostly annual survey of US households. It is administered by the National Opinion Research Center (NORC). This package contains a tibble with information on the survey variables, together with every variable documented as an R help page. For more information on the GSS see <http://gss.norc.org>.
Maintained by Kieran Healy. Last updated 11 months ago.
16.1 match 2.28 score 38 scriptsmrc-ide
EpiEstim:Estimate Time Varying Reproduction Numbers from Epidemic Curves
Tools to quantify transmissibility throughout an epidemic from the analysis of time series of incidence as described in Cori et al. (2013) <doi:10.1093/aje/kwt133> and Wallinga and Teunis (2004) <doi:10.1093/aje/kwh255>.
Maintained by Anne Cori. Last updated 7 months ago.
3.0 match 95 stars 12.00 score 1.0k scripts 7 dependentspaulkinyanjui01
CondMVT:Conditional Multivariate t Distribution
The package helps sample from the conditional multivariate t distribution.
Maintained by Paul Kimani Kinyanjui. Last updated 3 years ago.
13.3 match 2.70 scorebioc
scp:Mass Spectrometry-Based Single-Cell Proteomics Data Analysis
Utility functions for manipulating, processing, and analyzing mass spectrometry-based single-cell proteomics data. The package is an extension to the 'QFeatures' package and relies on 'SingleCellExperiment' to enable single-cell proteomics analyses. The package offers the user the functionality to process quantitative tables (as generated by MaxQuant, Proteome Discoverer, and more) into data tables ready for downstream analysis and data visualization.
Maintained by Christophe Vanderaa. Last updated 15 days ago.
geneexpressionproteomicssinglecellmassspectrometrypreprocessingcellbasedassaysbioconductormass-spectrometrysingle-cellsoftware
4.0 match 25 stars 8.94 score 115 scriptsbioc
autonomics:Unified Statistical Modeling of Omics Data
This package unifies access to Statistical Modeling of Omics Data. Across linear modeling engines (lm, lme, lmer, limma, and wilcoxon). Across coding systems (treatment, difference, deviation, etc). Across model formulae (with/without intercept, random effect, interaction or nesting). Across omics platforms (microarray, rnaseq, msproteomics, affinity proteomics, metabolomics). Across projection methods (pca, pls, sma, lda, spls, opls). Across clustering methods (hclust, pam, cmeans). It provides a fast enrichment analysis implementation. And an intuitive contrastogram visualisation to summarize contrast effects in complex designs.
Maintained by Aditya Bhagwat. Last updated 2 months ago.
softwaredataimportpreprocessingdimensionreductionprincipalcomponentregressiondifferentialexpressiongenesetenrichmenttranscriptomicstranscriptiongeneexpressionrnaseqmicroarrayproteomicsmetabolomicsmassspectrometry
6.0 match 5.95 score 5 scriptsalexanderrobitzsch
sirt:Supplementary Item Response Theory Models
Supplementary functions for item response models aiming to complement existing R packages. The functionality includes among others multidimensional compensatory and noncompensatory IRT models (Reckase, 2009, <doi:10.1007/978-0-387-89976-3>), MCMC for hierarchical IRT models and testlet models (Fox, 2010, <doi:10.1007/978-1-4419-0742-4>), NOHARM (McDonald, 1982, <doi:10.1177/014662168200600402>), Rasch copula model (Braeken, 2011, <doi:10.1007/s11336-010-9190-4>; Schroeders, Robitzsch & Schipolowski, 2014, <doi:10.1111/jedm.12054>), faceted and hierarchical rater models (DeCarlo, Kim & Johnson, 2011, <doi:10.1111/j.1745-3984.2011.00143.x>), ordinal IRT model (ISOP; Scheiblechner, 1995, <doi:10.1007/BF02301417>), DETECT statistic (Stout, Habing, Douglas & Kim, 1996, <doi:10.1177/014662169602000403>), local structural equation modeling (LSEM; Hildebrandt, Luedtke, Robitzsch, Sommer & Wilhelm, 2016, <doi:10.1080/00273171.2016.1142856>).
Maintained by Alexander Robitzsch. Last updated 2 months ago.
item-response-theoryopenblascpp
3.6 match 23 stars 10.01 score 280 scripts 22 dependentssfcheung
manymome:Mediation, Moderation and Moderated-Mediation After Model Fitting
Computes indirect effects, conditional effects, and conditional indirect effects in a structural equation model or path model after model fitting, with no need to define any user parameters or label any paths in the model syntax, using the approach presented in Cheung and Cheung (2024) <doi:10.3758/s13428-023-02224-z>. Can also form bootstrap confidence intervals by doing bootstrapping only once and reusing the bootstrap estimates in all subsequent computations. Supports bootstrap confidence intervals for standardized (partially or completely) indirect effects, conditional effects, and conditional indirect effects as described in Cheung (2009) <doi:10.3758/BRM.41.2.425> and Cheung, Cheung, Lau, Hui, and Vong (2022) <doi:10.1037/hea0001188>. Model fitting can be done by structural equation modeling using lavaan() or regression using lm().
Maintained by Shu Fai Cheung. Last updated 21 days ago.
bootstrappingconfidence-intervallavaanmanymomemediationmoderated-mediationmoderationregressionsemstandardized-effect-sizestructural-equation-modeling
4.4 match 1 stars 8.06 score 172 scripts 4 dependentsstatnet
tergm:Fit, Simulate and Diagnose Models for Network Evolution Based on Exponential-Family Random Graph Models
An integrated set of extensions to the 'ergm' package to analyze and simulate network evolution based on exponential-family random graph models (ERGM). 'tergm' is a part of the 'statnet' suite of packages for network analysis. See Krivitsky and Handcock (2014) <doi:10.1111/rssb.12014> and Carnegie, Krivitsky, Hunter, and Goodreau (2015) <doi:10.1080/10618600.2014.903087>.
Maintained by Pavel N. Krivitsky. Last updated 4 months ago.
3.8 match 27 stars 9.29 score 78 scripts 3 dependentsgianmarcoborrata
Indicator:Composite 'Indicator' Construction and Imputation Data
Functions for constructing composite indicators, imputing missing data, and evaluating imputation techniques, along with tools for data normalization. Detailed methodologies of 'Indicator' package are: OECD/European Union/EC-JRC (2008), "Handbook on Constructing Composite Indicators: Methodology and User Guide", OECD Publishing, Paris, <DOI:10.1787/533411815016>, Matteo Mazziotta & Adriano Pareto, (2018) "Measuring Well-Being Over Time: The Adjusted Mazziotta–Pareto Index Versus Other Non-compensatory Indices" <DOI:10.1007/s11205-017-1577-5> and De Muro P., Mazziotta M., Pareto A. (2011), "Composite Indices of Development and Poverty: An Application to MDGs" <DOI:10.1007/s11205-010-9727-z>.
Maintained by Gianmarco Borrata. Last updated 4 months ago.
11.7 match 1 stars 3.00 score 1 scriptsbioc
LEA:LEA: an R package for Landscape and Ecological Association Studies
LEA is an R package dedicated to population genomics, landscape genomics and genotype-environment association tests. LEA can run analyses of population structure and genome-wide tests for local adaptation, and also performs imputation of missing genotypes. The package includes statistical methods for estimating ancestry coefficients from large genotypic matrices and for evaluating the number of ancestral populations (snmf). It performs statistical tests using latent factor mixed models for identifying genetic polymorphisms that exhibit association with environmental gradients or phenotypic traits (lfmm2). In addition, LEA computes values of genetic offset statistics based on new or predicted environments (genetic.gap, genetic.offset). LEA is mainly based on optimized programs that can scale with the dimensions of large data sets.
Maintained by Olivier Francois. Last updated 4 days ago.
softwarestatistical methodclusteringregressionopenblas
5.3 match 6.63 score 534 scriptspdwaggoner
hdImpute:A Batch Process for High Dimensional Imputation
A correlation-based batch process for fast, accurate imputation for high dimensional missing data problems via chained random forests. See Waggoner (2023) <doi:10.1007/s00180-023-01325-9> for more on 'hdImpute', Stekhoven and Bühlmann (2012) <doi:10.1093/bioinformatics/btr597> for more on 'missForest', and Mayer (2022) <https://github.com/mayer79/missRanger> for more on 'missRanger'.
Maintained by Philip Waggoner. Last updated 2 months ago.
10.2 match 2 stars 3.41 score 13 scriptsbioc
scRecover:scRecover for imputation of single-cell RNA-seq data
scRecover is an R package for imputation of single-cell RNA-seq (scRNA-seq) data. It will detect and impute dropout values in a scRNA-seq raw read counts matrix while keeping the real zeros unchanged, since there are both dropout zeros and real zeros in scRNA-seq data. By combination with scImpute, SAVER and MAGIC, scRecover not only detects dropout and real zeros at higher accuracy, but also improves the downstream clustering and visualization results.
Maintained by Zhun Miao. Last updated 5 months ago.
geneexpressionsinglecellrnaseqtranscriptomicssequencingpreprocessingsoftware
6.6 match 8 stars 5.20 score 9 scriptstomasfryda
h2o:R Interface for the 'H2O' Scalable Machine Learning Platform
R interface for 'H2O', the scalable open source machine learning platform that offers parallelized implementations of many supervised and unsupervised machine learning algorithms such as Generalized Linear Models (GLM), Gradient Boosting Machines (including XGBoost), Random Forests, Deep Neural Networks (Deep Learning), Stacked Ensembles, Naive Bayes, Generalized Additive Models (GAM), ANOVA GLM, Cox Proportional Hazards, K-Means, PCA, ModelSelection, Word2Vec, as well as a fully automatic machine learning algorithm (H2O AutoML).
Maintained by Tomas Fryda. Last updated 1 years ago.
4.1 match 3 stars 8.20 score 7.8k scripts 11 dependentsbioc
scone:Single Cell Overview of Normalized Expression data
SCONE is an R package for comparing and ranking the performance of different normalization schemes for single-cell RNA-seq and other high-throughput analyses.
Maintained by Davide Risso. Last updated 24 days ago.
immunooncologynormalizationpreprocessingqualitycontrolgeneexpressionrnaseqsoftwaretranscriptomicssequencingsinglecellcoverage
3.7 match 53 stars 9.12 score 104 scriptsfelixthoemmes
rddapp:Regression Discontinuity Design Application
Estimation of both single- and multiple-assignment Regression Discontinuity Designs (RDDs). Provides both parametric (global) and non-parametric (local) estimation choices for both sharp and fuzzy designs, along with power analysis and assumption checks. Introductions to the underlying logic and analysis of RDDs are in Thistlethwaite, D. L., Campbell, D. T. (1960) <doi:10.1037/h0044319> and Lee, D. S., Lemieux, T. (2010) <doi:10.1257/jel.48.2.281>.
Maintained by Felix Thoemmes. Last updated 2 years ago.
non-parametric-rddparametric-rddrdd
5.3 match 9 stars 6.30 score 44 scriptshandcock
RDS:Respondent-Driven Sampling
Provides functionality for carrying out estimation with data collected using Respondent-Driven Sampling. This includes Heckathorn's RDS-I and RDS-II estimators as well as Gile's Sequential Sampling estimator. The package is part of the "RDS Analyst" suite of packages for the analysis of respondent-driven sampling data. See Gile and Handcock (2010) <doi:10.1111/j.1467-9531.2010.01223.x>, Gile and Handcock (2015) <doi:10.1111/rssa.12091> and Gile, Beaudry, Handcock and Ott (2018) <doi:10.1146/annurev-statistics-031017-100704>.
Maintained by Mark S. Handcock. Last updated 6 months ago.
8.6 match 1 stars 3.87 score 82 scripts 3 dependentsfederico-m-stefanini
convergEU:Monitoring Convergence of EU Countries
Indicators and measures by country and time describe what happens at economic and social levels. This package provides functions to calculate several measures of convergence after imputing missing values. The automated downloading of Eurostat data, followed by the production of country fiches and indicator fiches, makes it possible to produce automated reports. The Eurofound report (<doi:10.2806/68012>) "Upward convergence in the EU: Concepts, measurements and indicators", 2018, is a detailed presentation of convergence.
Maintained by Federico M. Stefanini. Last updated 2 years ago.
6.7 match 1 stars 4.95 score 89 scriptsweirichs
eatRep:Educational Assessment Tools for Replication Methods
Replication methods to compute some basic statistical operations (means, standard deviations, frequency tables, percentiles, mean comparisons using weighted effect coding, generalized linear models, and linear multilevel models) in complex survey designs comprising multiple imputed or nested imputed variables and/or a clustered sampling structure, both of which require special procedures, at least in estimating standard errors. See the package documentation for a more detailed description along with references.
Maintained by Sebastian Weirich. Last updated 16 days ago.
6.3 match 1 stars 5.16 score 13 scriptsbioc
missRows:Handling Missing Individuals in Multi-Omics Data Integration
The missRows package implements the MI-MFA method to deal with missing individuals ('biological units') in multi-omics data integration. The MI-MFA method generates multiple imputed datasets from a Multiple Factor Analysis model, then the results are combined into a single consensus solution. The package provides functions for estimating coordinates of individuals and variables, imputing missing individuals, and various diagnostic plots to inspect the pattern of missingness and visualize the uncertainty due to missing values.
Maintained by Gonzalez Ignacio. Last updated 5 months ago.
softwarestatisticalmethoddimensionreductionprincipalcomponentmathematicalbiologyvisualization
9.8 match 3.30 score 3 scriptskaz-yos
regmedint:Regression-Based Causal Mediation Analysis with Interaction and Effect Modification Terms
This is an extension of the regression-based causal mediation analysis first proposed by Valeri and VanderWeele (2013) <doi:10.1037/a0031034> and Valeri and VanderWeele (2015) <doi:10.1097/EDE.0000000000000253>. It supports including effect measure modification by covariates (treatment-covariate and mediator-covariate product terms in mediator and outcome regression models) as proposed by Li et al (2023) <doi:10.1097/EDE.0000000000001643>. It also accommodates the original 'SAS' macro and 'PROC CAUSALMED' procedure in 'SAS' when there is no effect measure modification. Linear and logistic models are supported for the mediator model. Linear, logistic, loglinear, Poisson, negative binomial, Cox, and accelerated failure time (exponential and Weibull) models are supported for the outcome model.
Maintained by Yi Li. Last updated 1 years ago.
causal-inferencemediation-analysis
4.6 match 29 stars 6.84 score 40 scriptsgeorgheinze
logistf:Firth's Bias-Reduced Logistic Regression
Fit a logistic regression model using Firth's bias reduction method, equivalent to penalization of the log-likelihood by the Jeffreys prior. Confidence intervals for regression coefficients can be computed by penalized profile likelihood. Firth's method was proposed as an ideal solution to the problem of separation in logistic regression, see Heinze and Schemper (2002) <doi:10.1002/sim.1047>. If needed, the bias reduction can be turned off such that ordinary maximum likelihood logistic regression is obtained. Two new modifications of Firth's method, FLIC and FLAC, lead to unbiased predictions and are now available in the package as well, see Puhr et al. (2017) <doi:10.1002/sim.7273>.
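As a rough illustration of the idea behind Firth's bias reduction (not the logistf package's R interface), the following Python sketch fits an intercept and a single slope by Newton-type iterations on the Firth-modified score, in which each residual y_i - p_i is adjusted by h_i (0.5 - p_i), with h_i the diagonal of the weighted hat matrix. The function names `sigmoid` and `firth_logistic`, the step cap, and the toy data are assumptions made for this example only.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def firth_logistic(x, y, max_iter=100, tol=1e-8, max_step=5.0):
    """Hypothetical helper: Firth-penalized fit of logit(p) = b0 + b1 * x."""
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        p = [sigmoid(b0 + b1 * xi) for xi in x]
        w = [pi * (1.0 - pi) for pi in p]
        # Fisher information X'WX for the design [1, x], inverted by hand (2x2).
        a = sum(w)
        b = sum(wi * xi for wi, xi in zip(w, x))
        d = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = a * d - b * b
        i00, i01, i11 = d / det, -b / det, a / det
        # Hat-matrix diagonals h_i = w_i * [1, x_i] (X'WX)^{-1} [1, x_i]'.
        h = [wi * (i00 + 2.0 * i01 * xi + i11 * xi * xi) for wi, xi in zip(w, x)]
        # Firth-modified residuals and score.
        r = [yi - pi + hi * (0.5 - pi) for yi, pi, hi in zip(y, p, h)]
        u0 = sum(r)
        u1 = sum(ri * xi for ri, xi in zip(r, x))
        # Newton-type step using the Fisher information.
        s0 = i00 * u0 + i01 * u1
        s1 = i01 * u0 + i11 * u1
        biggest = max(abs(s0), abs(s1))
        if biggest > max_step:  # shrink overly long steps (an assumption of this sketch)
            s0 *= max_step / biggest
            s1 *= max_step / biggest
        b0 += s0
        b1 += s1
        if biggest < tol:
            break
    return b0, b1


# Perfectly separated toy data: ordinary maximum likelihood has no finite solution here.
x = [1, 2, 3, 4, 5, 6]
y = [0, 0, 0, 1, 1, 1]
print(firth_logistic(x, y))
```

On separated data like this, the penalty keeps the estimates finite, which is the situation the description above refers to.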
Maintained by Georg Heinze. Last updated 2 years ago.
3.4 match 12 stars 9.23 score 346 scripts 16 dependents
alinetalhouk
diceR:Diverse Cluster Ensemble in R
Performs cluster analysis using an ensemble clustering framework, Chiu & Talhouk (2018) <doi:10.1186/s12859-017-1996-y>. Results from a diverse set of algorithms are pooled together using methods such as majority voting, K-Modes, LinkCluE, and CSPA. There are options to compare cluster assignments across algorithms using internal and external indices, visualizations such as heatmaps, and significance testing for the existence of clusters.
Maintained by Derek Chiu. Last updated 1 month ago.
3.9 match 37 stars 8.13 score 60 scripts 3 dependents
daiscode
TestDataImputation:Missing Item Responses Imputation for Test and Assessment Data
Functions for imputing missing item responses for dichotomous and polytomous test and assessment data. This package provides imputation methods for missing responses that are suitable for test and assessment data, including: listwise (LW) deletion (see De Ayala et al., 2001 <doi:10.1111/j.1745-3984.2001.tb01124.x>), treating as incorrect (IN, see Lord, 1974 <doi:10.1111/j.1745-3984.1974.tb00996.x>; Mislevy & Wu, 1996 <doi:10.1002/j.2333-8504.1996.tb01708.x>; Pohl et al., 2014 <doi:10.1177/0013164413504926>), person mean imputation (PM), item mean imputation (IM), two-way (TW) and response function (RF) imputation (see Sijtsma & van der Ark, 2003 <doi:10.1207/s15327906mbr3804_4>), logistic regression (LR) imputation, predictive mean matching (PMM), and expectation–maximization (EM) imputation (see Finch, 2008 <doi:10.1111/j.1745-3984.2008.00062.x>).
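To make the simpler methods in this list concrete, here is a minimal Python sketch of deterministic two-way (TW) imputation, which fills a missing response with person mean + item mean - overall mean, each computed from the observed responses. It is not the TestDataImputation package's R interface: the function name `two_way_impute`, the use of `None` for missing responses, and the fallback to the overall mean for rows or columns with no observed values are assumptions of this example. Rounding or clipping to valid response categories is left out.

```python
def two_way_impute(data):
    """Hypothetical helper: fill None entries with PM_i + IM_j - OM.

    data is a list of rows (persons), each a list of item responses.
    Assumes at least one observed response overall.
    """
    def mean_or(values, fallback):
        observed = [v for v in values if v is not None]
        return sum(observed) / len(observed) if observed else fallback

    n_items = len(data[0])
    # Overall mean (OM) of all observed responses.
    overall = mean_or([v for row in data for v in row], 0.0)
    # Person means (PM) and item means (IM) over observed responses only.
    person_means = [mean_or(row, overall) for row in data]
    item_means = [mean_or([row[j] for row in data], overall) for j in range(n_items)]

    return [
        [
            v if v is not None else person_means[i] + item_means[j] - overall
            for j, v in enumerate(row)
        ]
        for i, row in enumerate(data)
    ]


responses = [
    [1, 0, 1],
    [1, None, 0],
    [0, 0, None],
]
print(two_way_impute(responses))
```

Dropping the item mean and overall mean terms from the fill-in value gives person mean (PM) imputation; dropping the person mean and overall mean terms gives item mean (IM) imputation.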
Maintained by Shenghai Dai. Last updated 3 years ago.
17.2 match 1.82 score 11 scripts 1 dependents
marco-geraci
Qtools:Utilities for Quantiles
Functions for unconditional and conditional quantiles. These include methods for transformation-based quantile regression, quantile-based measures of location, scale and shape, methods for quantiles of discrete variables, quantile-based multiple imputation, restricted quantile regression, directional quantile classification, and quantile ratio regression. A vignette is given in Geraci (2016, The R Journal) <doi:10.32614/RJ-2016-037> and included in the package.
Maintained by Marco Geraci. Last updated 1 year ago.
7.6 match 4.10 score 33 scripts 2 dependents
abshev
superMICE:SuperLearner Method for MICE
Adds a Super Learner ensemble model method (using the 'SuperLearner' package) to the 'mice' package. Laqueur, H. S., Shev, A. B., Kagawa, R. M. C. (2021) <doi:10.1093/aje/kwab271>.
Maintained by Aaron B. Shev. Last updated 3 years ago.
9.8 match 3 stars 3.18 score
jwb133
bootImpute:Bootstrap Inference for Multiple Imputation
Bootstraps and imputes incomplete datasets. Then performs inference on estimates obtained from analysing the imputed datasets as proposed by von Hippel and Bartlett (2021) <doi:10.1214/20-STS793>.
Maintained by Jonathan Bartlett. Last updated 6 months ago.
11.4 match 1 stars 2.70 score 6 scripts
bioc
FLAMES:FLAMES: Full Length Analysis of Mutations and Splicing in long read RNA-seq data
Semi-supervised isoform detection and annotation from both bulk and single-cell long read RNA-seq data. Flames provides automated pipelines for analysing isoforms, as well as intermediate functions for manual execution.
Maintained by Changqing Wang. Last updated 4 days ago.
rnaseq singlecell transcriptomics dataimport differentialsplicing alternativesplicing geneexpression longread zlib curl bzip2 xz-utils cpp
3.9 match 31 stars 7.95 score 12 scripts
jimb3
BinaryDosage:Creates, Merges, and Reads Binary Dosage Files
Tools to create binary dosage from either VCF or GEN files, merge binary dosage files, and read binary dosage files.
Maintained by John Morrison. Last updated 5 years ago.
7.6 match 4.06 score 33 scripts
bioc
structToolbox:Data processing & analysis tools for Metabolomics and other omics
An extensive set of data (pre-)processing and analysis methods and tools for metabolomics and other omics, with a strong emphasis on statistics and machine learning. This toolbox allows the user to build extensive and standardised workflows for data analysis. The methods and tools have been implemented using class-based templates provided by the struct (Statistics in R Using Class-based Templates) package. The toolbox includes pre-processing methods (e.g. signal drift and batch correction, normalisation, missing value imputation and scaling), univariate (e.g. t-test, various forms of ANOVA, Kruskal–Wallis test and more) and multivariate statistical methods (e.g. PCA and PLS, including cross-validation and permutation testing) as well as machine learning methods (e.g. Support Vector Machines). The STATistics Ontology (STATO) has been integrated and implemented to provide standardised definitions for the different methods, inputs and outputs.
Maintained by Gavin Rhys Lloyd. Last updated 24 days ago.
workflowstep metabolomics bioconductor-package dims lc-ms machine-learning multivariate-analysis statistics univariate
4.9 match 10 stars 6.26 score 12 scripts
jknowles
merTools:Tools for Analyzing Mixed Effect Regression Models
Provides methods for extracting results from mixed-effect model objects fit with the 'lme4' package. Allows construction of prediction intervals efficiently from large scale linear and generalized linear mixed-effects models. This method draws from the simulation framework used in the Gelman and Hill (2007) textbook: Data Analysis Using Regression and Multilevel/Hierarchical Models.
Maintained by Jared E. Knowles. Last updated 1 year ago.
2.9 match 105 stars 10.49 score 768 scripts
decisionpatterns
na.tools:Comprehensive Library for Working with Missing (NA) Values in Vectors
This comprehensive toolkit provides a consistent and extensible framework for working with missing values in vectors. The companion package 'tidyimpute' provides similar functionality for list-like and table-like structures. Functions exist for detection, removal, replacement, imputation, recollection, etc. of 'NAs'.
Maintained by Christopher Brown. Last updated 6 years ago.
7.5 match 2 stars 4.04 score 109 scripts
alexanderrobitzsch
mdmb:Model Based Treatment of Missing Data
Contains model-based treatment of missing data for regression models with missing values in covariates or the dependent variable using maximum likelihood or Bayesian estimation (Ibrahim et al., 2005, <doi:10.1198/016214504000001844>; Luedtke, Robitzsch, & West, 2020a, 2020b, <doi:10.1080/00273171.2019.1640104>, <doi:10.1037/met0000233>). The regression model can be nonlinear (e.g., interaction effects, quadratic effects or B-spline functions). Multilevel models with missing data in predictors are available for Bayesian estimation. Substantive-model compatible multiple imputation can also be conducted.
Maintained by Alexander Robitzsch. Last updated 8 months ago.
missing-data multiple-imputation openblas cpp
8.0 match 4 stars 3.78 score 26 scripts
nanxstats
hdnom:Benchmarking and Visualization Toolkit for Penalized Cox Models
Creates nomogram visualizations for penalized Cox regression models, with the support of reproducible survival model building, validation, calibration, and comparison for high-dimensional data.
Maintained by Nan Xiao. Last updated 6 months ago.
benchmark high-dimensional-data linear-regression nomogram-visualization penalized-cox-models survival-analysis openblas
3.8 match 43 stars 8.07 score 68 scripts 1 dependents
cran
mix:Estimation/Multiple Imputation for Mixed Categorical and Continuous Data
Estimation/multiple imputation programs for mixed categorical and continuous data.
Maintained by Brian Ripley. Last updated 3 months ago.
7.2 match 2 stars 4.21 score 5 dependents