SimMultiCorrData: Simulation of Correlated Data with Multiple Variable Types
Generate continuous (normal or non-normal), binary, ordinal, and count (Poisson or Negative Binomial) variables
with a specified correlation matrix. It can also produce a
single continuous variable. This package can be used to
simulate data sets that mimic real-world situations (e.g.,
clinical or genetic data sets, plasmodes). All variables are
generated from standard normal variables with an imposed
intermediate correlation matrix. Continuous variables are
simulated by specifying mean, variance, skewness, standardized
kurtosis, and fifth and sixth standardized cumulants using
either Fleishman's third-order (<DOI:10.1007/BF02293811>) or
Headrick's fifth-order (<DOI:10.1016/S0167-9473(02)00072-5>)
polynomial transformation. Binary and ordinal variables are
simulated using a modification of the ordsample() function from
'GenOrd'. Count variables are simulated using the inverse CDF
method. There are two simulation pathways that differ
primarily according to the calculation of the intermediate
correlation matrix. In Correlation Method 1, the
intercorrelations involving count variables are determined
using a simulation-based, logarithmic correlation correction
(adapting Yahav and Shmueli's 2012 method,
<DOI:10.1002/asmb.901>). In Correlation Method 2, the count
variables are treated as ordinal (adapting Barbiero and
Ferrari's 2015 modification of GenOrd,
<DOI:10.1002/asmb.2072>). There is an optional error loop that
corrects the final correlation matrix to be within a
user-specified precision value of the target matrix. The
package also includes functions to calculate standardized
cumulants for theoretical distributions or from real data sets,
check if a target correlation matrix is within the possible
correlation bounds (given the distributions of the simulated
variables), summarize results (numerically or graphically),
verify valid power method PDFs, and calculate lower
standardized kurtosis bounds.
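
The core idea described above (correlated standard normals, followed by an inverse CDF transformation for a count variable) can be sketched as follows. This is a minimal illustration in plain Python, not the package's R interface: the function names (`std_normal_cdf`, `poisson_inverse_cdf`, `simulate_pair`, `pearson`) are hypothetical, and the sketch applies the correlation `rho` directly to the normal variables rather than computing an intermediate correlation first. As a result, the correlation observed after the transformation will generally be somewhat smaller than `rho`, which is the gap the intermediate correlation calculation is meant to close.

```python
import math
import random


def std_normal_cdf(z):
    """Standard normal CDF, Phi(z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def poisson_inverse_cdf(u, lam):
    """Smallest k with P(X <= k) >= u for X ~ Poisson(lam).

    Intended for moderate lam only: exp(-lam) underflows for very large lam.
    """
    u = min(u, 1.0 - 1e-12)  # guard against u == 1.0, where the loop would not stop
    k = 0
    pmf = math.exp(-lam)  # P(X = 0)
    cdf = pmf
    while cdf < u:
        k += 1
        pmf *= lam / k  # P(X = k) computed from P(X = k - 1)
        cdf += pmf
    return k


def simulate_pair(n, rho, lam, seed=0):
    """Return (z1, counts): a standard normal variable and a Poisson(lam)
    variable built from a second normal that has correlation rho with z1."""
    rng = random.Random(seed)
    z1s, counts = [], []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        e = rng.gauss(0.0, 1.0)
        # 2x2 Cholesky step: z2 is standard normal with corr(z1, z2) = rho
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * e
        # Inverse CDF method: normal -> uniform -> Poisson count
        counts.append(poisson_inverse_cdf(std_normal_cdf(z2), lam))
        z1s.append(z1)
    return z1s, counts


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    z, counts = simulate_pair(n=5000, rho=0.6, lam=2.0, seed=1)
    print("mean count:", sum(counts) / len(counts))
    print("corr(z1, count):", pearson(z, counts))
```

Comparing the printed correlation with the requested 0.6 shows why a correction step is needed before the normals are generated: the target correlation has to be adjusted upward so that the final variables, after transformation, land on the target.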