Showing 200 of total 1997 results

bioc

mixOmics:Omics Data Integration Project

Multivariate methods are well suited to large omics data sets where the number of variables (e.g. genes, proteins, metabolites) is much larger than the number of samples (patients, cells, mice). They have the appealing properties of reducing the dimension of the data by using instrumental variables (components), which are defined as combinations of all variables. Those components are then used to produce useful graphical outputs that enable better understanding of the relationships and correlation structures between the different data sets that are integrated. mixOmics offers a wide range of multivariate methods for the exploration and integration of biological datasets with a particular focus on variable selection. The package proposes several sparse multivariate models we have developed to identify the key variables that are highly correlated, and/or explain the biological outcome of interest. The data that can be analysed with mixOmics may come from high-throughput sequencing technologies, such as omics data (transcriptomics, metabolomics, proteomics, metagenomics, etc.) but also beyond the realm of omics (e.g. spectral imaging). The methods implemented in mixOmics can also handle missing values without having to delete entire rows with missing data. A non-exhaustive list of methods includes variants of generalised Canonical Correlation Analysis, sparse Partial Least Squares and sparse Discriminant Analysis. Recently we implemented integrative methods to combine multiple data sets: N-integration with variants of Generalised Canonical Correlation Analysis and P-integration with variants of multi-group Partial Least Squares.

Maintained by Eva Hamrud. Last updated 4 days ago.

immunooncology, microarray, sequencing, metabolomics, metagenomics, proteomics, geneprediction, multiplecomparison, classification, regression, bioconductor, genomics, genomics-data, genomics-visualization, multivariate-analysis, multivariate-statistics, omics, r-pkg, r-project

22.7 match 182 stars 13.71 score 1.3k scripts 22 dependents

josue-rodriguez

psymetadata:Open Datasets from Meta-Analyses in Psychology Research

Data and examples from meta-analyses in psychology research.

Maintained by Josue E. Rodriguez. Last updated 2 years ago.

74.2 match 1 stars 3.40 score 50 scripts

cotima

CoTiMA:Continuous Time Meta-Analysis ('CoTiMA')

The 'CoTiMA' package performs meta-analyses of correlation matrices of repeatedly measured variables taken from studies that used different time intervals. Different time intervals between measurement occasions pose problems for meta-analyses because the effects (e.g. cross-lagged effects) cannot be simply aggregated, for example, by means of common fixed or random effects analysis. However, continuous time math, which is applied in 'CoTiMA', can be used to extrapolate or interpolate the results from all studies to any desired time lag. In this way, effects obtained in studies that used different time intervals can be meta-analyzed. 'CoTiMA' fits models to empirical data using the structural equation model (SEM) package 'ctsem'; the effects specified in a SEM are related to parameters that are not directly included in the model (i.e., continuous time parameters; together, they represent the continuous time structural equation model, CTSEM). Statistical model comparisons and significance tests are then performed on the continuous time parameter estimates. 'CoTiMA' also allows analysis of publication bias (Egger's test, PET-PEESE estimates, zcurve analysis etc.) and analysis of statistical power (post hoc power, required sample sizes). See Dormann, C., Guthier, C., & Cortina, J. M. (2019) <doi:10.1177/1094428119847277> and Guthier, C., Dormann, C., & Voelkle, M. C. (2020) <doi:10.1037/bul0000304>.

Maintained by Markus Homberg. Last updated 2 months ago.

42.6 match 4 stars 5.28 score

mwheymans

psfmi:Prediction Model Pooling, Selection and Performance Evaluation Across Multiply Imputed Datasets

Pooling, backward and forward selection of linear, logistic and Cox regression models in multiply imputed datasets. Backward and forward selection can be done from the pooled model using Rubin's Rules (RR), the D1, D2, D3, D4 and the median p-values method. This is also possible for mixed models. The models can contain continuous, dichotomous, categorical and restricted cubic spline predictors and interaction terms between all these types of predictors. The stability of the models can be evaluated using (cluster) bootstrapping. The package further contains functions to pool model performance measures such as ROC/AUC, Reclassification, R-squared, scaled Brier score, H&L test and calibration plots for logistic regression models. Internal validation can be done across multiply imputed datasets with cross-validation or bootstrapping. The adjusted intercept after shrinkage of pooled regression coefficients can be obtained. Backward and forward selection as part of internal validation is possible. A function to externally validate logistic prediction models in multiply imputed datasets and a function to compare models are also available. For Cox models a strata variable can be included. Eekhout (2017) <doi:10.1186/s12874-017-0404-7>. Wiel (2009) <doi:10.1093/biostatistics/kxp011>. Marshall (2009) <doi:10.1186/1471-2288-9-57>.
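
To make the pooling step concrete, here is a minimal Python sketch of Rubin's Rules for a single regression coefficient. It is not the psfmi implementation or API: the function name `pool_rubin` and the example numbers are hypothetical, and the sketch assumes each imputed dataset has already produced one estimate and its squared standard error.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one coefficient across m imputed datasets (hypothetical helper).

    estimates: the coefficient estimated in each imputed dataset
    variances: the squared standard error of that coefficient in each dataset
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching lengths")

    pooled = mean(estimates)          # pooled point estimate
    within = mean(variances)          # average within-imputation variance
    between = variance(estimates)     # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return pooled, total


# Made-up values for three imputed datasets, for illustration only.
estimates = [0.50, 0.60, 0.40]
squared_ses = [0.04, 0.05, 0.03]
pooled, total_var = pool_rubin(estimates, squared_ses)
print(pooled, total_var ** 0.5)  # pooled coefficient and its pooled standard error
```

The total variance adds the between-imputation spread, inflated by (1 + 1/m), to the average within-imputation variance, so the pooled standard error reflects the uncertainty due to the missing data.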

Maintained by Martijn Heymans. Last updated 2 years ago.

cox-regression, imputation, imputed-datasets, logistic, multiple-imputation, pool, predictor, regression, selection, spline, spline-predictors

24.3 match 10 stars 7.17 score 70 scripts

alanarnholt

BSDA:Basic Statistics and Data Analysis

Data sets for book "Basic Statistics and Data Analysis" by Larry J. Kitchens.

Maintained by Alan T. Arnholt. Last updated 2 years ago.

13.6 match 7 stars 9.11 score 1.3k scripts 6 dependents

opencasestudies

OCSdata:Download Data from the 'Open Case Studies' Repository

Provides functions to access and download data from the 'Open Case Studies' <https://www.opencasestudies.org/> repositories on 'GitHub' <https://github.com/opencasestudies>. Different functions enable users to grab the data they need at different sections in the case study, as well as download the whole case study repository. All the user needs to do is input the name of the case study being worked on. The package relies on the httr::GET() function to access files through the 'GitHub' API. The functions usethis::use_zip() and usethis::create_from_github() are used to clone and/or download the case study repositories. See <https://github.com/opencasestudies/OCSdata/blob/master/README.md> for instructions and examples. To cite an individual case study, please see the 'README' file in the respective case study repository: <https://github.com/opencasestudies/ocs-bp-rural-and-urban-obesity> <https://github.com/opencasestudies/ocs-bp-air-pollution> <https://github.com/opencasestudies/ocs-bp-vaping-case-study> <https://github.com/opencasestudies/ocs-bp-opioid-rural-urban> <https://github.com/opencasestudies/ocs-bp-RTC-wrangling> <https://github.com/opencasestudies/ocs-bp-RTC-analysis> <https://github.com/opencasestudies/ocs-bp-youth-disconnection> <https://github.com/opencasestudies/ocs-bp-youth-mental-health> <https://github.com/opencasestudies/ocs-bp-school-shootings-dashboard> <https://github.com/opencasestudies/ocs-bp-co2-emissions> <https://github.com/opencasestudies/ocs-bp-diet>.

Maintained by Carrie Wright. Last updated 8 months ago.

data-science, public-health

21.9 match 1 stars 4.20 score 32 scripts

globalecologylab

poems:Pattern-Oriented Ensemble Modeling System

A framework of interoperable R6 classes (Chang, 2020, <https://CRAN.R-project.org/package=R6>) for building ensembles of viable models via the pattern-oriented modeling (POM) approach (Grimm et al., 2005, <doi:10.1126/science.1116681>). The package includes classes for encapsulating and generating model parameters, and managing the POM workflow. The workflow includes: model setup; generating model parameters via Latin hypercube sampling (Iman & Conover, 1980, <doi:10.1080/03610928008827996>); running multiple sampled model simulations; collating summary results; and validating and selecting an ensemble of models that best match known patterns. By default, model validation and selection utilizes an approximate Bayesian computation (ABC) approach (Beaumont et al., 2002, <doi:10.1093/genetics/162.4.2025>), although alternative user-defined functionality could be employed. The package includes a spatially explicit demographic population model simulation engine, which incorporates default functionality for density dependence, correlated environmental stochasticity, stage-based transitions, and distance-based dispersal. The user may customize the simulator by defining functionality for translocations, harvesting, mortality, and other processes, as well as defining the sequence order for the simulator processes. The framework could also be adapted for use with other model simulators by utilizing its extendable (inheritable) base classes.
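
The parameter-generation step in the workflow relies on Latin hypercube sampling. The Python sketch below illustrates the general technique only and is not the poems implementation: the function name `latin_hypercube`, the parameter bounds, and the seed are all hypothetical. Each parameter range is split into as many equal strata as there are samples, and every stratum is used exactly once per parameter.

```python
import random


def latin_hypercube(n_samples, bounds, seed=None):
    """Draw n_samples points so each parameter covers all of its strata once.

    bounds: list of (low, high) pairs, one per parameter (hypothetical ranges).
    """
    rng = random.Random(seed)
    columns = []
    for low, high in bounds:
        strata = list(range(n_samples))
        rng.shuffle(strata)  # random pairing of strata across parameters
        columns.append([
            low + (high - low) * (s + rng.random()) / n_samples
            for s in strata
        ])
    # Transpose: one tuple of parameter values per sampled model.
    return [tuple(col[i] for col in columns) for i in range(n_samples)]


# Five samples of two made-up parameters, e.g. a rate in [0, 1] and a capacity in [10, 20].
samples = latin_hypercube(5, [(0.0, 1.0), (10.0, 20.0)], seed=1)
for row in samples:
    print(row)
```

Compared with independent uniform draws, this stratification spreads a small number of simulations more evenly over each parameter's range.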

Maintained by July Pilowsky. Last updated 20 days ago.

biogeography, population-model, process-based

9.6 match 10 stars 8.05 score 59 scripts 2 dependents

bioc

ssrch:a simple search engine

Demonstrate tokenization and a search gadget for collections of CSV files.

Maintained by VJ Carey. Last updated 5 months ago.

infrastructure

17.7 match 3.60 score 20 scripts

thothorn

TH.data:TH's Data Archive

Contains data sets used in other packages Torsten Hothorn maintains.

Maintained by Torsten Hothorn. Last updated 2 months ago.

7.5 match 8.28 score 137 scripts 370 dependents

bioc

Moonlight2R:Identify oncogenes and tumor suppressor genes from omics data

Understanding cancer mechanisms requires identifying the genes that play a role in the development of the pathology and characterizing their role (notably oncogenes and tumor suppressors). We present an updated version of the R/Bioconductor package MoonlightR, namely Moonlight2R, which returns a list of candidate driver genes for specific cancer types on the basis of omics data integration. The Moonlight framework contains a primary layer where gene expression data and information about biological processes are integrated to predict genes called oncogenic mediators, divided into putative tumor suppressors and putative oncogenes. This is done through functional enrichment analyses, gene regulatory networks and upstream regulator analyses to score the importance of well-known biological processes with respect to the studied cancer type. By evaluating the effect of the oncogenic mediators on biological processes or through random forests, the primary layer predicts two putative roles for the oncogenic mediators: i) tumor suppressor genes (TSGs) and ii) oncogenes (OCGs). As gene expression data alone is not enough to explain the deregulation of the genes, a second layer of evidence is needed. We have automated the integration of a secondary mutational layer through new functionalities in Moonlight2R. These functionalities analyze mutations in the cancer cohort and classify them into driver and passenger mutations using the driver mutation prediction tool CScape-somatic. Those oncogenic mediators with at least one driver mutation are retained as the driver genes. As a consequence, this methodology not only identifies genes playing a dual role (e.g. TSG in one cancer type and OCG in another) but also helps in elucidating the biological processes underlying their specific roles. In particular, Moonlight2R can be used to discover OCGs and TSGs in the same cancer type. This may for instance help in answering the question whether some genes change role between early stages (I, II) and late stages (III, IV). In the future, this analysis could be useful to determine the causes of different resistances to chemotherapeutic treatments. An additional mechanistic layer evaluates if there are mutations affecting the protein stability of the transcription factors (TFs) of the TSGs and OCGs, as that may have an effect on the expression of the genes.

Maintained by Matteo Tiberti. Last updated 2 months ago.

dnamethylation, differentialmethylation, generegulation, geneexpression, methylationarray, differentialexpression, pathways, network, survival, genesetenrichment, networkenrichment

9.0 match 5 stars 6.59 score 43 scripts

sandhu-ss

bsitar:Bayesian Super Imposition by Translation and Rotation Growth Curve Analysis

The Super Imposition by Translation and Rotation (SITAR) model is a shape-invariant nonlinear mixed effect model that fits a natural cubic spline mean curve to the growth data and aligns individual-specific growth curves to the underlying mean curve via a set of random effects (see Cole, 2010 <doi:10.1093/ije/dyq115> for details). The non-Bayesian version of the SITAR model can be fit by using the already available R package 'sitar'. While the 'sitar' package allows modelling of a single outcome only, the 'bsitar' package offers great flexibility in fitting models of varying complexities, including joint modelling of multiple outcomes such as height and weight (multivariate model). Additionally, the 'bsitar' package allows for the simultaneous analysis of an outcome separately for subgroups defined by a factor variable such as gender. This is achieved by fitting separate models for each subgroup (for example males and females for gender variable). An advantage of this approach is that posterior draws for each subgroup are part of a single model object, making it possible to compare coefficients across subgroups and test hypotheses. Since the 'bsitar' package is a front-end to the R package 'brms', it offers excellent support for post-processing of posterior draws via various functions that are directly available from the 'brms' package. In addition, the 'bsitar' package includes various customized functions that allow for the visualization of distance (increase in size with age) and velocity (change in growth rate as a function of age), as well as the estimation of growth spurt parameters such as age at peak growth velocity and peak growth velocity.

Maintained by Satpal Sandhu. Last updated 3 hours ago.

10.5 match 5.46 score 7 scripts

mjlajeunesse

metagear:Comprehensive Research Synthesis Tools for Systematic Reviews and Meta-Analysis

Functionalities for facilitating systematic reviews, data extractions, and meta-analyses. It includes a GUI (graphical user interface) to help screen the abstracts and titles of bibliographic data; tools to assign screening effort across multiple collaborators/reviewers and to assess inter-reviewer reliability; tools to help automate the download and retrieval of journal PDF articles from online databases; figure and image extractions from PDFs; web scraping of citations; automated and manual data extraction from scatter-plot and bar-plot images; PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) flow diagrams; simple imputation tools to fill gaps in incomplete or missing study parameters; generation of random effect sizes for Hedges' d, log response ratio, odds ratio, and correlation coefficients for Monte Carlo experiments; covariance equations for modelling dependencies among multiple effect sizes (e.g., effect sizes with a common control); and finally summaries that replicate analyses and outputs from widely used but no longer updated meta-analysis software (i.e., metawin). This package was supported by National Science Foundation (NSF) grants DBI-1262545 and DEB-1451031. CITE: Lajeunesse, M.J. (2016) Facilitating systematic reviews, data extraction and meta-analysis with the metagear package for R. Methods in Ecology and Evolution 7, 323-330 <doi:10.1111/2041-210X.12472>.

Maintained by Marc J. Lajeunesse. Last updated 4 years ago.

8.3 match 14 stars 6.71 score 91 scripts

cran

CopulaREMADA:Copula Mixed Models for Multivariate Meta-Analysis of Diagnostic Test Accuracy Studies

The bivariate copula mixed model for meta-analysis of diagnostic test accuracy studies in Nikoloulopoulos (2015) <doi:10.1002/sim.6595> and Nikoloulopoulos (2018) <doi:10.1007/s10182-017-0299-y>. The vine copula mixed model for meta-analysis of diagnostic test accuracy studies accounting for disease prevalence in Nikoloulopoulos (2017) <doi:10.1177/0962280215596769> and also accounting for non-evaluable subjects in Nikoloulopoulos (2020) <doi:10.1515/ijb-2019-0107>. The hybrid vine copula mixed model for meta-analysis of diagnostic test accuracy case-control and cohort studies in Nikoloulopoulos (2018) <doi:10.1177/0962280216682376>. The D-vine copula mixed model for meta-analysis and comparison of two diagnostic tests in Nikoloulopoulos (2019) <doi:10.1177/0962280218796685>. The multinomial quadrivariate D-vine copula mixed model for meta-analysis of diagnostic tests with non-evaluable subjects in Nikoloulopoulos (2020) <doi:10.1177/0962280220913898>. The one-factor copula mixed model for joint meta-analysis of multiple diagnostic tests in Nikoloulopoulos (2022) <doi:10.1111/rssa.12838>. The multinomial six-variate 1-truncated D-vine copula mixed model for meta-analysis of two diagnostic tests accounting for within and between studies dependence in Nikoloulopoulos (2024) <doi:10.1177/09622802241269645>. The 1-truncated D-vine copula mixed models for meta-analysis of diagnostic accuracy studies without a gold standard (Nikoloulopoulos, 2024).

Maintained by Aristidis K. Nikoloulopoulos. Last updated 5 months ago.

34.2 match 2 stars 1.60 score 10 scripts

mrcieu

MRInstruments:Data sources for genetic instruments to be used in MR

Datasets of eQTLs, GWAS catalogs, etc.

Maintained by Gibran Hemani. Last updated 5 years ago.

10.4 match 44 stars 5.15 score 212 scripts

jomulder

BFpack:Flexible Bayes Factor Testing of Scientific Expectations

Implementation of default Bayes factors for testing statistical hypotheses under various statistical models. The package is intended for applied quantitative researchers in the social and behavioral sciences, medical research, and related fields. The Bayes factor tests can be executed for statistical models such as univariate and multivariate normal linear models, correlation analysis, generalized linear models, special cases of linear mixed models, survival models, and relational event models. Parameters that can be tested are location parameters (e.g., group means, regression coefficients), variances (e.g., group variances), and measures of association (e.g., polychoric/polyserial/biserial/tetrachoric/product moment correlations), among others. The statistical underpinnings are described in O'Hagan (1995) <DOI:10.1111/j.2517-6161.1995.tb02017.x>, De Santis and Spezzaferri (2001) <DOI:10.1016/S0378-3758(00)00240-8>, Mulder and Xin (2022) <DOI:10.1080/00273171.2021.1904809>, Mulder and Gelissen (2019) <DOI:10.1080/02664763.2021.1992360>, Mulder (2016) <DOI:10.1016/j.jmp.2014.09.004>, Mulder and Fox (2019) <DOI:10.1214/18-BA1115>, Mulder and Fox (2013) <DOI:10.1007/s11222-011-9295-3>, Boeing-Messing, van Assen, Hofman, Hoijtink, and Mulder (2017) <DOI:10.1037/met0000116>, Hoijtink, Mulder, van Lissa, and Gu (2018) <DOI:10.1037/met0000201>, Gu, Mulder, and Hoijtink (2018) <DOI:10.1111/bmsp.12110>, Hoijtink, Gu, and Mulder (2018) <DOI:10.1111/bmsp.12145>, and Hoijtink, Gu, Mulder, and Rosseel (2018) <DOI:10.1037/met0000187>. When using the package, please refer to Mulder et al. (2021) <DOI:10.18637/jss.v100.i18> and the relevant methodological papers.

Maintained by Joris Mulder. Last updated 1 month ago.

fortran, openblas

6.4 match 15 stars 8.24 score 55 scripts 3 dependents

dnychka

fields:Tools for Spatial Data

For curve, surface and function fitting with an emphasis on splines, spatial data, geostatistics, and spatial statistics. The major methods include cubic and thin plate splines, Kriging, and compactly supported covariance functions for large data sets. The splines and Kriging methods are supported by functions that can determine the smoothing parameter (nugget and sill variance) and other covariance function parameters by cross validation and also by restricted maximum likelihood. For Kriging there is an easy-to-use function that also estimates the correlation scale (range parameter). A major feature is that any covariance function implemented in R and following a simple format can be used for spatial prediction. There are also many useful functions for plotting and working with spatial data as images. This package also contains an implementation of sparse matrix methods for large spatial data sets and currently requires the sparse matrix (spam) package. Use help(fields) to get started and for an overview. The fields source code is deliberately commented and provides useful explanations of numerical details as a companion to the manual pages. The commented source code can be viewed by expanding the source code version and looking in the R subdirectory. The reference for fields can be generated by the citation function in R and has DOI <doi:10.5065/D6W957CT>. Development of this package was supported in part by the National Science Foundation Grant 1417857, the National Center for Atmospheric Research, and Colorado School of Mines. See the Fields URL for a vignette on using this package and some background on spatial statistics.

Maintained by Douglas Nychka. Last updated 9 months ago.

fortran

3.6 match 15 stars 12.60 score 7.7k scripts 295 dependents