R-universe search: nci

bioc

GenomicDataCommons:NIH / NCI Genomic Data Commons Access

Programmatically access the NIH / NCI Genomic Data Commons RESTful service.

Maintained by Sean Davis. Last updated 1 month ago.

dataimport sequencing api-client bioconductor bioinformatics cancer core-services data-science genomics nci tcga vignette

18.0 match 87 stars 11.94 score 238 scripts 12 dependents

insightsengineering

chevron:Standard TLGs for Clinical Trials Reporting

Provide standard tables, listings, and graphs (TLGs) libraries used in clinical trials. This package implements a structure to reformat the data with 'dunlin', create reporting tables using 'rtables' and 'tern' with standardized input arguments to enable quick generation of standard outputs. In addition, it also provides comprehensive data checks and script generation functionality.

Maintained by Joe Zhu. Last updated 24 days ago.

clinical-trials graphs listings nest reporting tables

21.5 match 12 stars 8.24 score 12 scripts

pharmaverse

admiral:ADaM in R Asset Library

A toolbox for programming Clinical Data Interchange Standards Consortium (CDISC) compliant Analysis Data Model (ADaM) datasets in R. ADaM datasets are a mandatory part of any New Drug or Biologics License Application submitted to the United States Food and Drug Administration (FDA). Analysis derivations are implemented in accordance with the "Analysis Data Model Implementation Guide" (CDISC Analysis Data Model Team, 2021, <https://www.cdisc.org/standards/foundational/adam>).

Maintained by Ben Straub. Last updated 4 days ago.

cdisc clinical-trials open-source

9.7 match 236 stars 13.89 score 486 scripts 4 dependents

bioc

MultiAssayExperiment:Software for the integration of multi-omics experiments in Bioconductor

Harmonize data management of multiple experimental assays performed on an overlapping set of specimens. It provides a familiar Bioconductor user experience by extending concepts from SummarizedExperiment, supporting an open-ended mix of standard data classes for individual assays, and allowing subsetting by genomic ranges or rownames. Facilities are provided for reshaping data into wide and long formats for adaptability to graphing and downstream analysis.

Maintained by Marcel Ramos. Last updated 2 months ago.

infrastructure datarepresentation bioconductor bioconductor-package genomics nci-itcr tcga u24ca289073

7.5 match 71 stars 14.95 score 670 scripts 127 dependents

aalfons

robustHD:Robust Methods for High-Dimensional Data

Robust methods for high-dimensional data, in particular linear model selection techniques based on least angle regression and sparse regression. Specifically, the package implements robust least angle regression (Khan, Van Aelst & Zamar, 2007; <doi:10.1198/016214507000000950>), (robust) groupwise least angle regression (Alfons, Croux & Gelper, 2016; <doi:10.1016/j.csda.2015.02.007>), and sparse least trimmed squares regression (Alfons, Croux & Gelper, 2013; <doi:10.1214/12-AOAS575>).

Maintained by Andreas Alfons. Last updated 9 months ago.

openblas cpp openmp

10.8 match 10 stars 7.06 score 174 scripts 8 dependents

bioc

cBioPortalData:Exposes and Makes Available Data from the cBioPortal Web Resources

The cBioPortalData R package accesses study datasets from the cBio Cancer Genomics Portal. It accesses the data either from the pre-packaged zip / tar files or from the API interface that was recently implemented by the cBioPortal Data Team. The package can provide data in either tabular format or as a MultiAssayExperiment object that uses familiar Bioconductor data representations.

Maintained by Marcel Ramos. Last updated 10 days ago.

software infrastructure thirdpartyclient bioconductor-package nci-itcr u24ca289073

7.5 match 33 stars 10.15 score 147 scripts 4 dependents

bioc

NCIgraph:Pathways from the NCI Pathways Database

Provides various methods to load the pathways from the NCI Pathways Database in R graph objects and to re-format them.

Maintained by Laurent Jacob. Last updated 5 months ago.

pathways graphandnetwork

16.6 match 4.26 score 10 scripts 1 dependent

bioc

nipalsMCIA:Multiple Co-Inertia Analysis via the NIPALS Method

Computes Multiple Co-Inertia Analysis (MCIA), a joint dimensionality reduction (jDR) algorithm, for a multi-block dataset using a modification to the Nonlinear Iterative Partial Least Squares method (NIPALS) proposed in (Hanafi et al., 2010). Allows multiple options for row- and table-level preprocessing, and speeds up computation of variance explained. Vignettes detail application to bulk and single-cell multi-omics studies.

Maintained by Maximilian Mattessich. Last updated 27 days ago.

software clustering classification multiplecomparison normalization preprocessing singlecell

8.4 match 6 stars 6.60 score 10 scripts

bioc

rcellminer:rcellminer: Molecular Profiles, Drug Response, and Chemical Structures for the NCI-60 Cell Lines

The NCI-60 cancer cell line panel has been used over the course of several decades as an anti-cancer drug screen. This panel was developed as part of the Developmental Therapeutics Program (DTP, http://dtp.nci.nih.gov/) of the U.S. National Cancer Institute (NCI). Thousands of compounds have been tested on the NCI-60, which have been extensively characterized by many platforms for gene and protein expression, copy number, mutation, and others (Reinhold, et al., 2012). The purpose of the CellMiner project (http://discover.nci.nih.gov/cellminer) has been to integrate data from multiple platforms used to analyze the NCI-60 and to provide a powerful suite of tools for exploration of NCI-60 data.

Maintained by Augustin Luna. Last updated 5 months ago.

acgh cellbasedassays copynumbervariation geneexpression pharmacogenomics pharmacogenetics mirna cheminformatics visualization software systemsbiology

8.7 match 5.71 score 113 scripts

md-anderson-bioinformatics

NGCHM:Next Generation Clustered Heat Maps

Next-Generation Clustered Heat Maps (NG-CHMs) allow for dynamic exploration of heat map data in a web browser. 'NGCHM' allows users to create both stand-alone HTML files containing a Next-Generation Clustered Heat Map, and .ngchm files to view in the NG-CHM viewer. See Ryan MC, Stucky M, et al (2020) <doi:10.12688/f1000research.20590.2> for more details.

Maintained by Mary A Rohrdanz. Last updated 9 days ago.

heatmap nci-itcr ng-chm

7.5 match 9 stars 5.48 score 28 scripts

trevorhastie

ISLR:Data for an Introduction to Statistical Learning with Applications in R

We provide the collection of data-sets used in the book 'An Introduction to Statistical Learning with Applications in R'.

Maintained by Trevor Hastie. Last updated 4 years ago.

4.0 match 4 stars 7.58 score 10k scripts 2 dependents

bioc

mogsa:Multiple omics data integrative clustering and gene set analysis

This package provides a method for doing gene set analysis based on multiple omics data.

Maintained by Chen Meng. Last updated 5 months ago.

geneexpression principalcomponent statisticalmethod clustering software

6.5 match 4.29 score 49 scripts

bioc

ropls:PCA, PLS(-DA) and OPLS(-DA) for multivariate analysis and feature selection of omics data

Latent variable modeling with Principal Component Analysis (PCA) and Partial Least Squares (PLS) are powerful methods for visualization, regression, classification, and feature selection of omics data where the number of variables exceeds the number of samples and with multicollinearity among variables. Orthogonal Partial Least Squares (OPLS) makes it possible to model separately the variation correlated with the factor of interest (predictive) and the uncorrelated (orthogonal) variation. While performing similarly to PLS, OPLS facilitates interpretation. Successful applications of these chemometrics techniques include spectroscopic data such as Raman spectroscopy, nuclear magnetic resonance (NMR), mass spectrometry (MS) in metabolomics and proteomics, but also transcriptomics data. In addition to scores, loadings and weights plots, the package provides metrics and graphics to determine the optimal number of components (e.g. with the R2 and Q2 coefficients), check the validity of the model by permutation testing, detect outliers, and perform feature selection (e.g. with Variable Importance in Projection or regression coefficients). The package can be accessed via a user interface on the Workflow4Metabolomics.org online resource for computational metabolomics (built upon the Galaxy environment).

Maintained by Etienne A. Thevenot. Last updated 5 months ago.

regression classification principalcomponent transcriptomics proteomics metabolomics lipidomics massspectrometry immunooncology

3.3 match 7.55 score 210 scripts 8 dependents

bioc

CoreGx:Classes and Functions to Serve as the Basis for Other 'Gx' Packages

A collection of functions and classes which serve as the foundation for our lab's suite of R packages, such as 'PharmacoGx' and 'RadioGx'. This package was created to abstract shared functionality from other lab package releases to increase ease of maintainability and reduce code repetition in current and future 'Gx' suite programs. Major features include a 'CoreSet' class, from which 'RadioSet' and 'PharmacoSet' are derived, along with get and set methods for each respective slot. Additional functions related to fitting and plotting dose response curves, quantifying statistical correlation and calculating area under the curve (AUC) or survival fraction (SF) are included. For more details please see the included documentation, as well as: Smirnov, P., Safikhani, Z., El-Hachem, N., Wang, D., She, A., Olsen, C., Freeman, M., Selby, H., Gendoo, D., Grossman, P., Beck, A., Aerts, H., Lupien, M., Goldenberg, A. (2015) <doi:10.1093/bioinformatics/btv723>. Manem, V., Labie, M., Smirnov, P., Kofia, V., Freeman, M., Koritzinsky, M., Abazeed, M., Haibe-Kains, B., Bratman, S. (2018) <doi:10.1101/449793>.

Maintained by Benjamin Haibe-Kains. Last updated 5 months ago.

software pharmacogenomics classification survival

3.4 match 6.53 score 63 scripts 6 dependents

trevorhastie

ISLR2:Introduction to Statistical Learning, Second Edition

We provide the collection of data-sets used in the book 'An Introduction to Statistical Learning with Applications in R, Second Edition'. These include many data-sets that we used in the first edition (some with minor changes), and some new datasets.

Maintained by Trevor Hastie. Last updated 2 years ago.

4.0 match 2 stars 5.49 score 2.2k scripts

bioc

made4:Multivariate analysis of microarray data using ADE4

Multivariate data analysis and graphical display of microarray data. Functions include supervised dimension reduction (between group analysis) and joint dimension reduction of 2 datasets (coinertia analysis). It contains functions that require R package ade4.

Maintained by Aedin Culhane. Last updated 5 months ago.

clustering classification dimensionreduction principalcomponent transcriptomics multiplecomparison geneexpression sequencing microarray

3.4 match 6.11 score 107 scripts 2 dependents

bioc

AlphaMissenseR:Accessing AlphaMissense Data Resources in R

The AlphaMissense publication <https://www.science.org/doi/epdf/10.1126/science.adg7492> outlines how a variant of AlphaFold / DeepMind was used to predict missense variant pathogenicity. Supporting data on Zenodo <https://zenodo.org/record/10813168> include, for instance, 71M variants across hg19 and hg38 genome builds. The 'AlphaMissenseR' package allows ready access to the data, downloading individual files to DuckDB databases for exploration and integration into *R* and *Bioconductor* workflows.

Maintained by Martin Morgan. Last updated 5 months ago.

snp annotation functionalgenomics structuralprediction transcriptomics variantannotation geneprediction immunooncology

2.8 match 8 stars 6.86 score 10 scripts

jorgetendeiro

PerFit:Person Fit

Several person-fit statistics (PFSs; Meijer and Sijtsma, 2001, <doi:10.1177/01466210122031957>) are offered. These statistics allow assessing whether individual response patterns to tests or questionnaires are (im)plausible given the other respondents in the sample or given a specified item response theory model. Some PFSs apply to dichotomous data, such as the likelihood-based PFSs (lz, lz*) and the group-based PFSs (personal biserial correlation, caution index, (normed) number of Guttman errors, agreement/disagreement/dependability statistics, U3, ZU3, NCI, Ht). PFSs suitable to polytomous data include extensions of lz, U3, and (normed) number of Guttman errors.

Maintained by Jorge N. Tendeiro. Last updated 3 years ago.

5.7 match 1 stars 3.36 score 46 scripts

bioc

omicade4:Multiple co-inertia analysis of omics datasets

This package performs multiple co-inertia analysis of omics datasets.

Maintained by Chen Meng. Last updated 5 months ago.

software clustering classification multiplecomparison

3.3 match 5.48 score 50 scripts 1 dependent

iembry

chem.databases:Collection of 3 Chemical Databases from Public Sources

Contains the Multi-Species Acute Toxicity Database (CAS & SMILES columns only) [United States (US) Department of Health and Human Services (DHHS) National Institutes of Health (NIH) National Cancer Institute (NCI), "Multi-Species Acute Toxicity Database", <https://cactus.nci.nih.gov/download/acute-toxicity-db/>] combined with the Toxic Substances Control Act (TSCA) Inventory [United States Environmental Protection Agency (US EPA), "Toxic Substances Control Act (TSCA) Chemical Substance Inventory", <https://www.epa.gov/tsca-inventory/how-access-tsca-inventory> and <https://cdxapps.epa.gov/oms-substance-registry-services/substance-list-details/169>] and the Agency for Toxic Substances and Disease Registry (ATSDR) Database [United States (US) Department of Health and Human Services (DHHS) Centers for Disease Control and Prevention (CDC)/Agency for Toxic Substances and Disease Registry (ATSDR), "Agency for Toxic Substances and Disease Registry (ATSDR) Database", <https://cdxapps.epa.gov/oms-substance-registry-services/substance-list-details/105>] in 2 data sets. One data set has a focus on the latter 2 databases and one data set focuses on the former database. Also contains the collection of chemical data from Wikipedia compiled in the US EPA CompTox Chemicals Dashboard [United States Environmental Protection Agency (US EPA) / Wikimedia Foundation, Inc. "CompTox Chemicals Dashboard v2.2.1", <https://comptox.epa.gov/dashboard/chemical-lists/WIKIPEDIA>].

Maintained by Irucka Embry. Last updated 1 year ago.

9.4 match 1.70 score

wolski

sigora:Signature Overrepresentation Analysis

Pathway analysis statistically links observations on the molecular level to biological processes or pathways on the systems (i.e., organism, organ, tissue, cell) level. Traditionally, pathway analysis methods regard pathways as collections of single genes and treat all genes in a pathway as equally informative. However, this can lead to identifying spurious pathways as statistically significant since components are often shared amongst pathways. SIGORA seeks to avoid this pitfall by focusing on genes or gene pairs that are (as a combination) specific to a single pathway. In relying on such pathway gene-pair signatures (Pathway-GPS), SIGORA inherently uses the status of other genes in the experimental context to identify the most relevant pathways. The current version allows for pathway analysis of human and mouse datasets. In addition, it contains pre-computed Pathway-GPS data for pathways in the KEGG and Reactome pathway repositories and mechanisms for extracting GPS for user-supplied repositories.

Maintained by Witold Wolski. Last updated 3 years ago.

genesetenrichment go software pathways kegg

3.6 match 4.43 score 18 scripts 1 dependent

bioc

GSAR:Gene Set Analysis in R

Gene set analysis using specific alternative hypotheses. Tests for differential expression, scale and net correlation structure.

Maintained by Yasir Rahmatallah. Last updated 5 months ago.

software statisticalmethod differentialexpression

3.5 match 4.38 score 7 scripts

r-gregmisc

gmodels:Various R Programming Tools for Model Fitting

Various R programming tools for model fitting.

Maintained by Gregory R. Warnes. Last updated 3 months ago.

1.5 match 1 stars 10.01 score 3.5k scripts 30 dependents

uscbiostats

slurmR:A Lightweight Wrapper for 'Slurm'

'Slurm', Simple Linux Utility for Resource Management <https://slurm.schedmd.com/>, is a popular 'Linux' based software used to schedule jobs in 'HPC' (High Performance Computing) clusters. This R package provides a specialized lightweight wrapper of 'Slurm' with a syntax similar to that found in the 'parallel' R package. The package also includes a method for creating socket cluster objects spanning multiple nodes that can be used with the 'parallel' package.

Maintained by George Vega Yon. Last updated 1 year ago.

bioinformatics hpc slurm

1.5 match 59 stars 8.06 score 216 scripts 1 dependent

bioc

missRows:Handling Missing Individuals in Multi-Omics Data Integration

The missRows package implements the MI-MFA method to deal with missing individuals ('biological units') in multi-omics data integration. The MI-MFA method generates multiple imputed datasets from a Multiple Factor Analysis model, then the resulting solutions are combined into a single consensus solution. The package provides functions for estimating coordinates of individuals and variables, imputing missing individuals, and various diagnostic plots to inspect the pattern of missingness and visualize the uncertainty due to missing values.

Maintained by Gonzalez Ignacio. Last updated 5 months ago.

software statisticalmethod dimensionreduction principalcomponent mathematicalbiology visualization

3.4 match 3.30 score 3 scripts

idblr

ndi:Neighborhood Deprivation Indices

Computes various geospatial indices of socioeconomic deprivation and disparity in the United States. Some indices are considered "spatial" because they consider the values of neighboring (i.e., adjacent) census geographies in their computation, while other indices are "aspatial" because they only consider the value within each census geography. Two types of aspatial neighborhood deprivation indices (NDI) are available: (1) based on Messer et al. (2006) <doi:10.1007/s11524-006-9094-x> and (2) based on Andrews et al. (2020) <doi:10.1080/17445647.2020.1750066> and Slotman et al. (2022) <doi:10.1016/j.dib.2022.108002> who use variables chosen by Roux and Mair (2010) <doi:10.1111/j.1749-6632.2009.05333.x>. Both are a decomposition of multiple demographic characteristics from the U.S. Census Bureau American Community Survey 5-year estimates (ACS-5; 2006-2010 onward). Using data from the ACS-5 (2005-2009 onward), the package can also compute indices of racial or ethnic residential segregation, including but not limited to those discussed in Massey & Denton (1988) <doi:10.1093/sf/67.2.281>, and additional indices of socioeconomic disparity.

Maintained by Ian D. Buller. Last updated 7 months ago.

census census-api census-data deprivation deprivation-stats disparity geospatial geospatial-data metric-development principal-component-analysis segregation-measures socio-economic-indicators

1.6 match 21 stars 6.67 score 7 scripts 1 dependent

uscbiostats

fmcmc:A friendly MCMC framework

Provides a friendly (flexible) Markov Chain Monte Carlo (MCMC) framework for implementing Metropolis-Hastings algorithm in a modular way allowing users to specify automatic convergence checker, personalized transition kernels, and out-of-the-box multiple MCMC chains using parallel computing. Most of the methods implemented in this package can be found in Brooks et al. (2011, ISBN 9781420079425). Among the methods included, we have: Haario (2001) <doi:10.1007/s11222-011-9269-5> Adaptive Metropolis, Vihola (2012) <doi:10.1007/s11222-011-9269-5> Robust Adaptive Metropolis, and Thawornwattana et al. (2018) <doi:10.1214/17-BA1084> Mirror transition kernels.

Maintained by George Vega Yon. Last updated 1 year ago.

adaptive bayesian-inference markov-chain-monte-carlo mcmc metropolis-hastings parallel-computing

1.5 match 16 stars 6.79 score 86 scripts 1 dependent

uscbiostats

aphylo:Statistical Inference and Prediction of Annotations in Phylogenetic Trees

Implements a parsimonious evolutionary model to analyze and predict gene-functional annotations in phylogenetic trees as described in Vega Yon et al. (2021) <doi:10.1371/journal.pcbi.1007948>. Focusing on computational efficiency, 'aphylo' makes it possible to estimate pooled phylogenetic models, including thousands (hundreds) of annotations (trees) in the same run. The package also provides the tools for visualization of annotated phylogenies, calculation of posterior probabilities (prediction) and goodness-of-fit assessment featured in Vega Yon et al. (2021).

Maintained by George Vega Yon. Last updated 1 year ago.

annotations inference phylogenetics rcpparmadillo cpp

1.6 match 6 stars 5.49 score 104 scripts

machiela-lab

PCAmatchR:Match Cases to Controls Based on Genotype Principal Components

Matches cases to controls based on genotype principal components (PC). In order to produce better results, matches are based on the weighted distance of PCs where the weights are equal to the % variance explained by that PC. A weighted Mahalanobis distance metric (Kidd et al. (1987) <DOI:10.1016/0031-3203(87)90066-5>) is used to determine matches.

Maintained by Derek W. Brown. Last updated 2 years ago.

1.5 match 10 stars 4.70 score 4 scripts

machiela-lab

sparrpowR:Power Analysis to Detect Spatial Relative Risk Clusters

Calculate the statistical power to detect clusters using kernel-based spatial relative risk functions that are estimated using the 'sparr' package. Details about the 'sparr' package methods can be found in the tutorial: Davies et al. (2018) <doi:10.1002/sim.7577>. Details about kernel density estimation can be found in J. F. Bithell (1990) <doi:10.1002/sim.4780090616>. More information about relative risk functions using kernel density estimation can be found in J. F. Bithell (1991) <doi:10.1002/sim.4780101112>.

Maintained by Ian D. Buller. Last updated 1 year ago.

1.5 match 2 stars 4.30 score 7 scripts

maialab

mskcc.oncotree:Interface to the 'OncoTree' API

Programmatic access to 'OncoTree' API <http://oncotree.mskcc.org/>. Get access to tumor main types, identifiers and utility routines to map across to other tumor classification systems.

Maintained by Ramiro Magno. Last updated 2 years ago.

1.9 match 6 stars 3.48 score 7 scripts

lance-waller-lab

gateR:Flow/Mass Cytometry Gating via Spatial Kernel Density Estimation

Estimates statistically significant marker combination values within which one immunologically distinctive group (i.e., disease case) is more associated than another group (i.e., healthy control), successively, using various combinations (i.e., "gates") of markers to examine features of cells that may be different between groups. For a two-group comparison, the 'gateR' package uses the spatial relative risk function estimated using the 'sparr' package. Details about the 'sparr' package methods can be found in the tutorial: Davies et al. (2018) <doi:10.1002/sim.7577>. Details about kernel density estimation can be found in J. F. Bithell (1990) <doi:10.1002/sim.4780090616>. More information about relative risk functions using kernel density estimation can be found in J. F. Bithell (1991) <doi:10.1002/sim.4780101112>.

Maintained by Ian D. Buller. Last updated 1 year ago.

cytometry flow-cytometry gating kernel-density-estimation mass-cytometry non-euclidean-spaces spatial-analysis

1.5 match 2 stars 4.00 score 6 scripts

bioc

ldblock:data structures for linkage disequilibrium measures in populations

Define data structures for linkage disequilibrium measures in populations.

Maintained by VJ Carey. Last updated 5 months ago.

1.8 match 3.30 score 10 scripts

jimb3

GxEScanR:Run GWAS/GWEIS Scans Using Binary Dosage Files

Tools to run genome-wide association study (GWAS) and genome-wide by environment interaction study (GWEIS) scans using the genetic data stored in a binary dosage file. The user provides a data frame with the subject's covariate data and the information about the binary dosage file returned by the BinaryDosage::getbdinfo() routine.

Maintained by John Morrison. Last updated 4 years ago.

openblas cpp openmp

2.5 match 2.28 score 19 scripts

razrahman

IntegratedMRF:Integrated Prediction using Uni-Variate and Multivariate Random Forests

An implementation of a framework for drug sensitivity prediction from various genetic characterizations using ensemble approaches. Random Forests or Multivariate Random Forest predictive models can be generated from each genetic characterization that are then combined using a Least Square Regression approach. It also provides options for the use of different error estimation approaches of Leave-one-out, Bootstrap, N-fold cross validation and 0.632+Bootstrap along with generation of prediction confidence interval using Jackknife-after-Bootstrap approach.

Maintained by Raziur Rahman. Last updated 7 years ago.

cpp

3.4 match 1.26 score 18 scripts