R-universe search: needs:ROCR

satijalab

Seurat:Tools for Single Cell Genomics

A toolkit for quality control, analysis, and exploration of single cell RNA sequencing data. 'Seurat' aims to enable users to identify and interpret sources of heterogeneity from single cell transcriptomic measurements, and to integrate diverse types of single cell data. See Satija R, Farrell J, Gennert D, et al (2015) <doi:10.1038/nbt.3192>, Macosko E, Basu A, Satija R, et al (2015) <doi:10.1016/j.cell.2015.05.002>, Stuart T, Butler A, et al (2019) <doi:10.1016/j.cell.2019.05.031>, and Hao, Hao, et al (2020) <doi:10.1101/2020.10.12.335331> for more details.

Maintained by Paul Hoffman. Last updated 1 year ago.

human-cell-atlas single-cell-genomics single-cell-rna-seq cpp

2.4k stars 16.86 score 50k scripts 73 dependents

ecpolley

SuperLearner:Super Learner Prediction

Implements the super learner prediction method and contains a library of prediction algorithms to be used in the super learner.

Maintained by Eric Polley. Last updated 1 year ago.

274 stars 12.85 score 2.1k scripts 36 dependents

yanyachen

MLmetrics:Machine Learning Evaluation Metrics

A collection of evaluation metrics, including loss, score and utility functions, that measure regression, classification and ranking performance.

Maintained by Yachen Yan. Last updated 12 months ago.

69 stars 11.09 score 2.2k scripts 20 dependents

bioc

infercnv:Infer Copy Number Variation from Single-Cell RNA-Seq Data

Using single-cell RNA-Seq expression to visualize CNV in cells.

Maintained by Christophe Georgescu. Last updated 5 months ago.

software copynumbervariation variantdetection structuralvariation genomicvariation genetics transcriptomics statisticalmethod bayesian hiddenmarkovmodel singlecell jags cpp

601 stars 10.92 score 674 scripts

bioc

singleCellTK:Comprehensive and Interactive Analysis of Single Cell RNA-Seq Data

The Single Cell Toolkit (SCTK) in the singleCellTK package provides an interface to popular tools for importing, quality control, analysis, and visualization of single cell RNA-seq data. SCTK allows users to seamlessly integrate tools from various packages at different stages of the analysis workflow. A general "a la carte" workflow gives users the ability to access multiple methods for data importing, calculation of general QC metrics, doublet detection, ambient RNA estimation and removal, filtering, normalization, batch correction or integration, dimensionality reduction, 2-D embedding, clustering, marker detection, differential expression, cell type labeling, pathway analysis, and data exporting. Curated workflows can be used to run Seurat and Celda. Streamlined quality control can be performed on the command line using the SCTK-QC pipeline. Users can analyze their data using commands in the R console or by using an interactive Shiny Graphical User Interface (GUI). Specific analyses or entire workflows can be summarized and shared with comprehensive HTML reports generated by Rmarkdown. Additional documentation and vignettes can be found at camplab.net/sctk.

Maintained by Joshua David Campbell. Last updated 1 month ago.

singlecell geneexpression differentialexpression alignment clustering immunooncology batcheffect normalization qualitycontrol dataimport gui

182 stars 10.17 score 252 scripts

bioc

SC3:Single-Cell Consensus Clustering

A tool for unsupervised clustering and analysis of single cell RNA-Seq data.

Maintained by Vladimir Kiselev. Last updated 5 months ago.

immunooncology singlecell software classification clustering dimensionreduction supportvectormachine rnaseq visualization transcriptomics datarepresentation gui differentialexpression transcription bioconductor-package human-cell-atlas single-cell-rna-seq openblas cpp

125 stars 10.10 score 374 scripts 1 dependent

constantamateur

SoupX:Single Cell mRNA Soup eXterminator

Quantify, profile and remove ambient mRNA contamination (the "soup") from droplet based single cell RNA-seq experiments. Implements the method described in Young et al. (2018) <doi:10.1101/303727>.

Maintained by Matthew Daniel Young. Last updated 2 years ago.

266 stars 10.08 score 594 scripts 1 dependent

tlverse

sl3:Pipelines for Machine Learning and Super Learning

A modern implementation of the Super Learner prediction algorithm, coupled with a general purpose framework for composing arbitrary pipelines for machine learning tasks.

Maintained by Jeremy Coyle. Last updated 5 months ago.

data-science ensemble-learning ensemble-model machine-learning model-selection regression stacking statistics

100 stars 9.94 score 748 scripts 7 dependents

jamovi

jmv:The 'jamovi' Analyses

A suite of common statistical methods such as descriptives, t-tests, ANOVAs, regression, correlation matrices, proportion tests, contingency tables, and factor analysis. This package is also usable from the 'jamovi' statistical spreadsheet (see <https://www.jamovi.org> for more information).

Maintained by Jonathon Love. Last updated 1 month ago.

60 stars 9.48 score 440 scripts

stemangiola

tidyseurat:Brings Seurat to the Tidyverse

It creates an invisible layer that allows users to view the 'Seurat' object as a tibble and interact with it seamlessly using the tidyverse.

Maintained by Stefano Mangiola. Last updated 8 months ago.

assaydomain infrastructure rnaseq differentialexpression geneexpression normalization clustering qualitycontrol sequencing transcription transcriptomics dplyr ggplot2 pca purrr sct seurat single-cell single-cell-rna-seq tibble tidyr tidyverse transcripts tsne umap

159 stars 9.48 score 398 scripts 1 dependent

ledell

cvAUC:Cross-Validated Area Under the ROC Curve Confidence Intervals

Tools for working with and evaluating cross-validated area under the ROC curve (AUC) estimators. The primary functions of the package are ci.cvAUC and ci.pooled.cvAUC, which report cross-validated AUC and compute confidence intervals for cross-validated AUC estimates based on influence curves for i.i.d. and pooled repeated measures data, respectively. One benefit to using influence curve based confidence intervals is that they require much less computation time than bootstrapping methods. The utility functions, AUC and cvAUC, are simple wrappers for functions from the ROCR package.

Maintained by Erin LeDell. Last updated 3 years ago.

auc confidence-intervals cross-validation machine-learning statistics variance

23 stars 9.17 score 317 scripts 40 dependents
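The cross-validated AUC that cvAUC reports is, in essence, the average of the per-fold AUCs. The Python sketch below illustrates only that point estimate; it is not the package's R code (which wraps ROCR), it omits the influence-curve confidence intervals, and the names `auc` and `cv_auc` are hypothetical. Each fold's AUC is computed with the Mann-Whitney formulation: the share of positive/negative pairs in which the positive scores higher, with ties counted as one half.

```python
from typing import List, Sequence, Tuple


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("each fold needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def cv_auc(scores: Sequence[float], labels: Sequence[int],
           folds: Sequence[int]) -> Tuple[float, List[float]]:
    """Cross-validated AUC: the mean of the AUCs computed within each fold."""
    per_fold = []
    for f in sorted(set(folds)):
        idx = [i for i, g in enumerate(folds) if g == f]
        per_fold.append(auc([scores[i] for i in idx], [labels[i] for i in idx]))
    return sum(per_fold) / len(per_fold), per_fold
```

For example, with two folds whose held-out AUCs are 1.0 and 0.5, `cv_auc` returns 0.75 together with the per-fold values.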

bioc

iCOBRA:Comparison and Visualization of Ranking and Assignment Methods

This package provides functions for calculation and visualization of performance metrics for evaluation of ranking and binary classification (assignment) methods. Various types of performance plots can be generated programmatically. The package also contains a shiny application for interactive exploration of results.

Maintained by Charlotte Soneson. Last updated 3 months ago.

classification visualization

16 stars 8.84 score 192 scripts 1 dependent

pablo14

funModeling:Exploratory Data Analysis and Data Preparation Tool-Box

Around 10% of almost any predictive modeling project is spent on predictive modeling itself; 'funModeling' and the book Data Science Live Book (<https://livebook.datascienceheroes.com/>) are intended to cover the remaining 90%: data preparation, profiling, selecting the best variables, 'dataViz', assessing model performance, and other functions.

Maintained by Pablo Casas. Last updated 2 years ago.

100 stars 8.51 score 654 scripts

samuel-marsh

scCustomize:Custom Visualizations & Functions for Streamlined Analyses of Single Cell Sequencing

Collection of functions created and/or curated to aid in the visualization and analysis of single-cell data using 'R'. 'scCustomize' aims to provide 1) customized visualizations that improve ease of use and create more aesthetic and functional visuals, and 2) improved speed/reproducibility of common tasks/pieces of code in scRNA-seq analysis through single functions or groups of functions. For citation please use: Marsh SE (2021) "Custom Visualizations & Functions for Streamlined Analyses of Single Cell Sequencing" <doi:10.5281/zenodo.5706430> RRID:SCR_024675.

Maintained by Samuel Marsh. Last updated 3 months ago.

customization ggplot2 scrna-seq seurat single-cell single-cell-genomics single-cell-rna-seq visualization

246 stars 8.45 score 1.1k scripts

bioc

projectR:Functions for the projection of weights from PCA, CoGAPS, NMF, correlation, and clustering

Functions for the projection of data into the spaces defined by PCA, CoGAPS, NMF, correlation, and clustering.

Maintained by Genevieve Stein-OBrien. Last updated 15 days ago.

functionalprediction generegulation biologicalquestion software

62 stars 8.42 score 70 scripts

carmonalab

scGate:Marker-Based Cell Type Purification for Single-Cell Sequencing Data

A common bioinformatics task in single-cell data analysis is to purify a cell type or cell population of interest from heterogeneous datasets. 'scGate' automates marker-based purification of specific cell populations, without requiring training data or reference gene expression profiles. Briefly, 'scGate' takes as input: i) a gene expression matrix stored in a 'Seurat' object and ii) a “gating model” (GM), consisting of a set of marker genes that define the cell population of interest. The GM can be as simple as a single marker gene, or a combination of positive and negative markers. More complex GMs can be constructed in a hierarchical fashion, akin to gating strategies employed in flow cytometry. 'scGate' evaluates the strength of signature marker expression in each cell using the rank-based method 'UCell', and then performs k-nearest neighbor (kNN) smoothing by calculating the mean 'UCell' score across neighboring cells. kNN-smoothing aims at compensating for the large degree of sparsity in scRNA-seq data. Finally, a universal threshold over kNN-smoothed signature scores is applied in binary decision trees generated from the user-provided gating model, to annotate cells as either “pure” or “impure”, with respect to the cell population of interest. See the related publication Andreatta et al. (2022) <doi:10.1093/bioinformatics/btac141>.

Maintained by Massimo Andreatta. Last updated 2 months ago.

filtering marker-genes scgate signatures single-cell

106 stars 8.38 score 163 scripts
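The kNN-smoothing and thresholding steps described for scGate can be illustrated with a short NumPy sketch. This is not scGate's implementation: it assumes per-cell signature scores are already computed (scGate uses 'UCell'), finds neighbors by plain Euclidean distance in a hypothetical low-dimensional embedding, includes each cell in its own average (an assumption), and uses an arbitrary threshold rather than scGate's own. It also ignores hierarchical gating models; `knn_smooth` and `gate_cells` are hypothetical names.

```python
import numpy as np


def knn_smooth(scores: np.ndarray, embedding: np.ndarray, k: int) -> np.ndarray:
    """Average each cell's score with the scores of its k nearest neighbours."""
    # Pairwise squared Euclidean distances between cells (rows of the embedding).
    diff = embedding[:, None, :] - embedding[None, :, :]
    d2 = (diff ** 2).sum(axis=-1)
    # Exclude each cell from its own neighbour search.
    np.fill_diagonal(d2, np.inf)
    # Indices of the k closest other cells for every cell.
    nn = np.argsort(d2, axis=1, kind="stable")[:, :k]
    # Mean over the cell itself plus its k neighbours (self-inclusion is an assumption).
    return (scores + scores[nn].sum(axis=1)) / (k + 1)


def gate_cells(scores, embedding, k=10, threshold=0.2):
    """Label cells 'pure' or 'impure' by thresholding kNN-smoothed scores."""
    scores = np.asarray(scores, dtype=float)
    embedding = np.asarray(embedding, dtype=float)
    smoothed = knn_smooth(scores, embedding, k)
    labels = np.where(smoothed > threshold, "pure", "impure")
    return labels, smoothed
```

Smoothing lets a cell with a sparse, near-zero score be rescued by high-scoring neighbors, and vice versa, which is the sparsity compensation the description refers to. The dense pairwise-distance matrix is only practical for small examples; real data would need an approximate neighbor search.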

cefet-rj-dal

harbinger:A Unified Time Series Event Detection Framework

By analyzing time series, it is possible to observe significant changes in the behavior of observations that frequently characterize events. Events present themselves as anomalies, change points, or motifs. In the literature, there are several methods for detecting events. However, searching for a suitable time series method is a complex task, especially considering that the nature of events is often unknown. This work presents Harbinger, a framework for integrating and analyzing event detection methods. Harbinger contains several state-of-the-art methods described in Salles et al. (2020) <doi:10.5753/sbbd.2020.13626>.

Maintained by Eduardo Ogasawara. Last updated 4 months ago.

18 stars 8.32 score 216 scripts

cwolock

survML:Tools for Flexible Survival Analysis Using Machine Learning

Statistical tools for analyzing time-to-event data using machine learning. Implements survival stacking for conditional survival estimation, standardized survival function estimation for current status data, and methods for algorithm-agnostic variable importance. See Wolock CJ, Gilbert PB, Simon N, and Carone M (2024) <doi:10.1080/10618600.2024.2304070>.

Maintained by Charles Wolock. Last updated 1 day ago.

18 stars 8.13 score 73 scripts 1 dependent

bioc

compcodeR:RNAseq data simulation, differential expression analysis and performance comparison of differential expression methods

This package provides extensive functionality for comparing results obtained by different methods for differential expression analysis of RNAseq data. It also contains functions for simulating count data. Finally, it provides convenient interfaces to several packages for performing the differential expression analysis. These can also be used as templates for setting up and running a user-defined differential analysis workflow within the framework of the package.

Maintained by Charlotte Soneson. Last updated 3 months ago.

immunooncology rnaseq differentialexpression

12 stars 8.10 score 26 scripts

tlverse

tmle3:The Extensible TMLE Framework

A general framework supporting the implementation of targeted maximum likelihood estimators (TMLEs) of a diverse range of statistical target parameters through a unified interface. The goal is that the exposed framework be as general as the mathematical framework upon which it draws.

Maintained by Jeremy Coyle. Last updated 5 months ago.

causal-inference machine-learning targeted-learning variable-importance

38 stars 7.91 score 286 scripts 5 dependents

myles-lewis

nestedcv:Nested Cross-Validation with 'glmnet' and 'caret'

Implements nested k*l-fold cross-validation for lasso and elastic-net regularised linear models via the 'glmnet' package and other machine learning models via the 'caret' package <doi:10.1093/bioadv/vbad048>. Cross-validation of 'glmnet' alpha mixing parameter and embedded fast filter functions for feature selection are provided. Described as double cross-validation by Stone (1977) <doi:10.1111/j.2517-6161.1977.tb01603.x>. Also implemented is a method using outer CV to measure unbiased model performance metrics when fitting Bayesian linear and logistic regression shrinkage models using the horseshoe prior over parameters to encourage a sparse model as described by Piironen & Vehtari (2017) <doi:10.1214/17-EJS1337SI>.

Maintained by Myles Lewis. Last updated 14 days ago.

12 stars 7.90 score 46 scripts

schlosslab

mikropml:User-Friendly R Package for Supervised Machine Learning Pipelines

An interface to build machine learning models for classification and regression problems. 'mikropml' implements the ML pipeline described by Topçuoğlu et al. (2020) <doi:10.1128/mBio.00434-20> with reasonable default options for data preprocessing, hyperparameter tuning, cross-validation, testing, model evaluation, and interpretation steps. See the website <https://www.schlosslab.org/mikropml/> for more information, documentation, and examples.

Maintained by Kelly Sovacool. Last updated 2 years ago.

machine-learning

56 stars 7.83 score 86 scripts

nsaph-software

CausalGPS:Matching on Generalized Propensity Scores with Continuous Exposures

Provides a framework for estimating causal effects of a continuous exposure using observational data, and implementing matching and weighting on the generalized propensity score. Wu, X., Mealli, F., Kioumourtzoglou, M.A., Dominici, F. and Braun, D., 2022. Matching on generalized propensity scores with continuous exposures. Journal of the American Statistical Association, pp.1-29.

Maintained by Naeem Khoshnevis. Last updated 10 months ago.

cpp openmp

24 stars 7.67 score 39 scripts

yqzhong7

AIPW:Augmented Inverse Probability Weighting

The 'AIPW' package implements the augmented inverse probability weighting, a doubly robust estimator, for average causal effect estimation with user-defined stacked machine learning algorithms. To cite the 'AIPW' package, please use: "Yongqi Zhong, Edward H. Kennedy, Lisa M. Bodnar, Ashley I. Naimi (2021). AIPW: An R Package for Augmented Inverse Probability Weighted Estimation of Average Causal Effects. American Journal of Epidemiology. doi: 10.1093/aje/kwab207". Visit: <https://yqzhong7.github.io/AIPW/> for more information.

Maintained by Yongqi Zhong. Last updated 17 days ago.

causal-inference machine-learning robust-estimators

24 stars 7.65 score 31 scripts 1 dependent
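For readers unfamiliar with the estimator behind 'AIPW', the sketch below computes the standard augmented inverse probability weighted estimate of the average treatment effect from already-fitted nuisance predictions. It is an illustration of the general formula, not the package's R interface: the package fits the outcome and propensity models itself with stacked learners, whereas here `mu1`, `mu0` (predicted outcomes under treatment and control) and `pi` (propensity scores) are assumed inputs, and `aipw_ate` is a hypothetical name.

```python
import numpy as np


def aipw_ate(y, a, mu1, mu0, pi) -> float:
    """AIPW estimate of E[Y(1)] - E[Y(0)] from fitted nuisance predictions.

    y: observed outcomes; a: binary treatment indicators (0/1);
    mu1, mu0: predicted outcomes under treatment / control; pi: P(A=1 | X).
    """
    y, a, mu1, mu0, pi = (np.asarray(v, dtype=float) for v in (y, a, mu1, mu0, pi))
    if np.any((pi <= 0) | (pi >= 1)):
        raise ValueError("propensity scores must lie strictly between 0 and 1")
    # Outcome-model prediction plus an inverse-probability-weighted residual correction.
    psi1 = mu1 + a * (y - mu1) / pi
    psi0 = mu0 + (1 - a) * (y - mu0) / (1 - pi)
    return float(np.mean(psi1 - psi0))
```

The residual terms are what make the estimator doubly robust: if the outcome models are correct the corrections average to zero, and if the propensity scores are correct they remove the outcome models' bias.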

bioc

ggsc:Visualizing Single Cell and Spatial Transcriptomics

Useful functions to visualize single cell and spatial data. It supports visualizing 'Seurat', 'SingleCellExperiment' and 'SpatialExperiment' objects through grammar of graphics syntax implemented in 'ggplot2'.

Maintained by Guangchuang Yu. Last updated 5 months ago.

dimensionreduction geneexpression singlecell software spatial transcriptomics visualization openblas cpp openmp

47 stars 7.59 score 18 scripts

bioc

MOSim:Multi-Omics Simulation (MOSim)

MOSim package simulates multi-omic experiments that mimic regulatory mechanisms within the cell, allowing flexible experimental design including time course and multiple groups.

Maintained by Sonia Tarazona. Last updated 3 days ago.

software timecourse experimentaldesign rnaseq cpp

9 stars 7.46 score 11 scripts

mwheymans

psfmi:Prediction Model Pooling, Selection and Performance Evaluation Across Multiply Imputed Datasets

Pooling, backward and forward selection of linear, logistic and Cox regression models in multiply imputed datasets. Backward and forward selection can be done from the pooled model using Rubin's Rules (RR), the D1, D2, D3, D4 and the median p-values method. This is also possible for Mixed models. The models can contain continuous, dichotomous, categorical and restricted cubic spline predictors and interaction terms between all these types of predictors. The stability of the models can be evaluated using (cluster) bootstrapping. The package further contains functions to pool model performance measures such as ROC/AUC, Reclassification, R-squared, scaled Brier score, H&L test and calibration plots for logistic regression models. Internal validation can be done across multiply imputed datasets with cross-validation or bootstrapping. The adjusted intercept after shrinkage of pooled regression coefficients can be obtained. Backward and forward selection as part of internal validation is possible. A function to externally validate logistic prediction models in multiply imputed datasets is available, as is a function to compare models. For Cox models a strata variable can be included. Eekhout (2017) <doi:10.1186/s12874-017-0404-7>. Wiel (2009) <doi:10.1093/biostatistics/kxp011>. Marshall (2009) <doi:10.1186/1471-2288-9-57>.

Maintained by Martijn Heymans. Last updated 2 years ago.

cox-regression imputation imputed-datasets logistic multiple-imputation pool predictor regression selection spline spline-predictors

10 stars 7.17 score 70 scripts

bioc

CuratedAtlasQueryR:Queries the Human Cell Atlas

Provides access to a copy of the Human Cell Atlas, but with harmonised metadata. This allows for uniform querying across numerous datasets within the Atlas using common fields such as cell type, tissue type, and patient ethnicity. Usage involves first querying the metadata table for cells of interest, and then downloading the corresponding cells into a SingleCellExperiment object.

Maintained by Stefano Mangiola. Last updated 5 months ago.

assaydomain infrastructure rnaseq differentialexpression geneexpression normalization clustering qualitycontrol sequencing transcription transcriptomics database duckdb hdf5 human-cell-atlas single-cell singlecellexperiment tidyverse

90 stars 7.04 score 41 scripts

leifeld

btergm:Temporal Exponential Random Graph Models by Bootstrapped Pseudolikelihood

Temporal Exponential Random Graph Models (TERGM) estimated by maximum pseudolikelihood with bootstrapped confidence intervals or Markov Chain Monte Carlo maximum likelihood. Goodness of fit assessment for ERGMs, TERGMs, and SAOMs. Micro-level interpretation of ERGMs and TERGMs. The methods are described in Leifeld, Cranmer and Desmarais (2018), JStatSoft <doi:10.18637/jss.v083.i06>.

Maintained by Philip Leifeld. Last updated 15 days ago.

complex-networks dynamic-analysis ergm estimation goodness-of-fit inference longitudinal-data network-analysis prediction tergm

18 stars 7.03 score 83 scripts 2 dependents

bioc

pipeComp:pipeComp pipeline benchmarking framework

A simple framework to facilitate the comparison of pipelines involving various steps and parameters. The `pipelineDefinition` class represents pipelines as, minimally, a set of functions consecutively executed on the output of the previous one, and optionally accompanied by step-wise evaluation and aggregation functions. Given such an object, a set of alternative parameters/methods, and benchmark datasets, the `runPipeline` function then proceeds through all combinations of arguments, avoiding recomputing the same step twice and compiling evaluations on the fly to avoid storing potentially large intermediate data.

Maintained by Pierre-Luc Germain. Last updated 5 months ago.

geneexpression transcriptomics clustering datarepresentation benchmark bioconductor pipeline-benchmarking pipelines single-cell-rna-seq

41 stars 7.02 score 43 scripts

benkeser

drtmle:Doubly-Robust Nonparametric Estimation and Inference

Targeted minimum loss-based estimators of counterfactual means and causal effects that are doubly-robust with respect both to consistency and asymptotic normality (Benkeser et al (2017), <doi:10.1093/biomet/asx053>; MJ van der Laan (2014), <doi:10.1515/ijb-2012-0038>).

Maintained by David Benkeser. Last updated 2 years ago.

causal-inference ensemble-learning iptw statistical-inference tmle

19 stars 6.89 score 90 scripts 1 dependent

bioc

COTAN:COexpression Tables ANalysis

Statistical and computational method to analyze the co-expression of gene pairs at single cell level. It provides the foundation for single-cell gene interactome analysis. The basic idea is studying the zero UMI counts' distribution instead of focusing on positive counts; this is done with a generalized contingency tables framework. COTAN can effectively assess the correlated or anti-correlated expression of gene pairs. It provides a numerical index related to the correlation and an approximate p-value for the associated independence test. COTAN can also evaluate whether single genes are differentially expressed, scoring them with a newly defined global differentiation index. Moreover, this approach provides ways to plot and cluster genes according to their co-expression pattern with other genes, effectively helping the study of gene interactions and becoming a new tool to identify cell-identity marker genes.

Maintained by Galfrè Silvia Giulia. Last updated 21 days ago.

systemsbiology transcriptomics geneexpression singlecell

16 stars 6.85 score 96 scripts

bdwilliamson

vimp:Perform Inference on Algorithm-Agnostic Variable Importance

Calculate point estimates of and valid confidence intervals for nonparametric, algorithm-agnostic variable importance measures in high and low dimensions, using flexible estimators of the underlying regression functions. For more information about the methods, please see Williamson et al. (Biometrics, 2020), Williamson et al. (JASA, 2021), and Williamson and Feng (ICML, 2020).

Maintained by Brian D. Williamson. Last updated 2 months ago.

machine-learning nonparametric-statistics statistical-inference variable-importance

23 stars 6.79 score 67 scripts

michaellli

evalITR:Evaluating Individualized Treatment Rules

Provides various statistical methods for evaluating Individualized Treatment Rules under randomized data. The provided metrics include Population Average Value (PAV), Population Average Prescription Effect (PAPE), Area Under Prescription Effect Curve (AUPEC). It also provides the tools to analyze Individualized Treatment Rules under budget constraints. Detailed reference in Imai and Li (2019) <arXiv:1905.05389>.

Maintained by Michael Lingzhi Li. Last updated 2 years ago.

14 stars 6.78 score 36 scripts

cefet-rj-dal

daltoolbox:Leveraging Experiment Lines to Data Analytics

The natural increase in the complexity of current research experiments and data demands better tools to enhance productivity in Data Analytics. The package is a framework designed to address the modern challenges in data analytics workflows. The package is inspired by Experiment Line concepts. It aims to provide seamless support for users in developing their data mining workflows by offering a uniform data model and method API. It enables the integration of various data mining activities, including data preprocessing, classification, regression, clustering, and time series prediction. It also offers options for hyper-parameter tuning and supports integration with existing libraries and languages. Overall, the package provides researchers with a comprehensive set of functionalities for data science, promoting ease of use, extensibility, and integration with various tools and libraries. Information on Experiment Line is based on Ogasawara et al. (2009) <doi:10.1007/978-3-642-02279-1_20>.

Maintained by Eduardo Ogasawara. Last updated 2 months ago.

1 star 6.65 score 536 scripts 4 dependents

bioc

scAnnotatR:Pretrained learning models for cell type prediction on single cell RNA-sequencing data

The package comprises a set of pretrained machine learning models to predict basic immune cell types. This enables all users to quickly get a first annotation of the cell types present in their dataset without requiring prior knowledge. scAnnotatR also allows users to train their own models to predict new cell types based on specific research needs.

Maintained by Johannes Griss. Last updated 5 months ago.

singlecell transcriptomics geneexpression supportvectormachine classification software

15 stars 6.61 score 20 scripts

carmonalab

GeneNMF:Non-Negative Matrix Factorization for Single-Cell Omics

A collection of methods to extract gene programs from single-cell gene expression data using non-negative matrix factorization (NMF). 'GeneNMF' contains functions to directly interact with the 'Seurat' toolkit and derive interpretable gene program signatures.

Maintained by Massimo Andreatta. Last updated 16 days ago.

105 stars 6.58 score 12 scripts

bioc

SpotClean:SpotClean adjusts for spot swapping in spatial transcriptomics data

SpotClean is a computational method to adjust for spot swapping in spatial transcriptomics data. Recent spatial transcriptomics experiments utilize slides containing thousands of spots with spot-specific barcodes that bind mRNA. Ideally, unique molecular identifiers at a spot measure spot-specific expression, but this is often not the case due to bleed from nearby spots, an artifact we refer to as spot swapping. SpotClean is able to estimate the contamination rate in observed data and decontaminate the spot swapping effect, thereby increasing the sensitivity and precision of downstream analyses.

Maintained by Zijian Ni. Last updated 5 months ago.

dataimport rnaseq sequencing geneexpression spatial singlecell transcriptomics preprocessing rna-seq spatial-transcriptomics

31 stars 6.52 score 36 scripts

bioc

CatsCradle:This package provides methods for analysing spatial transcriptomics data and for discovering gene clusters

This package addresses two broad areas. It allows for in-depth analysis of spatial transcriptomic data by identifying tissue neighbourhoods. These are contiguous regions of tissue surrounding individual cells. 'CatsCradle' allows for the categorisation of neighbourhoods by the cell types contained in them and the genes expressed in them. In particular, it produces Seurat objects whose individual elements are neighbourhoods rather than cells. In addition, it enables the categorisation and annotation of genes by producing Seurat objects whose elements are genes.

Maintained by Michael Shapiro. Last updated 17 days ago.

biologicalquestion statisticalmethod geneexpression singlecell transcriptomics spatial

3 stars 6.52 score

mathewchamberlain

SignacX:Cell Type Identification and Discovery from Single Cell Gene Expression Data

An implementation of neural networks trained with flow-sorted gene expression data to classify cellular phenotypes in single cell RNA-sequencing data. See Chamberlain M et al. (2021) <doi:10.1101/2021.02.01.429207> for more details.

Maintained by Mathew Chamberlain. Last updated 2 years ago.

cellular-phenotypes seurat single-cell-rna-seq

25 stars 6.47 score 34 scripts

giscience-fsu

sperrorest:Perform Spatial Error Estimation and Variable Importance Assessment

Implements spatial error estimation and permutation-based variable importance measures for predictive models using spatial cross-validation and spatial block bootstrap.

Maintained by Alexander Brenning. Last updated 2 years ago.

cross-validation machine-learning spatial-statistics spatio-temporal-modeling statistical-learning

19 stars 6.46 score 46 scripts

qile0317

APackOfTheClones:Visualization of Clonal Expansion for Single Cell Immune Profiles

Visualize clonal expansion via circle-packing. 'APackOfTheClones' extends 'scRepertoire' to produce a publication-ready visualization of clonal expansion at a single cell resolution, by representing expanded clones as differently sized circles. The method was originally implemented by Murray Christian and Ben Murrell in the following immunology study: Ma et al. (2021) <doi:10.1126/sciimmunol.abg6356>.

Maintained by Qile Yang. Last updated 5 months ago.

clonal-analysis immune-repertoire immune-system scrna-seq scrnaseq seurat single-cell single-cell-genomics cpp

15 stars 6.45 score 15 scripts

lhe17

nebula:Negative Binomial Mixed Models Using Large-Sample Approximation for Differential Expression Analysis of ScRNA-Seq Data

A fast negative binomial mixed model for conducting association analysis of multi-subject single-cell data. It can be used for identifying marker genes, differential expression and co-expression analyses. The model includes subject-level random effects to account for the hierarchical structure in multi-subject single-cell data. See He et al. (2021) <doi:10.1038/s42003-021-02146-6>.

Maintained by Liang He. Last updated 8 days ago.

cpp

37 stars 6.43 score 145 scripts

nsaph-software

CRE:Interpretable Discovery and Inference of Heterogeneous Treatment Effects

Provides a new method for interpretable heterogeneous treatment effects characterization in terms of decision rules via an extensive exploration of heterogeneity patterns by an ensemble-of-trees approach, enforcing high stability in the discovery. It relies on a two-stage pseudo-outcome regression, and it is supported by theoretical convergence guarantees. Bargagli-Stoffi, F. J., Cadei, R., Lee, K., & Dominici, F. (2023) Causal rule ensemble: Interpretable Discovery and Inference of Heterogeneous Treatment Effects. arXiv preprint <doi:10.48550/arXiv.2009.09036>.

Maintained by Falco Joannes Bargagli Stoffi. Last updated 5 months ago.

13 stars 6.41 score 11 scripts

rivolli

utiml:Utilities for Multi-Label Learning

Multi-label learning strategies and other procedures to support multi-label classification in R. The package provides a set of multi-label procedures such as sampling methods, transformation strategies, threshold functions, pre-processing techniques and evaluation metrics. A complete overview of the matter can be seen in Zhang, M. and Zhou, Z. (2014) <doi:10.1109/TKDE.2013.39> and Gibaja, E. and Ventura, S. (2015) A Tutorial on Multi-label Learning.

Maintained by Adriano Rivolli. Last updated 4 years ago.

28 stars 6.39 score 87 scripts

bioc

RNAmodR:Detection of post-transcriptional modifications in high throughput sequencing data

RNAmodR provides classes and workflows for loading/aggregating data from high throughput sequencing aimed at detecting post-transcriptional modifications through analysis of specific patterns. In addition, utilities are provided to validate and visualize the results. The RNAmodR package provides a core functionality from which specific analysis strategies can be easily implemented as a separate package.

Maintained by Felix G.M. Ernst. Last updated 5 months ago.

software infrastructure workflowstep visualization sequencing alkanilineseq bioconductor modifications ribomethseq rna rnamodr

3 stars 6.39 score 9 scripts 3 dependents

nt-williams

lmtp:Non-Parametric Causal Effects of Feasible Interventions Based on Modified Treatment Policies

Non-parametric estimators for causal effects based on longitudinal modified treatment policies as described in Diaz, Williams, Hoffman, and Schenck <doi:10.1080/01621459.2021.1955691>, traditional point treatment, and traditional longitudinal effects. Continuous, binary, categorical, and multivariate treatments are allowed, as are censored outcomes. The treatment mechanism is estimated via a density ratio classification procedure irrespective of treatment variable type. For both continuous and binary outcomes, additive treatment effects can be calculated and relative risks and odds ratios may be calculated for binary outcomes. Supports survival outcomes with competing risks (Diaz, Hoffman, and Hejazi; <doi:10.1007/s10985-023-09606-7>).

Maintained by Nicholas Williams. Last updated 26 days ago.

causal-inference censored-data longitudinal-data machine-learning modified-treatment-policy nonparametric-statistics precision-medicine robust-statistics statistics stochastic-interventions survival-analysis targeted-learning

64 stars 6.37 score 91 scripts

bioc

scDataviz:scDataviz: single cell dataviz and downstream analyses

In the single-cell world, which includes flow cytometry, mass cytometry, single-cell RNA-seq (scRNA-seq), and others, there is a need to improve data visualisation and to bring analysis capabilities to researchers even from non-technical backgrounds. scDataviz attempts to fit into this space, while also catering for advanced users. Additionally, because scDataviz is built on SingleCellExperiment, it has a 'plug and play' feel and is immediately flexible and compatible with studies that go beyond scDataviz. Finally, the graphics in scDataviz are generated via the ggplot engine, which means that users can 'add on' features to these with ease.

Maintained by Kevin Blighe. Last updated 5 months ago.

singlecell immunooncology rnaseq geneexpression transcription flowcytometry massspectrometry dataimport

63 stars 6.30 score 16 scripts

bioc

tidyomics:Easily install and load the tidyomics ecosystem

The tidyomics ecosystem is a set of packages for ’omic data analysis that work together in harmony; they share common data representations and API design, consistent with the tidyverse ecosystem. The tidyomics package is designed to make it easy to install and load core packages from the tidyomics ecosystem with a single command.

Maintained by Stefano Mangiola. Last updated 5 months ago.

assaydomain infrastructure rnaseq differentialexpression geneexpression normalization clustering qualitycontrol sequencing transcription transcriptomics cytometry genomics tidyverse

67 stars 6.13 score 5 scripts

jernejjevsenak

dendroTools:Linear and Nonlinear Methods for Analyzing Daily and Monthly Dendroclimatological Data

Provides novel dendroclimatological methods, primarily used by the tree-ring research community. There are four core functions. The first is daily_response(), which finds the optimal sequence of days that are related to one or more tree-ring proxy records. A similar function, daily_response_seascorr(), implements partial correlations in the analysis of daily response functions. For monthly data, there is the monthly_response() function. The last core function is compare_methods(), which compares several linear and nonlinear regression algorithms on the task of climate reconstruction.

Maintained by Jernej Jevsenak. Last updated 1 month ago.

5 stars 6.13 score 81 scripts

feiyoung

DR.SC:Joint Dimension Reduction and Spatial Clustering

Joint dimension reduction and spatial clustering is conducted for single-cell RNA sequencing and spatial transcriptomics data; more details can be found in Wei Liu, Xu Liao, Yi Yang, Huazhen Lin, Joe Yeong, Xiang Zhou, Xingjie Shi and Jin Liu. (2022) <doi:10.1093/nar/gkac219>. It is not only computationally efficient and scalable to increasing sample sizes, but also capable of choosing the smoothness parameter and the number of clusters.

Maintained by Wei Liu. Last updated 1 year ago.

dimension-reduction selfsupervised spatial-clustering spatial-transcriptomics openblas cpp

5 stars 6.12 score 29 scripts 2 dependents

swfsc

eSDM:Ensemble Tool for Predictions from Species Distribution Models

A tool which allows users to create and evaluate ensembles of species distribution model (SDM) predictions. Functionality is offered through R functions or a GUI (R Shiny app). This tool can assist users in identifying spatial uncertainties and making informed conservation and management decisions. The package is further described in Woodman et al (2019) <doi:10.1111/2041-210X.13283>.

Maintained by Sam Woodman. Last updated 6 months ago.

11 stars 6.07 score 24 scripts

bioc

Dino:Normalization of Single-Cell mRNA Sequencing Data

Dino normalizes single-cell mRNA sequencing data to correct for technical variation, particularly sequencing depth, prior to downstream analysis. The approach produces a matrix of corrected expression for which the dependency between sequencing depth and the full distribution of normalized expression is removed; many existing methods aim to remove only the dependency between sequencing depth and the mean of the normalized expression. This is particularly useful in the context of highly sparse datasets, such as those produced by 10x Genomics and other unique molecular identifier (UMI) based microfluidics protocols, for which the depth-dependent proportion of zeros in the raw expression data can otherwise present a challenge.

Maintained by Jared Brown. Last updated 5 months ago.

software normalization rnaseq singlecell sequencing geneexpression transcriptomics regression cellbasedassays

9 stars 6.02 score 13 scripts

bioc

FEAST:FEAture SelecTion (FEAST) for Single-cell clustering

Cell clustering is one of the most important and commonly performed tasks in single-cell RNA sequencing (scRNA-seq) data analysis. An important step in cell clustering is to select a subset of genes (referred to as “features”), whose expression patterns will then be used for downstream clustering. A good set of features should include the ones that distinguish different cell types, and the quality of such a set can have a significant impact on clustering accuracy. FEAST is an R library for selecting the most representative features before performing the core scRNA-seq clustering step. It can be used as a plug-in for established clustering algorithms such as SC3, TSCAN, SHARP, SIMLR, and Seurat. The core of the FEAST algorithm includes three steps: 1. consensus clustering; 2. gene-level significance inference; 3. validation of an optimized feature set.

Maintained by Kenong Su. Last updated 5 months ago.

sequencing singlecell clustering featureextraction

10 stars 5.97 score 47 scripts

bioc

PathoStat:PathoStat Statistical Microbiome Analysis Package

The purpose of this package is to perform Statistical Microbiome Analysis on metagenomics results from sequencing data samples. In particular, it supports analyses on the PathoScope generated report files. PathoStat provides various functionalities including Relative Abundance charts, Diversity estimates and plots, tests of Differential Abundance, Time Series visualization, and Core OTU analysis.

Maintained by Solaiappan Manimaran. Last updated 5 months ago.

microbiome metagenomics graphandnetwork microarray patternlogic principalcomponent sequencing software visualization rnaseq immunooncology

8 stars 5.90 score 8 scripts

feiyoung

ProFAST:Probabilistic Factor Analysis for Spatially-Aware Dimension Reduction

Probabilistic factor analysis for spatially-aware dimension reduction across multi-section spatial transcriptomics data with millions of spatial locations. More details can be found in Wei Liu, et al. (2023) <doi:10.1101/2023.07.11.548486>.

Maintained by Wei Liu. Last updated 2 months ago.

openblas cpp

2 stars 5.86 score 12 scripts 1 dependent

nsaph-software

GPCERF:Gaussian Processes for Estimating Causal Exposure Response Curves

Provides a non-parametric Bayesian framework based on Gaussian process priors for estimating causal effects of a continuous exposure and detecting change points in the causal exposure response curves using observational data. Ren, B., Wu, X., Braun, D., Pillai, N., & Dominici, F. (2021). "Bayesian modeling for exposure response curve via gaussian processes: Causal effects of exposure to air pollution on health outcomes." arXiv preprint <doi:10.48550/arXiv.2105.03454>.

Maintained by Boyu Ren. Last updated 11 months ago.

cpp

9 stars 5.86 score 16 scripts

bioc

omicsViewer:Interactive and explorative visualization of SummarizedExperiment or ExpressionSet using omicsViewer

omicsViewer visualizes ExpressionSet (or SummarizedExperiment) objects interactively. omicsViewer has separate back- and front-ends. In the back-end, users prepare an ExpressionSet that contains all the information needed for downstream data interpretation. A few extra requirements on the headers of the phenotype or feature data are imposed so that the front-end can clearly recognize the provided information, while keeping modifications to the existing ExpressionSet object to a minimum. Relying purely on R/Bioconductor guarantees maximum flexibility in the statistical analysis in the back-end. Once the ExpressionSet is prepared, it can be visualized using the front-end, implemented with shiny and plotly. Both features and samples can be selected from (data) tables or graphs (scatter plot/heatmap). Different types of analyses, such as enrichment analysis (using the Bioconductor package fgsea or Fisher's exact test) and STRING network analysis, are performed on the fly and the results are visualized simultaneously. When a subset of samples and a phenotype variable are selected, a significance test on means (t-test or rank-based test, when the phenotype variable is quantitative) or a test of independence (chi-square or Fisher's exact test, when the phenotype variable is categorical) is performed to test the association between the phenotype of interest and the selected samples. Additionally, other analyses can easily be added as extra shiny modules. omicsViewer therefore greatly facilitates data exploration: many different hypotheses can be explored in a short time without knowledge of R. In addition, the resulting data can easily be shared using a shiny server. Alternatively, a standalone version of omicsViewer together with designated omics data can be created by bundling it with portable R, which can be shared with collaborators or submitted as supplementary data together with a manuscript.

Maintained by Chen Meng. Last updated 2 months ago.

software visualization genesetenrichment differentialexpression motifdiscovery network networkenrichment

4 stars 5.82 score 22 scripts

bioc

scBubbletree:Quantitative visual exploration of scRNA-seq data

scBubbletree is a quantitative method for the visual exploration of scRNA-seq data, preserving key biological properties such as local and global cell distances and cell density distributions across samples. It effectively resolves overplotting and enables the visualization of diverse cell attributes from multiomic single-cell experiments. Additionally, scBubbletree is user-friendly and integrates seamlessly with popular scRNA-seq analysis tools, facilitating comprehensive and intuitive data interpretation.

Maintained by Simo Kitanovski. Last updated 5 months ago.

visualization clustering singlecell transcriptomics rnaseq big-data bigdata scrna-seq scrna-seq-analysis visual visual-exploration

6 stars 5.82 score 8 scripts

andreasnordland

polle:Policy Learning

Package for learning and evaluating (subgroup) policies via doubly robust loss functions. Policy learning methods include doubly robust blip/conditional average treatment effect learning and sequential policy tree learning. Methods for (subgroup) policy evaluation include doubly robust cross-fitting and online estimation/sequential validation. See Nordland and Holst (2022) <doi:10.48550/arXiv.2212.02335> for documentation and references.

Maintained by Andreas Nordland. Last updated 6 days ago.

4 stars 5.80 score 6 scripts

bioc

benchdamic:Benchmark of differential abundance methods on microbiome data

Starting from a microbiome dataset (16S or WMS with absolute count values), it is possible to perform several analyses to assess the performance of many differential abundance detection methods. A basic and standardized version of the main differential abundance analysis methods is supplied, but users can also add their own methods to the benchmark. The analyses focus on 4 main aspects: i) the goodness of fit of each method's distributional assumptions on the observed count data, ii) the ability to control the false discovery rate, iii) the within- and between-method concordances, and iv) the truthfulness of the findings if any a priori knowledge is given. Several graphical functions are available for result visualization.

Maintained by Matteo Calgaro. Last updated 4 months ago.

metagenomics microbiome differentialexpression multiplecomparison normalization preprocessing software benchmark differential-abundance-methods

8 stars 5.78 score 8 scripts

bioc

scRNAseqApp:A single-cell RNAseq Shiny app-package

The scRNAseqApp is a Shiny app package designed for interactive visualization of single-cell data. It is an enhanced version derived from ShinyCell, repackaged to accommodate multiple datasets. The app enables users to visualize data containing various types of information simultaneously, facilitating comprehensive analysis. Additionally, it includes a user management system to regulate database accessibility for different users.

Maintained by Jianhong Ou. Last updated 20 days ago.

visualization singlecell rnaseq interactive-visualizations multiple-users shiny-apps single-cell-rna-seq

4 stars 5.76 score 3 scripts

shanascogin

BayesPostEst:Generate Postestimation Quantities for Bayesian MCMC Estimation

An implementation of functions to generate and plot postestimation quantities after estimating Bayesian regression models using Markov chain Monte Carlo (MCMC). Functionality includes the estimation of the Precision-Recall curves (see Beger, 2016 <doi:10.2139/ssrn.2765419>), the implementation of the observed values method of calculating predicted probabilities by Hanmer and Kalkan (2013) <doi:10.1111/j.1540-5907.2012.00602.x>, the implementation of the average value method of calculating predicted probabilities (see King, Tomz, and Wittenberg, 2000 <doi:10.2307/2669316>), and the generation and plotting of first differences to summarize typical effects across covariates (see Long 1997, ISBN:9780803973749; King, Tomz, and Wittenberg, 2000 <doi:10.2307/2669316>). This package can be used with MCMC output generated by any Bayesian estimation tool including 'JAGS', 'BUGS', 'MCMCpack', and 'Stan'.

Maintained by Shana Scogin. Last updated 3 years ago.

jags cpp

12 stars 5.71 score 17 scripts
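The observed values method and first differences described for BayesPostEst can be illustrated with a minimal sketch. It is written in Python rather than R and is not the package's API: it assumes a logit model, posterior coefficient draws stored as an S x K array (one row per MCMC draw), and an N x K design matrix; the function names observed_value_probs and first_difference are hypothetical.

```python
import numpy as np


def observed_value_probs(draws, X, col, value):
    """Average predicted probability per posterior draw, holding column `col`
    at `value` and every other covariate at its observed value (logit link)."""
    X_cf = np.array(X, dtype=float)                 # copy: caller's data untouched
    X_cf[:, col] = value
    eta = X_cf @ np.asarray(draws, dtype=float).T   # N x S linear predictors
    p = 1.0 / (1.0 + np.exp(-eta))                  # inverse logit
    return p.mean(axis=0)                           # one average per draw (length S)


def first_difference(draws, X, col, low, high, level=0.95):
    """Posterior median and central credible interval of the change in the
    average predicted probability when `col` moves from `low` to `high`."""
    diff = (observed_value_probs(draws, X, col, high)
            - observed_value_probs(draws, X, col, low))
    alpha = (1 - level) / 2
    lo, med, hi = np.quantile(diff, [alpha, 0.5, 1 - alpha])
    return med, (lo, hi)
```

Averaging over the observed rows, rather than plugging in a single "typical" covariate profile, is what distinguishes the observed values approach from the average value approach; computing the average once per draw keeps the full posterior uncertainty in the summary.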

core-bioinformatics

ClustAssess:Tools for Assessing Clustering

A set of tools for evaluating clustering robustness using proportion of ambiguously clustered pairs (Senbabaoglu et al. (2014) <doi:10.1038/srep06207>), as well as similarity across methods and method stability using element-centric clustering comparison (Gates et al. (2019) <doi:10.1038/s41598-019-44892-y>). Additionally, this package enables stability-based parameter assessment for graph-based clustering pipelines typical in single-cell data analysis.

Maintained by Andi Munteanu. Last updated 2 months ago.

software singlecell rnaseq atacseq normalization preprocessing dimensionreduction visualization qualitycontrol clustering classification annotation geneexpression differentialexpression bioinformatics genomics machine-learning parameter-optimization robustness single-cell unsupervised-learning cpp

23 stars 5.70 score 18 scripts

bioc

scFeatures:scFeatures: Multi-view representations of single-cell and spatial data for disease outcome prediction

scFeatures generates multi-view representations of single-cell and spatial data by constructing a total of 17 feature types. These features can then be used for a variety of analyses using other software in Bioconductor.

Maintained by Yue Cao. Last updated 5 months ago.

cellbasedassays singlecell spatial software transcriptomics

11 stars 5.69 score 15 scripts

xiaozhangryy

CAESAR.Suite:CAESAR: a Cross-Technology and Cross-Resolution Framework for Spatial Omics Annotation

Biotechnology in spatial omics has advanced rapidly over the past few years, enhancing both throughput and resolution. However, existing annotation pipelines in spatial omics predominantly rely on clustering methods, lacking the flexibility to integrate extensive annotated information from single-cell RNA sequencing (scRNA-seq) due to discrepancies in spatial resolutions, species, or modalities. Here we introduce the CAESAR suite, an open-source software package that provides image-based spatial co-embedding of locations and genomic features. It uniquely transfers labels from scRNA-seq reference, enabling the annotation of spatial omics datasets across different technologies, resolutions, species, and modalities, based on the conserved relationship between signature genes and cells/locations at an appropriate level of granularity. Notably, CAESAR enriches location-level pathways, allowing for the detection of gradual biological pathway activation within spatially defined domain types. More details on the methods are given in our paper, which is currently under submission; a full reference will be provided in future versions once the paper is published.

Maintained by Xiao Zhang. Last updated 9 days ago.

openblas cpp

1 star 5.67 score 2 scripts

thuizhou

PSweight:Propensity Score Weighting for Causal Inference with Observational Studies and Randomized Trials

Supports propensity score weighting analysis of observational studies and randomized trials. Enables the estimation and inference of average causal effects with binary and multiple treatments using overlap weights (ATO), inverse probability of treatment weights (ATE), average treatment effect among the treated weights (ATT), matching weights (ATM) and entropy weights (ATEN), with and without propensity score trimming. These weights are members of the family of balancing weights introduced in Li, Morgan and Zaslavsky (2018) <doi:10.1080/01621459.2016.1260466> and Li and Li (2019) <doi:10.1214/19-AOAS1282>.

Maintained by Yukang Zeng. Last updated 1 year ago.

23 stars 5.54 score 47 scripts 2 dependents
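The balancing-weights family that PSweight builds on can be sketched from its tilting functions h(e), where e is the estimated propensity score: treated units receive h(e)/e and controls receive h(e)/(1 - e). The sketch below is a Python illustration, not PSweight's R interface; it assumes a binary treatment and propensity scores that are already estimated, omits trimming, and the names TILTING, balancing_weights and weighted_effect are hypothetical.

```python
import numpy as np

# Tilting functions h(e) for the balancing-weights family; e = P(Z = 1 | X).
TILTING = {
    "ATE": lambda e: np.ones_like(e),                              # inverse probability weights
    "ATT": lambda e: e,                                            # treated-population weights
    "ATO": lambda e: e * (1 - e),                                  # overlap weights
    "ATM": lambda e: np.minimum(e, 1 - e),                         # matching weights
    "ATEN": lambda e: -(e * np.log(e) + (1 - e) * np.log(1 - e)),  # entropy weights
}


def balancing_weights(e, z, estimand="ATO"):
    """Per-unit weights: h(e)/e for treated units, h(e)/(1 - e) for controls."""
    e = np.asarray(e, dtype=float)
    z = np.asarray(z)
    if np.any((e <= 0) | (e >= 1)):
        raise ValueError("propensity scores must lie strictly between 0 and 1")
    h = TILTING[estimand](e)
    return np.where(z == 1, h / e, h / (1 - e))


def weighted_effect(y, z, w):
    """Difference of normalized weighted outcome means, treated minus control."""
    y, z, w = (np.asarray(a, dtype=float) for a in (y, z, w))
    treated = z == 1
    mu1 = np.sum(w[treated] * y[treated]) / np.sum(w[treated])
    mu0 = np.sum(w[~treated] * y[~treated]) / np.sum(w[~treated])
    return mu1 - mu0
```

For example, with e = (0.2, 0.5, 0.8) and z = (1, 0, 1), overlap weighting gives treated units weight 1 - e and controls weight e, i.e. (0.8, 0.5, 0.2), so units with propensity scores near 0 or 1 are down-weighted rather than inflated as they are under inverse probability weights.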

bioc

scDotPlot:Cluster a Single-cell RNA-seq Dot Plot

Dot plots of single-cell RNA-seq data allow for an examination of the relationships between cell groupings (e.g. clusters) and marker gene expression. The scDotPlot package offers a unified approach to perform a hierarchical clustering analysis and add annotations to the columns and/or rows of a scRNA-seq dot plot. It works with SingleCellExperiment and Seurat objects as well as data frames.

Maintained by Benjamin I Laufer. Last updated 13 days ago.

software visualization differentialexpression geneexpression transcription rnaseq singlecell sequencing clustering

7 stars 5.45 score 2 scripts

bioc

speckle:Statistical methods for analysing single cell RNA-seq data

The speckle package contains functions for the analysis of single cell RNA-seq data. The speckle package currently contains functions to analyse differences in cell type proportions. There are also functions to estimate the parameters of the Beta distribution based on a given counts matrix, and a function to normalise a counts matrix to the median library size. There are plotting functions to visualise cell type proportions and the mean-variance relationship in cell type proportions and counts. As our research into specialised analyses of single cell data continues we anticipate that the package will be updated with new functions.

Maintained by Belinda Phipson. Last updated 5 months ago.

singlecell rnaseq regression geneexpression

5.41 score 258 scripts
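Normalising a counts matrix to the median library size, as described for speckle, amounts to rescaling every cell so that its total count equals the median total across cells. A minimal Python sketch (not speckle's R function), assuming a genes x cells matrix as is conventional in Bioconductor; the function name is hypothetical:

```python
import numpy as np


def normalize_to_median_library_size(counts):
    """Rescale a genes x cells count matrix so every cell's total count
    equals the median library size across cells."""
    counts = np.asarray(counts, dtype=float)
    lib_sizes = counts.sum(axis=0)            # total counts per cell (column)
    if np.any(lib_sizes <= 0):
        raise ValueError("every cell needs a positive library size")
    target = np.median(lib_sizes)
    return counts / lib_sizes * target        # broadcasts the division over columns
```

Using the median rather than a fixed scale such as one million keeps normalized values on roughly the same magnitude as the raw counts of a typical cell.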

choonghyunryu

alookr:Model Classifier for Binary Classification

A collection of tools that support data splitting, predictive modeling, and model evaluation. A typical use is to split a dataset into a training dataset and a test dataset and then compare the data distributions of the two. Another feature is support for developing predictive models and comparing the performance of several of them, helping to select the best model.

Maintained by Choonghyun Ryu. Last updated 1 year ago.

12 stars 5.38 score 9 scripts

simonmoulds

lulcc:Land Use Change Modelling in R

Classes and methods for spatially explicit land use change modelling in R.

Maintained by Simon Moulds. Last updated 5 years ago.

41 stars 5.37 score 38 scripts

ddimmery

tidyhte:Tidy Estimation of Heterogeneous Treatment Effects

Estimates heterogeneous treatment effects using tidy semantics on experimental or observational data. Methods are based on the doubly-robust learner of Kennedy (n.d.) <arXiv:2004.14497>. You provide a simple recipe for what machine learning algorithms to use in estimating the nuisance functions and 'tidyhte' will take care of cross-validation, estimation, model selection, diagnostics and construction of relevant quantities of interest about the variability of treatment effects.

Maintained by Drew Dimmery. Last updated 2 years ago.

14 stars 5.36 score 11 scripts

avi-kenny

vaccine:Statistical Tools for Immune Correlates Analysis of Vaccine Clinical Trial Data

Various semiparametric and nonparametric statistical tools for immune correlates analysis of vaccine clinical trial data. This includes calculation of summary statistics and estimation of risk, vaccine efficacy, controlled effects (controlled risk and controlled vaccine efficacy), and mediation effects (natural direct effect, natural indirect effect, proportion mediated). See Gilbert P, Fong Y, Kenny A, and Carone, M (2022) <doi:10.1093/biostatistics/kxac024> and Fay MP and Follmann DA (2023) <doi:10.48550/arXiv.2208.06465>.

Maintained by Avi Kenny. Last updated 1 month ago.

4 stars 5.34 score 11 scripts

tlverse

tmle3shift:Targeted Learning of the Causal Effects of Stochastic Interventions

Targeted maximum likelihood estimation (TMLE) of population-level causal effects under stochastic treatment regimes and related nonparametric variable importance analyses. Tools are provided for TML estimation of the counterfactual mean under a stochastic intervention characterized as a modified treatment policy, such as treatment policies that shift the natural value of the exposure. The causal parameter and estimation were described in Díaz and van der Laan (2013) <doi:10.1111/j.1541-0420.2011.01685.x> and an improved estimation approach was given by Díaz and van der Laan (2018) <doi:10.1007/978-3-319-65304-4_14>.

Maintained by Nima Hejazi. Last updated 6 months ago.

causal-inference machine-learning marginal-structural-models stochastic-interventions targeted-learning treatment-effects variable-importance

17 stars 5.33 score 42 scripts 1 dependent

zhiyuan-hu-lab

CIDER:Meta-Clustering for scRNA-Seq Integration and Evaluation

A workflow of (a) meta-clustering based on inter-group similarity measures and (b) a ground-truth-free test metric to assess the biological correctness of integration in real datasets. See Hu Z, Ahmed A, Yau C (2021) <doi:10.1101/2021.03.29.437525> for more details.

Maintained by Zhiyuan Hu. Last updated 2 months ago.

5.30 score

bioc

scCB2:CB2 improves power of cell detection in droplet-based single-cell RNA sequencing data

scCB2 is an R package implementing CB2 for distinguishing real cells from empty droplets in droplet-based single-cell RNA-seq experiments (especially for 10x Chromium). It is based on clustering similar barcodes and calculating a Monte Carlo p-value for each cluster to test against the background distribution. This cluster-level test outperforms single-barcode-level tests in dealing with low-count barcodes and homogeneous sequencing libraries, while keeping the FDR well controlled.

Maintained by Zijian Ni. Last updated 5 months ago.

dataimport rnaseq singlecell sequencing geneexpression transcriptomics preprocessing clustering

10 stars 5.30 score 5 scripts

bioc

biotmle:Targeted Learning with Moderated Statistics for Biomarker Discovery

Tools for differential expression biomarker discovery based on microarray and next-generation sequencing data that leverage efficient semiparametric estimators of the average treatment effect for variable importance analysis. Estimation and inference of the (marginal) average treatment effects of potential biomarkers are computed by targeted minimum loss-based estimation, with joint, stable inference constructed across all biomarkers using a generalization of moderated statistics for use with the estimated efficient influence function. The procedure accommodates the use of ensemble machine learning for the estimation of nuisance functions.

Maintained by Nima Hejazi. Last updated 5 months ago.

regression geneexpression differentialexpression sequencing microarray rnaseq immunooncology bioconductor bioconductor-package bioconductor-packages bioinformatics biomarker-discovery biostatistics causal-inference computational-biology machine-learning statistics targeted-learning

5 stars 5.30 score 5 scripts

jiang-junyao

CACIMAR:cross-species analysis of cell identities, markers and regulations

A toolkit to perform cross-species analysis based on scRNA-seq data. CACIMAR contains 5 main features: (1) identifying markers in each cluster; (2) cell type annotation; (3) identifying conserved markers; (4) identifying conserved cell types; and (5) identifying conserved modules of regulatory networks.

Maintained by Junyao Jiang. Last updated 16 hours ago.

cross-species-analysis scrna-seq

12 stars 5.23 score 6 scripts

adefazio

classifierplots:Generates a Visualization of Classifier Performance as a Grid of Diagnostic Plots

Generates a visualization of binary classifier performance as a grid of diagnostic plots with just one function call. Includes ROC curves, prediction density, accuracy, precision, recall and calibration plots, all using ggplot2 for easy modification. Debug your binary classifiers faster and easier!

Maintained by Aaron Defazio. Last updated 4 years ago.

50 stars 5.08 score 16 scripts

bioc

CDI:Clustering Deviation Index (CDI)

Single-cell RNA-sequencing (scRNA-seq) is widely used to explore cellular variation. The analysis of scRNA-seq data often starts with clustering cells into subpopulations. This initial step has a high impact on downstream analyses, so its accuracy is important. However, no unsupervised metric has been designed for scRNA-seq to evaluate clustering performance. Hence, we propose the clustering deviation index (CDI), an unsupervised metric based on the modeling of scRNA-seq UMI counts to evaluate the clustering of cells.

Maintained by Jiyuan Fang. Last updated 5 months ago.

singlecell software clustering visualization sequencing rnaseq cellbasedassays

5 stars 5.00 score 4 scripts

bioc

decontX:Decontamination of single cell genomics data

This package contains implementations of DecontX (Yang et al. 2020), a decontamination algorithm for single-cell RNA-seq, and DecontPro (Yin et al. 2023), a decontamination algorithm for single-cell protein expression data. DecontX is a novel Bayesian method to computationally estimate and remove RNA contamination in individual cells without empty droplet information. DecontPro is a Bayesian method that estimates the level of contamination from ambient and background sources in CITE-seq ADT data and decontaminates the dataset.

Maintained by Joshua Campbell. Last updated 2 months ago.

singlecell bayesian cpp

4.94 score 29 scripts

bioc

Melissa:Bayesian clustering and imputation of single cell methylomes

Melissa is a Bayesian probabilistic model for jointly clustering and imputing single-cell methylomes. This is done by taking into account local correlations via a Generalised Linear Model approach and global similarities using a mixture modelling approach.

Maintained by C. A. Kapourani. Last updated 5 months ago.

immunooncology dnamethylation geneexpression generegulation epigenetics genetics clustering featureextraction regression rnaseq bayesian kegg sequencing coverage singlecell

4.90 score 7 scripts

jgasmits

AnanseSeurat:Construct ANANSE GRN-Analysis Seurat

Enables gene regulatory network (GRN) analysis on single cell clusters, using the GRN analysis software 'ANANSE', Xu et al.(2021) <doi:10.1093/nar/gkab598>. Export data from 'Seurat' objects, for GRN analysis by 'ANANSE' implemented in 'snakemake'. Finally, incorporate results for visualization and interpretation.

Maintained by Jos Smits. Last updated 1 year ago.

grn-analysis seurat-objects single-cell single-cell-atac-seq single-cell-rna-seq

8 stars 4.90 score 4 scripts

bdwilliamson

flevr:Flexible, Ensemble-Based Variable Selection with Potentially Missing Data

Perform variable selection in settings with possibly missing data based on extrinsic (algorithm-specific) and intrinsic (population-level) variable importance. Uses a Super Learner ensemble to estimate the underlying prediction functions that give rise to estimates of variable importance. For more information about the methods, please see Williamson and Huang (2023+) <arXiv:2202.12989>.

Maintained by Brian D. Williamson. Last updated 1 year ago.

5 stars 4.88 score 2 scripts

bioc

CelliD:Unbiased Extraction of Single Cell gene signatures using Multiple Correspondence Analysis

CelliD is a clustering-free multivariate statistical method for the robust extraction of per-cell gene signatures from single-cell RNA-seq. CelliD allows unbiased cell identity recognition across different donors, tissues-of-origin, model organisms and single-cell omics protocols. The package can also be used to explore functional pathways enrichment in single cell data.

Maintained by Akira Cortal. Last updated 5 months ago.

rnaseq singlecell dimensionreduction clustering genesetenrichment geneexpression atacseq openblas cpp openmp

4.85 score 70 scripts

yuelyu21

SCIntRuler:Guiding the Integration of Multiple Single-Cell RNA-Seq Datasets

The accumulation of single-cell RNA-seq (scRNA-seq) studies highlights the potential benefits of integrating multiple datasets. By augmenting sample sizes and enhancing analytical robustness, integration can lead to more insightful biological conclusions. However, challenges arise due to the inherent diversity and batch discrepancies within and across studies. SCIntRuler, a novel R package, addresses these challenges by guiding the integration of multiple scRNA-seq datasets.

Maintained by Yue Lyu. Last updated 6 months ago.

sequencing geneticvariability singlecell cpp

2 stars 4.85 score 3 scripts

jucheng1992

ctmle:Collaborative Targeted Maximum Likelihood Estimation

Implements the general template for collaborative targeted maximum likelihood estimation. It also provides several commonly used C-TMLE instantiations, such as the vanilla/scalable variable-selection C-TMLE (Ju et al. (2017) <doi:10.1177/0962280217729845>) and the glmnet-C-TMLE algorithm (Ju et al. (2017) <arXiv:1706.10029>).

Maintained by Cheng Ju. Last updated 5 years ago.

causal-inference machine-learning statistics tmle

5 stars 4.83 score 27 scripts

papatheodorou-group

scGOclust:Measuring Cell Type Similarity with Gene Ontology in Single-Cell RNA-Seq

Traditional methods for analyzing single-cell RNA-seq datasets focus solely on gene expression, but this package introduces a novel approach that goes beyond this limitation. Using Gene Ontology terms as features, the package allows for functional profiling of cell populations and for comparison within and between datasets from the same or different species. Our approach enables the discovery of previously unrecognized functional similarities and differences between cell types and has demonstrated success in identifying cell types' functional correspondence even between evolutionarily distant species.

Maintained by Yuyao Song. Last updated 12 days ago.

9 stars 4.80 score 14 scripts

cefet-rj-dal

heimdall:Drift Adaptable Models

By analyzing streaming datasets, it is possible to observe significant changes in the data distribution or in models' accuracy during prediction (concept drift). The goal of 'heimdall' is to measure when concept drift occurs. The package makes several state-of-the-art methods available. It also tackles how to adapt models in a nonstationary context. Some concept drift methods are described in Tavares (2022) <doi:10.1007/s12530-021-09415-z>.

Maintained by Eduardo Ogasawara. Last updated 2 months ago.

2 stars 4.77 score 45 scripts

jillbo1000

EZtune:Tunes AdaBoost, Elastic Net, Support Vector Machines, and Gradient Boosting Machines

Contains two functions that are intended to make tuning supervised learning methods easy. The eztune function uses a genetic algorithm or a Hooke-Jeeves optimizer to find the best set of tuning parameters. The user can choose the optimizer, the learning method, and whether optimization is based on accuracy obtained through validation error, cross-validation, or resubstitution. The function eztune_cv computes a cross-validated error rate. Its purpose is to provide a cross-validated accuracy or MSE when resubstitution or validation data are used for optimization, because error measures from both approaches can be misleading.

Maintained by Jill Lundell. Last updated 3 years ago.

4.76 score 38 scripts 1 dependents

hknd23

DeepLearningCausal:Causal Inference with Super Learner and Deep Neural Networks

Functions to estimate Conditional Average Treatment Effects (CATE) and Population Average Treatment Effects on the Treated (PATT) from experimental or observational data using the Super Learner (SL) ensemble method and Deep neural networks. The package first provides functions to implement meta-learners such as the Single-learner (S-learner) and Two-learner (T-learner) described in Künzel et al. (2019) <doi:10.1073/pnas.1804597116> for estimating the CATE. The S- and T-learner are each estimated using the SL ensemble method and deep neural networks. It then provides functions to implement the Ottoboni and Poulos (2020) <doi:10.1515/jci-2018-0035> PATT-C estimator to obtain the PATT from experimental data with noncompliance by using the SL ensemble method and deep neural networks.

Maintained by Nguyen K. Huynh. Last updated 2 months ago.

causal-inference deep-neural-networks machine-learning

2 stars 4.73 score 5 scripts

evalclass

prcbench:Testing Workbench for Precision-Recall Curves

A testing workbench to evaluate tools that calculate precision-recall curves. Saito and Rehmsmeier (2015) <doi:10.1371/journal.pone.0118432>.

Maintained by Takaya Saito. Last updated 2 years ago.

cpp

5 stars 4.72 score 21 scripts

duolajiang

RCTrep:Validation of Estimates of Treatment Effects in Observational Data

Validates estimates of (conditional) average treatment effects obtained using observational data by a) making it easy to obtain and visualize estimates derived using a large variety of methods (G-computation, inverse propensity score weighting, etc.), and b) ensuring that estimates are easily compared to a gold standard (i.e., estimates derived from randomized controlled trials). 'RCTrep' offers a generic protocol for treatment effect validation based on four simple steps, namely, set-selection, estimation, diagnosis, and validation. 'RCTrep' provides a simple dashboard to review the obtained results. The validation approach is introduced by Shen, L., Geleijnse, G. and Kaptein, M. (2023) <doi:10.21203/rs.3.rs-2559287/v1>.

Maintained by Lingjie Shen. Last updated 2 years ago.

8 stars 4.68 score 12 scripts

bioc

Anaquin:Statistical analysis of sequins

The project is intended to support the use of sequins (synthetic sequencing spike-in controls) owned and made available by the Garvan Institute of Medical Research. The goal is to provide a standard open source library for quantitative analysis, modelling and visualization of spike-in controls.

Maintained by Ted Wong. Last updated 5 months ago.

immunooncology differentialexpression preprocessing rnaseq geneexpression software

4.65 score 45 scripts

bioc

stJoincount:stJoincount - Join count statistic for quantifying spatial correlation between clusters

stJoincount facilitates the application of join count analysis to spatial transcriptomic data generated from the 10x Genomics Visium platform. This tool first converts a labeled spatial tissue map into a raster object, in which each spatial feature is represented by a pixel coded by label assignment. This process includes automatic calculation of optimal raster resolution and extent for the sample. A neighbors list is then created from the rasterized sample, in which adjacent and diagonal neighbors for each pixel are identified. After adding binary spatial weights to the neighbors list, a multi-categorical join count analysis is performed to tabulate "joins" between all possible combinations of label pairs. The function returns the observed join counts, the expected count under conditions of spatial randomness, and the variance calculated under non-free sampling. The z-score is then calculated as the difference between observed and expected counts, divided by the square root of the variance.
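The join-tabulation step described above can be sketched for a small labelled raster. This is a minimal Python illustration, not stJoincount's R API: the function names are hypothetical, neighbours are assumed to be adjacent plus diagonal cells with binary weights, labels are assumed mutually comparable, and the expected counts and variance under non-free sampling are taken as inputs rather than derived here.

```python
from collections import Counter
from math import sqrt

# Offsets that visit each undirected neighbour pair exactly once:
# right, down, down-right (diagonal), down-left (anti-diagonal).
_QUEEN_OFFSETS = ((0, 1), (1, 0), (1, 1), (1, -1))


def count_joins(grid):
    """Tabulate joins between label pairs on a labelled raster (hypothetical helper).

    Adjacent and diagonal cells are neighbours with binary weights;
    cells labelled None (no tissue) are skipped. Returns a Counter
    keyed by sorted (label_a, label_b) tuples.
    """
    joins = Counter()
    rows = len(grid)
    for r in range(rows):
        for c in range(len(grid[r])):
            a = grid[r][c]
            if a is None:
                continue
            for dr, dc in _QUEEN_OFFSETS:
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < len(grid[rr]):
                    b = grid[rr][cc]
                    if b is not None:
                        joins[tuple(sorted((a, b)))] += 1
    return joins


def join_count_z(observed, expected, variance):
    """z-score: (observed - expected) divided by the square root of the variance."""
    return (observed - expected) / sqrt(variance)
```

For a 2x2 grid `[["A", "A"], ["A", "B"]]`, `count_joins` reports three A-A joins and three A-B joins out of six neighbour pairs.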

Maintained by Jiarong Song. Last updated 5 months ago.

transcriptomics clustering spatial biocviews software

4 stars 4.60 score 3 scripts

nhejazi

medoutcon:Efficient Natural and Interventional Causal Mediation Analysis

Efficient estimators of interventional (in)direct effects in the presence of mediator-outcome confounding affected by exposure. The effects estimated allow for the impact of the exposure on the outcome through a direct path to be disentangled from that through mediators, even in the presence of intermediate confounders that complicate such a relationship. Currently supported are non-parametric efficient one-step and targeted minimum loss estimators based on the formulation of Díaz, Hejazi, Rudolph, and van der Laan (2020) <doi:10.1093/biomet/asaa085>. Support for efficient estimation of the natural (in)direct effects is also provided, appropriate for settings in which intermediate confounders are absent. The package also supports estimation of these effects when the mediators are measured using outcome-dependent two-phase sampling designs (e.g., case-cohort).

Maintained by Nima Hejazi. Last updated 1 years ago.

causal-inference causal-machine-learning inverse-probability-weights machine-learning mediation-analysis stochastic-interventions targeted-learning treatment-effects

13 stars 4.46 score 22 scripts

mbannick

RobinCar:Robust Inference for Covariate Adjustment in Randomized Clinical Trials

Performs robust estimation and inference when using covariate adjustment and/or covariate-adaptive randomization in randomized clinical trials. Ting Ye, Jun Shao, Yanyao Yi, Qinyuan Zhao (2023) <doi:10.1080/01621459.2022.2049278>. Ting Ye, Marlena Bannick, Yanyao Yi, Jun Shao (2023) <doi:10.1080/24754269.2023.2205802>. Ting Ye, Jun Shao, Yanyao Yi (2023) <doi:10.1093/biomet/asad045>. Marlena Bannick, Jun Shao, Jingyi Liu, Yu Du, Yanyao Yi, Ting Ye (2024) <doi:10.48550/arXiv.2306.10213>.

Maintained by Marlena Bannick. Last updated 23 days ago.

6 stars 4.42 score 11 scripts

dosorio

rPanglaoDB:Download and Merge Single-Cell RNA-Seq Data from the PanglaoDB Database

Download and merge labeled single-cell RNA-seq data from the PanglaoDB <https://panglaodb.se/> into a Seurat object.

Maintained by Daniel Osorio. Last updated 2 years ago.

data-integration data-mining rna-seq single-cell single-cell-rna-seq

26 stars 4.41 score 20 scripts

blaserlab

DoubletFinder:DoubletFinder is a suite of tools for identifying doublets in single-cell RNA sequencing data.

DoubletFinder identifies doublets by generating artificial doublets from existing scRNA-seq data and determining which real cells preferentially co-localize with artificial doublets in gene expression space. Other DoubletFinder package functions are used to fit DoubletFinder to different scRNA-seq datasets. For example, ideal DoubletFinder performance in real-world contexts requires (1) optimal pK selection and (2) homotypic doublet proportion estimation. pK selection is achieved using pN-pK parameter sweeps and maxima identification in mean-variance-normalized bimodality coefficient distributions. Homotypic doublet proportion estimation is achieved by finding the sum of squared cell annotation frequencies. For more information, see our Cell Systems paper https://www.cell.com/cell-systems/fulltext/S2405-4712(19)30073-0 and our GitHub repository https://github.com/chris-mcginnis-ucsf/DoubletFinder
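The homotypic doublet estimate mentioned above (the sum of squared cell annotation frequencies) is short enough to show directly. A minimal Python sketch, assuming annotations arrive as a flat list of per-cell labels; the function name is hypothetical and this is not DoubletFinder's R API. The sum of squared frequencies equals the probability that two randomly drawn cells share an annotation, which is why it serves as the homotypic fraction.

```python
from collections import Counter


def homotypic_proportion(annotations):
    """Expected fraction of homotypic doublets (hypothetical helper).

    Treats a doublet as two randomly drawn cells, so the chance both
    share an annotation is the sum of squared annotation frequencies.
    """
    if not annotations:
        raise ValueError("annotations must be non-empty")
    n = len(annotations)
    counts = Counter(annotations)
    return sum((k / n) ** 2 for k in counts.values())


# Example: cluster labels for six cells.
print(homotypic_proportion(["T", "T", "T", "B", "B", "NK"]))
```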

Maintained by Chris McGinnis. Last updated 1 years ago.

4.39 score 972 scripts

bioc

ChIPanalyser:ChIPanalyser: Predicting Transcription Factor Binding Sites

ChIPanalyser is a package to predict and understand TF binding by utilizing a statistical thermodynamic model. The model incorporates four main factors thought to drive TF binding: chromatin state, binding energy, number of bound molecules, and a scaling factor modulating TF binding affinity. Taken together, ChIPanalyser produces ChIP-like profiles that closely mimic the patterns seen in real ChIP-seq data.

Maintained by Patrick C.N. Martin. Last updated 5 months ago.

software biologicalquestion workflowstep transcription sequencing chiponchip coverage alignment chipseq sequencematching dataimport peakdetection

4.38 score 12 scripts

hakyimlab

OmicKriging:Poly-Omic Prediction of Complex Traits

It provides functions to generate a correlation matrix from a genetic dataset and to use this matrix to predict the phenotype of an individual from the phenotypes of the remaining individuals through kriging. Kriging is a geostatistical method for optimal prediction, also known as best linear unbiased prediction. It predicts the value of a variable at an unobserved location as a weighted sum of the variable at observed locations. Intuitively, it works as a reverse linear regression: instead of computing the correlation (univariate regression coefficients are simply scaled correlations) between a dependent variable Y and independent variables X, it uses the known correlation between X and Y to predict Y.
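The "weighted sum of observed values" described above can be made concrete with simple kriging, where the mean is assumed known. This is a generic textbook formulation in plain Python, not OmicKriging's implementation; the function names are hypothetical, and it assumes the correlation matrix among observed individuals is well-conditioned. The weights w solve `C_obs w = c_target`, and the prediction is `mean + w . (y_obs - mean)`.

```python
def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting (small systems)."""
    n = len(a)
    # Augmented copy so the caller's matrix is left untouched.
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for k in range(col, n + 1):
                    m[r][k] -= factor * m[col][k]
    return [m[i][n] / m[i][i] for i in range(n)]


def simple_kriging(c_target, c_obs, y_obs, mean):
    """Predict an unobserved phenotype as mean + w . (y_obs - mean) (hypothetical helper).

    c_target: correlations between the target and each observed individual.
    c_obs:    correlation matrix among the observed individuals.
    """
    w = solve(c_obs, c_target)
    return mean + sum(wi * (yi - mean) for wi, yi in zip(w, y_obs))
```

If the target's correlations with the observed individuals match the first observed individual's own column of `c_obs`, the weights become (1, 0, ...) and the prediction reproduces that individual's phenotype.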

Maintained by Hae Kyung Im. Last updated 4 years ago.

2 stars 4.38 score 48 scripts

ledell

subsemble:An Ensemble Method for Combining Subset-Specific Algorithm Fits

The Subsemble algorithm is a general subset ensemble prediction method, which can be used for small, moderate, or large datasets. Subsemble partitions the full dataset into subsets of observations, fits a specified underlying algorithm on each subset, and uses a unique form of k-fold cross-validation to output a prediction function that combines the subset-specific fits. An oracle result provides a theoretical performance guarantee for Subsemble. The paper, "Subsemble: An ensemble method for combining subset-specific algorithm fits" is authored by Stephanie Sapp, Mark J. van der Laan & John Canny (2014) <doi:10.1080/02664763.2013.864263>.

Maintained by Erin LeDell. Last updated 3 years ago.

big-data cross-validation ensemble ensemble-learning machine-learning machine-learning-algorithms

43 stars 4.37 score 11 scripts

bioc

Spaniel:Spatial Transcriptomics Analysis

Spaniel includes a series of tools to aid the quality control and analysis of Spatial Transcriptomics data. Spaniel can import data from either the original Spatial Transcriptomics system or 10X Visium technology. The package contains functions to create a SingleCellExperiment or Seurat object and provides a method for loading a histological image into R. The spanielPlot function allows visualisation of metrics contained within the S4 object overlaid onto the image of the tissue.

Maintained by Rachel Queen. Last updated 5 months ago.

singlecell rnaseq qualitycontrol preprocessing normalization visualization transcriptomics geneexpression sequencing software dataimport datarepresentation infrastructure coverage clustering

4.34 score 22 scripts

bioc

ClusterFoldSimilarity:Calculate similarity of clusters from different single cell samples using foldchanges

This package calculates a similarity coefficient using the fold changes of shared features (e.g. genes) among clusters of different samples/batches/datasets. The similarity coefficient is calculated using the dot product (Hadamard product) of every pairwise combination of fold changes between a source cluster i of sample/dataset n and all the target clusters j in sample/dataset m.
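The core pairwise comparison described above can be sketched as a dot product of fold changes over shared features. This is a minimal Python illustration under stated assumptions: each cluster's fold changes are given as a dict mapping feature name to fold change, only the raw dot product is computed, and any further weighting or normalization the package applies is not reproduced. The function names are hypothetical, not ClusterFoldSimilarity's R API.

```python
def fold_change_similarity(source_fc, target_fc):
    """Dot product of fold changes over features present in both clusters."""
    shared = source_fc.keys() & target_fc.keys()
    return sum(source_fc[g] * target_fc[g] for g in shared)


def similarity_to_all(source_fc, targets):
    """Score one source cluster against every target cluster of another dataset."""
    return {name: fold_change_similarity(source_fc, fc) for name, fc in targets.items()}


# Hypothetical clusters: genes up in both clusters raise the score,
# genes moving in opposite directions lower it; unshared genes are ignored.
source = {"CD3E": 2.0, "MS4A1": -1.0, "NKG7": 0.5}
targets = {"t_cells": {"CD3E": 1.5, "MS4A1": -0.5}, "b_cells": {"CD3E": -1.0, "MS4A1": 2.0}}
print(similarity_to_all(source, targets))
```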

Maintained by Oscar Gonzalez-Velasco. Last updated 5 months ago.

singlecell clustering featureextraction graphandnetwork genetarget rnaseq

4.34 score 11 scripts

bioc

PAA:PAA (Protein Array Analyzer)

PAA imports single-color (protein) microarray data that has been saved in gpr file format, especially ProtoArray data. After preprocessing (background correction, batch filtering, normalization), univariate feature preselection is performed (e.g., using the "minimum M statistic" approach, hereinafter referred to as "mMs"). Subsequently, a multivariate feature selection is conducted to discover biomarker candidates. For this, either a frequency-based backwards elimination approach or ensemble feature selection can be used. PAA provides a complete toolbox of analysis tools, including several different plots for results examination and evaluation.

Maintained by Michael Turewicz. Last updated 5 months ago.

classification microarray onechannel proteomics cpp

4.34 score 11 scripts

foucher-y

RISCA:Causal Inference and Prediction in Cohort-Based Analyses

Numerous functions for cohort-based analyses, either for prediction or causal inference. For causal inference, it includes Inverse Probability Weighting and G-computation for marginal estimation of an exposure effect when confounders are expected. We deal with binary outcomes, times-to-events, competing events, and multi-state data. For multistate data, semi-Markov model with interval censoring may be considered, and we propose the possibility to consider the excess of mortality related to the disease compared to reference lifetime tables. For predictive studies, we propose a set of functions to estimate time-dependent receiver operating characteristic (ROC) curves with the possible consideration of right-censoring times-to-events or the presence of confounders. Finally, several functions are available to assess time-dependent ROC curves or survival curves from aggregated data.

Maintained by Yohann Foucher. Last updated 1 months ago.

1 stars 4.33 score 47 scripts

bioc

cytofQC:Labels normalized cells for CyTOF data and assigns probabilities for each label

cytofQC is a package for initial cleaning of CyTOF data. It uses a semi-supervised approach for labeling cells with their most likely data type (bead, doublet, debris, dead) and the probability that they belong to each label type. This package does not remove data from the dataset, but provides labels and information to aid the data user in cleaning their data. Our algorithm is able to distinguish between doublets and large cells.

Maintained by Jill Lundell. Last updated 5 months ago.

software singlecell annotation

2 stars 4.30 score 3 scripts

bioc

RNAmodR.AlkAnilineSeq:Detection of m7G, m3C and D modification by AlkAnilineSeq

RNAmodR.AlkAnilineSeq implements the detection of m7G, m3C and D modifications on RNA from experimental data generated with the AlkAnilineSeq protocol. The package builds on the core functionality of the RNAmodR package to detect specific patterns of the modifications in high throughput sequencing data.

Maintained by Felix G.M. Ernst. Last updated 5 months ago.

software workflowstep visualization sequencing alkanilineseq bioconductor modifications rna rnamodr

2 stars 4.30 score 3 scripts

bioc

RegionalST:Investigating regions of interest and performing regional cell type-specific analysis with spatial transcriptomics data

This package analyzes spatial transcriptomics data through cross-regional cell type-specific analysis. It selects regions of interest (ROIs) and identifies cross-regional cell type-specific differential signals. The ROIs can be selected using an automatic algorithm or through manual selection, which is facilitated by a Shiny application.

Maintained by Ziyi Li. Last updated 4 months ago.

spatial transcriptomics reactome kegg

4.30 score 8 scripts

bioc

scBFA:A dimensionality reduction tool using gene detection pattern to mitigate noisy expression profile of scRNA-seq

This package is designed to model the gene detection pattern of scRNA-seq data through a binary factor analysis model. The model allows users to pass in a cell-level covariate matrix X and a gene-level covariate matrix Q to account for nuisance variance (e.g., batch effects), and it outputs a low-dimensional embedding matrix for downstream analysis.

Maintained by Ruoxin Li. Last updated 5 months ago.

singlecell transcriptomics dimensionreduction geneexpression atacseq batcheffect kegg qualitycontrol

4.30 score 4 scripts

yanpd01

ggsector:Draw Sectors

Some useful functions that can use 'grid' and 'ggplot2' to plot sectors and interact with 'Seurat' to plot gene expression percentages. Also, there are some examples of how to draw sectors in 'ComplexHeatmap'.

Maintained by Pengdong Yan. Last updated 5 months ago.

4 stars 4.30 score 5 scripts

tlverse

tmle3mopttx:Targeted Maximum Likelihood Estimation of the Mean under Optimal Individualized Treatment

This package estimates the optimal individualized treatment rule for the categorical treatment using Super Learner (sl3). In order to avoid nested cross-validation, it uses split-specific estimates of Q and g to estimate the rule as described by Coyle et al. In addition, it provides the Targeted Maximum Likelihood estimates of the mean performance using CV-TMLE under such estimated rules. This is an adapter package for use with the tmle3 framework and the tlverse software ecosystem for Targeted Learning.

Maintained by Ivana Malenica. Last updated 3 years ago.

categorical-treatment causal-inference heterogeneous-effects machine-learning optimal-individualized-treatment targeted-learning variable-importance

13 stars 4.28 score 49 scripts 1 dependents

bioc

LedPred:Learning from DNA to Predict Enhancers

This package aims to create a predictive model of regulatory sequences, used to score unknown sequences based on the content of DNA motifs, next-generation sequencing (NGS) peaks and signals, and other numerical scores of the sequences, using supervised classification. The package contains a workflow based on the support vector machine (SVM) algorithm that maps features to sequences, optimizes SVM parameters and the number of features, and creates a model that can be stored and used to score the regulatory potential of unknown sequences.

Maintained by Aitor Gonzalez. Last updated 5 months ago.

supportvectormachine software motifannotation chipseq sequencing classification

3 stars 4.26 score 3 scripts

cyrillagger

scDiffCom:Differential Analysis of Intercellular Communication from scRNA-Seq Data

Analysis tools to investigate changes in intercellular communication from scRNA-seq data. Using a Seurat object as input, the package infers which cell-cell interactions are present in the dataset and how these interactions change between two conditions of interest (e.g. young vs old). It relies on an internal database of ligand-receptor interactions (available for human, mouse and rat) that have been gathered from several published studies. Detection and differential analyses rely on permutation tests. The package also contains several tools to perform over-representation analysis and visualize the results. See Lagger, C. et al. (2023) <doi:10.1038/s43587-023-00514-x> for a full description of the methodology.

Maintained by Cyril Lagger. Last updated 1 years ago.

21 stars 4.25 score 17 scripts

lance-waller-lab

envi:Environmental Interpolation using Spatial Kernel Density Estimation

Estimates an ecological niche using occurrence data, covariates, and kernel density-based estimation methods. For a single species with presence and absence data, the 'envi' package uses the spatial relative risk function that is estimated using the 'sparr' package. Details about the 'sparr' package methods can be found in the tutorial: Davies et al. (2018) <doi:10.1002/sim.7577>. Details about kernel density estimation can be found in J. F. Bithell (1990) <doi:10.1002/sim.4780090616>. More information about relative risk functions using kernel density estimation can be found in J. F. Bithell (1991) <doi:10.1002/sim.4780101112>.

Maintained by Ian D. Buller. Last updated 5 months ago.

ecological-niche ecological-niche-modelling geospatial geospatial-analysis kernel-density-estimation niche-modeling niche-modelling non-euclidean-spaces point-pattern point-pattern-analysis principal-component-analysis spatial-analysis species-distribution-modeling species-distribution-modelling

1 stars 4.22 score 33 scripts

bioc

easier:Estimate Systems Immune Response from RNA-seq data

This package provides a workflow for using the EaSIeR tool, developed to assess patients' likelihood of responding to ICB therapies using only the patients' RNA-seq data as input. We integrate RNA-seq data with different types of prior knowledge to extract quantitative descriptors of the tumor microenvironment from several points of view, including composition of the immune repertoire and activity of intra- and extra-cellular communications. Then, we use multi-task machine learning trained on TCGA data to identify how these descriptors can simultaneously predict several state-of-the-art hallmarks of anti-cancer immune response. In this way we derive cancer-specific models and identify cancer-specific systems biomarkers of immune response. These biomarkers have been experimentally validated in the literature, and the performance of EaSIeR predictions has been validated using independent datasets from four different cancer types with patients treated with anti-PD1 or anti-PDL1 therapy.

Maintained by Oscar Lapuente-Santana. Last updated 5 months ago.

geneexpression software transcription systemsbiology pathways genesetenrichment immunooncology epigenetics classification biomedicalinformatics regression experimenthubsoftware

4.20 score 16 scripts

benkeser

nlpred:Estimators of Non-Linear Cross-Validated Risks Optimized for Small Samples

Improved estimates of non-linear cross-validated risks are obtained using targeted minimum loss-based estimation, estimating equations, and one-step estimation (Benkeser, Petersen, van der Laan (2019), <doi:10.1080/01621459.2019.1668794>). Cross-validated area under the receiver operating characteristic curve (LeDell, Petersen, van der Laan (2015), <doi:10.1214/15-EJS1035>) and other metrics are included.

Maintained by David Benkeser. Last updated 3 years ago.

auc cross-validation estimating-equations machine-learning tmle

3 stars 4.18 score 6 scripts

jiaxiangbu

rawKS:Easily Get True-Positive Rate and False-Positive Rate and KS Statistic

The Kolmogorov-Smirnov (K-S) statistic is a standard method to measure the strength of credit risk scoring models. This package calculates the K-S statistic and plots the true-positive rate and false-positive rate to measure model strength. The package was written for credit marketers who use risk models in conjunction with their campaigns. More details can be found in Thrasher (1992) <doi:10.1002/dir.4000060408> and 'pyks' <https://pypi.org/project/pyks/>.
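In this setting the K-S statistic is the largest gap between the true-positive rate and the false-positive rate as the score threshold sweeps over all values. A minimal Python sketch, assuming binary labels coded 1 (positive, e.g. "bad" account) and 0; the function name is hypothetical and this is not rawKS's R API. Tied scores are evaluated together so a threshold never splits them.

```python
def ks_statistic(scores, labels):
    """Max |TPR - FPR| over all score thresholds (labels: 1 = positive, 0 = negative)."""
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need both positive and negative labels")
    # Sweep thresholds from the highest score downwards.
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    tp = fp = 0
    best = 0.0
    for i, (s, y) in enumerate(pairs):
        if y == 1:
            tp += 1
        else:
            fp += 1
        # Evaluate only at threshold boundaries so tied scores move together.
        if i + 1 == len(pairs) or pairs[i + 1][0] != s:
            best = max(best, abs(tp / pos - fp / neg))
    return best


# Perfectly separated scores give the maximum value of 1.0.
print(ks_statistic([0.9, 0.8, 0.3, 0.2], [1, 1, 0, 0]))
```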

Maintained by Jiaxiang Li. Last updated 5 years ago.

ks model-evaluation

3 stars 4.18 score 5 scripts

bioc

partCNV:Infer locally aneuploid cells using single cell RNA-seq data

This package uses a statistical framework for rapid and accurate detection of aneuploid cells with local copy number deletion or amplification. Our method uses an EM algorithm with mixtures of Poisson distributions while incorporating cytogenetics information (e.g., regional deletion or amplification) to guide the classification (partCNV). When applicable, we further improve the accuracy by integrating a Hidden Markov Model for feature selection (partCNVH).

Maintained by Ziyi Li. Last updated 5 months ago.

software copynumbervariation hiddenmarkovmodel singlecell classification

4.18 score 4 scripts

edoardocostantini

gspcr:Generalized Supervised Principal Component Regression

Generalization of supervised principal component regression (SPCR; Bair et al., 2006, <doi:10.1198/016214505000000628>) to support continuous, binary, and discrete variables as outcomes and predictors (inspired by the 'superpc' R package <https://cran.r-project.org/package=superpc>).

Maintained by Edoardo Costantini. Last updated 12 months ago.

1 stars 4.18 score 10 scripts

ruzhangzhao

mixhvg:Mixture of Multiple Highly Variable Feature Selection Methods

Highly variable gene selection methods, including popular publicly available methods and a mixture of multiple highly variable gene selection methods <https://github.com/RuzhangZhao/mixhvg>. Reference: <doi:10.1101/2024.08.25.608519>.

Maintained by Ruzhang Zhao. Last updated 1 months ago.

rna-seq-analysis rna-seq-pipeline single-cell single-cell-rna-seq variable-selection

5 stars 4.18 score 6 scripts

adam-s-elder

amp:Statistical Test for the Multivariate Point Null Hypotheses

A testing framework, described in Elder et al. (2022) <arXiv:2203.01897>, for testing the multivariate point null hypothesis. After the user selects a parameter of interest and defines the assumed data-generating mechanism, this information should be encoded in functions for the parameter estimator and its corresponding influence curve. Some combinations of parameter and data-generating mechanism are already coded in this package and are explained in detail in the article.

Maintained by Adam Elder. Last updated 3 years ago.

4.11 score 13 scripts

bioc

ssPATHS:ssPATHS: Single Sample PATHway Score

This package generates pathway scores from expression data for single samples after training on a reference cohort. The score is generated by taking the expression of a gene set (pathway) from a reference cohort and performing linear discriminant analysis to distinguish samples in the cohort that have the pathway augmented and not. The separating hyperplane is then used to score new samples.

Maintained by Natalie R. Davidson. Last updated 5 months ago.

software geneexpression biomedicalinformatics rnaseq pathways transcriptomics dimensionreduction classification

4.00 score 1 scripts

bioc

RNAmodR.RiboMethSeq:Detection of 2'-O methylations by RiboMethSeq

RNAmodR.RiboMethSeq implements the detection of 2'-O methylations on RNA from experimental data generated with the RiboMethSeq protocol. The package builds on the core functionality of the RNAmodR package to detect specific patterns of the modifications in high throughput sequencing data.

Maintained by Felix G.M. Ernst. Last updated 5 months ago.

software workflowstep visualization sequencing bioconductor modifications ribomethseq rna rnamodr

1 stars 4.00 score 4 scripts

bioc

RNAmodR.ML:Detecting patterns of post-transcriptional modifications using machine learning

RNAmodR.ML extends the functionality of the RNAmodR package and classical detection strategies towards detection through machine learning models. RNAmodR.ML provides classes, functions, and an example workflow to establish a detection strategy, which can be packaged.

Maintained by Felix G.M. Ernst. Last updated 5 months ago.

software infrastructure workflowstep visualization sequencing

1 stars 4.00 score 3 scripts

bioc

scTreeViz:R/Bioconductor package to interactively explore and visualize single cell RNA-seq datasets with hierarchical annotations

scTreeViz provides classes to support interactive data aggregation and visualization of single cell RNA-seq datasets with hierarchies, e.g., cell clusters at different resolutions. The `TreeIndex` class provides methods to manage the hierarchy and split the tree at a given resolution or across resolutions. The `TreeViz` class extends `SummarizedExperiment` and can perform quick aggregations on the count matrix defined by clusters.

Maintained by Jayaram Kancherla. Last updated 5 months ago.

visualization infrastructure gui singlecell

4.00 score 3 scripts

mariaguilleng

boostingDEA:A Boosting Approach to Data Envelopment Analysis

Includes functions to estimate production frontiers and make ideal output predictions in the Data Envelopment Analysis (DEA) context using both standard models from DEA and Free Disposal Hull (FDH) and boosting techniques. In particular, EATBoosting (Guillen et al., 2023 <doi:10.1016/j.eswa.2022.119134>) and MARSBoosting. Moreover, the package includes code for estimating several technical efficiency measures using different models such as the input and output-oriented radial measures, the input and output-oriented Russell measures, the Directional Distance Function (DDF), the Weighted Additive Measure (WAM) and the Slacks-Based Measure (SBM).

Maintained by Maria D. Guillen. Last updated 2 years ago.

2 stars 4.00 score 3 scripts

fentouxungui

SeuratExplorer:A 'Shiny' App for Exploring scRNA-seq Data Processed in 'Seurat'

A simple, one-command package that runs an interactive dashboard capable of common visualizations for single cell RNA-seq. 'SeuratExplorer' requires a processed 'Seurat' object saved as an 'rds' or 'qs2' file.

Maintained by Yongchao Zhang. Last updated 2 days ago.

3.98 score

bioc

erccdashboard:Assess Differential Gene Expression Experiments with ERCC Controls

Technical performance metrics for differential gene expression experiments using External RNA Controls Consortium (ERCC) spike-in ratio mixtures.

Maintained by Sarah Munro. Last updated 5 months ago.

immunooncology geneexpression transcription alternativesplicing differentialexpression differentialsplicing genetics microarray mrnamicroarray rnaseq batcheffect multiplecomparison qualitycontrol

3.95 score 4 scripts

bioc

CPSM:CPSM: Cancer patient survival model

The CPSM package provides a comprehensive computational pipeline for predicting the survival probability of cancer patients. It offers a series of steps including data processing, splitting data into training and test subsets, and data normalization. The package enables the selection of significant features based on univariate survival analysis and generates a LASSO prognostic index score. It supports the development of predictive models for survival probability using various features and provides visualization tools to draw survival curves based on predicted survival probabilities. Additionally, CPSM includes functionality for generating bar plots that depict the predicted mean and median survival times of patients, making it a versatile tool for survival analysis in cancer research.

Maintained by Harpreet Kaur. Last updated 22 days ago.

geneexpression normalization survival

3.90 score

bioc

a4Classif:Automated Affymetrix Array Analysis Classification Package

Functionalities for classification of Affymetrix microarray data, integrating within the Automated Affymetrix Array Analysis set of packages.

Maintained by Laure Cougnaud. Last updated 5 months ago.

microarray geneexpression classification

3.78 score 1 scripts 1 dependents

nhejazi

medshift:Causal mediation analysis for stochastic interventions

Estimators of a parameter arising in the decomposition of the population intervention (in)direct effect of stochastic interventions in causal mediation analysis, including efficient one-step, targeted minimum loss (TML), re-weighting (IPW), and substitution estimators. The parameter estimated constitutes a part of each of the population intervention (in)direct effects. These estimators may be used in assessing population intervention (in)direct effects under stochastic treatment regimes, including incremental propensity score interventions and modified treatment policies. The methodology was first discussed by I Díaz and NS Hejazi (2020) <doi:10.1111/rssb.12362>.

Maintained by Nima Hejazi. Last updated 3 years ago.

causal-inference inverse-probability-weights machine-learning mediation-analysis stochastic-interventions targeted-learning treatment-effects

9 stars 3.73 score 12 scripts

cran

Platypus:Single-Cell Immune Repertoire and Gene Expression Analysis

We present 'Platypus', an open-source software platform providing a user-friendly interface to investigate B-cell receptor and T-cell receptor repertoires from scSeq experiments. 'Platypus' provides a framework to automate and ease the analysis of single-cell immune repertoires while also incorporating transcriptional information involving unsupervised clustering, gene expression and gene ontology. This R version of 'Platypus' is part of the 'ePlatypus' ecosystem for computational analysis of immunogenomics data: Yermanos et al. (2021) <doi:10.1093/nargab/lqab023>, Cotet et al. (2023) <doi:10.1093/bioinformatics/btad553>.

Maintained by Alexander Yermanos. Last updated 6 months ago.

3.70 score

banking-analytics-lab

EMP:Expected Maximum Profit Classification Performance Measure

Functions for estimating EMP (Expected Maximum Profit Measure) in Credit Risk Scoring and Customer Churn Prediction, according to Verbraken et al (2013, 2014) <DOI:10.1109/TKDE.2012.50>, <DOI:10.1016/j.ejor.2014.04.001>.

Maintained by Cristian Bravo. Last updated 6 years ago.

1 star 3.70 score 6 scripts

yaziciceyda

cmaRs:Implementation of the Conic Multivariate Adaptive Regression Splines in R

An implementation of 'Conic Multivariate Adaptive Regression Splines (CMARS)' in R. See Weber et al. (2011) CMARS: a new contribution to nonparametric regression with multivariate adaptive regression splines supported by continuous optimization, <DOI:10.1080/17415977.2011.624770>. It constructs models by using the terms obtained from the forward step of MARS and then estimates parameters by using 'Tikhonov' regularization and conic quadratic optimization. It is possible to construct models for prediction and binary classification. It provides performance measures for the model developed. The package needs the optimisation software 'MOSEK' <https://www.mosek.com/> to construct the models. Please follow the instructions in 'Rmosek' for the installation.

Maintained by Ceyda Yazici. Last updated 2 years ago.

3.70 score 2 scripts

promidat

traineR:Predictive (Classification and Regression) Models Homologator

Methods to unify the different ways of creating predictive models and their different predictive formats for classification and regression. It includes methods such as K-Nearest Neighbors Schliep, K. P. (2004) <doi:10.5282/ubm/epub.1769>, Decision Trees Leo Breiman, Jerome H. Friedman, Richard A. Olshen, Charles J. Stone (2017) <doi:10.1201/9781315139470>, ADA Boosting Esteban Alfaro, Matias Gamez, Noelia García (2013) <doi:10.18637/jss.v054.i02>, Extreme Gradient Boosting Chen & Guestrin (2016) <doi:10.1145/2939672.2939785>, Random Forest Breiman (2001) <doi:10.1023/A:1010933404324>, Neural Networks Venables, W. N., & Ripley, B. D. (2002) <ISBN:0-387-95457-0>, Support Vector Machines Bennett, K. P. & Campbell, C. (2000) <doi:10.1145/380995.380999>, Bayesian Methods Gelman, A., Carlin, J. B., Stern, H. S., & Rubin, D. B. (1995) <doi:10.1201/9780429258411>, Linear Discriminant Analysis Venables, W. N., & Ripley, B. D. (2002) <ISBN:0-387-95457-0>, Quadratic Discriminant Analysis Venables, W. N., & Ripley, B. D. (2002) <ISBN:0-387-95457-0>, Logistic Regression Dobson, A. J., & Barnett, A. G. (2018) <doi:10.1201/9781315182780> and Penalized Logistic Regression Friedman, J. H., Hastie, T., & Tibshirani, R. (2010) <doi:10.18637/jss.v033.i01>.

Maintained by Oldemar Rodriguez R. Last updated 1 year ago.

3.64 score 36 scripts 2 dependents

sistia01

DWLS:Gene Expression Deconvolution Using Dampened Weighted Least Squares

The rapid development of single-cell transcriptomic technologies has helped uncover the cellular heterogeneity within cell populations. However, bulk RNA-seq continues to be the main workhorse for quantifying gene expression levels due to technical simplicity and low cost. To most effectively extract information from bulk data given the new knowledge gained from single-cell methods, we have developed a novel algorithm to estimate the cell-type composition of bulk data from a single-cell RNA-seq-derived cell-type signature. Comparison with existing methods using various real RNA-seq data sets indicates that our new approach is more accurate and comprehensive than previous methods, especially for the estimation of rare cell types. More importantly, our method can detect cell-type composition changes in response to external perturbations, thereby providing a valuable, cost-effective method for dissecting the cell-type-specific effects of drug treatments or condition changes. As such, our method is applicable to a wide range of biological and clinical investigations. Dampened weighted least squares ('DWLS') is an estimation method for gene expression deconvolution, in which the cell-type composition of a bulk RNA-seq data set is computationally inferred. This method corrects common biases towards cell types that are characterized by highly expressed genes and/or are highly prevalent, to provide accurate detection across diverse cell types. See: <https://www.nature.com/articles/s41467-019-10802-z.pdf> for more information about the development of 'DWLS' and the methods behind our functions.

Maintained by Adriana Sistig. Last updated 3 years ago.

2 stars 3.62 score 42 scripts

liuy12

SCdeconR:Deconvolution of Bulk RNA-Seq Data using Single-Cell RNA-Seq Data as Reference

Streamlined workflow from deconvolution of bulk RNA-seq data to downstream differential expression and gene-set enrichment analysis. Provides various visualization functions.

Maintained by Yuanhang Liu. Last updated 10 months ago.

bulk-rna-seq-deconvolution deconvolution differential-expression ffpe geneset-enrichment-analysis scdeconr single-cell

4 stars 3.60 score 4 scripts

ly129

CausalMetaR:Causally Interpretable Meta-Analysis

Provides robust and efficient methods for estimating causal effects in a target population using a multi-source dataset, including those of Dahabreh et al. (2019) <doi:10.1111/biom.13716>, Robertson et al. (2021) <doi:10.48550/arXiv.2104.05905>, and Wang et al. (2024) <doi:10.48550/arXiv.2402.02684>. The multi-source data can be a collection of trials, observational studies, or a combination of both, which have the same data structure (outcome, treatment, and covariates). The target population can be based on an internal dataset or an external dataset where only covariate information is available. The causal estimands available are average treatment effects and subgroup treatment effects. See Wang et al. (2024) <doi:10.48550/arXiv.2402.04341> for a detailed guide on using the package.

Maintained by Sean McGrath. Last updated 3 months ago.

2 stars 3.60 score 3 scripts

erhard-lab

HetSeq:Identifying Modulators of Cellular Responses Leveraging Intercellular Heterogeneity

Cellular responses to perturbations are highly heterogeneous and depend largely on the initial state of cells. Connecting post-perturbation cells via cellular trajectories to untreated cells (e.g. by leveraging metabolic labeling information) enables exploitation of intercellular heterogeneity as a combined knock-down and overexpression screen to identify pathway modulators, termed Heterogeneity-seq (see 'Berg et al' <doi:10.1101/2024.10.28.620481>). This package contains functions to generate cellular trajectories based on scSLAM-seq (single-cell, thiol-(SH)-linked alkylation of RNA for metabolic labelling sequencing) time courses, functions to identify pathway modulators and to visualize the results.

Maintained by Kevin Berg. Last updated 2 months ago.

3.54 score

barnhilldave

TML:Tropical Geometry Tools for Machine Learning

Suite of tropical geometric tools for use in machine learning applications. These methods may be summarized in the following references: Yoshida, et al. (2022) <arxiv:2209.15045>, Barnhill et al. (2023) <arxiv:2303.02539>, Barnhill and Yoshida (2023) <doi:10.3390/math11153433>, Aliatimis et al. (2023) <arXiv:2306.08796>, Yoshida et al. (2022) <arXiv:2206.04206>, and Yoshida et al. (2019) <doi:10.1007/s11538-018-0493-4>.

Maintained by David Barnhill. Last updated 8 months ago.

3 stars 3.48 score 1 script

bioc

SCArray.sat:Large-scale single-cell RNA-seq data analysis using GDS files and Seurat

Extends the Seurat classes and functions to support Genomic Data Structure (GDS) files as a DelayedArray backend for data representation. It relies on the implementation of GDS-based DelayedMatrix in the SCArray package to represent single cell RNA-seq data. The common optimized algorithms leveraging GDS-based and single cell-specific DelayedMatrix (SC_GDSMatrix) are implemented in the SCArray package. SCArray.sat introduces a new SCArrayAssay class (derived from the Seurat Assay), which wraps raw counts, normalized expressions and scaled data matrix based on GDS-specific DelayedMatrix. It is designed to integrate seamlessly with the Seurat package to provide common data analysis in the SeuratObject-based workflow. Compared with Seurat, SCArray.sat significantly reduces the memory usage without downsampling and can be applied to very large datasets.

Maintained by Xiuwen Zheng. Last updated 9 days ago.

datarepresentation dataimport singlecell rnaseq

1 star 3.48 score 3 scripts

bioc

a4:Automated Affymetrix Array Analysis Umbrella Package

Umbrella package for the entire Automated Affymetrix Array Analysis suite of packages.

Maintained by Laure Cougnaud. Last updated 5 months ago.

microarray

3.48 score 15 scripts

thecailab

SCRIP:An Accurate Simulator for Single-Cell RNA Sequencing Data

We provide a comprehensive scheme that is capable of simulating Single Cell RNA Sequencing data for various parameters of Biological Coefficient of Variation, bursting kinetics, differential expression (DE), cell or sample groups, cell trajectory, batch effect and other experimental designs. 'SCRIP' proposed and compared two frameworks with Gamma-Poisson and Beta-Gamma-Poisson models for simulating Single Cell RNA Sequencing data. Another reference is Zappia et al. (2017) <https://genomebiology.biomedcentral.com/articles/10.1186/s13059-017-1305-0>.

Maintained by Fei Qin. Last updated 2 years ago.

2 stars 3.41 score 13 scripts

jackmwolf

tehtuner:Fit and Tune Models to Detect Treatment Effect Heterogeneity

Implements methods to fit Virtual Twins models (Foster et al. (2011) <doi:10.1002/sim.4322>) for identifying subgroups with differential effects in the context of clinical trials while controlling the probability of falsely detecting a differential effect when the conditional average treatment effect is uniform across the study population using parameter selection methods proposed in Wolf et al. (2022) <doi:10.1177/17407745221095855>.

Maintained by Jack Wolf. Last updated 2 years ago.

clinical-trials heterogeneity-of-treatment-effect subgroup-identification

5 stars 3.40 score 6 scripts

mqnjqrid

drpop:Efficient and Doubly Robust Population Size Estimation

Estimation of the total population size from capture-recapture data efficiently and with low bias implementing the methods from Das M, Kennedy EH, and Jewell NP (2021) <arXiv:2104.14091>. The estimator is doubly robust against errors in the estimation of the intermediate nuisance parameters. Users can choose from the flexible estimation models provided in the package, or use any other preferred model.

Maintained by Manjari Das. Last updated 3 years ago.

5 stars 3.40 score 2 scripts

boshiangke

influenceAUC:Identify Influential Observations in Binary Classification

Ke, B. S., Chiang, A. J., & Chang, Y. C. I. (2018) <doi:10.1080/10543406.2017.1377728> provide two theoretical methods (influence function and local influence) based on the area under the receiver operating characteristic curve (AUC) to quantify the numerical impact of each observation on the overall AUC. Alternative graphical tools, cumulative lift charts, are proposed to reveal the existence and approximate locations of those influential observations through data visualization.

Maintained by Bo-Shiang Ke. Last updated 5 months ago.

3.30 score
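To make the idea of "the numerical impact of each observation on the overall AUC" in the influenceAUC entry concrete, here is a minimal sketch. It deliberately swaps in plain case deletion (recompute the AUC with each observation left out), which is simpler than the influence-function and local-influence methods the package implements. The function names `auc` and `deletion_influence` are hypothetical and not part of the package.

```python
def auc(scores, labels):
    """Empirical AUC (Mann-Whitney form): the fraction of (positive, negative)
    pairs in which the positive scores higher, counting ties as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def deletion_influence(scores, labels):
    """Full-data AUC minus the AUC with observation i left out, for every i.
    Negative values flag observations whose removal would raise the AUC.
    Assumes each class keeps at least one member after any single deletion."""
    full = auc(scores, labels)
    return [
        full - auc(scores[:i] + scores[i + 1:], labels[:i] + labels[i + 1:])
        for i in range(len(scores))
    ]


scores = [0.95, 0.8, 0.7, 0.3, 0.1, 0.85]
labels = [1, 1, 1, 0, 0, 0]
infl = deletion_influence(scores, labels)
print(min(range(len(infl)), key=infl.__getitem__))  # 5: the negative scored 0.85
```

Case deletion costs one AUC recomputation per observation; analytic influence functions like those in the cited paper avoid that refitting, which matters for large samples.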

dsokolo

scMappR:Single Cell Mapper

The single cell mapper (scMappR) R package contains a suite of bioinformatic tools that provide experimentally relevant cell-type specific information to a list of differentially expressed genes (DEG). The function "scMappR_and_pathway_analysis" reranks DEGs to generate cell-type specificity scores called cell-weighted fold-changes. Users input a list of DEGs, normalized counts, and a signature matrix into this function. scMappR then re-weights bulk DEGs by cell-type specific expression from the signature matrix, cell-type proportions from RNA-seq deconvolution and the ratio of cell-type proportions between the two conditions to account for changes in cell-type proportion. With cwFold-changes calculated, scMappR uses two approaches to utilize cwFold-changes to complete cell-type specific pathway analysis. The "process_dgTMatrix_lists" function in the scMappR package contains an automated scRNA-seq processing pipeline where users input scRNA-seq count data, which is made compatible for scMappR and other R packages that analyze scRNA-seq data. We further used this pipeline to store hundreds of regularly updated signature matrices. The functions "tissue_by_celltype_enrichment", "tissue_scMappR_internal", and "tissue_scMappR_custom" combine these consistently processed scRNAseq count data with gene-set enrichment tools to allow for cell-type marker enrichment of a generic gene list (e.g. GWAS hits). Reference: Sokolowski,D.J., Faykoo-Martinez,M., Erdman,L., Hou,H., Chan,C., Zhu,H., Holmes,M.M., Goldenberg,A. and Wilson,M.D. (2021) Single-cell mapper (scMappR): using scRNA-seq to infer cell-type specificities of differentially expressed genes. NAR Genomics and Bioinformatics. 3(1). lqab011. <doi:10.1093/nargab/lqab011>.

Maintained by Dustin Sokolowski. Last updated 2 years ago.

4 stars 3.30 score 9 scripts

suman762

PredictABEL:Assessment of Risk Prediction Models

We included functions to assess the performance of risk models. The package contains functions for the various measures that are used in empirical studies, including univariate and multivariate odds ratios (OR) of the predictors, the c-statistic (or area under the receiver operating characteristic (ROC) curve (AUC)), Hosmer-Lemeshow goodness of fit test, reclassification table, net reclassification improvement (NRI) and integrated discrimination improvement (IDI). Also included are functions to create plots, such as risk distributions, ROC curves, calibration plots, discrimination box plots and predictiveness curves. In addition to functions to assess the performance of risk models, the package includes functions to obtain weighted and unweighted risk scores as well as predicted risks using logistic regression analysis. These logistic regression functions are specifically written for models that include genetic variables, but they can also be applied to models that are based on non-genetic risk factors only. Finally, the package includes a function to construct a simulated dataset with genotypes, genetic risks, and disease status for a hypothetical population, which is used for the evaluation of genetic risk models.

Maintained by Suman Kundu. Last updated 5 years ago.

2 stars 3.26 score 91 scripts
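The two reclassification measures named in the PredictABEL entry, NRI and IDI, reduce to simple arithmetic on the predicted risks of an old and a new model. The sketch below follows the commonly used textbook definitions (categorical NRI over fixed risk cutoffs, and IDI as the difference in discrimination slopes). It is not PredictABEL's code; the function names `idi` and `categorical_nri` and the toy data are hypothetical.

```python
from statistics import mean


def idi(p_old, p_new, y):
    """Integrated discrimination improvement: the gain in mean predicted risk
    among events minus the gain among non-events (new model versus old)."""
    ev_old = [p for p, t in zip(p_old, y) if t == 1]
    ev_new = [p for p, t in zip(p_new, y) if t == 1]
    ne_old = [p for p, t in zip(p_old, y) if t == 0]
    ne_new = [p for p, t in zip(p_new, y) if t == 0]
    return (mean(ev_new) - mean(ev_old)) - (mean(ne_new) - mean(ne_old))


def categorical_nri(p_old, p_new, y, cutoffs):
    """Categorical NRI: net share of events moved up a risk category plus
    net share of non-events moved down, for the given category cutoffs."""
    def category(p):
        # Number of cutoffs at or below p gives the risk category index.
        return sum(p >= c for c in cutoffs)

    up_e = down_e = up_n = down_n = n_e = n_n = 0
    for po, pn, t in zip(p_old, p_new, y):
        move = category(pn) - category(po)
        if t == 1:
            n_e += 1
            up_e += move > 0
            down_e += move < 0
        else:
            n_n += 1
            up_n += move > 0
            down_n += move < 0
    return (up_e - down_e) / n_e + (down_n - up_n) / n_n


p_old = [0.2, 0.4, 0.6, 0.3]
p_new = [0.1, 0.7, 0.8, 0.2]
y = [0, 1, 1, 0]
print(round(idi(p_old, p_new, y), 3))                   # 0.35
print(categorical_nri(p_old, p_new, y, cutoffs=[0.5]))  # 0.5
```

With the single cutoff of 0.5, one of the two events moves up a category and no non-event moves, giving an NRI of 0.5; the IDI combines a 0.25 rise in mean risk for events with a 0.10 drop for non-events.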

cran

tmle:Targeted Maximum Likelihood Estimation

Targeted maximum likelihood estimation of point treatment effects (Targeted Maximum Likelihood Learning, The International Journal of Biostatistics, 2(1), 2006). This version automatically estimates the additive treatment effect among the treated (ATT) and among the controls (ATC). The tmle() function calculates the adjusted marginal difference in mean outcome associated with a binary point treatment, for continuous or binary outcomes. Relative risk and odds ratio estimates are also reported for binary outcomes. Missingness in the outcome is allowed, but not in treatment assignment or baseline covariate values. The population mean is calculated when there is missingness, and no variation in the treatment assignment. The tmleMSM() function estimates the parameters of a marginal structural model for a binary point treatment effect. Effect estimation stratified by a binary mediating variable is also available. An ID argument can be used to identify repeated measures. Default settings call 'SuperLearner' to estimate the Q and g portions of the likelihood, unless values or a user-supplied regression function are passed in as arguments.

Maintained by Susan Gruber. Last updated 10 months ago.

1 star 3.26 score 3 dependents

cefet-rj-dal

daltoolboxdp:Data Pre-Processing Extensions

An important aspect of data analytics is related to data management support for artificial intelligence. It is related to preparing data correctly. This package provides extensions to support data preparation in terms of both data sampling and data engineering. Overall, the package provides researchers with a comprehensive set of functionalities for data science based on experiment lines, promoting ease of use, extensibility, and integration with various tools and libraries. Information on Experiment Line is based on Ogasawara et al. (2009) <doi:10.1007/978-3-642-02279-1_20>.

Maintained by Eduardo Ogasawara. Last updated 4 months ago.

openjdk

1 star 3.26 score 12 scripts

abshev

superMICE:SuperLearner Method for MICE

Adds a Super Learner ensemble model method (using the 'SuperLearner' package) to the 'mice' package. Laqueur, H. S., Shev, A. B., Kagawa, R. M. C. (2021) <doi:10.1093/aje/kwab271>.

Maintained by Aaron B. Shev. Last updated 3 years ago.

3 stars 3.18 score

tlverse

tmle3mediate:Targeted Learning for Causal Mediation Analysis

Targeted maximum likelihood (TML) estimation of population-level causal effects in mediation analysis. The causal effects are defined by joint static or stochastic interventions applied to the exposure and the mediator. Targeted doubly robust estimators are provided for the classical natural direct and indirect effects, as well as the more recently developed population intervention direct and indirect effects.

Maintained by Nima Hejazi. Last updated 4 years ago.

causal-inference causal-mediation-analysis machine-learning mediation-analysis stochastic-interventions targeted-learning treatment-effects

6 stars 2.98 score 16 scripts

eogasawara

tspredit:Time Series Prediction Integrated Tuning

Prediction is one of the most important activities while working with time series. There are many alternative ways to model the time series, and finding the right one is challenging. Most data-driven models (either statistical or machine learning) demand tuning. Setting them right is mandatory for good predictions. It is even more complex since time series prediction also demands choosing a data pre-processing that complies with the chosen model. Many time series frameworks have features to build and tune models. The package differs as it provides a framework that seamlessly integrates tuning data pre-processing activities with the building of models. The package provides functions for defining and conducting time series prediction, including data pre(post)processing, decomposition, tuning, modeling, prediction, and accuracy assessment. More information is available at Izau et al. <doi:10.5753/sbbd.2022.224330>.

Maintained by Eduardo Ogasawara. Last updated 4 months ago.

2.92 score 56 scripts

granatumx

lilikoi:Metabolomics Personalized Pathway Analysis Tool

A comprehensive analysis tool for metabolomics data. It consists of a variety of functional modules, including several new modules: a pre-processing module for normalization and imputation, an exploratory data analysis module for dimension reduction and source of variation analysis, a classification module with the new deep-learning method and other machine-learning methods, a prognosis module with Cox-PH and neural-network-based Cox-nnet methods, and a pathway analysis module to visualize the pathway and interpret metabolite-pathway relationships. References: H. Paul Benton <http://www.metabolomics-forum.com/index.php?topic=281.0> Jeff Xia <https://github.com/cangfengzhe/Metabo/blob/master/MetaboAnalyst/website/name_match.R> Travers Ching, Xun Zhu, Lana X. Garmire (2018) <doi:10.1371/journal.pcbi.1006076>.

Maintained by Lana Garmire. Last updated 2 years ago.

openjdk

1 star 2.85 score 14 scripts

obenno

scSpotlight:A Single Cell Analysis Shiny App

A single cell analysis (viewer) app based on Seurat.

Maintained by Zhixia Xiao. Last updated 8 months ago.

seurat shiny-apps single-cell

2 stars 2.78 score

clementbenard

sirus:Stable and Interpretable RUle Set

A regression and classification algorithm based on random forests, which takes the form of a short list of rules. SIRUS combines the simplicity of decision trees with a predictivity close to random forests. The core aggregation principle of random forests is kept, but instead of aggregating predictions, SIRUS aggregates the forest structure: the most frequent nodes of the forest are selected to form a stable rule ensemble model. The algorithm is fully described in the following articles: Benard C., Biau G., da Veiga S., Scornet E. (2021), Electron. J. Statist., 15:427-505 <DOI:10.1214/20-EJS1792> for classification, and Benard C., Biau G., da Veiga S., Scornet E. (2021), AISTATS, PMLR 130:937-945 <http://proceedings.mlr.press/v130/benard21a>, for regression. This R package is a fork from the project ranger (<https://github.com/imbs-hl/ranger>).

Maintained by Clement Benard. Last updated 3 years ago.

cpp

2.78 score 12 scripts

jamesliley

SPARRAfairness:Analysis of Differential Behaviour of SPARRA Score Across Demographic Groups

The SPARRA risk score (Scottish Patients At Risk of admission and Re-Admission) estimates yearly risk of emergency hospital admission using electronic health records on a monthly basis for most of the Scottish population. This package implements a suite of functions used to analyse the behaviour and performance of the score, focusing particularly on differential performance over demographically-defined groups. It includes useful utility functions to plot receiver operating characteristic, precision-recall and calibration curves, draw stock human figures, estimate counterfactual quantities without the need to re-compute risk scores, and simulate a semi-realistic dataset.

Maintained by James Liley. Last updated 5 months ago.

2.70 score 4 scripts

yuepan027

scpoisson:Single Cell Poisson Probability Paradigm

Useful to visualize the Poissoneity (an independent Poisson statistical framework, where each RNA measurement for each cell comes from its own independent Poisson distribution) of Unique Molecular Identifier (UMI) based single cell RNA sequencing (scRNA-seq) data, and explore cell clustering based on model departure as a novel data representation.

Maintained by Yue Pan. Last updated 3 years ago.

2.70 score 4 scripts

jaydevine

pheble:Classifying High-Dimensional Phenotypes with Ensemble Learning

A system for binary and multi-class classification of high-dimensional phenotypic data using ensemble learning. By combining predictions from different classification models, this package attempts to improve performance over individual learners. The pre-processing, training, validation, and testing are performed end-to-end to minimize user input and simplify the process of classification.

Maintained by Jay Devine. Last updated 2 years ago.

2.70 score

lauren-eylerdang

EScvtmle:Experiment-Selector CV-TMLE for Integration of Observational and RCT Data

The experiment selector cross-validated targeted maximum likelihood estimator (ES-CVTMLE) aims to select the experiment that optimizes the bias-variance tradeoff for estimating a causal average treatment effect (ATE) where different experiments may include a randomized controlled trial (RCT) alone or an RCT combined with real-world data. Using cross-validation, the ES-CVTMLE separates the selection of the optimal experiment from the estimation of the ATE for the chosen experiment. The estimated bias term in the selector is a function of the difference in conditional mean outcome under control for the RCT compared to the combined experiment. In order to help include truly unbiased external data in the analysis, the estimated average treatment effect on a negative control outcome may be added to the bias term in the selector. For more details about this method, please see Dang et al. (2022) <arXiv:2210.05802>.

Maintained by Lauren Eyler Dang. Last updated 2 years ago.

2.70 score 4 scripts

doktorandahl

evinf:Inference with Extreme Value Inflated Count Data

Allows users to model and draw inferences from extreme value inflated count data, and to evaluate these models and compare to non extreme-value inflated counterparts. The package is built to be compatible with standard presentation tools such as 'broom', 'tidy', and 'modelsummary'.

Maintained by David Randahl. Last updated 11 months ago.

openblas cpp openmp

1 star 2.70 score

sridhara-omics

scPipeline:A Wrapper for 'Seurat' and Related R Packages for End-to-End Single Cell Analysis

Reports marker lists, differentially expressed genes, associated pathways and cell-type annotations, and performs batch correction and other related single cell analyses, all wrapped within 'Seurat'.

Maintained by Viswanadham Sridhara. Last updated 27 days ago.

2.70 score

schiebout

CAMML:Cell-Typing using Variance Adjusted Mahalanobis Distances with Multi-Labeling

Creates multi-label cell-types for single-cell RNA-sequencing data based on weighted VAM scoring of cell-type specific gene sets. Schiebout, Frost (2022) <https://psb.stanford.edu/psb-online/proceedings/psb22/schiebout.pdf>.

Maintained by Courtney Schiebout. Last updated 1 year ago.

2.60 score

csoneson

ConfoundingExplorer:Confounding Explorer

This package provides a simple interactive application for investigating the effect of confounding between a signal of interest and a batch effect. It uses simulated data with user-specified effect sizes for both batch and condition effects. The user can also specify the number of samples in each condition and batch, and thereby the degree of confounding.

Maintained by Charlotte Soneson. Last updated 3 months ago.

regression experimentaldesign multiplecomparison batcheffect

2 stars 2.60 score 3 scripts

promidat

predictoR:Predictive Data Analysis System

Perform a supervised data analysis on a database through a 'shiny' graphical interface. It includes methods such as K-Nearest Neighbors, Decision Trees, ADA Boosting, Extreme Gradient Boosting, Random Forest, Neural Networks, Deep Learning, Support Vector Machines and Bayesian Methods.

Maintained by Oldemar Rodriguez. Last updated 1 year ago.

1 star 2.60 score 3 scripts

cran

BioM2:Biologically Explainable Machine Learning Framework

Biologically Explainable Machine Learning Framework for Phenotype Prediction using omics data described in Chen and Schwarz (2017) <doi:10.48550/arXiv.1712.00336>. Identifying reproducible and interpretable biological patterns from high-dimensional omics data is a critical factor in understanding the risk mechanism of complex disease. As such, explainable machine learning can offer biological insight in addition to personalized risk scoring. In this process, a feature space of biological pathways will be generated, and the feature space can also be subsequently analyzed using WGCNA methods (described in Horvath and Zhang (2005) <doi:10.2202/1544-6115.1128> and Langfelder and Horvath (2008) <doi:10.1186/1471-2105-9-559>).

Maintained by Shunjie Zhang. Last updated 1 month ago.

2.54 score

igordot

scooter:Streamlined scRNA-Seq Analysis Pipeline

Streamlined scRNA-Seq analysis pipeline.

Maintained by Igor Dolgalev. Last updated 1 year ago.

4 stars 2.51 score 16 scripts

mbeer3

gkmSVM:Gapped-Kmer Support Vector Machine

Imports the 'gkmSVM' v2.0 functionalities into R <https://www.beerlab.org/gkmsvm/>. It also uses the 'kernlab' library (separate R package by different authors) for various SVM algorithms. Users should note that the suggested packages 'rtracklayer', 'GenomicRanges', 'BSgenome', 'BiocGenerics', 'Biostrings', 'GenomeInfoDb', 'IRanges', and 'S4Vectors' are all BioConductor packages <https://bioconductor.org>.

Maintained by Mike Beer. Last updated 2 years ago.

cpp

2.48 score 30 scripts

cran

SPECK:Receptor Abundance Estimation using Reduced Rank Reconstruction and Clustered Thresholding

Surface Protein abundance Estimation using CKmeans-based clustered thresholding ('SPECK') is an unsupervised learning-based method that performs receptor abundance estimation for single cell RNA-sequencing data based on reduced rank reconstruction (RRR) and a clustered thresholding mechanism. Seurat's normalization method is described in: Hao et al., (2021) <doi:10.1016/j.cell.2021.04.048>, Stuart et al., (2019) <doi:10.1016/j.cell.2019.05.031>, Butler et al., (2018) <doi:10.1038/nbt.4096> and Satija et al., (2015) <doi:10.1038/nbt.3192>. Method for the RRR is further detailed in: Erichson et al., (2019) <doi:10.18637/jss.v089.i11> and Halko et al., (2009) <arXiv:0909.4061>. Clustering method is outlined in: Song et al., (2020) <doi:10.1093/bioinformatics/btaa613> and Wang et al., (2011) <doi:10.32614/RJ-2011-015>.

Maintained by Azka Javaid. Last updated 1 year ago.

2.48 score 1 dependent

cran

scaper:Single Cell Transcriptomics-Level Cytokine Activity Prediction and Estimation

Generates cell-level cytokine activity estimates using relevant information from gene sets constructed with the 'CytoSig' and the 'Reactome' databases and scored using the modified 'Variance-adjusted Mahalanobis (VAM)' framework for single-cell RNA-sequencing (scRNA-seq) data. 'CytoSig' database is described in: Jiang et al., (2021) <doi:10.1038/s41592-021-01274-5>. 'Reactome' database is described in: Gillespie et al., (2021) <doi:10.1093/nar/gkab1028>. The 'VAM' method is outlined in: Frost (2020) <doi:10.1093/nar/gkaa582>.

Maintained by Azka Javaid. Last updated 1 year ago.

2.30 score

bioc

maPredictDSC:Phenotype prediction using microarray data: approach of the best overall team in the IMPROVER Diagnostic Signature Challenge

This package implements the classification pipeline of the best overall team (Team221) in the IMPROVER Diagnostic Signature Challenge. Additional functionality is added to compare 27 combinations of data preprocessing, feature selection and classifier types.

Maintained by Adi Laurentiu Tarca. Last updated 5 months ago.

microarray classification

2.30 score 2 scripts

yangfengstat

nproc:Neyman-Pearson (NP) Classification Algorithms and NP Receiver Operating Characteristic (NP-ROC) Curves

In many binary classification applications, such as disease diagnosis and spam detection, practitioners commonly face the need to limit type I error (i.e., the conditional probability of misclassifying a class 0 observation as class 1) so that it remains below a desired threshold. To address this need, the Neyman-Pearson (NP) classification paradigm is a natural choice; it minimizes type II error (i.e., the conditional probability of misclassifying a class 1 observation as class 0) while enforcing an upper bound, alpha, on the type I error. Although the NP paradigm has a century-long history in hypothesis testing, it has not been well recognized and implemented in classification schemes. Common practices that directly limit the empirical type I error to no more than alpha do not satisfy the type I error control objective because the resulting classifiers are still likely to have type I errors much larger than alpha. As a result, the NP paradigm has not been properly implemented for many classification scenarios in practice. In this work, we develop the first umbrella algorithm that implements the NP paradigm for all scoring-type classification methods, including popular methods such as logistic regression, support vector machines and random forests. Powered by this umbrella algorithm, we propose a novel graphical tool for NP classification methods: NP receiver operating characteristic (NP-ROC) bands, motivated by the popular receiver operating characteristic (ROC) curves. NP-ROC bands help users choose NP classifiers in a data-adaptive way and compare different NP classifiers.

Maintained by Yang Feng. Last updated 5 years ago.

2.23 score 17 scripts
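The core of the umbrella algorithm can be illustrated with a short sketch. This is a minimal Python illustration based on our reading of the published NP umbrella algorithm, not nproc's R API: held-out class 0 scores are sorted, and the threshold is the k-th order statistic for the smallest k whose bound on P(type I error > alpha) is at most a tolerance delta. The names `np_order_index`, `np_threshold`, and the `delta` parameter are assumptions introduced for illustration.

```python
from math import comb


def np_order_index(n: int, alpha: float, delta: float) -> int:
    """Smallest 1-based rank k whose violation bound is <= delta (hypothetical helper)."""
    for k in range(1, n + 1):
        # Upper bound on P(type I error > alpha) when thresholding at the
        # k-th smallest of n held-out class 0 scores.
        violation = sum(
            comb(n, j) * (1 - alpha) ** j * alpha ** (n - j) for j in range(k, n + 1)
        )
        if violation <= delta:
            return k
    raise ValueError("too few class 0 samples: need n >= log(delta) / log(1 - alpha)")


def np_threshold(class0_scores, alpha=0.05, delta=0.05):
    """Threshold such that predicting class 1 when score > threshold controls type I error."""
    scores = sorted(class0_scores)
    k = np_order_index(len(scores), alpha, delta)
    return scores[k - 1]
```

With alpha = delta = 0.05, at least 59 held-out class 0 scores are required; with exactly 59, the threshold is the largest of them.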

hugometric

causalweight:Estimation Methods for Causal Inference Based on Inverse Probability Weighting and Doubly Robust Estimation

Various estimators of causal effects based on inverse probability weighting, doubly robust estimation, and double machine learning. Specifically, the package includes methods for estimating average treatment effects, direct and indirect effects in causal mediation analysis, and dynamic treatment effects. The methods are based on studies by Froelich (2007) <doi:10.1016/j.jeconom.2006.06.004>, Huber (2012) <doi:10.3102/1076998611411917>, Huber (2014) <doi:10.1080/07474938.2013.806197>, Huber (2014) <doi:10.1002/jae.2341>, Froelich and Huber (2017) <doi:10.1111/rssb.12232>, Hsu, Huber, Lee, and Lettry (2020) <doi:10.1002/jae.2765>, and others.

Maintained by Hugo Bodory. Last updated 8 days ago.

2 stars 2.12 score 22 scripts
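To show what inverse probability weighting does, here is a generic normalized (Hajek-type) IPW estimator of the average treatment effect. It is a minimal Python sketch, not causalweight's implementation: it assumes the propensity scores `p` are already estimated, and the function name `ipw_ate` is hypothetical.

```python
def ipw_ate(y, d, p):
    """Normalized IPW estimate of the average treatment effect (hypothetical helper).

    y: outcomes, d: binary treatment indicators, p: estimated propensity scores P(D=1|X).
    """
    # Treated observations are weighted by 1/p, controls by 1/(1-p).
    w1 = [di / pi for di, pi in zip(d, p)]
    w0 = [(1 - di) / (1 - pi) for di, pi in zip(d, p)]
    mean_treated = sum(w * yi for w, yi in zip(w1, y)) / sum(w1)
    mean_control = sum(w * yi for w, yi in zip(w0, y)) / sum(w0)
    return mean_treated - mean_control
```

With constant propensity scores, the estimate reduces to the simple difference in group means: `ipw_ate([1, 2, 3, 4], [1, 1, 0, 0], [0.5] * 4)` gives `1.5 - 3.5 = -2.0`.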

cran

STREAK:Receptor Abundance Estimation using Feature Selection and Gene Set Scoring

Performs receptor abundance estimation for single cell RNA-sequencing data using a supervised feature selection mechanism and a thresholded gene set scoring procedure. Seurat's normalization method is described in: Hao et al., (2021) <doi:10.1016/j.cell.2021.04.048>, Stuart et al., (2019) <doi:10.1016/j.cell.2019.05.031>, Butler et al., (2018) <doi:10.1038/nbt.4096> and Satija et al., (2015) <doi:10.1038/nbt.3192>. Method for reduced rank reconstruction and rank-k selection is detailed in: Javaid et al., (2022) <doi:10.1101/2022.10.08.511197>. Gene set scoring procedure is described in: Frost et al., (2020) <doi:10.1093/nar/gkaa582>. Clustering method is outlined in: Song et al., (2020) <doi:10.1093/bioinformatics/btaa613> and Wang et al., (2011) <doi:10.32614/RJ-2011-015>.

Maintained by Azka Javaid. Last updated 1 year ago.

2.00 score 2 scripts

cran

VIMPS:Calculate Variable Importance with Knock Off Variables

Variable importance is calculated using knockoff variables, and the output can be provided in numerical and graphical form. Meredith L Wallace (2023) <doi:10.1186/s12874-023-01965-x>.

Maintained by Meredith Wallace. Last updated 1 year ago.

2.00 score

drelliesmall

phm:Phrase Mining

Functions to extract and handle commonly occurring principal phrases obtained from collections of texts.

Maintained by Ellie Small. Last updated 1 year ago.

1 star 2.00 score 2 scripts

raznargimeno

SLModels:Stepwise Linear Models for Binary Classification Problems under Youden Index Optimisation

Stepwise models for the optimal linear combination of continuous variables in binary classification problems under Youden Index optimisation. Information on the models implemented can be found at Aznar-Gimeno et al. (2021) <doi:10.3390/math9192497>.

Maintained by Rocio Aznar-Gimeno. Last updated 3 years ago.

2.00 score
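The Youden index optimized by these models is J = sensitivity + specificity - 1. The sketch below shows only the final step, choosing the cut-off for a single already-combined score; how SLModels builds the stepwise linear combination is not reproduced. It is written in Python, and the function name `youden_best_threshold` is hypothetical.

```python
def youden_best_threshold(scores, labels):
    """Return (J, threshold) maximizing J = sensitivity + specificity - 1.

    Predicts class 1 when score >= threshold; labels are 0/1 (hypothetical helper).
    """
    pos = sum(labels)
    neg = len(labels) - pos
    best_j, best_t = -1.0, None
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < t and y == 0)
        j = tp / pos + tn / neg - 1
        if j > best_j:  # keep the lowest threshold among ties
            best_j, best_t = j, t
    return best_j, best_t
```

For perfectly separated scores `[1, 2, 3, 4]` with labels `[0, 0, 1, 1]`, the result is `(1.0, 3)`.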

ictml-project

ictml:Easily Install and Load the 'ICTML'

The 'ICTML' is a software suite implementing the Interpretable Causal Targeted Machine Learning ('ICTML') Project <https://www.ictml.org>.

Maintained by Henrik Bengtsson. Last updated 5 months ago.

2.00 score

lcougnaud

nlcv:Nested Loop Cross Validation

Nested loop cross validation for classification purposes for misclassification error rate estimation. The package supports several methodologies for feature selection: random forest, Student t-test, limma, and provides an interface to the following classification methods in the 'MLInterfaces' package: linear, quadratic discriminant analyses, random forest, bagging, prediction analysis for microarray, generalized linear model, support vector machine (svm and ksvm). Visualizations to assess the quality of the classifier are included: plot of the ranks of the features, scores plot for a specific classification algorithm and number of features, misclassification rate for the different number of features and classification algorithms tested and ROC plot. For further details about the methodology, please check: Markus Ruschhaupt, Wolfgang Huber, Annemarie Poustka, and Ulrich Mansmann (2004) <doi:10.2202/1544-6115.1078>.

Maintained by Laure Cougnaud. Last updated 7 years ago.

2.00 score 8 scripts

aughunter

autoScorecard:Fully Automatic Generation of Scorecards

Provides an efficient suite of R tools for scorecard modeling, analysis, and visualization. It includes equal frequency binning, equidistant binning, K-means binning, chi-square binning, decision tree binning, data screening, manual parameter modeling, fully automatic generation of scorecards, etc. This package is designed to make scorecard development easier and faster. References include: 1. <http://shichen.name/posts/>. 2. Dong-feng Li (Peking University), Class PPT. 3. <https://zhuanlan.zhihu.com/p/389710022>. 4. <https://www.zhangshengrong.com/p/281oqR9JNw/>.

Maintained by Tai-Sen Zheng. Last updated 2 years ago.

2.00 score 2 scripts

ubcxzhang

scAnnotate:An Automated Cell Type Annotation Tool for Single-Cell RNA-Sequencing Data

An entirely data-driven cell type annotation tool, which requires training data to learn the classifier but not biological knowledge to make subjective decisions. It consists of three steps: preprocessing training and test data, model fitting on training data, and cell classification on test data. See Xiangling Ji, Danielle Tsao, Kailun Bai, Min Tsao, Li Xing, Xuekui Zhang (2022) <doi:10.1101/2022.02.19.481159> for more details.

Maintained by Xuekui Zhang. Last updated 1 year ago.

2.00 score 4 scripts

cran

PoweREST:A Bootstrap-Based Power Estimation Tool for Spatial Transcriptomics

Power estimation and sample size calculation for 10X Visium Spatial Transcriptomics data to detect differentially expressed genes between two conditions based on bootstrap resampling. See Shui et al. (2024) <doi:10.1101/2024.08.30.610564> for method details.

Maintained by Lan Shui. Last updated 7 months ago.

2.00 score

syeonkang

causal.decomp:Causal Decomposition Analysis

We implement causal decomposition analysis using the methods proposed by Park, Lee, and Qin (2020) and Park, Kang, and Lee (2021+) <arXiv:2109.06940>. This package allows researchers to use the multiple-mediator-imputation, single-mediator-imputation, and product-of-coefficients regression methods to estimate the initial disparity, disparity reduction, and disparity remaining. It also allows inference to be made conditional on baseline covariates. We also implement sensitivity analysis for the causal decomposition analysis using R-squared values as sensitivity parameters (Park, Kang, Lee, and Ma, 2023).

Maintained by Suyeon Kang. Last updated 2 years ago.

2.00 score 4 scripts

grosenberger

aLFQ:Estimating Absolute Protein Quantities from Label-Free LC-MS/MS Proteomics Data

Determination of absolute protein quantities is necessary for multiple applications, such as mechanistic modeling of biological systems. Quantitative liquid chromatography tandem mass spectrometry (LC-MS/MS) proteomics can measure relative protein abundance on a system-wide scale. Estimating absolute quantities from these relative abundance measurements requires additional information, such as heavy-labeled references of known concentration. Multiple methods have been developed using different references and strategies; some are easily available, whereas others require more effort on the user's end. Hence, we believe the field might benefit from making some of these methods available under an automated framework, which also facilitates validation of the chosen strategy. We have implemented the most commonly used absolute label-free protein abundance estimation methods for LC-MS/MS modes quantifying on either MS1- or MS2-levels or on spectral counts, together with validation algorithms to enable automated data analysis and error estimation. Specifically, we used Monte Carlo cross-validation and bootstrapping for model selection and imputation of proteome-wide absolute protein quantity estimation. Our open-source software is written in the statistical programming language R and is validated and demonstrated on a synthetic sample.

Maintained by George Rosenberger. Last updated 5 years ago.

1.85 score 14 scripts

dzhang777

SlideCNA:Calls Copy Number Alterations from Slide-Seq Data

Takes spatial single-cell-type RNA-seq data (specifically designed for Slide-seq v2), calls copy number alterations (CNAs) using pseudo-spatial binning, clusters cellular units (e.g. beads) based on CNA profile, and visualizes spatial CNA patterns. Documentation about 'SlideCNA' is included in the pre-print by Zhang et al. (2022, <doi:10.1101/2022.11.25.517982>). The package 'enrichR' (>= 3.0), conditionally used to annotate SlideCNA-determined clusters with gene ontology terms, can be installed at <https://github.com/wjawaid/enrichR> or with install_github("wjawaid/enrichR").

Maintained by Diane Zhang. Last updated 2 months ago.

1.70 score 3 scripts

cran

crossurr:Cross-Fitting for Doubly Robust Evaluation of High-Dimensional Surrogate Markers

Doubly robust methods for evaluating surrogate markers as outlined in: Agniel D, Hejblum BP, Thiebaut R & Parast L (2022). "Doubly robust evaluation of high-dimensional surrogate markers", Biostatistics <doi:10.1093/biostatistics/kxac020>. You can use these methods to determine how much of the overall treatment effect is explained by a (possibly high-dimensional) set of surrogate markers.

Maintained by Denis Agniel. Last updated 10 months ago.

1.70 score

sduxbury

netmediate:Micro-Macro Analysis for Social Networks

Estimates micro effects on macro structures (MEMS) and average micro mediated effects (AMME). URL: <https://github.com/sduxbury/netmediate>. BugReports: <https://github.com/sduxbury/netmediate/issues>. Robins, Garry, Phillipa Pattison, and Jodie Woolcock (2005) <doi:10.1086/427322>. Snijders, Tom A. B., and Christian E. G. Steglich (2015) <doi:10.1177/0049124113494573>. Imai, Kosuke, Luke Keele, and Dustin Tingley (2010) <doi:10.1037/a0020761>. Duxbury, Scott (2023) <doi:10.1177/00811750231209040>. Duxbury, Scott (2024) <doi:10.1177/00811750231220950>.

Maintained by Scott Duxbury. Last updated 10 months ago.

1.70 score

bozercavdar

less:Learning with Subset Stacking

"Learning with Subset Stacking" is a supervised learning algorithm that is based on training many local estimators on subsets of a given dataset, and then passing their predictions to a global estimator. You can find the details about LESS in our manuscript at <arXiv:2112.06251>.

Maintained by Burhan Ozer Cavdar. Last updated 3 years ago.

1.70 score 5 scripts

kirin666

mccf1:Creates the MCC-F1 Curve and Calculates the MCC-F1 Metric and the Best Threshold

The MCC-F1 analysis is a method to evaluate the performance of binary classifiers. Under imbalanced ground truth, the MCC-F1 curve is more reliable than the Receiver Operating Characteristic (ROC) curve and the Precision-Recall (PR) curve. The MCC-F1 analysis also provides the MCC-F1 metric, which integrates classifier performance over varying thresholds, and the best threshold for binary classification.

Maintained by Chang Cao. Last updated 5 years ago.

1.61 score 41 scripts
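As a rough illustration of how the curve is traced, the Python sketch below computes, for each candidate threshold, the F1 score and the unit-normalized MCC, (MCC + 1) / 2, which are the two axes of the MCC-F1 curve. It does not compute the MCC-F1 metric or best threshold reported by the package, and the function names are hypothetical. As a convention of this sketch, MCC is set to 0 when its denominator is zero.

```python
import math


def confusion(scores, labels, t):
    """Counts (tp, fp, tn, fn) when predicting class 1 for score >= t."""
    tp = fp = tn = fn = 0
    for s, y in zip(scores, labels):
        pred = s >= t
        if pred and y == 1:
            tp += 1
        elif pred:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def mcc_f1_points(scores, labels):
    """List of (threshold, F1, unit-normalized MCC) points (hypothetical helper)."""
    points = []
    for t in sorted(set(scores)):
        tp, fp, tn, fn = confusion(scores, labels, t)
        denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
        mcc = (tp * tn - fp * fn) / denom if denom else 0.0
        f1 = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0
        points.append((t, f1, (mcc + 1) / 2))
    return points
```

For scores `[1, 2, 3, 4]` with labels `[0, 0, 1, 1]`, the threshold 3 separates the classes perfectly and yields the point `(3, 1.0, 1.0)`.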

lau-mel

rocc:ROC Based Classification

Functions for a classification method based on receiver operating characteristics (ROC). Briefly, features are selected according to their ranked AUC value in the training set. The selected features are merged by their mean value to form a meta-gene. The samples are ranked by their meta-gene value, and the meta-gene threshold with the highest accuracy in splitting the training samples is determined. A new sample is classified by its meta-gene value relative to the threshold. The package is aimed primarily at two-class problems in gene expression data, but might also apply to other problems.

Maintained by Martin Lauss. Last updated 5 years ago.

1.56 score 36 scripts
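The steps listed for rocc can be sketched in Python as follows. This is not the package's code. It assumes that higher feature values indicate class 1, so features are ranked by descending AUC, and it treats the number of selected features as a fixed input. The function names `auc`, `fit_rocc_like`, and `predict_rocc_like` are hypothetical.

```python
from statistics import mean


def auc(values, labels):
    """Mann-Whitney AUC: fraction of (positive, negative) pairs ranked correctly."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_rocc_like(X, labels, n_features):
    """X is a list of samples (lists of feature values); labels are 0/1."""
    n_cols = len(X[0])
    aucs = [auc([row[j] for row in X], labels) for j in range(n_cols)]
    # Select the top-ranked features by training-set AUC.
    selected = sorted(range(n_cols), key=lambda j: aucs[j], reverse=True)[:n_features]
    # Meta-gene: mean of the selected features for each sample.
    meta = [mean(row[j] for j in selected) for row in X]
    # Threshold with the highest training accuracy (class 1 if meta >= t).
    best_acc, best_t = -1.0, None
    for t in sorted(set(meta)):
        acc = mean(1.0 if (m >= t) == (y == 1) else 0.0 for m, y in zip(meta, labels))
        if acc > best_acc:
            best_acc, best_t = acc, t
    return selected, best_t


def predict_rocc_like(x, selected, threshold):
    """Classify a new sample by its meta-gene value relative to the threshold."""
    return 1 if mean(x[j] for j in selected) >= threshold else 0
```

A real implementation would also choose the number of features, for example by cross-validation. Here that number is a user-supplied input.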

cblatti3

DRaWR:Discriminative Random Walk with Restart

We present DRaWR, a network-based method for ranking genes or properties related to a given gene set. Such related genes or properties are identified from among the nodes of a large, heterogeneous network of biological information. Our method involves a random walk with restarts, performed on an initial network with multiple node and edge types, preserving more of the original, specific property information than current methods that operate on homogeneous networks. In this first stage of our algorithm, we find the properties that are the most relevant to the given gene set and extract a subnetwork of the original network, comprising only the relevant properties. We then rerank genes by their similarity to the given gene set, based on a second random walk with restarts, performed on the above subnetwork.

Maintained by Charles Blatti. Last updated 3 years ago.

1.48 score 7 scripts
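DRaWR builds on the random walk with restart (RWR). The Python sketch below shows only a basic RWR on a single homogeneous network with a column-normalized adjacency matrix. It does not include DRaWR's heterogeneous node and edge types or its two-stage subnetwork procedure. The function name and the default `restart` probability are assumptions.

```python
def random_walk_with_restart(adj, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - r) W p + r p0 until the L1 change drops below tol.

    adj: square adjacency matrix (list of lists); seeds: set of seed node indices.
    """
    n = len(adj)
    # Column-normalize so each column of W sums to 1 (zero columns stay zero).
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    W = [[adj[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
         for i in range(n)]
    # Restart distribution: uniform over the seed nodes.
    p0 = [1.0 / len(seeds) if i in seeds else 0.0 for i in range(n)]
    p = p0[:]
    for _ in range(max_iter):
        nxt = [(1 - restart) * sum(W[i][j] * p[j] for j in range(n)) + restart * p0[i]
               for i in range(n)]
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p
```

On a path graph 0-1-2-3 seeded at node 0, the resulting scores decrease with distance from the seed.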

drelliesmall

smallstuff:Dr. Small's Functions

Functions used in courses taught by Dr. Small at Drew University.

Maintained by Ellie Small. Last updated 1 year ago.

1.48 score 2 scripts 1 dependents

cran

SpaCCI:Spatially Aware Cell-Cell Interaction Analysis

Provides tools for analyzing spatial cell-cell interactions based on ligand-receptor pairs, including functions for local, regional, and global analysis using spatial transcriptomics data. Integrates with databases like 'CellChat' <http://www.cellchat.org/>, 'CellPhoneDB' <https://www.cellphonedb.org/>, 'Cellinker' <https://www.rna-society.org/cellinker/>, 'ICELLNET' <https://github.com/soumelis-lab/ICELLNET>, and 'ConnectomeDB' <https://humanconnectome.org/software/connectomedb/> to identify ligand-receptor pairs, visualize interactions through heatmaps, chord diagrams, and infer interactions on different spatial scales.

Maintained by Li-Ting Ku. Last updated 2 months ago.

openblas cpp

1.48 score

sduxbury

ergMargins:Process Analysis for Exponential Random Graph Models

Calculates marginal effects and conducts process analysis in exponential family random graph models (ERGM). Includes functions to conduct mediation and moderation analyses and to diagnose multicollinearity. URL: <https://github.com/sduxbury/ergMargins>. BugReports: <https://github.com/sduxbury/ergMargins/issues>. Duxbury, Scott W (2021) <doi:10.1177/0049124120986178>. Long, J. Scott, and Sarah Mustillo (2018) <doi:10.1177/0049124118799374>. Mize, Trenton D. (2019) <doi:10.15195/v6.a4>. Karlson, Kristian Bernt, Anders Holm, and Richard Breen (2012) <doi:10.1177/0081175012444861>. Duxbury, Scott W (2018) <doi:10.1177/0049124118782543>. Duxbury, Scott W, Jenna Wertsching (2023) <doi:10.1016/j.socnet.2023.02.003>. Huang, Peng, Carter Butts (2023) <doi:10.1016/j.socnet.2023.07.001>.

Maintained by Scott Duxbury. Last updated 11 months ago.

1.48 score 3 scripts 1 dependents

jiayiji

CIMTx:Causal Inference for Multiple Treatments with a Binary Outcome

Different methods to conduct causal inference for multiple treatments with a binary outcome, including regression adjustment, vector matching, Bayesian additive regression trees, targeted maximum likelihood and inverse probability of treatment weighting using different generalized propensity score models such as multinomial logistic regression, generalized boosted models and super learner. For more details, see the paper by Hu et al. <doi:10.1177/0962280220921909>.

Maintained by Jiayi Ji. Last updated 3 years ago.

1.43 score 27 scripts

jodamatta

SLOS:ICU Length of Stay Prediction and Efficiency Evaluation

Provides tools for predicting ICU length of stay and assessing ICU efficiency. It is based on the methodologies proposed by Peres et al. (2022, 2023), which utilize data-driven approaches for modeling and validation, offering insights into ICU performance and patient outcomes. References: Peres et al. (2022) <https://pubmed.ncbi.nlm.nih.gov/35988701/>, Peres et al. (2023) <https://pubmed.ncbi.nlm.nih.gov/37922007/>. More information: <https://github.com/igor-peres/ICU-Length-of-Stay-Prediction>.

Maintained by Joana da Matta. Last updated 2 months ago.

1.30 score

sg-tlr

twoStageDesignTMLE:Targeted Maximum Likelihood Estimation for Two-Stage Study Design

An inverse probability of censoring weighted (IPCW) targeted maximum likelihood estimator (TMLE) for evaluating a marginal point treatment effect from data where some variables were collected on only a subset of participants using a two-stage design (or marginal mean outcome for a single arm study). A TMLE for conditional parameters defined by a marginal structural model (MSM) is also available.

Maintained by Susan Gruber. Last updated 2 months ago.

1.30 score

cran

regressoR:Regression Data Analysis System

Perform a supervised data analysis on a database through a 'shiny' graphical interface. It includes methods such as linear regression, penalized regression, k-nearest neighbors, decision trees, ada boosting, extreme gradient boosting, random forest, neural networks, deep learning and support vector machines.

Maintained by Oldemar Rodriguez. Last updated 5 months ago.

2 stars 1.30 score

shuangsong0110

EBPRS:Derive Polygenic Risk Score Based on Empirical Bayes Theory

EB-PRS is a novel method that leverages information for effect sizes across all the markers to improve the prediction accuracy. No parameter tuning is needed in the method, and no external information is needed. This R-package provides the calculation of polygenic risk scores from the given training summary statistics and testing data. We can use EB-PRS to extract main information, estimate Empirical Bayes parameters, derive polygenic risk scores for each individual in testing data, and evaluate the PRS according to AUC and predictive r2. See Song et al. (2020) <doi:10.1371/journal.pcbi.1007565> for a detailed presentation of the method.

Maintained by Shuang Song. Last updated 5 years ago.

2 stars 1.30 score 10 scripts