Showing 200 of 216 results
satijalab
Seurat: Tools for Single Cell Genomics
A toolkit for quality control, analysis, and exploration of single cell RNA sequencing data. 'Seurat' aims to enable users to identify and interpret sources of heterogeneity from single cell transcriptomic measurements, and to integrate diverse types of single cell data. See Satija R, Farrell J, Gennert D, et al (2015) <doi:10.1038/nbt.3192>, Macosko E, Basu A, Satija R, et al (2015) <doi:10.1016/j.cell.2015.05.002>, Stuart T, Butler A, et al (2019) <doi:10.1016/j.cell.2019.05.031>, and Hao, Hao, et al (2020) <doi:10.1101/2020.10.12.335331> for more details.
Maintained by Paul Hoffman. Last updated 1 year ago.
human-cell-atlas, single-cell-genomics, single-cell-rna-seq, cpp
2.4k stars 16.86 score 50k scripts 73 dependents
ecpolley
SuperLearner: Super Learner Prediction
Implements the super learner prediction method and contains a library of prediction algorithms to be used in the super learner.
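The core super learner step weights candidate learners so that the combined prediction minimizes cross-validated risk. Below is a minimal Python sketch for two learners. It assumes their out-of-fold predictions are already available and uses a coarse grid search over one convex weight. This is a simplification for illustration, not how the package computes its weights, and `cv_mse` and `convex_weight` are hypothetical names.

```python
from typing import Sequence, Tuple


def cv_mse(pred: Sequence[float], y: Sequence[float]) -> float:
    """Mean squared error of out-of-fold predictions."""
    return sum((p - t) ** 2 for p, t in zip(pred, y)) / len(y)


def convex_weight(cv_a: Sequence[float], cv_b: Sequence[float],
                  y: Sequence[float], steps: int = 100) -> Tuple[float, float]:
    """Grid-search w in [0, 1] for the blend w*A + (1 - w)*B with the lowest CV risk."""
    best_w, best_risk = 0.0, float("inf")
    for s in range(steps + 1):
        w = s / steps
        blend = [w * a + (1 - w) * b for a, b in zip(cv_a, cv_b)]
        risk = cv_mse(blend, y)
        if risk < best_risk:  # keep the first (smallest) w attaining the minimum
            best_w, best_risk = w, risk
    return best_w, best_risk


# Learner A over-predicts by 1 and learner B under-predicts by 1, so an equal blend cancels both biases.
y = [1.0, 2.0, 3.0, 4.0]
cv_a = [2.0, 3.0, 4.0, 5.0]
cv_b = [0.0, 1.0, 2.0, 3.0]
print(convex_weight(cv_a, cv_b, y))  # (0.5, 0.0)
```

The blend beats either learner alone here, which is the motivation for stacking rather than simply picking the single best learner.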
Maintained by Eric Polley. Last updated 1 year ago.
274 stars 12.85 score 2.1k scripts 36 dependents
yanyachen
MLmetrics: Machine Learning Evaluation Metrics
A collection of evaluation metrics, including loss, score and utility functions, that measure regression, classification and ranking performance.
Maintained by Yachen Yan. Last updated 12 months ago.
69 stars 11.09 score 2.2k scripts 20 dependents
bioc
infercnv: Infer Copy Number Variation from Single-Cell RNA-Seq Data
Using single-cell RNA-Seq expression to visualize CNV in cells.
Maintained by Christophe Georgescu. Last updated 5 months ago.
software, copynumbervariation, variantdetection, structuralvariation, genomicvariation, genetics, transcriptomics, statisticalmethod, bayesian, hiddenmarkovmodel, singlecell, jags, cpp
601 stars 10.92 score 674 scripts
bioc
singleCellTK: Comprehensive and Interactive Analysis of Single Cell RNA-Seq Data
The Single Cell Toolkit (SCTK) in the singleCellTK package provides an interface to popular tools for importing, quality control, analysis, and visualization of single cell RNA-seq data. SCTK allows users to seamlessly integrate tools from various packages at different stages of the analysis workflow. A general "a la carte" workflow gives users the ability to access multiple methods for data importing, calculation of general QC metrics, doublet detection, ambient RNA estimation and removal, filtering, normalization, batch correction or integration, dimensionality reduction, 2-D embedding, clustering, marker detection, differential expression, cell type labeling, pathway analysis, and data exporting. Curated workflows can be used to run Seurat and Celda. Streamlined quality control can be performed on the command line using the SCTK-QC pipeline. Users can analyze their data using commands in the R console or by using an interactive Shiny Graphical User Interface (GUI). Specific analyses or entire workflows can be summarized and shared with comprehensive HTML reports generated by Rmarkdown. Additional documentation and vignettes can be found at camplab.net/sctk.
Maintained by Joshua David Campbell. Last updated 1 month ago.
singlecell, geneexpression, differentialexpression, alignment, clustering, immunooncology, batcheffect, normalization, qualitycontrol, dataimport, gui
182 stars 10.17 score 252 scripts
bioc
SC3: Single-Cell Consensus Clustering
A tool for unsupervised clustering and analysis of single cell RNA-Seq data.
Maintained by Vladimir Kiselev. Last updated 5 months ago.
immunooncology, singlecell, software, classification, clustering, dimensionreduction, supportvectormachine, rnaseq, visualization, transcriptomics, datarepresentation, gui, differentialexpression, transcription, bioconductor-package, human-cell-atlas, single-cell-rna-seq, openblas, cpp
125 stars 10.10 score 374 scripts 1 dependent
constantamateur
SoupX: Single Cell mRNA Soup eXterminator
Quantify, profile and remove ambient mRNA contamination (the "soup") from droplet based single cell RNA-seq experiments. Implements the method described in Young et al. (2018) <doi:10.1101/303727>.
Maintained by Matthew Daniel Young. Last updated 2 years ago.
266 stars 10.08 score 594 scripts 1 dependent
tlverse
sl3: Pipelines for Machine Learning and Super Learning
A modern implementation of the Super Learner prediction algorithm, coupled with a general purpose framework for composing arbitrary pipelines for machine learning tasks.
Maintained by Jeremy Coyle. Last updated 5 months ago.
data-science, ensemble-learning, ensemble-model, machine-learning, model-selection, regression, stacking, statistics
100 stars 9.94 score 748 scripts 7 dependents
jamovi
jmv: The 'jamovi' Analyses
A suite of common statistical methods such as descriptives, t-tests, ANOVAs, regression, correlation matrices, proportion tests, contingency tables, and factor analysis. This package is also usable from the 'jamovi' statistical spreadsheet (see <https://www.jamovi.org> for more information).
Maintained by Jonathon Love. Last updated 1 month ago.
60 stars 9.48 score 440 scripts
stemangiola
tidyseurat: Brings Seurat to the Tidyverse
It creates an invisible layer that allows users to see the 'Seurat' object as a tibble and interact seamlessly with the tidyverse.
Maintained by Stefano Mangiola. Last updated 8 months ago.
assaydomain, infrastructure, rnaseq, differentialexpression, geneexpression, normalization, clustering, qualitycontrol, sequencing, transcription, transcriptomics, dplyr, ggplot2, pca, purrr, sct, seurat, single-cell, single-cell-rna-seq, tibble, tidyr, tidyverse, transcripts, tsne, umap
159 stars 9.48 score 398 scripts 1 dependent
ledell
cvAUC: Cross-Validated Area Under the ROC Curve Confidence Intervals
Tools for working with and evaluating cross-validated area under the ROC curve (AUC) estimators. The primary functions of the package are ci.cvAUC and ci.pooled.cvAUC, which report cross-validated AUC and compute confidence intervals for cross-validated AUC estimates based on influence curves for i.i.d. and pooled repeated measures data, respectively. One benefit to using influence curve based confidence intervals is that they require much less computation time than bootstrapping methods. The utility functions, AUC and cvAUC, are simple wrappers for functions from the ROCR package.
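A cross-validated AUC of the kind described above is the average of the fold-wise AUCs. The Python sketch below assumes each observation's score is an out-of-fold prediction. It computes each fold's AUC with the Mann-Whitney formulation, which is the share of positive/negative pairs ranked correctly with ties counted as one half, and then averages across folds. It does not reproduce the influence-curve confidence intervals, and `auc` and `cv_auc` are hypothetical names rather than the package's functions.

```python
from typing import List, Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Share of (positive, negative) pairs where the positive scores higher; ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("each fold needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def cv_auc(scores: Sequence[float], labels: Sequence[int], folds: Sequence[int]) -> float:
    """Average of the fold-wise AUCs; folds[i] is the validation-fold id of observation i."""
    per_fold: List[float] = []
    for f in sorted(set(folds)):
        idx = [i for i, g in enumerate(folds) if g == f]
        per_fold.append(auc([scores[i] for i in idx], [labels[i] for i in idx]))
    return sum(per_fold) / len(per_fold)


scores = [0.9, 0.2, 0.7, 0.4, 0.8, 0.25, 0.3, 0.1]
labels = [1, 0, 1, 0, 1, 1, 0, 0]
folds = [1, 1, 1, 1, 2, 2, 2, 2]
print(cv_auc(scores, labels, folds))  # 0.875 (fold 1: 1.0, fold 2: 0.75)
```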
Maintained by Erin LeDell. Last updated 3 years ago.
auc, confidence-intervals, cross-validation, machine-learning, statistics, variance
23 stars 9.17 score 317 scripts 40 dependents
bioc
iCOBRA: Comparison and Visualization of Ranking and Assignment Methods
This package provides functions for calculation and visualization of performance metrics for evaluation of ranking and binary classification (assignment) methods. Various types of performance plots can be generated programmatically. The package also contains a shiny application for interactive exploration of results.
Maintained by Charlotte Soneson. Last updated 3 months ago.
16 stars 8.84 score 192 scripts 1 dependent
pablo14
funModeling: Exploratory Data Analysis and Data Preparation Tool-Box
Around 10% of almost any predictive modeling project is spent on predictive modeling itself; 'funModeling' and the book Data Science Live Book (<https://livebook.datascienceheroes.com/>) are intended to cover the remaining 90%: data preparation, profiling, selecting the best variables, 'dataViz', assessing model performance, and other functions.
Maintained by Pablo Casas. Last updated 2 years ago.
100 stars 8.51 score 654 scripts
samuel-marsh
scCustomize: Custom Visualizations & Functions for Streamlined Analyses of Single Cell Sequencing
Collection of functions created and/or curated to aid in the visualization and analysis of single-cell data using 'R'. 'scCustomize' aims to provide 1) customized visualizations to aid ease of use and to create more aesthetic and functional visuals, and 2) improved speed/reproducibility of common tasks/pieces of code in scRNA-seq analysis with a single function or a group of functions. For citation please use: Marsh SE (2021) "Custom Visualizations & Functions for Streamlined Analyses of Single Cell Sequencing" <doi:10.5281/zenodo.5706430> RRID:SCR_024675.
Maintained by Samuel Marsh. Last updated 3 months ago.
customization, ggplot2, scrna-seq, seurat, single-cell, single-cell-genomics, single-cell-rna-seq, visualization
246 stars 8.45 score 1.1k scripts
bioc
projectR: Functions for the projection of weights from PCA, CoGAPS, NMF, correlation, and clustering
Functions for the projection of data into the spaces defined by PCA, CoGAPS, NMF, correlation, and clustering.
Maintained by Genevieve Stein-OBrien. Last updated 15 days ago.
functionalprediction, generegulation, biologicalquestion, software
62 stars 8.42 score 70 scripts
carmonalab
scGate: Marker-Based Cell Type Purification for Single-Cell Sequencing Data
A common bioinformatics task in single-cell data analysis is to purify a cell type or cell population of interest from heterogeneous datasets. 'scGate' automates marker-based purification of specific cell populations, without requiring training data or reference gene expression profiles. Briefly, 'scGate' takes as input: i) a gene expression matrix stored in a 'Seurat' object and ii) a “gating model” (GM), consisting of a set of marker genes that define the cell population of interest. The GM can be as simple as a single marker gene, or a combination of positive and negative markers. More complex GMs can be constructed in a hierarchical fashion, akin to gating strategies employed in flow cytometry. 'scGate' evaluates the strength of signature marker expression in each cell using the rank-based method 'UCell', and then performs k-nearest neighbor (kNN) smoothing by calculating the mean 'UCell' score across neighboring cells. kNN smoothing aims to compensate for the large degree of sparsity in scRNA-seq data. Finally, a universal threshold over kNN-smoothed signature scores is applied in binary decision trees generated from the user-provided gating model, to annotate cells as either “pure” or “impure” with respect to the cell population of interest. See the related publication Andreatta et al. (2022) <doi:10.1093/bioinformatics/btac141>.
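The smoothing-then-threshold step described above can be sketched in Python. This minimal version makes several assumptions. Per-cell signature scores are given as plain numbers rather than computed with 'UCell'. Neighbors are found by brute-force Euclidean kNN on hypothetical 2-D coordinates, with each cell counted as its own neighbor. A single illustrative threshold stands in for scGate's gating-model decision trees. `knn_smooth` and `gate` are hypothetical names, not scGate functions.

```python
import math
from typing import List, Sequence, Tuple


def knn_smooth(coords: Sequence[Tuple[float, float]], scores: Sequence[float], k: int) -> List[float]:
    """Replace each cell's score with the mean score of its k nearest cells (itself included)."""
    smoothed = []
    for ci in coords:
        # brute-force Euclidean distances; O(n^2), fine for a toy example
        order = sorted(range(len(coords)), key=lambda j: math.dist(ci, coords[j]))
        smoothed.append(sum(scores[j] for j in order[:k]) / k)
    return smoothed


def gate(smoothed: Sequence[float], threshold: float) -> List[str]:
    """Apply a single threshold to the smoothed scores."""
    return ["pure" if s >= threshold else "impure" for s in smoothed]


coords = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0)]
scores = [0.9, 0.1, 0.8, 0.0, 0.2]  # cell 1 scores low on its own, e.g. because of sparse counts
print(gate(knn_smooth(coords, scores, k=3), threshold=0.5))
# ['pure', 'pure', 'pure', 'impure', 'impure']
```

Cell 1 is rescued by its high-scoring neighbors, which is the sparsity compensation the description refers to.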
Maintained by Massimo Andreatta. Last updated 2 months ago.
filtering, marker-genes, scgate, signatures, single-cell
106 stars 8.38 score 163 scripts
cefet-rj-dal
harbinger: A Unified Time Series Event Detection Framework
By analyzing time series, it is possible to observe significant changes in the behavior of observations that frequently characterize events. Events present themselves as anomalies, change points, or motifs. In the literature, there are several methods for detecting events. However, searching for a suitable time series method is a complex task, especially considering that the nature of events is often unknown. This work presents Harbinger, a framework for integrating and analyzing event detection methods. Harbinger contains several state-of-the-art methods described in Salles et al. (2020) <doi:10.5753/sbbd.2020.13626>.
Maintained by Eduardo Ogasawara. Last updated 4 months ago.
18 stars 8.32 score 216 scripts
cwolock
survML: Tools for Flexible Survival Analysis Using Machine Learning
Statistical tools for analyzing time-to-event data using machine learning. Implements survival stacking for conditional survival estimation, standardized survival function estimation for current status data, and methods for algorithm-agnostic variable importance. See Wolock CJ, Gilbert PB, Simon N, and Carone M (2024) <doi:10.1080/10618600.2024.2304070>.
Maintained by Charles Wolock. Last updated 1 day ago.
18 stars 8.13 score 73 scripts 1 dependent
bioc
compcodeR: RNAseq data simulation, differential expression analysis and performance comparison of differential expression methods
This package provides extensive functionality for comparing results obtained by different methods for differential expression analysis of RNAseq data. It also contains functions for simulating count data. Finally, it provides convenient interfaces to several packages for performing the differential expression analysis. These can also be used as templates for setting up and running a user-defined differential analysis workflow within the framework of the package.
Maintained by Charlotte Soneson. Last updated 3 months ago.
immunooncology, rnaseq, differentialexpression
12 stars 8.10 score 26 scripts
tlverse
tmle3: The Extensible TMLE Framework
A general framework supporting the implementation of targeted maximum likelihood estimators (TMLEs) of a diverse range of statistical target parameters through a unified interface. The goal is that the exposed framework be as general as the mathematical framework upon which it draws.
Maintained by Jeremy Coyle. Last updated 5 months ago.
causal-inference, machine-learning, targeted-learning, variable-importance
38 stars 7.91 score 286 scripts 5 dependents
myles-lewis
nestedcv: Nested Cross-Validation with 'glmnet' and 'caret'
Implements nested k*l-fold cross-validation for lasso and elastic-net regularised linear models via the 'glmnet' package and other machine learning models via the 'caret' package <doi:10.1093/bioadv/vbad048>. Cross-validation of 'glmnet' alpha mixing parameter and embedded fast filter functions for feature selection are provided. Described as double cross-validation by Stone (1977) <doi:10.1111/j.2517-6161.1977.tb01603.x>. Also implemented is a method using outer CV to measure unbiased model performance metrics when fitting Bayesian linear and logistic regression shrinkage models using the horseshoe prior over parameters to encourage a sparse model as described by Piironen & Vehtari (2017) <doi:10.1214/17-EJS1337SI>.
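To make the nested k*l-fold structure concrete, here is a minimal Python sketch. The inner l folds tune a hyperparameter using only each outer training set. The outer k folds then measure performance on data never used for tuning. Several pieces are hypothetical stand-ins for 'glmnet'/'caret' models and the package's own fold handling: the toy shrunken-mean model, the `lambdas` grid, the unshuffled fold assignment, and the function names.

```python
from statistics import mean
from typing import List, Sequence, Tuple

Split = Tuple[List[int], List[int]]


def kfold(indices: Sequence[int], k: int) -> List[Split]:
    """Deterministic k-fold split: position p goes to test fold p % k (no shuffling)."""
    splits = []
    for f in range(k):
        test = [idx for pos, idx in enumerate(indices) if pos % k == f]
        train = [idx for pos, idx in enumerate(indices) if pos % k != f]
        splits.append((train, test))
    return splits


def fit_predict(y_train: Sequence[float], lam: float) -> float:
    """Hypothetical toy model: the training mean shrunk toward 0 by factor lam."""
    return (1 - lam) * mean(y_train)


def mse(pred: float, ys: Sequence[float]) -> float:
    return mean((y - pred) ** 2 for y in ys)


def nested_cv(y: Sequence[float], lambdas: Sequence[float],
              k_outer: int, l_inner: int) -> Tuple[float, List[float]]:
    outer_risks, chosen = [], []
    for train, test in kfold(range(len(y)), k_outer):
        def inner_risk(lam: float) -> float:
            # inner l-fold CV sees only the outer-training indices
            return mean(mse(fit_predict([y[i] for i in tr], lam), [y[i] for i in te])
                        for tr, te in kfold(train, l_inner))
        best = min(lambdas, key=inner_risk)
        chosen.append(best)
        # refit on the whole outer-training set, then score on the untouched outer test fold
        outer_risks.append(mse(fit_predict([y[i] for i in train], best), [y[i] for i in test]))
    return mean(outer_risks), chosen


y = [10.0, 12.0, 11.0, 13.0, 9.0, 11.0, 12.0, 10.0]
risk, chosen = nested_cv(y, lambdas=[0.0, 0.5, 0.9], k_outer=4, l_inner=3)
print(chosen)  # [0.0, 0.0, 0.0, 0.0]
```

The returned `risk` is the outer-fold estimate; it stays unbiased by tuning because no outer test fold ever enters `inner_risk`.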
Maintained by Myles Lewis. Last updated 14 days ago.
12 stars 7.90 score 46 scripts
schlosslab
mikropml: User-Friendly R Package for Supervised Machine Learning Pipelines
An interface to build machine learning models for classification and regression problems. 'mikropml' implements the ML pipeline described by Topçuoğlu et al. (2020) <doi:10.1128/mBio.00434-20> with reasonable default options for data preprocessing, hyperparameter tuning, cross-validation, testing, model evaluation, and interpretation steps. See the website <https://www.schlosslab.org/mikropml/> for more information, documentation, and examples.
Maintained by Kelly Sovacool. Last updated 2 years ago.
56 stars 7.83 score 86 scripts
nsaph-software
CausalGPS: Matching on Generalized Propensity Scores with Continuous Exposures
Provides a framework for estimating causal effects of a continuous exposure using observational data, and implementing matching and weighting on the generalized propensity score. Wu, X., Mealli, F., Kioumourtzoglou, M.A., Dominici, F. and Braun, D., 2022. Matching on generalized propensity scores with continuous exposures. Journal of the American Statistical Association, pp.1-29.
Maintained by Naeem Khoshnevis. Last updated 10 months ago.
24 stars 7.67 score 39 scripts
yqzhong7
AIPW: Augmented Inverse Probability Weighting
The 'AIPW' package implements the augmented inverse probability weighting, a doubly robust estimator, for average causal effect estimation with user-defined stacked machine learning algorithms. To cite the 'AIPW' package, please use: "Yongqi Zhong, Edward H. Kennedy, Lisa M. Bodnar, Ashley I. Naimi (2021). AIPW: An R Package for Augmented Inverse Probability Weighted Estimation of Average Causal Effects. American Journal of Epidemiology. doi: 10.1093/aje/kwab207". Visit: <https://yqzhong7.github.io/AIPW/> for more information.
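The doubly robust AIPW estimator of the average treatment effect adds inverse-probability-weighted residual corrections to an outcome-regression contrast. The following is a minimal Python sketch of that standard estimator. It assumes the nuisance predictions (outcome models and propensity scores) have already been fitted. Fitting them with the user-defined stacked learners the description mentions, and computing standard errors, are omitted. `aipw_ate` is a hypothetical name, not the package's API.

```python
from statistics import mean
from typing import Sequence


def aipw_ate(y: Sequence[float], a: Sequence[int], mu1: Sequence[float],
             mu0: Sequence[float], pi: Sequence[float]) -> float:
    """AIPW estimate of E[Y(1)] - E[Y(0)] from pre-computed nuisance predictions.

    mu1[i], mu0[i]: predicted outcome for unit i under treatment / control
    pi[i]: predicted propensity P(A = 1 | X = x_i), assumed strictly inside (0, 1)
    """
    n = len(y)
    if not all(len(v) == n for v in (a, mu1, mu0, pi)):
        raise ValueError("all inputs must have the same length")
    terms = []
    for yi, ai, m1, m0, p in zip(y, a, mu1, mu0, pi):
        # outcome-model contrast plus inverse-probability-weighted residual corrections
        terms.append(m1 - m0 + ai * (yi - m1) / p - (1 - ai) * (yi - m0) / (1 - p))
    return mean(terms)


y = [3.0, 1.0, 4.0, 2.0]
a = [1, 0, 1, 0]
pi = [0.5] * 4
# Outcome predictions that match every observed outcome: the residual corrections vanish.
print(aipw_ate(y, a, [3.0, 2.0, 4.0, 3.0], [2.0, 1.0, 3.0, 2.0], pi))  # 1.0
# Uninformative outcome models (all zero) with correct propensities: reduces to plain IPW.
print(aipw_ate(y, a, [0.0] * 4, [0.0] * 4, pi))  # 2.0
```

The two calls illustrate the "doubly robust" idea: either nuisance component can carry the estimate when the other contributes nothing.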
Maintained by Yongqi Zhong. Last updated 17 days ago.
causal-inference, machine-learning, robust-estimators
24 stars 7.65 score 31 scripts 1 dependent
bioc
ggsc: Visualizing Single Cell and Spatial Transcriptomics
Useful functions to visualize single cell and spatial data. It supports visualizing 'Seurat', 'SingleCellExperiment' and 'SpatialExperiment' objects through grammar of graphics syntax implemented in 'ggplot2'.
Maintained by Guangchuang Yu. Last updated 5 months ago.
dimensionreduction, geneexpression, singlecell, software, spatial, transcriptomics, visualization, openblas, cpp, openmp
47 stars 7.59 score 18 scripts
bioc
MOSim: Multi-Omics Simulation (MOSim)
MOSim package simulates multi-omic experiments that mimic regulatory mechanisms within the cell, allowing flexible experimental design including time course and multiple groups.
Maintained by Sonia Tarazona. Last updated 3 days ago.
software, timecourse, experimentaldesign, rnaseq, cpp
9 stars 7.46 score 11 scripts
mwheymans
psfmi: Prediction Model Pooling, Selection and Performance Evaluation Across Multiply Imputed Datasets
Pooling, backward and forward selection of linear, logistic and Cox regression models in multiply imputed datasets. Backward and forward selection can be done from the pooled model using Rubin's Rules (RR), the D1, D2, D3, D4 and the median p-values method. This is also possible for mixed models. The models can contain continuous, dichotomous, categorical and restricted cubic spline predictors and interaction terms between all these types of predictors. The stability of the models can be evaluated using (cluster) bootstrapping. The package further contains functions to pool model performance measures such as ROC/AUC, Reclassification, R-squared, scaled Brier score, H&L test and calibration plots for logistic regression models. Internal validation can be done across multiply imputed datasets with cross-validation or bootstrapping. The adjusted intercept after shrinkage of pooled regression coefficients can be obtained. Backward and forward selection as part of internal validation is possible. A function to externally validate logistic prediction models in multiply imputed datasets is available, as is a function to compare models. For Cox models a strata variable can be included. Eekhout (2017) <doi:10.1186/s12874-017-0404-7>. Wiel (2009) <doi:10.1093/biostatistics/kxp011>. Marshall (2009) <doi:10.1186/1471-2288-9-57>.
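Rubin's Rules, named above, pool a coefficient across m imputed datasets by averaging the estimates and combining within- and between-imputation variance as T = W + (1 + 1/m)B. A minimal Python sketch for a single coefficient follows. It assumes each dataset supplies an estimate and its squared standard error. `rubin_pool` is a hypothetical name, and the package's D1 to D4 and median-p pooling are not covered.

```python
from statistics import mean, variance
from typing import Sequence, Tuple


def rubin_pool(estimates: Sequence[float], variances: Sequence[float]) -> Tuple[float, float]:
    """Pool one coefficient across m imputed datasets with Rubin's Rules.

    Returns (pooled estimate, total variance T = W + (1 + 1/m) * B).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need m >= 2 estimates, each with a squared standard error")
    q_bar = mean(estimates)      # pooled point estimate
    w = mean(variances)          # within-imputation variance
    b = variance(estimates)      # between-imputation variance (divisor m - 1)
    return q_bar, w + (1 + 1 / m) * b


# Toy inputs: one coefficient estimated in three imputed datasets.
est, total_var = rubin_pool([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(est, round(total_var, 4))  # 2.0 1.8333
```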
Maintained by Martijn Heymans. Last updated 2 years ago.
cox-regression, imputation, imputed-datasets, logistic, multiple-imputation, pool, predictor, regression, selection, spline, spline-predictors
10 stars 7.17 score 70 scripts
bioc
CuratedAtlasQueryR: Queries the Human Cell Atlas
Provides access to a copy of the Human Cell Atlas, but with harmonised metadata. This allows for uniform querying across numerous datasets within the Atlas using common fields such as cell type, tissue type, and patient ethnicity. Usage involves first querying the metadata table for cells of interest, and then downloading the corresponding cells into a SingleCellExperiment object.
Maintained by Stefano Mangiola. Last updated 5 months ago.
assaydomain, infrastructure, rnaseq, differentialexpression, geneexpression, normalization, clustering, qualitycontrol, sequencing, transcription, transcriptomics, database, duckdb, hdf5, human-cell-atlas, single-cell, singlecellexperiment, tidyverse
90 stars 7.04 score 41 scripts
leifeld
btergm: Temporal Exponential Random Graph Models by Bootstrapped Pseudolikelihood
Temporal Exponential Random Graph Models (TERGM) estimated by maximum pseudolikelihood with bootstrapped confidence intervals or Markov Chain Monte Carlo maximum likelihood. Goodness of fit assessment for ERGMs, TERGMs, and SAOMs. Micro-level interpretation of ERGMs and TERGMs. The methods are described in Leifeld, Cranmer and Desmarais (2018), JStatSoft <doi:10.18637/jss.v083.i06>.
Maintained by Philip Leifeld. Last updated 15 days ago.
complex-networks, dynamic-analysis, ergm, estimation, goodness-of-fit, inference, longitudinal-data, network-analysis, prediction, tergm
18 stars 7.03 score 83 scripts 2 dependents
bioc
pipeComp: pipeComp pipeline benchmarking framework
A simple framework to facilitate the comparison of pipelines involving various steps and parameters. The `pipelineDefinition` class represents pipelines as, minimally, a set of functions consecutively executed on the output of the previous one, and optionally accompanied by step-wise evaluation and aggregation functions. Given such an object, a set of alternative parameters/methods, and benchmark datasets, the `runPipeline` function then proceeds through all combinations of arguments, avoiding recomputing the same step twice and compiling evaluations on the fly to avoid storing potentially large intermediate data.
Maintained by Pierre-Luc Germain. Last updated 5 months ago.
geneexpression, transcriptomics, clustering, datarepresentation, benchmark, bioconductor, pipeline-benchmarking, pipelines, single-cell-rna-seq
41 stars 7.02 score 43 scripts
benkeser
drtmle: Doubly-Robust Nonparametric Estimation and Inference
Targeted minimum loss-based estimators of counterfactual means and causal effects that are doubly-robust with respect both to consistency and asymptotic normality (Benkeser et al (2017), <doi:10.1093/biomet/asx053>; MJ van der Laan (2014), <doi:10.1515/ijb-2012-0038>).
Maintained by David Benkeser. Last updated 2 years ago.
causal-inference, ensemble-learning, iptw, statistical-inference, tmle
19 stars 6.89 score 90 scripts 1 dependent
bioc
COTAN: COexpression Tables ANalysis
Statistical and computational method to analyze the co-expression of gene pairs at single cell level. It provides the foundation for single-cell gene interactome analysis. The basic idea is studying the zero UMI counts' distribution instead of focusing on positive counts; this is done with a generalized contingency tables framework. COTAN can effectively assess the correlated or anti-correlated expression of gene pairs. It provides a numerical index related to the correlation and an approximate p-value for the associated independence test. COTAN can also evaluate whether single genes are differentially expressed, scoring them with a newly defined global differentiation index. Moreover, this approach provides ways to plot and cluster genes according to their co-expression pattern with other genes, effectively helping the study of gene interactions and becoming a new tool to identify cell-identity marker genes.
Maintained by Galfrè Silvia Giulia. Last updated 21 days ago.
systemsbiology, transcriptomics, geneexpression, singlecell
16 stars 6.85 score 96 scripts
bdwilliamson
vimp: Perform Inference on Algorithm-Agnostic Variable Importance
Calculate point estimates of and valid confidence intervals for nonparametric, algorithm-agnostic variable importance measures in high and low dimensions, using flexible estimators of the underlying regression functions. For more information about the methods, please see Williamson et al. (Biometrics, 2020), Williamson et al. (JASA, 2021), and Williamson and Feng (ICML, 2020).
Maintained by Brian D. Williamson. Last updated 2 months ago.
machine-learning, nonparametric-statistics, statistical-inference, variable-importance
23 stars 6.79 score 67 scripts
michaellli
evalITR: Evaluating Individualized Treatment Rules
Provides various statistical methods for evaluating Individualized Treatment Rules under randomized data. The provided metrics include Population Average Value (PAV), Population Average Prescription Effect (PAPE), Area Under Prescription Effect Curve (AUPEC). It also provides the tools to analyze Individualized Treatment Rules under budget constraints. Detailed reference in Imai and Li (2019) <arXiv:1905.05389>.
Maintained by Michael Lingzhi Li. Last updated 2 years ago.
14 stars 6.78 score 36 scripts
bioc
scAnnotatR: Pretrained learning models for cell type prediction on single cell RNA-sequencing data
The package comprises a set of pretrained machine learning models to predict basic immune cell types. This enables all users to quickly get a first annotation of the cell types present in their dataset without requiring prior knowledge. scAnnotatR also allows users to train their own models to predict new cell types based on specific research needs.
Maintained by Johannes Griss. Last updated 5 months ago.
singlecell, transcriptomics, geneexpression, supportvectormachine, classification, software
15 stars 6.61 score 20 scripts
carmonalab
GeneNMF: Non-Negative Matrix Factorization for Single-Cell Omics
A collection of methods to extract gene programs from single-cell gene expression data using non-negative matrix factorization (NMF). 'GeneNMF' contains functions to directly interact with the 'Seurat' toolkit and derive interpretable gene program signatures.
Maintained by Massimo Andreatta. Last updated 16 days ago.
105 stars 6.58 score 12 scripts
bioc
SpotClean: SpotClean adjusts for spot swapping in spatial transcriptomics data
SpotClean is a computational method to adjust for spot swapping in spatial transcriptomics data. Recent spatial transcriptomics experiments utilize slides containing thousands of spots with spot-specific barcodes that bind mRNA. Ideally, unique molecular identifiers at a spot measure spot-specific expression, but this is often not the case due to bleed from nearby spots, an artifact we refer to as spot swapping. SpotClean is able to estimate the contamination rate in observed data and decontaminate the spot swapping effect, thus increasing the sensitivity and precision of downstream analyses.
Maintained by Zijian Ni. Last updated 5 months ago.
dataimport, rnaseq, sequencing, geneexpression, spatial, singlecell, transcriptomics, preprocessing, rna-seq, spatial-transcriptomics
31 stars 6.52 score 36 scripts
bioc
CatsCradle: This package provides methods for analysing spatial transcriptomics data and for discovering gene clusters
This package addresses two broad areas. It allows for in-depth analysis of spatial transcriptomic data by identifying tissue neighbourhoods. These are contiguous regions of tissue surrounding individual cells. 'CatsCradle' allows for the categorisation of neighbourhoods by the cell types contained in them and the genes expressed in them. In particular, it produces Seurat objects whose individual elements are neighbourhoods rather than cells. In addition, it enables the categorisation and annotation of genes by producing Seurat objects whose elements are genes.
Maintained by Michael Shapiro. Last updated 17 days ago.
biologicalquestion, statisticalmethod, geneexpression, singlecell, transcriptomics, spatial
3 stars 6.52 score
mathewchamberlain
SignacX: Cell Type Identification and Discovery from Single Cell Gene Expression Data
An implementation of neural networks trained with flow-sorted gene expression data to classify cellular phenotypes in single cell RNA-sequencing data. See Chamberlain M et al. (2021) <doi:10.1101/2021.02.01.429207> for more details.
Maintained by Mathew Chamberlain. Last updated 2 years ago.
cellular-phenotypes, seurat, single-cell-rna-seq
25 stars 6.47 score 34 scripts
giscience-fsu
sperrorest: Perform Spatial Error Estimation and Variable Importance Assessment
Implements spatial error estimation and permutation-based variable importance measures for predictive models using spatial cross-validation and spatial block bootstrap.
Maintained by Alexander Brenning. Last updated 2 years ago.
cross-validation, machine-learning, spatial-statistics, spatio-temporal-modeling, statistical-learning
19 stars 6.46 score 46 scripts
qile0317
APackOfTheClones: Visualization of Clonal Expansion for Single Cell Immune Profiles
Visualize clonal expansion via circle-packing. 'APackOfTheClones' extends 'scRepertoire' to produce a publication-ready visualization of clonal expansion at a single cell resolution, by representing expanded clones as differently sized circles. The method was originally implemented by Murray Christian and Ben Murrell in the following immunology study: Ma et al. (2021) <doi:10.1126/sciimmunol.abg6356>.
Maintained by Qile Yang. Last updated 5 months ago.
clonal-analysis, immune-repertoire, immune-system, scrna-seq, scrnaseq, seurat, single-cell, single-cell-genomics, cpp
15 stars 6.45 score 15 scripts
lhe17
nebula: Negative Binomial Mixed Models Using Large-Sample Approximation for Differential Expression Analysis of ScRNA-Seq Data
A fast negative binomial mixed model for conducting association analysis of multi-subject single-cell data. It can be used for identifying marker genes, differential expression and co-expression analyses. The model includes subject-level random effects to account for the hierarchical structure in multi-subject single-cell data. See He et al. (2021) <doi:10.1038/s42003-021-02146-6>.
Maintained by Liang He. Last updated 8 days ago.
37 stars 6.43 score 145 scripts
nsaph-software
CRE: Interpretable Discovery and Inference of Heterogeneous Treatment Effects
Provides a new method for interpretable heterogeneous treatment effects characterization in terms of decision rules via an extensive exploration of heterogeneity patterns by an ensemble-of-trees approach, enforcing high stability in the discovery. It relies on a two-stage pseudo-outcome regression, and it is supported by theoretical convergence guarantees. Bargagli-Stoffi, F. J., Cadei, R., Lee, K., & Dominici, F. (2023) Causal rule ensemble: Interpretable Discovery and Inference of Heterogeneous Treatment Effects. arXiv preprint <doi:10.48550/arXiv.2009.09036>.
Maintained by Falco Joannes Bargagli Stoffi. Last updated 5 months ago.
13 stars 6.41 score 11 scripts
rivolli
utiml: Utilities for Multi-Label Learning
Multi-label learning strategies and other procedures to support multi-label classification in R. The package provides a set of multi-label procedures such as sampling methods, transformation strategies, threshold functions, pre-processing techniques and evaluation metrics. A complete overview of the matter can be seen in Zhang, M. and Zhou, Z. (2014) <doi:10.1109/TKDE.2013.39> and Gibaja, E. and Ventura, S. (2015) A Tutorial on Multi-label Learning.
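Binary relevance is the simplest kind of transformation strategy listed above. It turns one multi-label problem into one independent binary problem per label, and a threshold function maps the per-label scores back to a label set. A minimal Python sketch follows, assuming labels are strings and a fixed cutoff of 0.5. The function names are hypothetical and do not reflect the package's API.

```python
from typing import Dict, List, Sequence, Set


def binary_relevance(label_sets: Sequence[Set[str]], labels: Sequence[str]) -> Dict[str, List[int]]:
    """Transform one multi-label target into one 0/1 target per label."""
    return {lab: [1 if lab in s else 0 for s in label_sets] for lab in labels}


def apply_threshold(scores: Dict[str, float], cutoff: float = 0.5) -> Set[str]:
    """Turn per-label scores from the binary models back into a predicted label set."""
    return {lab for lab, s in scores.items() if s >= cutoff}


targets = binary_relevance([{"a", "b"}, {"b"}, set()], ["a", "b", "c"])
print(targets)  # {'a': [1, 0, 0], 'b': [1, 1, 0], 'c': [0, 0, 0]}
print(sorted(apply_threshold({"a": 0.7, "b": 0.2, "c": 0.5})))  # ['a', 'c']
```

Each list in `targets` would train its own binary classifier. Because the models are independent, binary relevance ignores correlations between labels.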
Maintained by Adriano Rivolli. Last updated 4 years ago.
28 stars 6.39 score 87 scripts
bioc
RNAmodR: Detection of post-transcriptional modifications in high throughput sequencing data
RNAmodR provides classes and workflows for loading/aggregating data from high-throughput sequencing aimed at detecting post-transcriptional modifications through analysis of specific patterns. In addition, utilities are provided to validate and visualize the results. The RNAmodR package provides core functionality from which specific analysis strategies can be easily implemented as a separate package.
Maintained by Felix G.M. Ernst. Last updated 5 months ago.
software, infrastructure, workflowstep, visualization, sequencing, alkanilineseq, bioconductor, modifications, ribomethseq, rna, rnamodr
3 stars 6.39 score 9 scripts 3 dependents
nt-williams
lmtp: Non-Parametric Causal Effects of Feasible Interventions Based on Modified Treatment Policies
Non-parametric estimators for causal effects based on longitudinal modified treatment policies as described in Diaz, Williams, Hoffman, and Schenck <doi:10.1080/01621459.2021.1955691>, traditional point treatment, and traditional longitudinal effects. Continuous, binary, categorical, and multivariate treatments are allowed, as are censored outcomes. The treatment mechanism is estimated via a density ratio classification procedure irrespective of treatment variable type. For both continuous and binary outcomes, additive treatment effects can be calculated, and relative risks and odds ratios may be calculated for binary outcomes. Supports survival outcomes with competing risks (Diaz, Hoffman, and Hejazi; <doi:10.1007/s10985-023-09606-7>).
Maintained by Nicholas Williams. Last updated 26 days ago.
causal-inference, censored-data, longitudinal-data, machine-learning, modified-treatment-policy, nonparametric-statistics, precision-medicine, robust-statistics, statistics, stochastic-interventions, survival-analysis, targeted-learning
64 stars 6.37 score 91 scripts
bioc
scDataviz: scDataviz: single cell dataviz and downstream analyses
In the single cell world, which includes flow cytometry, mass cytometry, single-cell RNA-seq (scRNA-seq), and others, there is a need to improve data visualisation and to bring analysis capabilities to researchers even from non-technical backgrounds. scDataviz attempts to fit into this space, while also catering for advanced users. Additionally, because scDataviz is built on SingleCellExperiment, it has a 'plug and play' feel and is immediately flexible and compatible with studies that go beyond scDataviz. Finally, the graphics in scDataviz are generated via the ggplot engine, which means that users can 'add on' features to these with ease.
Maintained by Kevin Blighe. Last updated 5 months ago.
singlecell, immunooncology, rnaseq, geneexpression, transcription, flowcytometry, massspectrometry, dataimport
63 stars 6.30 score 16 scripts
bioc
tidyomics:Easily install and load the tidyomics ecosystem
The tidyomics ecosystem is a set of packages for 'omic data analysis that work together in harmony; they share common data representations and API design, consistent with the tidyverse ecosystem. The tidyomics package is designed to make it easy to install and load core packages from the tidyomics ecosystem with a single command.
Maintained by Stefano Mangiola. Last updated 5 months ago.
assaydomain, infrastructure, rnaseq, differentialexpression, geneexpression, normalization, clustering, qualitycontrol, sequencing, transcription, transcriptomics, cytometry, genomics, tidyverse
67 stars 6.13 score 5 scripts
jernejjevsenak
dendroTools:Linear and Nonlinear Methods for Analyzing Daily and Monthly Dendroclimatological Data
Provides novel dendroclimatological methods, primarily used by the tree-ring research community. There are four core functions. The first is daily_response(), which finds the optimal sequence of days that are related to one or more tree-ring proxy records. A similar function, daily_response_seascorr(), implements partial correlations in the analysis of daily response functions. For monthly data, there is the monthly_response() function. The last core function is compare_methods(), which compares several linear and nonlinear regression algorithms on the task of climate reconstruction.
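The idea behind a daily response search can be illustrated with a minimal Python sketch. This is not dendroTools' implementation (the package is written in R and offers more options, including nonlinear methods); it assumes a simple linear setting in which, for every window of consecutive days, the daily climate values are averaged per year and correlated (Pearson) with the proxy record, and the window with the strongest absolute correlation is kept. The names `pearson` and `best_window` are hypothetical.

```python
import math
from typing import List, Optional, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation; returns 0.0 when either series has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def best_window(daily: List[List[float]], proxy: List[float],
                min_len: int, max_len: int) -> Optional[Tuple[int, int, float]]:
    """Hypothetical exhaustive window search.

    daily: one list of daily climate values per year (all the same length).
    proxy: one tree-ring value per year, aligned with `daily`.
    Returns (start_day, window_length, correlation) of the strongest window.
    """
    n_days = len(daily[0])
    best = None
    for length in range(min_len, max_len + 1):
        for start in range(n_days - length + 1):
            # Average the climate variable over the window, separately for each year.
            agg = [sum(year[start:start + length]) / length for year in daily]
            r = pearson(agg, proxy)
            if best is None or abs(r) > abs(best[2]):
                best = (start, length, r)
    return best
```

For instance, if the proxy is constructed as the mean of days 3 to 5 of each year, the search recovers start 3 and length 3 with a correlation of 1. A real analysis would also need to guard against overfitting, since many windows are tested.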
Maintained by Jernej Jevsenak. Last updated 1 month ago.
5 stars 6.13 score 81 scripts
feiyoung
DR.SC:Joint Dimension Reduction and Spatial Clustering
Joint dimension reduction and spatial clustering is conducted for single-cell RNA sequencing and spatial transcriptomics data; for more details, see Wei Liu, Xu Liao, Yi Yang, Huazhen Lin, Joe Yeong, Xiang Zhou, Xingjie Shi and Jin Liu (2022) <doi:10.1093/nar/gkac219>. The method is computationally efficient and scales with increasing sample size, and it can also choose the smoothness parameter and the number of clusters.
Maintained by Wei Liu. Last updated 1 year ago.
dimension-reduction, selfsupervised, spatial-clustering, spatial-transcriptomics, openblas, cpp
5 stars 6.12 score 29 scripts 2 dependents
swfsc
eSDM:Ensemble Tool for Predictions from Species Distribution Models
A tool which allows users to create and evaluate ensembles of species distribution model (SDM) predictions. Functionality is offered through R functions or a GUI (R Shiny app). This tool can assist users in identifying spatial uncertainties and making informed conservation and management decisions. The package is further described in Woodman et al (2019) <doi:10.1111/2041-210X.13283>.
Maintained by Sam Woodman. Last updated 6 months ago.
11 stars 6.07 score 24 scripts
bioc
Dino:Normalization of Single-Cell mRNA Sequencing Data
Dino normalizes single-cell mRNA sequencing data to correct for technical variation, particularly sequencing depth, prior to downstream analysis. The approach produces a matrix of corrected expression from which the dependency between sequencing depth and the full distribution of normalized expression has been removed; many existing methods aim to remove only the dependency between sequencing depth and the mean of the normalized expression. This is particularly useful for highly sparse datasets, such as those produced by 10x Genomics and other unique molecular identifier (UMI) based microfluidics protocols, for which the depth-dependent proportion of zeros in the raw expression data can otherwise present a challenge.
Maintained by Jared Brown. Last updated 5 months ago.
software, normalization, rnaseq, singlecell, sequencing, geneexpression, transcriptomics, regression, cellbasedassays
9 stars 6.02 score 13 scripts
bioc
FEAST:FEAture SelecTion (FEAST) for Single-cell clustering
Cell clustering is one of the most important and commonly performed tasks in single-cell RNA sequencing (scRNA-seq) data analysis. An important step in cell clustering is to select a subset of genes (referred to as "features"), whose expression patterns will then be used for downstream clustering. A good set of features should include the ones that distinguish different cell types, and the quality of such a set can have a significant impact on clustering accuracy. FEAST is an R library for selecting the most representative features before performing the core of scRNA-seq clustering. It can be used as a plug-in for established clustering algorithms such as SC3, TSCAN, SHARP, SIMLR, and Seurat. The core of the FEAST algorithm includes three steps: 1. consensus clustering; 2. gene-level significance inference; 3. validation of an optimized feature set.
Maintained by Kenong Su. Last updated 5 months ago.
sequencing, singlecell, clustering, featureextraction
10 stars 5.97 score 47 scripts
bioc
PathoStat:PathoStat Statistical Microbiome Analysis Package
The purpose of this package is to perform Statistical Microbiome Analysis on metagenomics results from sequencing data samples. In particular, it supports analyses on the PathoScope generated report files. PathoStat provides various functionalities including Relative Abundance charts, Diversity estimates and plots, tests of Differential Abundance, Time Series visualization, and Core OTU analysis.
Maintained by Solaiappan Manimaran. Last updated 5 months ago.
microbiome, metagenomics, graphandnetwork, microarray, patternlogic, principalcomponent, sequencing, software, visualization, rnaseq, immunooncology
8 stars 5.90 score 8 scripts
feiyoung
ProFAST:Probabilistic Factor Analysis for Spatially-Aware Dimension Reduction
Probabilistic factor analysis for spatially-aware dimension reduction across multi-section spatial transcriptomics data with millions of spatial locations. For more details, see Wei Liu, et al. (2023) <doi:10.1101/2023.07.11.548486>.
Maintained by Wei Liu. Last updated 2 months ago.
2 stars 5.86 score 12 scripts 1 dependent
nsaph-software
GPCERF:Gaussian Processes for Estimating Causal Exposure Response Curves
Provides a non-parametric Bayesian framework based on Gaussian process priors for estimating causal effects of a continuous exposure and detecting change points in the causal exposure response curves using observational data. Ren, B., Wu, X., Braun, D., Pillai, N., & Dominici, F. (2021). "Bayesian modeling for exposure response curve via gaussian processes: Causal effects of exposure to air pollution on health outcomes." arXiv preprint <doi:10.48550/arXiv.2105.03454>.
Maintained by Boyu Ren. Last updated 11 months ago.
9 stars 5.86 score 16 scripts
bioc
omicsViewer:Interactive and explorative visualization of SummarizedExperiment or ExpressionSet using omicsViewer
omicsViewer visualizes ExpressionSet (or SummarizedExperiment) in an interactive way. omicsViewer has separate back- and front-ends. In the back-end, users need to prepare an ExpressionSet that contains all the necessary information for the downstream data interpretation. Some extra requirements on the headers of phenotype data or feature data are imposed so that the provided information can be clearly recognized by the front-end, while keeping modifications to the existing ExpressionSet object to a minimum. The pure dependency on R/Bioconductor guarantees maximum flexibility in the statistical analysis in the back-end. Once the ExpressionSet is prepared, it can be visualized using the front-end, implemented with shiny and plotly. Both features and samples can be selected from (data) tables or graphs (scatter plot/heatmap). Different types of analyses, such as enrichment analysis (using the Bioconductor package fgsea or Fisher's exact test) and STRING network analysis, are performed on the fly and the results are visualized simultaneously. When a subset of samples and a phenotype variable are selected, a significance test on means (t-test or rank-based test, when the phenotype variable is quantitative) or a test of independence (chi-square or Fisher's exact test, when the phenotype data are categorical) is performed to test the association between the phenotype of interest and the selected samples. Additionally, other analyses can easily be added as extra shiny modules. omicsViewer therefore greatly facilitates data exploration: many different hypotheses can be explored in a short time without the need for knowledge of R. In addition, the resulting data can easily be shared using a shiny server. Alternatively, a standalone version of omicsViewer together with designated omics data can easily be created by integrating it with portable R, which can be shared with collaborators or submitted as supplementary data together with a manuscript.
Maintained by Chen Meng. Last updated 2 months ago.
software, visualization, genesetenrichment, differentialexpression, motifdiscovery, network, networkenrichment
4 stars 5.82 score 22 scripts
bioc
scBubbletree:Quantitative visual exploration of scRNA-seq data
scBubbletree is a quantitative method for the visual exploration of scRNA-seq data, preserving key biological properties such as local and global cell distances and cell density distributions across samples. It effectively resolves overplotting and enables the visualization of diverse cell attributes from multiomic single-cell experiments. Additionally, scBubbletree is user-friendly and integrates seamlessly with popular scRNA-seq analysis tools, facilitating comprehensive and intuitive data interpretation.
Maintained by Simo Kitanovski. Last updated 5 months ago.
visualization, clustering, singlecell, transcriptomics, rnaseq, big-data, bigdata, scrna-seq, scrna-seq-analysis, visual, visual-exploration
6 stars 5.82 score 8 scripts
andreasnordland
polle:Policy Learning
Package for learning and evaluating (subgroup) policies via doubly robust loss functions. Policy learning methods include doubly robust blip/conditional average treatment effect learning and sequential policy tree learning. Methods for (subgroup) policy evaluation include doubly robust cross-fitting and online estimation/sequential validation. See Nordland and Holst (2022) <doi:10.48550/arXiv.2212.02335> for documentation and references.
Maintained by Andreas Nordland. Last updated 6 days ago.
4 stars 5.80 score 6 scripts
bioc
benchdamic:Benchmark of differential abundance methods on microbiome data
Starting from a microbiome dataset (16S or WMS with absolute count values), it is possible to perform several analyses to assess the performance of many differential abundance detection methods. A basic and standardized version of the main differential abundance analysis methods is supplied, but users can also add their own methods to the benchmark. The analyses focus on 4 main aspects: i) the goodness of fit of each method's distributional assumptions on the observed count data, ii) the ability to control the false discovery rate, iii) the within- and between-method concordances, and iv) the truthfulness of the findings if any a priori knowledge is given. Several graphical functions are available for result visualization.
Maintained by Matteo Calgaro. Last updated 4 months ago.
metagenomics, microbiome, differentialexpression, multiplecomparison, normalization, preprocessing, software, benchmark, differential-abundance-methods
8 stars 5.78 score 8 scripts
bioc
scRNAseqApp:A single-cell RNAseq Shiny app-package
The scRNAseqApp is a Shiny app package designed for interactive visualization of single-cell data. It is an enhanced version derived from ShinyCell, repackaged to accommodate multiple datasets. The app enables users to visualize data containing various types of information simultaneously, facilitating comprehensive analysis. Additionally, it includes a user management system to regulate database accessibility for different users.
Maintained by Jianhong Ou. Last updated 20 days ago.
visualization, singlecell, rnaseq, interactive-visualizations, multiple-users, shiny-apps, single-cell-rna-seq
4 stars 5.76 score 3 scripts
core-bioinformatics
ClustAssess:Tools for Assessing Clustering
A set of tools for evaluating clustering robustness using proportion of ambiguously clustered pairs (Senbabaoglu et al. (2014) <doi:10.1038/srep06207>), as well as similarity across methods and method stability using element-centric clustering comparison (Gates et al. (2019) <doi:10.1038/s41598-019-44892-y>). Additionally, this package enables stability-based parameter assessment for graph-based clustering pipelines typical in single-cell data analysis.
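The proportion of ambiguously clustered pairs (PAC) mentioned above is easy to state in code. In the consensus-clustering setting of Senbabaoglu et al., the consensus index of two cells is the fraction of clustering runs that place them in the same cluster, and PAC is the fraction of cell pairs whose consensus index falls in the ambiguous interval (u1, u2]; 0.1 and 0.9 are commonly used bounds. The Python sketch below is illustrative only and does not reproduce ClustAssess's functions; as a simplifying assumption, every run clusters all cells (no subsampling), and the function names are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence


def consensus_matrix(labelings: Sequence[Sequence[int]]) -> List[List[float]]:
    """Fraction of runs in which each pair of cells shares a cluster label."""
    n = len(labelings[0])
    runs = len(labelings)
    m = [[1.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        together = sum(1 for labels in labelings if labels[i] == labels[j])
        m[i][j] = m[j][i] = together / runs
    return m


def pac(consensus: List[List[float]], lower: float = 0.1, upper: float = 0.9) -> float:
    """Share of off-diagonal pairs with lower < consensus index <= upper."""
    n = len(consensus)
    values = [consensus[i][j] for i, j in combinations(range(n), 2)]
    ambiguous = sum(1 for v in values if lower < v <= upper)
    return ambiguous / len(values)
```

A PAC close to 0 means that pairs of cells are almost always or almost never clustered together across runs, which is what a robust partition looks like.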
Maintained by Andi Munteanu. Last updated 2 months ago.
software, singlecell, rnaseq, atacseq, normalization, preprocessing, dimensionreduction, visualization, qualitycontrol, clustering, classification, annotation, geneexpression, differentialexpression, bioinformatics, genomics, machine-learning, parameter-optimization, robustness, single-cell, unsupervised-learning, cpp
23 stars 5.70 score 18 scripts
bioc
scFeatures:scFeatures: Multi-view representations of single-cell and spatial data for disease outcome prediction
scFeatures generates multi-view representations of single-cell and spatial data by constructing a total of 17 feature types. These features can then be used for a variety of analyses using other software in Bioconductor.
Maintained by Yue Cao. Last updated 5 months ago.
cellbasedassays, singlecell, spatial, software, transcriptomics
11 stars 5.69 score 15 scripts
thuizhou
PSweight:Propensity Score Weighting for Causal Inference with Observational Studies and Randomized Trials
Supports propensity score weighting analysis of observational studies and randomized trials. Enables the estimation and inference of average causal effects with binary and multiple treatments using overlap weights (ATO), inverse probability of treatment weights (ATE), average treatment effect among the treated weights (ATT), matching weights (ATM) and entropy weights (ATEN), with and without propensity score trimming. These weights are members of the family of balancing weights introduced in Li, Morgan and Zaslavsky (2018) <doi:10.1080/01621459.2016.1260466> and Li and Li (2019) <doi:10.1214/19-AOAS1282>.
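The balancing-weight family referred to above has a compact form (Li, Morgan and Zaslavsky, 2018): with propensity score e(x) and a tilting function h(x), treated units receive weight h(x)/e(x) and control units h(x)/(1 - e(x)). The tilting functions are h = 1 (ATE), h = e (ATT), h = e(1 - e) (overlap, ATO), h = min(e, 1 - e) (matching, ATM) and the entropy h = -e log e - (1 - e) log(1 - e) (ATEN). The Python sketch below only illustrates these weights and a normalized weighted difference in means for a binary treatment with known propensity scores; it is not PSweight's interface and omits propensity score estimation, trimming, multiple treatments and variance estimation. Function names are hypothetical.

```python
import math
from typing import Sequence


def tilting(e: float, estimand: str) -> float:
    """Tilting function h(e) of the balancing-weight family."""
    if not 0.0 < e < 1.0:
        raise ValueError("propensity score must lie strictly between 0 and 1")
    if estimand == "ATE":
        return 1.0
    if estimand == "ATT":
        return e
    if estimand == "ATO":
        return e * (1.0 - e)
    if estimand == "ATM":
        return min(e, 1.0 - e)
    if estimand == "ATEN":
        return -(e * math.log(e) + (1.0 - e) * math.log(1.0 - e))
    raise ValueError(f"unknown estimand: {estimand}")


def balancing_weight(e: float, treated: bool, estimand: str) -> float:
    """h(e)/e for treated units, h(e)/(1 - e) for control units."""
    h = tilting(e, estimand)
    return h / e if treated else h / (1.0 - e)


def weighted_effect(y: Sequence[float], z: Sequence[int],
                    e: Sequence[float], estimand: str) -> float:
    """Normalized (Hajek-type) weighted difference in mean outcomes."""
    sums = {0: [0.0, 0.0], 1: [0.0, 0.0]}  # arm -> [sum of w*y, sum of w]
    for yi, zi, ei in zip(y, z, e):
        w = balancing_weight(ei, zi == 1, estimand)
        sums[zi][0] += w * yi
        sums[zi][1] += w
    return sums[1][0] / sums[1][1] - sums[0][0] / sums[0][1]
```

Note that the overlap weights stay bounded even when e is close to 0 or 1, whereas the ATE weights 1/e and 1/(1 - e) can become very large; this is the usual motivation for the ATO estimand.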
Maintained by Yukang Zeng. Last updated 1 year ago.
23 stars 5.54 score 47 scripts 2 dependents
bioc
scDotPlot:Cluster a Single-cell RNA-seq Dot Plot
Dot plots of single-cell RNA-seq data allow for an examination of the relationships between cell groupings (e.g. clusters) and marker gene expression. The scDotPlot package offers a unified approach to perform a hierarchical clustering analysis and add annotations to the columns and/or rows of a scRNA-seq dot plot. It works with SingleCellExperiment and Seurat objects as well as data frames.
Maintained by Benjamin I Laufer. Last updated 13 days ago.
software, visualization, differentialexpression, geneexpression, transcription, rnaseq, singlecell, sequencing, clustering
7 stars 5.45 score 2 scripts
bioc
speckle:Statistical methods for analysing single cell RNA-seq data
The speckle package contains functions for the analysis of single cell RNA-seq data. The speckle package currently contains functions to analyse differences in cell type proportions. There are also functions to estimate the parameters of the Beta distribution based on a given counts matrix, and a function to normalise a counts matrix to the median library size. There are plotting functions to visualise cell type proportions and the mean-variance relationship in cell type proportions and counts. As our research into specialised analyses of single cell data continues we anticipate that the package will be updated with new functions.
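Normalising a counts matrix to the median library size, as described above, amounts to rescaling each cell so that its total count equals the median total across cells. The sketch below is a plain-Python illustration of that idea (genes in rows, cells in columns), not speckle's R code; it assumes every cell has a nonzero library size and applies no log transformation. The function name is hypothetical.

```python
from statistics import median
from typing import List


def normalise_to_median_library_size(counts: List[List[float]]) -> List[List[float]]:
    """Scale each column (cell) so that its total equals the median column total."""
    n_cells = len(counts[0])
    lib_sizes = [sum(row[j] for row in counts) for j in range(n_cells)]
    if any(size == 0 for size in lib_sizes):
        raise ValueError("every cell needs a nonzero library size")
    target = median(lib_sizes)
    return [[row[j] * target / lib_sizes[j] for j in range(n_cells)] for row in counts]
```

For example, with two cells whose library sizes are 4 and 8, the median is 6, so the first cell is scaled up by 1.5 and the second down by 0.75.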
Maintained by Belinda Phipson. Last updated 5 months ago.
singlecell, rnaseq, regression, geneexpression
5.41 score 258 scripts
choonghyunryu
alookr:Model Classifier for Binary Classification
A collection of tools that support data splitting, predictive modeling, and model evaluation. A typical function is to split a dataset into a training dataset and a test dataset. Then compare the data distribution of the two datasets. Another feature is to support the development of predictive models and to compare the performance of several predictive models, helping to select the best model.
Maintained by Choonghyun Ryu. Last updated 1 year ago.
12 stars 5.38 score 9 scripts
simonmoulds
lulcc:Land Use Change Modelling in R
Classes and methods for spatially explicit land use change modelling in R.
Maintained by Simon Moulds. Last updated 5 years ago.
41 stars 5.37 score 38 scripts
ddimmery
tidyhte:Tidy Estimation of Heterogeneous Treatment Effects
Estimates heterogeneous treatment effects using tidy semantics on experimental or observational data. Methods are based on the doubly-robust learner of Kennedy (n.d.) <arXiv:2004.14497>. You provide a simple recipe for what machine learning algorithms to use in estimating the nuisance functions and 'tidyhte' will take care of cross-validation, estimation, model selection, diagnostics and construction of relevant quantities of interest about the variability of treatment effects.
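The doubly-robust learner cited above works in two stages: nuisance estimates (the propensity score pi(x) and the outcome regressions mu0(x) and mu1(x)) are used to build a pseudo-outcome for each unit, phi = (A - pi(X)) / (pi(X)(1 - pi(X))) * (Y - mu_A(X)) + mu1(X) - mu0(X), and that pseudo-outcome is then regressed on covariates to estimate the conditional average treatment effect. The Python sketch below computes this pseudo-outcome and, as a deliberately simple stand-in for the second-stage regression, averages it within levels of a discrete covariate. It assumes the nuisance estimates are already available (in practice they would come from cross-fitted machine-learning models) and does not reflect tidyhte's API; the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence


def dr_pseudo_outcome(y: float, a: int, pi: float, mu0: float, mu1: float) -> float:
    """DR-learner pseudo-outcome for one unit with binary treatment a."""
    mu_a = mu1 if a == 1 else mu0
    return (a - pi) / (pi * (1.0 - pi)) * (y - mu_a) + mu1 - mu0


def group_means(x: Sequence[Hashable], phi: Sequence[float]) -> Dict[Hashable, float]:
    """Simplified second stage: average pseudo-outcomes within each covariate level."""
    sums: Dict[Hashable, float] = defaultdict(float)
    counts: Dict[Hashable, int] = defaultdict(int)
    for xi, p in zip(x, phi):
        sums[xi] += p
        counts[xi] += 1
    return {k: sums[k] / counts[k] for k in sums}
```

When the outcome model is exact for a unit (Y equals mu_A(X)), the correction term vanishes and the pseudo-outcome reduces to mu1(X) - mu0(X); otherwise the residual is reweighted by the inverse propensity, which is what makes the estimator doubly robust.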
Maintained by Drew Dimmery. Last updated 2 years ago.
14 stars 5.36 score 11 scripts
avi-kenny
vaccine:Statistical Tools for Immune Correlates Analysis of Vaccine Clinical Trial Data
Various semiparametric and nonparametric statistical tools for immune correlates analysis of vaccine clinical trial data. This includes calculation of summary statistics and estimation of risk, vaccine efficacy, controlled effects (controlled risk and controlled vaccine efficacy), and mediation effects (natural direct effect, natural indirect effect, proportion mediated). See Gilbert P, Fong Y, Kenny A, and Carone, M (2022) <doi:10.1093/biostatistics/kxac024> and Fay MP and Follmann DA (2023) <doi:10.48550/arXiv.2208.06465>.
Maintained by Avi Kenny. Last updated 1 month ago.
4 stars 5.34 score 11 scripts
tlverse
tmle3shift:Targeted Learning of the Causal Effects of Stochastic Interventions
Targeted maximum likelihood estimation (TMLE) of population-level causal effects under stochastic treatment regimes and related nonparametric variable importance analyses. Tools are provided for TML estimation of the counterfactual mean under a stochastic intervention characterized as a modified treatment policy, such as treatment policies that shift the natural value of the exposure. The causal parameter and estimation were described in Díaz and van der Laan (2013) <doi:10.1111/j.1541-0420.2011.01685.x> and an improved estimation approach was given by Díaz and van der Laan (2018) <doi:10.1007/978-3-319-65304-4_14>.
Maintained by Nima Hejazi. Last updated 6 months ago.
causal-inference, machine-learning, marginal-structural-models, stochastic-interventions, targeted-learning, treatment-effects, variable-importance
17 stars 5.33 score 42 scripts 1 dependent
zhiyuan-hu-lab
CIDER:Meta-Clustering for scRNA-Seq Integration and Evaluation
A workflow of (a) meta-clustering based on inter-group similarity measures and (b) a ground-truth-free test metric to assess the biological correctness of integration in real datasets. See Hu Z, Ahmed A, Yau C (2021) <doi:10.1101/2021.03.29.437525> for more details.
Maintained by Zhiyuan Hu. Last updated 2 months ago.
5.30 score
bioc
scCB2:CB2 improves power of cell detection in droplet-based single-cell RNA sequencing data
scCB2 is an R package implementing CB2 for distinguishing real cells from empty droplets in droplet-based single cell RNA-seq experiments (especially for 10x Chromium). It is based on clustering similar barcodes and calculating Monte-Carlo p-value for each cluster to test against background distribution. This cluster-level test outperforms single-barcode-level tests in dealing with low count barcodes and homogeneous sequencing library, while keeping FDR well controlled.
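A Monte Carlo p-value compares an observed statistic with statistics computed on data simulated under the null hypothesis. The sketch below illustrates the general mechanism for a cluster of barcodes summarised as one count vector: the null is a multinomial draw from a background distribution (which could be estimated, for example, from low-count barcodes), and the statistic is the multinomial log-likelihood. This statistic is an assumption chosen for illustration; the statistic and background estimation actually used by scCB2 may differ, and the function names are hypothetical.

```python
import math
import random
from typing import List, Sequence


def multinomial_loglik(counts: Sequence[int], probs: Sequence[float]) -> float:
    """Log-probability of a count vector under a multinomial distribution."""
    ll = math.lgamma(sum(counts) + 1)
    for c, p in zip(counts, probs):
        ll += c * math.log(p) - math.lgamma(c + 1)
    return ll


def monte_carlo_pvalue(counts: Sequence[int], background: Sequence[int],
                       n_sim: int = 999, seed: int = 0) -> float:
    """Small p-values mean the counts are unlikely to come from the background."""
    rng = random.Random(seed)
    # A pseudocount of 1 per gene keeps every probability strictly positive.
    total = sum(background) + len(background)
    probs = [(b + 1) / total for b in background]
    observed = multinomial_loglik(counts, probs)
    n, genes = sum(counts), range(len(probs))
    as_extreme = 0
    for _ in range(n_sim):
        sim: List[int] = [0] * len(probs)
        for g in rng.choices(genes, weights=probs, k=n):
            sim[g] += 1
        if multinomial_loglik(sim, probs) <= observed:
            as_extreme += 1
    # The +1 terms count the observed data as one of the draws.
    return (as_extreme + 1) / (n_sim + 1)
```

Testing a summed cluster rather than each barcode separately gives the test more counts to work with, which is the intuition behind the cluster-level gain for low-count barcodes described above.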
Maintained by Zijian Ni. Last updated 5 months ago.
dataimport, rnaseq, singlecell, sequencing, geneexpression, transcriptomics, preprocessing, clustering
10 stars 5.30 score 5 scripts
bioc
biotmle:Targeted Learning with Moderated Statistics for Biomarker Discovery
Tools for differential expression biomarker discovery based on microarray and next-generation sequencing data that leverage efficient semiparametric estimators of the average treatment effect for variable importance analysis. Estimation and inference of the (marginal) average treatment effects of potential biomarkers are computed by targeted minimum loss-based estimation, with joint, stable inference constructed across all biomarkers using a generalization of moderated statistics for use with the estimated efficient influence function. The procedure accommodates the use of ensemble machine learning for the estimation of nuisance functions.
Maintained by Nima Hejazi. Last updated 5 months ago.
regression, geneexpression, differentialexpression, sequencing, microarray, rnaseq, immunooncology, bioconductor, bioconductor-package, bioconductor-packages, bioinformatics, biomarker-discovery, biostatistics, causal-inference, computational-biology, machine-learning, statistics, targeted-learning
5 stars 5.30 score 5 scripts
jiang-junyao
CACIMAR:cross-species analysis of cell identities, markers and regulations
A toolkit to perform cross-species analysis based on scRNA-seq data. CACIMAR has 5 main features: (1) identify markers in each cluster; (2) annotate cell types; (3) identify conserved markers; (4) identify conserved cell types; (5) identify conserved modules of regulatory networks.
Maintained by Junyao Jiang. Last updated 16 hours ago.
cross-species-analysis, scrna-seq
12 stars 5.23 score 6 scripts
adefazio
classifierplots:Generates a Visualization of Classifier Performance as a Grid of Diagnostic Plots
Generates a visualization of binary classifier performance as a grid of diagnostic plots with just one function call. Includes ROC curves, prediction density, accuracy, precision, recall and calibration plots, all using ggplot2 for easy modification. Debug your binary classifiers faster and easier!
Maintained by Aaron Defazio. Last updated 4 years ago.
50 stars 5.08 score 16 scripts
bioc
CDI:Clustering Deviation Index (CDI)
Single-cell RNA-sequencing (scRNA-seq) is widely used to explore cellular variation. The analysis of scRNA-seq data often starts from clustering cells into subpopulations. This initial step has a high impact on downstream analyses, and hence it is important that it be accurate. However, there has been no unsupervised metric designed for scRNA-seq to evaluate clustering performance. Hence, we propose the clustering deviation index (CDI), an unsupervised metric based on modeling scRNA-seq UMI counts to evaluate the clustering of cells.
Maintained by Jiyuan Fang. Last updated 5 months ago.
singlecell, software, clustering, visualization, sequencing, rnaseq, cellbasedassays
5 stars 5.00 score 4 scripts
bioc
decontX:Decontamination of single cell genomics data
This package contains implementations of DecontX (Yang et al. 2020), a decontamination algorithm for single-cell RNA-seq, and DecontPro (Yin et al. 2023), a decontamination algorithm for single-cell protein expression data. DecontX is a novel Bayesian method to computationally estimate and remove RNA contamination in individual cells without empty droplet information. DecontPro is a Bayesian method that estimates the level of contamination from ambient and background sources in a CITE-seq ADT dataset and decontaminates the dataset.
Maintained by Joshua Campbell. Last updated 2 months ago.
4.94 score 29 scripts
bioc
Melissa:Bayesian clustering and imputation of single cell methylomes
Melissa is a Bayesian probabilistic model for jointly clustering and imputing single cell methylomes. This is done by taking into account local correlations via a Generalised Linear Model approach and global similarities using a mixture modelling approach.
Maintained by C. A. Kapourani. Last updated 5 months ago.
immunooncology, dnamethylation, geneexpression, generegulation, epigenetics, genetics, clustering, featureextraction, regression, rnaseq, bayesian, kegg, sequencing, coverage, singlecell
4.90 score 7 scripts
jgasmits
AnanseSeurat:Construct ANANSE GRN-Analysis Seurat
Enables gene regulatory network (GRN) analysis on single cell clusters, using the GRN analysis software 'ANANSE', Xu et al. (2021) <doi:10.1093/nar/gkab598>. Export data from 'Seurat' objects for GRN analysis by 'ANANSE' implemented in 'snakemake'. Finally, incorporate results for visualization and interpretation.
Maintained by Jos Smits. Last updated 1 year ago.
grn-analysis, seurat-objects, single-cell, single-cell-atac-seq, single-cell-rna-seq
8 stars 4.90 score 4 scripts
bdwilliamson
flevr:Flexible, Ensemble-Based Variable Selection with Potentially Missing Data
Perform variable selection in settings with possibly missing data based on extrinsic (algorithm-specific) and intrinsic (population-level) variable importance. Uses a Super Learner ensemble to estimate the underlying prediction functions that give rise to estimates of variable importance. For more information about the methods, please see Williamson and Huang (2023+) <arXiv:2202.12989>.
Maintained by Brian D. Williamson. Last updated 1 year ago.
5 stars 4.88 score 2 scripts
bioc
CelliD:Unbiased Extraction of Single Cell gene signatures using Multiple Correspondence Analysis
CelliD is a clustering-free multivariate statistical method for the robust extraction of per-cell gene signatures from single-cell RNA-seq. CelliD allows unbiased cell identity recognition across different donors, tissues-of-origin, model organisms and single-cell omics protocols. The package can also be used to explore functional pathways enrichment in single cell data.
Maintained by Akira Cortal. Last updated 5 months ago.
rnaseq, singlecell, dimensionreduction, clustering, genesetenrichment, geneexpression, atacseq, openblas, cpp, openmp
4.85 score 70 scripts
yuelyu21
SCIntRuler:Guiding the Integration of Multiple Single-Cell RNA-Seq Datasets
The accumulation of single-cell RNA-seq (scRNA-seq) studies highlights the potential benefits of integrating multiple datasets. By augmenting sample sizes and enhancing analytical robustness, integration can lead to more insightful biological conclusions. However, challenges arise due to the inherent diversity and batch discrepancies within and across studies. SCIntRuler, a novel R package, addresses these challenges by guiding the integration of multiple scRNA-seq datasets.
Maintained by Yue Lyu. Last updated 6 months ago.
sequencing, geneticvariability, singlecell, cpp
2 stars 4.85 score 3 scripts
jucheng1992
ctmle:Collaborative Targeted Maximum Likelihood Estimation
Implements the general template for collaborative targeted maximum likelihood estimation. It also provides several commonly used C-TMLE instantiations, such as the vanilla/scalable variable-selection C-TMLE (Ju et al. (2017) <doi:10.1177/0962280217729845>) and the glmnet-C-TMLE algorithm (Ju et al. (2017) <arXiv:1706.10029>).
Maintained by Cheng Ju. Last updated 5 years ago.
causal-inference, machine-learning, statistics, tmle
5 stars 4.83 score 27 scripts
papatheodorou-group
scGOclust:Measuring Cell Type Similarity with Gene Ontology in Single-Cell RNA-Seq
Traditional methods for analyzing single cell RNA-seq datasets focus solely on gene expression, but this package introduces a novel approach that goes beyond this limitation. Using Gene Ontology terms as features, the package allows functional profiling of cell populations and comparison within and between datasets from the same or different species. Our approach enables the discovery of previously unrecognized functional similarities and differences between cell types and has demonstrated success in identifying cell types' functional correspondence even between evolutionarily distant species.
Maintained by Yuyao Song. Last updated 12 days ago.
9 stars 4.80 score 14 scripts
cefet-rj-dal
heimdall:Drift Adaptable Models
By analyzing streaming datasets, it is possible to observe significant changes in the data distribution or in models' accuracy during prediction (concept drift). The goal of 'heimdall' is to detect when concept drift occurs. The package makes available several state-of-the-art methods. It also tackles how to adapt models in a nonstationary context. Some concept drift methods are described in Tavares (2022) <doi:10.1007/s12530-021-09415-z>.
Maintained by Eduardo Ogasawara. Last updated 2 months ago.
2 stars 4.77 score 45 scripts
jillbo1000
EZtune:Tunes AdaBoost, Elastic Net, Support Vector Machines, and Gradient Boosting Machines
Contains two functions that are intended to make tuning supervised learning methods easy. The eztune function uses a genetic algorithm or Hooke-Jeeves optimizer to find the best set of tuning parameters. The user can choose the optimizer, the learning method, and whether optimization is based on accuracy obtained through validation error, cross-validation, or resubstitution. The function eztune_cv computes a cross-validated error rate. The purpose of eztune_cv is to provide a cross-validated accuracy or MSE when resubstitution or validation data are used for optimization, because error measures from both approaches can be misleading.
Maintained by Jill Lundell. Last updated 3 years ago.
4.76 score 38 scripts 1 dependent
hknd23
DeepLearningCausal:Causal Inference with Super Learner and Deep Neural Networks
Functions to estimate Conditional Average Treatment Effects (CATE) and Population Average Treatment Effects on the Treated (PATT) from experimental or observational data using the Super Learner (SL) ensemble method and deep neural networks. The package first provides functions to implement meta-learners such as the Single-learner (S-learner) and Two-learner (T-learner) described in Künzel et al. (2019) <doi:10.1073/pnas.1804597116> for estimating the CATE. The S- and T-learner are each estimated using the SL ensemble method and deep neural networks. It then provides functions to implement the Ottoboni and Poulos (2020) <doi:10.1515/jci-2018-0035> PATT-C estimator to obtain the PATT from experimental data with noncompliance by using the SL ensemble method and deep neural networks.
Maintained by Nguyen K. Huynh. Last updated 2 months ago.
causal-inference, deep-neural-networks, machine-learning
2 stars 4.73 score 5 scripts
evalclass
prcbench:Testing Workbench for Precision-Recall Curves
A testing workbench to evaluate tools that calculate precision-recall curves. Saito and Rehmsmeier (2015) <doi:10.1371/journal.pone.0118432>.
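A workbench of this kind needs reference precision-recall points against which tools can be compared. The Python sketch below (an illustration, not prcbench code) computes such points from scores and binary labels: samples are ranked by decreasing score, tied scores are treated as a single threshold, and recall = TP / P and precision = TP / (TP + FP) are recorded at each threshold. Tie handling and interpolation between points are among the details on which PR-curve tools can disagree. The function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def pr_points(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (recall, precision) after each distinct score threshold, highest first."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("at least one positive label is required")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = []
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every sample tied at this threshold before recording a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((tp / n_pos, tp / (tp + fp)))
    return points
```

Grouping ties matters: if the two tied samples in `pr_points([0.5, 0.5], [1, 0])` were processed one at a time, the result would depend on their input order.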
Maintained by Takaya Saito. Last updated 2 years ago.
5 stars 4.72 score 21 scripts
duolajiang
RCTrep:Validation of Estimates of Treatment Effects in Observational Data
Validates estimates of (conditional) average treatment effects obtained using observational data by a) making it easy to obtain and visualize estimates derived using a large variety of methods (G-computation, inverse propensity score weighting, etc.), and b) ensuring that estimates are easily compared to a gold standard (i.e., estimates derived from randomized controlled trials). 'RCTrep' offers a generic protocol for treatment effect validation based on four simple steps, namely, set-selection, estimation, diagnosis, and validation. 'RCTrep' provides a simple dashboard to review the obtained results. The validation approach is introduced by Shen, L., Geleijnse, G. and Kaptein, M. (2023) <doi:10.21203/rs.3.rs-2559287/v1>.
Maintained by Lingjie Shen. Last updated 2 years ago.
8 stars 4.68 score 12 scripts
bioc
Anaquin:Statistical analysis of sequins
The project is intended to support the use of sequins (synthetic sequencing spike-in controls) owned and made available by the Garvan Institute of Medical Research. The goal is to provide a standard open source library for quantitative analysis, modelling and visualization of spike-in controls.
Maintained by Ted Wong. Last updated 5 months ago.
immunooncology, differentialexpression, preprocessing, rnaseq, geneexpression, software
4.65 score 45 scripts
bioc
stJoincount:stJoincount - Join count statistic for quantifying spatial correlation between clusters
stJoincount facilitates the application of join count analysis to spatial transcriptomic data generated from the 10x Genomics Visium platform. This tool first converts a labeled spatial tissue map into a raster object, in which each spatial feature is represented by a pixel coded by label assignment. This process includes automatic calculation of optimal raster resolution and extent for the sample. A neighbors list is then created from the rasterized sample, in which adjacent and diagonal neighbors for each pixel are identified. After adding binary spatial weights to the neighbors list, a multi-categorical join count analysis is performed to tabulate "joins" between all possible combinations of label pairs. The function returns the observed join counts, the expected count under conditions of spatial randomness, and the variance calculated under non-free sampling. The z-score is then calculated as the difference between observed and expected counts, divided by the square root of the variance.
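The final step described above is a standard z-score, z = (O - E) / sqrt(V), where O is the observed join count for a label pair, E its expectation under spatial randomness and V its variance. The Python sketch below counts observed joins on a small labelled raster using adjacent and diagonal (queen) neighbours with binary weights, counting each neighbouring pair of pixels once, and then computes z from supplied values of E and V. It is an illustration only: stJoincount's rasterisation and its non-free-sampling formulas for E and V are not reproduced here, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, List, Tuple


def observed_joins(grid: List[List[Hashable]]) -> Counter:
    """Count joins between label pairs on a raster with queen neighbours."""
    rows, cols = len(grid), len(grid[0])
    joins: Counter = Counter()
    # Right, down-left, down and down-right visit each undirected neighbour pair once.
    offsets: List[Tuple[int, int]] = [(0, 1), (1, -1), (1, 0), (1, 1)]
    for r in range(rows):
        for c in range(cols):
            for dr, dc in offsets:
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    pair = tuple(sorted((grid[r][c], grid[rr][cc])))
                    joins[pair] += 1
    return joins


def join_count_z(observed: float, expected: float, variance: float) -> float:
    """z-score of an observed join count given its null mean and variance."""
    return (observed - expected) / math.sqrt(variance)
```

A large positive z for a pair of different labels suggests that the two clusters sit next to each other more often than expected under spatial randomness; a large positive z for a same-label pair suggests that the cluster is spatially aggregated.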
Maintained by Jiarong Song. Last updated 5 months ago.
transcriptomics clustering spatial biocviews software
4 stars 4.60 score 3 scripts
nhejazi
medoutcon:Efficient Natural and Interventional Causal Mediation Analysis
Efficient estimators of interventional (in)direct effects in the presence of mediator-outcome confounding affected by exposure. The effects estimated allow for the impact of the exposure on the outcome through a direct path to be disentangled from that through mediators, even in the presence of intermediate confounders that complicate such a relationship. Currently supported are non-parametric efficient one-step and targeted minimum loss estimators based on the formulation of Díaz, Hejazi, Rudolph, and van der Laan (2020) <doi:10.1093/biomet/asaa085>. Support for efficient estimation of the natural (in)direct effects is also provided, appropriate for settings in which intermediate confounders are absent. The package also supports estimation of these effects when the mediators are measured using outcome-dependent two-phase sampling designs (e.g., case-cohort).
Maintained by Nima Hejazi. Last updated 1 years ago.
causal-inference causal-machine-learning inverse-probability-weights machine-learning mediation-analysis stochastic-interventions targeted-learning treatment-effects
13 stars 4.46 score 22 scripts
mbannick
RobinCar:Robust Inference for Covariate Adjustment in Randomized Clinical Trials
Performs robust estimation and inference when using covariate adjustment and/or covariate-adaptive randomization in randomized clinical trials. Ting Ye, Jun Shao, Yanyao Yi, Qinyuan Zhao (2023) <doi:10.1080/01621459.2022.2049278>. Ting Ye, Marlena Bannick, Yanyao Yi, Jun Shao (2023) <doi:10.1080/24754269.2023.2205802>. Ting Ye, Jun Shao, Yanyao Yi (2023) <doi:10.1093/biomet/asad045>. Marlena Bannick, Jun Shao, Jingyi Liu, Yu Du, Yanyao Yi, Ting Ye (2024) <doi:10.48550/arXiv.2306.10213>.
Maintained by Marlena Bannick. Last updated 23 days ago.
6 stars 4.42 score 11 scripts
dosorio
rPanglaoDB:Download and Merge Single-Cell RNA-Seq Data from the PanglaoDB Database
Download and merge labeled single-cell RNA-seq data from the PanglaoDB <https://panglaodb.se/> into a Seurat object.
Maintained by Daniel Osorio. Last updated 2 years ago.
data-integration data-mining rna-seq single-cell single-cell-rna-seq
26 stars 4.41 score 20 scripts
bioc
ChIPanalyser:ChIPanalyser: Predicting Transcription Factor Binding Sites
ChIPanalyser is a package to predict and understand TF binding by utilizing a statistical thermodynamic model. The model incorporates four main factors thought to drive TF binding: chromatin state, binding energy, number of bound molecules, and a scaling factor modulating TF binding affinity. Taken together, ChIPanalyser produces ChIP-like profiles that closely mimic the patterns seen in real ChIP-seq data.
Maintained by Patrick C.N. Martin. Last updated 5 months ago.
software biologicalquestion workflowstep transcription sequencing chiponchip coverage alignment chipseq sequencematching dataimport peakdetection
4.38 score 12 scripts
hakyimlab
OmicKriging:Poly-Omic Prediction of Complex Traits
It provides functions to generate a correlation matrix from a genetic dataset and to use this matrix to predict the phenotype of an individual from the phenotypes of the remaining individuals through kriging. Kriging is a geostatistical method for optimal prediction, also known as best linear unbiased prediction. It predicts the value of a variable at an unobserved location as a weighted sum of the variable at observed locations. Intuitively, it works as a reverse linear regression: instead of computing the correlation (univariate regression coefficients are simply scaled correlations) between a dependent variable Y and independent variables X, it uses the known correlation between X and Y to predict Y.
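The kriging step described above can be sketched as simple kriging. The weights come from solving the correlation system among the observed individuals, and the prediction is the weighted sum of their phenotypes. This is a minimal NumPy sketch assuming centred phenotypes (mean 0). The function name `krige_predict` is hypothetical, and OmicKriging's own estimator may handle means and multiple omic matrices differently.

```python
import numpy as np

def krige_predict(corr, y, target):
    """Simple-kriging prediction of one individual's phenotype.

    corr:   (n, n) correlation matrix between individuals (e.g. genetic similarity)
    y:      length-n phenotype vector, assumed centred (mean 0); y[target] is ignored
    target: index of the individual whose phenotype is predicted
    """
    corr = np.asarray(corr, dtype=float)
    y = np.asarray(y, dtype=float)
    others = np.array([i for i in range(len(y)) if i != target])
    c_oo = corr[np.ix_(others, others)]     # correlations among observed individuals
    c_to = corr[target, others]             # correlations of the target with them
    weights = np.linalg.solve(c_oo, c_to)   # kriging weights solve C_oo w = c_to
    return float(weights @ y[others])       # weighted sum of observed phenotypes
```

With correlations 0.5 to individual 1 (phenotype 2) and 0 to individual 2 (phenotype 4), the predicted phenotype for individual 0 is 1.0.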
Maintained by Hae Kyung Im. Last updated 4 years ago.
2 stars 4.38 score 48 scripts
ledell
subsemble:An Ensemble Method for Combining Subset-Specific Algorithm Fits
The Subsemble algorithm is a general subset ensemble prediction method, which can be used for small, moderate, or large datasets. Subsemble partitions the full dataset into subsets of observations, fits a specified underlying algorithm on each subset, and uses a unique form of k-fold cross-validation to output a prediction function that combines the subset-specific fits. An oracle result provides a theoretical performance guarantee for Subsemble. The paper, "Subsemble: An ensemble method for combining subset-specific algorithm fits" is authored by Stephanie Sapp, Mark J. van der Laan & John Canny (2014) <doi:10.1080/02664763.2013.864263>.
Maintained by Erin LeDell. Last updated 3 years ago.
big-data cross-validation ensemble ensemble-learning machine-learning machine-learning-algorithms
43 stars 4.37 score 11 scripts
bioc
Spaniel:Spatial Transcriptomics Analysis
Spaniel includes a series of tools to aid the quality control and analysis of Spatial Transcriptomics data. Spaniel can import data from either the original Spatial Transcriptomics system or 10X Visium technology. The package contains functions to create a SingleCellExperiment or Seurat object and provides a method of loading a histological image into R. The spanielPlot function allows visualisation of metrics contained within the S4 object overlaid onto the image of the tissue.
Maintained by Rachel Queen. Last updated 5 months ago.
singlecell rnaseq qualitycontrol preprocessing normalization visualization transcriptomics geneexpression sequencing software dataimport datarepresentation infrastructure coverage clustering
4.34 score 22 scripts
bioc
ClusterFoldSimilarity:Calculate similarity of clusters from different single cell samples using foldchanges
This package calculates a similarity coefficient using the fold changes of shared features (e.g. genes) among clusters of different samples/batches/datasets. The similarity coefficient is calculated using the dot product (the sum of the element-wise, or Hadamard, product) of every pairwise combination of fold changes between a source cluster i of sample/dataset n and all the target clusters j in sample/dataset m.
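The pairwise comparison described above can be sketched as a dot product of fold-change vectors restricted to the genes two clusters share. The following minimal Python sketch assumes fold changes are supplied per cluster as gene-to-value dictionaries. That input format and the function names are hypothetical, and the package's actual coefficient may add further weighting or normalization.

```python
def fold_change_similarity(source, target):
    """Dot product of fold changes over the features shared by two clusters.

    source, target: dicts mapping feature (gene) name -> fold change for one cluster.
    """
    shared = source.keys() & target.keys()
    return sum(source[g] * target[g] for g in shared)

def pairwise_similarity(clusters_n, clusters_m):
    """Score every source cluster i (sample n) against every target cluster j (sample m)."""
    return {
        (i, j): fold_change_similarity(fc_i, fc_j)
        for i, fc_i in clusters_n.items()
        for j, fc_j in clusters_m.items()
    }
```

Genes present in only one cluster contribute nothing to the score. Clusters whose shared genes move in the same direction score positively.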
Maintained by Oscar Gonzalez-Velasco. Last updated 5 months ago.
singlecell clustering featureextraction graphandnetwork genetarget rnaseq
4.34 score 11 scripts
bioc
PAA:PAA (Protein Array Analyzer)
PAA imports single-color (protein) microarray data that has been saved in gpr file format, especially ProtoArray data. After preprocessing (background correction, batch filtering, normalization), univariate feature preselection is performed (e.g., using the "minimum M statistic" approach, hereinafter referred to as "mMs"). Subsequently, a multivariate feature selection is conducted to discover biomarker candidates. For this, either a frequency-based backwards elimination approach or ensemble feature selection can be used. PAA provides a complete toolbox of analysis tools including several different plots for results examination and evaluation.
Maintained by Michael Turewicz. Last updated 5 months ago.
classification microarray onechannel proteomics cpp
4.34 score 11 scripts
bioc
cytofQC:Labels normalized cells for CyTOF data and assigns probabilities for each label
cytofQC is a package for initial cleaning of CyTOF data. It uses a semi-supervised approach for labeling cells with their most likely data type (bead, doublet, debris, dead) and the probability that they belong to each label type. This package does not remove data from the dataset, but provides labels and information to aid the data user in cleaning their data. Our algorithm is able to distinguish between doublets and large cells.
Maintained by Jill Lundell. Last updated 5 months ago.
2 stars 4.30 score 3 scripts
bioc
RNAmodR.AlkAnilineSeq:Detection of m7G, m3C and D modification by AlkAnilineSeq
RNAmodR.AlkAnilineSeq implements the detection of m7G, m3C and D modifications on RNA from experimental data generated with the AlkAnilineSeq protocol. The package builds on the core functionality of the RNAmodR package to detect specific patterns of the modifications in high throughput sequencing data.
Maintained by Felix G.M. Ernst. Last updated 5 months ago.
software workflowstep visualization sequencing alkanilineseq bioconductor modifications rna rnamodr
2 stars 4.30 score 3 scripts
bioc
RegionalST:Investigating regions of interest and performing regional cell type-specific analysis with spatial transcriptomics data
This package analyzes spatial transcriptomics data through cross-regional cell type-specific analysis. It selects regions of interest (ROIs) and identifies cross-regional cell type-specific differential signals. The ROIs can be selected using an automatic algorithm or through manual selection. It facilitates manual selection of ROIs using a shiny application.
Maintained by Ziyi Li. Last updated 4 months ago.
spatial transcriptomics reactome kegg
4.30 score 8 scripts
bioc
scBFA:A dimensionality reduction tool using gene detection pattern to mitigate noisy expression profile of scRNA-seq
This package is designed to model the gene detection pattern of scRNA-seq through a binary factor analysis model. This model allows users to pass in a cell-level covariate matrix X and a gene-level covariate matrix Q to account for nuisance variance (e.g., batch effect), and it will output a low-dimensional embedding matrix for downstream analysis.
Maintained by Ruoxin Li. Last updated 5 months ago.
singlecell transcriptomics dimensionreduction geneexpression atacseq batcheffect kegg qualitycontrol
4.30 score 4 scripts
yanpd01
ggsector:Draw Sectors
Some useful functions that can use 'grid' and 'ggplot2' to plot sectors and interact with 'Seurat' to plot gene expression percentages. Also, there are some examples of how to draw sectors in 'ComplexHeatmap'.
Maintained by Pengdong Yan. Last updated 5 months ago.
4 stars 4.30 score 5 scripts
tlverse
tmle3mopttx:Targeted Maximum Likelihood Estimation of the Mean under Optimal Individualized Treatment
This package estimates the optimal individualized treatment rule for a categorical treatment using Super Learner (sl3). In order to avoid nested cross-validation, it uses split-specific estimates of Q and g to estimate the rule as described by Coyle et al. In addition, it provides the Targeted Maximum Likelihood estimates of the mean performance using CV-TMLE under such estimated rules. This is an adapter package for use with the tmle3 framework and the tlverse software ecosystem for Targeted Learning.
Maintained by Ivana Malenica. Last updated 3 years ago.
categorical-treatment causal-inference heterogeneous-effects machine-learning optimal-individualized-treatment targeted-learning variable-importance
13 stars 4.28 score 49 scripts 1 dependents
bioc
LedPred:Learning from DNA to Predict Enhancers
This package aims at creating a predictive model of regulatory sequences used to score unknown sequences based on the content of DNA motifs, next-generation sequencing (NGS) peaks and signals, and other numerical scores of the sequences using supervised classification. The package contains a workflow based on the support vector machine (SVM) algorithm that maps features to sequences, optimizes SVM parameters and feature number, and creates a model that can be stored and used to score the regulatory potential of unknown sequences.
Maintained by Aitor Gonzalez. Last updated 5 months ago.
supportvectormachine software motifannotation chipseq sequencing classification
3 stars 4.26 score 3 scripts
cyrillagger
scDiffCom:Differential Analysis of Intercellular Communication from scRNA-Seq Data
Analysis tools to investigate changes in intercellular communication from scRNA-seq data. Using a Seurat object as input, the package infers which cell-cell interactions are present in the dataset and how these interactions change between two conditions of interest (e.g. young vs old). It relies on an internal database of ligand-receptor interactions (available for human, mouse and rat) that have been gathered from several published studies. Detection and differential analyses rely on permutation tests. The package also contains several tools to perform over-representation analysis and visualize the results. See Lagger, C. et al. (2023) <doi:10.1038/s43587-023-00514-x> for a full description of the methodology.
Maintained by Cyril Lagger. Last updated 1 years ago.
21 stars 4.25 score 17 scripts
lance-waller-lab
envi:Environmental Interpolation using Spatial Kernel Density Estimation
Estimates an ecological niche using occurrence data, covariates, and kernel density-based estimation methods. For a single species with presence and absence data, the 'envi' package uses the spatial relative risk function that is estimated using the 'sparr' package. Details about the 'sparr' package methods can be found in the tutorial: Davies et al. (2018) <doi:10.1002/sim.7577>. Details about kernel density estimation can be found in J. F. Bithell (1990) <doi:10.1002/sim.4780090616>. More information about relative risk functions using kernel density estimation can be found in J. F. Bithell (1991) <doi:10.1002/sim.4780101112>.
Maintained by Ian D. Buller. Last updated 5 months ago.
ecological-niche ecological-niche-modelling geospatial geospatial-analysis kernel-density-estimation niche-modeling niche-modelling non-euclidean-spaces point-pattern point-pattern-analysis principal-component-analysis spatial-analysis species-distribution-modeling species-distribution-modelling
1 stars 4.22 score 33 scripts
bioc
easier:Estimate Systems Immune Response from RNA-seq data
This package provides a workflow for the use of EaSIeR tool, developed to assess patients' likelihood to respond to ICB therapies providing just the patients' RNA-seq data as input. We integrate RNA-seq data with different types of prior knowledge to extract quantitative descriptors of the tumor microenvironment from several points of view, including composition of the immune repertoire, and activity of intra- and extra-cellular communications. Then, we use multi-task machine learning trained on TCGA data to identify how these descriptors can simultaneously predict several state-of-the-art hallmarks of anti-cancer immune response. In this way we derive cancer-specific models and identify cancer-specific systems biomarkers of immune response. These biomarkers have been experimentally validated in the literature and the performance of EaSIeR predictions has been validated using independent datasets from four different cancer types with patients treated with anti-PD1 or anti-PDL1 therapy.
Maintained by Oscar Lapuente-Santana. Last updated 5 months ago.
geneexpression software transcription systemsbiology pathways genesetenrichment immunooncology epigenetics classification biomedicalinformatics regression experimenthubsoftware
4.20 score 16 scripts
benkeser
nlpred:Estimators of Non-Linear Cross-Validated Risks Optimized for Small Samples
Improved estimates of non-linear cross-validated risks are obtained using targeted minimum loss-based estimation, estimating equations, and one-step estimation (Benkeser, Petersen, van der Laan (2019), <doi:10.1080/01621459.2019.1668794>). Cross-validated area under the receiver operating characteristic curve (LeDell, Petersen, van der Laan (2015), <doi:10.1214/15-EJS1035>) and other metrics are included.
Maintained by David Benkeser. Last updated 3 years ago.
auc cross-validation estimating-equations machine-learning tmle
3 stars 4.18 score 6 scripts
jiaxiangbu
rawKS:Easily Get True-Positive Rate and False-Positive Rate and KS Statistic
The Kolmogorov-Smirnov (K-S) statistic is a standard method to measure the model strength for credit risk scoring models. This package calculates the K-S statistic and plots the true-positive rate and false-positive rate to measure the model strength. This package was written for the credit marketer who uses risk models in conjunction with campaigns. More details can be found in Thrasher (1992) <doi:10.1002/dir.4000060408> and 'pyks' <https://pypi.org/project/pyks/>.
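The K-S statistic described above is the largest gap between the cumulative true-positive and false-positive rates as the score cutoff is lowered. Below is a minimal plain-Python sketch of that computation. It is not the package's implementation, and `ks_statistic` is a hypothetical name. Tied scores are processed together so that each point corresponds to a real cutoff.

```python
def ks_statistic(scores, labels):
    """Maximum gap between cumulative TPR and FPR as the score cutoff is lowered.

    scores: model scores (higher = more likely positive)
    labels: 1 for positive (e.g. bad account), 0 for negative
    Returns (ks, tprs, fprs), with one TPR/FPR pair per distinct cutoff.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    tp = fp = 0
    tprs, fprs = [], []
    ks = 0.0
    i = 0
    while i < len(pairs):
        # Consume all observations tied at this cutoff before recording a point.
        cutoff = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == cutoff:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        tpr, fpr = tp / n_pos, fp / n_neg
        tprs.append(tpr)
        fprs.append(fpr)
        ks = max(ks, abs(tpr - fpr))
    return ks, tprs, fprs
```

A model that ranks every positive above every negative reaches K-S = 1. A model whose positives and negatives interleave scores lower.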
Maintained by Jiaxiang Li. Last updated 5 years ago.
3 stars 4.18 score 5 scripts
bioc
partCNV:Infer locally aneuploid cells using single cell RNA-seq data
This package uses a statistical framework for rapid and accurate detection of aneuploid cells with local copy number deletion or amplification. Our method uses an EM algorithm with mixtures of Poisson distributions while incorporating cytogenetics information (e.g., regional deletion or amplification) to guide the classification (partCNV). When applicable, we further improve the accuracy by integrating a Hidden Markov Model for feature selection (partCNVH).
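The EM step described above can be illustrated with a generic two-component Poisson mixture, where one component stands for normal cells and the other for cells with a shifted regional count. This is a minimal Python sketch only. partCNV's actual model also incorporates cytogenetic information and gene-level structure, none of which is represented here. The function names, default starting values and the single-count-per-cell input are assumptions.

```python
import math

def poisson_logpmf(k, lam):
    """log P(K = k) for a Poisson(lam) variable."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)

def poisson_mixture_em(counts, lam=(1.0, 5.0), pi=0.5, n_iter=200, tol=1e-8):
    """EM for a two-component Poisson mixture (no guards for degenerate components).

    counts: one count per cell (e.g. summed expression in a region of interest)
    lam:    initial rates for (component 0, component 1)
    pi:     initial weight of component 1
    Returns (lam0, lam1, pi, responsibilities for component 1 from the last E-step).
    """
    lam0, lam1 = lam
    for _ in range(n_iter):
        # E-step: posterior probability that each cell belongs to component 1
        resp = []
        for k in counts:
            a = math.log(1 - pi) + poisson_logpmf(k, lam0)
            b = math.log(pi) + poisson_logpmf(k, lam1)
            m = max(a, b)  # subtract the max for numerical stability
            resp.append(math.exp(b - m) / (math.exp(a - m) + math.exp(b - m)))
        # M-step: update the mixing weight and the two rates
        w1 = sum(resp)
        w0 = len(counts) - w1
        new_pi = w1 / len(counts)
        new_lam0 = sum((1 - r) * k for r, k in zip(resp, counts)) / w0
        new_lam1 = sum(r * k for r, k in zip(resp, counts)) / w1
        converged = abs(new_lam0 - lam0) + abs(new_lam1 - lam1) + abs(new_pi - pi) < tol
        lam0, lam1, pi = new_lam0, new_lam1, new_pi
        if converged:
            break
    return lam0, lam1, pi, resp
```

Cells with a responsibility above 0.5 would be assigned to the shifted component. In this sketch, that is the stand-in for the "aneuploid" label.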
Maintained by Ziyi Li. Last updated 5 months ago.
software copynumbervariation hiddenmarkovmodel singlecell classification
4.18 score 4 scripts
edoardocostantini
gspcr:Generalized Supervised Principal Component Regression
Generalization of supervised principal component regression (SPCR; Bair et al., 2006, <doi:10.1198/016214505000000628>) to support continuous, binary, and discrete variables as outcomes and predictors (inspired by the 'superpc' R package <https://cran.r-project.org/package=superpc>).
Maintained by Edoardo Costantini. Last updated 12 months ago.
1 stars 4.18 score 10 scripts
ruzhangzhao
mixhvg:Mixture of Multiple Highly Variable Feature Selection Methods
Highly variable gene selection methods, including popular publicly available methods, as well as a mixture of multiple highly variable gene selection methods, <https://github.com/RuzhangZhao/mixhvg>. Reference: <doi:10.1101/2024.08.25.608519>.
Maintained by Ruzhang Zhao. Last updated 1 months ago.
rna-seq-analysis rna-seq-pipeline single-cell single-cell-rna-seq variable-selection
5 stars 4.18 score 6 scripts
adam-s-elder
amp:Statistical Test for the Multivariate Point Null Hypotheses
A framework, described in Elder et al. (2022) <arXiv:2203.01897>, for testing the multivariate point null hypothesis. After the user selects a parameter of interest and defines the assumed data generating mechanism, this information should be encoded in functions for the parameter estimator and its corresponding influence curve. Some parameter and data generating mechanism combinations are already coded in this package and are explained in detail in the article.
Maintained by Adam Elder. Last updated 3 years ago.
4.11 score 13 scripts
bioc
ssPATHS:ssPATHS: Single Sample PATHway Score
This package generates pathway scores from expression data for single samples after training on a reference cohort. The score is generated by taking the expression of a gene set (pathway) from a reference cohort and performing linear discriminant analysis to distinguish samples in the cohort in which the pathway is augmented from those in which it is not. The separating hyperplane is then used to score new samples.
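The train-then-score procedure described above can be sketched with two-class Fisher linear discriminant analysis. The direction is fitted on the reference cohort, and each new sample is scored by its position relative to the separating hyperplane. This is a minimal NumPy sketch, not ssPATHS's code. The function names, the equal-prior midpoint threshold and the small ridge term are assumptions.

```python
import numpy as np

def fit_lda_direction(X, y):
    """Fisher LDA for two classes.

    X: (n_samples, n_genes) expression of the pathway genes in the reference cohort
    y: 0/1 labels (1 = pathway augmented); each class needs at least 2 samples
    Returns (w, threshold) defining the separating hyperplane w @ x = threshold.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    mu0, mu1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
    # Pooled within-class scatter matrix; a small ridge keeps it invertible
    s_w = (np.cov(X[y == 0], rowvar=False) * (np.sum(y == 0) - 1)
           + np.cov(X[y == 1], rowvar=False) * (np.sum(y == 1) - 1))
    s_w = np.atleast_2d(s_w) + 1e-6 * np.eye(X.shape[1])
    w = np.linalg.solve(s_w, mu1 - mu0)
    threshold = w @ (mu0 + mu1) / 2  # midpoint between class means (equal priors)
    return w, threshold

def pathway_score(x_new, w, threshold):
    """Positive scores fall on the 'pathway augmented' side of the hyperplane."""
    return float(np.asarray(x_new, dtype=float) @ w - threshold)
```

Fitting once on the reference cohort and then scoring each new sample independently is what makes the score usable for a single sample.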
Maintained by Natalie R. Davidson. Last updated 5 months ago.
software geneexpression biomedicalinformatics rnaseq pathways transcriptomics dimensionreduction classification
4.00 score 1 scripts
bioc
RNAmodR.RiboMethSeq:Detection of 2'-O methylations by RiboMethSeq
RNAmodR.RiboMethSeq implements the detection of 2'-O methylations on RNA from experimental data generated with the RiboMethSeq protocol. The package builds on the core functionality of the RNAmodR package to detect specific patterns of the modifications in high throughput sequencing data.
Maintained by Felix G.M. Ernst. Last updated 5 months ago.
software workflowstep visualization sequencing bioconductor modifications ribomethseq rna rnamodr
1 stars 4.00 score 4 scripts
bioc
RNAmodR.ML:Detecting patterns of post-transcriptional modifications using machine learning
RNAmodR.ML extends the functionality of the RNAmodR package and classical detection strategies towards detection through machine learning models. RNAmodR.ML provides classes, functions and an example workflow to establish a detection strategy, which can be packaged.
Maintained by Felix G.M. Ernst. Last updated 5 months ago.
software infrastructure workflowstep visualization sequencing
1 stars 4.00 score 3 scripts
bioc
scTreeViz:R/Bioconductor package to interactively explore and visualize single cell RNA-seq datasets with hierarchical annotations
scTreeViz provides classes to support interactive data aggregation and visualization of single cell RNA-seq datasets with hierarchies, e.g., cell clusters at different resolutions. The `TreeIndex` class provides methods to manage hierarchy and split the tree at a given resolution or across resolutions. The `TreeViz` class extends `SummarizedExperiment` and can perform quick aggregations on the count matrix defined by clusters.
Maintained by Jayaram Kancherla. Last updated 5 months ago.
visualization infrastructure gui singlecell
4.00 score 3 scripts
mariaguilleng
boostingDEA:A Boosting Approach to Data Envelopment Analysis
Includes functions to estimate production frontiers and make ideal output predictions in the Data Envelopment Analysis (DEA) context using both standard models from DEA and Free Disposal Hull (FDH) and boosting techniques. In particular, EATBoosting (Guillen et al., 2023 <doi:10.1016/j.eswa.2022.119134>) and MARSBoosting. Moreover, the package includes code for estimating several technical efficiency measures using different models such as the input and output-oriented radial measures, the input and output-oriented Russell measures, the Directional Distance Function (DDF), the Weighted Additive Measure (WAM) and the Slacks-Based Measure (SBM).
Maintained by Maria D. Guillen. Last updated 2 years ago.
2 stars 4.00 score 3 scripts
fentouxungui
SeuratExplorer:A 'Shiny' App for Exploring scRNA-seq Data Processed in 'Seurat'
A simple, one-command package which runs an interactive dashboard capable of common visualizations for single cell RNA-seq. 'SeuratExplorer' requires a processed 'Seurat' object, which is saved as 'rds' or 'qs2' file.
Maintained by Yongchao Zhang. Last updated 2 days ago.
3.98 score
bioc
erccdashboard:Assess Differential Gene Expression Experiments with ERCC Controls
Technical performance metrics for differential gene expression experiments using External RNA Controls Consortium (ERCC) spike-in ratio mixtures.
Maintained by Sarah Munro. Last updated 5 months ago.
immunooncology geneexpression transcription alternativesplicing differentialexpression differentialsplicing genetics microarray mrnamicroarray rnaseq batcheffect multiplecomparison qualitycontrol
3.95 score 4 scripts
bioc
CPSM:CPSM: Cancer patient survival model
The CPSM package provides a comprehensive computational pipeline for predicting the survival probability of cancer patients. It offers a series of steps including data processing, splitting data into training and test subsets, and normalization of data. The package enables the selection of significant features based on univariate survival analysis and generates a LASSO prognostic index score. It supports the development of predictive models for survival probability using various features and provides visualization tools to draw survival curves based on predicted survival probabilities. Additionally, CPSM includes functionalities for generating bar plots that depict the predicted mean and median survival times of patients, making it a versatile tool for survival analysis in cancer research.
Maintained by Harpreet Kaur. Last updated 22 days ago.
geneexpression normalization survival
3.90 score
bioc
a4Classif:Automated Affymetrix Array Analysis Classification Package
Functionalities for classification of Affymetrix microarray data, integrating within the Automated Affymetrix Array Analysis set of packages.
Maintained by Laure Cougnaud. Last updated 5 months ago.
microarray geneexpression classification
3.78 score 1 scripts 1 dependents
nhejazi
medshift:Causal mediation analysis for stochastic interventions
Estimators of a parameter arising in the decomposition of the population intervention (in)direct effect of stochastic interventions in causal mediation analysis, including efficient one-step, targeted minimum loss (TML), re-weighting (IPW), and substitution estimators. The parameter estimated constitutes a part of each of the population intervention (in)direct effects. These estimators may be used in assessing population intervention (in)direct effects under stochastic treatment regimes, including incremental propensity score interventions and modified treatment policies. The methodology was first discussed by I Díaz and NS Hejazi (2020) <doi:10.1111/rssb.12362>.
Maintained by Nima Hejazi. Last updated 3 years ago.
causal-inference inverse-probability-weights machine-learning mediation-analysis stochastic-interventions targeted-learning treatment-effects
9 stars 3.73 score 12 scripts
banking-analytics-lab
EMP:Expected Maximum Profit Classification Performance Measure
Functions for estimating EMP (Expected Maximum Profit Measure) in Credit Risk Scoring and Customer Churn Prediction, according to Verbraken et al (2013, 2014) <DOI:10.1109/TKDE.2012.50>, <DOI:10.1016/j.ejor.2014.04.001>.
Maintained by Cristian Bravo. Last updated 6 years ago.
1 stars 3.70 score 6 scripts
liuy12
SCdeconR:Deconvolution of Bulk RNA-Seq Data using Single-Cell RNA-Seq Data as Reference
Streamlined workflow from deconvolution of bulk RNA-seq data to downstream differential expression and gene-set enrichment analysis. Provides various visualization functions.
Maintained by Yuanhang Liu. Last updated 10 months ago.
bulk-rna-seq-deconvolution deconvolution differential-expression ffpe geneset-enrichment-analysis scdeconr single-cell
4 stars 3.60 score 4 scripts
barnhilldave
TML:Tropical Geometry Tools for Machine Learning
Suite of tropical geometric tools for use in machine learning applications. These methods may be summarized in the following references: Yoshida, et al. (2022) <arxiv:2209.15045>, Barnhill et al. (2023) <arxiv:2303.02539>, Barnhill and Yoshida (2023) <doi:10.3390/math11153433>, Aliatimis et al. (2023) <arXiv:2306.08796>, Yoshida et al. (2022) <arXiv:2206.04206>, and Yoshida et al. (2019) <doi:10.1007/s11538-018-0493-4>.
Maintained by David Barnhill. Last updated 8 months ago.
3 stars 3.48 score 1 scripts
bioc
SCArray.sat:Large-scale single-cell RNA-seq data analysis using GDS files and Seurat
Extends the Seurat classes and functions to support Genomic Data Structure (GDS) files as a DelayedArray backend for data representation. It relies on the implementation of GDS-based DelayedMatrix in the SCArray package to represent single cell RNA-seq data. The common optimized algorithms leveraging GDS-based and single cell-specific DelayedMatrix (SC_GDSMatrix) are implemented in the SCArray package. SCArray.sat introduces a new SCArrayAssay class (derived from the Seurat Assay), which wraps raw counts, normalized expressions and scaled data matrix based on GDS-specific DelayedMatrix. It is designed to integrate seamlessly with the Seurat package to provide common data analysis in the SeuratObject-based workflow. Compared with Seurat, SCArray.sat significantly reduces the memory usage without downsampling and can be applied to very large datasets.
Maintained by Xiuwen Zheng. Last updated 9 days ago.
datarepresentation dataimport singlecell rnaseq
1 stars 3.48 score 3 scripts
bioc
a4:Automated Affymetrix Array Analysis Umbrella Package
Umbrella package for the entire Automated Affymetrix Array Analysis suite of packages.
Maintained by Laure Cougnaud. Last updated 5 months ago.
3.48 score 15 scripts
thecailab
SCRIP:An Accurate Simulator for Single-Cell RNA Sequencing Data
We provide a comprehensive scheme that is capable of simulating Single Cell RNA Sequencing data for various parameters of Biological Coefficient of Variation, bursting kinetics, differential expression (DE), cell or sample groups, cell trajectory, batch effect and other experimental designs. 'SCRIP' proposed and compared two frameworks, with Gamma-Poisson and Beta-Gamma-Poisson models, for simulating Single Cell RNA Sequencing data. See also Zappia et al. (2017) <https://genomebiology.biomedcentral.com/articles/10.1186/s13059-017-1305-0>.
Maintained by Fei Qin. Last updated 2 years ago.
2 stars 3.41 score 13 scripts
jackmwolf
tehtuner:Fit and Tune Models to Detect Treatment Effect Heterogeneity
Implements methods to fit Virtual Twins models (Foster et al. (2011) <doi:10.1002/sim.4322>) for identifying subgroups with differential effects in the context of clinical trials while controlling the probability of falsely detecting a differential effect when the conditional average treatment effect is uniform across the study population using parameter selection methods proposed in Wolf et al. (2022) <doi:10.1177/17407745221095855>.
Maintained by Jack Wolf. Last updated 2 years ago.
clinical-trials heterogeneity-of-treatment-effect subgroup-identification
5 stars 3.40 score 6 scripts
mqnjqrid
drpop:Efficient and Doubly Robust Population Size Estimation
Estimation of the total population size from capture-recapture data efficiently and with low bias implementing the methods from Das M, Kennedy EH, and Jewell NP (2021) <arXiv:2104.14091>. The estimator is doubly robust against errors in the estimation of the intermediate nuisance parameters. Users can choose from the flexible estimation models provided in the package, or use any other preferred model.
Maintained by Manjari Das. Last updated 3 years ago.
5 stars 3.40 score 2 scripts
boshiangke
influenceAUC:Identify Influential Observations in Binary Classification
Ke, B. S., Chiang, A. J., & Chang, Y. C. I. (2018) <doi:10.1080/10543406.2017.1377728> provide two theoretical methods (influence function and local influence) based on the area under the receiver operating characteristic curve (AUC) to quantify the numerical impact of each observation on the overall AUC. Alternative graphical tools, cumulative lift charts, are proposed to reveal the existence and approximate locations of those influential observations through data visualization.
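To make "the impact of each observation on the overall AUC" concrete, the sketch below computes the AUC as a Mann-Whitney probability. It then measures each observation's effect by simple deletion: the full AUC minus the AUC with that observation left out. This leave-one-out measure is a swapped-in empirical illustration. It is not the influence-function or local-influence method of Ke et al., and the function names are hypothetical.

```python
def auc(scores, labels):
    """AUC as the probability that a positive outscores a negative (ties count 1/2)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def deletion_influence(scores, labels):
    """Full AUC minus leave-one-out AUC for each observation.

    Assumes both classes keep at least one member after any single deletion.
    Negative values mark observations whose presence lowers the AUC.
    """
    full = auc(scores, labels)
    return [
        full - auc(scores[:i] + scores[i + 1:], labels[:i] + labels[i + 1:])
        for i in range(len(scores))
    ]
```

In the test data, a negative case scored 0.85 sits above one of the positives. Removing it raises the AUC from 5/6 to 1, so its deletion influence is -1/6.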
Maintained by Bo-Shiang Ke. Last updated 5 months ago.
3.30 score
cefet-rj-dal
daltoolboxdp:Data Pre-Processing Extensions
An important aspect of data analytics is related to data management support for artificial intelligence. It is related to preparing data correctly. This package provides extensions to support data preparation in terms of both data sampling and data engineering. Overall, the package provides researchers with a comprehensive set of functionalities for data science based on experiment lines, promoting ease of use, extensibility, and integration with various tools and libraries. Information on Experiment Line is based on Ogasawara et al. (2009) <doi:10.1007/978-3-642-02279-1_20>.
Maintained by Eduardo Ogasawara. Last updated 4 months ago.
1 stars 3.26 score 12 scripts
abshev
superMICE:SuperLearner Method for MICE
Adds a Super Learner ensemble model method (using the 'SuperLearner' package) to the 'mice' package. Laqueur, H. S., Shev, A. B., Kagawa, R. M. C. (2021) <doi:10.1093/aje/kwab271>.
Maintained by Aaron B. Shev. Last updated 3 years ago.
3 stars 3.18 score
tlverse
tmle3mediate:Targeted Learning for Causal Mediation Analysis
Targeted maximum likelihood (TML) estimation of population-level causal effects in mediation analysis. The causal effects are defined by joint static or stochastic interventions applied to the exposure and the mediator. Targeted doubly robust estimators are provided for the classical natural direct and indirect effects, as well as the more recently developed population intervention direct and indirect effects.
Maintained by Nima Hejazi. Last updated 4 years ago.
causal-inference causal-mediation-analysis machine-learning mediation-analysis stochastic-interventions targeted-learning treatment-effects
6 stars 2.98 score 16 scripts
obenno
scSpotlight:A Single Cell Analysis Shiny App
A single cell analysis (viewer) app based on Seurat.
Maintained by Zhixia Xiao. Last updated 8 months ago.
2 stars 2.78 score
yuepan027
scpoisson:Single Cell Poisson Probability Paradigm
Useful to visualize the Poissoneity (an independent Poisson statistical framework, where each RNA measurement for each cell comes from its own independent Poisson distribution) of Unique Molecular Identifier (UMI) based single cell RNA sequencing (scRNA-seq) data, and explore cell clustering based on model departure as a novel data representation.
Maintained by Yue Pan. Last updated 3 years ago.
2.70 score 4 scripts
jaydevine
pheble:Classifying High-Dimensional Phenotypes with Ensemble Learning
A system for binary and multi-class classification of high-dimensional phenotypic data using ensemble learning. By combining predictions from different classification models, this package attempts to improve performance over individual learners. The pre-processing, training, validation, and testing are performed end-to-end to minimize user input and simplify the process of classification.
Maintained by Jay Devine. Last updated 2 years ago.
2.70 score
doktorandahl
evinf:Inference with Extreme Value Inflated Count Data
Allows users to model and draw inferences from extreme value inflated count data, and to evaluate these models and compare to non extreme-value inflated counterparts. The package is built to be compatible with standard presentation tools such as 'broom', 'tidy', and 'modelsummary'.
Maintained by David Randahl. Last updated 11 months ago.
1 star 2.70 score
sridhara-omics
scPipeline:A Wrapper for 'Seurat' and Related R Packages for End-to-End Single Cell Analysis
Reports marker lists, differentially expressed genes, associated pathways, and cell-type annotations, and performs batch correction and other related single cell analyses, all wrapped within 'Seurat'.
Maintained by Viswanadham Sridhara. Last updated 27 days ago.
2.70 score
schiebout
CAMML:Cell-Typing using Variance Adjusted Mahalanobis Distances with Multi-Labeling
Creates multi-label cell-types for single-cell RNA-sequencing data based on weighted VAM scoring of cell-type specific gene sets. Schiebout, Frost (2022) <https://psb.stanford.edu/psb-online/proceedings/psb22/schiebout.pdf>.
Maintained by Courtney Schiebout. Last updated 1 year ago.
2.60 score
csoneson
ConfoundingExplorer:Confounding Explorer
This package provides a simple interactive application for investigating the effect of confounding between a signal of interest and a batch effect. It uses simulated data with user-specified effect sizes for both batch and condition effects. The user can also specify the number of samples in each condition and batch, and thereby the degree of confounding.
Maintained by Charlotte Soneson. Last updated 3 months ago.
regression experimentaldesign multiplecomparison batcheffect
2 stars 2.60 score 3 scripts
promidat
predictoR:Predictive Data Analysis System
Perform a supervised data analysis on a database through a 'shiny' graphical interface. It includes methods such as K-Nearest Neighbors, Decision Trees, ADA Boosting, Extreme Gradient Boosting, Random Forest, Neural Networks, Deep Learning, Support Vector Machines and Bayesian Methods.
Maintained by Oldemar Rodriguez. Last updated 1 year ago.
1 star 2.60 score 3 scripts
igordot
scooter:Streamlined scRNA-Seq Analysis Pipeline
Streamlined scRNA-Seq analysis pipeline.
Maintained by Igor Dolgalev. Last updated 1 year ago.
4 stars 2.51 score 16 scripts
mbeer3
gkmSVM:Gapped-Kmer Support Vector Machine
Imports the 'gkmSVM' v2.0 functionalities into R <https://www.beerlab.org/gkmsvm/>. It also uses the 'kernlab' library (a separate R package by different authors) for various SVM algorithms. Users should note that the suggested packages 'rtracklayer', 'GenomicRanges', 'BSgenome', 'BiocGenerics', 'Biostrings', 'GenomeInfoDb', 'IRanges', and 'S4Vectors' are all Bioconductor packages <https://bioconductor.org>.
Maintained by Mike Beer. Last updated 2 years ago.
2.48 score 30 scripts
bioc
maPredictDSC:Phenotype prediction using microarray data: approach of the best overall team in the IMPROVER Diagnostic Signature Challenge
This package implements the classification pipeline of the best overall team (Team221) in the IMPROVER Diagnostic Signature Challenge. Additional functionality is added to compare 27 combinations of data preprocessing, feature selection and classifier types.
Maintained by Adi Laurentiu Tarca. Last updated 5 months ago.
2.30 score 2 scripts
hugometric
causalweight:Estimation Methods for Causal Inference Based on Inverse Probability Weighting and Doubly Robust Estimation
Various estimators of causal effects based on inverse probability weighting, doubly robust estimation, and double machine learning. Specifically, the package includes methods for estimating average treatment effects, direct and indirect effects in causal mediation analysis, and dynamic treatment effects. The models refer to studies of Froelich (2007) <doi:10.1016/j.jeconom.2006.06.004>, Huber (2012) <doi:10.3102/1076998611411917>, Huber (2014) <doi:10.1080/07474938.2013.806197>, Huber (2014) <doi:10.1002/jae.2341>, Froelich and Huber (2017) <doi:10.1111/rssb.12232>, Hsu, Huber, Lee, and Lettry (2020) <doi:10.1002/jae.2765>, and others.
Maintained by Hugo Bodory. Last updated 8 days ago.
2 stars 2.12 score 22 scripts
cran
VIMPS:Calculate Variable Importance with Knock Off Variables
Variable importance is calculated using knock off variables, and the output can be provided in numerical and graphical form. Meredith L Wallace (2023) <doi:10.1186/s12874-023-01965-x>.
Maintained by Meredith Wallace. Last updated 1 year ago.
2.00 score
drelliesmall
phm:Phrase Mining
Functions to extract and handle commonly occurring principal phrases obtained from collections of texts.
Maintained by Ellie Small. Last updated 1 year ago.
1 star 2.00 score 2 scripts
raznargimeno
SLModels:Stepwise Linear Models for Binary Classification Problems under Youden Index Optimisation
Stepwise models for the optimal linear combination of continuous variables in binary classification problems under Youden Index optimisation. Information on the models implemented can be found at Aznar-Gimeno et al. (2021) <doi:10.3390/math9192497>.
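The Youden index criterion can be illustrated with a minimal sketch. It is written in Python for illustration only, because the package itself is an R package, and it does not implement the package's stepwise algorithm or API. The sketch assumes that a sample is called positive when its score is at or above the threshold, and it replaces the optimisation with a brute-force grid over one weight `w` in the combination `x1 + w * x2`. `youden_threshold` and `best_pair_combination` are hypothetical helper names.

```python
from typing import List, Optional, Sequence, Tuple


def youden_threshold(scores: Sequence[float], labels: Sequence[int]) -> Tuple[float, float]:
    """Return (threshold, J) maximising J = sensitivity + specificity - 1.

    Assumption: a sample is called positive when its score is >= threshold;
    every observed score is tried as a candidate threshold.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    best_t, best_j = scores[0], -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < t and y == 0)
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


def best_pair_combination(
    x1: Sequence[float], x2: Sequence[float], labels: Sequence[int], weights: Sequence[float]
) -> Tuple[float, float, float]:
    """Hypothetical grid search over w in x1 + w * x2; returns (w, threshold, J) with the largest J."""
    best: Optional[Tuple[float, float, float]] = None
    for w in weights:
        combo: List[float] = [a + w * b for a, b in zip(x1, x2)]
        t, j = youden_threshold(combo, labels)
        if best is None or j > best[2]:
            best = (w, t, j)
    assert best is not None, "weights must be non-empty"
    return best


print(youden_threshold([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # (0.35, 0.5)
```

A finer or adaptive search over `w` would replace the fixed grid in practice. For more than two variables, a stepwise approach would add one variable at a time in this manner.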
Maintained by Rocio Aznar-Gimeno. Last updated 3 years ago.
2.00 score
ictml-project
ictml:Easily Install and Load the 'ICTML'
The 'ICTML' is a software suite implementing the Interpretable Causal Targeted Machine Learning ('ICTML') Project <https://www.ictml.org>.
Maintained by Henrik Bengtsson. Last updated 5 months ago.
2.00 score
aughunter
autoScorecard:Fully Automatic Generation of Scorecards
Provides an efficient suite of R tools for scorecard modeling, analysis, and visualization, including equal-frequency binning, equidistant binning, K-means binning, chi-square binning, decision tree binning, data screening, manual parameter modeling, and fully automatic generation of scorecards. This package is designed to make scorecard development easier and faster. References include: 1. <http://shichen.name/posts/>. 2. Dong-feng Li (Peking University), Class PPT. 3. <https://zhuanlan.zhihu.com/p/389710022>. 4. <https://www.zhangshengrong.com/p/281oqR9JNw/>.
Maintained by Tai-Sen Zheng. Last updated 2 years ago.
2.00 score 2 scripts
ubcxzhang
scAnnotate:An Automated Cell Type Annotation Tool for Single-Cell RNA-Sequencing Data
An entirely data-driven cell type annotation tool, which requires training data to learn the classifier but not biological knowledge to make subjective decisions. It consists of three steps: preprocessing training and test data, model fitting on training data, and cell classification on test data. See Xiangling Ji, Danielle Tsao, Kailun Bai, Min Tsao, Li Xing, Xuekui Zhang (2022) <doi:10.1101/2022.02.19.481159> for more details.
Maintained by Xuekui Zhang. Last updated 1 year ago.
2.00 score 4 scripts
cran
PoweREST:A Bootstrap-Based Power Estimation Tool for Spatial Transcriptomics
Power estimation and sample size calculation for 10X Visium Spatial Transcriptomics data to detect differentially expressed genes between two conditions based on bootstrap resampling. See Shui et al. (2024) <doi:10.1101/2024.08.30.610564> for method details.
Maintained by Lan Shui. Last updated 7 months ago.
2.00 score
syeonkang
causal.decomp:Causal Decomposition Analysis
We implement causal decomposition analysis using the methods proposed by Park, Lee, and Qin (2020) and Park, Kang, and Lee (2021+) <arXiv:2109.06940>. This package allows researchers to use the multiple-mediator-imputation, single-mediator-imputation, and product-of-coefficients regression methods to estimate the initial disparity, disparity reduction, and disparity remaining. It also allows inference conditional on baseline covariates. We also implement sensitivity analysis for the causal decomposition analysis using R-squared values as sensitivity parameters (Park, Kang, Lee, and Ma, 2023).
Maintained by Suyeon Kang. Last updated 2 years ago.
2.00 score 4 scripts
dzhang777
SlideCNA:Calls Copy Number Alterations from Slide-Seq Data
Takes spatial single-cell-type RNA-seq data (specifically designed for Slide-seq v2), calls copy number alterations (CNAs) using pseudo-spatial binning, clusters cellular units (e.g. beads) based on CNA profile, and visualizes spatial CNA patterns. Documentation about 'SlideCNA' is included in the pre-print by Zhang et al. (2022, <doi:10.1101/2022.11.25.517982>). The package 'enrichR' (>= 3.0), conditionally used to annotate SlideCNA-determined clusters with gene ontology terms, can be installed from <https://github.com/wjawaid/enrichR> or with install_github("wjawaid/enrichR").
Maintained by Diane Zhang. Last updated 2 months ago.
1.70 score 3 scripts
cran
crossurr:Cross-Fitting for Doubly Robust Evaluation of High-Dimensional Surrogate Markers
Doubly robust methods for evaluating surrogate markers as outlined in: Agniel D, Hejblum BP, Thiebaut R & Parast L (2022). "Doubly robust evaluation of high-dimensional surrogate markers", Biostatistics <doi:10.1093/biostatistics/kxac020>. You can use these methods to determine how much of the overall treatment effect is explained by a (possibly high-dimensional) set of surrogate markers.
Maintained by Denis Agniel. Last updated 10 months ago.
1.70 score
sduxbury
netmediate:Micro-Macro Analysis for Social Networks
Estimates micro effects on macro structures (MEMS) and average micro mediated effects (AMME). URL: <https://github.com/sduxbury/netmediate>. BugReports: <https://github.com/sduxbury/netmediate/issues>. Robins, Garry, Phillipa Pattison, and Jodie Woolcock (2005) <doi:10.1086/427322>. Snijders, Tom A. B., and Christian E. G. Steglich (2015) <doi:10.1177/0049124113494573>. Imai, Kosuke, Luke Keele, and Dustin Tingley (2010) <doi:10.1037/a0020761>. Duxbury, Scott (2023) <doi:10.1177/00811750231209040>. Duxbury, Scott (2024) <doi:10.1177/00811750231220950>.
Maintained by Scott Duxbury. Last updated 10 months ago.
1.70 score
bozercavdar
less:Learning with Subset Stacking
"Learning with Subset Stacking" is a supervised learning algorithm that is based on training many local estimators on subsets of a given dataset, and then passing their predictions to a global estimator. You can find the details about LESS in our manuscript at <arXiv:2112.06251>.
Maintained by Burhan Ozer Cavdar. Last updated 3 years ago.
1.70 score 5 scripts
kirin666
mccf1:Creates the MCC-F1 Curve and Calculates the MCC-F1 Metric and the Best Threshold
The MCC-F1 analysis is a method to evaluate the performance of binary classifiers. The MCC-F1 curve is more reliable than the Receiver Operating Characteristic (ROC) curve and the Precision-Recall (PR) curve under imbalanced ground truth. The MCC-F1 analysis also provides the MCC-F1 metric, which integrates classifier performance over varying thresholds, and the best threshold of binary classification.
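The points that make up an MCC-F1 curve can be illustrated with a minimal sketch. It is written in Python for illustration only and does not reproduce the 'mccf1' package's API or its summary MCC-F1 metric. The sketch makes the following assumptions:

- A sample is predicted positive when its score is at or above the threshold.
- MCC is rescaled from [-1, 1] to [0, 1] as (MCC + 1) / 2.
- Undefined MCC values (zero denominator) are set to 0.
- The "best threshold" is taken to be the one whose (F1, normalised MCC) point lies closest to the perfect-performance corner (1, 1).

`mcc_f1_points` and `closest_to_perfect` are hypothetical names.

```python
import math
from typing import List, Sequence, Tuple


def mcc_f1_points(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float, float]]:
    """Return (threshold, normalised MCC, F1) for every observed score used as a threshold."""
    points = []
    for t in sorted(set(scores)):
        tp = fp = tn = fn = 0
        for s, y in zip(scores, labels):
            predicted_pos = s >= t
            if predicted_pos and y == 1:
                tp += 1
            elif predicted_pos:
                fp += 1
            elif y == 0:
                tn += 1
            else:
                fn += 1
        denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
        mcc = (tp * tn - fp * fn) / denom if denom > 0 else 0.0  # assumed convention for undefined MCC
        f1 = 2 * tp / (2 * tp + fp + fn) if tp > 0 else 0.0
        points.append((t, (mcc + 1) / 2, f1))  # rescale MCC from [-1, 1] to [0, 1]
    return points


def closest_to_perfect(points: Sequence[Tuple[float, float, float]]) -> float:
    """Assumed best-threshold rule: the point nearest (F1, normalised MCC) = (1, 1)."""
    return min(points, key=lambda p: math.hypot(1 - p[2], 1 - p[1]))[0]


pts = mcc_f1_points([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
print(closest_to_perfect(pts))  # 0.35
```

Plotting F1 on the x-axis against normalised MCC on the y-axis for these points traces the curve. Both axes stay in [0, 1], so the distance to (1, 1) is a simple, scale-free way to compare thresholds.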
Maintained by Chang Cao. Last updated 5 years ago.
1.61 score 41 scripts
lau-mel
rocc:ROC Based Classification
Functions for a classification method based on receiver operating characteristics (ROC). Briefly, features are selected according to their ranked AUC value in the training set. The selected features are merged by the mean value to form a meta-gene. The samples are ranked by their meta-gene value, and the meta-gene threshold that has the highest accuracy in splitting the training samples is determined. A new sample is classified by its meta-gene value relative to the threshold. The package is primarily aimed at two-class problems in gene expression data, but it might also apply to other problems.
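The four steps described for 'rocc' can be sketched minimally in Python. This is an illustration only, not the package's R code or API. The sketch makes the following assumptions:

- AUC is computed with the Mann-Whitney pairwise statistic.
- Only features whose higher values indicate class 1 rank highly, because the sketch sorts by AUC in descending order.
- Raw feature values are averaged without any standardisation.
- Candidate thresholds are the observed training meta-gene values.

`auc`, `fit_rocc` and `predict_rocc` are hypothetical names.

```python
from typing import List, Sequence, Tuple


def auc(values: Sequence[float], labels: Sequence[int]) -> float:
    """AUC via the Mann-Whitney statistic: share of (positive, negative) pairs
    where the positive scores higher, counting ties as one half."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_rocc(X: Sequence[Sequence[float]], y: Sequence[int], n_features: int) -> Tuple[List[int], float]:
    n_cols = len(X[0])
    # 1. Rank features by training-set AUC and keep the top n_features.
    aucs = [auc([row[j] for row in X], y) for j in range(n_cols)]
    selected = sorted(range(n_cols), key=lambda j: aucs[j], reverse=True)[:n_features]
    # 2. Merge the selected features by their mean into one meta-gene value per sample.
    meta = [sum(row[j] for j in selected) / len(selected) for row in X]
    # 3. Try each observed meta-gene value as a threshold; keep the most accurate one.
    best_t, best_acc = meta[0], -1.0
    for t in sorted(meta):
        acc = sum((m >= t) == (label == 1) for m, label in zip(meta, y)) / len(y)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return selected, best_t


def predict_rocc(row: Sequence[float], selected: Sequence[int], threshold: float) -> int:
    # 4. Classify a new sample by its meta-gene value relative to the threshold.
    meta = sum(row[j] for j in selected) / len(selected)
    return 1 if meta >= threshold else 0
```

For example, `fit_rocc([[1, 5], [2, 4], [8, 6], [9, 5]], [0, 0, 1, 1], n_features=1)` keeps feature 0 (AUC 1.0) and picks a threshold of 8.0, so a new sample `[8.5, 0.0]` is assigned to class 1.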
Maintained by Martin Lauss. Last updated 5 years ago.
1.56 score 36 scripts
drelliesmall
smallstuff:Dr. Small's Functions
Functions used in courses taught by Dr. Small at Drew University.
Maintained by Ellie Small. Last updated 1 year ago.
1.48 score 2 scripts 1 dependent
sduxbury
ergMargins:Process Analysis for Exponential Random Graph Models
Calculates marginal effects and conducts process analysis in exponential family random graph models (ERGM). Includes functions to conduct mediation and moderation analyses and to diagnose multicollinearity. URL: <https://github.com/sduxbury/ergMargins>. BugReports: <https://github.com/sduxbury/ergMargins/issues>. Duxbury, Scott W (2021) <doi:10.1177/0049124120986178>. Long, J. Scott, and Sarah Mustillo (2018) <doi:10.1177/0049124118799374>. Mize, Trenton D. (2019) <doi:10.15195/v6.a4>. Karlson, Kristian Bernt, Anders Holm, and Richard Breen (2012) <doi:10.1177/0081175012444861>. Duxbury, Scott W (2018) <doi:10.1177/0049124118782543>. Duxbury, Scott W, Jenna Wertsching (2023) <doi:10.1016/j.socnet.2023.02.003>. Huang, Peng, Carter Butts (2023) <doi:10.1016/j.socnet.2023.07.001>.
Maintained by Scott Duxbury. Last updated 11 months ago.
1.48 score 3 scripts 1 dependent
jiayiji
CIMTx:Causal Inference for Multiple Treatments with a Binary Outcome
Different methods to conduct causal inference for multiple treatments with a binary outcome, including regression adjustment, vector matching, Bayesian additive regression trees, targeted maximum likelihood and inverse probability of treatment weighting using different generalized propensity score models such as multinomial logistic regression, generalized boosted models and super learner. For more details, see the paper by Hu et al. <doi:10.1177/0962280220921909>.
Maintained by Jiayi Ji. Last updated 3 years ago.
1.43 score 27 scripts
jodamatta
SLOS:ICU Length of Stay Prediction and Efficiency Evaluation
Provides tools for predicting ICU length of stay and assessing ICU efficiency. It is based on the methodologies proposed by Peres et al. (2022, 2023), which utilize data-driven approaches for modeling and validation, offering insights into ICU performance and patient outcomes. References: Peres et al. (2022) <https://pubmed.ncbi.nlm.nih.gov/35988701/>, Peres et al. (2023) <https://pubmed.ncbi.nlm.nih.gov/37922007/>. More information: <https://github.com/igor-peres/ICU-Length-of-Stay-Prediction>.
Maintained by Joana da Matta. Last updated 2 months ago.
1.30 score
sg-tlr
twoStageDesignTMLE:Targeted Maximum Likelihood Estimation for Two-Stage Study Design
An inverse probability of censoring weighted (IPCW) targeted maximum likelihood estimator (TMLE) for evaluating a marginal point treatment effect from data where some variables were collected on only a subset of participants using a two-stage design (or marginal mean outcome for a single arm study). A TMLE for conditional parameters defined by a marginal structural model (MSM) is also available.
Maintained by Susan Gruber. Last updated 2 months ago.
1.30 score
cran
regressoR:Regression Data Analysis System
Perform a supervised data analysis on a database through a 'shiny' graphical interface. It includes methods such as linear regression, penalized regression, k-nearest neighbors, decision trees, ada boosting, extreme gradient boosting, random forest, neural networks, deep learning and support vector machines.
Maintained by Oldemar Rodriguez. Last updated 5 months ago.
2 stars 1.30 score
shuangsong0110
EBPRS:Derive Polygenic Risk Score Based on Empirical Bayes Theory
EB-PRS is a novel method that leverages information for effect sizes across all the markers to improve the prediction accuracy. No parameter tuning is needed in the method, and no external information is needed. This R-package provides the calculation of polygenic risk scores from the given training summary statistics and testing data. We can use EB-PRS to extract main information, estimate Empirical Bayes parameters, derive polygenic risk scores for each individual in testing data, and evaluate the PRS according to AUC and predictive r2. See Song et al. (2020) <doi:10.1371/journal.pcbi.1007565> for a detailed presentation of the method.
Maintained by Shuang Song. Last updated 5 years ago.
2 stars 1.30 score 10 scripts