Showing 200 of 1071 results

branchlab

metasnf:Meta Clustering with Similarity Network Fusion

Framework to facilitate patient subtyping with similarity network fusion and meta clustering. The similarity network fusion (SNF) algorithm was introduced by Wang et al. (2014) in <doi:10.1038/nmeth.2810>. SNF is a data integration approach that can transform high-dimensional and diverse data types into a single similarity network suitable for clustering with minimal loss of information from each initial data source. The meta clustering approach was introduced by Caruana et al. (2006) in <doi:10.1109/ICDM.2006.103>. Meta clustering involves generating a wide range of cluster solutions by adjusting clustering hyperparameters, then clustering the solutions themselves into a manageable number of qualitatively similar solutions, and finally characterizing representative solutions to find ones that are best for the user's specific context. This package provides a framework to easily transform multi-modal data into a wide range of similarity network fusion-derived cluster solutions as well as to visualize, characterize, and validate those solutions. Core package functionality includes easy customization of distance metrics, clustering algorithms, and SNF hyperparameters to generate diverse clustering solutions; calculation and plotting of associations between features, between patients, and between cluster solutions; and standard cluster validation approaches including resampled measures of cluster stability, standard metrics of cluster quality, and label propagation to evaluate generalizability in unseen data. Associated vignettes guide the user through using the package to identify patient subtypes while adhering to best practices for unsupervised learning.

Maintained by Prashanth S Velayudhan. Last updated 4 days ago.

bioinformatics clustering metaclustering snf

17.7 match 8 stars 8.21 score 30 scripts

eikeluedeling

chillR:Statistical Methods for Phenology Analysis in Temperate Fruit Trees

The phenology of plants (i.e. the timing of their annual life phases) depends on climatic cues. For temperate trees and many other plants, spring phases, such as leaf emergence and flowering, have been found to result from the effects of both cool (chilling) conditions and heat. Fruit tree scientists (pomologists) have developed some metrics to quantify chilling and heat (e.g. see Luedeling (2012) <doi:10.1016/j.scienta.2012.07.011>). 'chillR' contains functions for processing temperature records into chilling (Chilling Hours, Utah Chill Units and Chill Portions) and heat units (Growing Degree Hours). Regarding chilling metrics, Chill Portions are often considered the most promising, but they are difficult to calculate. This package makes it easy. 'chillR' also contains procedures for conducting a PLS analysis relating phenological dates (e.g. bloom dates) to either mean temperatures or mean chill and heat accumulation rates, based on long-term weather and phenology records (Luedeling and Gassner (2012) <doi:10.1016/j.agrformet.2011.10.020>). As of version 0.65, it also includes functions for generating weather scenarios with a weather generator, for conducting climate change analyses for temperature-based climatic metrics and for plotting results from such analyses. Since version 0.70, 'chillR' contains a function for interpolating hourly temperature records.

Maintained by Eike Luedeling. Last updated 4 months ago.

cpp

17.8 match 3 stars 6.13 score 346 scripts 1 dependent
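
The Chilling Hours metric mentioned in the chillR entry above can be illustrated with a minimal Python sketch. It assumes the common definition of a chilling hour as an hour with a temperature between 0 and 7.2 °C; the function name, the default bounds, and the handling of the boundary values are assumptions made for illustration and are not taken from chillR itself.

```python
from typing import Iterable


def chilling_hours(hourly_temps_c: Iterable[float],
                   lower: float = 0.0,
                   upper: float = 7.2) -> int:
    """Count hours whose temperature lies in (lower, upper] degrees Celsius.

    Hypothetical helper: the 0 to 7.2 degree window is the commonly cited
    Chilling Hours range; treating the lower bound as exclusive and the
    upper bound as inclusive is an assumption of this sketch.
    """
    return sum(1 for temp in hourly_temps_c if lower < temp <= upper)


# Six hourly readings; only 0.5, 3.0 and 7.2 fall inside the window.
example_temps = [-1.0, 0.5, 3.0, 7.2, 7.3, 12.0]
print(chilling_hours(example_temps))
```

Chill Portions and Growing Degree Hours require more involved models and are not covered by this sketch.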

globalecologylab

poems:Pattern-Oriented Ensemble Modeling System

A framework of interoperable R6 classes (Chang, 2020, <https://CRAN.R-project.org/package=R6>) for building ensembles of viable models via the pattern-oriented modeling (POM) approach (Grimm et al., 2005, <doi:10.1126/science.1116681>). The package includes classes for encapsulating and generating model parameters, and managing the POM workflow. The workflow includes: model setup; generating model parameters via Latin hypercube sampling (Iman & Conover, 1980, <doi:10.1080/03610928008827996>); running multiple sampled model simulations; collating summary results; and validating and selecting an ensemble of models that best match known patterns. By default, model validation and selection utilizes an approximate Bayesian computation (ABC) approach (Beaumont et al., 2002, <doi:10.1093/genetics/162.4.2025>), although alternative user-defined functionality could be employed. The package includes a spatially explicit demographic population model simulation engine, which incorporates default functionality for density dependence, correlated environmental stochasticity, stage-based transitions, and distance-based dispersal. The user may customize the simulator by defining functionality for translocations, harvesting, mortality, and other processes, as well as defining the sequence order for the simulator processes. The framework could also be adapted for use with other model simulators by utilizing its extendable (inheritable) base classes.

Maintained by July Pilowsky. Last updated 19 days ago.

biogeography population-model process-based

13.1 match 10 stars 8.05 score 59 scripts 2 dependents
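
The Latin hypercube sampling step named in the poems entry above can be sketched generically in Python. This is not the poems API: the function name, its arguments, and the two example parameter ranges are hypothetical and only illustrate the general technique of drawing one value from each equal-width stratum of every parameter range.

```python
import random
from typing import List, Optional, Sequence, Tuple


def latin_hypercube(n_samples: int,
                    bounds: Sequence[Tuple[float, float]],
                    seed: Optional[int] = None) -> List[Tuple[float, ...]]:
    """Draw n_samples points so that each parameter's range is split into
    n_samples equal strata and every stratum is sampled exactly once."""
    rng = random.Random(seed)
    columns = []
    for low, high in bounds:
        # One uniform draw inside each stratum of the unit interval.
        unit_points = [(i + rng.random()) / n_samples for i in range(n_samples)]
        # Shuffle so strata are paired at random across parameters.
        rng.shuffle(unit_points)
        columns.append([low + p * (high - low) for p in unit_points])
    # Transpose per-parameter columns into per-sample tuples.
    return [tuple(col[i] for col in columns) for i in range(n_samples)]


# Hypothetical ranges for two model parameters.
samples = latin_hypercube(5, [(0.0, 1.0), (10.0, 20.0)], seed=42)
for sample in samples:
    print(sample)
```

Each tuple would then parameterize one simulation run; how runs are executed and validated is outside the scope of this sketch.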

bioxgeo

geodiv:Methods for Calculating Gradient Surface Metrics

Methods for calculating gradient surface metrics for continuous analysis of landscape features.

Maintained by Annie C. Smith. Last updated 1 year ago.

cpp

14.8 match 11 stars 5.88 score 23 scripts 1 dependent

pharmar

riskmetric:Risk Metrics to Evaluating R Packages

Facilities for assessing R packages against a number of metrics to help quantify their robustness.

Maintained by Eli Miller. Last updated 8 days ago.

9.4 match 167 stars 8.89 score 43 scripts

climateanalytics

foreSIGHT:Systems Insights from Generation of Hydroclimatic Timeseries

A tool to create hydroclimate scenarios, stress test systems and visualize system performance in scenario-neutral climate change impact assessments. Scenario-neutral approaches 'stress-test' the performance of a modelled system by applying a wide range of plausible hydroclimate conditions (see Brown & Wilby (2012) <doi:10.1029/2012EO410001> and Prudhomme et al. (2010) <doi:10.1016/j.jhydrol.2010.06.043>). These approaches allow the identification of hydroclimatic variables that affect the vulnerability of a system to hydroclimate variation and change. This tool enables the generation of perturbed time series using a range of approaches including simple scaling of observed time series (e.g. Culley et al. (2016) <doi:10.1002/2015WR018253>) and stochastic simulation of perturbed time series via an inverse approach (see Guo et al. (2018) <doi:10.1016/j.jhydrol.2016.03.025>). It incorporates 'Richardson-type' weather generator model configurations documented in Richardson (1981) <doi:10.1029/WR017i001p00182> and Richardson and Wright (1984), as well as latent variable type model configurations documented in Bennett et al. (2018) <doi:10.1016/j.jhydrol.2016.12.043>, Rasmussen (2013) <doi:10.1002/wrcr.20164> and Bennett et al. (2019) <doi:10.5194/hess-23-4783-2019>, to generate hydroclimate variables on a daily basis (e.g. precipitation, temperature, potential evapotranspiration), and allows a variety of different hydroclimate variable properties, herein called attributes, to be perturbed. Options are included for the easy integration of existing system models both internally in R and externally for seamless 'stress-testing'. A suite of visualization options for the results of a scenario-neutral analysis (e.g. plotting performance spaces and overlaying climate projection information) is also included. Version 1.0 of this package is described in Bennett et al. (2021) <doi:10.1016/j.envsoft.2021.104999>. As further developments in scenario-neutral approaches occur, the tool will be updated to incorporate these advances.

Maintained by David McInerney. Last updated 1 year ago.

cpp

19.3 match 1 star 3.60 score 20 scripts

riazakhan94

ROCit:Performance Assessment of Binary Classifier with Visualization

Sensitivity (also called recall or true positive rate), false positive rate, specificity, precision (or positive predictive value), negative predictive value, misclassification rate, accuracy, and F-score are popular metrics for assessing the performance of a binary classifier at a given threshold. The receiver operating characteristic (ROC) curve is a common tool for assessing the overall diagnostic ability of a binary classifier. Rather than depending on a particular threshold, the area under the ROC curve (AUC) is a summary statistic of how well a binary classifier performs overall on the classification task. The ROCit package provides flexibility to easily evaluate threshold-bound metrics. The ROC curve, along with the AUC, can be obtained using different methods, such as empirical, binormal and non-parametric. ROCit encompasses a wide variety of methods for constructing confidence intervals for the ROC curve and AUC. ROCit also offers the option of constructing an empirical gains table, which is a handy tool for direct marketing. The package offers options for commonly used visualizations, such as the ROC curve, KS plot and lift plot. Along with built-in default graphics settings, there is room for manual tweaking by providing the necessary values as function arguments. ROCit is a powerful tool offering a range of features, yet it is very easy to use.

Maintained by Md Riaz Ahmed Khan. Last updated 3 years ago.

9.0 match 7.66 score 332 scripts 6 dependents
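
The distinction drawn in the ROCit entry above, between metrics tied to one threshold and the threshold-free AUC, can be shown with a small Python sketch. It is not ROCit's interface: the function names are hypothetical, and the AUC is computed with the standard empirical (rank-based) estimate, counting tied score pairs as one half.

```python
from typing import Dict, Sequence


def threshold_metrics(scores: Sequence[float],
                      labels: Sequence[int],
                      threshold: float) -> Dict[str, float]:
    """Metrics that depend on one cutoff: predict positive if score >= threshold."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted_positive = score >= threshold
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive and label == 0:
            fp += 1
        elif not predicted_positive and label == 1:
            fn += 1
        else:
            tn += 1

    def ratio(num: int, den: int) -> float:
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "precision": ratio(tp, tp + fp),
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
    }


def empirical_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


scores = [0.9, 0.8, 0.35, 0.1]
labels = [1, 0, 1, 0]
print(threshold_metrics(scores, labels, threshold=0.5))
print(empirical_auc(scores, labels))
```

Changing `threshold` changes every value returned by `threshold_metrics`, while `empirical_auc` depends only on how the scores rank positives against negatives.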

cran

FuzzySTs:Fuzzy Statistical Tools

The main goal of this package is to present various fuzzy statistical tools. It intends to provide an implementation of the theoretical and empirical approaches presented in the book entitled "The signed distance measure in fuzzy statistical analysis. Some theoretical, empirical and programming advances" <doi:10.1007/978-3-030-76916-1>. For the theoretical approaches, see Berkachy R. and Donze L. (2019) <doi:10.1007/978-3-030-03368-2_1>. For the empirical approaches, see Berkachy R. and Donze L. (2016) <ISBN:978-989-758-201-1>. Important (non-exhaustive) implementation highlights of this package are as follows: (1) a numerical procedure to estimate the fuzzy difference and the fuzzy square; (2) two numerical methods of fuzzification; (3) a function computing various distances, for instance the signed distance and the generalized signed distance, with all their properties; (4) numerical estimations of fuzzy statistical measures such as the variance, the moment, etc.; (5) two methods of estimation of the bootstrap distribution of the likelihood ratio in the fuzzy context; (6) an estimation of a fuzzy confidence interval by the likelihood ratio method; (7) testing fuzzy hypotheses and/or fuzzy data by fuzzy confidence intervals in the Kwakernaak - Kruse and Meyer sense; (8) a general method to estimate the fuzzy p-value with fuzzy hypotheses and/or fuzzy data; (9) a method of estimation of global and individual evaluations of linguistic questionnaires; (10) numerical estimations of multi-way analysis of variance models in the fuzzy context. Unbalanced designs are also accounted for.

Maintained by Redina Berkachy. Last updated 8 months ago.

18.5 match 3.40 score

fsavje

distances:Tools for Distance Metrics

Provides tools for constructing, manipulating and using distance metrics.

Maintained by Fredrik Savje. Last updated 1 year ago.

cpp

7.7 match 17 stars 6.92 score 117 scripts 12 dependents

bioc

ChIPQC:Quality metrics for ChIPseq data

Quality metrics for ChIPseq data.

Maintained by Tom Carroll. Last updated 5 months ago.

sequencing chipseq qualitycontrol reportwriting

7.3 match 5.45 score 140 scripts

carmonalab

scGate:Marker-Based Cell Type Purification for Single-Cell Sequencing Data

A common bioinformatics task in single-cell data analysis is to purify a cell type or cell population of interest from heterogeneous datasets. 'scGate' automatizes marker-based purification of specific cell populations, without requiring training data or reference gene expression profiles. Briefly, 'scGate' takes as input: i) a gene expression matrix stored in a 'Seurat' object and ii) a “gating model” (GM), consisting of a set of marker genes that define the cell population of interest. The GM can be as simple as a single marker gene, or a combination of positive and negative markers. More complex GMs can be constructed in a hierarchical fashion, akin to gating strategies employed in flow cytometry. 'scGate' evaluates the strength of signature marker expression in each cell using the rank-based method 'UCell', and then performs k-nearest neighbor (kNN) smoothing by calculating the mean 'UCell' score across neighboring cells. kNN-smoothing aims at compensating for the large degree of sparsity in scRNA-seq data. Finally, a universal threshold over kNN-smoothed signature scores is applied in binary decision trees generated from the user-provided gating model, to annotate cells as either “pure” or “impure”, with respect to the cell population of interest. See the related publication Andreatta et al. (2022) <doi:10.1093/bioinformatics/btac141>.

Maintained by Massimo Andreatta. Last updated 1 month ago.

filtering marker-genes scgate signatures single-cell

4.5 match 106 stars 8.38 score 163 scripts