R-universe search: dendrogram

talgalili

dendextend:Extending 'dendrogram' Functionality in R

Offers a set of functions for extending 'dendrogram' objects in R, letting you visualize and compare trees of 'hierarchical clusterings'. You can (1) Adjust a tree's graphical parameters - the color, size, type, etc of its branches, nodes and labels. (2) Visually and statistically compare different 'dendrograms' to one another.

Maintained by Tal Galili. Last updated 2 months ago.

191.7 match 154 stars 17.02 score 6.0k scripts 164 dependents

bioc

ComplexHeatmap:Make Complex Heatmaps

Complex heatmaps are an efficient way to visualize associations between data sets from different sources and to reveal potential patterns. The ComplexHeatmap package provides a highly flexible way to arrange multiple heatmaps and supports various annotation graphics.

Maintained by Zuguang Gu. Last updated 5 months ago.

software visualization sequencing clustering complex-heatmaps heatmap

35.4 match 1.3k stars 16.93 score 16k scripts 151 dependents

ropensci

phylogram:Dendrograms for Evolutionary Analysis

Contains functions for developing phylogenetic trees as deeply-nested lists ("dendrogram" objects). Enables bi-directional conversion between dendrogram and "phylo" objects (see Paradis et al (2004) <doi:10.1093/bioinformatics/btg412>), and features several tools for command-line tree manipulation and import/export via Newick parenthetic text.

Maintained by Shaun Wilkinson. Last updated 5 years ago.

peer-reviewed

34.5 match 11 stars 8.53 score 228 scripts 9 dependents

talgalili

heatmaply:Interactive Cluster Heat Maps Using 'plotly' and 'ggplot2'

Create interactive cluster 'heatmaps' that can be saved as a stand-alone HTML file, embedded in 'R Markdown' documents or in a 'Shiny' app, and viewed in the 'RStudio' viewer pane. Hover the mouse pointer over a cell to show details or drag a rectangle to zoom. A 'heatmap' is a popular graphical method for visualizing high-dimensional data, in which a table of numbers is encoded as a grid of colored cells. The rows and columns of the matrix are ordered to highlight patterns and are often accompanied by 'dendrograms'. 'Heatmaps' are used in many fields for visualizing observations, correlations, missing-value patterns, and more. Interactive 'heatmaps' allow the inspection of specific values by hovering the mouse over a cell, as well as zooming into a region of the 'heatmap' by dragging a rectangle around the relevant area. This work is based on the 'ggplot2' and 'plotly.js' engine. It produces 'heatmaps' similar to those of 'heatmap.2', with the advantages of speed ('plotly.js' is able to handle larger matrices), the ability to zoom from the 'dendrogram' panes, and the placing of factor variables in the sides of the 'heatmap'.

Maintained by Tal Galili. Last updated 8 months ago.

d3-heatmap dendextend dendrogram ggplot2 heatmap plotly

15.5 match 386 stars 14.21 score 2.0k scripts 45 dependents

andrie

ggdendro:Create Dendrograms and Tree Diagrams Using 'ggplot2'

This is a set of tools for dendrograms and tree plots using 'ggplot2'. The 'ggplot2' philosophy is to clearly separate data from the presentation. Unfortunately the plot method for dendrograms plots directly to a plot device without exposing the data. The 'ggdendro' package resolves this by making available functions that extract the dendrogram plot data. The package provides implementations for 'tree', 'rpart', as well as diana and agnes (from 'cluster') diagrams.

Maintained by Andrie de Vries. Last updated 3 months ago.

ggplot2

11.7 match 86 stars 13.54 score 3.9k scripts 62 dependents

yunuuuu

ggalign:A 'ggplot2' Extension for Consistent Axis Alignment

A 'ggplot2' extension offering various tools for the creation of complex, multi-plot visualizations. Built on the familiar grammar of graphics, it provides intuitive tools to align and organize plots. It excels in multi-omics research (such as genomics and microbiomes) by simplifying the visualization of intricate relationships between datasets, for example, linking genes to pathways. Whether you need to stack plots, arrange them around a central figure, or create a circular layout, 'ggalign' delivers flexibility and accuracy with minimal effort.

Maintained by Yun Peng. Last updated 4 hours ago.

complex-heatmaps dendrogram dendrogram-heatmap ggplot ggplot-extension ggplot2 heatmap heatmap-visualization heatmaps marginal-plots oncoplot oncoprint tanglegram upset upsetplot

20.5 match 267 stars 7.08 score 27 scripts

igraph

igraph:Network Analysis and Visualization

Routines for simple graphs and network analysis. It can handle large graphs very well and provides functions for generating random and regular graphs, graph visualization, centrality methods and much more.

Maintained by Kirill Müller. Last updated 2 days ago.

complex-networks graph-algorithms graph-theory mathematics network-analysis network-graph fortran libxml2 glpk openblas cpp

6.1 match 581 stars 21.10 score 31k scripts 1.9k dependents

bioc

BatchQC:Batch Effects Quality Control Software

Sequencing and microarray samples often are collected or processed in multiple batches or at different times. This often produces technical biases that can lead to incorrect results in the downstream analysis. BatchQC is a software tool that streamlines batch preprocessing and evaluation by providing interactive diagnostics, visualizations, and statistical analyses to explore the extent to which batch variation impacts the data. BatchQC diagnostics help determine whether batch adjustment needs to be done, and how correction should be applied before proceeding with a downstream analysis. Moreover, BatchQC interactively applies multiple common batch effect approaches to the data and the user can quickly see the benefits of each method. BatchQC is developed as a Shiny App. The output is organized into multiple tabs and each tab features an important part of the batch effect analysis and visualization of the data. The BatchQC interface has the following analysis groups: Summary, Differential Expression, Median Correlations, Heatmaps, Circular Dendrogram, PCA Analysis, Shape, ComBat and SVA.

Maintained by Jessica McClintock. Last updated 5 months ago.

batcheffect graphandnetwork microarray normalization principalcomponent sequencing software visualization qualitycontrol rnaseq preprocessing differentialexpression immunooncology

14.1 match 7 stars 8.99 score 54 scripts

plangfelder

WGCNA:Weighted Correlation Network Analysis

Functions necessary to perform Weighted Correlation Network Analysis on high-dimensional data as originally described in Horvath and Zhang (2005) <doi:10.2202/1544-6115.1128> and Langfelder and Horvath (2008) <doi:10.1186/1471-2105-9-559>. Includes functions for rudimentary data cleaning, construction of correlation networks, module identification, summarization, and relating of variables and modules to sample traits. Also includes a number of utility functions for data manipulation and visualization.

Maintained by Peter Langfelder. Last updated 6 months ago.

cpp

12.5 match 54 stars 9.65 score 5.3k scripts 32 dependents

bioc

clusterExperiment:Compare Clusterings for Single-Cell Sequencing

Provides functionality for running and comparing many different clusterings of single-cell sequencing data or other large mRNA Expression data sets.

Maintained by Elizabeth Purdom. Last updated 5 months ago.

clustering rnaseq sequencing software singlecell cpp

12.0 match 39 stars 9.63 score 192 scripts 1 dependent

gluc

data.tree:General Purpose Hierarchical Data Structure

Create tree structures from hierarchical data, and traverse the tree in various orders. Aggregate, cumulate, print, plot, convert to and from data.frame and more. Useful for decision trees, machine learning, finance, conversion from and to JSON, and many other applications.

Maintained by Christoph Glur. Last updated 5 months ago.

7.5 match 209 stars 12.84 score 1.1k scripts 88 dependents

plangfelder

dynamicTreeCut:Methods for Detection of Clusters in Hierarchical Clustering Dendrograms

Contains methods for detection of clusters in hierarchical clustering dendrograms.

Maintained by Peter Langfelder. Last updated 9 years ago.

11.9 match 4 stars 7.52 score 492 scripts 59 dependents

mhahsler

dbscan:Density-Based Spatial Clustering of Applications with Noise (DBSCAN) and Related Algorithms

A fast reimplementation of several density-based algorithms of the DBSCAN family. Includes the clustering algorithms DBSCAN (density-based spatial clustering of applications with noise) and HDBSCAN (hierarchical DBSCAN), the ordering algorithm OPTICS (ordering points to identify the clustering structure), shared nearest neighbor clustering, and the outlier detection algorithms LOF (local outlier factor) and GLOSH (global-local outlier score from hierarchies). The implementations use the kd-tree data structure (from library ANN) for faster k-nearest neighbor search. An R interface to fast kNN and fixed-radius NN search is also provided. Hahsler, Piekenbrock and Doran (2019) <doi:10.18637/jss.v091.i01>.

Maintained by Michael Hahsler. Last updated 2 months ago.

clustering dbscan density-based-clustering hdbscan lof optics cpp

5.3 match 321 stars 15.62 score 1.6k scripts 84 dependents

ncss-tech

sharpshootR:A Soil Survey Toolkit

A collection of data processing, visualization, and export functions to support soil survey operations. Many of the functions build on the `SoilProfileCollection` S4 class provided by the aqp package, extending baseline visualization to more elaborate depictions in the context of spatial and taxonomic data. While this package is primarily developed by and for the USDA-NRCS, in support of the National Cooperative Soil Survey, the authors strive for generalization sufficient to support any soil survey operation. Many of the included functions are used by the SoilWeb suite of websites and mobile applications. These functions are provided here, with additional documentation, to enable others to replicate high quality versions of these figures for their own purposes.

Maintained by Dylan Beaudette. Last updated 12 days ago.

8.4 match 18 stars 8.37 score 327 scripts

jokergoo

circlize:Circular Visualization

Circular layout is an efficient way for the visualization of huge amounts of information. Here this package provides an implementation of circular layout generation in R as well as an enhancement of available software. The flexibility of the package is based on the usage of low-level graphics functions such that self-defined high-level graphics can be easily implemented by users for specific purposes. Together with the seamless connection between the powerful computational and visual environment in R, it gives users more convenience and freedom to design figures for better understanding complex patterns behind multiple dimensional data. The package is described in Gu et al. 2014 <doi:10.1093/bioinformatics/btu393>.

Maintained by Zuguang Gu. Last updated 1 year ago.

4.3 match 983 stars 15.62 score 10k scripts 213 dependents

jefferis

dendroextras:Extra Functions to Cut, Label and Colour Dendrogram Clusters

Provides extra functions to manipulate dendrograms that build on the base functions provided by the 'stats' package. The main functionality it is designed to add is the ability to colour all the edges in an object of class 'dendrogram' according to cluster membership i.e. each subtree is coloured, not just the terminal leaves. In addition it provides some utility functions to cut 'dendrogram' and 'hclust' objects and to set/get labels.

Maintained by Gregory Jefferis. Last updated 7 years ago.

12.7 match 4.61 score 90 scripts 3 dependents

evanbiederstedt

dendsort:Modular Leaf Ordering Methods for Dendrogram Nodes

An implementation of functions to optimize ordering of nodes in a dendrogram, without affecting the meaning of the dendrogram. A dendrogram can be sorted based on the average distance of subtrees, or based on the smallest distance value. These sorting methods improve readability and interpretability of tree structure, especially for tasks such as comparison of different distance measures or linkage types and identification of tight clusters and outliers. As a result, it also introduces more meaningful reordering for a coupled heatmap visualization. This method is described in "dendsort: modular leaf ordering methods for dendrogram representations in R", F1000Research 2014, 3: 177 <doi:10.12688/f1000research.4784.1>.

Maintained by Evan Biederstedt. Last updated 4 years ago.

7.5 match 4 stars 7.01 score 472 scripts 3 dependents

bioc

ReducedExperiment:Containers and tools for dimensionally-reduced -omics representations

Provides SummarizedExperiment-like containers for storing and manipulating dimensionally-reduced assay data. The ReducedExperiment classes allow users to simultaneously manipulate their original dataset and their decomposed data, in addition to other method-specific outputs like feature loadings. Implements utilities and specialised classes for the application of stabilised independent component analysis (sICA) and weighted gene correlation network analysis (WGCNA).

Maintained by Jack Gisby. Last updated 2 months ago.

geneexpression infrastructure datarepresentation software dimensionreduction network bioconductor-package bioinformatics dimensionality-reduction

10.1 match 3 stars 5.18 score 8 scripts

bioc

DECIPHER:Tools for curating, analyzing, and manipulating biological sequences

A toolset for deciphering and managing biological sequences.

Maintained by Erik Wright. Last updated 5 days ago.

clustering genetics sequencing dataimport visualization microarray qualitycontrol qpcr alignment wholegenome microbiome immunooncology geneprediction openmp

5.5 match 8.40 score 1.1k scripts 14 dependents

paleolimbot

tidypaleo:Tidy Tools for Paleoenvironmental Archives

Provides a set of functions with a common framework for age-depth model management, stratigraphic visualization, and common statistical transformations. The focus of the package is stratigraphic visualization, for which 'ggplot2' components are provided to reproduce the scales, geometries, facets, and theme elements commonly used in publication-quality stratigraphic diagrams. Helpers are also provided to reproduce the exploratory statistical summaries that are frequently included on stratigraphic diagrams. See Dunnington et al. (2021) <doi:10.18637/jss.v101.i07>.

Maintained by Dewey Dunnington. Last updated 2 years ago.

6.8 match 34 stars 6.59 score 38 scripts

md-anderson-bioinformatics

NGCHM:Next Generation Clustered Heat Maps

Next-Generation Clustered Heat Maps (NG-CHMs) allow for dynamic exploration of heat map data in a web browser. 'NGCHM' allows users to create both stand-alone HTML files containing a Next-Generation Clustered Heat Map, and .ngchm files to view in the NG-CHM viewer. See Ryan MC, Stucky M, et al (2020) <doi:10.12688/f1000research.20590.2> for more details.

Maintained by Mary A Rohrdanz. Last updated 8 days ago.

heatmap nci-itcr ng-chm

7.5 match 9 stars 5.48 score 28 scripts

bioc

SynExtend:Tools for Working With Synteny Objects

Shared order between genomic sequences provides a great deal of information. Synteny objects produced by the R package DECIPHER provide quantitative information about that shared order. SynExtend provides tools for extracting information from Synteny objects.

Maintained by Nicholas Cooley. Last updated 2 days ago.

genetics clustering comparativegenomics dataimport fortran openmp

6.1 match 1 stars 6.42 score 77 scripts

plotly

plotly:Create Interactive Web Graphics via 'plotly.js'

Create interactive web graphics from 'ggplot2' graphs and/or a custom interface to the (MIT-licensed) JavaScript library 'plotly.js' inspired by the grammar of graphics.

Maintained by Carson Sievert. Last updated 3 months ago.

d3js data-visualization ggplot2 javascript plotly shiny webgl

2.0 match 2.6k stars 19.43 score 93k scripts 797 dependents

uclahs-cds

BoutrosLab.plotting.general:Functions to Create Publication-Quality Plots

Contains several plotting functions such as barplots, scatterplots, heatmaps, as well as functions to combine plots and assist in the creation of these plots. These functions give users great ease of use and customization options for biomedical applications, as well as general purpose plotting. Each of the functions also provides valid default settings to make plotting data more efficient and to simplify producing high quality plots with standard colour schemes. All functions within this package are capable of producing plots that are of the quality to be presented in scientific publications and journals. P'ng et al.; BPG: Seamless, automated and interactive visualization of scientific data; BMC Bioinformatics 2019 <doi:10.1186/s12859-019-2610-2>.

Maintained by Paul Boutros. Last updated 5 months ago.

4.5 match 12 stars 8.36 score 414 scripts 6 dependents

hardin47

biwt:Functions to Compute the Biweight Mean Vector and Covariance and Correlation Matrices

The base functions compute multivariate location, scale, and correlation estimates based on Tukey's biweight M-estimator. Using the base function, the computations can be applied to a large number of observations to create either a matrix of biweight distances or biweight correlations.

Maintained by Johanna Hardin. Last updated 6 months ago.

6.5 match 5.58 score 16 scripts 2 dependents

evanbiederstedt

gapmap:Drawing Gapped Cluster Heatmaps with 'ggplot2'

The gap encodes the distance between clusters and improves interpretation of cluster heatmaps. The gaps can be of the same distance based on a height threshold to cut the dendrogram. Another option is to vary the size of gaps based on the distance between clusters.

Maintained by Evan Biederstedt. Last updated 1 year ago.

7.8 match 2 stars 4.62 score 21 scripts

tpook92

MoBPS:Modular Breeding Program Simulator

Framework for the simulation of complex breeding programs and the comparison of their economic and genetic impact. The package is also used as the background simulator for our web-based interface <http://www.mobps.de>. Associated publication: Pook et al. (2020) <doi:10.1534/g3.120.401193>.

Maintained by Torsten Pook. Last updated 3 years ago.

15.1 match 2.35 score 45 scripts

e-sensing

sits:Satellite Image Time Series Analysis for Earth Observation Data Cubes

An end-to-end toolkit for land use and land cover classification using big Earth observation data, based on machine learning methods applied to satellite image data cubes, as described in Simoes et al (2021) <doi:10.3390/rs13132428>. Builds regular data cubes from collections in AWS, Microsoft Planetary Computer, Brazil Data Cube, Copernicus Data Space Environment (CDSE), Digital Earth Africa, Digital Earth Australia, NASA HLS using the Spatio-temporal Asset Catalog (STAC) protocol (<https://stacspec.org/>) and the 'gdalcubes' R package developed by Appel and Pebesma (2019) <doi:10.3390/data4030092>. Supports visualization methods for images and time series and smoothing filters for dealing with noisy time series. Includes functions for quality assessment of training samples using self-organized maps as presented by Santos et al (2021) <doi:10.1016/j.isprsjprs.2021.04.014>. Includes methods to reduce training samples imbalance proposed by Chawla et al (2002) <doi:10.1613/jair.953>. Provides machine learning methods including support vector machines, random forests, extreme gradient boosting, multi-layer perceptrons, temporal convolutional neural networks proposed by Pelletier et al (2019) <doi:10.3390/rs11050523>, and temporal attention encoders by Garnot and Landrieu (2020) <doi:10.48550/arXiv.2007.00586>. Supports GPU processing of deep learning models using torch <https://torch.mlverse.org/>. Performs efficient classification of big Earth observation data cubes and includes functions for post-classification smoothing based on Bayesian inference as described by Camara et al (2024) <doi:10.3390/rs16234572>, and methods for active learning and uncertainty assessment. Supports region-based time series analysis using package supercells <https://jakubnowosad.com/supercells/>. Enables best practices for estimating area and assessing accuracy of land change as recommended by Olofsson et al (2014) <doi:10.1016/j.rse.2014.02.015>. 
Minimum recommended requirements: 16 GB RAM and 4 CPU dual-core.

Maintained by Gilberto Camara. Last updated 1 month ago.

big-earth-data cbers earth-observation eo-datacubes geospatial image-time-series land-cover-classification landsat planetary-computer r-spatial remote-sensing rspatial satellite-image-time-series satellite-imagery sentinel-2 stac-api stac-catalog cpp

3.7 match 494 stars 9.50 score 384 scripts

teunbrand

legendry:Extended Legends and Axes for 'ggplot2'

A 'ggplot2' extension that focusses on expanding the plotter's arsenal of guides. Guides in 'ggplot2' include axes and legends. 'legendry' offers new axes and annotation options, as well as new legends and colour displays.

Maintained by Teun van den Brand. Last updated 11 days ago.

axis axis-customization ggplot-extension ggplot2 legend visualization

4.5 match 227 stars 7.83 score 29 scripts 2 dependents

tsieger

idendr0:Interactive Dendrograms

Interactive dendrogram that enables the user to select and color clusters, to zoom and pan the dendrogram, and to visualize the clustered data not only in a built-in heat map, but also in 'GGobi' interactive plots and user-supplied plots. This is a backport of Qt-based 'idendro' (<https://github.com/tsieger/idendro>) to base R graphics and Tcl/Tk GUI.

Maintained by Tomas Sieger. Last updated 4 years ago.

9.0 match 7 stars 3.89 score 22 scripts

bioc

Heatplus:Heatmaps with row and/or column covariates and colored clusters

Display a rectangular heatmap (intensity plot) of a data matrix. By default, both samples (columns) and features (rows) of the matrix are sorted according to a hierarchical clustering, and the corresponding dendrogram is plotted. Optionally, panels with additional information about samples and features can be added to the plot.

Maintained by Alexander Ploner. Last updated 5 months ago.

microarray visualization

4.6 match 7.63 score 94 scripts 5 dependents

jacobbien

protoclust:Hierarchical Clustering with Prototypes

Performs minimax linkage hierarchical clustering. Every cluster has an associated prototype element that represents that cluster as described in Bien, J., and Tibshirani, R. (2011), "Hierarchical Clustering with Prototypes via Minimax Linkage," The Journal of the American Statistical Association, 106(495), 1075-1084.

Maintained by Jacob Bien. Last updated 3 years ago.

5.8 match 7 stars 5.64 score 50 scripts 4 dependents

bioc

TreeAndLeaf:Displaying binary trees with focus on dendrogram leaves

The TreeAndLeaf package combines unrooted and force-directed graph algorithms in order to lay out binary trees, aiming to represent multiple layers of information on dendrogram leaves.

Maintained by Milena A. Cardoso. Last updated 5 months ago.

infrastructure graphandnetwork software network visualization datarepresentation

7.6 match 4.20 score 16 scripts

mhahsler

seriation:Infrastructure for Ordering Objects Using Seriation

Infrastructure for ordering objects with an implementation of several seriation/sequencing/ordination techniques to reorder matrices, dissimilarity matrices, and dendrograms. Also provides (optimally) reordered heatmaps, color images and clustering visualizations like dissimilarity plots, and visual assessment of cluster tendency plots (VAT and iVAT). Hahsler et al (2008) <doi:10.18637/jss.v025.i03>.

Maintained by Michael Hahsler. Last updated 3 months ago.

combinatorial-optimization ordination seriation fortran

2.3 match 77 stars 14.07 score 640 scripts 79 dependents

thomasp85

ggraph:An Implementation of Grammar of Graphics for Graphs and Networks

The grammar of graphics as implemented in ggplot2 is a poor fit for graph and network visualizations due to its reliance on tabular data input. ggraph is an extension of the ggplot2 API tailored to graph visualizations and provides the same flexible approach to building up plots layer by layer.

Maintained by Thomas Lin Pedersen. Last updated 1 years ago.

ggplot-extension ggplot2 graph-visualization network-visualization visualization cpp

1.9 match 1.1k stars 16.96 score 9.2k scripts 111 dependents

bioc

ggtreeDendro:Drawing 'dendrogram' using 'ggtree'

Offers a set of 'autoplot' methods to visualize tree-like structures (e.g., hierarchical clustering and classification/regression trees) using 'ggtree'. You can adjust graphical parameters using grammar-of-graphics syntax and integrate external data into the tree.

Maintained by Guangchuang Yu. Last updated 5 months ago.

clustering classification decisiontree phylogenetics visualization

7.6 match 4.18 score 10 scripts

bethatkinson

rpart:Recursive Partitioning and Regression Trees

Recursive partitioning for classification, regression and survival trees. An implementation of most of the functionality of the 1984 book by Breiman, Friedman, Olshen and Stone.

Maintained by Beth Atkinson. Last updated 8 months ago.

cart classification statistics

1.9 match 52 stars 16.59 score 18k scripts 1.6k dependents

christophergandrud

networkD3:D3 JavaScript Network Graphs from R

Creates 'D3' 'JavaScript' network, tree, dendrogram, and Sankey graphs from 'R'.

Maintained by Christopher Gandrud. Last updated 6 years ago.

d3js networks

2.3 match 654 stars 13.55 score 3.4k scripts 31 dependents

uiowa-applied-topology

mappeR:Construct and Visualize TDA Mapper Graphs

Topological data analysis (TDA) is a method of data analysis that uses techniques from topology to analyze high-dimensional data. Here we implement Mapper, an algorithm from this area developed by Singh, Mémoli and Carlsson (2007) which generalizes the concept of a Reeb graph <https://en.wikipedia.org/wiki/Reeb_graph>.

Maintained by George Clare Kennedy. Last updated 24 days ago.

7.4 match 2 stars 4.05 score 14 scripts

christophergandrud

d3Network:The Old Package for Creating D3 JavaScript Network, Tree, Dendrogram, and Sankey Graphs

!!! NOTE: Active development has moved to the networkD3 package. !!!

Maintained by Christopher Gandrud. Last updated 10 years ago.

4.5 match 172 stars 6.63 score 82 scripts

zwdzwd

wheatmap:Incrementally Build Complex Plots using Natural Semantics

Builds complex plots, heatmaps in particular, using natural semantics. Bigger plots can be assembled using directives such as 'LeftOf', 'RightOf', 'TopOf', 'Beneath', and more. Other features include clustering, dendrograms and integration with 'ggplot2' generated grid objects. This package is particularly designed for bioinformaticians to assemble complex plots for publication.

Maintained by Wanding Zhou. Last updated 3 years ago.

4.6 match 10 stars 6.35 score 50 scripts 3 dependents

ubod

apcluster:Affinity Propagation Clustering

Implements Affinity Propagation clustering introduced by Frey and Dueck (2007) <DOI:10.1126/science.1136800>. The algorithms are largely analogous to the 'Matlab' code published by Frey and Dueck. The package further provides leveraged affinity propagation and an algorithm for exemplar-based agglomerative clustering that can also be used to join clusters obtained from affinity propagation. Various plotting functions are available for analyzing clustering results.

Maintained by Ulrich Bodenhofer. Last updated 11 months ago.

cpp

3.0 match 10 stars 9.82 score 270 scripts 25 dependents

bioc

SNPRelate:Parallel Computing Toolset for Relatedness and Principal Component Analysis of SNP Data

Genome-wide association studies (GWAS) are widely used to investigate the genetic basis of diseases and traits, but they pose many computational challenges. We developed an R package SNPRelate to provide a binary format for single-nucleotide polymorphism (SNP) data in GWAS utilizing CoreArray Genomic Data Structure (GDS) data files. The GDS format offers efficient operations specifically designed for integers with two bits, since a SNP could occupy only two bits. SNPRelate is also designed to accelerate two key computations on SNP data using parallel computing for multi-core symmetric multiprocessing computer architectures: Principal Component Analysis (PCA) and relatedness analysis using Identity-By-Descent measures. The SNP GDS format is also used by the GWASTools package with the support of S4 classes and generic functions. The extended GDS format is implemented in the SeqArray package to support the storage of single nucleotide variations (SNVs), insertion/deletion polymorphism (indel) and structural variation calls in whole-genome and whole-exome variant data.

Maintained by Xiuwen Zheng. Last updated 5 months ago.

infrastructure genetics statisticalmethod principalcomponent bioinformatics gds-format pca simd snp openblas cpp

2.3 match 104 stars 12.69 score 1.6k scripts 18 dependents

kassambara

factoextra:Extract and Visualize the Results of Multivariate Data Analyses

Provides some easy-to-use functions to extract and visualize the output of multivariate data analyses, including 'PCA' (Principal Component Analysis), 'CA' (Correspondence Analysis), 'MCA' (Multiple Correspondence Analysis), 'FAMD' (Factor Analysis of Mixed Data), 'MFA' (Multiple Factor Analysis) and 'HMFA' (Hierarchical Multiple Factor Analysis) functions from different R packages. It also contains functions for simplifying some clustering analysis steps and provides elegant 'ggplot2'-based data visualization.

Maintained by Alboukadel Kassambara. Last updated 5 years ago.

2.0 match 363 stars 14.13 score 15k scripts 52 dependents

cbhurley

DendSer:Dendrogram Seriation: Ordering for Visualisation

Re-arranges a dendrogram to optimize visualisation-based cost functions.

Maintained by Catherine Hurley. Last updated 3 years ago.

7.5 match 3.74 score 27 scripts 5 dependents

alextkalinka

linkcomm:Tools for Generating, Visualizing, and Analysing Link Communities in Networks

Link communities reveal the nested and overlapping structure in networks, and uncover the key nodes that form connections to multiple communities. linkcomm provides a set of tools for generating, visualizing, and analysing link communities in networks of arbitrary size and type. The linkcomm package also includes tools for generating, visualizing, and analysing Overlapping Cluster Generator (OCG) communities. Kalinka and Tomancak (2011) <doi:10.1093/bioinformatics/btr311>.

Maintained by Alex T. Kalinka. Last updated 4 years ago.

clustering networks networks-biology visualization cpp

3.6 match 7 stars 7.53 score 115 scripts 4 dependents

kharchenkolab

leidenAlg:Implements the Leiden Algorithm via an R Interface

An R interface to the Leiden algorithm, an iterative community detection algorithm on networks. The algorithm is designed to converge to a partition in which all subsets of all communities are locally optimally assigned, yielding communities guaranteed to be connected. The implementation proves to be fast, scales well, and can be run on graphs of millions of nodes (as long as they can fit in memory). The original implementation was constructed as a python interface "leidenalg" found here: <https://github.com/vtraag/leidenalg>. The algorithm was originally described in Traag, V.A., Waltman, L. & van Eck, N.J. "From Louvain to Leiden: guaranteeing well-connected communities". Sci Rep 9, 5233 (2019) <doi:10.1038/s41598-019-41695-z>.

Maintained by Evan Biederstedt. Last updated 5 months ago.

fortran cpp

4.1 match 9 stars 5.87 score 28 scripts 5 dependents

bioc

HGC:A fast hierarchical graph-based clustering method

HGC (short for Hierarchical Graph-based Clustering) is an R package for conducting hierarchical clustering on large-scale single-cell RNA-seq (scRNA-seq) data. The key idea is to construct a dendrogram of cells on their shared nearest neighbor (SNN) graph. HGC provides functions for building graphs and for conducting hierarchical clustering on the graph. Users with an older R version can visit https://github.com/XuegongLab/HGC/tree/HGC4oldRVersion to get the HGC package built for R 3.6.

Maintained by XGlab. Last updated 5 months ago.

singlecell software clustering rnaseq graphandnetwork dnaseq cpp

5.0 match 4.70 score 25 scripts

bioc

netboost:Network Analysis Supported by Boosting

Boosting supported network analysis for high-dimensional omics applications. This package comes bundled with the MC-UPGMA clustering package by Yaniv Loewenstein.

Maintained by Pascal Schlosser. Last updated 5 months ago.

software statisticalmethod graphandnetwork network clustering dimensionreduction biomedicalinformatics epigenetics metabolomics transcriptomics cpp

5.4 match 4.18 score 1 script

ncss-tech

aqp:Algorithms for Quantitative Pedology

The Algorithms for Quantitative Pedology (AQP) project was started in 2009 to organize a loosely-related set of concepts and source code on the topic of soil profile visualization, aggregation, and classification into this package (aqp). Over the past 8 years, the project has grown into a suite of related R packages that enhance and simplify the quantitative analysis of soil profile data. Central to the AQP project is a new vocabulary of specialized functions and data structures that can accommodate the inherent complexity of soil profile information, freeing the scientist to focus on ideas rather than boilerplate data processing tasks <doi:10.1016/j.cageo.2012.10.020>. These functions and data structures have been extensively tested and documented, applied to projects involving hundreds of thousands of soil profiles, and deeply integrated into widely used tools such as SoilWeb <https://casoilresource.lawr.ucdavis.edu/soilweb-apps>. Components of the AQP project (aqp, soilDB, sharpshootR, soilReports packages) serve an important role in routine data analysis within the USDA-NRCS Soil Science Division. The AQP suite of R packages offers a convenient platform for bridging the gap between pedometric theory and practice.

Maintained by Dylan Beaudette. Last updated 28 days ago.

digital-soil-mapping ncss-tech nrcs pedology pedometrics soil soil-survey usda

1.9 match 55 stars 11.77 score 1.2k scripts 2 dependents

massimoaria

bibliometrix:Comprehensive Science Mapping Analysis

Tool for quantitative research in scientometrics and bibliometrics. It implements the comprehensive workflow for science mapping analysis proposed in Aria M. and Cuccurullo C. (2017) <doi:10.1016/j.joi.2017.08.007>. 'bibliometrix' provides various routines for importing bibliographic data from 'SCOPUS', 'Clarivate Analytics Web of Science' (<https://www.webofknowledge.com/>), 'Digital Science Dimensions' (<https://www.dimensions.ai/>), 'OpenAlex' (<https://openalex.org/>), 'Cochrane Library' (<https://www.cochranelibrary.com/>), 'Lens' (<https://lens.org>), and 'PubMed' (<https://pubmed.ncbi.nlm.nih.gov/>) databases, performing bibliometric analysis and building networks for co-citation, coupling, scientific collaboration and co-word analysis.

Maintained by Massimo Aria. Last updated 7 days ago.

bibliometric-analysis bibliometrics citation citation-network citations co-authors co-occurence co-word-analysis correspondence-analysis coupling isi-web journal manuscript quantitative-analysis scholars science science-mapping scientific scientometrics scopus

1.8 match 545 stars 12.54 score 518 scripts 2 dependents

luca-scr

mclust:Gaussian Mixture Modelling for Model-Based Clustering, Classification, and Density Estimation

Gaussian finite mixture models fitted via EM algorithm for model-based clustering, classification, and density estimation, including Bayesian regularization, dimension reduction for visualisation, and resampling-based inference.

Maintained by Luca Scrucca. Last updated 11 months ago.

fortran openblas

1.8 match 21 stars 12.23 score 6.6k scripts 587 dependents

bioc

celda:CEllular Latent Dirichlet Allocation

Celda is a suite of Bayesian hierarchical models for clustering single-cell RNA-sequencing (scRNA-seq) data. It is able to perform "bi-clustering" and simultaneously cluster genes into gene modules and cells into cell subpopulations. It also contains DecontX, a novel Bayesian method to computationally estimate and remove RNA contamination in individual cells without empty droplet information. A variety of scRNA-seq data visualization functions is also included.

Maintained by Joshua Campbell. Last updated 27 days ago.

singlecell geneexpression clustering sequencing bayesian immunooncology dataimport cpp openmp

1.9 match 147 stars 10.47 score 256 scripts 2 dependents

bioc

CluMSID:Clustering of MS2 Spectra for Metabolite Identification

CluMSID is a tool that aids the identification of features in untargeted LC-MS/MS analysis by the use of MS2 spectra similarity and unsupervised statistical methods. It offers functions for a complete and customisable workflow from raw data to visualisations and is interfaceable with the xcms family of preprocessing packages.

Maintained by Tobias Depke. Last updated 5 months ago.

metabolomics preprocessing clustering

3.2 match 10 stars 6.04 score 22 scripts

r-forge

latticeExtra:Extra Graphical Utilities Based on Lattice

Building on the infrastructure provided by the lattice package, this package provides several new high-level functions and methods, as well as additional utilities such as panel and axis annotation functions.

Maintained by Deepayan Sarkar. Last updated 3 years ago.

1.9 match 10.18 score 2.6k scripts 233 dependents

r-hyperspec

hyperSpec:Work with Hyperspectral Data, i.e. Spectra + Meta Information (Spatial, Time, Concentration, ...)

Comfortable ways to work with hyperspectral data sets, i.e. spatially or time-resolved spectra, or spectra with any other kind of information associated with each of the spectra. The spectra can be data as obtained in XRF, UV/VIS, Fluorescence, AES, NIR, IR, Raman, NMR, MS, etc. More generally, any data that is recorded over a discretized variable, e.g. absorbance = f(wavelength), stored as a vector of absorbance values for discrete wavelengths is suitable.

Maintained by Claudia Beleites. Last updated 10 months ago.

data-wrangling hyperspectral imaging infrared nmr raman spectroscopy uv-vis xrf

2.3 match 16 stars 8.13 score 233 scripts 2 dependents

jonasrieger

ldaPrototype:Prototype of Multiple Latent Dirichlet Allocation Runs

Determine a Prototype from a number of runs of Latent Dirichlet Allocation (LDA) measuring its similarities with S-CLOP: A procedure to select the LDA run with the highest mean pairwise similarity, which is measured by S-CLOP (Similarity of multiple sets by Clustering with Local Pruning), to all other runs. LDA runs are specified by their assignments leading to estimators for distribution parameters. Repeated runs lead to different results, which we counter by choosing the most representative LDA run as prototype.

Maintained by Jonas Rieger. Last updated 2 years ago.

latent-dirichlet-allocation lda model-selection modelselection reliability text-mining textdata topic-model topic-models topic-similarities topicmodeling topicmodelling

4.0 match 8 stars 4.44 score 23 scripts 1 dependent

grunwaldlab

poppr:Genetic Analysis of Populations with Mixed Reproduction

Population genetic analyses for hierarchical analysis of partially clonal populations built upon the architecture of the 'adegenet' package. Originally described in Kamvar, Tabima, and Grünwald (2014) <doi:10.7717/peerj.281> with version 2.0 described in Kamvar, Brooks, and Grünwald (2015) <doi:10.3389/fgene.2015.00208>.

Maintained by Zhian N. Kamvar. Last updated 10 months ago.

clonality genetic-analysis genetic-distances minimum-spanning-networks multilocus-genotypes multilocus-lineages population-genetics populations openmp

1.7 match 69 stars 10.84 score 672 scripts

kharchenkolab

sccore:Core Utilities for Single-Cell RNA-Seq

Core utilities for single-cell RNA-seq data analysis. Contained within are utility functions for working with differential expression (DE) matrices and count matrices, a collection of functions for manipulating and plotting data via 'ggplot2', and functions to work with cell graphs and cell embeddings. Graph-based methods include embedding kNN cell graphs into a UMAP <doi:10.21105/joss.00861>, collapsing vertices of each cluster in the graph, and propagating graph labels.

Maintained by Evan Biederstedt. Last updated 1 year ago.

cpp

2.8 match 12 stars 6.44 score 36 scripts 9 dependents

bioc

goProfiles:goProfiles: an R package for the statistical analysis of functional profiles

The package implements methods to compare lists of genes based on comparing the corresponding 'functional profiles'.

Maintained by Alex Sanchez. Last updated 5 months ago.

annotation go geneexpression genesetenrichment graphandnetwork microarray multiplecomparison pathways software

3.2 match 5.48 score 6 scripts 1 dependent

jokergoo

spiralize:Visualize Data on Spirals

It visualizes data along an Archimedean spiral <https://en.wikipedia.org/wiki/Archimedean_spiral>, producing a so-called spiral graph or spiral chart. It has two major advantages for visualization: 1. It is able to visualize data with a very long axis at high resolution. 2. It is efficient at revealing periodic patterns in time series data.

Maintained by Zuguang Gu. Last updated 9 months ago.

2.3 match 148 stars 7.67 score 35 scripts 3 dependents

karlines

diagram:Functions for Visualising Simple Graphs (Networks), Plotting Flow Diagrams

Visualises simple graphs (networks) based on a transition matrix, utilities to plot flow diagrams, visualising webs, electrical networks, etc. Support for the book "A practical guide to ecological modelling - using R as a simulation platform" by Karline Soetaert and Peter M.J. Herman (2009), Springer, and the book "Solving Differential Equations in R" by Karline Soetaert, Jeff Cash and Francesca Mazzia (2012), Springer. Includes demo(flowchart), demo(plotmat), demo(plotweb).

Maintained by Karline Soetaert. Last updated 4 years ago.

1.7 match 10.06 score 598 scripts 487 dependents

bioboot

bio3d:Biological Structure Analysis

Utilities to process, organize and explore protein structure, sequence and dynamics data. Features include the ability to read and write structure, sequence and dynamic trajectory data, perform sequence and structure database searches, data summaries, atom selection, alignment, superposition, rigid core identification, clustering, torsion analysis, distance matrix analysis, structure and sequence conservation analysis, normal mode analysis, principal component analysis of heterogeneous structure data, and correlation network analysis from normal mode and molecular dynamics data. In addition, various utility functions are provided to enable the statistical and graphical power of the R environment to work with biological sequence and structural data. Please refer to the URLs below for more information.

Maintained by Barry Grant. Last updated 5 months ago.

zlib cpp

2.0 match 5 stars 8.49 score 1.4k scripts 10 dependents

bioc

goSorensen:Statistical inference based on the Sorensen-Dice dissimilarity and the Gene Ontology (GO)

This package implements inferential methods to compare gene lists in terms of their biological meaning as expressed in the GO. The compared gene lists are characterized by cross-tabulation frequency tables of enriched GO items. Dissimilarity between gene lists is evaluated using the Sorensen-Dice index. The fundamental guiding principle is that two gene lists are taken as similar if they share a great proportion of common enriched GO items.

Maintained by Pablo Flores. Last updated 5 months ago.

annotation go genesetenrichment software microarray pathways geneexpression multiplecomparison graphandnetwork reactome clustering kegg

3.7 match 4.56 score 12 scripts

pcruniversum

RDML:Importing Real-Time Thermo Cycler (qPCR) Data from RDML Format Files

Imports real-time thermo cycler (qPCR) data from Real-time PCR Data Markup Language (RDML) and transforms it to the appropriate formats of the 'qpcR' and 'chipPCR' packages. Contains a dendrogram visualization for the structure of an RDML object and a GUI for RDML editing.

Maintained by Konstantin A. Blagodatskikh. Last updated 7 months ago.

bioinformatics pcr qpcr rdml

2.3 match 21 stars 7.16 score 58 scripts 1 dependent

bioc

pRoloc:A unifying bioinformatics framework for spatial proteomics

The pRoloc package implements machine learning and visualisation methods for the analysis and interrogation of quantitative mass spectrometry data to reliably infer protein sub-cellular localisation.

Maintained by Lisa Breckels. Last updated 26 days ago.

immunooncology proteomics massspectrometry classification clustering qualitycontrol bioconductor proteomics-data spatial-proteomics visualisation openblas cpp

1.9 match 15 stars 8.71 score 101 scripts 2 dependents

cmmr

rbiom:Read/Write, Analyze, and Visualize 'BIOM' Data

A toolkit for working with Biological Observation Matrix ('BIOM') files. Read/write all 'BIOM' formats. Compute rarefaction, alpha diversity, and beta diversity (including 'UniFrac'). Summarize counts by taxonomic level. Subset based on metadata. Generate visualizations and statistical analyses. CPU intensive operations are coded in C for speed.

Maintained by Daniel P. Smith. Last updated 6 days ago.

1.8 match 15 stars 9.02 score 117 scripts 6 dependents

kharchenkolab

pagoda2:Single Cell Analysis and Differential Expression

Analyzing and interactively exploring large-scale single-cell RNA-seq datasets. 'pagoda2' primarily performs normalization and differential gene expression analysis, with an interactive application for exploring single-cell RNA-seq datasets. It performs basic tasks such as cell size normalization, gene variance normalization, and can be used to identify subpopulations and run differential expression within individual samples. 'pagoda2' was written to rapidly process modern large-scale scRNAseq datasets of approximately 1e6 cells. The companion web application allows users to explore which gene expression patterns form the different subpopulations within your data. The package also serves as the primary method for preprocessing data for conos, <https://github.com/kharchenkolab/conos>. This package interacts with data available through the 'p2data' package, which is available in a 'drat' repository. To access this data package, see the instructions at <https://github.com/kharchenkolab/pagoda2>. The size of the 'p2data' package is approximately 6 MB.

Maintained by Evan Biederstedt. Last updated 1 year ago.

scrna-seq single-cell single-cell-rna-seq transcriptomics openblas cpp openmp

2.0 match 222 stars 8.00 score 282 scripts

eahouseman

RPMM:Recursively Partitioned Mixture Model

Recursively Partitioned Mixture Model for Beta and Gaussian Mixtures. This is a model-based clustering algorithm that returns a hierarchy of classes, similar to hierarchical clustering, but also similar to finite mixture models.

Maintained by E. Andres Houseman. Last updated 8 years ago.

3.6 match 4.34 score 78 scripts 7 dependents

bioc

PhyloProfile:PhyloProfile

PhyloProfile is a tool for exploring complex phylogenetic profiles. Phylogenetic profiles, presence/absence patterns of genes over a set of species, are commonly used to trace the functional and evolutionary history of genes across species and time. With PhyloProfile we can enrich regular phylogenetic profiles with further data like sequence/structure similarity, to make phylogenetic profiling more meaningful. Besides the interactive visualisation powered by R-Shiny, the package offers a set of further analysis features for gaining insights, such as gene age estimation or core gene identification.

Maintained by Vinh Tran. Last updated 6 days ago.

software visualization datarepresentation multiplecomparison functionalprediction dimensionreduction bioinformatics heatmap interactive-visualizations orthologs phylogenetic-profile shiny

2.0 match 33 stars 7.77 score 10 scripts

r-forge

ClassDiscovery:Classes and Methods for "Class Discovery" with Microarrays or Proteomics

Defines the classes used for "class discovery" problems in the OOMPA project (<http://oompa.r-forge.r-project.org/>). Class discovery primarily consists of unsupervised clustering methods with attempts to assess their statistical significance.

Maintained by Kevin R. Coombes. Last updated 1 month ago.

microarray clustering

1.8 match 8.53 score 85 scripts 9 dependents

stemangiola

tidyHeatmap:A Tidy Implementation of Heatmap

This is a tidy implementation for heatmap. At the moment it is based on the (great) package 'ComplexHeatmap'. The goal of this package is to interface a tidy data frame with this powerful tool. Some of the advantages are: Row and/or column colour annotations are easy to integrate by just specifying one parameter (column names). Custom grouping of rows is easy to specify by providing a grouped tbl. For example: df %>% group_by(...). Label size is adjusted by the total number of rows and columns. Default use of Brewer and Viridis palettes.

Maintained by Stefano Mangiola. Last updated 1 month ago.

assaydomain infrastructure brewer complexheatmap custom-palette dplyr graphviz heatmap mtcars plotting rstudio scale tibble tidy tidy-data-frame tidybulk tidyverse viridis

1.5 match 335 stars 10.23 score 197 scripts 1 dependent

uclahs-cds

CancerEvolutionVisualization:Publication Quality Phylogenetic Tree Plots

Generates tree plots with precise branch lengths, gene annotations, and cellular prevalence. The package handles complex tree structures (angles, lengths, etc.) and can be further refined as needed by the user.

Maintained by Paul Boutros. Last updated 1 day ago.

2.4 match 2 stars 6.34 score 5 scripts

bioc

GeneTonic:Enjoy Analyzing And Integrating The Results From Differential Expression Analysis And Functional Enrichment Analysis

This package provides functionality to combine the existing pieces of the transcriptome data and results, making it easier to generate insightful observations and hypotheses. Its usage is made easy with a Shiny application, combining the benefits of interactivity and reproducibility e.g. by capturing the features and gene sets of interest highlighted during the live session, and creating an HTML report as an artifact where text, code, and output coexist. Using the GeneTonicList as a standardized container for all the required components, it is possible to simplify the generation of multiple visualizations and summaries.

Maintained by Federico Marini. Last updated 2 months ago.

gui geneexpression software transcription transcriptomics visualization differentialexpression pathways reportwriting genesetenrichment annotation go shinyapps bioconductor bioconductor-package data-exploration data-visualization functional-enrichment-analysis gene-expression pathway-analysis reproducible-research rna-seq-analysis rna-seq-data shiny transcriptome user-friendly

1.8 match 77 stars 8.28 score 37 scripts 1 dependent

wencke

GOplot:Visualization of Functional Analysis Data

Implementation of multilayered visualizations for enhanced graphical representation of functional analysis data. It combines and integrates omics data derived from expression and functional annotation enrichment analyses. Its plotting functions have been developed with a hierarchical structure in mind: starting from a general overview to identify the most enriched categories (modified bar plot, bubble plot) to a more detailed one displaying different types of relevant information for the molecules in a given set of categories (circle plot, chord plot, cluster plot, Venn diagram, heatmap).

Maintained by Wencke Walter. Last updated 8 years ago.

2.3 match 20 stars 6.60 score 235 scripts

r-suzuki

pvclust:Hierarchical Clustering with P-Values via Multiscale Bootstrap Resampling

An implementation of multiscale bootstrap resampling for assessing the uncertainty in hierarchical cluster analysis. It provides SI (selective inference) p-value, AU (approximately unbiased) p-value and BP (bootstrap probability) value for each cluster in a dendrogram.

Maintained by Ryota Suzuki. Last updated 5 years ago.

2.3 match 5 stars 6.54 score 784 scripts 9 dependents

shaunpwilkinson

aphid:Analysis with Profile Hidden Markov Models

Designed for the development and application of hidden Markov models and profile HMMs for biological sequence analysis. Contains functions for multiple and pairwise sequence alignment, model construction and parameter optimization, file import/export, implementation of the forward, backward and Viterbi algorithms for conditional sequence probabilities, tree-based sequence weighting, and sequence simulation. Features a wide variety of potential applications including database searching, gene-finding and annotation, phylogenetic analysis and sequence classification. Based on the models and algorithms described in Durbin et al (1998, ISBN: 9780521629713).

Maintained by Shaun Wilkinson. Last updated 8 months ago.

cpp

2.3 match 22 stars 6.58 score 38 scripts 3 dependents

loukiaspin

rnmamod:Bayesian Network Meta-Analysis with Missing Participants

A comprehensive suite of functions to perform and visualise pairwise and network meta-analysis with aggregate binary or continuous missing participant outcome data. The package covers core Bayesian one-stage models implemented in a systematic review with multiple interventions, including fixed-effect and random-effects network meta-analysis, meta-regression, evaluation of the consistency assumption via the node-splitting approach and the unrelated mean effects model (original and revised model proposed by Spineli, (2022) <doi:10.1177/0272989X211068005>), and sensitivity analysis (see Spineli et al., (2021) <doi:10.1186/s12916-021-02195-y>). Missing participant outcome data are addressed in all models of the package (see Spineli, (2019) <doi:10.1186/s12874-019-0731-y>, Spineli et al., (2019) <doi:10.1002/sim.8207>, Spineli, (2019) <doi:10.1016/j.jclinepi.2018.09.002>, and Spineli et al., (2021) <doi:10.1002/jrsm.1478>). The robustness to primary analysis results can also be investigated using a novel intuitive index (see Spineli et al., (2021) <doi:10.1177/0962280220983544>). Methods to evaluate the transitivity assumption quantitatively are provided (see Spineli, (2024) <doi:10.1186/s12874-024-02436-7>). A novel index to facilitate interpretation of local inconsistency is also available (see Spineli, (2024) <doi:10.1186/s13643-024-02680-4>). The package also offers a rich, user-friendly visualisation toolkit that aids in appraising and interpreting the results thoroughly and preparing the manuscript for journal submission. The visualisation tools comprise the network plot, forest plots, panel of diagnostic plots, heatmaps on the extent of missing participant outcome data in the network, league heatmaps on estimation and prediction, rankograms, Bland-Altman plot, leverage plot, deviance scatterplot, heatmap of robustness, barplot of Kullback-Leibler divergence, heatmap of comparison dissimilarities and dendrogram of comparison clustering. The package also allows the user to export the results to an Excel file in the working directory.

Maintained by Loukia Spineli. Last updated 9 days ago.

jags cpp

2.2 match 5 stars 6.64 score 12 scripts

bioc

BioNERO:Biological Network Reconstruction Omnibus

BioNERO aims to integrate all aspects of biological network inference in a single package, including data preprocessing, exploratory analyses, network inference, and analyses for biological interpretations. BioNERO can be used to infer gene coexpression networks (GCNs) and gene regulatory networks (GRNs) from gene expression data. Additionally, it can be used to explore topological properties of protein-protein interaction (PPI) networks. GCN inference relies on the popular WGCNA algorithm. GRN inference is based on the "wisdom of the crowds" principle, which consists in inferring GRNs with multiple algorithms (here, CLR, GENIE3 and ARACNE) and calculating the average rank for each interaction pair. As all steps of network analyses are included in this package, BioNERO spares users from having to learn the syntaxes of several packages and how to pass data between them. Finally, users can also identify consensus modules across independent expression sets and calculate intra and interspecies module preservation statistics between different networks.

Maintained by Fabricio Almeida-Silva. Last updated 5 months ago.

software geneexpression generegulation systemsbiology graphandnetwork preprocessing network networkinference

1.9 match 27 stars 7.78 score 50 scripts 1 dependent

bioc

simplifyEnrichment:Simplify Functional Enrichment Results

A new clustering algorithm, "binary cut", for clustering similarity matrices of functional terms is implemented in this package. It also provides functions for visualizing, summarizing and comparing the clusterings.

Maintained by Zuguang Gu. Last updated 5 months ago.

software visualization go clustering genesetenrichment

1.8 match 113 stars 8.02 score 196 scripts

hiweller

recolorize:Color-Based Image Segmentation

Automatic, semi-automatic, and manual functions for generating color maps from images. The idea is to simplify the colors of an image according to a metric that is useful for the user, using deterministic methods whenever possible. Many images will be clustered well using the out-of-the-box functions, but the package also includes a toolbox of functions for making manual adjustments (layer merging/isolation, blurring, fitting to provided color clusters or those from another image, etc). Also includes export methods for other color/pattern analysis packages (pavo, patternize, colordistance).

Maintained by Hannah Weller. Last updated 13 days ago.

1.9 match 39 stars 7.68 score 87 scripts

bioc

spatialHeatmap:spatialHeatmap: Visualizing Spatial Assays in Anatomical Images and Large-Scale Data Extensions

The spatialHeatmap package offers the primary functionality for visualizing cell-, tissue- and organ-specific assay data in spatial anatomical images. Additionally, it provides extended functionalities for large-scale data mining routines and co-visualizing bulk and single-cell data. A description of the project is available here: https://spatialheatmap.org.

Maintained by Jianhai Zhang. Last updated 4 months ago.

spatial visualization microarray sequencing geneexpression datarepresentation network clustering graphandnetwork cellbasedassays atacseq dnaseq tissuemicroarray singlecell cellbiology genetarget

2.3 match 5 stars 6.26 score 12 scripts

bioc

structToolbox:Data processing & analysis tools for Metabolomics and other omics

An extensive set of data (pre-)processing and analysis methods and tools for metabolomics and other omics, with a strong emphasis on statistics and machine learning. This toolbox allows the user to build extensive and standardised workflows for data analysis. The methods and tools have been implemented using class-based templates provided by the struct (Statistics in R Using Class-based Templates) package. The toolbox includes pre-processing methods (e.g. signal drift and batch correction, normalisation, missing value imputation and scaling), univariate (e.g. t-test, various forms of ANOVA, Kruskal–Wallis test and more) and multivariate statistical methods (e.g. PCA and PLS, including cross-validation and permutation testing) as well as machine learning methods (e.g. Support Vector Machines). The STATistics Ontology (STATO) has been integrated and implemented to provide standardised definitions for the different methods, inputs and outputs.

Maintained by Gavin Rhys Lloyd. Last updated 25 days ago.

workflowstep metabolomics bioconductor-package dims lc-ms machine-learning multivariate-analysis statistics univariate

2.3 match 10 stars 6.26 score 12 scripts

bioc

cola:A Framework for Consensus Partitioning

Subgroup classification is a basic task in genomic data analysis, especially for gene expression and DNA methylation data analysis. It can also be used to test the agreement to known clinical annotations, or to test whether there exist significant batch effects. The cola package provides a general framework for subgroup classification by consensus partitioning. It has the following features: 1. It modularizes the consensus partitioning processes so that various methods can be easily integrated. 2. It provides rich visualizations for interpreting the results. 3. It allows running multiple methods at the same time and provides functionalities to compare results straightforwardly. 4. It provides a new method to extract features which are more efficient at separating subgroups. 5. It automatically generates detailed reports for the complete analysis. 6. It allows applying consensus partitioning in a hierarchical manner.

Maintained by Zuguang Gu. Last updated 1 month ago.

clustering geneexpression classification software consensus-clustering cpp

1.8 match 61 stars 7.49 score 112 scripts

chavent

ClustOfVar:Clustering of Variables

Cluster analysis of a set of variables. Variables can be quantitative, qualitative or a mixture of both.

Maintained by Marie Chavent. Last updated 5 years ago.

2.0 match 7 stars 6.47 score 142 scripts 2 dependents

bioc

systemPipeTools:Tools for data visualization

The systemPipeTools package extends the widely used systemPipeR (SPR) workflow environment with an enhanced toolkit for data visualization, including utilities to automate the data visualization for analysis of differentially expressed genes (DEGs). systemPipeTools provides data transformation and data exploration functions via scatterplots, hierarchical clustering heatmaps, principal component analysis, multidimensional scaling, generalized principal components, t-Distributed Stochastic Neighbor Embedding (t-SNE), and MA and volcano plots. All these utilities can be integrated with the modular design of the systemPipeR environment, which allows users to easily substitute any of these features with custom alternatives.

Maintained by Daniela Cassol. Last updated 5 months ago.

infrastructure dataimport sequencing qualitycontrol reportwriting experimentaldesign clustering differentialexpression multidimensionalscaling principalcomponent

3.2 match 4.00 score 4 scripts

bioc

GeDi:Defining and visualizing the distances between different genesets

The package provides different distance measurements to calculate the difference between genesets. Based on these scores the genesets are clustered and visualized as a graph. This is all presented in an interactive Shiny application for ease of use.

Maintained by Annekathrin Nedwed. Last updated 5 months ago.

gui genesetenrichment software transcription rnaseq visualization clustering pathways reportwriting go kegg reactome shinyapps

2.3 match 1 star 5.52 score 22 scripts

mums2

mpactr:Correction of Preprocessed MS Data

An 'R' implementation of the 'python' program Metabolomics Peak Analysis Computational Tool ('MPACT') (Robert M. Samples, Sara P. Puckett, and Marcy J. Balunas (2023) <doi:10.1021/acs.analchem.2c04632>). Filters in the package serve to address common errors in tandem mass spectrometry preprocessing, including: (1) isotopic patterns that are incorrectly split during preprocessing, (2) features present in solvent blanks due to carryover between samples, (3) features whose abundance is greater than a user-defined abundance threshold in a specific group of samples, for example media blanks, (4) ions that are inconsistent between technical replicates, and (5) in-source fragment ions created during ionization before fragmentation in the tandem mass spectrometry workflow.

Maintained by Patrick Schloss. Last updated 2 days ago.

cpp

2.2 match 1 star 5.56 score 4 scripts

bioc

made4:Multivariate analysis of microarray data using ADE4

Multivariate data analysis and graphical display of microarray data. Functions include supervised dimension reduction (between group analysis) and joint dimension reduction of 2 datasets (coinertia analysis). It contains functions that require the R package ade4.

Maintained by Aedin Culhane. Last updated 5 months ago.

clustering classification dimensionreduction principalcomponent transcriptomics multiplecomparison geneexpression sequencing microarray

2.0 match 6.11 score 107 scripts 2 dependents

bioc

ViSEAGO:ViSEAGO: a Bioconductor package for clustering biological functions using Gene Ontology and semantic similarity

The main objective of the ViSEAGO package is to carry out data mining of biological functions and establish links between genes involved in the study. We developed ViSEAGO in R to facilitate functional Gene Ontology (GO) analysis of complex experimental designs with multiple comparisons of interest. It allows large-scale datasets to be studied together and GO profiles to be visualized to capture biological knowledge. The acronym stands for three major concepts of the analysis: Visualization, Semantic similarity and Enrichment Analysis of Gene Ontology. It provides access to the most current GO annotations, which are retrieved from one of the NCBI EntrezGene, Ensembl or Uniprot databases for several species. Using available R packages and novel developments, ViSEAGO extends classical functional GO analysis to focus on functional coherence by aggregating closely related biological themes while studying multiple datasets at once. It provides both a synthetic and detailed view using interactive functionalities respecting the GO graph structure and ensuring functional coherence supplied by semantic similarity. ViSEAGO has been successfully applied on several datasets from different species with a variety of biological questions. Results can be easily shared between bioinformaticians and biologists, enhancing reporting capabilities while maintaining reproducibility.

Maintained by Aurelien Brionne. Last updated 2 months ago.

software annotation go genesetenrichment multiplecomparison clustering visualization

1.8 match 6.64 score 22 scripts

dami82

colorhcplot:Colorful Hierarchical Clustering Dendrograms

Build dendrograms with sample groups highlighted by different colors. Visualize results of hierarchical clustering analyses as dendrograms whose leaves and labels are colored according to sample grouping. Assess whether data point grouping aligns to naturally occurring clusters.

Maintained by Damiano Fantini. Last updated 7 years ago.

5.8 match 2.00 score 5 scripts

cran

compositions:Compositional Data Analysis

Provides functions for the consistent analysis of compositional data (e.g. portions of substances) and positive numbers (e.g. concentrations) in the way proposed by J. Aitchison and V. Pawlowsky-Glahn.

Maintained by K. Gerald van den Boogaart. Last updated 1 years ago.

openblas

1.8 match 1 stars 6.35 score 36 dependents

jfq3

ggordiplots:Make 'ggplot2' Versions of Vegan's Ordiplots

The 'vegan' package includes several functions for adding features to ordination plots: ordiarrows(), ordiellipse(), ordihull(), ordispider() and ordisurf(). This package adds these same features to ordination plots made with 'ggplot2'. In addition, gg_ordibubble() sizes points relative to the value of an environmental variable.

Maintained by John Quensen. Last updated 5 months ago.

1.9 match 7 stars 6.09 score 175 scripts

aroneklund

squash:Color-Based Plots for Multivariate Visualization

Functions for color-based visualization of multivariate data, i.e. colorgrams or heatmaps. Lower-level functions map numeric values to colors, display a matrix as an array of colors, and draw color keys. Higher-level plotting functions generate a bivariate histogram, a dendrogram aligned with a color-coded matrix, a triangular distance matrix, and more.

Maintained by Aron C. Eklund. Last updated 2 years ago.

2.4 match 2 stars 4.74 score 46 scripts 4 dependents

jpfitzinger

hfr:Estimate Hierarchical Feature Regression Models

Provides functions for the estimation, plotting, predicting and cross-validation of hierarchical feature regression models as described in Pfitzinger (2024). Cluster Regularization via a Hierarchical Feature Regression. Econometrics and Statistics (in press). <doi:10.1016/j.ecosta.2024.01.003>.

Maintained by Johann Pfitzinger. Last updated 1 years ago.

hierarchical-clustering machine-learning penalized-regression regularized-regression

3.8 match 1 stars 3.00 score 1 scripts

brandmaier

pdc:Permutation Distribution Clustering

Permutation Distribution Clustering is a clustering method for time series. Dissimilarity of time series is formalized as the divergence between their permutation distributions. The permutation distribution was proposed as a measure of the complexity of a time series.

Maintained by Andreas M. Brandmaier. Last updated 2 years ago.

2.0 match 6 stars 5.61 score 25 scripts 9 dependents

bioc

ChAMP:Chip Analysis Methylation Pipeline for Illumina HumanMethylation450 and EPIC

The package includes quality control metrics, a selection of normalization methods and novel methods to identify differentially methylated regions and to highlight copy number alterations.

Maintained by Yuan Tian. Last updated 5 months ago.

microarray methylationarray normalization twochannel copynumber dnamethylation

1.7 match 6.54 score 278 scripts

gefeizhang

statVisual:Statistical Visualization Tools

Visualization functions in the applications of translational medicine (TM) and biomarker (BM) development to compare groups by statistically visualizing data and/or results of analyses, such as visualizing data by displaying in one figure different groups' histograms, boxplots, densities, scatter plots, error-bar plots, or trajectory plots, by displaying scatter plots of top principal components or dendrograms with data points colored based on group information, or visualizing volcano plots to check the results of whole genome analyses for gene differential expression.

Maintained by Wenfei Zhang. Last updated 5 years ago.

3.6 match 3.00 score 3 scripts

cbhurley

gclus:Clustering Graphics

Orders panels in scatterplot matrices and parallel coordinate displays by some merit index. Package contains various indices of merit, ordering functions, and enhanced versions of pairs and parcoord which color panels according to their merit level.

Maintained by Catherine Hurley. Last updated 6 years ago.

1.3 match 8.23 score 406 scripts 82 dependents

tom-wolff

ideanet:Integrating Data Exchange and Analysis for Networks ('ideanet')

A suite of convenient tools for social network analysis geared toward students, entry-level users, and non-expert practitioners. ‘ideanet’ features unique functions for the processing and measurement of sociocentric and egocentric network data. These functions automatically generate node- and system-level measures commonly used in the analysis of these types of networks. Outputs from these functions maximize the ability of novice users to employ network measurements in further analyses while making all users less prone to common data analytic errors. Additionally, ‘ideanet’ features an R Shiny graphic user interface that allows novices to explore network data with minimal need for coding.

Maintained by Tom Wolff. Last updated 2 days ago.

1.5 match 6 stars 6.80 score 10 scripts

maximeherve

RVAideMemoire:Testing and Plotting Procedures for Biostatistics

Contains miscellaneous functions useful in biostatistics, mostly univariate and multivariate testing procedures with a special emphasis on permutation tests. Many functions intend to simplify user's life by shortening existing procedures or by implementing plotting functions that can be used with as many methods from different packages as possible.

Maintained by Maxime HERVE. Last updated 1 years ago.

1.9 match 8 stars 5.31 score 632 scripts

sgs2000

ClustMC:Cluster-Based Multiple Comparisons

Multiple comparison techniques are typically applied following an F test from an ANOVA to decide which means are significantly different from one another. As an alternative to traditional methods, cluster analysis can be performed to group the means of different treatments into non-overlapping clusters. Treatments in different groups are considered statistically different. Several approaches have been proposed, with varying clustering methods and cut-off criteria. This package implements cluster-based multiple comparisons tests and also provides a visual representation in the form of a dendrogram. Di Rienzo, J. A., Guzman, A. W., & Casanoves, F. (2002) <jstor.org/stable/1400690>. Bautista, M. G., Smith, D. W., & Steiner, R. L. (1997) <doi:10.2307/1400402>.

Maintained by Santiago Garcia Sanchez. Last updated 7 months ago.

2.0 match 4.90 score 6 scripts

pneuvial

adjclust:Adjacency-Constrained Clustering of a Block-Diagonal Similarity Matrix

Implements a constrained version of hierarchical agglomerative clustering, in which each observation is associated to a position, and only adjacent clusters can be merged. Typical application fields in bioinformatics include Genome-Wide Association Studies or Hi-C data analysis, where the similarity between items is a decreasing function of their genomic distance. Taking advantage of this feature, the implemented algorithm is time and memory efficient. This algorithm is described in Ambroise et al (2019) <doi:10.1186/s13015-019-0157-4>.

Maintained by Pierre Neuvial. Last updated 5 months ago.

clustering featureextraction gwas hi-c hierarchical-clustering linkage-disequilibrium cpp openmp

1.3 match 16 stars 7.35 score 13 scripts 2 dependents

minoo-asty

CINNA:Deciphering Central Informative Nodes in Network Analysis

Computing, comparing, and demonstrating top informative centrality measures within a network. "CINNA: an R/CRAN package to decipher Central Informative Nodes in Network Analysis" provides a comprehensive overview of the package functionality Ashtiani et al. (2018) <doi:10.1093/bioinformatics/bty819>.

Maintained by Minoo Ashtiani. Last updated 2 years ago.

2.9 match 1 stars 3.29 score 98 scripts

bioc

scGPS:A complete analysis of single cell subpopulations, from identifying subpopulations to analysing their relationship (scGPS = single cell Global Predictions of Subpopulation)

The package implements two main algorithms to answer two key questions: a SCORE (Stable Clustering at Optimal REsolution) to find subpopulations, followed by scGPS to investigate the relationships between subpopulations.

Maintained by Quan Nguyen. Last updated 5 months ago.

singlecell clustering dataimport sequencing coverage openblas cpp

1.8 match 4 stars 5.20 score 7 scripts

kaneplusplus

listdown:Create R Markdown from Lists

Programmatically create R Markdown documents from lists.

Maintained by Michael J. Kane. Last updated 2 years ago.

1.8 match 27 stars 5.17 score 11 scripts

cmlmagneville

mFD:Compute and Illustrate the Multiple Facets of Functional Diversity

Computes functional trait-based distances between pairs of species for species gathered in assemblages, allowing several functional spaces to be built. The package computes functional diversity indices assessing the distribution of species (and of their dominance) in a given functional space for each assemblage and the overlap between assemblages in a given functional space, see: Chao et al. (2018) <doi:10.1002/ecm.1343>, Maire et al. (2015) <doi:10.1111/geb.12299>, Mouillot et al. (2013) <doi:10.1016/j.tree.2012.10.004>, Mouillot et al. (2014) <doi:10.1073/pnas.1317625111>, Ricotta and Szeidl (2009) <doi:10.1016/j.tpb.2009.10.001>. Graphical outputs are included. Visit the 'mFD' website for more information, documentation and examples.

Maintained by Camille Magneville. Last updated 3 months ago.

1.3 match 26 stars 7.35 score 61 scripts

adrientaudiere

cati:Community Assembly by Traits: Individuals and Beyond

Detect and quantify community assembly processes using trait values of individuals or populations, the T-statistics and other metrics, and dedicated null models.

Maintained by Adrien Taudiere. Last updated 4 months ago.

1.7 match 12 stars 5.33 score 15 scripts

bioc

InterCellar:InterCellar: an R-Shiny app for interactive analysis and exploration of cell-cell communication in single-cell transcriptomics

InterCellar is implemented as an R/Bioconductor Package containing a Shiny app that allows users to interactively analyze cell-cell communication from scRNA-seq data. Starting from precomputed ligand-receptor interactions, InterCellar provides filtering options, annotations and multiple visualizations to explore clusters, genes and functions. Finally, based on functional annotation from Gene Ontology and pathway databases, InterCellar implements data-driven analyses to investigate cell-cell communication in one or multiple conditions.

Maintained by Marta Interlandi. Last updated 5 months ago.

software singlecell visualization go transcriptomics

1.8 match 9 stars 4.95 score 7 scripts

mi2-warsaw

sejmRP:An Information About Deputies and Votings in Polish Diet from Seventh to Eighth Term of Office

Set of functions that access information about deputies and votings in Polish diet from webpage <http://www.sejm.gov.pl>. The package was developed as a result of an internship in MI2 Group - <http://mi2.mini.pw.edu.pl>, Faculty of Mathematics and Information Science, Warsaw University of Technology.

Maintained by Piotr Smuda. Last updated 8 years ago.

1.8 match 21 stars 5.04 score 35 scripts

jarioksa

natto:An Extreme 'vegan' Package of Experimental Code

Random code that is too experimental or too weird to be included in the vegan package.

Maintained by Jari Oksanen. Last updated 28 days ago.

1.9 match 8 stars 4.68 score 1 scripts

rhenkin

visxhclust:A Shiny App for Visual Exploration of Hierarchical Clustering

A Shiny application and functions for visual exploration of hierarchical clustering with numeric datasets. Allows users to iteratively set hyperparameters, select features and evaluate results through various plots and computation of evaluation criteria.

Maintained by Rafael Henkin. Last updated 2 years ago.

clustering data-analysis data-science r-shiny shiny-apps

1.8 match 4 stars 4.86 score 12 scripts

dicook

mulgar:Functions for Pre-Processing Data for Multivariate Data Visualisation using Tours

This is a companion to the book Cook, D. and Laa, U. (2023) <https://dicook.github.io/mulgar_book/> "Interactively exploring high-dimensional data and models in R". It contains useful functions for processing data in preparation for visualising with a tour. There are also several sample data sets.

Maintained by Dianne Cook. Last updated 2 months ago.

1.9 match 4 stars 4.50 score 79 scripts

nicolas-robette

seqhandbook:Miscellaneous Tools for Sequence Analysis

It provides miscellaneous sequence analysis functions for describing episodes in individual sequences, measuring association between domains in multidimensional sequence analysis (see Piccarreta (2017) <doi:10.1177/0049124115591013>), heat maps of sequence data, Globally Interdependent Multidimensional Sequence Analysis (see Robette et al (2015) <doi:10.1177/0081175015570976>), smoothing sequences for index plots (see Piccarreta (2012) <doi:10.1177/0049124112452394>), coding sequences for Qualitative Harmonic Analysis (see Deville (1982)), measuring stress from multidimensional scaling factors (see Piccarreta and Lior (2010) <doi:10.1111/j.1467-985X.2009.00606.x>), symmetrical (or canonical) Partial Least Squares (see Bry (1996)).

Maintained by Nicolas Robette. Last updated 2 years ago.

1.8 match 6 stars 4.76 score 19 scripts

clancylabuiuc

moRphomenses:Geometric Morphometric Tools to Align, Scale, and Compare "Shape" of Menstrual Cycle Hormones

Mitteroecker & Gunz (2009) <doi:10.1007/s11692-009-9055-x> describe how geometric morphometric methods allow researchers to quantify the size and shape of physical biological structures. We provide tools to extend geometric morphometric principles to the study of non-physical structures, hormone profiles, as outlined in Ehrlich et al (2021) <doi:10.1002/ajpa.24514>. Easily transform daily measures into multivariate landmark-based data. Includes custom functions to apply multivariate methods for data exploration as well as hypothesis testing. Also includes 'shiny' web app to streamline data exploration. Developed to study menstrual cycle hormones but functions have been generalized and should be applicable to any biomarker over any time period.

Maintained by Daniel Ehrlich. Last updated 2 months ago.

2.0 match 2 stars 4.04 score 4 scripts

cstewartgh

QFASA:Quantitative Fatty Acid Signature Analysis

Accurate estimates of the diets of predators are required in many areas of ecology, but for many species current methods are imprecise, limited to the last meal, and often biased. The diversity of fatty acids and their patterns in organisms, coupled with the narrow limitations on their biosynthesis, properties of digestion in monogastric animals, and the prevalence of large storage reservoirs of lipid in many predators, led to the development of quantitative fatty acid signature analysis (QFASA) to study predator diets.

Maintained by Connie Stewart. Last updated 7 months ago.

1.7 match 1 stars 4.83 score 17 scripts

audreyqyfu

MRPC:PC Algorithm with the Principle of Mendelian Randomization

A PC Algorithm with the Principle of Mendelian Randomization. This package implements the MRPC (PC with the principle of Mendelian randomization) algorithm to infer causal graphs. It also contains functions to simulate data under a certain topology, to visualize a graph in different ways, and to compare graphs and quantify the differences. See Badsha and Fu (2019) <doi:10.3389/fgene.2019.00460>, Badsha, Martin and Fu (2021) <doi:10.3389/fgene.2021.651812>.

Maintained by Audrey Fu. Last updated 3 years ago.

1.7 match 8 stars 4.68 score 20 scripts

andeek

protoshiny:Interactive Dendrograms for Visualizing Hierarchical Clusters with Prototypes

Shiny app to interactively visualize hierarchical clustering with prototypes. For details on hierarchical clustering with prototypes, see Bien and Tibshirani (2011) <doi:10.1198/jasa.2011.tm10183>. This package currently launches the application.

Maintained by Andee Kaplan. Last updated 3 years ago.

2.9 match 1 stars 2.70 score 4 scripts

jarioksa

twinspan:Two-Way Indicator Species Analysis

Classification of biological communities based on splitting first axis of Correspondence Analysis for the current subset of the data, and finding species that best indicate the splits. The method is particularly popular in vegetation science.

Maintained by Jari Oksanen. Last updated 4 months ago.

fortran

1.9 match 7 stars 4.10 score 18 scripts

cran

sparcl:Perform Sparse Hierarchical Clustering and Sparse K-Means Clustering

Implements the sparse clustering methods of Witten and Tibshirani (2010): "A framework for feature selection in clustering"; published in Journal of the American Statistical Association 105(490): 713-726.

Maintained by Daniela Witten. Last updated 6 years ago.

fortran

1.8 match 1 stars 4.20 score 133 scripts 4 dependents

bioc

GSEAmining:Make Biological Sense of Gene Set Enrichment Analysis Outputs

Gene Set Enrichment Analysis is a very powerful and interesting computational method that allows an easy correlation between differentially expressed genes and biological processes. Unfortunately, although it was designed to help researchers interpret gene expression data, it can generate huge amounts of results whose biological meaning can be difficult to interpret. Many available tools rely on the hierarchically structured Gene Ontology (GO) classification to reduce redundancy in the results. However, due to the popularity of GSEA many more gene set collections, such as those in the Molecular Signatures Database, are emerging. Since these collections are not organized as those in GO, their usage for GSEA does not always give a straightforward answer or, in other words, getting all the meaningful information can be challenging with the currently available tools. For these reasons, GSEAmining was born to be an easy tool to create reproducible reports to help researchers make biological sense of GSEA outputs. Given the results of GSEA, GSEAmining clusters the different gene set collections based on the presence of the same genes in the leading edge (core) subset. Leading edge subsets are those genes that contribute most to the enrichment score of each collection of genes or gene sets. For this reason, gene sets that participate in similar biological processes should share genes in common and in turn cluster together. After that, GSEAmining is able to identify and represent for each cluster: - The most enriched terms in the names of gene sets (as wordclouds) - The most enriched genes in the leading edge subsets (as bar plots). In each case, positive and negative enrichments are shown in different colors so it is easy to distinguish biological processes or genes that may be of interest in that particular study.

Maintained by Oriol Arqués. Last updated 5 months ago.

genesetenrichment clustering visualization

1.9 match 4.00 score 7 scripts

bioc

cn.farms:cn.FARMS - factor analysis for copy number estimation

This package implements the cn.FARMS algorithm for copy number variation (CNV) analysis. cn.FARMS allows analysis of the most common Affymetrix (250K-SNP6.0) array types and supports high-performance computing using snow and ff.

Maintained by Andreas Mitterecker. Last updated 5 months ago.

microarray copynumbervariation cpp

2.3 match 3.30 score 7 scripts

cran

PoiClaClu:Classification and Clustering of Sequencing Data Based on a Poisson Model

Implements the methods described in the paper, Witten (2011) Classification and Clustering of Sequencing Data using a Poisson Model, Annals of Applied Statistics 5(4) 2493-2518.

Maintained by Daniela Witten. Last updated 6 years ago.

1.8 match 3.81 score 107 scripts 2 dependents

promidat

discoveR:Exploratory Data Analysis System

Performs an exploratory data analysis through a 'shiny' interface. It includes basic methods such as the mean, median, mode, normality test, among others. It also includes clustering techniques such as Principal Components Analysis, Hierarchical Clustering and the K-Means Method.

Maintained by Oldemar Rodriguez. Last updated 2 years ago.

2.3 match 3 stars 3.03 score 18 scripts

arliph

SPARTAAS:Statistical Pattern Recognition and daTing using Archaeological Artefacts assemblageS

Statistical pattern recognition and dating using archaeological artefacts assemblages. Package of statistical tools for archaeology. hclustcompro(perioclust): Bellanger Lise, Coulon Arthur, Husi Philippe (2021, ISBN:978-3-030-60103-4). mapclust: Bellanger Lise, Coulon Arthur, Husi Philippe (2021) <doi:10.1016/j.jas.2021.105431>. seriograph: Desachy Bruno (2004) <doi:10.3406/pica.2004.2396>. cerardat: Bellanger Lise, Husi Philippe (2012) <doi:10.1016/j.jas.2011.06.031>.

Maintained by Arthur Coulon. Last updated 10 months ago.

1.6 match 6 stars 4.14 score 46 scripts

bioc

ChromHeatMap:Heat map plotting by genome coordinate

The ChromHeatMap package can be used to plot genome-wide data (e.g. expression, CGH, SNP) along each strand of a given chromosome as a heat map. The generated heat map can be used to interactively identify probes and genes of interest.

Maintained by Tim F. Rayner. Last updated 5 months ago.

visualization

1.7 match 3.30 score

sciviews

exploreit:Exploratory Data Analysis for 'SciViews::R'

Multivariate analysis and data exploration for the 'SciViews::R' dialect.

Maintained by Philippe Grosjean. Last updated 11 months ago.

multivariate-analysis sciviews statistical-methods

1.9 match 2.70 score 4 scripts

lau-mel

swamp:Visualization, Analysis and Adjustment of High-Dimensional Data in Respect to Sample Annotations

Collection of functions to connect the structure of the data with the information on the samples. Three types of associations are covered: 1. linear model of principal components. 2. hierarchical clustering analysis. 3. distribution of features-sample annotation associations. Additionally, the inter-relation between sample annotations can be analyzed. Simple methods are provided for the correction of batch effects and removal of principal components.

Maintained by Martin Lauss. Last updated 5 years ago.

1.9 match 2.42 score 29 scripts 1 dependents

zdeneksulc

nomclust:Hierarchical Cluster Analysis of Nominal Data

Similarity measures for hierarchical clustering of objects characterized by nominal (categorical) variables. Evaluation criteria for nominal data clustering.

Maintained by Zdenek Sulc. Last updated 2 years ago.

cpp

1.8 match 4 stars 2.48 score 38 scripts

gavinsimpson

MRFtools:Tools for Constructing and Plotting Markov Random Fields in R for Graphical Data

Utility functions for using Markov Random Field smooths in Generalized Additive Models fitted with the 'mgcv' package.

Maintained by Eric J. Petersen. Last updated 3 days ago.

2.0 match 2.18 score

cran

heatmapFlex:Tools to Generate Flexible Heatmaps

A set of tools supporting more flexible heatmaps. The graphics is grid-like using the old graphics system. The main function is heatmap.n2(), which is a wrapper around the various functions constructing individual parts of the heatmap, like sidebars, picket plots, legends etc. The function supports zooming and splitting, i.e., having (unlimited) small heatmaps underneath each other in one plot deriving from the same data set, e.g., clustered and ordered by a supervised clustering method.

Maintained by Vidal Fey. Last updated 4 years ago.

1.8 match 2.48 score 1 dependents

cran

ROKET:Optimal Transport-Based Kernel Regression

Perform optimal transport on somatic point mutations and kernel regression hypothesis testing by integrating pathway level similarities at the gene level (Little et al. (2023) <doi:10.1111/biom.13769>). The software implements balanced and unbalanced optimal transport and omnibus tests with 'C++' across a set of tumor samples and allows for multi-threading to decrease computational runtime.

Maintained by Paul Little. Last updated 10 days ago.

openblas cpp openmp

2.0 match 2.00 score

mmaechler

VLMC:Variable Length Markov Chains ('VLMC') Models

Functions, Classes & Methods for estimation, prediction, and simulation (bootstrap) of Variable Length Markov Chain ('VLMC') Models.

Maintained by Martin Maechler. Last updated 7 months ago.

2.0 match 1.92 score 28 scripts

blansche

fdm2id:Data Mining and R Programming for Beginners

Contains functions to simplify the use of data mining methods (classification, regression, clustering, etc.), for students and beginners in R programming. Various R packages are used and wrappers are built around the main functions, to standardize the use of data mining methods (input/output): it brings a certain loss of flexibility, but also a gain of simplicity. The package name came from the French "Fouille de Données en Master 2 Informatique Décisionnelle".

Maintained by Alexandre Blansché. Last updated 2 years ago.

2.3 match 1 stars 1.62 score 42 scripts

cran

RJSplot:Interactive Graphs with R

Creates interactive graphs with 'R'. It joins the data analysis power of R and the visualization libraries of JavaScript in one package.

Maintained by Carlos Prieto. Last updated 3 years ago.

2.3 match 4 stars 1.60 score

synergisticcauselearning

CoOL:Causes of Outcome Learning

Implementing the computational phase of the Causes of Outcome Learning approach as described in Rieckmann, Dworzynski, Arras, Lapuschkin, Samek, Arah, Rod, Ekstrom. 2022. Causes of outcome learning: A causal inference-inspired machine learning approach to disentangling common combinations of potential causes of a health outcome. International Journal of Epidemiology <doi:10.1093/ije/dyac078>. The optional 'ggtree' package can be obtained through Bioconductor.

Maintained by Andreas Rieckmann. Last updated 3 years ago.

openblas cpp

2.0 match 1.70 score 6 scripts

skranz

distRforest:Distribution-based Random Forest

Extension of the rpart package with added loss functions and random forest functionality.

Maintained by Roel Henckaerts. Last updated 5 years ago.

1.9 match 1.78 score 12 scripts

haijiangq

EFAfactors:Determining the Number of Factors in Exploratory Factor Analysis

Provides a collection of standard factor retention methods in Exploratory Factor Analysis (EFA), making it easier to determine the number of factors. Traditional methods such as the scree plot by Cattell (1966) <doi:10.1207/s15327906mbr0102_10>, Kaiser-Guttman Criterion (KGC) by Guttman (1954) <doi:10.1007/BF02289162> and Kaiser (1960) <doi:10.1177/001316446002000116>, and flexible Parallel Analysis (PA) by Horn (1965) <doi:10.1007/BF02289447> based on eigenvalues from PCA or EFA are readily available. This package also implements several newer methods, such as the Empirical Kaiser Criterion (EKC) by Braeken and van Assen (2017) <doi:10.1037/met0000074>, Comparison Data (CD) by Ruscio and Roche (2012) <doi:10.1037/a0025697>, and Hull method by Lorenzo-Seva et al. (2011) <doi:10.1080/00273171.2011.564527>, as well as some AI-based methods like Comparison Data Forest (CDF) by Goretzko and Ruscio (2024) <doi:10.3758/s13428-023-02122-4> and Factor Forest (FF) by Goretzko and Buhner (2020) <doi:10.1037/met0000262>. Additionally, it includes a deep neural network (DNN) trained on large-scale datasets that can efficiently and reliably determine the number of factors.

Maintained by Haijiang Qin. Last updated 27 days ago.

openblas cpp openmp

1.8 match 1.70 score

cran

twl:Two-Way Latent Structure Clustering Model

Implementation of a Bayesian two-way latent structure model for integrative genomic clustering. The model clusters samples in relation to distinct data sources, with each subject-dataset receiving a latent cluster label, though cluster labels have across-dataset meaning because of the model formulation. A common scaling across data sources is unneeded, and inference is obtained by a Gibbs Sampler. The model can fit multivariate Gaussian distributed clusters or a heavier-tailed modification of a Gaussian density. Uniquely among integrative clustering models, the formulation makes no nestedness assumptions of samples across data sources -- the user can still fit the model if a study subject only has information from one data source. The package provides a variety of post-processing functions for model examination including ones for quantifying observed alignment of clusterings across genomic data sources. Run time is optimized so that analyses of datasets on the order of thousands of features on fewer than 5 datasets and hundreds of subjects can converge in 1 or 2 days on a single CPU. See "Swanson DM, Lien T, Bergholtz H, Sorlie T, Frigessi A, Investigating Coordinated Architectures Across Clusters in Integrative Studies: a Bayesian Two-Way Latent Structure Model, 2018, <doi:10.1101/387076>, Cold Spring Harbor Laboratory" at <https://www.biorxiv.org/content/early/2018/08/07/387076.full.pdf> for model details.

Maintained by Michael Swanson. Last updated 7 years ago.

1.7 match 1.75 score 56 scripts

thezetner

Plasmidprofiler:Visualization of Plasmid Profile Results

Contains functions developed to combine the results of querying a plasmid database using short-read sequence typing with the results of a blast analysis against the query results.

Maintained by Adrian Zetner. Last updated 8 years ago.

1.8 match 1.43 score 27 scripts

tabea17

graphclust:Hierarchical Graph Clustering for a Collection of Networks

Graph clustering using an agglomerative algorithm to maximize the integrated classification likelihood criterion and a mixture of stochastic block models. The method is described in the article "Model-based clustering of multiple networks with a hierarchical algorithm" by T. Rebafka (2022) <arXiv:2211.02314>.

Maintained by Tabea Rebafka. Last updated 2 years ago.

1.7 match 1.43 score 27 scripts

plangfelder

moduleColor:Basic Module Functions

Methods for color labeling, calculation of eigengenes, and merging of closely related modules.

Maintained by Peter Langfelder. Last updated 3 years ago.

1.9 match 1.28 score 19 scripts

leondap

recluster:Ordination Methods for the Analysis of Beta-Diversity Indices

The analysis of different aspects of biodiversity requires specific algorithms. For example, in regionalisation analyses, the high frequency of ties and zero values in dissimilarity matrices produced by beta-diversity turnover yields hierarchical cluster dendrograms whose topology and bootstrap supports are affected by the order of rows in the original matrix. Moreover, visualisation of biogeographical regionalisation can be facilitated by a combination of hierarchical clustering and multi-dimensional scaling. The recluster package provides robust techniques to visualise and analyse patterns of biodiversity and to improve occurrence data for cryptic taxa.

Maintained by Leonardo Dapporto. Last updated 4 months ago.

0.5 match 4 stars 4.69 score 41 scripts

roger0268

octopucs:Statistical Support for Hierarchical Clusters

Generates n hierarchical clustering hypotheses on subsets of classifiers (usually species in community ecology studies). The n clustering hypotheses are combined to generate a generalized cluster, and three metrics of support are computed: (1) integrity, the average proportion of elements making up the group in each of the n clusters; (2) contamination, the average proportion of elements from other groups that enter a focal group; and (3) the probability of existence of the group, given the integrity and contamination, in a Bayesian approach.

Maintained by Roger Guevara. Last updated 7 months ago.

1.8 match 1.30 score

bioc

omicplotR:Visual Exploration of Omic Datasets Using a Shiny App

A Shiny app for visual exploration of omic datasets as compositions, and differential abundance analysis using ALDEx2. Useful for exploring RNA-seq, meta-RNA-seq, and 16S rRNA gene sequencing data with visualizations such as principal component analysis biplots (coloured using metadata for visualizing each variable), dendrograms and stacked bar plots, and effect plots (ALDEx2). Input is a table of counts and a metadata file (if metadata exists), with options to filter data by count or by metadata to remove low counts, or to visualize select samples according to selected metadata.

Maintained by Daniel Giguere. Last updated 5 months ago.

software differentialexpression geneexpression gui rnaseq dnaseq metagenomics transcriptomics bayesian microbiome visualization sequencing immunooncology

0.5 match 4.00 score 5 scripts

bioc

cellscape:Explores single cell copy number profiles in the context of a single cell tree

CellScape facilitates interactive browsing of single cell clonal evolution datasets. The tool requires two main inputs: (i) the genomic content of each single cell in the form of either copy number segments or targeted mutation values, and (ii) a single cell phylogeny. Phylogenetic formats can vary from dendrogram-like phylogenies with leaf nodes to evolutionary model-derived phylogenies with observed or latent internal nodes. The CellScape phylogeny is flexibly input as a table of source-target edges to support arbitrary representations, where each node may or may not have associated genomic data. The output of CellScape is an interactive interface displaying a single cell phylogeny and a cell-by-locus genomic heatmap representing the mutation status in each cell for each locus.

Maintained by Shixiang Wang. Last updated 5 months ago.

visualization

0.5 match 4.00 score 5 scripts

ashipunov

shipunov:Miscellaneous Functions from Alexey Shipunov

A collection of functions for data manipulation, plotting and statistical computing, to use separately or with the book "Visual Statistics. Use R!": Shipunov (2020) <http://ashipunov.info/shipunov/software/r/r-en.htm>. Dr Alexey Shipunov died in December 2022. Most useful functions: Bclust(), Jclust() and BootA(), which bootstrap hierarchical clustering; Recode(), which does multiple recoding in a fast, simple and flexible way; Misclass(), which outputs a confusion matrix even if classes are not concerted; Overlap(), which measures group separation on any projection; Biarrows(), which converts any scatterplot into a biplot; and Pleiad(), which is a fast and flexible correlogram.

Maintained by ORPHANED. Last updated 2 years ago.

1.9 match 1.00 score 9 scripts

jeffjetton

greenclust:Combine Categories Using Greenacre's Method

Implements a method of iteratively collapsing the rows of a contingency table, two at a time, by selecting the pair of categories whose combination yields a new table with the smallest loss of chi-squared, as described by Greenacre, M.J. (1988) <doi:10.1007/BF01901670>. The result is compatible with the class of object returned by the 'stats' package's hclust() function and can be used similarly (plotted as a dendrogram, cut, etc.). Additional functions are provided for automatic cutting and diagnostic plotting.

Maintained by Jeff Jetton. Last updated 1 year ago.

0.5 match 5 stars 3.40 score 8 scripts

rituroy

heatmap4:Simple Heatmap Function

A color image of a numerical matrix. A dendrogram can be added to the left side and to the top. This package takes the original heatmap function and reduces the argument complexity.

Maintained by Ritu Roy. Last updated 1 month ago.

0.5 match 3.00 score 4 scripts

albyfs

mdendro:Extended Agglomerative Hierarchical Clustering

A comprehensive collection of linkage methods for agglomerative hierarchical clustering on a matrix of proximity data (distances or similarities), returning a multifurcated dendrogram or multidendrogram. Multidendrograms can group more than two clusters when ties in proximity data occur, and therefore they do not depend on the order of the input data. Descriptive measures to analyze the resulting dendrogram are additionally provided.

Maintained by Alberto Fernandez. Last updated 1 year ago.

cpp

0.8 match 2.00 score 6 scripts

cran

MultivariateAnalysis:Pacote Para Analise Multivariada (Package for Multivariate Analysis)

Package with multivariate analysis methodologies for experiment evaluation. The package estimates dissimilarity measures, builds dendrograms, and obtains MANOVA, principal components, canonical variables, etc.

Maintained by Alcinei Mistico Azevedo. Last updated 11 months ago.

0.5 match 2.95 score

bioc

clustComp:Clustering Comparison Package

clustComp is a package that implements several techniques for the comparison and visualisation of relationships between different clustering results, either flat versus flat or hierarchical versus flat. These relationships among clusters are displayed using a weighted bi-graph, in which the nodes represent the clusters and the edges connect pairs of nodes with non-empty intersection; the weight of each edge is the number of elements in that intersection and is displayed through the edge thickness. The best layout of the bi-graph is provided by the barycentre algorithm, which minimises the weighted number of crossings. In the case of comparing a hierarchical and a non-hierarchical clustering, the dendrogram is pruned at different heights, selected by exploring the tree by depth-first search, starting at the root. Whether a branch is split is decided by the value of a scoring function, which can be based either on the aesthetics of the bi-graph or on the mutual information between the hierarchical and the flat clusterings. A mapping between groups of clusters from each side is constructed with a greedy algorithm, and can be additionally visualised.

Maintained by Aurora Torrente. Last updated 5 months ago.

geneexpression clustering visualization

0.5 match 2.60 score 1 script

daniel-jg

paintmap:Plotting Paintmaps

Plots matrices of colours as grids of coloured squares - aka heatmaps, guaranteeing legible row and column names, without transformation of values, without re-ordering rows or columns, and without dendrograms.

Maintained by Daniel Greene. Last updated 9 years ago.

0.5 match 2.26 score 6 scripts 6 dependents

joshageman

cmAnalysis:Process and Visualise Concept Mapping Data

Processing and visualizing concept mapping data. Concept maps are versatile tools used across disciplines to enhance understanding, teaching, brainstorming, and information organization. The analysis of concept mapping data involves the sequential use of cluster analysis (for sorting participants and statements), multidimensional scaling (for positioning statements in a conceptual space), and visualization techniques, including point cluster maps and dendrograms.

Maintained by Jos Hageman. Last updated 3 days ago.

0.5 match 1.00 score