R-universe search: silhouette

palaeoverse

rphylopic:Get Silhouettes of Organisms from PhyloPic

Work with the PhyloPic Web Service (<http://api-docs.phylopic.org/v2/>) to fetch silhouette images of organisms. Includes functions for adding silhouettes to both base R plots and ggplot2 plots.

Maintained by William Gearty. Last updated 6 months ago.

base-r ggplot2 phylopic silhouette

23.0 match 91 stars 9.25 score 272 scripts

nschiett

fishualize:Color Palettes Based on Fish Species

Implementation of color palettes based on fish species.

Maintained by Nina M. D. Schiettekatte. Last updated 11 months ago.

9.9 match 155 stars 8.54 score 370 scripts

mmaechler

cluster:"Finding Groups in Data": Cluster Analysis Extended Rousseeuw et al.

Methods for cluster analysis. A much-extended version of the original by Peter Rousseeuw, Anja Struyf and Mia Hubert, based on Kaufman and Rousseeuw (1990) "Finding Groups in Data".

Maintained by Martin Maechler. Last updated 4 days ago.

4.8 match 3 stars 11.98 score 14k scripts 2.2k dependents

sinhrks

ggfortify:Data Visualization Tools for Statistical Analysis Results

Unified plotting tools for commonly used statistics, such as GLM, time series, PCA families, clustering and survival analysis. The package offers a single plotting interface for these analysis results and plots in a unified style using 'ggplot2'.

Maintained by Yuan Tang. Last updated 9 months ago.

3.8 match 529 stars 14.49 score 9.1k scripts 22 dependents

tidymodels

tidyclust:A Common API to Clustering

A common interface to specifying clustering models, in the same style as 'parsnip'. Creates a unified interface across different functions and computational engines.

Maintained by Emil Hvitfeldt. Last updated 2 months ago.

6.8 match 111 stars 7.45 score 139 scripts

jkim82133

TDA:Statistical Tools for Topological Data Analysis

Tools for Topological Data Analysis. The package focuses on statistical analysis of persistent homology and density clustering. For that, this package provides an R interface for the efficient algorithms of the C++ libraries 'GUDHI' <https://project.inria.fr/gudhi/software/>, 'Dionysus' <https://www.mrzv.org/software/dionysus/>, and 'PHAT' <https://bitbucket.org/phat-code/phat/>. This package also implements methods from Fasy et al. (2014) <doi:10.1214/14-AOS1252> and Chazal et al. (2015) <doi:10.20382/jocg.v6i2a8> for analyzing the statistical significance of persistent homology features.

Maintained by Jisu Kim. Last updated 1 month ago.

gmp cpp

6.8 match 9 stars 7.18 score 204 scripts 5 dependents

kurthornik

mlbench:Machine Learning Benchmark Problems

A collection of artificial and real-world machine learning benchmark problems, including, e.g., several data sets from the UCI repository.

Maintained by Kurt Hornik. Last updated 3 months ago.

4.5 match 2 stars 8.93 score 5.0k scripts 55 dependents

mlampros

ClusterR:Gaussian Mixture Models, K-Means, Mini-Batch-Kmeans, K-Medoids and Affinity Propagation Clustering

Gaussian mixture models, k-means, mini-batch-kmeans, k-medoids and affinity propagation clustering with the option to plot, validate, predict (new data) and estimate the optimal number of clusters. The package takes advantage of 'RcppArmadillo' to speed up the computationally intensive parts of the functions. For more information, see (i) "Clustering in an Object-Oriented Environment" by Anja Struyf, Mia Hubert, Peter Rousseeuw (1997), Journal of Statistical Software, <doi:10.18637/jss.v001.i04>; (ii) "Web-scale k-means clustering" by D. Sculley (2010), ACM Digital Library, <doi:10.1145/1772690.1772862>; (iii) "Armadillo: a template-based C++ library for linear algebra" by Sanderson et al (2016), The Journal of Open Source Software, <doi:10.21105/joss.00026>; (iv) "Clustering by Passing Messages Between Data Points" by Brendan J. Frey and Delbert Dueck, Science 16 Feb 2007: Vol. 315, Issue 5814, pp. 972-976, <doi:10.1126/science.1136800>.

Maintained by Lampros Mouselimis. Last updated 9 months ago.

affinity-propagation cpp11 gmm kmeans kmedoids-clustering mini-batch-kmeans rcpparmadillo openblas cpp openmp

3.6 match 84 stars 11.08 score 640 scripts 24 dependents

pascoalf

ulrb:Unsupervised Learning Based Definition of Microbial Rare Biosphere

A tool to define the rare biosphere. 'ulrb' solves the problem of the definition of rarity by replacing arbitrary thresholds with an unsupervised machine learning algorithm (partitioning around medoids, or k-medoids). This algorithm works for any type of microbiome data, provided there is a species abundance table. For validation of this method on different species abundance tables, see Pascoal et al., 2024 (in peer review). This method also works for non-microbiome data.

Maintained by Francisco Pascoal. Last updated 20 days ago.

6.7 match 3 stars 5.68 score 9 scripts

bioc

SC3:Single-Cell Consensus Clustering

A tool for unsupervised clustering and analysis of single cell RNA-Seq data.

Maintained by Vladimir Kiselev. Last updated 5 months ago.

immunooncology singlecell software classification clustering dimensionreduction supportvectormachine rnaseq visualization transcriptomics datarepresentation gui differentialexpression transcription bioconductor-package human-cell-atlas single-cell-rna-seq openblas cpp

3.4 match 122 stars 10.09 score 374 scripts 1 dependents

bioc

bluster:Clustering Algorithms for Bioconductor

Wraps common clustering algorithms in an easily extended S4 framework. Backends are implemented for hierarchical, k-means and graph-based clustering. Several utilities are also provided to compare and evaluate clustering results.

Maintained by Aaron Lun. Last updated 5 months ago.

immunooncology software geneexpression transcriptomics singlecell clustering cpp

3.3 match 9.43 score 636 scripts 51 dependents

blasbenito

distantia:Advanced Toolset for Efficient Time Series Dissimilarity Analysis

Fast C++ implementation of Dynamic Time Warping for time series dissimilarity analysis, with applications in environmental monitoring and sensor data analysis, climate science, signal processing and pattern recognition, and financial data analysis. Built upon the ideas presented in Benito and Birks (2020) <doi:10.1111/ecog.04895>, provides tools for analyzing time series of varying lengths and structures, including irregular multivariate time series. Key features include individual variable contribution analysis, restricted permutation tests for statistical significance, and imputation of missing data via GAMs. Additionally, the package provides an ample set of tools to prepare and manage time series data.

Maintained by Blas M. Benito. Last updated 26 days ago.

5.3 match 23 stars 5.76 score 11 scripts

momx

Momocs:Morphometrics using R

The goal of 'Momocs' is to provide a complete, convenient, reproducible and open-source toolkit for 2D morphometrics. It includes most common 2D morphometrics approaches on outlines, open outlines, configurations of landmarks, traditional morphometrics, and facilities for data preparation, manipulation and visualization with a consistent grammar throughout. It allows reproducible, complex morphometrics analyses, and other morphometrics approaches should be easy to plug in, or develop from, on top of this canvas.

Maintained by Vincent Bonhomme. Last updated 1 year ago.

morphometrics

4.0 match 51 stars 7.42 score 346 scripts

joemsong

CircularSilhouette:Fast Silhouette on Circular or Linear Data Clusters

Calculating silhouette information for clusters on circular or linear data using fast algorithms. These algorithms run in linear time on sorted data, in contrast to quadratic time by the definition of silhouette. When used together with the fast and optimal circular clustering method FOCC (Debnath & Song 2021) <doi:10.1109/TCBB.2021.3077573> implemented in R package 'OptCirClust', circular silhouette can be maximized to find the optimal number of circular clusters; it can also be used to estimate the period of noisy periodical data.

Maintained by Joe Song. Last updated 3 years ago.

cpp

12.0 match 2.48 score 3 scripts 1 dependents
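
The 'CircularSilhouette' entry above contrasts its linear-time algorithms with the quadratic cost of computing silhouette directly from its definition. As a rough illustration of that baseline only, here is a minimal Python sketch of the textbook definition for one-dimensional (linear) data. The function name `silhouette_widths` is hypothetical and is not part of any package listed here; the sketch does not reproduce the package's fast algorithm or handle circular distances.

```python
from collections import defaultdict


def silhouette_widths(points, labels):
    """Silhouette width of each 1-D point, computed straight from the definition.

    a(i): mean distance from point i to the other members of its own cluster.
    b(i): smallest mean distance from point i to the members of any other cluster.
    s(i) = (b(i) - a(i)) / max(a(i), b(i)); by convention s(i) = 0 for singletons.
    The nested sums make this quadratic in the number of points.
    """
    clusters = defaultdict(list)
    for idx, lab in enumerate(labels):
        clusters[lab].append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    widths = []
    for i, (x, lab) in enumerate(zip(points, labels)):
        own = clusters[lab]
        if len(own) == 1:
            widths.append(0.0)  # singleton cluster
            continue
        a = sum(abs(x - points[j]) for j in own if j != i) / (len(own) - 1)
        b = min(
            sum(abs(x - points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        widths.append((b - a) / denom if denom > 0 else 0.0)
    return widths


if __name__ == "__main__":
    # Two well-separated clusters on a line.
    print(silhouette_widths([0.0, 1.0, 10.0, 11.0], [0, 0, 1, 1]))
```

Averaging the returned widths gives the overall silhouette score that is typically maximized when choosing the number of clusters.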

cran

flexclust:Flexible Cluster Algorithms

The main function kcca implements a general framework for k-centroids cluster analysis supporting arbitrary distance measures and centroid computation. Further cluster methods include hard competitive learning, neural gas, and QT clustering. There are numerous visualization methods for cluster results (neighborhood graphs, convex cluster hulls, barcharts of centroids, ...), and bootstrap methods for the analysis of cluster stability.

Maintained by Bettina Grün. Last updated 17 days ago.

5.0 match 3 stars 5.81 score 52 dependents

talgalili

dendextend:Extending 'dendrogram' Functionality in R

Offers a set of functions for extending 'dendrogram' objects in R, letting you visualize and compare trees of 'hierarchical clusterings'. You can (1) Adjust a tree's graphical parameters - the color, size, type, etc of its branches, nodes and labels. (2) Visually and statistically compare different 'dendrograms' to one another.

Maintained by Tal Galili. Last updated 2 months ago.

1.7 match 154 stars 17.02 score 6.0k scripts 164 dependents

kassambara

factoextra:Extract and Visualize the Results of Multivariate Data Analyses

Provides some easy-to-use functions to extract and visualize the output of multivariate data analyses, including 'PCA' (Principal Component Analysis), 'CA' (Correspondence Analysis), 'MCA' (Multiple Correspondence Analysis), 'FAMD' (Factor Analysis of Mixed Data), 'MFA' (Multiple Factor Analysis) and 'HMFA' (Hierarchical Multiple Factor Analysis) functions from different R packages. It also contains functions for simplifying some clustering analysis steps and provides elegant 'ggplot2'-based data visualization.

Maintained by Alboukadel Kassambara. Last updated 5 years ago.

1.9 match 363 stars 14.13 score 15k scripts 52 dependents

trenchproject

TrenchR:Tools for Microclimate and Biophysical Ecology

Tools for translating environmental change into organismal response. Microclimate models to vertically scale weather station data to organismal heights. The biophysical modeling tools include both general models for heat flows and specific models to predict body temperatures for a variety of ectothermic taxa. Additional functions model and temporally partition air and soil temperatures and solar radiation. Utility functions estimate the organismal and environmental parameters needed for biophysical ecology. 'TrenchR' focuses on relatively simple and modular functions so users can create transparent and flexible biophysical models. Many functions are derived from Gates (1980) <doi:10.1007/978-1-4612-6024-0> and Campbell and Norman (1988) <isbn:9780387949376>.

Maintained by Lauren Buckley. Last updated 1 year ago.

3.8 match 13 stars 6.89 score 43 scripts

coatless-rpkg

msos:Data Sets and Functions Used in Multivariate Statistics: Old School by John Marden

Multivariate Analysis methods and data sets used in John Marden's book Multivariate Statistics: Old School (2015) <ISBN:978-1456538835>. This also serves as a companion package for the STAT 571: Multivariate Analysis course offered by the Department of Statistics at the University of Illinois at Urbana-Champaign ('UIUC').

Maintained by James Balamuta. Last updated 1 year ago.

multivariate statistics

6.1 match 3 stars 4.16 score 32 scripts 1 dependents

bioc

MBECS:Evaluation and correction of batch effects in microbiome data-sets

The Microbiome Batch Effect Correction Suite (MBECS) provides a set of functions to evaluate and mitigate unwanted noise due to processing in batches. To that end it incorporates a host of batch correcting algorithms (BECA) from various packages. In addition it offers a correction and reporting pipeline that provides a preliminary look at the characteristics of a data-set before and after correcting for batch effects.

Maintained by Michael Olbrich. Last updated 5 months ago.

batcheffect microbiome reportwriting visualization normalization qualitycontrol

5.3 match 4 stars 4.60 score 4 scripts

scmethods

scregclust:Reconstructing the Regulatory Programs of Target Genes in scRNA-Seq Data

Implementation of the scregclust algorithm described in Larsson, Held, et al. (2024) <doi:10.1038/s41467-024-53954-3> which reconstructs regulatory programs of target genes in scRNA-seq data. Target genes are clustered into modules and each module is associated with a linear model describing the regulatory program.

Maintained by Felix Held. Last updated 2 months ago.

clustering regulatory-programs scrna-seq-analysis cpp openmp

3.6 match 9 stars 6.45 score 21 scripts

bioc

hopach:Hierarchical Ordered Partitioning and Collapsing Hybrid (HOPACH)

The HOPACH clustering algorithm builds a hierarchical tree of clusters by recursively partitioning a data set, while ordering and possibly collapsing clusters at each level. The algorithm uses the Mean/Median Split Silhouette (MSS) criterion to identify the level of the tree with maximally homogeneous clusters. It also runs the tree down to produce a final ordered list of the elements. The non-parametric bootstrap allows one to estimate the probability that each element belongs to each cluster (fuzzy clustering).

Maintained by Katherine S. Pollard. Last updated 5 months ago.

clustering

3.7 match 6.05 score 54 scripts 5 dependents

bioc

timeOmics:Time-Course Multi-Omics data integration

timeOmics is a generic data-driven framework to integrate multi-Omics longitudinal data measured on the same biological samples and select key temporal features with strong associations within the same sample group. The main steps of timeOmics are: 1. Platform and time-specific normalization and filtering steps; 2. Modelling each biological feature into one time expression profile; 3. Clustering features with the same expression profile over time; 4. Post-hoc validation step.

Maintained by Antoine Bodein. Last updated 5 months ago.

clustering featureextraction timecourse dimensionreduction software sequencing microarray metabolomics metagenomics proteomics classification regression immunooncology geneprediction multiplecomparison cluster integration multi-omics time-series

3.5 match 24 stars 5.98 score 10 scripts

go-ski

clustra:Clustering Longitudinal Trajectories

Clusters longitudinal trajectories over time (can be unequally spaced, unequal length time series and/or partially overlapping series) on a common time axis. Performs k-means clustering on a single continuous variable measured over time, where each mean is defined by a thin plate spline fit to all points in a cluster. Distance is MSE across trajectory points to cluster spline. Provides graphs of derived cluster splines, silhouette plots, and Adjusted Rand Index evaluations of the number of clusters. Scales well to large data with multicore parallelism available to speed computation.

Maintained by George Ostrouchov. Last updated 3 months ago.

4.7 match 4.48 score 6 scripts

bioc

ILoReg:ILoReg: a tool for high-resolution cell population identification from scRNA-Seq data

ILoReg is a tool for identification of cell populations from scRNA-seq data. In particular, ILoReg is useful for finding cell populations with subtle transcriptomic differences. The method utilizes a self-supervised learning method, called Iterative Clustering Projection (ICP), to find cluster probabilities, which are used in noise reduction prior to PCA and the subsequent hierarchical clustering and t-SNE steps. Additionally, functions for differential expression analysis to find gene markers for the populations and gene expression visualization are provided.

Maintained by Johannes Smolander. Last updated 5 months ago.

singlecell software clustering dimensionreduction rnaseq visualization transcriptomics datarepresentation differentialexpression transcription geneexpression

4.0 match 5 stars 4.88 score 2 scripts

mthrun

DataVisualizations:Visualizations of High-Dimensional Data

Gives access to data visualisation methods that are relevant from the data scientist's point of view. The flagship idea of 'DataVisualizations' is the mirrored density plot (MD-plot) for either classified or non-classified multivariate data published in Thrun, M.C. et al.: "Analyzing the Fine Structure of Distributions" (2020), PLoS ONE, <DOI:10.1371/journal.pone.0238835>. The MD-plot outperforms the box-and-whisker diagram (box plot), violin plot and bean plot and geom_violin plot of ggplot2. Furthermore, a collection of various visualization methods for univariate data is provided. In the case of exploratory data analysis, 'DataVisualizations' makes it possible to inspect the distribution of each feature of a dataset visually through a combination of four methods. One of these methods is the Pareto density estimation (PDE) of the probability density function (pdf). Additionally, visualizations of the distribution of distances using PDE, the scatter-density plot using PDE for two variables as well as the Shepard density plot and the Bland-Altman plot are presented here. Pertaining to classified high-dimensional data, a number of visualizations are described, such as the heat map and silhouette plot. A political map of the world or Germany can be visualized with the additional information defined by a classification of countries or regions. By extending the political map further, an uncomplicated function for a Choropleth map can be used which is useful for measurements across a geographic area. For categorical features, pie charts, slope charts and fan plots, improved by the ABC analysis, become usable. More detailed explanations are found in the book by Thrun, M.C.: "Projection-Based Clustering through Self-Organization and Swarm Intelligence" (2018) <DOI:10.1007/978-3-658-20540-9>.

Maintained by Michael Thrun. Last updated 2 months ago.

cpp

2.4 match 7 stars 7.72 score 118 scripts 7 dependents

yulab-smu

TDbook:Companion Package for the Book "Data Integration, Manipulation and Visualization of Phylogenetic Trees" by Guangchuang Yu (2022, ISBN:9781032233574, doi:10.1201/9781003279242)

The companion package that provides all the datasets used in the book "Data Integration, Manipulation and Visualization of Phylogenetic Trees" by Guangchuang Yu (2022, ISBN:9781032233574, doi:10.1201/9781003279242).

Maintained by Guangchuang Yu. Last updated 3 years ago.

3.8 match 13 stars 4.88 score 59 scripts

bioc

orthogene:Interspecies gene mapping

'orthogene' is an R package for easy mapping of orthologous genes across hundreds of species. It pulls up-to-date gene ortholog mappings across 700+ organisms. It also provides various utility functions to aggregate/expand common objects (e.g. data.frames, gene expression matrices, lists) using 1:1, many:1, 1:many or many:many gene mappings, both within- and between-species.

Maintained by Brian Schilder. Last updated 5 months ago.

genetics comparativegenomics preprocessing phylogenetics transcriptomics geneexpression animal-models bioconductor bioconductor-package bioinformatics biomedicine comparative-genomics evolutionary-biology genes genomics ontologies translational-research

2.3 match 42 stars 7.85 score 31 scripts 2 dependents

sysbiolab

PathwaySpace:Spatial Projection of Network Signals along Geodesic Paths

For a given graph containing vertices, edges, and a signal associated with the vertices, the 'PathwaySpace' package performs a convolution operation, which involves a weighted combination of neighboring vertices and their associated signals. The package then uses a decay function to project these signals, creating geodesic paths on a 2D-image space. 'PathwaySpace' could have various applications, such as visualizing and analyzing network data in a graphical format that highlights the relationships and signal strengths between vertices. It can be particularly useful for understanding the influence of signals through complex networks. By combining graph theory, signal processing, and visualization, the 'PathwaySpace' package provides a novel way of representing and analyzing graph data.

Maintained by Mauro Castro. Last updated 2 months ago.

bioinformatics biological-networks graph

3.3 match 2 stars 4.85 score 5 scripts

mlr-org

mlr3cluster:Cluster Extension for 'mlr3'

Extends the 'mlr3' package with cluster analysis.

Maintained by Maximilian Mücke. Last updated 27 days ago.

cluster-analysis clustering mlr3

1.9 match 23 stars 8.21 score 50 scripts 2 dependents

mottastefano

SOMMD:Self Organising Maps for the Analysis of Molecular Dynamics Data

Processes data from Molecular Dynamics simulations using Self Organising Maps. Features include the ability to read different input formats. Trajectories can be analysed to identify groups of important frames. Output visualisation can be generated for maps and pathways. Methodological details can be found in Motta S et al (2022) <doi:10.1021/acs.jctc.1c01163>. I/O functions for xtc format files were implemented using the 'xdrfile' library available under open source license. The relevant information can be found in inst/COPYRIGHT.

Maintained by Stefano Motta. Last updated 6 months ago.

9.0 match 1.70 score 4 scripts

theropod1

gdi:Volumetric Analysis using Graphic Double Integration

Tools implementing an automated version of the graphic double integration technique (GDI) for volume estimation, and some other related utilities for paleontological image-analysis. GDI was first employed by Jerison (1973) <ISBN:9780323141086> and Hurlburt (1999) <doi:10.1080/02724634.1999.10011145> and is primarily used for volume or mass estimation of (extinct) animals. The package 'gdi' aims to make this technique as convenient and versatile as possible. The core functions of 'gdi' provide utilities for automatically measuring diameters from digital silhouettes provided as image files and calculating volume via graphic double integration with simple elliptical, superelliptical (following Motani 2001 <doi:10.1666/0094-8373(2001)027%3C0735:EBMFST%3E2.0.CO;2>) or complex cross-sectional models. Additionally, the package provides functions for estimating the center of mass position (COM), the moment of inertia (I) for 3D shapes and the second moment of area (Ix, Iy, Iz) of 2D cross-sections, as well as for visualization of results.

Maintained by Darius Nau. Last updated 11 months ago.

5.6 match 2.68 score 16 scripts
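
The 'gdi' entry above describes measuring diameters from silhouettes and integrating elliptical cross-sections into a volume. A minimal Python sketch of that idea, under simplifying assumptions, might look as follows. The function name `gdi_volume_elliptical` and its parameters are hypothetical and are not the 'gdi' package's API; the sketch assumes the two diameter lists were already measured at matching positions along the body axis, in two orthogonal views, with a constant slice thickness.

```python
import math


def gdi_volume_elliptical(widths, depths, slice_thickness):
    """Approximate a volume by summing elliptical slices.

    widths, depths: diameters of each slice measured in two orthogonal
        silhouettes (e.g. dorsal and lateral views), in the same length unit.
    slice_thickness: distance between consecutive measurements along the axis.
    Each slice is treated as an elliptical cylinder with cross-sectional
    area pi/4 * width * depth.
    """
    if len(widths) != len(depths):
        raise ValueError("widths and depths must have the same length")
    return sum(
        math.pi / 4.0 * w * d * slice_thickness
        for w, d in zip(widths, depths)
    )


if __name__ == "__main__":
    # Sanity check: a cylinder of diameter 2 and length 10 has volume 10 * pi.
    print(gdi_volume_elliptical([2.0] * 10, [2.0] * 10, 1.0))
```

Multiplying the resulting volume by an assumed body density would turn it into a mass estimate, which is the use case the entry mentions.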

a-dudek-ue

clusterSim:Searching for Optimal Clustering Procedure for a Data Set

Distance measures (GDM1, GDM2, Sokal-Michener, Bray-Curtis, for symbolic interval-valued data), cluster quality indices (Calinski-Harabasz, Baker-Hubert, Hubert-Levine, Silhouette, Krzanowski-Lai, Hartigan, Gap, Davies-Bouldin), data normalization formulas (metric data, interval-valued symbolic data), data generation (typical and non-typical data), HINoV method, replication analysis, linear ordering methods, spectral clustering, agreement indices between two partitions, plot functions (for categorical and symbolic interval-valued data). (MILLIGAN, G.W., COOPER, M.C. (1985) <doi:10.1007/BF02294245>, HUBERT, L., ARABIE, P. (1985) <doi:10.1007/BF01908075>, RAND, W.M. (1971) <doi:10.1080/01621459.1971.10482356>, JAJUGA, K., WALESIAK, M. (2000) <doi:10.1007/978-3-642-57280-7_11>, MILLIGAN, G.W., COOPER, M.C. (1988) <doi:10.1007/BF01897163>, JAJUGA, K., WALESIAK, M., BAK, A. (2003) <doi:10.1007/978-3-642-55721-7_12>, DAVIES, D.L., BOULDIN, D.W. (1979) <doi:10.1109/TPAMI.1979.4766909>, CALINSKI, T., HARABASZ, J. (1974) <doi:10.1080/03610927408827101>, HUBERT, L. (1974) <doi:10.1080/01621459.1974.10480191>, TIBSHIRANI, R., WALTHER, G., HASTIE, T. (2001) <doi:10.1111/1467-9868.00293>, BRECKENRIDGE, J.N. (2000) <doi:10.1207/S15327906MBR3502_5>, WALESIAK, M., DUDEK, A. (2008) <doi:10.1007/978-3-540-78246-9_11>).

Maintained by Andrzej Dudek. Last updated 6 months ago.

cpp

2.2 match 2 stars 6.35 score 512 scripts 9 dependents

gfellerlab

SuperCell:Simplification of scRNA-seq data by merging together similar cells

Aggregates large single-cell data into metacell dataset by merging together gene expression of very similar cells.

Maintained by The package maintainer. Last updated 8 months ago.

software coarse-graining scrna-seq-analysis scrna-seq-data

1.7 match 72 stars 8.08 score 93 scripts

dami82

mutSignatures:Decipher Mutational Signatures from Somatic Mutational Catalogs

Cancer cells accumulate DNA mutations as a result of DNA damage and DNA repair processes. This computational framework is aimed at deciphering DNA mutational signatures operating in cancer. The framework includes modules that support raw data import and processing, mutational signature extraction, and results interpretation and visualization. The framework accepts widely used file formats storing information about DNA variants, such as Variant Call Format files. The framework performs Non-Negative Matrix Factorization to extract mutational signatures explaining the observed set of DNA mutations. Bootstrapping is performed as part of the analysis. The framework supports parallelization and is optimized for use on multi-core systems. The software was described by Fantini D et al (2020) <doi:10.1038/s41598-020-75062-0> and is based on a custom R-based implementation of the original MATLAB WTSI framework by Alexandrov LB et al (2013) <doi:10.1016/j.celrep.2012.12.008>.

Maintained by Damiano Fantini. Last updated 2 years ago.

2.3 match 14 stars 5.83 score 48 scripts

dwbapst

paleotree:Paleontological and Phylogenetic Analyses of Evolution

Provides tools for transforming, a posteriori time-scaling, and modifying phylogenies containing extinct (i.e. fossil) lineages. In particular, most users are interested in the functions timePaleoPhy, bin_timePaleoPhy, cal3TimePaleoPhy and bin_cal3TimePaleoPhy, which date cladograms of fossil taxa using stratigraphic data. This package also contains a large number of likelihood functions for estimating sampling and diversification rates from different types of data available from the fossil record (e.g. range data, occurrence data, etc). paleotree users can also simulate diversification and sampling in the fossil record using the function simFossilRecord, which is a detailed simulator for branching birth-death-sampling processes composed of discrete taxonomic units arranged in ancestor-descendant relationships. Users can use simFossilRecord to simulate diversification in incompletely sampled fossil records, under various models of morphological differentiation (i.e. the various patterns by which morphotaxa originate from one another), and with time-dependent, longevity-dependent and/or diversity-dependent rates of diversification, extinction and sampling. Additional functions allow users to translate simulated ancestor-descendant data from simFossilRecord into standard time-scaled phylogenies or unscaled cladograms that reflect the relationships among taxon units.

Maintained by David W. Bapst. Last updated 8 months ago.

1.7 match 21 stars 7.53 score 216 scripts 2 dependents

jeremygelb

geocmeans:Implementing Methods for Spatial Fuzzy Unsupervised Classification

Provides functions to apply spatial fuzzy unsupervised classification, visualize and interpret results. This method is well suited when the user wants to analyze data with a fuzzy clustering algorithm and to account for the spatial dimension of the dataset. In addition, indexes for estimating the spatial consistency and classification quality are proposed. The methods were originally proposed in the field of brain imagery (see Cai et al. 2007 <doi:10.1016/j.patcog.2006.07.011> and Zhao et al. 2013 <doi:10.1016/j.dsp.2012.09.016>) and recently applied in geography (see Gelb and Apparicio <doi:10.4000/cybergeo.36414>).

Maintained by Jeremy Gelb. Last updated 4 months ago.

clustering cmeans fuzzy-classification-algorithms spatial-analysis spatial-fuzzy-cmeans unsupervised-learning cpp openmp

2.0 match 27 stars 6.08 score 90 scripts

weksi-budiaji

kmed:Distance-Based k-Medoids

Algorithms of distance-based k-medoids clustering: simple and fast k-medoids, ranked k-medoids, and increasing number of clusters in k-medoids. Calculate distances for mixed variable data such as Gower, Podani, Wishart, Huang, Harikumar-PV, and Ahmad-Dey. Cluster validation applies internal and relative criteria. The internal criteria include silhouette index and shadow values. The relative criterion applies a bootstrap procedure producing a heatmap with a flexible reordering matrix algorithm such as complete, ward, or average linkages. The cluster result can be plotted in a marked barplot or pca biplot.

Maintained by Weksi Budiaji. Last updated 3 years ago.

3.8 match 3.15 score 141 scripts

joemsong

GridOnClusters:Cluster-Preserving Multivariate Joint Grid Discretization

Discretize multivariate continuous data using a grid that captures the joint distribution via preserving clusters in the original data (Wang et al. 2020) <doi:10.1145/3388440.3412415>. Joint grid discretization is applicable as a data transformation step to prepare data for model-free inference of association, function, or causality.

Maintained by Joe Song. Last updated 10 months ago.

cpp

4.3 match 2.70 score 3 scripts

cran

Mercator:Clustering and Visualizing Distance Matrices

Defines the classes used to explore, cluster and visualize distance matrices, especially those arising from binary data. See Abrams and colleagues, 2021, <doi:10.1093/bioinformatics/btab037>.

Maintained by Kevin R. Coombes. Last updated 5 months ago.

clustering

2.7 match 4.33 score 12 scripts 1 dependents

bioc

scone:Single Cell Overview of Normalized Expression data

SCONE is an R package for comparing and ranking the performance of different normalization schemes for single-cell RNA-seq and other high-throughput analyses.

Maintained by Davide Risso. Last updated 26 days ago.

immunooncology normalization preprocessing qualitycontrol geneexpression rnaseq software transcriptomics sequencing singlecell coverage

1.3 match 53 stars 9.12 score 104 scripts

aguslespi

leafSTAR:Silhouette to Area Ratio of Tilted Surfaces

Implementation of trigonometric functions to calculate the exposure of flat, tilted surfaces, such as leaves and slopes, to direct solar radiation. It implements the equations in A.G. Escribano-Rocafort, A. Ventre-Lespiaucq, C. Granado-Yela, et al. (2014) <doi:10.1111/2041-210X.12141> in a few user-friendly R functions. All functions handle data obtained with 'Ahmes' 1.0 for Android, as well as more traditional data sources (compass, protractor, inclinometer). The main function (star()) calculates the potential exposure of flat, tilted surfaces to direct solar radiation (silhouette to area ratio, STAR). It is equivalent to the ratio of the leaf projected area to total leaf area, but instead of using area data it uses spatial position angles, such as pitch, roll and course, and information on the geographical coordinates, hour, and date. The package includes additional functions to recalculate STAR with custom settings of location and time, to calculate the tilt angle of a surface, and the minimum angle between two non-orthogonal planes.

Maintained by Agustina Ventre Lespiaucq. Last updated 7 years ago.

10.6 match 1.00 score 9 scripts

bioc

pipeComp:pipeComp pipeline benchmarking framework

A simple framework to facilitate the comparison of pipelines involving various steps and parameters. The `pipelineDefinition` class represents pipelines as, minimally, a set of functions consecutively executed on the output of the previous one, and optionally accompanied by step-wise evaluation and aggregation functions. Given such an object, a set of alternative parameters/methods, and benchmark datasets, the `runPipeline` function then proceeds through all combinations of arguments, avoiding recomputing the same step twice and compiling evaluations on the fly to avoid storing potentially large intermediate data.

Maintained by Pierre-Luc Germain. Last updated 5 months ago.

geneexpression transcriptomics clustering datarepresentation benchmark bioconductor pipeline-benchmarking pipelines single-cell-rna-seq

1.5 match 41 stars 7.02 score 43 scripts
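
The idea of walking through every parameter combination without recomputing a shared step can be sketched as follows. This is a minimal Python illustration under stated assumptions, not pipeComp's API: `run_all`, `steps`, and `grid` are hypothetical names, each step is assumed to be a pure function of the previous output and one hashable parameter, and results are cached by the prefix of parameter values that produced them.

```python
from itertools import product

def run_all(steps, grid, data):
    """Run every parameter combination, caching results by step prefix.

    steps: list of (name, function) pairs; function(previous_output, param).
    grid:  dict mapping step name -> list of alternative parameter values.
    Returns (results, calls): results maps each combination to its final
    output, calls counts how often each step function actually ran.
    """
    cache = {}                                  # parameter prefix -> output
    calls = {name: 0 for name, _ in steps}
    results = {}
    names = [name for name, _ in steps]
    for combo in product(*(grid[n] for n in names)):
        out = data
        for i, (name, fn) in enumerate(steps):
            prefix = combo[: i + 1]
            if prefix not in cache:             # compute each prefix once
                cache[prefix] = fn(out, combo[i])
                calls[name] += 1
            out = cache[prefix]
        results[combo] = out
    return results, calls

# Hypothetical two-step pipeline: 2 x 3 = 6 combinations.
steps = [
    ("scale", lambda x, k: [v * k for v in x]),
    ("shift", lambda x, b: [v + b for v in x]),
]
grid = {"scale": [1, 2], "shift": [0, 10, 100]}
results, calls = run_all(steps, grid, [1, 2, 3])
```

With six combinations, the first step runs only twice because its output is shared across the three alternatives of the second step. Unlike the behaviour described above, this sketch keeps every intermediate result in memory for simplicity.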

keefe-murphy

MEDseq:Mixtures of Exponential-Distance Models with Covariates

Implements a model-based clustering method for categorical life-course sequences relying on mixtures of exponential-distance models introduced by Murphy et al. (2021) <doi:10.1111/rssa.12712>. A range of flexible precision parameter settings corresponding to weighted generalisations of the Hamming distance metric are considered, along with the potential inclusion of a noise component. Gating covariates can be supplied in order to relate sequences to baseline characteristics and sampling weights are also accommodated. The models are fitted using the EM algorithm and tools for visualising the results are also provided.

Maintained by Keefe Murphy. Last updated 7 days ago.

distance-based-clustering mixture-of-experts model-based-clustering sequence-analysis

1.9 match 5 stars 5.49 score 25 scripts

cran

pdfCluster:Cluster Analysis via Nonparametric Density Estimation

Cluster analysis via nonparametric density estimation is performed. Operationally, the kernel method is used throughout to estimate the density. Diagnostic methods for evaluating the quality of the clustering are available. The package also includes a routine to estimate the probability density function obtained by the kernel method, given a set of data with arbitrary dimensions.

Maintained by Menardi Giovanna. Last updated 2 years ago.

1.8 match 5 stars 5.66 score 196 scripts 12 dependents

cran

fclust:Fuzzy Clustering

Algorithms for fuzzy clustering, cluster validity indices and plots for cluster validity and visualizing fuzzy clustering results.

Maintained by Paolo Giordani. Last updated 2 years ago.

openblas cpp

4.3 match 1 stars 2.38 score 2 dependents

matthias-studer

WeightedCluster:Clustering of Weighted Data

Clusters state sequences and weighted data. It provides an optimized weighted PAM algorithm as well as functions for aggregating replicated cases, computing cluster quality measures for a range of clustering solutions and plotting (fuzzy) clusters of state sequences. Parametric bootstrap methods to validate typologies of sequences are also provided. Finally, it provides fuzzy and crisp CLARA algorithms to cluster large databases with sequence analysis.

Maintained by Matthias Studer. Last updated 3 months ago.

cpp

1.8 match 5.55 score 106 scripts 4 dependents

tiago-simoes

EvoPhylo:Pre- And Postprocessing of Morphological Data from Relaxed Clock Bayesian Phylogenetics

Performs automated morphological character partitioning for phylogenetic analyses and analyzes macroevolutionary parameter outputs from clock (time-calibrated) Bayesian inference analyses, following concepts introduced by Simões and Pierce (2021) <doi:10.1038/s41559-021-01532-x>.

Maintained by Tiago Simoes. Last updated 2 years ago.

1.7 match 4 stars 5.66 score 19 scripts

kisungyou

T4cluster:Tools for Cluster Analysis

Cluster analysis is one of the most fundamental problems in data science. We provide a variety of algorithms, from clustering to learning on the space of partitions. See Hennig, Meila, and Rocci (2016, ISBN:9781466551886) for a general exposition of cluster analysis.

Maintained by Kisung You. Last updated 3 years ago.

openblas cpp openmp

2.3 match 6 stars 4.26 score 9 scripts 2 dependents

dgrun

RaceID:Identification of Cell Types, Inference of Lineage Trees, and Prediction of Noise Dynamics from Single-Cell RNA-Seq Data

Application of 'RaceID' allows inference of cell types and prediction of lineage trees by the 'StemID2' algorithm (Herman, J.S., Sagar, Grun D. (2018) <DOI:10.1038/nmeth.4662>). 'VarID2' is part of this package and allows quantification of biological gene expression noise at single-cell resolution (Rosales-Alvarez, R.E., Rettkowski, J., Herman, J.S., Dumbovic, G., Cabezas-Wallscheid, N., Grun, D. (2023) <DOI:10.1186/s13059-023-02974-1>).

Maintained by Dominic Grün. Last updated 4 months ago.

cpp

2.0 match 4.74 score 110 scripts

frbcesab

rutils:A Collection of R Functions

A collection of R functions commonly used in FRB-CESAB projects.

Maintained by Nicolas Casajus. Last updated 2 months ago.

miscellaneous

2.0 match 2 stars 4.66 score 454 scripts

bioc

GenomicSuperSignature:Interpretation of RNA-seq experiments through robust, efficient comparison to public databases

This package provides a novel method for interpreting new transcriptomic datasets through near-instantaneous comparison to public archives without high-performance computing requirements. Through the pre-computed index, users can identify public resources associated with their dataset, such as gene sets, MeSH terms, and publications. Functions to identify interpretable annotations and intuitive visualization options are implemented in this package.

Maintained by Sehyun Oh. Last updated 5 months ago.

transcriptomics systemsbiology principalcomponent rnaseq sequencing pathways clustering bioconductor-package exploratory-data-analysis gsea mesh principal-component-analysis rna-sequencing-profiles transferlearning

1.3 match 16 stars 6.97 score 59 scripts

bioc

BERT:High Performance Data Integration for Large-Scale Analyses of Incomplete Omic Profiles Using Batch-Effect Reduction Trees (BERT)

Provides efficient batch-effect adjustment of data with missing values. BERT arranges all batch-effect corrections into a tree of pairwise computations and allows parallelization over sub-trees.

Maintained by Yannis Schumann. Last updated 2 months ago.

batcheffect preprocessing experimentaldesign qualitycontrol batch-effect bioconductor-package bioinformatics data-integration data-science

1.7 match 2 stars 5.40 score 18 scripts

bioc

SGCP:SGCP: A semi-supervised pipeline for gene clustering using self-training approach in gene co-expression networks

SGC is a semi-supervised pipeline for gene clustering in gene co-expression networks. SGC consists of multiple novel steps that enable the computation of highly enriched modules in an unsupervised manner. But unlike all existing frameworks, it further incorporates a novel step that leverages Gene Ontology information in a semi-supervised clustering method that further improves the quality of the computed modules.

Maintained by Niloofar AghaieAbiane. Last updated 5 months ago.

geneexpression genesetenrichment networkenrichment systemsbiology classification clustering dimensionreduction graphandnetwork neuralnetwork network mrnamicroarray rnaseq visualization bioinformatics genecoexpressionnetwork graphs networkclustering networks self-training semi-supervised-learning unsupervised-learning

1.8 match 2 stars 5.12 score 44 scripts

bioc

Polytect:An R package for digital data clustering

Polytect is an advanced computational tool designed for the analysis of multi-color digital PCR data. It provides automatic clustering and labeling of partitions into distinct groups based on clusters first identified by the flowPeaks algorithm. Polytect is particularly useful for researchers in molecular biology and bioinformatics, enabling them to gain deeper insights into their experimental results through precise partition classification and data visualization.

Maintained by Yao Chen. Last updated 3 months ago.

ddpcr clustering multichannel classification

1.9 match 4.74 score 4 scripts

mi2-warsaw

sejmRP:An Information About Deputies and Votings in Polish Diet from Seventh to Eighth Term of Office

Set of functions that access information about deputies and votings in Polish diet from webpage <http://www.sejm.gov.pl>. The package was developed as a result of an internship in MI2 Group - <http://mi2.mini.pw.edu.pl>, Faculty of Mathematics and Information Science, Warsaw University of Technology.

Maintained by Piotr Smuda. Last updated 8 years ago.

1.8 match 21 stars 5.04 score 35 scripts

bioc

SpectralTAD:SpectralTAD: Hierarchical TAD detection using spectral clustering

SpectralTAD is an R package designed to identify Topologically Associated Domains (TADs) from Hi-C contact matrices. It uses a modified version of spectral clustering that uses a sliding window to quickly detect TADs. The function works on a range of different formats of contact matrices and returns a bed file of TAD coordinates. The method does not require users to adjust any parameters to work and gives them control over the number of hierarchical levels to be returned.

Maintained by Mikhail Dozmorov. Last updated 5 months ago.

software hic sequencing featureextraction clustering

1.3 match 8 stars 6.53 score 17 scripts

acabassi

coca:Cluster-of-Clusters Analysis

Contains the R functions needed to perform Cluster-Of-Clusters Analysis (COCA) and Consensus Clustering (CC). For further details please see Cabassi and Kirk (2020) <doi:10.1093/bioinformatics/btaa593>.

Maintained by Alessandra Cabassi. Last updated 5 years ago.

cluster-analysis cluster-of-clusters clustering coca genomics integrative-clustering multi-omics

1.7 match 6 stars 5.03 score 12 scripts 1 dependents

jakobraymaekers

classmap:Visualizing Classification Results

Tools to visualize the results of a classification of cases. The graphical displays include stacked plots, silhouette plots, quasi residual plots, and class maps. Implements the techniques described and illustrated in Raymaekers, Rousseeuw and Hubert (2021), Class maps for visualizing classification results, Technometrics, appeared online. <doi:10.1080/00401706.2021.1927849> (open access) and Raymaekers and Rousseeuw (2021), Silhouettes and quasi residual plots for neural nets and tree-based classifiers, <arXiv:2106.08814>. Examples can be found in the vignettes: "Discriminant_analysis_examples", "K_nearest_neighbors_examples", "Support_vector_machine_examples", "Rpart_examples", "Random_forest_examples", and "Neural_net_examples".

Maintained by Jakob Raymaekers. Last updated 2 years ago.

2.6 match 3.08 score 20 scripts

clancylabuiuc

moRphomenses:Geometric Morphometric Tools to Align, Scale, and Compare "Shape" of Menstrual Cycle Hormones

Mitteroecker & Gunz (2009) <doi:10.1007/s11692-009-9055-x> describe how geometric morphometric methods allow researchers to quantify the size and shape of physical biological structures. We provide tools to extend geometric morphometric principles to the study of non-physical structures, hormone profiles, as outlined in Ehrlich et al. (2021) <doi:10.1002/ajpa.24514>. Easily transform daily measures into multivariate landmark-based data. Includes custom functions to apply multivariate methods for data exploration as well as hypothesis testing. Also includes a 'shiny' web app to streamline data exploration. Developed to study menstrual cycle hormones, but the functions have been generalized and should be applicable to any biomarker over any time period.

Maintained by Daniel Ehrlich. Last updated 2 months ago.

2.0 match 2 stars 4.04 score 4 scripts

cran

Kira:Machine Learning

Machine learning package containing several algorithms for supervised and unsupervised classification, a function that plots the Receiver Operating Characteristic (ROC) and Precision-Recall (PRC) curves, and a function that returns several metrics used for model evaluation. The latter can be used to rank results from other packages.

Maintained by Paulo Cesar Ossani. Last updated 6 months ago.

4.8 match 1.70 score

cran

dawai:Discriminant Analysis with Additional Information

In applications, some additional information is often available. This package, dawai (an acronym for Discriminant Analysis With Additional Information), performs linear and quadratic discriminant analysis with additional information expressed as inequality restrictions among the population means. It also computes several estimates of the true error rate.

Maintained by David Conde. Last updated 5 months ago.

4.0 match 2.00 score

ocbe-uio

DIscBIO:A User-Friendly Pipeline for Biomarker Discovery in Single-Cell Transcriptomics

An open, multi-algorithmic pipeline for easy, fast and efficient analysis of cellular sub-populations and the molecular signatures that characterize them. The pipeline consists of four successive steps: data pre-processing, cellular clustering with pseudo-temporal ordering, defining differentially expressed genes and biomarker identification. More details in Ghannoum et al. (2021) <doi:10.3390/ijms22031399>. This package implements extensions of the work published by Ghannoum et al. (2019) <doi:10.1101/700989>.

Maintained by Waldir Leoncio. Last updated 1 years ago.

biomarker-discovery jupyter-notebook scrna-seq single-cell-analysis transcriptomics openjdk

1.8 match 12 stars 4.38 score 5 scripts

biplab44

ctmva:Continuous-Time Multivariate Analysis

Implements a basis function or functional data analysis framework for several techniques of multivariate analysis in a continuous-time setting. Specifically, it introduces continuous-time analogues of several classical techniques of multivariate analysis, such as principal component analysis, canonical correlation analysis, Fisher linear discriminant analysis, K-means clustering, and so on. Details are in Biplab Paul, Philip T. Reiss and Erjia Cui (2023) "Continuous-time multivariate analysis" <doi:10.48550/arXiv.2307.09404>.

Maintained by Biplab Paul. Last updated 1 years ago.

7.9 match 1.00 score

jdmde

scellpam:Applying Partitioning Around Medoids to Single Cell Data with High Number of Cells

Applies the PAM (Partitioning Around Medoids) algorithm to samples from single-cell sequencing techniques with a high number of cells (as many as the computer memory allows). The package uses a binary format to store matrices (full, sparse or symmetric) in files written to disk. These files can contain any data type (not just double), which allows the matrices to be manipulated when memory is sufficient to load them as int or float but not as double. The PAM implementation runs in parallel, using several or all of the machine's cores when available. This package shares a large part of its code with the packages 'jmatrix' and 'parallelpam', but their functionality is included here, so there is no need to install them.

Maintained by Juan Domingo. Last updated 8 months ago.

cpp

2.8 match 2.78 score 9 scripts

wagner-s

MultIS:Reconstruction of Clones from Integration Site Readouts and Visualization

Tools necessary to reconstruct clonal affiliations from temporally and/or spatially separated measurements of viral integration sites. To this end, it uses correlations present in the relative readouts of the integration sites. Facilities for filtering the data and visualizing different steps in the pipeline are also provided with the package.

Maintained by Sebastian Wagner. Last updated 4 years ago.

3.5 match 2.00 score 1 scripts

promidat

discoveR:Exploratory Data Analysis System

Performs an exploratory data analysis through a 'shiny' interface. It includes basic methods such as the mean, median, mode, normality test, among others. It also includes clustering techniques such as Principal Components Analysis, Hierarchical Clustering and the K-Means Method.

Maintained by Oldemar Rodriguez. Last updated 2 years ago.

2.3 match 3 stars 3.03 score 18 scripts

nwiroonsri

UniversalCVI:Hard and Soft Cluster Validity Indices

Algorithms for checking the accuracy of a clustering result with known classes, computing cluster validity indices, and generating plots for comparing them. The package is compatible with K-means, fuzzy C means, EM clustering, and hierarchical clustering (single, average, and complete linkage). The details of the indices in this package can be found in: J. C. Bezdek, M. Moshtaghi, T. Runkler, C. Leckie (2016) <doi:10.1109/TFUZZ.2016.2540063>, T. Calinski, J. Harabasz (1974) <doi:10.1080/03610927408827101>, C. H. Chou, M. C. Su, E. Lai (2004) <doi:10.1007/s10044-004-0218-1>, D. L. Davies, D. W. Bouldin (1979) <doi:10.1109/TPAMI.1979.4766909>, J. C. Dunn (1973) <doi:10.1080/01969727308546046>, F. Haouas, Z. Ben Dhiaf, A. Hammouda, B. Solaiman (2017) <doi:10.1109/FUZZ-IEEE.2017.8015651>, M. Kim, R. S. Ramakrishna (2005) <doi:10.1016/j.patrec.2005.04.007>, S. H. Kwon (1998) <doi:10.1049/EL:19981523>, S. H. Kwon, J. Kim, S. H. Son (2021) <doi:10.1049/ell2.12249>, G. W. Milligan (1980) <doi:10.1007/BF02293907>, M. K. Pakhira, S. Bandyopadhyay, U. Maulik (2004) <doi:10.1016/j.patcog.2003.06.005>, M. Popescu, J. C. Bezdek, T. C. Havens, J. M. Keller (2013) <doi:10.1109/TSMCB.2012.2205679>, S. Saitta, B. Raphael, I. Smith (2007) <doi:10.1007/978-3-540-73499-4_14>, A. Starczewski (2017) <doi:10.1007/s10044-015-0525-8>, Y. Tang, F. Sun, Z. Sun (2005) <doi:10.1109/ACC.2005.1470111>, N. Wiroonsri (2024) <doi:10.1016/j.patcog.2023.109910>, N. Wiroonsri, O. Preedasawakul (2023) <doi:10.48550/arXiv.2308.14785>, C. H. Wu, C. S. Ouyang, L. W. Chen, L. W. Lu (2015) <doi:10.1109/TFUZZ.2014.2322495>, X. Xie, G. Beni (1991) <doi:10.1109/34.85677>, Rousseeuw (1987) <doi:10.1016/0377-0427(87)90125-7>, Kaufman and Rousseeuw (2009) <doi:10.1002/9780470316801>, and C. Alok (2010).

Maintained by Nathakhun Wiroonsri. Last updated 2 months ago.

2.3 match 2.97 score 31 scripts 1 dependents

uislambekov

TDAvec:Vector summaries of persistence diagrams

Provides tools for computing various vector summaries of persistence diagrams studied in Topological Data Analysis. For improved computational efficiency, all code for the vector summaries is written in 'C++' using the 'Rcpp' and 'RcppArmadillo' packages.

Maintained by Umar Islambekov. Last updated 8 days ago.

openblas cpp openmp

1.8 match 3.52 score 11 scripts

bioc

TMixClust:Time Series Clustering of Gene Expression with Gaussian Mixed-Effects Models and Smoothing Splines

Implementation of a clustering method for time series gene expression data based on mixed-effects models with Gaussian variables and non-parametric cubic splines estimation. The method can robustly account for the high levels of noise present in typical gene expression time series datasets.

Maintained by Monica Golumbeanu. Last updated 5 months ago.

software statisticalmethod clustering timecourse geneexpression

1.8 match 3.60 score 5 scripts

cran

drclust:Simultaneous Clustering and (or) Dimensionality Reduction

Methods for simultaneous clustering and dimensionality reduction, such as Double k-means, Reduced k-means, Factorial k-means, and Clustering with Disjoint PCA, as well as methods exclusively for dimensionality reduction: Disjoint PCA and Disjoint FA. The statistical methods implemented refer to the following articles: de Soete G., Carroll J. (1994) "K-means clustering in a low-dimensional Euclidean space" <doi:10.1007/978-3-642-51175-2_24>; Vichi M. (2001) "Double k-means Clustering for Simultaneous Classification of Objects and Variables" <doi:10.1007/978-3-642-59471-7_6>; Vichi M., Kiers H.A.L. (2001) "Factorial k-means analysis for two-way data" <doi:10.1016/S0167-9473(00)00064-5>; Vichi M., Saporta G. (2009) "Clustering and disjoint principal component analysis" <doi:10.1016/j.csda.2008.05.028>; Vichi M. (2017) "Disjoint factor analysis with cross-loadings" <doi:10.1007/s11634-016-0263-9>.

Maintained by Ionel Prunila. Last updated 11 months ago.

openblas cpp openmp

6.0 match 1.00 score

sugnet

ClusBoot:Bootstrap a Clustering Solution to Establish the Stability of the Clusters

Given a cluster allocation for n samples, together with either an $n \times p$ data matrix or an $n \times n$ distance matrix, a bootstrap procedure is performed. The proportion of bootstrap replicates in which a pair of samples falls in the same cluster indicates how tightly the samples in a particular cluster cluster together.

Maintained by Sugnet Lubbe. Last updated 8 months ago.

5.9 match 1.00 score 1 scripts
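
The pairwise proportion described above can be sketched in a few lines. This is a minimal Python illustration, not ClusBoot's interface: the function name `co_clustering_proportion` and the input layout (one dict of sample index to cluster label per bootstrap replicate) are hypothetical, and the choice to score a pair only over replicates in which both samples were drawn is an assumption of this sketch.

```python
from itertools import combinations

def co_clustering_proportion(replicates, n):
    """Proportion of bootstrap replicates in which each pair shares a cluster.

    replicates: list of dicts {sample_index: cluster_label}, one per
                replicate, holding only the samples drawn in that replicate.
    n:          total number of samples.
    Returns an n x n list of lists; entries are None where a pair never
    appeared together (including the diagonal).
    """
    together = [[0] * n for _ in range(n)]
    present = [[0] * n for _ in range(n)]
    for labels in replicates:
        for i, j in combinations(sorted(labels), 2):
            present[i][j] += 1
            present[j][i] += 1
            if labels[i] == labels[j]:
                together[i][j] += 1
                together[j][i] += 1
    return [
        [together[i][j] / present[i][j] if present[i][j] else None
         for j in range(n)]
        for i in range(n)
    ]

# Three hypothetical replicates over four samples (labels are arbitrary).
replicates = [
    {0: "a", 1: "a", 2: "b"},
    {0: "a", 1: "b", 3: "b"},
    {0: "x", 1: "x", 2: "x", 3: "y"},
]
prop = co_clustering_proportion(replicates, 4)
```

Values near 1 for pairs inside a cluster suggest a stable cluster; values near 0.5 suggest the pair is split about as often as it is joined.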

cran

aweSOM:Interactive Self-Organizing Maps

Self-organizing maps (also known as SOM, see Kohonen (2001) <doi:10.1007/978-3-642-56927-2>) are a method for dimensionality reduction and clustering of continuous data. This package introduces interactive (html) graphics for easier analysis of SOM results. It also features an interactive interface, for push-button training and visualization of SOM on numeric, categorical or mixed data, as well as tools to evaluate the quality of SOM.

Maintained by Julien Boelaert. Last updated 3 years ago.

1.9 match 3 stars 2.95 score 1 dependents

jdmde

parallelpam:Parallel Partitioning-Around-Medoids (PAM) for Big Sets of Data

Application of the Partitioning-Around-Medoids (PAM) clustering algorithm described in Schubert, E. and Rousseeuw, P.J.: "Fast and eager k-medoids clustering: O(k) runtime improvement of the PAM, CLARA, and CLARANS algorithms." Information Systems, vol. 101, p. 101804, (2021). <doi:10.1016/j.is.2021.101804>. It uses a binary format for storing and retrieving matrices developed for the 'jmatrix' package, but the functionality of 'jmatrix' is included here, so you do not need to install it. This package is also used by the package 'scellpam', so if you have installed 'scellpam', you do not need to install this package. PAM can be applied to sets of data whose dissimilarity matrix can be very big. It has been tested with up to 100,000 points. It does this with the help of the code developed for another package, 'jmatrix', which allows the matrix not to be loaded into 'R' memory (which would force it to be of double type) but to be read from disk, which allows using float (or even smaller data types). Moreover, the dissimilarity matrix is calculated in parallel if the computer has several cores, so it can open many threads. The initial part of the PAM algorithm can be done with the BUILD or LAB algorithms; the BUILD algorithm has been implemented in parallel. The optimization phase implements the FastPAM1 algorithm, also in parallel. Finally, calculation of the silhouette is available and also implemented in parallel.

Maintained by Juan Domingo. Last updated 8 months ago.

cpp

2.0 match 2.60 score 6 scripts

ifelipebj

PrometheeTools:PROMETHEE and GLNF for Ranking and Sorting Problems

PROMETHEE (Preference Ranking Organisation METHod for Enrichment of Evaluations) based method assesses alternatives to obtain partial and complete rankings. The package also provides the GLNF (Global Local Net Flow) sorting algorithm to classify alternatives into ordered categories, as well as an index function to measure the classification quality. Barrera, F., Segura, M., & Maroto, C. (2023) <doi:10.1111/itor.13288>. Brans, J.P.; De Smet, Y., (2016) <doi:10.1007/978-1-4939-3094-4_6>.

Maintained by Felipe Barrera. Last updated 2 years ago.

1.9 match 2.70 score 1 scripts

cran

RSKC:Robust Sparse K-Means

This RSKC package contains a function RSKC which runs the robust sparse K-means clustering algorithm.

Maintained by Yumi Kondo. Last updated 9 years ago.

2.3 match 1.97 score 31 scripts 1 dependents

cran

bios2mds:From Biological Sequences to Multidimensional Scaling

Utilities dedicated to the analysis of biological sequences by metric MultiDimensional Scaling with projection of supplementary data. It contains functions for reading multiple sequence alignment files, calculating distance matrices, performing metric multidimensional scaling and visualizing results.

Maintained by Marie Chabbert. Last updated 5 years ago.

2.3 match 1 stars 1.90 score

kisungyou

TDAkit:Toolkit for Topological Data Analysis

Topological data analysis studies structure and shape of the data using topological features. We provide a variety of algorithms to learn with persistent homology of the data based on functional summaries for clustering, hypothesis testing, visualization, and others. We refer to Wasserman (2018) <doi:10.1146/annurev-statistics-031017-100045> for a statistical perspective on the topic.

Maintained by Kisung You. Last updated 4 years ago.

openblas cpp openmp

1.8 match 2 stars 2.30 score 4 scripts

cran

ExcessMass:Excess Mass Calculation and Plots

Implementation of a function which calculates the empirical excess mass for given \eqn{\lambda} and given maximal number of modes (excessm()). Offering powerful plot features to visualize empirical excess mass (exmplot()). This includes the possibility of drawing several plots (with different maximal number of modes / cut off values) in a single graph.

Maintained by Marc-Daniel Mildenberger. Last updated 3 years ago.

3.8 match 1.00 score

vpihur

clValid:Validation of Clustering Results

Statistical and biological validation of clustering results. This package implements Dunn Index, Silhouette, Connectivity, Stability, BHI and BSI. Further information can be found in Brock, G et al. (2008) <doi:10.18637/jss.v025.i04>.

Maintained by Vasyl Pihur. Last updated 4 years ago.

0.5 match 5 stars 7.19 score 422 scripts 14 dependents

gagolews

genieclust:Fast and Robust Hierarchical Clustering with Noise Points Detection

A retake on the Genie algorithm (Gagolewski, 2021 <DOI:10.1016/j.softx.2021.100722>), which is a robust hierarchical clustering method (Gagolewski, Bartoszuk, Cena, 2016 <DOI:10.1016/j.ins.2016.05.003>). It is now faster and more memory efficient; determining the whole cluster hierarchy for datasets of 10M points in low dimensional Euclidean spaces or 100K points in high-dimensional ones takes only a minute or so. Allows clustering with respect to mutual reachability distances so that it can act as a noise point detector or a robustified version of 'HDBSCAN*' (that is able to detect a predefined number of clusters and hence does not depend on the somewhat fragile 'eps' parameter). The package also features an implementation of inequality indices (e.g., Gini and Bonferroni), external cluster validity measures (e.g., the normalised clustering accuracy, the adjusted Rand index, the Fowlkes-Mallows index, and normalised mutual information), and internal cluster validity indices (e.g., the Calinski-Harabasz, Davies-Bouldin, Ball-Hall, Silhouette, and generalised Dunn indices). See also the 'Python' version of 'genieclust' available on 'PyPI', which supports sparse data, more metrics, and even larger datasets.

Maintained by Marek Gagolewski. Last updated 5 days ago.

cluster-analysis clustering clustering-algorithm data-analysis data-mining data-science genie hdbscan hierarchical-clustering hierarchical-clustering-algorithm machine-learning machine-learning-algorithms mlpack nmslib python python3 sparse cpp openmp

0.5 match 61 stars 7.29 score 13 scripts 5 dependents

cran

IntNMF:Integrative Clustering of Multiple Genomic Dataset

Carries out integrative clustering analysis of multiple types of genomic datasets using integrative non-negative matrix factorization.

Maintained by Prabhakar Chalise. Last updated 2 months ago.

1.9 match 2 stars 1.79 score 31 scripts

cran

FPDclustering:PD-Clustering and Related Methods

Probabilistic distance clustering (PD-clustering) is an iterative, distribution-free, probabilistic clustering method. PD-clustering assigns units to a cluster according to their probability of membership under the constraint that the product of the probability and the distance of each point to any cluster center is a constant. PD-clustering is a flexible method that can be used with elliptical clusters, outliers, or noisy data. PDQ is an extension of the algorithm for clusters of different sizes. GPDC and TPDC use a dissimilarity measure based on densities. Factor PD-clustering (FPDC) is a factor clustering method that involves a linear transformation of variables and a cluster optimizing the PD-clustering criterion. It works on high-dimensional data sets.

Maintained by Cristina Tortora. Last updated 11 days ago.

2.0 match 1.60 score
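
The constraint stated above (the product of membership probability and distance is the same for every cluster center) pins the probabilities down once they are also required to sum to one: each probability is proportional to the inverse distance. A minimal Python sketch of that single step follows; `pd_membership` is a hypothetical helper written for illustration, not a function of FPDclustering, and it assumes strictly positive distances.

```python
def pd_membership(distances):
    """Membership probabilities p_k with p_k * d_k constant and sum(p_k) == 1.

    distances: strictly positive distances from one point to each center.
    Solving p_k * d_k = c with sum(p_k) = 1 gives p_k = (1/d_k) / sum_j (1/d_j).
    """
    if any(d <= 0 for d in distances):
        raise ValueError("distances must be strictly positive")
    inverse = [1.0 / d for d in distances]
    total = sum(inverse)
    return [w / total for w in inverse]

d = [1.0, 3.0]
p = pd_membership(d)                      # closer center gets the larger share
products = [pk * dk for pk, dk in zip(p, d)]   # equal across centers
```

The sketch covers only the probability update for fixed centers; the iterative re-estimation of the centers is not shown.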

abeyrann

rYWAASB:Simultaneous Selection by Trait and WAASB Index

This tool proposes a new ranking algorithm that utilizes a "Y*WAASB" biplot generated by the 'metan' package. The aim of the current package is to effectively distinguish the top-ranked genotypes in MET (Multi-Environmental Trials). For a detailed explanation of the process of obtaining "WAASB", "WAASBY" indices, and a "Y*WAASB" biplot, refer to the manual included in this package as well as the study by Olivoto & Lúcio (2020) <doi:10.1111/2041-210X.13384>. In this context, "WAASB" refers to the "Weighted Average of Absolute Scores" provided by Olivoto et al. (2019) <doi:10.2134/agronj2019.03.0220>, which quantifies the stability of genotypes across different environments using linear mixed-effect models. To run the package, you need to extract the "WAASB" and "WAASBY" coefficients using 'metan' and apply them. This tool utilizes PCA (Principal Component Analysis) and differentiates the entries, which may be genotypes, hybrids, varieties, etc., using "WAASB", "WAASBY", and a combination of the specified trait and WAASB index.

Maintained by Ali Arminian. Last updated 6 months ago.

1.0 match 3.00 score 4 scripts

cran

SLBDD:Statistical Learning for Big Dependent Data

Programs for analyzing large-scale time series data. They include functions for automatic specification and estimation of univariate time series, for clustering time series, for multivariate outlier detections, for quantile plotting of many time series, for dynamic factor models and for creating input data for deep learning programs. Examples of using the package can be found in the Wiley book 'Statistical Learning with Big Dependent Data' by Daniel Peña and Ruey S. Tsay (2021). ISBN 9781119417385.

Maintained by Antonio Elias. Last updated 3 years ago.

1.8 match 1.56 score 12 scripts 1 dependents

ethanyxu

ADPclust:Fast Clustering Using Adaptive Density Peak Detection

An implementation of ADPclust clustering procedures (Fast Clustering Using Adaptive Density Peak Detection). The work is built and improved upon the idea of Rodriguez and Laio (2014) <DOI:10.1126/science.1242072>. ADPclust clusters data by finding density peaks in a density-distance plot generated from local multivariate Gaussian density estimation. It includes an automatic centroid selection and parameter optimization algorithm, which finds the number of clusters and cluster centroids by comparing average silhouettes on a grid of testing clustering results. It also includes a user-interactive algorithm that allows the user to manually select cluster centroids from a two-dimensional "density-distance plot". The research article associated with this package is Wang, Xiao-Feng, and Yifan Xu (2015) <DOI:10.1177/0962280215609948>, "Fast clustering using adaptive density peak detection", Statistical Methods in Medical Research. URL: <http://smm.sagepub.com/content/early/2015/10/15/0962280215609948.abstract>.

Maintained by Ethan Yifan Xu. Last updated 3 years ago.

0.5 match 10 stars 5.34 score 44 scripts

gianmarcoalberti

boxplotcluster:Clustering Method Based on Boxplot Statistics

Following Arroyo-Maté-Roque (2006), the function calculates the distance between rows or columns of the dataset using the generalized Minkowski metric as described by Ichino-Yaguchi (1994). The distance measure gives more weight to differences between quartiles than to differences between extremes, making it less sensitive to outliers. Further, the function calculates the silhouette width (Rousseeuw 1987) for different numbers of clusters and selects the number of clusters that maximizes the average silhouette width, unless a specific number of clusters is provided by the user. The approach implemented in this package is based on the following publications: Rousseeuw (1987) <doi:10.1016/0377-0427(87)90125-7>; Ichino-Yaguchi (1994) <doi:10.1109/21.286391>; Arroyo-Maté-Roque (2006) <doi:10.1007/3-540-34416-0_7>.

Maintained by Gianmarco Alberti. Last updated 1 years ago.

0.8 match 1.70 score
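
Selecting the number of clusters by average silhouette width, as described above, can be sketched as follows. This is a minimal Python illustration of the silhouette definition s(i) = (b - a) / max(a, b) from Rousseeuw (1987), not code from boxplotcluster: the function names, the toy data, and the hand-written candidate partitions are assumptions, and plain absolute differences stand in for the package's distance measure.

```python
def silhouette_widths(dist, labels):
    """Silhouette width s(i) = (b - a) / max(a, b) for each observation.

    dist:   symmetric matrix (list of lists) of pairwise dissimilarities.
    labels: cluster label per observation; at least two clusters required.
    a is the mean dissimilarity to the other members of i's own cluster,
    b the smallest mean dissimilarity to any other cluster.
    Observations in singleton clusters get width 0.
    """
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")
    widths = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            widths.append(0.0)
            continue
        a = sum(dist[i][j] for j in own) / len(own)
        b = min(
            sum(dist[i][j] for j in members) / len(members)
            for other, members in clusters.items() if other != lab
        )
        widths.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return widths

def best_partition(dist, candidates):
    """Return the (k, labels) pair with the largest average silhouette width."""
    def average_width(labels):
        w = silhouette_widths(dist, labels)
        return sum(w) / len(w)
    return max(candidates.items(), key=lambda kv: average_width(kv[1]))

# Toy data: three well-separated pairs on a line.
points = [0.0, 1.0, 10.0, 11.0, 20.0, 21.0]
dist = [[abs(p - q) for q in points] for p in points]
candidates = {
    2: [0, 0, 1, 1, 1, 1],   # hypothetical 2-cluster solution
    3: [0, 0, 1, 1, 2, 2],   # hypothetical 3-cluster solution
}
k, labels = best_partition(dist, candidates)
```

In practice the candidate partitions would come from running a clustering algorithm for each number of clusters; here they are written by hand to keep the sketch self-contained.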

cran

SillyPutty:Silly Putty Clustering

Implements a simple, novel clustering algorithm based on optimizing the silhouette width. See <doi:10.1101/2023.11.07.566055> for details.

Maintained by Kevin R. Coombes. Last updated 1 years ago.

clustering

0.5 match 2.00 score

quantmeth

image2data:Turn Images into Data Sets

The goal of 'image2data' is to extract images and turn them into a data set, especially for teaching data manipulation and data visualization. Basically, the eponymous function takes an image file ('png', 'tiff', 'jpeg', 'bmp') and turns it into a data set, pixels being rows (subjects) and columns (variables) being their coordinate positions (x- and y-axis) and their respective color (in hex codes). The function can return a complete image or a range of color (i.e., contour, silhouette). The data can then be manipulated as would any data set by either creating other related variables (to hide the image) or as a genuine toy data set.

Maintained by P.-O. Caron. Last updated 3 years ago.

0.5 match 2.00 score 1 scripts

buybnb

GIC:A General Iterative Clustering Algorithm

An iterative algorithm that improves the proximity matrix (PM) from a random forest (RF) and the resulting clusters as measured by the silhouette score.

Maintained by Ziqiang Lin. Last updated 3 years ago.

0.5 match 1.70 score

gianmarcoalberti

brsim:Brainerd-Robinson Similarity Coefficient Matrix

Provides the facility to calculate the Brainerd-Robinson similarity coefficient for the rows of an input table, and to calculate the significance of each coefficient based on a permutation approach; a heatmap is produced to visually represent the similarity matrix. Optionally, hierarchical agglomerative clustering can be performed and the silhouette method is used to identify an optimal number of clusters; the results of the clustering can be optionally used to sort the heatmap.

Maintained by Gianmarco Alberti. Last updated 1 years ago.

0.5 match 1.70 score