R-universe search: dbscan

mhahsler

dbscan:Density-Based Spatial Clustering of Applications with Noise (DBSCAN) and Related Algorithms

A fast reimplementation of several density-based algorithms of the DBSCAN family. Includes the clustering algorithms DBSCAN (density-based spatial clustering of applications with noise) and HDBSCAN (hierarchical DBSCAN), the ordering algorithm OPTICS (ordering points to identify the clustering structure), shared nearest neighbor clustering, and the outlier detection algorithms LOF (local outlier factor) and GLOSH (global-local outlier score from hierarchies). The implementations use the kd-tree data structure (from the ANN library) for faster k-nearest neighbor search. An R interface to fast kNN and fixed-radius NN search is also provided. Hahsler, Piekenbrock and Doran (2019) <doi:10.18637/jss.v091.i01>.
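DBSCAN works as follows. A point is a core point if at least `min_pts` points lie within radius `eps` of it. Clusters grow outward from core points, and points that cannot be reached from any core point are labeled noise. The sketch below is a minimal, hypothetical Python illustration of that idea, not the package's C++ code. It uses brute-force O(n²) neighbor search instead of the ANN kd-tree, the names `dbscan` and `region_query` are our own, and it assumes that the neighborhood count includes the point itself and that noise is labeled 0.

```python
import math

def region_query(points, i, eps):
    """Return indices of all points within distance eps of points[i] (brute force)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: one label per point; 0 marks noise, clusters are 1, 2, ..."""
    labels = [None] * len(points)  # None = not yet visited
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:  # not a core point (count includes i itself)
            labels[i] = 0             # tentatively noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        seeds = list(neighbors)
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == 0:        # noise reachable from a core point becomes a border point
                labels[j] = cluster
            if labels[j] is not None:
                continue              # already assigned; border points are not expanded
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # j is also a core point: expand through it
                seeds.extend(j_neighbors)
    return labels
```

For example, `dbscan([(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11), (50, 50)], eps=1.5, min_pts=3)` yields two clusters of four points each and labels the isolated point `(50, 50)` as noise.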

Maintained by Michael Hahsler. Last updated 2 months ago.

clustering dbscan density-based-clustering hdbscan lof optics cpp

87.4 match 321 stars 15.62 score 1.6k scripts 84 dependents

chrhennig

fpc:Flexible Procedures for Clustering

Various methods for clustering and cluster validation. Fixed point clustering. Linear regression clustering. Clustering by merging Gaussian mixture components. Symmetric and asymmetric discriminant projections for visualisation of the separation of groupings. Cluster validation statistics for distance based clustering including the corrected Rand index. Standardisation of cluster validation statistics by random clusterings and comparison between many clustering methods and numbers of clusters based on this. Cluster-wise cluster stability assessment. Methods for estimation of the number of clusters: Calinski-Harabasz, Tibshirani and Walther's prediction strength, Fang and Wang's bootstrap stability. Gaussian/multinomial mixture fitting for mixed continuous/categorical variables. Variable-wise statistics for cluster interpretation. DBSCAN clustering. Interface functions for many clustering methods implemented in R, including estimating the number of clusters with kmeans, pam and clara. Modality diagnosis for Gaussian mixtures. For an overview see package?fpc.
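The corrected Rand index compares two partitions of the same objects. A value of 1 means the partitions are identical up to relabeling, and values near 0 mean agreement no better than chance. The sketch below is a hypothetical Python illustration, not fpc's code. It assumes "corrected Rand index" refers to the Hubert and Arabie adjusted Rand index, computed from pair counts in the contingency table of the two labelings.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_a, labels_b):
    """Hubert-Arabie adjusted (corrected) Rand index between two labelings."""
    n = len(labels_a)
    # Pairs of objects placed together in both partitions, summed over contingency cells.
    sum_ij = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs placed together within each partition separately.
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)  # expected sum_ij under random labeling
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:              # degenerate case, e.g. both partitions trivial
        return 1.0
    return (sum_ij - expected) / (max_index - expected)
```

Because the index depends only on which objects are grouped together, `adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])` returns 1.0 even though the label names differ.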

Maintained by Christian Hennig. Last updated 6 months ago.

12.4 match 11 stars 9.25 score 2.6k scripts 70 dependents

mlr-org

mlr3cluster:Cluster Extension for 'mlr3'

Extends the 'mlr3' package with cluster analysis.

Maintained by Maximilian Mücke. Last updated 27 days ago.

cluster-analysis clustering mlr3

5.1 match 23 stars 8.21 score 50 scripts 2 dependents

cefet-rj-dal

daltoolbox:Leveraging Experiment Lines to Data Analytics

The natural increase in the complexity of current research experiments and data demands better tools to enhance productivity in Data Analytics. The package is a framework designed to address the modern challenges in data analytics workflows. The package is inspired by Experiment Line concepts. It aims to provide seamless support for users in developing their data mining workflows by offering a uniform data model and method API. It enables the integration of various data mining activities, including data preprocessing, classification, regression, clustering, and time series prediction. It also offers options for hyper-parameter tuning and supports integration with existing libraries and languages. Overall, the package provides researchers with a comprehensive set of functionalities for data science, promoting ease of use, extensibility, and integration with various tools and libraries. Information on Experiment Line is based on Ogasawara et al. (2009) <doi:10.1007/978-3-642-02279-1_20>.

Maintained by Eduardo Ogasawara. Last updated 1 month ago.

5.3 match 1 stars 6.65 score 536 scripts 4 dependents

opengeos

whitebox:'WhiteboxTools' R Frontend

An R frontend for the 'WhiteboxTools' library, which is an advanced geospatial data analysis platform developed by Prof. John Lindsay at the University of Guelph's Geomorphometry and Hydrogeomatics Research Group. 'WhiteboxTools' can be used to perform common geographical information systems (GIS) analysis operations, such as cost-distance analysis, distance buffering, and raster reclassification. Remote sensing and image processing tasks include image enhancement (e.g. panchromatic sharpening, contrast adjustments), image mosaicing, numerous filtering operations, simple classification (k-means), and common image transformations. 'WhiteboxTools' also contains advanced tooling for spatial hydrological analysis (e.g. flow-accumulation, watershed delineation, stream network analysis, sink removal), terrain analysis (e.g. common terrain indices such as slope, curvatures, wetness index, hillshading; hypsometric analysis; multi-scale topographic position analysis), and LiDAR data processing. Suggested citation: Lindsay (2016) <doi:10.1016/j.cageo.2016.07.003>.

Maintained by Andrew Brown. Last updated 5 months ago.

geomorphometry geoprocessing geospatial gis hydrology remote-sensing rstudio

3.3 match 173 stars 9.65 score 203 scripts 2 dependents

kurthornik

RWeka:R/Weka Interface

An R interface to Weka (Version 3.9.3). Weka is a collection of machine learning algorithms for data mining tasks written in Java, containing tools for data pre-processing, classification, regression, clustering, association rules, and visualization. Package 'RWeka' contains the interface code, the Weka jar is in a separate package 'RWekajars'. For more information on Weka see <https://www.cs.waikato.ac.nz/ml/weka/>.

Maintained by Kurt Hornik. Last updated 2 years ago.

openjdk

3.3 match 4 stars 8.24 score 1.8k scripts 14 dependents

henrifnk

FuzzyDBScan:Run and Predict a Fuzzy DBScan

An interface for training Fuzzy DBScan with both Fuzzy Core and Fuzzy Border. The package provides a method to initialize and run the algorithm and a function to predict new data, implemented with the help of 'R6'. The package is built upon the paper "Fuzzy Extensions of the DBScan algorithm" by Ienco and Bordogna (2018) <doi:10.1007/s00500-016-2435-0>. A predict function assigns new data according to the same criteria as the algorithm itself. However, the prediction function freezes the algorithm to preserve the trained cluster structure and treats each new prediction object individually.

Maintained by Henri Funk. Last updated 2 years ago.

6.1 match 2 stars 3.48 score 3 scripts 1 dependents

rcurtin

mlpack:'Rcpp' Integration for the 'mlpack' Library

A fast, flexible machine learning library, written in C++, that aims to provide fast, extensible implementations of cutting-edge machine learning algorithms. See also Curtin et al. (2023) <doi:10.21105/joss.05026>.

Maintained by Ryan Curtin. Last updated 3 months ago.

openblas cpp openmp

5.6 match 3.71 score 20 scripts 8 dependents

mhahsler

stream:Infrastructure for Data Stream Mining

A framework for data stream modeling and associated data mining tasks such as clustering and classification. The development of this package was supported in part by NSF IIS-0948893, NSF CMMI 1728612, and NIH R21HG005912. Hahsler et al. (2017) <doi:10.18637/jss.v076.i14>.

Maintained by Michael Hahsler. Last updated 4 days ago.

data-stream-clustering datastream stream-mining cpp

2.0 match 39 stars 10.05 score 132 scripts 3 dependents

andriyprotsak5

UAHDataScienceUC:Learn Clustering Techniques Through Examples and Code

A comprehensive educational package combining clustering algorithms with detailed step-by-step explanations. Provides implementations of both traditional (hierarchical, k-means) and modern (Density-Based Spatial Clustering of Applications with Noise (DBSCAN), Gaussian Mixture Models (GMM), genetic k-means) clustering methods as described in Ezugwu et al. (2022) <doi:10.1016/j.engappai.2022.104743>. Includes educational datasets highlighting different clustering challenges, based on 'scikit-learn' examples (Pedregosa et al., 2011) <https://jmlr.csail.mit.edu/papers/v12/pedregosa11a.html>. Features detailed algorithm explanations, visualizations, and weighted distance calculations for enhanced learning.

Maintained by Andriy Protsak Protsak. Last updated 27 days ago.

7.7 match 2.30 score

bioc

bluster:Clustering Algorithms for Bioconductor

Wraps common clustering algorithms in an easily extended S4 framework. Backends are implemented for hierarchical, k-means and graph-based clustering. Several utilities are also provided to compare and evaluate clustering results.

Maintained by Aaron Lun. Last updated 5 months ago.

immunooncology software geneexpression transcriptomics singlecell clustering cpp

1.9 match 9.43 score 636 scripts 51 dependents

immunomind

immunarch:Bioinformatics Analysis of T-Cell and B-Cell Immune Repertoires

A comprehensive framework for bioinformatics exploratory analysis of bulk and single-cell T-cell receptor and antibody repertoires. It provides seamless data loading, analysis and visualisation for AIRR (Adaptive Immune Receptor Repertoire) data, both bulk immunosequencing (RepSeq) and single-cell sequencing (scRNAseq). Immunarch implements most of the widely used AIRR analysis methods, such as: clonality analysis, estimation of repertoire similarities in distribution of clonotypes and gene segments, repertoire diversity analysis, annotation of clonotypes using external immune receptor databases and clonotype tracking in vaccination and cancer studies. A successor to our previously published 'tcR' immunoinformatics package (Nazarov 2015) <doi:10.1186/s12859-015-0613-1>.

Maintained by Vadim I. Nazarov. Last updated 12 months ago.

airr-analysis b-cell-receptor bcr bcr-repertoire bioinformatics ig ig-repertoire immune-repertoire immune-repertoire-analysis immune-repertoire-data immunoglobulin immunoinformatics immunology rep-seq repertoire-analysis single-cell single-cell-analysis t-cell-receptor tcr tcr-repertoire cpp

1.8 match 315 stars 9.49 score 203 scripts

blansche

fdm2id:Data Mining and R Programming for Beginners

Contains functions to simplify the use of data mining methods (classification, regression, clustering, etc.), for students and beginners in R programming. Various R packages are used and wrappers are built around the main functions, to standardize the use of data mining methods (input/output): it brings a certain loss of flexibility, but also a gain of simplicity. The package name came from the French "Fouille de Données en Master 2 Informatique Décisionnelle".

Maintained by Alexandre Blansché. Last updated 2 years ago.

9.6 match 1 stars 1.62 score 42 scripts

cran

UAHDataScienceUC:Learn Clustering Techniques Through Examples and Code

A comprehensive educational package combining clustering algorithms with detailed step-by-step explanations. Provides implementations of both traditional (hierarchical, k-means) and modern (Density-Based Spatial Clustering of Applications with Noise (DBSCAN), Gaussian Mixture Models (GMM), genetic k-means) clustering methods as described in Ezugwu et al. (2022) <doi:10.1016/j.engappai.2022.104743>. Includes educational datasets highlighting different clustering challenges, based on 'scikit-learn' examples (Pedregosa et al., 2011) <https://jmlr.csail.mit.edu/papers/v12/pedregosa11a.html>. Features detailed algorithm explanations, visualizations, and weighted distance calculations for enhanced learning.

Maintained by Andriy Protsak Protsak. Last updated 27 days ago.

7.7 match 2.00 score

ediu3095

clustlearn:Learn Clustering Techniques Through Examples and Code

Clustering methods, which (if asked) can provide step-by-step explanations of the algorithms used, as described in Ezugwu et al. (2022) <doi:10.1016/j.engappai.2022.104743>, and datasets to test them on, which highlight the strengths and weaknesses of each technique, as presented in the clustering section of 'scikit-learn' (Pedregosa et al., 2011) <https://jmlr.csail.mit.edu/papers/v12/pedregosa11a.html>.

Maintained by Eduardo Ruiz Sabajanes. Last updated 2 years ago.

5.0 match 1 stars 2.70 score 4 scripts

biorgeo

bioregion:Comparison of Bioregionalisation Methods

The main purpose of this package is to propose a transparent methodological framework to compare bioregionalisation methods based on hierarchical and non-hierarchical clustering algorithms (Kreft & Jetz (2010) <doi:10.1111/j.1365-2699.2010.02375.x>) and network algorithms (Lenormand et al. (2019) <doi:10.1002/ece3.4718> and Leroy et al. (2019) <doi:10.1111/jbi.13674>).

Maintained by Maxime Lenormand. Last updated 11 days ago.

biogeography bioregion bioregionalization cpp

1.9 match 7 stars 6.27 score 11 scripts

lmjl-alea

squat:Statistics for Quaternion Temporal Data

An implementation of statistical tools for the analysis of rotation-valued time series and functional data. It relies on the existing quaternion data structure provided by the 'Eigen' 'C++' library.

Maintained by Aymeric Stamm. Last updated 1 year ago.

cpp openmp

3.3 match 2 stars 3.00 score 6 scripts

mlverse

cuda.ml:R Interface for the RAPIDS cuML Suite of Libraries

R interface for RAPIDS cuML (<https://github.com/rapidsai/cuml>), a suite of GPU-accelerated machine learning libraries powered by CUDA (<https://en.wikipedia.org/wiki/CUDA>).

Maintained by Daniel Falbel. Last updated 3 years ago.

gpu machine-learning cpp

1.9 match 33 stars 5.27 score 57 scripts

alfodefalco

dPCP:Automated Analysis of Multiplex Digital PCR Data

The automated clustering and quantification of the digital PCR data is based on the combination of 'DBSCAN' (Hahsler et al. (2019) <doi:10.18637/jss.v091.i01>) and 'c-means' (Bezdek et al. (1981) <doi:10.1007/978-1-4757-0450-1>) algorithms. The analysis is independent of multiplexing geometry, dPCR system, and input amount. The details about input data and parameters are available in the vignette.
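In fuzzy c-means, each point receives a graded membership in every cluster rather than a hard label. Given fixed cluster centers, the membership of point $x$ in cluster $i$ is $u_i = 1 / \sum_j (d_i / d_j)^{2/(m-1)}$, where $d_i$ is the distance from $x$ to center $i$ and $m > 1$ is the fuzzifier. The full algorithm alternates this step with recomputing each center as a membership-weighted mean. The sketch below shows only the membership step in Python. It is a hypothetical illustration, not dPCP's code, and the function name and default `m=2.0` are our own choices.

```python
import math

def fuzzy_memberships(point, centers, m=2.0):
    """Fuzzy c-means membership of one point in each cluster, given fixed centers.
    m > 1 is the fuzzifier; larger m gives softer (more evenly spread) memberships."""
    dists = [math.dist(point, c) for c in centers]
    zeros = [d == 0.0 for d in dists]
    if any(zeros):
        # A point lying exactly on a center (or on several coincident centers)
        # is shared equally among those centers; this avoids division by zero.
        k = sum(zeros)
        return [1.0 / k if z else 0.0 for z in zeros]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_j) ** power for d_j in dists) for d_i in dists]
```

The memberships of a point always sum to 1. For example, a point midway between two centers gets 0.5 in each.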

Maintained by Alfonso De Falco. Last updated 2 years ago.

2.3 match 2 stars 4.36 score 23 scripts

branchlab

metasnf:Meta Clustering with Similarity Network Fusion

Framework to facilitate patient subtyping with similarity network fusion and meta clustering. The similarity network fusion (SNF) algorithm was introduced by Wang et al. (2014) in <doi:10.1038/nmeth.2810>. SNF is a data integration approach that can transform high-dimensional and diverse data types into a single similarity network suitable for clustering with minimal loss of information from each initial data source. The meta clustering approach was introduced by Caruana et al. (2006) in <doi:10.1109/ICDM.2006.103>. Meta clustering involves generating a wide range of cluster solutions by adjusting clustering hyperparameters, then clustering the solutions themselves into a manageable number of qualitatively similar solutions, and finally characterizing representative solutions to find ones that are best for the user's specific context. This package provides a framework to easily transform multi-modal data into a wide range of similarity network fusion-derived cluster solutions as well as to visualize, characterize, and validate those solutions. Core package functionality includes easy customization of distance metrics, clustering algorithms, and SNF hyperparameters to generate diverse clustering solutions; calculation and plotting of associations between features, between patients, and between cluster solutions; and standard cluster validation approaches including resampled measures of cluster stability, standard metrics of cluster quality, and label propagation to evaluate generalizability in unseen data. Associated vignettes guide the user through using the package to identify patient subtypes while adhering to best practices for unsupervised learning.

Maintained by Prashanth S Velayudhan. Last updated 5 days ago.

bioinformatics clustering metaclustering snf

1.2 match 8 stars 8.21 score 30 scripts

missiegobeats

OutliersLearn:Educational Outlier Package with Common Outlier Detection Algorithms

Provides implementations of some of the most important outlier detection algorithms. Includes a tutorial mode option that shows a description of each algorithm and provides a step-by-step execution explanation of how it identifies outliers from the given data with the specified input parameters. References include the works of Azzedine Boukerche, Lining Zheng, and Omar Alfandi (2020) <doi:10.1145/3381028>, Abir Smiti (2020) <doi:10.1016/j.cosrev.2020.100306>, and Xiaogang Su and Chih-Ling Tsai (2011) <doi:10.1002/widm.19>.

Maintained by Andres Missiego Manjon. Last updated 10 months ago.

1.5 match 1 stars 4.60 score 2 scripts

mcanigueral

evprof:Electric Vehicle Charging Sessions Profiling and Modelling

Tools for modelling electric vehicle charging sessions into generic groups with similar connection patterns called "user profiles", using Gaussian Mixture Models clustering. The clustering and profiling methodology is described in Cañigueral and Meléndez (2021, ISSN: 0142-0615) <doi:10.1016/j.ijepes.2021.107195>.

Maintained by Marc Cañigueral. Last updated 4 days ago.

1.7 match 2 stars 3.30 score 6 scripts

andriyprotsak5

UAHDataScienceO:Educational Outlier Detection Algorithms with Step-by-Step Tutorials

Provides implementations of some of the most important outlier detection algorithms. Includes a tutorial mode option that shows a description of each algorithm and provides a step-by-step execution explanation of how it identifies outliers from the given data with the specified input parameters. References include the works of Azzedine Boukerche, Lining Zheng, and Omar Alfandi (2020) <doi:10.1145/3381028>, Abir Smiti (2020) <doi:10.1016/j.cosrev.2020.100306>, and Xiaogang Su and Chih-Ling Tsai (2011) <doi:10.1002/widm.19>.

Maintained by Andriy Protsak Protsak. Last updated 1 month ago.

1.5 match 3.00 score

yusenzhang

qkerntool:Q-Kernel-Based and Conditionally Negative Definite Kernel-Based Machine Learning Tools

Nonlinear machine learning tool for classification, clustering and dimensionality reduction. It integrates 12 q-kernel functions and 15 conditionally negative definite kernel functions and includes the q-kernel and conditionally negative definite kernel versions of density-based spatial clustering of applications with noise, spectral clustering, generalized discriminant analysis, principal component analysis, multidimensional scaling, locally linear embedding, Sammon's mapping and t-distributed stochastic neighbor embedding.

Maintained by Yusen Zhang. Last updated 6 years ago.

1.8 match 1 stars 2.19 score 31 scripts

astamm

fdacluster:Joint Clustering and Alignment of Functional Data

Implementations of the k-means, hierarchical agglomerative and DBSCAN clustering methods for functional data which allow for jointly aligning and clustering curves. It supports functional data defined on one-dimensional domains but possibly taking values in multivariate codomains. It supports functional data defined in arrays but also via the 'fd' and 'funData' classes for functional data defined in the 'fda' and 'funData' packages respectively. It currently supports shift, dilation and affine warping functions for functional data defined on the real line and uses the square-root velocity function (SRVF) framework to handle boundary-preserving warping for functional data defined on a specific interval. Main reference for the k-means algorithm: Sangalli L.M., Secchi P., Vantini S., Vitelli V. (2010) "k-mean alignment for curve clustering" <doi:10.1016/j.csda.2009.12.008>. Main reference for the SRVF framework: Tucker, J. D., Wu, W., & Srivastava, A. (2013) "Generative models for functional data using phase and amplitude separation" <doi:10.1016/j.csda.2012.12.001>.

Maintained by Aymeric Stamm. Last updated 2 months ago.

openblas cpp openmp

0.5 match 5 stars 6.14 score 31 scripts 1 dependents

cran

UAHDataScienceO:Educational Outlier Detection Algorithms with Step-by-Step Tutorials

Provides implementations of some of the most important outlier detection algorithms. Includes a tutorial mode option that shows a description of each algorithm and provides a step-by-step execution explanation of how it identifies outliers from the given data with the specified input parameters. References include the works of Azzedine Boukerche, Lining Zheng, and Omar Alfandi (2020) <doi:10.1145/3381028>, Abir Smiti (2020) <doi:10.1016/j.cosrev.2020.100306>, and Xiaogang Su and Chih-Ling Tsai (2011) <doi:10.1002/widm.19>.

Maintained by Andriy Protsak Protsak. Last updated 24 days ago.

1.5 match 2.00 score

sigbertklinke

smvgraph:Visualization and Clustering of Data in a Shiny App

Various visualisations of univariate and multivariate graphs (e.g. mosaic diagram, scatterplot matrix, Andrews curves, parallel coordinate diagram, radar diagram and Chernoff plots) as well as clustering methods (e.g. k-means, agglomerative, EM clustering and DBSCAN) are implemented as a Shiny app. The app allows interactive changes, e.g. of the order of variables. It is intended for use in teaching.

Maintained by Sigbert Klinke. Last updated 2 years ago.

0.5 match 3.70 score