Showing 91 of 91 results
palaeoverse
rphylopic:Get Silhouettes of Organisms from PhyloPic
Work with the PhyloPic Web Service (<http://api-docs.phylopic.org/v2/>) to fetch silhouette images of organisms. Includes functions for adding silhouettes to both base R plots and ggplot2 plots.
Maintained by William Gearty. Last updated 6 months ago.
base-r ggplot2 phylopic silhouette
23.0 match 91 stars 9.25 score 272 scripts
nschiett
fishualize:Color Palettes Based on Fish Species
Implementation of color palettes based on fish species.
Maintained by Nina M. D. Schiettekatte. Last updated 11 months ago.
9.9 match 155 stars 8.54 score 370 scripts
mmaechler
cluster:"Finding Groups in Data": Cluster Analysis Extended Rousseeuw et al.
Methods for cluster analysis. A much-extended version of the original from Peter Rousseeuw, Anja Struyf and Mia Hubert, based on Kaufman and Rousseeuw (1990) "Finding Groups in Data".
Maintained by Martin Maechler. Last updated 4 days ago.
4.8 match 3 stars 11.98 score 14k scripts 2.2k dependents
sinhrks
ggfortify:Data Visualization Tools for Statistical Analysis Results
Unified plotting tools for statistics commonly used, such as GLM, time series, PCA families, clustering and survival analysis. The package offers a single plotting interface for these analysis results and plots in a unified style using 'ggplot2'.
Maintained by Yuan Tang. Last updated 9 months ago.
3.8 match 529 stars 14.49 score 9.1k scripts 22 dependents
tidymodels
tidyclust:A Common API to Clustering
A common interface for specifying clustering models, in the same style as 'parsnip'. Creates a unified interface across different functions and computational engines.
Maintained by Emil Hvitfeldt. Last updated 2 months ago.
6.8 match 111 stars 7.45 score 139 scripts
jkim82133
TDA:Statistical Tools for Topological Data Analysis
Tools for Topological Data Analysis. The package focuses on statistical analysis of persistent homology and density clustering. For that, this package provides an R interface for the efficient algorithms of the C++ libraries 'GUDHI' <https://project.inria.fr/gudhi/software/>, 'Dionysus' <https://www.mrzv.org/software/dionysus/>, and 'PHAT' <https://bitbucket.org/phat-code/phat/>. This package also implements methods from Fasy et al. (2014) <doi:10.1214/14-AOS1252> and Chazal et al. (2015) <doi:10.20382/jocg.v6i2a8> for analyzing the statistical significance of persistent homology features.
Maintained by Jisu Kim. Last updated 1 month ago.
6.8 match 9 stars 7.18 score 204 scripts 5 dependents
kurthornik
mlbench:Machine Learning Benchmark Problems
A collection of artificial and real-world machine learning benchmark problems, including, e.g., several data sets from the UCI repository.
Maintained by Kurt Hornik. Last updated 3 months ago.
4.5 match 2 stars 8.93 score 5.0k scripts 55 dependents
mlampros
ClusterR:Gaussian Mixture Models, K-Means, Mini-Batch-Kmeans, K-Medoids and Affinity Propagation Clustering
Gaussian mixture models, k-means, mini-batch-kmeans, k-medoids and affinity propagation clustering with the option to plot, validate, predict (new data) and estimate the optimal number of clusters. The package takes advantage of 'RcppArmadillo' to speed up the computationally intensive parts of the functions. For more information, see (i) "Clustering in an Object-Oriented Environment" by Anja Struyf, Mia Hubert, Peter Rousseeuw (1997), Journal of Statistical Software, <doi:10.18637/jss.v001.i04>; (ii) "Web-scale k-means clustering" by D. Sculley (2010), ACM Digital Library, <doi:10.1145/1772690.1772862>; (iii) "Armadillo: a template-based C++ library for linear algebra" by Sanderson et al (2016), The Journal of Open Source Software, <doi:10.21105/joss.00026>; (iv) "Clustering by Passing Messages Between Data Points" by Brendan J. Frey and Delbert Dueck, Science 16 Feb 2007: Vol. 315, Issue 5814, pp. 972-976, <doi:10.1126/science.1136800>.
Maintained by Lampros Mouselimis. Last updated 9 months ago.
affinity-propagation cpp11 gmm kmeans kmedoids-clustering mini-batch-kmeans rcpparmadillo openblas cpp openmp
3.6 match 84 stars 11.08 score 640 scripts 24 dependents
pascoalf
ulrb:Unsupervised Learning Based Definition of Microbial Rare Biosphere
A tool to define the rare biosphere. 'ulrb' solves the problem of defining rarity by replacing arbitrary thresholds with an unsupervised machine learning algorithm (partitioning around medoids, or k-medoids). This algorithm works for any type of microbiome data, provided there is a species abundance table. For validation of this method on different species abundance tables, see Pascoal et al., 2024 (in peer review). This method also works for non-microbiome data.
Maintained by Francisco Pascoal. Last updated 20 days ago.
6.7 match 3 stars 5.68 score 9 scripts
bioc
SC3:Single-Cell Consensus Clustering
A tool for unsupervised clustering and analysis of single cell RNA-Seq data.
Maintained by Vladimir Kiselev. Last updated 5 months ago.
immunooncology singlecell software classification clustering dimensionreduction supportvectormachine rnaseq visualization transcriptomics datarepresentation gui differentialexpression transcription bioconductor-package human-cell-atlas single-cell-rna-seq openblas cpp
3.4 match 122 stars 10.09 score 374 scripts 1 dependents
bioc
bluster:Clustering Algorithms for Bioconductor
Wraps common clustering algorithms in an easily extended S4 framework. Backends are implemented for hierarchical, k-means and graph-based clustering. Several utilities are also provided to compare and evaluate clustering results.
Maintained by Aaron Lun. Last updated 5 months ago.
immunooncology software geneexpression transcriptomics singlecell clustering cpp
3.3 match 9.43 score 636 scripts 51 dependents
blasbenito
distantia:Advanced Toolset for Efficient Time Series Dissimilarity Analysis
Fast C++ implementation of Dynamic Time Warping for time series dissimilarity analysis, with applications in environmental monitoring and sensor data analysis, climate science, signal processing and pattern recognition, and financial data analysis. Built upon the ideas presented in Benito and Birks (2020) <doi:10.1111/ecog.04895>, provides tools for analyzing time series of varying lengths and structures, including irregular multivariate time series. Key features include individual variable contribution analysis, restricted permutation tests for statistical significance, and imputation of missing data via GAMs. Additionally, the package provides an ample set of tools to prepare and manage time series data.
Maintained by Blas M. Benito. Last updated 26 days ago.
dissimilarity dynamic-time-warping lock-step time-series cpp
5.3 match 23 stars 5.76 score 11 scripts
momx
Momocs:Morphometrics using R
The goal of 'Momocs' is to provide a complete, convenient, reproducible and open-source toolkit for 2D morphometrics. It includes most common 2D morphometrics approaches on outlines, open outlines, configurations of landmarks, traditional morphometrics, and facilities for data preparation, manipulation and visualization with a consistent grammar throughout. It allows reproducible, complex morphometrics analyses and other morphometrics approaches should be easy to plug in, or develop from, on top of this canvas.
Maintained by Vincent Bonhomme. Last updated 1 year ago.
4.0 match 51 stars 7.42 score 346 scripts
joemsong
CircularSilhouette:Fast Silhouette on Circular or Linear Data Clusters
Calculating silhouette information for clusters on circular or linear data using fast algorithms. These algorithms run in linear time on sorted data, in contrast to quadratic time by the definition of silhouette. When used together with the fast and optimal circular clustering method FOCC (Debnath & Song 2021) <doi:10.1109/TCBB.2021.3077573> implemented in R package 'OptCirClust', circular silhouette can be maximized to find the optimal number of circular clusters; it can also be used to estimate the period of noisy periodical data.
Maintained by Joe Song. Last updated 3 years ago.
12.0 match 2.48 score 3 scripts 1 dependents
cran
flexclust:Flexible Cluster Algorithms
The main function kcca implements a general framework for k-centroids cluster analysis supporting arbitrary distance measures and centroid computation. Further cluster methods include hard competitive learning, neural gas, and QT clustering. There are numerous visualization methods for cluster results (neighborhood graphs, convex cluster hulls, barcharts of centroids, ...), and bootstrap methods for the analysis of cluster stability.
Maintained by Bettina Grün. Last updated 17 days ago.
5.0 match 3 stars 5.81 score 52 dependents
talgalili
dendextend:Extending 'dendrogram' Functionality in R
Offers a set of functions for extending 'dendrogram' objects in R, letting you visualize and compare trees of 'hierarchical clusterings'. You can (1) Adjust a tree's graphical parameters - the color, size, type, etc of its branches, nodes and labels. (2) Visually and statistically compare different 'dendrograms' to one another.
Maintained by Tal Galili. Last updated 2 months ago.
1.7 match 154 stars 17.02 score 6.0k scripts 164 dependents
kassambara
factoextra:Extract and Visualize the Results of Multivariate Data Analyses
Provides some easy-to-use functions to extract and visualize the output of multivariate data analyses, including 'PCA' (Principal Component Analysis), 'CA' (Correspondence Analysis), 'MCA' (Multiple Correspondence Analysis), 'FAMD' (Factor Analysis of Mixed Data), 'MFA' (Multiple Factor Analysis) and 'HMFA' (Hierarchical Multiple Factor Analysis) functions from different R packages. It also contains functions for simplifying some clustering analysis steps and provides elegant 'ggplot2'-based data visualization.
Maintained by Alboukadel Kassambara. Last updated 5 years ago.
1.9 match 363 stars 14.13 score 15k scripts 52 dependents
trenchproject
TrenchR:Tools for Microclimate and Biophysical Ecology
Tools for translating environmental change into organismal response. Microclimate models to vertically scale weather station data to organismal heights. The biophysical modeling tools include both general models for heat flows and specific models to predict body temperatures for a variety of ectothermic taxa. Additional functions model and temporally partition air and soil temperatures and solar radiation. Utility functions estimate the organismal and environmental parameters needed for biophysical ecology. 'TrenchR' focuses on relatively simple and modular functions so users can create transparent and flexible biophysical models. Many functions are derived from Gates (1980) <doi:10.1007/978-1-4612-6024-0> and Campbell and Norman (1988) <isbn:9780387949376>.
Maintained by Lauren Buckley. Last updated 1 year ago.
3.8 match 13 stars 6.89 score 43 scripts
coatless-rpkg
msos:Data Sets and Functions Used in Multivariate Statistics: Old School by John Marden
Multivariate Analysis methods and data sets used in John Marden's book Multivariate Statistics: Old School (2015) <ISBN:978-1456538835>. This also serves as a companion package for the STAT 571: Multivariate Analysis course offered by the Department of Statistics at the University of Illinois at Urbana-Champaign ('UIUC').
Maintained by James Balamuta. Last updated 1 year ago.
6.1 match 3 stars 4.16 score 32 scripts 1 dependents
bioc
MBECS:Evaluation and correction of batch effects in microbiome data-sets
The Microbiome Batch Effect Correction Suite (MBECS) provides a set of functions to evaluate and mitigate unwanted noise due to processing in batches. To that end, it incorporates a host of batch correcting algorithms (BECA) from various packages. In addition, it offers a correction and reporting pipeline that provides a preliminary look at the characteristics of a data-set before and after correcting for batch effects.
Maintained by Michael Olbrich. Last updated 5 months ago.
batcheffect microbiome reportwriting visualization normalization qualitycontrol
5.3 match 4 stars 4.60 score 4 scripts
scmethods
scregclust:Reconstructing the Regulatory Programs of Target Genes in scRNA-Seq Data
Implementation of the scregclust algorithm described in Larsson, Held, et al. (2024) <doi:10.1038/s41467-024-53954-3> which reconstructs regulatory programs of target genes in scRNA-seq data. Target genes are clustered into modules and each module is associated with a linear model describing the regulatory program.
Maintained by Felix Held. Last updated 2 months ago.
clustering regulatory-programs scrna-seq-analysis cpp openmp
3.6 match 9 stars 6.45 score 21 scripts
bioc
hopach:Hierarchical Ordered Partitioning and Collapsing Hybrid (HOPACH)
The HOPACH clustering algorithm builds a hierarchical tree of clusters by recursively partitioning a data set, while ordering and possibly collapsing clusters at each level. The algorithm uses the Mean/Median Split Silhouette (MSS) criteria to identify the level of the tree with maximally homogeneous clusters. It also runs the tree down to produce a final ordered list of the elements. The non-parametric bootstrap allows one to estimate the probability that each element belongs to each cluster (fuzzy clustering).
Maintained by Katherine S. Pollard. Last updated 5 months ago.
3.7 match 6.05 score 54 scripts 5 dependents
bioc
timeOmics:Time-Course Multi-Omics data integration
timeOmics is a generic data-driven framework to integrate multi-Omics longitudinal data measured on the same biological samples and select key temporal features with strong associations within the same sample group. The main steps of timeOmics are: 1. Platform and time-specific normalization and filtering steps; 2. Modelling each biological into one time expression profile; 3. Clustering features with the same expression profile over time; 4. Post-hoc validation step.
Maintained by Antoine Bodein. Last updated 5 months ago.
clustering featureextraction timecourse dimensionreduction software sequencing microarray metabolomics metagenomics proteomics classification regression immunooncology geneprediction multiplecomparison cluster integration multi-omics time-series
3.5 match 24 stars 5.98 score 10 scripts
go-ski
clustra:Clustering Longitudinal Trajectories
Clusters longitudinal trajectories over time (can be unequally spaced, unequal length time series and/or partially overlapping series) on a common time axis. Performs k-means clustering on a single continuous variable measured over time, where each mean is defined by a thin plate spline fit to all points in a cluster. Distance is MSE across trajectory points to cluster spline. Provides graphs of derived cluster splines, silhouette plots, and Adjusted Rand Index evaluations of the number of clusters. Scales well to large data with multicore parallelism available to speed computation.
Maintained by George Ostrouchov. Last updated 3 months ago.
4.7 match 4.48 score 6 scripts
bioc
ILoReg:ILoReg: a tool for high-resolution cell population identification from scRNA-Seq data
ILoReg is a tool for identification of cell populations from scRNA-seq data. In particular, ILoReg is useful for finding cell populations with subtle transcriptomic differences. The method utilizes a self-supervised learning method, called Iterative Clustering Projection (ICP), to find cluster probabilities, which are used in noise reduction prior to PCA and the subsequent hierarchical clustering and t-SNE steps. Additionally, functions for differential expression analysis to find gene markers for the populations and gene expression visualization are provided.
Maintained by Johannes Smolander. Last updated 5 months ago.
singlecell software clustering dimensionreduction rnaseq visualization transcriptomics datarepresentation differentialexpression transcription geneexpression
4.0 match 5 stars 4.88 score 2 scripts
yulab-smu
TDbook:Companion Package for the Book "Data Integration, Manipulation and Visualization of Phylogenetic Trees" by Guangchuang Yu (2022, ISBN:9781032233574, doi:10.1201/9781003279242)
The companion package that provides all the datasets used in the book "Data Integration, Manipulation and Visualization of Phylogenetic Trees" by Guangchuang Yu (2022, ISBN:9781032233574, doi:10.1201/9781003279242).
Maintained by Guangchuang Yu. Last updated 3 years ago.
3.8 match 13 stars 4.88 score 59 scripts
bioc
orthogene:Interspecies gene mapping
`orthogene` is an R package for easy mapping of orthologous genes across hundreds of species. It pulls up-to-date gene ortholog mappings across **700+ organisms**. It also provides various utility functions to aggregate/expand common objects (e.g. data.frames, gene expression matrices, lists) using **1:1**, **many:1**, **1:many** or **many:many** gene mappings, both within- and between-species.
Maintained by Brian Schilder. Last updated 5 months ago.
genetics comparativegenomics preprocessing phylogenetics transcriptomics geneexpression animal-models bioconductor bioconductor-package bioinformatics biomedicine comparative-genomics evolutionary-biology genes genomics ontologies translational-research
2.3 match 42 stars 7.85 score 31 scripts 2 dependents
sysbiolab
PathwaySpace:Spatial Projection of Network Signals along Geodesic Paths
For a given graph containing vertices, edges, and a signal associated with the vertices, the 'PathwaySpace' package performs a convolution operation, which involves a weighted combination of neighboring vertices and their associated signals. The package then uses a decay function to project these signals, creating geodesic paths on a 2D-image space. 'PathwaySpace' could have various applications, such as visualizing and analyzing network data in a graphical format that highlights the relationships and signal strengths between vertices. It can be particularly useful for understanding the influence of signals through complex networks. By combining graph theory, signal processing, and visualization, the 'PathwaySpace' package provides a novel way of representing and analyzing graph data.
Maintained by Mauro Castro. Last updated 2 months ago.
bioinformatics biological-networks graph
3.3 match 2 stars 4.85 score 5 scripts
mlr-org
mlr3cluster:Cluster Extension for 'mlr3'
Extends the 'mlr3' package with cluster analysis.
Maintained by Maximilian Mücke. Last updated 27 days ago.
cluster-analysis clustering mlr3
1.9 match 23 stars 8.21 score 50 scripts 2 dependents
mottastefano
SOMMD:Self Organising Maps for the Analysis of Molecular Dynamics Data
Processes data from Molecular Dynamics simulations using Self Organising Maps. Features include the ability to read different input formats. Trajectories can be analysed to identify groups of important frames. Output visualisation can be generated for maps and pathways. Methodological details can be found in Motta S et al (2022) <doi:10.1021/acs.jctc.1c01163>. I/O functions for xtc format files were implemented using the 'xdrfile' library available under open source license. The relevant information can be found in inst/COPYRIGHT.
Maintained by Stefano Motta. Last updated 6 months ago.
9.0 match 1.70 score 4 scripts
gfellerlab
SuperCell:Simplification of scRNA-seq data by merging together similar cells
Aggregates large single-cell data into metacell dataset by merging together gene expression of very similar cells.
Maintained by The package maintainer. Last updated 8 months ago.
software coarse-graining scrna-seq-analysis scrna-seq-data
1.7 match 72 stars 8.08 score 93 scripts
jeremygelb
geocmeans:Implementing Methods for Spatial Fuzzy Unsupervised Classification
Provides functions to apply spatial fuzzy unsupervised classification, visualize and interpret results. This method is well suited when the user wants to analyze data with a fuzzy clustering algorithm and to account for the spatial dimension of the dataset. In addition, indexes for estimating the spatial consistency and classification quality are proposed. The methods were originally proposed in the field of brain imagery (see Cai et al. 2007 <doi:10.1016/j.patcog.2006.07.011> and Zhao et al. 2013 <doi:10.1016/j.dsp.2012.09.016>) and recently applied in geography (see Gelb and Apparicio <doi:10.4000/cybergeo.36414>).
Maintained by Jeremy Gelb. Last updated 4 months ago.
clustering cmeans fuzzy-classification-algorithms spatial-analysis spatial-fuzzy-cmeans unsupervised-learning cpp openmp
2.0 match 27 stars 6.08 score 90 scripts
weksi-budiaji
kmed:Distance-Based k-Medoids
Algorithms for distance-based k-medoids clustering: simple and fast k-medoids, ranked k-medoids, and increasing number of clusters in k-medoids. Calculates distances for mixed variable data such as Gower, Podani, Wishart, Huang, Harikumar-PV, and Ahmad-Dey. Cluster validation applies internal and relative criteria. The internal criteria include the silhouette index and shadow values. The relative criterion applies a bootstrap procedure producing a heatmap with a flexible reordering matrix algorithm such as complete, ward, or average linkages. The cluster result can be plotted in a marked barplot or PCA biplot.
Maintained by Weksi Budiaji. Last updated 3 years ago.
3.8 match 3.15 score 141 scripts
joemsong
GridOnClusters:Cluster-Preserving Multivariate Joint Grid Discretization
Discretize multivariate continuous data using a grid that captures the joint distribution via preserving clusters in the original data (Wang et al. 2020) <doi:10.1145/3388440.3412415>. Joint grid discretization is applicable as a data transformation step to prepare data for model-free inference of association, function, or causality.
Maintained by Joe Song. Last updated 10 months ago.
4.3 match 2.70 score 3 scripts
cran
Mercator:Clustering and Visualizing Distance Matrices
Defines the classes used to explore, cluster and visualize distance matrices, especially those arising from binary data. See Abrams and colleagues, 2021, <doi:10.1093/bioinformatics/btab037>.
Maintained by Kevin R. Coombes. Last updated 5 months ago.
2.7 match 4.33 score 12 scripts 1 dependents
bioc
scone:Single Cell Overview of Normalized Expression data
SCONE is an R package for comparing and ranking the performance of different normalization schemes for single-cell RNA-seq and other high-throughput analyses.
Maintained by Davide Risso. Last updated 26 days ago.
immunooncology normalization preprocessing qualitycontrol geneexpression rnaseq software transcriptomics sequencing singlecell coverage
1.3 match 53 stars 9.12 score 104 scripts
bioc
pipeComp:pipeComp pipeline benchmarking framework
A simple framework to facilitate the comparison of pipelines involving various steps and parameters. The `pipelineDefinition` class represents pipelines as, minimally, a set of functions consecutively executed on the output of the previous one, and optionally accompanied by step-wise evaluation and aggregation functions. Given such an object, a set of alternative parameters/methods, and benchmark datasets, the `runPipeline` function then proceeds through all combinations of arguments, avoiding recomputing the same step twice and compiling evaluations on the fly to avoid storing potentially large intermediate data.
Maintained by Pierre-Luc Germain. Last updated 5 months ago.
geneexpression transcriptomics clustering datarepresentation benchmark bioconductor pipeline-benchmarking pipelines single-cell-rna-seq
1.5 match 41 stars 7.02 score 43 scripts
keefe-murphy
MEDseq:Mixtures of Exponential-Distance Models with Covariates
Implements a model-based clustering method for categorical life-course sequences relying on mixtures of exponential-distance models introduced by Murphy et al. (2021) <doi:10.1111/rssa.12712>. A range of flexible precision parameter settings corresponding to weighted generalisations of the Hamming distance metric are considered, along with the potential inclusion of a noise component. Gating covariates can be supplied in order to relate sequences to baseline characteristics and sampling weights are also accommodated. The models are fitted using the EM algorithm and tools for visualising the results are also provided.
Maintained by Keefe Murphy. Last updated 7 days ago.
distance-based-clustering mixture-of-experts model-based-clustering sequence-analysis
1.9 match 5 stars 5.49 score 25 scripts
cran
pdfCluster:Cluster Analysis via Nonparametric Density Estimation
Cluster analysis via nonparametric density estimation is performed. Operationally, the kernel method is used throughout to estimate the density. Diagnostic methods for evaluating the quality of the clustering are available. The package also includes a routine to estimate the probability density function obtained by the kernel method, given a set of data with arbitrary dimensions.
Maintained by Menardi Giovanna. Last updated 2 years ago.
1.8 match 5 stars 5.66 score 196 scripts 12 dependents
cran
fclust:Fuzzy Clustering
Algorithms for fuzzy clustering, cluster validity indices and plots for cluster validity and visualizing fuzzy clustering results.
Maintained by Paolo Giordani. Last updated 2 years ago.
4.3 match 1 stars 2.38 score 2 dependents
matthias-studer
WeightedCluster:Clustering of Weighted Data
Clusters state sequences and weighted data. It provides an optimized weighted PAM algorithm as well as functions for aggregating replicated cases, computing cluster quality measures for a range of clustering solutions and plotting (fuzzy) clusters of state sequences. Parametric bootstrap methods to validate typologies of sequences are also provided. Finally, it provides a fuzzy and crisp CLARA algorithm to cluster large databases with sequence analysis.
Maintained by Matthias Studer. Last updated 3 months ago.
1.8 match 5.55 score 106 scripts 4 dependents
tiago-simoes
EvoPhylo:Pre- And Postprocessing of Morphological Data from Relaxed Clock Bayesian Phylogenetics
Performs automated morphological character partitioning for phylogenetic analyses and analyzes macroevolutionary parameter outputs from clock (time-calibrated) Bayesian inference analyses, following concepts introduced by Simões and Pierce (2021) <doi:10.1038/s41559-021-01532-x>.
Maintained by Tiago Simoes. Last updated 2 years ago.
1.7 match 4 stars 5.66 score 19 scripts
kisungyou
T4cluster:Tools for Cluster Analysis
Cluster analysis is one of the most fundamental problems in data science. We provide a variety of algorithms from clustering to the learning on the space of partitions. See Hennig, Meila, and Rocci (2016, ISBN:9781466551886) for general exposition to cluster analysis.
Maintained by Kisung You. Last updated 3 years ago.
2.3 match 6 stars 4.26 score 9 scripts 2 dependents
dgrun
RaceID:Identification of Cell Types, Inference of Lineage Trees, and Prediction of Noise Dynamics from Single-Cell RNA-Seq Data
Application of 'RaceID' allows inference of cell types and prediction of lineage trees by the 'StemID2' algorithm (Herman, J.S., Sagar, Grun D. (2018) <DOI:10.1038/nmeth.4662>). 'VarID2' is part of this package and allows quantification of biological gene expression noise at single-cell resolution (Rosales-Alvarez, R.E., Rettkowski, J., Herman, J.S., Dumbovic, G., Cabezas-Wallscheid, N., Grun, D. (2023) <DOI:10.1186/s13059-023-02974-1>).
Maintained by Dominic Grün. Last updated 4 months ago.
2.0 match 4.74 score 110 scripts
frbcesab
rutils:A Collection of R Functions
A collection of R functions commonly used in FRB-CESAB projects.
Maintained by Nicolas Casajus. Last updated 2 months ago.
2.0 match 2 stars 4.66 score 454 scripts
bioc
GenomicSuperSignature:Interpretation of RNA-seq experiments through robust, efficient comparison to public databases
This package provides a novel method for interpreting new transcriptomic datasets through near-instantaneous comparison to public archives without high-performance computing requirements. Through the pre-computed index, users can identify public resources associated with their dataset such as gene sets, MeSH term, and publication. Functions to identify interpretable annotations and intuitive visualization options are implemented in this package.
Maintained by Sehyun Oh. Last updated 5 months ago.
transcriptomics systemsbiology principalcomponent rnaseq sequencing pathways clustering bioconductor-package exploratory-data-analysis gsea mesh principal-component-analysis rna-sequencing-profiles transferlearning
1.3 match 16 stars 6.97 score 59 scripts
bioc
BERT:High Performance Data Integration for Large-Scale Analyses of Incomplete Omic Profiles Using Batch-Effect Reduction Trees (BERT)
Provides efficient batch-effect adjustment of data with missing values. BERT orders all batch effect correction to a tree of pairwise computations. BERT allows parallelization over sub-trees.
Maintained by Yannis Schumann. Last updated 2 months ago.
batcheffect preprocessing experimentaldesign qualitycontrol batch-effect bioconductor-package bioinformatics data-integration data-science
1.7 match 2 stars 5.40 score 18 scripts
bioc
SGCP:SGCP: A semi-supervised pipeline for gene clustering using self-training approach in gene co-expression networks
SGC is a semi-supervised pipeline for gene clustering in gene co-expression networks. SGC consists of multiple novel steps that enable the computation of highly enriched modules in an unsupervised manner. But unlike all existing frameworks, it further incorporates a novel step that leverages Gene Ontology information in a semi-supervised clustering method that further improves the quality of the computed modules.
Maintained by Niloofar AghaieAbiane. Last updated 5 months ago.
geneexpression genesetenrichment networkenrichment systemsbiology classification clustering dimensionreduction graphandnetwork neuralnetwork network mrnamicroarray rnaseq visualization bioinformatics genecoexpressionnetwork graphs networkclustering networks self-training semi-supervised-learning unsupervised-learning
1.8 match 2 stars 5.12 score 44 scripts
bioc
Polytect:An R package for digital data clustering
Polytect is an advanced computational tool designed for the analysis of multi-color digital PCR data. It provides automatic clustering and labeling of partitions into distinct groups based on clusters first identified by the flowPeaks algorithm. Polytect is particularly useful for researchers in molecular biology and bioinformatics, enabling them to gain deeper insights into their experimental results through precise partition classification and data visualization.
Maintained by Yao Chen. Last updated 3 months ago.
ddpcr clustering multichannel classification
1.9 match 4.74 score 4 scripts
mi2-warsaw
sejmRP:An Information About Deputies and Votings in Polish Diet from Seventh to Eighth Term of Office
Set of functions that access information about deputies and votings in Polish diet from webpage <http://www.sejm.gov.pl>. The package was developed as a result of an internship in MI2 Group - <http://mi2.mini.pw.edu.pl>, Faculty of Mathematics and Information Science, Warsaw University of Technology.
Maintained by Piotr Smuda. Last updated 8 years ago.
1.8 match 21 stars 5.04 score 35 scripts
bioc
SpectralTAD:SpectralTAD: Hierarchical TAD detection using spectral clustering
SpectralTAD is an R package designed to identify Topologically Associated Domains (TADs) from Hi-C contact matrices. It uses a modified version of spectral clustering that uses a sliding window to quickly detect TADs. The function works on a range of different formats of contact matrices and returns a bed file of TAD coordinates. The method does not require users to adjust any parameters to work and gives them control over the number of hierarchical levels to be returned.
Maintained by Mikhail Dozmorov. Last updated 5 months ago.
software hic sequencing featureextraction clustering
1.3 match 8 stars 6.53 score 17 scripts
acabassi
coca:Cluster-of-Clusters Analysis
Contains the R functions needed to perform Cluster-Of-Clusters Analysis (COCA) and Consensus Clustering (CC). For further details please see Cabassi and Kirk (2020) <doi:10.1093/bioinformatics/btaa593>.
Maintained by Alessandra Cabassi. Last updated 5 years ago.
cluster-analysis cluster-of-clusters clustering coca genomics integrative-clustering multi-omics
1.7 match 6 stars 5.03 score 12 scripts 1 dependents
jakobraymaekers
classmap:Visualizing Classification Results
Tools to visualize the results of a classification of cases. The graphical displays include stacked plots, silhouette plots, quasi residual plots, and class maps. Implements the techniques described and illustrated in Raymaekers, Rousseeuw and Hubert (2021), Class maps for visualizing classification results, Technometrics, appeared online. <doi:10.1080/00401706.2021.1927849> (open access) and Raymaekers and Rousseeuw (2021), Silhouettes and quasi residual plots for neural nets and tree-based classifiers, <arXiv:2106.08814>. Examples can be found in the vignettes: "Discriminant_analysis_examples", "K_nearest_neighbors_examples", "Support_vector_machine_examples", "Rpart_examples", "Random_forest_examples", and "Neural_net_examples".
Maintained by Jakob Raymaekers. Last updated 2 years ago.
2.6 match 3.08 score 20 scripts
clancylabuiuc
moRphomenses:Geometric Morphometric Tools to Align, Scale, and Compare "Shape" of Menstrual Cycle Hormones
Mitteroecker & Gunz (2009) <doi:10.1007/s11692-009-9055-x> describe how geometric morphometric methods allow researchers to quantify the size and shape of physical biological structures. We provide tools to extend geometric morphometric principles to the study of non-physical structures, hormone profiles, as outlined in Ehrlich et al. (2021) <doi:10.1002/ajpa.24514>. Easily transform daily measures into multivariate landmark-based data. Includes custom functions to apply multivariate methods for data exploration as well as hypothesis testing. Also includes a 'shiny' web app to streamline data exploration. Developed to study menstrual cycle hormones but functions have been generalized and should be applicable to any biomarker over any time period.
Maintained by Daniel Ehrlich. Last updated 2 months ago.
2.0 match 2 stars 4.04 score 4 scripts
cran
Kira:Machine Learning
Machine learning, containing several algorithms for supervised and unsupervised classification, in addition to a function that plots the Receiver Operating Characteristic (ROC) and Precision-Recall (PRC) curve graphs, and also a function that returns several metrics used for model evaluation; the latter can be used to rank results from other packages.
Maintained by Paulo Cesar Ossani. Last updated 6 months ago.
4.8 match 1.70 score
cran
dawai:Discriminant Analysis with Additional Information
In applications it is usual that some additional information is available. This package dawai (an acronym for Discriminant Analysis With Additional Information) performs linear and quadratic discriminant analysis with additional information expressed as inequality restrictions among the population means. It also computes several estimates of the true error rate.
Maintained by David Conde. Last updated 5 months ago.
4.0 match 2.00 score
ocbe-uio
DIscBIO:A User-Friendly Pipeline for Biomarker Discovery in Single-Cell Transcriptomics
An open, multi-algorithmic pipeline for easy, fast and efficient analysis of cellular sub-populations and the molecular signatures that characterize them. The pipeline consists of four successive steps: data pre-processing, cellular clustering with pseudo-temporal ordering, defining differentially expressed genes and biomarker identification. More details in Ghannoum et al. (2021) <doi:10.3390/ijms22031399>. This package implements extensions of the work published by Ghannoum et al. (2019) <doi:10.1101/700989>.
Maintained by Waldir Leoncio. Last updated 1 year ago.
biomarker-discovery jupyter-notebook scrna-seq single-cell-analysis transcriptomics openjdk
1.8 match 12 stars 4.38 score 5 scripts
biplab44
ctmva:Continuous-Time Multivariate Analysis
Implements a basis function or functional data analysis framework for several techniques of multivariate analysis in a continuous-time setting. Specifically, we introduce continuous-time analogues of several classical techniques of multivariate analysis, such as principal component analysis, canonical correlation analysis, Fisher linear discriminant analysis, K-means clustering, and so on. Details are in Biplab Paul, Philip T. Reiss and Erjia Cui (2023) "Continuous-time multivariate analysis" <doi:10.48550/arXiv.2307.09404>.
Maintained by Biplab Paul. Last updated 1 year ago.
7.9 match 1.00 score
jdmde
scellpam:Applying Partitioning Around Medoids to Single Cell Data with High Number of Cells
PAM (Partitioning Around Medoids) algorithm applied to samples from single-cell sequencing techniques with a high number of cells (as many as the computer memory allows). The package uses a binary format to store matrices (either full, sparse or symmetric) in files written to disk that can contain any data type (not just double), which allows them to be manipulated when memory is sufficient to load them as int or float, but not as double. The PAM implementation runs in parallel, using several or all of the machine's cores, if it has them. This package shares a great part of its code with packages 'jmatrix' and 'parallelpam' but their functionality is included here so there is no need to install them.
Maintained by Juan Domingo. Last updated 8 months ago.
2.8 match 2.78 score 9 scripts
wagner-s
MultIS:Reconstruction of Clones from Integration Site Readouts and Visualization
Tools necessary to reconstruct clonal affiliations from temporally and/or spatially separated measurements of viral integration sites. To this end, it uses correlations present in the relative readouts of the integration sites. Furthermore, facilities for filtering the data and visualizing different steps in the pipeline are provided with the package.
Maintained by Sebastian Wagner. Last updated 4 years ago.
3.5 match 2.00 score 1 scripts
promidat
discoveR:Exploratory Data Analysis System
Performs an exploratory data analysis through a 'shiny' interface. It includes basic methods such as the mean, median, mode, normality test, among others. It also includes clustering techniques such as Principal Components Analysis, Hierarchical Clustering and the K-Means Method.
Maintained by Oldemar Rodriguez. Last updated 2 years ago.
2.3 match 3 stars 3.03 score 18 scripts
uislambekov
TDAvec:Vector summaries of persistence diagrams
Provides tools for computing various vector summaries of persistence diagrams studied in Topological Data Analysis. For improved computational efficiency, all code for the vector summaries is written in 'C++' using the 'Rcpp' and 'RcppArmadillo' packages.
Maintained by Umar Islambekov. Last updated 8 days ago.
1.8 match 3.52 score 11 scripts
bioc
TMixClust:Time Series Clustering of Gene Expression with Gaussian Mixed-Effects Models and Smoothing Splines
Implementation of a clustering method for time series gene expression data based on mixed-effects models with Gaussian variables and non-parametric cubic splines estimation. The method can robustly account for the high levels of noise present in typical gene expression time series datasets.
Maintained by Monica Golumbeanu. Last updated 5 months ago.
software statisticalmethod clustering timecourse geneexpression
1.8 match 3.60 score 5 scripts
sugnet
ClusBoot:Bootstrap a Clustering Solution to Establish the Stability of the Clusters
Given a cluster allocation for n samples, together with either an $n \times p$ data matrix or an $n \times n$ distance matrix, a bootstrap procedure is performed. The proportion of bootstrap replicates in which a pair of samples falls in the same cluster indicates how tightly the samples in a particular cluster cluster together.
Maintained by Sugnet Lubbe. Last updated 8 months ago.
5.9 match 1.00 score 1 scripts
cran
aweSOM:Interactive Self-Organizing Maps
Self-organizing maps (also known as SOM, see Kohonen (2001) <doi:10.1007/978-3-642-56927-2>) are a method for dimensionality reduction and clustering of continuous data. This package introduces interactive (html) graphics for easier analysis of SOM results. It also features an interactive interface, for push-button training and visualization of SOM on numeric, categorical or mixed data, as well as tools to evaluate the quality of SOM.
Maintained by Julien Boelaert. Last updated 3 years ago.
1.9 match 3 stars 2.95 score 1 dependents
ifelipebj
PrometheeTools:PROMETHEE and GLNF for Ranking and Sorting Problems
PROMETHEE (Preference Ranking Organisation METHod for Enrichment of Evaluations) based method assesses alternatives to obtain partial and complete rankings. The package also provides the GLNF (Global Local Net Flow) sorting algorithm to classify alternatives into ordered categories, as well as an index function to measure the classification quality. Barrera, F., Segura, M., & Maroto, C. (2023) <doi:10.1111/itor.13288>. Brans, J.P.; De Smet, Y., (2016) <doi:10.1007/978-1-4939-3094-4_6>.
Maintained by Felipe Barrera. Last updated 2 years ago.
1.9 match 2.70 score 1 scripts
cran
RSKC:Robust Sparse K-Means
This RSKC package contains a function RSKC which runs the robust sparse K-means clustering algorithm.
Maintained by Yumi Kondo. Last updated 9 years ago.
2.3 match 1.97 score 31 scripts 1 dependents
cran
bios2mds:From Biological Sequences to Multidimensional Scaling
Utilities dedicated to the analysis of biological sequences by metric MultiDimensional Scaling with projection of supplementary data. It contains functions for reading multiple sequence alignment files, calculating distance matrices, performing metric multidimensional scaling and visualizing results.
Maintained by Marie Chabbert. Last updated 5 years ago.
2.3 match 1 stars 1.90 score
kisungyou
TDAkit:Toolkit for Topological Data Analysis
Topological data analysis studies structure and shape of the data using topological features. We provide a variety of algorithms to learn with persistent homology of the data based on functional summaries for clustering, hypothesis testing, visualization, and others. We refer to Wasserman (2018) <doi:10.1146/annurev-statistics-031017-100045> for a statistical perspective on the topic.
Maintained by Kisung You. Last updated 4 years ago.
1.8 match 2 stars 2.30 score 4 scripts
cran
ExcessMass:Excess Mass Calculation and Plots
Implementation of a function which calculates the empirical excess mass for a given $\lambda$ and a given maximal number of modes (excessm()). Offers powerful plot features to visualize empirical excess mass (exmplot()). This includes the possibility of drawing several plots (with different maximal number of modes / cut off values) in a single graph.
Maintained by Marc-Daniel Mildenberger. Last updated 3 years ago.
3.8 match 1.00 score
vpihur
clValid:Validation of Clustering Results
Statistical and biological validation of clustering results. This package implements Dunn Index, Silhouette, Connectivity, Stability, BHI and BSI. Further information can be found in Brock, G et al. (2008) <doi:10.18637/jss.v025.i04>.
Maintained by Vasyl Pihur. Last updated 4 years ago.
0.5 match 5 stars 7.19 score 422 scripts 14 dependents
gagolews
genieclust:Fast and Robust Hierarchical Clustering with Noise Points Detection
A retake on the Genie algorithm (Gagolewski, 2021 <DOI:10.1016/j.softx.2021.100722>), which is a robust hierarchical clustering method (Gagolewski, Bartoszuk, Cena, 2016 <DOI:10.1016/j.ins.2016.05.003>). It is now faster and more memory efficient; determining the whole cluster hierarchy for datasets of 10M points in low dimensional Euclidean spaces or 100K points in high-dimensional ones takes only a minute or so. Allows clustering with respect to mutual reachability distances so that it can act as a noise point detector or a robustified version of 'HDBSCAN*' (that is able to detect a predefined number of clusters and hence it does not depend on the somewhat fragile 'eps' parameter). The package also features an implementation of inequality indices (e.g., Gini and Bonferroni), external cluster validity measures (e.g., the normalised clustering accuracy, the adjusted Rand index, the Fowlkes-Mallows index, and normalised mutual information), and internal cluster validity indices (e.g., the Calinski-Harabasz, Davies-Bouldin, Ball-Hall, Silhouette, and generalised Dunn indices). See also the 'Python' version of 'genieclust' available on 'PyPI', which supports sparse data, more metrics, and even larger datasets.
Maintained by Marek Gagolewski. Last updated 5 days ago.
cluster-analysis clustering clustering-algorithm data-analysis data-mining data-science genie hdbscan hierarchical-clustering hierarchical-clustering-algorithm machine-learning machine-learning-algorithms mlpack nmslib python python3 sparse cpp openmp
0.5 match 61 stars 7.29 score 13 scripts 5 dependents
cran
IntNMF:Integrative Clustering of Multiple Genomic Dataset
Carries out integrative clustering analysis on multiple types of genomic datasets using integrative non-negative matrix factorization.
Maintained by Prabhakar Chalise. Last updated 2 months ago.
1.9 match 2 stars 1.79 score 31 scripts
cran
SLBDD:Statistical Learning for Big Dependent Data
Programs for analyzing large-scale time series data. They include functions for automatic specification and estimation of univariate time series, for clustering time series, for multivariate outlier detections, for quantile plotting of many time series, for dynamic factor models and for creating input data for deep learning programs. Examples of using the package can be found in the Wiley book 'Statistical Learning with Big Dependent Data' by Daniel Peña and Ruey S. Tsay (2021). ISBN 9781119417385.
Maintained by Antonio Elias. Last updated 3 years ago.
1.8 match 1.56 score 12 scripts 1 dependents
cran
SillyPutty:Silly Putty Clustering
Implements a simple, novel clustering algorithm based on optimizing the silhouette width. See <doi:10.1101/2023.11.07.566055> for details.
Maintained by Kevin R. Coombes. Last updated 1 year ago.
0.5 match 2.00 score
quantmeth
image2data:Turn Images into Data Sets
The goal of 'image2data' is to extract images and return them as a data set, especially for teaching data manipulation and data visualization. Basically, the eponymous function takes an image file ('png', 'tiff', 'jpeg', 'bmp') and turns it into a data set, with pixels as rows (subjects) and columns (variables) holding their coordinate positions (x- and y-axis) and their respective color (in hex codes). The function can return a complete image or a range of colors (i.e., contour, silhouette). The data can then be manipulated as would any data set by either creating other related variables (to hide the image) or as a genuine toy data set.
Maintained by P.-O. Caron. Last updated 3 years ago.
0.5 match 2.00 score 1 scripts
buybnb
GIC:A General Iterative Clustering Algorithm
An iterative algorithm that improves the proximity matrix (PM) from a random forest (RF) and the resulting clusters as measured by the silhouette score.
Maintained by Ziqiang Lin. Last updated 3 years ago.
0.5 match 1.70 score
gianmarcoalberti
brsim:Brainerd-Robinson Similarity Coefficient Matrix
Provides the facility to calculate the Brainerd-Robinson similarity coefficient for the rows of an input table, and to calculate the significance of each coefficient based on a permutation approach; a heatmap is produced to visually represent the similarity matrix. Optionally, hierarchical agglomerative clustering can be performed and the silhouette method is used to identify an optimal number of clusters; the results of the clustering can be optionally used to sort the heatmap.
Maintained by Gianmarco Alberti. Last updated 1 year ago.
0.5 match 1.70 score