Showing 190 of 190 results
vegandevs
vegan:Community Ecology Package
Ordination methods, diversity analysis and other functions for community and vegetation ecologists.
Maintained by Jari Oksanen. Last updated 16 days ago.
ecological-modelling ecology ordination fortran openblas
23.0 match 472 stars 19.41 score 15k scripts 440 dependents
blasbenito
distantia:Advanced Toolset for Efficient Time Series Dissimilarity Analysis
Fast C++ implementation of Dynamic Time Warping for time series dissimilarity analysis, with applications in environmental monitoring and sensor data analysis, climate science, signal processing and pattern recognition, and financial data analysis. Built upon the ideas presented in Benito and Birks (2020) <doi:10.1111/ecog.04895>, provides tools for analyzing time series of varying lengths and structures, including irregular multivariate time series. Key features include individual variable contribution analysis, restricted permutation tests for statistical significance, and imputation of missing data via GAMs. Additionally, the package provides an ample set of tools to prepare and manage time series data.
Maintained by Blas M. Benito. Last updated 25 days ago.
dissimilarity dynamic-time-warping lock-step time-series cpp
55.7 match 23 stars 5.76 score 11 scripts
fitzlab-al
gdm:Generalized Dissimilarity Modeling
A toolkit with functions to fit, plot, summarize, and apply Generalized Dissimilarity Models. Mokany K, Ware C, Woolley SNC, Ferrier S, Fitzpatrick MC (2022) <doi:10.1111/geb.13459> Ferrier S, Manion G, Elith J, Richardson K (2007) <doi:10.1111/j.1472-4642.2007.00341.x>.
Maintained by Matt Fitzpatrick. Last updated 2 months ago.
22.3 match 35 stars 8.12 score 145 scripts
gavinsimpson
analogue:Analogue and Weighted Averaging Methods for Palaeoecology
Fits Modern Analogue Technique and Weighted Averaging transfer function models for prediction of environmental data from species data, and related methods used in palaeoecology.
Maintained by Gavin L. Simpson. Last updated 6 months ago.
20.0 match 14 stars 8.96 score 185 scripts 4 dependents
pmontman
TSclust:Time Series Clustering Utilities
A set of measures of dissimilarity between time series to perform time series clustering. Metrics based on raw data, on generating models and on the forecast behavior are implemented. Some additional utilities related to time series clustering are also provided, such as clustering algorithms and cluster evaluation metrics.
Maintained by Pablo Montero Manso. Last updated 5 years ago.
27.8 match 2 stars 5.76 score 170 scripts 8 dependents
l-ramirez-lopez
resemble:Memory-Based Learning in Spectral Chemometrics
Functions for dissimilarity analysis and memory-based learning (MBL, a.k.a local modeling) in complex spectral data sets. Most of these functions are based on the methods presented in Ramirez-Lopez et al. (2013) <doi:10.1016/j.geoderma.2012.12.014>.
Maintained by Leonardo Ramirez-Lopez. Last updated 2 years ago.
chemoinformatics chemometrics infrared-spectroscopy lazy-learning local-regression machine-learning memory-based-learning nir pedometrics soil-spectroscopy spectral-data spectral-library spectroscopy openblas cpp openmp
23.4 match 20 stars 5.91 score 27 scripts
bioc
goSorensen:Statistical inference based on the Sorensen-Dice dissimilarity and the Gene Ontology (GO)
This package implements inferential methods to compare gene lists in terms of their biological meaning as expressed in the GO. The compared gene lists are characterized by cross-tabulation frequency tables of enriched GO items. Dissimilarity between gene lists is evaluated using the Sorensen-Dice index. The fundamental guiding principle is that two gene lists are taken as similar if they share a great proportion of common enriched GO items.
Maintained by Pablo Flores. Last updated 5 months ago.
annotation go genesetenrichment software microarray pathways geneexpression multiplecomparison graphandnetwork reactome clustering kegg
27.4 match 4.56 score 12 scripts
phiala
ecodist:Dissimilarity-Based Functions for Ecological Analysis
Dissimilarity-based analysis functions including ordination and Mantel test functions, intended for use with spatial and community ecological data. The original package description is in Goslee and Urban (2007) <doi:10.18637/jss.v022.i07>, with further statistical detail in Goslee (2010) <doi:10.1007/s11258-009-9641-0>.
Maintained by Sarah Goslee. Last updated 1 year ago.
12.5 match 9 stars 9.84 score 566 scripts 9 dependents
traminer
TraMineR:Trajectory Miner: a Sequence Analysis Toolkit
Set of sequence analysis tools for manipulating, describing and rendering categorical sequences, and more generally mining sequence data in the field of social sciences. Although this sequence analysis package is primarily intended for state or event sequences that describe time use or life courses such as family formation histories or professional careers, its features also apply to many other kinds of categorical sequence data. It accepts many different sequence representations as input and provides tools for converting sequences from one format to another. It offers several functions for describing and rendering sequences, for computing distances between sequences with different metrics (among which optimal matching), original dissimilarity-based analysis tools, and functions for extracting the most frequent event subsequences and identifying the most discriminating ones among them. A user's guide can be found on the TraMineR web page.
Maintained by Gilbert Ritschard. Last updated 3 months ago.
14.0 match 11 stars 8.24 score 534 scripts 13 dependents
kurthornik
clue:Cluster Ensembles
CLUster Ensembles.
Maintained by Kurt Hornik. Last updated 4 months ago.
11.0 match 2 stars 9.85 score 496 scripts 401 dependents
pavel-fibich
gawdis:Multi-Trait Dissimilarity with more Uniform Contributions
R function gawdis() produces multi-trait dissimilarity with more uniform contributions of different traits. de Bello et al. (2021) <doi:10.1111/2041-210X.13537> presented the approach based on minimizing the differences in the correlation between the dissimilarity of each trait, or groups of traits, and the multi-trait dissimilarity. This is done using either an analytic or a numerical solution, both available in the function.
Maintained by Pavel Fibich. Last updated 2 years ago.
dissimilarity fd gowdis multi-trait-dissimilarity trait
20.6 match 5 stars 5.20 score 21 scripts 1 dependent
cran
TSdist:Distance Measures for Time Series Data
A set of commonly used distance measures and some additional functions which, although initially not designed for this purpose, can be used to measure the dissimilarity between time series. These measures can be used to perform clustering, classification or other data mining tasks which require the definition of a distance measure between time series. U. Mori, A. Mendiburu and J.A. Lozano (2016), <doi:10.32614/RJ-2016-058>.
Maintained by Usue Mori. Last updated 3 years ago.
27.6 match 5 stars 3.85 score 94 scripts 5 dependents
cran
betapart:Partitioning Beta Diversity into Turnover and Nestedness Components
Functions to compute pair-wise dissimilarities (distance matrices) and multiple-site dissimilarities, separating the turnover and nestedness-resultant components of taxonomic (incidence and abundance based), functional and phylogenetic beta diversity.
Maintained by Andres Baselga. Last updated 2 years ago.
26.2 match 2 stars 3.97 score 6 dependents
plangfelder
WGCNA:Weighted Correlation Network Analysis
Functions necessary to perform Weighted Correlation Network Analysis on high-dimensional data as originally described in Horvath and Zhang (2005) <doi:10.2202/1544-6115.1128> and Langfelder and Horvath (2008) <doi:10.1186/1471-2105-9-559>. Includes functions for rudimentary data cleaning, construction of correlation networks, module identification, summarization, and relating of variables and modules to sample traits. Also includes a number of utility functions for data manipulation and visualization.
Maintained by Peter Langfelder. Last updated 6 months ago.
10.8 match 54 stars 9.65 score 5.3k scripts 32 dependents
biorgeo
bioregion:Comparison of Bioregionalisation Methods
The main purpose of this package is to propose a transparent methodological framework to compare bioregionalisation methods based on hierarchical and non-hierarchical clustering algorithms (Kreft & Jetz (2010) <doi:10.1111/j.1365-2699.2010.02375.x>) and network algorithms (Lenormand et al. (2019) <doi:10.1002/ece3.4718> and Leroy et al. (2019) <doi:10.1111/jbi.13674>).
Maintained by Maxime Lenormand. Last updated 10 days ago.
biogeography bioregion bioregionalization cpp
16.1 match 7 stars 6.27 score 11 scripts
hpetren
chemodiv:Analysing Chemodiversity of Phytochemical Data
Quantify and visualise various measures of chemical diversity and dissimilarity, for phytochemical compounds and other sets of chemical composition data. Importantly, these measures can incorporate biosynthetic and/or structural properties of the chemical compounds, resulting in a more comprehensive quantification of diversity and dissimilarity. For details, see Petrén, Köllner and Junker (2023) <doi:10.1111/nph.18685>.
Maintained by Hampus Petrén. Last updated 2 years ago.
22.0 match 5 stars 4.57 score 15 scripts
mhahsler
seriation:Infrastructure for Ordering Objects Using Seriation
Infrastructure for ordering objects with an implementation of several seriation/sequencing/ordination techniques to reorder matrices, dissimilarity matrices, and dendrograms. Also provides (optimally) reordered heatmaps, color images and clustering visualizations like dissimilarity plots, and visual assessment of cluster tendency plots (VAT and iVAT). Hahsler et al (2008) <doi:10.18637/jss.v025.i03>.
Maintained by Michael Hahsler. Last updated 3 months ago.
combinatorial-optimization ordination seriation fortran
6.7 match 77 stars 14.07 score 640 scripts 79 dependents
mmaechler
cluster:"Finding Groups in Data": Cluster Analysis Extended Rousseeuw et al.
Methods for cluster analysis. A much extended version of the original by Peter Rousseeuw, Anja Struyf and Mia Hubert, based on Kaufman and Rousseeuw (1990) "Finding Groups in Data".
Maintained by Martin Maechler. Last updated 4 days ago.
7.6 match 3 stars 11.98 score 14k scripts 2.2k dependents
trinker
qdap:Bridging the Gap Between Qualitative Data and Quantitative Analysis
Automates many of the tasks associated with quantitative discourse analysis of transcripts containing discourse including frequency counts of sentence types, words, sentences, turns of talk, syllables and other assorted analysis tasks. The package provides parsing tools for preparing transcript data. Many functions enable the user to aggregate data by any number of grouping variables, providing analysis and seamless integration with other R packages that undertake higher level analysis and visualization of text. This affords the user a more efficient and targeted analysis. 'qdap' is designed for transcript analysis, however, many functions are applicable to other areas of Text Mining/ Natural Language Processing.
Maintained by Tyler Rinker. Last updated 4 years ago.
qdap quantitative-discourse-analysis text-analysis text-mining text-plotting openjdk
7.3 match 176 stars 9.61 score 1.3k scripts 3 dependents
jarioksa
natto:An Extreme 'vegan' Package of Experimental Code
Random code that is too experimental or too weird to be included in the vegan package.
Maintained by Jari Oksanen. Last updated 28 days ago.
14.6 match 8 stars 4.68 score 1 script
pmair78
smacof:Multidimensional Scaling
Implements the following approaches for multidimensional scaling (MDS) based on stress minimization using majorization (smacof): ratio/interval/ordinal/spline MDS on symmetric dissimilarity matrices, MDS with external constraints on the configuration, individual differences scaling (idioscal, indscal), MDS with spherical restrictions, and ratio/interval/ordinal/spline unfolding (circular restrictions, row-conditional). Various tools and extensions like jackknife MDS, bootstrap MDS, permutation tests, MDS biplots, gravity models, unidimensional scaling, drift vectors (asymmetric MDS), classical scaling, and Procrustes are implemented as well.
Maintained by Patrick Mair. Last updated 5 months ago.
8.6 match 5 stars 7.86 score 152 scripts 24 dependents
mhahsler
arules:Mining Association Rules and Frequent Itemsets
Provides the infrastructure for representing, manipulating and analyzing transaction data and patterns (frequent itemsets and association rules). Also provides C implementations of the association mining algorithms Apriori and Eclat. Hahsler, Gruen and Hornik (2005) <doi:10.18637/jss.v014.i15>.
Maintained by Michael Hahsler. Last updated 1 month ago.
arules association-rules frequent-itemsets
4.8 match 194 stars 13.99 score 3.3k scripts 28 dependents
dvrbts
labdsv:Ordination and Multivariate Analysis for Ecology
A variety of ordination and community analyses useful in analysis of data sets in community ecology. Includes many of the common ordination methods, with graphical routines to facilitate their interpretation, as well as several novel analyses.
Maintained by David W. Roberts. Last updated 2 years ago.
9.6 match 3 stars 6.08 score 452 scripts 13 dependents
grahamjwhite
IndexNumR:Index Number Calculation
Computes bilateral and multilateral index numbers. It has support for many standard bilateral indexes as well as multilateral index number methods such as GEKS, GEKS-Tornqvist (or CCDI), Geary-Khamis and the weighted time product dummy (for details on these methods see Diewert and Fox (2020) <doi:10.1080/07350015.2020.1816176>). It also supports updating of multilateral indexes using several splicing methods.
Maintained by Graham White. Last updated 1 year ago.
8.5 match 15 stars 6.20 score 71 scripts 1 dependent
mhahsler
recommenderlab:Lab for Developing and Testing Recommender Algorithms
Provides a research infrastructure to develop and evaluate collaborative filtering recommender algorithms. This includes a sparse representation for user-item matrices, many popular algorithms, top-N recommendations, and cross-validation. Hahsler (2022) <doi:10.48550/arXiv.2205.12371>.
Maintained by Michael Hahsler. Last updated 7 months ago.
collaborative-filtering recommender-system
4.8 match 214 stars 10.07 score 840 scripts 2 dependents
sandrinepavoine
adiv:Analysis of Diversity
Functions, data sets and examples for the calculation of various indices of biodiversity including species, functional and phylogenetic diversity. Part of the indices are expressed in terms of equivalent numbers of species. The package also provides ways to partition biodiversity across spatial or temporal scales (alpha, beta, gamma diversities). In addition to the quantification of biodiversity, ordination approaches are available which rely on diversity indices and allow the detailed identification of species, functional or phylogenetic differences between communities.
Maintained by Sandrine Pavoine. Last updated 1 year ago.
20.4 match 1 star 2.28 score 63 scripts
sergioventurini
dmbc:Model Based Clustering of Binary Dissimilarity Measurements
Functions for fitting a Bayesian model for grouping binary dissimilarity matrices in homogeneous clusters. Currently, it includes methods only for binary data (<doi:10.18637/jss.v100.i16>).
Maintained by Sergio Venturini. Last updated 6 months ago.
14.0 match 2 stars 3.30 score 4 scripts
emf-creaf
vegclust:Fuzzy Clustering of Vegetation Data
A set of functions to: (1) perform fuzzy clustering of vegetation data (De Caceres et al, 2010) <doi:10.1111/j.1654-1103.2010.01211.x>; (2) assess ecological community similarity on the basis of structure and composition (De Caceres et al, 2013) <doi:10.1111/2041-210X.12116>.
Maintained by Miquel De Cáceres. Last updated 8 months ago.
7.1 match 2 stars 6.28 score 52 scripts 6 dependents
markmfredrickson
optmatch:Functions for Optimal Matching
Distance based bipartite matching using minimum cost flow, oriented to matching of treatment and control groups in observational studies ('Hansen' and 'Klopfer' 2006 <doi:10.1198/106186006X137047>). Routines are provided to generate distances from generalised linear models (propensity score matching), formulas giving variables on which to limit matched distances, stratified or exact matching directives, or calipers, alone or in combination.
Maintained by Josh Errickson. Last updated 3 months ago.
3.6 match 47 stars 12.22 score 588 scripts 5 dependents
hannameyer
CAST:'caret' Applications for Spatial-Temporal Models
Supporting functionality to run 'caret' with spatial or spatial-temporal data. 'caret' is a frequently used package for model training and prediction using machine learning. CAST includes functions to improve spatial or spatial-temporal modelling tasks using 'caret'. It includes the newly suggested 'Nearest neighbor distance matching' cross-validation to estimate the performance of spatial prediction models and allows for spatial variable selection to select suitable predictor variables in view of their contribution to the spatial model performance. CAST further includes functionality to estimate the (spatial) area of applicability of prediction models. Methods are described in Meyer et al. (2018) <doi:10.1016/j.envsoft.2017.12.001>; Meyer et al. (2019) <doi:10.1016/j.ecolmodel.2019.108815>; Meyer and Pebesma (2021) <doi:10.1111/2041-210X.13650>; Milà et al. (2022) <doi:10.1111/2041-210X.13851>; Meyer and Pebesma (2022) <doi:10.1038/s41467-022-29838-9>; Linnenbrink et al. (2023) <doi:10.5194/egusphere-2023-1308>; Schumacher et al. (2024) <doi:10.5194/egusphere-2024-2730>. The package is described in detail in Meyer et al. (2024) <doi:10.48550/arXiv.2404.06978>.
Maintained by Hanna Meyer. Last updated 2 months ago.
autocorrelation caret feature-selection machine-learning overfitting predictive-modeling spatial spatio-temporal variable-selection
3.5 match 114 stars 11.97 score 298 scripts 1 dependent
mlampros
ClusterR:Gaussian Mixture Models, K-Means, Mini-Batch-Kmeans, K-Medoids and Affinity Propagation Clustering
Gaussian mixture models, k-means, mini-batch-kmeans, k-medoids and affinity propagation clustering with the option to plot, validate, predict (new data) and estimate the optimal number of clusters. The package takes advantage of 'RcppArmadillo' to speed up the computationally intensive parts of the functions. For more information, see (i) "Clustering in an Object-Oriented Environment" by Anja Struyf, Mia Hubert, Peter Rousseeuw (1997), Journal of Statistical Software, <doi:10.18637/jss.v001.i04>; (ii) "Web-scale k-means clustering" by D. Sculley (2010), ACM Digital Library, <doi:10.1145/1772690.1772862>; (iii) "Armadillo: a template-based C++ library for linear algebra" by Sanderson et al (2016), The Journal of Open Source Software, <doi:10.21105/joss.00026>; (iv) "Clustering by Passing Messages Between Data Points" by Brendan J. Frey and Delbert Dueck, Science 16 Feb 2007: Vol. 315, Issue 5814, pp. 972-976, <doi:10.1126/science.1136800>.
Maintained by Lampros Mouselimis. Last updated 9 months ago.
affinity-propagation cpp11 gmm kmeans kmedoids-clustering mini-batch-kmeans rcpparmadillo openblas cpp openmp
3.6 match 84 stars 11.08 score 640 scripts 24 dependents
jacekbialek
PriceIndices:Calculating Bilateral and Multilateral Price Indexes
Preparing a scanner data set for price dynamics calculations (data selecting, data classification, data matching, data filtering). Computing bilateral and multilateral indexes. For details on these methods see: Diewert and Fox (2020) <doi:10.1080/07350015.2020.1816176>, Białek (2019) <doi:10.2478/jos-2019-0014> or Białek (2020) <doi:10.2478/jos-2020-0037>.
Maintained by Jacek Białek. Last updated 2 months ago.
6.4 match 11 stars 6.06 score 16 scripts
topepo
caret:Classification and Regression Training
Misc functions for training and plotting classification and regression models.
Maintained by Max Kuhn. Last updated 3 months ago.
2.0 match 1.6k stars 19.24 score 61k scripts 303 dependents
grunwaldlab
poppr:Genetic Analysis of Populations with Mixed Reproduction
Population genetic analyses for hierarchical analysis of partially clonal populations built upon the architecture of the 'adegenet' package. Originally described in Kamvar, Tabima, and Grünwald (2014) <doi:10.7717/peerj.281> with version 2.0 described in Kamvar, Brooks, and Grünwald (2015) <doi:10.3389/fgene.2015.00208>.
Maintained by Zhian N. Kamvar. Last updated 10 months ago.
clonality genetic-analysis genetic-distances minimum-spanning-networks multilocus-genotypes multilocus-lineages population-genetics populations openmp
3.5 match 69 stars 10.84 score 672 scripts
elbersb
segregation:Entropy-Based Segregation Indices
Computes segregation indices, including the Index of Dissimilarity, as well as the information-theoretic indices developed by Theil (1971) <isbn:978-0471858454>, namely the Mutual Information Index (M) and Theil's Information Index (H). The M, further described by Mora and Ruiz-Castillo (2011) <doi:10.1111/j.1467-9531.2011.01237.x> and Frankel and Volij (2011) <doi:10.1016/j.jet.2010.10.008>, is a measure of segregation that is highly decomposable. The package provides tools to decompose the index by units and groups (local segregation), and by within and between terms. The package also provides a method to decompose differences in segregation as described by Elbers (2021) <doi:10.1177/0049124121986204>. The package includes standard error estimation by bootstrapping, which also corrects for small sample bias. The package also contains functions for visualizing segregation patterns.
Maintained by Benjamin Elbers. Last updated 1 year ago.
entropy segregation statistics cpp
5.5 match 36 stars 6.44 score 51 scripts
matthewkling
phylospatial:Spatial Phylogenetic Analysis
Conduct various analyses on spatial phylogenetics. Use your data on an evolutionary tree and geographic distributions of the terminal taxa to compute diversity and endemism metrics, test significance with null model randomization, analyze community turnover and biotic regionalization, and perform spatial conservation prioritizations. All functions support quantitative community data in addition to binary data.
Maintained by Matthew Kling. Last updated 4 days ago.
5.8 match 6 stars 6.16 score 9 scripts
leondap
recluster:Ordination Methods for the Analysis of Beta-Diversity Indices
The analysis of different aspects of biodiversity requires specific algorithms. For example, in regionalisation analyses, the high frequency of ties and zero values in dissimilarity matrices produced by Beta-diversity turnover produces hierarchical cluster dendrograms whose topology and bootstrap supports are affected by the order of rows in the original matrix. Moreover, visualisation of biogeographical regionalisation can be facilitated by a combination of hierarchical clustering and multi-dimensional scaling. The recluster package provides robust techniques to visualise and analyse patterns of biodiversity and to improve occurrence data for cryptic taxa.
Maintained by Leonardo Dapporto. Last updated 4 months ago.
7.3 match 4 stars 4.69 score 41 scripts
chrhennig
fpc:Flexible Procedures for Clustering
Various methods for clustering and cluster validation. Fixed point clustering. Linear regression clustering. Clustering by merging Gaussian mixture components. Symmetric and asymmetric discriminant projections for visualisation of the separation of groupings. Cluster validation statistics for distance based clustering including corrected Rand index. Standardisation of cluster validation statistics by random clusterings and comparison between many clustering methods and numbers of clusters based on this. Cluster-wise cluster stability assessment. Methods for estimation of the number of clusters: Calinski-Harabasz, Tibshirani and Walther's prediction strength, Fang and Wang's bootstrap stability. Gaussian/multinomial mixture fitting for mixed continuous/categorical variables. Variable-wise statistics for cluster interpretation. DBSCAN clustering. Interface functions for many clustering methods implemented in R, including estimating the number of clusters with kmeans, pam and clara. Modality diagnosis for Gaussian mixtures. For an overview see package?fpc.
Maintained by Christian Hennig. Last updated 6 months ago.
3.6 match 11 stars 9.25 score 2.6k scripts 70 dependents
r-spatial
spdep:Spatial Dependence: Weighting Schemes, Statistics
A collection of functions to create spatial weights matrix objects from polygon 'contiguities', from point patterns by distance and tessellations, for summarizing these objects, and for permitting their use in spatial data analysis, including regional aggregation by minimum spanning tree; a collection of tests for spatial 'autocorrelation', including global 'Morans I' and 'Gearys C' proposed by 'Cliff' and 'Ord' (1973, ISBN: 0850860369) and (1981, ISBN: 0850860814), 'Hubert/Mantel' general cross product statistic, Empirical Bayes estimates and 'Assunção/Reis' (1999) <doi:10.1002/(SICI)1097-0258(19990830)18:16%3C2147::AID-SIM179%3E3.0.CO;2-I> Index, 'Getis/Ord' G ('Getis' and 'Ord' 1992) <doi:10.1111/j.1538-4632.1992.tb00261.x> and multicoloured join count statistics, 'APLE' ('Li et al.') <doi:10.1111/j.1538-4632.2007.00708.x>, local 'Moran's I', 'Gearys C' ('Anselin' 1995) <doi:10.1111/j.1538-4632.1995.tb00338.x> and 'Getis/Ord' G ('Ord' and 'Getis' 1995) <doi:10.1111/j.1538-4632.1995.tb00912.x>, 'saddlepoint' approximations ('Tiefelsdorf' 2002) <doi:10.1111/j.1538-4632.2002.tb01084.x> and exact tests for global and local 'Moran's I' ('Bivand et al.' 2009) <doi:10.1016/j.csda.2008.07.021> and 'LOSH' local indicators of spatial heteroscedasticity ('Ord' and 'Getis') <doi:10.1007/s00168-011-0492-y>. The implementation of most of these measures is described in 'Bivand' and 'Wong' (2018) <doi:10.1007/s11749-018-0599-x>, with further extensions in 'Bivand' (2022) <doi:10.1111/gean.12319>. 'Lagrange' multiplier tests for spatial dependence in linear models are provided ('Anselin et al'. 1996) <doi:10.1016/0166-0462(95)02111-6>, as are 'Rao' score tests for hypothesised spatial 'Durbin' models based on linear models ('Koley' and 'Bera' 2023) <doi:10.1080/17421772.2023.2256810>. A local indicators for categorical data (LICD) implementation based on 'Carrer et al.' (2021) <doi:10.1016/j.jas.2020.105306> and 'Bivand et al.' (2017) <doi:10.1016/j.spasta.2017.03.003> was added in 1.3-7. From 'spdep' and 'spatialreg' versions >= 1.2-1, the model fitting functions previously present in this package are defunct in 'spdep' and may be found in 'spatialreg'.
Maintained by Roger Bivand. Last updated 18 days ago.
spatial-autocorrelation spatial-dependence spatial-weights
2.0 match 131 stars 16.62 score 6.0k scripts 107 dependents
chrhennig
prabclus:Functions for Clustering and Testing of Presence-Absence, Abundance and Multilocus Genetic Data
Distance-based parametric bootstrap tests for clustering with spatial neighborhood information. Some distance measures, clustering of presence-absence, abundance and multilocus genetic data for species delimitation, nearest neighbor based noise detection. Genetic distances between communities. Tests whether various distance-based regressions are equal. Try package?prabclus for an overview.
Maintained by Christian Hennig. Last updated 6 months ago.
5.4 match 1 star 5.99 score 90 scripts 71 dependents
alarm-redist
redistmetrics:Redistricting Metrics
Reliable and flexible tools for scoring redistricting plans using common measures and metrics. These functions provide key direct access to tools useful for non-simulation analyses of redistricting plans, such as for measuring compactness or partisan fairness. Tools are designed to work with the 'redist' package seamlessly.
Maintained by Christopher T. Kenny. Last updated 9 months ago.
4.0 match 10 stars 7.57 score 23 scripts 2 dependents
adeverse
ade4:Analysis of Ecological Data: Exploratory and Euclidean Methods in Environmental Sciences
Tools for multivariate data analysis. Several methods are provided for the analysis (i.e., ordination) of one-table (e.g., principal component analysis, correspondence analysis), two-table (e.g., coinertia analysis, redundancy analysis), three-table (e.g., RLQ analysis) and K-table (e.g., STATIS, multiple coinertia analysis). The philosophy of the package is described in Dray and Dufour (2007) <doi:10.18637/jss.v022.i04>.
Maintained by Aurélie Siberchicot. Last updated 12 days ago.
2.0 match 39 stars 14.96 score 2.2k scripts 256 dependents
bioc
flowMatch:Matching and meta-clustering in flow cytometry
Matching cell populations and building meta-clusters and templates from a collection of FC samples.
Maintained by Ariful Azad. Last updated 5 months ago.
immunooncology clustering flowcytometry cpp
7.3 match 3.90 score 1 script
rekyt
funrar:Functional Rarity Indices Computation
Computes functional rarity indices as proposed by Violle et al. (2017) <doi:10.1016/j.tree.2017.02.002>. Various indices can be computed using both regional and local information. Functional Rarity combines both the functional aspect of rarity as well as the extent aspect of rarity. 'funrar' is presented in Grenié et al. (2017) <doi:10.1111/ddi.12629>.
Maintained by Matthias Grenié. Last updated 11 months ago.
ecological-models ecology rarity traits
3.6 match 17 stars 7.85 score 233 scripts 1 dependent
kit-iism-em
partitionComparison:Implements Measures for the Comparison of Two Partitions
Provides several measures ((dis)similarity, distance/metric, correlation, entropy) for comparing two partitions of the same set of objects. The different measures can be assigned to three different classes: Pair comparison (containing the famous Jaccard and Rand indices), set based, and information theory based. Many of the implemented measures can be found in Albatineh AN, Niewiadomska-Bugaj M and Mihalko D (2006) <doi:10.1007/s00357-006-0017-z> and Meila M (2007) <doi:10.1016/j.jmva.2006.11.013>. Partitions are represented by vectors of class labels which allow a straightforward integration with existing clustering algorithms (e.g. kmeans()). The package is mostly based on the S4 object system.
Maintained by Fabian Ball. Last updated 2 years ago.
comparison dissimilarity-measures distance-measures partitions similarity-measures
7.5 match 2 stars 3.78 score 60 scripts
bioc
phyloseq:Handling and analysis of high-throughput microbiome census data
phyloseq provides a set of classes and tools to facilitate the import, storage, analysis, and graphical display of microbiome census data.
Maintained by Paul J. McMurdie. Last updated 5 months ago.
immunooncology sequencing microbiome metagenomics clustering classification multiplecomparison geneticvariability
2.0 match 597 stars 13.90 score 8.4k scripts 37 dependents
tgouhier
biwavelet:Conduct Univariate and Bivariate Wavelet Analyses
This is a port of the WTC MATLAB package written by Aslak Grinsted and the wavelet program written by Christopher Torrence and Gilbert P. Compo. This package can be used to perform univariate and bivariate (cross-wavelet, wavelet coherence, wavelet clustering) analyses.
Maintained by Tarik Gouhier. Last updated 7 months ago.
3.6 match 45 stars 7.51 score 81 scripts 1 dependent
sciviews
exploreit:Exploratory Data Analysis for 'SciViews::R'
Multivariate analysis and data exploration for the 'SciViews::R' dialect.
Maintained by Philippe Grosjean. Last updated 11 months ago.
multivariate-analysis sciviews statistical-methods
9.8 match 2.70 score 4 scripts
bioc
mia:Microbiome analysis
mia implements tools for microbiome analysis based on the SummarizedExperiment, SingleCellExperiment and TreeSummarizedExperiment infrastructure. Data wrangling and analysis in the context of taxonomic data is the main scope. Additional functions for common tasks are implemented, such as community indices calculation and summarization.
Maintained by Tuomas Borman. Last updated 2 days ago.
microbiomesoftwaredataimportanalysisbioconductor
2.3 match 52 stars 11.50 score 316 scripts 5 dependents
bioc
SNPRelate:Parallel Computing Toolset for Relatedness and Principal Component Analysis of SNP Data
Genome-wide association studies (GWAS) are widely used to investigate the genetic basis of diseases and traits, but they pose many computational challenges. We developed an R package SNPRelate to provide a binary format for single-nucleotide polymorphism (SNP) data in GWAS utilizing CoreArray Genomic Data Structure (GDS) data files. The GDS format offers efficient operations specifically designed for integers with two bits, since a SNP could occupy only two bits. SNPRelate is also designed to accelerate two key computations on SNP data using parallel computing for multi-core symmetric multiprocessing computer architectures: Principal Component Analysis (PCA) and relatedness analysis using Identity-By-Descent measures. The SNP GDS format is also used by the GWASTools package with the support of S4 classes and generic functions. The extended GDS format is implemented in the SeqArray package to support the storage of single nucleotide variations (SNVs), insertion/deletion polymorphisms (indels) and structural variation calls in whole-genome and whole-exome variant data.
Maintained by Xiuwen Zheng. Last updated 5 months ago.
infrastructuregeneticsstatisticalmethodprincipalcomponentbioinformaticsgds-formatpcasimdsnpopenblascpp
2.0 match 104 stars 12.69 score 1.6k scripts 18 dependents
a-s-melo
CommEcol:Community Ecology Analyses
Autosimilarity curves, standardization of spatial extent, dissimilarity indexes that overweight rare species, phylogenetic and functional (pairwise and multisample) dissimilarity indexes and nestedness for phylogenetic, functional and other diversity metrics. The methods for phylogenetic and functional nestedness are described in Melo, Cianciaruso and Almeida-Neto (2014) <doi:10.1111/2041-210X.12185>. This should be a complement to available packages, particularly 'vegan'.
Maintained by Adriano Sanches Melo. Last updated 9 months ago.
9.8 match 1 stars 2.39 score 41 scripts 1 dependents
idblr
ndi:Neighborhood Deprivation Indices
Computes various geospatial indices of socioeconomic deprivation and disparity in the United States. Some indices are considered "spatial" because they consider the values of neighboring (i.e., adjacent) census geographies in their computation, while other indices are "aspatial" because they only consider the value within each census geography. Two types of aspatial neighborhood deprivation indices (NDI) are available: (1) based on Messer et al. (2006) <doi:10.1007/s11524-006-9094-x> and (2) based on Andrews et al. (2020) <doi:10.1080/17445647.2020.1750066> and Slotman et al. (2022) <doi:10.1016/j.dib.2022.108002> who use variables chosen by Roux and Mair (2010) <doi:10.1111/j.1749-6632.2009.05333.x>. Both are a decomposition of multiple demographic characteristics from the U.S. Census Bureau American Community Survey 5-year estimates (ACS-5; 2006-2010 onward). Using data from the ACS-5 (2005-2009 onward), the package can also compute indices of racial or ethnic residential segregation, including but not limited to those discussed in Massey & Denton (1988) <doi:10.1093/sf/67.2.281>, and additional indices of socioeconomic disparity.
Maintained by Ian D. Buller. Last updated 7 months ago.
censuscensus-apicensus-datadeprivationdeprivation-statsdisparitygeospatialgeospatial-datametric-developmentprincipal-component-analysissegregation-measuressocio-economic-indicators
3.5 match 21 stars 6.67 score 7 scripts 1 dependents
skembel
picante:Integrating Phylogenies and Ecology
Functions for phylocom integration, community analyses, null-models, traits and evolution. Implements numerous ecophylogenetic approaches including measures of community phylogenetic and trait diversity, phylogenetic signal, estimation of trait values for unobserved taxa, null models for community and phylogeny randomizations, and utility functions for data input/output and phylogeny plotting. A full description of package functionality and methods is provided by Kembel et al. (2010) <doi:10.1093/bioinformatics/btq166>.
Maintained by Steven W. Kembel. Last updated 2 years ago.
2.0 match 34 stars 11.42 score 1.1k scripts 16 dependents
ropensci
textreuse:Detect Text Reuse and Document Similarity
Tools for measuring similarity among documents and detecting passages which have been reused. Implements shingled n-gram, skip n-gram, and other tokenizers; similarity/dissimilarity functions; pairwise comparisons; minhash and locality sensitive hashing algorithms; and a version of the Smith-Waterman local alignment algorithm suitable for natural language.
Maintained by Yaoxiang Li. Last updated 30 days ago.
2.4 match 200 stars 9.28 score 226 scripts
bioc
clusterSeq:Clustering of high-throughput sequencing data by identifying co-expression patterns
Identification of clusters of co-expressed genes based on their expression across multiple (replicated) biological samples.
Maintained by Samuel Granjeaud. Last updated 5 months ago.
sequencingdifferentialexpressionmultiplecomparisonclusteringgeneexpression
5.1 match 4.26 score 2 scripts
bioc
ChemmineR:Cheminformatics Toolkit for R
ChemmineR is a cheminformatics package for analyzing drug-like small molecule data in R. Its latest version contains functions for efficient processing of large numbers of molecules, physicochemical/structural property predictions, structural similarity searching, classification and clustering of compound libraries with a wide spectrum of algorithms. In addition, it offers visualization functions for compound clustering results and chemical structures.
Maintained by Thomas Girke. Last updated 5 months ago.
cheminformaticsbiomedicalinformaticspharmacogeneticspharmacogenomicsmicrotitreplateassaycellbasedassaysvisualizationinfrastructuredataimportclusteringproteomicsmetabolomicscpp
2.3 match 14 stars 9.42 score 253 scripts 12 dependents
cran
Anthropometry:Statistical Methods for Anthropometric Data
Statistical methodologies especially developed to analyze anthropometric data. These methods are aimed at providing effective solutions to some common problems related to Ergonomics and Anthropometry. They are based on clustering, the statistical concept of data depth, statistical shape analysis and archetypal analysis. Please see Vinue (2017) <doi:10.18637/jss.v077.i06>.
Maintained by Guillermo Vinue. Last updated 2 years ago.
7.3 match 1 stars 2.78 score 2 dependents
mlampros
textTinyR:Text Processing for Small or Big Data Files
It offers functions for splitting, parsing, tokenizing and creating a vocabulary for big text data files. Moreover, it includes functions for building a document-term matrix and extracting information from those (term-associations, most frequent terms). It also embodies functions for calculating token statistics (collocations, look-up tables, string dissimilarities) and functions to work with sparse matrices. Lastly, it includes functions for Word Vector Representations (i.e. 'GloVe', 'fasttext') and incorporates functions for the calculation of (pairwise) text document dissimilarities. The source code is based on 'C++11' and exported in R through the 'Rcpp', 'RcppArmadillo' and 'BH' packages.
Maintained by Lampros Mouselimis. Last updated 1 years ago.
bhboostcpp11processingrcpprcpparmadillotextopenblascppopenmp
2.6 match 38 stars 7.64 score 244 scripts 1 dependents
adeverse
adespatial:Multivariate Multiscale Spatial Analysis
Tools for the multiscale spatial analysis of multivariate data. Several methods are based on the use of a spatial weighting matrix and its eigenvector decomposition (Moran's Eigenvector Maps, MEM). Several approaches are described in the review by Dray et al. (2012) <doi:10.1890/11-1183.1>.
Maintained by Aurélie Siberchicot. Last updated 12 days ago.
1.8 match 36 stars 11.06 score 398 scripts 2 dependents
murphymv
dissCqN:Multiple Assemblage Dissimilarity for Orders q = 0-N
Calculate multiple or pairwise dissimilarity for orders q = 0-N (CqN; Chao et al., 2008 <doi:10/fcvn63>) for a set of species assemblages or interaction networks.
Maintained by Mark Murphy. Last updated 3 years ago.
5.4 match 3.70 score 2 scripts
biometry
bipartite:Visualising Bipartite Networks and Calculating Some (Ecological) Indices
Functions to visualise webs and calculate a series of indices commonly used to describe patterns in (ecological) webs. It focuses on webs consisting of only two levels (bipartite), e.g. pollination webs or predator-prey webs. Visualisation is important to get an idea of what we are actually looking at, while the indices summarise different aspects of the web's topology.
Maintained by Carsten F. Dormann. Last updated 6 days ago.
1.8 match 37 stars 10.93 score 592 scripts 15 dependents
berchuck
womblR:Spatiotemporal Boundary Detection Model for Areal Unit Data
Implements a spatiotemporal boundary detection model with a dissimilarity metric for areal data with inference in a Bayesian setting using Markov chain Monte Carlo (MCMC). The response variable can be modeled as Gaussian (no nugget), probit or Tobit link and spatial correlation is introduced at each time point through a conditional autoregressive (CAR) prior. Temporal correlation is introduced through a hierarchical structure and can be specified as exponential or first-order autoregressive. Full details of the package can be found in the accompanying vignette. Furthermore, the details of the package can be found in "Diagnosing Glaucoma Progression with Visual Field Data Using a Spatiotemporal Boundary Detection Method", by Berchuck et al (2018), <arXiv:1805.11636>. The paper is in press at the Journal of the American Statistical Association.
Maintained by Samuel I. Berchuck. Last updated 3 years ago.
4.6 match 1 stars 4.10 score 25 scripts
pneuvial
adjclust:Adjacency-Constrained Clustering of a Block-Diagonal Similarity Matrix
Implements a constrained version of hierarchical agglomerative clustering, in which each observation is associated with a position, and only adjacent clusters can be merged. Typical application fields in bioinformatics include Genome-Wide Association Studies or Hi-C data analysis, where the similarity between items is a decreasing function of their genomic distance. Taking advantage of this feature, the implemented algorithm is time and memory efficient. This algorithm is described in Ambroise et al. (2019) <doi:10.1186/s13015-019-0157-4>.
Maintained by Pierre Neuvial. Last updated 5 months ago.
clusteringfeatureextractiongwashi-chierarchical-clusteringlinkage-disequilibriumcppopenmp
2.5 match 16 stars 7.35 score 13 scripts 2 dependents
buttrey
treeClust:Cluster Distances Through Trees
Create a measure of inter-point dissimilarity useful for clustering mixed data, and, optionally, perform the clustering.
Maintained by Sam Buttrey. Last updated 7 years ago.
6.0 match 1 stars 3.06 score 77 scripts 5 dependents
marce10
warbleR:Streamline Bioacoustic Analysis
Functions aiming to facilitate the analysis of the structure of animal acoustic signals in 'R'. 'warbleR' makes use of the basic sound analysis tools from the packages 'tuneR' and 'seewave', and offers new tools for exploring and quantifying acoustic signal structure. The package allows users to organize and manipulate multiple sound files, create spectrograms of complete recordings or individual signals in different formats, run several measures of acoustic structure, and characterize different structural levels in acoustic signals.
Maintained by Marcelo Araya-Salas. Last updated 2 months ago.
animal-acoustic-signalsaudio-processingbioacousticsspectrogramstreamline-analysiscpp
1.7 match 54 stars 11.01 score 270 scripts 4 dependents
calvintchi
hierBipartite:Bipartite Graph-Based Hierarchical Clustering
Bipartite graph-based hierarchical clustering performs hierarchical clustering of groups of samples based on association patterns between two sets of variables. It is developed for pharmacogenomic datasets and datasets sharing the same data structure. In the context of pharmacogenomic datasets, the samples are cell lines, and the two sets of variables are typically expression levels and drug sensitivity values. For this method, sparse canonical correlation analysis from Lee, W., Lee, D., Lee, Y. and Pawitan, Y. (2011) <doi:10.2202/1544-6115.1638> is first applied to extract association patterns for each group of samples. Then, a nuclear norm-based dissimilarity measure is used to construct a dissimilarity matrix between groups based on the extracted associations. Finally, hierarchical clustering is applied.
Maintained by Calvin Chi. Last updated 4 years ago.
4.9 match 1 stars 3.70 score 4 scripts
cmmr
rbiom:Read/Write, Analyze, and Visualize 'BIOM' Data
A toolkit for working with Biological Observation Matrix ('BIOM') files. Read/write all 'BIOM' formats. Compute rarefaction, alpha diversity, and beta diversity (including 'UniFrac'). Summarize counts by taxonomic level. Subset based on metadata. Generate visualizations and statistical analyses. CPU intensive operations are coded in C for speed.
Maintained by Daniel P. Smith. Last updated 6 days ago.
2.0 match 15 stars 9.02 score 117 scripts 6 dependents
tonigi
dtw:Dynamic Time Warping Algorithms
A comprehensive implementation of dynamic time warping (DTW) algorithms in R. DTW computes the optimal (least cumulative distance) alignment between points of two time series. Common DTW variants covered include local (slope) and global (window) constraints, subsequence matches, arbitrary distance definitions, normalizations, minimum variance matching, and so on. Provides cumulative distances, alignments, specialized plot styles, etc., as described in Giorgino (2009) <doi:10.18637/jss.v031.i07>.
Maintained by Toni Giorgino. Last updated 2 years ago.
2.0 match 5 stars 8.48 score 582 scripts 49 dependents
mikemeredith
wiqid:Quick and Dirty Estimates for Wildlife Populations
Provides simple, fast functions for maximum likelihood and Bayesian estimates of wildlife population parameters, suitable for use with simulated data or bootstraps. Early versions were indeed quick and dirty, but optional error-checking routines and meaningful error messages have been added. Includes single and multi-season occupancy, closed capture population estimation, survival, species richness and distance measures.
Maintained by Ngumbang Juat. Last updated 2 years ago.
3.4 match 2 stars 4.84 score 115 scripts 1 dependents
dmuellner
fastcluster:Fast Hierarchical Clustering Routines for R and 'Python'
This is a two-in-one package which provides interfaces to both R and 'Python'. It implements fast hierarchical, agglomerative clustering routines. Part of the functionality is designed as drop-in replacement for existing routines: linkage() in the 'SciPy' package 'scipy.cluster.hierarchy', hclust() in R's 'stats' package, and the 'flashClust' package. It provides the same functionality with the benefit of a much faster implementation. Moreover, there are memory-saving routines for clustering of vector data, which go beyond what the existing packages provide. For information on how to install the 'Python' files, see the file INSTALL in the source distribution. Based on the present package, Christoph Dalitz also wrote a pure 'C++' interface to 'fastcluster': <https://lionel.kr.hs-niederrhein.de/~dalitz/data/hclust/>.
Maintained by Daniel Müllner. Last updated 1 years ago.
1.8 match 10 stars 9.29 score 444 scripts 107 dependents
cran
OasisR:Outright Tool for the Analysis of Spatial Inequalities and Segregation
A comprehensive set of indexes and tests for social segregation analysis, as described in Tivadar (2019) - 'OasisR': An R Package to Bring Some Order to the World of Segregation Measurement <doi:10.18637/jss.v089.i07>. The package is the most complete existing tool and it clarifies many ambiguities and errors regarding the definition of segregation indices. Additionally, 'OasisR' introduces several resampling methods that enable testing their statistical significance (randomization tests, bootstrapping, and jackknife methods).
Maintained by Mihai Tivadar. Last updated 4 months ago.
8.8 match 2 stars 1.78 score 1 dependents
minatonakazawa
fmsb:Functions for Medical Statistics Book with some Demographic Data
Several utility functions for the book entitled "Practices of Medical and Health Data Analysis using R" (Pearson Education Japan, 2007) with Japanese demographic data and some demographic analysis related functions.
Maintained by Minato Nakazawa. Last updated 1 years ago.
2.0 match 3 stars 7.74 score 1.9k scripts 23 dependents
antoinelucas64
amap:Another Multidimensional Analysis Package
Tools for Clustering and Principal Component Analysis (With robust methods, and parallelized functions).
Maintained by Antoine Lucas. Last updated 5 months ago.
2.0 match 7.66 score 460 scripts 26 dependents
elaliberte
FD:Measuring Functional Diversity (FD) from Multiple Traits, and Other Tools for Functional Ecology
Computes different multidimensional FD indices. Implements a distance-based framework to measure FD that allows any number and type of functional traits, and can also consider species relative abundances. Also contains other useful tools for functional ecology.
Maintained by Etienne Laliberté. Last updated 1 years ago.
2.3 match 4 stars 6.54 score 586 scripts 15 dependents
ncss-tech
aqp:Algorithms for Quantitative Pedology
The Algorithms for Quantitative Pedology (AQP) project was started in 2009 to organize a loosely-related set of concepts and source code on the topic of soil profile visualization, aggregation, and classification into this package (aqp). Over the past 8 years, the project has grown into a suite of related R packages that enhance and simplify the quantitative analysis of soil profile data. Central to the AQP project is a new vocabulary of specialized functions and data structures that can accommodate the inherent complexity of soil profile information, freeing the scientist to focus on ideas rather than boilerplate data processing tasks <doi:10.1016/j.cageo.2012.10.020>. These functions and data structures have been extensively tested and documented, applied to projects involving hundreds of thousands of soil profiles, and deeply integrated into widely used tools such as SoilWeb <https://casoilresource.lawr.ucdavis.edu/soilweb-apps>. Components of the AQP project (aqp, soilDB, sharpshootR, soilReports packages) serve an important role in routine data analysis within the USDA-NRCS Soil Science Division. The AQP suite of R packages offers a convenient platform for bridging the gap between pedometric theory and practice.
Maintained by Dylan Beaudette. Last updated 28 days ago.
digital-soil-mappingncss-technrcspedologypedometricssoilsoil-surveyusda
1.3 match 55 stars 11.77 score 1.2k scripts 2 dependents
daijiang
phyr:Model Based Phylogenetic Analysis
A collection of functions to do model-based phylogenetic analysis. It includes functions to calculate community phylogenetic diversity, to estimate correlations among functional traits while accounting for phylogenetic relationships, and to fit phylogenetic generalized linear mixed models. The Bayesian phylogenetic generalized linear mixed models are fitted with the 'INLA' package (<https://www.r-inla.org>).
Maintained by Daijiang Li. Last updated 1 years ago.
bayesianglmminlaphylogenyspecies-distribution-modelingopenblascpp
1.7 match 31 stars 8.67 score 107 scripts 2 dependents
cran
zetadiv:Functions to Compute Compositional Turnover Using Zeta Diversity
Functions to compute compositional turnover using zeta-diversity, the number of species shared by multiple assemblages. The package includes functions to compute zeta-diversity for a specific number of assemblages and to compute zeta-diversity for a range of numbers of assemblages. It also includes functions to explain how zeta-diversity varies with distance and with differences in environmental variables between assemblages, using generalised linear models, linear models with negative constraints, generalised additive models, shape constrained additive models, and I-splines.
Maintained by Guillaume Latombe. Last updated 3 years ago.
5.0 match 3 stars 2.89 score 64 scripts
cran
sets:Sets, Generalized Sets, Customizable Sets and Intervals
Data structures and basic operations for ordinary sets, generalizations such as fuzzy sets, multisets, and fuzzy multisets, customizable sets, and intervals.
Maintained by David Meyer. Last updated 1 years ago.
2.0 match 1 stars 7.20 score 592 scripts 109 dependents
lucymcgowan
pald:Partitioned Local Depth for Community Structure in Data
Implementation of the Partitioned Local Depth (PaLD) approach which provides a measure of local depth and the cohesion of a point to another which (together with a universal threshold for distinguishing strong and weak ties) may be used to reveal local and global structure in data, based on methods described in Berenhaut, Moore, and Melvin (2022) <doi:10.1073/pnas.2003634119>. No extraneous inputs, distributional assumptions, iterative procedures nor optimization criteria are employed. This package includes functions for computing local depths and cohesion as well as flexible functions for plotting community networks and displays of cohesion against distance.
Maintained by Lucy DAgostino McGowan. Last updated 11 months ago.
4.0 match 6 stars 3.56 score 12 scripts
laperez
Clustering:Techniques for Evaluating Clustering
The design of this package allows us to run different clustering packages and compare the results between them, to determine which algorithm behaves best on the data provided. See Martos, L.A.P., García-Vico, Á.M., González, P. et al. (2023) <doi:10.1007/s13748-022-00294-2> "Clustering: an R library to facilitate the analysis and comparison of cluster algorithms.", Martos, L.A.P., García-Vico, Á.M., González, P. et al. "A Multiclustering Evolutionary Hyperrectangle-Based Algorithm" <doi:10.1007/s44196-023-00341-3> and Martos, L.A.P., García-Vico, Á.M., González, P. et al. "An Evolutionary Fuzzy System for Multiclustering in Data Streaming" <doi:10.1016/j.procs.2023.12.058>.
Maintained by Luis Alfonso Perez Martos. Last updated 11 months ago.
3.5 match 5 stars 4.04 score 7 scripts
zdk123
pulsar:Parallel Utilities for Lambda Selection along a Regularization Path
Model selection for penalized graphical models using the Stability Approach to Regularization Selection ('StARS'), with options for speed-ups including Bounded StARS (B-StARS), batch computing, and other stability metrics (e.g., graphlet stability G-StARS). Christian L. Müller, Richard Bonneau, Zachary Kurtz (2016) <arXiv:1605.07072>.
Maintained by Zachary Kurtz. Last updated 1 years ago.
2.3 match 10 stars 6.16 score 65 scripts
gesistsa
webtrackR:Preprocessing and Analyzing Web Tracking Data
Data structures and methods to work with web tracking data. The functions cover data preprocessing steps, enriching web tracking data with external information and methods for the analysis of digital behavior as used in several academic papers (e.g., Clemm von Hohenberg et al., 2023 <doi:10.17605/OSF.IO/M3U9P>; Stier et al., 2022 <doi:10.1017/S0003055421001222>).
Maintained by David Schoch. Last updated 3 months ago.
2.3 match 9 stars 6.03 score 8 scripts
pzhaonet
rarestR:Rarefaction-Based Species Richness Estimator
Calculate rarefaction-based alpha- and beta-diversity. Offer parametric extrapolation to estimate the total expected species in a single community and the total expected shared species between two communities. Visualize the curve-fitting for these estimators.
Maintained by Peng Zhao. Last updated 4 months ago.
2.7 match 2 stars 4.89 score 13 scripts
bioc
MantelCorr:Compute Mantel Cluster Correlations
Computes Mantel cluster correlations from a (p x n) numeric data matrix (e.g. microarray gene-expression data).
Maintained by Brian Steinmeyer. Last updated 5 months ago.
3.9 match 3.30 score 1 scripts
mspinillos
ecoregime:Analysis of Ecological Dynamic Regimes
A toolbox for implementing the Ecological Dynamic Regime framework (Sánchez-Pinillos et al., 2023 <doi:10.1002/ecm.1589>) to characterize and compare groups of ecological trajectories in multidimensional spaces defined by state variables. The package includes the RETRA-EDR algorithm to identify representative trajectories, functions to generate, summarize, and visualize representative trajectories, and several metrics to quantify the distribution and heterogeneity of trajectories in an ecological dynamic regime and quantify the dissimilarity between two or more ecological dynamic regimes. The package also includes a set of functions to assess ecological resilience based on ecological dynamic regimes (Sánchez-Pinillos et al., 2024 <doi:10.1016/j.biocon.2023.110409>).
Maintained by Martina Sánchez-Pinillos. Last updated 11 months ago.
2.4 match 7 stars 5.32 score 8 scripts
chavent
ClustOfVar:Clustering of Variables
Cluster analysis of a set of variables. Variables can be quantitative, qualitative or a mixture of both.
Maintained by Marie Chavent. Last updated 5 years ago.
1.9 match 7 stars 6.47 score 142 scripts 2 dependents
bioc
timeOmics:Time-Course Multi-Omics data integration
timeOmics is a generic data-driven framework to integrate multi-Omics longitudinal data measured on the same biological samples and select key temporal features with strong associations within the same sample group. The main steps of timeOmics are: 1. Platform and time-specific normalization and filtering steps; 2. Modelling each biological into one time expression profile; 3. Clustering features with the same expression profile over time; 4. Post-hoc validation step.
Maintained by Antoine Bodein. Last updated 5 months ago.
clusteringfeatureextractiontimecoursedimensionreductionsoftwaresequencingmicroarraymetabolomicsmetagenomicsproteomicsclassificationregressionimmunooncologygenepredictionmultiplecomparisonclusterintegrationmulti-omicstime-series
2.0 match 24 stars 5.98 score 10 scripts
swampthingpaul
NADA2:Data Analysis for Censored Environmental Data
Contains methods described by Dennis Helsel in his book "Statistics for Censored Environmental Data using Minitab and R" (2011) and courses and videos at <https://practicalstats.com>. This package adds new functions to the `NADA` Package.
Maintained by Paul Julian. Last updated 6 months ago.
1.9 match 15 stars 6.16 score 16 scripts
kurthornik
relations:Data Structures and Algorithms for Relations
Data structures and algorithms for k-ary relations with arbitrary domains, featuring relational algebra, predicate functions, and fitters for consensus relations.
Maintained by Kurt Hornik. Last updated 25 days ago.
2.3 match 5.04 score 58 scripts 14 dependents
stefanomp
yaConsensus:Consensus Clustering of Omic Data
Procedures to perform consensus clustering starting from a dissimilarity matrix or a data matrix. Subsampling can be done either by samples or by features. Under heavy computational load, the procedures can run in parallel.
Maintained by Stefano Maria Pagnotta. Last updated 4 years ago.
4.1 match 2.70 score
cbolen1
rdi:Repertoire Dissimilarity Index
Methods for calculation and visualization of the Repertoire Dissimilarity Index. Citation: Bolen and Rubelt, et al (2017) <doi:10.1186/s12859-017-1556-5>.
Maintained by Christopher Bolen. Last updated 6 years ago.
3.9 match 2.88 score 15 scripts
mtrupiano1
knnwtsim:K Nearest Neighbor Forecasting with a Tailored Similarity Metric
Functions to implement K Nearest Neighbor forecasting using a weighted similarity metric tailored to the problem of forecasting univariate time series where recent observations, seasonal patterns, and exogenous predictors are all relevant in predicting future observations of the series in question. For more information on the formulation of this similarity metric please see Trupiano (2021) <arXiv:2112.06266>.
Maintained by Matthew Trupiano. Last updated 3 years ago.
forecastingknn-regressionmachine-learningtime-series
4.0 match 1 stars 2.70 score 2 scripts
cran
fossil:Palaeoecological and Palaeogeographical Analysis Tools
A set of analytical tools useful in analysing ecological and geographical data sets, both ancient and modern. The package includes functions for estimating species richness (Chao 1 and 2, ACE, ICE, Jackknife), shared species/beta diversity, species area curves and geographic distances and areas.
Maintained by Matthew J. Vavrek. Last updated 5 years ago.
2.0 match 1 stars 5.18 score 532 scripts 7 dependents
carlosp-carmona
TPD:Methods for Measuring Functional Diversity Based on Trait Probability Density
Tools to calculate trait probability density functions (TPD) at any scale (e.g. populations, species, communities). TPD functions are used to compute several indices of functional diversity, as well as its partition across scales. These indices constitute a unified framework that incorporates the underlying probabilistic nature of trait distributions into uni- or multidimensional functional trait-based studies. See Carmona et al. (2016) <doi:10.1016/j.tree.2016.02.003> for further information.
Maintained by Carlos P. Carmona. Last updated 6 years ago.
3.0 match 2 stars 3.42 score 33 scripts
chavent
ClustGeo:Hierarchical Clustering with Spatial Constraints
Implements a Ward-like hierarchical clustering algorithm including soft spatial/geographical constraints.
Maintained by Marie Chavent. Last updated 3 years ago.
1.7 match 7 stars 5.85 score 67 scripts 1 dependents
cran
FreeSortR:Free Sorting Data Analysis
Provides tools for describing and analysing free sorting data. Main methods are computation of consensus partition and factorial analysis of the dissimilarity matrix between stimuli (using multidimensional scaling approach).
Maintained by Philippe Courcoux. Last updated 7 years ago.
6.8 match 1.38 score 24 scripts
ajwills72
catlearn:Formal Psychological Models of Categorization and Learning
Formal psychological models of categorization and learning, independently-replicated data sets against which to test them, and simulation archives.
Maintained by Andy Wills. Last updated 3 months ago.
categorizationcognitive-scienceformal-modelslearninglearning-theoryopen-modelsopen-sciencepsychologycpp
1.8 match 26 stars 5.25 score 46 scripts
matthias-studer
WeightedCluster:Clustering of Weighted Data
Clusters state sequences and weighted data. It provides an optimized weighted PAM algorithm as well as functions for aggregating replicated cases, computing cluster quality measures for a range of clustering solutions and plotting (fuzzy) clusters of state sequences. Parametric bootstrap methods to validate typologies of sequences are also provided. Finally, it provides fuzzy and crisp CLARA algorithms to cluster large databases with sequence analysis.
Maintained by Matthias Studer. Last updated 3 months ago.
1.7 match 5.55 score 106 scripts 4 dependents
tkcaccia
MetChem:Chemical Structural Similarity Analysis
A new pipeline to explore chemical structural similarity across metabolites. It allows metabolites to be classified into structurally related modules and their common functional groups to be identified. The KODAMA algorithm is used to highlight structural similarity between metabolites. See Cacciatore S, Tenori L, Luchinat C, Bennett PR, MacIntyre DA. (2017) Bioinformatics <doi:10.1093/bioinformatics/btw705>, Cacciatore S, Luchinat C, Tenori L. (2014) Proc Natl Acad Sci USA <doi:10.1073/pnas.1220873111>, and Abdel-Shafy EA, Melak T, MacIntyre DA, Zadra G, Zerbini LF, Piazza S, Cacciatore S. (2023) Bioinformatics Advances <doi:10.1093/bioadv/vbad053>.
Maintained by Stefano Cacciatore. Last updated 2 years ago.
4.5 match 2.00 score
cran
qdm:Fitting a Quadrilateral Dissimilarity Model to Same-Different Judgments
This package provides different specifications of a Quadrilateral Dissimilarity Model which can be used to fit same-different judgments in order to get a predicted matrix that satisfies regular minimality [Colonius & Dzhafarov, 2006, Measurement and representations of sensations, Erlbaum]. From such a matrix, Fechnerian distances can be computed.
Maintained by Nora Umbach. Last updated 10 years ago.
8.9 match 1.00 scorewaddella
RnavGraphImageData:Image Data Used in the Loon Package Demos
Image data used as examples in the loon R package.
Maintained by Adrian Waddell. Last updated 7 years ago.
6.9 match 1.28 score 19 scriptscran
CUB:A Class of Mixture Models for Ordinal Data
For ordinal rating data, estimate and test models within the family of CUB models and their extensions (where CUB stands for Combination of a discrete Uniform and a shifted Binomial distributions); Simulation routines, plotting facilities and fitting measures are also provided.
Maintained by Rosaria Simone. Last updated 1 year ago.
2.0 match 4.37 score 79 scripts 1 dependentsnakarinp
longreadvqs:Viral Quasispecies Comparison from Long-Read Sequencing Data
Performs a variety of viral quasispecies diversity analyses [see Pamornchainavakul et al. (2024) <doi:10.21203/rs.3.rs-4637890/v1>] based on long-read sequence alignment. Main functions include 1) sequencing error and other noise minimization and read sampling, 2) single nucleotide variant (SNV) profile comparison, and 3) viral quasispecies profile comparison and visualization.
Maintained by Nakarin Pamornchainavakul. Last updated 7 months ago.
1.8 match 4.65 score 4 scriptsyoctozepto
RAFS:Robust Aggregative Feature Selection
A cross-validated minimal-optimal feature selection algorithm. It utilises popularity counting, hierarchical clustering with feature dissimilarity measures, and prefiltering with all-relevant feature selection method to obtain the minimal-optimal set of features.
Maintained by Radosław Piliszek. Last updated 2 months ago.
6.0 match 1.30 scoregagolews
genie:Fast, Robust, and Outlier Resistant Hierarchical Clustering
Includes the reference implementation of Genie - a hierarchical clustering algorithm that links two point groups in such a way that an inequity measure (namely, the Gini index) of the cluster sizes does not significantly increase above a given threshold. This method most often outperforms many other data segmentation approaches in terms of clustering quality as tested on a wide range of benchmark datasets. At the same time, Genie retains the high speed of the single linkage approach, therefore it is also suitable for analysing larger data sets. For more details see (Gagolewski et al. 2016 <DOI:10.1016/j.ins.2016.05.003>). For an even faster and more feature-rich implementation, including, amongst others, noise point detection, see the 'genieclust' package (Gagolewski, 2021 <DOI:10.1016/j.softx.2021.100722>).
Maintained by Marek Gagolewski. Last updated 3 years ago.
clustercluster-analysisclusteringdata-analysisdata-miningdata-sciencedatasciencegeniehierarchical-clustering-algorithmmachine-learningmachine-learning-algorithmsoutlierscppopenmp
1.7 match 22 stars 4.55 score 16 scriptscran
bios2mds:From Biological Sequences to Multidimensional Scaling
Utilities dedicated to the analysis of biological sequences by metric MultiDimensional Scaling with projection of supplementary data. It contains functions for reading multiple sequence alignment files, calculating distance matrices, performing metric multidimensional scaling and visualizing results.
Maintained by Marie Chabbert. Last updated 5 years ago.
4.0 match 1 stars 1.90 scorerajarshi
fingerprint:Functions to Operate on Binary Fingerprint Data
Functions to manipulate binary fingerprints of arbitrary length. A fingerprint is represented by an object of S4 class 'fingerprint', which is internally represented as a vector of integers, such that each element represents the position in the fingerprint that is set to 1. The bitwise logical functions in R are overridden so that they can be used directly with 'fingerprint' objects. A number of distance metrics are also available (many contributed by Michael Fadock). Fingerprints can be converted to Euclidean vectors (i.e., points on the unit hypersphere) and can also be folded using OR. Arbitrary fingerprint formats can be handled via line handlers. Currently handlers are provided for CDK, MOE and BCI fingerprint data.
Maintained by Rajarshi Guha. Last updated 7 years ago.
1.8 match 4.22 score 82 scripts 12 dependentskylebittinger
abdiv:Alpha and Beta Diversity Measures
A collection of measures for measuring ecological diversity. Ecological diversity comes in two flavors: alpha diversity measures the diversity within a single site or sample, and beta diversity measures the diversity across two sites or samples. This package overlaps considerably with other R packages such as 'vegan', 'gUniFrac', 'betapart', and 'fossil'. We also include a wide range of functions that are implemented in software outside the R ecosystem, such as 'scipy', 'Mothur', and 'scikit-bio'. The implementations here are designed to be basic and clear to the reader.
Maintained by Kyle Bittinger. Last updated 1 year ago.
1.8 match 9 stars 4.14 score 31 scriptsarliph
SPARTAAS:Statistical Pattern Recognition and daTing using Archaeological Artefacts assemblageS
Statistical pattern recognition and dating using archaeological artefacts assemblages. Package of statistical tools for archaeology. hclustcompro(perioclust): Bellanger Lise, Coulon Arthur, Husi Philippe (2021, ISBN:978-3-030-60103-4). mapclust: Bellanger Lise, Coulon Arthur, Husi Philippe (2021) <doi:10.1016/j.jas.2021.105431>. seriograph: Desachy Bruno (2004) <doi:10.3406/pica.2004.2396>. cerardat: Bellanger Lise, Husi Philippe (2012) <doi:10.1016/j.jas.2011.06.031>.
Maintained by Arthur Coulon. Last updated 10 months ago.
1.8 match 6 stars 4.14 score 46 scriptsandregustavom
mlquantify:Algorithms for Class Distribution Estimation
Quantification is a prominent machine learning task that has received an increasing amount of attention in recent years. The objective is to predict the class distribution of a data sample. This package is a collection of machine learning algorithms for class distribution estimation. It includes algorithms from different paradigms of quantification. These methods are described in the paper: A. Maletzke, W. Hassan, D. dos Reis, and G. Batista. The importance of the test set size in quantification assessment. In Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI20, pages 2640–2646, 2020. <doi:10.24963/ijcai.2020/366>.
Maintained by Andre Maletzke. Last updated 3 years ago.
2.0 match 7 stars 3.54 score 1 scriptsmarcohlmann
metanetwork:Handling and Representing Trophic Networks in Space and Time
A toolbox to handle and represent trophic networks in space or time across aggregation levels. This package contains a layout algorithm specifically designed for trophic networks, using dimension reduction on a diffusion graph kernel and trophic levels. Importantly, this package provides a layout method applicable for large trophic networks. The package also implements network diversity indices at different aggregation levels and connectance computation.
Maintained by Marc Ohlmann. Last updated 2 years ago.
1.8 match 2 stars 3.89 score 77 scriptssyksy
hamlet:Hierarchical Optimal Matching and Machine Learning Toolbox
Various functions and algorithms are provided here for solving optimal matching tasks in the context of preclinical cancer studies. Further, various helper and plotting functions are provided for unsupervised and supervised machine learning as well as longitudinal mixed-effects modeling of tumor growth response patterns.
Maintained by Teemu Daniel Laajala. Last updated 2 years ago.
1.7 match 4.18 score 25 scripts 2 dependentsjdmde
scellpam:Applying Partitioning Around Medoids to Single Cell Data with High Number of Cells
PAM (Partitioning Around Medoids) algorithm application to samples of single cell sequencing techniques with a high number of cells (as many as the computer memory allows). The package uses a binary format to store matrices (either full, sparse or symmetric) in files written to disk that can contain any data type (not just double), which allows their manipulation when memory is sufficient to load them as int or float, but not as double. The PAM implementation runs in parallel, using several or all of the machine's cores when available. This package shares a great part of its code with packages 'jmatrix' and 'parallelpam' but their functionality is included here so there is no need to install them.
Maintained by Juan Domingo. Last updated 8 months ago.
2.5 match 2.78 score 9 scriptsxytangtang
ProcData:Process Data Analysis
Provides tools for exploratory process data analysis. Process data refers to the data describing participants' problem-solving processes in computer-based assessments. It is often recorded in computer log files. This package provides functions to read, process, and write process data. It also implements two feature extraction methods to compress the information stored in process data into standard numerical vectors. This package also provides recurrent neural network based models that relate response processes with other binary or scale variables of interest. The functions that involve training and evaluating neural networks are wrappers of functions in 'keras'.
Maintained by Xueying Tang. Last updated 4 years ago.
1.8 match 10 stars 3.70 score 2 scriptsfrederic-santos
AnthropMMD:An R Package for the Mean Measure of Divergence (MMD)
Offers a graphical user interface for the calculation of the mean measure of divergence, with facilities for trait selection and graphical representations <doi:10.1002/ajpa.23336>.
Maintained by Frédéric Santos. Last updated 1 year ago.
1.7 match 3.90 score 16 scriptsbarnhilldave
TML:Tropical Geometry Tools for Machine Learning
Suite of tropical geometric tools for use in machine learning applications. These methods may be summarized in the following references: Yoshida, et al. (2022) <arxiv:2209.15045>, Barnhill et al. (2023) <arxiv:2303.02539>, Barnhill and Yoshida (2023) <doi:10.3390/math11153433>, Aliatimis et al. (2023) <arXiv:2306.08796>, Yoshida et al. (2022) <arXiv:2206.04206>, and Yoshida et al. (2019) <doi:10.1007/s11538-018-0493-4>.
Maintained by David Barnhill. Last updated 8 months ago.
1.8 match 3 stars 3.65 score 1 scriptscogdisreslab
PAVER:PAVER: Pathway Analysis Visualization with Embedding Representations
Summary visualization using embedding representations to reveal underlying themes within sets of pathway terms.
Maintained by William G Ryan V. Last updated 8 months ago.
1.9 match 3.48 score 6 scriptscran
SOMbrero:SOM Bound to Realize Euclidean and Relational Outputs
The stochastic (also called on-line) version of the Self-Organising Map (SOM) algorithm is provided. Different versions of the algorithm are implemented, for numeric and relational data and for contingency tables as described, respectively, in Kohonen (2001) <isbn:3-540-67921-9>, Olteanu & Villa-Vialaneix (2005) <doi:10.1016/j.neucom.2013.11.047> and Cottrell et al (2004) <doi:10.1016/j.neunet.2004.07.010>. The package also contains many plotting features (to help the user interpret the results), can handle (and impute) missing values and is delivered with a graphical user interface based on 'shiny'.
Maintained by Nathalie Vialaneix. Last updated 1 year ago.
1.5 match 1 stars 4.32 score 115 scripts 1 dependentsrekyt
fddimensionality:Test Effect of Traits of FD-Environment Relationship
Companion code for paper XXX <doi:xxx> on FD-Environment relationship, which tests to what extent we can expect an FD-Environment trait relationship as a function of the number of traits included and the type of environmental filtering.
Maintained by Matthias Grenié. Last updated 2 years ago.
3.8 match 1.70 scorebioc
AWFisher:An R package for fast computing for adaptively weighted fisher's method
Implementation of the adaptively weighted fisher's method, including fast p-value computing, variability index, and meta-pattern.
Maintained by Zhiguang Huo. Last updated 5 months ago.
1.3 match 5 stars 4.70 score 4 scriptscran
PoiClaClu:Classification and Clustering of Sequencing Data Based on a Poisson Model
Implements the methods described in the paper, Witten (2011) Classification and Clustering of Sequencing Data using a Poisson Model, Annals of Applied Statistics 5(4) 2493-2518.
Maintained by Daniela Witten. Last updated 6 years ago.
1.6 match 3.81 score 107 scripts 2 dependentscran
TestDimorph:Analysis of the Interpopulation Difference in Degree of Sexual Dimorphism Using Summary Statistics
Offers a solution for the unavailability of raw data in most anthropological studies by facilitating the calculations of several sexual dimorphism related analyses using the published summary statistics of metric data (mean, standard deviation and sex specific sample size) as illustrated by the works of Relethford, J. H., & Hodges, D. C. (1985) <doi:10.1002/ajpa.1330660105>, Greene, D. L. (1989) <doi:10.1002/ajpa.1330790113> and Konigsberg, L. W. (1991) <doi:10.1002/ajpa.1330840110>.
Maintained by Bassam A. Abulnoor. Last updated 1 year ago.
2.3 match 1 stars 2.70 scorematthewkling
colors3d:Generate 2D and 3D Color Palettes
Generate multivariate color palettes to represent two-dimensional or three-dimensional data in graphics (in contrast to standard color palettes that represent just one variable). You tell 'colors3d' how to map color space onto your data, and it gives you a color for each data point. You can then use these colors to make plots in base 'R', 'ggplot2', or other graphics frameworks.
Maintained by Matthew Kling. Last updated 1 year ago.
1.8 match 3 stars 3.18 score 2 scriptschristopherkenny
divseg:Calculate Diversity and Segregation Indices
Implements common measures of diversity and spatial segregation. This package has tools to compute the majority of measures reviewed in Massey and Denton (1988) <doi:10.2307/2579183>. Multiple common measures of within-geography diversity are implemented as well. All functions operate on data frames with a 'tidyselect' based workflow.
Maintained by Christopher T. Kenny. Last updated 10 months ago.
2.0 match 1 stars 2.78 score 12 scriptscran
DisimForMixed:Calculate Dissimilarity Matrix for Dataset with Mixed Attributes
Implements the methods proposed by Ahmad & Dey (2007) <doi:10.1016/j.datak.2007.03.016> for calculating the dissimilarity matrix in the presence of mixed attributes. This package includes functions to discretize quantitative variables and to calculate the conditional probability for each pair of attribute values, the distance between every pair of attribute values, the significance of attributes, and the dissimilarity between each pair of objects.
Maintained by Hasanthi A. Pathberiya. Last updated 9 years ago.
5.5 match 1.00 scorecran
benthos:Marine Benthic Ecosystem Analysis
Preprocessing tools and biodiversity measures (species abundance, species richness, population heterogeneity and sensitivity) for analysing marine benthic data. See Van Loon et al. (2015) <doi:10.1016/j.seares.2015.05.002> for an application of these tools.
Maintained by Dennis Walvoort. Last updated 3 years ago.
2.0 match 2.53 score 34 scriptsmartinoandrea92
gmfd:Inference and Clustering of Functional Data
Some methods for the inference and clustering of univariate and multivariate functional data, using a generalization of Mahalanobis distance, along with some functions useful for the analysis of functional data. For further details, see Martino A., Ghiglietti, A., Ieva, F. and Paganoni A. M. (2017) <arXiv:1708.00386>.
Maintained by Andrea Martino. Last updated 7 years ago.
2.0 match 1 stars 2.48 score 30 scriptscran
ClusteredMutations:Location and Visualization of Clustered Somatic Mutations
Identification and visualization of groups of closely spaced mutations in the DNA sequence of a cancer genome. The extremely mutated zones are searched in the symmetric dissimilarity matrix using the anti-Robinson matrix properties. Different data sets are obtained to describe and plot the clustered mutations information.
Maintained by David Lora. Last updated 9 years ago.
2.4 match 2.00 scorejcaledo
EnvNJ:Whole Genome Phylogenies Using Sequence Environments
Contains utilities for the analysis of protein sequences in a phylogenetic context. Allows the generation of phylogenetic trees based on protein sequences in an alignment-independent way. Two different methods have been implemented. One approach is based on the frequency analysis of n-grams, previously described in Stuart et al. (2002) <doi:10.1093/bioinformatics/18.1.100>. The other approach is based on the species-specific neighborhood preference around amino acids. Features include the conversion of a protein set into a vector reflecting these neighborhood preferences, pairwise distances (dissimilarity) between these vectors, and the generation of trees based on these distance matrices.
Maintained by Juan Carlos Aledo. Last updated 3 years ago.
4.3 match 1.04 score 11 scriptscran
evclust:Evidential Clustering
Various clustering algorithms that produce a credal partition, i.e., a set of Dempster-Shafer mass functions representing the membership of objects to clusters. The mass functions quantify the cluster-membership uncertainty of the objects. The algorithms are: Evidential c-Means, Relational Evidential c-Means, Constrained Evidential c-Means, Evidential Clustering, Constrained Evidential Clustering, Evidential K-nearest-neighbor-based Clustering, Bootstrap Model-Based Evidential Clustering, Belief Peak Evidential Clustering, Neural-Network-based Evidential Clustering.
Maintained by Thierry Denoeux. Last updated 1 year ago.
1.8 match 2.48 score 1 dependentsms609
Quartet:Comparison of Phylogenetic Trees Using Quartet and Split Measures
Calculates the number of four-taxon subtrees consistent with a pair of cladograms, calculating the symmetric quartet distance of Bandelt & Dress (1986), Reconstructing the shape of a tree from observed dissimilarity data, Advances in Applied Mathematics, 7, 309-343 <doi:10.1016/0196-8858(86)90038-2>, and using the tqDist algorithm of Sand et al. (2014), tqDist: a library for computing the quartet and triplet distances between binary or general trees, Bioinformatics, 30, 2079–2080 <doi:10.1093/bioinformatics/btu157> for pairs of binary trees.
Maintained by Martin R. Smith. Last updated 2 months ago.
bioinformaticscomparisonphylogenetic-treesphylogeneticsquartetquartet-distanceresearch-tooltreecpp
0.5 match 14 stars 8.00 score 40 scriptsnoreastermt
allelematch:Identifying Unique Multilocus Genotypes where Genotyping Error and Missing Data may be Present
Tools for the identification of unique multilocus genotypes when both genotyping error and missing data may be present; targeted for use with large datasets and databases containing multiple samples of each individual (a common situation in conservation genetics, particularly in non-invasive wildlife sampling applications). Functions explicitly incorporate missing data and can tolerate allele mismatches created by genotyping error. If you use this package, please cite the original publication in Molecular Ecology Resources (Galpern et al., 2012), the details for which can be generated using citation('allelematch'). For a complete vignette, please access via the Data S1 Supplementary documentation and tutorials (PDF) located at <doi:10.1111/j.1755-0998.2012.03137.x>.
Maintained by Todd Cross. Last updated 12 months ago.
1.8 match 2.26 score 8 scripts 1 dependentsnalimilan
RcmdrPlugin.temis:Graphical Integrated Text Mining Solution
An 'R Commander' plug-in providing an integrated solution to perform a series of text mining tasks such as importing and cleaning a corpus, and analyses like terms and documents counts, vocabulary tables, terms co-occurrences and documents similarity measures, time series analysis, correspondence analysis and hierarchical clustering. Corpora can be imported from spreadsheet-like files, directories of raw text files, 'Twitter' queries, as well as from 'Dow Jones Factiva', 'LexisNexis', 'Europresse' and 'Alceste' files.
Maintained by Milan Bouchet-Valat. Last updated 7 years ago.
3.9 match 1.00 score 7 scriptscran
ddc:Distance Density Clustering Algorithm
A distance density clustering (DDC) algorithm in R. DDC uses dynamic time warping (DTW) to compute a similarity matrix, based on which cluster centers and cluster assignments are found. DDC inherits dynamic time warping (DTW) arguments and constraints. The cluster centers are centroid points that are calculated using the DTW Barycenter Averaging (DBA) algorithm. The clustering process is divisive. At each iteration, cluster centers are updated and data is reassigned to cluster centers. Early stopping is possible. The output includes cluster centers and clustering assignment, as described in the paper (Ma et al (2017) <doi:10.1109/ICDMW.2017.11>).
Maintained by Ruizhe Ma. Last updated 2 years ago.
1.9 match 2.00 score 9 scriptsgreen-striped-gecko
dartR.captive:Analysing 'SNP' Data to Support Captive Breeding
Functions are provided that facilitate the analysis of SNP (single nucleotide polymorphism) data to answer questions regarding captive breeding and relatedness between individuals. 'dartR.captive' is part of the 'dartRverse' suite of packages. Gruber et al. (2018) <doi:10.1111/1755-0998.12745>. Mijangos et al. (2022) <doi:10.1111/2041-210X.13918>.
Maintained by Bernd Gruber. Last updated 26 days ago.
1.8 match 1 stars 2.00 score 3 scriptsnowosad
supercells:Superpixels of Spatial Data
Creates superpixels based on input spatial data. This package works on spatial data with one variable (e.g., continuous raster), many variables (e.g., RGB rasters), and spatial patterns (e.g., areas in categorical rasters). It is based on the SLIC algorithm (Achanta et al. (2012) <doi:10.1109/TPAMI.2012.120>), and readapts it to work with arbitrary dissimilarity measures.
Maintained by Jakub Nowosad. Last updated 7 months ago.
0.5 match 68 stars 6.50 score 52 scriptsshixiangwang
neopeptides:Calculate and Explore Property Indices of Neopeptides
Includes functions to calculate and explore several property indices for neopeptides, which are abnormal peptides generated from genome.
Maintained by Shixiang Wang. Last updated 3 years ago.
somaticmutationimmunooncologyalignmentimmunotherapyneoantigenpeptides
1.8 match 1 stars 1.70 score 1 scriptsbrandmaier
pdc:Permutation Distribution Clustering
Permutation Distribution Clustering is a clustering method for time series. Dissimilarity of time series is formalized as the divergence between their permutation distributions. The permutation distribution was proposed as a measure of the complexity of a time series.
Maintained by Andreas M. Brandmaier. Last updated 2 years ago.
0.5 match 6 stars 5.61 score 25 scripts 9 dependentsannechao
iNEXT.beta3D:Interpolation and Extrapolation with Beta Diversity for Three Dimensions of Biodiversity
As a sequel to 'iNEXT', the 'iNEXT.beta3D' package provides functions to compute standardized taxonomic, phylogenetic, and functional diversity (3D) estimates with a common sample size (for alpha and gamma diversity) or sample coverage (for alpha, beta, gamma diversity as well as dissimilarity or turnover indices). Hill numbers and their generalizations are used to quantify 3D and to make multiplicative decomposition (gamma = alpha x beta). The package also features size- and coverage-based rarefaction and extrapolation sampling curves to facilitate rigorous comparison of beta diversity across datasets. See Chao et al. (2023) <doi:10.1002/ecm.1588> for more details.
Maintained by Anne Chao. Last updated 4 months ago.
0.5 match 5.30 score 6 scriptshms-dbmi
EHRtemporalVariability:Delineating Temporal Dataset Shifts in Electronic Health Records
Functions to delineate temporal dataset shifts in Electronic Health Records through the projection and visualization of dissimilarities among data temporal batches. This is done through the estimation of data statistical distributions over time and their projection in non-parametric statistical manifolds, uncovering the patterns of the data latent temporal variability. 'EHRtemporalVariability' is particularly suitable for multi-modal data and categorical variables with a high number of values, common features of biomedical data where traditional statistical process control or time-series methods may not be appropriate. 'EHRtemporalVariability' allows you to explore and identify dataset shifts through visual analytics formats such as Data Temporal heatmaps and Information Geometric Temporal (IGT) plots. An additional 'EHRtemporalVariability' Shiny app can be used to load and explore the package results, and makes these functions available to users without experience in R coding. (Sáez et al. 2020) <doi:10.1093/gigascience/giaa079>.
Maintained by Carlos Sáez. Last updated 11 months ago.
biomedical-data-sciencebiomedical-informaticsdata-qualitydata-quality-monitoringdataset-shiftselectronic-health-recordstimevariabilityvisualization
0.5 match 17 stars 5.27 score 22 scriptsmarlonecobos
mop:Mobility Oriented-Parity Metric
A set of tools to perform multiple versions of the Mobility Oriented-Parity metric. This multivariate analysis helps to characterize levels of dissimilarity between a set of conditions of reference and another set of conditions of interest. If predictive models are transferred to conditions different from those over which models were calibrated (trained), this metric helps to identify transfer conditions that differ substantially from those of calibration. These tools are implemented following principles proposed in Owens et al. (2013) <doi:10.1016/j.ecolmodel.2013.04.011>, and expanded to obtain more detailed results that aid in interpretation.
Maintained by Marlon E. Cobos. Last updated 9 months ago.
0.5 match 7 stars 5.23 score 20 scripts 2 dependentsemf-creaf
ecotraj:Ecological Trajectory Analysis
Assists ecologists in the analysis of temporal changes of ecosystems, defined as trajectories on a chosen multivariate space, by providing a set of trajectory metrics and visual representations [De Caceres et al. (2019) <doi:10.1002/ecm.1350>; and Sturbois et al. (2021) <doi:10.1016/j.ecolmodel.2020.109400>]. Includes functions to estimate metrics for individual trajectories (length, directionality, angles, ...) as well as metrics to relate pairs of trajectories (dissimilarity and convergence). Functions are also provided to estimate the ecological quality of ecosystems with respect to reference conditions [Sturbois et al. (2023) <doi:10.1002/ecs2.4726>].
Maintained by Miquel De Cáceres. Last updated 18 days ago.
0.5 match 4 stars 5.21 score 21 scripts 1 dependentsplangfelder
moduleColor:Basic Module Functions
Methods for color labeling, calculation of eigengenes, merging of closely related modules.
Maintained by Peter Langfelder. Last updated 3 years ago.
1.9 match 1.28 score 19 scriptshvlieb
gromovlab:Gromov-Hausdorff Type Distances for Labeled Metric Spaces
Computes Gromov-Hausdorff type l^p distances for labeled metric spaces. These distances were introduced in V.Liebscher, Gromov meets Phylogenetics - new Animals for the Zoo of Metrics on Tree Space <arXiv:1504.05795> for phylogenetic trees, but may apply to a diversity of scenarios.
Maintained by Volkmar Liebscher. Last updated 4 years ago.
2.3 match 1 stars 1.00 score 1 scriptsgianluca-sottile
clustEff:Clusters of Effects Curves in Quantile Regression Models
Clustering method to cluster both effects curves, through quantile regression coefficient modeling, and curves in functional data analysis. Sottile G. and Adelfio G. (2019) <doi:10.1007/s00180-018-0817-8>.
Maintained by Gianluca Sottile. Last updated 1 year ago.
2.3 match 1.00 score 7 scriptscran
mvcor:Correlation Coefficients for Multivariate Data
Correlation coefficients for multivariate data, namely the squared correlation coefficient and the RV coefficient (multivariate generalization of the squared Pearson correlation coefficient). References include Mardia K.V., Kent J.T. and Bibby J.M. (1979). "Multivariate Analysis". ISBN: 978-0124712522. London: Academic Press.
Maintained by Michail Tsagris. Last updated 2 months ago.
1.7 match 1.30 scorebioc
OMICsPCA:An R package for quantitative integration and analysis of multiple omics assays from heterogeneous samples
OMICsPCA is an analysis pipeline designed to integrate multi OMICs experiments done on various subjects (e.g. cell lines, individuals), treatments (e.g. disease/control) or time points and to analyse such integrated data from various angles and perspectives. At its core, OMICsPCA uses Principal Component Analysis (PCA) to integrate multiomics experiments from various sources and thus has the ability to overcome data insufficiency issues by using the integrated data as representatives. OMICsPCA can be used in various applications, including analysis of the overall distribution of OMICs assays across various samples/individuals/time points; grouping assays by user-defined conditions; and identification of sources of variation and of similarity/dissimilarity between assays, variables or individuals.
Maintained by Subhadeep Das. Last updated 5 months ago.
immunooncologymultiplecomparisonprincipalcomponentdatarepresentationworkflowvisualizationdimensionreductionclusteringbiologicalquestionepigeneticsworkflowtranscriptiongeneticvariabilityguibiomedicalinformaticsepigeneticsfunctionalgenomicssinglecell
0.5 match 4.00 score 1 scriptsmansukoh
bayMDS:Bayesian Multidimensional Scaling and Choice of Dimension
Bayesian approach to multidimensional scaling. The package consists of implementations of the methods of Oh and Raftery (2001) <doi:10.1198/016214501753208690>.
Maintained by Man-Suk Oh. Last updated 2 years ago.
2.0 match 1.00 score 1 scriptscran
FastCUB:Fast Estimation of CUB Models via Louis' Identity
For ordinal rating data, consider the accelerated EM algorithm to estimate and test models within the family of CUB models (where CUB stands for Combination of a discrete Uniform and a shifted Binomial distributions). The procedure is built upon Louis' identity for the observed information matrix. Best-subset variable selection is then implemented since it becomes more feasible from the computational point of view.
Maintained by Rosaria Simone. Last updated 1 year ago.
2.0 match 1 stars 1.00 scoremanueleleonelli
bnmonitor:An Implementation of Sensitivity Analysis in Bayesian Networks
An implementation of sensitivity and robustness methods in Bayesian networks in R. It includes methods to perform parameter variations via a variety of co-variation schemes, to compute sensitivity functions and to quantify the dissimilarity of two Bayesian networks via distances and divergences. It further includes diagnostic methods to assess the goodness of fit of a Bayesian network to data, including global, node and parent-child monitors. Reference: M. Leonelli, R. Ramanathan, R.L. Wilkerson (2022) <doi:10.1016/j.knosys.2023.110882>.
Maintained by Manuele Leonelli. Last updated 6 months ago.
0.5 match 3 stars 3.92 score 14 scriptsjlp2duke
EnsCat:Clustering of categorical data
This package implements the clustering methods of categorical data discussed in Amiri, S., Clarke, B. and Clarke J. (2015). Clustering categorical data via ensembling dissimilarity matrices. arXiv:1506.07930.
Maintained by Saeid Amiri. Last updated 8 years ago.
0.5 match 5 stars 3.74 score 22 scriptsaurora-torrente
briKmeans:Package for Brik, Fabrik and Fdebrik Algorithms to Initialise Kmeans
Implementation of the BRIk, FABRIk and FDEBRIk algorithms to initialise k-means. These methods are intended for the clustering of multivariate and functional data, respectively. They make use of the Modified Band Depth and bootstrap to identify appropriate initial seeds for k-means, which are proven to be better options than many techniques in the literature. See Torrente and Romo (2021) <doi:10.1007/s00357-020-09372-3>. The package makes use of the functions kma and kma.similarity, from the archived package fdakma, by Alice Parodi et al.
Maintained by Aurora Torrente. Last updated 3 years ago.
1.8 match 1.00 score
cran
SegEnvIneq:Environmental Inequality Indices Based on Segregation Measures
A set of segregation-based indices and randomization methods to make robust environmental inequality assessments, as described in Schaeffer and Tivadar (2019) "Measuring Environmental Inequalities: Insights from the Residential Segregation Literature" <doi:10.1016/j.ecolecon.2019.05.009>.
Maintained by Mihai Tivadar. Last updated 4 months ago.
1.8 match 1.00 score
chiliubio
mecoturn:Decipher Microbial Turnover along a Gradient
Two pipelines are provided to study microbial turnover along a gradient, including the beta diversity and microbial abundance change. The 'betaturn' class consists of the steps of community dissimilarity matrix generation, matrix conversion, differential test and visualization. The workflow of 'taxaturn' class includes the taxonomic abundance calculation, abundance transformation, abundance change summary, statistical analysis and visualization. Multiple statistical approaches can contribute to the analysis of microbial turnover.
Maintained by Chi Liu. Last updated 6 months ago.
0.5 match 3 stars 3.48 score 6 scripts
vmielecnrs
econetwork:Analyzing Ecological Networks
A collection of advanced tools, methods and models specifically designed for analyzing different types of ecological networks - especially antagonistic (food webs, host-parasite), mutualistic (plant-pollinator, plant-fungus, etc) and competitive networks, as well as their variability in time and space. Statistical models are developed to describe and understand the mechanisms that determine species interactions, and to decipher the organization of these ecological networks (Ohlmann et al. (2019) <doi:10.1111/ele.13221>, Gonzalez et al. (2020) <doi:10.1101/2020.04.02.021691>, Miele et al. (2021) <doi:10.48550/arXiv.2103.10433>, Botella et al (2021) <doi:10.1111/2041-210X.13738>).
Maintained by Vincent Miele. Last updated 2 years ago.
1.7 match 1.04 score 11 scripts
cran
MultivariateAnalysis:Pacote Para Analise Multivariada
Package with multivariate analysis methodologies for experiment evaluation. The package estimates dissimilarity measures, builds dendrograms, obtains MANOVA, principal components, canonical variables, etc.
Maintained by Alcinei Mistico Azevedo. Last updated 11 months ago.
0.5 match 2.95 score
cran
bpcp:Beta Product Confidence Procedure for Right Censored Data
Calculates nonparametric pointwise confidence intervals for the survival distribution for right censored data, and for medians [Fay and Brittain <DOI:10.1002/sim.6905>]. Has two-sample tests for dissimilarity (e.g., difference, ratio or odds ratio) in survival at a fixed time, and differences in medians [Fay, Proschan, and Brittain <DOI:10.1111/biom.12231>]. Basically, the package gives exact inference methods for one- and two-sample exact inferences for Kaplan-Meier curves (e.g., generalizing Fisher's exact test to allow for right censoring), which are especially important for latter parts of the survival curve, small sample sizes or heavily censored data. Includes mid-p options.
Maintained by Michael P. Fay. Last updated 3 years ago.
0.5 match 2.97 score 3 dependents
cran
blockmodeling:Generalized and Classical Blockmodeling of Valued Networks
This is primarily meant as an implementation of generalized blockmodeling for valued networks. In addition, measures of similarity or dissimilarity based on structural equivalence and regular equivalence (REGE algorithms) can be computed and partitioned matrices can be plotted: Žiberna (2007)<doi:10.1016/j.socnet.2006.04.002>, Žiberna (2008)<doi:10.1080/00222500701790207>, Žiberna (2014)<doi:10.1016/j.socnet.2014.04.002>.
Maintained by Aleš Žiberna. Last updated 2 years ago.
0.5 match 2.78 score 12 dependents
traminer
TraMineRextras:TraMineR Extension
Collection of ancillary functions and utilities to be used in conjunction with the 'TraMineR' package for sequence data exploration. Includes, among others, specific functions such as state survival plots, position-wise group-typical states, dynamic sequence indicators, and dissimilarities between event sequences. Also includes contributions by non-members of the TraMineR team such as methods for polyadic data and for the comparison of groups of sequences.
Maintained by Gilbert Ritschard. Last updated 7 months ago.
0.5 match 2.43 score 89 scripts 1 dependents
patnik
CIM:Compositional Impact of Migration
Produces statistical indicators of the impact of migration on the socio-demographic composition of an area. Three measures can be used: ratios, percentages and the Duncan index of dissimilarity. The input data files are assumed to be in an origin-destination matrix format, with each cell representing a flow count between an origin and a destination area. Columns are expected to represent origins, and rows are expected to represent destinations. The first row and column are assumed to contain labels for each area. See Rodriguez-Vignoli and Rowe (2018) <doi:10.1080/00324728.2017.1416155> for technical details.
Maintained by Nikos Patias. Last updated 6 years ago.
0.5 match 2.00 score 4 scripts
pmair78
semds:Structural Equation Multidimensional Scaling
Fits a structural equation multidimensional scaling (SEMDS) model for asymmetric and three-way input dissimilarities. It assumes that the dissimilarities are measured with errors. The latent dissimilarities are estimated as factor scores within an SEM framework while the objects are represented in a low-dimensional space as in MDS.
Maintained by Patrick Mair. Last updated 6 years ago.
1.0 match 1.00 score 7 scripts
cran
fechner:Fechnerian Scaling of Discrete Object Sets
Functions and example datasets for Fechnerian scaling of discrete object sets. Users can compute Fechnerian distances among objects representing subjective dissimilarities, and other related information. See package?fechner for an overview.
Maintained by Ali Uenlue. Last updated 9 years ago.
0.5 match 1.00 score 9 scripts
yushushi
aPCoA:Covariate Adjusted PCoA Plot
In fields such as ecology, microbiology, and genomics, non-Euclidean distances are widely applied to describe pairwise dissimilarity between samples. Given these pairwise distances, principal coordinates analysis (PCoA) is commonly used to construct a visualization of the data. However, confounding covariates can make patterns related to the scientific question of interest difficult to observe. We provide 'aPCoA' as an easy-to-use tool to improve data visualization in this context, enabling enhanced presentation of the effects of interest. Details are described in Yushu Shi, Liangliang Zhang, Kim-Anh Do, Christine Peterson and Robert Jenq (2020) Bioinformatics, Volume 36, Issue 13, 4099-4101.
Maintained by Yushu Shi. Last updated 3 years ago.
0.5 match 1.00 score 3 scripts