Showing 109 of total 109 results
ncchung
jaccard:Testing similarity between binary datasets using Jaccard/Tanimoto coefficients
Calculate statistical significance of Jaccard/Tanimoto similarity coefficients.
Maintained by Neo Christopher Chung. Last updated 5 years ago.
binary-data hypothesis-testing jaccard similarity statistics tanimoto cpp
96.5 match 5 stars 5.03 score 85 scripts
drostlab
philentropy:Similarity and Distance Quantification Between Probability Functions
Computes 46 optimized distance and similarity measures for comparing probability functions (Drost (2018) <doi:10.21105/joss.00765>). These comparisons between probability functions have their foundations in a broad range of scientific disciplines from mathematics to ecology. The aim of this package is to provide a core framework for clustering, classification, statistical inference, goodness-of-fit, non-parametric statistics, information theory, and machine learning tasks that are based on comparing univariate or multivariate probability functions.
Maintained by Hajk-Georg Drost. Last updated 4 months ago.
distance-measures distance-quantification information-theory jensen-shannon-divergence parametric-distributions similarity-measures statistics cpp
5.2 match 137 stars 12.44 score 484 scripts 24 dependents
stuart-lab
Signac:Analysis of Single-Cell Chromatin Data
A framework for the analysis and exploration of single-cell chromatin data. The 'Signac' package contains functions for quantifying single-cell chromatin data, computing per-cell quality control metrics, dimension reduction and normalization, visualization, and DNA sequence motif analysis. Reference: Stuart et al. (2021) <doi:10.1038/s41592-021-01282-5>.
Maintained by Tim Stuart. Last updated 7 months ago.
atac bioinformatics single-cell zlib cpp
5.1 match 355 stars 12.18 score 3.7k scripts 1 dependents
eagerai
fastai:Interface to 'fastai'
The 'fastai' <https://docs.fast.ai/index.html> library simplifies training fast and accurate neural networks using modern best practices. It is based on research into deep learning best practices undertaken at 'fast.ai', including 'out of the box' support for vision, text, tabular, audio, time series, and collaborative filtering models.
Maintained by Turgut Abdullayev. Last updated 12 months ago.
audio collaborative-filtering darknet darknet-image-classification fastai medical object-detection tabular text vision
6.6 match 118 stars 9.40 score 76 scripts
thie1e
cutpointr:Determine and Evaluate Optimal Cutpoints in Binary Classification Tasks
Estimate cutpoints that optimize a specified metric in binary classification tasks and validate performance using bootstrapping. Some methods for more robust cutpoint estimation are supported, e.g. a parametric method assuming normal distributions, bootstrapped cutpoints, and smoothing of the metric values per cutpoint using Generalized Additive Models. Various plotting functions are included. For an overview of the package see Thiele and Hirschfeld (2021) <doi:10.18637/jss.v098.i11>.
Maintained by Christian Thiele. Last updated 4 months ago.
bootstrapping cutpoint-optimization roc-curve cpp
5.3 match 88 stars 10.44 score 322 scripts 1 dependents
serkor1
SLmetrics:Machine Learning Performance Evaluation on Steroids
Performance evaluation metrics for supervised and unsupervised machine learning, statistical learning and artificial intelligence applications. Core computations are implemented in 'C++' for scalability and efficiency.
Maintained by Serkan Korkmaz. Last updated 4 days ago.
cpp data-analysis data-science eigen3 machine-learning performance-metrics rcpp rcppeigen statistics supervised-learning cpp openmp
7.8 match 22 stars 6.56 score
bernd-mueller
epos:Epilepsy Ontologies' Similarities
Analysis and visualization of similarities between epilepsy ontologies based on text mining results by comparing ranked lists of co-occurring drug terms in the BioASQ corpus. The ranked result lists of neurological drug terms co-occurring with terms from the epilepsy ontologies EpSO, ESSO, EPILONT, EPISEM and FENICS undergo further analysis. The source data to create the ranked lists of drug names is produced using the text mining workflows described in Mueller, Bernd and Hagelstein, Alexandra (2016) <doi:10.4126/FRL01-006408558>, Mueller, Bernd et al. (2017) <doi:10.1007/978-3-319-58694-6_22>, Mueller, Bernd and Rebholz-Schuhmann, Dietrich (2020) <doi:10.1007/978-3-030-43887-6_52>, and Mueller, Bernd et al. (2022) <doi:10.1186/s13326-021-00258-w>.
Maintained by Bernd Mueller. Last updated 1 years ago.
11.8 match 4.03 score 53 scripts
igraph
igraph:Network Analysis and Visualization
Routines for simple graphs and network analysis. It can handle large graphs very well and provides functions for generating random and regular graphs, graph visualization, centrality methods and much more.
Maintained by Kirill Müller. Last updated 6 hours ago.
complex-networks graph-algorithms graph-theory mathematics network-analysis network-graph fortran libxml2 glpk openblas cpp
2.3 match 584 stars 21.13 score 31k scripts 1.9k dependents
pboutros
bedr:Genomic Region Processing using Tools Such as 'BEDTools', 'BEDOPS' and 'Tabix'
Genomic region processing using open-source command-line tools such as 'BEDTools', 'BEDOPS' and 'Tabix'. These tools offer scalable and efficient utilities to perform genome arithmetic, e.g. indexing, formatting and merging. The bedr API enhances access to these tools and offers additional utilities for genomic region processing.
Maintained by Paul C. Boutros. Last updated 6 years ago.
8.6 match 4.98 score 264 scripts 2 dependents
mw201608
SuperExactTest:Exact Test and Visualization of Multi-Set Intersections
Identification of sets of objects with shared features is a common operation in all disciplines. Analysis of intersections among multiple sets is fundamental for in-depth understanding of their complex relationships. This package implements a theoretical framework for efficient computation of statistical distributions of multi-set intersections based upon combinatorial theory, and provides multiple scalable techniques for visualizing the intersection statistics. The statistical algorithm behind this package was published in Wang et al. (2015) <doi:10.1038/srep16923>.
Maintained by Minghui Wang. Last updated 1 years ago.
intersection set statistics visualization
5.3 match 28 stars 7.47 score 70 scripts 1 dependents
ms609
TreeDist:Calculate and Map Distances Between Phylogenetic Trees
Implements measures of tree similarity, including information-based generalized Robinson-Foulds distances (Phylogenetic Information Distance, Clustering Information Distance, Matching Split Information Distance; Smith 2020) <doi:10.1093/bioinformatics/btaa614>; Jaccard-Robinson-Foulds distances (Bocker et al. 2013) <doi:10.1007/978-3-642-40453-5_13>, including the Nye et al. (2006) metric <doi:10.1093/bioinformatics/bti720>; the Matching Split Distance (Bogdanowicz & Giaro 2012) <doi:10.1109/TCBB.2011.48>; Maximum Agreement Subtree distances; the Kendall-Colijn (2016) distance <doi:10.1093/molbev/msw124>, and the Nearest Neighbour Interchange (NNI) distance, approximated per Li et al. (1996) <doi:10.1007/3-540-61332-3_168>. Includes tools for visualizing mappings of tree space (Smith 2022) <doi:10.1093/sysbio/syab100>, for identifying islands of trees (Silva and Wilkinson 2021) <doi:10.1093/sysbio/syab015>, for calculating the median of sets of trees, and for computing the information content of trees and splits.
Maintained by Martin R. Smith. Last updated 2 months ago.
phylogenetics tree-distance phylogenetic-trees tree-distances trees cpp
3.6 match 32 stars 10.32 score 97 scripts 5 dependents
c0webster
fedmatch:Fast, Flexible, and User-Friendly Record Linkage Methods
Provides a flexible set of tools for matching two un-linked data sets. 'fedmatch' allows for three ways to match data: exact matches, fuzzy matches, and multi-variable matches. It also allows an easy combination of these three matches via the tier matching function.
Maintained by Chris Webster. Last updated 2 months ago.
7.5 match 1 stars 4.61 score 80 scripts
snoweye
EMCluster:EM Algorithm for Model-Based Clustering of Finite Mixture Gaussian Distribution
EM algorithms and several efficient initialization methods for model-based clustering of finite mixture Gaussian distribution with unstructured dispersion in both unsupervised and semi-supervised learning.
Maintained by Wei-Chen Chen. Last updated 7 months ago.
4.5 match 18 stars 7.53 score 123 scripts 2 dependents
chrhennig
prabclus:Functions for Clustering and Testing of Presence-Absence, Abundance and Multilocus Genetic Data
Distance-based parametric bootstrap tests for clustering with spatial neighborhood information. Some distance measures, clustering of presence-absence, abundance and multilocus genetic data for species delimitation, nearest neighbor based noise detection. Genetic distances between communities. Tests whether various distance-based regressions are equal. Try package?prabclus for an overview.
Maintained by Christian Hennig. Last updated 6 months ago.
5.3 match 1 stars 6.07 score 90 scripts 70 dependents
koheiw
proxyC:Computes Proximity in Large Sparse Matrices
Computes proximity between rows or columns of large matrices efficiently in C++. Functions are optimised for large sparse matrices using the Armadillo and Intel TBB libraries. Among various built-in similarity/distance measures, computation of correlation, cosine similarity and Euclidean distance is particularly fast.
Maintained by Kohei Watanabe. Last updated 5 months ago.
data-science distance-measures similarity-measures openblas onetbb cpp
3.5 match 28 stars 8.90 score 23 scripts 33 dependents
mlr-org
mlr3:Machine Learning in R - Next Generation
Efficient, object-oriented programming on the building blocks of machine learning. Provides 'R6' objects for tasks, learners, resamplings, and measures. The package is geared towards scalability and larger datasets by supporting parallelization and out-of-memory data-backends like databases. While 'mlr3' focuses on the core computational operations, add-on packages provide additional functionality.
Maintained by Marc Becker. Last updated 20 days ago.
classification data-science machine-learning mlr3 regression
2.0 match 972 stars 14.86 score 2.3k scripts 35 dependents
bioc
scp:Mass Spectrometry-Based Single-Cell Proteomics Data Analysis
Utility functions for manipulating, processing, and analyzing mass spectrometry-based single-cell proteomics data. The package is an extension to the 'QFeatures' package and relies on 'SingleCellExperiment' to enable single-cell proteomics analyses. The package offers the user the functionality to process quantitative tables (as generated by MaxQuant, Proteome Discoverer, and more) into data tables ready for downstream analysis and data visualization.
Maintained by Christophe Vanderaa. Last updated 1 months ago.
geneexpression proteomics singlecell massspectrometry preprocessing cellbasedassays bioconductor mass-spectrometry single-cell software
3.2 match 26 stars 8.95 score 115 scripts
zpneal
backbone:Extracts the Backbone from Graphs
An implementation of methods for extracting an unweighted unipartite graph (i.e. a backbone) from an unweighted unipartite graph, a weighted unipartite graph, the projection of an unweighted bipartite graph, or the projection of a weighted bipartite graph (Neal, 2022 <doi:10.1371/journal.pone.0269137>).
Maintained by Zachary Neal. Last updated 1 years ago.
4.0 match 41 stars 7.06 score 31 scripts 2 dependents
rich-iannone
DiagrammeR:Graph/Network Visualization
Build graph/network structures using functions for stepwise addition and deletion of nodes and edges. Work with data available in tables for bulk addition of nodes, edges, and associated metadata. Use graph selections and traversals to apply changes to specific nodes or edges. A wide selection of graph algorithms allows for the analysis of graphs. Visualize the graphs and take advantage of any aesthetic properties assigned to nodes and edges.
Maintained by Richard Iannone. Last updated 2 months ago.
graph graph-functions network-graph property-graph visualization
1.8 match 1.7k stars 15.29 score 3.8k scripts 86 dependents
bioc
PIUMA:Phenotypes Identification Using Mapper from topological data Analysis
The PIUMA package offers a tidy pipeline of Topological Data Analysis frameworks to identify and characterize communities in high-dimensional and heterogeneous data.
Maintained by Mattia Chiesa. Last updated 5 months ago.
clustering graphandnetwork dimensionreduction network classification
5.2 match 4 stars 5.08 score 2 scripts
beniaminogreen
zoomerjoin:Superlatively Fast Fuzzy Joins
Empowers users to fuzzily-merge data frames with millions or tens of millions of rows in minutes with low memory usage. The package uses the locality sensitive hashing algorithms developed by Datar, Immorlica, Indyk and Mirrokni (2004) <doi:10.1145/997817.997857>, and Broder (1998) <doi:10.1109/SEQUEN.1997.666900> to avoid having to compare every pair of records in each dataset, resulting in fuzzy-merges that finish in linear time.
Maintained by Beniamino Green. Last updated 2 months ago.
blazinglyfastfuzzyjoinjoinrustzoomercargo
3.5 match 102 stars 7.31 score 11 scripts
cran
fossil:Palaeoecological and Palaeogeographical Analysis Tools
A set of analytical tools useful in analysing ecological and geographical data sets, both ancient and modern. The package includes functions for estimating species richness (Chao 1 and 2, ACE, ICE, Jackknife), shared species/beta diversity, species area curves and geographic distances and areas.
Maintained by Matthew J. Vavrek. Last updated 5 years ago.
7.3 match 1 stars 3.44 score 7 dependents
jarioksa
natto:An Extreme 'vegan' Package of Experimental Code
Random code that is too experimental or too weird to be included in the vegan package.
Maintained by Jari Oksanen. Last updated 1 months ago.
5.2 match 8 stars 4.68 score 1 scripts
djvanderlaan
reclin2:Record Linkage Toolkit
Functions to assist in performing probabilistic record linkage and deduplication: generating pairs, comparing records, EM algorithm for estimating m- and u-probabilities (I. Fellegi & A. Sunter (1969) <doi:10.1080/01621459.1969.10501049>, T.N. Herzog, F.J. Scheuren, & W.E. Winkler (2007), "Data Quality and Record Linkage Techniques", ISBN:978-0-387-69502-0), forcing one-to-one matching. Can also be used for pre- and post-processing for machine learning methods for record linkage. Focus is on memory, CPU performance and flexibility.
Maintained by Jan van der Laan. Last updated 1 years ago.
3.3 match 43 stars 7.36 score 89 scripts 1 dependents
bioc
HarmonizR:Handles missing values and makes more data available
An implementation that takes input data and makes it available for proper batch effect removal by ComBat or limma. The implementation appropriately handles missing values by dissecting the input matrix into smaller matrices with sufficient data to feed the ComBat or limma algorithm. The adjusted data is returned to the user as a rebuilt matrix. The implementation is meant to make as much data available as possible with minimal data loss.
Maintained by Simon Schlumbohm. Last updated 5 months ago.
5.8 match 4.20 score 16 scripts
anespinosa
netmem:Social Network Measures using Matrices
Measures to describe and manipulate networks using matrices.
Maintained by Alejandro Espinosa-Rada. Last updated 21 days ago.
matrices multilayer-networks network-analysis network-science sna social-network social-network-analysis sociology
5.6 match 11 stars 4.33 score 13 scripts
elies-ramon
kerntools:Kernel Functions and Tools for Machine Learning Applications
Kernel functions for diverse types of data (including, but not restricted to: nonnegative and real vectors, real matrices, categorical and ordinal variables, sets, strings), plus other utilities like kernel similarity, kernel Principal Components Analysis (PCA) and features' importance for Support Vector Machines (SVMs), which expand other 'R' packages like 'kernlab'.
Maintained by Elies Ramon. Last updated 1 days ago.
4.8 match 1 stars 4.86 score 12 scripts
ocbe-uio
DIscBIO:A User-Friendly Pipeline for Biomarker Discovery in Single-Cell Transcriptomics
An open, multi-algorithmic pipeline for easy, fast and efficient analysis of cellular sub-populations and the molecular signatures that characterize them. The pipeline consists of four successive steps: data pre-processing, cellular clustering with pseudo-temporal ordering, defining differentially expressed genes and biomarker identification. More details in Ghannoum et al. (2021) <doi:10.3390/ijms22031399>. This package implements extensions of the work published by Ghannoum et al. (2019) <doi:10.1101/700989>.
Maintained by Waldir Leoncio. Last updated 1 years ago.
biomarker-discovery jupyter-notebook scrna-seq single-cell-analysis transcriptomics openjdk
5.3 match 12 stars 4.38 score 5 scripts
kylebittinger
abdiv:Alpha and Beta Diversity Measures
A collection of measures for measuring ecological diversity. Ecological diversity comes in two flavors: alpha diversity measures the diversity within a single site or sample, and beta diversity measures the diversity across two sites or samples. This package overlaps considerably with other R packages such as 'vegan', 'gUniFrac', 'betapart', and 'fossil'. We also include a wide range of functions that are implemented in software outside the R ecosystem, such as 'scipy', 'Mothur', and 'scikit-bio'. The implementations here are designed to be basic and clear to the reader.
Maintained by Kyle Bittinger. Last updated 1 years ago.
5.2 match 9 stars 4.14 score 31 scripts
bioc
GeDi:Defining and visualizing the distances between different genesets
The package provides different distance measurements to calculate the difference between genesets. Based on these scores the genesets are clustered and visualized as a graph. This is all presented in an interactive Shiny application for easy usage.
Maintained by Annekathrin Nedwed. Last updated 5 months ago.
gui genesetenrichment software transcription rnaseq visualization clustering pathways reportwriting go kegg reactome shinyapps
3.9 match 1 stars 5.36 score 22 scripts
bzhanglab
WebGestaltR:Gene Set Analysis Toolkit WebGestaltR
The web version WebGestalt <https://www.webgestalt.org> supports 12 organisms, 354 gene identifiers and 321,251 function categories. Users can upload the data and functional categories with their own gene identifiers. In addition to the Over-Representation Analysis, WebGestalt also supports Gene Set Enrichment Analysis and Network Topology Analysis. The user-friendly output report allows interactive and efficient exploration of enrichment results. The WebGestaltR package not only supports all of the above functions but can also be integrated into other pipelines or analyze multiple gene lists simultaneously.
Maintained by John Elizarraras. Last updated 5 days ago.
2.3 match 35 stars 9.18 score 180 scripts
lixiangzhang
OTclust:Mean Partition, Uncertainty Assessment, Cluster Validation and Visualization Selection for Cluster Analysis
Provides mean partition for ensemble clustering by optimal transport alignment (OTA), uncertainty measures for both partition-wise and cluster-wise assessment and multiple visualization functions to show uncertainty, for instance, membership heat map and plot of covering point set. A partition refers to an overall clustering result. Jia Li, Beomseok Seo, and Lin Lin (2019) <doi:10.1002/sam.11418>. Lixiang Zhang, Lin Lin, and Jia Li (2020) <doi:10.1093/bioinformatics/btaa165>.
Maintained by Lixiang Zhang. Last updated 1 years ago.
5.3 match 3.70 score 6 scripts
bioc
demuxSNP:scRNAseq demultiplexing using cell hashing and SNPs
This package assists in demultiplexing scRNAseq data using both cell hashing and SNPs data. The SNP profile of each group is learned using high confidence assignments from the cell hashing data. Cells which cannot be assigned with high confidence from the cell hashing data are assigned to their most similar group based on their SNPs. We also provide some helper functions to optimise SNP selection, create training data and merge SNP data into the SingleCellExperiment framework.
Maintained by Michael Lynch. Last updated 5 months ago.
3.5 match 6 stars 5.52 score 22 scripts
chrhennig
fpc:Flexible Procedures for Clustering
Various methods for clustering and cluster validation. Fixed point clustering. Linear regression clustering. Clustering by merging Gaussian mixture components. Symmetric and asymmetric discriminant projections for visualisation of the separation of groupings. Cluster validation statistics for distance based clustering including corrected Rand index. Standardisation of cluster validation statistics by random clusterings and comparison between many clustering methods and numbers of clusters based on this. Cluster-wise cluster stability assessment. Methods for estimation of the number of clusters: Calinski-Harabasz, Tibshirani and Walther's prediction strength, Fang and Wang's bootstrap stability. Gaussian/multinomial mixture fitting for mixed continuous/categorical variables. Variable-wise statistics for cluster interpretation. DBSCAN clustering. Interface functions for many clustering methods implemented in R, including estimating the number of clusters with kmeans, pam and clara. Modality diagnosis for Gaussian mixtures. For an overview see package?fpc.
Maintained by Christian Hennig. Last updated 6 months ago.
1.9 match 11 stars 9.32 score 2.6k scripts 69 dependents
bioc
ClustAll:ClustAll: Data driven strategy to robustly identify stratification of patients within complex diseases
Data driven strategy to find hidden groups of patients with complex diseases using clinical data. ClustAll facilitates the unsupervised identification of multiple robust stratifications. ClustAll is able to overcome the most common limitations found when dealing with clinical data (missing values, correlated data, mixed data types).
Maintained by Asier Ortega-Legarreta. Last updated 5 months ago.
software statisticalmethod clustering dimensionreduction principalcomponent
4.7 match 3.70 score 1 scripts
rnabioco
valr:Genome Interval Arithmetic
Read and manipulate genome intervals and signals. Provides functionality similar to command-line tool suites within R, enabling interactive analysis and visualization of genome-scale data. Riemondy et al. (2017) <doi:10.12688/f1000research.11997.1>.
Maintained by Kent Riemondy. Last updated 23 days ago.
bedtools genome interval-arithmetic cpp
1.8 match 90 stars 9.69 score 227 scripts
bblonder
hypervolume:High Dimensional Geometry, Set Operations, Projection, and Inference Using Kernel Density Estimation, Support Vector Machines, and Convex Hulls
Estimates the shape and volume of high-dimensional datasets and performs set operations: intersection / overlap, union, unique components, inclusion test, and hole detection. Uses a stochastic geometry approach to high-dimensional kernel density estimation, support vector machine delineation, and convex hull generation. Applications include modeling trait and niche hypervolumes and species distribution modeling.
Maintained by Benjamin Blonder. Last updated 2 months ago.
1.7 match 23 stars 9.69 score 211 scripts 7 dependents
bioc
clustifyr:Classifier for Single-cell RNA-seq Using Cell Clusters
Package designed to aid in classifying cells from single-cell RNA sequencing data using external reference data (e.g., bulk RNA-seq, scRNA-seq, microarray, gene lists). A variety of correlation based methods and gene list enrichment methods are provided to assist cell type assignment.
Maintained by Rui Fu. Last updated 5 months ago.
singlecell annotation sequencing microarray geneexpression assign-identities clusters marker-genes rna-seq single-cell-rna-seq
1.7 match 120 stars 9.63 score 296 scripts
core-bioinformatics
bulkAnalyseR:Interactive Shiny App for Bulk Sequencing Data
Given an expression matrix from a bulk sequencing experiment, pre-processes it and creates a shiny app for interactive data analysis and visualisation. The app contains quality checks, differential expression analysis, volcano and cross plots, enrichment analysis and gene regulatory network inference, and can be customised to contain more panels by the user.
Maintained by Ilias Moutsopoulos. Last updated 1 years ago.
3.4 match 27 stars 4.47 score 11 scripts
bioc
GeneTonic:Enjoy Analyzing And Integrating The Results From Differential Expression Analysis And Functional Enrichment Analysis
This package provides functionality to combine the existing pieces of the transcriptome data and results, making it easier to generate insightful observations and hypotheses. Its usage is made easy with a Shiny application, combining the benefits of interactivity and reproducibility e.g. by capturing the features and gene sets of interest highlighted during the live session, and creating an HTML report as an artifact where text, code, and output coexist. Using the GeneTonicList as a standardized container for all the required components, it is possible to simplify the generation of multiple visualizations and summaries.
Maintained by Federico Marini. Last updated 3 months ago.
gui geneexpression software transcription transcriptomics visualization differentialexpression pathways reportwriting genesetenrichment annotation go shinyapps bioconductor bioconductor-package data-exploration data-visualization functional-enrichment-analysis gene-expression pathway-analysis reproducible-research rna-seq-analysis rna-seq-data shiny transcriptome user-friendly
1.8 match 77 stars 8.28 score 37 scripts 1 dependents
helixcn
spaa:SPecies Association Analysis
Miscellaneous functions for analysing species association and niche overlap.
Maintained by Jinlong Zhang. Last updated 4 years ago.
2.0 match 12 stars 7.40 score 155 scripts 1 dependents
cran
clv:Cluster Validation Techniques
The package contains most of the popular internal and external cluster validation methods, ready to use for most of the outputs produced by functions from the package "cluster". The package also contains functions and usage examples for the cluster stability approach, which can be applied to algorithms implemented in the "cluster" package as well as to user-defined clustering algorithms.
Maintained by Lukasz Nieweglowski. Last updated 2 years ago.
3.9 match 1 stars 3.50 score 17 dependents
mlampros
textTinyR:Text Processing for Small or Big Data Files
It offers functions for splitting, parsing, tokenizing and creating a vocabulary for big text data files. Moreover, it includes functions for building a document-term matrix and extracting information from those (term-associations, most frequent terms). It also embodies functions for calculating token statistics (collocations, look-up tables, string dissimilarities) and functions to work with sparse matrices. Lastly, it includes functions for Word Vector Representations (i.e. 'GloVe', 'fasttext') and incorporates functions for the calculation of (pairwise) text document dissimilarities. The source code is based on 'C++11' and exported in R through the 'Rcpp', 'RcppArmadillo' and 'BH' packages.
Maintained by Lampros Mouselimis. Last updated 1 years ago.
bh boost cpp11 processing rcpp rcpparmadillo text openblas cpp openmp
1.8 match 39 stars 7.64 score 244 scripts 1 dependents
january3
tmod:Feature Set Enrichment Analysis for Metabolomics and Transcriptomics
Methods and feature set definitions for feature or gene set enrichment analysis in transcriptional and metabolic profiling data. Package includes tests for enrichment based on ranked lists of features, functions for visualisation and multivariate functional analysis. See Zyla et al (2019) <doi:10.1093/bioinformatics/btz447>.
Maintained by January Weiner. Last updated 2 months ago.
2.0 match 3 stars 6.88 score 168 scripts 1 dependents
blansche
fdm2id:Data Mining and R Programming for Beginners
Contains functions to simplify the use of data mining methods (classification, regression, clustering, etc.), for students and beginners in R programming. Various R packages are used and wrappers are built around the main functions, to standardize the use of data mining methods (input/output): it brings a certain loss of flexibility, but also a gain of simplicity. The package name came from the French "Fouille de Données en Master 2 Informatique Décisionnelle".
Maintained by Alexandre Blansché. Last updated 2 years ago.
8.5 match 1 stars 1.62 score 42 scripts
ozvan
Ravages:Rare Variant Analysis and Genetic Simulations
Rare variant association tests: burden tests (Bocher et al. 2019 <doi:10.1002/gepi.22210>) and the Sequence Kernel Association Test (Bocher et al. 2021 <doi:10.1038/s41431-020-00792-8>) in the whole genome; and genetic simulations.
Maintained by Ozvan Bocher. Last updated 2 years ago.
5.6 match 2.30 score 2 scripts
bnosac
textrank:Summarize Text by Ranking Sentences and Finding Keywords
The 'textrank' algorithm is an extension of the 'Pagerank' algorithm for text. The algorithm summarizes text by calculating how sentences are related to one another. This is done by looking at overlapping terminology used in sentences in order to set up links between sentences. The resulting sentence network is then plugged into the 'Pagerank' algorithm, which identifies the most important sentences in your text and ranks them. In a similar way 'textrank' can also be used to extract keywords. A word network is constructed by checking whether words follow one another. On top of that network the 'Pagerank' algorithm is applied to extract relevant words, after which relevant words that follow one another are combined to get keywords. More information can be found in the paper from Mihalcea, Rada & Tarau, Paul (2004) <https://www.aclweb.org/anthology/W04-3252/>.
Maintained by Jan Wijffels. Last updated 4 years ago.
natural-language-processing nlp textrank textrank-algorithm
1.7 match 77 stars 7.38 score 103 scripts 2 dependents
bommert
stabm:Stability Measures for Feature Selection
An implementation of many measures for the assessment of the stability of feature selection. Both simple measures and measures which take into account the similarities between features are available, see Bommert (2020) <doi:10.17877/DE290R-21906>.
Maintained by Andrea Bommert. Last updated 2 years ago.
2.0 match 6 stars 6.29 score 33 scripts 3 dependents
nanne-aben
iTOP:Inferring the Topology of Omics Data
Infers a topology of relationships between different datasets, such as multi-omics and phenotypic data recorded on the same samples. We based this methodology on the RV coefficient (Robert & Escoufier, 1976, <doi:10.2307/2347233>), a measure of matrix correlation, which we have extended for partial matrix correlations and binary data (Aben et al., 2018, <doi:10.1101/293993>).
Maintained by Nanne Aben. Last updated 7 years ago.
5.6 match 2.23 score 17 scripts
bioc
TMSig:Tools for Molecular Signatures
The TMSig package contains tools to prepare, analyze, and visualize named lists of sets, with an emphasis on molecular signatures (such as gene or kinase sets). It includes fast, memory efficient functions to construct sparse incidence and similarity matrices and filter, cluster, invert, and decompose sets. Additionally, bubble heatmaps can be created to visualize the results of any differential or molecular signatures analysis.
Maintained by Tyler Sagendorf. Last updated 5 months ago.
clustering genesetenrichment graphandnetwork pathways visualization gene-sets molecular-signatures
2.2 match 4 stars 5.58 score 4 scripts
mwsill
s4vd:Biclustering via Sparse Singular Value Decomposition Incorporating Stability Selection
The main function s4vd() performs a biclustering via sparse singular value decomposition with a nested stability selection. The result is a biclust object and thus all methods of the biclust package can be applied.
Maintained by Martin Sill. Last updated 5 years ago.
2.3 match 4 stars 5.31 score 17 scripts 2 dependents
tesselle
tabula:Analysis and Visualization of Archaeological Count Data
An easy way to examine archaeological count data. This package provides several tests and measures of diversity: heterogeneity and evenness (Brillouin, Shannon, Simpson, etc.), richness and rarefaction (Chao1, Chao2, ACE, ICE, etc.), turnover and similarity (Brainerd-Robinson, etc.). It makes it easy to visualize count data and statistical thresholds: rank vs abundance plots, heatmaps, Ford (1962) and Bertin (1977) diagrams, etc.
Maintained by Nicolas Frerebeau. Last updated 16 hours ago.
data-visualizationarchaeologyarchaeological-science
2.3 match 5.14 score 38 scripts 1 dependents
larssnip
micropan:Microbial Pan-Genome Analysis
A collection of functions for computations and visualizations of microbial pan-genomes.
Maintained by Lars Snipen. Last updated 3 years ago.
1.9 match 21 stars 6.15 score 67 scripts
bioc
SynExtend:Tools for Working With Synteny Objects
Shared order between genomic sequences provides a great deal of information. Synteny objects produced by the R package DECIPHER provide quantitative information about that shared order. SynExtend provides tools for extracting information from Synteny objects.
Maintained by Nicholas Cooley. Last updated 18 days ago.
geneticsclusteringcomparativegenomicsdataimportfortranopenmp
1.8 match 1 stars 6.42 score 77 scripts
deisygysi
NetSci:Calculates Basic Network Measures Commonly Used in Network Medicine
Calculates network measures commonly used in Network Medicine. Measures such as the Largest Connected Component, the Relative Largest Connected Component, Proximity and Separation are calculated along with their statistical significance. Significance can be computed using either degree-preserving or non-degree-preserving randomization.
Maintained by Deisy Morselli Gysi. Last updated 6 months ago.
6.6 match 1.70 score 9 scripts
bioc
CNVMetrics:Copy Number Variant Metrics
The CNVMetrics package calculates similarity metrics to facilitate copy number variant comparison among samples and/or methods. Similarity metrics can be employed to compare CNV profiles of genetically unrelated samples as well as those with a common genetic background. Some metrics are based on the shared amplified/deleted regions while other metrics rely on the level of amplification/deletion. The data type used as input is a plain text file containing the genomic position of the copy number variations, as well as the status and/or the log2 ratio values. Finally, a visualization tool is provided to explore resulting metrics.
Maintained by Astrid Deschênes. Last updated 5 months ago.
biologicalquestionsoftwarecopynumbervariationcnvcopy-number-variationmetricsr-language
2.2 match 4 stars 5.08 score 8 scripts
bioc
omicsViewer:Interactive and explorative visualization of SummarizedExperiment or ExpressionSet using omicsViewer
omicsViewer visualizes an ExpressionSet (or SummarizedExperiment) interactively. It has separate back and front ends. In the back end, users prepare an ExpressionSet that contains all the information needed for downstream data interpretation. Some extra requirements are imposed on the headers of the phenotype and feature data so that the front end can clearly recognize the provided information while keeping modifications to the existing ExpressionSet object to a minimum. The pure dependency on R/Bioconductor guarantees maximum flexibility for statistical analysis in the back end. Once the ExpressionSet is prepared, it can be visualized using the front end, implemented with shiny and plotly. Both features and samples can be selected from (data) tables or graphs (scatter plot/heatmap). Different types of analyses, such as enrichment analysis (using the Bioconductor package fgsea or Fisher's exact test) and STRING network analysis, are performed on the fly, and the results are visualized simultaneously. When a subset of samples and a phenotype variable are selected, a significance test on means (t-test or rank-based test, when the phenotype variable is quantitative) or a test of independence (chi-square or Fisher's exact test, when the phenotype data are categorical) is performed to test the association between the phenotype of interest and the selected samples. Other analyses can easily be added as extra shiny modules. omicsViewer therefore greatly facilitates data exploration: many different hypotheses can be explored in a short time without knowledge of R. The resulting data can easily be shared using a shiny server. Alternatively, a standalone version of omicsViewer, together with designated omics data, can be created by integrating it with portable R and then shared with collaborators or submitted as supplementary data with a manuscript.
Maintained by Chen Meng. Last updated 2 months ago.
softwarevisualizationgenesetenrichmentdifferentialexpressionmotifdiscoverynetworknetworkenrichment
1.9 match 4 stars 5.82 score 22 scripts
kit-iism-em
partitionComparison:Implements Measures for the Comparison of Two Partitions
Provides several measures ((dis)similarity, distance/metric, correlation, entropy) for comparing two partitions of the same set of objects. The different measures can be assigned to three different classes: Pair comparison (containing the famous Jaccard and Rand indices), set based, and information theory based. Many of the implemented measures can be found in Albatineh AN, Niewiadomska-Bugaj M and Mihalko D (2006) <doi:10.1007/s00357-006-0017-z> and Meila M (2007) <doi:10.1016/j.jmva.2006.11.013>. Partitions are represented by vectors of class labels which allow a straightforward integration with existing clustering algorithms (e.g. kmeans()). The package is mostly based on the S4 object system.
Maintained by Fabian Ball. Last updated 2 years ago.
comparisondissimilarity-measuresdistance-measurespartitionssimilarity-measures
2.8 match 2 stars 3.78 score 60 scripts
bioc
GeneOverlap:Test and visualize gene overlaps
Test two sets of gene lists and visualize the results.
Maintained by António Miguel de Jesus Domingues, Max-Planck Institute for Cell Biology and Genetics. Last updated 5 months ago.
multiplecomparisonvisualization
1.6 match 6.46 score 266 scripts
cran
fclust:Fuzzy Clustering
Algorithms for fuzzy clustering, cluster validity indices and plots for cluster validity and visualizing fuzzy clustering results.
Maintained by Paolo Giordani. Last updated 2 years ago.
4.3 match 1 stars 2.38 score 2 dependents
josetamezpena
FRESA.CAD:Feature Selection Algorithms for Computer Aided Diagnosis
Contains a set of utilities for building and testing statistical models (linear, logistic, ordinal or Cox) for Computer Aided Diagnosis/Prognosis applications. Utilities include data adjustment, univariate analysis, model building, model validation, longitudinal analysis, reporting and visualization.
Maintained by Jose Gerardo Tamez-Pena. Last updated 2 months ago.
1.8 match 7 stars 5.59 score 31 scripts
blasbenito
distantia:Advanced Toolset for Efficient Time Series Dissimilarity Analysis
Fast C++ implementation of Dynamic Time Warping for time series dissimilarity analysis, with applications in environmental monitoring and sensor data analysis, climate science, signal processing and pattern recognition, and financial data analysis. Built upon the ideas presented in Benito and Birks (2020) <doi:10.1111/ecog.04895>, provides tools for analyzing time series of varying lengths and structures, including irregular multivariate time series. Key features include individual variable contribution analysis, restricted permutation tests for statistical significance, and imputation of missing data via GAMs. Additionally, the package provides an ample set of tools to prepare and manage time series data.
Maintained by Blas M. Benito. Last updated 1 month ago.
dissimilaritydynamic-time-warpinglock-steptime-seriescpp
1.8 match 23 stars 5.73 score 11 scripts
bioc
IsoformSwitchAnalyzeR:Identify, Annotate and Visualize Isoform Switches with Functional Consequences from both short- and long-read RNA-seq data
Analysis of alternative splicing and isoform switches with predicted functional consequences (e.g. gain/loss of protein domains etc.) from quantification of all types of RNASeq by tools such as Kallisto, Salmon, StringTie, Cufflinks/Cuffdiff etc.
Maintained by Kristoffer Vitting-Seerup. Last updated 5 months ago.
geneexpressiontranscriptionalternativesplicingdifferentialexpressiondifferentialsplicingvisualizationstatisticalmethodtranscriptomevariantbiomedicalinformaticsfunctionalgenomicssystemsbiologytranscriptomicsrnaseqannotationfunctionalpredictiongenepredictiondataimportmultiplecomparisonbatcheffectimmunooncology
1.1 match 108 stars 9.26 score 125 scripts
cadam00
prior3D:3D Prioritization Algorithm
Three-dimensional systematic conservation planning, conducting nested prioritization analyses across multiple depth levels and ensuring efficient resource allocation throughout the water column. It provides a structured workflow designed to address biodiversity conservation and management challenges in the 3 dimensions, while facilitating users’ choices and parameterization (Doxa et al. 2025 <doi:10.1016/j.ecolmodel.2024.110919>).
Maintained by Christos Adam. Last updated 2 months ago.
biodiversityconservationconservation-planningdepthmarine-spatial-planningmultidimensional-environmentsprioritization
1.7 match 6 stars 5.62 score 3 scripts
mengxu98
inferCSN:Inferring Cell-Specific Gene Regulatory Network
An R package for inferring cell-type specific gene regulatory network from single-cell RNA data.
Maintained by Meng Xu. Last updated 22 hours ago.
2.0 match 3 stars 4.80 score 6 scripts
dgrun
RaceID:Identification of Cell Types, Inference of Lineage Trees, and Prediction of Noise Dynamics from Single-Cell RNA-Seq Data
Application of 'RaceID' allows inference of cell types and prediction of lineage trees by the 'StemID2' algorithm (Herman, J.S., Sagar, Grun D. (2018) <DOI:10.1038/nmeth.4662>). 'VarID2' is part of this package and allows quantification of biological gene expression noise at single-cell resolution (Rosales-Alvarez, R.E., Rettkowski, J., Herman, J.S., Dumbovic, G., Cabezas-Wallscheid, N., Grun, D. (2023) <DOI:10.1186/s13059-023-02974-1>).
Maintained by Dominic Grün. Last updated 4 months ago.
2.0 match 4.74 score 110 scripts
bioc
FEAST:FEAture SelcTion (FEAST) for Single-cell clustering
Cell clustering is one of the most important and commonly performed tasks in single-cell RNA sequencing (scRNA-seq) data analysis. An important step in cell clustering is to select a subset of genes (referred to as “features”), whose expression patterns will then be used for downstream clustering. A good set of features should include the ones that distinguish different cell types, and the quality of such a set can have a significant impact on the clustering accuracy. FEAST is an R library for selecting the most representative features before performing the core of scRNA-seq clustering. It can be used as a plug-in for established clustering algorithms such as SC3, TSCAN, SHARP, SIMLR, and Seurat. The core of the FEAST algorithm includes three steps: 1. consensus clustering; 2. gene-level significance inference; 3. validation of an optimized feature set.
Maintained by Kenong Su. Last updated 5 months ago.
sequencingsinglecellclusteringfeatureextraction
1.6 match 10 stars 5.97 score 47 scripts
jonasrieger
ldaPrototype:Prototype of Multiple Latent Dirichlet Allocation Runs
Determine a prototype from a number of runs of Latent Dirichlet Allocation (LDA), measuring their similarities with S-CLOP: a procedure to select the LDA run with the highest mean pairwise similarity, measured by S-CLOP (Similarity of multiple sets by Clustering with Local Pruning), to all other runs. LDA runs are specified by their assignments, which lead to estimators for the distribution parameters. Repeated runs lead to different results, which we counter by choosing the most representative LDA run as the prototype.
Maintained by Jonas Rieger. Last updated 2 years ago.
latent-dirichlet-allocationldamodel-selectionmodelselectionreliabilitytext-miningtextdatatopic-modeltopic-modelstopic-similaritiestopicmodelingtopicmodelling
2.0 match 8 stars 4.44 score 23 scripts 1 dependents
bioc
epiregulon.extra:Companion package to epiregulon with additional plotting, differential and graph functions
Gene regulatory networks model the underlying gene regulation hierarchies that drive gene expression and observed phenotypes. Epiregulon infers TF activity in single cells by constructing a gene regulatory network (regulons). This is achieved through integration of scATAC-seq and scRNA-seq data and incorporation of public bulk TF ChIP-seq data. Links between regulatory elements and their target genes are established by computing correlations between chromatin accessibility and gene expressions.
Maintained by Xiaosai Yao. Last updated 13 days ago.
generegulationnetworkgeneexpressiontranscriptionchiponchipdifferentialexpressiongenetargetnormalizationgraphandnetwork
1.8 match 4.95 score 10 scripts
recon-icm
linkprediction:Link Prediction Methods
Implementations of most of the existing proximity-based methods of link prediction in graphs. Among the 20 implemented methods are e.g.: Adamic L. and Adar E. (2003) <doi:10.1016/S0378-8733(03)00009-1>, Leicht E., Holme P., Newman M. (2006) <doi:10.1103/PhysRevE.73.026120>, Zhou T. and Zhang Y (2009) <doi:10.1140/epjb/e2009-00335-8>, and Fouss F., Pirotte A., Renders J., and Saerens M. (2007) <doi:10.1109/TKDE.2007.46>.
Maintained by Michal Bojanowski. Last updated 5 months ago.
1.5 match 12 stars 5.40 score 14 scripts
markvanderloo
stringdist:Approximate String Matching, Fuzzy Text Search, and String Distance Functions
Implements an approximate string matching version of R's native 'match' function. Also offers fuzzy text search based on various string distance measures. Can calculate various string distances based on edits (Damerau-Levenshtein, Hamming, Levenshtein, optimal string alignment), qgrams (q-gram, cosine, jaccard distance) or heuristic metrics (Jaro, Jaro-Winkler). An implementation of soundex is provided as well. Distances can be computed between character vectors while taking proper care of encoding or between integer vectors representing generic sequences. This package is built for speed and runs in parallel by using 'openMP'. An API for C or C++ is exposed as well. Reference: MPJ van der Loo (2014) <doi:10.32614/RJ-2014-011>.
Maintained by Mark van der Loo. Last updated 4 months ago.
0.5 match 327 stars 15.54 score 2.0k scripts 179 dependents
bioc
BioCor:Functional similarities
Calculates functional similarities based on the pathways described on KEGG and REACTOME or in gene sets. These similarities can be calculated for pathways or gene sets, genes, or clusters and combined with other similarities. They can be used to improve networks, gene selection, testing relationships...
Maintained by Lluís Revilla Sancho. Last updated 5 months ago.
statisticalmethodclusteringgeneexpressionnetworkpathwaysnetworkenrichmentsystemsbiologybioconductor-packagesbioinformaticsfunctional-similaritygenegene-setspathway-analysissimilaritysimilarity-measurement
1.2 match 14 stars 6.47 score
bioc
PRONE:The PROteomics Normalization Evaluator
High-throughput omics data are often affected by systematic biases introduced throughout all the steps of a clinical study, from sample collection to quantification. Normalization methods aim to adjust for these biases to make the actual biological signal more prominent. However, selecting an appropriate normalization method is challenging due to the wide range of available approaches. Therefore, a comparative evaluation of unnormalized and normalized data is essential in identifying an appropriate normalization strategy for a specific data set. This R package provides different functions for preprocessing, normalizing, and evaluating different normalization approaches. Furthermore, normalization methods can be evaluated on downstream steps, such as differential expression analysis and statistical enrichment analysis. Spike-in data sets with known ground truth and real-world data sets of biological experiments acquired by either tandem mass tag (TMT) or label-free quantification (LFQ) can be analyzed.
Maintained by Lis Arend. Last updated 11 days ago.
proteomicspreprocessingnormalizationdifferentialexpressionvisualizationdata-analysisevaluation
1.7 match 2 stars 4.41 score 9 scripts
jreisner
biclustermd:Biclustering with Missing Data
Biclustering is a statistical learning technique that simultaneously partitions and clusters rows and columns of a data matrix. Since the solution space of biclustering is infeasible to search completely with current computational mechanisms, this package uses a greedy heuristic. The algorithm featured in this package is, to the best of our knowledge, the first biclustering algorithm to work on data with missing values. Li, J., Reisner, J., Pham, H., Olafsson, S., and Vardeman, S. (2020) Biclustering with Missing Data. Information Sciences, 510, 304–316.
Maintained by John Reisner. Last updated 4 years ago.
1.8 match 3 stars 4.18 score 4 scripts
lazappi
doilinker:Link Preprints And Publications By DOI
Links preprints to publications using the method described in Cabanac G, Oikonomidi T, Boutron I. "Day-to-day discovery of preprint-publication links". Scientometrics. 2021;1–20. DOI: 10.1007/s11192-021-03900-7.
Maintained by Luke Zappia. Last updated 1 year ago.
2.0 match 5 stars 3.40 score 3 scripts
adamlilith
statisfactory:Statistical and Geometrical Tools
A collection of statistical and geometrical tools including the aligned rank transform (ART; Higgins et al. 1990 <doi:10.4148/2475-7772.1443>; Peterson 2002 <doi:10.22237/jmasm/1020255240>; Wobbrock et al. 2011 <doi:10.1145/1978942.1978963>), 2-D histograms and histograms with overlapping bins, a function for making all possible formulae within a set of constraints, amongst others.
Maintained by Adam B. Smith. Last updated 6 months ago.
2d-histogramsaligned-rank-transformsampling
2.0 match 3.38 score 16 scripts 1 dependents
davharris
blender:Analyze biotic homogenization of landscapes
Tools for assessing exotic species' contributions to landscape homogeneity using average pairwise Jaccard similarity and an analytical approximation derived in Harris et al. (2011, "Occupancy is nine-tenths of the law," The American Naturalist). Also includes a randomization method for assessing sources of model error.
Maintained by David J. Harris. Last updated 13 years ago.
2.2 match 3.00 score 4 scripts
ewouddt
BiBitR:R Wrapper for Java Implementation of BiBit
A simple R wrapper for the Java BiBit algorithm from "A biclustering algorithm for extracting bit-patterns from binary datasets" by Domingo et al. (2011) <DOI:10.1093/bioinformatics/btr464>. A simple adaptation of the BiBit algorithm that allows noise in the biclusters is also introduced, as well as a function to guide the algorithm towards given (sub)patterns. Further, a workflow to derive noisy biclusters from discovered larger column patterns is included.
Maintained by De Troyer Ewoud. Last updated 7 years ago.
1.8 match 1 stars 3.76 score 19 scripts 2 dependents
cran
Mercator:Clustering and Visualizing Distance Matrices
Defines the classes used to explore, cluster and visualize distance matrices, especially those arising from binary data. See Abrams and colleagues, 2021, <doi:10.1093/bioinformatics/btab037>.
Maintained by Kevin R. Coombes. Last updated 5 months ago.
1.5 match 4.26 score 1 dependents
yhenryli
PAC:Partition-Assisted Clustering and Multiple Alignments of Networks
Implements partition-assisted clustering and multiple alignments of networks. It 1) utilizes partition-assisted clustering to find robust and accurate clusters and 2) discovers coherent relationships of clusters across multiple samples. It is particularly useful for analyzing single-cell data sets. Please see Li et al. (2017) <doi:10.1371/journal.pcbi.1005875> for a detailed description of the method.
Maintained by Ye Henry Li. Last updated 4 years ago.
1.9 match 3.30 score 7 scripts
andymckenzie
bayesbio:Miscellaneous Functions for Bioinformatics and Bayesian Statistics
A hodgepodge of hopefully helpful functions. Two of these perform shrinkage estimation: one using a simple weighted method where the user can specify the degree of shrinkage required, and one using James-Stein shrinkage estimation for the case of unequal variances.
Maintained by Andrew McKenzie. Last updated 6 years ago.
1.8 match 1 stars 3.18 score 30 scripts
bioc
MesKit:A tool kit for dissecting cancer evolution from multi-region derived tumor biopsies via somatic alterations
MesKit provides commonly used analysis and visualization modules based on mutational data generated by multi-region sequencing (MRS). This package can depict mutational profiles, measure heterogeneity within or between tumors from the same patient, track evolutionary dynamics, and characterize mutational patterns on different levels. A Shiny application is also provided for GUI-based analysis. As a handy tool, MesKit can facilitate the interpretation of tumor heterogeneity and the understanding of evolutionary relationships between regions in MRS studies.
Maintained by Mengni Liu. Last updated 5 months ago.
1.2 match 4.73 score 18 scripts 1 dependents
mpiet11
divo:Tools for Analysis of Diversity and Similarity in Biological Systems
A set of tools for empirical analysis of diversity (a number and frequency of different types in a population) and similarity (a number and frequency of shared types in two populations) in biological or ecological systems.
Maintained by Maciej Pietrzak. Last updated 1 month ago.
2.0 match 2.72 score 26 scripts
cran
jacpop:Jaccard Index for Population Structure Identification
Uses the Jaccard similarity index to account for population structure in sequencing studies. This method was specifically designed to detect population stratification based on rare variants, hence it will be especially useful in rare variant analysis.
Maintained by Dmitry Prokopenko. Last updated 6 years ago.
5.3 match 1.00 score
moseleybioinformaticslab
categoryCompare2:Meta-Analysis of High-Throughput Experiments Using Feature Annotations
Facilitates comparison of significant annotations (categories) generated on one or more feature lists. Interactive exploration is facilitated through the use of RCytoscape (heavily suggested).
Maintained by Robert M Flight. Last updated 5 months ago.
annotationgomultiplecomparisonpathwaysgeneexpressionbioconductorbioinformaticsgene-annotationgene-expressiongene-sets
2.3 match 1 stars 2.30 score 9 scripts
vpetrosyan
CTD:A Method for 'Connecting The Dots' in Weighted Graphs
A method for pattern discovery in weighted graphs as outlined in Thistlethwaite et al. (2021) <doi:10.1371/journal.pcbi.1008550>. Two use cases are achieved: 1) Given a weighted graph and a subset of its nodes, do the nodes show significant connectedness? 2) Given a weighted graph and two subsets of its nodes, are the subsets close neighbors or distant?
Maintained by Varduhi Petrosyan. Last updated 8 months ago.
1.8 match 2.70 score 1 scripts
jessicakubrusly
CFilt:Recommendation by Collaborative Filtering
Provides methods and functions to implement a Recommendation System based on Collaborative Filtering Methodology. See Aggarwal (2016) <doi:10.1007/978-3-319-29659-3> for an overview.
Maintained by Jessica Kubrusly. Last updated 6 months ago.
3.3 match 1.00 score
bioc
HTSFilter:Filter replicated high-throughput transcriptome sequencing data
This package implements a filtering procedure for replicated transcriptome sequencing data based on a global Jaccard similarity index in order to identify genes with low, constant levels of expression across one or more experimental conditions.
Maintained by Andrea Rau. Last updated 5 months ago.
sequencingrnaseqpreprocessingdifferentialexpressiongeneexpressionnormalizationimmunooncology
0.5 match 6.24 score 58 scripts 1 dependents
tkhamiak
superbiclust:Generating Robust Biclusters from a Bicluster Set (Ensemble Biclustering)
Biclusters are submatrices in the data matrix which satisfy certain conditions of homogeneity. The package contains functions for generating robust biclusters with respect to the initialization parameters for a given bicluster solution contained in a bicluster set in data; the procedure is also known as ensemble biclustering. The set of biclusters is evaluated based on the similarity of its elements (the overlap), and afterwards a hierarchical tree is constructed to obtain cut-off points for the classes of robust biclusters. The result is a number of robust (or super) biclusters with no or low overlap.
Maintained by Tatsiana Khamiakova. Last updated 4 years ago.
1.8 match 1.48 score 2 scripts 1 dependents
bioc
FindIT2:find influential TF and Target based on multi-omics data
This package implements functions to find influential TFs and targets based on different input types. It has five modules: multi-peak multi-gene annotation (mmPeakAnno module), calculation of regulation potential (calcRP module), finding influential targets based on ChIP-Seq and RNA-Seq data (Find influential Target module), finding influential TFs based on different inputs (Find influential TF module), and calculation of peak-gene or peak-peak correlation (peakGeneCor module). There are also other useful functions, such as integrating information from different sources and calculating Jaccard similarity for your TF.
Maintained by Guandong Shang. Last updated 5 months ago.
softwareannotationchipseqatacseqgeneregulationmultiplecomparisongenetarget
0.5 match 6 stars 5.08 score 7 scripts
marc-75
MSCA:Clustering of Multiple Censored Time-to-Event Endpoints
Provides basic tools for computing clusters of instances described by multiple time-to-event censored endpoints. From long-format datasets, where one instance is described by one or more records of events, a procedure is used to compute state matrices. Then, from state matrices, a procedure provides optimised computation of the Jaccard distance between instances. The library is currently in development, and more options and tools allowing graphical representation of typologies are expected. For methodological details, see our methodological paper: Delord M, Douiri A (2025) <doi:10.1186/s12874-025-02476-7>.
Maintained by Marc Delord. Last updated 1 month ago.
2.3 match 1.00 score
gzt
catsim:Binary and Categorical Image Similarity Index
Computes a structural similarity metric (after the style of MS-SSIM for images) for binary and categorical 2D and 3D images. Can be based on accuracy (simple matching), Cohen's kappa, Rand index, adjusted Rand index, Jaccard index, Dice index, normalized mutual information, or adjusted mutual information. In addition, has fast computation of Cohen's kappa, the Rand indices, and the two mutual informations. Implements the methods of Thompson and Maitra (2020) <doi:10.48550/arXiv.2004.09073>.
Maintained by Geoffrey Thompson. Last updated 6 months ago.
binary-databinary-image-classificationbinary-image-processingcategorical-datacategorical-imagesclassificationimage-processingcpp
0.5 match 5 stars 4.40 score 5 scripts
singator
autoharp:Semi-Automatic Grading of R and Rmd Scripts
A customisable set of tools for assessing and grading R or R-markdown scripts from students. It allows for checking correctness of code output, runtime statistics and static code analysis. The latter feature is made possible by representing R expressions using a tree structure.
Maintained by Vik Gopal. Last updated 3 years ago.
2.0 match 1 stars 1.00 score 8 scripts
zhang-zeyu
countTransformers:Transform Counts in RNA-Seq Data Analysis
Provides data transformation functions to transform counts in RNA-seq data analysis. Please see the reference: Zhang Z, Yu D, Seo M, Hersh CP, Weiss ST, Qiu W. (2019) <doi:10.1038/s41598-019-41315-w>.
Maintained by Zeyu Zhang. Last updated 6 years ago.
bioinformaticsdifferentialexpression
1.8 match 1.00 score 10 scripts
chongwu-biostat
prclust:Penalized Regression-Based Clustering Method
Clustering is unsupervised and exploratory in nature. Yet, it can be performed through penalized regression with grouping pursuit. In this package, we provide two algorithms for fitting the penalized regression-based clustering (PRclust) with non-convex grouping penalties, such as group truncated lasso, MCP and SCAD. One algorithm is based on a quadratic penalty and the difference-of-convex method. The other, called DC-ADD, is based on difference-of-convex programming and ADMM and is more efficient. Generalized cross-validation and a stability-based method are provided to select the tuning parameters. The Rand index, adjusted Rand index and Jaccard index are provided to estimate the agreement between estimated cluster memberships and the truth.
Maintained by Chong Wu. Last updated 8 years ago.
0.5 match 2.70 score 6 scripts
cran
MCSim:Determine the Optimal Number of Clusters
Identifies the optimal number of clusters by calculating the similarity between two clustering methods at the same number of clusters using the corrected indices of Rand and Jaccard as described in Albatineh and Niewiadomska-Bugaj (2011). The number of clusters at which the index attains its maximum most frequently is a candidate for the optimal number of clusters.
Maintained by Ahmed N. Albatineh. Last updated 6 years ago.
0.5 match 2.00 score
damienfinn
MicroNiche:Microbial Niche Measurements
Measures niche breadth and overlap of microbial taxa from large matrices. Niche breadth measurements include Levins' niche breadth (Bn) index, Hurlbert's Bn and Feinsinger's proportional similarity (PS) index (Feinsinger, P., Spears, E.E., Poole, R.W. (1981) <doi:10.2307/1936664>). Niche overlap measurements include Levins' Overlap (Ludwig, J.A. and Reynolds, J.F. (1988, ISBN:0471832359)) and a Jaccard similarity index of Feinsinger's PS values between taxa pairs, as Proportional Overlap.
Maintained by Damien Finn. Last updated 5 years ago.
0.5 match 3 stars 1.48 score 5 scripts