Showing 74 of total 74 results
nowosad
comat:Creates Co-Occurrence Matrices of Spatial Data
Builds co-occurrence matrices based on spatial raster data. It includes creation of weighted co-occurrence matrices (wecoma) and integrated co-occurrence matrices (incoma; Vadivel et al. (2007) <doi:10.1016/j.patrec.2007.01.004>).
Maintained by Jakub Nowosad. Last updated 1 year ago.
73.6 match 6 stars 6.31 score 25 scripts 3 dependents
r-forge
wordspace:Distributional Semantic Models in R
An interactive laboratory for research on distributional semantic models ('DSM', see <https://en.wikipedia.org/wiki/Distributional_semantics> for more information).
Maintained by Stephanie Evert. Last updated 3 months ago.
48.0 match 4.95 score 150 scripts 2 dependents
ecospat
ecospat:Spatial Ecology Miscellaneous Methods
Collection of R functions and data sets for the support of spatial ecology analyses with a focus on pre-, core and post-modelling analyses of species distribution, niche quantification and community assembly. Written by current and former members and collaborators of the ecospat group of Antoine Guisan, Department of Ecology and Evolution (DEE) and Institute of Earth Surface Dynamics (IDYST), University of Lausanne, Switzerland. Read Di Cola et al. (2016) <doi:10.1111/ecog.02671> for details.
Maintained by Olivier Broennimann. Last updated 1 month ago.
24.6 match 32 stars 9.35 score 418 scripts 1 dependent
griffithdan
cooccur:Probabilistic Species Co-Occurrence Analysis in R
This R package applies the probabilistic model of species co-occurrence (Veech 2013) to a set of species distributed among a set of survey or sampling sites. The algorithm calculates the observed and expected frequencies of co-occurrence between each pair of species. The expected frequency is based on the distribution of each species being random and independent of the other species. The analysis returns the probabilities that a more extreme (either low or high) value of co-occurrence could have been obtained by chance. The package also includes functions for visualizing species co-occurrence results and preparing data for downstream analyses.
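The pairwise calculation described above can be sketched as follows. This is a minimal illustration, not the package's code: it assumes that, under random and independent placement, the number of shared sites follows a hypergeometric distribution (Veech's 2013 formula reduces to this form), and the function name `cooccur_probs` is hypothetical.

```python
from math import comb


def cooccur_probs(n_sites, n1, n2, observed):
    """Expected co-occurrence and one-sided probabilities for one species pair.

    n_sites:  number of sampled sites (N)
    n1, n2:   number of sites occupied by species 1 and species 2
    observed: number of sites where both species were recorded
    """
    total = comb(n_sites, n2)

    def p_exact(j):
        # Probability of exactly j shared sites when species 2's n2 sites
        # are placed at random among the N sites (hypergeometric form).
        return comb(n1, j) * comb(n_sites - n1, n2 - j) / total

    lo = max(0, n1 + n2 - n_sites)  # smallest possible overlap
    hi = min(n1, n2)                # largest possible overlap
    probs = {j: p_exact(j) for j in range(lo, hi + 1)}

    expected = n1 * n2 / n_sites
    p_lt = sum(p for j, p in probs.items() if j <= observed)  # as low or lower
    p_gt = sum(p for j, p in probs.items() if j >= observed)  # as high or higher
    return expected, p_lt, p_gt


exp_cooc, p_lt, p_gt = cooccur_probs(n_sites=10, n1=5, n2=5, observed=5)
print(exp_cooc, p_lt, p_gt)
```

A small `p_gt` flags a pair that co-occurs more often than chance would suggest; a small `p_lt` flags a pair that co-occurs less often.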
Maintained by Daniel M. Griffith. Last updated 7 years ago.
49.1 match 3 stars 4.63 score 142 scripts
skembel
picante:Integrating Phylogenies and Ecology
Functions for phylocom integration, community analyses, null-models, traits and evolution. Implements numerous ecophylogenetic approaches including measures of community phylogenetic and trait diversity, phylogenetic signal, estimation of trait values for unobserved taxa, null models for community and phylogeny randomizations, and utility functions for data input/output and phylogeny plotting. A full description of package functionality and methods is provided by Kembel et al. (2010) <doi:10.1093/bioinformatics/btq166>.
Maintained by Steven W. Kembel. Last updated 2 years ago.
10.6 match 34 stars 11.42 score 1.1k scripts 16 dependents
xijianzheng
coefa:Meta Analysis of Factor Analysis Based on CO-Occurrence Matrices
Provides a series of functions to conduct a meta-analysis of factor analysis based on co-occurrence matrices. The tool can be used to help resolve debates over factor structure (i.e. the inner structure of a construct or scale) in several disciplines, such as psychology, psychiatry, management and education. References: Shafer (2005) <doi:10.1037/1040-3590.17.3.324>; Shafer (2006) <doi:10.1002/jclp.20213>; Loeber and Schmaling (1985) <doi:10.1007/BF00910652>.
Maintained by Xijian Zheng. Last updated 2 years ago.
43.8 match 2.70 score 4 scripts
dwarton
ecostats:Code and Data Accompanying the Eco-Stats Text (Warton 2022)
Functions and data supporting the Eco-Stats text (Warton, 2022, Springer), and solutions to exercises. Functions include tools for using simulation envelopes in diagnostic plots, and a function for diagnostic plots of multivariate linear models. Datasets mentioned in the package are included here (where not available elsewhere) and there is a vignette for each chapter of the text with solutions to exercises.
Maintained by David Warton. Last updated 1 year ago.
16.6 match 8 stars 6.58 score 53 scripts
bernd-mueller
epos:Epilepsy Ontologies' Similarities
Analysis and visualization of similarities between epilepsy ontologies based on text mining results by comparing ranked lists of co-occurring drug terms in the BioASQ corpus. The ranked result lists of neurological drug terms co-occurring with terms from the epilepsy ontologies EpSO, ESSO, EPILONT, EPISEM and FENICS undergo further analysis. The source data to create the ranked lists of drug names is produced using the text mining workflows described in Mueller, Bernd and Hagelstein, Alexandra (2016) <doi:10.4126/FRL01-006408558>, Mueller, Bernd et al. (2017) <doi:10.1007/978-3-319-58694-6_22>, Mueller, Bernd and Rebholz-Schuhmann, Dietrich (2020) <doi:10.1007/978-3-030-43887-6_52>, and Mueller, Bernd et al. (2022) <doi:10.1186/s13326-021-00258-w>.
Maintained by Bernd Mueller. Last updated 1 year ago.
26.4 match 4.03 score 53 scripts
bioc
mosbi:Molecular Signature identification using Biclustering
This package is an implementation of the biclustering ensemble method MoSBi (Molecular signature Identification from Biclustering). MoSBi provides standardized interfaces for biclustering results and can combine their results with a multi-algorithm ensemble approach to compute robust ensemble biclusters on molecular omics data. This is done by computing similarity networks of biclusters and filtering for overlaps using a custom error model. After that, Louvain modularity is used to extract bicluster communities from the similarity network, which can then be converted to ensemble biclusters. Additionally, MoSBi includes several network visualization methods to give an intuitive and scalable overview of the results. MoSBi comes with several biclustering algorithms, but can be easily extended to new biclustering algorithms.
Maintained by Tim Daniel Rose. Last updated 5 months ago.
software, statisticalmethod, clustering, network, cpp
24.6 match 4.30 score 8 scripts
tommyjones
textmineR:Functions for Text Mining and Topic Modeling
An aid for text mining in R, with a syntax that should be familiar to experienced R users. Provides a wrapper for several topic models that take similarly-formatted input and give similarly-formatted output. Has additional functionality for analyzing and diagnosing topic models.
Maintained by Tommy Jones. Last updated 2 years ago.
9.2 match 106 stars 10.83 score 310 scripts 7 dependents
bnosac
udpipe:Tokenization, Parts of Speech Tagging, Lemmatization and Dependency Parsing with the 'UDPipe' 'NLP' Toolkit
This natural language processing toolkit provides language-agnostic 'tokenization', 'parts of speech tagging', 'lemmatization' and 'dependency parsing' of raw text. Besides text parsing, the package also allows you to train annotation models based on data of 'treebanks' in 'CoNLL-U' format as provided at <https://universaldependencies.org/format.html>. The techniques are explained in detail in the paper: 'Tokenizing, POS Tagging, Lemmatizing and Parsing UD 2.0 with UDPipe', available at <doi:10.18653/v1/K17-3009>. The toolkit also contains functionalities for commonly used data manipulations on texts which are enriched with the output of the parser, namely functionalities and algorithms for collocations, token co-occurrence, document term matrix handling, term frequency inverse document frequency calculations, information retrieval metrics (Okapi BM25), handling of multi-word expressions, keyword detection (Rapid Automatic Keyword Extraction, noun phrase extraction, syntactical patterns), sentiment scoring and semantic similarity analysis.
Maintained by Jan Wijffels. Last updated 2 years ago.
conll, dependency-parser, lemmatization, natural-language-processing, nlp, pos-tagging, r-pkg, rcpp, text-mining, tokenizer, udpipe, cpp
7.5 match 215 stars 11.83 score 1.2k scripts 9 dependents
dracor-org
rdracor:Access to the 'DraCor' API
Provides an interface to the 'Drama Corpora Project' ('DraCor') API: <https://dracor.org/documentation/api>.
Maintained by Ivan Pozdniakov. Last updated 6 months ago.
17.4 match 14 stars 5.05 score 40 scripts
biorgeo
bioregion:Comparison of Bioregionalisation Methods
The main purpose of this package is to propose a transparent methodological framework to compare bioregionalisation methods based on hierarchical and non-hierarchical clustering algorithms (Kreft & Jetz (2010) <doi:10.1111/j.1365-2699.2010.02375.x>) and network algorithms (Lenormand et al. (2019) <doi:10.1002/ece3.4718> and Leroy et al. (2019) <doi:10.1111/jbi.13674>).
Maintained by Maxime Lenormand. Last updated 12 days ago.
biogeography, bioregion, bioregionalization, cpp
13.7 match 7 stars 6.27 score 11 scripts
equitable-equations
fqar:Floristic Quality Assessment Tools for R
Tools for downloading and analyzing floristic quality assessment data. See Freyman et al. (2015) <doi:10.1111/2041-210X.12491> for more information about floristic quality assessment and the associated database.
Maintained by Andrew Gard. Last updated 2 months ago.
14.1 match 5 stars 5.88 score 5 scripts
quanteda
quanteda:Quantitative Analysis of Textual Data
A fast, flexible, and comprehensive framework for quantitative text analysis in R. Provides functionality for corpus management, creating and manipulating tokens and n-grams, exploring keywords in context, forming and manipulating sparse matrices of documents by features and feature co-occurrences, analyzing keywords, computing feature similarities and distances, applying content dictionaries, applying supervised and unsupervised machine learning, visually representing text and text analyses, and more.
Maintained by Kenneth Benoit. Last updated 2 months ago.
corpus, natural-language-processing, quanteda, text-analytics, onetbb, cpp
4.6 match 851 stars 16.68 score 5.4k scripts 51 dependents
trinker
qdap:Bridging the Gap Between Qualitative Data and Quantitative Analysis
Automates many of the tasks associated with quantitative discourse analysis of transcripts containing discourse including frequency counts of sentence types, words, sentences, turns of talk, syllables and other assorted analysis tasks. The package provides parsing tools for preparing transcript data. Many functions enable the user to aggregate data by any number of grouping variables, providing analysis and seamless integration with other R packages that undertake higher level analysis and visualization of text. This affords the user a more efficient and targeted analysis. 'qdap' is designed for transcript analysis; however, many functions are applicable to other areas of Text Mining/Natural Language Processing.
Maintained by Tyler Rinker. Last updated 4 years ago.
qdap, quantitative-discourse-analysis, text-analysis, text-mining, text-plotting, openjdk
7.3 match 176 stars 9.61 score 1.3k scripts 3 dependents
lbbe-software
Mondrian:A Simple Graphical Representation of the Relative Occurrence and Co-Occurrence of Events
The single function of this package represents, in one graph, the relative occurrence and co-occurrence of events measured in a sample. As an example, the package was applied to describe the occurrence and co-occurrence of different species of bacterial or viral symbionts infecting arthropods at the individual level. The graphic shows the prevalence of each symbiont and the patterns of multiple infections (i.e. whether or not different symbionts share the same individual hosts). We named the package after the famous painter because the graphical output recalls Mondrian’s paintings.
Maintained by Aurélie Siberchicot. Last updated 8 months ago.
17.2 match 2 stars 4.00 score 8 scripts
azvoleff
glcm:Calculate Textures from Grey-Level Co-Occurrence Matrices (GLCMs)
Enables calculation of image textures (Haralick 1973) <doi:10.1109/TSMC.1973.4309314> from grey-level co-occurrence matrices (GLCMs). Supports processing images that cannot fit in memory.
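As a rough illustration of what a GLCM is, the sketch below counts how often grey level i has grey level j as its neighbour at a fixed pixel offset, normalizes the counts to probabilities and derives one Haralick texture, contrast. It is a minimal pure-Python sketch assuming a single offset and no symmetrization; the function names are hypothetical, and the glcm package's own window, shift and quantization options are not reproduced.

```python
def glcm(image, levels, dx=1, dy=0):
    """Normalized grey-level co-occurrence matrix for one pixel offset (dx, dy)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            # Only count pairs whose neighbour falls inside the image.
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]


def contrast(p):
    """Haralick contrast: sum over i, j of p(i, j) * (i - j)**2."""
    n = len(p)
    return sum(p[i][j] * (i - j) ** 2 for i in range(n) for j in range(n))


# Small 4-level image; offset (dx=1, dy=0) pairs each pixel with its right neighbour.
img = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
p = glcm(img, levels=4)
print(contrast(p))
```

Other textures (homogeneity, entropy, correlation and so on) are computed from the same normalized matrix `p`.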
Maintained by Alex Zvoleff. Last updated 5 years ago.
13.3 match 15 stars 5.05 score 74 scripts
emcramer
CHOIRBM:Plots the CHOIR Body Map
Collection of utility functions for visualizing body map data collected with the Collaborative Health Outcomes Information Registry.
Maintained by Eric Cramer. Last updated 1 year ago.
body-map, cbm, choir, data-visualization, visualization
11.9 match 5 stars 5.51 score 26 scripts
hope-data-science
akc:Automatic Knowledge Classification
A tidy framework for automatic knowledge classification and visualization. Currently, the core functionality of the framework is mainly supported by modularity-based clustering (community detection) in keyword co-occurrence network, and focuses on co-word analysis of bibliometric research. However, the designed functions in 'akc' are general, and could be extended to solve other tasks in text mining as well.
Maintained by Tian-Yuan Huang. Last updated 20 days ago.
11.0 match 15 stars 5.85 score 47 scripts
kasperwelbers
corpustools:Managing, Querying and Analyzing Tokenized Text
Provides text analysis in R, focusing on the use of a tokenized text format. In this format, the positions of tokens are maintained, and each token can be annotated (e.g., part-of-speech tags, dependency relations). Prominent features include advanced Lucene-like querying for specific tokens or contexts (e.g., documents, sentences), similarity statistics for words and documents, exporting to DTM for compatibility with many text analysis packages, and the possibility to reconstruct original text from tokens to facilitate interpretation.
Maintained by Kasper Welbers. Last updated 6 months ago.
8.0 match 31 stars 7.50 score 174 scripts 1 dependent
tesselle
tabula:Analysis and Visualization of Archaeological Count Data
An easy way to examine archaeological count data. This package provides several tests and measures of diversity: heterogeneity and evenness (Brillouin, Shannon, Simpson, etc.), richness and rarefaction (Chao1, Chao2, ACE, ICE, etc.), turnover and similarity (Brainerd-Robinson, etc.). It allows users to easily visualize count data and statistical thresholds: rank vs abundance plots, heatmaps, Ford (1962) and Bertin (1977) diagrams, etc.
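Two of the heterogeneity measures named above can be written down directly from their textbook definitions. This sketch uses the plain formulas H = -Σ pᵢ ln pᵢ (Shannon) and D = Σ pᵢ² (Simpson dominance); it is not tabula's code, which may use other variants or bias corrections, and the artefact counts are hypothetical.

```python
from math import log


def shannon(counts):
    """Shannon index H = -sum(p_i * ln p_i), skipping zero counts."""
    n = sum(counts)
    return -sum((c / n) * log(c / n) for c in counts if c > 0)


def simpson(counts):
    """Simpson dominance D = sum(p_i ** 2); 1 - D is often reported instead."""
    n = sum(counts)
    return sum((c / n) ** 2 for c in counts)


# Hypothetical counts of four artefact types in one assemblage.
sherds = [25, 25, 25, 25]
print(shannon(sherds), simpson(sherds))
```

For a perfectly even assemblage of k types, H reaches its maximum ln k and D its minimum 1/k.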
Maintained by Nicolas Frerebeau. Last updated 14 days ago.
data-visualization, archaeology, archaeological-science
11.3 match 5.10 score 38 scripts 1 dependent
macroecology
letsR:Data Handling and Analysis in Macroecology
Handling, processing, and analyzing geographic data on species' distributions and environmental variables. Read Vilela & Villalobos (2015) <doi:10.1111/2041-210X.12401> for details.
Maintained by Bruno Vilela. Last updated 2 months ago.
6.2 match 29 stars 8.87 score 104 scripts
paballand
EconGeo:Computing Key Indicators of the Spatial Distribution of Economic Activities
Functions to compute a series of indices commonly used in the fields of economic geography, economic complexity, and evolutionary economics to describe the location, distribution, spatial organization, structure, and complexity of economic activities. Functions include basic spatial indicators such as the location quotient, the Krugman specialization index, the Herfindahl or the Shannon entropy indices but also more advanced functions to compute different forms of normalized relatedness between economic activities or network-based measures of economic complexity. Most of the functions use matrix calculus and are based on bipartite (incidence) matrices consisting of region-industry pairs.
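The location quotient mentioned above compares an industry's share of a region's activity with that industry's share of total activity: LQ(r, i) = (x_ri / Σ_i x_ri) / (Σ_r x_ri / Σ_r Σ_i x_ri). A minimal pure-Python sketch on a region-by-industry matrix follows; the function name and the employment figures are hypothetical, and EconGeo's own functions may differ in input format and options.

```python
def location_quotient(mat):
    """Location quotients for a region x industry matrix of counts (e.g. employment).

    LQ[r][i] = (x[r][i] / region total) / (industry total / grand total).
    Values above 1 mean region r is relatively specialised in industry i.
    Assumes every region and industry has a non-zero total.
    """
    region_totals = [sum(row) for row in mat]
    industry_totals = [sum(col) for col in zip(*mat)]
    grand = sum(region_totals)
    return [
        [(x / region_totals[r]) / (industry_totals[i] / grand) for i, x in enumerate(row)]
        for r, row in enumerate(mat)
    ]


# Hypothetical employment: 2 regions (rows) x 2 industries (columns).
emp = [[30, 10],
       [20, 40]]
lq = location_quotient(emp)
print(lq)
```

Here region 0 holds 75% of its jobs in industry 0, against a 50% share nationally, so its quotient for that industry is 1.5.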
Maintained by Pierre-Alexandre Balland. Last updated 2 years ago.
9.5 match 41 stars 4.96 score 44 scripts
kpmainali
CooccurrenceAffinity:Affinity in Co-Occurrence Data
Computes a novel metric of affinity between two entities based on their co-occurrence (using binary presence/absence data). The metric and its MLE, alpha hat, were proposed in Mainali, Slud, et al. (2021) <doi:10.1126/sciadv.abj9204>. Various types of confidence intervals and a median interval were developed in Mainali and Slud (2022) <doi:10.1101/2022.11.01.514801>.
Maintained by Kumar Mainali. Last updated 2 years ago.
10.5 match 26 stars 4.39 score 19 scripts
yiluheihei
RevEcoR:Reverse Ecology Analysis on Microbiome
An implementation of the reverse ecology framework. Reverse ecology refers to the use of genomics to study ecology with no a priori assumptions about the organism(s) under consideration, linking organisms to their environment. It allows researchers to reconstruct the metabolic networks and study the ecology of poorly characterized microbial species from their genomic information, and has substantial potential for the ecological analysis of microbial communities.
Maintained by Yang Cao. Last updated 6 years ago.
7.1 match 6 stars 5.77 score 22 scripts 1 dependent
alarm-redist
redist:Simulation Methods for Legislative Redistricting
Enables researchers to sample redistricting plans from a pre-specified target distribution using Sequential Monte Carlo and Markov Chain Monte Carlo algorithms. The package allows for the implementation of various constraints in the redistricting process such as geographic compactness and population parity requirements. Tools for analysis such as computation of various summary statistics and plotting functionality are also included. The package implements the SMC algorithm of McCartan and Imai (2023) <doi:10.1214/23-AOAS1763>, the enumeration algorithm of Fifield, Imai, Kawahara, and Kenny (2020) <doi:10.1080/2330443X.2020.1791773>, the Flip MCMC algorithm of Fifield, Higgins, Imai and Tarr (2020) <doi:10.1080/10618600.2020.1739532>, the Merge-split/Recombination algorithms of Carter et al. (2019) <arXiv:1911.01503> and DeFord et al. (2021) <doi:10.1162/99608f92.eb30390f>, and the Short-burst optimization algorithm of Cannon et al. (2020) <arXiv:2011.02288>.
Maintained by Christopher T. Kenny. Last updated 2 months ago.
geospatial, gerrymandering, redistricting, sampling, openblas, cpp, openmp
3.6 match 68 stars 9.17 score 259 scripts
rbarkerclarke
gtexture:Generalized Application of Co-Occurrence Matrices and Haralick Texture
Generalizes application of gray-level co-occurrence matrix (GLCM) metrics to objects outside of images. The current focus is to apply GLCM metrics to the study of biological networks and fitness landscapes that are used in studying evolutionary medicine and biology, particularly the evolution of cancer resistance. The package was used in our publication, Barker-Clarke et al. (2023) <doi:10.1088/1361-6560/ace305>. A general reference to learn more about mathematical oncology can be found at Rockne et al. (2019) <doi:10.1088/1478-3975/ab1a09>.
Maintained by Rowan Barker-Clarke. Last updated 12 months ago.
10.5 match 3.00 score 1 script
codymarquart
rENA:Epistemic Network Analysis
ENA (Shaffer, D. W. (2017) Quantitative Ethnography. ISBN: 0578191687) is a method used to identify meaningful and quantifiable patterns in discourse or reasoning. ENA moves beyond the traditional frequency-based assessments by examining the structure of the co-occurrence, or connections in coded data. Moreover, compared to other methodological approaches, ENA has the novelty of (1) modeling whole networks of connections and (2) affording both quantitative and qualitative comparisons between different network models. Shaffer, D.W., Collier, W., & Ruis, A.R. (2016).
Maintained by Cody L Marquart. Last updated 1 year ago.
11.9 match 1 star 2.26 score 36 scripts
jenniniku
gllvm:Generalized Linear Latent Variable Models
Analysis of multivariate data using generalized linear latent variable models (gllvm). Estimation is performed using either the Laplace method, variational approximations, or extended variational approximations, implemented via TMB (Kristensen et al. (2016), <doi:10.18637/jss.v070.i05>).
Maintained by Jenni Niku. Last updated 13 hours ago.
2.5 match 52 stars 10.54 score 176 scripts 1 dependent
andrew-plowright
ForestTools:Tools for Analyzing Remote Sensing Forest Data
Tools for analyzing remote sensing forest data, including functions for detecting treetops from canopy models, outlining tree crowns, and calculating textural metrics.
Maintained by Andrew Plowright. Last updated 1 month ago.
3.6 match 73 stars 7.01 score 103 scripts 1 dependent
quanteda
quanteda.textplots:Plots for the Quantitative Analysis of Textual Data
Plotting functions for visualising textual data. Extends 'quanteda' and related packages with plot methods designed specifically for text data, textual statistics, and models fit to textual data. Plot types include word clouds, lexical dispersion plots, scaling plots, network visualisations, and word 'keyness' plots.
Maintained by Kenneth Benoit. Last updated 7 months ago.
3.6 match 7 stars 6.77 score 648 scripts
bblonder
netassoc:Inference of Species Associations from Co-Occurrence Data
Infers species associations from community matrices. Uses local and (optional) regional-scale co-occurrence data by comparing observed partial correlation coefficients between species to those estimated from regional species distributions. Extends Gaussian graphical models to a null modeling framework. Provides interface to a variety of inverse covariance matrix estimation methods.
Maintained by Benjamin Blonder. Last updated 3 years ago.
10.4 match 2 stars 2.30 score 9 scripts
cysouw
qlcMatrix:Utility Sparse Matrix Functions for Quantitative Language Comparison
Extends the functionality of the 'Matrix' package for working with sparse matrices. Some of the functions are very general, while others are highly specific to the special data formats used for quantitative language comparison.
Maintained by Michael Cysouw. Last updated 9 months ago.
3.3 match 6 stars 6.98 score 256 scripts 1 dependent
microsoft
wpa:Tools for Analysing and Visualising Viva Insights Data
Opinionated functions that enable easier and faster analysis of Viva Insights data. There are three main types of functions in 'wpa': (1) Standard functions create a 'ggplot' visual or a summary table based on a specific Viva Insights metric; (2) Report Generation functions generate HTML reports on a specific analysis area, e.g. Collaboration; (3) Other miscellaneous functions cover more specific applications (e.g. Subject Line text mining) of Viva Insights data. This package adheres to 'tidyverse' principles and works well with the pipe syntax. 'wpa' is built with beginner-to-intermediate R users in mind, and is optimised for simplicity.
Maintained by Martin Chan. Last updated 4 months ago.
3.3 match 30 stars 6.69 score 39 scripts 1 dependent
microsoft
vivainsights:Analyze and Visualize Data from 'Microsoft Viva Insights'
Provides a versatile range of functions, including exploratory data analysis, time-series analysis, organizational network analysis, and data validation, while at the same time implementing a set of best practices in analyzing and visualizing data specific to 'Microsoft Viva Insights'.
Maintained by Martin Chan. Last updated 25 days ago.
3.3 match 11 stars 6.12 score 68 scripts
cranhaven
rock:Reproducible Open Coding Kit
The Reproducible Open Coding Kit ('ROCK', and this package, 'rock') was developed to facilitate reproducible and open coding, specifically geared towards qualitative research methods. Although it is a general-purpose toolkit, three specific applications have been implemented, specifically an interface to the 'rENA' package that implements Epistemic Network Analysis ('ENA'), means to process notes from Cognitive Interviews ('CIs'), and means to work with decentralized construct taxonomies ('DCTs'). The 'ROCK' and this 'rock' package are described in the ROCK book <https://rockbook.org> and more information, such as tutorials, is available at <https://rock.science>.
Maintained by Gjalt-Jorn Peters. Last updated 9 days ago.
5.6 match 5 stars 3.40 score
cran
ecoCopula:Graphical Modelling and Ordination using Copulas
Creates 'graphs' of species associations (interactions) and ordination biplots from co-occurrence data by fitting discrete Gaussian copula graphical models. Methods described in Popovic, GC., Hui, FKC., Warton, DI., (2018) <doi:10.1016/j.jmva.2017.12.002>.
Maintained by Gordana Popovic. Last updated 3 years ago.
4.4 match 4.17 score 49 scripts 2 dependents
weksi-budiaji
kmed:Distance-Based k-Medoids
Algorithms of distance-based k-medoids clustering: simple and fast k-medoids, ranked k-medoids, and increasing number of clusters in k-medoids. Calculate distances for mixed variable data such as Gower, Podani, Wishart, Huang, Harikumar-PV, and Ahmad-Dey. Cluster validation applies internal and relative criteria. The internal criteria include silhouette index and shadow values. The relative criterion applies a bootstrap procedure producing a heatmap with a flexible reordering matrix algorithm such as complete, ward, or average linkages. The cluster result can be plotted in a marked barplot or PCA biplot.
Maintained by Weksi Budiaji. Last updated 3 years ago.
5.7 match 3.15 score 141 scripts
modesto-escobar
netCoin:Interactive Analytic Networks
Creates interactive analytic networks. It combines the data analysis power of R, used to obtain coincidences, co-occurrences and correlations, with the visualization libraries of 'JavaScript' in one package.
Maintained by Modesto Escobar. Last updated 10 hours ago.
2.2 match 11 stars 7.22 score 47 scripts
kisungyou
T4cluster:Tools for Cluster Analysis
Cluster analysis is one of the most fundamental problems in data science. We provide a variety of algorithms from clustering to the learning on the space of partitions. See Hennig, Meila, and Rocci (2016, ISBN:9781466551886) for a general exposition of cluster analysis.
Maintained by Kisung You. Last updated 3 years ago.
3.6 match 6 stars 4.26 score 9 scripts 2 dependents
chaoliu-cl
textAnnotatoR:Interactive Text Annotation Tool with 'shiny' GUI
A comprehensive text annotation tool built with 'shiny'. Provides an interactive graphical user interface for coding text documents, managing code hierarchies, creating memos, and analyzing coding patterns. Features include code co-occurrence analysis, visualization of coding patterns, comparison of multiple coding sets, and export capabilities. Supports collaborative qualitative research through standardized annotation formats and analysis tools.
Maintained by Chao Liu. Last updated 4 months ago.
3.5 match 4.30 score 5 scripts
dernarr
ndl:Naive Discriminative Learning
Naive discriminative learning implements learning and classification models based on the Rescorla-Wagner equations and their equilibrium equations.
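The trial-by-trial Rescorla-Wagner rule behind these models updates each cue present on a trial by ΔVᵢ = αβ(λ − ΣVⱼ), where the sum runs over all cues present and λ is the maximum associative strength when the outcome occurs (0 when it does not). The sketch below iterates that rule for a single outcome; it assumes one shared learning rate for all cues, the function name is hypothetical, and it does not reproduce the package's equilibrium-equation solver.

```python
def rescorla_wagner(trials, rate=0.1, lam=1.0):
    """Learn cue -> outcome association weights with the Rescorla-Wagner rule.

    trials: list of (cues, outcome_present) pairs, where cues is a set of cue names.
    rate:   combined learning rate (alpha * beta), assumed equal for all cues.
    lam:    maximum associative strength when the outcome is present.
    """
    weights = {}
    for cues, outcome_present in trials:
        # Prediction is the summed strength of all cues present on this trial.
        prediction = sum(weights.get(c, 0.0) for c in cues)
        target = lam if outcome_present else 0.0
        delta = rate * (target - prediction)
        # Only cues present on the trial are updated; absent cues keep their weight.
        for c in cues:
            weights[c] = weights.get(c, 0.0) + delta
    return weights


# One cue that is always followed by the outcome approaches lam asymptotically.
w = rescorla_wagner([({"light"}, True)] * 50)
print(w)
```

With a single cue the weight after n trials is 1 − (1 − rate)ⁿ, which is why it levels off just below λ.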
Maintained by Tino Sering. Last updated 7 years ago.
5.0 match 1 star 3.00 score 66 scripts
cran
pubmed.mineR:Text Mining of PubMed Abstracts
Text mining of PubMed Abstracts (text and XML) from <https://pubmed.ncbi.nlm.nih.gov/>.
Maintained by S. Ramachandran. Last updated 6 months ago.
6.8 match 6 stars 2.08 score
traminer
TraMineRextras:TraMineR Extension
Collection of ancillary functions and utilities to be used in conjunction with the 'TraMineR' package for sequence data exploration. Includes, among others, specific functions such as state survival plots, position-wise group-typical states, dynamic sequence indicators, and dissimilarities between event sequences. Also includes contributions by non-members of the TraMineR team such as methods for polyadic data and for the comparison of groups of sequences.
Maintained by Gilbert Ritschard. Last updated 7 months ago.
5.4 match 2.43 score 89 scripts 1 dependent
polmine
polmineR:Verbs and Nouns for Corpus Analysis
Package for corpus analysis using the Corpus Workbench ('CWB', <https://cwb.sourceforge.io>) as an efficient back end for indexing and querying large corpora. The package offers functionality to flexibly create subcorpora and to carry out basic statistical operations (count, co-occurrences etc.). The original full text of documents can be reconstructed and inspected at any time. Beyond that, the package is intended to serve as an interface to packages implementing advanced statistical procedures. Respective data structures (document-term matrices, term-co-occurrence matrices etc.) can be created based on the indexed corpora.
Maintained by Andreas Blaette. Last updated 1 year ago.
1.5 match 49 stars 7.96 score 311 scripts
bnosac
BTM:Biterm Topic Models for Short Text
Biterm Topic Models find topics in collections of short texts. It is a word co-occurrence based topic model that learns topics by modeling word-word co-occurrence patterns, which are called biterms. This is in contrast to traditional topic models like Latent Dirichlet Allocation and Probabilistic Latent Semantic Analysis, which are word-document co-occurrence topic models. A biterm consists of two words co-occurring in the same short text window. This context window can for example be a twitter message, a short answer on a survey, a sentence of a text or a document identifier. The techniques are explained in detail in the paper 'A Biterm Topic Model For Short Text' by Xiaohui Yan, Jiafeng Guo, Yanyan Lan, Xueqi Cheng (2013) <https://github.com/xiaohuiyan/xiaohuiyan.github.io/blob/master/paper/BTM-WWW13.pdf>.
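The biterm idea above, every unordered word pair that appears in the same short text, can be illustrated with a small counting sketch. This covers only biterm extraction, not the topic model fitted on top of it; the optional `window` argument (maximum distance between token positions) is an assumption of this sketch and may not match how the BTM package defines its context window.

```python
from collections import Counter
from itertools import combinations


def extract_biterms(docs, window=None):
    """Count unordered word pairs (biterms) co-occurring within each short text.

    docs:   list of token lists, one per short text (e.g. one tweet each).
    window: optional max distance between token positions; None uses the whole text.
    """
    counts = Counter()
    for tokens in docs:
        # combinations() yields each pair of positions once, with i < j.
        for (i, a), (j, b) in combinations(enumerate(tokens), 2):
            if window is not None and j - i > window:
                continue
            # Sort the pair so (a, b) and (b, a) count as the same biterm.
            counts[tuple(sorted((a, b)))] += 1
    return counts


docs = [["visit", "apple", "store"], ["apple", "pie", "recipe"]]
biterms = extract_biterms(docs)
print(biterms)
```

Because biterms are pooled across the whole corpus rather than per document, even very short texts contribute usable co-occurrence evidence.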
Maintained by Jan Wijffels. Last updated 2 years ago.
biterm-topic-modelling, natural-language-processing, topic-modeling, cpp
1.9 match 96 stars 6.25 score 74 scripts
quanteda
quanteda.textstats:Textual Statistics for the Quantitative Analysis of Textual Data
Textual statistics functions formerly in the 'quanteda' package. Textual statistics for characterizing and comparing textual data. Includes functions for measuring term and document frequency, the co-occurrence of words, similarity and distance between features and documents, feature entropy, keyword occurrence, readability, and lexical diversity. These functions extend the 'quanteda' package and are specially designed for sparse textual data.
Maintained by Kenneth Benoit. Last updated 6 months ago.
1.3 match 15 stars 8.91 score 916 scripts 10 dependents
juliasilge
widyr:Widen, Process, then Re-Tidy Data
Encapsulates the pattern of untidying data into a wide matrix, performing some processing, then turning it back into a tidy form. This is useful for several operations such as co-occurrence counts, correlations, or clustering that are mathematically convenient on wide matrices.
Maintained by Julia Silge. Last updated 2 years ago.
1.0 match 328 stars 11.11 score 1.7k scripts 2 dependents
kidoishi
MadanTextNetwork:Persian Textmining Tool for Co-Occurrence_Network
MadanText_co-occurrence_network is open-source software designed specifically for text mining in the Persian language. It adds co-occurrence network functionality to MadanText and takes its input in Excel format instead of plain text.
Maintained by Kido Ishikawa. Last updated 1 year ago.
4.0 match 2.70 score
nalimilan
RcmdrPlugin.temis:Graphical Integrated Text Mining Solution
An 'R Commander' plug-in providing an integrated solution to perform a series of text mining tasks such as importing and cleaning a corpus, and analyses like terms and documents counts, vocabulary tables, terms co-occurrences and documents similarity measures, time series analysis, correspondence analysis and hierarchical clustering. Corpora can be imported from spreadsheet-like files, directories of raw text files, 'Twitter' queries, as well as from 'Dow Jones Factiva', 'LexisNexis', 'Europresse' and 'Alceste' files.
Maintained by Milan Bouchet-Valat. Last updated 7 years ago.
10.5 match 1.00 score 7 scripts
nowosad
motif:Local Pattern Analysis
Describes spatial patterns of categorical raster data for any defined regular and irregular areas. Patterns are described quantitatively using built-in signatures based on co-occurrence matrices, and any user-defined function can also be used. It enables spatial analysis such as search, change detection, and clustering to be performed on spatial patterns (Nowosad (2021) <doi:10.1007/s10980-020-01135-0>).
Maintained by Jakub Nowosad. Last updated 7 months ago.
categorical-raster, global-ecology, landscape-ecology, spatial, cpp
1.0 match 63 stars 7.48 score 48 scripts
reviewburner
AnimalSequences:Analyse Animal Sequential Behaviour and Communication
All animal behaviour occurs sequentially. The package provides functions to format sequence data from different sources and to analyse sequential behaviour and communication in animals. It also has functions to plot the data and to calculate the entropy of sequences.
Maintained by Alex Mielke. Last updated 6 months ago.
7.1 match 1.00 score
bnosac
textplot:Text Plots
Visualise complex relations in texts. This is done by providing functionalities for displaying text co-occurrence networks, text correlation networks, dependency relationships as well as text clustering and semantic text 'embeddings'. Feel free to join the effort of providing interesting text visualisations.
Maintained by Jan Wijffels. Last updated 3 years ago.
1.0 match 54 stars 6.78 score 75 scripts 1 dependents
yoctozepto
RAFS:Robust Aggregative Feature Selection
A cross-validated minimal-optimal feature selection algorithm. It utilises popularity counting, hierarchical clustering with feature dissimilarity measures, and prefiltering with an all-relevant feature selection method to obtain the minimal-optimal set of features.
Maintained by Radosław Piliszek. Last updated 2 months ago.
5.2 match 1.30 score
ailich
GLCMTextures:GLCM Textures of Raster Layers
Calculates grey level co-occurrence matrix (GLCM) based texture measures (Hall-Beyer (2017) <https://prism.ucalgary.ca/bitstream/handle/1880/51900/texture%20tutorial%20v%203_0%20180206.pdf>; Haralick et al. (1973) <doi:10.1109/TSMC.1973.4309314>) of raster layers using a sliding rectangular window. It also includes functions to quantize a raster into grey levels as well as to tabulate a GLCM and calculate GLCM texture metrics for a matrix.
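As a rough illustration of what a GLCM is, the Python sketch below builds a normalized co-occurrence matrix for a single pixel offset over a whole, already quantized image, and derives the Haralick contrast measure, the sum of p(i, j) * (i - j)^2. It is a simplification under stated assumptions, not GLCMTextures' implementation: there is no sliding window and no quantization step, and the offset and symmetry conventions (`dx`, `dy`, `symmetric`) are hypothetical parameters chosen for the example.

```python
def glcm(image, levels, dx=1, dy=0, symmetric=True):
    """Grey-level co-occurrence matrix for one pixel offset (dx, dy).

    `image` is a list of rows of ints already quantized to 0..levels-1.
    Entry [i][j] counts pixel pairs where the reference pixel has level i and
    the neighbour at (row + dy, col + dx) has level j. With symmetric=True each
    pair is also counted in the reverse direction. Entries are normalized to
    sum to 1 (an all-zero matrix is returned unnormalized).
    """
    counts = [[0] * levels for _ in range(levels)]
    n_rows, n_cols = len(image), len(image[0])
    for r in range(n_rows):
        for c in range(n_cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < n_rows and 0 <= c2 < n_cols:  # skip pairs leaving the image
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1
                if symmetric:
                    counts[j][i] += 1
    total = sum(map(sum, counts))
    if total == 0:
        return counts
    return [[v / total for v in row] for row in counts]


def contrast(p):
    """Haralick contrast: sum over the GLCM of p(i, j) * (i - j)^2."""
    return sum(v * (i - j) ** 2 for i, row in enumerate(p) for j, v in enumerate(row))


if __name__ == "__main__":
    img = [
        [0, 0, 1],
        [0, 1, 1],
    ]
    p = glcm(img, levels=2)  # horizontal neighbour, symmetric
    print(p)
    print(contrast(p))
```

Counting each pair in both directions makes the matrix independent of whether the offset points left or right, which is a common choice for texture measures; the non-symmetric variant keeps direction information instead.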
Maintained by Alexander Ilich. Last updated 2 months ago.
1.0 match 12 stars 6.33 score 20 scripts 2 dependents
bioc
BioNAR:Biological Network Analysis in R
The R package BioNAR was developed for step-by-step analysis of protein-protein interaction (PPI) networks. The aim is to quantify and rank each protein’s simultaneous impact on multiple complexes based on network topology and clustering. The package also enables estimation of the co-occurrence of diseases across the network and within specific clusters, pointing towards shared or common mechanisms.
Maintained by Anatoly Sorokin. Last updated 20 days ago.
software, graphandnetwork, network
1.0 match 3 stars 5.90 score 35 scripts
jferrer-b
Rediscover:Identify Mutually Exclusive Mutations
An optimized method for identifying mutually exclusive genomic events. Its main contribution is a statistical analysis based on the Poisson-Binomial distribution that takes into account that some samples are more mutated than others. See [Canisius, Sander, John WM Martens, and Lodewyk FA Wessels. (2016) "A novel independence test for somatic alterations in cancer shows that biology drives mutual exclusivity but chance explains most co-occurrence." Genome Biology 17.1 : 1-17. <doi:10.1186/s13059-016-1114-x>]. The mutation matrices are stored as sparse matrices, and the method exploits this sparsity to save time and computing resources.
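The Poisson-Binomial distribution mentioned above is the distribution of the number of successes among independent Bernoulli trials with unequal probabilities, which is how sample-specific mutation rates can enter such a test. The Python sketch below computes its probability mass function with the standard dynamic programme. It is not Rediscover's implementation. The per-sample probabilities (here assumed to be the chance that each sample carries both alterations under independence) and the helper names are hypothetical.

```python
def poisson_binomial_pmf(probs):
    """PMF of the number of successes among independent Bernoulli trials.

    probs[k] is the success probability of trial k. After processing each
    trial, pmf[s] holds the probability of exactly s successes so far
    (O(n^2) time overall).
    """
    pmf = [1.0]
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for s, mass in enumerate(pmf):
            nxt[s] += mass * (1 - p)  # this trial fails
            nxt[s + 1] += mass * p    # this trial succeeds
        pmf = nxt
    return pmf


def lower_tail(probs, observed):
    """P(X <= observed): small values point towards mutual exclusivity."""
    return sum(poisson_binomial_pmf(probs)[: observed + 1])


def upper_tail(probs, observed):
    """P(X >= observed): small values point towards co-occurrence."""
    return sum(poisson_binomial_pmf(probs)[observed:])


if __name__ == "__main__":
    # Hypothetical per-sample alteration probabilities for genes A and B
    rate_a = [0.1, 0.4, 0.6]
    rate_b = [0.2, 0.5, 0.5]
    # Assumed independence: sample k carries both with probability rate_a[k] * rate_b[k]
    probs = [pa * pb for pa, pb in zip(rate_a, rate_b)]
    print(lower_tail(probs, 0))  # probability that no sample carries both
```

When all probabilities are equal this reduces to the ordinary binomial distribution; the per-sample probabilities are what let more heavily mutated samples count differently.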
Maintained by Juan A. Ferrer-Bonsoms. Last updated 2 years ago.
2.2 match 2.70 score 7 scripts
nalimilan
R.temis:Integrated Text Mining Solution
An integrated solution to perform a series of text mining tasks such as importing and cleaning a corpus, and analyses like term and document counts, lexical summary, term co-occurrences and document similarity measures, graphs of terms, correspondence analysis and hierarchical clustering. Corpora can be imported from spreadsheet-like files, directories of raw text files, as well as from 'Dow Jones Factiva', 'LexisNexis', 'Europresse' and 'Alceste' files.
Maintained by Milan Bouchet-Valat. Last updated 18 days ago.
1.0 match 27 stars 4.99 score 24 scripts
tslumley
rimu:Responses in Multiplex
Tools for manipulating, exploring, and visualising multiple-response data, including scored or ranked responses. Conversions to and from factors, lists, strings, matrices; reordering, lumping, flattening; set operations; tables; frequency and co-occurrence plots.
Maintained by Thomas Lumley. Last updated 12 months ago.
1.0 match 4 stars 4.78 score 10 scripts
chiliubio
meconetcomp:Compare Microbial Networks of 'trans_network' Class of 'microeco' Package
Compare microbial co-occurrence networks created from the 'trans_network' class of the 'microeco' package <https://github.com/ChiLiubio/microeco>. This package extends the 'trans_network' class of the 'microeco' package and is especially useful when several networks are constructed and analyzed simultaneously.
Maintained by Chi Liu. Last updated 22 days ago.
1.0 match 9 stars 4.69 score 12 scripts
kdonnay
meltt:Matching Event Data by Location, Time and Type
Framework for merging and disambiguating event data based on spatiotemporal co-occurrence and secondary event characteristics. It can account for intrinsic "fuzziness" in the coding of events, varying event taxonomies and different geo-precision codes.
Maintained by Karsten Donnay. Last updated 8 months ago.
1.0 match 24 stars 4.64 score 12 scripts
cran
agrifeature:Agriculture Image Feature
Functions to calculate Gray Level Co-occurrence Matrix (GLCM), RGB-based Vegetative Index (RGB VI) and Normalized Difference Vegetation Index (NDVI) family image features. GLCM calculations are based on Haralick (1973) <doi:10.1109/TSMC.1973.4309314>.
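NDVI, one of the feature families listed above, is defined per pixel as (NIR - Red) / (NIR + Red). The Python sketch below assumes the two bands arrive as equally shaped nested lists of reflectance values. The function name `ndvi` is hypothetical and unrelated to agrifeature's own functions, and returning `None` for a zero denominator is a design choice made here.

```python
def ndvi(nir, red):
    """Per-pixel NDVI = (NIR - Red) / (NIR + Red).

    `nir` and `red` are equally shaped lists of rows of reflectance values.
    Pixels where NIR + Red == 0 are returned as None instead of dividing by zero.
    """
    if len(nir) != len(red) or any(len(a) != len(b) for a, b in zip(nir, red)):
        raise ValueError("bands must have the same shape")
    out = []
    for row_nir, row_red in zip(nir, red):
        out.append([
            (n - r) / (n + r) if (n + r) != 0 else None
            for n, r in zip(row_nir, row_red)
        ])
    return out


if __name__ == "__main__":
    nir_band = [[0.5, 0.3], [0.0, 0.8]]
    red_band = [[0.1, 0.3], [0.0, 0.2]]
    print(ndvi(nir_band, red_band))
```

Values near +1 indicate strong near-infrared reflectance relative to red, as is typical of dense green vegetation, while equal NIR and red reflectance gives 0.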
Maintained by Chun-Han Lee. Last updated 3 years ago.
4.5 match 1.00 score
mlampros
fastGLCM:'GLCM' Texture Features
Two 'Gray Level Co-occurrence Matrix' ('GLCM') implementations are included: The first is a fast 'GLCM' feature texture computation based on 'Python' 'Numpy' arrays ('Github' Repository, <https://github.com/tzm030329/GLCM>). The second is a fast 'GLCM' 'RcppArmadillo' implementation which is parallelized (using 'OpenMP') with the option to return all 'GLCM' features at once. For more information, see "Artifact-Free Thin Cloud Removal Using Gans" by Toizumi Takahiro, Zini Simone, Sagi Kazutoshi, Kaneko Eiji, Tsukada Masato, Schettini Raimondo (2019), IEEE International Conference on Image Processing (ICIP), pp. 3596-3600, <doi:10.1109/ICIP.2019.8803652>.
Maintained by Lampros Mouselimis. Last updated 2 years ago.
glcm, rcpparmadillo, openblas, cpp, openmp
1.0 match 5 stars 4.40 score 2 scripts
vtrevino
valorate:Velocity and Accuracy of the LOg-RAnk TEst
The algorithm implemented in this package was designed to quickly estimate the distribution of the log-rank statistic, especially for heavily unbalanced groups. VALORATE estimates the null distribution and the p-value of the log-rank test based on a recent formulation. For a given number of alterations that define the size of survival groups, the estimation involves a weighted sum of distributions that are conditional on a co-occurrence term where mutations and events are both present. The estimation of conditional distributions is quite fast, allowing the analysis of large datasets in a few minutes <http://bioinformatica.mty.itesm.mx/valorate>.
Maintained by Victor Trevino. Last updated 8 years ago.
4.4 match 1.00 score 9 scripts
celehs
kesernetwork:Visualization of the KESER Network
A shiny app to visualize the knowledge networks for the code concepts. Using co-occurrence matrices of EHR codes from Veterans Affairs (VA) and Massachusetts General Brigham (MGB), the knowledge extraction via sparse embedding regression (KESER) algorithm was used to construct knowledge networks for the code concepts. Background and details about the method can be found at Chuan et al. (2021) <doi:10.1038/s41746-021-00519-z>.
Maintained by Su-Chun Cheng. Last updated 2 years ago.
1.0 match 1 star 4.00 score 7 scripts
johnihrie
openEBGM:EBGM Disproportionality Scores for Adverse Event Data Mining
An implementation of DuMouchel's (1999) <doi:10.1080/00031305.1999.10474456> Bayesian data mining method for the market basket problem. Calculates Empirical Bayes Geometric Mean (EBGM) and posterior quantile scores using the Gamma-Poisson Shrinker (GPS) model to find unusually large cell counts in large, sparse contingency tables. Can be used to find unusually high reporting rates of adverse events associated with products. In general, can be used to mine any database where the co-occurrence of two variables or items is of interest. Also calculates relative and proportional reporting ratios. Builds on the work of the 'PhViD' package, from which much of the code is derived. Some of the added features include stratification to adjust for confounding variables and data squashing to improve computational efficiency. Includes an implementation of the EM algorithm for hyperparameter estimation loosely derived from the 'mederrRank' package.
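The relative and proportional reporting ratios mentioned above can both be computed from a 2x2 table of report counts. The Python sketch below uses the common unstratified textbook definitions, RR = a / E with E = (a + b)(a + c) / N, and PRR = [a / (a + b)] / [c / (c + d)]. It assumes no stratification, does not reproduce openEBGM's EBGM shrinkage, and `reporting_ratios` is a hypothetical helper rather than one of the package's functions.

```python
def reporting_ratios(a, b, c, d):
    """Return (RR, PRR) from a 2x2 table of report counts (unstratified).

    a: reports mentioning the product and the event
    b: reports mentioning the product but not the event
    c: reports mentioning the event but not the product
    d: reports mentioning neither
    """
    n = a + b + c + d
    expected = (a + b) * (a + c) / n  # expected count for cell a under independence
    rr = a / expected                 # relative reporting ratio: observed / expected
    prr = (a / (a + b)) / (c / (c + d))  # undefined (ZeroDivisionError) when c == 0
    return rr, prr


if __name__ == "__main__":
    rr, prr = reporting_ratios(a=20, b=80, c=100, d=9800)
    print(f"RR = {rr:.2f}, PRR = {prr:.2f}")
```

In this example the expected count is 100 * 120 / 10000 = 1.2, so RR is about 16.67, and PRR is 0.2 / (100 / 9900), about 19.8. Raw ratios like these are unstable for small counts, which is the problem the Gamma-Poisson Shrinker addresses.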
Maintained by John Ihrie. Last updated 2 years ago.
1.0 match 1 star 3.71 score 34 scripts 1 dependents
cran
mmpp:Various Similarity and Distance Metrics for Marked Point Processes
Compute similarities and distances between marked point processes.
Maintained by Hideitsu Hino. Last updated 7 years ago.
3.4 match 1.00 score
c0reyes
TextMiningGUI:Text Mining GUI Interface
Graphical interface for text analysis, implementing methods such as biplots, correspondence analysis, co-occurrence, clustering, topic models, correlations and sentiments.
Maintained by Conrado Reyes. Last updated 4 years ago.
analysis, biplot, biplots, correlations, sentiments, textmining, topic-models
1.1 match 3 stars 3.18 score 6 scripts
cran
RIA:Radiomics Image Analysis Toolbox for Medial Images
Radiomics image analysis toolbox for 2D and 3D radiological images. RIA supports DICOM, NIfTI, nrrd and npy (numpy array) file formats. RIA calculates first-order, gray level co-occurrence matrix, gray level run length matrix and geometry-based statistics. Almost all calculations are done using vectorized formulas to optimize run speeds. Calculation of several thousands of parameters only takes minutes on a single core of a conventional PC. Detailed methodology has been published: Kolossvary et al. Circ: Cardiovascular Imaging. 2017;10(12):e006843 <doi: 10.1161/CIRCIMAGING.117.006843>.
Maintained by Marton Kolossvary. Last updated 1 year ago.
1.0 match 7 stars 3.24 score
robitalec
ScaleInMultilayerNetworks:Package Accompanying: The Problem And Promise Of Scale In Multilayer Animal Social Networks.
Scale remains a foundational concept in ecology. Spatial scale, for instance, has become a central consideration in the way we understand landscape ecology and animal space use. Meanwhile, scale-dependent social processes can range from fine-scale interactions to co-occurrence and overlapping home ranges. Furthermore, sociality can vary within and across seasons. Multilayer networks promise the explicit integration of the social, spatial, and temporal contexts. Given the complex interplay of sociality and animal space use in heterogeneous landscapes, there remains an important gap in our understanding of the influence of scale on animal social networks. Using an empirical case study, we discuss ways of considering social, spatial, and temporal scale in the context of multilayer caribou social networks. Effective integration of social and spatial processes, including biologically meaningful scales, within the context of animal social networks is an emerging area of research. We incorporate perspectives that link the social environment to spatial processes across scales in a multilayer context.
Maintained by Alec L. Robitaille. Last updated 4 years ago.
ecology, multilayer-networks, social-network-analysis
1.0 match 1 star 2.00 score
carlosp-carmona
DarkDiv:Estimating Dark Diversity and Site-Specific Species Pools
Estimation of dark diversity and site-specific species pools using species co-occurrences. It includes implementations of probabilistic dark diversity based on the Hypergeometric distribution, as well as estimations based on the Beals index, which can be transformed to binary predictions using different thresholds, or transformed into a favorability index. All methods include the possibility of using a calibration dataset that is used to estimate the indication matrix between pairs of species, or to estimate dark diversity directly on a single dataset. See De Caceres and Legendre (2008) <doi:10.1007/s00442-008-1017-y>, Lewis et al. (2016) <doi:10.1111/2041-210X.12443>, Partel et al. (2011) <doi:10.1016/j.tree.2010.12.004>, Real et al. (2017) <doi:10.1093/sysbio/syw072> for further information.
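The Beals index mentioned above scores, for each site and species, how compatible that species is with the species actually recorded at the site. The Python sketch below implements classic Beals smoothing on a presence/absence matrix, B[i][j] = (1 / S_i) * sum over species k present at site i of M[j][k] / N[k]. It is an assumption, not DarkDiv's code: this variant keeps the target species in the sum, uses no calibration dataset, and applies no threshold or favorability transform. `beals` is a hypothetical name.

```python
def beals(sites):
    """Beals smoothing index for a presence/absence site x species matrix.

    sites[i][j] is 1 if species j occurs at site i, else 0.
    B[i][j] = (1 / S_i) * sum over species k present at site i of M[j][k] / N[k],
    where M[j][k] counts sites where j and k co-occur, N[k] counts sites with k,
    and S_i is the number of species at site i. The target species itself is
    kept in the sum when present. Empty sites get 0.0 for every species.
    """
    n_species = len(sites[0])
    occ = [sum(row[k] for row in sites) for k in range(n_species)]  # N[k]
    co = [[sum(row[j] * row[k] for row in sites) for k in range(n_species)]
          for j in range(n_species)]  # M[j][k]
    result = []
    for row in sites:
        present = [k for k in range(n_species) if row[k]]
        s = len(present)  # S_i; occ[k] >= 1 for every k in present
        result.append([
            sum(co[j][k] / occ[k] for k in present) / s if s else 0.0
            for j in range(n_species)
        ])
    return result


if __name__ == "__main__":
    # Two sites, two species: species 1 is missing from site 1
    print(beals([[1, 1], [1, 0]]))
```

For this example the result is `[[1.0, 0.75], [1.0, 0.5]]`: species 1 is absent from site 1 but still scores 0.5, the kind of value that a threshold could turn into a dark-diversity prediction.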
Maintained by Carlos P. Carmona. Last updated 5 years ago.
1.0 match 1.00 score 6 scripts