R-universe search: "co-occurrence"

nowosad

comat:Creates Co-Occurrence Matrices of Spatial Data

Builds co-occurrence matrices based on spatial raster data. It includes creation of weighted co-occurrence matrices (wecoma) and integrated co-occurrence matrices (incoma; Vadivel et al. (2007) <doi:10.1016/j.patrec.2007.01.004>).

Maintained by Jakub Nowosad. Last updated 1 year ago.

co-occurrence raster spatial cpp

73.6 match 6 stars 6.31 score 25 scripts 3 dependents

r-forge

wordspace:Distributional Semantic Models in R

An interactive laboratory for research on distributional semantic models ('DSM', see <https://en.wikipedia.org/wiki/Distributional_semantics> for more information).

Maintained by Stephanie Evert. Last updated 3 months ago.

cpp openmp

48.0 match 4.95 score 150 scripts 2 dependents

ecospat

ecospat:Spatial Ecology Miscellaneous Methods

Collection of R functions and data sets supporting spatial ecology analyses, with a focus on pre-modelling, core modelling and post-modelling analyses of species distribution, niche quantification and community assembly. Written by current and former members and collaborators of the ecospat group of Antoine Guisan, Department of Ecology and Evolution (DEE) and Institute of Earth Surface Dynamics (IDYST), University of Lausanne, Switzerland. Read Di Cola et al. (2016) <doi:10.1111/ecog.02671> for details.

Maintained by Olivier Broennimann. Last updated 1 month ago.

24.6 match 32 stars 9.35 score 418 scripts 1 dependent

griffithdan

cooccur:Probabilistic Species Co-Occurrence Analysis in R

This R package applies the probabilistic model of species co-occurrence (Veech 2013) to a set of species distributed among a set of survey or sampling sites. The algorithm calculates the observed and expected frequencies of co-occurrence between each pair of species. The expected frequency is based on the distribution of each species being random and independent of the other species. The analysis returns the probabilities that a more extreme (either low or high) value of co-occurrence could have been obtained by chance. The package also includes functions for visualizing species co-occurrence results and preparing data for downstream analyses.

Maintained by Daniel M. Griffith. Last updated 7 years ago.

49.1 match 3 stars 4.63 score 142 scripts
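The probabilistic model described above can be sketched in a few lines, assuming its usual hypergeometric form (cooccur's own implementation may differ in details): with N sites, species 1 present at N1 sites and species 2 at N2 sites, random and independent placement makes the number of shared sites hypergeometric, with expected value N1 * N2 / N. The function names `cooccur_probs` and `cooccur_test` are hypothetical, not the package's API.

```python
from math import comb


def cooccur_probs(n_sites: int, n1: int, n2: int) -> list[float]:
    """P(exactly j shared sites) for j = 0..min(n1, n2).

    Assumes both species are placed at random and independently
    (hypergeometric form of the model; a sketch, not cooccur's code).
    """
    total = comb(n_sites, n2)
    # math.comb returns 0 when k > n, so impossible j values get probability 0
    return [comb(n1, j) * comb(n_sites - n1, n2 - j) / total
            for j in range(min(n1, n2) + 1)]


def cooccur_test(n_sites: int, n1: int, n2: int, observed: int):
    """Return (expected co-occurrence, P(X <= observed), P(X >= observed))."""
    probs = cooccur_probs(n_sites, n1, n2)
    expected = n1 * n2 / n_sites
    p_lt = sum(probs[: observed + 1])  # lower tail: co-occur this rarely or less
    p_gt = sum(probs[observed:])       # upper tail: co-occur this often or more
    return expected, p_lt, p_gt
```

For example, two species each found at 5 of 10 sites but never together give `cooccur_test(10, 5, 5, 0)`: an expected co-occurrence of 2.5 and a lower-tail probability of 1/252, i.e. a suspiciously low co-occurrence.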

skembel

picante:Integrating Phylogenies and Ecology

Functions for phylocom integration, community analyses, null models, traits and evolution. Implements numerous ecophylogenetic approaches including measures of community phylogenetic and trait diversity, phylogenetic signal, estimation of trait values for unobserved taxa, null models for community and phylogeny randomizations, and utility functions for data input/output and phylogeny plotting. A full description of package functionality and methods is provided by Kembel et al. (2010) <doi:10.1093/bioinformatics/btq166>.

Maintained by Steven W. Kembel. Last updated 2 years ago.

10.6 match 34 stars 11.42 score 1.1k scripts 16 dependents

xijianzheng

coefa:Meta Analysis of Factor Analysis Based on CO-Occurrence Matrices

Provides a series of functions to conduct a meta-analysis of factor analysis based on co-occurrence matrices. The tool can be used to address debates about factor structure (i.e. the inner structure of a construct or scale) in several disciplines, such as psychology, psychiatry, management and education. References: Shafer (2005) <doi:10.1037/1040-3590.17.3.324>; Shafer (2006) <doi:10.1002/jclp.20213>; Loeber and Schmaling (1985) <doi:10.1007/BF00910652>.

Maintained by Xijian Zheng. Last updated 2 years ago.

43.8 match 2.70 score 4 scripts

dwarton

ecostats:Code and Data Accompanying the Eco-Stats Text (Warton 2022)

Functions and data supporting the Eco-Stats text (Warton, 2022, Springer), and solutions to exercises. Functions include tools for using simulation envelopes in diagnostic plots, and a function for diagnostic plots of multivariate linear models. Datasets mentioned in the package are included here (where not available elsewhere) and there is a vignette for each chapter of the text with solutions to exercises.

Maintained by David Warton. Last updated 1 year ago.

16.6 match 8 stars 6.58 score 53 scripts

bernd-mueller

epos:Epilepsy Ontologies' Similarities

Analysis and visualization of similarities between epilepsy ontologies based on text mining results by comparing ranked lists of co-occurring drug terms in the BioASQ corpus. The ranked result lists of neurological drug terms co-occurring with terms from the epilepsy ontologies EpSO, ESSO, EPILONT, EPISEM and FENICS undergo further analysis. The source data to create the ranked lists of drug names is produced using the text mining workflows described in Mueller, Bernd and Hagelstein, Alexandra (2016) <doi:10.4126/FRL01-006408558>, Mueller, Bernd et al. (2017) <doi:10.1007/978-3-319-58694-6_22>, Mueller, Bernd and Rebholz-Schuhmann, Dietrich (2020) <doi:10.1007/978-3-030-43887-6_52>, and Mueller, Bernd et al. (2022) <doi:10.1186/s13326-021-00258-w>.

Maintained by Bernd Mueller. Last updated 1 year ago.

26.4 match 4.03 score 53 scripts

bioc

mosbi:Molecular Signature identification using Biclustering

This package is an implementation of the biclustering ensemble method MoSBi (Molecular signature Identification from Biclustering). MoSBi provides standardized interfaces for biclustering results and can combine their results with a multi-algorithm ensemble approach to compute robust ensemble biclusters on molecular omics data. This is done by computing similarity networks of biclusters and filtering for overlaps using a custom error model. After that, Louvain modularity is used to extract bicluster communities from the similarity network, which can then be converted to ensemble biclusters. Additionally, MoSBi includes several network visualization methods to give an intuitive and scalable overview of the results. MoSBi comes with several biclustering algorithms, but can be easily extended to new biclustering algorithms.

Maintained by Tim Daniel Rose. Last updated 5 months ago.

software statisticalmethod clustering network cpp

24.6 match 4.30 score 8 scripts

tommyjones

textmineR:Functions for Text Mining and Topic Modeling

An aid for text mining in R, with a syntax that should be familiar to experienced R users. Provides a wrapper for several topic models that take similarly formatted input and give similarly formatted output. Also provides functionality for analyzing and diagnosing topic models.

Maintained by Tommy Jones. Last updated 2 years ago.

cpp

9.2 match 106 stars 10.83 score 310 scripts 7 dependents

bnosac

udpipe:Tokenization, Parts of Speech Tagging, Lemmatization and Dependency Parsing with the 'UDPipe' 'NLP' Toolkit

This natural language processing toolkit provides language-agnostic 'tokenization', 'parts of speech tagging', 'lemmatization' and 'dependency parsing' of raw text. Next to text parsing, the package also allows you to train annotation models based on data of 'treebanks' in 'CoNLL-U' format as provided at <https://universaldependencies.org/format.html>. The techniques are explained in detail in the paper: 'Tokenizing, POS Tagging, Lemmatizing and Parsing UD 2.0 with UDPipe', available at <doi:10.18653/v1/K17-3009>. The toolkit also contains functionalities for commonly used data manipulations on texts which are enriched with the output of the parser. Namely, functionalities and algorithms for collocations, token co-occurrence, document term matrix handling, term frequency-inverse document frequency calculations, information retrieval metrics (Okapi BM25), handling of multi-word expressions, keyword detection (Rapid Automatic Keyword Extraction, noun phrase extraction, syntactical patterns), sentiment scoring and semantic similarity analysis.

Maintained by Jan Wijffels. Last updated 2 years ago.

conll dependency-parser lemmatization natural-language-processing nlp pos-tagging r-pkg rcpp text-mining tokenizer udpipe cpp

7.5 match 215 stars 11.83 score 1.2k scripts 9 dependents

dracor-org

rdracor:Access to the 'DraCor' API

Provides an interface to the 'Drama Corpora Project' ('DraCor') API: <https://dracor.org/documentation/api>.

Maintained by Ivan Pozdniakov. Last updated 6 months ago.

17.4 match 14 stars 5.05 score 40 scripts

thomaschln

nlpembeds:Natural Language Processing Embeddings

Provides efficient methods to compute co-occurrence matrices, pointwise mutual information (PMI) and singular value decomposition (SVD). In the biomedical and clinical settings, one challenge is the huge size of databases, e.g. when analyzing data of millions of patients over tens of years. To address this, this package provides functions to efficiently compute monthly co-occurrence matrices, which is the computational bottleneck of the analysis, by using the 'RcppAlgos' package and sparse matrices. Furthermore, the functions can be called on 'SQL' databases, enabling the computation of co-occurrence matrices of tens of gigabytes of data, representing millions of patients over tens of years. Partly based on Hong C. (2021) <doi:10.1038/s41746-021-00519-z>.

Maintained by Thomas Charlon. Last updated 26 days ago.

17.4 match 4.98 score
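The co-occurrence to PMI step mentioned above can be sketched as follows, assuming a small symmetric co-occurrence table held as a dict of dicts (nlpembeds itself works with sparse matrices and SQL back ends). Each nonzero cell becomes log(p(i,j) / (p(i) p(j))), with all probabilities taken relative to the total count; an SVD of the resulting matrix would then yield the embeddings. The name `pmi_matrix` is hypothetical.

```python
import math


def pmi_matrix(counts):
    """Pointwise mutual information for each nonzero co-occurrence cell.

    counts: {term_i: {term_j: n_ij}}, assumed symmetric so that every
    term_j also appears as a row key (a sketch, not nlpembeds' code).
    """
    total = sum(n for row in counts.values() for n in row.values())
    # Marginal count of each term = its row sum
    marg = {i: sum(row.values()) for i, row in counts.items()}
    pmi = {}
    for i, row in counts.items():
        for j, n_ij in row.items():
            if n_ij > 0:  # PMI is undefined (log 0) for empty cells
                p_ij = n_ij / total
                p_i, p_j = marg[i] / total, marg[j] / total
                pmi[(i, j)] = math.log(p_ij / (p_i * p_j))
    return pmi
```

Positive values mark term pairs that co-occur more often than their marginal frequencies would predict.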

biorgeo

bioregion:Comparison of Bioregionalisation Methods

The main purpose of this package is to propose a transparent methodological framework to compare bioregionalisation methods based on hierarchical and non-hierarchical clustering algorithms (Kreft & Jetz (2010) <doi:10.1111/j.1365-2699.2010.02375.x>) and network algorithms (Lenormand et al. (2019) <doi:10.1002/ece3.4718> and Leroy et al. (2019) <doi:10.1111/jbi.13674>).

Maintained by Maxime Lenormand. Last updated 12 days ago.

biogeography bioregion bioregionalization cpp

13.7 match 7 stars 6.27 score 11 scripts

equitable-equations

fqar:Floristic Quality Assessment Tools for R

Tools for downloading and analyzing floristic quality assessment data. See Freyman et al. (2015) <doi:10.1111/2041-210X.12491> for more information about floristic quality assessment and the associated database.

Maintained by Andrew Gard. Last updated 2 months ago.

14.1 match 5 stars 5.88 score 5 scripts

quanteda

quanteda:Quantitative Analysis of Textual Data

A fast, flexible, and comprehensive framework for quantitative text analysis in R. Provides functionality for corpus management, creating and manipulating tokens and n-grams, exploring keywords in context, forming and manipulating sparse matrices of documents by features and feature co-occurrences, analyzing keywords, computing feature similarities and distances, applying content dictionaries, applying supervised and unsupervised machine learning, visually representing text and text analyses, and more.

Maintained by Kenneth Benoit. Last updated 2 months ago.

corpus natural-language-processing quanteda text-analytics onetbb cpp

4.6 match 851 stars 16.68 score 5.4k scripts 51 dependents

trinker

qdap:Bridging the Gap Between Qualitative Data and Quantitative Analysis

Automates many of the tasks associated with quantitative discourse analysis of transcripts, including frequency counts of sentence types, words, sentences, turns of talk, syllables and other assorted analysis tasks. The package provides parsing tools for preparing transcript data. Many functions enable the user to aggregate data by any number of grouping variables, providing analysis and seamless integration with other R packages that undertake higher-level analysis and visualization of text. This affords the user a more efficient and targeted analysis. 'qdap' is designed for transcript analysis; however, many functions are applicable to other areas of Text Mining/Natural Language Processing.

Maintained by Tyler Rinker. Last updated 4 years ago.

qdap quantitative-discourse-analysis text-analysis text-mining text-plotting openjdk

7.3 match 176 stars 9.61 score 1.3k scripts 3 dependents

lbbe-software

Mondrian:A Simple Graphical Representation of the Relative Occurrence and Co-Occurrence of Events

The package's single function represents, in a single graph, the relative occurrence and co-occurrence of events measured in a sample. As an example, the package was applied to describe the occurrence and co-occurrence of different species of bacterial or viral symbionts infecting arthropods at the individual level. The graphic shows the prevalence of each symbiont and the patterns of multiple infections (i.e. whether or not different symbionts share the same individual hosts). We named the package after the famous painter because the graphical output recalls Mondrian’s paintings.

Maintained by Aurélie Siberchicot. Last updated 8 months ago.

17.2 match 2 stars 4.00 score 8 scripts

azvoleff

glcm:Calculate Textures from Grey-Level Co-Occurrence Matrices (GLCMs)

Enables calculation of image textures (Haralick 1973) <doi:10.1109/TSMC.1973.4309314> from grey-level co-occurrence matrices (GLCMs). Supports processing images that cannot fit in memory.

Maintained by Alex Zvoleff. Last updated 5 years ago.

openblas cpp

13.3 match 15 stars 5.05 score 74 scripts
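To make the GLCM idea concrete, here is a minimal sketch, assuming a small integer-valued image, one offset (by default one pixel to the right) and no symmetrisation; glcm itself handles multiple shifts, large rasters and many more statistics. The Haralick contrast statistic is the sum of p(i,j) * (i - j)^2 over the normalised matrix. Function names are hypothetical.

```python
def glcm(image, levels, dx=1, dy=0):
    """Grey-level co-occurrence counts for a single offset (dx, dy).

    m[a][b] counts pixel pairs where value a sits at (r, c) and
    value b at (r + dy, c + dx). Sketch only, not the glcm package's code.
    """
    m = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:  # skip pairs off the edge
                m[image[r][c]][image[r2][c2]] += 1
    return m


def contrast(m):
    """Haralick contrast: sum of p(i, j) * (i - j)^2 over the normalised GLCM."""
    total = sum(map(sum, m))
    n = len(m)
    return sum(m[i][j] * (i - j) ** 2 for i in range(n) for j in range(n)) / total
```

For the 2 x 3 image `[[0, 0, 1], [0, 1, 1]]` the horizontal pairs are (0,0), (0,1), (0,1), (1,1), so `glcm(img, 2)` is `[[1, 2], [0, 1]]` and the contrast is 0.5.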

emcramer

CHOIRBM:Plots the CHOIR Body Map

Collection of utility functions for visualizing body map data collected with the Collaborative Health Outcomes Information Registry.

Maintained by Eric Cramer. Last updated 1 year ago.

body-map cbm choir data-visualization visualization

11.9 match 5 stars 5.51 score 26 scripts

hope-data-science

akc:Automatic Knowledge Classification

A tidy framework for automatic knowledge classification and visualization. Currently, the core functionality of the framework is mainly supported by modularity-based clustering (community detection) in keyword co-occurrence networks, and focuses on co-word analysis of bibliometric research. However, the functions in 'akc' are general and could be extended to other text mining tasks as well.

Maintained by Tian-Yuan Huang. Last updated 21 days ago.

11.0 match 15 stars 5.85 score 47 scripts

kasperwelbers

corpustools:Managing, Querying and Analyzing Tokenized Text

Provides text analysis in R, focusing on the use of a tokenized text format. In this format, the positions of tokens are maintained, and each token can be annotated (e.g., part-of-speech tags, dependency relations). Prominent features include advanced Lucene-like querying for specific tokens or contexts (e.g., documents, sentences), similarity statistics for words and documents, exporting to DTM for compatibility with many text analysis packages, and the possibility to reconstruct original text from tokens to facilitate interpretation.

Maintained by Kasper Welbers. Last updated 6 months ago.

cpp

8.0 match 31 stars 7.50 score 174 scripts 1 dependent

tesselle

tabula:Analysis and Visualization of Archaeological Count Data

An easy way to examine archaeological count data. This package provides several tests and measures of diversity: heterogeneity and evenness (Brillouin, Shannon, Simpson, etc.), richness and rarefaction (Chao1, Chao2, ACE, ICE, etc.), turnover and similarity (Brainerd-Robinson, etc.). It makes it easy to visualize count data and statistical thresholds: rank vs abundance plots, heatmaps, Ford (1962) and Bertin (1977) diagrams, etc.

Maintained by Nicolas Frerebeau. Last updated 14 days ago.

data-visualization archaeology archaeological-science

11.3 match 5.10 score 38 scripts 1 dependent

macroecology

letsR:Data Handling and Analysis in Macroecology

Handling, processing, and analyzing geographic data on species' distributions and environmental variables. Read Vilela & Villalobos (2015) <doi:10.1111/2041-210X.12401> for details.

Maintained by Bruno Vilela. Last updated 2 months ago.

6.2 match 29 stars 8.87 score 104 scripts

paballand

EconGeo:Computing Key Indicators of the Spatial Distribution of Economic Activities

Functions to compute a series of indices commonly used in the fields of economic geography, economic complexity, and evolutionary economics to describe the location, distribution, spatial organization, structure, and complexity of economic activities. Functions include basic spatial indicators such as the location quotient, the Krugman specialization index, the Herfindahl or the Shannon entropy indices but also more advanced functions to compute different forms of normalized relatedness between economic activities or network-based measures of economic complexity. Most of the functions use matrix calculus and are based on bipartite (incidence) matrices consisting of region-industry pairs.

Maintained by Pierre-Alexandre Balland. Last updated 2 years ago.

9.5 match 41 stars 4.96 score 44 scripts
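The location quotient listed above is commonly defined as LQ(r, i) = (x_ri / x_r) / (x_i / x): the share of industry i in region r's activity, divided by that industry's share of the whole economy, so values above 1 indicate relative specialisation. A minimal sketch on a region x industry matrix, assuming a plain list-of-lists input with no empty regions or industries; `location_quotient` is a hypothetical name, not EconGeo's function.

```python
def location_quotient(mat):
    """Location quotients for a region x industry matrix.

    mat[r][i] = activity (e.g. employment) of industry i in region r.
    Assumes every row and column sum is nonzero (sketch, not EconGeo's code).
    """
    total = sum(map(sum, mat))
    region_tot = [sum(row) for row in mat]            # x_r
    industry_tot = [sum(col) for col in zip(*mat)]    # x_i
    return [[(mat[r][i] / region_tot[r]) / (industry_tot[i] / total)
             for i in range(len(industry_tot))]
            for r in range(len(mat))]
```

With `[[10, 10], [30, 10]]`, region 0 holds half its activity in industry 1, which accounts for only a third of the total, giving an LQ of 1.5 (specialised); region 1's LQ for industry 0 is 1.125.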

kpmainali

CooccurrenceAffinity:Affinity in Co-Occurrence Data

Computes a novel metric of affinity between two entities based on their co-occurrence (using binary presence/absence data). The metric and its MLE, alpha hat, were proposed in Mainali, Slud, et al. (2021) <doi:10.1126/sciadv.abj9204>. Various types of confidence intervals and a median interval were developed in Mainali and Slud (2022) <doi:10.1101/2022.11.01.514801>.

Maintained by Kumar Mainali. Last updated 2 years ago.

10.5 match 26 stars 4.39 score 19 scripts

yiluheihei

RevEcoR:Reverse Ecology Analysis on Microbiome

An implementation of the reverse ecology framework. Reverse ecology refers to the use of genomics to study ecology with no a priori assumptions about the organism(s) under consideration, linking organisms to their environment. It allows researchers to reconstruct the metabolic networks and study the ecology of poorly characterized microbial species from their genomic information, and has substantial potential for microbial community ecological analysis.

Maintained by Yang Cao. Last updated 6 years ago.

7.1 match 6 stars 5.77 score 22 scripts 1 dependent

alarm-redist

redist:Simulation Methods for Legislative Redistricting

Enables researchers to sample redistricting plans from a pre-specified target distribution using Sequential Monte Carlo and Markov Chain Monte Carlo algorithms. The package allows for the implementation of various constraints in the redistricting process such as geographic compactness and population parity requirements. Tools for analysis such as computation of various summary statistics and plotting functionality are also included. The package implements the SMC algorithm of McCartan and Imai (2023) <doi:10.1214/23-AOAS1763>, the enumeration algorithm of Fifield, Imai, Kawahara, and Kenny (2020) <doi:10.1080/2330443X.2020.1791773>, the Flip MCMC algorithm of Fifield, Higgins, Imai and Tarr (2020) <doi:10.1080/10618600.2020.1739532>, the Merge-split/Recombination algorithms of Carter et al. (2019) <arXiv:1911.01503> and DeFord et al. (2021) <doi:10.1162/99608f92.eb30390f>, and the Short-burst optimization algorithm of Cannon et al. (2020) <arXiv:2011.02288>.

Maintained by Christopher T. Kenny. Last updated 2 months ago.

geospatial gerrymandering redistricting sampling openblas cpp openmp

3.6 match 68 stars 9.17 score 259 scripts

rbarkerclarke

gtexture:Generalized Application of Co-Occurrence Matrices and Haralick Texture

Generalizes application of gray-level co-occurrence matrix (GLCM) metrics to objects outside of images. The current focus is to apply GLCM metrics to the study of biological networks and fitness landscapes that are used in studying evolutionary medicine and biology, particularly the evolution of cancer resistance. The package was used in our publication, Barker-Clarke et al. (2023) <doi:10.1088/1361-6560/ace305>. A general reference to learn more about mathematical oncology can be found at Rockne et al. (2019) <doi:10.1088/1478-3975/ab1a09>.

Maintained by Rowan Barker-Clarke. Last updated 12 months ago.

10.5 match 3.00 score 1 script

codymarquart

rENA:Epistemic Network Analysis

ENA (Shaffer, D. W. (2017) Quantitative Ethnography. ISBN: 0578191687) is a method used to identify meaningful and quantifiable patterns in discourse or reasoning. ENA moves beyond the traditional frequency-based assessments by examining the structure of the co-occurrence, or connections in coded data. Moreover, compared to other methodological approaches, ENA has the novelty of (1) modeling whole networks of connections and (2) affording both quantitative and qualitative comparisons between different network models. Shaffer, D.W., Collier, W., & Ruis, A.R. (2016).

Maintained by Cody L Marquart. Last updated 1 year ago.

openblas cpp

11.9 match 1 star 2.26 score 36 scripts

jenniniku

gllvm:Generalized Linear Latent Variable Models

Analysis of multivariate data using generalized linear latent variable models (gllvm). Estimation is performed using either the Laplace method, variational approximations, or extended variational approximations, implemented via TMB (Kristensen et al. (2016), <doi:10.18637/jss.v070.i05>).

Maintained by Jenni Niku. Last updated 1 day ago.

cpp openmp

2.5 match 52 stars 10.54 score 176 scripts 1 dependent

andrew-plowright

ForestTools:Tools for Analyzing Remote Sensing Forest Data

Tools for analyzing remote sensing forest data, including functions for detecting treetops from canopy models, outlining tree crowns, and calculating textural metrics.

Maintained by Andrew Plowright. Last updated 1 month ago.

3.6 match 73 stars 7.01 score 103 scripts 1 dependent

quanteda

quanteda.textplots:Plots for the Quantitative Analysis of Textual Data

Plotting functions for visualising textual data. Extends 'quanteda' and related packages with plot methods designed specifically for text data, textual statistics, and models fit to textual data. Plot types include word clouds, lexical dispersion plots, scaling plots, network visualisations, and word 'keyness' plots.

Maintained by Kenneth Benoit. Last updated 7 months ago.

cpp

3.6 match 7 stars 6.77 score 648 scripts

bblonder

netassoc:Inference of Species Associations from Co-Occurrence Data

Infers species associations from community matrices. Uses local and (optional) regional-scale co-occurrence data by comparing observed partial correlation coefficients between species to those estimated from regional species distributions. Extends Gaussian graphical models to a null modeling framework. Provides interface to a variety of inverse covariance matrix estimation methods.

Maintained by Benjamin Blonder. Last updated 3 years ago.

10.4 match 2 stars 2.30 score 9 scripts

cysouw

qlcMatrix:Utility Sparse Matrix Functions for Quantitative Language Comparison

Extension of the functionality of the 'Matrix' package for using sparse matrices. Some of the functions are very general, while others are highly specific to the special data formats used for quantitative language comparison.

Maintained by Michael Cysouw. Last updated 9 months ago.

3.3 match 6 stars 6.98 score 256 scripts 1 dependent

microsoft

wpa:Tools for Analysing and Visualising Viva Insights Data

Opinionated functions that enable easier and faster analysis of Viva Insights data. There are three main types of functions in 'wpa': (1) Standard functions create a 'ggplot' visual or a summary table based on a specific Viva Insights metric; (2) Report Generation functions generate HTML reports on a specific analysis area, e.g. Collaboration; (3) Other miscellaneous functions cover more specific applications (e.g. Subject Line text mining) of Viva Insights data. This package adheres to 'tidyverse' principles and works well with the pipe syntax. 'wpa' is built with beginner-to-intermediate R users in mind, and is optimised for simplicity.

Maintained by Martin Chan. Last updated 4 months ago.

workplace-analytics

3.3 match 30 stars 6.69 score 39 scripts 1 dependent

microsoft

vivainsights:Analyze and Visualize Data from 'Microsoft Viva Insights'

Provides a versatile range of functions, including exploratory data analysis, time-series analysis, organizational network analysis, and data validation, whilst at the same time implementing a set of best practices in analyzing and visualizing data specific to 'Microsoft Viva Insights'.

Maintained by Martin Chan. Last updated 25 days ago.

3.3 match 11 stars 6.12 score 68 scripts

cranhaven

rock:Reproducible Open Coding Kit

The Reproducible Open Coding Kit ('ROCK', and this package, 'rock') was developed to facilitate reproducible and open coding, specifically geared towards qualitative research methods. Although it is a general-purpose toolkit, three specific applications have been implemented, namely an interface to the 'rENA' package that implements Epistemic Network Analysis ('ENA'), means to process notes from Cognitive Interviews ('CIs'), and means to work with decentralized construct taxonomies ('DCTs'). The 'ROCK' and this 'rock' package are described in the ROCK book <https://rockbook.org> and more information, such as tutorials, is available at <https://rock.science>.

Maintained by Gjalt-Jorn Peters. Last updated 9 days ago.

archived packages r-universe

5.6 match 5 stars 3.40 score

cran

ecoCopula:Graphical Modelling and Ordination using Copulas

Creates 'graphs' of species associations (interactions) and ordination biplots from co-occurrence data by fitting discrete Gaussian copula graphical models. Methods described in Popovic, GC., Hui, FKC., Warton, DI., (2018) <doi:10.1016/j.jmva.2017.12.002>.

Maintained by Gordana Popovic. Last updated 3 years ago.

4.4 match 4.17 score 49 scripts 2 dependents

weksi-budiaji

kmed:Distance-Based k-Medoids

Algorithms of distance-based k-medoids clustering: simple and fast k-medoids, ranked k-medoids, and increasing number of clusters in k-medoids. Calculates distances for mixed variable data, such as Gower, Podani, Wishart, Huang, Harikumar-PV, and Ahmad-Dey. Cluster validation applies internal and relative criteria. The internal criteria include the silhouette index and shadow values. The relative criterion applies a bootstrap procedure producing a heatmap with a flexible reordering matrix algorithm such as complete, ward, or average linkages. The cluster result can be plotted in a marked barplot or PCA biplot.

Maintained by Weksi Budiaji. Last updated 3 years ago.

5.7 match 3.15 score 141 scripts

modesto-escobar

netCoin:Interactive Analytic Networks

Creates interactive analytic networks. It combines the data analysis power of R for obtaining coincidences, co-occurrences and correlations with the visualization libraries of 'JavaScript' in one package.

Maintained by Modesto Escobar. Last updated 21 hours ago.

2.2 match 11 stars 7.22 score 47 scripts

kisungyou

T4cluster:Tools for Cluster Analysis

Cluster analysis is one of the most fundamental problems in data science. We provide a variety of algorithms, ranging from clustering to learning on the space of partitions. See Hennig, Meila, and Rocci (2016, ISBN:9781466551886) for a general exposition of cluster analysis.

Maintained by Kisung You. Last updated 3 years ago.

openblas cpp openmp

3.6 match 6 stars 4.26 score 9 scripts 2 dependents

chaoliu-cl

textAnnotatoR:Interactive Text Annotation Tool with 'shiny' GUI

A comprehensive text annotation tool built with 'shiny'. Provides an interactive graphical user interface for coding text documents, managing code hierarchies, creating memos, and analyzing coding patterns. Features include code co-occurrence analysis, visualization of coding patterns, comparison of multiple coding sets, and export capabilities. Supports collaborative qualitative research through standardized annotation formats and analysis tools.

Maintained by Chao Liu. Last updated 4 months ago.

3.5 match 4.30 score 5 scripts

dernarr

ndl:Naive Discriminative Learning

Naive discriminative learning implements learning and classification models based on the Rescorla-Wagner equations and their equilibrium equations.

Maintained by Tino Sering. Last updated 7 years ago.

cpp

5.0 match 1 star 3.00 score 66 scripts
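The Rescorla-Wagner equations mentioned above can be illustrated with a minimal trial-by-trial sketch: on each learning event, every cue present changes its weight to each outcome by rate * (lambda - total activation of the present cues), where lambda is the maximum associative strength when the outcome occurs and 0 when it does not. This assumes a single combined learning rate and uses hypothetical names; it is not ndl's code, which also estimates the equilibrium weights directly.

```python
from collections import defaultdict


def rescorla_wagner(trials, outcomes, rate=0.1, lam=1.0):
    """Iterate the Rescorla-Wagner update over a list of learning events.

    trials: list of (set_of_cues, set_of_outcomes_present).
    Returns {(cue, outcome): weight}. Sketch only; rate stands in for
    the product of the cue and outcome salience parameters.
    """
    w = defaultdict(float)  # association weights, 0 until updated
    for cues, present in trials:
        for o in outcomes:
            target = lam if o in present else 0.0
            activation = sum(w[(c, o)] for c in cues)  # summed prediction
            delta = rate * (target - activation)       # prediction error
            for c in cues:                             # only present cues learn
                w[(c, o)] += delta
    return dict(w)
```

Two trials pairing cues `{"a", "b"}` with outcome `"x"` raise each weight to 0.1 and then 0.18: because both cues share the prediction error, their combined activation approaches lambda rather than each cue reaching it alone.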

cran

pubmed.mineR:Text Mining of PubMed Abstracts

Text mining of PubMed Abstracts (text and XML) from <https://pubmed.ncbi.nlm.nih.gov/>.

Maintained by S. Ramachandran. Last updated 6 months ago.

6.8 match 6 stars 2.08 score

traminer

TraMineRextras:TraMineR Extension

Collection of ancillary functions and utilities to be used in conjunction with the 'TraMineR' package for sequence data exploration. Includes, among others, specific functions such as state survival plots, position-wise group-typical states, dynamic sequence indicators, and dissimilarities between event sequences. Also includes contributions by non-members of the TraMineR team such as methods for polyadic data and for the comparison of groups of sequences.

Maintained by Gilbert Ritschard. Last updated 7 months ago.

5.4 match 2.43 score 89 scripts 1 dependent

polmine

polmineR:Verbs and Nouns for Corpus Analysis

Package for corpus analysis using the Corpus Workbench ('CWB', <https://cwb.sourceforge.io>) as an efficient back end for indexing and querying large corpora. The package offers functionality to flexibly create subcorpora and to carry out basic statistical operations (count, co-occurrences etc.). The original full text of documents can be reconstructed and inspected at any time. Beyond that, the package is intended to serve as an interface to packages implementing advanced statistical procedures. Respective data structures (document-term matrices, term-co-occurrence matrices etc.) can be created based on the indexed corpora.

Maintained by Andreas Blaette. Last updated 1 year ago.

1.5 match 49 stars 7.96 score 311 scripts

bnosac

BTM:Biterm Topic Models for Short Text

Biterm Topic Models find topics in collections of short texts. It is a word co-occurrence based topic model that learns topics by modeling word-word co-occurrence patterns, called biterms. This is in contrast to traditional topic models like Latent Dirichlet Allocation and Probabilistic Latent Semantic Analysis, which are word-document co-occurrence topic models. A biterm consists of two words co-occurring in the same short text window. This context window can, for example, be a twitter message, a short answer on a survey, a sentence of a text or a document identifier. The techniques are explained in detail in the paper 'A Biterm Topic Model For Short Text' by Xiaohui Yan, Jiafeng Guo, Yanyan Lan, Xueqi Cheng (2013) <https://github.com/xiaohuiyan/xiaohuiyan.github.io/blob/master/paper/BTM-WWW13.pdf>.

Maintained by Jan Wijffels. Last updated 2 years ago.

biterm-topic-modelling natural-language-processing topic-modeling cpp

1.9 match 96 stars 6.25 score 74 scripts
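Extracting biterms from a context window, as described above, amounts to taking every unordered pair of word positions in that window. A minimal sketch, assuming whitespace tokenisation and ordering each pair alphabetically so that ("b", "a") and ("a", "b") count as the same biterm; `biterms` and its optional `window` argument (maximum distance between paired positions) are hypothetical, not BTM's API.

```python
from itertools import combinations


def biterms(text, window=None):
    """All unordered word pairs co-occurring in one short text.

    If window is given, only pairs at most `window` positions apart are
    kept (hypothetical parameter for illustration).
    """
    words = text.split()
    pairs = []
    # combinations over enumerate yields position pairs with i < j
    for (i, a), (j, b) in combinations(enumerate(words), 2):
        if window is None or j - i <= window:
            pairs.append(tuple(sorted((a, b))))
    return pairs
```

For `"visit apple store"` this gives three biterms, `("apple", "visit")`, `("store", "visit")` and `("apple", "store")`; a topic model then treats these pairs, rather than whole documents, as its observations.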

quanteda

quanteda.textstats:Textual Statistics for the Quantitative Analysis of Textual Data

Textual statistics functions formerly in the 'quanteda' package. Textual statistics for characterizing and comparing textual data. Includes functions for measuring term and document frequency, the co-occurrence of words, similarity and distance between features and documents, feature entropy, keyword occurrence, readability, and lexical diversity. These functions extend the 'quanteda' package and are specially designed for sparse textual data.

Maintained by Kenneth Benoit. Last updated 7 months ago.

onetbb cpp

1.3 match 15 stars 8.91 score 916 scripts 10 dependents

juliasilge

widyr:Widen, Process, then Re-Tidy Data

Encapsulates the pattern of untidying data into a wide matrix, performing some processing, then turning it back into a tidy form. This is useful for several operations such as co-occurrence counts, correlations, or clustering that are mathematically convenient on wide matrices.

Maintained by Julia Silge. Last updated 2 years ago.

1.0 match 328 stars 11.11 score 1.7k scripts 2 dependents
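The widen, process, re-tidy pattern can be sketched for pairwise co-occurrence counts: start from tidy (item, feature) rows such as (word, document), widen them into one feature set per item (a sparse stand-in for the wide matrix), count the features each pair of items shares, and return tidy (item1, item2, n) rows. This is plain Python with a hypothetical `pairwise_count` helper; widyr itself does this in R on data frames.

```python
from collections import defaultdict
from itertools import combinations


def pairwise_count(rows):
    """Count how many features each pair of items shares.

    rows: iterable of (item, feature) pairs, e.g. (word, document).
    Returns tidy (item1, item2, n) tuples for pairs with n > 0.
    """
    # Widen: one set of features per item
    features = defaultdict(set)
    for item, feature in rows:
        features[item].add(feature)
    # Process and re-tidy: shared-feature count for every unordered item pair
    out = []
    for a, b in combinations(sorted(features), 2):
        n = len(features[a] & features[b])
        if n:
            out.append((a, b, n))
    return out
```

Using sets keeps the wide step sparse, so items that never share a feature cost nothing beyond the pairwise loop.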

kidoishi

MadanTextNetwork:Persian Textmining Tool for Co-Occurrence_Network

MadanText_co-occurrence_network is open-source software designed specifically for text mining in the Persian language. It adds co-occurrence network functionality to MadanText and takes an Excel file as input instead of plain text.

Maintained by Kido Ishikawa. Last updated 1 years ago.

openjdk

4.0 match 2.70 score

nalimilan

RcmdrPlugin.temis:Graphical Integrated Text Mining Solution

An 'R Commander' plug-in providing an integrated solution to perform a series of text mining tasks such as importing and cleaning a corpus, and analyses like term and document counts, vocabulary tables, term co-occurrences and document similarity measures, time series analysis, correspondence analysis and hierarchical clustering. Corpora can be imported from spreadsheet-like files, directories of raw text files, 'Twitter' queries, as well as from 'Dow Jones Factiva', 'LexisNexis', 'Europresse' and 'Alceste' files.

Maintained by Milan Bouchet-Valat. Last updated 7 years ago.

10.5 match 1.00 score 7 scripts

nowosad

motif:Local Pattern Analysis

Describes spatial patterns of categorical raster data for any defined regular and irregular areas. Patterns are described quantitatively using built-in signatures based on co-occurrence matrices, and any user-defined function can also be used. It enables spatial analyses such as search, change detection, and clustering to be performed on spatial patterns (Nowosad (2021) <doi:10.1007/s10980-020-01135-0>).

Maintained by Jakub Nowosad. Last updated 7 months ago.

categorical-raster global-ecology landscape-ecology spatial cpp

1.0 match 63 stars 7.48 score 48 scripts

reviewburner

AnimalSequences:Analyse Animal Sequential Behaviour and Communication

All animal behaviour occurs sequentially. The package has a number of functions to format sequence data from different sources, to analyse sequential behaviour and communication in animals. It also has functions to plot the data and to calculate the entropy of sequences.

Maintained by Alex Mielke. Last updated 6 months ago.

7.1 match 1.00 score
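One simple notion of sequence entropy, as mentioned in the AnimalSequences description, is the Shannon entropy of element frequencies. The sketch below assumes that definition; the package may use a different one (for example, entropy over transitions between elements), and `shannon_entropy` is a hypothetical helper, not a package function.

```python
import math
from collections import Counter

def shannon_entropy(sequence):
    """Shannon entropy (in bits) of the element frequencies in a sequence."""
    counts = Counter(sequence)
    n = len(sequence)
    # H = -sum p * log2(p) over the distinct elements
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

print(shannon_entropy(["groom", "play", "groom", "play"]))  # 1.0
```

A sequence that repeats one behaviour has entropy 0, while a sequence spread evenly over four behaviours reaches the maximum of 2 bits.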

bnosac

textplot:Text Plots

Visualise complex relations in texts. This is done by providing functionalities for displaying text co-occurrence networks, text correlation networks, dependency relationships as well as text clustering and semantic text 'embeddings'. Feel free to join the effort of providing interesting text visualisations.

Maintained by Jan Wijffels. Last updated 3 years ago.

1.0 match 54 stars 6.78 score 75 scripts 1 dependents

yoctozepto

RAFS:Robust Aggregative Feature Selection

A cross-validated minimal-optimal feature selection algorithm. It utilises popularity counting, hierarchical clustering with feature dissimilarity measures, and prefiltering with an all-relevant feature selection method to obtain the minimal-optimal set of features.

Maintained by Radosław Piliszek. Last updated 2 months ago.

5.2 match 1.30 score

ailich

GLCMTextures:GLCM Textures of Raster Layers

Calculates grey level co-occurrence matrix (GLCM) based texture measures (Hall-Beyer (2017) <https://prism.ucalgary.ca/bitstream/handle/1880/51900/texture%20tutorial%20v%203_0%20180206.pdf>; Haralick et al. (1973) <doi:10.1109/TSMC.1973.4309314>) of raster layers using a sliding rectangular window. It also includes functions to quantize a raster into grey levels, as well as to tabulate a GLCM and calculate GLCM texture metrics for a matrix.

Maintained by Alexander Ilich. Last updated 2 months ago.

openblas cpp openmp

1.0 match 12 stars 6.33 score 20 scripts 2 dependents
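Tabulating a GLCM for an already quantized matrix, as described for GLCMTextures, can be sketched as follows. This is a minimal pure-Python illustration, not the package's code: it assumes one pixel offset (`dx`, `dy`), a symmetric matrix normalised to probabilities, and uses Haralick's contrast as the example texture metric. The sliding-window step over a raster is omitted.

```python
def glcm(image, dx=1, dy=0, levels=None, symmetric=True):
    """Normalised grey level co-occurrence matrix for one pixel offset.

    image: 2D list of integer grey levels in 0..levels-1.
    """
    rows, cols = len(image), len(image[0])
    if levels is None:
        levels = max(max(row) for row in image) + 1
    counts = [[0] * levels for _ in range(levels)]
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:  # neighbour inside image
                a, b = image[r][c], image[r2][c2]
                counts[a][b] += 1
                if symmetric:
                    counts[b][a] += 1  # also count the reverse direction
    total = sum(map(sum, counts))
    if total == 0:
        raise ValueError("offset leaves no pixel pairs inside the image")
    return [[v / total for v in row] for row in counts]

def contrast(P):
    """Haralick contrast: sum over i, j of P[i][j] * (i - j)**2."""
    n = len(P)
    return sum(P[i][j] * (i - j) ** 2 for i in range(n) for j in range(n))

P = glcm([[0, 0, 1],
          [0, 1, 1]])
print(P)            # [[0.25, 0.25], [0.25, 0.25]]
print(contrast(P))  # 0.5
```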

bioc

BioNAR:Biological Network Analysis in R

The R package BioNAR was developed for step-by-step analysis of protein-protein interaction (PPI) networks. Its aim is to quantify and rank each protein’s simultaneous impact on multiple complexes based on network topology and clustering. The package also enables estimation of the co-occurrence of diseases across the network and within specific clusters, pointing towards shared or common mechanisms.

Maintained by Anatoly Sorokin. Last updated 20 days ago.

software graphandnetwork network

1.0 match 3 stars 5.90 score 35 scripts

jferrer-b

Rediscover:Identify Mutually Exclusive Mutations

An optimized method for identifying mutually exclusive genomic events. Its main contribution is a statistical analysis based on the Poisson-Binomial distribution that takes into account that some samples are more mutated than others. See [Canisius, Sander, John WM Martens, and Lodewyk FA Wessels. (2016) "A novel independence test for somatic alterations in cancer shows that biology drives mutual exclusivity but chance explains most co-occurrence." Genome biology 17.1 : 1-17. <doi:10.1186/s13059-016-1114-x>]. The mutation matrices are sparse, and the method exploits this sparsity to save time and computing resources.

Maintained by Juan A. Ferrer-Bonsoms. Last updated 2 years ago.

mutex

2.2 match 2.70 score 7 scripts

nalimilan

R.temis:Integrated Text Mining Solution

An integrated solution to perform a series of text mining tasks such as importing and cleaning a corpus, and analyses like term and document counts, lexical summary, term co-occurrences and document similarity measures, graphs of terms, correspondence analysis and hierarchical clustering. Corpora can be imported from spreadsheet-like files, directories of raw text files, as well as from 'Dow Jones Factiva', 'LexisNexis', 'Europresse' and 'Alceste' files.

Maintained by Milan Bouchet-Valat. Last updated 19 days ago.

text-mining

1.0 match 27 stars 4.99 score 24 scripts

tslumley

rimu:Responses in Multiplex

Tools for manipulating, exploring, and visualising multiple-response data, including scored or ranked responses. Conversions to and from factors, lists, strings, matrices; reordering, lumping, flattening; set operations; tables; frequency and co-occurrence plots.

Maintained by Thomas Lumley. Last updated 12 months ago.

1.0 match 4 stars 4.78 score 10 scripts

chiliubio

meconetcomp:Compare Microbial Networks of 'trans_network' Class of 'microeco' Package

Compare microbial co-occurrence networks created from the 'trans_network' class of the 'microeco' package <https://github.com/ChiLiubio/microeco>. This package extends the 'trans_network' class of 'microeco' and is especially useful when different networks are constructed and analyzed simultaneously.

Maintained by Chi Liu. Last updated 22 days ago.

1.0 match 9 stars 4.69 score 12 scripts

kdonnay

meltt:Matching Event Data by Location, Time and Type

Framework for merging and disambiguating event data based on spatiotemporal co-occurrence and secondary event characteristics. It can account for intrinsic "fuzziness" in the coding of events, varying event taxonomies and different geo-precision codes.

Maintained by Karsten Donnay. Last updated 8 months ago.

cpp

1.0 match 24 stars 4.64 score 12 scripts

cran

agrifeature:Agriculture Image Feature

Functions to calculate Gray Level Co-occurrence Matrix (GLCM), RGB-based Vegetative Index (RGB VI) and Normalized Difference Vegetation Index (NDVI) family image features. GLCM calculations are based on Haralick (1973) <doi:10.1109/TSMC.1973.4309314>.

Maintained by Chun-Han Lee. Last updated 3 years ago.

4.5 match 1.00 score

mlampros

fastGLCM:'GLCM' Texture Features

Two 'Gray Level Co-occurrence Matrix' ('GLCM') implementations are included: the first is a fast 'GLCM' texture feature computation based on 'Python' 'Numpy' arrays ('Github' Repository, <https://github.com/tzm030329/GLCM>). The second is a fast, parallelized (using 'OpenMP') 'GLCM' 'RcppArmadillo' implementation with the option to return all 'GLCM' features at once. For more information, see "Artifact-Free Thin Cloud Removal Using Gans" by Toizumi Takahiro, Zini Simone, Sagi Kazutoshi, Kaneko Eiji, Tsukada Masato, Schettini Raimondo (2019), IEEE International Conference on Image Processing (ICIP), pp. 3596-3600, <doi:10.1109/ICIP.2019.8803652>.

Maintained by Lampros Mouselimis. Last updated 2 years ago.

glcm rcpparmadillo openblas cpp openmp

1.0 match 5 stars 4.40 score 2 scripts

vtrevino

valorate:Velocity and Accuracy of the LOg-RAnk TEst

The algorithm implemented in this package was designed to quickly estimate the distribution of the log-rank statistic, especially for heavily unbalanced groups. VALORATE estimates the null distribution and the p-value of the log-rank test based on a recent formulation. For a given number of alterations that define the size of the survival groups, the estimation involves a weighted sum of distributions that are conditional on a co-occurrence term in which mutations and events are both present. Estimating the conditional distributions is fast, allowing large datasets to be analysed in a few minutes <http://bioinformatica.mty.itesm.mx/valorate>.

Maintained by Victor Trevino. Last updated 8 years ago.

4.4 match 1.00 score 9 scripts

celehs

kesernetwork:Visualization of the KESER Network

A Shiny app to visualize the knowledge networks for the code concepts. Using co-occurrence matrices of EHR codes from Veterans Affairs (VA) and Massachusetts General Brigham (MGB), the knowledge extraction via sparse embedding regression (KESER) algorithm was used to construct knowledge networks for the code concepts. Background and details about the method can be found at Chuan et al. (2021) <doi:10.1038/s41746-021-00519-z>.

Maintained by Su-Chun Cheng. Last updated 2 years ago.

1.0 match 1 stars 4.00 score 7 scripts

johnihrie

openEBGM:EBGM Disproportionality Scores for Adverse Event Data Mining

An implementation of DuMouchel's (1999) <doi:10.1080/00031305.1999.10474456> Bayesian data mining method for the market basket problem. Calculates Empirical Bayes Geometric Mean (EBGM) and posterior quantile scores using the Gamma-Poisson Shrinker (GPS) model to find unusually large cell counts in large, sparse contingency tables. Can be used to find unusually high reporting rates of adverse events associated with products. In general, can be used to mine any database where the co-occurrence of two variables or items is of interest. Also calculates relative and proportional reporting ratios. Builds on the work of the 'PhViD' package, from which much of the code is derived. Some of the added features include stratification to adjust for confounding variables and data squashing to improve computational efficiency. Includes an implementation of the EM algorithm for hyperparameter estimation loosely derived from the 'mederrRank' package.

Maintained by John Ihrie. Last updated 2 years ago.

1.0 match 1 stars 3.71 score 34 scripts 1 dependents

cran

mmpp:Various Similarity and Distance Metrics for Marked Point Processes

Compute similarities and distances between marked point processes.

Maintained by Hideitsu Hino. Last updated 7 years ago.

3.4 match 1.00 score

c0reyes

TextMiningGUI:Text Mining GUI Interface

Graphical interface for text analysis that implements a few methods, such as biplots, correspondence analysis, co-occurrence, clustering, topic models, correlations and sentiment analysis.

Maintained by Conrado Reyes. Last updated 4 years ago.

analysis biplot biplots correlations sentiments textmining topic-models

1.1 match 3 stars 3.18 score 6 scripts

cran

RIA:Radiomics Image Analysis Toolbox for Medical Images

Radiomics image analysis toolbox for 2D and 3D radiological images. RIA supports DICOM, NIfTI, nrrd and npy (numpy array) file formats. RIA calculates first-order, gray level co-occurrence matrix, gray level run length matrix and geometry-based statistics. Almost all calculations are done using vectorized formulas to optimize run speeds. Calculation of several thousand parameters takes only minutes on a single core of a conventional PC. Detailed methodology has been published: Kolossvary et al. Circ: Cardiovascular Imaging. 2017;10(12):e006843 <doi:10.1161/CIRCIMAGING.117.006843>.

Maintained by Marton Kolossvary. Last updated 1 years ago.

1.0 match 7 stars 3.24 score

cran

codecountR:Counting Codes in a Text and Preparing Data for Analysis

Data analysis often requires coding, especially when data are collected through interviews, observations, or questionnaires. As a result, code counting and data preparation are essential steps in the analysis process. Analysts may need to count the codes in a text (tokenization, counting of pre-established codes, computing the co-occurrence matrix by line) and prepare the data (e.g., min-max normalization, Z-score, robust scaling, Box-Cox transformation, and non-parametric bootstrap). For the Box-Cox transformation (Box & Cox, 1964, <https://www.jstor.org/stable/2984418>), the optimal lambda is determined by maximizing the log-likelihood. Non-parametric bootstrap involves randomly sampling data with replacement. Two random number generators are also integrated: a Lehmer congruential generator for the uniform distribution and a Box-Muller generator for the normal distribution. The package is intended for educational purposes.

Maintained by Philippe Cohard. Last updated 17 days ago.

1.0 match 2.48 score
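The two random number generators named in the codecountR description can be sketched together: a Lehmer generator produces uniforms, and the Box-Muller transform turns pairs of them into standard normal draws. This Python sketch assumes the classic Park-Miller "minimal standard" constants (multiplier 16807, modulus 2^31 - 1); codecountR's actual parameters are not given in the description, and `Lehmer` and `box_muller` are hypothetical names.

```python
import math

class Lehmer:
    """Lehmer generator x_{k+1} = a * x_k mod m (Park-Miller constants assumed)."""

    def __init__(self, seed=1, a=16807, m=2**31 - 1):
        if not 0 < seed < m:
            raise ValueError("seed must be in 1..m-1")
        self.state, self.a, self.m = seed, a, m

    def uniform(self):
        self.state = (self.a * self.state) % self.m
        # m is prime, so the state never reaches 0 and the result lies in (0, 1)
        return self.state / self.m

def box_muller(gen):
    """Turn two uniforms into two independent standard normal draws."""
    u1, u2 = gen.uniform(), gen.uniform()
    r = math.sqrt(-2.0 * math.log(u1))  # radius; u1 > 0 so log is defined
    theta = 2.0 * math.pi * u2          # uniformly distributed angle
    return r * math.cos(theta), r * math.sin(theta)

g = Lehmer(seed=1)
z0, z1 = box_muller(g)
print(z0, z1)
```

The Box-Muller step relies on the radius and angle being independent, which is why each call consumes two fresh uniforms rather than reusing one.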

robitalec

ScaleInMultilayerNetworks:Package Accompanying: The Problem And Promise Of Scale In Multilayer Animal Social Networks.

Scale remains a foundational concept in ecology. Spatial scale, for instance, has become a central consideration in the way we understand landscape ecology and animal space use. Meanwhile, scale-dependent social processes can range from fine-scale interactions to co-occurrence and overlapping home ranges. Furthermore, sociality can vary within and across seasons. Multilayer networks promise the explicit integration of the social, spatial, and temporal contexts. Given the complex interplay of sociality and animal space use in heterogeneous landscapes, there remains an important gap in our understanding of the influence of scale on animal social networks. Using an empirical case study, we discuss ways of considering social, spatial, and temporal scale in the context of multilayer caribou social networks. Effective integration of social and spatial processes, including biologically meaningful scales, within the context of animal social networks is an emerging area of research. We incorporate perspectives that link the social environment to spatial processes across scales in a multilayer context.

Maintained by Alec L. Robitaille. Last updated 4 years ago.

ecology multilayer-networks social-network-analysis

1.0 match 1 stars 2.00 score

carlosp-carmona

DarkDiv:Estimating Dark Diversity and Site-Specific Species Pools

Estimation of dark diversity and site-specific species pools using species co-occurrences. It includes implementations of probabilistic dark diversity based on the Hypergeometric distribution, as well as estimations based on the Beals index, which can be transformed to binary predictions using different thresholds, or transformed into a favorability index. All methods include the possibility of using a calibration dataset that is used to estimate the indication matrix between pairs of species, or to estimate dark diversity directly on a single dataset. See De Caceres and Legendre (2008) <doi:10.1007/s00442-008-1017-y>, Lewis et al. (2016) <doi:10.1111/2041-210X.12443>, Partel et al. (2011) <doi:10.1016/j.tree.2010.12.004>, Real et al. (2017) <doi:10.1093/sysbio/syw072> for further information.

Maintained by Carlos P. Carmona. Last updated 5 years ago.

1.0 match 1.00 score 6 scripts
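The Beals index mentioned in the DarkDiv description estimates how likely each species is at a site from its co-occurrences with the species actually recorded there. The sketch below implements one common formulation as an assumption: B[i][j] = (1 / S_i) * sum over species k present at site i of M[j][k] / N[k], where M[j][k] counts sites shared by species j and k, N[k] is the number of sites with species k, and S_i is the richness of site i (species j itself is included in the sum). Variants that exclude j exist, and DarkDiv's thresholding, favorability transformation and calibration-dataset options are not reproduced; `beals` is a hypothetical helper.

```python
def beals(X):
    """Beals smoothing of a presence/absence matrix X (sites x species)."""
    n_sites, n_species = len(X), len(X[0])
    # N[k]: number of sites where species k occurs
    N = [sum(X[i][k] for i in range(n_sites)) for k in range(n_species)]
    # M[j][k]: number of sites where species j and k co-occur
    M = [[sum(X[i][j] * X[i][k] for i in range(n_sites))
          for k in range(n_species)] for j in range(n_species)]
    B = []
    for i in range(n_sites):
        present = [k for k in range(n_species) if X[i][k]]
        S = len(present)
        B.append([sum(M[j][k] / N[k] for k in present) / S if S else 0.0
                  for j in range(n_species)])
    return B

X = [[1, 1, 0],
     [1, 0, 1],
     [0, 1, 1]]
print(beals(X)[0])  # [0.75, 0.75, 0.5]
```

A high value for a species that is absent from a site marks it as a candidate member of that site's dark diversity.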