Showing 7 of 7 results
cran
topicmodels: Topic Models
Provides an interface to the C code for Latent Dirichlet Allocation (LDA) models and Correlated Topics Models (CTM) by David M. Blei and co-authors and the C++ code for fitting LDA models using Gibbs sampling by Xuan-Hieu Phan and co-authors.
Maintained by Bettina Grün. Last updated 7 months ago.
54.9 match 8 stars 9.04 score 5.0k scripts 16 dependents
jonasrieger
rollinglda: Construct Consistent Time Series from Textual Data
A rolling version of the Latent Dirichlet Allocation, see Rieger et al. (2021) <doi:10.18653/v1/2021.findings-emnlp.201>. By a sequential approach, it enables the construction of LDA-based time series of topics that are consistent with previous states of LDA models. After an initial modeling, updates can be computed efficiently, allowing for real-time monitoring and detection of events or structural breaks.
Maintained by Jonas Rieger. Last updated 1 year ago.
consistency, latent-dirichlet-allocation, lda, model-selection, reliability, text-mining, textdata, topic-model, topic-models, topicmodel, topicmodeling, topicmodelling
31.0 match 12 stars 4.03 score 18 scripts
jwijffels
topicmodels.etm: Topic Modelling in Embedding Spaces
Find topics in texts which are semantically embedded using techniques like word2vec or GloVe. This topic modelling technique models each word with a categorical distribution whose natural parameter is the inner product between a word embedding and an embedding of its assigned topic. The techniques are explained in detail in the paper 'Topic Modeling in Embedding Spaces' by Adji B. Dieng, Francisco J. R. Ruiz, David M. Blei (2019), available at <arXiv:1907.04907>.
Maintained by Jan Wijffels. Last updated 3 years ago.
37.5 match 1 star 2.90 score 32 scripts
jonasrieger
ldaPrototype: Prototype of Multiple Latent Dirichlet Allocation Runs
Determine a Prototype from a number of runs of Latent Dirichlet Allocation (LDA) measuring its similarities with S-CLOP: A procedure to select the LDA run with highest mean pairwise similarity, which is measured by S-CLOP (Similarity of multiple sets by Clustering with Local Pruning), to all other runs. LDA runs are specified by their assignments leading to estimators for distribution parameters. Repeated runs lead to different results, which we counter by choosing the most representative LDA run as prototype.
Maintained by Jonas Rieger. Last updated 2 years ago.
latent-dirichlet-allocation, lda, model-selection, modelselection, reliability, text-mining, textdata, topic-model, topic-models, topic-similarities, topicmodeling, topicmodelling
20.0 match 8 stars 4.44 score 23 scripts 1 dependent
gesistsa
oolong: Create Validation Tests for Automated Content Analysis
Intended to create standard human-in-the-loop validity tests for typical automated content analysis such as topic modeling and dictionary-based methods. This package offers a standard workflow with functions to prepare, administer and evaluate a human-in-the-loop validity test. This package provides functions for validating topic models using word intrusion, topic intrusion (Chang et al. 2009, <https://papers.nips.cc/paper/3700-reading-tea-leaves-how-humans-interpret-topic-models>) and word set intrusion (Ying et al. 2021) <doi:10.1017/pan.2021.33> tests. This package also provides functions for generating gold-standard data which are useful for validating dictionary-based methods. The default settings of all generated tests match those suggested in Chang et al. (2009) and Song et al. (2020) <doi:10.1080/10584609.2020.1723752>.
Maintained by Chung-hong Chan. Last updated 20 days ago.
textanalysis, topicmodeling, validation
10.0 match 54 stars 7.57 score 23 scripts
juliasilge
tidytext: Text Mining using 'dplyr', 'ggplot2', and Other Tidy Tools
Using tidy data principles can make many text mining tasks easier, more effective, and consistent with tools already in wide use. Much of the infrastructure needed for text mining with tidy data frames already exists in packages like 'dplyr', 'broom', 'tidyr', and 'ggplot2'. In this package, we provide functions and supporting data sets to allow conversion of text to and from tidy formats, and to switch seamlessly between tidy tools and existing text mining packages.
Maintained by Julia Silge. Last updated 11 months ago.
natural-language-processing, text-mining, tidy-data, tidyverse
1.8 match 1.2k stars 16.86 score 17k scripts 61 dependents
doug-friedman
topicdoc: Topic-Specific Diagnostics for LDA and CTM Topic Models
Calculates topic-specific diagnostics (e.g. mean token length, exclusivity) for Latent Dirichlet Allocation and Correlated Topic Models fit using the 'topicmodels' package. For more details, see Chapter 12 in Airoldi et al. (2014, ISBN:9781466504080), pp 262-272 Mimno et al. (2011, ISBN:9781937284114), and Bischof et al. (2014) <arXiv:1206.4631v1>.
Maintained by Doug Friedman. Last updated 3 years ago.
natural-language-processing, text-mining, topic-modeling, topic-modelling, topic-models
0.5 match 25 stars 5.48 score 24 scripts