R-universe search: textdata

Showing 4 of 4 results

emilhvitfeldt

textdata: Download and Load Various Text Datasets

Provides a framework to download, parse, and store text datasets on disk and load them when needed. Includes various sentiment lexicons and labeled text datasets for classification and analysis.

Maintained by Emil Hvitfeldt. Last updated 10 months ago.

text-datasets

56.3 match, 75 stars, 9.66 score, 1.4k scripts, 1 dependent

jonasrieger

ldaPrototype: Prototype of Multiple Latent Dirichlet Allocation Runs

Determines a prototype from a number of runs of Latent Dirichlet Allocation (LDA) by measuring their similarities with S-CLOP. The procedure selects the LDA run with the highest mean pairwise similarity to all other runs, where similarity is measured by S-CLOP (Similarity of multiple sets by Clustering with Local Pruning). LDA runs are specified by their assignments, which lead to estimators for the distribution parameters. Repeated runs lead to different results, which the package counters by choosing the most representative LDA run as the prototype.

Maintained by Jonas Rieger. Last updated 2 years ago.

latent-dirichlet-allocation lda model-selection modelselection reliability text-mining textdata topic-model topic-models topic-similarities topicmodeling topicmodelling

11.0 match, 8 stars, 4.44 score, 23 scripts, 1 dependent

jonasrieger

rollinglda: Construct Consistent Time Series from Textual Data

A rolling version of Latent Dirichlet Allocation (LDA); see Rieger et al. (2021) <doi:10.18653/v1/2021.findings-emnlp.201>. Using a sequential approach, it enables the construction of LDA-based topic time series that are consistent with previous states of the LDA models. After an initial model is fitted, updates can be computed efficiently, allowing real-time monitoring and the detection of events or structural breaks.

Maintained by Jonas Rieger. Last updated 1 year ago.

consistency latent-dirichlet-allocation lda model-selection reliability text-mining textdata topic-model topic-models topicmodel topicmodeling topicmodelling

11.0 match, 12 stars, 4.03 score, 18 scripts

cran

Xplortext: Statistical Analysis of Textual Data

Provides a set of functions devoted to multivariate exploratory statistics on textual data. Classical methods such as correspondence analysis and agglomerative hierarchical clustering are available. Chronologically constrained agglomerative hierarchical clustering, enriched with labelled-by-words trees, is also offered. Given a division of the corpus into parts, the characteristic words and documents of each part are identified. Functions from 'FactoMineR' are also easy to access, and two of them are particularly relevant in the textual domain. One of these, MFA(), handles multiple lexical tables, allowing applications such as analyzing multilingual corpora or simultaneously analyzing open-ended and closed questions in surveys. See <http://xplortext.unileon.es> for examples.

Maintained by Ramón Alvarez-Esteban. Last updated 4 months ago.

15.1 match, 2 stars, 1.60 score