Showing 4 of 4 results
emilhvitfeldt
textdata: Download and Load Various Text Datasets
Provides a framework to download, parse, and store text datasets on the disk and load them when needed. Includes various sentiment lexicons and labeled text data sets for classification and analysis.
Maintained by Emil Hvitfeldt. Last updated 10 months ago.
56.3 match 75 stars 9.66 score 1.4k scripts 1 dependent
jonasrieger
ldaPrototype: Prototype of Multiple Latent Dirichlet Allocation Runs
Determine a prototype from a number of runs of Latent Dirichlet Allocation (LDA), measuring their similarities with S-CLOP (Similarity of multiple sets by Clustering with Local Pruning). The procedure selects the LDA run with the highest mean pairwise similarity, as measured by S-CLOP, to all other runs. Each LDA run is specified by its assignments, which lead to estimators for the distribution parameters. Repeated runs lead to different results, which is addressed by choosing the most representative LDA run as the prototype.
Maintained by Jonas Rieger. Last updated 2 years ago.
latent-dirichlet-allocation lda model-selection modelselection reliability text-mining textdata topic-model topic-models topic-similarities topicmodeling topicmodelling
11.0 match 8 stars 4.44 score 23 scripts 1 dependent
jonasrieger
rollinglda: Construct Consistent Time Series from Textual Data
A rolling version of Latent Dirichlet Allocation; see Rieger et al. (2021) <doi:10.18653/v1/2021.findings-emnlp.201>. Using a sequential approach, it enables the construction of LDA-based time series of topics that are consistent with previous states of the LDA model. After an initial modeling step, updates can be computed efficiently, allowing for real-time monitoring and detection of events or structural breaks.
Maintained by Jonas Rieger. Last updated 1 year ago.
consistency latent-dirichlet-allocation lda model-selection reliability text-mining textdata topic-model topic-models topicmodel topicmodeling topicmodelling
11.0 match 12 stars 4.03 score 18 scripts
cran
Xplortext: Statistical Analysis of Textual Data
Provides a set of functions devoted to multivariate exploratory statistics on textual data. Classical methods such as correspondence analysis and agglomerative hierarchical clustering are available. Chronologically constrained agglomerative hierarchical clustering, enriched with labelled-by-words trees, is also offered. Given a division of the corpus into parts, their characteristic words and documents are identified. Furthermore, access to 'FactoMineR' functions is straightforward. Two of them are relevant in the textual domain. MFA() addresses multiple lexical tables, allowing applications such as dealing with multilingual corpora as well as simultaneously analyzing both open-ended and closed questions in surveys. See <http://xplortext.unileon.es> for examples.
Maintained by Ramón Alvarez-Esteban. Last updated 4 months ago.
15.1 match 2 stars 1.60 score