Showing 10 of total 10 results (show query)
dselivanov
text2vec:Modern Text Mining Framework for R
Fast and memory-friendly tools for text vectorization, topic modeling (LDA, LSA), word embeddings (GloVe), similarities. This package provides a source-agnostic streaming API, which allows researchers to perform analysis of collections of documents which are larger than available RAM. All core functions are parallelized to benefit from multicore machines.
Maintained by Dmitriy Selivanov. Last updated 7 months ago.
glovelatent-dirichlet-allocationnatural-language-processingtext-miningtopic-modelingvectorizationword-embeddingsword2veccpp
21.5 match 860 stars 13.48 score 1.3k scripts 23 dependentsprodriguezsosa
conText:'a la Carte' on Text (ConText) Embedding Regression
A fast, flexible and transparent framework to estimate context-specific word and short document embeddings using the 'a la carte' embeddings approach developed by Khodak et al. (2018) <arXiv:1805.05388> and evaluate hypotheses about covariate effects on embeddings using the regression framework developed by Rodriguez et al. (2021)<https://github.com/prodriguezsosa/EmbeddingRegression>.
Maintained by Pedro L. Rodriguez. Last updated 11 months ago.
7.1 match 104 stars 9.40 score 1.7k scriptspsychbruce
PsychWordVec:Word Embedding Research Framework for Psychological Science
An integrative toolbox of word embedding research that provides: (1) a collection of 'pre-trained' static word vectors in the '.RData' compressed format <https://psychbruce.github.io/WordVector_RData.pdf>; (2) a series of functions to process, analyze, and visualize word vectors; (3) a range of tests to examine conceptual associations, including the Word Embedding Association Test <doi:10.1126/science.aal4230> and the Relative Norm Distance <doi:10.1073/pnas.1720347115>, with permutation test of significance; (4) a set of training methods to locally train (static) word vectors from text corpora, including 'Word2Vec' <arXiv:1301.3781>, 'GloVe' <doi:10.3115/v1/D14-1162>, and 'FastText' <arXiv:1607.04606>; (5) a group of functions to download 'pre-trained' language models (e.g., 'GPT', 'BERT') and extract contextualized (dynamic) word vectors (based on the R package 'text').
Maintained by Han-Wu-Shuang Bao. Last updated 1 years ago.
bertcosine-similarityfasttextglovegptlanguage-modelnatural-language-processingnlppretrained-modelspsychologysemantic-analysistext-analysistext-miningtsneword-embeddingsword-vectorsword2vecopenjdk
13.2 match 22 stars 4.04 score 10 scriptsemilhvitfeldt
wordsalad:Provide Tools to Extract and Analyze Word Vectors
Provides access to various word embedding methods (GloVe, fasttext and word2vec) to extract word vectors using a unified framework to increase reproducibility and correctness.
Maintained by Emil Hvitfeldt. Last updated 4 years ago.
5.6 match 8 stars 3.60 score 9 scriptsgesistsa
sweater:Speedy Word Embedding Association Test and Extras Using R
Conduct various tests for evaluating implicit biases in word embeddings: Word Embedding Association Test (Caliskan et al., 2017), <doi:10.1126/science.aal4230>, Relative Norm Distance (Garg et al., 2018), <doi:10.1073/pnas.1720347115>, Mean Average Cosine Similarity (Mazini et al., 2019) <arXiv:1904.04047>, SemAxis (An et al., 2018) <arXiv:1806.05521>, Relative Negative Sentiment Bias (Sweeney & Najafian, 2019) <doi:10.18653/v1/P19-1162>, and Embedding Coherence Test (Dev & Phillips, 2019) <arXiv:1901.07656>.
Maintained by Chung-hong Chan. Last updated 1 months ago.
bias-detectiontextanalysiswordembeddingcpp
3.6 match 30 stars 4.80 score 14 scriptsbioc
DeProViR:A Deep-Learning Framework Based on Pre-trained Sequence Embeddings for Predicting Host-Viral Protein-Protein Interactions
Emerging infectious diseases, exemplified by the zoonotic COVID-19 pandemic caused by SARS-CoV-2, are grave global threats. Understanding protein-protein interactions (PPIs) between host and viral proteins is essential for therapeutic targets and insights into pathogen replication and immune evasion. While experimental methods like yeast two-hybrid screening and mass spectrometry provide valuable insights, they are hindered by experimental noise and costs, yielding incomplete interaction maps. Computational models, notably DeProViR, predict PPIs from amino acid sequences, incorporating semantic information with GloVe embeddings. DeProViR employs a Siamese neural network, integrating convolutional and Bi-LSTM networks to enhance accuracy. It overcomes the limitations of feature engineering, offering an efficient means to predict host-virus interactions, which holds promise for antiviral therapies and advancing our understanding of infectious diseases.
Maintained by Matineh Rahmatbakhsh. Last updated 5 months ago.
proteomicssystemsbiologynetworkinferenceneuralnetworknetwork
3.9 match 1 stars 3.00 score 1 scriptsmlampros
textTinyR:Text Processing for Small or Big Data Files
It offers functions for splitting, parsing, tokenizing and creating a vocabulary for big text data files. Moreover, it includes functions for building a document-term matrix and extracting information from those (term-associations, most frequent terms). It also embodies functions for calculating token statistics (collocations, look-up tables, string dissimilarities) and functions to work with sparse matrices. Lastly, it includes functions for Word Vector Representations (i.e. 'GloVe', 'fasttext') and incorporates functions for the calculation of (pairwise) text document dissimilarities. The source code is based on 'C++11' and exported in R through the 'Rcpp', 'RcppArmadillo' and 'BH' packages.
Maintained by Lampros Mouselimis. Last updated 1 years ago.
bhboostcpp11processingrcpprcpparmadillotextopenblascppopenmp
0.5 match 38 stars 7.64 score 244 scripts 1 dependentskoheiw
LSX:Semi-Supervised Algorithm for Document Scaling
A word embeddings-based semi-supervised model for document scaling Watanabe (2020) <doi:10.1080/19312458.2020.1832976>. LSS allows users to analyze large and complex corpora on arbitrary dimensions with seed words exploiting efficiency of word embeddings (SVD, Glove). It can generate word vectors on a users-provided corpus or incorporate a pre-trained word vectors.
Maintained by Kohei Watanabe. Last updated 2 months ago.
lsaquantedasentiment-analysistext-analysis
0.5 match 55 stars 6.09 score 14 scriptsjwijffels
topicmodels.etm:Topic Modelling in Embedding Spaces
Find topics in texts which are semantically embedded using techniques like word2vec or Glove. This topic modelling technique models each word with a categorical distribution whose natural parameter is the inner product between a word embedding and an embedding of its assigned topic. The techniques are explained in detail in the paper 'Topic Modeling in Embedding Spaces' by Adji B. Dieng, Francisco J. R. Ruiz, David M. Blei (2019), available at <arXiv:1907.04907>.
Maintained by Jan Wijffels. Last updated 3 years ago.
0.5 match 1 stars 2.90 score 32 scripts