Showing 5 of 5 results
bnosac
word2vec: Distributed Representations of Words
Learn vector representations of words with the continuous bag-of-words and skip-gram implementations of the 'word2vec' algorithm. The techniques are detailed in the paper "Distributed Representations of Words and Phrases and their Compositionality" by Mikolov et al. (2013), available at <arXiv:1310.4546>.
Maintained by Jan Wijffels. Last updated 1 year ago.
embeddings · natural-language-processing · word2vec · cpp
70 stars · 8.36 score · 227 scripts · 6 dependents
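A minimal usage sketch, assuming the toy character vector below stands in for a real corpus (argument names follow the package documentation; `min_count = 1` is only needed because the example corpus is tiny):

    library(word2vec)
    txt <- c("banks lend money", "rivers have banks", "banks invest money")
    # Train a skip-gram model; use type = "cbow" for continuous bag of words
    model <- word2vec(x = txt, type = "skip-gram", dim = 15, iter = 20, min_count = 1)
    # Nearest words to a query term
    predict(model, newdata = "banks", type = "nearest", top_n = 3)
    # The full embedding matrix, one row per vocabulary word
    embedding <- as.matrix(model)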
bnosac
ruimtehol: Learn Text 'Embeddings' with 'Starspace'
Wraps the 'StarSpace' library <https://github.com/facebookresearch/StarSpace>, allowing users to calculate word, sentence, article, document, webpage, link and entity 'embeddings'. Using these 'embeddings', you can perform text-based multi-label classification, find similarities between texts and categories, do collaborative-filtering-based as well as content-based recommendation, find relations between entities, calculate graph 'embeddings', and perform semi-supervised and multi-task learning on plain text. The techniques are explained in detail in the paper 'StarSpace: Embed All The Things!' by Wu et al. (2017), available at <arXiv:1709.03856>.
Maintained by Jan Wijffels. Last updated 1 year ago.
classification · embeddings · natural-language-processing · nlp · similarity · starspace · text-mining · cpp
101 stars · 6.65 score · 44 scripts
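A hedged sketch of the multi-label classification use case via the package's tagspace model; the texts and labels are invented toy data, and the StarSpace hyperparameter names (`dim`, `epoch`, `minCount`) are assumptions based on how the package passes options through to StarSpace:

    library(ruimtehol)
    texts  <- c("the cat sat on the mat",
                "stock prices fell sharply",
                "the dog chased the ball")
    labels <- c("animals", "finance", "animals")
    # Supervised embedding: texts are embedded close to their label embeddings
    model <- embed_tagspace(x = texts, y = labels, dim = 10, epoch = 20, minCount = 1)
    # Most likely labels for a new text
    predict(model, "a kitten played outside", k = 2)
    # Embedding of an arbitrary piece of text in the same space
    starspace_embedding(model, "stock markets")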
bnosac
doc2vec: Distributed Representations of Sentences, Documents and Topics
Learn vector representations of sentences, paragraphs or documents with the 'Paragraph Vector' algorithms, namely the distributed bag of words ('PV-DBOW') and the distributed memory ('PV-DM') models. The techniques in the package are detailed in the paper "Distributed Representations of Sentences and Documents" by Le and Mikolov (2014), available at <arXiv:1405.4053>. The package also implements top2vec, a technique to cluster documents based on these embeddings. Top2vec finds topic clusters in text documents by combining document embedding with density-based clustering: it embeds documents in the semantic space defined by the 'doc2vec' algorithm, maps these document embeddings to a lower-dimensional space with the 'Uniform Manifold Approximation and Projection' (UMAP) dimensionality-reduction algorithm, and finds dense areas in that space with 'Hierarchical Density-Based Clustering' (HDBSCAN). These dense areas are the topic clusters, each represented by a topic vector that aggregates the embeddings of the documents belonging to that cluster. Words close to a topic vector in the same semantic space are representative of that topic. More details can be found in the paper 'Top2Vec: Distributed Representations of Topics' by D. Angelov, available at <arXiv:2008.09470>.
Maintained by Jan Wijffels. Last updated 3 years ago.
doc2vec · embeddings · natural-language-processing · paragraph2vec · word2vec · cpp
48 stars · 5.74 score · 23 scripts
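A sketch of training paragraph vectors on a toy corpus, assuming the data.frame layout (`doc_id`, `text`) documented for the package; the final commented line hints at where top2vec would take over, which needs far more documents than shown here:

    library(doc2vec)
    docs <- data.frame(doc_id = c("doc1", "doc2", "doc3"),
                       text   = c("cats and dogs are pets",
                                  "stocks and bonds are investments",
                                  "dogs chase cats"),
                       stringsAsFactors = FALSE)
    # Train a PV-DBOW paragraph-vector model
    model <- paragraph2vec(x = docs, type = "PV-DBOW", dim = 10, iter = 20, min_count = 1)
    # Document and word embeddings live in the same semantic space
    emb_docs  <- as.matrix(model, which = "docs")
    emb_words <- as.matrix(model, which = "words")
    # Nearest documents to a new, tokenised sentence
    predict(model, newdata = list(s1 = c("pets", "like", "dogs")),
            type = "nearest", which = "sent2doc")
    # topics <- top2vec(model)  # UMAP + HDBSCAN clustering of the document embeddings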
ropensci
pkgmatch: Find R Packages Matching Either Descriptions or Other R Packages
Given a free-text description or an existing package, find the R packages that match it most closely.
Maintained by Mark Padgham. Last updated 2 days ago.
embeddings · llms · natural-language-processing · cpp
3 stars · 5.28 score
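A heavily hedged sketch: `pkgmatch_similar_pkgs()` is the function I believe the package exports as its main entry point, but treat the name and its acceptance of free text as assumptions to be checked against the package documentation:

    library(pkgmatch)
    # Rank R packages by similarity to a free-text description; per the package
    # description, an existing package can be supplied instead of text
    pkgmatch_similar_pkgs("train word and document embeddings and cluster them")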
paithiov909
apportita: Utility for Handling 'magnitude' Word Embeddings
A partial R port of 'magnitude', a fast, simple utility library for handling vector embeddings. The main goal of this package is to enable access to a user's local magnitude data store.
Maintained by Akiru Kato. Last updated 2 months ago.
1 star · 1.70 score · 4 scripts