Showing 4 of 4 results
bnosac
udpipe: Tokenization, Parts of Speech Tagging, Lemmatization and Dependency Parsing with the 'UDPipe' 'NLP' Toolkit
This natural language processing toolkit provides language-agnostic 'tokenization', 'parts of speech tagging', 'lemmatization' and 'dependency parsing' of raw text. Next to text parsing, the package also allows you to train annotation models based on data of 'treebanks' in 'CoNLL-U' format as provided at <https://universaldependencies.org/format.html>. The techniques are explained in detail in the paper: 'Tokenizing, POS Tagging, Lemmatizing and Parsing UD 2.0 with UDPipe', available at <doi:10.18653/v1/K17-3009>. The toolkit also contains functionalities for commonly used data manipulations on texts which are enriched with the output of the parser. These include functionalities and algorithms for collocations, token co-occurrence, document term matrix handling, term frequency inverse document frequency calculations, information retrieval metrics (Okapi BM25), handling of multi-word expressions, keyword detection (Rapid Automatic Keyword Extraction, noun phrase extraction, syntactical patterns), sentiment scoring and semantic similarity analysis.
Maintained by Jan Wijffels. Last updated 2 years ago.
conll, dependency-parser, lemmatization, natural-language-processing, nlp, pos-tagging, r-pkg, rcpp, text-mining, tokenizer, udpipe, cpp
215 stars 11.83 score 1.2k scripts 9 dependents
paithiov909
gibasa: An Alternative 'Rcpp' Wrapper of 'MeCab'
A plain 'Rcpp' wrapper for 'MeCab' that can segment Chinese, Japanese, and Korean text into tokens. The main goal of this package is to provide an alternative to 'tidytext' using morphological analysis.
Maintained by Akiru Kato. Last updated 10 days ago.
15 stars 5.02 score 3 scripts
paithiov909
sudachir2: R Wrapper for 'sudachi.rs'
Offers bindings to 'sudachi.rs' <https://github.com/WorksApplications/sudachi.rs>, a Rust implementation of the 'Sudachi' Japanese morphological analyzer.
Maintained by Akiru Kato. Last updated 8 days ago.
3 stars 2.48 score 3 scripts
paithiov909
vibrrt: An R Wrapper for 'vibrato'
An R wrapper for 'vibrato' <https://github.com/daac-tools/vibrato>, a Rust reimplementation of 'MeCab' for fast tokenization.
Maintained by Akiru Kato. Last updated 28 days ago.
2.30 score 1 script