Showing 13 of 13 results
junhewk
RcppMeCab: 'rcpp' Wrapper for 'mecab' Library
R package based on 'Rcpp' for 'MeCab': Yet Another Part-of-Speech and Morphological Analyzer. The purpose of this package is to provide a seamless development and analysis environment for CJK texts. The package uses parallel programming to provide the highly efficient text preprocessing function 'posParallel()'. For installation, please refer to the README.md file.
Maintained by Junhewk Kim. Last updated 7 months ago.
14.8 match 25 stars 5.30 score 40 scripts
shusei-e
RcppJagger: An R Wrapper for Jagger
A wrapper for Jagger, a morphological analyzer proposed in Yoshinaga (2023) <arXiv:2305.19045>. Jagger uses patterns derived from morphological dictionaries and training data sets and applies them from the beginning of the input. This simultaneous and deterministic process enables it to effectively perform tokenization, POS tagging, and lemmatization.
Maintained by Shusei Eshima. Last updated 2 years ago.
japanese-nlp, morphological-analyser, nlp, part-of-speech-tagger, text-analysis, cpp
10.2 match 3 stars 3.18 score 3 scripts
mjockers
syuzhet: Extracts Sentiment and Sentiment-Derived Plot Arcs from Text
Extracts sentiment and sentiment-derived plot arcs from text using a variety of sentiment dictionaries conveniently packaged for consumption by R users. Implemented dictionaries include "syuzhet" (default) developed in the Nebraska Literary Lab, "afinn" developed by Finn Årup Nielsen, "bing" developed by Minqing Hu and Bing Liu, and "nrc" developed by Mohammad, Saif M. and Turney, Peter D. Applicable references are available in README.md and in the documentation for the "get_sentiment" function. The package also provides a hack for implementing Stanford's coreNLP sentiment parser, as well as several methods for plot arc normalization.
Maintained by Matthew Jockers. Last updated 2 years ago.
1.9 match 336 stars 12.92 score 1.4k scripts 31 dependents
qinwf
jiebaR: Chinese Text Segmentation
Chinese text segmentation, keyword extraction, and speech tagging for R.
Maintained by Qin Wenfeng. Last updated 5 years ago.
chinese, chinese-text-segmentation, cpp, jieba, jieba, lexical-analysis, nlp, cpp
2.3 match 352 stars 10.46 score 456 scripts 6 dependents
bnosac
udpipe: Tokenization, Parts of Speech Tagging, Lemmatization and Dependency Parsing with the 'UDPipe' 'NLP' Toolkit
This natural language processing toolkit provides language-agnostic 'tokenization', 'parts of speech tagging', 'lemmatization' and 'dependency parsing' of raw text. Next to text parsing, the package also allows you to train annotation models based on data of 'treebanks' in 'CoNLL-U' format as provided at <https://universaldependencies.org/format.html>. The techniques are explained in detail in the paper: 'Tokenizing, POS Tagging, Lemmatizing and Parsing UD 2.0 with UDPipe', available at <doi:10.18653/v1/K17-3009>. The toolkit also contains functionalities for commonly used data manipulations on texts which are enriched with the output of the parser. These include functionalities and algorithms for collocations, token co-occurrence, document term matrix handling, term frequency inverse document frequency calculations, information retrieval metrics (Okapi BM25), handling of multi-word expressions, keyword detection (Rapid Automatic Keyword Extraction, noun phrase extraction, syntactical patterns), sentiment scoring, and semantic similarity analysis.
Maintained by Jan Wijffels. Last updated 2 years ago.
conll, dependency-parser, lemmatization, natural-language-processing, nlp, pos-tagging, r-pkg, rcpp, text-mining, tokenizer, udpipe, cpp
1.5 match 215 stars 11.83 score 1.2k scripts 9 dependents
martigso
stortingscrape: Access Data from the Norwegian Parliament API
Functions for retrieving general and specific data from the Norwegian Parliament, through the Norwegian Parliament API at <https://data.stortinget.no>.
Maintained by Martin Søyland. Last updated 24 days ago.
1.7 match 11 stars 6.02 score 24 scripts
openvolley
peranavolley: Perana Sports Volleyball Files
Basic functions for reading and working with Perana Sports volleyball scouting files.
Maintained by Ben Raymond. Last updated 10 months ago.
3.4 match 2.95 score 1 script 6 dependents
paithiov909
sudachir2: R Wrapper for 'sudachi.rs'
Offers bindings to 'sudachi.rs' <https://github.com/WorksApplications/sudachi.rs>, a Rust implementation of the 'Sudachi' Japanese morphological analyzer.
Maintained by Akiru Kato. Last updated 11 days ago.
3.8 match 3 stars 2.48 score 3 scripts
curso-r
scryr: An Interface to the 'Scryfall' API
A simple, light, and robust interface between R and the 'Scryfall' card data API <https://scryfall.com/docs/api>.
Maintained by Caio Lente. Last updated 3 years ago.
1.5 match 18 stars 6.11 score 18 scripts
paithiov909
vibrrt: An R Wrapper for 'vibrato'
An R wrapper for 'vibrato' <https://github.com/daac-tools/vibrato>, a Rust reimplementation of 'MeCab' for fast tokenization.
Maintained by Akiru Kato. Last updated 1 month ago.
3.9 match 2.30 score 1 script
dustinstoltz
text2map: R Tools for Text Matrices, Embeddings, and Networks
This is a collection of functions optimized for working with various kinds of text matrices. Focusing on the text matrix as the primary object - represented either as a base R dense matrix or a 'Matrix' package sparse matrix - allows for a consistent and intuitive interface that stays close to the underlying mathematical foundation of computational text analysis. In particular, the package includes functions for working with word embeddings, text networks, and document-term matrices. Methods developed in Stoltz and Taylor (2019) <doi:10.1007/s42001-019-00048-6>, Taylor and Stoltz (2020) <doi:10.1007/s42001-020-00075-8>, Taylor and Stoltz (2020) <doi:10.15195/v7.a23>, and Stoltz and Taylor (2021) <doi:10.1016/j.poetic.2021.101567>.
Maintained by Dustin Stoltz. Last updated 4 months ago.
2.0 match 3.82 score 22 scripts
idslme
IDSL.FSA: Fragmentation Spectra Analysis (FSA)
The 'IDSL.FSA' package was designed to annotate standard .msp (mass spectra format) and .mgf (Mascot generic format) files using mass spectral entropy similarity, dot product (cosine) similarity, and normalized Euclidean mass error (NEME) followed by intelligent pre-filtering steps for rapid spectra searches. 'IDSL.FSA' also provides a number of modules to convert and manipulate .msp and .mgf files. The 'IDSL.FSA' workflow was integrated in the 'IDSL.CSA' and 'IDSL.NPA' packages introduced in <doi:10.1021/acs.analchem.3c00376>.
Maintained by Dinesh Barupal. Last updated 8 months ago.
fragmentation-spectra, mass-spectrometry, massbank, mgf, mgf-parser, msp, msp-parser, spectral-entropy
1.8 match 1 star 3.48 score 2 dependents
schweflo
NLPclient: Stanford 'CoreNLP' Annotation Client
Stanford 'CoreNLP' annotation client. Stanford 'CoreNLP' <https://stanfordnlp.github.io/CoreNLP/index.html> integrates all NLP tools from the Stanford Natural Language Processing Group, including a part-of-speech (POS) tagger, a named entity recognizer (NER), a parser, and a coreference resolution system, and provides model files for the analysis of English. More information can be found in the README.
Maintained by Florian Schwendinger. Last updated 5 years ago.
0.5 match 1.70 score