R-universe search: tagger

Showing 13 of 13 results

junhewk

RcppMeCab:'rcpp' Wrapper for 'mecab' Library

R package based on 'Rcpp' for 'MeCab': Yet Another Part-of-Speech and Morphological Analyzer. The package aims to provide a seamless environment for developing with and analyzing CJK texts. It uses parallel programming to provide a highly efficient text-preprocessing function, 'posParallel()'. For installation, please refer to the README.md file.

Maintained by Junhewk Kim. Last updated 7 months ago.

cjk nlp pos rcpp tagger mecab cpp

14.8 match 25 stars 5.30 score 40 scripts

shusei-e

RcppJagger:An R Wrapper for Jagger

A wrapper for Jagger, a morphological analyzer proposed in Yoshinaga (2023) <arXiv:2305.19045>. Jagger uses patterns derived from morphological dictionaries and training data sets and applies them from the beginning of the input. This simultaneous and deterministic process enables it to effectively perform tokenization, POS tagging, and lemmatization.

Maintained by Shusei Eshima. Last updated 2 years ago.

japanese-nlp morphological-analyser nlp part-of-speech-tagger text-analysis cpp

10.2 match 3 stars 3.18 score 3 scripts

mjockers

syuzhet:Extracts Sentiment and Sentiment-Derived Plot Arcs from Text

Extracts sentiment and sentiment-derived plot arcs from text using a variety of sentiment dictionaries conveniently packaged for consumption by R users. Implemented dictionaries include "syuzhet" (default) developed in the Nebraska Literary Lab, "afinn" developed by Finn Årup Nielsen, "bing" developed by Minqing Hu and Bing Liu, and "nrc" developed by Mohammad, Saif M. and Turney, Peter D. Applicable references are available in README.md and in the documentation for the "get_sentiment" function. The package also provides a hack for implementing Stanford's coreNLP sentiment parser. The package provides several methods for plot arc normalization.

Maintained by Matthew Jockers. Last updated 2 years ago.

1.9 match 336 stars 12.92 score 1.4k scripts 31 dependents

qinwf

jiebaR:Chinese Text Segmentation

Chinese text segmentation, keyword extraction, and part-of-speech tagging for R.

Maintained by Qin Wenfeng. Last updated 5 years ago.

chinese chinese-text-segmentation cppjieba jieba lexical-analysis nlp cpp

2.3 match 352 stars 10.46 score 456 scripts 6 dependents

bnosac

udpipe:Tokenization, Parts of Speech Tagging, Lemmatization and Dependency Parsing with the 'UDPipe' 'NLP' Toolkit

This natural language processing toolkit provides language-agnostic 'tokenization', 'parts of speech tagging', 'lemmatization' and 'dependency parsing' of raw text. Next to text parsing, the package also allows you to train annotation models based on data of 'treebanks' in 'CoNLL-U' format as provided at <https://universaldependencies.org/format.html>. The techniques are explained in detail in the paper: 'Tokenizing, POS Tagging, Lemmatizing and Parsing UD 2.0 with UDPipe', available at <doi:10.18653/v1/K17-3009>. The toolkit also contains functionalities for commonly used data manipulations on texts which are enriched with the output of the parser, namely functionalities and algorithms for collocations, token co-occurrence, document term matrix handling, term frequency inverse document frequency calculations, information retrieval metrics (Okapi BM25), handling of multi-word expressions, keyword detection (Rapid Automatic Keyword Extraction, noun phrase extraction, syntactical patterns), sentiment scoring and semantic similarity analysis.

Maintained by Jan Wijffels. Last updated 2 years ago.

conll dependency-parser lemmatization natural-language-processing nlp pos-tagging r-pkg rcpp text-mining tokenizer udpipe cpp

1.5 match 215 stars 11.83 score 1.2k scripts 9 dependents

martigso

stortingscrape:Access Data from the Norwegian Parliament API

Functions for retrieving general and specific data from the Norwegian Parliament, through the Norwegian Parliament API at <https://data.stortinget.no>.

Maintained by Martin Søyland. Last updated 24 days ago.

scraping stortinget

1.7 match 11 stars 6.02 score 24 scripts

openvolley

peranavolley:Perana Sports Volleyball Files

Basic functions for reading and working with Perana Sports volleyball scouting files.

Maintained by Ben Raymond. Last updated 10 months ago.

3.4 match 2.95 score 1 script 6 dependents

paithiov909

sudachir2:R Wrapper for 'sudachi.rs'

Offers bindings to 'sudachi.rs' <https://github.com/WorksApplications/sudachi.rs>, a Rust implementation of the 'Sudachi' Japanese morphological analyzer.

Maintained by Akiru Kato. Last updated 11 days ago.

pos-tagging rust cargo

3.8 match 3 stars 2.48 score 3 scripts

curso-r

scryr:An Interface to the 'Scryfall' API

A simple, light, and robust interface between R and the 'Scryfall' card data API <https://scryfall.com/docs/api>.

Maintained by Caio Lente. Last updated 3 years ago.

api mtg

1.5 match 18 stars 6.11 score 18 scripts

paithiov909

vibrrt:An R Wrapper for 'vibrato'

An R wrapper for 'vibrato' <https://github.com/daac-tools/vibrato>, a Rust reimplementation of 'MeCab' for fast tokenization.

Maintained by Akiru Kato. Last updated 1 month ago.

pos-tagging rust cargo

3.9 match 2.30 score 1 script

dustinstoltz

text2map:R Tools for Text Matrices, Embeddings, and Networks

This is a collection of functions optimized for working with various kinds of text matrices. Focusing on the text matrix as the primary object - represented either as a base R dense matrix or a 'Matrix' package sparse matrix - allows for a consistent and intuitive interface that stays close to the underlying mathematical foundation of computational text analysis. In particular, the package includes functions for working with word embeddings, text networks, and document-term matrices. Methods developed in Stoltz and Taylor (2019) <doi:10.1007/s42001-019-00048-6>, Taylor and Stoltz (2020) <doi:10.1007/s42001-020-00075-8>, Taylor and Stoltz (2020) <doi:10.15195/v7.a23>, and Stoltz and Taylor (2021) <doi:10.1016/j.poetic.2021.101567>.

Maintained by Dustin Stoltz. Last updated 4 months ago.

2.0 match 3.82 score 22 scripts

idslme

IDSL.FSA:Fragmentation Spectra Analysis (FSA)

The 'IDSL.FSA' package was designed to annotate standard .msp (mass spectra format) and .mgf (Mascot generic format) files using mass spectral entropy similarity, dot product (cosine) similarity, and normalized Euclidean mass error (NEME) followed by intelligent pre-filtering steps for rapid spectra searches. 'IDSL.FSA' also provides a number of modules to convert and manipulate .msp and .mgf files. The 'IDSL.FSA' workflow was integrated in the 'IDSL.CSA' and 'IDSL.NPA' packages introduced in <doi:10.1021/acs.analchem.3c00376>.

Maintained by Dinesh Barupal. Last updated 8 months ago.

fragmentation-spectra mass-spectrometry massbank mgf mgf-parser msp msp-parser spectral-entropy

1.8 match 1 star 3.48 score 2 dependents

schweflo

NLPclient:Stanford 'CoreNLP' Annotation Client

Stanford 'CoreNLP' annotation client. Stanford 'CoreNLP' <https://stanfordnlp.github.io/CoreNLP/index.html> integrates all NLP tools from the Stanford Natural Language Processing Group, including a part-of-speech (POS) tagger, a named entity recognizer (NER), a parser, and a coreference resolution system, and provides model files for the analysis of English. More information can be found in the README.

Maintained by Florian Schwendinger. Last updated 5 years ago.

0.5 match 1.70 score