Showing 10 of 10 results
trinker
textstem: Tools for Stemming and Lemmatizing Text
Tools that stem and lemmatize text. Stemming is a process that removes endings such as affixes. Lemmatization is the process of grouping inflected forms together as a single base form.
Maintained by Tyler Rinker. Last updated 7 years ago.
lemmatization, stemming, text-mining
19.8 match 45 stars 8.71 score 888 scripts 11 dependents
bnosac
udpipe: Tokenization, Parts of Speech Tagging, Lemmatization and Dependency Parsing with the 'UDPipe' 'NLP' Toolkit
This natural language processing toolkit provides language-agnostic 'tokenization', 'parts of speech tagging', 'lemmatization' and 'dependency parsing' of raw text. Next to text parsing, the package also allows you to train annotation models based on data of 'treebanks' in 'CoNLL-U' format as provided at <https://universaldependencies.org/format.html>. The techniques are explained in detail in the paper: 'Tokenizing, POS Tagging, Lemmatizing and Parsing UD 2.0 with UDPipe', available at <doi:10.18653/v1/K17-3009>. The toolkit also contains functionalities for commonly used data manipulations on texts which are enriched with the output of the parser. These include functionalities and algorithms for collocations, token co-occurrence, document term matrix handling, term frequency inverse document frequency calculations, information retrieval metrics (Okapi BM25), handling of multi-word expressions, keyword detection (Rapid Automatic Keyword Extraction, noun phrase extraction, syntactical patterns), sentiment scoring, and semantic similarity analysis.
Maintained by Jan Wijffels. Last updated 2 years ago.
conll, dependency-parser, lemmatization, natural-language-processing, nlp, pos-tagging, r-pkg, rcpp, text-mining, tokenizer, udpipe, cpp
13.5 match 215 stars 11.83 score 1.2k scripts 9 dependents
trinker
lexicon: Lexicons for Text Analysis
A collection of lexical hash tables, dictionaries, and word lists.
Maintained by Tyler Rinker. Last updated 3 years ago.
hash, lexicon, lookup, names-frequent, stopwords, text-dictionaries, text-mining
4.5 match 111 stars 8.80 score 224 scripts 25 dependents
musajajorge
CINE: Classification International Normalized of Education
Function using lemmatization to classify educational programs according to the CINE (Classification International Normalized of Education) for Peru.
Maintained by Jorge L. C. Musaja. Last updated 2 years ago.
12.3 match 2.70 score
shusei-e
RcppJagger: An R Wrapper for Jagger
A wrapper for Jagger, a morphological analyzer proposed in Yoshinaga (2023) <arXiv:2305.19045>. Jagger uses patterns derived from morphological dictionaries and training data sets and applies them from the beginning of the input. This simultaneous and deterministic process enables it to effectively perform tokenization, POS tagging, and lemmatization.
Maintained by Shusei Eshima. Last updated 2 years ago.
japanese-nlp, morphological-analyser, nlp, part-of-speech-tagger, text-analysis, cpp
7.1 match 3 stars 3.18 score 3 scripts
tidymodels
textrecipes: Extra 'Recipes' for Text Processing
Converting text to numerical features requires specifically created procedures, which are implemented as steps according to the 'recipes' package. These steps allow for tokenization, filtering, counting (tf and tfidf) and feature hashing.
Maintained by Emil Hvitfeldt. Last updated 8 days ago.
2.0 match 160 stars 10.87 score 964 scripts 1 dependent
massimoaria
tall: Text Analysis for All
An R 'shiny' app designed for diverse text analysis tasks, offering a wide range of methodologies tailored to Natural Language Processing (NLP) needs. It is a versatile, general-purpose tool for analyzing textual data. 'tall' features a comprehensive workflow, including data cleaning, preprocessing, statistical analysis, and visualization, all integrated for effective text analysis.
Maintained by Massimo Aria. Last updated 3 days ago.
r-shiny, text-analysis-and-sentiment-analysis, text-classification, text-mining, textual-analysis, cpp
3.4 match 14 stars 5.12 score
kidoishi
MadanText: Persian Textmining Tool for Frequency Analysis, Statistical Analysis, and Word Clouds
MadanText is an open-source software designed specifically for text mining in the Persian language. It allows users to examine word frequencies, download data for analysis, and generate word clouds. This tool is particularly useful for researchers and analysts working with Persian language data.
Maintained by Kido Ishikawa. Last updated 1 year ago.
2.3 match 2.70 score
kidoishi
MadanTextNetwork: Persian Textmining Tool for Co-Occurrence_Network
MadanText_co-occurrence_network is an open-source software designed specifically for text mining in the Persian language. It adds co-occurrence network functionality to MadanText. The input file is in Excel format instead of the text format.
Maintained by Kido Ishikawa. Last updated 1 year ago.
2.3 match 2.70 score
imbi-heidelberg
MetaNLP: Natural Language Processing for Meta Analysis
Given a CSV file with titles and abstracts, the package creates a document-term matrix that is lemmatized and stemmed and can directly be used to train machine learning methods for automatic title-abstract screening in the preparation of a meta analysis.
Maintained by Maximilian Pilz. Last updated 4 days ago.
0.5 match 3 stars 4.32 score 1 script