R-universe search: lexicon

trinker

lexicon:Lexicons for Text Analysis

A collection of lexical hash tables, dictionaries, and word lists.

Maintained by Tyler Rinker. Last updated 3 years ago.

hash lexicon lookup names-frequent stopwords text-dictionaries text-mining

73.2 match 111 stars 8.80 score 224 scripts 25 dependents

juliasilge

tidytext:Text Mining using 'dplyr', 'ggplot2', and Other Tidy Tools

Using tidy data principles can make many text mining tasks easier, more effective, and consistent with tools already in wide use. Much of the infrastructure needed for text mining with tidy data frames already exists in packages like 'dplyr', 'broom', 'tidyr', and 'ggplot2'. In this package, we provide functions and supporting data sets to allow conversion of text to and from tidy formats, and to switch seamlessly between tidy tools and existing text mining packages.

Maintained by Julia Silge. Last updated 11 months ago.

natural-language-processing text-mining tidy-data tidyverse

10.6 match 1.2k stars 16.86 score 17k scripts 61 dependents

emilhvitfeldt

textdata:Download and Load Various Text Datasets

Provides a framework to download, parse, and store text datasets on the disk and load them when needed. Includes various sentiment lexicons and labeled text data sets for classification and analysis.

Maintained by Emil Hvitfeldt. Last updated 10 months ago.

text-datasets

10.7 match 75 stars 9.66 score 1.4k scripts 1 dependent

odelmarcelle

sentopics:Tools for Joint Sentiment and Topic Analysis of Textual Data

A framework that joins topic modeling and sentiment analysis of textual data. The package implements a fast Gibbs sampling estimation of Latent Dirichlet Allocation (Griffiths and Steyvers (2004) <doi:10.1073/pnas.0307752101>) and Joint Sentiment/Topic Model (Lin, He, Everson and Ruger (2012) <doi:10.1109/TKDE.2011.48>). It offers a variety of helpers and visualizations to analyze the result of topic modeling. The framework also allows enriching topic models with dates and externally computed sentiment measures. A flexible aggregation scheme enables the creation of time series of sentiment or topical proportions from the enriched topic models. Moreover, a novel method jointly aggregates topic proportions and sentiment measures to derive time series of topical sentiment.

Maintained by Olivier Delmarcelle. Last updated 2 months ago.

openblas cpp openmp

9.8 match 8 stars 5.38 score 5 scripts

evan-l-munson

saotd:Sentiment Analysis of Twitter Data

This analytic is an initial foray into sentiment analysis. It allows a user to access the Twitter API (once they create their own developer account), ingest tweets of interest, clean and tidy the data, optionally perform topic modeling, compute sentiment scores using the Bing Lexicon, and output visualizations.

Maintained by Evan Munson. Last updated 7 months ago.

bing-lexicon latent-dirichlet-allocation n-grams plot sentiment-analysis tidy-data topicanalysis tweets twitter-data

8.0 match 12 stars 6.33 score 118 scripts

sentometricsresearch

sentometrics:An Integrated Framework for Textual Sentiment Time Series Aggregation and Prediction

Optimized prediction based on textual sentiment, accounting for the intrinsic challenge that sentiment can be computed and pooled across texts and time in various ways. See Ardia et al. (2021) <doi:10.18637/jss.v099.i02>.

Maintained by Samuel Borms. Last updated 4 years ago.

nlp prediction sentiment-analysis text-mining time-series openblas cpp openmp

7.9 match 83 stars 6.09 score 49 scripts

jbgruber

atrrr:Wrapper for the 'AT' Protocol Behind 'Bluesky'

Wraps the 'AT' Protocol (Authenticated Transfer Protocol) behind 'Bluesky' <https://bsky.social>. Functions can be used for, among others, retrieving posts and followers from the network or posting content.

Maintained by Johannes B. Gruber. Last updated 6 days ago.

atproto bluesky

4.5 match 28 stars 8.07 score 38 scripts

mjockers

syuzhet:Extracts Sentiment and Sentiment-Derived Plot Arcs from Text

Extracts sentiment and sentiment-derived plot arcs from text using a variety of sentiment dictionaries conveniently packaged for consumption by R users. Implemented dictionaries include "syuzhet" (default) developed in the Nebraska Literary Lab, "afinn" developed by Finn Årup Nielsen, "bing" developed by Minqing Hu and Bing Liu, and "nrc" developed by Mohammad, Saif M. and Turney, Peter D. Applicable references are available in README.md and in the documentation for the "get_sentiment" function. The package also provides a hack for implementing Stanford's coreNLP sentiment parser. The package provides several methods for plot arc normalization.

Maintained by Matthew Jockers. Last updated 2 years ago.

2.7 match 336 stars 12.92 score 1.4k scripts 31 dependents

comp-cogneuro-lang

LexFindR:Find Related Items and Lexical Dimensions in a Lexicon

Implements code to identify lexical competitors in a given list of words. We include many of the standard competitor types used in spoken word recognition research, such as functions to find cohorts, neighbors, and rhymes, amongst many others. The package includes documentation for using a variety of lexicon files, including those with form codes made up of multiple letters (i.e., phoneme codes) and also basic orthographies. Importantly, the code makes use of multiple CPU cores and vectorization when possible, making it extremely fast and able to handle large lexicons. Additionally, the package contains documentation for users to easily write new functions, allowing researchers to examine other relationships within a lexicon. Preprint: <https://osf.io/preprints/psyarxiv/8dyru/>. Open access: <doi:10.3758/s13428-021-01667-6>. Citation: Li, Z., Crinnion, A.M. & Magnuson, J.S. (2021). <doi:10.3758/s13428-021-01667-6>.

Maintained by ZhaoBin Li. Last updated 9 months ago.

6.1 match 4 stars 4.30 score 5 scripts

sillasgonzaga

lexiconPT:Lexicons for Portuguese Text Analysis

Provides easy access to sentiment lexicons for those who want to analyze Portuguese texts. As of now, two Portuguese lexicons are available: 'SentiLex-PT02' and 'OpLexicon' (v2.1 and v3.0).

Maintained by Sillas Gonzaga. Last updated 7 years ago.

3.9 match 57 stars 5.12 score 46 scripts

samlestrade2

MoLE:Modeling Language Evolution

Model for simulating language evolution in terms of cultural evolution (Smith & Kirby (2008) <DOI:10.1098/rstb.2008.0145>; Deacon 1997). The focus is on the emergence of argument-marking systems (Dowty (1991) <DOI:10.1353/lan.1991.0021>, Van Valin 1999, Dryer 2002, Lestrade 2015a), i.e. noun marking (Aristar (1997) <DOI:10.1075/sl.21.2.04ari>, Lestrade (2010) <DOI:10.7282/T3ZG6R4S>), person indexing (Ariel 1999, Dahl (2000) <DOI:10.1075/fol.7.1.03dah>, Bhat 2004), and word order (Dryer 2013), but extensions are foreseen. Agents start out with a protolanguage (a language without grammar; Bickerton (1981) <DOI:10.17169/langsci.b91.109>, Jackendoff 2002, Arbib (2015) <DOI:10.1002/9781118346136.ch27>) and interact through language games (Steels 1997). Over time, grammatical constructions emerge that may or may not become obligatory (for which the tolerance principle is assumed; Yang 2016). Throughout the simulation, uniformitarianism of principles is assumed (Hopper (1987) <DOI:10.3765/bls.v13i0.1834>, Givon (1995) <DOI:10.1075/z.74>, Croft 2000, Saffran (2001) <DOI:10.1111/1467-8721.01243>, Heine & Kuteva 2007), in which maximal psychological validity is aimed at (Grice (1975) <DOI:10.1057/9780230005853_5>, Levelt 1989, Gaerdenfors 2000) and language representation is usage-based (Tomasello 2003, Bybee 2010). In Lestrade (2015b) <DOI:10.15496/publikation-8640>, Lestrade (2015c) <DOI:10.1075/avt.32.08les>, and Lestrade (2016) <DOI:10.17617/2.2248195>, which reported on the results of preliminary versions, this package was announced as WDWTW (for who does what to whom), but for reasons of pronunciation and generalization the title was changed.

Maintained by Sander Lestrade. Last updated 7 years ago.

6.3 match 2.46 score 58 scripts

cloudyr

aws.polly:Client for AWS Polly

A client for AWS Polly <http://aws.amazon.com/documentation/polly>, a speech synthesis service.

Maintained by Antoine Sachet. Last updated 3 years ago.

aws aws-polly cloudyr polly

3.0 match 23 stars 4.95 score 13 scripts

marce10

warbleR:Streamline Bioacoustic Analysis

Functions that facilitate the analysis of the structure of animal acoustic signals in 'R'. 'warbleR' makes use of the basic sound analysis tools from the packages 'tuneR' and 'seewave', and offers new tools to explore and quantify acoustic signal structure. The package allows users to organize and manipulate multiple sound files, create spectrograms of complete recordings or individual signals in different formats, run several measures of acoustic structure, and characterize different structural levels in acoustic signals.

Maintained by Marcelo Araya-Salas. Last updated 2 months ago.

animal-acoustic-signals audio-processing bioacoustics spectrogram streamline-analysis cpp

1.3 match 54 stars 11.01 score 270 scripts 4 dependents

lucymcgowan

tidycode:Analyze Lines of R Code the Tidy Way

Analyze lines of R code using tidy principles. This allows you to input lines of R code and output a data frame with one row per function included. Additionally, it facilitates code classification via included lexicons.

Maintained by Lucy D'Agostino McGowan. Last updated 4 years ago.

2.2 match 32 stars 6.54 score 36 scripts

sylvainloiseau

interlineaR:Importing Interlinearized Corpora and Dictionaries as Produced by Descriptive Linguistics Software

Interlinearized glossed texts (IGT) are used in descriptive linguistics for representing a morphological analysis of a text through a morpheme-by-morpheme gloss. 'InterlineaR' provides a set of functions that target several popular formats of IGT ('SIL Toolbox', 'EMELD XML') and that turn an IGT into a set of data frames following a relational model (the tables represent the different linguistic units: texts, sentences, words, morphemes). The same pieces of software ('SIL FLEX', 'SIL Toolbox') typically produce dictionaries of the morphemes used in the glosses. 'InterlineaR' provides a function for turning the LIFT XML dictionary format into a set of data frames following a relational model in order to represent the dictionary entries, the sense(s) attached to the entries, the example(s) attached to senses, etc.

Maintained by Sylvain Loiseau. Last updated 7 years ago.

corpus-linguistics descriptive-linguistics dictionaries interlinear-gloss

2.9 match 4 stars 4.60 score 9 scripts

mdecorde

textometry:Textual Data Analysis Package Used by the TXM Software

Statistical exploration of textual corpora using several methods from French 'Textometrie' (new name of 'Lexicometrie') and French 'Data Analysis' schools. It includes methods for exploring irregularity of distribution of lexicon features across text sets or parts of texts (Specificity analysis); multi-dimensional exploration (Factorial analysis), etc. Those methods are used in the TXM software.

Maintained by Matthieu Decorde. Last updated 3 years ago.

6.4 match 2.00 score 6 scripts

d-score

dscore:D-Score for Child Development

The D-score summarizes the child's performance on a set of milestones into a single number. The package implements four Rasch model keys to convert milestone scores into a D-score. It provides tools to calculate the D-score and its precision from the child's milestone scores, to convert the D-score into the Development-for-Age Z-score (DAZ) using age-conditional references, and to map milestone names into a generic 9-position item naming convention.

Maintained by Stef van Buuren. Last updated 7 months ago.

child-development d-score daz developmental-trajectories growth-charts rasch-model cpp

1.8 match 8 stars 6.89 score 40 scripts

polmine

RcppCWB:'Rcpp' Bindings for the 'Corpus Workbench' ('CWB')

'Rcpp' Bindings for the C code of the 'Corpus Workbench' ('CWB'), an indexing and query engine to efficiently analyze large corpora (<https://cwb.sourceforge.io>). 'RcppCWB' is licensed under the GNU GPL-3, in line with the GPL-3 license of the 'CWB' (<https://www.r-project.org/Licenses/GPL-3>). The 'CWB' relies on 'pcre2' (BSD license, see <http://www.pcre.org/licence.txt>) and 'GLib' (LGPL license, see <https://www.gnu.org/licenses/lgpl-3.0.en.html>). See the file LICENSE.note for further information. The package includes modified code of the 'rcqp' package (GPL-2, see <https://cran.r-project.org/package=rcqp>). The original work of the authors of the 'rcqp' package is acknowledged with great respect, and they are listed as authors of this package. To achieve cross-platform portability (including Windows), using 'Rcpp' for wrapper code is the approach used by 'RcppCWB'.

Maintained by Andreas Blaette. Last updated 1 year ago.

glib pcre2 cpp

2.0 match 2 stars 6.18 score 85 scripts 1 dependent

simmieyungie

texter:An Easy Text and Sentiment Analysis Library

Implement text and sentiment analysis with 'texter'. Generate sentiment scores on text data and also visualize sentiments. 'texter' allows you to quickly generate insights on your data. It includes support for lexicons such as 'NRC' and 'Bing'.

Maintained by Simi Kafaru. Last updated 3 years ago.

3.9 match 2 stars 3.00 score 4 scripts

teachinglab

tlShiny:Supplies essential functions to Teaching Lab dashboards

A bunch of random functions I use in developing dashboards. Needs to vastly reduce the number of dependencies at the moment.

Maintained by Duncan Gates. Last updated 12 days ago.

3.6 match 3.04 score

dernarr

ndl:Naive Discriminative Learning

Naive discriminative learning implements learning and classification models based on the Rescorla-Wagner equations and their equilibrium equations.

Maintained by Tino Sering. Last updated 7 years ago.

cpp

3.5 match 1 star 3.00 score 66 scripts

colinfay

proustr:Tools for Natural Language Processing in French

Tools for Natural Language Processing in French and texts from Marcel Proust's collection "A La Recherche Du Temps Perdu". The novels contained in this collection are "Du cote de chez Swann", "A l'ombre des jeunes filles en fleurs", "Le Cote de Guermantes", "Sodome et Gomorrhe I et II", "La Prisonniere", "Albertine disparue", and "Le Temps retrouve".

Maintained by Colin Fay. Last updated 6 years ago.

1.6 match 24 stars 6.10 score 104 scripts

koheiw

wordmap:Feature Extraction and Document Classification with Noisy Labels

Extract features and classify documents with noisy labels given by document metadata or keyword matching; see Watanabe & Zhou (2020) <doi:10.1177/0894439320907027>.

Maintained by Kohei Watanabe. Last updated 2 months ago.

1.9 match 2 stars 4.86 score 1 script

zahiernasrudin

malaytextr:Text Mining for Bahasa Malaysia

Designed to work with text written in Bahasa Malaysia. We provide functions and data sets that make working with Bahasa Malaysia text much easier. For word stemming in particular, we look up the Malay words in a dictionary and then remove the "extra suffix" as explained in Khan, Rehman Ullah, Fitri Suraya Mohamad, Muh Inam UlHaq, Shahren Ahmad Zadi Adruce, Philip Nuli Anding, Sajjad Nawaz Khan, and Abdulrazak Yahya Saleh Al-Hababi (2017) <https://ijrest.net/vol-4-issue-12.html>. This package includes a dictionary of Malay words that may be used to perform word stemming, a dataset of Malay stop words, a dataset of sentiment words, and a dataset of normalized words.

Maintained by Zahier Nasrudin. Last updated 2 years ago.

1.5 match 4 stars 4.30 score 4 scripts

cran

tmpm:Trauma Mortality Prediction Model

Trauma Mortality prediction for ICD-9, ICD-10, and AIS lexicons in long or wide format based on Dr. Alan Cook's tmpm mortality model.

Maintained by Cody Moore. Last updated 9 years ago.

2.2 match 1.00 score

rubens2005

criticalpath:An Implementation of the Critical Path Method

An R implementation of the Critical Path Method (CPM). CPM is a method used to estimate the minimum project duration and determine the amount of scheduling flexibility on the logical network paths within the schedule model. The flexibility is expressed in terms of early start, early finish, late start, late finish, total float, and free float. In addition, the package quantifies the complexity of a network diagram through the analysis of topological indicators. Finally, it allows activity durations to be changed to perform what-if scenario analysis. The package is based on the following references: for topological sorting and other graph operations, Csardi, G. & Nepusz, T. (2005) <https://www.researchgate.net/publication/221995787_The_Igraph_Software_Package_for_Complex_Network_Research>; for schedule concepts, Project Management Institute (2017) <https://www.pmi.org/pmbok-guide-standards/foundational/pmbok>; for standard terms, Project Management Institute (2017) <https://www.pmi.org/pmbok-guide-standards/lexicon>; for algorithms on Critical Path Method development, Vanhoucke, M. (2013) <doi:10.1007/978-3-642-40438-2> and Vanhoucke, M. (2014) <doi:10.1007/978-3-319-04331-9>; and, finally, for topological definitions, Vanhoucke, M. (2009) <doi:10.1007/978-1-4419-1014-1>.

Maintained by Rubens Jose Rosa. Last updated 3 years ago.

0.5 match 1 star 3.70 score 5 scripts

benwiseman

sentiment.ai:Simple Sentiment Analysis Using Deep Learning

Sentiment Analysis via deep learning and gradient boosting models with a lot of the underlying hassle taken care of to make the process as simple as possible. In addition to out-performing traditional, lexicon-based sentiment analysis (see <https://benwiseman.github.io/sentiment.ai/#Benchmarks>), it also allows the user to create embedding vectors for text which can be used in other analyses. GPU acceleration is supported on Windows and Linux.

Maintained by Ben Wiseman. Last updated 3 years ago.

0.5 match 2.70 score 7 scripts

krgitcode

vader:Valence Aware Dictionary and sEntiment Reasoner (VADER)

A lexicon and rule-based sentiment analysis tool that is specifically attuned to sentiments expressed in social media, and works well on texts from other domains. Hutto & Gilbert (2014) <https://www.aaai.org/ocs/index.php/ICWSM/ICWSM14/paper/view/8109/8122>.

Maintained by Katherine Roehrick. Last updated 5 years ago.

0.5 match 1 star 2.55 score 117 scripts 1 dependent