lexicon:Lexicons for Text Analysis
A collection of lexical hash tables, dictionaries, and word lists.
Maintained by Tyler Rinker. Last updated 3 years ago.
tidytext:Text Mining using 'dplyr', 'ggplot2', and Other Tidy Tools
Using tidy data principles can make many text mining tasks easier, more effective, and consistent with tools already in wide use. Much of the infrastructure needed for text mining with tidy data frames already exists in packages like 'dplyr', 'broom', 'tidyr', and 'ggplot2'. In this package, we provide functions and supporting data sets to allow conversion of text to and from tidy formats, and to switch seamlessly between tidy tools and existing text mining packages.
Maintained by Julia Silge. Last updated 11 months ago.
textdata:Download and Load Various Text Datasets
Provides a framework to download, parse, and store text datasets on the disk and load them when needed. Includes various sentiment lexicons and labeled text data sets for classification and analysis.
Maintained by Emil Hvitfeldt. Last updated 10 months ago.
sentopics:Tools for Joint Sentiment and Topic Analysis of Textual Data
A framework that joins topic modeling and sentiment analysis of textual data. The package implements a fast Gibbs sampling estimation of Latent Dirichlet Allocation (Griffiths and Steyvers (2004) <doi:10.1073/pnas.0307752101>) and Joint Sentiment/Topic Model (Lin, He, Everson and Ruger (2012) <doi:10.1109/TKDE.2011.48>). It offers a variety of helpers and visualizations to analyze the result of topic modeling. The framework also allows enriching topic models with dates and externally computed sentiment measures. A flexible aggregation scheme enables the creation of time series of sentiment or topical proportions from the enriched topic models. Moreover, a novel method jointly aggregates topic proportions and sentiment measures to derive time series of topical sentiment.
Maintained by Olivier Delmarcelle. Last updated 2 months ago.
saotd:Sentiment Analysis of Twitter Data
This analytic is an initial foray into sentiment analysis. It allows a user to access the Twitter API (once they create their own developer account), ingest tweets of interest, clean and tidy the data, perform topic modeling if desired, compute sentiment scores using the Bing lexicon, and output visualizations.
Maintained by Evan Munson. Last updated 7 months ago.
sentometrics:An Integrated Framework for Textual Sentiment Time Series Aggregation and Prediction
Optimized prediction based on textual sentiment, accounting for the intrinsic challenge that sentiment can be computed and pooled across texts and time in various ways. See Ardia et al. (2021) <doi:10.18637/jss.v099.i02>.
Maintained by Samuel Borms. Last updated 4 years ago.
atrrr:Wrapper for the 'AT' Protocol Behind 'Bluesky'
Wraps the 'AT' Protocol (Authenticated Transfer Protocol) behind 'Bluesky'. Functions can be used for, among other things, retrieving posts and followers from the network or posting content.
Maintained by Johannes B. Gruber. Last updated 6 days ago.
syuzhet:Extracts Sentiment and Sentiment-Derived Plot Arcs from Text
Extracts sentiment and sentiment-derived plot arcs from text using a variety of sentiment dictionaries conveniently packaged for consumption by R users. Implemented dictionaries include "syuzhet" (default) developed in the Nebraska Literary Lab, "afinn" developed by Finn Årup Nielsen, "bing" developed by Minqing Hu and Bing Liu, and "nrc" developed by Saif M. Mohammad and Peter D. Turney. Applicable references are available in the documentation for the "get_sentiment" function. The package also provides a hack for implementing Stanford's coreNLP sentiment parser and several methods for plot arc normalization.
Maintained by Matthew Jockers. Last updated 2 years ago.
lexiconPT:Lexicons for Portuguese Text Analysis
Provides easy access to sentiment lexicons for those who want to analyze Portuguese texts. As of now, two Portuguese lexicons are available: 'SentiLex-PT02' and 'OpLexicon' (v2.1 and v3.0).
Maintained by Sillas Gonzaga. Last updated 7 years ago.
aws.polly:Client for AWS Polly
A client for AWS Polly, a speech synthesis service.
Maintained by Antoine Sachet. Last updated 3 years ago.
warbleR:Streamline Bioacoustic Analysis
Functions that facilitate the analysis of the structure of animal acoustic signals in 'R'. 'warbleR' makes use of the basic sound analysis tools from the packages 'tuneR' and 'seewave', and offers new tools to explore and quantify acoustic signal structure. The package allows users to organize and manipulate multiple sound files, create spectrograms of complete recordings or individual signals in different formats, run several measures of acoustic structure, and characterize different structural levels in acoustic signals.
Maintained by Marcelo Araya-Salas. Last updated 2 months ago.
tidycode:Analyze Lines of R Code the Tidy Way
Analyze lines of R code using tidy principles. This allows you to input lines of R code and output a data frame with one row per function included. Additionally, it facilitates code classification via included lexicons.
Maintained by Lucy D'Agostino McGowan. Last updated 4 years ago.
interlineaR:Importing Interlinearized Corpora and Dictionaries as Produced by Descriptive Linguistics Software
Interlinearized glossed texts (IGT) are used in descriptive linguistics to represent a morphological analysis of a text through a morpheme-by-morpheme gloss. 'InterlineaR' provides a set of functions that target several popular IGT formats ('SIL Toolbox', 'EMELD XML') and turn an IGT into a set of data frames following a relational model (the tables represent the different linguistic units: texts, sentences, words, morphemes). The same pieces of software ('SIL FLEX', 'SIL Toolbox') typically produce dictionaries of the morphemes used in the glosses. 'InterlineaR' also provides a function for turning the LIFT XML dictionary format into a set of data frames following a relational model, representing the dictionary entries, the sense(s) attached to the entries, the example(s) attached to senses, etc.
Maintained by Sylvain Loiseau. Last updated 7 years ago.
textometry:Textual Data Analysis Package Used by the TXM Software
Statistical exploration of textual corpora using several methods from French 'Textometrie' (new name of 'Lexicometrie') and French 'Data Analysis' schools. It includes methods for exploring irregularity of distribution of lexicon features across text sets or parts of texts (Specificity analysis); multi-dimensional exploration (Factorial analysis), etc. Those methods are used in the TXM software.
Maintained by Matthieu Decorde. Last updated 3 years ago.
dscore:D-Score for Child Development
The D-score summarizes the child's performance on a set of milestones into a single number. The package implements four Rasch model keys to convert milestone scores into a D-score. It provides tools to calculate the D-score and its precision from the child's milestone scores, to convert the D-score into the Development-for-Age Z-score (DAZ) using age-conditional references, and to map milestone names into a generic 9-position item naming convention.
Maintained by Stef van Buuren. Last updated 7 months ago.
RcppCWB:'Rcpp' Bindings for the 'Corpus Workbench' ('CWB')
'Rcpp' bindings for the C code of the 'Corpus Workbench' ('CWB'), an indexing and query engine to efficiently analyze large corpora. 'RcppCWB' is licensed under the GNU GPL-3, in line with the GPL-3 license of the 'CWB'. The 'CWB' relies on 'pcre2' (BSD license) and 'GLib' (LGPL license). See the file LICENSE.note for further information. The package includes modified code of the 'rcqp' package (GPL-2). The original work of the authors of the 'rcqp' package is acknowledged with great respect, and they are listed as authors of this package. To achieve cross-platform portability (including Windows), 'RcppCWB' uses 'Rcpp' for its wrapper code.
Maintained by Andreas Blaette. Last updated 1 year ago.
texter:An Easy Text and Sentiment Analysis Library
Implement text and sentiment analysis with 'texter'. Generate sentiment scores on text data and also visualize sentiments. 'texter' allows you to quickly generate insights on your data. It includes support for lexicons such as 'NRC' and 'Bing'.
Maintained by Simi Kafaru. Last updated 3 years ago.
tlShiny:Supplies essential functions to Teaching Lab dashboards
A bunch of random functions I use in developing dashboards. Needs to vastly reduce the number of dependencies at the moment.
Maintained by Duncan Gates. Last updated 12 days ago.
ndl:Naive Discriminative Learning
Naive discriminative learning implements learning and classification models based on the Rescorla-Wagner equations and their equilibrium equations.
Maintained by Tino Sering. Last updated 7 years ago.
proustr:Tools for Natural Language Processing in French
Tools for Natural Language Processing in French and texts from Marcel Proust's collection "A La Recherche Du Temps Perdu". The novels contained in this collection are "Du cote de chez Swann", "A l'ombre des jeunes filles en fleurs", "Le Cote de Guermantes", "Sodome et Gomorrhe I et II", "La Prisonniere", "Albertine disparue", and "Le Temps retrouve".
Maintained by Colin Fay. Last updated 6 years ago.
wordmap:Feature Extraction and Document Classification with Noisy Labels
Extract features and classify documents with noisy labels given by document metadata or keyword matching. See Watanabe & Zhou (2020) <doi:10.1177/0894439320907027>.
Maintained by Kohei Watanabe. Last updated 2 months ago.
malaytextr:Text Mining for Bahasa Malaysia
This package is designed to work with text written in Bahasa Malaysia. We provide functions and data sets that make working with Bahasa Malaysia text much easier. For word stemming in particular, we look up the Malay words in a dictionary and then proceed to remove "extra suffix" as explained in Khan, Rehman Ullah, Fitri Suraya Mohamad, Muh Inam UlHaq, Shahren Ahmad Zadi Adruce, Philip Nuli Anding, Sajjad Nawaz Khan, and Abdulrazak Yahya Saleh Al-Hababi (2017). The package includes a dictionary of Malay words that may be used to perform word stemming, a dataset of Malay stop words, a dataset of sentiment words, and a dataset of normalized words.
Maintained by Zahier Nasrudin. Last updated 2 years ago.
tmpm:Trauma Mortality Prediction Model
Trauma Mortality prediction for ICD-9, ICD-10, and AIS lexicons in long or wide format based on Dr. Alan Cook's tmpm mortality model.
Maintained by Cody Moore. Last updated 9 years ago.
Sentiment analysis via deep learning and gradient boosting models, with much of the underlying hassle taken care of to make the process as simple as possible. In addition to out-performing traditional, lexicon-based sentiment analysis, it also allows the user to create embedding vectors for text which can be used in other analyses. GPU acceleration is supported on Windows and Linux.
Maintained by Ben Wiseman. Last updated 3 years ago.
vader:Valence Aware Dictionary and sEntiment Reasoner (VADER)
A lexicon and rule-based sentiment analysis tool that is specifically attuned to sentiments expressed in social media, and works well on texts from other domains. See Hutto & Gilbert (2014).
Maintained by Katherine Roehrick. Last updated 5 years ago.
