R-universe search: text

oscarkjell

text:Analyses of Text using Transformers Models from HuggingFace, Natural Language Processing and Machine Learning

Link R with Transformers from Hugging Face to transform text variables to word embeddings, where the word embeddings are used to statistically test the mean difference between sets of texts, compute semantic similarity scores between texts, predict numerical variables, and visualize statistically significant words according to various dimensions. For more information see <https://www.r-text.org>.

Maintained by Oscar Kjell. Last updated 5 days ago.

deep-learning machine-learning nlp transformers openjdk

93.6 match 146 stars 13.16 score 436 scripts 1 dependent

gagolews

stringi:Fast and Portable Character String Processing Facilities

A collection of character string/text/natural language processing tools for pattern searching (e.g., with 'Java'-like regular expressions or the 'Unicode' collation algorithm), random string generation, case mapping, string transliteration, concatenation, sorting, padding, wrapping, Unicode normalisation, date-time formatting and parsing, and many more. They are fast, consistent, convenient, and - thanks to 'ICU' (International Components for Unicode) - portable across all locales and platforms. Documentation about 'stringi' is provided via its website at <https://stringi.gagolewski.com/> and the paper by Gagolewski (2022, <doi:10.18637/jss.v103.i02>).

Maintained by Marek Gagolewski. Last updated 1 month ago.

icu icu4c natural-language-processing nlp regex regexp string-manipulation stringi stringr text text-processing tidy-data unicode cpp

45.8 match 309 stars 18.31 score 10k scripts 8.6k dependents

trinker

qdap:Bridging the Gap Between Qualitative Data and Quantitative Analysis

Automates many of the tasks associated with quantitative discourse analysis of transcripts, including frequency counts of sentence types, words, sentences, turns of talk, syllables and other assorted analysis tasks. The package provides parsing tools for preparing transcript data. Many functions enable the user to aggregate data by any number of grouping variables, providing analysis and seamless integration with other R packages that undertake higher level analysis and visualization of text. This affords the user a more efficient and targeted analysis. 'qdap' is designed for transcript analysis; however, many functions are applicable to other areas of Text Mining/Natural Language Processing.

Maintained by Tyler Rinker. Last updated 4 years ago.

qdap quantitative-discourse-analysis text-analysis text-mining text-plotting openjdk

58.8 match 176 stars 9.61 score 1.3k scripts 3 dependents

sergejruff

lovecraftr:A Collection of Lovecraftian Tales and Texts

A curated collection of Howard Phillips Lovecraft's complete stories, collected for the purpose of text analysis.

Maintained by Ruff Sergej. Last updated 3 months ago.

130.1 match 6 stars 3.78 score 1 script

quanteda

quanteda:Quantitative Analysis of Textual Data

A fast, flexible, and comprehensive framework for quantitative text analysis in R. Provides functionality for corpus management, creating and manipulating tokens and n-grams, exploring keywords in context, forming and manipulating sparse matrices of documents by features and feature co-occurrences, analyzing keywords, computing feature similarities and distances, applying content dictionaries, applying supervised and unsupervised machine learning, visually representing text and text analyses, and more.

Maintained by Kenneth Benoit. Last updated 2 months ago.

corpus natural-language-processing quanteda text-analytics onetbb cpp

28.5 match 851 stars 16.68 score 5.4k scripts 51 dependents

bnosac

udpipe:Tokenization, Parts of Speech Tagging, Lemmatization and Dependency Parsing with the 'UDPipe' 'NLP' Toolkit

This natural language processing toolkit provides language-agnostic 'tokenization', 'parts of speech tagging', 'lemmatization' and 'dependency parsing' of raw text. Next to text parsing, the package also allows you to train annotation models based on data of 'treebanks' in 'CoNLL-U' format as provided at <https://universaldependencies.org/format.html>. The techniques are explained in detail in the paper: 'Tokenizing, POS Tagging, Lemmatizing and Parsing UD 2.0 with UDPipe', available at <doi:10.18653/v1/K17-3009>. The toolkit also contains functionalities for commonly used data manipulations on texts which are enriched with the output of the parser. These include functionalities and algorithms for collocations, token co-occurrence, document term matrix handling, term frequency inverse document frequency calculations, information retrieval metrics (Okapi BM25), handling of multi-word expressions, keyword detection (Rapid Automatic Keyword Extraction, noun phrase extraction, syntactical patterns), sentiment scoring and semantic similarity analysis.

Maintained by Jan Wijffels. Last updated 2 years ago.

conll dependency-parser lemmatization natural-language-processing nlp pos-tagging r-pkg rcpp text-mining tokenizer udpipe cpp

36.8 match 215 stars 11.83 score 1.2k scripts 9 dependents

quanteda

readtext:Import and Handling for Plain and Formatted Text Files

Functions for importing and handling text files and formatted text files with additional meta-data, including '.csv', '.tab', '.json', '.xml', '.html', '.pdf', '.doc', '.docx', '.rtf', '.xls', '.xlsx', and others.

Maintained by Kenneth Benoit. Last updated 4 months ago.

encoding quanteda text

39.9 match 122 stars 10.66 score 1.2k scripts 5 dependents

rstudio

gt:Easily Create Presentation-Ready Display Tables

Build display tables from tabular data with an easy-to-use set of functions. With its progressive approach, we can construct display tables with a cohesive set of table parts. Table values can be formatted using any of the included formatting functions. Footnotes and cell styles can be precisely added through a location targeting system. The way in which 'gt' handles things for you means that you don't often have to worry about the fine details.

Maintained by Richard Iannone. Last updated 13 days ago.

docx easy-to-use html latex rtf summary-tables

20.3 match 2.1k stars 18.36 score 20k scripts 112 dependents

juliasilge

janeaustenr:Jane Austen's Complete Novels

Full texts for Jane Austen's 6 completed novels, ready for text analysis. These novels are "Sense and Sensibility", "Pride and Prejudice", "Mansfield Park", "Emma", "Northanger Abbey", and "Persuasion".

Maintained by Julia Silge. Last updated 3 years ago.

jane-austen novels text-mining

29.5 match 95 stars 11.03 score 1.1k scripts 62 dependents

wilkelab

ggtext:Improved Text Rendering Support for 'ggplot2'

A 'ggplot2' extension that enables the rendering of complex formatted plot labels (titles, subtitles, facet labels, axis labels, etc.). Text boxes with automatic word wrap are also supported.

Maintained by Brenton M. Wiernik. Last updated 3 years ago.

19.7 match 657 stars 15.71 score 13k scripts 155 dependents

trinker

textclean:Text Cleaning Tools

Tools to clean and process text. Tools are geared at checking for substrings that are not optimal for analysis and replacing or removing them (normalizing) with more analysis friendly substrings (see Sproat, Black, Chen, Kumar, Ostendorf, & Richards (2001) <doi:10.1006/csla.2001.0169>) or extracting them into new variables. For example, emoticons are often used in text but not always easily handled by analysis algorithms. The replace_emoticon() function replaces emoticons with word equivalents.

Maintained by Tyler Rinker. Last updated 3 years ago.

data-munging emoticons regex text-analysis text-cleaning

30.6 match 248 stars 10.08 score 760 scripts 22 dependents

digi-vub

text.alignment:Text Alignment with Smith-Waterman

Find similarities between texts using the Smith-Waterman algorithm. The algorithm performs local sequence alignment and determines similar regions between two strings. The Smith-Waterman algorithm is explained in the paper: "Identification of common molecular subsequences" by T.F.Smith and M.S.Waterman (1981), available at <doi:10.1016/0022-2836(81)90087-5>. This package implements the same logic for sequences of words and letters instead of molecular sequences.

Maintained by Jan Wijffels. Last updated 2 years ago.

cpp

52.5 match 10 stars 5.80 score 14 scripts
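
The local alignment idea described in the text.alignment entry above can be illustrated with a minimal sketch of Smith-Waterman applied to sequences of words. This is not the package's implementation: the function name `smith_waterman` and the scoring values (`match=2`, `mismatch=-1`, `gap=-1`, a simple linear gap penalty) are assumptions chosen for illustration only.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment of sequences a and b.

    Returns (score, aligned_a, aligned_b); a gap is shown as None.
    Scoring values are illustrative assumptions, not package defaults.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] = best score of a local alignment ending at a[i-1], b[j-1]
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap
            left = score[i][j - 1] + gap
            # Flooring at 0 is what makes the alignment local
            score[i][j] = max(0, diag, up, left)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero cell is reached
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if score[i][j] == diag:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append(None)
            i -= 1
        else:
            out_a.append(None)
            out_b.append(b[j - 1])
            j -= 1
    return best, out_a[::-1], out_b[::-1]


# Works on any sequence: lists of words here, strings of letters equally well
left = "the quick brown fox jumps".split()
right = "a quick brown dog".split()
print(smith_waterman(left, right))
```

Because the function only compares elements for equality, the same code aligns letters when given two strings instead of two word lists.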

r-forge

tm:Text Mining Package

A framework for text mining applications within R.

Maintained by Kurt Hornik. Last updated 27 days ago.

cpp

23.1 match 12.96 score 14k scripts 101 dependents

allancameron

geomtextpath:Curved Text in 'ggplot2'

A 'ggplot2' extension that allows text to follow curved paths. Curved text makes it easier to directly label paths or neatly annotate in polar co-ordinates.

Maintained by Allan Cameron. Last updated 2 months ago.

24.0 match 631 stars 12.04 score 960 scripts 5 dependents

ropensci

tokenizers:Fast, Consistent Tokenization of Natural Language Text

Convert natural language text into tokens. Includes tokenizers for shingled n-grams, skip n-grams, words, word stems, sentences, paragraphs, characters, shingled characters, lines, Penn Treebank, regular expressions, as well as functions for counting characters, words, and sentences, and a function for splitting longer texts into separate documents, each with the same number of words. The tokenizers have a consistent interface, and the package is built on the 'stringi' and 'Rcpp' packages for fast yet correct tokenization in 'UTF-8'.

Maintained by Thomas Charlon. Last updated 12 months ago.

nlp peer-reviewed text-mining tokenizer cpp

21.6 match 186 stars 13.33 score 1.1k scripts 81 dependents

slowkow

ggrepel:Automatically Position Non-Overlapping Text Labels with 'ggplot2'

Provides text and label geoms for 'ggplot2' that help to avoid overlapping text labels. Labels repel away from each other and away from the data points.

Maintained by Kamil Slowikowski. Last updated 4 months ago.

ggplot2 text visualization cpp

14.7 match 1.2k stars 19.20 score 37k scripts 1.2k dependents

juliasilge

tidytext:Text Mining using 'dplyr', 'ggplot2', and Other Tidy Tools

Using tidy data principles can make many text mining tasks easier, more effective, and consistent with tools already in wide use. Much of the infrastructure needed for text mining with tidy data frames already exists in packages like 'dplyr', 'broom', 'tidyr', and 'ggplot2'. In this package, we provide functions and supporting data sets to allow conversion of text to and from tidy formats, and to switch seamlessly between tidy tools and existing text mining packages.

Maintained by Julia Silge. Last updated 11 months ago.

natural-language-processing text-mining tidy-data tidyverse

16.6 match 1.2k stars 16.86 score 17k scripts 61 dependents

r-lib

cli:Helpers for Developing Command Line Interfaces

A suite of tools to build attractive command line interfaces ('CLIs'), from semantic elements: headings, lists, alerts, paragraphs, etc. Supports custom themes via a 'CSS'-like language. It also contains a number of lower level 'CLI' elements: rules, boxes, trees, and 'Unicode' symbols with 'ASCII' alternatives. It supports ANSI colors and text styles as well.

Maintained by Gábor Csárdi. Last updated 23 hours ago.

cli

14.0 match 664 stars 19.34 score 1.4k scripts 14k dependents

mlampros

textTinyR:Text Processing for Small or Big Data Files

It offers functions for splitting, parsing, tokenizing and creating a vocabulary for big text data files. Moreover, it includes functions for building a document-term matrix and extracting information from those (term-associations, most frequent terms). It also embodies functions for calculating token statistics (collocations, look-up tables, string dissimilarities) and functions to work with sparse matrices. Lastly, it includes functions for Word Vector Representations (i.e. 'GloVe', 'fasttext') and incorporates functions for the calculation of (pairwise) text document dissimilarities. The source code is based on 'C++11' and exported in R through the 'Rcpp', 'RcppArmadillo' and 'BH' packages.

Maintained by Lampros Mouselimis. Last updated 1 year ago.

bh boost cpp11 processing rcpp rcpparmadillo text openblas cpp openmp

33.9 match 38 stars 7.64 score 244 scripts 1 dependent

bioc

ComplexHeatmap:Make Complex Heatmaps

Complex heatmaps are efficient to visualize associations between different sources of data sets and reveal potential patterns. Here the ComplexHeatmap package provides a highly flexible way to arrange multiple heatmaps and supports various annotation graphics.

Maintained by Zuguang Gu. Last updated 5 months ago.

software visualization sequencing clustering complex-heatmaps heatmap

15.2 match 1.3k stars 16.93 score 16k scripts 151 dependents

davidgohel

flextable:Functions for Tabular Reporting

Use a grammar for creating and customizing pretty tables. The following formats are supported: 'HTML', 'PDF', 'RTF', 'Microsoft Word', 'Microsoft PowerPoint' and R 'Grid Graphics'. 'R Markdown', 'Quarto' and the package 'officer' can be used to produce the result files. The syntax is the same for the user regardless of the type of output to be produced. A set of functions allows the creation, definition of cell arrangement, addition of headers or footers, formatting and definition of cell content with text and or images. The package also offers a set of high-level functions that allow tabular reporting of statistical models and the creation of complex cross tabulations.

Maintained by David Gohel. Last updated 1 month ago.

docx html5 ms-office-documents rmarkdown table

14.5 match 583 stars 17.04 score 7.3k scripts 119 dependents

davidgohel

officer:Manipulation of Microsoft Word and PowerPoint Documents

Access and manipulate 'Microsoft Word', 'RTF' and 'Microsoft PowerPoint' documents from R. The package focuses on tabular and graphical reporting from R; it also provides two functions that let users get document content into data objects. A set of functions lets users add and remove images, tables and paragraphs of text in new or existing documents. The package does not require any installation of Microsoft products to be able to write Microsoft files.

Maintained by David Gohel. Last updated 1 month ago.

ms-office-documents powerpoint word

15.6 match 630 stars 15.79 score 4.1k scripts 137 dependents

jokergoo

circlize:Circular Visualization

Circular layout is an efficient way for the visualization of huge amounts of information. Here this package provides an implementation of circular layout generation in R as well as an enhancement of available software. The flexibility of the package is based on the usage of low-level graphics functions such that self-defined high-level graphics can be easily implemented by users for specific purposes. Together with the seamless connection between the powerful computational and visual environment in R, it gives users more convenience and freedom to design figures for better understanding complex patterns behind multiple dimensional data. The package is described in Gu et al. 2014 <doi:10.1093/bioinformatics/btu393>.

Maintained by Zuguang Gu. Last updated 1 year ago.

15.6 match 983 stars 15.62 score 10k scripts 213 dependents

ropensci

beautier:'BEAUti' from R

'BEAST2' (<https://www.beast2.org>) is a widely used Bayesian phylogenetic tool, that uses DNA/RNA/protein data and many model priors to create a posterior of jointly estimated phylogenies and parameters. 'BEAUti 2' (which is part of 'BEAST2') is a GUI tool that allows users to specify the many possible setups and generates the XML file 'BEAST2' needs to run. This package provides a way to create 'BEAST2' input files without active user input, but using R function calls instead.

Maintained by Richèl J.C. Bilderbeek. Last updated 24 days ago.

bayesian beast beast2 beauti phylogenetic-inference phylogenetics

27.8 match 13 stars 8.76 score 198 scripts 5 dependents

wrathematics

ngram:Fast n-Gram 'Tokenization'

An n-gram is a sequence of n "words" taken, in order, from a body of text. This is a collection of utilities for creating, displaying, summarizing, and "babbling" n-grams. The 'tokenization' and "babbling" are handled by very efficient C code, which can even be built as its own standalone library. The babbler is a simple Markov chain. The package also offers a vignette with complete example 'workflows' and information about the utilities offered in the package.

Maintained by Drew Schmidt. Last updated 1 year ago.

ngram text text-mining

23.3 match 71 stars 10.45 score 844 scripts 7 dependents
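
The ngram entry above defines an n-gram as n consecutive "words" and describes its babbler as a simple Markov chain. A minimal sketch of both ideas follows; it is not the package's C implementation, and the names `ngrams` and `babble`, the whitespace tokenization, and the `seed` parameter are assumptions made for illustration.

```python
import random
from collections import defaultdict


def ngrams(text, n):
    """All runs of n consecutive whitespace-separated words, in order."""
    words = text.split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def babble(text, n, length, seed=None):
    """Generate up to `length` words with a Markov chain built from n-grams.

    The chain state is the previous n-1 words; the next word is drawn at
    random from the words that followed that state in the source text.
    """
    if n < 2:
        raise ValueError("n must be at least 2 to form a state and a successor")
    rng = random.Random(seed)
    followers = defaultdict(list)
    for gram in ngrams(text, n):
        followers[gram[:-1]].append(gram[-1])
    if not followers:
        return []
    state = rng.choice(sorted(followers))  # sorted for reproducibility with a seed
    out = list(state)
    while len(out) < length:
        options = followers.get(state)
        if not options:  # dead end: this state only occurs at the end of the text
            break
        out.append(rng.choice(options))
        state = tuple(out[-(n - 1):])
    return out[:length]


corpus = "the cat sat on the mat and the cat ran"
print(ngrams(corpus, 2)[:3])
print(" ".join(babble(corpus, 2, 8, seed=1)))
```

Every run of n consecutive words in the babbled output is an n-gram that occurs in the source text, which is why the result looks locally plausible while being globally meaningless.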

t-kalinowski

keras:R Interface to 'Keras'

Interface to 'Keras' <https://keras.io>, a high-level neural networks 'API'. 'Keras' was developed with a focus on enabling fast experimentation, supports both convolution based networks and recurrent networks (as well as combinations of the two), and runs seamlessly on both 'CPU' and 'GPU' devices.

Maintained by Tomasz Kalinowski. Last updated 11 months ago.

22.2 match 10.93 score 10k scripts 55 dependents

hneth

ds4psy:Data Science for Psychologists

All datasets and functions required for the examples and exercises of the book "Data Science for Psychologists" (by Hansjoerg Neth, Konstanz University, 2023), freely available at <https://bookdown.org/hneth/ds4psy/>. The book and course introduce principles and methods of data science to students of psychology and other biological or social sciences. The 'ds4psy' package primarily provides datasets, but also functions for data generation and manipulation (e.g., of text and time data) and graphics that are used in the book and its exercises. All functions included in 'ds4psy' are designed to be explicit and instructive, rather than efficient or elegant.

Maintained by Hansjoerg Neth. Last updated 1 month ago.

data-literacy data-science education exploratory-data-analysis psychology social-sciences visualisation

34.9 match 22 stars 6.79 score 70 scripts

wilkox

ggfittext:Fit Text Inside a Box in 'ggplot2'

A 'ggplot2' extension to fit text into a box by growing, shrinking or wrapping the text.

Maintained by David Wilkins. Last updated 1 year ago.

ggplot2

21.1 match 306 stars 11.08 score 234 scripts 33 dependents

dselivanov

text2vec:Modern Text Mining Framework for R

Fast and memory-friendly tools for text vectorization, topic modeling (LDA, LSA), word embeddings (GloVe), similarities. This package provides a source-agnostic streaming API, which allows researchers to perform analysis of collections of documents which are larger than available RAM. All core functions are parallelized to benefit from multicore machines.

Maintained by Dmitriy Selivanov. Last updated 7 months ago.

glove latent-dirichlet-allocation natural-language-processing text-mining topic-modeling vectorization word-embeddings word2vec cpp

17.3 match 860 stars 13.48 score 1.3k scripts 23 dependents

trinker

textshape:Tools for Reshaping Text

Tools that can be used to reshape and restructure text data.

Maintained by Tyler Rinker. Last updated 12 months ago.

data-reshaping manipulation sentence-boundary-detection text-data text-formating tidy

24.5 match 50 stars 9.18 score 266 scripts 34 dependents

zumbov2

deeplr:Interface to the 'DeepL' Translation API

A wrapper for the 'DeepL' Pro API <https://www.deepl.com/docs-api>, a web service for translating texts between different languages. A DeepL API developer account is required to use the service (see <https://www.deepl.com/pro#developer>).

Maintained by David Zumbach. Last updated 12 months ago.

api-wrapper deepl translation

40.4 match 41 stars 5.56 score 70 scripts

hneth

unikn:Graphical Elements of the University of Konstanz's Corporate Design

Define and use graphical elements of corporate design manuals in R. The 'unikn' package provides color functions (by defining dedicated colors and color palettes, and commands for finding, changing, viewing, and using them) and styled text elements (e.g., for marking, underlining, or plotting colored titles). The pre-defined range of colors and text decoration functions is based on the corporate design of the University of Konstanz <https://www.uni-konstanz.de/>, but can be adapted and extended for other purposes or institutions.

Maintained by Hansjoerg Neth. Last updated 3 months ago.

branding color color-palette colorscheme corporate-design palette text-decoration university-colors visual-identity

24.9 match 39 stars 8.82 score 156 scripts 2 dependents

patperry

utf8:Unicode Text Processing

Process and print 'UTF-8' encoded international text (Unicode). Input, validate, normalize, encode, format, and display.

Maintained by Kirill Müller. Last updated 3 months ago.

12.9 match 113 stars 16.48 score 295 scripts 11k dependents

haozhu233

kableExtra:Construct Complex Table with 'kable' and Pipe Syntax

Build complex HTML or 'LaTeX' tables using 'kable()' from 'knitr' and the piping syntax from 'magrittr'. Function 'kable()' is a lightweight table generator coming from 'knitr'. This package simplifies the way to manipulate the HTML or 'LaTeX' codes generated by 'kable()' and allows users to construct complex tables and customize styles using a readable syntax.

Maintained by Hao Zhu. Last updated 12 days ago.

html kable kableextra knitr latex rmarkdown

10.8 match 702 stars 19.35 score 55k scripts 163 dependents

rstudio

shiny:Web Application Framework for R

Makes it incredibly easy to build interactive web applications with R. Automatic "reactive" binding between inputs and outputs and extensive prebuilt widgets make it possible to build beautiful, responsive, and powerful applications with minimal effort.

Maintained by Winston Chang. Last updated 15 days ago.

reactive rstudio shiny web-app web-development

9.7 match 5.4k stars 21.28 score 108k scripts 1.8k dependents

geobosh

Rdpack:Update and Manipulate Rd Documentation Objects

Functions for manipulation of R documentation objects, including functions reprompt() and ereprompt() for updating 'Rd' documentation for functions, methods and classes; 'Rd' macros for citations and import of references from 'bibtex' files for use in 'Rd' files and 'roxygen2' comments; 'Rd' macros for evaluating and inserting snippets of 'R' code and the results of its evaluation or creating graphics on the fly; and many functions for manipulation of references and Rd files.

Maintained by Georgi N. Boshnakov. Last updated 2 days ago.

bibtex bibtex-references citations documentation rd-format roxygen2

14.4 match 30 stars 13.76 score 73 scripts 2.3k dependents

ropensci

googleLanguageR:Call Google's 'Natural Language' API, 'Cloud Translation' API, 'Cloud Speech' API and 'Cloud Text-to-Speech' API

Call 'Google Cloud' machine learning APIs for text and speech tasks. Call the 'Cloud Translation' API <https://cloud.google.com/translate/> for detection and translation of text, the 'Natural Language' API <https://cloud.google.com/natural-language/> to analyse text for sentiment, entities or syntax, the 'Cloud Speech' API <https://cloud.google.com/speech/> to transcribe sound files to text and the 'Cloud Text-to-Speech' API <https://cloud.google.com/text-to-speech/> to turn text into sound files.

Maintained by Mark Edmondson. Last updated 8 months ago.

cloud-speech-api cloud-translation-api google-api-client google-cloud google-cloud-speech google-nlp googleauthr natural-language-processing peer-reviewed sentiment-analysis speech-api translation-api

18.3 match 196 stars 10.36 score 268 scripts 3 dependents

hughjonesd

huxtable:Easily Create and Style Tables for LaTeX, HTML and Other Formats

Creates styled tables for data presentation. Export to HTML, LaTeX, RTF, 'Word', 'Excel', and 'PowerPoint'. Simple, modern interface to manipulate borders, size, position, captions, colours, text styles and number formatting. Table cells can span multiple rows and/or columns. Includes a 'huxreg' function for creation of regression tables, and 'quick_*' one-liners to print data to a new document.

Maintained by David Hugh-Jones. Last updated 14 days ago.

html huxtable latex microsoft-word powerpoint reproducible-research tables

13.6 match 323 stars 13.93 score 1.9k scripts 16 dependents

dreamrs

shinyWidgets:Custom Inputs Widgets for Shiny

Collection of custom input controls and user interface components for 'Shiny' applications. Give your applications a unique and colorful style!

Maintained by Victor Perrier. Last updated 13 days ago.

shiny

10.9 match 849 stars 17.05 score 8.1k scripts 218 dependents

tidyverse

ggplot2:Create Elegant Data Visualisations Using the Grammar of Graphics

A system for 'declaratively' creating graphics, based on "The Grammar of Graphics". You provide the data, tell 'ggplot2' how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details.

Maintained by Thomas Lin Pedersen. Last updated 11 days ago.

data-visualisation visualisation

7.4 match 6.6k stars 25.10 score 645k scripts 7.5k dependents

qinwf

jiebaR:Chinese Text Segmentation

Chinese text segmentation, keyword extraction and speech tagging for R.

Maintained by Qin Wenfeng. Last updated 5 years ago.

chinese chinese-text-segmentation cppjieba jieba lexical-analysis nlp cpp

17.8 match 348 stars 10.18 score 456 scripts 6 dependents

trinker

lexicon:Lexicons for Text Analysis

A collection of lexical hash tables, dictionaries, and word lists.

Maintained by Tyler Rinker. Last updated 3 years ago.

hash lexicon lookup names-frequent stopwords text-dictionaries text-mining

20.3 match 111 stars 8.80 score 224 scripts 25 dependents

tidyverse

vroom:Read and Write Rectangular Text Data Quickly

The goal of 'vroom' is to read and write data (like 'csv', 'tsv' and 'fwf') quickly. When reading it uses a quick initial indexing step, then reads the values lazily, so only the data you actually use needs to be read. The writer formats the data in parallel and writes to disk asynchronously from formatting.

Maintained by Jennifer Bryan. Last updated 7 months ago.

csv csv-parser fixed-width-text tsv tsv-parser cpp

9.6 match 625 stars 17.78 score 4.5k scripts 2.1k dependents

ropensci

pdftools:Text Extraction, Rendering and Converting of PDF Documents

Utilities based on 'libpoppler' <https://poppler.freedesktop.org> for extracting text, fonts, attachments and metadata from a PDF file. Also supports high quality rendering of PDF documents into PNG, JPEG, TIFF format, or into raw bitmap vectors for further processing in R.

Maintained by Jeroen Ooms. Last updated 15 days ago.

pdf-files pdf-format pdftools poppler poppler-library text-extraction cpp

12.9 match 529 stars 13.10 score 3.3k scripts 47 dependents

spatstat

spatstat.geom:Geometrical Functionality of the 'spatstat' Family

Defines spatial data types and supports geometrical operations on them. Data types include point patterns, windows (domains), pixel images, line segment patterns, tessellations and hyperframes. Capabilities include creation and manipulation of data (using command line or graphical interaction), plotting, geometrical operations (rotation, shift, rescale, affine transformation), convex hull, discretisation and pixellation, Dirichlet tessellation, Delaunay triangulation, pairwise distances, nearest-neighbour distances, distance transform, morphological operations (erosion, dilation, closing, opening), quadrat counting, geometrical measurement, geometrical covariance, colour maps, calculus on spatial domains, Gaussian blur, level sets of images, transects of images, intersections between objects, minimum distance matching. (Excludes spatial data on a network, which are supported by the package 'spatstat.linnet'.)

Maintained by Adrian Baddeley. Last updated 7 hours ago.

classes-and-objects distance-calculation geometry geometry-processing images mensuration plotting point-patterns spatial-data spatial-data-analysis

13.8 match 7 stars 12.10 score 241 scripts 227 dependents

eagerai

fastai:Interface to 'fastai'

The 'fastai' <https://docs.fast.ai/index.html> library simplifies training fast and accurate neural networks using modern best practices. It is based on research into deep learning best practices undertaken at 'fast.ai', including 'out of the box' support for vision, text, tabular, audio, time series, and collaborative filtering models.

Maintained by Turgut Abdullayev. Last updated 11 months ago.

audio collaborative-filtering darknet darknet-image-classification fastai medical object-detection tabular text vision

17.5 match 118 stars 9.40 score 76 scripts

ropensci

rcrossref:Client for Various 'CrossRef' 'APIs'

Client for various 'CrossRef' 'APIs', including 'metadata' search with their old and newer search 'APIs', get 'citations' in various formats (including 'bibtex', 'citeproc-json', 'rdf-xml', etc.), convert 'DOIs' to 'PMIDs', and 'vice versa', get citations for 'DOIs', and get links to full text of articles when available.

Maintained by Najko Jahn. Last updated 2 years ago.

text-ming literature pdf xml publications citations full-text tdm crossref api api-wrapper crossref-api doi metadata

15.5 match 172 stars 10.18 score 404 scripts 10 dependents

tomeriko96

polyglotr:Translate Text

Provide easy methods to translate pieces of text. Functions send requests to translation services online.

Maintained by Tomer Iwan. Last updated 2 months ago.

google-translate googletranslate language linguee mymemory-api mymemorytranslator pons translation translations-api

20.5 match 33 stars 7.61 score 34 scripts 1 dependent

mkearney

wactor:Word Factor Vectors

A user-friendly factor-like interface for converting strings of text into numeric vectors and rectangular data structures.

Maintained by Michael W. Kearney. Last updated 5 years ago.

text text-classification text-processing text-vectorization word-embeddings word-vectors word2vec

34.0 match 33 stars 4.52 score 3 scripts

kassambara

ggpubr:'ggplot2' Based Publication Ready Plots

The 'ggplot2' package is excellent and flexible for elegant data visualization in R. However, the default generated plots require some formatting before we can send them for publication. Furthermore, to customize a 'ggplot', the syntax is opaque and this raises the level of difficulty for researchers with no advanced R programming skills. 'ggpubr' provides some easy-to-use functions for creating and customizing 'ggplot2'-based publication ready plots.

Maintained by Alboukadel Kassambara. Last updated 2 years ago.

9.2 match 1.2k stars 16.68 score 65k scripts 409 dependents

trinker

textstem:Tools for Stemming and Lemmatizing Text

Tools that stem and lemmatize text. Stemming is a process that removes endings such as affixes. Lemmatization is the process of grouping inflected forms together as a single base form.

Maintained by Tyler Rinker. Last updated 7 years ago.

lemmatization stemming text-mining

17.5 match 45 stars 8.71 score 888 scripts 11 dependents
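
The distinction drawn in the textstem entry above, stemming as ending removal versus lemmatization as mapping inflected forms to one base form, can be shown with a toy sketch. It does not reflect how textstem works internally: the suffix list `SUFFIXES` and the lookup table `LEMMAS` are hypothetical, tiny, and chosen only to make the contrast visible.

```python
# Toy suffix list, checked in order (an illustrative assumption)
SUFFIXES = ("ing", "ed", "es", "s")

# Toy lemma dictionary (hypothetical; real lemmatizers use large lexicons)
LEMMAS = {
    "ran": "run",
    "running": "run",
    "runs": "run",
    "mice": "mouse",
}


def stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word


def lemmatize(word):
    """Look the word up in the dictionary; unknown words are returned as-is."""
    return LEMMAS.get(word, word)


for w in ["running", "runs", "ran", "mice"]:
    print(w, stem(w), lemmatize(w))
```

The rule-based stemmer needs no dictionary but can produce non-words and cannot relate irregular forms, whereas the dictionary-based lemmatizer groups irregular forms correctly but only for words it knows.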

yihui

xfun:Supporting Functions for Packages Maintained by 'Yihui Xie'

Miscellaneous functions commonly used in other packages maintained by 'Yihui Xie'.

Maintained by Yihui Xie. Last updated 3 hours ago.

8.3 match 146 stars 18.19 score 916 scripts 4.4k dependents

massimoaria

tall:Text Analysis for All

An R 'shiny' app designed for diverse text analysis tasks, offering a wide range of methodologies tailored to Natural Language Processing (NLP) needs. It is a versatile, general-purpose tool for analyzing textual data. 'tall' features a comprehensive workflow, including data cleaning, preprocessing, statistical analysis, and visualization, all integrated for effective text analysis.

Maintained by Massimo Aria. Last updated 5 days ago.

r-shiny text-analysis-and-sentiment-analysis text-classification text-mining textual-analysis cpp

29.2 match 14 stars 5.12 score

dmurdoch

plotrix:Various Plotting Functions

Lots of plots, various labeling, axis and color scaling functions. The author/maintainer died in September 2023.

Maintained by Duncan Murdoch. Last updated 1 years ago.

12.9 match 5 stars 11.31 score 9.2k scripts 361 dependents

kurthornik

NLP:Natural Language Processing Infrastructure

Basic classes and methods for Natural Language Processing.

Maintained by Kurt Hornik. Last updated 4 months ago.

15.6 match 6 stars 9.37 score 1.0k scripts 127 dependents

computationalstylistics

stylo:Stylometric Multivariate Analyses

Supervised and unsupervised multivariate methods, supplemented by GUI and some visualizations, to perform various analyses in the field of computational stylistics, authorship attribution, etc. For further reference, see Eder et al. (2016), <https://journal.r-project.org/archive/2016/RJ-2016-007/index.html>. You are also encouraged to visit the Computational Stylistics Group's website <https://computationalstylistics.github.io/>, where a reasonable amount of information about the package and related projects is provided.

Maintained by Maciej Eder. Last updated 2 months ago.

16.9 match 187 stars 8.58 score 462 scripts

openanalytics

inTextSummaryTable:Creation of in-Text Summary Table

Creation of tables of summary statistics or counts for clinical data (for 'TLFs'). These tables can be exported as an in-text table (with the 'flextable' package) for a Clinical Study Report (Word format) or a 'topline' presentation (PowerPoint format), or as an interactive table (with the 'DT' package) to an html document for clinical data review.

Maintained by Laure Cougnaud. Last updated 9 months ago.

26.2 match 1 stars 5.52 score 47 scripts

karlines

diagram:Functions for Visualising Simple Graphs (Networks), Plotting Flow Diagrams

Visualises simple graphs (networks) based on a transition matrix, utilities to plot flow diagrams, visualising webs, electrical networks, etc. Support for the book "A practical guide to ecological modelling - using R as a simulation platform" by Karline Soetaert and Peter M.J. Herman (2009), Springer, and the book "Solving Differential Equations in R" by Karline Soetaert, Jeff Cash and Francesca Mazzia (2012), Springer. Includes demo(flowchart), demo(plotmat), demo(plotweb).

Maintained by Karline Soetaert. Last updated 4 years ago.

14.0 match 10.06 score 598 scripts 487 dependents

kasperwelbers

corpustools:Managing, Querying and Analyzing Tokenized Text

Provides text analysis in R, focusing on the use of a tokenized text format. In this format, the positions of tokens are maintained, and each token can be annotated (e.g., part-of-speech tags, dependency relations). Prominent features include advanced Lucene-like querying for specific tokens or contexts (e.g., documents, sentences), similarity statistics for words and documents, exporting to DTM for compatibility with many text analysis packages, and the possibility to reconstruct original text from tokens to facilitate interpretation.

Maintained by Kasper Welbers. Last updated 6 months ago.

cpp

18.1 match 31 stars 7.50 score 174 scripts 1 dependents

dmurdoch

rgl:3D Visualization Using OpenGL

Provides medium to high level functions for 3D interactive graphics, including functions modelled on base graphics (plot3d(), etc.) as well as functions for constructing representations of geometric objects (cube3d(), etc.). Output may be on screen using OpenGL, or to various standard 3D file formats including WebGL, PLY, OBJ, STL as well as 2D image formats, including PNG, Postscript, SVG, PGF.

Maintained by Duncan Murdoch. Last updated 2 months ago.

graphics opengl rgl webgl libglu libglvnd libpng libx11 freetype cpp

7.7 match 91 stars 17.49 score 7.3k scripts 300 dependents

r-forge

zoo:S3 Infrastructure for Regular and Irregular Time Series (Z's Ordered Observations)

An S3 class with methods for totally ordered indexed observations. It is particularly aimed at irregular time series of numeric vectors/matrices and factors. zoo's key design goals are independence of a particular index/date/time class and consistency with ts and base R by providing methods to extend standard generics.

Maintained by Achim Zeileis. Last updated 15 days ago.

8.3 match 16.23 score 33k scripts 2.2k dependents

trinker

sentimentr:Calculate Text Polarity Sentiment

Calculate text polarity sentiment at the sentence level and optionally aggregate by rows or grouping variable(s).

Maintained by Tyler Rinker. Last updated 3 years ago.

amplifier polarity sentiment sentiment-analysis valence-shifter

13.9 match 432 stars 9.43 score 680 scripts 2 dependents

jhudsl

text2speech:Text to Speech Conversion

Converts text into speech using various text-to-speech (TTS) engines and provides a unified interface for accessing their functionality. With this package, users can easily generate audio files of spoken words, phrases, or sentences from plain text data. The package supports multiple TTS engines, including Google's 'Cloud Text-to-Speech API', 'Amazon Polly', Microsoft's 'Cognitive Services Text to Speech REST API', and a free TTS engine called 'Coqui TTS'.

Maintained by Howard Baek. Last updated 2 years ago.

edtech-software speech-synthesis text-to-speech tts voice

20.6 match 21 stars 6.28 score 9 scripts 2 dependents

andrewheiss

scriptuRs:Complete Text of the LDS Scriptures

Full text, in data frames containing one row per verse, of the Standard Works of The Church of Jesus Christ of Latter-day Saints (LDS). These are the Old Testament (KJV), the New Testament (KJV), the Book of Mormon, the Doctrine and Covenants, and the Pearl of Great Price.

Maintained by Andrew Heiss. Last updated 6 years ago.

lds lds-scriptures text-mining tidytext

29.9 match 14 stars 4.32 score 30 scripts

ropensci

jstor:Read Data from JSTOR/DfR

Functions and helpers to import metadata, ngrams and full-texts delivered by Data for Research by JSTOR.

Maintained by Thomas Klebel. Last updated 8 months ago.

jstor peer-reviewed text-analysis text-mining

17.5 match 47 stars 7.29 score 55 scripts

ingmarboeschen

JATSdecoder:A Metadata and Text Extraction and Manipulation Tool Set

Provides a function collection to extract metadata, sectioned text and study characteristics from scientific articles in 'NISO-JATS' format. Articles in PDF format can be converted to 'NISO-JATS' with the 'Content ExtRactor and MINEr' ('CERMINE', <https://github.com/CeON/CERMINE>). For convenience, two functions bundle the extraction heuristics: JATSdecoder() converts 'NISO-JATS'-tagged XML files to a structured list with elements title, author, journal, history, 'DOI', abstract, sectioned text and reference list. study.character() extracts multiple study characteristics like number of included studies, statistical methods used, alpha error, power, statistical results, correction method for multiple testing, software used. An estimation of the involved sample size is performed based on reports within the abstract and the reported degrees of freedom within statistical results. In addition, the package contains some useful functions to process text (text2sentences(), text2num(), ngram(), strsplit2(), grep2()). See Böschen, I. (2021) <doi:10.1007/s11192-021-04162-z>, Böschen, I. (2021) <doi:10.1038/s41598-021-98782-3> and Böschen, I. (2023) <doi:10.1038/s41598-022-27085-y>.

Maintained by Ingmar Böschen. Last updated 7 days ago.

cermine niso-jats pubmedcentral text-extraction text-mining xml-files openjdk

27.8 match 18 stars 4.56 score 7 scripts

coolbutuseless

lofifonts:Text Rendering with Bitmap and Vector Fonts

Alternate font rendering is useful when rendering text to novel graphics outputs where modern font rendering is not available or where bespoke text positioning is required. Bitmap and vector fonts allow for custom layout and rendering using pixel coordinates and line drawing. Formatted text is created as a data.frame of pixel coordinates (for bitmap fonts) or stroke coordinates (for vector fonts). All text can be easily previewed as a matrix or raster image. A selection of fonts is included with this package.

Maintained by Mike Cheng. Last updated 25 days ago.

20.7 match 7 stars 5.94 score 10 scripts

jonclayden

ore:An R Interface to the Onigmo Regular Expression Library

Provides an alternative to R's built-in functionality for handling regular expressions, based on the Onigmo library. Offers first-class compiled regex objects, partial matching and function-based substitutions, amongst other features.

Maintained by Jon Clayden. Last updated 4 days ago.

regex regular-expressions text-analysis

17.2 match 58 stars 7.16 score 125 scripts 6 dependents

kumes

chatAI4R:Chat-Based Interactive Artificial Intelligence for R

The Large Language Model (LLM) represents a groundbreaking advancement in data science and programming, and also allows us to extend the world of R. A seamless interface for integrating the 'OpenAI' Web APIs into R is provided in this package. This package leverages LLM-based AI techniques, enabling efficient knowledge discovery and data analysis (see 'OpenAI' Web APIs details <https://openai.com/blog/openai-api>). The previous functions such as seamless translation and image generation have been moved to other packages 'deepRstudio' and 'stableDiffusion4R'.

Maintained by Satoshi Kume. Last updated 1 months ago.

ai bioinformatics chatgpt gpt image image-generation

27.6 match 14 stars 4.45 score 3 scripts

rspatial

terra:Spatial Data Analysis

Methods for spatial data analysis with vector (points, lines, polygons) and raster (grid) data. Methods for vector data include geometric operations such as intersect and buffer. Raster methods include local, focal, global, zonal and geometric operations. The predict and interpolate methods facilitate the use of regression type (interpolation, machine learning) models for spatial prediction, including with satellite remote sensing data. Processing of very large files is supported. See the manual and tutorials on <https://rspatial.org/> to get started. 'terra' replaces the 'raster' package ('terra' can do more, and it is faster and easier to use).

Maintained by Robert J. Hijmans. Last updated 8 hours ago.

geospatial raster spatial vector onetbb proj gdal geos cpp

6.9 match 559 stars 17.63 score 17k scripts 851 dependents

henrikbengtsson

R.utils:Various Programming Utilities

Utility functions useful when programming and developing R packages.

Maintained by Henrik Bengtsson. Last updated 1 years ago.

8.8 match 63 stars 13.74 score 5.7k scripts 814 dependents

nteetor

cascadess:A Style Pronoun for 'htmltools' Tags

Apply styles to tag elements directly and with the .style pronoun. Using the pronoun, styles are created within the context of a tag element. Change borders, backgrounds, text, margins, layouts, and more.

Maintained by Nathan Teetor. Last updated 5 months ago.

bootstrap-5 css shiny

25.0 match 19 stars 4.82 score 4 scripts

quanteda

spacyr:Wrapper to the 'spaCy' 'NLP' Library

An R wrapper to the 'Python' 'spaCy' 'NLP' library, from <https://spacy.io>.

Maintained by Kenneth Benoit. Last updated 1 months ago.

extract-entities nlp spacy speech-tagging

11.1 match 253 stars 10.68 score 408 scripts 6 dependents

emilhvitfeldt

hcandersenr:H.C. Andersens Fairy Tales

Texts for H.C. Andersen's fairy tales, ready for text analysis. Fairy tales in German, Danish, English, Spanish and French.

Maintained by Emil Hvitfeldt. Last updated 5 years ago.

andersens-fairy-tales text-mining

25.5 match 10 stars 4.62 score 83 scripts

rolkra

explore:Simplifies Exploratory Data Analysis

Interactive data exploration with one line of code, automated reporting, or an easy-to-remember set of tidy functions for low-code exploratory data analysis.

Maintained by Roland Krasser. Last updated 3 months ago.

data-exploration data-visualisation decision-trees eda rmarkdown shiny tidy

10.3 match 228 stars 11.43 score 221 scripts 1 dependents

ropensci

textreuse:Detect Text Reuse and Document Similarity

Tools for measuring similarity among documents and detecting passages which have been reused. Implements shingled n-gram, skip n-gram, and other tokenizers; similarity/dissimilarity functions; pairwise comparisons; minhash and locality sensitive hashing algorithms; and a version of the Smith-Waterman local alignment algorithm suitable for natural language.

Maintained by Yaoxiang Li. Last updated 1 months ago.

peer-reviewed cpp

12.3 match 200 stars 9.28 score 226 scripts
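
The shingling and minhash ideas named in the 'textreuse' entry above can be illustrated with a small sketch. It is written in Python and is not the package's R API: the function names (`shingles`, `jaccard`, `minhash_signature`, `minhash_similarity`) and the use of seeded SHA-1 digests as the hash family are assumptions made purely for illustration.

```python
import hashlib


def shingles(text, n=3):
    """Return the set of overlapping word n-grams (shingles) in a text."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a, b):
    """Exact Jaccard similarity of two sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def minhash_signature(items, num_hashes=64):
    """One minimum hash value per seeded hash function; items must be non-empty."""
    signature = []
    for seed in range(num_hashes):
        signature.append(min(
            int.from_bytes(
                hashlib.sha1(f"{seed}:{item}".encode()).digest()[:8], "big"
            )
            for item in items
        ))
    return signature


def minhash_similarity(sig_a, sig_b):
    """Fraction of matching signature positions, an estimate of Jaccard similarity."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


doc_a = shingles("the quick brown fox jumps over the lazy dog")
doc_b = shingles("the quick brown fox leaps over the lazy dog")
exact = jaccard(doc_a, doc_b)
estimate = minhash_similarity(minhash_signature(doc_a), minhash_signature(doc_b))
```

The exact score compares full shingle sets, while the signature comparison only approximates it; the approximation is what makes pairwise comparison affordable on large corpora.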

r-lib

textshaping:Bindings to the 'HarfBuzz' and 'Fribidi' Libraries for Text Shaping

Provides access to the text shaping functionality in the 'HarfBuzz' library and the bidirectional algorithm in the 'Fribidi' library. 'textshaping' is a low-level utility package mainly for graphic devices that expands upon the font tool-set provided by the 'systemfonts' package.

Maintained by Thomas Lin Pedersen. Last updated 2 months ago.

harfbuzz freetype fribidi cpp

8.2 match 19 stars 13.58 score 66 scripts 484 dependents

vgherard

sbo:Text Prediction via Stupid Back-Off N-Gram Models

Utilities for training and evaluating text predictors based on Stupid Back-Off N-gram models (Brants et al., 2007, <https://www.aclweb.org/anthology/D07-1090/>).

Maintained by Valerio Gherardi. Last updated 4 years ago.

natural-language-processing ngram-models predictive-text sbo cpp

23.4 match 10 stars 4.78 score 12 scripts
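
The Stupid Back-Off scoring rule behind the 'sbo' entry above can be sketched in a few lines. This is a Python illustration, not the package's R interface: `count_ngrams` and `sbo_score` are hypothetical helper names, and the back-off factor of 0.4 is the value suggested by Brants et al. (2007). The scores are relative frequencies with a penalty for each back-off step, not normalized probabilities.

```python
from collections import Counter


def count_ngrams(tokens, max_n):
    """Count every n-gram of order 1..max_n, keyed by a tuple of tokens."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def sbo_score(word, context, counts, total, backoff=0.4):
    """Score `word` after `context`, backing off to shorter contexts if unseen."""
    context = tuple(context)
    if not context:
        # Base case: unigram relative frequency.
        return counts[(word,)] / total
    ngram = context + (word,)
    if counts[ngram] > 0:
        return counts[ngram] / counts[context]
    # Unseen n-gram: drop the oldest context word and apply the penalty.
    return backoff * sbo_score(word, context[1:], counts, total, backoff)


tokens = "the cat sat on the mat".split()
counts = count_ngrams(tokens, max_n=2)
total = len(tokens)

seen = sbo_score("cat", ["the"], counts, total)      # bigram "the cat" was observed
unseen = sbo_score("sat", ["the"], counts, total)    # backs off to the unigram "sat"
unknown = sbo_score("dog", ["the"], counts, total)   # never observed at any order
```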

laresbernardo

lares:Analytics & Machine Learning Sidekick

Auxiliary package for better/faster analytics, visualization, data mining, and machine learning tasks. With a wide variety of family functions, like Machine Learning, Data Wrangling, Marketing Mix Modeling (Robyn), Exploratory, API, and Scrapper, it helps the analyst or data scientist to get quick and robust results, without the need of repetitive coding or advanced R programming skills.

Maintained by Bernardo Lares. Last updated 26 days ago.

analytics api automation automl data-science descriptive-statistics h2o machine-learning marketing mmm predictive-modeling puzzle rlanguage robyn visualization

11.2 match 233 stars 9.84 score 185 scripts 1 dependents

emilhvitfeldt

textdata:Download and Load Various Text Datasets

Provides a framework to download, parse, and store text datasets on the disk and load them when needed. Includes various sentiment lexicons and labeled text data sets for classification and analysis.

Maintained by Emil Hvitfeldt. Last updated 10 months ago.

text-datasets

11.3 match 75 stars 9.66 score 1.4k scripts 1 dependents

gagolews

stringx:Replacements for Base String Functions Powered by 'stringi'

English is the native language for only 5% of the World population. Also, only 17% of us can understand this text. Moreover, the Latin alphabet is the main one for merely 36% of the total. The early computer era, now a very long time ago, was dominated by the US. Due to the proliferation of the internet, smartphones, social media, and other technologies and communication platforms, this is no longer the case. This package replaces base R string functions (such as grep(), tolower(), sprintf(), and strptime()) with ones that fully support the Unicode standards related to natural language and date-time processing. It also fixes some long-standing inconsistencies, and introduces some new, useful features. Thanks to 'ICU' (International Components for Unicode) and 'stringi', they are fast, reliable, and portable across different platforms.

Maintained by Marek Gagolewski. Last updated 2 months ago.

icu icu4c natural-language-processing nlp regex regexp string-manipulation stringi text text-processing unicode

23.0 match 28 stars 4.75 score 1 scripts

juba

rainette:The Reinert Method for Textual Data Clustering

An R implementation of the Reinert text clustering method. For more details about the algorithm see the included vignettes or Reinert (1990) <doi:10.1177/075910639002600103>.

Maintained by Julien Barnier. Last updated 11 months ago.

text-analysis text-classification cpp

15.5 match 55 stars 6.90 score 24 scripts

bnosac

textplot:Text Plots

Visualise complex relations in texts. This is done by providing functionalities for displaying text co-occurrence networks, text correlation networks, dependency relationships as well as text clustering and semantic text 'embeddings'. Feel free to join the effort of providing interesting text visualisations.

Maintained by Jan Wijffels. Last updated 3 years ago.

15.7 match 54 stars 6.78 score 75 scripts 1 dependents

bioc

DAPAR:Tools for the Differential Analysis of Proteins Abundance with R

The package DAPAR is a Bioconductor-distributed R package which provides all the necessary functions to analyze quantitative data from label-free proteomics experiments. Contrary to most other similar R packages, it is endowed with rich and user-friendly graphical interfaces, so that no programming skill is required (see `Prostar` package).

Maintained by Samuel Wieczorek. Last updated 5 months ago.

proteomics normalization preprocessing massspectrometry qualitycontrol go dataimport prostar1

19.6 match 2 stars 5.42 score 22 scripts 1 dependents

wilkelab

gridtext:Improved Text Rendering Support for 'Grid' Graphics

Provides support for rendering of formatted text using 'grid' graphics. Text can be formatted via a minimal subset of 'Markdown', 'HTML', and inline 'CSS' directives, and it can be rendered both with and without word wrap.

Maintained by Brenton M. Wiernik. Last updated 1 years ago.

cpp

9.1 match 97 stars 11.55 score 344 scripts 203 dependents

bioc

RCy3:Functions to Access and Control Cytoscape

Visualize, analyze and explore networks using Cytoscape via R. Anything you can do using the graphical user interface of Cytoscape, you can now do with a single RCy3 function.

Maintained by Alex Pico. Last updated 5 months ago.

visualization graphandnetwork thirdpartyclient network

7.8 match 52 stars 13.39 score 628 scripts 15 dependents

dfe-analytical-services

shinyGovstyle:Custom Gov Style Inputs for Shiny

Collection of 'shiny' application styling based on the GOV.UK Design System. See <https://design-system.service.gov.uk/components/> for details.

Maintained by Ross Wyatt. Last updated 14 hours ago.

15.3 match 44 stars 6.74 score 25 scripts

vegandevs

vegan:Community Ecology Package

Ordination methods, diversity analysis and other functions for community and vegetation ecologists.

Maintained by Jari Oksanen. Last updated 18 days ago.

ecological-modelling ecology ordination fortran openblas

5.3 match 472 stars 19.41 score 15k scripts 440 dependents

parklab

Nozzle.R1:Nozzle Reports

The Nozzle package provides an API to generate HTML reports with dynamic user interface elements based on JavaScript and CSS (Cascading Style Sheets). Nozzle was designed to facilitate summarization and rapid browsing of complex results in data analysis pipelines where multiple analyses are performed frequently on big data sets. The package can be applied to any project where user-friendly reports need to be created.

Maintained by Nils Gehlenborg. Last updated 10 years ago.

gehlenborglab html-report reproducible-research

18.9 match 68 stars 5.31 score 10 scripts 2 dependents

r-lib

marquee:Markdown Parser and Renderer for R Graphics

Provides the means to parse and render markdown text with grid, along with facilities to define the styling of the text.

Maintained by Thomas Lin Pedersen. Last updated 2 months ago.

cpp

11.7 match 84 stars 8.54 score 28 scripts 1 dependents

bnosac

textrank:Summarize Text by Ranking Sentences and Finding Keywords

The 'textrank' algorithm is an extension of the 'Pagerank' algorithm for text. The algorithm summarizes text by calculating how sentences are related to one another. This is done by looking at overlapping terminology used in sentences in order to set up links between sentences. The resulting sentence network is then plugged into the 'Pagerank' algorithm, which identifies the most important sentences in your text and ranks them. In a similar way, 'textrank' can also be used to extract keywords. A word network is constructed by checking whether words follow one another. The 'Pagerank' algorithm is applied to that network to extract relevant words, after which relevant words that follow one another are combined to form keywords. More information can be found in the paper from Mihalcea, Rada & Tarau, Paul (2004) <https://www.aclweb.org/anthology/W04-3252/>.

Maintained by Jan Wijffels. Last updated 4 years ago.

natural-language-processing nlp textrank textrank-algorithm

13.5 match 77 stars 7.38 score 103 scripts 2 dependents
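
The sentence-ranking procedure described in the 'textrank' entry above can be sketched as follows. This is a simplified Python illustration rather than the package's R implementation: the helper names are hypothetical, edge weights are raw shared-word counts (the original paper normalizes overlap by sentence length), and the damping factor of 0.85 with a fixed 50 iterations is an assumed default.

```python
import re


def tokenize(sentence):
    """Lowercase a sentence and return its set of word tokens."""
    return set(re.findall(r"[a-z']+", sentence.lower()))


def overlap_matrix(sentences):
    """Edge weight between two distinct sentences = number of shared words."""
    tokens = [tokenize(s) for s in sentences]
    n = len(sentences)
    return [[len(tokens[i] & tokens[j]) if i != j else 0 for j in range(n)]
            for i in range(n)]


def pagerank(weights, damping=0.85, iterations=50):
    """Weighted PageRank by power iteration over a symmetric weight matrix."""
    n = len(weights)
    out_sum = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            # Sentences with no links contribute nothing and only
            # receive the constant teleport term themselves.
            incoming = sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n) if out_sum[j] > 0
            )
            new_scores.append((1 - damping) / n + damping * incoming)
        scores = new_scores
    return scores


def rank_sentences(sentences):
    """Return (score, sentence) pairs, most central sentence first."""
    scores = pagerank(overlap_matrix(sentences))
    return sorted(zip(scores, sentences), reverse=True)


sentences = [
    "the cat sat on the mat",
    "the cat ate the fish",
    "dogs bark loudly",
]
scores = pagerank(overlap_matrix(sentences))
ranked = rank_sentences(sentences)
```

In this toy input the first two sentences share words and reinforce each other, while the third shares nothing and ends up ranked last.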

sticsrpacks

SticsRFiles:Read and Modify 'STICS' Input/Output Files

Manipulating input and output files of the 'STICS' crop model. Files are either 'JavaSTICS' XML files or text files used by the model 'fortran' executable. Most basic functionalities are reading or writing parameter names and values in both XML and text input files, and getting data from output files. Advanced functionalities include XML file generation from XML templates and/or spreadsheets, or text file generation from XML files by using 'xslt' transformation.

Maintained by Patrice Lecharpentier. Last updated 20 days ago.

12.0 match 4 stars 8.27 score 124 scripts

bnosac

ruimtehol:Learn Text 'Embeddings' with 'Starspace'

Wraps the 'StarSpace' library <https://github.com/facebookresearch/StarSpace> allowing users to calculate word, sentence, article, document, webpage, link and entity 'embeddings'. By using the 'embeddings', you can perform text based multi-label classification, find similarities between texts and categories, do collaborative-filtering based recommendation as well as content-based recommendation, find out relations between entities, calculate graph 'embeddings' as well as perform semi-supervised learning and multi-task learning on plain text. The techniques are explained in detail in the paper: 'StarSpace: Embed All The Things!' by Wu et al. (2017), available at <arXiv:1709.03856>.

Maintained by Jan Wijffels. Last updated 1 years ago.

classification embeddings natural-language-processing nlp similarity starspace text-mining cpp

14.5 match 101 stars 6.65 score 44 scripts

dwulff

text2sdg:Detecting UN Sustainable Development Goals in Text

The United Nations’ Sustainable Development Goals (SDGs) have become an important guideline for organisations to monitor and plan their contributions to social, economic, and environmental transformations. The 'text2sdg' package is an open-source analysis package that identifies SDGs in text using scientifically developed query systems, opening up the opportunity to monitor any type of text-based data, such as scientific output or corporate publications. For more information regarding the methodology see Meier, Mata & Wulff (2022) <arXiv:2110.05856>.

Maintained by Dominik S. Meier. Last updated 6 months ago.

natural-language-processing sustainability sustainable-development sustainable-development-goals

15.7 match 18 stars 6.13 score 9 scripts

fhdsl

conrad:Client for the Microsoft's 'Cognitive Services Text to Speech REST' API

Convert text into synthesized speech and get a list of supported voices for a region. Microsoft's 'Cognitive Services Text to Speech REST' API <https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/rest-text-to-speech?tabs=streaming> supports neural text to speech voices, which support specific languages and dialects that are identified by locale.

Maintained by Howard Baik. Last updated 2 months ago.

azure text-to-speech tts

18.2 match 1 stars 5.26 score 2 scripts 3 dependents

tommyjones

textmineR:Functions for Text Mining and Topic Modeling

An aid for text mining in R, with a syntax that should be familiar to experienced R users. Provides a wrapper for several topic models that take similarly-formatted input and give similarly-formatted output. Has additional functionality for analyzing and diagnosing topic models.

Maintained by Tommy Jones. Last updated 2 years ago.

cpp

8.8 match 106 stars 10.83 score 310 scripts 7 dependents

nalimilan

SnowballC:Snowball Stemmers Based on the C 'libstemmer' UTF-8 Library

An R interface to the C 'libstemmer' library that implements Porter's word stemming algorithm for collapsing words to a common root to aid comparison of vocabulary. Currently supported languages are Arabic, Basque, Catalan, Danish, Dutch, English, Finnish, French, German, Greek, Hindi, Hungarian, Indonesian, Irish, Italian, Lithuanian, Nepali, Norwegian, Portuguese, Romanian, Russian, Spanish, Swedish, Tamil and Turkish.

Maintained by Milan Bouchet-Valat. Last updated 19 days ago.

text-mining

7.5 match 27 stars 12.63 score 4.4k scripts 171 dependents

paithiov909

audubon:Japanese Text Processing Tools

A collection of Japanese text processing tools for filling Japanese iteration marks, Japanese character type conversions, segmentation by phrase, and text normalization which is based on rules for the 'Sudachi' morphological analyzer and the 'NEologd' (Neologism dictionary for 'MeCab'). These features are specific to Japanese and are not implemented in 'ICU' (International Components for Unicode).

Maintained by Akiru Kato. Last updated 23 days ago.

japanese javascript

16.9 match 10 stars 5.61 score 3 scripts 1 dependents

harrelfe

Hmisc:Harrell Miscellaneous

Contains many functions useful for data analysis, high-level graphics, utility operations, functions for computing sample size and power, simulation, importing and annotating datasets, imputing missing values, advanced table making, variable clustering, character string manipulation, conversion of R objects to LaTeX and html code, recoding variables, caching, simplified parallel computing, encrypting and decrypting data using a safe workflow, general moving window statistical estimation, and assistance in interpreting principal component analysis.

Maintained by Frank E Harrell Jr. Last updated 2 days ago.

fortran

5.3 match 210 stars 17.61 score 17k scripts 750 dependents

andrisignorell

DescTools:Tools for Descriptive Statistics

A collection of miscellaneous basic statistic functions and convenience wrappers for efficiently describing data. The author's intention was to create a toolbox, which facilitates the (notoriously time consuming) first descriptive tasks in data analysis, consisting of calculating descriptive statistics, drawing graphical summaries and reporting the results. The package furthermore contains functions to produce documents using MS Word (or PowerPoint) and functions to import data from Excel. Many of the included functions can be found scattered in other packages and other sources written partly by Titans of R. The reason for collecting them here was primarily to have them consolidated in ONE instead of dozens of packages (which themselves might depend on other packages which are not needed at all), and to provide a common and consistent interface as far as function and argument naming, NA handling, recycling rules etc. are concerned. Google style guides were used as naming rules (in absence of convincing alternatives). The 'BigCamelCase' style was consequently applied to functions borrowed from contributed R packages as well.

Maintained by Andri Signorell. Last updated 2 days ago.

fortran cpp

5.6 match 87 stars 16.70 score 7.7k scripts 99 dependents

michelnivard

gptstudio:Use Large Language Models Directly in your Development Environment

Large language models are readily accessible via API. This package lowers the barrier to using the API inside your development environment. For more on the API, see <https://platform.openai.com/docs/introduction>.

Maintained by James Wade. Last updated 8 days ago.

chatgpt gpt-3 rstudio rstudio-addin

8.7 match 924 stars 10.83 score 43 scripts 1 dependents

polmine

polmineR:Verbs and Nouns for Corpus Analysis

Package for corpus analysis using the Corpus Workbench ('CWB', <https://cwb.sourceforge.io>) as an efficient back end for indexing and querying large corpora. The package offers functionality to flexibly create subcorpora and to carry out basic statistical operations (count, co-occurrences etc.). The original full text of documents can be reconstructed and inspected at any time. Beyond that, the package is intended to serve as an interface to packages implementing advanced statistical procedures. Respective data structures (document-term matrices, term-co-occurrence matrices etc.) can be created based on the indexed corpora.

Maintained by Andreas Blaette. Last updated 1 years ago.

11.8 match 49 stars 7.96 score 311 scripts

mjockers

syuzhet:Extracts Sentiment and Sentiment-Derived Plot Arcs from Text

Extracts sentiment and sentiment-derived plot arcs from text using a variety of sentiment dictionaries conveniently packaged for consumption by R users. Implemented dictionaries include "syuzhet" (default) developed in the Nebraska Literary Lab, "afinn" developed by Finn Årup Nielsen, "bing" developed by Minqing Hu and Bing Liu, and "nrc" developed by Mohammad, Saif M. and Turney, Peter D. Applicable references are available in README.md and in the documentation for the "get_sentiment" function. The package also provides a hack for implementing Stanford's coreNLP sentiment parser. The package provides several methods for plot arc normalization.

Maintained by Matthew Jockers. Last updated 2 years ago.

7.2 match 336 stars 12.92 score 1.4k scripts 31 dependents

brodieg

diffobj:Diffs for R Objects

Generate a colorized diff of two R objects for an intuitive visualization of their differences.

Maintained by Brodie Gaslam. Last updated 3 years ago.

diff

7.1 match 232 stars 13.12 score 107 scripts 486 dependents

kwb-r

kwb.utils:General Utility Functions Developed at KWB

This package contains some small helper functions that aim at improving the quality of code developed at Kompetenzzentrum Wasser gGmbH (KWB).

Maintained by Hauke Sonnenberg. Last updated 12 months ago.

12.7 match 8 stars 7.33 score 12 scripts 78 dependents

gforge

htmlTable:Advanced Tables for Markdown/HTML

Tables with state-of-the-art layout elements such as row spanners, column spanners, table spanners, zebra striping, and more. While allowing advanced layout, the underlying css-structure is simple in order to maximize compatibility with common word processors. The package also contains a few text formatting functions that help outputting text compatible with HTML/LaTeX.

Maintained by Max Gordon. Last updated 8 months ago.

knitr table

6.0 match 79 stars 15.32 score 1.3k scripts 763 dependents

tidymodels

textrecipes:Extra 'Recipes' for Text Processing

Converting text to numerical features requires specifically created procedures, which are implemented as steps according to the 'recipes' package. These steps allow for tokenization, filtering, counting (tf and tfidf) and feature hashing.

Maintained by Emil Hvitfeldt. Last updated 11 days ago.

8.4 match 160 stars 10.87 score 964 scripts 1 dependents

dgerbing

lessR:Less Code, More Results

Each function replaces multiple standard R functions. For example, two function calls, Read() and CountAll(), generate summary statistics for all variables in the data frame, plus histograms and bar charts as appropriate. Other functions provide for summary statistics via pivot tables, a comprehensive regression analysis, ANOVA and t-test, visualizations including the Violin/Box/Scatter plot for a numerical variable, bar chart, histogram, box plot, density curves, calibrated power curve, reading multiple data formats with the same function call, variable labels, time series with aggregation and forecasting, color themes, and Trellis (facet) graphics. Also includes a confirmatory factor analysis of multiple indicator measurement models, pedagogical routines for data simulation such as for the Central Limit Theorem, generation and rendering of regression instructions for interpretative output, and interactive visualizations.

Maintained by David W. Gerbing. Last updated 1 days ago.

12.3 match 6 stars 7.42 score 394 scripts 3 dependents

ropensci

ijtiff:Comprehensive TIFF I/O with Full Support for 'ImageJ' TIFF Files

General purpose TIFF file I/O for R users. Currently the only such package with read and write support for TIFF files with floating point (real-numbered) pixels, and the only package that can correctly import TIFF files that were saved from 'ImageJ' and write TIFF files that can be correctly read by 'ImageJ' <https://imagej.net/ij/>. Also supports text image I/O.

Maintained by Rory Nolan. Last updated 8 days ago.

image-manipulation imagej peer-reviewed tiff-files tiff-images tiff

10.1 match 18 stars 8.97 score 36 scripts 7 dependents

quanteda

quanteda.textmodels:Scaling Models and Classifiers for Textual Data

Scaling models and classifiers for sparse matrix objects representing textual data in the form of a document-feature matrix. Includes original implementations of 'Laver', 'Benoit', and Garry's (2003) <doi:10.1017/S0003055403000698>, 'Wordscores' model, the Perry and 'Benoit' (2017) <doi:10.48550/arXiv.1710.08963> class affinity scaling model, and the 'Slapin' and 'Proksch' (2008) <doi:10.1111/j.1540-5907.2008.00338.x> 'wordfish' model, as well as methods for correspondence analysis, latent semantic analysis, and fast Naive Bayes and linear 'SVMs' specially designed for sparse textual data.

Maintained by Kenneth Benoit. Last updated 1 months ago.

openblas cpp

9.5 match 42 stars 9.56 score 432 scripts

r-spatial

sf:Simple Features for R

Support for simple feature access, a standardized way to encode and analyze spatial vector data. Binds to 'GDAL' <doi: 10.5281/zenodo.5884351> for reading and writing data, to 'GEOS' <doi: 10.5281/zenodo.11396894> for geometrical operations, and to 'PROJ' <doi: 10.5281/zenodo.5884394> for projection conversions and datum transformations. Uses by default the 's2' package for geometry operations on geodetic (long/lat degree) coordinates.

Maintained by Edzer Pebesma. Last updated 18 days ago.

gdal geos proj spatial cpp

4.0 match 1.4k stars 22.42 score 117k scripts 1.2k dependents

janmarvin

openxlsx2:Read, Write and Edit 'xlsx' Files

Simplifies the creation of 'xlsx' files by providing a high level interface to writing, styling and editing worksheets.

Maintained by Jan Marvin Garbuszus. Last updated 12 hours ago.

xlsx cpp

6.5 match 138 stars 13.66 score 194 scripts 11 dependents

cpsievert

LDAvis:Interactive Visualization of Topic Models

Tools to create an interactive web-based visualization of a topic model that has been fit to a corpus of text data using Latent Dirichlet Allocation (LDA). Given the estimated parameters of the topic model, it computes various summary statistics as input to an interactive visualization built with 'D3.js' that is accessed via a browser. The goal is to help users interpret the topics in their 'LDA' topic model.

Maintained by Carson Sievert. Last updated 7 years ago.

javascript text-mining topic-modeling visualization

8.0 match 558 stars 10.93 score 804 scripts 1 dependents

alexkowa

EnvStats:Package for Environmental Statistics, Including US EPA Guidance

Graphical and statistical analyses of environmental data, with focus on analyzing chemical concentrations and physical parameters, usually in the context of mandated environmental monitoring. Major environmental statistical methods found in the literature and regulatory guidance documents, with extensive help that explains what these methods do, how to use them, and where to find them in the literature. Numerous built-in data sets from regulatory guidance documents and environmental statistics literature. Includes scripts reproducing analyses presented in the book "EnvStats: An R Package for Environmental Statistics" (Millard, 2013, Springer, ISBN 978-1-4614-8455-4, <doi:10.1007/978-1-4614-8456-1>).

Maintained by Alexander Kowarik. Last updated 18 days ago.

6.8 match 26 stars 12.80 score 2.4k scripts 46 dependents

jhk0530

gemini.R:Interface for 'Google Gemini' API

Provides a comprehensive interface for Google Gemini API, enabling users to access and utilize Gemini Large Language Model (LLM) functionalities directly from R. This package facilitates seamless integration with Google Gemini, allowing for advanced language processing, text generation, and other AI-driven capabilities within the R environment. For more information, please visit <https://ai.google.dev/docs/gemini_api_overview>.

Maintained by Jinhwan Kim. Last updated 18 hours ago.

12.9 match 68 stars 6.69 score 37 scripts 1 dependents

dbosak01

reporter:Creates Statistical Reports

Contains functions to create regulatory-style statistical reports. Originally designed to create tables, listings, and figures for the pharmaceutical, biotechnology, and medical device industries, these reports are generalized enough that they could be used in any industry. Generates text, rich-text, PDF, HTML, and Microsoft Word file formats. The package specializes in printing wide and long tables with automatic page wrapping and splitting. Reports can be produced with a minimum of function calls, and without relying on other table packages. The package supports titles, footnotes, page headers, page footers, spanning headers, page by variables, and automatic page numbering.

Maintained by David Bosak. Last updated 12 months ago.

report reporting reports rptr

9.2 match 16 stars 9.35 score 173 scripts 4 dependents

johndharrison

seleniumPipes:R Client Implementing the W3C WebDriver Specification

The W3C WebDriver specification defines a way for out-of-process programs to remotely instruct the behaviour of web browsers. It is detailed at <https://w3c.github.io/webdriver/webdriver-spec.html>. This package provides an R client implementing the W3C specification.

Maintained by John Harrison. Last updated 8 years ago.

12.9 match 54 stars 6.66 score 168 scripts

privefl

bigreadr:Read Large Text Files

Read large text files by splitting them into smaller files. Package 'bigreadr' also provides some convenient wrappers around fread() and fwrite() from package 'data.table'.

Maintained by Florian Privé. Last updated 2 years ago.

large-dataset read-csv cpp

11.1 match 42 stars 7.78 score 636 scripts 4 dependents

andrewheiss

quRan:Complete Text of the Qur'an

Full text, in data frames containing one row per verse, of the Qur'an in Arabic (with and without vowels) and in English (the Yusuf Ali and Saheeh International translations), formatted to be convenient for text analysis.

Maintained by Andrew Heiss. Last updated 6 years ago.

islam quran text-mining tidytext

19.1 match 29 stars 4.44 score 19 scripts

rolkra

utf8ify:Format Text Using Unicode

Format text (bold, italic, ...) and numbers using UTF-8. Offers functions to search for emojis and include them in your text.

Maintained by Roland Krasser. Last updated 2 months ago.

19.7 match 2 stars 4.30 score

quanteda

stopwords:Multilingual Stopword Lists

Provides multiple sources of stopwords, for use in text analysis and natural language processing.

Maintained by Kenneth Benoit. Last updated 3 years ago.

text-analysis

8.1 match 114 stars 10.54 score 1.1k scripts 65 dependents

nmfs-ost

asar:Build NOAA Stock Assessment Report

Build a full or update stock assessment report for any stock assessment model. Parameterization allows the user to call a template based on their regional science center, species, area, etc.

Maintained by Samantha Schiano. Last updated 8 days ago.

latex quarto stock-assessment-reports

12.3 match 21 stars 6.87 score 3 scripts

guangchuangyu

shadowtext:Shadow Text Grob and Layer

Implement shadowtextGrob() for 'grid' and geom_shadowtext() layer for 'ggplot2'. These functions create/draw a text grob with a background shadow.

Maintained by Guangchuang Yu. Last updated 2 months ago.

8.0 match 38 stars 10.60 score 552 scripts 9 dependents

ropensci

magick:Advanced Graphics and Image-Processing in R

Bindings to 'ImageMagick': the most comprehensive open-source image processing library available. Supports many common formats (png, jpeg, tiff, pdf, etc) and manipulations (rotate, scale, crop, trim, flip, blur, etc). All operations are vectorized via the Magick++ STL meaning they operate either on a single frame or a series of frames for working with layers, collages, or animation. In RStudio images are automatically previewed when printed to the console, resulting in an interactive editing environment. The latest version of the package includes a native graphics device for creating in-memory graphics or drawing onto images using pixel coordinates.

Maintained by Jeroen Ooms. Last updated 22 days ago.

image-manipulation image-processing imagemagick cpp

4.8 match 468 stars 17.31 score 9.0k scripts 256 dependents

wraff

wrMisc:Analyze Experimental High-Throughput (Omics) Data

This collection of diverse functions facilitates the efficient treatment and convenient analysis of experimental high-throughput (omics) data. Several functions address advanced object conversions, like manipulating lists of lists or lists of arrays, reorganizing lists to arrays or into separate vectors, merging of multiple entries, etc. Another set of functions provides speed-optimized calculation of standard deviation (sd), coefficient of variation (CV) or standard error of the mean (SEM) for data in matrixes or means per line with respect to additional grouping (e.g. n groups of replicates). A group of functions facilitates dealing with non-redundant information, by indexing unique entries, adding counters to redundant entries or eliminating lines with respect to redundancy in a given reference column, etc. Help is provided to identify very closely matching numeric values to generate (partial) distance matrixes for very big data in a memory-efficient manner or to reduce the complexity of large data sets by combining very close values. Other functions help align a matrix or data.frame to a reference using partial matching or mine an experimental setup to extract patterns of replicate samples. Large experimental datasets often need additional filtering, and adequate functions are provided. Convenient data normalization is supported in various modes; parameter estimation via permutations or bootstrap as well as flexible testing of multiple pairwise combinations using the framework of 'limma' are provided, too. Batch reading (or writing) of sets of files and combining data to arrays is also supported.

Maintained by Wolfgang Raffelsberger. Last updated 7 months ago.

18.8 match 4.44 score 33 scripts 4 dependents

nalimilan

tm.plugin.alceste:Import Texts from Files in the 'Alceste' Format Using the 'tm' Text Mining Framework

Provides a 'tm' Source to create corpora from a corpus prepared in the format used by the 'Alceste' application (i.e. a single text file with inline meta-data). It is able to import both text contents and meta-data (starred) variables.

Maintained by Milan Bouchet-Valat. Last updated 19 days ago.

text-mining

16.3 match 27 stars 5.08 score 5 scripts 1 dependents

adamspannbauer

lexRankr:Extractive Summarization of Text with the LexRank Algorithm

An R implementation of the LexRank algorithm described by G. Erkan and D. R. Radev (2004) <DOI:10.1613/jair.1523>.

Maintained by Adam Spannbauer. Last updated 2 years ago.

lexrank lexrank-algorithm nlp rstat cpp

14.3 match 21 stars 5.81 score 61 scripts

tidyverse

rvest:Easily Harvest (Scrape) Web Pages

Wrappers around the 'xml2' and 'httr' packages to make it easy to download, then manipulate, HTML and XML.

Maintained by Hadley Wickham. Last updated 5 months ago.

html web-scraping

4.2 match 1.5k stars 19.62 score 29k scripts 546 dependents

bioc

GeDi:Defining and visualizing the distances between different genesets

The package provides different distance measures to calculate the difference between genesets. Based on these scores the genesets are clustered and visualized as a graph. This is all presented in an interactive Shiny application for easy usage.

Maintained by Annekathrin Nedwed. Last updated 5 months ago.

gui genesetenrichment software transcription rnaseq visualization clustering pathways reportwriting go kegg reactome shinyapps

14.9 match 1 stars 5.52 score 22 scripts

atorus-research

pharmaRTF:Enhanced RTF Wrapper for Use with Existing Table Packages

Enhanced RTF wrapper written in R for use with existing R tables packages such as 'Huxtable' or 'GT'. This package fills a gap where tables in certain packages can be written out to RTF, but cannot add certain metadata or features to the document that are required/expected in a report for a regulatory submission, such as multiple levels of titles and footnotes, making the document landscape, and controlling properties such as margins.

Maintained by Michael Stackhouse. Last updated 3 years ago.

10.2 match 33 stars 8.01 score 128 scripts 2 dependents

jthomasmock

gtExtras:Extending 'gt' for Beautiful HTML Tables

Provides additional functions for creating beautiful tables with 'gt'. The functions are generally wrappers around boilerplate or add opinionated niche capabilities and helper functions.

Maintained by Thomas Mock. Last updated 12 months ago.

data-science data-visualization datascience ggplot2 gt plots sparkline sparkline-graphs sparklines tables

7.1 match 199 stars 11.45 score 2.4k scripts 3 dependents

docma-tu

tosca:Tools for Statistical Content Analysis

A framework for statistical analysis in content analysis. In addition to a pipeline for preprocessing text corpora and linking to the latent Dirichlet allocation from the 'lda' package, plots are offered for the descriptive analysis of text corpora and topic models. In addition, an implementation of Chang's intruder words and intruder topics is provided. Sample data for the vignette is included in the toscaData package, which is available on GitHub: <https://github.com/Docma-TU/toscaData>.

Maintained by Lars Koppers. Last updated 3 years ago.

12.2 match 16 stars 6.64 score 61 scripts 1 dependents

ropensci

rtika:R Interface to 'Apache Tika'

Extract text or metadata from over a thousand file types, using Apache Tika <https://tika.apache.org/>. Get either plain text or structured XHTML content.

Maintained by Sasha Goodman. Last updated 2 years ago.

extract-metadata extract-text java parse pdf-files peer-reviewed tesseract tika

13.4 match 55 stars 6.00 score 12 scripts

miserman

lingmatch:Linguistic Matching and Accommodation

Measure similarity between texts. Offers a variety of processing tools and similarity metrics to facilitate flexible representation of texts and matching. Implements forms of Language Style Matching (Ireland & Pennebaker, 2010) <doi:10.1037/a0020386> and Latent Semantic Analysis (Landauer & Dumais, 1997) <doi:10.1037/0033-295X.104.2.211>.

Maintained by Micah Iserman. Last updated 28 days ago.

nlp rcpp text-analysis cpp

16.5 match 11 stars 4.80 score 23 scripts

strategicprojects

pikchr:R Wrapper for 'pikchr' (PIC) Diagram Language

An 'R' interface to 'pikchr' (<https://pikchr.org>, pronounced “picture”), a 'PIC'-like markup language for creating diagrams within technical documentation. Originally developed by Brian Kernighan, 'PIC' has been adapted into 'pikchr' by D. Richard Hipp, the creator of 'SQLite'. 'pikchr' is designed to be embedded in fenced code blocks of Markdown or other documentation markup languages, making it ideal for generating diagrams in text-based formats. This package allows R users to seamlessly integrate the descriptive syntax of 'pikchr' for diagram creation directly within the 'R' environment.

Maintained by Andre Leite. Last updated 25 days ago.

16.4 match 1 stars 4.85 score 7 scripts

teunbrand

ggh4x:Hacks for 'ggplot2'

A 'ggplot2' extension that does a variety of little helpful things. The package extends 'ggplot2' facets through customisation, by setting individual scales per panel, resizing panels and providing nested facets. Also allows multiple colour and fill scales per plot. Also hosts a smaller collection of stats, geoms and axis guides.

Maintained by Teun van den Brand. Last updated 3 months ago.

ggplot-extension ggplot2

5.6 match 616 stars 13.98 score 4.4k scripts 20 dependents

henriquesposito

poldis:Analyse Political Texts

Wrangle and annotate different types of political texts. It also introduces Urgency Analysis, a new method for the analysis of urgency in political texts.

Maintained by Henrique Sposito. Last updated 7 months ago.

19.9 match 3 stars 3.95 score 4 scripts

quadrama

DramaAnalysis:Analysis of Dramatic Texts

Analysis of preprocessed dramatic texts, with respect to literary research. The package provides functions to analyze and visualize information about characters, stage directions, the dramatic structure and the text itself. The dramatic texts are expected to be in CSV format, which can be installed from within the package; sample texts are provided. The package and the reasoning behind it are described in Reiter et al. (2017) <doi:10.18420/in2017_119>.

Maintained by Nils Reiter. Last updated 4 years ago.

corpus-linguistics digital-humanities drama dramatic-texts statistics

16.4 match 15 stars 4.79 score 41 scripts

bioc

fobitools:Tools for Manipulating the FOBI Ontology

A set of tools for interacting with the Food-Biomarker Ontology (FOBI). A collection of basic manipulation tools for biological significance analysis, graphs, and text mining strategies for annotating nutritional data.

Maintained by Pol Castellano-Escuder. Last updated 4 months ago.

massspectrometry metabolomics software visualization biomedicalinformatics graphandnetwork annotation cheminformatics pathways genesetenrichment biological-intrerpretation biological-knowledge biological-significance-analysis enrichment-analysis food-biomarker-ontology knowledge-graph nutrition obofoundry ontology text-mining

15.4 match 1 stars 5.08 score 5 scripts

gforge

forestplot:Advanced Forest Plot Using 'grid' Graphics

Allows the creation of forest plots with advanced features, such as multiple confidence intervals per row, customizable fonts for individual text elements, and flexible confidence interval drawing. It also supports mixing text with mathematical expressions. The package extends the application of forest plots beyond traditional meta-analyses, offering a more general version of the original 'rmeta' package’s forestplot() function. It relies heavily on the 'grid' package for rendering the plots.

Maintained by Max Gordon. Last updated 4 months ago.

forestplot

6.7 match 43 stars 11.60 score 716 scripts 22 dependents

kgjerde

corporaexplorer:A 'Shiny' App for Exploration of Text Collections

Facilitates dynamic exploration of text collections through an intuitive graphical user interface and the power of regular expressions. The package contains 1) a helper function to convert a data frame to a 'corporaexplorerobject' and 2) a 'Shiny' app for fast and flexible exploration of a 'corporaexplorerobject'. The package also includes demo apps with which one can explore Jane Austen's novels and the State of the Union Addresses (data from the 'janeaustenr' and 'sotu' packages respectively).

Maintained by Kristian Lundby Gjerde. Last updated 7 months ago.

corpora corpus shiny text-analysis

14.5 match 65 stars 5.39 score 38 scripts

adayim

forestploter:Create a Flexible Forest Plot

Create a forest plot based on the layout of the data. Confidence intervals in multiple columns by groups can be done easily. Supports editing the plot, inserting/adding text, applying a theme to the plot, and much more.

Maintained by Alimu Dayimu. Last updated 6 months ago.

forestplot

8.4 match 93 stars 9.31 score 207 scripts 4 dependents

psychbruce

PsychWordVec:Word Embedding Research Framework for Psychological Science

An integrative toolbox of word embedding research that provides: (1) a collection of 'pre-trained' static word vectors in the '.RData' compressed format <https://psychbruce.github.io/WordVector_RData.pdf>; (2) a series of functions to process, analyze, and visualize word vectors; (3) a range of tests to examine conceptual associations, including the Word Embedding Association Test <doi:10.1126/science.aal4230> and the Relative Norm Distance <doi:10.1073/pnas.1720347115>, with permutation test of significance; (4) a set of training methods to locally train (static) word vectors from text corpora, including 'Word2Vec' <arXiv:1301.3781>, 'GloVe' <doi:10.3115/v1/D14-1162>, and 'FastText' <arXiv:1607.04606>; (5) a group of functions to download 'pre-trained' language models (e.g., 'GPT', 'BERT') and extract contextualized (dynamic) word vectors (based on the R package 'text').

Maintained by Han-Wu-Shuang Bao. Last updated 1 years ago.

19.1 match 22 stars 4.04 score 10 scripts

vosonlab

vosonSML:Collecting Social Media Data and Generating Networks for Analysis

A suite of easy to use functions for collecting social media data and generating networks for analysis. Supports Mastodon, YouTube, Reddit and Web 1.0 data sources.

Maintained by Bryan Gertzel. Last updated 8 months ago.

hyperlink mastodon network-graph reddit sna social-media social-network-analysis voson youtube

10.0 match 79 stars 7.67 score 66 scripts 1 dependents

ropensci

antiword:Extract Text from Microsoft Word Documents

Wraps the 'AntiWord' utility to extract text from Microsoft Word documents. The utility only supports the old 'doc' format, not the new XML-based 'docx' format. Use the 'xml2' package to read the latter.

Maintained by Jeroen Ooms. Last updated 6 months ago.

antiword extract-text

11.0 match 59 stars 6.98 score 7 scripts 7 dependents

great-northern-diver

loon:Interactive Statistical Data Visualization

An extendable toolkit for interactive data visualization and exploration.

Maintained by R. Wayne Oldford. Last updated 2 years ago.

data-analysis data-science data-visualization exploratory-analysis exploratory-data-analysis high-dimensional-data interactive-graphics interactive-visualizations loon python statistical-analysis statistical-graphics statistics tcl-extension tk

8.4 match 48 stars 9.00 score 93 scripts 5 dependents

rstudio

rstudioapi:Safely Access the RStudio API

Access the RStudio API (if available) and provide informative error messages when it's not.

Maintained by Kevin Ushey. Last updated 4 months ago.

4.0 match 172 stars 18.81 score 3.6k scripts 2.1k dependents

chaoliu-cl

textAnnotatoR:Interactive Text Annotation Tool with 'shiny' GUI

A comprehensive text annotation tool built with 'shiny'. Provides an interactive graphical user interface for coding text documents, managing code hierarchies, creating memos, and analyzing coding patterns. Features include code co-occurrence analysis, visualization of coding patterns, comparison of multiple coding sets, and export capabilities. Supports collaborative qualitative research through standardized annotation formats and analysis tools.

Maintained by Chao Liu. Last updated 4 months ago.

17.3 match 4.30 score 5 scripts

alexkz

kernlab:Kernel-Based Machine Learning Lab

Kernel-based machine learning methods for classification, regression, clustering, novelty detection, quantile regression and dimensionality reduction. Among other methods 'kernlab' includes Support Vector Machines, Spectral Clustering, Kernel PCA, Gaussian Processes and a QP solver.

Maintained by Alexandros Karatzoglou. Last updated 7 months ago.

openblas cpp

6.0 match 21 stars 12.26 score 7.8k scripts 487 dependents

rstudio

blastula:Easily Send HTML Email Messages

Compose and send out responsive HTML email messages that render perfectly across a range of email clients and device sizes. Helper functions let the user insert embedded images, web link buttons, and 'ggplot2' plot objects into the message body. Messages can be sent through an 'SMTP' server, through the 'Posit Connect' service, or through the 'Mailgun' API service <https://www.mailgun.com/>.

Maintained by Richard Iannone. Last updated 8 months ago.

easy-to-use email html markdown responsive-email smtp

7.1 match 552 stars 10.27 score 348 scripts 5 dependents

wilkox

gggenes:Draw Gene Arrow Maps in 'ggplot2'

A 'ggplot2' extension for drawing gene arrow maps.

Maintained by David Wilkins. Last updated 1 years ago.

genetics ggplot2

6.9 match 525 stars 10.54 score 372 scripts 2 dependents

andrie

surveydata:Tools to Work with Survey Data

Data obtained from surveys contains information not only about the survey responses, but also the survey metadata, e.g. the original survey questions and the answer options. The 'surveydata' package makes it easy to keep track of this metadata, and to easily extract columns with specific questions.

Maintained by Andrie de Vries. Last updated 2 years ago.

12.5 match 23 stars 5.68 score 42 scripts

julienmoeys

soiltexture:Functions for Soil Texture Plot, Classification and Transformation

"The Soil Texture Wizard" is a set of R functions designed to produce texture triangles (also called texture plots, texture diagrams, texture ternary plots), and to classify and transform soil texture data. These functions allow plotting virtually any soil texture triangle (classification) in any triangle geometry (isosceles, right-angled triangles, etc.). This set of functions is expected to be useful to people using soil texture data from different soil texture classifications or different particle size systems. Many (> 15) texture triangles from all around the world are predefined in the package. A simple text-based graphical user interface is provided: soiltexture_gui().

Maintained by Julien Moeys. Last updated 1 years ago.

10.0 match 28 stars 7.11 score 136 scripts 1 dependents

sentometricsresearch

sentometrics:An Integrated Framework for Textual Sentiment Time Series Aggregation and Prediction

Optimized prediction based on textual sentiment, accounting for the intrinsic challenge that sentiment can be computed and pooled across texts and time in various ways. See Ardia et al. (2021) <doi:10.18637/jss.v099.i02>.

Maintained by Samuel Borms. Last updated 4 years ago.

nlp prediction sentiment-analysis text-mining time-series openblas cpp openmp

11.6 match 83 stars 6.09 score 49 scripts

hadley

plyr:Tools for Splitting, Applying and Combining Data

A set of tools that solves a common set of problems: you need to break a big problem down into manageable pieces, operate on each piece and then put all the pieces back together. For example, you might want to fit a model to each spatial location or time point in your study, summarise data by panels or collapse high-dimensional arrays to simpler summary statistics. The development of 'plyr' has been generously supported by 'Becton Dickinson'.

Maintained by Hadley Wickham. Last updated 4 months ago.

cpp

3.9 match 500 stars 18.16 score 83k scripts 3.3k dependents

bioc

rWikiPathways:rWikiPathways - R client library for the WikiPathways API

Use this package to interface with the WikiPathways API. It provides programmatic access to WikiPathways content in multiple data and image formats, including official monthly release files and convenient GMT read/write functions.

Maintained by Egon Willighagen. Last updated 5 months ago.

visualization graphandnetwork thirdpartyclient network metabolomics bioinformatics data-access pathways

7.6 match 15 stars 9.23 score 131 scripts 3 dependents

ropensci

EndoMineR:Functions to mine endoscopic and associated pathology datasets

This package comprises the functions that are used to clean up endoscopic reports and pathology reports as well as many of the scripts used for analysis. The scripts assume the endoscopy and histopathology data sets are already merged, but they can also be used with the unmerged datasets.

Maintained by Sebastian Zeki. Last updated 7 months ago.

endoscopy gastroenterology peer-reviewed semi-structured-data text-mining

12.9 match 13 stars 5.47 score 30 scripts

appsilon

shiny.semantic:Semantic UI Support for Shiny

Creating a great user interface for your Shiny apps can be a hassle, especially if you want to work purely in R and don't want to use, for instance, HTML templates. This package adds support for the powerful UI library Fomantic UI <https://fomantic-ui.com/> (previously Semantic). It also supports universal UI input binding that works with various DOM elements.

Maintained by Jakub Nowicki. Last updated 11 months ago.

appsilon fomantic-ui rhinoverse semantic semantic-components semantic-ui shiny

5.3 match 506 stars 13.00 score 586 scripts 3 dependents

ajrgodfrey

BrailleR:Improved Access for Blind Users

Blind users do not have access to the graphical output from R without printing the content of graphics windows to an embosser of some kind. This is not as immediate as is required for efficient access to statistical output. The functions here are created so that blind people can make even better use of R. This includes the text descriptions of graphs, convenience functions to replace the functionality offered in many GUI front ends, and experimental functionality for optimising graphical content to prepare it for embossing as tactile images.

Maintained by A. Jonathan R. Godfrey. Last updated 11 months ago.

7.8 match 123 stars 8.90 score 143 scripts

ropensci

epubr:Read EPUB File Metadata and Text

The 'epubr' package provides functions supporting the reading and parsing of internal e-book content from EPUB files. E-book metadata and text content are parsed separately and joined together in a tidy, nested tibble data frame. E-book formatting is not completely standardized across all literature. It can be challenging to curate parsed e-book content across an arbitrary collection of e-books perfectly and in completely general form, to yield a singular, consistently formatted output. Many EPUB files do not even contain all the same pieces of information in their respective metadata. EPUB file parsing functionality in this package is intended for relatively general application to arbitrary EPUB e-books. However, poorly formatted e-books or e-books with highly uncommon formatting may not work with this package. There may even be cases where an EPUB file has DRM or some other property that makes it impossible to read with 'epubr'. Text is read 'as is' for the most part. The only nominal changes are minor substitutions, for example curly quotes changed to straight quotes. Substantive changes are expected to be performed subsequently by the user as part of their text analysis. Additional text cleaning can be performed at the user's discretion, such as with functions from packages like 'tm' or 'qdap'.

Maintained by Matthew Leonawicz. Last updated 6 months ago.

epub epub-files epub-format peer-reviewed

10.8 match 24 stars 6.37 score 49 scripts

ggobi

GGally:Extension to 'ggplot2'

The R package 'ggplot2' is a plotting system based on the grammar of graphics. 'GGally' extends 'ggplot2' by adding several functions to reduce the complexity of combining geometric objects with transformed data. Some of these functions include a pairwise plot matrix, a two group pairwise plot matrix, a parallel coordinates plot, a survival plot, and several functions to plot networks.

Maintained by Barret Schloerke. Last updated 10 months ago.

4.3 match 597 stars 16.15 score 17k scripts 154 dependents

nbarrowman

vtree:Display Information About Nested Subsets of a Data Frame

A tool for calculating and drawing "variable trees". Variable trees display information about nested subsets of a data frame.

Maintained by Nick Barrowman. Last updated 3 days ago.

data-science data-visualization exploratory-data-analysis statistics

9.6 match 76 stars 7.09 score 65 scripts

cysouw

qlcMatrix:Utility Sparse Matrix Functions for Quantitative Language Comparison

Extension of the functionality of the 'Matrix' package for using sparse matrices. Some of the functions are very general, while others are highly specific to the special data formats used for quantitative language comparison.

Maintained by Michael Cysouw. Last updated 9 months ago.

9.7 match 6 stars 6.98 score 256 scripts 1 dependents

ropensci

beastier:Call 'BEAST2'

'BEAST2' (<https://www.beast2.org>) is a widely used Bayesian phylogenetic tool that uses DNA/RNA/protein data and many model priors to create a posterior of jointly estimated phylogenies and parameters. 'BEAST2' is a command-line tool. This package provides a way to call 'BEAST2' from an 'R' function call.

Maintained by Richèl J.C. Bilderbeek. Last updated 24 days ago.

bayesian beast beast2 phylogenetic-inference phylogenetics openjdk

8.6 match 11 stars 7.87 score 47 scripts 4 dependents

wilkelab

cowplot:Streamlined Plot Theme and Plot Annotations for 'ggplot2'

Provides various features that help with creating publication-quality figures with 'ggplot2', such as a set of themes, functions to align plots and arrange them into complex compound figures, and functions that make it easy to annotate plots and/or mix plots with images. The package was originally written for internal use in the Wilke lab, hence the name (Claus O. Wilke's plot package). It has also been used extensively in the book Fundamentals of Data Visualization.

Maintained by Claus O. Wilke. Last updated 2 months ago.

3.5 match 714 stars 18.83 score 75k scripts 1.4k dependents

r-lib

gmailr:Access the 'Gmail' 'RESTful' API

An interface to the 'Gmail' 'RESTful' API. Allows access to your 'Gmail' messages, threads, drafts and labels.

Maintained by Jennifer Bryan. Last updated 1 year ago.

5.8 match 230 stars 11.49 score 289 scripts 1 dependents

fabrice-rossi

mixvlmc:Variable Length Markov Chains with Covariates

Estimates Variable Length Markov Chains (VLMC) models and VLMC with covariates models from discrete sequences. Supports model selection via information criteria and simulation of new sequences from an estimated model. See Bühlmann, P. and Wyner, A. J. (1999) <doi:10.1214/aos/1018031204> for VLMC and Zanin Zambom, A., Kim, S. and Lopes Garcia, N. (2022) <doi:10.1111/jtsa.12615> for VLMC with covariates.

Maintained by Fabrice Rossi. Last updated 11 months ago.

machine-learning markov-chain markov-model statistics time-series cpp

10.7 match 2 stars 6.23 score 20 scripts

bnosac

tokenizers.bpe:Byte Pair Encoding Text Tokenization

Unsupervised text tokenizer focused on computational efficiency. Wraps the 'YouTokenToMe' library <https://github.com/VKCOM/YouTokenToMe> which is an implementation of fast Byte Pair Encoding (BPE) <https://aclanthology.org/P16-1162/>.

Maintained by Jan Wijffels. Last updated 2 years ago.

bpe byte-pair-encoding text-mining tokenization cpp

14.5 match 15 stars 4.56 score 48 scripts

tidyverse

readr:Read Rectangular Text Data

The goal of 'readr' is to provide a fast and friendly way to read rectangular data (like 'csv', 'tsv', and 'fwf'). It is designed to flexibly parse many types of data found in the wild, while still cleanly failing when data unexpectedly changes.

Maintained by Jennifer Bryan. Last updated 8 months ago.

csv fwf parsing cpp

3.1 match 1.0k stars 21.03 score 132k scripts 2.0k dependents

coolbutuseless

tickle:Easily Build Tcl/Tk UIs

Wrap tcltk to make GUI creation easier.

Maintained by mikefc. Last updated 3 years ago.

11.1 match 125 stars 5.88 score 11 scripts

appsilon

shiny.fluent:Microsoft Fluent UI for Shiny Apps

A rich set of UI components for building Shiny applications, including inputs, containers, overlays, menus, and various utilities. All components from Fluent UI (the underlying JavaScript library) are available and have usage examples in R.

Maintained by Jakub Sobolewski. Last updated 10 months ago.

microsoft-fluent-ui react rhinoverse shiny

6.6 match 280 stars 9.91 score 656 scripts

r-lib

xml2:Parse XML

Bindings to 'libxml2' for working with XML data using a simple, consistent interface based on 'XPath' expressions. Also supports XML schema validation; for 'XSLT' transformations see the 'xslt' package.

Maintained by Jeroen Ooms. Last updated 5 days ago.

libxml2 xml cpp

3.5 match 220 stars 18.52 score 6.3k scripts 2.3k dependents

r-tmap

tmap:Thematic Maps

Thematic maps are geographical maps in which spatial data distributions are visualized. This package offers a flexible, layer-based, and easy to use approach to create thematic maps, such as choropleths and bubble maps.

Maintained by Martijn Tennekes. Last updated 6 days ago.

choropleth-maps maps spatial thematic-maps visualisation

3.9 match 880 stars 16.73 score 13k scripts 24 dependents

mannau

boilerpipeR:Interface to the Boilerpipe Java Library

Generic Extraction of main text content from HTML files; removal of ads, sidebars and headers using the boilerpipe <https://github.com/kohlschutter/boilerpipe> Java library. The extraction heuristics from boilerpipe show a robust performance for a wide range of web site templates.

Maintained by Mario Annau. Last updated 4 years ago.

openjdk

11.7 match 22 stars 5.52 score 30 scripts

nalimilan

tm.plugin.factiva:Import Articles from 'Factiva' Using the 'tm' Text Mining Framework

Provides a 'tm' Source to create corpora from articles exported from the Dow Jones 'Factiva' content provider as XML or HTML files. It is able to read both text content and meta-data information (including source, date, title, author, subject, geographical coverage, company, industry, and various provider-specific fields).

Maintained by Milan Bouchet-Valat. Last updated 19 days ago.

text-mining

12.5 match 27 stars 5.13 score 11 scripts 1 dependents

mihai-sysbio

glpkAPI:R Interface to C API of GLPK

R Interface to C API of GLPK, depends on GLPK Version >= 4.42.

Maintained by Mihail Anton. Last updated 2 years ago.

glpk

10.7 match 5.97 score 51 scripts 12 dependents

kumes

deepRstudio:Seamless Language Translation in 'RStudio' using 'DeepL' API and 'Rstudioapi'

Enhancing cross-language compatibility within the 'RStudio' environment and supporting seamless language understanding, the 'deepRstudio' package leverages the power of the 'DeepL' API (see <https://www.deepl.com/docs-api>) to enable seamless, fast, accurate, and affordable translation of code comments, documents, and text. This package offers the ability to translate selected text into English (EN), as well as from English into various languages, namely Japanese (JA), Chinese (ZH), Spanish (ES), French (FR), Russian (RU), Portuguese (PT), and Indonesian (ID). With much of the text being written in English, the emphasis is on compatibility from English. It is also designed for developers working on multilingual projects and data analysts collaborating with international teams, simplifying the translation process and making code more accessible and comprehensible to people with diverse language backgrounds. This package uses the 'rstudioapi' package and 'DeepL' API, and is simply implemented, executed from addins or via shortcuts on 'RStudio'. With just a few steps, content can be translated between supported languages, promoting better collaboration and expanding the global reach of work. The functionality of this package works only on 'RStudio' using 'rstudioapi'.

Maintained by Satoshi Kume. Last updated 1 year ago.

deepl deeprstudio language-translation rstudio rstudioapi seamless seamless-language translation

18.3 match 2 stars 3.48 score 4 scripts 1 dependents

nalimilan

tm.plugin.lexisnexis:Import Articles from 'LexisNexis' Using the 'tm' Text Mining Framework

Provides a 'tm' Source to create corpora from articles exported from the 'LexisNexis' content provider as HTML files. It is able to read both text content and meta-data information (including source, date, title, author and pages). Note that the file format is highly unstable: there is no warranty that this package will work for your corpus, and you may have to adjust the code to adapt it to your particular format.

Maintained by Milan Bouchet-Valat. Last updated 19 days ago.

text-mining

12.5 match 27 stars 5.08 score 9 scripts 1 dependents

r-lib

lintr:A 'Linter' for R Code

Checks adherence to a given style, syntax errors and possible semantic issues. Supports on the fly checking of R code edited with 'RStudio IDE', 'Emacs', 'Vim', 'Sublime Text', 'Atom' and 'Visual Studio Code'.

Maintained by Michael Chirico. Last updated 11 hours ago.

linter

3.7 match 1.2k stars 16.99 score 916 scripts 33 dependents

bioc

marray:Exploratory analysis for two-color spotted microarray data

Class definitions for two-color spotted microarray data. Functions for data input, diagnostic plots, normalization and quality checking.

Maintained by Yee Hwa (Jean) Yang. Last updated 5 months ago.

microarray twochannel preprocessing

7.1 match 8.92 score 222 scripts 37 dependents

crunch-io

crunch:Crunch.io Data Tools

The Crunch.io service <https://crunch.io/> provides a cloud-based data store and analytic engine, as well as an intuitive web interface. Using this package, analysts can interact with and manipulate Crunch datasets from within R. Importantly, this allows technical researchers to collaborate naturally with team members, managers, and clients who prefer a point-and-click interface.

Maintained by Greg Freedman Ellis. Last updated 12 days ago.

6.0 match 9 stars 10.53 score 200 scripts 2 dependents

ropensci

pkgmatch:Find R Packages Matching Either Descriptions or Other R Packages

Find R packages matching either descriptions or other R packages.

Maintained by Mark Padgham. Last updated 1 month ago.

embeddings llms natural-language-processing cpp

12.0 match 3 stars 5.23 score

cjbarrie

quiltr:Qualtrics for Labelling Text using R

Functions to convert text data for labelling into a format appropriate for importing into Qualtrics. Supports multiple languages, including right-to-left scripts, as well as different response types. Outputs an Advanced Format .txt file that can be read into Qualtrics.

Maintained by Christopher Barrie. Last updated 3 years ago.

14.6 match 4 stars 4.30 score 9 scripts

renkun-ken

formattable:Create 'Formattable' Data Structures

Provides functions to create formattable vectors and data frames. 'Formattable' vectors are printed with text formatting, and formattable data frames are printed with multiple types of formatting in HTML to improve the readability of data presented in tabular form rendered in web pages.

Maintained by Kun Ren. Last updated 3 months ago.

4.3 match 700 stars 14.69 score 3.6k scripts 26 dependents

bioc

debrowser:Interactive Differential Expression Analysis Browser

Bioinformatics platform containing interactive plots and tables for differential gene and region expression studies. Allows visualizing expression data in greater depth, interactively and faster. By changing the parameters, users can easily explore parts of the data in ways that were not practical before. Manually creating and inspecting these plots takes time. With DEBrowser, users can prepare plots without writing any code. Differential expression, PCA and clustering analyses are performed on site and the results are shown in various plots such as scatter, bar, box, volcano, MA plots and heatmaps.

Maintained by Alper Kucukural. Last updated 5 months ago.

sequencing chipseq rnaseq differentialexpression geneexpression clustering immunooncology

8.0 match 61 stars 7.80 score 65 scripts

myeomans

politeness:Detecting Politeness Features in Text

Detecting markers of politeness in English natural language. This package allows researchers to easily visualize and quantify politeness between groups of documents. This package combines prior research on the linguistic markers of politeness. We thank the Spencer Foundation, the Hewlett Foundation, and Harvard's Institute for Quantitative Social Science for support.

Maintained by Mike Yeomans. Last updated 1 month ago.

8.3 match 25 stars 7.49 score 41 scripts 1 dependents

rstudio

learnr:Interactive Tutorials for R

Create interactive tutorials using R Markdown. Use a combination of narrative, figures, videos, exercises, and quizzes to create self-paced tutorials for learning about R and R packages.

Maintained by Garrick Aden-Buie. Last updated 7 months ago.

interactive python rmarkdown shiny sql teaching tutorial

4.2 match 713 stars 14.79 score 6.5k scripts 27 dependents

cran

textreg:n-Gram Text Regression, aka Concise Comparative Summarization

Function for sparse regression on raw text, regressing a labeling vector onto a feature space consisting of all possible phrases.

Maintained by Luke Miratrix. Last updated 6 years ago.

cpp

19.0 match 1 stars 3.26 score

bnosac

word2vec:Distributed Representations of Words

Learn vector representations of words by continuous bag of words and skip-gram implementations of the 'word2vec' algorithm. The techniques are detailed in the paper "Distributed Representations of Words and Phrases and their Compositionality" by Mikolov et al. (2013), available at <arXiv:1310.4546>.

Maintained by Jan Wijffels. Last updated 1 year ago.

embeddings natural-language-processing word2vec cpp

7.4 match 70 stars 8.36 score 227 scripts 6 dependents

rstudio

tfdatasets:Interface to 'TensorFlow' Datasets

Interface to 'TensorFlow' Datasets, a high-level library for building complex input pipelines from simple, re-usable pieces. See <https://www.tensorflow.org/guide> for additional details.

Maintained by Tomasz Kalinowski. Last updated 6 days ago.

6.7 match 34 stars 9.32 score 656 scripts 3 dependents

inlabru-org

inlabru:Bayesian Latent Gaussian Modelling using INLA and Extensions

Facilitates spatial and general latent Gaussian modeling using integrated nested Laplace approximation via the INLA package (<https://www.r-inla.org>). Additionally, extends the GAM-like model class to more general nonlinear predictor expressions, and implements a log Gaussian Cox process likelihood for modeling univariate and spatial point processes based on ecological survey data. Model components are specified with general inputs and mapping methods to the latent variables, and the predictors are specified via general R expressions, with separate expressions for each observation likelihood model in multi-likelihood models. A prediction method based on fast Monte Carlo sampling allows posterior prediction of general expressions of the latent variables. Ecology-focused introduction in Bachl, Lindgren, Borchers, and Illian (2019) <doi:10.1111/2041-210X.13168>.

Maintained by Finn Lindgren. Last updated 21 hours ago.

4.9 match 96 stars 12.61 score 832 scripts 6 dependents

yingjie4science

SDGdetector:Detect SDGs and Targets in Text

Identify 17 Sustainable Development Goals and associated 169 targets in text.

Maintained by Yingjie Li. Last updated 6 months ago.

sdg sdgs sustainability sustainable-development-goals text-mining

14.9 match 14 stars 4.15 score 10 scripts

frareb

inpdfr:Analyse Text Documents Using Ecological Tools

A set of functions to analyse and compare texts, using classical text mining functions, as well as those from theoretical ecology.

Maintained by Rebaudo Francois. Last updated 2 years ago.

14.0 match 2 stars 4.41 score 26 scripts

manalytics

opitools:Analyzing the Opinions in a Big Text Document

Designed for performing impact analysis of opinions in a digital text document (DTD). The package allows a user to assess the extent to which a theme or subject within a document impacts the overall opinion expressed in the document. The package can be applied to a wide range of opinion-based DTD, including commentaries on social media platforms (such as 'Facebook', 'Twitter' and 'Youtube'), online product reviews, and so on. The utility of 'opitools' was originally demonstrated in Adepeju and Jimoh (2021) <doi:10.31235/osf.io/c32qh> in the assessment of COVID-19 impacts on neighbourhood policing using Twitter data. Further examples can be found in the vignette of the package.

Maintained by Monsuru Adepeju. Last updated 2 years ago.

11.5 match 12 stars 5.30 score 11 scripts

openanalytics

clinUtils:General Utility Functions for Analysis of Clinical Data

Utility functions to facilitate the import, the reporting and analysis of clinical data. Example datasets in 'SDTM' and 'ADaM' format, containing a subset of patients/domains from the 'CDISC Pilot 01 study' are also available as R datasets to demonstrate the package functionalities.

Maintained by Laure Cougnaud. Last updated 10 months ago.

9.0 match 3 stars 6.78 score 105 scripts 3 dependents

gegznav

spAddins:RStudio Add-ins to Format R Markdown files (RETIRED PACKAGE)

The development of `spAddins` ended in 2018 as the package was retired in favor of packages `addins.rmd` and `addins.rs`. ... RStudio Add-ins to Format Text and Insert Operators ... A set of RStudio addins that are designed to be used in combination with user-defined RStudio keyboard shortcuts. These addins either: 1) insert text at a cursor position (e.g. insert operators %>%, <<-, %$%, etc.), 2) replace symbols in selected pieces of text (e.g., convert backslashes to forward slashes which results in strings like "c:\data\" converted into "c:/data/") or 3) enclose text with special symbols (e.g., converts "bold" into "**bold**") which is convenient for editing R Markdown files.

Maintained by Vilmantas Gegzna. Last updated 4 years ago.

rstudio-addins

13.2 match 8 stars 4.60 score 8 scripts

satijalab

Seurat:Tools for Single Cell Genomics

A toolkit for quality control, analysis, and exploration of single cell RNA sequencing data. 'Seurat' aims to enable users to identify and interpret sources of heterogeneity from single cell transcriptomic measurements, and to integrate diverse types of single cell data. See Satija R, Farrell J, Gennert D, et al (2015) <doi:10.1038/nbt.3192>, Macosko E, Basu A, Satija R, et al (2015) <doi:10.1016/j.cell.2015.05.002>, Stuart T, Butler A, et al (2019) <doi:10.1016/j.cell.2019.05.031>, and Hao, Hao, et al (2020) <doi:10.1101/2020.10.12.335331> for more details.

Maintained by Paul Hoffman. Last updated 1 year ago.

human-cell-atlas single-cell-genomics single-cell-rna-seq cpp

3.5 match 2.4k stars 16.86 score 50k scripts 73 dependents

atfutures

calendar:Create, Read, Write, and Work with 'iCalendar' Files, Calendars and Scheduling Data

Provides functions to create, read, write, and work with 'iCalendar' files (which typically have '.ics' or '.ical' extensions), and the scheduling data, calendars and timelines of people, organisations and other entities that they represent. 'iCalendar' is an open standard for exchanging calendar and scheduling information between users and computers, described at <https://icalendar.org/>.

Maintained by Robin Lovelace. Last updated 7 months ago.

calendar ical

7.1 match 42 stars 8.39 score 113 scripts 1 dependents

dankelley

oce:Analysis of Oceanographic Data

Supports the analysis of Oceanographic data, including 'ADCP' measurements, measurements made with 'argo' floats, 'CTD' measurements, sectional data, sea-level time series, coastline and topographic data, etc. Provides specialized functions for calculating seawater properties such as potential temperature in either the 'UNESCO' or 'TEOS-10' equation of state. Produces graphical displays that conform to the conventions of the Oceanographic literature. This package is discussed extensively by Kelley (2018) "Oceanographic Analysis with R" <doi:10.1007/978-1-4939-8844-0>.

Maintained by Dan Kelley. Last updated 3 days ago.

oceanography fortran cpp

3.9 match 146 stars 15.42 score 4.2k scripts 18 dependents

rrwen

draw:Wrapper Functions for Producing Graphics

A set of user-friendly wrapper functions for creating consistent graphics and diagrams with lines, common shapes, text, and page settings. Compatible with and based on the R 'grid' package.

Maintained by Richard Wen. Last updated 7 years ago.

box circle curve diagram draw graphics grid line page rectangle reproducible shape square text triangle

13.5 match 2 stars 4.39 score 35 scripts