Showing 200 of 1642 total results
oscarkjell
text:Analyses of Text using Transformers Models from HuggingFace, Natural Language Processing and Machine Learning
Link R with Transformers from Hugging Face to transform text variables into word embeddings, which can then be used to statistically test the mean difference between sets of texts, compute semantic similarity scores between texts, predict numerical variables, and visualize statistically significant words along various dimensions. For more information see <https://www.r-text.org>.
Maintained by Oscar Kjell. Last updated 5 days ago.
deep-learning machine-learning nlp transformers openjdk
93.6 match 146 stars 13.16 score 436 scripts 1 dependent
gagolews
stringi:Fast and Portable Character String Processing Facilities
A collection of character string/text/natural language processing tools for pattern searching (e.g., with 'Java'-like regular expressions or the 'Unicode' collation algorithm), random string generation, case mapping, string transliteration, concatenation, sorting, padding, wrapping, Unicode normalisation, date-time formatting and parsing, and many more. They are fast, consistent, convenient, and - thanks to 'ICU' (International Components for Unicode) - portable across all locales and platforms. Documentation about 'stringi' is provided via its website at <https://stringi.gagolewski.com/> and the paper by Gagolewski (2022, <doi:10.18637/jss.v103.i02>).
Maintained by Marek Gagolewski. Last updated 1 month ago.
icu icu4c natural-language-processing nlp regex regexp string-manipulation stringi stringr text text-processing tidy-data unicode cpp
45.8 match 309 stars 18.31 score 10k scripts 8.6k dependents
trinker
qdap:Bridging the Gap Between Qualitative Data and Quantitative Analysis
Automates many of the tasks associated with quantitative discourse analysis of transcripts containing discourse including frequency counts of sentence types, words, sentences, turns of talk, syllables and other assorted analysis tasks. The package provides parsing tools for preparing transcript data. Many functions enable the user to aggregate data by any number of grouping variables, providing analysis and seamless integration with other R packages that undertake higher level analysis and visualization of text. This affords the user a more efficient and targeted analysis. 'qdap' is designed for transcript analysis, however, many functions are applicable to other areas of Text Mining/ Natural Language Processing.
Maintained by Tyler Rinker. Last updated 4 years ago.
qdap quantitative-discourse-analysis text-analysis text-mining text-plotting openjdk
58.8 match 176 stars 9.61 score 1.3k scripts 3 dependents
sergejruff
lovecraftr:A Collection of Lovecraftian Tales and Texts
A curated collection of Howard Phillips Lovecraft's complete stories, collected for the purpose of text analysis.
Maintained by Ruff Sergej. Last updated 3 months ago.
130.1 match 6 stars 3.78 score 1 script
quanteda
quanteda:Quantitative Analysis of Textual Data
A fast, flexible, and comprehensive framework for quantitative text analysis in R. Provides functionality for corpus management, creating and manipulating tokens and n-grams, exploring keywords in context, forming and manipulating sparse matrices of documents by features and feature co-occurrences, analyzing keywords, computing feature similarities and distances, applying content dictionaries, applying supervised and unsupervised machine learning, visually representing text and text analyses, and more.
Maintained by Kenneth Benoit. Last updated 2 months ago.
corpus natural-language-processing quanteda text-analytics onetbb cpp
28.5 match 851 stars 16.68 score 5.4k scripts 51 dependents
bnosac
udpipe:Tokenization, Parts of Speech Tagging, Lemmatization and Dependency Parsing with the 'UDPipe' 'NLP' Toolkit
This natural language processing toolkit provides language-agnostic 'tokenization', 'parts of speech tagging', 'lemmatization' and 'dependency parsing' of raw text. Next to text parsing, the package also allows you to train annotation models based on data of 'treebanks' in 'CoNLL-U' format as provided at <https://universaldependencies.org/format.html>. The techniques are explained in detail in the paper: 'Tokenizing, POS Tagging, Lemmatizing and Parsing UD 2.0 with UDPipe', available at <doi:10.18653/v1/K17-3009>. The toolkit also contains functionalities for commonly used data manipulations on texts which are enriched with the output of the parser, namely functionalities and algorithms for collocations, token co-occurrence, document term matrix handling, term frequency inverse document frequency calculations, information retrieval metrics (Okapi BM25), handling of multi-word expressions, keyword detection (Rapid Automatic Keyword Extraction, noun phrase extraction, syntactical patterns), sentiment scoring and semantic similarity analysis.
Maintained by Jan Wijffels. Last updated 2 years ago.
conll dependency-parser lemmatization natural-language-processing nlp pos-tagging r-pkg rcpp text-mining tokenizer udpipe cpp
36.8 match 215 stars 11.83 score 1.2k scripts 9 dependents
quanteda
readtext:Import and Handling for Plain and Formatted Text Files
Functions for importing and handling text files and formatted text files with additional meta-data, including '.csv', '.tab', '.json', '.xml', '.html', '.pdf', '.doc', '.docx', '.rtf', '.xls', '.xlsx', and others.
Maintained by Kenneth Benoit. Last updated 4 months ago.
39.9 match 122 stars 10.66 score 1.2k scripts 5 dependents
rstudio
gt:Easily Create Presentation-Ready Display Tables
Build display tables from tabular data with an easy-to-use set of functions. With its progressive approach, we can construct display tables with a cohesive set of table parts. Table values can be formatted using any of the included formatting functions. Footnotes and cell styles can be precisely added through a location targeting system. The way in which 'gt' handles things for you means that you don't often have to worry about the fine details.
Maintained by Richard Iannone. Last updated 12 days ago.
docx easy-to-use html latex rtf summary-tables
20.3 match 2.1k stars 18.36 score 20k scripts 112 dependents
juliasilge
janeaustenr:Jane Austen's Complete Novels
Full texts for Jane Austen's 6 completed novels, ready for text analysis. These novels are "Sense and Sensibility", "Pride and Prejudice", "Mansfield Park", "Emma", "Northanger Abbey", and "Persuasion".
Maintained by Julia Silge. Last updated 3 years ago.
29.5 match 95 stars 11.03 score 1.1k scripts 62 dependents
wilkelab
ggtext:Improved Text Rendering Support for 'ggplot2'
A 'ggplot2' extension that enables the rendering of complex formatted plot labels (titles, subtitles, facet labels, axis labels, etc.). Text boxes with automatic word wrap are also supported.
Maintained by Brenton M. Wiernik. Last updated 3 years ago.
19.7 match 657 stars 15.71 score 13k scripts 155 dependents
trinker
textclean:Text Cleaning Tools
Tools to clean and process text. Tools are geared at checking for substrings that are not optimal for analysis and replacing or removing them (normalizing) with more analysis friendly substrings (see Sproat, Black, Chen, Kumar, Ostendorf, & Richards (2001) <doi:10.1006/csla.2001.0169>) or extracting them into new variables. For example, emoticons are often used in text but not always easily handled by analysis algorithms. The replace_emoticon() function replaces emoticons with word equivalents.
Maintained by Tyler Rinker. Last updated 3 years ago.
data-munging emoticons regex text-analysis text-cleaning
30.6 match 248 stars 10.08 score 760 scripts 22 dependents
digi-vub
text.alignment:Text Alignment with Smith-Waterman
Find similarities between texts using the Smith-Waterman algorithm. The algorithm performs local sequence alignment and determines similar regions between two strings. The Smith-Waterman algorithm is explained in the paper: "Identification of common molecular subsequences" by T.F.Smith and M.S.Waterman (1981), available at <doi:10.1016/0022-2836(81)90087-5>. This package implements the same logic for sequences of words and letters instead of molecular sequences.
Maintained by Jan Wijffels. Last updated 2 years ago.
52.5 match 10 stars 5.80 score 14 scripts
r-forge
tm:Text Mining Package
A framework for text mining applications within R.
Maintained by Kurt Hornik. Last updated 27 days ago.
23.1 match 12.96 score 14k scripts 101 dependents
allancameron
geomtextpath:Curved Text in 'ggplot2'
A 'ggplot2' extension that allows text to follow curved paths. Curved text makes it easier to directly label paths or neatly annotate in polar co-ordinates.
Maintained by Allan Cameron. Last updated 2 months ago.
24.0 match 631 stars 12.04 score 960 scripts 5 dependents
ropensci
tokenizers:Fast, Consistent Tokenization of Natural Language Text
Convert natural language text into tokens. Includes tokenizers for shingled n-grams, skip n-grams, words, word stems, sentences, paragraphs, characters, shingled characters, lines, Penn Treebank, regular expressions, as well as functions for counting characters, words, and sentences, and a function for splitting longer texts into separate documents, each with the same number of words. The tokenizers have a consistent interface, and the package is built on the 'stringi' and 'Rcpp' packages for fast yet correct tokenization in 'UTF-8'.
Maintained by Thomas Charlon. Last updated 12 months ago.
nlp peer-reviewed text-mining tokenizer cpp
21.6 match 186 stars 13.33 score 1.1k scripts 81 dependents
slowkow
ggrepel:Automatically Position Non-Overlapping Text Labels with 'ggplot2'
Provides text and label geoms for 'ggplot2' that help to avoid overlapping text labels. Labels repel away from each other and away from the data points.
Maintained by Kamil Slowikowski. Last updated 4 months ago.
14.7 match 1.2k stars 19.20 score 37k scripts 1.2k dependents
juliasilge
tidytext:Text Mining using 'dplyr', 'ggplot2', and Other Tidy Tools
Using tidy data principles can make many text mining tasks easier, more effective, and consistent with tools already in wide use. Much of the infrastructure needed for text mining with tidy data frames already exists in packages like 'dplyr', 'broom', 'tidyr', and 'ggplot2'. In this package, we provide functions and supporting data sets to allow conversion of text to and from tidy formats, and to switch seamlessly between tidy tools and existing text mining packages.
Maintained by Julia Silge. Last updated 11 months ago.
natural-language-processing text-mining tidy-data tidyverse
16.6 match 1.2k stars 16.86 score 17k scripts 61 dependents
r-lib
cli:Helpers for Developing Command Line Interfaces
A suite of tools to build attractive command line interfaces ('CLIs'), from semantic elements: headings, lists, alerts, paragraphs, etc. Supports custom themes via a 'CSS'-like language. It also contains a number of lower level 'CLI' elements: rules, boxes, trees, and 'Unicode' symbols with 'ASCII' alternatives. It supports ANSI colors and text styles as well.
Maintained by Gábor Csárdi. Last updated 13 hours ago.
14.0 match 664 stars 19.34 score 1.4k scripts 14k dependents
mlampros
textTinyR:Text Processing for Small or Big Data Files
It offers functions for splitting, parsing, tokenizing and creating a vocabulary for big text data files. Moreover, it includes functions for building a document-term matrix and extracting information from those (term-associations, most frequent terms). It also embodies functions for calculating token statistics (collocations, look-up tables, string dissimilarities) and functions to work with sparse matrices. Lastly, it includes functions for Word Vector Representations (i.e. 'GloVe', 'fasttext') and incorporates functions for the calculation of (pairwise) text document dissimilarities. The source code is based on 'C++11' and exported in R through the 'Rcpp', 'RcppArmadillo' and 'BH' packages.
Maintained by Lampros Mouselimis. Last updated 1 year ago.
bh boost cpp11 processing rcpp rcpparmadillo text openblas cpp openmp
33.9 match 38 stars 7.64 score 244 scripts 1 dependent
bioc
ComplexHeatmap:Make Complex Heatmaps
Complex heatmaps are efficient to visualize associations between different sources of data sets and reveal potential patterns. Here the ComplexHeatmap package provides a highly flexible way to arrange multiple heatmaps and supports various annotation graphics.
Maintained by Zuguang Gu. Last updated 5 months ago.
software visualization sequencing clustering complex-heatmaps heatmap
15.2 match 1.3k stars 16.93 score 16k scripts 151 dependents
davidgohel
flextable:Functions for Tabular Reporting
Use a grammar for creating and customizing pretty tables. The following formats are supported: 'HTML', 'PDF', 'RTF', 'Microsoft Word', 'Microsoft PowerPoint' and R 'Grid Graphics'. 'R Markdown', 'Quarto' and the package 'officer' can be used to produce the result files. The syntax is the same for the user regardless of the type of output to be produced. A set of functions allows the creation, definition of cell arrangement, addition of headers or footers, formatting and definition of cell content with text and or images. The package also offers a set of high-level functions that allow tabular reporting of statistical models and the creation of complex cross tabulations.
Maintained by David Gohel. Last updated 1 month ago.
docx html5 ms-office-documents rmarkdown table
14.5 match 583 stars 17.04 score 7.3k scripts 119 dependents
davidgohel
officer:Manipulation of Microsoft Word and PowerPoint Documents
Access and manipulate 'Microsoft Word', 'RTF' and 'Microsoft PowerPoint' documents from R. The package focuses on tabular and graphical reporting from R; it also provides two functions that let users get document content into data objects. A set of functions lets users add and remove images, tables and paragraphs of text in new or existing documents. The package does not require any installation of Microsoft products to be able to write Microsoft files.
Maintained by David Gohel. Last updated 1 month ago.
ms-office-documents powerpoint word
15.6 match 630 stars 15.79 score 4.1k scripts 137 dependents
jokergoo
circlize:Circular Visualization
Circular layout is an efficient way for the visualization of huge amounts of information. Here this package provides an implementation of circular layout generation in R as well as an enhancement of available software. The flexibility of the package is based on the usage of low-level graphics functions such that self-defined high-level graphics can be easily implemented by users for specific purposes. Together with the seamless connection between the powerful computational and visual environment in R, it gives users more convenience and freedom to design figures for better understanding complex patterns behind multiple dimensional data. The package is described in Gu et al. 2014 <doi:10.1093/bioinformatics/btu393>.
Maintained by Zuguang Gu. Last updated 1 year ago.
15.6 match 983 stars 15.62 score 10k scripts 213 dependents
ropensci
beautier:'BEAUti' from R
'BEAST2' (<https://www.beast2.org>) is a widely used Bayesian phylogenetic tool, that uses DNA/RNA/protein data and many model priors to create a posterior of jointly estimated phylogenies and parameters. 'BEAUti 2' (which is part of 'BEAST2') is a GUI tool that allows users to specify the many possible setups and generates the XML file 'BEAST2' needs to run. This package provides a way to create 'BEAST2' input files without active user input, but using R function calls instead.
Maintained by Richèl J.C. Bilderbeek. Last updated 24 days ago.
bayesian beast beast2 beauti phylogenetic-inference phylogenetics
27.8 match 13 stars 8.76 score 198 scripts 5 dependents
wrathematics
ngram:Fast n-Gram 'Tokenization'
An n-gram is a sequence of n "words" taken, in order, from a body of text. This is a collection of utilities for creating, displaying, summarizing, and "babbling" n-grams. The 'tokenization' and "babbling" are handled by very efficient C code, which can even be built as its own standalone library. The babbler is a simple Markov chain. The package also offers a vignette with complete example 'workflows' and information about the utilities offered in the package.
Maintained by Drew Schmidt. Last updated 1 year ago.
23.3 match 71 stars 10.45 score 844 scripts 7 dependents
t-kalinowski
keras:R Interface to 'Keras'
Interface to 'Keras' <https://keras.io>, a high-level neural networks 'API'. 'Keras' was developed with a focus on enabling fast experimentation, supports both convolution based networks and recurrent networks (as well as combinations of the two), and runs seamlessly on both 'CPU' and 'GPU' devices.
Maintained by Tomasz Kalinowski. Last updated 11 months ago.
22.2 match 10.93 score 10k scripts 55 dependents
hneth
ds4psy:Data Science for Psychologists
All datasets and functions required for the examples and exercises of the book "Data Science for Psychologists" (by Hansjoerg Neth, Konstanz University, 2023), freely available at <https://bookdown.org/hneth/ds4psy/>. The book and course introduce principles and methods of data science to students of psychology and other biological or social sciences. The 'ds4psy' package primarily provides datasets, but also functions for data generation and manipulation (e.g., of text and time data) and graphics that are used in the book and its exercises. All functions included in 'ds4psy' are designed to be explicit and instructive, rather than efficient or elegant.
Maintained by Hansjoerg Neth. Last updated 1 month ago.
data-literacy data-science education exploratory-data-analysis psychology social-sciences visualisation
34.9 match 22 stars 6.79 score 70 scripts
wilkox
ggfittext:Fit Text Inside a Box in 'ggplot2'
A 'ggplot2' extension to fit text into a box by growing, shrinking or wrapping the text.
Maintained by David Wilkins. Last updated 1 year ago.
21.1 match 306 stars 11.08 score 234 scripts 33 dependents
dselivanov
text2vec:Modern Text Mining Framework for R
Fast and memory-friendly tools for text vectorization, topic modeling (LDA, LSA), word embeddings (GloVe), similarities. This package provides a source-agnostic streaming API, which allows researchers to perform analysis of collections of documents which are larger than available RAM. All core functions are parallelized to benefit from multicore machines.
Maintained by Dmitriy Selivanov. Last updated 7 months ago.
glove latent-dirichlet-allocation natural-language-processing text-mining topic-modeling vectorization word-embeddings word2vec cpp
17.3 match 860 stars 13.48 score 1.3k scripts 23 dependents
zumbov2
deeplr:Interface to the 'DeepL' Translation API
A wrapper for the 'DeepL' Pro API <https://www.deepl.com/docs-api>, a web service for translating texts between different languages. A DeepL API developer account is required to use the service (see <https://www.deepl.com/pro#developer>).
Maintained by David Zumbach. Last updated 12 months ago.
40.4 match 41 stars 5.57 score 70 scripts
trinker
textshape:Tools for Reshaping Text
Tools that can be used to reshape and restructure text data.
Maintained by Tyler Rinker. Last updated 12 months ago.
data-reshaping manipulation sentence-boundary-detection text-data text-formating tidy
24.5 match 50 stars 9.18 score 266 scripts 34 dependents
hneth
unikn:Graphical Elements of the University of Konstanz's Corporate Design
Define and use graphical elements of corporate design manuals in R. The 'unikn' package provides color functions (by defining dedicated colors and color palettes, and commands for finding, changing, viewing, and using them) and styled text elements (e.g., for marking, underlining, or plotting colored titles). The pre-defined range of colors and text decoration functions is based on the corporate design of the University of Konstanz <https://www.uni-konstanz.de/>, but can be adapted and extended for other purposes or institutions.
Maintained by Hansjoerg Neth. Last updated 3 months ago.
branding color color-palette colorscheme corporate-design palette text-decoration university-colors visual-identity
24.9 match 39 stars 8.82 score 156 scripts 2 dependents
patperry
utf8:Unicode Text Processing
Process and print 'UTF-8' encoded international text (Unicode). Input, validate, normalize, encode, format, and display.
Maintained by Kirill Müller. Last updated 3 months ago.
12.9 match 113 stars 16.48 score 295 scripts 11k dependents
haozhu233
kableExtra:Construct Complex Table with 'kable' and Pipe Syntax
Build complex HTML or 'LaTeX' tables using 'kable()' from 'knitr' and the piping syntax from 'magrittr'. Function 'kable()' is a lightweight table generator coming from 'knitr'. This package simplifies the way to manipulate the HTML or 'LaTeX' code generated by 'kable()' and allows users to construct complex tables and customize styles using a readable syntax.
Maintained by Hao Zhu. Last updated 11 days ago.
html kable kableextra knitr latex rmarkdown
10.8 match 702 stars 19.35 score 55k scripts 163 dependents
rstudio
shiny:Web Application Framework for R
Makes it incredibly easy to build interactive web applications with R. Automatic "reactive" binding between inputs and outputs and extensive prebuilt widgets make it possible to build beautiful, responsive, and powerful applications with minimal effort.
Maintained by Winston Chang. Last updated 14 days ago.
reactive rstudio shiny web-app web-development
9.7 match 5.4k stars 21.28 score 108k scripts 1.8k dependents
geobosh
Rdpack:Update and Manipulate Rd Documentation Objects
Functions for manipulation of R documentation objects, including functions reprompt() and ereprompt() for updating 'Rd' documentation for functions, methods and classes; 'Rd' macros for citations and import of references from 'bibtex' files for use in 'Rd' files and 'roxygen2' comments; 'Rd' macros for evaluating and inserting snippets of 'R' code and the results of its evaluation or creating graphics on the fly; and many functions for manipulation of references and Rd files.
Maintained by Georgi N. Boshnakov. Last updated 1 day ago.
bibtex bibtex-references citations documentation rd-format roxygen2
14.4 match 30 stars 13.76 score 73 scripts 2.3k dependents
ropensci
googleLanguageR:Call Google's 'Natural Language' API, 'Cloud Translation' API, 'Cloud Speech' API and 'Cloud Text-to-Speech' API
Call 'Google Cloud' machine learning APIs for text and speech tasks. Call the 'Cloud Translation' API <https://cloud.google.com/translate/> for detection and translation of text, the 'Natural Language' API <https://cloud.google.com/natural-language/> to analyse text for sentiment, entities or syntax, the 'Cloud Speech' API <https://cloud.google.com/speech/> to transcribe sound files to text and the 'Cloud Text-to-Speech' API <https://cloud.google.com/text-to-speech/> to turn text into sound files.
Maintained by Mark Edmondson. Last updated 8 months ago.
cloud-speech-api cloud-translation-api google-api-client google-cloud google-cloud-speech google-nlp googleauthr natural-language-processing peer-reviewed sentiment-analysis speech-api translation-api
18.3 match 196 stars 10.36 score 268 scripts 3 dependents
hughjonesd
huxtable:Easily Create and Style Tables for LaTeX, HTML and Other Formats
Creates styled tables for data presentation. Export to HTML, LaTeX, RTF, 'Word', 'Excel', and 'PowerPoint'. Simple, modern interface to manipulate borders, size, position, captions, colours, text styles and number formatting. Table cells can span multiple rows and/or columns. Includes a 'huxreg' function for creation of regression tables, and 'quick_*' one-liners to print data to a new document.
Maintained by David Hugh-Jones. Last updated 13 days ago.
html huxtable latex microsoft-word powerpoint reproducible-research tables
13.6 match 323 stars 13.93 score 1.9k scripts 16 dependents
dreamrs
shinyWidgets:Custom Inputs Widgets for Shiny
Collection of custom input controls and user interface components for 'Shiny' applications. Give your applications a unique and colorful style!
Maintained by Victor Perrier. Last updated 13 days ago.
10.9 match 849 stars 17.05 score 8.1k scripts 218 dependents
tidyverse
ggplot2:Create Elegant Data Visualisations Using the Grammar of Graphics
A system for 'declaratively' creating graphics, based on "The Grammar of Graphics". You provide the data, tell 'ggplot2' how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details.
Maintained by Thomas Lin Pedersen. Last updated 10 days ago.
data-visualisation visualisation
7.4 match 6.6k stars 25.10 score 645k scripts 7.5k dependents
qinwf
jiebaR:Chinese Text Segmentation
Chinese text segmentation, keyword extraction and speech tagging for R.
Maintained by Qin Wenfeng. Last updated 5 years ago.
chinese chinese-text-segmentation cppjieba jieba lexical-analysis nlp cpp
17.8 match 348 stars 10.18 score 456 scripts 6 dependents
trinker
lexicon:Lexicons for Text Analysis
A collection of lexical hash tables, dictionaries, and word lists.
Maintained by Tyler Rinker. Last updated 3 years ago.
hash lexicon lookup names-frequent stopwords text-dictionaries text-mining
20.3 match 111 stars 8.80 score 224 scripts 25 dependents
tidyverse
vroom:Read and Write Rectangular Text Data Quickly
The goal of 'vroom' is to read and write data (like 'csv', 'tsv' and 'fwf') quickly. When reading it uses a quick initial indexing step, then reads the values lazily, so only the data you actually use needs to be read. The writer formats the data in parallel and writes to disk asynchronously from formatting.
Maintained by Jennifer Bryan. Last updated 7 months ago.
csv csv-parser fixed-width-text tsv tsv-parser cpp
9.6 match 625 stars 17.78 score 4.5k scripts 2.1k dependents
ropensci
pdftools:Text Extraction, Rendering and Converting of PDF Documents
Utilities based on 'libpoppler' <https://poppler.freedesktop.org> for extracting text, fonts, attachments and metadata from a PDF file. Also supports high quality rendering of PDF documents into PNG, JPEG, TIFF format, or into raw bitmap vectors for further processing in R.
Maintained by Jeroen Ooms. Last updated 14 days ago.
pdf-files pdf-format pdftools poppler poppler-library text-extraction cpp
12.9 match 529 stars 13.10 score 3.3k scripts 47 dependents
spatstat
spatstat.geom:Geometrical Functionality of the 'spatstat' Family
Defines spatial data types and supports geometrical operations on them. Data types include point patterns, windows (domains), pixel images, line segment patterns, tessellations and hyperframes. Capabilities include creation and manipulation of data (using command line or graphical interaction), plotting, geometrical operations (rotation, shift, rescale, affine transformation), convex hull, discretisation and pixellation, Dirichlet tessellation, Delaunay triangulation, pairwise distances, nearest-neighbour distances, distance transform, morphological operations (erosion, dilation, closing, opening), quadrat counting, geometrical measurement, geometrical covariance, colour maps, calculus on spatial domains, Gaussian blur, level sets of images, transects of images, intersections between objects, minimum distance matching. (Excludes spatial data on a network, which are supported by the package 'spatstat.linnet'.)
Maintained by Adrian Baddeley. Last updated 15 hours ago.
classes-and-objects distance-calculation geometry geometry-processing images mensuration plotting point-patterns spatial-data spatial-data-analysis
13.8 match 7 stars 12.11 score 241 scripts 227 dependents
eagerai
fastai:Interface to 'fastai'
The 'fastai' <https://docs.fast.ai/index.html> library simplifies training fast and accurate neural networks using modern best practices. It is based on research into deep learning best practices undertaken at 'fast.ai', including 'out of the box' support for vision, text, tabular, audio, time series, and collaborative filtering models.
Maintained by Turgut Abdullayev. Last updated 11 months ago.
audio collaborative-filtering darknet darknet-image-classification fastai medical object-detection tabular text vision
17.5 match 118 stars 9.40 score 76 scripts
ropensci
rcrossref:Client for Various 'CrossRef' 'APIs'
Client for various 'CrossRef' 'APIs', including 'metadata' search with their old and newer search 'APIs', get 'citations' in various formats (including 'bibtex', 'citeproc-json', 'rdf-xml', etc.), convert 'DOIs' to 'PMIDs', and 'vice versa', get citations for 'DOIs', and get links to full text of articles when available.
Maintained by Najko Jahn. Last updated 2 years ago.
text-ming literature pdf xml publications citations full-text tdm crossref api api-wrapper crossref-api doi metadata
15.5 match 172 stars 10.18 score 404 scripts 10 dependents
tomeriko96
polyglotr:Translate Text
Provide easy methods to translate pieces of text. Functions send requests to translation services online.
Maintained by Tomer Iwan. Last updated 1 month ago.
google-translate googletranslate language linguee mymemory-api mymemorytranslator pons translation translations-api
20.5 match 33 stars 7.61 score 34 scripts 1 dependent
mkearney
wactor:Word Factor Vectors
A user-friendly factor-like interface for converting strings of text into numeric vectors and rectangular data structures.
Maintained by Michael W. Kearney. Last updated 5 years ago.
text text-classification text-processing text-vectorization word-embeddings word-vectors word2vec
34.0 match 33 stars 4.52 score 3 scripts
kassambara
ggpubr:'ggplot2' Based Publication Ready Plots
The 'ggplot2' package is excellent and flexible for elegant data visualization in R. However, the default generated plots require some formatting before we can send them for publication. Furthermore, to customize a 'ggplot', the syntax is opaque and this raises the level of difficulty for researchers with no advanced R programming skills. 'ggpubr' provides some easy-to-use functions for creating and customizing 'ggplot2'-based publication ready plots.
Maintained by Alboukadel Kassambara. Last updated 2 years ago.
9.2 match 1.2k stars 16.68 score 65k scripts 409 dependents
trinker
textstem:Tools for Stemming and Lemmatizing Text
Tools that stem and lemmatize text. Stemming is a process that removes endings such as affixes. Lemmatization is the process of grouping inflected forms together as a single base form.
Maintained by Tyler Rinker. Last updated 7 years ago.
lemmatization stemming text-mining
17.5 match 45 stars 8.71 score 888 scripts 11 dependents
yihui
xfun:Supporting Functions for Packages Maintained by 'Yihui Xie'
Miscellaneous functions commonly used in other packages maintained by 'Yihui Xie'.
Maintained by Yihui Xie. Last updated 4 days ago.
8.3 match 145 stars 18.18 score 916 scripts 4.4k dependents
massimoaria
tall:Text Analysis for All
An R 'shiny' app designed for diverse text analysis tasks, offering a wide range of methodologies tailored to Natural Language Processing (NLP) needs. It is a versatile, general-purpose tool for analyzing textual data. 'tall' features a comprehensive workflow, including data cleaning, preprocessing, statistical analysis, and visualization, all integrated for effective text analysis.
Maintained by Massimo Aria. Last updated 4 days ago.
r-shiny text-analysis-and-sentiment-analysis text-classification text-mining textual-analysis cpp
29.2 match 14 stars 5.12 score
dmurdoch
plotrix:Various Plotting Functions
Lots of plots, various labeling, axis and color scaling functions. The author/maintainer died in September 2023.
Maintained by Duncan Murdoch. Last updated 1 year ago.
12.9 match 5 stars 11.31 score 9.2k scripts 361 dependents
kurthornik
NLP:Natural Language Processing Infrastructure
Basic classes and methods for Natural Language Processing.
Maintained by Kurt Hornik. Last updated 4 months ago.
15.6 match 6 stars 9.37 score 1.0k scripts 127 dependents
computationalstylistics
stylo:Stylometric Multivariate Analyses
Supervised and unsupervised multivariate methods, supplemented by GUI and some visualizations, to perform various analyses in the field of computational stylistics, authorship attribution, etc. For further reference, see Eder et al. (2016), <https://journal.r-project.org/archive/2016/RJ-2016-007/index.html>. You are also encouraged to visit the Computational Stylistics Group's website <https://computationalstylistics.github.io/>, where a reasonable amount of information about the package and related projects is provided.
Maintained by Maciej Eder. Last updated 2 months ago.
16.9 match 187 stars 8.58 score 462 scriptsopenanalytics
inTextSummaryTable:Creation of in-Text Summary Table
Creation of tables of summary statistics or counts for clinical data (for 'TLFs'). These tables can be exported as in-text table (with the 'flextable' package) for a Clinical Study Report (Word format) or a 'topline' presentation (PowerPoint format), or as interactive table (with the 'DT' package) to an html document for clinical data review.
Maintained by Laure Cougnaud. Last updated 9 months ago.
26.2 match 1 stars 5.52 score 47 scriptskarlines
diagram:Functions for Visualising Simple Graphs (Networks), Plotting Flow Diagrams
Visualises simple graphs (networks) based on a transition matrix, utilities to plot flow diagrams, visualising webs, electrical networks, etc. Support for the book "A practical guide to ecological modelling - using R as a simulation platform" by Karline Soetaert and Peter M.J. Herman (2009), Springer, and the book "Solving Differential Equations in R" by Karline Soetaert, Jeff Cash and Francesca Mazzia (2012), Springer. Includes demo(flowchart), demo(plotmat), demo(plotweb).
Maintained by Karline Soetaert. Last updated 4 years ago.
14.0 match 10.06 score 598 scripts 487 dependentskasperwelbers
corpustools:Managing, Querying and Analyzing Tokenized Text
Provides text analysis in R, focusing on the use of a tokenized text format. In this format, the positions of tokens are maintained, and each token can be annotated (e.g., part-of-speech tags, dependency relations). Prominent features include advanced Lucene-like querying for specific tokens or contexts (e.g., documents, sentences), similarity statistics for words and documents, exporting to DTM for compatibility with many text analysis packages, and the possibility to reconstruct original text from tokens to facilitate interpretation.
Maintained by Kasper Welbers. Last updated 6 months ago.
18.1 match 31 stars 7.50 score 174 scripts 1 dependentsdmurdoch
rgl:3D Visualization Using OpenGL
Provides medium to high level functions for 3D interactive graphics, including functions modelled on base graphics (plot3d(), etc.) as well as functions for constructing representations of geometric objects (cube3d(), etc.). Output may be on screen using OpenGL, or to various standard 3D file formats including WebGL, PLY, OBJ, STL as well as 2D image formats, including PNG, Postscript, SVG, PGF.
Maintained by Duncan Murdoch. Last updated 2 months ago.
graphicsopenglrglwebgllibglulibglvndlibpnglibx11freetypecpp
7.7 match 91 stars 17.49 score 7.3k scripts 300 dependentsr-forge
zoo:S3 Infrastructure for Regular and Irregular Time Series (Z's Ordered Observations)
An S3 class with methods for totally ordered indexed observations. It is particularly aimed at irregular time series of numeric vectors/matrices and factors. zoo's key design goals are independence of a particular index/date/time class and consistency with ts and base R by providing methods to extend standard generics.
Maintained by Achim Zeileis. Last updated 14 days ago.
8.3 match 16.23 score 33k scripts 2.2k dependentstrinker
sentimentr:Calculate Text Polarity Sentiment
Calculate text polarity sentiment at the sentence level and optionally aggregate by rows or grouping variable(s).
Maintained by Tyler Rinker. Last updated 3 years ago.
amplifierpolaritysentimentsentiment-analysisvalence-shifter
13.9 match 432 stars 9.43 score 680 scripts 2 dependentsjhudsl
text2speech:Text to Speech Conversion
Converts text into speech using various text-to-speech (TTS) engines and provides a unified interface for accessing their functionality. With this package, users can easily generate audio files of spoken words, phrases, or sentences from plain text data. The package supports multiple TTS engines, including Google's 'Cloud Text-to-Speech API', 'Amazon Polly', Microsoft's 'Cognitive Services Text to Speech REST API', and a free TTS engine called 'Coqui TTS'.
Maintained by Howard Baek. Last updated 2 years ago.
edtech-softwarespeech-synthesistext-to-speechttsvoice
20.6 match 21 stars 6.28 score 9 scripts 2 dependentsandrewheiss
scriptuRs:Complete Text of the LDS Scriptures
Full text, in data frames containing one row per verse, of the Standard Works of The Church of Jesus Christ of Latter-day Saints (LDS). These are the Old Testament, (KJV), the New Testament (KJV), the Book of Mormon, the Doctrine and Covenants, and the Pearl of Great Price.
Maintained by Andrew Heiss. Last updated 6 years ago.
ldslds-scripturestext-miningtidytext
29.9 match 14 stars 4.32 score 30 scriptsropensci
jstor:Read Data from JSTOR/DfR
Functions and helpers to import metadata, ngrams and full-texts delivered by Data for Research by JSTOR.
Maintained by Thomas Klebel. Last updated 8 months ago.
jstorpeer-reviewedtext-analysistext-mining
17.5 match 47 stars 7.29 score 55 scriptsingmarboeschen
JATSdecoder:A Metadata and Text Extraction and Manipulation Tool Set
Provides a function collection to extract metadata, sectioned text and study characteristics from scientific articles in 'NISO-JATS' format. Articles in PDF format can be converted to 'NISO-JATS' with the 'Content ExtRactor and MINEr' ('CERMINE', <https://github.com/CeON/CERMINE>). For convenience, two functions bundle the extraction heuristics: JATSdecoder() converts 'NISO-JATS'-tagged XML files to a structured list with elements title, author, journal, history, 'DOI', abstract, sectioned text and reference list. study.character() extracts multiple study characteristics like number of included studies, statistical methods used, alpha error, power, statistical results, correction method for multiple testing, software used. An estimation of the involved sample size is performed based on reports within the abstract and the reported degrees of freedom within statistical results. In addition, the package contains some useful functions to process text (text2sentences(), text2num(), ngram(), strsplit2(), grep2()). See Böschen, I. (2021) <doi:10.1007/s11192-021-04162-z>, Böschen, I. (2021) <doi:10.1038/s41598-021-98782-3> and Böschen, I. (2023) <doi:10.1038/s41598-022-27085-y>.
Maintained by Ingmar Böschen. Last updated 6 days ago.
cermineniso-jatspubmedcentraltext-extractiontext-miningxml-filesopenjdk
27.8 match 18 stars 4.56 score 7 scriptscoolbutuseless
lofifonts:Text Rendering with Bitmap and Vector Fonts
Alternate font rendering is useful when rendering text to novel graphics outputs where modern font rendering is not available or where bespoke text positioning is required. Bitmap and vector fonts allow for custom layout and rendering using pixel coordinates and line drawing. Formatted text is created as a data.frame of pixel coordinates (for bitmap fonts) or stroke coordinates (for vector fonts). All text can be easily previewed as a matrix or raster image. A selection of fonts is included with this package.
Maintained by Mike Cheng. Last updated 25 days ago.
20.7 match 7 stars 5.94 score 10 scriptsjonclayden
ore:An R Interface to the Onigmo Regular Expression Library
Provides an alternative to R's built-in functionality for handling regular expressions, based on the Onigmo library. Offers first-class compiled regex objects, partial matching and function-based substitutions, amongst other features.
Maintained by Jon Clayden. Last updated 3 days ago.
regexregular-expressionstext-analysis
17.2 match 58 stars 7.16 score 125 scripts 6 dependentskumes
chatAI4R:Chat-Based Interactive Artificial Intelligence for R
The Large Language Model (LLM) represents a groundbreaking advancement in data science and programming, and also allows us to extend the world of R. A seamless interface for integrating the 'OpenAI' Web APIs into R is provided in this package. This package leverages LLM-based AI techniques, enabling efficient knowledge discovery and data analysis (see 'OpenAI' Web APIs details <https://openai.com/blog/openai-api>). The previous functions such as seamless translation and image generation have been moved to other packages 'deepRstudio' and 'stableDiffusion4R'.
Maintained by Satoshi Kume. Last updated 1 months ago.
aibioinformaticschatgptgptimageimage-generation
27.6 match 14 stars 4.45 score 3 scriptsrspatial
terra:Spatial Data Analysis
Methods for spatial data analysis with vector (points, lines, polygons) and raster (grid) data. Methods for vector data include geometric operations such as intersect and buffer. Raster methods include local, focal, global, zonal and geometric operations. The predict and interpolate methods facilitate the use of regression type (interpolation, machine learning) models for spatial prediction, including with satellite remote sensing data. Processing of very large files is supported. See the manual and tutorials on <https://rspatial.org/> to get started. 'terra' replaces the 'raster' package ('terra' can do more, and it is faster and easier to use).
Maintained by Robert J. Hijmans. Last updated 6 hours ago.
geospatialrasterspatialvectoronetbbprojgdalgeoscpp
6.9 match 559 stars 17.64 score 17k scripts 851 dependentshenrikbengtsson
R.utils:Various Programming Utilities
Utility functions useful when programming and developing R packages.
Maintained by Henrik Bengtsson. Last updated 1 years ago.
8.8 match 63 stars 13.74 score 5.7k scripts 814 dependentsnteetor
cascadess:A Style Pronoun for 'htmltools' Tags
Apply styles to tag elements directly and with the .style pronoun. Using the pronoun, styles are created within the context of a tag element. Change borders, backgrounds, text, margins, layouts, and more.
Maintained by Nathan Teetor. Last updated 5 months ago.
25.0 match 19 stars 4.82 score 4 scriptsquanteda
spacyr:Wrapper to the 'spaCy' 'NLP' Library
An R wrapper to the 'Python' 'spaCy' 'NLP' library, from <https://spacy.io>.
Maintained by Kenneth Benoit. Last updated 1 months ago.
extract-entitiesnlpspacyspeech-tagging
11.1 match 253 stars 10.68 score 408 scripts 6 dependentsemilhvitfeldt
hcandersenr:H.C. Andersens Fairy Tales
Texts of H.C. Andersen's fairy tales, ready for text analysis. Fairy tales in German, Danish, English, Spanish and French.
Maintained by Emil Hvitfeldt. Last updated 5 years ago.
andersens-fairy-talestext-mining
25.5 match 10 stars 4.62 score 83 scriptsrolkra
explore:Simplifies Exploratory Data Analysis
Interactive data exploration with one line of code, automated reporting, or an easy-to-remember set of tidy functions for low-code exploratory data analysis.
Maintained by Roland Krasser. Last updated 3 months ago.
data-explorationdata-visualisationdecision-treesedarmarkdownshinytidy
10.3 match 228 stars 11.43 score 221 scripts 1 dependentsropensci
textreuse:Detect Text Reuse and Document Similarity
Tools for measuring similarity among documents and detecting passages which have been reused. Implements shingled n-gram, skip n-gram, and other tokenizers; similarity/dissimilarity functions; pairwise comparisons; minhash and locality sensitive hashing algorithms; and a version of the Smith-Waterman local alignment algorithm suitable for natural language.
Maintained by Yaoxiang Li. Last updated 1 months ago.
12.3 match 200 stars 9.28 score 226 scriptsr-lib
textshaping:Bindings to the 'HarfBuzz' and 'Fribidi' Libraries for Text Shaping
Provides access to the text shaping functionality in the 'HarfBuzz' library and the bidirectional algorithm in the 'Fribidi' library. 'textshaping' is a low-level utility package mainly for graphic devices that expands upon the font tool-set provided by the 'systemfonts' package.
Maintained by Thomas Lin Pedersen. Last updated 2 months ago.
8.2 match 19 stars 13.58 score 66 scripts 484 dependentsvgherard
sbo:Text Prediction via Stupid Back-Off N-Gram Models
Utilities for training and evaluating text predictors based on Stupid Back-Off N-gram models (Brants et al., 2007, <https://www.aclweb.org/anthology/D07-1090/>).
Maintained by Valerio Gherardi. Last updated 4 years ago.
natural-language-processingngram-modelspredictive-textsbocpp
23.4 match 10 stars 4.78 score 12 scriptslaresbernardo
lares:Analytics & Machine Learning Sidekick
Auxiliary package for better/faster analytics, visualization, data mining, and machine learning tasks. With a wide variety of family functions, like Machine Learning, Data Wrangling, Marketing Mix Modeling (Robyn), Exploratory, API, and Scrapper, it helps the analyst or data scientist to get quick and robust results, without the need of repetitive coding or advanced R programming skills.
Maintained by Bernardo Lares. Last updated 25 days ago.
analyticsapiautomationautomldata-sciencedescriptive-statisticsh2omachine-learningmarketingmmmpredictive-modelingpuzzlerlanguagerobynvisualization
11.2 match 233 stars 9.84 score 185 scripts 1 dependentsemilhvitfeldt
textdata:Download and Load Various Text Datasets
Provides a framework to download, parse, and store text datasets on the disk and load them when needed. Includes various sentiment lexicons and labeled text data sets for classification and analysis.
Maintained by Emil Hvitfeldt. Last updated 10 months ago.
11.3 match 75 stars 9.66 score 1.4k scripts 1 dependentsgagolews
stringx:Replacements for Base String Functions Powered by 'stringi'
English is the native language for only 5% of the World population. Also, only 17% of us can understand this text. Moreover, the Latin alphabet is the main one for merely 36% of the total. The early computer era, now a very long time ago, was dominated by the US. Due to the proliferation of the internet, smartphones, social media, and other technologies and communication platforms, this is no longer the case. This package replaces base R string functions (such as grep(), tolower(), sprintf(), and strptime()) with ones that fully support the Unicode standards related to natural language and date-time processing. It also fixes some long-standing inconsistencies, and introduces some new, useful features. Thanks to 'ICU' (International Components for Unicode) and 'stringi', they are fast, reliable, and portable across different platforms.
Maintained by Marek Gagolewski. Last updated 2 months ago.
icuicu4cnatural-language-processingnlpregexregexpstring-manipulationstringitexttext-processingunicode
23.0 match 28 stars 4.75 score 1 scriptsjuba
rainette:The Reinert Method for Textual Data Clustering
An R implementation of the Reinert text clustering method. For more details about the algorithm see the included vignettes or Reinert (1990) <doi:10.1177/075910639002600103>.
Maintained by Julien Barnier. Last updated 11 months ago.
text-analysistext-classificationcpp
15.5 match 55 stars 6.90 score 24 scriptsbnosac
textplot:Text Plots
Visualise complex relations in texts. This is done by providing functionalities for displaying text co-occurrence networks, text correlation networks, dependency relationships as well as text clustering and semantic text 'embeddings'. Feel free to join the effort of providing interesting text visualisations.
Maintained by Jan Wijffels. Last updated 3 years ago.
15.7 match 54 stars 6.78 score 75 scripts 1 dependentsbioc
DAPAR:Tools for the Differential Analysis of Proteins Abundance with R
The package DAPAR is a Bioconductor-distributed R package which provides all the necessary functions to analyze quantitative data from label-free proteomics experiments. Unlike most other similar R packages, it is endowed with rich and user-friendly graphical interfaces, so that no programming skill is required (see `Prostar` package).
Maintained by Samuel Wieczorek. Last updated 5 months ago.
proteomicsnormalizationpreprocessingmassspectrometryqualitycontrolgodataimportprostar1
19.6 match 2 stars 5.42 score 22 scripts 1 dependentswilkelab
gridtext:Improved Text Rendering Support for 'Grid' Graphics
Provides support for rendering of formatted text using 'grid' graphics. Text can be formatted via a minimal subset of 'Markdown', 'HTML', and inline 'CSS' directives, and it can be rendered both with and without word wrap.
Maintained by Brenton M. Wiernik. Last updated 1 years ago.
9.1 match 97 stars 11.55 score 344 scripts 203 dependentsbioc
RCy3:Functions to Access and Control Cytoscape
Visualize, analyze and explore networks using Cytoscape via R. Anything you can do using the graphical user interface of Cytoscape, you can now do with a single RCy3 function.
Maintained by Alex Pico. Last updated 5 months ago.
visualizationgraphandnetworkthirdpartyclientnetwork
7.8 match 52 stars 13.39 score 628 scripts 15 dependentsdfe-analytical-services
shinyGovstyle:Custom Gov Style Inputs for Shiny
Collection of 'shiny' application styling based on the GOV.UK Design System. See <https://design-system.service.gov.uk/components/> for details.
Maintained by Ross Wyatt. Last updated 3 days ago.
15.3 match 44 stars 6.69 score 25 scriptsvegandevs
vegan:Community Ecology Package
Ordination methods, diversity analysis and other functions for community and vegetation ecologists.
Maintained by Jari Oksanen. Last updated 17 days ago.
ecological-modellingecologyordinationfortranopenblas
5.3 match 472 stars 19.41 score 15k scripts 440 dependentsparklab
Nozzle.R1:Nozzle Reports
The Nozzle package provides an API to generate HTML reports with dynamic user interface elements based on JavaScript and CSS (Cascading Style Sheets). Nozzle was designed to facilitate summarization and rapid browsing of complex results in data analysis pipelines where multiple analyses are performed frequently on big data sets. The package can be applied to any project where user-friendly reports need to be created.
Maintained by Nils Gehlenborg. Last updated 10 years ago.
gehlenborglabhtml-reportreproducible-research
18.9 match 68 stars 5.31 score 10 scripts 2 dependentsr-lib
marquee:Markdown Parser and Renderer for R Graphics
Provides the means to parse and render markdown text with grid, along with facilities to define the styling of the text.
Maintained by Thomas Lin Pedersen. Last updated 2 months ago.
11.7 match 84 stars 8.54 score 28 scripts 1 dependentsbnosac
textrank:Summarize Text by Ranking Sentences and Finding Keywords
The 'textrank' algorithm is an extension of the 'Pagerank' algorithm for text. The algorithm summarizes text by calculating how sentences are related to one another. This is done by looking at overlapping terminology used in sentences in order to set up links between sentences. The resulting sentence network is then plugged into the 'Pagerank' algorithm, which identifies the most important sentences in your text and ranks them. In a similar way, 'textrank' can also be used to extract keywords. A word network is constructed by checking whether words follow one another. On top of that network, the 'Pagerank' algorithm is applied to extract relevant words, after which relevant words that follow one another are combined to get keywords. More information can be found in the paper from Mihalcea, Rada & Tarau, Paul (2004) <https://www.aclweb.org/anthology/W04-3252/>.
Maintained by Jan Wijffels. Last updated 4 years ago.
natural-language-processingnlptextranktextrank-algorithm
13.5 match 77 stars 7.38 score 103 scripts 2 dependentssticsrpacks
SticsRFiles:Read and Modify 'STICS' Input/Output Files
Manipulating input and output files of the 'STICS' crop model. Files are either 'JavaSTICS' XML files or text files used by the model 'fortran' executable. Most basic functionalities are reading or writing parameter names and values in both XML or text input files, and getting data from output files. Advanced functionalities include XML files generation from XML templates and/or spreadsheets, or text files generation from XML files by using 'xslt' transformation.
Maintained by Patrice Lecharpentier. Last updated 19 days ago.
12.0 match 4 stars 8.27 score 124 scriptsbnosac
ruimtehol:Learn Text 'Embeddings' with 'Starspace'
Wraps the 'StarSpace' library <https://github.com/facebookresearch/StarSpace> allowing users to calculate word, sentence, article, document, webpage, link and entity 'embeddings'. By using the 'embeddings', you can perform text based multi-label classification, find similarities between texts and categories, do collaborative-filtering based recommendation as well as content-based recommendation, find out relations between entities, calculate graph 'embeddings' as well as perform semi-supervised learning and multi-task learning on plain text. The techniques are explained in detail in the paper: 'StarSpace: Embed All The Things!' by Wu et al. (2017), available at <arXiv:1709.03856>.
Maintained by Jan Wijffels. Last updated 1 years ago.
classificationembeddingsnatural-language-processingnlpsimilaritystarspacetext-miningcpp
14.5 match 101 stars 6.65 score 44 scriptsdwulff
text2sdg:Detecting UN Sustainable Development Goals in Text
The United Nations' Sustainable Development Goals (SDGs) have become an important guideline for organisations to monitor and plan their contributions to social, economic, and environmental transformations. The 'text2sdg' package is an open-source analysis package that identifies SDGs in text using scientifically developed query systems, opening up the opportunity to monitor any type of text-based data, such as scientific output or corporate publications. For more information regarding the methodology see Meier, Mata & Wulff (2022) <arXiv:2110.05856>.
Maintained by Dominik S. Meier. Last updated 6 months ago.
natural-language-processingsustainabilitysustainable-developmentsustainable-development-goals
15.7 match 18 stars 6.13 score 9 scriptsfhdsl
conrad:Client for Microsoft's 'Cognitive Services Text to Speech REST' API
Convert text into synthesized speech and get a list of supported voices for a region. Microsoft's 'Cognitive Services Text to Speech REST' API <https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/rest-text-to-speech?tabs=streaming> supports neural text to speech voices, which support specific languages and dialects that are identified by locale.
Maintained by Howard Baik. Last updated 2 months ago.
18.2 match 1 stars 5.26 score 2 scripts 3 dependentstommyjones
textmineR:Functions for Text Mining and Topic Modeling
An aid for text mining in R, with a syntax that should be familiar to experienced R users. Provides a wrapper for several topic models that take similarly-formatted input and give similarly-formatted output. Has additional functionality for analyzing and diagnosing topic models.
Maintained by Tommy Jones. Last updated 2 years ago.
8.8 match 106 stars 10.83 score 310 scripts 7 dependentsnalimilan
SnowballC:Snowball Stemmers Based on the C 'libstemmer' UTF-8 Library
An R interface to the C 'libstemmer' library that implements Porter's word stemming algorithm for collapsing words to a common root to aid comparison of vocabulary. Currently supported languages are Arabic, Basque, Catalan, Danish, Dutch, English, Finnish, French, German, Greek, Hindi, Hungarian, Indonesian, Irish, Italian, Lithuanian, Nepali, Norwegian, Portuguese, Romanian, Russian, Spanish, Swedish, Tamil and Turkish.
Maintained by Milan Bouchet-Valat. Last updated 18 days ago.
7.5 match 27 stars 12.63 score 4.4k scripts 171 dependentspaithiov909
audubon:Japanese Text Processing Tools
A collection of Japanese text processing tools for filling Japanese iteration marks, Japanese character type conversions, segmentation by phrase, and text normalization which is based on rules for the 'Sudachi' morphological analyzer and the 'NEologd' (Neologism dictionary for 'MeCab'). These features are specific to Japanese and are not implemented in 'ICU' (International Components for Unicode).
Maintained by Akiru Kato. Last updated 23 days ago.
16.9 match 10 stars 5.61 score 3 scripts 1 dependentsharrelfe
Hmisc:Harrell Miscellaneous
Contains many functions useful for data analysis, high-level graphics, utility operations, functions for computing sample size and power, simulation, importing and annotating datasets, imputing missing values, advanced table making, variable clustering, character string manipulation, conversion of R objects to LaTeX and html code, recoding variables, caching, simplified parallel computing, encrypting and decrypting data using a safe workflow, general moving window statistical estimation, and assistance in interpreting principal component analysis.
Maintained by Frank E Harrell Jr. Last updated 1 days ago.
5.3 match 210 stars 17.61 score 17k scripts 750 dependentsmichelnivard
gptstudio:Use Large Language Models Directly in your Development Environment
Large language models are readily accessible via API. This package lowers the barrier to use the API inside of your development environment. For more on the API, see <https://platform.openai.com/docs/introduction>.
Maintained by James Wade. Last updated 7 days ago.
chatgptgpt-3rstudiorstudio-addin
8.7 match 924 stars 10.83 score 43 scripts 1 dependentspolmine
polmineR:Verbs and Nouns for Corpus Analysis
Package for corpus analysis using the Corpus Workbench ('CWB', <https://cwb.sourceforge.io>) as an efficient back end for indexing and querying large corpora. The package offers functionality to flexibly create subcorpora and to carry out basic statistical operations (count, co-occurrences etc.). The original full text of documents can be reconstructed and inspected at any time. Beyond that, the package is intended to serve as an interface to packages implementing advanced statistical procedures. Respective data structures (document-term matrices, term-co-occurrence matrices etc.) can be created based on the indexed corpora.
Maintained by Andreas Blaette. Last updated 1 years ago.
11.8 match 49 stars 7.96 score 311 scriptsmjockers
syuzhet:Extracts Sentiment and Sentiment-Derived Plot Arcs from Text
Extracts sentiment and sentiment-derived plot arcs from text using a variety of sentiment dictionaries conveniently packaged for consumption by R users. Implemented dictionaries include "syuzhet" (default) developed in the Nebraska Literary Lab, "afinn" developed by Finn Årup Nielsen, "bing" developed by Minqing Hu and Bing Liu, and "nrc" developed by Mohammad, Saif M. and Turney, Peter D. Applicable references are available in README.md and in the documentation for the "get_sentiment" function. The package also provides a hack for implementing Stanford's coreNLP sentiment parser. The package provides several methods for plot arc normalization.
Maintained by Matthew Jockers. Last updated 2 years ago.
7.2 match 336 stars 12.92 score 1.4k scripts 31 dependentsbrodieg
diffobj:Diffs for R Objects
Generate a colorized diff of two R objects for an intuitive visualization of their differences.
Maintained by Brodie Gaslam. Last updated 3 years ago.
7.1 match 232 stars 13.12 score 107 scripts 486 dependentskwb-r
kwb.utils:General Utility Functions Developed at KWB
This package contains some small helper functions that aim at improving the quality of code developed at Kompetenzzentrum Wasser gGmbH (KWB).
Maintained by Hauke Sonnenberg. Last updated 12 months ago.
12.7 match 8 stars 7.33 score 12 scripts 78 dependentsgforge
htmlTable:Advanced Tables for Markdown/HTML
Tables with state-of-the-art layout elements such as row spanners, column spanners, table spanners, zebra striping, and more. While allowing advanced layout, the underlying css-structure is simple in order to maximize compatibility with common word processors. The package also contains a few text formatting functions that help outputting text compatible with HTML/LaTeX.
Maintained by Max Gordon. Last updated 8 months ago.
6.0 match 79 stars 15.32 score 1.3k scripts 763 dependentstidymodels
textrecipes:Extra 'Recipes' for Text Processing
Converting text to numerical features requires specifically created procedures, which are implemented as steps according to the 'recipes' package. These steps allow for tokenization, filtering, counting (tf and tfidf) and feature hashing.
Maintained by Emil Hvitfeldt. Last updated 10 days ago.
8.4 match 160 stars 10.87 score 964 scripts 1 dependentsdgerbing
lessR:Less Code, More Results
Each function replaces multiple standard R functions. For example, two function calls, Read() and CountAll(), generate summary statistics for all variables in the data frame, plus histograms and bar charts as appropriate. Other functions provide for summary statistics via pivot tables, a comprehensive regression analysis, ANOVA and t-test, visualizations including the Violin/Box/Scatter plot for a numerical variable, bar chart, histogram, box plot, density curves, calibrated power curve, reading multiple data formats with the same function call, variable labels, time series with aggregation and forecasting, color themes, and Trellis (facet) graphics. Also includes a confirmatory factor analysis of multiple indicator measurement models, pedagogical routines for data simulation such as for the Central Limit Theorem, generation and rendering of regression instructions for interpretative output, and interactive visualizations.
Maintained by David W. Gerbing. Last updated 1 days ago.
12.3 match 6 stars 7.42 score 394 scripts 3 dependentsropensci
ijtiff:Comprehensive TIFF I/O with Full Support for 'ImageJ' TIFF Files
General purpose TIFF file I/O for R users. Currently the only such package with read and write support for TIFF files with floating point (real-numbered) pixels, and the only package that can correctly import TIFF files that were saved from 'ImageJ' and write TIFF files that can be correctly read by 'ImageJ' <https://imagej.net/ij/>. Also supports text image I/O.
Maintained by Rory Nolan. Last updated 7 days ago.
image-manipulationimagejpeer-reviewedtiff-filestiff-imagestiff
10.1 match 18 stars 8.97 score 36 scripts 7 dependentsquanteda
quanteda.textmodels:Scaling Models and Classifiers for Textual Data
Scaling models and classifiers for sparse matrix objects representing textual data in the form of a document-feature matrix. Includes original implementations of 'Laver', 'Benoit', and Garry's (2003) <doi:10.1017/S0003055403000698>, 'Wordscores' model, the Perry and 'Benoit' (2017) <doi:10.48550/arXiv.1710.08963> class affinity scaling model, and the 'Slapin' and 'Proksch' (2008) <doi:10.1111/j.1540-5907.2008.00338.x> 'wordfish' model, as well as methods for correspondence analysis, latent semantic analysis, and fast Naive Bayes and linear 'SVMs' specially designed for sparse textual data.
Maintained by Kenneth Benoit. Last updated 1 months ago.
9.5 match 42 stars 9.56 score 432 scriptsr-spatial
sf:Simple Features for R
Support for simple feature access, a standardized way to encode and analyze spatial vector data. Binds to 'GDAL' <doi: 10.5281/zenodo.5884351> for reading and writing data, to 'GEOS' <doi: 10.5281/zenodo.11396894> for geometrical operations, and to 'PROJ' <doi: 10.5281/zenodo.5884394> for projection conversions and datum transformations. Uses by default the 's2' package for geometry operations on geodetic (long/lat degree) coordinates.
Maintained by Edzer Pebesma. Last updated 17 days ago.
4.0 match 1.4k stars 22.42 score 117k scripts 1.2k dependentsjanmarvin
openxlsx2:Read, Write and Edit 'xlsx' Files
Simplifies the creation of 'xlsx' files by providing a high level interface to writing, styling and editing worksheets.
Maintained by Jan Marvin Garbuszus. Last updated 12 hours ago.
6.5 match 138 stars 13.67 score 194 scripts 11 dependentscpsievert
LDAvis:Interactive Visualization of Topic Models
Tools to create an interactive web-based visualization of a topic model that has been fit to a corpus of text data using Latent Dirichlet Allocation (LDA). Given the estimated parameters of the topic model, it computes various summary statistics as input to an interactive visualization built with 'D3.js' that is accessed via a browser. The goal is to help users interpret the topics in their 'LDA' topic model.
Maintained by Carson Sievert. Last updated 7 years ago.
javascripttext-miningtopic-modelingvisualization
8.0 match 558 stars 10.93 score 804 scripts 1 dependentsalexkowa
EnvStats:Package for Environmental Statistics, Including US EPA Guidance
Graphical and statistical analyses of environmental data, with focus on analyzing chemical concentrations and physical parameters, usually in the context of mandated environmental monitoring. Major environmental statistical methods found in the literature and regulatory guidance documents, with extensive help that explains what these methods do, how to use them, and where to find them in the literature. Numerous built-in data sets from regulatory guidance documents and environmental statistics literature. Includes scripts reproducing analyses presented in the book "EnvStats: An R Package for Environmental Statistics" (Millard, 2013, Springer, ISBN 978-1-4614-8455-4, <doi:10.1007/978-1-4614-8456-1>).
Maintained by Alexander Kowarik. Last updated 18 days ago.
6.8 match 26 stars 12.80 score 2.4k scripts 46 dependents
jhk0530
gemini.R:Interface for 'Google Gemini' API
Provides a comprehensive interface for Google Gemini API, enabling users to access and utilize Gemini Large Language Model (LLM) functionalities directly from R. This package facilitates seamless integration with Google Gemini, allowing for advanced language processing, text generation, and other AI-driven capabilities within the R environment. For more information, please visit <https://ai.google.dev/docs/gemini_api_overview>.
Maintained by Jinhwan Kim. Last updated 9 hours ago.
12.9 match 68 stars 6.69 score 37 scripts 1 dependents
dbosak01
reporter:Creates Statistical Reports
Contains functions to create regulatory-style statistical reports. Originally designed to create tables, listings, and figures for the pharmaceutical, biotechnology, and medical device industries, these reports are generalized enough that they could be used in any industry. Generates text, rich-text, PDF, HTML, and Microsoft Word file formats. The package specializes in printing wide and long tables with automatic page wrapping and splitting. Reports can be produced with a minimum of function calls, and without relying on other table packages. The package supports titles, footnotes, page header, page footers, spanning headers, page by variables, and automatic page numbering.
Maintained by David Bosak. Last updated 12 months ago.
9.2 match 16 stars 9.35 score 173 scripts 4 dependents
johndharrison
seleniumPipes:R Client Implementing the W3C WebDriver Specification
The W3C WebDriver specification defines a way for out-of-process programs to remotely instruct the behaviour of web browsers. It is detailed at <https://w3c.github.io/webdriver/webdriver-spec.html>. This package provides an R client implementing the W3C specification.
Maintained by John Harrison. Last updated 8 years ago.
12.9 match 54 stars 6.66 score 168 scripts
privefl
bigreadr:Read Large Text Files
Read large text files by splitting them in smaller files. Package 'bigreadr' also provides some convenient wrappers around fread() and fwrite() from package 'data.table'.
Maintained by Florian Privé. Last updated 2 years ago.
11.1 match 42 stars 7.76 score 636 scripts 4 dependents
andrewheiss
quRan:Complete Text of the Qur'an
Full text, in data frames containing one row per verse, of the Qur'an in Arabic (with and without vowels) and in English (the Yusuf Ali and Saheeh International translations), formatted to be convenient for text analysis.
Maintained by Andrew Heiss. Last updated 6 years ago.
19.1 match 29 stars 4.44 score 19 scripts
rolkra
utf8ify:Format Text Using Unicode
Format text (bold, italic, ...) and numbers using UTF-8. Offers functions to search for emojis and include them in your text.
Maintained by Roland Krasser. Last updated 2 months ago.
19.7 match 2 stars 4.30 score
quanteda
stopwords:Multilingual Stopword Lists
Provides multiple sources of stopwords, for use in text analysis and natural language processing.
Maintained by Kenneth Benoit. Last updated 3 years ago.
8.1 match 114 stars 10.54 score 1.1k scripts 65 dependents
nmfs-ost
asar:Build NOAA Stock Assessment Report
Build a full or update stock assessment report for any stock assessment model. Parameterization allows the user to call a template based on their regional science center, species, area, etc.
Maintained by Samantha Schiano. Last updated 7 days ago.
latexquartostock-assessment-reports
12.3 match 21 stars 6.87 score 3 scripts
guangchuangyu
shadowtext:Shadow Text Grob and Layer
Implement shadowtextGrob() for 'grid' and geom_shadowtext() layer for 'ggplot2'. These functions create/draw text grob with background shadow.
Maintained by Guangchuang Yu. Last updated 2 months ago.
8.0 match 38 stars 10.60 score 552 scripts 9 dependents
ropensci
magick:Advanced Graphics and Image-Processing in R
Bindings to 'ImageMagick': the most comprehensive open-source image processing library available. Supports many common formats (png, jpeg, tiff, pdf, etc) and manipulations (rotate, scale, crop, trim, flip, blur, etc). All operations are vectorized via the Magick++ STL meaning they operate either on a single frame or a series of frames for working with layers, collages, or animation. In RStudio images are automatically previewed when printed to the console, resulting in an interactive editing environment. The latest version of the package includes a native graphics device for creating in-memory graphics or drawing onto images using pixel coordinates.
Maintained by Jeroen Ooms. Last updated 21 days ago.
image-manipulationimage-processingimagemagickcpp
4.8 match 468 stars 17.31 score 9.0k scripts 256 dependents
nalimilan
tm.plugin.alceste:Import Texts from Files in the 'Alceste' Format Using the 'tm' Text Mining Framework
Provides a 'tm' Source to create corpora from a corpus prepared in the format used by the 'Alceste' application (i.e. a single text file with inline meta-data). It is able to import both text contents and meta-data (starred) variables.
Maintained by Milan Bouchet-Valat. Last updated 18 days ago.
16.3 match 27 stars 5.08 score 5 scripts 1 dependents
adamspannbauer
lexRankr:Extractive Summarization of Text with the LexRank Algorithm
An R implementation of the LexRank algorithm described by G. Erkan and D. R. Radev (2004) <DOI:10.1613/jair.1523>.
Maintained by Adam Spannbauer. Last updated 2 years ago.
lexranklexrank-algorithmnlprstatcpp
14.3 match 21 stars 5.81 score 61 scripts
tidyverse
rvest:Easily Harvest (Scrape) Web Pages
Wrappers around the 'xml2' and 'httr' packages to make it easy to download, then manipulate, HTML and XML.
Maintained by Hadley Wickham. Last updated 5 months ago.
4.2 match 1.5k stars 19.62 score 29k scripts 546 dependents
bioc
GeDi:Defining and visualizing the distances between different genesets
The package provides different distance measurements to calculate the difference between genesets. Based on these scores the genesets are clustered and visualized as a graph. This is all presented in an interactive Shiny application for easy usage.
Maintained by Annekathrin Nedwed. Last updated 5 months ago.
guigenesetenrichmentsoftwaretranscriptionrnaseqvisualizationclusteringpathwaysreportwritinggokeggreactomeshinyapps
14.9 match 1 star 5.52 score 22 scripts
atorus-research
pharmaRTF:Enhanced RTF Wrapper for Use with Existing Table Packages
Enhanced RTF wrapper written in R for use with existing R tables packages such as 'Huxtable' or 'GT'. This package fills a gap where tables in certain packages can be written out to RTF, but cannot add certain metadata or features to the document that are required/expected in a report for a regulatory submission, such as multiple levels of titles and footnotes, making the document landscape, and controlling properties such as margins.
Maintained by Michael Stackhouse. Last updated 3 years ago.
10.2 match 33 stars 8.01 score 128 scripts 2 dependents
jthomasmock
gtExtras:Extending 'gt' for Beautiful HTML Tables
Provides additional functions for creating beautiful tables with 'gt'. The functions are generally wrappers around boilerplate or adding opinionated niche capabilities and helper functions.
Maintained by Thomas Mock. Last updated 12 months ago.
data-sciencedata-visualizationdatascienceggplot2gtplotssparklinesparkline-graphssparklinestables
7.1 match 199 stars 11.45 score 2.4k scripts 3 dependents
docma-tu
tosca:Tools for Statistical Content Analysis
A framework for statistical analysis in content analysis. In addition to a pipeline for preprocessing text corpora and linking to the latent Dirichlet allocation from the 'lda' package, plots are offered for the descriptive analysis of text corpora and topic models. In addition, an implementation of Chang's intruder words and intruder topics is provided. Sample data for the vignette is included in the toscaData package, which is available on GitHub: <https://github.com/Docma-TU/toscaData>.
Maintained by Lars Koppers. Last updated 3 years ago.
12.2 match 16 stars 6.64 score 61 scripts 1 dependents
ropensci
rtika:R Interface to 'Apache Tika'
Extract text or metadata from over a thousand file types, using Apache Tika <https://tika.apache.org/>. Get either plain text or structured XHTML content.
Maintained by Sasha Goodman. Last updated 2 years ago.
extract-metadataextract-textjavaparsepdf-filespeer-reviewedtesseracttika
13.4 match 55 stars 6.00 score 12 scripts
miserman
lingmatch:Linguistic Matching and Accommodation
Measure similarity between texts. Offers a variety of processing tools and similarity metrics to facilitate flexible representation of texts and matching. Implements forms of Language Style Matching (Ireland & Pennebaker, 2010) <doi:10.1037/a0020386> and Latent Semantic Analysis (Landauer & Dumais, 1997) <doi:10.1037/0033-295X.104.2.211>.
Maintained by Micah Iserman. Last updated 27 days ago.
16.5 match 11 stars 4.80 score 23 scripts
strategicprojects
pikchr:R Wrapper for 'pikchr' (PIC) Diagram Language
An 'R' interface to 'pikchr' (<https://pikchr.org>, pronounced “picture”), a 'PIC'-like markup language for creating diagrams within technical documentation. Originally developed by Brian Kernighan, 'PIC' has been adapted into 'pikchr' by D. Richard Hipp, the creator of 'SQLite'. 'pikchr' is designed to be embedded in fenced code blocks of Markdown or other documentation markup languages, making it ideal for generating diagrams in text-based formats. This package allows R users to seamlessly integrate the descriptive syntax of 'pikchr' for diagram creation directly within the 'R' environment.
Maintained by Andre Leite. Last updated 24 days ago.
16.4 match 1 star 4.85 score 7 scripts
teunbrand
ggh4x:Hacks for 'ggplot2'
A 'ggplot2' extension that does a variety of little helpful things. The package extends 'ggplot2' facets through customisation, by setting individual scales per panel, resizing panels and providing nested facets. Also allows multiple colour and fill scales per plot. Also hosts a smaller collection of stats, geoms and axis guides.
Maintained by Teun van den Brand. Last updated 3 months ago.
5.6 match 616 stars 13.98 score 4.4k scripts 20 dependents
henriquesposito
poldis:Analyse Political Texts
Wrangle and annotate different types of political texts. It also introduces Urgency Analysis, a new method for the analysis of urgency in political texts.
Maintained by Henrique Sposito. Last updated 6 months ago.
19.9 match 3 stars 3.95 score 4 scripts
quadrama
DramaAnalysis:Analysis of Dramatic Texts
Analysis of preprocessed dramatic texts, with respect to literary research. The package provides functions to analyze and visualize information about characters, stage directions, the dramatic structure and the text itself. The dramatic texts are expected to be in CSV format, which can be installed from within the package; sample texts are provided. The package and the reasoning behind it are described in Reiter et al. (2017) <doi:10.18420/in2017_119>.
Maintained by Nils Reiter. Last updated 4 years ago.
corpus-linguisticsdigital-humanitiesdramadramatic-textsstatistics
16.4 match 15 stars 4.79 score 41 scripts
bioc
fobitools:Tools for Manipulating the FOBI Ontology
A set of tools for interacting with the Food-Biomarker Ontology (FOBI). A collection of basic manipulation tools for biological significance analysis, graphs, and text mining strategies for annotating nutritional data.
Maintained by Pol Castellano-Escuder. Last updated 4 months ago.
massspectrometrymetabolomicssoftwarevisualizationbiomedicalinformaticsgraphandnetworkannotationcheminformaticspathwaysgenesetenrichmentbiological-intrerpretationbiological-knowledgebiological-significance-analysisenrichment-analysisfood-biomarker-ontologyknowledge-graphnutritionobofoundryontologytext-mining
15.4 match 1 star 5.08 score 5 scripts
kgjerde
corporaexplorer:A 'Shiny' App for Exploration of Text Collections
Facilitates dynamic exploration of text collections through an intuitive graphical user interface and the power of regular expressions. The package contains 1) a helper function to convert a data frame to a 'corporaexplorerobject' and 2) a 'Shiny' app for fast and flexible exploration of a 'corporaexplorerobject'. The package also includes demo apps with which one can explore Jane Austen's novels and the State of the Union Addresses (data from the 'janeaustenr' and 'sotu' packages respectively).
Maintained by Kristian Lundby Gjerde. Last updated 7 months ago.
corporacorpusshinytext-analysis
14.5 match 65 stars 5.39 score 38 scripts
adayim
forestploter:Create a Flexible Forest Plot
Create a forest plot based on the layout of the data. Confidence intervals in multiple columns by groups can be done easily. Editing the plot, inserting/adding text, applying a theme to the plot, and much more.
Maintained by Alimu Dayimu. Last updated 6 months ago.
8.4 match 93 stars 9.31 score 207 scripts 4 dependents
psychbruce
PsychWordVec:Word Embedding Research Framework for Psychological Science
An integrative toolbox of word embedding research that provides: (1) a collection of 'pre-trained' static word vectors in the '.RData' compressed format <https://psychbruce.github.io/WordVector_RData.pdf>; (2) a series of functions to process, analyze, and visualize word vectors; (3) a range of tests to examine conceptual associations, including the Word Embedding Association Test <doi:10.1126/science.aal4230> and the Relative Norm Distance <doi:10.1073/pnas.1720347115>, with permutation test of significance; (4) a set of training methods to locally train (static) word vectors from text corpora, including 'Word2Vec' <arXiv:1301.3781>, 'GloVe' <doi:10.3115/v1/D14-1162>, and 'FastText' <arXiv:1607.04606>; (5) a group of functions to download 'pre-trained' language models (e.g., 'GPT', 'BERT') and extract contextualized (dynamic) word vectors (based on the R package 'text').
Maintained by Han-Wu-Shuang Bao. Last updated 1 year ago.
bertcosine-similarityfasttextglovegptlanguage-modelnatural-language-processingnlppretrained-modelspsychologysemantic-analysistext-analysistext-miningtsneword-embeddingsword-vectorsword2vecopenjdk
19.1 match 22 stars 4.04 score 10 scripts
gforge
forestplot:Advanced Forest Plot Using 'grid' Graphics
Allows the creation of forest plots with advanced features, such as multiple confidence intervals per row, customizable fonts for individual text elements, and flexible confidence interval drawing. It also supports mixing text with mathematical expressions. The package extends the application of forest plots beyond traditional meta-analyses, offering a more general version of the original 'rmeta' package's forestplot() function. It relies heavily on the 'grid' package for rendering the plots.
Maintained by Max Gordon. Last updated 4 months ago.
6.7 match 43 stars 11.47 score 716 scripts 21 dependents
vosonlab
vosonSML:Collecting Social Media Data and Generating Networks for Analysis
A suite of easy to use functions for collecting social media data and generating networks for analysis. Supports Mastodon, YouTube, Reddit and Web 1.0 data sources.
Maintained by Bryan Gertzel. Last updated 8 months ago.
hyperlinkmastodonnetwork-graphredditsnasocial-mediasocial-network-analysisvosonyoutube
10.0 match 79 stars 7.67 score 66 scripts 1 dependents
ropensci
antiword:Extract Text from Microsoft Word Documents
Wraps the 'AntiWord' utility to extract text from Microsoft Word documents. The utility only supports the old 'doc' format, not the new xml based 'docx' format. Use the 'xml2' package to read the latter.
Maintained by Jeroen Ooms. Last updated 6 months ago.
11.0 match 59 stars 6.98 score 7 scripts 7 dependents
great-northern-diver
loon:Interactive Statistical Data Visualization
An extendable toolkit for interactive data visualization and exploration.
Maintained by R. Wayne Oldford. Last updated 2 years ago.
data-analysisdata-sciencedata-visualizationexploratory-analysisexploratory-data-analysishigh-dimensional-datainteractive-graphicsinteractive-visualizationsloonpythonstatistical-analysisstatistical-graphicsstatisticstcl-extensiontk
8.4 match 48 stars 9.00 score 93 scripts 5 dependents
rstudio
rstudioapi:Safely Access the RStudio API
Access the RStudio API (if available) and provide informative error messages when it's not.
Maintained by Kevin Ushey. Last updated 4 months ago.
4.0 match 172 stars 18.81 score 3.6k scripts 2.1k dependents
chaoliu-cl
textAnnotatoR:Interactive Text Annotation Tool with 'shiny' GUI
A comprehensive text annotation tool built with 'shiny'. Provides an interactive graphical user interface for coding text documents, managing code hierarchies, creating memos, and analyzing coding patterns. Features include code co-occurrence analysis, visualization of coding patterns, comparison of multiple coding sets, and export capabilities. Supports collaborative qualitative research through standardized annotation formats and analysis tools.
Maintained by Chao Liu. Last updated 4 months ago.
17.3 match 4.30 score 5 scripts
alexkz
kernlab:Kernel-Based Machine Learning Lab
Kernel-based machine learning methods for classification, regression, clustering, novelty detection, quantile regression and dimensionality reduction. Among other methods 'kernlab' includes Support Vector Machines, Spectral Clustering, Kernel PCA, Gaussian Processes and a QP solver.
Maintained by Alexandros Karatzoglou. Last updated 7 months ago.
6.0 match 21 stars 12.26 score 7.8k scripts 487 dependents
rstudio
blastula:Easily Send HTML Email Messages
Compose and send out responsive HTML email messages that render perfectly across a range of email clients and device sizes. Helper functions let the user insert embedded images, web link buttons, and 'ggplot2' plot objects into the message body. Messages can be sent through an 'SMTP' server, through the 'Posit Connect' service, or through the 'Mailgun' API service <https://www.mailgun.com/>.
Maintained by Richard Iannone. Last updated 8 months ago.
easy-to-useemailhtmlmarkdownresponsive-emailsmtp
7.1 match 552 stars 10.27 score 348 scripts 5 dependents
wilkox
gggenes:Draw Gene Arrow Maps in 'ggplot2'
A 'ggplot2' extension for drawing gene arrow maps.
Maintained by David Wilkins. Last updated 1 year ago.
6.9 match 525 stars 10.54 score 372 scripts 2 dependents
andrie
surveydata:Tools to Work with Survey Data
Data obtained from surveys contains information not only about the survey responses, but also the survey metadata, e.g. the original survey questions and the answer options. The 'surveydata' package makes it easy to keep track of this metadata, and to easily extract columns with specific questions.
Maintained by Andrie de Vries. Last updated 2 years ago.
12.5 match 23 stars 5.68 score 42 scripts
julienmoeys
soiltexture:Functions for Soil Texture Plot, Classification and Transformation
"The Soil Texture Wizard" is a set of R functions designed to produce texture triangles (also called texture plots, texture diagrams, texture ternary plots), and to classify and transform soil texture data. These functions can plot virtually any soil texture triangle (classification) in any triangle geometry (isosceles, right-angled triangles, etc.). This set of functions is expected to be useful to people using soil texture data from different soil texture classifications or different particle size systems. Many (> 15) texture triangles from all around the world are predefined in the package. A simple text-based graphical user interface is provided: soiltexture_gui().
Maintained by Julien Moeys. Last updated 1 year ago.
10.0 match 28 stars 7.11 score 136 scripts 1 dependents
sentometricsresearch
sentometrics:An Integrated Framework for Textual Sentiment Time Series Aggregation and Prediction
Optimized prediction based on textual sentiment, accounting for the intrinsic challenge that sentiment can be computed and pooled across texts and time in various ways. See Ardia et al. (2021) <doi:10.18637/jss.v099.i02>.
Maintained by Samuel Borms. Last updated 4 years ago.
nlppredictionsentiment-analysistext-miningtime-seriesopenblascppopenmp
11.6 match 83 stars 6.09 score 49 scripts
hadley
plyr:Tools for Splitting, Applying and Combining Data
A set of tools that solves a common set of problems: you need to break a big problem down into manageable pieces, operate on each piece and then put all the pieces back together. For example, you might want to fit a model to each spatial location or time point in your study, summarise data by panels or collapse high-dimensional arrays to simpler summary statistics. The development of 'plyr' has been generously supported by 'Becton Dickinson'.
Maintained by Hadley Wickham. Last updated 4 months ago.
3.9 match 500 stars 18.16 score 83k scripts 3.3k dependents
bioc
rWikiPathways:rWikiPathways - R client library for the WikiPathways API
Use this package to interface with the WikiPathways API. It provides programmatic access to WikiPathways content in multiple data and image formats, including official monthly release files and convenient GMT read/write functions.
Maintained by Egon Willighagen. Last updated 5 months ago.
visualizationgraphandnetworkthirdpartyclientnetworkmetabolomicsbioinformaticsdata-accesspathways
7.6 match 15 stars 9.23 score 131 scripts 3 dependents
ropensci
EndoMineR:Functions to mine endoscopic and associated pathology datasets
This script comprises the functions that are used to clean up endoscopic reports and pathology reports as well as many of the scripts used for analysis. The scripts assume the endoscopy and histopathology data set is merged already but it can also be used of course with the unmerged datasets.
Maintained by Sebastian Zeki. Last updated 7 months ago.
endoscopygastroenterologypeer-reviewedsemi-structured-datatext-mining
12.9 match 13 stars 5.47 score 30 scripts
appsilon
shiny.semantic:Semantic UI Support for Shiny
Creating a great user interface for your Shiny apps can be a hassle, especially if you want to work purely in R and don't want to use, for instance HTML templates. This package adds support for a powerful UI library Fomantic UI - <https://fomantic-ui.com/> (before Semantic). It also supports universal UI input binding that works with various DOM elements.
Maintained by Jakub Nowicki. Last updated 11 months ago.
appsilonfomantic-uirhinoversesemanticsemantic-componentssemantic-uishiny
5.3 match 506 stars 13.00 score 586 scripts 3 dependents
ajrgodfrey
BrailleR:Improved Access for Blind Users
Blind users do not have access to the graphical output from R without printing the content of graphics windows to an embosser of some kind. This is not as immediate as is required for efficient access to statistical output. The functions here are created so that blind people can make even better use of R. This includes the text descriptions of graphs, convenience functions to replace the functionality offered in many GUI front ends, and experimental functionality for optimising graphical content to prepare it for embossing as tactile images.
Maintained by A. Jonathan R. Godfrey. Last updated 11 months ago.
7.8 match 123 stars 8.90 score 143 scripts
ropensci
epubr:Read EPUB File Metadata and Text
The 'epubr' package provides functions supporting the reading and parsing of internal e-book content from EPUB files. E-book metadata and text content are parsed separately and joined together in a tidy, nested tibble data frame. E-book formatting is not completely standardized across all literature. It can be challenging to curate parsed e-book content across an arbitrary collection of e-books perfectly and in completely general form, to yield a singular, consistently formatted output. Many EPUB files do not even contain all the same pieces of information in their respective metadata. EPUB file parsing functionality in this package is intended for relatively general application to arbitrary EPUB e-books. However, poorly formatted e-books or e-books with highly uncommon formatting may not work with this package. There may even be cases where an EPUB file has DRM or some other property that makes it impossible to read with 'epubr'. Text is read 'as is' for the most part. The only nominal changes are minor substitutions, for example curly quotes changed to straight quotes. Substantive changes are expected to be performed subsequently by the user as part of their text analysis. Additional text cleaning can be performed at the user's discretion, such as with functions from packages like 'tm' or 'qdap'.
Maintained by Matthew Leonawicz. Last updated 6 months ago.
epubepub-filesepub-formatpeer-reviewed
10.8 match 24 stars 6.37 score 49 scripts
ggobi
GGally:Extension to 'ggplot2'
The R package 'ggplot2' is a plotting system based on the grammar of graphics. 'GGally' extends 'ggplot2' by adding several functions to reduce the complexity of combining geometric objects with transformed data. Some of these functions include a pairwise plot matrix, a two group pairwise plot matrix, a parallel coordinates plot, a survival plot, and several functions to plot networks.
Maintained by Barret Schloerke. Last updated 10 months ago.
4.3 match 597 stars 16.15 score 17k scripts 154 dependents
nbarrowman
vtree:Display Information About Nested Subsets of a Data Frame
A tool for calculating and drawing "variable trees". Variable trees display information about nested subsets of a data frame.
Maintained by Nick Barrowman. Last updated 2 days ago.
data-sciencedata-visualizationexploratory-data-analysisstatistics
9.6 match 76 stars 7.09 score 65 scripts
cysouw
qlcMatrix:Utility Sparse Matrix Functions for Quantitative Language Comparison
Extension of the functionality of the 'Matrix' package for using sparse matrices. Some of the functions are very general, while others are highly specific to the special data formats used for quantitative language comparison.
Maintained by Michael Cysouw. Last updated 9 months ago.
9.7 match 6 stars 6.98 score 256 scripts 1 dependents
ropensci
beastier:Call 'BEAST2'
'BEAST2' (<https://www.beast2.org>) is a widely used Bayesian phylogenetic tool, that uses DNA/RNA/protein data and many model priors to create a posterior of jointly estimated phylogenies and parameters. 'BEAST2' is a command-line tool. This package provides a way to call 'BEAST2' from an 'R' function call.
Maintained by Richèl J.C. Bilderbeek. Last updated 24 days ago.
bayesianbeastbeast2phylogenetic-inferencephylogeneticsopenjdk
8.6 match 11 stars 7.87 score 47 scripts 4 dependents
wilkelab
cowplot:Streamlined Plot Theme and Plot Annotations for 'ggplot2'
Provides various features that help with creating publication-quality figures with 'ggplot2', such as a set of themes, functions to align plots and arrange them into complex compound figures, and functions that make it easy to annotate plots and/or mix plots with images. The package was originally written for internal use in the Wilke lab, hence the name (Claus O. Wilke's plot package). It has also been used extensively in the book Fundamentals of Data Visualization.
Maintained by Claus O. Wilke. Last updated 2 months ago.
3.5 match 714 stars 18.83 score 75k scripts 1.4k dependents
r-lib
gmailr:Access the 'Gmail' 'RESTful' API
An interface to the 'Gmail' 'RESTful' API. Allows access to your 'Gmail' messages, threads, drafts and labels.
Maintained by Jennifer Bryan. Last updated 1 year ago.
5.8 match 230 stars 11.49 score 289 scripts 1 dependents
fabrice-rossi
mixvlmc:Variable Length Markov Chains with Covariates
Estimates Variable Length Markov Chains (VLMC) models and VLMC with covariates models from discrete sequences. Supports model selection via information criteria and simulation of new sequences from an estimated model. See Bühlmann, P. and Wyner, A. J. (1999) <doi:10.1214/aos/1018031204> for VLMC and Zanin Zambom, A., Kim, S. and Lopes Garcia, N. (2022) <doi:10.1111/jtsa.12615> for VLMC with covariates.
Maintained by Fabrice Rossi. Last updated 10 months ago.
machine-learningmarkov-chainmarkov-modelstatisticstime-seriescpp
10.7 match 2 stars 6.23 score 20 scripts
bnosac
tokenizers.bpe:Byte Pair Encoding Text Tokenization
Unsupervised text tokenizer focused on computational efficiency. Wraps the 'YouTokenToMe' library <https://github.com/VKCOM/YouTokenToMe> which is an implementation of fast Byte Pair Encoding (BPE) <https://aclanthology.org/P16-1162/>.
Maintained by Jan Wijffels. Last updated 2 years ago.
bpebyte-pair-encodingtext-miningtokenizationcpp
14.5 match 15 stars 4.56 score 48 scripts
tidyverse
readr:Read Rectangular Text Data
The goal of 'readr' is to provide a fast and friendly way to read rectangular data (like 'csv', 'tsv', and 'fwf'). It is designed to flexibly parse many types of data found in the wild, while still cleanly failing when data unexpectedly changes.
Maintained by Jennifer Bryan. Last updated 8 months ago.
3.1 match 1.0k stars 21.03 score 132k scripts 2.0k dependents
coolbutuseless
tickle:Easily Build Tcl/Tk UIs
Wrap tcltk to make GUI creation easier.
Maintained by mikefc. Last updated 3 years ago.
11.1 match 125 stars 5.88 score 11 scripts
appsilon
shiny.fluent:Microsoft Fluent UI for Shiny Apps
A rich set of UI components for building Shiny applications, including inputs, containers, overlays, menus, and various utilities. All components from Fluent UI (the underlying JavaScript library) are available and have usage examples in R.
Maintained by Jakub Sobolewski. Last updated 10 months ago.
microsoft-fluent-uireactrhinoverseshiny
6.6 match 280 stars 9.91 score 656 scripts
r-lib
xml2:Parse XML
Bindings to 'libxml2' for working with XML data using a simple, consistent interface based on 'XPath' expressions. Also supports XML schema validation; for 'XSLT' transformations see the 'xslt' package.
Maintained by Jeroen Ooms. Last updated 4 days ago.
3.5 match 220 stars 18.52 score 6.3k scripts 2.3k dependents
r-tmap
tmap:Thematic Maps
Thematic maps are geographical maps in which spatial data distributions are visualized. This package offers a flexible, layer-based, and easy to use approach to create thematic maps, such as choropleths and bubble maps.
Maintained by Martijn Tennekes. Last updated 6 days ago.
choropleth-mapsmapsspatialthematic-mapsvisualisation
3.9 match 880 stars 16.73 score 13k scripts 24 dependents
mannau
boilerpipeR:Interface to the Boilerpipe Java Library
Generic Extraction of main text content from HTML files; removal of ads, sidebars and headers using the boilerpipe <https://github.com/kohlschutter/boilerpipe> Java library. The extraction heuristics from boilerpipe show a robust performance for a wide range of web site templates.
Maintained by Mario Annau. Last updated 4 years ago.
11.7 match 22 stars 5.52 score 30 scripts
nalimilan
tm.plugin.factiva:Import Articles from 'Factiva' Using the 'tm' Text Mining Framework
Provides a 'tm' Source to create corpora from articles exported from the Dow Jones 'Factiva' content provider as XML or HTML files. It is able to read both text content and meta-data information (including source, date, title, author, subject, geographical coverage, company, industry, and various provider-specific fields).
Maintained by Milan Bouchet-Valat. Last updated 18 days ago.
12.5 match 27 stars 5.13 score 11 scripts 1 dependents
mihai-sysbio
glpkAPI:R Interface to C API of GLPK
R Interface to C API of GLPK, depends on GLPK Version >= 4.42.
Maintained by Mihail Anton. Last updated 2 years ago.
10.7 match 5.96 score 51 scripts 12 dependents
kumes
deepRstudio:Seamless Language Translation in 'RStudio' using 'DeepL' API and 'Rstudioapi'
Enhancing cross-language compatibility within the 'RStudio' environment and supporting seamless language understanding, the 'deepRstudio' package leverages the power of the 'DeepL' API (see <https://www.deepl.com/docs-api>) to enable seamless, fast, accurate, and affordable translation of code comments, documents, and text. This package offers the ability to translate selected text into English (EN), as well as from English into various languages, namely Japanese (JA), Chinese (ZH), Spanish (ES), French (FR), Russian (RU), Portuguese (PT), and Indonesian (ID). With much of the text being written in English, the emphasis is on compatibility from English. It is also designed for developers working on multilingual projects and data analysts collaborating with international teams, simplifying the translation process and making code more accessible and comprehensible to people with diverse language backgrounds. This package uses the 'rstudioapi' package and 'DeepL' API, and is simply implemented, executed from addins or via shortcuts on 'RStudio'. With just a few steps, content can be translated between supported languages, promoting better collaboration and expanding the global reach of work. The functionality of this package works only on 'RStudio' using 'rstudioapi'.
Maintained by Satoshi Kume. Last updated 1 years ago.
deepl deeprstudio language-translation rstudio rstudioapi seamless seamless-language translation
18.3 match 2 stars 3.48 score 4 scripts 1 dependents
nalimilan
tm.plugin.lexisnexis:Import Articles from 'LexisNexis' Using the 'tm' Text Mining Framework
Provides a 'tm' Source to create corpora from articles exported from the 'LexisNexis' content provider as HTML files. It is able to read both text content and meta-data information (including source, date, title, author and pages). Note that the file format is highly unstable: there is no warranty that this package will work for your corpus, and you may have to adjust the code to adapt it to your particular format.
Maintained by Milan Bouchet-Valat. Last updated 18 days ago.
12.5 match 27 stars 5.08 score 9 scripts 1 dependents
r-lib
lintr:A 'Linter' for R Code
Checks adherence to a given style, syntax errors and possible semantic issues. Supports on the fly checking of R code edited with 'RStudio IDE', 'Emacs', 'Vim', 'Sublime Text', 'Atom' and 'Visual Studio Code'.
Maintained by Michael Chirico. Last updated 6 hours ago.
3.7 match 1.2k stars 17.00 score 916 scripts 33 dependents
bioc
marray:Exploratory analysis for two-color spotted microarray data
Class definitions for two-color spotted microarray data. Functions for data input, diagnostic plots, normalization and quality checking.
Maintained by Yee Hwa (Jean) Yang. Last updated 5 months ago.
microarray twochannel preprocessing
7.1 match 8.92 score 222 scripts 37 dependents
crunch-io
crunch:Crunch.io Data Tools
The Crunch.io service <https://crunch.io/> provides a cloud-based data store and analytic engine, as well as an intuitive web interface. Using this package, analysts can interact with and manipulate Crunch datasets from within R. Importantly, this allows technical researchers to collaborate naturally with team members, managers, and clients who prefer a point-and-click interface.
Maintained by Greg Freedman Ellis. Last updated 12 days ago.
6.0 match 9 stars 10.53 score 200 scripts 2 dependents
ropensci
pkgmatch:Find R Packages Matching Either Descriptions or Other R Packages
Find R packages matching either descriptions or other R packages.
Maintained by Mark Padgham. Last updated 1 months ago.
embeddings llms natural-language-processing cpp
12.0 match 3 stars 5.23 score
cjbarrie
quiltr:Qualtrics for Labelling Text using R
Functions to convert text data for labelling into a format appropriate for importing into Qualtrics. Supports multiple languages, including right-to-left scripts, as well as different response types. Outputs an Advanced Format .txt file that can be read into Qualtrics.
Maintained by Christopher Barrie. Last updated 3 years ago.
14.6 match 4 stars 4.30 score 9 scripts
renkun-ken
formattable:Create 'Formattable' Data Structures
Provides functions to create formattable vectors and data frames. 'Formattable' vectors are printed with text formatting, and formattable data frames are printed with multiple types of formatting in HTML to improve the readability of data presented in tabular form rendered in web pages.
Maintained by Kun Ren. Last updated 3 months ago.
4.3 match 700 stars 14.69 score 3.6k scripts 26 dependents
bioc
debrowser:Interactive Differential Expression Analysis Browser
Bioinformatics platform containing interactive plots and tables for differential gene and region expression studies. Allows visualizing expression data much more deeply in an interactive and faster way. By changing the parameters, users can easily discover different parts of the data in ways that have not been possible before. Manually creating and inspecting these plots takes time. With DEBrowser users can prepare plots without writing any code. Differential expression, PCA and clustering analyses are performed on site and the results are shown in various plots such as scatter, bar, box, volcano, MA plots and heatmaps.
Maintained by Alper Kucukural. Last updated 5 months ago.
sequencing chipseq rnaseq differentialexpression geneexpression clustering immunooncology
8.0 match 61 stars 7.80 score 65 scripts
myeomans
politeness:Detecting Politeness Features in Text
Detecting markers of politeness in English natural language. This package allows researchers to easily visualize and quantify politeness between groups of documents. This package combines prior research on the linguistic markers of politeness. We thank the Spencer Foundation, the Hewlett Foundation, and Harvard's Institute for Quantitative Social Science for support.
Maintained by Mike Yeomans. Last updated 1 months ago.
8.3 match 25 stars 7.49 score 41 scripts 1 dependents
rstudio
learnr:Interactive Tutorials for R
Create interactive tutorials using R Markdown. Use a combination of narrative, figures, videos, exercises, and quizzes to create self-paced tutorials for learning about R and R packages.
Maintained by Garrick Aden-Buie. Last updated 6 months ago.
interactive python rmarkdown shiny sql teaching tutorial
4.2 match 713 stars 14.79 score 6.5k scripts 27 dependents
cran
textreg:n-Gram Text Regression, aka Concise Comparative Summarization
Function for sparse regression on raw text, regressing a labeling vector onto a feature space consisting of all possible phrases.
Maintained by Luke Miratrix. Last updated 6 years ago.
19.0 match 1 stars 3.26 score
bnosac
word2vec:Distributed Representations of Words
Learn vector representations of words by continuous bag of words and skip-gram implementations of the 'word2vec' algorithm. The techniques are detailed in the paper "Distributed Representations of Words and Phrases and their Compositionality" by Mikolov et al. (2013), available at <arXiv:1310.4546>.
Maintained by Jan Wijffels. Last updated 1 years ago.
embeddings natural-language-processing word2vec cpp
7.4 match 70 stars 8.36 score 227 scripts 6 dependents
rstudio
tfdatasets:Interface to 'TensorFlow' Datasets
Interface to 'TensorFlow' Datasets, a high-level library for building complex input pipelines from simple, re-usable pieces. See <https://www.tensorflow.org/guide> for additional details.
Maintained by Tomasz Kalinowski. Last updated 5 days ago.
6.7 match 34 stars 9.32 score 656 scripts 3 dependents
inlabru-org
inlabru:Bayesian Latent Gaussian Modelling using INLA and Extensions
Facilitates spatial and general latent Gaussian modeling using integrated nested Laplace approximation via the INLA package (<https://www.r-inla.org>). Additionally, extends the GAM-like model class to more general nonlinear predictor expressions, and implements a log Gaussian Cox process likelihood for modeling univariate and spatial point processes based on ecological survey data. Model components are specified with general inputs and mapping methods to the latent variables, and the predictors are specified via general R expressions, with separate expressions for each observation likelihood model in multi-likelihood models. A prediction method based on fast Monte Carlo sampling allows posterior prediction of general expressions of the latent variables. Ecology-focused introduction in Bachl, Lindgren, Borchers, and Illian (2019) <doi:10.1111/2041-210X.13168>.
Maintained by Finn Lindgren. Last updated 11 hours ago.
4.9 match 96 stars 12.61 score 832 scripts 6 dependents
yingjie4science
SDGdetector:Detect SDGs and Targets in Text
Identify 17 Sustainable Development Goals and associated 169 targets in text.
Maintained by Yingjie Li. Last updated 6 months ago.
sdg sdgs sustainability sustainable-development-goals text-mining
14.9 match 14 stars 4.15 score 10 scripts
frareb
inpdfr:Analyse Text Documents Using Ecological Tools
A set of functions to analyse and compare texts, using classical text mining functions, as well as those from theoretical ecology.
Maintained by Rebaudo Francois. Last updated 2 years ago.
14.0 match 2 stars 4.41 score 26 scripts
manalytics
opitools:Analyzing the Opinions in a Big Text Document
Designed for performing impact analysis of opinions in a digital text document (DTD). The package allows a user to assess the extent to which a theme or subject within a document impacts the overall opinion expressed in the document. The package can be applied to a wide range of opinion-based DTDs, including commentaries on social media platforms (such as 'Facebook', 'Twitter' and 'Youtube'), online product reviews, and so on. The utility of 'opitools' was originally demonstrated in Adepeju and Jimoh (2021) <doi:10.31235/osf.io/c32qh> in the assessment of COVID-19 impacts on neighbourhood policing using Twitter data. Further examples can be found in the vignette of the package.
Maintained by Monsuru Adepeju. Last updated 2 years ago.
11.5 match 12 stars 5.30 score 11 scripts
openanalytics
clinUtils:General Utility Functions for Analysis of Clinical Data
Utility functions to facilitate the import, the reporting and analysis of clinical data. Example datasets in 'SDTM' and 'ADaM' format, containing a subset of patients/domains from the 'CDISC Pilot 01 study' are also available as R datasets to demonstrate the package functionalities.
Maintained by Laure Cougnaud. Last updated 10 months ago.
9.0 match 3 stars 6.78 score 105 scripts 3 dependents
gegznav
spAddins:RStudio Add-ins to Format R Markdown files (RETIRED PACKAGE)
The development of `spAddins` ended in 2018 as the package retired in favor of packages `addins.rmd` and `addins.rs`. ... RStudio Add-ins to Format Text and Insert Operators ... A set of RStudio addins that are designed to be used in combination with user-defined RStudio keyboard shortcuts. These addins either: 1) insert text at a cursor position (e.g. insert operators %>%, <<-, %$%, etc.), 2) replace symbols in selected pieces of text (e.g., convert backslashes to forward slashes which results in strings like "c:\data\" converted into "c:/data/") or 3) enclose text with special symbols (e.g., converts "bold" into "**bold**") which is convenient for editing R Markdown files.
Maintained by Vilmantas Gegzna. Last updated 4 years ago.
13.2 match 8 stars 4.60 score 8 scripts
satijalab
Seurat:Tools for Single Cell Genomics
A toolkit for quality control, analysis, and exploration of single cell RNA sequencing data. 'Seurat' aims to enable users to identify and interpret sources of heterogeneity from single cell transcriptomic measurements, and to integrate diverse types of single cell data. See Satija R, Farrell J, Gennert D, et al (2015) <doi:10.1038/nbt.3192>, Macosko E, Basu A, Satija R, et al (2015) <doi:10.1016/j.cell.2015.05.002>, Stuart T, Butler A, et al (2019) <doi:10.1016/j.cell.2019.05.031>, and Hao, Hao, et al (2020) <doi:10.1101/2020.10.12.335331> for more details.
Maintained by Paul Hoffman. Last updated 1 years ago.
human-cell-atlas single-cell-genomics single-cell-rna-seq cpp
3.5 match 2.4k stars 16.86 score 50k scripts 73 dependents
dankelley
oce:Analysis of Oceanographic Data
Supports the analysis of Oceanographic data, including 'ADCP' measurements, measurements made with 'argo' floats, 'CTD' measurements, sectional data, sea-level time series, coastline and topographic data, etc. Provides specialized functions for calculating seawater properties such as potential temperature in either the 'UNESCO' or 'TEOS-10' equation of state. Produces graphical displays that conform to the conventions of the Oceanographic literature. This package is discussed extensively by Kelley (2018) "Oceanographic Analysis with R" <doi:10.1007/978-1-4939-8844-0>.
Maintained by Dan Kelley. Last updated 2 days ago.
3.9 match 146 stars 15.42 score 4.2k scripts 18 dependents
rrwen
draw:Wrapper Functions for Producing Graphics
A set of user-friendly wrapper functions for creating consistent graphics and diagrams with lines, common shapes, text, and page settings. Compatible with and based on the R 'grid' package.
Maintained by Richard Wen. Last updated 7 years ago.
box circle curve diagram draw graphics grid line page rectangle reproducible shape square text triangle
13.5 match 2 stars 4.39 score 35 scripts
atfutures
calendar:Create, Read, Write, and Work with 'iCalendar' Files, Calendars and Scheduling Data
Provides functions to create, read, write, and work with 'iCalendar' files (which typically have '.ics' or '.ical' extensions), and the scheduling data, calendars and timelines of people, organisations and other entities that they represent. 'iCalendar' is an open standard for exchanging calendar and scheduling information between users and computers, described at <https://icalendar.org/>.
Maintained by Robin Lovelace. Last updated 7 months ago.
7.1 match 42 stars 8.33 score 113 scripts 1 dependents