R-universe search: language

ropensci

charlatan:Make Fake Data

Make fake data that looks realistic, supporting addresses, person names, dates, times, colors, coordinates, currencies, digital object identifiers ('DOIs'), jobs, phone numbers, 'DNA' sequences, doubles and integers from distributions and within a range.

Maintained by Roel M. Hogervorst. Last updated 1 month ago.

data dataset fake-data faker peer-reviewed

74.3 match 296 stars 10.06 score 180 scripts 1 dependents

ropensci

lingtypology:Linguistic Typology and Mapping

Provides R with the Glottolog database <https://glottolog.org/> and some additional tools for linguistic mapping. The Glottolog database contains the catalogue of languages of the world. This package helps researchers make linguistic maps, following the philosophy of the Cross-Linguistic Linked Data project <https://clld.org/>, which facilitates uniform access to the data across publications. A tutorial for this package is available on GitHub pages <https://docs.ropensci.org/lingtypology/> and in the package vignette. Maps created by this package can be used for both research and linguistic teaching. In addition, the package can download data from typological databases such as WALS, AUTOTYP and some others, and lets you create your own database website.

Maintained by George Moroz. Last updated 5 months ago.

abvd afbo atlas autotype bivaltyp clld glottolog-database linguistic-maps linguistics phoible sails typology wals

69.7 match 51 stars 9.58 score 694 scripts

bnosac

udpipe:Tokenization, Parts of Speech Tagging, Lemmatization and Dependency Parsing with the 'UDPipe' 'NLP' Toolkit

This natural language processing toolkit provides language-agnostic 'tokenization', 'parts of speech tagging', 'lemmatization' and 'dependency parsing' of raw text. Next to text parsing, the package also allows you to train annotation models based on data of 'treebanks' in 'CoNLL-U' format as provided at <https://universaldependencies.org/format.html>. The techniques are explained in detail in the paper: 'Tokenizing, POS Tagging, Lemmatizing and Parsing UD 2.0 with UDPipe', available at <doi:10.18653/v1/K17-3009>. The toolkit also contains functionalities for commonly used data manipulations on texts which are enriched with the output of the parser. These include functionalities and algorithms for collocations, token co-occurrence, document term matrix handling, term frequency inverse document frequency calculations, information retrieval metrics (Okapi BM25), handling of multi-word expressions, keyword detection (Rapid Automatic Keyword Extraction, noun phrase extraction, syntactical patterns), sentiment scoring and semantic similarity analysis.

Maintained by Jan Wijffels. Last updated 2 years ago.

conll dependency-parser lemmatization natural-language-processing nlp pos-tagging r-pkg rcpp text-mining tokenizer udpipe cpp

29.6 match 215 stars 11.83 score 1.2k scripts 9 dependents

ropensci

googleLanguageR:Call Google's 'Natural Language' API, 'Cloud Translation' API, 'Cloud Speech' API and 'Cloud Text-to-Speech' API

Call 'Google Cloud' machine learning APIs for text and speech tasks. Call the 'Cloud Translation' API <https://cloud.google.com/translate/> for detection and translation of text, the 'Natural Language' API <https://cloud.google.com/natural-language/> to analyse text for sentiment, entities or syntax, the 'Cloud Speech' API <https://cloud.google.com/speech/> to transcribe sound files to text and the 'Cloud Text-to-Speech' API <https://cloud.google.com/text-to-speech/> to turn text into sound files.

Maintained by Mark Edmondson. Last updated 8 months ago.

cloud-speech-api cloud-translation-api google-api-client google-cloud google-cloud-speech google-nlp googleauthr natural-language-processing peer-reviewed sentiment-analysis speech-api translation-api

28.1 match 196 stars 10.36 score 268 scripts 3 dependents

tomeriko96

polyglotr:Translate Text

Provide easy methods to translate pieces of text. Functions send requests to translation services online.

Maintained by Tomer Iwan. Last updated 1 month ago.

google-translate googletranslate language linguee mymemory-api mymemorytranslator pons translation translations-api

28.9 match 33 stars 7.61 score 34 scripts 1 dependents

psychbruce

FMAT:The Fill-Mask Association Test

The Fill-Mask Association Test ('FMAT') <doi:10.1037/pspa0000396> is an integrative and probability-based method using Masked Language Models to measure conceptual associations (e.g., attitudes, biases, stereotypes, social norms, cultural values) as propositions in natural language. Supported language models include 'BERT' <doi:10.48550/arXiv.1810.04805> and its variants available at 'Hugging Face' <https://huggingface.co/models?pipeline_tag=fill-mask>. Methodological references and installation guidance are provided at <https://psychbruce.github.io/FMAT/>.

Maintained by Han-Wu-Shuang Bao. Last updated 5 months ago.

ai artificial-intelligence bert bert-model bert-models contextualized-representation fill-in-the-blank fill-mask huggingface language-model language-models large-language-models masked-language-models natural-language-processing natural-language-understanding nlp pretrained-models transformer transformers

42.6 match 12 stars 4.82 score 2 scripts

oscarkjell

text:Analyses of Text using Transformers Models from HuggingFace, Natural Language Processing and Machine Learning

Link R with Transformers from Hugging Face to transform text variables into word embeddings, which are then used to statistically test the mean difference between sets of texts, compute semantic similarity scores between texts, predict numerical variables, and visualize statistically significant words according to various dimensions. For more information see <https://www.r-text.org>.

Maintained by Oscar Kjell. Last updated 2 days ago.

deep-learning machine-learning nlp transformers openjdk

15.0 match 146 stars 13.16 score 436 scripts 1 dependents

vgherard

kgrams:Classical k-gram Language Models

Training and evaluating k-gram language models in R, supporting several probability smoothing techniques, perplexity computations, random text generation and more.

Maintained by Valerio Gherardi. Last updated 4 months ago.

language-models n-grams natural-language-processing cpp

32.1 match 7 stars 5.17 score 14 scripts 1 dependents

gagolews

stringi:Fast and Portable Character String Processing Facilities

A collection of character string/text/natural language processing tools for pattern searching (e.g., with 'Java'-like regular expressions or the 'Unicode' collation algorithm), random string generation, case mapping, string transliteration, concatenation, sorting, padding, wrapping, Unicode normalisation, date-time formatting and parsing, and many more. They are fast, consistent, convenient, and - thanks to 'ICU' (International Components for Unicode) - portable across all locales and platforms. Documentation about 'stringi' is provided via its website at <https://stringi.gagolewski.com/> and the paper by Gagolewski (2022, <doi:10.18637/jss.v103.i02>).

Maintained by Marek Gagolewski. Last updated 1 month ago.

icu icu4c natural-language-processing nlp regex regexp string-manipulation stringi stringr text text-processing tidy-data unicode cpp

9.0 match 309 stars 18.31 score 10k scripts 8.6k dependents

ropensci

cld2:Google's Compact Language Detector 2

Bindings to Google's C++ library Compact Language Detector 2 (see <https://github.com/cld2owners/cld2#readme> for more information). Probabilistically detects over 80 languages in plain text or HTML. For mixed-language input it returns the top three detected languages and their approximate proportion of the total classified text bytes (e.g. 80% English and 20% French out of 1000 bytes). There is also a 'cld3' package on CRAN which uses a neural network model instead.

Maintained by Jeroen Ooms. Last updated 5 months ago.

cld cld2 language-detection language-detector cpp

20.9 match 38 stars 7.74 score 161 scripts 3 dependents

davisvaughan

treesitter:Bindings to 'Tree-Sitter'

Provides bindings to 'Tree-sitter', an incremental parsing system for programming tools. 'Tree-sitter' builds concrete syntax trees for source files of any language, and can efficiently update those syntax trees as the source file is edited. It also includes a robust error recovery system that provides useful parse results even in the presence of syntax errors.

Maintained by Davis Vaughan. Last updated 6 months ago.

23.1 match 37 stars 6.62 score 18 scripts 2 dependents

lepennec

ggwordcloud:A Word Cloud Geom for 'ggplot2'

Provides a word cloud text geom for 'ggplot2'. Texts are placed so that they do not overlap as in 'ggrepel'. The algorithm used is a variation around the one of 'wordcloud2.js'.

Maintained by Erwan Le Pennec. Last updated 10 months ago.

cpp

14.3 match 174 stars 10.38 score 1.3k scripts 15 dependents

reditorsupport

languageserver:Language Server Protocol

An implementation of the Language Server Protocol for R. The Language Server protocol is used by an editor client to integrate features like auto completion. See <https://microsoft.github.io/language-server-protocol/> for details.

Maintained by Randy Lai. Last updated 1 year ago.

language-server-protocol

14.7 match 607 stars 9.93 score 207 scripts 1 dependents

yihui

knitr:A General-Purpose Package for Dynamic Report Generation in R

Provides a general-purpose tool for dynamic report generation in R using Literate Programming techniques.

Maintained by Yihui Xie. Last updated 8 hours ago.

dynamic-documents knitr literate-programming rmarkdown sweave

6.0 match 2.4k stars 23.62 score 116k scripts 4.2k dependents

ropensci

cld3:Google's Compact Language Detector 3

Google's Compact Language Detector 3 is a neural network model for language identification and the successor of 'cld2' (available from CRAN). The algorithm is still experimental and takes a novel approach to language detection with different properties and outcomes. It can be useful to combine this with the Bayesian classifier results from 'cld2'. See <https://github.com/google/cld3#readme> for more information.

Maintained by Jeroen Ooms. Last updated 5 months ago.

cld cld3 language-detection language-detector protobuf cpp

20.8 match 41 stars 6.55 score 85 scripts 1 dependents

gabrielkaiserqfin

perplexR:A Coding Assistant using Perplexity's Large Language Models

A coding assistant using Perplexity's Large Language Models <https://www.perplexity.ai/> API. A set of functions and 'RStudio' add-ins that aim to help R developers.

Maintained by Gabriel Kaiser. Last updated 2 months ago.

31.8 match 6 stars 4.09 score 1 scripts

aphalo

learnrbook:Datasets and Code Examples from P. J. Aphalo's "Learn R" Book

Data, scripts and code from chunks used as examples in the book "Learn R: As a Language" 1ed and 2ed by Pedro J. Aphalo. ISBN 9780367182533 (pbk 1ed); ISBN 9780367182557 (hbk 1ed); ISBN 9780429060342 (ebk 1ed).

Maintained by Pedro J. Aphalo. Last updated 7 months ago.

book

27.4 match 1 stars 4.57 score 25 scripts

cran

languageR:Analyzing Linguistic Data: A Practical Introduction to Statistics

Data sets exemplifying statistical methods, and some facilitatory utility functions used in "Analyzing Linguistic Data: A practical introduction to statistics using R", Cambridge University Press, 2008.

Maintained by R. H. Baayen. Last updated 6 years ago.

53.6 match 2.32 score

appsilon

shiny.i18n:Shiny Applications Internationalization

Provides easy internationalization of Shiny applications. It can also be used as a standalone translation package to translate reports, interactive visualizations or graphical elements.

Maintained by Jakub Nowicki. Last updated 11 months ago.

internationalization language rhinoverse shiny translation

11.9 match 168 stars 9.97 score 312 scripts 6 dependents

kjhealy

gssrdoc:Document General Social Survey Variable

The General Social Survey (GSS) is a long-running, mostly annual survey of US households. It is administered by the National Opinion Research Center (NORC). This package contains a tibble with information on the survey variables, together with every variable documented as an R help page. For more information on the GSS see <http://gss.norc.org>.

Maintained by Kieran Healy. Last updated 11 months ago.

51.6 match 2.28 score 38 scripts

juliasilge

tidytext:Text Mining using 'dplyr', 'ggplot2', and Other Tidy Tools

Using tidy data principles can make many text mining tasks easier, more effective, and consistent with tools already in wide use. Much of the infrastructure needed for text mining with tidy data frames already exists in packages like 'dplyr', 'broom', 'tidyr', and 'ggplot2'. In this package, we provide functions and supporting data sets to allow conversion of text to and from tidy formats, and to switch seamlessly between tidy tools and existing text mining packages.

Maintained by Julia Silge. Last updated 11 months ago.

natural-language-processing text-mining tidy-data tidyverse

6.7 match 1.2k stars 16.86 score 17k scripts 61 dependents

quanteda

quanteda:Quantitative Analysis of Textual Data

A fast, flexible, and comprehensive framework for quantitative text analysis in R. Provides functionality for corpus management, creating and manipulating tokens and n-grams, exploring keywords in context, forming and manipulating sparse matrices of documents by features and feature co-occurrences, analyzing keywords, computing feature similarities and distances, applying content dictionaries, applying supervised and unsupervised machine learning, visually representing text and text analyses, and more.

Maintained by Kenneth Benoit. Last updated 2 months ago.

corpus natural-language-processing quanteda text-analytics onetbb cpp

6.7 match 851 stars 16.68 score 5.4k scripts 51 dependents

quanteda

stopwords:Multilingual Stopword Lists

Provides multiple sources of stopwords, for use in text analysis and natural language processing.

Maintained by Kenneth Benoit. Last updated 3 years ago.

text-analysis

9.6 match 114 stars 10.54 score 1.1k scripts 65 dependents

rstudio

learnr:Interactive Tutorials for R

Create interactive tutorials using R Markdown. Use a combination of narrative, figures, videos, exercises, and quizzes to create self-paced tutorials for learning about R and R packages.

Maintained by Garrick Aden-Buie. Last updated 6 months ago.

interactive python rmarkdown shiny sql teaching tutorial

6.7 match 713 stars 14.79 score 6.5k scripts 27 dependents

nimble-dev

nimble:MCMC, Particle Filtering, and Programmable Hierarchical Modeling

A system for writing hierarchical statistical models largely compatible with 'BUGS' and 'JAGS', writing nimbleFunctions to operate models and do basic R-style math, and compiling both models and nimbleFunctions via custom-generated C++. 'NIMBLE' includes default methods for MCMC, Laplace Approximation, Monte Carlo Expectation Maximization, and some other tools. The nimbleFunction system makes it easy to do things like implement new MCMC samplers from R, customize the assignment of samplers to different parts of a model from R, and compile the new samplers automatically via C++ alongside the samplers 'NIMBLE' provides. 'NIMBLE' extends the 'BUGS'/'JAGS' language by making it extensible: New distributions and functions can be added, including as calls to external compiled code. Although most people think of MCMC as the main goal of the 'BUGS'/'JAGS' language for writing models, one can use 'NIMBLE' for writing arbitrary other kinds of model-generic algorithms as well. A full User Manual is available at <https://r-nimble.org>.

Maintained by Christopher Paciorek. Last updated 3 days ago.

bayesian-inference bayesian-methods hierarchical-models mcmc probabilistic-programming openblas cpp

7.6 match 169 stars 12.97 score 2.6k scripts 19 dependents

edjnet

tidywikidatar:Explore 'Wikidata' Through Tidy Data Frames

Query 'Wikidata' API <https://www.wikidata.org/wiki/Wikidata:Main_Page> with ease, get tidy data frames in response, and cache data in a local database.

Maintained by Giorgio Comai. Last updated 8 months ago.

wikidata

12.4 match 26 stars 7.86 score 46 scripts 2 dependents

cysouw

qlcMatrix:Utility Sparse Matrix Functions for Quantitative Language Comparison

Extension of the functionality of the 'Matrix' package for using sparse matrices. Some of the functions are very general, while others are highly specific to the special data formats used for quantitative language comparison.

Maintained by Michael Cysouw. Last updated 9 months ago.

13.7 match 6 stars 6.98 score 256 scripts 1 dependents

boost-r

mboost:Model-Based Boosting

Functional gradient descent algorithm (boosting) for optimizing general risk functions utilizing component-wise (penalised) least squares estimates or regression trees as base-learners for fitting generalized linear, additive and interaction models to potentially high-dimensional data. Models and algorithms are described in <doi:10.1214/07-STS242>, a hands-on tutorial is available from <doi:10.1007/s00180-012-0382-5>. The package allows user-specified loss functions and base-learners.

Maintained by Torsten Hothorn. Last updated 4 months ago.

boosting-algorithms gam glm machine-learning mboost modelling r-language tutorials variable-selection openblas

7.5 match 72 stars 12.70 score 540 scripts 27 dependents

taylor-arnold

cleanNLP:A Tidy Data Model for Natural Language Processing

Provides a set of fast tools for converting a textual corpus into a set of normalized tables. Users may make use of the 'udpipe' back end with no external dependencies, or a Python back end with 'spaCy' <https://spacy.io>. Exposed annotation tasks include tokenization, part of speech tagging, named entity recognition, and dependency parsing.

Maintained by Taylor B. Arnold. Last updated 10 months ago.

corenlp natural-language-processing spacy

11.3 match 214 stars 8.39 score 229 scripts

miraisolutions

XLConnect:Excel Connector for R

Provides comprehensive functionality to read, write and format Excel data.

Maintained by Martin Studer. Last updated 16 days ago.

cross-platform excel r-language xlconnect openjdk

7.5 match 130 stars 12.28 score 1.2k scripts 1 dependents

dselivanov

text2vec:Modern Text Mining Framework for R

Fast and memory-friendly tools for text vectorization, topic modeling (LDA, LSA), word embeddings (GloVe), similarities. This package provides a source-agnostic streaming API, which allows researchers to perform analysis of collections of documents which are larger than available RAM. All core functions are parallelized to benefit from multicore machines.

Maintained by Dmitriy Selivanov. Last updated 7 months ago.

glove latent-dirichlet-allocation natural-language-processing text-mining topic-modeling vectorization word-embeddings word2vec cpp

6.7 match 860 stars 13.48 score 1.3k scripts 23 dependents

glottospace

glottospace:Language Mapping and Geospatial Analysis of Linguistic and Cultural Data

Streamlined workflows for geolinguistic analysis, including: accessing global linguistic and cultural databases, data import, data entry, data cleaning, data exploration, mapping, visualization and export.

Maintained by Rui Dong. Last updated 3 months ago.

15.5 match 23 stars 5.54 score 6 scripts

usepa

elevatr:Access Elevation Data from Various APIs

Several web services are available that provide access to elevation data. This package provides access to many of those services and returns elevation data either as an 'sf' simple features object from point elevation services or as a 'raster' object from raster elevation services. In future versions, 'elevatr' will drop support for 'raster' and will instead return 'terra' objects. Currently, the package supports access to the Amazon Web Services Terrain Tiles <https://registry.opendata.aws/terrain-tiles/>, the Open Topography Global Datasets API <https://opentopography.org/developers/>, and the USGS Elevation Point Query Service <https://apps.nationalmap.gov/epqs/>.

Maintained by Jeffrey Hollister. Last updated 6 months ago.

digital-elevation-model elevation-data elevatr epa mapzen-elevation-service r-language

7.5 match 206 stars 11.11 score 1.3k scripts 3 dependents

kumes

deepRstudio:Seamless Language Translation in 'RStudio' using 'DeepL' API and 'Rstudioapi'

Enhancing cross-language compatibility within the 'RStudio' environment and supporting seamless language understanding, the 'deepRstudio' package leverages the power of the 'DeepL' API (see <https://www.deepl.com/docs-api>) to enable seamless, fast, accurate, and affordable translation of code comments, documents, and text. This package offers the ability to translate selected text into English (EN), as well as from English into various languages, namely Japanese (JA), Chinese (ZH), Spanish (ES), French (FR), Russian (RU), Portuguese (PT), and Indonesian (ID). With much of the text being written in English, the emphasis is on compatibility from English. It is also designed for developers working on multilingual projects and data analysts collaborating with international teams, simplifying the translation process and making code more accessible and comprehensible to people with diverse language backgrounds. This package uses the 'rstudioapi' package and 'DeepL' API, and is simply implemented, executed from addins or via shortcuts on 'RStudio'. With just a few steps, content can be translated between supported languages, promoting better collaboration and expanding the global reach of work. The functionality of this package works only on 'RStudio' using 'rstudioapi'.

Maintained by Satoshi Kume. Last updated 1 year ago.

deepl deeprstudio language-translation rstudio rstudioapi seamless seamless-language translation

23.3 match 2 stars 3.48 score 4 scripts 1 dependents

edubruell

tidyllm:Tidy Integration of Large Language Models

A tidy interface for integrating large language model (LLM) APIs such as 'Claude', 'Openai', 'Groq', 'Mistral' and local models via 'Ollama' into R workflows. The package supports text and media-based interactions, interactive message history, batch request APIs, and a tidy, pipeline-oriented interface for streamlined integration into data workflows. Web services are available at <https://www.anthropic.com>, <https://openai.com>, <https://groq.com>, <https://mistral.ai/> and <https://ollama.com>.

Maintained by Eduard Brüll. Last updated 3 days ago.

10.2 match 68 stars 7.82 score 26 scripts

lightbridge-ks

thaipdf:R Markdown to PDF in Thai Language

Provide R Markdown templates and LaTeX preamble which are necessary for creating PDF from R Markdown documents in Thai language.

Maintained by Kittipos Sirivongrungson. Last updated 3 years ago.

latex-template pdf-document rmarkdown thai thai-language

17.8 match 5 stars 4.40 score 1 scripts

easystats

effectsize:Indices of Effect Size

Provide utilities to work with indices of effect size for a wide variety of models and hypothesis tests (see list of supported models using the function 'insight::supported_models()'), allowing computation of and conversion between indices such as Cohen's d, r, odds, etc. References: Ben-Shachar et al. (2020) <doi:10.21105/joss.02815>.

Maintained by Mattan S. Ben-Shachar. Last updated 1 month ago.

anova cohens-d compute conversion correlation effect-size effectsize hacktoberfest hedges-g interpretation standardization standardized statistics

4.7 match 344 stars 16.38 score 1.8k scripts 29 dependents

statisticsgreenland

pxmake:Make PX-Files in R

Create PX-files from scratch or read and modify existing ones. Includes a function for every PX keyword, making metadata manipulation simple and human-readable.

Maintained by Johan Ejstrud. Last updated 10 days ago.

11.0 match 9 stars 6.95 score 11 scripts

jozefhajnala

languageserversetup:Automated Setup and Auto Run for R Language Server

Installs the R 'languageserver' with all dependencies into a separate library and uses that independent installation automatically when R is instantiated as a language server process. Useful for making the language server seamless to use without running into package version conflicts.

Maintained by Jozef Hajnala. Last updated 4 years ago.

language-server-protocol

17.4 match 30 stars 4.32 score 2 scripts

vincentarelbundock

countrycode:Convert Country Names and Country Codes

Standardize country names, convert them into one of 40 different coding schemes, convert between coding schemes, and assign region descriptors.

Maintained by Vincent Arel-Bundock. Last updated 2 months ago.

5.0 match 351 stars 14.80 score 6.3k scripts 119 dependents

tidyverse

purrr:Functional Programming Tools

A complete and consistent functional programming toolkit for R.

Maintained by Hadley Wickham. Last updated 1 month ago.

functional-programming

3.3 match 1.3k stars 22.12 score 59k scripts 6.9k dependents

juliainterop

JuliaCall:Seamless Integration Between R and 'Julia'

Provides an R interface to 'Julia', which is a high-level, high-performance dynamic programming language for numerical computing, see <https://julialang.org/> for more information. It provides a high-level interface as well as a low-level interface. Using the high level interface, you could call any 'Julia' function just like any R function with automatic type conversion. Using the low level interface, you could deal with C-level SEXP directly while enjoying the convenience of using a high-level programming language like 'Julia'.

Maintained by Changcheng Li. Last updated 3 months ago.

julia cpp

6.0 match 270 stars 12.33 score 380 scripts 8 dependents

psychbruce

PsychWordVec:Word Embedding Research Framework for Psychological Science

An integrative toolbox of word embedding research that provides: (1) a collection of 'pre-trained' static word vectors in the '.RData' compressed format <https://psychbruce.github.io/WordVector_RData.pdf>; (2) a series of functions to process, analyze, and visualize word vectors; (3) a range of tests to examine conceptual associations, including the Word Embedding Association Test <doi:10.1126/science.aal4230> and the Relative Norm Distance <doi:10.1073/pnas.1720347115>, with permutation test of significance; (4) a set of training methods to locally train (static) word vectors from text corpora, including 'Word2Vec' <arXiv:1301.3781>, 'GloVe' <doi:10.3115/v1/D14-1162>, and 'FastText' <arXiv:1607.04606>; (5) a group of functions to download 'pre-trained' language models (e.g., 'GPT', 'BERT') and extract contextualized (dynamic) word vectors (based on the R package 'text').

Maintained by Han-Wu-Shuang Bao. Last updated 1 year ago.

18.1 match 22 stars 4.04 score 10 scripts

hofnerb

stabs:Stability Selection with Error Control

Resampling procedures to assess the stability of selected variables with additional finite sample error control for high-dimensional variable selection procedures such as Lasso or boosting. Both standard stability selection (Meinshausen & Buhlmann, 2010, <doi:10.1111/j.1467-9868.2010.00740.x>) and complementary pairs stability selection with improved error bounds (Shah & Samworth, 2013, <doi:10.1111/j.1467-9868.2011.01034.x>) are implemented. The package can be combined with arbitrary user-specified variable selection approaches.

Maintained by Benjamin Hofner. Last updated 4 years ago.

machine-learning r-language resampling stability-selection variable-importance variable-selection

7.5 match 26 stars 9.59 score 53 scripts 31 dependents

mlampros

fastText:Efficient Learning of Word Representations and Sentence Classification

An interface to the 'fastText' <https://github.com/facebookresearch/fastText> library for efficient learning of word representations and sentence classification. The 'fastText' algorithm is explained in detail in (i) "Enriching Word Vectors with subword Information", Piotr Bojanowski, Edouard Grave, Armand Joulin, Tomas Mikolov, 2017, <doi:10.1162/tacl_a_00051>; (ii) "Bag of Tricks for Efficient Text Classification", Armand Joulin, Edouard Grave, Piotr Bojanowski, Tomas Mikolov, 2017, <doi:10.18653/v1/e17-2068>; (iii) "FastText.zip: Compressing text classification models", Armand Joulin, Edouard Grave, Piotr Bojanowski, Matthijs Douze, Herve Jegou, Tomas Mikolov, 2016, <arXiv:1612.03651>.

Maintained by Lampros Mouselimis. Last updated 1 year ago.

cpp11 fasttext cpp

9.6 match 42 stars 7.37 score 56 scripts

ropensci

rnaturalearth:World Map Data from Natural Earth

Facilitates mapping by making natural earth map data from <https://www.naturalearthdata.com/> more easily available to R users.

Maintained by Philippe Massicotte. Last updated 15 days ago.

peer-reviewed

4.6 match 232 stars 15.35 score 7.2k scripts 47 dependents

bioc

sevenbridges:Seven Bridges Platform API Client and Common Workflow Language Tool Builder in R

R client and utilities for Seven Bridges platform API, from Cancer Genomics Cloud to other Seven Bridges supported platforms.

Maintained by Phil Webster. Last updated 5 months ago.

software dataimport thirdpartyclient api-client bioconductor bioinformatics cloud common-workflow-language sevenbridges

9.4 match 35 stars 7.40 score 24 scripts

stevenmmortimer

salesforcer:An Implementation of 'Salesforce' APIs Using Tidy Principles

Functions connecting to the 'Salesforce' Platform APIs (REST, SOAP, Bulk 1.0, Bulk 2.0, Metadata, Reports and Dashboards) <https://trailhead.salesforce.com/content/learn/modules/api_basics/api_basics_overview>. "API" is an acronym for "application programming interface". Almost all calls from these APIs are supported as they use CSV, XML or JSON data that can be parsed into R data structures. For more details, documentation, and examples, please see the 'Salesforce' API documentation and this package's website <https://stevenmmortimer.github.io/salesforcer/>.

Maintained by Steven M. Mortimer. Last updated 4 months ago.

api-wrappers r-language r-programming salesforce salesforce-apis

7.5 match 82 stars 9.27 score 191 scripts

stevecondylios

dictionaRy:Retrieve the Dictionary Definitions of English Words

An R interface to the 'Free Dictionary API' <https://dictionaryapi.dev/>, <https://github.com/meetDeveloper/freeDictionaryAPI>. Retrieve dictionary definitions for English words, as well as additional information including phonetics, part of speech, origins, audio pronunciation, example usage, synonyms and antonyms, returned in 'tidy' format for ease of use.

Maintained by Steve Condylios. Last updated 3 years ago.

literature natural-language-processing r-language

14.2 match 6 stars 4.86 score 240 scripts

hadley

pryr:Tools for Computing on the Language

Useful tools to pry back the covers of R and understand the language at a deeper level.

Maintained by Hadley Wickham. Last updated 1 year ago.

cpp

5.8 match 204 stars 11.85 score 1.9k scripts 56 dependents

tidyverse

glue:Interpreted String Literals

An implementation of interpreted string literals, inspired by Python's Literal String Interpolation <https://www.python.org/dev/peps/pep-0498/> and Docstrings <https://www.python.org/dev/peps/pep-0257/> and Julia's Triple-Quoted String Literals <https://docs.julialang.org/en/v1.3/manual/strings/#Triple-Quoted-String-Literals-1>.

Maintained by Jennifer Bryan. Last updated 5 months ago.

string-interpolation strings

3.1 match 729 stars 21.76 score 57k scripts 14k dependents

rossellhayes

and:Construct Natural-Language Lists with Internationalization

Construct language-aware lists. Make "and"-separated and "or"-separated lists that automatically conform to the user's language settings.

Maintained by Alexander Rossell Hayes. Last updated 17 days ago.

i18n internationalization translation

13.5 match 20 stars 5.01 score 6 scripts 2 dependents

trinker

wakefield:Generate Random Data Sets

Generates random data sets including: data.frames, lists, and vectors.

Maintained by Tyler Rinker. Last updated 5 years ago.

data-generation wakefield

9.4 match 256 stars 7.13 score 209 scripts

ropensci

karel:Learning programming with Karel the robot

This is the R implementation of Karel the robot, a programming language created by Dr. R. E. Pattis at Stanford University in 1981. Karel is a useful tool to teach introductory concepts about general programming, such as algorithmic decomposition, conditional statements, loops, etc., in an interactive and fun way, by writing programs to make Karel the robot achieve certain tasks in the world she lives in. Originally based on Pascal, Karel has been implemented in many languages over the decades, including 'Java', 'C++', 'Ruby' and 'Python'. This is the first package implementing Karel in R.

Maintained by Marcos Prunello. Last updated 8 months ago.

learning programming r-language

9.5 match 10 stars 6.87 score 31 scripts

modeloriented

DALEX:moDel Agnostic Language for Exploration and eXplanation

Any unverified black box model is the path to failure. Opaqueness leads to distrust. Distrust leads to ignoration. Ignoration leads to rejection. DALEX package xrays any model and helps to explore and explain its behaviour. Machine Learning (ML) models are widely used and have various applications in classification or regression. Models created with boosting, bagging, stacking or similar techniques are often used due to their high performance. But such black-box models usually lack direct interpretability. DALEX package contains various methods that help to understand the link between input variables and model output. Implemented methods help to explore the model on the level of a single instance as well as a level of the whole dataset. All model explainers are model agnostic and can be compared across different models. DALEX package is the cornerstone for 'DrWhy.AI' universe of packages for visual model exploration. Find more details in (Biecek 2018) <https://jmlr.org/papers/v19/18-416.html>.

Maintained by Przemyslaw Biecek. Last updated 30 days ago.

black-box dalex data-science explainable-ai explainable-artificial-intelligence explainable-ml explanations explanatory-model-analysis fairness iml interpretability interpretable-machine-learning machine-learning model-visualization predictive-modeling responsible-ai responsible-ml xai

4.9 match 1.4k stars 13.40 score 876 scripts 21 dependents

pakjiddat

wordpredictor:Develop Text Prediction Models Based on N-Grams

A framework for developing n-gram models for text prediction. It provides data cleaning, data sampling, extracting tokens from text, model generation, model evaluation and word prediction. For information on how n-gram models work we referred to: "Speech and Language Processing" <https://web.archive.org/web/20240919222934/https%3A%2F%2Fweb.stanford.edu%2F~jurafsky%2Fslp3%2F3.pdf>. For optimizing R code and using R6 classes we referred to "Advanced R" <https://adv-r.hadley.nz/r6.html>. For writing R extensions we referred to "R Packages", <https://r-pkgs.org/index.html>.

Maintained by Nadir Latif. Last updated 5 months ago.

n-gram-language-models natural-language-processing r-programming

13.4 match 6 stars 4.78 score 9 scripts

boost-r

gamboostLSS:Boosting Methods for 'GAMLSS'

Boosting models for fitting generalized additive models for location, shape and scale ('GAMLSS') to potentially high dimensional data.

Maintained by Benjamin Hofner. Last updated 19 days ago.

boosting-algorithms gamboostlss gamlss machine-learning r-language variable-selection

7.5 match 26 stars 8.52 score 163 scripts 1 dependent

bnosac

crfsuite:Conditional Random Fields for Labelling Sequential Data in Natural Language Processing

Wraps the 'CRFsuite' library <https://github.com/chokkan/crfsuite> allowing users to fit a Conditional Random Field model and to apply it on existing data. The focus of the implementation is in the area of Natural Language Processing where this R package allows you to easily build and apply models for named entity recognition, text chunking, part of speech tagging, intent recognition or classification of any category you have in mind. Next to training, a small web application is included in the package to allow you to easily construct training data.

Maintained by Jan Wijffels. Last updated 1 year ago.

chunking conditional-random-fields crf crfsuite data-science intent-classification natural-language-processing ner nlp cpp

10.0 match 63 stars 6.34 score 35 scripts

eddelbuettel

RcppTOML:'Rcpp' Bindings to Parser for "Tom's Obvious Markup Language"

The configuration format defined by 'TOML' (which expands to "Tom's Obvious Markup Language") specifies an excellent format (described at <https://toml.io/en/>) suitable for both human editing as well as the common uses of a machine-readable format. This package uses 'Rcpp' to connect to the 'toml++' parser written by Mark Gillard to R.

Maintained by Dirk Eddelbuettel. Last updated 7 days ago.

c-plus-plus-11 toml toml-parser toml-parsing cpp

5.1 match 36 stars 12.32 score 124 scripts 433 dependents

dataobservatory-eu

dataset:Create Data Frames that are Easier to Exchange and Reuse

The aim of the 'dataset' package is to make tidy datasets easier to release, exchange and reuse. It organizes and formats data frame 'R' objects into well-referenced, well-described, interoperable datasets in a release- and reuse-ready form.

Maintained by Daniel Antal. Last updated 19 days ago.

dataset metadata-management

7.8 match 15 stars 7.81 score 76 scripts 1 dependent

kurthornik

ISOcodes:Selected ISO Codes

ISO language, territory, currency, script and character codes. Provides ISO 639 language codes, ISO 3166 territory codes, ISO 4217 currency codes, ISO 15924 script codes, and the ISO 8859 character codes as well as the UN M.49 area codes.

Maintained by Kurt Hornik. Last updated 1 year ago.

10.2 match 5.86 score 208 scripts 77 dependents

cran

nlme:Linear and Nonlinear Mixed Effects Models

Fit and compare Gaussian linear and nonlinear mixed-effects models.

Maintained by R Core Team. Last updated 2 months ago.

fortran

4.5 match 6 stars 13.00 score 13k scripts 8.7k dependents

spsanderson

TidyDensity:Functions for Tidy Analysis and Generation of Random Data

Makes it easy to generate random numbers based upon the underlying stats distribution functions. All data is returned in a tidy and structured format, making working with the data simple and straightforward. Because the data is returned in a tidy 'tibble', it lends itself to working with the rest of the 'tidyverse'.

Maintained by Steven Sanderson. Last updated 5 months ago.

bootstrap density distributions ggplot2 probability r-language simulation statistics tibble tidy

7.5 match 34 stars 7.78 score 66 scripts 1 dependent

gongcastro

bvq:Barcelona Vocabulary Questionnaire Database and Helper Functions

Download, clean, and process the Barcelona Vocabulary Questionnaire (BVQ) data. BVQ is a vocabulary inventory developed for assessing the vocabulary of Catalan-Spanish bilingual infants from the Metropolitan Area of Barcelona (Spain). This package includes functions to download the data from formr servers and return the processed data in multiple formats.

Maintained by Gonzalo Garcia-Castro. Last updated 2 months ago.

bilingualism language psycholinguistics vocabulary

13.7 match 1 star 4.26 score 8 scripts

modeloriented

ingredients:Effects and Importances of Model Ingredients

Collection of tools for assessment of feature importance and feature effects. Key functions are: feature_importance() for assessment of global level feature importance, ceteris_paribus() for calculation of the what-if plots, partial_dependence() for partial dependence plots, conditional_dependence() for conditional dependence plots, accumulated_dependence() for accumulated local effects plots, aggregate_profiles() and cluster_profiles() for aggregation of ceteris paribus profiles, generic print() and plot() for better usability of selected explainers, generic plotD3() for interactive, D3 based explanations, and generic describe() for explanations in natural language. The package 'ingredients' is a part of the 'DrWhy.AI' universe (Biecek 2018) <arXiv:1806.08915>.

Maintained by Przemyslaw Biecek. Last updated 2 years ago.

5.6 match 37 stars 10.38 score 83 scripts 22 dependents

brodieg

oshka:Recursive Quoted Language Expansion

Expands quoted language by recursively replacing any symbol that points to quoted language with the language it points to. The recursive process continues until only symbols that point to non-language objects remain. The resulting quoted language can then be evaluated normally. This differs from the traditional 'quote'/'eval' pattern because it resolves intermediate language objects that would interfere with evaluation.

Maintained by Brodie Gaslam. Last updated 7 years ago.

nse-functions

10.9 match 14 stars 5.15 score 9 scripts

computationalstylistics

tidystopwords:Customisable Stop-Words in 110 Languages

Functions to generate stop-word lists in 110 languages, in a way consistent across all the languages supported. The generated lists are based on the morphological tagset from the Universal Dependencies.

Maintained by Maciej Eder. Last updated 12 months ago.

12.4 match 6 stars 4.48 score 7 scripts

hoxo-m

githubinstall:A Helpful Way to Install R Packages Hosted on GitHub

Provides a helpful way to install packages hosted on GitHub.

Maintained by Koji Makiyama. Last updated 7 years ago.

r-language

7.5 match 49 stars 7.35 score 177 scripts

hofnerb

papeR:A Toolbox for Writing Pretty Papers and Reports

A toolbox for writing 'knitr', 'Sweave' or other 'LaTeX'- or 'markdown'-based reports and to prettify the output of various estimated models.

Maintained by Benjamin Hofner. Last updated 4 years ago.

knitr latex r-language reporting reproducible reproducible-research sweave

7.5 match 30 stars 7.30 score 223 scripts 1 dependent

bnosac

word2vec:Distributed Representations of Words

Learn vector representations of words by continuous bag of words and skip-gram implementations of the 'word2vec' algorithm. The techniques are detailed in the paper "Distributed Representations of Words and Phrases and their Compositionality" by Mikolov et al. (2013), available at <arXiv:1310.4546>.

Maintained by Jan Wijffels. Last updated 1 year ago.

embeddings natural-language-processing word2vec cpp

6.7 match 70 stars 8.08 score 227 scripts 5 dependents

r-lib

withr:Run Code 'With' Temporarily Modified Global State

A set of functions to run code 'with' safely and temporarily modified global state. Many of these functions were originally a part of the 'devtools' package; this package provides access to them with limited dependencies.

Maintained by Lionel Henry. Last updated 18 days ago.

3.0 match 176 stars 17.92 score 1.2k scripts 12k dependents

nlmixr2

rxode2:Facilities for Simulating from ODE-Based Models

Facilities for running simulations from ordinary differential equation ('ODE') models, such as pharmacometrics and other compartmental models. A compilation manager translates the ODE model into C, compiles it, and dynamically loads the object code into R for improved computational efficiency. An event table object facilitates the specification of complex dosing regimens (optional) and sampling schedules. NB: The use of this package requires both C and Fortran compilers, for details on their use with R please see Section 6.3, Appendix A, and Appendix D in the "R Administration and Installation" manual. Also the code is mostly released under GPL. The 'VODE' and 'LSODA' are in the public domain. The information is available in the inst/COPYRIGHTS.

Maintained by Matthew L. Fidler. Last updated 28 days ago.

fortran openblas cpp openmp

4.7 match 39 stars 11.16 score 220 scripts 13 dependents

kurthornik

NLP:Natural Language Processing Infrastructure

Basic classes and methods for Natural Language Processing.

Maintained by Kurt Hornik. Last updated 4 months ago.

5.6 match 6 stars 9.37 score 1.0k scripts 127 dependents

oobianom

r2country:Country Data with Names, Capitals, Currencies, Populations, Time, Languages and so on

Obtain information about countries around the globe, including names, states, languages, time, capitals, currency and much more. Data sources are 'Wikipedia' <https://www.wikipedia.org>, 'TimeAndDate' <https://www.timeanddate.com> and 'CountryCode' <https://countrycode.org>.

Maintained by Obinna Obianom. Last updated 1 year ago.

14.1 match 1 star 3.70 score 4 scripts

spsanderson

tidyAML:Automatic Machine Learning with 'tidymodels'

The goal of this package will be to provide a simple interface for automatic machine learning that fits the 'tidymodels' framework. The intention is to work for regression and classification problems with a simple verb framework.

Maintained by Steven Sanderson. Last updated 11 months ago.

automatic-machine-learning automl classification machine-learning parsnip r-language r-programming regression tidy tidymodels tidyverse

7.5 match 68 stars 6.87 score 36 scripts 1 dependent

cysouw

qlcVisualize:Visualization for Quantitative Language Comparison

Collection of visualizations as used in quantitative language comparison. Currently implemented are visualisations dealing with nominal data with multiple levels ("level map" and "factor map"), and assistance for making weighted geographical Voronoi-maps ("weighted map").

Maintained by Michael Cysouw. Last updated 6 months ago.

12.6 match 4.03 score 24 scripts

cran

XR:A Structure for Interfaces from R

Support for interfaces from R to other languages, built around a class for evaluators and a combination of functions, classes and methods for communication. Will be used through a specific language interface package. Described in the book "Extending R".

Maintained by John Chambers. Last updated 7 years ago.

16.7 match 2.95 score 3 dependents

bnosac

textrank:Summarize Text by Ranking Sentences and Finding Keywords

The 'textrank' algorithm is an extension of the 'Pagerank' algorithm for text. The algorithm allows you to summarize text by calculating how sentences are related to one another. This is done by looking at overlapping terminology used in sentences in order to set up links between sentences. The resulting sentence network is then plugged into the 'Pagerank' algorithm, which identifies the most important sentences in your text and ranks them. In a similar way 'textrank' can also be used to extract keywords. A word network is constructed by checking whether words follow one another. On top of that network the 'Pagerank' algorithm is applied to extract relevant words, after which relevant words that follow one another are combined to get keywords. More information can be found in the paper from Mihalcea, Rada & Tarau, Paul (2004) <https://www.aclweb.org/anthology/W04-3252/>.

Maintained by Jan Wijffels. Last updated 4 years ago.

natural-language-processing nlp textrank textrank-algorithm

6.7 match 77 stars 7.38 score 103 scripts 2 dependents
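
The sentence-ranking idea described in the textrank entry above (link sentences by overlapping terminology, then run a PageRank-style iteration over the resulting network) can be sketched in a few lines. This is a language-agnostic illustration in Python, not the package's R implementation: the function name `rank_sentences`, the use of Jaccard overlap as the edge weight, whitespace tokenization, and the `damping` and `iterations` defaults are all assumptions made for the example.

```python
from itertools import combinations


def rank_sentences(sentences, damping=0.85, iterations=50):
    """Rank sentences by a PageRank-style score on a word-overlap graph."""
    n = len(sentences)
    if n == 0:
        return []

    # Naive tokenization (assumption): lowercase and split on whitespace.
    terms = [set(s.lower().split()) for s in sentences]

    # Edge weight (assumption): Jaccard overlap of the two word sets.
    weights = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        union = terms[i] | terms[j]
        overlap = len(terms[i] & terms[j]) / len(union) if union else 0.0
        weights[i][j] = weights[j][i] = overlap

    out_weight = [sum(row) for row in weights]

    # Power iteration for weighted PageRank on the sentence network.
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping * sum(
                weights[j][i] / out_weight[j] * scores[j]
                for j in range(n)
                if out_weight[j] > 0
            )
            for i in range(n)
        ]

    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return [(sentences[i], scores[i]) for i in order]


ranked = rank_sentences([
    "the cat sat on the mat",
    "the cat ate the fish",
    "stock prices fell sharply today",
])
for sentence, score in ranked:
    print(f"{score:.3f}  {sentence}")
```

A sentence that shares no words with the others receives only the baseline score, so it ends up at the bottom of the ranking.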

ropensci

pkgmatch:Find R Packages Matching Either Descriptions or Other R Packages

Find R packages matching either descriptions or other R packages.

Maintained by Mark Padgham. Last updated 30 days ago.

cpp

9.4 match 3 stars 5.23 score

neptune-ai

neptune:MLOps Metadata Store - Experiment Tracking and Model Registry for Production Teams

An interface to Neptune, a metadata store for MLOps built for teams that run a lot of experiments. It gives you a single place to log, store, display, organize, compare, and query all your model-building metadata. Neptune is used for experiment tracking (log, display, organize, and compare ML experiments in a single place), as a model registry (version, store, manage, and query trained models and model-building metadata), and for monitoring ML runs live (record and monitor model training, evaluation, or production runs). For more information see <https://neptune.ai/>.

Maintained by Rafal Jankowski. Last updated 2 years ago.

compare language log management metadata metrics mlops models monitoring organize parameters store tracker visualization

10.0 match 14 stars 4.89 score 16 scripts

brandmaier

ggx:A Natural Language Interface to 'ggplot2'

The 'ggplot2' package is the state-of-the-art toolbox for creating and formatting graphs. However, it is easy to forget how certain formatting commands are named and sometimes users find themselves asking: How do you rotate the x-axis labels again? Or how do you hide the legend...? This package allows users to issue natural language commands related to theme-related styling of plots (colors, font size and such), which then are translated into valid 'ggplot2' commands.

Maintained by Andreas M. Brandmaier. Last updated 2 years ago.

7.1 match 152 stars 6.69 score 16 scripts

hoxo-m

densratio:Density Ratio Estimation

Density ratio estimation. The estimated density ratio function can be used in many applications such as anomaly detection, change-point detection, covariate shift adaptation. The implemented methods are uLSIF (Hido et al. (2011) <doi:10.1007/s10115-010-0283-2>), RuLSIF (Yamada et al. (2011) <doi:10.1162/NECO_a_00442>), and KLIEP (Sugiyama et al. (2007) <doi:10.1007/s10463-008-0197-x>).

Maintained by Koji Makiyama. Last updated 6 years ago.

anomalydetection machine-learning machine-learning-algorithms machine-learning-library r-language statistics

7.5 match 21 stars 6.36 score 36 scripts 2 dependents

bioc

RAIDS:Accurate Inference of Genetic Ancestry from Cancer Sequences

This package implements specialized algorithms that enable genetic ancestry inference from various cancer sequences sources (RNA, Exome and Whole-Genome sequences). This package also implements a simulation algorithm that generates synthetic cancer-derived data. This code and analysis pipeline was designed and developed for the following publication: Belleau, P et al. Genetic Ancestry Inference from Cancer-Derived Molecular Data across Genomic and Transcriptomic Platforms. Cancer Res 1 January 2023; 83 (1): 49–58.

Maintained by Pascal Belleau. Last updated 5 months ago.

genetics software sequencing wholegenome principalcomponent geneticvariability dimensionreduction biocviews ancestry cancer-genomics exome-sequencing genomics inference r-language rna-seq rna-sequencing whole-genome-sequencing

7.5 match 5 stars 6.23 score 19 scripts

sigbertklinke

stranslate:Simple Translation Between Different Languages

Message translation is often managed with 'po' files and the 'gettext' programme, but sometimes another solution is needed. In contrast to 'po' files, a more flexible approach is used as in the Fluent <https://projectfluent.org/> project with R Markdown snippets. The key-value approach allows easier handling of the translated messages.

Maintained by Sigbert Klinke. Last updated 1 year ago.

11.1 match 4.18 score 3 scripts 1 dependent

skoval

RISmed:Download Content from NCBI Databases

A set of tools to extract bibliographic content from the National Center for Biotechnology Information (NCBI) databases, including PubMed. The name RISmed is a portmanteau of RIS (for Research Information Systems, a common tag format for bibliographic data) and PubMed.

Maintained by Stephanie Kovalchik. Last updated 3 years ago.

6.7 match 38 stars 6.94 score 252 scripts 3 dependents

epiverse-trace

serofoi:Bayesian Estimation of the Force of Infection from Serological Data

Estimating the force of infection from time varying, age varying, or constant serocatalytic models from population based seroprevalence studies using a Bayesian framework, including data simulation functions enabling the generation of serological surveys based on these models. This tool also provides a flexible prior specification syntax for the force of infection and the seroreversion rate, as well as methods to assess model convergence and comparison criteria along with useful visualisation functions.

Maintained by Zulma M. Cucunubá. Last updated 17 days ago.

antibodies bayesian-methods epidemiology epiverse serological-surveys stan-language cpp

7.5 match 18 stars 6.17 score 10 scripts

lgnbhl

BFS:Get Data from the Swiss Federal Statistical Office

Search and download data from the Swiss Federal Statistical Office (BFS) APIs <https://www.bfs.admin.ch/>.

Maintained by Felix Luginbuhl. Last updated 3 months ago.

switzerland

7.1 match 18 stars 6.55 score 17 scripts

cmerow

rangeModelMetadata:Provides Templates for Metadata Files Associated with Species Range Models

Range Modeling Metadata Standards (RMMS) address three challenges: they (i) are designed for convenience to encourage use, (ii) accommodate a wide variety of applications, and (iii) are extensible to allow the community of range modelers to steer it as needed. RMMS are based on a data dictionary that specifies a hierarchical structure to catalog different aspects of the range modeling process. The dictionary balances a constrained, minimalist vocabulary to improve standardization with flexibility for users to provide their own values. Merow et al. (2019) <DOI:10.1111/geb.12993> describe the standards in more detail. Note that users who prefer to use the R package 'ecospat' can obtain it from <https://github.com/ecospat/ecospat>.

Maintained by Cory Merow. Last updated 8 months ago.

ecological-metadata-language ecological-modelling ecological-models ecology species-distribution-modelling species-distributions

6.7 match 6 stars 6.90 score 16 scripts 3 dependents

tidyverse

ellmer:Chat with Large Language Models

Chat with large language models from a range of providers including 'Claude' <https://claude.ai>, 'OpenAI' <https://chatgpt.com>, and more. Supports streaming, asynchronous calls, tool calling, and structured data extraction.

Maintained by Hadley Wickham. Last updated 3 days ago.

3.6 match 388 stars 12.58 score 98 scripts 7 dependents

ropensci

tokenizers:Fast, Consistent Tokenization of Natural Language Text

Convert natural language text into tokens. Includes tokenizers for shingled n-grams, skip n-grams, words, word stems, sentences, paragraphs, characters, shingled characters, lines, Penn Treebank, regular expressions, as well as functions for counting characters, words, and sentences, and a function for splitting longer texts into separate documents, each with the same number of words. The tokenizers have a consistent interface, and the package is built on the 'stringi' and 'Rcpp' packages for fast yet correct tokenization in 'UTF-8'.

Maintained by Thomas Charlon. Last updated 12 months ago.

nlp peer-reviewed text-mining tokenizer cpp

3.4 match 186 stars 13.33 score 1.1k scripts 81 dependents
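
To make the term "shingled n-grams" in the tokenizers entry above concrete: shingles are overlapping runs of consecutive words. The sketch below is a plain Python illustration of that idea under simplifying assumptions (lowercasing and whitespace splitting, no punctuation handling); the function `shingled_ngrams` and its parameters `n` and `n_min` are names chosen for this example and do not reproduce the package's own code.

```python
def shingled_ngrams(text, n=3, n_min=None):
    """Return overlapping word n-grams of sizes n_min..n, in text order."""
    # Simplified tokenization (assumption): lowercase, split on whitespace.
    words = text.lower().split()
    n_min = n if n_min is None else n_min
    return [
        " ".join(words[i:i + size])
        for i in range(len(words))
        for size in range(n_min, n + 1)
        if i + size <= len(words)
    ]


print(shingled_ngrams("The quick brown fox", n=2))
print(shingled_ngrams("The quick brown fox", n=2, n_min=1))
```

Each starting position contributes one n-gram per requested size, so neighbouring n-grams share words with each other.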

ropensci

targets:Dynamic Function-Oriented 'Make'-Like Declarative Pipelines

Pipeline tools coordinate the pieces of computationally demanding analysis projects. The 'targets' package is a 'Make'-like pipeline tool for statistics and data science in R. The package skips costly runtime for tasks that are already up to date, orchestrates the necessary computation with implicit parallel computing, and abstracts files as R objects. If all the current output matches the current upstream code and data, then the whole pipeline is up to date, and the results are more trustworthy than otherwise. The methodology in this package borrows from GNU 'Make' (2015, ISBN:978-9881443519) and 'drake' (2018, <doi:10.21105/joss.00550>).

Maintained by William Michael Landau. Last updated 11 hours ago.

data-science high-performance-computing make peer-reviewed pipeline r-targetopia reproducibility reproducible-research targets workflow

3.0 match 973 stars 15.20 score 4.6k scripts 22 dependents

dcomtois

summarytools:Tools to Quickly and Neatly Summarize Data

Data frame summaries, cross-tabulations, weight-enabled frequency tables and common descriptive (univariate) statistics in concise tables available in a variety of formats (plain ASCII, Markdown and HTML). A good point-of-entry for exploring data, both for experienced and new R users.

Maintained by Dominic Comtois. Last updated 1 day ago.

descriptive-statistics frequency-table html-report markdown pander pandoc pandoc-markdown rmarkdown rstudio

3.1 match 526 stars 14.52 score 2.9k scripts 6 dependents

dickoa

robotoolbox:Client for the 'KoboToolbox' API

Suite of utilities for accessing and manipulating data from the 'KoboToolbox' API. 'KoboToolbox' is a robust platform designed for field data collection in various disciplines. This package aims to simplify the process of fetching and handling data from the API. Detailed documentation for the 'KoboToolbox' API can be found at <https://support.kobotoolbox.org/api.html>.

Maintained by Ahmadou Dicko. Last updated 3 months ago.

open-data kobotoolbox odk kpi api data dataset

7.5 match 5.92 score 48 scripts

cschwem2er

stminsights:A 'Shiny' Application for Inspecting Structural Topic Models

This app enables interactive validation, interpretation and visualization of structural topic models from the 'stm' package by Roberts and others (2014) <doi:10.1111/ajps.12103>. It also includes helper functions for model diagnostics and extracting data from effect estimates.

Maintained by Carsten Schwemmer. Last updated 9 months ago.

natural-language-processing shiny topic-modeling

6.7 match 116 stars 6.69 score 84 scripts

bnosac

ruimtehol:Learn Text 'Embeddings' with 'Starspace'

Wraps the 'StarSpace' library <https://github.com/facebookresearch/StarSpace> allowing users to calculate word, sentence, article, document, webpage, link and entity 'embeddings'. By using the 'embeddings', you can perform text based multi-label classification, find similarities between texts and categories, do collaborative-filtering based recommendation as well as content-based recommendation, find out relations between entities, calculate graph 'embeddings' as well as perform semi-supervised learning and multi-task learning on plain text. The techniques are explained in detail in the paper: 'StarSpace: Embed All The Things!' by Wu et al. (2017), available at <arXiv:1709.03856>.

Maintained by Jan Wijffels. Last updated 1 year ago.

classification embeddings natural-language-processing nlp similarity starspace text-mining cpp

6.7 match 101 stars 6.65 score 44 scripts

agricolamz

lingglosses:Interlinear Glossed Linguistic Examples and Abbreviation Lists Generation

Helps to render interlinear glossed linguistic examples in html 'rmarkdown' documents and then semi-automatically compiles the list of glosses at the end of the document. It also provides a database of linguistic glosses.

Maintained by George Moroz. Last updated 9 days ago.

glosses glosses-list interlinear-gloss language-documentation linguistics rmarkdown typology

7.5 match 15 stars 5.88 score 167 scripts

correlaid

newsanchor:Client for the News API

Interface to gather news from the 'News API', based on a multilevel query <https://newsapi.org/>. A personal API key is required.

Maintained by Yannik Buhl. Last updated 5 years ago.

6.5 match 36 stars 6.70 score 40 scripts

tdaverse

ripserr:Calculate Persistent Homology with Ripser-Based Engines

Ports the Ripser <https://arxiv.org/abs/1908.02518> and Cubical Ripser <https://arxiv.org/abs/2005.12692> persistent homology calculation engines from C++. Can be used as a rapid calculation tool in topological data analysis pipelines.

Maintained by Raoul Wadhwa. Last updated 2 hours ago.

algebraic-topology cohomology cpp cubical-complex persistent-homology pixel point-cloud r-language r-programming rcpp rips-complex ripser simplicial-complex simplicial-homology topological-data-analysis topology vietoris-complex voxel cpp

7.5 match 7 stars 5.80 score 6 scripts

jbgruber

rollama:Communicate with 'Ollama' to Run Large Language Models Locally

Wraps the 'Ollama' <https://ollama.com> API, which can be used to communicate with generative large language models locally.

Maintained by Johannes B. Gruber. Last updated 1 month ago.

5.2 match 110 stars 8.36 score 52 scripts

gaborcsardi

franc:Detect the Language of Text

Detects the language of text with no external dependencies and support for 335 languages, covering all languages spoken by more than one million speakers. 'Franc' is a port of the 'JavaScript' project of the same name, see <https://github.com/wooorm/franc>.

Maintained by Gábor Csárdi. Last updated 3 years ago.

9.9 match 30 stars 4.38 score 16 scripts

ineelhere

clintrialx:Connect and Work with Clinical Trials Data Sources

Are you spending too much time fetching and managing clinical trial data? Struggling with complex queries and bulk data extraction? What if you could simplify this process with just a few lines of code? Introducing 'clintrialx' - Fetch clinical trial data from sources like 'ClinicalTrials.gov' <https://clinicaltrials.gov/> and the 'Clinical Trials Transformation Initiative - Access to Aggregate Content of ClinicalTrials.gov' database <https://aact.ctti-clinicaltrials.org/>, supporting pagination and bulk downloads. Also, you can generate HTML reports based on the data obtained from the sources!

Maintained by Indraneel Chakraborty. Last updated 3 days ago.

aact bioinformatics clinical-data clinical-trials clinicaltrialsgov ctti data data-management medical-informatics r-language trials

7.5 match 15 stars 5.76 score 11 scripts

hoxo-m

magicfor:Magic Functions to Obtain Results from for Loops

Magic functions to obtain results from for loops.

Maintained by Koji Makiyama. Last updated 8 years ago.

r-language

7.5 match 20 stars 5.72 score 53 scripts

curso-r

scryr:An Interface to the 'Scryfall' API

A simple, light, and robust interface between R and the 'Scryfall' card data API <https://scryfall.com/docs/api>.

Maintained by Caio Lente. Last updated 3 years ago.

api mtg

7.0 match 17 stars 6.09 score 18 scripts

r-lib

treesitter.r:'R' Grammar for 'Tree-Sitter'

Provides bindings to an 'R' grammar for 'Tree-sitter', to be used alongside the 'treesitter' package. 'Tree-sitter' builds concrete syntax trees for source files of any language, and can efficiently update those syntax trees as the source file is edited.

Maintained by Davis Vaughan. Last updated 4 months ago.

5.4 match 118 stars 7.81 score 17 scripts 2 dependents

keyatm

keyATM:Keyword Assisted Topic Models

Fits keyword assisted topic models (keyATM) using collapsed Gibbs samplers. The keyATM combines the latent dirichlet allocation (LDA) models with a small number of keywords selected by researchers in order to improve the interpretability and topic classification of the LDA. The keyATM can also incorporate covariates and directly model time trends. The keyATM is proposed in Eshima, Imai, and Sasaki (2024) <doi:10.1111/ajps.12779>.

Maintained by Shusei Eshima. Last updated 11 months ago.

latent-dirichlet-allocation natural-language-processing political-science rcpp rcppeigen social-science topic-models cpp

6.7 match 106 stars 6.30 score 63 scripts

zumbov2

deeplr:Interface to the 'DeepL' Translation API

A wrapper for the 'DeepL' Pro API <https://www.deepl.com/docs-api>, a web service for translating texts between different languages. A DeepL API developer account is required to use the service (see <https://www.deepl.com/pro#developer>).

Maintained by David Zumbach. Last updated 12 months ago.

api-wrapper deepl translation

7.5 match 41 stars 5.57 score 70 scripts

bnosac

BTM:Biterm Topic Models for Short Text

Biterm Topic Models find topics in collections of short texts. It is a word co-occurrence based topic model that learns topics by modeling word-word co-occurrence patterns, which are called biterms. This is in contrast to traditional topic models like Latent Dirichlet Allocation and Probabilistic Latent Semantic Analysis, which are word-document co-occurrence topic models. A biterm consists of two words co-occurring in the same short text window. This context window can, for example, be a Twitter message, a short answer on a survey, a sentence of a text or a document identifier. The techniques are explained in detail in the paper 'A Biterm Topic Model For Short Text' by Xiaohui Yan, Jiafeng Guo, Yanyan Lan, Xueqi Cheng (2013) <https://github.com/xiaohuiyan/xiaohuiyan.github.io/blob/master/paper/BTM-WWW13.pdf>.

Maintained by Jan Wijffels. Last updated 2 years ago.

biterm-topic-modelling natural-language-processing topic-modeling cpp

6.7 match 96 stars 6.25 score 74 scripts

r-lib

tidyselect:Select from a Set of Strings

A backend for the selecting functions of the 'tidyverse'. It makes it easy to implement select-like functions in your own packages in a way that is consistent with other 'tidyverse' interfaces for selection.

Maintained by Lionel Henry. Last updated 3 months ago.

2.3 match 130 stars 18.31 score 1.9k scripts 8.2k dependents

sbg

tidycwl:Tidy Common Workflow Language Tools and Workflows

The Common Workflow Language <https://www.commonwl.org/> is an open standard for describing data analysis workflows. This package takes the raw Common Workflow Language workflows encoded in JSON or 'YAML' and turns the workflow elements into tidy data frames or lists. A graph representation for the workflow can be constructed and visualized with the parsed workflow inputs, outputs, and steps. Users can embed the visualizations in their 'Shiny' applications, and export them as HTML files or static images.

Maintained by Soner Koc. Last updated 10 months ago.

bioinformatics-pipeline common-workflow-language sevenbridges tidyverse

10.4 match 9 stars 3.95 score

henrikbengtsson

port4me:Get the Same, Personal, Free 'TCP' Port over and over

An R implementation of the cross-platform, language-independent "port4me" algorithm (<https://github.com/HenrikBengtsson/port4me>), which (1) finds a free Transmission Control Protocol ('TCP') port in [1024,65535] that the user can open, (2) is designed to work in multi-user environments, (3) gives different users different ports, (4) gives the user the same port over time with high probability, (5) gives different ports for different software tools, and (6) requires no configuration.

Maintained by Henrik Bengtsson. Last updated 1 years ago.

bash cli high-performance-computing hpc multi-tenant multi-user port pypi-package python r-language r-programming tcp utility

8.0 match 13 stars 5.11 score 5 scripts

dwulff

text2sdg:Detecting UN Sustainable Development Goals in Text

The United Nations’ Sustainable Development Goals (SDGs) have become an important guideline for organisations to monitor and plan their contributions to social, economic, and environmental transformations. The 'text2sdg' package is an open-source analysis package that identifies SDGs in text using scientifically developed query systems, opening up the opportunity to monitor any type of text-based data, such as scientific output or corporate publications. For more information regarding the methodology see Meier, Mata & Wulff (2022) <arXiv:2110.05856>.

Maintained by Dominik S. Meier. Last updated 6 months ago.

natural-language-processing sustainability sustainable-development sustainable-development-goals

6.7 match 18 stars 6.13 score 9 scripts

sjewo

readstata13:Import 'Stata' Data Files

Function to read and write the 'Stata' file format.

Maintained by Sebastian Jeworutzki. Last updated 2 years ago.

stata cpp

3.8 match 41 stars 10.74 score 1.7k scripts 45 dependents

moodymudskipper

inops:Infix Operators for Detection, Subsetting and Replacement

Infix operators to detect, subset, and replace the elements matched by a given condition. The functions have several variants of operator types, including subsets, ranges, regular expressions and others. Implemented operators work on vectors, matrices, and lists.

Maintained by Antoine Fabri. Last updated 5 years ago.

r-language r-programming

7.5 match 40 stars 5.34 score 11 scripts

friendly

heplots:Visualizing Hypothesis Tests in Multivariate Linear Models

Provides HE plot and other functions for visualizing hypothesis tests in multivariate linear models. HE plots represent sums-of-squares-and-products matrices for linear hypotheses and for error using ellipses (in two dimensions) and ellipsoids (in three dimensions). The related 'candisc' package provides visualizations in a reduced-rank canonical discriminant space when there are more than a few response variables.

Maintained by Michael Friendly. Last updated 7 days ago.

linear-hypotheses matrices multivariate-linear-models plot repeated-measure-designs visualizing-hypothesis-tests

3.4 match 9 stars 11.49 score 1.1k scripts 7 dependents

ropensci

gutenbergr:Download and Process Public Domain Works from Project Gutenberg

Download and process public domain works in the Project Gutenberg collection <https://www.gutenberg.org/>. Includes metadata for all Project Gutenberg works, so that they can be searched and retrieved.

Maintained by Jon Harmon. Last updated 2 months ago.

peer-reviewed

3.8 match 105 stars 10.50 score 1.1k scripts 1 dependents

r-lib

testthat:Unit Testing for R

Software testing is important, but, in part because it is frustrating and boring, many of us avoid it. 'testthat' is a testing framework for R that is easy to learn and use, and integrates with your existing 'workflow'.

Maintained by Hadley Wickham. Last updated 15 days ago.

unit-testing cpp

1.9 match 900 stars 20.97 score 74k scripts 465 dependents

quanteda

spacyr:Wrapper to the 'spaCy' 'NLP' Library

An R wrapper to the 'Python' 'spaCy' 'NLP' library, from <https://spacy.io>.

Maintained by Kenneth Benoit. Last updated 1 months ago.

extract-entities nlp spacy speech-tagging

3.6 match 253 stars 10.68 score 408 scripts 6 dependents

hauselin

ollamar:'Ollama' Language Models

An interface to easily run local language models with 'Ollama' <https://ollama.com> server and API endpoints (see <https://github.com/ollama/ollama/blob/main/docs/api.md> for details). It lets you run open-source large language models locally on your machine.

Maintained by Hause Lin. Last updated 2 months ago.

ai api llm llms ollama ollama-api

4.1 match 84 stars 9.36 score 74 scripts 5 dependents

ropensci

EML:Read and Write Ecological Metadata Language Files

Work with Ecological Metadata Language ('EML') files. 'EML' is a widely used metadata standard in the ecological and environmental sciences, described in Jones et al. (2006), <doi:10.1146/annurev.ecolsys.37.091305.110031>.

Maintained by Carl Boettiger. Last updated 3 years ago.

eml eml-metadata metadata-standard

3.4 match 97 stars 11.19 score 378 scripts 7 dependents

bnosac

doc2vec:Distributed Representations of Sentences, Documents and Topics

Learn vector representations of sentences, paragraphs or documents by using the 'Paragraph Vector' algorithms, namely the distributed bag of words ('PV-DBOW') and the distributed memory ('PV-DM') model. The techniques in the package are detailed in the paper "Distributed Representations of Sentences and Documents" by Mikolov et al. (2014), available at <arXiv:1405.4053>. The package also provides an implementation to cluster documents based on these embeddings using a technique called top2vec. Top2vec finds clusters in text documents by combining techniques to embed documents and words and density-based clustering. It does this by embedding documents in the semantic space as defined by the 'doc2vec' algorithm. Next it maps these document embeddings to a lower-dimensional space using the 'Uniform Manifold Approximation and Projection' (UMAP) algorithm and finds dense areas in that space using a 'Hierarchical Density-Based Clustering' technique (HDBSCAN). These dense areas are the topic clusters which can be represented by the corresponding topic vector which is an aggregate of the document embeddings of the documents which are part of that topic cluster. In the same semantic space similar words can be found which are representative of the topic. More details can be found in the paper 'Top2Vec: Distributed Representations of Topics' by D. Angelov available at <arXiv:2008.09470>.

Maintained by Jan Wijffels. Last updated 3 years ago.

doc2vec embeddings natural-language-processing paragraph2vec word2vec cpp

6.7 match 48 stars 5.74 score 23 scripts

bioc

CNVMetrics:Copy Number Variant Metrics

The CNVMetrics package calculates similarity metrics to facilitate copy number variant comparison among samples and/or methods. Similarity metrics can be employed to compare CNV profiles of genetically unrelated samples as well as those with a common genetic background. Some metrics are based on the shared amplified/deleted regions while other metrics rely on the level of amplification/deletion. The data type used as input is a plain text file containing the genomic position of the copy number variations, as well as the status and/or the log2 ratio values. Finally, a visualization tool is provided to explore resulting metrics.

Maintained by Astrid Deschênes. Last updated 5 months ago.

biologicalquestion software copynumbervariation cnv copy-number-variation metrics r-language

7.5 match 4 stars 5.08 score 8 scripts

pachadotdev

cepiigeodist:CEPII's GeoDist datasets in R

Provides data on countries and their main city or agglomeration and the different distance measures and dummy variables indicating whether two countries are contiguous, share a common language or a colonial relationship. The reference article for these datasets is Mayer and Zignago (2011).

Maintained by Mauricio Vargas. Last updated 2 years ago.

borders colonization geodistance gravity languages trade

10.5 match 3 stars 3.54 score 23 scripts

mlverse

chattr:Interact with Large Language Models in 'RStudio'

Enables user interactivity with large-language models ('LLM') inside the 'RStudio' integrated development environment (IDE). The user can interact with the model using the 'shiny' app included in this package, or directly in the 'R' console. It comes with back-ends for 'OpenAI', 'GitHub' 'Copilot', and 'LlamaGPT'.

Maintained by Edgar Ruiz. Last updated 1 months ago.

3.5 match 215 stars 10.55 score 71 scripts 1 dependents

ropensci

pangoling:Access to Large Language Model Predictions

Provides access to word predictability estimates using large language models (LLMs) based on 'transformer' architectures via integration with the 'Hugging Face' ecosystem. The package interfaces with pre-trained neural networks and supports both causal/auto-regressive LLMs (e.g., 'GPT-2'; Radford et al., 2019) and masked/bidirectional LLMs (e.g., 'BERT'; Devlin et al., 2019, <doi:10.48550/arXiv.1810.04805>) to compute the probability of words, phrases, or tokens given their linguistic context. By enabling a straightforward estimation of word predictability, the package facilitates research in psycholinguistics, computational linguistics, and natural language processing (NLP).

Maintained by Bruno Nicenboim. Last updated 3 days ago.

nlp psycholinguistics transformers

7.5 match 8 stars 4.90 score

ironholds

batman:Convert categorical representations of logicals to actual logicals

Survey systems and other third-party data sources commonly use non-standard representations of logical values when it comes to qualitative data - "Yes", "No" and "N/A", say. batman is a package designed to seamlessly convert these into logicals. It is highly localised, and contains equivalents to boolean values in languages including German, French, Spanish, Italian, Turkish, Chinese and Polish.

Maintained by Oliver Keyes. Last updated 9 years ago.

6.9 match 11 stars 5.28 score 70 scripts

michelnivard

gptstudio:Use Large Language Models Directly in your Development Environment

Large language models are readily accessible via API. This package lowers the barrier to use the API inside of your development environment. For more on the API, see <https://platform.openai.com/docs/introduction>.

Maintained by James Wade. Last updated 5 days ago.

chatgpt gpt-3 rstudio rstudio-addin

3.4 match 924 stars 10.83 score 43 scripts 1 dependents

doug-friedman

topicdoc:Topic-Specific Diagnostics for LDA and CTM Topic Models

Calculates topic-specific diagnostics (e.g. mean token length, exclusivity) for Latent Dirichlet Allocation and Correlated Topic Models fit using the 'topicmodels' package. For more details, see Chapter 12 in Airoldi et al. (2014, ISBN:9781466504080), pp 262-272, Mimno et al. (2011, ISBN:9781937284114), and Bischof et al. (2014) <arXiv:1206.4631v1>.

Maintained by Doug Friedman. Last updated 3 years ago.

natural-language-processing text-mining topic-modeling topic-modelling topic-models

6.7 match 25 stars 5.48 score 24 scripts

dmkaplan2000

knitrdata:Data Language Engine for 'knitr' / 'rmarkdown'

Implements a data language engine for incorporating data directly in 'rmarkdown' documents so that they can be made completely standalone.

Maintained by David M. Kaplan. Last updated 3 years ago.

7.5 match 7 stars 4.75 score 16 scripts

gagolews

stringx:Replacements for Base String Functions Powered by 'stringi'

English is the native language for only 5% of the World population. Also, only 17% of us can understand this text. Moreover, the Latin alphabet is the main one for merely 36% of the total. The early computer era, now a very long time ago, was dominated by the US. Due to the proliferation of the internet, smartphones, social media, and other technologies and communication platforms, this is no longer the case. This package replaces base R string functions (such as grep(), tolower(), sprintf(), and strptime()) with ones that fully support the Unicode standards related to natural language and date-time processing. It also fixes some long-standing inconsistencies, and introduces some new, useful features. Thanks to 'ICU' (International Components for Unicode) and 'stringi', they are fast, reliable, and portable across different platforms.

Maintained by Marek Gagolewski. Last updated 2 months ago.

icu icu4c natural-language-processing nlp regex regexp string-manipulation stringi text text-processing unicode

7.4 match 28 stars 4.75 score 1 scripts

rdatatable

data.table:Extension of `data.frame`

Fast aggregation of large data (e.g. 100GB in RAM), fast ordered joins, fast add/modify/delete of columns by group using no copies at all, list columns, friendly and fast character-separated-value read/write. Offers a natural and flexible syntax, for faster development.

Maintained by Tyson Barrett. Last updated 13 hours ago.

1.5 match 3.7k stars 23.53 score 230k scripts 4.6k dependents

appliedstat

rQCC:Robust Quality Control Chart

Constructs various robust quality control charts based on the median or Hodges-Lehmann estimator (location) and the median absolute deviation (MAD) or Shamos estimator (scale). The estimators used for the robust control charts are all unbiased with a sample of finite size. For more details, see Park, Kim and Wang (2022) <doi:10.1080/03610918.2019.1699114>. In addition, using this R package, the conventional quality control charts such as X-bar, S, R, p, np, u, c, g, h, and t charts are also easily constructed. This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2022R1A2C1091319).

Maintained by Chanseok Park. Last updated 1 years ago.

control-chart goodness-of-fit r-language weibull

7.5 match 2 stars 4.70 score 3 scripts

r-forge

deSolve:Solvers for Initial Value Problems of Differential Equations ('ODE', 'DAE', 'DDE')

Functions that solve initial value problems of a system of first-order ordinary differential equations ('ODE'), of partial differential equations ('PDE'), of differential algebraic equations ('DAE'), and of delay differential equations. The functions provide an interface to the FORTRAN functions 'lsoda', 'lsodar', 'lsode', 'lsodes' of the 'ODEPACK' collection, to the FORTRAN functions 'dvode', 'zvode' and 'daspk' and a C-implementation of solvers of the 'Runge-Kutta' family with fixed or variable time steps. The package contains routines designed for solving 'ODEs' resulting from 1-D, 2-D and 3-D partial differential equations ('PDE') that have been converted to 'ODEs' by numerical differencing.

Maintained by Thomas Petzoldt. Last updated 1 years ago.

fortran openblas

2.9 match 12.33 score 8.0k scripts 427 dependents

bioc

Rcwl:An R interface to the Common Workflow Language

The Common Workflow Language (CWL) is an open standard for development of data analysis workflows that is portable and scalable across different tools and working environments. Rcwl provides a simple way to wrap command line tools and build CWL data analysis pipelines programmatically within R. It increases the ease of usage, development, and maintenance of CWL pipelines.

Maintained by Qiang Hu. Last updated 5 months ago.

software workflowstep immunooncology

6.4 match 5.52 score 37 scripts 2 dependents

fatelarico

morestopwords:All Stop Words in One Place

A standalone package combining several stop-word lists for 65 languages with a median of 329 stop words per language and over 1,000 entries for English, Breton, Latin, Slovenian, and Ancient Greek! The user automatically gets access to all the unique stop words contained in: the 'StopwordISO' repository; Python's 'Natural Language Toolkit'; the 'Snowball' stop-word list; the R package 'quanteda'; the 'marimo' repository; the 'Perseus' project; and A. Berra's list of stop words for Ancient Greek and Latin.

Maintained by Fabio Ashtar Telarico. Last updated 2 years ago.

13.0 match 2.70 score

ouhscbbmc

REDCapR:Interaction Between R and REDCap

Encapsulates functions to streamline calls from R to the REDCap API. REDCap (Research Electronic Data CAPture) is a web application for building and managing online surveys and databases developed at Vanderbilt University. The Application Programming Interface (API) offers an avenue to access and modify data programmatically, improving the capacity for literate and reproducible programming.

Maintained by Will Beasley. Last updated 2 months ago.

redcap redcap-api

2.8 match 118 stars 12.36 score 438 scripts 6 dependents

jsugarelli

switchcase:A Simple and Flexible Switch-Case Construct for the 'R' Language

Provides a switch-case construct for 'R', as it is known from other programming languages. It allows testing multiple, similar conditions in an efficient, easy-to-read manner, so nested if-else constructs can be avoided. The switch-case construct is designed as an 'R' function that returns values depending on which condition is met and lets the programmer flexibly decide whether or not to leave the switch-case construct after a case block has been executed.

Maintained by Joachim Zuckarelli. Last updated 5 years ago.

r-lang r-language switch-case-construct

10.9 match 3 stars 3.18 score 2 scripts

mlverse

mall:Run Multiple Large Language Model Predictions Against a Table, or Vectors

Run multiple 'Large Language Model' predictions against a table. The predictions run row-wise over a specified column. It works using a one-shot prompt, along with the current row's content. The prompt that is used will depend on the type of analysis needed.

Maintained by Edgar Ruiz. Last updated 3 months ago.

data-science dplyr llm polars python

5.2 match 86 stars 6.61 score 94 scripts

cran

zoomGroupStats:Analyze Text, Audio, and Video from 'Zoom' Meetings

Provides utilities for processing and analyzing the files that are exported from a recorded 'Zoom' Meeting. This includes analyzing data captured through video cameras and microphones, the text-based chat, and meta-data. You can analyze aspects of the conversation among meeting participants and their emotional expressions throughout the meeting.

Maintained by Andrew Knight. Last updated 4 years ago.

10.3 match 3.30 score 10 scripts

docopt

docopt:Command-Line Interface Specification Language

Define a command-line interface by just giving it a description in the specific format.

Maintained by Edwin de Jonge. Last updated 4 years ago.

3.0 match 213 stars 11.29 score 1.5k scripts 19 dependents

stefanieschneider

unstruwwel:Detect and Parse Historic Dates

Automatically converts language-specific verbal information, e.g., "1st half of the 19th century," to its standardized numerical counterparts, e.g., "1801-01-01/1850-12-31." It follows the recommendations of the 'MIDAS' ('Marburger Informations-, Dokumentations- und Administrations-System'), see <doi:10.11588/artdok.00003770>.

Maintained by Stefanie Schneider. Last updated 2 months ago.

dates nlp parser

8.8 match 7 stars 3.85 score 2 scripts

gadget-framework

gadget3:Globally-Applicable Area Disaggregated General Ecosystem Toolbox V3

A framework to assist creation of marine ecosystem models, generating either 'R' or 'C++' code which can then be optimised using the 'TMB' package and standard 'R' tools. Principally designed to reproduce gadget2 models in 'TMB', but can be extended beyond gadget2's capabilities. Kasper Kristensen, Anders Nielsen, Casper W. Berg, Hans Skaug, Bradley M. Bell (2016) <doi:10.18637/jss.v070.i05> "TMB: Automatic Differentiation and Laplace Approximation.". Begley, J., & Howell, D. (2004) <https://core.ac.uk/download/pdf/225936648.pdf> "An overview of Gadget, the globally applicable area-disaggregated general ecosystem toolbox. ICES.".

Maintained by Jamie Lentin. Last updated 29 days ago.

3.9 match 8 stars 8.69 score 170 scripts

pythonhealthdatascience

treat.sim:Nelson's Treatment Centre Simulation in Simmer

A discrete-event simulation of a simple urgent care treatment centre from Nelson (2013), implemented in R Simmer. The model is packaged to allow for easy experimentation, summary of results, and implementation in other software such as a Shiny interface.

Maintained by Thomas Monks. Last updated 8 months ago.

computer-simulation discrete-event-simulation health open-modelling open-science open-source r-language reproducible-research simmer

7.5 match 2 stars 4.48 score 5 scripts

modeloriented

iBreakDown:Model Agnostic Instance Level Variable Attributions

Model agnostic tool for decomposition of predictions from black boxes. Supports additive attributions and attributions with interactions. The Break Down Table shows contributions of every variable to a final prediction. The Break Down Plot presents variable contributions in a concise graphical way. This package works for classification and regression models. It is an extension of the 'breakDown' package (Staniak and Biecek 2018) <doi:10.32614/RJ-2018-072>, with new and faster strategies for orderings. It supports interactions in explanations and has interactive visuals (implemented with the 'D3.js' library). The underlying methodology is described in the 'iBreakDown' article (Gosiewska and Biecek 2019) <arXiv:1903.11420>. This package is a part of the 'DrWhy.AI' universe (Biecek 2018) <arXiv:1806.08915>.

Maintained by Przemyslaw Biecek. Last updated 1 years ago.

breakdown iml interpretability shapley xai

3.3 match 84 stars 10.07 score 56 scripts 22 dependents

cynkra

constructive:Display Idiomatic Code to Construct Most R Objects

Prints code that can be used to recreate R objects. In a sense it is similar to 'base::dput()' or 'base::deparse()' but 'constructive' strives to use idiomatic constructors.

Maintained by Antoine Fabri. Last updated 9 hours ago.

3.9 match 137 stars 8.63 score 20 scripts

mikemahoney218

proceduralnames:Several Methods for Procedural Name Generation

A small, dependency-free way to generate random names. Methods provided include the adjective-surname approach of Docker containers ('<https://github.com/moby/moby/blob/master/pkg/namesgenerator/names-generator.go>'), and combinations of common English or Spanish words.

Maintained by Michael Mahoney. Last updated 3 years ago.

7.2 match 7 stars 4.62 score 4 scripts 4 dependents

elgarteo

cnum:Chinese Numerals Processing

Chinese numerals processing in R, such as conversion between Chinese numerals and Arabic numerals as well as detection and extraction of Chinese numerals in character objects and strings. This package supports the casual scale naming system and the respective SI prefix systems used in mainland China and Taiwan: "The State Council's Order on the Unified Implementation of Legal Measurement Units in Our Country", The State Council of the People's Republic of China (1984); "Names, Definitions and Symbols of the Legal Units of Measurement and the Decimal Multiples and Submultiples", Ministry of Economic Affairs (2019) <https://gazette.nat.gov.tw/egFront/detail.do?metaid=108965>.

Maintained by Elgar Teo. Last updated 2 months ago.

chinese-language numeral-systems-conversions text-mining cpp

9.5 match 6 stars 3.48 score 2 scripts

petersfritz

topiclabels:Automated Topic Labeling with Language Models

Leveraging (large) language models for automatic topic labeling. The main function converts a list of top terms into a label for each topic. Hence, it is complementary to any topic modeling package that produces a list of top terms for each topic. While human judgement is indispensable for topic validation (i.e., inspecting top terms and most representative documents), automatic topic labeling can be a valuable tool for researchers in various scenarios.

Maintained by Jonas Rieger. Last updated 5 months ago.

7.0 match 4 stars 4.73 score 1 scripts

rossellhayes

plu:Dynamically Pluralize Phrases

Converts English phrases to singular or plural form based on the length of an associated vector. Contains helper functions to create natural language lists from vectors and to include the length of a vector in natural language.

Maintained by Alexander Rossell Hayes. Last updated 1 years ago.

natural-language plural

8.3 match 6 stars 3.95 score 2 scripts 1 dependents

terrytangyuan

scaffolder:Scaffolding Interfaces to Packages in Other Programming Languages

Comprehensive set of tools for scaffolding R interfaces to modules, classes, functions, and documentations written in other programming languages, such as 'Python'.

Maintained by Yuan Tang. Last updated 2 years ago.

code-generation python reticulate scaffolding

5.3 match 27 stars 6.13 score 9 scripts

glin

reactable:Interactive Data Tables for R

Interactive data tables for R, based on the 'React Table' JavaScript library. Provides an HTML widget that can be used in 'R Markdown' or 'Quarto' documents, 'Shiny' applications, or viewed from an R console.

Maintained by Greg Lin. Last updated 2 months ago.

htmlwidgets react shiny table

2.3 match 645 stars 14.52 score 3.3k scripts 151 dependents

mohamed-180

gtranslate:Translate Between Different Languages

The goal of this package is to translate between different languages without Google API authentication, which is a pain and requires paying for a key. This package is free and lightweight.

Maintained by Mohamed El-Desouky. Last updated 2 years ago.

9.9 match 4 stars 3.30 score 7 scripts

jsugarelli

pointr:Working Comfortably with Pointers and Shortcuts to R Objects

R has no built-in pointer functionality. The 'pointr' package fills this gap and lets you create pointers to R objects, including subsets of dataframes. This makes your R code more readable and maintainable.

Maintained by Joachim Zuckarelli. Last updated 4 years ago.

pointers r-lang r-language

7.5 match 8 stars 4.31 score 17 scripts 1 dependents

vgherard

sbo:Text Prediction via Stupid Back-Off N-Gram Models

Utilities for training and evaluating text predictors based on Stupid Back-Off N-gram models (Brants et al., 2007, <https://www.aclweb.org/anthology/D07-1090/>).

Maintained by Valerio Gherardi. Last updated 4 years ago.

natural-language-processing ngram-models predictive-text sbo cpp

6.7 match 10 stars 4.78 score 12 scripts

tonyfischetti

libbib:Various Utilities for Library Science/Assessment and Cataloging

Provides functions for validating and normalizing bibliographic codes such as ISBN, ISSN, and LCCN. Also includes functions to communicate with the WorldCat API, translate call numbers (Library of Congress and Dewey Decimal) to their subject classifications or subclassifications, and provides various loadable data files such as call number / subject crosswalks and code tables.

Maintained by Tony Fischetti. Last updated 2 years ago.

9.9 match 3.20 score 32 scripts

hadley

lazyeval:Lazy (Non-Standard) Evaluation

An alternative approach to non-standard evaluation using formulas. Provides a full implementation of LISP style 'quasiquotation', making it easier to generate code with other code.

Maintained by Hadley Wickham. Last updated 3 years ago.

2.0 match 131 stars 15.74 score 520 scripts 1.8k dependents

trn000

norMmix:Direct MLE for Multivariate Normal Mixture Distributions

Multivariate Normal (i.e. Gaussian) Mixture Models (S3) Classes. Fitting models to data using 'MLE' (maximum likelihood estimation) for multivariate normal mixtures via smart parametrization using the 'LDL' (Cholesky) decomposition, see McLachlan and Peel (2000, ISBN:9780471006268), Celeux and Govaert (1995) <doi:10.1016/0031-3203(94)00125-6>.

Maintained by Nicolas Trutmann. Last updated 6 months ago.

gaussian-mixture-models maximum-likelihood-estimation r-language

7.5 match 4.18 score 3 scripts

bnosac

sentencepiece:Text Tokenization using Byte Pair Encoding and Unigram Modelling

Unsupervised text tokenizer that performs byte pair encoding and unigram modelling. Wraps the 'sentencepiece' library <https://github.com/google/sentencepiece> which provides a language independent tokenizer to split text into words and smaller subword units. The techniques are explained in the paper "SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing" by Taku Kudo and John Richardson (2018) <doi:10.18653/v1/D18-2012>. Also provides straightforward access to pretrained byte pair encoding models and subword embeddings trained on Wikipedia using 'word2vec', as described in "BPEmb: Tokenization-free Pre-trained Subword Embeddings in 275 Languages" by Benjamin Heinzerling and Michael Strube (2018) <http://www.lrec-conf.org/proceedings/lrec2018/pdf/1049.pdf>.

Maintained by Jan Wijffels. Last updated 2 years ago.

byte natural-language-processing sentencepiece word-segmentation cpp

7.6 match 25 stars 4.10 score 8 scripts

theharmonylab

topics:Creating and Significance Testing Language Features for Visualisation

Implements differential language analysis with statistical tests and offers various language visualization techniques for n-grams and topics. It also supports the 'text' package. For more information, visit <https://r-topics.org/> and <https://www.r-text.org/>.

Maintained by Oscar Kjell. Last updated 5 days ago.

openjdk

3.7 match 5 stars 8.28 score 22 scripts 2 dependents

nalimilan

SnowballC:Snowball Stemmers Based on the C 'libstemmer' UTF-8 Library

An R interface to the C 'libstemmer' library that implements Porter's word stemming algorithm for collapsing words to a common root to aid comparison of vocabulary. Currently supported languages are Arabic, Basque, Catalan, Danish, Dutch, English, Finnish, French, German, Greek, Hindi, Hungarian, Indonesian, Irish, Italian, Lithuanian, Nepali, Norwegian, Portuguese, Romanian, Russian, Spanish, Swedish, Tamil and Turkish.

Maintained by Milan Bouchet-Valat. Last updated 15 days ago.

text-mining

2.4 match 27 stars 12.63 score 4.4k scripts 171 dependents

brodieg

vetr:Trust, but Verify

Declarative template-based framework for verifying that objects meet structural requirements, and auto-composing error messages when they do not.

Maintained by Brodie Gaslam. Last updated 9 months ago.

argument-checks input-validation

4.0 match 79 stars 7.50 score 67 scripts 1 dependents

krashkov

pcSteiner:Convenient Tool for Solving the Prize-Collecting Steiner Tree Problem

The Prize-Collecting Steiner Tree problem asks to find a subgraph connecting a given set of vertices with the most expensive nodes and least expensive edges. Since the problem is proven to be NP-hard, no exact algorithm that is also efficient is known. This package provides convenient functionality for obtaining an approximate solution to this problem using the loopy belief propagation algorithm.

Maintained by Aleksei Krasikov. Last updated 5 years ago.

graph-algorithms r-language steiner-tree steiner-tree-problem

7.5 match 2 stars 4.00 score 3 scripts

ip2location

ip2location:Lookup for IP Address Information

Enables the user to find the country, region, district, city, coordinates, zip code, time zone, ISP, domain name, connection type, area code, weather, MCC, MNC, mobile brand name, elevation, usage type, address type, IAB category and ASN that any IP address or hostname originates from. Supports IPv4 and IPv6. Please visit <https://www.ip2location.com> to learn more. You may also want to visit <https://lite.ip2location.com> for free database download. This package requires the 'IP2Location Python' module. At the terminal, please run 'pip install IP2Location' to install the module.

Maintained by Kai Wen Ooi. Last updated 2 years ago.

geolocation geolocation-information ip-geolocation ip-lookup ip2location lookup r-language

7.5 match 10 stars 4.00 score 1 scripts

appliedstat

weibullness:Goodness-of-Fit Test for Weibull Distribution (Weibullness)

Conducts a goodness-of-fit test for the Weibull distribution (referred to as the weibullness test) and furnishes parameter estimations for both the two-parameter and three-parameter Weibull distributions. Notably, the threshold parameter is derived through correlation from the Weibull plot. Additionally, this package conducts goodness-of-fit assessments for the exponential, Gumbel, and inverse Weibull distributions, accompanied by parameter estimations. For more details, see Park (2017) <doi:10.23055/ijietap.2017.24.4.2848>, Park (2018) <doi:10.1155/2018/6056975>, and Park (2023) <doi:10.3390/math11143156>. This work was supported by the National Research Foundation of Korea (NRF) grants funded by the Korea government (MSIT) (No. 2022R1A2C1091319, RS-2023-00242528).

Maintained by Chanseok Park. Last updated 1 years ago.

control-chart goodness-of-fit r-language weibull

7.5 match 2 stars 3.98 score 32 scripts 1 dependents

nashjc

nlsr:Functions for Nonlinear Least Squares Solutions - Updated 2022

Provides tools for working with nonlinear least squares problems. For the estimation of models, these tools are more reliable and robust than nls(), where the Gauss-Newton method frequently stops with 'singular gradient' messages. This is accomplished by using, where possible, analytic derivatives to compute the matrix of derivatives and a stabilization of the solution of the estimation equations. Tools for approximate or externally supplied derivative matrices are included. Bounds and masks on parameters are handled properly.

Maintained by John C Nash. Last updated 26 days ago.

4.2 match 7.02 score 94 scripts 5 dependents

azure

AzureKusto:Interface to 'Kusto'/'Azure Data Explorer'

An interface to 'Azure Data Explorer', also known as 'Kusto', a fast, distributed data exploration service from Microsoft: <https://azure.microsoft.com/en-us/products/data-explorer/>. Includes 'DBI' and 'dplyr' interfaces, with the latter modelled after the 'dbplyr' package, whereby queries are translated from R into the native 'KQL' query language and executed lazily. On the admin side, the package extends the object framework provided by 'AzureRMR' to support creation and deletion of databases, and management of database principals. Part of the 'AzureR' family of packages.

Maintained by Alex Kyllo. Last updated 1 years ago.

azure azure-data-explorer azure-sdk-r big-data-analytics kusto

5.5 match 18 stars 5.19 score 9 scripts

rossellhayes

nombre:Number Names

Converts numeric vectors to character vectors of English number names. Provides conversion to cardinals, ordinals, numerators, and denominators. Supports negative and non-integer numbers.

Maintained by Alexander Rossell Hayes. Last updated 3 years ago.

natural-language

7.5 match 13 stars 3.81 score 4 scripts

philferriere

mscstexta4r:R Client for the Microsoft Cognitive Services Text Analytics REST API

R Client for the Microsoft Cognitive Services Text Analytics REST API, including Sentiment Analysis, Topic Detection, Language Detection, and Key Phrase Extraction. An account MUST be registered at the Microsoft Cognitive Services website <https://www.microsoft.com/cognitive-services/> in order to obtain a (free) API key. Without an API key, this package will not work properly.

Maintained by Phil Ferriere. Last updated 9 years ago.

5.4 match 24 stars 5.28 score 16 scripts

cysouw

qlcData:Processing Data for Quantitative Language Comparison

Functionality to read, recode, and transcode data as used in quantitative language comparison, specifically to deal with multilingual orthographic variation (Moran & Cysouw (2018) <doi:10.5281/zenodo.1296780>) and with the recoding of nominal data.

Maintained by Michael Cysouw. Last updated 9 months ago.

5.3 match 3 stars 5.38 score 40 scripts

damoncharlesroberts

genCountR:Interacting with Roberts and Utych's (2019) Gendered Language Dictionary

Allows users to generate a gendered language score according to the gendered language dictionary in Roberts and Utych (2019) <doi:10.1177/1065912919874883>.

Maintained by Damon Roberts. Last updated 8 months ago.

7.1 match 4.00 score 2 scripts

iatgen

tr.iatgen:Translate 'iatgen' Generated QSF Files

Automates translating the instructions of 'iatgen' generated qsf (Qualtrics survey files) to other languages using either officially supported or user-supplied translations (for tutorial see Santos et al., 2023 <doi:10.17504/protocols.io.kxygx34jdg8j/v1>).

Maintained by Michal Kouril. Last updated 3 months ago.

6.8 match 4.18 score 4 scripts

uribo

textlintr:Natural Language Linter Tools for 'R Markdown' and R Code

What the package does (one paragraph).

Maintained by Shinya Uryu. Last updated 2 years ago.

lint natural-language-processing

9.5 match 9 stars 2.95 score 4 scripts

colinfay

proustr:Tools for Natural Language Processing in French

Tools for Natural Language Processing in French and texts from Marcel Proust's collection "A La Recherche Du Temps Perdu". The novels contained in this collection are "Du cote de chez Swann", "A l'ombre des jeunes filles en fleurs", "Le Cote de Guermantes", "Sodome et Gomorrhe I et II", "La Prisonniere", "Albertine disparue", and "Le Temps retrouve".

Maintained by Colin Fay. Last updated 6 years ago.

4.6 match 24 stars 6.10 score 104 scripts

simonpcouch

chores:A Collection of Large Language Model Assistants

Provides a collection of ergonomic large language model assistants designed to help you complete repetitive, hard-to-automate tasks quickly. After selecting some code, press the keyboard shortcut you've chosen to trigger the package app, select an assistant, and watch your chore be carried out. While the package ships with a number of chore helpers for R package development, users can create custom helpers just by writing some instructions in a markdown file.

Maintained by Simon Couch. Last updated 22 days ago.

3.5 match 90 stars 7.91 score 6 scripts

howl-anderson

sdmvspecies:Create Virtual Species for Species Distribution Modelling

A software package that helps users create virtual species for species distribution modelling. It includes several methods to help users create virtual species distribution maps. Those maps can be used for Species Distribution Modelling (SDM) studies. SDM uses environmental data for sites of occurrence of a species to predict all the sites where the environmental conditions are suitable for the species to persist, and where it may be expected to occur.

Maintained by Xiaoquan Kong. Last updated 9 years ago.

r-language species-distribution-modelling virtual-species

7.5 match 1 stars 3.70 score 8 scripts

ropensci

ritis:Integrated Taxonomic Information System Client

An interface to the Integrated Taxonomic Information System ('ITIS') (<https://www.itis.gov>). Includes functions to work with the 'ITIS' REST API methods (<https://www.itis.gov/ws_description.html>), as well as the 'Solr' web service (<https://www.itis.gov/solr_documentation.html>).

Maintained by Julia Blum. Last updated 1 months ago.

taxonomy biology nomenclature json api web api-client identifiers species names api-wrapper itis taxize

3.6 match 16 stars 7.72 score 64 scripts 24 dependents

brianweinstein

googlenlp:An Interface to Google's Cloud Natural Language API

Interact with Google's Cloud Natural Language API <https://cloud.google.com/natural-language/> (v1) via R. The API has four main features, all of which are available through this R package: syntax analysis and part-of-speech tagging, entity analysis, sentiment analysis, and language identification.

Maintained by Brian Weinstein. Last updated 7 years ago.

api google-cloud-platform nlp

7.2 match 8 stars 3.86 score 18 scripts

ropensci

babelquarto:Renders a Multilingual Quarto Book

Automate rendering and cross-linking of Quarto books following a prescribed structure.

Maintained by Maëlle Salmon. Last updated 1 months ago.

3.7 match 43 stars 7.52 score 23 scripts 1 dependents

discindo

newscatcheR:Programmatically Collect Normalized News from (Almost) Any Website

Programmatically collect normalized news from (almost) any website. An 'R' clone of the <https://github.com/kotartemiy/newscatcher> 'Python' module.

Maintained by Novica Nakov. Last updated 1 years ago.

hacktoberfest news-sites newscatcher rss-feed tidyrss

4.8 match 30 stars 5.65 score 7 scripts

aravind-j

PGRdup:Discover Probable Duplicates in Plant Genetic Resources Collections

Provides functions to aid the identification of probable/possible duplicates in Plant Genetic Resources (PGR) collections using 'passport databases' comprising information records of each constituent sample. These include methods for cleaning the data, creation of a searchable Key Word in Context (KWIC) index of keywords associated with sample records and the identification of nearly identical records with similar information by fuzzy, phonetic and semantic matching of keywords.

Maintained by J. Aravind. Last updated 2 years ago.

double-metaphone double-metaphone-algorithm natural-language-processing pgr plant-genetic-resources record-linkage

6.7 match 1 stars 4.06 score 23 scripts

colinfay

languagelayeR:Access the 'languagelayer' API

Improve your text analysis with languagelayer <https://languagelayer.com>, a powerful language detection API.

Maintained by Colin FAY. Last updated 6 years ago.

6.1 match 5 stars 4.40 score 7 scripts

john-harrold

ubiquity:PKPD, PBPK, and Systems Pharmacology Modeling Tools

Complete workflow for the analysis of pharmacokinetic pharmacodynamic (PKPD), physiologically-based pharmacokinetic (PBPK) and systems pharmacology models including: creation of ordinary differential equation-based models, pooled parameter estimation, individual/population based simulations, rule-based simulations for clinical trial design and modeling assays, deployment with a customizable 'Shiny' app, and non-compartmental analysis. System-specific analysis templates can be generated and each element includes integrated reporting with 'PowerPoint' and 'Word'.

Maintained by John Harrold. Last updated 15 days ago.

modeling pkpd

3.8 match 13 stars 7.14 score 33 scripts

philferriere

mscsweblm4r:R Client for the Microsoft Cognitive Services Web Language Model REST API

R Client for the Microsoft Cognitive Services Web Language Model REST API, including Break Into Words, Calculate Conditional Probability, Calculate Joint Probability, Generate Next Words, and List Available Models. A valid account MUST be registered at the Microsoft Cognitive Services website <https://www.microsoft.com/cognitive-services/> in order to obtain a (free) API key. Without an API key, this package will not work properly.

Maintained by Phil Ferriere. Last updated 9 years ago.

6.7 match 2 stars 4.00 score 9 scripts

makhgal-ganbold

NSO1212:National Statistical Office of Mongolia's Open Data API Handler

The National Statistical Office of Mongolia (NSO) is the national statistical service and an organization of the Mongolian government. NSO provides open access to official data via its API <http://opendata.1212.mn/en/doc>. The package NSO1212 has functions for accessing the API service. The functions are compatible with the API v2.0 and retrieve data sets and their detailed information from the API.

Maintained by Makhgal Ganbold. Last updated 3 years ago.

r-language statistics

7.5 match 7 stars 3.54 score 6 scripts

rpkgs

rcolors:270 'NCL' Color Tables in R Language

'NCL' (NCAR Command Language) is one of the most popular spatial data mapping tools in meteorology studies, due to its beautiful output figures with plenty of color palettes designed by experts <https://www.ncl.ucar.edu/index.shtml>. Here we translate all 'NCL' color palettes into R hexadecimal RGB colors and provide a color selection function, which helps users make beautiful figures.

Maintained by Dongdong Kong. Last updated 9 months ago.

5.1 match 17 stars 5.14 score 54 scripts

ropensci

tif:Text Interchange Format

Provides validation functions for common interchange formats for representing text data in R. Includes formats for corpus objects, document term matrices, and tokens. Other annotations can be stored by overloading the tokens structure.

Maintained by Taylor B. Arnold. Last updated 1 years ago.

corpus natural-language-processing term-frequency text-processing tokenizer

6.7 match 36 stars 3.94 score 16 scripts

uribo

sudachir:R Interface to 'Sudachi'

Interface to 'Sudachi' <https://github.com/WorksApplications/sudachi.rs>, a Japanese morphological analyzer. This is a port of what is available in Python.

Maintained by Shinya Uryu. Last updated 2 years ago.

japanese-language nlp

7.5 match 6 stars 3.48 score 6 scripts

opendataformat

opendataformat:Reading and Writing Open Data Format Files

The Open Data Format (ODF) is a new, non-proprietary, multilingual, metadata enriched, and zip-compressed data format with metadata structured in the Data Documentation Initiative (DDI) Codebook standard. This package allows reading and writing of data files in the Open Data Format (ODF) in R, and displaying metadata in different languages. For further information on the Open Data Format, see <https://opendataformat.github.io/>.

Maintained by Tom Hartl. Last updated 5 days ago.

4.8 match 5.41 score 7 scripts

moodymudskipper

typed:Support Types for Variables, Arguments, and Return Values

A type system for R. It supports setting variable types in a script or the body of a function, so variables can't be assigned illegal values. Moreover it supports setting argument and return types for functions.

Maintained by Antoine Fabri. Last updated 2 months ago.

3.0 match 169 stars 8.65 score 18 scripts 1 dependents

arnaudgallou

plume:A Simple Author Handler for Scientific Writing

Handles and formats author information in scientific writing in 'R Markdown' and 'Quarto'. 'plume' provides easy-to-use and flexible tools for injecting author metadata in 'YAML' headers as well as generating author and contribution lists (among others) as strings from tabular data.

Maintained by Arnaud Gallou. Last updated 29 days ago.

authors contribution contributions list lists markdown paper preprint quarto role roles

3.8 match 21 stars 6.84 score 15 scripts

eworx-org

labourR:Classify Multilingual Labour Market Free-Text to Standardized Hierarchical Occupations

Allows the user to map multilingual free-text of occupations to a broad range of standardized classifications. The package facilitates automatic occupation coding (see, e.g., Gweon et al. (2017) <doi:10.1515/jos-2017-0006> and Turrell et al. (2019) <doi:10.3386/w25837>), where the ISCO to ESCO mapping is exploited to extend the occupations hierarchy, Le Vrang et al. (2014) <doi:10.1109/mc.2014.283>. Document vectorization is performed using the multilingual ESCO corpus. A method based on the nearest neighbor search is used to suggest the closest ISCO occupation.

Maintained by Alexandros Kouretsis. Last updated 3 years ago.

4.0 match 28 stars 6.29 score 23 scripts 1 dependents

drjphughesjr

hash:Full Featured Implementation of Hash Tables/Associative Arrays/Dictionaries

Implements a data structure similar to hashes in Perl and dictionaries in Python but with a purposefully R flavor. For objects of appreciable size, access using hashes outperforms native named lists and vectors.

Maintained by John Hughes. Last updated 2 years ago.

3.4 match 1 stars 7.54 score 4.0k scripts 50 dependents

easystats

report:Automated Reporting of Results and Statistical Models

The aim of the 'report' package is to bridge the gap between R’s output and the formatted results contained in your manuscript. This package converts statistical models and data frames into textual reports suited for publication, ensuring standardization and quality in results reporting.

Maintained by Rémi Thériault. Last updated 1 months ago.

anovas apa automated-report-generation automatic bayesian describe easystats hacktoberfest manuscript models report reporting reports scientific statsmodels

1.8 match 698 stars 14.48 score 1.1k scripts 3 dependents

projectmosaic

mosaicCalc:R-Language Based Calculus Operations for Teaching

Software to support the introductory *MOSAIC Calculus* textbook <https://www.mosaic-web.org/MOSAIC-Calculus/>, one of many data- and modeling-oriented educational resources developed by Project MOSAIC (<https://www.mosaic-web.org/>). Provides symbolic and numerical differentiation and integration, as well as support for applied linear algebra (for data science), and differential equations/dynamics. Includes grammar-of-graphics-based functions for drawing vector fields, trajectories, etc. The software is suitable for general use, but intended mainly for teaching calculus.

Maintained by Daniel Kaplan. Last updated 18 days ago.

2.9 match 13 stars 8.68 score 546 scripts

mclements

ascii:Export R Objects to Several Markup Languages

Coerce R objects to 'asciidoc', 'txt2tags', 'reStructuredText', 'org', 'textile' or 'pandoc' syntax. Package comes with a set of drivers for 'Sweave'.

Maintained by Mark Clements. Last updated 1 years ago.

4.7 match 8 stars 5.31 score 161 scripts 2 dependents

melff

memisc:Management of Survey Data and Presentation of Analysis Results

An infrastructure for the management of survey data including value labels, definable missing values, recoding of variables, production of code books, and import of (subsets of) 'SPSS' and 'Stata' files is provided. Further, the package allows users to produce tables and data frames of arbitrary descriptive statistics and (almost) publication-ready tables of regression model estimates, which can be exported to 'LaTeX' and HTML.

Maintained by Martin Elff. Last updated 10 days ago.

survey-data

2.0 match 46 stars 12.34 score 1.2k scripts 13 dependents

epiverse-trace

numberize:Convert Words to Numbers in Multiple Languages

Converts written-out numbers into their numeric equivalents. Supports numbers written out in English, French, or Spanish.

Maintained by Bankole Ahadzie. Last updated 11 days ago.

r-programming

4.7 match 4 stars 5.28 score 1 scripts 1 dependents

ropensci

xslt:Extensible Style-Sheet Language Transformations

An extension for the 'xml2' package to transform XML documents by applying an 'xslt' style-sheet.

Maintained by Jeroen Ooms. Last updated 15 days ago.

xml xslt libxslt libxml2 cpp

3.0 match 29 stars 8.20 score 80 scripts 12 dependents

symengine

symengine:Interface to the 'SymEngine' Library

Provides an R interface to 'SymEngine' <https://github.com/symengine/>, a standalone 'C++' library for fast symbolic manipulation. The package has functionalities for symbolic computation like calculating exact mathematical expressions, solving systems of linear equations and code generation.

Maintained by Jialin Ma. Last updated 1 years ago.

mpfr4 gmp cpp

3.0 match 26 stars 8.20 score 33 scripts 10 dependents

giocomai

zoteror:Access the Zotero API in R

zoteror provides tools to access the Zotero API.

Maintained by Giorgio Comai. Last updated 5 years ago.

r-language zotero zotero-api

7.5 match 37 stars 3.27 score 5 scripts

bioc

BiocGenerics:S4 generic functions used in Bioconductor

The package defines many S4 generic functions used in Bioconductor.

Maintained by Hervé Pagès. Last updated 1 months ago.

infrastructure bioconductor-package core-package

1.7 match 12 stars 14.22 score 612 scripts 2.2k dependents