Showing 200 of 657 results
ropensci
charlatan:Make Fake Data
Make fake data that looks realistic, supporting addresses, person names, dates, times, colors, coordinates, currencies, digital object identifiers ('DOIs'), jobs, phone numbers, 'DNA' sequences, doubles and integers from distributions and within a range.
Maintained by Roel M. Hogervorst. Last updated 1 month ago.
data, dataset, fake-data, faker, peer-reviewed
74.3 match 296 stars 10.06 score 180 scripts 1 dependent
ropensci
lingtypology:Linguistic Typology and Mapping
Provides R with the Glottolog database <https://glottolog.org/> and additional tools for linguistic mapping. The Glottolog database contains a catalogue of the languages of the world. This package helps researchers make linguistic maps, following the philosophy of the Cross-Linguistic Linked Data project <https://clld.org/>, which facilitates uniform access to the data across publications. A tutorial for this package is available on GitHub Pages <https://docs.ropensci.org/lingtypology/> and in the package vignette. Maps created with this package can be used for both research and linguistics teaching. In addition, the package can download data from typological databases such as WALS, AUTOTYP and some others, and can create your own database website.
Maintained by George Moroz. Last updated 5 months ago.
abvdafboatlasautotypebivaltypclldglottolog-databaselinguistic-mapslinguisticsphoiblesailstypologywals
69.7 match 51 stars 9.58 score 694 scripts
bnosac
udpipe:Tokenization, Parts of Speech Tagging, Lemmatization and Dependency Parsing with the 'UDPipe' 'NLP' Toolkit
This natural language processing toolkit provides language-agnostic 'tokenization', 'parts of speech tagging', 'lemmatization' and 'dependency parsing' of raw text. Next to text parsing, the package also allows you to train annotation models based on data of 'treebanks' in 'CoNLL-U' format as provided at <https://universaldependencies.org/format.html>. The techniques are explained in detail in the paper: 'Tokenizing, POS Tagging, Lemmatizing and Parsing UD 2.0 with UDPipe', available at <doi:10.18653/v1/K17-3009>. The toolkit also contains functionalities for commonly used data manipulations on texts which are enriched with the output of the parser. These include functionalities and algorithms for collocations, token co-occurrence, document term matrix handling, term frequency inverse document frequency calculations, information retrieval metrics (Okapi BM25), handling of multi-word expressions, keyword detection (Rapid Automatic Keyword Extraction, noun phrase extraction, syntactical patterns), sentiment scoring and semantic similarity analysis.
Maintained by Jan Wijffels. Last updated 2 years ago.
conll, dependency-parser, lemmatization, natural-language-processing, nlp, pos-tagging, r-pkg, rcpp, text-mining, tokenizer, udpipe, cpp
29.6 match 215 stars 11.83 score 1.2k scripts 9 dependents
ropensci
googleLanguageR:Call Google's 'Natural Language' API, 'Cloud Translation' API, 'Cloud Speech' API and 'Cloud Text-to-Speech' API
Call 'Google Cloud' machine learning APIs for text and speech tasks. Call the 'Cloud Translation' API <https://cloud.google.com/translate/> for detection and translation of text, the 'Natural Language' API <https://cloud.google.com/natural-language/> to analyse text for sentiment, entities or syntax, the 'Cloud Speech' API <https://cloud.google.com/speech/> to transcribe sound files to text and the 'Cloud Text-to-Speech' API <https://cloud.google.com/text-to-speech/> to turn text into sound files.
Maintained by Mark Edmondson. Last updated 8 months ago.
cloud-speech-api, cloud-translation-api, google-api-client, google-cloud, google-cloud-speech, google-nlp, googleauthr, natural-language-processing, peer-reviewed, sentiment-analysis, speech-api, translation-api
28.1 match 196 stars 10.36 score 268 scripts 3 dependents
tomeriko96
polyglotr:Translate Text
Provide easy methods to translate pieces of text. Functions send requests to translation services online.
Maintained by Tomer Iwan. Last updated 1 month ago.
google-translate, googletranslate, language, linguee, mymemory-api, mymemorytranslator, pons, translation, translations-api
28.9 match 33 stars 7.61 score 34 scripts 1 dependent
psychbruce
FMAT:The Fill-Mask Association Test
The Fill-Mask Association Test ('FMAT') <doi:10.1037/pspa0000396> is an integrative and probability-based method using Masked Language Models to measure conceptual associations (e.g., attitudes, biases, stereotypes, social norms, cultural values) as propositions in natural language. Supported language models include 'BERT' <doi:10.48550/arXiv.1810.04805> and its variants available at 'Hugging Face' <https://huggingface.co/models?pipeline_tag=fill-mask>. Methodological references and installation guidance are provided at <https://psychbruce.github.io/FMAT/>.
Maintained by Han-Wu-Shuang Bao. Last updated 5 months ago.
ai, artificial-intelligence, bert, bert-model, bert-models, contextualized-representation, fill-in-the-blank, fill-mask, huggingface, language-model, language-models, large-language-models, masked-language-models, natural-language-processing, natural-language-understanding, nlp, pretrained-models, transformer, transformers
42.6 match 12 stars 4.82 score 2 scripts
oscarkjell
text:Analyses of Text using Transformers Models from HuggingFace, Natural Language Processing and Machine Learning
Link R with Transformers from Hugging Face to transform text variables to word embeddings, which are then used to statistically test the mean difference between sets of texts, compute semantic similarity scores between texts, predict numerical variables, and visualize statistically significant words according to various dimensions. For more information see <https://www.r-text.org>.
Maintained by Oscar Kjell. Last updated 2 days ago.
deep-learning, machine-learning, nlp, transformers, openjdk
15.0 match 146 stars 13.16 score 436 scripts 1 dependent
vgherard
kgrams:Classical k-gram Language Models
Training and evaluating k-gram language models in R, supporting several probability smoothing techniques, perplexity computations, random text generation and more.
Maintained by Valerio Gherardi. Last updated 4 months ago.
language-models, n-grams, natural-language-processing, cpp
32.1 match 7 stars 5.17 score 14 scripts 1 dependent
gagolews
stringi:Fast and Portable Character String Processing Facilities
A collection of character string/text/natural language processing tools for pattern searching (e.g., with 'Java'-like regular expressions or the 'Unicode' collation algorithm), random string generation, case mapping, string transliteration, concatenation, sorting, padding, wrapping, Unicode normalisation, date-time formatting and parsing, and many more. They are fast, consistent, convenient, and - thanks to 'ICU' (International Components for Unicode) - portable across all locales and platforms. Documentation about 'stringi' is provided via its website at <https://stringi.gagolewski.com/> and the paper by Gagolewski (2022, <doi:10.18637/jss.v103.i02>).
Maintained by Marek Gagolewski. Last updated 1 month ago.
icu, icu4c, natural-language-processing, nlp, regex, regexp, string-manipulation, stringi, stringr, text, text-processing, tidy-data, unicode, cpp
9.0 match 309 stars 18.31 score 10k scripts 8.6k dependents
ropensci
cld2:Google's Compact Language Detector 2
Bindings to Google's C++ library Compact Language Detector 2 (see <https://github.com/cld2owners/cld2#readme> for more information). Probabilistically detects over 80 languages in plain text or HTML. For mixed-language input it returns the top three detected languages and their approximate proportion of the total classified text bytes (e.g. 80% English and 20% French out of 1000 bytes). There is also a 'cld3' package on CRAN which uses a neural network model instead.
Maintained by Jeroen Ooms. Last updated 5 months ago.
cld, cld2, language-detection, language-detector, cpp
20.9 match 38 stars 7.74 score 161 scripts 3 dependents
davisvaughan
treesitter:Bindings to 'Tree-Sitter'
Provides bindings to 'Tree-sitter', an incremental parsing system for programming tools. 'Tree-sitter' builds concrete syntax trees for source files of any language, and can efficiently update those syntax trees as the source file is edited. It also includes a robust error recovery system that provides useful parse results even in the presence of syntax errors.
Maintained by Davis Vaughan. Last updated 6 months ago.
23.1 match 37 stars 6.62 score 18 scripts 2 dependents
lepennec
ggwordcloud:A Word Cloud Geom for 'ggplot2'
Provides a word cloud text geom for 'ggplot2'. Texts are placed so that they do not overlap, as in 'ggrepel'. The algorithm is a variation on the one used in 'wordcloud2.js'.
Maintained by Erwan Le Pennec. Last updated 10 months ago.
14.3 match 174 stars 10.38 score 1.3k scripts 15 dependents
reditorsupport
languageserver:Language Server Protocol
An implementation of the Language Server Protocol for R. The Language Server protocol is used by an editor client to integrate features like auto completion. See <https://microsoft.github.io/language-server-protocol/> for details.
Maintained by Randy Lai. Last updated 1 year ago.
14.7 match 607 stars 9.93 score 207 scripts 1 dependent
yihui
knitr:A General-Purpose Package for Dynamic Report Generation in R
Provides a general-purpose tool for dynamic report generation in R using Literate Programming techniques.
Maintained by Yihui Xie. Last updated 8 hours ago.
dynamic-documents, knitr, literate-programming, rmarkdown, sweave
6.0 match 2.4k stars 23.62 score 116k scripts 4.2k dependents
ropensci
cld3:Google's Compact Language Detector 3
Google's Compact Language Detector 3 is a neural network model for language identification and the successor of 'cld2' (available from CRAN). The algorithm is still experimental and takes a novel approach to language detection with different properties and outcomes. It can be useful to combine this with the Bayesian classifier results from 'cld2'. See <https://github.com/google/cld3#readme> for more information.
Maintained by Jeroen Ooms. Last updated 5 months ago.
cld, cld3, language-detection, language-detector, protobuf, cpp
20.8 match 41 stars 6.55 score 85 scripts 1 dependent
gabrielkaiserqfin
perplexR:A Coding Assistant using Perplexity's Large Language Models
A coding assistant using Perplexity's Large Language Models <https://www.perplexity.ai/> API. A set of functions and 'RStudio' add-ins that aim to help R developers.
Maintained by Gabriel Kaiser. Last updated 2 months ago.
31.8 match 6 stars 4.09 score 1 script
aphalo
learnrbook:Datasets and Code Examples from P. J. Aphalo's "Learn R" Book
Data, scripts and code from chunks used as examples in the book "Learn R: As a Language" 1ed and 2ed by Pedro J. Aphalo. ISBN 9780367182533 (pbk 1ed); ISBN 9780367182557 (hbk 1ed); ISBN 9780429060342 (ebk 1ed).
Maintained by Pedro J. Aphalo. Last updated 7 months ago.
27.4 match 1 star 4.57 score 25 scripts
cran
languageR:Analyzing Linguistic Data: A Practical Introduction to Statistics
Data sets exemplifying statistical methods, and some facilitatory utility functions used in "Analyzing Linguistic Data: A practical introduction to statistics using R", Cambridge University Press, 2008.
Maintained by R. H. Baayen. Last updated 6 years ago.
53.6 match 2.32 score
appsilon
shiny.i18n:Shiny Applications Internationalization
Provides easy internationalization of Shiny applications. It can also be used as a standalone translation package to translate reports, interactive visualizations or graphical elements.
Maintained by Jakub Nowicki. Last updated 11 months ago.
internationalization, language, rhinoverse, shiny, translation
11.9 match 168 stars 9.97 score 312 scripts 6 dependents
kjhealy
gssrdoc:Document General Social Survey Variable
The General Social Survey (GSS) is a long-running, mostly annual survey of US households. It is administered by the National Opinion Research Center (NORC). This package contains a tibble with information on the survey variables, together with every variable documented as an R help page. For more information on the GSS see <http://gss.norc.org>.
Maintained by Kieran Healy. Last updated 11 months ago.
51.6 match 2.28 score 38 scripts
juliasilge
tidytext:Text Mining using 'dplyr', 'ggplot2', and Other Tidy Tools
Using tidy data principles can make many text mining tasks easier, more effective, and consistent with tools already in wide use. Much of the infrastructure needed for text mining with tidy data frames already exists in packages like 'dplyr', 'broom', 'tidyr', and 'ggplot2'. In this package, we provide functions and supporting data sets to allow conversion of text to and from tidy formats, and to switch seamlessly between tidy tools and existing text mining packages.
Maintained by Julia Silge. Last updated 11 months ago.
natural-language-processing, text-mining, tidy-data, tidyverse
6.7 match 1.2k stars 16.86 score 17k scripts 61 dependents
quanteda
quanteda:Quantitative Analysis of Textual Data
A fast, flexible, and comprehensive framework for quantitative text analysis in R. Provides functionality for corpus management, creating and manipulating tokens and n-grams, exploring keywords in context, forming and manipulating sparse matrices of documents by features and feature co-occurrences, analyzing keywords, computing feature similarities and distances, applying content dictionaries, applying supervised and unsupervised machine learning, visually representing text and text analyses, and more.
Maintained by Kenneth Benoit. Last updated 2 months ago.
corpus, natural-language-processing, quanteda, text-analytics, onetbb, cpp
6.7 match 851 stars 16.68 score 5.4k scripts 51 dependents
quanteda
stopwords:Multilingual Stopword Lists
Provides multiple sources of stopwords, for use in text analysis and natural language processing.
Maintained by Kenneth Benoit. Last updated 3 years ago.
9.6 match 114 stars 10.54 score 1.1k scripts 65 dependents
rstudio
learnr:Interactive Tutorials for R
Create interactive tutorials using R Markdown. Use a combination of narrative, figures, videos, exercises, and quizzes to create self-paced tutorials for learning about R and R packages.
Maintained by Garrick Aden-Buie. Last updated 6 months ago.
interactive, python, rmarkdown, shiny, sql, teaching, tutorial
6.7 match 713 stars 14.79 score 6.5k scripts 27 dependents
nimble-dev
nimble:MCMC, Particle Filtering, and Programmable Hierarchical Modeling
A system for writing hierarchical statistical models largely compatible with 'BUGS' and 'JAGS', writing nimbleFunctions to operate models and do basic R-style math, and compiling both models and nimbleFunctions via custom-generated C++. 'NIMBLE' includes default methods for MCMC, Laplace Approximation, Monte Carlo Expectation Maximization, and some other tools. The nimbleFunction system makes it easy to do things like implement new MCMC samplers from R, customize the assignment of samplers to different parts of a model from R, and compile the new samplers automatically via C++ alongside the samplers 'NIMBLE' provides. 'NIMBLE' extends the 'BUGS'/'JAGS' language by making it extensible: New distributions and functions can be added, including as calls to external compiled code. Although most people think of MCMC as the main goal of the 'BUGS'/'JAGS' language for writing models, one can use 'NIMBLE' for writing arbitrary other kinds of model-generic algorithms as well. A full User Manual is available at <https://r-nimble.org>.
Maintained by Christopher Paciorek. Last updated 3 days ago.
bayesian-inference, bayesian-methods, hierarchical-models, mcmc, probabilistic-programming, openblas, cpp
7.6 match 169 stars 12.97 score 2.6k scripts 19 dependents
edjnet
tidywikidatar:Explore 'Wikidata' Through Tidy Data Frames
Query 'Wikidata' API <https://www.wikidata.org/wiki/Wikidata:Main_Page> with ease, get tidy data frames in response, and cache data in a local database.
Maintained by Giorgio Comai. Last updated 8 months ago.
12.4 match 26 stars 7.86 score 46 scripts 2 dependents
cysouw
qlcMatrix:Utility Sparse Matrix Functions for Quantitative Language Comparison
Extension of the functionality of the 'Matrix' package for using sparse matrices. Some of the functions are very general, while others are highly specific to the special data formats used for quantitative language comparison.
Maintained by Michael Cysouw. Last updated 9 months ago.
13.7 match 6 stars 6.98 score 256 scripts 1 dependent
boost-r
mboost:Model-Based Boosting
Functional gradient descent algorithm (boosting) for optimizing general risk functions utilizing component-wise (penalised) least squares estimates or regression trees as base-learners for fitting generalized linear, additive and interaction models to potentially high-dimensional data. Models and algorithms are described in <doi:10.1214/07-STS242>; a hands-on tutorial is available from <doi:10.1007/s00180-012-0382-5>. The package allows user-specified loss functions and base-learners.
Maintained by Torsten Hothorn. Last updated 4 months ago.
boosting-algorithms, gam, glm, machine-learning, mboost, modelling, r-language, tutorials, variable-selection, openblas
7.5 match 72 stars 12.70 score 540 scripts 27 dependents
taylor-arnold
cleanNLP:A Tidy Data Model for Natural Language Processing
Provides a set of fast tools for converting a textual corpus into a set of normalized tables. Users may make use of the 'udpipe' back end with no external dependencies, or a Python back end with 'spaCy' <https://spacy.io>. Exposed annotation tasks include tokenization, part of speech tagging, named entity recognition, and dependency parsing.
Maintained by Taylor B. Arnold. Last updated 10 months ago.
corenlp, natural-language-processing, spacy
11.3 match 214 stars 8.39 score 229 scripts
miraisolutions
XLConnect:Excel Connector for R
Provides comprehensive functionality to read, write and format Excel data.
Maintained by Martin Studer. Last updated 16 days ago.
cross-platform, excel, r-language, xlconnect, openjdk
7.5 match 130 stars 12.28 score 1.2k scripts 1 dependent
dselivanov
text2vec:Modern Text Mining Framework for R
Fast and memory-friendly tools for text vectorization, topic modeling (LDA, LSA), word embeddings (GloVe), similarities. This package provides a source-agnostic streaming API, which allows researchers to perform analysis of collections of documents which are larger than available RAM. All core functions are parallelized to benefit from multicore machines.
Maintained by Dmitriy Selivanov. Last updated 7 months ago.
glove, latent-dirichlet-allocation, natural-language-processing, text-mining, topic-modeling, vectorization, word-embeddings, word2vec, cpp
6.7 match 860 stars 13.48 score 1.3k scripts 23 dependents
glottospace
glottospace:Language Mapping and Geospatial Analysis of Linguistic and Cultural Data
Streamlined workflows for geolinguistic analysis, including: accessing global linguistic and cultural databases, data import, data entry, data cleaning, data exploration, mapping, visualization and export.
Maintained by Rui Dong. Last updated 3 months ago.
15.5 match 23 stars 5.54 score 6 scripts
usepa
elevatr:Access Elevation Data from Various APIs
Several web services are available that provide access to elevation data. This package provides access to many of those services and returns elevation data either as an 'sf' simple features object from point elevation services or as a 'raster' object from raster elevation services. In future versions, 'elevatr' will drop support for 'raster' and will instead return 'terra' objects. Currently, the package supports access to the Amazon Web Services Terrain Tiles <https://registry.opendata.aws/terrain-tiles/>, the Open Topography Global Datasets API <https://opentopography.org/developers/>, and the USGS Elevation Point Query Service <https://apps.nationalmap.gov/epqs/>.
Maintained by Jeffrey Hollister. Last updated 6 months ago.
digital-elevation-model, elevation-data, elevatr, epa, mapzen-elevation-service, r-language
7.5 match 206 stars 11.11 score 1.3k scripts 3 dependents
kumes
deepRstudio:Seamless Language Translation in 'RStudio' using 'DeepL' API and 'Rstudioapi'
Enhancing cross-language compatibility within the 'RStudio' environment and supporting seamless language understanding, the 'deepRstudio' package leverages the power of the 'DeepL' API (see <https://www.deepl.com/docs-api>) to enable seamless, fast, accurate, and affordable translation of code comments, documents, and text. This package offers the ability to translate selected text into English (EN), as well as from English into various languages, namely Japanese (JA), Chinese (ZH), Spanish (ES), French (FR), Russian (RU), Portuguese (PT), and Indonesian (ID). With much of the text being written in English, the emphasis is on compatibility from English. It is also designed for developers working on multilingual projects and data analysts collaborating with international teams, simplifying the translation process and making code more accessible and comprehensible to people with diverse language backgrounds. This package uses the 'rstudioapi' package and 'DeepL' API, and is simply implemented, executed from addins or via shortcuts on 'RStudio'. With just a few steps, content can be translated between supported languages, promoting better collaboration and expanding the global reach of work. The functionality of this package works only on 'RStudio' using 'rstudioapi'.
Maintained by Satoshi Kume. Last updated 1 year ago.
deepl, deeprstudio, language-translation, rstudio, rstudioapi, seamless, seamless-language, translation
23.3 match 2 stars 3.48 score 4 scripts 1 dependent
edubruell
tidyllm:Tidy Integration of Large Language Models
A tidy interface for integrating large language model (LLM) APIs such as 'Claude', 'Openai', 'Groq', 'Mistral' and local models via 'Ollama' into R workflows. The package supports text and media-based interactions, interactive message history, batch request APIs, and a tidy, pipeline-oriented interface for streamlined integration into data workflows. Web services are available at <https://www.anthropic.com>, <https://openai.com>, <https://groq.com>, <https://mistral.ai/> and <https://ollama.com>.
Maintained by Eduard Brüll. Last updated 3 days ago.
10.2 match 68 stars 7.82 score 26 scripts
lightbridge-ks
thaipdf:R Markdown to PDF in Thai Language
Provides R Markdown templates and a LaTeX preamble needed to create PDFs from R Markdown documents in the Thai language.
Maintained by Kittipos Sirivongrungson. Last updated 3 years ago.
latex-template, pdf-document, rmarkdown, thai, thai-language
17.8 match 5 stars 4.40 score 1 script
easystats
effectsize:Indices of Effect Size
Provide utilities to work with indices of effect size for a wide variety of models and hypothesis tests (see list of supported models using the function 'insight::supported_models()'), allowing computation of and conversion between indices such as Cohen's d, r, odds, etc. References: Ben-Shachar et al. (2020) <doi:10.21105/joss.02815>.
Maintained by Mattan S. Ben-Shachar. Last updated 1 month ago.
anova, cohens-d, compute, conversion, correlation, effect-size, effectsize, hacktoberfest, hedges-g, interpretation, standardization, standardized, statistics
4.7 match 344 stars 16.38 score 1.8k scripts 29 dependents
statisticsgreenland
pxmake:Make PX-Files in R
Create PX-files from scratch or read and modify existing ones. Includes a function for every PX keyword, making metadata manipulation simple and human-readable.
Maintained by Johan Ejstrud. Last updated 10 days ago.
11.0 match 9 stars 6.95 score 11 scripts
jozefhajnala
languageserversetup:Automated Setup and Auto Run for R Language Server
Installs the R 'languageserver' with all dependencies into a separate library and uses that independent installation automatically when R is instantiated as a language server process. Useful for making the language server seamless to use without running into package version conflicts.
Maintained by Jozef Hajnala. Last updated 4 years ago.
17.4 match 30 stars 4.32 score 2 scripts
vincentarelbundock
countrycode:Convert Country Names and Country Codes
Standardize country names, convert them into one of 40 different coding schemes, convert between coding schemes, and assign region descriptors.
Maintained by Vincent Arel-Bundock. Last updated 2 months ago.
5.0 match 351 stars 14.80 score 6.3k scripts 119 dependents
tidyverse
purrr:Functional Programming Tools
A complete and consistent functional programming toolkit for R.
Maintained by Hadley Wickham. Last updated 1 month ago.
3.3 match 1.3k stars 22.12 score 59k scripts 6.9k dependents
juliainterop
JuliaCall:Seamless Integration Between R and 'Julia'
Provides an R interface to 'Julia', which is a high-level, high-performance dynamic programming language for numerical computing, see <https://julialang.org/> for more information. It provides a high-level interface as well as a low-level interface. Using the high level interface, you could call any 'Julia' function just like any R function with automatic type conversion. Using the low level interface, you could deal with C-level SEXP directly while enjoying the convenience of using a high-level programming language like 'Julia'.
Maintained by Changcheng Li. Last updated 3 months ago.
6.0 match 270 stars 12.33 score 380 scripts 8 dependents
psychbruce
PsychWordVec:Word Embedding Research Framework for Psychological Science
An integrative toolbox of word embedding research that provides: (1) a collection of 'pre-trained' static word vectors in the '.RData' compressed format <https://psychbruce.github.io/WordVector_RData.pdf>; (2) a series of functions to process, analyze, and visualize word vectors; (3) a range of tests to examine conceptual associations, including the Word Embedding Association Test <doi:10.1126/science.aal4230> and the Relative Norm Distance <doi:10.1073/pnas.1720347115>, with permutation test of significance; (4) a set of training methods to locally train (static) word vectors from text corpora, including 'Word2Vec' <arXiv:1301.3781>, 'GloVe' <doi:10.3115/v1/D14-1162>, and 'FastText' <arXiv:1607.04606>; (5) a group of functions to download 'pre-trained' language models (e.g., 'GPT', 'BERT') and extract contextualized (dynamic) word vectors (based on the R package 'text').
Maintained by Han-Wu-Shuang Bao. Last updated 1 year ago.
bert, cosine-similarity, fasttext, glove, gpt, language-model, natural-language-processing, nlp, pretrained-models, psychology, semantic-analysis, text-analysis, text-mining, tsne, word-embeddings, word-vectors, word2vec, openjdk
18.1 match 22 stars 4.04 score 10 scripts
hofnerb
stabs:Stability Selection with Error Control
Resampling procedures to assess the stability of selected variables with additional finite sample error control for high-dimensional variable selection procedures such as Lasso or boosting. Both standard stability selection (Meinshausen & Buhlmann, 2010, <doi:10.1111/j.1467-9868.2010.00740.x>) and complementary pairs stability selection with improved error bounds (Shah & Samworth, 2013, <doi:10.1111/j.1467-9868.2011.01034.x>) are implemented. The package can be combined with arbitrary user-specified variable selection approaches.
Maintained by Benjamin Hofner. Last updated 4 years ago.
machine-learning, r-language, resampling, stability-selection, variable-importance, variable-selection
7.5 match 26 stars 9.59 score 53 scripts 31 dependents
mlampros
fastText:Efficient Learning of Word Representations and Sentence Classification
An interface to the 'fastText' <https://github.com/facebookresearch/fastText> library for efficient learning of word representations and sentence classification. The 'fastText' algorithm is explained in detail in (i) "Enriching Word Vectors with subword Information", Piotr Bojanowski, Edouard Grave, Armand Joulin, Tomas Mikolov, 2017, <doi:10.1162/tacl_a_00051>; (ii) "Bag of Tricks for Efficient Text Classification", Armand Joulin, Edouard Grave, Piotr Bojanowski, Tomas Mikolov, 2017, <doi:10.18653/v1/e17-2068>; (iii) "FastText.zip: Compressing text classification models", Armand Joulin, Edouard Grave, Piotr Bojanowski, Matthijs Douze, Herve Jegou, Tomas Mikolov, 2016, <arXiv:1612.03651>.
Maintained by Lampros Mouselimis. Last updated 1 year ago.
9.6 match 42 stars 7.37 score 56 scripts
ropensci
rnaturalearth:World Map Data from Natural Earth
Facilitates mapping by making natural earth map data from <https://www.naturalearthdata.com/> more easily available to R users.
Maintained by Philippe Massicotte. Last updated 15 days ago.
4.6 match 232 stars 15.35 score 7.2k scripts 47 dependents
bioc
sevenbridges:Seven Bridges Platform API Client and Common Workflow Language Tool Builder in R
R client and utilities for Seven Bridges platform API, from Cancer Genomics Cloud to other Seven Bridges supported platforms.
Maintained by Phil Webster. Last updated 5 months ago.
software, dataimport, thirdpartyclient, api-client, bioconductor, bioinformatics, cloud, common-workflow-language, sevenbridges
9.4 match 35 stars 7.40 score 24 scripts
stevenmmortimer
salesforcer:An Implementation of 'Salesforce' APIs Using Tidy Principles
Functions connecting to the 'Salesforce' Platform APIs (REST, SOAP, Bulk 1.0, Bulk 2.0, Metadata, Reports and Dashboards) <https://trailhead.salesforce.com/content/learn/modules/api_basics/api_basics_overview>. "API" is an acronym for "application programming interface". Almost all calls from these APIs are supported, as they use CSV, XML or JSON data that can be parsed into R data structures. For more information, documentation, and examples, see the 'Salesforce' API documentation and this package's website <https://stevenmmortimer.github.io/salesforcer/>.
Maintained by Steven M. Mortimer. Last updated 4 months ago.
api-wrappers, r-language, r-programming, salesforce, salesforce-apis
7.5 match 82 stars 9.27 score 191 scripts
stevecondylios
dictionaRy:Retrieve the Dictionary Definitions of English Words
An R interface to the 'Free Dictionary API' <https://dictionaryapi.dev/>, <https://github.com/meetDeveloper/freeDictionaryAPI>. Retrieve dictionary definitions for English words, as well as additional information including phonetics, part of speech, origins, audio pronunciation, example usage, synonyms and antonyms, returned in 'tidy' format for ease of use.
Maintained by Steve Condylios. Last updated 3 years ago.
literature, natural-language-processing, r-language
14.2 match 6 stars 4.86 score 240 scripts
hadley
pryr:Tools for Computing on the Language
Useful tools to pry back the covers of R and understand the language at a deeper level.
Maintained by Hadley Wickham. Last updated 1 year ago.
5.8 match 204 stars 11.85 score 1.9k scripts 56 dependents
tidyverse
glue:Interpreted String Literals
An implementation of interpreted string literals, inspired by Python's Literal String Interpolation <https://www.python.org/dev/peps/pep-0498/> and Docstrings <https://www.python.org/dev/peps/pep-0257/> and Julia's Triple-Quoted String Literals <https://docs.julialang.org/en/v1.3/manual/strings/#Triple-Quoted-String-Literals-1>.
Maintained by Jennifer Bryan. Last updated 5 months ago.
3.1 match 729 stars 21.76 score 57k scripts 14k dependents
rossellhayes
and:Construct Natural-Language Lists with Internationalization
Construct language-aware lists. Make "and"-separated and "or"-separated lists that automatically conform to the user's language settings.
Maintained by Alexander Rossell Hayes. Last updated 17 days ago.
i18n, internationalization, translation
13.5 match 20 stars 5.01 score 6 scripts 2 dependents
trinker
wakefield:Generate Random Data Sets
Generates random data sets including: data.frames, lists, and vectors.
Maintained by Tyler Rinker. Last updated 5 years ago.
9.4 match 256 stars 7.13 score 209 scripts
ropensci
karel:Learning programming with Karel the robot
This is the R implementation of Karel the robot, a programming language created by Dr. R. E. Pattis at Stanford University in 1981. Karel is a useful tool to teach introductory concepts about general programming, such as algorithmic decomposition, conditional statements, loops, etc., in an interactive and fun way, by writing programs to make Karel the robot achieve certain tasks in the world she lives in. Originally based on Pascal, Karel has been implemented in many languages over the decades, including 'Java', 'C++', 'Ruby' and 'Python'. This is the first package implementing Karel in R.
Maintained by Marcos Prunello. Last updated 8 months ago.
9.5 match 10 stars 6.87 score 31 scripts
modeloriented
DALEX:moDel Agnostic Language for Exploration and eXplanation
Any unverified black box model is the path to failure. Opaqueness leads to distrust. Distrust leads to ignoration. Ignoration leads to rejection. The DALEX package x-rays any model and helps to explore and explain its behaviour. Machine Learning (ML) models are widely used and have various applications in classification or regression. Models created with boosting, bagging, stacking or similar techniques are often used due to their high performance. But such black-box models usually lack direct interpretability. The DALEX package contains various methods that help to understand the link between input variables and model output. Implemented methods help to explore the model at the level of a single instance as well as at the level of the whole dataset. All model explainers are model agnostic and can be compared across different models. The DALEX package is the cornerstone of the 'DrWhy.AI' universe of packages for visual model exploration. Find more details in (Biecek 2018) <https://jmlr.org/papers/v19/18-416.html>.
Maintained by Przemyslaw Biecek. Last updated 30 days ago.
black-boxdalexdata-scienceexplainable-aiexplainable-artificial-intelligenceexplainable-mlexplanationsexplanatory-model-analysisfairnessimlinterpretabilityinterpretable-machine-learningmachine-learningmodel-visualizationpredictive-modelingresponsible-airesponsible-mlxai
4.9 match 1.4k stars 13.40 score 876 scripts 21 dependents
pakjiddat
wordpredictor:Develop Text Prediction Models Based on N-Grams
A framework for developing n-gram models for text prediction. It provides data cleaning, data sampling, extracting tokens from text, model generation, model evaluation and word prediction. For information on how n-gram models work we referred to: "Speech and Language Processing" <https://web.archive.org/web/20240919222934/https%3A%2F%2Fweb.stanford.edu%2F~jurafsky%2Fslp3%2F3.pdf>. For optimizing R code and using R6 classes we referred to "Advanced R" <https://adv-r.hadley.nz/r6.html>. For writing R extensions we referred to "R Packages", <https://r-pkgs.org/index.html>.
Maintained by Nadir Latif. Last updated 5 months ago.
n-gram-language-modelsnatural-language-processingr-programming
13.4 match 6 stars 4.78 score 9 scripts
boost-r
gamboostLSS:Boosting Methods for 'GAMLSS'
Boosting models for fitting generalized additive models for location, shape and scale ('GAMLSS') to potentially high dimensional data.
Maintained by Benjamin Hofner. Last updated 19 days ago.
boosting-algorithmsgamboostlssgamlssmachine-learningr-languagevariable-selection
7.5 match 26 stars 8.52 score 163 scripts 1 dependents
bnosac
crfsuite:Conditional Random Fields for Labelling Sequential Data in Natural Language Processing
Wraps the 'CRFsuite' library <https://github.com/chokkan/crfsuite> allowing users to fit a Conditional Random Field model and to apply it on existing data. The focus of the implementation is in the area of Natural Language Processing where this R package allows you to easily build and apply models for named entity recognition, text chunking, part of speech tagging, intent recognition or classification of any category you have in mind. Next to training, a small web application is included in the package to allow you to easily construct training data.
Maintained by Jan Wijffels. Last updated 1 years ago.
chunkingconditional-random-fieldscrfcrfsuitedata-scienceintent-classificationnatural-language-processingnernlpcpp
10.0 match 63 stars 6.34 score 35 scripts
eddelbuettel
RcppTOML:'Rcpp' Bindings to Parser for "Tom's Obvious Markup Language"
The configuration format defined by 'TOML' (which expands to "Tom's Obvious Markup Language") specifies an excellent format (described at <https://toml.io/en/>) suitable for both human editing as well as the common uses of a machine-readable format. This package uses 'Rcpp' to connect the 'toml++' parser written by Mark Gillard to R.
Maintained by Dirk Eddelbuettel. Last updated 7 days ago.
c-plus-plus-11tomltoml-parsertoml-parsingcpp
5.1 match 36 stars 12.32 score 124 scripts 433 dependents
dataobservatory-eu
dataset:Create Data Frames that are Easier to Exchange and Reuse
The aim of the 'dataset' package is to make tidy datasets easier to release, exchange and reuse. It organizes and formats data frame 'R' objects into well-referenced, well-described, interoperable datasets in release- and reuse-ready form.
Maintained by Daniel Antal. Last updated 19 days ago.
7.8 match 15 stars 7.81 score 76 scripts 1 dependents
kurthornik
ISOcodes:Selected ISO Codes
ISO language, territory, currency, script and character codes. Provides ISO 639 language codes, ISO 3166 territory codes, ISO 4217 currency codes, ISO 15924 script codes, and the ISO 8859 character codes as well as the UN M.49 area codes.
Maintained by Kurt Hornik. Last updated 1 years ago.
10.2 match 5.86 score 208 scripts 77 dependents
cran
nlme:Linear and Nonlinear Mixed Effects Models
Fit and compare Gaussian linear and nonlinear mixed-effects models.
Maintained by R Core Team. Last updated 2 months ago.
4.5 match 6 stars 13.00 score 13k scripts 8.7k dependents
spsanderson
TidyDensity:Functions for Tidy Analysis and Generation of Random Data
Makes it easy to generate random numbers based upon the underlying stats distribution functions. All data is returned in a tidy and structured format, making working with the data simple and straightforward. Because the data is returned in a tidy 'tibble', it lends itself to working with the rest of the 'tidyverse'.
Maintained by Steven Sanderson. Last updated 5 months ago.
bootstrapdensitydistributionsggplot2probabilityr-languagesimulationstatisticstibbletidy
7.5 match 34 stars 7.78 score 66 scripts 1 dependents
gongcastro
bvq:Barcelona Vocabulary Questionnaire Database and Helper Functions
Download, clean, and process the Barcelona Vocabulary Questionnaire (BVQ) data. BVQ is a vocabulary inventory developed for assessing the vocabulary of Catalan-Spanish bilingual infants from the Metropolitan Area of Barcelona (Spain). This package includes functions to download the data from formr servers, and return the processed data in multiple formats.
Maintained by Gonzalo Garcia-Castro. Last updated 2 months ago.
bilingualismlanguagepsycholinguisticsvocabulary
13.7 match 1 stars 4.26 score 8 scripts
modeloriented
ingredients:Effects and Importances of Model Ingredients
Collection of tools for assessment of feature importance and feature effects. Key functions are: feature_importance() for assessment of global level feature importance, ceteris_paribus() for calculation of the what-if plots, partial_dependence() for partial dependence plots, conditional_dependence() for conditional dependence plots, accumulated_dependence() for accumulated local effects plots, aggregate_profiles() and cluster_profiles() for aggregation of ceteris paribus profiles, generic print() and plot() for better usability of selected explainers, generic plotD3() for interactive, D3 based explanations, and generic describe() for explanations in natural language. The package 'ingredients' is a part of the 'DrWhy.AI' universe (Biecek 2018) <arXiv:1806.08915>.
Maintained by Przemyslaw Biecek. Last updated 2 years ago.
5.6 match 37 stars 10.38 score 83 scripts 22 dependents
brodieg
oshka:Recursive Quoted Language Expansion
Expands quoted language by recursively replacing any symbol that points to quoted language with the language it points to. The recursive process continues until only symbols that point to non-language objects remain. The resulting quoted language can then be evaluated normally. This differs from the traditional 'quote'/'eval' pattern because it resolves intermediate language objects that would interfere with evaluation.
Maintained by Brodie Gaslam. Last updated 7 years ago.
10.9 match 14 stars 5.15 score 9 scripts
computationalstylistics
tidystopwords:Customisable Stop-Words in 110 Languages
Functions to generate stop-word lists in 110 languages, in a way consistent across all the languages supported. The generated lists are based on the morphological tagset from the Universal Dependencies.
Maintained by Maciej Eder. Last updated 12 months ago.
12.4 match 6 stars 4.48 score 7 scripts
hoxo-m
githubinstall:A Helpful Way to Install R Packages Hosted on GitHub
Provides a helpful way to install packages hosted on GitHub.
Maintained by Koji Makiyama. Last updated 7 years ago.
7.5 match 49 stars 7.35 score 177 scripts
hofnerb
papeR:A Toolbox for Writing Pretty Papers and Reports
A toolbox for writing 'knitr', 'Sweave' or other 'LaTeX'- or 'markdown'-based reports and to prettify the output of various estimated models.
Maintained by Benjamin Hofner. Last updated 4 years ago.
knitrlatexr-languagereportingreproduciblereproducible-researchsweave
7.5 match 30 stars 7.30 score 223 scripts 1 dependents
bnosac
word2vec:Distributed Representations of Words
Learn vector representations of words by continuous bag of words and skip-gram implementations of the 'word2vec' algorithm. The techniques are detailed in the paper "Distributed Representations of Words and Phrases and their Compositionality" by Mikolov et al. (2013), available at <arXiv:1310.4546>.
Maintained by Jan Wijffels. Last updated 1 years ago.
embeddingsnatural-language-processingword2veccpp
6.7 match 70 stars 8.08 score 227 scripts 5 dependents
r-lib
withr:Run Code 'With' Temporarily Modified Global State
A set of functions to run code 'with' safely and temporarily modified global state. Many of these functions were originally a part of the 'devtools' package; this provides a simple package with limited dependencies to provide access to these functions.
Maintained by Lionel Henry. Last updated 18 days ago.
3.0 match 176 stars 17.92 score 1.2k scripts 12k dependents
nlmixr2
rxode2:Facilities for Simulating from ODE-Based Models
Facilities for running simulations from ordinary differential equation ('ODE') models, such as pharmacometrics and other compartmental models. A compilation manager translates the ODE model into C, compiles it, and dynamically loads the object code into R for improved computational efficiency. An event table object facilitates the specification of complex dosing regimens (optional) and sampling schedules. NB: The use of this package requires both C and Fortran compilers, for details on their use with R please see Section 6.3, Appendix A, and Appendix D in the "R Administration and Installation" manual. Also the code is mostly released under GPL. The 'VODE' and 'LSODA' are in the public domain. The information is available in the inst/COPYRIGHTS.
Maintained by Matthew L. Fidler. Last updated 28 days ago.
4.7 match 39 stars 11.16 score 220 scripts 13 dependents
kurthornik
NLP:Natural Language Processing Infrastructure
Basic classes and methods for Natural Language Processing.
Maintained by Kurt Hornik. Last updated 4 months ago.
5.6 match 6 stars 9.37 score 1.0k scripts 127 dependents
oobianom
r2country:Country Data with Names, Capitals, Currencies, Populations, Time, Languages and so on
Obtain information about countries around the globe. Information for names, states, languages, time, capitals, currency and many more. Data sources are 'Wikipedia' <https://www.wikipedia.org>, 'TimeAndDate' <https://www.timeanddate.com> and 'CountryCode' <https://countrycode.org>.
Maintained by Obinna Obianom. Last updated 1 years ago.
14.1 match 1 stars 3.70 score 4 scripts
spsanderson
tidyAML:Automatic Machine Learning with 'tidymodels'
The goal of this package will be to provide a simple interface for automatic machine learning that fits the 'tidymodels' framework. The intention is to work for regression and classification problems with a simple verb framework.
Maintained by Steven Sanderson. Last updated 11 months ago.
automatic-machine-learningautomlclassificationmachine-learningparsnipr-languager-programmingregressiontidytidymodelstidyverse
7.5 match 68 stars 6.87 score 36 scripts 1 dependents
cysouw
qlcVisualize:Visualization for Quantitative Language Comparison
Collection of visualizations as used in quantitative language comparison. Currently implemented are visualisations dealing with nominal data with multiple levels ("level map" and "factor map"), and assistance for making weighted geographical Voronoi-maps ("weighted map").
Maintained by Michael Cysouw. Last updated 6 months ago.
12.6 match 4.03 score 24 scripts
cran
XR:A Structure for Interfaces from R
Support for interfaces from R to other languages, built around a class for evaluators and a combination of functions, classes and methods for communication. Will be used through a specific language interface package. Described in the book "Extending R".
Maintained by John Chambers. Last updated 7 years ago.
16.7 match 2.95 score 3 dependents
bnosac
textrank:Summarize Text by Ranking Sentences and Finding Keywords
The 'textrank' algorithm is an extension of the 'Pagerank' algorithm for text. The algorithm allows you to summarize text by calculating how sentences are related to one another. This is done by looking at overlapping terminology used in sentences in order to set up links between sentences. The resulting sentence network is next plugged into the 'Pagerank' algorithm which identifies the most important sentences in your text and ranks them. In a similar way 'textrank' can also be used to extract keywords. A word network is constructed by checking whether words follow one another. On top of that network the 'Pagerank' algorithm is applied to extract relevant words, after which relevant words that follow one another are combined to get keywords. More information can be found in the paper from Mihalcea, Rada & Tarau, Paul (2004) <https://www.aclweb.org/anthology/W04-3252/>.
Maintained by Jan Wijffels. Last updated 4 years ago.
natural-language-processingnlptextranktextrank-algorithm
6.7 match 77 stars 7.38 score 103 scripts 2 dependents
ropensci
pkgmatch:Find R Packages Matching Either Descriptions or Other R Packages
Find R packages matching either descriptions or other R packages.
Maintained by Mark Padgham. Last updated 30 days ago.
9.4 match 3 stars 5.23 score
neptune-ai
neptune:MLOps Metadata Store - Experiment Tracking and Model Registry for Production Teams
An interface to Neptune. A metadata store for MLOps, built for teams that run a lot of experiments. It gives you a single place to log, store, display, organize, compare, and query all your model-building metadata. Neptune is used for: • Experiment tracking: Log, display, organize, and compare ML experiments in a single place. • Model registry: Version, store, manage, and query trained models, and model building metadata. • Monitoring ML runs live: Record and monitor model training, evaluation, or production runs live. For more information see <https://neptune.ai/>.
Maintained by Rafal Jankowski. Last updated 2 years ago.
comparelanguagelogmanagementmetadatametricsmlopsmodelsmonitoringorganizeparametersstoretrackervisualization
10.0 match 14 stars 4.89 score 16 scripts
brandmaier
ggx:A Natural Language Interface to 'ggplot2'
The 'ggplot2' package is the state-of-the-art toolbox for creating and formatting graphs. However, it is easy to forget how certain formatting commands are named and sometimes users find themselves asking: How do you rotate the x-axis labels again? Or how do you hide the legend...? This package allows users to issue natural language commands related to theme-related styling of plots (colors, font size and such), which then are translated into valid 'ggplot2' commands.
Maintained by Andreas M. Brandmaier. Last updated 2 years ago.
7.1 match 152 stars 6.69 score 16 scripts
hoxo-m
densratio:Density Ratio Estimation
Density ratio estimation. The estimated density ratio function can be used in many applications such as anomaly detection, change-point detection, covariate shift adaptation. The implemented methods are uLSIF (Hido et al. (2011) <doi:10.1007/s10115-010-0283-2>), RuLSIF (Yamada et al. (2011) <doi:10.1162/NECO_a_00442>), and KLIEP (Sugiyama et al. (2007) <doi:10.1007/s10463-008-0197-x>).
Maintained by Koji Makiyama. Last updated 6 years ago.
anomalydetectionmachine-learningmachine-learning-algorithmsmachine-learning-libraryr-languagestatistics
7.5 match 21 stars 6.36 score 36 scripts 2 dependents
bioc
RAIDS:Accurate Inference of Genetic Ancestry from Cancer Sequences
This package implements specialized algorithms that enable genetic ancestry inference from various cancer sequence sources (RNA, Exome and Whole-Genome sequences). This package also implements a simulation algorithm that generates synthetic cancer-derived data. This code and analysis pipeline was designed and developed for the following publication: Belleau, P et al. Genetic Ancestry Inference from Cancer-Derived Molecular Data across Genomic and Transcriptomic Platforms. Cancer Res 1 January 2023; 83 (1): 49–58.
Maintained by Pascal Belleau. Last updated 5 months ago.
geneticssoftwaresequencingwholegenomeprincipalcomponentgeneticvariabilitydimensionreductionbiocviewsancestrycancer-genomicsexome-sequencinggenomicsinferencer-languagerna-seqrna-sequencingwhole-genome-sequencing
7.5 match 5 stars 6.23 score 19 scripts
sigbertklinke
stranslate:Simple Translation Between Different Languages
Message translation is often managed with 'po' files and the 'gettext' programme, but sometimes another solution is needed. In contrast to 'po' files, a more flexible approach is used as in the Fluent <https://projectfluent.org/> project with R Markdown snippets. The key-value approach allows easier handling of the translated messages.
Maintained by Sigbert Klinke. Last updated 1 years ago.
11.1 match 4.18 score 3 scripts 1 dependents
skoval
RISmed:Download Content from NCBI Databases
A set of tools to extract bibliographic content from the National Center for Biotechnology Information (NCBI) databases, including PubMed. The name RISmed is a portmanteau of RIS (for Research Information Systems, a common tag format for bibliographic data) and PubMed.
Maintained by Stephanie Kovalchik. Last updated 3 years ago.
6.7 match 38 stars 6.94 score 252 scripts 3 dependents
epiverse-trace
serofoi:Bayesian Estimation of the Force of Infection from Serological Data
Estimating the force of infection from time-varying, age-varying, or constant serocatalytic models from population-based seroprevalence studies using a Bayesian framework, including data simulation functions enabling the generation of serological surveys based on these models. This tool also provides a flexible prior specification syntax for the force of infection and the seroreversion rate, as well as methods to assess model convergence and comparison criteria along with useful visualisation functions.
Maintained by Zulma M. Cucunubá. Last updated 17 days ago.
antibodiesbayesian-methodsepidemiologyepiverseserological-surveysstan-languagecpp
7.5 match 18 stars 6.17 score 10 scripts
lgnbhl
BFS:Get Data from the Swiss Federal Statistical Office
Search and download data from the Swiss Federal Statistical Office (BFS) APIs <https://www.bfs.admin.ch/>.
Maintained by Felix Luginbuhl. Last updated 3 months ago.
7.1 match 18 stars 6.55 score 17 scripts
cmerow
rangeModelMetadata:Provides Templates for Metadata Files Associated with Species Range Models
Range Modeling Metadata Standards (RMMS) address three challenges: they (i) are designed for convenience to encourage use, (ii) accommodate a wide variety of applications, and (iii) are extensible to allow the community of range modelers to steer it as needed. RMMS are based on a data dictionary that specifies a hierarchical structure to catalog different aspects of the range modeling process. The dictionary balances a constrained, minimalist vocabulary to improve standardization with flexibility for users to provide their own values. Merow et al. (2019) <DOI:10.1111/geb.12993> describe the standards in more detail. Note that users who prefer to use the R package 'ecospat' can obtain it from <https://github.com/ecospat/ecospat>.
Maintained by Cory Merow. Last updated 8 months ago.
ecological-metadata-languageecological-modellingecological-modelsecologyspecies-distribution-modellingspecies-distributions
6.7 match 6 stars 6.90 score 16 scripts 3 dependents
tidyverse
ellmer:Chat with Large Language Models
Chat with large language models from a range of providers including 'Claude' <https://claude.ai>, 'OpenAI' <https://chatgpt.com>, and more. Supports streaming, asynchronous calls, tool calling, and structured data extraction.
Maintained by Hadley Wickham. Last updated 3 days ago.
3.6 match 388 stars 12.58 score 98 scripts 7 dependents
ropensci
tokenizers:Fast, Consistent Tokenization of Natural Language Text
Convert natural language text into tokens. Includes tokenizers for shingled n-grams, skip n-grams, words, word stems, sentences, paragraphs, characters, shingled characters, lines, Penn Treebank, regular expressions, as well as functions for counting characters, words, and sentences, and a function for splitting longer texts into separate documents, each with the same number of words. The tokenizers have a consistent interface, and the package is built on the 'stringi' and 'Rcpp' packages for fast yet correct tokenization in 'UTF-8'.
Maintained by Thomas Charlon. Last updated 12 months ago.
nlppeer-reviewedtext-miningtokenizercpp
3.4 match 186 stars 13.33 score 1.1k scripts 81 dependents
ropensci
targets:Dynamic Function-Oriented 'Make'-Like Declarative Pipelines
Pipeline tools coordinate the pieces of computationally demanding analysis projects. The 'targets' package is a 'Make'-like pipeline tool for statistics and data science in R. The package skips costly runtime for tasks that are already up to date, orchestrates the necessary computation with implicit parallel computing, and abstracts files as R objects. If all the current output matches the current upstream code and data, then the whole pipeline is up to date, and the results are more trustworthy than otherwise. The methodology in this package borrows from GNU 'Make' (2015, ISBN:978-9881443519) and 'drake' (2018, <doi:10.21105/joss.00550>).
Maintained by William Michael Landau. Last updated 11 hours ago.
data-sciencehigh-performance-computingmakepeer-reviewedpipeliner-targetopiareproducibilityreproducible-researchtargetsworkflow
3.0 match 973 stars 15.20 score 4.6k scripts 22 dependents
dcomtois
summarytools:Tools to Quickly and Neatly Summarize Data
Data frame summaries, cross-tabulations, weight-enabled frequency tables and common descriptive (univariate) statistics in concise tables available in a variety of formats (plain ASCII, Markdown and HTML). A good point-of-entry for exploring data, both for experienced and new R users.
Maintained by Dominic Comtois. Last updated 1 days ago.
descriptive-statisticsfrequency-tablehtml-reportmarkdownpanderpandocpandoc-markdownrmarkdownrstudio
3.1 match 526 stars 14.52 score 2.9k scripts 6 dependents
dickoa
robotoolbox:Client for the 'KoboToolbox' API
Suite of utilities for accessing and manipulating data from the 'KoboToolbox' API. 'KoboToolbox' is a robust platform designed for field data collection in various disciplines. This package aims to simplify the process of fetching and handling data from the API. Detailed documentation for the 'KoboToolbox' API can be found at <https://support.kobotoolbox.org/api.html>.
Maintained by Ahmadou Dicko. Last updated 3 months ago.
open-datakobotoolboxodkkpiapidatadataset
7.5 match 5.92 score 48 scripts
cschwem2er
stminsights:A 'Shiny' Application for Inspecting Structural Topic Models
This app enables interactive validation, interpretation and visualization of structural topic models from the 'stm' package by Roberts and others (2014) <doi:10.1111/ajps.12103>. It also includes helper functions for model diagnostics and extracting data from effect estimates.
Maintained by Carsten Schwemmer. Last updated 9 months ago.
natural-language-processingshinytopic-modeling
6.7 match 116 stars 6.69 score 84 scripts
bnosac
ruimtehol:Learn Text 'Embeddings' with 'Starspace'
Wraps the 'StarSpace' library <https://github.com/facebookresearch/StarSpace> allowing users to calculate word, sentence, article, document, webpage, link and entity 'embeddings'. By using the 'embeddings', you can perform text based multi-label classification, find similarities between texts and categories, do collaborative-filtering based recommendation as well as content-based recommendation, find out relations between entities, calculate graph 'embeddings' as well as perform semi-supervised learning and multi-task learning on plain text. The techniques are explained in detail in the paper: 'StarSpace: Embed All The Things!' by Wu et al. (2017), available at <arXiv:1709.03856>.
Maintained by Jan Wijffels. Last updated 1 years ago.
classificationembeddingsnatural-language-processingnlpsimilaritystarspacetext-miningcpp
6.7 match 101 stars 6.65 score 44 scripts
agricolamz
lingglosses:Interlinear Glossed Linguistic Examples and Abbreviation Lists Generation
Helps to render interlinear glossed linguistic examples in html 'rmarkdown' documents and then semi-automatically compiles the list of glosses at the end of the document. It also provides a database of linguistic glosses.
Maintained by George Moroz. Last updated 9 days ago.
glossesglosses-listinterlinear-glosslanguage-documentationlinguisticsrmarkdowntypology
7.5 match 15 stars 5.88 score 167 scripts
correlaid
newsanchor:Client for the News API
Interface to gather news from the 'News API', based on a multilevel query <https://newsapi.org/>. A personal API key is required.
Maintained by Yannik Buhl. Last updated 5 years ago.
6.5 match 36 stars 6.70 score 40 scripts
tdaverse
ripserr:Calculate Persistent Homology with Ripser-Based Engines
Ports the Ripser <https://arxiv.org/abs/1908.02518> and Cubical Ripser <https://arxiv.org/abs/2005.12692> persistent homology calculation engines from C++. Can be used as a rapid calculation tool in topological data analysis pipelines.
Maintained by Raoul Wadhwa. Last updated 2 hours ago.
algebraic-topologycohomologycppcubical-complexpersistent-homologypixelpoint-cloudr-languager-programmingrcpprips-complexripsersimplicial-complexsimplicial-homologytopological-data-analysistopologyvietoris-complexvoxelcpp
7.5 match 7 stars 5.80 score 6 scripts
jbgruber
rollama:Communicate with 'Ollama' to Run Large Language Models Locally
Wraps the 'Ollama' <https://ollama.com> API, which can be used to communicate with generative large language models locally.
Maintained by Johannes B. Gruber. Last updated 1 months ago.
5.2 match 110 stars 8.36 score 52 scripts
gaborcsardi
franc:Detect the Language of Text
Detects the language of text with no external dependencies and with support for 335 languages: all languages spoken by more than one million speakers. 'Franc' is a port of the 'JavaScript' project of the same name, see <https://github.com/wooorm/franc>.
Maintained by Gábor Csárdi. Last updated 3 years ago.
9.9 match 30 stars 4.38 score 16 scripts
ineelhere
clintrialx:Connect and Work with Clinical Trials Data Sources
Are you spending too much time fetching and managing clinical trial data? Struggling with complex queries and bulk data extraction? What if you could simplify this process with just a few lines of code? Introducing 'clintrialx' - Fetch clinical trial data from sources like 'ClinicalTrials.gov' <https://clinicaltrials.gov/> and the 'Clinical Trials Transformation Initiative - Access to Aggregate Content of ClinicalTrials.gov' database <https://aact.ctti-clinicaltrials.org/>, supporting pagination and bulk downloads. Also, you can generate HTML reports based on the data obtained from the sources!
Maintained by Indraneel Chakraborty. Last updated 3 days ago.
aactbioinformaticsclinical-dataclinical-trialsclinicaltrialsgovcttidatadata-managementmedical-informaticsr-languagetrials
7.5 match 15 stars 5.76 score 11 scripts
hoxo-m
magicfor:Magic Functions to Obtain Results from for Loops
Magic functions to obtain results from for loops.
Maintained by Koji Makiyama. Last updated 8 years ago.
7.5 match 20 stars 5.72 score 53 scripts
curso-r
scryr:An Interface to the 'Scryfall' API
A simple, light, and robust interface between R and the 'Scryfall' card data API <https://scryfall.com/docs/api>.
Maintained by Caio Lente. Last updated 3 years ago.
7.0 match 17 stars 6.09 score 18 scripts
r-lib
treesitter.r:'R' Grammar for 'Tree-Sitter'
Provides bindings to an 'R' grammar for 'Tree-sitter', to be used alongside the 'treesitter' package. 'Tree-sitter' builds concrete syntax trees for source files of any language, and can efficiently update those syntax trees as the source file is edited.
Maintained by Davis Vaughan. Last updated 4 months ago.
5.4 match 118 stars 7.81 score 17 scripts 2 dependents
keyatm
keyATM:Keyword Assisted Topic Models
Fits keyword assisted topic models (keyATM) using collapsed Gibbs samplers. The keyATM combines latent Dirichlet allocation (LDA) models with a small number of keywords selected by researchers in order to improve the interpretability and topic classification of the LDA. The keyATM can also incorporate covariates and directly model time trends. The keyATM is proposed in Eshima, Imai, and Sasaki (2024) <doi:10.1111/ajps.12779>.
Maintained by Shusei Eshima. Last updated 11 months ago.
latent-dirichlet-allocationnatural-language-processingpolitical-sciencercpprcppeigensocial-sciencetopic-modelscpp
6.7 match 106 stars 6.30 score 63 scripts
zumbov2
deeplr:Interface to the 'DeepL' Translation API
A wrapper for the 'DeepL' Pro API <https://www.deepl.com/docs-api>, a web service for translating texts between different languages. A DeepL API developer account is required to use the service (see <https://www.deepl.com/pro#developer>).
Maintained by David Zumbach. Last updated 12 months ago.
7.5 match 41 stars 5.57 score 70 scripts
bnosac
BTM:Biterm Topic Models for Short Text
Biterm Topic Models find topics in collections of short texts. It is a word co-occurrence based topic model that learns topics by modeling word-word co-occurrence patterns, which are called biterms. This is in contrast to traditional topic models like Latent Dirichlet Allocation and Probabilistic Latent Semantic Analysis, which are word-document co-occurrence topic models. A biterm consists of two words co-occurring in the same short text window. This context window can for example be a Twitter message, a short answer on a survey, a sentence of a text or a document identifier. The techniques are explained in detail in the paper 'A Biterm Topic Model For Short Text' by Xiaohui Yan, Jiafeng Guo, Yanyan Lan, Xueqi Cheng (2013) <https://github.com/xiaohuiyan/xiaohuiyan.github.io/blob/master/paper/BTM-WWW13.pdf>.
Maintained by Jan Wijffels. Last updated 2 years ago.
biterm-topic-modellingnatural-language-processingtopic-modelingcpp
6.7 match 96 stars 6.25 score 74 scripts
r-lib
tidyselect:Select from a Set of Strings
A backend for the selecting functions of the 'tidyverse'. It makes it easy to implement select-like functions in your own packages in a way that is consistent with other 'tidyverse' interfaces for selection.
Maintained by Lionel Henry. Last updated 3 months ago.
2.3 match 130 stars 18.31 score 1.9k scripts 8.2k dependents
sbg
tidycwl:Tidy Common Workflow Language Tools and Workflows
The Common Workflow Language <https://www.commonwl.org/> is an open standard for describing data analysis workflows. This package takes the raw Common Workflow Language workflows encoded in JSON or 'YAML' and turns the workflow elements into tidy data frames or lists. A graph representation for the workflow can be constructed and visualized with the parsed workflow inputs, outputs, and steps. Users can embed the visualizations in their 'Shiny' applications, and export them as HTML files or static images.
Maintained by Soner Koc. Last updated 10 months ago.
bioinformatics-pipelinecommon-workflow-languagesevenbridgestidyverse
10.4 match 9 stars 3.95 score
henrikbengtsson
port4me:Get the Same, Personal, Free 'TCP' Port over and over
An R implementation of the cross-platform, language-independent "port4me" algorithm (<https://github.com/HenrikBengtsson/port4me>), which (1) finds a free Transmission Control Protocol ('TCP') port in [1024,65535] that the user can open, (2) is designed to work in multi-user environments, (3) gives different users different ports, (4) gives the user the same port over time with high probability, (5) gives different ports for different software tools, and (6) requires no configuration.
Maintained by Henrik Bengtsson. Last updated 1 years ago.
bashclihigh-performance-computinghpcmulti-tenantmulti-userportpypi-packagepythonr-languager-programmingtcputility
8.0 match 13 stars 5.11 score 5 scripts
dwulff
text2sdg:Detecting UN Sustainable Development Goals in Text
The United Nations’ Sustainable Development Goals (SDGs) have become an important guideline for organisations to monitor and plan their contributions to social, economic, and environmental transformations. The 'text2sdg' package is an open-source analysis package that identifies SDGs in text using scientifically developed query systems, opening up the opportunity to monitor any type of text-based data, such as scientific output or corporate publications. For more information regarding the methodology see Meier, Mata & Wulff (2022) <arXiv:2110.05856>.
Maintained by Dominik S. Meier. Last updated 6 months ago.
natural-language-processingsustainabilitysustainable-developmentsustainable-development-goals
6.7 match 18 stars 6.13 score 9 scripts
sjewo
readstata13:Import 'Stata' Data Files
Function to read and write the 'Stata' file format.
Maintained by Sebastian Jeworutzki. Last updated 2 years ago.
3.8 match 41 stars 10.74 score 1.7k scripts 45 dependents
moodymudskipper
inops:Infix Operators for Detection, Subsetting and Replacement
Infix operators to detect, subset, and replace the elements matched by a given condition. The functions have several variants of operator types, including subsets, ranges, regular expressions and others. Implemented operators work on vectors, matrices, and lists.
Maintained by Antoine Fabri. Last updated 5 years ago.
7.5 match 40 stars 5.34 score 11 scripts
friendly
heplots:Visualizing Hypothesis Tests in Multivariate Linear Models
Provides HE plot and other functions for visualizing hypothesis tests in multivariate linear models. HE plots represent sums-of-squares-and-products matrices for linear hypotheses and for error using ellipses (in two dimensions) and ellipsoids (in three dimensions). The related 'candisc' package provides visualizations in a reduced-rank canonical discriminant space when there are more than a few response variables.
Maintained by Michael Friendly. Last updated 7 days ago.
linear-hypothesesmatricesmultivariate-linear-modelsplotrepeated-measure-designsvisualizing-hypothesis-tests
3.4 match 9 stars 11.49 score 1.1k scripts 7 dependents
ropensci
gutenbergr:Download and Process Public Domain Works from Project Gutenberg
Download and process public domain works in the Project Gutenberg collection <https://www.gutenberg.org/>. Includes metadata for all Project Gutenberg works, so that they can be searched and retrieved.
Maintained by Jon Harmon. Last updated 2 months ago.
3.8 match 105 stars 10.50 score 1.1k scripts 1 dependents
r-lib
testthat:Unit Testing for R
Software testing is important, but, in part because it is frustrating and boring, many of us avoid it. 'testthat' is a testing framework for R that is easy to learn and use, and integrates with your existing 'workflow'.
Maintained by Hadley Wickham. Last updated 15 days ago.
1.9 match 900 stars 20.97 score 74k scripts 465 dependents
quanteda
spacyr:Wrapper to the 'spaCy' 'NLP' Library
An R wrapper to the 'Python' 'spaCy' 'NLP' library, from <https://spacy.io>.
Maintained by Kenneth Benoit. Last updated 1 months ago.
extract-entitiesnlpspacyspeech-tagging
3.6 match 253 stars 10.68 score 408 scripts 6 dependents
hauselin
ollamar:'Ollama' Language Models
An interface to easily run local language models with 'Ollama' <https://ollama.com> server and API endpoints (see <https://github.com/ollama/ollama/blob/main/docs/api.md> for details). It lets you run open-source large language models locally on your machine.
Maintained by Hause Lin. Last updated 2 months ago.
4.1 match 84 stars 9.36 score 74 scripts 5 dependents
ropensci
EML:Read and Write Ecological Metadata Language Files
Work with Ecological Metadata Language ('EML') files. 'EML' is a widely used metadata standard in the ecological and environmental sciences, described in Jones et al. (2006), <doi:10.1146/annurev.ecolsys.37.091305.110031>.
Maintained by Carl Boettiger. Last updated 3 years ago.
emleml-metadatametadata-standard
3.4 match 97 stars 11.19 score 378 scripts 7 dependents
bnosac
doc2vec:Distributed Representations of Sentences, Documents and Topics
Learn vector representations of sentences, paragraphs or documents by using the 'Paragraph Vector' algorithms, namely the distributed bag of words ('PV-DBOW') and the distributed memory ('PV-DM') model. The techniques in the package are detailed in the paper "Distributed Representations of Sentences and Documents" by Mikolov et al. (2014), available at <arXiv:1405.4053>. The package also provides an implementation to cluster documents based on these embeddings using a technique called top2vec. Top2vec finds clusters in text documents by combining techniques to embed documents and words with density-based clustering. It does this by embedding documents in the semantic space as defined by the 'doc2vec' algorithm. Next it maps these document embeddings to a lower-dimensional space using the 'Uniform Manifold Approximation and Projection' (UMAP) dimensionality reduction algorithm and finds dense areas in that space using a 'Hierarchical Density-Based Clustering' technique (HDBSCAN). These dense areas are the topic clusters, each of which can be represented by the corresponding topic vector, an aggregate of the document embeddings of the documents that are part of that topic cluster. In the same semantic space, similar words can be found which are representative of the topic. More details can be found in the paper 'Top2Vec: Distributed Representations of Topics' by D. Angelov available at <arXiv:2008.09470>.
Maintained by Jan Wijffels. Last updated 3 years ago.
doc2vecembeddingsnatural-language-processingparagraph2vecword2veccpp
6.7 match 48 stars 5.74 score 23 scripts
bioc
CNVMetrics:Copy Number Variant Metrics
The CNVMetrics package calculates similarity metrics to facilitate copy number variant comparison among samples and/or methods. Similarity metrics can be employed to compare CNV profiles of genetically unrelated samples as well as those with a common genetic background. Some metrics are based on the shared amplified/deleted regions while other metrics rely on the level of amplification/deletion. The data type used as input is a plain text file containing the genomic position of the copy number variations, as well as the status and/or the log2 ratio values. Finally, a visualization tool is provided to explore resulting metrics.
Maintained by Astrid Deschênes. Last updated 5 months ago.
biologicalquestionsoftwarecopynumbervariationcnvcopy-number-variationmetricsr-language
7.5 match 4 stars 5.08 score 8 scripts
pachadotdev
cepiigeodist:CEPII's GeoDist datasets in R
Provides data on countries and their main city or agglomeration and the different distance measures and dummy variables indicating whether two countries are contiguous, share a common language or a colonial relationship. The reference article for these datasets is Mayer and Zignago (2011).
Maintained by Mauricio Vargas. Last updated 2 years ago.
borderscolonizationgeodistancegravitylanguagestrade
10.5 match 3 stars 3.54 score 23 scripts
mlverse
chattr:Interact with Large Language Models in 'RStudio'
Enables user interactivity with large language models ('LLM') inside the 'RStudio' integrated development environment (IDE). The user can interact with the model using the 'shiny' app included in this package, or directly in the 'R' console. It comes with back-ends for 'OpenAI', 'GitHub' 'Copilot', and 'LlamaGPT'.
Maintained by Edgar Ruiz. Last updated 1 months ago.
3.5 match 215 stars 10.55 score 71 scripts 1 dependents
ropensci
pangoling:Access to Large Language Model Predictions
Provides access to word predictability estimates using large language models (LLMs) based on 'transformer' architectures via integration with the 'Hugging Face' ecosystem. The package interfaces with pre-trained neural networks and supports both causal/auto-regressive LLMs (e.g., 'GPT-2'; Radford et al., 2019) and masked/bidirectional LLMs (e.g., 'BERT'; Devlin et al., 2019, <doi:10.48550/arXiv.1810.04805>) to compute the probability of words, phrases, or tokens given their linguistic context. By enabling a straightforward estimation of word predictability, the package facilitates research in psycholinguistics, computational linguistics, and natural language processing (NLP).
Maintained by Bruno Nicenboim. Last updated 3 days ago.
nlppsycholinguisticstransformers
7.5 match 8 stars 4.90 score
ironholds
batman:Convert categorical representations of logicals to actual logicals
Survey systems and other third-party data sources commonly use non-standard representations of logical values when it comes to qualitative data - "Yes", "No" and "N/A", say. batman is a package designed to seamlessly convert these into logicals. It is highly localised, and contains equivalents to boolean values in languages including German, French, Spanish, Italian, Turkish, Chinese and Polish.
Maintained by Oliver Keyes. Last updated 9 years ago.
6.9 match 11 stars 5.28 score 70 scripts
michelnivard
gptstudio:Use Large Language Models Directly in your Development Environment
Large language models are readily accessible via API. This package lowers the barrier to using the API inside of your development environment. For more on the API, see <https://platform.openai.com/docs/introduction>.
Maintained by James Wade. Last updated 5 days ago.
chatgptgpt-3rstudiorstudio-addin
3.4 match 924 stars 10.83 score 43 scripts 1 dependents
doug-friedman
topicdoc:Topic-Specific Diagnostics for LDA and CTM Topic Models
Calculates topic-specific diagnostics (e.g. mean token length, exclusivity) for Latent Dirichlet Allocation and Correlated Topic Models fit using the 'topicmodels' package. For more details, see Chapter 12 in Airoldi et al. (2014, ISBN:9781466504080), pp 262-272 Mimno et al. (2011, ISBN:9781937284114), and Bischof et al. (2014) <arXiv:1206.4631v1>.
Maintained by Doug Friedman. Last updated 3 years ago.
natural-language-processingtext-miningtopic-modelingtopic-modellingtopic-models
6.7 match 25 stars 5.48 score 24 scripts
dmkaplan2000
knitrdata:Data Language Engine for 'knitr' / 'rmarkdown'
Implements a data language engine for incorporating data directly in 'rmarkdown' documents so that they can be made completely standalone.
Maintained by David M. Kaplan. Last updated 3 years ago.
7.5 match 7 stars 4.75 score 16 scripts
gagolews
stringx:Replacements for Base String Functions Powered by 'stringi'
English is the native language for only 5% of the World population. Also, only 17% of us can understand this text. Moreover, the Latin alphabet is the main one for merely 36% of the total. The early computer era, now a very long time ago, was dominated by the US. Due to the proliferation of the internet, smartphones, social media, and other technologies and communication platforms, this is no longer the case. This package replaces base R string functions (such as grep(), tolower(), sprintf(), and strptime()) with ones that fully support the Unicode standards related to natural language and date-time processing. It also fixes some long-standing inconsistencies, and introduces some new, useful features. Thanks to 'ICU' (International Components for Unicode) and 'stringi', they are fast, reliable, and portable across different platforms.
Maintained by Marek Gagolewski. Last updated 2 months ago.
icuicu4cnatural-language-processingnlpregexregexpstring-manipulationstringitexttext-processingunicode
7.4 match 28 stars 4.75 score 1 scripts
rdatatable
data.table:Extension of `data.frame`
Fast aggregation of large data (e.g. 100GB in RAM), fast ordered joins, fast add/modify/delete of columns by group using no copies at all, list columns, friendly and fast character-separated-value read/write. Offers a natural and flexible syntax, for faster development.
Maintained by Tyson Barrett. Last updated 13 hours ago.
1.5 match 3.7k stars 23.53 score 230k scripts 4.6k dependents
appliedstat
rQCC:Robust Quality Control Chart
Constructs various robust quality control charts based on the median or Hodges-Lehmann estimator (location) and the median absolute deviation (MAD) or Shamos estimator (scale). The estimators used for the robust control charts are all unbiased with a sample of finite size. For more details, see Park, Kim and Wang (2022) <doi:10.1080/03610918.2019.1699114>. In addition, using this R package, the conventional quality control charts such as X-bar, S, R, p, np, u, c, g, h, and t charts are also easily constructed. This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2022R1A2C1091319).
Maintained by Chanseok Park. Last updated 1 years ago.
control-chartgoodness-of-fitr-languageweibull
7.5 match 2 stars 4.70 score 3 scripts
r-forge
deSolve:Solvers for Initial Value Problems of Differential Equations ('ODE', 'DAE', 'DDE')
Functions that solve initial value problems of a system of first-order ordinary differential equations ('ODE'), of partial differential equations ('PDE'), of differential algebraic equations ('DAE'), and of delay differential equations. The functions provide an interface to the FORTRAN functions 'lsoda', 'lsodar', 'lsode', 'lsodes' of the 'ODEPACK' collection, to the FORTRAN functions 'dvode', 'zvode' and 'daspk' and a C-implementation of solvers of the 'Runge-Kutta' family with fixed or variable time steps. The package contains routines designed for solving 'ODEs' resulting from 1-D, 2-D and 3-D partial differential equations ('PDE') that have been converted to 'ODEs' by numerical differencing.
Maintained by Thomas Petzoldt. Last updated 1 years ago.
2.9 match 12.33 score 8.0k scripts 427 dependents
bioc
Rcwl:An R interface to the Common Workflow Language
The Common Workflow Language (CWL) is an open standard for development of data analysis workflows that is portable and scalable across different tools and working environments. Rcwl provides a simple way to wrap command line tools and build CWL data analysis pipelines programmatically within R. It increases the ease of usage, development, and maintenance of CWL pipelines.
Maintained by Qiang Hu. Last updated 5 months ago.
softwareworkflowstepimmunooncology
6.4 match 5.52 score 37 scripts 2 dependents
fatelarico
morestopwords:All Stop Words in One Place
A standalone package combining several stop-word lists for 65 languages with a median of 329 stop words per language and over 1,000 entries for English, Breton, Latin, Slovenian, and Ancient Greek! The user automatically gets access to all the unique stop words contained in: the 'StopwordISO' repository; Python's 'Natural Language Toolkit'; the 'Snowball' stop-word list; the R package 'quanteda'; the 'marimo' repository; the 'Perseus' project; and A. Berra's list of stop words for Ancient Greek and Latin.
Maintained by Fabio Ashtar Telarico. Last updated 2 years ago.
13.0 match 2.70 score
ouhscbbmc
REDCapR:Interaction Between R and REDCap
Encapsulates functions to streamline calls from R to the REDCap API. REDCap (Research Electronic Data CAPture) is a web application for building and managing online surveys and databases developed at Vanderbilt University. The Application Programming Interface (API) offers an avenue to access and modify data programmatically, improving the capacity for literate and reproducible programming.
Maintained by Will Beasley. Last updated 2 months ago.
2.8 match 118 stars 12.36 score 438 scripts 6 dependents
jsugarelli
switchcase:A Simple and Flexible Switch-Case Construct for the 'R' Language
Provides a switch-case construct for 'R', as it is known from other programming languages. It allows testing multiple, similar conditions in an efficient, easy-to-read manner, so nested if-else constructs can be avoided. The switch-case construct is designed as an 'R' function that returns values depending on which condition is met and lets the programmer flexibly decide whether or not to leave the switch-case construct after a case block has been executed.
Maintained by Joachim Zuckarelli. Last updated 5 years ago.
r-langr-languageswitch-case-construct
10.9 match 3 stars 3.18 score 2 scripts
mlverse
mall:Run Multiple Large Language Model Predictions Against a Table, or Vectors
Run multiple 'Large Language Model' predictions against a table. The predictions run row-wise over a specified column. It works using a one-shot prompt, along with the current row's content. The prompt that is used will depend on the type of analysis needed.
Maintained by Edgar Ruiz. Last updated 3 months ago.
data-sciencedplyrllmpolarspython
5.2 match 86 stars 6.61 score 94 scripts
cran
zoomGroupStats:Analyze Text, Audio, and Video from 'Zoom' Meetings
Provides utilities for processing and analyzing the files that are exported from a recorded 'Zoom' Meeting. This includes analyzing data captured through video cameras and microphones, the text-based chat, and meta-data. You can analyze aspects of the conversation among meeting participants and their emotional expressions throughout the meeting.
Maintained by Andrew Knight. Last updated 4 years ago.
10.3 match 3.30 score 10 scripts
docopt
docopt:Command-Line Interface Specification Language
Define a command-line interface by just giving it a description in the specific format.
Maintained by Edwin de Jonge. Last updated 4 years ago.
3.0 match 213 stars 11.29 score 1.5k scripts 19 dependents
stefanieschneider
unstruwwel:Detect and Parse Historic Dates
Automatically converts language-specific verbal information, e.g., "1st half of the 19th century," to its standardized numerical counterparts, e.g., "1801-01-01/1850-12-31." It follows the recommendations of the 'MIDAS' ('Marburger Informations-, Dokumentations- und Administrations-System'), see <doi:10.11588/artdok.00003770>.
Maintained by Stefanie Schneider. Last updated 2 months ago.
8.8 match 7 stars 3.85 score 2 scripts
gadget-framework
gadget3:Globally-Applicable Area Disaggregated General Ecosystem Toolbox V3
A framework to assist creation of marine ecosystem models, generating either 'R' or 'C++' code which can then be optimised using the 'TMB' package and standard 'R' tools. Principally designed to reproduce gadget2 models in 'TMB', but can be extended beyond gadget2's capabilities. Kasper Kristensen, Anders Nielsen, Casper W. Berg, Hans Skaug, Bradley M. Bell (2016) <doi:10.18637/jss.v070.i05> "TMB: Automatic Differentiation and Laplace Approximation.". Begley, J., & Howell, D. (2004) <https://core.ac.uk/download/pdf/225936648.pdf> "An overview of Gadget, the globally applicable area-disaggregated general ecosystem toolbox. ICES.".
Maintained by Jamie Lentin. Last updated 29 days ago.
3.9 match 8 stars 8.69 score 170 scripts
pythonhealthdatascience
treat.sim:Nelson's Treatment Centre Simulation in Simmer
A discrete-event simulation of a simple urgent care treatment centre from Nelson (2013). Implemented in R Simmer. The model is packaged to allow for easy experimentation, summary of results, and implementation in other software such as a Shiny interface.
Maintained by Thomas Monks. Last updated 8 months ago.
computer-simulationdiscrete-event-simulationhealthopen-modellingopen-scienceopen-sourcer-languagereproducible-researchsimmer
7.5 match 2 stars 4.48 score 5 scripts
modeloriented
iBreakDown:Model Agnostic Instance Level Variable Attributions
Model agnostic tool for decomposition of predictions from black boxes. Supports additive attributions and attributions with interactions. The Break Down Table shows contributions of every variable to a final prediction. The Break Down Plot presents variable contributions in a concise graphical way. This package works for classification and regression models. It is an extension of the 'breakDown' package (Staniak and Biecek 2018) <doi:10.32614/RJ-2018-072>, with new and faster strategies for orderings. It supports interactions in explanations and has interactive visuals (implemented with the 'D3.js' library). The underlying methodology is described in the 'iBreakDown' article (Gosiewska and Biecek 2019) <arXiv:1903.11420>. This package is a part of the 'DrWhy.AI' universe (Biecek 2018) <arXiv:1806.08915>.
Maintained by Przemyslaw Biecek. Last updated 1 years ago.
breakdownimlinterpretabilityshapleyxai
3.3 match 84 stars 10.07 score 56 scripts 22 dependents
cynkra
constructive:Display Idiomatic Code to Construct Most R Objects
Prints code that can be used to recreate R objects. In a sense it is similar to 'base::dput()' or 'base::deparse()' but 'constructive' strives to use idiomatic constructors.
Maintained by Antoine Fabri. Last updated 9 hours ago.
3.9 match 137 stars 8.63 score 20 scripts
mikemahoney218
proceduralnames:Several Methods for Procedural Name Generation
A small, dependency-free way to generate random names. Methods provided include the adjective-surname approach of Docker containers ('<https://github.com/moby/moby/blob/master/pkg/namesgenerator/names-generator.go>'), and combinations of common English or Spanish words.
Maintained by Michael Mahoney. Last updated 3 years ago.
7.2 match 7 stars 4.62 score 4 scripts 4 dependents
elgarteo
cnum:Chinese Numerals Processing
Chinese numerals processing in R, such as conversion between Chinese numerals and Arabic numerals as well as detection and extraction of Chinese numerals in character objects and strings. This package supports the casual scale naming system and the respective SI prefix systems used in mainland China and Taiwan: "The State Council's Order on the Unified Implementation of Legal Measurement Units in Our Country", The State Council of the People's Republic of China (1984), and "Names, Definitions and Symbols of the Legal Units of Measurement and the Decimal Multiples and Submultiples", Ministry of Economic Affairs (2019) <https://gazette.nat.gov.tw/egFront/detail.do?metaid=108965>.
Maintained by Elgar Teo. Last updated 2 months ago.
chinese-languagenumeral-systems-conversionstext-miningcpp
9.5 match 6 stars 3.48 score 2 scripts
petersfritz
topiclabels:Automated Topic Labeling with Language Models
Leveraging (large) language models for automatic topic labeling. The main function converts a list of top terms into a label for each topic. Hence, it is complementary to any topic modeling package that produces a list of top terms for each topic. While human judgement is indispensable for topic validation (i.e., inspecting top terms and most representative documents), automatic topic labeling can be a valuable tool for researchers in various scenarios.
Maintained by Jonas Rieger. Last updated 5 months ago.
7.0 match 4 stars 4.73 score 1 scripts
rossellhayes
plu:Dynamically Pluralize Phrases
Converts English phrases to singular or plural form based on the length of an associated vector. Contains helper functions to create natural language lists from vectors and to include the length of a vector in natural language.
Maintained by Alexander Rossell Hayes. Last updated 1 years ago.
8.3 match 6 stars 3.95 score 2 scripts 1 dependents
terrytangyuan
scaffolder:Scaffolding Interfaces to Packages in Other Programming Languages
Comprehensive set of tools for scaffolding R interfaces to modules, classes, functions, and documentation written in other programming languages, such as 'Python'.
Maintained by Yuan Tang. Last updated 2 years ago.
code-generationpythonreticulatescaffolding
5.3 match 27 stars 6.13 score 9 scripts
glin
reactable:Interactive Data Tables for R
Interactive data tables for R, based on the 'React Table' JavaScript library. Provides an HTML widget that can be used in 'R Markdown' or 'Quarto' documents, 'Shiny' applications, or viewed from an R console.
Maintained by Greg Lin. Last updated 2 months ago.
2.3 match 645 stars 14.52 score 3.3k scripts 151 dependents
mohamed-180
gtranslate:Translate Between Different Languages
The goal of this package is to translate between different languages without any Google API authentication, which is a pain and requires paying for a key. This package is free and lightweight.
Maintained by Mohamed El-Desouky. Last updated 2 years ago.
9.9 match 4 stars 3.30 score 7 scripts
jsugarelli
pointr:Working Comfortably with Pointers and Shortcuts to R Objects
R has no built-in pointer functionality. The 'pointr' package fills this gap and lets you create pointers to R objects, including subsets of dataframes. This makes your R code more readable and maintainable.
Maintained by Joachim Zuckarelli. Last updated 4 years ago.
7.5 match 8 stars 4.31 score 17 scripts 1 dependents
vgherard
sbo:Text Prediction via Stupid Back-Off N-Gram Models
Utilities for training and evaluating text predictors based on Stupid Back-Off N-gram models (Brants et al., 2007, <https://www.aclweb.org/anthology/D07-1090/>).
Maintained by Valerio Gherardi. Last updated 4 years ago.
natural-language-processingngram-modelspredictive-textsbocpp
6.7 match 10 stars 4.78 score 12 scripts
tonyfischetti
libbib:Various Utilities for Library Science/Assessment and Cataloging
Provides functions for validating and normalizing bibliographic codes such as ISBN, ISSN, and LCCN. Also includes functions to communicate with the WorldCat API, translate call numbers (Library of Congress and Dewey Decimal) to their subject classifications or subclassifications, and provides various loadable data files such as call number / subject crosswalks and code tables.
Maintained by Tony Fischetti. Last updated 2 years ago.
9.9 match 3.20 score 32 scripts
hadley
lazyeval:Lazy (Non-Standard) Evaluation
An alternative approach to non-standard evaluation using formulas. Provides a full implementation of LISP style 'quasiquotation', making it easier to generate code with other code.
Maintained by Hadley Wickham. Last updated 3 years ago.
2.0 match 131 stars 15.74 score 520 scripts 1.8k dependents
trn000
norMmix:Direct MLE for Multivariate Normal Mixture Distributions
Multivariate Normal (i.e. Gaussian) Mixture Models (S3) Classes. Fitting models to data using 'MLE' (maximum likelihood estimation) for multivariate normal mixtures via smart parametrization using the 'LDL' (Cholesky) decomposition, see McLachlan and Peel (2000, ISBN:9780471006268), Celeux and Govaert (1995) <doi:10.1016/0031-3203(94)00125-6>.
Maintained by Nicolas Trutmann. Last updated 6 months ago.
gaussian-mixture-modelsmaximum-likelihood-estimationr-language
7.5 match 4.18 score 3 scripts
bnosac
sentencepiece:Text Tokenization using Byte Pair Encoding and Unigram Modelling
Unsupervised text tokenizer that performs byte pair encoding and unigram modelling. Wraps the 'sentencepiece' library <https://github.com/google/sentencepiece>, which provides a language-independent tokenizer to split text into words and smaller subword units. The techniques are explained in the paper "SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing" by Taku Kudo and John Richardson (2018) <doi:10.18653/v1/D18-2012>. Also provides straightforward access to pretrained byte pair encoding models and subword embeddings trained on Wikipedia using 'word2vec', as described in "BPEmb: Tokenization-free Pre-trained Subword Embeddings in 275 Languages" by Benjamin Heinzerling and Michael Strube (2018) <http://www.lrec-conf.org/proceedings/lrec2018/pdf/1049.pdf>.
Maintained by Jan Wijffels. Last updated 2 years ago.
bytenatural-language-processingsentencepieceword-segmentationcpp
7.6 match 25 stars 4.10 score 8 scripts
theharmonylab
topics:Creating and Significance Testing Language Features for Visualisation
Implements differential language analysis with statistical tests and offers various language visualization techniques for n-grams and topics. It also supports the 'text' package. For more information, visit <https://r-topics.org/> and <https://www.r-text.org/>.
Maintained by Oscar Kjell. Last updated 5 days ago.
3.7 match 5 stars 8.28 score 22 scripts 2 dependents
nalimilan
SnowballC:Snowball Stemmers Based on the C 'libstemmer' UTF-8 Library
An R interface to the C 'libstemmer' library that implements Porter's word stemming algorithm for collapsing words to a common root to aid comparison of vocabulary. Currently supported languages are Arabic, Basque, Catalan, Danish, Dutch, English, Finnish, French, German, Greek, Hindi, Hungarian, Indonesian, Irish, Italian, Lithuanian, Nepali, Norwegian, Portuguese, Romanian, Russian, Spanish, Swedish, Tamil and Turkish.
Maintained by Milan Bouchet-Valat. Last updated 15 days ago.
2.4 match 27 stars 12.63 score 4.4k scripts 171 dependents
brodieg
vetr:Trust, but Verify
Declarative template-based framework for verifying that objects meet structural requirements, and auto-composing error messages when they do not.
Maintained by Brodie Gaslam. Last updated 9 months ago.
argument-checksinput-validation
4.0 match 79 stars 7.50 score 67 scripts 1 dependents
krashkov
pcSteiner:Convenient Tool for Solving the Prize-Collecting Steiner Tree Problem
The Prize-Collecting Steiner Tree problem asks to find a subgraph connecting a given set of vertices with the most expensive nodes and least expensive edges. Since the problem is NP-hard, no efficient exact algorithm is known. This package provides convenient functionality for obtaining an approximate solution to this problem using the loopy belief propagation algorithm.
Maintained by Aleksei Krasikov. Last updated 5 years ago.
graph-algorithmsr-languagesteiner-treesteiner-tree-problem
7.5 match 2 stars 4.00 score 3 scripts
ip2location
ip2location:Lookup for IP Address Information
Enables the user to find the country, region, district, city, coordinates, zip code, time zone, ISP, domain name, connection type, area code, weather, MCC, MNC, mobile brand name, elevation, usage type, address type, IAB category and ASN that any IP address or hostname originates from. Supports IPv4 and IPv6. Please visit <https://www.ip2location.com> to learn more. You may also want to visit <https://lite.ip2location.com> for free database download. This package requires the 'IP2Location Python' module. At the terminal, please run 'pip install IP2Location' to install the module.
Maintained by Kai Wen Ooi. Last updated 2 years ago.
geolocationgeolocation-informationip-geolocationip-lookupip2locationlookupr-language
7.5 match 10 stars 4.00 score 1 scripts
appliedstat
weibullness:Goodness-of-Fit Test for Weibull Distribution (Weibullness)
Conducts a goodness-of-fit test for the Weibull distribution (referred to as the weibullness test) and furnishes parameter estimations for both the two-parameter and three-parameter Weibull distributions. Notably, the threshold parameter is derived through correlation from the Weibull plot. Additionally, this package conducts goodness-of-fit assessments for the exponential, Gumbel, and inverse Weibull distributions, accompanied by parameter estimations. For more details, see Park (2017) <doi:10.23055/ijietap.2017.24.4.2848>, Park (2018) <doi:10.1155/2018/6056975>, and Park (2023) <doi:10.3390/math11143156>. This work was supported by the National Research Foundation of Korea (NRF) grants funded by the Korea government (MSIT) (No. 2022R1A2C1091319, RS-2023-00242528).
Maintained by Chanseok Park. Last updated 1 year ago.
control-chart goodness-of-fit r-language weibull
7.5 match 2 stars 3.98 score 32 scripts 1 dependent
nashjc
nlsr:Functions for Nonlinear Least Squares Solutions - Updated 2022
Provides tools for working with nonlinear least squares problems. For the estimation of models, it offers more reliable and robust tools than nls(), where the Gauss-Newton method frequently stops with 'singular gradient' messages. This is accomplished by using, where possible, analytic derivatives to compute the matrix of derivatives and a stabilization of the solution of the estimation equations. Tools for approximate or externally supplied derivative matrices are included. Bounds and masks on parameters are handled properly.
Maintained by John C Nash. Last updated 26 days ago.
4.2 match 7.02 score 94 scripts 5 dependents
azure
AzureKusto:Interface to 'Kusto'/'Azure Data Explorer'
An interface to 'Azure Data Explorer', also known as 'Kusto', a fast, distributed data exploration service from Microsoft: <https://azure.microsoft.com/en-us/products/data-explorer/>. Includes 'DBI' and 'dplyr' interfaces, with the latter modelled after the 'dbplyr' package, whereby queries are translated from R into the native 'KQL' query language and executed lazily. On the admin side, the package extends the object framework provided by 'AzureRMR' to support creation and deletion of databases, and management of database principals. Part of the 'AzureR' family of packages.
Maintained by Alex Kyllo. Last updated 1 year ago.
azure azure-data-explorer azure-sdk-r big-data-analytics kusto
5.5 match 18 stars 5.19 score 9 scripts
rossellhayes
nombre:Number Names
Converts numeric vectors to character vectors of English number names. Provides conversion to cardinals, ordinals, numerators, and denominators. Supports negative and non-integer numbers.
Maintained by Alexander Rossell Hayes. Last updated 3 years ago.
7.5 match 13 stars 3.81 score 4 scripts
philferriere
mscstexta4r:R Client for the Microsoft Cognitive Services Text Analytics REST API
R Client for the Microsoft Cognitive Services Text Analytics REST API, including Sentiment Analysis, Topic Detection, Language Detection, and Key Phrase Extraction. An account MUST be registered at the Microsoft Cognitive Services website <https://www.microsoft.com/cognitive-services/> in order to obtain a (free) API key. Without an API key, this package will not work properly.
Maintained by Phil Ferriere. Last updated 9 years ago.
5.4 match 24 stars 5.28 score 16 scripts
cysouw
qlcData:Processing Data for Quantitative Language Comparison
Functionality to read, recode, and transcode data as used in quantitative language comparison, specifically to deal with multilingual orthographic variation (Moran & Cysouw (2018) <doi:10.5281/zenodo.1296780>) and with the recoding of nominal data.
Maintained by Michael Cysouw. Last updated 9 months ago.
5.3 match 3 stars 5.38 score 40 scripts
damoncharlesroberts
genCountR:Interacting with Roberts and Utych's (2019) Gendered Language Dictionary
Allows users to generate a gendered language score according to the gendered language dictionary in Roberts and Utych (2019) <doi:10.1177/1065912919874883>.
Maintained by Damon Roberts. Last updated 8 months ago.
7.1 match 4.00 score 2 scripts
iatgen
tr.iatgen:Translate 'iatgen' Generated QSF Files
Automates translating the instructions of 'iatgen' generated qsf (Qualtrics survey files) to other languages using either officially supported or user-supplied translations (for tutorial see Santos et al., 2023 <doi:10.17504/protocols.io.kxygx34jdg8j/v1>).
Maintained by Michal Kouril. Last updated 3 months ago.
6.8 match 4.18 score 4 scripts
uribo
textlintr:Natural Language Linter Tools for 'R Markdown' and R Code
What the package does (one paragraph).
Maintained by Shinya Uryu. Last updated 2 years ago.
lint natural-language-processing
9.5 match 9 stars 2.95 score 4 scripts
colinfay
proustr:Tools for Natural Language Processing in French
Tools for Natural Language Processing in French and texts from Marcel Proust's collection "A La Recherche Du Temps Perdu". The novels contained in this collection are "Du cote de chez Swann", "A l'ombre des jeunes filles en fleurs", "Le Cote de Guermantes", "Sodome et Gomorrhe I et II", "La Prisonniere", "Albertine disparue", and "Le Temps retrouve".
Maintained by Colin Fay. Last updated 6 years ago.
4.6 match 24 stars 6.10 score 104 scripts
simonpcouch
chores:A Collection of Large Language Model Assistants
Provides a collection of ergonomic large language model assistants designed to help you complete repetitive, hard-to-automate tasks quickly. After selecting some code, press the keyboard shortcut you've chosen to trigger the package app, select an assistant, and watch your chore be carried out. While the package ships with a number of chore helpers for R package development, users can create custom helpers just by writing some instructions in a markdown file.
Maintained by Simon Couch. Last updated 22 days ago.
3.5 match 90 stars 7.91 score 6 scripts
howl-anderson
sdmvspecies:Create Virtual Species for Species Distribution Modelling
A software package that helps users create virtual species for species distribution modelling. It includes several methods to help users create virtual species distribution maps. Those maps can be used for Species Distribution Modelling (SDM) studies. SDM uses environmental data for sites of occurrence of a species to predict all the sites where the environmental conditions are suitable for the species to persist, and where it may be expected to occur.
Maintained by Xiaoquan Kong. Last updated 9 years ago.
r-language species-distribution-modelling virtual-species
7.5 match 1 star 3.70 score 8 scripts
ropensci
ritis:Integrated Taxonomic Information System Client
An interface to the Integrated Taxonomic Information System ('ITIS') (<https://www.itis.gov>). Includes functions to work with the 'ITIS' REST API methods (<https://www.itis.gov/ws_description.html>), as well as the 'Solr' web service (<https://www.itis.gov/solr_documentation.html>).
Maintained by Julia Blum. Last updated 1 month ago.
taxonomy biology nomenclature json api web api-client identifiers species names api-wrapper itis taxize
3.6 match 16 stars 7.72 score 64 scripts 24 dependents
brianweinstein
googlenlp:An Interface to Google's Cloud Natural Language API
Interact with Google's Cloud Natural Language API <https://cloud.google.com/natural-language/> (v1) via R. The API has four main features, all of which are available through this R package: syntax analysis and part-of-speech tagging, entity analysis, sentiment analysis, and language identification.
Maintained by Brian Weinstien. Last updated 7 years ago.
7.2 match 8 stars 3.86 score 18 scripts
ropensci
babelquarto:Renders a Multilingual Quarto Book
Automate rendering and cross-linking of Quarto books following a prescribed structure.
Maintained by Maëlle Salmon. Last updated 1 month ago.
3.7 match 43 stars 7.52 score 23 scripts 1 dependent
discindo
newscatcheR:Programmatically Collect Normalized News from (Almost) Any Website
Programmatically collect normalized news from (almost) any website. An 'R' clone of the <https://github.com/kotartemiy/newscatcher> 'Python' module.
Maintained by Novica Nakov. Last updated 1 year ago.
hacktoberfest news-sites newscatcher rss-feed tidyrss
4.8 match 30 stars 5.65 score 7 scripts
aravind-j
PGRdup:Discover Probable Duplicates in Plant Genetic Resources Collections
Provides functions to aid the identification of probable/possible duplicates in Plant Genetic Resources (PGR) collections using 'passport databases' comprising information records of each constituent sample. These include methods for cleaning the data, creation of a searchable Key Word in Context (KWIC) index of keywords associated with sample records and the identification of nearly identical records with similar information by fuzzy, phonetic and semantic matching of keywords.
Maintained by J. Aravind. Last updated 2 years ago.
double-metaphone double-metaphone-algorithm natural-language-processing pgr plant-genetic-resources record-linkage
6.7 match 1 star 4.06 score 23 scripts
colinfay
languagelayeR:Access the 'languagelayer' API
Improve your text analysis with languagelayer <https://languagelayer.com>, a powerful language detection API.
Maintained by Colin FAY. Last updated 6 years ago.
6.1 match 5 stars 4.40 score 7 scripts
john-harrold
ubiquity:PKPD, PBPK, and Systems Pharmacology Modeling Tools
Complete workflow for the analysis of pharmacokinetic pharmacodynamic (PKPD), physiologically-based pharmacokinetic (PBPK) and systems pharmacology models including: creation of ordinary differential equation-based models, pooled parameter estimation, individual/population based simulations, rule-based simulations for clinical trial design and modeling assays, deployment with a customizable 'Shiny' app, and non-compartmental analysis. System-specific analysis templates can be generated and each element includes integrated reporting with 'PowerPoint' and 'Word'.
Maintained by John Harrold. Last updated 15 days ago.
3.8 match 13 stars 7.14 score 33 scripts
philferriere
mscsweblm4r:R Client for the Microsoft Cognitive Services Web Language Model REST API
R Client for the Microsoft Cognitive Services Web Language Model REST API, including Break Into Words, Calculate Conditional Probability, Calculate Joint Probability, Generate Next Words, and List Available Models. A valid account MUST be registered at the Microsoft Cognitive Services website <https://www.microsoft.com/cognitive-services/> in order to obtain a (free) API key. Without an API key, this package will not work properly.
Maintained by Phil Ferriere. Last updated 9 years ago.
6.7 match 2 stars 4.00 score 9 scripts
makhgal-ganbold
NSO1212:National Statistical Office of Mongolia's Open Data API Handler
National Statistical Office of Mongolia (NSO) is the national statistical service and an organization of the Mongolian government. NSO provides open access to official data via its API <http://opendata.1212.mn/en/doc>. The package NSO1212 has functions for accessing the API service. The functions are compatible with API v2.0 and retrieve data sets and their detailed information from the API.
Maintained by Makhgal Ganbold. Last updated 3 years ago.
7.5 match 7 stars 3.54 score 6 scripts
rpkgs
rcolors:270 'NCL' Color Tables in R Language
'NCL' (NCAR Command Language) is one of the most popular spatial data mapping tools in meteorology studies, due to its beautiful output figures with plenty of color palettes designed by experts <https://www.ncl.ucar.edu/index.shtml>. Here we translate all 'NCL' color palettes into R hexadecimal RGB colors and provide a color selection function to help users make beautiful figures.
Maintained by Dongdong Kong. Last updated 9 months ago.
5.1 match 17 stars 5.14 score 54 scripts
ropensci
tif:Text Interchange Format
Provides validation functions for common interchange formats for representing text data in R. Includes formats for corpus objects, document term matrices, and tokens. Other annotations can be stored by overloading the tokens structure.
Maintained by Taylor B. Arnold. Last updated 1 year ago.
corpus natural-language-processing term-frequency text-processing tokenizer
6.7 match 36 stars 3.94 score 16 scripts
uribo
sudachir:R Interface to 'Sudachi'
Interface to 'Sudachi' <https://github.com/WorksApplications/sudachi.rs>, a Japanese morphological analyzer. This is a port of what is available in Python.
Maintained by Shinya Uryu. Last updated 2 years ago.
7.5 match 6 stars 3.48 score 6 scripts
opendataformat
opendataformat:Reading and Writing Open Data Format Files
The Open Data Format (ODF) is a new, non-proprietary, multilingual, metadata enriched, and zip-compressed data format with metadata structured in the Data Documentation Initiative (DDI) Codebook standard. This package allows reading and writing of data files in the Open Data Format (ODF) in R, and displaying metadata in different languages. For further information on the Open Data Format, see <https://opendataformat.github.io/>.
Maintained by Tom Hartl. Last updated 5 days ago.
4.8 match 5.41 score 7 scripts
moodymudskipper
typed:Support Types for Variables, Arguments, and Return Values
A type system for R. It supports setting variable types in a script or the body of a function, so variables can't be assigned illegal values. Moreover it supports setting argument and return types for functions.
Maintained by Antoine Fabri. Last updated 2 months ago.
3.0 match 169 stars 8.65 score 18 scripts 1 dependent
arnaudgallou
plume:A Simple Author Handler for Scientific Writing
Handles and formats author information in scientific writing in 'R Markdown' and 'Quarto'. 'plume' provides easy-to-use and flexible tools for injecting author metadata in 'YAML' headers as well as generating author and contribution lists (among others) as strings from tabular data.
Maintained by Arnaud Gallou. Last updated 29 days ago.
authors contribution contributions list lists markdown paper preprint quarto role roles
3.8 match 21 stars 6.84 score 15 scripts
eworx-org
labourR:Classify Multilingual Labour Market Free-Text to Standardized Hierarchical Occupations
Allows the user to map multilingual free-text of occupations to a broad range of standardized classifications. The package facilitates automatic occupation coding (see, e.g., Gweon et al. (2017) <doi:10.1515/jos-2017-0006> and Turrell et al. (2019) <doi:10.3386/w25837>), where the ISCO to ESCO mapping is exploited to extend the occupations hierarchy, Le Vrang et al. (2014) <doi:10.1109/mc.2014.283>. Document vectorization is performed using the multilingual ESCO corpus. A method based on the nearest neighbor search is used to suggest the closest ISCO occupation.
Maintained by Alexandros Kouretsis. Last updated 3 years ago.
4.0 match 28 stars 6.29 score 23 scripts 1 dependent
drjphughesjr
hash:Full Featured Implementation of Hash Tables/Associative Arrays/Dictionaries
Implements a data structure similar to hashes in Perl and dictionaries in Python but with a purposefully R flavor. For objects of appreciable size, access using hashes outperforms native named lists and vectors.
Maintained by John Hughes. Last updated 2 years ago.
3.4 match 1 star 7.54 score 4.0k scripts 50 dependents
easystats
report:Automated Reporting of Results and Statistical Models
The aim of the 'report' package is to bridge the gap between R’s output and the formatted results contained in your manuscript. This package converts statistical models and data frames into textual reports suited for publication, ensuring standardization and quality in results reporting.
Maintained by Rémi Thériault. Last updated 1 month ago.
anovas apa automated-report-generation automatic bayesian describe easystats hacktoberfest manuscript models report reporting reports scientific statsmodels
1.8 match 698 stars 14.48 score 1.1k scripts 3 dependents
projectmosaic
mosaicCalc:R-Language Based Calculus Operations for Teaching
Software to support the introductory *MOSAIC Calculus* textbook <https://www.mosaic-web.org/MOSAIC-Calculus/>, one of many data- and modeling-oriented educational resources developed by Project MOSAIC (<https://www.mosaic-web.org/>). Provides symbolic and numerical differentiation and integration, as well as support for applied linear algebra (for data science), and differential equations/dynamics. Includes grammar-of-graphics-based functions for drawing vector fields, trajectories, etc. The software is suitable for general use, but intended mainly for teaching calculus.
Maintained by Daniel Kaplan. Last updated 18 days ago.
2.9 match 13 stars 8.68 score 546 scripts
mclements
ascii:Export R Objects to Several Markup Languages
Coerces R objects to 'asciidoc', 'txt2tags', 'restructuredText', 'org', 'textile' or 'pandoc' syntax. The package comes with a set of drivers for 'Sweave'.
Maintained by Mark Clements. Last updated 1 year ago.
4.7 match 8 stars 5.31 score 161 scripts 2 dependents
melff
memisc:Management of Survey Data and Presentation of Analysis Results
An infrastructure for the management of survey data including value labels, definable missing values, recoding of variables, production of code books, and import of (subsets of) 'SPSS' and 'Stata' files is provided. Further, the package can produce tables and data frames of arbitrary descriptive statistics and (almost) publication-ready tables of regression model estimates, which can be exported to 'LaTeX' and HTML.
Maintained by Martin Elff. Last updated 10 days ago.
2.0 match 46 stars 12.34 score 1.2k scripts 13 dependents
epiverse-trace
numberize:Convert Words to Numbers in Multiple Languages
Converts written out numbers into their equivalent numbers. Supports numbers written out in English, French, or Spanish.
Maintained by Bankole Ahadzie. Last updated 11 days ago.
4.7 match 4 stars 5.28 score 1 script 1 dependent
symengine
symengine:Interface to the 'SymEngine' Library
Provides an R interface to 'SymEngine' <https://github.com/symengine/>, a standalone 'C++' library for fast symbolic manipulation. The package has functionalities for symbolic computation like calculating exact mathematical expressions, solving systems of linear equations and code generation.
Maintained by Jialin Ma. Last updated 1 year ago.
3.0 match 26 stars 8.20 score 33 scripts 10 dependents
giocomai
zoteror:Access the Zotero API in R
zoteror provides tools to access the Zotero API
Maintained by Giorgio Comai. Last updated 5 years ago.
7.5 match 37 stars 3.27 score 5 scripts
bioc
BiocGenerics:S4 generic functions used in Bioconductor
The package defines many S4 generic functions used in Bioconductor.
Maintained by Hervé Pagès. Last updated 1 month ago.
infrastructure bioconductor-package core-package
1.7 match 12 stars 14.22 score 612 scripts 2.2k dependents