Showing 200 of 294 total results.
nflverse
nflreadr:Download 'nflverse' Data
A minimal package for downloading data from 'GitHub' repositories of the 'nflverse' project.
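A minimal usage sketch, assuming an internet connection; the season and the columns shown are arbitrary illustrations:

```r
library(nflreadr)

pbp <- load_pbp(2023)          # play-by-play data for one season
rosters <- load_rosters(2023)  # team rosters for the same season
head(pbp[, c("game_id", "posteam", "play_type")])
```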
Maintained by Tan Ho. Last updated 4 months ago.
nflnflfastrnflversesports-data
142.8 match 66 stars 12.46 score 476 scripts 10 dependents
sfeuerriegel
SentimentAnalysis:Dictionary-Based Sentiment Analysis
Performs a sentiment analysis of textual contents in R. This implementation utilizes various existing dictionaries, such as Harvard IV, or finance-specific dictionaries. Furthermore, it can also create customized dictionaries. The latter uses LASSO regularization as a statistical approach to select relevant terms based on an exogenous response variable.
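A small sketch of dictionary-based scoring; the example sentences are invented:

```r
library(SentimentAnalysis)

docs <- c("The outlook is excellent and profits are growing.",
          "Results were disappointing and losses increased.")
s <- analyzeSentiment(docs)           # scores against several built-in dictionaries
convertToDirection(s$SentimentQDAP)   # positive / neutral / negative per document
```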
Maintained by Nicolas Proellochs. Last updated 2 years ago.
61.2 match 149 stars 8.34 score 242 scripts 1 dependents
alexchristensen
SemNetDictionaries:Dictionaries for the 'SemNetCleaner' Package
Implements dictionaries that can be used in the 'SemNetCleaner' package. Also includes several functions aimed at facilitating the text cleaning analysis in the 'SemNetCleaner' package. This package is designed to integrate and update word lists and dictionaries based on each user's individual needs by allowing users to store and save their own dictionaries. Dictionaries can be added to the 'SemNetDictionaries' package by submitting user-defined dictionaries to <https://github.com/AlexChristensen/SemNetDictionaries>.
Maintained by Alexander P. Christensen. Last updated 3 years ago.
dictionariessemantic-network-analysis
89.4 match 4 stars 5.08 score 3 scripts 2 dependents
quanteda
quanteda:Quantitative Analysis of Textual Data
A fast, flexible, and comprehensive framework for quantitative text analysis in R. Provides functionality for corpus management, creating and manipulating tokens and n-grams, exploring keywords in context, forming and manipulating sparse matrices of documents by features and feature co-occurrences, analyzing keywords, computing feature similarities and distances, applying content dictionaries, applying supervised and unsupervised machine learning, visually representing text and text analyses, and more.
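A minimal sketch of applying a content dictionary; the texts and dictionary entries are made up for illustration:

```r
library(quanteda)

txt <- c(doc1 = "Taxes and inflation keep rising.",
         doc2 = "The new policy improves hospital access.")
dict <- dictionary(list(economy = c("tax*", "inflation", "econom*"),
                        health  = c("health*", "hospital*")))
toks <- tokens(txt, remove_punct = TRUE)
dfm(tokens_lookup(toks, dictionary = dict))   # counts of dictionary keys per document
```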
Maintained by Kenneth Benoit. Last updated 2 months ago.
corpusnatural-language-processingquantedatext-analyticsonetbbcpp
24.4 match 851 stars 16.68 score 5.4k scripts 51 dependents
koheiw
newsmap:Semi-Supervised Model for Geographical Document Classification
Semi-supervised model for geographical document classification (Watanabe 2018) <doi:10.1080/21670811.2017.1293487>. This package currently contains seed dictionaries in English, German, French, Spanish, Italian, Russian, Hebrew, Arabic, Turkish, Japanese and Chinese (Simplified and Traditional).
Maintained by Kohei Watanabe. Last updated 9 months ago.
machine-learningnews-storiesquantedatext-analysis
49.0 match 62 stars 6.05 score 8 scripts
stevecondylios
dictionaRy:Retrieve the Dictionary Definitions of English Words
An R interface to the 'Free Dictionary API' <https://dictionaryapi.dev/>, <https://github.com/meetDeveloper/freeDictionaryAPI>. Retrieve dictionary definitions for English words, as well as additional information including phonetics, part of speech, origins, audio pronunciation, example usage, synonyms and antonyms, returned in 'tidy' format for ease of use.
Maintained by Steve Condylios. Last updated 3 years ago.
literaturenatural-language-processingr-language
55.6 match 6 stars 4.86 score 240 scripts
maelstrom-research
madshapR:Support Technical Processes Following 'Maelstrom Research' Standards
Functions to support rigorous processes in data cleaning, evaluation, and documentation across datasets from different studies based on Maelstrom Research guidelines. The package includes the core functions to evaluate and format the main inputs that define the process, diagnose errors, and summarize and evaluate datasets and their associated data dictionaries. The main outputs are clean datasets and associated metadata, and tabular and visual summary reports. As described in Maelstrom Research guidelines for rigorous retrospective data harmonization (Fortier I et al. (2017) <doi:10.1093/ije/dyw075>).
Maintained by Guillaume Fabre. Last updated 11 months ago.
49.8 match 2 stars 5.40 score 28 scripts 3 dependents
hsvab
odbr:Download Data from Brazil's Origin Destination Surveys
Download data from Brazil's Origin Destination Surveys. The package covers data from household travel surveys, dictionaries of variables, and the spatial geometries of surveys conducted in different years and across various urban areas in Brazil. For some cities, the package will include enhanced versions of the data sets with variables "harmonized" across different years.
Maintained by Haydee Svab. Last updated 1 month ago.
31.1 match 16 stars 5.85 score 11 scripts
obiba
opalr:'Opal' Data Repository Client and 'DataSHIELD' Utils
Data integration Web application for biobanks by 'OBiBa'. 'Opal' is the core database application for biobanks. Participant data, once collected from any data source, must be integrated and stored in a central data repository under a uniform model. 'Opal' is such a central repository. It can import, process, validate, query, analyze, report, and export data. 'Opal' is typically used in a research center to analyze the data acquired at assessment centres. Its ultimate purpose is to achieve seamless data-sharing among biobanks. This 'Opal' client allows the user to interact with 'Opal' web services and to perform operations on the R server side. 'DataSHIELD' administration tools are also provided.
Maintained by Yannick Marcon. Last updated 2 months ago.
20.8 match 3 stars 7.76 score 179 scripts 2 dependents
mlr-org
mlr3:Machine Learning in R - Next Generation
Efficient, object-oriented programming on the building blocks of machine learning. Provides 'R6' objects for tasks, learners, resamplings, and measures. The package is geared towards scalability and larger datasets by supporting parallelization and out-of-memory data-backends like databases. While 'mlr3' focuses on the core computational operations, add-on packages provide additional functionality.
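A short sketch of the core objects on a built-in example task:

```r
library(mlr3)

task <- tsk("iris")                        # example task shipped with mlr3
learner <- lrn("classif.rpart")            # decision-tree learner
rr <- resample(task, learner, rsmp("cv", folds = 3))
rr$aggregate(msr("classif.acc"))           # cross-validated accuracy
```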
Maintained by Marc Becker. Last updated 4 days ago.
classificationdata-sciencemachine-learningmlr3regression
10.5 match 972 stars 14.86 score 2.3k scripts 35 dependents
cran
Rdiagnosislist:Manipulate SNOMED CT Diagnosis Lists
Functions and methods for manipulating 'SNOMED CT' concepts. The package contains functions for loading the 'SNOMED CT' release into a convenient R environment, selecting 'SNOMED CT' concepts using regular expressions, and navigating the 'SNOMED CT' ontology. It provides the 'SNOMEDconcept' S3 class for a vector of 'SNOMED CT' concepts (stored as 64-bit integers) and the 'SNOMEDcodelist' S3 class for a table of concept IDs with descriptions. The package can be used to construct sets of 'SNOMED CT' concepts for research (<doi:10.1093/jamia/ocac158>). For more information about 'SNOMED CT' visit <https://www.snomed.org/>.
Maintained by Anoop D. Shah. Last updated 2 months ago.
40.5 match 1 stars 3.60 score
trinker
lexicon:Lexicons for Text Analysis
A collection of lexical hash tables, dictionaries, and word lists.
Maintained by Tyler Rinker. Last updated 3 years ago.
hashlexiconlookupnames-frequentstopwordstext-dictionariestext-mining
14.9 match 111 stars 8.80 score 224 scripts 25 dependents
bioc
XNAString:Efficient Manipulation of Modified Oligonucleotide Sequences
The XNAString package allows for description of base sequences and associated chemical modifications in a single object. XNAString is able to capture single stranded, as well as double stranded molecules. Chemical modifications are represented as independent strings associated with different features of the molecules (base sequence, sugar sequence, backbone sequence, modifications) and can be read or written to a HELM notation. It also enables secondary structure prediction using RNAfold from ViennaRNA. XNAString is designed to be an efficient representation of nucleic-acid based therapeutics, and therefore stores information about target sequences and provides an interface for matching and alignment functions from the Biostrings and pwalign packages.
Maintained by Marianna Plucinska. Last updated 5 months ago.
sequencematchingalignmentsequencinggeneticscpp
30.7 match 4.18 score 4 scripts
kwb-r
kwb.utils:General Utility Functions Developed at KWB
This package contains some small helper functions that aim at improving the quality of code developed at Kompetenzzentrum Wasser gGmbH (KWB).
Maintained by Hauke Sonnenberg. Last updated 12 months ago.
17.3 match 8 stars 7.33 score 12 scripts 78 dependents
cjvanlissa
tidySEM:Tidy Structural Equation Modeling
A tidy workflow for generating, estimating, reporting, and plotting structural equation models using 'lavaan', 'OpenMx', or 'Mplus'. Throughout this workflow, elements of syntax, results, and graphs are represented as 'tidy' data, making them easy to customize. Includes functionality to estimate latent class analyses, and to plot 'dagitty' and 'igraph' objects.
Maintained by Caspar J. van Lissa. Last updated 7 days ago.
11.6 match 58 stars 10.69 score 330 scripts 1 dependents
nflverse
nflseedR:Functions to Efficiently Simulate and Evaluate NFL Seasons
A set of functions to simulate National Football League seasons including the sophisticated tie-breaking procedures.
Maintained by Sebastian Carl. Last updated 5 days ago.
football-simulationnflseason-simulations
18.4 match 23 stars 6.32 score 34 scripts 1 dependents
ropensci
hunspell:High-Performance Stemmer, Tokenizer, and Spell Checker
Low level spell checker and morphological analyzer based on the famous 'hunspell' library <https://hunspell.github.io>. The package can analyze or check individual words as well as parse text, latex, html or xml documents. For a more user-friendly interface use the 'spelling' package which builds on this package to automate checking of files, documentation and vignettes in all common formats.
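A quick sketch using the default 'en_US' dictionary that ships with the package:

```r
library(hunspell)

words <- c("beautiful", "definately", "langauge")
ok <- hunspell_check(words)      # TRUE/FALSE per word
hunspell_suggest(words[!ok])     # suggested corrections for the misspelled words
hunspell_stem("analyzing")       # morphological stems
```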
Maintained by Jeroen Ooms. Last updated 5 months ago.
hunspellspell-checkspellcheckerstemmertokenizercpp
8.8 match 111 stars 13.13 score 422 scripts 30 dependents
coolbutuseless
zstdlite:Fast Compression and Serialization with 'Zstandard' Algorithm
Fast, compressed serialization of R objects using the 'Zstandard' algorithm. The included zstandard connection ('zstdfile()') can be used to read/write compressed data by any code which supports R's built-in 'connections' mechanism. Dictionaries are supported for more effective compression of small data, and functions are provided for training these dictionaries. This implementation provides an R interface to advanced features of the 'Zstandard' 'C' library (available from <https://github.com/facebook/zstd>).
Maintained by Mike Cheng. Last updated 2 months ago.
23.4 match 30 stars 4.95 score 7 scripts
rstudio
reticulate:Interface to 'Python'
Interface to 'Python' modules, classes, and functions. When calling into 'Python', R data types are automatically converted to their equivalent 'Python' types. When values are returned from 'Python' to R they are converted back to R types. Compatible with all versions of 'Python' >= 2.7.
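A minimal sketch, assuming a Python installation that 'reticulate' can discover:

```r
library(reticulate)

py_run_string("d = {'a': 1, 'b': 2}")  # run Python code; 'd' is a Python dict
py$d                                   # returned to R as a named list
r_to_py(list(x = 1:3))                 # R object converted to its Python equivalent
```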
Maintained by Tomasz Kalinowski. Last updated 1 day ago.
5.3 match 1.7k stars 21.07 score 18k scripts 427 dependents
sylvainloiseau
interlineaR:Importing Interlinearized Corpora and Dictionaries as Produced by Descriptive Linguistics Software
Interlinearized glossed texts (IGT) are used in descriptive linguistics for representing a morphological analysis of a text through a morpheme-by-morpheme gloss. 'InterlineaR' provides a set of functions that targets several popular formats of IGT ('SIL Toolbox', 'EMELD XML') and that turns an IGT into a set of data frames following a relational model (the tables represent the different linguistic units: texts, sentences, words, morphemes). The same pieces of software ('SIL FLEX', 'SIL Toolbox') typically produce dictionaries of the morphemes used in the glosses. 'InterlineaR' provides a function for turning the LIFT XML dictionary format into a set of data frames following a relational model in order to represent the dictionary entries, the sense(s) attached to the entries, the example(s) attached to senses, etc.
Maintained by Sylvain Loiseau. Last updated 7 years ago.
corpus-linguisticsdescriptive-linguisticsdictionariesinterlinear-gloss
21.3 match 4 stars 4.60 score 9 scripts
apache
arrow:Integration to 'Apache' 'Arrow'
'Apache' 'Arrow' <https://arrow.apache.org/> is a cross-language development platform for in-memory data. It specifies a standardized language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern hardware. This package provides an interface to the 'Arrow C++' library.
Maintained by Jonathan Keane. Last updated 1 month ago.
5.0 match 15k stars 19.22 score 10k scripts 81 dependents
r4epi
epidict:Epidemiology data dictionaries and random data generators
The 'R4EPIs' project <https://R4epis.netlify.com> seeks to provide a set of standardized tools for analysis of outbreak and survey data in humanitarian aid settings. This package currently provides standardized data dictionaries from MSF OCA for four outbreak scenarios (Acute Jaundice Syndrome, Cholera, Measles, Meningitis) and three surveys (Retrospective mortality and access to care, Malnutrition, and Vaccination coverage). In addition, a data generator from these dictionaries is provided.
Maintained by Alexander Spina. Last updated 10 days ago.
21.2 match 3 stars 4.43 score 5 scripts 1 dependents
mlr-org
mlr3misc:Helper Functions for 'mlr3'
Frequently used helper functions and assertions used in 'mlr3' and its companion packages. Comes with helper functions for functional programming, for printing, to work with 'data.table', as well as some generally useful 'R6' classes. This package also supersedes the package 'BBmisc'.
Maintained by Marc Becker. Last updated 4 months ago.
machine-learningmiscellaneousmlr3
8.8 match 12 stars 10.28 score 302 scripts 42 dependents
rmi-pacta
pacta.multi.loanbook:Run 'PACTA' on Multiple Loan Books Easily
Run Paris Agreement Capital Transition Assessment ('PACTA') analyses on multiple loan books in a structured way. Provides access to standard 'PACTA' metrics and additional 'PACTA'-related metrics for multiple loan books. Results take the form of 'csv' files and plots and are exported to user-specified project paths.
Maintained by Jacob Kastl. Last updated 2 days ago.
climate-changepactapactaversesustainable-finance
13.7 match 6.48 score 4 scripts
cvxgrp
CVXR:Disciplined Convex Optimization
An object-oriented modeling language for disciplined convex programming (DCP) as described in Fu, Narasimhan, and Boyd (2020, <doi:10.18637/jss.v094.i14>). It allows the user to formulate convex optimization problems in a natural way following mathematical convention and DCP rules. The system analyzes the problem, verifies its convexity, converts it into a canonical form, and hands it off to an appropriate solver to obtain the solution. Interfaces to solvers on CRAN and elsewhere are provided, both commercial and open source.
Maintained by Anqi Fu. Last updated 4 months ago.
6.8 match 207 stars 12.89 score 768 scripts 51 dependents
chris31415926535
tardis:Text Analysis with Rules and Dictionaries for Inferring Sentiment
Measure text's sentiment with dictionaries and simple rules covering negations and modifiers. User-supplied dictionaries are supported, including Unicode emojis and multi-word tokens, so this package can also be used to study constructs beyond sentiment.
Maintained by Christopher Belanger. Last updated 2 years ago.
21.6 match 2 stars 4.00 score 10 scripts
dbosak01
libr:Libraries, Data Dictionaries, and a Data Step for R
Contains a set of functions to create data libraries, generate data dictionaries, and simulate a data step. The libname() function will load a directory of data into a library in one line of code. The dictionary() function will generate data dictionaries for individual data frames or an entire library. The datastep() function will perform row-by-row data processing.
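A brief sketch of the three functions named above; "data_dir" is a placeholder path and the arguments are simplified:

```r
library(libr)

libname(dat, "data_dir")          # "data_dir" is a placeholder folder of data files
d <- dictionary(mtcars)           # data dictionary for a single data frame
d2 <- datastep(mtcars, {          # row-by-row processing in data-step style
  kml <- mpg * 0.425
})
```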
Maintained by David Bosak. Last updated 3 months ago.
10.4 match 27 stars 8.27 score 48 scripts 2 dependents
dmrodz
dataMeta:Create and Append a Data Dictionary for an R Dataset
Designed to create a basic data dictionary and append to the original dataset's attributes list. The package makes use of a tidy dataset and creates a data frame that will serve as a linker that will aid in building the dictionary. The dictionary is then appended to the list of the original dataset's attributes. The user will have the option of entering variable and item descriptions by writing code or by using alternate functions that will prompt the user to add these.
Maintained by Dania M. Rodriguez. Last updated 3 years ago.
14.5 match 23 stars 5.54 score 15 scripts
rstudio
pointblank:Data Validation and Organization of Metadata for Local and Remote Tables
Validate data in data frames, 'tibble' objects, 'Spark' 'DataFrames', and database tables. Validation pipelines can be made using easily-readable, consecutive validation steps. Upon execution of the validation plan, several reporting options are available. User-defined thresholds for failure rates allow for the determination of appropriate reporting actions. Many other workflows are available including an information management workflow, where the aim is to record, collect, and generate useful information on data tables.
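A compact sketch of a validation pipeline on the 'small_table' dataset bundled with the package:

```r
library(pointblank)

agent <- create_agent(tbl = small_table) |>
  col_vals_gt(vars(d), value = 100) |>    # numeric column should exceed 100
  col_is_character(vars(b)) |>            # column should be character
  interrogate()
agent                                      # prints the validation report
```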
Maintained by Richard Iannone. Last updated 9 days ago.
data-assertionsdata-checkerdata-dictionariesdata-framesdata-inferencedata-managementdata-profilerdata-qualitydata-validationdata-verificationdatabase-tableseasy-to-understandreporting-toolschema-validationtesting-toolsyaml-configuration
7.5 match 932 stars 10.59 score 284 scripts
a-maldet
labelmachine:Make Labeling of R Data Sets Easy
Assign meaningful labels to data frame columns. 'labelmachine' manages your label assignment rules in 'yaml' files and makes it easy to use the same labels in multiple projects.
Maintained by Adrian Maldet. Last updated 5 years ago.
13.7 match 7 stars 5.26 score 13 scripts
larmarange
labelled:Manipulating Labelled Data
Work with labelled data imported from 'SPSS' or 'Stata' with 'haven' or 'foreign'. This package provides useful functions to deal with "haven_labelled" and "haven_labelled_spss" classes introduced by 'haven' package.
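A small sketch of working with value labels:

```r
library(labelled)

x <- labelled(c(1, 2, 2, 9),
              labels = c(Male = 1, Female = 2, Refused = 9))
val_labels(x)   # inspect the value labels
to_factor(x)    # convert to a regular factor using the labels
```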
Maintained by Joseph Larmarange. Last updated 26 days ago.
havenlabelsmetadatasasspssstata
4.6 match 76 stars 15.02 score 2.4k scripts 96 dependents
oobianom
r2dictionary:A Mini-Dictionary for 'Shiny' and 'Rmarkdown' Documents
Despite the predominant use of R for data manipulation and various robust statistical calculations, in recent years, more people from various disciplines are beginning to use R for other purposes. A critical milestone that has enabled a large influx of users in the R community is the development of the Tidyverse family of packages and Rmarkdown. With the latter, one can write all kinds of documents and produce output in formats such as html and pdf very easily. In doing this seamlessly, further tools are needed for such users to easily and freely write in R for all kinds of purposes. The r2dictionary package introduces a means for users to directly search for definitions of terms within the R environment.
Maintained by Obinna Obianom. Last updated 2 years ago.
17.0 match 2 stars 4.00 score 3 scripts
epicentre-msf
dbc:Dictionary-Based Cleaning
Tools for dictionary-based data cleaning.
Maintained by Patrick Barks. Last updated 1 year ago.
27.4 match 2 stars 2.48 score 4 scripts 1 dependents
vgherard
sbo:Text Prediction via Stupid Back-Off N-Gram Models
Utilities for training and evaluating text predictors based on Stupid Back-Off N-gram models (Brants et al., 2007, <https://www.aclweb.org/anthology/D07-1090/>).
Maintained by Valerio Gherardi. Last updated 4 years ago.
natural-language-processingngram-modelspredictive-textsbocpp
13.8 match 10 stars 4.78 score 12 scripts
bioc
transcriptogramer:Transcriptional analysis based on transcriptograms
R package for transcriptional analysis based on transcriptograms, a method to analyze transcriptomes that projects expression values on a set of ordered proteins, arranged such that the probability that gene products participate in the same metabolic pathway exponentially decreases with the increase of the distance between two proteins of the ordering. Transcriptograms are, hence, genome wide gene expression profiles that provide a global view for the cellular metabolism, while indicating gene sets whose expressions are altered.
Maintained by Diego Morais. Last updated 5 months ago.
softwarenetworkvisualizationsystemsbiologygeneexpressiongenesetenrichmentgraphandnetworkclusteringdifferentialexpressionmicroarrayrnaseqtranscriptionimmunooncology
13.5 match 4 stars 4.81 score 9 scripts
mjockers
syuzhet:Extracts Sentiment and Sentiment-Derived Plot Arcs from Text
Extracts sentiment and sentiment-derived plot arcs from text using a variety of sentiment dictionaries conveniently packaged for consumption by R users. Implemented dictionaries include "syuzhet" (default) developed in the Nebraska Literary Lab, "afinn" developed by Finn Årup Nielsen, "bing" developed by Minqing Hu and Bing Liu, and "nrc" developed by Mohammad, Saif M. and Turney, Peter D. Applicable references are available in README.md and in the documentation for the "get_sentiment" function. The package also provides a hack for implementing Stanford's coreNLP sentiment parser. The package provides several methods for plot arc normalization.
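A minimal sketch comparing two of the bundled dictionaries; the example sentences are invented:

```r
library(syuzhet)

txt <- c("I love this wonderful, sunny day.",
         "The ending was bleak and deeply sad.")
get_sentiment(txt, method = "syuzhet")
get_sentiment(txt, method = "afinn")
get_nrc_sentiment(txt)   # emotion categories from the NRC lexicon
```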
Maintained by Matthew Jockers. Last updated 2 years ago.
4.8 match 336 stars 12.92 score 1.4k scripts 31 dependents
qinwf
jiebaR:Chinese Text Segmentation
Chinese text segmentation, keyword extraction and part-of-speech tagging for R.
Maintained by Qin Wenfeng. Last updated 5 years ago.
chinesechinese-text-segmentationcppjiebajiebalexical-analysisnlpcpp
6.0 match 348 stars 10.18 score 456 scripts 6 dependents
kasperwelbers
corpustools:Managing, Querying and Analyzing Tokenized Text
Provides text analysis in R, focusing on the use of a tokenized text format. In this format, the positions of tokens are maintained, and each token can be annotated (e.g., part-of-speech tags, dependency relations). Prominent features include advanced Lucene-like querying for specific tokens or contexts (e.g., documents, sentences), similarity statistics for words and documents, exporting to DTM for compatibility with many text analysis packages, and the possibility to reconstruct original text from tokens to facilitate interpretation.
Maintained by Kasper Welbers. Last updated 6 months ago.
8.1 match 31 stars 7.50 score 174 scripts 1 dependents
revelle
psychTools:Tools to Accompany the 'psych' Package for Psychological Research
Support functions, data sets, and vignettes for the 'psych' package. Contains several of the biggest data sets for the 'psych' package as well as four vignettes. A few helper functions for file manipulation are included as well. For more information, see the <https://personality-project.org/r/> web page.
Maintained by William Revelle. Last updated 12 months ago.
10.3 match 5.89 score 178 scripts 5 dependents
agusnieto77
ACEP:Análisis Computacional de Eventos de Protesta
The 'ACEP' library contains specific functions to perform computational analysis of protest events. It also contains a database with collections of notes on protests and dictionaries of conflict-related words; the collection brings together dictionaries from different sources.
Maintained by Agustín Nieto. Last updated 1 year ago.
computer-aided-detectionconflict-analysisconflict-detectiondictionariesnlp-keywords-extractionprotest-eventstext-miningvisualization
10.9 match 10 stars 5.48 score 9 scripts
wa-department-of-agriculture
soils:Visualize and Report Soil Health Data
Collection of soil health data visualization and reporting tools, including an RStudio project template with everything you need to generate custom HTML and Microsoft Word reports for each participant in your soil health sampling project.
Maintained by Jadey N Ryan. Last updated 1 month ago.
10.0 match 11 stars 5.74 score 9 scripts
selesnow
rgoogleads:Loading Data from 'Google Ads API'
Interface for loading data from the 'Google Ads API', see <https://developers.google.com/google-ads/api/docs/start>. The package provides functions for authorization and loading reports.
Maintained by Alexey Seleznev. Last updated 2 months ago.
8.5 match 14 stars 6.40 score 15 scripts 1 dependents
lrberge
fixest:Fast Fixed-Effects Estimations
Fast and user-friendly estimation of econometric models with multiple fixed-effects. Includes ordinary least squares (OLS), generalized linear models (GLM) and the negative binomial. The core of the package is based on optimized parallel C++ code, scaling especially well for large data sets. The method to obtain the fixed-effects coefficients is based on Berge (2018) <https://github.com/lrberge/fixest/blob/master/_DOCS/FENmlm_paper.pdf>. Further provides tools to export and view the results of several estimations with intuitive design to cluster the standard-errors.
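A small sketch of a fixed-effects estimation with clustered standard errors; mtcars stands in for a real panel dataset:

```r
library(fixest)

est <- feols(mpg ~ wt + hp | cyl, data = mtcars)  # cyl used as a fixed effect
summary(est, cluster = ~cyl)                      # cluster-robust standard errors
etable(est)                                       # compact results table
```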
Maintained by Laurent Berge. Last updated 7 months ago.
3.7 match 387 stars 14.69 score 3.8k scripts 25 dependents
epiverse-trace
ColOpenData:Download Colombian Demographic, Climate and Geospatial Data
Downloads wrangled Colombian socioeconomic, geospatial, population and climate data from DANE <https://www.dane.gov.co/> (National Administrative Department of Statistics) and IDEAM <https://ideam.gov.co> (Institute of Hydrology, Meteorology and Environmental Studies). It solves the problem of Colombian data being issued in different web pages and sources by using functions that allow the user to select the desired database and download it without having to go through the exhausting acquisition process.
Maintained by Maria Camila Tavera-Cifuentes. Last updated 1 month ago.
climatecolombiadata-packagedemographicsmaps
7.3 match 11 stars 7.44 score 17 scripts
ctn-0094
DOPE:Drug Ontology Parsing Engine
Provides information on drug names (brand, generic and street) for drugs tracked by the DEA. There are functions that will search synonyms and return the drug names and types. The vignettes have extensive information on the work done to create the data for the package.
Maintained by Raymond Balise. Last updated 4 years ago.
6.8 match 21 stars 7.83 score 31 scripts
mlr-org
mlr3pipelines:Preprocessing Operators and Pipelines for 'mlr3'
Dataflow programming toolkit that enriches 'mlr3' with a diverse set of pipelining operators ('PipeOps') that can be composed into graphs. Operations exist for data preprocessing, model fitting, and ensemble learning. Graphs can themselves be treated as 'mlr3' 'Learners' and can therefore be resampled, benchmarked, and tuned.
Maintained by Martin Binder. Last updated 8 days ago.
baggingdata-sciencedataflow-programmingensemble-learningmachine-learningmlr3pipelinespreprocessingstacking
4.3 match 141 stars 12.36 score 448 scripts 7 dependents
pzhaonet
pinyin:Convert Chinese Characters into Pinyin, Sijiao, Wubi or Other Codes
Convert Chinese characters into Pinyin (the official romanization system for Standard Chinese in mainland China, Malaysia, Singapore, and Taiwan. See <https://en.wikipedia.org/wiki/Pinyin> for details), Sijiao (four or five numerical digits per character. See <https://en.wikipedia.org/wiki/Four-Corner_Method>.), Wubi (an input method with five strokes. See <https://en.wikipedia.org/wiki/Wubi_method>) or user-defined codes.
Maintained by Peng Zhao. Last updated 5 years ago.
bookdownchinese-characterspinyin
9.1 match 49 stars 5.71 score 35 scripts 1 dependents
mlr-org
mlr3mbo:Flexible Bayesian Optimization
A modern and flexible approach to Bayesian Optimization / Model Based Optimization building on the 'bbotk' package. 'mlr3mbo' is a toolbox providing both ready-to-use optimization algorithms as well as their fundamental building blocks allowing for straightforward implementation of custom algorithms. Single- and multi-objective optimization is supported as well as mixed continuous, categorical and conditional search spaces. Moreover, using 'mlr3mbo' for hyperparameter optimization of machine learning models within the 'mlr3' ecosystem is straightforward via 'mlr3tuning'. Examples of ready-to-use optimization algorithms include Efficient Global Optimization by Jones et al. (1998) <doi:10.1023/A:1008306431147>, ParEGO by Knowles (2006) <doi:10.1109/TEVC.2005.851274> and SMS-EGO by Ponweiser et al. (2008) <doi:10.1007/978-3-540-87700-4_78>.
Maintained by Lennart Schneider. Last updated 12 days ago.
automlbayesian-optimizationbbotkblack-box-optimizationgaussian-processhpohyperparameterhyperparameter-optimizationhyperparameter-tuningmachine-learningmlr3model-based-optimizationoptimizationoptimizerrandom-foresttuning
6.0 match 25 stars 8.57 score 120 scripts 3 dependents
trinker
qdapTools:Tools for the 'qdap' Package
A collection of tools associated with the 'qdap' package that may be useful outside of the context of text analysis.
Maintained by Tyler Rinker. Last updated 2 years ago.
7.2 match 16 stars 7.04 score 408 scripts 5 dependents
bioc
MetMashR:Metabolite Mashing with R
A package to merge, filter, sort, organise and otherwise mash together metabolite annotation tables. Metabolite annotations can be imported from multiple sources (software) and combined using workflow steps based on S4 class templates derived from the `struct` package. Other modular workflow steps such as filtering, merging, splitting, normalisation and rest-api queries are included.
Maintained by Gavin Rhys Lloyd. Last updated 5 months ago.
8.8 match 2 stars 5.81 score 5 scripts
drjphughesjr
hash:Full Featured Implementation of Hash Tables/Associative Arrays/Dictionaries
Implements a data structure similar to hashes in Perl and dictionaries in Python but with a purposefully R flavor. For objects of appreciable size, access using hashes outperforms native named lists and vectors.
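A minimal sketch of the basic operations:

```r
library(hash)

h <- hash(keys = c("apple", "pear"), values = c(3, 7))
h[["plum"]] <- 1      # insert
h[["apple"]]          # lookup
keys(h)
has.key("pear", h)
```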
Maintained by John Hughes. Last updated 2 years ago.
6.7 match 1 stars 7.54 score 4.0k scripts 50 dependents
ropensci
gendercoder:Recodes Sex/Gender Descriptions into a Standard Set
Provides functions and dictionaries for recoding of freetext gender responses into more consistent categories.
Maintained by Yaoxiang Li. Last updated 1 month ago.
gender-diversityozunconf18unconf
7.8 match 46 stars 6.36 score 45 scripts
usepa
tcpl:ToxCast Data Analysis Pipeline
The ToxCast Data Analysis Pipeline ('tcpl') is an R package that manages, curve-fits, plots, and stores ToxCast data to populate its linked MySQL database, 'invitrodb'. The package was developed for the chemical screening data curated by the US EPA's Toxicity Forecaster (ToxCast) program, but 'tcpl' can be used to support diverse chemical screening efforts.
Maintained by Jason Brown. Last updated 2 days ago.
5.3 match 36 stars 9.41 score 90 scripts
gesistsa
oolong:Create Validation Tests for Automated Content Analysis
Intended to create standard human-in-the-loop validity tests for typical automated content analysis such as topic modeling and dictionary-based methods. This package offers a standard workflow with functions to prepare, administer and evaluate a human-in-the-loop validity test. This package provides functions for validating topic models using word intrusion, topic intrusion (Chang et al. 2009, <https://papers.nips.cc/paper/3700-reading-tea-leaves-how-humans-interpret-topic-models>) and word set intrusion (Ying et al. 2021) <doi:10.1017/pan.2021.33> tests. This package also provides functions for generating gold-standard data which are useful for validating dictionary-based methods. The default settings of all generated tests match those suggested in Chang et al. (2009) and Song et al. (2020) <doi:10.1080/10584609.2020.1723752>.
Maintained by Chung-hong Chan. Last updated 19 days ago.
textanalysistopicmodelingvalidation
6.5 match 54 stars 7.57 score 23 scripts
randy3k
collections:High Performance Container Data Types
Provides high performance container data types such as queues, stacks, deques, dicts and ordered dicts. Benchmarks <https://randy3k.github.io/collections/articles/benchmark.html> have shown that these containers are asymptotically more efficient than those offered by other packages.
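A quick sketch of a dict and a queue:

```r
library(collections)

d <- dict()
d$set("apple", 1)
d$set("pear", 2)
d$get("apple")
d$keys()

q <- queue()
q$push("first")
q$push("second")
q$pop()    # FIFO: returns "first"
```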
Maintained by Randy Lai. Last updated 2 years ago.
5.3 match 104 stars 9.14 score 215 scripts 27 dependents
miserman
lingmatch:Linguistic Matching and Accommodation
Measure similarity between texts. Offers a variety of processing tools and similarity metrics to facilitate flexible representation of texts and matching. Implements forms of Language Style Matching (Ireland & Pennebaker, 2010) <doi:10.1037/a0020386> and Latent Semantic Analysis (Landauer & Dumais, 1997) <doi:10.1037/0033-295X.104.2.211>.
Maintained by Micah Iserman. Last updated 26 days ago.
9.8 match 11 stars 4.80 score 23 scripts
tidymodels
embed:Extra Recipes for Encoding Predictors
Predictors can be converted to one or more numeric representations using a variety of methods. Effect encodings using simple generalized linear models <doi:10.48550/arXiv.1611.09477> or nonlinear models <doi:10.48550/arXiv.1604.06737> can be used. There are also functions for dimension reduction and other approaches.
Maintained by Emil Hvitfeldt. Last updated 2 months ago.
5.0 match 142 stars 9.35 score 1.1k scripts
epiverse-trace
cleanepi:Clean and Standardize Epidemiological Data
A package for cleaning and standardizing tabular data, tailored specifically for curating epidemiological data. It streamlines various data cleaning tasks that are typically expected when working with datasets in epidemiology. It returns the processed data in the same format, and generates a comprehensive report detailing the outcomes of each cleaning task.
Maintained by Karim Mané. Last updated 2 days ago.
data-cleaningepidemiologyepiverse
6.3 match 9 stars 7.44 score 19 scripts
vubiostat
redcapAPI:Interface to 'REDCap'
Access data stored in 'REDCap' databases using the Application Programming Interface (API). 'REDCap' (Research Electronic Data CAPture; <https://projectredcap.org>, Harris, et al. (2009) <doi:10.1016/j.jbi.2008.08.010>, Harris, et al. (2019) <doi:10.1016/j.jbi.2019.103208>) is a web application for building and managing online surveys and databases developed at Vanderbilt University. The API allows users to access data and project meta data (such as the data dictionary) from the web programmatically. The 'redcapAPI' package facilitates the process of accessing data with options to prepare an analysis-ready data set consistent with the definitions in a database's data dictionary.
Maintained by Shawn Garbett. Last updated 9 days ago.
4.4 match 22 stars 10.47 score 134 scripts 2 dependents
xoopr
dictionar6:R6 Dictionary Interface
Efficient object-oriented R6 dictionary capable of holding objects of any class, including R6. Typed and untyped dictionaries are provided as well as the 'usual' dictionary methods that are available in other OOP languages, for example listing keys, items, values, and methods to get/set these.
Maintained by Raphael Sonabend. Last updated 3 years ago.
11.8 match 4 stars 3.78 score 1 scripts 1 dependents
mlr-org
bbotk:Black-Box Optimization Toolkit
Features highly configurable search spaces via the 'paradox' package and optimizes every user-defined objective function. The package includes several optimization algorithms e.g. Random Search, Iterated Racing, Bayesian Optimization (in 'mlr3mbo') and Hyperband (in 'mlr3hyperband'). bbotk is the base package of 'mlr3tuning', 'mlr3fselect' and 'miesmuschel'.
Maintained by Marc Becker. Last updated 3 months ago.
bbotkblack-box-optimizationdata-sciencehyperparameter-optimizationhyperparameter-tuningmachine-learningmlr3optimization
4.5 match 22 stars 9.87 score 166 scripts 14 dependents
quadrama
DramaAnalysis:Analysis of Dramatic Texts
Analysis of preprocessed dramatic texts, with respect to literary research. The package provides functions to analyze and visualize information about characters, stage directions, the dramatic structure and the text itself. The dramatic texts are expected to be in CSV format, which can be installed from within the package; sample texts are provided. The package and the reasoning behind it are described in Reiter et al. (2017) <doi:10.18420/in2017_119>.
Maintained by Nils Reiter. Last updated 4 years ago.
corpus-linguisticsdigital-humanitiesdramadramatic-textsstatistics
9.0 match 15 stars 4.79 score 41 scripts
ropengov
eurostat:Tools for Eurostat Open Data
Tools to download data from the Eurostat database <https://ec.europa.eu/eurostat> together with search and manipulation utilities.
Maintained by Leo Lahti. Last updated 27 days ago.
3.8 match 239 stars 11.09 score 892 scripts 5 dependents
lwheinsberg
dbGaPCheckup:dbGaP Checkup
Contains functions that check for formatting of the Subject Phenotype data set and data dictionary as specified by the National Center for Biotechnology Information (NCBI) Database of Genotypes and Phenotypes (dbGaP) <https://www.ncbi.nlm.nih.gov/gap/docs/submissionguide/>.
Maintained by Lacey W. Heinsberg. Last updated 1 year ago.
8.4 match 4 stars 4.86 score 18 scripts
ipeagit
censobr:Download Data from Brazil's Population Census
Easy access to data from Brazil's population censuses. The package provides a simple and efficient way to download and read the data sets and the documentation of all the population censuses taken in and after 1960 in the country. The package is built on top of the 'Arrow' platform <https://arrow.apache.org/docs/r/>, which allows users to work with larger-than-memory census data using 'dplyr' familiar functions. <https://arrow.apache.org/docs/r/articles/arrow.html#analyzing-arrow-data-with-dplyr>.
Maintained by Rafael H. M. Pereira. Last updated 16 days ago.
brazilcensuscensus-datamicrodadosmicrodata
4.8 match 39 stars 8.38 score 79 scripts
paithiov909
gibasa:An Alternative 'Rcpp' Wrapper of 'MeCab'
A plain 'Rcpp' wrapper for 'MeCab' that can segment Chinese, Japanese, and Korean text into tokens. The main goal of this package is to provide an alternative to 'tidytext' using morphological analysis.
Maintained by Akiru Kato. Last updated 28 days ago.
8.0 match 15 stars 5.02 score 3 scripts
pharmacologie-caen
vigicaen:'VigiBase' Pharmacovigilance Database Toolbox
Perform the analysis of the World Health Organization (WHO) Pharmacovigilance database 'VigiBase' (Extract Case Level version), <https://who-umc.org/> e.g., load data, perform data management, disproportionality analysis, and descriptive statistics. Intended for pharmacovigilance routine use or studies. This package is NOT supported by, nor does it reflect the opinion of, the WHO or the Uppsala Monitoring Centre. Disproportionality methods are described by Norén et al (2013) <doi:10.1177/0962280211403604>.
Maintained by Charles Dolladille. Last updated 3 days ago.
datamanagementpharmacovigilance
6.3 match 1 stars 6.27 score 11 scripts
dgrtwo
fuzzyjoin:Join Tables Together on Inexact Matching
Join tables together based not on whether columns match exactly, but whether they are similar by some comparison. Implementations include string distance and regular expression matching.
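A small sketch of a string-distance join; the misspellings and lookup table are invented:

```r
library(fuzzyjoin)
library(dplyr)

typos <- tibble(word = c("recieve", "seperate"))
lookup <- tibble(correct = c("receive", "separate", "believe"))
stringdist_left_join(typos, lookup, by = c(word = "correct"), max_dist = 2)
```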
Maintained by David Robinson. Last updated 5 years ago.
3.0 match 678 stars 12.92 score 1.5k scripts 20 dependents
celehs
MAP:Multimodal Automated Phenotyping
Electronic health records (EHR) linked with biorepositories are a powerful platform for translational studies. A major bottleneck exists in the ability to phenotype patients accurately and efficiently. Towards that end, we developed an automated high-throughput phenotyping method integrating International Classification of Diseases (ICD) codes and narrative data extracted using natural language processing (NLP). Specifically, our proposed method, called MAP (Map Automated Phenotyping algorithm), fits an ensemble of latent mixture models on aggregated ICD and NLP counts along with healthcare utilization. The MAP algorithm yields a predicted probability of phenotype for each patient and a threshold for classifying subjects with phenotype yes/no (See Katherine P. Liao, et al. (2019) <doi:10.1093/jamia/ocz066>.).
Maintained by Thomas Charlon. Last updated 2 months ago.
4.5 match 6 stars 8.58 score 177 scripts 1 dependents
rolkra
explore:Simplifies Exploratory Data Analysis
Interactive data exploration with one line of code, automated reporting, or an easy-to-remember set of tidy functions for low-code exploratory data analysis.
Maintained by Roland Krasser. Last updated 3 months ago.
data-explorationdata-visualisationdecision-treesedarmarkdownshinytidy
3.3 match 228 stars 11.43 score 221 scripts 1 dependents
rstudio
rstudioapi:Safely Access the RStudio API
Access the RStudio API (if available) and provide informative error messages when it's not.
Maintained by Kevin Ushey. Last updated 4 months ago.
2.0 match 172 stars 18.81 score 3.6k scripts 2.1k dependents
hongyuanjia
eplusr:A Toolkit for Using Whole Building Simulation Program 'EnergyPlus'
A rich toolkit for using the whole building simulation program 'EnergyPlus' (<https://energyplus.net>), which enables programmatic navigation and modification of 'EnergyPlus' models and makes it less painful to do parametric simulations and analysis.
Maintained by Hongyuan Jia. Last updated 8 months ago.
energy-simulationenergyplusenergyplus-modelseplusepwiddidfparametric-simulationr6simulation
5.1 match 72 stars 7.20 score 91 scripts 4 dependents
vgherard
kgrams:Classical k-gram Language Models
Training and evaluating k-gram language models in R, supporting several probability smoothing techniques, perplexity computations, random text generation and more.
Maintained by Valerio Gherardi. Last updated 4 months ago.
language-modelsn-gramsnatural-language-processingcpp
7.0 match 7 stars 5.17 score 14 scripts 1 dependents
lucasgodeiro
TextForecast:Regression Analysis and Forecasting Using Textual Data from a Time-Varying Dictionary
Provides functionalities based on the paper "Time Varying Dictionary and the Predictive Power of FED Minutes" (Lima, 2018) <doi:10.2139/ssrn.3312483>. It selects the most predictive terms, which we call a time-varying dictionary, using supervised machine learning techniques such as lasso and elastic net.
Maintained by Lucas Godeiro. Last updated 5 years ago.
6.9 match 15 stars 5.18 score 20 scripts
yukai-yang
R6DS:R6 Reference Class Based Data Structures
Provides reference classes implementing some useful data structures. The package implements these data structures by using the reference class R6. Therefore, the classes of the data structures are also reference classes which means that their instances are passed by reference. The implemented data structures include stack, queue, double-ended queue, doubly linked list, set, dictionary and binary search tree. See for example <https://en.wikipedia.org/wiki/Data_structure> for more information about the data structures.
Maintained by Yukai Yang. Last updated 2 years ago.
binary-search-treesdata-structuresdequedictionarydoubly-linked-listfunctional-programmingmapqueuereference-classstacktraversal
10.5 match 5 stars 3.40 score 5 scripts
aidanmorales
rTwig:Realistic Quantitative Structure Models
Real Twig is a method to correct branch overestimation in quantitative structure models. Overestimated cylinders are correctly tapered using measured twig diameters of corresponding tree species. Supported quantitative structure modeling software includes 'TreeQSM', 'SimpleForest', 'Treegraph', and 'aRchi'. Also included is a novel database of twig diameters and tools for fractal analysis of point clouds.
Maintained by Aidan Morales. Last updated 13 days ago.
forestrylidarmodelingqsmrcppcpp
5.0 match 8 stars 7.10 score 13 scripts
rmi-pacta
r2dii.match:Tools to Match Corporate Lending Portfolios with Climate Data
These tools implement in R a fundamental part of the software 'PACTA' (Paris Agreement Capital Transition Assessment), which is a free tool that calculates the alignment between financial portfolios and climate scenarios (<https://www.transitionmonitor.com/>). Financial institutions use 'PACTA' to study how their capital allocation decisions align with climate change mitigation goals. This package matches data from corporate lending portfolios to asset level data from market-intelligence databases (e.g. power plant capacities, emission factors, etc.). This is the first step to assess if a financial portfolio aligns with climate goals.
Maintained by Jacob Kastl. Last updated 27 days ago.
4.5 match 7 stars 7.63 score 118 scripts 2 dependents
rmi-pacta
r2dii.analysis:Measure Climate Scenario Alignment of Corporate Loans
These tools help you to assess if a corporate lending portfolio aligns with climate goals. They summarize key climate indicators attributed to the portfolio (e.g. production, emission factors), and calculate alignment targets based on climate scenarios. They implement in R the last step of the free software 'PACTA' (Paris Agreement Capital Transition Assessment; <https://www.transitionmonitor.com/>). Financial institutions use 'PACTA' to study how their capital allocation decisions align with climate change mitigation goals.
Maintained by Jacob Kastl. Last updated 12 days ago.
4.5 match 12 stars 7.45 score 46 scripts 2 dependents
bioc
Biostrings:Efficient manipulation of biological strings
Memory efficient string containers, string matching algorithms, and other utilities, for fast manipulation of large biological sequences or sets of sequences.
Maintained by Hervé Pagès. Last updated 23 days ago.
sequencematchingalignmentsequencinggeneticsdataimportdatarepresentationinfrastructurebioconductor-packagecore-package
1.9 match 61 stars 17.83 score 8.6k scripts 1.2k dependents
rmi-pacta
r2dii.plot:Visualize the Climate Scenario Alignment of a Financial Portfolio
Create plots to visualize the alignment of a corporate lending financial portfolio to climate change scenarios based on climate indicators (production and emission intensities) across key climate relevant sectors of the 'PACTA' methodology (Paris Agreement Capital Transition Assessment; <https://www.transitionmonitor.com/>). Financial institutions use 'PACTA' to study how their capital allocation decisions align with climate change mitigation goals.
Maintained by Monika Furdyna. Last updated 13 days ago.
4.5 match 8 stars 7.31 score 33 scripts 5 dependents
melff
RKernel:Yet another R kernel for Jupyter
Provides a kernel for Jupyter.
Maintained by Martin Elff. Last updated 14 days ago.
jupyterjupyter-kerneljupyter-kernelsjupyter-notebook
7.0 match 38 stars 4.60 score
bioc
pgca:PGCA: An Algorithm to Link Protein Groups Created from MS/MS Data
Protein Group Code Algorithm (PGCA) is a computationally inexpensive algorithm to merge protein summaries from multiple experimental quantitative proteomics data. The algorithm connects two or more groups with overlapping accession numbers. In some cases, pairwise groups are mutually exclusive but they may still be connected by another group (or set of groups) with overlapping accession numbers. Thus, groups created by PGCA from multiple experimental runs (i.e., global groups) are called "connected" groups. These identified global protein groups enable the analysis of quantitative data available for protein groups instead of unique protein identifiers.
Maintained by Gabriela Cohen-Freue. Last updated 5 months ago.
workflowstepassaydomainproteomicsmassspectrometryimmunooncology
8.1 match 4.00 score 3 scripts
rafapereirabr
aopdata:Data from the 'Access to Opportunities Project (AOP)'
Download data from the 'Access to Opportunities Project (AOP)'. The 'aopdata' package brings annual estimates of access to employment, health, education and social assistance services by transport mode, as well as data on the spatial distribution of population, jobs, health care, schools and social assistance facilities at a fine spatial resolution for all cities included in the project. More info on the 'AOP' website <https://www.ipea.gov.br/acessooportunidades/en/>.
Maintained by Rafael H. M. Pereira. Last updated 2 months ago.
6.8 match 4.70 score 72 scripts
rich-iannone
intendo:A Group of Fun Datasets of Various Sizes and Differing Levels of Quality
Four datasets are provided here from the 'Intendo' game 'Super Jetroid'. It is data from the 2015 year of operation and it comprises a revenue table ('all_revenue'), a daily users table ('users_daily'), a user summary table ('user_summary'), and a table with data on all user sessions ('all_sessions'). These core datasets come in different sizes, and, each of them has a variant that was intentionally made faulty (totally riddled with errors and inconsistencies). This suite of tables is useful for testing with packages that focus on data validation and data documentation.
Maintained by Richard Iannone. Last updated 1 year ago.
8.0 match 9 stars 4.01 score 23 scripts
juliasilge
tidytext:Text Mining using 'dplyr', 'ggplot2', and Other Tidy Tools
Using tidy data principles can make many text mining tasks easier, more effective, and consistent with tools already in wide use. Much of the infrastructure needed for text mining with tidy data frames already exists in packages like 'dplyr', 'broom', 'tidyr', and 'ggplot2'. In this package, we provide functions and supporting data sets to allow conversion of text to and from tidy formats, and to switch seamlessly between tidy tools and existing text mining packages.
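A minimal sketch of tidy tokenization and counting; the example text is invented:

```r
library(tidytext)
library(dplyr)

df <- tibble(line = 1:2,
             text = c("Tidy text mining is fun", "and it is tidy indeed"))
df |>
  unnest_tokens(word, text) |>
  count(word, sort = TRUE)
```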
Maintained by Julia Silge. Last updated 11 months ago.
natural-language-processingtext-miningtidy-datatidyverse
1.8 match 1.2k stars 16.86 score 17k scripts 61 dependents
nalimilan
R.temis:Integrated Text Mining Solution
An integrated solution to perform a series of text mining tasks such as importing and cleaning a corpus, and analyses like terms and documents counts, lexical summary, terms co-occurrences and documents similarity measures, graphs of terms, correspondence analysis and hierarchical clustering. Corpora can be imported from spreadsheet-like files, directories of raw text files, as well as from 'Dow Jones Factiva', 'LexisNexis', 'Europresse' and 'Alceste' files.
Maintained by Milan Bouchet-Valat. Last updated 17 days ago.
6.0 match 27 stars 4.99 score 24 scripts
qtalr
qtkit:Quantitative Text Kit
Support package for the textbook "An Introduction to Quantitative Text Analysis for Linguists: Reproducible Research Using R" (Francom, 2024) <doi:10.4324/9781003393764>. Includes functions to acquire, clean, and analyze text data as well as functions to document and share the results of text analysis. The package is designed to be used in conjunction with the book, but can also be used as a standalone package for text analysis.
Maintained by Jerid Francom. Last updated 2 months ago.
5.9 match 5.03 score 12 scripts
vincentarelbundock
countrycode:Convert Country Names and Country Codes
Standardize country names, convert them into one of 40 different coding schemes, convert between coding schemes, and assign region descriptors.
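A quick sketch of converting between coding schemes:

```r
library(countrycode)

countrycode(c("Germany", "Brazil", "South Korea"),
            origin = "country.name", destination = "iso3c")
countrycode("DZA", origin = "iso3c", destination = "continent")
```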
Maintained by Vincent Arel-Bundock. Last updated 3 months ago.
2.0 match 351 stars 14.80 score 6.3k scripts 119 dependents
reconhub
matchmaker:Flexible Dictionary-Based Cleaning
Provides flexible dictionary-based cleaning that allows users to specify implicit and explicit missing data, regular expressions for both data and columns, and global matches, while respecting ordering of factors. This package is part of the 'RECON' (<https://www.repidemicsconsortium.org/>) toolkit for outbreak analysis.
Maintained by Zhian N. Kamvar. Last updated 5 years ago.
5.4 match 9 stars 5.43 score 9 scripts 2 dependents
yonicd
sinew:Package Development Documentation and Namespace Management
Manage package documentation and namespaces from the command line. Programmatically attach namespaces in R and Rmd scripts, populate Roxygen2 skeletons with information scraped from within functions, and populate the Imports field of the DESCRIPTION file.
Maintained by Jonathan Sidi. Last updated 1 years ago.
3.4 match 166 stars 8.54 score 88 scripts
rjdverse
rjd3toolkit:Utility Functions around 'JDemetra+ 3.0'
R Interface to 'JDemetra+ 3.x' (<https://github.com/jdemetra>) time series analysis software. It provides functions allowing to model time series (create outlier regressors, user-defined calendar regressors, UCARIMA models...), to test the presence of trading days or seasonal effects and also to set specifications in pre-adjustment and benchmarking when using rjd3x13 or rjd3tramoseats.
Maintained by Tanguy Barthelemy. Last updated 5 months ago.
jdemetraseasonal-adjustmenttimeseriesopenjdk
5.0 match 5 stars 5.81 score 48 scripts 15 dependents
maelstrom-research
Rmonize:Support Retrospective Harmonization of Data
Functions to support rigorous retrospective data harmonization processing, evaluation, and documentation across datasets from different studies based on Maelstrom Research guidelines. The package includes the core functions to evaluate and format the main inputs that define the harmonization process, apply specified processing rules to generate harmonized data, diagnose processing errors, and summarize and evaluate harmonized outputs. The main inputs that define the processing are a DataSchema (list and definitions of harmonized variables to be generated) and Data Processing Elements (processing rules to be applied to generate harmonized variables from study-specific variables). The main outputs of processing are harmonized datasets, associated metadata, and tabular and visual summary reports. As described in Maelstrom Research guidelines for rigorous retrospective data harmonization (Fortier I et al. (2017) <doi:10.1093/ije/dyw075>).
Maintained by Guillaume Fabre. Last updated 12 months ago.
5.1 match 5 stars 5.58 score 51 scripts
openwashdata
washr:Publication Toolkit for Water, Sanitation and Hygiene (WASH) Data
A toolkit to set up an R data package in a consistent structure. Automates tasks like tidy data export, data dictionary documentation, README and website creation, and citation management.
Maintained by Colin Walder. Last updated 4 months ago.
5.7 match 2 stars 4.95 score 7 scripts
myeomans
politeness:Detecting Politeness Features in Text
Detecting markers of politeness in English natural language. This package allows researchers to easily visualize and quantify politeness between groups of documents. This package combines prior research on the linguistic markers of politeness. We thank the Spencer Foundation, the Hewlett Foundation, and Harvard's Institute for Quantitative Social Science for support.
Maintained by Mike Yeomans. Last updated 1 month ago.
3.8 match 25 stars 7.49 score 41 scripts 1 dependents
myeomans
doc2concrete:Measuring Concreteness in Natural Language
Models for detecting concreteness in natural language. This package is built in support of Yeomans (2021) <doi:10.1016/j.obhdp.2020.10.008>, which reviews linguistic models of concreteness in several domains. Here, we provide an implementation of the best-performing domain-general model (from Brysbaert et al., (2014) <doi:10.3758/s13428-013-0403-5>) as well as two pre-trained models for the feedback and plan-making domains.
Maintained by Mike Yeomans. Last updated 1 year ago.
5.0 match 13 stars 5.59 score 20 scripts 1 dependents
mlverse
torch:Tensors and Neural Networks with 'GPU' Acceleration
Provides functionality to define and train neural networks similar to 'PyTorch' by Paszke et al (2019) <doi:10.48550/arXiv.1912.01703> but written entirely in R using the 'libtorch' library. Also supports low-level tensor operations and 'GPU' acceleration.
Maintained by Daniel Falbel. Last updated 5 days ago.
1.7 match 520 stars 16.52 score 1.4k scripts 38 dependents
dankelley
oce:Analysis of Oceanographic Data
Supports the analysis of Oceanographic data, including 'ADCP' measurements, measurements made with 'argo' floats, 'CTD' measurements, sectional data, sea-level time series, coastline and topographic data, etc. Provides specialized functions for calculating seawater properties such as potential temperature in either the 'UNESCO' or 'TEOS-10' equation of state. Produces graphical displays that conform to the conventions of the Oceanographic literature. This package is discussed extensively by Kelley (2018) "Oceanographic Analysis with R" <doi:10.1007/978-1-4939-8844-0>.
Maintained by Dan Kelley. Last updated 19 hours ago.
1.8 match 146 stars 15.42 score 4.2k scripts 18 dependents
paithiov909
kelpbeds:Dictionary Tool for 'MeCab'
Provides the source 'IPAdic' for 'MeCab'.
Maintained by Akiru Kato. Last updated 11 months ago.
16.2 match 1.70 score
loelschlaeger
oeli:Utilities for Developing Data Science Software
Some general helper functions that I (and maybe others) find useful when developing data science software.
Maintained by Lennart Oelschläger. Last updated 4 months ago.
5.0 match 2 stars 5.42 score 1 scripts 4 dependents
damoncharlesroberts
genCountR:Interacting with Roberts and Utych's (2019) Gendered Language Dictionary
Allows users to generate a gendered language score according to the gendered language dictionary in Roberts and Utych (2019) <doi:10.1177/1065912919874883>.
Maintained by Damon Roberts. Last updated 8 months ago.
6.8 match 4.00 score 2 scripts
five-dots
Dict:R6 Based Key-Value Dictionary Implementation
A key-value dictionary data structure based on an R6 class, designed to offer usage similar to dictionaries in other languages (e.g. 'Python'), with reference semantics and extensibility through R6.
Maintained by Shun Asai. Last updated 3 years ago.
5.5 match 4 stars 4.85 score 358 scripts
cran
tmcn:A Text Mining Toolkit for Chinese
A text mining toolkit for Chinese, which includes facilities for Chinese string processing, Chinese NLP support, and encoding detection and conversion. Moreover, it provides some functions to support the 'tm' package in Chinese.
Maintained by Jian Li. Last updated 6 years ago.
11.1 match 1 stars 2.38 score 5 dependents
mlr-org
mlr3tuning:Hyperparameter Optimization for 'mlr3'
Hyperparameter optimization package of the 'mlr3' ecosystem. It features highly configurable search spaces via the 'paradox' package and finds optimal hyperparameter configurations for any 'mlr3' learner. 'mlr3tuning' works with several optimization algorithms e.g. Random Search, Iterated Racing, Bayesian Optimization (in 'mlr3mbo') and Hyperband (in 'mlr3hyperband'). Moreover, it can automatically optimize learners and estimate the performance of optimized models with nested resampling.
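A hedged sketch of a random-search run; argument names (e.g. 'measures', 'term_evals') follow recent mlr3tuning releases and may differ in older versions, so check ?tune before relying on it:
  library(mlr3)
  library(mlr3tuning)
  instance <- tune(
    tuner      = tnr("random_search"),
    task       = tsk("sonar"),
    learner    = lrn("classif.rpart", cp = to_tune(1e-4, 1e-1)),
    resampling = rsmp("cv", folds = 3),
    measures   = msr("classif.ce"),
    term_evals = 20
  )
  instance$result   # best hyperparameter configuration found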
Maintained by Marc Becker. Last updated 3 months ago.
bbotkhyperparameter-optimizationhyperparameter-tuningmachine-learningmlr3optimizationtunetuning
2.3 match 55 stars 11.59 score 384 scripts 11 dependents
theogrost
NUSS:Mixed N-Grams and Unigram Sequence Segmentation
Segmentation of short text sequences - like hashtags - into a sequence of separate words, using a dictionary that may be built on a custom corpus of texts. A unigram dictionary is used to find the most probable sequence, and an n-gram approach is used to determine possible segmentations given the text corpus.
Maintained by Oskar Kosch. Last updated 8 months ago.
8.7 match 3.00 score 8 scripts
nikdata
RClimacell:R Wrapper for the 'Climacell' API
'Climacell' is a weather platform that provides hyper-local forecasts and weather data. This package enables the user to query the core layers of the time line interface of the 'Climacell' v4 API <https://www.climacell.co/weather-api/>. This package requires a valid API key. See vignettes for instructions on use.
Maintained by Nikhil Agarwal. Last updated 4 years ago.
climacellclimacell-apiweatherweather-api
6.5 match 4.00 score 5 scripts
inbo
checklist:A Thorough and Strict Set of Checks for R Packages and Source Code
An opinionated set of rules for R packages and R source code projects.
Maintained by Thierry Onkelinx. Last updated 26 days ago.
checklistcontinuous-integrationcontinuous-testingquality-assurance
3.4 match 19 stars 7.24 score 21 scripts 2 dependents
matrix-profile-foundation
tsmp:Time Series with Matrix Profile
A toolkit implementing the Matrix Profile concept that was created by CS-UCR <http://www.cs.ucr.edu/~eamonn/MatrixProfile.html>.
Maintained by Francisco Bischoff. Last updated 3 years ago.
algorithmmatrix-profilemotif-searchtime-seriescpp
3.3 match 72 stars 7.29 score 179 scripts 1 dependents
roux-ohdsi
allofus:Interface for 'All of Us' Researcher Workbench
Streamline use of the 'All of Us' Researcher Workbench (<https://www.researchallofus.org/data-tools/workbench/>) with tools to extract and manipulate data from the 'All of Us' database. Increase interoperability with the Observational Health Data Science and Informatics ('OHDSI') tool stack by decreasing reliance on 'All of Us' tools and allowing for cohort creation via 'Atlas'. Improve reproducible and transparent research using 'All of Us'.
Maintained by Rob Cavanaugh. Last updated 4 months ago.
3.4 match 16 stars 7.19 score 30 scripts
ctu-bern
redcaptools:Tools for exporting and working with REDCap data
Tools for exporting and working with REDCap data (e.g. adding labels, formatting dates).
Maintained by Alan G Haynes. Last updated 4 months ago.
5.3 match 4 stars 4.51 score 9 scripts
lgnbhl
BFS:Get Data from the Swiss Federal Statistical Office
Search and download data from the Swiss Federal Statistical Office (BFS) APIs <https://www.bfs.admin.ch/>.
Maintained by Felix Luginbuhl. Last updated 3 months ago.
3.6 match 18 stars 6.55 score 17 scripts
canmod
iidda:Processing Infectious Disease Datasets in IIDDA.
Part of an open toolchain for processing infectious disease datasets available through the IIDDA data repository.
Maintained by Steve Walker. Last updated 4 months ago.
3.9 match 6.07 score 133 scripts 3 dependents
ropensci
phylotaR:Automated Phylogenetic Sequence Cluster Identification from 'GenBank'
A pipeline for the identification, within taxonomic groups, of orthologous sequence clusters from 'GenBank' <https://www.ncbi.nlm.nih.gov/genbank/> as the first step in a phylogenetic analysis. The pipeline depends on a local alignment search tool and is, therefore, not dependent on differences in gene naming conventions and naming errors.
Maintained by Shixiang Wang. Last updated 8 months ago.
blastngenbankpeer-reviewedphylogeneticssequence-alignment
4.0 match 23 stars 5.86 score 156 scripts
ropensci
EndoMineR:Functions to mine endoscopic and associated pathology datasets
This package comprises the functions used to clean up endoscopic reports and pathology reports, as well as many of the scripts used for analysis. The scripts assume the endoscopy and histopathology data sets are already merged, but they can of course also be used with the unmerged datasets.
Maintained by Sebastian Zeki. Last updated 7 months ago.
endoscopygastroenterologypeer-reviewedsemi-structured-datatext-mining
4.3 match 13 stars 5.47 score 30 scripts
dylanb95
statespacer:State Space Modelling in 'R'
A tool that makes estimating models in state space form a breeze. See "Time Series Analysis by State Space Methods" by Durbin and Koopman (2012, ISBN: 978-0-19-964117-8) for details about the algorithms implemented.
Maintained by Dylan Beijers. Last updated 2 years ago.
cppdynamic-linear-modelforecastinggaussian-modelskalman-filtermathematical-modellingstate-spacestatistical-inferencestatistical-modelsstructural-analysistime-seriesopenblascppopenmp
3.8 match 15 stars 6.14 score 37 scripts
cardiomoon
webr:Data and Functions for Web-Based Analysis
Several analysis-related functions for the book entitled "Web-based Analysis without R in Your Computer" (written in Korean, ISBN 978-89-5566-185-9) by Keon-Woong Moon. The main function plot.htest() shows the distribution of the statistic for an object of class 'htest'.
Maintained by Keon-Woong Moon. Last updated 5 years ago.
3.4 match 33 stars 6.82 score 181 scripts
doctorbjones
datadictionary:Create a Data Dictionary
Creates a data dictionary from any dataframe or tibble in your R environment. You can opt to add variable labels. You can write the object directly to Excel.
Maintained by Bethany Jones. Last updated 1 days ago.
5.7 match 11 stars 4.00 score 18 scripts
sportsdataverse
hoopR:Access Men's Basketball Play by Play Data
A utility to quickly obtain clean and tidy men's basketball play by play data. Provides functions to access live play by play and box score data from ESPN <https://www.espn.com>, with shot locations when available. It is also a full wrapper for the NBA Stats API <https://www.nba.com/stats/>, as well as a scraping and aggregating interface for Ken Pomeroy's men's college basketball statistics website <https://kenpom.com>. It provides users with an active subscription the capability to scrape the website tables and analyze the data for themselves.
Maintained by Saiem Gilani. Last updated 1 years ago.
basketballcollege-basketballespnkenpomnbanba-analyticsnba-apinba-datanba-statisticsnba-statsnba-stats-apincaancaa-basketballncaa-bracketncaa-playersncaa-ratingsncaamsportsdataverse
3.3 match 91 stars 6.93 score 261 scripts
jonathanconrad98
docket:Insert R Data into 'Word' Documents
Populate data from an R environment into '.doc' and '.docx' templates. Create a template document in a program such as 'Word', and add strings encased in guillemet characters to create flags («example»). Use getDictionary() to create a dictionary of flags and replacement values, then call docket() to generate a populated document.
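An illustrative sketch only: the flag names are invented and the exact signatures of getDictionary() and docket() are assumptions, so check the help pages before relying on it:
  library(docket)
  dict <- getDictionary("template.docx")        # collect the «flags» found in the template (assumed signature)
  dict[dict[, 1] == "«client»", 2] <- "ACME"    # assumed two-column flag/value layout; «client» is a made-up flag
  docket("template.docx", dict, "filled.docx")  # write a populated copy (assumed argument order)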
Maintained by Jonathan Conrad. Last updated 1 years ago.
8.3 match 2.70 score 3 scripts
ideasybits
redatamx:R Interface to 'Redatam' Library
Provides an API to work with 'Redatam' (see <https://redatam.org>) databases in both formats: 'RXDB' (new format) and 'DICX' (old format) and running 'Redatam' programs written in 'SPC' language. It's a wrapper around 'Redatam' core and provides functions to open/close a database (redatam_open()/redatam_close()), list entities and variables from the database (redatam_entities(), redatam_variables()) and execute a 'SPC' program and gets the results as data frames (redatam_query(), redatam_run()).
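A hedged sketch of the workflow around the functions named above; the file path, entity name and SPC expression are placeholders, and the argument order is an assumption:
  library(redatamx)
  db <- redatam_open("census.rxdb")              # open an RXDB (or DICX) database
  redatam_entities(db)                           # list entities
  redatam_variables(db, "PERSON")                # list variables of one (hypothetical) entity
  res <- redatam_query(db, "freq PERSON.SEX")    # run an SPC expression, returned as a data frame
  redatam_close(db)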
Maintained by Jaime Salvador. Last updated 3 months ago.
6.8 match 3.30 score 2 scripts
laresbernardo
lares:Analytics & Machine Learning Sidekick
Auxiliary package for better/faster analytics, visualization, data mining, and machine learning tasks. With a wide variety of family functions, like Machine Learning, Data Wrangling, Marketing Mix Modeling (Robyn), Exploratory, API, and Scraper, it helps the analyst or data scientist to get quick and robust results, without the need for repetitive coding or advanced R programming skills.
Maintained by Bernardo Lares. Last updated 23 days ago.
analyticsapiautomationautomldata-sciencedescriptive-statisticsh2omachine-learningmarketingmmmpredictive-modelingpuzzlerlanguagerobynvisualization
2.3 match 233 stars 9.84 score 185 scripts 1 dependents
wadpac
GGIR:Raw Accelerometer Data Analysis
A tool to process and analyse data collected with wearable raw acceleration sensors as described in Migueles and colleagues (JMPB 2019), and van Hees and colleagues (JApplPhysiol 2014; PLoSONE 2015). The package has been developed and tested for binary data from 'GENEActiv' <https://activinsights.com/>, binary (.gt3x) and .csv-export data from 'Actigraph' <https://theactigraph.com> devices, and binary (.cwa) and .csv-export data from 'Axivity' <https://axivity.com>. These devices are currently widely used in research on human daily physical activity. Further, the package can handle accelerometer data file from any other sensor brand providing that the data is stored in csv format. Also the package allows for external function embedding.
Maintained by Vincent T van Hees. Last updated 2 days ago.
accelerometeractivity-recognitioncircadian-rhythmmovement-sensorsleep
1.7 match 109 stars 13.20 score 342 scripts 3 dependents
ahl27
froth:Emulate a 'Forth' Programming Environment
Emulates a 'Forth' programming environment with added features to interface between R and 'Forth'. Implements most of the functionality described in the original "Starting Forth" textbook <https://www.forth.com/starting-forth/>.
Maintained by Aidan Lakshman. Last updated 1 years ago.
4.3 match 3 stars 5.08 score 2 scripts
rpahl
container:Extending Base 'R' Lists
Extends the functionality of base 'R' lists and provides specialized data structures 'deque', 'set', 'dict', and 'dict.table', the latter to extend the 'data.table' package.
Maintained by Roman Pahl. Last updated 2 months ago.
containerdata-structuresdequedictsets
3.0 match 16 stars 7.13 score 140 scripts
trinker
qdapDictionaries:Dictionaries and Word Lists for the 'qdap' Package
A collection of text analysis dictionaries and word lists for use with the 'qdap' package.
Maintained by Tyler Rinker. Last updated 7 years ago.
3.6 match 4 stars 5.99 score 113 scripts 6 dependents
cran
nonmem2R:Loading NONMEM Output Files with Functions for Visual Predictive Checks (VPC) and Goodness of Fit (GOF) Plots
Loading NONMEM (NONlinear Mixed-Effect Modeling, <https://www.iconplc.com/solutions/technologies/nonmem/>) and PSN (Perl-speaks-NONMEM, <https://uupharmacometrics.github.io/PsN/>) output files to extract parameter estimates, provide visual predictive check (VPC) and goodness of fit (GOF) plots, and simulate with parameter uncertainty.
Maintained by Magnus Astrand. Last updated 1 years ago.
7.7 match 3 stars 2.78 score
tiledb-inc
tiledb:Modern Database Engine for Complex Data Based on Multi-Dimensional Arrays
The modern database 'TileDB' introduces a powerful on-disk format for storing and accessing any complex data based on multi-dimensional arrays. It supports dense and sparse arrays, data frames and key-value stores, cloud storage ('S3', 'GCS', 'Azure'), chunked arrays, multiple compression, encryption and checksum filters, uses a fully multi-threaded implementation, supports parallel I/O, data versioning ('time travel'), metadata and groups. It is implemented as an embeddable cross-platform C++ library with APIs from several languages, and integrations. This package provides the R support.
Maintained by Isaiah Norton. Last updated 4 days ago.
arrayhdfss3storage-managertiledbcpp
1.8 match 107 stars 11.96 score 306 scripts 4 dependents
bnosac
udpipe:Tokenization, Parts of Speech Tagging, Lemmatization and Dependency Parsing with the 'UDPipe' 'NLP' Toolkit
This natural language processing toolkit provides language-agnostic 'tokenization', 'parts of speech tagging', 'lemmatization' and 'dependency parsing' of raw text. Next to text parsing, the package also allows you to train annotation models based on data of 'treebanks' in 'CoNLL-U' format as provided at <https://universaldependencies.org/format.html>. The techniques are explained in detail in the paper: 'Tokenizing, POS Tagging, Lemmatizing and Parsing UD 2.0 with UDPipe', available at <doi:10.18653/v1/K17-3009>. The toolkit also contains functionalities for commonly used data manipulations on texts which are enriched with the output of the parser. Namely functionalities and algorithms for collocations, token co-occurrence, document term matrix handling, term frequency inverse document frequency calculations, information retrieval metrics (Okapi BM25), handling of multi-word expressions, keyword detection (Rapid Automatic Keyword Extraction, noun phrase extraction, syntactical patterns) sentiment scoring and semantic similarity analysis.
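A minimal sketch using the one-step interface (this should download a pre-trained English model on first use):
  library(udpipe)
  x <- c("The package parses raw text.", "It returns one row per token.")
  anno <- udpipe(x, "english")   # tokenisation, tagging, lemmatisation, dependency parsing
  head(anno[, c("doc_id", "token", "lemma", "upos", "dep_rel")])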
Maintained by Jan Wijffels. Last updated 2 years ago.
conlldependency-parserlemmatizationnatural-language-processingnlppos-taggingr-pkgrcpptext-miningtokenizerudpipecpp
1.7 match 215 stars 11.83 score 1.2k scripts 9 dependents
jaseziv
worldfootballR:Extract and Clean World Football (Soccer) Data
Allow users to obtain clean and tidy football (soccer) game, team and player data. Data is collected from a number of popular sites, including 'FBref', transfer and valuations data from 'Transfermarkt'<https://www.transfermarkt.com/> and shooting location and other match stats data from 'Understat'<https://understat.com/>. It gives users the ability to access data more efficiently, rather than having to export data tables to files before being able to complete their analysis.
Maintained by Jason Zivkovic. Last updated 30 days ago.
fbreffootballfootball-datasoccer-datasports-datatransfermarktunderstat
2.0 match 506 stars 9.89 score 516 scripts 2 dependents
hugheylab
pmparser:Create and Maintain a Relational Database of Data from PubMed/MEDLINE
Provides a simple interface for extracting various elements from the publicly available PubMed XML files, incorporating PubMed's regular updates, and combining the data with the NIH Open Citation Collection. See Schoenbachler and Hughey (2021) <doi:10.7717/peerj.11071>.
Maintained by Jake Hughey. Last updated 2 months ago.
3.8 match 17 stars 5.23 score 1 scripts
selesnow
rvkstat:R Interface to API 'vk.com'
Load data from the vk.com API about your community users and views, ads performance, posts on user walls, etc. For more information see the API documentation <https://vk.com/dev/first_guide>.
Maintained by Alexey Seleznev. Last updated 3 years ago.
4.5 match 15 stars 4.35 score 9 scripts 1 dependents
koheiw
wordmap:Feature Extraction and Document Classification with Noisy Labels
Extract features and classify documents with noisy labels given by document meta-data or keyword matching (Watanabe & Zhou (2020) <doi:10.1177/0894439320907027>).
Maintained by Kohei Watanabe. Last updated 2 months ago.
4.0 match 2 stars 4.86 score 1 scripts
dahhamalsoud
phdcocktail:Enhance the Ease of R Experience as an Emerging Researcher
A toolkit of functions to help: i) effortlessly transform collected data into a publication ready format, ii) generate insightful visualizations from clinical data, iii) report summary statistics in a publication-ready format, iv) efficiently export, save and reload R objects within the framework of R projects.
Maintained by Dahham Alsoud. Last updated 1 years ago.
5.2 match 3.70 score 1 scripts
hope-data-science
akc:Automatic Knowledge Classification
A tidy framework for automatic knowledge classification and visualization. Currently, the core functionality of the framework is mainly supported by modularity-based clustering (community detection) in keyword co-occurrence network, and focuses on co-word analysis of bibliometric research. However, the designed functions in 'akc' are general, and could be extended to solve other tasks in text mining as well.
Maintained by Tian-Yuan Huang. Last updated 19 days ago.
3.3 match 15 stars 5.85 score 47 scripts
guyabel
migest:Methods for the Indirect Estimation of Bilateral Migration
Tools for estimating, measuring and working with migration data.
Maintained by Guy J. Abel. Last updated 1 months ago.
3.3 match 32 stars 5.80 score 86 scripts
mlr-org
mlr3filters:Filter Based Feature Selection for 'mlr3'
Extends 'mlr3' with filter methods for feature selection. Besides standalone filter methods, built-in methods of any machine-learning algorithm are supported. Partial scoring of multivariate filter methods is supported.
Maintained by Marc Becker. Last updated 4 months ago.
feature-selectionfilterfiltersmlrmlr3variable-importance
2.3 match 20 stars 8.37 score 95 scripts 3 dependents
nt-williams
codebreak:Label Data Using a YAML Codebook
A light-weight framework for labeling coded data using a codebook saved as YAML text file.
Maintained by Nick Williams. Last updated 7 months ago.
7.5 match 6 stars 2.48 score 1 scripts
mlr-org
mlr3fselect:Feature Selection for 'mlr3'
Feature selection package of the 'mlr3' ecosystem. It selects the optimal feature set for any 'mlr3' learner. The package works with several optimization algorithms e.g. Random Search, Recursive Feature Elimination, and Genetic Search. Moreover, it can automatically optimize learners and estimate the performance of optimized feature sets with nested resampling.
Maintained by Marc Becker. Last updated 2 months ago.
evolutionary-algorithmsexhaustive-searchfeature-selectionmachine-learningmlr3optimizationrandom-searchrecursive-feature-eliminationsequential-feature-selection
2.3 match 23 stars 8.25 score 70 scripts 2 dependents
oliver-wyman-actuarial
easyr:Helpful Functions from Oliver Wyman Actuarial Consulting
Makes difficult operations easy. Includes these types of functions: shorthand, type conversion, data wrangling, and work flow. Also includes some helpful data objects: NA strings, U.S. state list, color blind charting colors. Built and shared by Oliver Wyman Actuarial Consulting. Accepting proposed contributions through GitHub.
Maintained by Bryce Chamberlain. Last updated 1 years ago.
3.8 match 20 stars 4.86 score 18 scripts
pecanproject
PEcAn.utils:PEcAn Functions Used for Ecological Forecasts and Reanalysis
The Predictive Ecosystem Carbon Analyzer (PEcAn) is a scientific workflow management tool that is designed to simplify the management of model parameterization, execution, and analysis. The goal of PEcAn is to streamline the interaction between data and models, and to improve the efficacy of scientific investigation.
Maintained by Rob Kooper. Last updated 2 days ago.
bayesiancyberinfrastructuredata-assimilationdata-scienceecosystem-modelecosystem-scienceforecastingmeta-analysisnational-science-foundationpecanplants
1.7 match 216 stars 10.92 score 218 scripts 35 dependents
bioc
affxparser:Affymetrix File Parsing SDK
Package for parsing Affymetrix files (CDF, CEL, CHP, BPMAP, BAR). It provides methods for fast and memory efficient parsing of Affymetrix files using the Affymetrix' Fusion SDK. Both ASCII- and binary-based files are supported. Currently, there are methods for reading chip definition file (CDF) and a cell intensity file (CEL). These files can be read either in full or in part. For example, probe signals from a few probesets can be extracted very quickly from a set of CEL files into a convenient list structure.
Maintained by Kasper Daniel Hansen. Last updated 2 months ago.
infrastructuredataimportmicroarrayproprietaryplatformsonechannelbioconductorcpp
2.3 match 7 stars 8.19 score 65 scripts 14 dependents
trevorld
xmpdf:Edit 'XMP' Metadata and 'PDF' Bookmarks and Documentation Info
Edit 'XMP' metadata <https://en.wikipedia.org/wiki/Extensible_Metadata_Platform> in a variety of media file formats as well as edit bookmarks (aka outline aka table of contents) and documentation info entries in 'pdf' files. Can detect and use a variety of command-line tools to perform these operations such as 'exiftool' <https://exiftool.org/>, 'ghostscript' <https://www.ghostscript.com/>, and/or 'pdftk' <https://gitlab.com/pdftk-java/pdftk>.
Maintained by Trevor L Davis. Last updated 12 months ago.
3.5 match 5 stars 5.18 score 1 scripts 1 dependents
ropensci
rdhs:API Client and Dataset Management for the Demographic and Health Survey (DHS) Data
Provides a client for (1) querying the DHS API for survey indicators and metadata (<https://api.dhsprogram.com/#/index.html>), (2) identifying surveys and datasets for analysis, (3) downloading survey datasets from the DHS website, (4) loading datasets and associate metadata into R, and (5) extracting variables and combining datasets for pooled analysis.
Maintained by OJ Watson. Last updated 17 days ago.
datasetdhsdhs-apiextractpeer-reviewedsurvey-data
1.8 match 35 stars 10.07 score 286 scripts 3 dependents
jimmyday12
fitzRoy:Easily Scrape and Process AFL Data
An easy package for scraping and processing Australian Rules Football (AFL) data. 'fitzRoy' provides a range of functions for accessing publicly available data from 'AFL Tables' <https://afltables.com/afl/afl_index.html>, 'Footy Wire' <https://www.footywire.com> and 'The Squiggle' <https://squiggle.com.au>. Further functions allow for easy processing, cleaning and transformation of this data into formats that can be used for analysis.
Maintained by James Day. Last updated 2 months ago.
1.7 match 134 stars 10.74 score 324 scripts
gdemin
maditr:Fast Data Aggregation, Modification, and Filtering with Pipes and 'data.table'
Provides pipe-style interface for 'data.table'. Package preserves all 'data.table' features without significant impact on performance. 'let' and 'take' functions are simplified interfaces for most common data manipulation tasks. For example, you can write 'take(mtcars, mean(mpg), by = am)' for aggregation or 'let(mtcars, hp_wt = hp/wt, hp_wt_mpg = hp_wt/mpg)' for modification. Use 'take_if/let_if' for conditional aggregation/modification. Additionally there are some conveniences such as automatic 'data.frame' conversion to 'data.table'.
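The two calls quoted above, spelled out as a runnable sketch:
  library(maditr)
  take(mtcars, mean(mpg), by = am)                    # aggregation: mean mpg by transmission
  let(mtcars, hp_wt = hp/wt, hp_wt_mpg = hp_wt/mpg)   # modification: add two derived columns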
Maintained by Gregory Demin. Last updated 4 months ago.
2.0 match 61 stars 8.98 score 248 scripts 7 dependents
cmerow
rangeModelMetadata:Provides Templates for Metadata Files Associated with Species Range Models
Range Modeling Metadata Standards (RMMS) address three challenges: they (i) are designed for convenience to encourage use, (ii) accommodate a wide variety of applications, and (iii) are extensible to allow the community of range modelers to steer it as needed. RMMS are based on a data dictionary that specifies a hierarchical structure to catalog different aspects of the range modeling process. The dictionary balances a constrained, minimalist vocabulary to improve standardization with flexibility for users to provide their own values. Merow et al. (2019) <DOI:10.1111/geb.12993> describe the standards in more detail. Note that users who prefer to use the R package 'ecospat' can obtain it from <https://github.com/ecospat/ecospat>.
Maintained by Cory Merow. Last updated 8 months ago.
ecological-metadata-languageecological-modellingecological-modelsecologyspecies-distribution-modellingspecies-distributions
2.6 match 6 stars 6.96 score 16 scripts 3 dependents
trinker
qdapRegex:Regular Expression Removal, Extraction, and Replacement Tools
A collection of regular expression tools associated with the 'qdap' package that may be useful outside of the context of discourse analysis. Tools include removal/extraction/replacement of abbreviations, dates, dollar amounts, email addresses, hash tags, numbers, percentages, citations, person tags, phone numbers, times, and zip codes.
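A small sketch of the removal/extraction pattern (the example string is invented):
  library(qdapRegex)
  x <- "Contact sales@example.com about the $1200.50 invoice."
  rm_email(x)                    # text with the address removed
  rm_email(x, extract = TRUE)    # just the address
  rm_dollar(x, extract = TRUE)   # just the dollar amount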
Maintained by Tyler Rinker. Last updated 1 years ago.
1.9 match 50 stars 9.48 score 502 scripts 41 dependents
trinker
textstem:Tools for Stemming and Lemmatizing Text
Tools that stem and lemmatize text. Stemming is a process that removes endings such as affixes. Lemmatization is the process of grouping inflected forms together as a single base form.
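A small sketch contrasting the two operations:
  library(textstem)
  w <- c("running", "ran", "studies")
  stem_words(w)         # crude suffix stripping, e.g. "studi"
  lemmatize_words(w)    # dictionary-based base forms, e.g. "study"
  lemmatize_strings("The mice were running quickly.")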
Maintained by Tyler Rinker. Last updated 7 years ago.
lemmatizationstemmingtext-mining
2.0 match 45 stars 8.71 score 888 scripts 11 dependents
usaid-oha-si
mindthegap:Mind the Gap
Package to tidy UNAIDS estimates (from the EDMS database) as well as plot trends in UNAIDS 95 goals and ART coverage gap by country.
Maintained by Karishma Srikanth. Last updated 2 months ago.
3.1 match 5 stars 5.51 score 13 scripts
scholaempirica
reschola:The Schola Empirica Package
A collection of utilities, themes and templates for data analysis at Schola Empirica.
Maintained by Jan Netík. Last updated 5 months ago.
3.5 match 4 stars 4.83 score 14 scripts
brendensm
misuvi:Access the Michigan Substance Use Vulnerability Index (MI-SUVI)
Easily import the MI-SUVI data sets. The user can import data sets with full metrics, percentiles, Z-scores, or rankings. Data is available at both the County and Zip Code Tabulation Area (ZCTA) levels. This package also includes a function to import shape files for easy mapping and a function to access the full technical documentation. All data is sourced from the Michigan Department of Health and Human Services.
Maintained by Brenden Smith. Last updated 1 months ago.
4.9 match 3.40 score
christopherkenny
acronames:Create Acronyms for Naming Things
Simple tool for developing names based on first letters of keywords.
Maintained by Christopher T. Kenny. Last updated 3 years ago.
9.8 match 1 stars 1.70 score 1 scripts
kwb-r
kwb.pathdict:Functions to Work with Path Dictionaries
This package provides functions to work with what I call path dictionaries. Path dictionaries are lists defining file and folder paths. In order not to repeat sub-paths, placeholders can be used. The package provides functions to find duplicated sub-paths and to define placeholders accordingly.
Maintained by Hauke Sonnenberg. Last updated 5 years ago.
9.7 match 1.70 score 1 scripts
subramv
cms:Calculate Medicare Reimbursement
Uses the 'CMS' application programming interface <https://dnav.cms.gov/api/healthdata> to provide users databases containing yearly Medicare reimbursement rates in the United States. Data can be acquired for the entire United States or only for specific localities. Currently, support is only provided for the Medicare Physician Fee Schedule, but support will be expanded for other 'CMS' databases in future versions.
Maintained by Vigneshwar Subramanian. Last updated 4 years ago.
medicaremedicare-datareimbursement
3.5 match 7 stars 4.54 score 10 scripts
davidasmith
wordler:The 'WORDLE' Game
The 'Wordle' game. Players have six attempts to guess a five-letter word. After each guess, the player is informed which letters in their guess are either: anywhere in the word; in the right position in the word. This can be used to inform the next guess. Can be played interactively in the console, or programmatically. Based on Josh Wardle's game <https://www.powerlanguage.co.uk/wordle/>.
Maintained by David Smith. Last updated 3 years ago.
3.6 match 5 stars 4.40 score 7 scripts
great-northern-diver
loon:Interactive Statistical Data Visualization
An extendable toolkit for interactive data visualization and exploration.
Maintained by R. Wayne Oldford. Last updated 2 years ago.
data-analysisdata-sciencedata-visualizationexploratory-analysisexploratory-data-analysishigh-dimensional-datainteractive-graphicsinteractive-visualizationsloonpythonstatistical-analysisstatistical-graphicsstatisticstcl-extensiontk
1.8 match 48 stars 9.00 score 93 scripts 5 dependents
genentech
psborrow2:Bayesian Dynamic Borrowing Analysis and Simulation
Bayesian dynamic borrowing is an approach to incorporating external data to supplement a randomized, controlled trial analysis in which external data are incorporated in a dynamic way (e.g., based on similarity of outcomes); see Viele 2013 <doi:10.1002/pst.1589> for an overview. This package implements the hierarchical commensurate prior approach to dynamic borrowing as described in Hobbs 2011 <doi:10.1111/j.1541-0420.2011.01564.x>. There are three main functionalities. First, 'psborrow2' provides a user-friendly interface for applying dynamic borrowing to the study results and handles the Markov Chain Monte Carlo sampling on behalf of the user. Second, 'psborrow2' provides a simulation framework to compare different borrowing parameters (e.g. full borrowing, no borrowing, dynamic borrowing) and other trial and borrowing characteristics (e.g. sample size, covariates) in a unified way. Third, 'psborrow2' provides a set of functions to generate data for simulation studies, and also allows the user to specify their own data generation process. This package is designed to use the sampling functions from 'cmdstanr' which can be installed from <https://stan-dev.r-universe.dev>.
Maintained by Matt Secrest. Last updated 1 months ago.
bayesian-dynamic-borrowingpsborrow2simulation-study
2.0 match 18 stars 7.87 score 16 scripts
kwb-r
kwb.monitoring:Functions Used Within Different Kwb Monitoring Projects
Functions used within different KWB projects dealing with monitoring data.
Maintained by Hauke Sonnenberg. Last updated 6 years ago.
4.1 match 3.78 score 3 scripts 4 dependents
flavjack
inti:Tools and Statistical Procedures in Plant Science
The 'inti' package is part of the 'inkaverse' project for developing different procedures and tools used in plant science and experimental designs. The main aim of the package is to support researchers during the planning of experiments and data collection (tarpuy()), data analysis and graphics (yupana()), and technical writing. Learn more about the 'inkaverse' project at <https://inkaverse.com/>.
Maintained by Flavio Lozano-Isla. Last updated 20 hours ago.
agricultureappsinkaverselmmplant-breedingplant-scienceshiny
1.9 match 5 stars 8.27 score 193 scripts
mlr-org
mlr3torch:Deep Learning with 'mlr3'
Deep learning library that extends the mlr3 framework by building upon the 'torch' package. It allows you to conveniently build, train, and evaluate deep learning models without having to worry about low-level details. Custom architectures can be created using the graph language defined in 'mlr3pipelines'.
Maintained by Sebastian Fischer. Last updated 1 months ago.
data-sciencedeep-learningmachine-learningmlr3torch
2.0 match 42 stars 7.63 score 78 scripts
cran
NHSDataDictionaRy:NHS Data Dictionary Toolset for NHS Lookups
Providing a common set of simplified web scraping tools for working with the NHS Data Dictionary <https://datadictionary.nhs.uk/data_elements_overview.html>. The intended usage is to access the data elements section of the NHS Data Dictionary to access key lookups. The benefits of having it in this package are that the lookups are the live lookups on the website and will not need to be maintained. This package was commissioned by the NHS-R community <https://nhsrcommunity.com/> to provide this consistency of lookups. The OpenSafely lookups have now been added <https://www.opencodelists.org/docs/>.
Maintained by Gary Hutson. Last updated 4 years ago.
7.6 match 2.00 score
ropensci
ckanr:Client for the Comprehensive Knowledge Archive Network ('CKAN') API
Client for 'CKAN' API (<https://ckan.org/>). Includes interface to 'CKAN' 'APIs' for search, list, show for packages, organizations, and resources. In addition, provides an interface to the 'datastore' API.
Maintained by Francisco Alves. Last updated 2 years ago.
databaseopen-datackanapidatadatasetapi-wrapperckan-api
1.8 match 100 stars 8.67 score 448 scripts 4 dependents
psychbruce
PsychWordVec:Word Embedding Research Framework for Psychological Science
An integrative toolbox of word embedding research that provides: (1) a collection of 'pre-trained' static word vectors in the '.RData' compressed format <https://psychbruce.github.io/WordVector_RData.pdf>; (2) a series of functions to process, analyze, and visualize word vectors; (3) a range of tests to examine conceptual associations, including the Word Embedding Association Test <doi:10.1126/science.aal4230> and the Relative Norm Distance <doi:10.1073/pnas.1720347115>, with permutation test of significance; (4) a set of training methods to locally train (static) word vectors from text corpora, including 'Word2Vec' <arXiv:1301.3781>, 'GloVe' <doi:10.3115/v1/D14-1162>, and 'FastText' <arXiv:1607.04606>; (5) a group of functions to download 'pre-trained' language models (e.g., 'GPT', 'BERT') and extract contextualized (dynamic) word vectors (based on the R package 'text').
Maintained by Han-Wu-Shuang Bao. Last updated 1 years ago.
bertcosine-similarityfasttextglovegptlanguage-modelnatural-language-processingnlppretrained-modelspsychologysemantic-analysistext-analysistext-miningtsneword-embeddingsword-vectorsword2vecopenjdk
3.8 match 22 stars 4.04 score 10 scripts
ai-sdc
acro:A Tool for Semi-Automating the Statistical Disclosure Control of Research Outputs
Assists researchers and output checkers by distinguishing between research output that is safe to publish, output that requires further analysis, and output that cannot be published because of substantial disclosure risk. A paper about the tool was presented at the UNECE Expert Meeting on Statistical Data Confidentiality 2023; see <https://uwe-repository.worktribe.com/output/11060964>.
Maintained by Jim Smith. Last updated 9 days ago.
data-privacydata-protectionprivacyprivacy-toolsstatistical-disclosure-controlstatistical-software
3.7 match 1 stars 4.11 score 1 scripts
bioc
MSstats:Protein Significance Analysis in DDA, SRM and DIA for Label-free or Label-based Proteomics Experiments
A set of tools for statistical relative protein significance analysis in DDA, SRM and DIA experiments.
Maintained by Meena Choi. Last updated 11 days ago.
immunooncologymassspectrometryproteomicssoftwarenormalizationqualitycontroltimecourseopenblascpp
1.8 match 8.49 score 164 scripts 7 dependents
selesnow
ractivecampaign:Loading Data from 'ActiveCampaign API v3'
Interface for loading data from 'ActiveCampaign API v3' <https://developers.activecampaign.com/reference>. Provides functions for getting data on deals, contacts, accounts, campaigns and messages.
Maintained by Alexey Seleznev. Last updated 2 years ago.
5.4 match 2.70 score 2 scripts
ouhscbbmc
REDCapR:Interaction Between R and REDCap
Encapsulates functions to streamline calls from R to the REDCap API. REDCap (Research Electronic Data CAPture) is a web application for building and managing online surveys and databases developed at Vanderbilt University. The Application Programming Interface (API) offers an avenue to access and modify data programmatically, improving the capacity for literate and reproducible programming.
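A hedged sketch of a single-call export; the URI and token are placeholders for your own REDCap project:
  library(REDCapR)
  ds <- redcap_read_oneshot(
    redcap_uri = "https://redcap.example.edu/api/",   # placeholder URI
    token      = Sys.getenv("REDCAP_API_TOKEN")       # placeholder token source
  )
  head(ds$data)   # records are returned in the $data element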
Maintained by Will Beasley. Last updated 2 months ago.
1.2 match 118 stars 12.36 score 438 scripts 6 dependents
rstudio
tfestimators:Interface to 'TensorFlow' Estimators
Interface to 'TensorFlow' Estimators <https://www.tensorflow.org/guide/estimator>, a high-level API that provides implementations of many different model types including linear models and deep neural networks.
Maintained by Tomasz Kalinowski. Last updated 3 years ago.
1.7 match 57 stars 8.42 score 170 scripts
selesnow
ryandexdirect:Load Data From 'Yandex Direct'
Load data from the 'Yandex Direct' API V5 <https://yandex.ru/dev/direct/doc/dg/concepts/about-docpage> into R. Provides functions to load lists of campaigns, ads, keywords and other objects from a 'Yandex Direct' account. You can also load statistics from the 'Reports Service' API <https://yandex.ru/dev/direct/doc/reports/reports-docpage>, and manage keyword bids.
Maintained by Alexey Seleznev. Last updated 1 months ago.
1.9 match 53 stars 7.54 score 44 scripts 1 dependents
paithiov909
audubon:Japanese Text Processing Tools
A collection of Japanese text processing tools for filling Japanese iteration marks, Japanese character type conversions, segmentation by phrase, and text normalization which is based on rules for the 'Sudachi' morphological analyzer and the 'NEologd' (Neologism dictionary for 'MeCab'). These features are specific to Japanese and are not implemented in 'ICU' (International Components for Unicode).
Maintained by Akiru Kato. Last updated 21 days ago.
2.5 match 10 stars 5.61 score 3 scripts 1 dependents
ropensci
popler:Popler R Package
Browse and query the popler database.
Maintained by Compagnoni Aldo. Last updated 5 years ago.
3.7 match 7 stars 3.82 score 47 scripts
ropensci
rperseus:Get Texts from the Perseus Digital Library
The Perseus Digital Library is a collection of classical texts. This package helps you get them. The available works can also be viewed here: <http://cts.perseids.org/>.
Maintained by David Ranzolin. Last updated 2 years ago.
classicsgreekgreek-biblegreek-new-testamentlatinpeer-reviewedperseusperseus-digital-librarytranslation
3.8 match 19 stars 3.74 score 29 scripts
grantmcdermott
ggfixest:Dedicated 'ggplot2' Methods for 'fixest' Objects
Provides 'ggplot2' equivalents of fixest::coefplot() and fixest::iplot(), for producing nice coefficient plots and interaction plots. Enables some additional functionality and convenience features, including grouped multi-'fixest' object faceting and programmatic updates to existing plots (e.g., themes and aesthetics).
Maintained by Grant McDermott. Last updated 2 months ago.
2.0 match 49 stars 7.01 score 28 scripts
cran
localScore:Package for Sequence Analysis by Local Score
Functionalities for calculating the local score and calculating statistical relevance (p-value) to find a local Score in a sequence of given distribution (S. Mercier and J.-J. Daudin (2001) <https://hal.science/hal-00714174/>) ; S. Karlin and S. Altschul (1990) <https://pmc.ncbi.nlm.nih.gov/articles/PMC53667/> ; S. Mercier, D. Cellier and F. Charlot (2003) <https://hal.science/hal-00937529v1/> ; A. Lagnoux, S. Mercier and P. Valois (2017) <doi:10.1093/bioinformatics/btw699> ).
Maintained by David Robelin. Last updated 20 days ago.
6.0 match 2.30 score 6 scripts
rubenarslan
codebook:Automatic Codebooks from Metadata Encoded in Dataset Attributes
Easily automate the following tasks to describe data frames: Summarise the distributions, and labelled missings of variables graphically and using descriptive statistics. For surveys, compute and summarise reliabilities (internal consistencies, retest, multilevel) for psychological scales. Combine this information with metadata (such as item labels and labelled values) that is derived from R attributes. To do so, the package relies on 'rmarkdown' partials, so you can generate HTML, PDF, and Word documents. Codebooks are also available as tables (CSV, Excel, etc.) and in JSON-LD, so that search engines can find your data and index the metadata. The metadata are also available at your fingertips via RStudio Addins.
Maintained by Ruben Arslan. Last updated 3 months ago.
codebookdocumentationformrjson-ldmetadataspsswebapp
1.7 match 142 stars 8.31 score 229 scripts
zhukovyuri
SUNGEO:Sub-National Geospatial Data Archive: Geoprocessing Toolkit
Tools for integrating spatially-misaligned GIS datasets. Part of the Sub-National Geospatial Data Archive System.
Maintained by Yuri M. Zhukov. Last updated 10 months ago.
4.0 match 5 stars 3.42 score 8 scripts
pwkraft
discursive:Measuring Discursive Sophistication in Open-Ended Survey Responses
A simple approach to measure political sophistication based on open-ended survey responses. Discursive sophistication captures the complexity of individual attitude expression by quantifying its relative size, range, and constraint. For more information on the measurement approach see: Kraft, Patrick W. 2023. "Women Also Know Stuff: Challenging the Gender Gap in Political Sophistication." American Political Science Review (forthcoming).
Maintained by Patrick Kraft. Last updated 2 years ago.
4.5 match 2 stars 3.00 score 5 scripts
jaimesalvador
minired:R Interface to 'Redatam' Library
This package is deprecated. Please use 'redatamx' instead. Provides an API to work with 'Redatam' (see <https://redatam.org>) databases in both formats: 'RXDB' (new format) and 'DICX' (old format) and running 'Redatam' programs written in 'SPC' language. It's a wrapper around 'Redatam' core and provides functions to open/close a database (redatam_open()/redatam_close()), list entities and variables from the database (redatam_entities(), redatam_variables()) and execute a 'SPC' program and gets the results as data frames (redatam_query(), redatam_run()).
Maintained by Jaime Salvador. Last updated 4 months ago.
6.8 match 2.00 score
condwanaland
words:List of English Words from the Scrabble Dictionary
List of English Scrabble words as listed in the OTCWL2014 <https://www.scrabbleplayers.org/w/Official_Tournament_and_Club_Word_List_2014_Edition>. Words are collated from the 'Word Game Dictionary' <https://www.wordgamedictionary.com/word-lists/>.
Maintained by Conor Neilson. Last updated 4 years ago.
3.5 match 3.80 score 42 scripts 1 dependents
tomeriko96
polyglotr:Translate Text
Provide easy methods to translate pieces of text. Functions send requests to translation services online.
Maintained by Tomer Iwan. Last updated 1 months ago.
google-translategoogletranslatelanguagelingueemymemory-apimymemorytranslatorponstranslationtranslations-api
1.8 match 33 stars 7.61 score 34 scripts 1 dependents
hughparsonage
heims:Decode and Validate HEIMS Data from Department of Education, Australia
Decode elements of the Australian Higher Education Information Management System (HEIMS) data for clarity and performance. HEIMS is the record system of the Department of Education, Australia to record enrolments and completions in Australia's higher education system, as well as a range of relevant information. For more information, including the source of the data dictionary, see <http://heimshelp.education.gov.au/sites/heimshelp/dictionary/pages/data-element-dictionary>.
Maintained by Hugh Parsonage. Last updated 7 years ago.
4.9 match 2.70 score 8 scripts
cran
exceldata:Streamline Data Import, Cleaning and Recoding from 'Excel'
A small group of functions to read in a data dictionary and the corresponding data table from 'Excel' and to automate the cleaning, re-coding and creation of simple calculated variables. This package was designed to be a companion to the macro-enabled 'Excel' template available on the GitHub site, but works with any similarly-formatted 'Excel' data.
Maintained by Lisa Avery. Last updated 1 years ago.
7.8 match 1.70 score
jonclayden
ore:An R Interface to the Onigmo Regular Expression Library
Provides an alternative to R's built-in functionality for handling regular expressions, based on the Onigmo library. Offers first-class compiled regex objects, partial matching and function-based substitutions, amongst other features.
Maintained by Jon Clayden. Last updated 2 days ago.
regexregular-expressionstext-analysis
1.8 match 58 stars 7.16 score 125 scripts 6 dependents
daya6489
SmartEDA:Summarize and Explore the Data
Exploratory analysis on any input data, describing the structure and the relationships present in the data. The package automatically selects the variables and does related descriptive statistics. Analysis of information value, weight of evidence, custom tables, summary statistics, and graphical techniques will be performed for both numeric and categorical predictors.
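A hedged sketch of a quick first pass on a built-in data set:
  library(SmartEDA)
  ExpData(data = mtcars, type = 1)   # dataset-level overview
  ExpData(data = mtcars, type = 2)   # variable-level structure
  ExpNumStat(mtcars, by = "A")       # descriptive statistics for numeric columns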
Maintained by Dayanand Ubrangala. Last updated 1 years ago.
analysisexploratory-data-analysis
1.8 match 42 stars 7.25 score 214 scripts
bnosac
ruimtehol:Learn Text 'Embeddings' with 'Starspace'
Wraps the 'StarSpace' library <https://github.com/facebookresearch/StarSpace> allowing users to calculate word, sentence, article, document, webpage, link and entity 'embeddings'. By using the 'embeddings', you can perform text based multi-label classification, find similarities between texts and categories, do collaborative-filtering based recommendation as well as content-based recommendation, find out relations between entities, calculate graph 'embeddings' as well as perform semi-supervised learning and multi-task learning on plain text. The techniques are explained in detail in the paper: 'StarSpace: Embed All The Things!' by Wu et al. (2017), available at <arXiv:1709.03856>.
Maintained by Jan Wijffels. Last updated 1 years ago.
classificationembeddingsnatural-language-processingnlpsimilaritystarspacetext-miningcpp
1.9 match 101 stars 6.65 score 44 scripts
bioc
amplican:Automated analysis of CRISPR experiments
`amplican` performs alignment of the amplicon reads, normalizes gathered data, calculates multiple statistics (e.g. cut rates, frameshifts) and presents results in form of aggregated reports. Data and statistics can be broken down by experiments, barcodes, user defined groups, guides and amplicons allowing for quick identification of potential problems.
Maintained by Eivind Valen. Last updated 5 months ago.
immunooncologytechnologyalignmentqpcrcrisprcpp
1.7 match 10 stars 7.54 score 41 scripts
myeomans
DICEM:Directness and Intensity of Conflict Expression
A Natural Language Processing Model trained to detect directness and intensity during conflict. See <https://www.mikeyeomans.info>.
Maintained by Michael Yeomans. Last updated 7 months ago.
3.8 match 3.30 score
spsanderson
healthyR.data:Data Only Package to 'healthyR'
Provides data for functions typically used in the 'healthyR' package.
Maintained by Steven Sanderson. Last updated 2 months ago.
datadata-sciencedata-setshealthcarehealthcare-analysishealthcare-applicationhealthcare-datasets
1.9 match 10 stars 6.52 score 105 scripts 1 dependents
epicentre-msf
redcap:R Utilities For REDCap
R utilities for interacting with the REDCap API.
Maintained by Patrick Barks. Last updated 3 months ago.
3.5 match 7 stars 3.45 score 5 scripts
keyatm
keyATM:Keyword Assisted Topic Models
Fits keyword assisted topic models (keyATM) using collapsed Gibbs samplers. The keyATM combines latent Dirichlet allocation (LDA) models with a small number of keywords selected by researchers in order to improve the interpretability and topic classification of the LDA. The keyATM can also incorporate covariates and directly model time trends. The keyATM is proposed in Eshima, Imai, and Sasaki (2024) <doi:10.1111/ajps.12779>.
Maintained by Shusei Eshima. Last updated 11 months ago.
latent-dirichlet-allocationnatural-language-processingpolitical-sciencercpprcppeigensocial-sciencetopic-modelscpp
1.9 match 106 stars 6.30 score 63 scripts
inzightvit
iNZightTools:Tools for 'iNZight'
Provides a collection of wrapper functions for common variable and dataset manipulation workflows primarily used by 'iNZight', a graphical user interface providing easy exploration and visualisation of data for students of statistics, available in both desktop and online versions. Additionally, many of the functions return the 'tidyverse' code used to obtain the result in an effort to bridge the gap between GUI and coding.
Maintained by Tom Elliott. Last updated 3 months ago.
2.3 match 1 stars 5.16 score 18 scripts 2 dependents
blasbenito
distantia:Advanced Toolset for Efficient Time Series Dissimilarity Analysis
Fast C++ implementation of Dynamic Time Warping for time series dissimilarity analysis, with applications in environmental monitoring and sensor data analysis, climate science, signal processing and pattern recognition, and financial data analysis. Built upon the ideas presented in Benito and Birks (2020) <doi:10.1111/ecog.04895>, provides tools for analyzing time series of varying lengths and structures, including irregular multivariate time series. Key features include individual variable contribution analysis, restricted permutation tests for statistical significance, and imputation of missing data via GAMs. Additionally, the package provides an ample set of tools to prepare and manage time series data.
Maintained by Blas M. Benito. Last updated 25 days ago.
dissimilaritydynamic-time-warpinglock-steptime-seriescpp
2.0 match 23 stars 5.76 score 11 scripts
entjos
TreeMineR:Tree-Based Scan Statistics
Implementation of unconditional Bernoulli Scan Statistic developed by Kulldorff et al. (2003) <doi:10.1111/1541-0420.00039> for hierarchical tree structures. Tree-based Scan Statistics are an exploratory method to identify event clusters across the space of a hierarchical tree.
Maintained by Joshua P. Entrop. Last updated 7 months ago.
3.4 match 3.40 score 2 scripts
nschuwirth
ecoval:Procedures for Ecological Assessment of Surface Waters
Functions for evaluating and visualizing ecological assessment procedures for surface waters containing physical, chemical and biological assessments in the form of value functions.
Maintained by Nele Schuwirth. Last updated 3 years ago.
8.5 match 1.34 score 22 scripts
cran
espadon:Easy Study of Patient DICOM Data in Oncology
Exploitation, processing and 2D-3D visualization of DICOM-RT files (structures, dosimetry, imagery) for medical physics and clinical research, in a patient-oriented perspective.
Maintained by Cathy Fontbonne. Last updated 1 months ago.
4.0 match 2.85 score