Showing 200 of 294 total results.
nflverse
nflreadr:Download 'nflverse' Data
A minimal package for downloading data from 'GitHub' repositories of the 'nflverse' project.
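A minimal usage sketch, assuming an internet connection; the season and the columns shown are arbitrary illustrations:

```r
library(nflreadr)

pbp <- load_pbp(2023)          # play-by-play data for one season
rosters <- load_rosters(2023)  # team rosters for the same season
head(pbp[, c("game_id", "posteam", "play_type")])
```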
Maintained by Tan Ho. Last updated 4 months ago.
nflnflfastrnflversesports-data
142.8 match 66 stars 12.46 score 476 scripts 10 dependents
sfeuerriegel
SentimentAnalysis:Dictionary-Based Sentiment Analysis
Performs a sentiment analysis of textual contents in R. This implementation utilizes various existing dictionaries, such as Harvard IV, or finance-specific dictionaries. Furthermore, it can also create customized dictionaries. The latter uses LASSO regularization as a statistical approach to select relevant terms based on an exogenous response variable.
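A small sketch of dictionary-based scoring; the example sentences are invented:

```r
library(SentimentAnalysis)

docs <- c("The outlook is excellent and profits are growing.",
          "Results were disappointing and losses increased.")
s <- analyzeSentiment(docs)           # scores against several built-in dictionaries
convertToDirection(s$SentimentQDAP)   # positive / neutral / negative per document
```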
Maintained by Nicolas Proellochs. Last updated 2 years ago.
61.2 match 149 stars 8.34 score 242 scripts 1 dependents
alexchristensen
SemNetDictionaries:Dictionaries for the 'SemNetCleaner' Package
Implements dictionaries that can be used in the 'SemNetCleaner' package. Also includes several functions aimed at facilitating the text cleaning analysis in the 'SemNetCleaner' package. This package is designed to integrate and update word lists and dictionaries based on each user's individual needs by allowing users to store and save their own dictionaries. Dictionaries can be added to the 'SemNetDictionaries' package by submitting user-defined dictionaries to <https://github.com/AlexChristensen/SemNetDictionaries>.
Maintained by Alexander P. Christensen. Last updated 3 years ago.
dictionariessemantic-network-analysis
89.4 match 4 stars 5.08 score 3 scripts 2 dependents
quanteda
quanteda:Quantitative Analysis of Textual Data
A fast, flexible, and comprehensive framework for quantitative text analysis in R. Provides functionality for corpus management, creating and manipulating tokens and n-grams, exploring keywords in context, forming and manipulating sparse matrices of documents by features and feature co-occurrences, analyzing keywords, computing feature similarities and distances, applying content dictionaries, applying supervised and unsupervised machine learning, visually representing text and text analyses, and more.
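A minimal sketch of applying a content dictionary; the texts and dictionary entries are made up for illustration:

```r
library(quanteda)

txt <- c(doc1 = "Taxes and inflation keep rising.",
         doc2 = "The new policy improves hospital access.")
dict <- dictionary(list(economy = c("tax*", "inflation", "econom*"),
                        health  = c("health*", "hospital*")))
toks <- tokens(txt, remove_punct = TRUE)
dfm(tokens_lookup(toks, dictionary = dict))   # counts of dictionary keys per document
```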
Maintained by Kenneth Benoit. Last updated 2 months ago.
corpusnatural-language-processingquantedatext-analyticsonetbbcpp
24.4 match 851 stars 16.68 score 5.4k scripts 51 dependents
koheiw
newsmap:Semi-Supervised Model for Geographical Document Classification
Semi-supervised model for geographical document classification (Watanabe 2018) <doi:10.1080/21670811.2017.1293487>. This package currently contains seed dictionaries in English, German, French, Spanish, Italian, Russian, Hebrew, Arabic, Turkish, Japanese and Chinese (Simplified and Traditional).
Maintained by Kohei Watanabe. Last updated 9 months ago.
machine-learningnews-storiesquantedatext-analysis
49.0 match 62 stars 6.05 score 8 scripts
stevecondylios
dictionaRy:Retrieve the Dictionary Definitions of English Words
An R interface to the 'Free Dictionary API' <https://dictionaryapi.dev/>, <https://github.com/meetDeveloper/freeDictionaryAPI>. Retrieve dictionary definitions for English words, as well as additional information including phonetics, part of speech, origins, audio pronunciation, example usage, synonyms and antonyms, returned in 'tidy' format for ease of use.
Maintained by Steve Condylios. Last updated 3 years ago.
literaturenatural-language-processingr-language
55.6 match 6 stars 4.86 score 240 scripts
maelstrom-research
madshapR:Support Technical Processes Following 'Maelstrom Research' Standards
Functions to support rigorous processes in data cleaning, evaluation, and documentation across datasets from different studies based on Maelstrom Research guidelines. The package includes the core functions to evaluate and format the main inputs that define the process, diagnose errors, and summarize and evaluate datasets and their associated data dictionaries. The main outputs are clean datasets and associated metadata, and tabular and visual summary reports. As described in Maelstrom Research guidelines for rigorous retrospective data harmonization (Fortier I et al. (2017) <doi:10.1093/ije/dyw075>).
Maintained by Guillaume Fabre. Last updated 11 months ago.
49.8 match 2 stars 5.40 score 28 scripts 3 dependents
hsvab
odbr:Download Data from Brazil's Origin Destination Surveys
Download data from Brazil's Origin Destination Surveys. The package covers data from household travel surveys, dictionaries of variables, and the spatial geometries of surveys conducted in different years and across various urban areas in Brazil. For some cities, the package will include enhanced versions of the data sets with variables "harmonized" across different years.
Maintained by Haydee Svab. Last updated 1 month ago.
31.1 match 16 stars 5.85 score 11 scripts
obiba
opalr:'Opal' Data Repository Client and 'DataSHIELD' Utils
Data integration Web application for biobanks by 'OBiBa'. 'Opal' is the core database application for biobanks. Participant data, once collected from any data source, must be integrated and stored in a central data repository under a uniform model. 'Opal' is such a central repository. It can import, process, validate, query, analyze, report, and export data. 'Opal' is typically used in a research center to analyze the data acquired at assessment centres. Its ultimate purpose is to achieve seamless data-sharing among biobanks. This 'Opal' client allows the user to interact with 'Opal' web services and to perform operations on the R server side. 'DataSHIELD' administration tools are also provided.
Maintained by Yannick Marcon. Last updated 2 months ago.
20.8 match 3 stars 7.76 score 179 scripts 2 dependents
mlr-org
mlr3:Machine Learning in R - Next Generation
Efficient, object-oriented programming on the building blocks of machine learning. Provides 'R6' objects for tasks, learners, resamplings, and measures. The package is geared towards scalability and larger datasets by supporting parallelization and out-of-memory data-backends like databases. While 'mlr3' focuses on the core computational operations, add-on packages provide additional functionality.
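A short sketch of the core objects on a built-in example task:

```r
library(mlr3)

task <- tsk("iris")                        # example task shipped with mlr3
learner <- lrn("classif.rpart")            # decision-tree learner
rr <- resample(task, learner, rsmp("cv", folds = 3))
rr$aggregate(msr("classif.acc"))           # cross-validated accuracy
```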
Maintained by Marc Becker. Last updated 4 days ago.
classificationdata-sciencemachine-learningmlr3regression
10.5 match 972 stars 14.86 score 2.3k scripts 35 dependents
cran
Rdiagnosislist:Manipulate SNOMED CT Diagnosis Lists
Functions and methods for manipulating 'SNOMED CT' concepts. The package contains functions for loading the 'SNOMED CT' release into a convenient R environment, selecting 'SNOMED CT' concepts using regular expressions, and navigating the 'SNOMED CT' ontology. It provides the 'SNOMEDconcept' S3 class for a vector of 'SNOMED CT' concepts (stored as 64-bit integers) and the 'SNOMEDcodelist' S3 class for a table of concept IDs with descriptions. The package can be used to construct sets of 'SNOMED CT' concepts for research (<doi:10.1093/jamia/ocac158>). For more information about 'SNOMED CT' visit <https://www.snomed.org/>.
Maintained by Anoop D. Shah. Last updated 2 months ago.
40.5 match 1 stars 3.60 score
trinker
lexicon:Lexicons for Text Analysis
A collection of lexical hash tables, dictionaries, and word lists.
Maintained by Tyler Rinker. Last updated 3 years ago.
hashlexiconlookupnames-frequentstopwordstext-dictionariestext-mining
14.9 match 111 stars 8.80 score 224 scripts 25 dependents
bioc
XNAString:Efficient Manipulation of Modified Oligonucleotide Sequences
The XNAString package allows for description of base sequences and associated chemical modifications in a single object. XNAString is able to capture single stranded, as well as double stranded molecules. Chemical modifications are represented as independent strings associated with different features of the molecules (base sequence, sugar sequence, backbone sequence, modifications) and can be read or written to a HELM notation. It also enables secondary structure prediction using RNAfold from ViennaRNA. XNAString is designed to be an efficient representation of nucleic-acid based therapeutics, and therefore stores information about target sequences and provides an interface for matching and alignment functions from the Biostrings and pwalign packages.
Maintained by Marianna Plucinska. Last updated 5 months ago.
sequencematchingalignmentsequencinggeneticscpp
30.7 match 4.18 score 4 scripts
kwb-r
kwb.utils:General Utility Functions Developed at KWB
This package contains some small helper functions that aim at improving the quality of code developed at Kompetenzzentrum Wasser gGmbH (KWB).
Maintained by Hauke Sonnenberg. Last updated 12 months ago.
17.3 match 8 stars 7.33 score 12 scripts 78 dependents
cjvanlissa
tidySEM:Tidy Structural Equation Modeling
A tidy workflow for generating, estimating, reporting, and plotting structural equation models using 'lavaan', 'OpenMx', or 'Mplus'. Throughout this workflow, elements of syntax, results, and graphs are represented as 'tidy' data, making them easy to customize. Includes functionality to estimate latent class analyses, and to plot 'dagitty' and 'igraph' objects.
Maintained by Caspar J. van Lissa. Last updated 7 days ago.
11.6 match 58 stars 10.69 score 330 scripts 1 dependents
nflverse
nflseedR:Functions to Efficiently Simulate and Evaluate NFL Seasons
A set of functions to simulate National Football League seasons including the sophisticated tie-breaking procedures.
Maintained by Sebastian Carl. Last updated 5 days ago.
football-simulationnflseason-simulations
18.4 match 23 stars 6.32 score 34 scripts 1 dependents
ropensci
hunspell:High-Performance Stemmer, Tokenizer, and Spell Checker
Low level spell checker and morphological analyzer based on the famous 'hunspell' library <https://hunspell.github.io>. The package can analyze or check individual words as well as parse text, latex, html or xml documents. For a more user-friendly interface use the 'spelling' package which builds on this package to automate checking of files, documentation and vignettes in all common formats.
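A quick sketch using the default 'en_US' dictionary that ships with the package:

```r
library(hunspell)

words <- c("beautiful", "definately", "langauge")
ok <- hunspell_check(words)      # TRUE/FALSE per word
hunspell_suggest(words[!ok])     # suggested corrections for the misspelled words
hunspell_stem("analyzing")       # morphological stems
```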
Maintained by Jeroen Ooms. Last updated 5 months ago.
hunspellspell-checkspellcheckerstemmertokenizercpp
8.8 match 111 stars 13.13 score 422 scripts 30 dependents
coolbutuseless
zstdlite:Fast Compression and Serialization with 'Zstandard' Algorithm
Fast, compressed serialization of R objects using the 'Zstandard' algorithm. The included zstandard connection ('zstdfile()') can be used to read/write compressed data by any code which supports R's built-in 'connections' mechanism. Dictionaries are supported for more effective compression of small data, and functions are provided for training these dictionaries. This implementation provides an R interface to advanced features of the 'Zstandard' 'C' library (available from <https://github.com/facebook/zstd>).
Maintained by Mike Cheng. Last updated 2 months ago.
23.4 match 30 stars 4.95 score 7 scripts
rstudio
reticulate:Interface to 'Python'
Interface to 'Python' modules, classes, and functions. When calling into 'Python', R data types are automatically converted to their equivalent 'Python' types. When values are returned from 'Python' to R they are converted back to R types. Compatible with all versions of 'Python' >= 2.7.
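A minimal sketch, assuming a Python installation that 'reticulate' can discover:

```r
library(reticulate)

py_run_string("d = {'a': 1, 'b': 2}")  # run Python code; 'd' is a Python dict
py$d                                   # returned to R as a named list
r_to_py(list(x = 1:3))                 # R object converted to its Python equivalent
```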
Maintained by Tomasz Kalinowski. Last updated 1 day ago.
5.3 match 1.7k stars 21.07 score 18k scripts 427 dependents
sylvainloiseau
interlineaR:Importing Interlinearized Corpora and Dictionaries as Produced by Descriptive Linguistics Software
Interlinearized glossed texts (IGT) are used in descriptive linguistics for representing a morphological analysis of a text through a morpheme-by-morpheme gloss. 'InterlineaR' provides a set of functions that targets several popular formats of IGT ('SIL Toolbox', 'EMELD XML') and that turns an IGT into a set of data frames following a relational model (the tables represent the different linguistic units: texts, sentences, words, morphemes). The same pieces of software ('SIL FLEX', 'SIL Toolbox') typically produce dictionaries of the morphemes used in the glosses. 'InterlineaR' provides a function for turning the LIFT XML dictionary format into a set of data frames following a relational model in order to represent the dictionary entries, the sense(s) attached to the entries, the example(s) attached to senses, etc.
Maintained by Sylvain Loiseau. Last updated 7 years ago.
corpus-linguisticsdescriptive-linguisticsdictionariesinterlinear-gloss
21.3 match 4 stars 4.60 score 9 scripts
apache
arrow:Integration to 'Apache' 'Arrow'
'Apache' 'Arrow' <https://arrow.apache.org/> is a cross-language development platform for in-memory data. It specifies a standardized language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern hardware. This package provides an interface to the 'Arrow C++' library.
Maintained by Jonathan Keane. Last updated 1 month ago.
5.0 match 15k stars 19.22 score 10k scripts 81 dependents
r4epi
epidict:Epidemiology data dictionaries and random data generators
The 'R4EPIs' project <https://R4epis.netlify.com> seeks to provide a set of standardized tools for analysis of outbreak and survey data in humanitarian aid settings. This package currently provides standardized data dictionaries from MSF OCA for four outbreak scenarios (Acute Jaundice Syndrome, Cholera, Measles, Meningitis) and three surveys (Retrospective mortality and access to care, Malnutrition, and Vaccination coverage). In addition, a data generator from these dictionaries is provided.
Maintained by Alexander Spina. Last updated 10 days ago.
21.2 match 3 stars 4.43 score 5 scripts 1 dependents
mlr-org
mlr3misc:Helper Functions for 'mlr3'
Frequently used helper functions and assertions used in 'mlr3' and its companion packages. Comes with helper functions for functional programming, for printing, to work with 'data.table', as well as some generally useful 'R6' classes. This package also supersedes the package 'BBmisc'.
Maintained by Marc Becker. Last updated 4 months ago.
machine-learningmiscellaneousmlr3
8.8 match 12 stars 10.28 score 302 scripts 42 dependents
rmi-pacta
pacta.multi.loanbook:Run 'PACTA' on Multiple Loan Books Easily
Run Paris Agreement Capital Transition Assessment ('PACTA') analyses on multiple loan books in a structured way. Provides access to standard 'PACTA' metrics and additional 'PACTA'-related metrics for multiple loan books. Results take the form of 'csv' files and plots and are exported to user-specified project paths.
Maintained by Jacob Kastl. Last updated 2 days ago.
climate-changepactapactaversesustainable-finance
13.7 match 6.48 score 4 scripts
cvxgrp
CVXR:Disciplined Convex Optimization
An object-oriented modeling language for disciplined convex programming (DCP) as described in Fu, Narasimhan, and Boyd (2020, <doi:10.18637/jss.v094.i14>). It allows the user to formulate convex optimization problems in a natural way following mathematical convention and DCP rules. The system analyzes the problem, verifies its convexity, converts it into a canonical form, and hands it off to an appropriate solver to obtain the solution. Interfaces to solvers on CRAN and elsewhere are provided, both commercial and open source.
Maintained by Anqi Fu. Last updated 4 months ago.
6.8 match 207 stars 12.89 score 768 scripts 51 dependents
chris31415926535
tardis:Text Analysis with Rules and Dictionaries for Inferring Sentiment
Measure text's sentiment with dictionaries and simple rules covering negations and modifiers. User-supplied dictionaries are supported, including Unicode emojis and multi-word tokens, so this package can also be used to study constructs beyond sentiment.
Maintained by Christopher Belanger. Last updated 2 years ago.
21.6 match 2 stars 4.00 score 10 scripts
dbosak01
libr:Libraries, Data Dictionaries, and a Data Step for R
Contains a set of functions to create data libraries, generate data dictionaries, and simulate a data step. The libname() function will load a directory of data into a library in one line of code. The dictionary() function will generate data dictionaries for individual data frames or an entire library. The datastep() function will perform row-by-row data processing.
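A brief sketch of the three functions named above; "data_dir" is a placeholder path and the arguments are simplified:

```r
library(libr)

libname(dat, "data_dir")          # "data_dir" is a placeholder folder of data files
d <- dictionary(mtcars)           # data dictionary for a single data frame
d2 <- datastep(mtcars, {          # row-by-row processing in data-step style
  kml <- mpg * 0.425
})
```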
Maintained by David Bosak. Last updated 3 months ago.
10.4 match 27 stars 8.27 score 48 scripts 2 dependents
dmrodz
dataMeta:Create and Append a Data Dictionary for an R Dataset
Designed to create a basic data dictionary and append to the original dataset's attributes list. The package makes use of a tidy dataset and creates a data frame that will serve as a linker that will aid in building the dictionary. The dictionary is then appended to the list of the original dataset's attributes. The user will have the option of entering variable and item descriptions by writing code or by using alternate functions that will prompt the user to add these.
Maintained by Dania M. Rodriguez. Last updated 3 years ago.
14.5 match 23 stars 5.54 score 15 scripts
rstudio
pointblank:Data Validation and Organization of Metadata for Local and Remote Tables
Validate data in data frames, 'tibble' objects, 'Spark' 'DataFrames', and database tables. Validation pipelines can be made using easily-readable, consecutive validation steps. Upon execution of the validation plan, several reporting options are available. User-defined thresholds for failure rates allow for the determination of appropriate reporting actions. Many other workflows are available including an information management workflow, where the aim is to record, collect, and generate useful information on data tables.
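A compact sketch of a validation pipeline on the 'small_table' dataset bundled with the package:

```r
library(pointblank)

agent <- create_agent(tbl = small_table) |>
  col_vals_gt(vars(d), value = 100) |>    # numeric column should exceed 100
  col_is_character(vars(b)) |>            # column should be character
  interrogate()
agent                                      # prints the validation report
```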
Maintained by Richard Iannone. Last updated 9 days ago.
data-assertionsdata-checkerdata-dictionariesdata-framesdata-inferencedata-managementdata-profilerdata-qualitydata-validationdata-verificationdatabase-tableseasy-to-understandreporting-toolschema-validationtesting-toolsyaml-configuration
7.5 match 932 stars 10.59 score 284 scripts
a-maldet
labelmachine:Make Labeling of R Data Sets Easy
Assign meaningful labels to data frame columns. 'labelmachine' manages your label assignment rules in 'yaml' files and makes it easy to use the same labels in multiple projects.
Maintained by Adrian Maldet. Last updated 5 years ago.
13.7 match 7 stars 5.26 score 13 scripts
larmarange
labelled:Manipulating Labelled Data
Work with labelled data imported from 'SPSS' or 'Stata' with 'haven' or 'foreign'. This package provides useful functions to deal with "haven_labelled" and "haven_labelled_spss" classes introduced by 'haven' package.
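A small sketch of working with value labels:

```r
library(labelled)

x <- labelled(c(1, 2, 2, 9),
              labels = c(Male = 1, Female = 2, Refused = 9))
val_labels(x)   # inspect the value labels
to_factor(x)    # convert to a regular factor using the labels
```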
Maintained by Joseph Larmarange. Last updated 26 days ago.
havenlabelsmetadatasasspssstata
4.6 match 76 stars 15.02 score 2.4k scripts 96 dependents
oobianom
r2dictionary:A Mini-Dictionary for 'Shiny' and 'Rmarkdown' Documents
Despite the predominant use of R for data manipulation and various robust statistical calculations, in recent years, more people from various disciplines are beginning to use R for other purposes. A critical milestone that has enabled a large influx of users in the R community is the development of the Tidyverse family of packages and Rmarkdown. With the latter, one can write all kinds of documents and produce output in formats such as html and pdf very easily. In doing this seamlessly, further tools are needed for such users to easily and freely write in R for all kinds of purposes. The r2dictionary package introduces a means for users to directly search for definitions of terms within the R environment.
Maintained by Obinna Obianom. Last updated 2 years ago.
17.0 match 2 stars 4.00 score 3 scripts
epicentre-msf
dbc:Dictionary-Based Cleaning
Tools for dictionary-based data cleaning.
Maintained by Patrick Barks. Last updated 1 year ago.
27.4 match 2 stars 2.48 score 4 scripts 1 dependents
vgherard
sbo:Text Prediction via Stupid Back-Off N-Gram Models
Utilities for training and evaluating text predictors based on Stupid Back-Off N-gram models (Brants et al., 2007, <https://www.aclweb.org/anthology/D07-1090/>).
Maintained by Valerio Gherardi. Last updated 4 years ago.
natural-language-processingngram-modelspredictive-textsbocpp
13.8 match 10 stars 4.78 score 12 scripts
bioc
transcriptogramer:Transcriptional analysis based on transcriptograms
R package for transcriptional analysis based on transcriptograms, a method to analyze transcriptomes that projects expression values on a set of ordered proteins, arranged such that the probability that gene products participate in the same metabolic pathway exponentially decreases with the increase of the distance between two proteins of the ordering. Transcriptograms are, hence, genome wide gene expression profiles that provide a global view for the cellular metabolism, while indicating gene sets whose expressions are altered.
Maintained by Diego Morais. Last updated 5 months ago.
softwarenetworkvisualizationsystemsbiologygeneexpressiongenesetenrichmentgraphandnetworkclusteringdifferentialexpressionmicroarrayrnaseqtranscriptionimmunooncology
13.5 match 4 stars 4.81 score 9 scripts
mjockers
syuzhet:Extracts Sentiment and Sentiment-Derived Plot Arcs from Text
Extracts sentiment and sentiment-derived plot arcs from text using a variety of sentiment dictionaries conveniently packaged for consumption by R users. Implemented dictionaries include "syuzhet" (default) developed in the Nebraska Literary Lab, "afinn" developed by Finn Årup Nielsen, "bing" developed by Minqing Hu and Bing Liu, and "nrc" developed by Mohammad, Saif M. and Turney, Peter D. Applicable references are available in README.md and in the documentation for the "get_sentiment" function. The package also provides a hack for implementing Stanford's coreNLP sentiment parser. The package provides several methods for plot arc normalization.
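A minimal sketch comparing two of the bundled dictionaries; the example sentences are invented:

```r
library(syuzhet)

txt <- c("I love this wonderful, sunny day.",
         "The ending was bleak and deeply sad.")
get_sentiment(txt, method = "syuzhet")
get_sentiment(txt, method = "afinn")
get_nrc_sentiment(txt)   # emotion categories from the NRC lexicon
```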
Maintained by Matthew Jockers. Last updated 2 years ago.
4.8 match 336 stars 12.92 score 1.4k scripts 31 dependents
qinwf
jiebaR:Chinese Text Segmentation
Chinese text segmentation, keyword extraction and part-of-speech tagging for R.
Maintained by Qin Wenfeng. Last updated 5 years ago.
chinesechinese-text-segmentationcppjiebajiebalexical-analysisnlpcpp
6.0 match 348 stars 10.18 score 456 scripts 6 dependents
kasperwelbers
corpustools:Managing, Querying and Analyzing Tokenized Text
Provides text analysis in R, focusing on the use of a tokenized text format. In this format, the positions of tokens are maintained, and each token can be annotated (e.g., part-of-speech tags, dependency relations). Prominent features include advanced Lucene-like querying for specific tokens or contexts (e.g., documents, sentences), similarity statistics for words and documents, exporting to DTM for compatibility with many text analysis packages, and the possibility to reconstruct original text from tokens to facilitate interpretation.
Maintained by Kasper Welbers. Last updated 6 months ago.
8.1 match 31 stars 7.50 score 174 scripts 1 dependents
revelle
psychTools:Tools to Accompany the 'psych' Package for Psychological Research
Support functions, data sets, and vignettes for the 'psych' package. Contains several of the biggest data sets for the 'psych' package as well as four vignettes. A few helper functions for file manipulation are included as well. For more information, see the <https://personality-project.org/r/> web page.
Maintained by William Revelle. Last updated 12 months ago.
10.3 match 5.89 score 178 scripts 5 dependents
agusnieto77
ACEP:Análisis Computacional de Eventos de Protesta
The 'ACEP' library contains specific functions to perform computational analysis of protest events. It also contains a database with collections of notes on protests and dictionaries of conflict-related words; the collection brings together dictionaries from different sources.
Maintained by Agustín Nieto. Last updated 1 year ago.
computer-aided-detectionconflict-analysisconflict-detectiondictionariesnlp-keywords-extractionprotest-eventstext-miningvisualization
10.9 match 10 stars 5.48 score 9 scripts
wa-department-of-agriculture
soils:Visualize and Report Soil Health Data
Collection of soil health data visualization and reporting tools, including an RStudio project template with everything you need to generate custom HTML and Microsoft Word reports for each participant in your soil health sampling project.
Maintained by Jadey N Ryan. Last updated 1 month ago.
10.0 match 11 stars 5.74 score 9 scripts
selesnow
rgoogleads:Loading Data from 'Google Ads API'
Interface for loading data from the 'Google Ads API', see <https://developers.google.com/google-ads/api/docs/start>. The package provides functions for authorization and loading reports.
Maintained by Alexey Seleznev. Last updated 2 months ago.
8.5 match 14 stars 6.40 score 15 scripts 1 dependents
lrberge
fixest:Fast Fixed-Effects Estimations
Fast and user-friendly estimation of econometric models with multiple fixed-effects. Includes ordinary least squares (OLS), generalized linear models (GLM) and the negative binomial. The core of the package is based on optimized parallel C++ code, scaling especially well for large data sets. The method to obtain the fixed-effects coefficients is based on Berge (2018) <https://github.com/lrberge/fixest/blob/master/_DOCS/FENmlm_paper.pdf>. Further provides tools to export and view the results of several estimations with intuitive design to cluster the standard-errors.
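A small sketch of a fixed-effects estimation with clustered standard errors; mtcars stands in for a real panel dataset:

```r
library(fixest)

est <- feols(mpg ~ wt + hp | cyl, data = mtcars)  # cyl used as a fixed effect
summary(est, cluster = ~cyl)                      # cluster-robust standard errors
etable(est)                                       # compact results table
```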
Maintained by Laurent Berge. Last updated 7 months ago.
3.7 match 387 stars 14.69 score 3.8k scripts 25 dependents
epiverse-trace
ColOpenData:Download Colombian Demographic, Climate and Geospatial Data
Downloads wrangled Colombian socioeconomic, geospatial, population and climate data from DANE <https://www.dane.gov.co/> (National Administrative Department of Statistics) and IDEAM <https://ideam.gov.co> (Institute of Hydrology, Meteorology and Environmental Studies). It solves the problem of Colombian data being issued in different web pages and sources by using functions that allow the user to select the desired database and download it without having to go through the exhausting acquisition process.
Maintained by Maria Camila Tavera-Cifuentes. Last updated 1 month ago.
climatecolombiadata-packagedemographicsmaps
7.3 match 11 stars 7.44 score 17 scripts
ctn-0094
DOPE:Drug Ontology Parsing Engine
Provides information on drug names (brand, generic and street) for drugs tracked by the DEA. There are functions that will search synonyms and return the drug names and types. The vignettes have extensive information on the work done to create the data for the package.
Maintained by Raymond Balise. Last updated 4 years ago.
6.8 match 21 stars 7.83 score 31 scripts
mlr-org
mlr3pipelines:Preprocessing Operators and Pipelines for 'mlr3'
Dataflow programming toolkit that enriches 'mlr3' with a diverse set of pipelining operators ('PipeOps') that can be composed into graphs. Operations exist for data preprocessing, model fitting, and ensemble learning. Graphs can themselves be treated as 'mlr3' 'Learners' and can therefore be resampled, benchmarked, and tuned.
Maintained by Martin Binder. Last updated 8 days ago.
baggingdata-sciencedataflow-programmingensemble-learningmachine-learningmlr3pipelinespreprocessingstacking
4.3 match 141 stars 12.36 score 448 scripts 7 dependents
pzhaonet
pinyin:Convert Chinese Characters into Pinyin, Sijiao, Wubi or Other Codes
Convert Chinese characters into Pinyin (the official romanization system for Standard Chinese in mainland China, Malaysia, Singapore, and Taiwan. See <https://en.wikipedia.org/wiki/Pinyin> for details), Sijiao (four or five numerical digits per character. See <https://en.wikipedia.org/wiki/Four-Corner_Method>.), Wubi (an input method with five strokes. See <https://en.wikipedia.org/wiki/Wubi_method>) or user-defined codes.
Maintained by Peng Zhao. Last updated 5 years ago.
bookdownchinese-characterspinyin
9.1 match 49 stars 5.71 score 35 scripts 1 dependents
mlr-org
mlr3mbo:Flexible Bayesian Optimization
A modern and flexible approach to Bayesian Optimization / Model Based Optimization building on the 'bbotk' package. 'mlr3mbo' is a toolbox providing both ready-to-use optimization algorithms as well as their fundamental building blocks allowing for straightforward implementation of custom algorithms. Single- and multi-objective optimization is supported as well as mixed continuous, categorical and conditional search spaces. Moreover, using 'mlr3mbo' for hyperparameter optimization of machine learning models within the 'mlr3' ecosystem is straightforward via 'mlr3tuning'. Examples of ready-to-use optimization algorithms include Efficient Global Optimization by Jones et al. (1998) <doi:10.1023/A:1008306431147>, ParEGO by Knowles (2006) <doi:10.1109/TEVC.2005.851274> and SMS-EGO by Ponweiser et al. (2008) <doi:10.1007/978-3-540-87700-4_78>.
Maintained by Lennart Schneider. Last updated 12 days ago.
automlbayesian-optimizationbbotkblack-box-optimizationgaussian-processhpohyperparameterhyperparameter-optimizationhyperparameter-tuningmachine-learningmlr3model-based-optimizationoptimizationoptimizerrandom-foresttuning
6.0 match 25 stars 8.57 score 120 scripts 3 dependents
trinker
qdapTools:Tools for the 'qdap' Package
A collection of tools associated with the 'qdap' package that may be useful outside of the context of text analysis.
Maintained by Tyler Rinker. Last updated 2 years ago.
7.2 match 16 stars 7.04 score 408 scripts 5 dependents
bioc
MetMashR:Metabolite Mashing with R
A package to merge, filter, sort, organise and otherwise mash together metabolite annotation tables. Metabolite annotations can be imported from multiple sources (software) and combined using workflow steps based on S4 class templates derived from the `struct` package. Other modular workflow steps such as filtering, merging, splitting, normalisation and rest-api queries are included.
Maintained by Gavin Rhys Lloyd. Last updated 5 months ago.
8.8 match 2 stars 5.81 score 5 scripts
drjphughesjr
hash:Full Featured Implementation of Hash Tables/Associative Arrays/Dictionaries
Implements a data structure similar to hashes in Perl and dictionaries in Python but with a purposefully R flavor. For objects of appreciable size, access using hashes outperforms native named lists and vectors.
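A minimal sketch of the basic operations:

```r
library(hash)

h <- hash(keys = c("apple", "pear"), values = c(3, 7))
h[["plum"]] <- 1      # insert
h[["apple"]]          # lookup
keys(h)
has.key("pear", h)
```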
Maintained by John Hughes. Last updated 2 years ago.
6.7 match 1 stars 7.54 score 4.0k scripts 50 dependents
ropensci
gendercoder:Recodes Sex/Gender Descriptions into a Standard Set
Provides functions and dictionaries for recoding of freetext gender responses into more consistent categories.
Maintained by Yaoxiang Li. Last updated 1 month ago.
gender-diversityozunconf18unconf
7.8 match 46 stars 6.36 score 45 scripts
usepa
tcpl:ToxCast Data Analysis Pipeline
The ToxCast Data Analysis Pipeline ('tcpl') is an R package that manages, curve-fits, plots, and stores ToxCast data to populate its linked MySQL database, 'invitrodb'. The package was developed for the chemical screening data curated by the US EPA's Toxicity Forecaster (ToxCast) program, but 'tcpl' can be used to support diverse chemical screening efforts.
Maintained by Jason Brown. Last updated 2 days ago.
5.3 match 36 stars 9.41 score 90 scripts
gesistsa
oolong:Create Validation Tests for Automated Content Analysis
Intended to create standard human-in-the-loop validity tests for typical automated content analysis such as topic modeling and dictionary-based methods. This package offers a standard workflow with functions to prepare, administer and evaluate a human-in-the-loop validity test. This package provides functions for validating topic models using word intrusion, topic intrusion (Chang et al. 2009, <https://papers.nips.cc/paper/3700-reading-tea-leaves-how-humans-interpret-topic-models>) and word set intrusion (Ying et al. 2021) <doi:10.1017/pan.2021.33> tests. This package also provides functions for generating gold-standard data which are useful for validating dictionary-based methods. The default settings of all generated tests match those suggested in Chang et al. (2009) and Song et al. (2020) <doi:10.1080/10584609.2020.1723752>.
Maintained by Chung-hong Chan. Last updated 19 days ago.
textanalysistopicmodelingvalidation
6.5 match 54 stars 7.57 score 23 scripts
randy3k
collections:High Performance Container Data Types
Provides high performance container data types such as queues, stacks, deques, dicts and ordered dicts. Benchmarks <https://randy3k.github.io/collections/articles/benchmark.html> have shown that these containers are asymptotically more efficient than those offered by other packages.
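A quick sketch of a dict and a queue:

```r
library(collections)

d <- dict()
d$set("apple", 1)
d$set("pear", 2)
d$get("apple")
d$keys()

q <- queue()
q$push("first")
q$push("second")
q$pop()    # FIFO: returns "first"
```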
Maintained by Randy Lai. Last updated 2 years ago.
5.3 match 104 stars 9.14 score 215 scripts 27 dependents
miserman
lingmatch:Linguistic Matching and Accommodation
Measure similarity between texts. Offers a variety of processing tools and similarity metrics to facilitate flexible representation of texts and matching. Implements forms of Language Style Matching (Ireland & Pennebaker, 2010) <doi:10.1037/a0020386> and Latent Semantic Analysis (Landauer & Dumais, 1997) <doi:10.1037/0033-295X.104.2.211>.
Maintained by Micah Iserman. Last updated 26 days ago.
9.8 match 11 stars 4.80 score 23 scripts
tidymodels
embed:Extra Recipes for Encoding Predictors
Predictors can be converted to one or more numeric representations using a variety of methods. Effect encodings using simple generalized linear models <doi:10.48550/arXiv.1611.09477> or nonlinear models <doi:10.48550/arXiv.1604.06737> can be used. There are also functions for dimension reduction and other approaches.
Maintained by Emil Hvitfeldt. Last updated 2 months ago.
5.0 match 142 stars 9.35 score 1.1k scripts
epiverse-trace
cleanepi:Clean and Standardize Epidemiological Data
A package for cleaning and standardizing tabular data, tailored specifically for curating epidemiological data. It streamlines various data cleaning tasks that are typically expected when working with datasets in epidemiology. It returns the processed data in the same format, and generates a comprehensive report detailing the outcomes of each cleaning task.
Maintained by Karim Mané. Last updated 2 days ago.
data-cleaningepidemiologyepiverse
6.3 match 9 stars 7.44 score 19 scripts
vubiostat
redcapAPI:Interface to 'REDCap'
Access data stored in 'REDCap' databases using the Application Programming Interface (API). 'REDCap' (Research Electronic Data CAPture; <https://projectredcap.org>, Harris, et al. (2009) <doi:10.1016/j.jbi.2008.08.010>, Harris, et al. (2019) <doi:10.1016/j.jbi.2019.103208>) is a web application for building and managing online surveys and databases developed at Vanderbilt University. The API allows users to access data and project meta data (such as the data dictionary) from the web programmatically. The 'redcapAPI' package facilitates the process of accessing data with options to prepare an analysis-ready data set consistent with the definitions in a database's data dictionary.
Maintained by Shawn Garbett. Last updated 9 days ago.
4.4 match 22 stars 10.47 score 134 scripts 2 dependents
xoopr
dictionar6:R6 Dictionary Interface
Efficient object-oriented R6 dictionary capable of holding objects of any class, including R6. Typed and untyped dictionaries are provided as well as the 'usual' dictionary methods that are available in other OOP languages, for example listing keys, items, values, and methods to get/set these.
Maintained by Raphael Sonabend. Last updated 3 years ago.
11.8 match 4 stars 3.78 score 1 scripts 1 dependents
mlr-org
bbotk:Black-Box Optimization Toolkit
Features highly configurable search spaces via the 'paradox' package and optimizes every user-defined objective function. The package includes several optimization algorithms e.g. Random Search, Iterated Racing, Bayesian Optimization (in 'mlr3mbo') and Hyperband (in 'mlr3hyperband'). bbotk is the base package of 'mlr3tuning', 'mlr3fselect' and 'miesmuschel'.
Maintained by Marc Becker. Last updated 3 months ago.
bbotkblack-box-optimizationdata-sciencehyperparameter-optimizationhyperparameter-tuningmachine-learningmlr3optimization
4.5 match 22 stars 9.87 score 166 scripts 14 dependents
quadrama
DramaAnalysis:Analysis of Dramatic Texts
Analysis of preprocessed dramatic texts, with respect to literary research. The package provides functions to analyze and visualize information about characters, stage directions, the dramatic structure and the text itself. The dramatic texts are expected to be in CSV format, which can be installed from within the package; sample texts are provided. The package and the reasoning behind it are described in Reiter et al. (2017) <doi:10.18420/in2017_119>.
Maintained by Nils Reiter. Last updated 4 years ago.
corpus-linguisticsdigital-humanitiesdramadramatic-textsstatistics
9.0 match 15 stars 4.79 score 41 scripts
ropengov
eurostat:Tools for Eurostat Open Data
Tools to download data from the Eurostat database <https://ec.europa.eu/eurostat> together with search and manipulation utilities.
Maintained by Leo Lahti. Last updated 27 days ago.
3.8 match 239 stars 11.09 score 892 scripts 5 dependents
lwheinsberg
dbGaPCheckup:dbGaP Checkup
Contains functions that check for formatting of the Subject Phenotype data set and data dictionary as specified by the National Center for Biotechnology Information (NCBI) Database of Genotypes and Phenotypes (dbGaP) <https://www.ncbi.nlm.nih.gov/gap/docs/submissionguide/>.
Maintained by Lacey W. Heinsberg. Last updated 1 year ago.
8.4 match 4 stars 4.86 score 18 scripts
ipeagit
censobr:Download Data from Brazil's Population Census
Easy access to data from Brazil's population censuses. The package provides a simple and efficient way to download and read the data sets and the documentation of all the population censuses taken in and after 1960 in the country. The package is built on top of the 'Arrow' platform <https://arrow.apache.org/docs/r/>, which allows users to work with larger-than-memory census data using 'dplyr' familiar functions. <https://arrow.apache.org/docs/r/articles/arrow.html#analyzing-arrow-data-with-dplyr>.
Maintained by Rafael H. M. Pereira. Last updated 16 days ago.
brazilcensuscensus-datamicrodadosmicrodata
4.8 match 39 stars 8.38 score 79 scripts
paithiov909
gibasa:An Alternative 'Rcpp' Wrapper of 'MeCab'
A plain 'Rcpp' wrapper for 'MeCab' that can segment Chinese, Japanese, and Korean text into tokens. The main goal of this package is to provide an alternative to 'tidytext' using morphological analysis.
Maintained by Akiru Kato. Last updated 28 days ago.
8.0 match 15 stars 5.02 score 3 scripts
pharmacologie-caen
vigicaen:'VigiBase' Pharmacovigilance Database Toolbox
Perform the analysis of the World Health Organization (WHO) Pharmacovigilance database 'VigiBase' (Extract Case Level version), <https://who-umc.org/> e.g., load data, perform data management, disproportionality analysis, and descriptive statistics. Intended for pharmacovigilance routine use or studies. This package is NOT supported by, nor does it reflect the opinion of, the WHO or the Uppsala Monitoring Centre. Disproportionality methods are described by Norén et al (2013) <doi:10.1177/0962280211403604>.
Maintained by Charles Dolladille. Last updated 3 days ago.
datamanagementpharmacovigilance
6.3 match 1 stars 6.27 score 11 scripts
dgrtwo
fuzzyjoin:Join Tables Together on Inexact Matching
Join tables together based not on whether columns match exactly, but whether they are similar by some comparison. Implementations include string distance and regular expression matching.
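A small sketch of a string-distance join; the misspellings and lookup table are invented:

```r
library(fuzzyjoin)
library(dplyr)

typos <- tibble(word = c("recieve", "seperate"))
lookup <- tibble(correct = c("receive", "separate", "believe"))
stringdist_left_join(typos, lookup, by = c(word = "correct"), max_dist = 2)
```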
Maintained by David Robinson. Last updated 5 years ago.
3.0 match 678 stars 12.92 score 1.5k scripts 20 dependents
celehs
MAP:Multimodal Automated Phenotyping
Electronic health records (EHR) linked with biorepositories are a powerful platform for translational studies. A major bottleneck exists in the ability to phenotype patients accurately and efficiently. Towards that end, we developed an automated high-throughput phenotyping method integrating International Classification of Diseases (ICD) codes and narrative data extracted using natural language processing (NLP). Specifically, our proposed method, called MAP (Map Automated Phenotyping algorithm), fits an ensemble of latent mixture models on aggregated ICD and NLP counts along with healthcare utilization. The MAP algorithm yields a predicted probability of phenotype for each patient and a threshold for classifying subjects with phenotype yes/no (See Katherine P. Liao, et al. (2019) <doi:10.1093/jamia/ocz066>.).
Maintained by Thomas Charlon. Last updated 2 months ago.
4.5 match 6 stars 8.58 score 177 scripts 1 dependents
rolkra
explore:Simplifies Exploratory Data Analysis
Interactive data exploration with one line of code, automated reporting, or an easy-to-remember set of tidy functions for low-code exploratory data analysis.
Maintained by Roland Krasser. Last updated 3 months ago.
data-explorationdata-visualisationdecision-treesedarmarkdownshinytidy
3.3 match 228 stars 11.43 score 221 scripts 1 dependents
rstudio
rstudioapi:Safely Access the RStudio API
Access the RStudio API (if available) and provide informative error messages when it's not.
Maintained by Kevin Ushey. Last updated 4 months ago.
2.0 match 172 stars 18.81 score 3.6k scripts 2.1k dependents
hongyuanjia
eplusr:A Toolkit for Using Whole Building Simulation Program 'EnergyPlus'
A rich toolkit for using the whole building simulation program 'EnergyPlus' (<https://energyplus.net>), which enables programmatic navigation and modification of 'EnergyPlus' models and makes it less painful to do parametric simulations and analysis.
Maintained by Hongyuan Jia. Last updated 8 months ago.
energy-simulationenergyplusenergyplus-modelseplusepwiddidfparametric-simulationr6simulation
5.1 match 72 stars 7.20 score 91 scripts 4 dependents
vgherard
kgrams:Classical k-gram Language Models
Training and evaluating k-gram language models in R, supporting several probability smoothing techniques, perplexity computations, random text generation and more.
Maintained by Valerio Gherardi. Last updated 4 months ago.
language-modelsn-gramsnatural-language-processingcpp
7.0 match 7 stars 5.17 score 14 scripts 1 dependents
lucasgodeiro
TextForecast:Regression Analysis and Forecasting Using Textual Data from a Time-Varying Dictionary
Provides functionalities based on the paper "Time Varying Dictionary and the Predictive Power of FED Minutes" (Lima, 2018) <doi:10.2139/ssrn.3312483>. It selects the most predictive terms, which we call a time-varying dictionary, using supervised machine learning techniques such as lasso and elastic net.
Maintained by Lucas Godeiro. Last updated 5 years ago.
6.9 match 15 stars 5.18 score 20 scripts
yukai-yang
R6DS:R6 Reference Class Based Data Structures
Provides reference classes implementing some useful data structures. The package implements these data structures by using the reference class R6. Therefore, the classes of the data structures are also reference classes which means that their instances are passed by reference. The implemented data structures include stack, queue, double-ended queue, doubly linked list, set, dictionary and binary search tree. See for example <https://en.wikipedia.org/wiki/Data_structure> for more information about the data structures.
Maintained by Yukai Yang. Last updated 2 years ago.
binary-search-treesdata-structuresdequedictionarydoubly-linked-listfunctional-programmingmapqueuereference-classstacktraversal
10.5 match 5 stars 3.40 score 5 scripts
aidanmorales
rTwig:Realistic Quantitative Structure Models
Real Twig is a method to correct branch overestimation in quantitative structure models. Overestimated cylinders are correctly tapered using measured twig diameters of corresponding tree species. Supported quantitative structure modeling software includes 'TreeQSM', 'SimpleForest', 'Treegraph', and 'aRchi'. Also included is a novel database of twig diameters and tools for fractal analysis of point clouds.
Maintained by Aidan Morales. Last updated 13 days ago.
forestrylidarmodelingqsmrcppcpp
5.0 match 8 stars 7.10 score 13 scripts
rmi-pacta
r2dii.match:Tools to Match Corporate Lending Portfolios with Climate Data
These tools implement in R a fundamental part of the software 'PACTA' (Paris Agreement Capital Transition Assessment), which is a free tool that calculates the alignment between financial portfolios and climate scenarios (<https://www.transitionmonitor.com/>). Financial institutions use 'PACTA' to study how their capital allocation decisions align with climate change mitigation goals. This package matches data from corporate lending portfolios to asset level data from market-intelligence databases (e.g. power plant capacities, emission factors, etc.). This is the first step to assess if a financial portfolio aligns with climate goals.
Maintained by Jacob Kastl. Last updated 27 days ago.
4.5 match 7 stars 7.63 score 118 scripts 2 dependents
rmi-pacta
r2dii.analysis:Measure Climate Scenario Alignment of Corporate Loans
These tools help you to assess if a corporate lending portfolio aligns with climate goals. They summarize key climate indicators attributed to the portfolio (e.g. production, emission factors), and calculate alignment targets based on climate scenarios. They implement in R the last step of the free software 'PACTA' (Paris Agreement Capital Transition Assessment; <https://www.transitionmonitor.com/>). Financial institutions use 'PACTA' to study how their capital allocation decisions align with climate change mitigation goals.
Maintained by Jacob Kastl. Last updated 12 days ago.
4.5 match 12 stars 7.45 score 46 scripts 2 dependents
bioc
Biostrings:Efficient manipulation of biological strings
Memory efficient string containers, string matching algorithms, and other utilities, for fast manipulation of large biological sequences or sets of sequences.
Maintained by Hervé Pagès. Last updated 23 days ago.
sequencematchingalignmentsequencinggeneticsdataimportdatarepresentationinfrastructurebioconductor-packagecore-package
1.9 match 61 stars 17.83 score 8.6k scripts 1.2k dependents
rmi-pacta
r2dii.plot:Visualize the Climate Scenario Alignment of a Financial Portfolio
Create plots to visualize the alignment of a corporate lending financial portfolio to climate change scenarios based on climate indicators (production and emission intensities) across key climate relevant sectors of the 'PACTA' methodology (Paris Agreement Capital Transition Assessment; <https://www.transitionmonitor.com/>). Financial institutions use 'PACTA' to study how their capital allocation decisions align with climate change mitigation goals.
Maintained by Monika Furdyna. Last updated 13 days ago.
4.5 match 8 stars 7.31 score 33 scripts 5 dependents
melff
RKernel:Yet another R kernel for Jupyter
Provides a kernel for Jupyter.
Maintained by Martin Elff. Last updated 14 days ago.
jupyterjupyter-kerneljupyter-kernelsjupyter-notebook
7.0 match 38 stars 4.60 score
bioc
pgca:PGCA: An Algorithm to Link Protein Groups Created from MS/MS Data
Protein Group Code Algorithm (PGCA) is a computationally inexpensive algorithm to merge protein summaries from multiple experimental quantitative proteomics data. The algorithm connects two or more groups with overlapping accession numbers. In some cases, pairwise groups are mutually exclusive but they may still be connected by another group (or set of groups) with overlapping accession numbers. Thus, groups created by PGCA from multiple experimental runs (i.e., global groups) are called "connected" groups. These identified global protein groups enable the analysis of quantitative data available for protein groups instead of unique protein identifiers.
Maintained by Gabriela Cohen-Freue. Last updated 5 months ago.
workflowstepassaydomainproteomicsmassspectrometryimmunooncology
8.1 match 4.00 score 3 scripts
rafapereirabr
aopdata:Data from the 'Access to Opportunities Project (AOP)'
Download data from the 'Access to Opportunities Project (AOP)'. The 'aopdata' package brings annual estimates of access to employment, health, education and social assistance services by transport mode, as well as data on the spatial distribution of population, jobs, health care, schools and social assistance facilities at a fine spatial resolution for all cities included in the project. More info on the 'AOP' website <https://www.ipea.gov.br/acessooportunidades/en/>.
Maintained by Rafael H. M. Pereira. Last updated 2 months ago.
6.8 match 4.70 score 72 scripts
rich-iannone
intendo:A Group of Fun Datasets of Various Sizes and Differing Levels of Quality
Four datasets are provided here from the 'Intendo' game 'Super Jetroid'. It is data from the 2015 year of operation and it comprises a revenue table ('all_revenue'), a daily users table ('users_daily'), a user summary table ('user_summary'), and a table with data on all user sessions ('all_sessions'). These core datasets come in different sizes, and, each of them has a variant that was intentionally made faulty (totally riddled with errors and inconsistencies). This suite of tables is useful for testing with packages that focus on data validation and data documentation.
Maintained by Richard Iannone. Last updated 1 year ago.
8.0 match 9 stars 4.01 score 23 scripts
juliasilge
tidytext:Text Mining using 'dplyr', 'ggplot2', and Other Tidy Tools
Using tidy data principles can make many text mining tasks easier, more effective, and consistent with tools already in wide use. Much of the infrastructure needed for text mining with tidy data frames already exists in packages like 'dplyr', 'broom', 'tidyr', and 'ggplot2'. In this package, we provide functions and supporting data sets to allow conversion of text to and from tidy formats, and to switch seamlessly between tidy tools and existing text mining packages.
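A minimal sketch of tidy tokenization and counting; the example text is invented:

```r
library(tidytext)
library(dplyr)

df <- tibble(line = 1:2,
             text = c("Tidy text mining is fun", "and it is tidy indeed"))
df |>
  unnest_tokens(word, text) |>
  count(word, sort = TRUE)
```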
Maintained by Julia Silge. Last updated 11 months ago.
natural-language-processingtext-miningtidy-datatidyverse
1.8 match 1.2k stars 16.86 score 17k scripts 61 dependents
nalimilan
R.temis:Integrated Text Mining Solution
An integrated solution to perform a series of text mining tasks such as importing and cleaning a corpus, and analyses like terms and documents counts, lexical summary, terms co-occurrences and documents similarity measures, graphs of terms, correspondence analysis and hierarchical clustering. Corpora can be imported from spreadsheet-like files, directories of raw text files, as well as from 'Dow Jones Factiva', 'LexisNexis', 'Europresse' and 'Alceste' files.
Maintained by Milan Bouchet-Valat. Last updated 17 days ago.
6.0 match 27 stars 4.99 score 24 scripts
qtalr
qtkit:Quantitative Text Kit
Support package for the textbook "An Introduction to Quantitative Text Analysis for Linguists: Reproducible Research Using R" (Francom, 2024) <doi:10.4324/9781003393764>. Includes functions to acquire, clean, and analyze text data as well as functions to document and share the results of text analysis. The package is designed to be used in conjunction with the book, but can also be used as a standalone package for text analysis.
Maintained by Jerid Francom. Last updated 2 months ago.
5.9 match 5.03 score 12 scripts
vincentarelbundock
countrycode:Convert Country Names and Country Codes
Standardize country names, convert them into one of 40 different coding schemes, convert between coding schemes, and assign region descriptors.
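A quick sketch of converting between coding schemes:

```r
library(countrycode)

countrycode(c("Germany", "Brazil", "South Korea"),
            origin = "country.name", destination = "iso3c")
countrycode("DZA", origin = "iso3c", destination = "continent")
```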
Maintained by Vincent Arel-Bundock. Last updated 3 months ago.
2.0 match 351 stars 14.80 score 6.3k scripts 119 dependents
reconhub
matchmaker:Flexible Dictionary-Based Cleaning
Provides flexible dictionary-based cleaning that allows users to specify implicit and explicit missing data, regular expressions for both data and columns, and global matches, while respecting ordering of factors. This package is part of the 'RECON' (<https://www.repidemicsconsortium.org/>) toolkit for outbreak analysis.
Maintained by Zhian N. Kamvar. Last updated 5 years ago.
5.4 match 9 stars 5.43 score 9 scripts 2 dependents
yonicd
sinew:Package Development Documentation and Namespace Management
Manage package documentation and namespaces from the command line. Programmatically attach namespaces in R and Rmd scripts, populate Roxygen2 skeletons with information scraped from within functions, and populate the Imports field of the DESCRIPTION file.
Maintained by Jonathan Sidi. Last updated 1 years ago.
3.4 match 166 stars 8.54 score 88 scripts
rjdverse
rjd3toolkit:Utility Functions around 'JDemetra+ 3.0'
R Interface to 'JDemetra+ 3.x' (<https://github.com/jdemetra>) time series analysis software. It provides functions allowing to model time series (create outlier regressors, user-defined calendar regressors, UCARIMA models...), to test the presence of trading days or seasonal effects and also to set specifications in pre-adjustment and benchmarking when using rjd3x13 or rjd3tramoseats.
Maintained by Tanguy Barthelemy. Last updated 5 months ago.
jdemetraseasonal-adjustmenttimeseriesopenjdk
5.0 match 5 stars 5.81 score 48 scripts 15 dependents
maelstrom-research
Rmonize:Support Retrospective Harmonization of Data
Functions to support rigorous retrospective data harmonization processing, evaluation, and documentation across datasets from different studies based on Maelstrom Research guidelines. The package includes the core functions to evaluate and format the main inputs that define the harmonization process, apply specified processing rules to generate harmonized data, diagnose processing errors, and summarize and evaluate harmonized outputs. The main inputs that define the processing are a DataSchema (list and definitions of harmonized variables to be generated) and Data Processing Elements (processing rules to be applied to generate harmonized variables from study-specific variables). The main outputs of processing are harmonized datasets, associated metadata, and tabular and visual summary reports. As described in Maelstrom Research guidelines for rigorous retrospective data harmonization (Fortier I et al. (2017) <doi:10.1093/ije/dyw075>).
Maintained by Guillaume Fabre. Last updated 12 months ago.
5.1 match 5 stars 5.58 score 51 scripts
openwashdata
washr:Publication Toolkit for Water, Sanitation and Hygiene (WASH) Data
A toolkit to set up an R data package in a consistent structure. Automates tasks like tidy data export, data dictionary documentation, README and website creation, and citation management.
Maintained by Colin Walder. Last updated 4 months ago.
5.7 match 2 stars 4.95 score 7 scripts
myeomans
politeness:Detecting Politeness Features in Text
Detecting markers of politeness in English natural language. This package allows researchers to easily visualize and quantify politeness between groups of documents. This package combines prior research on the linguistic markers of politeness. We thank the Spencer Foundation, the Hewlett Foundation, and Harvard's Institute for Quantitative Social Science for support.
Maintained by Mike Yeomans. Last updated 1 month ago.
3.8 match 25 stars 7.49 score 41 scripts 1 dependents
myeomans
doc2concrete:Measuring Concreteness in Natural Language
Models for detecting concreteness in natural language. This package is built in support of Yeomans (2021) <doi:10.1016/j.obhdp.2020.10.008>, which reviews linguistic models of concreteness in several domains. Here, we provide an implementation of the best-performing domain-general model (from Brysbaert et al., (2014) <doi:10.3758/s13428-013-0403-5>) as well as two pre-trained models for the feedback and plan-making domains.
Maintained by Mike Yeomans. Last updated 1 year ago.
5.0 match 13 stars 5.59 score 20 scripts 1 dependents
mlverse
torch:Tensors and Neural Networks with 'GPU' Acceleration
Provides functionality to define and train neural networks similar to 'PyTorch' by Paszke et al (2019) <doi:10.48550/arXiv.1912.01703> but written entirely in R using the 'libtorch' library. Also supports low-level tensor operations and 'GPU' acceleration.
Maintained by Daniel Falbel. Last updated 5 days ago.
1.7 match 520 stars 16.52 score 1.4k scripts 38 dependents
dankelley
oce:Analysis of Oceanographic Data
Supports the analysis of Oceanographic data, including 'ADCP' measurements, measurements made with 'argo' floats, 'CTD' measurements, sectional data, sea-level time series, coastline and topographic data, etc. Provides specialized functions for calculating seawater properties such as potential temperature in either the 'UNESCO' or 'TEOS-10' equation of state. Produces graphical displays that conform to the conventions of the Oceanographic literature. This package is discussed extensively by Kelley (2018) "Oceanographic Analysis with R" <doi:10.1007/978-1-4939-8844-0>.
Maintained by Dan Kelley. Last updated 19 hours ago.
1.8 match 146 stars 15.42 score 4.2k scripts 18 dependents
paithiov909
kelpbeds:Dictionary Tool for 'MeCab'
Provides the source 'IPAdic' for 'MeCab'.
Maintained by Akiru Kato. Last updated 11 months ago.
16.2 match 1.70 score
loelschlaeger
oeli:Utilities for Developing Data Science Software
Some general helper functions that I (and maybe others) find useful when developing data science software.
Maintained by Lennart Oelschläger. Last updated 4 months ago.
5.0 match 2 stars 5.42 score 1 scripts 4 dependents
damoncharlesroberts
genCountR:Interacting with Roberts and Utych's (2019) Gendered Language Dictionary
Allows users to generate a gendered language score according to the gendered language dictionary in Roberts and Utych (2019) <doi:10.1177/1065912919874883>.
Maintained by Damon Roberts. Last updated 8 months ago.
6.8 match 4.00 score 2 scripts
five-dots
Dict:R6 Based Key-Value Dictionary Implementation
A key-value dictionary data structure based on an R6 class, designed to offer usage similar to dictionaries in other languages (e.g. 'Python'), with reference semantics and extensibility through R6.
Maintained by Shun Asai. Last updated 3 years ago.
5.5 match 4 stars 4.85 score 358 scripts
cran
tmcn:A Text Mining Toolkit for Chinese
A text mining toolkit for Chinese, which includes facilities for Chinese string processing, Chinese NLP support, and encoding detection and conversion. Moreover, it provides some functions to support the 'tm' package in Chinese.
Maintained by Jian Li. Last updated 6 years ago.
11.1 match 1 stars 2.38 score 5 dependents
mlr-org
mlr3tuning:Hyperparameter Optimization for 'mlr3'
Hyperparameter optimization package of the 'mlr3' ecosystem. It features highly configurable search spaces via the 'paradox' package and finds optimal hyperparameter configurations for any 'mlr3' learner. 'mlr3tuning' works with several optimization algorithms e.g. Random Search, Iterated Racing, Bayesian Optimization (in 'mlr3mbo') and Hyperband (in 'mlr3hyperband'). Moreover, it can automatically optimize learners and estimate the performance of optimized models with nested resampling.
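A hedged sketch of a random-search run; argument names (e.g. 'measures', 'term_evals') follow recent mlr3tuning releases and may differ in older versions, so check ?tune before relying on it:
  library(mlr3)
  library(mlr3tuning)
  instance <- tune(
    tuner      = tnr("random_search"),
    task       = tsk("sonar"),
    learner    = lrn("classif.rpart", cp = to_tune(1e-4, 1e-1)),
    resampling = rsmp("cv", folds = 3),
    measures   = msr("classif.ce"),
    term_evals = 20
  )
  instance$result   # best hyperparameter configuration found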
Maintained by Marc Becker. Last updated 3 months ago.
bbotkhyperparameter-optimizationhyperparameter-tuningmachine-learningmlr3optimizationtunetuning
2.3 match 55 stars 11.59 score 384 scripts 11 dependents
theogrost
NUSS:Mixed N-Grams and Unigram Sequence Segmentation
Segmentation of short text sequences - like hashtags - into a sequence of separate words, using a dictionary that may be built on a custom corpus of texts. A unigram dictionary is used to find the most probable sequence, and an n-gram approach is used to determine possible segmentations given the text corpus.
Maintained by Oskar Kosch. Last updated 8 months ago.
8.7 match 3.00 score 8 scripts
nikdata
RClimacell:R Wrapper for the 'Climacell' API
'Climacell' is a weather platform that provides hyper-local forecasts and weather data. This package enables the user to query the core layers of the time line interface of the 'Climacell' v4 API <https://www.climacell.co/weather-api/>. This package requires a valid API key. See vignettes for instructions on use.
Maintained by Nikhil Agarwal. Last updated 4 years ago.
climacellclimacell-apiweatherweather-api
6.5 match 4.00 score 5 scripts
inbo
checklist:A Thorough and Strict Set of Checks for R Packages and Source Code
An opinionated set of rules for R packages and R source code projects.
Maintained by Thierry Onkelinx. Last updated 26 days ago.
checklistcontinuous-integrationcontinuous-testingquality-assurance
3.4 match 19 stars 7.24 score 21 scripts 2 dependents
matrix-profile-foundation
tsmp:Time Series with Matrix Profile
A toolkit implementing the Matrix Profile concept that was created by CS-UCR <http://www.cs.ucr.edu/~eamonn/MatrixProfile.html>.
Maintained by Francisco Bischoff. Last updated 3 years ago.
algorithmmatrix-profilemotif-searchtime-seriescpp
3.3 match 72 stars 7.29 score 179 scripts 1 dependents
roux-ohdsi
allofus:Interface for 'All of Us' Researcher Workbench
Streamline use of the 'All of Us' Researcher Workbench (<https://www.researchallofus.org/data-tools/workbench/>) with tools to extract and manipulate data from the 'All of Us' database. Increase interoperability with the Observational Health Data Science and Informatics ('OHDSI') tool stack by decreasing reliance on 'All of Us' tools and allowing for cohort creation via 'Atlas'. Improve reproducible and transparent research using 'All of Us'.
Maintained by Rob Cavanaugh. Last updated 4 months ago.
3.4 match 16 stars 7.19 score 30 scripts
ctu-bern
redcaptools:Tools for exporting and working with REDCap data
Tools for exporting and working with REDCap data (e.g. adding labels, formatting dates).
Maintained by Alan G Haynes. Last updated 4 months ago.
5.3 match 4 stars 4.51 score 9 scripts
lgnbhl
BFS:Get Data from the Swiss Federal Statistical Office
Search and download data from the Swiss Federal Statistical Office (BFS) APIs <https://www.bfs.admin.ch/>.
Maintained by Felix Luginbuhl. Last updated 3 months ago.
3.6 match 18 stars 6.55 score 17 scripts
canmod
iidda:Processing Infectious Disease Datasets in IIDDA.
Part of an open toolchain for processing infectious disease datasets available through the IIDDA data repository.
Maintained by Steve Walker. Last updated 4 months ago.
3.9 match 6.07 score 133 scripts 3 dependents
ropensci
phylotaR:Automated Phylogenetic Sequence Cluster Identification from 'GenBank'
A pipeline for the identification, within taxonomic groups, of orthologous sequence clusters from 'GenBank' <https://www.ncbi.nlm.nih.gov/genbank/> as the first step in a phylogenetic analysis. The pipeline depends on a local alignment search tool and is, therefore, not dependent on differences in gene naming conventions and naming errors.
Maintained by Shixiang Wang. Last updated 8 months ago.
blastngenbankpeer-reviewedphylogeneticssequence-alignment
4.0 match 23 stars 5.86 score 156 scripts
ropensci
EndoMineR:Functions to mine endoscopic and associated pathology datasets
This package comprises the functions used to clean up endoscopic reports and pathology reports, as well as many of the scripts used for analysis. The scripts assume the endoscopy and histopathology data sets are already merged, but they can of course also be used with the unmerged datasets.
Maintained by Sebastian Zeki. Last updated 7 months ago.
endoscopygastroenterologypeer-reviewedsemi-structured-datatext-mining
4.3 match 13 stars 5.47 score 30 scripts
dylanb95
statespacer:State Space Modelling in 'R'
A tool that makes estimating models in state space form a breeze. See "Time Series Analysis by State Space Methods" by Durbin and Koopman (2012, ISBN: 978-0-19-964117-8) for details about the algorithms implemented.
Maintained by Dylan Beijers. Last updated 2 years ago.
cppdynamic-linear-modelforecastinggaussian-modelskalman-filtermathematical-modellingstate-spacestatistical-inferencestatistical-modelsstructural-analysistime-seriesopenblascppopenmp
3.8 match 15 stars 6.14 score 37 scripts
cardiomoon
webr:Data and Functions for Web-Based Analysis
Several analysis-related functions for the book entitled "Web-based Analysis without R in Your Computer" (written in Korean, ISBN 978-89-5566-185-9) by Keon-Woong Moon. The main function plot.htest() shows the distribution of the statistic for an object of class 'htest'.
Maintained by Keon-Woong Moon. Last updated 5 years ago.
3.4 match 33 stars 6.82 score 181 scripts
doctorbjones
datadictionary:Create a Data Dictionary
Creates a data dictionary from any dataframe or tibble in your R environment. You can opt to add variable labels. You can write the object directly to Excel.
Maintained by Bethany Jones. Last updated 1 days ago.
5.7 match 11 stars 4.00 score 18 scripts
sportsdataverse
hoopR:Access Men's Basketball Play by Play Data
A utility to quickly obtain clean and tidy men's basketball play by play data. Provides functions to access live play by play and box score data from ESPN <https://www.espn.com>, with shot locations when available. It is also a full wrapper for the NBA Stats API <https://www.nba.com/stats/>, as well as a scraping and aggregating interface for Ken Pomeroy's men's college basketball statistics website <https://kenpom.com>. It provides users with an active subscription the capability to scrape the website tables and analyze the data for themselves.
Maintained by Saiem Gilani. Last updated 1 years ago.
basketballcollege-basketballespnkenpomnbanba-analyticsnba-apinba-datanba-statisticsnba-statsnba-stats-apincaancaa-basketballncaa-bracketncaa-playersncaa-ratingsncaamsportsdataverse
3.3 match 91 stars 6.93 score 261 scripts
jonathanconrad98
docket:Insert R Data into 'Word' Documents
Populate data from an R environment into '.doc' and '.docx' templates. Create a template document in a program such as 'Word', and add strings encased in guillemet characters to create flags («example»). Use getDictionary() to create a dictionary of flags and replacement values, then call docket() to generate a populated document.
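An illustrative sketch only: the flag names are invented and the exact signatures of getDictionary() and docket() are assumptions, so check the help pages before relying on it:
  library(docket)
  dict <- getDictionary("template.docx")        # collect the «flags» found in the template (assumed signature)
  dict[dict[, 1] == "«client»", 2] <- "ACME"    # assumed two-column flag/value layout; «client» is a made-up flag
  docket("template.docx", dict, "filled.docx")  # write a populated copy (assumed argument order)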
Maintained by Jonathan Conrad. Last updated 1 years ago.
8.3 match 2.70 score 3 scripts
ideasybits
redatamx:R Interface to 'Redatam' Library
Provides an API to work with 'Redatam' (see <https://redatam.org>) databases in both formats: 'RXDB' (new format) and 'DICX' (old format) and running 'Redatam' programs written in 'SPC' language. It's a wrapper around 'Redatam' core and provides functions to open/close a database (redatam_open()/redatam_close()), list entities and variables from the database (redatam_entities(), redatam_variables()) and execute a 'SPC' program and gets the results as data frames (redatam_query(), redatam_run()).
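A hedged sketch of the workflow around the functions named above; the file path, entity name and SPC expression are placeholders, and the argument order is an assumption:
  library(redatamx)
  db <- redatam_open("census.rxdb")              # open an RXDB (or DICX) database
  redatam_entities(db)                           # list entities
  redatam_variables(db, "PERSON")                # list variables of one (hypothetical) entity
  res <- redatam_query(db, "freq PERSON.SEX")    # run an SPC expression, returned as a data frame
  redatam_close(db)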
Maintained by Jaime Salvador. Last updated 3 months ago.
6.8 match 3.30 score 2 scripts
laresbernardo
lares:Analytics & Machine Learning Sidekick
Auxiliary package for better/faster analytics, visualization, data mining, and machine learning tasks. With a wide variety of family functions, like Machine Learning, Data Wrangling, Marketing Mix Modeling (Robyn), Exploratory, API, and Scraper, it helps the analyst or data scientist to get quick and robust results, without the need for repetitive coding or advanced R programming skills.
Maintained by Bernardo Lares. Last updated 23 days ago.
analyticsapiautomationautomldata-sciencedescriptive-statisticsh2omachine-learningmarketingmmmpredictive-modelingpuzzlerlanguagerobynvisualization
2.3 match 233 stars 9.84 score 185 scripts 1 dependents
wadpac
GGIR:Raw Accelerometer Data Analysis
A tool to process and analyse data collected with wearable raw acceleration sensors as described in Migueles and colleagues (JMPB 2019), and van Hees and colleagues (JApplPhysiol 2014; PLoSONE 2015). The package has been developed and tested for binary data from 'GENEActiv' <https://activinsights.com/>, binary (.gt3x) and .csv-export data from 'Actigraph' <https://theactigraph.com> devices, and binary (.cwa) and .csv-export data from 'Axivity' <https://axivity.com>. These devices are currently widely used in research on human daily physical activity. Further, the package can handle accelerometer data file from any other sensor brand providing that the data is stored in csv format. Also the package allows for external function embedding.
Maintained by Vincent T van Hees. Last updated 2 days ago.
accelerometeractivity-recognitioncircadian-rhythmmovement-sensorsleep
1.7 match 109 stars 13.20 score 342 scripts 3 dependents
ahl27
froth:Emulate a 'Forth' Programming Environment
Emulates a 'Forth' programming environment with added features to interface between R and 'Forth'. Implements most of the functionality described in the original "Starting Forth" textbook <https://www.forth.com/starting-forth/>.
Maintained by Aidan Lakshman. Last updated 1 years ago.
4.3 match 3 stars 5.08 score 2 scripts
rpahl
container:Extending Base 'R' Lists
Extends the functionality of base 'R' lists and provides specialized data structures 'deque', 'set', 'dict', and 'dict.table', the latter to extend the 'data.table' package.
Maintained by Roman Pahl. Last updated 2 months ago.
containerdata-structuresdequedictsets
3.0 match 16 stars 7.13 score 140 scripts
trinker
qdapDictionaries:Dictionaries and Word Lists for the 'qdap' Package
A collection of text analysis dictionaries and word lists for use with the 'qdap' package.
Maintained by Tyler Rinker. Last updated 7 years ago.
3.6 match 4 stars 5.99 score 113 scripts 6 dependents
cran
nonmem2R:Loading NONMEM Output Files with Functions for Visual Predictive Checks (VPC) and Goodness of Fit (GOF) Plots
Loading NONMEM (NONlinear Mixed-Effect Modeling, <https://www.iconplc.com/solutions/technologies/nonmem/>) and PSN (Perl-speaks-NONMEM, <https://uupharmacometrics.github.io/PsN/>) output files to extract parameter estimates, provide visual predictive check (VPC) and goodness of fit (GOF) plots, and simulate with parameter uncertainty.
Maintained by Magnus Astrand. Last updated 1 years ago.
7.7 match 3 stars 2.78 score
tiledb-inc
tiledb:Modern Database Engine for Complex Data Based on Multi-Dimensional Arrays
The modern database 'TileDB' introduces a powerful on-disk format for storing and accessing any complex data based on multi-dimensional arrays. It supports dense and sparse arrays, data frames and key-value stores, cloud storage ('S3', 'GCS', 'Azure'), chunked arrays, multiple compression, encryption and checksum filters, uses a fully multi-threaded implementation, supports parallel I/O, data versioning ('time travel'), metadata and groups. It is implemented as an embeddable cross-platform C++ library with APIs from several languages, and integrations. This package provides the R support.
Maintained by Isaiah Norton. Last updated 4 days ago.
arrayhdfss3storage-managertiledbcpp
1.8 match 107 stars 11.96 score 306 scripts 4 dependents
bnosac
udpipe:Tokenization, Parts of Speech Tagging, Lemmatization and Dependency Parsing with the 'UDPipe' 'NLP' Toolkit
This natural language processing toolkit provides language-agnostic 'tokenization', 'parts of speech tagging', 'lemmatization' and 'dependency parsing' of raw text. Next to text parsing, the package also allows you to train annotation models based on data of 'treebanks' in 'CoNLL-U' format as provided at <https://universaldependencies.org/format.html>. The techniques are explained in detail in the paper: 'Tokenizing, POS Tagging, Lemmatizing and Parsing UD 2.0 with UDPipe', available at <doi:10.18653/v1/K17-3009>. The toolkit also contains functionalities for commonly used data manipulations on texts which are enriched with the output of the parser. Namely functionalities and algorithms for collocations, token co-occurrence, document term matrix handling, term frequency inverse document frequency calculations, information retrieval metrics (Okapi BM25), handling of multi-word expressions, keyword detection (Rapid Automatic Keyword Extraction, noun phrase extraction, syntactical patterns) sentiment scoring and semantic similarity analysis.
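A minimal sketch using the one-step interface (this should download a pre-trained English model on first use):
  library(udpipe)
  x <- c("The package parses raw text.", "It returns one row per token.")
  anno <- udpipe(x, "english")   # tokenisation, tagging, lemmatisation, dependency parsing
  head(anno[, c("doc_id", "token", "lemma", "upos", "dep_rel")])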
Maintained by Jan Wijffels. Last updated 2 years ago.
conlldependency-parserlemmatizationnatural-language-processingnlppos-taggingr-pkgrcpptext-miningtokenizerudpipecpp
1.7 match 215 stars 11.83 score 1.2k scripts 9 dependents
jaseziv
worldfootballR:Extract and Clean World Football (Soccer) Data
Allow users to obtain clean and tidy football (soccer) game, team and player data. Data is collected from a number of popular sites, including 'FBref', transfer and valuations data from 'Transfermarkt'<https://www.transfermarkt.com/> and shooting location and other match stats data from 'Understat'<https://understat.com/>. It gives users the ability to access data more efficiently, rather than having to export data tables to files before being able to complete their analysis.
Maintained by Jason Zivkovic. Last updated 30 days ago.
fbreffootballfootball-datasoccer-datasports-datatransfermarktunderstat
2.0 match 506 stars 9.89 score 516 scripts 2 dependents
hugheylab
pmparser:Create and Maintain a Relational Database of Data from PubMed/MEDLINE
Provides a simple interface for extracting various elements from the publicly available PubMed XML files, incorporating PubMed's regular updates, and combining the data with the NIH Open Citation Collection. See Schoenbachler and Hughey (2021) <doi:10.7717/peerj.11071>.
Maintained by Jake Hughey. Last updated 2 months ago.
3.8 match 17 stars 5.23 score 1 scripts
selesnow
rvkstat:R Interface to API 'vk.com'
Load data from the vk.com API about your community users and views, ads performance, posts on user walls, etc. For more information see the API documentation <https://vk.com/dev/first_guide>.
Maintained by Alexey Seleznev. Last updated 3 years ago.
4.5 match 15 stars 4.35 score 9 scripts 1 dependents
koheiw
wordmap:Feature Extraction and Document Classification with Noisy Labels
Extract features and classify documents with noisy labels given by document meta-data or keyword matching (Watanabe & Zhou (2020) <doi:10.1177/0894439320907027>).
Maintained by Kohei Watanabe. Last updated 2 months ago.
4.0 match 2 stars 4.86 score 1 scripts
dahhamalsoud
phdcocktail:Enhance the Ease of R Experience as an Emerging Researcher
A toolkit of functions to help: i) effortlessly transform collected data into a publication ready format, ii) generate insightful visualizations from clinical data, iii) report summary statistics in a publication-ready format, iv) efficiently export, save and reload R objects within the framework of R projects.
Maintained by Dahham Alsoud. Last updated 1 years ago.
5.2 match 3.70 score 1 scripts
hope-data-science
akc:Automatic Knowledge Classification
A tidy framework for automatic knowledge classification and visualization. Currently, the core functionality of the framework is mainly supported by modularity-based clustering (community detection) in keyword co-occurrence network, and focuses on co-word analysis of bibliometric research. However, the designed functions in 'akc' are general, and could be extended to solve other tasks in text mining as well.
Maintained by Tian-Yuan Huang. Last updated 19 days ago.
3.3 match 15 stars 5.85 score 47 scripts
guyabel
migest:Methods for the Indirect Estimation of Bilateral Migration
Tools for estimating, measuring and working with migration data.
Maintained by Guy J. Abel. Last updated 1 months ago.
3.3 match 32 stars 5.80 score 86 scripts
mlr-org
mlr3filters:Filter Based Feature Selection for 'mlr3'
Extends 'mlr3' with filter methods for feature selection. Besides standalone filter methods, built-in methods of any machine-learning algorithm are supported. Partial scoring of multivariate filter methods is supported.
Maintained by Marc Becker. Last updated 4 months ago.
feature-selectionfilterfiltersmlrmlr3variable-importance
2.3 match 20 stars 8.37 score 95 scripts 3 dependents
nt-williams
codebreak:Label Data Using a YAML Codebook
A light-weight framework for labeling coded data using a codebook saved as YAML text file.
Maintained by Nick Williams. Last updated 7 months ago.
7.5 match 6 stars 2.48 score 1 scripts
mlr-org
mlr3fselect:Feature Selection for 'mlr3'
Feature selection package of the 'mlr3' ecosystem. It selects the optimal feature set for any 'mlr3' learner. The package works with several optimization algorithms e.g. Random Search, Recursive Feature Elimination, and Genetic Search. Moreover, it can automatically optimize learners and estimate the performance of optimized feature sets with nested resampling.
Maintained by Marc Becker. Last updated 2 months ago.
evolutionary-algorithmsexhaustive-searchfeature-selectionmachine-learningmlr3optimizationrandom-searchrecursive-feature-eliminationsequential-feature-selection
2.3 match 23 stars 8.25 score 70 scripts 2 dependents
oliver-wyman-actuarial
easyr:Helpful Functions from Oliver Wyman Actuarial Consulting
Makes difficult operations easy. Includes these types of functions: shorthand, type conversion, data wrangling, and work flow. Also includes some helpful data objects: NA strings, U.S. state list, color blind charting colors. Built and shared by Oliver Wyman Actuarial Consulting. Accepting proposed contributions through GitHub.
Maintained by Bryce Chamberlain. Last updated 1 years ago.
3.8 match 20 stars 4.86 score 18 scripts
pecanproject
PEcAn.utils:PEcAn Functions Used for Ecological Forecasts and Reanalysis
The Predictive Ecosystem Carbon Analyzer (PEcAn) is a scientific workflow management tool that is designed to simplify the management of model parameterization, execution, and analysis. The goal of PEcAn is to streamline the interaction between data and models, and to improve the efficacy of scientific investigation.
Maintained by Rob Kooper. Last updated 2 days ago.
bayesiancyberinfrastructuredata-assimilationdata-scienceecosystem-modelecosystem-scienceforecastingmeta-analysisnational-science-foundationpecanplants
1.7 match 216 stars 10.92 score 218 scripts 35 dependents
bioc
affxparser:Affymetrix File Parsing SDK
Package for parsing Affymetrix files (CDF, CEL, CHP, BPMAP, BAR). It provides methods for fast and memory efficient parsing of Affymetrix files using the Affymetrix' Fusion SDK. Both ASCII- and binary-based files are supported. Currently, there are methods for reading chip definition file (CDF) and a cell intensity file (CEL). These files can be read either in full or in part. For example, probe signals from a few probesets can be extracted very quickly from a set of CEL files into a convenient list structure.
Maintained by Kasper Daniel Hansen. Last updated 2 months ago.
infrastructuredataimportmicroarrayproprietaryplatformsonechannelbioconductorcpp
2.3 match 7 stars 8.19 score 65 scripts 14 dependents
trevorld
xmpdf:Edit 'XMP' Metadata and 'PDF' Bookmarks and Documentation Info
Edit 'XMP' metadata <https://en.wikipedia.org/wiki/Extensible_Metadata_Platform> in a variety of media file formats as well as edit bookmarks (aka outline aka table of contents) and documentation info entries in 'pdf' files. Can detect and use a variety of command-line tools to perform these operations such as 'exiftool' <https://exiftool.org/>, 'ghostscript' <https://www.ghostscript.com/>, and/or 'pdftk' <https://gitlab.com/pdftk-java/pdftk>.
Maintained by Trevor L Davis. Last updated 12 months ago.
3.5 match 5 stars 5.18 score 1 scripts 1 dependents
ropensci
rdhs:API Client and Dataset Management for the Demographic and Health Survey (DHS) Data
Provides a client for (1) querying the DHS API for survey indicators and metadata (<https://api.dhsprogram.com/#/index.html>), (2) identifying surveys and datasets for analysis, (3) downloading survey datasets from the DHS website, (4) loading datasets and associate metadata into R, and (5) extracting variables and combining datasets for pooled analysis.
Maintained by OJ Watson. Last updated 17 days ago.
datasetdhsdhs-apiextractpeer-reviewedsurvey-data
1.8 match 35 stars 10.07 score 286 scripts 3 dependents
jimmyday12
fitzRoy:Easily Scrape and Process AFL Data
An easy package for scraping and processing Australian Rules Football (AFL) data. 'fitzRoy' provides a range of functions for accessing publicly available data from 'AFL Tables' <https://afltables.com/afl/afl_index.html>, 'Footy Wire' <https://www.footywire.com> and 'The Squiggle' <https://squiggle.com.au>. Further functions allow for easy processing, cleaning and transformation of this data into formats that can be used for analysis.
Maintained by James Day. Last updated 2 months ago.
1.7 match 134 stars 10.74 score 324 scripts
gdemin
maditr:Fast Data Aggregation, Modification, and Filtering with Pipes and 'data.table'
Provides pipe-style interface for 'data.table'. Package preserves all 'data.table' features without significant impact on performance. 'let' and 'take' functions are simplified interfaces for most common data manipulation tasks. For example, you can write 'take(mtcars, mean(mpg), by = am)' for aggregation or 'let(mtcars, hp_wt = hp/wt, hp_wt_mpg = hp_wt/mpg)' for modification. Use 'take_if/let_if' for conditional aggregation/modification. Additionally there are some conveniences such as automatic 'data.frame' conversion to 'data.table'.
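The two calls quoted above, spelled out as a runnable sketch:
  library(maditr)
  take(mtcars, mean(mpg), by = am)                    # aggregation: mean mpg by transmission
  let(mtcars, hp_wt = hp/wt, hp_wt_mpg = hp_wt/mpg)   # modification: add two derived columns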
Maintained by Gregory Demin. Last updated 4 months ago.
2.0 match 61 stars 8.98 score 248 scripts 7 dependents
cmerow
rangeModelMetadata:Provides Templates for Metadata Files Associated with Species Range Models
Range Modeling Metadata Standards (RMMS) address three challenges: they (i) are designed for convenience to encourage use, (ii) accommodate a wide variety of applications, and (iii) are extensible to allow the community of range modelers to steer it as needed. RMMS are based on a data dictionary that specifies a hierarchical structure to catalog different aspects of the range modeling process. The dictionary balances a constrained, minimalist vocabulary to improve standardization with flexibility for users to provide their own values. Merow et al. (2019) <DOI:10.1111/geb.12993> describe the standards in more detail. Note that users who prefer to use the R package 'ecospat' can obtain it from <https://github.com/ecospat/ecospat>.
Maintained by Cory Merow. Last updated 8 months ago.
ecological-metadata-languageecological-modellingecological-modelsecologyspecies-distribution-modellingspecies-distributions
2.6 match 6 stars 6.96 score 16 scripts 3 dependents
trinker
qdapRegex:Regular Expression Removal, Extraction, and Replacement Tools
A collection of regular expression tools associated with the 'qdap' package that may be useful outside of the context of discourse analysis. Tools include removal/extraction/replacement of abbreviations, dates, dollar amounts, email addresses, hash tags, numbers, percentages, citations, person tags, phone numbers, times, and zip codes.
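A small sketch of the removal/extraction pattern (the example string is invented):
  library(qdapRegex)
  x <- "Contact sales@example.com about the $1200.50 invoice."
  rm_email(x)                    # text with the address removed
  rm_email(x, extract = TRUE)    # just the address
  rm_dollar(x, extract = TRUE)   # just the dollar amount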
Maintained by Tyler Rinker. Last updated 1 years ago.
1.9 match 50 stars 9.48 score 502 scripts 41 dependents
trinker
textstem:Tools for Stemming and Lemmatizing Text
Tools that stem and lemmatize text. Stemming is a process that removes endings such as affixes. Lemmatization is the process of grouping inflected forms together as a single base form.
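A small sketch contrasting the two operations:
  library(textstem)
  w <- c("running", "ran", "studies")
  stem_words(w)         # crude suffix stripping, e.g. "studi"
  lemmatize_words(w)    # dictionary-based base forms, e.g. "study"
  lemmatize_strings("The mice were running quickly.")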
Maintained by Tyler Rinker. Last updated 7 years ago.
lemmatizationstemmingtext-mining
2.0 match 45 stars 8.71 score 888 scripts 11 dependents
usaid-oha-si
mindthegap:Mind the Gap
Package to tidy UNAIDS estimates (from the EDMS database) as well as plot trends in UNAIDS 95 goals and ART coverage gap by country.
Maintained by Karishma Srikanth. Last updated 2 months ago.
3.1 match 5 stars 5.51 score 13 scripts
scholaempirica
reschola:The Schola Empirica Package
A collection of utilities, themes and templates for data analysis at Schola Empirica.
Maintained by Jan Netík. Last updated 5 months ago.
3.5 match 4 stars 4.83 score 14 scripts
brendensm
misuvi:Access the Michigan Substance Use Vulnerability Index (MI-SUVI)
Easily import the MI-SUVI data sets. The user can import data sets with full metrics, percentiles, Z-scores, or rankings. Data is available at both the County and Zip Code Tabulation Area (ZCTA) levels. This package also includes a function to import shape files for easy mapping and a function to access the full technical documentation. All data is sourced from the Michigan Department of Health and Human Services.
Maintained by Brenden Smith. Last updated 1 months ago.
4.9 match 3.40 score
christopherkenny
acronames:Create Acronyms for Naming Things
Simple tool for developing names based on first letters of keywords.
Maintained by Christopher T. Kenny. Last updated 3 years ago.
9.8 match 1 stars 1.70 score 1 scripts
kwb-r
kwb.pathdict:Functions to Work with Path Dictionaries
This package provides functions to work with what I call path dictionaries. Path dictionaries are lists defining file and folder paths. In order not to repeat sub-paths, placeholders can be used. The package provides functions to find duplicated sub-paths and to define placeholders accordingly.
Maintained by Hauke Sonnenberg. Last updated 5 years ago.
9.7 match 1.70 score 1 scripts
subramv
cms:Calculate Medicare Reimbursement
Uses the 'CMS' application programming interface <https://dnav.cms.gov/api/healthdata> to provide users databases containing yearly Medicare reimbursement rates in the United States. Data can be acquired for the entire United States or only for specific localities. Currently, support is only provided for the Medicare Physician Fee Schedule, but support will be expanded for other 'CMS' databases in future versions.
Maintained by Vigneshwar Subramanian. Last updated 4 years ago.
medicaremedicare-datareimbursement
3.5 match 7 stars 4.54 score 10 scripts
davidasmith
wordler:The 'WORDLE' Game
The 'Wordle' game. Players have six attempts to guess a five-letter word. After each guess, the player is informed which letters in their guess are either: anywhere in the word; in the right position in the word. This can be used to inform the next guess. Can be played interactively in the console, or programmatically. Based on Josh Wardle's game <https://www.powerlanguage.co.uk/wordle/>.
Maintained by David Smith. Last updated 3 years ago.
3.6 match 5 stars 4.40 score 7 scripts
great-northern-diver
loon:Interactive Statistical Data Visualization
An extendable toolkit for interactive data visualization and exploration.
Maintained by R. Wayne Oldford. Last updated 2 years ago.
data-analysisdata-sciencedata-visualizationexploratory-analysisexploratory-data-analysishigh-dimensional-datainteractive-graphicsinteractive-visualizationsloonpythonstatistical-analysisstatistical-graphicsstatisticstcl-extensiontk
1.8 match 48 stars 9.00 score 93 scripts 5 dependents
genentech
psborrow2:Bayesian Dynamic Borrowing Analysis and Simulation
Bayesian dynamic borrowing is an approach to incorporating external data to supplement a randomized, controlled trial analysis in which external data are incorporated in a dynamic way (e.g., based on similarity of outcomes); see Viele 2013 <doi:10.1002/pst.1589> for an overview. This package implements the hierarchical commensurate prior approach to dynamic borrowing as described in Hobbs 2011 <doi:10.1111/j.1541-0420.2011.01564.x>. There are three main functionalities. First, 'psborrow2' provides a user-friendly interface for applying dynamic borrowing to the study results and handles the Markov Chain Monte Carlo sampling on behalf of the user. Second, 'psborrow2' provides a simulation framework to compare different borrowing parameters (e.g. full borrowing, no borrowing, dynamic borrowing) and other trial and borrowing characteristics (e.g. sample size, covariates) in a unified way. Third, 'psborrow2' provides a set of functions to generate data for simulation studies, and also allows the user to specify their own data generation process. This package is designed to use the sampling functions from 'cmdstanr' which can be installed from <https://stan-dev.r-universe.dev>.
Maintained by Matt Secrest. Last updated 1 months ago.
bayesian-dynamic-borrowingpsborrow2simulation-study
2.0 match 18 stars 7.87 score 16 scripts
kwb-r
kwb.monitoring:Functions Used Within Different Kwb Monitoring Projects
Functions used within different KWB projects dealing with monitoring data.
Maintained by Hauke Sonnenberg. Last updated 6 years ago.
4.1 match 3.78 score 3 scripts 4 dependents
flavjack
inti:Tools and Statistical Procedures in Plant Science
The 'inti' package is part of the 'inkaverse' project for developing different procedures and tools used in plant science and experimental designs. The main aim of the package is to support researchers during the planning of experiments and data collection (tarpuy()), data analysis and graphics (yupana()), and technical writing. Learn more about the 'inkaverse' project at <https://inkaverse.com/>.
Maintained by Flavio Lozano-Isla. Last updated 20 hours ago.
agricultureappsinkaverselmmplant-breedingplant-scienceshiny
1.9 match 5 stars 8.27 score 193 scripts
mlr-org
mlr3torch:Deep Learning with 'mlr3'
Deep learning library that extends the mlr3 framework by building upon the 'torch' package. It allows you to conveniently build, train, and evaluate deep learning models without having to worry about low-level details. Custom architectures can be created using the graph language defined in 'mlr3pipelines'.
Maintained by Sebastian Fischer. Last updated 1 months ago.
data-sciencedeep-learningmachine-learningmlr3torch
2.0 match 42 stars 7.63 score 78 scripts
cran
NHSDataDictionaRy:NHS Data Dictionary Toolset for NHS Lookups
Providing a common set of simplified web scraping tools for working with the NHS Data Dictionary <https://datadictionary.nhs.uk/data_elements_overview.html>. The intended usage is to access the data elements section of the NHS Data Dictionary to access key lookups. The benefits of having it in this package are that the lookups are the live lookups on the website and will not need to be maintained. This package was commissioned by the NHS-R community <https://nhsrcommunity.com/> to provide this consistency of lookups. The OpenSafely lookups have now been added <https://www.opencodelists.org/docs/>.
Maintained by Gary Hutson. Last updated 4 years ago.
7.6 match 2.00 score
ropensci
ckanr:Client for the Comprehensive Knowledge Archive Network ('CKAN') API
Client for 'CKAN' API (<https://ckan.org/>). Includes interface to 'CKAN' 'APIs' for search, list, show for packages, organizations, and resources. In addition, provides an interface to the 'datastore' API.
Maintained by Francisco Alves. Last updated 2 years ago.
databaseopen-datackanapidatadatasetapi-wrapperckan-api
1.8 match 100 stars 8.67 score 448 scripts 4 dependents
psychbruce
PsychWordVec:Word Embedding Research Framework for Psychological Science
An integrative toolbox of word embedding research that provides: (1) a collection of 'pre-trained' static word vectors in the '.RData' compressed format <https://psychbruce.github.io/WordVector_RData.pdf>; (2) a series of functions to process, analyze, and visualize word vectors; (3) a range of tests to examine conceptual associations, including the Word Embedding Association Test <doi:10.1126/science.aal4230> and the Relative Norm Distance <doi:10.1073/pnas.1720347115>, with permutation test of significance; (4) a set of training methods to locally train (static) word vectors from text corpora, including 'Word2Vec' <arXiv:1301.3781>, 'GloVe' <doi:10.3115/v1/D14-1162>, and 'FastText' <arXiv:1607.04606>; (5) a group of functions to download 'pre-trained' language models (e.g., 'GPT', 'BERT') and extract contextualized (dynamic) word vectors (based on the R package 'text').
Maintained by Han-Wu-Shuang Bao. Last updated 1 years ago.
bertcosine-similarityfasttextglovegptlanguage-modelnatural-language-processingnlppretrained-modelspsychologysemantic-analysistext-analysistext-miningtsneword-embeddingsword-vectorsword2vecopenjdk
3.8 match 22 stars 4.04 score 10 scripts
ai-sdc
acro:A Tool for Semi-Automating the Statistical Disclosure Control of Research Outputs
Assists researchers and output checkers by distinguishing between research output that is safe to publish, output that requires further analysis, and output that cannot be published because of substantial disclosure risk. A paper about the tool was presented at the UNECE Expert Meeting on Statistical Data Confidentiality 2023; see <https://uwe-repository.worktribe.com/output/11060964>.
Maintained by Jim Smith. Last updated 9 days ago.
data-privacydata-protectionprivacyprivacy-toolsstatistical-disclosure-controlstatistical-software
3.7 match 1 stars 4.11 score 1 scripts
bioc
MSstats:Protein Significance Analysis in DDA, SRM and DIA for Label-free or Label-based Proteomics Experiments
A set of tools for statistical relative protein significance analysis in DDA, SRM and DIA experiments.
Maintained by Meena Choi. Last updated 11 days ago.
immunooncologymassspectrometryproteomicssoftwarenormalizationqualitycontroltimecourseopenblascpp
1.8 match 8.49 score 164 scripts 7 dependents
selesnow
ractivecampaign:Loading Data from 'ActiveCampaign API v3'
Interface for loading data from 'ActiveCampaign API v3' <https://developers.activecampaign.com/reference>. Provides functions for getting data on deals, contacts, accounts, campaigns and messages.
Maintained by Alexey Seleznev. Last updated 2 years ago.
5.4 match 2.70 score 2 scripts
ouhscbbmc
REDCapR:Interaction Between R and REDCap
Encapsulates functions to streamline calls from R to the REDCap API. REDCap (Research Electronic Data CAPture) is a web application for building and managing online surveys and databases developed at Vanderbilt University. The Application Programming Interface (API) offers an avenue to access and modify data programmatically, improving the capacity for literate and reproducible programming.
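A hedged sketch of a single-call export; the URI and token are placeholders for your own REDCap project:
  library(REDCapR)
  ds <- redcap_read_oneshot(
    redcap_uri = "https://redcap.example.edu/api/",   # placeholder URI
    token      = Sys.getenv("REDCAP_API_TOKEN")       # placeholder token source
  )
  head(ds$data)   # records are returned in the $data element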
Maintained by Will Beasley. Last updated 2 months ago.
1.2 match 118 stars 12.36 score 438 scripts 6 dependents
rstudio
tfestimators:Interface to 'TensorFlow' Estimators
Interface to 'TensorFlow' Estimators <https://www.tensorflow.org/guide/estimator>, a high-level API that provides implementations of many different model types including linear models and deep neural networks.
Maintained by Tomasz Kalinowski. Last updated 3 years ago.
1.7 match 57 stars 8.42 score 170 scripts
selesnow
ryandexdirect:Load Data From 'Yandex Direct'
Load data from the 'Yandex Direct' API V5 <https://yandex.ru/dev/direct/doc/dg/concepts/about-docpage> into R. Provides functions to load lists of campaigns, ads, keywords and other objects from a 'Yandex Direct' account. You can also load statistics from the 'Reports Service' API <https://yandex.ru/dev/direct/doc/reports/reports-docpage>, and manage keyword bids.
Maintained by Alexey Seleznev. Last updated 1 months ago.
1.9 match 53 stars 7.54 score 44 scripts 1 dependents
paithiov909
audubon:Japanese Text Processing Tools
A collection of Japanese text processing tools for filling Japanese iteration marks, Japanese character type conversions, segmentation by phrase, and text normalization which is based on rules for the 'Sudachi' morphological analyzer and the 'NEologd' (Neologism dictionary for 'MeCab'). These features are specific to Japanese and are not implemented in 'ICU' (International Components for Unicode).
Maintained by Akiru Kato. Last updated 21 days ago.
2.5 match 10 stars 5.61 score 3 scripts 1 dependents
ropensci
popler:Popler R Package
Browse and query the popler database.
Maintained by Compagnoni Aldo. Last updated 5 years ago.
3.7 match 7 stars 3.82 score 47 scripts
ropensci
rperseus:Get Texts from the Perseus Digital Library
The Perseus Digital Library is a collection of classical texts. This package helps you get them. The available works can also be viewed here: <http://cts.perseids.org/>.
Maintained by David Ranzolin. Last updated 2 years ago.
classicsgreekgreek-biblegreek-new-testamentlatinpeer-reviewedperseusperseus-digital-librarytranslation
3.8 match 19 stars 3.74 score 29 scripts
grantmcdermott
ggfixest:Dedicated 'ggplot2' Methods for 'fixest' Objects
Provides 'ggplot2' equivalents of fixest::coefplot() and fixest::iplot(), for producing nice coefficient plots and interaction plots. Enables some additional functionality and convenience features, including grouped multi-'fixest' object faceting and programmatic updates to existing plots (e.g., themes and aesthetics).
Maintained by Grant McDermott. Last updated 2 months ago.
2.0 match 49 stars 7.01 score 28 scripts
cran
localScore:Package for Sequence Analysis by Local Score
Functionalities for calculating the local score and calculating statistical relevance (p-value) to find a local Score in a sequence of given distribution (S. Mercier and J.-J. Daudin (2001) <https://hal.science/hal-00714174/>) ; S. Karlin and S. Altschul (1990) <https://pmc.ncbi.nlm.nih.gov/articles/PMC53667/> ; S. Mercier, D. Cellier and F. Charlot (2003) <https://hal.science/hal-00937529v1/> ; A. Lagnoux, S. Mercier and P. Valois (2017) <doi:10.1093/bioinformatics/btw699> ).
Maintained by David Robelin. Last updated 20 days ago.
6.0 match 2.30 score 6 scripts
rubenarslan
codebook:Automatic Codebooks from Metadata Encoded in Dataset Attributes
Easily automate the following tasks to describe data frames: Summarise the distributions, and labelled missings of variables graphically and using descriptive statistics. For surveys, compute and summarise reliabilities (internal consistencies, retest, multilevel) for psychological scales. Combine this information with metadata (such as item labels and labelled values) that is derived from R attributes. To do so, the package relies on 'rmarkdown' partials, so you can generate HTML, PDF, and Word documents. Codebooks are also available as tables (CSV, Excel, etc.) and in JSON-LD, so that search engines can find your data and index the metadata. The metadata are also available at your fingertips via RStudio Addins.
Maintained by Ruben Arslan. Last updated 3 months ago.
codebookdocumentationformrjson-ldmetadataspsswebapp
1.7 match 142 stars 8.31 score 229 scripts
zhukovyuri
SUNGEO:Sub-National Geospatial Data Archive: Geoprocessing Toolkit
Tools for integrating spatially-misaligned GIS datasets. Part of the Sub-National Geospatial Data Archive System.
Maintained by Yuri M. Zhukov. Last updated 10 months ago.
4.0 match 5 stars 3.42 score 8 scripts
pwkraft
discursive:Measuring Discursive Sophistication in Open-Ended Survey Responses
A simple approach to measure political sophistication based on open-ended survey responses. Discursive sophistication captures the complexity of individual attitude expression by quantifying its relative size, range, and constraint. For more information on the measurement approach see: Kraft, Patrick W. 2023. "Women Also Know Stuff: Challenging the Gender Gap in Political Sophistication." American Political Science Review (forthcoming).
Maintained by Patrick Kraft. Last updated 2 years ago.
4.5 match 2 stars 3.00 score 5 scripts
jaimesalvador
minired:R Interface to 'Redatam' Library
This package is deprecated. Please use 'redatamx' instead. Provides an API to work with 'Redatam' (see <https://redatam.org>) databases in both formats: 'RXDB' (new format) and 'DICX' (old format) and running 'Redatam' programs written in 'SPC' language. It's a wrapper around 'Redatam' core and provides functions to open/close a database (redatam_open()/redatam_close()), list entities and variables from the database (redatam_entities(), redatam_variables()) and execute a 'SPC' program and gets the results as data frames (redatam_query(), redatam_run()).
Maintained by Jaime Salvador. Last updated 4 months ago.
6.8 match 2.00 score
condwanaland
words:List of English Words from the Scrabble Dictionary
List of English Scrabble words as listed in the OTCWL2014 <https://www.scrabbleplayers.org/w/Official_Tournament_and_Club_Word_List_2014_Edition>. Words are collated from the 'Word Game Dictionary' <https://www.wordgamedictionary.com/word-lists/>.
Maintained by Conor Neilson. Last updated 4 years ago.
3.5 match 3.80 score 42 scripts 1 dependents
tomeriko96
polyglotr:Translate Text
Provide easy methods to translate pieces of text. Functions send requests to translation services online.
Maintained by Tomer Iwan. Last updated 1 months ago.
google-translategoogletranslatelanguagelingueemymemory-apimymemorytranslatorponstranslationtranslations-api
1.8 match 33 stars 7.61 score 34 scripts 1 dependents
hughparsonage
heims:Decode and Validate HEIMS Data from Department of Education, Australia
Decode elements of the Australian Higher Education Information Management System (HEIMS) data for clarity and performance. HEIMS is the record system of the Department of Education, Australia to record enrolments and completions in Australia's higher education system, as well as a range of relevant information. For more information, including the source of the data dictionary, see <http://heimshelp.education.gov.au/sites/heimshelp/dictionary/pages/data-element-dictionary>.
Maintained by Hugh Parsonage. Last updated 7 years ago.
4.9 match 2.70 score 8 scripts
cran
exceldata:Streamline Data Import, Cleaning and Recoding from 'Excel'
A small group of functions to read in a data dictionary and the corresponding data table from 'Excel' and to automate the cleaning, re-coding and creation of simple calculated variables. This package was designed to be a companion to the macro-enabled 'Excel' template available on the GitHub site, but works with any similarly-formatted 'Excel' data.
Maintained by Lisa Avery. Last updated 1 years ago.
7.8 match 1.70 score
jonclayden
ore:An R Interface to the Onigmo Regular Expression Library
Provides an alternative to R's built-in functionality for handling regular expressions, based on the Onigmo library. Offers first-class compiled regex objects, partial matching and function-based substitutions, amongst other features.
Maintained by Jon Clayden. Last updated 2 days ago.
regexregular-expressionstext-analysis
1.8 match 58 stars 7.16 score 125 scripts 6 dependents
daya6489
SmartEDA:Summarize and Explore the Data
Exploratory analysis on any input data, describing the structure and the relationships present in the data. The package automatically selects the variables and does related descriptive statistics. Analysis of information value, weight of evidence, custom tables, summary statistics, and graphical techniques will be performed for both numeric and categorical predictors.
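A hedged sketch of a quick first pass on a built-in data set:
  library(SmartEDA)
  ExpData(data = mtcars, type = 1)   # dataset-level overview
  ExpData(data = mtcars, type = 2)   # variable-level structure
  ExpNumStat(mtcars, by = "A")       # descriptive statistics for numeric columns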
Maintained by Dayanand Ubrangala. Last updated 1 years ago.
analysisexploratory-data-analysis
1.8 match 42 stars 7.25 score 214 scripts
bnosac
ruimtehol:Learn Text 'Embeddings' with 'Starspace'
Wraps the 'StarSpace' library <https://github.com/facebookresearch/StarSpace> allowing users to calculate word, sentence, article, document, webpage, link and entity 'embeddings'. By using the 'embeddings', you can perform text based multi-label classification, find similarities between texts and categories, do collaborative-filtering based recommendation as well as content-based recommendation, find out relations between entities, calculate graph 'embeddings' as well as perform semi-supervised learning and multi-task learning on plain text. The techniques are explained in detail in the paper: 'StarSpace: Embed All The Things!' by Wu et al. (2017), available at <arXiv:1709.03856>.
Maintained by Jan Wijffels. Last updated 1 years ago.
classificationembeddingsnatural-language-processingnlpsimilaritystarspacetext-miningcpp
1.9 match 101 stars 6.65 score 44 scripts
bioc
amplican:Automated analysis of CRISPR experiments
`amplican` performs alignment of the amplicon reads, normalizes gathered data, calculates multiple statistics (e.g. cut rates, frameshifts) and presents results in form of aggregated reports. Data and statistics can be broken down by experiments, barcodes, user defined groups, guides and amplicons allowing for quick identification of potential problems.
Maintained by Eivind Valen. Last updated 5 months ago.
immunooncologytechnologyalignmentqpcrcrisprcpp
1.7 match 10 stars 7.54 score 41 scripts
myeomans
DICEM:Directness and Intensity of Conflict Expression
A Natural Language Processing Model trained to detect directness and intensity during conflict. See <https://www.mikeyeomans.info>.
Maintained by Michael Yeomans. Last updated 7 months ago.
3.8 match 3.30 score
spsanderson
healthyR.data:Data Only Package to 'healthyR'
Provides data for functions typically used in the 'healthyR' package.
Maintained by Steven Sanderson. Last updated 2 months ago.
datadata-sciencedata-setshealthcarehealthcare-analysishealthcare-applicationhealthcare-datasets
1.9 match 10 stars 6.52 score 105 scripts 1 dependents
epicentre-msf
redcap:R Utilities For REDCap
R utilities for interacting with the REDCap API.
Maintained by Patrick Barks. Last updated 3 months ago.
3.5 match 7 stars 3.45 score 5 scripts
keyatm
keyATM:Keyword Assisted Topic Models
Fits keyword assisted topic models (keyATM) using collapsed Gibbs samplers. The keyATM combines latent Dirichlet allocation (LDA) models with a small number of keywords selected by researchers in order to improve the interpretability and topic classification of the LDA. The keyATM can also incorporate covariates and directly model time trends. The keyATM is proposed in Eshima, Imai, and Sasaki (2024) <doi:10.1111/ajps.12779>.
Maintained by Shusei Eshima. Last updated 11 months ago.
latent-dirichlet-allocationnatural-language-processingpolitical-sciencercpprcppeigensocial-sciencetopic-modelscpp
1.9 match 106 stars 6.30 score 63 scripts
inzightvit
iNZightTools:Tools for 'iNZight'
Provides a collection of wrapper functions for common variable and dataset manipulation workflows primarily used by 'iNZight', a graphical user interface providing easy exploration and visualisation of data for students of statistics, available in both desktop and online versions. Additionally, many of the functions return the 'tidyverse' code used to obtain the result in an effort to bridge the gap between GUI and coding.
Maintained by Tom Elliott. Last updated 3 months ago.
2.3 match 1 stars 5.16 score 18 scripts 2 dependents
blasbenito
distantia:Advanced Toolset for Efficient Time Series Dissimilarity Analysis
Fast C++ implementation of Dynamic Time Warping for time series dissimilarity analysis, with applications in environmental monitoring and sensor data analysis, climate science, signal processing and pattern recognition, and financial data analysis. Built upon the ideas presented in Benito and Birks (2020) <doi:10.1111/ecog.04895>, provides tools for analyzing time series of varying lengths and structures, including irregular multivariate time series. Key features include individual variable contribution analysis, restricted permutation tests for statistical significance, and imputation of missing data via GAMs. Additionally, the package provides an ample set of tools to prepare and manage time series data.
Maintained by Blas M. Benito. Last updated 25 days ago.
dissimilaritydynamic-time-warpinglock-steptime-seriescpp
2.0 match 23 stars 5.76 score 11 scripts
entjos
TreeMineR:Tree-Based Scan Statistics
Implementation of unconditional Bernoulli Scan Statistic developed by Kulldorff et al. (2003) <doi:10.1111/1541-0420.00039> for hierarchical tree structures. Tree-based Scan Statistics are an exploratory method to identify event clusters across the space of a hierarchical tree.
Maintained by Joshua P. Entrop. Last updated 7 months ago.
3.4 match 3.40 score 2 scripts
nschuwirth
ecoval:Procedures for Ecological Assessment of Surface Waters
Functions for evaluating and visualizing ecological assessment procedures for surface waters containing physical, chemical and biological assessments in the form of value functions.
Maintained by Nele Schuwirth. Last updated 3 years ago.
8.5 match 1.34 score 22 scripts
cran
espadon:Easy Study of Patient DICOM Data in Oncology
Exploitation, processing and 2D-3D visualization of DICOM-RT files (structures, dosimetry, imagery) for medical physics and clinical research, in a patient-oriented perspective.
Maintained by Cathy Fontbonne. Last updated 1 months ago.
4.0 match 2.85 score