R-universe search: keyword

Currently serving 26352 packages, 22683 articles, and 64257 datasets by 1264 organizations, 13667 maintainers and 22203 contributors.

Not sure what to search for? Why not try: maps, bayesian, ecology, climate, genome, gam, spatial, database, pdf, shiny, rstudio, machine learning, prediction, birds, fish, sports, ... (more popular topics)

Organizations

vimc

lcbc-uio

stan-dev

pharmaverse

r-spatial

tidyverse

ropengov

rstudio

r-lib

ropensci

bioc

r-forge

kwb-r

pik-piam

hypertidy

poissonconsulting

mrc-ide

tidymodels

pecanproject

insightsengineering

thinkr-open

mlr-org

inbo

ggseg

ohdsi

modeloriented

paws-r

predictiveecology

flr

ropenspain

sciviews

bnosac

rmi-pacta

repboxr

mrcieu

openvolley

epiverse-trace

nlmixr2

yulab-smu

ices-tools-prod

frbcesab

riatelab

statnet

azure

bips-hb

mlverse

appsilon

rjdverse

epiforecasts

cloudyr

tmsalab

bupaverse

hubverse-org

usepa

usaid-oha-si

dreamrs

openpharma

coatless-rpkg

darwin-eu

certe-medical-epidemiology

merck

easystats

ambiorix-web

business-science

r-dbi

rikenbit

rsquaredacademy

traitecoevo

spatstat

hugheylab

bluegreen-labs

nutriverse

uscbiostats

reconhub

epicentre-msf

ocbe-uio

ipeagit

terminological

aus-doh-safety-and-quality

nflverse

ctu-bern

humaniverse

data-cleaning

biometris

cogdisreslab

gesistsa

rspatial

apache

ifpri

doi-usgs

idslme

csids

winvector

gamlss-dev

stscl

cynkra

Want to learn more about r-universe? Have a look at ropensci.org/r-universe or updates from the rOpenSci blog.

Showing 164 of 164 results

hope-data-science

akc:Automatic Knowledge Classification

A tidy framework for automatic knowledge classification and visualization. Currently, the core functionality of the framework is mainly supported by modularity-based clustering (community detection) in keyword co-occurrence network, and focuses on co-word analysis of bibliometric research. However, the designed functions in 'akc' are general, and could be extended to solve other tasks in text mining as well.

Maintained by Tian-Yuan Huang. Last updated 1 month ago.

26.8 match 15 stars 5.85 score 47 scripts

bioc

flowCore:flowCore: Basic structures for flow cytometry data

Provides S4 data structures and basic functions to deal with flow cytometry data.

Maintained by Mike Jiang. Last updated 5 months ago.

immunooncology infrastructure flowcytometry cellbasedassays cpp

13.2 match 10.17 score 1.7k scripts 59 dependents

cran

medExtractR:Extraction of Medication Information from Clinical Text

Functions and support for medication and dosing information extraction from free-text clinical notes. Medication entities for the basic medExtractR implementation that can be extracted include drug name, strength, dose amount, dose, frequency, intake time, dose change, and time of last dose. The basic medExtractR is outlined in Weeks, Beck, McNeer, Williams, Bejan, Denny, Choi (2020) <doi:10.1093/jamia/ocz207>. The extended medExtractR_tapering implementation is intended to extract dosing information for more tapering schedules, which are far more complex. The tapering extension allows for the extraction of additional entities including dispense amount, refills, dose schedule, time keyword, transition, and preposition.

Maintained by Leena Choi. Last updated 3 years ago.

42.0 match 3.18 score

qinwf

jiebaR:Chinese Text Segmentation

Chinese text segmentation, keyword extraction and part-of-speech tagging for R.

Maintained by Qin Wenfeng. Last updated 5 years ago.

chinese chinese-text-segmentation cppjieba jieba lexical-analysis nlp cpp

9.6 match 352 stars 10.46 score 456 scripts 6 dependents

selesnow

ryandexdirect:Load Data From 'Yandex Direct'

Load data from 'Yandex Direct' API V5 <https://yandex.ru/dev/direct/doc/dg/concepts/about-docpage> into R. Provides functions to load lists of campaigns, ads, keywords and other objects from a 'Yandex Direct' account. You can also load statistics from the API 'Reports Service' <https://yandex.ru/dev/direct/doc/reports/reports-docpage>. The package also allows keyword bid management.

Maintained by Alexey Seleznev. Last updated 2 months ago.

campaign filterlist wifi

13.2 match 53 stars 7.54 score 44 scripts 1 dependent

ropensci

rerddap:General Purpose Client for 'ERDDAP™' Servers

General purpose R client for 'ERDDAP™' servers. Includes functions to search for 'datasets', get summary information on 'datasets', and fetch 'datasets', in either 'csv' or 'netCDF' format. 'ERDDAP™' information: <https://upwell.pfeg.noaa.gov/erddap/information.html>.

Maintained by Roy Mendelssohn. Last updated 12 days ago.

earth science climate precipitation temperature storm buoy noaa api-client erddap noaa-data

9.4 match 41 stars 10.43 score 376 scripts 5 dependents

lebebr01

pdfsearch:Search Tools for PDF Files

Includes functions for keyword search of pdf files. There is also a wrapper that includes searching of all files within a single directory.

Maintained by Brandon LeBeau. Last updated 3 years ago.

keyword pdf

15.3 match 40 stars 5.85 score 35 scripts

statnet

ergm:Fit, Simulate and Diagnose Exponential-Family Models for Networks

An integrated set of tools to analyze and simulate networks based on exponential-family random graph models (ERGMs). 'ergm' is a part of the Statnet suite of packages for network analysis. See Hunter, Handcock, Butts, Goodreau, and Morris (2008) <doi:10.18637/jss.v024.i03> and Krivitsky, Hunter, Morris, and Klumb (2023) <doi:10.18637/jss.v105.i06>.

Maintained by Pavel N. Krivitsky. Last updated 22 days ago.

5.5 match 100 stars 15.36 score 1.4k scripts 36 dependents

andrisignorell

DescTools:Tools for Descriptive Statistics

A collection of miscellaneous basic statistic functions and convenience wrappers for efficiently describing data. The author's intention was to create a toolbox, which facilitates the (notoriously time consuming) first descriptive tasks in data analysis, consisting of calculating descriptive statistics, drawing graphical summaries and reporting the results. The package contains furthermore functions to produce documents using MS Word (or PowerPoint) and functions to import data from Excel. Many of the included functions can be found scattered in other packages and other sources written partly by Titans of R. The reason for collecting them here, was primarily to have them consolidated in ONE instead of dozens of packages (which themselves might depend on other packages which are not needed at all), and to provide a common and consistent interface as far as function and arguments naming, NA handling, recycling rules etc. are concerned. Google style guides were used as naming rules (in absence of convincing alternatives). The 'BigCamelCase' style was consequently applied to functions borrowed from contributed R packages as well.

Maintained by Andri Signorell. Last updated 4 days ago.

fortran cpp

4.8 match 86 stars 16.73 score 7.7k scripts 101 dependents

r-gregmisc

gtools:Various R Programming Tools

Functions to assist in R programming, including:
- assist in developing, updating, and maintaining R and R packages ('ask', 'checkRVersion', 'getDependencies', 'keywords', 'scat'),
- calculate the logit and inverse logit transformations ('logit', 'inv.logit'),
- test if a value is missing, empty or contains only NA and NULL values ('invalid'),
- manipulate R's .Last function ('addLast'),
- define macros ('defmacro'),
- detect odd and even integers ('odd', 'even'),
- convert strings containing non-ASCII characters (like single quotes) to plain ASCII ('ASCIIfy'),
- perform a binary search ('binsearch'),
- sort strings containing both numeric and character components ('mixedsort'),
- create a factor variable from the quantiles of a continuous variable ('quantcut'),
- enumerate permutations and combinations ('combinations', 'permutation'),
- calculate and convert between fold-change and log-ratio ('foldchange', 'logratio2foldchange', 'foldchange2logratio'),
- calculate probabilities and generate random numbers from Dirichlet distributions ('rdirichlet', 'ddirichlet'),
- apply a function over adjacent subsets of a vector ('running'),
- modify the TCP_NODELAY ('de-Nagle') flag for socket objects,
- efficient 'rbind' of data frames, even if the column names don't match ('smartbind'),
- generate significance stars from p-values ('stars.pval'),
- convert characters to/from ASCII codes ('asc', 'chr'),
- convert character vector to ASCII representation ('ASCIIfy'),
- apply title capitalization rules to a character vector ('capwords').

Maintained by Ben Bolker. Last updated 9 months ago.

5.3 match 25 stars 14.47 score 11k scripts 1.1k dependents

bnosac

udpipe:Tokenization, Parts of Speech Tagging, Lemmatization and Dependency Parsing with the 'UDPipe' 'NLP' Toolkit

This natural language processing toolkit provides language-agnostic 'tokenization', 'parts of speech tagging', 'lemmatization' and 'dependency parsing' of raw text. Next to text parsing, the package also allows you to train annotation models based on data of 'treebanks' in 'CoNLL-U' format as provided at <https://universaldependencies.org/format.html>. The techniques are explained in detail in the paper: 'Tokenizing, POS Tagging, Lemmatizing and Parsing UD 2.0 with UDPipe', available at <doi:10.18653/v1/K17-3009>. The toolkit also contains functionalities for commonly used data manipulations on texts which are enriched with the output of the parser. Namely functionalities and algorithms for collocations, token co-occurrence, document term matrix handling, term frequency inverse document frequency calculations, information retrieval metrics (Okapi BM25), handling of multi-word expressions, keyword detection (Rapid Automatic Keyword Extraction, noun phrase extraction, syntactical patterns) sentiment scoring and semantic similarity analysis.

Maintained by Jan Wijffels. Last updated 2 years ago.

conll dependency-parser lemmatization natural-language-processing nlp pos-tagging r-pkg rcpp text-mining tokenizer udpipe cpp

6.3 match 215 stars 11.83 score 1.2k scripts 9 dependents

ecor

geotopbricks:An R Plug-in for the Distributed Hydrological Model GEOtop

It analyzes raster maps and other information as input/output files from the Hydrological Distributed Model GEOtop. It contains functions and methods to import maps and other keywords from geotop.inpts file. Some examples with simulation cases of GEOtop 2.x/3.x are presented in the package. Any information about the GEOtop Distributed Hydrological Model source code is available on www.geotop.org. Technical details about the model are available in Endrizzi et al (2014) <https://gmd.copernicus.org/articles/7/2831/2014/gmd-7-2831-2014.html>.

Maintained by Emanuele Cordano. Last updated 2 months ago.

15.9 match 4 stars 4.65 score 112 scripts

keyatm

keyATM:Keyword Assisted Topic Models

Fits keyword assisted topic models (keyATM) using collapsed Gibbs samplers. The keyATM combines the latent dirichlet allocation (LDA) models with a small number of keywords selected by researchers in order to improve the interpretability and topic classification of the LDA. The keyATM can also incorporate covariates and directly model time trends. The keyATM is proposed in Eshima, Imai, and Sasaki (2024) <doi:10.1111/ajps.12779>.

Maintained by Shusei Eshima. Last updated 1 day ago.

latent-dirichlet-allocation natural-language-processing political-science rcpp rcppeigen social-science topic-models cpp

9.9 match 107 stars 6.67 score 63 scripts

bioc

flowWorkspace:Infrastructure for representing and interacting with gated and ungated cytometry data sets.

This package is designed to facilitate comparison of automated gating methods against manual gating done in flowJo. This package allows you to import basic flowJo workspaces into BioConductor and replicate the gating from flowJo using the flowCore functionality. Gating hierarchies, groups of samples, compensation, and transformation are performed so that the output matches the flowJo analysis.

Maintained by Greg Finak. Last updated 25 days ago.

immunooncology flowcytometry dataimport preprocessing datarepresentation zlib openblas cpp

8.3 match 7.89 score 576 scripts 10 dependents

bnosac

textrank:Summarize Text by Ranking Sentences and Finding Keywords

The 'textrank' algorithm is an extension of the 'Pagerank' algorithm for text. The algorithm summarizes text by calculating how sentences are related to one another. This is done by looking at overlapping terminology used in sentences in order to set up links between sentences. The resulting sentence network is then plugged into the 'Pagerank' algorithm, which identifies the most important sentences in your text and ranks them. In a similar way, 'textrank' can also be used to extract keywords. A word network is constructed by checking whether words follow one another. The 'Pagerank' algorithm is applied to that network to extract relevant words, after which relevant words that follow one another are combined to get keywords. More information can be found in the paper from Mihalcea, Rada & Tarau, Paul (2004) <https://www.aclweb.org/anthology/W04-3252/>.

Maintained by Jan Wijffels. Last updated 4 years ago.

natural-language-processing nlp textrank textrank-algorithm

8.2 match 77 stars 7.38 score 103 scripts 2 dependents
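The keyword-extraction idea in the description above (link words that follow one another, then rank the resulting word network with PageRank) can be sketched in a few lines. This is a minimal illustration in Python, not the 'textrank' package's API: the function name `textrank_keywords`, the damping factor of 0.85 and the fixed iteration count are all assumptions, and the final step of merging adjacent relevant words into multi-word keywords is left out.

```python
from collections import defaultdict


def textrank_keywords(words, top_n=3, damping=0.85, iterations=50):
    """Rank words by PageRank on an undirected 'follows' network.

    Hypothetical helper for illustration; parameters are assumed defaults.
    """
    # Link every pair of words that appear next to each other.
    neighbours = defaultdict(set)
    for left, right in zip(words, words[1:]):
        if left != right:
            neighbours[left].add(right)
            neighbours[right].add(left)

    # Power iteration: each word shares its rank equally among its neighbours.
    rank = {word: 1.0 for word in neighbours}
    for _ in range(iterations):
        rank = {
            word: (1 - damping)
            + damping * sum(rank[v] / len(neighbours[v]) for v in neighbours[word])
            for word in neighbours
        }

    # Highest-ranked words first.
    return sorted(rank, key=rank.get, reverse=True)[:top_n]


tokens = "keyword extraction ranks keyword candidates keyword graph".split()
print(textrank_keywords(tokens, top_n=1))
```

In practice the input tokens would first be filtered (for example to nouns and adjectives), which this sketch assumes has already happened.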

manalytics

opitools:Analyzing the Opinions in a Big Text Document

Designed for performing impact analysis of opinions in a digital text document (DTD). The package allows a user to assess the extent to which a theme or subject within a document impacts the overall opinion expressed in the document. The package can be applied to a wide range of opinion-based DTD, including commentaries on social media platforms (such as 'Facebook', 'Twitter' and 'Youtube'), online product reviews, and so on. The utility of 'opitools' was originally demonstrated in Adepeju and Jimoh (2021) <doi:10.31235/osf.io/c32qh> in the assessment of COVID-19 impacts on neighbourhood policing using Twitter data. Further examples can be found in the vignette of the package.

Maintained by Monsuru Adepeju. Last updated 2 years ago.

10.8 match 12 stars 5.30 score 11 scripts

emilhvitfeldt

emoji:Data and Function to Work with Emojis

Contains data about emojis with relevant metadata, and functions to work with emojis when they are in strings.

Maintained by Emil Hvitfeldt. Last updated 5 months ago.

6.4 match 28 stars 7.93 score 304 scripts 3 dependents

nano-optics

terms:T-matrix for Electromagnetic Radiation with Multiple Scatterers

A set of Fortran modules/routines for T-matrix-based calculations of light scattering by clusters of individual scatterers.

Maintained by Baptiste Auguié. Last updated 8 months ago.

light scattering

7.8 match 7 stars 6.46 score 828 scripts

bioc

ncdfFlow:ncdfFlow: A package that provides HDF5 based storage for flow cytometry data.

Provides HDF5 storage based methods and functions for manipulation of flow cytometry data.

Maintained by Mike Jiang. Last updated 3 months ago.

immunooncology flowcytometry zlib cpp

6.3 match 7.56 score 96 scripts 11 dependents

datawookie

emayili:Send Email Messages

A light, simple tool for sending emails with minimal dependencies.

Maintained by Andrew B. Collier. Last updated 2 months ago.

hacktoberfest

4.9 match 180 stars 9.59 score 95 scripts 3 dependents

quanteda

quanteda:Quantitative Analysis of Textual Data

A fast, flexible, and comprehensive framework for quantitative text analysis in R. Provides functionality for corpus management, creating and manipulating tokens and n-grams, exploring keywords in context, forming and manipulating sparse matrices of documents by features and feature co-occurrences, analyzing keywords, computing feature similarities and distances, applying content dictionaries, applying supervised and unsupervised machine learning, visually representing text and text analyses, and more.

Maintained by Kenneth Benoit. Last updated 3 months ago.

corpus natural-language-processing quanteda text-analytics onetbb cpp

2.8 match 851 stars 16.65 score 5.4k scripts 52 dependents

massimoaria

bibliometrix:Comprehensive Science Mapping Analysis

Tool for quantitative research in scientometrics and bibliometrics. It implements the comprehensive workflow for science mapping analysis proposed in Aria M. and Cuccurullo C. (2017) <doi:10.1016/j.joi.2017.08.007>. 'bibliometrix' provides various routines for importing bibliographic data from 'SCOPUS', 'Clarivate Analytics Web of Science' (<https://www.webofknowledge.com/>), 'Digital Science Dimensions' (<https://www.dimensions.ai/>), 'OpenAlex' (<https://openalex.org/>), 'Cochrane Library' (<https://www.cochranelibrary.com/>), 'Lens' (<https://lens.org>), and 'PubMed' (<https://pubmed.ncbi.nlm.nih.gov/>) databases, performing bibliometric analysis and building networks for co-citation, coupling, scientific collaboration and co-word analysis.

Maintained by Massimo Aria. Last updated 13 days ago.

bibliometric-analysis bibliometrics citation citation-network citations co-authors co-occurence co-word-analysis correspondence-analysis coupling isi-web journal manuscript quantitative-analysis scholars science science-mapping scientific scientometrics scopus

3.7 match 545 stars 12.54 score 518 scripts 2 dependents

ha-pu

globaltrends:Download and Measure Global Trends Through Google Search Volumes

Google offers public access to global search volumes from its search engine through the Google Trends portal. The package downloads these search volumes provided by Google Trends and uses them to measure and analyze the distribution of search scores across countries or within countries. The package allows researchers and analysts to use these search scores to investigate global trends based on patterns within these scores. This offers insights such as degree of internationalization of firms and organizations or dissemination of political, social, or technological trends across the globe or within single countries. An outline of the package's methodological foundations and potential applications is available as a working paper: <https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3969013>.

Maintained by Harald Puhr. Last updated 2 years ago.

google-trends internationalization

9.1 match 18 stars 5.00 score 11 scripts

skoval

RISmed:Download Content from NCBI Databases

A set of tools to extract bibliographic content from the National Center for Biotechnology Information (NCBI) databases, including PubMed. The name RISmed is a portmanteau of RIS (for Research Information Systems, a common tag format for bibliographic data) and PubMed.

Maintained by Stephanie Kovalchik. Last updated 3 years ago.

6.5 match 38 stars 6.94 score 252 scripts 3 dependents

statisticsgreenland

pxmake:Make PX-Files in R

Create PX-files from scratch or read and modify existing ones. Includes a function for every PX keyword, making metadata manipulation simple and human-readable.

Maintained by Johan Ejstrud. Last updated 4 days ago.

6.5 match 9 stars 6.95 score 11 scripts

dcomtois

summarytools:Tools to Quickly and Neatly Summarize Data

Data frame summaries, cross-tabulations, weight-enabled frequency tables and common descriptive (univariate) statistics in concise tables available in a variety of formats (plain ASCII, Markdown and HTML). A good point-of-entry for exploring data, both for experienced and new R users.

Maintained by Dominic Comtois. Last updated 6 days ago.

descriptive-statistics frequency-table html-report markdown pander pandoc pandoc-markdown rmarkdown rstudio

3.1 match 527 stars 14.62 score 2.9k scripts 6 dependents

selesnow

rgoogleads:Loading Data from 'Google Ads API'

Interface for loading data from 'Google Ads API', see <https://developers.google.com/google-ads/api/docs/start>. The package provides functions for authorization and loading reports.

Maintained by Alexey Seleznev. Last updated 3 months ago.

6.9 match 14 stars 6.40 score 15 scripts 1 dependent

ropensci

awardFindR:awardFindR

Queries a number of scientific awards databases. Collects relevant results based on keyword and date parameters, returns list of projects that fit those criteria as a data frame. Sources include: Arnold Ventures, Carnegie Corp, Federal RePORTER, Gates Foundation, MacArthur Foundation, Mellon Foundation, NEH, NIH, NSF, Open Philanthropy, Open Society Foundations, Rockefeller Foundation, Russell Sage Foundation, Robert Wood Johnson Foundation, Sloan Foundation, Social Science Research Council, John Templeton Foundation, and USASpending.gov.

Maintained by Michael McCall. Last updated 1 year ago.

9.7 match 16 stars 4.38 score 3 scripts

chambm

AhoCorasickTrie:Fast Searching for Multiple Keywords in Multiple Texts

Aho-Corasick is an optimal algorithm for finding many keywords in a text. It can locate all matches in a text in O(N+M) time; i.e., the time needed scales linearly with the number of keywords (N) and the size of the text (M). Compare this to the naive approach which takes O(N*M) time to loop through each pattern and scan for it in the text. This implementation builds the trie (the generic name of the data structure) and runs the search in a single function call. If you want to search multiple texts with the same trie, the function will take a list or vector of texts and return a list of matches to each text. By default, all 128 ASCII characters are allowed in both the keywords and the text. A more efficient trie is possible if the alphabet size can be reduced. For example, DNA sequences use at most 19 distinct characters and usually only 4; protein sequences use at most 26 distinct characters and usually only 20. UTF-8 (Unicode) matching is not currently supported.

Maintained by Matt Chambers. Last updated 2 months ago.

cpp

9.0 match 10 stars 4.65 score 15 scripts 2 dependents
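To make the Aho-Corasick description above concrete, here is a minimal Python sketch of the technique: build a trie of the keywords, add failure links with a breadth-first pass, then scan the text once. It is an illustration under stated assumptions, not the AhoCorasickTrie package's interface; the names `build_automaton` and `find_all` are hypothetical, and positions are reported as 0-based start offsets.

```python
from collections import deque


def build_automaton(keywords):
    """Build trie transitions, failure links and per-node outputs."""
    goto, fail, out = [{}], [0], [[]]
    for word in keywords:
        node = 0
        for ch in word:
            if ch not in goto[node]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[node][ch] = len(goto) - 1
            node = goto[node][ch]
        out[node].append(word)

    # Breadth-first pass: a node's failure link points to the longest
    # proper suffix of its path that is also a path in the trie.
    queue = deque(goto[0].values())
    while queue:
        node = queue.popleft()
        for ch, child in goto[node].items():
            f = fail[node]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[child] = goto[f].get(ch, 0)
            out[child] = out[child] + out[fail[child]]
            queue.append(child)
    return goto, fail, out


def find_all(text, keywords):
    """Return (start_index, keyword) for every match, in one pass over text."""
    goto, fail, out = build_automaton(keywords)
    node, matches = 0, []
    for i, ch in enumerate(text):
        while node and ch not in goto[node]:
            node = fail[node]
        node = goto[node].get(ch, 0)
        for word in out[node]:
            matches.append((i - len(word) + 1, word))
    return matches


print(find_all("ushers", ["he", "she", "his", "hers"]))
```

The sketch stores transitions in dictionaries, so it does not exploit a reduced alphabet the way the description suggests a more efficient trie could.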

nlmixr2

rxode2:Facilities for Simulating from ODE-Based Models

Facilities for running simulations from ordinary differential equation ('ODE') models, such as pharmacometrics and other compartmental models. A compilation manager translates the ODE model into C, compiles it, and dynamically loads the object code into R for improved computational efficiency. An event table object facilitates the specification of complex dosing regimens (optional) and sampling schedules. NB: The use of this package requires both C and Fortran compilers, for details on their use with R please see Section 6.3, Appendix A, and Appendix D in the "R Administration and Installation" manual. Also the code is mostly released under GPL. The 'VODE' and 'LSODA' are in the public domain. The information is available in the inst/COPYRIGHTS.

Maintained by Matthew L. Fidler. Last updated 2 months ago.

fortran openblas cpp openmp

3.5 match 40 stars 11.24 score 220 scripts 13 dependents

cran

discoverableresearch:Checks Title, Abstract and Keywords to Optimise Discoverability

A suite of tools is provided here to support authors in making their research more discoverable. check_keywords() - this function checks the keywords to assess whether they are already represented in the title and abstract. check_fields() - this function compares terminology used across the title, abstract and keywords to assess where terminological diversity (i.e. the use of synonyms) could increase the likelihood of the record being identified in a search. The function looks for terms in the title and abstract that also exist in other fields and highlights these as needing attention. suggest_keywords() - this function takes a full text document and produces a list of unigrams, bigrams and trigrams (1-, 2- or 3-word phrases) present in the full text after removing stop words (words with a low utility in natural language processing) that do not occur in the title or abstract that may be suitable candidates for keywords. suggest_title() - this function takes a full text document and produces a list of the most frequently used unigrams, bigrams and trigrams after removing stop words that do not occur in the abstract or keywords that may be suitable candidates for title words. check_title() - this function carries out a number of sub tasks: 1) it compares the length (number of words) of the title with the mean length of titles in major bibliographic databases to assess whether the title is likely to be too short; 2) it assesses the proportion of stop words in the title to highlight titles with low utility in search engines that strip out stop words; 3) it compares the title with a given sample of record titles from an .ris import and calculates a similarity score based on phrase overlap. This highlights the level of uniqueness of the title. This version of the package also contains functions currently in a non-CRAN package called 'litsearchr' <https://github.com/elizagrames/litsearchr>.

Maintained by Neal Haddaway. Last updated 4 years ago.

14.5 match 2.70 score

r-dbi

DBI:R Database Interface

A database interface definition for communication between R and relational database management systems. All classes in this package are virtual and need to be extended by the various R/DBMS implementations.

Maintained by Kirill Müller. Last updated 14 days ago.

database interface

1.8 match 302 stars 20.87 score 19k scripts 2.9k dependents

agusnieto77

ACEP:Análisis Computacional de Eventos de Protesta

The 'ACEP' library contains specific functions to perform computational analysis of protest events. It also contains databases with collections of notes on protests and dictionaries of conflicting words. The collection of dictionaries brings together dictionaries from different sources.

Maintained by Agustín Nieto. Last updated 1 year ago.

computer-aided-detection conflict-analysis conflict-detection dictionaries nlp-keywords-extraction protest-events text-mining visualization

6.7 match 10 stars 5.48 score 9 scripts

bioc

GenomicSuperSignature:Interpretation of RNA-seq experiments through robust, efficient comparison to public databases

This package provides a novel method for interpreting new transcriptomic datasets through near-instantaneous comparison to public archives without high-performance computing requirements. Through the pre-computed index, users can identify public resources associated with their dataset such as gene sets, MeSH term, and publication. Functions to identify interpretable annotations and intuitive visualization options are implemented in this package.

Maintained by Sehyun Oh. Last updated 5 months ago.

transcriptomics systemsbiology principalcomponent rnaseq sequencing pathways clustering bioconductor-package exploratory-data-analysis gsea mesh principal-component-analysis rna-sequencing-profiles transferlearning u24ca289073

4.8 match 16 stars 6.97 score 59 scripts

rjournal

rjtools:Preparing, Checking, and Submitting Articles to the 'R Journal'

Create an 'R Journal' 'Rmarkdown' template article, that will generate html and pdf versions of your paper. Check that the paper folder has all the required components needed for submission. Examples of 'R Journal' publications can be found at <https://journal.r-project.org>.

Maintained by Di Cook. Last updated 2 months ago.

3.8 match 33 stars 8.81 score 37 scripts 1 dependent

blasbenito

distantia:Advanced Toolset for Efficient Time Series Dissimilarity Analysis

Fast C++ implementation of Dynamic Time Warping for time series dissimilarity analysis, with applications in environmental monitoring and sensor data analysis, climate science, signal processing and pattern recognition, and financial data analysis. Built upon the ideas presented in Benito and Birks (2020) <doi:10.1111/ecog.04895>, provides tools for analyzing time series of varying lengths and structures, including irregular multivariate time series. Key features include individual variable contribution analysis, restricted permutation tests for statistical significance, and imputation of missing data via GAMs. Additionally, the package provides an ample set of tools to prepare and manage time series data.

Maintained by Blas M. Benito. Last updated 1 month ago.

5.8 match 23 stars 5.73 score 11 scripts
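Dynamic Time Warping, the technique named in the description above, can be illustrated with the classic dynamic-programming recurrence. The sketch below is plain Python for two univariate series and uses absolute difference as the local cost; both choices are assumptions made for brevity and say nothing about how 'distantia' itself is implemented. The function name `dtw_distance` is hypothetical.

```python
def dtw_distance(a, b):
    """Dynamic Time Warping cost between two numeric sequences.

    cost[i][j] holds the cheapest alignment of a[:i] with b[:j].
    """
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: step in a only, step in b only, or both.
            cost[i][j] = local + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    return cost[n][m]


# Series of different lengths can still align perfectly.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
```

Because the warping path may repeat elements, series of varying lengths can be compared directly, which is the property the description highlights.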

ropensci

restez:Create and Query a Local Copy of 'GenBank' in R

Download large sections of 'GenBank' <https://www.ncbi.nlm.nih.gov/genbank/> and generate a local SQL-based database. A user can then query this database using 'restez' functions or through 'rentrez' <https://CRAN.R-project.org/package=rentrez> wrappers.

Maintained by Joel H. Nitta. Last updated 25 days ago.

dna entrez genbank sequence

4.5 match 26 stars 7.01 score 175 scripts 1 dependent

oobianom

r2symbols:Symbols for 'Markdown' and 'Shiny' Application

Direct insertion of over 1000 symbols (e.g. currencies, letters, emojis, arrows, mathematical symbols and so on) into 'Rmarkdown' documents and 'Shiny' applications by incorporating 'HTML' hex codes.

Maintained by Obinna Obianom. Last updated 2 years ago.

4.6 match 11 stars 6.67 score 94 scripts 1 dependent

mountainmath

cancensus:Access, Retrieve, and Work with Canadian Census Data and Geography

Integrated, convenient, and uniform access to Canadian Census data and geography retrieved using the 'CensusMapper' API. This package produces analysis-ready tidy data frames and spatial data in multiple formats, as well as convenience functions for working with Census variables, variable hierarchies, and region selection. API keys are freely available with free registration at <https://censusmapper.ca/api>. Census data and boundary geometries are reproduced and distributed on an "as is" basis with the permission of Statistics Canada (Statistics Canada 2001; 2006; 2011; 2016; 2021).

Maintained by Dmitry Shkolnik. Last updated 1 year ago.

3.2 match 82 stars 8.80 score 414 scripts

merck

metalite:ADaM Metadata Structure

A metadata structure for clinical data analysis and reporting based on Analysis Data Model (ADaM) datasets. The package simplifies clinical analysis and reporting tool development by defining standardized inputs, outputs, and workflow. The package can be used to create analysis and reporting planning grid, mock table, and validated analysis and reporting results based on consistent inputs.

Maintained by Yujie Zhao. Last updated 7 months ago.

cdisc clinical-trials metadata

3.1 match 15 stars 8.89 score 57 scripts 5 dependents

crew102

slowraker:A Slow Version of the Rapid Automatic Keyword Extraction (RAKE) Algorithm

A mostly pure-R implementation of the RAKE algorithm (Rose, S., Engel, D., Cramer, N. and Cowley, W. (2010) <doi:10.1002/9780470689646.ch1>), which can be used to extract keywords from documents without any training data.

Maintained by Christopher Baker. Last updated 7 months ago.

openjdk

5.0 match 6 stars 5.37 score 13 scripts 1 dependent
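The RAKE algorithm referenced above needs no training data because it relies only on stop words and punctuation to delimit candidate phrases, then scores each word by its degree divided by its frequency. The following Python sketch shows that scoring under simplifying assumptions: the tiny `STOP_WORDS` set and the function name `rake` are hypothetical and are not taken from 'slowraker', which may tokenize and filter differently.

```python
import re
from collections import defaultdict

# Deliberately small, illustrative stop-word list (an assumption).
STOP_WORDS = {"a", "an", "and", "the", "of", "for", "in", "is", "to",
              "from", "be", "can", "which", "used", "without", "any"}


def rake(text, stop_words=STOP_WORDS):
    """Return (phrase, score) pairs, highest score first."""
    # 1. Split into candidate phrases at punctuation and stop words.
    phrases = []
    for fragment in re.split(r"[^\w\s-]+", text.lower()):
        current = []
        for word in fragment.split():
            if word in stop_words:
                if current:
                    phrases.append(tuple(current))
                    current = []
            else:
                current.append(word)
        if current:
            phrases.append(tuple(current))

    # 2. Word score = degree / frequency, where degree counts the words
    #    sharing a phrase with this word (including the word itself).
    freq, degree = defaultdict(int), defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)

    # 3. Phrase score = sum of its word scores.
    scores = {
        " ".join(phrase): sum(degree[w] / freq[w] for w in phrase)
        for phrase in set(phrases)
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


sentence = "Rapid automatic keyword extraction is used to extract keywords from documents."
print(rake(sentence)[0])
```

Longer phrases made of rarely repeated words score highest, which is why multi-word technical terms tend to surface first.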

andreacapozio

TMDb:Access to TMDb API

Provides an R-interface to the TMDb API (see TMDb API on <https://developers.themoviedb.org/3/getting-started/introduction>). The Movie Database (TMDb) is a popular user editable database for movies and TV shows (see <https://www.themoviedb.org>).

Maintained by Andrea Capozio. Last updated 5 years ago.

12.7 match 2.00 score 99 scripts

larmarange

labelled:Manipulating Labelled Data

Work with labelled data imported from 'SPSS' or 'Stata' with 'haven' or 'foreign'. This package provides useful functions to deal with "haven_labelled" and "haven_labelled_spss" classes introduced by 'haven' package.

Maintained by Joseph Larmarange. Last updated 1 month ago.

haven labels metadata sas spss stata

1.7 match 76 stars 15.04 score 2.4k scripts 98 dependents

jbgruber

LexisNexisTools:Working with Files from 'LexisNexis'

My PhD supervisor once told me that everyone doing newspaper analysis starts by writing code to read in files from the 'LexisNexis' newspaper archive (retrieved e.g., from <https://www.lexisnexis.com/> or any of the partner sites). However, while this is a nice exercise I do recommend, not everyone has the time. This package takes files downloaded from the newspaper archive of 'LexisNexis', reads them into R and offers functions for further processing.

Maintained by Johannes B. Gruber. Last updated 12 months ago.

text-analysis

3.5 match 107 stars 7.14 score 65 scripts

polmine

polmineR:Verbs and Nouns for Corpus Analysis

Package for corpus analysis using the Corpus Workbench ('CWB', <https://cwb.sourceforge.io>) as an efficient back end for indexing and querying large corpora. The package offers functionality to flexibly create subcorpora and to carry out basic statistical operations (count, co-occurrences etc.). The original full text of documents can be reconstructed and inspected at any time. Beyond that, the package is intended to serve as an interface to packages implementing advanced statistical procedures. Respective data structures (document-term matrices, term-co-occurrence matrices etc.) can be created based on the indexed corpora.

Maintained by Andreas Blaette. Last updated 1 year ago.

3.1 match 49 stars 7.96 score 311 scripts

dwulff

text2sdg:Detecting UN Sustainable Development Goals in Text

The United Nations’ Sustainable Development Goals (SDGs) have become an important guideline for organisations to monitor and plan their contributions to social, economic, and environmental transformations. The 'text2sdg' package is an open-source analysis package that identifies SDGs in text using scientifically developed query systems, opening up the opportunity to monitor any type of text-based data, such as scientific output or corporate publications. For more information regarding the methodology see Meier, Mata & Wulff (2022) <arXiv:2110.05856>.

Maintained by Dominik S. Meier. Last updated 7 months ago.

natural-language-processing sustainability sustainable-development sustainable-development-goals

4.0 match 18 stars 6.13 score 9 scripts

bioc

KEGGREST:Client-side REST access to the Kyoto Encyclopedia of Genes and Genomes (KEGG)

A package that provides a client interface to the Kyoto Encyclopedia of Genes and Genomes (KEGG) REST API. Only for academic use by academic users belonging to academic institutions (see <https://www.kegg.jp/kegg/rest/>). Note that KEGGREST is based on KEGGSOAP by J. Zhang, R. Gentleman, and Marc Carlson, and KEGG (python package) by Aurelien Mazurie.

Maintained by Bioconductor Package Maintainer. Last updated 5 months ago.

annotation pathways thirdpartyclient kegg bioconductor-package core-package

1.7 match 10 stars 13.50 score 688 scripts 771 dependents

epiverse-trace

ColOpenData:Download Colombian Demographic, Climate and Geospatial Data

Downloads wrangled Colombian socioeconomic, geospatial, population and climate data from DANE <https://www.dane.gov.co/> (National Administrative Department of Statistics) and IDEAM <https://ideam.gov.co> (Institute of Hydrology, Meteorology and Environmental Studies). It solves the problem of Colombian data being issued in different web pages and sources by using functions that allow the user to select the desired database and download it without having to do the exhausting acquisition process.

Maintained by Maria Camila Tavera-Cifuentes. Last updated 2 months ago.

climate colombia data-package demographics maps

2.9 match 11 stars 7.44 score 17 scripts

taylor-arnold

ctrialsgov:Query Data from U.S. National Library of Medicine's Clinical Trials Database

Tools to create and query a database from the U.S. National Library of Medicine's Clinical Trials database <https://clinicaltrials.gov/>. Functions provide access to a variety of techniques for searching the data using range queries, categorical filtering, and by searching for full-text keywords. Minimal graphical tools are also provided for interactively exploring the constructed data.

Maintained by Taylor Arnold. Last updated 3 years ago.

6.3 match 3.46 score 29 scripts

bioc

AnVILWorkflow:Run workflows implemented in Terra/AnVIL workspace

The AnVIL is a cloud computing resource developed in part by the National Human Genome Research Institute. The main cloud-based genomics platform deployed by the AnVIL project is Terra. The AnVILWorkflow package allows remote access to Terra implemented workflows, enabling end users to utilize Terra/AnVIL provided resources - such as data, workflows, and flexible/scalable computing resources - through the conventional R functions.

Maintained by Sehyun Oh. Last updated 1 month ago.

infrastructure software anvil gcp terra workflows

3.6 match 6 stars 5.95 score 1 scripts

business-science

tibbletime:Time Aware Tibbles

Built on top of the 'tibble' package, 'tibbletime' is an extension that allows for the creation of time aware tibbles. Some immediate advantages of this include: the ability to perform time-based subsetting on tibbles, quickly summarising and aggregating results by time periods, and creating columns that can be used as 'dplyr' time-based groups.

Maintained by Davis Vaughan. Last updated 4 months ago.

periodicity tibble time time-series timeseries cpp

2.0 match 177 stars 10.51 score 644 scripts 2 dependents

bioc

SBGNview:"SBGNview: Data Analysis, Integration and Visualization on SBGN Pathways"

SBGNview is a tool set for pathway based data visualization, integration and analysis. SBGNview is similar and complementary to the widely used Pathview, with the following key features: 1. Pathway definition by the widely adopted Systems Biology Graphical Notation (SBGN); 2. Supports multiple major pathway databases beyond KEGG (Reactome, MetaCyc, SMPDB, PANTHER, METACROP) and user defined pathways; 3. Covers 5,200 reference pathways and over 3,000 species by default; 4. Extensive graphics controls, including glyph and edge attributes, graph layout and sub-pathway highlight; 5. SBGN pathway data manipulation, processing, extraction and analysis.

Maintained by Weijun Luo. Last updated 5 months ago.

genetarget pathways graphandnetwork visualization genesetenrichment differentialexpression geneexpression microarray rnaseq genetics metabolomics proteomics systemsbiology sequencing

3.3 match 26 stars 6.23 score 22 scripts

ingmarboeschen

JATSdecoder:A Metadata and Text Extraction and Manipulation Tool Set

Provides a function collection to extract metadata, sectioned text and study characteristics from scientific articles in 'NISO-JATS' format. Articles in PDF format can be converted to 'NISO-JATS' with the 'Content ExtRactor and MINEr' ('CERMINE', <https://github.com/CeON/CERMINE>). For convenience, two functions bundle the extraction heuristics: JATSdecoder() converts 'NISO-JATS'-tagged XML files to a structured list with elements title, author, journal, history, 'DOI', abstract, sectioned text and reference list. study.character() extracts multiple study characteristics like number of included studies, statistical methods used, alpha error, power, statistical results, correction method for multiple testing, software used. An estimation of the involved sample size is performed based on reports within the abstract and the reported degrees of freedom within statistical results. In addition, the package contains some useful functions to process text (text2sentences(), text2num(), ngram(), strsplit2(), grep2()). See Böschen, I. (2021) <doi:10.1007/s11192-021-04162-z>, Böschen, I. (2021) <doi:10.1038/s41598-021-98782-3> and Böschen, I. (2023) <doi:10.1038/s41598-022-27085-y>.

Maintained by Ingmar Böschen. Last updated 21 days ago.

cermine niso-jats pubmedcentral text-extraction text-mining xml-files openjdk

4.5 match 18 stars 4.56 score 7 scripts

pwwang

plotthis:High-Level Plotting Built Upon 'ggplot2' and Other Plotting Packages

Provides high-level API and a wide range of options to create stunning, publication-quality plots effortlessly. It is built upon 'ggplot2' and other plotting packages, and is designed to be easy to use and to work seamlessly with 'ggplot2' objects. It is particularly useful for creating complex plots with multiple layers, facets, and annotations. It also provides a set of functions to create plots for specific types of data, such as Venn diagrams, alluvial diagrams, and phylogenetic trees. The package is designed to be flexible and customizable, and to work well with the 'ggplot2' ecosystem. The API can be found at <https://pwwang.github.io/plotthis/reference/index.html>.

Maintained by Panwen Wang. Last updated 7 hours ago.

ggplot2 plotting single-cell

3.5 match 43 stars 5.61 score 2 scripts

ropensci

frictionless:Read and Write Frictionless Data Packages

Read and write Frictionless Data Packages. A 'Data Package' (<https://specs.frictionlessdata.io/data-package/>) is a simple container format and standard to describe and package a collection of (tabular) data. It is typically used to publish FAIR (<https://www.go-fair.org/fair-principles/>) and open datasets.

Maintained by Peter Desmet. Last updated 6 months ago.

frictionlessdata oscibio

2.0 match 30 stars 9.79 score 55 scripts 6 dependents

aravind-j

PGRdup:Discover Probable Duplicates in Plant Genetic Resources Collections

Provides functions to aid the identification of probable/possible duplicates in Plant Genetic Resources (PGR) collections using 'passport databases' comprising information records of each constituent sample. These include methods for cleaning the data, creation of a searchable Key Word in Context (KWIC) index of keywords associated with sample records and the identification of nearly identical records with similar information by fuzzy, phonetic and semantic matching of keywords.

Maintained by J. Aravind. Last updated 2 years ago.

double-metaphone double-metaphone-algorithm natural-language-processing pgr plant-genetic-resources record-linkage

4.8 match 1 stars 4.06 score 23 scripts

ropensci

cffr:Generate Citation File Format ('cff') Metadata for R Packages

The Citation File Format version 1.2.0 <doi:10.5281/zenodo.5171937> is a human and machine readable file format which provides citation metadata for software. This package provides core utilities to generate and validate this metadata.

Maintained by Diego Hernangómez. Last updated 20 hours ago.

attribution citation credit citation-files cff metadata citation-file-format ropensci

2.0 match 26 stars 9.65 score 116 scripts 3 dependents

ropenspain

opendataes:Interact with the datos.gob.es API to download public data from all of Spain

Easily interact with the API from http://datos.gob.es to download data from over 19,000 files covering all the provinces of Spain.

Maintained by Jorge Cimentada. Last updated 4 years ago.

5.4 match 20 stars 3.60 score 9 scripts

ropensci

openalexR:Getting Bibliographic Records from 'OpenAlex' Database Using 'DSL' API

A set of tools to extract bibliographic content from the 'OpenAlex' database using its API <https://docs.openalex.org>.

Maintained by Massimo Aria. Last updated 7 days ago.

bibliographic-data bibliographic-database bibliometrics bibliometrix science-mapping

1.8 match 110 stars 10.34 score 194 scripts 5 dependents

alex-danilin

serpstatr:'Serpstat' API Wrapper

The primary goal of 'Serpstat' API <https://serpstat.com/api/> is to reduce manual SEO (search engine optimization) and PPC (pay-per-click) tasks. You can automate your keyword research or competitor analysis with this API wrapper.

Maintained by Alex Danilin. Last updated 8 months ago.

10.7 match 1.70 score 2 scripts

laresbernardo

lares:Analytics & Machine Learning Sidekick

Auxiliary package for better/faster analytics, visualization, data mining, and machine learning tasks. With a wide variety of function families, like Machine Learning, Data Wrangling, Marketing Mix Modeling (Robyn), Exploratory, API, and Scraper, it helps the analyst or data scientist to get quick and robust results, without the need for repetitive coding or advanced R programming skills.

Maintained by Bernardo Lares. Last updated 1 month ago.

analytics api automation automl data-science descriptive-statistics h2o machine-learning marketing mmm predictive-modeling puzzle rlanguage robyn visualization

1.8 match 233 stars 9.92 score 185 scripts 1 dependents

ncss-tech

aqp:Algorithms for Quantitative Pedology

The Algorithms for Quantitative Pedology (AQP) project was started in 2009 to organize a loosely related set of concepts and source code on the topic of soil profile visualization, aggregation, and classification into this package (aqp). Over the past 8 years, the project has grown into a suite of related R packages that enhance and simplify the quantitative analysis of soil profile data. Central to the AQP project is a new vocabulary of specialized functions and data structures that can accommodate the inherent complexity of soil profile information, freeing the scientist to focus on ideas rather than boilerplate data processing tasks <doi:10.1016/j.cageo.2012.10.020>. These functions and data structures have been extensively tested and documented, applied to projects involving hundreds of thousands of soil profiles, and deeply integrated into widely used tools such as SoilWeb <https://casoilresource.lawr.ucdavis.edu/soilweb-apps>. Components of the AQP project (aqp, soilDB, sharpshootR, soilReports packages) serve an important role in routine data analysis within the USDA-NRCS Soil Science Division. The AQP suite of R packages offers a convenient platform for bridging the gap between pedometric theory and practice.

Maintained by Dylan Beaudette. Last updated 1 month ago.

digital-soil-mapping ncss-tech nrcs pedology pedometrics soil soil-survey usda

1.5 match 55 stars 11.90 score 1.2k scripts 2 dependents

kwb-r

kwb.endnote:Helper Functions for Analysing KWB Endnote Library (Exported as .xml)

Helper Functions For Analysing KWB Endnote Library (Exported As .XML).

Maintained by Michael Rustler. Last updated 4 years ago.

endnote knowledge-repo literature-data-management project-fakin publication

5.8 match 3.00 score 2 scripts

bioc

CytoML:A GatingML Interface for Cross Platform Cytometry Data Sharing

Uses platform-specific implementations of the GatingML2.0 standard to exchange gated cytometry data with other software platforms.

Maintained by Mike Jiang. Last updated 25 days ago.

immunooncology flowcytometry dataimport datarepresentation zlib openblas libxml2 cpp

2.3 match 30 stars 7.60 score 132 scripts

ohdsi

SqlRender:Rendering Parameterized SQL and Translation to Dialects

A rendering tool for parameterized SQL that also translates into different SQL dialects. These dialects include 'Microsoft SQL Server', 'Oracle', 'PostgreSql', 'Amazon RedShift', 'Apache Impala', 'IBM Netezza', 'Google BigQuery', 'Microsoft PDW', 'Snowflake', 'Azure Synapse Analytics Dedicated', 'Apache Spark', 'SQLite', and 'InterSystems IRIS'.

Maintained by Martijn Schuemie. Last updated 19 days ago.

hades openjdk

1.3 match 82 stars 12.52 score 488 scripts 13 dependents

bioc

ASURAT:Functional annotation-driven unsupervised clustering for single-cell data

ASURAT is software for single-cell data analysis. Using ASURAT, one can simultaneously perform unsupervised clustering and biological interpretation in terms of cell type, disease, biological process, and signaling pathway activity. Taking single-cell RNA-seq data and knowledge-based databases, such as Cell Ontology, Gene Ontology, KEGG, etc., as input, ASURAT transforms gene expression tables into original multivariate tables, termed sign-by-sample matrices (SSMs).

Maintained by Keita Iida. Last updated 5 months ago.

geneexpression singlecell sequencing clustering genesignaling cpp

3.8 match 4.32 score 21 scripts

eddelbuettel

pinp:'pinp' is not 'PNAS'

A 'PNAS'-alike style for 'rmarkdown', derived from the 'Proceedings of the National Academy of Sciences of the United States of America' ('PNAS') 'LaTeX' style, and adapted for use with 'markdown' and 'pandoc'.

Maintained by Dirk Eddelbuettel. Last updated 2 months ago.

markdown vignette

2.0 match 149 stars 7.81 score 2 scripts 1 dependents

giocomai

castarter:Content Analysis Starter Toolkit

Consistent approaches for basic web scraping, text mining and word frequency analysis of textual datasets.

Maintained by Giorgio Comai. Last updated 1 day ago.

tada text-mining

3.3 match 3 stars 4.59 score 2 scripts

bioc

simplifyEnrichment:Simplify Functional Enrichment Results

A new clustering algorithm, "binary cut", for clustering similarity matrices of functional terms is implemented in this package. It also provides functions for visualizing, summarizing and comparing the clusterings.

Maintained by Zuguang Gu. Last updated 5 months ago.

software visualization go clustering genesetenrichment

1.9 match 113 stars 8.02 score 196 scripts

bioc

FGNet:Functional Gene Networks derived from biological enrichment analyses

Build and visualize functional gene and term networks from clustering of enrichment analyses in multiple annotation spaces. The package includes a graphical user interface (GUI) and functions to perform the functional enrichment analysis through DAVID, GeneTerm Linker, gage (GSEA) and topGO.

Maintained by Sara Aibar. Last updated 5 months ago.

annotation go pathways genesetenrichment network visualization functionalgenomics networkenrichment clustering

3.3 match 4.62 score 5 scripts 1 dependents

quadrama

DramaAnalysis:Analysis of Dramatic Texts

Analysis of preprocessed dramatic texts, with respect to literary research. The package provides functions to analyze and visualize information about characters, stage directions, the dramatic structure and the text itself. The dramatic texts are expected to be in CSV format, which can be installed from within the package; sample texts are provided. The package and the reasoning behind it are described in Reiter et al. (2017) <doi:10.18420/in2017_119>.

Maintained by Nils Reiter. Last updated 5 years ago.

corpus-linguistics digital-humanities drama dramatic-texts statistics

3.0 match 15 stars 4.79 score 41 scripts

bioc

biobtreeR:Using biobtree tool from R

The biobtreeR package provides an interface to the biobtree tool (<https://github.com/tamerh/biobtree>), which covers a large set of bioinformatics datasets and supports search and chain-mapping functionality.

Maintained by Tamer Gur. Last updated 5 months ago.

annotation bioinformatics

3.2 match 3 stars 4.48 score 3 scripts

insongkim

concordance:Product Concordance

A set of utilities for matching products in different classification codes used in international trade research. It supports concordance between the Harmonized System (HS0, HS1, HS2, HS3, HS4, HS5, HS6, HS combined), the Standard International Trade Classification (SITC1, SITC2, SITC3, SITC4), the North American Industry Classification System (NAICS 2002, 2007, 2012, 2017, combined), as well as the Broad Economic Categories (BEC4), the International Standard of Industrial Classification (ISIC2, ISIC3, ISIC3.1, ISIC4), and the Standard Industrial Classification (SIC). It also provides code nomenclature/descriptions look-up, product code look-up based on user-specified keywords, Rauch classification look-up (via concordance to SITC2), trade elasticity look-up (via concordance to HS0 or SITC3 codes), upstreamness/downstreamness look-up (via concordance to ISIC3 and NAICS codes), and intermediateness look-up (via product descriptions).

Maintained by Steven Liao. Last updated 8 months ago.

2.4 match 53 stars 5.99 score 73 scripts

ceopinio

CEOdata:Datasets of the CEO (Centre d'Estudis d'Opinio) for Opinion Polls in Catalonia

Easy and convenient access to the datasets of the "Centre d'Estudis d'Opinio", the Catalan institution for polling and public opinion. The package uses the data stored in the servers of the CEO and returns it in a tidy format.

Maintained by Xavier Fernández-i-Marín. Last updated 2 years ago.

2.9 match 5 stars 4.88 score 4 scripts

oobianom

quickcode:Quick and Essential 'R' Tricks for Better Scripts

The NOT functions, 'R' tricks, and a compilation of simple, quick, and often-used 'R' code snippets to improve your scripts. Improve the quality and reproducibility of 'R' scripts.

Maintained by Obinna Obianom. Last updated 29 days ago.

colors data distributions images

1.8 match 5 stars 7.76 score 7 scripts 6 dependents

selesnow

rtgstat:Client for 'TGStat API'

Provides functions for using the 'TGStat Stat API' and 'TGStat Search API'; for more details see <https://api.tgstat.ru/docs/ru/start/intro.html>. 'TGStat' provides Telegram channel analytics data.

Maintained by Alexey Seleznev. Last updated 5 months ago.

3.9 match 8 stars 3.60 score 3 scripts

desiquintans

librarian:Install, Update, Load Packages from CRAN, 'GitHub', and 'Bioconductor' in One Step

Automatically install, update, and load 'CRAN', 'GitHub', and 'Bioconductor' packages in a single function call. By accepting bare unquoted names for packages, it's easy to add or remove packages from the list.

Maintained by Desi Quintans. Last updated 3 months ago.

1.8 match 53 stars 7.64 score 410 scripts 1 dependents

kwb-r

kwb.utils:General Utility Functions Developed at KWB

This package contains some small helper functions that aim at improving the quality of code developed at Kompetenzzentrum Wasser gGmbH (KWB).

Maintained by Hauke Sonnenberg. Last updated 1 year ago.

1.9 match 8 stars 7.33 score 12 scripts 78 dependents

prodriguezsosa

conText:'a la Carte' on Text (ConText) Embedding Regression

A fast, flexible and transparent framework to estimate context-specific word and short document embeddings using the 'a la carte' embeddings approach developed by Khodak et al. (2018) <arXiv:1805.05388> and evaluate hypotheses about covariate effects on embeddings using the regression framework developed by Rodriguez et al. (2021) <https://github.com/prodriguezsosa/EmbeddingRegression>.

Maintained by Pedro L. Rodriguez. Last updated 11 months ago.

1.5 match 104 stars 9.10 score 1.7k scripts

cont-limno

LAGOSNE:Interface to the Lake Multi-Scaled Geospatial and Temporal Database

Client for programmatic access to the Lake Multi-scaled Geospatial and Temporal database <https://lagoslakes.org>, with functions for accessing lake water quality and ecological context data for the US.

Maintained by Jemma Stachelek. Last updated 2 years ago.

ecology geoscience limnology water-quality

2.0 match 15 stars 6.77 score 98 scripts

kasperwelbers

corpustools:Managing, Querying and Analyzing Tokenized Text

Provides text analysis in R, focusing on the use of a tokenized text format. In this format, the positions of tokens are maintained, and each token can be annotated (e.g., part-of-speech tags, dependency relations). Prominent features include advanced Lucene-like querying for specific tokens or contexts (e.g., documents, sentences), similarity statistics for words and documents, exporting to DTM for compatibility with many text analysis packages, and the possibility to reconstruct original text from tokens to facilitate interpretation.

Maintained by Kasper Welbers. Last updated 6 months ago.

cpp

1.8 match 31 stars 7.50 score 174 scripts 1 dependents

cran

RWsearch:Lazy Search in R Packages, Task Views, CRAN, the Web. All-in-One Download

Search by keywords in R packages, task views, CRAN, the web and display the results in the console or in txt, html or pdf files. Download the package documentation (html index, README, NEWS, pdf manual, vignettes, source code, binaries) with a single instruction. Visualize the package dependencies and CRAN checks. Compare the package versions, unload and install the packages and their dependencies in a safe order. Explore CRAN archives. Use the above functions for task view maintenance. Access web search engines from the console thanks to 80+ bookmarks. All functions accept standard and non-standard evaluation.

Maintained by Patrice Kiener. Last updated 18 days ago.

3.6 match 3.70 score

elipousson

d2r:Create Diagrams with D2

Build, read, write, and render diagrams using the D2 syntax.

Maintained by Eli Pousson. Last updated 6 months ago.

d2 visualization

4.0 match 7 stars 3.24 score 6 scripts

mgondan

rolog:Query 'SWI'-'Prolog' from R

This R package connects to SWI-Prolog, <https://www.swi-prolog.org/>, so that R can send deterministic and non-deterministic queries to Prolog (consult, query/submit, once, findall).

Maintained by Matthias Gondan. Last updated 21 days ago.

cpp

2.0 match 4 stars 6.37 score 10 scripts 1 dependents

darwin-eu

CodelistGenerator:Identify Relevant Clinical Codes and Evaluate Their Use

Generate a candidate code list for the Observational Medical Outcomes Partnership (OMOP) common data model based on string matching. For a given search strategy, a candidate code list will be returned.

Maintained by Edward Burn. Last updated 5 days ago.

1.3 match 14 stars 9.94 score 165 scripts 4 dependents

ivan-rivera

RedditExtractoR:Reddit Data Extraction Toolkit

A collection of tools for extracting structured data from <https://www.reddit.com/>.

Maintained by Ivan Rivera. Last updated 2 years ago.

data reddit scraper

2.0 match 93 stars 6.02 score 153 scripts

rhoinc

CRANsearcher:RStudio Addin for Searching Packages in CRAN Database Based on Keywords

One of the strengths of R is its vast package ecosystem. Indeed, R packages extend from visualization to Bayesian inference and from spatial analyses to pharmacokinetics (<https://cran.r-project.org/web/views/>). There is probably not an area of quantitative research that isn't represented by at least one R package. At the time of this writing, there are more than 10,000 active CRAN packages. Because of this massive ecosystem, it is important to have tools to search and learn about packages related to your personal R needs. For this reason, we developed an RStudio addin capable of searching available CRAN packages directly within RStudio.

Maintained by Agustin Calatroni. Last updated 8 years ago.

rstudio-addin

2.8 match 34 stars 4.23 score 2 scripts

bioc

hipathia:HiPathia: High-throughput Pathway Analysis

Hipathia is a method for the computation of signal transduction along signaling pathways from transcriptomic data. The method is based on an iterative algorithm which is able to compute the signal intensity passing through the nodes of a network by taking into account the level of expression of each gene and the intensity of the signal arriving at it. It also provides a new approach to functional analysis that allows computing the signal arriving at the functions annotated to each pathway.

Maintained by Marta R. Hidalgo. Last updated 5 months ago.

pathways graphandnetwork geneexpression genesignaling go

3.2 match 3.62 score 42 scripts

cran

miRetrieve:miRNA Text Mining in Abstracts

Providing tools for microRNA (miRNA) text mining. miRetrieve summarizes miRNA literature by extracting, counting, and analyzing miRNA names, thus aiming at gaining biological insights into a large amount of text within a short period of time. To do so, miRetrieve uses regular expressions to extract miRNAs and tokenization to identify meaningful miRNA associations. In addition, miRetrieve uses the latest miRTarBase version 8.0 (Hsi-Yuan Huang et al. (2020) "miRTarBase 2020: updates to the experimentally validated microRNA–target interaction database" <doi:10.1093/nar/gkz896>) to display field-specific miRNA-mRNA interactions. The most important functions are available as a Shiny web application under <https://miretrieve.shinyapps.io/miRetrieve/>.

Maintained by Julian Friedrich. Last updated 4 years ago.

6.8 match 1.70 score

bart1

move:Visualizing and Analyzing Animal Track Data

Contains functions to access movement data stored in 'movebank.org' as well as tools to visualize and statistically analyze animal movement data, including functions to calculate dynamic Brownian Bridge Movement Models. Move helps address movement ecology questions.

Maintained by Bart Kranstauber. Last updated 4 months ago.

cpp

1.3 match 8.76 score 690 scripts 3 dependents

cran

qmrparser:Parser Combinator in R

Basic functions for building parsers, with an application to PC-AXIS format files.

Maintained by Juan Gea. Last updated 3 years ago.

3.3 match 1 stars 3.26 score 6 dependents

sticsrpacks

SticsRFiles:Read and Modify 'STICS' Input/Output Files

Manipulating input and output files of the 'STICS' crop model. Files are either 'JavaSTICS' XML files or text files used by the model 'fortran' executable. Most basic functionalities are reading or writing parameter names and values in both XML and text input files, and getting data from output files. Advanced functionalities include XML file generation from XML templates and/or spreadsheets, or text file generation from XML files by using 'xslt' transformation.

Maintained by Patrice Lecharpentier. Last updated 1 month ago.

1.3 match 4 stars 8.21 score 124 scripts

gbganalyst

bulkreadr:The Ultimate Tool for Reading Data in Bulk

Designed to simplify and streamline the process of reading and processing large volumes of data in R, this package offers a collection of functions tailored for bulk data operations. It enables users to efficiently read multiple sheets from Microsoft Excel and Google Sheets workbooks, as well as various CSV files from a directory. The data is returned as organized data frames, facilitating further analysis and manipulation. Ideal for handling extensive data sets or batch processing tasks, bulkreadr empowers users to manage data in bulk effortlessly, saving time and effort in data preparation workflows. Additionally, the package seamlessly works with labelled data from SPSS and Stata.

Maintained by Ezekiel Ogundepo. Last updated 7 months ago.

bulkreader csv-reader data-import googlesheets missing-values xlsxreader

1.7 match 12 stars 5.94 score 12 scripts

tidy-intelligence

wbwdi:Seamless Access to World Bank World Development Indicators (WDI)

Access and analyze the World Bank’s World Development Indicators (WDI) using the corresponding API <https://datahelpdesk.worldbank.org/knowledgebase/articles/889392-about-the-indicators-api-documentation>. WDI provides more than 24,000 country or region-level indicators for various contexts. 'wbwdi' enables users to download, process and work with WDI series across multiple countries, aggregates, and time periods.

Maintained by Christoph Scheuch. Last updated 6 days ago.

economic-data

1.8 match 5 stars 5.60 score 4 scripts

cran

eatDB:Spreadsheet Interface for Relational Databases

Use 'SQLite3' as a database system via a completely SQL-free R interface, treating the data as if it were a single spreadsheet.

Maintained by Benjamin Becker. Last updated 3 years ago.

4.0 match 2.48 score 2 dependents

ropensci

internetarchive:An API Client for the Internet Archive

Search the Internet Archive (<https://archive.org>), retrieve metadata, and download files.

Maintained by Ahmet Akkoc. Last updated 4 months ago.

1.8 match 60 stars 5.44 score 23 scripts

thmild

keyperm:Keyword Analysis Using Permutation Tests

Fast implementation of permutation tests for keyword analysis in corpus linguistics. The aim is to identify words that are significantly more frequent in one corpus than in another. The method is described in Mildenberger (2023) <arXiv:2308.13383>.

Maintained by Thoralf Mildenberger. Last updated 2 years ago.

cpp

3.5 match 2.70 score 3 scripts

petrbouchal

czso:Use Open Data from the Czech Statistical Office in R

Get programmatic access to the open data provided by the Czech Statistical Office (CZSO, <https://czso.cz>).

Maintained by Petr Bouchal. Last updated 7 months ago.

czech-republic czech-statistical-office czso dataset open-data statistics

1.8 match 11 stars 5.24 score 53 scripts

lchansson

rKolada:Access Data from the 'Kolada' Database

Methods for downloading and processing data and metadata from 'Kolada', the official Swedish regions and municipalities database <https://www.kolada.se/>.

Maintained by Love Hansson. Last updated 11 months ago.

1.8 match 4 stars 5.30 score 5 scripts

crew102

rapidraker:Rapid Automatic Keyword Extraction (RAKE) Algorithm

A 'Java' implementation of the RAKE algorithm ('Rose', S., 'Engel', D., 'Cramer', N. and 'Cowley', W. (2010) <doi:10.1002/9780470689646.ch1>), which can be used to extract keywords from documents without any training data.

Maintained by Christopher Baker. Last updated 4 years ago.

openjdk

3.4 match 1 star 2.70 score 5 scripts

curso-r

scryr:An Interface to the 'Scryfall' API

A simple, light, and robust interface between R and the 'Scryfall' card data API <https://scryfall.com/docs/api>.

Maintained by Caio Lente. Last updated 3 years ago.

api mtg

1.5 match 18 stars 6.11 score 18 scripts

qile0317

FastUtils:Fast, Readable Utility Functions

A wide variety of tools for general data analysis, wrangling, spelling, statistics, visualizations, package development, and more. All functions have vectorized implementations whenever possible. Exported names are designed to be readable, with longer names possessing short aliases.

Maintained by Qile Yang. Last updated 4 months ago.

scientific-computing utilities utility cpp

1.8 match 2 stars 4.95 score 2 scripts

hoxo-m

githubinstall:A Helpful Way to Install R Packages Hosted on GitHub

Provides a helpful way to install packages hosted on GitHub.

Maintained by Koji Makiyama. Last updated 7 years ago.

r-language

1.2 match 49 stars 7.29 score 177 scripts

fbertran

plsRbeta:Partial Least Squares Regression for Beta Regression Models

Provides partial least squares regression for (weighted) beta regression models (Bertrand 2013, <http://journal-sfds.fr/article/view/215>) and k-fold cross-validation of such models using various criteria. It allows for missing data in the explanatory variables. Bootstrap confidence interval constructions are also available.

Maintained by Frederic Bertrand. Last updated 2 years ago.

2.0 match 2 stars 4.34 score 22 scripts

felixfan

PubMedWordcloud:'Pubmed' Word Clouds

Create a word cloud using the abstract of publications from 'Pubmed'.

Maintained by Felix Yanhui Fan. Last updated 6 years ago.

1.8 match 22 stars 4.79 score 28 scripts

aymennasri

tndata:Fetch Datasets from the Official Tunisian Data Catalog

Simplifies access to Tunisian government open data from <https://data.gov.tn/fr/>. Queries datasets by theme, author, or keywords, retrieves metadata, and gets structured results ready for analysis; all through the official 'CKAN' API.

Maintained by Aymen Nasri. Last updated 9 days ago.

government-data tunisia

2.4 match 3.40 score

mrc-ide

odin2:Next generation odin

Temporary package for rewriting odin.

Maintained by Rich FitzJohn. Last updated 2 months ago.

1.3 match 5 stars 6.32 score 22 scripts

sditools

statsearchanalyticsr:An Interface for the 'STAT Search Analytics' API

Pull data from the 'STAT Search Analytics' API <https://help.getstat.com/knowledgebase/api-services/>. It was developed by the Search Discovery team to help analyze keyword ranking data.

Maintained by Ben Woodard. Last updated 4 years ago.

2.8 match 1 star 2.70 score

cyrillagger

scDiffCom:Differential Analysis of Intercellular Communication from scRNA-Seq Data

Analysis tools to investigate changes in intercellular communication from scRNA-seq data. Using a Seurat object as input, the package infers which cell-cell interactions are present in the dataset and how these interactions change between two conditions of interest (e.g. young vs old). It relies on an internal database of ligand-receptor interactions (available for human, mouse and rat) that have been gathered from several published studies. Detection and differential analyses rely on permutation tests. The package also contains several tools to perform over-representation analysis and visualize the results. See Lagger, C. et al. (2023) <doi:10.1038/s43587-023-00514-x> for a full description of the methodology.

Maintained by Cyril Lagger. Last updated 1 year ago.

1.8 match 21 stars 4.25 score 17 scripts

midfieldr

midfieldr:Tools and Methods for Working with MIDFIELD Data in 'R'

Provides tools and demonstrates methods for working with individual undergraduate student-level records (registrar's data) in 'R'. Tools include filters for program codes, data sufficiency, and timely completion. Methods include gathering blocs of records, computing quantitative metrics such as graduation rate, and creating charts to visualize comparisons. 'midfieldr' interacts with practice data provided in 'midfielddata', an R data package available at <https://midfieldr.github.io/midfielddata/>. 'midfieldr' also interacts with the full MIDFIELD database for users who have access. This work is supported by the US National Science Foundation through grant numbers 1545667 and 2142087.

Maintained by Richard Layton. Last updated 2 months ago.

1.3 match 2 stars 5.56 score 26 scripts

masterclm

mclm:Mastering Corpus Linguistics Methods

Read, inspect and process corpus files for quantitative corpus linguistics. Obtain concordances via regular expressions, tokenize texts, and compute frequencies and association measures. Useful for collocation analysis, keyword analysis and variationist studies (comparison of linguistic variants and of linguistic varieties).

Maintained by Mariana Montes. Last updated 2 years ago.

corpus linguistics cpp

2.2 match 1 star 3.24 score 35 scripts

dietrichson

ProPublicaR:Access Functions for ProPublica's APIs

Provides wrapper functions to access the ProPublica's Congress and Campaign Finance APIs. The Congress API provides near real-time access to legislative data from the House of Representatives, the Senate and the Library of Congress. The Campaign Finance API provides data from United States Federal Election Commission filings and other sources. The API covers summary information for candidates and committees, as well as certain types of itemized data. For more information about these APIs go to: <https://www.propublica.org/datastore/apis>.

Maintained by Aleksander Dietrichson. Last updated 2 years ago.

1.6 match 12 stars 4.38 score 1 script

elakkkk

istat:Download and Manipulate Data from Istat

Download data from the Istat (Italian Institute of Statistics) database, from both the old and the new provider (respectively, <http://dati.istat.it/> and <https://esploradati.istat.it/databrowser/>). Additional functions for manipulating data are provided. Moreover, a 'shiny' application called 'shinyIstat' can be used to search, download and filter datasets more easily.

Maintained by Elena Gradi. Last updated 7 months ago.

3.5 match 2.00 score 2 scripts

tidy-intelligence

owidapi:Access the Our World in Data Chart API

Retrieve data from the Our World in Data (OWID) Chart API <https://docs.owid.io/projects/etl/api/>. OWID provides public access to more than 5,000 charts focusing on global problems such as poverty, disease, hunger, climate change, war, existential risks, and inequality.

Maintained by Christoph Scheuch. Last updated 28 days ago.

economic-data

1.8 match 6 stars 3.78 score

nicholasjcooper

NCmisc:Miscellaneous Functions for Creating Adaptive Functions and Scripts

A set of handy functions. Includes a versatile one line progress bar, one line function timer with detailed output, time delay function, text histogram, object preview, CRAN package search, simpler package installer, Linux command install check, a flexible Mode function, top function, simulation of correlated data, and more.

Maintained by Nicholas Cooper. Last updated 2 years ago.

1.8 match 3.87 score 172 scripts 5 dependents

craig-parylo

cvdprevent:Wrapper for the 'CVD Prevent' Application Programming Interface

Provides an R wrapper to the 'CVD Prevent' application programming interface (API). Users can make API requests through built-in R functions. The Cardiovascular Disease Prevention Audit (CVDPREVENT) is an England-wide primary care audit that automatically extracts routinely held GP health data. <https://bmchealthdocs.atlassian.net/wiki/spaces/CP/pages/317882369/CVDPREVENT+API+Documentation>.

Maintained by Craig Parylo. Last updated 2 months ago.

1.3 match 3 stars 5.02 score 4 scripts

defra-data-science-centre-of-excellence

sgapi:Aid Querying 'nomis' and 'Office for National Statistics Open Geography' APIs

Facilitates extraction of geospatial data from the 'Office for National Statistics Open Geography' and 'nomis' Application Programming Interfaces (APIs). Simplifies the process of querying 'nomis' datasets <https://www.nomisweb.co.uk/> and extracting desired datasets in dataframe format. Extracts area shapefiles at a chosen resolution from 'Office for National Statistics Open Geography' <https://geoportal.statistics.gov.uk/>.

Maintained by Andrew Christy. Last updated 4 months ago.

2.0 match 3.30 score 2 scripts

gunratan

edgar:Tool for the U.S. SEC EDGAR Retrieval and Parsing of Corporate Filings

In the USA, companies file different forms with the U.S. Securities and Exchange Commission (SEC) through EDGAR (Electronic Data Gathering, Analysis, and Retrieval system). The EDGAR database automated system collects all the different necessary filings and makes them publicly available. This package facilitates retrieving, storing, searching, and parsing of all the available filings on the EDGAR server. It downloads filings from the SEC server in bulk with a single query. Additionally, it provides various useful functions: extracts 8-K triggering events, extracts the "Business (Item 1)" and "Management's Discussion and Analysis (Item 7)" sections of annual statements, searches filings for desired keywords, provides sentiment measures, parses filing header information, and provides an HTML view of SEC filings.

Maintained by Gunratan Lonare. Last updated 24 days ago.

2.3 match 10 stars 2.79 score 61 scripts

cran

etable:Easy Table

Creates simple to highly customized tables for a wide selection of descriptive statistics, with or without weighting the data.

Maintained by Andreas Schulz. Last updated 4 years ago.

3.2 match 2.00 score

chgrl

diezeit:R Interface to the ZEIT ONLINE Content API

A wrapper for the ZEIT ONLINE Content API, available at <http://developer.zeit.de>. 'diezeit' gives access to articles and corresponding metadata from the ZEIT archive and from ZEIT ONLINE. A personal API key is required for usage.

Maintained by Christian Graul. Last updated 9 years ago.

1.5 match 3 stars 4.18 score 6 scripts

chihangchen

SRTtools:Adjust Srt File to Get Better Experience when Watching Movie

SRT is a common subtitle format for videos; an SRT file contains the subtitles and the times at which they are shown. This package aligns the timing of an SRT file and can also change the color, style and position of subtitles in videos. The SRT file is read into R as a vector and can be written back to an SRT file after being modified with this package.

Maintained by Jim Chen. Last updated 6 years ago.

2.0 match 2 stars 3.00 score 2 scripts

chgrl

rdnb:R Interface to the 'Deutsche Nationalbibliothek (German National Library) API'

A wrapper for the 'Deutsche Nationalbibliothek (German National Library) API', available at <https://www.dnb.de/EN/Home/home_node.html>. The German National Library is the German central archival library, collecting, archiving, bibliographically classifying all German and German-language publications, foreign publications about Germany, translations of German works, and the works of German-speaking emigrants published abroad between 1933 and 1945.

Maintained by Christian Graul. Last updated 1 year ago.

1.5 match 2 stars 4.00 score 9 scripts

regisoc

kibior:A Simple Data Management and Sharing Tool

An interface to store, retrieve, search, join and share datasets, based on the Elasticsearch (ES) API. As a decentralized, FAIR and collaborative search engine and database effort, it proposes a simple push/pull/search mechanism based only on ES, a tool which can be deployed on nearly any hardware. It is a high-level R-ES binding to ease data usage using the 'elastic' package (S. Chamberlain (2020)) <https://docs.ropensci.org/elastic/>, extends joins from the 'dplyr' package (H. Wickham et al. (2020)) <https://dplyr.tidyverse.org/> and integrates specific biological format importation with Bioconductor packages such as 'rtracklayer' (M. Lawrence et al. (2009) <doi:10.1093/bioinformatics/btp328>) <http://bioconductor.org/packages/rtracklayer>, 'Biostrings' (H. Pagès et al. (2020) <doi:10.18129/B9.bioc.Biostrings>) <http://bioconductor.org/packages/Biostrings>, and 'Rsamtools' (M. Morgan et al. (2020) <doi:10.18129/B9.bioc.Rsamtools>) <http://bioconductor.org/packages/Rsamtools>, but also a long list of more common ones with 'rio' (C-h. Chan et al. (2018)) <https://cran.r-project.org/package=rio>.

Maintained by Régis Ongaro-Carcy. Last updated 4 years ago.

dataimport datarepresentation thirdpartyclient data-science database datasets elasticsearch elasticsearch-client push-pull search search-engine

1.3 match 3 stars 4.48 score 8 scripts

bioc

rfaRm:An R interface to the Rfam database

rfaRm provides a client interface to the Rfam database of RNA families. Data that can be retrieved include RNA families, secondary structure images, covariance models, sequences within each family, alignments leading to the identification of a family and secondary structures in the dot-bracket format.

Maintained by Lara Selles Vidal. Last updated 5 months ago.

functionalgenomics dataimport thirdpartyclient visualization multiplesequencealignment

1.8 match 3.30 score 1 script

elipousson

officerExtras:Extra Helpers for 'officer'

Helper and convenience functions using the 'officer' package to modify docx files.

Maintained by Eli Pousson. Last updated 3 months ago.

microsoft-word officer

1.7 match 13 stars 3.41 score 3 scripts

jonathanlees

ProfessR:Grades Setting and Exam Maker

Programs to determine student grades and create examinations from Question banks. Programs will create numerous multiple choice exams, randomly shuffled, for different versions of same question list.

Maintained by Jonathan M. Lees. Last updated 2 years ago.

1.9 match 2.91 score 41 scripts

pedrocava

basedosdados:'Base Dos Dados' R Client

An R interface to the 'Base dos Dados' API <https://basedosdados.github.io/mais/py_reference_api/>. Authenticate your project, query our tables, save data to disk and memory, all from R.

Maintained by Pedro Cavalcante. Last updated 2 years ago.

2.0 match 2.70 score 101 scripts

neonira

wyz.code.rdoc:Wizardry Code Offensive Programming R Documentation

Generates any R documentation file, on demand or in batch, whatever its kind: data, function, class or package. It populates documentation sections, either automatically or by considering your input. Input code can be standard R code or offensive programming code. Documentation content completeness depends on the type of code you use. With offensive programming code, expect generated documentation to be fully completed, from a format and content point of view. With some standard R code, you will have to activate post-processing to fill in any section that requires complements. The validity of the produced manual page is automatically tested against R documentation compliance rules. Documentation language proficiency, wording style, and phrasal adjustments remain your job.

Maintained by Fabien Gelineau. Last updated 3 years ago.

2.0 match 2.70 score 1 script

bcgov

bcdata:Search and Retrieve Data from the BC Data Catalogue

Search, query, and download tabular and 'geospatial' data from the British Columbia Data Catalogue (<https://catalogue.data.gov.bc.ca/>). Search catalogue data records based on keywords, data licence, sector, data format, and B.C. government organization. View metadata directly in R, download many data formats, and query 'geospatial' data available via the B.C. government Web Feature Service ('WFS') using 'dplyr' syntax.

Maintained by Andy Teucher. Last updated 5 days ago.

bcdc citz data-science env

0.5 match 83 stars 10.36 score 186 scripts 4 dependents

cran

datarobot:'DataRobot' Predictive Modeling API

For working with the 'DataRobot' predictive modeling platform's API <https://www.datarobot.com/>.

Maintained by AJ Alon. Last updated 1 year ago.

1.5 match 2 stars 3.48 score

vlucet

rgovcan:Easy Access to the Canadian Open Government Portal

Allows searching for existing resources, including datasets, on the Canadian Open Government portal (<https://open.canada.ca/en>). It is also designed to allow users to easily download a range of files directly from the portal in a reproducible manner.

Maintained by Valentin Lucet. Last updated 2 years ago.

api-wrapper canada openscience

1.7 match 22 stars 3.04 score 7 scripts

webstat-bdf

rwebstat:Download Data from the Webstat API

Access the Webstat API, download data and metadata from more than 35000 time series from the Banque de France statistics web portal. Access requires a free client ID easily available from the API portal <https://developer.webstat.banque-france.fr/>.

Maintained by Vincent Guegan. Last updated 2 years ago.

1.7 match 2.93 score 17 scripts

swissstatsr

dcatapchr:Create DCAT-AP CH Metadata Files

Create DCAT-AP CH metadata files, typically in rdf format.

Maintained by Sandro Burri. Last updated 3 months ago.

1.7 match 2.78 score 3 scripts

jl5000

tidyged.io:Import and Export GEDCOM Files

Import and export family tree GEDCOM files to and from tidy dataframes.

Maintained by Jamie Lendrum. Last updated 3 years ago.

1.9 match 2.48 score 2 dependents

soodoku

rdomains:Get the Category of Content Hosted by a Domain

Get the category of content hosted by a domain. Use Shallalist <http://shalla.de/>, Virustotal (which provides access to lots of services) <https://www.virustotal.com/>, Alexa <https://aws.amazon.com/awis/>, DMOZ <https://curlie.org/>, University Domain list <https://github.com/Hipo/university-domains-list> or validated machine learning classifiers based on Shallalist data to learn about the kind of content hosted by a domain.

Maintained by Gaurav Sood. Last updated 3 years ago.

1.7 match 2.70 score 7 scripts

quanteda

quanteda.textstats:Textual Statistics for the Quantitative Analysis of Textual Data

Textual statistics functions formerly in the 'quanteda' package. Textual statistics for characterizing and comparing textual data. Includes functions for measuring term and document frequency, the co-occurrence of words, similarity and distance between features and documents, feature entropy, keyword occurrence, readability, and lexical diversity. These functions extend the 'quanteda' package and are specially designed for sparse textual data.

Maintained by Kenneth Benoit. Last updated 7 months ago.

onetbb cpp

0.5 match 15 stars 8.91 score 916 scripts 10 dependents

kurthornik

RKEA:R/KEA Interface

An R interface to KEA (Version 5.0). KEA (for Keyphrase Extraction Algorithm) allows for extracting keyphrases from text documents. It can be either used for free indexing or for indexing with a controlled vocabulary. For more information see <http://www.nzdl.org/Kea/>.

Maintained by Kurt Hornik. Last updated 10 years ago.

openjdk

2.3 match 1 star 2.00 score 7 scripts

awconway

spiritR:Template for Clinical Trial Protocol

Contains an R Markdown template for a clinical trial protocol adhering to the SPIRIT statement. The SPIRIT (Standard Protocol Items for Interventional Trials) statement outlines recommendations for a minimum set of elements to be addressed in a clinical trial protocol. Also contains functions to create an XML document from the template and upload it to clinicaltrials.gov <https://www.clinicaltrials.gov/> for trial registration.

Maintained by Aaron Conway. Last updated 6 years ago.

1.1 match 2 stars 4.00 score 2 scripts

r-hub

pkgsearch:Search and Query CRAN R Packages

Search CRAN metadata about packages by keyword, popularity, recent activity, package name and more. Uses the 'R-hub' search server, see <https://r-pkg.org> and the CRAN metadata database, that contains information about CRAN packages. Note that this is _not_ a CRAN project.

Maintained by Gábor Csárdi. Last updated 3 months ago.

ranking search-engine

0.5 match 109 stars 8.62 score 64 scripts 10 dependents

davidruvolo51

rdConvert:Convert Rd files to Markdown files loaded with YAML

An R6 class for converting Rd files to markdown with YAML headers. This may be useful if you wish to use package documentation in static site generators outside of the R ecosystem (e.g., React, Vue, Svelte, Gatsby, etc.). By default, Rd files are rendered into their own directory with an independent `index.md` file. The Rd name is parsed and set as the child directory name.

Maintained by David Ruvolo. Last updated 4 years ago.

gatsby-template package-development rmarkdown rmarkdown-websites workflow

2.0 match 3 stars 2.18 score

jcaledo

ptm:Analyses of Protein Post-Translational Modifications

Contains utilities for the analysis of post-translational modifications (PTMs) in proteins, with particular emphasis on the sulfoxidation of methionine residues. Features include the ability to download, filter and analyze data from the sulfoxidation database 'MetOSite'. Utilities to search and characterize S-aromatic motifs in proteins are also provided. In addition, functions to analyze sequence environments around modifiable residues in proteins can be found. For instance, 'ptm' allows searching for amino acids either overrepresented or avoided around the modifiable residues from the proteins of interest. Functions tailored to test statistical hypotheses related to these differential sequence environments are also implemented. Further and detailed information regarding the methods in this package can be found in (Aledo (2020) <https://metositeptm.com>).

Maintained by Juan Carlos Aledo. Last updated 10 months ago.

1.8 match 2.26 score 18 scripts

rafael-ayala

NutrienTrackeR:Food Composition Information and Dietary Assessment

Provides a tool set for food information and dietary assessment. It uses food composition data from several reference databases, including: 'USDA' (United States), 'CIQUAL' (France), 'BEDCA' (Spain), 'CNF' (Canada) and 'STFCJ' (Japan). 'NutrienTrackeR' calculates the intake levels for both macronutrient and micronutrients, and compares them with the recommended dietary allowances (RDA). It includes a number of visualization tools, such as time series plots of nutrient intake, and pie-charts showing the main foods contributing to the intake level of a given nutrient. A shiny app exposing the main functionalities of the package is also provided.

Maintained by Rafael Ayala. Last updated 2 years ago.

1.8 match 2.18 score 15 scripts

kwb-r

kwb.prep:Markdown-Documented Data Preparation

R Package for Markdown-documented data preparation.

Maintained by Hauke Sonnenberg. Last updated 3 years ago.

1.8 match 2.18 score 1 script 1 dependent

paithiov909

aznyan:An 'Utanet' Scraper and Utilities

Scrape lyrics from 'Utanet' website.

Maintained by Akiru Kato. Last updated 11 months ago.

cpp

1.9 match 2.00 score 1 script

cran

pubmed.mineR:Text Mining of PubMed Abstracts

Text mining of PubMed Abstracts (text and XML) from <https://pubmed.ncbi.nlm.nih.gov/>.

Maintained by S. Ramachandran. Last updated 7 months ago.

1.8 match 6 stars 2.08 score

bryanhanson

HiveR:2D and 3D Hive Plots for R

Creates and plots 2D and 3D hive plots. Hive plots are a unique method of displaying networks of many types in which node properties are mapped to axes using meaningful properties rather than being arbitrarily positioned. The hive plot concept was invented by Martin Krzywinski at the Genome Science Center (www.hiveplot.net/). Keywords: networks, food webs, linnet, systems biology, bioinformatics.

Maintained by Bryan A. Hanson. Last updated 9 months ago.

0.5 match 72 stars 6.76 score 53 scripts 2 dependents

amoeba

eatocsv:Download and extract Entity-Attribute metadata into a CSV

Downloads and extracts Entity-Attribute metadata from EML documents stored in a DataONE Member Node.

Maintained by Bryce Mecum. Last updated 4 years ago.

1.9 match 1.70 score 5 scripts

qinwf

jiebaRD:Chinese Text Segmentation Data for jiebaR Package

jiebaR is a package for Chinese text segmentation, keyword extraction and speech tagging. This package provides the data files required by jiebaR.

Maintained by Qin Wenfeng. Last updated 10 years ago.

0.5 match 5 stars 5.78 score 72 scripts 7 dependents

eu-ecdc

epitweetr:Early Detection of Public Health Threats from 'Twitter' Data

Allows you to automatically monitor trends of tweets by time, place and topic, with the aim of detecting public health threats early through the detection of signals (e.g. an unusual increase in the number of tweets). It was designed to focus on infectious diseases, and it can be extended to all hazards or other fields of study by modifying the topics and keywords. More information is available in the 'epitweetr' peer-reviewed publication (doi:10.2807/1560-7917.ES.2022.27.39.2200177).

Maintained by Laura Espinosa. Last updated 1 year ago.

early-warning-systems epidemic-surveillance lucene machine-learning signal-detection spark twitter

0.5 match 56 stars 5.98 score 86 scripts

cran

WhatsR:Parsing, Anonymizing and Visualizing Exported 'WhatsApp' Chat Logs

Imports 'WhatsApp' chat logs and parses them into a usable dataframe object. The parser works on chats exported from Android or iOS phones and on Linux, macOS and Windows. The parser has multiple options for extracting smileys and emojis from the messages, extracting URLs and domains from the messages, extracting names and types of sent media files from the messages, extracting timestamps from messages, extracting and anonymizing author names from messages. Can be used to create anonymized versions of data.

Maintained by Julian Kohne. Last updated 1 year ago.

openjdk

1.7 match 1.70 score

mjwestgate

revtools:Tools to Support Evidence Synthesis

Researchers commonly need to summarize scientific information, a process known as 'evidence synthesis'. The first stage of a synthesis process (such as a systematic review or meta-analysis) is to download a list of references from academic search engines such as 'Web of Knowledge' or 'Scopus'. The traditional approach to systematic review is then to sort these data manually, first by locating and removing duplicated entries, and then screening to remove irrelevant content by viewing titles and abstracts (in that order). 'revtools' provides interfaces for each of these tasks. An alternative approach, however, is to draw on tools from machine learning to visualise patterns in the corpus. In this case, you can use 'revtools' to render ordinations of text drawn from article titles, keywords and abstracts, and interactively select or exclude individual references, words or topics.

Maintained by Martin J. Westgate. Last updated 5 years ago.

0.5 match 52 stars 5.57 score 72 scripts

majerr

sqlhelper:Easier 'SQL' Integration

Execute files of 'SQL' and manage database connections. 'SQL' statements and queries may be interpolated with string literals. Execution of individual statements and queries may be controlled with keywords. Multiple connections may be defined with 'YAML' and accessed by name.

Maintained by Matthew Roberts. Last updated 1 year ago.

0.5 match 2 stars 5.19 score 39 scripts

koheiw

wordmap:Feature Extraction and Document Classification with Noisy Labels

Extract features and classify documents with noisy labels given by document metadata or keyword matching (Watanabe & Zhou (2020) <doi:10.1177/0894439320907027>).

Maintained by Kohei Watanabe. Last updated 2 months ago.

0.5 match 2 stars 4.86 score 1 script

jsugarelli

packagefinder:Comfortable Search for R Packages on CRAN, Either Directly from the R Console or with an R Studio Add-in

Search for R packages on CRAN directly from the R console, based on the packages' titles, short and long descriptions, or other fields. Combine multiple keywords with logical operators ('and', 'or'), view detailed information on any package and keep track of the latest package contributions to CRAN. If you don't want to search from the R console, use the comfortable R Studio add-in.

Maintained by Joachim Zuckarelli. Last updated 4 years ago.

cran-search

0.5 match 43 stars 4.63 score 20 scripts

vforwater

json2aRgs:Parse Parameters Inside a Docker Container

The functions get_parameters() and get_data() are intended to be used within a docker container with a certain file structure to read keyword arguments from the file /in/input.json automagically. The file /src/tool.yaml contains specifications on these keyword arguments, which are then passed as input to containerized R tools in the tool-runner framework. A template for a containerized R tool, which can be used as a basis for developing new tools, is available at the following URL: <https://github.com/VForWaTer/tool_template_r>.

Maintained by Alexander Dolich. Last updated 9 months ago.

0.8 match 2.70 score 3 scripts

rhlee12

Z10:Simple Ecological Statistics from the NEON Network

Provides simple statistics from instruments and observations at sites in the NEON network, and acts as a simple interface for v0 of the National Ecological Observatory Network (NEON) API. Statistics are generated for meteorologic and soil-based observations, and are presented for daily, annual, and one-time observations at all available NEON sites. Users can also retrieve any dataset publicly hosted by NEON. Metadata for NEON sites and data products can be returned, as well as information on data product availability by site and date. For more information on NEON, please visit <https://www.neonscience.org>. For detailed data product information, please see the NEON data product catalog at <https://data.neonscience.org/data-product-catalog>.

Maintained by Robert Lee. Last updated 6 years ago.

1.7 match 1.20 score 16 scripts

ropengov

usdoj:For Accessing U.S. Department of Justice (DOJ) Open Data

Fetch data from the <https://www.justice.gov/developer/api-documentation/api_v1> API such as press releases, blog entries, and speeches. Optional parameters allow users to specify the number of results starting from the earliest or latest entries, and whether these results contain keywords. Data is cleaned for analysis and returned in a dataframe.

Maintained by Steph Buongiorno. Last updated 3 days ago.

0.5 match 10 stars 4.00 score 4 scripts

cran

PytrendsLongitudinalR:Create Longitudinal Google Trends Data

'Google Trends' provides cross-sectional and time-series data on searches, but lacks readily available longitudinal data. Researchers who want to create longitudinal 'Google Trends' data on their own face practical challenges, such as normalized counts that make it difficult to combine cross-sectional and time-series data and limitations in data formats and timelines that limit data granularity over extended time periods. This package addresses these issues and enables researchers to generate longitudinal 'Google Trends' data. This package is built on 'pytrends', a Python library that acts as the unofficial 'Google Trends API' to collect 'Google Trends' data. As long as the 'Google Trends API', 'pytrends' and all their dependencies are working, this package will work. During testing, we noticed that for the same input (keyword, topic, data_format, timeline), the output index can vary from time to time. Besides, if the keyword is not very popular, then the resulting dataset will contain a lot of zeros, which will greatly affect the final result. While this package has no control over the accuracy or quality of 'Google Trends' data, once the data is created, this package converts it to longitudinal data. In addition, the user may encounter a 429 Too Many Requests error when using cross_section() and time_series() to collect 'Google Trends' data. This error indicates that the user has exceeded the rate limits set by the 'Google Trends API'. For more information about the 'Google Trends API' - 'pytrends', visit <https://pypi.org/project/pytrends/>.

Maintained by Taeyong Park. Last updated 7 months ago.

0.8 match 2.70 score

phgrosjean

svIDE:Functions to Ease Interactions Between R and IDE or Code Editors

Functions for the GUI API to interact with external IDEs and code editors.

Maintained by Philippe Grosjean. Last updated 7 years ago.

1.9 match 1.04 score 11 scripts

cran

needmining:A Simple Needmining Implementation

Showcases needmining (the semi-automatic extraction of customer needs from social media data) with Twitter data. It relies on the package 'rtweet' for handling the Twitter API and on the package 'tm' for text-mining algorithms. Niklas Kuehl (2016) <doi:10.1007/978-3-319-32689-4_14> wrote an introduction to the topic of needmining.

Maintained by Dorian Proksch. Last updated 6 years ago.

1.8 match 1.00 score

cran

FITSio:FITS (Flexible Image Transport System) Utilities

Utilities to read and write files in the FITS (Flexible Image Transport System) format, a standard format in astronomy (see e.g. <https://en.wikipedia.org/wiki/FITS> for more information). The current low-level routines allow: reading, parsing, and modifying FITS headers; reading FITS images (multi-dimensional arrays); reading FITS binary and ASCII tables; and writing FITS images (multi-dimensional arrays). Higher-level functions allow: reading files composed of one or more headers and a single (perhaps multidimensional) image or single table; reading tables into data frames; generating vectors for image array axes; scaling and writing images as 16-bit integers. Known gaps are the reading of random group extensions, as well as of complex and array descriptor data types in binary tables.

Maintained by Andrew Harris. Last updated 4 years ago.

1.7 match 1 stars 1.00 score

hakkisabah

tsentiment:Fetching Tweet Data for Sentiment Analysis

Uses Twitter APIs to fetch the data needed for sentiment analysis, acting as middleware through an approved Twitter application. Users who subscribe to the application with their Twitter account receive a special access key. With this key, a user-defined keyword can be searched in Twitter's recent searches and the results retrieved (for more information, see <https://github.com/hakkisabah/tsentiment>). In addition, a service named tsentiment-services has been developed to provide all these operations (for more information, see <https://github.com/hakkisabah/tsentiment-services>). Once results have been obtained, and in line with the permissions given by the user, the word cloud and bar graph produced by the analysis are saved in the user folder directory, where they can be viewed. Each new analysis deletes the visual results of the previous one; keep this in mind as a basic rule of use. The 'tsentiment' package provides a free service that acts as middleware for easy data extraction from Twitter. In return, 30 requests are deducted from the user's total rate limit and reserved for application analytics, and the remaining requests are available to the user. For information about endpoints, refer to the limit information in the "GET search/tweets" row of the Endpoints column in the list at <https://developer.twitter.com/en/docs/twitter-api/v1/rate-limits>.

Maintained by Hakki Sabah. Last updated 2 years ago.

sentiment sentiment-analysis tidyverse twitter-api twitter-sentiment-analysis

0.5 match 1 stars 2.70 score

christopherkenny

acronames:Create Acronyms for Naming Things

Simple tool for developing names based on first letters of keywords.

Maintained by Christopher T. Kenny. Last updated 3 years ago.

0.6 match 1 stars 1.70 score 1 scripts

cran

radlibs:Build Your Own Madlibs!

Make your phrase or sentence into something funny! Pass a string with the keywords in, and get out a bit of humor.

Maintained by Stephanie Kirmer. Last updated 5 years ago.

0.5 match 1.70 score