R-universe search: scraping

billpetti

baseballr:Acquiring and Analyzing Baseball Data

Provides numerous utilities for acquiring and analyzing baseball data from online sources such as 'Baseball Reference' <https://www.baseball-reference.com/>, 'FanGraphs' <https://www.fangraphs.com/>, and the 'MLB Stats' API <https://www.mlb.com/>.

Maintained by Saiem Gilani. Last updated 4 months ago.

baseball pitchfx sabermetrics statcast

55.2 match 380 stars 8.98 score 582 scripts

tidyverse

rvest:Easily Harvest (Scrape) Web Pages

Wrappers around the 'xml2' and 'httr' packages to make it easy to download, then manipulate, HTML and XML.

Maintained by Hadley Wickham. Last updated 5 months ago.

html web-scraping

17.6 match 1.5k stars 19.62 score 29k scripts 546 dependents

feddelegrand7

ralger:Easy Web Scraping

The goal of 'ralger' is to facilitate web scraping in R.

Maintained by Mohamed El Fodil Ihaddaden. Last updated 8 months ago.

dataextraction webcrawling webscraper-website webscraping

30.1 match 155 stars 7.41 score 33 scripts

rmi-pacta

pacta.data.scraping:Scrapes data from various web sources needed for PACTA

This package provides tools to scrape data from various web sources needed for PACTA.

Maintained by CJ Yetman. Last updated 5 months ago.

climate-change pacta pactaverse sustainable-finance

38.5 match 2 stars 3.89 score 3 scripts 1 dependent

gastonbecerra

ojsr:Crawler and Data Scraper for Open Journal System ('OJS')

Crawler for 'OJS' pages and scraper for meta-data from articles. You can crawl 'OJS' archives, issues, articles, galleys, and search results. You can scrape articles metadata from their head tag in html, or from Open Archives Initiative ('OAI') records. Most of these functions rely on 'OJS' routing conventions (<https://docs.pkp.sfu.ca/dev/documentation/en/architecture-routes>).

Maintained by Gaston Becerra. Last updated 4 months ago.

oai-pmh ojs scraper web-scraping

23.8 match 3 stars 4.35 score 15 scripts

dmi3kno

polite:Be Nice on the Web

Be responsible when scraping data from websites by following polite principles: introduce yourself, ask for permission, take slowly and never ask twice.

Maintained by Dmytro Perepolkin. Last updated 2 years ago.

crawler memoise rate-limiter robotstxt rvest scraper webscraping

9.1 match 327 stars 8.98 score 596 scripts 5 dependents

till-tietz

parsel:Parallel Dynamic Web-Scraping Using 'RSelenium'

A system to increase the efficiency of dynamic web-scraping with 'RSelenium' by leveraging parallel processing. You provide a function wrapper for your 'RSelenium' scraping routine with a set of inputs, and 'parsel' runs it in several browser instances. Chunked input processing as well as error catching and logging ensures seamless execution and minimal data loss, even when unforeseen 'RSelenium' errors occur. You can additionally build safe scraping functions with minimal coding by utilizing constructor functions that act as wrappers around 'RSelenium' methods.

Maintained by Till Tietz. Last updated 1 year ago.

parallel rselenium web-scraping

19.9 match 15 stars 3.88 score 8 scripts

jameshwade

gpttools:Extensions and Tools for gptstudio

gpttools is an R package that provides extensions to gptstudio to provide devtools-like functionality using the latest natural language processing (NLP) models. It is designed to make package development easier by providing a range of tools and functions that can be used to improve the quality of your package's documentation, testing, and maybe even functionality.

Maintained by James Wade. Last updated 7 months ago.

chatgpt nlp openai package-development rstudio-addin

10.8 match 293 stars 7.06 score 14 scripts

pgomba

MDPIexploreR:Web Scraping and Bibliometric Analysis of MDPI Journals

Provides comprehensive tools to scrape and analyze data from the MDPI journals. It allows users to extract metrics such as submission-to-acceptance times, article types, and whether articles are part of special issues. The package can also visualize this information through plots. Additionally, 'MDPIexploreR' offers tools to explore patterns of self-citations within articles and provides insights into guest-edited special issues.

Maintained by Pablo Gómez Barreiro. Last updated 4 months ago.

analysis data-analysis data-visualization mdpi metrics scientific-journals visualization web-scraping

10.9 match 20 stars 6.20 score 9 scripts

martigso

stortingscrape:Access Data from the Norwegian Parliament API

Functions for retrieving general and specific data from the Norwegian Parliament, through the Norwegian Parliament API at <https://data.stortinget.no>.

Maintained by Martin Søyland. Last updated 9 days ago.

scraping stortinget

10.0 match 11 stars 6.02 score 24 scripts

ashbythorpe

selenider:Concise, Lazy and Reliable Wrapper for 'chromote' and 'selenium'

A user-friendly wrapper for web automation, using either 'chromote' or 'selenium'. Provides a simple and consistent API to make web scraping and testing scripts easy to write and understand. Elements are lazy, and automatically wait for the website to be valid, resulting in reliable and reproducible code, with no visible impact on the experience of the programmer.

Maintained by Ashby Thorpe. Last updated 2 months ago.

web-scraping

8.0 match 39 stars 7.21 score 23 scripts

jsta

wikilake:Scrape Lake Metadata Tables from Wikipedia

Scrape lake metadata tables from Wikipedia <https://www.wikipedia.org/>.

Maintained by Jemma Stachelek. Last updated 2 years ago.

lakes limnology wikipedia

11.6 match 8 stars 4.83 score 17 scripts

erictleung

pixarfilms:Pixar Films and Achievements

Data about Disney Pixar films provided by Wikipedia. This package contains data about the films, the people involved, and their awards.

Maintained by Eric Leung. Last updated 2 days ago.

data data-science datapackage disney imdb imdb-dataset pixar pixar-films web-scraping wikipedia

7.5 match 20 stars 7.42 score 23 scripts 1 dependent

jimmyday12

fitzRoy:Easily Scrape and Process AFL Data

An easy package for scraping and processing Australian Rules Football (AFL) data. 'fitzRoy' provides a range of functions for accessing publicly available data from 'AFL Tables' <https://afltables.com/afl/afl_index.html>, 'Footy Wire' <https://www.footywire.com> and 'The Squiggle' <https://squiggle.com.au>. Further functions allow for easy processing, cleaning and transformation of this data into formats that can be used for analysis.

Maintained by James Day. Last updated 2 months ago.

4.8 match 134 stars 10.74 score 324 scripts

ctn-0094

DOPE:Drug Ontology Parsing Engine

Provides information on drug names (brand, generic and street) for drugs tracked by the DEA. There are functions that will search synonyms and return the drug names and types. The vignettes have extensive information on the work done to create the data for the package.

Maintained by Raymond Balise. Last updated 4 years ago.

5.8 match 21 stars 7.83 score 31 scripts

jaytimm

textpress:A Lightweight and Versatile NLP Toolkit

A simple Natural Language Processing (NLP) toolkit focused on search-centric workflows with minimal dependencies. The package offers key features for web scraping, text processing, corpus search, and text embedding generation via the 'HuggingFace API' <https://huggingface.co/docs/api-inference/index>.

Maintained by Jason Timm. Last updated 5 months ago.

corpus-search nlp openai-embeddings web-scraping

9.8 match 3 stars 4.18 score

chaoliu-cl

Goodreader:Scrape and Analyze 'Goodreads' Book Data

A comprehensive toolkit for scraping and analyzing book data from <https://www.goodreads.com/>. This package provides functions to search for books, scrape book details and reviews, perform sentiment analysis on reviews, and conduct topic modeling. It's designed for researchers, data analysts, and book enthusiasts who want to gain insights from 'Goodreads' data.

Maintained by Chao Liu. Last updated 13 days ago.

8.8 match 4.40 score 5 scripts

yuanchao-xu

gfer:Green Finance and Environmental Risk

Focuses on data collection, analysis and visualization in green finance and environmental risk research. Main functions include environmental data collection from official websites such as MEP (Ministry of Environmental Protection of China, <https://www.mee.gov.cn>), identification of water-related projects, and environmental data visualization.

Maintained by Yuanchao Xu. Last updated 3 months ago.

corporate-social-responsibility csr data-analysis data-scraping environmental-risk green-finance stock-data

7.5 match 8 stars 4.81 score 16 scripts

ricilandolt

shadowr:Selenium Plugin to Manage Multi Level Shadow Elements on Web Page

Shadow Document Object Model is a web standard that offers component style and markup encapsulation. It is a critically important piece of the Web Components story as it ensures that a component will work in any environment even if other CSS or JavaScript is at play on the page. Custom HTML tags can't be directly identified with selenium tools, because Selenium doesn't provide any way to deal with shadow elements. Using this plugin you can handle any custom HTML tags.

Maintained by Ricardo Landolt. Last updated 3 years ago.

rselenium rstudio scraping shadow-dom

10.0 match 5 stars 3.44 score 11 scripts

bioc

midasHLA:R package for immunogenomics data handling and association analysis

MiDAS is an R package for immunogenetics data transformation and statistical analysis. MiDAS accepts input data in the form of HLA alleles and KIR types, and can transform it into biologically meaningful variables, enabling HLA amino acid fine mapping, analyses of HLA evolutionary divergence, KIR gene presence, as well as validated HLA-KIR interactions. Further, it allows comprehensive statistical association analysis workflows with phenotypes of diverse measurement scales. MiDAS closes a gap between the inference of immunogenetic variation and its efficient utilization to make relevant discoveries related to T cell, Natural Killer cell, and disease biology.

Maintained by Maciej Migdał. Last updated 5 months ago.

cellbiology genetics statisticalmethod

7.1 match 4.30 score 3 scripts

ropensci

Rpolyhedra:Polyhedra Database

A polyhedra database scraped from various sources, provided as R6 objects with 'rgl' visualizing capabilities.

Maintained by Alejandro Baranek. Last updated 5 months ago.

geometry polyhedra-database rgl

4.5 match 12 stars 6.21 score 30 scripts

corynissen

fitbitScraper:Scrapes Data from Fitbit

Scrapes data from Fitbit <http://www.fitbit.com>. This does not use the official API, but instead uses the API that the web dashboard uses to generate the graphs displayed on the dashboard after login at <http://www.fitbit.com>.

Maintained by Cory Nissen. Last updated 8 years ago.

3.9 match 118 stars 5.93 score 12 scripts 1 dependent

jsakaluk

dySEM:Dyadic Structural Equation Modeling

Scripting of structural equation models via 'lavaan' for Dyadic Data Analysis, and helper functions for supplemental calculations, tabling, and model visualization. Current models supported include Dyadic Confirmatory Factor Analysis, the Actor–Partner Interdependence Model (observed and latent), the Common Fate Model (observed and latent), Mutual Influence Model (latent), and the Bifactor Dyadic Model (latent).

Maintained by John Sakaluk. Last updated 24 days ago.

3.2 match 6 stars 6.16 score 10 scripts

neonscience

neonUtilities:Utilities for Working with NEON Data

NEON data packages can be accessed through the NEON Data Portal <https://www.neonscience.org> or through the NEON Data API (see <https://data.neonscience.org/data-api> for documentation). Data delivered from the Data Portal are provided as monthly zip files packaged within a parent zip file, while individual files can be accessed from the API. This package provides tools that aid in discovering, downloading, and reformatting data prior to use in analyses. This includes downloading data via the API, merging data tables by type, and converting formats. For more information, see the readme file at <https://github.com/NEONScience/NEON-utilities>.

Maintained by Claire Lunch. Last updated 1 month ago.

1.8 match 57 stars 10.66 score 944 scripts 15 dependents

kwb-r

kwb.site:R Package for Scraping Our Official KWB Website (Before Re-Design in 2021)

This package contains functions for scraping our official [KWB website](https://kompetenz-wasser.de). The data for all projects and people can be collected to provide an overview of the website's content and to integrate that data into a KWB knowledge repo.

Maintained by Michael Rustler. Last updated 3 years ago.

knowledge-repo project-fakin r-selenium rvest web-scraping website

10.8 match 1.70 score 2 scripts

troyhill

VulnToolkit:Analysis of Tidal Datasets

Contains functions for analysis and summary of tidal datasets. Also provides access to tidal data collected by the National Oceanic and Atmospheric Administration's Center for Operational Oceanographic Products and Services and the Permanent Service for Mean Sea Level. For detailed description and application examples, see Hill, T.D. and S.C. Anisfeld (2021) <doi:10.6084/m9.figshare.14161202.v1> and Hill, T.D. and S.C. Anisfeld (2015) <doi:10.1016/j.ecss.2015.06.004>.

Maintained by Troy Hill. Last updated 4 years ago.

3.4 match 8 stars 5.18 score 19 scripts

sportsdataverse

hoopR:Access Men's Basketball Play by Play Data

A utility to quickly obtain clean and tidy men's basketball play by play data. Provides functions to access live play by play and box score data from ESPN <https://www.espn.com> with shot locations when available. It is also a full NBA Stats API <https://www.nba.com/stats/> wrapper. It is also a scraping and aggregating interface for Ken Pomeroy's men's college basketball statistics website <https://kenpom.com>. It provides users with an active subscription the capability to scrape the website tables and analyze the data for themselves.

Maintained by Saiem Gilani. Last updated 1 year ago.

basketball college-basketball espn kenpom nba nba-analytics nba-api nba-data nba-statistics nba-stats nba-stats-api ncaa ncaa-basketball ncaa-bracket ncaa-players ncaa-ratings ncaam sportsdataverse

2.5 match 91 stars 6.93 score 261 scripts

schochastics

paperwizard:Scrape News Sites using 'readability.js'

Uses Mozilla's readability.js to scrape text from websites. This is particularly useful to obtain news articles.

Maintained by David Schoch. Last updated 1 month ago.

5.3 match 6 stars 3.08 score

ropensci

awardFindR:awardFindR

Queries a number of scientific awards databases. Collects relevant results based on keyword and date parameters, returns list of projects that fit those criteria as a data frame. Sources include: Arnold Ventures, Carnegie Corp, Federal RePORTER, Gates Foundation, MacArthur Foundation, Mellon Foundation, NEH, NIH, NSF, Open Philanthropy, Open Society Foundations, Rockefeller Foundation, Russell Sage Foundation, Robert Wood Johnson Foundation, Sloan Foundation, Social Science Research Council, John Templeton Foundation, and USASpending.gov.

Maintained by Michael McCall. Last updated 12 months ago.

3.5 match 16 stars 4.38 score 3 scripts

bioc

rebook:Re-using Content in Bioconductor Books

Provides utilities to re-use content across chapters of a Bioconductor book. This is mostly based on functionality developed while writing the OSCA book, but generalized for potential use in other large books with heavy compute. Also contains some functions to assist book deployment.

Maintained by Aaron Lun. Last updated 5 months ago.

software infrastructure reportwriting

4.1 match 3.65 score 223 scripts

takeshinishimura

jstager:Retrieve Information Published on J-STAGE

Provides tools to access the J-STAGE WebAPI and retrieve information published on J-STAGE <https://www.jstage.jst.go.jp/browse/-char/ja>.

Maintained by Takeshi Nishimura. Last updated 7 months ago.

3.7 match 3 stars 4.08 score 5 scripts

mattcowgill

readabs:Download and Tidy Time Series Data from the Australian Bureau of Statistics

Downloads, imports, and tidies time series data from the Australian Bureau of Statistics <https://www.abs.gov.au/>.

Maintained by Matt Cowgill. Last updated 15 days ago.

abs australia australian-bureau-of-statistics australian-data statistics tidy-data time-series

1.7 match 104 stars 8.85 score 180 scripts

mjlajeunesse

metagear:Comprehensive Research Synthesis Tools for Systematic Reviews and Meta-Analysis

Functionalities for facilitating systematic reviews, data extractions, and meta-analyses. It includes a GUI (graphical user interface) to help screen the abstracts and titles of bibliographic data; tools to assign screening effort across multiple collaborators/reviewers and to assess inter-reviewer reliability; tools to help automate the download and retrieval of journal PDF articles from online databases; figure and image extractions from PDFs; web scraping of citations; automated and manual data extraction from scatter-plot and bar-plot images; PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) flow diagrams; simple imputation tools to fill gaps in incomplete or missing study parameters; generation of random effects sizes for Hedges' d, log response ratio, odds ratio, and correlation coefficients for Monte Carlo experiments; covariance equations for modelling dependencies among multiple effect sizes (e.g., effect sizes with a common control); and finally summaries that replicate analyses and outputs from widely used but no longer updated meta-analysis software (i.e., metawin). Funding for this package was supported by National Science Foundation (NSF) grants DBI-1262545 and DEB-1451031. CITE: Lajeunesse, M.J. (2016) Facilitating systematic reviews, data extraction and meta-analysis with the metagear package for R. Methods in Ecology and Evolution 7, 323-330 <doi:10.1111/2041-210X.12472>.

Maintained by Marc J. Lajeunesse. Last updated 4 years ago.

2.2 match 14 stars 6.71 score 91 scripts

villegar

scrappy:A Simple Web Scraper

A group of functions to scrape data from different websites, for academic purposes.

Maintained by Roberto Villegas-Diaz. Last updated 1 year ago.

4.4 match 4 stars 3.30 score

cgoo4

usedthese:Summarises Package & Function Usage

Consistent with 'knitr' syntax highlighting, 'usedthese' adds a summary table of package & function usage to a Quarto document and enables aggregation of usage across a website.

Maintained by Carl Goodwin. Last updated 8 months ago.

quarto

1.8 match 7 stars 6.70 score 120 scripts

exetrujillo

datamedios:Scraping Chilean Media

A system for extracting news from Chilean media through web scraping. The package allows for news searches using search phrases and date filters, and returns the results in a structured format, ready for analysis. Additionally, it includes functions to clean the extracted data, visualize it, and store it in databases. All of this can be done automatically, facilitating the collection and analysis of relevant information from Chilean media.

Maintained by Exequiel Trujillo. Last updated 28 days ago.

3.3 match 1 star 3.60 score

sportsdataverse

wehoop:Access Women's Basketball Play by Play Data

A utility for working with women's basketball data. A scraping and aggregating interface for the WNBA Stats API <https://stats.wnba.com/> and ESPN's <https://www.espn.com> women's college basketball and WNBA statistics. It provides users with the capability to access the game play-by-plays, box scores, standings and results to analyze the data for themselves.

Maintained by Saiem Gilani. Last updated 8 months ago.

college-basketball espn espn-stats ncaa ncaa-basketball professional-basketball-data sportsdataverse wnba wnba-players wnba-stats womens-basketball

2.2 match 28 stars 5.36 score 54 scripts

lindbrook

packageRank:Computation and Visualization of Package Download Counts and Percentile Ranks

Compute and visualize package download counts and percentile ranks from Posit/RStudio's CRAN mirror.

Maintained by lindbrook. Last updated 5 days ago.

bioconductor-packages

1.9 match 28 stars 6.13 score 27 scripts

liserman

archiveRetriever:Retrieve Archived Web Pages from the 'Internet Archive'

Scraping content from archived web pages stored in the 'Internet Archive' (<https://archive.org>) using a systematic workflow. Get an overview of the mementos available from the respective homepage, retrieve the URLs and links of the page and finally scrape the content. The final output is stored in tibbles, which can then be easily used for further analysis.

Maintained by Lukas Isermann. Last updated 9 months ago.

2.6 match 13 stars 4.32 score 16 scripts

michalovadek

eurlex:Retrieve Data on European Union Law

Access to data on European Union laws and court decisions made easy with pre-defined 'SPARQL' queries and 'GET' requests. See Ovadek (2021) <doi:10.1080/2474736X.2020.1870150>.

Maintained by Michal Ovadek. Last updated 7 months ago.

courts eurlex european-union law legislation sparql

1.8 match 36 stars 6.18 score 21 scripts

dylanpieper

batchLLM:Batch Process LLM Text Completions Using a Data Frame

Batch process large language model (LLM) text completions using data frame rows, with support for OpenAI's 'GPT' (<https://chat.openai.com>), Anthropic's 'Claude' (<https://claude.ai>), and Google's 'Gemini' (<https://gemini.google.com>). Includes features such as local storage, metadata logging, API rate limiting delays, and a 'shiny' app addin.

Maintained by Dylan Pieper. Last updated 1 month ago.

depreciated

2.3 match 11 stars 4.85 score 6 scripts

cran

NHSDataDictionaRy:NHS Data Dictionary Toolset for NHS Lookups

Providing a common set of simplified web scraping tools for working with the NHS Data Dictionary <https://datadictionary.nhs.uk/data_elements_overview.html>. The intended usage is to access the data elements section of the NHS Data Dictionary to access key lookups. The benefits of having it in this package are that the lookups are the live lookups on the website and will not need to be maintained. This package was commissioned by the NHS-R community <https://nhsrcommunity.com/> to provide this consistency of lookups. The OpenSafely lookups have now been added <https://www.opencodelists.org/docs/>.

Maintained by Gary Hutson. Last updated 4 years ago.

5.1 match 2.00 score

paithiov909

aznyan:An 'Utanet' Scraper and Utilities

Scrape lyrics from 'Utanet' website.

Maintained by Akiru Kato. Last updated 10 months ago.

cpp

4.5 match 2.00 score 1 script

kumes

chatAI4R:Chat-Based Interactive Artificial Intelligence for R

The Large Language Model (LLM) represents a groundbreaking advancement in data science and programming, and also allows us to extend the world of R. A seamless interface for integrating the 'OpenAI' Web APIs into R is provided in this package. This package leverages LLM-based AI techniques, enabling efficient knowledge discovery and data analysis (see 'OpenAI' Web APIs details <https://openai.com/blog/openai-api>). The previous functions such as seamless translation and image generation have been moved to other packages 'deepRstudio' and 'stableDiffusion4R'.

Maintained by Satoshi Kume. Last updated 1 month ago.

ai bioinformatics chatgpt gpt image image-generation

1.8 match 14 stars 4.45 score 3 scripts

globeandmail

upstartr:Utilities Powering the Globe and Mail's Data Journalism Template

Core functions necessary for using The Globe and Mail's R data journalism template, 'startr', along with utilities for day-to-day data journalism tasks, such as reading and writing files, producing graphics and cleaning up datasets.

Maintained by Tom Cardoso. Last updated 1 year ago.

data data-analysis data-journalism data-visualization journalism news

1.8 match 6 stars 4.14 score 46 scripts

scholaempirica

reschola:The Schola Empirica Package

A collection of utilities, themes and templates for data analysis at Schola Empirica.

Maintained by Jan Netík. Last updated 5 months ago.

1.5 match 4 stars 4.83 score 14 scripts

edonnachie

ICD10gm:Metadata Processing for the German Modification of the ICD-10 Coding System

Provides convenient access to the German modification of the International Classification of Diagnoses, 10th revision (ICD-10-GM). It provides functionality to aid in the identification, specification and historisation of ICD-10 codes. Its intended use is the analysis of routinely collected data in the context of epidemiology, medical research and health services research. The underlying metadata are released by the German Institute for Medical Documentation and Information <https://www.dimdi.de>, and are redistributed in accordance with their license.

Maintained by Ewan Donnachie. Last updated 1 year ago.

bfarm charlson comorbidities diagnoses dimdi icd-10 metadata routinedaten versorgungsforschung

1.3 match 10 stars 5.30 score 20 scripts

bioc

fobitools:Tools for Manipulating the FOBI Ontology

A set of tools for interacting with the Food-Biomarker Ontology (FOBI). A collection of basic manipulation tools for biological significance analysis, graphs, and text mining strategies for annotating nutritional data.

Maintained by Pol Castellano-Escuder. Last updated 4 months ago.

massspectrometry metabolomics software visualization biomedicalinformatics graphandnetwork annotation cheminformatics pathways genesetenrichment biological-intrerpretation biological-knowledge biological-significance-analysis enrichment-analysis food-biomarker-ontology knowledge-graph nutrition obofoundry ontology text-mining

1.2 match 1 star 5.08 score 5 scripts

p0bs

DataKindR:Provides Helper Functions for DataKind Volunteers

DataKind volunteers often need access to specific data or techniques in a DataDive or other project. This package seeks to simplify access to these resources.

Maintained by Robin Penfold. Last updated 9 months ago.

1.8 match 3.18 score 3 scripts

jboelaert

scraEP:Scrape the Web with Extra Power

Tools for scraping information from webpages and other XML contents, using XPath or CSS selectors.

Maintained by Julien Boelaert. Last updated 4 years ago.

5.5 match 1.00 score 3 scripts

gunratan

edgar:Tool for the U.S. SEC EDGAR Retrieval and Parsing of Corporate Filings

In the USA, companies file different forms with the U.S. Securities and Exchange Commission (SEC) through EDGAR (Electronic Data Gathering, Analysis, and Retrieval system). The EDGAR database automated system collects all the different necessary filings and makes them publicly available. This package facilitates retrieving, storing, searching, and parsing of all the available filings on the EDGAR server. It downloads filings from the SEC server in bulk with a single query. Additionally, it provides various useful functions: extracts 8-K triggering events, extracts "Business (Item 1)" and "Management's Discussion and Analysis (Item 7)" sections of annual statements, searches filings for desired keywords, provides sentiment measures, parses filing header information, and provides HTML view of SEC filings.

Maintained by Gunratan Lonare. Last updated 9 days ago.

1.8 match 10 stars 2.79 score 61 scripts

gojiplus

tuber:Client for the YouTube API

Get comments posted on YouTube videos, information on how many times a video has been liked, search for videos with particular content, and much more. You can also scrape captions from a few videos. To learn more about the YouTube API, see <https://developers.google.com/youtube/v3/>.

Maintained by Gaurav Sood. Last updated 8 months ago.

access-youtube caption video youtube youtube-api youtube-oauth

0.5 match 184 stars 8.99 score 206 scripts

yonicd

sinew:Package Development Documentation and Namespace Management

Manage package documentation and namespaces from the command line. Programmatically attaches namespaces in R and Rmd scripts, populates Roxygen2 skeletons with information scraped from within functions, and populates the Imports field of the DESCRIPTION file.

Maintained by Jonathan Sidi. Last updated 1 year ago.

0.5 match 166 stars 8.54 score 88 scripts

thinkr-open

cranology:The CRAN Chronology

Scraping routines and datasets to monitor the evolution of the number of packages on CRAN.

Maintained by Antoine Languillaume. Last updated 6 months ago.

2.4 match 1 star 1.70 score 5 scripts

keberwein

blscrapeR:An API Wrapper for the United States Bureau of Labor Statistics

Scrapes various data from <https://www.bls.gov/>. The Bureau of Labor Statistics is the statistical branch of the United States Department of Labor. The package has additional functions to help parse, analyze and visualize the data.

Maintained by Kris Eberwein. Last updated 1 year ago.

api api-wrapper bls bureau-of-labor-statistics consumer-price-index cpi inflation inflation-calculator labor-statistics unemployment

0.5 match 112 stars 7.66 score 270 scripts

sstoeckl

crypto2:Download Crypto Currency Data from 'CoinMarketCap' without 'API'

Retrieves crypto currency information and historical prices as well as information on the exchanges they are listed on. Historical data contains daily open, high, low and close values for all crypto currencies. All data is scraped from <https://coinmarketcap.com> via their 'web-api'.

Maintained by Sebastian Stoeckl. Last updated 7 days ago.

0.5 match 56 stars 7.33 score 60 scripts 1 dependent

ginsburg1

ProSportsDraftData:Professional Sports Draft Data

We provide comprehensive draft data for major professional sports leagues, including the National Football League (NFL), National Basketball Association (NBA), and National Hockey League (NHL). It offers access to both historical and current draft data, allowing for detailed analysis and research on player biases and player performance. The package is useful for sports fans and researchers interested in identifying biases and trends within scouting reports. Created by web scraping data from leading websites that cover professional sports player scouting reports, the package allows users to filter and summarize data for analytical purposes. For further details on the methods used, please refer to Wickham (2022) "rvest: Easily Harvest (Scrape) Web Pages" <https://CRAN.R-project.org/package=rvest> and Harrison (2023) "RSelenium: R Bindings for Selenium WebDriver" <https://CRAN.R-project.org/package=RSelenium>.

Maintained by Benjamin Ginsburg. Last updated 6 months ago.

0.8 match 2 stars 4.85 score 5 scripts

salimk

Rcrawler:Web Crawler and Scraper

Performs parallel web crawling and web scraping. It is designed to crawl, parse and store web pages to produce data that can be directly used for analysis application. For details see Khalil and Fakir (2017) <DOI:10.1016/j.softx.2017.04.004>.

Maintained by Salim Khalil. Last updated 5 years ago.

crawler crawlers scraper webcrawler webscraper webscraping webscrapping

0.5 match 354 stars 6.89 score 110 scripts

anniesbooth

runexp:Softball Run Expectancy using Markov Chains and Simulation

Implements two methods of estimating runs scored in a softball scenario: (1) theoretical expectation using discrete Markov chains and (2) empirical distribution using multinomial random simulation. Scores are based on player-specific input probabilities (out, single, double, triple, walk, and homerun). Optional inputs include probability of attempting a steal, probability of succeeding in an attempted steal, and an indicator of whether a player is "fast" (e.g. the player could stretch home). These probabilities may be calculated from common player statistics that are publicly available on teams' webpages. Scores are evaluated based on a nine-player lineup and may be used to compare lineups, evaluate base scenarios, and compare the offensive potential of individual players. Manuscript forthcoming. See Bukiet & Harold (1997) <doi:10.1287/opre.45.1.14> for implementation of discrete Markov chains.

Maintained by Annie Sauer. Last updated 4 years ago.

3.3 match 1.00 score 1 scripts

sewardlee337

finreportr:Financial Data from U.S. Securities and Exchange Commission

Download and display company financial data from the U.S. Securities and Exchange Commission's EDGAR database. It contains a suite of functions with web scraping and XBRL parsing capabilities that allows users to extract data from EDGAR in an automated and scalable manner. See <https://www.sec.gov/edgar/searchedgar/companysearch.html> for more information.

Maintained by Seward Lee. Last updated 3 years ago.

balance-sheet cash-flow finance financial-data financial-statement financial-statements income-statement sec stock-ticker-symbol

0.5 match 131 stars 6.28 score 29 scripts

cran

WhatsR:Parsing, Anonymizing and Visualizing Exported 'WhatsApp' Chat Logs

Imports 'WhatsApp' chat logs and parses them into a usable dataframe object. The parser works on chats exported from Android or iOS phones and on Linux, macOS and Windows. The parser has multiple options for extracting smileys and emojis from the messages, extracting URLs and domains from the messages, extracting names and types of sent media files from the messages, extracting timestamps from messages, extracting and anonymizing author names from messages. Can be used to create anonymized versions of data.

Maintained by Julian Kohne. Last updated 1 years ago.

openjdk

1.7 match 1.70 score 3 scripts

elipousson

esri2sf:Create Simple Features from ArcGIS Server REST API

This package enables you to scrape geographic features directly from an ArcGIS Server REST API into R as simple features.

Maintained by Eli Pousson. Last updated 4 months ago.

arcgis esri

0.5 match 7 stars 5.28 score 50 scripts 1 dependents

ropensci

epair:EPA Data Helper for R

Aids the user in making queries to the EPA API site found at https://aqs.epa.gov/aqsweb/documents/data_api. This package combines API calling methods from various web scraping packages with specific strings to retrieve data from the EPA API. It also contains easy-to-use loaded variables that help a user navigate services offered by the API and determine the appropriate way to make an API call.

Maintained by G.L. Orozco-Mulfinger. Last updated 3 years ago.

0.5 match 7 stars 4.89 score 11 scripts

giocomai

castarter:Content Analysis Starter Toolkit

Consistent approaches for basic web scraping, text mining and word frequency analysis of textual datasets

Maintained by Giorgio Comai. Last updated 2 days ago.

tada text-mining

0.5 match 3 stars 4.52 score 2 scripts

fatelarico

FinNet:Quickly Build and Manipulate Financial Networks

Provides classes, methods, and functions to deal with financial networks. Users can easily store information about both physical and legal persons by using pre-made classes that are designed for integration with scraping packages such as 'rvest' and 'RSelenium'. Moreover, the package assists in creating various types of financial networks depending on the relation under scrutiny (ownership, board interlocks, etc.) and the desired tie type (valued or binary), and renders them in the most common formats (adjacency matrix, incidence matrix, edge list, 'igraph', 'network'). There are also ad-hoc functions for the Fiedler value, global network efficiency, and cascade-failure analysis.

Maintained by Fabio Ashtar Telarico. Last updated 5 months ago.

cpp

0.5 match 2 stars 4.78 score 7 scripts

conjugateprior

twfy:Drive the API for TheyWorkForYou

An R wrapper around the API of TheyWorkForYou, a parliamentary monitoring site that scrapes and repackages Hansard (the UK's parliamentary record) and augments it with information from the Register of Members' Interests, election results, and voting records to provide a unified source of information about UK legislators and their activities. See <http://www.theyworkforyou.com> for details.

Maintained by Will Lowe. Last updated 6 years ago.

0.5 match 9 stars 4.65 score 3 scripts

shabbychef

cocktailApp:'shiny' App to Discover Cocktails

A 'shiny' app to discover cocktails. The app allows one to search for cocktails by ingredient, filter on rating, and number of ingredients. The package also contains data with the ingredients of nearly 26 thousand cocktails scraped from the web.

Maintained by Steven E. Pav. Last updated 3 years ago.

0.5 match 43 stars 4.33 score 5 scripts

ashbaldry

appler:'Apple App Store' and 'iTunes' Data Extraction

Using 'Apple App Store' <https://www.apple.com/app-store/> web scraping and 'iTunes' API <https://performance-partners.apple.com/search-api> to extract content information, app ratings and reviews.

Maintained by Ashley Baldry. Last updated 2 years ago.

0.5 match 18 stars 4.13 score 15 scripts

eflores89

banxicoR:Download Data from the Bank of Mexico

Provides functions to scrape IQY calls to Bank of Mexico, downloading and ordering the data conveniently.

Maintained by Eduardo Flores. Last updated 7 years ago.

0.5 match 11 stars 3.74 score 7 scripts

muschellij2

gcite:Google Citation Parser

Scrapes Google Citation pages and creates data frames of citations over time.

Maintained by John Muschelli. Last updated 3 years ago.

0.6 match 3 stars 3.67 score 31 scripts

lcef97

SchoolDataIT:Retrieve, Harmonise and Map Open Data Regarding the Italian School System

Compiles and displays the available data sets regarding the Italian school system, with a focus on the infrastructural aspects. Input datasets are downloaded from the web, with the aim of keeping everything up to date. The functions are divided into four main modules: 'Get', to scrape raw data from the web; 'Util', various utilities needed to process raw data; 'Group', to aggregate data at the municipality or province level; and 'Map', to visualize the output datasets.

Maintained by Leonardo Cefalo. Last updated 2 months ago.

0.5 match 3.88 score

ktemadarko

rGhanaCensus:2021 Ghana Population and Housing Census Results as Data Frames

Datasets from the 2021 Ghana Population and Housing Census Results. Users can access results as 'tidyverse'- and 'sf'-ready data frames. The data in this package is scraped from PDF reports released on the Ghana Statistical Service website <https://census2021.statsghana.gov.gh/>. The package currently only contains datasets from the literacy and education reports, namely school attendance data for respondents aged 3 years and above.

Maintained by Ama Owusu-Darko. Last updated 3 years ago.

0.5 match 3.70 score 2 scripts

matt-dray

altcheckr:Assess Image Alt Text on a Web Page

Scrape image element attributes from a webpage, detect alternative (alt) text and assess it with simple heuristics. Alt text is important for users of assistive technologies, like screen readers, for understanding the content of images. This package should be used in conjunction with other accessibility assessment tools for more comprehensive coverage.

Maintained by Matt Dray. Last updated 4 years ago.

accessibility alt-text webscraping

0.5 match 7 stars 3.54 score 6 scripts

aymennasri

ggfootball:Plotting Football matches Expected Goals (xG) Stats with 'Understat' Data

Scrapes football match shots data from 'Understat' <https://understat.com/> and visualizes it using interactive plots: a detailed shot map displaying the location, type, and xG value of shots taken by both teams, and an xG timeline chart showing the cumulative xG for each team over time, annotated with the details of scored goals.

Maintained by Aymen Nasri. Last updated 10 days ago.

football football-analytics football-data football-scores soccer soccer-analytics soccer-data sports-data understat

0.5 match 1 stars 3.54 score 10 scripts

schochastics

webbotparseR:Parse html files containing search engine results

Parse search engine results which have been scraped with the 'WebBot' browser extension <https://github.com/gesiscss/WebBot>.

Maintained by David Schoch. Last updated 4 months ago.

browser-extension search-engine

0.5 match 8 stars 3.38 score 6 scripts

mayamathur

MetaUtility:Utility Functions for Conducting and Interpreting Meta-Analyses

Contains functions to estimate the proportion of effects stronger than a threshold of scientific importance (function prop_stronger), to nonparametrically characterize the distribution of effects in a meta-analysis (calib_ests, pct_pval), to make effect size conversions (r_to_d, r_to_z, z_to_r, d_to_logRR), to compute and format inference in a meta-analysis (format_CI, format_stat, tau_CI), to scrape results from existing meta-analyses for re-analysis (scrape_meta, parse_CI_string, ci_to_var).

Maintained by Maya B. Mathur. Last updated 3 years ago.

0.5 match 3.40 score 21 scripts 2 dependents

chris-dworschak

disastr.api:Wrapper for the UN OCHA ReliefWeb Disaster Events API

Access and manage the application programming interface (API) of the United Nations Office for the Coordination of Humanitarian Affairs' (OCHA) ReliefWeb disaster events at <https://reliefweb.int/disasters>. The package requires a minimal number of dependencies. It offers functionality to retrieve a user-defined sample of disaster events from ReliefWeb, providing an easy alternative to scraping the ReliefWeb website. It enables a seamless integration of regular data updates into the research workflow.

Maintained by Christoph Dworschak. Last updated 11 months ago.

api-wrapper disaster-events ocha reliefweb

0.5 match 3 stars 3.18 score 6 scripts

amalan-constat

SLPresElection:Presidential Election Data of "Sri Lanka" from 1982 to 2015

Presidential Election data of "Sri Lanka" is stored in PDF files; through PDF scraping, they are converted into data frames and stored in this R package.

Maintained by Amalan Mahendran. Last updated 5 months ago.

presidential-election sri-lanka

0.5 match 1 stars 3.00 score 4 scripts

pilacuan-bonete-luis

LDABiplots:Biplot Graphical Interface for LDA Models

Contains the development of a tool that provides a web-based graphical user interface (GUI) to produce Biplot representations from news scraped from digital newspapers, using the Bayesian approach of Latent Dirichlet Allocation (LDA) and machine learning algorithms. Contains LDA methods described by Blei, David M., Andrew Y. Ng and Michael I. Jordan (2003) <https://jmlr.org/papers/volume3/blei03a/blei03a.pdf>, and Biplot methods described by Gabriel K.R. (1971) <doi:10.1093/biomet/58.3.453> and Galindo-Villardon P. (1986) <https://diarium.usal.es/pgalindo/files/2012/07/Questiio.pdf>.

Maintained by Luis Pilacuan-Bonete. Last updated 3 years ago.

0.5 match 3.00 score 4 scripts

kvasilopoulos

ihpdr:Download Data from the International House Price Database

Web scraping of <https://www.dallasfed.org> for up-to-date data on international house prices and exuberance indicators. Download data in tidy format.

Maintained by Kostas Vasilopoulos. Last updated 4 years ago.

0.5 match 2.70 score 9 scripts

amalan-constat

SouthParkRshiny:Data and 'Shiny' Application for the Show 'SouthPark'

Ratings, votes, swear words and sentiments are analysed for the show 'SouthPark' through a 'Shiny' application after web scraping from 'IMDB' and the website <https://southpark.fandom.com/wiki/South_Park_Archives>.

Maintained by Amalan Mahendran. Last updated 1 years ago.

0.5 match 1 stars 2.70 score

cran

BAwiR:Analysis of Basketball Data

Collection of tools to work with European basketball data. Functions available are related to friendly web scraping, data management and visualization. Data were obtained from <https://www.euroleaguebasketball.net/euroleague/>, <https://www.euroleaguebasketball.net/eurocup/> and <https://www.acb.com/>, following the instructions of their respective robots.txt files, when available. Box score data are available for the three leagues. Play-by-play data are also available for the Spanish league. Methods for analysis include a population pyramid, 2D plots, circular plots of players' percentiles, plots of players' monthly/yearly stats, team heatmaps, team shooting plots, team four factors plots, cross-tables with the results of regular season games, maps of nationalities, combinations of lineups, possessions-related variables, timeouts, performance by periods, personal fouls and offensive rebounds. Please see Vinue (2020) <doi:10.1089/big.2018.0124> and Vinue (2024) <doi:10.1089/big.2023.0177>.

Maintained by Guillermo Vinue. Last updated 1 months ago.

0.5 match 1 stars 2.60 score

bentaylor1

miscFuncs:Miscellaneous Useful Functions Including LaTeX Tables, Kalman Filtering, QQplots with Simulation-Based Confidence Intervals, Linear Regression Diagnostics and Development Tools

Implements various things including functions for LaTeX tables, the Kalman filter, QQ-plots with simulation-based confidence intervals, linear regression diagnostics, web scraping, development tools, relative risk and odds ratio, and GARCH(1,1) forecasting.

Maintained by Benjamin M. Taylor. Last updated 4 months ago.

0.5 match 2.48 score 8 scripts

marchionnilab

covid19census:Extracts Covid-19 and other demographic metrics regarding U.S.A and Italy

Package with functions to scrape data regarding the COVID-19 epidemic in the U.S.A. and Italy, as well as datasets with related indexes.

Maintained by claudio_zanettini. Last updated 4 years ago.

0.5 match 2 stars 2.00 score 5 scripts

sebkrantz

samadb:South Africa Macroeconomic Database API

An R API providing access to a relational database with macroeconomic time series data for South Africa, obtained from the South African Reserve Bank (SARB) and Statistics South Africa (STATSSA), and updated on a weekly basis via the EconData <https://www.econdata.co.za/> platform and automated scraping of the SARB and STATSSA websites. The database is maintained at the Department of Economics at Stellenbosch University.

Maintained by Sebastian Krantz. Last updated 10 months ago.

0.5 match 1.00 score 2 scripts