Showing 85 of 85 results
billpetti
baseballr:Acquiring and Analyzing Baseball Data
Provides numerous utilities for acquiring and analyzing baseball data from online sources such as 'Baseball Reference' <https://www.baseball-reference.com/>, 'FanGraphs' <https://www.fangraphs.com/>, and the 'MLB Stats' API <https://www.mlb.com/>.
Maintained by Saiem Gilani. Last updated 4 months ago.
baseballpitchfxsabermetricsstatcast
55.2 match 380 stars 8.98 score 582 scripts
tidyverse
rvest:Easily Harvest (Scrape) Web Pages
Wrappers around the 'xml2' and 'httr' packages to make it easy to download, then manipulate, HTML and XML.
Maintained by Hadley Wickham. Last updated 5 months ago.
17.6 match 1.5k stars 19.62 score 29k scripts 546 dependents
feddelegrand7
ralger:Easy Web Scraping
The goal of 'ralger' is to facilitate web scraping in R.
Maintained by Mohamed El Fodil Ihaddaden. Last updated 8 months ago.
dataextractionwebcrawlingwebscraper-websitewebscraping
30.1 match 155 stars 7.41 score 33 scripts
rmi-pacta
pacta.data.scraping:Scrapes data from various web sources needed for PACTA
This package provides tools to scrape data from various web sources needed for PACTA.
Maintained by CJ Yetman. Last updated 5 months ago.
climate-changepactapactaversesustainable-finance
38.5 match 2 stars 3.89 score 3 scripts 1 dependents
gastonbecerra
ojsr:Crawler and Data Scraper for Open Journal System ('OJS')
Crawler for 'OJS' pages and scraper for meta-data from articles. You can crawl 'OJS' archives, issues, articles, galleys, and search results. You can scrape articles metadata from their head tag in html, or from Open Archives Initiative ('OAI') records. Most of these functions rely on 'OJS' routing conventions (<https://docs.pkp.sfu.ca/dev/documentation/en/architecture-routes>).
Maintained by Gaston Becerra. Last updated 4 months ago.
23.8 match 3 stars 4.35 score 15 scripts
dmi3kno
polite:Be Nice on the Web
Be responsible when scraping data from websites by following polite principles: introduce yourself, ask for permission, take slowly and never ask twice.
Maintained by Dmytro Perepolkin. Last updated 2 years ago.
crawlermemoiserate-limiterrobotstxtrvestscraperwebscraping
9.1 match 327 stars 8.98 score 596 scripts 5 dependents
till-tietz
parsel:Parallel Dynamic Web-Scraping Using 'RSelenium'
A system to increase the efficiency of dynamic web-scraping with 'RSelenium' by leveraging parallel processing. You provide a function wrapper for your 'RSelenium' scraping routine with a set of inputs, and 'parsel' runs it in several browser instances. Chunked input processing as well as error catching and logging ensures seamless execution and minimal data loss, even when unforeseen 'RSelenium' errors occur. You can additionally build safe scraping functions with minimal coding by utilizing constructor functions that act as wrappers around 'RSelenium' methods.
Maintained by Till Tietz. Last updated 1 year ago.
19.9 match 15 stars 3.88 score 8 scripts
jameshwade
gpttools:Extensions and Tools for gptstudio
gpttools is an R package that provides extensions to gptstudio to provide devtools-like functionality using the latest natural language processing (NLP) models. It is designed to make package development easier by providing a range of tools and functions that can be used to improve the quality of your package's documentation, testing, and maybe even functionality.
Maintained by James Wade. Last updated 7 months ago.
chatgptnlpopenaipackage-developmentrstudio-addin
10.8 match 293 stars 7.06 score 14 scripts
pgomba
MDPIexploreR:Web Scraping and Bibliometric Analysis of MDPI Journals
Provides comprehensive tools to scrape and analyze data from the MDPI journals. It allows users to extract metrics such as submission-to-acceptance times, article types, and whether articles are part of special issues. The package can also visualize this information through plots. Additionally, 'MDPIexploreR' offers tools to explore patterns of self-citations within articles and provides insights into guest-edited special issues.
Maintained by Pablo Gómez Barreiro. Last updated 4 months ago.
analysisdata-analysisdata-visualizationmdpimetricsscientific-journalsvisualizationweb-scraping
10.9 match 20 stars 6.20 score 9 scripts
martigso
stortingscrape:Access Data from the Norwegian Parliament API
Functions for retrieving general and specific data from the Norwegian Parliament, through the Norwegian Parliament API at <https://data.stortinget.no>.
Maintained by Martin Søyland. Last updated 9 days ago.
10.0 match 11 stars 6.02 score 24 scripts
ashbythorpe
selenider:Concise, Lazy and Reliable Wrapper for 'chromote' and 'selenium'
A user-friendly wrapper for web automation, using either 'chromote' or 'selenium'. Provides a simple and consistent API to make web scraping and testing scripts easy to write and understand. Elements are lazy, and automatically wait for the website to be valid, resulting in reliable and reproducible code, with no visible impact on the experience of the programmer.
Maintained by Ashby Thorpe. Last updated 2 months ago.
8.0 match 39 stars 7.21 score 23 scripts
jsta
wikilake:Scrape Lake Metadata Tables from Wikipedia
Scrape lake metadata tables from Wikipedia <https://www.wikipedia.org/>.
Maintained by Jemma Stachelek. Last updated 2 years ago.
11.6 match 8 stars 4.83 score 17 scripts
erictleung
pixarfilms:Pixar Films and Achievements
Data about Disney Pixar films provided by Wikipedia. This package contains data about the films, the people involved, and their awards.
Maintained by Eric Leung. Last updated 2 days ago.
datadata-sciencedatapackagedisneyimdbimdb-datasetpixarpixar-filmsweb-scrapingwikipedia
7.5 match 20 stars 7.42 score 23 scripts 1 dependents
jimmyday12
fitzRoy:Easily Scrape and Process AFL Data
An easy package for scraping and processing Australian Rules Football (AFL) data. 'fitzRoy' provides a range of functions for accessing publicly available data from 'AFL Tables' <https://afltables.com/afl/afl_index.html>, 'Footy Wire' <https://www.footywire.com> and 'The Squiggle' <https://squiggle.com.au>. Further functions allow for easy processing, cleaning and transformation of this data into formats that can be used for analysis.
Maintained by James Day. Last updated 2 months ago.
4.8 match 134 stars 10.74 score 324 scripts
ctn-0094
DOPE:Drug Ontology Parsing Engine
Provides information on drug names (brand, generic and street) for drugs tracked by the DEA. There are functions that will search synonyms and return the drug names and types. The vignettes have extensive information on the work done to create the data for the package.
Maintained by Raymond Balise. Last updated 4 years ago.
5.8 match 21 stars 7.83 score 31 scripts
jaytimm
textpress:A Lightweight and Versatile NLP Toolkit
A simple Natural Language Processing (NLP) toolkit focused on search-centric workflows with minimal dependencies. The package offers key features for web scraping, text processing, corpus search, and text embedding generation via the 'HuggingFace API' <https://huggingface.co/docs/api-inference/index>.
Maintained by Jason Timm. Last updated 5 months ago.
corpus-searchnlpopenai-embeddingsweb-scraping
9.8 match 3 stars 4.18 score
chaoliu-cl
Goodreader:Scrape and Analyze 'Goodreads' Book Data
A comprehensive toolkit for scraping and analyzing book data from <https://www.goodreads.com/>. This package provides functions to search for books, scrape book details and reviews, perform sentiment analysis on reviews, and conduct topic modeling. It's designed for researchers, data analysts, and book enthusiasts who want to gain insights from 'Goodreads' data.
Maintained by Chao Liu. Last updated 13 days ago.
8.8 match 4.40 score 5 scripts
yuanchao-xu
gfer:Green Finance and Environmental Risk
Focuses on data collection, analysis and visualization in green finance and environmental risk research. Main functions include environmental data collection from official websites such as MEP (Ministry of Environmental Protection of China, <https://www.mee.gov.cn>), identification of water-related projects and environmental data visualization.
Maintained by Yuanchao Xu. Last updated 3 months ago.
corporate-social-responsibilitycsrdata-analysisdata-scrapingenvironmental-riskgreen-financestock-data
7.5 match 8 stars 4.81 score 16 scripts
ricilandolt
shadowr:Selenium Plugin to Manage Multi Level Shadow Elements on Web Page
Shadow Document Object Model is a web standard that offers component style and markup encapsulation. It is a critically important piece of the Web Components story as it ensures that a component will work in any environment even if other CSS or JavaScript is at play on the page. Custom HTML tags can't be directly identified with selenium tools, because Selenium doesn't provide any way to deal with shadow elements. Using this plugin you can handle any custom HTML tags.
Maintained by Ricardo Landolt. Last updated 3 years ago.
rseleniumrstudioscrapingshadow-dom
10.0 match 5 stars 3.44 score 11 scripts
bioc
midasHLA:R package for immunogenomics data handling and association analysis
MiDAS is an R package for immunogenetics data transformation and statistical analysis. MiDAS accepts input data in the form of HLA alleles and KIR types, and can transform it into biologically meaningful variables, enabling HLA amino acid fine mapping, analyses of HLA evolutionary divergence, KIR gene presence, as well as validated HLA-KIR interactions. Further, it allows comprehensive statistical association analysis workflows with phenotypes of diverse measurement scales. MiDAS closes a gap between the inference of immunogenetic variation and its efficient utilization to make relevant discoveries related to T cell, Natural Killer cell, and disease biology.
Maintained by Maciej Migdał. Last updated 5 months ago.
cellbiologygeneticsstatisticalmethod
7.1 match 4.30 score 3 scripts
ropensci
Rpolyhedra:Polyhedra Database
A polyhedra database scraped from various sources, provided as R6 objects with 'rgl' visualization capabilities.
Maintained by Alejandro Baranek. Last updated 5 months ago.
4.5 match 12 stars 6.21 score 30 scripts
corynissen
fitbitScraper:Scrapes Data from Fitbit
Scrapes data from Fitbit <http://www.fitbit.com>. This does not use the official API, but instead uses the API that the web dashboard uses to generate the graphs displayed on the dashboard after login at <http://www.fitbit.com>.
Maintained by Cory Nissen. Last updated 8 years ago.
3.9 match 118 stars 5.93 score 12 scripts 1 dependents
jsakaluk
dySEM:Dyadic Structural Equation Modeling
Scripting of structural equation models via 'lavaan' for Dyadic Data Analysis, and helper functions for supplemental calculations, tabling, and model visualization. Current models supported include Dyadic Confirmatory Factor Analysis, the Actor–Partner Interdependence Model (observed and latent), the Common Fate Model (observed and latent), Mutual Influence Model (latent), and the Bifactor Dyadic Model (latent).
Maintained by John Sakaluk. Last updated 24 days ago.
3.2 match 6 stars 6.16 score 10 scripts
neonscience
neonUtilities:Utilities for Working with NEON Data
NEON data packages can be accessed through the NEON Data Portal <https://www.neonscience.org> or through the NEON Data API (see <https://data.neonscience.org/data-api> for documentation). Data delivered from the Data Portal are provided as monthly zip files packaged within a parent zip file, while individual files can be accessed from the API. This package provides tools that aid in discovering, downloading, and reformatting data prior to use in analyses. This includes downloading data via the API, merging data tables by type, and converting formats. For more information, see the readme file at <https://github.com/NEONScience/NEON-utilities>.
Maintained by Claire Lunch. Last updated 1 month ago.
1.8 match 57 stars 10.66 score 944 scripts 15 dependents
kwb-r
kwb.site:R Package for Scraping Our Official KWB Website (Before Re-Design in 2021)
This package contains functions for scraping our official [KWB website](https://kompetenz-wasser.de). The data for all projects and people can be collected in order to provide an overview of the website's content and to integrate that data into a KWB knowledge repo.
Maintained by Michael Rustler. Last updated 3 years ago.
knowledge-repoproject-fakinr-seleniumrvestweb-scrapingwebsite
10.8 match 1.70 score 2 scripts
troyhill
VulnToolkit:Analysis of Tidal Datasets
Contains functions for analysis and summary of tidal datasets. Also provides access to tidal data collected by the National Oceanic and Atmospheric Administration's Center for Operational Oceanographic Products and Services and the Permanent Service for Mean Sea Level. For detailed description and application examples, see Hill, T.D. and S.C. Anisfeld (2021) <doi:10.6084/m9.figshare.14161202.v1> and Hill, T.D. and S.C. Anisfeld (2015) <doi:10.1016/j.ecss.2015.06.004>.
Maintained by Troy Hill. Last updated 4 years ago.
3.4 match 8 stars 5.18 score 19 scripts
sportsdataverse
hoopR:Access Men's Basketball Play by Play Data
A utility to quickly obtain clean and tidy men's basketball play by play data. Provides functions to access live play by play and box score data from ESPN <https://www.espn.com> with shot locations when available. It is also a full NBA Stats API <https://www.nba.com/stats/> wrapper. It is also a scraping and aggregating interface for Ken Pomeroy's men's college basketball statistics website <https://kenpom.com>. It provides users with an active subscription the capability to scrape the website tables and analyze the data for themselves.
Maintained by Saiem Gilani. Last updated 1 year ago.
basketballcollege-basketballespnkenpomnbanba-analyticsnba-apinba-datanba-statisticsnba-statsnba-stats-apincaancaa-basketballncaa-bracketncaa-playersncaa-ratingsncaamsportsdataverse
2.5 match 91 stars 6.93 score 261 scripts
schochastics
paperwizard:Scrape News Sites using 'readability.js'
Uses Mozilla's readability.js to scrape text from websites. This is particularly useful to obtain news articles.
Maintained by David Schoch. Last updated 1 month ago.
5.3 match 6 stars 3.08 score
ropensci
awardFindR:awardFindR
Queries a number of scientific awards databases. Collects relevant results based on keyword and date parameters, returns list of projects that fit those criteria as a data frame. Sources include: Arnold Ventures, Carnegie Corp, Federal RePORTER, Gates Foundation, MacArthur Foundation, Mellon Foundation, NEH, NIH, NSF, Open Philanthropy, Open Society Foundations, Rockefeller Foundation, Russell Sage Foundation, Robert Wood Johnson Foundation, Sloan Foundation, Social Science Research Council, John Templeton Foundation, and USASpending.gov.
Maintained by Michael McCall. Last updated 12 months ago.
3.5 match 16 stars 4.38 score 3 scripts
bioc
rebook:Re-using Content in Bioconductor Books
Provides utilities to re-use content across chapters of a Bioconductor book. This is mostly based on functionality developed while writing the OSCA book, but generalized for potential use in other large books with heavy compute. Also contains some functions to assist book deployment.
Maintained by Aaron Lun. Last updated 5 months ago.
softwareinfrastructurereportwriting
4.1 match 3.65 score 223 scripts
takeshinishimura
jstager:Retrieve Information Published on J-STAGE
Provides tools to access the J-STAGE WebAPI and retrieve information published on J-STAGE <https://www.jstage.jst.go.jp/browse/-char/ja>.
Maintained by Takeshi Nishimura. Last updated 7 months ago.
3.7 match 3 stars 4.08 score 5 scripts
mattcowgill
readabs:Download and Tidy Time Series Data from the Australian Bureau of Statistics
Downloads, imports, and tidies time series data from the Australian Bureau of Statistics <https://www.abs.gov.au/>.
Maintained by Matt Cowgill. Last updated 15 days ago.
absaustraliaaustralian-bureau-of-statisticsaustralian-datastatisticstidy-datatime-series
1.7 match 104 stars 8.85 score 180 scripts
villegar
scrappy:A Simple Web Scraper
A group of functions to scrape data from different websites, for academic purposes.
Maintained by Roberto Villegas-Diaz. Last updated 1 year ago.
4.4 match 4 stars 3.30 score
cgoo4
usedthese:Summarises Package & Function Usage
Consistent with 'knitr' syntax highlighting, 'usedthese' adds a summary table of package & function usage to a Quarto document and enables aggregation of usage across a website.
Maintained by Carl Goodwin. Last updated 8 months ago.
1.8 match 7 stars 6.70 score 120 scripts
exetrujillo
datamedios:Scraping Chilean Media
A system for extracting news from Chilean media through web scraping. The package allows for news searches using search phrases and date filters, and returns the results in a structured format, ready for analysis. Additionally, it includes functions to clean the extracted data, visualize it, and store it in databases. All of this can be done automatically, facilitating the collection and analysis of relevant information from Chilean media.
Maintained by Exequiel Trujillo. Last updated 28 days ago.
3.3 match 1 stars 3.60 score
sportsdataverse
wehoop:Access Women's Basketball Play by Play Data
A utility for working with women's basketball data. A scraping and aggregating interface for the WNBA Stats API <https://stats.wnba.com/> and ESPN's <https://www.espn.com> women's college basketball and WNBA statistics. It provides users with the capability to access the game play-by-plays, box scores, standings and results to analyze the data for themselves.
Maintained by Saiem Gilani. Last updated 8 months ago.
college-basketballespnespn-statsncaancaa-basketballprofessional-basketball-datasportsdataversewnbawnba-playerswnba-statswomens-basketball
2.2 match 28 stars 5.36 score 54 scripts
lindbrook
packageRank:Computation and Visualization of Package Download Counts and Percentile Ranks
Compute and visualize package download counts and percentile ranks from Posit/RStudio's CRAN mirror.
Maintained by lindbrook. Last updated 5 days ago.
1.9 match 28 stars 6.13 score 27 scripts
liserman
archiveRetriever:Retrieve Archived Web Pages from the 'Internet Archive'
Scraping content from archived web pages stored in the 'Internet Archive' (<https://archive.org>) using a systematic workflow. Get an overview of the mementos available from the respective homepage, retrieve the URLs and links of the page and finally scrape the content. The final output is stored in tibbles, which can then easily be used for further analysis.
Maintained by Lukas Isermann. Last updated 9 months ago.
2.6 match 13 stars 4.32 score 16 scripts
michalovadek
eurlex:Retrieve Data on European Union Law
Access to data on European Union laws and court decisions made easy with pre-defined 'SPARQL' queries and 'GET' requests. See Ovadek (2021) <doi:10.1080/2474736X.2020.1870150>.
Maintained by Michal Ovadek. Last updated 7 months ago.
courtseurlexeuropean-unionlawlegislationsparql
1.8 match 36 stars 6.18 score 21 scripts
dylanpieper
batchLLM:Batch Process LLM Text Completions Using a Data Frame
Batch process large language model (LLM) text completions using data frame rows, with support for OpenAI's 'GPT' (<https://chat.openai.com>), Anthropic's 'Claude' (<https://claude.ai>), and Google's 'Gemini' (<https://gemini.google.com>). Includes features such as local storage, metadata logging, API rate limiting delays, and a 'shiny' app addin.
Maintained by Dylan Pieper. Last updated 1 month ago.
2.3 match 11 stars 4.85 score 6 scripts
cran
NHSDataDictionaRy:NHS Data Dictionary Toolset for NHS Lookups
Providing a common set of simplified web scraping tools for working with the NHS Data Dictionary <https://datadictionary.nhs.uk/data_elements_overview.html>. The intended usage is to access the data elements section of the NHS Data Dictionary to access key lookups. The benefits of having it in this package are that the lookups are the live lookups on the website and will not need to be maintained. This package was commissioned by the NHS-R community <https://nhsrcommunity.com/> to provide this consistency of lookups. The OpenSafely lookups have now been added <https://www.opencodelists.org/docs/>.
Maintained by Gary Hutson. Last updated 4 years ago.
5.1 match 2.00 score
paithiov909
aznyan:An 'Utanet' Scraper and Utilities
Scrape lyrics from 'Utanet' website.
Maintained by Akiru Kato. Last updated 10 months ago.
4.5 match 2.00 score 1 scripts
kumes
chatAI4R:Chat-Based Interactive Artificial Intelligence for R
The Large Language Model (LLM) represents a groundbreaking advancement in data science and programming, and also allows us to extend the world of R. A seamless interface for integrating the 'OpenAI' Web APIs into R is provided in this package. This package leverages LLM-based AI techniques, enabling efficient knowledge discovery and data analysis (see 'OpenAI' Web APIs details <https://openai.com/blog/openai-api>). The previous functions such as seamless translation and image generation have been moved to other packages 'deepRstudio' and 'stableDiffusion4R'.
Maintained by Satoshi Kume. Last updated 1 month ago.
aibioinformaticschatgptgptimageimage-generation
1.8 match 14 stars 4.45 score 3 scripts
globeandmail
upstartr:Utilities Powering the Globe and Mail's Data Journalism Template
Core functions necessary for using The Globe and Mail's R data journalism template, 'startr', along with utilities for day-to-day data journalism tasks, such as reading and writing files, producing graphics and cleaning up datasets.
Maintained by Tom Cardoso. Last updated 1 year ago.
datadata-analysisdata-journalismdata-visualizationjournalismnews
1.8 match 6 stars 4.14 score 46 scripts
scholaempirica
reschola:The Schola Empirica Package
A collection of utilities, themes and templates for data analysis at Schola Empirica.
Maintained by Jan Netík. Last updated 5 months ago.
1.5 match 4 stars 4.83 score 14 scripts
edonnachie
ICD10gm:Metadata Processing for the German Modification of the ICD-10 Coding System
Provides convenient access to the German modification of the International Classification of Diagnoses, 10th revision (ICD-10-GM). It provides functionality to aid in the identification, specification and historisation of ICD-10 codes. Its intended use is the analysis of routinely collected data in the context of epidemiology, medical research and health services research. The underlying metadata are released by the German Institute for Medical Documentation and Information <https://www.dimdi.de>, and are redistributed in accordance with their license.
Maintained by Ewan Donnachie. Last updated 1 year ago.
bfarmcharlsoncomorbiditiesdiagnosesdimdiicd-10metadataroutinedatenversorgungsforschung
1.3 match 10 stars 5.30 score 20 scripts
bioc
fobitools:Tools for Manipulating the FOBI Ontology
A set of tools for interacting with the Food-Biomarker Ontology (FOBI). A collection of basic manipulation tools for biological significance analysis, graphs, and text mining strategies for annotating nutritional data.
Maintained by Pol Castellano-Escuder. Last updated 4 months ago.
massspectrometrymetabolomicssoftwarevisualizationbiomedicalinformaticsgraphandnetworkannotationcheminformaticspathwaysgenesetenrichmentbiological-intrerpretationbiological-knowledgebiological-significance-analysisenrichment-analysisfood-biomarker-ontologyknowledge-graphnutritionobofoundryontologytext-mining
1.2 match 1 stars 5.08 score 5 scripts
p0bs
DataKindR:Provides Helper Functions for DataKind Volunteers
DataKind volunteers often need access to specific data or techniques in a DataDive or other project. This package seeks to simplify access to these resources.
Maintained by Robin Penfold. Last updated 9 months ago.
1.8 match 3.18 score 3 scripts
jboelaert
scraEP:Scrape the Web with Extra Power
Tools for scraping information from webpages and other XML contents, using XPath or CSS selectors.
Maintained by Julien Boelaert. Last updated 4 years ago.
5.5 match 1.00 score 3 scripts
gunratan
edgar:Tool for the U.S. SEC EDGAR Retrieval and Parsing of Corporate Filings
In the USA, companies file different forms with the U.S. Securities and Exchange Commission (SEC) through EDGAR (Electronic Data Gathering, Analysis, and Retrieval system). The EDGAR database automated system collects all the different necessary filings and makes them publicly available. This package facilitates retrieving, storing, searching, and parsing of all the available filings on the EDGAR server. It downloads filings from the SEC server in bulk with a single query. Additionally, it provides various useful functions: extracts 8-K triggering events, extracts "Business (Item 1)" and "Management's Discussion and Analysis (Item 7)" sections of annual statements, searches filings for desired keywords, provides sentiment measures, parses filing header information, and provides an HTML view of SEC filings.
Maintained by Gunratan Lonare. Last updated 9 days ago.
1.8 match 10 stars 2.79 score 61 scripts
gojiplus
tuber:Client for the YouTube API
Get comments posted on YouTube videos, information on how many times a video has been liked, search for videos with particular content, and much more. You can also scrape captions from a few videos. To learn more about the YouTube API, see <https://developers.google.com/youtube/v3/>.
Maintained by Gaurav Sood. Last updated 8 months ago.
access-youtubecaptionvideoyoutubeyoutube-apiyoutube-oauth
0.5 match 184 stars 8.99 score 206 scripts
yonicd
sinew:Package Development Documentation and Namespace Management
Manage package documentation and namespaces from the command line. Programmatically attaches namespaces in R and Rmd scripts, populates Roxygen2 skeletons with information scraped from within functions, and populates the Imports field of the DESCRIPTION file.
Maintained by Jonathan Sidi. Last updated 1 year ago.
0.5 match 166 stars 8.54 score 88 scripts
thinkr-open
cranology:The CRAN Chronology
Scraping routines and datasets to monitor the evolution of the number of packages on CRAN.
Maintained by Antoine Languillaume. Last updated 6 months ago.
2.4 match 1 stars 1.70 score 5 scripts
keberwein
blscrapeR:An API Wrapper for the United States Bureau of Labor Statistics
Scrapes various data from <https://www.bls.gov/>. The Bureau of Labor Statistics is the statistical branch of the United States Department of Labor. The package has additional functions to help parse, analyze and visualize the data.
Maintained by Kris Eberwein. Last updated 1 year ago.
apiapi-wrapperblsbureau-of-labor-statisticsconsumer-price-indexcpiinflationinflation-calculatorlabor-statisticsunemployment
0.5 match 112 stars 7.66 score 270 scripts
sstoeckl
crypto2:Download Crypto Currency Data from 'CoinMarketCap' without 'API'
Retrieves crypto currency information and historical prices as well as information on the exchanges they are listed on. Historical data contains daily open, high, low and close values for all crypto currencies. All data is scraped from <https://coinmarketcap.com> via their 'web-api'.
Maintained by Sebastian Stoeckl. Last updated 7 days ago.
0.5 match 56 stars 7.33 score 60 scripts 1 dependents
salimk
Rcrawler:Web Crawler and Scraper
Performs parallel web crawling and web scraping. It is designed to crawl, parse and store web pages to produce data that can be directly used for analysis application. For details see Khalil and Fakir (2017) <DOI:10.1016/j.softx.2017.04.004>.
Maintained by Salim Khalil. Last updated 5 years ago.
crawlercrawlersscraperwebcrawlerwebscraperwebscrapingwebscrapping
0.5 match 354 stars 6.89 score 110 scripts
sewardlee337
finreportr:Financial Data from U.S. Securities and Exchange Commission
Download and display company financial data from the U.S. Securities and Exchange Commission's EDGAR database. It contains a suite of functions with web scraping and XBRL parsing capabilities that allows users to extract data from EDGAR in an automated and scalable manner. See <https://www.sec.gov/edgar/searchedgar/companysearch.html> for more information.
Maintained by Seward Lee. Last updated 3 years ago.
balance-sheetcash-flowfinancefinancial-datafinancial-statementfinancial-statementsincome-statementsecstock-ticker-symbol
0.5 match 131 stars 6.28 score 29 scripts
cran
WhatsR:Parsing, Anonymizing and Visualizing Exported 'WhatsApp' Chat Logs
Imports 'WhatsApp' chat logs and parses them into a usable dataframe object. The parser works on chats exported from Android or iOS phones and on Linux, macOS and Windows. The parser has multiple options for extracting smileys and emojis from the messages, extracting URLs and domains from the messages, extracting names and types of sent media files from the messages, extracting timestamps from messages, extracting and anonymizing author names from messages. Can be used to create anonymized versions of data.
Maintained by Julian Kohne. Last updated 1 year ago.
1.7 match 1.70 score 3 scripts
elipousson
esri2sf:Create Simple Features from ArcGIS Server REST API
This package enables you to scrape geographic features directly from the ArcGIS Server REST API into R as simple features.
Maintained by Eli Pousson. Last updated 4 months ago.
0.5 match 7 stars 5.28 score 50 scripts 1 dependents
ropensci
epair:EPA Data Helper for R
Aid the user in making queries to the EPA API site found at https://aqs.epa.gov/aqsweb/documents/data_api. This package combines API calling methods from various web scraping packages with specific strings to retrieve data from the EPA API. It also contains easy to use loaded variables that help a user navigate services offered by the API and aid the user in determining the appropriate way to make an API call.
Maintained by G.L. Orozco-Mulfinger. Last updated 3 years ago.
0.5 match 7 stars 4.89 score 11 scripts
giocomai
castarter:Content Analysis Starter Toolkit
Consistent approaches for basic web scraping, text mining and word frequency analysis of textual datasets.
Maintained by Giorgio Comai. Last updated 2 days ago.
0.5 match 3 stars 4.52 score 2 scripts
fatelarico
FinNet:Quickly Build and Manipulate Financial Networks
Provides classes, methods, and functions to deal with financial networks. Users can easily store information about both physical and legal persons by using pre-made classes that are designed for integration with scraping packages such as 'rvest' and 'RSelenium'. Moreover, the package assists in creating various types of financial networks depending on the relation under scrutiny (ownership, board interlocks, etc.) and the desired tie type (valued or binary), and renders them in the most common formats (adjacency matrix, incidence matrix, edge list, 'igraph', 'network'). There are also ad-hoc functions for the Fiedler value, global network efficiency, and cascade-failure analysis.
Maintained by Fabio Ashtar Telarico. Last updated 5 months ago.
0.5 match 2 stars 4.78 score 7 scripts
conjugateprior
twfy:Drive the API for TheyWorkForYou
An R wrapper around the API of TheyWorkForYou, a parliamentary monitoring site that scrapes and repackages Hansard (the UK's parliamentary record) and augments it with information from the Register of Members' Interests, election results, and voting records to provide a unified source of information about UK legislators and their activities. See <http://www.theyworkforyou.com> for details.
Maintained by Will Lowe. Last updated 6 years ago.
0.5 match 9 stars 4.65 score 3 scripts
shabbychef
cocktailApp:'shiny' App to Discover Cocktails
A 'shiny' app to discover cocktails. The app allows one to search for cocktails by ingredient, filter on rating, and number of ingredients. The package also contains data with the ingredients of nearly 26 thousand cocktails scraped from the web.
Maintained by Steven E. Pav. Last updated 3 years ago.
0.5 match 43 stars 4.33 score 5 scripts
ashbaldry
appler:'Apple App Store' and 'iTunes' Data Extraction
Using 'Apple App Store' <https://www.apple.com/app-store/> web scraping and 'iTunes' API <https://performance-partners.apple.com/search-api> to extract content information, app ratings and reviews.
Maintained by Ashley Baldry. Last updated 2 years ago.
0.5 match 18 stars 4.13 score 15 scripts
eflores89
banxicoR:Download Data from the Bank of Mexico
Provides functions to scrape IQY calls to the Bank of Mexico, downloading and ordering the data conveniently.
Maintained by Eduardo Flores. Last updated 7 years ago.
0.5 match 11 stars 3.74 score 7 scripts
muschellij2
gcite:Google Citation Parser
Scrapes Google Citation pages and creates data frames of citations over time.
Maintained by John Muschelli. Last updated 3 years ago.
0.6 match 3 stars 3.67 score 31 scripts
lcef97
SchoolDataIT:Retrieve, Harmonise and Map Open Data Regarding the Italian School System
Compiles and displays the available data sets regarding the Italian school system, with a focus on the infrastructural aspects. Input datasets are downloaded from the web, with the aim of keeping everything up to date in real time. The functions are divided into four main modules: 'Get', to scrape raw data from the web; 'Util', various utilities needed to process raw data; 'Group', to aggregate data at the municipality or province level; and 'Map', to visualize the output datasets.
Maintained by Leonardo Cefalo. Last updated 2 months ago.
0.5 match 3.88 score
ktemadarko
rGhanaCensus:2021 Ghana Population and Housing Census Results as Data Frames
Datasets from the 2021 Ghana Population and Housing Census Results. Users can access results as 'tidyverse' and 'sf'-ready data frames. The data in this package is scraped from PDF reports released on the Ghana Statistical Service website <https://census2021.statsghana.gov.gh/>. The package currently only contains datasets from the literacy and education reports, namely school attendance data for respondents aged 3 years and above.
Maintained by Ama Owusu-Darko. Last updated 3 years ago.
0.5 match 3.70 score 2 scripts
matt-dray
altcheckr:Assess Image Alt Text on a Web Page
Scrape image element attributes from a webpage, detect alternative (alt) text and assess it with simple heuristics. Alt text is important for users of assistive technologies, like screen readers, for understanding the content of images. This package should be used in conjunction with other accessibility assessment tools for more comprehensive coverage.
Maintained by Matt Dray. Last updated 4 years ago.
accessibility alt-text webscraping
0.5 match 7 stars 3.54 score 6 scripts
aymennasri
ggfootball:Plotting Football matches Expected Goals (xG) Stats with 'Understat' Data
Scrapes football match shots data from 'Understat' <https://understat.com/> and visualizes it using interactive plots: a detailed shot map displaying the location, type, and xG value of shots taken by both teams, and an xG timeline chart showing the cumulative xG for each team over time, annotated with the details of scored goals.
Maintained by Aymen Nasri. Last updated 10 days ago.
football football-analytics football-data football-scores soccer soccer-analytics soccer-data sports-data understat
0.5 match 1 stars 3.54 score 10 scripts
schochastics
webbotparseR:Parse html files containing search engine results
Parse search engine results which have been scraped with the 'WebBot' browser extension <https://github.com/gesiscss/WebBot>.
Maintained by David Schoch. Last updated 4 months ago.
browser-extension search-engine
0.5 match 8 stars 3.38 score 6 scripts
mayamathur
MetaUtility:Utility Functions for Conducting and Interpreting Meta-Analyses
Contains functions to estimate the proportion of effects stronger than a threshold of scientific importance (function prop_stronger), to nonparametrically characterize the distribution of effects in a meta-analysis (calib_ests, pct_pval), to make effect size conversions (r_to_d, r_to_z, z_to_r, d_to_logRR), to compute and format inference in a meta-analysis (format_CI, format_stat, tau_CI), to scrape results from existing meta-analyses for re-analysis (scrape_meta, parse_CI_string, ci_to_var).
Maintained by Maya B. Mathur. Last updated 3 years ago.
0.5 match 3.40 score 21 scripts 2 dependents
chris-dworschak
disastr.api:Wrapper for the UN OCHA ReliefWeb Disaster Events API
Access and manage the application programming interface (API) of the United Nations Office for the Coordination of Humanitarian Affairs' (OCHA) ReliefWeb disaster events at <https://reliefweb.int/disasters>. The package requires a minimal number of dependencies. It offers functionality to retrieve a user-defined sample of disaster events from ReliefWeb, providing an easy alternative to scraping the ReliefWeb website. It enables seamless integration of regular data updates into the research workflow.
Maintained by Christoph Dworschak. Last updated 11 months ago.
api-wrapper disaster-events ocha reliefweb
0.5 match 3 stars 3.18 score 6 scripts
amalan-constat
SLPresElection:Presidential Election Data of "Sri Lanka" from 1982 to 2015
Presidential election data of "Sri Lanka" is stored in PDF files; through PDF scraping, the data are converted into data frames and stored in this R package.
Maintained by Amalan Mahendran. Last updated 5 months ago.
presidential-election sri-lanka
0.5 match 1 stars 3.00 score 4 scripts
pilacuan-bonete-luis
LDABiplots:Biplot Graphical Interface for LDA Models
Provides a tool with a web-based graphical user interface (GUI) for producing Biplot representations from news scraped from digital newspapers, using the Bayesian approach of Latent Dirichlet Allocation (LDA) and machine learning algorithms. Contains LDA methods described by Blei, David M., Andrew Y. Ng and Michael I. Jordan (2003) <https://jmlr.org/papers/volume3/blei03a/blei03a.pdf>, and Biplot methods described by Gabriel K.R. (1971) <doi:10.1093/biomet/58.3.453> and Galindo-Villardon P. (1986) <https://diarium.usal.es/pgalindo/files/2012/07/Questiio.pdf>.
Maintained by Luis Pilacuan-Bonete. Last updated 3 years ago.
0.5 match 3.00 score 4 scripts
kvasilopoulos
ihpdr:Download Data from the International House Price Database
Scrapes <https://www.dallasfed.org> for up-to-date data on international house prices and exuberance indicators, and downloads the data in tidy format.
Maintained by Kostas Vasilopoulos. Last updated 4 years ago.
0.5 match 2.70 score 9 scripts
amalan-constat
SouthParkRshiny:Data and 'Shiny' Application for the Show 'SouthPark'
Ratings, votes, swear words and sentiments are analysed for the show 'SouthPark' through a 'Shiny' application after web scraping from 'IMDB' and the website <https://southpark.fandom.com/wiki/South_Park_Archives>.
Maintained by Amalan Mahendran. Last updated 1 year ago.
0.5 match 1 stars 2.70 score
bentaylor1
miscFuncs:Miscellaneous Useful Functions Including LaTeX Tables, Kalman Filtering, QQplots with Simulation-Based Confidence Intervals, Linear Regression Diagnostics and Development Tools
Implements various utilities, including functions for LaTeX tables, the Kalman filter, QQ-plots with simulation-based confidence intervals, linear regression diagnostics, web scraping, development tools, relative risk and odds ratio, and GARCH(1,1) forecasting.
Maintained by Benjamin M. Taylor. Last updated 4 months ago.
0.5 match 2.48 score 8 scripts
marchionnilab
covid19census:Extracts Covid-19 and other demographic metrics regarding U.S.A and Italy
Package with functions to scrape data regarding the COVID-19 epidemic in the U.S.A. and Italy, as well as datasets with related indexes.
Maintained by claudio_zanettini. Last updated 4 years ago.
0.5 match 2 stars 2.00 score 5 scripts
sebkrantz
samadb:South Africa Macroeconomic Database API
An R API providing access to a relational database with macroeconomic time series data for South Africa, obtained from the South African Reserve Bank (SARB) and Statistics South Africa (STATSSA), and updated on a weekly basis via the EconData <https://www.econdata.co.za/> platform and automated scraping of the SARB and STATSSA websites. The database is maintained at the Department of Economics at Stellenbosch University.
Maintained by Sebastian Krantz. Last updated 10 months ago.
0.5 match 1.00 score 2 scripts