R-universe search: crawlers

Showing 7 of 7 results

salimk

Rcrawler:Web Crawler and Scraper

Performs parallel web crawling and web scraping. It is designed to crawl, parse, and store web pages to produce data that can be used directly in analysis applications. For details see Khalil and Fakir (2017) <DOI:10.1016/j.softx.2017.04.004>.

Maintained by Salim Khalil. Last updated 5 years ago.

crawler crawlers scraper webcrawler webscraper webscraping webscrapping

24.3 match 354 stars 6.89 score 110 scripts

ropensci

robotstxt:A 'robots.txt' Parser and 'Webbot'/'Spider'/'Crawler' Permissions Checker

Provides functions to download and parse 'robots.txt' files. Ultimately the package makes it easy to check whether bots (spiders, crawlers, scrapers, ...) are allowed to access specific resources on a domain.

Maintained by Jordan Bradford. Last updated 4 months ago.

crawler peer-reviewed robotstxt scraper spider webscraping

14.3 match 68 stars 10.43 score 414 scripts 7 dependents

dmi3kno

polite:Be Nice on the Web

Be responsible when scraping data from websites by following polite principles: introduce yourself, ask for permission, take it slowly, and never ask twice.

Maintained by Dmytro Perepolkin. Last updated 2 years ago.

crawler memoise rate-limiter robotstxt rvest scraper webscraping

11.0 match 327 stars 8.98 score 596 scripts 5 dependents

forkonlp

N2H4:Handling Methods for Naver News Text Crawling

Provides functions to get Korean text samples from news articles on Naver, a popular news portal service in Korea (<https://news.naver.com/>).

Maintained by Chanyub Park. Last updated 1 year ago.

crawler crawling getcomments hacktoberfest hacktoberfest2021 korean naver news sort

11.0 match 216 stars 6.11 score 20 scripts

bigelowlab

thredds:Crawler for Navigating THREDDS Catalogs

Provides a crawler for programmatically navigating THREDDS Data Server (<https://www.unidata.ucar.edu/software/tds/>) catalogs and accessing dataset metadata and resources.

Maintained by Emmanuel Blondel. Last updated 17 days ago.

5.5 match 6 stars 4.00 score 33 scripts

vosonlab

vosonSML:Collecting Social Media Data and Generating Networks for Analysis

A suite of easy-to-use functions for collecting social media data and generating networks for analysis. Supports Mastodon, YouTube, Reddit, and Web 1.0 data sources.

Maintained by Bryan Gertzel. Last updated 8 months ago.

hyperlink mastodon network-graph reddit sna social-media social-network-analysis voson youtube

2.0 match 79 stars 7.67 score 66 scripts 1 dependent

gastonbecerra

ojsr:Crawler and Data Scraper for Open Journal System ('OJS')

Crawler for 'OJS' pages and scraper for article metadata. You can crawl 'OJS' archives, issues, articles, galleys, and search results. You can scrape article metadata from the head tag of their HTML, or from Open Archives Initiative ('OAI') records. Most of these functions rely on 'OJS' routing conventions (<https://docs.pkp.sfu.ca/dev/documentation/en/architecture-routes>).

Maintained by Gaston Becerra. Last updated 4 months ago.

oai-pmh ojs scraper web-scraping

3.4 match 3 stars 4.35 score 15 scripts