Showing 7 of 7 results
salimk
Rcrawler: Web Crawler and Scraper
Performs parallel web crawling and web scraping. It is designed to crawl, parse and store web pages to produce data that can be used directly in analysis applications. For details, see Khalil and Fakir (2017) <DOI:10.1016/j.softx.2017.04.004>.
Maintained by Salim Khalil. Last updated 5 years ago.
crawler, crawlers, scraper, webcrawler, webscraper, webscraping, webscrapping
24.3 match 354 stars 6.89 score 110 scripts
ropensci
robotstxt: A 'robots.txt' Parser and 'Webbot'/'Spider'/'Crawler' Permissions Checker
Provides functions to download and parse 'robots.txt' files. Ultimately, the package makes it easy to check whether bots (spiders, crawlers, scrapers, ...) are allowed to access specific resources on a domain.
Maintained by Jordan Bradford. Last updated 4 months ago.
crawler, peer-reviewed, robotstxt, scraper, spider, webscraping
14.3 match 68 stars 10.43 score 414 scripts 7 dependents
dmi3kno
polite: Be Nice on the Web
Be responsible when scraping data from websites by following polite principles: introduce yourself, ask for permission, take slowly and never ask twice.
Maintained by Dmytro Perepolkin. Last updated 2 years ago.
crawler, memoise, rate-limiter, robotstxt, rvest, scraper, webscraping
11.0 match 327 stars 8.98 score 596 scripts 5 dependents
forkonlp
N2H4: Handling Methods for Naver News Text Crawling
Provides functions to get Korean text samples from news articles on Naver <https://news.naver.com/>, a popular news portal service in Korea.
Maintained by Chanyub Park. Last updated 1 year ago.
crawler, crawling, getcomments, hacktoberfest, hacktoberfest2021, korean, naver, news, sort
11.0 match 216 stars 6.11 score 20 scripts
bigelowlab
thredds: Crawler for Navigating THREDDS Catalogs
Provides a crawler for programmatically navigating THREDDS Data Server (<https://www.unidata.ucar.edu/software/tds/>) catalogs and accessing dataset metadata and resources.
Maintained by Emmanuel Blondel. Last updated 17 days ago.
5.5 match 6 stars 4.00 score 33 scripts
vosonlab
vosonSML: Collecting Social Media Data and Generating Networks for Analysis
A suite of easy-to-use functions for collecting social media data and generating networks for analysis. Supports Mastodon, YouTube, Reddit and Web 1.0 data sources.
Maintained by Bryan Gertzel. Last updated 8 months ago.
hyperlink, mastodon, network-graph, reddit, sna, social-media, social-network-analysis, voson, youtube
2.0 match 79 stars 7.67 score 66 scripts 1 dependent
gastonbecerra
ojsr: Crawler and Data Scraper for Open Journal System ('OJS')
Crawler for 'OJS' pages and scraper for article metadata. You can crawl 'OJS' archives, issues, articles, galleys, and search results. You can scrape article metadata from the HTML head tag, or from Open Archives Initiative ('OAI') records. Most of these functions rely on 'OJS' routing conventions (<https://docs.pkp.sfu.ca/dev/documentation/en/architecture-routes>).
Maintained by Gaston Becerra. Last updated 4 months ago.
3.4 match 3 stars 4.35 score 15 scripts