Scraper
Scraper is a modular data extraction framework that fetches pages, respects robots.txt and rate limits, and extracts structured data using configurable selectors. It issues HTTP requests with retry and backoff, supports asynchronous crawling, and includes parsers for HTML, JSON, and XML. Pipelines normalize and deduplicate the extracted data, then store it as JSON, CSV, or in a database. Adapters encapsulate site-specific logic, user agents, and proxy management, and error handling, logging, and test fixtures are built in. Pagination, scheduling, and incremental updates are also supported. The design prioritizes testability and extensibility.
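
To make two of the behaviors described above more concrete (respecting robots.txt, and retrying requests with backoff), here is a minimal Python sketch using only the standard library. It is an illustration under stated assumptions, not Scraper's actual API: the function names `allowed_by_robots` and `fetch_with_retry`, their parameters, and the choice of exponential backoff are all hypothetical.

```python
import time
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url.

    Hypothetical helper: takes the robots.txt body as a string so the
    check can run without network access.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def fetch_with_retry(fetch, url, retries=3, base_delay=0.5, sleep=time.sleep):
    """Call fetch(url), retrying on any exception with exponential backoff.

    Hypothetical helper: `fetch` is any callable that returns a response
    or raises. The wait before retry number n (0-based) is
    base_delay * 2**n. After `retries` failed retries the last error
    is re-raised.
    """
    for attempt in range(retries + 1):
        try:
            return fetch(url)
        except Exception:
            if attempt == retries:
                raise
            sleep(base_delay * 2 ** attempt)
```

Passing `fetch` and `sleep` in as arguments is one way to keep such a helper testable: a test can supply a fake fetcher and record the delays instead of actually waiting.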