Scraper

Scraper is a modular data extraction framework that fetches pages, respects robots.txt and rate limits, and extracts structured data using configurable selectors. It performs HTTP requests with retry and backoff, supports asynchronous crawling, and includes parsers for HTML, JSON, and XML. Pipelines handle data normalization, deduplication, and storage to JSON, CSV, or databases. Site-specific logic, user-agent handling, and proxy management are encapsulated in adapters, and the framework provides built-in error handling, logging, and test fixtures. Further features include pagination, scheduling, and incremental updates; throughout, the design prioritizes testability and extensibility.
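
As a rough illustration of how the fetch and pipeline steps described above might fit together, the following Python sketch uses only the standard library. All names here (RateLimiter, is_allowed, fetch_with_retry, normalize, dedupe, to_csv) are hypothetical and are not Scraper's actual API. The fetch function, clock, and sleep are passed in as parameters (an assumption made for this sketch) so each step can be exercised without network access.

```python
import csv
import io
import random
import time
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser


class RateLimiter:
    """Hypothetical helper: enforce a minimum interval between requests to one host."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self.clock = clock
        self.sleep = sleep
        self._last = {}  # host -> time of the most recent request

    def wait(self, url):
        host = urlparse(url).netloc
        now = self.clock()
        last = self._last.get(host)
        if last is not None and now - last < self.min_interval:
            # Sleep only for the part of the interval that has not yet elapsed.
            delay = self.min_interval - (now - last)
            self.sleep(delay)
            now += delay
        self._last[host] = now


def is_allowed(robots_txt, user_agent, url):
    """Check a URL against already-downloaded robots.txt text (stdlib parser)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def fetch_with_retry(fetch, url, retries=3, base_delay=0.5, sleep=time.sleep):
    """Call fetch(url), retrying on OSError with exponential backoff plus jitter.

    The wait before retry n (0-based) is base_delay * 2**n plus up to 50% random jitter.
    """
    for attempt in range(retries + 1):
        try:
            return fetch(url)
        except OSError:  # covers URLError, ConnectionError, TimeoutError
            if attempt == retries:
                raise  # out of retries: surface the last error
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))


def normalize(item):
    """Pipeline stage: strip surrounding whitespace from string fields."""
    return {k: v.strip() if isinstance(v, str) else v for k, v in item.items()}


def dedupe(items, key_fields):
    """Pipeline stage: drop items whose key fields match an earlier item."""
    seen = set()
    for item in items:
        key = tuple(item.get(f) for f in key_fields)
        if key not in seen:
            seen.add(key)
            yield item


def to_csv(items, fieldnames):
    """Storage stage: serialize items to CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(items)
    return buf.getvalue()
```

Passing fetch and sleep in as arguments keeps every stage testable offline. A real crawler would supply a function that performs the HTTP request (for example one built on urllib.request.urlopen) and would cache the parsed robots.txt per host instead of reparsing it for each URL.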
