Getting Started
Prerequisites
- Python 3.8+
- Git
- pip (Python package manager)
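The prerequisites above can be checked from Python before installing anything. This is a minimal sketch, not part of the project: the function name `check_prerequisites` is hypothetical, and it only tests that the interpreter is new enough, that `git` is on PATH, and that `pip` is importable.

```python
import importlib.util
import shutil
import sys


def check_prerequisites(min_python=(3, 8)):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    # Compare the running interpreter against the minimum version.
    if sys.version_info < min_python:
        problems.append("Python %d.%d+ is required" % min_python)
    # shutil.which returns None when the executable is not on PATH.
    if shutil.which("git") is None:
        problems.append("git was not found on PATH")
    # find_spec returns None when the module cannot be imported.
    if importlib.util.find_spec("pip") is None:
        problems.append("pip is not importable from this interpreter")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("Missing:", problem)
```

If the script prints nothing, all three prerequisites were found.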
Installation
1. Clone the repository
2. Install dependencies
3. Install Playwright browsers
Configuration
Edit the configuration in main.py:
Windows users: escape backslashes in path strings, for example C:\\output\\cache.db
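The doubled backslashes matter because main.py is Python source, where a single backslash starts an escape sequence. The sketch below shows three equivalent ways to write the same Windows path. The variable names are hypothetical and may not match the actual settings in main.py.

```python
from pathlib import PureWindowsPath

# Hypothetical setting names; the real names in main.py may differ.
CACHE_DB_ESCAPED = "C:\\output\\cache.db"  # doubled backslashes
CACHE_DB_RAW = r"C:\output\cache.db"       # raw string, no escaping needed
CACHE_DB_FORWARD = "C:/output/cache.db"    # forward slashes also work on Windows

# The first two spellings are the same string.
same_string = CACHE_DB_ESCAPED == CACHE_DB_RAW

# All three spellings name the same path.
same_path = PureWindowsPath(CACHE_DB_FORWARD) == PureWindowsPath(CACHE_DB_RAW)

# Why escaping matters: "\t" in a normal string is a tab character,
# so an unescaped path silently becomes one character shorter.
broken_length = len("C:\temp")
intact_length = len(r"C:\temp")
```

A raw string or forward slashes are usually the least error-prone choice.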
Usage
Basic scraping
A basic run will:
- Crawl listing pages to collect lot URLs
- Scrape each individual lot page
- Save results in JSON and CSV formats
- Cache all pages for future runs
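The caching step above can be illustrated with a small SQLite-backed page cache. This is a sketch under assumptions, not the project's actual code: the `pages` table, its columns, and the `open_cache` and `get_page` helpers are all hypothetical, and `fake_fetch` stands in for whatever real download function the scraper uses.

```python
import sqlite3


def open_cache(path=":memory:"):
    """Open (or create) a cache database with a hypothetical one-table schema."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pages (url TEXT PRIMARY KEY, html TEXT NOT NULL)"
    )
    return conn


def get_page(conn, url, fetch):
    """Return cached HTML for url, calling fetch(url) only on a cache miss."""
    row = conn.execute("SELECT html FROM pages WHERE url = ?", (url,)).fetchone()
    if row is not None:
        return row[0]
    html = fetch(url)
    conn.execute(
        "INSERT OR REPLACE INTO pages (url, html) VALUES (?, ?)", (url, html)
    )
    conn.commit()
    return html


# Usage: the second request for the same URL is served from the cache.
calls = []


def fake_fetch(url):
    calls.append(url)
    return "<html>lot</html>"


conn = open_cache()
first = get_page(conn, "https://example.com/lot/1", fake_fetch)
second = get_page(conn, "https://example.com/lot/1", fake_fetch)
print(len(calls))  # prints 1
```

Passing a file path such as cache.db instead of `:memory:` would make the cache persist between runs.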
Test mode
Debug extraction on a specific URL:
Output
The scraper generates:
- troostwijk_lots_final_YYYYMMDD_HHMMSS.json: complete data
- troostwijk_lots_final_YYYYMMDD_HHMMSS.csv: CSV export
- cache.db: SQLite cache (persistent)
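The YYYYMMDD_HHMMSS part of the file names corresponds to the `strftime` pattern `%Y%m%d_%H%M%S`. The sketch below shows how such names could be built and how a list of lots could be written in both formats. The helpers `output_names` and `save_results` are hypothetical, and the field names in a lot dictionary are left to the caller. The scraper's own export code may differ.

```python
import csv
import json
from datetime import datetime
from pathlib import Path


def output_names(now):
    """Build the JSON and CSV file names for a given timestamp."""
    stamp = now.strftime("%Y%m%d_%H%M%S")
    base = f"troostwijk_lots_final_{stamp}"
    return base + ".json", base + ".csv"


def save_results(lots, out_dir=".", now=None):
    """Write lots (a list of dicts) to timestamped JSON and CSV files."""
    json_name, csv_name = output_names(now or datetime.now())
    json_path = Path(out_dir) / json_name
    csv_path = Path(out_dir) / csv_name

    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(lots, f, ensure_ascii=False, indent=2)

    # Collect column names in first-seen order across all lots.
    fieldnames = []
    for lot in lots:
        for key in lot:
            if key not in fieldnames:
                fieldnames.append(key)

    # newline="" lets the csv module control line endings itself.
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(lots)

    return json_path, csv_path
```

Because the timestamp is part of the name, each run produces a new pair of files instead of overwriting the previous ones.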