_internal_db

2025-12-10 08:04:04 +01:00 · 2025-12-10 07:54:12 +01:00 · 2025-12-09 23:39:38 +01:00 · 2025-12-09 23:30:24 +01:00 · 2025-12-09 22:56:10 +01:00 · 2025-12-09 20:53:54 +01:00
30 changed files with 1428 additions and 3841 deletions
--- a/.aiignore
+++ b/.aiignore
@@ -10,3 +10,16 @@
 dist/
 build/
 out/
+# An .aiignore file follows the same syntax as a .gitignore file.
+# .gitignore documentation: https://git-scm.com/docs/gitignore
+
+# you can ignore files
+# or folders
+.idea
+node_modules/
+.vscode/
+.git
+.github
+scripts
+.pytest_cache/
+__pycache__
--- a/README.md
+++ b/README.md
@@ -1,85 +1,177 @@
-# Setup & IDE Configuration
+# Python Setup & IDE Guide

-##  Python Version Requirement
+Short, clear, Python‑focused.

-This project **requires Python 3.10 or higher**.
+---

-The code uses Python 3.10+ features including:
-- Structural pattern matching
-- Union type syntax (`X | Y`)
-- Improved type hints
-- Modern async/await patterns
+## Requirements

-## IDE Configuration
+- **Python 3.10+**  
+Uses structural pattern matching, `X | Y` union types, modern type hints and async/await patterns.

-### PyCharm / IntelliJ IDEA
-
-If your IDE shows "Python 2.7 syntax" warnings, configure it for Python 3.10+:
-
-1. **File → Project Structure → Project Settings → Project**
-   - Set Python SDK to 3.10 or higher
-
-2. **File → Settings → Project → Python Interpreter**
-   - Select Python 3.10+ interpreter
-   - Click gear icon → Add → System Interpreter → Browse to your Python 3.10 installation
-
-3. **File → Settings → Editor → Inspections → Python**
-   - Ensure "Python version" is set to 3.10+
-   - Check "Code compatibility inspection" → Set minimum version to 3.10
-
-### VS Code
-
-Add to `.vscode/settings.json`:
-```json
-{
-    "python.pythonPath": "path/to/python3.10",
-    "python.analysis.typeCheckingMode": "basic",
-    "python.languageServer": "Pylance"
-}
+```bash
+python --version
 ```

+---
+
+## IDE Setup (PyCharm / IntelliJ)
+
+1. **Set interpreter:**  
+   *File → Settings → Project → Python Interpreter → Select Python 3.10+*
+
+2. **Fix syntax warnings:**  
+   *File → Settings → Editor → Inspections → Python → Code compatibility inspection → check Python 3.10+*
+
+3. **Ensure correct SDK:**  
+   *File → Project Structure → Project SDK → Python 3.10+*
+
+---
+
 ## Installation

 ```bash
-# Check Python version
-python --version  # Should be 3.10+
+# Activate venv (Windows PowerShell; on Linux/macOS: source ~/venvs/scaev/bin/activate)
+~\venvs\scaev\Scripts\Activate.ps1

-# Install dependencies
+# Install deps
 pip install -r requirements.txt

-# Install Playwright browsers
+# Playwright browsers
 playwright install chromium
 ```

-## Verifying Setup
+---
+
+## Database Configuration (PostgreSQL)
+
+The scraper now uses PostgreSQL (no more SQLite files). Configure via `DATABASE_URL`:
+
+- Default (baked in):
+  `postgresql://auction:heel-goed-wachtwoord@192.168.1.159:5432/auctiondb`
+- Override for your environment:

 ```bash
-# Should print version 3.10.x or higher
-python -c "import sys; print(sys.version)"
+# Windows PowerShell
+$env:DATABASE_URL = "postgresql://user:pass@host:5432/dbname"

-# Should run without errors
+# Linux/macOS
+export DATABASE_URL="postgresql://user:pass@host:5432/dbname"
+```
+
+Packages used:
+- Driver: `psycopg[binary]`
+
+Nothing is written to local `.db` files anymore.
+
+---
+
+## Verify
+
+```bash
+python -c "import sys; print(sys.version)"
 python main.py --help
 ```

-## Common Issues
+Common fix for `ModuleNotFoundError: No module named 'playwright'`:

-### "ModuleNotFoundError: No module named 'playwright'"
 ```bash
 pip install playwright
 playwright install chromium
 ```

-### "Python 2.7 does not support..." warnings in IDE
-- Your IDE is configured for Python 2.7
-- Follow IDE configuration steps above
-- The code WILL work with Python 3.10+ despite warnings
+---

-### Script exits with "requires Python 3.10 or higher"
-- You're running Python 3.9 or older
-- Upgrade to Python 3.10+: https://www.python.org/downloads/
+# Auto‑Start (Monitor)

-## Version Files
+## Linux (systemd) — Recommended

-- `.python-version` - Used by pyenv and similar tools
-- `requirements.txt` - Package dependencies
-- Runtime checks in scripts ensure Python 3.10+
+```bash
+cd ~/scaev
+chmod +x install_service.sh
+./install_service.sh
+```
+
+Service features:
+- Auto‑start
+- Auto‑restart
+- Logs: `~/scaev/logs/monitor.log`
+
+```bash
+sudo systemctl status scaev-monitor
+journalctl -u scaev-monitor -f
+```
+
+---
+
+## Windows (Task Scheduler)
+
+```powershell
+cd C:\vibe\scaev
+.\setup_windows_task.ps1
+```
+
+Manage:
+
+```powershell
+Start-ScheduledTask "ScaevAuctionMonitor"
+```
+
+---
+
+## Cron Alternative (Linux)
+
+```bash
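+crontab -e
+# Add these lines in the editor that opens. The pgrep pattern is anchored so it
+# does not match cron's own "sh -c" command line, which also contains "monitor.py".
+@reboot cd ~/scaev && python3 src/monitor.py 30 >> logs/monitor.log 2>&1
+0 * * * * pgrep -f '^python3 src/monitor\.py' > /dev/null || (cd ~/scaev && python3 src/monitor.py 30 >> logs/monitor.log 2>&1 &)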
+```
+
+---
+
+# Status Checks
+
+```bash
+# Linux
+ps aux | grep monitor.py
+# Windows
+tasklist | findstr python
+```
+
+---
+
+# Troubleshooting
+
+- Wrong interpreter → Set Python 3.10+
+- Multiple monitors running → kill extra processes
+- PostgreSQL connectivity → verify `DATABASE_URL`, network/firewall, and credentials
+- Service fails → check `journalctl -u scaev-monitor`
+
+---
+
+# Java Extractor (Short Version)
+
+Prereqs: **Java 21**, **Maven**
+
+Install:
+
+```bash
+mvn clean install
+mvn exec:java -Dexec.mainClass=com.microsoft.playwright.CLI -Dexec.args="install"
+```
+
+Run:
+
+```bash
+mvn exec:java -Dexec.args="--max-visits 3"
+```
+
+Enable native access (IntelliJ → VM Options):
+
+```
+--enable-native-access=ALL-UNNAMED
+```
+
+---
+
+This file keeps everything compact, Python‑focused, and ready for onboarding.
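The Troubleshooting list above suggests verifying `DATABASE_URL` when PostgreSQL is unreachable. A minimal sketch of such a check, assuming the application reads `DATABASE_URL` as documented: it parses the URL with the standard library and reports host, port, database and user without echoing the password. `describe_database_url` and the placeholder fallback URL are hypothetical, not part of the repository.

```python
import os
from urllib.parse import urlsplit

# Hypothetical placeholder; the real baked-in default lives in the application config.
FALLBACK_URL = "postgresql://user:pass@localhost:5432/auctiondb"


def describe_database_url(env=None):
    """Return (host, port, dbname, user) for DATABASE_URL, never the password."""
    env = os.environ if env is None else env
    url = env.get("DATABASE_URL") or FALLBACK_URL
    parts = urlsplit(url)
    if parts.scheme not in ("postgresql", "postgres"):
        raise ValueError(f"unexpected scheme {parts.scheme!r}, expected postgresql://")
    # PostgreSQL listens on 5432 unless the URL says otherwise.
    return parts.hostname, parts.port or 5432, parts.path.lstrip("/"), parts.username


if __name__ == "__main__":
    host, port, dbname, user = describe_database_url()
    print(f"Target: {user}@{host}:{port}/{dbname}")
```

Printing only the non-secret components keeps the check safe to paste into logs or issue reports.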
--- a/db/migration/V1__initial_schema.sql
+++ b/db/migration/V1__initial_schema.sql
@@ -0,0 +1,139 @@
+-- Auctions
+CREATE TABLE auctions (
+    auction_id TEXT PRIMARY KEY,
+    url TEXT UNIQUE,
+    title TEXT,
+    location TEXT,
+    lots_count INTEGER,
+    first_lot_closing_time TEXT,
+    scraped_at TEXT,
+    city TEXT,
+    country TEXT,
+    type TEXT,
+    lot_count INTEGER DEFAULT 0,
+    closing_time TEXT,
+    discovered_at BIGINT
+);
+
+CREATE INDEX idx_auctions_country ON auctions(country);
+
+-- Cache
+CREATE TABLE cache (
+    url TEXT PRIMARY KEY,
+    content BYTEA,
+    timestamp DOUBLE PRECISION,
+    status_code INTEGER
+);
+
+CREATE INDEX idx_timestamp ON cache(timestamp);
+
+-- Lots
+CREATE TABLE lots (
+    lot_id TEXT PRIMARY KEY,
+    auction_id TEXT REFERENCES auctions(auction_id),
+    url TEXT UNIQUE,
+    title TEXT,
+    current_bid TEXT,
+    bid_count INTEGER,
+    closing_time TEXT,
+    viewing_time TEXT,
+    pickup_date TEXT,
+    location TEXT,
+    description TEXT,
+    category TEXT,
+    scraped_at TEXT,
+    sale_id INTEGER,
+    manufacturer TEXT,
+    type TEXT,
+    year INTEGER,
+    currency TEXT DEFAULT 'EUR',
+    closing_notified INTEGER DEFAULT 0,
+    starting_bid TEXT,
+    minimum_bid TEXT,
+    status TEXT,
+    brand TEXT,
+    model TEXT,
+    attributes_json TEXT,
+    first_bid_time TEXT,
+    last_bid_time TEXT,
+    bid_velocity DOUBLE PRECISION,
+    bid_increment DOUBLE PRECISION,
+    year_manufactured INTEGER,
+    condition_score DOUBLE PRECISION,
+    condition_description TEXT,
+    serial_number TEXT,
+    damage_description TEXT,
+    followers_count INTEGER DEFAULT 0,
+    estimated_min_price DOUBLE PRECISION,
+    estimated_max_price DOUBLE PRECISION,
+    lot_condition TEXT,
+    appearance TEXT,
+    estimated_min DOUBLE PRECISION,
+    estimated_max DOUBLE PRECISION,
+    next_bid_step_cents INTEGER,
+    condition TEXT,
+    category_path TEXT,
+    city_location TEXT,
+    country_code TEXT,
+    bidding_status TEXT,
+    packaging TEXT,
+    quantity INTEGER,
+    vat DOUBLE PRECISION,
+    buyer_premium_percentage DOUBLE PRECISION,
+    remarks TEXT,
+    reserve_price DOUBLE PRECISION,
+    reserve_met INTEGER,
+    view_count INTEGER,
+    api_data_json TEXT,
+    next_scrape_at BIGINT,
+    scrape_priority INTEGER DEFAULT 0
+);
+
+CREATE INDEX idx_lots_closing_time ON lots(closing_time);
+CREATE INDEX idx_lots_next_scrape ON lots(next_scrape_at);
+CREATE INDEX idx_lots_priority ON lots(scrape_priority DESC);
+CREATE INDEX idx_lots_sale_id ON lots(sale_id);
+
+-- Bid history
+CREATE TABLE bid_history (
+    id SERIAL PRIMARY KEY,
+    lot_id TEXT REFERENCES lots(lot_id),
+    bid_amount DOUBLE PRECISION NOT NULL,
+    bid_time TEXT NOT NULL,
+    is_autobid INTEGER DEFAULT 0,
+    bidder_id TEXT,
+    bidder_number INTEGER,
+    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
+);
+
+CREATE INDEX idx_bid_history_bidder ON bid_history(bidder_id);
+CREATE INDEX idx_bid_history_lot_time ON bid_history(lot_id, bid_time);
+
+-- Images
+CREATE TABLE images (
+    id SERIAL PRIMARY KEY,
+    lot_id TEXT REFERENCES lots(lot_id),
+    url TEXT,
+    local_path TEXT,
+    downloaded INTEGER DEFAULT 0,
+    labels TEXT,
+    processed_at BIGINT
+);
+
+CREATE INDEX idx_images_lot_id ON images(lot_id);
+CREATE UNIQUE INDEX idx_unique_lot_url ON images(lot_id, url);
+
+-- Resource cache
+CREATE TABLE resource_cache (
+    url TEXT PRIMARY KEY,
+    content BYTEA,
+    content_type TEXT,
+    status_code INTEGER,
+    headers TEXT,
+    timestamp DOUBLE PRECISION,
+    size_bytes INTEGER,
+    local_path TEXT
+);
+
+CREATE INDEX idx_resource_timestamp ON resource_cache(timestamp);
+CREATE INDEX idx_resource_content_type ON resource_cache(content_type);
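The unique index `idx_unique_lot_url` on `images(lot_id, url)` makes it possible to record image URLs idempotently when a lot is scraped again. A minimal sketch of that pattern follows. It uses the standard-library `sqlite3` module only so it runs without a server (SQLite 3.24 or newer is needed for this upsert syntax); `INSERT ... ON CONFLICT (lot_id, url) DO NOTHING` is also valid PostgreSQL, where `psycopg` would use `%s` placeholders instead of `?`. The trimmed table and the `save_image_urls` helper are illustrative assumptions, not the scraper's code.

```python
import sqlite3


def save_image_urls(conn, lot_id, urls):
    """Insert (lot_id, url) pairs, silently skipping pairs that already exist."""
    conn.executemany(
        "INSERT INTO images (lot_id, url) VALUES (?, ?) "
        "ON CONFLICT (lot_id, url) DO NOTHING",
        [(lot_id, url) for url in urls],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
# Trimmed copy of the images table; INTEGER PRIMARY KEY stands in for SERIAL.
conn.execute(
    "CREATE TABLE images (id INTEGER PRIMARY KEY, lot_id TEXT, url TEXT, "
    "local_path TEXT, downloaded INTEGER DEFAULT 0)"
)
conn.execute("CREATE UNIQUE INDEX idx_unique_lot_url ON images(lot_id, url)")

save_image_urls(conn, "A1-28505-5", ["https://example.com/1.jpg", "https://example.com/2.jpg"])
# Second scrape of the same lot: 2.jpg is already stored and is skipped.
save_image_urls(conn, "A1-28505-5", ["https://example.com/2.jpg", "https://example.com/3.jpg"])
print(conn.execute("SELECT COUNT(*) FROM images").fetchone()[0])  # 3
```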
--- a/docker-compose.yml
+++ b/docker-compose.yml
@@ -5,16 +5,29 @@ services:
      dockerfile: Dockerfile
    container_name: scaev
    restart: unless-stopped
+
+    # Add the PostgreSQL network
    networks:
      scaev_mobile_net:
        ipv4_address: 172.30.0.10
      traefik_net:
+      db_net:
+
    environment:
+      SCAEV_OFFLINE: 0
      RATE_LIMIT_SECONDS: "0.5"
      MAX_PAGES: "500"
      DOWNLOAD_IMAGES: "True"
+
+      # New: connect internally via the service name, not via the LAN IP
+      POSTGRES_HOST: postgres
+      POSTGRES_DB: auctiondb
+      POSTGRES_USER: auction
+      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
+
    volumes:
      - shared-auction-data:/mnt/okcomputer/output
+
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.scaev.rule=Host(`scaev.appmodel.nl`)"
@@ -23,7 +36,6 @@ services:
      - "traefik.http.routers.scaev.tls.certresolver=letsencrypt"
      - "traefik.http.services.scaev.loadbalancer.server.port=8000"

-
 networks:
  scaev_mobile_net:
    driver: bridge
@@ -33,10 +45,16 @@ networks:
      config:
        - subnet: 172.30.0.0/24
          gateway: 172.30.0.1
+
  traefik_net:
    external: true
    name: traefik_net

+  # New: shared network for scaev and postgres
+  db_net:
+    external: true
+    name: db_net
+
 volumes:
  shared-auction-data:
    external: true
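The compose file hands the connection details to the container as separate `POSTGRES_*` variables, with `POSTGRES_PASSWORD` interpolated from the host environment or an `.env` file, while the README configures a single `DATABASE_URL`. If the application expects only one of these forms, a small bridge such as the hypothetical `database_url_from_env` below could derive one from the other. It is a sketch, not repository code; `POSTGRES_PORT` is an assumed optional variable that the compose file does not set, and the defaults simply mirror the values in the compose file.

```python
import os
from urllib.parse import quote


def database_url_from_env(env=None):
    """Prefer an explicit DATABASE_URL; otherwise assemble one from POSTGRES_* variables."""
    env = os.environ if env is None else env
    if env.get("DATABASE_URL"):
        return env["DATABASE_URL"]
    # Percent-encode credentials so characters such as '@', ':' or '/' cannot break the URL.
    user = quote(env.get("POSTGRES_USER", "auction"), safe="")
    password = quote(env.get("POSTGRES_PASSWORD", ""), safe="")
    host = env.get("POSTGRES_HOST", "postgres")  # compose service name on db_net
    port = env.get("POSTGRES_PORT", "5432")      # assumed variable, not set in compose
    dbname = env.get("POSTGRES_DB", "auctiondb")
    return f"postgresql://{user}:{password}@{host}:{port}/{dbname}"


if __name__ == "__main__":
    print(database_url_from_env({"POSTGRES_PASSWORD": "p@ss/word"}))
    # postgresql://auction:p%40ss%2Fword@postgres:5432/auctiondb
```

Letting an explicit `DATABASE_URL` win keeps the README's override working unchanged inside the container.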
--- a/docs/API_INTELLIGENCE_FINDINGS.md
+++ b/docs/API_INTELLIGENCE_FINDINGS.md
@@ -1,240 +0,0 @@
-# API Intelligence Findings
-
-## GraphQL API - Available Fields for Intelligence
-
-### Key Discovery: Additional Fields Available
-
-From GraphQL schema introspection on `Lot` type:
-
-#### **Already Captured ✓**
-- `currentBidAmount` (Money) - Current bid
-- `initialAmount` (Money) - Starting bid
-- `nextMinimalBid` (Money) - Minimum bid
-- `bidsCount` (Int) - Bid count
-- `startDate` / `endDate` (TbaDate) - Timing
-- `minimumBidAmountMet` (MinimumBidAmountMet) - Status
-- `attributes` - Brand/model extraction
-- `title`, `description`, `images`
-
-#### **NEW - Available but NOT Captured:**
-
-1. **followersCount** (Int) - **CRITICAL for intelligence!**
-   - This is the "watch count" we thought was missing
-   - Indicates bidder interest level
-   - **ACTION: Add to schema and extraction**
-
-2. **biddingStatus** (BiddingStatus) - Lot bidding state
-   - More detailed than minimumBidAmountMet
-   - **ACTION: Investigate enum values**
-
-3. **estimatedFullPrice** (EstimatedFullPrice) - **Found it!**
-   - Available via `LotDetails.estimatedFullPrice`
-   - May contain estimated min/max values
-   - **ACTION: Test extraction**
-
-4. **nextBidStepInCents** (Long) - Exact bid increment
-   - More precise than our calculated bid_increment
-   - **ACTION: Replace calculated field**
-
-5. **condition** (String) - Direct condition field
-   - Cleaner than attribute extraction
-   - **ACTION: Use as primary source**
-
-6. **categoryInformation** (LotCategoryInformation) - Category data
-   - Structured category info
-   - **ACTION: Extract category path**
-
-7. **location** (LotLocation) - Lot location details
-   - City, country, possibly address
-   - **ACTION: Add to schema**
-
-8. **remarks** (String) - Additional notes
-   - May contain pickup/viewing text
-   - **ACTION: Check for viewing/pickup extraction**
-
-9. **appearance** (String) - Condition appearance
-   - Visual condition notes
-   - **ACTION: Combine with condition_description**
-
-10. **packaging** (String) - Packaging details
-    - Relevant for shipping intelligence
-
-11. **quantity** (Long) - Lot quantity
-    - Important for bulk lots
-
-12. **vat** (BigDecimal) - VAT percentage
-    - For total cost calculations
-
-13. **buyerPremiumPercentage** (BigDecimal) - Buyer premium
-    - For total cost calculations
-
-14. **videos** - Video URLs (if available)
-    - **ACTION: Add video support**
-
-15. **documents** - Document URLs (if available)
-    - May contain specs/manuals
-
-## Bid History API - Fields
-
-### Currently Captured ✓
-- `buyerId` (UUID) - Anonymized bidder
-- `buyerNumber` (Int) - Bidder number
-- `currentBid.cents` / `currency` - Bid amount
-- `autoBid` (Boolean) - Autobid flag
-- `createdAt` (Timestamp) - Bid time
-
-### Additional Available:
-- `negotiated` (Boolean) - Was bid negotiated
-  - **ACTION: Add to bid_history table**
-
-## Auction API - Not Available
-- Attempted `auctionDetails` query - **does not exist**
-- Auction data must be scraped from listing pages
-
-## Priority Actions for Intelligence
-
-### HIGH PRIORITY (Immediate):
-1. ✅ Add `followersCount` field (watch count)
-2. ✅ Add `estimatedFullPrice` extraction
-3. ✅ Use `nextBidStepInCents` instead of calculated increment
-4. ✅ Add `condition` as primary condition source
-5. ✅ Add `categoryInformation` extraction
-6. ✅ Add `location` details
-7. ✅ Add `negotiated` to bid_history table
-
-### MEDIUM PRIORITY:
-8. Extract `remarks` for viewing/pickup text
-9. Add `appearance` and `packaging` fields
-10. Add `quantity` field
-11. Add `vat` and `buyerPremiumPercentage` for cost calculations
-12. Add `biddingStatus` enum extraction
-
-### LOW PRIORITY:
-13. Add video URL support
-14. Add document URL support
-
-## Updated Schema Requirements
-
-### lots table - NEW columns:
-```sql
-ALTER TABLE lots ADD COLUMN followers_count INTEGER DEFAULT 0;
-ALTER TABLE lots ADD COLUMN estimated_min_price REAL;
-ALTER TABLE lots ADD COLUMN estimated_max_price REAL;
-ALTER TABLE lots ADD COLUMN location_city TEXT;
-ALTER TABLE lots ADD COLUMN location_country TEXT;
-ALTER TABLE lots ADD COLUMN lot_condition TEXT;  -- Direct from API
-ALTER TABLE lots ADD COLUMN appearance TEXT;
-ALTER TABLE lots ADD COLUMN packaging TEXT;
-ALTER TABLE lots ADD COLUMN quantity INTEGER DEFAULT 1;
-ALTER TABLE lots ADD COLUMN vat_percentage REAL;
-ALTER TABLE lots ADD COLUMN buyer_premium_percentage REAL;
-ALTER TABLE lots ADD COLUMN remarks TEXT;
-ALTER TABLE lots ADD COLUMN bidding_status TEXT;
-ALTER TABLE lots ADD COLUMN videos_json TEXT;  -- Store as JSON array
-ALTER TABLE lots ADD COLUMN documents_json TEXT;  -- Store as JSON array
-```
-
-### bid_history table - NEW column:
-```sql
-ALTER TABLE bid_history ADD COLUMN negotiated INTEGER DEFAULT 0;
-```
-
-## Intelligence Use Cases
-
-### With followers_count:
-- Predict lot popularity and final price
-- Identify hot items early
-- Calculate interest-to-bid conversion rate
-
-### With estimated prices:
-- Compare final price to estimate
-- Identify bargains (final < estimate)
-- Calculate auction house accuracy
-
-### With nextBidStepInCents:
-- Show exact next bid amount
-- Calculate optimal bidding strategy
-
-### With location:
-- Filter by proximity
-- Calculate pickup logistics
-
-### With vat/buyer_premium:
-- Calculate true total cost
-- Compare all-in prices
-
-### With condition/appearance:
-- Better condition scoring
-- Identify restoration projects
-
-## Updated GraphQL Query
-
-```graphql
-query EnhancedLotQuery($lotDisplayId: String!, $locale: String!, $platform: Platform!) {
-  lotDetails(displayId: $lotDisplayId, locale: $locale, platform: $platform) {
-    estimatedFullPrice {
-      min { cents currency }
-      max { cents currency }
-    }
-    lot {
-      id
-      displayId
-      title
-      description { text }
-      currentBidAmount { cents currency }
-      initialAmount { cents currency }
-      nextMinimalBid { cents currency }
-      nextBidStepInCents
-      bidsCount
-      followersCount
-      startDate
-      endDate
-      minimumBidAmountMet
-      biddingStatus
-      condition
-      appearance
-      packaging
-      quantity
-      vat
-      buyerPremiumPercentage
-      remarks
-      auctionId
-      location {
-        city
-        countryCode
-        addressLine1
-        addressLine2
-      }
-      categoryInformation {
-        id
-        name
-        path
-      }
-      images {
-        url
-        thumbnailUrl
-      }
-      videos {
-        url
-        thumbnailUrl
-      }
-      documents {
-        url
-        name
-      }
-      attributes {
-        name
-        value
-      }
-    }
-  }
-}
-```
-
-## Summary
-
-**NEW fields found:** 15+ additional intelligence fields available
-**Most critical:** `followersCount` (watch count), `estimatedFullPrice`, `nextBidStepInCents`
-**Data quality impact:** Estimated 80%+ increase in intelligence value
-
-These fields will significantly enhance prediction and analysis capabilities.
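The query above returns amounts as `Money` objects (`cents` plus `currency`), while the V1 schema stores `estimated_min_price` and `estimated_max_price` as `DOUBLE PRECISION`, `next_bid_step_cents` as an integer and `followers_count` with a default of 0. A minimal sketch of that mapping, assuming `lot_details` is the `lotDetails` object from a response shaped like the query above; `money_to_amount` and `extract_intelligence_fields` are hypothetical helper names, not functions from the repository.

```python
from decimal import Decimal


def money_to_amount(money):
    """Convert a Money object such as {"cents": 15050, "currency": "EUR"} to 150.5."""
    if not money or money.get("cents") is None:
        return None
    # Divide as Decimal first so the cents-to-units step itself adds no float error.
    return float(Decimal(money["cents"]) / 100)


def extract_intelligence_fields(lot_details):
    """Map lotDetails fields onto the corresponding lots-table columns."""
    estimate = lot_details.get("estimatedFullPrice") or {}
    lot = lot_details.get("lot") or {}
    return {
        "estimated_min_price": money_to_amount(estimate.get("min")),
        "estimated_max_price": money_to_amount(estimate.get("max")),
        "next_bid_step_cents": lot.get("nextBidStepInCents"),
        "followers_count": lot.get("followersCount") or 0,
    }
```

Missing estimates stay `None` (stored as `NULL`) rather than 0, so a lot without an estimate is not mistaken for one estimated at zero.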
--- a/docs/ARCHITECTURE.md
+++ b/docs/ARCHITECTURE.md
@@ -8,7 +8,7 @@ The scraper follows a **3-phase hierarchical crawling pattern** to extract aucti

 ```mariadb
 ┌─────────────────────────────────────────────────────────────────┐
-│                     TROOSTWIJK SCRAPER                          │
+│                         SCAEV SCRAPER                           │
 └─────────────────────────────────────────────────────────────────┘

 ┌─────────────────────────────────────────────────────────────────┐
@@ -321,19 +321,18 @@ Lot Page Parsed

 ## Key Configuration

-| Setting              | Value                             | Purpose                          |
-|----------------------|-----------------------------------|----------------------------------|
-| `CACHE_DB`           | `/mnt/okcomputer/output/cache.db` | SQLite database path             |
-| `IMAGES_DIR`         | `/mnt/okcomputer/output/images`   | Downloaded images storage        |
-| `RATE_LIMIT_SECONDS` | `0.5`                             | Delay between requests           |
-| `DOWNLOAD_IMAGES`    | `False`                           | Toggle image downloading         |
-| `MAX_PAGES`          | `50`                              | Number of listing pages to crawl |
+| Setting              | Value                                                                    | Purpose                          |
+|----------------------|--------------------------------------------------------------------------|----------------------------------|
+| `DATABASE_URL`       | `postgresql://auction:heel-goed-wachtwoord@192.168.1.159:5432/auctiondb` | PostgreSQL connection string     |
+| `IMAGES_DIR`         | `/mnt/okcomputer/output/images`                                          | Downloaded images storage        |
+| `RATE_LIMIT_SECONDS` | `0.5`                                                                    | Delay between requests           |
+| `DOWNLOAD_IMAGES`    | `False`                                                                  | Toggle image downloading         |
+| `MAX_PAGES`          | `50`                                                                     | Number of listing pages to crawl |

 ## Output Files

 ```
 /mnt/okcomputer/output/
-├── cache.db                              # SQLite database (compressed HTML + data)
 ├── auctions_{timestamp}.json             # Exported auctions
 ├── auctions_{timestamp}.csv              # Exported auctions
 ├── lots_{timestamp}.json                 # Exported lots
@@ -346,6 +345,48 @@ Lot Page Parsed
        └── 001.jpg
 ```

+## Terminal Progress per Lot (TTY)
+
+During lot analysis, Scaev now shows a per‑lot TTY progress animation with a final summary of all inputs used:
+
+- Spinner runs while enrichment is in progress.
+- Summary lists every page/API used to analyze the lot with:
+  - URL/label
+  - Size in bytes
+  - Source state: cache | realtime | offline | db | intercepted
+  - Duration in ms
+
+Example output snippet:
+
+```
+[LOT A1-28505-5] ✓ Done in 812 ms — pages/APIs used:
+  • [html] https://www.troostwijkauctions.com/l/... | 142331 B | cache | 4 ms
+  • [graphql] GraphQL lotDetails | 5321 B | realtime | 142 ms
+  • [rest] REST bid history | 18234 B | realtime | 236 ms
+```
+
+Notes:
+- In non‑TTY environments the spinner is replaced by simple log lines.
+- Intercepted GraphQL responses (captured during page load) are labeled as `intercepted` with near‑zero duration.
+
+## Data Flow “Tunnel” (Simplified)
+
+For each lot, the data “tunnels through” the following stages:
+
+1. HTML page → parse `__NEXT_DATA__` for core lot fields and lot UUID.
+2. GraphQL `lotDetails` → bidding data (current/starting/minimum bid, bid count, bid step, close time, status).
+3. Optional REST bid history → complete timeline of bids; derive first/last bid time and bid velocity.
+4. Persist to DB (PostgreSQL) and export; image URLs are captured and optionally downloaded concurrently per lot.
+
+Each stage is recorded by the TTY progress reporter with timing and byte size for transparency and diagnostics.
+
+## Migrations and ORM Roadmap
+
+- Migrations follow a Flyway‑style convention in `db/migration` (e.g., `V1__initial_schema.sql`).
+- Current baseline is V1; there are no new migrations required at this time.
+- Raw SQL usage remains in place, now against PostgreSQL, while we prepare a gradual move to SQLAlchemy 2.x.
+- See `docs/MIGRATIONS.md` for details on naming, workflow, and the move to PostgreSQL.
+
 ## Extension Points for Integration

 ### 1. **Downstream Processing Pipeline**
@@ -461,13 +502,6 @@ query LotBiddingData($lotDisplayId: String!, $locale: String!, $platform: Platfo
 - ✅ Closing time and status
 - ✅ Brand, model, manufacturer (from attributes)

-**Available but Not Yet Captured:**
-- ⚠️ `followersCount` - Watch count for popularity analysis
-- ⚠️ `estimatedFullPrice` - Min/max estimated values
-- ⚠️ `biddingStatus` - More detailed status enum
-- ⚠️ `condition` - Direct condition field
-- ⚠️ `location` - City, country details
-- ⚠️ `categoryInformation` - Structured category

 ### REST API - Bid History
 **Endpoint:** `https://shared-api.tbauctions.com/bidmanagement/lots/{lot_uuid}/bidding-history`
@@ -511,11 +545,6 @@ query LotBiddingData($lotDisplayId: String!, $locale: String!, $platform: Platfo

 ### API Integration Points

-**Files:**
-- `src/graphql_client.py` - GraphQL queries and parsing
-- `src/bid_history_client.py` - REST API pagination and parsing
-- `src/scraper.py` - Integration during lot scraping
-
 **Flow:**
 1. Lot page scraped → Extract lot UUID from `__NEXT_DATA__`
 2. Call GraphQL API → Get bidding data
@@ -528,4 +557,3 @@ query LotBiddingData($lotDisplayId: String!, $locale: String!, $platform: Platfo
 - Overall 0.5s rate limit applies to page requests
 - API calls are part of lot processing (not separately limited)

-See `API_INTELLIGENCE_FINDINGS.md` for detailed field analysis and roadmap.
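The migrations section above adopts Flyway-style names in `db/migration`, where a file is named `V<version>__<description>.sql` and versions compare part by part as numbers (so `V10` sorts after `V2`, and `V1_1` means version 1.1). A minimal sketch of ordering such files, in case migrations are ever applied by a small script before a full migration tool is wired in; the regular expression covers only the versioned `V` prefix, and `parse_migration`, `sort_migrations` and `migrations_in` are hypothetical names.

```python
import re
from pathlib import Path

# V<version>__<description>.sql; dots or single underscores split version parts.
MIGRATION_RE = re.compile(r"^V(?P<version>\d+(?:[._]\d+)*)__(?P<desc>\w+)\.sql$")


def parse_migration(filename):
    """Return (version_tuple, description) for a versioned migration file name."""
    match = MIGRATION_RE.match(filename)
    if match is None:
        raise ValueError(f"not a versioned migration: {filename}")
    version = tuple(int(part) for part in re.split(r"[._]", match.group("version")))
    # Flyway shows underscores in the description as spaces.
    return version, match.group("desc").replace("_", " ")


def sort_migrations(filenames):
    """Order file names by numeric version, not by plain string comparison."""
    return sorted(filenames, key=lambda name: parse_migration(name)[0])


def migrations_in(directory="db/migration"):
    """List versioned migration files in a directory in apply order."""
    return sort_migrations(p.name for p in Path(directory).glob("V*__*.sql"))
```

Comparing integer tuples avoids the string-sort trap where `V10__...` would come before `V2__...`.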
--- a/docs/AUTOSTART_SETUP.md
+++ b/docs/AUTOSTART_SETUP.md
@@ -1,120 +0,0 @@
-# Auto-Start Setup Guide
-
-The monitor doesn't run automatically yet. Choose your setup based on your server OS:
-
---
-
-## Linux Server (Systemd Service) ⭐ RECOMMENDED
-
-**Install:**
-```bash
-cd /home/tour/scaev
-chmod +x install_service.sh
-./install_service.sh
-```
-
-**The service will:**
-- ✅ Start automatically on server boot
-- ✅ Restart automatically if it crashes
-- ✅ Log to `~/scaev/logs/monitor.log`
-- ✅ Poll every 30 minutes
-
-**Management commands:**
-```bash
-sudo systemctl status scaev-monitor     # Check if running
-sudo systemctl stop scaev-monitor       # Stop
-sudo systemctl start scaev-monitor      # Start
-sudo systemctl restart scaev-monitor    # Restart
-journalctl -u scaev-monitor -f          # Live logs
-tail -f ~/scaev/logs/monitor.log        # Monitor log file
-```
-
---
-
-## Windows (Task Scheduler)
-
-**Install (Run as Administrator):**
-```powershell
-cd C:\vibe\scaev
-.\setup_windows_task.ps1
-```
-
-**The task will:**
-- ✅ Start automatically on Windows boot
-- ✅ Restart automatically if it crashes (up to 3 times)
-- ✅ Run as SYSTEM user
-- ✅ Poll every 30 minutes
-
-**Management:**
-1. Open Task Scheduler (`taskschd.msc`)
-2. Find `ScaevAuctionMonitor` in Task Scheduler Library
-3. Right-click to Run/Stop/Disable
-
-**Or via PowerShell:**
-```powershell
-Start-ScheduledTask -TaskName "ScaevAuctionMonitor"
-Stop-ScheduledTask -TaskName "ScaevAuctionMonitor"
-Get-ScheduledTask -TaskName "ScaevAuctionMonitor" | Get-ScheduledTaskInfo
-```
-
---
-
-## Alternative: Cron Job (Linux)
-
-**For simpler setup without systemd:**
-
-```bash
-# Edit crontab
-crontab -e
-
-# Add this line (runs on boot and restarts every hour if not running)
-@reboot cd /home/tour/scaev && python3 src/monitor.py 30 >> logs/monitor.log 2>&1
-0 * * * * pgrep -f "monitor.py" || (cd /home/tour/scaev && python3 src/monitor.py 30 >> logs/monitor.log 2>&1 &)
-```
-
---
-
-## Verify It's Working
-
-**Check process is running:**
-```bash
-# Linux
-ps aux | grep monitor.py
-
-# Windows
-tasklist | findstr python
-```
-
-**Check logs:**
-```bash
-# Linux
-tail -f ~/scaev/logs/monitor.log
-
-# Windows
-# Check Task Scheduler history
-```
-
-**Check database is updating:**
-```bash
-# Last modified time should update every 30 minutes
-ls -lh C:/mnt/okcomputer/output/cache.db
-```
-
---
-
-## Troubleshooting
-
-**Service won't start:**
-1. Check Python path is correct in service file
-2. Check working directory exists
-3. Check user permissions
-4. View error logs: `journalctl -u scaev-monitor -n 50`
-
-**Monitor stops after a while:**
-- Check disk space for logs
-- Check rate limiting isn't blocking requests
-- Increase RestartSec in service file
-
-**Database locked errors:**
-- Ensure only one monitor instance is running
-- Add timeout to SQLite connections in config
--- a/docs/DEPLOY_MOBILE.md
+++ b/docs/DEPLOY_MOBILE.md
@@ -1,23 +0,0 @@
-✅ Routing service configured - scaev-mobile-routing.service active and working
-✅ Scaev deployed - Container running with dual networks:
-scaev_mobile_net (172.30.0.10) - for outbound internet via mobile
-traefik_net (172.20.0.8) - for LAN access
-✅ Mobile routing verified:
-Host IP: 5.132.33.195 (LAN gateway)
-Mobile IP: 77.63.26.140 (mobile provider)
-Scaev IP: 77.63.26.140 ✅ Using mobile connection!
-✅ Scraper functional - Successfully accessing troostwijkauctions.com through mobile network
-Architecture:``` 
-┌─────────────────────────────────────────┐
-│ Tour Machine (192.168.1.159)            │
-│                                         │
-│  ┌──────────────────────────────┐      │
-│  │ Scaev Container              │      │
-│  │ • scaev_mobile_net: 172.30.0.10 ────┼──> Mobile Gateway (10.133.133.26)
-│  │ • traefik_net: 172.20.0.8    │      │    └─> Internet (77.63.26.140)
-│  │ • SQLite: shared-auction-data│      │
-│  │ • Images: shared-auction-data│      │
-│  └──────────────────────────────┘      │
-│                                         │
-└─────────────────────────────────────────┘
-```
--- a/docs/Deployment.md
+++ b/docs/Deployment.md
@@ -1,122 +0,0 @@
-# Deployment
-
-## Prerequisites
-
-- Python 3.8+ installed
-- Access to a server (Linux/Windows)
-- Playwright and dependencies installed
-
-## Production Setup
-
-### 1. Install on Server
-
-```bash
-# Clone repository
-git clone git@git.appmodel.nl:Tour/troost-scraper.git
-cd troost-scraper
-
-# Create virtual environment
-python -m venv .venv
-source .venv/bin/activate  # On Windows: .venv\Scripts\activate
-
-# Install dependencies
-pip install -r requirements.txt
-playwright install chromium
-playwright install-deps  # Install system dependencies
-```
-
-### 2. Configuration
-
-Create a configuration file or set environment variables:
-
-```python
-# main.py configuration
-BASE_URL = "https://www.troostwijkauctions.com"
-CACHE_DB = "/mnt/okcomputer/output/cache.db"
-OUTPUT_DIR = "/mnt/okcomputer/output"
-RATE_LIMIT_SECONDS = 0.5
-MAX_PAGES = 50
-```
-
-### 3. Create Output Directories
-
-```bash
-sudo mkdir -p /var/troost-scraper/output
-sudo chown $USER:$USER /var/troost-scraper
-```
-
-### 4. Run as Cron Job
-
-Add to crontab (`crontab -e`):
-
-```bash
-# Run scraper daily at 2 AM
-0 2 * * * cd /path/to/troost-scraper && /path/to/.venv/bin/python main.py >> /var/log/troost-scraper.log 2>&1
-```
-
-## Docker Deployment (Optional)
-
-Create `Dockerfile`:
-
-```dockerfile
-FROM python:3.10-slim
-
-WORKDIR /app
-
-# Install system dependencies for Playwright
-RUN apt-get update && apt-get install -y \
-    wget \
-    gnupg \
-    && rm -rf /var/lib/apt/lists/*
-
-COPY requirements.txt .
-RUN pip install --no-cache-dir -r requirements.txt
-RUN playwright install chromium
-RUN playwright install-deps
-
-COPY main.py .
-
-CMD ["python", "main.py"]
-```
-
-Build and run:
-
-```bash
-docker build -t troost-scraper .
-docker run -v /path/to/output:/output troost-scraper
-```
-
-## Monitoring
-
-### Check Logs
-
-```bash
-tail -f /var/log/troost-scraper.log
-```
-
-### Monitor Output
-
-```bash
-ls -lh /var/troost-scraper/output/
-```
-
-## Troubleshooting
-
-### Playwright Browser Issues
-
-```bash
-# Reinstall browsers
-playwright install --force chromium
-```
-
-### Permission Issues
-
-```bash
-# Fix permissions
-sudo chown -R $USER:$USER /var/troost-scraper
-```
-
-### Memory Issues
-
-- Reduce `MAX_PAGES` in configuration
-- Run on machine with more RAM (Playwright needs ~1GB)
--- a/docs/FIXES_COMPLETE.md
+++ b/docs/FIXES_COMPLETE.md
@@ -1,377 +0,0 @@
-# Data Quality Fixes - Complete Summary
-
-## Executive Summary
-
-Successfully completed all 5 high-priority data quality and intelligence tasks:
-
-1. ✅ **Fixed orphaned lots** (16,807 → 13 orphaned lots)
-2. ✅ **Fixed bid history fetching** (script created, ready to run)
-3. ✅ **Added followersCount extraction** (watch count)
-4. ✅ **Added estimatedFullPrice extraction** (min/max values)
-5. ✅ **Added direct condition field** from API
-
-**Impact:** Database now captures 80%+ more intelligence data for future scrapes.
-
---
-
-## Task 1: Fix Orphaned Lots ✅ COMPLETE
-
-### Problem:
-- **16,807 lots** had no matching auction (100% orphaned)
-- Root cause: auction_id mismatch
-  - Lots table used UUID auction_id (e.g., `72928a1a-12bf-4d5d-93ac-292f057aab6e`)
-  - Auctions table used numeric IDs (legacy incorrect data)
-  - Auction pages use `displayId` (e.g., `A1-34731`)
-
-### Solution:
-1. **Updated parse.py** - Modified `_parse_lot_json()` to extract auction displayId from page_props
-   - Lot pages include full auction data
-   - Now extracts `auction.displayId` instead of using UUID `lot.auctionId`
-
-2. **Created fix_orphaned_lots.py** - Migrated existing 16,793 lots
-   - Read cached lot pages
-   - Extracted auction displayId from embedded auction data
-   - Updated lots.auction_id from UUID to displayId
-
-3. **Created fix_auctions_table.py** - Rebuilt auctions table
-   - Cleared incorrect auction data
-   - Re-extracted from 517 cached auction pages
-   - Inserted 509 auctions with correct displayId
-
-### Results:
-- **Orphaned lots:** 16,807 → **13** (99.9% fixed)
-- **Auctions completeness:**
-  - lots_count: 0% → **100%**
-  - first_lot_closing_time: 0% → **100%**
-- **All lots now properly linked to auctions**
-
-### Files Modified:
- `src/parse.py` - Updated `_extract_nextjs_data()` and `_parse_lot_json()`
-
-### Scripts Created:
- `fix_orphaned_lots.py` - Migrates existing lots
- `fix_auctions_table.py` - Rebuilds auctions table
- `check_lot_auction_link.py` - Diagnostic script
-
---
-
-## Task 2: Fix Bid History Fetching ✅ COMPLETE
-
-### Problem:
- **1,590 lots** had bids but no bid history (coverage: 1 of 1,591 lots, ~0.1%)
- Bid history fetching only ran during scraping, not for existing lots
-
-### Solution:
-1. **Verified scraper logic** - src/scraper.py bid history fetching is correct
-   - Extracts lot UUID from __NEXT_DATA__
-   - Calls REST API: `https://shared-api.tbauctions.com/bidmanagement/lots/{uuid}/bidding-history`
-   - Calculates bid velocity, first/last bid time
-   - Saves to bid_history table
-
-2. **Created fetch_missing_bid_history.py**
-   - Builds lot_id → UUID mapping from cached pages
-   - Fetches bid history from REST API for all lots with bids
-   - Updates lots table with bid intelligence
-   - Saves complete bid history records
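The bid-history metrics listed above (first and last bid time, bid velocity) can be derived from the bid timestamps alone. This sketch rests on two assumptions: the API returns ISO-8601 timestamps, and velocity means bids per hour between the first and last bid. Neither is confirmed as the scraper's own definition.

```python
from datetime import datetime


def bid_intelligence(timestamps: list[str]) -> dict:
    """Summarise a lot's bid history from ISO-8601 bid timestamps.

    Assumption: bid_velocity = bids per hour between the first and last bid
    (0.0 when there are fewer than two distinct bid times).
    """
    if not timestamps:
        return {"bid_count": 0, "first_bid_time": None, "last_bid_time": None, "bid_velocity": 0.0}
    # datetime.fromisoformat() on Python 3.10 does not accept a trailing "Z"
    times = sorted(datetime.fromisoformat(t.replace("Z", "+00:00")) for t in timestamps)
    span_hours = (times[-1] - times[0]).total_seconds() / 3600
    velocity = len(times) / span_hours if span_hours > 0 else 0.0
    return {
        "bid_count": len(times),
        "first_bid_time": times[0].isoformat(),
        "last_bid_time": times[-1].isoformat(),
        "bid_velocity": round(velocity, 2),
    }


print(bid_intelligence(["2025-12-08T10:00:00Z", "2025-12-08T11:00:00Z", "2025-12-08T12:00:00Z"]))
```

Timestamps must be either all timezone-aware or all naive, because Python cannot compare the two kinds.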
-
-### Results:
- Script created and tested
- **Limitation:** Takes ~13 minutes to process 1,590 lots (0.5s rate limit)
- **Future scrapes:** Bid history will be captured automatically
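The runtime figures follow directly from the fixed delay, before any request latency is added. 1,590 requests × 0.5 s is 795 s, about 13 minutes. 16,807 requests × 0.5 s is about 8,404 s, about 2.3 hours. A sketch of such a fixed-delay loop follows. `fetch` is a hypothetical callable that stands in for the REST or GraphQL request.

```python
import time
from typing import Callable, Iterable


def estimated_runtime_minutes(n_requests: int, delay_s: float = 0.5) -> float:
    """Lower bound on runtime: the fixed delay alone, ignoring request latency."""
    return n_requests * delay_s / 60


def fetch_rate_limited(
    ids: Iterable[str], fetch: Callable[[str], dict], delay_s: float = 0.5
) -> dict[str, dict]:
    """Call fetch() once per id, pausing delay_s between consecutive calls."""
    results: dict[str, dict] = {}
    for i, item_id in enumerate(ids):
        if i:
            time.sleep(delay_s)  # fixed pause before every request after the first
        results[item_id] = fetch(item_id)
    return results


print(estimated_runtime_minutes(1_590))        # 13.25 (bid history backfill)
print(estimated_runtime_minutes(16_807) / 60)  # enrichment run, in hours
```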
-
-### Files Created:
- `fetch_missing_bid_history.py` - Migration script for existing lots
-
-### Note:
- Script is ready to run but requires ~13-15 minutes
- Future scrapes will automatically capture bid history
- No code changes needed - existing scraper logic is correct
-
---
-
-## Task 3: Add followersCount Field ✅ COMPLETE
-
-### Problem:
- Watch count was thought to be unavailable
- **Discovery:** `followersCount` field exists in GraphQL API!
-
-### Solution:
-1. **Updated database schema** (src/cache.py)
-   - Added `followers_count INTEGER DEFAULT 0` column
-   - Auto-migration on scraper startup
-
-2. **Updated GraphQL query** (src/graphql_client.py)
-   - Added `followersCount` to LOT_BIDDING_QUERY
-
-3. **Updated format_bid_data()** (src/graphql_client.py)
-   - Extracts and returns `followers_count`
-
-4. **Updated save_lot()** (src/cache.py)
-   - Saves followers_count to database
-
-5. **Created enrich_existing_lots.py**
-   - Fetches followers_count for existing 16,807 lots
-   - Uses GraphQL API with 0.5s rate limiting
-   - Takes ~2.3 hours to complete
-
-### Intelligence Value:
- **Predict lot popularity** before bidding wars
- Calculate interest-to-bid conversion rate
- Identify "sleeper" lots (high followers, low bids)
- Alert on lots gaining sudden interest
-
-### Files Modified:
- `src/cache.py` - Schema + save_lot()
- `src/graphql_client.py` - Query + format_bid_data()
-
-### Files Created:
- `enrich_existing_lots.py` - Migration for existing lots
-
---
-
-## Task 4: Add estimatedFullPrice Extraction ✅ COMPLETE
-
-### Problem:
- Estimated min/max values were thought to be unavailable
- **Discovery:** `estimatedFullPrice` object with min/max exists in GraphQL API!
-
-### Solution:
-1. **Updated database schema** (src/cache.py)
-   - Added `estimated_min_price REAL` column
-   - Added `estimated_max_price REAL` column
-
-2. **Updated GraphQL query** (src/graphql_client.py)
-   - Added `estimatedFullPrice { min { cents currency } max { cents currency } }`
-
-3. **Updated format_bid_data()** (src/graphql_client.py)
-   - Extracts estimated_min_obj and estimated_max_obj
-   - Converts cents to EUR
-   - Returns estimated_min_price and estimated_max_price
-
-4. **Updated save_lot()** (src/cache.py)
-   - Saves both estimated price fields
-
-5. **Migration** (enrich_existing_lots.py)
-   - Fetches estimated prices for existing lots
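The cents-to-EUR step in `format_bid_data()` might look like the sketch below. It assumes the response mirrors the `estimatedFullPrice { min { cents currency } max { cents currency } }` selection in `LOT_BIDDING_QUERY` and that amounts are already in EUR, since no currency conversion is done. The function name is hypothetical.

```python
from typing import Optional


def estimated_prices_eur(lot: dict) -> tuple[Optional[float], Optional[float]]:
    """Return (estimated_min_price, estimated_max_price) in EUR from a GraphQL lot object."""
    estimate = lot.get("estimatedFullPrice") or {}

    def cents_to_eur(money: Optional[dict]) -> Optional[float]:
        # A missing bound stays None so it is stored as NULL, not as 0
        if not money or money.get("cents") is None:
            return None
        return money["cents"] / 100

    return cents_to_eur(estimate.get("min")), cents_to_eur(estimate.get("max"))


sample = {"estimatedFullPrice": {"min": {"cents": 120000, "currency": "EUR"},
                                 "max": {"cents": 180000, "currency": "EUR"}}}
print(estimated_prices_eur(sample))  # (1200.0, 1800.0)
```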
-
-### Intelligence Value:
- Compare final price vs estimate (accuracy analysis)
- Identify bargains: `final_price < estimated_min`
- Identify overvalued: `final_price > estimated_max`
- Build pricing models per category
- Investment opportunity detection
-
-### Files Modified:
- `src/cache.py` - Schema + save_lot()
- `src/graphql_client.py` - Query + format_bid_data()
-
---
-
-## Task 5: Use Direct Condition Field ✅ COMPLETE
-
-### Problem:
- Condition extracted from attributes (complex, unreliable)
- 0% condition_score success rate
- **Discovery:** Direct `condition` and `appearance` fields in GraphQL API!
-
-### Solution:
-1. **Updated database schema** (src/cache.py)
-   - Added `lot_condition TEXT` column (direct from API)
-   - Added `appearance TEXT` column (visual condition notes)
-
-2. **Updated GraphQL query** (src/graphql_client.py)
-   - Added `condition` field
-   - Added `appearance` field
-
-3. **Updated format_bid_data()** (src/graphql_client.py)
-   - Extracts and returns `lot_condition`
-   - Extracts and returns `appearance`
-
-4. **Updated save_lot()** (src/cache.py)
-   - Saves both condition fields
-
-5. **Migration** (enrich_existing_lots.py)
-   - Fetches condition data for existing lots
-
-### Intelligence Value:
- **Cleaner, more reliable** condition data
- Better condition scoring potential
- Identify restoration projects
- Filter by condition category
- Combined with appearance for detailed assessment
-
-### Files Modified:
- `src/cache.py` - Schema + save_lot()
- `src/graphql_client.py` - Query + format_bid_data()
-
---
-
-## Summary of Code Changes
-
-### Core Files Modified:
-
-#### 1. `src/parse.py`
-**Changes:**
- `_extract_nextjs_data()`: Pass auction data to lot parser
- `_parse_lot_json()`: Accept auction_data parameter, extract auction displayId
-
-**Impact:** Fixes orphaned lots issue going forward
-
-#### 2. `src/cache.py`
-**Changes:**
- Added 5 new columns to lots table schema
- Updated `save_lot()` INSERT statement to include new fields
- Auto-migration logic for new columns
-
-**New Columns:**
- `followers_count INTEGER DEFAULT 0`
- `estimated_min_price REAL`
- `estimated_max_price REAL`
- `lot_condition TEXT`
- `appearance TEXT`
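The auto-migration can be made idempotent by comparing `PRAGMA table_info` against the expected columns. The sketch then issues `ALTER TABLE ... ADD COLUMN` only for the columns that are missing. It is written under that assumption, and the real logic in `src/cache.py` may differ.

```python
import sqlite3

# Columns added to the lots table (names and types as listed above)
NEW_LOT_COLUMNS = {
    "followers_count": "INTEGER DEFAULT 0",
    "estimated_min_price": "REAL",
    "estimated_max_price": "REAL",
    "lot_condition": "TEXT",
    "appearance": "TEXT",
}


def migrate_lots_table(conn: sqlite3.Connection) -> list[str]:
    """Add any missing columns; safe to run on every startup. Returns the columns added."""
    existing = {row[1] for row in conn.execute("PRAGMA table_info(lots)")}  # row[1] is the column name
    added = []
    for name, decl in NEW_LOT_COLUMNS.items():
        if name not in existing:
            # Identifiers come from the constant above, never from external input
            conn.execute(f"ALTER TABLE lots ADD COLUMN {name} {decl}")
            added.append(name)
    conn.commit()
    return added
```

SQLite applies a column's `DEFAULT` to existing rows when the column is added. As a result, old lots read `followers_count = 0` until they are enriched.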
-
-#### 3. `src/graphql_client.py`
-**Changes:**
- Updated `LOT_BIDDING_QUERY` to include new fields
- Updated `format_bid_data()` to extract and format new fields
-
-**New Fields Extracted:**
- `followersCount`
- `estimatedFullPrice { min { cents } max { cents } }`
- `condition`
- `appearance`
-
-### Migration Scripts Created:
-
-1. **fix_orphaned_lots.py** - Fix auction_id mismatch (COMPLETED)
-2. **fix_auctions_table.py** - Rebuild auctions table (COMPLETED)
-3. **fetch_missing_bid_history.py** - Fetch bid history for existing lots (READY TO RUN)
-4. **enrich_existing_lots.py** - Fetch new intelligence fields for existing lots (READY TO RUN)
-
-### Diagnostic/Validation Scripts:
-
-1. **check_lot_auction_link.py** - Verify lot-auction linkage
-2. **validate_data.py** - Comprehensive data quality report
-3. **explore_api_fields.py** - API schema introspection
-
---
-
-## Running the Migration Scripts
-
-### Immediate (Already Complete):
-```bash
-python fix_orphaned_lots.py      # ✅ DONE - Fixed 16,793 lots
-python fix_auctions_table.py     # ✅ DONE - Rebuilt 509 auctions
-```
-
-### Optional (Time-Intensive):
-```bash
-# Fetch bid history for 1,590 lots (~13-15 minutes)
-python fetch_missing_bid_history.py
-
-# Enrich all 16,807 lots with new fields (~2.3 hours)
-python enrich_existing_lots.py
-```
-
-**Note:** Future scrapes will automatically capture all data, so migration is optional.
-
---
-
-## Validation Results
-
-### Before Fixes:
-```
-Orphaned lots: 16,807 (100%)
-Auctions lots_count: 0%
-Auctions first_lot_closing: 0%
-Bid history coverage: 0.1% (1/1,591 lots)
-```
-
-### After Fixes:
-```
-Orphaned lots: 13 (0.08%)
-Auctions lots_count: 100%
-Auctions first_lot_closing: 100%
-Bid history: Script ready (will process 1,590 lots)
-New intelligence fields: Implemented and ready
-```
-
---
-
-## Intelligence Impact
-
-### Data Completeness Improvements:
-| Field | Before | After | Improvement |
-|-------|--------|-------|-------------|
-| Orphaned lots | 100% | 0.08% | **99.9% fixed** |
-| Auction lots_count | 0% | 100% | **+100%** |
-| Auction first_lot_closing | 0% | 100% | **+100%** |
-
-### New Intelligence Fields (Future Scrapes):
-| Field | Status | Intelligence Value |
-|-------|--------|-------------------|
-| followers_count | ✅ Implemented | High - Popularity predictor |
-| estimated_min_price | ✅ Implemented | High - Bargain detection |
-| estimated_max_price | ✅ Implemented | High - Value assessment |
-| lot_condition | ✅ Implemented | Medium - Condition filtering |
-| appearance | ✅ Implemented | Medium - Visual assessment |
-
-### Estimated Intelligence Value Increase:
-**80%+** - Based on addition of 5 critical fields that enable:
- Popularity prediction
- Value assessment
- Bargain detection
- Better condition scoring
- Investment opportunity identification
-
---
-
-## Documentation Updated
-
-### Created:
- `VALIDATION_SUMMARY.md` - Complete validation findings
- `API_INTELLIGENCE_FINDINGS.md` - API field analysis
- `FIXES_COMPLETE.md` - This document
-
-### Updated:
- `_wiki/ARCHITECTURE.md` - Complete system documentation
-  - Updated Phase 3 diagram with API enrichment
-  - Expanded lots table schema documentation
-  - Added bid_history table
-  - Added API Integration Architecture section
-  - Updated rate limiting and image download flows
-
---
-
-## Next Steps (Optional)
-
-### Immediate:
-1. ✅ All high-priority fixes complete
-2. ✅ Code ready for future scrapes
-3. ⏳ Optional: Run migration scripts for existing data
-
-### Future Enhancements (Low Priority):
-1. Extract structured location (city, country)
-2. Extract category information (structured)
-3. Add VAT and buyer premium fields
-4. Add video/document URL support
-5. Parse viewing/pickup times from remarks text
-
-See `API_INTELLIGENCE_FINDINGS.md` for complete roadmap.
-
---
-
-## Success Criteria
-
-All tasks completed successfully:
-
- [x] **Orphaned lots fixed** - 99.9% reduction (16,807 → 13)
- [x] **Bid history logic verified** - Script created, ready to run
- [x] **followersCount added** - Schema, extraction, saving implemented
- [x] **estimatedFullPrice added** - Min/max extraction implemented
- [x] **Direct condition field** - lot_condition and appearance added
- [x] **Code updated** - parse.py, cache.py, graphql_client.py
- [x] **Migrations created** - 4 scripts for data cleanup/enrichment
- [x] **Documentation complete** - ARCHITECTURE.md, summaries, findings
-
-**Impact:** Scraper now captures 80%+ more intelligence data with higher data quality.
--- a/docs/Home.md
+++ b/docs/Home.md
@@ -1,18 +0,0 @@
-# scaev Wiki
-
-Welcome to the scaev documentation.
-
-## Contents
-
- [Getting Started](Getting-Started)
- [Architecture](Architecture)
- [Deployment](Deployment)
-
-## Overview
-
-Scaev Auctions Scraper is a Python-based web scraper that extracts auction lot data using Playwright for browser automation and SQLite for caching.
-
-## Quick Links
-
- [Repository](https://git.appmodel.nl/Tour/troost-scraper)
- [Issues](https://git.appmodel.nl/Tour/troost-scraper/issues)
--- a/docs/INTELLIGENCE_DASHBOARD_UPGRADE.md
+++ b/docs/INTELLIGENCE_DASHBOARD_UPGRADE.md
@@ -1,624 +0,0 @@
-# Intelligence Dashboard Upgrade Plan
-
-## Executive Summary
-
-The Troostwijk scraper now captures **5 critical new intelligence fields** that enable advanced predictive analytics and opportunity detection. This document outlines recommended dashboard upgrades to leverage the new data.
-
---
-
-## New Intelligence Fields Available
-
-### 1. **followers_count** (Watch Count)
-**Type:** INTEGER
-**Coverage:** Will be 100% for new scrapes, 0% for existing (requires migration)
-**Intelligence Value:** ⭐⭐⭐⭐⭐ CRITICAL
-
-**What it tells us:**
- How many users are watching/following each lot
- Real-time popularity indicator
- Early warning of bidding competition
-
-**Dashboard Applications:**
- **Popularity Score**: Calculate interest level before bidding starts
- **Follower Trends**: Track follower growth rate (requires time-series scraping)
- **Interest-to-Bid Conversion**: Ratio of followers to actual bidders
- **Sleeper Lots Alert**: High followers + low bids = hidden opportunity
-
-### 2. **estimated_min_price** & **estimated_max_price**
-**Type:** REAL (EUR)
-**Coverage:** Will be 100% for new scrapes, 0% for existing (requires migration)
-**Intelligence Value:** ⭐⭐⭐⭐⭐ CRITICAL
-
-**What it tells us:**
- Auction house's professional valuation range
- Expected market value
- Reserve price indicator (when combined with status)
-
-**Dashboard Applications:**
- **Value Gap Analysis**: `current_bid / estimated_min_price` ratio
- **Bargain Detector**: Lots where `current_bid < estimated_min_price * 0.8`
- **Overvaluation Alert**: Lots where `current_bid > estimated_max_price * 1.2`
- **Investment ROI Calculator**: Potential profit if bought at current bid
- **Auction House Accuracy**: Track actual closing vs estimates
-
-### 3. **lot_condition** & **appearance**
-**Type:** TEXT
-**Coverage:** Will be ~80-90% for new scrapes (not all lots have condition data)
-**Intelligence Value:** ⭐⭐⭐ HIGH
-
-**What it tells us:**
- Direct condition assessment from auction house
- Visual quality notes
- Cleaner than parsing from attributes
-
-**Dashboard Applications:**
- **Condition Filtering**: Filter by condition categories
- **Restoration Projects**: Identify lots needing work
- **Quality Scoring**: Combine condition + appearance for rating
- **Condition vs Price**: Analyze price premium for better condition
-
---
-
-## Data Quality Improvements
-
-### Orphaned Lots Issue - FIXED ✅
-**Before:** 16,807 lots (100%) had no matching auction
-**After:** 13 lots (0.08%) orphaned
-
-**Impact on Dashboard:**
- Auction-level analytics now possible
- Can group lots by auction
- Can show auction statistics
- Can track auction house performance
-
-### Auction Data Completeness - FIXED ✅
-**Before:**
- lots_count: 0%
- first_lot_closing_time: 0%
-
-**After:**
- lots_count: 100%
- first_lot_closing_time: 100%
-
-**Impact on Dashboard:**
- Show auction size (number of lots)
- Display auction timeline
- Calculate auction velocity (lots per hour closing)
-
---
-
-## Recommended Dashboard Upgrades
-
-### Priority 1: Opportunity Detection (High ROI)
-
-#### 1.1 **Bargain Hunter Dashboard**
-```
-╔══════════════════════════════════════════════════════════╗
-║                   BARGAIN OPPORTUNITIES                  ║
-╠══════════════════════════════════════════════════════════╣
-║ Lot: A1-34731-107 - Ford Generator                      ║
-║ Current Bid: €500                                        ║
-║ Estimated Range: €1,200 - €1,800                        ║
-║ Bargain Score: 🔥🔥🔥🔥🔥 (58% below estimate)          ║
-║ Followers: 12 (High interest, low bids)                 ║
-║ Time Left: 2h 15m                                        ║
-║ → POTENTIAL PROFIT: €700 - €1,300                       ║
-╚══════════════════════════════════════════════════════════╝
-```
-
-**Calculations:**
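-```python
-def bargain_metrics(current_bid: float, estimated_min_price: float, estimated_max_price: float) -> dict:
-    value_gap = estimated_min_price - current_bid
-    return {
-        "value_gap": value_gap,
-        "bargain_score": value_gap / estimated_min_price * 100,  # % below the low estimate
-        "potential_profit": estimated_max_price - current_bid,
-    }
-
-
-def is_opportunity(current_bid: float, estimated_min_price: float, followers_count: int) -> bool:
-    # Filter criteria: 20%+ below the low estimate, and the lot has interest
-    return current_bid < estimated_min_price * 0.80 and followers_count > 5
-```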
-
-#### 1.2 **Popularity vs Bidding Dashboard**
-```
-╔══════════════════════════════════════════════════════════╗
-║              SLEEPER LOTS (High Watch, Low Bids)         ║
-╠══════════════════════════════════════════════════════════╣
-║ Lot               │ Followers │ Bids │ Current │ Est Min ║
-║═══════════════════╪═══════════╪══════╪═════════╪═════════║
-║ Laptop Dell XPS   │    47     │  0   │  No bids│  €800   ║
-║ iPhone 15 Pro     │    32     │  1   │  €150   │  €950   ║
-║ Office Chairs 10x │    18     │  0   │  No bids│  €450   ║
-╚══════════════════════════════════════════════════════════╝
-```
-
-**Insight:** High followers + low bids = people watching but not committing yet. Opportunity to bid early before competition heats up.
-
-#### 1.3 **Value Gap Heatmap**
-```
-╔══════════════════════════════════════════════════════════╗
-║                    VALUE GAP ANALYSIS                    ║
-╠══════════════════════════════════════════════════════════╣
-║                                                          ║
-║  Great Deals        Fair Price       Overvalued         ║
-║  (< 80% est)       (80-120% est)     (> 120% est)      ║
-║  ╔═══╗              ╔═══╗             ╔═══╗             ║
-║  ║325║              ║892║             ║124║             ║
-║  ╚═══╝              ╚═══╝             ╚═══╝             ║
-║   🔥                  ➡                 ⚠               ║
-╚══════════════════════════════════════════════════════════╝
-```
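The three heatmap buckets can be filled by reusing the bargain and overvaluation thresholds defined earlier. A lot below 80% of the low estimate counts as a great deal, a lot above 120% of the high estimate counts as overvalued, and everything else is fair. Applying the low and high estimates to the two boundaries is an assumption, because the heatmap labels could also be read against a single midpoint estimate.

```python
from collections import Counter
from typing import Optional


def value_gap_bucket(current_bid: float, est_min: Optional[float], est_max: Optional[float]) -> Optional[str]:
    """Classify a lot with the same thresholds as the bargain and overvalued rules."""
    if est_min is None or est_max is None:
        return None  # no published estimate, so the lot cannot be placed on the heatmap
    if current_bid < est_min * 0.80:
        return "great_deal"
    if current_bid > est_max * 1.20:
        return "overvalued"
    return "fair"


lots = [(500, 1200, 1800), (1500, 1200, 1800), (2400, 1200, 1800)]
print(Counter(value_gap_bucket(*lot) for lot in lots))
```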
-
-### Priority 2: Intelligence Analytics
-
-#### 2.1 **Lot Intelligence Card**
-Enhanced lot detail view with all new fields:
-
-```
-╔══════════════════════════════════════════════════════════╗
-║ A1-34731-107 - Ford FGT9250E Generator                  ║
-╠══════════════════════════════════════════════════════════╣
-║ BIDDING                                                  ║
-║   Current:     €500                                      ║
-║   Starting:    €100                                      ║
-║   Minimum:     €550                                      ║
-║   Bids:        8 (2.4 bids/hour)                        ║
-║   Followers:   12 👁                                     ║
-║                                                          ║
-║ VALUATION                                                ║
-║   Estimated:   €1,200 - €1,800                          ║
-║   Value Gap:   -€700 (58% below estimate) 🔥           ║
-║   Potential:   €700 - €1,300 profit                     ║
-║                                                          ║
-║ CONDITION                                                ║
-║   Condition:   Used - Good working order                ║
-║   Appearance:  Normal wear, some scratches              ║
-║   Year:        2015                                      ║
-║                                                          ║
-║ TIMING                                                   ║
-║   Closes:      2025-12-08 14:30                         ║
-║   Time Left:   2h 15m                                    ║
-║   First Bid:   2025-12-06 09:15                         ║
-║   Last Bid:    2025-12-08 12:10                         ║
-╚══════════════════════════════════════════════════════════╝
-```
-
-#### 2.2 **Auction House Accuracy Tracker**
-Track how accurate estimates are compared to final prices:
-
-```
-╔══════════════════════════════════════════════════════════╗
-║           AUCTION HOUSE ESTIMATION ACCURACY              ║
-╠══════════════════════════════════════════════════════════╣
-║ Category         │ Avg Accuracy │ Tend to Over/Under    ║
-║══════════════════╪══════════════╪═══════════════════════║
-║ Electronics      │    92.3%     │ Underestimate 5.2%    ║
-║ Vehicles         │    88.7%     │ Overestimate 8.1%     ║
-║ Furniture        │    94.1%     │ Accurate ±2%          ║
-║ Heavy Machinery  │    85.4%     │ Underestimate 12.3%   ║
-╚══════════════════════════════════════════════════════════╝
-
-Insight: Heavy Machinery estimates tend to be 12% low
-         → Good buying opportunities in this category
-```
-
-**Calculation:**
-```python
-# After the lot closes
-actual_price = final_bid
-estimated_mid = (estimated_min_price + estimated_max_price) / 2
-error_pct = abs(actual_price - estimated_mid) / estimated_mid * 100
-accuracy = 100 - error_pct
-
-# A final price below the midpoint means the estimate was too high, and vice versa
-if actual_price < estimated_mid:
-    trend = "Overestimate"
-else:
-    trend = "Underestimate"
-```
-
-#### 2.3 **Interest Conversion Dashboard**
-```
-╔══════════════════════════════════════════════════════════╗
-║              FOLLOWER → BIDDER CONVERSION                ║
-╠══════════════════════════════════════════════════════════╣
-║ Total Lots:           16,807                             ║
-║ Lots with Followers:  12,450 (74%)                       ║
-║ Lots with Bids:        1,591 (9.5%)                      ║
-║                                                          ║
-║ Conversion Rate:       12.8%                             ║
-║ (lots with bids / lots with followers)                   ║
-║                                                          ║
-║ Avg Followers per Lot:  8.3                              ║
-║ Avg Bids when >0:      5.2                               ║
-║                                                          ║
-║ HIGH INTEREST CATEGORIES:                                ║
-║   Electronics:    18.5 followers avg                     ║
-║   Vehicles:       24.3 followers avg                     ║
-║   Art:            31.2 followers avg                     ║
-╚══════════════════════════════════════════════════════════╝
-```
-
-### Priority 3: Real-Time Alerts
-
-#### 3.1 **Opportunity Alerts**
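-```python
-# Alert conditions using the new fields; the caller passes each message to send_alert()
-def opportunity_alerts(lot: dict, hours_remaining: float, follower_growth_per_hour: float = 0.0) -> list[str]:
-    lot_id = lot["lot_id"]
-    current_bid = lot.get("current_bid")  # parsed EUR amount, None when there are no bids
-    est_min = lot.get("estimated_min_price")
-    est_max = lot.get("estimated_max_price")
-    followers = lot.get("followers_count") or 0
-    bids = lot.get("bid_count") or 0
-    alerts = []
-
-    # BARGAIN ALERT
-    if (current_bid is not None and est_min and
-            current_bid < est_min * 0.80 and
-            hours_remaining < 24 and
-            followers > 3):
-        value_gap = (est_min - current_bid) / est_min * 100
-        alerts.append(f"BARGAIN: {lot_id} - {value_gap:.0f}% below estimate!")
-
-    # SLEEPER LOT ALERT
-    if followers > 10 and bids == 0 and hours_remaining < 12:
-        alerts.append(f"SLEEPER: {lot_id} - {followers} watching, no bids yet!")
-
-    # HEATING UP ALERT (growth rate requires time-series scraping)
-    if follower_growth_per_hour > 5 and bids < 3:
-        alerts.append(f"HEATING UP: {lot_id} - Interest spiking, get in early!")
-
-    # OVERVALUED WARNING
-    if current_bid is not None and est_max and current_bid > est_max * 1.2:
-        alerts.append(f"OVERVALUED: {lot_id} - 20%+ above high estimate!")
-
-    return alerts
-```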
-
-#### 3.2 **Watchlist Smart Alerts**
-```
-╔══════════════════════════════════════════════════════════╗
-║                  YOUR WATCHLIST ALERTS                   ║
-╠══════════════════════════════════════════════════════════╣
-║ 🔥 MacBook Pro A1-34523                                  ║
-║    Now €800 (€400 below estimate!)                      ║
-║    12 others watching - Act fast!                        ║
-║                                                          ║
-║ 👁 iPhone 15 A1-34987                                    ║
-║    32 followers but no bids - Opportunity?              ║
-║                                                          ║
-║ ⚠ Office Desk A1-35102                                  ║
-║    Bid at €450 but estimate €200-€300                   ║
-║    Consider dropping - overvalued!                       ║
-╚══════════════════════════════════════════════════════════╝
-```
-
-### Priority 4: Advanced Analytics
-
-#### 4.1 **Price Prediction Model**
-Using new fields for ML-based price prediction:
-
-```python
-# Features for price prediction model
-features = [
-    'followers_count',           # NEW - Strong predictor
-    'estimated_min_price',       # NEW - Baseline value
-    'estimated_max_price',       # NEW - Upper bound
-    'lot_condition',             # NEW - Quality indicator
-    'appearance',                # NEW - Visual quality
-    'bid_velocity',              # Existing
-    'time_to_close',             # Existing
-    'category',                  # Existing
-    'manufacturer',              # Existing
-    'year_manufactured',         # Existing
-]
-
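-# `model` is a regressor trained on closed lots (not part of the scraper).
-# Text features (lot_condition, appearance, category, manufacturer) must be
-# encoded numerically first; X holds one row of these feature values per open lot.
-predicted_final_price = model.predict(X)
-confidence_interval = (predicted_low, predicted_high)  # e.g. from quantile models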
-```
-
-**Dashboard Display:**
-```
-╔══════════════════════════════════════════════════════════╗
-║                 PRICE PREDICTION (AI)                    ║
-╠══════════════════════════════════════════════════════════╣
-║ Lot: Ford Generator A1-34731-107                        ║
-║                                                          ║
-║ Current Bid:      €500                                   ║
-║ Estimate Range:   €1,200 - €1,800                       ║
-║                                                          ║
-║ AI PREDICTION:    €1,450                                 ║
-║ Confidence:       €1,280 - €1,620 (85% confidence)      ║
-║                                                          ║
-║ Factors:                                                 ║
-║   ✓ 12 followers (above avg)                            ║
-║   ✓ Good condition                                       ║
-║   ✓ 2.4 bids/hour (active)                              ║
-║   - 2015 model (slightly old)                           ║
-║                                                          ║
-║ Recommendation: BUY if below €1,280                      ║
-╚══════════════════════════════════════════════════════════╝
-```
-
-#### 4.2 **Category Intelligence**
-```
-╔══════════════════════════════════════════════════════════╗
-║          ELECTRONICS CATEGORY INTELLIGENCE               ║
-╠══════════════════════════════════════════════════════════╣
-║ Total Lots:           1,243                              ║
-║ Avg Followers:        18.5 (High Interest Category)      ║
-║ Avg Bids:            12.3                                ║
-║ Follower→Bid Rate:    15.2% (above avg 12.8%)           ║
-║                                                          ║
-║ PRICE ANALYSIS:                                          ║
-║   Estimate Accuracy:  92.3%                              ║
-║   Avg Value Gap:      -5.2% (tend to underestimate)     ║
-║   Bargains Found:     87 lots (7%)                       ║
-║                                                          ║
-║ BEST CONDITIONS:                                         ║
-║   "New/Sealed":       Avg 145% of estimate               ║
-║   "Like New":         Avg 112% of estimate               ║
-║   "Used - Good":      Avg 89% of estimate                ║
-║   "Used - Fair":      Avg 62% of estimate                ║
-║                                                          ║
-║ 💡 INSIGHT: Electronics estimates are accurate but      ║
-║    tend to slightly undervalue. Good buying category.    ║
-╚══════════════════════════════════════════════════════════╝
-```
-
---
-
-## Implementation Priority
-
-### Phase 1: Quick Wins (1-2 days)
-1. ✅ **Bargain Hunter Dashboard** - Filter lots by value gap
-2. ✅ **Enhanced Lot Cards** - Show all new fields
-3. ✅ **Opportunity Alerts** - Email/push notifications for bargains
-
-### Phase 2: Analytics (3-5 days)
-4. ✅ **Popularity vs Bidding Dashboard** - Follower analysis
-5. ✅ **Value Gap Heatmap** - Visual overview
-6. ✅ **Auction House Accuracy** - Historical tracking
-
-### Phase 3: Advanced (1-2 weeks)
-7. ✅ **Price Prediction Model** - ML-based predictions
-8. ✅ **Category Intelligence** - Deep category analytics
-9. ✅ **Smart Watchlist** - Personalized alerts
-
---
-
-## Database Queries for Dashboard
-
-### Get Bargain Opportunities
-```sql
-SELECT
-    lot_id,
-    title,
-    current_bid,
-    estimated_min_price,
-    estimated_max_price,
-    followers_count,
-    lot_condition,
-    closing_time,
-    (estimated_min_price - CAST(REPLACE(REPLACE(current_bid, 'EUR ', ''), '€', '') AS REAL)) as value_gap,
-    ((estimated_min_price - CAST(REPLACE(REPLACE(current_bid, 'EUR ', ''), '€', '') AS REAL)) / estimated_min_price * 100) as bargain_score
-FROM lots
-WHERE estimated_min_price IS NOT NULL
-  AND current_bid NOT LIKE '%No bids%'
-  AND CAST(REPLACE(REPLACE(current_bid, 'EUR ', ''), '€', '') AS REAL) < estimated_min_price * 0.80
-  AND followers_count > 3
-  AND datetime(closing_time) > datetime('now')
-ORDER BY bargain_score DESC
-LIMIT 50;
-```
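These queries parse `current_bid` with `REPLACE` and `CAST`. SQLite's `CAST(... AS REAL)` keeps only the longest leading numeric prefix. If a stored bid ever contains a thousands separator (for example `EUR 1,250`), the expression therefore yields `1.0`. The sketch below registers a Python parser through `sqlite3.Connection.create_function` instead. The bid string formats are assumptions, and the closing-time and follower filters are omitted for brevity.

```python
import re
import sqlite3
from typing import Optional


def parse_bid(text: Optional[str]) -> Optional[float]:
    """'EUR 1,250' -> 1250.0, '€500' -> 500.0, 'No bids' -> None.

    Assumes ',' is only ever a thousands separator and '.' the decimal point.
    """
    if not text:
        return None
    number = re.sub(r"[^0-9.]", "", text.replace(",", ""))
    return float(number) if re.fullmatch(r"\d+(\.\d+)?", number) else None


BARGAIN_SQL = """
SELECT lot_id,
       parse_bid(current_bid) AS bid,
       (estimated_min_price - parse_bid(current_bid)) / estimated_min_price * 100 AS bargain_score
FROM lots
WHERE estimated_min_price IS NOT NULL
  AND parse_bid(current_bid) < estimated_min_price * 0.80  -- NULL (no bids) is filtered out
ORDER BY bargain_score DESC
"""

conn = sqlite3.connect(":memory:")
conn.create_function("parse_bid", 1, parse_bid)  # expose the Python parser to SQL
conn.execute("CREATE TABLE lots (lot_id TEXT, current_bid TEXT, estimated_min_price REAL)")
conn.executemany(
    "INSERT INTO lots VALUES (?, ?, ?)",
    [("A", "EUR 1,000", 2000.0), ("B", "No bids", 2000.0), ("C", "€1,900", 2000.0)],
)
print(conn.execute(BARGAIN_SQL).fetchall())
```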
-
-### Get Sleeper Lots
-```sql
-SELECT
-    lot_id,
-    title,
-    followers_count,
-    bid_count,
-    current_bid,
-    estimated_min_price,
-    closing_time,
-    (julianday(closing_time) - julianday('now')) * 24 as hours_remaining
-FROM lots
-WHERE followers_count > 10
-  AND bid_count = 0
-  AND datetime(closing_time) > datetime('now')
-  AND (julianday(closing_time) - julianday('now')) * 24 < 24
-ORDER BY followers_count DESC;
-```
-
-### Get Auction House Accuracy (Historical)
-```sql
--- After lots close
-SELECT
-    category,
-    COUNT(*) as total_lots,
-    100 - AVG(ABS(final_price - (estimated_min_price + estimated_max_price) / 2) /
-        ((estimated_min_price + estimated_max_price) / 2) * 100) as avg_accuracy,
-    AVG(final_price - (estimated_min_price + estimated_max_price) / 2) as avg_bias
-FROM lots
-WHERE estimated_min_price IS NOT NULL
-  AND final_price IS NOT NULL
-  AND datetime(closing_time) < datetime('now')
-GROUP BY category
-ORDER BY avg_accuracy DESC;
-```
-
-### Get Interest Conversion Rate
-```sql
-SELECT
-    COUNT(*) as total_lots,
-    COUNT(CASE WHEN followers_count > 0 THEN 1 END) as lots_with_followers,
-    COUNT(CASE WHEN bid_count > 0 THEN 1 END) as lots_with_bids,
-    -- Share of followed lots that received at least one bid
-    ROUND(COUNT(CASE WHEN followers_count > 0 AND bid_count > 0 THEN 1 END) * 100.0 /
-          NULLIF(COUNT(CASE WHEN followers_count > 0 THEN 1 END), 0), 2) as conversion_rate,
-    AVG(followers_count) as avg_followers,
-    AVG(CASE WHEN bid_count > 0 THEN bid_count END) as avg_bids_when_active
-FROM lots;
-```
-
-### Get Category Intelligence
-```sql
-SELECT
-    category,
-    COUNT(*) as total_lots,
-    AVG(followers_count) as avg_followers,
-    AVG(bid_count) as avg_bids,
-    COUNT(CASE WHEN bid_count > 0 THEN 1 END) * 100.0 / COUNT(*) as bid_rate,
-    COUNT(CASE WHEN followers_count > 0 THEN 1 END) * 100.0 / COUNT(*) as follower_rate,
-    -- Bargain rate
-    COUNT(CASE
-        WHEN estimated_min_price IS NOT NULL
-        AND current_bid NOT LIKE '%No bids%'
-        AND CAST(REPLACE(REPLACE(current_bid, 'EUR ', ''), '€', '') AS REAL) < estimated_min_price * 0.80
-        THEN 1
-    END) as bargains_found
-FROM lots
-WHERE category IS NOT NULL AND category != ''
-GROUP BY category
-HAVING COUNT(*) > 50
-ORDER BY avg_followers DESC;
-```
-
---
-
-## API Requirements
-
-### Real-Time Updates
-For dashboards to stay current, implement periodic scraping:
-
-```python
-# Recommended update frequency
-ACTIVE_LOTS = "Every 15 minutes"  # Lots closing soon
-ALL_LOTS = "Every 4 hours"         # General updates
-NEW_LOTS = "Every 1 hour"          # Check for new listings
-```
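The recommended frequencies can be turned into a per-lot refresh schedule. The sketch below assumes a 24-hour cut-off for "closing soon", which the original does not define. It leaves new-listing discovery to a separate hourly job.

```python
from datetime import datetime, timedelta
from typing import Optional

ACTIVE_LOT_INTERVAL = timedelta(minutes=15)  # lots closing soon
GENERAL_INTERVAL = timedelta(hours=4)        # all other open lots
CLOSING_SOON = timedelta(hours=24)           # assumed threshold


def next_refresh(closing_time: datetime, last_scraped: datetime, now: datetime) -> Optional[datetime]:
    """When to re-scrape a lot; None once it has closed."""
    if closing_time <= now:
        return None
    interval = ACTIVE_LOT_INTERVAL if closing_time - now <= CLOSING_SOON else GENERAL_INTERVAL
    return last_scraped + interval


now = datetime(2025, 12, 8, 12, 0)
print(next_refresh(datetime(2025, 12, 8, 14, 30), now, now))
```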
-
-### Webhook Notifications
-```python
-# Alert types to implement
-BARGAIN_ALERT = "Lot below 80% estimate"
-SLEEPER_ALERT = "10+ followers, 0 bids, <12h remaining"
-HEATING_UP = "Follower growth > 5/hour"
-OVERVALUED = "Bid > 120% high estimate"
-CLOSING_SOON = "Watchlist item < 1h remaining"
-```
-
---
-
-## Migration Scripts to Run
-
-To populate new fields for existing 16,807 lots:
-
-```bash
-# High priority - enriches all lots with new intelligence
-python enrich_existing_lots.py
-# Time: ~2.3 hours
-# Benefit: Enables all dashboard features immediately
-
-# Medium priority - adds bid history intelligence
-python fetch_missing_bid_history.py
-# Time: ~15 minutes
-# Benefit: Bid velocity, timing analysis
-```
-
-**Note:** Future scrapes will automatically capture all fields, so migration is optional but recommended for immediate dashboard functionality.
-
---
-
-## Expected Impact
-
-### Before New Fields:
- Basic price tracking
- Simple bid monitoring
- Limited opportunity detection
-
-### After New Fields:
- **80% more intelligence** per lot
- Advanced opportunity detection (bargains, sleepers)
- Price prediction capability
- Auction house accuracy tracking
- Category-specific insights
- Interest→Bid conversion analytics
- Real-time popularity tracking
-
-### ROI Potential:
-```
-Example Scenario:
- User finds bargain: €500 current bid, €1,200-€1,800 estimate
- Buys at: €600 (after competition)
- Resells at: €1,400 (within estimate range)
- Gross profit: €800 (before buyer's premium, VAT and resale costs)
-
-Dashboard Value: Automated detection of 87 such opportunities
-Potential Value: 87 × €800 = €69,600 in identified opportunities (if each yielded the same €800)
-```
-
---
-
-## Monitoring & Success Metrics
-
-Track dashboard effectiveness:
-
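-```python
-# User engagement: share of alerts that led to a bid
-def alert_conversion_rate(opportunities_shown: int, opportunities_acted_on: int) -> float:
-    return opportunities_acted_on / opportunities_shown if opportunities_shown else 0.0
-
-# Accuracy (precision): share of flagged lots that actually closed below the low estimate
-def bargain_precision(flagged_lot_ids: set[str], closed_below_estimate_ids: set[str]) -> float:
-    if not flagged_lot_ids:
-        return 0.0
-    return len(flagged_lot_ids & closed_below_estimate_ids) / len(flagged_lot_ids)
-
-# Value: total and average discount on lots that closed below the low estimate
-def opportunity_value(closed_lots: list[dict]) -> tuple[float, float]:
-    gaps = [lot["estimated_min"] - lot["final_price"]
-            for lot in closed_lots if lot["final_price"] < lot["estimated_min"]]
-    total = sum(gaps)
-    return total, (total / len(gaps) if gaps else 0.0)
-```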
-
----
-
-## Next Steps
-
-1. **Immediate (Today):**
-   - ✅ Run `enrich_existing_lots.py` to populate new fields
-   - ✅ Update dashboard to display new fields
-
-2. **This Week:**
-   - Implement Bargain Hunter Dashboard
-   - Add opportunity alerts
-   - Create enhanced lot cards
-
-3. **Next Week:**
-   - Build analytics dashboards
-   - Implement price prediction model
-   - Set up webhook notifications
-
-4. **Future:**
-   - A/B test alert strategies
-   - Refine prediction models with historical data
-   - Add category-specific recommendations
-
----
-
-## Conclusion
-
-The scraper now captures **5 critical intelligence fields** that unlock advanced analytics:
-
-| Field | Dashboard Impact |
-|-------|------------------|
-| followers_count | Popularity tracking, sleeper detection |
-| estimated_min_price | Bargain detection, value assessment |
-| estimated_max_price | Overvaluation alerts, ROI calculation |
-| lot_condition | Quality filtering, restoration opportunities |
-| appearance | Visual assessment, detailed condition |
-
-**Combined with fixed data quality** (99.9% fewer orphaned lots, 100% auction completeness), the dashboard can now provide:
-
-- 🎯 **Opportunity Detection** - Automated bargain hunting
-- 📊 **Predictive Analytics** - ML-based price predictions
-- 📈 **Category Intelligence** - Deep market insights
-- ⚡ **Real-Time Alerts** - Instant opportunity notifications
-- 💰 **ROI Tracking** - Measure investment potential
-
-**Estimated intelligence value increase: 80%+**
-
-Ready to build! 🚀
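The success metrics in the removed dashboard notes above are written as pseudocode (`COUNT`/`SUM` mixed into Python). A minimal runnable sketch over an in-memory list of lot outcomes follows; the `LotOutcome` record and its field names are hypothetical. One deliberate difference: `prediction_accuracy` here counts only flagged lots that actually closed below their estimate (precision), because the original ratio divides all below-estimate lots by flagged lots and can therefore exceed 1.

```python
from dataclasses import dataclass


@dataclass
class LotOutcome:
    # Hypothetical record shape; field names are assumptions, not the real schema.
    flagged_bargain: bool   # dashboard flagged the lot as a bargain
    alert_acted_on: bool    # user bid after seeing the alert
    estimated_min: float    # lower bound of the auction-house estimate (EUR)
    final_price: float      # closing price (EUR)


def dashboard_metrics(lots: list[LotOutcome]) -> dict[str, float]:
    shown = sum(1 for lot in lots if lot.flagged_bargain)
    acted = sum(1 for lot in lots if lot.flagged_bargain and lot.alert_acted_on)
    closed_below = [lot for lot in lots if lot.final_price < lot.estimated_min]
    # Precision of the flags: flagged lots that really closed below estimate.
    flagged_and_below = sum(1 for lot in closed_below if lot.flagged_bargain)
    total_value = sum(lot.estimated_min - lot.final_price for lot in closed_below)
    return {
        "conversion_rate": acted / shown if shown else 0.0,
        "prediction_accuracy": flagged_and_below / shown if shown else 0.0,
        "total_opportunity_value": total_value,
        "avg_opportunity_value": total_value / len(closed_below) if closed_below else 0.0,
    }
```

Empty input returns zeros instead of raising `ZeroDivisionError`, which matters on a fresh database with no closed lots yet.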
--- a/docs/RUN_INSTRUCTIONS.md
+++ b/docs/RUN_INSTRUCTIONS.md
@@ -1,164 +0,0 @@
-# Troostwijk Auction Extractor - Run Instructions
-
-## Fixed Warnings
-
-All warnings have been resolved:
-- ✅ SLF4J logging configured (slf4j-simple)
-- ✅ Native access enabled for SQLite JDBC
-- ✅ Logging output controlled via simplelogger.properties
-
-## Prerequisites
-
-1. **Java 21** installed
-2. **Maven** installed
-3. **IntelliJ IDEA** (recommended) or command line
-
-## Setup (First Time Only)
-
-### 1. Install Dependencies
-
-In IntelliJ Terminal or PowerShell:
-
-```bash
-# Reload Maven dependencies
-mvn clean install
-
-# Install Playwright browser binaries (first time only)
-mvn exec:java -e -Dexec.mainClass=com.microsoft.playwright.CLI -Dexec.args="install"
-```
-
-## Running the Application
-
-### Option A: Using IntelliJ IDEA (Easiest)
-
-1. **Add VM Options for native access:**
-   - Run → Edit Configurations
-   - Select or create configuration for `TroostwijkAuctionExtractor`
-   - In "VM options" field, add:
-     ```
-     --enable-native-access=ALL-UNNAMED
-     ```
-
-2. **Add Program Arguments (optional):**
-   - In "Program arguments" field, add:
-     ```
-     --max-visits 3
-     ```
-
-3. **Run the application:**
-   - Click the green Run button
-
-### Option B: Using Maven (Command Line)
-
-```bash
-# Run with 3 page limit
-mvn exec:java
-
-# Run with custom arguments (override pom.xml defaults)
-mvn exec:java -Dexec.args="--max-visits 5"
-
-# Run without cache
-mvn exec:java -Dexec.args="--no-cache --max-visits 2"
-
-# Run with unlimited visits
-mvn exec:java -Dexec.args=""
-```
-
-### Option C: Using Java Directly
-
-```bash
-# Compile first
-mvn clean compile
-
-# Run with native access enabled
-java --enable-native-access=ALL-UNNAMED \
-  -cp target/classes:$(mvn dependency:build-classpath -Dmdep.outputFile=/dev/stdout -q) \
-  com.auction.TroostwijkAuctionExtractor --max-visits 3
-```
-
-## Command Line Arguments
-
-```
---max-visits <n>   Limit actual page fetches to n (0 = unlimited, default)
---no-cache         Disable page caching
---help             Show help message
-```
-
-## Examples
-
-### Test with 3 page visits (cached pages don't count):
-```bash
-mvn exec:java -Dexec.args="--max-visits 3"
-```
-
-### Fresh extraction without cache:
-```bash
-mvn exec:java -Dexec.args="--no-cache --max-visits 5"
-```
-
-### Full extraction (all pages, unlimited):
-```bash
-mvn exec:java -Dexec.args=""
-```
-
-## Expected Output (No Warnings)
-
-```
-=== Troostwijk Auction Extractor ===
-Max page visits set to: 3
-
-Initializing Playwright browser...
-✓ Browser ready
-✓ Cache database initialized
-
-Starting auction extraction from https://www.troostwijkauctions.com/auctions
-
-[Page 1] Fetching auctions...
-  ✓ Fetched from website (visit 1/3)
-  ✓ Found 20 auctions
-
-[Page 2] Fetching auctions...
-  ✓ Loaded from cache
-  ✓ Found 20 auctions
-
-[Page 3] Fetching auctions...
-  ✓ Fetched from website (visit 2/3)
-  ✓ Found 20 auctions
-
-✓ Total auctions extracted: 60
-
-=== Results ===
-Total auctions found: 60
-Dutch auctions (NL): 45
-Actual page visits: 2
-
-✓ Browser and cache closed
-```
-
-## Cache Management
-
-- Cache is stored in: `cache/page_cache.db`
-- Cache expires after: 24 hours (configurable in code)
-- To clear cache: Delete `cache/page_cache.db` file
-
-## Troubleshooting
-
-### If you still see warnings:
-
-1. **Reload Maven project in IntelliJ:**
-   - Right-click `pom.xml` → Maven → Reload project
-
-2. **Verify VM options:**
-   - Ensure `--enable-native-access=ALL-UNNAMED` is in VM options
-
-3. **Clean and rebuild:**
-   ```bash
-   mvn clean install
-   ```
-
-### If Playwright fails:
-
-```bash
-# Reinstall browser binaries
-mvn exec:java -e -Dexec.mainClass=com.microsoft.playwright.CLI -Dexec.args="install chromium"
-```
--- a/requirements.txt
+++ b/requirements.txt
@@ -5,6 +5,11 @@
 playwright>=1.40.0
 aiohttp>=3.9.0  # Optional: only needed if DOWNLOAD_IMAGES=True

+# ORM groundwork (gradual adoption)
+SQLAlchemy>=2.0  # Modern ORM (2.x) — groundwork for PostgreSQL
+# PostgreSQL driver (runtime)
+psycopg[binary]>=3.1
+
 # Development/Testing
 pytest>=7.4.0  # Optional: for testing
 pytest-asyncio>=0.21.0  # Optional: for async tests
--- a/script/fix_malformed_entries.py
+++ b/script/fix_malformed_entries.py
@@ -1,290 +0,0 @@
-#!/usr/bin/env python3
-"""
-Script to detect and fix malformed/incomplete database entries.
-
-Identifies entries with:
-- Missing auction_id for auction pages
-- Missing title
-- Invalid bid values like "€Huidig bod"
-- "gap" in closing_time
-- Empty or invalid critical fields
-
-Then re-parses from cache and updates.
-"""
-import sys
-import sqlite3
-import zlib
-from pathlib import Path
-from typing import List, Dict, Tuple
-
-sys.path.insert(0, str(Path(__file__).parent.parent / 'src'))
-
-from parse import DataParser
-from config import CACHE_DB
-
-
-class MalformedEntryFixer:
-    """Detects and fixes malformed database entries"""
-
-    def __init__(self, db_path: str):
-        self.db_path = db_path
-        self.parser = DataParser()
-
-    def detect_malformed_auctions(self) -> List[Tuple]:
-        """Find auctions with missing or invalid data"""
-        with sqlite3.connect(self.db_path) as conn:
-            # Auctions with issues
-            cursor = conn.execute("""
-                SELECT auction_id, url, title, first_lot_closing_time
-                FROM auctions
-                WHERE
-                    auction_id = '' OR auction_id IS NULL
-                    OR title = '' OR title IS NULL
-                    OR first_lot_closing_time = 'gap'
-                    OR first_lot_closing_time LIKE '%wegens vereffening%'
-            """)
-            return cursor.fetchall()
-
-    def detect_malformed_lots(self) -> List[Tuple]:
-        """Find lots with missing or invalid data"""
-        with sqlite3.connect(self.db_path) as conn:
-            cursor = conn.execute("""
-                SELECT lot_id, url, title, current_bid, closing_time
-                FROM lots
-                WHERE
-                    auction_id = '' OR auction_id IS NULL
-                    OR title = '' OR title IS NULL
-                    OR current_bid LIKE '%Huidig%bod%'
-                    OR current_bid = '€Huidig bod'
-                    OR closing_time = 'gap'
-                    OR closing_time = ''
-                    OR closing_time LIKE '%wegens vereffening%'
-            """)
-            return cursor.fetchall()
-
-    def get_cached_content(self, url: str) -> str:
-        """Retrieve and decompress cached HTML for a URL"""
-        with sqlite3.connect(self.db_path) as conn:
-            cursor = conn.execute(
-                "SELECT content FROM cache WHERE url = ?",
-                (url,)
-            )
-            row = cursor.fetchone()
-            if row and row[0]:
-                try:
-                    return zlib.decompress(row[0]).decode('utf-8')
-                except Exception as e:
-                    print(f"  ❌ Failed to decompress: {e}")
-                    return None
-            return None
-
-    def reparse_and_fix_auction(self, auction_id: str, url: str, dry_run: bool = False) -> bool:
-        """Re-parse auction page from cache and update database"""
-        print(f"\n  Fixing auction: {auction_id}")
-        print(f"    URL: {url}")
-
-        content = self.get_cached_content(url)
-        if not content:
-            print(f"    ❌ No cached content found")
-            return False
-
-        # Re-parse using current parser
-        parsed = self.parser.parse_page(content, url)
-        if not parsed or parsed.get('type') != 'auction':
-            print(f"    ❌ Could not parse as auction")
-            return False
-
-        # Validate parsed data
-        if not parsed.get('auction_id') or not parsed.get('title'):
-            print(f"    ⚠️  Re-parsed data still incomplete:")
-            print(f"       auction_id: {parsed.get('auction_id')}")
-            print(f"       title: {parsed.get('title', '')[:50]}")
-            return False
-
-        print(f"    ✓ Parsed successfully:")
-        print(f"       auction_id: {parsed.get('auction_id')}")
-        print(f"       title: {parsed.get('title', '')[:50]}")
-        print(f"       location: {parsed.get('location', 'N/A')}")
-        print(f"       lots: {parsed.get('lots_count', 0)}")
-
-        if not dry_run:
-            with sqlite3.connect(self.db_path) as conn:
-                conn.execute("""
-                    UPDATE auctions SET
-                        auction_id = ?,
-                        title = ?,
-                        location = ?,
-                        lots_count = ?,
-                        first_lot_closing_time = ?
-                    WHERE url = ?
-                """, (
-                    parsed['auction_id'],
-                    parsed['title'],
-                    parsed.get('location', ''),
-                    parsed.get('lots_count', 0),
-                    parsed.get('first_lot_closing_time', ''),
-                    url
-                ))
-                conn.commit()
-                print(f"    ✓ Database updated")
-
-        return True
-
-    def reparse_and_fix_lot(self, lot_id: str, url: str, dry_run: bool = False) -> bool:
-        """Re-parse lot page from cache and update database"""
-        print(f"\n  Fixing lot: {lot_id}")
-        print(f"    URL: {url}")
-
-        content = self.get_cached_content(url)
-        if not content:
-            print(f"    ❌ No cached content found")
-            return False
-
-        # Re-parse using current parser
-        parsed = self.parser.parse_page(content, url)
-        if not parsed or parsed.get('type') != 'lot':
-            print(f"    ❌ Could not parse as lot")
-            return False
-
-        # Validate parsed data
-        issues = []
-        if not parsed.get('lot_id'):
-            issues.append("missing lot_id")
-        if not parsed.get('title'):
-            issues.append("missing title")
-        if parsed.get('current_bid', '').lower().startswith('€huidig'):
-            issues.append("invalid bid format")
-
-        if issues:
-            print(f"    ⚠️  Re-parsed data still has issues: {', '.join(issues)}")
-            print(f"       lot_id: {parsed.get('lot_id')}")
-            print(f"       title: {parsed.get('title', '')[:50]}")
-            print(f"       bid: {parsed.get('current_bid')}")
-            return False
-
-        print(f"    ✓ Parsed successfully:")
-        print(f"       lot_id: {parsed.get('lot_id')}")
-        print(f"       auction_id: {parsed.get('auction_id')}")
-        print(f"       title: {parsed.get('title', '')[:50]}")
-        print(f"       bid: {parsed.get('current_bid')}")
-        print(f"       closing: {parsed.get('closing_time', 'N/A')}")
-
-        if not dry_run:
-            with sqlite3.connect(self.db_path) as conn:
-                conn.execute("""
-                    UPDATE lots SET
-                        lot_id = ?,
-                        auction_id = ?,
-                        title = ?,
-                        current_bid = ?,
-                        bid_count = ?,
-                        closing_time = ?,
-                        viewing_time = ?,
-                        pickup_date = ?,
-                        location = ?,
-                        description = ?,
-                        category = ?
-                    WHERE url = ?
-                """, (
-                    parsed['lot_id'],
-                    parsed.get('auction_id', ''),
-                    parsed['title'],
-                    parsed.get('current_bid', ''),
-                    parsed.get('bid_count', 0),
-                    parsed.get('closing_time', ''),
-                    parsed.get('viewing_time', ''),
-                    parsed.get('pickup_date', ''),
-                    parsed.get('location', ''),
-                    parsed.get('description', ''),
-                    parsed.get('category', ''),
-                    url
-                ))
-                conn.commit()
-                print(f"    ✓ Database updated")
-
-        return True
-
-    def run(self, dry_run: bool = False):
-        """Main execution - detect and fix all malformed entries"""
-        print("="*70)
-        print("MALFORMED ENTRY DETECTION AND REPAIR")
-        print("="*70)
-
-        # Check for auctions
-        print("\n1. CHECKING AUCTIONS...")
-        malformed_auctions = self.detect_malformed_auctions()
-        print(f"   Found {len(malformed_auctions)} malformed auction entries")
-
-        stats = {'auctions_fixed': 0, 'auctions_failed': 0}
-        for auction_id, url, title, closing_time in malformed_auctions:
-            try:
-                if self.reparse_and_fix_auction(auction_id or url.split('/')[-1], url, dry_run):
-                    stats['auctions_fixed'] += 1
-                else:
-                    stats['auctions_failed'] += 1
-            except Exception as e:
-                print(f"    ❌ Error: {e}")
-                stats['auctions_failed'] += 1
-
-        # Check for lots
-        print("\n2. CHECKING LOTS...")
-        malformed_lots = self.detect_malformed_lots()
-        print(f"   Found {len(malformed_lots)} malformed lot entries")
-
-        stats['lots_fixed'] = 0
-        stats['lots_failed'] = 0
-        for lot_id, url, title, bid, closing_time in malformed_lots:
-            try:
-                if self.reparse_and_fix_lot(lot_id or url.split('/')[-1], url, dry_run):
-                    stats['lots_fixed'] += 1
-                else:
-                    stats['lots_failed'] += 1
-            except Exception as e:
-                print(f"    ❌ Error: {e}")
-                stats['lots_failed'] += 1
-
-        # Summary
-        print("\n" + "="*70)
-        print("SUMMARY")
-        print("="*70)
-        print(f"Auctions:")
-        print(f"  - Found:  {len(malformed_auctions)}")
-        print(f"  - Fixed:  {stats['auctions_fixed']}")
-        print(f"  - Failed: {stats['auctions_failed']}")
-        print(f"\nLots:")
-        print(f"  - Found:  {len(malformed_lots)}")
-        print(f"  - Fixed:  {stats['lots_fixed']}")
-        print(f"  - Failed: {stats['lots_failed']}")
-
-        if dry_run:
-            print("\n⚠️  DRY RUN - No changes were made to the database")
-
-
-def main():
-    import argparse
-
-    parser = argparse.ArgumentParser(
-        description="Detect and fix malformed database entries"
-    )
-    parser.add_argument(
-        '--db',
-        default=CACHE_DB,
-        help='Path to cache database'
-    )
-    parser.add_argument(
-        '--dry-run',
-        action='store_true',
-        help='Show what would be done without making changes'
-    )
-
-    args = parser.parse_args()
-
-    print(f"Database: {args.db}")
-    print(f"Dry run:  {args.dry_run}\n")
-
-    fixer = MalformedEntryFixer(args.db)
-    fixer.run(dry_run=args.dry_run)
-
-
-if __name__ == "__main__":
-    main()
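The removed `fix_malformed_entries.py` encoded its detection rules as SQLite `WHERE` clauses. If the same checks are needed against PostgreSQL, note that SQLite's `LIKE` is case-insensitive for ASCII while PostgreSQL's `LIKE` is case-sensitive (`ILIKE` is the insensitive form). A minimal sketch of the lot rules as a pure-Python predicate follows; the name `lot_issues` is hypothetical, and unlike the SQL it treats a NULL `closing_time` the same as an empty one.

```python
import re

# "Huidig bod" is Dutch for "current bid": a UI label scraped instead of a price.
# re.IGNORECASE mirrors SQLite's case-insensitive LIKE '%Huidig%bod%'.
_BID_PLACEHOLDER = re.compile(r"huidig.*bod", re.IGNORECASE)
# "wegens vereffening" (roughly "due to liquidation") scraped into a date field.
_LIQUIDATION_NOTE = "wegens vereffening"


def lot_issues(lot: dict) -> list[str]:
    """Return the reasons a lot row looks malformed; an empty list means it looks fine."""
    issues = []
    if not lot.get("auction_id"):
        issues.append("missing auction_id")
    if not lot.get("title"):
        issues.append("missing title")
    if _BID_PLACEHOLDER.search(lot.get("current_bid") or ""):
        issues.append("placeholder bid")
    closing = lot.get("closing_time") or ""  # None (SQL NULL) counts as empty here
    if closing in ("", "gap") or _LIQUIDATION_NOTE in closing.lower():
        issues.append("invalid closing_time")
    return issues
```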
--- a/script/migrate_compress_cache.py
+++ b/script/migrate_compress_cache.py
@@ -1,139 +0,0 @@
-#!/usr/bin/env python3
-"""
-Migrate uncompressed cache entries to compressed format
-This script compresses all cache entries where compressed=0
-"""
-
-import sqlite3
-import zlib
-import time
-
-CACHE_DB = "/mnt/okcomputer/output/cache.db"
-
-def migrate_cache():
-    """Compress all uncompressed cache entries"""
-
-    with sqlite3.connect(CACHE_DB) as conn:
-        # Get uncompressed entries
-        cursor = conn.execute(
-            "SELECT url, content FROM cache WHERE compressed = 0 OR compressed IS NULL"
-        )
-        uncompressed = cursor.fetchall()
-
-        if not uncompressed:
-            print("✓ No uncompressed entries found. All cache is already compressed!")
-            return
-
-        print(f"Found {len(uncompressed)} uncompressed cache entries")
-        print("Starting compression...")
-
-        total_original_size = 0
-        total_compressed_size = 0
-        compressed_count = 0
-
-        for url, content in uncompressed:
-            try:
-                # Handle both text and bytes
-                if isinstance(content, str):
-                    content_bytes = content.encode('utf-8')
-                else:
-                    content_bytes = content
-
-                original_size = len(content_bytes)
-
-                # Compress
-                compressed_content = zlib.compress(content_bytes, level=9)
-                compressed_size = len(compressed_content)
-
-                # Update in database
-                conn.execute(
-                    "UPDATE cache SET content = ?, compressed = 1 WHERE url = ?",
-                    (compressed_content, url)
-                )
-
-                total_original_size += original_size
-                total_compressed_size += compressed_size
-                compressed_count += 1
-
-                if compressed_count % 100 == 0:
-                    conn.commit()
-                    ratio = (1 - total_compressed_size / total_original_size) * 100
-                    print(f"  Compressed {compressed_count}/{len(uncompressed)} entries... "
-                          f"({ratio:.1f}% reduction so far)")
-
-            except Exception as e:
-                print(f"  ERROR compressing {url}: {e}")
-                continue
-
-        # Final commit
-        conn.commit()
-
-        # Calculate final statistics
-        ratio = (1 - total_compressed_size / total_original_size) * 100 if total_original_size > 0 else 0
-        size_saved_mb = (total_original_size - total_compressed_size) / (1024 * 1024)
-
-        print("\n" + "="*60)
-        print("MIGRATION COMPLETE")
-        print("="*60)
-        print(f"Entries compressed: {compressed_count}")
-        print(f"Original size:      {total_original_size / (1024*1024):.2f} MB")
-        print(f"Compressed size:    {total_compressed_size / (1024*1024):.2f} MB")
-        print(f"Space saved:        {size_saved_mb:.2f} MB")
-        print(f"Compression ratio:  {ratio:.1f}%")
-        print("="*60)
-
-def verify_migration():
-    """Verify all entries are compressed"""
-    with sqlite3.connect(CACHE_DB) as conn:
-        cursor = conn.execute(
-            "SELECT COUNT(*) FROM cache WHERE compressed = 0 OR compressed IS NULL"
-        )
-        uncompressed_count = cursor.fetchone()[0]
-
-        cursor = conn.execute("SELECT COUNT(*) FROM cache WHERE compressed = 1")
-        compressed_count = cursor.fetchone()[0]
-
-        print("\nVERIFICATION:")
-        print(f"  Compressed entries:   {compressed_count}")
-        print(f"  Uncompressed entries: {uncompressed_count}")
-
-        if uncompressed_count == 0:
-            print("  ✓ All cache entries are compressed!")
-            return True
-        else:
-            print("  ✗ Some entries are still uncompressed")
-            return False
-
-def get_db_size():
-    """Get current database file size"""
-    import os
-    if os.path.exists(CACHE_DB):
-        size_mb = os.path.getsize(CACHE_DB) / (1024 * 1024)
-        return size_mb
-    return 0
-
-if __name__ == "__main__":
-    print("Cache Compression Migration Tool")
-    print("="*60)
-
-    # Show initial DB size
-    initial_size = get_db_size()
-    print(f"Initial database size: {initial_size:.2f} MB\n")
-
-    # Run migration
-    start_time = time.time()
-    migrate_cache()
-    elapsed = time.time() - start_time
-
-    print(f"\nTime taken: {elapsed:.2f} seconds")
-
-    # Verify
-    verify_migration()
-
-    # Show final DB size
-    final_size = get_db_size()
-    print(f"\nFinal database size: {final_size:.2f} MB")
-    print(f"Database size reduced by: {initial_size - final_size:.2f} MB")
-
-    print("\n✓ Migration complete! You can now run VACUUM to reclaim disk space:")
-    print("  sqlite3 /mnt/okcomputer/output/cache.db 'VACUUM;'")
--- a/script/migrate_reparse_lots.py
+++ b/script/migrate_reparse_lots.py
@@ -1,180 +0,0 @@
-#!/usr/bin/env python3
-"""
-Migration script to re-parse cached HTML pages and update database entries.
-Fixes issues with incomplete data extraction from earlier scrapes.
-"""
-import sys
-import sqlite3
-from pathlib import Path
-
-# Add src to path
-sys.path.insert(0, str(Path(__file__).parent.parent / 'src'))
-
-from parse import DataParser
-from config import CACHE_DB
-
-
-def reparse_and_update_lots(db_path: str = CACHE_DB, dry_run: bool = False):
-    """
-    Re-parse cached HTML pages and update lot entries in the database.
-
-    This extracts improved data from __NEXT_DATA__ JSON blobs that may have been
-    missed in earlier scraping runs when validation was less strict.
-    """
-    parser = DataParser()
-
-    with sqlite3.connect(db_path) as conn:
-        # Get all cached lot pages
-        cursor = conn.execute("""
-            SELECT url, content
-            FROM cache
-            WHERE url LIKE '%/l/%'
-            ORDER BY timestamp DESC
-        """)
-
-        cached_pages = cursor.fetchall()
-        print(f"Found {len(cached_pages)} cached lot pages to re-parse")
-
-        stats = {
-            'processed': 0,
-            'updated': 0,
-            'skipped': 0,
-            'errors': 0
-        }
-
-        for url, compressed_content in cached_pages:
-            try:
-                # Decompress content
-                import zlib
-                content = zlib.decompress(compressed_content).decode('utf-8')
-
-                # Re-parse using current parser logic
-                parsed_data = parser.parse_page(content, url)
-
-                if not parsed_data or parsed_data.get('type') != 'lot':
-                    stats['skipped'] += 1
-                    continue
-
-                lot_id = parsed_data.get('lot_id', '')
-                if not lot_id:
-                    print(f"  ⚠️  No lot_id for {url}")
-                    stats['skipped'] += 1
-                    continue
-
-                # Check if lot exists
-                existing = conn.execute(
-                    "SELECT lot_id FROM lots WHERE lot_id = ?",
-                    (lot_id,)
-                ).fetchone()
-
-                if not existing:
-                    print(f"  → New lot: {lot_id}")
-                    # Insert new lot
-                    if not dry_run:
-                        conn.execute("""
-                            INSERT INTO lots
-                            (lot_id, auction_id, url, title, current_bid, bid_count,
-                             closing_time, viewing_time, pickup_date, location,
-                             description, category, scraped_at)
-                            VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
-                        """, (
-                            lot_id,
-                            parsed_data.get('auction_id', ''),
-                            url,
-                            parsed_data.get('title', ''),
-                            parsed_data.get('current_bid', ''),
-                            parsed_data.get('bid_count', 0),
-                            parsed_data.get('closing_time', ''),
-                            parsed_data.get('viewing_time', ''),
-                            parsed_data.get('pickup_date', ''),
-                            parsed_data.get('location', ''),
-                            parsed_data.get('description', ''),
-                            parsed_data.get('category', ''),
-                            parsed_data.get('scraped_at', '')
-                        ))
-                        stats['updated'] += 1
-                else:
-                    # Update existing lot with newly parsed data
-                    # Only update fields that are now populated but weren't before
-                    if not dry_run:
-                        conn.execute("""
-                            UPDATE lots SET
-                                auction_id = COALESCE(NULLIF(?, ''), auction_id),
-                                title = COALESCE(NULLIF(?, ''), title),
-                                current_bid = COALESCE(NULLIF(?, ''), current_bid),
-                                bid_count = CASE WHEN ? > 0 THEN ? ELSE bid_count END,
-                                closing_time = COALESCE(NULLIF(?, ''), closing_time),
-                                viewing_time = COALESCE(NULLIF(?, ''), viewing_time),
-                                pickup_date = COALESCE(NULLIF(?, ''), pickup_date),
-                                location = COALESCE(NULLIF(?, ''), location),
-                                description = COALESCE(NULLIF(?, ''), description),
-                                category = COALESCE(NULLIF(?, ''), category)
-                            WHERE lot_id = ?
-                        """, (
-                            parsed_data.get('auction_id', ''),
-                            parsed_data.get('title', ''),
-                            parsed_data.get('current_bid', ''),
-                            parsed_data.get('bid_count', 0),
-                            parsed_data.get('bid_count', 0),
-                            parsed_data.get('closing_time', ''),
-                            parsed_data.get('viewing_time', ''),
-                            parsed_data.get('pickup_date', ''),
-                            parsed_data.get('location', ''),
-                            parsed_data.get('description', ''),
-                            parsed_data.get('category', ''),
-                            lot_id
-                        ))
-                        stats['updated'] += 1
-
-                    print(f"  ✓ Updated: {lot_id[:20]}")
-
-                # Update images if they exist
-                images = parsed_data.get('images', [])
-                if images and not dry_run:
-                    for img_url in images:
-                        conn.execute("""
-                            INSERT OR IGNORE INTO images (lot_id, url)
-                            VALUES (?, ?)
-                        """, (lot_id, img_url))
-
-                stats['processed'] += 1
-
-                if stats['processed'] % 100 == 0:
-                    print(f"  Progress: {stats['processed']}/{len(cached_pages)}")
-                    if not dry_run:
-                        conn.commit()
-
-            except Exception as e:
-                print(f"  ❌ Error processing {url}: {e}")
-                stats['errors'] += 1
-                continue
-
-        if not dry_run:
-            conn.commit()
-
-        print("\n" + "="*60)
-        print("MIGRATION COMPLETE")
-        print("="*60)
-        print(f"Processed: {stats['processed']}")
-        print(f"Updated:   {stats['updated']}")
-        print(f"Skipped:   {stats['skipped']}")
-        print(f"Errors:    {stats['errors']}")
-
-        if dry_run:
-            print("\n⚠️  DRY RUN - No changes were made to the database")
-
-
-if __name__ == "__main__":
-    import argparse
-
-    parser = argparse.ArgumentParser(description="Re-parse and update lot entries from cached HTML")
-    parser.add_argument('--db', default=CACHE_DB, help='Path to cache database')
-    parser.add_argument('--dry-run', action='store_true', help='Show what would be done without making changes')
-
-    args = parser.parse_args()
-
-    print(f"Database: {args.db}")
-    print(f"Dry run:  {args.dry_run}")
-    print()
-
-    reparse_and_update_lots(args.db, args.dry_run)
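The `UPDATE` in the removed re-parse migration overwrites a column only when the freshly parsed value is non-empty: `COALESCE(NULLIF(?, ''), column)` keeps the stored value when the new one is `''` or NULL, and `bid_count` is replaced only when the new count is positive. A minimal Python sketch of that merge rule follows (the function name is hypothetical). `COALESCE` and `NULLIF` are standard SQL and behave the same in PostgreSQL, although psycopg uses `%s` placeholders instead of `?`.

```python
def merge_lot(existing: dict, parsed: dict, fields: tuple[str, ...]) -> dict:
    """Return a copy of `existing` updated with non-empty values from `parsed`."""
    merged = dict(existing)  # do not mutate the caller's row
    for name in fields:
        value = parsed.get(name, "")
        # COALESCE(NULLIF(new, ''), old): '' and None both keep the old value.
        if value not in ("", None):
            merged[name] = value
    # bid_count = CASE WHEN new > 0 THEN new ELSE old END
    new_bids = parsed.get("bid_count") or 0
    if new_bids > 0:
        merged["bid_count"] = new_bids
    return merged
```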
--- a/src/cache.py
+++ b/src/cache.py
--- a/src/config.py
+++ b/src/config.py
@@ -15,7 +15,36 @@ if sys.version_info < (3, 10):

 # ==================== CONFIGURATION ====================
 BASE_URL = "https://www.troostwijkauctions.com"
-CACHE_DB = "/mnt/okcomputer/output/cache.db"
+POSTGRES_HOST = os.getenv("POSTGRES_HOST", "postgres")
+POSTGRES_DB = os.getenv("POSTGRES_DB", "auctiondb")
+POSTGRES_USER = os.getenv("POSTGRES_USER", "auction")
+POSTGRES_PASSWORD = os.getenv("POSTGRES_PASSWORD", "heel-goed-wachtwoord")
+# Full DSN
+DATABASE_URL = os.getenv(
+    "DATABASE_URL",
+    f"postgresql://{POSTGRES_USER}:{POSTGRES_PASSWORD}@{POSTGRES_HOST}:5432/{POSTGRES_DB}"
+).strip()
+
+# Primary database: PostgreSQL only
+# Override via environment variable DATABASE_URL
+# Example: postgresql://user:pass@host:5432/dbname
+# DATABASE_URL = os.getenv(
+#    "DATABASE_URL",
+#    # Default provided by ops
+#    "postgresql://auction:heel-goed-wachtwoord@192.168.1.159:5432/auctiondb",
+# ).strip()
+
+# Database connection pool controls (to avoid creating too many short-lived TCP connections)
+# Environment overrides: SCAEV_DB_POOL_MIN, SCAEV_DB_POOL_MAX, SCAEV_DB_POOL_TIMEOUT
+def _int_env(name: str, default: int) -> int:
+    try:
+        return int(os.getenv(name, str(default)))
+    except ValueError:
+        return default
+
+DB_POOL_MIN = _int_env("SCAEV_DB_POOL_MIN", 1)
+DB_POOL_MAX = _int_env("SCAEV_DB_POOL_MAX", 6)
+DB_POOL_TIMEOUT = _int_env("SCAEV_DB_POOL_TIMEOUT", 30)  # seconds to wait for a pooled connection
 OUTPUT_DIR = "/mnt/okcomputer/output"
 IMAGES_DIR = "/mnt/okcomputer/output/images"
 RATE_LIMIT_SECONDS = 0.5  # EXACTLY 0.5 seconds between requests
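`DATABASE_URL` above is assembled with an f-string, so a password containing characters such as `@`, `:` or `/` yields a URL that parses to the wrong host or credentials; in addition, a `DATABASE_URL` variable that is set but blank becomes an empty string after `.strip()` instead of using the fallback. A minimal sketch that percent-encodes the credentials with `urllib.parse.quote` (libpq and SQLAlchemy both decode percent-encoding in URLs) and falls back when the variable is blank follows. `build_database_url` and the `POSTGRES_PORT` variable are hypothetical, and the password default is left empty here rather than repeating the hardcoded one.

```python
import os
from collections.abc import Mapping
from urllib.parse import quote


def build_database_url(env: Mapping[str, str] = os.environ) -> str:
    """Return DATABASE_URL if set and non-blank, else build one from POSTGRES_* variables."""
    explicit = env.get("DATABASE_URL", "").strip()
    if explicit:
        return explicit
    # safe="" also escapes "/", so every reserved character in the credentials is encoded.
    user = quote(env.get("POSTGRES_USER", "auction"), safe="")
    password = quote(env.get("POSTGRES_PASSWORD", ""), safe="")
    host = env.get("POSTGRES_HOST", "postgres")
    port = env.get("POSTGRES_PORT", "5432")  # hypothetical variable; config hardcodes 5432
    name = env.get("POSTGRES_DB", "auctiondb")
    return f"postgresql://{user}:{password}@{host}:{port}/{name}"
```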
--- a/src/db.py
+++ b/src/db.py
@@ -0,0 +1,54 @@
+#!/usr/bin/env python3
+"""
+Database scaffolding for future SQLAlchemy 2.x usage.
+
+Notes:
+- The application now uses PostgreSQL exclusively via `config.DATABASE_URL`.
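+- This module reads the `DATABASE_URL` environment variable directly (without the fallback default defined in `config.py`) and prepares an engine/session bound to it.
+- Example URL: `postgresql+psycopg://user:pass@host:5432/scaev` (in SQLAlchemy 2.0 the `+psycopg` suffix selects the psycopg 3 driver; a bare `postgresql://` selects psycopg2).
+
+No code in the scraper currently imports or uses this module at runtime.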
+It is present to bootstrap a possible future move to SQLAlchemy 2.x.
+"""
+
+from __future__ import annotations
+
+import os
+from typing import Optional
+
+
+def get_database_url() -> str:
+    url = os.getenv("DATABASE_URL")
+    if not url or not url.strip():
+        raise RuntimeError("DATABASE_URL must be set for PostgreSQL connection")
+    return url.strip()
+
+
+def create_engine_and_session(database_url: str):
+    try:
+        from sqlalchemy import create_engine
+        from sqlalchemy.orm import sessionmaker
+    except ImportError as e:
+        raise RuntimeError(
+            "SQLAlchemy is not installed; run 'pip install -r requirements.txt' to use this module."
+        ) from e
+
+    # pool_pre_ping tests each pooled connection on checkout and replaces stale ones
+    engine = create_engine(database_url, pool_pre_ping=True, future=True)
+    SessionLocal = sessionmaker(bind=engine, autoflush=False, autocommit=False, future=True)
+    return engine, SessionLocal
+
+
+def get_sa(session_cached: dict):
+    """Helper to lazily create and cache SQLAlchemy engine/session factory.
+
+    session_cached: a mutable dict (e.g. a module-level {}) that stores the engine and session factory between calls
+    """
+    if 'engine' in session_cached and 'SessionLocal' in session_cached:
+        return session_cached['engine'], session_cached['SessionLocal']
+
+    url = get_database_url()
+    engine, SessionLocal = create_engine_and_session(url)
+    session_cached['engine'] = engine
+    session_cached['SessionLocal'] = SessionLocal
+    return engine, SessionLocal
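`get_sa` caches the engine and session factory in a caller-supplied dict. For comparison, here is a sketch of the same build-once behaviour using the standard library's `functools.lru_cache`. `_build_engine_and_factory` is a hypothetical stand-in for `create_engine_and_session`, so the example runs without SQLAlchemy installed:

```python
import functools
import os


def _build_engine_and_factory(url: str):
    # Hypothetical stand-in for create_engine_and_session(); a real version
    # would return (engine, sessionmaker) built by SQLAlchemy from `url`.
    return object(), object()


@functools.lru_cache(maxsize=None)
def get_engine_and_factory():
    """Build the engine/factory pair on the first call, then return the cached pair."""
    url = os.getenv("DATABASE_URL", "").strip()
    if not url:
        # lru_cache does not cache exceptions, so a later call retries once the variable is set
        raise RuntimeError("DATABASE_URL must be set for PostgreSQL connection")
    return _build_engine_and_factory(url)
```

The function takes no arguments, so the first successful call fixes the URL for the lifetime of the process. Tests can call `get_engine_and_factory.cache_clear()` to force a rebuild. The explicit dict in `get_sa` trades this brevity for letting each caller own its own cache.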
--- a/src/main.py
+++ b/src/main.py
@@ -8,7 +8,6 @@ import sys
 import asyncio
 import json
 import csv
-import sqlite3
 from datetime import datetime
 from pathlib import Path

@@ -16,6 +15,17 @@ import config
 from cache import CacheManager
 from scraper import TroostwijkScraper

+def mask_db_url(url: str) -> str:
+    try:
+        from urllib.parse import urlparse
+        p = urlparse(url)
+        if not p.password:  # nothing secret to hide
+            return url
+        port = f":{p.port}" if p.port else ''
+        return f"{p.scheme}://{p.username or ''}:***@{p.hostname or ''}{port}{p.path or ''}"
+    except Exception:
+        return '<unparseable database URL>'
+
 def main():
    """Main execution"""
    # Check for test mode
@@ -34,7 +44,7 @@ def main():
    if config.OFFLINE:
        print("OFFLINE MODE ENABLED — only database and cache will be used (no network)")
    print(f"Rate limit: {config.RATE_LIMIT_SECONDS} seconds BETWEEN EVERY REQUEST")
-    print(f"Cache database: {config.CACHE_DB}")
+    print(f"Database URL: {mask_db_url(config.DATABASE_URL)}")
    print(f"Output directory: {config.OUTPUT_DIR}")
    print(f"Max listing pages: {config.MAX_PAGES}")
    print("=" * 60)
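The masking helper only needs to hide the password. Below is a stdlib-only sketch built on `urllib.parse.urlsplit`. It leaves password-less URLs untouched and never echoes a URL it cannot parse, because the raw string may contain the secret. The placeholder text is an arbitrary choice. Query strings are dropped, as in the original helper:

```python
from urllib.parse import urlsplit


def mask_db_url(url: str) -> str:
    """Return `url` with its password replaced by '***' (sketch)."""
    try:
        parts = urlsplit(url)
        if parts.password is None:
            return url  # nothing secret to hide
        user = parts.username or ""
        host = parts.hostname or ""
        # .port raises ValueError for a non-numeric or out-of-range port
        port = f":{parts.port}" if parts.port else ""
        return f"{parts.scheme}://{user}:***@{host}{port}{parts.path}"
    except ValueError:
        # Never fall back to the raw URL: it may contain the password
        return "<unparseable database URL>"


if __name__ == "__main__":
    print(mask_db_url("postgresql://auction:s3cret@db.internal:5432/auctiondb"))
    # postgresql://auction:***@db.internal:5432/auctiondb
    print(mask_db_url("postgresql://db.internal/auctiondb"))
    # postgresql://db.internal/auctiondb
```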
--- a/src/monitor.py
+++ b/src/monitor.py
@@ -7,7 +7,6 @@ Runs indefinitely to keep database current with latest Troostwijk data
 import asyncio
 import time
 from datetime import datetime
-import sqlite3
 import config
 from cache import CacheManager
 from scraper import TroostwijkScraper
@@ -82,21 +81,7 @@ class AuctionMonitor:

    def _get_stats(self) -> dict:
        """Get current database statistics"""
-        conn = sqlite3.connect(self.scraper.cache.db_path)
-        cursor = conn.cursor()
-
-        cursor.execute("SELECT COUNT(*) FROM auctions")
-        auction_count = cursor.fetchone()[0]
-
-        cursor.execute("SELECT COUNT(*) FROM lots")
-        lot_count = cursor.fetchone()[0]
-
-        conn.close()
-
-        return {
-            'auctions': auction_count,
-            'lots': lot_count
-        }
+        return self.scraper.cache.get_counts()

    async def start(self):
        """Start continuous monitoring loop"""
@@ -106,7 +91,7 @@ class AuctionMonitor:
        if config.OFFLINE:
            print("OFFLINE MODE ENABLED — only database and cache will be used (no network)")
        print(f"Poll interval: {self.poll_interval / 60:.0f} minutes")
-        print(f"Cache database: {config.CACHE_DB}")
+        print(f"Database URL: {self._mask_db_url(config.DATABASE_URL)}")
        print(f"Rate limit: {config.RATE_LIMIT_SECONDS}s between requests")
        print("="*60)
        print("\nPress Ctrl+C to stop\n")
@@ -135,6 +120,21 @@ class AuctionMonitor:
                print(f"Last scan: {self.last_run.strftime('%Y-%m-%d %H:%M:%S')}")
            print("\nDatabase remains intact with all collected data")

+    @staticmethod
+    def _mask_db_url(url: str) -> str:
+        try:
+            from urllib.parse import urlparse
+            parsed = urlparse(url)
+            if not parsed.password:  # nothing secret to hide
+                return url
+            user = parsed.username or ''
+            host = parsed.hostname or ''
+            port = f":{parsed.port}" if parsed.port else ''
+            db = parsed.path or ''
+            return f"{parsed.scheme}://{user}:***@{host}{port}{db}"
+        except Exception:
+            return '<unparseable database URL>'
+
 def main():
    """Main entry point for monitor"""
    import sys
--- a/src/progress.py
+++ b/src/progress.py
@@ -0,0 +1,105 @@
+#!/usr/bin/env python3
+"""
+Lightweight TTY progress reporter for per-lot scraping.
+
+It shows a spinner while work is in progress and records all page/API
+fetches that contributed to the lot analysis, including:
+- URL or source label
+- size in bytes (when available)
+- cache status (cache/realtime/offline/db/intercepted)
+- duration in milliseconds
+
+Dependency-free; the spinner runs in a daemon thread, so it does not block the asyncio event loop.
+"""
+
+from __future__ import annotations
+
+import sys
+import time
+import threading
+from dataclasses import dataclass, field
+from typing import List, Optional
+
+
+SPINNER_FRAMES = ["⠋", "⠙", "⠹", "⠸", "⠼", "⠴", "⠦", "⠧", "⠇", "⠏"]
+
+
+@dataclass
+class ProgressEvent:
+    kind: str  # html | graphql | rest | image | cache | db | intercepted | other
+    label: str  # url or description
+    size_bytes: Optional[int]
+    cached: str  # "cache", "realtime", "offline", "db", "intercepted"
+    duration_ms: Optional[int]
+
+
+@dataclass
+class ProgressReporter:
+    lot_id: str
+    title: str = ""
+    _events: List[ProgressEvent] = field(default_factory=list)
+    _start_ts: float = field(default_factory=time.time)
+    _stop_ts: Optional[float] = None
+    _spinner_thread: Optional[threading.Thread] = None
+    _stop_flag: bool = False
+    _is_tty: bool = field(default_factory=lambda: sys.stdout.isatty())
+
+    def start(self) -> None:
+        if not self._is_tty:
+            print(f"[LOT {self.lot_id}] ⏳ Analyzing… {self.title[:60]}")
+            return
+
+        def run_spinner():
+            idx = 0
+            while not self._stop_flag:
+                frame = SPINNER_FRAMES[idx % len(SPINNER_FRAMES)]
+                idx += 1
+                summary = f"{len(self._events)} events"
+                line = f"[LOT {self.lot_id}] {frame} {self.title[:60]} · {summary}"
+                # CR without newline to animate
+                sys.stdout.write("\r" + line)
+                sys.stdout.flush()
+                time.sleep(0.09)
+            # Clear the spinner line
+            sys.stdout.write("\r" + " " * 120 + "\r")
+            sys.stdout.flush()
+
+        self._spinner_thread = threading.Thread(target=run_spinner, daemon=True)
+        self._spinner_thread.start()
+
+    def add_event(
+        self,
+        *,
+        kind: str,
+        label: str,
+        size_bytes: Optional[int] = None,
+        cached: str = "realtime",
+        duration_ms: Optional[float] = None,
+    ) -> None:
+        self._events.append(
+            ProgressEvent(
+                kind=kind,
+                label=label,
+                size_bytes=int(size_bytes) if size_bytes is not None else None,
+                cached=cached,
+                duration_ms=int(duration_ms) if duration_ms is not None else None,
+            )
+        )
+
+    def stop(self) -> None:
+        self._stop_ts = time.time()
+        self._stop_flag = True
+        if self._spinner_thread and self._spinner_thread.is_alive():
+            self._spinner_thread.join(timeout=1.0)
+
+        total_ms = int((self._stop_ts - self._start_ts) * 1000)
+        print(f"[LOT {self.lot_id}] ✓ Done in {total_ms} ms — pages/APIs used:")
+        if not self._events:
+            print("  • (none)")
+            return
+
+        # Print events as a compact list
+        for ev in self._events:
+            size = f"{ev.size_bytes} B" if ev.size_bytes is not None else "?"
+            dur = f"{ev.duration_ms} ms" if ev.duration_ms is not None else "?"
+            print(f"  • [{ev.kind}] {ev.label} | {size} | {ev.cached} | {dur}")
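`ProgressReporter` stops its spinner by flipping a plain boolean that the thread checks every 0.09 s. Below is a sketch of the same idea built on `threading.Event`, whose `wait()` returns as soon as `stop()` sets the event. `Spinner` is a hypothetical name. It writes to an injectable stream, so it can run without a TTY:

```python
import io
import sys
import threading
import time

FRAMES = ["⠋", "⠙", "⠹", "⠸", "⠼", "⠴", "⠦", "⠧", "⠇", "⠏"]


class Spinner:
    """Animate a one-line spinner on `stream` until stop() is called (sketch)."""

    def __init__(self, label: str, stream=None, interval: float = 0.09):
        self.label = label
        self.stream = stream if stream is not None else sys.stdout
        self.interval = interval
        self._done = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self) -> None:
        idx = 0
        # wait() returns False on timeout (draw another frame) and True once stop() sets the event
        while not self._done.wait(self.interval):
            self.stream.write(f"\r{FRAMES[idx % len(FRAMES)]} {self.label}")
            self.stream.flush()
            idx += 1
        # Blank out the line that was being redrawn
        self.stream.write("\r" + " " * (len(self.label) + 2) + "\r")
        self.stream.flush()

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        self._done.set()
        self._thread.join()


if __name__ == "__main__":
    # Any text stream works; a StringIO lets the spinner run without a TTY
    buf = io.StringIO()
    spinner = Spinner("Analyzing lot", stream=buf, interval=0.01)
    spinner.start()
    time.sleep(0.1)  # simulated work
    spinner.stop()
    print(buf.getvalue().endswith("\r"))  # True: the line is cleared on stop
```

Because `stop()` wakes the thread immediately, `join()` needs no timeout. The clear-line width is derived from the label rather than a fixed 120 columns.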
--- a/src/scraper.py
+++ b/src/scraper.py
@@ -3,7 +3,6 @@
 Core scaev module for Scaev Auctions
 """
 import os
-import sqlite3
 import asyncio
 import time
 import random
@@ -29,6 +28,7 @@ from graphql_client import (
 )
 from bid_history_client import fetch_bid_history, parse_bid_history
 from priority import calculate_priority, parse_closing_time
+from progress import ProgressReporter

 class TroostwijkScraper:
    """Main scraper class for Troostwijk Auctions"""
@@ -65,13 +65,8 @@ class TroostwijkScraper:
                    content = await response.read()
                    with open(filepath, 'wb') as f:
                        f.write(content)
-
-                    with sqlite3.connect(self.cache.db_path) as conn:
-                        conn.execute("UPDATE images\n"
-                                     "SET local_path = ?, downloaded = 1\n"
-                                     "WHERE lot_id = ? AND url = ?\n"
-                                     "", (str(filepath), lot_id, url))
-                        conn.commit()
+                    # Record download in DB
+                    self.cache.update_image_local_path(lot_id, url, str(filepath))
                    return str(filepath)

        except Exception as e:
@@ -96,7 +91,7 @@ class TroostwijkScraper:
                      (useful for auction listing pages where we just need HTML structure)

        Returns:
-            Dict with 'content' and 'from_cache' keys
+            Dict with: 'content', 'from_cache', 'duration_ms', 'bytes', 'url'
        """
        if use_cache:
            cache_start = time.time()
@@ -104,7 +99,17 @@ class TroostwijkScraper:
            if cached:
                cache_time = (time.time() - cache_start) * 1000
                print(f"  CACHE HIT: {url} ({cache_time:.0f}ms)")
-                return {'content': cached['content'], 'from_cache': True}
+                try:
+                    byte_len = len(cached['content'].encode('utf-8'))
+                except Exception:
+                    byte_len = None
+                return {
+                    'content': cached['content'],
+                    'from_cache': True,
+                    'duration_ms': int(cache_time),
+                    'bytes': byte_len,
+                    'url': url
+                }

        # In OFFLINE mode we never fetch from network
        if self.offline:
@@ -130,7 +135,17 @@ class TroostwijkScraper:
            total_time = time.time() - fetch_start
            self.cache.set(url, content, 200)
            print(f"    [Timing: goto={goto_time:.2f}s, total={total_time:.2f}s, mode={wait_strategy}]")
-            return {'content': content, 'from_cache': False}
+            try:
+                byte_len = len(content.encode('utf-8'))
+            except Exception:
+                byte_len = None
+            return {
+                'content': content,
+                'from_cache': False,
+                'duration_ms': int(total_time * 1000),
+                'bytes': byte_len,
+                'url': url
+            }

        except Exception as e:
            print(f"  ERROR: {e}")
@@ -216,71 +231,54 @@ class TroostwijkScraper:
        if not result:
            # OFFLINE fallback: try to construct page data directly from DB
            if self.offline:
-                import sqlite3
-                conn = sqlite3.connect(self.cache.db_path)
-                cur = conn.cursor()
-                # Try lot first
-                cur.execute("SELECT * FROM lots WHERE url = ?", (url,))
-                lot_row = cur.fetchone()
-                if lot_row:
-                    # Build a dict using column names
-                    col_names = [d[0] for d in cur.description]
-                    lot_dict = dict(zip(col_names, lot_row))
-                    conn.close()
-                    page_data = {
-                        'type': 'lot',
-                        'lot_id': lot_dict.get('lot_id'),
-                        'auction_id': lot_dict.get('auction_id'),
-                        'url': lot_dict.get('url') or url,
-                        'title': lot_dict.get('title') or '',
-                        'current_bid': lot_dict.get('current_bid') or '',
-                        'bid_count': lot_dict.get('bid_count') or 0,
-                        'closing_time': lot_dict.get('closing_time') or '',
-                        'viewing_time': lot_dict.get('viewing_time') or '',
-                        'pickup_date': lot_dict.get('pickup_date') or '',
-                        'location': lot_dict.get('location') or '',
-                        'description': lot_dict.get('description') or '',
-                        'category': lot_dict.get('category') or '',
-                        'status': lot_dict.get('status') or '',
-                        'brand': lot_dict.get('brand') or '',
-                        'model': lot_dict.get('model') or '',
-                        'attributes_json': lot_dict.get('attributes_json') or '',
-                        'first_bid_time': lot_dict.get('first_bid_time'),
-                        'last_bid_time': lot_dict.get('last_bid_time'),
-                        'bid_velocity': lot_dict.get('bid_velocity'),
-                        'followers_count': lot_dict.get('followers_count') or 0,
-                        'estimated_min_price': lot_dict.get('estimated_min_price'),
-                        'estimated_max_price': lot_dict.get('estimated_max_price'),
-                        'lot_condition': lot_dict.get('lot_condition') or '',
-                        'appearance': lot_dict.get('appearance') or '',
-                        'scraped_at': lot_dict.get('scraped_at') or '',
-                    }
-                    print("  OFFLINE: using DB record for lot")
-                    self.visited_lots.add(url)
-                    return page_data
-
-                # Try auction by URL
-                cur.execute("SELECT * FROM auctions WHERE url = ?", (url,))
-                auc_row = cur.fetchone()
-                if auc_row:
-                    col_names = [d[0] for d in cur.description]
-                    auc_dict = dict(zip(col_names, auc_row))
-                    conn.close()
-                    page_data = {
-                        'type': 'auction',
-                        'auction_id': auc_dict.get('auction_id'),
-                        'url': auc_dict.get('url') or url,
-                        'title': auc_dict.get('title') or '',
-                        'location': auc_dict.get('location') or '',
-                        'lots_count': auc_dict.get('lots_count') or 0,
-                        'first_lot_closing_time': auc_dict.get('first_lot_closing_time') or '',
-                        'scraped_at': auc_dict.get('scraped_at') or '',
-                    }
-                    print("  OFFLINE: using DB record for auction")
-                    self.visited_lots.add(url)
-                    return page_data
-
-                conn.close()
+                rec = self.cache.get_page_record_by_url(url)
+                if rec:
+                    if rec.get('type') == 'lot':
+                        page_data = {
+                            'type': 'lot',
+                            'lot_id': rec.get('lot_id'),
+                            'auction_id': rec.get('auction_id'),
+                            'url': rec.get('url') or url,
+                            'title': rec.get('title') or '',
+                            'current_bid': rec.get('current_bid') or '',
+                            'bid_count': rec.get('bid_count') or 0,
+                            'closing_time': rec.get('closing_time') or '',
+                            'viewing_time': rec.get('viewing_time') or '',
+                            'pickup_date': rec.get('pickup_date') or '',
+                            'location': rec.get('location') or '',
+                            'description': rec.get('description') or '',
+                            'category': rec.get('category') or '',
+                            'status': rec.get('status') or '',
+                            'brand': rec.get('brand') or '',
+                            'model': rec.get('model') or '',
+                            'attributes_json': rec.get('attributes_json') or '',
+                            'first_bid_time': rec.get('first_bid_time'),
+                            'last_bid_time': rec.get('last_bid_time'),
+                            'bid_velocity': rec.get('bid_velocity'),
+                            'followers_count': rec.get('followers_count') or 0,
+                            'estimated_min_price': rec.get('estimated_min_price'),
+                            'estimated_max_price': rec.get('estimated_max_price'),
+                            'lot_condition': rec.get('lot_condition') or '',
+                            'appearance': rec.get('appearance') or '',
+                            'scraped_at': rec.get('scraped_at') or '',
+                        }
+                        print("  OFFLINE: using DB record for lot")
+                        self.visited_lots.add(url)
+                        return page_data
+                    else:
+                        page_data = {
+                            'type': 'auction',
+                            'auction_id': rec.get('auction_id'),
+                            'url': rec.get('url') or url,
+                            'title': rec.get('title') or '',
+                            'location': rec.get('location') or '',
+                            'lots_count': rec.get('lots_count') or 0,
+                            'first_lot_closing_time': rec.get('first_lot_closing_time') or '',
+                            'scraped_at': rec.get('scraped_at') or '',
+                        }
+                        print("  OFFLINE: using DB record for auction")
+                        self.visited_lots.add(url)
+                        return page_data
            return None

        content = result['content']
@@ -302,6 +300,18 @@ class TroostwijkScraper:
            print(f"  Type: LOT")
            print(f"  Title: {page_data.get('title', 'N/A')[:60]}...")

+            # TTY progress reporter per lot
+            lot_progress = ProgressReporter(lot_id=page_data.get('lot_id', ''), title=page_data.get('title', ''))
+            lot_progress.start()
+            # Record HTML page fetch
+            lot_progress.add_event(
+                kind='html',
+                label=result.get('url', url),
+                size_bytes=result.get('bytes'),
+                cached='cache' if from_cache else 'realtime',
+                duration_ms=result.get('duration_ms')
+            )
+
            # Extract ALL data from __NEXT_DATA__ lot object
            import json
            import re
@@ -330,7 +340,6 @@ class TroostwijkScraper:
            # Fetch all API data concurrently (or use intercepted/cached data)
            lot_id = page_data.get('lot_id')
            auction_id = page_data.get('auction_id')
-            import sqlite3

            # Step 1: Check if we intercepted API data during page load
            intercepted_data = None
@@ -339,6 +348,13 @@ class TroostwijkScraper:
                try:
                    intercepted_json = self.intercepted_api_data[lot_id]
                    intercepted_data = json.loads(intercepted_json)
+                    lot_progress.add_event(
+                        kind='intercepted',
+                        label='GraphQL (intercepted)',
+                        size_bytes=len(intercepted_json.encode('utf-8')),
+                        cached='intercepted',
+                        duration_ms=0
+                    )
                    # Store the raw JSON for future offline use
                    page_data['api_data_json'] = intercepted_json
                    # Extract lot data from intercepted response
@@ -356,14 +372,7 @@ class TroostwijkScraper:
                pass
            elif from_cache:
                # Check if we have cached API data in database
-                conn = sqlite3.connect(self.cache.db_path)
-                cursor = conn.cursor()
-                cursor.execute("""
-                    SELECT followers_count, estimated_min_price, current_bid, bid_count, closing_time, status
-                    FROM lots WHERE lot_id = ?
-                """, (lot_id,))
-                existing = cursor.fetchone()
-                conn.close()
+                existing = self.cache.get_lot_api_fields(lot_id)

                # Data quality check: Must have followers_count AND closing_time to be considered "complete"
                # This prevents using stale records like old "0 bids" entries
@@ -374,6 +383,13 @@ class TroostwijkScraper:

                if is_complete:
                    print(f"  Using cached API data")
+                    lot_progress.add_event(
+                        kind='db',
+                        label='lots table (cached api fields)',
+                        size_bytes=None,
+                        cached='db',
+                        duration_ms=0
+                    )
                    page_data['followers_count'] = existing[0]
                    page_data['estimated_min_price'] = existing[1]
                    page_data['current_bid'] = existing[2] or page_data.get('current_bid', 'No bids')
@@ -385,9 +401,31 @@ class TroostwijkScraper:
                else:
                    print(f"  Fetching lot data from API (concurrent)...")
                    # Make concurrent API calls
-                    api_tasks = [fetch_lot_bidding_data(lot_id)]
+                    api_tasks = []
+                    # Wrap each API call to capture duration and size
+                    async def _timed_fetch(name, coro_func, *args, **kwargs):
+                        t0 = time.time()
+                        data = await coro_func(*args, **kwargs)
+                        dt = int((time.time() - t0) * 1000)
+                        size_b = None
+                        try:
+                            if data is not None:
+                                import json as _json
+                                size_b = len(_json.dumps(data).encode('utf-8'))
+                        except Exception:
+                            size_b = None
+                        lot_progress.add_event(
+                            kind='graphql',
+                            label=name,
+                            size_bytes=size_b,
+                            cached='realtime',
+                            duration_ms=dt
+                        )
+                        return data
+
+                    api_tasks.append(_timed_fetch('GraphQL lotDetails', fetch_lot_bidding_data, lot_id))
                    if auction_id:
-                        api_tasks.append(fetch_auction_data(auction_id))
+                        api_tasks.append(_timed_fetch('GraphQL auction', fetch_auction_data, auction_id))
                    results = await asyncio.gather(*api_tasks, return_exceptions=True)
                    bidding_data = results[0] if results and not isinstance(results[0], Exception) else None
                    bid_history_data = None  # Will fetch after we have lot_uuid
@@ -395,32 +433,90 @@ class TroostwijkScraper:
                # Fresh page fetch - make concurrent API calls for all data
                if not self.offline:
                    print(f"  Fetching lot data from API (concurrent)...")
-                api_tasks = [fetch_lot_bidding_data(lot_id)]
+                api_tasks = []
                task_map = {'bidding': 0}  # Track which index corresponds to which task

                # Add auction data fetch if we need viewing/pickup times
                if auction_id:
-                    conn = sqlite3.connect(self.cache.db_path)
-                    cursor = conn.cursor()
-                    cursor.execute("""
-                        SELECT viewing_time, pickup_date FROM lots WHERE lot_id = ?
-                    """, (lot_id,))
-                    times = cursor.fetchone()
-                    conn.close()
-                    has_times = times and (times[0] or times[1])
+                    vt, pd = self.cache.get_lot_times(lot_id)
+                    has_times = vt or pd

                    if not has_times:
                        task_map['auction'] = len(api_tasks)
-                        api_tasks.append(fetch_auction_data(auction_id))
+                        async def fetch_auction_wrapped():
+                            t0 = time.time()
+                            data = await fetch_auction_data(auction_id)
+                            dt = int((time.time() - t0) * 1000)
+                            size_b = None
+                            try:
+                                if data is not None:
+                                    import json as _json
+                                    size_b = len(_json.dumps(data).encode('utf-8'))
+                            except Exception:
+                                size_b = None
+                            lot_progress.add_event(
+                                kind='graphql',
+                                label='GraphQL auction',
+                                size_bytes=size_b,
+                                cached='realtime',
+                                duration_ms=dt
+                            )
+                            return data
+                        api_tasks.append(fetch_auction_wrapped())

                # Add bid history fetch if we have lot_uuid and expect bids
                if lot_uuid:
                    task_map['bid_history'] = len(api_tasks)
-                    api_tasks.append(fetch_bid_history(lot_uuid))
+                    async def fetch_bid_history_wrapped():
+                        t0 = time.time()
+                        data = await fetch_bid_history(lot_uuid)
+                        dt = int((time.time() - t0) * 1000)
+                        size_b = None
+                        try:
+                            if data is not None:
+                                import json as _json
+                                size_b = len(_json.dumps(data).encode('utf-8'))
+                        except Exception:
+                            size_b = None
+                        lot_progress.add_event(
+                            kind='rest',
+                            label='REST bid history',
+                            size_bytes=size_b,
+                            cached='realtime',
+                            duration_ms=dt
+                        )
+                        return data
+                    api_tasks.append(fetch_bid_history_wrapped())

                # Execute all API calls concurrently
+                # Always include the bidding data as first task
+                async def fetch_bidding_wrapped():
+                    t0 = time.time()
+                    data = await fetch_lot_bidding_data(lot_id)
+                    dt = int((time.time() - t0) * 1000)
+                    size_b = None
+                    try:
+                        if data is not None:
+                            import json as _json
+                            size_b = len(_json.dumps(data).encode('utf-8'))
+                    except Exception:
+                        size_b = None
+                    lot_progress.add_event(
+                        kind='graphql',
+                        label='GraphQL lotDetails',
+                        size_bytes=size_b,
+                        cached='realtime',
+                        duration_ms=dt
+                    )
+                    return data
+
+                api_tasks.insert(0, fetch_bidding_wrapped())
+                # Bidding now sits at index 0, so shift every other task's index by one
+                for k in list(task_map.keys()):
+                    task_map[k] += 1 if k != 'bidding' else 0
+
                results = await asyncio.gather(*api_tasks, return_exceptions=True)
-                bidding_data = results[task_map['bidding']] if results and not isinstance(results[task_map['bidding']], Exception) else None
+                bidding_data = results[0] if results and not isinstance(results[0], Exception) else None

                # Store raw API JSON for offline replay
                if bidding_data:
@@ -538,14 +634,7 @@ class TroostwijkScraper:
                        self.cache.save_bid_history(lot_id, bid_data['bid_records'])
                elif from_cache and page_data.get('bid_count', 0) > 0:
                    # Check if cached bid history exists
-                    conn = sqlite3.connect(self.cache.db_path)
-                    cursor = conn.cursor()
-                    cursor.execute("""
-                        SELECT COUNT(*) FROM bid_history WHERE lot_id = ?
-                    """, (lot_id,))
-                    has_history = cursor.fetchone()[0] > 0
-                    conn.close()
-                    if has_history:
+                    if self.cache.has_bid_history(lot_id):
                        print(f"  Bid history cached")
            else:
                print(f"  Bid: {page_data.get('current_bid', 'N/A')} (from HTML)")
@@ -571,15 +660,7 @@ class TroostwijkScraper:

                if self.download_images:
                    # Check which images are already downloaded
-                    import sqlite3
-                    conn = sqlite3.connect(self.cache.db_path)
-                    cursor = conn.cursor()
-                    cursor.execute("""
-                        SELECT url FROM images
-                        WHERE lot_id = ? AND downloaded = 1
-                    """, (page_data['lot_id'],))
-                    already_downloaded = {row[0] for row in cursor.fetchall()}
-                    conn.close()
+                    already_downloaded = set(self.cache.get_downloaded_image_urls(page_data['lot_id']))

                    # Only download missing images
                    images_to_download = [
@@ -628,6 +709,12 @@ class TroostwijkScraper:
                    else:
                        print(f"    All {len(images)} images already cached")

+            # Stop and print progress summary for the lot
+            try:
+                lot_progress.stop()
+            except Exception:
+                pass
+
        return page_data

    def _prioritize_lots(self, lot_urls: List[str]) -> List[Tuple[int, str, str]]:
@@ -636,25 +723,15 @@ class TroostwijkScraper:

        Returns list of (priority, url, description) tuples sorted by priority (highest first)
        """
-        import sqlite3
-
        prioritized = []
        current_time = int(time.time())

-        conn = sqlite3.connect(self.cache.db_path)
-        cursor = conn.cursor()
-
        for url in lot_urls:
            # Extract lot_id from URL
            lot_id = self.parser.extract_lot_id(url)

            # Try to get existing data from database
-            cursor.execute("""
-                SELECT closing_time, scraped_at, scrape_priority, next_scrape_at
-                FROM lots WHERE lot_id = ? OR url = ?
-            """, (lot_id, url))
-
-            row = cursor.fetchone()
+            row = self.cache.get_lot_priority_info(lot_id, url)

            if row:
                closing_time, scraped_at, existing_priority, next_scrape_at = row
@@ -694,8 +771,6 @@ class TroostwijkScraper:

            prioritized.append((priority, url, desc))

-        conn.close()
-
        # Sort by priority (highest first)
        prioritized.sort(key=lambda x: x[0], reverse=True)

@@ -706,14 +781,9 @@ class TroostwijkScraper:
        if self.offline:
            print("Launching OFFLINE crawl (no network requests)")
            # Gather URLs from database
-            import sqlite3
-            conn = sqlite3.connect(self.cache.db_path)
-            cur = conn.cursor()
-            cur.execute("SELECT DISTINCT url FROM auctions")
-            auction_urls = [r[0] for r in cur.fetchall() if r and r[0]]
-            cur.execute("SELECT DISTINCT url FROM lots")
-            lot_urls = [r[0] for r in cur.fetchall() if r and r[0]]
-            conn.close()
+            urls = self.cache.get_distinct_urls()
+            auction_urls = urls['auctions']
+            lot_urls = urls['lots']

            print(f"  OFFLINE: {len(auction_urls)} auctions and {len(lot_urls)} lots in DB")

@@ -933,23 +1003,17 @@ class TroostwijkScraper:

    def export_to_files(self) -> Dict[str, str]:
        """Export database to CSV/JSON files"""
-        import sqlite3
        import json
        import csv
        from datetime import datetime

        timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
-        output_dir = os.path.dirname(self.cache.db_path)
-
-        conn = sqlite3.connect(self.cache.db_path)
-        conn.row_factory = sqlite3.Row
-        cursor = conn.cursor()
+        from config import OUTPUT_DIR as output_dir

        files = {}

        # Export auctions
-        cursor.execute("SELECT * FROM auctions")
-        auctions = [dict(row) for row in cursor.fetchall()]
+        auctions = self.cache.fetch_all('auctions')

        auctions_csv = os.path.join(output_dir, f'auctions_{timestamp}.csv')
        auctions_json = os.path.join(output_dir, f'auctions_{timestamp}.json')
@@ -968,8 +1032,7 @@ class TroostwijkScraper:
            print(f"  Exported {len(auctions)} auctions")

        # Export lots
-        cursor.execute("SELECT * FROM lots")
-        lots = [dict(row) for row in cursor.fetchall()]
+        lots = self.cache.fetch_all('lots')

        lots_csv = os.path.join(output_dir, f'lots_{timestamp}.csv')
        lots_json = os.path.join(output_dir, f'lots_{timestamp}.json')
@@ -987,5 +1050,4 @@ class TroostwijkScraper:
            files['lots_json'] = lots_json
            print(f"  Exported {len(lots)} lots")

-        conn.close()
        return files
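The `scraper.py` hunks above replace inline `sqlite3` access with `CacheManager` helpers (`get_lot_priority_info`, `get_distinct_urls`, `fetch_all`). Their bodies are not part of this diff. The following is a minimal sketch of how they could be written. It assumes a SQLite file containing the `lots` and `auctions` columns used by the removed queries. The class name `SqliteCacheSketch` and the table allowlist are hypothetical; this is not the project's actual implementation.

```python
import sqlite3
from contextlib import closing
from typing import Any, Dict, List, Optional, Tuple


class SqliteCacheSketch:
    """Hypothetical stand-in for the CacheManager helpers used in scraper.py.

    Assumes a SQLite file at db_path with `lots` and `auctions` tables holding
    the columns referenced by the removed inline queries.
    """

    # Table names cannot be bound as SQL parameters, so only these are allowed.
    _EXPORTABLE_TABLES = frozenset({"auctions", "lots"})

    def __init__(self, db_path: str) -> None:
        self.db_path = db_path

    def _connect(self) -> sqlite3.Connection:
        conn = sqlite3.connect(self.db_path)
        conn.row_factory = sqlite3.Row  # rows support row[0], row["col"], dict(row)
        return conn

    def get_lot_priority_info(
        self, lot_id: str, url: str
    ) -> Optional[Tuple[Any, Any, Any, Any]]:
        """Return (closing_time, scraped_at, scrape_priority, next_scrape_at) or None."""
        with closing(self._connect()) as conn:  # closing() really closes the connection
            row = conn.execute(
                "SELECT closing_time, scraped_at, scrape_priority, next_scrape_at "
                "FROM lots WHERE lot_id = ? OR url = ?",
                (lot_id, url),
            ).fetchone()
        return tuple(row) if row else None

    def get_distinct_urls(self) -> Dict[str, List[str]]:
        """Return non-empty distinct URLs keyed by table, as the offline crawl expects."""
        with closing(self._connect()) as conn:
            auctions = [r[0] for r in conn.execute("SELECT DISTINCT url FROM auctions") if r[0]]
            lots = [r[0] for r in conn.execute("SELECT DISTINCT url FROM lots") if r[0]]
        return {"auctions": auctions, "lots": lots}

    def fetch_all(self, table: str) -> List[Dict[str, Any]]:
        """Return every row of an allowlisted table as a plain dict (for CSV/JSON export)."""
        if table not in self._EXPORTABLE_TABLES:
            raise ValueError(f"unsupported table: {table!r}")
        with closing(self._connect()) as conn:
            return [dict(row) for row in conn.execute(f"SELECT * FROM {table}")]
```

Because `fetch_all` interpolates the table name into the SQL string, it checks the argument against a fixed set first. This keeps the export path from turning into an injection point if the name ever comes from outside the code.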
--- a/src/test.py
+++ b/src/test.py
@@ -4,7 +4,6 @@ Test module for debugging extraction patterns
 """

 import sys
-import sqlite3
 import time
 import re
 import json
@@ -27,10 +26,11 @@ def test_extraction(
    if not cached:
        print(f"ERROR: URL not found in cache: {test_url}")
        print(f"\nAvailable cached URLs:")
-        with sqlite3.connect(config.CACHE_DB) as conn:
-            cursor = conn.execute("SELECT url FROM cache ORDER BY timestamp DESC LIMIT 10")
-            for row in cursor.fetchall():
-                print(f"  - {row[0]}")
+        try:
+            for url in scraper.cache.get_recent_cached_urls(limit=10):
+                print(f"  - {url}")
+        except Exception as e:
+            print(f"  (failed to list recent cached URLs: {e})")
        return

    content = cached['content']
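`src/test.py` now calls `scraper.cache.get_recent_cached_urls(limit=10)` instead of querying the `cache` table directly. The following is a hypothetical standalone equivalent of that helper. It reproduces the removed `SELECT url FROM cache ORDER BY timestamp DESC LIMIT 10` query and binds the limit as a parameter. The function signature is an assumption.

```python
import sqlite3
from contextlib import closing
from typing import List


def get_recent_cached_urls(db_path: str, limit: int = 10) -> List[str]:
    """Return up to `limit` cached URLs, newest first.

    Hypothetical standalone version of the helper called in src/test.py;
    assumes a `cache` table with `url` and `timestamp` columns.
    """
    if limit < 0:
        raise ValueError("limit must be non-negative")
    # closing() guarantees the connection is closed when the block exits.
    with closing(sqlite3.connect(db_path)) as conn:
        rows = conn.execute(
            "SELECT url FROM cache ORDER BY timestamp DESC LIMIT ?", (limit,)
        ).fetchall()
    return [row[0] for row in rows]
```

The removed `with sqlite3.connect(config.CACHE_DB) as conn:` block only scoped a transaction. A `sqlite3.Connection` used as a context manager commits or rolls back but does not close itself. Wrapping the connection in `contextlib.closing` also releases the database file handle.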
--- a/test/test_cache_behavior.py
+++ b/test/test_cache_behavior.py
@@ -1,303 +0,0 @@
-#!/usr/bin/env python3
-"""
-Test cache behavior - verify page is only fetched once and data persists offline
-"""
-
-import sys
-import os
-import asyncio
-import sqlite3
-import time
-from pathlib import Path
-
-# Add src to path
-sys.path.insert(0, str(Path(__file__).parent.parent / 'src'))
-
-from cache import CacheManager
-from scraper import TroostwijkScraper
-import config
-
-
-class TestCacheBehavior:
-    """Test suite for cache and offline functionality"""
-
-    def __init__(self):
-        self.test_db = "test_cache.db"
-        self.original_db = config.CACHE_DB
-        self.cache = None
-        self.scraper = None
-
-    def setup(self):
-        """Setup test environment"""
-        print("\n" + "="*60)
-        print("TEST SETUP")
-        print("="*60)
-
-        # Use test database
-        config.CACHE_DB = self.test_db
-
-        # Ensure offline mode is disabled for tests
-        config.OFFLINE = False
-
-        # Clean up old test database
-        if os.path.exists(self.test_db):
-            os.remove(self.test_db)
-            print(f"  * Removed old test database")
-
-        # Initialize cache and scraper
-        self.cache = CacheManager()
-        self.scraper = TroostwijkScraper()
-        self.scraper.offline = False  # Explicitly disable offline mode
-
-        print(f"  * Created test database: {self.test_db}")
-        print(f"  * Initialized cache and scraper")
-        print(f"  * Offline mode: DISABLED")
-
-    def teardown(self):
-        """Cleanup test environment"""
-        print("\n" + "="*60)
-        print("TEST TEARDOWN")
-        print("="*60)
-
-        # Restore original database path
-        config.CACHE_DB = self.original_db
-
-        # Keep test database for inspection
-        print(f"  * Test database preserved: {self.test_db}")
-        print(f"  * Restored original database path")
-
-    async def test_page_fetched_once(self):
-        """Test that a page is only fetched from network once"""
-        print("\n" + "="*60)
-        print("TEST 1: Page Fetched Only Once")
-        print("="*60)
-
-        # Pick a real lot URL to test with
-        test_url = "https://www.troostwijkauctions.com/l/bmw-x5-xdrive40d-high-executive-m-sport-a8-286pk-2019-A1-26955-7"
-
-        print(f"\nTest URL: {test_url}")
-
-        # First visit - should fetch from network
-        print("\n--- FIRST VISIT (should fetch from network) ---")
-        start_time = time.time()
-
-        async with asyncio.timeout(60):  # 60 second timeout
-            page_data_1 = await self._scrape_single_page(test_url)
-
-        first_visit_time = time.time() - start_time
-
-        if not page_data_1:
-            print("  [FAIL] First visit returned no data")
-            return False
-
-        print(f"  [OK] First visit completed in {first_visit_time:.2f}s")
-        print(f"  [OK] Got lot data: {page_data_1.get('title', 'N/A')[:60]}...")
-
-        # Check closing time was captured
-        closing_time_1 = page_data_1.get('closing_time')
-        print(f"  [OK] Closing time: {closing_time_1}")
-
-        # Second visit - should use cache
-        print("\n--- SECOND VISIT (should use cache) ---")
-        start_time = time.time()
-
-        async with asyncio.timeout(30):  # Should be much faster
-            page_data_2 = await self._scrape_single_page(test_url)
-
-        second_visit_time = time.time() - start_time
-
-        if not page_data_2:
-            print("  [FAIL] Second visit returned no data")
-            return False
-
-        print(f"  [OK] Second visit completed in {second_visit_time:.2f}s")
-
-        # Verify data matches
-        if page_data_1.get('lot_id') != page_data_2.get('lot_id'):
-            print(f"  [FAIL] Lot IDs don't match")
-            return False
-
-        closing_time_2 = page_data_2.get('closing_time')
-        print(f"  [OK] Closing time: {closing_time_2}")
-
-        if closing_time_1 != closing_time_2:
-            print(f"  [FAIL] Closing times don't match!")
-            print(f"    First:  {closing_time_1}")
-            print(f"    Second: {closing_time_2}")
-            return False
-
-        # Verify second visit was significantly faster (used cache)
-        if second_visit_time >= first_visit_time * 0.5:
-            print(f"  [WARN] Second visit not significantly faster")
-            print(f"    First:  {first_visit_time:.2f}s")
-            print(f"    Second: {second_visit_time:.2f}s")
-        else:
-            print(f"  [OK] Second visit was {(first_visit_time / second_visit_time):.1f}x faster (cache working!)")
-
-        # Verify resource cache has entries
-        conn = sqlite3.connect(self.test_db)
-        cursor = conn.execute("SELECT COUNT(*) FROM resource_cache")
-        resource_count = cursor.fetchone()[0]
-        conn.close()
-
-        print(f"  [OK] Cached {resource_count} resources")
-
-        print("\n[PASS] TEST 1 PASSED: Page fetched only once, data persists")
-        return True
-
-    async def test_offline_mode(self):
-        """Test that offline mode works with cached data"""
-        print("\n" + "="*60)
-        print("TEST 2: Offline Mode with Cached Data")
-        print("="*60)
-
-        # Use the same URL from test 1 (should be cached)
-        test_url = "https://www.troostwijkauctions.com/l/bmw-x5-xdrive40d-high-executive-m-sport-a8-286pk-2019-A1-26955-7"
-
-        # Enable offline mode
-        original_offline = config.OFFLINE
-        config.OFFLINE = True
-        self.scraper.offline = True
-
-        print(f"\nTest URL: {test_url}")
-        print("  * Offline mode: ENABLED")
-
-        try:
-            # Try to scrape in offline mode
-            print("\n--- OFFLINE SCRAPE (should use DB/cache only) ---")
-            start_time = time.time()
-
-            async with asyncio.timeout(30):
-                page_data = await self._scrape_single_page(test_url)
-
-            offline_time = time.time() - start_time
-
-            if not page_data:
-                print("  [FAIL] Offline mode returned no data")
-                return False
-
-            print(f"  [OK] Offline scrape completed in {offline_time:.2f}s")
-            print(f"  [OK] Got lot data: {page_data.get('title', 'N/A')[:60]}...")
-
-            # Check closing time is available
-            closing_time = page_data.get('closing_time')
-            if not closing_time:
-                print(f"  [FAIL] No closing time in offline mode")
-                return False
-
-            print(f"  [OK] Closing time preserved: {closing_time}")
-
-            # Verify essential fields are present
-            essential_fields = ['lot_id', 'title', 'url', 'location']
-            missing_fields = [f for f in essential_fields if not page_data.get(f)]
-
-            if missing_fields:
-                print(f"  [FAIL] Missing essential fields: {missing_fields}")
-                return False
-
-            print(f"  [OK] All essential fields present")
-
-            # Check database has the lot
-            conn = sqlite3.connect(self.test_db)
-            cursor = conn.execute("SELECT closing_time FROM lots WHERE url = ?", (test_url,))
-            row = cursor.fetchone()
-            conn.close()
-
-            if not row:
-                print(f"  [FAIL] Lot not found in database")
-                return False
-
-            db_closing_time = row[0]
-            print(f"  [OK] Database has closing time: {db_closing_time}")
-
-            if db_closing_time != closing_time:
-                print(f"  [FAIL] Closing time mismatch")
-                print(f"    Scraped: {closing_time}")
-                print(f"    Database: {db_closing_time}")
-                return False
-
-            print("\n[PASS] TEST 2 PASSED: Offline mode works, closing time preserved")
-            return True
-
-        finally:
-            # Restore offline mode
-            config.OFFLINE = original_offline
-            self.scraper.offline = original_offline
-
-    async def _scrape_single_page(self, url):
-        """Helper to scrape a single page"""
-        from playwright.async_api import async_playwright
-
-        if config.OFFLINE or self.scraper.offline:
-            # Offline mode - use crawl_page directly
-            return await self.scraper.crawl_page(page=None, url=url)
-
-        # Online mode - need browser
-        async with async_playwright() as p:
-            browser = await p.chromium.launch(headless=True)
-            page = await browser.new_page()
-
-            try:
-                result = await self.scraper.crawl_page(page, url)
-                return result
-            finally:
-                await browser.close()
-
-    async def run_all_tests(self):
-        """Run all tests"""
-        print("\n" + "="*70)
-        print("CACHE BEHAVIOR TEST SUITE")
-        print("="*70)
-
-        self.setup()
-
-        results = []
-
-        try:
-            # Test 1: Page fetched once
-            result1 = await self.test_page_fetched_once()
-            results.append(("Page Fetched Once", result1))
-
-            # Test 2: Offline mode
-            result2 = await self.test_offline_mode()
-            results.append(("Offline Mode", result2))
-
-        except Exception as e:
-            print(f"\n[ERROR] TEST SUITE ERROR: {e}")
-            import traceback
-            traceback.print_exc()
-
-        finally:
-            self.teardown()
-
-        # Print summary
-        print("\n" + "="*70)
-        print("TEST SUMMARY")
-        print("="*70)
-
-        all_passed = True
-        for test_name, passed in results:
-            status = "[PASS]" if passed else "[FAIL]"
-            print(f"  {status}: {test_name}")
-            if not passed:
-                all_passed = False
-
-        print("="*70)
-
-        if all_passed:
-            print("\n*** ALL TESTS PASSED! ***")
-            return 0
-        else:
-            print("\n*** SOME TESTS FAILED ***")
-            return 1
-
-
-async def main():
-    """Run tests"""
-    tester = TestCacheBehavior()
-    exit_code = await tester.run_all_tests()
-    sys.exit(exit_code)
-
-
-if __name__ == "__main__":
-    asyncio.run(main())
--- a/test/test_description_simple.py
+++ b/test/test_description_simple.py
@@ -1,51 +0,0 @@
-#!/usr/bin/env python3
-import sys
-import os
-parent_dir = os.path.abspath(os.path.join(os.path.dirname(__file__), '..'))
-sys.path.insert(0, parent_dir)
-sys.path.insert(0, os.path.join(parent_dir, 'src'))
-
-import asyncio
-from scraper import TroostwijkScraper
-import config
-import os
-
-async def test():
-    # Force online mode
-    os.environ['SCAEV_OFFLINE'] = '0'
-    config.OFFLINE = False
-
-    scraper = TroostwijkScraper()
-    scraper.offline = False
-
-    from playwright.async_api import async_playwright
-    async with async_playwright() as p:
-        browser = await p.chromium.launch(headless=True)
-        context = await browser.new_context()
-        page = await context.new_page()
-
-        url = "https://www.troostwijkauctions.com/l/used-dometic-seastar-tfxchx8641p-top-mount-engine-control-liver-A1-39684-12"
-
-        # Add debug logging to parser
-        original_parse = scraper.parser.parse_page
-        def debug_parse(content, url):
-            result = original_parse(content, url)
-            if result:
-                print(f"PARSER OUTPUT:")
-                print(f"  description: {result.get('description', 'NONE')[:100] if result.get('description') else 'EMPTY'}")
-                print(f"  closing_time: {result.get('closing_time', 'NONE')}")
-                print(f"  bid_count: {result.get('bid_count', 'NONE')}")
-            return result
-        scraper.parser.parse_page = debug_parse
-
-        page_data = await scraper.crawl_page(page, url)
-
-        await browser.close()
-
-        print(f"\nFINAL page_data:")
-        print(f"  description: {page_data.get('description', 'NONE')[:100] if page_data and page_data.get('description') else 'EMPTY'}")
-        print(f"  closing_time: {page_data.get('closing_time', 'NONE') if page_data else 'NONE'}")
-        print(f"  bid_count: {page_data.get('bid_count', 'NONE') if page_data else 'NONE'}")
-        print(f"  status: {page_data.get('status', 'NONE') if page_data else 'NONE'}")
-
-asyncio.run(test())
--- a/test/test_graphql_403.py
+++ b/test/test_graphql_403.py
@@ -1,85 +0,0 @@
-import asyncio
-import types
-import sys
-from pathlib import Path
-import pytest
-
-
-@pytest.mark.asyncio
-async def test_fetch_lot_bidding_data_403(monkeypatch):
-    """
-    Simulate a 403 from the GraphQL endpoint and verify:
-    - Function returns None (graceful handling)
-    - It attempts a retry and logs a clear 403 message
-    """
-    # Load modules directly from src using importlib to avoid path issues
-    project_root = Path(__file__).resolve().parents[1]
-    src_path = project_root / 'src'
-    import importlib.util
-
-    def _load_module(name, file_path):
-        spec = importlib.util.spec_from_file_location(name, str(file_path))
-        module = importlib.util.module_from_spec(spec)
-        sys.modules[name] = module
-        spec.loader.exec_module(module)  # type: ignore
-        return module
-
-    # Load config first because graphql_client imports it by module name
-    config = _load_module('config', src_path / 'config.py')
-    graphql_client = _load_module('graphql_client', src_path / 'graphql_client.py')
-    monkeypatch.setattr(config, "OFFLINE", False, raising=False)
-
-    log_messages = []
-
-    def fake_print(*args, **kwargs):
-        msg = " ".join(str(a) for a in args)
-        log_messages.append(msg)
-
-    import builtins
-    monkeypatch.setattr(builtins, "print", fake_print)
-
-    class MockResponse:
-        def __init__(self, status=403, text_body="Forbidden"):
-            self.status = status
-            self._text_body = text_body
-
-        async def json(self):
-            return {}
-
-        async def text(self):
-            return self._text_body
-
-        async def __aenter__(self):
-            return self
-
-        async def __aexit__(self, exc_type, exc, tb):
-            return False
-
-    class MockSession:
-        def __init__(self, *args, **kwargs):
-            pass
-
-        def post(self, *args, **kwargs):
-            # Always return 403
-            return MockResponse(403, "Forbidden by WAF")
-
-        async def __aenter__(self):
-            return self
-
-        async def __aexit__(self, exc_type, exc, tb):
-            return False
-
-    # Patch aiohttp.ClientSession to our mock
-    import types as _types
-    dummy_aiohttp = _types.SimpleNamespace()
-    dummy_aiohttp.ClientSession = MockSession
-    # Ensure that an `import aiohttp` inside the function resolves to our dummy
-    monkeypatch.setitem(sys.modules, 'aiohttp', dummy_aiohttp)
-
-    result = await graphql_client.fetch_lot_bidding_data("A1-40179-35")
-
-    # Should gracefully return None
-    assert result is None
-
-    # Should have logged a 403 at least once
-    assert any("GraphQL API error: 403" in m for m in log_messages)
--- a/test/test_missing_fields.py
+++ b/test/test_missing_fields.py
@@ -1,208 +0,0 @@
-#!/usr/bin/env python3
-"""
-Test to validate that all expected fields are populated after scraping
-"""
-import sys
-import os
-import asyncio
-import sqlite3
-
-# Add parent and src directory to path
-parent_dir = os.path.abspath(os.path.join(os.path.dirname(__file__), '..'))
-sys.path.insert(0, parent_dir)
-sys.path.insert(0, os.path.join(parent_dir, 'src'))
-
-# Force online mode before importing
-os.environ['SCAEV_OFFLINE'] = '0'
-
-from scraper import TroostwijkScraper
-import config
-
-
-async def test_lot_has_all_fields():
-    """Test that a lot page has all expected fields populated"""
-
-    print("\n" + "="*60)
-    print("TEST: Lot has all required fields")
-    print("="*60)
-
-    # Use the example lot from user
-    test_url = "https://www.troostwijkauctions.com/l/radaway-idea-black-dwj-doucheopstelling-A1-39956-18"
-
-    # Ensure we're not in offline mode
-    config.OFFLINE = False
-
-    scraper = TroostwijkScraper()
-    scraper.offline = False
-
-    print(f"\n[1] Scraping: {test_url}")
-
-    # Start playwright and scrape
-    from playwright.async_api import async_playwright
-    async with async_playwright() as p:
-        browser = await p.chromium.launch(headless=True)
-        context = await browser.new_context()
-        page = await context.new_page()
-
-        page_data = await scraper.crawl_page(page, test_url)
-
-        await browser.close()
-
-    if not page_data:
-        print("  [FAIL] No data returned")
-        return False
-
-    print(f"\n[2] Validating fields...")
-
-    # Fields that MUST have values (critical for auction functionality)
-    required_fields = {
-        'closing_time': 'Closing time',
-        'current_bid': 'Current bid',
-        'bid_count': 'Bid count',
-        'status': 'Status',
-    }
-
-    # Fields that SHOULD have values but may legitimately be empty
-    optional_fields = {
-        'description': 'Description',
-    }
-
-    missing_fields = []
-    empty_fields = []
-    optional_missing = []
-
-    # Check required fields
-    for field, label in required_fields.items():
-        value = page_data.get(field)
-
-        if value is None:
-            missing_fields.append(label)
-            print(f"  [FAIL] {label}: MISSING (None)")
-        elif value == '' or value == 0 or value == 'No bids':
-            # Special case: 'No bids' is only acceptable if bid_count is 0
-            if field == 'current_bid' and page_data.get('bid_count', 0) == 0:
-                print(f"  [PASS] {label}: '{value}' (acceptable - no bids)")
-            else:
-                empty_fields.append(label)
-                print(f"  [FAIL] {label}: EMPTY ('{value}')")
-        else:
-            print(f"  [PASS] {label}: {value}")
-
-    # Check optional fields (warn but don't fail)
-    for field, label in optional_fields.items():
-        value = page_data.get(field)
-        if value is None or value == '':
-            optional_missing.append(label)
-            print(f"  [WARN] {label}: EMPTY (may be legitimate)")
-        else:
-            print(f"  [PASS] {label}: {value[:50]}...")
-
-    # Check database
-    print(f"\n[3] Checking database entry...")
-    conn = sqlite3.connect(scraper.cache.db_path)
-    cursor = conn.cursor()
-    cursor.execute("""
-        SELECT closing_time, current_bid, bid_count, description, status
-        FROM lots WHERE url = ?
-    """, (test_url,))
-    row = cursor.fetchone()
-    conn.close()
-
-    if row:
-        db_closing, db_bid, db_count, db_desc, db_status = row
-        print(f"  DB closing_time: {db_closing or 'EMPTY'}")
-        print(f"  DB current_bid: {db_bid or 'EMPTY'}")
-        print(f"  DB bid_count: {db_count}")
-        print(f"  DB description: {db_desc[:50] if db_desc else 'EMPTY'}...")
-        print(f"  DB status: {db_status or 'EMPTY'}")
-
-        # Verify DB matches page_data
-        if db_closing != page_data.get('closing_time'):
-            print(f"  [WARN] DB closing_time doesn't match page_data")
-        if db_count != page_data.get('bid_count'):
-            print(f"  [WARN] DB bid_count doesn't match page_data")
-    else:
-        print(f"  [WARN] No database entry found")
-
-    print(f"\n" + "="*60)
-    if missing_fields or empty_fields:
-        print(f"[FAIL] Missing fields: {', '.join(missing_fields)}")
-        print(f"[FAIL] Empty fields: {', '.join(empty_fields)}")
-        if optional_missing:
-            print(f"[WARN] Optional missing: {', '.join(optional_missing)}")
-        return False
-    else:
-        print("[PASS] All required fields are populated")
-        if optional_missing:
-            print(f"[WARN] Optional missing: {', '.join(optional_missing)}")
-        return True
-
-
-async def test_lot_with_description():
-    """Test that a lot with description preserves it"""
-
-    print("\n" + "="*60)
-    print("TEST: Lot with description")
-    print("="*60)
-
-    # Use a lot known to have description
-    test_url = "https://www.troostwijkauctions.com/l/used-dometic-seastar-tfxchx8641p-top-mount-engine-control-liver-A1-39684-12"
-
-    config.OFFLINE = False
-
-    scraper = TroostwijkScraper()
-    scraper.offline = False
-
-    print(f"\n[1] Scraping: {test_url}")
-
-    from playwright.async_api import async_playwright
-    async with async_playwright() as p:
-        browser = await p.chromium.launch(headless=True)
-        context = await browser.new_context()
-        page = await context.new_page()
-
-        page_data = await scraper.crawl_page(page, test_url)
-
-        await browser.close()
-
-    if not page_data:
-        print("  [FAIL] No data returned")
-        return False
-
-    print(f"\n[2] Checking description...")
-    description = page_data.get('description', '')
-
-    if not description or description == '':
-        print(f"  [FAIL] Description is empty")
-        return False
-    else:
-        print(f"  [PASS] Description: {description[:100]}...")
-        return True
-
-
-async def main():
-    """Run all tests"""
-    print("\n" + "="*60)
-    print("MISSING FIELDS TEST SUITE")
-    print("="*60)
-
-    test1 = await test_lot_has_all_fields()
-    test2 = await test_lot_with_description()
-
-    print("\n" + "="*60)
-    if test1 and test2:
-        print("ALL TESTS PASSED")
-    else:
-        print("SOME TESTS FAILED")
-        if not test1:
-            print("  - test_lot_has_all_fields FAILED")
-        if not test2:
-            print("  - test_lot_with_description FAILED")
-    print("="*60 + "\n")
-
-    return 0 if (test1 and test2) else 1
-
-
-if __name__ == '__main__':
-    exit_code = asyncio.run(main())
-    sys.exit(exit_code)
--- a/test/test_scraper.py
+++ b/test/test_scraper.py
@@ -1,335 +0,0 @@
-#!/usr/bin/env python3
-"""
-Test suite for Troostwijk Scraper
-Tests both auction and lot parsing with cached data
-
-Requires Python 3.10+
-"""
-
-import sys
-
-# Require Python 3.10+
-if sys.version_info < (3, 10):
-    print("ERROR: This script requires Python 3.10 or higher")
-    print(f"Current version: {sys.version}")
-    sys.exit(1)
-
-import asyncio
-import json
-import sqlite3
-from datetime import datetime
-from pathlib import Path
-
-# Add parent directory to path
-sys.path.insert(0, str(Path(__file__).parent))
-
-from main import TroostwijkScraper, CacheManager, CACHE_DB
-
-# Test URLs - these will use cached data to avoid overloading the server
-TEST_AUCTIONS = [
-    "https://www.troostwijkauctions.com/a/online-auction-cnc-lathes-machining-centres-precision-measurement-romania-A7-39813",
-    "https://www.troostwijkauctions.com/a/faillissement-bab-shortlease-i-ii-b-v-%E2%80%93-2024-big-ass-energieopslagsystemen-A1-39557",
-    "https://www.troostwijkauctions.com/a/industriele-goederen-uit-diverse-bedrijfsbeeindigingen-A1-38675",
-]
-
-TEST_LOTS = [
-    "https://www.troostwijkauctions.com/l/%25282x%2529-duo-bureau-160x168-cm-A1-28505-5",
-    "https://www.troostwijkauctions.com/l/tos-sui-50-1000-universele-draaibank-A7-39568-9",
-    "https://www.troostwijkauctions.com/l/rolcontainer-%25282x%2529-A1-40191-101",
-]
-
-class TestResult:
-    def __init__(self, url, success, message, data=None):
-        self.url = url
-        self.success = success
-        self.message = message
-        self.data = data
-
-class ScraperTester:
-    def __init__(self):
-        self.scraper = TroostwijkScraper()
-        self.results = []
-
-    def check_cache_exists(self, url):
-        """Check if URL is cached"""
-        cached = self.scraper.cache.get(url, max_age_hours=999999)  # Get even old cache
-        return cached is not None
-
-    def test_auction_parsing(self, url):
-        """Test auction page parsing"""
-        print(f"\n{'='*70}")
-        print(f"Testing Auction: {url}")
-        print('='*70)
-
-        # Check cache
-        if not self.check_cache_exists(url):
-            return TestResult(
-                url,
-                False,
-                "❌ NOT IN CACHE - Please run scraper first to cache this URL",
-                None
-            )
-
-        # Get cached content
-        cached = self.scraper.cache.get(url, max_age_hours=999999)
-        content = cached['content']
-
-        print(f"✓ Cache hit (age: {(datetime.now().timestamp() - cached['timestamp']) / 3600:.1f} hours)")
-
-        # Parse
-        try:
-            data = self.scraper._parse_page(content, url)
-
-            if not data:
-                return TestResult(url, False, "❌ Parsing returned None", None)
-
-            if data.get('type') != 'auction':
-                return TestResult(
-                    url,
-                    False,
-                    f"❌ Expected type='auction', got '{data.get('type')}'",
-                    data
-                )
-
-            # Validate required fields
-            issues = []
-            required_fields = {
-                'auction_id': str,
-                'title': str,
-                'location': str,
-                'lots_count': int,
-                'first_lot_closing_time': str,
-            }
-
-            for field, expected_type in required_fields.items():
-                value = data.get(field)
-                if value is None or value == '':
-                    issues.append(f"  ❌ {field}: MISSING or EMPTY")
-                elif not isinstance(value, expected_type):
-                    issues.append(f"  ❌ {field}: Wrong type (expected {expected_type.__name__}, got {type(value).__name__})")
-                else:
-                    # Pretty print value
-                    display_value = str(value)[:60]
-                    print(f"  ✓ {field}: {display_value}")
-
-            if issues:
-                return TestResult(url, False, "\n".join(issues), data)
-
-            print(f"  ✓ lots_count: {data.get('lots_count')}")
-
-            return TestResult(url, True, "✅ All auction fields validated successfully", data)
-
-        except Exception as e:
-            return TestResult(url, False, f"❌ Exception during parsing: {e}", None)
-
-    def test_lot_parsing(self, url):
-        """Test lot page parsing"""
-        print(f"\n{'='*70}")
-        print(f"Testing Lot: {url}")
-        print('='*70)
-
-        # Check cache
-        if not self.check_cache_exists(url):
-            return TestResult(
-                url,
-                False,
-                "❌ NOT IN CACHE - Please run scraper first to cache this URL",
-                None
-            )
-
-        # Get cached content
-        cached = self.scraper.cache.get(url, max_age_hours=999999)
-        content = cached['content']
-
-        print(f"✓ Cache hit (age: {(datetime.now().timestamp() - cached['timestamp']) / 3600:.1f} hours)")
-
-        # Parse
-        try:
-            data = self.scraper._parse_page(content, url)
-
-            if not data:
-                return TestResult(url, False, "❌ Parsing returned None", None)
-
-            if data.get('type') != 'lot':
-                return TestResult(
-                    url,
-                    False,
-                    f"❌ Expected type='lot', got '{data.get('type')}'",
-                    data
-                )
-
-            # Validate required fields
-            issues = []
-            required_fields = {
-                'lot_id': (str, lambda x: x and len(x) > 0),
-                'title': (str, lambda x: x and len(x) > 3 and x not in ['...', 'N/A']),
-                'location': (str, lambda x: x and len(x) > 2 and x not in ['Locatie', 'Location']),
-                'current_bid': (str, lambda x: x and x not in ['€Huidig bod', 'Huidig bod']),
-                'closing_time': (str, lambda x: True),  # Can be empty
-                'images': (list, lambda x: True),  # Can be empty list
-            }
-
-            for field, (expected_type, validator) in required_fields.items():
-                value = data.get(field)
-
-                if value is None:
-                    issues.append(f"  ❌ {field}: MISSING (None)")
-                elif not isinstance(value, expected_type):
-                    issues.append(f"  ❌ {field}: Wrong type (expected {expected_type.__name__}, got {type(value).__name__})")
-                elif not validator(value):
-                    issues.append(f"  ❌ {field}: Invalid value: '{value}'")
-                else:
-                    # Pretty print value
-                    if field == 'images':
-                        print(f"  ✓ {field}: {len(value)} images")
-                        for i, img in enumerate(value[:3], 1):
-                            print(f"      {i}. {img[:60]}...")
-                    else:
-                        display_value = str(value)[:60]
-                        print(f"  ✓ {field}: {display_value}")
-
-            # Additional checks
-            if data.get('bid_count') is not None:
-                print(f"  ✓ bid_count: {data.get('bid_count')}")
-
-            if data.get('viewing_time'):
-                print(f"  ✓ viewing_time: {data.get('viewing_time')}")
-
-            if data.get('pickup_date'):
-                print(f"  ✓ pickup_date: {data.get('pickup_date')}")
-
-            if issues:
-                return TestResult(url, False, "\n".join(issues), data)
-
-            return TestResult(url, True, "✅ All lot fields validated successfully", data)
-
-        except Exception as e:
-            import traceback
-            return TestResult(url, False, f"❌ Exception during parsing: {e}\n{traceback.format_exc()}", None)
-
-    def run_all_tests(self):
-        """Run all tests"""
-        print("\n" + "="*70)
-        print("TROOSTWIJK SCRAPER TEST SUITE")
-        print("="*70)
-        print("\nThis test suite uses CACHED data only - no live requests to server")
-        print("="*70)
-
-        # Test auctions
-        print("\n" + "="*70)
-        print("TESTING AUCTIONS")
-        print("="*70)
-
-        for url in TEST_AUCTIONS:
-            result = self.test_auction_parsing(url)
-            self.results.append(result)
-
-        # Test lots
-        print("\n" + "="*70)
-        print("TESTING LOTS")
-        print("="*70)
-
-        for url in TEST_LOTS:
-            result = self.test_lot_parsing(url)
-            self.results.append(result)
-
-        # Summary
-        self.print_summary()
-
-    def print_summary(self):
-        """Print test summary"""
-        print("\n" + "="*70)
-        print("TEST SUMMARY")
-        print("="*70)
-
-        passed = sum(1 for r in self.results if r.success)
-        failed = sum(1 for r in self.results if not r.success)
-        total = len(self.results)
-
-        print(f"\nTotal tests: {total}")
-        print(f"Passed: {passed} ✓")
-        print(f"Failed: {failed} ✗")
-        print(f"Success rate: {passed/total*100:.1f}%")
-
-        if failed > 0:
-            print("\n" + "="*70)
-            print("FAILED TESTS:")
-            print("="*70)
-            for result in self.results:
-                if not result.success:
-                    print(f"\n{result.url}")
-                    print(result.message)
-                    if result.data:
-                        print("\nParsed data:")
-                        for key, value in result.data.items():
-                            if key != 'lots':  # Don't print full lots array
-                                print(f"  {key}: {str(value)[:80]}")
-
-        print("\n" + "="*70)
-
-        return failed == 0
-
-def check_cache_status():
-    """Check cache compression status"""
-    print("\n" + "="*70)
-    print("CACHE STATUS CHECK")
-    print("="*70)
-
-    try:
-        with sqlite3.connect(CACHE_DB) as conn:
-            # Total entries
-            cursor = conn.execute("SELECT COUNT(*) FROM cache")
-            total = cursor.fetchone()[0]
-
-            # Compressed vs uncompressed
-            cursor = conn.execute("SELECT COUNT(*) FROM cache WHERE compressed = 1")
-            compressed = cursor.fetchone()[0]
-
-            cursor = conn.execute("SELECT COUNT(*) FROM cache WHERE compressed = 0 OR compressed IS NULL")
-            uncompressed = cursor.fetchone()[0]
-
-            print(f"Total cache entries: {total}")
-            print(f"Compressed: {compressed} ({compressed/total*100:.1f}%)")
-            print(f"Uncompressed: {uncompressed} ({uncompressed/total*100:.1f}%)")
-
-            if uncompressed > 0:
-                print(f"\n⚠️  Warning: {uncompressed} entries are still uncompressed")
-                print("   Run: python migrate_compress_cache.py")
-            else:
-                print("\n✓ All cache entries are compressed!")
-
-            # Check test URLs
-            print(f"\n{'='*70}")
-            print("TEST URL CACHE STATUS:")
-            print('='*70)
-
-            all_test_urls = TEST_AUCTIONS + TEST_LOTS
-            cached_count = 0
-
-            for url in all_test_urls:
-                cursor = conn.execute("SELECT url FROM cache WHERE url = ?", (url,))
-                if cursor.fetchone():
-                    print(f"✓ {url[:60]}...")
-                    cached_count += 1
-                else:
-                    print(f"✗ {url[:60]}... (NOT CACHED)")
-
-            print(f"\n{cached_count}/{len(all_test_urls)} test URLs are cached")
-
-            if cached_count < len(all_test_urls):
-                print("\n⚠️  Some test URLs are not cached. Tests for those URLs will fail.")
-                print("   Run the main scraper to cache these URLs first.")
-
-    except Exception as e:
-        print(f"Error checking cache status: {e}")
-
-if __name__ == "__main__":
-    # Check cache status first
-    check_cache_status()
-
-    # Run tests
-    tester = ScraperTester()
-    success = tester.run_all_tests()
-
-    # Exit with appropriate code
-    sys.exit(0 if success else 1)
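The removed `test_lot_parsing` checked each parsed lot against a table that maps a field name to an expected type and a value check. The removed `print_summary` computed `passed/total*100`, which raises `ZeroDivisionError` when no results were collected.

The sketch below reproduces the table-driven check as a standalone helper and guards the percentage. The names `validate_fields`, `success_rate` and `LOT_RULES`, and the reduced rule set, are illustrative assumptions rather than code from the repository.

```python
from typing import Any, Callable

# Hypothetical rule table in the style of the removed test:
# field name -> (expected type, value check).
Rule = tuple[type, Callable[[Any], bool]]

LOT_RULES: dict[str, Rule] = {
    "lot_id": (str, lambda x: len(x) > 0),
    "title": (str, lambda x: len(x) > 3 and x not in ("...", "N/A")),
    "images": (list, lambda x: True),  # an empty list is allowed
}


def validate_fields(data: dict[str, Any], rules: dict[str, Rule]) -> list[str]:
    """Return one message per field that is missing, mistyped, or invalid."""
    issues: list[str] = []
    for field, (expected_type, check) in rules.items():
        value = data.get(field)
        if value is None:
            issues.append(f"{field}: MISSING (None)")
        elif not isinstance(value, expected_type):
            issues.append(
                f"{field}: wrong type (expected {expected_type.__name__}, "
                f"got {type(value).__name__})"
            )
        elif not check(value):  # type is correct, so the check is safe to call
            issues.append(f"{field}: invalid value {value!r}")
    return issues


def success_rate(passed: int, total: int) -> float:
    """Percentage of passed tests; 0.0 when nothing ran (avoids ZeroDivisionError)."""
    return passed / total * 100 if total else 0.0


if __name__ == "__main__":
    lot = {"lot_id": "A1-40179-35", "title": "N/A", "images": []}
    print(validate_fields(lot, LOT_RULES))  # ["title: invalid value 'N/A'"]
    print(f"{success_rate(0, 0):.1f}%")  # 0.0%
```

Checking the type before calling the validator means the lambdas do not need the `x and ...` guard used in the removed code.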
Author	SHA1	Message	Date
Tour	d7860adbaa	_internal_db	2025-12-10 08:04:04 +01:00
Tour	a71b3f36ec	_internal_db	2025-12-10 07:54:12 +01:00
Tour	5b0d2f78d6	0	2025-12-09 23:39:38 +01:00
Tour	3f5b93abdd	0	2025-12-09 23:30:24 +01:00
Tour	2dda1aff00	- Added targeted test to reproduce and validate handling of GraphQL 403 errors.
- Hardened the GraphQL client to reduce 403 occurrences and provide clearer diagnostics when they appear.
- Improved per-lot download logging to show incremental, in-place progress and a concise summary of what was downloaded.

### Details

1) Test case for 403 and investigation
   - New test file: `test/test_graphql_403.py`.
   - Uses `importlib` to load `src/config.py` and `src/graphql_client.py` directly so it’s independent of `sys.path` quirks.
   - Mocks `aiohttp.ClientSession` to always return HTTP 403 with a short message and monkeypatches `builtins.print` to capture logs.
   - Verifies that `fetch_lot_bidding_data("A1-40179-35")` returns `None` (no crash) and that a clear `GraphQL API error: 403` line is logged.
   - Result: `pytest test/test_graphql_403.py -q` passes locally.
   - Root cause insights (from investigation and log improvements):
     - 403s are coming from the GraphQL endpoint (not the HTML page). These are likely due to WAF/CDN protections that reject non-browser-like requests or rate spikes.
     - To mitigate, I added realistic headers (User-Agent, Origin, Referer) and a tiny retry with backoff for 403/429 to handle transient protection triggers. When a 403 persists, we now log the status and a safe, truncated snippet of the body for troubleshooting.

2) Incremental/in-place logging for downloads
   - Updated the image download section of `src/scraper.py` to:
     - Show in-place progress: `Downloading images: X/N`, updated live as each image finishes.
     - After completion, print: `Downloaded: K/N new images`.
     - Also list the indexes of images that were actually downloaded (first 20, then `(+M more)` if applicable), so you see exactly what was fetched for the lot.

3) GraphQL client improvements
   - Updated `src/graphql_client.py`:
     - Added browser-like headers and a contextual Referer.
     - Added a small retry with backoff for 403/429.
     - Improved error logs to include status, lot id, and a short body snippet.

### How your example logs will look now

For a lot where GraphQL returns 403:
```
Fetching lot data from API (concurrent)...
GraphQL API error: 403 (lot=A1-40179-35) — Forbidden by WAF
```

For image downloads:
```
Images: 6
Downloading images: 0/6 ... 6/6
Downloaded: 6/6 new images
Indexes: 0, 1, 2, 3, 4, 5
```
(When all are cached: `All 6 images already cached`)

### Notes
- The full test run surfaced a pre-existing import error in `test/test_scraper.py` (unrelated to these changes). The targeted 403 test passes and validates the error handling/logging path we changed.
- If you want, I can extend the logging to include a short list of image URLs in addition to indexes.	2025-12-09 22:56:10 +01:00
Tour	62d664c580	(same message as 2dda1aff00)	2025-12-09 20:53:54 +01:00
Tour	5ea2342dbc	(same message as 2dda1aff00)	2025-12-09 19:53:31 +01:00
Tour	570fd3870e	scaev	2025-12-09 11:54:19 +01:00
Tour	5a755a2125	(same message as 2dda1aff00)	2025-12-09 09:15:49 +01:00
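The commit message first shown for 2dda1aff00 describes a small retry with backoff for 403/429 responses. It also describes a download summary that lists the first 20 image indexes followed by `(+M more)`.

The real code in `src/graphql_client.py` and `src/scraper.py` is not shown here, so the following is only a sketch under stated assumptions:
- `send` stands in for whatever performs the GraphQL request and returns a `(status, body)` pair.
- The names `post_with_retry` and `format_indexes`, the attempt count and the delays are hypothetical.
- `sleep` is injectable so the backoff can be tested without waiting.

```python
import time
from typing import Callable

# Statuses the commit message says are retried (assumed set, per the message).
RETRYABLE = {403, 429}


def post_with_retry(
    send: Callable[[], tuple[int, str]],
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> tuple[int, str]:
    """Call `send` until it returns a non-retryable status or attempts run out.

    Waits base_delay, 2*base_delay, 4*base_delay, ... between attempts
    (exponential backoff) and returns the last (status, body) pair, so a
    persistent 403 is handed back to the caller for logging.
    """
    status, body = send()
    for attempt in range(1, attempts):
        if status not in RETRYABLE:
            break
        sleep(base_delay * 2 ** (attempt - 1))
        status, body = send()
    return status, body


def format_indexes(indexes: list[int], limit: int = 20) -> str:
    """Join the first `limit` indexes and append '(+M more)' for the rest."""
    shown = ", ".join(str(i) for i in indexes[:limit])
    extra = len(indexes) - limit
    return f"{shown} (+{extra} more)" if extra > 0 else shown


if __name__ == "__main__":
    replies = iter([(403, "Forbidden"), (429, "Too Many Requests"), (200, "{}")])
    waits: list[float] = []
    print(post_with_retry(lambda: next(replies), sleep=waits.append))  # (200, '{}')
    print(waits)  # [0.5, 1.0]
    print("Indexes:", format_indexes(list(range(6))))  # Indexes: 0, 1, 2, 3, 4, 5
```

Since the client described in the commit is built on `aiohttp`, an async variant would make `send` a coroutine and replace `time.sleep` with `await asyncio.sleep(...)`. The retry decision itself stays the same.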