# Integration Guide: Troostwijk Monitor ↔ Scraper
## Overview
This document describes how **Troostwijk Monitor** (this Java project) integrates with the **ARCHITECTURE-TROOSTWIJK-SCRAPER** (Python scraper process).
## Architecture
```
┌────────────────────────────────────────────────────┐
│      ARCHITECTURE-TROOSTWIJK-SCRAPER (Python)      │
│                                                    │
│  • Discovers auctions from website                 │
│  • Scrapes lot details via Playwright              │
│  • Parses __NEXT_DATA__ JSON                       │
│  • Stores image URLs (not downloads)               │
└──────────────────────────┬─────────────────────────┘
                           │ Writes to
                           ▼
┌────────────────────────────────────────────────────┐
│               SHARED SQLite DATABASE               │
│                     (cache.db)                     │
│                                                    │
│  ┌────────────┐  ┌────────────┐  ┌────────────┐    │
│  │  auctions  │  │    lots    │  │   images   │    │
│  │ (Scraper)  │  │ (Scraper)  │  │   (Both)   │    │
│  └────────────┘  └────────────┘  └────────────┘    │
└──────────────────────────┬─────────────────────────┘
                           │ Reads from / writes to
                           ▼
┌────────────────────────────────────────────────────┐
│      TROOSTWIJK MONITOR (Java - This Project)      │
│                                                    │
│  • Reads auction/lot data from database            │
│  • Downloads images from URLs                      │
│  • Runs YOLO object detection                      │
│  • Monitors bid changes                            │
│  • Sends notifications                             │
└────────────────────────────────────────────────────┘
```
## Database Schema Mapping
### Scraper Schema → Monitor Schema
The scraper and monitor use **slightly different schemas** that need to be reconciled:
| Scraper Table | Monitor Table | Integration Notes |
|---------------|---------------|-----------------------------------------------|
| `auctions` | `auctions` | ✅ **Compatible** - same structure |
| `lots` | `lots` | ⚠️ **Needs mapping** - field name differences |
| `images` | `images` | ⚠️ **Partial overlap** - different purposes |
| `cache` | N/A | ❌ Monitor doesn't use cache |
### Field Mapping: `auctions` Table
| Scraper Field | Monitor Field | Notes |
|--------------------------|-------------------------------|---------------------------------------------------------------------|
| `auction_id` (TEXT) | `auction_id` (INTEGER) | ⚠️ **TYPE MISMATCH** - Scraper uses "A7-39813", Monitor expects INT |
| `url` | `url` | ✅ Compatible |
| `title` | `title` | ✅ Compatible |
| `location` | `location`, `city`, `country` | ⚠️ Monitor splits into 3 fields |
| `lots_count` | `lot_count` | ⚠️ Name difference |
| `first_lot_closing_time` | `closing_time` | ⚠️ Name difference |
| `scraped_at` | `discovered_at` | ⚠️ Name + type difference (TEXT vs INTEGER timestamp) |
### Field Mapping: `lots` Table
| Scraper Field | Monitor Field | Notes |
|----------------------|----------------------|--------------------------------------------------|
| `lot_id` (TEXT) | `lot_id` (INTEGER) | ⚠️ **TYPE MISMATCH** - "A1-28505-5" vs INT |
| `auction_id` | `sale_id` | ⚠️ Different name |
| `url` | `url` | ✅ Compatible |
| `title` | `title` | ✅ Compatible |
| `current_bid` (TEXT) | `current_bid` (REAL) | ⚠️ **TYPE MISMATCH** - "€123.45" vs 123.45 |
| `bid_count` | N/A | Monitor doesn't track |
| `closing_time` | `closing_time` | ⚠️ Format difference (TEXT vs LocalDateTime) |
| `viewing_time` | N/A | Monitor doesn't track |
| `pickup_date` | N/A | Monitor doesn't track |
| `location` | N/A | Monitor doesn't track lot location separately |
| `description` | `description` | ✅ Compatible |
| `category` | `category` | ✅ Compatible |
| N/A | `manufacturer` | Monitor has additional field |
| N/A | `type` | Monitor has additional field |
| N/A | `year` | Monitor has additional field |
| N/A | `currency` | Monitor has additional field |
| N/A | `closing_notified` | Monitor tracking field |
### Field Mapping: `images` Table
| Scraper Field | Monitor Field | Notes |
|------------------------|--------------------------|----------------------------------------|
| `id` | `id` | ✅ Compatible |
| `lot_id` | `lot_id` | ⚠️ Type difference (TEXT vs INTEGER) |
| `url` | `url` | ✅ Compatible |
| `local_path` | `Local_path` | ⚠️ Capitalization differs (SQLite column names are case-insensitive) |
| `downloaded` (INTEGER) | N/A | Monitor uses `processed_at` instead |
| N/A | `labels` (TEXT) | Monitor adds detected objects |
| N/A | `processed_at` (INTEGER) | Monitor tracking field |
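
For reference, a minimal sketch of the monitor's pass over this table, assuming plain JDBC (sqlite-jdbc) and a row that carries both the scraper's `downloaded` flag and the monitor's `labels`/`processed_at` columns, as in the unified schema of Option 2 below. `download(...)` and `detectObjects(...)` are hypothetical stand-ins for the monitor's download and YOLO steps.

```java
import java.nio.file.Path;
import java.sql.*;
import java.util.ArrayList;
import java.util.List;

class ImagePass {

    record PendingImage(long id, String lotId, String url) {}

    static void processPendingImages(String dbPath) throws SQLException {
        try (Connection conn = DriverManager.getConnection("jdbc:sqlite:" + dbPath)) {
            // Collect pending rows first so the updates don't race the open cursor
            List<PendingImage> pending = new ArrayList<>();
            try (Statement st = conn.createStatement();
                 ResultSet rs = st.executeQuery(
                         "SELECT id, lot_id, url FROM images WHERE downloaded = 0")) {
                while (rs.next()) {
                    pending.add(new PendingImage(
                            rs.getLong("id"), rs.getString("lot_id"), rs.getString("url")));
                }
            }
            try (PreparedStatement update = conn.prepareStatement(
                    "UPDATE images SET local_path = ?, labels = ?, downloaded = 1, "
                    + "processed_at = strftime('%s','now') WHERE id = ?")) {
                for (PendingImage img : pending) {
                    Path local = download(img.url(), img.lotId());  // hypothetical download helper
                    String labels = detectObjects(local);           // hypothetical YOLO wrapper
                    update.setString(1, local.toString());
                    update.setString(2, labels);
                    update.setLong(3, img.id());
                    update.executeUpdate();
                }
            }
        }
    }

    // Hypothetical stubs; the real implementations live elsewhere in the monitor
    static Path download(String url, String lotId) { throw new UnsupportedOperationException(); }
    static String detectObjects(Path image) { throw new UnsupportedOperationException(); }
}
```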
## Integration Options
### Option 1: Database Schema Adapter (Recommended)
Create a compatibility layer that transforms scraper data to monitor format.
**Implementation:**
```java
// Add to DatabaseService.java
// Requires: java.sql.ResultSet, java.sql.SQLException, java.time.LocalDateTime

class ScraperDataAdapter {

    /**
     * Imports an auction row from scraper format to monitor format.
     */
    static AuctionInfo fromScraperAuction(ResultSet rs) throws SQLException {
        // Parse "A7-39813" → 39813
        String auctionIdStr = rs.getString("auction_id");
        int auctionId = extractNumericId(auctionIdStr);

        // Split "Cluj-Napoca, RO" → city="Cluj-Napoca", country="RO"
        String location = rs.getString("location");
        if (location == null) location = "";
        String[] parts = location.split(",\\s*");
        String city = parts.length > 0 ? parts[0] : "";
        String country = parts.length > 1 ? parts[1] : "";

        return new AuctionInfo(
                auctionId,
                rs.getString("title"),
                location,
                city,
                country,
                rs.getString("url"),
                extractTypePrefix(auctionIdStr),   // "A7-39813" → "A7"
                rs.getInt("lots_count"),
                parseTimestamp(rs.getString("first_lot_closing_time"))
        );
    }

    /**
     * Imports a lot row from scraper format to monitor format.
     */
    static Lot fromScraperLot(ResultSet rs) throws SQLException {
        // Parse "A1-28505-5" → 285055 (combine the digits after the prefix)
        String lotIdStr = rs.getString("lot_id");
        int lotId = extractNumericId(lotIdStr);

        // Parse "A7-39813" → 39813
        String auctionIdStr = rs.getString("auction_id");
        int saleId = extractNumericId(auctionIdStr);

        // Parse "€123.45" → 123.45
        String currentBidStr = rs.getString("current_bid");
        double currentBid = parseBid(currentBidStr);

        return new Lot(
                saleId,
                lotId,
                rs.getString("title"),
                rs.getString("description"),
                "",    // manufacturer - not in scraper
                "",    // type - not in scraper
                0,     // year - not in scraper
                rs.getString("category"),
                currentBid,
                "EUR", // currency - inferred from €
                rs.getString("url"),
                parseTimestamp(rs.getString("closing_time")),
                false  // not yet notified
        );
    }

    private static int extractNumericId(String id) {
        // "A7-39813" → 39813, "A1-28505-5" → 285055.
        // Strip the type prefix first; stripping *all* non-digits up front
        // would wrongly keep the prefix digit ("A7-39813" → 739813).
        int dashIndex = id.indexOf('-');
        String numericPart = dashIndex >= 0 ? id.substring(dashIndex + 1) : id;
        return Integer.parseInt(numericPart.replaceAll("[^0-9]", ""));
    }

    private static String extractTypePrefix(String id) {
        // "A7-39813" → "A7"
        int dashIndex = id.indexOf('-');
        return dashIndex > 0 ? id.substring(0, dashIndex) : "";
    }

    private static double parseBid(String bid) {
        // "€123.45" → 123.45; "No bids" (or null/empty) → 0.0
        if (bid == null || bid.contains("No")) return 0.0;
        String digits = bid.replaceAll("[^0-9.]", "");
        return digits.isEmpty() ? 0.0 : Double.parseDouble(digits);
    }

    private static LocalDateTime parseTimestamp(String timestamp) {
        if (timestamp == null) return null;
        // Assumes the scraper writes ISO 8601 timestamps, e.g. "2025-12-06T05:39:59"
        return LocalDateTime.parse(timestamp);
    }
}
```
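
A hypothetical import pass wiring the adapter into `DatabaseService`; `scraperConnection()`, `saveAuction(...)`, and `saveLot(...)` are illustrative names for methods the service would provide, not existing monitor APIs:

```java
// Hypothetical usage inside DatabaseService
void importFromScraper() throws SQLException {
    try (Statement st = scraperConnection().createStatement()) {
        try (ResultSet rs = st.executeQuery("SELECT * FROM auctions")) {
            while (rs.next()) {
                saveAuction(ScraperDataAdapter.fromScraperAuction(rs));
            }
        }
        try (ResultSet rs = st.executeQuery("SELECT * FROM lots")) {
            while (rs.next()) {
                saveLot(ScraperDataAdapter.fromScraperLot(rs));
            }
        }
    }
}
```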
### Option 2: Unified Schema (Better Long-term)
Modify **both** scraper and monitor to use a unified schema.
**Create**: `SHARED_SCHEMA.sql`
```sql
-- Unified schema that both projects use
CREATE TABLE IF NOT EXISTS auctions (
    auction_id         TEXT PRIMARY KEY,  -- Use TEXT to support "A7-39813"
    auction_id_numeric INTEGER,           -- For monitor's integer needs
    title              TEXT NOT NULL,
    location           TEXT,              -- Full: "Cluj-Napoca, RO"
    city               TEXT,              -- Parsed: "Cluj-Napoca"
    country            TEXT,              -- Parsed: "RO"
    url                TEXT NOT NULL,
    type               TEXT,              -- "A7", "A1"
    lot_count          INTEGER DEFAULT 0,
    closing_time       TEXT,              -- ISO 8601 format
    scraped_at         INTEGER,           -- Unix timestamp
    discovered_at      INTEGER            -- Unix timestamp (same as scraped_at)
);

CREATE TABLE IF NOT EXISTS lots (
    lot_id           TEXT PRIMARY KEY,    -- Use TEXT: "A1-28505-5"
    lot_id_numeric   INTEGER,             -- For monitor's integer needs
    auction_id       TEXT,                -- FK: "A7-39813"
    sale_id          INTEGER,             -- For monitor (same as auction_id_numeric)
    title            TEXT,
    description      TEXT,
    manufacturer     TEXT,
    type             TEXT,
    year             INTEGER,
    category         TEXT,
    current_bid_text TEXT,                -- "€123.45" or "No bids"
    current_bid      REAL,                -- 123.45
    bid_count        INTEGER,
    currency         TEXT DEFAULT 'EUR',
    url              TEXT UNIQUE,
    closing_time     TEXT,
    viewing_time     TEXT,
    pickup_date      TEXT,
    location         TEXT,
    closing_notified INTEGER DEFAULT 0,
    scraped_at       TEXT,
    FOREIGN KEY (auction_id) REFERENCES auctions(auction_id)
);

CREATE TABLE IF NOT EXISTS images (
    id           INTEGER PRIMARY KEY AUTOINCREMENT,
    lot_id       TEXT,                    -- FK: "A1-28505-5"
    url          TEXT,                    -- Image URL from website
    local_path   TEXT,                    -- Local path after download
    labels       TEXT,                    -- Detected objects (comma-separated)
    downloaded   INTEGER DEFAULT 0,       -- 0=pending, 1=downloaded
    processed_at INTEGER,                 -- Unix timestamp when processed
    FOREIGN KEY (lot_id) REFERENCES lots(lot_id)
);

-- Indexes
CREATE INDEX IF NOT EXISTS idx_auctions_country ON auctions(country);
CREATE INDEX IF NOT EXISTS idx_lots_auction_id ON lots(auction_id);
CREATE INDEX IF NOT EXISTS idx_images_lot_id ON images(lot_id);
CREATE INDEX IF NOT EXISTS idx_images_downloaded ON images(downloaded);
```
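
Either project can apply this file idempotently at startup, since every statement uses `IF NOT EXISTS`. A minimal Java sketch, assuming `SHARED_SCHEMA.sql` sits in the working directory (the path is an assumption):

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

class SchemaBootstrap {
    static void apply(String dbPath) throws Exception {
        String ddl = Files.readString(Path.of("SHARED_SCHEMA.sql"));
        try (Connection conn = DriverManager.getConnection("jdbc:sqlite:" + dbPath);
             Statement st = conn.createStatement()) {
            // Naive split on ';' is adequate here: the schema file contains no
            // semicolons inside strings or triggers
            for (String stmt : ddl.split(";")) {
                if (!stmt.isBlank()) st.execute(stmt);
            }
        }
    }
}
```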
### Option 3: API Integration (Most Flexible)
Have the scraper expose a REST API for the monitor to query.
```python
# In scraper: add a Flask API endpoint
# (CACHE_DB and extract_numeric_id come from the scraper's own config/helpers;
#  extract_numeric_id mirrors the Java extractNumericId above)
import sqlite3

from flask import Flask, jsonify

app = Flask(__name__)


@app.route('/api/auctions', methods=['GET'])
def get_auctions():
    """Returns auctions in monitor-compatible format."""
    conn = sqlite3.connect(CACHE_DB)
    cursor = conn.cursor()
    cursor.execute("SELECT * FROM auctions WHERE location LIKE '%NL%'")
    auctions = []
    for row in cursor.fetchall():
        auctions.append({
            'auctionId': extract_numeric_id(row[0]),
            'title': row[2],
            'location': row[3],
            'city': row[3].split(',')[0] if row[3] else '',
            'country': row[3].split(',')[1].strip() if row[3] and ',' in row[3] else '',
            'url': row[1],
            'type': row[0].split('-')[0],
            'lotCount': row[4],
            'closingTime': row[5],
        })
    conn.close()
    return jsonify(auctions)
```
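
On the monitor side, the JDK's built-in `java.net.http` client (Java 11+) is enough to consume this endpoint. A sketch; the host and port are assumptions (Flask defaults to 5000), and JSON parsing is left to whatever library the monitor already uses:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

class ScraperApiClient {
    static String fetchAuctions() throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:5000/api/auctions")) // adjust host/port as needed
                .GET()
                .build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.body(); // JSON array of monitor-compatible auction objects
    }
}
```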
## Recommended Integration Steps
### Phase 1: Immediate (Adapter Pattern)
1. ✅ Keep separate schemas
2. ✅ Create `ScraperDataAdapter` in Monitor
3. ✅ Add import methods to `DatabaseService`
4. ✅ Monitor reads from scraper's tables using adapter
### Phase 2: Short-term (Unified Schema)
1. 📋 Design unified schema (see Option 2)
2. 📋 Update scraper to use unified schema
3. 📋 Update monitor to use unified schema
4. 📋 Migrate existing data
### Phase 3: Long-term (API + Event-driven)
1. 📋 Add REST API to scraper
2. 📋 Add webhook/event notification when new data arrives (see the sketch after this list)
3. 📋 Monitor subscribes to events
4. 📋 Process images asynchronously
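
For step 2, a minimal sketch of a monitor-side listener using the JDK's built-in `com.sun.net.httpserver` (no extra dependencies); the `/webhook` path and port 8080 are assumptions:

```java
import com.sun.net.httpserver.HttpServer;

import java.net.InetSocketAddress;

class WebhookListener {
    public static void main(String[] args) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        server.createContext("/webhook", exchange -> {
            // The scraper would POST here after each run; the body could carry
            // the IDs of new/updated lots so the monitor knows what to process
            byte[] body = exchange.getRequestBody().readAllBytes();
            System.out.println("Scrape finished: " + new String(body));
            exchange.sendResponseHeaders(204, -1); // 204 No Content, empty response
            exchange.close();
            // Hand off to the image pipeline asynchronously (step 4)
        });
        server.start();
        System.out.println("Listening for scraper events on :8080/webhook");
    }
}
```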
## Current Integration Flow
### Scraper Process (Python)
```bash
# 1. Run scraper to populate database
cd /path/to/scraper
python scraper.py
# Output:
# ✅ Scraped 42 auctions
# ✅ Scraped 1,234 lots
# ✅ Saved 3,456 image URLs
# ✅ Data written to: /mnt/okcomputer/output/cache.db
```
### Monitor Process (Java)
```bash
# 2. Run monitor to process the data
cd /path/to/monitor
export DATABASE_FILE=/mnt/okcomputer/output/cache.db
java -jar troostwijk-monitor.jar
# Output:
# 📊 Current Database State:
# Total lots in database: 1,234
# Total images processed: 0
#
# [1/2] Processing images...
# Downloading and analyzing 3,456 images...
#
# [2/2] Starting bid monitoring...
# ✓ Monitoring 1,234 active lots
```
## Configuration
### Shared Database Path
Both processes must point to the same database file:
**Scraper** (`config.py`):
```python
CACHE_DB = '/mnt/okcomputer/output/cache.db'
```
**Monitor** (`Main.java`):
```java
String databaseFile = System.getenv().getOrDefault(
"DATABASE_FILE",
"/mnt/okcomputer/output/cache.db"
);
```
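
Since both processes open the same SQLite file, it is also worth enabling WAL mode and a busy timeout so the monitor's reads don't fail while the scraper is writing. A minimal monitor-side sketch (the scraper can issue the same PRAGMAs through `sqlite3`):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

class SharedDbConnection {
    static Connection open(String dbPath) throws Exception {
        Connection conn = DriverManager.getConnection("jdbc:sqlite:" + dbPath);
        try (Statement st = conn.createStatement()) {
            st.execute("PRAGMA journal_mode=WAL");  // readers no longer block the writer
            st.execute("PRAGMA busy_timeout=5000"); // wait up to 5s on a lock instead of erroring
        }
        return conn;
    }
}
```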
### Recommended Directory Structure
```
/mnt/okcomputer/
├── scraper/                    # Python scraper code
│   ├── scraper.py
│   └── requirements.txt
├── monitor/                    # Java monitor code
│   ├── troostwijk-monitor.jar
│   └── models/                 # YOLO models
│       ├── yolov4.cfg
│       ├── yolov4.weights
│       └── coco.names
└── output/                     # Shared data directory
    ├── cache.db                # Shared SQLite database
    └── images/                 # Downloaded images
        ├── A1-28505-5/
        │   ├── 001.jpg
        │   └── 002.jpg
        └── ...
```
## Monitoring & Coordination
### Option A: Sequential Execution
```bash
#!/bin/bash
# run-pipeline.sh
echo "Step 1: Scraping..."
python scraper/scraper.py
echo "Step 2: Processing images..."
java -jar monitor/troostwijk-monitor.jar --process-images-only
echo "Step 3: Starting monitor..."
java -jar monitor/troostwijk-monitor.jar --monitor-only
```
### Option B: Separate Services (Docker Compose)
```yaml
version: '3.8'

services:
  scraper:
    build: ./scraper
    volumes:
      - ./output:/data
    environment:
      - CACHE_DB=/data/cache.db
    command: python scraper.py

  monitor:
    build: ./monitor
    volumes:
      - ./output:/data
    environment:
      - DATABASE_FILE=/data/cache.db
      - NOTIFICATION_CONFIG=desktop
    depends_on:
      - scraper
    command: java -jar troostwijk-monitor.jar
```
### Option C: Cron-based Scheduling
```cron
# Scrape every 6 hours
0 */6 * * * cd /mnt/okcomputer/scraper && python scraper.py
# Process images every hour (if new lots found)
0 * * * * cd /mnt/okcomputer/monitor && java -jar monitor.jar --process-new
# Monitor runs continuously
@reboot cd /mnt/okcomputer/monitor && java -jar monitor.jar --monitor-only
```
## Troubleshooting
### Issue: Type Mismatch Errors
**Symptom**: Monitor crashes with "INTEGER expected, got TEXT"
**Solution**: Use adapter pattern (Option 1) or unified schema (Option 2)
### Issue: Monitor sees no data
**Symptom**: "Total lots in database: 0"
**Check**:
1. Is `DATABASE_FILE` env var set correctly?
2. Did scraper actually write data?
3. Are both processes using the same database file?
```bash
# Verify database has data
sqlite3 /mnt/okcomputer/output/cache.db "SELECT COUNT(*) FROM lots"
```
### Issue: Images not downloading
**Symptom**: "Total images processed: 0" but scraper found images
**Check**:
1. Does the scraper actually write image URLs to the `images` table?
2. Does the monitor see pending rows? Try: `sqlite3 /mnt/okcomputer/output/cache.db "SELECT COUNT(*) FROM images WHERE downloaded=0"`
3. Field name mapping: `local_path` (scraper) vs `Local_path` (monitor) — the names differ only in capitalization, which SQLite treats as equivalent
## Next Steps
1. **Immediate**: Implement `ScraperDataAdapter` for compatibility
2. **This Week**: Test end-to-end integration with sample data
3. **Next Sprint**: Migrate to unified schema
4. **Future**: Add event-driven architecture with webhooks