# Integration Guide: Troostwijk Monitor ↔ Scraper

## Overview

This document describes how **Troostwijk Monitor** (this Java project) integrates with the **ARCHITECTURE-TROOSTWIJK-SCRAPER** (Python scraper process).

## Architecture

```
┌─────────────────────────────────────────────────────────────────┐
│            ARCHITECTURE-TROOSTWIJK-SCRAPER (Python)             │
│                                                                 │
│  • Discovers auctions from website                              │
│  • Scrapes lot details via Playwright                           │
│  • Parses __NEXT_DATA__ JSON                                    │
│  • Stores image URLs (not downloads)                            │
│                                                                 │
│          ↓ Writes to                                            │
└─────────┼───────────────────────────────────────────────────────┘
          │
          ▼
┌─────────────────────────────────────────────────────────────────┐
│                     SHARED SQLite DATABASE                      │
│                           (cache.db)                            │
│                                                                 │
│  ┌────────────────┐  ┌────────────────┐  ┌────────────────┐     │
│  │    auctions    │  │      lots      │  │     images     │     │
│  │   (Scraper)    │  │   (Scraper)    │  │     (Both)     │     │
│  └────────────────┘  └────────────────┘  └────────────────┘     │
│                                                                 │
│         ↑ Reads from                    ↓ Writes to             │
└─────────┼──────────────────────────────┼───────────────────────┘
          │                              │
          │                              ▼
┌─────────┴──────────────────────────────────────────────────────┐
│             TROOSTWIJK MONITOR (Java - This Project)           │
│                                                                │
│  • Reads auction/lot data from database                        │
│  • Downloads images from URLs                                  │
│  • Runs YOLO object detection                                  │
│  • Monitors bid changes                                        │
│  • Sends notifications                                         │
└────────────────────────────────────────────────────────────────┘
```

## Database Schema Mapping

### Scraper Schema → Monitor Schema

The scraper and monitor use **slightly different schemas** that need to be reconciled:

| Scraper Table | Monitor Table | Integration Notes |
|---------------|---------------|-------------------|
| `auctions` | `auctions` | ✅ **Compatible** - same structure |
| `lots` | `lots` | ⚠️ **Needs mapping** - field name differences |
| `images` | `images` | ⚠️ **Partial overlap** - different purposes |
| `cache` | N/A | ❌ Monitor doesn't use cache |

### Field Mapping: `auctions` Table

| Scraper Field | Monitor Field | Notes |
|---------------|---------------|-------|
| `auction_id` (TEXT) | `auction_id` (INTEGER) | ⚠️ **TYPE MISMATCH** - Scraper uses "A7-39813", Monitor expects INT |
| `url` | `url` | ✅ Compatible |
| `title` | `title` | ✅ Compatible |
| `location` | `location`, `city`, `country` | ⚠️ Monitor splits into 3 fields |
| `lots_count` | `lot_count` | ⚠️ Name difference |
| `first_lot_closing_time` | `closing_time` | ⚠️ Name difference |
| `scraped_at` | `discovered_at` | ⚠️ Name + type difference (TEXT vs INTEGER timestamp) |

### Field Mapping: `lots` Table

| Scraper Field | Monitor Field | Notes |
|---------------|---------------|-------|
| `lot_id` (TEXT) | `lot_id` (INTEGER) | ⚠️ **TYPE MISMATCH** - "A1-28505-5" vs INT |
| `auction_id` | `sale_id` | ⚠️ Different name |
| `url` | `url` | ✅ Compatible |
| `title` | `title` | ✅ Compatible |
| `current_bid` (TEXT) | `current_bid` (REAL) | ⚠️ **TYPE MISMATCH** - "€123.45" vs 123.45 |
| `bid_count` | N/A | ℹ️ Monitor doesn't track |
| `closing_time` | `closing_time` | ⚠️ Format difference (TEXT vs LocalDateTime) |
| `viewing_time` | N/A | ℹ️ Monitor doesn't track |
| `pickup_date` | N/A | ℹ️ Monitor doesn't track |
| `location` | N/A | ℹ️ Monitor doesn't track lot location separately |
| `description` | `description` | ✅ Compatible |
| `category` | `category` | ✅ Compatible |
| N/A | `manufacturer` | ℹ️ Monitor has additional field |
| N/A | `type` | ℹ️ Monitor has additional field |
| N/A | `year` | ℹ️ Monitor has additional field |
| N/A | `currency` | ℹ️ Monitor has additional field |
| N/A | `closing_notified` | ℹ️ Monitor tracking field |

### Field Mapping: `images` Table

| Scraper Field | Monitor Field | Notes |
|---------------|---------------|-------|
| `id` | `id` | ✅ Compatible |
| `lot_id` | `lot_id` | ⚠️ Type difference (TEXT vs INTEGER) |
| `url` | `url` | ✅ Compatible |
| `local_path` | `file_path` | ⚠️ Different name |
| `downloaded` (INTEGER) | N/A | ℹ️ Monitor uses `processed_at` instead |
| N/A | `labels` (TEXT) | ℹ️ Monitor adds detected objects |
| N/A | `processed_at` (INTEGER) | ℹ️ Monitor tracking field |
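Because the scraper writes `local_path` while the monitor expects `file_path`, the read query can simply alias the column. A minimal sketch against a throwaway in-memory copy of the `images` table (the real one lives in the shared database):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE images (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    lot_id TEXT, url TEXT, local_path TEXT,
    downloaded INTEGER DEFAULT 0)""")
conn.execute("INSERT INTO images (lot_id, url) "
             "VALUES ('A1-28505-5', 'https://example.com/1.jpg')")

# Alias local_path -> file_path so monitor-side code sees its own field name
pending = conn.execute(
    "SELECT id, lot_id, url, local_path AS file_path "
    "FROM images WHERE downloaded = 0").fetchall()
```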

## Integration Options

### Option 1: Database Schema Adapter (Recommended)

Create a compatibility layer that transforms scraper data to monitor format.

**Implementation:**
```java
// Add to DatabaseService.java
// Requires: java.sql.ResultSet, java.sql.SQLException, java.time.LocalDateTime
class ScraperDataAdapter {

    /**
     * Imports an auction from scraper format to monitor format.
     */
    static AuctionInfo fromScraperAuction(ResultSet rs) throws SQLException {
        // Parse "A7-39813" → 39813
        String auctionIdStr = rs.getString("auction_id");
        int auctionId = extractNumericId(auctionIdStr);

        // Split "Cluj-Napoca, RO" → city="Cluj-Napoca", country="RO"
        String location = rs.getString("location");
        String[] parts = location == null ? new String[0] : location.split(",\\s*");
        String city = parts.length > 0 ? parts[0] : "";
        String country = parts.length > 1 ? parts[1] : "";

        return new AuctionInfo(
                auctionId,
                rs.getString("title"),
                location,
                city,
                country,
                rs.getString("url"),
                extractTypePrefix(auctionIdStr), // "A7-39813" → "A7"
                rs.getInt("lots_count"),
                parseTimestamp(rs.getString("first_lot_closing_time"))
        );
    }

    /**
     * Imports a lot from scraper format to monitor format.
     */
    static Lot fromScraperLot(ResultSet rs) throws SQLException {
        // Parse "A1-28505-5" → 285055 (digits concatenated)
        String lotIdStr = rs.getString("lot_id");
        int lotId = extractNumericId(lotIdStr);

        // Parse "A7-39813" → 39813
        String auctionIdStr = rs.getString("auction_id");
        int saleId = extractNumericId(auctionIdStr);

        // Parse "€123.45" → 123.45
        String currentBidStr = rs.getString("current_bid");
        double currentBid = parseBid(currentBidStr);

        return new Lot(
                saleId,
                lotId,
                rs.getString("title"),
                rs.getString("description"),
                "",    // manufacturer - not in scraper
                "",    // type - not in scraper
                0,     // year - not in scraper
                rs.getString("category"),
                currentBid,
                "EUR", // currency - inferred from €
                rs.getString("url"),
                parseTimestamp(rs.getString("closing_time")),
                false  // not yet notified
        );
    }

    private static int extractNumericId(String id) {
        // "A7-39813" → 39813
        // "A1-28505-5" → 285055
        return Integer.parseInt(id.replaceAll("[^0-9]", ""));
    }

    private static String extractTypePrefix(String id) {
        // "A7-39813" → "A7"
        int dashIndex = id.indexOf('-');
        return dashIndex > 0 ? id.substring(0, dashIndex) : "";
    }

    private static double parseBid(String bid) {
        // "€123.45" → 123.45; "No bids" → 0.0
        // Assumes a dot decimal separator in the scraped text
        if (bid == null || bid.contains("No")) return 0.0;
        return Double.parseDouble(bid.replaceAll("[^0-9.]", ""));
    }

    private static LocalDateTime parseTimestamp(String timestamp) {
        if (timestamp == null) return null;
        // Assumes ISO-8601, e.g. "2024-01-15T14:30:00"
        return LocalDateTime.parse(timestamp);
    }
}
```

### Option 2: Unified Schema (Better Long-term)

Modify **both** scraper and monitor to use a unified schema.

**Create**: `SHARED_SCHEMA.sql`
```sql
-- Unified schema that both projects use

CREATE TABLE IF NOT EXISTS auctions (
    auction_id TEXT PRIMARY KEY,      -- Use TEXT to support "A7-39813"
    auction_id_numeric INTEGER,       -- For monitor's integer needs
    title TEXT NOT NULL,
    location TEXT,                    -- Full: "Cluj-Napoca, RO"
    city TEXT,                        -- Parsed: "Cluj-Napoca"
    country TEXT,                     -- Parsed: "RO"
    url TEXT NOT NULL,
    type TEXT,                        -- "A7", "A1"
    lot_count INTEGER DEFAULT 0,
    closing_time TEXT,                -- ISO 8601 format
    scraped_at INTEGER,               -- Unix timestamp
    discovered_at INTEGER             -- Unix timestamp (same as scraped_at)
);

CREATE TABLE IF NOT EXISTS lots (
    lot_id TEXT PRIMARY KEY,          -- Use TEXT: "A1-28505-5"
    lot_id_numeric INTEGER,           -- For monitor's integer needs
    auction_id TEXT,                  -- FK: "A7-39813"
    sale_id INTEGER,                  -- For monitor (same as auction_id_numeric)
    title TEXT,
    description TEXT,
    manufacturer TEXT,
    type TEXT,
    year INTEGER,
    category TEXT,
    current_bid_text TEXT,            -- "€123.45" or "No bids"
    current_bid REAL,                 -- 123.45
    bid_count INTEGER,
    currency TEXT DEFAULT 'EUR',
    url TEXT UNIQUE,
    closing_time TEXT,
    viewing_time TEXT,
    pickup_date TEXT,
    location TEXT,
    closing_notified INTEGER DEFAULT 0,
    scraped_at TEXT,
    FOREIGN KEY (auction_id) REFERENCES auctions(auction_id)
);

CREATE TABLE IF NOT EXISTS images (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    lot_id TEXT,                      -- FK: "A1-28505-5"
    url TEXT,                         -- Image URL from website
    file_path TEXT,                   -- Local path after download
    local_path TEXT,                  -- Alias for compatibility
    labels TEXT,                      -- Detected objects (comma-separated)
    downloaded INTEGER DEFAULT 0,     -- 0=pending, 1=downloaded
    processed_at INTEGER,             -- Unix timestamp when processed
    FOREIGN KEY (lot_id) REFERENCES lots(lot_id)
);

-- Indexes
CREATE INDEX IF NOT EXISTS idx_auctions_country ON auctions(country);
CREATE INDEX IF NOT EXISTS idx_lots_auction_id ON lots(auction_id);
CREATE INDEX IF NOT EXISTS idx_images_lot_id ON images(lot_id);
CREATE INDEX IF NOT EXISTS idx_images_downloaded ON images(downloaded);
```

### Option 3: API Integration (Most Flexible)

Have the scraper expose a REST API for the monitor to query.

```python
# In scraper: add a Flask API endpoint
# Assumes the scraper's Flask `app`, CACHE_DB, and extract_numeric_id()
# are defined elsewhere in the scraper
from flask import jsonify
import sqlite3

@app.route('/api/auctions', methods=['GET'])
def get_auctions():
    """Returns auctions in monitor-compatible format (example: NL only)"""
    conn = sqlite3.connect(CACHE_DB)
    cursor = conn.cursor()
    cursor.execute("SELECT * FROM auctions WHERE location LIKE '%NL%'")

    auctions = []
    for row in cursor.fetchall():
        location = row[3] or ''
        auctions.append({
            'auctionId': extract_numeric_id(row[0]),
            'title': row[2],
            'location': location,
            'city': location.split(',')[0] if location else '',
            'country': location.split(',')[1].strip() if ',' in location else '',
            'url': row[1],
            'type': row[0].split('-')[0],
            'lotCount': row[4],
            'closingTime': row[5]
        })

    conn.close()
    return jsonify(auctions)
```

## Recommended Integration Steps

### Phase 1: Immediate (Adapter Pattern)
1. ✅ Keep separate schemas
2. ✅ Create `ScraperDataAdapter` in Monitor
3. ✅ Add import methods to `DatabaseService`
4. ✅ Monitor reads from scraper's tables using adapter

### Phase 2: Short-term (Unified Schema)
1. 📋 Design unified schema (see Option 2)
2. 📋 Update scraper to use unified schema
3. 📋 Update monitor to use unified schema
4. 📋 Migrate existing data

### Phase 3: Long-term (API + Event-driven)
1. 📋 Add REST API to scraper
2. 📋 Add webhook/event notification when new data arrives
3. 📋 Monitor subscribes to events
4. 📋 Process images asynchronously

## Current Integration Flow

### Scraper Process (Python)
```bash
# 1. Run scraper to populate database
cd /path/to/scraper
python scraper.py

# Output:
# ✅ Scraped 42 auctions
# ✅ Scraped 1,234 lots
# ✅ Saved 3,456 image URLs
# ✅ Data written to: /mnt/okcomputer/output/cache.db
```

### Monitor Process (Java)
```bash
# 2. Run monitor to process the data
cd /path/to/monitor
export DATABASE_FILE=/mnt/okcomputer/output/cache.db
java -jar troostwijk-monitor.jar

# Output:
# 📊 Current Database State:
#    Total lots in database: 1,234
#    Total images processed: 0
#
# [1/2] Processing images...
# Downloading and analyzing 3,456 images...
#
# [2/2] Starting bid monitoring...
# ✓ Monitoring 1,234 active lots
```

## Configuration

### Shared Database Path
Both processes must point to the same database file:

**Scraper** (`config.py`):
```python
CACHE_DB = '/mnt/okcomputer/output/cache.db'
```

**Monitor** (`Main.java`):
```java
String databaseFile = System.getenv().getOrDefault(
        "DATABASE_FILE",
        "/mnt/okcomputer/output/cache.db"
);
```
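A small preflight check can confirm the two settings actually resolve to the same file before either process starts (a sketch; `SCRAPER_CACHE_DB` stands in for the value from `config.py`):

```python
import os

def monitor_db_path():
    """Mirror Main.java's lookup: DATABASE_FILE env var, else the shared default."""
    return os.environ.get("DATABASE_FILE", "/mnt/okcomputer/output/cache.db")

# The scraper's setting (from config.py) must resolve to the same file
SCRAPER_CACHE_DB = "/mnt/okcomputer/output/cache.db"

def paths_match():
    return os.path.abspath(monitor_db_path()) == os.path.abspath(SCRAPER_CACHE_DB)
```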

### Recommended Directory Structure
```
/mnt/okcomputer/
├── scraper/                  # Python scraper code
│   ├── scraper.py
│   └── requirements.txt
├── monitor/                  # Java monitor code
│   ├── troostwijk-monitor.jar
│   └── models/               # YOLO models
│       ├── yolov4.cfg
│       ├── yolov4.weights
│       └── coco.names
└── output/                   # Shared data directory
    ├── cache.db              # Shared SQLite database
    └── images/               # Downloaded images
        ├── A1-28505-5/
        │   ├── 001.jpg
        │   └── 002.jpg
        └── ...
```

## Monitoring & Coordination

### Option A: Sequential Execution
```bash
#!/bin/bash
# run-pipeline.sh

echo "Step 1: Scraping..."
python scraper/scraper.py

echo "Step 2: Processing images..."
java -jar monitor/troostwijk-monitor.jar --process-images-only

echo "Step 3: Starting monitor..."
java -jar monitor/troostwijk-monitor.jar --monitor-only
```

### Option B: Separate Services (Docker Compose)
```yaml
version: '3.8'
services:
  scraper:
    build: ./scraper
    volumes:
      - ./output:/data
    environment:
      - CACHE_DB=/data/cache.db
    command: python scraper.py

  monitor:
    build: ./monitor
    volumes:
      - ./output:/data
    environment:
      - DATABASE_FILE=/data/cache.db
      - NOTIFICATION_CONFIG=desktop
    depends_on:
      - scraper
    command: java -jar troostwijk-monitor.jar
```

### Option C: Cron-based Scheduling
```cron
# Scrape every 6 hours
0 */6 * * * cd /mnt/okcomputer/scraper && python scraper.py

# Process images every hour (if new lots found)
0 * * * * cd /mnt/okcomputer/monitor && java -jar monitor.jar --process-new

# Monitor runs continuously
@reboot cd /mnt/okcomputer/monitor && java -jar monitor.jar --monitor-only
```

## Troubleshooting

### Issue: Type Mismatch Errors
**Symptom**: Monitor crashes with "INTEGER expected, got TEXT"

**Solution**: Use the adapter pattern (Option 1) or the unified schema (Option 2).

### Issue: Monitor sees no data
**Symptom**: "Total lots in database: 0"

**Check**:
1. Is the `DATABASE_FILE` env var set correctly?
2. Did the scraper actually write data?
3. Are both processes using the same database file?

```bash
# Verify the database has data
sqlite3 /mnt/okcomputer/output/cache.db "SELECT COUNT(*) FROM lots"
```
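The same check can be extended into a small Python preflight that also verifies the expected tables exist (a sketch; `preflight` is a hypothetical helper):

```python
import sqlite3

REQUIRED_TABLES = {"auctions", "lots", "images"}

def preflight(db_path):
    """Fail fast if tables are missing; return per-table row counts."""
    conn = sqlite3.connect(db_path)
    present = {name for (name,) in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")}
    missing = REQUIRED_TABLES - present
    if missing:
        raise RuntimeError("Missing tables: %s" % sorted(missing))
    counts = {t: conn.execute("SELECT COUNT(*) FROM " + t).fetchone()[0]
              for t in sorted(REQUIRED_TABLES)}
    conn.close()
    return counts
```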

### Issue: Images not downloading
**Symptom**: "Total images processed: 0" even though the scraper found images

**Check**:
1. Does the scraper write image URLs to the `images` table?
2. Does the monitor read from the `images` table with `downloaded=0`?
3. Field name mapping: `local_path` vs `file_path`

## Next Steps

1. **Immediate**: Implement `ScraperDataAdapter` for compatibility
2. **This Week**: Test end-to-end integration with sample data
3. **Next Sprint**: Migrate to unified schema
4. **Future**: Add event-driven architecture with webhooks