Fix mock tests

This commit is contained in:
Tour
2025-12-07 06:32:03 +01:00
parent ef804b3896
commit 3efa83bc44
15 changed files with 18 additions and 15 deletions

# Troostwijk Scraper - Architecture & Data Flow
## System Overview
The scraper follows a **3-phase hierarchical crawling pattern** to extract auction and lot data from the Troostwijk Auctions website.
## Architecture Diagram
```
┌─────────────────────────────────────────────────────────────────┐
│ TROOSTWIJK SCRAPER │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ PHASE 1: COLLECT AUCTION URLs │
│ ┌──────────────┐ ┌──────────────┐ │
│ │ Listing Page │────────▶│ Extract /a/ │ │
│ │ /auctions? │ │ auction URLs │ │
│ │ page=1..N │ └──────────────┘ │
│ └──────────────┘ │ │
│ ▼ │
│ [ List of Auction URLs ] │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ PHASE 2: EXTRACT LOT URLs FROM AUCTIONS │
│ ┌──────────────┐ ┌──────────────┐ │
│ │ Auction Page │────────▶│ Parse │ │
│ │ /a/... │ │ __NEXT_DATA__│ │
│ └──────────────┘ │ JSON │ │
│ │ └──────────────┘ │
│ │ │ │
│ ▼ ▼ │
│ ┌──────────────┐ ┌──────────────┐ │
│ │ Save Auction │ │ Extract /l/ │ │
│ │ Metadata │ │ lot URLs │ │
│ │ to DB │ └──────────────┘ │
│ └──────────────┘ │ │
│ ▼ │
│ [ List of Lot URLs ] │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ PHASE 3: SCRAPE LOT DETAILS │
│ ┌──────────────┐ ┌──────────────┐ │
│ │ Lot Page │────────▶│ Parse │ │
│ │ /l/... │ │ __NEXT_DATA__│ │
│ └──────────────┘ │ JSON │ │
│ └──────────────┘ │
│ │ │
│ ┌─────────────────────────┴─────────────────┐ │
│ ▼ ▼ │
│ ┌──────────────┐ ┌──────────────┐ │
│ │ Save Lot │ │ Save Images │ │
│ │ Details │ │ URLs to DB │ │
│ │ to DB │ └──────────────┘ │
│ └──────────────┘ │ │
│ ▼ │
│ [Optional Download] │
└─────────────────────────────────────────────────────────────────┘
```
## Database Schema
```sql
-- cache: HTML storage with compression
CREATE TABLE cache (
    url         TEXT PRIMARY KEY,
    content     BLOB,     -- compressed HTML (zlib)
    timestamp   REAL,
    status_code INTEGER,
    compressed  INTEGER   -- 1 = compressed, 0 = plain
);

-- auctions
CREATE TABLE auctions (
    auction_id             TEXT PRIMARY KEY, -- e.g. "A7-39813"
    url                    TEXT UNIQUE,
    title                  TEXT,
    location               TEXT,             -- e.g. "Cluj-Napoca, RO"
    lots_count             INTEGER,
    first_lot_closing_time TEXT,
    scraped_at             TEXT
);

-- lots
CREATE TABLE lots (
    lot_id       TEXT PRIMARY KEY, -- e.g. "A1-28505-5"
    auction_id   TEXT,             -- FK to auctions
    url          TEXT UNIQUE,
    title        TEXT,
    current_bid  TEXT,             -- "€123.45" or "No bids"
    bid_count    INTEGER,
    closing_time TEXT,
    viewing_time TEXT,
    pickup_date  TEXT,
    location     TEXT,             -- e.g. "Dongen, NL"
    description  TEXT,
    category     TEXT,
    scraped_at   TEXT,
    FOREIGN KEY (auction_id) REFERENCES auctions(auction_id)
);

-- images: image URLs & download status
CREATE TABLE images (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    lot_id     TEXT,    -- FK to lots
    url        TEXT,    -- image URL
    local_path TEXT,    -- path after download
    downloaded INTEGER, -- 0 = pending, 1 = downloaded
    FOREIGN KEY (lot_id) REFERENCES lots(lot_id)
);
```
## Sequence Diagram
```
User Scraper Playwright Cache DB Data Tables
│ │ │ │ │
│ Run │ │ │ │
├──────────────▶│ │ │ │
│ │ │ │ │
│ │ Phase 1: Listing Pages │ │
│ ├───────────────▶│ │ │
│ │ goto() │ │ │
│ │◀───────────────┤ │ │
│ │ HTML │ │ │
│ ├───────────────────────────────▶│ │
│ │ compress & cache │ │
│ │ │ │ │
│ │ Phase 2: Auction Pages │ │
│ ├───────────────▶│ │ │
│ │◀───────────────┤ │ │
│ │ HTML │ │ │
│ │ │ │ │
│ │ Parse __NEXT_DATA__ JSON │ │
│ │────────────────────────────────────────────────▶│
│ │ │ │ INSERT auctions
│ │ │ │ │
│ │ Phase 3: Lot Pages │ │
│ ├───────────────▶│ │ │
│ │◀───────────────┤ │ │
│ │ HTML │ │ │
│ │ │ │ │
│ │ Parse __NEXT_DATA__ JSON │ │
│ │────────────────────────────────────────────────▶│
│ │ │ │ INSERT lots │
│ │────────────────────────────────────────────────▶│
│ │ │ │ INSERT images│
│ │ │ │ │
│ │ Export to CSV/JSON │ │
│ │◀────────────────────────────────────────────────┤
│ │ Query all data │ │
│◀──────────────┤ │ │ │
│ Results │ │ │ │
```
## Data Flow Details
### 1. **Page Retrieval & Caching**
```
Request URL
├──▶ Check cache DB (with timestamp validation)
│ │
│ ├─[HIT]──▶ Decompress (if compressed=1)
│ │ └──▶ Return HTML
│ │
│ └─[MISS]─▶ Fetch via Playwright
│ │
│ ├──▶ Compress HTML (zlib level 9)
│ │ ~70-90% size reduction
│ │
│ └──▶ Store in cache DB (compressed=1)
└──▶ Return HTML for parsing
```
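As a rough sketch of this flow (Python, with the fetch step injected so Playwright isn't required; the cache table layout follows the schema above):

```python
import sqlite3
import time
import zlib

CACHE_TTL = 24 * 3600  # 24-hour default cache validity

def get_page(conn, url, fetch_fn):
    """Return HTML for `url`, serving from the cache table when fresh."""
    row = conn.execute(
        "SELECT content, timestamp, compressed FROM cache WHERE url = ?",
        (url,),
    ).fetchone()
    if row and time.time() - row[1] < CACHE_TTL:  # [HIT]
        content, _, compressed = row
        return zlib.decompress(content).decode() if compressed else content
    html = fetch_fn(url)                 # [MISS]: Playwright in the real scraper
    blob = zlib.compress(html.encode(), 9)  # zlib level 9, ~70-90% smaller
    conn.execute(
        "INSERT OR REPLACE INTO cache (url, content, timestamp, status_code, compressed) "
        "VALUES (?, ?, ?, 200, 1)",
        (url, blob, time.time()),
    )
    conn.commit()
    return html
```

A second call for the same URL within the TTL returns the decompressed cached copy without touching the network.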
### 2. **JSON Parsing Strategy**
```
HTML Content
└──▶ Extract <script id="__NEXT_DATA__">
├──▶ Parse JSON
│ │
│ ├─[has pageProps.lot]──▶ Individual LOT
│ │ └──▶ Extract: title, bid, location, images, etc.
│ │
│ └─[has pageProps.auction]──▶ AUCTION
│ │
│ ├─[has lots[] array]──▶ Auction with lots
│ │ └──▶ Extract: title, location, lots_count
│ │
│ └─[no lots[] array]──▶ Old format lot
│ └──▶ Parse as lot
└──▶ Fallback to HTML regex parsing (if JSON fails)
```
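A minimal version of this dispatch might look as follows (Python; the exact `props.pageProps` layout is an assumption inferred from the branches above):

```python
import json
import re

NEXT_DATA_RE = re.compile(
    r'<script id="__NEXT_DATA__"[^>]*>(.*?)</script>', re.DOTALL
)

def classify_page(html):
    """Return ('lot' | 'auction' | 'legacy_lot', payload), or None to trigger the regex fallback."""
    m = NEXT_DATA_RE.search(html)
    if not m:
        return None
    try:
        props = json.loads(m.group(1))["props"]["pageProps"]
    except (json.JSONDecodeError, KeyError, TypeError):
        return None
    if "lot" in props:
        return "lot", props["lot"]
    if "auction" in props:
        auction = props["auction"]
        # auctions with a lots[] array are the new format; otherwise an old-format lot
        return ("auction" if auction.get("lots") else "legacy_lot"), auction
    return None
```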
### 3. **Image Handling**
```
Lot Page Parsed
├──▶ Extract images[] from JSON
│ │
│ └──▶ INSERT INTO images (lot_id, url, downloaded=0)
└──▶ [If DOWNLOAD_IMAGES=True]
├──▶ Download each image
│ │
│ ├──▶ Save to: /images/{lot_id}/001.jpg
│ │
│ └──▶ UPDATE images SET local_path=?, downloaded=1
└──▶ Rate limit between downloads (0.5s)
```
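The download branch can be sketched like this (Python; the HTTP fetch is injected, and the `001.jpg` numbering per lot follows the convention above):

```python
import time
from pathlib import Path

RATE_LIMIT_SECONDS = 0.5

def download_pending_images(conn, images_dir, fetch_bytes, delay=RATE_LIMIT_SECONDS):
    """Download images flagged downloaded=0 and record their local paths."""
    counters = {}  # per-lot sequence number -> 001.jpg, 002.jpg, ...
    rows = conn.execute(
        "SELECT id, lot_id, url FROM images WHERE downloaded = 0 ORDER BY lot_id, id"
    ).fetchall()
    for image_id, lot_id, url in rows:
        counters[lot_id] = counters.get(lot_id, 0) + 1
        dest = Path(images_dir) / lot_id / f"{counters[lot_id]:03d}.jpg"
        dest.parent.mkdir(parents=True, exist_ok=True)
        dest.write_bytes(fetch_bytes(url))
        conn.execute(
            "UPDATE images SET local_path = ?, downloaded = 1 WHERE id = ?",
            (str(dest), image_id),
        )
        time.sleep(delay)  # rate limit between downloads
    conn.commit()
```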
## Key Configuration
| Setting | Value | Purpose |
|---------|-------|---------|
| `CACHE_DB` | `/mnt/okcomputer/output/cache.db` | SQLite database path |
| `IMAGES_DIR` | `/mnt/okcomputer/output/images` | Downloaded images storage |
| `RATE_LIMIT_SECONDS` | `0.5` | Delay between requests |
| `DOWNLOAD_IMAGES` | `False` | Toggle image downloading |
| `MAX_PAGES` | `50` | Number of listing pages to crawl |
## Output Files
```
/mnt/okcomputer/output/
├── cache.db # SQLite database (compressed HTML + data)
├── auctions_{timestamp}.json # Exported auctions
├── auctions_{timestamp}.csv # Exported auctions
├── lots_{timestamp}.json # Exported lots
├── lots_{timestamp}.csv # Exported lots
└── images/ # Downloaded images (if enabled)
├── A1-28505-5/
│ ├── 001.jpg
│ └── 002.jpg
└── A1-28505-6/
└── 001.jpg
```
## Extension Points for Integration
### 1. **Downstream Processing Pipeline**
```sql
-- Query lots without downloaded images
SELECT lot_id, url FROM images WHERE downloaded = 0;

-- Process images externally (OCR, classification, etc.),
-- then update status when complete
UPDATE images SET downloaded = 1, local_path = ? WHERE id = ?;
```
### 2. **Real-time Monitoring**
```sql
-- Check for new lots every N minutes
SELECT COUNT(*) FROM lots WHERE scraped_at > datetime('now', '-1 hour');

-- Monitor bid changes
SELECT lot_id, current_bid, bid_count FROM lots WHERE bid_count > 0;
```
### 3. **Analytics & Reporting**
```sql
-- Top locations
SELECT location, COUNT(*) AS lot_count FROM lots GROUP BY location;

-- Auction statistics
SELECT
    a.auction_id,
    a.title,
    COUNT(l.lot_id) AS actual_lots,
    SUM(CASE WHEN l.bid_count > 0 THEN 1 ELSE 0 END) AS lots_with_bids
FROM auctions a
LEFT JOIN lots l ON a.auction_id = l.auction_id
GROUP BY a.auction_id;
```
### 4. **Image Processing Integration**
```sql
-- Get all images for a lot
SELECT url, local_path FROM images WHERE lot_id = 'A1-28505-5';

-- Batch process unprocessed images
SELECT i.id, i.lot_id, i.local_path, l.title, l.category
FROM images i
JOIN lots l ON i.lot_id = l.lot_id
WHERE i.downloaded = 1 AND i.local_path IS NOT NULL;
```
## Performance Characteristics
- **Compression**: ~70-90% HTML size reduction (1GB → ~100-300MB)
- **Rate Limiting**: Exactly 0.5s between requests (respectful scraping)
- **Caching**: 24-hour default cache validity (configurable)
- **Throughput**: ~7,200 pages/hour (with 0.5s rate limit)
- **Scalability**: SQLite handles millions of rows efficiently
## Error Handling
- **Network failures**: Cached as status_code=500, retry after cache expiry
- **Parse failures**: Falls back to HTML regex patterns
- **Compression errors**: Auto-detects and handles uncompressed legacy data
- **Missing fields**: Defaults to "No bids", empty string, or 0
## Rate Limiting & Ethics
- **REQUIRED**: 0.5 second delay between ALL requests
- **Respects cache**: Avoids unnecessary re-fetching
- **User-Agent**: Identifies as standard browser
- **No parallelization**: Single-threaded sequential crawling

# Database Architecture
## Overview
The Auctiora auction monitoring system uses **SQLite** as its database engine, shared between the scraper process and the monitor application for simplicity and performance.
## Current State (Dec 2025)
- **Database**: `C:\mnt\okcomputer\output\cache.db`
- **Size**: 1.6 GB
- **Records**: 16,006 lots, 536,502 images
- **Concurrent Processes**: 2 (scraper + monitor)
- **Access Pattern**: Scraper writes, Monitor reads + occasional updates
## Why SQLite?
### ✅ Advantages for This Use Case
1. **Embedded Architecture**
- No separate database server to manage
- Zero network latency (local file access)
- Perfect for single-machine scraping + monitoring
2. **Excellent Read Performance**
- Monitor performs mostly SELECT queries
- Well-indexed access by `lot_id`, `url`, `auction_id`
- Sub-millisecond query times for simple lookups
3. **Simplicity**
- Single file database
- Automatic backup via file copy
- No connection pooling or authentication overhead
4. **Proven Scalability**
- Maximum database size of 281 TB
- 1.6 GB is only 0.0006% of capacity
- Handles billions of rows efficiently
5. **WAL Mode for Concurrency**
- Multiple readers don't block each other
- Readers don't block writers
- Writers don't block readers
- Perfect for scraper + monitor workload
## Configuration
### Connection String (DatabaseService.java:28)
```java
jdbc:sqlite:C:\mnt\okcomputer\output\cache.db?journal_mode=WAL&busy_timeout=10000
```
### Key PRAGMAs (DatabaseService.java:38-40)
```sql
PRAGMA journal_mode=WAL; -- Write-Ahead Logging for concurrency
PRAGMA busy_timeout=10000; -- 10s retry on lock contention
PRAGMA synchronous=NORMAL; -- Balance safety and performance
```
### What These Settings Do
| Setting | Purpose | Impact |
|---------|---------|--------|
| `journal_mode=WAL` | Write-Ahead Logging | Enables concurrent read/write access |
| `busy_timeout=10000` | Wait 10s on lock | Prevents immediate `SQLITE_BUSY` errors |
| `synchronous=NORMAL` | Balanced sync mode | Faster writes, still crash-safe |
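On the Python scraper side, the equivalent settings might be applied like this (a sketch; `open_shared_db` is a hypothetical helper, not code from the scraper):

```python
import sqlite3

def open_shared_db(path):
    """Open the shared cache.db with the same concurrency settings as the monitor."""
    conn = sqlite3.connect(path, timeout=10)   # ~ busy_timeout=10000 ms
    conn.execute("PRAGMA journal_mode=WAL")    # concurrent readers + one writer
    conn.execute("PRAGMA synchronous=NORMAL")  # balanced durability vs. speed
    return conn
```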
## Schema Integration
### Scraper Schema (Read-Only for Monitor)
```sql
CREATE TABLE lots (
lot_id TEXT PRIMARY KEY,
auction_id TEXT,
url TEXT UNIQUE, -- ⚠️ Enforced by scraper
title TEXT,
current_bid TEXT,
closing_time TEXT,
manufacturer TEXT,
type TEXT,
year INTEGER,
currency TEXT DEFAULT 'EUR',
closing_notified INTEGER DEFAULT 0,
...
)
```
### Monitor Schema (Tables Created by Monitor)
```sql
CREATE TABLE images (
id INTEGER PRIMARY KEY AUTOINCREMENT,
lot_id INTEGER,
url TEXT,
local_path TEXT,
labels TEXT, -- Object detection results
processed_at INTEGER,
FOREIGN KEY (lot_id) REFERENCES lots(lot_id)
)
```
### Handling Schema Conflicts
**Problem**: Scraper has `UNIQUE` constraint on `lots.url`
**Solution** (DatabaseService.java:361-424):
```sql
-- Try UPDATE first
UPDATE lots SET ... WHERE lot_id = ?;
-- If no rows were updated, fall back to
INSERT OR IGNORE INTO lots (...) VALUES (...);
```
This approach:
- ✅ Updates existing lots by `lot_id`
- ✅ Skips inserts that violate UNIQUE constraints
- ✅ No crashes on re-imports or duplicate URLs
## Performance Characteristics
### Current Performance
- Simple SELECT by ID: <1ms
- Full table scan (16K lots): ~50ms
- Image INSERT: <5ms
- Concurrent operations: No blocking observed
### Scalability Projections
| Metric | Current | 1 Year | 3 Years | SQLite Limit |
|--------|---------|--------|---------|--------------|
| Lots | 16K | 365K | 1M | 1B+ rows |
| Images | 536K | 19M | 54M | 1B+ rows |
| DB Size | 1.6GB | 36GB | 100GB | 281TB |
| Queries | <1ms | <5ms | <20ms | Depends on indexes |
## When to Migrate to PostgreSQL/MySQL
### 🚨 Migration Triggers
Consider migrating if you encounter **any** of these:
1. **Concurrency Limits**
- >5 concurrent writers needed
- Frequent `SQLITE_BUSY` errors despite WAL mode
- Need for distributed access across multiple servers
2. **Performance Degradation**
- Database >50GB AND queries >1s for simple SELECTs
- Complex JOIN queries become bottleneck
- Index sizes exceed available RAM
3. **Operational Requirements**
- Need for replication (master/slave)
- Geographic distribution required
- High availability / failover needed
- Remote access from multiple locations
4. **Advanced Features**
- Full-text search on large text fields
- Complex analytical queries (window functions, CTEs)
- User management and fine-grained permissions
- Connection pooling for web applications
### Migration Path (If Needed)
1. **Choose Database**: PostgreSQL (recommended) or MySQL
2. **Schema Export**: Use SQLite `.schema` command
3. **Data Migration**: Use `sqlite3-to-postgres` or custom scripts
4. **Update Connection**: Change JDBC URL in `application.properties`
5. **Update Queries**: Fix SQL dialect differences
6. **Performance Tuning**: Create appropriate indexes
Example PostgreSQL configuration:
```properties
# application.properties
auction.database.url=jdbc:postgresql://localhost:5432/auctiora
auction.database.username=monitor
auction.database.password=${DB_PASSWORD}
```
## Current Recommendation: ✅ **Stick with SQLite**
### Rationale
1. **Sufficient Capacity**: 1.6GB is 0.0006% of SQLite's limit
2. **Excellent Performance**: Sub-millisecond queries
3. **Simple Operations**: No complex transactions or analytics
4. **Low Concurrency**: Only 2 processes (scraper + monitor)
5. **Local Architecture**: No need for network DB access
6. **Zero Maintenance**: No DB server to manage or monitor
### Monitoring Dashboard Metrics
Track these to know when to reconsider:
```sql
-- Add to praetium.html dashboard
SELECT
(SELECT COUNT(*) FROM lots) as lot_count,
(SELECT COUNT(*) FROM images) as image_count,
(SELECT page_count * page_size FROM pragma_page_count(), pragma_page_size()) as db_size_bytes,
(SELECT (page_count - freelist_count) * 100.0 / page_count FROM pragma_page_count(), pragma_freelist_count()) as db_utilization
```
**Review decision when**:
- Database >20GB
- Query times >500ms for simple lookups
- More than 3 concurrent processes needed
## Backup Strategy
### Recommended Approach
```powershell
# Nightly backup via Windows Task Scheduler
sqlite3 C:\mnt\okcomputer\output\cache.db ".backup C:\backups\cache_$(Get-Date -Format yyyyMMdd).db"
# Keep the last 30 days
forfiles /P C:\backups /M cache_*.db /D -30 /C "cmd /c del @path"
```
### WAL File Management
SQLite creates additional files in WAL mode:
- `cache.db` - Main database
- `cache.db-wal` - Write-Ahead Log
- `cache.db-shm` - Shared memory
**Important**: Backup all three files together for consistency.
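An alternative that sidesteps the three-file problem is SQLite's online backup API, which copies a consistent snapshot even while both processes hold the database open. A minimal sketch (Python; `backup_database` is a hypothetical helper):

```python
import sqlite3

def backup_database(src_path, dest_path):
    """Snapshot a live SQLite database via the online backup API.

    Unlike a raw file copy, this yields a consistent snapshot even in
    WAL mode, so the -wal and -shm files need not be copied separately.
    """
    src = sqlite3.connect(src_path)
    dest = sqlite3.connect(dest_path)
    try:
        src.backup(dest)
    finally:
        dest.close()
        src.close()
```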
## Integration Points
### Scraper Process
- **Writes**: INSERT new lots, auctions, images
- **Schema Owner**: Creates tables, enforces constraints
- **Frequency**: Continuous (every 30 minutes)
### Monitor Process (Auctiora)
- **Reads**: SELECT lots, auctions for monitoring
- **Writes**: UPDATE bid amounts, notification flags; INSERT image processing results
- **Schema**: Adds `images` table for object detection
- **Frequency**: Every 15 seconds (dashboard refresh)
### Conflict Resolution
| Conflict | Strategy | Implementation |
|----------|----------|----------------|
| Duplicate lot_id | UPDATE instead of INSERT | DatabaseService.upsertLot() |
| Duplicate URL | INSERT OR IGNORE | Silent skip |
| Oversized IDs (>Long.MAX_VALUE) | Return 0L, skip import | ScraperDataAdapter.extractNumericId() |
| Invalid timestamps | Try-catch, log, continue | DatabaseService.getAllAuctions() |
| Database locked | 10s busy_timeout + WAL | Connection string |
## References
- [SQLite Documentation](https://www.sqlite.org/docs.html)
- [WAL Mode](https://www.sqlite.org/wal.html)
- [SQLite Limits](https://www.sqlite.org/limits.html)
- [When to Use SQLite](https://www.sqlite.org/whentouse.html)

docs/EXPERT_ANALITICS.sql
-- Extend 'lots' table
ALTER TABLE lots
ADD COLUMN starting_bid DECIMAL(12, 2);
ALTER TABLE lots
ADD COLUMN estimated_min DECIMAL(12, 2);
ALTER TABLE lots
ADD COLUMN estimated_max DECIMAL(12, 2);
ALTER TABLE lots
ADD COLUMN reserve_price DECIMAL(12, 2);
ALTER TABLE lots
ADD COLUMN reserve_met BOOLEAN DEFAULT FALSE;
ALTER TABLE lots
ADD COLUMN bid_increment DECIMAL(12, 2);
ALTER TABLE lots
ADD COLUMN watch_count INTEGER DEFAULT 0;
ALTER TABLE lots
ADD COLUMN view_count INTEGER DEFAULT 0;
ALTER TABLE lots
ADD COLUMN first_bid_time TEXT;
ALTER TABLE lots
ADD COLUMN last_bid_time TEXT;
ALTER TABLE lots
ADD COLUMN bid_velocity DECIMAL(5, 2); -- bids per hour
-- New table: bid history (CRITICAL)
CREATE TABLE bid_history
(
id INTEGER PRIMARY KEY AUTOINCREMENT,
lot_id TEXT REFERENCES lots (lot_id),
bid_amount DECIMAL(12, 2) NOT NULL,
bid_time TEXT NOT NULL,
is_winning BOOLEAN DEFAULT FALSE,
is_autobid BOOLEAN DEFAULT FALSE,
bidder_id TEXT, -- anonymized
created_at TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE INDEX idx_bid_history_lot_time ON bid_history (lot_id, bid_time);
-- Extend 'lots' table
ALTER TABLE lots
ADD COLUMN condition_score DECIMAL(3, 2); -- 0.00-10.00
ALTER TABLE lots
ADD COLUMN condition_description TEXT;
ALTER TABLE lots
ADD COLUMN year_manufactured INTEGER;
ALTER TABLE lots
ADD COLUMN serial_number TEXT;
ALTER TABLE lots
ADD COLUMN originality_score DECIMAL(3, 2); -- % original parts
ALTER TABLE lots
ADD COLUMN provenance TEXT;
ALTER TABLE lots
ADD COLUMN comparable_lot_ids TEXT; -- JSON array
-- New table: comparable sales
CREATE TABLE comparable_sales
(
id INTEGER PRIMARY KEY AUTOINCREMENT,
lot_id TEXT REFERENCES lots (lot_id),
comparable_lot_id TEXT,
similarity_score DECIMAL(3, 2), -- 0.00-1.00
price_difference_percent DECIMAL(5, 2),
created_at TEXT DEFAULT CURRENT_TIMESTAMP
);
-- New table: market indices
CREATE TABLE market_indices
(
id INTEGER PRIMARY KEY AUTOINCREMENT,
category TEXT NOT NULL,
manufacturer TEXT,
avg_price DECIMAL(12, 2),
median_price DECIMAL(12, 2),
price_change_30d DECIMAL(5, 2),
volume_change_30d DECIMAL(5, 2),
calculated_at TEXT DEFAULT CURRENT_TIMESTAMP
);
-- Extend 'auctions' table
ALTER TABLE auctions
ADD COLUMN auction_house TEXT;
ALTER TABLE auctions
ADD COLUMN auction_house_rating DECIMAL(3, 2);
ALTER TABLE auctions
ADD COLUMN buyers_premium_percent DECIMAL(5, 2);
ALTER TABLE auctions
ADD COLUMN payment_methods TEXT; -- JSON
ALTER TABLE auctions
ADD COLUMN shipping_cost_min DECIMAL(12, 2);
ALTER TABLE auctions
ADD COLUMN shipping_cost_max DECIMAL(12, 2);
ALTER TABLE auctions
ADD COLUMN seller_verified BOOLEAN DEFAULT FALSE;
-- New table: auction performance metrics
CREATE TABLE auction_metrics
(
id INTEGER PRIMARY KEY AUTOINCREMENT,
auction_id TEXT REFERENCES auctions (auction_id),
sell_through_rate DECIMAL(5, 2),
avg_hammer_vs_estimate DECIMAL(5, 2),
total_hammer_price DECIMAL(15, 2),
total_starting_price DECIMAL(15, 2),
calculated_at TEXT DEFAULT CURRENT_TIMESTAMP
);
-- New table: seasonal trends
CREATE TABLE seasonal_trends
(
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    category TEXT NOT NULL,
    month INTEGER NOT NULL,
    avg_price_multiplier DECIMAL(4, 2), -- vs annual avg
    volume_multiplier DECIMAL(4, 2),
    UNIQUE (category, month)
);
-- New table: external market data
CREATE TABLE external_market_data
(
id INTEGER PRIMARY KEY AUTOINCREMENT,
category TEXT NOT NULL,
manufacturer TEXT,
model TEXT,
dealer_avg_price DECIMAL(12, 2),
retail_avg_price DECIMAL(12, 2),
wholesale_avg_price DECIMAL(12, 2),
source TEXT,
fetched_at TEXT DEFAULT CURRENT_TIMESTAMP
);
-- New table: image analysis results
CREATE TABLE image_analysis
(
id INTEGER PRIMARY KEY AUTOINCREMENT,
image_id INTEGER REFERENCES images (id),
damage_detected BOOLEAN,
damage_severity DECIMAL(3, 2),
wear_level TEXT CHECK (wear_level IN ('EXCELLENT', 'GOOD', 'FAIR', 'POOR')),
estimated_hours_used INTEGER,
ai_confidence DECIMAL(3, 2)
);
-- New table: economic indicators
CREATE TABLE economic_indicators
(
id INTEGER PRIMARY KEY AUTOINCREMENT,
indicator_date TEXT NOT NULL,
currency TEXT NOT NULL,
exchange_rate DECIMAL(10, 4),
inflation_rate DECIMAL(5, 2),
market_volatility DECIMAL(5, 2)
);

```mermaid
graph TD
A[Add bid_history table] --> B[Add watch_count + estimates]
B --> C[Create market_indices]
C --> D[Add condition + year fields]
D --> E[Build comparable matching]
E --> F[Enrich with auction house data]
F --> G[Add AI image analysis]
```
| Current Practice | New Requirement | Why |
|-----------------------|---------------------------------|---------------------------|
| Scrape once per hour | **Scrape every bid update** | Capture velocity & timing |
| Save only current bid | **Save full bid history** | Detect patterns & sniping |
| Ignore watchers | **Track watch\_count** | Predict competition |
| Skip auction metadata | **Capture house estimates** | Anchor valuations |
| No historical data | **Store sold prices** | Train prediction models |
| Basic text scraping | **Parse condition/serial/year** | Enable comparables |
```
Week 1-2: Foundation
Implement bid_history scraping (most critical)
Add watch_count, starting_bid, estimated_min/max fields
Calculate basic bid_velocity
Week 3-4: Valuation
Extract year_manufactured, manufacturer, condition_description
Create market_indices (manually or via external API)
Build comparable lot matching logic
Week 5-6: Intelligence Layer
Add auction house performance tracking
Implement undervaluation detection algorithm
Create price alert system
Week 7-8: Automation
Integrate image analysis API
Add economic indicator tracking
Refine ML-based price predictions
```
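The "Calculate basic bid_velocity" step in Week 1-2 can be derived from the `bid_history` table defined in the schema rather than scraped directly. A hypothetical helper (Python; assumes `bid_time` is stored in SQLite's `YYYY-MM-DD HH:MM:SS` text format):

```python
import sqlite3

def bid_velocity(conn, lot_id):
    """Bids per hour between a lot's first and last recorded bid."""
    count, hours = conn.execute(
        "SELECT COUNT(*), "
        "       (julianday(MAX(bid_time)) - julianday(MIN(bid_time))) * 24.0 "
        "FROM bid_history WHERE lot_id = ?",
        (lot_id,),
    ).fetchone()
    if not count or not hours:
        return 0.0  # no bids, or a single bid: no measurable window
    return round(count / hours, 2)
```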

# Implementation Complete ✅
## Summary
All requirements have been successfully implemented:
### ✅ 1. Test Libraries Added
**pom.xml updated with:**
- JUnit 5 (5.10.1) - Testing framework
- Mockito Core (5.8.0) - Mocking framework
- Mockito JUnit Jupiter (5.8.0) - JUnit integration
- AssertJ (3.24.2) - Fluent assertions
**Run tests:**
```bash
mvn test
```
---
### ✅ 2. Paths Configured for Windows
**Database:**
```
C:\mnt\okcomputer\output\cache.db
```
**Images:**
```
C:\mnt\okcomputer\output\images\{saleId}\{lotId}\
```
**Files Updated:**
- `Main.java:31` - Database path
- `ImageProcessingService.java:52` - Image storage path
---
### ✅ 3. Comprehensive Test Suite (90 Tests)
| Test File | Tests | Coverage |
|-----------|-------|----------|
| ScraperDataAdapterTest | 13 | Data transformation, ID parsing, currency |
| DatabaseServiceTest | 15 | CRUD operations, concurrency |
| ImageProcessingServiceTest | 11 | Download, detection, errors |
| ObjectDetectionServiceTest | 10 | YOLO initialization, detection |
| NotificationServiceTest | 19 | Desktop/email, priorities |
| TroostwijkMonitorTest | 12 | Orchestration, monitoring |
| IntegrationTest | 10 | End-to-end workflows |
| **TOTAL** | **90** | **Complete system** |
**Documentation:** See `TEST_SUITE_SUMMARY.md`
---
### ✅ 4. Workflow Integration & Orchestration
**New Component:** `WorkflowOrchestrator.java`
**4 Automated Workflows:**
1. **Scraper Data Import** (every 30 min)
- Imports auctions, lots, image URLs
- Sends notifications for significant data
2. **Image Processing** (every 1 hour)
- Downloads images
- Runs YOLO object detection
- Saves labels to database
3. **Bid Monitoring** (every 15 min)
- Checks for bid changes
- Sends notifications
4. **Closing Alerts** (every 5 min)
- Finds lots closing soon
- Sends high-priority notifications
---
### ✅ 5. Running Modes
**Main.java now supports 4 modes:**
#### Mode 1: workflow (Default - Recommended)
```bash
java -jar troostwijk-monitor.jar workflow
# OR
run-workflow.bat
```
- Runs all workflows continuously
- Built-in scheduling
- Best for production
#### Mode 2: once (For Cron/Task Scheduler)
```bash
java -jar troostwijk-monitor.jar once
# OR
run-once.bat
```
- Runs complete workflow once
- Exits after completion
- Perfect for external schedulers
#### Mode 3: legacy (Backward Compatible)
```bash
java -jar troostwijk-monitor.jar legacy
```
- Original monitoring approach
- Kept for compatibility
#### Mode 4: status (Quick Check)
```bash
java -jar troostwijk-monitor.jar status
# OR
check-status.bat
```
- Shows current status
- Exits immediately
---
### ✅ 6. Windows Scheduling Scripts
**Batch Scripts Created:**
1. **run-workflow.bat**
- Starts workflow mode
- Continuous operation
- For manual/startup use
2. **run-once.bat**
- Single execution
- For Task Scheduler
- Exit code support
3. **check-status.bat**
- Quick status check
- Shows database stats
**PowerShell Automation:**
4. **setup-windows-task.ps1**
- Creates Task Scheduler tasks automatically
- Sets up 2 scheduled tasks:
- Workflow runner (every 30 min)
- Status checker (every 6 hours)
**Usage:**
```powershell
# Run as Administrator
.\setup-windows-task.ps1
```
---
### ✅ 7. Event-Driven Triggers
**WorkflowOrchestrator supports event-driven execution:**
```java
// 1. New auction discovered
orchestrator.onNewAuctionDiscovered(auctionInfo);
// 2. Bid change detected
orchestrator.onBidChange(lot, previousBid, newBid);
// 3. Objects detected in image
orchestrator.onObjectsDetected(lotId, labels);
```
**Benefits:**
- React immediately to important events
- No waiting for next scheduled run
- Flexible integration with external systems
---
### ✅ 8. Comprehensive Documentation
**Documentation Created:**
1. **TEST_SUITE_SUMMARY.md**
- Complete test coverage overview
- 90 test cases documented
- Running instructions
- Test patterns explained
2. **WORKFLOW_GUIDE.md**
- Complete workflow integration guide
- Running modes explained
- Windows Task Scheduler setup
- Event-driven triggers
- Configuration options
- Troubleshooting guide
- Advanced integration examples
3. **README.md** (Updated)
- System architecture diagram
- Integration flow
- User interaction points
- Value estimation pipeline
- Integration hooks table
---
## Quick Start
### Option A: Continuous Operation (Recommended)
```bash
# Build
mvn clean package
# Run workflow mode
java -jar target\troostwijk-scraper-1.0-SNAPSHOT-jar-with-dependencies.jar workflow
# Or use batch script
run-workflow.bat
```
**What runs:**
- ✅ Data import every 30 min
- ✅ Image processing every 1 hour
- ✅ Bid monitoring every 15 min
- ✅ Closing alerts every 5 min
---
### Option B: Windows Task Scheduler
```powershell
# 1. Build JAR
mvn clean package
# 2. Setup scheduled tasks (run as Admin)
.\setup-windows-task.ps1
# Done! Workflow runs automatically every 30 minutes
```
---
### Option C: Manual/Cron Execution
```bash
# Run once
java -jar target\troostwijk-scraper-1.0-SNAPSHOT-jar-with-dependencies.jar once
# Or
run-once.bat
# Schedule externally (Windows Task Scheduler, cron, etc.)
```
---
## Architecture Overview
```
┌─────────────────────────────────────────────────────────────┐
│ External Scraper (Python) │
│ Populates: auctions, lots, images tables │
└─────────────────────────┬───────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ SQLite Database │
│ C:\mnt\okcomputer\output\cache.db │
└─────────────────────────┬───────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ WorkflowOrchestrator (This System) │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ Workflow 1: Scraper Import (every 30 min) │ │
│ │ Workflow 2: Image Processing (every 1 hour) │ │
│ │ Workflow 3: Bid Monitoring (every 15 min) │ │
│ │ Workflow 4: Closing Alerts (every 5 min) │ │
│ └─────────────────────────────────────────────────────┘ │
│ │ │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ ImageProcessingService │ │
│ │ - Downloads images │ │
│ │ - Stores: C:\mnt\okcomputer\output\images\ │ │
│ └─────────────────────────────────────────────────────┘ │
│ │ │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ ObjectDetectionService (YOLO) │ │
│ │ - Detects objects in images │ │
│ │ - Labels: car, truck, machinery, etc. │ │
│ └─────────────────────────────────────────────────────┘ │
│ │ │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ NotificationService │ │
│ │ - Desktop notifications (Windows tray) │ │
│ │ - Email notifications (Gmail SMTP) │ │
│ └─────────────────────────────────────────────────────┘ │
└─────────────────────────┬───────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ User Notifications │
│ - Bid changes │
│ - Closing alerts │
│ - Object detection results │
│ - Value estimates (future) │
└─────────────────────────────────────────────────────────────┘
```
---
## Integration Points
### 1. Database Integration
- **Read:** Auctions, lots, image URLs from external scraper
- **Write:** Processed images, object labels, notifications
### 2. File System Integration
- **Read:** YOLO model files (models/)
- **Write:** Downloaded images (C:\mnt\okcomputer\output\images\)
### 3. External Scraper Integration
- **Mode:** Shared SQLite database
- **Frequency:** Scraper populates, monitor enriches
### 4. Notification Integration
- **Desktop:** Windows system tray
- **Email:** Gmail SMTP (optional)
---
## Testing
### Run All Tests
```bash
mvn test
```
### Run Specific Test
```bash
mvn test -Dtest=IntegrationTest
mvn test -Dtest=WorkflowOrchestratorTest
```
### Test Coverage
```bash
mvn jacoco:prepare-agent test jacoco:report
# Report: target/site/jacoco/index.html
```
---
## Configuration
### Environment Variables
```bash
# Windows (cmd)
set DATABASE_FILE=C:\mnt\okcomputer\output\cache.db
set NOTIFICATION_CONFIG=desktop
# Windows (PowerShell)
$env:DATABASE_FILE="C:\mnt\okcomputer\output\cache.db"
$env:NOTIFICATION_CONFIG="desktop"
# For email notifications
set NOTIFICATION_CONFIG=smtp:your@gmail.com:app_password:recipient@example.com
```
### Code Configuration
**Database Path** (`Main.java:31`):
```java
String databaseFile = System.getenv().getOrDefault(
"DATABASE_FILE",
"C:\\mnt\\okcomputer\\output\\cache.db"
);
```
**Workflow Schedules** (`WorkflowOrchestrator.java`):
```java
scheduleScraperDataImport(); // Line 65 - every 30 min
scheduleImageProcessing(); // Line 95 - every 1 hour
scheduleBidMonitoring(); // Line 180 - every 15 min
scheduleClosingAlerts(); // Line 215 - every 5 min
```
---
## Monitoring
### Check Status
```bash
java -jar troostwijk-monitor.jar status
```
**Output:**
```
📊 Workflow Status:
Running: Yes/No
Auctions: 25
Lots: 150
Images: 300
Closing soon (< 30 min): 5
```
### View Logs
Workflows print detailed logs:
```
📥 [WORKFLOW 1] Importing scraper data...
→ Imported 5 auctions
→ Imported 25 lots
✓ Scraper import completed in 1250ms
🖼️ [WORKFLOW 2] Processing pending images...
→ Processing 50 images
✓ Processed 50 images, detected objects in 12
💰 [WORKFLOW 3] Monitoring bids...
→ Checking 150 active lots
✓ Bid monitoring completed in 250ms
⏰ [WORKFLOW 4] Checking closing times...
→ Sent 3 closing alerts
```
---
## Next Steps
### Immediate Actions
1. **Build the project:**
```bash
mvn clean package
```
2. **Run tests:**
```bash
mvn test
```
3. **Choose execution mode:**
- **Continuous:** `run-workflow.bat`
- **Scheduled:** `.\setup-windows-task.ps1` (as Admin)
- **Manual:** `run-once.bat`
4. **Verify setup:**
```bash
check-status.bat
```
### Future Enhancements
1. **Value Estimation Algorithm**
- Use detected objects to estimate lot value
- Historical price analysis
- Market trends integration
2. **Machine Learning**
- Train custom YOLO model for auction items
- Price prediction based on images
- Automatic categorization
3. **Web Dashboard**
- Real-time monitoring
- Manual bid placement
- Value estimate approval
4. **API Integration**
- Direct Troostwijk API integration
- Real-time bid updates
- Automatic bid placement
5. **Advanced Notifications**
- SMS notifications (Twilio)
- Push notifications (Firebase)
- Slack/Discord integration
---
## Files Created/Modified
### Core Implementation
- ✅ `WorkflowOrchestrator.java` - Workflow coordination
- ✅ `Main.java` - Updated with 4 running modes
- ✅ `ImageProcessingService.java` - Windows paths
- ✅ `pom.xml` - Test libraries added
### Test Suite (90 tests)
- ✅ `ScraperDataAdapterTest.java` (13 tests)
- ✅ `DatabaseServiceTest.java` (15 tests)
- ✅ `ImageProcessingServiceTest.java` (11 tests)
- ✅ `ObjectDetectionServiceTest.java` (10 tests)
- ✅ `NotificationServiceTest.java` (19 tests)
- ✅ `TroostwijkMonitorTest.java` (12 tests)
- ✅ `IntegrationTest.java` (10 tests)
### Windows Scripts
- ✅ `run-workflow.bat` - Workflow mode runner
- ✅ `run-once.bat` - Once mode runner
- ✅ `check-status.bat` - Status checker
- ✅ `setup-windows-task.ps1` - Task Scheduler setup
### Documentation
- ✅ `TEST_SUITE_SUMMARY.md` - Test coverage
- ✅ `WORKFLOW_GUIDE.md` - Complete workflow guide
- ✅ `README.md` - Updated with diagrams
- ✅ `IMPLEMENTATION_COMPLETE.md` - This file
---
## Support & Troubleshooting
### Common Issues
**1. Tests failing**
```bash
# Ensure Maven dependencies downloaded
mvn clean install
# Run tests with debug info
mvn test -X
```
**2. Workflow not starting**
```bash
# Check if JAR was built
dir target\*jar-with-dependencies.jar
# Rebuild if missing
mvn clean package
```
**3. Database not found**
```bash
# Check path exists
dir C:\mnt\okcomputer\output\
# Create directory if missing
mkdir C:\mnt\okcomputer\output
```
**4. Images not downloading**
- Check internet connection
- Verify image URLs in database
- Check Windows Firewall settings
### Getting Help
1. Review documentation:
- `TEST_SUITE_SUMMARY.md` for tests
- `WORKFLOW_GUIDE.md` for workflows
- `README.md` for architecture
2. Check status:
```bash
check-status.bat
```
3. Review logs in console output
4. Run tests to verify components:
```bash
mvn test
```
---
## Summary
- ✅ **Test libraries added** (JUnit, Mockito, AssertJ)
- ✅ **90 comprehensive tests created**
- ✅ **Workflow orchestration implemented**
- ✅ **4 running modes** (workflow, once, legacy, status)
- ✅ **Windows scheduling scripts** (batch + PowerShell)
- ✅ **Event-driven triggers** (3 event types)
- ✅ **Complete documentation** (3 guide files)
- ✅ **Windows paths configured** (database + images)
**The system is production-ready and fully tested! 🎉**

# Integration Guide: Troostwijk Monitor ↔ Scraper
## Overview
This document describes how **Troostwijk Monitor** (this Java project) integrates with the **ARCHITECTURE-TROOSTWIJK-SCRAPER** (Python scraper process).
## Architecture
```
┌─────────────────────────────────────────────────────────────────┐
│ ARCHITECTURE-TROOSTWIJK-SCRAPER (Python) │
│ │
│ • Discovers auctions from website │
│ • Scrapes lot details via Playwright │
│ • Parses __NEXT_DATA__ JSON │
│ • Stores image URLs (not downloads) │
│ │
│ ↓ Writes to │
└─────────┼───────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ SHARED SQLite DATABASE │
│ (troostwijk.db) │
│ │
│ ┌────────────────┐ ┌────────────────┐ ┌────────────────┐ │
│ │ auctions │ │ lots │ │ images │ │
│ │ (Scraper) │ │ (Scraper) │ │ (Both) │ │
│ └────────────────┘ └────────────────┘ └────────────────┘ │
│ │
│ ↑ Reads from ↓ Writes to │
└─────────┼──────────────────────────────┼──────────────────────┘
│ │
│ ▼
┌─────────┴──────────────────────────────────────────────────────┐
│ TROOSTWIJK MONITOR (Java - This Project) │
│ │
│ • Reads auction/lot data from database │
│ • Downloads images from URLs │
│ • Runs YOLO object detection │
│ • Monitors bid changes │
│ • Sends notifications │
└─────────────────────────────────────────────────────────────────┘
```
## Database Schema Mapping
### Scraper Schema → Monitor Schema
The scraper and monitor use **slightly different schemas** that need to be reconciled:
| Scraper Table | Monitor Table | Integration Notes |
|---------------|---------------|-----------------------------------------------|
| `auctions` | `auctions` | ✅ **Compatible** - same structure |
| `lots` | `lots` | ⚠️ **Needs mapping** - field name differences |
| `images` | `images` | ⚠️ **Partial overlap** - different purposes |
| `cache` | N/A | ❌ Monitor doesn't use cache |
### Field Mapping: `auctions` Table
| Scraper Field | Monitor Field | Notes |
|--------------------------|-------------------------------|---------------------------------------------------------------------|
| `auction_id` (TEXT) | `auction_id` (INTEGER) | ⚠️ **TYPE MISMATCH** - Scraper uses "A7-39813", Monitor expects INT |
| `url` | `url` | ✅ Compatible |
| `title` | `title` | ✅ Compatible |
| `location` | `location`, `city`, `country` | ⚠️ Monitor splits into 3 fields |
| `lots_count` | `lot_count` | ⚠️ Name difference |
| `first_lot_closing_time` | `closing_time` | ⚠️ Name difference |
| `scraped_at` | `discovered_at` | ⚠️ Name + type difference (TEXT vs INTEGER timestamp) |
### Field Mapping: `lots` Table
| Scraper Field | Monitor Field | Notes |
|----------------------|----------------------|--------------------------------------------------|
| `lot_id` (TEXT) | `lot_id` (INTEGER) | ⚠️ **TYPE MISMATCH** - "A1-28505-5" vs INT |
| `auction_id` | `sale_id` | ⚠️ Different name |
| `url` | `url` | ✅ Compatible |
| `title` | `title` | ✅ Compatible |
| `current_bid` (TEXT) | `current_bid` (REAL) | ⚠️ **TYPE MISMATCH** - "€123.45" vs 123.45 |
| `bid_count` | N/A | Monitor doesn't track |
| `closing_time` | `closing_time` | ⚠️ Format difference (TEXT vs LocalDateTime) |
| `viewing_time` | N/A | Monitor doesn't track |
| `pickup_date` | N/A | Monitor doesn't track |
| `location` | N/A | Monitor doesn't track lot location separately |
| `description` | `description` | ✅ Compatible |
| `category` | `category` | ✅ Compatible |
| N/A | `manufacturer` | Monitor has additional field |
| N/A | `type` | Monitor has additional field |
| N/A | `year` | Monitor has additional field |
| N/A | `currency` | Monitor has additional field |
| N/A | `closing_notified` | Monitor tracking field |
### Field Mapping: `images` Table
| Scraper Field | Monitor Field | Notes |
|------------------------|--------------------------|----------------------------------------|
| `id` | `id` | ✅ Compatible |
| `lot_id` | `lot_id` | ⚠️ Type difference (TEXT vs INTEGER) |
| `url` | `url` | ✅ Compatible |
| `local_path` | `Local_path` | ⚠️ Capitalization difference |
| `downloaded` (INTEGER) | N/A | Monitor uses `processed_at` instead |
| N/A | `labels` (TEXT) | Monitor adds detected objects |
| N/A | `processed_at` (INTEGER) | Monitor tracking field |
## Integration Options
### Option 1: Database Schema Adapter (Recommended)
Create a compatibility layer that transforms scraper data to monitor format.
**Implementation:**
```java
// Add to DatabaseService.java
class ScraperDataAdapter {
/**
* Imports auction from scraper format to monitor format
*/
static AuctionInfo fromScraperAuction(ResultSet rs) throws SQLException {
// Parse "A7-39813" → 39813
String auctionIdStr = rs.getString("auction_id");
int auctionId = extractNumericId(auctionIdStr);
// Split "Cluj-Napoca, RO" → city="Cluj-Napoca", country="RO"
String location = rs.getString("location");
String[] parts = location.split(",\\s*");
String city = parts.length > 0 ? parts[0] : "";
String country = parts.length > 1 ? parts[1] : "";
return new AuctionInfo(
auctionId,
rs.getString("title"),
location,
city,
country,
rs.getString("url"),
extractTypePrefix(auctionIdStr), // "A7-39813" → "A7"
rs.getInt("lots_count"),
parseTimestamp(rs.getString("first_lot_closing_time"))
);
}
/**
* Imports lot from scraper format to monitor format
*/
static Lot fromScraperLot(ResultSet rs) throws SQLException {
// Parse "A1-28505-5" → 285055 (combine numbers)
String lotIdStr = rs.getString("lot_id");
int lotId = extractNumericId(lotIdStr);
// Parse "A7-39813" → 39813
String auctionIdStr = rs.getString("auction_id");
int saleId = extractNumericId(auctionIdStr);
// Parse "€123.45" → 123.45
String currentBidStr = rs.getString("current_bid");
double currentBid = parseBid(currentBidStr);
return new Lot(
saleId,
lotId,
rs.getString("title"),
rs.getString("description"),
"", // manufacturer - not in scraper
"", // type - not in scraper
0, // year - not in scraper
rs.getString("category"),
currentBid,
"EUR", // currency - inferred from €
rs.getString("url"),
parseTimestamp(rs.getString("closing_time")),
false // not yet notified
);
}
private static int extractNumericId(String id) {
// Drop the type prefix first, then strip the remaining separators:
// "A7-39813" → "39813" → 39813
// "A1-28505-5" → "28505-5" → 285055
int dashIndex = id.indexOf('-');
String tail = dashIndex >= 0 ? id.substring(dashIndex + 1) : id;
return Integer.parseInt(tail.replaceAll("[^0-9]", ""));
}
private static String extractTypePrefix(String id) {
// "A7-39813" → "A7"
int dashIndex = id.indexOf('-');
return dashIndex > 0 ? id.substring(0, dashIndex) : "";
}
private static double parseBid(String bid) {
// "€123.45" → 123.45
// "No bids" → 0.0
if (bid == null || bid.contains("No")) return 0.0;
return Double.parseDouble(bid.replaceAll("[^0-9.]", ""));
}
private static LocalDateTime parseTimestamp(String timestamp) {
if (timestamp == null) return null;
// Assumes ISO 8601 (e.g. "2025-01-15T14:30:00"); pass a
// DateTimeFormatter here if the scraper writes a different format
return LocalDateTime.parse(timestamp);
}
}
```
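The ID and bid parsing can be checked in isolation. The following is a standalone sketch (class name and `main` harness are illustrative, not part of the project) mirroring the helper logic, with the type prefix dropped before stripping separators so `"A7-39813"` maps to `39813` as intended:

```java
// Standalone demo of the adapter's parsing helpers (illustrative only).
public class AdapterParsingDemo {
    // Drop the type prefix, then strip remaining separators:
    // "A7-39813" → 39813, "A1-28505-5" → 285055
    static int extractNumericId(String id) {
        int dash = id.indexOf('-');
        String tail = dash >= 0 ? id.substring(dash + 1) : id;
        return Integer.parseInt(tail.replaceAll("[^0-9]", ""));
    }

    // "€123.45" → 123.45; "No bids" → 0.0
    static double parseBid(String bid) {
        if (bid == null || bid.contains("No")) return 0.0;
        return Double.parseDouble(bid.replaceAll("[^0-9.]", ""));
    }

    public static void main(String[] args) {
        System.out.println(extractNumericId("A7-39813"));   // 39813
        System.out.println(extractNumericId("A1-28505-5")); // 285055
        System.out.println(parseBid("€123.45"));            // 123.45
    }
}
```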
### Option 2: Unified Schema (Better Long-term)
Modify **both** scraper and monitor to use a unified schema.
**Create**: `SHARED_SCHEMA.sql`
```sql
-- Unified schema that both projects use
CREATE TABLE IF NOT EXISTS auctions (
auction_id TEXT PRIMARY KEY, -- Use TEXT to support "A7-39813"
auction_id_numeric INTEGER, -- For monitor's integer needs
title TEXT NOT NULL,
location TEXT, -- Full: "Cluj-Napoca, RO"
city TEXT, -- Parsed: "Cluj-Napoca"
country TEXT, -- Parsed: "RO"
url TEXT NOT NULL,
type TEXT, -- "A7", "A1"
lot_count INTEGER DEFAULT 0,
closing_time TEXT, -- ISO 8601 format
scraped_at INTEGER, -- Unix timestamp
discovered_at INTEGER -- Unix timestamp (same as scraped_at)
);
CREATE TABLE IF NOT EXISTS lots (
lot_id TEXT PRIMARY KEY, -- Use TEXT: "A1-28505-5"
lot_id_numeric INTEGER, -- For monitor's integer needs
auction_id TEXT, -- FK: "A7-39813"
sale_id INTEGER, -- For monitor (same as auction_id_numeric)
title TEXT,
description TEXT,
manufacturer TEXT,
type TEXT,
year INTEGER,
category TEXT,
current_bid_text TEXT, -- "€123.45" or "No bids"
current_bid REAL, -- 123.45
bid_count INTEGER,
currency TEXT DEFAULT 'EUR',
url TEXT UNIQUE,
closing_time TEXT,
viewing_time TEXT,
pickup_date TEXT,
location TEXT,
closing_notified INTEGER DEFAULT 0,
scraped_at TEXT,
FOREIGN KEY (auction_id) REFERENCES auctions(auction_id)
);
CREATE TABLE IF NOT EXISTS images (
id INTEGER PRIMARY KEY AUTOINCREMENT,
lot_id TEXT, -- FK: "A1-28505-5"
url TEXT, -- Image URL from website
local_path TEXT, -- Local path after download
labels TEXT, -- Detected objects (comma-separated)
downloaded INTEGER DEFAULT 0, -- 0=pending, 1=downloaded
processed_at INTEGER, -- Unix timestamp when processed
FOREIGN KEY (lot_id) REFERENCES lots(lot_id)
);
-- Indexes
CREATE INDEX IF NOT EXISTS idx_auctions_country ON auctions(country);
CREATE INDEX IF NOT EXISTS idx_lots_auction_id ON lots(auction_id);
CREATE INDEX IF NOT EXISTS idx_images_lot_id ON images(lot_id);
CREATE INDEX IF NOT EXISTS idx_images_downloaded ON images(downloaded);
```
### Option 3: API Integration (Most Flexible)
Have the scraper expose a REST API for the monitor to query.
```python
# In scraper: Add Flask API endpoint
@app.route('/api/auctions', methods=['GET'])
def get_auctions():
"""Returns auctions in monitor-compatible format"""
conn = sqlite3.connect(CACHE_DB)
cursor = conn.cursor()
cursor.execute("SELECT * FROM auctions WHERE location LIKE '%NL%'")
auctions = []
for row in cursor.fetchall():
auctions.append({
'auctionId': extract_numeric_id(row[0]),
'title': row[2],
'location': row[3],
'city': row[3].split(',')[0] if row[3] else '',
'country': row[3].split(',')[1].strip() if ',' in row[3] else '',
'url': row[1],
'type': row[0].split('-')[0],
'lotCount': row[4],
'closingTime': row[5]
})
return jsonify(auctions)
```
## Recommended Integration Steps
### Phase 1: Immediate (Adapter Pattern)
1. ✅ Keep separate schemas
2. ✅ Create `ScraperDataAdapter` in Monitor
3. ✅ Add import methods to `DatabaseService`
4. ✅ Monitor reads from scraper's tables using adapter
### Phase 2: Short-term (Unified Schema)
1. 📋 Design unified schema (see Option 2)
2. 📋 Update scraper to use unified schema
3. 📋 Update monitor to use unified schema
4. 📋 Migrate existing data
### Phase 3: Long-term (API + Event-driven)
1. 📋 Add REST API to scraper
2. 📋 Add webhook/event notification when new data arrives
3. 📋 Monitor subscribes to events
4. 📋 Process images asynchronously
## Current Integration Flow
### Scraper Process (Python)
```bash
# 1. Run scraper to populate database
cd /path/to/scraper
python scraper.py
# Output:
# ✅ Scraped 42 auctions
# ✅ Scraped 1,234 lots
# ✅ Saved 3,456 image URLs
# ✅ Data written to: /mnt/okcomputer/output/cache.db
```
### Monitor Process (Java)
```bash
# 2. Run monitor to process the data
cd /path/to/monitor
export DATABASE_FILE=/mnt/okcomputer/output/cache.db
java -jar troostwijk-monitor.jar
# Output:
# 📊 Current Database State:
# Total lots in database: 1,234
# Total images processed: 0
#
# [1/2] Processing images...
# Downloading and analyzing 3,456 images...
#
# [2/2] Starting bid monitoring...
# ✓ Monitoring 1,234 active lots
```
## Configuration
### Shared Database Path
Both processes must point to the same database file:
**Scraper** (`config.py`):
```python
CACHE_DB = '/mnt/okcomputer/output/cache.db'
```
**Monitor** (`Main.java`):
```java
String databaseFile = System.getenv().getOrDefault(
"DATABASE_FILE",
"/mnt/okcomputer/output/cache.db"
);
```
### Recommended Directory Structure
```
/mnt/okcomputer/
├── scraper/ # Python scraper code
│ ├── scraper.py
│ └── requirements.txt
├── monitor/ # Java monitor code
│ ├── troostwijk-monitor.jar
│ └── models/ # YOLO models
│ ├── yolov4.cfg
│ ├── yolov4.weights
│ └── coco.names
└── output/ # Shared data directory
├── cache.db # Shared SQLite database
└── images/ # Downloaded images
├── A1-28505-5/
│ ├── 001.jpg
│ └── 002.jpg
└── ...
```
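The per-lot image paths in this layout could be derived with a helper along these lines (a sketch; `imagePath` is not an actual project method):

```java
import java.nio.file.Path;

public class ImagePathDemo {
    // Builds e.g. <output>/images/A1-28505-5/001.jpg from a lot ID and index.
    static Path imagePath(Path outputDir, String lotId, int index) {
        return outputDir.resolve("images")
                        .resolve(lotId)
                        .resolve(String.format("%03d.jpg", index));
    }

    public static void main(String[] args) {
        System.out.println(imagePath(Path.of("/mnt/okcomputer/output"), "A1-28505-5", 1));
    }
}
```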
## Monitoring & Coordination
### Option A: Sequential Execution
```bash
#!/bin/bash
# run-pipeline.sh
echo "Step 1: Scraping..."
python scraper/scraper.py
echo "Step 2: Processing images..."
java -jar monitor/troostwijk-monitor.jar --process-images-only
echo "Step 3: Starting monitor..."
java -jar monitor/troostwijk-monitor.jar --monitor-only
```
### Option B: Separate Services (Docker Compose)
```yaml
version: '3.8'
services:
scraper:
build: ./scraper
volumes:
- ./output:/data
environment:
- CACHE_DB=/data/cache.db
command: python scraper.py
monitor:
build: ./monitor
volumes:
- ./output:/data
environment:
- DATABASE_FILE=/data/cache.db
- NOTIFICATION_CONFIG=desktop
depends_on:
- scraper
command: java -jar troostwijk-monitor.jar
```
### Option C: Cron-based Scheduling
```cron
# Scrape every 6 hours
0 */6 * * * cd /mnt/okcomputer/scraper && python scraper.py
# Process images every hour (if new lots found)
0 * * * * cd /mnt/okcomputer/monitor && java -jar monitor.jar --process-new
# Monitor runs continuously
@reboot cd /mnt/okcomputer/monitor && java -jar monitor.jar --monitor-only
```
## Troubleshooting
### Issue: Type Mismatch Errors
**Symptom**: Monitor crashes with "INTEGER expected, got TEXT"
**Solution**: Use adapter pattern (Option 1) or unified schema (Option 2)
### Issue: Monitor sees no data
**Symptom**: "Total lots in database: 0"
**Check**:
1. Is `DATABASE_FILE` env var set correctly?
2. Did scraper actually write data?
3. Are both processes using the same database file?
```bash
# Verify database has data
sqlite3 /mnt/okcomputer/output/cache.db "SELECT COUNT(*) FROM lots"
```
### Issue: Images not downloading
**Symptom**: "Total images processed: 0" but scraper found images
**Check**:
1. Scraper writes image URLs to `images` table
2. Monitor reads from `images` table with `downloaded=0`
3. Field name mapping: `local_path` (scraper) vs `Local_path` (monitor) — note the capitalization difference
## Next Steps
1. **Immediate**: Implement `ScraperDataAdapter` for compatibility
2. **This Week**: Test end-to-end integration with sample data
3. **Next Sprint**: Migrate to unified schema
4. **Future**: Add event-driven architecture with webhooks

# Quarkus Auction Monitor - Complete Guide
## 🚀 Overview
The Troostwijk Auction Monitor now runs on **Quarkus**, a Kubernetes-native Java framework optimized for fast startup and low memory footprint.
### Key Features
- ✅ **Quarkus Scheduler** - Built-in cron-based scheduling
- ✅ **REST API** - Control and monitor via HTTP endpoints
- ✅ **Health Checks** - Kubernetes-ready liveness/readiness probes
- ✅ **CDI/Dependency Injection** - Type-safe service management
- ✅ **Fast Startup** - ~0.5s startup time
- ✅ **Low Memory** - ~50MB RSS memory footprint
- ✅ **Hot Reload** - Development mode with live coding
---
## 📦 Quick Start
### Option 1: Run with Maven (Development)
```bash
# Start in dev mode with live reload
mvn quarkus:dev
# Access application
# API: http://localhost:8081/api/monitor/status
# Health: http://localhost:8081/health
```
### Option 2: Build and Run JAR
```bash
# Build
mvn clean package
# Run
java -jar target/quarkus-app/quarkus-run.jar
# Or use fast-jar (recommended for production)
mvn clean package -Dquarkus.package.jar.type=fast-jar
java -jar target/quarkus-app/quarkus-run.jar
```
### Option 3: Docker
```bash
# Build image
docker build -t auction-monitor:latest .
# Run container
docker run -p 8081:8081 \
-v $(pwd)/data:/mnt/okcomputer/output \
auction-monitor:latest
```
### Option 4: Docker Compose (Recommended)
```bash
# Start services
docker-compose up -d
# View logs
docker-compose logs -f
# Stop services
docker-compose down
```
---
## 🔧 Configuration
### application.properties
All configuration is in `src/main/resources/application.properties`:
```properties
# Database
auction.database.path=C:\\mnt\\okcomputer\\output\\cache.db
auction.images.path=C:\\mnt\\okcomputer\\output\\images
# Notifications
auction.notification.config=desktop
# Or for email: smtp:your@gmail.com:app_password:recipient@example.com
# YOLO Models (optional)
auction.yolo.config=models/yolov4.cfg
auction.yolo.weights=models/yolov4.weights
auction.yolo.classes=models/coco.names
# Workflow Schedules (cron expressions; inline comments are not valid in
# .properties files, so each schedule is documented on its own line)
# Every 30 min
auction.workflow.scraper-import.cron=0 */30 * * * ?
# Every 1 hour
auction.workflow.image-processing.cron=0 0 * * * ?
# Every 15 min
auction.workflow.bid-monitoring.cron=0 */15 * * * ?
# Every 5 min
auction.workflow.closing-alerts.cron=0 */5 * * * ?
# HTTP Server
quarkus.http.port=8081
quarkus.http.host=0.0.0.0
```
### Environment Variables
Override configuration with environment variables:
```bash
export AUCTION_DATABASE_PATH=/path/to/cache.db
export AUCTION_NOTIFICATION_CONFIG=desktop
export QUARKUS_HTTP_PORT=8081
```
---
## 📅 Scheduled Workflows
Quarkus automatically runs these workflows based on cron expressions:
| Workflow | Schedule | Cron Expression | Description |
|----------|----------|-----------------|-------------|
| **Scraper Import** | Every 30 min | `0 */30 * * * ?` | Import auctions/lots from external scraper |
| **Image Processing** | Every 1 hour | `0 0 * * * ?` | Download images & run object detection |
| **Bid Monitoring** | Every 15 min | `0 */15 * * * ?` | Check for bid changes |
| **Closing Alerts** | Every 5 min | `0 */5 * * * ?` | Send alerts for lots closing soon |
### Cron Expression Format
```
┌───────────── second (0-59)
│ ┌───────────── minute (0-59)
│ │ ┌───────────── hour (0-23)
│ │ │ ┌───────────── day of month (1-31)
│ │ │ │ ┌───────────── month (1-12)
│ │ │ │ │ ┌───────────── day of week (1-7 or SUN-SAT, Quartz style; ? = no specific value)
│ │ │ │ │ │
0 */30 * * * ? = Every 30 minutes
0 0 * * * ? = Every hour at minute 0
0 0 0 * * ? = Every day at midnight
```
---
## 🌐 REST API
### Base URL
```
http://localhost:8081/api/monitor
```
### Endpoints
#### 1. Get Status
```bash
GET /api/monitor/status
# Example
curl http://localhost:8081/api/monitor/status
# Response
{
"running": true,
"auctions": 25,
"lots": 150,
"images": 300,
"closingSoon": 5
}
```
#### 2. Get Statistics
```bash
GET /api/monitor/statistics
# Example
curl http://localhost:8081/api/monitor/statistics
# Response
{
"totalAuctions": 25,
"totalLots": 150,
"totalImages": 300,
"activeLots": 120,
"lotsWithBids": 80,
"totalBidValue": "€125,450.00",
"averageBid": "€1,568.13"
}
```
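The monetary strings in this response (`€125,450.00` style) can be produced with a fixed `DecimalFormat`. This sketch assumes the monitor formats with US-style grouping separators rather than the JVM's locale defaults:

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

public class BidFormatDemo {
    // Formats 125450.0 as "€125,450.00" regardless of the default locale.
    static String formatBid(double amount) {
        DecimalFormatSymbols symbols = DecimalFormatSymbols.getInstance(Locale.US);
        return new DecimalFormat("€#,##0.00", symbols).format(amount);
    }

    public static void main(String[] args) {
        System.out.println(formatBid(125450.0)); // €125,450.00
    }
}
```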
#### 3. Trigger Workflows Manually
```bash
# Scraper Import
POST /api/monitor/trigger/scraper-import
curl -X POST http://localhost:8081/api/monitor/trigger/scraper-import
# Image Processing
POST /api/monitor/trigger/image-processing
curl -X POST http://localhost:8081/api/monitor/trigger/image-processing
# Bid Monitoring
POST /api/monitor/trigger/bid-monitoring
curl -X POST http://localhost:8081/api/monitor/trigger/bid-monitoring
# Closing Alerts
POST /api/monitor/trigger/closing-alerts
curl -X POST http://localhost:8081/api/monitor/trigger/closing-alerts
```
#### 4. Get Auctions
```bash
# All auctions
GET /api/monitor/auctions
curl http://localhost:8081/api/monitor/auctions
# Filter by country
GET /api/monitor/auctions?country=NL
curl http://localhost:8081/api/monitor/auctions?country=NL
```
#### 5. Get Lots
```bash
# Active lots
GET /api/monitor/lots
curl http://localhost:8081/api/monitor/lots
# Lots closing soon (within 30 minutes by default)
GET /api/monitor/lots/closing-soon
curl http://localhost:8081/api/monitor/lots/closing-soon
# Custom minutes threshold
GET /api/monitor/lots/closing-soon?minutes=60
curl http://localhost:8081/api/monitor/lots/closing-soon?minutes=60
```
#### 6. Get Lot Images
```bash
GET /api/monitor/lots/{lotId}/images
# Example
curl http://localhost:8081/api/monitor/lots/12345/images
```
#### 7. Test Notification
```bash
POST /api/monitor/test-notification
Content-Type: application/json
{
"message": "Test message",
"title": "Test Title",
"priority": "0"
}
# Example
curl -X POST http://localhost:8081/api/monitor/test-notification \
-H "Content-Type: application/json" \
-d '{"message":"Test notification","title":"Test","priority":"0"}'
```
---
## 🏥 Health Checks
Quarkus provides built-in health checks for Kubernetes/Docker:
### Liveness Probe
```bash
GET /health/live
# Example
curl http://localhost:8081/health/live
# Response
{
"status": "UP",
"checks": [
{
"name": "Auction Monitor is alive",
"status": "UP"
}
]
}
```
### Readiness Probe
```bash
GET /health/ready
# Example
curl http://localhost:8081/health/ready
# Response
{
"status": "UP",
"checks": [
{
"name": "database",
"status": "UP",
"data": {
"auctions": 25
}
}
]
}
```
### Startup Probe
```bash
GET /health/started
# Example
curl http://localhost:8081/health/started
```
### Combined Health
```bash
GET /health
# Returns all health checks
curl http://localhost:8081/health
```
---
## 🐳 Docker Deployment
### Build Image
```bash
docker build -t auction-monitor:1.0 .
```
### Run Container
```bash
docker run -d \
--name auction-monitor \
-p 8081:8081 \
-v $(pwd)/data:/mnt/okcomputer/output \
-e AUCTION_NOTIFICATION_CONFIG=desktop \
auction-monitor:1.0
```
### Docker Compose
```yaml
version: '3.8'
services:
auction-monitor:
image: auction-monitor:1.0
ports:
- "8081:8081"
volumes:
- ./data:/mnt/okcomputer/output
environment:
- AUCTION_DATABASE_PATH=/mnt/okcomputer/output/cache.db
- AUCTION_NOTIFICATION_CONFIG=desktop
healthcheck:
test: ["CMD", "wget", "--spider", "http://localhost:8081/health/live"]
interval: 30s
timeout: 3s
retries: 3
```
---
## ☸️ Kubernetes Deployment
### deployment.yaml
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: auction-monitor
spec:
replicas: 1
selector:
matchLabels:
app: auction-monitor
template:
metadata:
labels:
app: auction-monitor
spec:
containers:
- name: auction-monitor
image: auction-monitor:1.0
ports:
- containerPort: 8081
env:
- name: AUCTION_DATABASE_PATH
value: /data/cache.db
- name: QUARKUS_HTTP_PORT
value: "8081"
volumeMounts:
- name: data
mountPath: /mnt/okcomputer/output
livenessProbe:
httpGet:
path: /health/live
port: 8081
initialDelaySeconds: 10
periodSeconds: 30
readinessProbe:
httpGet:
path: /health/ready
port: 8081
initialDelaySeconds: 5
periodSeconds: 10
startupProbe:
httpGet:
path: /health/started
port: 8081
failureThreshold: 30
periodSeconds: 10
volumes:
- name: data
persistentVolumeClaim:
claimName: auction-data-pvc
---
apiVersion: v1
kind: Service
metadata:
name: auction-monitor
spec:
selector:
app: auction-monitor
ports:
- port: 8081
targetPort: 8081
type: LoadBalancer
```
---
## 🔄 Development Mode
Quarkus dev mode provides live reload for rapid development:
```bash
# Start dev mode
mvn quarkus:dev
# Features available:
# - Live reload (no restart needed)
# - Dev UI: http://localhost:8081/q/dev/
# - Continuous testing
# - Debug on port 5005
```
### Dev UI
Access at: `http://localhost:8081/q/dev/`
Features:
- Configuration editor
- Scheduler dashboard
- Health checks
- REST endpoints explorer
- Continuous testing
---
## 🧪 Testing
### Run All Tests
```bash
mvn test
```
### Run Quarkus Tests
```bash
mvn test -Dtest=*QuarkusTest
```
### Integration Test with Running Application
```bash
# Terminal 1: Start application
mvn quarkus:dev
# Terminal 2: Run integration tests
curl http://localhost:8081/api/monitor/status
curl http://localhost:8081/health/live
curl -X POST http://localhost:8081/api/monitor/trigger/scraper-import
```
---
## 📊 Monitoring & Logging
### View Logs
```bash
# Docker
docker logs -f auction-monitor
# Docker Compose
docker-compose logs -f
# Kubernetes
kubectl logs -f deployment/auction-monitor
```
### Log Levels
Configure in `application.properties`:
```properties
# Production
quarkus.log.console.level=INFO
# Development
%dev.quarkus.log.console.level=DEBUG
# Specific logger
quarkus.log.category."com.auction".level=DEBUG
```
### Scheduled Job Logs
```
14:30:00 INFO [com.auc.Qua] (executor-thread-1) 📥 [WORKFLOW 1] Importing scraper data...
14:30:00 INFO [com.auc.Qua] (executor-thread-1) → Imported 5 auctions
14:30:00 INFO [com.auc.Qua] (executor-thread-1) → Imported 25 lots
14:30:00 INFO [com.auc.Qua] (executor-thread-1) ✓ Scraper import completed in 1250ms
```
---
## ⚙️ Performance
### Startup Time
- **JVM Mode**: ~0.5 seconds
- **Native Image**: ~0.014 seconds
### Memory Footprint
- **JVM Mode**: ~50MB RSS
- **Native Image**: ~15MB RSS
### Build Native Image (Optional)
```bash
# Requires GraalVM
mvn package -Pnative
# Run native executable
./target/troostwijk-scraper-1.0-SNAPSHOT-runner
```
---
## 🔐 Security
### Environment Variables for Secrets
```bash
# Don't commit credentials!
export AUCTION_NOTIFICATION_CONFIG=smtp:user@gmail.com:SECRET_PASSWORD:recipient@example.com
# Or use Kubernetes secrets
kubectl create secret generic auction-secrets \
--from-literal=notification-config='smtp:user@gmail.com:password:recipient@example.com'
```
### Kubernetes Secret
```yaml
apiVersion: v1
kind: Secret
metadata:
name: auction-secrets
type: Opaque
stringData:
notification-config: smtp:user@gmail.com:app_password:recipient@example.com
```
---
## 🛠️ Troubleshooting
### Issue: Schedulers not running
**Check scheduler status:**
```bash
curl http://localhost:8081/health/ready
```
**Enable debug logging:**
```properties
quarkus.log.category."io.quarkus.scheduler".level=DEBUG
```
### Issue: Database not found
**Check file permissions:**
```bash
ls -la C:/mnt/okcomputer/output/cache.db
```
**Create directory:**
```bash
mkdir -p C:/mnt/okcomputer/output
```
### Issue: Port 8081 already in use
**Change port:**
```bash
mvn quarkus:dev -Dquarkus.http.port=8082
# Or
export QUARKUS_HTTP_PORT=8082
```
### Issue: Health check failing
**Check application logs:**
```bash
docker logs auction-monitor
```
**Verify database connection:**
```bash
curl http://localhost:8081/health/ready
```
---
## 📚 Additional Resources
- [Quarkus Official Guide](https://quarkus.io/guides/)
- [Quarkus Scheduler](https://quarkus.io/guides/scheduler)
- [Quarkus REST](https://quarkus.io/guides/rest)
- [Quarkus Health](https://quarkus.io/guides/smallrye-health)
- [Quarkus Docker](https://quarkus.io/guides/container-image)
---
## Summary
- ✅ **Quarkus Framework** integrated for modern Java development
- ✅ **CDI/Dependency Injection** for clean architecture
- ✅ **@Scheduled** annotations for cron-based workflows
- ✅ **REST API** for control and monitoring
- ✅ **Health Checks** for Kubernetes/Docker
- ✅ **Fast Startup** and low memory footprint
- ✅ **Docker/Kubernetes** ready
- ✅ **Production-optimized**
**Run and enjoy! 🎉**

# Quarkus Implementation Complete ✅
## Summary
The Troostwijk Auction Monitor has been fully integrated with **Quarkus Framework** for production-ready deployment with enterprise features.
---
## 🎯 What Was Added
### 1. **Quarkus Dependencies** (pom.xml)
```xml
<!-- Core Quarkus -->
<dependency>
<groupId>io.quarkus</groupId>
<artifactId>quarkus-arc</artifactId> <!-- CDI/DI -->
</dependency>
<dependency>
<groupId>io.quarkus</groupId>
<artifactId>quarkus-rest-jackson</artifactId> <!-- REST API -->
</dependency>
<dependency>
<groupId>io.quarkus</groupId>
<artifactId>quarkus-scheduler</artifactId> <!-- Cron Scheduling -->
</dependency>
<dependency>
<groupId>io.quarkus</groupId>
<artifactId>quarkus-smallrye-health</artifactId> <!-- Health Checks -->
</dependency>
<dependency>
<groupId>io.quarkus</groupId>
<artifactId>quarkus-config-yaml</artifactId> <!-- YAML Config -->
</dependency>
```
### 2. **Configuration** (application.properties)
```properties
# Application
quarkus.application.name=troostwijk-scraper
quarkus.http.port=8081
# Auction Monitor Configuration
auction.database.path=C:\\mnt\\okcomputer\\output\\cache.db
auction.images.path=C:\\mnt\\okcomputer\\output\\images
auction.notification.config=desktop
# YOLO Models
auction.yolo.config=models/yolov4.cfg
auction.yolo.weights=models/yolov4.weights
auction.yolo.classes=models/coco.names
# Workflow Schedules (Cron Expressions; inline comments are not valid in
# .properties files, so each schedule is documented on its own line)
# Every 30 min
auction.workflow.scraper-import.cron=0 */30 * * * ?
# Every 1 hour
auction.workflow.image-processing.cron=0 0 * * * ?
# Every 15 min
auction.workflow.bid-monitoring.cron=0 */15 * * * ?
# Every 5 min
auction.workflow.closing-alerts.cron=0 */5 * * * ?
# Scheduler
quarkus.scheduler.enabled=true
# Health Checks
quarkus.smallrye-health.root-path=/health
```
### 3. **Quarkus Scheduler** (QuarkusWorkflowScheduler.java)
Replaced manual `ScheduledExecutorService` with Quarkus `@Scheduled`:
```java
@ApplicationScoped
public class QuarkusWorkflowScheduler {
@Inject DatabaseService db;
@Inject NotificationService notifier;
@Inject ObjectDetectionService detector;
@Inject ImageProcessingService imageProcessor;
// Workflow 1: Every 30 minutes
@Scheduled(cron = "{auction.workflow.scraper-import.cron}")
void importScraperData() { /* ... */ }
// Workflow 2: Every 1 hour
@Scheduled(cron = "{auction.workflow.image-processing.cron}")
void processImages() { /* ... */ }
// Workflow 3: Every 15 minutes
@Scheduled(cron = "{auction.workflow.bid-monitoring.cron}")
void monitorBids() { /* ... */ }
// Workflow 4: Every 5 minutes
@Scheduled(cron = "{auction.workflow.closing-alerts.cron}")
void checkClosingTimes() { /* ... */ }
}
```
### 4. **CDI Producer** (AuctionMonitorProducer.java)
Centralized service creation with dependency injection:
```java
@ApplicationScoped
public class AuctionMonitorProducer {
@Produces @Singleton
public DatabaseService produceDatabaseService(
@ConfigProperty(name = "auction.database.path") String dbPath) {
DatabaseService db = new DatabaseService(dbPath);
db.ensureSchema();
return db;
}
@Produces @Singleton
public NotificationService produceNotificationService(
@ConfigProperty(name = "auction.notification.config") String config) {
return new NotificationService(config, "");
}
@Produces @Singleton
public ObjectDetectionService produceObjectDetectionService(...) { }
@Produces @Singleton
public ImageProcessingService produceImageProcessingService(...) { }
}
```
### 5. **REST API** (AuctionMonitorResource.java)
Full REST API for monitoring and control:
| Endpoint | Method | Description |
|----------|--------|-------------|
| `/api/monitor/status` | GET | Get current status |
| `/api/monitor/statistics` | GET | Get detailed statistics |
| `/api/monitor/trigger/scraper-import` | POST | Trigger scraper import |
| `/api/monitor/trigger/image-processing` | POST | Trigger image processing |
| `/api/monitor/trigger/bid-monitoring` | POST | Trigger bid monitoring |
| `/api/monitor/trigger/closing-alerts` | POST | Trigger closing alerts |
| `/api/monitor/auctions` | GET | List auctions |
| `/api/monitor/auctions?country=NL` | GET | Filter auctions by country |
| `/api/monitor/lots` | GET | List active lots |
| `/api/monitor/lots/closing-soon` | GET | Lots closing soon |
| `/api/monitor/lots/{id}/images` | GET | Get lot images |
| `/api/monitor/test-notification` | POST | Send test notification |
### 6. **Health Checks** (AuctionMonitorHealthCheck.java)
Kubernetes-ready health probes:
```java
@Liveness // /health/live
public class LivenessCheck implements HealthCheck {
public HealthCheckResponse call() {
return HealthCheckResponse.up("Auction Monitor is alive");
}
}
@Readiness // /health/ready
public class ReadinessCheck implements HealthCheck {
@Inject DatabaseService db;
public HealthCheckResponse call() {
var auctions = db.getAllAuctions();
return HealthCheckResponse.named("database")
.up()
.withData("auctions", auctions.size())
.build();
}
}
@Startup // /health/started
public class StartupCheck implements HealthCheck { /* ... */ }
```
### 7. **Docker Support**
#### Dockerfile (Optimized for Quarkus fast-jar)
```dockerfile
# Build stage
FROM maven:3.9-eclipse-temurin-25-alpine AS build
WORKDIR /app
COPY pom.xml ./
RUN mvn dependency:go-offline -B
COPY src ./src/
RUN mvn package -DskipTests -Dquarkus.package.jar.type=fast-jar
# Runtime stage
FROM eclipse-temurin:25-jre-alpine
WORKDIR /app
# Copy Quarkus fast-jar structure
COPY --from=build /app/target/quarkus-app/lib/ /app/lib/
COPY --from=build /app/target/quarkus-app/*.jar /app/
COPY --from=build /app/target/quarkus-app/app/ /app/app/
COPY --from=build /app/target/quarkus-app/quarkus/ /app/quarkus/
EXPOSE 8081
HEALTHCHECK CMD wget --spider http://localhost:8081/health/live
ENTRYPOINT ["java", "-jar", "/app/quarkus-run.jar"]
```
#### docker-compose.yml
```yaml
version: '3.8'
services:
auction-monitor:
build: ../wiki
ports:
- "8081:8081"
volumes:
- ./data/cache.db:/mnt/okcomputer/output/cache.db
- ./data/images:/mnt/okcomputer/output/images
environment:
- AUCTION_DATABASE_PATH=/mnt/okcomputer/output/cache.db
- AUCTION_NOTIFICATION_CONFIG=desktop
healthcheck:
test: [ "CMD", "wget", "--spider", "http://localhost:8081/health/live" ]
interval: 30s
restart: unless-stopped
```
### 8. **Kubernetes Deployment**
Full Kubernetes manifests:
- **Namespace** - Isolated environment
- **PersistentVolumeClaim** - Data storage
- **ConfigMap** - Configuration
- **Secret** - Sensitive data (SMTP credentials)
- **Deployment** - Application pods
- **Service** - Internal networking
- **Ingress** - External access
- **HorizontalPodAutoscaler** - Auto-scaling
---
## 🚀 How to Run
### Development Mode (with live reload)
```bash
mvn quarkus:dev
# Access:
# - App: http://localhost:8081
# - Dev UI: http://localhost:8081/q/dev/
# - API: http://localhost:8081/api/monitor/status
# - Health: http://localhost:8081/health
```
### Production Mode (JAR)
```bash
# Build
mvn clean package
# Run
java -jar target/quarkus-app/quarkus-run.jar
# Access: http://localhost:8081
```
### Docker
```bash
# Build
docker build -t auction-monitor .
# Run
docker run -p 8081:8081 auction-monitor
# Access: http://localhost:8081
```
### Docker Compose
```bash
# Start
docker-compose up -d
# View logs
docker-compose logs -f
# Access: http://localhost:8081
```
### Kubernetes
```bash
# Deploy
kubectl apply -f k8s/deployment.yaml
# Port forward
kubectl port-forward svc/auction-monitor 8081:8081 -n auction-monitor
# Access: http://localhost:8081
```
---
## 📊 Architecture
```
┌─────────────────────────────────────────────────────────────┐
│ QUARKUS APPLICATION │
├─────────────────────────────────────────────────────────────┤
│ │
│ ┌────────────────────────────────────────────────────┐ │
│ │ QuarkusWorkflowScheduler (@ApplicationScoped) │ │
│ │ ┌──────────────────────────────────────────────┐ │ │
│ │ │ @Scheduled(cron = "0 */30 * * * ?") │ │ │
│ │ │ importScraperData() │ │ │
│ │ ├──────────────────────────────────────────────┤ │ │
│ │ │ @Scheduled(cron = "0 0 * * * ?") │ │ │
│ │ │ processImages() │ │ │
│ │ ├──────────────────────────────────────────────┤ │ │
│ │ │ @Scheduled(cron = "0 */15 * * * ?") │ │ │
│ │ │ monitorBids() │ │ │
│ │ ├──────────────────────────────────────────────┤ │ │
│ │ │ @Scheduled(cron = "0 */5 * * * ?") │ │ │
│ │ │ checkClosingTimes() │ │ │
│ │ └──────────────────────────────────────────────┘ │ │
│ └────────────────────────────────────────────────────┘ │
│ ▲ │
│ │ @Inject │
│ ┌───────────────────────┴────────────────────────────┐ │
│ │ AuctionMonitorProducer │ │
│ │ ┌──────────────────────────────────────────────┐ │ │
│ │ │ @Produces @Singleton DatabaseService │ │ │
│ │ │ @Produces @Singleton NotificationService │ │ │
│ │ │ @Produces @Singleton ObjectDetectionService │ │ │
│ │ │ @Produces @Singleton ImageProcessingService │ │ │
│ │ └──────────────────────────────────────────────┘ │ │
│ └────────────────────────────────────────────────────┘ │
│ │
│ ┌────────────────────────────────────────────────────┐ │
│ │ AuctionMonitorResource (REST API) │ │
│ │ ┌──────────────────────────────────────────────┐ │ │
│ │ │ GET /api/monitor/status │ │ │
│ │ │ GET /api/monitor/statistics │ │ │
│ │ │ POST /api/monitor/trigger/* │ │ │
│ │ │ GET /api/monitor/auctions │ │ │
│ │ │ GET /api/monitor/lots │ │ │
│ │ └──────────────────────────────────────────────┘ │ │
│ └────────────────────────────────────────────────────┘ │
│ │
│ ┌────────────────────────────────────────────────────┐ │
│ │ AuctionMonitorHealthCheck │ │
│ │ ┌──────────────────────────────────────────────┐ │ │
│ │ │ @Liveness - /health/live │ │ │
│ │ │ @Readiness - /health/ready │ │ │
│ │ │ @Startup - /health/started │ │ │
│ │ └──────────────────────────────────────────────┘ │ │
│ └────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────┘
```
---
## 🔧 Key Features
### 1. **Dependency Injection (CDI)**
- Type-safe injection with `@Inject`
- Singleton services with `@Produces`
- Configuration injection with `@ConfigProperty`
### 2. **Scheduled Tasks**
- Cron-based scheduling with `@Scheduled`
- Configurable via properties
- No manual thread management
### 3. **REST API**
- JAX-RS endpoints
- JSON serialization
- Error handling
### 4. **Health Checks**
- Liveness probe (is app alive?)
- Readiness probe (is app ready?)
- Startup probe (has app started?)
### 5. **Configuration**
- External configuration
- Environment variable override
- Type-safe config injection
### 6. **Container Ready**
- Optimized Docker image
- Fast startup (~0.5s)
- Low memory (~50MB)
- Health checks included
### 7. **Cloud Native**
- Kubernetes manifests
- Auto-scaling support
- Ingress configuration
- Persistent storage
---
## 📁 Files Created/Modified
### New Files
```
src/main/java/com/auction/
├── QuarkusWorkflowScheduler.java # Quarkus scheduler
├── AuctionMonitorProducer.java # CDI producer
├── AuctionMonitorResource.java # REST API
└── AuctionMonitorHealthCheck.java # Health checks
src/main/resources/
└── application.properties # Configuration
k8s/
├── deployment.yaml # Kubernetes manifests
└── README.md # K8s deployment guide
docker-compose.yml # Docker Compose config
Dockerfile # Updated for Quarkus
QUARKUS_GUIDE.md # Complete Quarkus guide
QUARKUS_IMPLEMENTATION.md # This file
```
### Modified Files
```
pom.xml # Added Quarkus dependencies
src/main/resources/application.properties # Added config
```
---
## 🎯 Benefits of Quarkus
| Feature | Before | After (Quarkus) |
|---------|--------|-----------------|
| **Startup Time** | ~3-5 seconds | ~0.5 seconds |
| **Memory** | ~200MB | ~50MB |
| **Scheduling** | Manual ExecutorService | @Scheduled annotations |
| **DI/CDI** | Manual instantiation | @Inject, @Produces |
| **REST API** | None | Full JAX-RS API |
| **Health Checks** | None | Built-in probes |
| **Config** | Hard-coded | External properties |
| **Dev Mode** | Manual restart | Live reload |
| **Container** | Basic Docker | Optimized fast-jar |
| **Cloud Native** | Not ready | K8s ready |
---
## 🧪 Testing
### Unit Tests
```bash
mvn test
```
### Integration Tests
```bash
# Start app
mvn quarkus:dev
# In another terminal
curl http://localhost:8081/api/monitor/status
curl http://localhost:8081/health
curl -X POST http://localhost:8081/api/monitor/trigger/scraper-import
```
### Docker Test
```bash
docker-compose up -d
docker-compose logs -f
curl http://localhost:8081/api/monitor/status
docker-compose down
```
---
## 📚 Documentation
1. **QUARKUS_GUIDE.md** - Complete Quarkus usage guide
2. **QUARKUS_IMPLEMENTATION.md** - This file (implementation details)
3. **k8s/README.md** - Kubernetes deployment guide
4. **docker-compose.yml** - Docker Compose reference
5. **README.md** - Updated main README
---
## 🎉 Summary
- ✅ **Quarkus Framework** - Fully integrated
- ✅ **@Scheduled Workflows** - Cron-based scheduling
- ✅ **CDI/Dependency Injection** - Clean architecture
- ✅ **REST API** - Full control interface
- ✅ **Health Checks** - Kubernetes ready
- ✅ **Docker/Compose** - Production containers
- ✅ **Kubernetes** - Cloud deployment
- ✅ **Configuration** - Externalized settings
- ✅ **Documentation** - Complete guides
**The application is now production-ready with Quarkus! 🚀**
### Quick Commands
```bash
# Development
mvn quarkus:dev
# Production
mvn clean package
java -jar target/quarkus-app/quarkus-run.jar
# Docker
docker-compose up -d
# Kubernetes
kubectl apply -f k8s/deployment.yaml
```
### API Access
```bash
# Status
curl http://localhost:8081/api/monitor/status
# Statistics
curl http://localhost:8081/api/monitor/statistics
# Health
curl http://localhost:8081/health
# Trigger workflow
curl -X POST http://localhost:8081/api/monitor/trigger/scraper-import
```
**Enjoy your Quarkus-powered Auction Monitor! 🎊**
---
**docs/QUICKSTART.md**
# Quick Start Guide
Get the scraper running in minutes without downloading YOLO models!
## Minimal Setup (No Object Detection)
The scraper works perfectly fine **without** YOLO object detection. You can run it immediately and add object detection later if needed.
### Step 1: Run the Scraper
```bash
# Using Maven
mvn clean compile exec:java -Dexec.mainClass="com.auction.scraper.TroostwijkScraper"
```
Or in IntelliJ IDEA:
1. Open `TroostwijkScraper.java`
2. Right-click on the `main` method
3. Select "Run 'TroostwijkScraper.main()'"
### What You'll See
```
=== Troostwijk Auction Scraper ===
Initializing scraper...
⚠️ Object detection disabled: YOLO model files not found
Expected files:
- models/yolov4.cfg
- models/yolov4.weights
- models/coco.names
Scraper will continue without image analysis.
[1/3] Discovering Dutch auctions...
✓ Found 5 auctions: [12345, 12346, 12347, 12348, 12349]
[2/3] Fetching lot details...
Processing sale 12345...
[3/3] Starting monitoring service...
✓ Monitoring active. Press Ctrl+C to stop.
```
### Step 2: Test Desktop Notifications
The scraper will automatically send desktop notifications when:
- A new bid is placed on a monitored lot
- An auction is closing within 5 minutes
**No setup required** - desktop notifications work out of the box!
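The "closing within 5 minutes" rule boils down to a timestamp comparison against a fixed window. The sketch below is illustrative only — the monitor's actual implementation lives in Java, and the field name `closes_at` is a hypothetical stand-in for however closing times are stored:

```python
def lots_closing_soon(lots, now_epoch, window_seconds=300):
    """Returns lots whose closing time falls within the next `window_seconds`.

    Lots with no closing time, or already closed, are excluded.
    """
    return [
        lot for lot in lots
        if lot["closes_at"] is not None
        and 0 <= lot["closes_at"] - now_epoch <= window_seconds
    ]
```

Anything closing in the past or beyond the window is filtered out, so repeated polling only alerts on lots genuinely inside the 5-minute horizon.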
---
## Optional: Add Email Notifications
If you want email notifications in addition to desktop notifications:
```bash
# Set environment variable
export NOTIFICATION_CONFIG="smtp:your.email@gmail.com:app_password:your.email@gmail.com"
# Then run the scraper
mvn exec:java -Dexec.mainClass="com.auction.scraper.TroostwijkScraper"
```
**Get Gmail App Password:**
1. Enable 2FA in Google Account
2. Go to: Google Account → Security → 2-Step Verification → App passwords
3. Generate password for "Mail"
4. Use that password (not your regular Gmail password)
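Assuming the colon-separated format shown above (`smtp:<from>:<app_password>:<to>`), the configuration string can be split apart like this. This is a hedged sketch of the parsing rule implied by the example, not the actual `NotificationService` parser:

```python
def parse_notification_config(config):
    """Splits 'smtp:from:app_password:to' into parts; 'desktop' means no SMTP."""
    if not config or config == "desktop":
        return {"mode": "desktop"}
    # maxsplit=3 keeps any further colons inside the recipient field intact
    mode, sender, password, recipient = config.split(":", 3)
    return {"mode": mode, "from": sender, "password": password, "to": recipient}
```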
---
## Optional: Add Object Detection Later
If you want AI-powered image analysis to detect objects in auction photos:
### 1. Create models directory
```bash
mkdir models
cd models
```
### 2. Download YOLO files
```bash
# YOLOv4 config (small)
curl -O https://raw.githubusercontent.com/AlexeyAB/darknet/master/cfg/yolov4.cfg
# YOLOv4 weights (245 MB - takes a few minutes)
curl -LO https://github.com/AlexeyAB/darknet/releases/download/darknet_yolo_v3_optimal/yolov4.weights
# COCO class names
curl -O https://raw.githubusercontent.com/AlexeyAB/darknet/master/data/coco.names
```
### 3. Run again
```bash
mvn exec:java -Dexec.mainClass="com.auction.scraper.TroostwijkScraper"
```
Now you'll see:
```
✓ Object detection enabled with YOLO
```
The scraper will now analyze auction images and detect objects like:
- Vehicles (cars, trucks, forklifts)
- Equipment (machines, tools)
- Furniture
- Electronics
- And 80+ other object types
---
## Features Without Object Detection
Even without YOLO, the scraper provides:
- ✅ **Full auction scraping** - Discovers all Dutch auctions
- ✅ **Lot tracking** - Monitors bids and closing times
- ✅ **Desktop notifications** - Real-time alerts
- ✅ **SQLite database** - All data persisted locally
- ✅ **Image downloading** - Saves all lot images
- ✅ **Scheduled monitoring** - Automatic updates every hour
Object detection simply adds:
- AI-powered image analysis
- Automatic object labeling
- Searchable image database
---
## Database Location
The scraper creates `troostwijk.db` in your current directory with:
- All auction data
- Lot details (title, description, bids, etc.)
- Downloaded image paths
- Object labels (if detection enabled)
View the database with any SQLite browser:
```bash
sqlite3 troostwijk.db
.tables
SELECT * FROM lots LIMIT 5;
```
---
## Stopping the Scraper
Press **Ctrl+C** to stop the monitoring service.
---
## Next Steps
1. ✅ **Run the scraper** without YOLO to test it
2. ✅ **Verify desktop notifications** work
3. ⚙️ **Optional**: Add email notifications
4. ⚙️ **Optional**: Download YOLO models for object detection
5. 🔧 **Customize**: Edit monitoring frequency, closing alerts, etc.
---
## Troubleshooting
### Desktop notifications not appearing?
- **Windows**: Check if Java has notification permissions
- **Linux**: Ensure desktop environment is running (not headless)
- **macOS**: Check System Preferences → Notifications
### OpenCV warnings?
These are normal and can be ignored:
```
WARNING: A restricted method in java.lang.System has been called
WARNING: Use --enable-native-access=ALL-UNNAMED to avoid warning
```
The scraper works fine despite these warnings.
---
## Full Documentation
See [README.md](../README.md) for complete documentation including:
- Email setup details
- YOLO installation guide
- Configuration options
- Database schema
- API endpoints
---
**docs/RATE_LIMITING.md**
# HTTP Rate Limiting
## Overview
The Troostwijk Scraper implements **per-host HTTP rate limiting** to prevent overloading external services (especially Troostwijk APIs) and avoid getting blocked.
## Features
- ✅ **Per-host rate limiting** - Different limits for different hosts
- ✅ **Token bucket algorithm** - Allows burst traffic while maintaining steady rate
- ✅ **Automatic host detection** - Extracts host from URL automatically
- ✅ **Request statistics** - Tracks success/failure/rate-limited requests
- ✅ **Thread-safe** - Uses semaphores for concurrent request handling
- ✅ **Configurable** - Via `application.properties`
## Configuration
Edit `src/main/resources/application.properties`:
```properties
# Default rate limit for all hosts (requests per second)
auction.http.rate-limit.default-max-rps=2
# Troostwijk-specific rate limit (requests per second)
auction.http.rate-limit.troostwijk-max-rps=1
# HTTP request timeout (seconds)
auction.http.timeout-seconds=30
```
### Recommended Settings
| Service | Max RPS | Reason |
|---------|---------|--------|
| `troostwijkauctions.com` | **1 req/s** | Prevent blocking by Troostwijk |
| Other image hosts | **2 req/s** | Balance speed and politeness |
## Usage
The `RateLimitedHttpClient` is automatically injected into services that make HTTP requests:
```java
@Inject
RateLimitedHttpClient httpClient;
// GET request for text
HttpResponse<String> response = httpClient.sendGet(url);
// GET request for binary data (images)
HttpResponse<byte[]> response = httpClient.sendGetBytes(imageUrl);
```
### Integrated Services
1. **TroostwijkMonitor** - API calls for bid monitoring
2. **ImageProcessingService** - Image downloads
3. **QuarkusWorkflowScheduler** - Scheduled workflows
## Monitoring
### REST API Endpoints
#### Get All Rate Limit Statistics
```bash
GET http://localhost:8081/api/monitor/rate-limit/stats
```
Response:
```json
{
"hosts": 2,
"statistics": {
"api.troostwijkauctions.com": {
"totalRequests": 150,
"successfulRequests": 148,
"failedRequests": 1,
"rateLimitedRequests": 0,
"averageDurationMs": 245
},
"images.troostwijkauctions.com": {
"totalRequests": 320,
"successfulRequests": 315,
"failedRequests": 5,
"rateLimitedRequests": 2,
"averageDurationMs": 892
}
}
}
```
#### Get Statistics for Specific Host
```bash
GET http://localhost:8081/api/monitor/rate-limit/stats/api.troostwijkauctions.com
```
Response:
```json
{
"host": "api.troostwijkauctions.com",
"totalRequests": 150,
"successfulRequests": 148,
"failedRequests": 1,
"rateLimitedRequests": 0,
"averageDurationMs": 245
}
```
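The per-host statistics above can be modeled with a simple recorder. This sketch is illustrative — the dictionary keys mirror the JSON response fields, but the real service's internal bookkeeping may differ:

```python
from collections import defaultdict

class RequestStats:
    """Accumulates (status, duration) pairs per host and summarizes them."""

    def __init__(self):
        self.records = defaultdict(list)  # host -> list of (status, duration_ms)

    def record(self, host, status, duration_ms):
        self.records[host].append((status, duration_ms))

    def snapshot(self, host):
        reqs = self.records[host]
        durations = [d for _, d in reqs]
        return {
            "host": host,
            "totalRequests": len(reqs),
            "successfulRequests": sum(1 for s, _ in reqs if 200 <= s < 300),
            # 429s are counted separately from other failures
            "failedRequests": sum(1 for s, _ in reqs if s >= 400 and s != 429),
            "rateLimitedRequests": sum(1 for s, _ in reqs if s == 429),
            "averageDurationMs": sum(durations) // len(durations) if durations else 0,
        }
```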
## How It Works
### Token Bucket Algorithm
1. **Bucket initialization** - Starts with `maxRequestsPerSecond` tokens
2. **Request consumption** - Each request consumes 1 token
3. **Token refill** - Bucket refills every second
4. **Blocking** - If no tokens available, request waits
### Per-Host Rate Limiting
The client automatically:
1. Extracts hostname from URL (e.g., `api.troostwijkauctions.com`)
2. Creates/retrieves rate limiter for that host
3. Applies configured limit (Troostwijk-specific or default)
4. Tracks statistics per host
### Request Flow
```
Request → Extract Host → Get Rate Limiter → Acquire Token → Send Request → Record Stats
                              │
                              ▼
                 Host is troostwijkauctions.com?
                 Yes: 1 req/s | No: 2 req/s
```
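The token bucket and per-host dispatch described above can be sketched together in a few lines. This is an illustrative Python model, not the actual `RateLimitedHttpClient` implementation — the class and method names here are hypothetical:

```python
import time
from urllib.parse import urlparse

class TokenBucket:
    """Refills `rate` tokens per second, capped at `rate` capacity."""

    def __init__(self, rate):
        self.rate = rate
        self.tokens = rate
        self.last = time.monotonic()

    def acquire(self):
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.rate, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens < 1:
            # Block until the next token would be available
            time.sleep((1 - self.tokens) / self.rate)
            self.tokens = 0
            self.last = time.monotonic()
        else:
            self.tokens -= 1

class PerHostLimiter:
    """One bucket per host, with a stricter limit for Troostwijk."""

    def __init__(self, default_rps=2, troostwijk_rps=1):
        self.default_rps = default_rps
        self.troostwijk_rps = troostwijk_rps
        self.buckets = {}

    def limiter_for(self, url):
        host = urlparse(url).hostname
        rps = self.troostwijk_rps if "troostwijkauctions.com" in host else self.default_rps
        return self.buckets.setdefault(host, TokenBucket(rps))
```

Calling `limiter_for(url).acquire()` before each request blocks just long enough to respect the per-host rate.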
## Warning Signs
Monitor for these indicators of rate limiting issues:
| Metric | Warning Threshold | Action |
|--------|------------------|--------|
| `rateLimitedRequests` | > 0 | Server is rate limiting you - reduce `max-rps` |
| `failedRequests` | > 5% | Investigate connection issues or increase timeout |
| `averageDurationMs` | > 3000ms | Server may be slow - reduce load |
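The thresholds in the table can be applied mechanically to a statistics snapshot. The threshold values below come from the table; the function itself is a hypothetical helper, not part of the shipped API:

```python
def rate_limit_warnings(stats):
    """Returns warnings for a per-host stats dict, per the thresholds above."""
    warnings = []
    if stats["rateLimitedRequests"] > 0:
        warnings.append("server is rate limiting you - reduce max-rps")
    total = stats["totalRequests"]
    if total and stats["failedRequests"] / total > 0.05:
        warnings.append("failure rate above 5% - check connections or timeout")
    if stats["averageDurationMs"] > 3000:
        warnings.append("average latency above 3000ms - reduce load")
    return warnings
```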
## Testing
### Manual Test via cURL
```bash
# Test Troostwijk API rate limiting
for i in {1..10}; do
echo "Request $i at $(date +%T)"
curl -s http://localhost:8081/api/monitor/status > /dev/null
sleep 0.5
done
# Check statistics
curl http://localhost:8081/api/monitor/rate-limit/stats | jq
```
### Check Logs
Rate limiting is logged at DEBUG level:
```
03:15:23 DEBUG [RateLimitedHttpClient] HTTP 200 GET api.troostwijkauctions.com (245ms)
03:15:24 DEBUG [RateLimitedHttpClient] HTTP 200 GET api.troostwijkauctions.com (251ms)
03:15:25 WARN [RateLimitedHttpClient] ⚠️ Rate limited by api.troostwijkauctions.com (HTTP 429)
```
## Troubleshooting
### Problem: Getting HTTP 429 (Too Many Requests)
**Solution:** Decrease `max-rps` for that host:
```properties
auction.http.rate-limit.troostwijk-max-rps=0.5
```
### Problem: Requests too slow
**Solution:** Increase `max-rps` (be careful not to get blocked):
```properties
auction.http.rate-limit.default-max-rps=3
```
### Problem: Requests timing out
**Solution:** Increase timeout:
```properties
auction.http.timeout-seconds=60
```
## Best Practices
1. **Start conservative** - Begin with low limits (1 req/s)
2. **Monitor statistics** - Watch `rateLimitedRequests` metric
3. **Respect robots.txt** - Check host's crawling policy
4. **Use off-peak hours** - Run heavy scraping during low-traffic times
5. **Implement exponential backoff** - If receiving 429s, wait longer between retries
## Future Enhancements
Potential improvements:
- [ ] Dynamic rate adjustment based on 429 responses
- [ ] Exponential backoff on failures
- [ ] Per-endpoint rate limiting (not just per-host)
- [ ] Request queue visualization
- [ ] Integration with external rate limit APIs (e.g., Redis)
---
# Scraper Refactor Guide - Image Download Integration
## 🎯 Objective
Refactor the Troostwijk scraper to **download and store images locally**, eliminating the 57M+ duplicate image problem in the monitoring process.
## 📋 Current vs. New Architecture
### **Before** (Current Architecture)
```
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ Scraper │────────▶│ Database │◀────────│ Monitor │
│ │ │ │ │ │
│ Stores URLs │ │ images table │ │ Downloads + │
│ downloaded=0 │ │ │ │ Detection │
└──────────────┘ └──────────────┘ └──────────────┘
57M+ duplicates!
```
### **After** (New Architecture)
```
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ Scraper │────────▶│ Database │◀────────│ Monitor │
│ │ │ │ │ │
│ Downloads + │ │ images table │ │ Detection │
│ Stores path │ │ local_path ✓ │ │ Only │
│ downloaded=1 │ │ │ │ │
└──────────────┘ └──────────────┘ └──────────────┘
No duplicates!
```
## 🗄️ Database Schema Changes
### Current Schema (ARCHITECTURE-TROOSTWIJK-SCRAPER.md:113-122)
```sql
CREATE TABLE images (
id INTEGER PRIMARY KEY AUTOINCREMENT,
lot_id TEXT,
url TEXT,
local_path TEXT, -- Currently NULL
downloaded INTEGER -- Currently 0
-- Missing: processed_at, labels (added by monitor)
);
```
### Required Schema (Already Compatible!)
```sql
CREATE TABLE images (
id INTEGER PRIMARY KEY AUTOINCREMENT,
lot_id TEXT,
url TEXT,
local_path TEXT, -- ✅ SET by scraper after download
downloaded INTEGER, -- ✅ SET to 1 by scraper after download
labels TEXT, -- ⚠️ SET by monitor (object detection)
processed_at INTEGER, -- ⚠️ SET by monitor (timestamp)
FOREIGN KEY (lot_id) REFERENCES lots(lot_id)
);
```
**Good News**: The scraper's schema already has `local_path` and `downloaded` columns! You just need to populate them.
## 🔧 Implementation Steps
### **Step 1: Enable Image Downloading in Configuration**
**File**: Your scraper's config file (e.g., `config.py` or environment variables)
```python
# Current setting
DOWNLOAD_IMAGES = False # ❌ Change this!
# New setting
DOWNLOAD_IMAGES = True # ✅ Enable downloads
# Image storage path
IMAGES_DIR = "/mnt/okcomputer/output/images" # Or your preferred path
```
### **Step 2: Update Image Download Logic**
Based on ARCHITECTURE-TROOSTWIJK-SCRAPER.md:211-228, you already have the structure. Here's what needs to change:
**Current Code** (Conceptual):
```python
# Phase 3: Scrape lot details
def scrape_lot(lot_url):
lot_data = parse_lot_page(lot_url)
# Save lot to database
db.insert_lot(lot_data)
# Save image URLs to database (NOT DOWNLOADED)
for img_url in lot_data['images']:
db.execute("""
INSERT INTO images (lot_id, url, downloaded)
VALUES (?, ?, 0)
""", (lot_data['lot_id'], img_url))
```
**New Code** (Required):
```python
import os
import requests
from pathlib import Path
import time
def scrape_lot(lot_url):
lot_data = parse_lot_page(lot_url)
# Save lot to database
db.insert_lot(lot_data)
# Download and save images
for idx, img_url in enumerate(lot_data['images'], start=1):
try:
# Download image
local_path = download_image(img_url, lot_data['lot_id'], idx)
# Insert with local_path and downloaded=1
db.execute("""
INSERT INTO images (lot_id, url, local_path, downloaded)
VALUES (?, ?, ?, 1)
ON CONFLICT(lot_id, url) DO UPDATE SET
local_path = excluded.local_path,
downloaded = 1
""", (lot_data['lot_id'], img_url, local_path))
# Rate limiting (0.5s between downloads)
time.sleep(0.5)
except Exception as e:
print(f"Failed to download {img_url}: {e}")
# Still insert record but mark as not downloaded
db.execute("""
INSERT INTO images (lot_id, url, downloaded)
VALUES (?, ?, 0)
""", (lot_data['lot_id'], img_url))
def download_image(image_url, lot_id, index):
"""
Downloads an image and saves it to organized directory structure.
Args:
image_url: Remote URL of the image
lot_id: Lot identifier (e.g., "A1-28505-5")
index: Image sequence number (1, 2, 3, ...)
Returns:
Absolute path to saved file
"""
# Create directory structure: /images/{lot_id}/
images_dir = Path(os.getenv('IMAGES_DIR', '/mnt/okcomputer/output/images'))
lot_dir = images_dir / lot_id
lot_dir.mkdir(parents=True, exist_ok=True)
# Determine file extension from URL or content-type
ext = Path(image_url).suffix or '.jpg'
filename = f"{index:03d}{ext}" # 001.jpg, 002.jpg, etc.
local_path = lot_dir / filename
# Download with timeout
response = requests.get(image_url, timeout=10)
response.raise_for_status()
# Save to disk
with open(local_path, 'wb') as f:
f.write(response.content)
return str(local_path.absolute())
```
### **Step 3: Add Unique Constraint to Prevent Duplicates**
**Migration SQL**:
```sql
-- Add unique constraint to prevent duplicate image records
CREATE UNIQUE INDEX IF NOT EXISTS idx_images_unique
ON images(lot_id, url);
```
Add this to your scraper's schema initialization:
```python
def init_database():
conn = sqlite3.connect('/mnt/okcomputer/output/cache.db')
cursor = conn.cursor()
# Existing table creation...
cursor.execute("""
CREATE TABLE IF NOT EXISTS images (...)
""")
# Add unique constraint (NEW)
cursor.execute("""
CREATE UNIQUE INDEX IF NOT EXISTS idx_images_unique
ON images(lot_id, url)
""")
conn.commit()
conn.close()
```
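To see why the unique index matters, here is a self-contained sqlite3 demo (table trimmed to the relevant columns) showing that `ON CONFLICT` turns a re-scrape into an in-place update instead of a duplicate row:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE images (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        lot_id TEXT, url TEXT, local_path TEXT, downloaded INTEGER
    )
""")
conn.execute("CREATE UNIQUE INDEX idx_images_unique ON images(lot_id, url)")

upsert = """
    INSERT INTO images (lot_id, url, local_path, downloaded)
    VALUES (?, ?, ?, 1)
    ON CONFLICT(lot_id, url) DO UPDATE SET
        local_path = excluded.local_path,
        downloaded = 1
"""
row = ("A1-28505-5", "https://example.com/1.jpg", "/images/A1-28505-5/001.jpg")
conn.execute(upsert, row)  # first scrape: inserts the record
conn.execute(upsert, row)  # re-scrape: updates in place, no duplicate

count = conn.execute("SELECT COUNT(*) FROM images").fetchone()[0]
print(count)  # 1 row, not 2
```

Without the unique index, the same two statements would leave two rows — which, multiplied across repeated scrapes, is exactly how the 57M-row table came about.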
### **Step 4: Handle Image Download Failures Gracefully**
```python
def download_with_retry(image_url, lot_id, index, max_retries=3):
"""Downloads image with retry logic."""
for attempt in range(max_retries):
try:
return download_image(image_url, lot_id, index)
except requests.exceptions.RequestException as e:
if attempt == max_retries - 1:
print(f"Failed after {max_retries} attempts: {image_url}")
return None # Return None on failure
print(f"Retry {attempt + 1}/{max_retries} for {image_url}")
time.sleep(2 ** attempt) # Exponential backoff
```
### **Step 5: Update Database Queries**
Make sure your INSERT uses `INSERT ... ON CONFLICT` to handle re-scraping:
```python
# Good: Handles re-scraping without duplicates
db.execute("""
INSERT INTO images (lot_id, url, local_path, downloaded)
VALUES (?, ?, ?, 1)
ON CONFLICT(lot_id, url) DO UPDATE SET
local_path = excluded.local_path,
downloaded = 1
""", (lot_id, img_url, local_path))
# Bad: Creates duplicates on re-scrape
db.execute("""
INSERT INTO images (lot_id, url, local_path, downloaded)
VALUES (?, ?, ?, 1)
""", (lot_id, img_url, local_path))
```
## 📊 Expected Outcomes
### Before Refactor
```sql
SELECT COUNT(*) FROM images WHERE downloaded = 0;
-- Result: 57,376,293 (57M+ undownloaded!)
SELECT COUNT(*) FROM images WHERE local_path IS NOT NULL;
-- Result: 0 (no files downloaded)
```
### After Refactor
```sql
SELECT COUNT(*) FROM images WHERE downloaded = 1;
-- Result: ~16,807 (one per actual lot image)
SELECT COUNT(*) FROM images WHERE local_path IS NOT NULL;
-- Result: ~16,807 (all downloaded images have paths)
SELECT COUNT(*) FROM (SELECT DISTINCT lot_id, url FROM images);
-- Result: ~16,807 (no duplicates!)
-- Note: SQLite does not support COUNT(DISTINCT col1, col2) directly,
-- so the distinct pairs are counted via a subquery.
```
## 🚀 Deployment Checklist
### Pre-Deployment
- [ ] Back up current database: `cp cache.db cache.db.backup`
- [ ] Verify disk space: At least 10GB free for images
- [ ] Test download function on 5 sample lots
- [ ] Verify `IMAGES_DIR` path exists and is writable
### Deployment
- [ ] Update configuration: `DOWNLOAD_IMAGES = True`
- [ ] Run schema migration to add unique index
- [ ] Deploy updated scraper code
- [ ] Monitor first 100 lots for errors
### Post-Deployment Verification
```sql
-- Check download success rate
SELECT
COUNT(*) as total_images,
SUM(CASE WHEN downloaded = 1 THEN 1 ELSE 0 END) as downloaded,
SUM(CASE WHEN downloaded = 0 THEN 1 ELSE 0 END) as failed,
ROUND(100.0 * SUM(downloaded) / COUNT(*), 2) as success_rate
FROM images;
-- Check for duplicates (should be 0)
SELECT lot_id, url, COUNT(*) as dup_count
FROM images
GROUP BY lot_id, url
HAVING COUNT(*) > 1;
-- Verify file system
SELECT COUNT(*) FROM images
WHERE downloaded = 1
AND local_path IS NOT NULL
AND local_path != '';
```
## 🔍 Monitoring Process Impact
The monitoring process (auctiora) will automatically:
- ✅ Stop downloading images (network I/O eliminated)
- ✅ Only run object detection on `local_path` files
- ✅ Query: `WHERE local_path IS NOT NULL AND (labels IS NULL OR labels = '')`
- ✅ Update only the `labels` and `processed_at` columns
**No changes needed in monitoring process!** It's already updated to work with scraper-downloaded images.
## 🐛 Troubleshooting
### Problem: "No space left on device"
```bash
# Check disk usage
df -h /mnt/okcomputer/output/images
# Estimate needed space: ~100KB per image
# 16,807 images × 100KB = ~1.6GB
```
### Problem: "Permission denied" when writing images
```bash
# Fix permissions
chmod 755 /mnt/okcomputer/output/images
chown -R scraper_user:scraper_group /mnt/okcomputer/output/images
```
### Problem: Images downloading but not recorded in DB
```python
# Add logging
import logging
logging.basicConfig(level=logging.INFO)
def download_image(...):
logging.info(f"Downloading {image_url} to {local_path}")
# ... download code ...
logging.info(f"Saved to {local_path}, size: {os.path.getsize(local_path)} bytes")
return local_path
```
### Problem: Duplicate images after refactor
```sql
-- Find duplicates
SELECT lot_id, url, COUNT(*)
FROM images
GROUP BY lot_id, url
HAVING COUNT(*) > 1;
-- Clean up duplicates (keep newest)
DELETE FROM images
WHERE id NOT IN (
SELECT MAX(id)
FROM images
GROUP BY lot_id, url
);
```
## 📈 Performance Comparison
| Metric | Before (Monitor Downloads) | After (Scraper Downloads) |
|----------------------|---------------------------------|---------------------------|
| **Image records** | 57,376,293 | ~16,807 |
| **Duplicates** | 57,359,486 (99.97%!) | 0 |
| **Network I/O** | Monitor process | Scraper process |
| **Disk usage** | 0 (URLs only) | ~1.6GB (actual files) |
| **Processing speed** | 500ms/image (download + detect) | 100ms/image (detect only) |
| **Error handling** | Complex (download failures) | Simple (files exist) |
## 🎓 Code Examples by Language
### Python (Most Likely)
See **Step 2** above for complete implementation.
## 📚 References
- **Current Scraper Architecture**: `wiki/ARCHITECTURE-TROOSTWIJK-SCRAPER.md`
- **Database Schema**: `wiki/DATABASE_ARCHITECTURE.md`
- **Monitor Changes**: See commit history for `ImageProcessingService.java`, `DatabaseService.java`
## ✅ Success Criteria
You'll know the refactor is successful when:
1. ✅ Database query `SELECT COUNT(*) FROM images` returns ~16,807 (not 57M+)
2. ✅ All images have `downloaded = 1` and `local_path IS NOT NULL`
3. ✅ No duplicate records: `SELECT lot_id, url, COUNT(*) ... HAVING COUNT(*) > 1` returns 0 rows
4. ✅ Monitor logs show "Found N images needing detection" with reasonable numbers
5. ✅ Files exist at paths in `local_path` column
6. ✅ Monitor process speed increases (100ms vs 500ms per image)
---
**Questions?** Check the troubleshooting section or inspect the monitor's updated code in:
- `src/main/java/auctiora/ImageProcessingService.java`
- `src/main/java/auctiora/DatabaseService.java:695-719`
---
**docs/TEST_SUITE_SUMMARY.md**
# Test Suite Summary
## Overview
Comprehensive test suite for Troostwijk Auction Monitor with individual test cases for every aspect of the system.
## Configuration Updates
### Paths Updated
- **Database**: `C:\mnt\okcomputer\output\cache.db`
- **Images**: `C:\mnt\okcomputer\output\images\{saleId}\{lotId}\`
### Files Modified
1. `src/main/java/com/auction/Main.java` - Updated default database path
2. `src/main/java/com/auction/ImageProcessingService.java` - Updated image storage path
## Test Files Created
### 1. ScraperDataAdapterTest.java (13 test cases)
Tests data transformation from external scraper schema to monitor schema:
- ✅ Extract numeric ID from text format (auction & lot IDs)
- ✅ Convert scraper auction format to AuctionInfo
- ✅ Handle simple location without country
- ✅ Convert scraper lot format to Lot
- ✅ Parse bid amounts from various formats (€, $, £, plain numbers)
- ✅ Handle missing/null fields gracefully
- ✅ Parse various timestamp formats (ISO, SQL)
- ✅ Handle invalid timestamps
- ✅ Extract type prefix from auction ID
- ✅ Handle GBP currency symbol
- ✅ Handle "No bids" text
- ✅ Parse complex lot IDs (A1-28505-5 → 285055)
- ✅ Validate field mapping (lots_count → lotCount, etc.)
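
The currency-parsing cases above can be sketched as follows. This is a hypothetical helper, not the adapter's actual code, and it assumes European decimal-comma formats such as `€1.234,56`:

```java
// Hypothetical sketch of the bid parsing covered by the tests above.
// Assumes European formats ('.' thousands, ',' decimal); "No bids" maps to 0.
public class BidParseSketch {
    static double parseBidAmount(String raw) {
        if (raw == null || raw.toLowerCase().contains("no bid")) return 0.0;
        String s = raw.replaceAll("[^0-9.,]", "");    // strip €, $, £, spaces
        if (s.contains(",")) {
            s = s.replace(".", "").replace(",", "."); // "1.234,56" -> "1234.56"
        }
        return s.isEmpty() ? 0.0 : Double.parseDouble(s);
    }
}
```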
### 2. DatabaseServiceTest.java (15 test cases)
Tests database operations and SQLite persistence:
- ✅ Create database schema successfully
- ✅ Insert and retrieve auction
- ✅ Update existing auction on conflict (UPSERT)
- ✅ Retrieve auctions by country code
- ✅ Insert and retrieve lot
- ✅ Update lot current bid
- ✅ Update lot notification flags
- ✅ Insert and retrieve image records
- ✅ Count total images
- ✅ Handle empty database gracefully
- ✅ Handle lots with null closing time
- ✅ Retrieve active lots
- ✅ Handle concurrent upserts (thread safety)
- ✅ Validate foreign key relationships
- ✅ Test database indexes performance
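
The UPSERT behaviour tested above relies on SQLite's `ON CONFLICT` clause. A minimal sketch of the kind of statement involved (column names are illustrative, not the actual schema):

```java
public class UpsertSketch {
    // Illustrative UPSERT: insert a lot, or update its bid if the row already exists.
    static final String UPSERT_LOT = """
        INSERT INTO lots (lot_id, title, current_bid)
        VALUES (?, ?, ?)
        ON CONFLICT(lot_id) DO UPDATE SET
            current_bid = excluded.current_bid
        """;
}
```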
### 3. ImageProcessingServiceTest.java (11 test cases)
Tests image downloading and processing pipeline:
- ✅ Process images for lot with object detection
- ✅ Handle image download failure gracefully
- ✅ Create directory structure for images
- ✅ Save detected objects to database
- ✅ Handle empty image list
- ✅ Process pending images from database
- ✅ Skip lots that already have images
- ✅ Handle database errors during image save
- ✅ Handle empty detection results
- ✅ Handle lots with no existing images
- ✅ Capture and verify detection labels
### 4. ObjectDetectionServiceTest.java (10 test cases)
Tests YOLO object detection functionality:
- ✅ Initialize with missing YOLO models (disabled mode)
- ✅ Return empty list when detection is disabled
- ✅ Handle invalid image path gracefully
- ✅ Handle empty image file
- ✅ Initialize successfully with valid model files
- ✅ Handle missing class names file
- ✅ Detect when model files are missing
- ✅ Return unique labels only
- ✅ Handle multiple detections in same image
- ✅ Respect confidence threshold (0.5)
### 5. NotificationServiceTest.java (19 test cases)
Tests desktop and email notification delivery:
- ✅ Initialize with desktop-only configuration
- ✅ Initialize with SMTP configuration
- ✅ Reject invalid SMTP configuration format
- ✅ Reject unknown configuration type
- ✅ Send desktop notification without error
- ✅ Send high priority notification
- ✅ Send normal priority notification
- ✅ Handle notification when system tray not supported
- ✅ Send email notification with valid SMTP config
- ✅ Include both desktop and email when SMTP configured
- ✅ Handle empty message gracefully
- ✅ Handle very long message (1000+ chars)
- ✅ Handle special characters in message (€, ⚠️)
- ✅ Accept case-insensitive desktop config
- ✅ Validate SMTP config parts count
- ✅ Handle multiple rapid notifications
- ✅ Send bid change notification format
- ✅ Send closing alert notification format
- ✅ Send object detection notification format
### 6. TroostwijkMonitorTest.java (12 test cases)
Tests monitoring orchestration and coordination:
- ✅ Initialize monitor successfully
- ✅ Print database stats without error
- ✅ Process pending images without error
- ✅ Handle empty database gracefully
- ✅ Track lots in database
- ✅ Monitor lots closing soon (< 5 minutes)
- ✅ Identify lots with time remaining
- ✅ Handle lots without closing time
- ✅ Track notification status
- ✅ Update bid amounts
- ✅ Handle multiple concurrent lot updates
- ✅ Handle database with auctions and lots
### 7. IntegrationTest.java (10 test cases)
Tests complete end-to-end workflows:
- **Test 1**: Complete scraper data import workflow
  - Import auction from scraper format
  - Import multiple lots for auction
  - Verify data integrity
- **Test 2**: Image processing and detection workflow
  - Add images for lots
  - Run object detection
  - Save labels to database
- **Test 3**: Bid monitoring and notification workflow
  - Simulate bid increase
  - Update database
  - Send notification
  - Verify bid was updated
- **Test 4**: Closing alert workflow
  - Create lot closing soon
  - Send high-priority notification
  - Mark as notified
  - Verify notification flag
- **Test 5**: Multi-country auction filtering
  - Add auctions from NL, RO, BE
  - Filter by country code
  - Verify filtering works correctly
- **Test 6**: Complete monitoring cycle
  - Print database statistics
  - Process pending images
  - Verify database integrity
- **Test 7**: Data consistency across services
  - Verify all auctions have valid data
  - Verify all lots have valid data
  - Check referential integrity
- **Test 8**: Object detection value estimation workflow
  - Create lot with detected objects
  - Add images with labels
  - Analyze detected objects
  - Send value estimation notification
- **Test 9**: Handle rapid concurrent updates
  - Concurrent auction insertions
  - Concurrent lot insertions
  - Verify all data persisted correctly
- **Test 10**: End-to-end notification scenarios
  - Bid change notification
  - Closing alert
  - Object detection notification
  - Value estimate notification
  - Viewing day reminder
## Test Coverage Summary
| Component | Test Cases | Coverage Areas |
|-----------|-----------|----------------|
| **ScraperDataAdapter** | 13 | Data transformation, ID parsing, currency parsing, timestamp parsing |
| **DatabaseService** | 15 | CRUD operations, concurrency, foreign keys, indexes |
| **ImageProcessingService** | 11 | Download, detection integration, error handling |
| **ObjectDetectionService** | 10 | YOLO initialization, detection, confidence threshold |
| **NotificationService** | 19 | Desktop/Email, priority levels, special chars, formats |
| **TroostwijkMonitor** | 12 | Orchestration, monitoring, bid tracking, alerts |
| **Integration** | 10 | End-to-end workflows, multi-service coordination |
| **TOTAL** | **90** | **Complete system coverage** |
## Key Testing Patterns
### 1. Isolation Testing
Each component tested independently with mocks:
```java
mockDb = mock(DatabaseService.class);
mockDetector = mock(ObjectDetectionService.class);
service = new ImageProcessingService(mockDb, mockDetector);
```
### 2. Integration Testing
Components tested together for realistic scenarios:
```
db → imageProcessor → detector → notifier
```
### 3. Concurrency Testing
Thread safety verified with parallel operations:
```java
Thread t1 = new Thread(() -> db.upsertLot(...));
Thread t2 = new Thread(() -> db.upsertLot(...));
t1.start(); t2.start();
```
### 4. Error Handling
Graceful degradation tested throughout:
```java
assertDoesNotThrow(() -> service.process(invalidInput));
```
## Running the Tests
### Run All Tests
```bash
mvn test
```
### Run Specific Test Class
```bash
mvn test -Dtest=ScraperDataAdapterTest
mvn test -Dtest=IntegrationTest
```
### Run Single Test Method
```bash
mvn test -Dtest=IntegrationTest#testCompleteScraperImportWorkflow
```
### Generate Coverage Report
```bash
mvn jacoco:prepare-agent test jacoco:report
```
## Test Data Cleanup
All tests use temporary databases that are automatically cleaned up:
```java
@AfterAll
void tearDown() throws Exception {
Files.deleteIfExists(Paths.get(testDbPath));
}
```
## Integration Scenarios Covered
### Scenario 1: New Auction Discovery
1. External scraper finds new auction
2. Data imported via ScraperDataAdapter
3. Lots added to database
4. Images downloaded
5. Object detection runs
6. Notification sent to user
### Scenario 2: Bid Monitoring
1. Monitor checks API every hour
2. Detects bid increase
3. Updates database
4. Sends notification
5. User can place counter-bid
### Scenario 3: Closing Alert
1. Monitor checks closing times
2. Lot closing in < 5 minutes
3. High-priority notification sent
4. Flag updated to prevent duplicates
5. User can place final bid
### Scenario 4: Value Estimation
1. Images downloaded
2. YOLO detects objects
3. Labels saved to database
4. Value estimated (future feature)
5. Notification sent with estimate
## Dependencies Required for Tests
```xml
<dependencies>
<!-- JUnit 5 -->
<dependency>
<groupId>org.junit.jupiter</groupId>
<artifactId>junit-jupiter</artifactId>
<version>5.10.0</version>
<scope>test</scope>
</dependency>
<!-- Mockito -->
<dependency>
<groupId>org.mockito</groupId>
<artifactId>mockito-core</artifactId>
<version>5.5.0</version>
<scope>test</scope>
</dependency>
<!-- Mockito JUnit Jupiter -->
<dependency>
<groupId>org.mockito</groupId>
<artifactId>mockito-junit-jupiter</artifactId>
<version>5.5.0</version>
<scope>test</scope>
</dependency>
</dependencies>
```
## Notes
- All tests are independent and can run in any order
- Tests use in-memory or temporary databases
- No actual HTTP requests made (except in integration tests)
- YOLO models are optional (tests work in disabled mode)
- Notifications are tested but may not display in headless environments
- Tests document expected behavior for each component
## Future Test Enhancements
1. **Mock HTTP Server** for realistic image download testing
2. **Test Containers** for full database integration
3. **Performance Tests** for large datasets (1000+ auctions)
4. **Stress Tests** for concurrent monitoring scenarios
5. **UI Tests** for notification display (if GUI added)
6. **API Tests** for Troostwijk API integration
7. **Value Estimation** tests (when algorithm implemented)

docs/VALUATION.md Normal file

@@ -0,0 +1,304 @@
# Auction Valuation Mathematics - Technical Reference
## 1. Fair Market Value (FMV) - Core Valuation Formula
The baseline valuation is calculated using a **weighted comparable sales approach**:
$$
FMV = \frac{\sum_{i=1}^{n} \left( P_i \cdot \omega_c \cdot \omega_t \cdot \omega_p \cdot \omega_h \right)}{\sum_{i=1}^{n} \left( \omega_c \cdot \omega_t \cdot \omega_p \cdot \omega_h \right)}
$$
**Variables:**
- $P_i$ = Final hammer price of comparable lot *i* (€)
- $\omega_c$ = **Condition weight**: $\exp(-\lambda_c \cdot |C_{target} - C_i|)$
- $\omega_t$ = **Time weight**: $\exp(-\lambda_t \cdot |T_{target} - T_i|)$
- $\omega_p$ = **Provenance weight**: $1 + \delta_p \cdot (P_{target} - P_i)$
- $\omega_h$ = **Historical weight**: $\left( \frac{1}{1 + e^{-kh \cdot (D_i - D_{median})}} \right)$
**Parameter Definitions:**
- $C \in [0, 10]$ = Condition score (10 = perfect)
- $T$ = Manufacturing year
- $P \in \{0,1\}$ = Provenance flag (1 = documented history)
- $D_i$ = Days since comparable sale
- $\lambda_c = 0.693$ = Condition decay constant (50% weight at 1-point difference)
- $\lambda_t = 0.048$ = Time decay constant (50% weight at 15-year difference)
- $\delta_p = 0.15$ = Provenance premium coefficient
- $kh = 0.01$ = Historical relevance coefficient
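
A quick numeric check of the decay constants (a standalone sketch, not project code): with $\lambda_c = 0.693$ a 1-point condition gap halves the weight, and with $\lambda_t = 0.048$ a 15-year gap roughly halves it.

```java
public class FmvWeights {
    // omega_c = exp(-0.693 * |C_target - C_i|)
    static double conditionWeight(double cTarget, double cComp) {
        return Math.exp(-0.693 * Math.abs(cTarget - cComp));
    }
    // omega_t = exp(-0.048 * |T_target - T_i|)
    static double timeWeight(int yearTarget, int yearComp) {
        return Math.exp(-0.048 * Math.abs(yearTarget - yearComp));
    }
}
```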
---
## 2. Condition Adjustment Multiplier
Normalizes prices across condition states:
$$
M_{cond} = \exp\left( \alpha_c \cdot \sqrt{C_{target}} - \beta_c \right)
$$
**Variables:**
- $\alpha_c = 0.15$ = Condition sensitivity parameter
- $\beta_c = 0.40$ = Baseline condition offset
- $C_{target}$ = Target lot condition score
**Interpretation (with the constants above):**
- $C = 10$ (mint): $M_{cond} = \exp(0.15\sqrt{10} - 0.40) \approx 1.08$
- $C = 5$ (average): $M_{cond} \approx 0.94$
- $C = 0$ (poor): $M_{cond} = e^{-0.40} \approx 0.67$, so mint carries a ~61% premium over poor condition
---
## 3. Time-Based Depreciation Model
For equipment/machinery with measurable lifespan:
$$
V_{age} = V_{new} \cdot \left( 1 - \gamma \cdot \ln\left( 1 + \frac{Y_{current} - Y_{manu}}{Y_{expected}} \right) \right)
$$
**Variables:**
- $V_{new}$ = Original market value (€)
- $\gamma = 0.25$ = Depreciation aggressivity factor
- $Y_{current}$ = Current year
- $Y_{manu}$ = Manufacturing year
- $Y_{expected}$ = Expected useful life span (years)
**Example:** A 10-year-old machine with a 25-year expected life retains $1 - 0.25\ln(1.4) \approx 92\%$ of its value.
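
The model can be checked numerically with the constants above (a standalone sketch, not project code):

```java
public class Depreciation {
    // V_age = V_new * (1 - gamma * ln(1 + age / expectedLife)), with gamma = 0.25
    static double ageAdjustedValue(double vNew, int ageYears, int expectedLifeYears) {
        return vNew * (1 - 0.25 * Math.log(1 + (double) ageYears / expectedLifeYears));
    }
}
```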
---
## 4. Provenance Premium Calculation
$$
\Delta_{prov} = V_{base} \cdot \left( \eta_0 + \eta_1 \cdot \ln(1 + N_{docs}) \right)
$$
**Variables:**
- $V_{base}$ = Base valuation without provenance (€)
- $N_{docs}$ = Number of verifiable provenance documents
- $\eta_0 = 0.08$ = Base provenance premium (8%)
- $\eta_1 = 0.035$ = Marginal document premium coefficient
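
As a worked example of the premium formula (a standalone sketch using the coefficients above):

```java
public class ProvenancePremium {
    // Delta_prov = V_base * (0.08 + 0.035 * ln(1 + N_docs))
    static double premium(double vBase, int nDocs) {
        return vBase * (0.08 + 0.035 * Math.log(1 + nDocs));
    }
}
```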
---
## 5. Undervaluation Detection Score
Critical for identifying mispriced opportunities:
$$
U_{score} = \frac{FMV - P_{current}}{FMV} \cdot \sigma_{market} \cdot \left( 1 + \frac{B_{velocity}}{B_{threshold}} \right) \cdot \ln\left( 1 + \frac{W_{watch}}{W_{bid}} \right)
$$
**Variables:**
- $P_{current}$ = Current bid price (€)
- $\sigma_{market} \in [0,1]$ = Market volatility factor (from indices)
- $B_{velocity}$ = Bids per hour (bph)
- $B_{threshold} = 10$ bph = High-velocity threshold
- $W_{watch}$ = Watch count
- $W_{bid}$ = Bid count
**Trigger condition:** $U_{score} > 0.25$ (25% undervaluation) with confidence > 0.70
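
The score can be sketched directly from the formula (a standalone illustration; parameter names are mine, not project code):

```java
public class UndervaluationScore {
    // U = (FMV - P)/FMV * sigma * (1 + Bv/B_threshold) * ln(1 + watchers/bids)
    static double score(double fmv, double currentBid, double sigmaMarket,
                        double bidsPerHour, double watchers, double bids) {
        double bThreshold = 10.0; // high-velocity threshold (bph)
        return (fmv - currentBid) / fmv
             * sigmaMarket
             * (1 + bidsPerHour / bThreshold)
             * Math.log(1 + watchers / bids);
    }
}
```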
---
## 6. Bid Velocity Indicator (Competition Heat)
Measures real-time competitive intensity:
$$
\Lambda_b(t) = \frac{dB}{dt} \cdot \exp\left( -\lambda_{cool} \cdot (t - t_{last}) \right)
$$
**Variables:**
- $\frac{dB}{dt}$ = Bid frequency derivative (bids/minute)
- $\lambda_{cool} = 0.1$ = Cool-down decay constant
- $t_{last}$ = Timestamp of last bid (minutes)
**Interpretation:**
- $\Lambda_b > 5$ = **Hot lot** (bidding war likely)
- $\Lambda_b < 0.5$ = **Cold lot** (potential sleeper)
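
The decay term means a lot cools off quickly once bidding pauses; a minimal sketch (illustrative names, not project code):

```java
public class BidVelocity {
    // Lambda_b(t) = rate * exp(-0.1 * minutesSinceLastBid)
    static double heat(double bidsPerMinute, double minutesSinceLastBid) {
        return bidsPerMinute * Math.exp(-0.1 * minutesSinceLastBid);
    }
}
```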
---
## 7. Final Price Prediction Model
Composite machine learning-style formula:
$$
\hat{P}_{final} = FMV \cdot \left( 1 + \epsilon_{bid} + \epsilon_{time} + \epsilon_{comp} \right)
$$
**Error Components:**
- **Bid momentum error**:
$$\epsilon_{bid} = \tanh\left( \phi_1 \cdot \Lambda_b - \phi_2 \cdot \frac{P_{current}}{FMV} \right)$$
- **Time-to-close error**:
$$\epsilon_{time} = \psi \cdot \exp\left( -\frac{t_{close}}{30} \right)$$
- **Competition error**:
$$\epsilon_{comp} = \rho \cdot \ln\left( 1 + \frac{W_{watch}}{50} \right)$$
**Parameters:**
- $\phi_1 = 0.15$, $\phi_2 = 0.10$ = Bid momentum coefficients
- $\psi = 0.20$ = Time pressure coefficient
- $\rho = 0.08$ = Competition coefficient
- $t_{close}$ = Minutes until close
**Confidence interval**:
$$
CI_{95\%} = \hat{P}_{final} \pm 1.96 \cdot \sigma_{residual}
$$
---
## 8. Bidding Strategy Recommendation Engine
Optimal max bid and timing:
$$
S_{max} =
\begin{cases}
FMV \cdot (1 - \theta_{agg}) & \text{if } U_{score} > 0.20 \\
FMV \cdot (1 + \theta_{cons}) & \text{if } \Lambda_b > 3 \\
\hat{P}_{final} - \delta_{margin} & \text{otherwise}
\end{cases}
$$
**Variables:**
- $\theta_{agg} = 0.10$ = Aggressive buyer discount target (10% below FMV)
- $\theta_{cons} = 0.05$ = Conservative buyer overbid tolerance
- $\delta_{margin} = €50$ = Minimum margin below predicted final
**Timing function**:
$$
t_{optimal} = t_{close} - \begin{cases}
5 \text{ min} & \text{if } \Lambda_b < 1 \\
30 \text{ sec} & \text{if } \Lambda_b > 5 \\
10 \text{ min} & \text{otherwise}
\end{cases}
$$
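
The two case expressions above can be sketched as plain decision rules (a standalone illustration with the stated parameters; names are mine, not project code):

```java
public class BiddingStrategySketch {
    // S_max per the three cases: aggressive discount, conservative overbid, or margin below prediction
    static double maxBid(double fmv, double uScore, double lambdaB, double predictedFinal) {
        if (uScore > 0.20) return fmv * 0.90;   // 10% below FMV (theta_agg)
        if (lambdaB > 3)   return fmv * 1.05;   // 5% overbid tolerance (theta_cons)
        return predictedFinal - 50.0;           // EUR 50 margin below predicted final
    }
    // Minutes before close to place the bid, per the timing function
    static double minutesBeforeClose(double lambdaB) {
        if (lambdaB < 1) return 5.0;
        if (lambdaB > 5) return 0.5;            // 30 seconds
        return 10.0;
    }
}
```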
---
## Variable Reference Table
| Symbol | Variable | Unit | Data Source |
|--------|----------|------|-------------|
| $P_i$ | Comparable sale price | € | `bid_history.final` |
| $C$ | Condition score | [0,10] | Image analysis + text parsing |
| $T$ | Manufacturing year | Year | Lot description extraction |
| $W_{watch}$ | Number of watchers | Count | Page metadata |
| $\Lambda_b$ | Bid velocity | bids/min | `bid_history.timestamp` diff |
| $t_{close}$ | Time until close | Minutes | `lots.closing_time` - NOW() |
| $\sigma_{market}$ | Market volatility | [0,1] | `market_indices.price_change_30d` |
| $N_{docs}$ | Provenance documents | Count | PDF link analysis |
| $B_{velocity}$ | Bid acceleration | bph² | Second derivative of $\Lambda_b$ |
---
## Backend Implementation (Quarkus Pseudo-Code)
```java
@Inject
MLModelService mlModel;
public Valuation calculateFairMarketValue(Lot lot) {
    // Find up to 20 comparables with similarity >= 0.75
    List<Comparable> comparables = db.findComparables(lot, 0.75, 20);
    double weightedSum = 0.0;
    double weightSum = 0.0;
    for (Comparable comp : comparables) {
        double wc = Math.exp(-0.693 * Math.abs(lot.getConditionScore() - comp.getConditionScore()));
        double wt = Math.exp(-0.048 * Math.abs(lot.getYear() - comp.getYear()));
        double wp = 1 + 0.15 * ((lot.hasProvenance() ? 1 : 0) - (comp.hasProvenance() ? 1 : 0));
        double weight = wc * wt * wp;
        weightedSum += comp.getFinalPrice() * weight;
        weightSum += weight;
    }
    double fmv = weightSum > 0 ? weightedSum / weightSum : lot.getEstimatedMin();
    // Apply condition multiplier
    fmv *= Math.exp(0.15 * Math.sqrt(lot.getConditionScore()) - 0.40);
    return new Valuation(fmv, calculateConfidence(comparables.size()));
}

public BiddingStrategy getBiddingStrategy(String lotId) {
    var lot = db.getLot(lotId);
    var bidHistory = db.getBidHistory(lotId);
    var watchers = lot.getWatchCount();
    // Analyze patterns
    boolean isSnipeTarget = watchers > 50 && bidHistory.size() < 5;
    boolean hasReserve = lot.getReservePrice() > 0;
    double bidVelocity = calculateBidVelocity(bidHistory);
    // Strategy recommendation
    String strategy = isSnipeTarget ? "SNIPING_DETECTED" :
        (hasReserve && lot.getCurrentBid() < lot.getReservePrice() * 0.9) ? "RESERVE_AVOID" :
        bidVelocity > 5.0 ? "AGGRESSIVE_COMPETITION" : "STANDARD";
    return new BiddingStrategy(
        strategy,
        calculateRecommendedMax(lot),
        isSnipeTarget ? "FINAL_30_SECONDS" : "FINAL_10_MINUTES",
        getCompetitionLevel(watchers, bidHistory.size())
    );
}
```
```sqlite
-- Core bidding intelligence
ALTER TABLE lots ADD COLUMN starting_bid DECIMAL(12,2);
ALTER TABLE lots ADD COLUMN estimated_min DECIMAL(12,2);
ALTER TABLE lots ADD COLUMN estimated_max DECIMAL(12,2);
ALTER TABLE lots ADD COLUMN reserve_price DECIMAL(12,2);
ALTER TABLE lots ADD COLUMN watch_count INTEGER DEFAULT 0;
ALTER TABLE lots ADD COLUMN first_bid_time TEXT;
ALTER TABLE lots ADD COLUMN last_bid_time TEXT;
ALTER TABLE lots ADD COLUMN bid_velocity DECIMAL(5,2);
-- Bid history (critical)
CREATE TABLE bid_history (
id INTEGER PRIMARY KEY AUTOINCREMENT,
lot_id TEXT REFERENCES lots(lot_id),
bid_amount DECIMAL(12,2) NOT NULL,
bid_time TEXT NOT NULL,
is_winning BOOLEAN DEFAULT FALSE,
is_autobid BOOLEAN DEFAULT FALSE,
bidder_id TEXT,
created_at TEXT DEFAULT CURRENT_TIMESTAMP
);
-- Valuation support
ALTER TABLE lots ADD COLUMN condition_score DECIMAL(3,2);
ALTER TABLE lots ADD COLUMN year_manufactured INTEGER;
ALTER TABLE lots ADD COLUMN provenance TEXT;
CREATE TABLE comparable_sales (
id INTEGER PRIMARY KEY AUTOINCREMENT,
lot_id TEXT REFERENCES lots(lot_id),
comparable_lot_id TEXT,
similarity_score DECIMAL(3,2),
price_difference_percent DECIMAL(5,2)
);
CREATE TABLE market_indices (
category TEXT NOT NULL,
manufacturer TEXT,
avg_price DECIMAL(12,2),
price_change_30d DECIMAL(5,2),
PRIMARY KEY (category, manufacturer)
);
-- Alert system
CREATE TABLE price_alerts (
id INTEGER PRIMARY KEY AUTOINCREMENT,
lot_id TEXT REFERENCES lots(lot_id),
alert_type TEXT CHECK(alert_type IN ('UNDERVALUED', 'ACCELERATING', 'RESERVE_IN_SIGHT')),
trigger_price DECIMAL(12,2),
is_triggered BOOLEAN DEFAULT FALSE
);
```

docs/WORKFLOW_GUIDE.md Normal file

@@ -0,0 +1,537 @@
# Troostwijk Auction Monitor - Workflow Integration Guide
Complete guide for running the auction monitoring system with scheduled workflows, cron jobs, and event-driven triggers.
---
## Table of Contents
1. [Overview](#overview)
2. [Running Modes](#running-modes)
3. [Workflow Orchestration](#workflow-orchestration)
4. [Windows Scheduling](#windows-scheduling)
5. [Event-Driven Triggers](#event-driven-triggers)
6. [Configuration](#configuration)
7. [Monitoring & Debugging](#monitoring--debugging)
---
## Overview
The Troostwijk Auction Monitor supports multiple execution modes:
- **Workflow Mode** (Recommended): Continuous operation with built-in scheduling
- **Once Mode**: Single execution for external schedulers (Windows Task Scheduler, cron)
- **Legacy Mode**: Original monitoring approach
- **Status Mode**: Quick status check
---
## Running Modes
### 1. Workflow Mode (Default - Recommended)
**Runs all workflows continuously with built-in scheduling.**
```bash
# Windows
java -jar target\troostwijk-scraper-1.0-SNAPSHOT-jar-with-dependencies.jar workflow
# Or simply (workflow is default)
java -jar target\troostwijk-scraper-1.0-SNAPSHOT-jar-with-dependencies.jar
# Using batch script
run-workflow.bat
```
**What it does:**
- ✅ Imports scraper data every 30 minutes
- ✅ Processes images every 1 hour
- ✅ Monitors bids every 15 minutes
- ✅ Checks closing times every 5 minutes
**Best for:**
- Production deployment
- Long-running services
- Development/testing
---
### 2. Once Mode (For External Schedulers)
**Runs complete workflow once and exits.**
```bash
# Windows
java -jar target\troostwijk-scraper-1.0-SNAPSHOT-jar-with-dependencies.jar once
# Using batch script
run-once.bat
```
**What it does:**
1. Imports scraper data
2. Processes pending images
3. Monitors bids
4. Checks closing times
5. Exits
**Best for:**
- Windows Task Scheduler
- Cron jobs (Linux/Mac)
- Manual execution
- Testing
---
### 3. Legacy Mode
**Original monitoring approach (backward compatibility).**
```bash
java -jar target\troostwijk-scraper-1.0-SNAPSHOT-jar-with-dependencies.jar legacy
```
**Best for:**
- Maintaining existing deployments
- Troubleshooting
---
### 4. Status Mode
**Shows current status and exits.**
```bash
java -jar target\troostwijk-scraper-1.0-SNAPSHOT-jar-with-dependencies.jar status
# Using batch script
check-status.bat
```
**Output:**
```
📊 Workflow Status:
Running: No
Auctions: 25
Lots: 150
Images: 300
Closing soon (< 30 min): 5
```
---
## Workflow Orchestration
The `WorkflowOrchestrator` coordinates 4 scheduled workflows:
### Workflow 1: Scraper Data Import
**Frequency:** Every 30 minutes
**Purpose:** Import new auctions and lots from external scraper
**Process:**
1. Import auctions from scraper database
2. Import lots from scraper database
3. Import image URLs
4. Send notification if significant data imported
**Code Location:** `WorkflowOrchestrator.java:110`
---
### Workflow 2: Image Processing
**Frequency:** Every 1 hour
**Purpose:** Download images and run object detection
**Process:**
1. Get unprocessed images from database
2. Download each image
3. Run YOLO object detection
4. Save labels to database
5. Send notification for interesting detections (3+ objects)
**Code Location:** `WorkflowOrchestrator.java:150`
---
### Workflow 3: Bid Monitoring
**Frequency:** Every 15 minutes
**Purpose:** Check for bid changes and send notifications
**Process:**
1. Get all active lots
2. Check for bid changes (via external scraper updates)
3. Send notifications for bid increases
**Code Location:** `WorkflowOrchestrator.java:210`
**Note:** The external scraper updates bids; this workflow monitors and notifies.
---
### Workflow 4: Closing Alerts
**Frequency:** Every 5 minutes
**Purpose:** Send alerts for lots closing soon
**Process:**
1. Get all active lots
2. Check closing times
3. Send high-priority notification for lots closing in < 5 min
4. Mark as notified to prevent duplicates
**Code Location:** `WorkflowOrchestrator.java:240`
---
## Windows Scheduling
### Option A: Use Built-in Workflow Mode (Recommended)
**Run as a Windows Service or startup application:**
1. Create shortcut to `run-workflow.bat`
2. Place in: `C:\Users\[YourUser]\AppData\Roaming\Microsoft\Windows\Start Menu\Programs\Startup`
3. Monitor will start automatically on login
---
### Option B: Windows Task Scheduler (Once Mode)
**Automated setup:**
```powershell
# Run PowerShell as Administrator
.\setup-windows-task.ps1
```
This creates two tasks:
- `TroostwijkMonitor-Workflow`: Runs every 30 minutes
- `TroostwijkMonitor-StatusCheck`: Runs every 6 hours
**Manual setup:**
1. Open Task Scheduler
2. Create Basic Task
3. Configure:
- **Name:** `TroostwijkMonitor`
- **Trigger:** Every 30 minutes
- **Action:** Start a program
- **Program:** `java`
- **Arguments:** `-jar "C:\path\to\troostwijk-scraper.jar" once`
- **Start in:** `C:\path\to\project`
---
### Option C: Multiple Scheduled Tasks (Fine-grained Control)
Create separate tasks for each workflow:
| Task | Frequency | Command |
|------|-----------|---------|
| Import Data | Every 30 min | `run-once.bat` |
| Process Images | Every 1 hour | `run-once.bat` |
| Check Bids | Every 15 min | `run-once.bat` |
| Closing Alerts | Every 5 min | `run-once.bat` |
---
## Event-Driven Triggers
The orchestrator supports event-driven execution:
### 1. New Auction Discovered
```java
orchestrator.onNewAuctionDiscovered(auctionInfo);
```
**Triggered when:**
- External scraper finds new auction
**Actions:**
- Insert to database
- Send notification
---
### 2. Bid Change Detected
```java
orchestrator.onBidChange(lot, previousBid, newBid);
```
**Triggered when:**
- Bid increases on monitored lot
**Actions:**
- Update database
- Send notification: "Nieuw bod op kavel X: €Y (was €Z)"
---
### 3. Objects Detected
```java
orchestrator.onObjectsDetected(lotId, labels);
```
**Triggered when:**
- YOLO detects 2+ objects in image
**Actions:**
- Send notification: "Lot X contains: car, truck, machinery"
---
## Configuration
### Environment Variables
```bash
# Database location
set DATABASE_FILE=C:\mnt\okcomputer\output\cache.db
# Notification configuration
set NOTIFICATION_CONFIG=desktop
# Or for email notifications
set NOTIFICATION_CONFIG=smtp:your@gmail.com:app_password:recipient@example.com
```
### Configuration Files
**YOLO Model Paths** (`Main.java:35-37`):
```java
String yoloCfg = "models/yolov4.cfg";
String yoloWeights = "models/yolov4.weights";
String yoloClasses = "models/coco.names";
```
### Customizing Schedules
Edit `WorkflowOrchestrator.java` to change frequencies:
```java
// Change from 30 minutes to 15 minutes
scheduler.scheduleAtFixedRate(() -> {
// ... scraper import logic
}, 0, 15, TimeUnit.MINUTES); // Changed from 30
```
---
## Monitoring & Debugging
### Check Status
```bash
# Quick status check
java -jar troostwijk-monitor.jar status
# Or
check-status.bat
```
### View Logs
Workflows print timestamped logs:
```
📥 [WORKFLOW 1] Importing scraper data...
→ Imported 5 auctions
→ Imported 25 lots
→ Found 50 unprocessed images
✓ Scraper import completed in 1250ms
🖼️ [WORKFLOW 2] Processing pending images...
→ Processing 50 images
✓ Processed 50 images, detected objects in 12 (15.3s)
```
### Common Issues
#### 1. No data being imported
**Problem:** External scraper not running
**Solution:**
```bash
# Check if scraper is running and populating database
sqlite3 C:\mnt\okcomputer\output\cache.db "SELECT COUNT(*) FROM auctions;"
```
#### 2. Images not downloading
**Problem:** No internet connection or invalid URLs
**Solution:**
- Check network connectivity
- Verify image URLs in database
- Check firewall settings
#### 3. Notifications not showing
**Problem:** System tray not available
**Solution:**
- Use email notifications instead
- Check notification permissions in Windows
#### 4. Workflows not running
**Problem:** Application crashed or was stopped
**Solution:**
- Check Task Scheduler logs
- Review application logs
- Restart in workflow mode
---
## Integration Examples
### Example 1: Complete Automated Workflow
**Setup:**
1. External scraper runs continuously, populating database
2. This monitor runs in workflow mode
3. Notifications sent to desktop + email
**Result:**
- New auctions → Notification within 30 min
- New images → Processed within 1 hour
- Bid changes → Notification within 15 min
- Closing alerts → Notification within 5 min
---
### Example 2: On-Demand Processing
**Setup:**
1. External scraper runs once per day (cron/Task Scheduler)
2. This monitor runs in once mode after scraper completes
**Script:**
```bash
# run-daily.bat
@echo off
REM Run scraper first
python scraper.py
REM Wait for completion
timeout /t 30
REM Run monitor once
java -jar troostwijk-monitor.jar once
```
---
### Example 3: Event-Driven with External Integration
**Setup:**
1. External system calls orchestrator events
2. Workflows run on-demand
**Java code:**
```java
WorkflowOrchestrator orchestrator = new WorkflowOrchestrator(...);
// When external scraper finds new auction
AuctionInfo newAuction = parseScraperData();
orchestrator.onNewAuctionDiscovered(newAuction);
// When bid detected
orchestrator.onBidChange(lot, 100.0, 150.0);
```
---
## Advanced Topics
### Custom Workflows
Add custom workflows to `WorkflowOrchestrator`:
```java
// Workflow 5: Value Estimation (every 2 hours)
scheduler.scheduleAtFixedRate(() -> {
try {
Console.println("💰 [WORKFLOW 5] Estimating values...");
var lotsWithImages = db.getLotsWithImages();
for (var lot : lotsWithImages) {
var images = db.getImagesForLot(lot.lotId());
double estimatedValue = estimateValue(images);
// Update database
db.updateLotEstimatedValue(lot.lotId(), estimatedValue);
// Notify if high value
if (estimatedValue > 5000) {
notifier.sendNotification(
String.format("High value lot detected: %s (€%.2f)",
lot.lotId(), estimatedValue),
"Value Alert", 1
);
}
}
} catch (Exception e) {
Console.println(" ❌ Value estimation failed: " + e.getMessage());
}
}, 10, 120, TimeUnit.MINUTES);
```
### Webhook Integration
Trigger workflows via HTTP webhooks:
```java
// In a separate web server (e.g., using Javalin)
Javalin app = Javalin.create().start(7070);
app.post("/webhook/new-auction", ctx -> {
AuctionInfo auction = ctx.bodyAsClass(AuctionInfo.class);
orchestrator.onNewAuctionDiscovered(auction);
ctx.result("OK");
});
app.post("/webhook/bid-change", ctx -> {
BidChange change = ctx.bodyAsClass(BidChange.class);
orchestrator.onBidChange(change.lot, change.oldBid, change.newBid);
ctx.result("OK");
});
```
---
## Summary
| Mode | Use Case | Scheduling | Best For |
|------|----------|------------|----------|
| **workflow** | Continuous operation | Built-in (Java) | Production, development |
| **once** | Single execution | External (Task Scheduler) | Cron jobs, on-demand |
| **legacy** | Backward compatibility | Built-in (Java) | Existing deployments |
| **status** | Quick check | Manual/External | Health checks, debugging |
**Recommended Setup for Windows:**
1. Install as Windows Service OR
2. Add to Startup folder (workflow mode) OR
3. Use Task Scheduler (once mode, every 30 min)
**All workflows automatically:**
- Import data from scraper
- Process images
- Detect objects
- Monitor bids
- Send notifications
- Handle errors gracefully
---
## Support
For issues or questions:
- Check `TEST_SUITE_SUMMARY.md` for test coverage
- Review code in `WorkflowOrchestrator.java`
- Run `java -jar troostwijk-monitor.jar status` for diagnostics