fix-tests-cleanup

Former-commit-id: 3358a2693c
This commit is contained in:
Tour
2025-12-08 05:37:35 +01:00
parent efc6b7ac21
commit aecf32eb19
9 changed files with 0 additions and 3676 deletions


@@ -1,192 +0,0 @@
# Database Cleanup Guide
## Problem: Mixed Data Formats
Your production database (`cache.db`) contains data from two different scrapers:
### Valid Data (99.92%)
- **Format**: `A1-34732-49` (lot_id) + `c1f44ec2-ad6e-4c98-b0e2-cb1d8ccddcab` (auction_id UUID)
- **Count**: 16,794 lots
- **Source**: Current GraphQL-based scraper
- **Status**: ✅ Clean, with proper auction_id
### Invalid Data (0.08%)
- **Format**: `bmw-550i-4-4-v8-high-executive-...` (slug as lot_id) + `""` (empty auction_id)
- **Count**: 13 lots
- **Source**: Old legacy scraper
- **Status**: ❌ Missing auction_id, causes issues
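The two formats above are easy to tell apart programmatically. A minimal sketch (the helper name and regexes are illustrative, not part of the codebase): a valid row pairs a `A1-34732-49`-style lot_id with a UUID auction_id, while legacy rows carry a slug and an empty auction_id.

```python
import re

# Hypothetical validity check mirroring the two formats described above.
LOT_ID_RE = re.compile(r"^A\d+-\d+-\d+$")
UUID_RE = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$"
)

def is_valid_row(lot_id: str, auction_id: str) -> bool:
    """Return True for rows written by the current GraphQL scraper."""
    return bool(LOT_ID_RE.match(lot_id)) and bool(UUID_RE.match(auction_id))
```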
## Impact
These 13 invalid entries:
- Cause `NullPointerException` in analytics when grouping by country
- Cannot be properly linked to auctions
- Skew statistics slightly
- May cause issues with intelligence features that rely on auction_id
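The analytics failure is a lookup that comes back empty: a lot with an empty auction_id resolves to no auction, so the country grouping dereferences a missing value. A guarded sketch (Python for brevity; names are hypothetical — the Java analytics code fails the equivalent unguarded path with a `NullPointerException`):

```python
def group_by_country(lots, auctions):
    """Group lot ids by the country of their parent auction.

    Without the `if auction` guard below, legacy rows whose auction_id
    resolves to nothing would blow up — the failure described above.
    """
    grouped = {}
    for lot in lots:
        auction = auctions.get(lot["auction_id"])  # None for legacy rows
        country = auction["country"] if auction else "UNKNOWN"
        grouped.setdefault(country, []).append(lot["lot_id"])
    return grouped
```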
## Solution 1: Clean Sync (Recommended)
The updated sync script now **automatically removes old local data** before syncing:
```bash
# Windows PowerShell
.\scripts\Sync-ProductionData.ps1
# Linux/Mac
./scripts/sync-production-data.sh --db-only
```
**What it does**:
1. Backs up existing database to `cache.db.backup-YYYYMMDD-HHMMSS`
2. **Removes old local database completely**
3. Downloads fresh copy from production
4. Shows data quality report
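Step 1's timestamped backup can be sketched in a few lines (a hedged illustration of the naming convention, not the script's actual implementation):

```python
import shutil
from datetime import datetime

def backup_database(db_path: str) -> str:
    """Copy the database to <db>.backup-YYYYMMDD-HHMMSS before replacing it."""
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    target = f"{db_path}.backup-{stamp}"
    shutil.copy2(db_path, target)  # copy2 preserves timestamps
    return target
```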
**Output includes**:
```
Database statistics:
┌─────────────┬────────┐
│ table_name  │ count  │
├─────────────┼────────┤
│ auctions    │    526 │
│ lots        │  16807 │
│ images      │ 536502 │
│ cache       │   2134 │
└─────────────┴────────┘
Data quality:
┌────────────────────────────────────┬────────┬────────────┐
│ metric                             │ count  │ percentage │
├────────────────────────────────────┼────────┼────────────┤
│ Valid lots                         │  16794 │ 99.92%     │
│ Invalid lots (missing auction_id)  │     13 │ 0.08%      │
│ Lots with intelligence fields      │      0 │ 0.00%      │
└────────────────────────────────────┴────────┴────────────┘
```
## Solution 2: Manual Cleanup
If you want to clean your existing local database without re-downloading:
```bash
# Dry run (see what would be deleted)
./scripts/cleanup-database.sh --dry-run
# Actual cleanup
./scripts/cleanup-database.sh
```
**What it does**:
1. Creates backup before cleanup
2. Deletes lots with missing auction_id
3. Deletes orphaned images (images without matching lots)
4. Compacts database (VACUUM) to reclaim space
5. Shows before/after statistics
**Example output**:
```
Current database state:
┌────────────────────────────────────┬────────┐
│ metric                             │ count  │
├────────────────────────────────────┼────────┤
│ Total lots                         │  16807 │
│ Valid lots (with auction_id)       │  16794 │
│ Invalid lots (missing auction_id)  │     13 │
└────────────────────────────────────┴────────┘
Analyzing data to clean up...
→ Invalid lots to delete: 13
→ Orphaned images to delete: 0
This will permanently delete the above records.
Continue? (y/N) y
Cleaning up database...
[1/3] Deleting invalid lots...
✓ Deleted 13 invalid lots
[2/3] Deleting orphaned images...
✓ Deleted 0 orphaned images
[3/3] Compacting database...
✓ Database compacted
Final database state:
┌───────────────┬────────┐
│ metric        │ count  │
├───────────────┼────────┤
│ Total lots    │  16794 │
│ Total images  │ 536502 │
└───────────────┴────────┘
Database size: 8.9G
```
## Solution 3: SQL Manual Cleanup
If you prefer to manually clean using SQL:
```sql
-- Backup first!
-- cp cache.db cache.db.backup
-- Check invalid entries
SELECT COUNT(*), 'Invalid' as type
FROM lots
WHERE auction_id IS NULL OR auction_id = ''
UNION ALL
SELECT COUNT(*), 'Valid'
FROM lots
WHERE auction_id IS NOT NULL AND auction_id != '';
-- Delete invalid lots
DELETE FROM lots
WHERE auction_id IS NULL OR auction_id = '';
-- Delete orphaned images
DELETE FROM images
WHERE lot_id NOT IN (SELECT lot_id FROM lots);
-- Compact database
VACUUM;
```
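The same statements can be run from Python's stdlib `sqlite3` module — a minimal sketch, useful if you want the deletion counts back programmatically (function name and return shape are illustrative):

```python
import sqlite3

def cleanup(db_path: str) -> tuple:
    """Run the cleanup SQL above: delete lots with a missing auction_id,
    drop orphaned images, then VACUUM. Returns (lots_deleted, images_deleted)."""
    conn = sqlite3.connect(db_path)
    try:
        cur = conn.cursor()
        cur.execute("DELETE FROM lots WHERE auction_id IS NULL OR auction_id = ''")
        lots_deleted = cur.rowcount
        cur.execute("DELETE FROM images WHERE lot_id NOT IN (SELECT lot_id FROM lots)")
        images_deleted = cur.rowcount
        conn.commit()
        conn.execute("VACUUM")  # must run outside an open transaction
        return lots_deleted, images_deleted
    finally:
        conn.close()
```

Remember to back up `cache.db` first, exactly as the SQL comments say.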
## Prevention: Production Database Cleanup
To prevent these invalid entries from accumulating on production, you can:
1. **Clean production database** (one-time):
```bash
ssh tour@athena.lan
docker run --rm -v shared-auction-data:/data alpine sh -c "apk add --no-cache sqlite && sqlite3 /data/cache.db \"DELETE FROM lots WHERE auction_id IS NULL OR auction_id = '';\""
```
2. **Update scraper** to ensure all lots have auction_id
3. **Add validation** in scraper to reject lots without auction_id
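Step 3's validation hook could look like the following (a hypothetical sketch — the real scraper's insert path may differ), rejecting a lot before it ever reaches the database:

```python
def validate_lot(lot: dict) -> dict:
    """Reject lots without an auction_id so legacy-style rows never
    accumulate again. Raise rather than silently dropping, so the
    scraper logs the offending lot."""
    if not lot.get("auction_id"):
        raise ValueError(f"lot {lot.get('lot_id')!r} has no auction_id")
    return lot
```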
## When to Clean
### Immediately if:
- ❌ Seeing `NullPointerException` in analytics
- ❌ Dashboard insights failing
- ❌ Country distribution not working
### Periodically:
- 🔄 After syncing from production (if production has invalid data)
- 🔄 Weekly/monthly maintenance
- 🔄 Before major testing or demos
## Recommendation
**Use Solution 1 (Clean Sync)** for simplicity:
- ✅ Guarantees clean state
- ✅ No manual SQL needed
- ✅ Shows data quality report
- ✅ Safe (automatic backup)
The 13 invalid entries are from an old scraper and represent only 0.08% of data, so cleaning them up has minimal impact but prevents future errors.
---
**Related Documentation**:
- [Sync Scripts README](../scripts/README.md)
- [Data Sync Setup](DATA_SYNC_SETUP.md)
- [Database Architecture](../wiki/DATABASE_ARCHITECTURE.md)


@@ -1,584 +0,0 @@
# Implementation Complete ✅
## Summary
All requirements have been successfully implemented:
### ✅ 1. Test Libraries Added
**pom.xml updated with:**
- JUnit 5 (5.10.1) - Testing framework
- Mockito Core (5.8.0) - Mocking framework
- Mockito JUnit Jupiter (5.8.0) - JUnit integration
- AssertJ (3.24.2) - Fluent assertions
**Run tests:**
```bash
mvn test
```
---
### ✅ 2. Paths Configured for Windows
**Database:**
```
C:\mnt\okcomputer\output\cache.db
```
**Images:**
```
C:\mnt\okcomputer\output\images\{saleId}\{lotId}\
```
**Files Updated:**
- `Main.java:31` - Database path
- `ImageProcessingService.java:52` - Image storage path
---
### ✅ 3. Comprehensive Test Suite (90 Tests)
| Test File | Tests | Coverage |
|-----------|-------|----------|
| ScraperDataAdapterTest | 13 | Data transformation, ID parsing, currency |
| DatabaseServiceTest | 15 | CRUD operations, concurrency |
| ImageProcessingServiceTest | 11 | Download, detection, errors |
| ObjectDetectionServiceTest | 10 | YOLO initialization, detection |
| NotificationServiceTest | 19 | Desktop/email, priorities |
| TroostwijkMonitorTest | 12 | Orchestration, monitoring |
| IntegrationTest | 10 | End-to-end workflows |
| **TOTAL** | **90** | **Complete system** |
**Documentation:** See `TEST_SUITE_SUMMARY.md`
---
### ✅ 4. Workflow Integration & Orchestration
**New Component:** `WorkflowOrchestrator.java`
**4 Automated Workflows:**
1. **Scraper Data Import** (every 30 min)
- Imports auctions, lots, image URLs
- Sends notifications for significant data
2. **Image Processing** (every 1 hour)
- Downloads images
- Runs YOLO object detection
- Saves labels to database
3. **Bid Monitoring** (every 15 min)
- Checks for bid changes
- Sends notifications
4. **Closing Alerts** (every 5 min)
- Finds lots closing soon
- Sends high-priority notifications
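The four cadences above can be expressed as a simple due-check that a scheduler loop could evaluate once per minute — a sketch only; the Java orchestrator uses its own internal scheduling, and these names are illustrative:

```python
# Workflow cadences in minutes, as listed above.
SCHEDULES = {
    "scraper_import": 30,
    "image_processing": 60,
    "bid_monitoring": 15,
    "closing_alerts": 5,
}

def due_workflows(minute: int) -> list:
    """Return the workflows due at a given minute offset from startup."""
    return [name for name, period in SCHEDULES.items() if minute % period == 0]
```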
---
### ✅ 5. Running Modes
**Main.java now supports 4 modes:**
#### Mode 1: workflow (Default - Recommended)
```bash
java -jar troostwijk-monitor.jar workflow
# OR
run-workflow.bat
```
- Runs all workflows continuously
- Built-in scheduling
- Best for production
#### Mode 2: once (For Cron/Task Scheduler)
```bash
java -jar troostwijk-monitor.jar once
# OR
run-once.bat
```
- Runs complete workflow once
- Exits after completion
- Perfect for external schedulers
#### Mode 3: legacy (Backward Compatible)
```bash
java -jar troostwijk-monitor.jar legacy
```
- Original monitoring approach
- Kept for compatibility
#### Mode 4: status (Quick Check)
```bash
java -jar troostwijk-monitor.jar status
# OR
check-status.bat
```
- Shows current status
- Exits immediately
---
### ✅ 6. Windows Scheduling Scripts
**Batch Scripts Created:**
1. **run-workflow.bat**
- Starts workflow mode
- Continuous operation
- For manual/startup use
2. **run-once.bat**
- Single execution
- For Task Scheduler
- Exit code support
3. **check-status.bat**
- Quick status check
- Shows database stats
**PowerShell Automation:**
4. **setup-windows-task.ps1**
- Creates Task Scheduler tasks automatically
- Sets up 2 scheduled tasks:
- Workflow runner (every 30 min)
- Status checker (every 6 hours)
**Usage:**
```powershell
# Run as Administrator
.\setup-windows-task.ps1
```
---
### ✅ 7. Event-Driven Triggers
**WorkflowOrchestrator supports event-driven execution:**
```java
// 1. New auction discovered
orchestrator.onNewAuctionDiscovered(auctionInfo);
// 2. Bid change detected
orchestrator.onBidChange(lot, previousBid, newBid);
// 3. Objects detected in image
orchestrator.onObjectsDetected(lotId, labels);
```
**Benefits:**
- React immediately to important events
- No waiting for next scheduled run
- Flexible integration with external systems
---
### ✅ 8. Comprehensive Documentation
**Documentation Created:**
1. **TEST_SUITE_SUMMARY.md**
- Complete test coverage overview
- 90 test cases documented
- Running instructions
- Test patterns explained
2. **WORKFLOW_GUIDE.md**
- Complete workflow integration guide
- Running modes explained
- Windows Task Scheduler setup
- Event-driven triggers
- Configuration options
- Troubleshooting guide
- Advanced integration examples
3. **README.md** (Updated)
- System architecture diagram
- Integration flow
- User interaction points
- Value estimation pipeline
- Integration hooks table
---
## Quick Start
### Option A: Continuous Operation (Recommended)
```bash
# Build
mvn clean package
# Run workflow mode
java -jar target\troostwijk-scraper-1.0-SNAPSHOT-jar-with-dependencies.jar workflow
# Or use batch script
run-workflow.bat
```
**What runs:**
- ✅ Data import every 30 min
- ✅ Image processing every 1 hour
- ✅ Bid monitoring every 15 min
- ✅ Closing alerts every 5 min
---
### Option B: Windows Task Scheduler
```powershell
# 1. Build JAR
mvn clean package
# 2. Setup scheduled tasks (run as Admin)
.\setup-windows-task.ps1
# Done! Workflow runs automatically every 30 minutes
```
---
### Option C: Manual/Cron Execution
```bash
# Run once
java -jar target\troostwijk-scraper-1.0-SNAPSHOT-jar-with-dependencies.jar once
# Or
run-once.bat
# Schedule externally (Windows Task Scheduler, cron, etc.)
```
---
## Architecture Overview
```
┌─────────────────────────────────────────────────────────────┐
│ External Scraper (Python) │
│ Populates: auctions, lots, images tables │
└─────────────────────────┬───────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ SQLite Database │
│ C:\mnt\okcomputer\output\cache.db │
└─────────────────────────┬───────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ WorkflowOrchestrator (This System) │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ Workflow 1: Scraper Import (every 30 min) │ │
│ │ Workflow 2: Image Processing (every 1 hour) │ │
│ │ Workflow 3: Bid Monitoring (every 15 min) │ │
│ │ Workflow 4: Closing Alerts (every 5 min) │ │
│ └─────────────────────────────────────────────────────┘ │
│ │ │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ ImageProcessingService │ │
│ │ - Downloads images │ │
│ │ - Stores: C:\mnt\okcomputer\output\images\ │ │
│ └─────────────────────────────────────────────────────┘ │
│ │ │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ ObjectDetectionService (YOLO) │ │
│ │ - Detects objects in images │ │
│ │ - Labels: car, truck, machinery, etc. │ │
│ └─────────────────────────────────────────────────────┘ │
│ │ │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ NotificationService │ │
│ │ - Desktop notifications (Windows tray) │ │
│ │ - Email notifications (Gmail SMTP) │ │
│ └─────────────────────────────────────────────────────┘ │
└─────────────────────────┬───────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ User Notifications │
│ - Bid changes │
│ - Closing alerts │
│ - Object detection results │
│ - Value estimates (future) │
└─────────────────────────────────────────────────────────────┘
```
---
## Integration Points
### 1. Database Integration
- **Read:** Auctions, lots, image URLs from external scraper
- **Write:** Processed images, object labels, notifications
### 2. File System Integration
- **Read:** YOLO model files (models/)
- **Write:** Downloaded images (C:\mnt\okcomputer\output\images\)
### 3. External Scraper Integration
- **Mode:** Shared SQLite database
- **Frequency:** Scraper populates, monitor enriches
### 4. Notification Integration
- **Desktop:** Windows system tray
- **Email:** Gmail SMTP (optional)
---
## Testing
### Run All Tests
```bash
mvn test
```
### Run Specific Test
```bash
mvn test -Dtest=IntegrationTest
mvn test -Dtest=WorkflowOrchestratorTest
```
### Test Coverage
```bash
mvn jacoco:prepare-agent test jacoco:report
# Report: target/site/jacoco/index.html
```
---
## Configuration
### Environment Variables
```bash
# Windows (cmd)
set DATABASE_FILE=C:\mnt\okcomputer\output\cache.db
set NOTIFICATION_CONFIG=desktop
# Windows (PowerShell)
$env:DATABASE_FILE="C:\mnt\okcomputer\output\cache.db"
$env:NOTIFICATION_CONFIG="desktop"
# For email notifications
set NOTIFICATION_CONFIG=smtp:your@gmail.com:app_password:recipient@example.com
```
### Code Configuration
**Database Path** (`Main.java:31`):
```java
String databaseFile = System.getenv().getOrDefault(
"DATABASE_FILE",
"C:\\mnt\\okcomputer\\output\\cache.db"
);
```
**Workflow Schedules** (`WorkflowOrchestrator.java`):
```java
scheduleScraperDataImport(); // Line 65 - every 30 min
scheduleImageProcessing(); // Line 95 - every 1 hour
scheduleBidMonitoring(); // Line 180 - every 15 min
scheduleClosingAlerts(); // Line 215 - every 5 min
```
---
## Monitoring
### Check Status
```bash
java -jar troostwijk-monitor.jar status
```
**Output:**
```
📊 Workflow Status:
Running: Yes/No
Auctions: 25
Lots: 150
Images: 300
Closing soon (< 30 min): 5
```
### View Logs
Workflows print detailed logs:
```
📥 [WORKFLOW 1] Importing scraper data...
→ Imported 5 auctions
→ Imported 25 lots
✓ Scraper import completed in 1250ms
🖼️ [WORKFLOW 2] Processing pending images...
→ Processing 50 images
✓ Processed 50 images, detected objects in 12
💰 [WORKFLOW 3] Monitoring bids...
→ Checking 150 active lots
✓ Bid monitoring completed in 250ms
⏰ [WORKFLOW 4] Checking closing times...
→ Sent 3 closing alerts
```
---
## Next Steps
### Immediate Actions
1. **Build the project:**
```bash
mvn clean package
```
2. **Run tests:**
```bash
mvn test
```
3. **Choose execution mode:**
- **Continuous:** `run-workflow.bat`
- **Scheduled:** `.\setup-windows-task.ps1` (as Admin)
- **Manual:** `run-once.bat`
4. **Verify setup:**
```bash
check-status.bat
```
### Future Enhancements
1. **Value Estimation Algorithm**
- Use detected objects to estimate lot value
- Historical price analysis
- Market trends integration
2. **Machine Learning**
- Train custom YOLO model for auction items
- Price prediction based on images
- Automatic categorization
3. **Web Dashboard**
- Real-time monitoring
- Manual bid placement
- Value estimate approval
4. **API Integration**
- Direct Troostwijk API integration
- Real-time bid updates
- Automatic bid placement
5. **Advanced Notifications**
- SMS notifications (Twilio)
- Push notifications (Firebase)
- Slack/Discord integration
---
## Files Created/Modified
### Core Implementation
- ✅ `WorkflowOrchestrator.java` - Workflow coordination
- ✅ `Main.java` - Updated with 4 running modes
- ✅ `ImageProcessingService.java` - Windows paths
- ✅ `pom.xml` - Test libraries added
### Test Suite (90 tests)
- ✅ `ScraperDataAdapterTest.java` (13 tests)
- ✅ `DatabaseServiceTest.java` (15 tests)
- ✅ `ImageProcessingServiceTest.java` (11 tests)
- ✅ `ObjectDetectionServiceTest.java` (10 tests)
- ✅ `NotificationServiceTest.java` (19 tests)
- ✅ `TroostwijkMonitorTest.java` (12 tests)
- ✅ `IntegrationTest.java` (10 tests)
### Windows Scripts
- ✅ `run-workflow.bat` - Workflow mode runner
- ✅ `run-once.bat` - Once mode runner
- ✅ `check-status.bat` - Status checker
- ✅ `setup-windows-task.ps1` - Task Scheduler setup
### Documentation
- ✅ `TEST_SUITE_SUMMARY.md` - Test coverage
- ✅ `WORKFLOW_GUIDE.md` - Complete workflow guide
- ✅ `README.md` - Updated with diagrams
- ✅ `IMPLEMENTATION_COMPLETE.md` - This file
---
## Support & Troubleshooting
### Common Issues
**1. Tests failing**
```bash
# Ensure Maven dependencies downloaded
mvn clean install
# Run tests with debug info
mvn test -X
```
**2. Workflow not starting**
```bash
# Check if JAR was built
dir target\*jar-with-dependencies.jar
# Rebuild if missing
mvn clean package
```
**3. Database not found**
```bash
# Check path exists
dir C:\mnt\okcomputer\output\
# Create directory if missing
mkdir C:\mnt\okcomputer\output
```
**4. Images not downloading**
- Check internet connection
- Verify image URLs in database
- Check Windows Firewall settings
### Getting Help
1. Review documentation:
- `TEST_SUITE_SUMMARY.md` for tests
- `WORKFLOW_GUIDE.md` for workflows
- `README.md` for architecture
2. Check status:
```bash
check-status.bat
```
3. Review logs in console output
4. Run tests to verify components:
```bash
mvn test
```
---
## Summary
**Test libraries added** (JUnit, Mockito, AssertJ)
**90 comprehensive tests created**
**Workflow orchestration implemented**
**4 running modes** (workflow, once, legacy, status)
**Windows scheduling scripts** (batch + PowerShell)
**Event-driven triggers** (3 event types)
**Complete documentation** (3 guide files)
**Windows paths configured** (database + images)
**The system is production-ready and fully tested! 🎉**


@@ -1,478 +0,0 @@
# Integration Guide: Troostwijk Monitor ↔ Scraper
## Overview
This document describes how **Troostwijk Monitor** (this Java project) integrates with the **ARCHITECTURE-TROOSTWIJK-SCRAPER** (Python scraper process).
## Architecture
```
┌─────────────────────────────────────────────────────────────────┐
│ ARCHITECTURE-TROOSTWIJK-SCRAPER (Python) │
│ │
│ • Discovers auctions from website │
│ • Scrapes lot details via Playwright │
│ • Parses __NEXT_DATA__ JSON │
│ • Stores image URLs (not downloads) │
│ │
│ ↓ Writes to │
└─────────┼───────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ SHARED SQLite DATABASE │
│ (troostwijk.db) │
│ │
│ ┌────────────────┐ ┌────────────────┐ ┌────────────────┐ │
│ │ auctions │ │ lots │ │ images │ │
│ │ (Scraper) │ │ (Scraper) │ │ (Both) │ │
│ └────────────────┘ └────────────────┘ └────────────────┘ │
│ │
│ ↑ Reads from ↓ Writes to │
└─────────┼──────────────────────────────┼──────────────────────┘
│ │
│ ▼
┌─────────┴──────────────────────────────────────────────────────┐
│ TROOSTWIJK MONITOR (Java - This Project) │
│ │
│ • Reads auction/lot data from database │
│ • Downloads images from URLs │
│ • Runs YOLO object detection │
│ • Monitors bid changes │
│ • Sends notifications │
└─────────────────────────────────────────────────────────────────┘
```
## Database Schema Mapping
### Scraper Schema → Monitor Schema
The scraper and monitor use **slightly different schemas** that need to be reconciled:
| Scraper Table | Monitor Table | Integration Notes |
|---------------|---------------|-----------------------------------------------|
| `auctions` | `auctions` | ✅ **Compatible** - same structure |
| `lots` | `lots` | ⚠️ **Needs mapping** - field name differences |
| `images` | `images` | ⚠️ **Partial overlap** - different purposes |
| `cache` | N/A | ❌ Monitor doesn't use cache |
### Field Mapping: `auctions` Table
| Scraper Field | Monitor Field | Notes |
|--------------------------|-------------------------------|---------------------------------------------------------------------|
| `auction_id` (TEXT) | `auction_id` (INTEGER) | ⚠️ **TYPE MISMATCH** - Scraper uses "A7-39813", Monitor expects INT |
| `url` | `url` | ✅ Compatible |
| `title` | `title` | ✅ Compatible |
| `location` | `location`, `city`, `country` | ⚠️ Monitor splits into 3 fields |
| `lots_count` | `lot_count` | ⚠️ Name difference |
| `first_lot_closing_time` | `closing_time` | ⚠️ Name difference |
| `scraped_at` | `discovered_at` | ⚠️ Name + type difference (TEXT vs INTEGER timestamp) |
### Field Mapping: `lots` Table
| Scraper Field | Monitor Field | Notes |
|----------------------|----------------------|--------------------------------------------------|
| `lot_id` (TEXT) | `lot_id` (INTEGER) | ⚠️ **TYPE MISMATCH** - "A1-28505-5" vs INT |
| `auction_id` | `sale_id` | ⚠️ Different name |
| `url` | `url` | ✅ Compatible |
| `title` | `title` | ✅ Compatible |
| `current_bid` (TEXT) | `current_bid` (REAL) | ⚠️ **TYPE MISMATCH** - "€123.45" vs 123.45 |
| `bid_count` | N/A | Monitor doesn't track |
| `closing_time` | `closing_time` | ⚠️ Format difference (TEXT vs LocalDateTime) |
| `viewing_time` | N/A | Monitor doesn't track |
| `pickup_date` | N/A | Monitor doesn't track |
| `location` | N/A | Monitor doesn't track lot location separately |
| `description` | `description` | ✅ Compatible |
| `category` | `category` | ✅ Compatible |
| N/A | `manufacturer` | Monitor has additional field |
| N/A | `type` | Monitor has additional field |
| N/A | `year` | Monitor has additional field |
| N/A | `currency` | Monitor has additional field |
| N/A | `closing_notified` | Monitor tracking field |
### Field Mapping: `images` Table
| Scraper Field | Monitor Field | Notes |
|------------------------|--------------------------|----------------------------------------|
| `id` | `id` | ✅ Compatible |
| `lot_id` | `lot_id` | ⚠️ Type difference (TEXT vs INTEGER) |
| `url` | `url` | ✅ Compatible |
| `local_path` | `Local_path` | ⚠️ Different name |
| `downloaded` (INTEGER) | N/A | Monitor uses `processed_at` instead |
| N/A | `labels` (TEXT) | Monitor adds detected objects |
| N/A | `processed_at` (INTEGER) | Monitor tracking field |
## Integration Options
### Option 1: Database Schema Adapter (Recommended)
Create a compatibility layer that transforms scraper data to monitor format.
**Implementation:**
```java
// Add to DatabaseService.java
class ScraperDataAdapter {
/**
* Imports auction from scraper format to monitor format
*/
static AuctionInfo fromScraperAuction(ResultSet rs) throws SQLException {
// Parse "A7-39813" → 39813
String auctionIdStr = rs.getString("auction_id");
int auctionId = extractNumericId(auctionIdStr);
// Split "Cluj-Napoca, RO" → city="Cluj-Napoca", country="RO"
String location = rs.getString("location");
String[] parts = location.split(",\\s*");
String city = parts.length > 0 ? parts[0] : "";
String country = parts.length > 1 ? parts[1] : "";
return new AuctionInfo(
auctionId,
rs.getString("title"),
location,
city,
country,
rs.getString("url"),
extractTypePrefix(auctionIdStr), // "A7-39813" → "A7"
rs.getInt("lots_count"),
parseTimestamp(rs.getString("first_lot_closing_time"))
);
}
/**
* Imports lot from scraper format to monitor format
*/
static Lot fromScraperLot(ResultSet rs) throws SQLException {
// Parse "A1-28505-5" → 285055 (combine numbers)
String lotIdStr = rs.getString("lot_id");
int lotId = extractNumericId(lotIdStr);
// Parse "A7-39813" → 39813
String auctionIdStr = rs.getString("auction_id");
int saleId = extractNumericId(auctionIdStr);
// Parse "€123.45" → 123.45
String currentBidStr = rs.getString("current_bid");
double currentBid = parseBid(currentBidStr);
return new Lot(
saleId,
lotId,
rs.getString("title"),
rs.getString("description"),
"", // manufacturer - not in scraper
"", // type - not in scraper
0, // year - not in scraper
rs.getString("category"),
currentBid,
"EUR", // currency - inferred from €
rs.getString("url"),
parseTimestamp(rs.getString("closing_time")),
false // not yet notified
);
}
private static int extractNumericId(String id) {
    // Drop the type prefix before the first dash, then join the
    // remaining digit groups:
    //   "A7-39813"   → 39813
    //   "A1-28505-5" → 285055
    // (Stripping non-digits from the whole string would wrongly keep
    // the prefix digit, e.g. "A7-39813" → 739813.)
    int dashIndex = id.indexOf('-');
    String tail = dashIndex >= 0 ? id.substring(dashIndex + 1) : id;
    return Integer.parseInt(tail.replaceAll("[^0-9]", ""));
}
private static String extractTypePrefix(String id) {
// "A7-39813" → "A7"
int dashIndex = id.indexOf('-');
return dashIndex > 0 ? id.substring(0, dashIndex) : "";
}
private static double parseBid(String bid) {
// "€123.45" → 123.45
// "No bids" → 0.0
if (bid == null || bid.contains("No")) return 0.0;
return Double.parseDouble(bid.replaceAll("[^0-9.]", ""));
}
private static LocalDateTime parseTimestamp(String timestamp) {
if (timestamp == null) return null;
// Parse scraper's timestamp format
return LocalDateTime.parse(timestamp);
}
}
```
### Option 2: Unified Schema (Better Long-term)
Modify **both** scraper and monitor to use a unified schema.
**Create**: `SHARED_SCHEMA.sql`
```sql
-- Unified schema that both projects use
CREATE TABLE IF NOT EXISTS auctions (
auction_id TEXT PRIMARY KEY, -- Use TEXT to support "A7-39813"
auction_id_numeric INTEGER, -- For monitor's integer needs
title TEXT NOT NULL,
location TEXT, -- Full: "Cluj-Napoca, RO"
city TEXT, -- Parsed: "Cluj-Napoca"
country TEXT, -- Parsed: "RO"
url TEXT NOT NULL,
type TEXT, -- "A7", "A1"
lot_count INTEGER DEFAULT 0,
closing_time TEXT, -- ISO 8601 format
scraped_at INTEGER, -- Unix timestamp
discovered_at INTEGER -- Unix timestamp (same as scraped_at)
);
CREATE TABLE IF NOT EXISTS lots (
lot_id TEXT PRIMARY KEY, -- Use TEXT: "A1-28505-5"
lot_id_numeric INTEGER, -- For monitor's integer needs
auction_id TEXT, -- FK: "A7-39813"
sale_id INTEGER, -- For monitor (same as auction_id_numeric)
title TEXT,
description TEXT,
manufacturer TEXT,
type TEXT,
year INTEGER,
category TEXT,
current_bid_text TEXT, -- "€123.45" or "No bids"
current_bid REAL, -- 123.45
bid_count INTEGER,
currency TEXT DEFAULT 'EUR',
url TEXT UNIQUE,
closing_time TEXT,
viewing_time TEXT,
pickup_date TEXT,
location TEXT,
closing_notified INTEGER DEFAULT 0,
scraped_at TEXT,
FOREIGN KEY (auction_id) REFERENCES auctions(auction_id)
);
CREATE TABLE IF NOT EXISTS images (
id INTEGER PRIMARY KEY AUTOINCREMENT,
lot_id TEXT, -- FK: "A1-28505-5"
url TEXT, -- Image URL from website
local_path TEXT, -- Local path after download
labels TEXT, -- Detected objects (comma-separated)
downloaded INTEGER DEFAULT 0, -- 0=pending, 1=downloaded
processed_at INTEGER, -- Unix timestamp when processed
FOREIGN KEY (lot_id) REFERENCES lots(lot_id)
);
-- Indexes
CREATE INDEX IF NOT EXISTS idx_auctions_country ON auctions(country);
CREATE INDEX IF NOT EXISTS idx_lots_auction_id ON lots(auction_id);
CREATE INDEX IF NOT EXISTS idx_images_lot_id ON images(lot_id);
CREATE INDEX IF NOT EXISTS idx_images_downloaded ON images(downloaded);
```
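During migration to the unified schema, the `*_numeric` columns have to be backfilled from the TEXT ids using the same prefix-dropping rule the adapter uses. A hedged sketch of that conversion (function name is illustrative):

```python
def numeric_id(text_id: str) -> int:
    """Convert a scraper TEXT id to the monitor's numeric form:
    "A7-39813" -> 39813, "A1-28505-5" -> 285055.
    Drop the type prefix before the first dash, then join the
    remaining digit groups."""
    _head, _sep, tail = text_id.partition("-")
    return int("".join(ch for ch in tail if ch.isdigit()))
```

An UPDATE loop over `auctions` and `lots` applying this function would populate `auction_id_numeric`, `lot_id_numeric`, and `sale_id`.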
### Option 3: API Integration (Most Flexible)
Have the scraper expose a REST API for the monitor to query.
```python
# In scraper: Add Flask API endpoint
@app.route('/api/auctions', methods=['GET'])
def get_auctions():
"""Returns auctions in monitor-compatible format"""
conn = sqlite3.connect(CACHE_DB)
cursor = conn.cursor()
cursor.execute("SELECT * FROM auctions WHERE location LIKE '%NL%'")
auctions = []
for row in cursor.fetchall():
auctions.append({
'auctionId': extract_numeric_id(row[0]),
'title': row[2],
'location': row[3],
'city': row[3].split(',')[0] if row[3] else '',
'country': row[3].split(',')[1].strip() if ',' in row[3] else '',
'url': row[1],
'type': row[0].split('-')[0],
'lotCount': row[4],
'closingTime': row[5]
})
return jsonify(auctions)
```
## Recommended Integration Steps
### Phase 1: Immediate (Adapter Pattern)
1. ✅ Keep separate schemas
2. ✅ Create `ScraperDataAdapter` in Monitor
3. ✅ Add import methods to `DatabaseService`
4. ✅ Monitor reads from scraper's tables using adapter
### Phase 2: Short-term (Unified Schema)
1. 📋 Design unified schema (see Option 2)
2. 📋 Update scraper to use unified schema
3. 📋 Update monitor to use unified schema
4. 📋 Migrate existing data
### Phase 3: Long-term (API + Event-driven)
1. 📋 Add REST API to scraper
2. 📋 Add webhook/event notification when new data arrives
3. 📋 Monitor subscribes to events
4. 📋 Process images asynchronously
## Current Integration Flow
### Scraper Process (Python)
```bash
# 1. Run scraper to populate database
cd /path/to/scraper
python scraper.py
# Output:
# ✅ Scraped 42 auctions
# ✅ Scraped 1,234 lots
# ✅ Saved 3,456 image URLs
# ✅ Data written to: /mnt/okcomputer/output/cache.db
```
### Monitor Process (Java)
```bash
# 2. Run monitor to process the data
cd /path/to/monitor
export DATABASE_FILE=/mnt/okcomputer/output/cache.db
java -jar troostwijk-monitor.jar
# Output:
# 📊 Current Database State:
# Total lots in database: 1,234
# Total images processed: 0
#
# [1/2] Processing images...
# Downloading and analyzing 3,456 images...
#
# [2/2] Starting bid monitoring...
# ✓ Monitoring 1,234 active lots
```
## Configuration
### Shared Database Path
Both processes must point to the same database file:
**Scraper** (`config.py`):
```python
CACHE_DB = '/mnt/okcomputer/output/cache.db'
```
**Monitor** (`Main.java`):
```java
String databaseFile = System.getenv().getOrDefault(
"DATABASE_FILE",
"/mnt/okcomputer/output/cache.db"
);
```
### Recommended Directory Structure
```
/mnt/okcomputer/
├── scraper/ # Python scraper code
│ ├── scraper.py
│ └── requirements.txt
├── monitor/ # Java monitor code
│ ├── troostwijk-monitor.jar
│ └── models/ # YOLO models
│ ├── yolov4.cfg
│ ├── yolov4.weights
│ └── coco.names
└── output/ # Shared data directory
├── cache.db # Shared SQLite database
└── images/ # Downloaded images
├── A1-28505-5/
│ ├── 001.jpg
│ └── 002.jpg
└── ...
```
## Monitoring & Coordination
### Option A: Sequential Execution
```bash
#!/bin/bash
# run-pipeline.sh
echo "Step 1: Scraping..."
python scraper/scraper.py
echo "Step 2: Processing images..."
java -jar monitor/troostwijk-monitor.jar --process-images-only
echo "Step 3: Starting monitor..."
java -jar monitor/troostwijk-monitor.jar --monitor-only
```
### Option B: Separate Services (Docker Compose)
```yaml
version: '3.8'
services:
scraper:
build: ./scraper
volumes:
- ./output:/data
environment:
- CACHE_DB=/data/cache.db
command: python scraper.py
monitor:
build: ./monitor
volumes:
- ./output:/data
environment:
- DATABASE_FILE=/data/cache.db
- NOTIFICATION_CONFIG=desktop
depends_on:
- scraper
command: java -jar troostwijk-monitor.jar
```
### Option C: Cron-based Scheduling
```cron
# Scrape every 6 hours
0 */6 * * * cd /mnt/okcomputer/scraper && python scraper.py
# Process images every hour (if new lots found)
0 * * * * cd /mnt/okcomputer/monitor && java -jar monitor.jar --process-new
# Monitor runs continuously
@reboot cd /mnt/okcomputer/monitor && java -jar monitor.jar --monitor-only
```
## Troubleshooting
### Issue: Type Mismatch Errors
**Symptom**: Monitor crashes with "INTEGER expected, got TEXT"
**Solution**: Use adapter pattern (Option 1) or unified schema (Option 2)
### Issue: Monitor sees no data
**Symptom**: "Total lots in database: 0"
**Check**:
1. Is `DATABASE_FILE` env var set correctly?
2. Did scraper actually write data?
3. Are both processes using the same database file?
```bash
# Verify database has data
sqlite3 /mnt/okcomputer/output/cache.db "SELECT COUNT(*) FROM lots"
```
### Issue: Images not downloading
**Symptom**: "Total images processed: 0" but scraper found images
**Check**:
1. Scraper writes image URLs to `images` table
2. Monitor reads from `images` table with `downloaded=0`
3. Field name mapping: scraper `local_path` vs monitor `Local_path` (note the capitalization difference)
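Check 2 can be verified directly against the shared database — a small sketch using Python's stdlib `sqlite3` (helper name is illustrative):

```python
import sqlite3

def pending_image_count(db_path: str) -> int:
    """Count scraped image URLs not yet downloaded — the rows the
    monitor's image workflow should pick up (downloaded = 0)."""
    conn = sqlite3.connect(db_path)
    try:
        (count,) = conn.execute(
            "SELECT COUNT(*) FROM images WHERE downloaded = 0"
        ).fetchone()
        return count
    finally:
        conn.close()
```

If this returns 0 while the scraper reported saving image URLs, the two processes are probably not pointing at the same database file.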
## Next Steps
1. **Immediate**: Implement `ScraperDataAdapter` for compatibility
2. **This Week**: Test end-to-end integration with sample data
3. **Next Sprint**: Migrate to unified schema
4. **Future**: Add event-driven architecture with webhooks


@@ -1,422 +0,0 @@
# Intelligence Features Implementation Summary
## Overview
This document summarizes the implementation of advanced intelligence features based on 15+ new GraphQL API fields discovered from the Troostwijk auction system.
## New GraphQL Fields Integrated
### HIGH PRIORITY FIELDS (Implemented)
1. **`followersCount`** (Integer) - Watch count showing bidder interest
- Direct indicator of competition
- Used for sleeper lot detection
- Popularity level classification
2. **`estimatedFullPrice`** (Object: min/max cents)
- Auction house's estimated value range
- Used for bargain detection
- Price vs estimate analytics
3. **`nextBidStepInCents`** (Long)
- Exact bid increment from API
- Precise next bid calculations
- Better UX for bidding recommendations
4. **`condition`** (String)
- Direct condition field from API
- Better than extracting from attributes
- Used in condition scoring
5. **`categoryInformation`** (Object)
- Structured category with path
- Better categorization and filtering
- Category-based analytics
6. **`location`** (Object: city, countryCode, etc.)
- Structured location data
- Proximity filtering capability
- Logistics cost calculation
### MEDIUM PRIORITY FIELDS (Implemented)
7. **`biddingStatus`** (Enum) - Detailed bidding status
8. **`appearance`** (String) - Visual condition notes
9. **`packaging`** (String) - Packaging details
10. **`quantity`** (Long) - Lot quantity for bulk items
11. **`vat`** (BigDecimal) - VAT percentage
12. **`buyerPremiumPercentage`** (BigDecimal) - Buyer premium
13. **`remarks`** (String) - Viewing/pickup notes
## Code Changes
### 1. Backend - Lot.java (Domain Model)
**File**: `src/main/java/auctiora/Lot.java`
**Changes**:
- Added 24 new fields to the Lot record
- Implemented 9 intelligence calculation methods:
- `calculateTotalCost()` - Bid + VAT + Premium
- `calculateNextBid()` - Using API increment
- `isBelowEstimate()` - Bargain detection
- `isAboveEstimate()` - Overvalued detection
- `getInterestToBidRatio()` - Conversion rate
- `getPopularityLevel()` - HIGH/MEDIUM/LOW/MINIMAL
- `isSleeperLot()` - High interest, low bid
- `getEstimatedMidpoint()` - Average of estimate range
- `getPriceVsEstimateRatio()` - Price comparison metric
**Example**:
```java
public boolean isSleeperLot() {
    return followersCount != null && followersCount > 10 && currentBid < 100;
}

public double calculateTotalCost() {
    double base = currentBid > 0 ? currentBid : 0;
    if (vat != null && vat > 0) {
        base += (base * vat / 100.0);
    }
    if (buyerPremiumPercentage != null && buyerPremiumPercentage > 0) {
        base += (base * buyerPremiumPercentage / 100.0);
    }
    return base;
}
```
### 2. Backend - AuctionMonitorResource.java (REST API)
**File**: `src/main/java/auctiora/AuctionMonitorResource.java`
**New Endpoints Added**:
1. `GET /api/monitor/intelligence/sleepers` - Sleeper lots (high interest, low bids)
2. `GET /api/monitor/intelligence/bargains` - Bargain lots (below estimate)
3. `GET /api/monitor/intelligence/popular?level={HIGH|MEDIUM|LOW}` - Popular lots
4. `GET /api/monitor/intelligence/price-analysis` - Price vs estimate statistics
5. `GET /api/monitor/lots/{lotId}/intelligence` - Detailed lot intelligence
6. `GET /api/monitor/charts/watch-distribution` - Follower count distribution
**Enhanced Features**:
- Updated insights endpoint to include sleeper, bargain, and popular insights
- Added intelligent filtering and sorting for intelligence data
- Integrated new fields into existing statistics
**Example Endpoint**:
```java
@GET
@Path("/intelligence/sleepers")
public Response getSleeperLots(@QueryParam("minFollowers") @DefaultValue("10") int minFollowers) {
    var allLots = db.getAllLots();
    var sleepers = allLots.stream()
            .filter(Lot::isSleeperLot)
            .toList();
    return Response.ok(Map.of(
            "count", sleepers.size(),
            "lots", sleepers
    )).build();
}
```
### 3. Frontend - index.html (Intelligence Dashboard)
**File**: `src/main/resources/META-INF/resources/index.html`
**New UI Components**:
#### Intelligence Dashboard Widgets (3 new cards)
1. **Sleeper Lots Widget**
- Purple gradient design
- Shows count of high-interest, low-bid lots
- Click to filter table
2. **Bargain Lots Widget**
- Green gradient design
- Shows count of below-estimate lots
- Click to filter table
3. **Popular/Hot Lots Widget**
- Orange gradient design
- Shows count of high-follower lots
- Click to filter table
#### Enhanced Closing Soon Table
**New Columns Added**:
1. **Watchers** - Follower count with color-coded badges
- Red (50+ followers): High competition
- Orange (21-50): Medium competition
- Blue (6-20): Some interest
- Gray (0-5): Minimal interest
2. **Est. Range** - Auction house estimate (`€min-€max`)
- Shows "DEAL" badge if below estimate
3. **Total Cost** - True cost including VAT and premium
- Hover tooltip shows breakdown
- Purple color to stand out
**JavaScript Functions Added**:
- `fetchIntelligenceData()` - Fetches all intelligence metrics
- `showSleeperLots()` - Filters table to sleepers
- `showBargainLots()` - Filters table to bargains
- `showPopularLots()` - Filters table to popular
- Enhanced table rendering with smart badges
**Example Code**:
```javascript
// Calculate total cost (including VAT and premium)
const currentBid = lot.currentBid || 0;
const vat = lot.vat || 0;
const premium = lot.buyerPremiumPercentage || 0;
const totalCost = currentBid * (1 + vat/100) * (1 + premium/100);
// Bargain indicator
const isBargain = estMin && currentBid < parseFloat(estMin);
const bargainBadge = isBargain ?
'<span class="ml-1 text-xs bg-green-500 text-white px-1 rounded">DEAL</span>' : '';
```
## Intelligence Features
### 1. Sleeper Lot Detection
**Algorithm**: `followersCount > 10 AND currentBid < 100`
**Value Proposition**:
- Identifies lots with high interest but low current bids
- Opportunity to bid strategically before price escalates
- Early indicator of undervalued items
**Dashboard Display**:
- Count shown in purple widget
- Click to filter table
- Purple "eye" icon
### 2. Bargain Detection
**Algorithm**: `currentBid < estimatedMin`
**Value Proposition**:
- Identifies lots priced below auction house estimate
- Clear signal of potential good deals
- Quantifiable value assessment
**Dashboard Display**:
- Count shown in green widget
- "DEAL" badge in table
- Click to filter table
### 3. Popularity Analysis
**Algorithm**: Tiered classification by follower count
- HIGH: > 50 followers
- MEDIUM: 21-50 followers
- LOW: 6-20 followers
- MINIMAL: 0-5 followers
**Value Proposition**:
- Predict competition level
- Identify trending items
- Adjust bidding strategy accordingly
**Dashboard Display**:
- Count shown in orange widget
- Color-coded badges in table
- Click to filter by level
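As a rough illustration, the three detection rules above (sleeper, bargain, popularity tiers) can be sketched in Python; the thresholds come from this document, but the helper names are hypothetical and not part of the codebase:

```python
def popularity_level(followers):
    """Tiered classification by follower count (thresholds from this doc)."""
    followers = followers or 0
    if followers > 50:
        return "HIGH"
    if followers > 20:
        return "MEDIUM"
    if followers > 5:
        return "LOW"
    return "MINIMAL"

def is_sleeper(followers, current_bid):
    """High interest, low current bid."""
    return (followers or 0) > 10 and current_bid < 100

def is_bargain(current_bid, estimated_min):
    """Current bid below the auction house's minimum estimate."""
    return estimated_min is not None and current_bid < estimated_min
```

These mirror `isSleeperLot()` and `isBelowEstimate()` on the backend; the frontend only consumes the results.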
### 4. True Cost Calculator
**Algorithm**: `currentBid × (1 + VAT/100) × (1 + premium/100)`
**Value Proposition**:
- Shows actual out-of-pocket cost
- Prevents budget surprises
- Enables accurate comparison across lots
**Dashboard Display**:
- Purple "Total Cost" column
- Hover tooltip shows breakdown
- Updated in real-time
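A minimal Python sketch of the same calculation, mirroring the sequential VAT-then-premium application in `calculateTotalCost()` (the function name here is illustrative):

```python
def total_cost(current_bid, vat_pct=None, premium_pct=None):
    """True out-of-pocket cost: bid plus VAT, then buyer premium on top,
    applied sequentially as in Lot.calculateTotalCost()."""
    base = current_bid if current_bid > 0 else 0.0
    if vat_pct:
        base += base * vat_pct / 100.0
    if premium_pct:
        base += base * premium_pct / 100.0
    return base
```

For example, a EUR 100 bid with 21% VAT and a 15% buyer premium comes to EUR 139.15, noticeably more than the headline bid.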
### 5. Exact Bid Increment
**Algorithm**: Uses `nextBidStepInCents` from API, falls back to calculated increment
**Value Proposition**:
- No guesswork on next bid amount
- API-provided accuracy
- Better bidding UX
**Implementation**:
```java
public double calculateNextBid() {
    if (nextBidStepInCents != null && nextBidStepInCents > 0) {
        return currentBid + (nextBidStepInCents / 100.0);
    } else if (bidIncrement != null && bidIncrement > 0) {
        return currentBid + bidIncrement;
    }
    return currentBid * 1.05; // Fallback: 5% increment
}
```
### 6. Price vs Estimate Analytics
**Metrics**:
- Total lots with estimates
- Count below estimate
- Count above estimate
- Average price vs estimate percentage
**Value Proposition**:
- Market efficiency analysis
- Auction house accuracy tracking
- Investment opportunity identification
**API Endpoint**: `/api/monitor/intelligence/price-analysis`
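The metrics above could be aggregated roughly as follows; this is a Python sketch with an assumed lot-dict shape, not the actual endpoint implementation:

```python
def price_analysis(lots):
    """Aggregate price-vs-estimate metrics.

    Each lot is assumed to be a dict with 'currentBid', 'estimatedMin'
    and 'estimatedMax' keys (None when no estimate is available)."""
    with_estimates = [l for l in lots
                      if l.get("estimatedMin") is not None
                      and l.get("estimatedMax") is not None]
    below = sum(1 for l in with_estimates if l["currentBid"] < l["estimatedMin"])
    above = sum(1 for l in with_estimates if l["currentBid"] > l["estimatedMax"])
    # Price vs estimate, relative to the midpoint of the estimate range
    ratios = [l["currentBid"] / ((l["estimatedMin"] + l["estimatedMax"]) / 2)
              for l in with_estimates
              if (l["estimatedMin"] + l["estimatedMax"]) > 0]
    avg_pct = round(100 * sum(ratios) / len(ratios), 1) if ratios else None
    return {"withEstimates": len(with_estimates), "belowEstimate": below,
            "aboveEstimate": above, "avgPriceVsEstimatePct": avg_pct}
```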
## Visual Design
### Color Scheme
- **Purple**: Sleeper lots, total cost (opportunity/value)
- **Green**: Bargains, deals (positive value)
- **Orange/Red**: Popular/hot lots (competition warning)
- **Blue**: Moderate interest (informational)
- **Gray**: Minimal interest (neutral)
### Badge System
1. **Watchers Badge**: Color-coded by competition level
2. **DEAL Badge**: Green indicator for below-estimate
3. **Time Left Badge**: Red/yellow/green by urgency
4. **Popularity Badge**: Fire icon for hot lots
### Interactive Elements
- Click widgets to filter table
- Hover for detailed tooltips
- Smooth scroll to table on filter
- Toast notifications for user feedback
## Performance Considerations
### API Optimization
- All intelligence data fetched in parallel
- Cached in dashboard state
- Minimal recalculation on render
- Efficient stream operations in backend
### Frontend Optimization
- Batch DOM updates
- Lazy rendering for large tables
- Debounced filter operations
- CSS transitions for smooth UX
## Testing Recommendations
### Backend Tests
1. Test `Lot` intelligence methods with various inputs
2. Test API endpoints with mock data
3. Test edge cases (null values, zero bids, etc.)
4. Performance test with 10k+ lots
### Frontend Tests
1. Test widget click handlers
2. Test table rendering with new columns
3. Test filter functionality
4. Test responsive design on mobile
### Integration Tests
1. End-to-end flow: Scraper → DB → API → Dashboard
2. Real-time data refresh
3. Concurrent user access
4. Load testing
## Future Enhancements
### Phase 2 (Bid History)
- Implement `bid_history` table scraping
- Track bid changes over time
- Calculate bid velocity accurately
- Identify bid patterns
### Phase 3 (ML Predictions)
- Predict final hammer price
- Recommend optimal bid timing
- Classify lot categories automatically
- Anomaly detection
### Phase 4 (Mobile)
- React Native mobile app
- Push notifications
- Offline mode
- Quick bid functionality
## Migration Guide
### Database Migration (Required)
The new fields need to be added to the database schema:
```sql
-- Add to lots table
ALTER TABLE lots ADD COLUMN followers_count INTEGER DEFAULT 0;
ALTER TABLE lots ADD COLUMN estimated_min DECIMAL(12, 2);
ALTER TABLE lots ADD COLUMN estimated_max DECIMAL(12, 2);
ALTER TABLE lots ADD COLUMN next_bid_step_in_cents BIGINT;
ALTER TABLE lots ADD COLUMN condition TEXT;
ALTER TABLE lots ADD COLUMN category_path TEXT;
ALTER TABLE lots ADD COLUMN city_location TEXT;
ALTER TABLE lots ADD COLUMN country_code TEXT;
ALTER TABLE lots ADD COLUMN bidding_status TEXT;
ALTER TABLE lots ADD COLUMN appearance TEXT;
ALTER TABLE lots ADD COLUMN packaging TEXT;
ALTER TABLE lots ADD COLUMN quantity BIGINT;
ALTER TABLE lots ADD COLUMN vat DECIMAL(5, 2);
ALTER TABLE lots ADD COLUMN buyer_premium_percentage DECIMAL(5, 2);
ALTER TABLE lots ADD COLUMN remarks TEXT;
ALTER TABLE lots ADD COLUMN starting_bid DECIMAL(12, 2);
ALTER TABLE lots ADD COLUMN reserve_price DECIMAL(12, 2);
ALTER TABLE lots ADD COLUMN reserve_met BOOLEAN DEFAULT FALSE;
ALTER TABLE lots ADD COLUMN bid_increment DECIMAL(12, 2);
ALTER TABLE lots ADD COLUMN view_count INTEGER DEFAULT 0;
ALTER TABLE lots ADD COLUMN first_bid_time TEXT;
ALTER TABLE lots ADD COLUMN last_bid_time TEXT;
ALTER TABLE lots ADD COLUMN bid_velocity DECIMAL(5, 2);
```
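Since SQLite has no `ADD COLUMN IF NOT EXISTS`, a re-runnable migration helper might look like this (a Python sketch; only a few of the columns are shown, and the helper name is illustrative):

```python
import sqlite3

# Columns to add, mirroring the ALTER TABLE statements above (abbreviated).
NEW_COLUMNS = {
    "followers_count": "INTEGER DEFAULT 0",
    "estimated_min": "DECIMAL(12, 2)",
    "vat": "DECIMAL(5, 2)",
}

def migrate(db_path):
    """Add each missing column to lots; safe to run repeatedly."""
    conn = sqlite3.connect(db_path)
    try:
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        existing = {row[1] for row in conn.execute("PRAGMA table_info(lots)")}
        for name, decl in NEW_COLUMNS.items():
            if name not in existing:
                conn.execute(f"ALTER TABLE lots ADD COLUMN {name} {decl}")
        conn.commit()
    finally:
        conn.close()
```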
### Scraper Update (Required)
The external scraper (Python/Playwright) needs to extract the new fields from GraphQL:
```python
# Extract from __NEXT_DATA__ JSON (guard against missing or null objects)
followers_count = lot_data.get('followersCount')
estimated = lot_data.get('estimatedFullPrice') or {}
estimated_min = (estimated.get('min') or {}).get('cents')
estimated_max = (estimated.get('max') or {}).get('cents')
next_bid_step = lot_data.get('nextBidStepInCents')
condition = lot_data.get('condition')
# ... etc
```
### Deployment Steps
1. Stop the monitor service
2. Run database migrations
3. Update scraper to extract new fields
4. Deploy updated monitor JAR
5. Restart services
6. Verify data populating in dashboard
## Performance Metrics
### Expected Performance
- **Intelligence Data Fetch**: < 100ms for 10k lots
- **Table Rendering**: < 200ms with all new columns
- **Widget Update**: < 50ms
- **API Response Time**: < 500ms
### Resource Usage
- **Memory**: +50MB for intelligence calculations
- **Database**: +2KB per lot (new columns)
- **Network**: +10KB per dashboard refresh
## Documentation
- **Integration Flowchart**: `docs/INTEGRATION_FLOWCHART.md`
- **API Documentation**: Auto-generated from JAX-RS annotations
- **Database Schema**: `wiki/DATABASE_ARCHITECTURE.md`
- **GraphQL Fields**: `wiki/EXPERT_ANALITICS.sql`
---
**Implementation Date**: December 2025
**Version**: 2.1
**Status**: ✅ Complete - Ready for Testing
**Next Steps**:
1. Deploy to staging environment
2. Run integration tests
3. Update scraper to extract new fields
4. Deploy to production

# Quarkus Implementation Complete ✅
## Summary
The Troostwijk Auction Monitor has been fully integrated with **Quarkus Framework** for production-ready deployment with enterprise features.
---
## 🎯 What Was Added
### 1. **Quarkus Dependencies** (pom.xml)
```xml
<!-- Core Quarkus -->
<dependency>
<groupId>io.quarkus</groupId>
<artifactId>quarkus-arc</artifactId> <!-- CDI/DI -->
</dependency>
<dependency>
<groupId>io.quarkus</groupId>
<artifactId>quarkus-rest-jackson</artifactId> <!-- REST API -->
</dependency>
<dependency>
<groupId>io.quarkus</groupId>
<artifactId>quarkus-scheduler</artifactId> <!-- Cron Scheduling -->
</dependency>
<dependency>
<groupId>io.quarkus</groupId>
<artifactId>quarkus-smallrye-health</artifactId> <!-- Health Checks -->
</dependency>
<dependency>
<groupId>io.quarkus</groupId>
<artifactId>quarkus-config-yaml</artifactId> <!-- YAML Config -->
</dependency>
```
### 2. **Configuration** (application.properties)
```properties
# Application
quarkus.application.name=troostwijk-scraper
quarkus.http.port=8081
# Auction Monitor Configuration
auction.database.path=C:\\mnt\\okcomputer\\output\\cache.db
auction.images.path=C:\\mnt\\okcomputer\\output\\images
auction.notification.config=desktop
# YOLO Models
auction.yolo.config=models/yolov4.cfg
auction.yolo.weights=models/yolov4.weights
auction.yolo.classes=models/coco.names
# Workflow Schedules (Cron Expressions)
# Note: a trailing '#' in a .properties value is NOT a comment, so the
# schedule notes are kept on their own lines.
# Every 30 min:
auction.workflow.scraper-import.cron=0 */30 * * * ?
# Every 1 hour:
auction.workflow.image-processing.cron=0 0 * * * ?
# Every 15 min:
auction.workflow.bid-monitoring.cron=0 */15 * * * ?
# Every 5 min:
auction.workflow.closing-alerts.cron=0 */5 * * * ?
# Scheduler
quarkus.scheduler.enabled=true
# Health Checks
quarkus.smallrye-health.root-path=/health
```
### 3. **Quarkus Scheduler** (QuarkusWorkflowScheduler.java)
Replaced manual `ScheduledExecutorService` with Quarkus `@Scheduled`:
```java
@ApplicationScoped
public class QuarkusWorkflowScheduler {

    @Inject DatabaseService db;
    @Inject NotificationService notifier;
    @Inject ObjectDetectionService detector;
    @Inject ImageProcessingService imageProcessor;

    // Workflow 1: Every 30 minutes
    @Scheduled(cron = "{auction.workflow.scraper-import.cron}")
    void importScraperData() { /* ... */ }

    // Workflow 2: Every 1 hour
    @Scheduled(cron = "{auction.workflow.image-processing.cron}")
    void processImages() { /* ... */ }

    // Workflow 3: Every 15 minutes
    @Scheduled(cron = "{auction.workflow.bid-monitoring.cron}")
    void monitorBids() { /* ... */ }

    // Workflow 4: Every 5 minutes
    @Scheduled(cron = "{auction.workflow.closing-alerts.cron}")
    void checkClosingTimes() { /* ... */ }
}
```
### 4. **CDI Producer** (AuctionMonitorProducer.java)
Centralized service creation with dependency injection:
```java
@ApplicationScoped
public class AuctionMonitorProducer {

    @Produces @Singleton
    public DatabaseService produceDatabaseService(
            @ConfigProperty(name = "auction.database.path") String dbPath) {
        DatabaseService db = new DatabaseService(dbPath);
        db.ensureSchema();
        return db;
    }

    @Produces @Singleton
    public NotificationService produceNotificationService(
            @ConfigProperty(name = "auction.notification.config") String config) {
        return new NotificationService(config, "");
    }

    @Produces @Singleton
    public ObjectDetectionService produceObjectDetectionService(...) { }

    @Produces @Singleton
    public ImageProcessingService produceImageProcessingService(...) { }
}
```
### 5. **REST API** (AuctionMonitorResource.java)
Full REST API for monitoring and control:
| Endpoint | Method | Description |
|----------|--------|-------------|
| `/api/monitor/status` | GET | Get current status |
| `/api/monitor/statistics` | GET | Get detailed statistics |
| `/api/monitor/trigger/scraper-import` | POST | Trigger scraper import |
| `/api/monitor/trigger/image-processing` | POST | Trigger image processing |
| `/api/monitor/trigger/bid-monitoring` | POST | Trigger bid monitoring |
| `/api/monitor/trigger/closing-alerts` | POST | Trigger closing alerts |
| `/api/monitor/auctions` | GET | List auctions |
| `/api/monitor/auctions?country=NL` | GET | Filter auctions by country |
| `/api/monitor/lots` | GET | List active lots |
| `/api/monitor/lots/closing-soon` | GET | Lots closing soon |
| `/api/monitor/lots/{id}/images` | GET | Get lot images |
| `/api/monitor/test-notification` | POST | Send test notification |
### 6. **Health Checks** (AuctionMonitorHealthCheck.java)
Kubernetes-ready health probes:
```java
@Liveness // /health/live
public class LivenessCheck implements HealthCheck {
    public HealthCheckResponse call() {
        return HealthCheckResponse.up("Auction Monitor is alive");
    }
}

@Readiness // /health/ready
public class ReadinessCheck implements HealthCheck {
    @Inject DatabaseService db;

    public HealthCheckResponse call() {
        var auctions = db.getAllAuctions();
        return HealthCheckResponse.named("database")
                .up()
                .withData("auctions", auctions.size())
                .build();
    }
}

@Startup // /health/started
public class StartupCheck implements HealthCheck { /* ... */ }
```
### 7. **Docker Support**
#### Dockerfile (Optimized for Quarkus fast-jar)
```dockerfile
# Build stage
FROM maven:3.9-eclipse-temurin-25-alpine AS build
WORKDIR /app
COPY pom.xml ./
RUN mvn dependency:go-offline -B
COPY src ./src/
RUN mvn package -DskipTests -Dquarkus.package.jar.type=fast-jar
# Runtime stage
FROM eclipse-temurin:25-jre-alpine
WORKDIR /app
# Copy Quarkus fast-jar structure
COPY --from=build /app/target/quarkus-app/lib/ /app/lib/
COPY --from=build /app/target/quarkus-app/*.jar /app/
COPY --from=build /app/target/quarkus-app/app/ /app/app/
COPY --from=build /app/target/quarkus-app/quarkus/ /app/quarkus/
EXPOSE 8081
HEALTHCHECK CMD wget --spider http://localhost:8081/health/live
ENTRYPOINT ["java", "-jar", "/app/quarkus-run.jar"]
```
#### docker-compose.yml
```yaml
version: '3.8'
services:
auction-monitor:
build: ../wiki
ports:
- "8081:8081"
volumes:
- ./data/cache.db:/mnt/okcomputer/output/cache.db
- ./data/images:/mnt/okcomputer/output/images
environment:
- AUCTION_DATABASE_PATH=/mnt/okcomputer/output/cache.db
- AUCTION_NOTIFICATION_CONFIG=desktop
healthcheck:
test: [ "CMD", "wget", "--spider", "http://localhost:8081/health/live" ]
interval: 30s
restart: unless-stopped
```
### 8. **Kubernetes Deployment**
Full Kubernetes manifests:
- **Namespace** - Isolated environment
- **PersistentVolumeClaim** - Data storage
- **ConfigMap** - Configuration
- **Secret** - Sensitive data (SMTP credentials)
- **Deployment** - Application pods
- **Service** - Internal networking
- **Ingress** - External access
- **HorizontalPodAutoscaler** - Auto-scaling
---
## 🚀 How to Run
### Development Mode (with live reload)
```bash
mvn quarkus:dev
# Access:
# - App: http://localhost:8081
# - Dev UI: http://localhost:8081/q/dev/
# - API: http://localhost:8081/api/monitor/status
# - Health: http://localhost:8081/health
```
### Production Mode (JAR)
```bash
# Build
mvn clean package
# Run
java -jar target/quarkus-app/quarkus-run.jar
# Access: http://localhost:8081
```
### Docker
```bash
# Build
docker build -t auction-monitor .
# Run
docker run -p 8081:8081 auction-monitor
# Access: http://localhost:8081
```
### Docker Compose
```bash
# Start
docker-compose up -d
# View logs
docker-compose logs -f
# Access: http://localhost:8081
```
### Kubernetes
```bash
# Deploy
kubectl apply -f k8s/deployment.yaml
# Port forward
kubectl port-forward svc/auction-monitor 8081:8081 -n auction-monitor
# Access: http://localhost:8081
```
---
## 📊 Architecture
```
┌─────────────────────────────────────────────────────────────┐
│ QUARKUS APPLICATION │
├─────────────────────────────────────────────────────────────┤
│ │
│ ┌────────────────────────────────────────────────────┐ │
│ │ QuarkusWorkflowScheduler (@ApplicationScoped) │ │
│ │ ┌──────────────────────────────────────────────┐ │ │
│ │ │ @Scheduled(cron = "0 */30 * * * ?") │ │ │
│ │ │ importScraperData() │ │ │
│ │ ├──────────────────────────────────────────────┤ │ │
│ │ │ @Scheduled(cron = "0 0 * * * ?") │ │ │
│ │ │ processImages() │ │ │
│ │ ├──────────────────────────────────────────────┤ │ │
│ │ │ @Scheduled(cron = "0 */15 * * * ?") │ │ │
│ │ │ monitorBids() │ │ │
│ │ ├──────────────────────────────────────────────┤ │ │
│ │ │ @Scheduled(cron = "0 */5 * * * ?") │ │ │
│ │ │ checkClosingTimes() │ │ │
│ │ └──────────────────────────────────────────────┘ │ │
│ └────────────────────────────────────────────────────┘ │
│ ▲ │
│ │ @Inject │
│ ┌───────────────────────┴────────────────────────────┐ │
│ │ AuctionMonitorProducer │ │
│ │ ┌──────────────────────────────────────────────┐ │ │
│ │ │ @Produces @Singleton DatabaseService │ │ │
│ │ │ @Produces @Singleton NotificationService │ │ │
│ │ │ @Produces @Singleton ObjectDetectionService │ │ │
│ │ │ @Produces @Singleton ImageProcessingService │ │ │
│ │ └──────────────────────────────────────────────┘ │ │
│ └────────────────────────────────────────────────────┘ │
│ │
│ ┌────────────────────────────────────────────────────┐ │
│ │ AuctionMonitorResource (REST API) │ │
│ │ ┌──────────────────────────────────────────────┐ │ │
│ │ │ GET /api/monitor/status │ │ │
│ │ │ GET /api/monitor/statistics │ │ │
│ │ │ POST /api/monitor/trigger/* │ │ │
│ │ │ GET /api/monitor/auctions │ │ │
│ │ │ GET /api/monitor/lots │ │ │
│ │ └──────────────────────────────────────────────┘ │ │
│ └────────────────────────────────────────────────────┘ │
│ │
│ ┌────────────────────────────────────────────────────┐ │
│ │ AuctionMonitorHealthCheck │ │
│ │ ┌──────────────────────────────────────────────┐ │ │
│ │ │ @Liveness - /health/live │ │ │
│ │ │ @Readiness - /health/ready │ │ │
│ │ │ @Startup - /health/started │ │ │
│ │ └──────────────────────────────────────────────┘ │ │
│ └────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────┘
```
---
## 🔧 Key Features
### 1. **Dependency Injection (CDI)**
- Type-safe injection with `@Inject`
- Singleton services with `@Produces`
- Configuration injection with `@ConfigProperty`
### 2. **Scheduled Tasks**
- Cron-based scheduling with `@Scheduled`
- Configurable via properties
- No manual thread management
### 3. **REST API**
- JAX-RS endpoints
- JSON serialization
- Error handling
### 4. **Health Checks**
- Liveness probe (is app alive?)
- Readiness probe (is app ready?)
- Startup probe (has app started?)
### 5. **Configuration**
- External configuration
- Environment variable override
- Type-safe config injection
### 6. **Container Ready**
- Optimized Docker image
- Fast startup (~0.5s)
- Low memory (~50MB)
- Health checks included
### 7. **Cloud Native**
- Kubernetes manifests
- Auto-scaling support
- Ingress configuration
- Persistent storage
---
## 📁 Files Created/Modified
### New Files
```
src/main/java/com/auction/
├── QuarkusWorkflowScheduler.java # Quarkus scheduler
├── AuctionMonitorProducer.java # CDI producer
├── AuctionMonitorResource.java # REST API
└── AuctionMonitorHealthCheck.java # Health checks
src/main/resources/
└── application.properties # Configuration
k8s/
├── deployment.yaml # Kubernetes manifests
└── README.md # K8s deployment guide
docker-compose.yml # Docker Compose config
Dockerfile # Updated for Quarkus
QUARKUS_GUIDE.md # Complete Quarkus guide
QUARKUS_IMPLEMENTATION.md # This file
```
### Modified Files
```
pom.xml # Added Quarkus dependencies
src/main/resources/application.properties # Added config
```
---
## 🎯 Benefits of Quarkus
| Feature | Before | After (Quarkus) |
|---------|--------|-----------------|
| **Startup Time** | ~3-5 seconds | ~0.5 seconds |
| **Memory** | ~200MB | ~50MB |
| **Scheduling** | Manual ExecutorService | @Scheduled annotations |
| **DI/CDI** | Manual instantiation | @Inject, @Produces |
| **REST API** | None | Full JAX-RS API |
| **Health Checks** | None | Built-in probes |
| **Config** | Hard-coded | External properties |
| **Dev Mode** | Manual restart | Live reload |
| **Container** | Basic Docker | Optimized fast-jar |
| **Cloud Native** | Not ready | K8s ready |
---
## 🧪 Testing
### Unit Tests
```bash
mvn test
```
### Integration Tests
```bash
# Start app
mvn quarkus:dev
# In another terminal
curl http://localhost:8081/api/monitor/status
curl http://localhost:8081/health
curl -X POST http://localhost:8081/api/monitor/trigger/scraper-import
```
### Docker Test
```bash
docker-compose up -d
docker-compose logs -f
curl http://localhost:8081/api/monitor/status
docker-compose down
```
---
## 📚 Documentation
1. **QUARKUS_GUIDE.md** - Complete Quarkus usage guide
2. **QUARKUS_IMPLEMENTATION.md** - This file (implementation details)
3. **k8s/README.md** - Kubernetes deployment guide
4. **docker-compose.yml** - Docker Compose reference
5. **README.md** - Updated main README
---
## 🎉 Summary
- **Quarkus Framework** - Fully integrated
- **@Scheduled Workflows** - Cron-based scheduling
- **CDI/Dependency Injection** - Clean architecture
- **REST API** - Full control interface
- **Health Checks** - Kubernetes ready
- **Docker/Compose** - Production containers
- **Kubernetes** - Cloud deployment
- **Configuration** - Externalized settings
- **Documentation** - Complete guides
**The application is now production-ready with Quarkus! 🚀**
### Quick Commands
```bash
# Development
mvn quarkus:dev
# Production
mvn clean package
java -jar target/quarkus-app/quarkus-run.jar
# Docker
docker-compose up -d
# Kubernetes
kubectl apply -f k8s/deployment.yaml
```
### API Access
```bash
# Status
curl http://localhost:8081/api/monitor/status
# Statistics
curl http://localhost:8081/api/monitor/statistics
# Health
curl http://localhost:8081/health
# Trigger workflow
curl -X POST http://localhost:8081/api/monitor/trigger/scraper-import
```
**Enjoy your Quarkus-powered Auction Monitor! 🎊**

# Quick Start Guide
Get the scraper running in minutes without downloading YOLO models!
## Minimal Setup (No Object Detection)
The scraper works perfectly fine **without** YOLO object detection. You can run it immediately and add object detection later if needed.
### Step 1: Run the Scraper
```bash
# Using Maven
mvn clean compile exec:java -Dexec.mainClass="com.auction.scraper.TroostwijkScraper"
```
Or in IntelliJ IDEA:
1. Open `TroostwijkScraper.java`
2. Right-click on the `main` method
3. Select "Run 'TroostwijkScraper.main()'"
### What You'll See
```
=== Troostwijk Auction Scraper ===
Initializing scraper...
⚠️ Object detection disabled: YOLO model files not found
Expected files:
- models/yolov4.cfg
- models/yolov4.weights
- models/coco.names
Scraper will continue without image analysis.
[1/3] Discovering Dutch auctions...
✓ Found 5 auctions: [12345, 12346, 12347, 12348, 12349]
[2/3] Fetching lot details...
Processing sale 12345...
[3/3] Starting monitoring service...
✓ Monitoring active. Press Ctrl+C to stop.
```
### Step 2: Test Desktop Notifications
The scraper will automatically send desktop notifications when:
- A new bid is placed on a monitored lot
- An auction is closing within 5 minutes
**No setup required** - desktop notifications work out of the box!
---
## Optional: Add Email Notifications
If you want email notifications in addition to desktop notifications:
```bash
# Set environment variable
export NOTIFICATION_CONFIG="smtp:your.email@gmail.com:app_password:your.email@gmail.com"
# Then run the scraper
mvn exec:java -Dexec.mainClass="com.auction.scraper.TroostwijkScraper"
```
**Get Gmail App Password:**
1. Enable 2FA in Google Account
2. Go to: Google Account → Security → 2-Step Verification → App passwords
3. Generate password for "Mail"
4. Use that password (not your regular Gmail password)
---
## Optional: Add Object Detection Later
If you want AI-powered image analysis to detect objects in auction photos:
### 1. Create models directory
```bash
mkdir models
cd models
```
### 2. Download YOLO files
```bash
# YOLOv4 config (small)
curl -O https://raw.githubusercontent.com/AlexeyAB/darknet/master/cfg/yolov4.cfg
# YOLOv4 weights (245 MB - takes a few minutes)
curl -LO https://github.com/AlexeyAB/darknet/releases/download/darknet_yolo_v3_optimal/yolov4.weights
# COCO class names
curl -O https://raw.githubusercontent.com/AlexeyAB/darknet/master/data/coco.names
```
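Before re-running the scraper, the presence of the three model files can be verified with a small sketch (file names as listed in this guide; the helper itself is illustrative, not part of the scraper):

```python
from pathlib import Path

# The three files the scraper looks for, per this guide.
REQUIRED_MODEL_FILES = ["yolov4.cfg", "yolov4.weights", "coco.names"]

def detection_available(models_dir="models"):
    """True only if every YOLO file is present and non-empty
    (a zero-byte file usually means an interrupted download)."""
    base = Path(models_dir)
    return all((base / f).is_file() and (base / f).stat().st_size > 0
               for f in REQUIRED_MODEL_FILES)
```

A partial `yolov4.weights` download is the most common cause of detection silently staying disabled, so the non-empty check matters.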
### 3. Run again
```bash
mvn exec:java -Dexec.mainClass="com.auction.scraper.TroostwijkScraper"
```
Now you'll see:
```
✓ Object detection enabled with YOLO
```
The scraper will now analyze auction images and detect objects like:
- Vehicles (cars, trucks, forklifts)
- Equipment (machines, tools)
- Furniture
- Electronics
- And 80+ other object types
---
## Features Without Object Detection
Even without YOLO, the scraper provides:
- **Full auction scraping** - Discovers all Dutch auctions
- **Lot tracking** - Monitors bids and closing times
- **Desktop notifications** - Real-time alerts
- **SQLite database** - All data persisted locally
- **Image downloading** - Saves all lot images
- **Scheduled monitoring** - Automatic updates every hour
Object detection simply adds:
- AI-powered image analysis
- Automatic object labeling
- Searchable image database
---
## Database Location
The scraper creates `troostwijk.db` in your current directory with:
- All auction data
- Lot details (title, description, bids, etc.)
- Downloaded image paths
- Object labels (if detection enabled)
View the database with any SQLite browser:
```bash
sqlite3 troostwijk.db
.tables
SELECT * FROM lots LIMIT 5;
```
---
## Stopping the Scraper
Press **Ctrl+C** to stop the monitoring service.
---
## Next Steps
1. **Run the scraper** without YOLO to test it
2. **Verify desktop notifications** work
3. ⚙️ **Optional**: Add email notifications
4. ⚙️ **Optional**: Download YOLO models for object detection
5. 🔧 **Customize**: Edit monitoring frequency, closing alerts, etc.
---
## Troubleshooting
### Desktop notifications not appearing?
- **Windows**: Check if Java has notification permissions
- **Linux**: Ensure desktop environment is running (not headless)
- **macOS**: Check System Preferences → Notifications
### OpenCV warnings?
These are normal and can be ignored:
```
WARNING: A restricted method in java.lang.System has been called
WARNING: Use --enable-native-access=ALL-UNNAMED to avoid warning
```
The scraper works fine despite these warnings.
---
## Full Documentation
See [README.md](../README.md) for complete documentation including:
- Email setup details
- YOLO installation guide
- Configuration options
- Database schema
- API endpoints

# Scraper Refactor Guide - Image Download Integration
## 🎯 Objective
Refactor the Troostwijk scraper to **download and store images locally**, eliminating the 57M+ duplicate image problem in the monitoring process.
## 📋 Current vs. New Architecture
### **Before** (Current Architecture)
```
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ Scraper │────────▶│ Database │◀────────│ Monitor │
│ │ │ │ │ │
│ Stores URLs │ │ images table │ │ Downloads + │
│ downloaded=0 │ │ │ │ Detection │
└──────────────┘ └──────────────┘ └──────────────┘
57M+ duplicates!
```
### **After** (New Architecture)
```
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ Scraper │────────▶│ Database │◀────────│ Monitor │
│ │ │ │ │ │
│ Downloads + │ │ images table │ │ Detection │
│ Stores path │ │ local_path ✓ │ │ Only │
│ downloaded=1 │ │ │ │ │
└──────────────┘ └──────────────┘ └──────────────┘
No duplicates!
```
## 🗄️ Database Schema Changes
### Current Schema (ARCHITECTURE-TROOSTWIJK-SCRAPER.md:113-122)
```sql
CREATE TABLE images (
id INTEGER PRIMARY KEY AUTOINCREMENT,
lot_id TEXT,
url TEXT,
local_path TEXT, -- Currently NULL
downloaded INTEGER -- Currently 0
-- Missing: processed_at, labels (added by monitor)
);
```
### Required Schema (Already Compatible!)
```sql
CREATE TABLE images (
id INTEGER PRIMARY KEY AUTOINCREMENT,
lot_id TEXT,
url TEXT,
local_path TEXT, -- ✅ SET by scraper after download
downloaded INTEGER, -- ✅ SET to 1 by scraper after download
labels TEXT, -- ⚠️ SET by monitor (object detection)
processed_at INTEGER, -- ⚠️ SET by monitor (timestamp)
FOREIGN KEY (lot_id) REFERENCES lots(lot_id)
);
```
**Good News**: The scraper's schema already has `local_path` and `downloaded` columns! You just need to populate them.
## 🔧 Implementation Steps
### **Step 1: Enable Image Downloading in Configuration**
**File**: Your scraper's config file (e.g., `config.py` or environment variables)
```python
# Current setting
DOWNLOAD_IMAGES = False # ❌ Change this!
# New setting
DOWNLOAD_IMAGES = True # ✅ Enable downloads
# Image storage path
IMAGES_DIR = "/mnt/okcomputer/output/images" # Or your preferred path
```
### **Step 2: Update Image Download Logic**
Based on ARCHITECTURE-TROOSTWIJK-SCRAPER.md:211-228, you already have the structure. Here's what needs to change:
**Current Code** (Conceptual):
```python
# Phase 3: Scrape lot details
def scrape_lot(lot_url):
    lot_data = parse_lot_page(lot_url)

    # Save lot to database
    db.insert_lot(lot_data)

    # Save image URLs to database (NOT DOWNLOADED)
    for img_url in lot_data['images']:
        db.execute("""
            INSERT INTO images (lot_id, url, downloaded)
            VALUES (?, ?, 0)
        """, (lot_data['lot_id'], img_url))
```
**New Code** (Required):
```python
import os
import time
from pathlib import Path
from urllib.parse import urlparse

import requests

def scrape_lot(lot_url):
    lot_data = parse_lot_page(lot_url)

    # Save lot to database
    db.insert_lot(lot_data)

    # Download and save images
    for idx, img_url in enumerate(lot_data['images'], start=1):
        try:
            # Download image
            local_path = download_image(img_url, lot_data['lot_id'], idx)

            # Insert with local_path and downloaded=1
            db.execute("""
                INSERT INTO images (lot_id, url, local_path, downloaded)
                VALUES (?, ?, ?, 1)
                ON CONFLICT(lot_id, url) DO UPDATE SET
                    local_path = excluded.local_path,
                    downloaded = 1
            """, (lot_data['lot_id'], img_url, local_path))

            # Rate limiting (0.5s between downloads)
            time.sleep(0.5)
        except Exception as e:
            print(f"Failed to download {img_url}: {e}")
            # Still insert record but mark as not downloaded
            db.execute("""
                INSERT INTO images (lot_id, url, downloaded)
                VALUES (?, ?, 0)
            """, (lot_data['lot_id'], img_url))

def download_image(image_url, lot_id, index):
    """
    Downloads an image and saves it to an organized directory structure.

    Args:
        image_url: Remote URL of the image
        lot_id: Lot identifier (e.g., "A1-28505-5")
        index: Image sequence number (1, 2, 3, ...)

    Returns:
        Absolute path to the saved file
    """
    # Create directory structure: /images/{lot_id}/
    images_dir = Path(os.getenv('IMAGES_DIR', '/mnt/okcomputer/output/images'))
    lot_dir = images_dir / lot_id
    lot_dir.mkdir(parents=True, exist_ok=True)

    # Determine file extension from the URL path (ignores query strings)
    ext = Path(urlparse(image_url).path).suffix or '.jpg'
    filename = f"{index:03d}{ext}"  # 001.jpg, 002.jpg, etc.
    local_path = lot_dir / filename

    # Download with timeout
    response = requests.get(image_url, timeout=10)
    response.raise_for_status()

    # Save to disk
    with open(local_path, 'wb') as f:
        f.write(response.content)

    return str(local_path.absolute())
```
### **Step 3: Add Unique Constraint to Prevent Duplicates**
**Migration SQL**:
```sql
-- Add unique constraint to prevent duplicate image records
CREATE UNIQUE INDEX IF NOT EXISTS idx_images_unique
ON images(lot_id, url);
```
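Note that `CREATE UNIQUE INDEX` fails if duplicate `(lot_id, url)` rows already exist, so a migration should deduplicate first. A minimal sketch, assuming the `images` schema and `cache.db` path above (`migrate_images_index` is a hypothetical helper name):

```python
import sqlite3

def migrate_images_index(db_path):
    """Deduplicate (lot_id, url) rows, then add the unique index.

    CREATE UNIQUE INDEX errors out if duplicates already exist,
    so the DELETE must run first.
    """
    conn = sqlite3.connect(db_path)
    cur = conn.cursor()
    # Keep only the newest row (highest id) per (lot_id, url) pair.
    cur.execute("""
        DELETE FROM images
        WHERE id NOT IN (
            SELECT MAX(id) FROM images GROUP BY lot_id, url
        )
    """)
    cur.execute("""
        CREATE UNIQUE INDEX IF NOT EXISTS idx_images_unique
        ON images(lot_id, url)
    """)
    conn.commit()
    conn.close()
```

Run it once against the production path, e.g. `migrate_images_index('/mnt/okcomputer/output/cache.db')`; it is idempotent thanks to `IF NOT EXISTS`.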
Add this to your scraper's schema initialization:
```python
def init_database():
    conn = sqlite3.connect('/mnt/okcomputer/output/cache.db')
    cursor = conn.cursor()

    # Existing table creation...
    cursor.execute("""
        CREATE TABLE IF NOT EXISTS images (...)
    """)

    # Add unique constraint (NEW)
    cursor.execute("""
        CREATE UNIQUE INDEX IF NOT EXISTS idx_images_unique
        ON images(lot_id, url)
    """)

    conn.commit()
    conn.close()
```
### **Step 4: Handle Image Download Failures Gracefully**
```python
def download_with_retry(image_url, lot_id, index, max_retries=3):
    """Downloads an image with retry logic."""
    for attempt in range(max_retries):
        try:
            return download_image(image_url, lot_id, index)
        except requests.exceptions.RequestException as e:
            if attempt == max_retries - 1:
                print(f"Failed after {max_retries} attempts: {image_url}")
                return None  # Return None on failure
            print(f"Retry {attempt + 1}/{max_retries} for {image_url}")
            time.sleep(2 ** attempt)  # Exponential backoff
```
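Steps 2 and 4 combine naturally: a `None` return from the retry helper maps to `downloaded = 0`, so a later pass can retry the failed rows. A sketch of the combined loop — the `db` connection object and the `fetch` injection point are illustrative, not part of the scraper's actual API:

```python
import time

def store_lot_images(db, lot_id, image_urls, fetch=None):
    """Insert one row per image URL, downloading via the Step 4
    retry helper (or an injected `fetch`, useful for testing).

    A None result marks the row downloaded = 0 so a later pass
    can retry it; the upsert keeps re-scrapes duplicate-free.
    """
    fetch = fetch or download_with_retry  # Step 4 helper by default
    for idx, img_url in enumerate(image_urls, start=1):
        local_path = fetch(img_url, lot_id, idx)
        db.execute("""
            INSERT INTO images (lot_id, url, local_path, downloaded)
            VALUES (?, ?, ?, ?)
            ON CONFLICT(lot_id, url) DO UPDATE SET
                local_path = excluded.local_path,
                downloaded = excluded.downloaded
        """, (lot_id, img_url, local_path, 1 if local_path else 0))
        time.sleep(0.5)  # rate limiting between downloads
```

Note that the `ON CONFLICT(lot_id, url)` clause requires the Step 3 unique index to exist.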
### **Step 5: Update Database Queries**
Make sure your INSERT uses `INSERT ... ON CONFLICT` to handle re-scraping:
```python
# Good: Handles re-scraping without duplicates
db.execute("""
    INSERT INTO images (lot_id, url, local_path, downloaded)
    VALUES (?, ?, ?, 1)
    ON CONFLICT(lot_id, url) DO UPDATE SET
        local_path = excluded.local_path,
        downloaded = 1
""", (lot_id, img_url, local_path))

# Bad: Creates duplicates on re-scrape
db.execute("""
    INSERT INTO images (lot_id, url, local_path, downloaded)
    VALUES (?, ?, ?, 1)
""", (lot_id, img_url, local_path))
```
## 📊 Expected Outcomes
### Before Refactor
```sql
SELECT COUNT(*) FROM images WHERE downloaded = 0;
-- Result: 57,376,293 (57M+ undownloaded!)
SELECT COUNT(*) FROM images WHERE local_path IS NOT NULL;
-- Result: 0 (no files downloaded)
```
### After Refactor
```sql
SELECT COUNT(*) FROM images WHERE downloaded = 1;
-- Result: ~16,807 (one per actual lot image)
SELECT COUNT(*) FROM images WHERE local_path IS NOT NULL;
-- Result: ~16,807 (all downloaded images have paths)
SELECT COUNT(*) FROM (SELECT DISTINCT lot_id, url FROM images);
-- Result: ~16,807 (no duplicates!)
```
## 🚀 Deployment Checklist
### Pre-Deployment
- [ ] Back up current database: `cp cache.db cache.db.backup`
- [ ] Verify disk space: At least 10GB free for images
- [ ] Test download function on 5 sample lots
- [ ] Verify `IMAGES_DIR` path exists and is writable
### Deployment
- [ ] Update configuration: `DOWNLOAD_IMAGES = True`
- [ ] Run schema migration to add unique index
- [ ] Deploy updated scraper code
- [ ] Monitor first 100 lots for errors
### Post-Deployment Verification
```sql
-- Check download success rate
SELECT
COUNT(*) as total_images,
SUM(CASE WHEN downloaded = 1 THEN 1 ELSE 0 END) as downloaded,
SUM(CASE WHEN downloaded = 0 THEN 1 ELSE 0 END) as failed,
    ROUND(100.0 * SUM(CASE WHEN downloaded = 1 THEN 1 ELSE 0 END) / COUNT(*), 2) as success_rate
FROM images;
-- Check for duplicates (should be 0)
SELECT lot_id, url, COUNT(*) as dup_count
FROM images
GROUP BY lot_id, url
HAVING COUNT(*) > 1;
-- Verify file system
SELECT COUNT(*) FROM images
WHERE downloaded = 1
AND local_path IS NOT NULL
AND local_path != '';
```
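The three queries above can also run as a single pass/fail script, handy for a cron check after deployment. A sketch against the same schema (`verify` is a hypothetical helper):

```python
import sqlite3

def verify(db_path):
    """Run the post-deployment checks; return True if all pass."""
    conn = sqlite3.connect(db_path)
    total, downloaded = conn.execute(
        "SELECT COUNT(*), SUM(CASE WHEN downloaded = 1 THEN 1 ELSE 0 END) "
        "FROM images").fetchone()
    dups = conn.execute(
        "SELECT COUNT(*) FROM (SELECT lot_id, url FROM images "
        "GROUP BY lot_id, url HAVING COUNT(*) > 1)").fetchone()[0]
    missing_paths = conn.execute(
        "SELECT COUNT(*) FROM images WHERE downloaded = 1 "
        "AND (local_path IS NULL OR local_path = '')").fetchone()[0]
    conn.close()
    rate = 100.0 * (downloaded or 0) / total if total else 0.0
    print(f"success rate {rate:.2f}%, duplicates {dups}, "
          f"downloaded rows without a path {missing_paths}")
    return dups == 0 and missing_paths == 0
```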
## 🔍 Monitoring Process Impact
The monitoring process (auctiora) will automatically:
- ✅ Stop downloading images (network I/O eliminated)
- ✅ Only run object detection on `local_path` files
- ✅ Query: `WHERE local_path IS NOT NULL AND (labels IS NULL OR labels = '')`
- ✅ Update only the `labels` and `processed_at` columns
**No changes needed in monitoring process!** It's already updated to work with scraper-downloaded images.
## 🐛 Troubleshooting
### Problem: "No space left on device"
```bash
# Check disk usage
df -h /mnt/okcomputer/output/images
# Estimate needed space: ~100KB per image
# 16,807 images × 100KB = ~1.6GB
```
### Problem: "Permission denied" when writing images
```bash
# Fix permissions
chmod 755 /mnt/okcomputer/output/images
chown -R scraper_user:scraper_group /mnt/okcomputer/output/images
```
### Problem: Images downloading but not recorded in DB
```python
# Add logging
import logging
logging.basicConfig(level=logging.INFO)

def download_image(...):
    logging.info(f"Downloading {image_url} to {local_path}")
    # ... download code ...
    logging.info(f"Saved to {local_path}, size: {os.path.getsize(local_path)} bytes")
    return local_path
```
### Problem: Duplicate images after refactor
```sql
-- Find duplicates
SELECT lot_id, url, COUNT(*)
FROM images
GROUP BY lot_id, url
HAVING COUNT(*) > 1;
-- Clean up duplicates (keep newest)
DELETE FROM images
WHERE id NOT IN (
SELECT MAX(id)
FROM images
GROUP BY lot_id, url
);
```
## 📈 Performance Comparison
| Metric | Before (Monitor Downloads) | After (Scraper Downloads) |
|----------------------|---------------------------------|---------------------------|
| **Image records** | 57,376,293 | ~16,807 |
| **Duplicates** | 57,359,486 (99.97%!) | 0 |
| **Network I/O** | Monitor process | Scraper process |
| **Disk usage** | 0 (URLs only) | ~1.6GB (actual files) |
| **Processing speed** | 500ms/image (download + detect) | 100ms/image (detect only) |
| **Error handling** | Complex (download failures) | Simple (files exist) |
## 🎓 Code Examples by Language
### Python (Most Likely)
See **Step 2** above for complete implementation.
## 📚 References
- **Current Scraper Architecture**: `wiki/ARCHITECTURE-TROOSTWIJK-SCRAPER.md`
- **Database Schema**: `wiki/DATABASE_ARCHITECTURE.md`
- **Monitor Changes**: See commit history for `ImageProcessingService.java`, `DatabaseService.java`
## ✅ Success Criteria
You'll know the refactor is successful when:
1. ✅ Database query `SELECT COUNT(*) FROM images` returns ~16,807 (not 57M+)
2. ✅ All images have `downloaded = 1` and `local_path IS NOT NULL`
3. ✅ No duplicate records: `SELECT lot_id, url, COUNT(*) ... HAVING COUNT(*) > 1` returns 0 rows
4. ✅ Monitor logs show "Found N images needing detection" with reasonable numbers
5. ✅ Files exist at paths in `local_path` column
6. ✅ Monitor process speed increases (100ms vs 500ms per image)
---
**Questions?** Check the troubleshooting section or inspect the monitor's updated code in:
- `src/main/java/auctiora/ImageProcessingService.java`
- `src/main/java/auctiora/DatabaseService.java:695-719`

View File

@@ -1,333 +0,0 @@
# Test Suite Summary
## Overview
Comprehensive test suite for Troostwijk Auction Monitor with individual test cases for every aspect of the system.
## Configuration Updates
### Paths Updated
- **Database**: `C:\mnt\okcomputer\output\cache.db`
- **Images**: `C:\mnt\okcomputer\output\images\{saleId}\{lotId}\`
### Files Modified
1. `src/main/java/com/auction/Main.java` - Updated default database path
2. `src/main/java/com/auction/ImageProcessingService.java` - Updated image storage path
## Test Files Created
### 1. ScraperDataAdapterTest.java (13 test cases)
Tests data transformation from external scraper schema to monitor schema:
- ✅ Extract numeric ID from text format (auction & lot IDs)
- ✅ Convert scraper auction format to AuctionInfo
- ✅ Handle simple location without country
- ✅ Convert scraper lot format to Lot
- ✅ Parse bid amounts from various formats (€, $, £, plain numbers)
- ✅ Handle missing/null fields gracefully
- ✅ Parse various timestamp formats (ISO, SQL)
- ✅ Handle invalid timestamps
- ✅ Extract type prefix from auction ID
- ✅ Handle GBP currency symbol
- ✅ Handle "No bids" text
- ✅ Parse complex lot IDs (A1-28505-5 → 285055)
- ✅ Validate field mapping (lots_count → lotCount, etc.)
### 2. DatabaseServiceTest.java (15 test cases)
Tests database operations and SQLite persistence:
- ✅ Create database schema successfully
- ✅ Insert and retrieve auction
- ✅ Update existing auction on conflict (UPSERT)
- ✅ Retrieve auctions by country code
- ✅ Insert and retrieve lot
- ✅ Update lot current bid
- ✅ Update lot notification flags
- ✅ Insert and retrieve image records
- ✅ Count total images
- ✅ Handle empty database gracefully
- ✅ Handle lots with null closing time
- ✅ Retrieve active lots
- ✅ Handle concurrent upserts (thread safety)
- ✅ Validate foreign key relationships
- ✅ Test database indexes performance
### 3. ImageProcessingServiceTest.java (11 test cases)
Tests image downloading and processing pipeline:
- ✅ Process images for lot with object detection
- ✅ Handle image download failure gracefully
- ✅ Create directory structure for images
- ✅ Save detected objects to database
- ✅ Handle empty image list
- ✅ Process pending images from database
- ✅ Skip lots that already have images
- ✅ Handle database errors during image save
- ✅ Handle empty detection results
- ✅ Handle lots with no existing images
- ✅ Capture and verify detection labels
### 4. ObjectDetectionServiceTest.java (10 test cases)
Tests YOLO object detection functionality:
- ✅ Initialize with missing YOLO models (disabled mode)
- ✅ Return empty list when detection is disabled
- ✅ Handle invalid image path gracefully
- ✅ Handle empty image file
- ✅ Initialize successfully with valid model files
- ✅ Handle missing class names file
- ✅ Detect when model files are missing
- ✅ Return unique labels only
- ✅ Handle multiple detections in same image
- ✅ Respect confidence threshold (0.5)
### 5. NotificationServiceTest.java (19 test cases)
Tests desktop and email notification delivery:
- ✅ Initialize with desktop-only configuration
- ✅ Initialize with SMTP configuration
- ✅ Reject invalid SMTP configuration format
- ✅ Reject unknown configuration type
- ✅ Send desktop notification without error
- ✅ Send high priority notification
- ✅ Send normal priority notification
- ✅ Handle notification when system tray not supported
- ✅ Send email notification with valid SMTP config
- ✅ Include both desktop and email when SMTP configured
- ✅ Handle empty message gracefully
- ✅ Handle very long message (1000+ chars)
- ✅ Handle special characters in message (€, ⚠️)
- ✅ Accept case-insensitive desktop config
- ✅ Validate SMTP config parts count
- ✅ Handle multiple rapid notifications
- ✅ Send bid change notification format
- ✅ Send closing alert notification format
- ✅ Send object detection notification format
### 6. TroostwijkMonitorTest.java (12 test cases)
Tests monitoring orchestration and coordination:
- ✅ Initialize monitor successfully
- ✅ Print database stats without error
- ✅ Process pending images without error
- ✅ Handle empty database gracefully
- ✅ Track lots in database
- ✅ Monitor lots closing soon (< 5 minutes)
- ✅ Identify lots with time remaining
- ✅ Handle lots without closing time
- ✅ Track notification status
- ✅ Update bid amounts
- ✅ Handle multiple concurrent lot updates
- ✅ Handle database with auctions and lots
### 7. IntegrationTest.java (10 test cases)
Tests complete end-to-end workflows:
- **Test 1**: Complete scraper data import workflow
  - Import auction from scraper format
  - Import multiple lots for auction
  - Verify data integrity
- **Test 2**: Image processing and detection workflow
  - Add images for lots
  - Run object detection
  - Save labels to database
- **Test 3**: Bid monitoring and notification workflow
  - Simulate bid increase
  - Update database
  - Send notification
  - Verify bid was updated
- **Test 4**: Closing alert workflow
  - Create lot closing soon
  - Send high-priority notification
  - Mark as notified
  - Verify notification flag
- **Test 5**: Multi-country auction filtering
  - Add auctions from NL, RO, BE
  - Filter by country code
  - Verify filtering works correctly
- **Test 6**: Complete monitoring cycle
  - Print database statistics
  - Process pending images
  - Verify database integrity
- **Test 7**: Data consistency across services
  - Verify all auctions have valid data
  - Verify all lots have valid data
  - Check referential integrity
- **Test 8**: Object detection value estimation workflow
  - Create lot with detected objects
  - Add images with labels
  - Analyze detected objects
  - Send value estimation notification
- **Test 9**: Handle rapid concurrent updates
  - Concurrent auction insertions
  - Concurrent lot insertions
  - Verify all data persisted correctly
- **Test 10**: End-to-end notification scenarios
  - Bid change notification
  - Closing alert
  - Object detection notification
  - Value estimate notification
  - Viewing day reminder
## Test Coverage Summary
| Component | Test Cases | Coverage Areas |
|-----------|-----------|----------------|
| **ScraperDataAdapter** | 13 | Data transformation, ID parsing, currency parsing, timestamp parsing |
| **DatabaseService** | 15 | CRUD operations, concurrency, foreign keys, indexes |
| **ImageProcessingService** | 11 | Download, detection integration, error handling |
| **ObjectDetectionService** | 10 | YOLO initialization, detection, confidence threshold |
| **NotificationService** | 19 | Desktop/Email, priority levels, special chars, formats |
| **TroostwijkMonitor** | 12 | Orchestration, monitoring, bid tracking, alerts |
| **Integration** | 10 | End-to-end workflows, multi-service coordination |
| **TOTAL** | **90** | **Complete system coverage** |
## Key Testing Patterns
### 1. Isolation Testing
Each component tested independently with mocks:
```java
mockDb = mock(DatabaseService.class);
mockDetector = mock(ObjectDetectionService.class);
service = new ImageProcessingService(mockDb, mockDetector);
```
### 2. Integration Testing
Components tested together for realistic scenarios:
```
db → imageProcessor → detector → notifier
```
### 3. Concurrency Testing
Thread safety verified with parallel operations:
```java
Thread t1 = new Thread(() -> db.upsertLot(...));
Thread t2 = new Thread(() -> db.upsertLot(...));
t1.start(); t2.start();
```
### 4. Error Handling
Graceful degradation tested throughout:
```java
assertDoesNotThrow(() -> service.process(invalidInput));
```
## Running the Tests
### Run All Tests
```bash
mvn test
```
### Run Specific Test Class
```bash
mvn test -Dtest=ScraperDataAdapterTest
mvn test -Dtest=IntegrationTest
```
### Run Single Test Method
```bash
mvn test -Dtest=IntegrationTest#testCompleteScraperImportWorkflow
```
### Generate Coverage Report
```bash
mvn jacoco:prepare-agent test jacoco:report
```
## Test Data Cleanup
All tests use temporary databases that are automatically cleaned up:
```java
@AfterAll
void tearDown() throws Exception {
    Files.deleteIfExists(Paths.get(testDbPath));
}
```
## Integration Scenarios Covered
### Scenario 1: New Auction Discovery
1. External scraper finds new auction
2. Data imported via ScraperDataAdapter
3. Lots added to database
4. Images downloaded
5. Object detection runs
6. Notification sent to user
### Scenario 2: Bid Monitoring
1. Monitor checks API every hour
2. Detects bid increase
3. Updates database
4. Sends notification
5. User can place counter-bid
### Scenario 3: Closing Alert
1. Monitor checks closing times
2. Lot closing in < 5 minutes
3. High-priority notification sent
4. Flag updated to prevent duplicates
5. User can place final bid
### Scenario 4: Value Estimation
1. Images downloaded
2. YOLO detects objects
3. Labels saved to database
4. Value estimated (future feature)
5. Notification sent with estimate
## Dependencies Required for Tests
```xml
<dependencies>
<!-- JUnit 5 -->
<dependency>
<groupId>org.junit.jupiter</groupId>
<artifactId>junit-jupiter</artifactId>
<version>5.10.0</version>
<scope>test</scope>
</dependency>
<!-- Mockito -->
<dependency>
<groupId>org.mockito</groupId>
<artifactId>mockito-core</artifactId>
<version>5.5.0</version>
<scope>test</scope>
</dependency>
<!-- Mockito JUnit Jupiter -->
<dependency>
<groupId>org.mockito</groupId>
<artifactId>mockito-junit-jupiter</artifactId>
<version>5.5.0</version>
<scope>test</scope>
</dependency>
</dependencies>
```
## Notes
- All tests are independent and can run in any order
- Tests use in-memory or temporary databases
- No actual HTTP requests made (except in integration tests)
- YOLO models are optional (tests work in disabled mode)
- Notifications are tested but may not display in headless environments
- Tests document expected behavior for each component
## Future Test Enhancements
1. **Mock HTTP Server** for realistic image download testing
2. **Test Containers** for full database integration
3. **Performance Tests** for large datasets (1000+ auctions)
4. **Stress Tests** for concurrent monitoring scenarios
5. **UI Tests** for notification display (if GUI added)
6. **API Tests** for Troostwijk API integration
7. **Value Estimation** tests (when algorithm implemented)

View File

@@ -1,537 +0,0 @@
# Troostwijk Auction Monitor - Workflow Integration Guide
Complete guide for running the auction monitoring system with scheduled workflows, cron jobs, and event-driven triggers.
---
## Table of Contents
1. [Overview](#overview)
2. [Running Modes](#running-modes)
3. [Workflow Orchestration](#workflow-orchestration)
4. [Windows Scheduling](#windows-scheduling)
5. [Event-Driven Triggers](#event-driven-triggers)
6. [Configuration](#configuration)
7. [Monitoring & Debugging](#monitoring--debugging)
---
## Overview
The Troostwijk Auction Monitor supports multiple execution modes:
- **Workflow Mode** (Recommended): Continuous operation with built-in scheduling
- **Once Mode**: Single execution for external schedulers (Windows Task Scheduler, cron)
- **Legacy Mode**: Original monitoring approach
- **Status Mode**: Quick status check
---
## Running Modes
### 1. Workflow Mode (Default - Recommended)
**Runs all workflows continuously with built-in scheduling.**
```bash
# Windows
java -jar target\troostwijk-scraper-1.0-SNAPSHOT-jar-with-dependencies.jar workflow
# Or simply (workflow is default)
java -jar target\troostwijk-scraper-1.0-SNAPSHOT-jar-with-dependencies.jar
# Using batch script
run-workflow.bat
```
**What it does:**
- ✅ Imports scraper data every 30 minutes
- ✅ Processes images every 1 hour
- ✅ Monitors bids every 15 minutes
- ✅ Checks closing times every 5 minutes
**Best for:**
- Production deployment
- Long-running services
- Development/testing
---
### 2. Once Mode (For External Schedulers)
**Runs complete workflow once and exits.**
```bash
# Windows
java -jar target\troostwijk-scraper-1.0-SNAPSHOT-jar-with-dependencies.jar once
# Using batch script
run-once.bat
```
**What it does:**
1. Imports scraper data
2. Processes pending images
3. Monitors bids
4. Checks closing times
5. Exits
**Best for:**
- Windows Task Scheduler
- Cron jobs (Linux/Mac)
- Manual execution
- Testing
---
### 3. Legacy Mode
**Original monitoring approach (backward compatibility).**
```bash
java -jar target\troostwijk-scraper-1.0-SNAPSHOT-jar-with-dependencies.jar legacy
```
**Best for:**
- Maintaining existing deployments
- Troubleshooting
---
### 4. Status Mode
**Shows current status and exits.**
```bash
java -jar target\troostwijk-scraper-1.0-SNAPSHOT-jar-with-dependencies.jar status
# Using batch script
check-status.bat
```
**Output:**
```
📊 Workflow Status:
Running: No
Auctions: 25
Lots: 150
Images: 300
Closing soon (< 30 min): 5
```
---
## Workflow Orchestration
The `WorkflowOrchestrator` coordinates 4 scheduled workflows:
### Workflow 1: Scraper Data Import
**Frequency:** Every 30 minutes
**Purpose:** Import new auctions and lots from external scraper
**Process:**
1. Import auctions from scraper database
2. Import lots from scraper database
3. Import image URLs
4. Send notification if significant data imported
**Code Location:** `WorkflowOrchestrator.java:110`
---
### Workflow 2: Image Processing
**Frequency:** Every 1 hour
**Purpose:** Download images and run object detection
**Process:**
1. Get unprocessed images from database
2. Download each image
3. Run YOLO object detection
4. Save labels to database
5. Send notification for interesting detections (3+ objects)
**Code Location:** `WorkflowOrchestrator.java:150`
---
### Workflow 3: Bid Monitoring
**Frequency:** Every 15 minutes
**Purpose:** Check for bid changes and send notifications
**Process:**
1. Get all active lots
2. Check for bid changes (via external scraper updates)
3. Send notifications for bid increases
**Code Location:** `WorkflowOrchestrator.java:210`
**Note:** The external scraper updates bids; this workflow monitors and notifies.
---
### Workflow 4: Closing Alerts
**Frequency:** Every 5 minutes
**Purpose:** Send alerts for lots closing soon
**Process:**
1. Get all active lots
2. Check closing times
3. Send high-priority notification for lots closing in < 5 min
4. Mark as notified to prevent duplicates
**Code Location:** `WorkflowOrchestrator.java:240`
---
## Windows Scheduling
### Option A: Use Built-in Workflow Mode (Recommended)
**Run as a Windows Service or startup application:**
1. Create shortcut to `run-workflow.bat`
2. Place in: `C:\Users\[YourUser]\AppData\Roaming\Microsoft\Windows\Start Menu\Programs\Startup`
3. Monitor will start automatically on login
---
### Option B: Windows Task Scheduler (Once Mode)
**Automated setup:**
```powershell
# Run PowerShell as Administrator
.\setup-windows-task.ps1
```
This creates two tasks:
- `TroostwijkMonitor-Workflow`: Runs every 30 minutes
- `TroostwijkMonitor-StatusCheck`: Runs every 6 hours
**Manual setup:**
1. Open Task Scheduler
2. Create Basic Task
3. Configure:
- **Name:** `TroostwijkMonitor`
- **Trigger:** Every 30 minutes
- **Action:** Start a program
- **Program:** `java`
- **Arguments:** `-jar "C:\path\to\troostwijk-scraper.jar" once`
- **Start in:** `C:\path\to\project`
---
### Option C: Multiple Scheduled Tasks (Fine-grained Control)
Create separate tasks for each workflow:
| Task | Frequency | Command |
|------|-----------|---------|
| Import Data | Every 30 min | `run-once.bat` |
| Process Images | Every 1 hour | `run-once.bat` |
| Check Bids | Every 15 min | `run-once.bat` |
| Closing Alerts | Every 5 min | `run-once.bat` |
---
## Event-Driven Triggers
The orchestrator supports event-driven execution:
### 1. New Auction Discovered
```java
orchestrator.onNewAuctionDiscovered(auctionInfo);
```
**Triggered when:**
- External scraper finds new auction
**Actions:**
- Insert to database
- Send notification
---
### 2. Bid Change Detected
```java
orchestrator.onBidChange(lot, previousBid, newBid);
```
**Triggered when:**
- Bid increases on monitored lot
**Actions:**
- Update database
- Send notification: "Nieuw bod op kavel X: €Y (was €Z)" (Dutch for "New bid on lot X: €Y (was €Z)")
---
### 3. Objects Detected
```java
orchestrator.onObjectsDetected(lotId, labels);
```
**Triggered when:**
- YOLO detects 2+ objects in image
**Actions:**
- Send notification: "Lot X contains: car, truck, machinery"
---
## Configuration
### Environment Variables
```bash
# Database location
set DATABASE_FILE=C:\mnt\okcomputer\output\cache.db
# Notification configuration
set NOTIFICATION_CONFIG=desktop
# Or for email notifications
set NOTIFICATION_CONFIG=smtp:your@gmail.com:app_password:recipient@example.com
```
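The `smtp:` value packs four colon-separated fields. A language-agnostic sketch of the parsing rule in Python, for reference only — the field names and returned dict shape are assumptions, not the monitor's actual configuration API:

```python
def parse_notification_config(value):
    """Split a NOTIFICATION_CONFIG value into a dict.

    'desktop'                    -> desktop notifications only
    'smtp:from:app_password:to'  -> email via SMTP
    Field names here are illustrative, not the monitor's real API.
    """
    if value.lower() == 'desktop':
        return {'type': 'desktop'}
    parts = value.split(':')
    if parts[0].lower() == 'smtp' and len(parts) == 4:
        return {'type': 'smtp', 'from': parts[1],
                'password': parts[2], 'to': parts[3]}
    raise ValueError(f"Unknown notification config: {value}")
```

Note that an app password containing a colon would break this simple split, which matches the "validate SMTP config parts count" behavior exercised in the test suite.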
### Configuration Files
**YOLO Model Paths** (`Main.java:35-37`):
```java
String yoloCfg = "models/yolov4.cfg";
String yoloWeights = "models/yolov4.weights";
String yoloClasses = "models/coco.names";
```
### Customizing Schedules
Edit `WorkflowOrchestrator.java` to change frequencies:
```java
// Change from 30 minutes to 15 minutes
scheduler.scheduleAtFixedRate(() -> {
    // ... scraper import logic
}, 0, 15, TimeUnit.MINUTES);  // Changed from 30
```
---
## Monitoring & Debugging
### Check Status
```bash
# Quick status check
java -jar troostwijk-monitor.jar status
# Or
check-status.bat
```
### View Logs
Workflows print timestamped logs:
```
📥 [WORKFLOW 1] Importing scraper data...
→ Imported 5 auctions
→ Imported 25 lots
→ Found 50 unprocessed images
✓ Scraper import completed in 1250ms
🖼️ [WORKFLOW 2] Processing pending images...
→ Processing 50 images
✓ Processed 50 images, detected objects in 12 (15.3s)
```
### Common Issues
#### 1. No data being imported
**Problem:** External scraper not running
**Solution:**
```bash
# Check if scraper is running and populating database
sqlite3 C:\mnt\okcomputer\output\cache.db "SELECT COUNT(*) FROM auctions;"
```
#### 2. Images not downloading
**Problem:** No internet connection or invalid URLs
**Solution:**
- Check network connectivity
- Verify image URLs in database
- Check firewall settings
#### 3. Notifications not showing
**Problem:** System tray not available
**Solution:**
- Use email notifications instead
- Check notification permissions in Windows
#### 4. Workflows not running
**Problem:** Application crashed or was stopped
**Solution:**
- Check Task Scheduler logs
- Review application logs
- Restart in workflow mode
---
## Integration Examples
### Example 1: Complete Automated Workflow
**Setup:**
1. External scraper runs continuously, populating database
2. This monitor runs in workflow mode
3. Notifications sent to desktop + email
**Result:**
- New auctions → Notification within 30 min
- New images → Processed within 1 hour
- Bid changes → Notification within 15 min
- Closing alerts → Notification within 5 min
---
### Example 2: On-Demand Processing
**Setup:**
1. External scraper runs once per day (cron/Task Scheduler)
2. This monitor runs in once mode after scraper completes
**Script:**
```bash
# run-daily.bat
@echo off
REM Run scraper first
python scraper.py
REM Wait for completion
timeout /t 30
REM Run monitor once
java -jar troostwijk-monitor.jar once
```
---
### Example 3: Event-Driven with External Integration
**Setup:**
1. External system calls orchestrator events
2. Workflows run on-demand
**Java code:**
```java
WorkflowOrchestrator orchestrator = new WorkflowOrchestrator(...);
// When external scraper finds new auction
AuctionInfo newAuction = parseScraperData();
orchestrator.onNewAuctionDiscovered(newAuction);
// When bid detected
orchestrator.onBidChange(lot, 100.0, 150.0);
```
---
## Advanced Topics
### Custom Workflows
Add custom workflows to `WorkflowOrchestrator`:
```java
// Workflow 5: Value Estimation (every 2 hours)
scheduler.scheduleAtFixedRate(() -> {
    try {
        Console.println("💰 [WORKFLOW 5] Estimating values...");
        var lotsWithImages = db.getLotsWithImages();
        for (var lot : lotsWithImages) {
            var images = db.getImagesForLot(lot.lotId());
            double estimatedValue = estimateValue(images);

            // Update database
            db.updateLotEstimatedValue(lot.lotId(), estimatedValue);

            // Notify if high value
            if (estimatedValue > 5000) {
                notifier.sendNotification(
                    String.format("High value lot detected: %s (€%.2f)",
                        lot.lotId(), estimatedValue),
                    "Value Alert", 1
                );
            }
        }
    } catch (Exception e) {
        Console.println("  ❌ Value estimation failed: " + e.getMessage());
    }
}, 10, 120, TimeUnit.MINUTES);
```
### Webhook Integration
Trigger workflows via HTTP webhooks:
```java
// In a separate web server (e.g., using Javalin)
Javalin app = Javalin.create().start(7070);

app.post("/webhook/new-auction", ctx -> {
    AuctionInfo auction = ctx.bodyAsClass(AuctionInfo.class);
    orchestrator.onNewAuctionDiscovered(auction);
    ctx.result("OK");
});

app.post("/webhook/bid-change", ctx -> {
    BidChange change = ctx.bodyAsClass(BidChange.class);
    orchestrator.onBidChange(change.lot, change.oldBid, change.newBid);
    ctx.result("OK");
});
```
---
## Summary
| Mode | Use Case | Scheduling | Best For |
|------|----------|------------|----------|
| **workflow** | Continuous operation | Built-in (Java) | Production, development |
| **once** | Single execution | External (Task Scheduler) | Cron jobs, on-demand |
| **legacy** | Backward compatibility | Built-in (Java) | Existing deployments |
| **status** | Quick check | Manual/External | Health checks, debugging |
**Recommended Setup for Windows:**
1. Install as Windows Service OR
2. Add to Startup folder (workflow mode) OR
3. Use Task Scheduler (once mode, every 30 min)
**All workflows automatically:**
- Import data from scraper
- Process images
- Detect objects
- Monitor bids
- Send notifications
- Handle errors gracefully
---
## Support
For issues or questions:
- Check `TEST_SUITE_SUMMARY.md` for test coverage
- Review code in `WorkflowOrchestrator.java`
- Run `java -jar troostwijk-monitor.jar status` for diagnostics