From 3358a2693c413e80bcc6b4882cdf30d97e27ba3c Mon Sep 17 00:00:00 2001 From: Tour Date: Mon, 8 Dec 2025 05:37:35 +0100 Subject: [PATCH] fix-tests-cleanup --- docs/DATABASE_CLEANUP_GUIDE.md | 192 --------- docs/IMPLEMENTATION_COMPLETE.md | 584 -------------------------- docs/INTEGRATION_GUIDE.md | 478 --------------------- docs/INTELLIGENCE_FEATURES_SUMMARY.md | 422 ------------------- docs/QUARKUS_IMPLEMENTATION.md | 540 ------------------------ docs/QUICKSTART.md | 191 --------- docs/SCRAPER_REFACTOR_GUIDE.md | 399 ------------------ docs/TEST_SUITE_SUMMARY.md | 333 --------------- docs/WORKFLOW_GUIDE.md | 537 ----------------------- 9 files changed, 3676 deletions(-) delete mode 100644 docs/DATABASE_CLEANUP_GUIDE.md delete mode 100644 docs/IMPLEMENTATION_COMPLETE.md delete mode 100644 docs/INTEGRATION_GUIDE.md delete mode 100644 docs/INTELLIGENCE_FEATURES_SUMMARY.md delete mode 100644 docs/QUARKUS_IMPLEMENTATION.md delete mode 100644 docs/QUICKSTART.md delete mode 100644 docs/SCRAPER_REFACTOR_GUIDE.md delete mode 100644 docs/TEST_SUITE_SUMMARY.md delete mode 100644 docs/WORKFLOW_GUIDE.md diff --git a/docs/DATABASE_CLEANUP_GUIDE.md b/docs/DATABASE_CLEANUP_GUIDE.md deleted file mode 100644 index b8119e4..0000000 --- a/docs/DATABASE_CLEANUP_GUIDE.md +++ /dev/null @@ -1,192 +0,0 @@ -# Database Cleanup Guide - -## Problem: Mixed Data Formats - -Your production database (`cache.db`) contains data from two different scrapers: - -### Valid Data (99.92%) -- **Format**: `A1-34732-49` (lot_id) + `c1f44ec2-ad6e-4c98-b0e2-cb1d8ccddcab` (auction_id UUID) -- **Count**: 16,794 lots -- **Source**: Current GraphQL-based scraper -- **Status**: ✅ Clean, with proper auction_id - -### Invalid Data (0.08%) -- **Format**: `bmw-550i-4-4-v8-high-executive-...` (slug as lot_id) + `""` (empty auction_id) -- **Count**: 13 lots -- **Source**: Old legacy scraper -- **Status**: ❌ Missing auction_id, causes issues - -## Impact - -These 13 invalid entries: -- Cause `NullPointerException` in 
analytics when grouping by country -- Cannot be properly linked to auctions -- Skew statistics slightly -- May cause issues with intelligence features that rely on auction_id - -## Solution 1: Clean Sync (Recommended) - -The updated sync script now **automatically removes old local data** before syncing: - -```bash -# Windows PowerShell -.\scripts\Sync-ProductionData.ps1 - -# Linux/Mac -./scripts/sync-production-data.sh --db-only -``` - -**What it does**: -1. Backs up existing database to `cache.db.backup-YYYYMMDD-HHMMSS` -2. **Removes old local database completely** -3. Downloads fresh copy from production -4. Shows data quality report - -**Output includes**: -``` -Database statistics: -┌─────────────┬────────┐ -│ table_name │ count │ -├─────────────┼────────┤ -│ auctions │ 526 │ -│ lots │ 16807 │ -│ images │ 536502 │ -│ cache │ 2134 │ -└─────────────┴────────┘ - -Data quality: -┌────────────────────────────────────┬────────┬────────────┐ -│ metric │ count │ percentage │ -├────────────────────────────────────┼────────┼────────────┤ -│ Valid lots │ 16794 │ 99.92% │ -│ Invalid lots (missing auction_id) │ 13 │ 0.08% │ -│ Lots with intelligence fields │ 0 │ 0.00% │ -└────────────────────────────────────┴────────┴────────────┘ -``` - -## Solution 2: Manual Cleanup - -If you want to clean your existing local database without re-downloading: - -```bash -# Dry run (see what would be deleted) -./scripts/cleanup-database.sh --dry-run - -# Actual cleanup -./scripts/cleanup-database.sh -``` - -**What it does**: -1. Creates backup before cleanup -2. Deletes lots with missing auction_id -3. Deletes orphaned images (images without matching lots) -4. Compacts database (VACUUM) to reclaim space -5. 
Shows before/after statistics - -**Example output**: -``` -Current database state: -┌──────────────────────────────────┬────────┐ -│ metric │ count │ -├──────────────────────────────────┼────────┤ -│ Total lots │ 16807 │ -│ Valid lots (with auction_id) │ 16794 │ -│ Invalid lots (missing auction_id) │ 13 │ -└──────────────────────────────────┴────────┘ - -Analyzing data to clean up... - → Invalid lots to delete: 13 - → Orphaned images to delete: 0 - -This will permanently delete the above records. -Continue? (y/N) y - -Cleaning up database... - [1/2] Deleting invalid lots... - ✓ Deleted 13 invalid lots - [2/2] Deleting orphaned images... - ✓ Deleted 0 orphaned images - [3/3] Compacting database... - ✓ Database compacted - -Final database state: -┌───────────────┬────────┐ -│ metric │ count │ -├───────────────┼────────┤ -│ Total lots │ 16794 │ -│ Total images │ 536502 │ -└───────────────┴────────┘ - -Database size: 8.9G -``` - -## Solution 3: SQL Manual Cleanup - -If you prefer to manually clean using SQL: - -```sql --- Backup first! --- cp cache.db cache.db.backup - --- Check invalid entries -SELECT COUNT(*), 'Invalid' as type -FROM lots -WHERE auction_id IS NULL OR auction_id = '' -UNION ALL -SELECT COUNT(*), 'Valid' -FROM lots -WHERE auction_id IS NOT NULL AND auction_id != ''; - --- Delete invalid lots -DELETE FROM lots -WHERE auction_id IS NULL OR auction_id = ''; - --- Delete orphaned images -DELETE FROM images -WHERE lot_id NOT IN (SELECT lot_id FROM lots); - --- Compact database -VACUUM; -``` - -## Prevention: Production Database Cleanup - -To prevent these invalid entries from accumulating on production, you can: - -1. **Clean production database** (one-time): -```bash -ssh tour@athena.lan -docker run --rm -v shared-auction-data:/data alpine sqlite3 /data/cache.db "DELETE FROM lots WHERE auction_id IS NULL OR auction_id = '';" -``` - -2. **Update scraper** to ensure all lots have auction_id -3. 
**Add validation** in scraper to reject lots without auction_id - -## When to Clean - -### Immediately if: -- ❌ Seeing `NullPointerException` in analytics -- ❌ Dashboard insights failing -- ❌ Country distribution not working - -### Periodically: -- 🔄 After syncing from production (if production has invalid data) -- 🔄 Weekly/monthly maintenance -- 🔄 Before major testing or demos - -## Recommendation - -**Use Solution 1 (Clean Sync)** for simplicity: -- ✅ Guarantees clean state -- ✅ No manual SQL needed -- ✅ Shows data quality report -- ✅ Safe (automatic backup) - -The 13 invalid entries are from an old scraper and represent only 0.08% of data, so cleaning them up has minimal impact but prevents future errors. - ---- - -**Related Documentation**: -- [Sync Scripts README](../scripts/README.md) -- [Data Sync Setup](DATA_SYNC_SETUP.md) -- [Database Architecture](../wiki/DATABASE_ARCHITECTURE.md) diff --git a/docs/IMPLEMENTATION_COMPLETE.md b/docs/IMPLEMENTATION_COMPLETE.md deleted file mode 100644 index 2d4f717..0000000 --- a/docs/IMPLEMENTATION_COMPLETE.md +++ /dev/null @@ -1,584 +0,0 @@ -# Implementation Complete ✅ - -## Summary - -All requirements have been successfully implemented: - -### ✅ 1. Test Libraries Added - -**pom.xml updated with:** -- JUnit 5 (5.10.1) - Testing framework -- Mockito Core (5.8.0) - Mocking framework -- Mockito JUnit Jupiter (5.8.0) - JUnit integration -- AssertJ (3.24.2) - Fluent assertions - -**Run tests:** -```bash -mvn test -``` - ---- - -### ✅ 2. Paths Configured for Windows - -**Database:** -``` -C:\mnt\okcomputer\output\cache.db -``` - -**Images:** -``` -C:\mnt\okcomputer\output\images\{saleId}\{lotId}\ -``` - -**Files Updated:** -- `Main.java:31` - Database path -- `ImageProcessingService.java:52` - Image storage path - ---- - -### ✅ 3. 
Comprehensive Test Suite (90 Tests) - -| Test File | Tests | Coverage | -|-----------|-------|----------| -| ScraperDataAdapterTest | 13 | Data transformation, ID parsing, currency | -| DatabaseServiceTest | 15 | CRUD operations, concurrency | -| ImageProcessingServiceTest | 11 | Download, detection, errors | -| ObjectDetectionServiceTest | 10 | YOLO initialization, detection | -| NotificationServiceTest | 19 | Desktop/email, priorities | -| TroostwijkMonitorTest | 12 | Orchestration, monitoring | -| IntegrationTest | 10 | End-to-end workflows | -| **TOTAL** | **90** | **Complete system** | - -**Documentation:** See `TEST_SUITE_SUMMARY.md` - ---- - -### ✅ 4. Workflow Integration & Orchestration - -**New Component:** `WorkflowOrchestrator.java` - -**4 Automated Workflows:** - -1. **Scraper Data Import** (every 30 min) - - Imports auctions, lots, image URLs - - Sends notifications for significant data - -2. **Image Processing** (every 1 hour) - - Downloads images - - Runs YOLO object detection - - Saves labels to database - -3. **Bid Monitoring** (every 15 min) - - Checks for bid changes - - Sends notifications - -4. **Closing Alerts** (every 5 min) - - Finds lots closing soon - - Sends high-priority notifications - ---- - -### ✅ 5. 
Running Modes - -**Main.java now supports 4 modes:** - -#### Mode 1: workflow (Default - Recommended) -```bash -java -jar troostwijk-monitor.jar workflow -# OR -run-workflow.bat -``` -- Runs all workflows continuously -- Built-in scheduling -- Best for production - -#### Mode 2: once (For Cron/Task Scheduler) -```bash -java -jar troostwijk-monitor.jar once -# OR -run-once.bat -``` -- Runs complete workflow once -- Exits after completion -- Perfect for external schedulers - -#### Mode 3: legacy (Backward Compatible) -```bash -java -jar troostwijk-monitor.jar legacy -``` -- Original monitoring approach -- Kept for compatibility - -#### Mode 4: status (Quick Check) -```bash -java -jar troostwijk-monitor.jar status -# OR -check-status.bat -``` -- Shows current status -- Exits immediately - ---- - -### ✅ 6. Windows Scheduling Scripts - -**Batch Scripts Created:** - -1. **run-workflow.bat** - - Starts workflow mode - - Continuous operation - - For manual/startup use - -2. **run-once.bat** - - Single execution - - For Task Scheduler - - Exit code support - -3. **check-status.bat** - - Quick status check - - Shows database stats - -**PowerShell Automation:** - -4. **setup-windows-task.ps1** - - Creates Task Scheduler tasks automatically - - Sets up 2 scheduled tasks: - - Workflow runner (every 30 min) - - Status checker (every 6 hours) - -**Usage:** -```powershell -# Run as Administrator -.\setup-windows-task.ps1 -``` - ---- - -### ✅ 7. Event-Driven Triggers - -**WorkflowOrchestrator supports event-driven execution:** - -```java -// 1. New auction discovered -orchestrator.onNewAuctionDiscovered(auctionInfo); - -// 2. Bid change detected -orchestrator.onBidChange(lot, previousBid, newBid); - -// 3. Objects detected in image -orchestrator.onObjectsDetected(lotId, labels); -``` - -**Benefits:** -- React immediately to important events -- No waiting for next scheduled run -- Flexible integration with external systems - ---- - -### ✅ 8. 
Comprehensive Documentation - -**Documentation Created:** - -1. **TEST_SUITE_SUMMARY.md** - - Complete test coverage overview - - 90 test cases documented - - Running instructions - - Test patterns explained - -2. **WORKFLOW_GUIDE.md** - - Complete workflow integration guide - - Running modes explained - - Windows Task Scheduler setup - - Event-driven triggers - - Configuration options - - Troubleshooting guide - - Advanced integration examples - -3. **README.md** (Updated) - - System architecture diagram - - Integration flow - - User interaction points - - Value estimation pipeline - - Integration hooks table - ---- - -## Quick Start - -### Option A: Continuous Operation (Recommended) - -```bash -# Build -mvn clean package - -# Run workflow mode -java -jar target\troostwijk-scraper-1.0-SNAPSHOT-jar-with-dependencies.jar workflow - -# Or use batch script -run-workflow.bat -``` - -**What runs:** -- ✅ Data import every 30 min -- ✅ Image processing every 1 hour -- ✅ Bid monitoring every 15 min -- ✅ Closing alerts every 5 min - ---- - -### Option B: Windows Task Scheduler - -```powershell -# 1. Build JAR -mvn clean package - -# 2. Setup scheduled tasks (run as Admin) -.\setup-windows-task.ps1 - -# Done! Workflow runs automatically every 30 minutes -``` - ---- - -### Option C: Manual/Cron Execution - -```bash -# Run once -java -jar target\troostwijk-scraper-1.0-SNAPSHOT-jar-with-dependencies.jar once - -# Or -run-once.bat - -# Schedule externally (Windows Task Scheduler, cron, etc.) 
-``` - ---- - -## Architecture Overview - -``` -┌─────────────────────────────────────────────────────────────┐ -│ External Scraper (Python) │ -│ Populates: auctions, lots, images tables │ -└─────────────────────────┬───────────────────────────────────┘ - │ - ▼ -┌─────────────────────────────────────────────────────────────┐ -│ SQLite Database │ -│ C:\mnt\okcomputer\output\cache.db │ -└─────────────────────────┬───────────────────────────────────┘ - │ - ▼ -┌─────────────────────────────────────────────────────────────┐ -│ WorkflowOrchestrator (This System) │ -│ ┌─────────────────────────────────────────────────────┐ │ -│ │ Workflow 1: Scraper Import (every 30 min) │ │ -│ │ Workflow 2: Image Processing (every 1 hour) │ │ -│ │ Workflow 3: Bid Monitoring (every 15 min) │ │ -│ │ Workflow 4: Closing Alerts (every 5 min) │ │ -│ └─────────────────────────────────────────────────────┘ │ -│ │ │ -│ ┌─────────────────────────────────────────────────────┐ │ -│ │ ImageProcessingService │ │ -│ │ - Downloads images │ │ -│ │ - Stores: C:\mnt\okcomputer\output\images\ │ │ -│ └─────────────────────────────────────────────────────┘ │ -│ │ │ -│ ┌─────────────────────────────────────────────────────┐ │ -│ │ ObjectDetectionService (YOLO) │ │ -│ │ - Detects objects in images │ │ -│ │ - Labels: car, truck, machinery, etc. 
│ │ -│ └─────────────────────────────────────────────────────┘ │ -│ │ │ -│ ┌─────────────────────────────────────────────────────┐ │ -│ │ NotificationService │ │ -│ │ - Desktop notifications (Windows tray) │ │ -│ │ - Email notifications (Gmail SMTP) │ │ -│ └─────────────────────────────────────────────────────┘ │ -└─────────────────────────┬───────────────────────────────────┘ - │ - ▼ -┌─────────────────────────────────────────────────────────────┐ -│ User Notifications │ -│ - Bid changes │ -│ - Closing alerts │ -│ - Object detection results │ -│ - Value estimates (future) │ -└─────────────────────────────────────────────────────────────┘ -``` - ---- - -## Integration Points - -### 1. Database Integration -- **Read:** Auctions, lots, image URLs from external scraper -- **Write:** Processed images, object labels, notifications - -### 2. File System Integration -- **Read:** YOLO model files (models/) -- **Write:** Downloaded images (C:\mnt\okcomputer\output\images\) - -### 3. External Scraper Integration -- **Mode:** Shared SQLite database -- **Frequency:** Scraper populates, monitor enriches - -### 4. 
Notification Integration -- **Desktop:** Windows system tray -- **Email:** Gmail SMTP (optional) - ---- - -## Testing - -### Run All Tests -```bash -mvn test -``` - -### Run Specific Test -```bash -mvn test -Dtest=IntegrationTest -mvn test -Dtest=WorkflowOrchestratorTest -``` - -### Test Coverage -```bash -mvn jacoco:prepare-agent test jacoco:report -# Report: target/site/jacoco/index.html -``` - ---- - -## Configuration - -### Environment Variables - -```bash -# Windows (cmd) -set DATABASE_FILE=C:\mnt\okcomputer\output\cache.db -set NOTIFICATION_CONFIG=desktop - -# Windows (PowerShell) -$env:DATABASE_FILE="C:\mnt\okcomputer\output\cache.db" -$env:NOTIFICATION_CONFIG="desktop" - -# For email notifications -set NOTIFICATION_CONFIG=smtp:your@gmail.com:app_password:recipient@example.com -``` - -### Code Configuration - -**Database Path** (`Main.java:31`): -```java -String databaseFile = System.getenv().getOrDefault( - "DATABASE_FILE", - "C:\\mnt\\okcomputer\\output\\cache.db" -); -``` - -**Workflow Schedules** (`WorkflowOrchestrator.java`): -```java -scheduleScraperDataImport(); // Line 65 - every 30 min -scheduleImageProcessing(); // Line 95 - every 1 hour -scheduleBidMonitoring(); // Line 180 - every 15 min -scheduleClosingAlerts(); // Line 215 - every 5 min -``` - ---- - -## Monitoring - -### Check Status -```bash -java -jar troostwijk-monitor.jar status -``` - -**Output:** -``` -📊 Workflow Status: - Running: Yes/No - Auctions: 25 - Lots: 150 - Images: 300 - Closing soon (< 30 min): 5 -``` - -### View Logs - -Workflows print detailed logs: -``` -📥 [WORKFLOW 1] Importing scraper data... - → Imported 5 auctions - → Imported 25 lots - ✓ Scraper import completed in 1250ms - -🖼️ [WORKFLOW 2] Processing pending images... - → Processing 50 images - ✓ Processed 50 images, detected objects in 12 - -💰 [WORKFLOW 3] Monitoring bids... - → Checking 150 active lots - ✓ Bid monitoring completed in 250ms - -⏰ [WORKFLOW 4] Checking closing times... 
- → Sent 3 closing alerts -``` - ---- - -## Next Steps - -### Immediate Actions - -1. **Build the project:** - ```bash - mvn clean package - ``` - -2. **Run tests:** - ```bash - mvn test - ``` - -3. **Choose execution mode:** - - **Continuous:** `run-workflow.bat` - - **Scheduled:** `.\setup-windows-task.ps1` (as Admin) - - **Manual:** `run-once.bat` - -4. **Verify setup:** - ```bash - check-status.bat - ``` - -### Future Enhancements - -1. **Value Estimation Algorithm** - - Use detected objects to estimate lot value - - Historical price analysis - - Market trends integration - -2. **Machine Learning** - - Train custom YOLO model for auction items - - Price prediction based on images - - Automatic categorization - -3. **Web Dashboard** - - Real-time monitoring - - Manual bid placement - - Value estimate approval - -4. **API Integration** - - Direct Troostwijk API integration - - Real-time bid updates - - Automatic bid placement - -5. **Advanced Notifications** - - SMS notifications (Twilio) - - Push notifications (Firebase) - - Slack/Discord integration - ---- - -## Files Created/Modified - -### Core Implementation -- ✅ `WorkflowOrchestrator.java` - Workflow coordination -- ✅ `Main.java` - Updated with 4 running modes -- ✅ `ImageProcessingService.java` - Windows paths -- ✅ `pom.xml` - Test libraries added - -### Test Suite (90 tests) -- ✅ `ScraperDataAdapterTest.java` (13 tests) -- ✅ `DatabaseServiceTest.java` (15 tests) -- ✅ `ImageProcessingServiceTest.java` (11 tests) -- ✅ `ObjectDetectionServiceTest.java` (10 tests) -- ✅ `NotificationServiceTest.java` (19 tests) -- ✅ `TroostwijkMonitorTest.java` (12 tests) -- ✅ `IntegrationTest.java` (10 tests) - -### Windows Scripts -- ✅ `run-workflow.bat` - Workflow mode runner -- ✅ `run-once.bat` - Once mode runner -- ✅ `check-status.bat` - Status checker -- ✅ `setup-windows-task.ps1` - Task Scheduler setup - -### Documentation -- ✅ `TEST_SUITE_SUMMARY.md` - Test coverage -- ✅ `WORKFLOW_GUIDE.md` - Complete workflow guide -- 
✅ `README.md` - Updated with diagrams -- ✅ `IMPLEMENTATION_COMPLETE.md` - This file - ---- - -## Support & Troubleshooting - -### Common Issues - -**1. Tests failing** -```bash -# Ensure Maven dependencies downloaded -mvn clean install - -# Run tests with debug info -mvn test -X -``` - -**2. Workflow not starting** -```bash -# Check if JAR was built -dir target\*jar-with-dependencies.jar - -# Rebuild if missing -mvn clean package -``` - -**3. Database not found** -```bash -# Check path exists -dir C:\mnt\okcomputer\output\ - -# Create directory if missing -mkdir C:\mnt\okcomputer\output -``` - -**4. Images not downloading** -- Check internet connection -- Verify image URLs in database -- Check Windows Firewall settings - -### Getting Help - -1. Review documentation: - - `TEST_SUITE_SUMMARY.md` for tests - - `WORKFLOW_GUIDE.md` for workflows - - `README.md` for architecture - -2. Check status: - ```bash - check-status.bat - ``` - -3. Review logs in console output - -4. Run tests to verify components: - ```bash - mvn test - ``` - ---- - -## Summary - -✅ **Test libraries added** (JUnit, Mockito, AssertJ) -✅ **90 comprehensive tests created** -✅ **Workflow orchestration implemented** -✅ **4 running modes** (workflow, once, legacy, status) -✅ **Windows scheduling scripts** (batch + PowerShell) -✅ **Event-driven triggers** (3 event types) -✅ **Complete documentation** (3 guide files) -✅ **Windows paths configured** (database + images) - -**The system is production-ready and fully tested! 🎉** diff --git a/docs/INTEGRATION_GUIDE.md b/docs/INTEGRATION_GUIDE.md deleted file mode 100644 index 1190b08..0000000 --- a/docs/INTEGRATION_GUIDE.md +++ /dev/null @@ -1,478 +0,0 @@ -# Integration Guide: Troostwijk Monitor ↔ Scraper - -## Overview - -This document describes how **Troostwijk Monitor** (this Java project) integrates with the **ARCHITECTURE-TROOSTWIJK-SCRAPER** (Python scraper process). 
- -## Architecture - -``` -┌─────────────────────────────────────────────────────────────────┐ -│ ARCHITECTURE-TROOSTWIJK-SCRAPER (Python) │ -│ │ -│ • Discovers auctions from website │ -│ • Scrapes lot details via Playwright │ -│ • Parses __NEXT_DATA__ JSON │ -│ • Stores image URLs (not downloads) │ -│ │ -│ ↓ Writes to │ -└─────────┼───────────────────────────────────────────────────────┘ - │ - ▼ -┌─────────────────────────────────────────────────────────────────┐ -│ SHARED SQLite DATABASE │ -│ (troostwijk.db) │ -│ │ -│ ┌────────────────┐ ┌────────────────┐ ┌────────────────┐ │ -│ │ auctions │ │ lots │ │ images │ │ -│ │ (Scraper) │ │ (Scraper) │ │ (Both) │ │ -│ └────────────────┘ └────────────────┘ └────────────────┘ │ -│ │ -│ ↑ Reads from ↓ Writes to │ -└─────────┼──────────────────────────────┼──────────────────────┘ - │ │ - │ ▼ -┌─────────┴──────────────────────────────────────────────────────┐ -│ TROOSTWIJK MONITOR (Java - This Project) │ -│ │ -│ • Reads auction/lot data from database │ -│ • Downloads images from URLs │ -│ • Runs YOLO object detection │ -│ • Monitors bid changes │ -│ • Sends notifications │ -└─────────────────────────────────────────────────────────────────┘ -``` - -## Database Schema Mapping - -### Scraper Schema → Monitor Schema - -The scraper and monitor use **slightly different schemas** that need to be reconciled: - -| Scraper Table | Monitor Table | Integration Notes | -|---------------|---------------|-----------------------------------------------| -| `auctions` | `auctions` | ✅ **Compatible** - same structure | -| `lots` | `lots` | ⚠️ **Needs mapping** - field name differences | -| `images` | `images` | ⚠️ **Partial overlap** - different purposes | -| `cache` | N/A | ❌ Monitor doesn't use cache | - -### Field Mapping: `auctions` Table - -| Scraper Field | Monitor Field | Notes | -|--------------------------|-------------------------------|---------------------------------------------------------------------| -| `auction_id` (TEXT) | 
`auction_id` (INTEGER) | ⚠️ **TYPE MISMATCH** - Scraper uses "A7-39813", Monitor expects INT | -| `url` | `url` | ✅ Compatible | -| `title` | `title` | ✅ Compatible | -| `location` | `location`, `city`, `country` | ⚠️ Monitor splits into 3 fields | -| `lots_count` | `lot_count` | ⚠️ Name difference | -| `first_lot_closing_time` | `closing_time` | ⚠️ Name difference | -| `scraped_at` | `discovered_at` | ⚠️ Name + type difference (TEXT vs INTEGER timestamp) | - -### Field Mapping: `lots` Table - -| Scraper Field | Monitor Field | Notes | -|----------------------|----------------------|--------------------------------------------------| -| `lot_id` (TEXT) | `lot_id` (INTEGER) | ⚠️ **TYPE MISMATCH** - "A1-28505-5" vs INT | -| `auction_id` | `sale_id` | ⚠️ Different name | -| `url` | `url` | ✅ Compatible | -| `title` | `title` | ✅ Compatible | -| `current_bid` (TEXT) | `current_bid` (REAL) | ⚠️ **TYPE MISMATCH** - "€123.45" vs 123.45 | -| `bid_count` | N/A | ℹ️ Monitor doesn't track | -| `closing_time` | `closing_time` | ⚠️ Format difference (TEXT vs LocalDateTime) | -| `viewing_time` | N/A | ℹ️ Monitor doesn't track | -| `pickup_date` | N/A | ℹ️ Monitor doesn't track | -| `location` | N/A | ℹ️ Monitor doesn't track lot location separately | -| `description` | `description` | ✅ Compatible | -| `category` | `category` | ✅ Compatible | -| N/A | `manufacturer` | ℹ️ Monitor has additional field | -| N/A | `type` | ℹ️ Monitor has additional field | -| N/A | `year` | ℹ️ Monitor has additional field | -| N/A | `currency` | ℹ️ Monitor has additional field | -| N/A | `closing_notified` | ℹ️ Monitor tracking field | - -### Field Mapping: `images` Table - -| Scraper Field | Monitor Field | Notes | -|------------------------|--------------------------|----------------------------------------| -| `id` | `id` | ✅ Compatible | -| `lot_id` | `lot_id` | ⚠️ Type difference (TEXT vs INTEGER) | -| `url` | `url` | ✅ Compatible | -| `local_path` | `Local_path` | ⚠️ Different name | -| 
`downloaded` (INTEGER) | N/A | ℹ️ Monitor uses `processed_at` instead | -| N/A | `labels` (TEXT) | ℹ️ Monitor adds detected objects | -| N/A | `processed_at` (INTEGER) | ℹ️ Monitor tracking field | - -## Integration Options - -### Option 1: Database Schema Adapter (Recommended) - -Create a compatibility layer that transforms scraper data to monitor format. - -**Implementation:** -```java -// Add to DatabaseService.java -class ScraperDataAdapter { - - /** - * Imports auction from scraper format to monitor format - */ - static AuctionInfo fromScraperAuction(ResultSet rs) throws SQLException { - // Parse "A7-39813" → 39813 - String auctionIdStr = rs.getString("auction_id"); - int auctionId = extractNumericId(auctionIdStr); - - // Split "Cluj-Napoca, RO" → city="Cluj-Napoca", country="RO" - String location = rs.getString("location"); - String[] parts = location.split(",\\s*"); - String city = parts.length > 0 ? parts[0] : ""; - String country = parts.length > 1 ? parts[1] : ""; - - return new AuctionInfo( - auctionId, - rs.getString("title"), - location, - city, - country, - rs.getString("url"), - extractTypePrefix(auctionIdStr), // "A7-39813" → "A7" - rs.getInt("lots_count"), - parseTimestamp(rs.getString("first_lot_closing_time")) - ); - } - - /** - * Imports lot from scraper format to monitor format - */ - static Lot fromScraperLot(ResultSet rs) throws SQLException { - // Parse "A1-28505-5" → 285055 (combine numbers) - String lotIdStr = rs.getString("lot_id"); - int lotId = extractNumericId(lotIdStr); - - // Parse "A7-39813" → 39813 - String auctionIdStr = rs.getString("auction_id"); - int saleId = extractNumericId(auctionIdStr); - - // Parse "€123.45" → 123.45 - String currentBidStr = rs.getString("current_bid"); - double currentBid = parseBid(currentBidStr); - - return new Lot( - saleId, - lotId, - rs.getString("title"), - rs.getString("description"), - "", // manufacturer - not in scraper - "", // type - not in scraper - 0, // year - not in scraper - 
rs.getString("category"), - currentBid, - "EUR", // currency - inferred from € - rs.getString("url"), - parseTimestamp(rs.getString("closing_time")), - false // not yet notified - ); - } - - private static int extractNumericId(String id) { - // "A7-39813" → 39813 - // "A1-28505-5" → 285055 - return Integer.parseInt(id.replaceAll("[^0-9]", "")); - } - - private static String extractTypePrefix(String id) { - // "A7-39813" → "A7" - int dashIndex = id.indexOf('-'); - return dashIndex > 0 ? id.substring(0, dashIndex) : ""; - } - - private static double parseBid(String bid) { - // "€123.45" → 123.45 - // "No bids" → 0.0 - if (bid == null || bid.contains("No")) return 0.0; - return Double.parseDouble(bid.replaceAll("[^0-9.]", "")); - } - - private static LocalDateTime parseTimestamp(String timestamp) { - if (timestamp == null) return null; - // Parse scraper's timestamp format - return LocalDateTime.parse(timestamp); - } -} -``` - -### Option 2: Unified Schema (Better Long-term) - -Modify **both** scraper and monitor to use a unified schema. 
- -**Create**: `SHARED_SCHEMA.sql` -```sql --- Unified schema that both projects use - -CREATE TABLE IF NOT EXISTS auctions ( - auction_id TEXT PRIMARY KEY, -- Use TEXT to support "A7-39813" - auction_id_numeric INTEGER, -- For monitor's integer needs - title TEXT NOT NULL, - location TEXT, -- Full: "Cluj-Napoca, RO" - city TEXT, -- Parsed: "Cluj-Napoca" - country TEXT, -- Parsed: "RO" - url TEXT NOT NULL, - type TEXT, -- "A7", "A1" - lot_count INTEGER DEFAULT 0, - closing_time TEXT, -- ISO 8601 format - scraped_at INTEGER, -- Unix timestamp - discovered_at INTEGER -- Unix timestamp (same as scraped_at) -); - -CREATE TABLE IF NOT EXISTS lots ( - lot_id TEXT PRIMARY KEY, -- Use TEXT: "A1-28505-5" - lot_id_numeric INTEGER, -- For monitor's integer needs - auction_id TEXT, -- FK: "A7-39813" - sale_id INTEGER, -- For monitor (same as auction_id_numeric) - title TEXT, - description TEXT, - manufacturer TEXT, - type TEXT, - year INTEGER, - category TEXT, - current_bid_text TEXT, -- "€123.45" or "No bids" - current_bid REAL, -- 123.45 - bid_count INTEGER, - currency TEXT DEFAULT 'EUR', - url TEXT UNIQUE, - closing_time TEXT, - viewing_time TEXT, - pickup_date TEXT, - location TEXT, - closing_notified INTEGER DEFAULT 0, - scraped_at TEXT, - FOREIGN KEY (auction_id) REFERENCES auctions(auction_id) -); - -CREATE TABLE IF NOT EXISTS images ( - id INTEGER PRIMARY KEY AUTOINCREMENT, - lot_id TEXT, -- FK: "A1-28505-5" - url TEXT, -- Image URL from website - local_path TEXT, -- Local path after download - labels TEXT, -- Detected objects (comma-separated) - downloaded INTEGER DEFAULT 0, -- 0=pending, 1=downloaded - processed_at INTEGER, -- Unix timestamp when processed - FOREIGN KEY (lot_id) REFERENCES lots(lot_id) -); - --- Indexes -CREATE INDEX IF NOT EXISTS idx_auctions_country ON auctions(country); -CREATE INDEX IF NOT EXISTS idx_lots_auction_id ON lots(auction_id); -CREATE INDEX IF NOT EXISTS idx_images_lot_id ON images(lot_id); -CREATE INDEX IF NOT EXISTS 
idx_images_downloaded ON images(downloaded); -``` - -### Option 3: API Integration (Most Flexible) - -Have the scraper expose a REST API for the monitor to query. - -```python -# In scraper: Add Flask API endpoint -@app.route('/api/auctions', methods=['GET']) -def get_auctions(): - """Returns auctions in monitor-compatible format""" - conn = sqlite3.connect(CACHE_DB) - cursor = conn.cursor() - cursor.execute("SELECT * FROM auctions WHERE location LIKE '%NL%'") - - auctions = [] - for row in cursor.fetchall(): - auctions.append({ - 'auctionId': extract_numeric_id(row[0]), - 'title': row[2], - 'location': row[3], - 'city': row[3].split(',')[0] if row[3] else '', - 'country': row[3].split(',')[1].strip() if ',' in row[3] else '', - 'url': row[1], - 'type': row[0].split('-')[0], - 'lotCount': row[4], - 'closingTime': row[5] - }) - - return jsonify(auctions) -``` - -## Recommended Integration Steps - -### Phase 1: Immediate (Adapter Pattern) -1. ✅ Keep separate schemas -2. ✅ Create `ScraperDataAdapter` in Monitor -3. ✅ Add import methods to `DatabaseService` -4. ✅ Monitor reads from scraper's tables using adapter - -### Phase 2: Short-term (Unified Schema) -1. 📋 Design unified schema (see Option 2) -2. 📋 Update scraper to use unified schema -3. 📋 Update monitor to use unified schema -4. 📋 Migrate existing data - -### Phase 3: Long-term (API + Event-driven) -1. 📋 Add REST API to scraper -2. 📋 Add webhook/event notification when new data arrives -3. 📋 Monitor subscribes to events -4. 📋 Process images asynchronously - -## Current Integration Flow - -### Scraper Process (Python) -```bash -# 1. Run scraper to populate database -cd /path/to/scraper -python scraper.py - -# Output: -# ✅ Scraped 42 auctions -# ✅ Scraped 1,234 lots -# ✅ Saved 3,456 image URLs -# ✅ Data written to: /mnt/okcomputer/output/cache.db -``` - -### Monitor Process (Java) -```bash -# 2. 
Run monitor to process the data -cd /path/to/monitor -export DATABASE_FILE=/mnt/okcomputer/output/cache.db -java -jar troostwijk-monitor.jar - -# Output: -# 📊 Current Database State: -# Total lots in database: 1,234 -# Total images processed: 0 -# -# [1/2] Processing images... -# Downloading and analyzing 3,456 images... -# -# [2/2] Starting bid monitoring... -# ✓ Monitoring 1,234 active lots -``` - -## Configuration - -### Shared Database Path -Both processes must point to the same database file: - -**Scraper** (`config.py`): -```python -CACHE_DB = '/mnt/okcomputer/output/cache.db' -``` - -**Monitor** (`Main.java`): -```java -String databaseFile = System.getenv().getOrDefault( - "DATABASE_FILE", - "/mnt/okcomputer/output/cache.db" -); -``` - -### Recommended Directory Structure -``` -/mnt/okcomputer/ -├── scraper/ # Python scraper code -│ ├── scraper.py -│ └── requirements.txt -├── monitor/ # Java monitor code -│ ├── troostwijk-monitor.jar -│ └── models/ # YOLO models -│ ├── yolov4.cfg -│ ├── yolov4.weights -│ └── coco.names -└── output/ # Shared data directory - ├── cache.db # Shared SQLite database - └── images/ # Downloaded images - ├── A1-28505-5/ - │ ├── 001.jpg - │ └── 002.jpg - └── ... -``` - -## Monitoring & Coordination - -### Option A: Sequential Execution -```bash -#!/bin/bash -# run-pipeline.sh - -echo "Step 1: Scraping..." -python scraper/scraper.py - -echo "Step 2: Processing images..." -java -jar monitor/troostwijk-monitor.jar --process-images-only - -echo "Step 3: Starting monitor..." 
-java -jar monitor/troostwijk-monitor.jar --monitor-only -``` - -### Option B: Separate Services (Docker Compose) -```yaml -version: '3.8' -services: - scraper: - build: ./scraper - volumes: - - ./output:/data - environment: - - CACHE_DB=/data/cache.db - command: python scraper.py - - monitor: - build: ./monitor - volumes: - - ./output:/data - environment: - - DATABASE_FILE=/data/cache.db - - NOTIFICATION_CONFIG=desktop - depends_on: - - scraper - command: java -jar troostwijk-monitor.jar -``` - -### Option C: Cron-based Scheduling -```cron -# Scrape every 6 hours -0 */6 * * * cd /mnt/okcomputer/scraper && python scraper.py - -# Process images every hour (if new lots found) -0 * * * * cd /mnt/okcomputer/monitor && java -jar troostwijk-monitor.jar --process-new - -# Monitor runs continuously -@reboot cd /mnt/okcomputer/monitor && java -jar troostwijk-monitor.jar --monitor-only -``` - -## Troubleshooting - -### Issue: Type Mismatch Errors -**Symptom**: Monitor crashes with "INTEGER expected, got TEXT" - -**Solution**: Use adapter pattern (Option 1) or unified schema (Option 2) - -### Issue: Monitor sees no data -**Symptom**: "Total lots in database: 0" - -**Check**: -1. Is `DATABASE_FILE` env var set correctly? -2. Did scraper actually write data? -3. Are both processes using the same database file? - -```bash -# Verify database has data -sqlite3 /mnt/okcomputer/output/cache.db "SELECT COUNT(*) FROM lots" -``` - -### Issue: Images not downloading -**Symptom**: "Total images processed: 0" but scraper found images - -**Check**: -1. Scraper writes image URLs to `images` table -2. Monitor reads from `images` table with `downloaded=0` -3. Field name mapping: `local_path` vs `local_path` - -## Next Steps - -1. **Immediate**: Implement `ScraperDataAdapter` for compatibility -2. **This Week**: Test end-to-end integration with sample data -3. **Next Sprint**: Migrate to unified schema -4.
**Future**: Add event-driven architecture with webhooks diff --git a/docs/INTELLIGENCE_FEATURES_SUMMARY.md b/docs/INTELLIGENCE_FEATURES_SUMMARY.md deleted file mode 100644 index bdd953b..0000000 --- a/docs/INTELLIGENCE_FEATURES_SUMMARY.md +++ /dev/null @@ -1,422 +0,0 @@ -# Intelligence Features Implementation Summary - -## Overview -This document summarizes the implementation of advanced intelligence features based on 15+ new GraphQL API fields discovered from the Troostwijk auction system. - -## New GraphQL Fields Integrated - -### HIGH PRIORITY FIELDS (Implemented) -1. **`followersCount`** (Integer) - Watch count showing bidder interest - - Direct indicator of competition - - Used for sleeper lot detection - - Popularity level classification - -2. **`estimatedFullPrice`** (Object: min/max cents) - - Auction house's estimated value range - - Used for bargain detection - - Price vs estimate analytics - -3. **`nextBidStepInCents`** (Long) - - Exact bid increment from API - - Precise next bid calculations - - Better UX for bidding recommendations - -4. **`condition`** (String) - - Direct condition field from API - - Better than extracting from attributes - - Used in condition scoring - -5. **`categoryInformation`** (Object) - - Structured category with path - - Better categorization and filtering - - Category-based analytics - -6. **`location`** (Object: city, countryCode, etc.) - - Structured location data - - Proximity filtering capability - - Logistics cost calculation - -### MEDIUM PRIORITY FIELDS (Implemented) -7. **`biddingStatus`** (Enum) - Detailed bidding status -8. **`appearance`** (String) - Visual condition notes -9. **`packaging`** (String) - Packaging details -10. **`quantity`** (Long) - Lot quantity for bulk items -11. **`vat`** (BigDecimal) - VAT percentage -12. **`buyerPremiumPercentage`** (BigDecimal) - Buyer premium -13. **`remarks`** (String) - Viewing/pickup notes - -## Code Changes - -### 1. 
Backend - Lot.java (Domain Model) -**File**: `src/main/java/auctiora/Lot.java` - -**Changes**: -- Added 24 new fields to the Lot record -- Implemented 9 intelligence calculation methods: - - `calculateTotalCost()` - Bid + VAT + Premium - - `calculateNextBid()` - Using API increment - - `isBelowEstimate()` - Bargain detection - - `isAboveEstimate()` - Overvalued detection - - `getInterestToBidRatio()` - Conversion rate - - `getPopularityLevel()` - HIGH/MEDIUM/LOW/MINIMAL - - `isSleeperLot()` - High interest, low bid - - `getEstimatedMidpoint()` - Average of estimate range - - `getPriceVsEstimateRatio()` - Price comparison metric - -**Example**: -```java -public boolean isSleeperLot() { - return followersCount != null && followersCount > 10 && currentBid < 100; -} - -public double calculateTotalCost() { - double base = currentBid > 0 ? currentBid : 0; - if (vat != null && vat > 0) { - base += (base * vat / 100.0); - } - if (buyerPremiumPercentage != null && buyerPremiumPercentage > 0) { - base += (base * buyerPremiumPercentage / 100.0); - } - return base; -} -``` - -### 2. Backend - AuctionMonitorResource.java (REST API) -**File**: `src/main/java/auctiora/AuctionMonitorResource.java` - -**New Endpoints Added**: -1. `GET /api/monitor/intelligence/sleepers` - Sleeper lots (high interest, low bids) -2. `GET /api/monitor/intelligence/bargains` - Bargain lots (below estimate) -3. `GET /api/monitor/intelligence/popular?level={HIGH|MEDIUM|LOW}` - Popular lots -4. `GET /api/monitor/intelligence/price-analysis` - Price vs estimate statistics -5. `GET /api/monitor/lots/{lotId}/intelligence` - Detailed lot intelligence -6. 
`GET /api/monitor/charts/watch-distribution` - Follower count distribution - -**Enhanced Features**: -- Updated insights endpoint to include sleeper, bargain, and popular insights -- Added intelligent filtering and sorting for intelligence data -- Integrated new fields into existing statistics - -**Example Endpoint**: -```java -@GET -@Path("/intelligence/sleepers") -public Response getSleeperLots(@QueryParam("minFollowers") @DefaultValue("10") int minFollowers) { - // Note: minFollowers is accepted but not applied here; - // Lot.isSleeperLot() uses its own fixed follower threshold. - var allLots = db.getAllLots(); - var sleepers = allLots.stream() - .filter(Lot::isSleeperLot) - .toList(); - - return Response.ok(Map.of( - "count", sleepers.size(), - "lots", sleepers - )).build(); -} -``` - -### 3. Frontend - index.html (Intelligence Dashboard) -**File**: `src/main/resources/META-INF/resources/index.html` - -**New UI Components**: - -#### Intelligence Dashboard Widgets (3 new cards) -1. **Sleeper Lots Widget** - - Purple gradient design - - Shows count of high-interest, low-bid lots - - Click to filter table - -2. **Bargain Lots Widget** - - Green gradient design - - Shows count of below-estimate lots - - Click to filter table - -3. **Popular/Hot Lots Widget** - - Orange gradient design - - Shows count of high-follower lots - - Click to filter table - -#### Enhanced Closing Soon Table -**New Columns Added**: -1. **Watchers** - Follower count with color-coded badges - - Red (51+ followers): High competition - - Orange (21-50): Medium competition - - Blue (6-20): Some interest - - Gray (0-5): Minimal interest - -2. **Est. Range** - Auction house estimate (`€min-€max`) - - Shows "DEAL" badge if below estimate - -3.
**Total Cost** - True cost including VAT and premium - - Hover tooltip shows breakdown - - Purple color to stand out - -**JavaScript Functions Added**: -- `fetchIntelligenceData()` - Fetches all intelligence metrics -- `showSleeperLots()` - Filters table to sleepers -- `showBargainLots()` - Filters table to bargains -- `showPopularLots()` - Filters table to popular -- Enhanced table rendering with smart badges - -**Example Code**: -```javascript -// Calculate total cost (including VAT and premium) -const currentBid = lot.currentBid || 0; -const vat = lot.vat || 0; -const premium = lot.buyerPremiumPercentage || 0; -// Compound VAT, then premium, consistent with Lot.calculateTotalCost() -const totalCost = currentBid * (1 + vat/100) * (1 + premium/100); - -// Bargain indicator -const isBargain = estMin && currentBid < parseFloat(estMin); -const bargainBadge = isBargain ? - 'DEAL' : ''; -``` - -## Intelligence Features - -### 1. Sleeper Lot Detection -**Algorithm**: `followersCount > 10 AND currentBid < 100` - -**Value Proposition**: -- Identifies lots with high interest but low current bids -- Opportunity to bid strategically before price escalates -- Early indicator of undervalued items - -**Dashboard Display**: -- Count shown in purple widget -- Click to filter table -- Purple "eye" icon - -### 2. Bargain Detection -**Algorithm**: `currentBid < estimatedMin` - -**Value Proposition**: -- Identifies lots priced below auction house estimate -- Clear signal of potential good deals -- Quantifiable value assessment - -**Dashboard Display**: -- Count shown in green widget -- "DEAL" badge in table -- Click to filter table - -### 3. Popularity Analysis -**Algorithm**: Tiered classification by follower count -- HIGH: > 50 followers -- MEDIUM: 21-50 followers -- LOW: 6-20 followers -- MINIMAL: 0-5 followers - -**Value Proposition**: -- Predict competition level -- Identify trending items -- Adjust bidding strategy accordingly - -**Dashboard Display**: -- Count shown in orange widget -- Color-coded badges in table -- Click to filter by level - -### 4.
True Cost Calculator -**Algorithm**: `currentBid × (1 + VAT/100) × (1 + premium/100)` - -**Value Proposition**: -- Shows actual out-of-pocket cost -- Prevents budget surprises -- Enables accurate comparison across lots - -**Dashboard Display**: -- Purple "Total Cost" column -- Hover tooltip shows breakdown -- Updated in real-time - -### 5. Exact Bid Increment -**Algorithm**: Uses `nextBidStepInCents` from API, falls back to calculated increment - -**Value Proposition**: -- No guesswork on next bid amount -- API-provided accuracy -- Better bidding UX - -**Implementation**: -```java -public double calculateNextBid() { - if (nextBidStepInCents != null && nextBidStepInCents > 0) { - return currentBid + (nextBidStepInCents / 100.0); - } else if (bidIncrement != null && bidIncrement > 0) { - return currentBid + bidIncrement; - } - return currentBid * 1.05; // Fallback: 5% increment -} -``` - -### 6. Price vs Estimate Analytics -**Metrics**: -- Total lots with estimates -- Count below estimate -- Count above estimate -- Average price vs estimate percentage - -**Value Proposition**: -- Market efficiency analysis -- Auction house accuracy tracking -- Investment opportunity identification - -**API Endpoint**: `/api/monitor/intelligence/price-analysis` - -## Visual Design - -### Color Scheme -- **Purple**: Sleeper lots, total cost (opportunity/value) -- **Green**: Bargains, deals (positive value) -- **Orange/Red**: Popular/hot lots (competition warning) -- **Blue**: Moderate interest (informational) -- **Gray**: Minimal interest (neutral) - -### Badge System -1. **Watchers Badge**: Color-coded by competition level -2. **DEAL Badge**: Green indicator for below-estimate -3. **Time Left Badge**: Red/yellow/green by urgency -4. 
**Popularity Badge**: Fire icon for hot lots - -### Interactive Elements -- Click widgets to filter table -- Hover for detailed tooltips -- Smooth scroll to table on filter -- Toast notifications for user feedback - -## Performance Considerations - -### API Optimization -- All intelligence data fetched in parallel -- Cached in dashboard state -- Minimal recalculation on render -- Efficient stream operations in backend - -### Frontend Optimization -- Batch DOM updates -- Lazy rendering for large tables -- Debounced filter operations -- CSS transitions for smooth UX - -## Testing Recommendations - -### Backend Tests -1. Test `Lot` intelligence methods with various inputs -2. Test API endpoints with mock data -3. Test edge cases (null values, zero bids, etc.) -4. Performance test with 10k+ lots - -### Frontend Tests -1. Test widget click handlers -2. Test table rendering with new columns -3. Test filter functionality -4. Test responsive design on mobile - -### Integration Tests -1. End-to-end flow: Scraper → DB → API → Dashboard -2. Real-time data refresh -3. Concurrent user access -4. 
Load testing - -## Future Enhancements - -### Phase 2 (Bid History) -- Implement `bid_history` table scraping -- Track bid changes over time -- Calculate bid velocity accurately -- Identify bid patterns - -### Phase 3 (ML Predictions) -- Predict final hammer price -- Recommend optimal bid timing -- Classify lot categories automatically -- Anomaly detection - -### Phase 4 (Mobile) -- React Native mobile app -- Push notifications -- Offline mode -- Quick bid functionality - -## Migration Guide - -### Database Migration (Required) -The new fields need to be added to the database schema: - -```sql --- Add to lots table -ALTER TABLE lots ADD COLUMN followers_count INTEGER DEFAULT 0; -ALTER TABLE lots ADD COLUMN estimated_min DECIMAL(12, 2); -ALTER TABLE lots ADD COLUMN estimated_max DECIMAL(12, 2); -ALTER TABLE lots ADD COLUMN next_bid_step_in_cents BIGINT; -ALTER TABLE lots ADD COLUMN condition TEXT; -ALTER TABLE lots ADD COLUMN category_path TEXT; -ALTER TABLE lots ADD COLUMN city_location TEXT; -ALTER TABLE lots ADD COLUMN country_code TEXT; -ALTER TABLE lots ADD COLUMN bidding_status TEXT; -ALTER TABLE lots ADD COLUMN appearance TEXT; -ALTER TABLE lots ADD COLUMN packaging TEXT; -ALTER TABLE lots ADD COLUMN quantity BIGINT; -ALTER TABLE lots ADD COLUMN vat DECIMAL(5, 2); -ALTER TABLE lots ADD COLUMN buyer_premium_percentage DECIMAL(5, 2); -ALTER TABLE lots ADD COLUMN remarks TEXT; -ALTER TABLE lots ADD COLUMN starting_bid DECIMAL(12, 2); -ALTER TABLE lots ADD COLUMN reserve_price DECIMAL(12, 2); -ALTER TABLE lots ADD COLUMN reserve_met BOOLEAN DEFAULT FALSE; -ALTER TABLE lots ADD COLUMN bid_increment DECIMAL(12, 2); -ALTER TABLE lots ADD COLUMN view_count INTEGER DEFAULT 0; -ALTER TABLE lots ADD COLUMN first_bid_time TEXT; -ALTER TABLE lots ADD COLUMN last_bid_time TEXT; -ALTER TABLE lots ADD COLUMN bid_velocity DECIMAL(5, 2); -``` - -### Scraper Update (Required) -The external scraper (Python/Playwright) needs to extract the new fields from GraphQL: - -```python -# 
Extract from __NEXT_DATA__ JSON -followers_count = lot_data.get('followersCount') -estimated_min = lot_data.get('estimatedFullPrice', {}).get('min', {}).get('cents') -estimated_max = lot_data.get('estimatedFullPrice', {}).get('max', {}).get('cents') -next_bid_step = lot_data.get('nextBidStepInCents') -condition = lot_data.get('condition') -# ... etc -``` - -### Deployment Steps -1. Stop the monitor service -2. Run database migrations -3. Update scraper to extract new fields -4. Deploy updated monitor JAR -5. Restart services -6. Verify data populating in dashboard - -## Performance Metrics - -### Expected Performance -- **Intelligence Data Fetch**: < 100ms for 10k lots -- **Table Rendering**: < 200ms with all new columns -- **Widget Update**: < 50ms -- **API Response Time**: < 500ms - -### Resource Usage -- **Memory**: +50MB for intelligence calculations -- **Database**: +2KB per lot (new columns) -- **Network**: +10KB per dashboard refresh - -## Documentation -- **Integration Flowchart**: `docs/INTEGRATION_FLOWCHART.md` -- **API Documentation**: Auto-generated from JAX-RS annotations -- **Database Schema**: `wiki/DATABASE_ARCHITECTURE.md` -- **GraphQL Fields**: `wiki/EXPERT_ANALITICS.sql` - ---- - -**Implementation Date**: December 2025 -**Version**: 2.1 -**Status**: ✅ Complete - Ready for Testing -**Next Steps**: -1. Deploy to staging environment -2. Run integration tests -3. Update scraper to extract new fields -4. Deploy to production diff --git a/docs/QUARKUS_IMPLEMENTATION.md b/docs/QUARKUS_IMPLEMENTATION.md deleted file mode 100644 index 885e02d..0000000 --- a/docs/QUARKUS_IMPLEMENTATION.md +++ /dev/null @@ -1,540 +0,0 @@ -# Quarkus Implementation Complete ✅ - -## Summary - -The Troostwijk Auction Monitor has been fully integrated with **Quarkus Framework** for production-ready deployment with enterprise features. - ---- - -## 🎯 What Was Added - -### 1. 
**Quarkus Dependencies** (pom.xml) - -```xml - -<dependency> - <groupId>io.quarkus</groupId> - <artifactId>quarkus-arc</artifactId> -</dependency> -<dependency> - <groupId>io.quarkus</groupId> - <artifactId>quarkus-rest-jackson</artifactId> -</dependency> -<dependency> - <groupId>io.quarkus</groupId> - <artifactId>quarkus-scheduler</artifactId> -</dependency> -<dependency> - <groupId>io.quarkus</groupId> - <artifactId>quarkus-smallrye-health</artifactId> -</dependency> -<dependency> - <groupId>io.quarkus</groupId> - <artifactId>quarkus-config-yaml</artifactId> -</dependency> -``` - -### 2. **Configuration** (application.properties) - -```properties -# Application -quarkus.application.name=troostwijk-scraper -quarkus.http.port=8081 - -# Auction Monitor Configuration -auction.database.path=C:\\mnt\\okcomputer\\output\\cache.db -auction.images.path=C:\\mnt\\okcomputer\\output\\images -auction.notification.config=desktop - -# YOLO Models -auction.yolo.config=models/yolov4.cfg -auction.yolo.weights=models/yolov4.weights -auction.yolo.classes=models/coco.names - -# Workflow Schedules (Cron Expressions) -# Comments sit on their own lines: a trailing "#" would become part of the property value -# Every 30 min -auction.workflow.scraper-import.cron=0 */30 * * * ? -# Every 1 hour -auction.workflow.image-processing.cron=0 0 * * * ? -# Every 15 min -auction.workflow.bid-monitoring.cron=0 */15 * * * ? -# Every 5 min -auction.workflow.closing-alerts.cron=0 */5 * * * ? - -# Scheduler -quarkus.scheduler.enabled=true - -# Health Checks -quarkus.smallrye-health.root-path=/health -``` - -### 3. **Quarkus Scheduler** (QuarkusWorkflowScheduler.java) - -Replaced manual `ScheduledExecutorService` with Quarkus `@Scheduled`: - -```java -@ApplicationScoped -public class QuarkusWorkflowScheduler { - - @Inject DatabaseService db; - @Inject NotificationService notifier; - @Inject ObjectDetectionService detector; - @Inject ImageProcessingService imageProcessor; - - // Workflow 1: Every 30 minutes - @Scheduled(cron = "{auction.workflow.scraper-import.cron}") - void importScraperData() { /* ... */ } - - // Workflow 2: Every 1 hour - @Scheduled(cron = "{auction.workflow.image-processing.cron}") - void processImages() { /* ... */ } - - // Workflow 3: Every 15 minutes - @Scheduled(cron = "{auction.workflow.bid-monitoring.cron}") - void monitorBids() { /* ...
*/ } - - // Workflow 4: Every 5 minutes - @Scheduled(cron = "{auction.workflow.closing-alerts.cron}") - void checkClosingTimes() { /* ... */ } -} -``` - -### 4. **CDI Producer** (AuctionMonitorProducer.java) - -Centralized service creation with dependency injection: - -```java -@ApplicationScoped -public class AuctionMonitorProducer { - - @Produces @Singleton - public DatabaseService produceDatabaseService( - @ConfigProperty(name = "auction.database.path") String dbPath) { - DatabaseService db = new DatabaseService(dbPath); - db.ensureSchema(); - return db; - } - - @Produces @Singleton - public NotificationService produceNotificationService( - @ConfigProperty(name = "auction.notification.config") String config) { - return new NotificationService(config, ""); - } - - @Produces @Singleton - public ObjectDetectionService produceObjectDetectionService(...) { } - - @Produces @Singleton - public ImageProcessingService produceImageProcessingService(...) { } -} -``` - -### 5. **REST API** (AuctionMonitorResource.java) - -Full REST API for monitoring and control: - -| Endpoint | Method | Description | -|----------|--------|-------------| -| `/api/monitor/status` | GET | Get current status | -| `/api/monitor/statistics` | GET | Get detailed statistics | -| `/api/monitor/trigger/scraper-import` | POST | Trigger scraper import | -| `/api/monitor/trigger/image-processing` | POST | Trigger image processing | -| `/api/monitor/trigger/bid-monitoring` | POST | Trigger bid monitoring | -| `/api/monitor/trigger/closing-alerts` | POST | Trigger closing alerts | -| `/api/monitor/auctions` | GET | List auctions | -| `/api/monitor/auctions?country=NL` | GET | Filter auctions by country | -| `/api/monitor/lots` | GET | List active lots | -| `/api/monitor/lots/closing-soon` | GET | Lots closing soon | -| `/api/monitor/lots/{id}/images` | GET | Get lot images | -| `/api/monitor/test-notification` | POST | Send test notification | - -### 6. 
**Health Checks** (AuctionMonitorHealthCheck.java) - -Kubernetes-ready health probes: - -```java -@Liveness // /health/live -public class LivenessCheck implements HealthCheck { - public HealthCheckResponse call() { - return HealthCheckResponse.up("Auction Monitor is alive"); - } -} - -@Readiness // /health/ready -public class ReadinessCheck implements HealthCheck { - @Inject DatabaseService db; - - public HealthCheckResponse call() { - var auctions = db.getAllAuctions(); - return HealthCheckResponse.named("database") - .up() - .withData("auctions", auctions.size()) - .build(); - } -} - -@Startup // /health/started -public class StartupCheck implements HealthCheck { /* ... */ } -``` - -### 7. **Docker Support** - -#### Dockerfile (Optimized for Quarkus fast-jar) - -```dockerfile -# Build stage -FROM maven:3.9-eclipse-temurin-25-alpine AS build -WORKDIR /app -COPY ../pom.xml ./ -RUN mvn dependency:go-offline -B -COPY ../src ./src/ -RUN mvn package -DskipTests -Dquarkus.package.jar.type=fast-jar - -# Runtime stage -FROM eclipse-temurin:25-jre-alpine -WORKDIR /app - -# Copy Quarkus fast-jar structure -COPY --from=build /app/target/quarkus-app/lib/ /app/lib/ -COPY --from=build /app/target/quarkus-app/*.jar /app/ -COPY --from=build /app/target/quarkus-app/app/ /app/app/ -COPY --from=build /app/target/quarkus-app/quarkus/ /app/quarkus/ - -EXPOSE 8081 -HEALTHCHECK CMD wget --spider http://localhost:8081/health/live - -ENTRYPOINT ["java", "-jar", "/app/quarkus-run.jar"] -``` - -#### docker-compose.yml - -```yaml -version: '3.8' -services: - auction-monitor: - build: ../wiki - ports: - - "8081:8081" - volumes: - - ./data/cache.db:/mnt/okcomputer/output/cache.db - - ./data/images:/mnt/okcomputer/output/images - environment: - - AUCTION_DATABASE_PATH=/mnt/okcomputer/output/cache.db - - AUCTION_NOTIFICATION_CONFIG=desktop - healthcheck: - test: [ "CMD", "wget", "--spider", "http://localhost:8081/health/live" ] - interval: 30s - restart: unless-stopped -``` - -### 8. 
**Kubernetes Deployment** - -Full Kubernetes manifests: -- **Namespace** - Isolated environment -- **PersistentVolumeClaim** - Data storage -- **ConfigMap** - Configuration -- **Secret** - Sensitive data (SMTP credentials) -- **Deployment** - Application pods -- **Service** - Internal networking -- **Ingress** - External access -- **HorizontalPodAutoscaler** - Auto-scaling - ---- - -## 🚀 How to Run - -### Development Mode (with live reload) - -```bash -mvn quarkus:dev - -# Access: -# - App: http://localhost:8081 -# - Dev UI: http://localhost:8081/q/dev/ -# - API: http://localhost:8081/api/monitor/status -# - Health: http://localhost:8081/health -``` - -### Production Mode (JAR) - -```bash -# Build -mvn clean package - -# Run -java -jar target/quarkus-app/quarkus-run.jar - -# Access: http://localhost:8081 -``` - -### Docker - -```bash -# Build -docker build -t auction-monitor . - -# Run -docker run -p 8081:8081 auction-monitor - -# Access: http://localhost:8081 -``` - -### Docker Compose - -```bash -# Start -docker-compose up -d - -# View logs -docker-compose logs -f - -# Access: http://localhost:8081 -``` - -### Kubernetes - -```bash -# Deploy -kubectl apply -f k8s/deployment.yaml - -# Port forward -kubectl port-forward svc/auction-monitor 8081:8081 -n auction-monitor - -# Access: http://localhost:8081 -``` - ---- - -## 📊 Architecture - -``` -┌─────────────────────────────────────────────────────────────┐ -│ QUARKUS APPLICATION │ -├─────────────────────────────────────────────────────────────┤ -│ │ -│ ┌────────────────────────────────────────────────────┐ │ -│ │ QuarkusWorkflowScheduler (@ApplicationScoped) │ │ -│ │ ┌──────────────────────────────────────────────┐ │ │ -│ │ │ @Scheduled(cron = "0 */30 * * * ?") │ │ │ -│ │ │ importScraperData() │ │ │ -│ │ ├──────────────────────────────────────────────┤ │ │ -│ │ │ @Scheduled(cron = "0 0 * * * ?") │ │ │ -│ │ │ processImages() │ │ │ -│ │ ├──────────────────────────────────────────────┤ │ │ -│ │ │ @Scheduled(cron = "0 
*/15 * * * ?") │ │ │ -│ │ │ monitorBids() │ │ │ -│ │ ├──────────────────────────────────────────────┤ │ │ -│ │ │ @Scheduled(cron = "0 */5 * * * ?") │ │ │ -│ │ │ checkClosingTimes() │ │ │ -│ │ └──────────────────────────────────────────────┘ │ │ -│ └────────────────────────────────────────────────────┘ │ -│ ▲ │ -│ │ @Inject │ -│ ┌───────────────────────┴────────────────────────────┐ │ -│ │ AuctionMonitorProducer │ │ -│ │ ┌──────────────────────────────────────────────┐ │ │ -│ │ │ @Produces @Singleton DatabaseService │ │ │ -│ │ │ @Produces @Singleton NotificationService │ │ │ -│ │ │ @Produces @Singleton ObjectDetectionService │ │ │ -│ │ │ @Produces @Singleton ImageProcessingService │ │ │ -│ │ └──────────────────────────────────────────────┘ │ │ -│ └────────────────────────────────────────────────────┘ │ -│ │ -│ ┌────────────────────────────────────────────────────┐ │ -│ │ AuctionMonitorResource (REST API) │ │ -│ │ ┌──────────────────────────────────────────────┐ │ │ -│ │ │ GET /api/monitor/status │ │ │ -│ │ │ GET /api/monitor/statistics │ │ │ -│ │ │ POST /api/monitor/trigger/* │ │ │ -│ │ │ GET /api/monitor/auctions │ │ │ -│ │ │ GET /api/monitor/lots │ │ │ -│ │ └──────────────────────────────────────────────┘ │ │ -│ └────────────────────────────────────────────────────┘ │ -│ │ -│ ┌────────────────────────────────────────────────────┐ │ -│ │ AuctionMonitorHealthCheck │ │ -│ │ ┌──────────────────────────────────────────────┐ │ │ -│ │ │ @Liveness - /health/live │ │ │ -│ │ │ @Readiness - /health/ready │ │ │ -│ │ │ @Startup - /health/started │ │ │ -│ │ └──────────────────────────────────────────────┘ │ │ -│ └────────────────────────────────────────────────────┘ │ -│ │ -└─────────────────────────────────────────────────────────────┘ -``` - ---- - -## 🔧 Key Features - -### 1. **Dependency Injection (CDI)** -- Type-safe injection with `@Inject` -- Singleton services with `@Produces` -- Configuration injection with `@ConfigProperty` - -### 2. 
**Scheduled Tasks** -- Cron-based scheduling with `@Scheduled` -- Configurable via properties -- No manual thread management - -### 3. **REST API** -- JAX-RS endpoints -- JSON serialization -- Error handling - -### 4. **Health Checks** -- Liveness probe (is app alive?) -- Readiness probe (is app ready?) -- Startup probe (has app started?) - -### 5. **Configuration** -- External configuration -- Environment variable override -- Type-safe config injection - -### 6. **Container Ready** -- Optimized Docker image -- Fast startup (~0.5s) -- Low memory (~50MB) -- Health checks included - -### 7. **Cloud Native** -- Kubernetes manifests -- Auto-scaling support -- Ingress configuration -- Persistent storage - ---- - -## 📁 Files Created/Modified - -### New Files - -``` -src/main/java/com/auction/ -├── QuarkusWorkflowScheduler.java # Quarkus scheduler -├── AuctionMonitorProducer.java # CDI producer -├── AuctionMonitorResource.java # REST API -└── AuctionMonitorHealthCheck.java # Health checks - -src/main/resources/ -└── application.properties # Configuration - -k8s/ -├── deployment.yaml # Kubernetes manifests -└── README.md # K8s deployment guide - -docker-compose.yml # Docker Compose config -Dockerfile # Updated for Quarkus -QUARKUS_GUIDE.md # Complete Quarkus guide -QUARKUS_IMPLEMENTATION.md # This file -``` - -### Modified Files - -``` -pom.xml # Added Quarkus dependencies -src/main/resources/application.properties # Added config -``` - ---- - -## 🎯 Benefits of Quarkus - -| Feature | Before | After (Quarkus) | -|---------|--------|-----------------| -| **Startup Time** | ~3-5 seconds | ~0.5 seconds | -| **Memory** | ~200MB | ~50MB | -| **Scheduling** | Manual ExecutorService | @Scheduled annotations | -| **DI/CDI** | Manual instantiation | @Inject, @Produces | -| **REST API** | None | Full JAX-RS API | -| **Health Checks** | None | Built-in probes | -| **Config** | Hard-coded | External properties | -| **Dev Mode** | Manual restart | Live reload | -| **Container** | Basic 
Docker | Optimized fast-jar | -| **Cloud Native** | Not ready | K8s ready | - ---- - -## 🧪 Testing - -### Unit Tests -```bash -mvn test -``` - -### Integration Tests -```bash -# Start app -mvn quarkus:dev - -# In another terminal -curl http://localhost:8081/api/monitor/status -curl http://localhost:8081/health -curl -X POST http://localhost:8081/api/monitor/trigger/scraper-import -``` - -### Docker Test -```bash -docker-compose up -d -docker-compose logs -f -curl http://localhost:8081/api/monitor/status -docker-compose down -``` - ---- - -## 📚 Documentation - -1. **QUARKUS_GUIDE.md** - Complete Quarkus usage guide -2. **QUARKUS_IMPLEMENTATION.md** - This file (implementation details) -3. **k8s/README.md** - Kubernetes deployment guide -4. **docker-compose.yml** - Docker Compose reference -5. **README.md** - Updated main README - ---- - -## 🎉 Summary - -✅ **Quarkus Framework** - Fully integrated -✅ **@Scheduled Workflows** - Cron-based scheduling -✅ **CDI/Dependency Injection** - Clean architecture -✅ **REST API** - Full control interface -✅ **Health Checks** - Kubernetes ready -✅ **Docker/Compose** - Production containers -✅ **Kubernetes** - Cloud deployment -✅ **Configuration** - Externalized settings -✅ **Documentation** - Complete guides - -**The application is now production-ready with Quarkus! 🚀** - -### Quick Commands - -```bash -# Development -mvn quarkus:dev - -# Production -mvn clean package -java -jar target/quarkus-app/quarkus-run.jar - -# Docker -docker-compose up -d - -# Kubernetes -kubectl apply -f k8s/deployment.yaml -``` - -### API Access - -```bash -# Status -curl http://localhost:8081/api/monitor/status - -# Statistics -curl http://localhost:8081/api/monitor/statistics - -# Health -curl http://localhost:8081/health - -# Trigger workflow -curl -X POST http://localhost:8081/api/monitor/trigger/scraper-import -``` - -**Enjoy your Quarkus-powered Auction Monitor! 
🎊** diff --git a/docs/QUICKSTART.md b/docs/QUICKSTART.md deleted file mode 100644 index 5d918fd..0000000 --- a/docs/QUICKSTART.md +++ /dev/null @@ -1,191 +0,0 @@ -# Quick Start Guide - -Get the scraper running in minutes without downloading YOLO models! - -## Minimal Setup (No Object Detection) - -The scraper works perfectly fine **without** YOLO object detection. You can run it immediately and add object detection later if needed. - -### Step 1: Run the Scraper - -```bash -# Using Maven -mvn clean compile exec:java -Dexec.mainClass="com.auction.scraper.TroostwijkScraper" -``` - -Or in IntelliJ IDEA: -1. Open `TroostwijkScraper.java` -2. Right-click on the `main` method -3. Select "Run 'TroostwijkScraper.main()'" - -### What You'll See - -``` -=== Troostwijk Auction Scraper === - -Initializing scraper... -⚠️ Object detection disabled: YOLO model files not found - Expected files: - - models/yolov4.cfg - - models/yolov4.weights - - models/coco.names - Scraper will continue without image analysis. - -[1/3] Discovering Dutch auctions... -✓ Found 5 auctions: [12345, 12346, 12347, 12348, 12349] - -[2/3] Fetching lot details... - Processing sale 12345... - -[3/3] Starting monitoring service... -✓ Monitoring active. Press Ctrl+C to stop. -``` - -### Step 2: Test Desktop Notifications - -The scraper will automatically send desktop notifications when: -- A new bid is placed on a monitored lot -- An auction is closing within 5 minutes - -**No setup required** - desktop notifications work out of the box! - ---- - -## Optional: Add Email Notifications - -If you want email notifications in addition to desktop notifications: - -```bash -# Set environment variable -export NOTIFICATION_CONFIG="smtp:your.email@gmail.com:app_password:your.email@gmail.com" - -# Then run the scraper -mvn exec:java -Dexec.mainClass="com.auction.scraper.TroostwijkScraper" -``` - -**Get Gmail App Password:** -1. Enable 2FA in Google Account -2. 
Go to: Google Account → Security → 2-Step Verification → App passwords -3. Generate password for "Mail" -4. Use that password (not your regular Gmail password) - ---- - -## Optional: Add Object Detection Later - -If you want AI-powered image analysis to detect objects in auction photos: - -### 1. Create models directory -```bash -mkdir models -cd models -``` - -### 2. Download YOLO files -```bash -# YOLOv4 config (small) -curl -O https://raw.githubusercontent.com/AlexeyAB/darknet/master/cfg/yolov4.cfg - -# YOLOv4 weights (245 MB - takes a few minutes) -curl -LO https://github.com/AlexeyAB/darknet/releases/download/darknet_yolo_v3_optimal/yolov4.weights - -# COCO class names -curl -O https://raw.githubusercontent.com/AlexeyAB/darknet/master/data/coco.names -``` - -### 3. Run again -```bash -mvn exec:java -Dexec.mainClass="com.auction.scraper.TroostwijkScraper" -``` - -Now you'll see: -``` -✓ Object detection enabled with YOLO -``` - -The scraper will now analyze auction images and detect objects like: -- Vehicles (cars, trucks, forklifts) -- Equipment (machines, tools) -- Furniture -- Electronics -- And 80+ other object types - ---- - -## Features Without Object Detection - -Even without YOLO, the scraper provides: - -✅ **Full auction scraping** - Discovers all Dutch auctions -✅ **Lot tracking** - Monitors bids and closing times -✅ **Desktop notifications** - Real-time alerts -✅ **SQLite database** - All data persisted locally -✅ **Image downloading** - Saves all lot images -✅ **Scheduled monitoring** - Automatic updates every hour - -Object detection simply adds: -- AI-powered image analysis -- Automatic object labeling -- Searchable image database - ---- - -## Database Location - -The scraper creates `troostwijk.db` in your current directory with: -- All auction data -- Lot details (title, description, bids, etc.) 
-- Downloaded image paths -- Object labels (if detection enabled) - -View the database with any SQLite browser: -```bash -sqlite3 troostwijk.db -.tables -SELECT * FROM lots LIMIT 5; -``` - ---- - -## Stopping the Scraper - -Press **Ctrl+C** to stop the monitoring service. - ---- - -## Next Steps - -1. ✅ **Run the scraper** without YOLO to test it -2. ✅ **Verify desktop notifications** work -3. ⚙️ **Optional**: Add email notifications -4. ⚙️ **Optional**: Download YOLO models for object detection -5. 🔧 **Customize**: Edit monitoring frequency, closing alerts, etc. - ---- - -## Troubleshooting - -### Desktop notifications not appearing? -- **Windows**: Check if Java has notification permissions -- **Linux**: Ensure desktop environment is running (not headless) -- **macOS**: Check System Preferences → Notifications - -### OpenCV warnings? -These are normal and can be ignored: -``` -WARNING: A restricted method in java.lang.System has been called -WARNING: Use --enable-native-access=ALL-UNNAMED to avoid warning -``` - -The scraper works fine despite these warnings. - ---- - -## Full Documentation - -See [README.md](../README.md) for complete documentation including: -- Email setup details -- YOLO installation guide -- Configuration options -- Database schema -- API endpoints diff --git a/docs/SCRAPER_REFACTOR_GUIDE.md b/docs/SCRAPER_REFACTOR_GUIDE.md deleted file mode 100644 index 483c570..0000000 --- a/docs/SCRAPER_REFACTOR_GUIDE.md +++ /dev/null @@ -1,399 +0,0 @@ -# Scraper Refactor Guide - Image Download Integration - -## 🎯 Objective - -Refactor the Troostwijk scraper to **download and store images locally**, eliminating the 57M+ duplicate image problem in the monitoring process. - -## 📋 Current vs. 
New Architecture - -### **Before** (Current Architecture) -``` -┌──────────────┐ ┌──────────────┐ ┌──────────────┐ -│ Scraper │────────▶│ Database │◀────────│ Monitor │ -│ │ │ │ │ │ -│ Stores URLs │ │ images table │ │ Downloads + │ -│ downloaded=0 │ │ │ │ Detection │ -└──────────────┘ └──────────────┘ └──────────────┘ - │ - ▼ - 57M+ duplicates! -``` - -### **After** (New Architecture) -``` -┌──────────────┐ ┌──────────────┐ ┌──────────────┐ -│ Scraper │────────▶│ Database │◀────────│ Monitor │ -│ │ │ │ │ │ -│ Downloads + │ │ images table │ │ Detection │ -│ Stores path │ │ local_path ✓ │ │ Only │ -│ downloaded=1 │ │ │ │ │ -└──────────────┘ └──────────────┘ └──────────────┘ - │ - ▼ - No duplicates! -``` - -## 🗄️ Database Schema Changes - -### Current Schema (ARCHITECTURE-TROOSTWIJK-SCRAPER.md:113-122) -```sql -CREATE TABLE images ( - id INTEGER PRIMARY KEY AUTOINCREMENT, - lot_id TEXT, - url TEXT, - local_path TEXT, -- Currently NULL - downloaded INTEGER -- Currently 0 - -- Missing: processed_at, labels (added by monitor) -); -``` - -### Required Schema (Already Compatible!) -```sql -CREATE TABLE images ( - id INTEGER PRIMARY KEY AUTOINCREMENT, - lot_id TEXT, - url TEXT, - local_path TEXT, -- ✅ SET by scraper after download - downloaded INTEGER, -- ✅ SET to 1 by scraper after download - labels TEXT, -- ⚠️ SET by monitor (object detection) - processed_at INTEGER, -- ⚠️ SET by monitor (timestamp) - FOREIGN KEY (lot_id) REFERENCES lots(lot_id) -); -``` - -**Good News**: The scraper's schema already has `local_path` and `downloaded` columns! You just need to populate them. - -## 🔧 Implementation Steps - -### **Step 1: Enable Image Downloading in Configuration** - -**File**: Your scraper's config file (e.g., `config.py` or environment variables) - -```python -# Current setting -DOWNLOAD_IMAGES = False # ❌ Change this! 
- -# New setting -DOWNLOAD_IMAGES = True # ✅ Enable downloads - -# Image storage path -IMAGES_DIR = "/mnt/okcomputer/output/images" # Or your preferred path -``` - -### **Step 2: Update Image Download Logic** - -Based on ARCHITECTURE-TROOSTWIJK-SCRAPER.md:211-228, you already have the structure. Here's what needs to change: - -**Current Code** (Conceptual): -```python -# Phase 3: Scrape lot details -def scrape_lot(lot_url): - lot_data = parse_lot_page(lot_url) - - # Save lot to database - db.insert_lot(lot_data) - - # Save image URLs to database (NOT DOWNLOADED) - for img_url in lot_data['images']: - db.execute(""" - INSERT INTO images (lot_id, url, downloaded) - VALUES (?, ?, 0) - """, (lot_data['lot_id'], img_url)) -``` - -**New Code** (Required): -```python -import os -import requests -from pathlib import Path -import time - -def scrape_lot(lot_url): - lot_data = parse_lot_page(lot_url) - - # Save lot to database - db.insert_lot(lot_data) - - # Download and save images - for idx, img_url in enumerate(lot_data['images'], start=1): - try: - # Download image - local_path = download_image(img_url, lot_data['lot_id'], idx) - - # Insert with local_path and downloaded=1 - db.execute(""" - INSERT INTO images (lot_id, url, local_path, downloaded) - VALUES (?, ?, ?, 1) - ON CONFLICT(lot_id, url) DO UPDATE SET - local_path = excluded.local_path, - downloaded = 1 - """, (lot_data['lot_id'], img_url, local_path)) - - # Rate limiting (0.5s between downloads) - time.sleep(0.5) - - except Exception as e: - print(f"Failed to download {img_url}: {e}") - # Still insert record but mark as not downloaded - db.execute(""" - INSERT INTO images (lot_id, url, downloaded) - VALUES (?, ?, 0) - """, (lot_data['lot_id'], img_url)) - -def download_image(image_url, lot_id, index): - """ - Downloads an image and saves it to organized directory structure. - - Args: - image_url: Remote URL of the image - lot_id: Lot identifier (e.g., "A1-28505-5") - index: Image sequence number (1, 2, 3, ...) 
- - Returns: - Absolute path to saved file - """ - # Create directory structure: /images/{lot_id}/ - images_dir = Path(os.getenv('IMAGES_DIR', '/mnt/okcomputer/output/images')) - lot_dir = images_dir / lot_id - lot_dir.mkdir(parents=True, exist_ok=True) - - # Determine file extension from URL or content-type - ext = Path(image_url).suffix or '.jpg' - filename = f"{index:03d}{ext}" # 001.jpg, 002.jpg, etc. - local_path = lot_dir / filename - - # Download with timeout - response = requests.get(image_url, timeout=10) - response.raise_for_status() - - # Save to disk - with open(local_path, 'wb') as f: - f.write(response.content) - - return str(local_path.absolute()) -``` - -### **Step 3: Add Unique Constraint to Prevent Duplicates** - -**Migration SQL**: -```sql --- Add unique constraint to prevent duplicate image records -CREATE UNIQUE INDEX IF NOT EXISTS idx_images_unique -ON images(lot_id, url); -``` - -Add this to your scraper's schema initialization: - -```python -def init_database(): - conn = sqlite3.connect('/mnt/okcomputer/output/cache.db') - cursor = conn.cursor() - - # Existing table creation... - cursor.execute(""" - CREATE TABLE IF NOT EXISTS images (...) 
- """) - - # Add unique constraint (NEW) - cursor.execute(""" - CREATE UNIQUE INDEX IF NOT EXISTS idx_images_unique - ON images(lot_id, url) - """) - - conn.commit() - conn.close() -``` - -### **Step 4: Handle Image Download Failures Gracefully** - -```python -def download_with_retry(image_url, lot_id, index, max_retries=3): - """Downloads image with retry logic.""" - for attempt in range(max_retries): - try: - return download_image(image_url, lot_id, index) - except requests.exceptions.RequestException as e: - if attempt == max_retries - 1: - print(f"Failed after {max_retries} attempts: {image_url}") - return None # Return None on failure - print(f"Retry {attempt + 1}/{max_retries} for {image_url}") - time.sleep(2 ** attempt) # Exponential backoff -``` - -### **Step 5: Update Database Queries** - -Make sure your INSERT uses `INSERT ... ON CONFLICT` to handle re-scraping: - -```python -# Good: Handles re-scraping without duplicates -db.execute(""" - INSERT INTO images (lot_id, url, local_path, downloaded) - VALUES (?, ?, ?, 1) - ON CONFLICT(lot_id, url) DO UPDATE SET - local_path = excluded.local_path, - downloaded = 1 -""", (lot_id, img_url, local_path)) - -# Bad: Creates duplicates on re-scrape -db.execute(""" - INSERT INTO images (lot_id, url, local_path, downloaded) - VALUES (?, ?, ?, 1) -""", (lot_id, img_url, local_path)) -``` - -## 📊 Expected Outcomes - -### Before Refactor -```sql -SELECT COUNT(*) FROM images WHERE downloaded = 0; --- Result: 57,376,293 (57M+ undownloaded!) - -SELECT COUNT(*) FROM images WHERE local_path IS NOT NULL; --- Result: 0 (no files downloaded) -``` - -### After Refactor -```sql -SELECT COUNT(*) FROM images WHERE downloaded = 1; --- Result: ~16,807 (one per actual lot image) - -SELECT COUNT(*) FROM images WHERE local_path IS NOT NULL; --- Result: ~16,807 (all downloaded images have paths) - --- SQLite rejects COUNT(DISTINCT a, b), so count distinct pairs via a subquery -SELECT COUNT(*) FROM (SELECT DISTINCT lot_id, url FROM images); --- Result: ~16,807 (no duplicates!) 
-``` - -## 🚀 Deployment Checklist - -### Pre-Deployment -- [ ] Back up current database: `cp cache.db cache.db.backup` -- [ ] Verify disk space: At least 10GB free for images -- [ ] Test download function on 5 sample lots -- [ ] Verify `IMAGES_DIR` path exists and is writable - -### Deployment -- [ ] Update configuration: `DOWNLOAD_IMAGES = True` -- [ ] Run schema migration to add unique index -- [ ] Deploy updated scraper code -- [ ] Monitor first 100 lots for errors - -### Post-Deployment Verification -```sql --- Check download success rate -SELECT - COUNT(*) as total_images, - SUM(CASE WHEN downloaded = 1 THEN 1 ELSE 0 END) as downloaded, - SUM(CASE WHEN downloaded = 0 THEN 1 ELSE 0 END) as failed, - ROUND(100.0 * SUM(downloaded) / COUNT(*), 2) as success_rate -FROM images; - --- Check for duplicates (should be 0) -SELECT lot_id, url, COUNT(*) as dup_count -FROM images -GROUP BY lot_id, url -HAVING COUNT(*) > 1; - --- Verify file system -SELECT COUNT(*) FROM images -WHERE downloaded = 1 - AND local_path IS NOT NULL - AND local_path != ''; -``` - -## 🔍 Monitoring Process Impact - -The monitoring process (auctiora) will automatically: -- ✅ Stop downloading images (network I/O eliminated) -- ✅ Only run object detection on `local_path` files -- ✅ Query: `WHERE local_path IS NOT NULL AND (labels IS NULL OR labels = '')` -- ✅ Update only the `labels` and `processed_at` columns - -**No changes needed in monitoring process!** It's already updated to work with scraper-downloaded images. 
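The monitor-side contract described above can be sketched in Python, purely for illustration (the real monitor is the Java `auctiora` process). The function names `pending_images` and `save_labels` and the comma-separated label format are assumptions, not taken from the monitor's code; the `WHERE` clause and the two updated columns come from the list above.

```python
import sqlite3
import time


def pending_images(conn):
    """Return (id, local_path) rows that are on disk but not yet labeled."""
    return conn.execute(
        """
        SELECT id, local_path FROM images
        WHERE local_path IS NOT NULL
          AND (labels IS NULL OR labels = '')
        ORDER BY id
        """
    ).fetchall()


def save_labels(conn, image_id, labels):
    """Store detection results; only labels and processed_at are written."""
    # Assumption: labels are stored as one comma-separated string.
    conn.execute(
        "UPDATE images SET labels = ?, processed_at = ? WHERE id = ?",
        (",".join(labels), int(time.time()), image_id),
    )
    conn.commit()
```

Under this sketch, an image whose detection yields no labels keeps an empty `labels` value and is therefore selected again on the next run; whether that is desirable depends on how the monitor treats empty results.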
- -## 🐛 Troubleshooting - -### Problem: "No space left on device" -```bash -# Check disk usage -df -h /mnt/okcomputer/output/images - -# Estimate needed space: ~100KB per image -# 16,807 images × 100KB = ~1.6GB -``` - -### Problem: "Permission denied" when writing images -```bash -# Fix permissions -chmod 755 /mnt/okcomputer/output/images -chown -R scraper_user:scraper_group /mnt/okcomputer/output/images -``` - -### Problem: Images downloading but not recorded in DB -```python -# Add logging -import logging -logging.basicConfig(level=logging.INFO) - -def download_image(...): - logging.info(f"Downloading {image_url} to {local_path}") - # ... download code ... - logging.info(f"Saved to {local_path}, size: {os.path.getsize(local_path)} bytes") - return local_path -``` - -### Problem: Duplicate images after refactor -```sql --- Find duplicates -SELECT lot_id, url, COUNT(*) -FROM images -GROUP BY lot_id, url -HAVING COUNT(*) > 1; - --- Clean up duplicates (keep newest) -DELETE FROM images -WHERE id NOT IN ( - SELECT MAX(id) - FROM images - GROUP BY lot_id, url -); -``` - -## 📈 Performance Comparison - -| Metric | Before (Monitor Downloads) | After (Scraper Downloads) | -|----------------------|---------------------------------|---------------------------| -| **Image records** | 57,376,293 | ~16,807 | -| **Duplicates** | 57,359,486 (99.97%!) | 0 | -| **Network I/O** | Monitor process | Scraper process | -| **Disk usage** | 0 (URLs only) | ~1.6GB (actual files) | -| **Processing speed** | 500ms/image (download + detect) | 100ms/image (detect only) | -| **Error handling** | Complex (download failures) | Simple (files exist) | - -## 🎓 Code Examples by Language - -### Python (Most Likely) -See **Step 2** above for complete implementation. 
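One detail of Step 2 may be worth a second look: `download_image` takes the extension from `Path(image_url).suffix`, which can include query-string text when the URL looks like `photo.jpg?w=800`. The helper below is a minimal sketch of one way to avoid that; the name `image_extension` and the `ALLOWED_EXTENSIONS` set are hypothetical and not part of the scraper.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Assumption: the formats the image host is expected to serve.
ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp", ".gif"}


def image_extension(image_url, default=".jpg"):
    """Return a lowercase file extension for image_url, or default if unknown."""
    # urlparse separates the path from any query string or fragment.
    path = urlparse(image_url).path
    ext = PurePosixPath(path).suffix.lower()
    return ext if ext in ALLOWED_EXTENSIONS else default
```

In `download_image`, the line `ext = Path(image_url).suffix or '.jpg'` could then become `ext = image_extension(image_url)`, keeping the `001.jpg`, `002.jpg` naming unchanged.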
- -## 📚 References - -- **Current Scraper Architecture**: `wiki/ARCHITECTURE-TROOSTWIJK-SCRAPER.md` -- **Database Schema**: `wiki/DATABASE_ARCHITECTURE.md` -- **Monitor Changes**: See commit history for `ImageProcessingService.java`, `DatabaseService.java` - -## ✅ Success Criteria - -You'll know the refactor is successful when: - -1. ✅ Database query `SELECT COUNT(*) FROM images` returns ~16,807 (not 57M+) -2. ✅ All images have `downloaded = 1` and `local_path IS NOT NULL` -3. ✅ No duplicate records: `SELECT lot_id, url, COUNT(*) ... HAVING COUNT(*) > 1` returns 0 rows -4. ✅ Monitor logs show "Found N images needing detection" with reasonable numbers -5. ✅ Files exist at paths in `local_path` column -6. ✅ Monitor process speed increases (100ms vs 500ms per image) - ---- - -**Questions?** Check the troubleshooting section or inspect the monitor's updated code in: -- `src/main/java/auctiora/ImageProcessingService.java` -- `src/main/java/auctiora/DatabaseService.java:695-719` diff --git a/docs/TEST_SUITE_SUMMARY.md b/docs/TEST_SUITE_SUMMARY.md deleted file mode 100644 index 8e11e21..0000000 --- a/docs/TEST_SUITE_SUMMARY.md +++ /dev/null @@ -1,333 +0,0 @@ -# Test Suite Summary - -## Overview -Comprehensive test suite for Troostwijk Auction Monitor with individual test cases for every aspect of the system. - -## Configuration Updates - -### Paths Updated -- **Database**: `C:\mnt\okcomputer\output\cache.db` -- **Images**: `C:\mnt\okcomputer\output\images\{saleId}\{lotId}\` - -### Files Modified -1. `src/main/java/com/auction/Main.java` - Updated default database path -2. `src/main/java/com/auction/ImageProcessingService.java` - Updated image storage path - -## Test Files Created - -### 1. 
ScraperDataAdapterTest.java (13 test cases) -Tests data transformation from external scraper schema to monitor schema: - -- ✅ Extract numeric ID from text format (auction & lot IDs) -- ✅ Convert scraper auction format to AuctionInfo -- ✅ Handle simple location without country -- ✅ Convert scraper lot format to Lot -- ✅ Parse bid amounts from various formats (€, $, £, plain numbers) -- ✅ Handle missing/null fields gracefully -- ✅ Parse various timestamp formats (ISO, SQL) -- ✅ Handle invalid timestamps -- ✅ Extract type prefix from auction ID -- ✅ Handle GBP currency symbol -- ✅ Handle "No bids" text -- ✅ Parse complex lot IDs (A1-28505-5 → 285055) -- ✅ Validate field mapping (lots_count → lotCount, etc.) - -### 2. DatabaseServiceTest.java (15 test cases) -Tests database operations and SQLite persistence: - -- ✅ Create database schema successfully -- ✅ Insert and retrieve auction -- ✅ Update existing auction on conflict (UPSERT) -- ✅ Retrieve auctions by country code -- ✅ Insert and retrieve lot -- ✅ Update lot current bid -- ✅ Update lot notification flags -- ✅ Insert and retrieve image records -- ✅ Count total images -- ✅ Handle empty database gracefully -- ✅ Handle lots with null closing time -- ✅ Retrieve active lots -- ✅ Handle concurrent upserts (thread safety) -- ✅ Validate foreign key relationships -- ✅ Test database indexes performance - -### 3. ImageProcessingServiceTest.java (11 test cases) -Tests image downloading and processing pipeline: - -- ✅ Process images for lot with object detection -- ✅ Handle image download failure gracefully -- ✅ Create directory structure for images -- ✅ Save detected objects to database -- ✅ Handle empty image list -- ✅ Process pending images from database -- ✅ Skip lots that already have images -- ✅ Handle database errors during image save -- ✅ Handle empty detection results -- ✅ Handle lots with no existing images -- ✅ Capture and verify detection labels - -### 4. 
ObjectDetectionServiceTest.java (10 test cases) -Tests YOLO object detection functionality: - -- ✅ Initialize with missing YOLO models (disabled mode) -- ✅ Return empty list when detection is disabled -- ✅ Handle invalid image path gracefully -- ✅ Handle empty image file -- ✅ Initialize successfully with valid model files -- ✅ Handle missing class names file -- ✅ Detect when model files are missing -- ✅ Return unique labels only -- ✅ Handle multiple detections in same image -- ✅ Respect confidence threshold (0.5) - -### 5. NotificationServiceTest.java (19 test cases) -Tests desktop and email notification delivery: - -- ✅ Initialize with desktop-only configuration -- ✅ Initialize with SMTP configuration -- ✅ Reject invalid SMTP configuration format -- ✅ Reject unknown configuration type -- ✅ Send desktop notification without error -- ✅ Send high priority notification -- ✅ Send normal priority notification -- ✅ Handle notification when system tray not supported -- ✅ Send email notification with valid SMTP config -- ✅ Include both desktop and email when SMTP configured -- ✅ Handle empty message gracefully -- ✅ Handle very long message (1000+ chars) -- ✅ Handle special characters in message (€, ⚠️) -- ✅ Accept case-insensitive desktop config -- ✅ Validate SMTP config parts count -- ✅ Handle multiple rapid notifications -- ✅ Send bid change notification format -- ✅ Send closing alert notification format -- ✅ Send object detection notification format - -### 6. 
TroostwijkMonitorTest.java (12 test cases) -Tests monitoring orchestration and coordination: - -- ✅ Initialize monitor successfully -- ✅ Print database stats without error -- ✅ Process pending images without error -- ✅ Handle empty database gracefully -- ✅ Track lots in database -- ✅ Monitor lots closing soon (< 5 minutes) -- ✅ Identify lots with time remaining -- ✅ Handle lots without closing time -- ✅ Track notification status -- ✅ Update bid amounts -- ✅ Handle multiple concurrent lot updates -- ✅ Handle database with auctions and lots - -### 7. IntegrationTest.java (10 test cases) -Tests complete end-to-end workflows: - -- ✅ **Test 1**: Complete scraper data import workflow - - Import auction from scraper format - - Import multiple lots for auction - - Verify data integrity - -- ✅ **Test 2**: Image processing and detection workflow - - Add images for lots - - Run object detection - - Save labels to database - -- ✅ **Test 3**: Bid monitoring and notification workflow - - Simulate bid increase - - Update database - - Send notification - - Verify bid was updated - -- ✅ **Test 4**: Closing alert workflow - - Create lot closing soon - - Send high-priority notification - - Mark as notified - - Verify notification flag - -- ✅ **Test 5**: Multi-country auction filtering - - Add auctions from NL, RO, BE - - Filter by country code - - Verify filtering works correctly - -- ✅ **Test 6**: Complete monitoring cycle - - Print database statistics - - Process pending images - - Verify database integrity - -- ✅ **Test 7**: Data consistency across services - - Verify all auctions have valid data - - Verify all lots have valid data - - Check referential integrity - -- ✅ **Test 8**: Object detection value estimation workflow - - Create lot with detected objects - - Add images with labels - - Analyze detected objects - - Send value estimation notification - -- ✅ **Test 9**: Handle rapid concurrent updates - - Concurrent auction insertions - - Concurrent lot insertions - - Verify all 
data persisted correctly - -- ✅ **Test 10**: End-to-end notification scenarios - - Bid change notification - - Closing alert - - Object detection notification - - Value estimate notification - - Viewing day reminder - -## Test Coverage Summary - -| Component | Test Cases | Coverage Areas | -|-----------|-----------|----------------| -| **ScraperDataAdapter** | 13 | Data transformation, ID parsing, currency parsing, timestamp parsing | -| **DatabaseService** | 15 | CRUD operations, concurrency, foreign keys, indexes | -| **ImageProcessingService** | 11 | Download, detection integration, error handling | -| **ObjectDetectionService** | 10 | YOLO initialization, detection, confidence threshold | -| **NotificationService** | 19 | Desktop/Email, priority levels, special chars, formats | -| **TroostwijkMonitor** | 12 | Orchestration, monitoring, bid tracking, alerts | -| **Integration** | 10 | End-to-end workflows, multi-service coordination | -| **TOTAL** | **90** | **Complete system coverage** | - -## Key Testing Patterns - -### 1. Isolation Testing -Each component tested independently with mocks: -```java -mockDb = mock(DatabaseService.class); -mockDetector = mock(ObjectDetectionService.class); -service = new ImageProcessingService(mockDb, mockDetector); -``` - -### 2. Integration Testing -Components tested together for realistic scenarios: -```java -db → imageProcessor → detector → notifier -``` - -### 3. Concurrency Testing -Thread safety verified with parallel operations: -```java -Thread t1 = new Thread(() -> db.upsertLot(...)); -Thread t2 = new Thread(() -> db.upsertLot(...)); -t1.start(); t2.start(); -``` - -### 4. 
Error Handling -Graceful degradation tested throughout: -```java -assertDoesNotThrow(() -> service.process(invalidInput)); -``` - -## Running the Tests - -### Run All Tests -```bash -mvn test -``` - -### Run Specific Test Class -```bash -mvn test -Dtest=ScraperDataAdapterTest -mvn test -Dtest=IntegrationTest -``` - -### Run Single Test Method -```bash -mvn test -Dtest=IntegrationTest#testCompleteScraperImportWorkflow -``` - -### Generate Coverage Report -```bash -mvn jacoco:prepare-agent test jacoco:report -``` - -## Test Data Cleanup -All tests use temporary databases that are automatically cleaned up: -```java -@AfterAll -void tearDown() throws Exception { - Files.deleteIfExists(Paths.get(testDbPath)); -} -``` - -## Integration Scenarios Covered - -### Scenario 1: New Auction Discovery -1. External scraper finds new auction -2. Data imported via ScraperDataAdapter -3. Lots added to database -4. Images downloaded -5. Object detection runs -6. Notification sent to user - -### Scenario 2: Bid Monitoring -1. Monitor checks API every hour -2. Detects bid increase -3. Updates database -4. Sends notification -5. User can place counter-bid - -### Scenario 3: Closing Alert -1. Monitor checks closing times -2. Lot closing in < 5 minutes -3. High-priority notification sent -4. Flag updated to prevent duplicates -5. User can place final bid - -### Scenario 4: Value Estimation -1. Images downloaded -2. YOLO detects objects -3. Labels saved to database -4. Value estimated (future feature) -5. 
Notification sent with estimate - -## Dependencies Required for Tests - -```xml -<dependencies> - - <dependency> - <groupId>org.junit.jupiter</groupId> - <artifactId>junit-jupiter</artifactId> - <version>5.10.0</version> - <scope>test</scope> - </dependency> - - - <dependency> - <groupId>org.mockito</groupId> - <artifactId>mockito-core</artifactId> - <version>5.5.0</version> - <scope>test</scope> - </dependency> - - - <dependency> - <groupId>org.mockito</groupId> - <artifactId>mockito-junit-jupiter</artifactId> - <version>5.5.0</version> - <scope>test</scope> - </dependency> -</dependencies> -``` - -## Notes - -- All tests are independent and can run in any order -- Tests use in-memory or temporary databases -- No actual HTTP requests made (except in integration tests) -- YOLO models are optional (tests work in disabled mode) -- Notifications are tested but may not display in headless environments -- Tests document expected behavior for each component - -## Future Test Enhancements - -1. **Mock HTTP Server** for realistic image download testing -2. **Test Containers** for full database integration -3. **Performance Tests** for large datasets (1000+ auctions) -4. **Stress Tests** for concurrent monitoring scenarios -5. **UI Tests** for notification display (if GUI added) -6. **API Tests** for Troostwijk API integration -7. **Value Estimation** tests (when algorithm implemented) diff --git a/docs/WORKFLOW_GUIDE.md b/docs/WORKFLOW_GUIDE.md deleted file mode 100644 index 4e8f968..0000000 --- a/docs/WORKFLOW_GUIDE.md +++ /dev/null @@ -1,537 +0,0 @@ -## Troostwijk Auction Monitor - Workflow Integration Guide - -Complete guide for running the auction monitoring system with scheduled workflows, cron jobs, and event-driven triggers. - ---- - -## Table of Contents - -1. [Overview](#overview) -2. [Running Modes](#running-modes) -3. [Workflow Orchestration](#workflow-orchestration) -4. [Windows Scheduling](#windows-scheduling) -5. [Event-Driven Triggers](#event-driven-triggers) -6. [Configuration](#configuration) -7. 
[Monitoring & Debugging](#monitoring--debugging) - ---- - -## Overview - -The Troostwijk Auction Monitor supports multiple execution modes: - -- **Workflow Mode** (Recommended): Continuous operation with built-in scheduling -- **Once Mode**: Single execution for external schedulers (Windows Task Scheduler, cron) -- **Legacy Mode**: Original monitoring approach -- **Status Mode**: Quick status check - ---- - -## Running Modes - -### 1. Workflow Mode (Default - Recommended) - -**Runs all workflows continuously with built-in scheduling.** - -```bash -# Windows -java -jar target\troostwijk-scraper-1.0-SNAPSHOT-jar-with-dependencies.jar workflow - -# Or simply (workflow is default) -java -jar target\troostwijk-scraper-1.0-SNAPSHOT-jar-with-dependencies.jar - -# Using batch script -run-workflow.bat -``` - -**What it does:** -- ✅ Imports scraper data every 30 minutes -- ✅ Processes images every 1 hour -- ✅ Monitors bids every 15 minutes -- ✅ Checks closing times every 5 minutes - -**Best for:** -- Production deployment -- Long-running services -- Development/testing - ---- - -### 2. Once Mode (For External Schedulers) - -**Runs complete workflow once and exits.** - -```bash -# Windows -java -jar target\troostwijk-scraper-1.0-SNAPSHOT-jar-with-dependencies.jar once - -# Using batch script -run-once.bat -``` - -**What it does:** -1. Imports scraper data -2. Processes pending images -3. Monitors bids -4. Checks closing times -5. Exits - -**Best for:** -- Windows Task Scheduler -- Cron jobs (Linux/Mac) -- Manual execution -- Testing - ---- - -### 3. Legacy Mode - -**Original monitoring approach (backward compatibility).** - -```bash -java -jar target\troostwijk-scraper-1.0-SNAPSHOT-jar-with-dependencies.jar legacy -``` - -**Best for:** -- Maintaining existing deployments -- Troubleshooting - ---- - -### 4. 
Status Mode - -**Shows current status and exits.** - -```bash -java -jar target\troostwijk-scraper-1.0-SNAPSHOT-jar-with-dependencies.jar status - -# Using batch script -check-status.bat -``` - -**Output:** -``` -📊 Workflow Status: - Running: No - Auctions: 25 - Lots: 150 - Images: 300 - Closing soon (< 30 min): 5 -``` - ---- - -## Workflow Orchestration - -The `WorkflowOrchestrator` coordinates 4 scheduled workflows: - -### Workflow 1: Scraper Data Import -**Frequency:** Every 30 minutes -**Purpose:** Import new auctions and lots from external scraper - -**Process:** -1. Import auctions from scraper database -2. Import lots from scraper database -3. Import image URLs -4. Send notification if significant data imported - -**Code Location:** `WorkflowOrchestrator.java:110` - ---- - -### Workflow 2: Image Processing -**Frequency:** Every 1 hour -**Purpose:** Download images and run object detection - -**Process:** -1. Get unprocessed images from database -2. Download each image -3. Run YOLO object detection -4. Save labels to database -5. Send notification for interesting detections (3+ objects) - -**Code Location:** `WorkflowOrchestrator.java:150` - ---- - -### Workflow 3: Bid Monitoring -**Frequency:** Every 15 minutes -**Purpose:** Check for bid changes and send notifications - -**Process:** -1. Get all active lots -2. Check for bid changes (via external scraper updates) -3. Send notifications for bid increases - -**Code Location:** `WorkflowOrchestrator.java:210` - -**Note:** The external scraper updates bids; this workflow monitors and notifies. - ---- - -### Workflow 4: Closing Alerts -**Frequency:** Every 5 minutes -**Purpose:** Send alerts for lots closing soon - -**Process:** -1. Get all active lots -2. Check closing times -3. Send high-priority notification for lots closing in < 5 min -4. 
Mark as notified to prevent duplicates - -**Code Location:** `WorkflowOrchestrator.java:240` - ---- - -## Windows Scheduling - -### Option A: Use Built-in Workflow Mode (Recommended) - -**Run as a Windows Service or startup application:** - -1. Create shortcut to `run-workflow.bat` -2. Place in: `C:\Users\[YourUser]\AppData\Roaming\Microsoft\Windows\Start Menu\Programs\Startup` -3. Monitor will start automatically on login - ---- - -### Option B: Windows Task Scheduler (Once Mode) - -**Automated setup:** - -```powershell -# Run PowerShell as Administrator -.\setup-windows-task.ps1 -``` - -This creates two tasks: -- `TroostwijkMonitor-Workflow`: Runs every 30 minutes -- `TroostwijkMonitor-StatusCheck`: Runs every 6 hours - -**Manual setup:** - -1. Open Task Scheduler -2. Create Basic Task -3. Configure: - - **Name:** `TroostwijkMonitor` - - **Trigger:** Every 30 minutes - - **Action:** Start a program - - **Program:** `java` - - **Arguments:** `-jar "C:\path\to\troostwijk-scraper.jar" once` - - **Start in:** `C:\path\to\project` - ---- - -### Option C: Multiple Scheduled Tasks (Fine-grained Control) - -Create separate tasks for each workflow: - -| Task | Frequency | Command | -|------|-----------|---------| -| Import Data | Every 30 min | `run-once.bat` | -| Process Images | Every 1 hour | `run-once.bat` | -| Check Bids | Every 15 min | `run-once.bat` | -| Closing Alerts | Every 5 min | `run-once.bat` | - ---- - -## Event-Driven Triggers - -The orchestrator supports event-driven execution: - -### 1. New Auction Discovered - -```java -orchestrator.onNewAuctionDiscovered(auctionInfo); -``` - -**Triggered when:** -- External scraper finds new auction - -**Actions:** -- Insert to database -- Send notification - ---- - -### 2. 
Bid Change Detected - -```java -orchestrator.onBidChange(lot, previousBid, newBid); -``` - -**Triggered when:** -- Bid increases on monitored lot - -**Actions:** -- Update database -- Send notification: "Nieuw bod op kavel X: €Y (was €Z)" - ---- - -### 3. Objects Detected - -```java -orchestrator.onObjectsDetected(lotId, labels); -``` - -**Triggered when:** -- YOLO detects 2+ objects in image - -**Actions:** -- Send notification: "Lot X contains: car, truck, machinery" - ---- - -## Configuration - -### Environment Variables - -```bash -# Database location -set DATABASE_FILE=C:\mnt\okcomputer\output\cache.db - -# Notification configuration -set NOTIFICATION_CONFIG=desktop - -# Or for email notifications -set NOTIFICATION_CONFIG=smtp:your@gmail.com:app_password:recipient@example.com -``` - -### Configuration Files - -**YOLO Model Paths** (`Main.java:35-37`): -```java -String yoloCfg = "models/yolov4.cfg"; -String yoloWeights = "models/yolov4.weights"; -String yoloClasses = "models/coco.names"; -``` - -### Customizing Schedules - -Edit `WorkflowOrchestrator.java` to change frequencies: - -```java -// Change from 30 minutes to 15 minutes -scheduler.scheduleAtFixedRate(() -> { - // ... scraper import logic -}, 0, 15, TimeUnit.MINUTES); // Changed from 30 -``` - ---- - -## Monitoring & Debugging - -### Check Status - -```bash -# Quick status check -java -jar troostwijk-monitor.jar status - -# Or -check-status.bat -``` - -### View Logs - -Workflows print timestamped logs: - -``` -📥 [WORKFLOW 1] Importing scraper data... - → Imported 5 auctions - → Imported 25 lots - → Found 50 unprocessed images - ✓ Scraper import completed in 1250ms - -🖼️ [WORKFLOW 2] Processing pending images... - → Processing 50 images - ✓ Processed 50 images, detected objects in 12 (15.3s) -``` - -### Common Issues - -#### 1. 
No data being imported - -**Problem:** External scraper not running - -**Solution:** -```bash -# Check if scraper is running and populating database -sqlite3 C:\mnt\okcomputer\output\cache.db "SELECT COUNT(*) FROM auctions;" -``` - -#### 2. Images not downloading - -**Problem:** No internet connection or invalid URLs - -**Solution:** -- Check network connectivity -- Verify image URLs in database -- Check firewall settings - -#### 3. Notifications not showing - -**Problem:** System tray not available - -**Solution:** -- Use email notifications instead -- Check notification permissions in Windows - -#### 4. Workflows not running - -**Problem:** Application crashed or was stopped - -**Solution:** -- Check Task Scheduler logs -- Review application logs -- Restart in workflow mode - ---- - -## Integration Examples - -### Example 1: Complete Automated Workflow - -**Setup:** -1. External scraper runs continuously, populating database -2. This monitor runs in workflow mode -3. Notifications sent to desktop + email - -**Result:** -- New auctions → Notification within 30 min -- New images → Processed within 1 hour -- Bid changes → Notification within 15 min -- Closing alerts → Notification within 5 min - ---- - -### Example 2: On-Demand Processing - -**Setup:** -1. External scraper runs once per day (cron/Task Scheduler) -2. This monitor runs in once mode after scraper completes - -**Script:** -```bash -# run-daily.bat -@echo off -REM Run scraper first -python scraper.py - -REM Wait for completion -timeout /t 30 - -REM Run monitor once -java -jar troostwijk-monitor.jar once -``` - ---- - -### Example 3: Event-Driven with External Integration - -**Setup:** -1. External system calls orchestrator events -2. 
Workflows run on-demand
-
-**Java code:**
-```java
-WorkflowOrchestrator orchestrator = new WorkflowOrchestrator(...);
-
-// When external scraper finds new auction
-AuctionInfo newAuction = parseScraperData();
-orchestrator.onNewAuctionDiscovered(newAuction);
-
-// When bid detected
-orchestrator.onBidChange(lot, 100.0, 150.0);
-```
-
----
-
-## Advanced Topics
-
-### Custom Workflows
-
-Add custom workflows to `WorkflowOrchestrator`:
-
-```java
-// Workflow 5: Value Estimation (every 2 hours)
-scheduler.scheduleAtFixedRate(() -> {
-    try {
-        Console.println("💰 [WORKFLOW 5] Estimating values...");
-
-        var lotsWithImages = db.getLotsWithImages();
-        for (var lot : lotsWithImages) {
-            var images = db.getImagesForLot(lot.lotId());
-            double estimatedValue = estimateValue(images);
-
-            // Update database
-            db.updateLotEstimatedValue(lot.lotId(), estimatedValue);
-
-            // Notify if high value
-            if (estimatedValue > 5000) {
-                notifier.sendNotification(
-                    String.format("High value lot detected: %d (€%.2f)",
-                        lot.lotId(), estimatedValue),
-                    "Value Alert", 1
-                );
-            }
-        }
-    } catch (Exception e) {
-        Console.println(" ❌ Value estimation failed: " + e.getMessage());
-    }
-}, 10, 120, TimeUnit.MINUTES);
-```
-
-### Webhook Integration
-
-Trigger workflows via HTTP webhooks:
-
-```java
-// In a separate web server (e.g., using Javalin)
-Javalin app = Javalin.create().start(7070);
-
-app.post("/webhook/new-auction", ctx -> {
-    AuctionInfo auction = ctx.bodyAsClass(AuctionInfo.class);
-    orchestrator.onNewAuctionDiscovered(auction);
-    ctx.result("OK");
-});
-
-app.post("/webhook/bid-change", ctx -> {
-    BidChange change = ctx.bodyAsClass(BidChange.class);
-    orchestrator.onBidChange(change.lot, change.oldBid, change.newBid);
-    ctx.result("OK");
-});
-```
-
----
-
-## Summary
-
-| Mode | Use Case | Scheduling | Best For |
-|------|----------|------------|----------|
-| **workflow** | Continuous operation | Built-in (Java) | Production, development |
-| **once** | Single execution | 
External (Task Scheduler) | Cron jobs, on-demand |
-| **legacy** | Backward compatibility | Built-in (Java) | Existing deployments |
-| **status** | Quick check | Manual/External | Health checks, debugging |
-
-**Recommended Setup for Windows:**
-1. Install as Windows Service OR
-2. Add to Startup folder (workflow mode) OR
-3. Use Task Scheduler (once mode, every 30 min)
-
-**All workflows automatically:**
-- Import data from scraper
-- Process images
-- Detect objects
-- Monitor bids
-- Send notifications
-- Handle errors gracefully
-
----
-
-## Support
-
-For issues or questions:
-- Check `TEST_SUITE_SUMMARY.md` for test coverage
-- Review code in `WorkflowOrchestrator.java`
-- Run `java -jar troostwijk-monitor.jar status` for diagnostics