# Database Architecture ## Overview The Auctiora auction monitoring system uses **SQLite** as its database engine, shared between the scraper process and the monitor application for simplicity and performance. ## Current State (Dec 2025) - **Database**: `C:\mnt\okcomputer\output\cache.db` - **Size**: 1.6 GB - **Records**: 16,006 lots, 536,502 images - **Concurrent Processes**: 2 (scraper + monitor) - **Access Pattern**: Scraper writes, Monitor reads + occasional updates ## Why SQLite? ### ✅ Advantages for This Use Case 1. **Embedded Architecture** - No separate database server to manage - Zero network latency (local file access) - Perfect for single-machine scraping + monitoring 2. **Excellent Read Performance** - Monitor performs mostly SELECT queries - Well-indexed access by `lot_id`, `url`, `auction_id` - Sub-millisecond query times for simple lookups 3. **Simplicity** - Single file database - Automatic backup via file copy - No connection pooling or authentication overhead 4. **Proven Scalability** - Tested up to 281 TB database size - 1.6 GB is only 0.0006% of capacity - Handles billions of rows efficiently 5. **WAL Mode for Concurrency** - Multiple readers don't block each other - Readers don't block writers - Writers don't block readers - Perfect for scraper + monitor workload ## Configuration ### Connection String (DatabaseService.java:28) ```java jdbc:sqlite:C:\mnt\okcomputer\output\cache.db?journal_mode=WAL&busy_timeout=10000 ``` ### Key PRAGMAs (DatabaseService.java:38-40) ```sql PRAGMA journal_mode=WAL; -- Write-Ahead Logging for concurrency PRAGMA busy_timeout=10000; -- 10s retry on lock contention PRAGMA synchronous=NORMAL; -- Balance safety and performance ``` ### What These Settings Do | Setting | Purpose | Impact | |---------|---------|--------| | `journal_mode=WAL` | Write-Ahead Logging | Enables concurrent read/write access | | `busy_timeout=10000` | Wait 10s on lock | Prevents immediate `SQLITE_BUSY` errors | | `synchronous=NORMAL` | Balanced sync mode | Faster writes, still crash-safe | ## Schema Integration ### Scraper Schema (Read-Only for Monitor) ```sql CREATE TABLE lots ( lot_id TEXT PRIMARY KEY, auction_id TEXT, url TEXT UNIQUE, -- ⚠️ Enforced by scraper title TEXT, current_bid TEXT, closing_time TEXT, manufacturer TEXT, type TEXT, year INTEGER, currency TEXT DEFAULT 'EUR', closing_notified INTEGER DEFAULT 0, ... ) ``` ### Monitor Schema (Tables Created by Monitor) ```sql CREATE TABLE images ( id INTEGER PRIMARY KEY AUTOINCREMENT, lot_id INTEGER, url TEXT, local_path TEXT, labels TEXT, -- Object detection results processed_at INTEGER, FOREIGN KEY (lot_id) REFERENCES lots(lot_id) ) ``` ### Handling Schema Conflicts **Problem**: Scraper has `UNIQUE` constraint on `lots.url` **Solution** (DatabaseService.java:361-424): ```java // Try UPDATE first UPDATE lots SET ... WHERE lot_id = ? // If no rows updated, INSERT OR IGNORE INSERT OR IGNORE INTO lots (...) VALUES (...) ``` This approach: - ✅ Updates existing lots by `lot_id` - ✅ Skips inserts that violate UNIQUE constraints - ✅ No crashes on re-imports or duplicate URLs ## Performance Characteristics ### Current Performance - Simple SELECT by ID: <1ms - Full table scan (16K lots): ~50ms - Image INSERT: <5ms - Concurrent operations: No blocking observed ### Scalability Projections | Metric | Current | 1 Year | 3 Years | SQLite Limit | |--------|---------|--------|---------|--------------| | Lots | 16K | 365K | 1M | 1B+ rows | | Images | 536K | 19M | 54M | 1B+ rows | | DB Size | 1.6GB | 36GB | 100GB | 281TB | | Queries | <1ms | <5ms | <20ms | Depends on indexes | ## When to Migrate to PostgreSQL/MySQL ### 🚨 Migration Triggers Consider migrating if you encounter **any** of these: 1. **Concurrency Limits** - >5 concurrent writers needed - Frequent `SQLITE_BUSY` errors despite WAL mode - Need for distributed access across multiple servers 2. **Performance Degradation** - Database >50GB AND queries >1s for simple SELECTs - Complex JOIN queries become bottleneck - Index sizes exceed available RAM 3. **Operational Requirements** - Need for replication (master/slave) - Geographic distribution required - High availability / failover needed - Remote access from multiple locations 4. **Advanced Features** - Full-text search on large text fields - Complex analytical queries (window functions, CTEs) - User management and fine-grained permissions - Connection pooling for web applications ### Migration Path (If Needed) 1. **Choose Database**: PostgreSQL (recommended) or MySQL 2. **Schema Export**: Use SQLite `.schema` command 3. **Data Migration**: Use `sqlite3-to-postgres` or custom scripts 4. **Update Connection**: Change JDBC URL in `application.properties` 5. **Update Queries**: Fix SQL dialect differences 6. **Performance Tuning**: Create appropriate indexes Example PostgreSQL configuration: ```properties # application.properties auction.database.url=jdbc:postgresql://localhost:5432/auctiora auction.database.username=monitor auction.database.password=${DB_PASSWORD} ``` ## Current Recommendation: ✅ **Stick with SQLite** ### Rationale 1. **Sufficient Capacity**: 1.6GB is 0.0006% of SQLite's limit 2. **Excellent Performance**: Sub-millisecond queries 3. **Simple Operations**: No complex transactions or analytics 4. **Low Concurrency**: Only 2 processes (scraper + monitor) 5. **Local Architecture**: No need for network DB access 6. **Zero Maintenance**: No DB server to manage or monitor ### Monitoring Dashboard Metrics Track these to know when to reconsider: ```sql -- Add to praetium.html dashboard SELECT (SELECT COUNT(*) FROM lots) as lot_count, (SELECT COUNT(*) FROM images) as image_count, (SELECT page_count * page_size FROM pragma_page_count(), pragma_page_size()) as db_size_bytes, (SELECT (page_count - freelist_count) * 100.0 / page_count FROM pragma_page_count(), pragma_freelist_count()) as db_utilization ``` **Review decision when**: - Database >20GB - Query times >500ms for simple lookups - More than 3 concurrent processes needed ## Backup Strategy ### Recommended Approach ```bash # Nightly backup via Windows Task Scheduler sqlite3 C:\mnt\okcomputer\output\cache.db ".backup C:\backups\cache_$(date +%Y%m%d).db" # Keep last 30 days forfiles /P C:\backups /M cache_*.db /D -30 /C "cmd /c del @path" ``` ### WAL File Management SQLite creates additional files in WAL mode: - `cache.db` - Main database - `cache.db-wal` - Write-Ahead Log - `cache.db-shm` - Shared memory **Important**: Backup all three files together for consistency. ## Integration Points ### Scraper Process - **Writes**: INSERT new lots, auctions, images - **Schema Owner**: Creates tables, enforces constraints - **Frequency**: Continuous (every 30 minutes) ### Monitor Process (Auctiora) - **Reads**: SELECT lots, auctions for monitoring - **Writes**: UPDATE bid amounts, notification flags; INSERT image processing results - **Schema**: Adds `images` table for object detection - **Frequency**: Every 15 seconds (dashboard refresh) ### Conflict Resolution | Conflict | Strategy | Implementation | |----------|----------|----------------| | Duplicate lot_id | UPDATE instead of INSERT | DatabaseService.upsertLot() | | Duplicate URL | INSERT OR IGNORE | Silent skip | | Oversized IDs (>Long.MAX_VALUE) | Return 0L, skip import | ScraperDataAdapter.extractNumericId() | | Invalid timestamps | Try-catch, log, continue | DatabaseService.getAllAuctions() | | Database locked | 10s busy_timeout + WAL | Connection string | ## References - [SQLite Documentation](https://www.sqlite.org/docs.html) - [WAL Mode](https://www.sqlite.org/wal.html) - [SQLite Limits](https://www.sqlite.org/limits.html) - [When to Use SQLite](https://www.sqlite.org/whentouse.html)