Database Architecture
Overview
The Auctiora auction monitoring system uses SQLite as its database engine, shared between the scraper process and the monitor application for simplicity and performance.
Current State (Dec 2025)
- Database: C:\mnt\okcomputer\output\cache.db
- Size: 1.6 GB
- Records: 16,006 lots, 536,502 images
- Concurrent Processes: 2 (scraper + monitor)
- Access Pattern: Scraper writes, Monitor reads + occasional updates
Why SQLite?
✅ Advantages for This Use Case
- Embedded Architecture
  - No separate database server to manage
  - Zero network latency (local file access)
  - Perfect for single-machine scraping + monitoring
- Excellent Read Performance
  - Monitor performs mostly SELECT queries
  - Well-indexed access by lot_id, url, auction_id
  - Sub-millisecond query times for simple lookups
- Simplicity
  - Single file database
  - Simple backups (one consistent file via the sqlite3 .backup command)
  - No connection pooling or authentication overhead
- Proven Scalability
  - Theoretical maximum database size of about 281 TB
  - 1.6 GB is only 0.0006% of capacity
  - Handles billions of rows efficiently
- WAL Mode for Concurrency
  - Multiple readers don't block each other
  - Readers don't block writers
  - Writers don't block readers
  - Only one writer at a time; a second writer waits up to busy_timeout
  - Perfect for scraper + monitor workload
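The reader/writer behaviour described above can be observed with two connections to the same file. The following is a minimal sketch in Python (chosen only because its standard library ships SQLite; the project itself uses Java), with a throwaway database file and a simplified, hypothetical lots table:

```python
import os
import sqlite3
import tempfile

# Throwaway database file; WAL mode needs a real file, not ":memory:".
path = os.path.join(tempfile.mkdtemp(), "demo.db")

# isolation_level=None means autocommit: transactions are opened and
# closed explicitly with BEGIN/COMMIT below.
writer = sqlite3.connect(path, isolation_level=None)
mode = writer.execute("PRAGMA journal_mode=WAL").fetchall()[0][0]
writer.execute("CREATE TABLE lots (lot_id TEXT PRIMARY KEY, title TEXT)")

reader = sqlite3.connect(path, isolation_level=None)


def count_lots(conn):
    """Run a complete SELECT so no read transaction stays open afterwards."""
    return conn.execute("SELECT COUNT(*) FROM lots").fetchall()[0][0]


# The "scraper" opens a write transaction and leaves it uncommitted.
writer.execute("BEGIN IMMEDIATE")
writer.execute("INSERT INTO lots VALUES ('lot-1', 'Forklift')")

# The "monitor" is not blocked; it reads the last committed snapshot.
count_during_write = count_lots(reader)

writer.execute("COMMIT")
count_after_commit = count_lots(reader)

print(mode, count_during_write, count_after_commit)
writer.close()
reader.close()
```

The reader never waits on the open write transaction; it simply does not see the new row until the writer commits.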
Configuration
Connection String (DatabaseService.java:28)
jdbc:sqlite:C:\mnt\okcomputer\output\cache.db?journal_mode=WAL&busy_timeout=10000
Key PRAGMAs (DatabaseService.java:38-40)
PRAGMA journal_mode=WAL; -- Write-Ahead Logging for concurrency
PRAGMA busy_timeout=10000; -- 10s retry on lock contention
PRAGMA synchronous=NORMAL; -- Balance safety and performance
What These Settings Do
| Setting | Purpose | Impact |
|---|---|---|
| journal_mode=WAL | Write-Ahead Logging | Enables concurrent read/write access |
| busy_timeout=10000 | Wait up to 10s on lock | Prevents immediate SQLITE_BUSY errors |
| synchronous=NORMAL | Balanced sync mode | Faster writes; the database stays consistent after a crash, but the most recent commits can be lost on power failure |
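To check that the settings actually took effect, each PRAGMA can be read back after it is set. This is a rough sketch using Python's built-in sqlite3 module rather than the project's JDBC code; open_db is a hypothetical helper and the database path is a temporary file:

```python
import os
import sqlite3
import tempfile


def open_db(path: str) -> sqlite3.Connection:
    """Open a connection and apply the three PRAGMAs from this section."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")    # persistent: stored in the database file
    conn.execute("PRAGMA busy_timeout=10000")  # per connection: set on every open
    conn.execute("PRAGMA synchronous=NORMAL")  # per connection as well
    return conn


conn = open_db(os.path.join(tempfile.mkdtemp(), "demo.db"))

# Read each setting back; synchronous is reported as a number (1 = NORMAL).
settings = {
    name: conn.execute(f"PRAGMA {name}").fetchall()[0][0]
    for name in ("journal_mode", "busy_timeout", "synchronous")
}
print(settings)
conn.close()
```

Because busy_timeout and synchronous apply per connection, both the scraper and the monitor need to set them on their own connections; only journal_mode is remembered by the database file.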
Schema Integration
Scraper Schema (Read-Only for Monitor)
CREATE TABLE lots (
lot_id TEXT PRIMARY KEY,
auction_id TEXT,
url TEXT UNIQUE, -- ⚠️ Enforced by scraper
title TEXT,
current_bid TEXT,
closing_time TEXT,
manufacturer TEXT,
type TEXT,
year INTEGER,
currency TEXT DEFAULT 'EUR',
closing_notified INTEGER DEFAULT 0,
...
)
Monitor Schema (Tables Created by Monitor)
CREATE TABLE images (
id INTEGER PRIMARY KEY AUTOINCREMENT,
lot_id INTEGER,
url TEXT,
file_path TEXT,
labels TEXT, -- Object detection results
processed_at INTEGER,
FOREIGN KEY (lot_id) REFERENCES lots(lot_id)
)
Handling Schema Conflicts
Problem: Scraper has UNIQUE constraint on lots.url
Solution (DatabaseService.java:361-424):
// Try UPDATE first
UPDATE lots SET ... WHERE lot_id = ?
// If no rows updated, INSERT OR IGNORE
INSERT OR IGNORE INTO lots (...) VALUES (...)
This approach:
- ✅ Updates existing lots by lot_id
- ✅ Skips inserts that violate UNIQUE constraints
- ✅ No crashes on re-imports or duplicate URLs
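The UPDATE-then-INSERT OR IGNORE pattern can be sketched end to end as follows. This is an illustration in Python, not the code in DatabaseService.java; upsert_lot, the three-column table, and the example URLs are assumptions made for the demo:

```python
import sqlite3


def upsert_lot(conn, lot_id, url, title):
    """UPDATE by lot_id first; INSERT OR IGNORE only if nothing was updated.

    Returns "updated", "inserted", or "ignored".
    """
    cur = conn.execute("UPDATE lots SET title = ? WHERE lot_id = ?", (title, lot_id))
    if cur.rowcount > 0:
        return "updated"
    cur = conn.execute(
        "INSERT OR IGNORE INTO lots (lot_id, url, title) VALUES (?, ?, ?)",
        (lot_id, url, title),
    )
    # rowcount is 0 when the insert was skipped by a UNIQUE/PRIMARY KEY conflict.
    return "inserted" if cur.rowcount > 0 else "ignored"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lots (lot_id TEXT PRIMARY KEY, url TEXT UNIQUE, title TEXT)")

results = [
    upsert_lot(conn, "L1", "https://example.com/l/1", "Forklift"),        # new lot
    upsert_lot(conn, "L1", "https://example.com/l/1", "Forklift (rev)"),  # same lot_id
    upsert_lot(conn, "L2", "https://example.com/l/1", "Duplicate URL"),   # url clash
]
print(results)
conn.close()
```

The sketch deliberately leaves url out of the UPDATE: changing a lot's url to one already used by another row would still raise a UNIQUE violation, which INSERT OR IGNORE does not cover.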
Performance Characteristics
Current Performance
- Simple SELECT by ID: <1ms
- Full table scan (16K lots): ~50ms
- Image INSERT: <5ms
- Concurrent operations: No blocking observed
Scalability Projections
| Metric | Current | 1 Year | 3 Years | SQLite Limit |
|---|---|---|---|---|
| Lots | 16K | 365K | 1M | 1B+ rows |
| Images | 536K | 19M | 54M | 1B+ rows |
| DB Size | 1.6GB | 36GB | 100GB | 281TB |
| Queries | <1ms | <5ms | <20ms | Depends on indexes |
When to Migrate to PostgreSQL/MySQL
🚨 Migration Triggers
Consider migrating if you encounter any of these:
- Concurrency Limits
  - More than 5 concurrent writers needed
  - Frequent SQLITE_BUSY errors despite WAL mode
  - Need for distributed access across multiple servers
- Performance Degradation
  - Database >50GB AND queries >1s for simple SELECTs
  - Complex JOIN queries become bottleneck
  - Index sizes exceed available RAM
- Operational Requirements
  - Need for replication (master/slave)
  - Geographic distribution required
  - High availability / failover needed
  - Remote access from multiple locations
- Advanced Features
  - Full-text search beyond what SQLite's FTS5 extension provides
  - Complex analytical queries (window functions, CTEs) that SQLite runs too slowly at this data size
  - User management and fine-grained permissions
  - Connection pooling for web applications
Migration Path (If Needed)
- Choose Database: PostgreSQL (recommended) or MySQL
- Schema Export: Use the SQLite .schema command
- Data Migration: Use sqlite3-to-postgres or custom scripts
- Update Connection: Change JDBC URL in application.properties
- Update Queries: Fix SQL dialect differences
- Performance Tuning: Create appropriate indexes
Example PostgreSQL configuration:
# application.properties
auction.database.url=jdbc:postgresql://localhost:5432/auctiora
auction.database.username=monitor
auction.database.password=${DB_PASSWORD}
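For the schema-export step, the same information the .schema command prints can be read from the sqlite_master table. The sketch below is one possible approach in Python; export_schema, the index name, and the reduced lots table are illustrative assumptions:

```python
import sqlite3


def export_schema(conn):
    """Return the CREATE statements stored in sqlite_master, in creation order."""
    rows = conn.execute(
        "SELECT sql FROM sqlite_master "
        "WHERE sql IS NOT NULL AND name NOT LIKE 'sqlite_%' "
        "ORDER BY rowid"
    ).fetchall()
    # Auto-created indexes (for PRIMARY KEY / UNIQUE) have sql = NULL and are skipped.
    return [sql + ";" for (sql,) in rows]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE lots (lot_id TEXT PRIMARY KEY, auction_id TEXT, url TEXT UNIQUE)"
)
conn.execute("CREATE INDEX idx_lots_auction ON lots(auction_id)")

statements = export_schema(conn)
for statement in statements:
    print(statement)
conn.close()
```

The exported DDL is SQLite dialect, so it still needs manual review before loading into PostgreSQL; for example, AUTOINCREMENT is not PostgreSQL syntax.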
Current Recommendation: ✅ Stick with SQLite
Rationale
- Sufficient Capacity: 1.6GB is 0.0006% of SQLite's limit
- Excellent Performance: Sub-millisecond queries
- Simple Operations: No complex transactions or analytics
- Low Concurrency: Only 2 processes (scraper + monitor)
- Local Architecture: No need for network DB access
- Zero Maintenance: No DB server to manage or monitor
Monitoring Dashboard Metrics
Track these to know when to reconsider:
-- Add to praetium.html dashboard
SELECT
(SELECT COUNT(*) FROM lots) as lot_count,
(SELECT COUNT(*) FROM images) as image_count,
(SELECT page_count * page_size FROM pragma_page_count(), pragma_page_size()) as db_size_bytes,
(SELECT (page_count - freelist_count) * 100.0 / page_count FROM pragma_page_count(), pragma_freelist_count()) as db_utilization
Review decision when:
- Database >20GB
- Query times >500ms for simple lookups
- More than 3 concurrent processes needed
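The review thresholds above can be turned into a simple automated check. This is a sketch under assumptions: collect_metrics and review_reasons are hypothetical helpers, "GB" is taken as 10^9 bytes, and lookup time and process count are supplied by the caller rather than measured:

```python
import sqlite3

GB = 1000 ** 3  # assumption: the thresholds in this document use decimal gigabytes


def collect_metrics(conn):
    """Run a reduced version of the dashboard query and return a dict."""
    lot_count, image_count, size = conn.execute(
        """
        SELECT
          (SELECT COUNT(*) FROM lots),
          (SELECT COUNT(*) FROM images),
          (SELECT page_count * page_size
             FROM pragma_page_count(), pragma_page_size())
        """
    ).fetchall()[0]
    return {"lot_count": lot_count, "image_count": image_count, "db_size_bytes": size}


def review_reasons(metrics, lookup_ms, process_count):
    """Return the list of review triggers that currently apply."""
    reasons = []
    if metrics["db_size_bytes"] > 20 * GB:
        reasons.append("database larger than 20GB")
    if lookup_ms > 500:
        reasons.append("simple lookups slower than 500ms")
    if process_count > 3:
        reasons.append("more than 3 concurrent processes")
    return reasons


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lots (lot_id TEXT PRIMARY KEY)")
conn.execute("CREATE TABLE images (id INTEGER PRIMARY KEY, lot_id TEXT)")

metrics = collect_metrics(conn)
ok_now = review_reasons(metrics, lookup_ms=0.8, process_count=2)
with_more_processes = review_reasons(metrics, lookup_ms=0.8, process_count=4)
print(metrics, ok_now, with_more_processes)
conn.close()
```

An empty list means none of the triggers apply and the SQLite recommendation stands.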
Backup Strategy
Recommended Approach
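rem Nightly backup: save as backup.bat and run via Windows Task Scheduler
for /f %%i in ('powershell -NoProfile -Command "Get-Date -Format yyyyMMdd"') do set TODAY=%%i
sqlite3 C:\mnt\okcomputer\output\cache.db ".backup C:\backups\cache_%TODAY%.db"
rem Keep last 30 days
forfiles /P C:\backups /M cache_*.db /D -30 /C "cmd /c del @path"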
WAL File Management
SQLite creates additional files in WAL mode:
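- cache.db - Main database
- cache.db-wal - Write-Ahead Log
- cache.db-shm - Shared memory
Important: The .backup command produces a single consistent file on its own. If you copy the files manually instead, copy all three together while no process is writing.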
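SQLite's online backup API, which the .backup command uses, is also available programmatically. The sketch below shows it through Python's sqlite3 module; backup_database and the temporary file names are illustrative, and the source connection is kept open to mimic a running scraper:

```python
import os
import sqlite3
import tempfile


def backup_database(src_path, dst_path):
    """Copy a live SQLite database into a single consistent file."""
    src = sqlite3.connect(src_path)
    dst = sqlite3.connect(dst_path)
    try:
        src.backup(dst)  # online backup API (Python 3.7+)
    finally:
        dst.close()
        src.close()


workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, "cache.db")
dst_path = os.path.join(workdir, "cache_backup.db")

# Simulate the running scraper: a WAL-mode database with an open connection.
live = sqlite3.connect(src_path)
live.execute("PRAGMA journal_mode=WAL")
live.execute("CREATE TABLE lots (lot_id TEXT PRIMARY KEY)")
live.execute("INSERT INTO lots VALUES ('lot-1')")
live.commit()

backup_database(src_path, dst_path)

# The backup is readable on its own, without the source's -wal/-shm files.
check = sqlite3.connect(dst_path)
copied_rows = check.execute("SELECT COUNT(*) FROM lots").fetchall()[0][0]
check.close()
live.close()
print(copied_rows)
```

Committed rows that still sit in the source's WAL file are included, which is what a plain copy of cache.db alone would miss.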
Integration Points
Scraper Process
- Writes: INSERT new lots, auctions, images
- Schema Owner: Creates tables, enforces constraints
- Frequency: Continuous (every 30 minutes)
Monitor Process (Auctiora)
- Reads: SELECT lots, auctions for monitoring
- Writes: UPDATE bid amounts, notification flags; INSERT image processing results
- Schema: Adds images table for object detection
- Frequency: Every 15 seconds (dashboard refresh)
Conflict Resolution
| Conflict | Strategy | Implementation |
|---|---|---|
| Duplicate lot_id | UPDATE instead of INSERT | DatabaseService.upsertLot() |
| Duplicate URL | INSERT OR IGNORE | Silent skip |
| Oversized IDs (>Long.MAX_VALUE) | Return 0L, skip import | ScraperDataAdapter.extractNumericId() |
| Invalid timestamps | Try-catch, log, continue | DatabaseService.getAllAuctions() |
| Database locked | 10s busy_timeout + WAL | Connection string |
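The oversized-ID row can be illustrated with a small function. This is a hypothetical Python rendering of the rule in the table ("return 0, skip import"), not the actual ScraperDataAdapter.extractNumericId() code; in particular, how the real method extracts digits from an ID string is an assumption here:

```python
import re

LONG_MAX = 2**63 - 1  # value of Java's Long.MAX_VALUE


def extract_numeric_id(raw):
    """Return the digits in `raw` as an int, or 0 if none or above LONG_MAX."""
    digits = re.sub(r"\D", "", raw or "")  # assumption: keep digits only
    if not digits:
        return 0
    value = int(digits)
    return value if value <= LONG_MAX else 0


def should_import(raw_id):
    """A lot is skipped when its ID cannot be represented as a Java long."""
    return extract_numeric_id(raw_id) != 0


normal_id = extract_numeric_id("lot-12345")
oversized_id = extract_numeric_id("9" * 20)       # 20 digits, above LONG_MAX
boundary_id = extract_numeric_id(str(LONG_MAX))   # exactly LONG_MAX is accepted
print(normal_id, oversized_id, boundary_id, should_import("9" * 20))
```

Using 0 as the "skip" marker only works if 0 is never a valid lot ID, which this sketch assumes.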