259 lines
7.8 KiB
Markdown
259 lines
7.8 KiB
Markdown
# Database Architecture
|
|
|
|
## Overview
|
|
|
|
The Auctiora auction monitoring system uses **SQLite** as its database engine, shared between the scraper process and the monitor application for simplicity and performance.
|
|
|
|
## Current State (Dec 2025)
|
|
|
|
- **Database**: `C:\mnt\okcomputer\output\cache.db`
|
|
- **Size**: 1.6 GB
|
|
- **Records**: 16,006 lots, 536,502 images
|
|
- **Concurrent Processes**: 2 (scraper + monitor)
|
|
- **Access Pattern**: Scraper writes, Monitor reads + occasional updates
|
|
|
|
## Why SQLite?
|
|
|
|
### ✅ Advantages for This Use Case
|
|
|
|
1. **Embedded Architecture**
|
|
- No separate database server to manage
|
|
- Zero network latency (local file access)
|
|
- Perfect for single-machine scraping + monitoring
|
|
|
|
2. **Excellent Read Performance**
|
|
- Monitor performs mostly SELECT queries
|
|
- Well-indexed access by `lot_id`, `url`, `auction_id`
|
|
- Sub-millisecond query times for simple lookups
|
|
|
|
3. **Simplicity**
|
|
- Single file database
|
|
- Automatic backup via file copy
|
|
- No connection pooling or authentication overhead
|
|
|
|
4. **Proven Scalability**
|
|
- Tested up to 281 TB database size
|
|
- 1.6 GB is only 0.0006% of capacity
|
|
- Handles billions of rows efficiently
|
|
|
|
5. **WAL Mode for Concurrency**
|
|
- Multiple readers don't block each other
|
|
- Readers don't block writers
|
|
- Writers don't block readers
|
|
- Perfect for scraper + monitor workload
|
|
|
|
## Configuration
|
|
|
|
### Connection String (DatabaseService.java:28)
|
|
```java
|
|
jdbc:sqlite:C:\mnt\okcomputer\output\cache.db?journal_mode=WAL&busy_timeout=10000
|
|
```
|
|
|
|
### Key PRAGMAs (DatabaseService.java:38-40)
|
|
```sql
|
|
PRAGMA journal_mode=WAL; -- Write-Ahead Logging for concurrency
|
|
PRAGMA busy_timeout=10000; -- 10s retry on lock contention
|
|
PRAGMA synchronous=NORMAL; -- Balance safety and performance
|
|
```
|
|
|
|
### What These Settings Do
|
|
|
|
| Setting | Purpose | Impact |
|
|
|---------|---------|--------|
|
|
| `journal_mode=WAL` | Write-Ahead Logging | Enables concurrent read/write access |
|
|
| `busy_timeout=10000` | Wait 10s on lock | Prevents immediate `SQLITE_BUSY` errors |
|
|
| `synchronous=NORMAL` | Balanced sync mode | Faster writes, still crash-safe |
|
|
|
|
## Schema Integration
|
|
|
|
### Scraper Schema (Read-Only for Monitor)
|
|
```sql
|
|
CREATE TABLE lots (
|
|
lot_id TEXT PRIMARY KEY,
|
|
auction_id TEXT,
|
|
url TEXT UNIQUE, -- ⚠️ Enforced by scraper
|
|
title TEXT,
|
|
current_bid TEXT,
|
|
closing_time TEXT,
|
|
manufacturer TEXT,
|
|
type TEXT,
|
|
year INTEGER,
|
|
currency TEXT DEFAULT 'EUR',
|
|
closing_notified INTEGER DEFAULT 0,
|
|
...
|
|
)
|
|
```
|
|
|
|
### Monitor Schema (Tables Created by Monitor)
|
|
```sql
|
|
CREATE TABLE images (
|
|
id INTEGER PRIMARY KEY AUTOINCREMENT,
|
|
lot_id INTEGER,
|
|
url TEXT,
|
|
file_path TEXT,
|
|
labels TEXT, -- Object detection results
|
|
processed_at INTEGER,
|
|
FOREIGN KEY (lot_id) REFERENCES lots(lot_id)
|
|
)
|
|
```
|
|
|
|
### Handling Schema Conflicts
|
|
|
|
**Problem**: Scraper has `UNIQUE` constraint on `lots.url`
|
|
|
|
**Solution** (DatabaseService.java:361-424):
|
|
```java
|
|
// Try UPDATE first
|
|
UPDATE lots SET ... WHERE lot_id = ?
|
|
|
|
// If no rows updated, INSERT OR IGNORE
|
|
INSERT OR IGNORE INTO lots (...) VALUES (...)
|
|
```
|
|
|
|
This approach:
|
|
- ✅ Updates existing lots by `lot_id`
|
|
- ✅ Skips inserts that violate UNIQUE constraints
|
|
- ✅ No crashes on re-imports or duplicate URLs
|
|
|
|
## Performance Characteristics
|
|
|
|
### Current Performance
|
|
- Simple SELECT by ID: <1ms
|
|
- Full table scan (16K lots): ~50ms
|
|
- Image INSERT: <5ms
|
|
- Concurrent operations: No blocking observed
|
|
|
|
### Scalability Projections
|
|
|
|
| Metric | Current | 1 Year | 3 Years | SQLite Limit |
|
|
|--------|---------|--------|---------|--------------|
|
|
| Lots | 16K | 365K | 1M | 1B+ rows |
|
|
| Images | 536K | 19M | 54M | 1B+ rows |
|
|
| DB Size | 1.6GB | 36GB | 100GB | 281TB |
|
|
| Queries | <1ms | <5ms | <20ms | Depends on indexes |
|
|
|
|
## When to Migrate to PostgreSQL/MySQL
|
|
|
|
### 🚨 Migration Triggers
|
|
|
|
Consider migrating if you encounter **any** of these:
|
|
|
|
1. **Concurrency Limits**
|
|
- >5 concurrent writers needed
|
|
- Frequent `SQLITE_BUSY` errors despite WAL mode
|
|
- Need for distributed access across multiple servers
|
|
|
|
2. **Performance Degradation**
|
|
- Database >50GB AND queries >1s for simple SELECTs
|
|
- Complex JOIN queries become bottleneck
|
|
- Index sizes exceed available RAM
|
|
|
|
3. **Operational Requirements**
|
|
- Need for replication (master/slave)
|
|
- Geographic distribution required
|
|
- High availability / failover needed
|
|
- Remote access from multiple locations
|
|
|
|
4. **Advanced Features**
|
|
- Full-text search on large text fields
|
|
- Complex analytical queries (window functions, CTEs)
|
|
- User management and fine-grained permissions
|
|
- Connection pooling for web applications
|
|
|
|
### Migration Path (If Needed)
|
|
|
|
1. **Choose Database**: PostgreSQL (recommended) or MySQL
|
|
2. **Schema Export**: Use SQLite `.schema` command
|
|
3. **Data Migration**: Use `sqlite3-to-postgres` or custom scripts
|
|
4. **Update Connection**: Change JDBC URL in `application.properties`
|
|
5. **Update Queries**: Fix SQL dialect differences
|
|
6. **Performance Tuning**: Create appropriate indexes
|
|
|
|
Example PostgreSQL configuration:
|
|
```properties
|
|
# application.properties
|
|
auction.database.url=jdbc:postgresql://localhost:5432/auctiora
|
|
auction.database.username=monitor
|
|
auction.database.password=${DB_PASSWORD}
|
|
```
|
|
|
|
## Current Recommendation: ✅ **Stick with SQLite**
|
|
|
|
### Rationale
|
|
|
|
1. **Sufficient Capacity**: 1.6GB is 0.0006% of SQLite's limit
|
|
2. **Excellent Performance**: Sub-millisecond queries
|
|
3. **Simple Operations**: No complex transactions or analytics
|
|
4. **Low Concurrency**: Only 2 processes (scraper + monitor)
|
|
5. **Local Architecture**: No need for network DB access
|
|
6. **Zero Maintenance**: No DB server to manage or monitor
|
|
|
|
### Monitoring Dashboard Metrics
|
|
|
|
Track these to know when to reconsider:
|
|
|
|
```sql
|
|
-- Add to praetium.html dashboard
|
|
SELECT
|
|
(SELECT COUNT(*) FROM lots) as lot_count,
|
|
(SELECT COUNT(*) FROM images) as image_count,
|
|
(SELECT page_count * page_size FROM pragma_page_count(), pragma_page_size()) as db_size_bytes,
|
|
(SELECT (page_count - freelist_count) * 100.0 / page_count FROM pragma_page_count(), pragma_freelist_count()) as db_utilization
|
|
```
|
|
|
|
**Review decision when**:
|
|
- Database >20GB
|
|
- Query times >500ms for simple lookups
|
|
- More than 3 concurrent processes needed
|
|
|
|
## Backup Strategy
|
|
|
|
### Recommended Approach
|
|
|
|
```bash
|
|
# Nightly backup via Windows Task Scheduler
|
|
sqlite3 C:\mnt\okcomputer\output\cache.db ".backup C:\backups\cache_$(date +%Y%m%d).db"
|
|
|
|
# Keep last 30 days
|
|
forfiles /P C:\backups /M cache_*.db /D -30 /C "cmd /c del @path"
|
|
```
|
|
|
|
### WAL File Management
|
|
|
|
SQLite creates additional files in WAL mode:
|
|
- `cache.db` - Main database
|
|
- `cache.db-wal` - Write-Ahead Log
|
|
- `cache.db-shm` - Shared memory
|
|
|
|
**Important**: Backup all three files together for consistency.
|
|
|
|
## Integration Points
|
|
|
|
### Scraper Process
|
|
- **Writes**: INSERT new lots, auctions, images
|
|
- **Schema Owner**: Creates tables, enforces constraints
|
|
- **Frequency**: Continuous (every 30 minutes)
|
|
|
|
### Monitor Process (Auctiora)
|
|
- **Reads**: SELECT lots, auctions for monitoring
|
|
- **Writes**: UPDATE bid amounts, notification flags; INSERT image processing results
|
|
- **Schema**: Adds `images` table for object detection
|
|
- **Frequency**: Every 15 seconds (dashboard refresh)
|
|
|
|
### Conflict Resolution
|
|
|
|
| Conflict | Strategy | Implementation |
|
|
|----------|----------|----------------|
|
|
| Duplicate lot_id | UPDATE instead of INSERT | DatabaseService.upsertLot() |
|
|
| Duplicate URL | INSERT OR IGNORE | Silent skip |
|
|
| Oversized IDs (>Long.MAX_VALUE) | Return 0L, skip import | ScraperDataAdapter.extractNumericId() |
|
|
| Invalid timestamps | Try-catch, log, continue | DatabaseService.getAllAuctions() |
|
|
| Database locked | 10s busy_timeout + WAL | Connection string |
|
|
|
|
## References
|
|
|
|
- [SQLite Documentation](https://www.sqlite.org/docs.html)
|
|
- [WAL Mode](https://www.sqlite.org/wal.html)
|
|
- [SQLite Limits](https://www.sqlite.org/limits.html)
|
|
- [When to Use SQLite](https://www.sqlite.org/whentouse.html)
|