Files
auctiora/wiki/DATABASE_ARCHITECTURE.md
2025-12-06 05:39:59 +01:00

7.8 KiB

Database Architecture

Overview

The Auctiora auction monitoring system uses SQLite as its database engine, shared between the scraper process and the monitor application for simplicity and performance.

Current State (Dec 2025)

  • Database: C:\mnt\okcomputer\output\cache.db
  • Size: 1.6 GB
  • Records: 16,006 lots, 536,502 images
  • Concurrent Processes: 2 (scraper + monitor)
  • Access Pattern: Scraper writes, Monitor reads + occasional updates

Why SQLite?

Advantages for This Use Case

  1. Embedded Architecture

    • No separate database server to manage
    • Zero network latency (local file access)
    • Perfect for single-machine scraping + monitoring
  2. Excellent Read Performance

    • Monitor performs mostly SELECT queries
    • Well-indexed access by lot_id, url, auction_id
    • Sub-millisecond query times for simple lookups
  3. Simplicity

    • Single file database
    • Automatic backup via file copy
    • No connection pooling or authentication overhead
  4. Proven Scalability

    • Tested up to 281 TB database size
    • 1.6 GB is only 0.0006% of capacity
    • Handles billions of rows efficiently
  5. WAL Mode for Concurrency

    • Multiple readers don't block each other
    • Readers don't block writers
    • Writers don't block readers
    • Perfect for scraper + monitor workload

Configuration

Connection String (DatabaseService.java:28)

jdbc:sqlite:C:\mnt\okcomputer\output\cache.db?journal_mode=WAL&busy_timeout=10000

Key PRAGMAs (DatabaseService.java:38-40)

PRAGMA journal_mode=WAL;       -- Write-Ahead Logging for concurrency
PRAGMA busy_timeout=10000;     -- 10s retry on lock contention
PRAGMA synchronous=NORMAL;     -- Balance safety and performance

What These Settings Do

Setting Purpose Impact
journal_mode=WAL Write-Ahead Logging Enables concurrent read/write access
busy_timeout=10000 Wait 10s on lock Prevents immediate SQLITE_BUSY errors
synchronous=NORMAL Balanced sync mode Faster writes, still crash-safe

Schema Integration

Scraper Schema (Read-Only for Monitor)

CREATE TABLE lots (
    lot_id TEXT PRIMARY KEY,
    auction_id TEXT,
    url TEXT UNIQUE,              -- ⚠️ Enforced by scraper
    title TEXT,
    current_bid TEXT,
    closing_time TEXT,
    manufacturer TEXT,
    type TEXT,
    year INTEGER,
    currency TEXT DEFAULT 'EUR',
    closing_notified INTEGER DEFAULT 0,
    ...
)

Monitor Schema (Tables Created by Monitor)

CREATE TABLE images (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    lot_id INTEGER,
    url TEXT,
    local_path TEXT,
    labels TEXT,                  -- Object detection results
    processed_at INTEGER,
    FOREIGN KEY (lot_id) REFERENCES lots(lot_id)
)

Handling Schema Conflicts

Problem: Scraper has UNIQUE constraint on lots.url

Solution (DatabaseService.java:361-424):

// Try UPDATE first
UPDATE lots SET ... WHERE lot_id = ?

// If no rows updated, INSERT OR IGNORE
INSERT OR IGNORE INTO lots (...) VALUES (...)

This approach:

  • Updates existing lots by lot_id
  • Skips inserts that violate UNIQUE constraints
  • No crashes on re-imports or duplicate URLs

Performance Characteristics

Current Performance

  • Simple SELECT by ID: <1ms
  • Full table scan (16K lots): ~50ms
  • Image INSERT: <5ms
  • Concurrent operations: No blocking observed

Scalability Projections

Metric Current 1 Year 3 Years SQLite Limit
Lots 16K 365K 1M 1B+ rows
Images 536K 19M 54M 1B+ rows
DB Size 1.6GB 36GB 100GB 281TB
Queries <1ms <5ms <20ms Depends on indexes

When to Migrate to PostgreSQL/MySQL

🚨 Migration Triggers

Consider migrating if you encounter any of these:

  1. Concurrency Limits

    • 5 concurrent writers needed

    • Frequent SQLITE_BUSY errors despite WAL mode
    • Need for distributed access across multiple servers
  2. Performance Degradation

    • Database >50GB AND queries >1s for simple SELECTs
    • Complex JOIN queries become bottleneck
    • Index sizes exceed available RAM
  3. Operational Requirements

    • Need for replication (master/slave)
    • Geographic distribution required
    • High availability / failover needed
    • Remote access from multiple locations
  4. Advanced Features

    • Full-text search on large text fields
    • Complex analytical queries (window functions, CTEs)
    • User management and fine-grained permissions
    • Connection pooling for web applications

Migration Path (If Needed)

  1. Choose Database: PostgreSQL (recommended) or MySQL
  2. Schema Export: Use SQLite .schema command
  3. Data Migration: Use sqlite3-to-postgres or custom scripts
  4. Update Connection: Change JDBC URL in application.properties
  5. Update Queries: Fix SQL dialect differences
  6. Performance Tuning: Create appropriate indexes

Example PostgreSQL configuration:

# application.properties
auction.database.url=jdbc:postgresql://localhost:5432/auctiora
auction.database.username=monitor
auction.database.password=${DB_PASSWORD}

Current Recommendation: Stick with SQLite

Rationale

  1. Sufficient Capacity: 1.6GB is 0.0006% of SQLite's limit
  2. Excellent Performance: Sub-millisecond queries
  3. Simple Operations: No complex transactions or analytics
  4. Low Concurrency: Only 2 processes (scraper + monitor)
  5. Local Architecture: No need for network DB access
  6. Zero Maintenance: No DB server to manage or monitor

Monitoring Dashboard Metrics

Track these to know when to reconsider:

-- Add to praetium.html dashboard
SELECT
  (SELECT COUNT(*) FROM lots) as lot_count,
  (SELECT COUNT(*) FROM images) as image_count,
  (SELECT page_count * page_size FROM pragma_page_count(), pragma_page_size()) as db_size_bytes,
  (SELECT (page_count - freelist_count) * 100.0 / page_count FROM pragma_page_count(), pragma_freelist_count()) as db_utilization

Review decision when:

  • Database >20GB
  • Query times >500ms for simple lookups
  • More than 3 concurrent processes needed

Backup Strategy

# Nightly backup via Windows Task Scheduler
sqlite3 C:\mnt\okcomputer\output\cache.db ".backup C:\backups\cache_$(date +%Y%m%d).db"

# Keep last 30 days
forfiles /P C:\backups /M cache_*.db /D -30 /C "cmd /c del @path"

WAL File Management

SQLite creates additional files in WAL mode:

  • cache.db - Main database
  • cache.db-wal - Write-Ahead Log
  • cache.db-shm - Shared memory

Important: Backup all three files together for consistency.

Integration Points

Scraper Process

  • Writes: INSERT new lots, auctions, images
  • Schema Owner: Creates tables, enforces constraints
  • Frequency: Continuous (every 30 minutes)

Monitor Process (Auctiora)

  • Reads: SELECT lots, auctions for monitoring
  • Writes: UPDATE bid amounts, notification flags; INSERT image processing results
  • Schema: Adds images table for object detection
  • Frequency: Every 15 seconds (dashboard refresh)

Conflict Resolution

Conflict Strategy Implementation
Duplicate lot_id UPDATE instead of INSERT DatabaseService.upsertLot()
Duplicate URL INSERT OR IGNORE Silent skip
Oversized IDs (>Long.MAX_VALUE) Return 0L, skip import ScraperDataAdapter.extractNumericId()
Invalid timestamps Try-catch, log, continue DatabaseService.getAllAuctions()
Database locked 10s busy_timeout + WAL Connection string

References