scaev/_wiki/JAVA_FIXES_NEEDED.md (2025-12-05)

Java Monitoring Process Fixes

Issues Identified

Based on the error logs from the Java monitoring process, the following bugs need to be fixed:

1. Integer Overflow - extractNumericId() method

Error:

java.lang.NumberFormatException: For input string: "239144949705335"
at java.lang.Integer.parseInt(Integer.java:565)
at auctiora.ScraperDataAdapter.extractNumericId(ScraperDataAdapter.java:81)

Problem:

  • Lot IDs are being parsed as int (32-bit, max value: 2,147,483,647)
  • Actual lot IDs can exceed this limit (e.g., "239144949705335")

Solution: Change from Integer.parseInt() to Long.parseLong():

// BEFORE (ScraperDataAdapter.java:81)
int numericId = Integer.parseInt(lotId);

// AFTER
long numericId = Long.parseLong(lotId);

Additional changes needed:

  • Update all related fields/variables from int to long
  • Update database schema if numeric ID is stored (change INTEGER to BIGINT)
  • Update any method signatures that return/accept int for lot IDs
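The changes above can be sketched as a defensive helper. This is a minimal sketch, not the actual ScraperDataAdapter code: the class name, the trim, and the -1 sentinel for unparseable input are assumptions.

```java
// Hypothetical hardened replacement for extractNumericId().
// Uses long (64-bit) so IDs like "239144949705335" fit.
public final class LotIdParser {

    /**
     * Parses a lot ID into a 64-bit long. Returns -1 for null,
     * empty, or non-numeric input instead of throwing.
     */
    public static long extractNumericId(String lotId) {
        if (lotId == null || lotId.trim().isEmpty()) {
            return -1L;
        }
        try {
            return Long.parseLong(lotId.trim());
        } catch (NumberFormatException e) {
            return -1L; // non-numeric ID, e.g. a slug
        }
    }
}
```

Returning a sentinel instead of throwing means one malformed ID no longer aborts the whole import; callers that require a valid ID can check for -1.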

2. UNIQUE Constraint Failures

Error:

Failed to import lot: [SQLITE_CONSTRAINT_UNIQUE] A UNIQUE constraint failed (UNIQUE constraint failed: lots.url)

Problem:

  • Attempting to re-insert lots that already exist
  • No graceful handling of duplicate entries

Solution: Use INSERT OR REPLACE or INSERT OR IGNORE. Note the semantics: in SQLite, INSERT OR REPLACE deletes the conflicting row and inserts a fresh one (columns not listed are reset and the rowid can change), while INSERT OR IGNORE silently keeps the existing row. On SQLite 3.24+, INSERT ... ON CONFLICT(url) DO UPDATE SET ... is a third option that updates only the named columns.

// BEFORE
String sql = "INSERT INTO lots (lot_id, url, ...) VALUES (?, ?, ...)";

// AFTER - Option 1: Update existing records
String sql = "INSERT OR REPLACE INTO lots (lot_id, url, ...) VALUES (?, ?, ...)";

// AFTER - Option 2: Skip duplicates silently
String sql = "INSERT OR IGNORE INTO lots (lot_id, url, ...) VALUES (?, ?, ...)";

Alternative with try-catch (matching on the exception message is fragile across driver versions; prefer checking the driver's error code where available):

try {
    insertLot(lotData);
} catch (SQLException e) {
    // Driver-agnostic but brittle; sqlite-jdbc also exposes a result code
    if (e.getMessage() != null && e.getMessage().contains("UNIQUE constraint")) {
        logger.debug("Lot already exists, skipping: " + lotData.getUrl());
        return; // Or update the existing row instead
    }
    throw e;
}

3. Timestamp Parsing - Already Fixed in Python

Error:

Unable to parse timestamp: materieel wegens vereffening
Unable to parse timestamp: gap

Status: Fixed in parse.py (src/parse.py:37-70)

The Python parser now:

  • Filters out invalid timestamp strings like "gap", "materieel wegens vereffening"
  • Returns empty string for invalid values
  • Handles Unix timestamps in both seconds and milliseconds

Java side action: If the Java code also parses timestamps, apply similar validation:

  • Check for known invalid values before parsing
  • Use try-catch and return null/empty for unparseable timestamps
  • Don't fail the entire import if one timestamp is invalid
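The validation steps above could look like this on the Java side. A sketch only: the class and method names are assumptions, and the invalid-value list is taken from the error log, not from the actual codebase.

```java
import java.time.Instant;
import java.util.Set;

// Hypothetical defensive timestamp parser mirroring the Python fix.
public final class TimestampParser {

    // Known non-timestamp strings observed in the scraped data
    private static final Set<String> KNOWN_INVALID =
        Set.of("gap", "materieel wegens vereffening");

    /**
     * Returns an Instant for a Unix timestamp in seconds or
     * milliseconds, or null if the value is not parseable.
     * Never throws, so one bad value cannot fail the import.
     */
    public static Instant parseTimestamp(String raw) {
        if (raw == null) return null;
        String value = raw.trim().toLowerCase();
        if (value.isEmpty() || KNOWN_INVALID.contains(value)) return null;
        try {
            long n = Long.parseLong(value);
            // Heuristic: values above ~1e11 are milliseconds, below are seconds
            return n > 100_000_000_000L
                ? Instant.ofEpochMilli(n)
                : Instant.ofEpochSecond(n);
        } catch (NumberFormatException e) {
            return null; // unparseable; skip instead of crashing
        }
    }
}
```

The seconds/milliseconds cutoff is a heuristic: 100,000,000,000 seconds is far in the future, so any larger value is safely treated as milliseconds.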

Migration Strategy

Step 1: Fix Python Parser

  • Updated format_timestamp() to handle invalid strings
  • Created migration script script/migrate_reparse_lots.py

Step 2: Run Migration

cd /path/to/scaev
python script/migrate_reparse_lots.py --dry-run  # Preview changes
python script/migrate_reparse_lots.py           # Apply changes

This will:

  • Re-parse all cached HTML pages using improved NEXT_DATA extraction
  • Update existing database entries with newly extracted fields
  • Populate missing viewing_time, pickup_date, and other fields

Step 3: Fix Java Code

  1. Update ScraperDataAdapter.java:81 - use Long.parseLong()
  2. Update DatabaseService.java - use INSERT OR REPLACE or handle duplicates
  3. Update timestamp parsing - add validation for invalid strings
  4. Update database schema - change numeric ID columns to BIGINT if needed

Step 4: Re-run Monitoring Process

After fixes, the monitoring process should:

  • Successfully import all lots without crashes
  • Gracefully skip duplicates
  • Handle large numeric IDs
  • Ignore invalid timestamp values

Database Schema Changes (if needed)

If lot IDs are stored as numeric values in Java's database (note that SQLite's INTEGER storage class already holds 64-bit values, so the overflow itself happens in Java; declaring BIGINT mainly keeps the schema honest for other consumers):

-- Check current schema
PRAGMA table_info(lots);

-- If a numeric ID column is needed, add a 64-bit column
-- (SQLite gives BIGINT the INTEGER affinity, which is 64-bit):
ALTER TABLE lots ADD COLUMN lot_id_numeric BIGINT;
-- Copy over only fully numeric IDs; note that GLOB '[0-9]*'
-- would also match mixed strings like '123abc':
UPDATE lots SET lot_id_numeric = CAST(lot_id AS BIGINT)
WHERE lot_id != '' AND lot_id NOT GLOB '*[^0-9]*';
-- Then update code to use lot_id_numeric

Testing Checklist

After applying fixes:

  • Import lot with ID > 2,147,483,647 (e.g., "239144949705335")
  • Re-import existing lot (should update or skip gracefully)
  • Import lot with invalid timestamp (should not crash)
  • Verify all newly extracted fields are populated (viewing_time, pickup_date, etc.)
  • Check logs for any remaining errors

Files Modified

Python side (completed):

  • src/parse.py - Fixed format_timestamp() method
  • script/migrate_reparse_lots.py - New migration script

Java side (needs implementation):

  • auctiora/ScraperDataAdapter.java - Line 81: Change Integer.parseInt to Long.parseLong
  • auctiora/DatabaseService.java - Line ~569: Handle UNIQUE constraints gracefully
  • Database schema - Consider BIGINT for numeric IDs