# Java Monitoring Process Fixes

## Issues Identified

Based on the error logs from the Java monitoring process, the following bugs need to be fixed:

### 1. Integer Overflow - `extractNumericId()` method

**Error:**
```
For input string: "239144949705335"
    at java.lang.Integer.parseInt(Integer.java:565)
    at auctiora.ScraperDataAdapter.extractNumericId(ScraperDataAdapter.java:81)
```

**Problem:**
- Lot IDs are being parsed as `int` (32-bit, max value: 2,147,483,647)
- Actual lot IDs can exceed this limit (e.g., "239144949705335")

**Solution:** Change from `Integer.parseInt()` to `Long.parseLong()`:

```java
// BEFORE (ScraperDataAdapter.java:81)
int numericId = Integer.parseInt(lotId);

// AFTER
long numericId = Long.parseLong(lotId);
```

**Additional changes needed:**
- Update all related fields/variables from `int` to `long`
- Update the database schema if the numeric ID is stored (change INTEGER to BIGINT)
- Update any method signatures that return/accept `int` for lot IDs

---

### 2. UNIQUE Constraint Failures

**Error:**
```
Failed to import lot: [SQLITE_CONSTRAINT_UNIQUE] A UNIQUE constraint failed (UNIQUE constraint failed: lots.url)
```

**Problem:**
- Attempting to re-insert lots that already exist
- No graceful handling of duplicate entries

**Solution:** Use `INSERT OR REPLACE` or `INSERT OR IGNORE`:

```java
// BEFORE
String sql = "INSERT INTO lots (lot_id, url, ...) VALUES (?, ?, ...)";

// AFTER - Option 1: Update existing records
// (note: INSERT OR REPLACE deletes the old row first, so columns not
// listed in the statement are reset to their defaults)
String sql = "INSERT OR REPLACE INTO lots (lot_id, url, ...) VALUES (?, ?, ...)";

// AFTER - Option 2: Skip duplicates silently
String sql = "INSERT OR IGNORE INTO lots (lot_id, url, ...) VALUES (?, ?, ...)";
```

**Alternative with try-catch:**

```java
try {
    insertLot(lotData);
} catch (SQLException e) {
    // Matching on the message text is driver-specific; this works for SQLite's errors
    if (e.getMessage().contains("UNIQUE constraint")) {
        logger.debug("Lot already exists, skipping: " + lotData.getUrl());
        return; // Or update instead
    }
    throw e;
}
```

---

### 3. Timestamp Parsing - Already Fixed in Python

**Error:**
```
Unable to parse timestamp: materieel wegens vereffening
Unable to parse timestamp: gap
```

**Status:** ✅ Fixed in `parse.py` (src/parse.py:37-70)

The Python parser now:
- Filters out invalid timestamp strings like "gap" and "materieel wegens vereffening"
- Returns an empty string for invalid values
- Handles Unix timestamps in both seconds and milliseconds

**Java side action:** If the Java code also parses timestamps, apply similar validation:
- Check for known invalid values before parsing
- Use try-catch and return null/empty for unparseable timestamps
- Don't fail the entire import if one timestamp is invalid

---

## Migration Strategy

### Step 1: Fix Python Parser ✅
- [x] Updated `format_timestamp()` to handle invalid strings
- [x] Created migration script `script/migrate_reparse_lots.py`

### Step 2: Run Migration

```bash
cd /path/to/scaev
python script/migrate_reparse_lots.py --dry-run  # Preview changes
python script/migrate_reparse_lots.py            # Apply changes
```

This will:
- Re-parse all cached HTML pages using the improved `__NEXT_DATA__` extraction
- Update existing database entries with newly extracted fields
- Populate missing `viewing_time`, `pickup_date`, and other fields

### Step 3: Fix Java Code

1. Update `ScraperDataAdapter.java:81` - use `Long.parseLong()`
2. Update `DatabaseService.java` - use `INSERT OR REPLACE` or handle duplicates
3. Update timestamp parsing - add validation for invalid strings
4. Update the database schema - change numeric ID columns to BIGINT if needed

### Step 4: Re-run Monitoring Process

After the fixes, the monitoring process should:
- Successfully import all lots without crashes
- Gracefully skip duplicates
- Handle large numeric IDs
- Ignore invalid timestamp values

---

## Database Schema Changes (if needed)

If lot IDs are stored as numeric values in Java's database:

```sql
-- Check current schema
PRAGMA table_info(lots);

-- If a numeric ID field exists and is INTEGER, change it to BIGINT:
ALTER TABLE lots ADD COLUMN lot_id_numeric BIGINT;
UPDATE lots
SET lot_id_numeric = CAST(lot_id AS BIGINT)
WHERE lot_id GLOB '[0-9]*'
  AND lot_id NOT GLOB '*[^0-9]*';  -- only all-digit strings

-- Then update code to use lot_id_numeric
```

Note that SQLite's INTEGER columns already store 64-bit values, so on SQLite the critical change is reading them with `ResultSet.getLong()` instead of `getInt()` on the Java side; BIGINT matters if the data ever moves to a database with a fixed-width 32-bit INTEGER type.

---

## Testing Checklist

After applying the fixes:
- [ ] Import a lot with ID > 2,147,483,647 (e.g., "239144949705335")
- [ ] Re-import an existing lot (should update or skip gracefully)
- [ ] Import a lot with an invalid timestamp (should not crash)
- [ ] Verify all newly extracted fields are populated (`viewing_time`, `pickup_date`, etc.)
- [ ] Check the logs for any remaining errors

---

## Files Modified

Python side (completed):
- `src/parse.py` - Fixed `format_timestamp()` method
- `script/migrate_reparse_lots.py` - New migration script

Java side (needs implementation):
- `auctiora/ScraperDataAdapter.java` - Line 81: Change `Integer.parseInt` to `Long.parseLong`
- `auctiora/DatabaseService.java` - Line ~569: Handle UNIQUE constraints gracefully
- Database schema - Consider BIGINT for numeric IDs
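
---

## Appendix: Java Sketches

The overflow fix from section 1 can be sketched as a standalone helper. The method name mirrors the one in the stack trace, but the class name and the `-1` sentinel for unparseable IDs are assumptions, not taken from the actual `ScraperDataAdapter`:

```java
// Sketch of an overflow-safe replacement for the int-based extractNumericId().
// The -1 fallback for non-numeric IDs is an assumption for illustration.
public final class LotIds {
    private LotIds() {}

    /** Parses a lot ID as a 64-bit long; returns -1 if the string is not numeric. */
    public static long extractNumericId(String lotId) {
        if (lotId == null || lotId.isEmpty()) {
            return -1L;
        }
        try {
            // Long.parseLong handles IDs up to 9,223,372,036,854,775,807,
            // far beyond Integer.MAX_VALUE (2,147,483,647).
            return Long.parseLong(lotId);
        } catch (NumberFormatException e) {
            return -1L;
        }
    }
}
```

With this in place, the ID from the error log parses cleanly even though it is well above `Integer.MAX_VALUE`.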
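The Java-side timestamp validation suggested in section 3 could look like the sketch below. The known-invalid strings come from the error logs; the seconds-versus-milliseconds cutoff and the ISO-8601 output format are assumptions modeled on the described Python behavior, not the actual `parse.py` implementation:

```java
import java.time.Instant;
import java.util.Set;

// Sketch of timestamp validation mirroring the Python fix in src/parse.py:
// known-invalid strings return "", and Unix timestamps in seconds or
// milliseconds are formatted as ISO-8601. The 1e12 cutoff between seconds
// and milliseconds is a heuristic, not taken from the original code.
public final class Timestamps {
    private Timestamps() {}

    private static final Set<String> KNOWN_INVALID =
            Set.of("gap", "materieel wegens vereffening");

    /** Returns an ISO-8601 string for a Unix timestamp, or "" if unparseable. */
    public static String formatTimestamp(String raw) {
        if (raw == null) return "";
        String value = raw.trim().toLowerCase();
        if (value.isEmpty() || KNOWN_INVALID.contains(value)) return "";
        try {
            long ts = Long.parseLong(value);
            // Values >= 1e12 are taken as milliseconds, smaller ones as seconds.
            Instant instant = ts >= 1_000_000_000_000L
                    ? Instant.ofEpochMilli(ts)
                    : Instant.ofEpochSecond(ts);
            return instant.toString();
        } catch (NumberFormatException | java.time.DateTimeException e) {
            return ""; // never fail the whole import over one bad timestamp
        }
    }
}
```

Returning an empty string instead of throwing keeps a single bad timestamp from aborting the import, matching the Python parser's behavior.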
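The duplicate detection in the try-catch alternative of section 2 can be isolated into a small predicate, which keeps the message-matching logic in one place. The helper class is hypothetical; the matched strings come from the SQLite errors quoted in the logs:

```java
import java.sql.SQLException;

// Sketch of the duplicate detection from the try-catch alternative in
// section 2. Matching on the exception message is driver-specific; the
// "UNIQUE constraint" substring matches the SQLite errors in the logs.
public final class DuplicateCheck {
    private DuplicateCheck() {}

    /** True if the exception reports a violated UNIQUE constraint. */
    public static boolean isUniqueConstraintViolation(SQLException e) {
        String message = e.getMessage();
        return message != null && message.contains("UNIQUE constraint");
    }
}
```

A more robust alternative, if the driver supports it, is to check `e.getErrorCode()` or `e.getSQLState()` instead of the message text, since messages can change between driver versions.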