# Java Monitoring Process Fixes

## Issues Identified

Based on the error logs from the Java monitoring process, the following bugs need to be fixed:
### 1. Integer Overflow - `extractNumericId()` method

**Error:**

```
For input string: "239144949705335"
    at java.lang.Integer.parseInt(Integer.java:565)
    at auctiora.ScraperDataAdapter.extractNumericId(ScraperDataAdapter.java:81)
```

**Problem:**

- Lot IDs are being parsed as `int` (32-bit, max value: 2,147,483,647)
- Actual lot IDs can exceed this limit (e.g., "239144949705335")
**Solution:**

Change from `Integer.parseInt()` to `Long.parseLong()`:

```java
// BEFORE (ScraperDataAdapter.java:81)
int numericId = Integer.parseInt(lotId);

// AFTER
long numericId = Long.parseLong(lotId);
```
Additional changes needed:

- Update all related fields/variables from `int` to `long`
- Update the database schema if the numeric ID is stored (change INTEGER to BIGINT)
- Update any method signatures that return or accept `int` for lot IDs
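A defensive version of the extraction could look like the sketch below. The method name `extractNumericId` comes from the stack trace above; the trim and null-on-failure behavior are assumptions, not the existing implementation:

```java
// Sketch of a defensive numeric-ID extraction.
// Long.parseLong handles IDs up to 9,223,372,036,854,775,807,
// comfortably above "239144949705335".
final class LotIdParser {

    // Returns null when the lot ID is not a valid number, so one
    // bad record does not abort the whole import (assumption:
    // callers treat null as "no numeric ID").
    static Long extractNumericId(String lotId) {
        if (lotId == null || lotId.isEmpty()) {
            return null;
        }
        try {
            return Long.parseLong(lotId.trim());
        } catch (NumberFormatException e) {
            return null;
        }
    }
}
```

With this shape, `extractNumericId("239144949705335")` yields the full 15-digit value, while non-numeric input returns null instead of throwing.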
### 2. UNIQUE Constraint Failures

**Error:**

```
Failed to import lot: [SQLITE_CONSTRAINT_UNIQUE] A UNIQUE constraint failed (UNIQUE constraint failed: lots.url)
```

**Problem:**

- Attempting to re-insert lots that already exist
- No graceful handling of duplicate entries
**Solution:**

Use `INSERT OR REPLACE` or `INSERT OR IGNORE`:

```java
// BEFORE
String sql = "INSERT INTO lots (lot_id, url, ...) VALUES (?, ?, ...)";

// AFTER - Option 1: Update existing records
String sql = "INSERT OR REPLACE INTO lots (lot_id, url, ...) VALUES (?, ?, ...)";

// AFTER - Option 2: Skip duplicates silently
String sql = "INSERT OR IGNORE INTO lots (lot_id, url, ...) VALUES (?, ?, ...)";
```
Alternative with try-catch:

```java
try {
    insertLot(lotData);
} catch (SQLException e) {
    // Note: matching on the message text is fragile; checking the
    // SQLite error code is more robust (see below).
    if (e.getMessage().contains("UNIQUE constraint")) {
        logger.debug("Lot already exists, skipping: " + lotData.getUrl());
        return; // Or update instead
    }
    throw e;
}
```
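A more robust variant checks the SQLite error code instead of the message string. This is a sketch: SQLite's primary result code `SQLITE_CONSTRAINT` is 19 and the extended code `SQLITE_CONSTRAINT_UNIQUE` is 2067, but whether `getErrorCode()` reports the primary or the extended code depends on the JDBC driver, so both are accepted here:

```java
import java.sql.SQLException;

// Sketch: identify unique-constraint violations by SQLite error code
// rather than by parsing the exception message.
final class SqliteErrors {
    static final int SQLITE_CONSTRAINT = 19;          // primary result code
    static final int SQLITE_CONSTRAINT_UNIQUE = 2067; // extended result code

    static boolean isUniqueConstraintViolation(SQLException e) {
        int code = e.getErrorCode();
        return code == SQLITE_CONSTRAINT || code == SQLITE_CONSTRAINT_UNIQUE;
    }
}
```

The catch block above then becomes `if (SqliteErrors.isUniqueConstraintViolation(e)) { ... }`, which keeps working even if the driver's message wording changes.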
### 3. Timestamp Parsing - Already Fixed in Python

**Error:**

```
Unable to parse timestamp: materieel wegens vereffening
Unable to parse timestamp: gap
```

**Status:** ✅ Fixed in parse.py (src/parse.py:37-70)

The Python parser now:

- Filters out invalid timestamp strings like "gap" and "materieel wegens vereffening"
- Returns an empty string for invalid values
- Handles both Unix timestamp formats (seconds and milliseconds)

**Java side action:** If the Java code also parses timestamps, apply similar validation:

- Check for known invalid values before parsing
- Use try-catch and return null/empty for unparseable timestamps
- Don't fail the entire import if one timestamp is invalid
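On the Java side, that validation might look like the sketch below. The class name and the seconds-vs-milliseconds cutoff of 10^12 are assumptions (a common heuristic), not taken from the actual codebase:

```java
import java.time.Instant;

// Sketch of defensive Unix-timestamp parsing, mirroring the Python fix.
final class TimestampParser {

    // Heuristic: values >= 10^12 are treated as milliseconds
    // (10^12 *seconds* would be far in the future). Assumption.
    private static final long MILLIS_CUTOFF = 1_000_000_000_000L;

    // Returns null instead of throwing, so one bad value cannot
    // abort the whole import.
    static Instant parseTimestamp(String raw) {
        if (raw == null || raw.isEmpty()) {
            return null;
        }
        String s = raw.trim();
        if (!s.matches("\\d+")) {
            return null; // filters "gap", "materieel wegens vereffening", etc.
        }
        try {
            long value = Long.parseLong(s);
            return value >= MILLIS_CUTOFF
                    ? Instant.ofEpochMilli(value)
                    : Instant.ofEpochSecond(value);
        } catch (NumberFormatException e) {
            return null; // e.g. a digit string too long for a long
        }
    }
}
```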
## Migration Strategy

### Step 1: Fix Python Parser ✅

- Updated `format_timestamp()` to handle invalid strings
- Created migration script `script/migrate_reparse_lots.py`

### Step 2: Run Migration

```bash
cd /path/to/scaev
python script/migrate_reparse_lots.py --dry-run  # Preview changes
python script/migrate_reparse_lots.py            # Apply changes
```
This will:

- Re-parse all cached HTML pages using improved NEXT_DATA extraction
- Update existing database entries with newly extracted fields
- Populate missing `viewing_time`, `pickup_date`, and other fields
### Step 3: Fix Java Code

- Update `ScraperDataAdapter.java:81` - use `Long.parseLong()`
- Update `DatabaseService.java` - use `INSERT OR REPLACE` or handle duplicates
- Update timestamp parsing - add validation for invalid strings
- Update database schema - change numeric ID columns to BIGINT if needed
### Step 4: Re-run Monitoring Process

After the fixes, the monitoring process should:

- Successfully import all lots without crashes
- Gracefully skip duplicates
- Handle large numeric IDs
- Ignore invalid timestamp values
## Database Schema Changes (if needed)

If lot IDs are stored as numeric values in Java's database:

```sql
-- Check current schema
PRAGMA table_info(lots);

-- If a numeric ID field exists and is INTEGER, change to BIGINT:
ALTER TABLE lots ADD COLUMN lot_id_numeric BIGINT;
UPDATE lots SET lot_id_numeric = CAST(lot_id AS BIGINT) WHERE lot_id GLOB '[0-9]*';

-- Then update code to use lot_id_numeric
```

Note that SQLite's INTEGER storage class already holds signed 64-bit values, so this change matters mainly for clarity and for ORM/JDBC type mappings; the actual overflow happens in Java's `Integer.parseInt()`, not in SQLite storage.
## Testing Checklist

After applying fixes:

- [ ] Import a lot with ID > 2,147,483,647 (e.g., "239144949705335")
- [ ] Re-import an existing lot (should update or skip gracefully)
- [ ] Import a lot with an invalid timestamp (should not crash)
- [ ] Verify all newly extracted fields are populated (viewing_time, pickup_date, etc.)
- [ ] Check logs for any remaining errors
## Files Modified

**Python side (completed):**

- `src/parse.py` - Fixed `format_timestamp()` method
- `script/migrate_reparse_lots.py` - New migration script

**Java side (needs implementation):**

- `auctiora/ScraperDataAdapter.java` - Line 81: Change `Integer.parseInt` to `Long.parseLong`
- `auctiora/DatabaseService.java` - Line ~569: Handle UNIQUE constraints gracefully
- Database schema - Consider BIGINT for numeric IDs