# Java Monitoring Process Fixes

## Issues Identified

Based on the error logs from the Java monitoring process, the following bugs need to be fixed:

### 1. Integer Overflow - `extractNumericId()` method

**Error:**
```
For input string: "239144949705335"
    at java.lang.Integer.parseInt(Integer.java:565)
    at auctiora.ScraperDataAdapter.extractNumericId(ScraperDataAdapter.java:81)
```

**Problem:**
- Lot IDs are being parsed as `int` (32-bit, max value: 2,147,483,647)
- Actual lot IDs can exceed this limit (e.g., "239144949705335"), in which case `Integer.parseInt()` throws `NumberFormatException` rather than wrapping around

**Solution:**
Change from `Integer.parseInt()` to `Long.parseLong()`:

```java
// BEFORE (ScraperDataAdapter.java:81)
int numericId = Integer.parseInt(lotId);

// AFTER: long is 64-bit, so IDs such as 239144949705335 fit
long numericId = Long.parseLong(lotId);
```

**Additional changes needed:**
- Update all related fields/variables from `int` to `long`
- Update database schema if numeric ID is stored (change INTEGER to BIGINT)
- Update any method signatures that return/accept `int` for lot IDs

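A minimal sketch of `long`-based parsing, assuming the lot ID string is purely numeric by the time it is parsed. The class name `LotIdParser` and the `OptionalLong` return type are illustrative choices, not taken from `ScraperDataAdapter`. Returning an empty value for unparseable input lets the caller skip a single lot instead of aborting the import:

```java
import java.util.OptionalLong;

// Hypothetical helper; the real extractNumericId() may pre-process the ID differently.
final class LotIdParser {
    private LotIdParser() {}

    /** Parses a numeric lot ID as a 64-bit value; empty if null, blank, or not a valid long. */
    static OptionalLong parseNumericId(String lotId) {
        if (lotId == null) {
            return OptionalLong.empty();
        }
        String trimmed = lotId.trim();
        if (trimmed.isEmpty()) {
            return OptionalLong.empty();
        }
        try {
            return OptionalLong.of(Long.parseLong(trimmed));
        } catch (NumberFormatException e) {
            // Non-numeric input, or a value beyond Long.MAX_VALUE
            return OptionalLong.empty();
        }
    }
}
```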
---

### 2. UNIQUE Constraint Failures

**Error:**
```
Failed to import lot: [SQLITE_CONSTRAINT_UNIQUE] A UNIQUE constraint failed (UNIQUE constraint failed: lots.url)
```

**Problem:**
- Attempting to re-insert lots that already exist
- No graceful handling of duplicate entries

**Solution:**
Use `INSERT OR REPLACE` or `INSERT OR IGNORE`. Note that in SQLite `INSERT OR REPLACE` deletes the conflicting row and inserts a new one, so any column left out of the statement falls back to its default:

```java
// BEFORE
String sql = "INSERT INTO lots (lot_id, url, ...) VALUES (?, ?, ...)";

// AFTER - Option 1: Replace existing records (the old row is deleted, then re-inserted)
String sql = "INSERT OR REPLACE INTO lots (lot_id, url, ...) VALUES (?, ?, ...)";

// AFTER - Option 2: Skip duplicates silently
String sql = "INSERT OR IGNORE INTO lots (lot_id, url, ...) VALUES (?, ?, ...)";
```

**Alternative with try-catch:**
```java
try {
    insertLot(lotData);
} catch (SQLException e) {
    // getMessage() may return null, so guard before matching on the text
    String message = e.getMessage();
    if (message != null && message.contains("UNIQUE constraint")) {
        logger.debug("Lot already exists, skipping: " + lotData.getUrl());
        return; // Or update instead
    }
    throw e;
}
```

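The message check can be pulled into a small helper so that it is null-safe and also inspects chained exceptions. This is only a sketch: `SqlErrors` and `isUniqueViolation` are hypothetical names, and matching on the message text assumes the driver keeps reporting the wording seen in the log above.

```java
import java.sql.SQLException;

// Hypothetical helper, not part of DatabaseService.
final class SqlErrors {
    private SqlErrors() {}

    /** Returns true if this exception, or one chained via getNextException(), reports a UNIQUE violation. */
    static boolean isUniqueViolation(SQLException e) {
        for (SQLException current = e; current != null; current = current.getNextException()) {
            String message = current.getMessage();
            // Assumption: the driver includes "UNIQUE constraint" in the message, as in the log
            if (message != null && message.contains("UNIQUE constraint")) {
                return true;
            }
        }
        return false;
    }
}
```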
---

### 3. Timestamp Parsing - Already Fixed in Python

**Error:**
```
Unable to parse timestamp: materieel wegens vereffening
Unable to parse timestamp: gap
```

**Status:** ✅ Fixed in `parse.py` (src/parse.py:37-70)

The Python parser now:
- Filters out invalid timestamp strings like "gap", "materieel wegens vereffening"
- Returns an empty string for invalid values
- Handles Unix timestamps in both seconds and milliseconds

**Java side action:**
If the Java code also parses timestamps, apply similar validation:
- Check for known invalid values before parsing
- Use try-catch and return null/empty for unparseable timestamps
- Don't fail the entire import if one timestamp is invalid

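One way to apply the same validation on the Java side is sketched below. It assumes timestamps arrive as strings holding Unix epoch values. `TimestampParser` and the cut-off used to tell seconds from milliseconds are illustrative assumptions, not a port of `format_timestamp()`. Non-numeric strings such as "gap" fail the numeric parse, so no explicit deny-list is needed in this variant:

```java
import java.time.Instant;
import java.util.Optional;

// Hypothetical helper; names and threshold are assumptions for illustration.
final class TimestampParser {
    // Assumption: epoch values of 1e11 or more are milliseconds, smaller ones are seconds
    private static final long MILLIS_THRESHOLD = 100_000_000_000L;

    private TimestampParser() {}

    /** Returns the parsed instant, or empty for null, blank, non-numeric, or non-positive input. */
    static Optional<Instant> parse(String raw) {
        if (raw == null) {
            return Optional.empty();
        }
        String value = raw.trim();
        if (value.isEmpty()) {
            return Optional.empty();
        }
        try {
            long epoch = Long.parseLong(value);
            if (epoch <= 0) {
                return Optional.empty();
            }
            return Optional.of(epoch >= MILLIS_THRESHOLD
                    ? Instant.ofEpochMilli(epoch)
                    : Instant.ofEpochSecond(epoch));
        } catch (NumberFormatException e) {
            // Free-text values such as "gap" end up here and are treated as missing
            return Optional.empty();
        }
    }
}
```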
---

## Migration Strategy

### Step 1: Fix Python Parser ✅
- [x] Updated `format_timestamp()` to handle invalid strings
- [x] Created migration script `script/migrate_reparse_lots.py`

### Step 2: Run Migration
```bash
cd /path/to/scaev
python script/migrate_reparse_lots.py --dry-run  # Preview changes
python script/migrate_reparse_lots.py            # Apply changes
```

This will:
- Re-parse all cached HTML pages using improved `__NEXT_DATA__` extraction
- Update existing database entries with newly extracted fields
- Populate missing `viewing_time`, `pickup_date`, and other fields

### Step 3: Fix Java Code
1. Update `ScraperDataAdapter.java:81` - use `Long.parseLong()`
2. Update `DatabaseService.java` - use `INSERT OR REPLACE` or handle duplicates
3. Update timestamp parsing - add validation for invalid strings
4. Update database schema - change numeric ID columns to BIGINT if needed

### Step 4: Re-run Monitoring Process
After fixes, the monitoring process should:
- Successfully import all lots without crashes
- Gracefully skip duplicates
- Handle large numeric IDs
- Ignore invalid timestamp values

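The expectations above amount to isolating failures per lot. A rough sketch follows, in which `LotImporter`, `LotAction`, and `ImportResult` are all hypothetical stand-ins for the real monitoring code. Each lot is imported in its own try-catch, so a single bad record is counted and skipped rather than ending the run:

```java
import java.util.List;

// Hypothetical sketch: imports each lot independently so one bad record cannot abort the run.
final class LotImporter {
    /** Import step for one lot; may throw checked exceptions such as SQLException. */
    interface LotAction<T> {
        void apply(T lot) throws Exception;
    }

    /** Simple counters describing the outcome of one run. */
    static final class ImportResult {
        int imported;
        int failed;
    }

    private LotImporter() {}

    static <T> ImportResult importAll(List<T> lots, LotAction<T> importOne) {
        ImportResult result = new ImportResult();
        for (T lot : lots) {
            try {
                importOne.apply(lot);
                result.imported++;
            } catch (Exception e) {
                // Count the failure and continue with the next lot
                result.failed++;
            }
        }
        return result;
    }
}
```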
---

## Database Schema Changes (if needed)

If lot IDs are stored as numeric values in Java's database:

```sql
-- Check current schema
PRAGMA table_info(lots);

-- SQLite INTEGER columns already hold 64-bit values, so this step matters mainly
-- for other engines or when a separate numeric column is wanted.
-- If numeric ID field exists and is INTEGER, change to BIGINT:
ALTER TABLE lots ADD COLUMN lot_id_numeric BIGINT;
-- Only convert IDs made up entirely of digits
UPDATE lots SET lot_id_numeric = CAST(lot_id AS BIGINT)
WHERE lot_id <> '' AND lot_id NOT GLOB '*[^0-9]*';
-- Then update code to use lot_id_numeric
```

---

## Testing Checklist

After applying fixes:
- [ ] Import lot with ID > 2,147,483,647 (e.g., "239144949705335")
- [ ] Re-import existing lot (should update or skip gracefully)
- [ ] Import lot with invalid timestamp (should not crash)
- [ ] Verify all newly extracted fields are populated (`viewing_time`, `pickup_date`, etc.)
- [ ] Check logs for any remaining errors

---

## Files Modified

Python side (completed):
- `src/parse.py` - Fixed `format_timestamp()` method
- `script/migrate_reparse_lots.py` - New migration script

Java side (needs implementation):
- `auctiora/ScraperDataAdapter.java` - Line 81: Change `Integer.parseInt` to `Long.parseLong`
- `auctiora/DatabaseService.java` - Line ~569: Handle UNIQUE constraints gracefully
- Database schema - Consider BIGINT for numeric IDs