integrating with monitor app

_wiki/JAVA_FIXES_NEEDED.md

# Java Monitoring Process Fixes

## Issues Identified

Based on the error logs from the Java monitoring process, the following bugs need to be fixed:

### 1. Integer Overflow - `extractNumericId()` method

**Error:**
```
For input string: "239144949705335"
    at java.lang.Integer.parseInt(Integer.java:565)
    at auctiora.ScraperDataAdapter.extractNumericId(ScraperDataAdapter.java:81)
```

**Problem:**
- Lot IDs are parsed as `int` (32-bit, max value: 2,147,483,647)
- Actual lot IDs can exceed this limit (e.g., "239144949705335"), so `Integer.parseInt()` throws `NumberFormatException` instead of returning a value

**Solution:**
Change from `Integer.parseInt()` to `Long.parseLong()`:

```java
// BEFORE (ScraperDataAdapter.java:81)
int numericId = Integer.parseInt(lotId);

// AFTER
long numericId = Long.parseLong(lotId);
```
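
A slightly more defensive variant of the change above, as a minimal sketch. The class name `LotIdParser`, the digit-stripping step, and the `OptionalLong` return type are assumptions for illustration; the real `extractNumericId()` in `ScraperDataAdapter` may extract the number differently. Returning an empty value rather than throwing keeps one malformed ID from aborting the whole import, and the caller decides whether to skip or log the lot.

```java
import java.util.OptionalLong;

/** Hypothetical helper; not the actual ScraperDataAdapter code. */
final class LotIdParser {
    private LotIdParser() {}

    /**
     * Strips every non-digit character (assumption: IDs may carry prefixes or
     * separators) and parses the remainder as a 64-bit value.
     * Returns an empty OptionalLong when no usable number is found.
     */
    static OptionalLong extractNumericId(String lotId) {
        if (lotId == null) {
            return OptionalLong.empty();
        }
        String digits = lotId.replaceAll("\\D", "");
        if (digits.isEmpty()) {
            return OptionalLong.empty();
        }
        try {
            return OptionalLong.of(Long.parseLong(digits));
        } catch (NumberFormatException e) {
            // The digit string is larger than Long.MAX_VALUE.
            return OptionalLong.empty();
        }
    }
}
```
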

**Additional changes needed:**
- Update all related fields/variables from `int` to `long`
- Update database schema if numeric ID is stored (change INTEGER to BIGINT; SQLite already stores `INTEGER` values in up to 64 bits, so this mainly keeps the declared type explicit)
- Update any method signatures that return/accept `int` for lot IDs

---

### 2. UNIQUE Constraint Failures

**Error:**
```
Failed to import lot: [SQLITE_CONSTRAINT_UNIQUE] A UNIQUE constraint failed (UNIQUE constraint failed: lots.url)
```

**Problem:**
- The import attempts to re-insert lots that already exist
- Duplicate entries are not handled gracefully, so each one surfaces as an import failure

**Solution:**
Use `INSERT OR REPLACE` or `INSERT OR IGNORE`. Note that `INSERT OR REPLACE` deletes the conflicting row before inserting the new one, so any column not listed in the statement falls back to its default:

```java
// BEFORE
String sql = "INSERT INTO lots (lot_id, url, ...) VALUES (?, ?, ...)";

// AFTER - Option 1: Replace existing records (delete + re-insert)
String sql = "INSERT OR REPLACE INTO lots (lot_id, url, ...) VALUES (?, ?, ...)";

// AFTER - Option 2: Skip duplicates silently
String sql = "INSERT OR IGNORE INTO lots (lot_id, url, ...) VALUES (?, ?, ...)";
```

**Alternative with try-catch:**
```java
try {
    insertLot(lotData);
} catch (SQLException e) {
    if (e.getMessage().contains("UNIQUE constraint")) {
        logger.debug("Lot already exists, skipping: " + lotData.getUrl());
        return; // Or update instead
    }
    throw e;
}
```
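
The string check in the try-catch above can be pulled into a small null-safe helper so that a batch import keeps going past duplicates. This is a sketch under stated assumptions: `DuplicateTolerantImport`, `LotInserter`, and `importAll` are hypothetical names, lots are reduced to their URL for brevity, and the message patterns are taken from the log line quoted in this section rather than from any driver contract. Counting skipped lots instead of returning early makes it easy to log one summary per run.

```java
import java.sql.SQLException;
import java.util.List;

/** Hypothetical sketch; not the actual DatabaseService code. */
final class DuplicateTolerantImport {
    private DuplicateTolerantImport() {}

    /** Stand-in for whatever performs the real INSERT (assumption). */
    @FunctionalInterface
    interface LotInserter {
        void insert(String url) throws SQLException;
    }

    /** True when the message looks like the UNIQUE failure seen in the logs. */
    static boolean isUniqueViolation(SQLException e) {
        String message = e.getMessage();
        return message != null
                && (message.contains("SQLITE_CONSTRAINT_UNIQUE")
                    || message.contains("UNIQUE constraint failed"));
    }

    /**
     * Inserts every URL, skipping duplicates instead of aborting.
     * Returns the number of lots skipped; any other SQL error propagates.
     */
    static int importAll(List<String> urls, LotInserter inserter) throws SQLException {
        int skipped = 0;
        for (String url : urls) {
            try {
                inserter.insert(url);
            } catch (SQLException e) {
                if (!isUniqueViolation(e)) {
                    throw e;
                }
                skipped++;
            }
        }
        return skipped;
    }
}
```
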

---

### 3. Timestamp Parsing - Already Fixed in Python

**Error:**
```
Unable to parse timestamp: materieel wegens vereffening
Unable to parse timestamp: gap
```

**Status:** ✅ Fixed in `parse.py` (src/parse.py:37-70)

The Python parser now:
- Filters out invalid timestamp strings like "gap", "materieel wegens vereffening"
- Returns empty string for invalid values
- Handles Unix timestamps in both seconds and milliseconds

**Java side action:**
If the Java code also parses timestamps, apply similar validation:
- Check for known invalid values before parsing
- Use try-catch and return null/empty for unparseable timestamps
- Don't fail the entire import if one timestamp is invalid
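
The Java-side validation listed above could look roughly like this. It is a sketch, not a port of `format_timestamp()`: the class name `TimestampParser`, the digits-only check, and the seconds-versus-milliseconds threshold are all assumptions, and the Python parser may use different rules. An empty `Optional` signals "no usable timestamp" so the caller can store null and continue with the rest of the lot.

```java
import java.time.Instant;
import java.util.Optional;

/** Hypothetical sketch; the rules in src/parse.py may differ. */
final class TimestampParser {
    private TimestampParser() {}

    // Assumption: values at or above 1e11 are milliseconds. As seconds that
    // would be roughly year 5138; as milliseconds it is early 1973.
    private static final long MILLIS_THRESHOLD = 100_000_000_000L;

    /**
     * Parses a Unix timestamp given in seconds or milliseconds.
     * Non-numeric input such as "gap" or "materieel wegens vereffening"
     * yields Optional.empty() instead of an exception.
     */
    static Optional<Instant> parseUnixTimestamp(String raw) {
        if (raw == null) {
            return Optional.empty();
        }
        String value = raw.trim();
        // 1 to 18 digits always fits in a long, so parseLong cannot throw here.
        if (!value.matches("\\d{1,18}")) {
            return Optional.empty();
        }
        long number = Long.parseLong(value);
        return Optional.of(number >= MILLIS_THRESHOLD
                ? Instant.ofEpochMilli(number)
                : Instant.ofEpochSecond(number));
    }
}
```
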

---

## Migration Strategy

### Step 1: Fix Python Parser ✅
- [x] Updated `format_timestamp()` to handle invalid strings
- [x] Created migration script `script/migrate_reparse_lots.py`

### Step 2: Run Migration
```bash
cd /path/to/scaev
python script/migrate_reparse_lots.py --dry-run  # Preview changes
python script/migrate_reparse_lots.py            # Apply changes
```

This will:
- Re-parse all cached HTML pages using improved `__NEXT_DATA__` extraction
- Update existing database entries with newly extracted fields
- Populate missing `viewing_time`, `pickup_date`, and other fields

### Step 3: Fix Java Code
1. Update `ScraperDataAdapter.java:81` - use `Long.parseLong()`
2. Update `DatabaseService.java` - use `INSERT OR REPLACE` or handle duplicates
3. Update timestamp parsing - add validation for invalid strings
4. Update database schema - change numeric ID columns to BIGINT if needed

### Step 4: Re-run Monitoring Process
After fixes, the monitoring process should:
- Successfully import all lots without crashes
- Gracefully skip duplicates
- Handle large numeric IDs
- Ignore invalid timestamp values

---

## Database Schema Changes (if needed)

If lot IDs are stored as numeric values in the Java application's database:

```sql
-- Check current schema
PRAGMA table_info(lots);

-- If numeric ID field exists and is INTEGER, change to BIGINT:
ALTER TABLE lots ADD COLUMN lot_id_numeric BIGINT;
-- Copy only IDs made up entirely of digits ('[0-9]*' would also match values like '12abc')
UPDATE lots SET lot_id_numeric = CAST(lot_id AS BIGINT)
WHERE lot_id <> '' AND lot_id NOT GLOB '*[^0-9]*';
-- Then update code to use lot_id_numeric
```

---

## Testing Checklist

After applying fixes:
- [ ] Import lot with ID > 2,147,483,647 (e.g., "239144949705335")
- [ ] Re-import existing lot (should update or skip gracefully)
- [ ] Import lot with invalid timestamp (should not crash)
- [ ] Verify all newly extracted fields are populated (`viewing_time`, `pickup_date`, etc.)
- [ ] Check logs for any remaining errors

---

## Files Modified

Python side (completed):
- `src/parse.py` - Fixed `format_timestamp()` method
- `script/migrate_reparse_lots.py` - New migration script

Java side (needs implementation):
- `auctiora/ScraperDataAdapter.java` - Line 81: Change `Integer.parseInt` to `Long.parseLong`
- `auctiora/DatabaseService.java` - Line ~569: Handle UNIQUE constraints gracefully
- Database schema - Consider BIGINT for numeric IDs