integrating with monitor app

This commit is contained in: Tour
2025-12-05 06:48:08 +01:00
parent 72afdf772b
commit aea188699f
7 changed files with 1234 additions and 7 deletions

_wiki/JAVA_FIXES_NEEDED.md (new file, +170 lines)
# Java Monitoring Process Fixes
## Issues Identified
Based on the error logs from the Java monitoring process, the following bugs need to be fixed:
### 1. Integer Overflow - `extractNumericId()` method
**Error:**
```
For input string: "239144949705335"
at java.lang.Integer.parseInt(Integer.java:565)
at auctiora.ScraperDataAdapter.extractNumericId(ScraperDataAdapter.java:81)
```
**Problem:**
- Lot IDs are being parsed as `int` (32-bit, max value: 2,147,483,647)
- Actual lot IDs can exceed this limit (e.g., "239144949705335")
**Solution:**
Change from `Integer.parseInt()` to `Long.parseLong()`:
```java
// BEFORE (ScraperDataAdapter.java:81)
int numericId = Integer.parseInt(lotId);
// AFTER
long numericId = Long.parseLong(lotId);
```
**Additional changes needed:**
- Update all related fields and variables from `int` to `long`
- Update the database schema if the numeric ID is stored (change INTEGER to BIGINT)
- Update any method signatures that return or accept `int` for lot IDs (see the sketch below)
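A defensive version of `extractNumericId()` might look like the sketch below. The method name comes from the stack trace above, but the exact signature, visibility, and the `-1L` fallback are assumptions for illustration, not the adapter's actual code.
```java
// Sketch only: visibility and the -1L fallback are assumptions, not the real adapter code.
long extractNumericId(String lotId) {
    if (lotId == null || lotId.isBlank()) {
        return -1L; // assumed sentinel for "no numeric id"
    }
    try {
        // long comfortably holds IDs like 239144949705335 that overflow int
        return Long.parseLong(lotId.trim());
    } catch (NumberFormatException e) {
        // don't abort the whole import because a single ID is malformed
        return -1L;
    }
}
```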
---
### 2. UNIQUE Constraint Failures
**Error:**
```
Failed to import lot: [SQLITE_CONSTRAINT_UNIQUE] A UNIQUE constraint failed (UNIQUE constraint failed: lots.url)
```
**Problem:**
- Attempting to re-insert lots that already exist
- No graceful handling of duplicate entries
**Solution:**
Use `INSERT OR REPLACE` or `INSERT OR IGNORE`:
```java
// BEFORE
String sql = "INSERT INTO lots (lot_id, url, ...) VALUES (?, ?, ...)";
// AFTER - Option 1: Update existing records
String sql = "INSERT OR REPLACE INTO lots (lot_id, url, ...) VALUES (?, ?, ...)";
// AFTER - Option 2: Skip duplicates silently
String sql = "INSERT OR IGNORE INTO lots (lot_id, url, ...) VALUES (?, ?, ...)";
```
**Alternative with try-catch:**
```java
try {
insertLot(lotData);
} catch (SQLException e) {
if (e.getMessage().contains("UNIQUE constraint")) {
logger.debug("Lot already exists, skipping: " + lotData.getUrl());
return; // Or update instead
}
throw e;
}
```
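A minimal JDBC sketch of the `INSERT OR IGNORE` variant is shown below; the column list, the `lot` accessor names, and the `connection` field are assumptions for illustration, not the actual `DatabaseService` code.
```java
// Sketch: column names and Lot getters are illustrative assumptions.
String sql = "INSERT OR IGNORE INTO lots (lot_id, url, title) VALUES (?, ?, ?)";
try (PreparedStatement ps = connection.prepareStatement(sql)) {
    ps.setString(1, lot.getLotId());
    ps.setString(2, lot.getUrl());
    ps.setString(3, lot.getTitle());
    int rows = ps.executeUpdate();
    if (rows == 0) {
        // with INSERT OR IGNORE, 0 affected rows typically means the UNIQUE url already existed
        logger.debug("Lot already exists, skipping: " + lot.getUrl());
    }
}
```
`INSERT OR REPLACE` would take the same shape, but note that it deletes the conflicting row and inserts a new one, so any columns not listed in the statement are reset to their defaults.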
---
### 3. Timestamp Parsing - Already Fixed in Python
**Error:**
```
Unable to parse timestamp: materieel wegens vereffening
Unable to parse timestamp: gap
```
**Status:** ✅ Fixed in `parse.py` (src/parse.py:37-70)
The Python parser now:
- Filters out invalid timestamp strings like "gap", "materieel wegens vereffening"
- Returns empty string for invalid values
- Handles Unix timestamps in both seconds and milliseconds
**Java side action:**
If the Java code also parses timestamps, apply similar validation:
- Check for known invalid values before parsing
- Use try-catch and return null/empty for unparseable timestamps
- Don't fail the entire import if one timestamp is invalid (see the sketch below)
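A hedged sketch of that validation on the Java side: the known-invalid strings come from the log lines above, while the method name, the `Long` return type, and the use of `null` for "no timestamp" are assumptions (`Set.of` is `java.util.Set`).
```java
// Sketch: method name and the null-means-"no timestamp" convention are assumptions.
private static final Set<String> KNOWN_INVALID =
        Set.of("gap", "materieel wegens vereffening");

Long parseTimestampMillis(String raw) {
    if (raw == null || raw.isBlank() || KNOWN_INVALID.contains(raw.trim().toLowerCase())) {
        return null; // treat as "no timestamp" instead of failing the import
    }
    try {
        long value = Long.parseLong(raw.trim());
        // heuristic: anything below ~1e10 looks like seconds, so promote to milliseconds
        return value < 10_000_000_000L ? value * 1000L : value;
    } catch (NumberFormatException e) {
        return null; // unparseable -> skip this field, keep importing the lot
    }
}
```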
---
## Migration Strategy
### Step 1: Fix Python Parser ✅
- [x] Updated `format_timestamp()` to handle invalid strings
- [x] Created migration script `script/migrate_reparse_lots.py`
### Step 2: Run Migration
```bash
cd /path/to/scaev
python script/migrate_reparse_lots.py --dry-run # Preview changes
python script/migrate_reparse_lots.py # Apply changes
```
This will:
- Re-parse all cached HTML pages using the improved `__NEXT_DATA__` extraction
- Update existing database entries with newly extracted fields
- Populate missing `viewing_time`, `pickup_date`, and other fields
### Step 3: Fix Java Code
1. Update `ScraperDataAdapter.java:81` - use `Long.parseLong()`
2. Update `DatabaseService.java` - use `INSERT OR REPLACE` or handle duplicates
3. Update timestamp parsing - add validation for invalid strings
4. Update database schema - change numeric ID columns to BIGINT if needed
### Step 4: Re-run Monitoring Process
After fixes, the monitoring process should:
- Successfully import all lots without crashes
- Gracefully skip duplicates
- Handle large numeric IDs
- Ignore invalid timestamp values
---
## Database Schema Changes (if needed)
If lot IDs are stored as numeric values in Java's database:
```sql
-- Check current schema
PRAGMA table_info(lots);
-- If numeric ID field exists and is INTEGER, change to BIGINT:
ALTER TABLE lots ADD COLUMN lot_id_numeric BIGINT;
UPDATE lots SET lot_id_numeric = CAST(lot_id AS BIGINT) WHERE lot_id GLOB '[0-9]*';
-- Then update code to use lot_id_numeric
```
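Note that SQLite's INTEGER storage class already holds 64-bit values, so the more important change is usually on the Java side: read the column with `ResultSet.getLong()` instead of `getInt()`. A small sketch, assuming the `lot_id_numeric` column from the SQL above and a `connection` field:
```java
// Sketch: reading the 64-bit ID back with getLong() rather than getInt().
String query = "SELECT lot_id_numeric FROM lots WHERE url = ?";
try (PreparedStatement ps = connection.prepareStatement(query)) {
    ps.setString(1, url);
    try (ResultSet rs = ps.executeQuery()) {
        if (rs.next()) {
            // getInt() cannot represent IDs above 2,147,483,647; getLong() can
            long numericId = rs.getLong("lot_id_numeric");
        }
    }
}
```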
---
## Testing Checklist
After applying fixes:
- [ ] Import lot with ID > 2,147,483,647 (e.g., "239144949705335"); a test sketch for this case follows below
- [ ] Re-import existing lot (should update or skip gracefully)
- [ ] Import lot with invalid timestamp (should not crash)
- [ ] Verify all newly extracted fields are populated (viewing_time, pickup_date, etc.)
- [ ] Check logs for any remaining errors
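If the project has a unit-test setup, a small regression test along these lines would cover the first checklist item (JUnit 5 assumed; it also assumes `extractNumericId` is accessible to the test and that `ScraperDataAdapter` has a no-arg constructor):
```java
import static org.junit.jupiter.api.Assertions.assertEquals;
import org.junit.jupiter.api.Test;

class ScraperDataAdapterTest {

    @Test
    void extractsLotIdsLargerThanIntMax() {
        // "239144949705335" is the ID from the error log; it overflows a 32-bit int
        ScraperDataAdapter adapter = new ScraperDataAdapter();
        assertEquals(239144949705335L, adapter.extractNumericId("239144949705335"));
    }
}
```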
---
## Files Modified
Python side (completed):
- `src/parse.py` - Fixed `format_timestamp()` method
- `script/migrate_reparse_lots.py` - New migration script
Java side (needs implementation):
- `auctiora/ScraperDataAdapter.java` - Line 81: Change Integer.parseInt to Long.parseLong
- `auctiora/DatabaseService.java` - Line ~569: Handle UNIQUE constraints gracefully
- Database schema - Consider BIGINT for numeric IDs