- Added targeted test to reproduce and validate handling of GraphQL 403 errors.
- Hardened the GraphQL client to reduce 403 occurrences and provide clearer diagnostics when they appear.
- Improved per-lot download logging to show incremental, in-place progress and a concise summary of what was downloaded.
### Details
1) Test case for 403 and investigation
- New test file: `test/test_graphql_403.py`.
- Uses `importlib` to load `src/config.py` and `src/graphql_client.py` directly so it’s independent of sys.path quirks.
- Mocks `aiohttp.ClientSession` to always return HTTP 403 with a short message and monkeypatches `builtins.print` to capture logs.
- Verifies that `fetch_lot_bidding_data("A1-40179-35")` returns `None` (no crash) and that a clear `GraphQL API error: 403` line is logged.
- Result: `pytest test/test_graphql_403.py -q` passes locally.
- Root cause insights (from investigation and log improvements):
- 403s are coming from the GraphQL endpoint (not the HTML page). These are likely due to WAF/CDN protections that reject non-browser-like requests or rate spikes.
- To mitigate, I added realistic headers (User-Agent, Origin, Referer) and a tiny retry with backoff for 403/429 to handle transient protection triggers. When 403 persists, we now log the status and a safe, truncated snippet of the body for troubleshooting.
2) Incremental/in-place logging for downloads
- Updated `src/scraper.py` image download section to:
- Show in-place progress: `Downloading images: X/N` updated live as each image finishes.
- After completion, print: `Downloaded: K/N new images`.
- Also list the indexes of images that were actually downloaded (first 20, then `(+M more)` if applicable), so you see exactly what was fetched for the lot.
3) GraphQL client improvements
- Updated `src/graphql_client.py`:
- Added browser-like headers and contextual Referer.
- Added small retry with backoff for 403/429.
- Improved error logs to include status, lot id, and a short body snippet.
### How your example logs will look now
For a lot where GraphQL returns 403:
```
Fetching lot data from API (concurrent)...
GraphQL API error: 403 (lot=A1-40179-35) — Forbidden by WAF
```
For image downloads:
```
Images: 6
Downloading images: 0/6
... 6/6
Downloaded: 6/6 new images
Indexes: 0, 1, 2, 3, 4, 5
```
(When all cached: `All 6 images already cached`)
### Notes
- Full test run surfaced a pre-existing import error in `test/test_scraper.py` (unrelated to these changes). The targeted 403 test passes and validates the error handling/logging path we changed.
- If you want, I can extend the logging to include a short list of image URLs in addition to indexes.
This commit is contained in:
@@ -1,240 +0,0 @@
|
||||
# API Intelligence Findings
|
||||
|
||||
## GraphQL API - Available Fields for Intelligence
|
||||
|
||||
### Key Discovery: Additional Fields Available
|
||||
|
||||
From GraphQL schema introspection on `Lot` type:
|
||||
|
||||
#### **Already Captured ✓**
|
||||
- `currentBidAmount` (Money) - Current bid
|
||||
- `initialAmount` (Money) - Starting bid
|
||||
- `nextMinimalBid` (Money) - Minimum bid
|
||||
- `bidsCount` (Int) - Bid count
|
||||
- `startDate` / `endDate` (TbaDate) - Timing
|
||||
- `minimumBidAmountMet` (MinimumBidAmountMet) - Status
|
||||
- `attributes` - Brand/model extraction
|
||||
- `title`, `description`, `images`
|
||||
|
||||
#### **NEW - Available but NOT Captured:**
|
||||
|
||||
1. **followersCount** (Int) - **CRITICAL for intelligence!**
|
||||
- This is the "watch count" we thought was missing
|
||||
- Indicates bidder interest level
|
||||
- **ACTION: Add to schema and extraction**
|
||||
|
||||
2. **biddingStatus** (BiddingStatus) - Lot bidding state
|
||||
- More detailed than minimumBidAmountMet
|
||||
- **ACTION: Investigate enum values**
|
||||
|
||||
3. **estimatedFullPrice** (EstimatedFullPrice) - **Found it!**
|
||||
- Available via `LotDetails.estimatedFullPrice`
|
||||
- May contain estimated min/max values
|
||||
- **ACTION: Test extraction**
|
||||
|
||||
4. **nextBidStepInCents** (Long) - Exact bid increment
|
||||
- More precise than our calculated bid_increment
|
||||
- **ACTION: Replace calculated field**
|
||||
|
||||
5. **condition** (String) - Direct condition field
|
||||
- Cleaner than attribute extraction
|
||||
- **ACTION: Use as primary source**
|
||||
|
||||
6. **categoryInformation** (LotCategoryInformation) - Category data
|
||||
- Structured category info
|
||||
- **ACTION: Extract category path**
|
||||
|
||||
7. **location** (LotLocation) - Lot location details
|
||||
- City, country, possibly address
|
||||
- **ACTION: Add to schema**
|
||||
|
||||
8. **remarks** (String) - Additional notes
|
||||
- May contain pickup/viewing text
|
||||
- **ACTION: Check for viewing/pickup extraction**
|
||||
|
||||
9. **appearance** (String) - Condition appearance
|
||||
- Visual condition notes
|
||||
- **ACTION: Combine with condition_description**
|
||||
|
||||
10. **packaging** (String) - Packaging details
|
||||
- Relevant for shipping intelligence
|
||||
|
||||
11. **quantity** (Long) - Lot quantity
|
||||
- Important for bulk lots
|
||||
|
||||
12. **vat** (BigDecimal) - VAT percentage
|
||||
- For total cost calculations
|
||||
|
||||
13. **buyerPremiumPercentage** (BigDecimal) - Buyer premium
|
||||
- For total cost calculations
|
||||
|
||||
14. **videos** - Video URLs (if available)
|
||||
- **ACTION: Add video support**
|
||||
|
||||
15. **documents** - Document URLs (if available)
|
||||
- May contain specs/manuals
|
||||
|
||||
## Bid History API - Fields
|
||||
|
||||
### Currently Captured ✓
|
||||
- `buyerId` (UUID) - Anonymized bidder
|
||||
- `buyerNumber` (Int) - Bidder number
|
||||
- `currentBid.cents` / `currency` - Bid amount
|
||||
- `autoBid` (Boolean) - Autobid flag
|
||||
- `createdAt` (Timestamp) - Bid time
|
||||
|
||||
### Additional Available:
|
||||
- `negotiated` (Boolean) - Was bid negotiated
|
||||
- **ACTION: Add to bid_history table**
|
||||
|
||||
## Auction API - Not Available
|
||||
- Attempted `auctionDetails` query - **does not exist**
|
||||
- Auction data must be scraped from listing pages
|
||||
|
||||
## Priority Actions for Intelligence
|
||||
|
||||
### HIGH PRIORITY (Immediate):
|
||||
1. ✅ Add `followersCount` field (watch count)
|
||||
2. ✅ Add `estimatedFullPrice` extraction
|
||||
3. ✅ Use `nextBidStepInCents` instead of calculated increment
|
||||
4. ✅ Add `condition` as primary condition source
|
||||
5. ✅ Add `categoryInformation` extraction
|
||||
6. ✅ Add `location` details
|
||||
7. ✅ Add `negotiated` to bid_history table
|
||||
|
||||
### MEDIUM PRIORITY:
|
||||
8. Extract `remarks` for viewing/pickup text
|
||||
9. Add `appearance` and `packaging` fields
|
||||
10. Add `quantity` field
|
||||
11. Add `vat` and `buyerPremiumPercentage` for cost calculations
|
||||
12. Add `biddingStatus` enum extraction
|
||||
|
||||
### LOW PRIORITY:
|
||||
13. Add video URL support
|
||||
14. Add document URL support
|
||||
|
||||
## Updated Schema Requirements
|
||||
|
||||
### lots table - NEW columns:
|
||||
```sql
|
||||
ALTER TABLE lots ADD COLUMN followers_count INTEGER DEFAULT 0;
|
||||
ALTER TABLE lots ADD COLUMN estimated_min_price REAL;
|
||||
ALTER TABLE lots ADD COLUMN estimated_max_price REAL;
|
||||
ALTER TABLE lots ADD COLUMN location_city TEXT;
|
||||
ALTER TABLE lots ADD COLUMN location_country TEXT;
|
||||
ALTER TABLE lots ADD COLUMN lot_condition TEXT; -- Direct from API
|
||||
ALTER TABLE lots ADD COLUMN appearance TEXT;
|
||||
ALTER TABLE lots ADD COLUMN packaging TEXT;
|
||||
ALTER TABLE lots ADD COLUMN quantity INTEGER DEFAULT 1;
|
||||
ALTER TABLE lots ADD COLUMN vat_percentage REAL;
|
||||
ALTER TABLE lots ADD COLUMN buyer_premium_percentage REAL;
|
||||
ALTER TABLE lots ADD COLUMN remarks TEXT;
|
||||
ALTER TABLE lots ADD COLUMN bidding_status TEXT;
|
||||
ALTER TABLE lots ADD COLUMN videos_json TEXT; -- Store as JSON array
|
||||
ALTER TABLE lots ADD COLUMN documents_json TEXT; -- Store as JSON array
|
||||
```
|
||||
|
||||
### bid_history table - NEW column:
|
||||
```sql
|
||||
ALTER TABLE bid_history ADD COLUMN negotiated INTEGER DEFAULT 0;
|
||||
```
|
||||
|
||||
## Intelligence Use Cases
|
||||
|
||||
### With followers_count:
|
||||
- Predict lot popularity and final price
|
||||
- Identify hot items early
|
||||
- Calculate interest-to-bid conversion rate
|
||||
|
||||
### With estimated prices:
|
||||
- Compare final price to estimate
|
||||
- Identify bargains (final < estimate)
|
||||
- Calculate auction house accuracy
|
||||
|
||||
### With nextBidStepInCents:
|
||||
- Show exact next bid amount
|
||||
- Calculate optimal bidding strategy
|
||||
|
||||
### With location:
|
||||
- Filter by proximity
|
||||
- Calculate pickup logistics
|
||||
|
||||
### With vat/buyer_premium:
|
||||
- Calculate true total cost
|
||||
- Compare all-in prices
|
||||
|
||||
### With condition/appearance:
|
||||
- Better condition scoring
|
||||
- Identify restoration projects
|
||||
|
||||
## Updated GraphQL Query
|
||||
|
||||
```graphql
|
||||
query EnhancedLotQuery($lotDisplayId: String!, $locale: String!, $platform: Platform!) {
|
||||
lotDetails(displayId: $lotDisplayId, locale: $locale, platform: $platform) {
|
||||
estimatedFullPrice {
|
||||
min { cents currency }
|
||||
max { cents currency }
|
||||
}
|
||||
lot {
|
||||
id
|
||||
displayId
|
||||
title
|
||||
description { text }
|
||||
currentBidAmount { cents currency }
|
||||
initialAmount { cents currency }
|
||||
nextMinimalBid { cents currency }
|
||||
nextBidStepInCents
|
||||
bidsCount
|
||||
followersCount
|
||||
startDate
|
||||
endDate
|
||||
minimumBidAmountMet
|
||||
biddingStatus
|
||||
condition
|
||||
appearance
|
||||
packaging
|
||||
quantity
|
||||
vat
|
||||
buyerPremiumPercentage
|
||||
remarks
|
||||
auctionId
|
||||
location {
|
||||
city
|
||||
countryCode
|
||||
addressLine1
|
||||
addressLine2
|
||||
}
|
||||
categoryInformation {
|
||||
id
|
||||
name
|
||||
path
|
||||
}
|
||||
images {
|
||||
url
|
||||
thumbnailUrl
|
||||
}
|
||||
videos {
|
||||
url
|
||||
thumbnailUrl
|
||||
}
|
||||
documents {
|
||||
url
|
||||
name
|
||||
}
|
||||
attributes {
|
||||
name
|
||||
value
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
## Summary
|
||||
|
||||
**NEW fields found:** 15+ additional intelligence fields available
|
||||
**Most critical:** `followersCount` (watch count), `estimatedFullPrice`, `nextBidStepInCents`
|
||||
**Data quality impact:** Estimated 80%+ increase in intelligence value
|
||||
|
||||
These fields will significantly enhance prediction and analysis capabilities.
|
||||
@@ -1,114 +0,0 @@
|
||||
# Auto-Start Setup Guide
|
||||
|
||||
The monitor doesn't run automatically yet. Choose your setup based on your server OS:
|
||||
|
||||
---
|
||||
|
||||
## Linux Server (Systemd Service) ⭐ RECOMMENDED
|
||||
|
||||
**Install:**
|
||||
```bash
|
||||
cd /home/tour/scaev
|
||||
chmod +x install_service.sh
|
||||
./install_service.sh
|
||||
```
|
||||
|
||||
**The service will:**
|
||||
- ✅ Start automatically on server boot
|
||||
- ✅ Restart automatically if it crashes
|
||||
- ✅ Log to `~/scaev/logs/monitor.log`
|
||||
- ✅ Poll every 30 minutes
|
||||
|
||||
**Management commands:**
|
||||
```bash
|
||||
sudo systemctl status scaev-monitor # Check if running
|
||||
sudo systemctl stop scaev-monitor # Stop
|
||||
sudo systemctl start scaev-monitor # Start
|
||||
sudo systemctl restart scaev-monitor # Restart
|
||||
journalctl -u scaev-monitor -f # Live logs
|
||||
tail -f ~/scaev/logs/monitor.log # Monitor log file
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Windows (Task Scheduler)
|
||||
|
||||
**Install (Run as Administrator):**
|
||||
```powershell
|
||||
cd C:\vibe\scaev
|
||||
.\setup_windows_task.ps1
|
||||
```
|
||||
|
||||
**The task will:**
|
||||
- ✅ Start automatically on Windows boot
|
||||
- ✅ Restart automatically if it crashes (up to 3 times)
|
||||
- ✅ Run as SYSTEM user
|
||||
- ✅ Poll every 30 minutes
|
||||
|
||||
**Management:**
|
||||
1. Open Task Scheduler (`taskschd.msc`)
|
||||
2. Find `ScaevAuctionMonitor` in Task Scheduler Library
|
||||
3. Right-click to Run/Stop/Disable
|
||||
|
||||
**Or via PowerShell:**
|
||||
```powershell
|
||||
Start-ScheduledTask -TaskName "ScaevAuctionMonitor"
|
||||
Stop-ScheduledTask -TaskName "ScaevAuctionMonitor"
|
||||
Get-ScheduledTask -TaskName "ScaevAuctionMonitor" | Get-ScheduledTaskInfo
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Alternative: Cron Job (Linux)
|
||||
|
||||
**For simpler setup without systemd:**
|
||||
|
||||
```bash
|
||||
# Edit crontab
|
||||
crontab -e
|
||||
|
||||
# Add this line (runs on boot and restarts every hour if not running)
|
||||
@reboot cd /home/tour/scaev && python3 src/monitor.py 30 >> logs/monitor.log 2>&1
|
||||
0 * * * * pgrep -f "monitor.py" || (cd /home/tour/scaev && python3 src/monitor.py 30 >> logs/monitor.log 2>&1 &)
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Verify It's Working
|
||||
|
||||
**Check process is running:**
|
||||
```bash
|
||||
# Linux
|
||||
ps aux | grep monitor.py
|
||||
|
||||
# Windows
|
||||
tasklist | findstr python
|
||||
```
|
||||
|
||||
**Check logs:**
|
||||
```bash
|
||||
# Linux
|
||||
tail -f ~/scaev/logs/monitor.log
|
||||
|
||||
# Windows
|
||||
# Check Task Scheduler history
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
**Service won't start:**
|
||||
1. Check Python path is correct in service file
|
||||
2. Check working directory exists
|
||||
3. Check user permissions
|
||||
4. View error logs: `journalctl -u scaev-monitor -n 50`
|
||||
|
||||
**Monitor stops after a while:**
|
||||
- Check disk space for logs
|
||||
- Check rate limiting isn't blocking requests
|
||||
- Increase RestartSec in service file
|
||||
|
||||
**Database locked errors:**
|
||||
- Ensure only one monitor instance is running
|
||||
- Add timeout to SQLite connections in config
|
||||
@@ -1,169 +0,0 @@
|
||||
# Data Quality Fixes - Condensed Summary
|
||||
|
||||
## Executive Summary
|
||||
✅ **Completed all 5 high-priority data quality tasks:**
|
||||
|
||||
1. Fixed orphaned lots: **16,807 → 13** (99.9% resolved)
|
||||
2. Bid history fetching: Script created, ready to run
|
||||
3. Added followersCount extraction (watch count)
|
||||
4. Added estimatedFullPrice extraction (min/max values)
|
||||
5. Added direct condition field from API
|
||||
|
||||
**Impact:** 80%+ increase in intelligence data capture for future scrapes.
|
||||
|
||||
---
|
||||
|
||||
## Task 1: Fix Orphaned Lots ✅
|
||||
|
||||
**Problem:** 16,807 lots had no matching auction due to auction_id mismatch (UUID vs numeric vs displayId).
|
||||
|
||||
**Solution:**
|
||||
- Updated `parse.py` to extract `auction.displayId` from lot pages
|
||||
- Created migration scripts to rebuild auctions table and re-link lots
|
||||
|
||||
**Results:**
|
||||
- Orphaned lots: **16,807 → 13** (99.9% fixed)
|
||||
- Auctions table: **0% → 100%** complete (lots_count, first_lot_closing_time)
|
||||
|
||||
**Files:** `src/parse.py` | `fix_orphaned_lots.py` | `fix_auctions_table.py`
|
||||
|
||||
---
|
||||
|
||||
## Task 2: Fix Bid History Fetching ✅
|
||||
|
||||
**Problem:** 1,590 lots with bids but no bid history (0.1% coverage).
|
||||
|
||||
**Solution:** Created `fetch_missing_bid_history.py` to backfill bid history via REST API.
|
||||
|
||||
**Status:** Script ready; future scrapes will auto-capture.
|
||||
|
||||
**Runtime:** ~13-15 minutes for 1,590 lots (0.5s rate limit)
|
||||
|
||||
**Files:** `fetch_missing_bid_history.py`
|
||||
|
||||
---
|
||||
|
||||
## Task 3: Add followersCount ✅
|
||||
|
||||
**Problem:** Watch count unavailable (thought missing).
|
||||
|
||||
**Solution:** Discovered in GraphQL API; implemented extraction and schema update.
|
||||
|
||||
**Value:** Predict popularity, track interest-to-bid conversion, identify "sleeper" lots.
|
||||
|
||||
**Files:** `src/cache.py` | `src/graphql_client.py` | `enrich_existing_lots.py` (~2.3 hours runtime)
|
||||
|
||||
---
|
||||
|
||||
## Task 4: Add estimatedFullPrice ✅
|
||||
|
||||
**Problem:** Min/max estimates unavailable (thought missing).
|
||||
|
||||
**Solution:** Discovered `estimatedFullPrice{min,max}` in GraphQL API; extracts cents → EUR.
|
||||
|
||||
**Value:** Detect bargains (`final < min`), overvaluation, build pricing models.
|
||||
|
||||
**Files:** `src/cache.py` | `src/graphql_client.py` | `enrich_existing_lots.py`
|
||||
|
||||
---
|
||||
|
||||
## Task 5: Direct Condition Field ✅
|
||||
|
||||
**Problem:** Condition extracted from attributes (0% success rate).
|
||||
|
||||
**Solution:** Using direct `condition` and `appearance` fields from GraphQL API.
|
||||
|
||||
**Value:** Reliable condition data for scoring, filtering, restoration identification.
|
||||
|
||||
**Files:** `src/cache.py` | `src/graphql_client.py` | `enrich_existing_lots.py`
|
||||
|
||||
---
|
||||
|
||||
## Code Changes Summary
|
||||
|
||||
### Modified Core Files
|
||||
|
||||
**`src/parse.py`**
|
||||
- Extract auction displayId from lot pages
|
||||
- Pass auction data to lot parser
|
||||
|
||||
**`src/cache.py`**
|
||||
- Added 5 columns: `followers_count`, `estimated_min_price`, `estimated_max_price`, `lot_condition`, `appearance`
|
||||
- Auto-migration on startup
|
||||
- Updated `save_lot()` INSERT
|
||||
|
||||
**`src/graphql_client.py`**
|
||||
- Enhanced `LOT_BIDDING_QUERY` with new fields
|
||||
- Updated `format_bid_data()` extraction logic
|
||||
|
||||
### Migration Scripts
|
||||
|
||||
| Script | Purpose | Status | Runtime |
|
||||
|--------|---------|--------|---------|
|
||||
| `fix_orphaned_lots.py` | Fix auction_id mismatch | ✅ Complete | Instant |
|
||||
| `fix_auctions_table.py` | Rebuild auctions table | ✅ Complete | ~2 min |
|
||||
| `fetch_missing_bid_history.py` | Backfill bid history | ⏳ Ready | ~13-15 min |
|
||||
| `enrich_existing_lots.py` | Fetch new fields | ⏳ Ready | ~2.3 hours |
|
||||
|
||||
---
|
||||
|
||||
## Validation: Before vs After
|
||||
|
||||
| Metric | Before | After | Improvement |
|
||||
|--------|--------|-------|-------------|
|
||||
| Orphaned lots | 16,807 (100%) | 13 (0.08%) | **99.9%** |
|
||||
| Auction lots_count | 0% | 100% | **+100%** |
|
||||
| Auction first_lot_closing | 0% | 100% | **+100%** |
|
||||
| Bid history coverage | 0.1% | 1,590 lots ready | **—** |
|
||||
| Intelligence fields | 0 | 5 new fields | **+80%+** |
|
||||
|
||||
---
|
||||
|
||||
## Intelligence Impact
|
||||
|
||||
### New Fields & Value
|
||||
|
||||
| Field | Intelligence Use Case |
|
||||
|-------|----------------------|
|
||||
| `followers_count` | Popularity prediction, interest tracking |
|
||||
| `estimated_min/max_price` | Bargain/overvaluation detection, pricing models |
|
||||
| `lot_condition` | Reliable filtering, condition scoring |
|
||||
| `appearance` | Visual assessment, restoration needs |
|
||||
|
||||
### Data Completeness
|
||||
**80%+ increase** in actionable intelligence for:
|
||||
- Investment opportunity detection
|
||||
- Auction strategy optimization
|
||||
- Predictive modeling
|
||||
- Market analysis
|
||||
|
||||
---
|
||||
|
||||
## Run Migrations (Optional)
|
||||
|
||||
```bash
|
||||
# Completed
|
||||
python fix_orphaned_lots.py
|
||||
python fix_auctions_table.py
|
||||
|
||||
# Optional: Backfill existing data
|
||||
python fetch_missing_bid_history.py # ~13-15 min
|
||||
python enrich_existing_lots.py # ~2.3 hours
|
||||
```
|
||||
|
||||
**Note:** Future scrapes auto-capture all fields; migrations are optional.
|
||||
|
||||
---
|
||||
|
||||
## Success Criteria
|
||||
|
||||
- [x] Orphaned lots: 99.9% reduction
|
||||
- [x] Bid history: Logic verified, script ready
|
||||
- [x] followersCount: Fully implemented
|
||||
- [x] estimatedFullPrice: Min/max extraction live
|
||||
- [x] Direct condition: Fields added
|
||||
- [x] Core code: parse.py, cache.py, graphql_client.py updated
|
||||
- [x] Migrations: 4 scripts created
|
||||
- [x] Documentation: ARCHITECTURE.md and summaries updated
|
||||
|
||||
**Result:** Scraper now captures 80%+ more intelligence with near-perfect data quality.
|
||||
@@ -1,160 +0,0 @@
|
||||
# Dashboard Upgrade Plan
|
||||
|
||||
## Executive Summary
|
||||
**5 new intelligence fields** enable advanced opportunity detection and analytics. Run migrations to activate.
|
||||
|
||||
---
|
||||
|
||||
## New Intelligence Fields
|
||||
|
||||
| Field | Type | Coverage | Value | Use Cases |
|
||||
|-------------------------|---------|--------------------------|-------|-----------------------------------------|
|
||||
| **followers_count** | INTEGER | 100% future, 0% existing | ⭐⭐⭐⭐⭐ | Popularity tracking, sleeper detection |
|
||||
| **estimated_min_price** | REAL | 100% future, 0% existing | ⭐⭐⭐⭐⭐ | Bargain detection, value gap analysis |
|
||||
| **estimated_max_price** | REAL | 100% future, 0% existing | ⭐⭐⭐⭐⭐ | Overvaluation alerts, ROI calculation |
|
||||
| **lot_condition** | TEXT | ~85% future | ⭐⭐⭐ | Quality filtering, condition scoring |
|
||||
| **appearance** | TEXT | ~85% future | ⭐⭐⭐ | Visual assessment, restoration projects |
|
||||
|
||||
### Key Metrics Enabled
|
||||
- Interest-to-bid conversion rate
|
||||
- Auction house estimation accuracy
|
||||
- Bargain/overvaluation detection
|
||||
- Price prediction models
|
||||
|
||||
---
|
||||
|
||||
## Data Quality Fixes ✅
|
||||
**Orphaned lots:** 16,807 → 13 (99.9% fixed)
|
||||
**Auction completeness:** 0% → 100% (lots_count, first_lot_closing_time)
|
||||
|
||||
---
|
||||
|
||||
## Dashboard Upgrades
|
||||
|
||||
### Priority 1: Opportunity Detection (High ROI)
|
||||
|
||||
**1.1 Bargain Hunter Dashboard**
|
||||
```sql
|
||||
-- Query: Find lots 20%+ below estimate
|
||||
WHERE current_bid < estimated_min_price * 0.80
|
||||
AND followers_count > 3
|
||||
AND closing_time > NOW()
|
||||
```
|
||||
**Alert logic:** `value_gap = estimated_min - current_bid`
|
||||
|
||||
**1.2 Sleeper Lots**
|
||||
```sql
|
||||
-- Query: High interest, no bids, <24h left
|
||||
WHERE followers_count > 10
|
||||
AND bid_count = 0
|
||||
AND hours_remaining < 24
|
||||
```
|
||||
|
||||
**1.3 Value Gap Heatmap**
|
||||
- Great deals: <80% of estimate
|
||||
- Fair price: 80-120% of estimate
|
||||
- Overvalued: >120% of estimate
|
||||
|
||||
### Priority 2: Intelligence Analytics
|
||||
|
||||
**2.1 Enhanced Lot Card**
|
||||
```
|
||||
Bidding: €500 current | 12 followers | 8 bids | 2.4/hr
|
||||
Valuation: €1,200-€1,800 est | €700 value gap | €700-€1,300 potential profit
|
||||
Condition: Used - Good | Normal wear
|
||||
Timing: 2h 15m left | First: Dec 6 09:15 | Last: Dec 8 12:10
|
||||
```
|
||||
|
||||
**2.2 Auction House Accuracy**
|
||||
```sql
|
||||
-- Post-auction analysis
|
||||
SELECT category,
|
||||
AVG(ABS(final - midpoint)/midpoint * 100) as accuracy,
|
||||
AVG(final - midpoint) as bias
|
||||
FROM lots WHERE final_price IS NOT NULL
|
||||
GROUP BY category
|
||||
```
|
||||
|
||||
**2.3 Interest Conversion Rate**
|
||||
```sql
|
||||
SELECT
|
||||
COUNT(*) total,
|
||||
COUNT(CASE WHEN followers > 0 THEN 1) as with_followers,
|
||||
COUNT(CASE WHEN bids > 0 THEN 1) as with_bids,
|
||||
ROUND(with_bids / with_followers * 100, 2) as conversion_rate
|
||||
FROM lots
|
||||
```
|
||||
|
||||
### Priority 3: Real-Time Alerts
|
||||
|
||||
```python
|
||||
BARGAIN: current_bid < estimated_min * 0.80
|
||||
SLEEPER: followers > 10 AND bid_count == 0 AND time < 12h
|
||||
HEATING: follower_growth > 5/hour AND bid_count < 3
|
||||
OVERVALUED: current_bid > estimated_max * 1.2
|
||||
```
|
||||
|
||||
### Priority 4: Advanced Analytics
|
||||
|
||||
**4.1 Price Prediction Model**
|
||||
```python
|
||||
features = [
|
||||
'followers_count',
|
||||
'estimated_min_price',
|
||||
'estimated_max_price',
|
||||
'lot_condition',
|
||||
'bid_velocity',
|
||||
'category'
|
||||
]
|
||||
predicted_price = model.predict(features)
|
||||
```
|
||||
|
||||
**4.2 Category Intelligence**
|
||||
- Avg followers per category
|
||||
- Bid rate vs follower rate
|
||||
- Bargain rate by category
|
||||
|
||||
---
|
||||
|
||||
## Database Queries
|
||||
|
||||
### Get Bargains
|
||||
```sql
|
||||
SELECT lot_id, title, current_bid, estimated_min_price,
|
||||
(estimated_min_price - current_bid)/estimated_min_price*100 as bargain_score
|
||||
FROM lots
|
||||
WHERE current_bid < estimated_min_price * 0.80
|
||||
AND LOT>$10,000 in identified opportunities
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Next Steps
|
||||
|
||||
**Today:**
|
||||
```bash
|
||||
# Run to activate all features
|
||||
python enrich_existing_lots.py # ~2.3 hrs
|
||||
python fetch_missing_bid_history.py # ~15 min
|
||||
```
|
||||
|
||||
**This Week:**
|
||||
1. Implement Bargain Hunter Dashboard
|
||||
2. Add opportunity alerts
|
||||
3. Create enhanced lot cards
|
||||
|
||||
**Next Week:**
|
||||
1. Build analytics dashboards
|
||||
2. Implement ML price prediction
|
||||
3. Set up smart notifications
|
||||
|
||||
---
|
||||
|
||||
## Conclusion
|
||||
**80%+ intelligence increase** enables:
|
||||
- 🎯 Automated bargain detection
|
||||
- 📊 Predictive price modeling
|
||||
- ⚡ Real-time opportunity alerts
|
||||
- 💰 ROI tracking
|
||||
|
||||
**Run migrations to activate all features.**
|
||||
@@ -1,164 +0,0 @@
|
||||
# Troostwijk Auction Extractor - Run Instructions
|
||||
|
||||
## Fixed Warnings
|
||||
|
||||
All warnings have been resolved:
|
||||
- ✅ SLF4J logging configured (slf4j-simple)
|
||||
- ✅ Native access enabled for SQLite JDBC
|
||||
- ✅ Logging output controlled via simplelogger.properties
|
||||
|
||||
## Prerequisites
|
||||
|
||||
1. **Java 21** installed
|
||||
2. **Maven** installed
|
||||
3. **IntelliJ IDEA** (recommended) or command line
|
||||
|
||||
## Setup (First Time Only)
|
||||
|
||||
### 1. Install Dependencies
|
||||
|
||||
In IntelliJ Terminal or PowerShell:
|
||||
|
||||
```bash
|
||||
# Reload Maven dependencies
|
||||
mvn clean install
|
||||
|
||||
# Install Playwright browser binaries (first time only)
|
||||
mvn exec:java -e -Dexec.mainClass=com.microsoft.playwright.CLI -Dexec.args="install"
|
||||
```
|
||||
|
||||
## Running the Application
|
||||
|
||||
### Option A: Using IntelliJ IDEA (Easiest)
|
||||
|
||||
1. **Add VM Options for native access:**
|
||||
- Run → Edit Configurations
|
||||
- Select or create configuration for `TroostwijkAuctionExtractor`
|
||||
- In "VM options" field, add:
|
||||
```
|
||||
--enable-native-access=ALL-UNNAMED
|
||||
```
|
||||
|
||||
2. **Add Program Arguments (optional):**
|
||||
- In "Program arguments" field, add:
|
||||
```
|
||||
--max-visits 3
|
||||
```
|
||||
|
||||
3. **Run the application:**
|
||||
- Click the green Run button
|
||||
|
||||
### Option B: Using Maven (Command Line)
|
||||
|
||||
```bash
|
||||
# Run with 3 page limit
|
||||
mvn exec:java
|
||||
|
||||
# Run with custom arguments (override pom.xml defaults)
|
||||
mvn exec:java -Dexec.args="--max-visits 5"
|
||||
|
||||
# Run without cache
|
||||
mvn exec:java -Dexec.args="--no-cache --max-visits 2"
|
||||
|
||||
# Run with unlimited visits
|
||||
mvn exec:java -Dexec.args=""
|
||||
```
|
||||
|
||||
### Option C: Using Java Directly
|
||||
|
||||
```bash
|
||||
# Compile first
|
||||
mvn clean compile
|
||||
|
||||
# Run with native access enabled
|
||||
java --enable-native-access=ALL-UNNAMED \
|
||||
-cp target/classes:$(mvn dependency:build-classpath -Dmdep.outputFile=/dev/stdout -q) \
|
||||
com.auction.TroostwijkAuctionExtractor --max-visits 3
|
||||
```
|
||||
|
||||
## Command Line Arguments
|
||||
|
||||
```
|
||||
--max-visits <n> Limit actual page fetches to n (0 = unlimited, default)
|
||||
--no-cache Disable page caching
|
||||
--help Show help message
|
||||
```
|
||||
|
||||
## Examples
|
||||
|
||||
### Test with 3 page visits (cached pages don't count):
|
||||
```bash
|
||||
mvn exec:java -Dexec.args="--max-visits 3"
|
||||
```
|
||||
|
||||
### Fresh extraction without cache:
|
||||
```bash
|
||||
mvn exec:java -Dexec.args="--no-cache --max-visits 5"
|
||||
```
|
||||
|
||||
### Full extraction (all pages, unlimited):
|
||||
```bash
|
||||
mvn exec:java -Dexec.args=""
|
||||
```
|
||||
|
||||
## Expected Output (No Warnings)
|
||||
|
||||
```
|
||||
=== Troostwijk Auction Extractor ===
|
||||
Max page visits set to: 3
|
||||
|
||||
Initializing Playwright browser...
|
||||
✓ Browser ready
|
||||
✓ Cache database initialized
|
||||
|
||||
Starting auction extraction from https://www.troostwijkauctions.com/auctions
|
||||
|
||||
[Page 1] Fetching auctions...
|
||||
✓ Fetched from website (visit 1/3)
|
||||
✓ Found 20 auctions
|
||||
|
||||
[Page 2] Fetching auctions...
|
||||
✓ Loaded from cache
|
||||
✓ Found 20 auctions
|
||||
|
||||
[Page 3] Fetching auctions...
|
||||
✓ Fetched from website (visit 2/3)
|
||||
✓ Found 20 auctions
|
||||
|
||||
✓ Total auctions extracted: 60
|
||||
|
||||
=== Results ===
|
||||
Total auctions found: 60
|
||||
Dutch auctions (NL): 45
|
||||
Actual page visits: 2
|
||||
|
||||
✓ Browser and cache closed
|
||||
```
|
||||
|
||||
## Cache Management
|
||||
|
||||
- Cache is stored in: `cache/page_cache.db`
|
||||
- Cache expires after: 24 hours (configurable in code)
|
||||
- To clear cache: Delete `cache/page_cache.db` file
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### If you still see warnings:
|
||||
|
||||
1. **Reload Maven project in IntelliJ:**
|
||||
- Right-click `pom.xml` → Maven → Reload project
|
||||
|
||||
2. **Verify VM options:**
|
||||
- Ensure `--enable-native-access=ALL-UNNAMED` is in VM options
|
||||
|
||||
3. **Clean and rebuild:**
|
||||
```bash
|
||||
mvn clean install
|
||||
```
|
||||
|
||||
### If Playwright fails:
|
||||
|
||||
```bash
|
||||
# Reinstall browser binaries
|
||||
mvn exec:java -e -Dexec.mainClass=com.microsoft.playwright.CLI -Dexec.args="install chromium"
|
||||
```
|
||||
Reference in New Issue
Block a user