# Comprehensive Data Enrichment Plan

## Current Status: Working Features

- ✅ Image downloads (concurrent)
- ✅ Basic bid data (current_bid, starting_bid, minimum_bid, bid_count, closing_time)
- ✅ Status extraction
- ✅ Brand/Model from attributes
- ✅ Attributes JSON storage

## Phase 1: Core Bidding Intelligence (HIGH PRIORITY)

### Data Sources Identified:

1. **GraphQL lot bidding API** - Already integrated
   - currentBidAmount, initialAmount, bidsCount
   - startDate, endDate (for first_bid_time calculation)

2. **REST bid history API** ✨ NEW DISCOVERY
   - Endpoint: `https://shared-api.tbauctions.com/bidmanagement/lots/{lot_uuid}/bidding-history`
   - Returns: bid amounts, timestamps, autobid flags, bidder IDs
   - Pagination supported
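
A minimal Python sketch of paging through this endpoint. Only the URL comes from the plan above; the query parameters (`pageNumber`, `pageSize`) and the response keys (`results`, `hasNext`) are assumptions that should be checked against a real response before relying on them. The HTTP call is injected as `get_json` so the paging logic can be exercised without network access.

```python
import json
from typing import Callable
from urllib.parse import urlencode
from urllib.request import urlopen

BID_HISTORY_URL = (
    "https://shared-api.tbauctions.com/bidmanagement/lots/{lot_uuid}/bidding-history"
)


def http_get_json(url: str) -> dict:
    """Fetch a URL and decode the JSON body."""
    with urlopen(url, timeout=30) as response:
        return json.load(response)


def fetch_bid_history(
    lot_uuid: str,
    get_json: Callable[[str], dict] = http_get_json,
    page_size: int = 100,
    max_pages: int = 1000,
) -> list:
    """Collect all bids for one lot by walking the pages in order.

    ASSUMPTION: `pageNumber`/`pageSize` query parameters and the
    `results`/`hasNext` response keys are guesses, not confirmed API fields.
    """
    bids = []
    for page in range(1, max_pages + 1):
        query = urlencode({"pageNumber": page, "pageSize": page_size})
        url = BID_HISTORY_URL.format(lot_uuid=lot_uuid) + "?" + query
        payload = get_json(url)
        results = payload.get("results", [])
        bids.extend(results)
        # Stop on an empty page or when the API reports no further pages.
        if not results or not payload.get("hasNext", False):
            break
    return bids
```

The `max_pages` cap is a safety net so that a misread pagination flag cannot cause an endless loop.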

### Database Schema Changes:

```sql
-- Extend lots table with bidding intelligence
ALTER TABLE lots ADD COLUMN estimated_min DECIMAL(12,2);
ALTER TABLE lots ADD COLUMN estimated_max DECIMAL(12,2);
ALTER TABLE lots ADD COLUMN reserve_price DECIMAL(12,2);
ALTER TABLE lots ADD COLUMN reserve_met BOOLEAN DEFAULT FALSE;
ALTER TABLE lots ADD COLUMN bid_increment DECIMAL(12,2);
ALTER TABLE lots ADD COLUMN watch_count INTEGER DEFAULT 0;
ALTER TABLE lots ADD COLUMN first_bid_time TEXT;
ALTER TABLE lots ADD COLUMN last_bid_time TEXT;
ALTER TABLE lots ADD COLUMN bid_velocity DECIMAL(5,2);

-- NEW: Bid history table
CREATE TABLE IF NOT EXISTS bid_history (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    lot_id TEXT NOT NULL,
    lot_uuid TEXT NOT NULL,
    bid_amount DECIMAL(12,2) NOT NULL,
    bid_time TEXT NOT NULL,
    is_winning BOOLEAN DEFAULT FALSE,
    is_autobid BOOLEAN DEFAULT FALSE,
    bidder_id TEXT,
    bidder_number INTEGER,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (lot_id) REFERENCES lots(lot_id)
);

CREATE INDEX IF NOT EXISTS idx_bid_history_lot_time ON bid_history(lot_id, bid_time);
CREATE INDEX IF NOT EXISTS idx_bid_history_bidder ON bid_history(bidder_id);
```
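
If the database is SQLite (the `AUTOINCREMENT` keyword and `TEXT` timestamps suggest so, but that is an assumption), `ALTER TABLE ... ADD COLUMN` has no `IF NOT EXISTS` form, so running the column additions a second time fails. A possible way to make the migration re-runnable is to check `PRAGMA table_info` first. The helper names below are hypothetical, and `DEFAULT 0` stands in for `DEFAULT FALSE` so the sketch also runs on older SQLite builds.

```python
import sqlite3

# Column definitions taken from the Phase 1 schema changes.
PHASE1_COLUMNS = [
    ("estimated_min", "DECIMAL(12,2)"),
    ("estimated_max", "DECIMAL(12,2)"),
    ("reserve_price", "DECIMAL(12,2)"),
    ("reserve_met", "BOOLEAN DEFAULT 0"),
    ("bid_increment", "DECIMAL(12,2)"),
    ("watch_count", "INTEGER DEFAULT 0"),
    ("first_bid_time", "TEXT"),
    ("last_bid_time", "TEXT"),
    ("bid_velocity", "DECIMAL(5,2)"),
]


def add_column_if_missing(conn, table, column, ddl_type):
    """Add a column unless the table already has it; return True if added."""
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    if column in existing:
        return False
    # Identifiers cannot be bound as parameters; only pass trusted names here.
    conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {ddl_type}")
    return True


def migrate_phase1(conn):
    """Apply the Phase 1 column additions; return the names actually added."""
    added = [
        name
        for name, ddl_type in PHASE1_COLUMNS
        if add_column_if_missing(conn, "lots", name, ddl_type)
    ]
    conn.commit()
    return added
```

Calling `migrate_phase1(conn)` a second time returns an empty list instead of raising an error.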

### Implementation:

- Add `fetch_bid_history()` function to call REST API
- Parse and store all historical bids
- Calculate bid_velocity (bids per hour)
- Extract first_bid_time, last_bid_time
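
The derived fields can be sketched as follows in Python. The plan only says "bids per hour", so the exact definition is an assumption here: the number of bids divided by the hours between the first and last bid, with the window clamped to at least one hour so that a single bid or a short burst does not divide by zero. Timestamps are assumed to be ISO 8601 strings, and `summarize_bids` is a hypothetical name.

```python
from datetime import datetime


def parse_time(value: str) -> datetime:
    """Parse an ISO 8601 timestamp; a trailing 'Z' is mapped to UTC."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def summarize_bids(bid_times: list) -> dict:
    """Derive first_bid_time, last_bid_time and bid_velocity from bid timestamps.

    ASSUMPTION: velocity = bid count / hours between first and last bid,
    using a minimum window of one hour.
    """
    if not bid_times:
        return {"first_bid_time": None, "last_bid_time": None, "bid_velocity": 0.0}
    parsed = sorted(parse_time(value) for value in bid_times)
    first, last = parsed[0], parsed[-1]
    hours = max((last - first).total_seconds() / 3600.0, 1.0)
    return {
        "first_bid_time": first.isoformat(),
        "last_bid_time": last.isoformat(),
        # Two decimals, matching the DECIMAL(5,2) column.
        "bid_velocity": round(len(parsed) / hours, 2),
    }
```

For example, three bids spread over four hours give a velocity of 0.75 bids per hour under this definition.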

## Phase 2: Valuation Intelligence

### Data Sources:

1. **Attributes array** (already in `__NEXT_DATA__`)
   - condition, year, manufacturer, model, serial_number

2. **Description field**
   - Extract year patterns, condition mentions, damage descriptions

### Database Schema:

```sql
-- Valuation fields
-- condition_score holds 0.00 to 10.00, which needs four digits in total
ALTER TABLE lots ADD COLUMN condition_score DECIMAL(4,2);
ALTER TABLE lots ADD COLUMN condition_description TEXT;
ALTER TABLE lots ADD COLUMN year_manufactured INTEGER;
ALTER TABLE lots ADD COLUMN serial_number TEXT;
ALTER TABLE lots ADD COLUMN manufacturer TEXT;
ALTER TABLE lots ADD COLUMN damage_description TEXT;
ALTER TABLE lots ADD COLUMN provenance TEXT;
```

### Implementation:

- Parse attributes for: Jaar (year), Conditie (condition), Serienummer (serial number), Fabrikant (manufacturer)
- Extract 4-digit years from title/description
- Map condition values to 0-10 scale
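
A hedged Python sketch of these three steps. The attribute shape (a list of `{"name": ..., "value": ...}` dicts), the function names, and especially the condition-to-score table are assumptions for illustration; the real condition labels and their scores need to come from the observed data.

```python
import re
from datetime import date

# Four-digit years from 1900 to 2099, not embedded in longer digit runs.
YEAR_RE = re.compile(r"\b(19\d{2}|20\d{2})\b")

# Dutch attribute names from the plan, mapped to lots columns.
ATTRIBUTE_FIELDS = {
    "jaar": "year_manufactured",
    "conditie": "condition_description",
    "serienummer": "serial_number",
    "fabrikant": "manufacturer",
}

# HYPOTHETICAL mapping: placeholder labels and scores, not observed values.
CONDITION_SCORES = {"nieuw": 10.0, "goed": 7.0, "gebruikt": 5.0, "defect": 1.0}


def extract_year(text, max_year=None):
    """Return the first plausible 4-digit year in text, or None."""
    limit = max_year if max_year is not None else date.today().year + 1
    for match in YEAR_RE.findall(text or ""):
        if int(match) <= limit:
            return int(match)
    return None


def enrich_from_attributes(attributes, title=""):
    """Build valuation fields from the attributes array, with a title fallback."""
    fields = {}
    for attr in attributes:
        column = ATTRIBUTE_FIELDS.get(str(attr.get("name", "")).strip().lower())
        value = str(attr.get("value", "")).strip()
        if column and value:
            fields[column] = value
    # Prefer the explicit Jaar attribute; fall back to a year in the title.
    year = extract_year(fields.get("year_manufactured")) or extract_year(title)
    fields["year_manufactured"] = year
    condition = fields.get("condition_description")
    fields["condition_score"] = (
        CONDITION_SCORES.get(condition.lower()) if condition else None
    )
    return fields
```

Unknown condition labels map to `None` rather than a guessed score, which keeps the expected partial coverage visible in the data.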

## Phase 3: Auction House Intelligence

### Data Sources:

1. **GraphQL auction query**
   - Already partially working

2. **Auction `__NEXT_DATA__`**
   - May contain buyer's premium, shipping costs

### Database Schema:

```sql
ALTER TABLE auctions ADD COLUMN buyers_premium_percent DECIMAL(5,2);
ALTER TABLE auctions ADD COLUMN shipping_available BOOLEAN;
ALTER TABLE auctions ADD COLUMN payment_methods TEXT;
```

## Viewing/Pickup Times Resolution

### Finding:

- `viewingDays` and `collectionDays` in GraphQL only return location (city, countryCode)
- Times are NOT in the GraphQL API
- If times exist at all, they must be in the auction `__NEXT_DATA__`; many auctions do not set them

### Solution:

- Mark viewing_time/pickup_date as "location only" when times are unavailable
- Store: "Nijmegen, NL" instead of full date/time string
- Accept that many auctions don't have viewing times set
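
The location-only fallback could look like the following Python sketch. The function name and the combined string layout used when times are present are assumptions; only the "Nijmegen, NL" location-only form comes from the plan.

```python
from typing import Optional


def format_viewing(
    city: Optional[str],
    country_code: Optional[str],
    start: Optional[str] = None,
    end: Optional[str] = None,
) -> Optional[str]:
    """Build the stored viewing/pickup string, degrading to location only."""
    location = ", ".join(part for part in (city, country_code) if part)
    times = " - ".join(part for part in (start, end) if part)
    if times and location:
        # ASSUMPTION: times first, location in parentheses.
        return f"{times} ({location})"
    # Location only, times only, or nothing at all.
    return times or location or None
```

With only a city and country code, `format_viewing("Nijmegen", "NL")` yields the location-only value described above.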

## Priority Implementation Order:

1. **BID HISTORY API** (30 min) - Highest value
   - Fetch and store all bid history
   - Calculate bid_velocity
   - Track autobid patterns

2. **ENRICHED ATTRIBUTES** (20 min) - Medium-high value
   - Extract year, condition, manufacturer from existing data
   - Parse description for damage/condition mentions

3. **VIEWING/PICKUP FIX** (10 min) - Low value (data often missing)
   - Update to store location-only when times unavailable

## Data Quality Expectations:

| Field | Coverage Expected | Source |
|-------|------------------|---------|
| bid_history | 100% (for lots with bids) | REST API |
| bid_velocity | 100% (calculated) | Derived |
| year_manufactured | ~40% | Attributes/Title |
| condition_score | ~30% | Attributes |
| manufacturer | ~60% | Attributes |
| viewing_time | ~20% | Often not set |
| buyers_premium | 100% | GraphQL/Props |
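
Once the enrichment has run, the expected coverage figures can be compared with actual data. A minimal Python sketch, assuming an SQLite database and that "coverage" means the share of rows with a non-NULL value; `column_coverage` is a hypothetical helper name.

```python
import sqlite3


def column_coverage(conn, table, columns):
    """Return the percentage of rows with a non-NULL value, per column."""
    total = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    if total == 0:
        return {column: 0.0 for column in columns}
    coverage = {}
    for column in columns:
        # COUNT(column) counts only non-NULL values.
        filled = conn.execute(f"SELECT COUNT({column}) FROM {table}").fetchone()[0]
        coverage[column] = round(100.0 * filled / total, 1)
    return coverage
```

Table and column names are interpolated into the SQL, so this should only be called with trusted identifiers such as the schema columns listed in this plan.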

## Estimated Total Implementation Time: 60-90 minutes