Comprehensive Data Enrichment Plan
Current Status: Working Features
- ✅ Image downloads (concurrent)
- ✅ Basic bid data (current_bid, starting_bid, minimum_bid, bid_count, closing_time)
- ✅ Status extraction
- ✅ Brand/Model from attributes
- ✅ Attributes JSON storage
Phase 1: Core Bidding Intelligence (HIGH PRIORITY)
Data Sources Identified:
- GraphQL lot bidding API - Already integrated
  - currentBidAmount, initialAmount, bidsCount
  - startDate, endDate (for first_bid_time calculation)
- REST bid history API ✨ NEW DISCOVERY
  - Endpoint: https://shared-api.tbauctions.com/bidmanagement/lots/{lot_uuid}/bidding-history
  - Returns: bid amounts, timestamps, autobid flags, bidder IDs
  - Pagination supported
Database Schema Changes:
-- Extend lots table with bidding intelligence
ALTER TABLE lots ADD COLUMN estimated_min DECIMAL(12,2);
ALTER TABLE lots ADD COLUMN estimated_max DECIMAL(12,2);
ALTER TABLE lots ADD COLUMN reserve_price DECIMAL(12,2);
ALTER TABLE lots ADD COLUMN reserve_met BOOLEAN DEFAULT FALSE;
ALTER TABLE lots ADD COLUMN bid_increment DECIMAL(12,2);
ALTER TABLE lots ADD COLUMN watch_count INTEGER DEFAULT 0;
ALTER TABLE lots ADD COLUMN first_bid_time TEXT;
ALTER TABLE lots ADD COLUMN last_bid_time TEXT;
ALTER TABLE lots ADD COLUMN bid_velocity DECIMAL(5,2);
-- NEW: Bid history table
CREATE TABLE IF NOT EXISTS bid_history (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    lot_id TEXT NOT NULL,
    lot_uuid TEXT NOT NULL,
    bid_amount DECIMAL(12,2) NOT NULL,
    bid_time TEXT NOT NULL,
    is_winning BOOLEAN DEFAULT FALSE,
    is_autobid BOOLEAN DEFAULT FALSE,
    bidder_id TEXT,
    bidder_number INTEGER,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (lot_id) REFERENCES lots(lot_id)
);
CREATE INDEX IF NOT EXISTS idx_bid_history_lot_time ON bid_history(lot_id, bid_time);
CREATE INDEX IF NOT EXISTS idx_bid_history_bidder ON bid_history(bidder_id);
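The schema above reads like SQLite (AUTOINCREMENT, TEXT timestamps). If that assumption holds, note that SQLite has no `ADD COLUMN IF NOT EXISTS`, so re-running the ALTER statements fails on the second run. A minimal sketch of an idempotent migration follows; the helper name `add_missing_columns` and the `LOT_COLUMNS` dict are illustrative, not part of the existing codebase.

```python
import sqlite3

# Phase 1 columns for the lots table, copied from the ALTER statements above.
LOT_COLUMNS = {
    "estimated_min": "DECIMAL(12,2)",
    "estimated_max": "DECIMAL(12,2)",
    "reserve_price": "DECIMAL(12,2)",
    "reserve_met": "BOOLEAN DEFAULT FALSE",
    "bid_increment": "DECIMAL(12,2)",
    "watch_count": "INTEGER DEFAULT 0",
    "first_bid_time": "TEXT",
    "last_bid_time": "TEXT",
    "bid_velocity": "DECIMAL(5,2)",
}


def add_missing_columns(conn, table, columns):
    """Add only the columns that the table does not have yet.

    Table and column names are interpolated into SQL, so they must come
    from trusted constants like LOT_COLUMNS, never from user input.
    """
    # PRAGMA table_info returns one row per column; index 1 is the name.
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    added = []
    for name, declaration in columns.items():
        if name not in existing:
            conn.execute(f"ALTER TABLE {table} ADD COLUMN {name} {declaration}")
            added.append(name)
    conn.commit()
    return added
```

Calling `add_missing_columns(conn, "lots", LOT_COLUMNS)` a second time returns an empty list instead of raising, which makes the migration safe to run at every scraper start.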
Implementation:
- Add fetch_bid_history() function to call REST API
- Parse and store all historical bids
- Calculate bid_velocity (bids per hour)
- Extract first_bid_time, last_bid_time
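The steps above could be sketched roughly as follows. Only the endpoint URL comes from this plan; the query parameters (`pageNumber`, `pageSize`) and the response keys (`results`, `hasNext`) are assumptions that need to be checked against a real response. The velocity definition (bids divided by the hours between first and last bid) is also one possible reading of "bids per hour".

```python
import json
from datetime import datetime
from urllib.request import urlopen

BID_HISTORY_URL = (
    "https://shared-api.tbauctions.com/bidmanagement/lots/{lot_uuid}/bidding-history"
)


def http_get_json(url):
    """Default fetcher; swap in a session-based or async client as needed."""
    with urlopen(url, timeout=30) as resp:
        return json.load(resp)


def fetch_bid_history(lot_uuid, get_json=http_get_json, page_size=100):
    """Collect all bids for one lot, following pagination.

    ASSUMPTION: pageNumber/pageSize parameters and the "results"/"hasNext"
    keys are hypothetical placeholders for the real pagination contract.
    """
    bids, page = [], 1
    base = BID_HISTORY_URL.format(lot_uuid=lot_uuid)
    while True:
        data = get_json(f"{base}?pageNumber={page}&pageSize={page_size}")
        bids.extend(data.get("results", []))
        if not data.get("hasNext"):
            return bids
        page += 1


def bid_timing(bid_times):
    """Return (first_bid_time, last_bid_time, bid_velocity).

    bid_times is a list of ISO 8601 strings. Velocity is bids per hour over
    the span between first and last bid; 0.0 when that span is empty.
    """
    if not bid_times:
        return None, None, 0.0
    # Older Python versions reject a trailing "Z", so normalise it first.
    times = sorted(datetime.fromisoformat(t.replace("Z", "+00:00")) for t in bid_times)
    first, last = times[0], times[-1]
    hours = (last - first).total_seconds() / 3600
    velocity = round(len(times) / hours, 2) if hours > 0 else 0.0
    return first.isoformat(), last.isoformat(), velocity
```

Passing the fetcher in as `get_json` keeps the pagination loop testable without network access and lets the scraper reuse whatever HTTP client it already has.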
Phase 2: Valuation Intelligence
Data Sources:
- Attributes array (already in NEXT_DATA)
  - condition, year, manufacturer, model, serial_number
- Description field
  - Extract year patterns, condition mentions, damage descriptions
Database Schema:
-- Valuation fields
ALTER TABLE lots ADD COLUMN condition_score DECIMAL(4,2); -- 0-10 scale; (4,2) so that 10.00 fits
ALTER TABLE lots ADD COLUMN condition_description TEXT;
ALTER TABLE lots ADD COLUMN year_manufactured INTEGER;
ALTER TABLE lots ADD COLUMN serial_number TEXT;
ALTER TABLE lots ADD COLUMN manufacturer TEXT;
ALTER TABLE lots ADD COLUMN damage_description TEXT;
ALTER TABLE lots ADD COLUMN provenance TEXT;
Implementation:
- Parse attributes for: Jaar, Conditie, Serienummer, Fabrikant
- Extract 4-digit years from title/description
- Map condition values to 0-10 scale
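A minimal sketch of these three steps, under stated assumptions: attributes are taken to be a list of `{"name": ..., "value": ...}` dicts, the condition labels and their scores in `CONDITION_SCORES` are made-up examples, and the accepted year range (1950 to 2049) is an arbitrary guard against lot numbers and model codes.

```python
import re

# Dutch attribute names from the plan mapped to the new lots columns.
ATTRIBUTE_FIELDS = {
    "jaar": "year_manufactured",
    "conditie": "condition_description",
    "serienummer": "serial_number",
    "fabrikant": "manufacturer",
}

# ASSUMPTION: illustrative labels and scores; replace with values seen in real data.
CONDITION_SCORES = {
    "nieuw": 10.0,
    "zo goed als nieuw": 9.0,
    "gebruikt": 6.0,
    "defect": 2.0,
}

# Standalone 4-digit years between 1950 and 2049 (assumed plausible range).
YEAR_RE = re.compile(r"\b(19[5-9]\d|20[0-4]\d)\b")


def extract_year(text):
    """Return the first plausible 4-digit year in text, or None."""
    match = YEAR_RE.search(text or "")
    return int(match.group(1)) if match else None


def enrich_lot(attributes, title="", description=""):
    """Build the Phase 2 valuation fields for one lot."""
    fields = {}
    for attr in attributes:
        column = ATTRIBUTE_FIELDS.get(str(attr.get("name", "")).strip().lower())
        if column and attr.get("value"):
            fields[column] = str(attr["value"]).strip()
    # Prefer the explicit attribute, then fall back to title and description.
    fields["year_manufactured"] = (
        extract_year(fields.get("year_manufactured"))
        or extract_year(title)
        or extract_year(description)
    )
    condition = fields.get("condition_description", "").lower()
    fields["condition_score"] = CONDITION_SCORES.get(condition)
    return fields
```

Unknown condition labels map to `None` rather than a guessed score, which keeps the expected partial coverage visible in the database instead of hiding it behind defaults.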
Phase 3: Auction House Intelligence
Data Sources:
- GraphQL auction query
  - Already partially working
- Auction NEXT_DATA
  - May contain buyer's premium, shipping costs
Database Schema:
ALTER TABLE auctions ADD COLUMN buyers_premium_percent DECIMAL(5,2);
ALTER TABLE auctions ADD COLUMN shipping_available BOOLEAN;
ALTER TABLE auctions ADD COLUMN payment_methods TEXT;
Viewing/Pickup Times Resolution
Finding:
- viewingDays and collectionDays in GraphQL only return location (city, countryCode)
- Times are NOT in the GraphQL API
- Times must come from the auction NEXT_DATA, or are simply not set for many auctions
Solution:
- Mark viewing_time/pickup_date as "location only" when times unavailable
- Store: "Nijmegen, NL" instead of full date/time string
- Accept that many auctions don't have viewing times set
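The location-only fallback could look like the sketch below. The `city` and `countryCode` keys mirror the GraphQL fields named in the finding; `startDate` and `endDate` are hypothetical keys for wherever the times end up coming from, and the function name is illustrative.

```python
def format_schedule_entry(entry):
    """Format one viewing or collection day for viewing_time/pickup_date.

    Returns "start - end (City, CC)" when both times are present,
    "City, CC" when only the location is known, and None when empty.
    """
    if not entry:
        return None
    # Join only the location parts that are actually present.
    location = ", ".join(
        part for part in (entry.get("city"), entry.get("countryCode")) if part
    )
    # ASSUMPTION: startDate/endDate are placeholder key names.
    start, end = entry.get("startDate"), entry.get("endDate")
    if start and end:
        window = f"{start} - {end}"
        return f"{window} ({location})" if location else window
    return location or None
```

For an entry such as `{"city": "Nijmegen", "countryCode": "NL"}` this yields `"Nijmegen, NL"`, matching the stored value described above.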
Priority Implementation Order:
- BID HISTORY API (30 min) - Highest value
  - Fetch and store all bid history
  - Calculate bid_velocity
  - Track autobid patterns
- ENRICHED ATTRIBUTES (20 min) - Medium-high value
  - Extract year, condition, manufacturer from existing data
  - Parse description for damage/condition mentions
- VIEWING/PICKUP FIX (10 min) - Low value (data often missing)
  - Update to store location-only when times unavailable
Data Quality Expectations:
| Field | Coverage Expected | Source |
|---|---|---|
| bid_history | 100% (for lots with bids) | REST API |
| bid_velocity | 100% (calculated) | Derived |
| year_manufactured | ~40% | Attributes/Title |
| condition_score | ~30% | Attributes |
| manufacturer | ~60% | Attributes |
| viewing_time | ~20% | Auction NEXT_DATA (often not set) |
| buyers_premium | 100% | GraphQL/Props |
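To compare these expectations with what the scraper actually stores, a small coverage check can be run after a crawl. This is a sketch assuming a SQLite database; `column_coverage` is an illustrative helper name.

```python
import sqlite3


def column_coverage(conn, table, columns):
    """Return {column: percent of rows with a non-NULL value}.

    Table and column names are interpolated into SQL, so pass trusted
    constants only.
    """
    total = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    coverage = {}
    for column in columns:
        # COUNT(column) counts only rows where the column is not NULL.
        filled = conn.execute(f"SELECT COUNT({column}) FROM {table}").fetchone()[0]
        coverage[column] = round(100.0 * filled / total, 1) if total else 0.0
    return coverage
```

For example, `column_coverage(conn, "lots", ["year_manufactured", "condition_score", "manufacturer"])` returns percentages that can be set against the table above.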