# Comprehensive Data Enrichment Plan

## Current Status: Working Features

- ✅ Image downloads (concurrent)
- ✅ Basic bid data (current_bid, starting_bid, minimum_bid, bid_count, closing_time)
- ✅ Status extraction
- ✅ Brand/Model from attributes
- ✅ Attributes JSON storage

## Phase 1: Core Bidding Intelligence (HIGH PRIORITY)

### Data Sources Identified:

1. **GraphQL lot bidding API** - Already integrated
   - currentBidAmount, initialAmount, bidsCount
   - startDate, endDate (for first_bid_time calculation)
2. **REST bid history API** ✨ NEW DISCOVERY
   - Endpoint: `https://shared-api.tbauctions.com/bidmanagement/lots/{lot_uuid}/bidding-history`
   - Returns: bid amounts, timestamps, autobid flags, bidder IDs
   - Pagination supported

### Database Schema Changes:

```sql
-- Extend lots table with bidding intelligence
ALTER TABLE lots ADD COLUMN estimated_min DECIMAL(12,2);
ALTER TABLE lots ADD COLUMN estimated_max DECIMAL(12,2);
ALTER TABLE lots ADD COLUMN reserve_price DECIMAL(12,2);
ALTER TABLE lots ADD COLUMN reserve_met BOOLEAN DEFAULT FALSE;
ALTER TABLE lots ADD COLUMN bid_increment DECIMAL(12,2);
ALTER TABLE lots ADD COLUMN watch_count INTEGER DEFAULT 0;
ALTER TABLE lots ADD COLUMN first_bid_time TEXT;
ALTER TABLE lots ADD COLUMN last_bid_time TEXT;
ALTER TABLE lots ADD COLUMN bid_velocity DECIMAL(5,2);

-- NEW: Bid history table
CREATE TABLE IF NOT EXISTS bid_history (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    lot_id TEXT NOT NULL,
    lot_uuid TEXT NOT NULL,
    bid_amount DECIMAL(12,2) NOT NULL,
    bid_time TEXT NOT NULL,
    is_winning BOOLEAN DEFAULT FALSE,
    is_autobid BOOLEAN DEFAULT FALSE,
    bidder_id TEXT,
    bidder_number INTEGER,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (lot_id) REFERENCES lots(lot_id)
);

CREATE INDEX IF NOT EXISTS idx_bid_history_lot_time ON bid_history(lot_id, bid_time);
CREATE INDEX IF NOT EXISTS idx_bid_history_bidder ON bid_history(bidder_id);
```

### Implementation:

- Add `fetch_bid_history()` function to call REST API
- Parse and store all historical bids
- Calculate bid_velocity (bids per hour)
- Extract first_bid_time, last_bid_time

## Phase 2: Valuation Intelligence

### Data Sources:

1. **Attributes array** (already in `__NEXT_DATA__`)
   - condition, year, manufacturer, model, serial_number
2. **Description field**
   - Extract year patterns, condition mentions, damage descriptions

### Database Schema:

```sql
-- Valuation fields
-- condition_score holds a 0-10 scale; DECIMAL(3,2) would cap at 9.99
ALTER TABLE lots ADD COLUMN condition_score DECIMAL(4,2);
ALTER TABLE lots ADD COLUMN condition_description TEXT;
ALTER TABLE lots ADD COLUMN year_manufactured INTEGER;
ALTER TABLE lots ADD COLUMN serial_number TEXT;
ALTER TABLE lots ADD COLUMN manufacturer TEXT;
ALTER TABLE lots ADD COLUMN damage_description TEXT;
ALTER TABLE lots ADD COLUMN provenance TEXT;
```

### Implementation:

- Parse attributes for: Jaar, Conditie, Serienummer, Fabrikant
- Extract 4-digit years from title/description
- Map condition values to 0-10 scale

## Phase 3: Auction House Intelligence

### Data Sources:

1. **GraphQL auction query** - Already partially working
2. **Auction `__NEXT_DATA__`** - May contain buyer's premium, shipping costs

### Database Schema:

```sql
ALTER TABLE auctions ADD COLUMN buyers_premium_percent DECIMAL(5,2);
ALTER TABLE auctions ADD COLUMN shipping_available BOOLEAN;
ALTER TABLE auctions ADD COLUMN payment_methods TEXT;
```

## Viewing/Pickup Times Resolution

### Finding:

- `viewingDays` and `collectionDays` in GraphQL only return location (city, countryCode)
- Times are NOT in the GraphQL API
- If times exist at all, they must be in the auction `__NEXT_DATA__`; many auctions do not set them

### Solution:

- Mark viewing_time/pickup_date as "location only" when times are unavailable
- Store: "Nijmegen, NL" instead of full date/time string
- Accept that many auctions don't have viewing times set

## Priority Implementation Order:

1. **BID HISTORY API** (30 min) - Highest value
   - Fetch and store all bid history
   - Calculate bid_velocity
   - Track autobid patterns
2. **ENRICHED ATTRIBUTES** (20 min) - Medium-high value
   - Extract year, condition, manufacturer from existing data
   - Parse description for damage/condition mentions
3. **VIEWING/PICKUP FIX** (10 min) - Low value (data often missing)
   - Update to store location-only when times unavailable

## Data Quality Expectations:

| Field | Expected Coverage | Source |
|-------|-------------------|--------|
| bid_history | 100% (for lots with bids) | REST API |
| bid_velocity | 100% (calculated) | Derived |
| year_manufactured | ~40% | Attributes/Title |
| condition_score | ~30% | Attributes |
| manufacturer | ~60% | Attributes |
| viewing_time | ~20% | Often not set |
| buyers_premium_percent | 100% | GraphQL/Props |

## Estimated Total Implementation Time: 60-90 minutes
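
## Appendix: Sketch of Phase 1 Derived Fields

A minimal sketch of how first_bid_time, last_bid_time, and bid_velocity could be derived once bid timestamps have been fetched. The names `bid_stats` and `parse_ts` are hypothetical, and the ISO 8601 timestamp format is an assumption: the plan does not specify the response format of the bid history API. Velocity is taken here as total bids divided by the hours between the first and last bid, which is only one possible reading of "bids per hour".

```python
from datetime import datetime
from typing import Dict, List, Optional


def parse_ts(value: str) -> datetime:
    """Parse an ISO 8601 timestamp; a trailing 'Z' is read as UTC (assumed format)."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def bid_stats(bid_times: List[str]) -> Dict[str, Optional[object]]:
    """Derive first_bid_time, last_bid_time and bid_velocity (bids per hour)."""
    if not bid_times:
        return {"first_bid_time": None, "last_bid_time": None, "bid_velocity": None}

    # Sort chronologically; the API's return order is not assumed.
    ordered = sorted(bid_times, key=parse_ts)
    first, last = ordered[0], ordered[-1]
    hours = (parse_ts(last) - parse_ts(first)).total_seconds() / 3600

    # Assumption: velocity is undefined when all bids share one timestamp.
    velocity = round(len(ordered) / hours, 2) if hours > 0 else None
    return {"first_bid_time": first, "last_bid_time": last, "bid_velocity": velocity}


stats = bid_stats([
    "2024-05-01T12:00:00Z",
    "2024-05-01T10:00:00Z",
    "2024-05-01T11:00:00Z",
    "2024-05-01T10:30:00Z",
])
print(stats["bid_velocity"])  # 4 bids over 2 hours -> 2.0
```

Rounding to two decimals matches the `DECIMAL(5,2)` column, and returning `None` for a single bid avoids a division by zero; a different convention (for example, measuring from the lot's start date) may fit the data better.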