scaev/docs/COMPREHENSIVE_UPDATE_PLAN.md
2025-12-07 07:09:16 +01:00

Comprehensive Data Enrichment Plan

Current Status: Working Features

  • Image downloads (concurrent)
  • Basic bid data (current_bid, starting_bid, minimum_bid, bid_count, closing_time)
  • Status extraction
  • Brand/Model from attributes
  • Attributes JSON storage

Phase 1: Core Bidding Intelligence (HIGH PRIORITY)

Data Sources Identified:

  1. GraphQL lot bidding API - Already integrated

    • currentBidAmount, initialAmount, bidsCount
    • startDate, endDate (for first_bid_time calculation)
  2. REST bid history API (new discovery)

    • Endpoint: https://shared-api.tbauctions.com/bidmanagement/lots/{lot_uuid}/bidding-history
    • Returns: bid amounts, timestamps, autobid flags, bidder IDs
    • Pagination supported
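
A minimal sketch of how the paginated endpoint could be walked in Python. Only the URL comes from the findings above. The query parameters (`pageNumber`, `pageSize`) and the response keys (`results`, `hasNext`) are assumptions and must be checked against a real response before use. The HTTP call is injectable so the paging logic can be exercised without network access.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterator

BASE_URL = "https://shared-api.tbauctions.com/bidmanagement/lots/{lot_uuid}/bidding-history"


def http_get_json(url: str) -> dict:
    """Default fetcher: GET the URL and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def fetch_bid_history(
    lot_uuid: str,
    get_json: Callable[[str], dict] = http_get_json,
    page_size: int = 100,
    max_pages: int = 1000,
) -> Iterator[dict]:
    """Yield every bid record for a lot, page by page.

    ASSUMPTIONS (not confirmed by the plan): the query parameters
    pageNumber/pageSize and the response keys results/hasNext.
    """
    base = BASE_URL.format(lot_uuid=urllib.parse.quote(lot_uuid, safe=""))
    for page in range(1, max_pages + 1):  # max_pages guards against endless paging
        query = urllib.parse.urlencode({"pageNumber": page, "pageSize": page_size})
        payload = get_json(f"{base}?{query}")
        bids = payload.get("results", [])
        yield from bids
        # Stop on an empty page or when the API reports no further pages.
        if not bids or not payload.get("hasNext", False):
            break
```

Calling `list(fetch_bid_history(lot_uuid))` would collect all pages for one lot. Passing a stub as `get_json` keeps unit tests offline.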

Database Schema Changes:

-- Extend lots table with bidding intelligence
ALTER TABLE lots ADD COLUMN estimated_min DECIMAL(12,2);
ALTER TABLE lots ADD COLUMN estimated_max DECIMAL(12,2);
ALTER TABLE lots ADD COLUMN reserve_price DECIMAL(12,2);
ALTER TABLE lots ADD COLUMN reserve_met BOOLEAN DEFAULT FALSE;
ALTER TABLE lots ADD COLUMN bid_increment DECIMAL(12,2);
ALTER TABLE lots ADD COLUMN watch_count INTEGER DEFAULT 0;
ALTER TABLE lots ADD COLUMN first_bid_time TEXT;
ALTER TABLE lots ADD COLUMN last_bid_time TEXT;
ALTER TABLE lots ADD COLUMN bid_velocity DECIMAL(5,2);

-- NEW: Bid history table
CREATE TABLE IF NOT EXISTS bid_history (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    lot_id TEXT NOT NULL,
    lot_uuid TEXT NOT NULL,
    bid_amount DECIMAL(12,2) NOT NULL,
    bid_time TEXT NOT NULL,
    is_winning BOOLEAN DEFAULT FALSE,
    is_autobid BOOLEAN DEFAULT FALSE,
    bidder_id TEXT,
    bidder_number INTEGER,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (lot_id) REFERENCES lots(lot_id)
);

CREATE INDEX IF NOT EXISTS idx_bid_history_lot_time ON bid_history(lot_id, bid_time);
CREATE INDEX IF NOT EXISTS idx_bid_history_bidder ON bid_history(bidder_id);
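
One practical note on applying the changes above: the CREATE statements are guarded with IF NOT EXISTS, but SQLite has no equivalent guard for ALTER TABLE ... ADD COLUMN, so re-running the migration would fail on the second pass. The sketch below assumes the database is SQLite (suggested by AUTOINCREMENT and TEXT timestamps, but not stated in the plan) and adds only the columns that are missing. The helper name `add_missing_columns` and the abbreviated column map are illustrative.

```python
import sqlite3

# Subset of the Phase 1 columns; extend with the remaining ones as needed.
LOT_COLUMNS = {
    "estimated_min": "DECIMAL(12,2)",
    "estimated_max": "DECIMAL(12,2)",
    "reserve_met": "BOOLEAN DEFAULT FALSE",
    "watch_count": "INTEGER DEFAULT 0",
    "bid_velocity": "DECIMAL(5,2)",
}


def add_missing_columns(conn: sqlite3.Connection, table: str, columns: dict) -> list:
    """Add each column that the table does not have yet; return the names added.

    Table and column names are interpolated into SQL, so they must come from
    trusted code (as here), never from user input.
    """
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    added = []
    for name, declaration in columns.items():
        if name not in existing:
            conn.execute(f"ALTER TABLE {table} ADD COLUMN {name} {declaration}")
            added.append(name)
    conn.commit()
    return added
```

Running the helper twice is safe: the second call finds every column present and returns an empty list.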

Implementation:

  • Add fetch_bid_history() function to call REST API
  • Parse and store all historical bids
  • Calculate bid_velocity (bids per hour)
  • Extract first_bid_time, last_bid_time
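
The derived fields could be computed along these lines. The plan only defines bid_velocity as "bids per hour"; this sketch assumes the window runs from the first to the last bid, which is one possible reading (measuring from the auction start date would be another). It also assumes the API returns ISO 8601 timestamps. The function names are illustrative.

```python
from datetime import datetime, timezone
from typing import Iterable


def parse_ts(value: str) -> datetime:
    """Parse an ISO 8601 timestamp; a trailing 'Z' or a missing offset means UTC."""
    dt = datetime.fromisoformat(value.replace("Z", "+00:00"))
    return dt if dt.tzinfo else dt.replace(tzinfo=timezone.utc)


def summarize_bids(bid_times: Iterable[str]) -> dict:
    """Derive first_bid_time, last_bid_time and bid_velocity from bid timestamps."""
    times = sorted(parse_ts(t) for t in bid_times)
    if not times:
        return {"first_bid_time": None, "last_bid_time": None, "bid_velocity": None}
    first, last = times[0], times[-1]
    hours = (last - first).total_seconds() / 3600
    # ASSUMPTION: velocity = number of bids / hours between first and last bid.
    # With a single bid (or identical timestamps) the rate is undefined.
    velocity = round(len(times) / hours, 2) if hours > 0 else None
    return {
        "first_bid_time": first.isoformat(),
        "last_bid_time": last.isoformat(),
        "bid_velocity": velocity,
    }
```

Three bids spread over two hours give a velocity of 1.5. Lots with a single bid get NULL rather than a misleading rate.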

Phase 2: Valuation Intelligence

Data Sources:

  1. Attributes array (already in NEXT_DATA)

    • condition, year, manufacturer, model, serial_number
  2. Description field

    • Extract year patterns, condition mentions, damage descriptions

Database Schema:

-- Valuation fields
ALTER TABLE lots ADD COLUMN condition_score DECIMAL(4,2);  -- 0-10 scale, so 10.00 must fit
ALTER TABLE lots ADD COLUMN condition_description TEXT;
ALTER TABLE lots ADD COLUMN year_manufactured INTEGER;
ALTER TABLE lots ADD COLUMN serial_number TEXT;
ALTER TABLE lots ADD COLUMN manufacturer TEXT;
ALTER TABLE lots ADD COLUMN damage_description TEXT;
ALTER TABLE lots ADD COLUMN provenance TEXT;

Implementation:

  • Parse attributes for: Jaar, Conditie, Serienummer, Fabrikant
  • Extract 4-digit years from title/description
  • Map condition values to 0-10 scale
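
A rough sketch of these three steps. The Dutch attribute labels come from the list above. The attribute shape (a list of `{"name": ..., "value": ...}` objects) is an assumption about the page data, and the condition values and scores in `CONDITION_SCORES` are purely hypothetical placeholders to be replaced once real values have been surveyed.

```python
import re
from datetime import date
from typing import Optional

# Attribute label (lower-cased) -> lots column.
ATTRIBUTE_FIELDS = {
    "jaar": "year_manufactured",
    "conditie": "condition_description",
    "serienummer": "serial_number",
    "fabrikant": "manufacturer",
}

# HYPOTHETICAL mapping onto the 0-10 scale; real condition values may differ.
CONDITION_SCORES = {"nieuw": 10.0, "goed": 7.0, "gebruikt": 5.0, "defect": 1.0}

YEAR_RE = re.compile(r"\b(19[5-9]\d|20\d{2})\b")


def extract_year(text: str, max_year: Optional[int] = None) -> Optional[int]:
    """Return the first standalone 4-digit year from 1950 up to max_year."""
    limit = max_year if max_year is not None else date.today().year + 1
    for match in YEAR_RE.finditer(text):
        year = int(match.group(1))
        if year <= limit:
            return year
    return None


def enrich_lot(attributes: list, title: str = "", description: str = "") -> dict:
    """Build the valuation fields for one lot from attributes and free text."""
    fields = {}
    for attr in attributes:
        column = ATTRIBUTE_FIELDS.get(str(attr.get("name", "")).strip().lower())
        if column and attr.get("value") not in (None, ""):
            fields[column] = str(attr["value"]).strip()
    # Prefer the explicit attribute; fall back to title and description text.
    year = extract_year(fields.get("year_manufactured", "")) or extract_year(
        f"{title} {description}"
    )
    if year is not None:
        fields["year_manufactured"] = year
    else:
        fields.pop("year_manufactured", None)
    condition = fields.get("condition_description", "").lower()
    if condition in CONDITION_SCORES:
        fields["condition_score"] = CONDITION_SCORES[condition]
    return fields
```

The free-text fallback will produce false positives for numbers that merely look like years (for example a weight of "2000 kg"), which is one reason the expected coverage for year_manufactured should be treated as approximate.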

Phase 3: Auction House Intelligence

Data Sources:

  1. GraphQL auction query

    • Already partially working
  2. Auction NEXT_DATA

    • May contain buyer's premium, shipping costs

Database Schema:

ALTER TABLE auctions ADD COLUMN buyers_premium_percent DECIMAL(5,2);
ALTER TABLE auctions ADD COLUMN shipping_available BOOLEAN;
ALTER TABLE auctions ADD COLUMN payment_methods TEXT;

Viewing/Pickup Times Resolution

Finding:

  • viewingDays and collectionDays in GraphQL only return location (city, countryCode)
  • Times are NOT in the GraphQL API
  • Any times that exist must therefore come from the auction NEXT_DATA, and many auctions do not set them at all

Solution:

  • Mark viewing_time/pickup_date as "location only" when times unavailable
  • Store: "Nijmegen, NL" instead of full date/time string
  • Accept that many auctions don't have viewing times set
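
The fallback could look like the following sketch. It assumes each viewingDays/collectionDays entry is a dict with `city` and `countryCode` keys, matching the finding above, and that any time string found elsewhere is passed in as `time_text`. Both function names are illustrative.

```python
from typing import Optional


def location_only(day: dict) -> Optional[str]:
    """Format one viewing/collection day as 'City, CC', or None if empty."""
    parts = [day.get("city"), day.get("countryCode")]
    text = ", ".join(p.strip() for p in parts if p and p.strip())
    return text or None


def viewing_value(days: Optional[list], time_text: Optional[str] = None) -> Optional[str]:
    """Prefer a real date/time string; otherwise fall back to the first location."""
    if time_text and time_text.strip():
        return time_text.strip()
    for day in days or []:
        location = location_only(day)
        if location:
            return location
    return None  # neither times nor a location are available
```

For example, `viewing_value([{"city": "Nijmegen", "countryCode": "NL"}])` yields `"Nijmegen, NL"`, while an auction with no days at all stores NULL.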

Priority Implementation Order:

  1. BID HISTORY API (30 min) - Highest value

    • Fetch and store all bid history
    • Calculate bid_velocity
    • Track autobid patterns
  2. ENRICHED ATTRIBUTES (20 min) - Medium-high value

    • Extract year, condition, manufacturer from existing data
    • Parse description for damage/condition mentions
  3. VIEWING/PICKUP FIX (10 min) - Low value (data often missing)

    • Update to store location-only when times unavailable

Data Quality Expectations:

| Field | Expected coverage | Source |
| --- | --- | --- |
| bid_history | 100% (for lots with bids) | REST API |
| bid_velocity | 100% (calculated) | Derived |
| year_manufactured | ~40% | Attributes/Title |
| condition_score | ~30% | Attributes |
| manufacturer | ~60% | Attributes |
| viewing_time | ~20% | Often not set |
| buyers_premium | 100% | GraphQL/Props |
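
Once the enrichment has run, the expectations above could be checked against actual coverage with a query like the one sketched here. It assumes a SQLite database and counts the share of rows where each column is non-NULL. The helper name `field_coverage` is illustrative.

```python
import sqlite3


def field_coverage(conn: sqlite3.Connection, table: str, columns: list) -> dict:
    """Return the percentage of rows with a non-NULL value for each column.

    Table and column names are interpolated into SQL, so they must come from
    trusted code, never from user input.
    """
    total = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    if total == 0:
        return {column: 0.0 for column in columns}
    # COUNT(column) counts only the rows where that column is not NULL.
    selects = ", ".join(f"COUNT({column})" for column in columns)
    counts = conn.execute(f"SELECT {selects} FROM {table}").fetchone()
    return {column: round(100.0 * n / total, 1) for column, n in zip(columns, counts)}
```

Calling `field_coverage(conn, "lots", ["year_manufactured", "manufacturer"])` returns a dict of percentages that can be compared directly with the expected figures.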

Estimated Total Implementation Time: 60-90 minutes