- Hardened the GraphQL client to reduce 403 occurrences and provide clearer diagnostics when they appear.
- Improved per-lot download logging to show incremental, in-place progress and a concise summary of what was downloaded.
### Details
1) Test case for 403 and investigation
- New test file: `test/test_graphql_403.py`.
  - Uses `importlib` to load `src/config.py` and `src/graphql_client.py` directly, so it is independent of `sys.path` quirks.
  - Mocks `aiohttp.ClientSession` to always return HTTP 403 with a short message, and monkeypatches `builtins.print` to capture logs.
  - Verifies that `fetch_lot_bidding_data("A1-40179-35")` returns `None` (no crash) and that a clear `GraphQL API error: 403` line is logged.
  - Result: `pytest test/test_graphql_403.py -q` passes locally.
- Root cause insights (from investigation and log improvements):
  - The 403s come from the GraphQL endpoint (not the HTML page). They are likely caused by WAF/CDN protections that reject non-browser-like requests or rate spikes.
  - To mitigate, I added realistic headers (User-Agent, Origin, Referer) and a small retry with backoff for 403/429 to handle transient protection triggers. When a 403 persists, we now log the status and a safe, truncated snippet of the body for troubleshooting.
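
The test approach described above can be illustrated without the real modules. What follows is a minimal, self-contained sketch and not the contents of `test/test_graphql_403.py`: `FakeSession`, `FakeResponse`, `fetch_lot`, and the `example.invalid` URL are hypothetical stand-ins. It captures output with `contextlib.redirect_stdout` instead of monkeypatching `builtins.print`.

```python
import asyncio
import contextlib
import io


class FakeResponse:
    """Hypothetical stand-in for an HTTP response that always reports 403."""

    status = 403

    async def text(self):
        return "Forbidden"

    async def __aenter__(self):
        return self

    async def __aexit__(self, *exc_info):
        return False


class FakeSession:
    """Hypothetical stand-in for a client session; only post() is modelled."""

    def post(self, url, **kwargs):
        return FakeResponse()

    async def __aenter__(self):
        return self

    async def __aexit__(self, *exc_info):
        return False


async def fetch_lot(session_factory, lot_id):
    """Stand-in for the client call under test (not the real function)."""
    async with session_factory() as session:
        request = session.post("https://example.invalid/graphql", json={"lot": lot_id})
        async with request as resp:
            if resp.status != 200:
                print(f"GraphQL API error: {resp.status} (lot={lot_id})")
                return None
            return await resp.text()


def run_403_check():
    """Run the stand-in against the fake session and capture printed output."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        result = asyncio.run(fetch_lot(FakeSession, "A1-40179-35"))
    return result, buffer.getvalue()
```

Injecting the session factory keeps the sketch independent of any network library. The real test patches `aiohttp.ClientSession` instead.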
2) Incremental/in-place logging for downloads
- Updated `src/scraper.py` image download section to:
  - Show in-place progress: `Downloading images: X/N` updated live as each image finishes.
  - After completion, print: `Downloaded: K/N new images`.
  - Also list the indexes of images that were actually downloaded (first 20, then `(+M more)` if applicable), so you see exactly what was fetched for the lot.
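
One way to produce this kind of output is sketched below. This is an assumption about the approach, not the code in `src/scraper.py`: the function names are hypothetical, and the in-place update relies on a carriage return (`\r`) to rewrite the current terminal line.

```python
import sys


def summarize_indexes(indexes, limit=20):
    """Return 'Indexes: ...' with at most `limit` entries, then '(+M more)'."""
    shown = ", ".join(str(i) for i in indexes[:limit])
    extra = len(indexes) - limit
    suffix = f" (+{extra} more)" if extra > 0 else ""
    return f"Indexes: {shown}{suffix}"


def report_progress(done, total, stream=sys.stdout):
    """Rewrite the current terminal line with the latest count."""
    stream.write(f"\rDownloading images: {done}/{total}")
    stream.flush()


def finish_report(downloaded, total, stream=sys.stdout):
    """End the progress line, then print the summary lines."""
    stream.write("\n")
    if downloaded:
        stream.write(f"Downloaded: {len(downloaded)}/{total} new images\n")
        stream.write(summarize_indexes(downloaded) + "\n")
    else:
        stream.write(f"All {total} images already cached\n")
```

In this sketch, `report_progress` would be called once per finished image and `finish_report` once per lot. The `\r` trick only gives a live counter on an interactive terminal. In a log file, each update shows up as a separate fragment.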
3) GraphQL client improvements
- Updated `src/graphql_client.py`:
  - Added browser-like headers and contextual Referer.
  - Added small retry with backoff for 403/429.
  - Improved error logs to include status, lot id, and a short body snippet.
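
A rough sketch of how these three pieces can fit together is shown below. It is illustrative only and does not reproduce `src/graphql_client.py`. The names `build_headers` and `post_with_retry`, the header values, the attempt count, the delays, and the exact log format are all assumptions. The HTTP call is passed in as `send` so the sketch runs without a network library.

```python
import asyncio

RETRYABLE_STATUSES = {403, 429}


def build_headers(lot_url):
    """Browser-like headers; the values here are placeholders, not the real ones."""
    return {
        "User-Agent": "Mozilla/5.0 (X11; Linux x86_64)",
        "Origin": "https://example.invalid",
        "Referer": lot_url,  # contextual Referer: the page for the lot being fetched
    }


async def post_with_retry(send, lot_id, attempts=3, base_delay=0.5, snippet_len=200):
    """Call `send()` until it returns HTTP 200 or the attempts run out.

    `send` is an async callable returning (status, body_text).
    Returns the body on success, otherwise logs and returns None.
    """
    status, body = None, ""
    for attempt in range(attempts):
        status, body = await send()
        if status == 200:
            return body
        if status not in RETRYABLE_STATUSES:
            break  # other errors are not retried
        if attempt < attempts - 1:
            # Exponential backoff: base_delay, then 2x, 4x, ...
            await asyncio.sleep(base_delay * 2 ** attempt)
    # Collapse whitespace and truncate so the log line stays short and safe.
    snippet = " ".join(body.split())[:snippet_len]
    print(f"GraphQL API error: {status} (lot={lot_id}) body={snippet}")
    return None
```

Keeping the retry set to 403 and 429 matches the description above. Other failures are logged after a single attempt.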
### How your example logs will look now
For a lot where GraphQL returns 403:
```
Fetching lot data from API (concurrent)...
GraphQL API error: 403 (lot=A1-40179-35) — Forbidden by WAF
```
For image downloads:
```
Images: 6
Downloading images: 0/6
... 6/6
Downloaded: 6/6 new images
Indexes: 0, 1, 2, 3, 4, 5
```
(When all cached: `All 6 images already cached`)
### Notes
- Full test run surfaced a pre-existing import error in `test/test_scraper.py` (unrelated to these changes). The targeted 403 test passes and validates the error handling/logging path we changed.
- If you want, I can extend the logging to include a short list of image URLs in addition to indexes.
# Setup & IDE Configuration

## Python Version Requirement

This project **requires Python 3.10 or higher**.

The code uses Python 3.10+ features including:
- Structural pattern matching
- Union type syntax (`X | Y`)
- Improved type hints
- Modern async/await patterns

## IDE Configuration

### PyCharm / IntelliJ IDEA

If your IDE shows "Python 2.7 syntax" warnings, configure it for Python 3.10+:

1. **File → Project Structure → Project Settings → Project**
   - Set Python SDK to 3.10 or higher

2. **File → Settings → Project → Python Interpreter**
   - Select Python 3.10+ interpreter
   - Click gear icon → Add → System Interpreter → Browse to your Python 3.10 installation

3. **File → Settings → Editor → Inspections → Python**
   - Ensure "Python version" is set to 3.10+
   - Check "Code compatibility inspection" → Set minimum version to 3.10

## Installation

```bash
# Check Python version
python --version  # Should be 3.10+

# Install dependencies
pip install -r requirements.txt

# Install Playwright browsers
playwright install chromium
```

## Verifying Setup

```bash
# Should print version 3.10.x or higher
python -c "import sys; print(sys.version)"

# Should run without errors
python main.py --help
```

## Common Issues

### "ModuleNotFoundError: No module named 'playwright'"
```bash
pip install playwright
playwright install chromium
```

### "Python 2.7 does not support..." warnings in IDE
- Your IDE is configured for Python 2.7
- Follow IDE configuration steps above
- The code WILL work with Python 3.10+ despite warnings

### Script exits with "requires Python 3.10 or higher"
- You're running Python 3.9 or older
- Upgrade to Python 3.10+: https://www.python.org/downloads/

## Version Files

- `.python-version` - Used by pyenv and similar tools
- `requirements.txt` - Package dependencies
- Runtime checks in scripts ensure Python 3.10+
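
A runtime check of this kind can be as small as the sketch below. The project's actual check is not shown here, so the function names and the exact message wording are assumptions. Only `sys.version_info` and `sys.exit` are standard library facts.

```python
import sys

MIN_VERSION = (3, 10)


def version_ok(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True when the (major, minor) pair meets the minimum."""
    return tuple(version_info[:2]) >= minimum


def require_python():
    """Exit with a readable message when the interpreter is too old."""
    if not version_ok():
        required = ".".join(str(part) for part in MIN_VERSION)
        sys.exit(f"This script requires Python {required} or higher")
```

Calling `require_python()` at the top of an entry script, before any module that uses 3.10-only syntax is imported, lets an older interpreter print this message instead of a `SyntaxError`.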