This commit is contained in:
2025-11-26 13:05:04 +01:00
parent 2bc4b21862
commit 47854d8b39
14 changed files with 1717 additions and 12 deletions

191
QUICKSTART.md Normal file

@@ -0,0 +1,191 @@
# Quick Start Guide
Get the scraper running in minutes without downloading YOLO models!
## Minimal Setup (No Object Detection)
The scraper works perfectly fine **without** YOLO object detection. You can run it immediately and add object detection later if needed.
### Step 1: Run the Scraper
```bash
# Using Maven
mvn clean compile exec:java -Dexec.mainClass="com.auction.scraper.TroostwijkScraper"
```
Or in IntelliJ IDEA:
1. Open `TroostwijkScraper.java`
2. Right-click on the `main` method
3. Select "Run 'TroostwijkScraper.main()'"
### What You'll See
```
=== Troostwijk Auction Scraper ===
Initializing scraper...
⚠️ Object detection disabled: YOLO model files not found
Expected files:
- models/yolov4.cfg
- models/yolov4.weights
- models/coco.names
Scraper will continue without image analysis.
[1/3] Discovering Dutch auctions...
✓ Found 5 auctions: [12345, 12346, 12347, 12348, 12349]
[2/3] Fetching lot details...
Processing sale 12345...
[3/3] Starting monitoring service...
✓ Monitoring active. Press Ctrl+C to stop.
```
### Step 2: Test Desktop Notifications
The scraper will automatically send desktop notifications when:
- A new bid is placed on a monitored lot
- An auction is closing within 5 minutes
**No setup required** - desktop notifications work out of the box!
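
For readers curious how a zero-setup desktop notification can work in plain Java, here is a minimal sketch using the JDK's `java.awt.SystemTray` API. It is an assumption that the scraper uses this API; the class name `DesktopNotifySketch` and its methods are hypothetical and not taken from the project.

```java
import java.awt.AWTException;
import java.awt.GraphicsEnvironment;
import java.awt.SystemTray;
import java.awt.TrayIcon;
import java.awt.image.BufferedImage;

// Hypothetical helper, not part of the scraper's code base.
final class DesktopNotifySketch {

    /** Returns true if a tray notification can be shown on this machine. */
    static boolean isSupported() {
        // Headless JVMs (servers, CI) have no system tray at all.
        return !GraphicsEnvironment.isHeadless() && SystemTray.isSupported();
    }

    /** Shows a notification and returns true, or returns false if that is not possible. */
    static boolean notifyUser(String title, String message) {
        if (!isSupported()) {
            return false;
        }
        // A tray icon needs an image; a blank 16x16 one is enough for a sketch.
        TrayIcon icon = new TrayIcon(
                new BufferedImage(16, 16, BufferedImage.TYPE_INT_ARGB), "Scraper");
        try {
            SystemTray.getSystemTray().add(icon);
        } catch (AWTException e) {
            return false; // the desktop refused to add the icon
        }
        icon.displayMessage(title, message, TrayIcon.MessageType.INFO);
        return true;
    }
}
```

Calling `DesktopNotifySketch.notifyUser("Kavel bieding update", "Nieuw bod op kavel 12345")` returns `false` on a headless machine instead of throwing, which mirrors the "not headless" requirement in the troubleshooting section.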
---
## Optional: Add Email Notifications
If you want email notifications in addition to desktop notifications:
```bash
# Set environment variable
export NOTIFICATION_CONFIG="smtp:your.email@gmail.com:app_password:your.email@gmail.com"
# Then run the scraper
mvn exec:java -Dexec.mainClass="com.auction.scraper.TroostwijkScraper"
```
**Get Gmail App Password:**
1. Enable 2FA in Google Account
2. Go to: Google Account → Security → 2-Step Verification → App passwords
3. Generate password for "Mail"
4. Use that password (not your regular Gmail password)
---
## Optional: Add Object Detection Later
If you want AI-powered image analysis to detect objects in auction photos:
### 1. Create models directory
```bash
mkdir models
cd models
```
### 2. Download YOLO files
```bash
# YOLOv4 config (small)
curl -O https://raw.githubusercontent.com/AlexeyAB/darknet/master/cfg/yolov4.cfg
# YOLOv4 weights (245 MB - takes a few minutes)
curl -LO https://github.com/AlexeyAB/darknet/releases/download/darknet_yolo_v3_optimal/yolov4.weights
# COCO class names
curl -O https://raw.githubusercontent.com/AlexeyAB/darknet/master/data/coco.names
```
### 3. Run again
```bash
mvn exec:java -Dexec.mainClass="com.auction.scraper.TroostwijkScraper"
```
Now you'll see:
```
✓ Object detection enabled with YOLO
```
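The scraper will now analyze auction images and label objects from the 80 classes listed in `coco.names`, for example:
- Vehicles (car, truck, bus, motorbike)
- Furniture (chair, sofa, bed, diningtable)
- Electronics (laptop, tvmonitor, keyboard, cell phone)
- Kitchen appliances (microwave, oven, refrigerator)

Objects outside those 80 classes (such as forklifts or industrial machines) are not recognized by the stock COCO-trained YOLOv4 model.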
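
A YOLO network reports numeric class ids, and `coco.names` turns those ids into labels: the label on line *n* of the file (counting from zero) belongs to class id *n*. The sketch below illustrates that lookup. `ClassNamesSketch` and its method names are hypothetical, and the short list in `main` is only the first eight entries of the file.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper illustrating the id-to-label lookup.
final class ClassNamesSketch {

    /** Reads one label per line; the zero-based line index is the class id. */
    static List<String> load(Path namesFile) throws IOException {
        return Files.readAllLines(namesFile, StandardCharsets.UTF_8);
    }

    /** Maps a class id to its label, or "unknown" if the id is out of range. */
    static String labelFor(List<String> names, int classId) {
        return (classId >= 0 && classId < names.size()) ? names.get(classId) : "unknown";
    }

    public static void main(String[] args) {
        // First eight entries of coco.names, in file order.
        List<String> names = List.of("person", "bicycle", "car", "motorbike",
                "aeroplane", "bus", "train", "truck");
        System.out.println(labelFor(names, 2)); // id 2 is the third line of the file
    }
}
```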
---
## Features Without Object Detection
Even without YOLO, the scraper provides:
- **Full auction scraping** - Discovers all Dutch auctions
- **Lot tracking** - Monitors bids and closing times
- **Desktop notifications** - Real-time alerts
- **SQLite database** - All data persisted locally
- **Image downloading** - Saves all lot images
- **Scheduled monitoring** - Automatic updates every hour
Object detection simply adds:
- AI-powered image analysis
- Automatic object labeling
- Searchable image database
---
## Database Location
The scraper creates `troostwijk.db` in your current directory with:
- All auction data
- Lot details (title, description, bids, etc.)
- Downloaded image paths
- Object labels (if detection enabled)
View the database with any SQLite browser:
```bash
sqlite3 troostwijk.db
.tables
SELECT * FROM lots LIMIT 5;
```
---
## Stopping the Scraper
Press **Ctrl+C** to stop the monitoring service.
---
## Next Steps
1. **Run the scraper** without YOLO to test it
2. **Verify desktop notifications** work
3. ⚙️ **Optional**: Add email notifications
4. ⚙️ **Optional**: Download YOLO models for object detection
5. 🔧 **Customize**: Edit monitoring frequency, closing alerts, etc.
---
## Troubleshooting
### Desktop notifications not appearing?
- **Windows**: Check if Java has notification permissions
- **Linux**: Ensure desktop environment is running (not headless)
- **macOS**: Check System Preferences → Notifications
### OpenCV warnings?
These are normal and can be ignored:
```
WARNING: A restricted method in java.lang.System has been called
WARNING: Use --enable-native-access=ALL-UNNAMED to avoid warning
```
The scraper works fine despite these warnings.
---
## Full Documentation
See [README.md](README.md) for complete documentation including:
- Email setup details
- YOLO installation guide
- Configuration options
- Database schema
- API endpoints

233
README.md Normal file

@@ -0,0 +1,233 @@
# Troostwijk Auction Scraper
A Java-based web scraper for Dutch auctions on Troostwijk Auctions with **100% free** desktop/email notifications, SQLite persistence, and AI-powered object detection.
## Features
- **Auction Discovery**: Automatically discovers active Dutch auctions
- **Data Scraping**: Fetches detailed lot information via Troostwijk's JSON API
- **SQLite Storage**: Persists auction data, lots, images, and detected objects
- **Image Processing**: Downloads and analyzes lot images using OpenCV YOLO object detection
- **Free Notifications**: Real-time notifications when bids change on monitored lots or an auction is closing soon (within 5 minutes), delivered:
  - Via desktop notifications (Windows/macOS/Linux system tray) ✅
  - Optionally via email (Gmail SMTP - free) ✅
## Dependencies
All dependencies are managed via Maven (see `pom.xml`):
- **jsoup 1.17.2** - HTML parsing and HTTP client
- **Jackson 2.17.0** - JSON processing
- **SQLite JDBC 3.45.1.0** - Database operations
- **JavaMail 1.6.2** - Email notifications (free)
- **OpenCV 4.9.0** - Image processing and object detection
## Setup
### 1. Notification Options (Choose One)
#### Option A: Desktop Notifications Only ⭐ (Recommended - Zero Setup)
Desktop notifications work out of the box on:
- **Windows**: System tray notifications
- **macOS**: Notification Center
- **Linux**: Desktop environment notifications (GNOME, KDE, etc.)
**No configuration required!** Just run with default settings:
```bash
export NOTIFICATION_CONFIG="desktop"
# Or simply don't set it - desktop is the default
```
#### Option B: Desktop + Email Notifications 📧 (Free Gmail)
1. Enable 2-Factor Authentication in your Google Account
2. Go to: **Google Account → Security → 2-Step Verification → App passwords**
3. Generate an app password for "Mail"
4. Set environment variable:
```bash
export NOTIFICATION_CONFIG="smtp:your.email@gmail.com:your_app_password:recipient@example.com"
```
**Format**: `smtp:username:app_password:recipient_email`
**Example**:
```bash
export NOTIFICATION_CONFIG="smtp:john.doe@gmail.com:abcd1234efgh5678:john.doe@gmail.com"
```
**Note**: This is completely free using Gmail's SMTP server. No paid services required!
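
To make the `NOTIFICATION_CONFIG` format concrete, the following sketch parses the two documented forms (`desktop`, or `smtp:username:app_password:recipient_email`). It is not the scraper's actual parser; the class `NotificationConfigSketch` and its field names are hypothetical, and it assumes none of the four parts contains a colon.

```java
// Hypothetical parser for the documented NOTIFICATION_CONFIG formats.
final class NotificationConfigSketch {

    final boolean emailEnabled;
    final String username;
    final String appPassword;
    final String recipient;

    private NotificationConfigSketch(boolean emailEnabled, String username,
                                     String appPassword, String recipient) {
        this.emailEnabled = emailEnabled;
        this.username = username;
        this.appPassword = appPassword;
        this.recipient = recipient;
    }

    static NotificationConfigSketch parse(String raw) {
        // Unset or "desktop" means desktop notifications only (the documented default).
        if (raw == null || raw.isBlank() || raw.equals("desktop")) {
            return new NotificationConfigSketch(false, null, null, null);
        }
        // Assumption: no field contains ':' (a colon in the password would break this).
        String[] parts = raw.split(":", -1);
        if (parts.length != 4 || !parts[0].equals("smtp")) {
            throw new IllegalArgumentException(
                    "Expected 'desktop' or 'smtp:username:app_password:recipient_email'");
        }
        return new NotificationConfigSketch(true, parts[1], parts[2], parts[3]);
    }

    public static void main(String[] args) {
        NotificationConfigSketch cfg = parse(System.getenv("NOTIFICATION_CONFIG"));
        System.out.println(cfg.emailEnabled ? "Desktop + email to " + cfg.recipient
                                            : "Desktop notifications only");
    }
}
```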
### 2. OpenCV Native Libraries
Download and install OpenCV native libraries for your platform:
**Windows:**
```bash
# Download from https://opencv.org/releases/
# Extract and add to PATH or use:
java -Djava.library.path="C:\opencv\build\java\x64" -jar scraper.jar
```
**Linux:**
```bash
sudo apt-get install libopencv-dev
```
**macOS:**
```bash
brew install opencv
```
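
If the native library is missing or not on `java.library.path`, `System.loadLibrary` throws `UnsatisfiedLinkError`. The sketch below shows one way to turn that into a readable message. `NativeLoadSketch` is a hypothetical helper; the scraper itself passes `Core.NATIVE_LIBRARY_NAME`, and the default name `opencv_java490` used here is an assumption about what that constant resolves to for OpenCV 4.9.0.

```java
// Hypothetical helper: load a native library and report failure instead of crashing.
final class NativeLoadSketch {

    /** Returns true if the library was loaded, false if it could not be found. */
    static boolean tryLoad(String libraryName) {
        try {
            System.loadLibrary(libraryName);
            return true;
        } catch (UnsatisfiedLinkError e) {
            System.err.println("Native library '" + libraryName
                    + "' not found on java.library.path: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed library name; pass a different one as the first argument.
        String name = args.length > 0 ? args[0] : "opencv_java490";
        System.out.println(tryLoad(name) ? "OpenCV natives loaded" : "OpenCV natives missing");
    }
}
```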
### 3. YOLO Model Files
Download YOLO model files for object detection:
```bash
mkdir models
cd models
# Download YOLOv4 config
wget https://raw.githubusercontent.com/AlexeyAB/darknet/master/cfg/yolov4.cfg
# Download YOLOv4 weights (245 MB)
wget https://github.com/AlexeyAB/darknet/releases/download/darknet_yolo_v3_optimal/yolov4.weights
# Download COCO class names
wget https://raw.githubusercontent.com/AlexeyAB/darknet/master/data/coco.names
```
## Building
```bash
mvn clean package
```
This creates:
- `target/troostwijk-scraper-1.0-SNAPSHOT.jar` - Regular JAR
- `target/troostwijk-scraper-1.0-SNAPSHOT-jar-with-dependencies.jar` - Executable JAR with all dependencies
## Running
### Quick Start (Desktop Notifications Only)
```bash
java -Djava.library.path="/path/to/opencv/lib" \
-jar target/troostwijk-scraper-1.0-SNAPSHOT-jar-with-dependencies.jar
```
### With Email Notifications
```bash
export NOTIFICATION_CONFIG="smtp:your@gmail.com:app_password:your@gmail.com"
java -Djava.library.path="/path/to/opencv/lib" \
-jar target/troostwijk-scraper-1.0-SNAPSHOT-jar-with-dependencies.jar
```
### Using Maven
```bash
mvn exec:java -Dexec.mainClass="com.auction.scraper.TroostwijkScraper"
```
## Project Structure
```
src/main/java/com/auction/scraper/
├── TroostwijkScraper.java # Main scraper class
│ ├── Lot # Domain model for auction lots
│ ├── DatabaseService # SQLite operations
│ ├── NotificationService # Desktop + Email notifications (FREE)
│ └── ObjectDetectionService # OpenCV YOLO object detection
└── Main.java # Entry point
```
## Configuration
Edit `TroostwijkScraper.main()` to customize:
- **Database file**: `troostwijk.db` (SQLite database location)
- **YOLO paths**: Model configuration and weights files
- **Monitoring frequency**: Default is every 1 hour
- **Closing alerts**: Default is 5 minutes before closing
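
The two timing defaults above can be expressed with `java.time.Duration` and a `ScheduledExecutorService`. This is a sketch of the idea, not the scraper's `scheduleMonitoring()` implementation; `MonitoringSketch` and its members are hypothetical names.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of the monitoring defaults described above.
final class MonitoringSketch {

    static final Duration CHECK_INTERVAL = Duration.ofHours(1);
    static final Duration CLOSING_ALERT_WINDOW = Duration.ofMinutes(5);

    /** True if the lot is still open and closes within the given window. */
    static boolean isClosingSoon(Instant now, Instant closingTime, Duration window) {
        Duration remaining = Duration.between(now, closingTime);
        return !remaining.isNegative() && remaining.compareTo(window) <= 0;
    }

    /** Runs the check immediately and then once per CHECK_INTERVAL. */
    static ScheduledExecutorService start(Runnable check) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(check, 0, CHECK_INTERVAL.toMinutes(), TimeUnit.MINUTES);
        return scheduler;
    }

    public static void main(String[] args) {
        Instant now = Instant.now();
        Instant closing = now.plus(Duration.ofMinutes(3));
        System.out.println("Closing soon? " + isClosingSoon(now, closing, CLOSING_ALERT_WINDOW));
    }
}
```

Note that a check interval much longer than the alert window can miss closings entirely, so anyone tuning these values would presumably want the interval to be no longer than the window.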
## Database Schema
The scraper creates three tables:
**sales**
- `sale_id` (PRIMARY KEY)
- `title`, `location`, `closing_time`
**lots**
- `lot_id` (PRIMARY KEY)
- `sale_id`, `title`, `description`, `manufacturer`, `type`, `year`
- `category`, `current_bid`, `currency`, `url`
- `closing_time`, `closing_notified`
**images**
- `id` (PRIMARY KEY)
- `lot_id`, `url`, `file_path`, `labels` (detected objects)
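
As an illustration, the three tables could be created with DDL along these lines. Table and column names come from the list above; the column types, the `AUTOINCREMENT` key, and the `SchemaSketch` class are assumptions, since the actual `CREATE TABLE` statements are not shown here.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.List;

// Hypothetical DDL; column types are guesses, names follow the schema above.
final class SchemaSketch {

    static final List<String> DDL = List.of(
            "CREATE TABLE IF NOT EXISTS sales ("
                    + "sale_id INTEGER PRIMARY KEY, title TEXT, location TEXT, closing_time TEXT)",
            "CREATE TABLE IF NOT EXISTS lots ("
                    + "lot_id INTEGER PRIMARY KEY, sale_id INTEGER, title TEXT, description TEXT, "
                    + "manufacturer TEXT, type TEXT, year INTEGER, category TEXT, "
                    + "current_bid REAL, currency TEXT, url TEXT, "
                    + "closing_time TEXT, closing_notified INTEGER DEFAULT 0)",
            "CREATE TABLE IF NOT EXISTS images ("
                    + "id INTEGER PRIMARY KEY AUTOINCREMENT, lot_id INTEGER, url TEXT, "
                    + "file_path TEXT, labels TEXT)");

    /** Executes each statement on an open JDBC connection (e.g. jdbc:sqlite:troostwijk.db). */
    static void createTables(Connection conn) throws SQLException {
        try (Statement stmt = conn.createStatement()) {
            for (String ddl : DDL) {
                stmt.executeUpdate(ddl);
            }
        }
    }
}
```

Running `createTables` needs the SQLite JDBC driver from the dependency list on the classpath; the class itself compiles against the JDK alone.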
## Notification Examples
### Desktop Notification
```
🔔 Kavel bieding update
Nieuw bod op kavel 12345: €150.00 (was €125.00)
```
### Email Notification
```
From: your.email@gmail.com
To: your.email@gmail.com
Subject: [Troostwijk] Kavel bieding update
Nieuw bod op kavel 12345: €150.00 (was €125.00)
```
**High Priority Alerts** (closing soon):
```
⚠️ Lot nearing closure
Kavel 12345 sluit binnen 5 min.
```
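
The message bodies shown above can be produced with `String.format`. The sketch below is an assumption about how such strings might be built, not the scraper's code; `BidMessageSketch` is a hypothetical class. `Locale.ROOT` keeps the decimal point stable regardless of the machine's locale.

```java
import java.util.Locale;

// Hypothetical formatter reproducing the example messages above.
final class BidMessageSketch {

    /** Builds the bid-change message; \u20AC is the euro sign. */
    static String bidChanged(int lotId, double newBid, double oldBid) {
        return String.format(Locale.ROOT,
                "Nieuw bod op kavel %d: \u20AC%.2f (was \u20AC%.2f)", lotId, newBid, oldBid);
    }

    /** Builds the closing-soon message. */
    static String closingSoon(int lotId, int minutes) {
        return String.format(Locale.ROOT, "Kavel %d sluit binnen %d min.", lotId, minutes);
    }

    public static void main(String[] args) {
        System.out.println(bidChanged(12345, 150.0, 125.0));
        System.out.println(closingSoon(12345, 5));
    }
}
```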
## Why This Approach?
✅ **100% Free** - No paid services (Twilio, Pushover, etc.)
✅ **No External Services** - Desktop notifications are built into Java
✅ **Local Delivery** - Desktop notifications need no internet connection (scraping itself still does)
✅ **Privacy First** - Your data stays on your machine
✅ **Cross-Platform** - Windows, macOS, Linux supported
✅ **Optional Email** - Add Gmail notifications if you want
## Troubleshooting
### Desktop Notifications Not Showing
- **Windows**: Check if Java has notification permissions
- **Linux**: Ensure you have a desktop environment running (not headless)
- **macOS**: Check System Preferences → Notifications
### Email Not Sending
1. Verify 2FA is enabled in Google Account
2. Confirm you're using an **App Password** (not your regular Gmail password)
3. Do not enable "Less secure app access"; it is not needed, because app passwords work with 2FA
4. Verify the SMTP format: `smtp:username:app_password:recipient`
## Notes
- Desktop notifications require a graphical environment (not headless servers)
- For headless servers, use email-only notifications
- Gmail SMTP is free and has generous limits (500 emails/day)
- OpenCV native libraries must match your platform architecture
- YOLO weights file is ~245 MB
## License
This is example code for educational purposes.

80
models/coco.names Normal file

@@ -0,0 +1,80 @@
person
bicycle
car
motorbike
aeroplane
bus
train
truck
boat
traffic light
fire hydrant
stop sign
parking meter
bench
bird
cat
dog
horse
sheep
cow
elephant
bear
zebra
giraffe
backpack
umbrella
handbag
tie
suitcase
frisbee
skis
snowboard
sports ball
kite
baseball bat
baseball glove
skateboard
surfboard
tennis racket
bottle
wine glass
cup
fork
knife
spoon
bowl
banana
apple
sandwich
orange
broccoli
carrot
hot dog
pizza
donut
cake
chair
sofa
pottedplant
bed
diningtable
toilet
tvmonitor
laptop
mouse
remote
keyboard
cell phone
microwave
oven
toaster
sink
refrigerator
book
clock
vase
scissors
teddy bear
hair drier
toothbrush

1158
models/yolov4.cfg Normal file

File diff suppressed because it is too large

BIN
models/yolov4.weights Normal file

Binary file not shown.


@@ -354,6 +354,8 @@ public class TroostwijkScraper {
      * discovers Dutch auctions, scrapes lots, and begins monitoring.
      */
     public static void main(String[] args) throws Exception {
+        System.out.println("=== Troostwijk Auction Scraper ===\n");
         // Configuration parameters (replace with your own values)
         String databaseFile = "troostwijk.db";
@@ -366,27 +368,34 @@ public class TroostwijkScraper {
         // Example: "smtp:your.email@gmail.com:abcd1234efgh5678:recipient@example.com"
         // Get app password: Google Account > Security > 2-Step Verification > App passwords
-        String yoloCfg = "models/yolov4.cfg"; // path to YOLO config file
-        String yoloWeights = "models/yolov4.weights"; // path to YOLO weights file
-        String yoloClasses = "models/coco.names"; // list of class names
+        // YOLO model paths (optional - scraper works without object detection)
+        String yoloCfg = "models/yolov4.cfg";
+        String yoloWeights = "models/yolov4.weights";
+        String yoloClasses = "models/coco.names";
         // Load native OpenCV library
         System.loadLibrary(Core.NATIVE_LIBRARY_NAME);
+        System.out.println("Initializing scraper...");
         TroostwijkScraper scraper = new TroostwijkScraper(databaseFile, notificationConfig, "",
                 yoloCfg, yoloWeights, yoloClasses);
         // Step 1: Discover auctions in NL
+        System.out.println("\n[1/3] Discovering Dutch auctions...");
         List<Integer> auctions = scraper.discoverDutchAuctions();
-        System.out.println("Found auctions: " + auctions);
+        System.out.println("Found " + auctions.size() + " auctions: " + auctions);
         // Step 2: Fetch lots for each auction
+        System.out.println("\n[2/3] Fetching lot details...");
         for (int saleId : auctions) {
+            System.out.println(" Processing sale " + saleId + "...");
             scraper.fetchLotsForSale(saleId);
         }
         // Step 3: Start monitoring bids and closures
+        System.out.println("\n[3/3] Starting monitoring service...");
         scraper.scheduleMonitoring();
+        System.out.println("✓ Monitoring active. Press Ctrl+C to stop.\n");
     }
     // ----------------------------------------------------------------------
@@ -710,23 +719,53 @@ public class TroostwijkScraper {
     }
     /**
-     * Service for performing object detection on images using OpenCVs DNN
+     * Service for performing object detection on images using OpenCV's DNN
      * module. The DNN module can load pretrained models from several
      * frameworks (Darknet, TensorFlow, ONNX, etc.)【784097309529506†L209-L233】. Here
      * we load a YOLO model (Darknet) by specifying the configuration and
      * weights files. For each image we run a forward pass and return a
      * list of detected class labels.
+     *
+     * If model files are not found, the service operates in disabled mode
+     * and returns empty lists.
      */
     static class ObjectDetectionService {
         private final Net net;
         private final List<String> classNames;
+        private final boolean enabled;
         ObjectDetectionService(String cfgPath, String weightsPath, String classNamesPath) throws IOException {
-            // Load network
-            this.net = Dnn.readNetFromDarknet(cfgPath, weightsPath);
-            this.net.setPreferableBackend(DNN_BACKEND_OPENCV);
-            this.net.setPreferableTarget(DNN_TARGET_CPU);
-            // Load class names (one per line)
-            this.classNames = Files.readAllLines(Paths.get(classNamesPath));
+            // Check if model files exist
+            Path cfgFile = Paths.get(cfgPath);
+            Path weightsFile = Paths.get(weightsPath);
+            Path classNamesFile = Paths.get(classNamesPath);
+
+            if (!Files.exists(cfgFile) || !Files.exists(weightsFile) || !Files.exists(classNamesFile)) {
+                System.out.println("⚠️ Object detection disabled: YOLO model files not found");
+                System.out.println(" Expected files:");
+                System.out.println(" - " + cfgPath);
+                System.out.println(" - " + weightsPath);
+                System.out.println(" - " + classNamesPath);
+                System.out.println(" Scraper will continue without image analysis.");
+                this.enabled = false;
+                this.net = null;
+                this.classNames = new ArrayList<>();
+                return;
+            }
+            try {
+                // Load network
+                this.net = Dnn.readNetFromDarknet(cfgPath, weightsPath);
+                this.net.setPreferableBackend(DNN_BACKEND_OPENCV);
+                this.net.setPreferableTarget(DNN_TARGET_CPU);
+                // Load class names (one per line)
+                this.classNames = Files.readAllLines(classNamesFile);
+                this.enabled = true;
+                System.out.println("✓ Object detection enabled with YOLO");
+            } catch (Exception e) {
+                System.err.println("⚠️ Object detection disabled: " + e.getMessage());
+                throw new IOException("Failed to initialize object detection", e);
+            }
         }
         /**
          * Detects objects in the given image file and returns a list of
@@ -736,9 +775,13 @@ public class TroostwijkScraper {
          * postprocessing【784097309529506†L324-L344】.
          *
          * @param imagePath absolute path to the image
-         * @return list of detected class names
+         * @return list of detected class names (empty if detection disabled)
          */
         List<String> detectObjects(String imagePath) {
+            if (!enabled) {
+                return new ArrayList<>();
+            }
             List<String> labels = new ArrayList<>();
             Mat image = Imgcodecs.imread(imagePath);
             if (image.empty()) return labels;

Binary file not shown.

BIN
troostwijk.db Normal file

Binary file not shown.