Deployment
Prerequisites
- Python 3.8+ installed
- Access to a server (Linux/Windows)
- Playwright and dependencies installed
Production Setup
1. Install on Server
# Clone repository
git clone git@git.appmodel.nl:Tour/troost-scraper.git
cd troost-scraper
# Create virtual environment
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
playwright install chromium
playwright install-deps # Install system dependencies
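To verify that Chromium and its system dependencies work on the server, a short check can be run before wiring up the scheduler. This is a minimal sketch; the file name check_install.py and the use of Playwright's sync API are assumptions, not part of the project:
# check_install.py - sanity check that Playwright can launch Chromium
from playwright.sync_api import sync_playwright
with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    page.goto("https://www.troostwijkauctions.com")
    print("Loaded:", page.title())
    browser.close()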
2. Configuration
Adjust the configuration constants in main.py, or override them with environment variables as sketched below:
# main.py configuration
BASE_URL = "https://www.troostwijkauctions.com"
CACHE_DB = "/mnt/okcomputer/output/cache.db"
OUTPUT_DIR = "/mnt/okcomputer/output"
RATE_LIMIT_SECONDS = 0.5
MAX_PAGES = 50
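If you prefer environment variables to hard-coded constants, the same values can be read with os.environ at the top of main.py. The pattern below is a sketch; the TROOST_* variable names are illustrative and not defined by the project:
import os
# Hypothetical environment-variable overrides; the TROOST_* names are examples only.
BASE_URL = os.environ.get("TROOST_BASE_URL", "https://www.troostwijkauctions.com")
CACHE_DB = os.environ.get("TROOST_CACHE_DB", "/mnt/okcomputer/output/cache.db")
OUTPUT_DIR = os.environ.get("TROOST_OUTPUT_DIR", "/mnt/okcomputer/output")
RATE_LIMIT_SECONDS = float(os.environ.get("TROOST_RATE_LIMIT_SECONDS", "0.5"))
MAX_PAGES = int(os.environ.get("TROOST_MAX_PAGES", "50"))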
3. Create Output Directories
Create the directories referenced by CACHE_DB and OUTPUT_DIR; the example below uses /var/troost-scraper, so adjust the paths to match your configuration:
sudo mkdir -p /var/troost-scraper/output
sudo chown $USER:$USER /var/troost-scraper
4. Run as Cron Job
Add to crontab (crontab -e):
# Run scraper daily at 2 AM
0 2 * * * cd /path/to/troost-scraper && /path/to/.venv/bin/python main.py >> /var/log/troost-scraper.log 2>&1
Docker Deployment (Optional)
Create a Dockerfile:
FROM python:3.10-slim
WORKDIR /app
# Install system dependencies for Playwright
RUN apt-get update && apt-get install -y \
wget \
gnupg \
&& rm -rf /var/lib/apt/lists/*
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
RUN playwright install chromium
RUN playwright install-deps
COPY main.py .
CMD ["python", "main.py"]
Build and run:
docker build -t troost-scraper .
docker run -v /path/to/output:/output troost-scraper
Make sure the container-side path of the volume matches OUTPUT_DIR (and the directory of CACHE_DB) in the configuration, or override those values inside the container.
Monitoring
Check Logs
tail -f /var/log/troost-scraper.log
Monitor Output
ls -lh /var/troost-scraper/output/
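For unattended monitoring, a small freshness check can flag a scraper that has silently stopped producing output. The sketch below uses only the standard library, assumes the example output directory from step 3, and treats anything older than 24 hours (the cron interval) as stale:
# check_freshness.py - exit non-zero if no output file was written in the last 24 hours
import sys
import time
from pathlib import Path
OUTPUT_DIR = Path("/var/troost-scraper/output")
MAX_AGE_SECONDS = 24 * 60 * 60
files = list(OUTPUT_DIR.glob("*"))
if not files:
    sys.exit("No output files found")
age = time.time() - max(f.stat().st_mtime for f in files)
if age > MAX_AGE_SECONDS:
    sys.exit(f"Stale output: newest file is {age / 3600:.1f} hours old")
print("Output is fresh")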
Troubleshooting
Playwright Browser Issues
# Reinstall browsers
playwright install --force chromium
Permission Issues
# Fix permissions
sudo chown -R $USER:$USER /var/troost-scraper
Memory Issues
- Reduce MAX_PAGES in the configuration
- Run on a machine with more RAM (Playwright needs ~1GB)
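If memory pressure persists, Chromium itself can be started with flags that reduce its footprint. The following is only a sketch of what the browser launch could look like if the scraper uses Playwright's sync API; the flags are standard Chromium switches, not project settings:
from playwright.sync_api import sync_playwright
with sync_playwright() as p:
    # --disable-dev-shm-usage avoids exhausting the small /dev/shm in containers;
    # --disable-gpu skips GPU initialisation, which headless scraping does not need.
    browser = p.chromium.launch(
        headless=True,
        args=["--disable-dev-shm-usage", "--disable-gpu"],
    )
    # ... scraping logic ...
    browser.close()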