2025-12-05 20:11:39 +01:00


Deployment

Prerequisites

  • Python 3.8+ installed
  • Access to a Linux or Windows server (the sudo, cron, and path examples below assume Linux)
  • Playwright and its browser dependencies (installed in step 1 below)
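Before deploying, a quick pre-flight check run with the interpreter you plan to use can confirm the Python version and whether the playwright package is importable. This is a minimal sketch; check_prerequisites is a hypothetical helper, not part of the repository, and finding the package does not prove the browser binaries are installed (that is what `playwright install chromium` in step 1 takes care of).

```python
import importlib.util
import sys

MIN_PYTHON = (3, 8)  # minimum version from the prerequisites above


def check_prerequisites() -> dict:
    """Report whether the interpreter and the playwright package look usable."""
    return {
        # True when running on Python 3.8 or newer
        "python_ok": sys.version_info >= MIN_PYTHON,
        # True when the playwright package can be imported; browsers are not checked
        "playwright_installed": importlib.util.find_spec("playwright") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'yes' if ok else 'NO'}")
```

Run it inside the virtual environment after step 1; before that, playwright_installed is expected to be NO.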

Production Setup

1. Install on Server

# Clone repository
git clone git@git.appmodel.nl:Tour/troost-scraper.git
cd troost-scraper

# Create virtual environment
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
playwright install chromium
playwright install-deps  # Install the browsers' system libraries (Debian/Ubuntu; requires sudo)

2. Configuration

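Configure the scraper by editing the constants in main.py. The paths below are examples; on a server, point CACHE_DB and OUTPUT_DIR at the directory created in step 3 (e.g. /var/troost-scraper/output):

# main.py configuration
BASE_URL = "https://www.troostwijkauctions.com"
CACHE_DB = "/mnt/okcomputer/output/cache.db"
OUTPUT_DIR = "/mnt/okcomputer/output"
RATE_LIMIT_SECONDS = 0.5  # Delay between requests, in seconds
MAX_PAGES = 50            # Upper limit on pages scraped per run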
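If you would rather not edit main.py on every server, the constants could be read from environment variables with the values above as fallbacks. This is a sketch under assumptions: main.py does not necessarily support this today, and the TROOST_* variable names, ScraperConfig, and load_config are all hypothetical.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass
class ScraperConfig:
    base_url: str
    cache_db: str
    output_dir: str
    rate_limit_seconds: float
    max_pages: int


def load_config(env: Mapping[str, str] = os.environ) -> ScraperConfig:
    """Build the config from hypothetical TROOST_* variables, falling back to main.py's values."""
    return ScraperConfig(
        base_url=env.get("TROOST_BASE_URL", "https://www.troostwijkauctions.com"),
        cache_db=env.get("TROOST_CACHE_DB", "/mnt/okcomputer/output/cache.db"),
        output_dir=env.get("TROOST_OUTPUT_DIR", "/mnt/okcomputer/output"),
        # Environment values are always strings, so numeric settings are converted explicitly
        rate_limit_seconds=float(env.get("TROOST_RATE_LIMIT_SECONDS", "0.5")),
        max_pages=int(env.get("TROOST_MAX_PAGES", "50")),
    )
```

Passing the mapping as a parameter keeps the function easy to test without touching the real environment.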

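3. Create Output Directories and Log File

sudo mkdir -p /var/troost-scraper/output
sudo chown -R $USER:$USER /var/troost-scraper
# Create the log file used by the cron job so your user can write to it
sudo touch /var/log/troost-scraper.log
sudo chown $USER:$USER /var/log/troost-scraper.log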
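To confirm the ownership change took effect before the first scheduled run, a small check can test whether the current user can create files in the output directory. A minimal sketch; check_writable_dir is a hypothetical helper, and the default path assumes you used /var/troost-scraper/output.

```python
import os
import sys


def check_writable_dir(path: str) -> bool:
    """Return True if path is an existing directory the current user can create files in."""
    # Creating files in a directory needs both write and execute (search) permission
    return os.path.isdir(path) and os.access(path, os.W_OK | os.X_OK)


if __name__ == "__main__":
    # Default to the output directory created above; pass another path as an argument
    target = sys.argv[1] if len(sys.argv) > 1 else "/var/troost-scraper/output"
    if check_writable_dir(target):
        print(f"OK: {target} is writable")
    else:
        print(f"NOT WRITABLE: {target}")
        sys.exit(1)
```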

4. Run as Cron Job

Add to crontab (crontab -e):

# Run scraper daily at 2 AM (server local time)
0 2 * * * cd /path/to/troost-scraper && /path/to/troost-scraper/.venv/bin/python main.py >> /var/log/troost-scraper.log 2>&1
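The five cron fields are minute, hour, day of month, month, and day of week, so `0 2 * * *` fires at 02:00 every day in the cron daemon's local time. The sketch below computes the next run for such a daily entry, which helps when deciding when to look for the first log lines. next_daily_run is a hypothetical helper working on naive datetimes, so it ignores daylight-saving transitions.

```python
from datetime import datetime, timedelta


def next_daily_run(now: datetime, hour: int = 2, minute: int = 0) -> datetime:
    """Next time a '<minute> <hour> * * *' cron entry fires, given the current local time."""
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    # If today's slot is not strictly in the future, the next run is tomorrow
    if candidate <= now:
        candidate += timedelta(days=1)
    return candidate
```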

Docker Deployment (Optional)

Create Dockerfile:

FROM python:3.10-slim

WORKDIR /app

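# Basic tools; Playwright's own system dependencies are installed by --with-deps below
RUN apt-get update && apt-get install -y \
    wget \
    gnupg \
    && rm -rf /var/lib/apt/lists/*

COPY requirements.txt .
# requirements.txt must include playwright so the CLI below is available
RUN pip install --no-cache-dir -r requirements.txt
# Download Chromium and the OS libraries it needs (the build runs as root)
RUN playwright install --with-deps chromium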

COPY main.py .

CMD ["python", "main.py"]

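Build and run. The container side of the volume mount must be the directory that OUTPUT_DIR and CACHE_DB point to in main.py; /output below is a placeholder:

docker build -t troost-scraper .
docker run --rm --ipc=host -v /path/to/output:/output troost-scraper

Playwright's Docker documentation recommends --ipc=host for Chromium, which can otherwise run out of shared memory and crash.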

Monitoring

Check Logs

tail -f /var/log/troost-scraper.log
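Because the cron entry redirects stderr into the log (2>&1), uncaught Python exceptions end up there as tracebacks. A minimal sketch for counting them in the recent part of the log, for example from a separate monitoring job; recent_tracebacks is a hypothetical helper.

```python
from collections import deque


def recent_tracebacks(log_path: str, last_lines: int = 500) -> int:
    """Count Python tracebacks in the last `last_lines` lines of the log."""
    with open(log_path, encoding="utf-8", errors="replace") as fh:
        # A deque with maxlen keeps only the tail instead of the whole file
        tail = deque(fh, maxlen=last_lines)
    return sum(1 for line in tail if line.startswith("Traceback (most recent call last):"))
```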

Monitor Output

ls -lh /var/troost-scraper/output/
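With a daily run, a missing or old newest file is a quick sign that the cron job did not run or failed. A minimal sketch that reports the age of the newest file in the output directory; the helper names and the 26-hour threshold (one day plus slack) are assumptions, not part of the scraper.

```python
import os
import time
from typing import Optional


def newest_output_age_hours(output_dir: str, now: Optional[float] = None) -> Optional[float]:
    """Age in hours of the most recently modified file in output_dir, or None if it has no files."""
    now = time.time() if now is None else now
    mtimes = [entry.stat().st_mtime for entry in os.scandir(output_dir) if entry.is_file()]
    if not mtimes:
        return None
    return (now - max(mtimes)) / 3600


def output_is_stale(output_dir: str, max_age_hours: float = 26.0) -> bool:
    """True if nothing was written within max_age_hours."""
    age = newest_output_age_hours(output_dir)
    return age is None or age > max_age_hours
```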

Troubleshooting

Playwright Browser Issues

# Reinstall browsers
playwright install --force chromium

Permission Issues

# Fix permissions
sudo chown -R $USER:$USER /var/troost-scraper

Memory Issues

  • Lower MAX_PAGES in main.py so each run does less work
  • Run on a machine with more RAM; allow roughly 1 GB for the Chromium instance that Playwright launches
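On Linux, MemAvailable in /proc/meminfo estimates how much memory new processes can use without swapping, which is a reasonable way to check for that headroom. A Linux-only sketch; mem_available_gib is a hypothetical helper, and the 1 GB threshold is the rough estimate above, not a measured requirement.

```python
def mem_available_gib(meminfo_text: str) -> float:
    """Parse MemAvailable from /proc/meminfo content (reported in kB) and return GiB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            kib = int(line.split()[1])
            return kib / (1024 * 1024)
    raise ValueError("MemAvailable not found in meminfo text")


if __name__ == "__main__":
    with open("/proc/meminfo") as fh:
        available = mem_available_gib(fh.read())
    # Compare against the rough ~1 GB figure for the Playwright browser
    print(f"{available:.2f} GiB available", "(tight for Playwright)" if available < 1 else "")
```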