# Deployment

## Prerequisites

- Python 3.8+ installed
- Access to a server (Linux/Windows)
- Playwright and dependencies installed

## Production Setup

### 1. Install on Server

```bash
# Clone repository
git clone git@git.appmodel.nl:Tour/troost-scraper.git
cd troost-scraper

# Create virtual environment
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
playwright install chromium
playwright install-deps  # Install system dependencies
```

### 2. Configuration

Create a configuration file or set environment variables:

```python
# main.py configuration
BASE_URL = "https://www.troostwijkauctions.com"
CACHE_DB = "/mnt/okcomputer/output/cache.db"
OUTPUT_DIR = "/mnt/okcomputer/output"
RATE_LIMIT_SECONDS = 0.5
MAX_PAGES = 50
```

### 3. Create Output Directories

```bash
sudo mkdir -p /var/troost-scraper/output
sudo chown $USER:$USER /var/troost-scraper
```

The scraper writes to the paths set in `CACHE_DB` and `OUTPUT_DIR`, so point those settings at this directory (or create the directory they name instead).

### 4. Run as Cron Job

Add to crontab (`crontab -e`):

```bash
# Run scraper daily at 2 AM
0 2 * * * cd /path/to/troost-scraper && /path/to/.venv/bin/python main.py >> /var/log/troost-scraper.log 2>&1
```

## Docker Deployment (Optional)

Create `Dockerfile`:

```dockerfile
FROM python:3.10-slim

WORKDIR /app

# Install system dependencies for Playwright
RUN apt-get update && apt-get install -y \
    wget \
    gnupg \
    && rm -rf /var/lib/apt/lists/*

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
RUN playwright install chromium
RUN playwright install-deps

COPY main.py .

CMD ["python", "main.py"]
```

Build and run:

```bash
docker build -t troost-scraper .
docker run -v /path/to/output:/output troost-scraper
```

The volume mount only captures results if `OUTPUT_DIR` and `CACHE_DB` resolve to paths under `/output` inside the container.

## Monitoring

### Check Logs

```bash
tail -f /var/log/troost-scraper.log
```

### Monitor Output

```bash
ls -lh /var/troost-scraper/output/
```

## Troubleshooting

### Playwright Browser Issues

```bash
# Reinstall browsers
playwright install --force chromium
```

### Permission Issues

```bash
# Fix permissions
sudo chown -R $USER:$USER /var/troost-scraper
```

### Memory Issues

- Reduce `MAX_PAGES` in configuration
- Run on a machine with more RAM (Playwright needs ~1 GB)
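
The configuration step allows environment variables, and lowering `MAX_PAGES` is the first memory fix listed above, so it may help to make the settings overridable without editing `main.py`. The following is a minimal sketch, not the project's actual code: the `load_config` helper and the `TROOST_` variable prefix are assumptions made for illustration, and the defaults simply mirror the values from the Configuration section.

```python
import os

# Defaults copied from the Configuration section of this guide.
DEFAULTS = {
    "BASE_URL": "https://www.troostwijkauctions.com",
    "CACHE_DB": "/mnt/okcomputer/output/cache.db",
    "OUTPUT_DIR": "/mnt/okcomputer/output",
    "RATE_LIMIT_SECONDS": "0.5",
    "MAX_PAGES": "50",
}


def load_config(env=None, prefix="TROOST_"):
    """Return settings, letting <prefix><NAME> variables override the defaults.

    Both this helper and the TROOST_ prefix are hypothetical; they are not
    part of the troost-scraper code base.
    """
    env = os.environ if env is None else env
    raw = {name: env.get(prefix + name, default) for name, default in DEFAULTS.items()}

    # Convert numeric settings so callers get real numbers, not strings.
    config = {
        "BASE_URL": raw["BASE_URL"],
        "CACHE_DB": raw["CACHE_DB"],
        "OUTPUT_DIR": raw["OUTPUT_DIR"],
        "RATE_LIMIT_SECONDS": float(raw["RATE_LIMIT_SECONDS"]),
        "MAX_PAGES": int(raw["MAX_PAGES"]),
    }

    # Reject values that would make the scraper misbehave.
    if config["MAX_PAGES"] < 1:
        raise ValueError("MAX_PAGES must be at least 1")
    if config["RATE_LIMIT_SECONDS"] < 0:
        raise ValueError("RATE_LIMIT_SECONDS must not be negative")
    return config
```

With a helper like this, a cron entry or `docker run -e` flag could set `TROOST_MAX_PAGES=10` to reduce memory use for a single run while the defaults stay untouched.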