# Deployment

## Prerequisites

- Python 3.8+ installed
- Access to a server (Linux/Windows)
- Playwright and dependencies installed

## Production Setup

### 1. Install on Server

```bash
# Clone repository
git clone git@git.appmodel.nl:Tour/troost-scraper.git
cd troost-scraper

# Create virtual environment (use python3 if python is not on your PATH)
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
playwright install chromium
playwright install-deps  # Install system dependencies (on Linux this may prompt for sudo)
```
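To confirm the installation before scheduling anything, a short smoke test can launch headless Chromium through Playwright's synchronous API (`playwright.sync_api.sync_playwright`). This is a minimal sketch; the file name `check_setup.py` and the `check_setup()` helper are hypothetical and not part of the repository.

```python
# check_setup.py (hypothetical helper, not part of troost-scraper)
import sys
from typing import List


def check_setup(timeout_ms: float = 30_000) -> List[str]:
    """Return a list of problems; an empty list means the setup looks usable."""
    problems: List[str] = []

    # The prerequisites above require Python 3.8 or newer.
    if sys.version_info < (3, 8):
        problems.append("Python 3.8+ is required, found %d.%d" % sys.version_info[:2])

    try:
        from playwright.sync_api import sync_playwright
    except ImportError:
        problems.append("playwright is not installed (pip install -r requirements.txt)")
        return problems

    try:
        # Launch headless Chromium and open an empty page as a smoke test.
        with sync_playwright() as p:
            browser = p.chromium.launch(headless=True, timeout=timeout_ms)
            page = browser.new_page()
            page.goto("about:blank")
            browser.close()
    except Exception as exc:  # e.g. browser binary or system libraries missing
        problems.append(f"Chromium failed to start: {exc}")

    return problems
```

Saved in the project directory, it can be run inside the activated virtual environment with `python -c "from check_setup import check_setup; print(check_setup())"`; an empty list `[]` means Python, Playwright, and Chromium all look usable.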
### 2. Configuration

Configure the scraper by editing the constants in `main.py`, or supply the same values from a configuration file or environment variables:

```python
# main.py configuration
BASE_URL = "https://www.troostwijkauctions.com"
CACHE_DB = "/var/troost-scraper/cache.db"    # cache database file
OUTPUT_DIR = "/var/troost-scraper/output"    # scraped results are written here
RATE_LIMIT_SECONDS = 0.5                     # pause between requests, in seconds
MAX_PAGES = 50                               # upper limit on pages scraped per run
```
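If you prefer environment variables over editing `main.py`, the same settings can be read with the values above as defaults. A minimal sketch, assuming hypothetical variable names with a `TROOST_` prefix; `main.py` would have to call something like `load_config()` for them to take effect.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ScraperConfig:
    base_url: str
    cache_db: str
    output_dir: str
    rate_limit_seconds: float
    max_pages: int


def load_config(env: Optional[Mapping[str, str]] = None) -> ScraperConfig:
    """Build the config from environment variables, falling back to the main.py defaults."""
    env = os.environ if env is None else env
    # The TROOST_* variable names are hypothetical, chosen for this sketch.
    return ScraperConfig(
        base_url=env.get("TROOST_BASE_URL", "https://www.troostwijkauctions.com"),
        cache_db=env.get("TROOST_CACHE_DB", "/var/troost-scraper/cache.db"),
        output_dir=env.get("TROOST_OUTPUT_DIR", "/var/troost-scraper/output"),
        rate_limit_seconds=float(env.get("TROOST_RATE_LIMIT_SECONDS", "0.5")),
        max_pages=int(env.get("TROOST_MAX_PAGES", "50")),
    )
```

For example, `load_config({"TROOST_MAX_PAGES": "10"}).max_pages` is `10`, while every unset variable keeps its default. Numeric values are converted explicitly because environment variables are always strings.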
### 3. Create Output Directories

```bash
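sudo mkdir -p /var/troost-scraper/output
sudo chown -R $USER:$USER /var/troost-scraper
# Log file written by the cron job in step 4
sudo touch /var/log/troost-scraper.log && sudo chown $USER:$USER /var/log/troost-scraper.log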
```
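The scraper runs as your user, so both directories must be writable by that user. A quick check in Python, assuming the paths from the configuration in step 2 (`ensure_writable` is a hypothetical helper):

```python
import os
from pathlib import Path
from typing import Iterable, List


def ensure_writable(dirs: Iterable[str]) -> List[str]:
    """Create each directory if needed (like `mkdir -p`); return those the current user cannot write to."""
    not_writable = []
    for d in dirs:
        path = Path(d)
        try:
            path.mkdir(parents=True, exist_ok=True)
        except OSError:
            # Could not create it (e.g. permission denied, or a file is in the way).
            not_writable.append(str(path))
            continue
        # Writing into a directory needs both write and execute (search) permission.
        if not os.access(path, os.W_OK | os.X_OK):
            not_writable.append(str(path))
    return not_writable
```

After running the commands above, `ensure_writable(["/var/troost-scraper", "/var/troost-scraper/output"])` should return `[]`.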
### 4. Run as Cron Job

Add to crontab (`crontab -e`):

```bash
# Run scraper daily at 2 AM (server local time)
0 2 * * * cd /path/to/troost-scraper && .venv/bin/python main.py >> /var/log/troost-scraper.log 2>&1
```
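The entry `0 2 * * *` fires at minute 0 of hour 2 every day, in the cron daemon's time zone (normally the server's local time). As a worked example of when the next run happens, here is a small sketch; `next_daily_run` is a hypothetical helper that only models this fixed daily schedule, not general cron expressions, and it uses naive local times, so daylight-saving transitions are ignored.

```python
from datetime import datetime, timedelta


def next_daily_run(now: datetime, hour: int = 2, minute: int = 0) -> datetime:
    """Return the next time a `minute hour * * *` cron entry fires strictly after `now`."""
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= now:
        # Today's slot has already passed (or is right now), so the next one is tomorrow.
        candidate += timedelta(days=1)
    return candidate


if __name__ == "__main__":
    print(next_daily_run(datetime(2024, 1, 1, 1, 30)))  # 2024-01-01 02:00:00
    print(next_daily_run(datetime(2024, 1, 1, 3, 0)))   # 2024-01-02 02:00:00
```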
## Docker Deployment (Optional)

Create `Dockerfile`:

```dockerfile
FROM python:3.10-slim

WORKDIR /app

# Base utilities; Playwright's own system libraries are installed by `playwright install-deps` below
RUN apt-get update && apt-get install -y \
    wget \
    gnupg \
    && rm -rf /var/lib/apt/lists/*

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
RUN playwright install chromium
RUN playwright install-deps

COPY main.py .

CMD ["python", "main.py"]
```
Build and run:

```bash
docker build -t troost-scraper .
# Mount the host directory at the CACHE_DB / OUTPUT_DIR paths configured in main.py
docker run --rm -v /var/troost-scraper:/var/troost-scraper troost-scraper
```
## Monitoring

### Check Logs

```bash
tail -f /var/log/troost-scraper.log
```
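Because the cron entry appends both stdout and stderr to the log, an unhandled exception shows up there as a Python traceback. A minimal sketch for a scripted check, assuming only that tracebacks begin with Python's standard header line; `count_tracebacks` is a hypothetical helper:

```python
from typing import Iterable

# Standard first line of every traceback Python prints for an uncaught exception.
TRACEBACK_HEADER = "Traceback (most recent call last):"


def count_tracebacks(lines: Iterable[str]) -> int:
    """Count traceback headers in an iterable of log lines."""
    return sum(1 for line in lines if line.startswith(TRACEBACK_HEADER))
```

For example, `count_tracebacks(open("/var/log/troost-scraper.log", errors="replace"))` returns the number of tracebacks logged so far. A chained exception ("During handling of the above exception...") prints more than one header, so it is counted once per header.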
### Monitor Output

```bash
ls -lh /var/troost-scraper/output/
```
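Since the job runs once a day, an output directory with no recent files suggests the last run failed. A sketch that finds the newest file and flags it when it is older than a threshold; the 26-hour default (one day plus some slack) and both helper names are assumptions:

```python
import time
from pathlib import Path
from typing import Optional, Tuple


def newest_output(output_dir: str) -> Optional[Tuple[Path, float]]:
    """Return (path, age in hours) of the most recently modified file, or None if there is none."""
    directory = Path(output_dir)
    if not directory.is_dir():
        return None
    files = [p for p in directory.iterdir() if p.is_file()]
    if not files:
        return None
    newest = max(files, key=lambda p: p.stat().st_mtime)
    age_hours = (time.time() - newest.stat().st_mtime) / 3600
    return newest, age_hours


def is_stale(output_dir: str, max_age_hours: float = 26.0) -> bool:
    """True if there is no output at all or the newest file is older than max_age_hours."""
    result = newest_output(output_dir)
    return result is None or result[1] > max_age_hours
```

`is_stale("/var/troost-scraper/output")` can then back a simple alert, for instance from a second cron entry later in the morning.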
## Troubleshooting

### Playwright Browser Issues

```bash
# Reinstall browsers
playwright install --force chromium
```
### Permission Issues

```bash
# Fix permissions
sudo chown -R $USER:$USER /var/troost-scraper
```
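Before running `chown -R`, it can help to see which paths are actually affected. A POSIX-only sketch (`os.getuid` does not exist on Windows); `files_not_owned_by_me` is a hypothetical helper:

```python
import os
from typing import List


def files_not_owned_by_me(root: str) -> List[str]:
    """List paths under `root` whose owner uid differs from the current user's (POSIX only)."""
    uid = os.getuid()  # not available on Windows
    foreign = []
    for dirpath, _dirnames, filenames in os.walk(root):
        # Check the directory itself, then each file it contains.
        candidates = [dirpath] + [os.path.join(dirpath, f) for f in filenames]
        for path in candidates:
            # lstat inspects symlinks themselves rather than their targets.
            if os.lstat(path).st_uid != uid:
                foreign.append(path)
    return foreign
```

After the `chown -R` above, `files_not_owned_by_me("/var/troost-scraper")` should return `[]`.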
### Memory Issues

- Reduce `MAX_PAGES` in the configuration
- Run on a machine with more RAM (Playwright and its Chromium browser need about 1 GB)
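
On Linux (kernel 3.14 and later), the kernel's estimate of memory available to new processes is the `MemAvailable` field of `/proc/meminfo`. A sketch that compares it against the roughly 1 GB mentioned above; the threshold and helper names are assumptions:

```python
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo content into {field: value in kB}."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[key.strip()] = int(parts[0])
    return values


def enough_memory(meminfo_text: str, required_mb: int = 1024) -> bool:
    """True if MemAvailable is at least required_mb megabytes (missing field counts as 0)."""
    available_kb = parse_meminfo(meminfo_text).get("MemAvailable", 0)
    return available_kb >= required_mb * 1024
```

On the server, `enough_memory(open("/proc/meminfo").read())` returns `False` when less than about 1 GB is available.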