
# Verbatim Dicta

Real-time audio transcription using Whisper models, with optional LLM analysis. Captures microphone and speaker audio simultaneously, so both sides of a conversation are transcribed.

## Features

- **Dual audio capture** - Record microphone and speaker output simultaneously
- **Real-time transcription** - Process audio as it's captured with Whisper models
- **LLM analysis** - Optional fact-checking and question generation via Ollama
- **Multi-language** - Support for 50+ languages
- **File output** - Save transcripts with timestamps and analysis
- **GPU acceleration** - CUDA support for faster processing

## Quick Start

```bash
# Install dependencies
pip install -r requirements.txt

# List audio devices
./run_transcribe.sh --list-devices

# Basic transcription
./run_transcribe.sh --model medium --language en

# With LLM analysis and file output
./run_transcribe.sh --model medium --enable-llm --output transcript.txt
```

## Requirements

- **OS**: Windows 10/11 or Linux
- **Python**: 3.8+
- **Audio**: Loopback device (Stereo Mix or VB-Cable on Windows, a PulseAudio/PipeWire monitor source on Linux)
- **Optional**: CUDA-capable GPU, Ollama for LLM features

## Installation

### 1. Install Dependencies

```bash
pip install -r requirements.txt
```

### 2. GPU Support (Optional)

For CUDA 11.8:
```bash
pip install torch --index-url https://download.pytorch.org/whl/cu118
```

For CUDA 12.1:
```bash
pip install torch --index-url https://download.pytorch.org/whl/cu121
```

### 3. Audio Setup

**Linux (PulseAudio/PipeWire):**
```bash
# List devices to find your monitor device
./run_transcribe.sh --list-devices

# Use with monitor device
./run_transcribe.sh --monitor "alsa_output.monitor"
```
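
If you prefer to locate the monitor device from Python, the following is a minimal sketch. It assumes the device list has the shape returned by `sounddevice.query_devices()` (dicts with `name` and `max_input_channels` keys) and that monitor sources contain the word "monitor" in their name. The helper names `find_monitor_devices` and `list_monitor_names` are hypothetical and not part of this project.

```python
from typing import Dict, List


def find_monitor_devices(devices: List[Dict]) -> List[Dict]:
    """Return input-capable devices whose name contains 'monitor'.

    Assumption: monitor sources are identifiable by name, as in the
    examples above ("alsa_output.monitor", "Monitor of Speakers").
    """
    return [
        d for d in devices
        if d.get("max_input_channels", 0) > 0
        and "monitor" in d.get("name", "").lower()
    ]


def list_monitor_names() -> List[str]:
    """Query the real device list; needs the third-party sounddevice package."""
    import sounddevice as sd  # imported lazily so the filter works without it

    return [d["name"] for d in find_monitor_devices(list(sd.query_devices()))]
```

Pass one of the returned names to `--monitor`.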

**Windows:**
- Enable "Stereo Mix" in Sound settings, or
- Install VB-Cable from [vb-audio.com](https://vb-audio.com/Cable/)

### 4. LLM Support (Optional)

```bash
# Install Ollama from ollama.ai
ollama pull qwen2.5:3b
```

## Usage

### Command Line Options

```bash
python transcribe.py [OPTIONS]

Options:
  --model {tiny,base,small,medium,large}  Whisper model (default: tiny)
  --language CODE                         Language code (default: en)
  --mic DEVICE                            Microphone device name
  --monitor DEVICE                        Speaker monitor device name
  --interval SECONDS                      Processing interval (default: 5.0)
  --min-duration SECONDS                  Minimum audio duration (default: 2.0)
  --enable-llm                            Enable LLM analysis
  --llm-model MODEL                       Ollama model (default: qwen2.5:3b)
  --output FILE                           Save transcript to file
  --force-cpu                             Force CPU processing
  --list-devices                          List audio devices
```
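
For reference, the options above could be declared with `argparse` roughly as follows. This is a sketch reconstructed from the option list, not the actual contents of `transcribe.py`; the function name `build_parser` is hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the documented options with their documented defaults."""
    parser = argparse.ArgumentParser(prog="transcribe.py")
    parser.add_argument("--model", default="tiny",
                        choices=["tiny", "base", "small", "medium", "large"],
                        help="Whisper model")
    parser.add_argument("--language", default="en", metavar="CODE")
    parser.add_argument("--mic", metavar="DEVICE")
    parser.add_argument("--monitor", metavar="DEVICE")
    parser.add_argument("--interval", type=float, default=5.0, metavar="SECONDS")
    parser.add_argument("--min-duration", type=float, default=2.0, metavar="SECONDS")
    parser.add_argument("--enable-llm", action="store_true")
    parser.add_argument("--llm-model", default="qwen2.5:3b", metavar="MODEL")
    parser.add_argument("--output", metavar="FILE")
    parser.add_argument("--force-cpu", action="store_true")
    parser.add_argument("--list-devices", action="store_true")
    return parser


# Unspecified options fall back to their defaults.
args = build_parser().parse_args(["--model", "medium", "--interval", "8"])
print(args.model, args.interval, args.min_duration)  # prints: medium 8.0 2.0
```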

### Examples

```bash
# Dutch transcription with LLM
./run_transcribe.sh --model medium --language nl --enable-llm

# High-quality meeting transcription
./run_transcribe.sh --model large --interval 8 --output meeting.txt

# Fast real-time transcription
./run_transcribe.sh --model tiny --interval 3 --min-duration 2

# Specific devices
./run_transcribe.sh --mic "USB Mic" --monitor "Monitor of Speakers"
```

## Model Performance

| Model  | Size   | Speed    | Quality | Use Case               |
|--------|--------|----------|---------|------------------------|
| tiny   | 75 MB  | Fastest  | Basic   | Real-time, low latency |
| base   | 145 MB | Fast     | Good    | General use            |
| small  | 485 MB | Moderate | Better  | Balanced               |
| medium | 1.5 GB | Slow     | Great   | High accuracy          |
| large  | 3 GB   | Slowest  | Best    | Maximum quality        |

## Troubleshooting

**No audio devices found:**
```bash
# List all devices
./run_transcribe.sh --list-devices

# Specify devices explicitly
./run_transcribe.sh --mic "device_name" --monitor "monitor_name"
```

**CUDA errors:**
```bash
# Force CPU processing
./run_transcribe.sh --force-cpu
```
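
The fallback behind `--force-cpu` can be pictured with the sketch below. It assumes the device is chosen by asking PyTorch whether CUDA is usable; the project's real selection logic may differ, and `pick_device` is a hypothetical name.

```python
def pick_device(force_cpu: bool = False) -> str:
    """Return "cuda" when a usable GPU is detected, otherwise "cpu"."""
    if force_cpu:
        return "cpu"
    try:
        import torch  # optional dependency; see "GPU Support" above
    except (ImportError, OSError):
        # torch missing, or its CUDA libraries failed to load
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"
```

Running `python -c "import torch; print(torch.cuda.is_available())"` performs the same check by hand and helps tell a driver problem from a missing GPU build of PyTorch.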

**Ollama connection failed:**
```bash
# Start Ollama service
ollama serve

# Pull required model
ollama pull qwen2.5:3b
```
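
To check the connection without starting a transcription, you can query Ollama's HTTP API directly. The sketch below assumes Ollama listens on its default address `http://localhost:11434` and uses the `/api/tags` endpoint, which lists locally pulled models. The helper names are hypothetical.

```python
import json
import urllib.request
from typing import Optional


def fetch_tags(base_url: str = "http://localhost:11434",
               timeout: float = 2.0) -> Optional[dict]:
    """Return the parsed /api/tags response, or None if Ollama is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            return json.load(resp)
    except (OSError, ValueError):
        # OSError covers refused connections and timeouts;
        # ValueError covers a response that is not valid JSON.
        return None


def model_available(tags: Optional[dict], name: str) -> bool:
    """True if a model with exactly this name appears in the tag listing."""
    if not tags:
        return False
    return any(m.get("name") == name for m in tags.get("models", []))
```

If `fetch_tags()` returns `None`, the service is not running; if `model_available(fetch_tags(), "qwen2.5:3b")` is false, the model still needs to be pulled.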

**Poor transcription quality:**
- Use a larger model: `--model medium` or `--model large`
- Increase the processing interval so each chunk carries more context: `--interval 10`
- Specify the spoken language explicitly, for example `--language nl`
- Improve the audio itself (reduce background noise)

## Output Format

### Standard Output
```
🎤 [14:23:15] User speaking into microphone
🔊 [14:23:18] Audio from speakers or system
```

### With LLM Analysis
```
🎤 [14:23:15] The Earth orbits the Sun in 365 days.
   ✅ FACTUAL (0.98): Scientifically accurate orbital period.
   ❓ Questions:
      1. Why do we need leap years?
      2. How does the elliptical orbit affect seasons?
      3. What factors influence Earth's orbital velocity?
```

### File Output
```
[14:23:15] MIC: User speaking into microphone
[14:23:18] SPEAKER: Audio from speakers

======================================================================
[14:23:25] MIC: The Earth orbits the Sun in 365 days.

📊 Fact Check: FACTUAL (confidence: 0.98)
💡 Scientifically accurate orbital period.

❓ Questions:
1. Why do we need leap years?
2. How does the elliptical orbit affect seasons?
3. What factors influence Earth's orbital velocity?
```
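
The plain transcript lines in the file can be produced with a few lines of Python. This is a sketch of the format shown above, not the project's actual writer; `format_line` and `append_line` are hypothetical names.

```python
from datetime import datetime

# Labels as they appear in the file output above.
LABELS = {"mic": "MIC", "speaker": "SPEAKER"}


def format_line(source: str, text: str, when: datetime) -> str:
    """Render one transcript line as "[HH:MM:SS] LABEL: text"."""
    return f"[{when:%H:%M:%S}] {LABELS[source]}: {text.strip()}"


def append_line(path: str, line: str) -> None:
    """Append a line to the transcript file, creating it if needed."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(line + "\n")


print(format_line("mic", "User speaking into microphone",
                  datetime(2024, 1, 1, 14, 23, 15)))
# prints: [14:23:15] MIC: User speaking into microphone
```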

## Architecture

- **Audio Capture**: sounddevice with dual-stream support
- **Transcription**: faster-whisper (optimized Whisper implementation)
- **LLM**: Ollama for local inference
- **Format**: 16 kHz mono, 16-bit PCM
- **Processing**: Independent mic/speaker buffers with beam_size=3
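
To illustrate how independent buffers interact with `--interval` and `--min-duration`, here is a minimal standard-library sketch. It assumes raw 16-bit PCM in native byte order at 16 kHz mono, as listed above; the class name `PcmBuffer` is hypothetical and the real implementation may be organized differently.

```python
from array import array
from typing import List, Optional

SAMPLE_RATE = 16_000   # 16 kHz mono
BYTES_PER_SAMPLE = 2   # 16-bit PCM


class PcmBuffer:
    """Accumulates raw PCM for one source (microphone or speaker)."""

    def __init__(self, min_duration: float = 2.0) -> None:
        self.min_duration = min_duration
        self._data = bytearray()

    def append(self, chunk: bytes) -> None:
        """Add bytes delivered by the audio callback."""
        self._data.extend(chunk)

    @property
    def duration(self) -> float:
        """Seconds of audio currently buffered."""
        return len(self._data) / (SAMPLE_RATE * BYTES_PER_SAMPLE)

    def drain(self) -> Optional[List[float]]:
        """Return samples scaled to [-1.0, 1.0) and clear the buffer.

        Returns None, keeping the audio, while less than min_duration
        seconds have been collected.
        """
        if self.duration < self.min_duration:
            return None
        usable = len(self._data) - len(self._data) % BYTES_PER_SAMPLE
        samples = array("h")  # signed 16-bit, native byte order
        samples.frombytes(bytes(self._data[:usable]))
        del self._data[:usable]
        return [s / 32768.0 for s in samples]
```

One buffer per source keeps microphone and speaker audio apart: every `--interval` seconds each buffer is drained separately, and a source with too little audio is skipped until the next tick.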

## Contributing

Contributions welcome! Please open issues or submit pull requests.

## License

This project uses [Whisper](https://github.com/openai/whisper) (OpenAI), [faster-whisper](https://github.com/SYSTRAN/faster-whisper) (SYSTRAN), and [Ollama](https://ollama.ai).