mike 03b018a582 **feat: update project name and cleanup startup script**
- Rename project from "Verbatim Dicta1" to "Verbatim Dicta2"
- Remove redundant comments in `start_transcribe.sh`
- Streamline virtual environment activation process
2025-12-17 23:09:56 +01:00

# Verbatim Dicta2
Real-time audio transcription using Whisper, with optional LLM analysis. Captures microphone and speaker (loopback) audio simultaneously and transcribes both streams.
## Features
- **Dual audio capture** - Record microphone and speaker output simultaneously
- **Real-time transcription** - Process audio as it's captured with Whisper models
- **LLM analysis** - Optional fact-checking and question generation via Ollama
- **Multi-language** - Support for 50+ languages
- **File output** - Save transcripts with timestamps and analysis
- **GPU acceleration** - CUDA support for faster processing
## Quick Start
```bash
# Install dependencies
pip install -r requirements.txt
# List audio devices
./run_transcribe.sh --list-devices
# Basic transcription
./run_transcribe.sh --model medium --language en
# With LLM analysis and file output
./run_transcribe.sh --model medium --enable-llm --output transcript.txt
```
## Requirements
- **OS**: Windows 10/11 or Linux
- **Python**: 3.8+
- **Audio**: Loopback device (Stereo Mix or VB-Cable on Windows, a PulseAudio/PipeWire monitor source on Linux)
- **Optional**: CUDA-capable GPU, Ollama for LLM features
## Installation
### 1. Install Dependencies
```bash
pip install -r requirements.txt
```
### 2. GPU Support (Optional)
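For CUDA 11.8:
```bash
# No version pin: pip picks the newest torch build published for cu118
pip install torch --index-url https://download.pytorch.org/whl/cu118
```
For CUDA 12.1:
```bash
# No version pin: pip picks the newest torch build published for cu121
pip install torch --index-url https://download.pytorch.org/whl/cu121
```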
### 3. Audio Setup
**Linux (PulseAudio/PipeWire):**
```bash
# List devices to find your monitor device
./run_transcribe.sh --list-devices
# Use with monitor device
./run_transcribe.sh --monitor "alsa_output.monitor"
```
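Monitor sources often carry "monitor" in their name, which makes them easy to pick out of a device listing. As a rough sketch (not the project's actual device-selection logic), the helper below filters a list of device entries shaped like the dictionaries returned by `sounddevice.query_devices()`. The function name `find_monitor_devices` and the sample entries are illustrative assumptions.

```python
def find_monitor_devices(devices):
    """Return names of input-capable devices whose name contains 'monitor'."""
    return [
        d["name"]
        for d in devices
        if d.get("max_input_channels", 0) > 0 and "monitor" in d["name"].lower()
    ]


# Hypothetical sample entries; real ones would come from sounddevice.query_devices().
sample_devices = [
    {"name": "USB Mic", "max_input_channels": 1},
    {"name": "Monitor of Speakers", "max_input_channels": 2},
    {"name": "HDMI Output", "max_input_channels": 0},
]
print(find_monitor_devices(sample_devices))  # ['Monitor of Speakers']
```

A name found this way can then be passed to `--monitor`.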
**Windows:**
- Enable "Stereo Mix" in Sound settings, or
- Install VB-Cable from [vb-audio.com](https://vb-audio.com/Cable/)
### 4. LLM Support (Optional)
```bash
# Install Ollama from ollama.ai
ollama pull qwen2.5:3b
```
## Usage
### Command Line Options
```bash
python transcribe.py [OPTIONS]

Options:
  --model {tiny,base,small,medium,large}  Whisper model (default: tiny)
  --language CODE                         Language code (default: en)
  --mic DEVICE                            Microphone device name
  --monitor DEVICE                        Speaker monitor device name
  --interval SECONDS                      Processing interval (default: 5.0)
  --min-duration SECONDS                  Minimum audio duration (default: 2.0)
  --enable-llm                            Enable LLM analysis
  --llm-model MODEL                       Ollama model (default: qwen2.5:3b)
  --output FILE                           Save transcript to file
  --force-cpu                             Force CPU processing
  --list-devices                          List audio devices
```
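For readers who want to script against these options, here is a minimal `argparse` sketch that mirrors the flags and defaults listed above. It is an illustration only: the real `transcribe.py` may declare its parser differently, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser():
    """Declare the documented options with their documented defaults."""
    parser = argparse.ArgumentParser(prog="transcribe.py")
    parser.add_argument("--model", default="tiny",
                        choices=["tiny", "base", "small", "medium", "large"])
    parser.add_argument("--language", default="en", metavar="CODE")
    parser.add_argument("--mic", metavar="DEVICE")
    parser.add_argument("--monitor", metavar="DEVICE")
    parser.add_argument("--interval", type=float, default=5.0, metavar="SECONDS")
    parser.add_argument("--min-duration", type=float, default=2.0, metavar="SECONDS")
    parser.add_argument("--enable-llm", action="store_true")
    parser.add_argument("--llm-model", default="qwen2.5:3b", metavar="MODEL")
    parser.add_argument("--output", metavar="FILE")
    parser.add_argument("--force-cpu", action="store_true")
    parser.add_argument("--list-devices", action="store_true")
    return parser


args = build_parser().parse_args(["--model", "medium", "--language", "nl", "--enable-llm"])
print(args.model, args.language, args.interval, args.enable_llm)  # medium nl 5.0 True
```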
### Examples
```bash
# Dutch transcription with LLM
./run_transcribe.sh --model medium --language nl --enable-llm
# High-quality meeting transcription
./run_transcribe.sh --model large --interval 8 --output meeting.txt
# Fast real-time transcription
./run_transcribe.sh --model tiny --interval 3 --min-duration 2
# Specific devices
./run_transcribe.sh --mic "USB Mic" --monitor "Monitor of Speakers"
```
## Model Performance
| Model | Size | Speed | Quality | Use Case |
|--------|--------|----------|---------|------------------------|
| tiny | 75 MB | Fastest | Basic | Real-time, low latency |
| base | 145 MB | Fast | Good | General use |
| small | 485 MB | Moderate | Better | Balanced |
| medium | 1.5 GB | Slow | Great | High accuracy |
| large | 3 GB | Slowest | Best | Maximum quality |
## Troubleshooting
**No audio devices found:**
```bash
# List all devices
./run_transcribe.sh --list-devices
# Specify devices explicitly
./run_transcribe.sh --mic "device_name" --monitor "monitor_name"
```
**CUDA errors:**
```bash
# Force CPU processing
./run_transcribe.sh --force-cpu
```
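The fallback behind `--force-cpu` can be pictured roughly as follows. This sketch assumes the GPU check goes through PyTorch's `torch.cuda.is_available()`; the project itself may detect CUDA another way, and `pick_device` is a hypothetical helper name.

```python
def pick_device(force_cpu=False):
    """Return "cuda" when a usable GPU is visible, otherwise "cpu"."""
    if force_cpu:
        return "cpu"
    try:
        import torch  # optional dependency; only needed for the GPU check
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


print(pick_device(force_cpu=True))  # cpu
```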
**Ollama connection failed:**
```bash
# Start Ollama service
ollama serve
# Pull required model
ollama pull qwen2.5:3b
```
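To tell a stopped server apart from a missing model, it can help to probe Ollama directly. The sketch below assumes Ollama is listening on its default address `http://localhost:11434` and uses the `/api/tags` endpoint, which lists locally available models. `ollama_is_up` is a hypothetical helper, not part of this project.

```python
import urllib.request


def ollama_is_up(base_url="http://localhost:11434", timeout=2.0):
    """Return True if an Ollama server answers on base_url."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # covers URLError, connection refused, and timeouts
        return False


if not ollama_is_up():
    print("Ollama is not reachable; try starting it with `ollama serve`.")
```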
**Poor transcription quality:**
- Use a larger model: `--model medium` or `--model large`
- Increase interval: `--interval 10`
- Specify language: `--language nl`
- Ensure good audio quality (reduce background noise)
## Output Format
### Standard Output
```
🎤 [14:23:15] User speaking into microphone
🔊 [14:23:18] Audio from speakers or system
```
### With LLM Analysis
```
🎤 [14:23:15] The Earth orbits the Sun in 365 days.
✅ FACTUAL (0.98): Scientifically accurate orbital period.
❓ Questions:
1. Why do we need leap years?
2. How does the elliptical orbit affect seasons?
3. What factors influence Earth's orbital velocity?
```
### File Output
```
[14:23:15] MIC: User speaking into microphone
[14:23:18] SPEAKER: Audio from speakers
======================================================================
[14:23:25] MIC: The Earth orbits the Sun in 365 days.
📊 Fact Check: FACTUAL (confidence: 0.98)
💡 Scientifically accurate orbital period.
❓ Questions:
1. Why do we need leap years?
2. How does the elliptical orbit affect seasons?
3. What factors influence Earth's orbital velocity?
```
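The plain transcript lines in the file output follow a `[HH:MM:SS] SOURCE: text` pattern. A minimal sketch of a formatter producing that shape is shown below; `format_line` is a hypothetical name and the project's own writer may differ.

```python
from datetime import datetime


def format_line(source, text, when=None):
    """Format one transcript line as "[HH:MM:SS] SOURCE: text"."""
    when = when or datetime.now()
    return f"[{when:%H:%M:%S}] {source.upper()}: {text}"


stamp = datetime(2025, 1, 1, 14, 23, 15)
print(format_line("mic", "User speaking into microphone", stamp))
# [14:23:15] MIC: User speaking into microphone
```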
## Architecture
- **Audio Capture**: sounddevice with dual-stream support
- **Transcription**: faster-whisper (optimized Whisper implementation)
- **LLM**: Ollama for local inference
- **Format**: 16 kHz mono, 16-bit PCM
- **Processing**: Independent mic/speaker buffers with beam_size=3
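
The independent mic/speaker buffers can be pictured with the sketch below. It assumes that `--min-duration` sets how much audio a source must accumulate before a chunk is handed to the transcriber; that reading, along with the `SourceBuffer` class and its method names, is an assumption for illustration and not the project's code.

```python
SAMPLE_RATE = 16_000  # 16 kHz mono, as listed above


class SourceBuffer:
    """Accumulates samples for one source and releases them in chunks."""

    def __init__(self, min_duration=2.0):
        self.min_samples = int(min_duration * SAMPLE_RATE)
        self.samples = []

    def add(self, chunk):
        self.samples.extend(chunk)

    def take_if_ready(self):
        """Return buffered audio and reset, or None if it is still too short."""
        if len(self.samples) < self.min_samples:
            return None
        ready, self.samples = self.samples, []
        return ready


# One buffer per source, so mic and speaker audio never mix.
buffers = {"mic": SourceBuffer(), "speaker": SourceBuffer()}
buffers["mic"].add([0] * SAMPLE_RATE)          # 1 s of silence: too short
print(buffers["mic"].take_if_ready() is None)  # True
buffers["mic"].add([0] * SAMPLE_RATE)          # now 2 s buffered
print(len(buffers["mic"].take_if_ready()))     # 32000
```

In the real pipeline, a released chunk would presumably go to faster-whisper with `beam_size=3`, once per processing interval.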
## Contributing
Contributions welcome! Please open issues or submit pull requests.
## License
This project uses [Whisper](https://github.com/openai/whisper) (OpenAI), [faster-whisper](https://github.com/SYSTRAN/faster-whisper) (SYSTRAN), and [Ollama](https://ollama.ai).