# Verbatim Dicta2

Real-time audio transcription using Whisper AI with optional LLM analysis. Captures microphone and speaker audio simultaneously for comprehensive transcription.

## Features

- **Dual audio capture** - Record microphone and speaker output simultaneously
- **Real-time transcription** - Process audio as it's captured with Whisper models
- **LLM analysis** - Optional fact-checking and question generation via Ollama
- **Multi-language** - Support for 50+ languages
- **File output** - Save transcripts with timestamps and analysis
- **GPU acceleration** - CUDA support for faster processing

## Quick Start

```bash
# Install dependencies
pip install -r requirements.txt

# List audio devices
./run_transcribe.sh --list-devices

# Basic transcription
./run_transcribe.sh --model medium --language en

# With LLM analysis and file output
./run_transcribe.sh --model medium --enable-llm --output transcript.txt
```

## Requirements

- **OS**: Windows 10/11 or Linux
- **Python**: 3.8+
- **Audio**: Loopback device (Stereo Mix/VB-Cable on Windows, a PulseAudio/PipeWire monitor source on Linux)
- **Optional**: CUDA-capable GPU, Ollama for LLM features

## Installation

### 1. Install Dependencies

```bash
pip install -r requirements.txt
```

### 2. GPU Support (Optional)

For CUDA 11.8:

```bash
pip install torch==2.8.0+cu118 --index-url https://download.pytorch.org/whl/cu118
```

For CUDA 12.1:

```bash
pip install torch==2.8.0+cu121 --index-url https://download.pytorch.org/whl/cu121
```

### 3. Audio Setup

**Linux (PulseAudio/PipeWire):**

```bash
# List devices to find your monitor device
./run_transcribe.sh --list-devices

# Use with monitor device
./run_transcribe.sh --monitor "alsa_output.monitor"
```

**Windows:**

- Enable "Stereo Mix" in Sound settings, or
- Install VB-Cable from [vb-audio.com](https://vb-audio.com/Cable/)

### 4. LLM Support (Optional)

```bash
# Install Ollama from ollama.ai
ollama pull qwen2.5:3b
```

## Usage

### Command Line Options

```bash
python transcribe.py [OPTIONS]

Options:
  --model {tiny,base,small,medium,large}  Whisper model (default: tiny)
  --language CODE                         Language code (default: en)
  --mic DEVICE                            Microphone device name
  --monitor DEVICE                        Speaker monitor device name
  --interval SECONDS                      Processing interval (default: 5.0)
  --min-duration SECONDS                  Minimum audio duration (default: 2.0)
  --enable-llm                            Enable LLM analysis
  --llm-model MODEL                       Ollama model (default: qwen2.5:3b)
  --output FILE                           Save transcript to file
  --force-cpu                             Force CPU processing
  --list-devices                          List audio devices
```

### Examples

```bash
# Dutch transcription with LLM
./run_transcribe.sh --model medium --language nl --enable-llm

# High-quality meeting transcription
./run_transcribe.sh --model large --interval 8 --output meeting.txt

# Fast real-time transcription
./run_transcribe.sh --model tiny --interval 3 --min-duration 2

# Specific devices
./run_transcribe.sh --mic "USB Mic" --monitor "Monitor of Speakers"
```

## Model Performance

| Model  | Size   | Speed    | Quality | Use Case               |
|--------|--------|----------|---------|------------------------|
| tiny   | 75 MB  | Fastest  | Basic   | Real-time, low latency |
| base   | 145 MB | Fast     | Good    | General use            |
| small  | 485 MB | Moderate | Better  | Balanced               |
| medium | 1.5 GB | Slow     | Great   | High accuracy          |
| large  | 3 GB   | Slowest  | Best    | Maximum quality        |

## Troubleshooting

**No audio devices found:**

```bash
# List all devices
./run_transcribe.sh --list-devices

# Specify devices explicitly
./run_transcribe.sh --mic "device_name" --monitor "monitor_name"
```

**CUDA errors:**

```bash
# Force CPU processing
./run_transcribe.sh --force-cpu
```

**Ollama connection failed:**

```bash
# Start Ollama service
ollama serve

# Pull required model
ollama pull qwen2.5:3b
```

**Poor transcription quality:**

- Use a larger model: `--model medium` or `--model large`
- Increase the interval: `--interval 10`
- Specify the language: `--language nl`
- Ensure good audio quality (reduce background noise)

## Output Format

### Standard Output

```
🎤 [14:23:15] User speaking into microphone
🔊 [14:23:18] Audio from speakers or system
```

### With LLM Analysis

```
🎤 [14:23:15] The Earth orbits the Sun in 365 days.
   ✅ FACTUAL (0.98): Scientifically accurate orbital period.
   ❓ Questions:
      1. Why do we need leap years?
      2. How does the elliptical orbit affect seasons?
      3. What factors influence Earth's orbital velocity?
```

### File Output

```
[14:23:15] MIC: User speaking into microphone
[14:23:18] SPEAKER: Audio from speakers

======================================================================
[14:23:25] MIC: The Earth orbits the Sun in 365 days.
📊 Fact Check: FACTUAL (confidence: 0.98)
💡 Scientifically accurate orbital period.
❓ Questions:
   1. Why do we need leap years?
   2. How does the elliptical orbit affect seasons?
   3. What factors influence Earth's orbital velocity?
```

## Architecture

- **Audio Capture**: sounddevice with dual-stream support
- **Transcription**: faster-whisper (optimized Whisper implementation)
- **LLM**: Ollama for local inference
- **Format**: 16 kHz mono, 16-bit PCM
- **Processing**: Independent mic/speaker buffers with beam_size=3

## Contributing

Contributions welcome! Please open issues or submit pull requests.

## License

Uses [Whisper](https://github.com/openai/whisper) (OpenAI), [faster-whisper](https://github.com/SYSTRAN/faster-whisper) (SYSTRAN), and [Ollama](https://ollama.ai).
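The Architecture section describes independent mic/speaker buffers fed with 16 kHz mono, 16-bit PCM, and the options list a processing `--interval` (default 5.0 s) and a `--min-duration` (default 2.0 s). The following is a minimal plain-Python sketch of how such a per-source buffer might behave; it is not the project's actual code. The class name `AudioBuffer`, the `tick()` clock, and the choice to keep (rather than discard) audio shorter than `--min-duration` are all assumptions made for illustration.

```python
import array

SAMPLE_RATE = 16_000  # 16 kHz mono, as listed under Architecture


class AudioBuffer:
    """Hypothetical per-source buffer: create one for the mic, one for the speaker."""

    def __init__(self, interval: float = 5.0, min_duration: float = 2.0):
        self.interval = interval          # mirrors --interval
        self.min_duration = min_duration  # mirrors --min-duration
        self.samples = array.array("h")   # signed 16-bit PCM samples
        self.elapsed = 0.0                # seconds since the last processing attempt

    def add(self, pcm_bytes: bytes) -> None:
        """Append raw native-endian int16 PCM as delivered by the capture callback."""
        self.samples.frombytes(pcm_bytes)

    def duration(self) -> float:
        """Seconds of audio currently buffered."""
        return len(self.samples) / SAMPLE_RATE

    def tick(self, dt: float):
        """Advance the clock by dt seconds.

        Returns a list of floats in [-1.0, 1.0) when a chunk is due, else None.
        Assumption: audio shorter than min_duration is kept for the next interval.
        """
        self.elapsed += dt
        if self.elapsed < self.interval:
            return None
        self.elapsed = 0.0
        if self.duration() < self.min_duration:
            return None  # too short to transcribe reliably; keep accumulating
        # Whisper-style models expect float audio, so scale int16 into [-1.0, 1.0)
        chunk = [s / 32768.0 for s in self.samples]
        self.samples = array.array("h")
        return chunk
```

Keeping one `AudioBuffer` per source is what makes the mic and speaker streams independent. A real pipeline would more likely use NumPy (`np.frombuffer(pcm, dtype=np.int16).astype(np.float32) / 32768.0`) and pass the resulting 16 kHz float32 array to faster-whisper's `WhisperModel.transcribe(audio, language="en", beam_size=3)`.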
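For the optional LLM analysis, Ollama serves a local REST API (by default at `http://localhost:11434`). The sketch below shows one possible way to send a transcript line to its `/api/generate` endpoint and print the result in the `✅ FACTUAL (0.98): ...` style shown under Output Format. Only the endpoint and the `model`, `prompt`, `stream`, and `format` request fields come from Ollama's API; the prompt wording, the JSON keys `verdict`/`confidence`/`explanation`, and the function names are hypothetical and do not describe how this project actually prompts the model.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_fact_check_request(text: str, model: str = "qwen2.5:3b") -> urllib.request.Request:
    """Build a non-streaming /api/generate request that asks for JSON output."""
    # Hypothetical prompt: the keys it asks for are an assumption, not the project's schema.
    prompt = (
        "Fact-check the following statement. Reply with JSON containing "
        '"verdict" (FACTUAL, FALSE or UNCERTAIN), "confidence" (0-1) and '
        '"explanation".\n\nStatement: ' + text
    )
    body = {"model": model, "prompt": prompt, "stream": False, "format": "json"}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def format_fact_check(api_reply: dict) -> str:
    """Turn a decoded Ollama reply into a console line like the README example."""
    # With "stream": False, the generated text arrives in the "response" field.
    result = json.loads(api_reply["response"])
    icon = "✅" if result["verdict"] == "FACTUAL" else "⚠️"
    return f'{icon} {result["verdict"]} ({result["confidence"]:.2f}): {result["explanation"]}'


def fact_check(text: str, model: str = "qwen2.5:3b") -> str:
    """Send the request to a running `ollama serve` and format the answer."""
    with urllib.request.urlopen(build_fact_check_request(text, model), timeout=60) as resp:
        return format_fact_check(json.load(resp))
```

With `ollama serve` running and `qwen2.5:3b` pulled, `print(fact_check("The Earth orbits the Sun in 365 days."))` should print a line in that format, provided the model actually returns the requested keys. Small models do not always comply, so a more robust version would catch `KeyError` and `json.JSONDecodeError`.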