Verbatim Dicta1
Real-time audio transcription using Whisper AI with optional LLM analysis. Captures microphone and speaker audio simultaneously for comprehensive transcription.
Features
- Dual audio capture - Record microphone and speaker output simultaneously
- Real-time transcription - Process audio as it's captured with Whisper models
- LLM analysis - Optional fact-checking and question generation via Ollama
- Multi-language - Support for 50+ languages
- File output - Save transcripts with timestamps and analysis
- GPU acceleration - CUDA support for faster processing
Quick Start
# Install dependencies
pip install -r requirements.txt
# List audio devices
./run_transcribe.sh --list-devices
# Basic transcription
./run_transcribe.sh --model medium --language en
# With LLM analysis and file output
./run_transcribe.sh --model medium --enable-llm --output transcript.txt
Requirements
- OS: Windows 10/11 or Linux
- Python: 3.8+
- Audio: Loopback device (Stereo Mix/VB-Cable on Windows, PulseAudio on Linux)
- Optional: CUDA-capable GPU, Ollama for LLM features
Installation
1. Install Dependencies
pip install -r requirements.txt
2. GPU Support (Optional)
For CUDA 11.8:
pip install torch==2.8.0+cu118 --index-url https://download.pytorch.org/whl/cu118
For CUDA 12.1:
pip install torch==2.8.0+cu121 --index-url https://download.pytorch.org/whl/cu121
3. Audio Setup
Linux (PulseAudio/PipeWire):
# List devices to find your monitor device
./run_transcribe.sh --list-devices
# Use with monitor device
./run_transcribe.sh --monitor "alsa_output.monitor"
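To make the "find your monitor device" step concrete, here is a minimal Python sketch of filtering a device list for loopback candidates. It assumes entries shaped like those returned by `sounddevice.query_devices()` (dicts with `name` and `max_input_channels` keys). The helper name `find_monitor_devices` and the sample data are hypothetical; the project's own `--list-devices` logic is not shown here and may differ.

```python
from typing import Dict, List


def find_monitor_devices(devices: List[Dict]) -> List[str]:
    """Return names of input-capable devices whose name contains 'monitor'."""
    return [
        d["name"]
        for d in devices
        if "monitor" in d["name"].lower() and d.get("max_input_channels", 0) > 0
    ]


# Hypothetical sample data; on a real system this list would come from
# sounddevice.query_devices().
sample = [
    {"name": "USB Mic", "max_input_channels": 1},
    {"name": "Monitor of Speakers", "max_input_channels": 2},
    {"name": "HDMI Output", "max_input_channels": 0},
]

print(find_monitor_devices(sample))
```

Any name it returns is a candidate value for `--monitor`.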
Windows:
- Enable "Stereo Mix" in Sound settings, or
- Install VB-Cable from vb-audio.com
4. LLM Support (Optional)
# Install Ollama from ollama.ai
ollama pull qwen2.5:3b
Usage
Command Line Options
python transcribe.py [OPTIONS]
Options:
--model {tiny,base,small,medium,large} Whisper model (default: tiny)
--language CODE Language code (default: en)
--mic DEVICE Microphone device name
--monitor DEVICE Speaker monitor device name
--interval SECONDS Processing interval (default: 5.0)
--min-duration SECONDS Minimum audio duration (default: 2.0)
--enable-llm Enable LLM analysis
--llm-model MODEL Ollama model (default: qwen2.5:3b)
--output FILE Save transcript to file
--force-cpu Force CPU processing
--list-devices List audio devices
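The option list above maps naturally onto `argparse`. The sketch below mirrors the documented flags and defaults; it illustrates how such a parser could be declared and is not necessarily how `transcribe.py` defines it. The function name `build_parser` is an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the documented options and defaults."""
    parser = argparse.ArgumentParser(prog="transcribe.py")
    parser.add_argument("--model", default="tiny",
                        choices=["tiny", "base", "small", "medium", "large"],
                        help="Whisper model")
    parser.add_argument("--language", default="en", metavar="CODE",
                        help="Language code")
    parser.add_argument("--mic", metavar="DEVICE",
                        help="Microphone device name")
    parser.add_argument("--monitor", metavar="DEVICE",
                        help="Speaker monitor device name")
    parser.add_argument("--interval", type=float, default=5.0,
                        metavar="SECONDS", help="Processing interval")
    parser.add_argument("--min-duration", type=float, default=2.0,
                        metavar="SECONDS", help="Minimum audio duration")
    parser.add_argument("--enable-llm", action="store_true",
                        help="Enable LLM analysis")
    parser.add_argument("--llm-model", default="qwen2.5:3b", metavar="MODEL",
                        help="Ollama model")
    parser.add_argument("--output", metavar="FILE",
                        help="Save transcript to file")
    parser.add_argument("--force-cpu", action="store_true",
                        help="Force CPU processing")
    parser.add_argument("--list-devices", action="store_true",
                        help="List audio devices")
    return parser


# Parse an explicit argument list so the example does not depend on sys.argv.
args = build_parser().parse_args(
    ["--model", "medium", "--enable-llm", "--output", "transcript.txt"]
)
print(args.model, args.language, args.interval, args.enable_llm, args.output)
```

Options that are not passed fall back to the documented defaults, so `args.language` is `en` and `args.interval` is `5.0` in this call.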
Examples
# Dutch transcription with LLM
./run_transcribe.sh --model medium --language nl --enable-llm
# High-quality meeting transcription
./run_transcribe.sh --model large --interval 8 --output meeting.txt
# Fast real-time transcription
./run_transcribe.sh --model tiny --interval 3 --min-duration 2
# Specific devices
./run_transcribe.sh --mic "USB Mic" --monitor "Monitor of Speakers"
Model Performance
| Model | Size | Speed | Quality | Use Case |
|---|---|---|---|---|
| tiny | 75 MB | Fastest | Basic | Real-time, low latency |
| base | 145 MB | Fast | Good | General use |
| small | 485 MB | Moderate | Better | Balanced |
| medium | 1.5 GB | Slow | Great | High accuracy |
| large | 3 GB | Slowest | Best | Maximum quality |
Troubleshooting
No audio devices found:
# List all devices
./run_transcribe.sh --list-devices
# Specify devices explicitly
./run_transcribe.sh --mic "device_name" --monitor "monitor_name"
CUDA errors:
# Force CPU processing
./run_transcribe.sh --force-cpu
Ollama connection failed:
# Start Ollama service
ollama serve
# Pull required model
ollama pull qwen2.5:3b
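To check from Python whether the Ollama service is reachable before enabling LLM analysis, a small probe like the following can help. It is a sketch that assumes Ollama listens on its default address `http://localhost:11434` and queries the `/api/tags` endpoint, which lists locally available models. The function name `ollama_is_reachable` is hypothetical and not part of this project.

```python
import urllib.request


def ollama_is_reachable(base_url: str = "http://localhost:11434",
                        timeout: float = 2.0) -> bool:
    """Return True if an HTTP GET to <base_url>/api/tags succeeds."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError, refused connections, and timeouts all derive
        # from OSError, so any of them counts as "not reachable".
        return False


if not ollama_is_reachable():
    print("Ollama is not reachable; try running: ollama serve")
```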
Poor transcription quality:
- Use a larger model: --model medium or --model large
- Increase interval: --interval 10
- Specify language: --language nl
- Ensure good audio quality (reduce background noise)
Output Format
Standard Output
🎤 [14:23:15] User speaking into microphone
🔊 [14:23:18] Audio from speakers or system
With LLM Analysis
🎤 [14:23:15] The Earth orbits the Sun in 365 days.
✅ FACTUAL (0.98): Scientifically accurate orbital period.
❓ Questions:
1. Why do we need leap years?
2. How does the elliptical orbit affect seasons?
3. What factors influence Earth's orbital velocity?
File Output
[14:23:15] MIC: User speaking into microphone
[14:23:18] SPEAKER: Audio from speakers
======================================================================
[14:23:25] MIC: The Earth orbits the Sun in 365 days.
📊 Fact Check: FACTUAL (confidence: 0.98)
💡 Scientifically accurate orbital period.
❓ Questions:
1. Why do we need leap years?
2. How does the elliptical orbit affect seasons?
3. What factors influence Earth's orbital velocity?
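The file layout above can be reproduced with a small formatter. This is an illustrative sketch only: the function name `format_entry` and its parameters are assumptions, and the project's own writer may structure this differently.

```python
from datetime import time
from typing import List, Optional


def format_entry(at: time, source: str, text: str,
                 verdict: Optional[str] = None,
                 confidence: Optional[float] = None,
                 explanation: Optional[str] = None,
                 questions: Optional[List[str]] = None) -> str:
    """Format one transcript entry, with optional LLM analysis lines."""
    stamp = at.strftime("%H:%M:%S")
    lines = [f"[{stamp}] {source}: {text}"]
    if verdict is not None and confidence is not None:
        lines.append(f"📊 Fact Check: {verdict} (confidence: {confidence:.2f})")
    if explanation:
        lines.append(f"💡 {explanation}")
    if questions:
        lines.append("❓ Questions:")
        lines.extend(f"{i}. {q}" for i, q in enumerate(questions, start=1))
    return "\n".join(lines)


plain = format_entry(time(14, 23, 15), "MIC", "User speaking into microphone")
analyzed = format_entry(
    time(14, 23, 25), "MIC", "The Earth orbits the Sun in 365 days.",
    verdict="FACTUAL", confidence=0.98,
    explanation="Scientifically accurate orbital period.",
    questions=["Why do we need leap years?"],
)
print(plain)
print(analyzed)
```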
Architecture
- Audio Capture: sounddevice with dual-stream support
- Transcription: faster-whisper (optimized Whisper implementation)
- LLM: Ollama for local inference
- Format: 16kHz mono, 16-bit PCM
- Processing: Independent mic/speaker buffers with beam_size=3
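As a rough illustration of independent mic/speaker buffers, the sketch below accumulates 16 kHz mono 16-bit PCM bytes per source and releases a chunk only once it holds at least the minimum duration. The class name `PcmBuffer` and its methods are hypothetical, and keeping short audio for the next interval (rather than discarding it) is an assumption about what `--min-duration` means. The actual implementation may differ.

```python
from typing import Optional

SAMPLE_RATE = 16_000      # 16 kHz mono, as listed above
BYTES_PER_SAMPLE = 2      # 16-bit PCM


class PcmBuffer:
    """Accumulates raw PCM bytes for a single audio source."""

    def __init__(self, min_duration: float = 2.0) -> None:
        self._data = bytearray()
        self._min_bytes = int(min_duration * SAMPLE_RATE) * BYTES_PER_SAMPLE

    def append(self, chunk: bytes) -> None:
        self._data.extend(chunk)

    @property
    def duration(self) -> float:
        """Seconds of audio currently buffered."""
        return len(self._data) / (SAMPLE_RATE * BYTES_PER_SAMPLE)

    def take_if_ready(self) -> Optional[bytes]:
        """Return and clear the buffered audio if it is long enough, else None."""
        if len(self._data) < self._min_bytes:
            return None
        chunk = bytes(self._data)
        self._data.clear()
        return chunk


# One buffer per source, so a quiet speaker stream never delays the mic.
buffers = {"MIC": PcmBuffer(), "SPEAKER": PcmBuffer()}
buffers["MIC"].append(b"\x00\x00" * SAMPLE_RATE * 3)   # 3 seconds of silence
buffers["SPEAKER"].append(b"\x00\x00" * SAMPLE_RATE)   # 1 second of silence

mic_seconds = buffers["MIC"].duration
ready = {name: buf.take_if_ready() for name, buf in buffers.items()}
print(mic_seconds, ready["SPEAKER"] is None)
```

In a real pipeline, each chunk returned by `take_if_ready` would be handed to the transcriber once per processing interval.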
Contributing
Contributions welcome! Please open issues or submit pull requests.
License
Uses Whisper (OpenAI), faster-whisper (SYSTRAN), and Ollama.