# Verbatim Dicta2

Real-time audio transcription using Whisper AI with optional LLM analysis. Captures microphone and speaker audio simultaneously for comprehensive transcription.

## Features

- **Dual audio capture** - Record microphone and speaker output simultaneously
- **Real-time transcription** - Process audio as it's captured with Whisper models
- **LLM analysis** - Optional fact-checking and question generation via Ollama
- **Multi-language** - Support for 50+ languages
- **File output** - Save transcripts with timestamps and analysis
- **GPU acceleration** - CUDA support for faster processing

## Quick Start

```bash
# Install dependencies
pip install -r requirements.txt

# List audio devices
./run_transcribe.sh --list-devices

# Basic transcription
./run_transcribe.sh --model medium --language en

# With LLM analysis and file output
./run_transcribe.sh --model medium --enable-llm --output transcript.txt
```

## Requirements

- **OS**: Windows 10/11 or Linux
- **Python**: 3.8+
- **Audio**: Loopback device (Stereo Mix/VB-Cable on Windows, PulseAudio on Linux)
- **Optional**: CUDA-capable GPU, Ollama for LLM features

## Installation

### 1. Install Dependencies

```bash
pip install -r requirements.txt
```

### 2. GPU Support (Optional)

For CUDA 11.8:
```bash
pip install torch==2.8.0+cu118 --index-url https://download.pytorch.org/whl/cu118
```

For CUDA 12.1:
```bash
pip install torch==2.8.0+cu121 --index-url https://download.pytorch.org/whl/cu121
```

### 3. Audio Setup

**Linux (PulseAudio/PipeWire):**
```bash
# List devices to find your monitor device
./run_transcribe.sh --list-devices

# Use with monitor device
./run_transcribe.sh --monitor "alsa_output.monitor"
```

**Windows:**
- Enable "Stereo Mix" in Sound settings, or
- Install VB-Cable from [vb-audio.com](https://vb-audio.com/Cable/)

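To make the device lookup more concrete, here is a minimal sketch of how a loopback (monitor) input could be picked out of a device list. It assumes entries shaped like the dictionaries returned by `sounddevice.query_devices()` (with `name` and `max_input_channels` keys). The helper name `find_monitor_devices` and the name hints it matches on are illustrative assumptions, not the project's actual logic.

```python
from typing import Dict, List

# Hypothetical name hints; real loopback devices may be named differently.
LOOPBACK_HINTS = ("monitor", "stereo mix", "cable output")


def find_monitor_devices(devices: List[Dict]) -> List[str]:
    """Return names of input-capable devices that look like loopback sources."""
    matches = []
    for dev in devices:
        name = dev.get("name", "")
        has_input = dev.get("max_input_channels", 0) > 0
        if has_input and any(hint in name.lower() for hint in LOOPBACK_HINTS):
            matches.append(name)
    return matches


if __name__ == "__main__":
    # Sample data standing in for sounddevice.query_devices() output.
    sample = [
        {"name": "USB Mic", "max_input_channels": 1},
        {"name": "Monitor of Speakers", "max_input_channels": 2},
        {"name": "Speakers", "max_input_channels": 0},
    ]
    print(find_monitor_devices(sample))
```

A name returned by such a helper is the kind of value you would pass to `--monitor`.
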
### 4. LLM Support (Optional)

```bash
# Install Ollama from ollama.ai
ollama pull qwen2.5:3b
```

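Once the model is pulled, a local Ollama server can be queried over HTTP. The sketch below shows one way a transcript line might be sent to Ollama's `/api/generate` endpoint on its default port (11434). The function names and the prompt wording are assumptions made for illustration; how this project actually phrases its fact-checking prompt is not documented here.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_request(text: str, model: str = "qwen2.5:3b") -> urllib.request.Request:
    """Build a non-streaming generate request asking the model to assess `text`."""
    # The prompt wording is an illustrative assumption.
    prompt = (
        "Fact-check the following statement and suggest three follow-up "
        f"questions:\n{text}"
    )
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def analyze(text: str, timeout: float = 30.0) -> str:
    """Send the request and return the model's reply (requires `ollama serve`)."""
    with urllib.request.urlopen(build_request(text), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

Calling `analyze(...)` only works while the Ollama service is running; `build_request(...)` can be inspected without a server.
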
## Usage

### Command Line Options

```bash
python transcribe.py [OPTIONS]

Options:
  --model {tiny,base,small,medium,large}  Whisper model (default: tiny)
  --language CODE                         Language code (default: en)
  --mic DEVICE                            Microphone device name
  --monitor DEVICE                        Speaker monitor device name
  --interval SECONDS                      Processing interval (default: 5.0)
  --min-duration SECONDS                  Minimum audio duration (default: 2.0)
  --enable-llm                            Enable LLM analysis
  --llm-model MODEL                       Ollama model (default: qwen2.5:3b)
  --output FILE                           Save transcript to file
  --force-cpu                             Force CPU processing
  --list-devices                          List audio devices
```

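For readers who want to script against these options, the following is a rough `argparse` sketch that mirrors the flags and defaults listed above. It is a reconstruction from the option table only; the function name `build_parser` is hypothetical, and the real `transcribe.py` may define its arguments differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser mirroring the documented options (a sketch, not the project's code)."""
    parser = argparse.ArgumentParser(prog="transcribe.py")
    parser.add_argument(
        "--model",
        choices=["tiny", "base", "small", "medium", "large"],
        default="tiny",
        help="Whisper model",
    )
    parser.add_argument("--language", metavar="CODE", default="en", help="Language code")
    parser.add_argument("--mic", metavar="DEVICE", help="Microphone device name")
    parser.add_argument("--monitor", metavar="DEVICE", help="Speaker monitor device name")
    parser.add_argument("--interval", metavar="SECONDS", type=float, default=5.0,
                        help="Processing interval")
    parser.add_argument("--min-duration", metavar="SECONDS", type=float, default=2.0,
                        help="Minimum audio duration")
    parser.add_argument("--enable-llm", action="store_true", help="Enable LLM analysis")
    parser.add_argument("--llm-model", metavar="MODEL", default="qwen2.5:3b",
                        help="Ollama model")
    parser.add_argument("--output", metavar="FILE", help="Save transcript to file")
    parser.add_argument("--force-cpu", action="store_true", help="Force CPU processing")
    parser.add_argument("--list-devices", action="store_true", help="List audio devices")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--model", "medium", "--enable-llm"])
    print(args.model, args.interval, args.enable_llm)
```
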
### Examples

```bash
# Dutch transcription with LLM
./run_transcribe.sh --model medium --language nl --enable-llm

# High-quality meeting transcription
./run_transcribe.sh --model large --interval 8 --output meeting.txt

# Fast real-time transcription
./run_transcribe.sh --model tiny --interval 3 --min-duration 2

# Specific devices
./run_transcribe.sh --mic "USB Mic" --monitor "Monitor of Speakers"
```

## Model Performance

| Model  | Size   | Speed    | Quality | Use Case               |
|--------|--------|----------|---------|------------------------|
| tiny   | 75 MB  | Fastest  | Basic   | Real-time, low latency |
| base   | 145 MB | Fast     | Good    | General use            |
| small  | 485 MB | Moderate | Better  | Balanced               |
| medium | 1.5 GB | Slow     | Great   | High accuracy          |
| large  | 3 GB   | Slowest  | Best    | Maximum quality        |

## Troubleshooting

**No audio devices found:**
```bash
# List all devices
./run_transcribe.sh --list-devices

# Specify devices explicitly
./run_transcribe.sh --mic "device_name" --monitor "monitor_name"
```

**CUDA errors:**
```bash
# Force CPU processing
./run_transcribe.sh --force-cpu
```

**Ollama connection failed:**
```bash
# Start Ollama service
ollama serve

# Pull required model
ollama pull qwen2.5:3b
```

**Poor transcription quality:**
- Use a larger model: `--model medium` or `--model large`
- Increase the processing interval: `--interval 10`
- Specify the spoken language explicitly: `--language nl`
- Improve the input audio (reduce background noise)

## Output Format

### Standard Output
```
🎤 [14:23:15] User speaking into microphone
🔊 [14:23:18] Audio from speakers or system
```

### With LLM Analysis
```
🎤 [14:23:15] The Earth orbits the Sun in 365 days.
✅ FACTUAL (0.98): Scientifically accurate orbital period.
❓ Questions:
1. Why do we need leap years?
2. How does the elliptical orbit affect seasons?
3. What factors influence Earth's orbital velocity?
```

### File Output
```
[14:23:15] MIC: User speaking into microphone
[14:23:18] SPEAKER: Audio from speakers

======================================================================
[14:23:25] MIC: The Earth orbits the Sun in 365 days.

📊 Fact Check: FACTUAL (confidence: 0.98)
💡 Scientifically accurate orbital period.

❓ Questions:
1. Why do we need leap years?
2. How does the elliptical orbit affect seasons?
3. What factors influence Earth's orbital velocity?
```

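The plain transcript lines in the file output follow a simple `[HH:MM:SS] SOURCE: text` pattern. As a hedged illustration of that pattern, the sketch below formats and appends such lines; the function names `format_entry` and `append_entry` are hypothetical and the project's own writer may differ.

```python
from datetime import datetime
from typing import Optional


def format_entry(source: str, text: str, when: Optional[datetime] = None) -> str:
    """Format one transcript line as `[HH:MM:SS] SOURCE: text`."""
    when = when or datetime.now()
    return f"[{when:%H:%M:%S}] {source.upper()}: {text}"


def append_entry(path: str, source: str, text: str,
                 when: Optional[datetime] = None) -> None:
    """Append a formatted line to the transcript file, creating it if needed."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(format_entry(source, text, when) + "\n")


if __name__ == "__main__":
    print(format_entry("mic", "User speaking into microphone"))
```

Opening the file in append mode means a crash mid-session leaves earlier lines on disk, which seems a sensible default for long recordings.
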
## Architecture

- **Audio Capture**: sounddevice with dual-stream support
- **Transcription**: faster-whisper (optimized Whisper implementation)
- **LLM**: Ollama for local inference
- **Format**: 16kHz mono, 16-bit PCM
- **Processing**: Independent mic/speaker buffers with beam_size=3

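The idea of independent buffers combined with a minimum duration can be sketched in a few lines. This example assumes 16 kHz mono samples, as listed above, and assumes that audio shorter than the minimum duration is kept in the buffer rather than discarded. The class name `StreamBuffer` and that retention behavior are illustrative guesses, not a description of the actual implementation.

```python
from typing import List, Optional

SAMPLE_RATE = 16_000  # 16 kHz mono, per the format listed above


class StreamBuffer:
    """Accumulates samples for one source (mic or speaker) until drained."""

    def __init__(self, min_duration: float = 2.0) -> None:
        self.min_duration = min_duration
        self._samples: List[int] = []

    def add(self, chunk: List[int]) -> None:
        """Append newly captured samples to the buffer."""
        self._samples.extend(chunk)

    @property
    def duration(self) -> float:
        """Seconds of audio currently buffered."""
        return len(self._samples) / SAMPLE_RATE

    def drain(self) -> Optional[List[int]]:
        """Return and clear the samples if long enough, else None (keep buffering)."""
        if self.duration < self.min_duration:
            return None
        samples, self._samples = self._samples, []
        return samples


# One buffer per source, so mic and speaker audio are never mixed.
buffers = {"mic": StreamBuffer(), "speaker": StreamBuffer()}
```

In a real loop, `drain()` would be called once per `--interval` for each buffer, and any returned samples handed to the transcriber.
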
## Contributing

Contributions welcome! Please open issues or submit pull requests.

## License

Uses [Whisper](https://github.com/openai/whisper) (OpenAI), [faster-whisper](https://github.com/SYSTRAN/faster-whisper) (SYSTRAN), and [Ollama](https://ollama.ai).