mike 4343b7a5a2 chore: update 6 file(s)

2025-12-17 22:30:41 +01:00

5.7 KiB

Verbatim Dicta

Real-time audio transcription using Whisper models, with optional LLM analysis. Captures microphone and speaker audio simultaneously, so both sides of a call or meeting end up in the transcript.

Features

Dual audio capture - Record microphone and speaker output simultaneously
Real-time transcription - Process audio as it's captured with Whisper models
LLM analysis - Optional fact-checking and question generation via Ollama
Multi-language - Support for 50+ languages
File output - Save transcripts with timestamps and analysis
GPU acceleration - CUDA support for faster processing

Quick Start

# Install dependencies
pip install -r requirements.txt

# List audio devices
./run_transcribe.sh --list-devices

# Basic transcription
./run_transcribe.sh --model medium --language en

# With LLM analysis and file output
./run_transcribe.sh --model medium --enable-llm --output transcript.txt

Requirements

OS: Windows 10/11 or Linux
Python: 3.8+
Audio: Loopback device (Stereo Mix or VB-Cable on Windows, a PulseAudio/PipeWire monitor source on Linux)
Optional: CUDA-capable GPU, Ollama for LLM features

Installation

1. Install Dependencies

pip install -r requirements.txt

2. GPU Support (Optional)

Pick the wheel index that matches your CUDA version; pip then installs the newest torch build published for it.

For CUDA 11.8:

pip install torch --index-url https://download.pytorch.org/whl/cu118

For CUDA 12.1:

pip install torch --index-url https://download.pytorch.org/whl/cu121

3. Audio Setup

Linux (PulseAudio/PipeWire):

# List devices to find your monitor device
./run_transcribe.sh --list-devices

# Use with monitor device
./run_transcribe.sh --monitor "alsa_output.monitor"
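
On PulseAudio and PipeWire, speaker output is usually exposed as an input source with "monitor" in its name. The sketch below shows one way to pick such devices out of a device list. It assumes entries shaped like those returned by sounddevice.query_devices() (dicts with name and max_input_channels); find_monitor_devices and the sample data are hypothetical and not part of this project.

```python
from typing import Dict, List


def find_monitor_devices(devices: List[Dict]) -> List[str]:
    """Return names of input-capable devices whose name mentions 'monitor'."""
    return [
        d["name"]
        for d in devices
        if d.get("max_input_channels", 0) > 0 and "monitor" in d["name"].lower()
    ]


# Hypothetical sample data shaped like sounddevice.query_devices() entries.
sample = [
    {"name": "USB Mic", "max_input_channels": 1},
    {"name": "Monitor of Speakers", "max_input_channels": 2},
    {"name": "Speakers", "max_input_channels": 0},
]
print(find_monitor_devices(sample))  # ['Monitor of Speakers']
```

A name found this way can then be passed to --monitor.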

Windows:

Enable "Stereo Mix" in Sound settings, or
Install VB-Cable from vb-audio.com

4. LLM Support (Optional)

# Install Ollama from ollama.ai
ollama pull qwen2.5:3b

Usage

Command Line Options

python transcribe.py [OPTIONS]

Options:
  --model {tiny,base,small,medium,large}  Whisper model (default: tiny)
  --language CODE                         Language code (default: en)
  --mic DEVICE                            Microphone device name
  --monitor DEVICE                        Speaker monitor device name
  --interval SECONDS                      Processing interval (default: 5.0)
  --min-duration SECONDS                  Minimum audio duration (default: 2.0)
  --enable-llm                            Enable LLM analysis
  --llm-model MODEL                       Ollama model (default: qwen2.5:3b)
  --output FILE                           Save transcript to file
  --force-cpu                             Force CPU processing
  --list-devices                          List audio devices
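
For readers who want to script around these options, here is a minimal argparse sketch that mirrors the flags and defaults listed above. It is an illustration only: the actual parser in transcribe.py may be structured differently, and build_parser is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the documented options with their documented defaults."""
    parser = argparse.ArgumentParser(prog="transcribe.py")
    parser.add_argument("--model", default="tiny",
                        choices=["tiny", "base", "small", "medium", "large"])
    parser.add_argument("--language", default="en", metavar="CODE")
    parser.add_argument("--mic", metavar="DEVICE")
    parser.add_argument("--monitor", metavar="DEVICE")
    parser.add_argument("--interval", type=float, default=5.0, metavar="SECONDS")
    parser.add_argument("--min-duration", type=float, default=2.0, metavar="SECONDS")
    parser.add_argument("--enable-llm", action="store_true")
    parser.add_argument("--llm-model", default="qwen2.5:3b", metavar="MODEL")
    parser.add_argument("--output", metavar="FILE")
    parser.add_argument("--force-cpu", action="store_true")
    parser.add_argument("--list-devices", action="store_true")
    return parser


args = build_parser().parse_args(["--model", "medium", "--language", "nl", "--enable-llm"])
print(args.model, args.language, args.enable_llm, args.interval)  # medium nl True 5.0
```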

Examples

# Dutch transcription with LLM
./run_transcribe.sh --model medium --language nl --enable-llm

# High-quality meeting transcription
./run_transcribe.sh --model large --interval 8 --output meeting.txt

# Fast real-time transcription
./run_transcribe.sh --model tiny --interval 3 --min-duration 2

# Specific devices
./run_transcribe.sh --mic "USB Mic" --monitor "Monitor of Speakers"

Model Performance

Model	Size	Speed	Quality	Use Case
tiny	75 MB	Fastest	Basic	Real-time, low latency
base	145 MB	Fast	Good	General use
small	485 MB	Moderate	Better	Balanced
medium	1.5 GB	Slow	Great	High accuracy
large	3 GB	Slowest	Best	Maximum quality

Troubleshooting

No audio devices found:

# List all devices
./run_transcribe.sh --list-devices

# Specify devices explicitly
./run_transcribe.sh --mic "device_name" --monitor "monitor_name"

CUDA errors:

# Force CPU processing
./run_transcribe.sh --force-cpu
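
To check whether a GPU is visible before deciding on --force-cpu, something like the following can help. This is a sketch under the assumption that PyTorch is the CUDA probe (as the GPU install step suggests); pick_device is a hypothetical helper, and the project's own device selection may work differently.

```python
def pick_device(force_cpu: bool = False) -> str:
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    if force_cpu:
        return "cpu"
    try:
        import torch  # optional dependency; may be absent on CPU-only installs
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


print(pick_device(force_cpu=True))  # cpu
```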

Ollama connection failed:

# Start Ollama service
ollama serve

# Pull required model
ollama pull qwen2.5:3b
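
A quick way to tell "service not running" apart from "model not pulled" is to ask Ollama which models it has. The sketch below uses Ollama's /api/tags endpoint on its default port 11434; ollama_models is a hypothetical helper and the timeout is an arbitrary choice.

```python
import json
import urllib.error
import urllib.request


def ollama_models(base_url: str = "http://localhost:11434", timeout: float = 2.0):
    """Return locally available model names, or None if Ollama is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            data = json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        return None
    return [m["name"] for m in data.get("models", [])]


if __name__ == "__main__":
    models = ollama_models()
    if models is None:
        print("Ollama is not reachable; try `ollama serve`")
    elif "qwen2.5:3b" not in models:
        print("Model missing; try `ollama pull qwen2.5:3b`")
    else:
        print("Ollama is ready")
```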

Poor transcription quality:

Use a larger model: --model medium or --model large
Increase the processing interval: --interval 10
Specify the spoken language explicitly: --language nl
Improve the input audio (reduce background noise)

Output Format

Standard Output

🎤 [14:23:15] User speaking into microphone
🔊 [14:23:18] Audio from speakers or system

With LLM Analysis

🎤 [14:23:15] The Earth orbits the Sun in 365 days.
   ✅ FACTUAL (0.98): Scientifically accurate orbital period.
   ❓ Questions:
      1. Why do we need leap years?
      2. How does the elliptical orbit affect seasons?
      3. What factors influence Earth's orbital velocity?
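
Analysis like the above could be requested from a local Ollama server roughly as follows. This is a sketch, not the project's code: the prompt wording and the build_factcheck_request helper are hypothetical, while /api/generate with model, prompt, and stream fields is Ollama's documented generate endpoint.

```python
import json
import urllib.request


def build_factcheck_request(statement: str, model: str = "qwen2.5:3b",
                            base_url: str = "http://localhost:11434"):
    """Build (but do not send) a non-streaming generate request."""
    # Hypothetical prompt; the project's actual prompt is not documented here.
    prompt = (
        "Assess whether the following statement is factual, give a confidence "
        "between 0 and 1, and suggest three follow-up questions.\n\n"
        f"Statement: {statement}"
    )
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_factcheck_request("The Earth orbits the Sun in 365 days.")
print(req.get_method(), req.full_url)  # POST http://localhost:11434/api/generate
```

Sending the request with urllib.request.urlopen(req) and decoding the body as JSON gives the model's text in the "response" field.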

File Output

[14:23:15] MIC: User speaking into microphone
[14:23:18] SPEAKER: Audio from speakers

======================================================================
[14:23:25] MIC: The Earth orbits the Sun in 365 days.

📊 Fact Check: FACTUAL (confidence: 0.98)
💡 Scientifically accurate orbital period.

❓ Questions:
1. Why do we need leap years?
2. How does the elliptical orbit affect seasons?
3. What factors influence Earth's orbital velocity?
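
The plain transcript lines follow a simple "[time] SOURCE: text" pattern. A minimal sketch of producing that pattern, with format_line as a hypothetical helper name:

```python
from datetime import datetime


def format_line(source: str, text: str, when: datetime) -> str:
    """Render one transcript line as "[HH:MM:SS] SOURCE: text"."""
    return f"[{when:%H:%M:%S}] {source}: {text}"


stamp = datetime(2025, 12, 17, 14, 23, 15)
print(format_line("MIC", "User speaking into microphone", stamp))
# [14:23:15] MIC: User speaking into microphone
```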

Architecture

Audio Capture: sounddevice with dual-stream support
Transcription: faster-whisper (optimized Whisper implementation)
LLM: Ollama for local inference
Format: 16 kHz mono, 16-bit PCM
Processing: Independent mic/speaker buffers; Whisper decoding uses beam_size=3
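
To make the audio format and buffering concrete, here is a small standard-library sketch. It assumes raw 16-bit PCM in native byte order and models each source as a buffer that releases audio once it reaches a minimum duration (compare --min-duration). ChunkBuffer and pcm16_to_float are hypothetical names, and the real implementation may differ.

```python
import array

SAMPLE_RATE = 16_000  # 16 kHz mono, as listed above


def pcm16_to_float(raw: bytes) -> list:
    """Convert 16-bit PCM bytes (native byte order) to floats in [-1.0, 1.0)."""
    samples = array.array("h")
    samples.frombytes(raw)
    return [s / 32768.0 for s in samples]


class ChunkBuffer:
    """Accumulates samples for one source and releases them once long enough."""

    def __init__(self, min_duration: float = 2.0):
        self.min_samples = int(min_duration * SAMPLE_RATE)
        self.samples = []

    def add(self, samples) -> None:
        self.samples.extend(samples)

    def take(self):
        """Return the buffered audio if it meets the minimum duration, else None."""
        if len(self.samples) < self.min_samples:
            return None
        chunk, self.samples = self.samples, []
        return chunk


# One buffer per source, so mic and speaker audio never mix.
mic, speaker = ChunkBuffer(), ChunkBuffer()
mic.add(pcm16_to_float(array.array("h", [0, 16384, -32768]).tobytes()))
print(mic.take())  # None: three samples are far below two seconds
```

Keeping the buffers separate lets each source be transcribed and labelled on its own.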

Contributing

Contributions welcome! Please open issues or submit pull requests.

License

Uses Whisper (OpenAI), faster-whisper (SYSTRAN), and Ollama.