chore: update 6 file(s)
161
QUICK_START.md
@@ -1,105 +1,156 @@
|
||||
# Quick Start Guide
|
||||
|
||||
## Dutch Language (Nederlands)
|
||||
## 1. Setup Audio Devices
|
||||
|
||||
### Basic Dutch Transcription
|
||||
```bash
|
||||
./RUN_DUTCH.sh
|
||||
# List available audio devices
|
||||
./run_transcribe.sh --list-devices
|
||||
```
|
||||
- ✅ GPU-accelerated (RTX 4060 Ti)
|
||||
- ✅ Sentence extraction (complete sentences)
|
||||
- ✅ Base model (good balance of speed and accuracy)
|
||||
|
||||
### Dutch with LLM Analysis
|
||||
Find your:
|
||||
- **Microphone** - Your input device (e.g., "USB Microphone")
|
||||
- **Monitor** - Speaker capture device (e.g., "Monitor of Built-in Audio")
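
A minimal sketch of how a partial name such as "USB Mic" could be matched against the device list, modelled on the `_find_device` method in `transcribe.py`. The device list below is made-up sample data; the real script obtains it from `sounddevice.query_devices()`.

```python
def find_input_device(devices, name):
    """Return the index of the first input-capable device whose name contains `name`."""
    needle = name.lower()
    for index, dev in enumerate(devices):
        if needle in dev["name"].lower() and dev["max_input_channels"] > 0:
            return index
    raise RuntimeError(f"Device '{name}' not found")


# Hypothetical sample data; real entries come from sounddevice.query_devices().
SAMPLE_DEVICES = [
    {"name": "Built-in Audio Analog Stereo", "max_input_channels": 0},
    {"name": "USB Microphone", "max_input_channels": 1},
    {"name": "Monitor of Built-in Audio", "max_input_channels": 2},
]

mic_index = find_input_device(SAMPLE_DEVICES, "usb mic")
monitor_index = find_input_device(SAMPLE_DEVICES, "Monitor")
print(mic_index, monitor_index)
```

Matching is case-insensitive and returns the first hit, so a short fragment like "Monitor" is enough as long as it is unambiguous on your machine.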
|
||||
|
||||
---
|
||||
|
||||
## 2. Basic Usage
|
||||
|
||||
### Simple Transcription
|
||||
```bash
|
||||
./RUN_DUTCH_LLM.sh
|
||||
# Auto-detect devices
|
||||
./run_transcribe.sh --model medium --language en
|
||||
|
||||
# Specify devices
|
||||
./run_transcribe.sh --mic "USB Mic" --monitor "Monitor"
|
||||
```
|
||||
- ✅ All features from basic version
|
||||
- ✅ Fact-checking of statements
|
||||
- ✅ Automatic question generation
|
||||
- Uses llama3.2:latest model
|
||||
|
||||
### Save to File
|
||||
### With File Output
|
||||
```bash
|
||||
./RUN_DUTCH.sh --output transcript.txt
|
||||
./RUN_DUTCH_LLM.sh --output enriched.txt
|
||||
./run_transcribe.sh --model medium --language en --output transcript.txt
|
||||
```
|
||||
|
||||
### With LLM Analysis
|
||||
```bash
|
||||
./run_transcribe.sh --model medium --enable-llm --output enriched.txt
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## English Language
|
||||
## 3. Language Examples
|
||||
|
||||
### Basic English Transcription
|
||||
### Dutch (Nederlands)
|
||||
```bash
|
||||
./RUN_GPU.sh
|
||||
./run_transcribe.sh --model medium --language nl --enable-llm
|
||||
```
|
||||
|
||||
### English with LLM
|
||||
```bash
|
||||
./RUN_GPU.sh --enable-llm
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Other Languages
|
||||
|
||||
### Spanish
|
||||
```bash
|
||||
./RUN_GPU.sh --language es
|
||||
./run_transcribe.sh --model medium --language es
|
||||
```
|
||||
|
||||
### French
|
||||
```bash
|
||||
./RUN_GPU.sh --language fr
|
||||
./run_transcribe.sh --model medium --language fr
|
||||
```
|
||||
|
||||
### German
|
||||
```bash
|
||||
./RUN_GPU.sh --language de
|
||||
./run_transcribe.sh --model medium --language de
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Available Ollama Models
|
||||
## 4. Model Selection
|
||||
|
||||
You have these models installed:
|
||||
- `llama3.2:latest` (2.0 GB) - **Default** - Fast and accurate
|
||||
- `llama3:8b` (4.7 GB) - More powerful
|
||||
- `qwen2.5:3b` (1.9 GB) - Fast alternative
|
||||
- `qwen2.5:7b` (4.7 GB) - Powerful alternative
|
||||
- `qwen2.5:0.5b` (397 MB) - Very fast, less accurate
|
||||
| Model  | Speed    | Quality | Command                            |
|--------|----------|---------|------------------------------------|
| tiny   | Fastest  | Basic   | `--model tiny`                     |
| base   | Fast     | Good    | `--model base`                     |
| small  | Moderate | Better  | `--model small`                    |
| medium | Slow     | Great   | `--model medium` **(recommended)** |
| large  | Slowest  | Best    | `--model large`                    |
|
||||
|
||||
To use a different model:
|
||||
---
|
||||
|
||||
## 5. Optimization Tips
|
||||
|
||||
### High Quality Transcription
|
||||
```bash
|
||||
./RUN_DUTCH_LLM.sh --llm-model "llama3:8b"
|
||||
./run_transcribe.sh --model large --interval 8 --min-duration 4
|
||||
```
|
||||
|
||||
### Fast Real-Time
|
||||
```bash
|
||||
./run_transcribe.sh --model tiny --interval 3 --min-duration 2
|
||||
```
|
||||
|
||||
### Best Dutch Transcription (Your Setup)
|
||||
```bash
|
||||
./run_transcribe.sh --model medium --interval 8 --min-duration 4 --enable-llm --language nl
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Tips
|
||||
## 6. LLM Configuration
|
||||
|
||||
### Better Accuracy
|
||||
Use larger Whisper model (slower):
|
||||
### Default Model (qwen2.5:3b - Fast)
|
||||
```bash
|
||||
./RUN_DUTCH.sh --model medium # or: large
|
||||
./run_transcribe.sh --enable-llm
|
||||
```
|
||||
|
||||
### Faster Processing
|
||||
Use smaller model or reduce interval:
|
||||
### Larger Model (Better Analysis)
|
||||
```bash
|
||||
./RUN_DUTCH.sh --model tiny --interval 3
|
||||
```
|
||||
# Install model first
|
||||
ollama pull llama3.2
|
||||
|
||||
### Debug LLM Issues
|
||||
```bash
|
||||
./RUN_DUTCH_LLM.sh --llm-debug
|
||||
# Use it
|
||||
./run_transcribe.sh --enable-llm --llm-model llama3.2
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Controls
|
||||
## 7. Output Examples
|
||||
|
||||
- **Ctrl+C** to stop transcription
|
||||
- Speak clearly into your microphone
|
||||
- Wait ~5 seconds for transcription to appear
|
||||
- Sentences appear with 📝 emoji
|
||||
### Console Output
|
||||
```
|
||||
🎤 [14:23:15] User speaking via microphone
|
||||
🔊 [14:23:20] Audio from speakers
|
||||
|
||||
🎤 [14:23:25] The Earth orbits the Sun in 365 days.
|
||||
✅ FACTUAL (0.98): Scientifically accurate.
|
||||
❓ Questions:
|
||||
1. Why do we need leap years?
|
||||
2. How does orbital speed vary?
|
||||
3. What affects Earth's orbit?
|
||||
```
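
The fact-check line in the console output comes from a short structured reply that the LLM is asked to produce (`VERDICT`, `CONFIDENCE`, `REASON`). The sketch below shows one way such a reply could be parsed and formatted, loosely following `LLMAnalyzer.fact_check` in `transcribe.py`. The function names and the sample reply are illustrative assumptions, and the confidence pattern is slightly stricter than the one in the script.

```python
import re

VERDICT_EMOJI = {"factual": "✅", "dubious": "⚠️", "false": "❌"}


def parse_fact_check(reply):
    """Extract verdict, confidence and reason from a structured LLM reply."""
    verdict = re.search(r"VERDICT:\s*(\w+)", reply, re.I)
    confidence = re.search(r"CONFIDENCE:\s*(\d+(?:\.\d+)?)", reply, re.I)
    reason = re.search(r"REASON:\s*(.+)", reply, re.I)
    return {
        "verdict": verdict.group(1).lower() if verdict else "unknown",
        "confidence": float(confidence.group(1)) if confidence else 0.5,
        "reason": reason.group(1).strip() if reason else reply[:150],
    }


def format_console_line(result):
    """Render a parsed result in the style of the console output."""
    emoji = VERDICT_EMOJI.get(result["verdict"], "❓")
    return (f"{emoji} {result['verdict'].upper()} "
            f"({result['confidence']:.2f}): {result['reason']}")


# Hypothetical model reply, used only for illustration.
sample_reply = "VERDICT: factual\nCONFIDENCE: 0.98\nREASON: Scientifically accurate."
print(format_console_line(parse_fact_check(sample_reply)))
```

Falling back to `unknown` and `0.5` keeps the display usable when a small model ignores the requested format.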
|
||||
|
||||
### File Output
|
||||
Transcripts are appended to the file given with `--output` (for example `transcript.txt`), with timestamps and, when `--enable-llm` is set, the LLM analysis.
|
||||
|
||||
---
|
||||
|
||||
## 8. Controls
|
||||
|
||||
- **Ctrl+C** - Stop transcription
|
||||
- Processing happens every `--interval` seconds (default: 5s)
|
||||
- At least `--min-duration` seconds of buffered audio are required before a source is transcribed (default: 2s)
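
How the two settings interact can be sketched as follows: every `--interval` seconds, each source (microphone, monitor) is checked, and only buffers holding at least `--min-duration` seconds of audio are transcribed; shorter buffers keep accumulating. The helper name is hypothetical, and the 16 kHz rate is taken from `transcribe.py`.

```python
SAMPLE_RATE = 16000  # samples per second, as used by transcribe.py


def ready_sources(buffered_samples, min_duration=2.0):
    """Return the sources whose buffered audio is long enough to transcribe."""
    return [
        source
        for source, n_samples in buffered_samples.items()
        if n_samples / SAMPLE_RATE >= min_duration
    ]


# After one 5-second interval: the mic has 5 s buffered, the monitor only 1 s.
buffers = {"mic": 5 * SAMPLE_RATE, "monitor": 1 * SAMPLE_RATE}
print(ready_sources(buffers))  # ['mic']
```

A larger `--min-duration` gives Whisper more context per pass at the cost of latency for quiet sources.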
|
||||
|
||||
---
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
**No devices found:**
|
||||
```bash
|
||||
./run_transcribe.sh --list-devices
|
||||
```
|
||||
|
||||
**Ollama errors:**
|
||||
```bash
|
||||
ollama serve
|
||||
ollama pull qwen2.5:3b
|
||||
```
|
||||
|
||||
**Force CPU (GPU issues):**
|
||||
```bash
|
||||
./run_transcribe.sh --force-cpu
|
||||
```
|
||||
|
||||
249
README.md
@@ -1,16 +1,15 @@
|
||||
# Verbatim Dicta
|
||||
|
||||
Real-time audio transcription using Whisper AI with optional LLM-powered analysis. Captures system audio via loopback and transcribes it with configurable models and processing options.
|
||||
Real-time audio transcription using Whisper AI with optional LLM analysis. Captures microphone and speaker audio simultaneously for comprehensive transcription.
|
||||
|
||||
## Features
|
||||
|
||||
- Real-time transcription of system audio (Windows/Linux)
|
||||
- Multiple Whisper model sizes (tiny to large)
|
||||
- Multi-language support
|
||||
- **Sentence extraction mode** - Stitches audio chunks into complete sentences
|
||||
- Optional LLM analysis for fact-checking and question generation (via Ollama)
|
||||
- GPU acceleration support
|
||||
- Flexible audio device configuration
|
||||
- **Dual audio capture** - Record microphone and speaker output simultaneously
|
||||
- **Real-time transcription** - Process audio as it's captured with Whisper models
|
||||
- **LLM analysis** - Optional fact-checking and question generation via Ollama
|
||||
- **Multi-language** - Support for 50+ languages
|
||||
- **File output** - Save transcripts with timestamps and analysis
|
||||
- **GPU acceleration** - CUDA support for faster processing
|
||||
|
||||
## Quick Start
|
||||
|
||||
@@ -18,17 +17,14 @@ Real-time audio transcription using Whisper AI with optional LLM-powered analysi
|
||||
# Install dependencies
|
||||
pip install -r requirements.txt
|
||||
|
||||
# Basic transcription (no LLM)
|
||||
python transcribe_speakers.py
|
||||
|
||||
# With LLM analysis (optional)
|
||||
python transcribe_speakers.py --enable-llm
|
||||
|
||||
# With sentence extraction
|
||||
python transcribe_speakers.py --sentence-mode
|
||||
|
||||
# List audio devices
|
||||
python transcribe_speakers.py --list-devices
|
||||
./run_transcribe.sh --list-devices
|
||||
|
||||
# Basic transcription
|
||||
./run_transcribe.sh --model medium --language en
|
||||
|
||||
# With LLM analysis and file output
|
||||
./run_transcribe.sh --model medium --enable-llm --output transcript.txt
|
||||
```
|
||||
|
||||
## Requirements
|
||||
@@ -58,172 +54,153 @@ For CUDA 12.1:
|
||||
pip install torch==2.8.0+cu121 --index-url https://download.pytorch.org/whl/cu121
|
||||
```
|
||||
|
||||
### 3. Audio Loopback Setup
|
||||
### 3. Audio Setup
|
||||
|
||||
**Windows - Option A (Stereo Mix):**
|
||||
1. Right-click speaker icon → Sounds → Recording tab
|
||||
2. Right-click → Show Disabled Devices
|
||||
3. Enable and set Stereo Mix as default
|
||||
**Linux (PulseAudio/PipeWire):**
|
||||
```bash
|
||||
# List devices to find your monitor device
|
||||
./run_transcribe.sh --list-devices
|
||||
|
||||
**Windows - Option B (VB-Cable, recommended):**
|
||||
1. Download from [vb-audio.com](https://vb-audio.com/Cable/)
|
||||
2. Install and restart
|
||||
3. Use `--device "CABLE Output"`
|
||||
# Use with monitor device
|
||||
./run_transcribe.sh --monitor "alsa_output.monitor"
|
||||
```
|
||||
|
||||
**Linux:**
|
||||
Configure PulseAudio loopback or use `transcribe_dual_linux.py`
|
||||
**Windows:**
|
||||
- Enable "Stereo Mix" in Sound settings, or
|
||||
- Install VB-Cable from [vb-audio.com](https://vb-audio.com/Cable/)
|
||||
|
||||
### 4. LLM Features (Optional)
|
||||
### 4. LLM Support (Optional)
|
||||
|
||||
```bash
|
||||
# Install Ollama from ollama.ai
|
||||
ollama pull llama3.2
|
||||
ollama pull qwen2.5:3b
|
||||
```
|
||||
|
||||
## Usage
|
||||
|
||||
### Available Scripts
|
||||
|
||||
- `transcribe_speakers.py` - Main script with all features (LLM optional via `--enable-llm`)
|
||||
- `transcribe_dual_linux.py` - Linux-specific with dual audio support
|
||||
|
||||
### Common Commands
|
||||
### Command Line Options
|
||||
|
||||
```bash
|
||||
# Quick start with GPU (English)
|
||||
./RUN_GPU.sh
|
||||
python transcribe.py [OPTIONS]
|
||||
|
||||
# Dutch language
|
||||
./RUN_DUTCH.sh
|
||||
|
||||
# Dutch with LLM analysis
|
||||
./RUN_DUTCH_LLM.sh
|
||||
|
||||
# With LLM analysis
|
||||
./RUN_GPU.sh --enable-llm
|
||||
|
||||
# Save to file
|
||||
./RUN_GPU.sh --output transcript.txt
|
||||
|
||||
# Other languages (Spanish, French, German, etc.)
|
||||
./RUN_GPU.sh --language es # Spanish
|
||||
./RUN_GPU.sh --language fr # French
|
||||
./RUN_GPU.sh --language de # German
|
||||
|
||||
# Maximum accuracy with LLM and sentence extraction
|
||||
python transcribe_speakers.py --model large --enable-llm --sentence-mode --output enriched.txt
|
||||
|
||||
# Force CPU (if GPU issues)
|
||||
python transcribe_speakers.py --force-cpu
|
||||
Options:
|
||||
--model {tiny,base,small,medium,large} Whisper model (default: tiny)
|
||||
--language CODE Language code (default: en)
|
||||
--mic DEVICE Microphone device name
|
||||
--monitor DEVICE Speaker monitor device name
|
||||
--interval SECONDS Processing interval (default: 5.0)
|
||||
--min-duration SECONDS Minimum audio duration (default: 2.0)
|
||||
--enable-llm Enable LLM analysis
|
||||
--llm-model MODEL Ollama model (default: qwen2.5:3b)
|
||||
--output FILE Save transcript to file
|
||||
--force-cpu Force CPU processing
|
||||
--list-devices List audio devices
|
||||
```
|
||||
|
||||
### Key Options
|
||||
### Examples
|
||||
|
||||
| Option | Description | Default |
|--------|-------------|---------|
| `--model` | Model size: tiny/base/small/medium/large | base |
| `--language` | Language code (en/es/fr/de/ja/etc.) | en |
| `--device` | Audio device name (partial match) | Auto |
| `--interval` | Processing interval (seconds) | 8.0 |
| `--min-duration` | Minimum audio duration (seconds) | 3.0 |
| `--fast-mode` | Fast mode (3-5x faster, lower accuracy) | False |
| `--enable-llm` | Enable fact-checking and questions | False |
| `--llm-model` | Ollama model to use | llama3.2 |
| `--output` | Save to file | None |
| `--force-cpu` | Disable GPU | False |
| `--gpu-index` | GPU device index | 0 |
| `--sentence-mode` | Extract complete sentences from chunks | False |
|
||||
```bash
|
||||
# Dutch transcription with LLM
|
||||
./run_transcribe.sh --model medium --language nl --enable-llm
|
||||
|
||||
# High-quality meeting transcription
|
||||
./run_transcribe.sh --model large --interval 8 --output meeting.txt
|
||||
|
||||
# Fast real-time transcription
|
||||
./run_transcribe.sh --model tiny --interval 3 --min-duration 2
|
||||
|
||||
# Specific devices
|
||||
./run_transcribe.sh --mic "USB Mic" --monitor "Monitor of Speakers"
|
||||
```
|
||||
|
||||
## Model Performance
|
||||
|
||||
| Model | Size | Speed | Quality | Best For |
|-------|------|-------|---------|----------|
| tiny | ~75 MB | Fastest | Basic | Quick tests, low-latency |
| base | ~145 MB | Fast | Good | General real-time use |
| small | ~485 MB | Moderate | Better | Balanced accuracy/speed |
| medium | ~1.5 GB | Slow | Great | High accuracy needs |
| large | ~3 GB | Slowest | Best | Maximum accuracy |
|
||||
|
||||
## Optimization Presets
|
||||
|
||||
**Low Latency (Real-Time):**
|
||||
```bash
|
||||
python transcribe_speakers.py --model tiny --fast-mode --interval 2 --min-duration 1.5
|
||||
```
|
||||
|
||||
**Balanced:**
|
||||
```bash
|
||||
python transcribe_speakers.py --model base --interval 5
|
||||
```
|
||||
|
||||
**High Accuracy:**
|
||||
```bash
|
||||
python transcribe_speakers.py --model large --interval 10 --enable-llm
|
||||
```
|
||||
| Model  | Size   | Speed    | Quality | Use Case               |
|--------|--------|----------|---------|------------------------|
| tiny   | 75 MB  | Fastest  | Basic   | Real-time, low latency |
| base   | 145 MB | Fast     | Good    | General use            |
| small  | 485 MB | Moderate | Better  | Balanced               |
| medium | 1.5 GB | Slow     | Great   | High accuracy          |
| large  | 3 GB   | Slowest  | Best    | Maximum quality        |
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
**No loopback device:**
|
||||
- Windows: Enable Stereo Mix or install VB-Cable
|
||||
- Linux: Configure PulseAudio loopback
|
||||
**No audio devices found:**
|
||||
```bash
|
||||
# List all devices
|
||||
./run_transcribe.sh --list-devices
|
||||
|
||||
# Specify devices explicitly
|
||||
./run_transcribe.sh --mic "device_name" --monitor "monitor_name"
|
||||
```
|
||||
|
||||
**CUDA errors:**
|
||||
```bash
|
||||
python transcribe_speakers.py --force-cpu
|
||||
# Force CPU processing
|
||||
./run_transcribe.sh --force-cpu
|
||||
```
|
||||
|
||||
**No audio captured:**
|
||||
- Verify audio is playing
|
||||
- Check device: `--list-devices`
|
||||
- Increase system volume
|
||||
**Ollama connection failed:**
|
||||
```bash
|
||||
# Start Ollama service
|
||||
ollama serve
|
||||
|
||||
**Poor quality:**
|
||||
- Use larger model: `--model medium`
|
||||
# Pull required model
|
||||
ollama pull qwen2.5:3b
|
||||
```
|
||||
|
||||
**Poor transcription quality:**
|
||||
- Use larger model: `--model medium` or `--model large`
|
||||
- Increase interval: `--interval 10`
|
||||
- Specify language: `--language <code>`
|
||||
|
||||
**Ollama errors:**
|
||||
- Ensure Ollama is running
|
||||
- Pull model: `ollama pull llama3.2`
|
||||
- Specify language: `--language nl`
|
||||
- Ensure good audio quality (reduce background noise)
|
||||
|
||||
## Output Format
|
||||
|
||||
**Standard:**
|
||||
### Standard Output
|
||||
```
|
||||
[14:23:15] Transcribed audio segment.
|
||||
[14:23:23] Another segment with timestamp.
|
||||
🎤 [14:23:15] User speaking into microphone
|
||||
🔊 [14:23:18] Audio from speakers or system
|
||||
```
|
||||
|
||||
**With LLM (--enable-llm):**
|
||||
### With LLM Analysis
|
||||
```
|
||||
🎤 [14:23:15] The Earth orbits the Sun in 365 days.
|
||||
✅ FACTUAL (0.98): Scientifically accurate orbital period.
|
||||
❓ Questions:
|
||||
1. Why do we need leap years?
|
||||
2. How does the elliptical orbit affect seasons?
|
||||
3. What factors influence Earth's orbital velocity?
|
||||
```
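
The three follow-up questions are requested from the LLM in a `Q1:`/`Q2:`/`Q3:` format. A rough sketch of how such a reply might be turned into the list shown above, including padding with generic questions when the model returns fewer than three, is given below. It follows the approach of `LLMAnalyzer.generate_questions` in `transcribe.py`; the function name and the sample reply are assumptions for illustration.

```python
import re

# Generic fallbacks, used when the model returns fewer questions than requested.
DEFAULT_QUESTIONS = [
    "What are the implications?",
    "What evidence supports this?",
    "What's the context?",
]


def parse_questions(reply, count=3):
    """Collect Q1..Qn from an LLM reply, padding with defaults if needed."""
    questions = []
    for i in range(1, count + 1):
        match = re.search(rf"Q{i}:\s*(.+)", reply, re.I)
        if match:
            question = match.group(1).strip()
            if not question.endswith("?"):
                question += "?"
            questions.append(question)
    while len(questions) < count:
        questions.append(DEFAULT_QUESTIONS[len(questions) % len(DEFAULT_QUESTIONS)])
    return questions


# Hypothetical reply with a missing third question and a missing question mark.
sample_reply = (
    "Q1: Why do we need leap years?\n"
    "Q2: How does the elliptical orbit affect seasons"
)
for number, question in enumerate(parse_questions(sample_reply), 1):
    print(f"{number}. {question}")
```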
|
||||
|
||||
### File Output
|
||||
```
|
||||
[14:23:15] MIC: User speaking into microphone
|
||||
[14:23:18] SPEAKER: Audio from speakers
|
||||
|
||||
======================================================================
|
||||
[14:23:15] The Earth revolves around the Sun in 365 days.
|
||||
[14:23:25] MIC: The Earth orbits the Sun in 365 days.
|
||||
|
||||
📊 Fact Check: FACTUAL (confidence: 0.98)
|
||||
💡 Scientifically accurate. Earth's orbital period is 365.25 days.
|
||||
💡 Scientifically accurate orbital period.
|
||||
|
||||
❓ Questions:
|
||||
1. Why do we need leap years?
|
||||
2. How does Earth's orbit affect seasons?
|
||||
======================================================================
|
||||
2. How does the elliptical orbit affect seasons?
|
||||
3. What factors influence Earth's orbital velocity?
|
||||
```
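
For reference, the enriched file layout can be reproduced with a small formatting function. This is a sketch based on `save_enriched_transcript` in `transcribe.py`, rewritten to return a string instead of writing to disk so it can be tried in isolation; the function name and sample values are illustrative.

```python
def format_enriched_entry(timestamp, source, text, fact_check=None, questions=None):
    """Build one enriched transcript entry as it would be appended to the file."""
    label = "MIC" if source == "mic" else "SPEAKER"
    lines = ["", "=" * 70, f"[{timestamp}] {label}: {text}", ""]
    if fact_check:
        lines.append(
            f"📊 Fact Check: {fact_check['verdict'].upper()} "
            f"(confidence: {fact_check['confidence']:.2f})"
        )
        lines.append(f"💡 {fact_check['reason']}")
        lines.append("")
    if questions:
        lines.append("❓ Questions:")
        lines.extend(f"{i}. {q}" for i, q in enumerate(questions, 1))
        lines.append("")
    return "\n".join(lines) + "\n"


entry = format_enriched_entry(
    "14:23:25",
    "mic",
    "The Earth orbits the Sun in 365 days.",
    fact_check={"verdict": "factual", "confidence": 0.98,
                "reason": "Scientifically accurate orbital period."},
    questions=["Why do we need leap years?"],
)
print(entry)
```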
|
||||
|
||||
## Technical Stack
|
||||
## Architecture
|
||||
|
||||
- **Audio**: sounddevice, soundfile (16kHz mono, 16-bit PCM)
|
||||
- **Transcription**: faster-whisper (optimized Whisper)
|
||||
- **LLM**: Ollama (local inference)
|
||||
- **Capture**: WASAPI loopback (Windows), PulseAudio (Linux)
|
||||
- **Audio Capture**: sounddevice with dual-stream support
|
||||
- **Transcription**: faster-whisper (optimized Whisper implementation)
|
||||
- **LLM**: Ollama for local inference
|
||||
- **Format**: 16kHz mono, 16-bit PCM
|
||||
- **Processing**: Independent mic/speaker buffers with beam_size=3
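
To make the audio format concrete: captured samples are signed 16-bit integers at 16 kHz, and they are scaled to floats in the range [-1.0, 1.0) before being handed to Whisper. The sketch below illustrates that scaling and the duration arithmetic in plain Python. The script itself does this with NumPy (`astype(np.float32) / 32768.0`); the helper names here are assumptions.

```python
from array import array

SAMPLE_RATE = 16000  # Hz, mono


def pcm16_to_float(samples):
    """Scale signed 16-bit samples to floats in [-1.0, 1.0)."""
    return [sample / 32768.0 for sample in samples]


def duration_seconds(n_samples, sample_rate=SAMPLE_RATE):
    """Length of a mono buffer in seconds."""
    return n_samples / sample_rate


chunk = array("h", [0, 16384, -32768])  # "h" = signed 16-bit integers
print(pcm16_to_float(chunk))   # [0.0, 0.5, -1.0]
print(duration_seconds(2048))  # 0.128
```

With the default chunk size of 2048 samples, each callback delivers 0.128 s of audio, so a 2-second minimum corresponds to roughly 16 chunks per source.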
|
||||
|
||||
## Future Work
|
||||
## Contributing
|
||||
|
||||
- Real-time streaming transcription with reduced buffering
|
||||
- Speaker diarization improvements
|
||||
- Web interface for remote monitoring
|
||||
- Multi-device simultaneous transcription
|
||||
- Cloud LLM integration options
|
||||
- Custom vocabulary and domain adaptation
|
||||
- Noise reduction preprocessing
|
||||
Contributions welcome! Please open issues or submit pull requests.
|
||||
|
||||
## License
|
||||
|
||||
|
||||
@@ -11,4 +11,4 @@ CUBLAS_PATH=".venv/lib/python3.13/site-packages/nvidia/cublas/lib"
|
||||
export LD_LIBRARY_PATH="${CUDNN_PATH}:${CUBLAS_PATH}:${LD_LIBRARY_PATH}"
|
||||
|
||||
# Run the transcription script with all arguments
|
||||
python3 transcribe_dual_linux.py "$@"
|
||||
python3 transcribe.py "$@"
|
||||
|
||||
437
transcribe.py
Normal file
@@ -0,0 +1,437 @@
|
||||
#!/usr/bin/env python3
|
||||
"""
|
||||
Real-time audio transcription with dual capture and optional LLM analysis.
|
||||
Supports microphone + speaker monitor, file output, and fact-checking.
|
||||
"""
|
||||
|
||||
import sounddevice as sd
|
||||
import numpy as np
|
||||
import threading
|
||||
import queue
|
||||
import time
|
||||
import os
|
||||
import argparse
|
||||
from datetime import datetime
|
||||
from faster_whisper import WhisperModel
|
||||
|
||||
try:
|
||||
import ollama
|
||||
OLLAMA_AVAILABLE = True
|
||||
except ImportError:
|
||||
OLLAMA_AVAILABLE = False
|
||||
|
||||
|
||||
class DualAudioCapture:
|
||||
"""Capture both microphone and speaker output simultaneously"""
|
||||
|
||||
def __init__(self, mic_device=None, monitor_device=None, sample_rate=16000, chunk_size=2048):
|
||||
self.sample_rate = sample_rate
|
||||
self.chunk_size = chunk_size
|
||||
self.audio_queue = queue.Queue()
|
||||
|
||||
# Find devices
|
||||
devices = sd.query_devices()
|
||||
|
||||
# Microphone (default input or specified)
|
||||
if mic_device is None:
|
||||
self.mic_device = sd.default.device[0] # Default input
|
||||
else:
|
||||
self.mic_device = self._find_device(mic_device, input_required=True)
|
||||
|
||||
# Monitor/Loopback (for speaker output)
|
||||
if monitor_device:
|
||||
self.monitor_device = self._find_device(monitor_device, input_required=True)
|
||||
else:
|
||||
self.monitor_device = None
|
||||
|
||||
print(f"✓ Microphone: {devices[self.mic_device]['name']} (index {self.mic_device})")
|
||||
if self.monitor_device:
|
||||
print(f"✓ Monitor: {devices[self.monitor_device]['name']} (index {self.monitor_device})")
|
||||
else:
|
||||
print("⚠ No monitor device - capturing microphone only")
|
||||
|
||||
# Start streams
|
||||
self.mic_stream = sd.InputStream(
|
||||
device=self.mic_device,
|
||||
channels=1,
|
||||
samplerate=sample_rate,
|
||||
blocksize=chunk_size,
|
||||
dtype='int16',
|
||||
callback=self._mic_callback
|
||||
)
|
||||
|
||||
if self.monitor_device:
|
||||
self.monitor_stream = sd.InputStream(
|
||||
device=self.monitor_device,
|
||||
channels=1,
|
||||
samplerate=sample_rate,
|
||||
blocksize=chunk_size,
|
||||
dtype='int16',
|
||||
callback=self._monitor_callback
|
||||
)
|
||||
else:
|
||||
self.monitor_stream = None
|
||||
|
||||
self.mic_stream.start()
|
||||
if self.monitor_stream:
|
||||
self.monitor_stream.start()
|
||||
|
||||
print("✓ Audio capture started")
|
||||
|
||||
def _find_device(self, device_name, input_required=True):
|
||||
"""Find device by name substring"""
|
||||
devices = sd.query_devices()
|
||||
for i, dev in enumerate(devices):
|
||||
if device_name.lower() in dev['name'].lower():
|
||||
if not input_required or dev['max_input_channels'] > 0:
|
||||
return i
|
||||
raise RuntimeError(f"Device '{device_name}' not found")
|
||||
|
||||
def _mic_callback(self, indata, frames, time_info, status):
|
||||
"""Microphone audio callback"""
|
||||
if status:
|
||||
print(f"⚠ Mic status: {status}")
|
||||
self.audio_queue.put(('mic', indata.copy()))
|
||||
|
||||
def _monitor_callback(self, indata, frames, time_info, status):
|
||||
"""Monitor/speaker audio callback"""
|
||||
if status:
|
||||
print(f"⚠ Monitor status: {status}")
|
||||
self.audio_queue.put(('monitor', indata.copy()))
|
||||
|
||||
def read_chunk(self):
|
||||
"""Read audio data from queue"""
|
||||
try:
|
||||
return self.audio_queue.get(timeout=0.05)
|
||||
except queue.Empty:
|
||||
return None
|
||||
|
||||
def close(self):
|
||||
"""Cleanup resources"""
|
||||
self.mic_stream.stop()
|
||||
self.mic_stream.close()
|
||||
if self.monitor_stream:
|
||||
self.monitor_stream.stop()
|
||||
self.monitor_stream.close()
|
||||
|
||||
|
||||
class WhisperTranscriber:
|
||||
"""Process audio with Whisper"""
|
||||
|
||||
def __init__(self, model_name="base", language="en", force_cpu=False):
|
||||
print(f"Loading Whisper model '{model_name}'...")
|
||||
|
||||
import torch
|
||||
has_cuda = torch.cuda.is_available() and not force_cpu
|
||||
|
||||
device = "cpu"
|
||||
compute_type = "int8"
|
||||
|
||||
if has_cuda:
|
||||
try:
|
||||
import ctranslate2
|
||||
if ctranslate2.get_cuda_device_count() > 0:
|
||||
device = "cuda"
|
||||
compute_type = "float16"
|
||||
print(f"✓ Using GPU: {torch.cuda.get_device_name(0)}")
|
||||
except Exception as e:
|
||||
print(f"⚠ CUDA unavailable: {e}")
|
||||
|
||||
if device == "cpu":
|
||||
print("✓ Using CPU")
|
||||
|
||||
model_kwargs = {"device": device, "compute_type": compute_type}
|
||||
if device == "cpu":
|
||||
model_kwargs["cpu_threads"] = 4
|
||||
|
||||
self.model = WhisperModel(model_name, **model_kwargs)
|
||||
self.language = language
|
||||
self.mic_buffer = np.array([], dtype=np.float32)
|
||||
self.monitor_buffer = np.array([], dtype=np.float32)
|
||||
self.lock = threading.Lock()
|
||||
|
||||
def add_audio(self, source, audio_chunk):
|
||||
"""Add audio to appropriate buffer"""
|
||||
with self.lock:
|
||||
audio_float = audio_chunk.flatten().astype(np.float32) / 32768.0
|
||||
if source == 'mic':
|
||||
self.mic_buffer = np.concatenate([self.mic_buffer, audio_float])
|
||||
else:
|
||||
self.monitor_buffer = np.concatenate([self.monitor_buffer, audio_float])
|
||||
|
||||
def transcribe_chunk(self, min_duration=3.0):
|
||||
"""Transcribe accumulated audio"""
|
||||
with self.lock:
|
||||
mic_duration = len(self.mic_buffer) / 16000
|
||||
monitor_duration = len(self.monitor_buffer) / 16000
|
||||
|
||||
results = {}
|
||||
|
||||
# Transcribe microphone
|
||||
if mic_duration >= min_duration:
|
||||
mic_audio = self.mic_buffer.copy()
|
||||
self.mic_buffer = np.array([], dtype=np.float32)
|
||||
results['mic'] = self._transcribe(mic_audio)
|
||||
|
||||
# Transcribe monitor
|
||||
if monitor_duration >= min_duration:
|
||||
monitor_audio = self.monitor_buffer.copy()
|
||||
self.monitor_buffer = np.array([], dtype=np.float32)
|
||||
results['monitor'] = self._transcribe(monitor_audio)
|
||||
|
||||
return results if results else None
|
||||
|
||||
def _transcribe(self, audio):
|
||||
"""Internal transcription"""
|
||||
try:
|
||||
segments, _ = self.model.transcribe(
|
||||
audio,
|
||||
language=self.language,
|
||||
beam_size=3,
|
||||
vad_filter=True,
|
||||
vad_parameters=dict(min_silence_duration_ms=500)
|
||||
)
|
||||
text = " ".join([seg.text for seg in segments]).strip()
|
||||
return text if text else None
|
||||
except Exception as e:
|
||||
print(f"❌ Transcription error: {e}")
|
||||
return None
|
||||
|
||||
|
||||
class LLMAnalyzer:
|
||||
"""LLM analysis with fact-checking and question generation"""
|
||||
|
||||
def __init__(self, model="qwen2.5:3b"):
|
||||
if not OLLAMA_AVAILABLE:
|
||||
raise RuntimeError("Ollama not installed: pip install ollama")
|
||||
|
||||
self.model = model
|
||||
try:
|
||||
ollama.list()
|
||||
print(f"✓ Ollama connected: {self.model}")
|
||||
except Exception as e:
|
||||
raise RuntimeError(f"Ollama not running: {e}")
|
||||
|
||||
def fact_check(self, text):
|
||||
"""Quick fact-check"""
|
||||
prompt = f"""Fact-check this statement. Reply ONLY with:
|
||||
VERDICT: factual/dubious/false
|
||||
CONFIDENCE: 0.0-1.0
|
||||
REASON: one sentence
|
||||
|
||||
Statement: "{text}" """
|
||||
|
||||
try:
|
||||
response = ollama.generate(
|
||||
model=self.model,
|
||||
prompt=prompt,
|
||||
options={"temperature": 0.1, "num_predict": 80}
|
||||
)
|
||||
|
||||
import re
|
||||
response_text = response['response']
|
||||
|
||||
verdict = re.search(r'VERDICT:\s*(\w+)', response_text, re.I)
|
||||
confidence = re.search(r'CONFIDENCE:\s*([\d.]+)', response_text, re.I)
|
||||
reason = re.search(r'REASON:\s*(.+?)(?:\n|$)', response_text, re.I | re.DOTALL)
|
||||
|
||||
return {
|
||||
'verdict': verdict.group(1).lower() if verdict else 'unknown',
|
||||
'confidence': float(confidence.group(1)) if confidence else 0.5,
|
||||
'reason': reason.group(1).strip() if reason else response_text[:150]
|
||||
}
|
||||
except Exception as e:
|
||||
return {'verdict': 'error', 'confidence': 0.0, 'reason': str(e)}
|
||||
|
||||
def generate_questions(self, text):
|
||||
"""Generate follow-up questions"""
|
||||
prompt = f"""Generate 3 insightful questions about this. Reply ONLY with:
|
||||
Q1: [question]
|
||||
Q2: [question]
|
||||
Q3: [question]
|
||||
|
||||
Statement: "{text}" """
|
||||
|
||||
try:
|
||||
response = ollama.generate(
|
||||
model=self.model,
|
||||
prompt=prompt,
|
||||
options={"temperature": 0.7, "num_predict": 120}
|
||||
)
|
||||
|
||||
import re
|
||||
response_text = response['response']
|
||||
questions = []
|
||||
|
||||
for i in range(1, 4):
|
||||
q_match = re.search(rf'Q{i}:\s*(.+?)(?:\n|$)', response_text, re.I)
|
||||
if q_match:
|
||||
question = q_match.group(1).strip()
|
||||
if not question.endswith('?'):
|
||||
question += '?'
|
||||
questions.append(question)
|
||||
|
||||
# Fallback defaults
|
||||
while len(questions) < 3:
|
||||
defaults = ["What are the implications?", "What evidence supports this?", "What's the context?"]
|
||||
questions.append(defaults[len(questions)])
|
||||
|
||||
return questions[:3]
|
||||
except Exception as e:
|
||||
return ["What are the key points?", "What supports this?", "What are the implications?"]
|
||||
|
||||
|
||||
def save_transcript(text, source, timestamp, filename):
|
||||
"""Append transcript to file"""
|
||||
os.makedirs(os.path.dirname(filename) if os.path.dirname(filename) else '.', exist_ok=True)
|
||||
with open(filename, "a", encoding="utf-8") as f:
|
||||
source_label = "MIC" if source == 'mic' else "SPEAKER"
|
||||
f.write(f"[{timestamp}] {source_label}: {text}\n")
|
||||
|
||||
|
||||
def save_enriched_transcript(text, source, timestamp, fact_check, questions, filename):
|
||||
"""Save enriched transcript with LLM analysis"""
|
||||
os.makedirs(os.path.dirname(filename) if os.path.dirname(filename) else '.', exist_ok=True)
|
||||
with open(filename, "a", encoding="utf-8") as f:
|
||||
source_label = "MIC" if source == 'mic' else "SPEAKER"
|
||||
f.write(f"\n{'='*70}\n")
|
||||
f.write(f"[{timestamp}] {source_label}: {text}\n\n")
|
||||
|
||||
if fact_check:
|
||||
f.write(f"📊 Fact Check: {fact_check['verdict'].upper()} ")
|
||||
f.write(f"(confidence: {fact_check['confidence']:.2f})\n")
|
||||
f.write(f"💡 {fact_check['reason']}\n\n")
|
||||
|
||||
if questions:
|
||||
f.write("❓ Questions:\n")
|
||||
for i, q in enumerate(questions, 1):
|
||||
f.write(f"{i}. {q}\n")
|
||||
f.write("\n")
|
||||
|
||||
|
||||
def main():
|
||||
parser = argparse.ArgumentParser(description="Real-time audio transcription with dual capture")
|
||||
parser.add_argument("--model", default="tiny", choices=["tiny", "base", "small", "medium", "large"],
|
||||
help="Whisper model (default: tiny)")
|
||||
parser.add_argument("--language", default="en", help="Language code (default: en)")
|
||||
parser.add_argument("--mic", help="Microphone device name (partial match)")
|
||||
parser.add_argument("--monitor", help="Monitor device name for speaker capture")
|
||||
parser.add_argument("--interval", type=float, default=5.0, help="Processing interval in seconds (default: 5.0)")
|
||||
parser.add_argument("--min-duration", type=float, default=2.0, help="Minimum audio duration (default: 2.0)")
|
||||
parser.add_argument("--enable-llm", action="store_true", help="Enable LLM analysis (fact-checking + questions)")
|
||||
parser.add_argument("--llm-model", default="qwen2.5:3b", help="Ollama model (default: qwen2.5:3b)")
|
||||
    parser.add_argument("--output", "-o", help="Save transcript to file")
    parser.add_argument("--list-devices", action="store_true", help="List audio devices and exit")
    parser.add_argument("--force-cpu", action="store_true", help="Force CPU processing")

    args = parser.parse_args()

    if args.list_devices:
        print("\nAvailable audio devices:")
        for i, dev in enumerate(sd.query_devices()):
            in_ch = dev['max_input_channels']
            out_ch = dev['max_output_channels']
            if in_ch > 0:
                print(f" [{i:2d}] {dev['name']:<50} IN:{in_ch} OUT:{out_ch}")
        return

    print("=== Real-Time Audio Transcription ===")
    print(f"Model: {args.model} | Language: {args.language} | Interval: {args.interval}s")
    if args.output:
        print(f"Output: {args.output}")
    if args.enable_llm:
        print(f"LLM Analysis: Enabled ({args.llm_model})")

    # Initialize capture
    try:
        capturer = DualAudioCapture(
            mic_device=args.mic,
            monitor_device=args.monitor,
            sample_rate=16000,
            chunk_size=2048
        )
    except Exception as e:
        print(f"\n❌ Audio Error: {e}")
        print("\nTip: Use --list-devices to see available devices")
        print(" Use --mic and --monitor to specify devices")
        return

    # Initialize transcriber
    try:
        transcriber = WhisperTranscriber(
            model_name=args.model,
            language=args.language,
            force_cpu=args.force_cpu
        )
    except Exception as e:
        print(f"\n❌ Whisper Error: {e}")
        return

    # Initialize LLM analyzer
    llm_analyzer = None
    if args.enable_llm:
        try:
            llm_analyzer = LLMAnalyzer(model=args.llm_model)
        except Exception as e:
            print(f"\n⚠ LLM Error: {e}")
            print("Continuing without LLM analysis...")

    # Main loop
    print(f"\n✅ Started. Press Ctrl+C to stop.\n{'='*60}")
    last_process = time.time()

    try:
        while True:
            # Collect audio
            chunk = capturer.read_chunk()
            if chunk:
                source, audio = chunk
                transcriber.add_audio(source, audio)

            # Process at intervals
            if time.time() - last_process >= args.interval:
                results = transcriber.transcribe_chunk(min_duration=args.min_duration)

                if results:
                    timestamp = datetime.now().strftime("%H:%M:%S")

                    for source, text in results.items():
                        if text:
                            source_emoji = "🎤" if source == 'mic' else "🔊"
                            print(f"\n{source_emoji} [{timestamp}] {text}")

                            # LLM analysis
                            fact_check = None
                            questions = None
                            if llm_analyzer:
                                fact_check = llm_analyzer.fact_check(text)
                                questions = llm_analyzer.generate_questions(text)

                                verdict_emoji = {'factual': '✅', 'dubious': '⚠️', 'false': '❌'}.get(
                                    fact_check['verdict'], '❓')
                                print(f" {verdict_emoji} {fact_check['verdict'].upper()} "
                                      f"({fact_check['confidence']:.2f}): {fact_check['reason']}")
                                print(f" ❓ Questions:")
                                for i, q in enumerate(questions, 1):
                                    print(f" {i}. {q}")

                            # Save to file
                            if args.output:
                                if llm_analyzer:
                                    save_enriched_transcript(text, source, timestamp, fact_check, questions, args.output)
                                else:
                                    save_transcript(text, source, timestamp, args.output)

                last_process = time.time()

    except KeyboardInterrupt:
        print(f"\n{'='*60}\n🛑 Stopping...")

    capturer.close()
    if args.output and os.path.exists(args.output):
        print(f"\n💾 Transcript saved: {os.path.abspath(args.output)}")
    print("\n✅ Done!")


if __name__ == "__main__":
    main()
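The main loop above drains the capture queue on every pass but only transcribes once `--interval` seconds have elapsed. A minimal sketch of that gating in isolation is shown here. `IntervalGate` is a hypothetical helper that does not appear in the commit, and it swaps in `time.monotonic` for the `time.time` used in the script so that wall-clock adjustments cannot skew the interval.

```python
import time


class IntervalGate:
    """Hypothetical helper (not in the commit): fires at most once per interval."""

    def __init__(self, interval, clock=time.monotonic):
        self.interval = interval
        self.clock = clock          # injectable so the gate can be tested without sleeping
        self.last = clock()

    def ready(self):
        """Return True and reset the timer once `interval` seconds have passed."""
        now = self.clock()
        if now - self.last >= self.interval:
            self.last = now
            return True
        return False


if __name__ == "__main__":
    gate = IntervalGate(0.2)
    fired = 0
    deadline = time.monotonic() + 0.5
    while time.monotonic() < deadline:
        # In the real loop, audio chunks would be collected here on every pass.
        if gate.ready():
            fired += 1
        time.sleep(0.01)
    print(f"gate fired {fired} time(s)")
```

Injecting the clock keeps the timing logic testable without real delays; the script itself inlines the same comparison directly in `main()`.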
@@ -1,7 +1,7 @@
 #!/usr/bin/env python3
 """
-Real-time transcription with dual audio capture (microphone + speaker monitor).
-Linux/PipeWire optimized with Ollama LLM fact-checking.
+Real-time audio transcription with dual capture and optional LLM analysis.
+Supports microphone + speaker monitor, file output, and fact-checking.
 """

 import sounddevice as sd
@@ -9,6 +9,7 @@ import numpy as np
 import threading
 import queue
 import time
+import os
 import argparse
 from datetime import datetime
 from faster_whisper import WhisperModel
@@ -197,8 +198,8 @@ class WhisperTranscriber:
             return None


-class LLMFactChecker:
-    """Fast fact-checking with Ollama"""
+class LLMAnalyzer:
+    """LLM analysis with fact-checking and question generation"""

     def __init__(self, model="qwen2.5:3b"):
         if not OLLAMA_AVAILABLE:
@@ -228,34 +229,100 @@ Statement: "{text}" """
             )

             import re
-            text = response['response']
+            response_text = response['response']

-            verdict = re.search(r'VERDICT:\s*(\w+)', text, re.I)
-            confidence = re.search(r'CONFIDENCE:\s*([\d.]+)', text, re.I)
-            reason = re.search(r'REASON:\s*(.+?)(?:\n|$)', text, re.I | re.DOTALL)
+            verdict = re.search(r'VERDICT:\s*(\w+)', response_text, re.I)
+            confidence = re.search(r'CONFIDENCE:\s*([\d.]+)', response_text, re.I)
+            reason = re.search(r'REASON:\s*(.+?)(?:\n|$)', response_text, re.I | re.DOTALL)

             return {
                 'verdict': verdict.group(1).lower() if verdict else 'unknown',
                 'confidence': float(confidence.group(1)) if confidence else 0.5,
-                'reason': reason.group(1).strip() if reason else text[:150]
+                'reason': reason.group(1).strip() if reason else response_text[:150]
             }
         except Exception as e:
             return {'verdict': 'error', 'confidence': 0.0, 'reason': str(e)}

+    def generate_questions(self, text):
+        """Generate follow-up questions"""
+        prompt = f"""Generate 3 insightful questions about this. Reply ONLY with:
+Q1: [question]
+Q2: [question]
+Q3: [question]
+
+Statement: "{text}" """
+
+        try:
+            response = ollama.generate(
+                model=self.model,
+                prompt=prompt,
+                options={"temperature": 0.7, "num_predict": 120}
+            )
+
+            import re
+            response_text = response['response']
+            questions = []
+
+            for i in range(1, 4):
+                q_match = re.search(rf'Q{i}:\s*(.+?)(?:\n|$)', response_text, re.I)
+                if q_match:
+                    question = q_match.group(1).strip()
+                    if not question.endswith('?'):
+                        question += '?'
+                    questions.append(question)
+
+            # Fallback defaults
+            while len(questions) < 3:
+                defaults = ["What are the implications?", "What evidence supports this?", "What's the context?"]
+                questions.append(defaults[len(questions)])
+
+            return questions[:3]
+        except Exception as e:
+            return ["What are the key points?", "What supports this?", "What are the implications?"]
+
+
+def save_transcript(text, source, timestamp, filename):
+    """Append transcript to file"""
+    os.makedirs(os.path.dirname(filename) if os.path.dirname(filename) else '.', exist_ok=True)
+    with open(filename, "a", encoding="utf-8") as f:
+        source_label = "MIC" if source == 'mic' else "SPEAKER"
+        f.write(f"[{timestamp}] {source_label}: {text}\n")
+
+
+def save_enriched_transcript(text, source, timestamp, fact_check, questions, filename):
+    """Save enriched transcript with LLM analysis"""
+    os.makedirs(os.path.dirname(filename) if os.path.dirname(filename) else '.', exist_ok=True)
+    with open(filename, "a", encoding="utf-8") as f:
+        source_label = "MIC" if source == 'mic' else "SPEAKER"
+        f.write(f"\n{'='*70}\n")
+        f.write(f"[{timestamp}] {source_label}: {text}\n\n")
+
+        if fact_check:
+            f.write(f"📊 Fact Check: {fact_check['verdict'].upper()} ")
+            f.write(f"(confidence: {fact_check['confidence']:.2f})\n")
+            f.write(f"💡 {fact_check['reason']}\n\n")
+
+        if questions:
+            f.write("❓ Questions:\n")
+            for i, q in enumerate(questions, 1):
+                f.write(f"{i}. {q}\n")
+            f.write("\n")
+

 def main():
-    parser = argparse.ArgumentParser(description="Dual audio transcription with fact-checking")
-    parser.add_argument("--model", default="tiny", choices=["tiny", "base", "small", "medium"],
-                        help="Whisper model (default: tiny for speed)")
-    parser.add_argument("--language", default="en", help="Language code")
+    parser = argparse.ArgumentParser(description="Real-time audio transcription with dual capture")
+    parser.add_argument("--model", default="tiny", choices=["tiny", "base", "small", "medium", "large"],
+                        help="Whisper model (default: tiny)")
+    parser.add_argument("--language", default="en", help="Language code (default: en)")
     parser.add_argument("--mic", help="Microphone device name (partial match)")
     parser.add_argument("--monitor", help="Monitor device name for speaker capture")
-    parser.add_argument("--interval", type=float, default=5.0, help="Processing interval (seconds)")
-    parser.add_argument("--min-duration", type=float, default=2.0, help="Min audio duration")
-    parser.add_argument("--enable-llm", action="store_true", help="Enable fact-checking")
-    parser.add_argument("--llm-model", default="qwen2.5:3b", help="Ollama model")
-    parser.add_argument("--list-devices", action="store_true", help="List audio devices")
-    parser.add_argument("--force-cpu", action="store_true", help="Force CPU")
+    parser.add_argument("--interval", type=float, default=5.0, help="Processing interval in seconds (default: 5.0)")
+    parser.add_argument("--min-duration", type=float, default=2.0, help="Minimum audio duration (default: 2.0)")
+    parser.add_argument("--enable-llm", action="store_true", help="Enable LLM analysis (fact-checking + questions)")
+    parser.add_argument("--llm-model", default="qwen2.5:3b", help="Ollama model (default: qwen2.5:3b)")
+    parser.add_argument("--output", "-o", help="Save transcript to file")
+    parser.add_argument("--list-devices", action="store_true", help="List audio devices and exit")
+    parser.add_argument("--force-cpu", action="store_true", help="Force CPU processing")

     args = parser.parse_args()

@@ -268,8 +335,12 @@ def main():
                 print(f" [{i:2d}] {dev['name']:<50} IN:{in_ch} OUT:{out_ch}")
         return

-    print("=== Dual Audio Transcription with Fact-Checking ===")
+    print("=== Real-Time Audio Transcription ===")
     print(f"Model: {args.model} | Language: {args.language} | Interval: {args.interval}s")
+    if args.output:
+        print(f"Output: {args.output}")
+    if args.enable_llm:
+        print(f"LLM Analysis: Enabled ({args.llm_model})")

     # Initialize capture
     try:
@@ -296,14 +367,14 @@ def main():
         print(f"\n❌ Whisper Error: {e}")
         return

-    # Initialize fact checker
-    fact_checker = None
+    # Initialize LLM analyzer
+    llm_analyzer = None
     if args.enable_llm:
         try:
-            fact_checker = LLMFactChecker(model=args.llm_model)
+            llm_analyzer = LLMAnalyzer(model=args.llm_model)
         except Exception as e:
             print(f"\n⚠ LLM Error: {e}")
-            print("Continuing without fact-checking...")
+            print("Continuing without LLM analysis...")

     # Main loop
     print(f"\n✅ Started. Press Ctrl+C to stop.\n{'='*60}")
@@ -329,10 +400,27 @@ def main():
                             source_emoji = "🎤" if source == 'mic' else "🔊"
                             print(f"\n{source_emoji} [{timestamp}] {text}")

-                            if fact_checker:
-                                fc = fact_checker.fact_check(text)
-                                verdict_emoji = {'factual': '✅', 'dubious': '⚠️', 'false': '❌'}.get(fc['verdict'], '❓')
-                                print(f" {verdict_emoji} {fc['verdict'].upper()} ({fc['confidence']:.2f}): {fc['reason']}")
+                            # LLM analysis
+                            fact_check = None
+                            questions = None
+                            if llm_analyzer:
+                                fact_check = llm_analyzer.fact_check(text)
+                                questions = llm_analyzer.generate_questions(text)
+
+                                verdict_emoji = {'factual': '✅', 'dubious': '⚠️', 'false': '❌'}.get(
+                                    fact_check['verdict'], '❓')
+                                print(f" {verdict_emoji} {fact_check['verdict'].upper()} "
+                                      f"({fact_check['confidence']:.2f}): {fact_check['reason']}")
+                                print(f" ❓ Questions:")
+                                for i, q in enumerate(questions, 1):
+                                    print(f" {i}. {q}")
+
+                            # Save to file
+                            if args.output:
+                                if llm_analyzer:
+                                    save_enriched_transcript(text, source, timestamp, fact_check, questions, args.output)
+                                else:
+                                    save_transcript(text, source, timestamp, args.output)

                 last_process = time.time()

@@ -340,6 +428,8 @@ def main():
         print(f"\n{'='*60}\n🛑 Stopping...")

     capturer.close()
+    if args.output and os.path.exists(args.output):
+        print(f"\n💾 Transcript saved: {os.path.abspath(args.output)}")
     print("\n✅ Done!")
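The `fact_check` hunk in this file pulls `VERDICT`, `CONFIDENCE` and `REASON` out of the free-text model reply with three regular expressions. The sketch below isolates that parsing step so it can be exercised without Ollama. `parse_fact_check` is a hypothetical standalone function, not part of the commit; it reuses the same patterns and, as an added assumption, falls back to 0.5 when the confidence token is not a valid number and clamps the result to the 0.0-1.0 range the prompt asks for.

```python
import re


def parse_fact_check(response_text):
    """Hypothetical helper: parse a 'VERDICT/CONFIDENCE/REASON' reply into a dict."""
    verdict = re.search(r'VERDICT:\s*(\w+)', response_text, re.I)
    confidence = re.search(r'CONFIDENCE:\s*([\d.]+)', response_text, re.I)
    reason = re.search(r'REASON:\s*(.+?)(?:\n|$)', response_text, re.I | re.DOTALL)

    # Assumption: a malformed number such as "0.9." falls back to the neutral default.
    try:
        conf = float(confidence.group(1)) if confidence else 0.5
    except ValueError:
        conf = 0.5

    return {
        'verdict': verdict.group(1).lower() if verdict else 'unknown',
        'confidence': min(max(conf, 0.0), 1.0),  # clamp to the range requested in the prompt
        'reason': reason.group(1).strip() if reason else response_text[:150],
    }


if __name__ == "__main__":
    reply = "VERDICT: Dubious\nCONFIDENCE: 0.7\nREASON: No source given.\n"
    print(parse_fact_check(reply))
```

In the commit itself a malformed number raises inside the `try` block and the method returns the `'error'` verdict instead; which behaviour is preferable depends on how the verdict is displayed downstream.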
347
transcribe_duil_linux_old.py
Normal file
@@ -0,0 +1,347 @@
#!/usr/bin/env python3
"""
Real-time transcription with dual audio capture (microphone + speaker monitor).
Linux/PipeWire optimized with Ollama LLM fact-checking.
"""

import sounddevice as sd
import numpy as np
import threading
import queue
import time
import argparse
from datetime import datetime
from faster_whisper import WhisperModel

try:
    import ollama
    OLLAMA_AVAILABLE = True
except ImportError:
    OLLAMA_AVAILABLE = False


class DualAudioCapture:
    """Capture both microphone and speaker output simultaneously"""

    def __init__(self, mic_device=None, monitor_device=None, sample_rate=16000, chunk_size=2048):
        self.sample_rate = sample_rate
        self.chunk_size = chunk_size
        self.audio_queue = queue.Queue()

        # Find devices
        devices = sd.query_devices()

        # Microphone (default input or specified)
        if mic_device is None:
            self.mic_device = sd.default.device[0]  # Default input
        else:
            self.mic_device = self._find_device(mic_device, input_required=True)

        # Monitor/Loopback (for speaker output)
        if monitor_device:
            self.monitor_device = self._find_device(monitor_device, input_required=True)
        else:
            self.monitor_device = None

        print(f"✓ Microphone: {devices[self.mic_device]['name']} (index {self.mic_device})")
        # Compare against None: device index 0 is valid but falsy
        if self.monitor_device is not None:
            print(f"✓ Monitor: {devices[self.monitor_device]['name']} (index {self.monitor_device})")
        else:
            print("⚠ No monitor device - capturing microphone only")

        # Start streams
        self.mic_stream = sd.InputStream(
            device=self.mic_device,
            channels=1,
            samplerate=sample_rate,
            blocksize=chunk_size,
            dtype='int16',
            callback=self._mic_callback
        )

        # Compare against None: device index 0 is valid but falsy
        if self.monitor_device is not None:
            self.monitor_stream = sd.InputStream(
                device=self.monitor_device,
                channels=1,
                samplerate=sample_rate,
                blocksize=chunk_size,
                dtype='int16',
                callback=self._monitor_callback
            )
        else:
            self.monitor_stream = None

        self.mic_stream.start()
        if self.monitor_stream:
            self.monitor_stream.start()

        print("✓ Audio capture started")

    def _find_device(self, device_name, input_required=True):
        """Find device by name substring"""
        devices = sd.query_devices()
        for i, dev in enumerate(devices):
            if device_name.lower() in dev['name'].lower():
                if not input_required or dev['max_input_channels'] > 0:
                    return i
        raise RuntimeError(f"Device '{device_name}' not found")

    def _mic_callback(self, indata, frames, time_info, status):
        """Microphone audio callback"""
        if status:
            print(f"⚠ Mic status: {status}")
        self.audio_queue.put(('mic', indata.copy()))

    def _monitor_callback(self, indata, frames, time_info, status):
        """Monitor/speaker audio callback"""
        if status:
            print(f"⚠ Monitor status: {status}")
        self.audio_queue.put(('monitor', indata.copy()))

    def read_chunk(self):
        """Read audio data from queue"""
        try:
            return self.audio_queue.get(timeout=0.05)
        except queue.Empty:
            return None

    def close(self):
        """Cleanup resources"""
        self.mic_stream.stop()
        self.mic_stream.close()
        if self.monitor_stream:
            self.monitor_stream.stop()
            self.monitor_stream.close()


class WhisperTranscriber:
    """Process audio with Whisper"""

    def __init__(self, model_name="base", language="en", force_cpu=False):
        print(f"Loading Whisper model '{model_name}'...")

        import torch
        has_cuda = torch.cuda.is_available() and not force_cpu

        device = "cpu"
        compute_type = "int8"

        if has_cuda:
            try:
                import ctranslate2
                if ctranslate2.get_cuda_device_count() > 0:
                    device = "cuda"
                    compute_type = "float16"
                    print(f"✓ Using GPU: {torch.cuda.get_device_name(0)}")
            except Exception as e:
                print(f"⚠ CUDA unavailable: {e}")

        if device == "cpu":
            print("✓ Using CPU")

        model_kwargs = {"device": device, "compute_type": compute_type}
        if device == "cpu":
            model_kwargs["cpu_threads"] = 4

        self.model = WhisperModel(model_name, **model_kwargs)
        self.language = language
        self.mic_buffer = np.array([], dtype=np.float32)
        self.monitor_buffer = np.array([], dtype=np.float32)
        self.lock = threading.Lock()

    def add_audio(self, source, audio_chunk):
        """Add audio to appropriate buffer"""
        with self.lock:
            audio_float = audio_chunk.flatten().astype(np.float32) / 32768.0
            if source == 'mic':
                self.mic_buffer = np.concatenate([self.mic_buffer, audio_float])
            else:
                self.monitor_buffer = np.concatenate([self.monitor_buffer, audio_float])

    def transcribe_chunk(self, min_duration=3.0):
        """Transcribe accumulated audio"""
        with self.lock:
            mic_duration = len(self.mic_buffer) / 16000
            monitor_duration = len(self.monitor_buffer) / 16000

            results = {}

            # Transcribe microphone
            if mic_duration >= min_duration:
                mic_audio = self.mic_buffer.copy()
                self.mic_buffer = np.array([], dtype=np.float32)
                results['mic'] = self._transcribe(mic_audio)

            # Transcribe monitor
            if monitor_duration >= min_duration:
                monitor_audio = self.monitor_buffer.copy()
                self.monitor_buffer = np.array([], dtype=np.float32)
                results['monitor'] = self._transcribe(monitor_audio)

            return results if results else None

    def _transcribe(self, audio):
        """Internal transcription"""
        try:
            segments, _ = self.model.transcribe(
                audio,
                language=self.language,
                beam_size=3,  # Faster than default 5
                vad_filter=True,
                vad_parameters=dict(min_silence_duration_ms=500)
            )
            text = " ".join([seg.text for seg in segments]).strip()
            return text if text else None
        except Exception as e:
            print(f"❌ Transcription error: {e}")
            return None


class LLMFactChecker:
    """Fast fact-checking with Ollama"""

    def __init__(self, model="qwen2.5:3b"):
        if not OLLAMA_AVAILABLE:
            raise RuntimeError("Ollama not installed: pip install ollama")

        self.model = model
        try:
            ollama.list()
            print(f"✓ Ollama connected: {self.model}")
        except Exception as e:
            raise RuntimeError(f"Ollama not running: {e}")

    def fact_check(self, text):
        """Quick fact-check"""
        prompt = f"""Fact-check this statement. Reply ONLY with:
VERDICT: factual/dubious/false
CONFIDENCE: 0.0-1.0
REASON: one sentence

Statement: "{text}" """

        try:
            response = ollama.generate(
                model=self.model,
                prompt=prompt,
                options={"temperature": 0.1, "num_predict": 80}
            )

            import re
            text = response['response']

            verdict = re.search(r'VERDICT:\s*(\w+)', text, re.I)
            confidence = re.search(r'CONFIDENCE:\s*([\d.]+)', text, re.I)
            reason = re.search(r'REASON:\s*(.+?)(?:\n|$)', text, re.I | re.DOTALL)

            return {
                'verdict': verdict.group(1).lower() if verdict else 'unknown',
                'confidence': float(confidence.group(1)) if confidence else 0.5,
                'reason': reason.group(1).strip() if reason else text[:150]
            }
        except Exception as e:
            return {'verdict': 'error', 'confidence': 0.0, 'reason': str(e)}


def main():
    parser = argparse.ArgumentParser(description="Dual audio transcription with fact-checking")
    parser.add_argument("--model", default="tiny", choices=["tiny", "base", "small", "medium"],
                        help="Whisper model (default: tiny for speed)")
    parser.add_argument("--language", default="en", help="Language code")
    parser.add_argument("--mic", help="Microphone device name (partial match)")
    parser.add_argument("--monitor", help="Monitor device name for speaker capture")
    parser.add_argument("--interval", type=float, default=5.0, help="Processing interval (seconds)")
    parser.add_argument("--min-duration", type=float, default=2.0, help="Min audio duration")
    parser.add_argument("--enable-llm", action="store_true", help="Enable fact-checking")
    parser.add_argument("--llm-model", default="qwen2.5:3b", help="Ollama model")
    parser.add_argument("--list-devices", action="store_true", help="List audio devices")
    parser.add_argument("--force-cpu", action="store_true", help="Force CPU")

    args = parser.parse_args()

    if args.list_devices:
        print("\nAvailable audio devices:")
        for i, dev in enumerate(sd.query_devices()):
            in_ch = dev['max_input_channels']
            out_ch = dev['max_output_channels']
            if in_ch > 0:
                print(f" [{i:2d}] {dev['name']:<50} IN:{in_ch} OUT:{out_ch}")
        return

    print("=== Dual Audio Transcription with Fact-Checking ===")
    print(f"Model: {args.model} | Language: {args.language} | Interval: {args.interval}s")

    # Initialize capture
    try:
        capturer = DualAudioCapture(
            mic_device=args.mic,
            monitor_device=args.monitor,
            sample_rate=16000,
            chunk_size=2048
        )
    except Exception as e:
        print(f"\n❌ Audio Error: {e}")
        print("\nTip: Use --list-devices to see available devices")
        print(" Use --mic and --monitor to specify devices")
        return

    # Initialize transcriber
    try:
        transcriber = WhisperTranscriber(
            model_name=args.model,
            language=args.language,
            force_cpu=args.force_cpu
        )
    except Exception as e:
        print(f"\n❌ Whisper Error: {e}")
        return

    # Initialize fact checker
    fact_checker = None
    if args.enable_llm:
        try:
            fact_checker = LLMFactChecker(model=args.llm_model)
        except Exception as e:
            print(f"\n⚠ LLM Error: {e}")
            print("Continuing without fact-checking...")

    # Main loop
    print(f"\n✅ Started. Press Ctrl+C to stop.\n{'='*60}")
    last_process = time.time()

    try:
        while True:
            # Collect audio
            chunk = capturer.read_chunk()
            if chunk:
                source, audio = chunk
                transcriber.add_audio(source, audio)

            # Process at intervals
            if time.time() - last_process >= args.interval:
                results = transcriber.transcribe_chunk(min_duration=args.min_duration)

                if results:
                    timestamp = datetime.now().strftime("%H:%M:%S")

                    for source, text in results.items():
                        if text:
                            source_emoji = "🎤" if source == 'mic' else "🔊"
                            print(f"\n{source_emoji} [{timestamp}] {text}")

                            if fact_checker:
                                fc = fact_checker.fact_check(text)
                                verdict_emoji = {'factual': '✅', 'dubious': '⚠️', 'false': '❌'}.get(fc['verdict'], '❓')
                                print(f" {verdict_emoji} {fc['verdict'].upper()} ({fc['confidence']:.2f}): {fc['reason']}")

                last_process = time.time()

    except KeyboardInterrupt:
        print(f"\n{'='*60}\n🛑 Stopping...")

    capturer.close()
    print("\n✅ Done!")


if __name__ == "__main__":
    main()
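`WhisperTranscriber.add_audio` scales each `int16` chunk to floats in [-1.0, 1.0) by dividing by 32768, and `transcribe_chunk` only hands a buffer to Whisper once it holds at least `min_duration` seconds of samples. The following is a rough, dependency-free sketch of that buffering rule. `SourceBuffer` is a hypothetical class that uses plain lists in place of the NumPy arrays in the script, so it illustrates the logic rather than the actual implementation.

```python
class SourceBuffer:
    """Hypothetical stand-in for one of the per-source buffers (mic or monitor)."""

    def __init__(self, sample_rate=16000):
        self.sample_rate = sample_rate
        self.samples = []

    def add(self, int16_chunk):
        """Append a chunk of int16 samples, scaled to floats in [-1.0, 1.0)."""
        self.samples.extend(s / 32768.0 for s in int16_chunk)

    def take(self, min_duration):
        """Return and clear the buffer if it holds at least min_duration seconds, else None."""
        if len(self.samples) / self.sample_rate < min_duration:
            return None
        audio, self.samples = self.samples, []
        return audio


if __name__ == "__main__":
    buf = SourceBuffer(sample_rate=16000)
    buf.add([0] * 16000)          # 1 second of silence
    print(buf.take(2.0))          # not enough audio yet, so nothing is returned
    buf.add([0] * 16000)          # now 2 seconds are buffered
    print(len(buf.take(2.0)))     # the whole buffer is handed over and reset
```

Handing over the whole buffer and resetting it mirrors what the script does; audio shorter than `min_duration` simply waits for the next interval.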