Compare commits `36852dde18...main` · 4 commits

- `03b018a582`
- `7cc14326fc`
- `4343b7a5a2`
- `a53c0e2902`

**QUICK_START.md** · Normal file · 156 changes · `@@ -0,0 +1,156 @@`

# Quick Start Guide

## 1. Set Up Audio Devices

```bash
# List available audio devices
./run_transcribe.sh --list-devices
```

Find the names of your:
- **Microphone** - Your input device (e.g., "USB Microphone")
- **Monitor** - Speaker capture device (e.g., "Monitor of Built-in Audio")
|
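`--mic` and `--monitor` take a device name, and the previous README described device names as a partial match. The sketch below shows one way such a lookup could work. It assumes device records shaped like the dicts returned by `sounddevice.query_devices()` (keys `name` and `max_input_channels`); for the full list from that call, a record's position equals its device index. `find_device` is a hypothetical helper, not a function from `transcribe_speakers.py`.

```python
from typing import Iterable, Mapping, Optional


def find_device(
    devices: Iterable[Mapping],
    fragment: str,
    need_input: bool = True,
) -> Optional[int]:
    """Return the position of the first device whose name contains `fragment`.

    Matching is case-insensitive. With need_input=True, devices that cannot
    record (max_input_channels == 0) are skipped, since in this sketch both the
    microphone and the speaker monitor are recorded from.
    """
    wanted = fragment.casefold()
    for position, device in enumerate(devices):
        if need_input and device.get("max_input_channels", 0) <= 0:
            continue
        if wanted in device["name"].casefold():
            return position
    return None


if __name__ == "__main__":
    # Stand-in for sounddevice.query_devices(); pass the real list when available.
    devices = [
        {"name": "Built-in Audio Analog Stereo", "max_input_channels": 0},
        {"name": "USB Microphone", "max_input_channels": 1},
        {"name": "Monitor of Built-in Audio", "max_input_channels": 2},
    ]
    print(find_device(devices, "USB Mic"))
    print(find_device(devices, "monitor"))
    print(find_device(devices, "Speakers"))
```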
---

## 2. Basic Usage

### Simple Transcription
```bash
# Auto-detect devices
./run_transcribe.sh --model medium --language en

# Specify devices
./run_transcribe.sh --mic "USB Mic" --monitor "Monitor"
```

### With File Output
```bash
./run_transcribe.sh --model medium --language en --output transcript.txt
```

### With LLM Analysis
```bash
./run_transcribe.sh --model medium --enable-llm --output enriched.txt
```
|
---

## 3. Language Examples

### Dutch (Nederlands)
```bash
./run_transcribe.sh --model medium --language nl --enable-llm
```

### Spanish
```bash
./run_transcribe.sh --model medium --language es
```

### French
```bash
./run_transcribe.sh --model medium --language fr
```

### German
```bash
./run_transcribe.sh --model medium --language de
```
|
---

## 4. Model Selection

| Model  | Speed    | Quality | Command                            |
|--------|----------|---------|------------------------------------|
| tiny   | Fastest  | Basic   | `--model tiny`                     |
| base   | Fast     | Good    | `--model base`                     |
| small  | Moderate | Better  | `--model small`                    |
| medium | Slow     | Great   | `--model medium` **(recommended)** |
| large  | Slowest  | Best    | `--model large`                    |
|
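Whichever size you pick is handed to faster-whisper, which the README lists as the transcription engine, together with 16 kHz mono 16-bit PCM audio and `beam_size=3`. The sketch below shows one way to load a model and transcribe a captured chunk. The helper names and the `int8` (CPU) / `float16` (GPU) compute types are assumptions, not taken from `transcribe_speakers.py`. Load the model once and reuse it for every chunk, because loading is the slow part.

```python
from array import array


def pcm16_to_float(pcm: bytes) -> list[float]:
    """Convert native-endian signed 16-bit PCM to floats in [-1.0, 1.0)."""
    samples = array("h")
    samples.frombytes(pcm)  # raises ValueError on an odd byte count
    return [s / 32768.0 for s in samples]


def pick_backend(force_cpu: bool, cuda_available: bool) -> tuple[str, str]:
    """Return (device, compute_type) for faster_whisper.WhisperModel."""
    if force_cpu or not cuda_available:
        return "cpu", "int8"
    return "cuda", "float16"


def load_model(size: str = "medium", force_cpu: bool = False, cuda_available: bool = True):
    # Imported lazily so the helpers above work without faster-whisper installed.
    from faster_whisper import WhisperModel

    device, compute_type = pick_backend(force_cpu, cuda_available)
    return WhisperModel(size, device=device, compute_type=compute_type)


def transcribe_chunk(model, pcm: bytes, language: str = "en") -> str:
    """Transcribe one chunk of 16 kHz mono 16-bit PCM audio."""
    import numpy as np

    audio = np.asarray(pcm16_to_float(pcm), dtype=np.float32)
    segments, _info = model.transcribe(audio, language=language, beam_size=3)
    # `segments` is a lazy generator; iterating it runs the decoding.
    return " ".join(segment.text.strip() for segment in segments)
```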
---

## 5. Optimization Tips

### High Quality Transcription
```bash
./run_transcribe.sh --model large --interval 8 --min-duration 4
```

### Fast Real-Time
```bash
./run_transcribe.sh --model tiny --interval 3 --min-duration 2
```

### Best Dutch Transcription (Your Setup)
```bash
./run_transcribe.sh --model medium --interval 8 --min-duration 4 --enable-llm --language nl
```
|
---

## 6. LLM Configuration

### Default Model (qwen2.5:3b - Fast)
```bash
./run_transcribe.sh --enable-llm
```

### Larger Model (Better Analysis)
```bash
# Install model first
ollama pull llama3.2

# Use it
./run_transcribe.sh --enable-llm --llm-model llama3.2
```
|
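With `--enable-llm`, each transcript segment is sent to a local Ollama model. A minimal sketch of such a request follows, using Ollama's REST endpoint `POST /api/generate` on its default port 11434 with streaming turned off. The prompt wording and the `build_payload`/`analyze` names are hypothetical; only the FACTUAL / NOT_FACTUAL / DUBIOUS labels mirror the sample outputs in this repository.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local address

PROMPT_TEMPLATE = (
    "Fact-check the statement below. Answer with one label (FACTUAL, "
    "NOT_FACTUAL or DUBIOUS), a confidence between 0 and 1, a one-sentence "
    "explanation, and three follow-up questions.\n\nStatement: {text}"
)


def build_payload(model: str, text: str) -> dict:
    """Request body for a single, non-streamed completion."""
    return {
        "model": model,
        "prompt": PROMPT_TEMPLATE.format(text=text),
        "stream": False,  # return one JSON object instead of a token stream
    }


def analyze(text: str, model: str = "qwen2.5:3b", timeout: float = 60.0) -> str:
    """Send one transcript segment to Ollama and return the raw reply text."""
    body = json.dumps(build_payload(model, text)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["response"]


if __name__ == "__main__":
    print(analyze("The Earth orbits the Sun in 365 days.", model="llama3.2"))
```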
---

## 7. Output Examples

### Console Output
```
🎤 [14:23:15] User speaking via microphone
🔊 [14:23:20] Audio from speakers

🎤 [14:23:25] The Earth orbits the Sun in 365 days.
✅ FACTUAL (0.98): Scientifically accurate.
❓ Questions:
1. Why do we need leap years?
2. How does orbital speed vary?
3. What affects Earth's orbit?
```

### File Output
With `--output FILE`, the transcript is saved to that file (e.g. `transcript.txt`) with timestamps and, when `--enable-llm` is set, the LLM analysis.
|
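The README's File Output sample and the removed `enriched.txt` share a line-oriented layout: `[HH:MM:SS] SOURCE: text`, a `📊 Fact Check:` line, then numbered questions. To post-process such files, a small parser is enough. This sketch assumes exactly that layout (older files omit the `MIC:`/`SPEAKER:` tag), and `parse_transcript`/`Entry` are hypothetical names.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

# Patterns for the README's File Output layout (assumed, not taken from the code).
LINE_RE = re.compile(r"^\[(\d{2}:\d{2}:\d{2})\] (?:(MIC|SPEAKER): )?(.*)$")
FACT_RE = re.compile(r"^📊 Fact Check: (\w+) \(confidence: ([0-9.]+)\)")
QUESTION_RE = re.compile(r"^\d+\.\s+(.*)$")


@dataclass
class Entry:
    time: str
    source: Optional[str]  # "MIC", "SPEAKER", or None for untagged lines
    text: str
    verdict: Optional[str] = None
    confidence: Optional[float] = None
    questions: list[str] = field(default_factory=list)


def parse_transcript(text: str) -> list[Entry]:
    """Group timestamped lines with the fact-check and questions that follow them."""
    entries: list[Entry] = []
    for raw in text.splitlines():
        line = raw.strip()
        if m := LINE_RE.match(line):
            entries.append(Entry(time=m.group(1), source=m.group(2), text=m.group(3)))
        elif entries and (m := FACT_RE.match(line)):
            entries[-1].verdict = m.group(1)
            entries[-1].confidence = float(m.group(2))
        elif entries and entries[-1].verdict and (m := QUESTION_RE.match(line)):
            entries[-1].questions.append(m.group(1))
        # Separators, blank lines, 💡 explanations and headers are skipped.
    return entries
```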
---

## 8. Controls

- **Ctrl+C** - Stop transcription
- Processing happens every `--interval` seconds (default: 5s)
- At least `--min-duration` seconds of audio are required before a chunk is transcribed (default: 2s)
|
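One plausible reading of how the two timing options interact: every `--interval` seconds the buffered audio is checked, and it is only handed to Whisper once at least `--min-duration` seconds have accumulated. The sketch below implements that rule with the clock passed in explicitly so it can run without audio hardware; `IntervalChunker` is hypothetical. In a live loop you would call `poll(time.monotonic())` and transcribe any chunk it returns.

```python
from typing import Optional


class IntervalChunker:
    """Hand buffered audio to the transcriber on a fixed schedule (sketch)."""

    def __init__(self, interval: float = 5.0, min_duration: float = 2.0,
                 sample_rate: int = 16_000) -> None:
        self.interval = interval          # --interval, seconds between checks
        self.min_duration = min_duration  # --min-duration, seconds of audio needed
        self.sample_rate = sample_rate    # samples per second of mono audio
        self._buffer: list[int] = []
        self._last_check: Optional[float] = None

    def feed(self, block: list[int]) -> None:
        """Append one block of samples, e.g. from a capture callback."""
        self._buffer.extend(block)

    def poll(self, now: float) -> Optional[list[int]]:
        """Return a chunk to transcribe, or None if nothing is due yet."""
        if self._last_check is None:
            self._last_check = now  # first call starts the schedule
            return None
        if now - self._last_check < self.interval:
            return None
        self._last_check = now
        if len(self._buffer) < self.min_duration * self.sample_rate:
            return None  # too little audio: keep it and check again next interval
        chunk, self._buffer = self._buffer, []
        return chunk
```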
---

## Troubleshooting

**No devices found:**
```bash
./run_transcribe.sh --list-devices
```

**Ollama errors:**
```bash
ollama serve
ollama pull qwen2.5:3b
```

**Force CPU (GPU issues):**
```bash
./run_transcribe.sh --force-cpu
```
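Both Ollama failure modes covered by the commands above (server not running, model not pulled) can be detected before a session starts. This sketch queries Ollama's `GET /api/tags` endpoint, which lists locally pulled models. The helper names are hypothetical; treating a bare name such as `llama3.2` as `llama3.2:latest` follows Ollama's default tag.

```python
import json
import urllib.request

OLLAMA_BASE = "http://localhost:11434"  # Ollama's default local address


def installed_models(tags: dict) -> set[str]:
    """Model names from a GET /api/tags response body."""
    return {model["name"] for model in tags.get("models", [])}


def has_model(tags: dict, wanted: str) -> bool:
    """True if `wanted` is pulled; a bare name like "llama3.2" means ":latest"."""
    names = installed_models(tags)
    if wanted in names:
        return True
    return ":" not in wanted and f"{wanted}:latest" in names


def fetch_tags(base_url: str = OLLAMA_BASE, timeout: float = 5.0) -> dict:
    """Query the local Ollama server; raises OSError (URLError) if it is down."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    model = "qwen2.5:3b"
    try:
        tags = fetch_tags()
    except OSError:
        print("Ollama is not reachable; start it with: ollama serve")
    else:
        if has_model(tags, model):
            print(f"{model} is available")
        else:
            print(f"{model} is missing; run: ollama pull {model}")
```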
**README.md** · 243 changes

````diff
@@ -1,16 +1,15 @@
-# Verbatim Dicta
+# Verbatim Dicta2
 
-Real-time audio transcription using Whisper AI with optional LLM-powered analysis. Captures system audio via loopback and transcribes it with configurable models and processing options.
+Real-time audio transcription using Whisper AI with optional LLM analysis. Captures microphone and speaker audio simultaneously for comprehensive transcription.
 
 ## Features
 
-- Real-time transcription of system audio (Windows/Linux)
-- Multiple Whisper model sizes (tiny to large)
-- Multi-language support
-- **Sentence extraction mode** - Stitches audio chunks into complete sentences
-- Optional LLM analysis for fact-checking and question generation (via Ollama)
-- GPU acceleration support
-- Flexible audio device configuration
+- **Dual audio capture** - Record microphone and speaker output simultaneously
+- **Real-time transcription** - Process audio as it's captured with Whisper models
+- **LLM analysis** - Optional fact-checking and question generation via Ollama
+- **Multi-language** - Support for 50+ languages
+- **File output** - Save transcripts with timestamps and analysis
+- **GPU acceleration** - CUDA support for faster processing
 
 ## Quick Start
 
````
````diff
@@ -18,17 +17,14 @@ Real-time audio transcription using Whisper AI with optional LLM-powered analysi
 # Install dependencies
 pip install -r requirements.txt
 
-# Basic transcription (no LLM)
-python transcribe_speakers.py
-
-# With LLM analysis (optional)
-python transcribe_speakers.py --enable-llm
-
-# With sentence extraction
-python transcribe_speakers.py --sentence-mode
-
 # List audio devices
-python transcribe_speakers.py --list-devices
+./run_transcribe.sh --list-devices
+
+# Basic transcription
+./run_transcribe.sh --model medium --language en
+
+# With LLM analysis and file output
+./run_transcribe.sh --model medium --enable-llm --output transcript.txt
 ```
 
 ## Requirements
````
````diff
@@ -58,164 +54,153 @@ For CUDA 12.1:
 pip install torch==2.8.0+cu121 --index-url https://download.pytorch.org/whl/cu121
 ```
 
-### 3. Audio Loopback Setup
+### 3. Audio Setup
 
-**Windows - Option A (Stereo Mix):**
-1. Right-click speaker icon → Sounds → Recording tab
-2. Right-click → Show Disabled Devices
-3. Enable and set Stereo Mix as default
+**Linux (PulseAudio/PipeWire):**
+```bash
+# List devices to find your monitor device
+./run_transcribe.sh --list-devices
 
-**Windows - Option B (VB-Cable, recommended):**
-1. Download from [vb-audio.com](https://vb-audio.com/Cable/)
-2. Install and restart
-3. Use `--device "CABLE Output"`
+# Use with monitor device
+./run_transcribe.sh --monitor "alsa_output.monitor"
+```
 
-**Linux:**
-Configure PulseAudio loopback or use `transcribe_dual_linux.py`
+**Windows:**
+- Enable "Stereo Mix" in Sound settings, or
+- Install VB-Cable from [vb-audio.com](https://vb-audio.com/Cable/)
 
-### 4. LLM Features (Optional)
+### 4. LLM Support (Optional)
 
 ```bash
 # Install Ollama from ollama.ai
-ollama pull llama3.2
+ollama pull qwen2.5:3b
 ```
 
 ## Usage
 
-### Available Scripts
+### Command Line Options
 
-- `transcribe_speakers.py` - Main script with all features (LLM optional via `--enable-llm`)
-- `transcribe_dual_linux.py` - Linux-specific with dual audio support
-
-### Common Commands
-
 ```bash
-# Specify device and model
-python transcribe_speakers.py --device "CABLE Output" --model medium
+python transcribe.py [OPTIONS]
 
-# Save to file with language
-python transcribe_speakers.py --language es --output transcript.txt
-
-# Fast mode (low latency)
-python transcribe_speakers.py --fast-mode --model tiny --interval 3
-
-# Extract complete sentences from chunks
-python transcribe_speakers.py --sentence-mode --output sentences.txt
-
-# Maximum accuracy with LLM and sentence extraction
-python transcribe_speakers.py --model large --enable-llm --sentence-mode --output enriched.txt
-
-# Force CPU (avoid GPU issues)
-python transcribe_speakers.py --force-cpu
+Options:
+  --model {tiny,base,small,medium,large}  Whisper model (default: tiny)
+  --language CODE                         Language code (default: en)
+  --mic DEVICE                            Microphone device name
+  --monitor DEVICE                        Speaker monitor device name
+  --interval SECONDS                      Processing interval (default: 5.0)
+  --min-duration SECONDS                  Minimum audio duration (default: 2.0)
+  --enable-llm                            Enable LLM analysis
+  --llm-model MODEL                       Ollama model (default: qwen2.5:3b)
+  --output FILE                           Save transcript to file
+  --force-cpu                             Force CPU processing
+  --list-devices                          List audio devices
 ```
 
-### Key Options
+### Examples
 
-| Option | Description | Default |
-|--------|-------------|---------|
-| `--model` | Model size: tiny/base/small/medium/large | base |
-| `--language` | Language code (en/es/fr/de/ja/etc.) | en |
-| `--device` | Audio device name (partial match) | Auto |
-| `--interval` | Processing interval (seconds) | 8.0 |
-| `--min-duration` | Minimum audio duration | 3.0 |
-| `--fast-mode` | Fast mode (3-5x faster, lower accuracy) | False |
-| `--enable-llm` | Enable fact-checking and questions | False |
-| `--llm-model` | Ollama model to use | llama3.2 |
-| `--output` | Save to file | None |
-| `--force-cpu` | Disable GPU | False |
-| `--gpu-index` | GPU device index | 0 |
-| `--sentence-mode` | Extract complete sentences from chunks | False |
+```bash
+# Dutch transcription with LLM
+./run_transcribe.sh --model medium --language nl --enable-llm
+
+# High-quality meeting transcription
+./run_transcribe.sh --model large --interval 8 --output meeting.txt
+
+# Fast real-time transcription
+./run_transcribe.sh --model tiny --interval 3 --min-duration 2
+
+# Specific devices
+./run_transcribe.sh --mic "USB Mic" --monitor "Monitor of Speakers"
+```
 
 ## Model Performance
 
-| Model | Size | Speed | Quality | Best For |
-|-------|------|-------|---------|----------|
-| tiny | ~75 MB | Fastest | Basic | Quick tests, low-latency |
-| base | ~145 MB | Fast | Good | General real-time use |
-| small | ~485 MB | Moderate | Better | Balanced accuracy/speed |
-| medium | ~1.5 GB | Slow | Great | High accuracy needs |
-| large | ~3 GB | Slowest | Best | Maximum accuracy |
+| Model | Size | Speed | Quality | Use Case |
+|--------|--------|----------|---------|------------------------|
+| tiny | 75 MB | Fastest | Basic | Real-time, low latency |
+| base | 145 MB | Fast | Good | General use |
+| small | 485 MB | Moderate | Better | Balanced |
+| medium | 1.5 GB | Slow | Great | High accuracy |
+| large | 3 GB | Slowest | Best | Maximum quality |
 
-## Optimization Presets
-
-**Low Latency (Real-Time):**
-```bash
-python transcribe_speakers.py --model tiny --fast-mode --interval 2 --min-duration 1.5
-```
-
-**Balanced:**
-```bash
-python transcribe_speakers.py --model base --interval 5
-```
-
-**High Accuracy:**
-```bash
-python transcribe_speakers.py --model large --interval 10 --enable-llm
-```
-
 ## Troubleshooting
 
-**No loopback device:**
-- Windows: Enable Stereo Mix or install VB-Cable
-- Linux: Configure PulseAudio loopback
+**No audio devices found:**
+```bash
+# List all devices
+./run_transcribe.sh --list-devices
 
+# Specify devices explicitly
+./run_transcribe.sh --mic "device_name" --monitor "monitor_name"
+```
+
 **CUDA errors:**
 ```bash
-python transcribe_speakers.py --force-cpu
+# Force CPU processing
+./run_transcribe.sh --force-cpu
 ```
 
-**No audio captured:**
-- Verify audio is playing
-- Check device: `--list-devices`
-- Increase system volume
+**Ollama connection failed:**
+```bash
+# Start Ollama service
+ollama serve
 
-**Poor quality:**
-- Use larger model: `--model medium`
+# Pull required model
+ollama pull qwen2.5:3b
+```
+
+**Poor transcription quality:**
+- Use larger model: `--model medium` or `--model large`
 - Increase interval: `--interval 10`
-- Specify language: `--language <code>`
+- Specify language: `--language nl`
+- Ensure good audio quality (reduce background noise)
 
-**Ollama errors:**
-- Ensure Ollama is running
-- Pull model: `ollama pull llama3.2`
-
 ## Output Format
 
-**Standard:**
+### Standard Output
 ```
-[14:23:15] Transcribed audio segment.
-[14:23:23] Another segment with timestamp.
+🎤 [14:23:15] User speaking into microphone
+🔊 [14:23:18] Audio from speakers or system
 ```
 
-**With LLM (--enable-llm):**
+### With LLM Analysis
 ```
+🎤 [14:23:15] The Earth orbits the Sun in 365 days.
+✅ FACTUAL (0.98): Scientifically accurate orbital period.
+❓ Questions:
+1. Why do we need leap years?
+2. How does the elliptical orbit affect seasons?
+3. What factors influence Earth's orbital velocity?
+```
+
+### File Output
+```
+[14:23:15] MIC: User speaking into microphone
+[14:23:18] SPEAKER: Audio from speakers
+
 ======================================================================
-[14:23:15] The Earth revolves around the Sun in 365 days.
+[14:23:25] MIC: The Earth orbits the Sun in 365 days.
 
 📊 Fact Check: FACTUAL (confidence: 0.98)
-💡 Scientifically accurate. Earth's orbital period is 365.25 days.
+💡 Scientifically accurate orbital period.
 
 ❓ Questions:
 1. Why do we need leap years?
-2. How does Earth's orbit affect seasons?
-======================================================================
+2. How does the elliptical orbit affect seasons?
+3. What factors influence Earth's orbital velocity?
 ```
 
-## Technical Stack
+## Architecture
 
-- **Audio**: sounddevice, soundfile (16kHz mono, 16-bit PCM)
-- **Transcription**: faster-whisper (optimized Whisper)
-- **LLM**: Ollama (local inference)
-- **Capture**: WASAPI loopback (Windows), PulseAudio (Linux)
+- **Audio Capture**: sounddevice with dual-stream support
+- **Transcription**: faster-whisper (optimized Whisper implementation)
+- **LLM**: Ollama for local inference
+- **Format**: 16kHz mono, 16-bit PCM
+- **Processing**: Independent mic/speaker buffers with beam_size=3
 
-## Future Work
+## Contributing
 
-- Real-time streaming transcription with reduced buffering
-- Speaker diarization improvements
-- Web interface for remote monitoring
-- Multi-device simultaneous transcription
-- Cloud LLM integration options
-- Custom vocabulary and domain adaptation
-- Noise reduction preprocessing
+Contributions welcome! Please open issues or submit pull requests.
 
 ## License
 
````
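The updated README documents the command line as a plain option list. Expressed with `argparse`, that list looks roughly like the sketch below. It reproduces only the documented names and defaults, plus `--sentence-mode`, which the `RUN_*.sh` scripts pass but the new option list omits (its help text comes from the previous README). It is not the project's actual parser.

```python
import argparse
from typing import Optional, Sequence


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Real-time transcription (sketch of the documented options)"
    )
    parser.add_argument("--model", choices=["tiny", "base", "small", "medium", "large"],
                        default="tiny", help="Whisper model (default: tiny)")
    parser.add_argument("--language", default="en", metavar="CODE",
                        help="Language code (default: en)")
    parser.add_argument("--mic", metavar="DEVICE", help="Microphone device name")
    parser.add_argument("--monitor", metavar="DEVICE", help="Speaker monitor device name")
    parser.add_argument("--interval", type=float, default=5.0, metavar="SECONDS",
                        help="Processing interval (default: 5.0)")
    parser.add_argument("--min-duration", type=float, default=2.0, metavar="SECONDS",
                        help="Minimum audio duration (default: 2.0)")
    parser.add_argument("--enable-llm", action="store_true", help="Enable LLM analysis")
    parser.add_argument("--llm-model", default="qwen2.5:3b", metavar="MODEL",
                        help="Ollama model (default: qwen2.5:3b)")
    parser.add_argument("--output", metavar="FILE", help="Save transcript to file")
    parser.add_argument("--force-cpu", action="store_true", help="Force CPU processing")
    parser.add_argument("--list-devices", action="store_true", help="List audio devices")
    # Passed by the RUN_*.sh scripts but absent from the README's option list.
    parser.add_argument("--sentence-mode", action="store_true",
                        help="Extract complete sentences from chunks")
    return parser


def parse(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    """Parse argv (or sys.argv[1:] when None) into an options namespace."""
    return build_parser().parse_args(argv)
```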
**RUN_DUTCH.sh** · Executable file · 19 changes · `@@ -0,0 +1,19 @@`

```bash
#!/bin/bash
# Dutch language transcription with GPU and sentence extraction

cd "$(dirname "$0")"

export LD_LIBRARY_PATH=".venv/lib/python3.13/site-packages/nvidia/cudnn/lib:.venv/lib/python3.13/site-packages/nvidia/cublas/lib:${LD_LIBRARY_PATH}"

echo "Starting Dutch transcription..."
echo "Speak in Dutch into your microphone"
echo "Press Ctrl+C to stop"
echo ""

.venv/bin/python3 transcribe_speakers.py \
    --sentence-mode \
    --language nl \
    --model base \
    --interval 5 \
    --min-duration 2 \
    "$@"
```
**RUN_DUTCH_LLM.sh** · Executable file · 22 changes · `@@ -0,0 +1,22 @@`

```bash
#!/bin/bash
# Dutch transcription with GPU, sentence extraction, and LLM analysis

cd "$(dirname "$0")"

export LD_LIBRARY_PATH=".venv/lib/python3.13/site-packages/nvidia/cudnn/lib:.venv/lib/python3.13/site-packages/nvidia/cublas/lib:${LD_LIBRARY_PATH}"

echo "Starting Dutch transcription with LLM analysis..."
echo "Using model: llama3.2:latest"
echo "Speak in Dutch into your microphone"
echo "Press Ctrl+C to stop"
echo ""

.venv/bin/python3 transcribe_speakers.py \
    --sentence-mode \
    --language nl \
    --model large \
    --interval 5 \
    --min-duration 2 \
    --enable-llm \
    --llm-model "llama3.2:latest" \
    "$@"
```
**RUN_GPU.sh** · Executable file · 13 changes · `@@ -0,0 +1,13 @@`

```bash
#!/bin/bash
# GPU-accelerated transcription with sentence extraction

cd "$(dirname "$0")"

export LD_LIBRARY_PATH=".venv/lib/python3.13/site-packages/nvidia/cudnn/lib:.venv/lib/python3.13/site-packages/nvidia/cublas/lib:${LD_LIBRARY_PATH}"

.venv/bin/python3 transcribe_speakers.py \
    --sentence-mode \
    --model large \
    --interval 5 \
    --min-duration 2 \
    "$@"
```
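All three scripts hard-code `.venv/lib/python3.13/site-packages/...` and append `${LD_LIBRARY_PATH}` after a colon, which leaves an empty entry when the variable is unset (the dynamic loader may treat an empty entry as the current directory). The sketch below derives the same cuDNN and cuBLAS directories from the installed `nvidia` packages instead, similar to the approach in faster-whisper's GPU installation notes, and only adds a separator when there is something to join. Saved under a hypothetical name such as `cuda_libs.py`, it could replace the hard-coded line with `export LD_LIBRARY_PATH="$(.venv/bin/python3 cuda_libs.py)"`.

```python
import os
from types import ModuleType
from typing import Iterable, Optional


def package_dir(module: ModuleType) -> str:
    """Directory of a package; works for regular and namespace packages."""
    return list(module.__path__)[0]


def join_library_path(new_dirs: Iterable[str], existing: Optional[str]) -> str:
    """Prepend new_dirs to an LD_LIBRARY_PATH value without an empty entry."""
    parts = [d for d in new_dirs if d]
    if existing:
        parts.append(existing)
    return ":".join(parts)


if __name__ == "__main__":
    # Installed by NVIDIA's cuDNN / cuBLAS pip wheels, the same site-packages
    # directories the scripts point at.
    import nvidia.cublas.lib
    import nvidia.cudnn.lib

    dirs = [package_dir(nvidia.cudnn.lib), package_dir(nvidia.cublas.lib)]
    print(join_library_path(dirs, os.environ.get("LD_LIBRARY_PATH")))
```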
226
enriched.txt
226
enriched.txt
@@ -1,226 +0,0 @@
|
|||||||
[23:31:46] So it helps us get back into a grounded information terrain and then also it requires us.
|
|
||||||
|
|
||||||
📊 Fact Check: NOT_FACTUAL (confidence: 0.90)
|
|
||||||
💡 The statement is a vague, non‑specific claim that cannot be verified against any factual evidence.
|
|
||||||
|
|
||||||
❓ Questions:
|
|
||||||
1. What specific processes or actions help us return to a grounded information terrain?
|
|
||||||
2. In what ways does this approach require us to change our current practices or mindset?
|
|
||||||
3. How does re-establishing a grounded information terrain impact the overall effectiveness of the project?
|
|
||||||
|
|
||||||
|
|
||||||
======================================================================
|
|
||||||
[23:31:54] to take the time to pay attention to information, really absorb it properly, and then to make decisions based on that. So we need to bring people into the process of
|
|
||||||
|
|
||||||
📊 Fact Check: NOT_FACTUAL (confidence: 0.99)
|
|
||||||
💡 The statement is an incomplete, non‑factual description of a process, not a verifiable claim.
|
|
||||||
|
|
||||||
❓ Questions:
|
|
||||||
1. What are the key points here?
|
|
||||||
2. What evidence supports this?
|
|
||||||
3. What are the implications?
|
|
||||||
|
|
||||||
|
|
||||||
======================================================================
|
|
||||||
[23:32:02] decision making and at the same time as part of that bring them into a terrain of really
|
|
||||||
|
|
||||||
📊 Fact Check: NOT_FACTUAL (confidence: 1.00)
|
|
||||||
💡 The statement is a nonsensical fragment and does not convey a verifiable factual claim.
|
|
||||||
|
|
||||||
❓ Questions:
|
|
||||||
1. What does the phrase "bring them into a terrain of really" refer to in the context of decision making?
|
|
||||||
2. How does the process of decision making simultaneously involve "bringing them into a terrain" as mentioned?
|
|
||||||
3. Can you explain how the concept of "terrain"?
|
|
||||||
|
|
||||||
|
|
||||||
======================================================================
|
|
||||||
[23:32:10] curing and discerning information properly and then engage in the decision-making process. That's the only way we're actually going to turn this around. It's not going to be good enough to...
|
|
||||||
|
|
||||||
📊 Fact Check: NOT_FACTUAL (confidence: 0.90)
|
|
||||||
💡 The fragment is an incomplete quote with no verifiable factual claim.
|
|
||||||
|
|
||||||
❓ Questions:
|
|
||||||
1. What does "curing and discerning information properly" entail in the context of this statement?
|
|
||||||
2. How does engaging in the decision‑making process contribute to turning the situation around?
|
|
||||||
3. Why is simply having information or a plan not sufficient according to the speaker?
|
|
||||||
|
|
||||||
|
|
||||||
======================================================================
|
|
||||||
[23:32:18] to elect new politicians because the underlying problem of the way we absorb, process and deal with information now remains. And the only way we can do that is actually to do it.
|
|
||||||
|
|
||||||
📊 Fact Check: NOT_FACTUAL (confidence: 0.95)
|
|
||||||
💡 The statement is an opinion about politics and information processing, not a verifiable factual claim.
|
|
||||||
|
|
||||||
❓ Questions:
|
|
||||||
1. What specific aspects of the way we absorb, process, and deal with information are identified as the underlying problem in the statement?
|
|
||||||
2. How does the statement justify the election of new politicians as a solution to the information-related issue it describes?
|
|
||||||
3. What practical steps or strategies does the statement imply we should take to "actually do it" in addressing the information problem?
|
|
||||||
|
|
||||||
|
|
||||||
======================================================================
|
|
||||||
[23:32:26] actually by bringing people in on a mass basis, having huge numbers of citizens, juries around the country on a regular basis for decisions we're making involving the
|
|
||||||
|
|
||||||
📊 Fact Check: NOT_FACTUAL (confidence: 0.90)
|
|
||||||
💡 The statement is incomplete and lacks context, making it impossible to verify its factual accuracy.
|
|
||||||
|
|
||||||
❓ Questions:
|
|
||||||
1. What are the key points here?
|
|
||||||
2. What evidence supports this?
|
|
||||||
3. What are the implications?
|
|
||||||
|
|
||||||
|
|
||||||
======================================================================
|
|
||||||
[23:32:35] public that's the only way we're going to be able to turn this around and not just think that okay let's just wait for another Kamala Harris or somebody like that to come along and win an election then
|
|
||||||
|
|
||||||
📊 Fact Check: NOT_FACTUAL (confidence: 0.95)
|
|
||||||
💡 The quoted phrase
|
|
||||||
|
|
||||||
❓ Questions:
|
|
||||||
1. What specific actions does the speaker believe are necessary?
|
|
||||||
2. Q1: What specific actions does the speaker believe are necessary?
|
|
||||||
3. What are the implications?
|
|
||||||
|
|
||||||
|
|
||||||
======================================================================
|
|
||||||
[23:32:42] we'll all be right and we'll be able to turn the clock back. It won't work like that. The problem is far too deep seated than that. So, yes, we are becoming
|
|
||||||
|
|
||||||
📊 Fact Check: NOT_FACTUAL (confidence: 0.90)
|
|
||||||
💡 The statement is a vague, incomplete fragment with no verifiable factual claim.
|
|
||||||
|
|
||||||
❓ Questions:
|
|
||||||
1. What specific problem is the speaker implying is "far too deep seated" to be solved by simply "turning the clock back"?
|
|
||||||
2. How does the speaker's claim that "we'll all be right" relate to the broader context or argument being presented?
|
|
||||||
3. In what ways might the statement "So, yes, we are becoming" reflect a shift in perspective or identity for the speaker or the audience?
|
|
||||||
|
|
||||||
|
|
||||||
======================================================================
|
|
||||||
[23:32:50] That's basically what's going on at the moment. But that doesn't mean that we can lose hope, because there are mechanisms in which we can actually turn that around.
|
|
||||||
|
|
||||||
📊 Fact Check: NOT_FACTUAL (confidence: 0.90)
|
|
||||||
💡 The statement is a general, non‑specific claim that cannot be verified as true or false.
|
|
||||||
|
|
||||||
❓ Questions:
|
|
||||||
1. What specific situation or issue is being described as "what's going on at the moment"?
|
|
||||||
2. What mechanisms are being referred to that could help "turn that around"?
|
|
||||||
3. How does the speaker justify maintaining hope despite the current challenges?
|
|
||||||
|
|
||||||
|
|
||||||
======================================================================
|
|
||||||
[23:32:58] by actually engaging in the political process ourselves, which would force us to then utilise information in a different way.
|
|
||||||
|
|
||||||
📊 Fact Check: DUBIOUS (confidence: 0.70)
|
|
||||||
💡 The claim is a speculative assertion about how political engagement might change information use, and it cannot be verified as a factual statement.
|
|
||||||
|
|
||||||
❓ Questions:
|
|
||||||
1. What are the key points here?
|
|
||||||
2. What evidence supports this?
|
|
||||||
3. What are the implications?
|
|
||||||
|
|
||||||
|
|
||||||
======================================================================
|
|
||||||
[23:33:06] hopefully in the end come to different conclusions, but be part of that decision-making process too. So it's an important realization. What's happening to us is
|
|
||||||
|
|
||||||
📊 Fact Check: NOT_FACTUAL (confidence: 1.00)
|
|
||||||
💡 The statement is a subjective expression of hope and realization, not a verifiable factual claim.
|
|
||||||
|
|
||||||
❓ Questions:
|
|
||||||
1. What does the speaker mean by "hopefully in the end come to different conclusions" and how does that relate to the decision-making process mentioned?
|
|
||||||
2. In what ways might being part of the decision-making process influence the outcomes described in the statement?
|
|
||||||
3. What specific "important realization" is referenced, and how does it connect to "what's happening to us"?
|
|
||||||
|
|
||||||
|
|
||||||
======================================================================
|
|
||||||
[23:33:25] species in terms of our intelligence but it more importantly gives us a very important call to action. We need to think. We need to think. We need to think. We need to think. We need to think. We need to think. We need to think. We need to think. We need to think. We need to think. We need to think. We need to think. We need to think. We need to think. We need to think. We need to think. We need to think. We need to think. We need to think. We need to think. We need to think. We need to think. We need to think. We need to think. We need to think. We need to think. We need to think. We need to think. We need to think. We need to think.
|
|
||||||
|
|
||||||
📊 Fact Check: NOT_FACTUAL (confidence: 1.00)
|
|
||||||
💡 The statement is a nonsensical fragment that does not present any verifiable factual claim.
|
|
||||||
|
|
||||||
❓ Questions:
|
|
||||||
1. What is the main message conveyed by?
|
|
||||||
2. Q1: What is the main message conveyed by?
|
|
||||||
3. What are the implications?
|
|
||||||
|
|
||||||
|
|
||||||
======================================================================
|
|
||||||
[23:33:40] differently about how we govern ourselves going forward if we are to reverse this genuine decline. I hope you like my video. As a psychiatrist who loves politics and economics and philosophy I love to make videos like this and you can really help promote this video to other people and get it on your feed more by liking and commenting and subscribing to the video.

📊 Fact Check: NOT_FACTUAL (confidence: 1.00)
💡 The statement is a personal comment and request for promotion, not a claim that can be verified as true or false.

❓ Questions:
1. What specific strategies does the speaker propose for reversing the "genuine decline" in governance mentioned in the statement?
2. How does the speaker’s background as a psychiatrist influence their perspective on politics, economics, and philosophy?
3. In what ways does the speaker suggest viewers can effectively promote the video to reach a wider audience?


======================================================================
[23:33:46] well. We have a wonderful community of people here who comment and support each other through this very traumatic period of world history.

📊 Fact Check: DUBIOUS (confidence: 0.60)
💡 The claim is a subjective, unverified assertion about a community’s nature and cannot be confirmed or refuted with available evidence.

❓ Questions:
1. Who are the members of the community mentioned in the statement?
2. Which specific traumatic period of world history is being referred to?
3. In what ways do the community members comment and support each other during this period?


======================================================================
[23:33:54] that we're going through right now. I also hope you consider becoming a subscriber to the channel and also subscribing to my.

📊 Fact Check: NOT_FACTUAL (confidence: 1.00)
💡 The sentence is a fragment expressing a hope, not a verifiable factual claim.

❓ Questions:
1. What are the key points here?
2. What evidence supports this?
3. What are the implications?


======================================================================
[23:34:02] E-newsletter, there's a link in the description, and that way we can stay in touch outside the channel so you can stay, keep a

📊 Fact Check: NOT_FACTUAL (confidence: 0.90)
💡 [one sentence]"

❓ Questions:
1. What are the key points here?
2. What evidence supports this?
3. What are the implications?


======================================================================
[23:34:10] rest of all of the content that I'm making on an ongoing basis. The latest of which actually is my latest book called We the People.

📊 Fact Check: DUBIOUS (confidence: 0.50)
💡 The statement is a fragment with no verifiable context or evidence that the speaker’s latest book is titled *We the People*.

❓ Questions:
1. What are the key points here?
2. What evidence supports this?
3. What are the implications?


======================================================================
[23:34:19] very very proud of this book. It's actually a novel, a fiction book, written by myself and the famous award-winning author T.J. McGregor.

📊 Fact Check: NOT_FACTUAL (confidence: 0.90)
💡 No verifiable record exists of a novel co‑authored by the user and an award‑winning author

❓ Questions:
1. How did you and T.J. McGregor collaborate on the novel?
2. What inspired you to co-write a fiction book with an award‑winning author?
3. What genre and themes does the novel?


======================================================================
[23:34:27] Together we wrote a book about what the future might look like. Bit of a dystopian novel, but what might happen if autocracy goes to its next stage?

📊 Fact Check: NOT_FACTUAL (confidence: 0.90)
💡 There is no verifiable evidence that the speaker and the other person co‑authored a book on future dystopias.

❓ Questions:
1. What core themes and motifs did the book explore to envision the next stage of autocracy?
2. How does the narrative structure of the novel reflect the progression of authoritarian power in a dystopian future?
3. What real-world events or historical patterns inspired the authors to imagine a future where autocracy has evolved beyond its current form?
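The excerpt above appears to be an enriched transcript in the layout that `save_enriched_transcript` in transcribe.py writes. Each entry is separated by a 70-character `=` rule and followed by fact-check and numbered-question lines. The header lines here carry no MIC/SPEAKER label, so this sketch treats the label as optional. It is a minimal sketch for reading such a file back into records. `Entry` and `parse_enriched` are hypothetical names, and the line formats are inferred from the excerpt and the writer code, not from any documented spec.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

RULE = "=" * 70  # separator written before every entry

# "[HH:MM:SS] text", optionally "[HH:MM:SS] MIC: text" or "[HH:MM:SS] SPEAKER: text"
HEADER_RE = re.compile(r"^\[(\d{2}:\d{2}:\d{2})\]\s*(?:(MIC|SPEAKER):\s*)?(.*)$")
VERDICT_RE = re.compile(r"^📊 Fact Check:\s*(\S+)\s*\(confidence:\s*([\d.]+)\)")
QUESTION_RE = re.compile(r"^\d+\.\s*(.+)$")


@dataclass
class Entry:
    timestamp: str
    source: Optional[str]
    text: str
    verdict: Optional[str] = None
    confidence: Optional[float] = None
    reason: Optional[str] = None
    questions: list[str] = field(default_factory=list)


def parse_enriched(content: str) -> list[Entry]:
    """Split the file on the separator rule and pull the fields out of each block."""
    entries = []
    for block in content.split(RULE):
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        if not lines:
            continue
        header = HEADER_RE.match(lines[0])
        if header is None:
            continue  # not an entry, e.g. a stray fragment
        entry = Entry(timestamp=header.group(1), source=header.group(2), text=header.group(3))
        for ln in lines[1:]:
            if (m := VERDICT_RE.match(ln)) is not None:
                entry.verdict, entry.confidence = m.group(1), float(m.group(2))
            elif ln.startswith("💡"):
                entry.reason = ln.removeprefix("💡").strip()
            elif (m := QUESTION_RE.match(ln)) is not None:
                entry.questions.append(m.group(1))
        entries.append(entry)
    return entries
```

For example, `parse_enriched(open("enriched.txt", encoding="utf-8").read())` would return one `Entry` per block. `enriched.txt` is the output name used in the Quick Start's `--enable-llm` example. The `❓ Questions:` heading line is deliberately ignored, because the numbered lines carry the content.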
@@ -11,4 +11,4 @@ CUBLAS_PATH=".venv/lib/python3.13/site-packages/nvidia/cublas/lib"
 export LD_LIBRARY_PATH="${CUDNN_PATH}:${CUBLAS_PATH}:${LD_LIBRARY_PATH}"

 # Run the transcription script with all arguments
-python3 transcribe_dual_linux.py "$@"
+python3 transcribe.py "$@"
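Both run_transcribe.sh and start_transcribe.sh point LD_LIBRARY_PATH at `.venv/lib/python3.13/site-packages/nvidia/...`. They would stop finding cuDNN/cuBLAS if the virtualenv were built with a different Python version. The following is a minimal sketch of deriving those directories from the running interpreter instead. It assumes the `nvidia/cudnn/lib`, `nvidia/cublas/lib` and `nvidia/cuda_runtime/lib` layout that the scripts already use. The helper name `nvidia_library_path` and the file name `nvidia_paths.py` are hypothetical. The value has to be computed in the shell before `python3` starts, because the dynamic loader reads LD_LIBRARY_PATH at process start.

```python
import os
import sysconfig

# Sub-directories the shell scripts add, relative to site-packages.
NVIDIA_LIB_DIRS = ["nvidia/cudnn/lib", "nvidia/cublas/lib", "nvidia/cuda_runtime/lib"]


def nvidia_library_path(existing: str = "") -> str:
    """Build an LD_LIBRARY_PATH value for this interpreter's site-packages."""
    # With the venv activated, "purelib" resolves to the venv's site-packages,
    # whatever its Python version is.
    site_packages = sysconfig.get_paths()["purelib"]
    parts = [os.path.join(site_packages, sub) for sub in NVIDIA_LIB_DIRS]
    if existing:
        parts.append(existing)  # keep whatever was already on the path
    return os.pathsep.join(parts)


if __name__ == "__main__":
    # Shell usage, after `source .venv/bin/activate`:
    #   export LD_LIBRARY_PATH="$(python3 nvidia_paths.py)"
    print(nvidia_library_path(os.environ.get("LD_LIBRARY_PATH", "")))
```

The sketch only builds path strings. It does not check that the directories exist, so a missing `nvidia-*` wheel still surfaces as the same loader error as before.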

24
start_transcribe.sh
Executable file
@@ -0,0 +1,24 @@
#!/bin/bash
cd "$(dirname "$0")"

source .venv/bin/activate

# Set CUDA library paths for ctranslate2
export LD_LIBRARY_PATH=".venv/lib/python3.13/site-packages/nvidia/cudnn/lib:.venv/lib/python3.13/site-packages/nvidia/cublas/lib:.venv/lib/python3.13/site-packages/nvidia/cuda_runtime/lib:${LD_LIBRARY_PATH}"

# Run transcription with sentence mode and GPU
echo "Starting transcription with:"
echo " - Model: tiny (fast, good for testing)"
echo " - GPU mode (RTX 4060 Ti)"
echo " - Sentence extraction enabled"
echo " - Interval: 5 seconds"
echo ""
echo "Speak into your microphone to see transcription..."
echo "Press Ctrl+C to stop"
echo ""

python3 transcribe_speakers.py \
    --sentence-mode \
    --model tiny \
    --interval 5 \
    --min-duration 2
437
transcribe.py
Normal file
@@ -0,0 +1,437 @@
#!/usr/bin/env python3
"""
Real-time audio transcription with dual capture and optional LLM analysis.
Supports microphone + speaker monitor, file output, and fact-checking.
"""

import sounddevice as sd
import numpy as np
import threading
import queue
import time
import os
import argparse
from datetime import datetime
from faster_whisper import WhisperModel

try:
    import ollama
    OLLAMA_AVAILABLE = True
except ImportError:
    OLLAMA_AVAILABLE = False


class DualAudioCapture:
    """Capture both microphone and speaker output simultaneously"""

    def __init__(self, mic_device=None, monitor_device=None, sample_rate=16000, chunk_size=2048):
        self.sample_rate = sample_rate
        self.chunk_size = chunk_size
        self.audio_queue = queue.Queue()

        # Find devices
        devices = sd.query_devices()

        # Microphone (default input or specified)
        if mic_device is None:
            self.mic_device = sd.default.device[0]  # Default input
        else:
            self.mic_device = self._find_device(mic_device, input_required=True)

        # Monitor/Loopback (for speaker output)
        if monitor_device:
            self.monitor_device = self._find_device(monitor_device, input_required=True)
        else:
            self.monitor_device = None

        print(f"✓ Microphone: {devices[self.mic_device]['name']} (index {self.mic_device})")
        if self.monitor_device is not None:  # index 0 is a valid device
            print(f"✓ Monitor: {devices[self.monitor_device]['name']} (index {self.monitor_device})")
        else:
            print("⚠ No monitor device - capturing microphone only")

        # Start streams
        self.mic_stream = sd.InputStream(
            device=self.mic_device,
            channels=1,
            samplerate=sample_rate,
            blocksize=chunk_size,
            dtype='int16',
            callback=self._mic_callback
        )

        if self.monitor_device is not None:
            self.monitor_stream = sd.InputStream(
                device=self.monitor_device,
                channels=1,
                samplerate=sample_rate,
                blocksize=chunk_size,
                dtype='int16',
                callback=self._monitor_callback
            )
        else:
            self.monitor_stream = None

        self.mic_stream.start()
        if self.monitor_stream:
            self.monitor_stream.start()

        print("✓ Audio capture started")

    def _find_device(self, device_name, input_required=True):
        """Find device by name substring"""
        devices = sd.query_devices()
        for i, dev in enumerate(devices):
            if device_name.lower() in dev['name'].lower():
                if not input_required or dev['max_input_channels'] > 0:
                    return i
        raise RuntimeError(f"Device '{device_name}' not found")

    def _mic_callback(self, indata, frames, time_info, status):
        """Microphone audio callback"""
        if status:
            print(f"⚠ Mic status: {status}")
        self.audio_queue.put(('mic', indata.copy()))

    def _monitor_callback(self, indata, frames, time_info, status):
        """Monitor/speaker audio callback"""
        if status:
            print(f"⚠ Monitor status: {status}")
        self.audio_queue.put(('monitor', indata.copy()))

    def read_chunk(self):
        """Read audio data from queue"""
        try:
            return self.audio_queue.get(timeout=0.05)
        except queue.Empty:
            return None

    def close(self):
        """Cleanup resources"""
        self.mic_stream.stop()
        self.mic_stream.close()
        if self.monitor_stream:
            self.monitor_stream.stop()
            self.monitor_stream.close()


class WhisperTranscriber:
    """Process audio with Whisper"""

    def __init__(self, model_name="base", language="en", force_cpu=False):
        print(f"Loading Whisper model '{model_name}'...")

        import torch
        has_cuda = torch.cuda.is_available() and not force_cpu

        device = "cpu"
        compute_type = "int8"

        if has_cuda:
            try:
                import ctranslate2
                if ctranslate2.get_cuda_device_count() > 0:
                    device = "cuda"
                    compute_type = "float16"
                    print(f"✓ Using GPU: {torch.cuda.get_device_name(0)}")
            except Exception as e:
                print(f"⚠ CUDA unavailable: {e}")

        if device == "cpu":
            print("✓ Using CPU")

        model_kwargs = {"device": device, "compute_type": compute_type}
        if device == "cpu":
            model_kwargs["cpu_threads"] = 4

        self.model = WhisperModel(model_name, **model_kwargs)
        self.language = language
        self.mic_buffer = np.array([], dtype=np.float32)
        self.monitor_buffer = np.array([], dtype=np.float32)
        self.lock = threading.Lock()

    def add_audio(self, source, audio_chunk):
        """Add audio to appropriate buffer"""
        with self.lock:
            audio_float = audio_chunk.flatten().astype(np.float32) / 32768.0
            if source == 'mic':
                self.mic_buffer = np.concatenate([self.mic_buffer, audio_float])
            else:
                self.monitor_buffer = np.concatenate([self.monitor_buffer, audio_float])

    def transcribe_chunk(self, min_duration=3.0):
        """Transcribe accumulated audio"""
        with self.lock:
            mic_duration = len(self.mic_buffer) / 16000
            monitor_duration = len(self.monitor_buffer) / 16000

            results = {}

            # Transcribe microphone
            if mic_duration >= min_duration:
                mic_audio = self.mic_buffer.copy()
                self.mic_buffer = np.array([], dtype=np.float32)
                results['mic'] = self._transcribe(mic_audio)

            # Transcribe monitor
            if monitor_duration >= min_duration:
                monitor_audio = self.monitor_buffer.copy()
                self.monitor_buffer = np.array([], dtype=np.float32)
                results['monitor'] = self._transcribe(monitor_audio)

            return results if results else None

    def _transcribe(self, audio):
        """Internal transcription"""
        try:
            segments, _ = self.model.transcribe(
                audio,
                language=self.language,
                beam_size=3,
                vad_filter=True,
                vad_parameters=dict(min_silence_duration_ms=500)
            )
            text = " ".join([seg.text for seg in segments]).strip()
            return text if text else None
        except Exception as e:
            print(f"❌ Transcription error: {e}")
            return None


class LLMAnalyzer:
    """LLM analysis with fact-checking and question generation"""

    def __init__(self, model="qwen2.5:3b"):
        if not OLLAMA_AVAILABLE:
            raise RuntimeError("Ollama not installed: pip install ollama")

        self.model = model
        try:
            ollama.list()
            print(f"✓ Ollama connected: {self.model}")
        except Exception as e:
            raise RuntimeError(f"Ollama not running: {e}")

    def fact_check(self, text):
        """Quick fact-check"""
        prompt = f"""Fact-check this statement. Reply ONLY with:
VERDICT: factual/dubious/false
CONFIDENCE: 0.0-1.0
REASON: one sentence

Statement: "{text}" """

        try:
            response = ollama.generate(
                model=self.model,
                prompt=prompt,
                options={"temperature": 0.1, "num_predict": 80}
            )

            import re
            response_text = response['response']

            verdict = re.search(r'VERDICT:\s*(\w+)', response_text, re.I)
            confidence = re.search(r'CONFIDENCE:\s*([\d.]+)', response_text, re.I)
            reason = re.search(r'REASON:\s*(.+?)(?:\n|$)', response_text, re.I | re.DOTALL)

            return {
                'verdict': verdict.group(1).lower() if verdict else 'unknown',
                'confidence': float(confidence.group(1)) if confidence else 0.5,
                'reason': reason.group(1).strip() if reason else response_text[:150]
            }
        except Exception as e:
            return {'verdict': 'error', 'confidence': 0.0, 'reason': str(e)}

    def generate_questions(self, text):
        """Generate follow-up questions"""
        prompt = f"""Generate 3 insightful questions about this. Reply ONLY with:
Q1: [question]
Q2: [question]
Q3: [question]

Statement: "{text}" """

        try:
            response = ollama.generate(
                model=self.model,
                prompt=prompt,
                options={"temperature": 0.7, "num_predict": 120}
            )

            import re
            response_text = response['response']
            questions = []

            for i in range(1, 4):
                q_match = re.search(rf'Q{i}:\s*(.+?)(?:\n|$)', response_text, re.I)
                if q_match:
                    question = q_match.group(1).strip()
                    if not question.endswith('?'):
                        question += '?'
                    questions.append(question)

            # Fallback defaults
            while len(questions) < 3:
                defaults = ["What are the implications?", "What evidence supports this?", "What's the context?"]
                questions.append(defaults[len(questions)])

            return questions[:3]
        except Exception as e:
            return ["What are the key points?", "What supports this?", "What are the implications?"]


def save_transcript(text, source, timestamp, filename):
    """Append transcript to file"""
    os.makedirs(os.path.dirname(filename) if os.path.dirname(filename) else '.', exist_ok=True)
    with open(filename, "a", encoding="utf-8") as f:
        source_label = "MIC" if source == 'mic' else "SPEAKER"
        f.write(f"[{timestamp}] {source_label}: {text}\n")


def save_enriched_transcript(text, source, timestamp, fact_check, questions, filename):
    """Save enriched transcript with LLM analysis"""
    os.makedirs(os.path.dirname(filename) if os.path.dirname(filename) else '.', exist_ok=True)
    with open(filename, "a", encoding="utf-8") as f:
        source_label = "MIC" if source == 'mic' else "SPEAKER"
        f.write(f"\n{'='*70}\n")
        f.write(f"[{timestamp}] {source_label}: {text}\n\n")

        if fact_check:
            f.write(f"📊 Fact Check: {fact_check['verdict'].upper()} ")
            f.write(f"(confidence: {fact_check['confidence']:.2f})\n")
            f.write(f"💡 {fact_check['reason']}\n\n")

        if questions:
            f.write("❓ Questions:\n")
            for i, q in enumerate(questions, 1):
                f.write(f"{i}. {q}\n")
            f.write("\n")


def main():
    parser = argparse.ArgumentParser(description="Real-time audio transcription with dual capture")
    parser.add_argument("--model", default="tiny", choices=["tiny", "base", "small", "medium", "large"],
                        help="Whisper model (default: tiny)")
    parser.add_argument("--language", default="en", help="Language code (default: en)")
    parser.add_argument("--mic", help="Microphone device name (partial match)")
    parser.add_argument("--monitor", help="Monitor device name for speaker capture")
    parser.add_argument("--interval", type=float, default=5.0, help="Processing interval in seconds (default: 5.0)")
    parser.add_argument("--min-duration", type=float, default=2.0, help="Minimum audio duration (default: 2.0)")
    parser.add_argument("--enable-llm", action="store_true", help="Enable LLM analysis (fact-checking + questions)")
    parser.add_argument("--llm-model", default="qwen2.5:3b", help="Ollama model (default: qwen2.5:3b)")
    parser.add_argument("--output", "-o", help="Save transcript to file")
    parser.add_argument("--list-devices", action="store_true", help="List audio devices and exit")
    parser.add_argument("--force-cpu", action="store_true", help="Force CPU processing")

    args = parser.parse_args()

    if args.list_devices:
        print("\nAvailable audio devices:")
        for i, dev in enumerate(sd.query_devices()):
            in_ch = dev['max_input_channels']
            out_ch = dev['max_output_channels']
            if in_ch > 0:
                print(f" [{i:2d}] {dev['name']:<50} IN:{in_ch} OUT:{out_ch}")
        return

    print("=== Real-Time Audio Transcription ===")
    print(f"Model: {args.model} | Language: {args.language} | Interval: {args.interval}s")
    if args.output:
        print(f"Output: {args.output}")
    if args.enable_llm:
        print(f"LLM Analysis: Enabled ({args.llm_model})")

    # Initialize capture
    try:
        capturer = DualAudioCapture(
            mic_device=args.mic,
            monitor_device=args.monitor,
            sample_rate=16000,
            chunk_size=2048
        )
    except Exception as e:
        print(f"\n❌ Audio Error: {e}")
        print("\nTip: Use --list-devices to see available devices")
        print(" Use --mic and --monitor to specify devices")
        return

    # Initialize transcriber
    try:
        transcriber = WhisperTranscriber(
            model_name=args.model,
            language=args.language,
            force_cpu=args.force_cpu
        )
    except Exception as e:
        print(f"\n❌ Whisper Error: {e}")
        return

    # Initialize LLM analyzer
    llm_analyzer = None
    if args.enable_llm:
        try:
            llm_analyzer = LLMAnalyzer(model=args.llm_model)
        except Exception as e:
            print(f"\n⚠ LLM Error: {e}")
            print("Continuing without LLM analysis...")

    # Main loop
    print(f"\n✅ Started. Press Ctrl+C to stop.\n{'='*60}")
    last_process = time.time()

    try:
        while True:
            # Collect audio
            chunk = capturer.read_chunk()
            if chunk:
                source, audio = chunk
                transcriber.add_audio(source, audio)

            # Process at intervals
            if time.time() - last_process >= args.interval:
                results = transcriber.transcribe_chunk(min_duration=args.min_duration)

                if results:
                    timestamp = datetime.now().strftime("%H:%M:%S")

                    for source, text in results.items():
                        if text:
                            source_emoji = "🎤" if source == 'mic' else "🔊"
                            print(f"\n{source_emoji} [{timestamp}] {text}")

                            # LLM analysis
                            fact_check = None
                            questions = None
                            if llm_analyzer:
                                fact_check = llm_analyzer.fact_check(text)
                                questions = llm_analyzer.generate_questions(text)

                                verdict_emoji = {'factual': '✅', 'dubious': '⚠️', 'false': '❌'}.get(
                                    fact_check['verdict'], '❓')
                                print(f" {verdict_emoji} {fact_check['verdict'].upper()} "
                                      f"({fact_check['confidence']:.2f}): {fact_check['reason']}")
                                print(f" ❓ Questions:")
                                for i, q in enumerate(questions, 1):
                                    print(f" {i}. {q}")

                            # Save to file
                            if args.output:
                                if llm_analyzer:
                                    save_enriched_transcript(text, source, timestamp, fact_check, questions, args.output)
                                else:
                                    save_transcript(text, source, timestamp, args.output)

                last_process = time.time()

    except KeyboardInterrupt:
        print(f"\n{'='*60}\n🛑 Stopping...")

    capturer.close()
    if args.output and os.path.exists(args.output):
        print(f"\n💾 Transcript saved: {os.path.abspath(args.output)}")
    print("\n✅ Done!")


if __name__ == "__main__":
    main()
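`LLMAnalyzer.fact_check` couples the Ollama call with the regex parsing of the reply, so the parsing can only be exercised against a running model. The following is a minimal sketch of that parsing step as a pure function, assuming the same `VERDICT:` / `CONFIDENCE:` / `REASON:` reply format. Two behaviours are hypothetical additions, not part of the committed code. First, verdicts outside the three that the prompt requests are mapped to `unknown`; the transcript excerpt, for example, contains NOT_FACTUAL. Second, confidence is clamped to [0, 1], and a bare `.` can no longer reach `float()`. The helper name `parse_fact_check` is illustrative.

```python
import re

ALLOWED_VERDICTS = {"factual", "dubious", "false"}  # the values the prompt asks for


def parse_fact_check(response_text: str) -> dict:
    """Parse a VERDICT/CONFIDENCE/REASON reply into a dict (hypothetical helper)."""
    verdict = re.search(r"VERDICT:\s*(\w+)", response_text, re.I)
    # Require at least one digit so float() cannot fail on a lone ".".
    confidence = re.search(r"CONFIDENCE:\s*(\d+(?:\.\d+)?|\.\d+)", response_text, re.I)
    # Without re.DOTALL, ".+" stops at the end of the REASON line.
    reason = re.search(r"REASON:\s*(.+)", response_text, re.I)

    label = verdict.group(1).lower() if verdict else "unknown"
    if label not in ALLOWED_VERDICTS:
        label = "unknown"  # e.g. "not_factual"

    # Clamp to [0, 1]; fall back to 0.5 as the committed code does.
    score = min(max(float(confidence.group(1)), 0.0), 1.0) if confidence else 0.5
    return {
        "verdict": label,
        "confidence": score,
        "reason": reason.group(1).strip() if reason else response_text[:150],
    }
```

Inside `fact_check`, the code after `ollama.generate(...)` could then reduce to `return parse_fact_check(response['response'])`, with the existing `try`/`except` still wrapping the network call.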
@@ -1,7 +1,7 @@
 #!/usr/bin/env python3
 """
-Real-time transcription with dual audio capture (microphone + speaker monitor).
-Linux/PipeWire optimized with Ollama LLM fact-checking.
+Real-time audio transcription with dual capture and optional LLM analysis.
+Supports microphone + speaker monitor, file output, and fact-checking.
 """

 import sounddevice as sd
@@ -9,6 +9,7 @@ import numpy as np
 import threading
 import queue
 import time
+import os
 import argparse
 from datetime import datetime
 from faster_whisper import WhisperModel
@@ -197,8 +198,8 @@ class WhisperTranscriber:
             return None


-class LLMFactChecker:
-    """Fast fact-checking with Ollama"""
+class LLMAnalyzer:
+    """LLM analysis with fact-checking and question generation"""

     def __init__(self, model="qwen2.5:3b"):
         if not OLLAMA_AVAILABLE:
@@ -228,34 +229,100 @@ Statement: "{text}" """
             )

             import re
-            text = response['response']
+            response_text = response['response']

-            verdict = re.search(r'VERDICT:\s*(\w+)', text, re.I)
-            confidence = re.search(r'CONFIDENCE:\s*([\d.]+)', text, re.I)
-            reason = re.search(r'REASON:\s*(.+?)(?:\n|$)', text, re.I | re.DOTALL)
+            verdict = re.search(r'VERDICT:\s*(\w+)', response_text, re.I)
+            confidence = re.search(r'CONFIDENCE:\s*([\d.]+)', response_text, re.I)
+            reason = re.search(r'REASON:\s*(.+?)(?:\n|$)', response_text, re.I | re.DOTALL)

             return {
                 'verdict': verdict.group(1).lower() if verdict else 'unknown',
                 'confidence': float(confidence.group(1)) if confidence else 0.5,
-                'reason': reason.group(1).strip() if reason else text[:150]
+                'reason': reason.group(1).strip() if reason else response_text[:150]
             }
         except Exception as e:
             return {'verdict': 'error', 'confidence': 0.0, 'reason': str(e)}

+    def generate_questions(self, text):
+        """Generate follow-up questions"""
+        prompt = f"""Generate 3 insightful questions about this. Reply ONLY with:
+Q1: [question]
+Q2: [question]
+Q3: [question]
+
+Statement: "{text}" """
+
+        try:
+            response = ollama.generate(
+                model=self.model,
+                prompt=prompt,
+                options={"temperature": 0.7, "num_predict": 120}
+            )
+
+            import re
+            response_text = response['response']
+            questions = []
+
+            for i in range(1, 4):
+                q_match = re.search(rf'Q{i}:\s*(.+?)(?:\n|$)', response_text, re.I)
+                if q_match:
+                    question = q_match.group(1).strip()
+                    if not question.endswith('?'):
+                        question += '?'
+                    questions.append(question)
+
+            # Fallback defaults
+            while len(questions) < 3:
+                defaults = ["What are the implications?", "What evidence supports this?", "What's the context?"]
+                questions.append(defaults[len(questions)])
+
+            return questions[:3]
+        except Exception as e:
+            return ["What are the key points?", "What supports this?", "What are the implications?"]
+
+
+def save_transcript(text, source, timestamp, filename):
+    """Append transcript to file"""
+    os.makedirs(os.path.dirname(filename) if os.path.dirname(filename) else '.', exist_ok=True)
+    with open(filename, "a", encoding="utf-8") as f:
+        source_label = "MIC" if source == 'mic' else "SPEAKER"
+        f.write(f"[{timestamp}] {source_label}: {text}\n")
+
+
+def save_enriched_transcript(text, source, timestamp, fact_check, questions, filename):
+    """Save enriched transcript with LLM analysis"""
+    os.makedirs(os.path.dirname(filename) if os.path.dirname(filename) else '.', exist_ok=True)
+    with open(filename, "a", encoding="utf-8") as f:
+        source_label = "MIC" if source == 'mic' else "SPEAKER"
+        f.write(f"\n{'='*70}\n")
+        f.write(f"[{timestamp}] {source_label}: {text}\n\n")
+
+        if fact_check:
+            f.write(f"📊 Fact Check: {fact_check['verdict'].upper()} ")
+            f.write(f"(confidence: {fact_check['confidence']:.2f})\n")
+            f.write(f"💡 {fact_check['reason']}\n\n")
+
+        if questions:
+            f.write("❓ Questions:\n")
+            for i, q in enumerate(questions, 1):
+                f.write(f"{i}. {q}\n")
+            f.write("\n")
+

 def main():
-    parser = argparse.ArgumentParser(description="Dual audio transcription with fact-checking")
-    parser.add_argument("--model", default="tiny", choices=["tiny", "base", "small", "medium"],
-                        help="Whisper model (default: tiny for speed)")
-    parser.add_argument("--language", default="en", help="Language code")
+    parser = argparse.ArgumentParser(description="Real-time audio transcription with dual capture")
+    parser.add_argument("--model", default="tiny", choices=["tiny", "base", "small", "medium", "large"],
+                        help="Whisper model (default: tiny)")
+    parser.add_argument("--language", default="en", help="Language code (default: en)")
     parser.add_argument("--mic", help="Microphone device name (partial match)")
     parser.add_argument("--monitor", help="Monitor device name for speaker capture")
-    parser.add_argument("--interval", type=float, default=5.0, help="Processing interval (seconds)")
-    parser.add_argument("--min-duration", type=float, default=2.0, help="Min audio duration")
-    parser.add_argument("--enable-llm", action="store_true", help="Enable fact-checking")
-    parser.add_argument("--llm-model", default="qwen2.5:3b", help="Ollama model")
-    parser.add_argument("--list-devices", action="store_true", help="List audio devices")
-    parser.add_argument("--force-cpu", action="store_true", help="Force CPU")
+    parser.add_argument("--interval", type=float, default=5.0, help="Processing interval in seconds (default: 5.0)")
+    parser.add_argument("--min-duration", type=float, default=2.0, help="Minimum audio duration (default: 2.0)")
+    parser.add_argument("--enable-llm", action="store_true", help="Enable LLM analysis (fact-checking + questions)")
+    parser.add_argument("--llm-model", default="qwen2.5:3b", help="Ollama model (default: qwen2.5:3b)")
+    parser.add_argument("--output", "-o", help="Save transcript to file")
+    parser.add_argument("--list-devices", action="store_true", help="List audio devices and exit")
+    parser.add_argument("--force-cpu", action="store_true", help="Force CPU processing")

     args = parser.parse_args()

@@ -268,8 +335,12 @@ def main():
                 print(f" [{i:2d}] {dev['name']:<50} IN:{in_ch} OUT:{out_ch}")
         return

-    print("=== Dual Audio Transcription with Fact-Checking ===")
+    print("=== Real-Time Audio Transcription ===")
     print(f"Model: {args.model} | Language: {args.language} | Interval: {args.interval}s")
+    if args.output:
+        print(f"Output: {args.output}")
+    if args.enable_llm:
+        print(f"LLM Analysis: Enabled ({args.llm_model})")

     # Initialize capture
     try:
@@ -296,14 +367,14 @@ def main():
         print(f"\n❌ Whisper Error: {e}")
         return

-    # Initialize fact checker
-    fact_checker = None
+    # Initialize LLM analyzer
+    llm_analyzer = None
     if args.enable_llm:
         try:
-            fact_checker = LLMFactChecker(model=args.llm_model)
+            llm_analyzer = LLMAnalyzer(model=args.llm_model)
         except Exception as e:
             print(f"\n⚠ LLM Error: {e}")
-            print("Continuing without fact-checking...")
+            print("Continuing without LLM analysis...")

     # Main loop
     print(f"\n✅ Started. Press Ctrl+C to stop.\n{'='*60}")
@@ -329,10 +400,27 @@ def main():
                             source_emoji = "🎤" if source == 'mic' else "🔊"
                             print(f"\n{source_emoji} [{timestamp}] {text}")
 
-                            if fact_checker:
-                                fc = fact_checker.fact_check(text)
-                                verdict_emoji = {'factual': '✅', 'dubious': '⚠️', 'false': '❌'}.get(fc['verdict'], '❓')
-                                print(f" {verdict_emoji} {fc['verdict'].upper()} ({fc['confidence']:.2f}): {fc['reason']}")
+                            # LLM analysis
+                            fact_check = None
+                            questions = None
+                            if llm_analyzer:
+                                fact_check = llm_analyzer.fact_check(text)
+                                questions = llm_analyzer.generate_questions(text)
+
+                                verdict_emoji = {'factual': '✅', 'dubious': '⚠️', 'false': '❌'}.get(
+                                    fact_check['verdict'], '❓')
+                                print(f" {verdict_emoji} {fact_check['verdict'].upper()} "
+                                      f"({fact_check['confidence']:.2f}): {fact_check['reason']}")
+                                print(f" ❓ Questions:")
+                                for i, q in enumerate(questions, 1):
+                                    print(f" {i}. {q}")
+
+                            # Save to file
+                            if args.output:
+                                if llm_analyzer:
+                                    save_enriched_transcript(text, source, timestamp, fact_check, questions, args.output)
+                                else:
+                                    save_transcript(text, source, timestamp, args.output)
 
                 last_process = time.time()
 
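The new per-utterance block prints a verdict line and then numbers the questions returned by `generate_questions`. The sketch below pulls that rendering into a pure function so it can be checked without a running model. It assumes the fact-check dict carries the `verdict`, `confidence` and `reason` keys produced by the older `LLMFactChecker.fact_check`, and that questions arrive as a list of strings; `format_analysis` is a hypothetical helper. Unlike the diff, it treats a missing question list as empty instead of passing `None` to `enumerate`.

```python
VERDICT_EMOJI = {'factual': '✅', 'dubious': '⚠️', 'false': '❌'}


def format_analysis(fact_check, questions):
    """Render a fact-check dict and a list of follow-up questions as console lines."""
    lines = []
    if fact_check:
        emoji = VERDICT_EMOJI.get(fact_check['verdict'], '❓')  # unknown verdicts get '❓'
        lines.append(f"  {emoji} {fact_check['verdict'].upper()} "
                     f"({fact_check['confidence']:.2f}): {fact_check['reason']}")
    if questions:  # None or [] both mean "no questions"
        lines.append("  ❓ Questions:")
        for i, q in enumerate(questions, 1):
            lines.append(f"     {i}. {q}")
    return lines


fc = {'verdict': 'dubious', 'confidence': 0.4, 'reason': 'No source given.'}
for line in format_analysis(fc, ["Which study?"]):
    print(line)
```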
@@ -340,6 +428,8 @@ def main():
         print(f"\n{'='*60}\n🛑 Stopping...")
 
     capturer.close()
+    if args.output and os.path.exists(args.output):
+        print(f"\n💾 Transcript saved: {os.path.abspath(args.output)}")
     print("\n✅ Done!")
 
 
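`save_transcript` and `save_enriched_transcript` are called in this diff, but their bodies are not shown here. The following is a hypothetical sketch of append-mode writers with the same call signatures; the line format (`[HH:MM:SS] MIC: ...` plus indented `verdict:` and `question:` lines) is an assumption, not the repository's actual output. `report_saved` mirrors the new shutdown message, which only reports a path that exists.

```python
import os


def save_transcript(text, source, timestamp, path):
    """Append one plain transcript line (hypothetical format)."""
    label = "MIC" if source == 'mic' else "SPEAKER"
    with open(path, "a", encoding="utf-8") as f:
        f.write(f"[{timestamp}] {label}: {text}\n")


def save_enriched_transcript(text, source, timestamp, fact_check, questions, path):
    """Append a transcript line plus its verdict and questions (hypothetical format)."""
    save_transcript(text, source, timestamp, path)
    with open(path, "a", encoding="utf-8") as f:
        if fact_check:
            f.write(f"    verdict: {fact_check['verdict']} "
                    f"({fact_check['confidence']:.2f}) {fact_check['reason']}\n")
        for q in questions or []:
            f.write(f"    question: {q}\n")


def report_saved(path):
    """Print the saved-path message only if the file was actually created."""
    if path and os.path.exists(path):
        print(f"\n💾 Transcript saved: {os.path.abspath(path)}")
        return True
    return False
```

Opening the file in append mode for each utterance keeps everything written so far on disk even if the session is interrupted later.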
347
transcribe_duil_linux_old.py
Normal file
@@ -0,0 +1,347 @@
#!/usr/bin/env python3
"""
Real-time transcription with dual audio capture (microphone + speaker monitor).
Linux/PipeWire optimized with Ollama LLM fact-checking.
"""

import sounddevice as sd
import numpy as np
import threading
import queue
import time
import argparse
from datetime import datetime
from faster_whisper import WhisperModel

try:
    import ollama
    OLLAMA_AVAILABLE = True
except ImportError:
    OLLAMA_AVAILABLE = False


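The file treats `ollama` as an optional dependency: the `try`/`except ImportError` block records availability in `OLLAMA_AVAILABLE`, and `LLMFactChecker` raises a clear error later if the package is missing. An equivalent sketch using `importlib.import_module` follows; `optional_import` is a hypothetical helper name.

```python
import importlib
from types import ModuleType
from typing import Optional


def optional_import(name: str) -> Optional[ModuleType]:
    """Return the imported module, or None if it is not installed."""
    try:
        return importlib.import_module(name)
    except ImportError:  # ModuleNotFoundError is a subclass of ImportError
        return None


ollama = optional_import("ollama")
OLLAMA_AVAILABLE = ollama is not None
```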
class DualAudioCapture:
    """Capture both microphone and speaker output simultaneously"""

    def __init__(self, mic_device=None, monitor_device=None, sample_rate=16000, chunk_size=2048):
        self.sample_rate = sample_rate
        self.chunk_size = chunk_size
        self.audio_queue = queue.Queue()

        # Find devices
        devices = sd.query_devices()

        # Microphone (default input or specified)
        if mic_device is None:
            self.mic_device = sd.default.device[0]  # Default input
        else:
            self.mic_device = self._find_device(mic_device, input_required=True)

        # Monitor/Loopback (for speaker output)
        if monitor_device:
            self.monitor_device = self._find_device(monitor_device, input_required=True)
        else:
            self.monitor_device = None

        print(f"✓ Microphone: {devices[self.mic_device]['name']} (index {self.mic_device})")
        if self.monitor_device:
            print(f"✓ Monitor: {devices[self.monitor_device]['name']} (index {self.monitor_device})")
        else:
            print("⚠ No monitor device - capturing microphone only")

        # Start streams
        self.mic_stream = sd.InputStream(
            device=self.mic_device,
            channels=1,
            samplerate=sample_rate,
            blocksize=chunk_size,
            dtype='int16',
            callback=self._mic_callback
        )

        if self.monitor_device:
            self.monitor_stream = sd.InputStream(
                device=self.monitor_device,
                channels=1,
                samplerate=sample_rate,
                blocksize=chunk_size,
                dtype='int16',
                callback=self._monitor_callback
            )
        else:
            self.monitor_stream = None

        self.mic_stream.start()
        if self.monitor_stream:
            self.monitor_stream.start()

        print("✓ Audio capture started")

    def _find_device(self, device_name, input_required=True):
        """Find device by name substring"""
        devices = sd.query_devices()
        for i, dev in enumerate(devices):
            if device_name.lower() in dev['name'].lower():
                if not input_required or dev['max_input_channels'] > 0:
                    return i
        raise RuntimeError(f"Device '{device_name}' not found")

    def _mic_callback(self, indata, frames, time_info, status):
        """Microphone audio callback"""
        if status:
            print(f"⚠ Mic status: {status}")
        self.audio_queue.put(('mic', indata.copy()))

    def _monitor_callback(self, indata, frames, time_info, status):
        """Monitor/speaker audio callback"""
        if status:
            print(f"⚠ Monitor status: {status}")
        self.audio_queue.put(('monitor', indata.copy()))

    def read_chunk(self):
        """Read audio data from queue"""
        try:
            return self.audio_queue.get(timeout=0.05)
        except queue.Empty:
            return None

    def close(self):
        """Cleanup resources"""
        self.mic_stream.stop()
        self.mic_stream.close()
        if self.monitor_stream:
            self.monitor_stream.stop()
            self.monitor_stream.close()


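`DualAudioCapture` is a small producer/consumer setup: each `sounddevice` callback puts a `(source, chunk)` tuple on one shared `queue.Queue`, and `read_chunk()` drains it from the main thread with a 50 ms timeout. The sketch below reproduces only that queue pattern, with plain threads standing in for the two streams so it runs without audio hardware; `fake_stream` and `drain` are hypothetical names. The real callbacks call `indata.copy()`, presumably because the array handed to a callback refers to a buffer that may be reused after the callback returns.

```python
import queue
import threading


def fake_stream(source, audio_queue, n_chunks):
    """Stand-in for a sounddevice callback: push (source, chunk) tuples onto a shared queue."""
    for i in range(n_chunks):
        audio_queue.put((source, [i] * 4))  # a real callback would put indata.copy()


def drain(audio_queue, timeout=0.05):
    """Consumer side, like read_chunk(): wait briefly, return None when nothing arrives."""
    try:
        return audio_queue.get(timeout=timeout)
    except queue.Empty:
        return None


q = queue.Queue()
producers = [threading.Thread(target=fake_stream, args=(src, q, 3)) for src in ("mic", "monitor")]
for t in producers:
    t.start()
for t in producers:
    t.join()

counts = {"mic": 0, "monitor": 0}
while (item := drain(q)) is not None:
    counts[item[0]] += 1
print(counts)  # {'mic': 3, 'monitor': 3}
```

A single queue keeps the consumer simple: the source tag travels with each chunk, so one loop can route audio to per-source buffers.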
class WhisperTranscriber:
    """Process audio with Whisper"""

    def __init__(self, model_name="base", language="en", force_cpu=False):
        print(f"Loading Whisper model '{model_name}'...")

        import torch
        has_cuda = torch.cuda.is_available() and not force_cpu

        device = "cpu"
        compute_type = "int8"

        if has_cuda:
            try:
                import ctranslate2
                if ctranslate2.get_cuda_device_count() > 0:
                    device = "cuda"
                    compute_type = "float16"
                    print(f"✓ Using GPU: {torch.cuda.get_device_name(0)}")
            except Exception as e:
                print(f"⚠ CUDA unavailable: {e}")

        if device == "cpu":
            print("✓ Using CPU")

        model_kwargs = {"device": device, "compute_type": compute_type}
        if device == "cpu":
            model_kwargs["cpu_threads"] = 4

        self.model = WhisperModel(model_name, **model_kwargs)
        self.language = language
        self.mic_buffer = np.array([], dtype=np.float32)
        self.monitor_buffer = np.array([], dtype=np.float32)
        self.lock = threading.Lock()

    def add_audio(self, source, audio_chunk):
        """Add audio to appropriate buffer"""
        with self.lock:
            audio_float = audio_chunk.flatten().astype(np.float32) / 32768.0
            if source == 'mic':
                self.mic_buffer = np.concatenate([self.mic_buffer, audio_float])
            else:
                self.monitor_buffer = np.concatenate([self.monitor_buffer, audio_float])

    def transcribe_chunk(self, min_duration=3.0):
        """Transcribe accumulated audio"""
        with self.lock:
            mic_duration = len(self.mic_buffer) / 16000
            monitor_duration = len(self.monitor_buffer) / 16000

            results = {}

            # Transcribe microphone
            if mic_duration >= min_duration:
                mic_audio = self.mic_buffer.copy()
                self.mic_buffer = np.array([], dtype=np.float32)
                results['mic'] = self._transcribe(mic_audio)

            # Transcribe monitor
            if monitor_duration >= min_duration:
                monitor_audio = self.monitor_buffer.copy()
                self.monitor_buffer = np.array([], dtype=np.float32)
                results['monitor'] = self._transcribe(monitor_audio)

            return results if results else None

    def _transcribe(self, audio):
        """Internal transcription"""
        try:
            segments, _ = self.model.transcribe(
                audio,
                language=self.language,
                beam_size=3,  # Faster than default 5
                vad_filter=True,
                vad_parameters=dict(min_silence_duration_ms=500)
            )
            text = " ".join([seg.text for seg in segments]).strip()
            return text if text else None
        except Exception as e:
            print(f"❌ Transcription error: {e}")
            return None


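`WhisperTranscriber.add_audio` rescales int16 PCM to floats by dividing by 32768 and appends to a per-source buffer; `transcribe_chunk` hands a buffer to Whisper only once its duration (sample count divided by the sample rate) reaches `min_duration`. For example, 32000 samples at 16000 Hz is 2.0 s, which just meets the default `--min-duration` of 2.0. A dependency-free sketch of that buffering logic follows, using lists instead of NumPy arrays; `SourceBuffers` is a hypothetical class. It takes the sample rate as a parameter, whereas the original hardcodes 16000 in `transcribe_chunk` even though `DualAudioCapture` accepts a `sample_rate` argument.

```python
import threading

SAMPLE_RATE = 16000  # matches the sample_rate passed to DualAudioCapture in main()


class SourceBuffers:
    """Per-source float buffers, released once they hold at least min_duration seconds."""

    def __init__(self, sample_rate=SAMPLE_RATE):
        self.sample_rate = sample_rate
        self.buffers = {}
        self.lock = threading.Lock()

    def add(self, source, int16_samples):
        # int16 PCM -> float in [-1.0, 1.0), the same scaling as add_audio()
        floats = [s / 32768.0 for s in int16_samples]
        with self.lock:
            self.buffers.setdefault(source, []).extend(floats)

    def pop_ready(self, min_duration):
        """Return {source: samples} for every buffer long enough, and reset those buffers."""
        ready = {}
        with self.lock:
            for source, buf in list(self.buffers.items()):
                if len(buf) / self.sample_rate >= min_duration:
                    ready[source] = buf
                    self.buffers[source] = []
        return ready
```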
class LLMFactChecker:
    """Fast fact-checking with Ollama"""

    def __init__(self, model="qwen2.5:3b"):
        if not OLLAMA_AVAILABLE:
            raise RuntimeError("Ollama not installed: pip install ollama")

        self.model = model
        try:
            ollama.list()
            print(f"✓ Ollama connected: {self.model}")
        except Exception as e:
            raise RuntimeError(f"Ollama not running: {e}")

    def fact_check(self, text):
        """Quick fact-check"""
        prompt = f"""Fact-check this statement. Reply ONLY with:
VERDICT: factual/dubious/false
CONFIDENCE: 0.0-1.0
REASON: one sentence

Statement: "{text}" """

        try:
            response = ollama.generate(
                model=self.model,
                prompt=prompt,
                options={"temperature": 0.1, "num_predict": 80}
            )

            import re
            text = response['response']

            verdict = re.search(r'VERDICT:\s*(\w+)', text, re.I)
            confidence = re.search(r'CONFIDENCE:\s*([\d.]+)', text, re.I)
            reason = re.search(r'REASON:\s*(.+?)(?:\n|$)', text, re.I | re.DOTALL)

            return {
                'verdict': verdict.group(1).lower() if verdict else 'unknown',
                'confidence': float(confidence.group(1)) if confidence else 0.5,
                'reason': reason.group(1).strip() if reason else text[:150]
            }
        except Exception as e:
            return {'verdict': 'error', 'confidence': 0.0, 'reason': str(e)}


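`fact_check` asks the model for three labelled lines and recovers them with regular expressions. Because any exception inside the `try` turns the whole result into `verdict: 'error'` (for instance `float()` failing on a capture such as `0.8.`, which `[\d.]+` happily matches), a more tolerant parser may be worth considering. The sketch below is one possible variant, not the repository's code: `parse_fact_check` is a hypothetical name, it maps verdicts outside `factual`/`dubious`/`false` to `unknown`, and it clamps confidence into the 0.0-1.0 range the prompt requests.

```python
import re

VALID_VERDICTS = {"factual", "dubious", "false"}


def parse_fact_check(raw: str) -> dict:
    """Parse a 'VERDICT / CONFIDENCE / REASON' reply into a dict, tolerating sloppy output."""
    verdict_m = re.search(r"VERDICT:\s*(\w+)", raw, re.IGNORECASE)
    # Digits with at most one decimal point, so "0.8." yields "0.8" instead of a ValueError
    conf_m = re.search(r"CONFIDENCE:\s*([0-9]*\.?[0-9]+)", raw, re.IGNORECASE)
    # Without DOTALL, "." stops at the end of the line, like the original's (.+?)(?:\n|$)
    reason_m = re.search(r"REASON:\s*(.+)", raw, re.IGNORECASE)

    verdict = verdict_m.group(1).lower() if verdict_m else "unknown"
    if verdict not in VALID_VERDICTS:
        verdict = "unknown"
    confidence = float(conf_m.group(1)) if conf_m else 0.5
    confidence = min(max(confidence, 0.0), 1.0)
    reason = reason_m.group(1).strip() if reason_m else raw.strip()[:150]
    return {"verdict": verdict, "confidence": confidence, "reason": reason}
```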
def main():
    parser = argparse.ArgumentParser(description="Dual audio transcription with fact-checking")
    parser.add_argument("--model", default="tiny", choices=["tiny", "base", "small", "medium"],
                        help="Whisper model (default: tiny for speed)")
    parser.add_argument("--language", default="en", help="Language code")
    parser.add_argument("--mic", help="Microphone device name (partial match)")
    parser.add_argument("--monitor", help="Monitor device name for speaker capture")
    parser.add_argument("--interval", type=float, default=5.0, help="Processing interval (seconds)")
    parser.add_argument("--min-duration", type=float, default=2.0, help="Min audio duration")
    parser.add_argument("--enable-llm", action="store_true", help="Enable fact-checking")
    parser.add_argument("--llm-model", default="qwen2.5:3b", help="Ollama model")
    parser.add_argument("--list-devices", action="store_true", help="List audio devices")
    parser.add_argument("--force-cpu", action="store_true", help="Force CPU")

    args = parser.parse_args()

    if args.list_devices:
        print("\nAvailable audio devices:")
        for i, dev in enumerate(sd.query_devices()):
            in_ch = dev['max_input_channels']
            out_ch = dev['max_output_channels']
            if in_ch > 0:
                print(f" [{i:2d}] {dev['name']:<50} IN:{in_ch} OUT:{out_ch}")
        return

    print("=== Dual Audio Transcription with Fact-Checking ===")
    print(f"Model: {args.model} | Language: {args.language} | Interval: {args.interval}s")

    # Initialize capture
    try:
        capturer = DualAudioCapture(
            mic_device=args.mic,
            monitor_device=args.monitor,
            sample_rate=16000,
            chunk_size=2048
        )
    except Exception as e:
        print(f"\n❌ Audio Error: {e}")
        print("\nTip: Use --list-devices to see available devices")
        print(" Use --mic and --monitor to specify devices")
        return

    # Initialize transcriber
    try:
        transcriber = WhisperTranscriber(
            model_name=args.model,
            language=args.language,
            force_cpu=args.force_cpu
        )
    except Exception as e:
        print(f"\n❌ Whisper Error: {e}")
        return

    # Initialize fact checker
    fact_checker = None
    if args.enable_llm:
        try:
            fact_checker = LLMFactChecker(model=args.llm_model)
        except Exception as e:
            print(f"\n⚠ LLM Error: {e}")
            print("Continuing without fact-checking...")

    # Main loop
    print(f"\n✅ Started. Press Ctrl+C to stop.\n{'='*60}")
    last_process = time.time()

    try:
        while True:
            # Collect audio
            chunk = capturer.read_chunk()
            if chunk:
                source, audio = chunk
                transcriber.add_audio(source, audio)

            # Process at intervals
            if time.time() - last_process >= args.interval:
                results = transcriber.transcribe_chunk(min_duration=args.min_duration)

                if results:
                    timestamp = datetime.now().strftime("%H:%M:%S")

                    for source, text in results.items():
                        if text:
                            source_emoji = "🎤" if source == 'mic' else "🔊"
                            print(f"\n{source_emoji} [{timestamp}] {text}")

                            if fact_checker:
                                fc = fact_checker.fact_check(text)
                                verdict_emoji = {'factual': '✅', 'dubious': '⚠️', 'false': '❌'}.get(fc['verdict'], '❓')
                                print(f" {verdict_emoji} {fc['verdict'].upper()} ({fc['confidence']:.2f}): {fc['reason']}")

                last_process = time.time()

    except KeyboardInterrupt:
        print(f"\n{'='*60}\n🛑 Stopping...")

    capturer.close()
    print("\n✅ Done!")


if __name__ == "__main__":
    main()
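The main loop polls `read_chunk()` continuously and runs transcription at most once per `--interval` seconds, measured with `time.time()`. A sketch of the same scheduling with an injectable clock follows, so it can be tested deterministically; `run_loop` and its parameters are hypothetical. It defaults to `time.monotonic`, which is not affected by system clock adjustments, and leaves Ctrl+C handling to the caller as `main()` does. One small difference: it restarts the interval from the time measured just before `flush()`, whereas `main()` resets `last_process` after transcription has finished.

```python
import time


def run_loop(read_chunk, add_audio, flush, interval, clock=time.monotonic, max_iterations=None):
    """Poll for audio continuously; call flush() at most once per `interval` seconds."""
    last_flush = clock()
    iterations = 0
    while max_iterations is None or iterations < max_iterations:
        iterations += 1
        chunk = read_chunk()  # (source, audio) or None, like DualAudioCapture.read_chunk()
        if chunk is not None:
            add_audio(*chunk)
        now = clock()
        if now - last_flush >= interval:
            flush()  # e.g. transcriber.transcribe_chunk(...) plus printing
            last_flush = now
```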
@@ -79,17 +79,28 @@ class WindowsLoopbackAudioCapture:
                 dev['max_input_channels'] > 0):
                 return dev
 
-        # Auto-detect: look for WASAPI speakers/headphones
+        # Auto-detect: look for WASAPI speakers/headphones (Windows)
         for dev in devices:
             if (dev['max_input_channels'] > 0 and
                 any(x in dev['name'] for x in ['Speakers', 'Headphones', 'Output'])):
                 return dev
 
-        # Fallback: Stereo Mix or similar
+        # Fallback: Stereo Mix or similar (Windows)
         for dev in devices:
             if 'Stereo Mix' in dev['name']:
                 return dev
 
+        # Linux fallback: use default input device (pipewire/pulse)
+        try:
+            default_input_idx = sd.default.device[0]  # Default input device
+            if default_input_idx is not None:
+                dev = devices[default_input_idx]
+                if dev['max_input_channels'] > 0:
+                    print("⚠️ Note: Using default input device (microphone). For speaker capture on Linux, use transcribe_dual_linux.py")
+                    return dev
+        except:
+            pass
+
         return None
 
     def _audio_callback(self, indata, frames, time_info, status):
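With this change the loopback lookup in `WindowsLoopbackAudioCapture` tries, in order, WASAPI speaker/headphone names, then `Stereo Mix`, then the default input device as a Linux fallback. The sketch below restates the visible part of that chain as a pure function over device dicts shaped like `sd.query_devices()` entries (`name`, `max_input_channels`); `pick_capture_device` is a hypothetical name, and it omits the earlier check whose tail opens the hunk. Two deliberate differences: it also requires input channels for `Stereo Mix`, and it replaces the bare `except:` (which would also swallow `KeyboardInterrupt`) with an explicit index check.

```python
def pick_capture_device(devices, default_input=None):
    """Return the first device dict matching the fallback order above, or None."""
    inputs = [d for d in devices if d["max_input_channels"] > 0]

    # 1. Windows: WASAPI-style speaker/headphone names
    for dev in inputs:
        if any(x in dev["name"] for x in ("Speakers", "Headphones", "Output")):
            return dev

    # 2. Windows: "Stereo Mix" loopback
    for dev in inputs:
        if "Stereo Mix" in dev["name"]:
            return dev

    # 3. Linux: default input (usually the microphone, as the original warning says)
    if default_input is not None and 0 <= default_input < len(devices):
        dev = devices[default_input]
        if dev["max_input_channels"] > 0:
            return dev

    return None
```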
@@ -511,8 +522,8 @@ Examples:
                         help="GPU device index to use (default: 0)")
     parser.add_argument("--enable-llm", action="store_true",
                         help="Enable LLM analysis (fact-checking and questions)")
-    parser.add_argument("--llm-model", default="gpt-oss:20b",
-                        help="Ollama model to use for LLM analysis (default: gpt-oss:20b)")
+    parser.add_argument("--llm-model", default="llama3.2:latest",
+                        help="Ollama model to use for LLM analysis (default: llama3.2:latest)")
     parser.add_argument("--llm-debug", action="store_true",
                         help="Show LLM raw responses for debugging")
     parser.add_argument("--sentence-mode", action="store_true",
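This hunk changes the `--llm-model` default from `gpt-oss:20b` to `llama3.2:latest`, which required editing both the `default=` value and the help string. argparse expands `%(default)s` inside help text, so the help stays in sync with a single edit. A minimal sketch with only the two LLM options from the hunk (`build_parser` is a hypothetical name):

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="Dual audio transcription")
    parser.add_argument("--enable-llm", action="store_true",
                        help="Enable LLM analysis (fact-checking and questions)")
    # %(default)s is filled in by argparse when the help is rendered
    parser.add_argument("--llm-model", default="llama3.2:latest",
                        help="Ollama model to use for LLM analysis (default: %(default)s)")
    return parser


args = build_parser().parse_args([])
print(args.llm_model)  # llama3.2:latest
```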