**feat: update project name and cleanup startup script**

- Rename project from "Verbatim Dicta1" to "Verbatim Dicta2"
- Remove redundant comments in `start_transcribe.sh`
- Streamline virtual environment activation process
Update project title to reflect current scope
2025-12-17 23:09:56 +01:00 · 2025-12-17 22:53:35 +01:00 · 2025-12-17 22:30:41 +01:00 · 2025-12-17 22:11:08 +01:00
12 changed files with 1266 additions and 388 deletions
--- a/QUICK_START.md
+++ b/QUICK_START.md
@@ -0,0 +1,156 @@
 # Quick Start Guide
 ## 1. Setup Audio Devices
 ```bash
 # List available audio devices
 ./run_transcribe.sh --list-devices
 ```
 Find your:
 - **Microphone** - Your input device (e.g., "USB Microphone")
 - **Monitor** - Speaker capture device (e.g., "Monitor of Built-in Audio")
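The names passed to `--mic` and `--monitor` do not have to be exact: the `_find_device` method in `transcribe.py` (further down in this diff) matches them as case-insensitive substrings of the device name. A minimal standalone sketch of that lookup is shown here. The `find_device` free function and the `SAMPLE_DEVICES` list are hypothetical, and the list only imitates the shape of `sounddevice.query_devices()` entries.

```python
def find_device(devices, name, input_required=True):
    """Return the index of the first device whose name contains `name`."""
    needle = name.lower()
    for index, dev in enumerate(devices):
        if needle in dev["name"].lower():
            # Skip output-only devices when an input device is required.
            if not input_required or dev["max_input_channels"] > 0:
                return index
    raise RuntimeError(f"Device '{name}' not found")


# Made-up sample data for illustration only (not real hardware).
SAMPLE_DEVICES = [
    {"name": "Built-in Audio Analog Stereo", "max_input_channels": 0},
    {"name": "USB Microphone", "max_input_channels": 1},
    {"name": "Monitor of Built-in Audio", "max_input_channels": 2},
]

mic_index = find_device(SAMPLE_DEVICES, "usb mic")
monitor_index = find_device(SAMPLE_DEVICES, "monitor")
print(mic_index, monitor_index)
```

Because the first match wins, a short fragment such as `"audio"` may select a different device than intended; a more specific fragment is safer.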
 ---
 ## 2. Basic Usage
 ### Simple Transcription
 ```bash
 # Auto-detect devices
 ./run_transcribe.sh --model medium --language en
 # Specify devices
 ./run_transcribe.sh --mic "USB Mic" --monitor "Monitor"
 ```
 ### With File Output
 ```bash
 ./run_transcribe.sh --model medium --language en --output transcript.txt
 ```
 ### With LLM Analysis
 ```bash
 ./run_transcribe.sh --model medium --enable-llm --output enriched.txt
 ```
 ---
 ## 3. Language Examples
 ### Dutch (Nederlands)
 ```bash
 ./run_transcribe.sh --model medium --language nl --enable-llm
 ```
 ### Spanish
 ```bash
 ./run_transcribe.sh --model medium --language es
 ```
 ### French
 ```bash
 ./run_transcribe.sh --model medium --language fr
 ```
 ### German
 ```bash
 ./run_transcribe.sh --model medium --language de
 ```
 ---
 ## 4. Model Selection
 | Model  | Speed    | Quality | Command                          |
 |--------|----------|---------|----------------------------------|
 | tiny   | Fastest  | Basic   | `--model tiny`                   |
 | base   | Fast     | Good    | `--model base`                   |
 | small  | Moderate | Better  | `--model small`                  |
 | medium | Slow     | Great   | `--model medium` **(recommended)** |
 | large  | Slowest  | Best    | `--model large`                  |
 ---
 ## 5. Optimization Tips
 ### High Quality Transcription
 ```bash
 ./run_transcribe.sh --model large --interval 8 --min-duration 4
 ```
 ### Fast Real-Time
 ```bash
 ./run_transcribe.sh --model tiny --interval 3 --min-duration 2
 ```
 ### Best Dutch Transcription (Your Setup)
 ```bash
 ./run_transcribe.sh --model medium --interval 8 --min-duration 4 --enable-llm --language nl
 ```
 ---
 ## 6. LLM Configuration
 ### Default Model (qwen2.5:3b - Fast)
 ```bash
 ./run_transcribe.sh --enable-llm
 ```
 ### Larger Model (Better Analysis)
 ```bash
 # Install model first
 ollama pull llama3.2
 # Use it
 ./run_transcribe.sh --enable-llm --llm-model llama3.2
 ```
 ---
 ## 7. Output Examples
 ### Console Output
 ```
 🎤 [14:23:15] User speaking via microphone
 🔊 [14:23:20] Audio from speakers
 🎤 [14:23:25] The Earth orbits the Sun in 365 days.
   ✅ FACTUAL (0.98): Scientifically accurate.
   ❓ Questions:
      1. Why do we need leap years?
      2. How does orbital speed vary?
      3. What affects Earth's orbit?
 ```
 ### File Output
 Saved to the file given with `--output` (for example `transcript.txt`), including timestamps and any LLM analysis.
 ---
 ## 8. Controls
 - **Ctrl+C** - Stop transcription
 - Processing happens every `--interval` seconds (default: 5s)
 - Minimum `--min-duration` audio required (default: 2s)
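How `--interval` and `--min-duration` interact can be sketched as a small gate: a transcription pass runs only once the interval has elapsed and at least the minimum amount of audio is buffered. This is an illustrative assumption about the behaviour described above, not code taken from `transcribe.py`; `IntervalGate` and its methods are hypothetical names.

```python
SAMPLE_RATE = 16000  # Hz, assuming the 16 kHz capture rate used by the project


class IntervalGate:
    """Decide when enough audio has accumulated to run a transcription pass."""

    def __init__(self, interval=5.0, min_duration=2.0, sample_rate=SAMPLE_RATE):
        self.interval = interval          # seconds between passes (--interval)
        self.min_duration = min_duration  # seconds of audio required (--min-duration)
        self.sample_rate = sample_rate
        self.buffered_samples = 0
        self.last_run = 0.0

    def add(self, num_samples):
        """Record that `num_samples` new samples were buffered."""
        self.buffered_samples += num_samples

    def should_process(self, now):
        """True once the interval has elapsed and enough audio is buffered."""
        elapsed = now - self.last_run
        buffered_seconds = self.buffered_samples / self.sample_rate
        return elapsed >= self.interval and buffered_seconds >= self.min_duration

    def reset(self, now):
        """Call after a transcription pass to start a new interval."""
        self.buffered_samples = 0
        self.last_run = now


gate = IntervalGate()
gate.add(1 * SAMPLE_RATE)             # 1 s of audio: below the 2 s minimum
early = gate.should_process(now=5.0)
gate.add(2 * SAMPLE_RATE)             # 3 s of audio in total
ready = gate.should_process(now=5.0)
print(early, ready)
```

Under this model, a longer interval gives Whisper more context per pass at the cost of latency, which matches the presets in the Optimization Tips section.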
 ---
 ## Troubleshooting
 **No devices found:**
 ```bash
 ./run_transcribe.sh --list-devices
 ```
 **Ollama errors:**
 ```bash
 ollama serve
 ollama pull qwen2.5:3b
 ```
 **Force CPU (GPU issues):**
 ```bash
 ./run_transcribe.sh --force-cpu
 ```
--- a/README.md
+++ b/README.md
@@ -1,16 +1,15 @@
-# Verbatim Dicta
+# Verbatim Dicta2
-Real-time audio transcription using Whisper AI with optional LLM-powered analysis. Captures system audio via loopback and transcribes it with configurable models and processing options.
+Real-time audio transcription using Whisper AI with optional LLM analysis. Captures microphone and speaker audio simultaneously for comprehensive transcription.
 ## Features
- Real-time transcription of system audio (Windows/Linux)
+- **Dual audio capture** - Record microphone and speaker output simultaneously
- Multiple Whisper model sizes (tiny to large)
+- **Real-time transcription** - Process audio as it's captured with Whisper models
- Multi-language support
+- **LLM analysis** - Optional fact-checking and question generation via Ollama
- **Sentence extraction mode** - Stitches audio chunks into complete sentences
+- **Multi-language** - Support for 50+ languages
- Optional LLM analysis for fact-checking and question generation (via Ollama)
+- **File output** - Save transcripts with timestamps and analysis
- GPU acceleration support
+- **GPU acceleration** - CUDA support for faster processing
 - Flexible audio device configuration
 ## Quick Start
@@ -18,17 +17,14 @@ Real-time audio transcription using Whisper AI with optional LLM-powered analysi
 # Install dependencies
 pip install -r requirements.txt
 # Basic transcription (no LLM)
 python transcribe_speakers.py
 # With LLM analysis (optional)
 python transcribe_speakers.py --enable-llm
 # With sentence extraction
 python transcribe_speakers.py --sentence-mode
 # List audio devices
-python transcribe_speakers.py --list-devices
+./run_transcribe.sh --list-devices
 # Basic transcription
 ./run_transcribe.sh --model medium --language en
 # With LLM analysis and file output
 ./run_transcribe.sh --model medium --enable-llm --output transcript.txt
 ```
 ## Requirements
@@ -58,164 +54,153 @@ For CUDA 12.1:
 pip install torch==2.8.0+cu121 --index-url https://download.pytorch.org/whl/cu121
 ```
-### 3. Audio Loopback Setup
+### 3. Audio Setup
-**Windows - Option A (Stereo Mix):**
+**Linux (PulseAudio/PipeWire):**
-1. Right-click speaker icon → Sounds → Recording tab
+```bash
-2. Right-click → Show Disabled Devices
+# List devices to find your monitor device
-3. Enable and set Stereo Mix as default
+./run_transcribe.sh --list-devices
-**Windows - Option B (VB-Cable, recommended):**
+# Use with monitor device
-1. Download from [vb-audio.com](https://vb-audio.com/Cable/)
+./run_transcribe.sh --monitor "alsa_output.monitor"
-2. Install and restart
+```
 3. Use `--device "CABLE Output"`
-**Linux:**
+**Windows:**
-Configure PulseAudio loopback or use `transcribe_dual_linux.py`
+- Enable "Stereo Mix" in Sound settings, or
 - Install VB-Cable from [vb-audio.com](https://vb-audio.com/Cable/)
-### 4. LLM Features (Optional)
+### 4. LLM Support (Optional)
 ```bash
 # Install Ollama from ollama.ai
-ollama pull llama3.2
+ollama pull qwen2.5:3b
 ```
 ## Usage
-### Available Scripts
+### Command Line Options
 - `transcribe_speakers.py` - Main script with all features (LLM optional via `--enable-llm`)
 - `transcribe_dual_linux.py` - Linux-specific with dual audio support
 ### Common Commands
 ```bash
-# Specify device and model
+python transcribe.py [OPTIONS]
 python transcribe_speakers.py --device "CABLE Output" --model medium
-# Save to file with language
+Options:
-python transcribe_speakers.py --language es --output transcript.txt
+  --model {tiny,base,small,medium,large}  Whisper model (default: tiny)
-
+  --language CODE                         Language code (default: en)
-# Fast mode (low latency)
+  --mic DEVICE                            Microphone device name
-python transcribe_speakers.py --fast-mode --model tiny --interval 3
+  --monitor DEVICE                        Speaker monitor device name
-
+  --interval SECONDS                      Processing interval (default: 5.0)
-# Extract complete sentences from chunks
+  --min-duration SECONDS                  Minimum audio duration (default: 2.0)
-python transcribe_speakers.py --sentence-mode --output sentences.txt
+  --enable-llm                            Enable LLM analysis
-
+  --llm-model MODEL                       Ollama model (default: qwen2.5:3b)
-# Maximum accuracy with LLM and sentence extraction
+  --output FILE                           Save transcript to file
-python transcribe_speakers.py --model large --enable-llm --sentence-mode --output enriched.txt
+  --force-cpu                             Force CPU processing
-
+  --list-devices                          List audio devices
 # Force CPU (avoid GPU issues)
 python transcribe_speakers.py --force-cpu
 ```
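The option list added above maps naturally onto `argparse`, which `transcribe.py` imports. The following is a sketch of a parser with the documented flags and defaults, not the script's actual parser: `build_parser` is a hypothetical helper, and the help strings are paraphrased from the list.

```python
import argparse


def build_parser():
    """Build a parser mirroring the options documented in the README."""
    parser = argparse.ArgumentParser(prog="transcribe.py")
    parser.add_argument("--model", default="tiny",
                        choices=["tiny", "base", "small", "medium", "large"],
                        help="Whisper model")
    parser.add_argument("--language", default="en", metavar="CODE",
                        help="Language code")
    parser.add_argument("--mic", metavar="DEVICE", help="Microphone device name")
    parser.add_argument("--monitor", metavar="DEVICE",
                        help="Speaker monitor device name")
    parser.add_argument("--interval", type=float, default=5.0, metavar="SECONDS",
                        help="Processing interval")
    parser.add_argument("--min-duration", type=float, default=2.0,
                        metavar="SECONDS", help="Minimum audio duration")
    parser.add_argument("--enable-llm", action="store_true",
                        help="Enable LLM analysis")
    parser.add_argument("--llm-model", default="qwen2.5:3b", metavar="MODEL",
                        help="Ollama model")
    parser.add_argument("--output", metavar="FILE", help="Save transcript to file")
    parser.add_argument("--force-cpu", action="store_true",
                        help="Force CPU processing")
    parser.add_argument("--list-devices", action="store_true",
                        help="List audio devices")
    return parser


args = build_parser().parse_args(
    ["--model", "medium", "--language", "nl", "--enable-llm"]
)
print(args.model, args.language, args.enable_llm, args.interval)
```

Since `run_transcribe.sh` forwards `"$@"` to `transcribe.py`, the same flags apply to both entry points.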
-### Key Options
+### Examples
-| Option | Description | Default |
+```bash
-|--------|-------------|---------|
+# Dutch transcription with LLM
-| `--model` | Model size: tiny/base/small/medium/large | base |
+./run_transcribe.sh --model medium --language nl --enable-llm
-| `--language` | Language code (en/es/fr/de/ja/etc.) | en |
+
-| `--device` | Audio device name (partial match) | Auto |
+# High-quality meeting transcription
-| `--interval` | Processing interval (seconds) | 8.0 |
+./run_transcribe.sh --model large --interval 8 --output meeting.txt
-| `--min-duration` | Minimum audio duration | 3.0 |
+
-| `--fast-mode` | Fast mode (3-5x faster, lower accuracy) | False |
+# Fast real-time transcription
-| `--enable-llm` | Enable fact-checking and questions | False |
+./run_transcribe.sh --model tiny --interval 3 --min-duration 2
-| `--llm-model` | Ollama model to use | llama3.2 |
+
-| `--output` | Save to file | None |
+# Specific devices
-| `--force-cpu` | Disable GPU | False |
+./run_transcribe.sh --mic "USB Mic" --monitor "Monitor of Speakers"
-| `--gpu-index` | GPU device index | 0 |
+```
 | `--sentence-mode` | Extract complete sentences from chunks | False |
 ## Model Performance
-| Model | Size | Speed | Quality | Best For |
+| Model  | Size   | Speed    | Quality | Use Case               |
-|-------|------|-------|---------|----------|
+|--------|--------|----------|---------|------------------------|
-| tiny | ~75 MB | Fastest | Basic | Quick tests, low-latency |
+| tiny   | 75 MB  | Fastest  | Basic   | Real-time, low latency |
-| base | ~145 MB | Fast | Good | General real-time use |
+| base   | 145 MB | Fast     | Good    | General use            |
-| small | ~485 MB | Moderate | Better | Balanced accuracy/speed |
+| small  | 485 MB | Moderate | Better  | Balanced               |
-| medium | ~1.5 GB | Slow | Great | High accuracy needs |
+| medium | 1.5 GB | Slow     | Great   | High accuracy          |
-| large | ~3 GB | Slowest | Best | Maximum accuracy |
+| large  | 3 GB   | Slowest  | Best    | Maximum quality        |
 ## Optimization Presets
 **Low Latency (Real-Time):**
 ```bash
 python transcribe_speakers.py --model tiny --fast-mode --interval 2 --min-duration 1.5
 ```
 **Balanced:**
 ```bash
 python transcribe_speakers.py --model base --interval 5
 ```
 **High Accuracy:**
 ```bash
 python transcribe_speakers.py --model large --interval 10 --enable-llm
 ```
 ## Troubleshooting
-**No loopback device:**
+**No audio devices found:**
- Windows: Enable Stereo Mix or install VB-Cable
+```bash
- Linux: Configure PulseAudio loopback
+# List all devices
 ./run_transcribe.sh --list-devices
 # Specify devices explicitly
 ./run_transcribe.sh --mic "device_name" --monitor "monitor_name"
 ```
 **CUDA errors:**
 ```bash
-python transcribe_speakers.py --force-cpu
+# Force CPU processing
 ./run_transcribe.sh --force-cpu
 ```
-**No audio captured:**
+**Ollama connection failed:**
- Verify audio is playing
+```bash
- Check device: `--list-devices`
+# Start Ollama service
- Increase system volume
+ollama serve
-**Poor quality:**
+# Pull required model
- Use larger model: `--model medium`
+ollama pull qwen2.5:3b
 ```
 **Poor transcription quality:**
 - Use larger model: `--model medium` or `--model large`
 - Increase interval: `--interval 10`
- Specify language: `--language <code>`
+- Specify language: `--language nl`
-
+- Ensure good audio quality (reduce background noise)
 **Ollama errors:**
 - Ensure Ollama is running
 - Pull model: `ollama pull llama3.2`
 ## Output Format
-**Standard:**
+### Standard Output
 ```
-[14:23:15] Transcribed audio segment.
+🎤 [14:23:15] User speaking into microphone
-[14:23:23] Another segment with timestamp.
+🔊 [14:23:18] Audio from speakers or system
 ```
-**With LLM (--enable-llm):**
+### With LLM Analysis
 ```
 🎤 [14:23:15] The Earth orbits the Sun in 365 days.
   ✅ FACTUAL (0.98): Scientifically accurate orbital period.
   ❓ Questions:
      1. Why do we need leap years?
      2. How does the elliptical orbit affect seasons?
      3. What factors influence Earth's orbital velocity?
 ```
 ### File Output
 ```
 [14:23:15] MIC: User speaking into microphone
 [14:23:18] SPEAKER: Audio from speakers
 ======================================================================
-[14:23:15] The Earth revolves around the Sun in 365 days.
+[14:23:25] MIC: The Earth orbits the Sun in 365 days.
 📊 Fact Check: FACTUAL (confidence: 0.98)
-💡 Scientifically accurate. Earth's orbital period is 365.25 days.
+💡 Scientifically accurate orbital period.
 ❓ Questions:
 1. Why do we need leap years?
-2. How does Earth's orbit affect seasons?
+2. How does the elliptical orbit affect seasons?
-======================================================================
+3. What factors influence Earth's orbital velocity?
 ```
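The plain transcript lines in the file output above follow a `[HH:MM:SS] SOURCE: text` pattern. A minimal sketch of a formatter producing that shape is given here. `format_entry` and `SOURCE_LABELS` are hypothetical names, and the mapping of the internal `mic`/`monitor` tags to `MIC`/`SPEAKER` is an assumption based on the sample output.

```python
from datetime import datetime

# Assumed mapping from capture source tags to the labels seen in the sample output.
SOURCE_LABELS = {"mic": "MIC", "monitor": "SPEAKER"}


def format_entry(when, source, text):
    """Format one transcript line as '[HH:MM:SS] LABEL: text'."""
    label = SOURCE_LABELS[source]
    return f"[{when.strftime('%H:%M:%S')}] {label}: {text}"


line = format_entry(
    datetime(2025, 12, 17, 14, 23, 15), "mic", "User speaking into microphone"
)
print(line)
```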
-## Technical Stack
+## Architecture
- **Audio**: sounddevice, soundfile (16kHz mono, 16-bit PCM)
+- **Audio Capture**: sounddevice with dual-stream support
- **Transcription**: faster-whisper (optimized Whisper)
+- **Transcription**: faster-whisper (optimized Whisper implementation)
- **LLM**: Ollama (local inference)
+- **LLM**: Ollama for local inference
- **Capture**: WASAPI loopback (Windows), PulseAudio (Linux)
+- **Format**: 16kHz mono, 16-bit PCM
 - **Processing**: Independent mic/speaker buffers with beam_size=3
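The "independent mic/speaker buffers" item can be illustrated with the `(source, chunk)` tuples that `DualAudioCapture` places on its queue in `transcribe.py`. The sketch below routes such tuples into one buffer per source. It uses plain lists of integers as stand-ins for audio chunks, and `drain_into_buffers` is a hypothetical helper rather than a function from the script.

```python
import queue
from collections import defaultdict


def drain_into_buffers(audio_queue, buffers):
    """Move every queued (source, chunk) pair into that source's buffer."""
    moved = 0
    while True:
        try:
            source, chunk = audio_queue.get_nowait()
        except queue.Empty:
            return moved
        buffers[source].extend(chunk)
        moved += 1


# Stand-in data: small integer lists instead of int16 audio arrays.
audio_queue = queue.Queue()
audio_queue.put(("mic", [1, 2, 3]))
audio_queue.put(("monitor", [10, 20]))
audio_queue.put(("mic", [4]))

buffers = defaultdict(list)
moved = drain_into_buffers(audio_queue, buffers)
print(moved, dict(buffers))
```

Keeping the buffers separate lets each source be transcribed and labelled on its own, so microphone speech and speaker audio never get mixed into one segment.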
-## Future Work
+## Contributing
- Real-time streaming transcription with reduced buffering
+Contributions welcome! Please open issues or submit pull requests.
 - Speaker diarization improvements
 - Web interface for remote monitoring
 - Multi-device simultaneous transcription
 - Cloud LLM integration options
 - Custom vocabulary and domain adaptation
 - Noise reduction preprocessing
 ## License
--- a/RUN_DUTCH.sh
+++ b/RUN_DUTCH.sh
@@ -0,0 +1,19 @@
 #!/bin/bash
 # Dutch language transcription with GPU and sentence extraction
 cd "$(dirname "$0")"
 export LD_LIBRARY_PATH=".venv/lib/python3.13/site-packages/nvidia/cudnn/lib:.venv/lib/python3.13/site-packages/nvidia/cublas/lib:${LD_LIBRARY_PATH}"
 echo "Starting Dutch transcription..."
 echo "Speak in Dutch into your microphone"
 echo "Press Ctrl+C to stop"
 echo ""
 .venv/bin/python3 transcribe_speakers.py \
    --sentence-mode \
    --language nl \
    --model base \
    --interval 5 \
    --min-duration 2 \
    "$@"
--- a/RUN_DUTCH_LLM.sh
+++ b/RUN_DUTCH_LLM.sh
@@ -0,0 +1,22 @@
 #!/bin/bash
 # Dutch transcription with GPU, sentence extraction, and LLM analysis
 cd "$(dirname "$0")"
 export LD_LIBRARY_PATH=".venv/lib/python3.13/site-packages/nvidia/cudnn/lib:.venv/lib/python3.13/site-packages/nvidia/cublas/lib:${LD_LIBRARY_PATH}"
 echo "Starting Dutch transcription with LLM analysis..."
 echo "Using model: llama3.2:latest"
 echo "Speak in Dutch into your microphone"
 echo "Press Ctrl+C to stop"
 echo ""
 .venv/bin/python3 transcribe_speakers.py \
    --sentence-mode \
    --language nl \
    --model large \
    --interval 5 \
    --min-duration 2 \
    --enable-llm \
    --llm-model "llama3.2:latest" \
    "$@"
--- a/RUN_GPU.sh
+++ b/RUN_GPU.sh
@@ -0,0 +1,13 @@
 #!/bin/bash
 # GPU-accelerated transcription with sentence extraction
 cd "$(dirname "$0")"
 export LD_LIBRARY_PATH=".venv/lib/python3.13/site-packages/nvidia/cudnn/lib:.venv/lib/python3.13/site-packages/nvidia/cublas/lib:${LD_LIBRARY_PATH}"
 .venv/bin/python3 transcribe_speakers.py \
    --sentence-mode \
    --model large \
    --interval 5 \
    --min-duration 2 \
    "$@"
--- a/enriched.txt
+++ b/enriched.txt
@@ -1,226 +0,0 @@
 [23:31:46] So it helps us get back into a grounded information terrain  and then also it requires us.
 📊 Fact Check: NOT_FACTUAL (confidence: 0.90)
 💡 The statement is a vague, non‑specific claim that cannot be verified against any factual evidence.
 ❓ Questions:
 1. What specific processes or actions help us return to a grounded information terrain?
 2. In what ways does this approach require us to change our current practices or mindset?
 3. How does re-establishing a grounded information terrain impact the overall effectiveness of the project?
 ======================================================================
 [23:31:54] to take the time to pay attention to information, really absorb it properly,  and then to make decisions based on that. So we need to bring people into the process of
 📊 Fact Check: NOT_FACTUAL (confidence: 0.99)
 💡 The statement is an incomplete, non‑factual description of a process, not a verifiable claim.
 ❓ Questions:
 1. What are the key points here?
 2. What evidence supports this?
 3. What are the implications?
 ======================================================================
 [23:32:02] decision making and at the same time as part of that bring them into a terrain  of really
 📊 Fact Check: NOT_FACTUAL (confidence: 1.00)
 💡 The statement is a nonsensical fragment and does not convey a verifiable factual claim.
 ❓ Questions:
 1. What does the phrase "bring them into a terrain of really" refer to in the context of decision making?
 2. How does the process of decision making simultaneously involve "bringing them into a terrain" as mentioned?
 3. Can you explain how the concept of "terrain"?
 ======================================================================
 [23:32:10] curing and discerning information properly and then engage in the decision-making process.  That's the only way we're actually going to turn this around.  It's not going to be good enough to...
 📊 Fact Check: NOT_FACTUAL (confidence: 0.90)
 💡 The fragment is an incomplete quote with no verifiable factual claim.
 ❓ Questions:
 1. What does "curing and discerning information properly" entail in the context of this statement?
 2. How does engaging in the decision‑making process contribute to turning the situation around?
 3. Why is simply having information or a plan not sufficient according to the speaker?
 ======================================================================
 [23:32:18] to elect new politicians because the underlying problem of the way we absorb, process and  deal with information now remains.  And the only way we can do that is actually to do it.
 📊 Fact Check: NOT_FACTUAL (confidence: 0.95)
 💡 The statement is an opinion about politics and information processing, not a verifiable factual claim.
 ❓ Questions:
 1. What specific aspects of the way we absorb, process, and deal with information are identified as the underlying problem in the statement?
 2. How does the statement justify the election of new politicians as a solution to the information-related issue it describes?
 3. What practical steps or strategies does the statement imply we should take to "actually do it" in addressing the information problem?
 ======================================================================
 [23:32:26] actually by bringing people in on a mass basis, having huge numbers of citizens,  juries around the country on a regular basis for decisions we're making involving  the
 📊 Fact Check: NOT_FACTUAL (confidence: 0.90)
 💡 The statement is incomplete and lacks context, making it impossible to verify its factual accuracy.
 ❓ Questions:
 1. What are the key points here?
 2. What evidence supports this?
 3. What are the implications?
 ======================================================================
 [23:32:35] public that's the only way we're going to be able to turn this around and not  just think that okay let's just wait for another Kamala Harris or somebody like that  to come along and win an election then
 📊 Fact Check: NOT_FACTUAL (confidence: 0.95)
 💡 The quoted phrase
 ❓ Questions:
 1. What specific actions does the speaker believe are necessary?
 2. Q1: What specific actions does the speaker believe are necessary?
 3. What are the implications?
 ======================================================================
 [23:32:42] we'll all be right and we'll be able to turn the clock back. It won't work like that.  The problem is far too deep seated than that. So, yes, we are becoming
 📊 Fact Check: NOT_FACTUAL (confidence: 0.90)
 💡 The statement is a vague, incomplete fragment with no verifiable factual claim.
 ❓ Questions:
 1. What specific problem is the speaker implying is "far too deep seated" to be solved by simply "turning the clock back"?
 2. How does the speaker's claim that "we'll all be right" relate to the broader context or argument being presented?
 3. In what ways might the statement "So, yes, we are becoming" reflect a shift in perspective or identity for the speaker or the audience?
 ======================================================================
 [23:32:50] That's basically what's going on at the moment.  But that doesn't mean that we can lose hope, because there are mechanisms in which we can  actually turn that around.
 📊 Fact Check: NOT_FACTUAL (confidence: 0.90)
 💡 The statement is a general, non‑specific claim that cannot be verified as true or false.
 ❓ Questions:
 1. What specific situation or issue is being described as "what's going on at the moment"?
 2. What mechanisms are being referred to that could help "turn that around"?
 3. How does the speaker justify maintaining hope despite the current challenges?
 ======================================================================
 [23:32:58] by actually engaging in the political process ourselves,  which would force us to then utilise information in a different way.
 📊 Fact Check: DUBIOUS (confidence: 0.70)
 💡 The claim is a speculative assertion about how political engagement might change information use, and it cannot be verified as a factual statement.
 ❓ Questions:
 1. What are the key points here?
 2. What evidence supports this?
 3. What are the implications?
 ======================================================================
 [23:33:06] hopefully in the end come to different conclusions, but be part of that decision-making process too.  So it's an important realization. What's happening to us is
 📊 Fact Check: NOT_FACTUAL (confidence: 1.00)
 💡 The statement is a subjective expression of hope and realization, not a verifiable factual claim.
 ❓ Questions:
 1. What does the speaker mean by "hopefully in the end come to different conclusions" and how does that relate to the decision-making process mentioned?
 2. In what ways might being part of the decision-making process influence the outcomes described in the statement?
 3. What specific "important realization" is referenced, and how does it connect to "what's happening to us"?
 ======================================================================
 [23:33:25] species in terms of our intelligence but it more importantly gives us a very important  call to action.  We need to think.  We need to think.  We need to think.  We need to think.  We need to think.  We need to think.  We need to think.  We need to think.  We need to think.  We need to think.  We need to think.  We need to think.  We need to think.  We need to think.  We need to think.  We need to think.  We need to think.  We need to think.  We need to think.  We need to think.  We need to think.  We need to think.  We need to think.  We need to think.  We need to think.  We need to think.  We need to think.  We need to think.  We need to think.  We need to think.
 📊 Fact Check: NOT_FACTUAL (confidence: 1.00)
 💡 The statement is a nonsensical fragment that does not present any verifiable factual claim.
 ❓ Questions:
 1. What is the main message conveyed by?
 2. Q1: What is the main message conveyed by?
 3. What are the implications?
 ======================================================================
 [23:33:40] differently about how we govern ourselves going forward if we are to  reverse this genuine decline. I hope you like my video. As a psychiatrist who loves  politics and economics and philosophy I love to make videos like this and you  can really help promote this video to other people and get it on your feed more  by liking and commenting and subscribing to the video.
 📊 Fact Check: NOT_FACTUAL (confidence: 1.00)
 💡 The statement is a personal comment and request for promotion, not a claim that can be verified as true or false.
 ❓ Questions:
 1. What specific strategies does the speaker propose for reversing the "genuine decline" in governance mentioned in the statement?
 2. How does the speaker’s background as a psychiatrist influence their perspective on politics, economics, and philosophy?
 3. In what ways does the speaker suggest viewers can effectively promote the video to reach a wider audience?
 ======================================================================
 [23:33:46] well. We have a wonderful community of people here who comment and support each other through  this very traumatic period of world history.
 📊 Fact Check: DUBIOUS (confidence: 0.60)
 💡 The claim is a subjective, unverified assertion about a community’s nature and cannot be confirmed or refuted with available evidence.
 ❓ Questions:
 1. Who are the members of the community mentioned in the statement?
 2. Which specific traumatic period of world history is being referred to?
 3. In what ways do the community members comment and support each other during this period?
 ======================================================================
 [23:33:54] that we're going through right now.  I also hope you consider becoming a subscriber  to the channel and also subscribing to my.
 📊 Fact Check: NOT_FACTUAL (confidence: 1.00)
 💡 The sentence is a fragment expressing a hope, not a verifiable factual claim.
 ❓ Questions:
 1. What are the key points here?
 2. What evidence supports this?
 3. What are the implications?
 ======================================================================
 [23:34:02] E-newsletter, there's a link in the description, and that way we can stay in touch outside  the channel so you can stay, keep a
 📊 Fact Check: NOT_FACTUAL (confidence: 0.90)
 💡 [one sentence]"
 ❓ Questions:
 1. What are the key points here?
 2. What evidence supports this?
 3. What are the implications?
 ======================================================================
 [23:34:10] rest of all of the content that I'm making on an ongoing basis. The latest of  which actually is my latest book called We the People.
 📊 Fact Check: DUBIOUS (confidence: 0.50)
 💡 The statement is a fragment with no verifiable context or evidence that the speaker’s latest book is titled *We the People*.
 ❓ Questions:
 1. What are the key points here?
 2. What evidence supports this?
 3. What are the implications?
 ======================================================================
 [23:34:19] very very proud of this book. It's actually a novel, a fiction book, written by myself  and the famous award-winning author T.J. McGregor.
 📊 Fact Check: NOT_FACTUAL (confidence: 0.90)
 💡 No verifiable record exists of a novel co‑authored by the user and an award‑winning author
 ❓ Questions:
 1. How did you and T.J. McGregor collaborate on the novel?
 2. What inspired you to co-write a fiction book with an award‑winning author?
 3. What genre and themes does the novel?
 ======================================================================
 [23:34:27] Together we wrote a book about what the future might look like.  Bit of a dystopian novel, but what might happen if autocracy goes to its next stage?
 📊 Fact Check: NOT_FACTUAL (confidence: 0.90)
 💡 There is no verifiable evidence that the speaker and the other person co‑authored a book on future dystopias.
 ❓ Questions:
 1. What core themes and motifs did the book explore to envision the next stage of autocracy?
 2. How does the narrative structure of the novel reflect the progression of authoritarian power in a dystopian future?
 3. What real-world events or historical patterns inspired the authors to imagine a future where autocracy has evolved beyond its current form?
--- a/run_transcribe.sh
+++ b/run_transcribe.sh
@@ -11,4 +11,4 @@ CUBLAS_PATH=".venv/lib/python3.13/site-packages/nvidia/cublas/lib"
 export LD_LIBRARY_PATH="${CUDNN_PATH}:${CUBLAS_PATH}:${LD_LIBRARY_PATH}"
 # Run the transcription script with all arguments
-python3 transcribe_dual_linux.py "$@"
+python3 transcribe.py "$@"
--- a/start_transcribe.sh
+++ b/start_transcribe.sh
@@ -0,0 +1,24 @@
 #!/bin/bash
 cd "$(dirname "$0")"
 source .venv/bin/activate
 # Set CUDA library paths for ctranslate2
 export LD_LIBRARY_PATH=".venv/lib/python3.13/site-packages/nvidia/cudnn/lib:.venv/lib/python3.13/site-packages/nvidia/cublas/lib:.venv/lib/python3.13/site-packages/nvidia/cuda_runtime/lib:${LD_LIBRARY_PATH}"
 # Run transcription with sentence mode and GPU
 echo "Starting transcription with:"
 echo "  - Model: tiny (fast, good for testing)"
 echo "  - GPU mode (RTX 4060 Ti)"
 echo "  - Sentence extraction enabled"
 echo "  - Interval: 5 seconds"
 echo ""
 echo "Speak into your microphone to see transcription..."
 echo "Press Ctrl+C to stop"
 echo ""
 python3 transcribe_speakers.py \
    --sentence-mode \
    --model tiny \
    --interval 5 \
    --min-duration 2
--- a/transcribe.py
+++ b/transcribe.py
@@ -0,0 +1,437 @@
 #!/usr/bin/env python3
 """
 Real-time audio transcription with dual capture and optional LLM analysis.
 Supports microphone + speaker monitor, file output, and fact-checking.
 """
 import sounddevice as sd
 import numpy as np
 import threading
 import queue
 import time
 import os
 import argparse
 from datetime import datetime
 from faster_whisper import WhisperModel
 try:
    import ollama
    OLLAMA_AVAILABLE = True
 except ImportError:
    OLLAMA_AVAILABLE = False
 class DualAudioCapture:
    """Capture both microphone and speaker output simultaneously"""
    def __init__(self, mic_device=None, monitor_device=None, sample_rate=16000, chunk_size=2048):
        self.sample_rate = sample_rate
        self.chunk_size = chunk_size
        self.audio_queue = queue.Queue()
        # Find devices
        devices = sd.query_devices()
        # Microphone (default input or specified)
        if mic_device is None:
            self.mic_device = sd.default.device[0]  # Default input
        else:
            self.mic_device = self._find_device(mic_device, input_required=True)
        # Monitor/Loopback (for speaker output)
        if monitor_device:
            self.monitor_device = self._find_device(monitor_device, input_required=True)
        else:
            self.monitor_device = None
        print(f"✓ Microphone: {devices[self.mic_device]['name']} (index {self.mic_device})")
        if self.monitor_device is not None:
            print(f"✓ Monitor: {devices[self.monitor_device]['name']} (index {self.monitor_device})")
        else:
            print("⚠ No monitor device - capturing microphone only")
        # Start streams
        self.mic_stream = sd.InputStream(
            device=self.mic_device,
            channels=1,
            samplerate=sample_rate,
            blocksize=chunk_size,
            dtype='int16',
            callback=self._mic_callback
        )
        # Compare against None: device index 0 is valid but falsy
        if self.monitor_device is not None:
            self.monitor_stream = sd.InputStream(
                device=self.monitor_device,
                channels=1,
                samplerate=sample_rate,
                blocksize=chunk_size,
                dtype='int16',
                callback=self._monitor_callback
            )
        else:
            self.monitor_stream = None
        self.mic_stream.start()
        if self.monitor_stream:
            self.monitor_stream.start()
        print("✓ Audio capture started")
    def _find_device(self, device_name, input_required=True):
        """Find device by name substring"""
        devices = sd.query_devices()
        for i, dev in enumerate(devices):
            if device_name.lower() in dev['name'].lower():
                if not input_required or dev['max_input_channels'] > 0:
                    return i
        raise RuntimeError(f"Device '{device_name}' not found")
    def _mic_callback(self, indata, frames, time_info, status):
        """Microphone audio callback"""
        if status:
            print(f"⚠ Mic status: {status}")
        self.audio_queue.put(('mic', indata.copy()))
    def _monitor_callback(self, indata, frames, time_info, status):
        """Monitor/speaker audio callback"""
        if status:
            print(f"⚠ Monitor status: {status}")
        self.audio_queue.put(('monitor', indata.copy()))
    def read_chunk(self):
        """Read audio data from queue"""
        try:
            return self.audio_queue.get(timeout=0.05)
        except queue.Empty:
            return None
    def close(self):
        """Cleanup resources"""
        self.mic_stream.stop()
        self.mic_stream.close()
        if self.monitor_stream:
            self.monitor_stream.stop()
            self.monitor_stream.close()
 class WhisperTranscriber:
    """Process audio with Whisper"""
    def __init__(self, model_name="base", language="en", force_cpu=False):
        print(f"Loading Whisper model '{model_name}'...")
        import torch
        has_cuda = torch.cuda.is_available() and not force_cpu
        device = "cpu"
        compute_type = "int8"
        if has_cuda:
            try:
                import ctranslate2
                if ctranslate2.get_cuda_device_count() > 0:
                    device = "cuda"
                    compute_type = "float16"
                    print(f"✓ Using GPU: {torch.cuda.get_device_name(0)}")
            except Exception as e:
                print(f"⚠ CUDA unavailable: {e}")
        if device == "cpu":
            print("✓ Using CPU")
        model_kwargs = {"device": device, "compute_type": compute_type}
        if device == "cpu":
            model_kwargs["cpu_threads"] = 4
        self.model = WhisperModel(model_name, **model_kwargs)
        self.language = language
        self.mic_buffer = np.array([], dtype=np.float32)
        self.monitor_buffer = np.array([], dtype=np.float32)
        self.lock = threading.Lock()
    def add_audio(self, source, audio_chunk):
        """Add audio to appropriate buffer"""
        with self.lock:
            audio_float = audio_chunk.flatten().astype(np.float32) / 32768.0
            if source == 'mic':
                self.mic_buffer = np.concatenate([self.mic_buffer, audio_float])
            else:
                self.monitor_buffer = np.concatenate([self.monitor_buffer, audio_float])
    def transcribe_chunk(self, min_duration=3.0):
        """Transcribe each buffer that holds at least min_duration seconds of audio"""
        sample_rate = 16000  # Must match the capture sample rate
        pending = {}
        # Swap out full buffers under the lock; run Whisper after releasing it
        with self.lock:
            if len(self.mic_buffer) / sample_rate >= min_duration:
                pending['mic'] = self.mic_buffer
                self.mic_buffer = np.array([], dtype=np.float32)
            if len(self.monitor_buffer) / sample_rate >= min_duration:
                pending['monitor'] = self.monitor_buffer
                self.monitor_buffer = np.array([], dtype=np.float32)
        results = {source: self._transcribe(audio) for source, audio in pending.items()}
        return results if results else None
    def _transcribe(self, audio):
        """Internal transcription"""
        try:
            segments, _ = self.model.transcribe(
                audio,
                language=self.language,
                beam_size=3,
                vad_filter=True,
                vad_parameters=dict(min_silence_duration_ms=500)
            )
            text = " ".join([seg.text for seg in segments]).strip()
            return text if text else None
        except Exception as e:
            print(f"❌ Transcription error: {e}")
            return None
 class LLMAnalyzer:
    """LLM analysis with fact-checking and question generation"""
    def __init__(self, model="qwen2.5:3b"):
        if not OLLAMA_AVAILABLE:
            raise RuntimeError("Ollama not installed: pip install ollama")
        self.model = model
        try:
            ollama.list()
            print(f"✓ Ollama connected: {self.model}")
        except Exception as e:
            raise RuntimeError(f"Ollama not running: {e}")
    def fact_check(self, text):
        """Quick fact-check"""
        prompt = f"""Fact-check this statement. Reply ONLY with:
 VERDICT: factual/dubious/false
 CONFIDENCE: 0.0-1.0
 REASON: one sentence
 Statement: "{text}" """
        try:
            response = ollama.generate(
                model=self.model,
                prompt=prompt,
                options={"temperature": 0.1, "num_predict": 80}
            )
            import re
            response_text = response['response']
            verdict = re.search(r'VERDICT:\s*(\w+)', response_text, re.I)
            # Match a plain number so a trailing period ("0.9.") cannot break float()
            confidence = re.search(r'CONFIDENCE:\s*(\d*\.?\d+)', response_text, re.I)
            reason = re.search(r'REASON:\s*(.+?)(?:\n|$)', response_text, re.I | re.DOTALL)
            return {
                'verdict': verdict.group(1).lower() if verdict else 'unknown',
                'confidence': float(confidence.group(1)) if confidence else 0.5,
                'reason': reason.group(1).strip() if reason else response_text[:150]
            }
        except Exception as e:
            return {'verdict': 'error', 'confidence': 0.0, 'reason': str(e)}
    def generate_questions(self, text):
        """Generate follow-up questions"""
        prompt = f"""Generate 3 insightful questions about this. Reply ONLY with:
 Q1: [question]
 Q2: [question]
 Q3: [question]
 Statement: "{text}" """
        try:
            response = ollama.generate(
                model=self.model,
                prompt=prompt,
                options={"temperature": 0.7, "num_predict": 120}
            )
            import re
            response_text = response['response']
            questions = []
            for i in range(1, 4):
                q_match = re.search(rf'Q{i}:\s*(.+?)(?:\n|$)', response_text, re.I)
                if q_match:
                    question = q_match.group(1).strip()
                    if not question.endswith('?'):
                        question += '?'
                    questions.append(question)
            # Pad with generic fallbacks if the model returned fewer than 3 questions
            defaults = ["What are the implications?", "What evidence supports this?", "What's the context?"]
            while len(questions) < 3:
                questions.append(defaults[len(questions)])
            return questions[:3]
        except Exception:
            return ["What are the key points?", "What supports this?", "What are the implications?"]
 def save_transcript(text, source, timestamp, filename):
    """Append transcript to file"""
    os.makedirs(os.path.dirname(filename) if os.path.dirname(filename) else '.', exist_ok=True)
    with open(filename, "a", encoding="utf-8") as f:
        source_label = "MIC" if source == 'mic' else "SPEAKER"
        f.write(f"[{timestamp}] {source_label}: {text}\n")
 def save_enriched_transcript(text, source, timestamp, fact_check, questions, filename):
    """Save enriched transcript with LLM analysis"""
    os.makedirs(os.path.dirname(filename) if os.path.dirname(filename) else '.', exist_ok=True)
    with open(filename, "a", encoding="utf-8") as f:
        source_label = "MIC" if source == 'mic' else "SPEAKER"
        f.write(f"\n{'='*70}\n")
        f.write(f"[{timestamp}] {source_label}: {text}\n\n")
        if fact_check:
            f.write(f"📊 Fact Check: {fact_check['verdict'].upper()} ")
            f.write(f"(confidence: {fact_check['confidence']:.2f})\n")
            f.write(f"💡 {fact_check['reason']}\n\n")
        if questions:
            f.write("❓ Questions:\n")
            for i, q in enumerate(questions, 1):
                f.write(f"{i}. {q}\n")
            f.write("\n")
 def main():
    parser = argparse.ArgumentParser(description="Real-time audio transcription with dual capture")
    parser.add_argument("--model", default="tiny", choices=["tiny", "base", "small", "medium", "large"],
                        help="Whisper model (default: tiny)")
    parser.add_argument("--language", default="en", help="Language code (default: en)")
    parser.add_argument("--mic", help="Microphone device name (partial match)")
    parser.add_argument("--monitor", help="Monitor device name for speaker capture")
    parser.add_argument("--interval", type=float, default=5.0, help="Processing interval in seconds (default: 5.0)")
    parser.add_argument("--min-duration", type=float, default=2.0, help="Minimum audio duration (default: 2.0)")
    parser.add_argument("--enable-llm", action="store_true", help="Enable LLM analysis (fact-checking + questions)")
    parser.add_argument("--llm-model", default="qwen2.5:3b", help="Ollama model (default: qwen2.5:3b)")
    parser.add_argument("--output", "-o", help="Save transcript to file")
    parser.add_argument("--list-devices", action="store_true", help="List audio devices and exit")
    parser.add_argument("--force-cpu", action="store_true", help="Force CPU processing")
    args = parser.parse_args()
    if args.list_devices:
        print("\nAvailable audio devices:")
        for i, dev in enumerate(sd.query_devices()):
            in_ch = dev['max_input_channels']
            out_ch = dev['max_output_channels']
            if in_ch > 0:
                print(f"  [{i:2d}] {dev['name']:<50} IN:{in_ch} OUT:{out_ch}")
        return
    print("=== Real-Time Audio Transcription ===")
    print(f"Model: {args.model} | Language: {args.language} | Interval: {args.interval}s")
    if args.output:
        print(f"Output: {args.output}")
    if args.enable_llm:
        print(f"LLM Analysis: Enabled ({args.llm_model})")
    # Initialize capture
    try:
        capturer = DualAudioCapture(
            mic_device=args.mic,
            monitor_device=args.monitor,
            sample_rate=16000,
            chunk_size=2048
        )
    except Exception as e:
        print(f"\n❌ Audio Error: {e}")
        print("\nTip: Use --list-devices to see available devices")
        print("     Use --mic and --monitor to specify devices")
        return
    # Initialize transcriber
    try:
        transcriber = WhisperTranscriber(
            model_name=args.model,
            language=args.language,
            force_cpu=args.force_cpu
        )
    except Exception as e:
        print(f"\n❌ Whisper Error: {e}")
        return
    # Initialize LLM analyzer
    llm_analyzer = None
    if args.enable_llm:
        try:
            llm_analyzer = LLMAnalyzer(model=args.llm_model)
        except Exception as e:
            print(f"\n⚠ LLM Error: {e}")
            print("Continuing without LLM analysis...")
    # Main loop
    print(f"\n✅ Started. Press Ctrl+C to stop.\n{'='*60}")
    last_process = time.time()
    try:
        while True:
            # Collect audio
            chunk = capturer.read_chunk()
            if chunk:
                source, audio = chunk
                transcriber.add_audio(source, audio)
            # Process at intervals
            if time.time() - last_process >= args.interval:
                results = transcriber.transcribe_chunk(min_duration=args.min_duration)
                if results:
                    timestamp = datetime.now().strftime("%H:%M:%S")
                    for source, text in results.items():
                        if text:
                            source_emoji = "🎤" if source == 'mic' else "🔊"
                            print(f"\n{source_emoji} [{timestamp}] {text}")
                            # LLM analysis
                            fact_check = None
                            questions = None
                            if llm_analyzer:
                                fact_check = llm_analyzer.fact_check(text)
                                questions = llm_analyzer.generate_questions(text)
                                verdict_emoji = {'factual': '✅', 'dubious': '⚠️', 'false': '❌'}.get(
                                    fact_check['verdict'], '❓')
                                print(f"   {verdict_emoji} {fact_check['verdict'].upper()} "
                                      f"({fact_check['confidence']:.2f}): {fact_check['reason']}")
                                print("   ❓ Questions:")
                                for i, q in enumerate(questions, 1):
                                    print(f"      {i}. {q}")
                            # Save to file
                            if args.output:
                                if llm_analyzer:
                                    save_enriched_transcript(text, source, timestamp, fact_check, questions, args.output)
                                else:
                                    save_transcript(text, source, timestamp, args.output)
                last_process = time.time()
    except KeyboardInterrupt:
        print(f"\n{'='*60}\n🛑 Stopping...")
    finally:
        # Close the audio streams even if the loop fails with an unexpected error
        capturer.close()
    if args.output and os.path.exists(args.output):
        print(f"\n💾 Transcript saved: {os.path.abspath(args.output)}")
    print("\n✅ Done!")
 if __name__ == "__main__":
    main()
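Review note: `_find_device` returns a list index, and index 0 is a valid device, so callers need to compare the result against `None` rather than test its truthiness. The sketch below illustrates the pitfall on a plain list of dicts shaped like the entries from `sd.query_devices()`. The `find_device` helper and the device names are hypothetical. Unlike the script's `_find_device`, which raises `RuntimeError`, this helper returns `None` on a miss.

```python
def find_device(devices, name, input_required=True):
    """Return the index of the first device whose name contains `name`, else None."""
    needle = name.lower()
    for index, dev in enumerate(devices):
        if needle in dev["name"].lower():
            if not input_required or dev["max_input_channels"] > 0:
                return index
    return None


# Hypothetical device table for illustration only
devices = [
    {"name": "Monitor of Built-in Audio", "max_input_channels": 2},
    {"name": "USB Microphone", "max_input_channels": 1},
    {"name": "HDMI Output", "max_input_channels": 0},
]

monitor = find_device(devices, "monitor")
# Index 0 is falsy, so `if monitor:` would wrongly report "no monitor device"
has_monitor = monitor is not None
```

Here the monitor sits at index 0, which is exactly the case a truthiness check such as `if self.monitor_device:` would skip.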
--- a/transcribe_dual_linux.py
+++ b/transcribe_dual_linux.py
@@ -1,7 +1,7 @@
 #!/usr/bin/env python3
 """
-Real-time transcription with dual audio capture (microphone + speaker monitor).
+Real-time audio transcription with dual capture and optional LLM analysis.
-Linux/PipeWire optimized with Ollama LLM fact-checking.
+Supports microphone + speaker monitor, file output, and fact-checking.
 """
 import sounddevice as sd
@@ -9,6 +9,7 @@ import numpy as np
 import threading
 import queue
 import time
 import os
 import argparse
 from datetime import datetime
 from faster_whisper import WhisperModel
@@ -197,8 +198,8 @@ class WhisperTranscriber:
            return None
-class LLMFactChecker:
+class LLMAnalyzer:
-    """Fast fact-checking with Ollama"""
+    """LLM analysis with fact-checking and question generation"""
    def __init__(self, model="qwen2.5:3b"):
        if not OLLAMA_AVAILABLE:
@@ -228,34 +229,100 @@ Statement: "{text}" """
            )
            import re
-            text = response['response']
+            response_text = response['response']
-            verdict = re.search(r'VERDICT:\s*(\w+)', text, re.I)
+            verdict = re.search(r'VERDICT:\s*(\w+)', response_text, re.I)
-            confidence = re.search(r'CONFIDENCE:\s*([\d.]+)', text, re.I)
+            confidence = re.search(r'CONFIDENCE:\s*([\d.]+)', response_text, re.I)
-            reason = re.search(r'REASON:\s*(.+?)(?:\n|$)', text, re.I | re.DOTALL)
+            reason = re.search(r'REASON:\s*(.+?)(?:\n|$)', response_text, re.I | re.DOTALL)
            return {
                'verdict': verdict.group(1).lower() if verdict else 'unknown',
                'confidence': float(confidence.group(1)) if confidence else 0.5,
-                'reason': reason.group(1).strip() if reason else text[:150]
+                'reason': reason.group(1).strip() if reason else response_text[:150]
            }
        except Exception as e:
            return {'verdict': 'error', 'confidence': 0.0, 'reason': str(e)}
    def generate_questions(self, text):
        """Generate follow-up questions"""
        prompt = f"""Generate 3 insightful questions about this. Reply ONLY with:
 Q1: [question]
 Q2: [question]
 Q3: [question]
 Statement: "{text}" """
        try:
            response = ollama.generate(
                model=self.model,
                prompt=prompt,
                options={"temperature": 0.7, "num_predict": 120}
            )
            import re
            response_text = response['response']
            questions = []
            for i in range(1, 4):
                q_match = re.search(rf'Q{i}:\s*(.+?)(?:\n|$)', response_text, re.I)
                if q_match:
                    question = q_match.group(1).strip()
                    if not question.endswith('?'):
                        question += '?'
                    questions.append(question)
            # Pad with generic fallbacks if the model returned fewer than 3 questions
            defaults = ["What are the implications?", "What evidence supports this?", "What's the context?"]
            while len(questions) < 3:
                questions.append(defaults[len(questions)])
            return questions[:3]
        except Exception:
            return ["What are the key points?", "What supports this?", "What are the implications?"]
 def save_transcript(text, source, timestamp, filename):
    """Append transcript to file"""
    os.makedirs(os.path.dirname(filename) if os.path.dirname(filename) else '.', exist_ok=True)
    with open(filename, "a", encoding="utf-8") as f:
        source_label = "MIC" if source == 'mic' else "SPEAKER"
        f.write(f"[{timestamp}] {source_label}: {text}\n")
 def save_enriched_transcript(text, source, timestamp, fact_check, questions, filename):
    """Save enriched transcript with LLM analysis"""
    os.makedirs(os.path.dirname(filename) if os.path.dirname(filename) else '.', exist_ok=True)
    with open(filename, "a", encoding="utf-8") as f:
        source_label = "MIC" if source == 'mic' else "SPEAKER"
        f.write(f"\n{'='*70}\n")
        f.write(f"[{timestamp}] {source_label}: {text}\n\n")
        if fact_check:
            f.write(f"📊 Fact Check: {fact_check['verdict'].upper()} ")
            f.write(f"(confidence: {fact_check['confidence']:.2f})\n")
            f.write(f"💡 {fact_check['reason']}\n\n")
        if questions:
            f.write("❓ Questions:\n")
            for i, q in enumerate(questions, 1):
                f.write(f"{i}. {q}\n")
            f.write("\n")
 def main():
-    parser = argparse.ArgumentParser(description="Dual audio transcription with fact-checking")
+    parser = argparse.ArgumentParser(description="Real-time audio transcription with dual capture")
-    parser.add_argument("--model", default="tiny", choices=["tiny", "base", "small", "medium"],
+    parser.add_argument("--model", default="tiny", choices=["tiny", "base", "small", "medium", "large"],
-                        help="Whisper model (default: tiny for speed)")
+                        help="Whisper model (default: tiny)")
-    parser.add_argument("--language", default="en", help="Language code")
+    parser.add_argument("--language", default="en", help="Language code (default: en)")
    parser.add_argument("--mic", help="Microphone device name (partial match)")
    parser.add_argument("--monitor", help="Monitor device name for speaker capture")
-    parser.add_argument("--interval", type=float, default=5.0, help="Processing interval (seconds)")
+    parser.add_argument("--interval", type=float, default=5.0, help="Processing interval in seconds (default: 5.0)")
-    parser.add_argument("--min-duration", type=float, default=2.0, help="Min audio duration")
+    parser.add_argument("--min-duration", type=float, default=2.0, help="Minimum audio duration (default: 2.0)")
-    parser.add_argument("--enable-llm", action="store_true", help="Enable fact-checking")
+    parser.add_argument("--enable-llm", action="store_true", help="Enable LLM analysis (fact-checking + questions)")
-    parser.add_argument("--llm-model", default="qwen2.5:3b", help="Ollama model")
+    parser.add_argument("--llm-model", default="qwen2.5:3b", help="Ollama model (default: qwen2.5:3b)")
-    parser.add_argument("--list-devices", action="store_true", help="List audio devices")
+    parser.add_argument("--output", "-o", help="Save transcript to file")
-    parser.add_argument("--force-cpu", action="store_true", help="Force CPU")
+    parser.add_argument("--list-devices", action="store_true", help="List audio devices and exit")
    parser.add_argument("--force-cpu", action="store_true", help="Force CPU processing")
    args = parser.parse_args()
@@ -268,8 +335,12 @@ def main():
                print(f"  [{i:2d}] {dev['name']:<50} IN:{in_ch} OUT:{out_ch}")
        return
-    print("=== Dual Audio Transcription with Fact-Checking ===")
+    print("=== Real-Time Audio Transcription ===")
    print(f"Model: {args.model} | Language: {args.language} | Interval: {args.interval}s")
    if args.output:
        print(f"Output: {args.output}")
    if args.enable_llm:
        print(f"LLM Analysis: Enabled ({args.llm_model})")
    # Initialize capture
    try:
@@ -296,14 +367,14 @@ def main():
        print(f"\n❌ Whisper Error: {e}")
        return
-    # Initialize fact checker
+    # Initialize LLM analyzer
-    fact_checker = None
+    llm_analyzer = None
    if args.enable_llm:
        try:
-            fact_checker = LLMFactChecker(model=args.llm_model)
+            llm_analyzer = LLMAnalyzer(model=args.llm_model)
        except Exception as e:
            print(f"\n⚠ LLM Error: {e}")
-            print("Continuing without fact-checking...")
+            print("Continuing without LLM analysis...")
    # Main loop
    print(f"\n✅ Started. Press Ctrl+C to stop.\n{'='*60}")
@@ -329,10 +400,27 @@ def main():
                            source_emoji = "🎤" if source == 'mic' else "🔊"
                            print(f"\n{source_emoji} [{timestamp}] {text}")
-                            if fact_checker:
+                            # LLM analysis
-                                fc = fact_checker.fact_check(text)
+                            fact_check = None
-                                verdict_emoji = {'factual': '✅', 'dubious': '⚠️', 'false': '❌'}.get(fc['verdict'], '❓')
+                            questions = None
-                                print(f"   {verdict_emoji} {fc['verdict'].upper()} ({fc['confidence']:.2f}): {fc['reason']}")
+                            if llm_analyzer:
                                fact_check = llm_analyzer.fact_check(text)
                                questions = llm_analyzer.generate_questions(text)
                                verdict_emoji = {'factual': '✅', 'dubious': '⚠️', 'false': '❌'}.get(
                                    fact_check['verdict'], '❓')
                                print(f"   {verdict_emoji} {fact_check['verdict'].upper()} "
                                      f"({fact_check['confidence']:.2f}): {fact_check['reason']}")
                                print("   ❓ Questions:")
                                for i, q in enumerate(questions, 1):
                                    print(f"      {i}. {q}")
                            # Save to file
                            if args.output:
                                if llm_analyzer:
                                    save_enriched_transcript(text, source, timestamp, fact_check, questions, args.output)
                                else:
                                    save_transcript(text, source, timestamp, args.output)
                last_process = time.time()
@@ -340,6 +428,8 @@ def main():
        print(f"\n{'='*60}\n🛑 Stopping...")
    capturer.close()
    if args.output and os.path.exists(args.output):
        print(f"\n💾 Transcript saved: {os.path.abspath(args.output)}")
    print("\n✅ Done!")
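Review note: the reply parsing in `fact_check` can be tried without a running Ollama server. The sketch below uses a hypothetical standalone helper, `parse_fact_check`, that mirrors the three regular expressions in the diff. It uses a number pattern that tolerates a trailing period and clamps the confidence to 0.0-1.0. Both are choices made for this sketch, not behaviour taken from the commit, and the sample reply is made up.

```python
import re


def parse_fact_check(response_text):
    """Parse a VERDICT/CONFIDENCE/REASON reply into a dict (hypothetical helper)."""
    verdict = re.search(r"VERDICT:\s*(\w+)", response_text, re.I)
    # \d*\.?\d+ ignores a trailing sentence period such as "0.9."
    confidence = re.search(r"CONFIDENCE:\s*(\d*\.?\d+)", response_text, re.I)
    reason = re.search(r"REASON:\s*(.+?)(?:\n|$)", response_text, re.I)
    score = float(confidence.group(1)) if confidence else 0.5
    return {
        "verdict": verdict.group(1).lower() if verdict else "unknown",
        "confidence": min(max(score, 0.0), 1.0),  # Clamp to the requested range
        "reason": reason.group(1).strip() if reason else response_text[:150],
    }


# Made-up reply text, not real model output
sample = "VERDICT: Dubious\nCONFIDENCE: 0.9.\nREASON: The claim lacks a cited source.\n"
parsed = parse_fact_check(sample)
```

Keeping the parsing in a pure function like this would let it be unit-tested separately from the `ollama.generate` call.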
--- a/transcribe_duil_linux_old.py
+++ b/transcribe_duil_linux_old.py
@@ -0,0 +1,347 @@
 #!/usr/bin/env python3
 """
 Real-time transcription with dual audio capture (microphone + speaker monitor).
 Linux/PipeWire optimized with Ollama LLM fact-checking.
 """
 import sounddevice as sd
 import numpy as np
 import threading
 import queue
 import time
 import argparse
 from datetime import datetime
 from faster_whisper import WhisperModel
 try:
    import ollama
    OLLAMA_AVAILABLE = True
 except ImportError:
    OLLAMA_AVAILABLE = False
 class DualAudioCapture:
    """Capture both microphone and speaker output simultaneously"""
    def __init__(self, mic_device=None, monitor_device=None, sample_rate=16000, chunk_size=2048):
        self.sample_rate = sample_rate
        self.chunk_size = chunk_size
        self.audio_queue = queue.Queue()
        # Find devices
        devices = sd.query_devices()
        # Microphone (default input or specified)
        if mic_device is None:
            self.mic_device = sd.default.device[0]  # Default input
        else:
            self.mic_device = self._find_device(mic_device, input_required=True)
        # Monitor/Loopback (for speaker output)
        if monitor_device:
            self.monitor_device = self._find_device(monitor_device, input_required=True)
        else:
            self.monitor_device = None
        print(f"✓ Microphone: {devices[self.mic_device]['name']} (index {self.mic_device})")
        if self.monitor_device is not None:  # Index 0 is a valid device
            print(f"✓ Monitor: {devices[self.monitor_device]['name']} (index {self.monitor_device})")
        else:
            print("⚠ No monitor device - capturing microphone only")
        # Start streams
        self.mic_stream = sd.InputStream(
            device=self.mic_device,
            channels=1,
            samplerate=sample_rate,
            blocksize=chunk_size,
            dtype='int16',
            callback=self._mic_callback
        )
        if self.monitor_device is not None:
            self.monitor_stream = sd.InputStream(
                device=self.monitor_device,
                channels=1,
                samplerate=sample_rate,
                blocksize=chunk_size,
                dtype='int16',
                callback=self._monitor_callback
            )
        else:
            self.monitor_stream = None
        self.mic_stream.start()
        if self.monitor_stream:
            self.monitor_stream.start()
        print("✓ Audio capture started")
    def _find_device(self, device_name, input_required=True):
        """Find device by name substring"""
        devices = sd.query_devices()
        for i, dev in enumerate(devices):
            if device_name.lower() in dev['name'].lower():
                if not input_required or dev['max_input_channels'] > 0:
                    return i
        raise RuntimeError(f"Device '{device_name}' not found")
    def _mic_callback(self, indata, frames, time_info, status):
        """Microphone audio callback"""
        if status:
            print(f"⚠ Mic status: {status}")
        self.audio_queue.put(('mic', indata.copy()))
    def _monitor_callback(self, indata, frames, time_info, status):
        """Monitor/speaker audio callback"""
        if status:
            print(f"⚠ Monitor status: {status}")
        self.audio_queue.put(('monitor', indata.copy()))
    def read_chunk(self):
        """Read audio data from queue"""
        try:
            return self.audio_queue.get(timeout=0.05)
        except queue.Empty:
            return None
    def close(self):
        """Cleanup resources"""
        self.mic_stream.stop()
        self.mic_stream.close()
        if self.monitor_stream:
            self.monitor_stream.stop()
            self.monitor_stream.close()
 class WhisperTranscriber:
    """Process audio with Whisper"""
    def __init__(self, model_name="base", language="en", force_cpu=False):
        print(f"Loading Whisper model '{model_name}'...")
        import torch
        has_cuda = torch.cuda.is_available() and not force_cpu
        device = "cpu"
        compute_type = "int8"
        if has_cuda:
            try:
                import ctranslate2
                if ctranslate2.get_cuda_device_count() > 0:
                    device = "cuda"
                    compute_type = "float16"
                    print(f"✓ Using GPU: {torch.cuda.get_device_name(0)}")
            except Exception as e:
                print(f"⚠ CUDA unavailable: {e}")
        if device == "cpu":
            print("✓ Using CPU")
        model_kwargs = {"device": device, "compute_type": compute_type}
        if device == "cpu":
            model_kwargs["cpu_threads"] = 4
        self.model = WhisperModel(model_name, **model_kwargs)
        self.language = language
        self.mic_buffer = np.array([], dtype=np.float32)
        self.monitor_buffer = np.array([], dtype=np.float32)
        self.lock = threading.Lock()
    def add_audio(self, source, audio_chunk):
        """Add audio to appropriate buffer"""
        with self.lock:
            audio_float = audio_chunk.flatten().astype(np.float32) / 32768.0
            if source == 'mic':
                self.mic_buffer = np.concatenate([self.mic_buffer, audio_float])
            else:
                self.monitor_buffer = np.concatenate([self.monitor_buffer, audio_float])
    def transcribe_chunk(self, min_duration=3.0):
        """Transcribe accumulated audio"""
        with self.lock:
            mic_duration = len(self.mic_buffer) / 16000
            monitor_duration = len(self.monitor_buffer) / 16000
            results = {}
            # Transcribe microphone
            if mic_duration >= min_duration:
                mic_audio = self.mic_buffer.copy()
                self.mic_buffer = np.array([], dtype=np.float32)
                results['mic'] = self._transcribe(mic_audio)
            # Transcribe monitor
            if monitor_duration >= min_duration:
                monitor_audio = self.monitor_buffer.copy()
                self.monitor_buffer = np.array([], dtype=np.float32)
                results['monitor'] = self._transcribe(monitor_audio)
            return results if results else None
    def _transcribe(self, audio):
        """Internal transcription"""
        try:
            segments, _ = self.model.transcribe(
                audio,
                language=self.language,
                beam_size=3,  # Faster than default 5
                vad_filter=True,
                vad_parameters=dict(min_silence_duration_ms=500)
            )
            text = " ".join([seg.text for seg in segments]).strip()
            return text if text else None
        except Exception as e:
            print(f"❌ Transcription error: {e}")
            return None
 class LLMFactChecker:
    """Fast fact-checking with Ollama"""
    def __init__(self, model="qwen2.5:3b"):
        if not OLLAMA_AVAILABLE:
            raise RuntimeError("Ollama not installed: pip install ollama")
        self.model = model
        try:
            ollama.list()
            print(f"✓ Ollama connected: {self.model}")
        except Exception as e:
            raise RuntimeError(f"Ollama not running: {e}")
    def fact_check(self, text):
        """Quick fact-check"""
        prompt = f"""Fact-check this statement. Reply ONLY with:
 VERDICT: factual/dubious/false
 CONFIDENCE: 0.0-1.0
 REASON: one sentence
 Statement: "{text}" """
        try:
            response = ollama.generate(
                model=self.model,
                prompt=prompt,
                options={"temperature": 0.1, "num_predict": 80}
            )
            import re
            text = response['response']
            verdict = re.search(r'VERDICT:\s*(\w+)', text, re.I)
            confidence = re.search(r'CONFIDENCE:\s*([\d.]+)', text, re.I)
            reason = re.search(r'REASON:\s*(.+?)(?:\n|$)', text, re.I | re.DOTALL)
            return {
                'verdict': verdict.group(1).lower() if verdict else 'unknown',
                'confidence': float(confidence.group(1)) if confidence else 0.5,
                'reason': reason.group(1).strip() if reason else text[:150]
            }
        except Exception as e:
            return {'verdict': 'error', 'confidence': 0.0, 'reason': str(e)}
 def main():
    parser = argparse.ArgumentParser(description="Dual audio transcription with fact-checking")
    parser.add_argument("--model", default="tiny", choices=["tiny", "base", "small", "medium"],
                        help="Whisper model (default: tiny for speed)")
    parser.add_argument("--language", default="en", help="Language code")
    parser.add_argument("--mic", help="Microphone device name (partial match)")
    parser.add_argument("--monitor", help="Monitor device name for speaker capture")
    parser.add_argument("--interval", type=float, default=5.0, help="Processing interval (seconds)")
    parser.add_argument("--min-duration", type=float, default=2.0, help="Min audio duration")
    parser.add_argument("--enable-llm", action="store_true", help="Enable fact-checking")
    parser.add_argument("--llm-model", default="qwen2.5:3b", help="Ollama model")
    parser.add_argument("--list-devices", action="store_true", help="List audio devices")
    parser.add_argument("--force-cpu", action="store_true", help="Force CPU")
    args = parser.parse_args()
    if args.list_devices:
        print("\nAvailable audio devices:")
        for i, dev in enumerate(sd.query_devices()):
            in_ch = dev['max_input_channels']
            out_ch = dev['max_output_channels']
            if in_ch > 0:
                print(f"  [{i:2d}] {dev['name']:<50} IN:{in_ch} OUT:{out_ch}")
        return
    print("=== Dual Audio Transcription with Fact-Checking ===")
    print(f"Model: {args.model} | Language: {args.language} | Interval: {args.interval}s")
    # Initialize capture
    try:
        capturer = DualAudioCapture(
            mic_device=args.mic,
            monitor_device=args.monitor,
            sample_rate=16000,
            chunk_size=2048
        )
    except Exception as e:
        print(f"\n❌ Audio Error: {e}")
        print("\nTip: Use --list-devices to see available devices")
        print("     Use --mic and --monitor to specify devices")
        return
    # Initialize transcriber
    try:
        transcriber = WhisperTranscriber(
            model_name=args.model,
            language=args.language,
            force_cpu=args.force_cpu
        )
    except Exception as e:
        print(f"\n❌ Whisper Error: {e}")
        return
    # Initialize fact checker
    fact_checker = None
    if args.enable_llm:
        try:
            fact_checker = LLMFactChecker(model=args.llm_model)
        except Exception as e:
            print(f"\n⚠ LLM Error: {e}")
            print("Continuing without fact-checking...")
    # Main loop
    print(f"\n✅ Started. Press Ctrl+C to stop.\n{'='*60}")
    last_process = time.time()
    try:
        while True:
            # Collect audio
            chunk = capturer.read_chunk()
            if chunk:
                source, audio = chunk
                transcriber.add_audio(source, audio)
            # Process at intervals
            if time.time() - last_process >= args.interval:
                results = transcriber.transcribe_chunk(min_duration=args.min_duration)
                if results:
                    timestamp = datetime.now().strftime("%H:%M:%S")
                    for source, text in results.items():
                        if text:
                            source_emoji = "🎤" if source == 'mic' else "🔊"
                            print(f"\n{source_emoji} [{timestamp}] {text}")
                            if fact_checker:
                                fc = fact_checker.fact_check(text)
                                verdict_emoji = {'factual': '✅', 'dubious': '⚠️', 'false': '❌'}.get(fc['verdict'], '❓')
                                print(f"   {verdict_emoji} {fc['verdict'].upper()} ({fc['confidence']:.2f}): {fc['reason']}")
                last_process = time.time()
    except KeyboardInterrupt:
        print(f"\n{'='*60}\n🛑 Stopping...")
    finally:
        # Always close the capturer, even if the loop exits on an unexpected error
        capturer.close()
    print("\n✅ Done!")
 if __name__ == "__main__":
    main()
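Review note: the three-field reply format that `LLMFactChecker.fact_check` asks for can be exercised without a running Ollama server. The block below is a minimal sketch, not code from this commit. `parse_fact_check` is a hypothetical helper name, and clamping the confidence to the 0.0-1.0 range is an added assumption about what callers would want.

```python
import re


def parse_fact_check(raw: str) -> dict:
    """Parse a 'VERDICT / CONFIDENCE / REASON' reply into a dict.

    Hypothetical helper for illustration; not part of the commit.
    """
    verdict = re.search(r"VERDICT:\s*(\w+)", raw, re.I)
    confidence = re.search(r"CONFIDENCE:\s*(\d*\.?\d+)", raw, re.I)
    reason = re.search(r"REASON:\s*(.+)", raw, re.I)

    # Fall back to a neutral score when the model omits the field
    score = float(confidence.group(1)) if confidence else 0.5
    return {
        "verdict": verdict.group(1).lower() if verdict else "unknown",
        # Assumption: out-of-range scores are clamped rather than rejected
        "confidence": min(max(score, 0.0), 1.0),
        "reason": reason.group(1).strip() if reason else raw[:150],
    }


sample = "VERDICT: Dubious\nCONFIDENCE: 0.7\nREASON: No source is given.\n"
print(parse_fact_check(sample))
```

Keeping the parsing in a pure function like this would let the regexes be unit-tested separately from the `ollama.generate` call.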
--- a/transcribe_speakers.py
+++ b/transcribe_speakers.py
@@ -79,17 +79,28 @@ class WindowsLoopbackAudioCapture:
                        dev['max_input_channels'] > 0):
                    return dev
-        # Auto-detect: look for WASAPI speakers/headphones
+        # Auto-detect: look for WASAPI speakers/headphones (Windows)
        for dev in devices:
            if (dev['max_input_channels'] > 0 and
                    any(x in dev['name'] for x in ['Speakers', 'Headphones', 'Output'])):
                return dev
-        # Fallback: Stereo Mix or similar
+        # Fallback: Stereo Mix or similar (Windows)
        for dev in devices:
            if 'Stereo Mix' in dev['name']:
                return dev
        # Linux fallback: use default input device (pipewire/pulse)
        try:
            default_input_idx = sd.default.device[0]  # Default input device
            if default_input_idx is not None:
                dev = devices[default_input_idx]
                if dev['max_input_channels'] > 0:
                    print("⚠️  Note: Using default input device (microphone). For speaker capture on Linux, use transcribe_dual_linux.py")
                    return dev
        except Exception:  # device lookup failed; fall through and return None
            pass
        return None
    def _audio_callback(self, indata, frames, time_info, status):
@@ -511,8 +522,8 @@ Examples:
                        help="GPU device index to use (default: 0)")
    parser.add_argument("--enable-llm", action="store_true",
                        help="Enable LLM analysis (fact-checking and questions)")
-    parser.add_argument("--llm-model", default="gpt-oss:20b",
+    parser.add_argument("--llm-model", default="llama3.2:latest",
-                        help="Ollama model to use for LLM analysis (default:  gpt-oss:20b)")
+                        help="Ollama model to use for LLM analysis (default: llama3.2:latest)")
    parser.add_argument("--llm-debug", action="store_true",
                        help="Show LLM raw responses for debugging")
    parser.add_argument("--sentence-mode", action="store_true",
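Review note: the fallback order visible in the first `transcribe_speakers.py` hunk (output-style device names, then Stereo Mix, then the default input) can be checked on plain dictionaries without `sounddevice`. This is a minimal sketch under stated assumptions. `pick_device` and `OUTPUT_HINTS` are hypothetical names, the explicit-name lookup that precedes the hunk is omitted because only its tail is shown, and the index bounds check is an addition that the diff does not contain.

```python
from typing import Optional

# Name fragments taken from the hunk's auto-detect step
OUTPUT_HINTS = ("Speakers", "Headphones", "Output")


def pick_device(devices: list, default_input_idx: Optional[int]) -> Optional[dict]:
    """Return the first device matching the fallback order shown in the hunk."""
    # 1. Output-style devices that also expose input channels (Windows)
    for dev in devices:
        if dev["max_input_channels"] > 0 and any(h in dev["name"] for h in OUTPUT_HINTS):
            return dev
    # 2. Stereo Mix or similar (Windows)
    for dev in devices:
        if "Stereo Mix" in dev["name"]:
            return dev
    # 3. Default input device; the bounds check is an assumption, not in the diff
    if default_input_idx is not None and 0 <= default_input_idx < len(devices):
        dev = devices[default_input_idx]
        if dev["max_input_channels"] > 0:
            return dev
    return None


linux_devices = [
    {"name": "HDA Intel PCH", "max_input_channels": 0},
    {"name": "pipewire", "max_input_channels": 2},
]
print(pick_device(linux_devices, default_input_idx=1))
```

On the Linux-style list above, neither Windows step matches, so the function falls through to the default input, which is the case the new warning message in the hunk describes.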
Author	SHA1	Message	Date
mike	03b018a582	feat: update project name and cleanup startup script - Rename project from "Verbatim Dicta1" to "Verbatim Dicta2" - Remove redundant comments in `start_transcribe.sh` - Streamline virtual environment activation process	2025-12-17 23:09:56 +01:00
mike	7cc14326fc	Update project title to reflect current scope - Fix typo in project name from "Verbatim Dicta" to "Verbatim Dicta1" - Ensure consistency with other project references	2025-12-17 22:53:35 +01:00
mike	4343b7a5a2	chore: update 6 file(s)	2025-12-17 22:30:41 +01:00
mike	a53c0e2902	init	2025-12-17 22:11:08 +01:00