initial commit

mike
2025-12-19 15:45:53 +01:00
parent bd6a5a1d2e
commit b0244ba51d
11 changed files with 1659 additions and 922 deletions

README.md Normal file

@@ -0,0 +1,254 @@
# Swedish-Style Crossword Puzzle Generator
A high-performance Java-based puzzle generator with theme-based word filtering and daily automated generation.
## Features
- **Swedish-style crossword puzzles** with arrow clues
- **Theme-based word filtering** using a semantic similarity graph
- **Daily automated generation** via Docker + cron
- **JSON export format** compatible with web frontends
- **Genetic algorithm** for optimal grid layouts
- **Constraint satisfaction** for word placement
## Architecture
### Components
1. **SwedishGenerator.java** - Core puzzle generation engine
   - Genetic algorithm for mask generation
   - CSP solver for word filling
   - Optimized for Dutch word lists
2. **ThemeGraph.java** - Theme-based word scoring system
   - Predefined theme keywords (news, tech, sports, etc.)
   - Edit distance similarity matching
   - Automatic theme detection
3. **DailyGenerator.java** - Daily puzzle automation
   - Generates themed puzzles
   - JSON output with metadata
   - Index file generation
4. **ExportFormat.java** - Export to standard format
   - Grid cropping and optimization
   - Arrow cell calculation
   - Compatible with existing frontends
## Usage
### Local Development
```bash
# Compile
./compile.sh
# Run Main (interactive)
java -cp ~/dev/.target puzzle.Main --seed 42 --pop 18 --gens 100
# Generate daily puzzles
java -cp ~/dev/.target puzzle.DailyGenerator
```
### Docker Deployment
```bash
# Build image
docker build -t puzzle-generator .
# Run with docker-compose
docker-compose up -d puzzle_gen_java
# View logs
docker logs -f puzzle_gen_java
```
### Environment Variables
| Variable | Default | Description |
|----------------------|-------------------|----------------------------------------|
| `OUT_DIR` | `/data/puzzles` | Output directory for generated puzzles |
| `PUZZLES_PER_DAY` | `3` | Number of puzzles to generate daily |
| `WORDS_PATH` | `./word-list.txt` | Path to word list file |
| `THEME_FILTER` | `true` | Enable theme-based word filtering |
| `THEME_MIN_SCORE` | `0.6` | Minimum theme score (0.0-1.0) |
| `LM_STUDIO_BASE_URL` | - | LM Studio URL (future feature) |
| `GENERATE_ON_START` | `false` | Generate puzzles on container startup |
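
As an illustration of how the table above could be consumed, here is a minimal sketch that reads these variables and falls back to the documented defaults. The `GeneratorConfig` record and its `fromEnv` method are hypothetical names, not part of the project, and `LM_STUDIO_BASE_URL` is left out because it has no default. Records require Java 16 or newer.

```java
import java.util.Map;

/** Hypothetical settings holder; the real generator may read its configuration differently. */
record GeneratorConfig(String outDir, int puzzlesPerDay, String wordsPath,
                       boolean themeFilter, double themeMinScore, boolean generateOnStart) {

    /** Builds a config from an environment map, using the defaults from the table when a key is absent. */
    static GeneratorConfig fromEnv(Map<String, String> env) {
        return new GeneratorConfig(
            env.getOrDefault("OUT_DIR", "/data/puzzles"),
            Integer.parseInt(env.getOrDefault("PUZZLES_PER_DAY", "3")),
            env.getOrDefault("WORDS_PATH", "./word-list.txt"),
            Boolean.parseBoolean(env.getOrDefault("THEME_FILTER", "true")),
            Double.parseDouble(env.getOrDefault("THEME_MIN_SCORE", "0.6")),
            Boolean.parseBoolean(env.getOrDefault("GENERATE_ON_START", "false")));
    }

    public static void main(String[] args) {
        // Print the effective configuration for the current process environment.
        System.out.println(fromEnv(System.getenv()));
    }
}
```

Passing the environment in as a `Map` rather than calling `System.getenv()` inside the factory keeps the parsing easy to test.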
## Theme System
### Supported Themes
- `algemeen` - General/common words
- `nieuws` - News/politics
- `technologie` - Technology
- `sport` - Sports
- `weer` - Weather/nature
- `economie` - Economy
- `gezondheid` - Health
### Theme Filtering
Words are scored against themes using:
1. **Direct matching** - Word is in theme keyword list (score: 1.0)
2. **Substring matching** - Partial word overlap (score: 0.7)
3. **Edit distance** - Fuzzy matching for variations (score: 0.8-0.9)
Example:
```java
ThemeGraph.filterByTheme(words, "technologie", 0.6);
// Returns: COMPUTER, INTERNET, SOFTWARE, DATA, etc.
```
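
The three scoring tiers can be sketched roughly as follows. This is not the actual `ThemeGraph` code: the class name `ThemeScoreSketch`, the method `filterByKeywords`, and the mapping of edit distance 1 to 0.9 and distance 2 to 0.8 are assumptions chosen to fit the score ranges listed above. It also takes a keyword set directly instead of a theme name. `Stream.toList()` requires Java 16 or newer.

```java
import java.util.List;
import java.util.Set;

/** Illustrative only; the real ThemeGraph may combine the tiers differently. */
final class ThemeScoreSketch {

    /** Levenshtein distance computed with two rolling rows. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.length()];
    }

    /** Best score of a word against a theme's keywords; 0.0 means no match. */
    static double score(String word, Set<String> keywords) {
        if (keywords.contains(word)) return 1.0;                 // direct match
        double best = 0.0;
        for (String k : keywords) {
            int d = editDistance(word, k);
            if (d == 1) best = Math.max(best, 0.9);              // assumed: one edit away
            else if (d == 2) best = Math.max(best, 0.8);         // assumed: two edits away
            else if (word.contains(k) || k.contains(word)) best = Math.max(best, 0.7); // substring
        }
        return best;
    }

    /** Keeps the words whose score reaches minScore (compare THEME_MIN_SCORE). */
    static List<String> filterByKeywords(List<String> words, Set<String> keywords, double minScore) {
        return words.stream().filter(w -> score(w, keywords) >= minScore).toList();
    }
}
```

With the keywords `COMPUTER`, `INTERNET`, `SOFTWARE`, and `DATA`, this sketch scores `COMPUTERS` at 0.9 and `DATABASE` at 0.7, so both pass a threshold of 0.6. Very short words sit within two edits of many keywords, so a real implementation would likely need a minimum-length guard on the fuzzy tier.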
## Output Format
### Puzzle JSON
```json
{
  "date": "2025-12-19",
  "theme": "technologie",
  "difficulty": 1,
  "rewards": {
    "coins": 50,
    "stars": 2,
    "hints": 1
  },
  "gridv2": [
    "###COMPUTER###",
    "#I#O#E#E#O#"
  ],
  "words": [
    {
      "word": "COMPUTER",
      "clue": "COMPUTER",
      "startRow": 0,
      "startCol": 3,
      "direction": "horizontal",
      "answer": "COMPUTER",
      "arrowRow": 0,
      "arrowCol": 2
    }
  ]
}
```
### Index JSON
```json
{
  "date": "2025-12-19",
  "files": [
    "crossword_2025-12-19_01_technologie.json",
    "crossword_2025-12-19_02_sport.json",
    "crossword_2025-12-19_03_nieuws.json"
  ]
}
```
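
The file names in the index follow a visible pattern: date, two-digit sequence number, theme. The sketch below reproduces that pattern and assembles the index document by hand. `IndexSketch` and its methods are hypothetical, and the string concatenation assumes theme names contain no characters that need JSON escaping.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

/** Illustrative only; DailyGenerator may build its index differently. */
final class IndexSketch {

    /** Example: crossword_2025-12-19_01_technologie.json */
    static String fileName(LocalDate date, int sequence, String theme) {
        // LocalDate.toString() yields ISO-8601 (yyyy-MM-dd), matching the example above.
        return String.format(Locale.ROOT, "crossword_%s_%02d_%s.json", date, sequence, theme);
    }

    /** Builds the index JSON for one day, numbering the themes from 1 in list order. */
    static String indexJson(LocalDate date, List<String> themes) {
        List<String> files = new ArrayList<>();
        for (int i = 0; i < themes.size(); i++) {
            files.add(fileName(date, i + 1, themes.get(i)));
        }
        String fileLines = files.stream()
            .map(f -> "    \"" + f + "\"")
            .collect(Collectors.joining(",\n"));
        return "{\n  \"date\": \"" + date + "\",\n  \"files\": [\n" + fileLines + "\n  ]\n}";
    }
}
```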
## Scheduling
Puzzles are generated daily at **3:15 AM** by default.
Edit `crontab` to change the schedule:
```cron
# Daily at 3:15 AM
15 3 * * * java -cp /app/target puzzle.DailyGenerator
# Every 6 hours
0 */6 * * * java -cp /app/target puzzle.DailyGenerator
# Weekly on Monday at 1 AM
0 1 * * 1 java -cp /app/target puzzle.DailyGenerator
```
## Word List Format
Plain text file, one word per line, uppercase A-Z only, 2-8 characters:
```
EU
UUR
AUTO
BOOM
COMPUTER
INTERNET
...
```
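
A word list can be checked against this format before it is handed to the generator. The following is a minimal sketch; `WordListSketch` is a hypothetical helper, and silently dropping invalid or duplicate lines (rather than failing) is an assumption about the desired behavior.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.regex.Pattern;

/** Illustrative only; not the loader used by SwedishGenerator. */
final class WordListSketch {

    // Uppercase A-Z only, 2 to 8 characters, as described above.
    private static final Pattern VALID = Pattern.compile("[A-Z]{2,8}");

    static boolean isValid(String line) {
        return VALID.matcher(line).matches();
    }

    /** Strips surrounding whitespace, then drops invalid lines and duplicates, preserving order. */
    static List<String> clean(List<String> lines) {
        return lines.stream()
            .map(String::strip)
            .filter(WordListSketch::isValid)
            .distinct()
            .toList();
    }

    /** Reads a word list file (UTF-8) and returns only the usable entries. */
    static List<String> load(Path path) throws IOException {
        return clean(Files.readAllLines(path));
    }
}
```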
## Performance
- **Mask generation**: ~2-5 seconds (genetic algorithm)
- **Word filling**: ~5-30 seconds (CSP solver with MRV heuristic)
- **Total per puzzle**: ~10-40 seconds
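
The MRV (minimum remaining values) heuristic mentioned above picks, at each step, the unfilled slot with the fewest candidate words still available, so dead ends surface early. A minimal sketch of that selection step follows. `MrvSketch` and the slot naming are hypothetical, and the real solver presumably tracks candidates in a more compact form.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Illustrative only; shows the slot-selection rule, not the full CSP solver. */
final class MrvSketch {

    /**
     * Returns the slot whose candidate list is smallest, or empty when no slots remain.
     * Ties go to whichever slot the map iterates first.
     */
    static Optional<String> nextSlot(Map<String, List<String>> domains) {
        String best = null;
        int bestSize = Integer.MAX_VALUE;
        for (Map.Entry<String, List<String>> e : domains.entrySet()) {
            int size = e.getValue().size();
            if (size < bestSize) {
                best = e.getKey();
                bestSize = size;
            }
        }
        return Optional.ofNullable(best);
    }
}
```

A slot with zero candidates is returned first, which lets a backtracking search fail fast on that branch.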
Optimizations:
- Positional indexing for fast candidate lookup
- Sorted intersection for constraint checking
- No large array allocations during search
- Progress bar with real-time stats
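
To make the first two optimizations concrete, here is a rough sketch of a positional index combined with a two-pointer intersection of sorted id lists. It is an assumption about how such an index might look, not the code in `SwedishGenerator.java`: the class name, the `length:position:letter` key, and the `.` wildcard are all hypothetical. Unlike the real solver, it allocates lists freely for clarity.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only; maps (length, position, letter) to ascending word ids. */
final class PositionalIndexSketch {
    private final List<String> words;
    private final Map<String, List<Integer>> index = new HashMap<>();

    PositionalIndexSketch(List<String> words) {
        this.words = List.copyOf(words);
        // Ids are added in increasing order, so every bucket is already sorted.
        for (int id = 0; id < this.words.size(); id++) {
            String w = this.words.get(id);
            for (int pos = 0; pos < w.length(); pos++) {
                index.computeIfAbsent(key(w.length(), pos, w.charAt(pos)), k -> new ArrayList<>()).add(id);
            }
        }
    }

    private static String key(int length, int pos, char letter) {
        return length + ":" + pos + ":" + letter;
    }

    /** Two-pointer intersection of two ascending lists. */
    static List<Integer> intersectSorted(List<Integer> a, List<Integer> b) {
        List<Integer> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.size() && j < b.size()) {
            int x = a.get(i), y = b.get(j);
            if (x == y) { out.add(x); i++; j++; }
            else if (x < y) i++;
            else j++;
        }
        return out;
    }

    /** Words matching a pattern such as "BOO.", where '.' marks an unfilled cell. */
    List<String> candidates(String pattern) {
        List<Integer> ids = null;
        for (int pos = 0; pos < pattern.length(); pos++) {
            char c = pattern.charAt(pos);
            if (c == '.') continue;
            List<Integer> bucket = index.getOrDefault(key(pattern.length(), pos, c), List.of());
            ids = (ids == null) ? bucket : intersectSorted(ids, bucket);
            if (ids.isEmpty()) return List.of();
        }
        List<String> out = new ArrayList<>();
        if (ids == null) {
            // No fixed letters: every word of the right length is a candidate.
            for (String w : words) if (w.length() == pattern.length()) out.add(w);
        } else {
            for (int id : ids) out.add(words.get(id));
        }
        return out;
    }
}
```

Because each bucket is sorted, intersecting the constraints of two crossing letters costs time linear in the bucket sizes and needs no hashing during the search.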
## Integration with LM Studio (Future)
The system is prepared for LM Studio integration to generate themed clues:
```bash
# Set LM_STUDIO_BASE_URL in docker-compose.yml, then start the stack
docker-compose up -d
# Once implemented, the container will query LM Studio for contextual clues based on themes
```
This will replace the current clues, which simply repeat the answer word, with semantic hints.
## Migration from Node.js
The Java version keeps a module-for-module correspondence with the Node.js generator:
| Node.js | Java |
|------------------------|-------------------------------------|
| `swedish_generator.js` | `SwedishGenerator.java` |
| `export_format.js` | `ExportFormat.java` |
| `main.js` | `Main.java` + `DailyGenerator.java` |
| N/A | `ThemeGraph.java` (new) |
## Volume Management
Puzzles are stored in a Docker volume outside the workspace:
```bash
# Default location
/var/lib/puzzle-data
# Custom location
export PUZZLE_OUTPUT_DIR=/path/to/puzzles
docker-compose up -d
# View puzzles
ls -lh /var/lib/puzzle-data/*.json
```
## Troubleshooting
### No puzzles generated
- Check word list has enough words (minimum 50)
- Lower `THEME_MIN_SCORE` if using theme filtering
- Increase `PUZZLES_PER_DAY` so that more generation attempts are made
### Container not starting
```bash
docker logs puzzle_gen_java
# Check for compilation errors or missing files
```
### Low quality puzzles
- Increase `--gens` parameter (more genetic iterations)
- Increase `--pop` parameter (larger population)
- Ensure word list has good variety of lengths 2-8
## License
MIT
## Authors
Original Node.js version + Java port with theme system