puzzle-generator/vocab/README.md
2025-12-19 14:02:07 +01:00

![GitHub last commit](https://img.shields.io/github/last-commit/opentaal/opentaal-wordlist)
![GitHub commit activity](https://img.shields.io/github/commit-activity/y/opentaal/opentaal-wordlist)
![GitHub Repo stars](https://img.shields.io/github/stars/opentaal/opentaal-wordlist)
![GitHub watchers](https://img.shields.io/github/watchers/opentaal/opentaal-wordlist)
![GitHub Sponsors](https://img.shields.io/github/sponsors/opentaal)
![Liberapay patrons](https://img.shields.io/liberapay/patrons/opentaal)
# Dutch Word List
Last updated: 2023-03-10
This repository contains the official OpenTaal Dutch word list, comprising over 400,000 words compiled from contributions and curated sources. The list is provided in UTF-8 encoding and is alphabetically sorted.
## Contents
### Primary File
- **`wordlist.txt`**: Complete UTF-8 word list (one word per line).
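
As a minimal sketch of consuming a one-word-per-line UTF-8 file such as `wordlist.txt`, the Python snippet below reads it into a set for fast membership checks. The function name `load_words` is hypothetical and not part of this repository, and skipping blank lines is an assumption about how stray empty lines should be handled.

```python
from pathlib import Path


def load_words(path):
    """Read a one-word-per-line UTF-8 file into a set.

    Hypothetical helper for illustration; blank lines are skipped.
    """
    text = Path(path).read_text(encoding="utf-8")
    return {line.strip() for line in text.splitlines() if line.strip()}
```

Assuming the file sits in the current directory, `words = load_words("wordlist.txt")` followed by `"fiets" in words` would test whether a word is present. A set is used here because lookups do not depend on the file's sort order.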
### Metadata
- **`datetimeversion.txt`**: Timestamp and version information.
### Component Files
- **`elements/basiswoorden-gekeurd.txt`**: Approved base words (~200k entries).
- **`elements/basiswoorden-ongekeurd.txt`**: Unapproved base words, including proper nouns and compounds (~41k entries).
- **`elements/flexies-ongekeurd.txt`**: Unapproved inflections (~170k entries).
- **`elements/wordparts.tsv`**: Word parts containing spaces (TSV format).
- **`elements/corrections.tsv`**: Common misspellings with corrections (TSV format).
- **`elements/romeinse-cijfers.txt`**: Roman numerals (~4k entries).
- **`elements/wordlist-ascii.txt`**: ASCII-only subset (excludes accented characters).
- **`elements/wordlist-non-ascii.txt`**: Entries containing non-ASCII characters.
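
The distinction between the ASCII-only subset and the entries containing non-ASCII characters can be illustrated with a short Python sketch. This is not necessarily how the files in `elements/` are generated; `split_by_ascii` is a hypothetical name, and the sketch only assumes that "ASCII-only" means every character of the entry is in the ASCII range.

```python
def split_by_ascii(words):
    """Partition words into (ascii_only, non_ascii) lists, preserving order.

    Hypothetical helper: a word counts as ASCII-only when every
    character is in the ASCII range (str.isascii, Python 3.7+).
    """
    ascii_only, non_ascii = [], []
    for word in words:
        if word.isascii():
            ascii_only.append(word)
        else:
            non_ascii.append(word)
    return ascii_only, non_ascii
```

For example, `split_by_ascii(["appel", "café"])` places `appel` in the first list and `café` in the second.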
## Character Set
The word list includes standard Latin letters (a-z, A-Z), Dutch diacritics (e.g., `é`, `ë`, `ï`), superscript/subscript digits (e.g., `²`, `³`), and the following punctuation: `' . - / + & @ ?`.
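
A consumer that wants to flag characters it does not expect could start from a sketch like the one below. It is deliberately incomplete: the diacritics and superscript digits are only the examples named above, not the full set used by the list, so `ALLOWED` would need extending before real use. All names here are hypothetical.

```python
import string

# Assumption: only the example characters named in this README are included.
# The actual word list may contain further diacritics and super/subscript digits.
EXAMPLE_DIACRITICS = "éëï"
EXAMPLE_SUPERSCRIPTS = "²³"
PUNCTUATION = "'.-/+&@?"

ALLOWED = set(
    string.ascii_letters + EXAMPLE_DIACRITICS + EXAMPLE_SUPERSCRIPTS + PUNCTUATION
)


def unexpected_chars(word, allowed=ALLOWED):
    """Return the sorted characters of `word` that are not in `allowed`."""
    return sorted(set(word) - allowed)
```

Running `unexpected_chars` over every entry and collecting the non-empty results gives a quick inventory of characters that still need to be added to the allowed set.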