![GitHub last commit](https://img.shields.io/github/last-commit/opentaal/opentaal-wordlist) ![GitHub commit activity](https://img.shields.io/github/commit-activity/y/opentaal/opentaal-wordlist) ![GitHub Repo stars](https://img.shields.io/github/stars/opentaal/opentaal-wordlist) ![GitHub watchers](https://img.shields.io/github/watchers/opentaal/opentaal-wordlist) ![GitHub Sponsors](https://img.shields.io/github/sponsors/opentaal) ![Liberapay patrons](https://img.shields.io/liberapay/patrons/opentaal)

# Dutch Word List

Last updated: 2023-03-10

This repository contains the official OpenTaal Dutch word list, comprising over 400,000 words compiled from contributions and curated sources. The list is provided in UTF-8 encoding and is alphabetically sorted.

## Contents

### Primary File

- **`wordlist.txt`** – Complete UTF-8 word list (one word per line).

### Metadata

- **`datetimeversion.txt`** – Timestamp and version information.

### Component Files

- **`elements/basiswoorden-gekeurd.txt`** – Approved base words (~200k entries).
- **`elements/basiswoorden-ongekeurd.txt`** – Unapproved base words, including proper nouns and compounds (~41k entries).
- **`elements/flexies-ongekeurd.txt`** – Unapproved inflections (~170k entries).
- **`elements/wordparts.tsv`** – Word parts containing spaces (TSV format).
- **`elements/corrections.tsv`** – Common misspellings with corrections (TSV format).
- **`elements/romeinse-cijfers.txt`** – Roman numerals (~4k entries).
- **`elements/wordlist-ascii.txt`** – ASCII-only subset (excludes accented characters).
- **`elements/wordlist-non-ascii.txt`** – Entries containing non-ASCII characters.

## Character Set

Includes standard Latin letters (a–z, A–Z), Dutch diacritics (e.g., `é`, `ë`, `ï`), superscript/subscript digits (e.g., `²`, `³`), and punctuation: `' . - / + & @ ?`.
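
## Example

The following is a minimal Python sketch, not part of the repository, showing one way to read `wordlist.txt` and separate ASCII-only entries from entries containing non-ASCII characters. The function names `load_words` and `split_by_ascii` are hypothetical, and the split relies on Python's `str.isascii()`; this is an assumption and may not match the exact rule used to produce `elements/wordlist-ascii.txt` and `elements/wordlist-non-ascii.txt`.

```python
def load_words(path):
    """Read a UTF-8 file with one word per line, skipping blank lines."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def split_by_ascii(words):
    """Partition words into (ascii_only, non_ascii), preserving input order.

    Assumption: "ASCII-only" means every character is in the ASCII range,
    so accented letters and superscript digits count as non-ASCII.
    """
    ascii_only = [word for word in words if word.isascii()]
    non_ascii = [word for word in words if not word.isascii()]
    return ascii_only, non_ascii
```

Assuming the script runs from the repository root, `split_by_ascii(load_words("wordlist.txt"))` would return the two partitions as lists. Reading the file with an explicit `encoding="utf-8"` avoids depending on the platform's default encoding.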