Sub-word Units
Many studies in Natural Language Processing (NLP) and Machine Translation (MT) have demonstrated the benefit of using sub-word units instead of word-level units.
The sub-word units we have seen include:
Character
Morpheme
Short unit word / long unit word (especially for Japanese, in some specific corpora)
Byte Pair Encoding (BPE) (Sennrich et al., 2016)
Stroke/Radical (especially for Chinese characters) (???, 2017/8)
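To make the BPE entry in the list above more concrete, here is a minimal sketch of how merge rules might be learned from a word list. It is an illustration under stated assumptions, not the reference implementation: the function name `learn_bpe`, the `</w>` end-of-word marker, and the first-seen tie-breaking rule are choices made for this example only.

```python
from collections import Counter


def learn_bpe(words, num_merges):
    """Learn up to `num_merges` BPE merge rules from a list of words.

    Hypothetical helper for illustration; returns (merges, vocab).
    """
    # Each word becomes a tuple of characters plus an end-of-word marker.
    vocab = Counter(tuple(w) + ("</w>",) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break  # nothing left to merge
        # Most frequent pair; ties go to the pair seen first (an assumption).
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Rewrite every word, fusing each occurrence of the best pair.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            new_vocab[tuple(out)] += freq
        vocab = new_vocab
    return merges, vocab
```

Starting from single characters means any unseen word can still be segmented, while frequent character sequences gradually become single units as more merges are learned.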
Sub-word units work well in the following cases: