Nguyễn Thế Dương - Recognizing and Tagging Vietnamese Words based on Statistics and Word Order Patterns
Date: 21/08/2015
In Vietnamese sentences, function words and word order patterns (WOPs) determine the meaning and the grammatical class of words. We study the most frequent WOPs and extract candidates for new Vietnamese words (NVWs) using the phrase and word segmentation algorithm of [7]. The best WOPs for recognizing and tagging NVWs are selected by their support and confidence. The same two measures are used to test whether a word belongs to a word class.
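The abstract does not define support and confidence, so the following is only a minimal sketch that assumes the usual association-rule definitions. The support of a WOP is taken as its share of all pattern matches on words already in the dictionary. Its confidence is the fraction of those matches whose slot word has the target class. The function names, the `"marker _"` pattern notation, the thresholds, and the toy observations are all hypothetical and are not taken from the paper.

```python
from collections import Counter

def pattern_stats(observations, known_classes, target_class):
    """Return {pattern: (support, confidence)} for one target word class.

    observations  : iterable of (pattern, word) pairs, one per corpus match
    known_classes : dict mapping already-tagged words to their class label
    Assumed definitions (not from the paper):
      support    = matches of the pattern / all matches on known words
      confidence = matches whose word has target_class / matches of the pattern
    """
    matches = Counter()
    hits = Counter()
    for pattern, word in observations:
        cls = known_classes.get(word)
        if cls is None:
            continue  # untagged words give no evidence about the pattern
        matches[pattern] += 1
        if cls == target_class:
            hits[pattern] += 1
    total = sum(matches.values())
    return {p: (matches[p] / total, hits[p] / matches[p]) for p in matches}

def select_patterns(stats, min_support, min_confidence):
    """Keep the patterns that clear both (hypothetical) thresholds."""
    return sorted(
        p for p, (support, confidence) in stats.items()
        if support >= min_support and confidence >= min_confidence
    )

# Toy data for illustration only: "_" marks the slot filled by the word.
known = {"sách": "N", "nhà": "N", "chạy": "V", "đẹp": "A"}
observations = [
    ("những _", "sách"), ("những _", "nhà"), ("những _", "sách"),
    ("đang _", "chạy"), ("đang _", "sách"),
    ("rất _", "đẹp"),
]

noun_stats = pattern_stats(observations, known, target_class="N")
noun_patterns = select_patterns(noun_stats, min_support=0.3, min_confidence=0.9)
print(noun_patterns)
```

Under these assumptions, a candidate word that appears mostly in the slots of the selected noun patterns would be proposed as a noun. The same ratio, computed per word rather than per pattern, could serve as the word-level confidence.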
Our experiments were run on a large corpus of more than 50 million sentences. Four sets of WOPs are studied, for recognizing and tagging nouns, verbs, adjectives and pronouns. Our new dictionary contains 6,385 NVWs, including 2,791 new noun taggings, 1,436 new verb taggings, 682 new adjective taggings, and 1,476 new pronoun taggings.