A Preprocessing Technique for Chinese Word Segmentation
First published: 2010-01-05
Abstract: This paper first analyzes the dictionary-based maximum matching algorithm for word segmentation and points out its shortcoming. To address that shortcoming, it proposes a preprocessing technique based on high-frequency words: exploiting the characteristics of high-frequency words, it splits a sentence into as many segments as possible in very few steps and then applies maximum matching to each segment. Experimental data show that the technique improves the efficiency of Chinese word segmentation.
Keywords: high-frequency words; preprocessing; Chinese word segmentation
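
The two-stage idea in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the paper's actual splitting rules, so everything here is an assumption made for illustration: the helper names (`split_on_high_freq`, `forward_max_match`, `segment`), the choice of forward (rather than backward) maximum matching, the toy vocabulary, and the rule of cutting at every single-character high-frequency word. A naive cut like this would wrongly break a dictionary word that contains a high-frequency character (for example 目的), which a real implementation would have to guard against.

```python
def forward_max_match(text, vocab, max_len):
    """Forward maximum matching: at each position take the longest vocab word.

    Falls back to a single character when no vocab word matches.
    """
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                words.append(piece)
                i += size
                break
    return words


def split_on_high_freq(sentence, high_freq_chars):
    """Cut the sentence at every high-frequency character (assumed rule).

    Each high-frequency character becomes its own segment; the text
    between two cuts becomes one segment for later matching.
    """
    segments, start = [], 0
    for i, ch in enumerate(sentence):
        if ch in high_freq_chars:
            if start < i:
                segments.append(sentence[start:i])
            segments.append(ch)
            start = i + 1
    if start < len(sentence):
        segments.append(sentence[start:])
    return segments


def segment(sentence, vocab, high_freq_chars):
    """Preprocess with high-frequency cuts, then match each segment."""
    max_len = max(map(len, vocab), default=1)
    result = []
    for seg in split_on_high_freq(sentence, high_freq_chars):
        if seg in high_freq_chars:
            result.append(seg)  # already a word, no dictionary lookup needed
        else:
            result.extend(forward_max_match(seg, vocab, max_len))
    return result


# Toy data, purely illustrative.
vocab = {"我们", "喜欢", "中文", "分词", "技术"}
high_freq = {"的"}
print(segment("我们喜欢中文分词的技术", vocab, high_freq))
# → ['我们', '喜欢', '中文', '分词', '的', '技术']
```

In this sketch the potential saving comes from the high-frequency cuts: each cut is a single set lookup, and the segments handed to maximum matching are shorter than the full sentence, so fewer long candidate strings are tested against the dictionary.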