
Liu Jianyi (刘建毅)


Journal article

New word identification based on statistical classifier

LIU Jian-yi, WANG Jing-hua, WANG Cong

THE JOURNAL OF CHINA UNIVERSITIES OF POSTS AND TELECOMMUNICATIONS, Volume 13, Issue 3, September 2006


Abstract

New word identification is a difficult problem in Chinese word segmentation. When large Chinese texts are segmented automatically, new words cause segmentation errors. This paper formulates new word identification as a binary classification problem (deciding whether a character sequence in a given context is a new word or not) and applies two statistical learning approaches, one based on support vector machines (SVM) and one based on the C4.5 decision tree. We then investigate a set of linguistic and statistical features: the Independent Word Probability of the former character, the Independent Word Probability of the latter character, the front-position in-word probability of the former character, the back-position in-word probability of the latter character, mutual information, and frequency. In the PK closed test of the first SIGHAN (Special Interest Group for Chinese Language Processing) bakeoff, this approach achieves high precision and recall.
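The abstract names six features but does not give their formulas. The sketch below computes one plausible reading of each for a two-character candidate. It assumes the following: the Independent Word Probability (IWP) of a character is the fraction of its occurrences in a segmented corpus where it stands alone as a one-character word; the front-/back-position in-word probabilities are the fractions of its occurrences at the start/end of a multi-character word; mutual information is pointwise mutual information over adjacent characters in raw text; and frequency is the raw bigram count. The class name `NewWordFeatures` and these exact definitions are hypothetical illustrations, not the authors' code.

```python
import math
from collections import Counter


class NewWordFeatures:
    """Hypothetical feature extractor for two-character new-word candidates.

    Feature definitions are assumptions, not taken from the paper.
    segmented: training corpus as a list of sentences, each a list of words.
    raw_text:  unsegmented text in which candidates are counted.
    """

    def __init__(self, segmented, raw_text):
        self.total = Counter()  # character occurrences in the segmented corpus
        self.alone = Counter()  # occurrences as a one-character word
        self.first = Counter()  # occurrences as the first char of a multi-char word
        self.last = Counter()   # occurrences as the last char of a multi-char word
        for sentence in segmented:
            for word in sentence:
                self.total.update(word)  # counts each character of the word
                if len(word) == 1:
                    self.alone[word] += 1
                else:
                    self.first[word[0]] += 1
                    self.last[word[-1]] += 1
        # Character unigram and adjacent-bigram counts in the raw text.
        self.uni = Counter(raw_text)
        self.bi = Counter(raw_text[i:i + 2] for i in range(len(raw_text) - 1))
        self.n = len(raw_text)

    def _ratio(self, counts, ch):
        # Share of the character's occurrences that fall in a given position.
        return counts[ch] / self.total[ch] if self.total[ch] else 0.0

    def pmi(self, x, y):
        """Pointwise MI, approximated as log2(c(xy) * N / (c(x) * c(y))); 0.0 if unseen."""
        joint = self.bi[x + y]
        if joint == 0 or self.uni[x] == 0 or self.uni[y] == 0:
            return 0.0
        return math.log2(joint * self.n / (self.uni[x] * self.uni[y]))

    def vector(self, cand):
        """Six-feature vector for a two-character candidate string."""
        x, y = cand[0], cand[1]
        return [
            self._ratio(self.alone, x),  # IWP of former character
            self._ratio(self.alone, y),  # IWP of latter character
            self._ratio(self.first, x),  # front-position in-word prob. of former char
            self._ratio(self.last, y),   # back-position in-word prob. of latter char
            self.pmi(x, y),              # mutual information
            self.bi[cand],               # frequency of the candidate in raw text
        ]


segmented = [["我", "爱", "北京"], ["北京", "大学"], ["我", "去", "北"]]
extractor = NewWordFeatures(segmented, "北京北京大学")
print(extractor.vector("北京"))
```

Each candidate's vector, labeled "new word" or "not a new word", could then be passed to an SVM or a C4.5-style decision tree as the abstract describes. The paper's actual feature definitions, smoothing, and candidate lengths may differ from this sketch.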

Uploaded by Liu Jianyi (刘建毅) on 21 March 2008 at 16:48:52. Copyright belongs to the original author(s); the views expressed are the author's own.
