唐杰


Journal paper

A Unified Tagging Approach to Text Normalization

Conghui Zhu, Jie Tang, Hang Li, Hwee Tou Ng, Tie-Jun Zhao


Abstract / Description

This paper addresses the issue of text normalization, an important yet often overlooked problem in natural language processing. By text normalization, we mean converting ‘informally inputted’ text into the canonical form, by eliminating ‘noises’ in the text and detecting paragraph and sentence boundaries in the text. Previously, text normalization issues were often undertaken in an ad-hoc fashion or studied separately. This paper first gives a formalization of the entire problem. It then proposes a unified tagging approach to perform the task using Conditional Random Fields (CRF). The paper shows that with the introduction of a small set of tags, most of the text normalization tasks can be performed within the approach. The accuracy of the proposed method is high, because the subtasks of normalization are interdependent and should be performed together. Experimental results on email data cleaning show that the proposed method significantly outperforms the approach of using cascaded models and that of employing independent models.
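To make the tagging idea concrete, here is a minimal sketch of the decoding step only: once every token carries a tag, a single pass over the tags produces the normalized text. The tag names (`KEEP`, `DEL`, `CAP`, `LOWER`, `JOIN`, `BREAK`) and the function `apply_tags` are hypothetical, invented for this illustration; they are not the tag set defined in the paper. The tags are assigned by hand here, whereas in the paper they would be predicted by a CRF model.

```python
# Hypothetical tag set, invented for illustration (not the paper's tags):
#   KEEP  - emit the token unchanged
#   DEL   - drop the token (noise, or a line break that belongs to noise)
#   CAP   - capitalize the first letter of the token
#   LOWER - lowercase the whole token
#   JOIN  - line break inside a sentence: drop it so the sentence continues
#   BREAK - line break that is a real boundary: start a new output line


def apply_tags(tokens, tags):
    """Apply one tag per token and return the normalized text."""
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")

    lines = [[]]  # each inner list holds the words of one output line
    for token, tag in zip(tokens, tags):
        if tag in ("DEL", "JOIN"):
            continue
        elif tag == "BREAK":
            lines.append([])
        elif tag == "CAP":
            lines[-1].append(token[:1].upper() + token[1:])
        elif tag == "LOWER":
            lines[-1].append(token.lower())
        elif tag == "KEEP":
            lines[-1].append(token)
        else:
            raise ValueError(f"unknown tag: {tag}")

    return "\n".join(" ".join(words) for words in lines)


# A noisy email fragment: wrong casing, a hard-wrapped line, quote debris.
tokens = ["i", "got", "the", "\n", "REPORT.", "\n", "thanks!", "\n", ">>>>"]
tags = ["CAP", "KEEP", "KEEP", "JOIN", "LOWER", "BREAK", "CAP", "DEL", "DEL"]

normalized = apply_tags(tokens, tags)
print(normalized)
```

Because noise removal, case restoration, and boundary detection are all expressed as tags over one sequence, a single sequence model can decide them jointly, which is the interdependence the abstract points to.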


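[Disclaimer] All content on this page was uploaded by [唐杰] on [March 24, 2008, 14:42:25]; copyright belongs to the original author. This article represents only the author's own views and is unrelated to this website. This website remains neutral toward the statements and opinions in the article and gives no express or implied guarantee of the accuracy, reliability, or completeness of its content. Readers should treat it as reference only and assume full responsibility themselves.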
