Biomedical Event Trigger Word Identification Based on Multiple Types of Features
First published: 2013-12-24
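Abstract: To date, the MEDLINE literature database holds more than 23 million biomedical journal articles. To capture the knowledge hidden in this literature, protein named entity recognition emerged, and the question of how to extract interaction relations between proteins subsequently attracted wide attention. These techniques, however, are not sufficient for extracting high-quality information such as biological events. Biological event extraction was therefore developed and has received wide attention. In general, biological event extraction follows a pipeline: event trigger words are identified first, and events are then extracted based on those trigger words. This paper proposes a classification method for biological event trigger word identification. By analyzing and comparing the identification performance of classifiers built on different combinations of feature types, we find that different types of features contribute to trigger word identification to different degrees. This provides guidance on how to design and use features efficiently.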
Biomedical Event Trigger Word Identification Based on Different Types of Features
Abstract: To date, more than 23 million biomedical journal articles or abstracts are stored in the MEDLINE database. To capture the knowledge hidden in these plain texts, protein named entity recognition was proposed, and protein-protein interaction extraction then attracted considerable research attention. However, high-quality information such as biomedical events cannot be captured through these approaches. Hence, biomedical event extraction has been attracting much attention. Generally, biomedical event extraction takes a pipeline form: biomedical event trigger words are identified first, and events are then extracted from natural language texts. In this paper, a classification method for trigger word identification is proposed. Different types of features are combined and used to construct the classifier, and the performance of classifiers based on different types of features is compared. The results show that different types of features contribute to the performance of the final trigger word identification system to different degrees. This could serve as a guideline for future feature set design.
Keywords: Biomedical Event Extraction; Trigger Word Detection; Classification