Chinese Question Classification Algorithm Based on Multi-granularity Word Embedding
First published: 2017-12-08
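Abstract: An open-domain retrieval-based question answering system involves several tasks, including question analysis, information retrieval, answer extraction, answer merging, and answer ranking. Question classification, a key part of question analysis, directly affects the accuracy of candidate answer extraction. Building on existing syntactic structure analysis results, this paper proposes MGE-RNN (Multi-granularity Embedding for Question Classification using Recurrent Neural Network), a Chinese question classification algorithm based on multi-granularity word embedding. The algorithm automatically extracts multi-granularity feature vector representations of the question's backbone, its interrogative word, and the interrogative word's ancillary components. It also exploits the nonlinear relationships among these multi-granularity features to mine the deep features that matter most for classification. Its loss function comprises the training loss of the multi-granularity word vectors and the classification error loss of the Softmax classifier. The model achieves good test results on a public dataset.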
Keywords: question answering system; question classification; multi-granularity word embedding; syntactic structure
Multi-granularity Embedding for Question Classification using Recurrent Neural Network
Abstract: An open-domain retrieval-based question answering system involves a variety of tasks, including question analysis, information retrieval, answer extraction, answer combination, and answer ranking. The performance of question classification, a key part of question analysis, directly affects the accuracy of answer extraction. Based on existing syntactic structure analysis results, this paper proposes MGE-RNN (Multi-granularity Embedding for Question Classification using Recurrent Neural Network). The algorithm automatically learns distributed representations of the question's backbone and of the interrogative word with its ancillary components, and also fits the nonlinear relationship between the integrated multi-granularity features and the question class. The loss function includes the training loss of the multi-granularity word embeddings and the classification error of the Softmax classifier. On an open dataset, MGE-RNN performs better than the state-of-the-art question classifier.
Keywords: question answering system; question classification; multi-granularity embedding; syntactic structure
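The abstract describes the pipeline only at a high level, so the following is a rough sketch of the general idea rather than the paper's actual model. It assumes a plain tanh RNN that encodes each granularity (backbone, interrogative word, ancillary components) separately, a single Softmax layer over the concatenated encodings, and a joint loss formed as classification cross-entropy plus a weighted embedding term. All names (`rnn_encode`, `classify`, `joint_loss`), the dimensions, the random inputs, and the scalar `embedding_loss` placeholder are hypothetical and chosen for illustration.

```python
import numpy as np

def rnn_encode(token_vectors, W_in, W_rec):
    """Run a plain tanh RNN over one token sequence; return the final hidden state."""
    h = np.zeros(W_rec.shape[0])
    for x in token_vectors:
        h = np.tanh(W_in @ x + W_rec @ h)
    return h

def softmax(z):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

def classify(granularities, W_in, W_rec, W_out):
    """Encode each granularity, concatenate the encodings, return class probabilities."""
    features = np.concatenate([rnn_encode(g, W_in, W_rec) for g in granularities])
    return softmax(W_out @ features)

def joint_loss(probs, label, embedding_loss, weight=1.0):
    """Cross-entropy of the true class plus a weighted embedding-training term.

    embedding_loss is a stand-in scalar (assumption): the abstract does not
    say how the embedding training loss is computed.
    """
    return -np.log(probs[label]) + weight * embedding_loss

# Toy setup with made-up sizes and random vectors standing in for word embeddings.
rng = np.random.default_rng(0)
dim, hidden, n_classes = 8, 6, 5
W_in = rng.normal(scale=0.1, size=(hidden, dim))
W_rec = rng.normal(scale=0.1, size=(hidden, hidden))
W_out = rng.normal(scale=0.1, size=(n_classes, 3 * hidden))

granularities = [
    rng.normal(size=(4, dim)),  # backbone of the question
    rng.normal(size=(1, dim)),  # interrogative word
    rng.normal(size=(2, dim)),  # ancillary components of the interrogative word
]
probs = classify(granularities, W_in, W_rec, W_out)
loss = joint_loss(probs, label=2, embedding_loss=0.3)
```

Sharing one set of RNN weights across the three granularities keeps the sketch short; separate encoders per granularity would be an equally plausible reading of the abstract.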