Research on a Combined Question Classification Model for an Insurance-Domain Question Answering System
First published: 2019-03-14
Abstract: Question answering systems commonly map the user's question to an existing question in the database and then retrieve the corresponding answer to return to the user. This approach often incurs excessive time overhead in the question-mapping step, which degrades system performance. If the question is first pre-classified, the mapping can be restricted to a smaller set of candidate questions, which reduces the mapping workload and improves performance. A good classification model can therefore increase the system's response speed while preserving the accuracy of its responses. To obtain a question classifier with strong classification performance, this paper selects a question answering corpus from the insurance domain, trains a word2vec word-vector model to convert the original Chinese sentences into a computer-readable vector form, and surveys and implements three common neural network classifiers: an MLP, a CNN, and an LSTM. The three models are then combined by linear regression to obtain a combined classification model. In the experiments, in addition to the test set that comes with the corpus, a manually generated test set of variant questions is used to measure classifier performance on unfamiliar questions. The experiments show that, compared with any single classification model, the combined model achieves higher classification performance and adapts better to unfamiliar questions that were insufficiently covered in training.
Keywords: question classification; neural network; combined classifier
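The abstract states that the MLP, CNN, and LSTM classifiers are combined by linear regression but does not say how the regression is set up. The following is a minimal sketch of one plausible reading, not the authors' implementation: it assumes each base model outputs a class-probability matrix, treats every (sample, class) pair as one regression row, and fits one weight per model plus a bias by ordinary least squares. The function names `fit_combination_weights` and `combine_predict` and the synthetic probabilities are hypothetical and exist only for illustration.

```python
import numpy as np

def fit_combination_weights(probs_list, labels, num_classes):
    """Fit one weight per base model plus a bias by least squares.

    probs_list: list of (n_samples, num_classes) probability matrices,
                one per base classifier (assumed input format).
    labels:     (n_samples,) integer class labels.
    """
    onehot = np.eye(num_classes)[labels]               # (n, C) regression targets
    # One regression row per (sample, class) pair; one feature per base model.
    X = np.stack([p.reshape(-1) for p in probs_list], axis=1)
    X = np.hstack([X, np.ones((X.shape[0], 1))])       # append bias column
    y = onehot.reshape(-1)
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w                                           # shape: (num_models + 1,)

def combine_predict(probs_list, w):
    """Score each class as the weighted sum of base-model probabilities."""
    scores = sum(wi * p for wi, p in zip(w[:-1], probs_list)) + w[-1]
    return scores.argmax(axis=1)

# Synthetic stand-ins for the three base models' outputs (illustration only).
rng = np.random.default_rng(0)
n_samples, num_classes = 50, 3
labels = rng.integers(0, num_classes, size=n_samples)
probs_a = np.eye(num_classes)[labels]                          # an idealized, always-correct model
probs_b = rng.dirichlet(np.ones(num_classes), size=n_samples)  # uninformative model
probs_c = rng.dirichlet(np.ones(num_classes), size=n_samples)  # uninformative model

weights = fit_combination_weights([probs_a, probs_b, probs_c], labels, num_classes)
predictions = combine_predict([probs_a, probs_b, probs_c], weights)
```

In practice the weights would be fitted on held-out predictions rather than on the data used to train the base models, so that the regression does not simply reward the model that overfits most.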