Text Similarity Calculation Method Based on Word Vector and BiLSTM-CNN

SONG Ying; ZHANG Chi


First published: 2022-03-14

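SONG Ying ¹
SONG Ying (b. 1995), female, master's student; main research interest: text similarity
ZHANG Chi ¹
ZHANG Chi (b. 1978), male, associate professor; main research interest: intelligent media information processing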

1. School of Computer and Cyber Sciences, Communication University of China

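Abstract: Text similarity plays an important role in natural language processing. Advances in deep learning have largely addressed two weaknesses of traditional machine learning approaches: the neglect of text semantics, and the time and cost of manually extracting text features. Results on deep semantics, however, remain unsatisfactory. To address this problem, this paper proposes a text similarity calculation model based on character vectors and BiLSTM-CNN. First, a word2vec model is trained on the text to obtain a set of character vectors. Next, the text is vectorized using this set and fed into the BiLSTM-CNN model, whose outputs are spliced together through Attention to obtain a semantic vector for the text. Finally, a softmax function is used to compute the text similarity. Validation and comparison on the Chinese STS dataset show that the method achieves higher accuracy than the other models compared.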
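The first and last steps of this pipeline (character-level vectorization and softmax scoring) can be sketched in plain Python. This is only an illustration under stated assumptions: the random vectors stand in for trained word2vec character vectors, the two logits are made-up values standing in for the output of the BiLSTM-CNN encoder (which is not implemented here), and all function names are hypothetical rather than taken from the paper.

```python
import math
import random


def to_chars(text):
    """Split text into single characters, dropping whitespace."""
    return [ch for ch in text if not ch.isspace()]


def build_char_table(sentences, dim=4, seed=0):
    """Give every distinct character a vector.

    Assumption: random values stand in for vectors that a trained
    word2vec model would provide.
    """
    rng = random.Random(seed)
    table = {}
    for sentence in sentences:
        for ch in to_chars(sentence):
            if ch not in table:
                table[ch] = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    return table


def vectorize(text, table):
    """Map a sentence to a list of character vectors; unknown characters are skipped."""
    return [table[ch] for ch in to_chars(text) if ch in table]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


sentences = ["今天天气很好", "今天天气不错"]
table = build_char_table(sentences)
matrix = vectorize(sentences[0], table)  # one vector per character

# Hypothetical logits for the classes (not similar, similar).
probs = softmax([0.3, 1.2])
```

Working at the character level avoids a separate Chinese word segmentation step, so each sentence becomes a sequence with one vector per character.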

Keywords: text similarity; character vector; BiLSTM-CNN model

Text Similarity Calculation Method Based on Word Vector and BiLSTM-CNN

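SONG Ying ¹
SONG Ying (b. 1995), female, master's student; main research interest: text similarity
ZHANG Chi ¹
ZHANG Chi (b. 1978), male, associate professor; main research interest: intelligent media information processing

1. School of Computer and Cyber Sciences, Communication University of China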

Abstract: Text similarity plays an important role in natural language processing. Advances in deep learning have largely addressed two problems of traditional machine learning: text semantics being ignored, and the long time and high cost of acquiring text features manually. However, results on deep semantics are still not satisfactory. To address this problem, this paper proposes a text similarity calculation model based on character vectors and BiLSTM-CNN. First, a word2vec model is trained on the text to obtain a set of character vectors. Second, the text is vectorized using this set and fed into the BiLSTM-CNN model, and the text semantic vector is obtained by splicing through Attention. Finally, a softmax function is used to calculate the text similarity. Validation and comparison on the Chinese STS dataset show that the method is more accurate than the other models compared.
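The abstract does not say how Attention produces the text semantic vector, so the following is only a minimal sketch of one common form, attention-weighted pooling. It assumes dot-product scoring against a hypothetical `query` vector and uses toy `hidden_states` in place of real BiLSTM-CNN outputs; none of these names or choices come from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, query):
    """Collapse a sequence of vectors into one vector.

    Each position is scored by its dot product with `query`; the scores
    are normalized with softmax and used as weights in a weighted sum.
    """
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    pooled = [
        sum(w * h[d] for w, h in zip(weights, hidden_states))
        for d in range(dim)
    ]
    return pooled, weights


# Toy stand-ins for per-character encoder outputs (assumption).
hidden_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [1.0, 1.0]  # hypothetical attention query vector
sentence_vector, weights = attention_pool(hidden_states, query)
```

In a trained model the query would be a learned parameter, so positions that matter more for similarity receive larger weights in the pooled sentence vector.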

Keywords: text similarity; character vector; BiLSTM-CNN model

Citation

SONG Ying, ZHANG Chi. Text Similarity Calculation Method Based on Word Vector and BiLSTM-CNN [EB/OL]. Beijing: Sciencepaper Online [2022-03-14]. https://www.paper.edu.cn/releasepaper/content/202203-160.

Paper No.	202203-160
Paper title	Text Similarity Calculation Method Based on Word Vector and BiLSTM-CNN