Noise Reduction of Weak Supervision Relation Extraction Based on Crowdsourcing

LIU Tingcun; ZHANG Xi

First published: 2019-11-12

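LIU Tingcun ¹
LIU Tingcun (b. 1994), female, master's student; research area: machine learning
ZHANG Xi ¹
ZHANG Xi (b. 1983), male, associate professor and master's supervisor; main research area: data mining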

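1. Key Laboratory of Trustworthy Distributed Computing and Service (Beijing University of Posts and Telecommunications), Ministry of Education, Beijing 100876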

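Abstract: In the era of big data, with advances in technology and the rise of social networks, natural language processing is being applied ever more widely. Extracting relations from text is especially important in many of these applications. Discriminative models, including deep models, achieve very high accuracy, but they require large amounts of high-quality, expert-labeled training data, which is expensive and hard to obtain in fields such as medicine. Weak supervision, for example crowdsourcing, distant supervision, and data programming, aims to scale up labeling, but can suffer from low label accuracy. This paper builds on crowdsourcing and implements a probabilistic multi-label model for denoising datasets. It also uses double propagation and distant supervision to generate, on its own, datasets suitable for experiments. Finally, experiments on these datasets, with comparisons against other methods, show that the approach is effective at reducing noise and can therefore provide more accurate labels.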

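Keywords: computer application technology; natural language processing; crowdsourcing; relation extraction; maximum likelihood estimation; social networks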

Noise Reduction of Weak Supervision Relation Extraction Based on Crowdsourcing

LIU Tingcun ¹
ZHANG Xi ¹

1. Key Laboratory of Trustworthy Distributed Computing and Service (BUPT), Ministry of Education, Beijing 100876

Abstract: In the era of big data, with the development of technology and the rise of social networks, natural language processing is applied more and more widely. Within it, extracting relations from text is of outstanding importance in many applications. Discriminative models, including deep models, achieve very high accuracy but require a large amount of high-quality training data labeled by experts, which is expensive and not readily available in medicine and other fields. Weak supervision methods, such as crowdsourcing, distant supervision, and data programming, are designed to increase the number of labels, but may suffer from low label accuracy. In this paper, a probabilistic multi-label model for dataset noise reduction is implemented on top of crowdsourcing. In addition, datasets that can be used for experiments are generated autonomously through double propagation and distant supervision. Finally, experiments are carried out on these datasets and the method is compared with other methods. The experimental results demonstrate the effectiveness of this method for noise reduction, which in turn can provide more accurate labels.

Keywords: computer application technology; natural language processing; crowdsourcing; relation extraction; maximum likelihood estimation; social network

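LIU Tingcun, ZHANG Xi. Noise Reduction of Weak Supervision Relation Extraction Based on Crowdsourcing [EB/OL]. Beijing: Sciencepaper Online (中国科技论文在线) [2019-11-12]. https://www.paper.edu.cn/releasepaper/content/201911-39.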

Peer review: not requested

Paper number: 201911-39