Noise Reduction of Weak Supervision Relation Extraction Based on Crowdsourcing
First published: 2019-11-12
Abstract: With the growth of big data and social networks, natural language processing is applied ever more widely, and extracting relations from text is central to many of these applications. Discriminative models, including deep models, achieve very high accuracy but require large amounts of high-quality, expert-labeled training data, which is expensive and hard to obtain in fields such as medicine. Weak supervision, for example crowdsourcing, distant supervision, and data programming, aims to scale up the number of labels but can suffer from low label accuracy. This paper builds on crowdsourcing and implements a probabilistic multi-label model for denoising datasets. It also generates the datasets used in the experiments automatically, through double propagation and distant supervision. Experiments on these datasets, with comparisons against other methods, show that the approach is effective at noise reduction and therefore provides more accurate labels.
Keywords: computer application technology; natural language processing; crowdsourcing; relation extraction; maximum likelihood estimation; social network
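As a rough illustration of how maximum likelihood estimation can denoise crowdsourced labels, the sketch below fits a simple "one-coin" worker model with EM. It is not the probabilistic multi-label model described in the abstract, whose details are not given here. The binary labels, the function name `denoise_votes`, and the `(item, worker, label)` vote layout are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def denoise_votes(votes, n_iter=50, eps=1e-6):
    """Estimate true binary labels from noisy crowd votes with EM.

    votes: list of (item, worker, label) tuples, label in {0, 1}.
    Returns (posterior, accuracy):
      posterior[item]  = estimated P(true label of item is 1)
      accuracy[worker] = estimated probability that the worker votes correctly
    """
    by_item = defaultdict(list)
    for item, worker, label in votes:
        by_item[item].append((worker, label))

    # Initialise with the fraction of positive votes (a soft majority vote).
    posterior = {i: sum(l for _, l in wl) / len(wl) for i, wl in by_item.items()}
    accuracy = {}

    for _ in range(n_iter):
        # M-step: re-estimate worker accuracies and the class prior
        # from the current soft labels.
        hits, counts = defaultdict(float), defaultdict(int)
        for item, worker, label in votes:
            q = posterior[item]
            hits[worker] += q if label == 1 else 1.0 - q
            counts[worker] += 1
        # Clamp to (eps, 1 - eps) so the logarithms below stay finite.
        accuracy = {w: min(max(hits[w] / counts[w], eps), 1 - eps) for w in counts}
        prior = sum(posterior.values()) / len(posterior)
        prior = min(max(prior, eps), 1 - eps)

        # E-step: recompute each item's posterior in log space.
        for item, wl in by_item.items():
            log_pos, log_neg = math.log(prior), math.log(1 - prior)
            for worker, label in wl:
                a = accuracy[worker]
                log_pos += math.log(a if label == 1 else 1 - a)
                log_neg += math.log(a if label == 0 else 1 - a)
            m = max(log_pos, log_neg)
            pos, neg = math.exp(log_pos - m), math.exp(log_neg - m)
            posterior[item] = pos / (pos + neg)

    return posterior, accuracy
```

Calling `denoise_votes(votes)` returns soft labels that can be thresholded at 0.5, along with per-worker accuracy estimates. Unlike plain majority voting, a worker who disagrees with the consensus consistently is down-weighted rather than counted equally.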