An Improved LDA Model Applied to Microblogs

QI Xiaoqing; JING Xiaojun

First published: 2012-12-07

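QI Xiaoqing ¹
QI Xiaoqing (1988-), male, master's student; research interest: data mining.
JING Xiaojun ²
JING Xiaojun (1965-), male, professor and doctoral supervisor; research interest: image processing.

1. School of Information and Communication Engineering, Beijing University of Posts and Telecommunications, Beijing 100876
2. School of Information and Communication Engineering, Beijing University of Posts and Telecommunications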

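Abstract: Microblog posts are short texts whose representations are high-dimensional and sparse, so topic models have been widely studied for clustering microblog text. The Author Topic Model (ATM), an effective extension of the popular Latent Dirichlet Allocation (LDA) model, has also been applied to microblog text mining. When applied to microblogs, however, ATM has two shortcomings. First, the words of a document can only be generated by one author. Second, it ignores the intrinsic structural information carried by the microblog format. To address both problems, we improve ATM and propose a new model, user- and association-extended LDA (ULLDA). Experiments on the NLPIR dataset show that the improved model can be applied effectively to microblog text mining and outperforms ATM.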

Keywords: data mining; Latent Dirichlet Allocation (LDA); Gibbs sampling
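The first shortcoming of ATM named in the abstract concerns how each word is attributed to an author. The sketch below illustrates this using the standard collapsed Gibbs sampler for the Author Topic Model. It is not the paper's ULLDA, whose details are not given here.

For each token, the sampler draws an (author, topic) pair, restricted to the authors of that document. The function name `atm_gibbs`, the hyperparameters `alpha=0.1` and `beta=0.01`, and the count-matrix layout are assumptions chosen for illustration.

```python
import random


def atm_gibbs(docs, authors, num_authors, num_topics, vocab_size,
              alpha=0.1, beta=0.01, iters=200, seed=0):
    """Hypothetical collapsed Gibbs sampler for the Author Topic Model.

    docs    : list of documents, each a list of word ids in [0, vocab_size)
    authors : authors[d] is the list of author ids of docs[d]
    Returns (author_topic, topic_word) count matrices after sampling.
    """
    rng = random.Random(seed)
    at = [[0] * num_topics for _ in range(num_authors)]  # author-topic counts
    tw = [[0] * vocab_size for _ in range(num_topics)]   # topic-word counts
    a_tot = [0] * num_authors                             # tokens per author
    t_tot = [0] * num_topics                              # tokens per topic
    assign = []                                           # (author, topic) per token

    def update(x, z, w, delta):
        # Add (delta=+1) or remove (delta=-1) one token from all count tables.
        at[x][z] += delta
        tw[z][w] += delta
        a_tot[x] += delta
        t_tot[z] += delta

    # Random initialisation: one author from the document's list, one topic.
    for d, doc in enumerate(docs):
        row = []
        for w in doc:
            x, z = rng.choice(authors[d]), rng.randrange(num_topics)
            update(x, z, w, +1)
            row.append((x, z))
        assign.append(row)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                x, z = assign[d][i]
                update(x, z, w, -1)
                # Joint conditional over (author, topic), authors limited to doc d:
                # p ∝ (n_tw + beta)/(n_t + V*beta) * (n_at + alpha)/(n_a + T*alpha)
                pairs, weights = [], []
                for a in authors[d]:
                    for t in range(num_topics):
                        p_word = (tw[t][w] + beta) / (t_tot[t] + vocab_size * beta)
                        p_topic = (at[a][t] + alpha) / (a_tot[a] + num_topics * alpha)
                        pairs.append((a, t))
                        weights.append(p_word * p_topic)
                x, z = rng.choices(pairs, weights=weights)[0]
                update(x, z, w, +1)
                assign[d][i] = (x, z)
    return at, tw
```

A typical call passes a list of word-id lists plus one author list per post, for example `atm_gibbs(docs, [[0], [1], [0]], num_authors=2, num_topics=2, vocab_size=6)`. Because a microblog post has a single author, every token in a post is forced onto that author. ATM then behaves like LDA run over each user's aggregated posts, which is the limitation the abstract points to.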

Citation:

QI Xiaoqing, JING Xiaojun. An Improved LDA Model Applied to Microblogs [EB/OL]. Beijing: Sciencepaper Online [2012-12-07]. https://www.paper.edu.cn/releasepaper/content/201212-118.

Peer review: not requested.

Paper ID: 201212-118