Research on a High-Utility Itemset Mining Algorithm Based on an Improved Dataset Structure

SHEN Wei; FANG Wei


First published: 2019-11-20

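SHEN Wei ¹
SHEN Wei (born 1995), male, graduate student; main research interest: data mining. E-mail: 970543429@qq.com
FANG Wei ¹
FANG Wei (born 1980), male, professor; main research interest: computational intelligence. E-mail: fangwei@jiangnan.edu.cn

1. School of Internet of Things Engineering, Jiangnan University, Wuxi 214122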

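Abstract: High-utility itemset mining (HUIM) is one of the important tasks in data mining. Compared with frequent itemset mining (FIM), HUIM takes both quantity and profit into account when identifying itemsets, rather than quantity alone, and therefore applies to a wider range of scenarios. One-phase HUIM algorithms based on the utility-list structure are currently among the most efficient, because they mine high-utility itemsets (HUIs) directly without generating candidates. However, creating and maintaining many utility-lists consumes considerable time and memory, especially on large dense datasets. To address this problem, this paper proposes a new algorithm, efficient high-utility itemset mining based on a novel data structure (EIM-DS). In EIM-DS, the dataset is reorganized into a new dataset structure, which allows all HUIs to be mined effectively while reducing memory usage during mining. The algorithm also introduces two new pruning strategies, extension-set pruning and local TWU pruning, which considerably shrink the search space. Results on dense and sparse datasets show that EIM-DS requires less execution time and less memory.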

Keywords: data mining, high-utility itemset mining, pattern mining


EIM-DS: Efficient high-utility itemset mining based on a novel data structure

SHEN Wei ¹
FANG Wei ¹

1. School of Internet of Things Engineering, Jiangnan University, Wuxi 214122

Abstract: High-utility itemset mining (HUIM) is an important task in data mining. Compared to frequent itemset mining (FIM), HUIM considers both quantity and profit, rather than frequency alone, to reveal the most profitable products. One-phase HUIM algorithms based on the utility-list structure have been shown to be among the most efficient, since they can mine high-utility itemsets (HUIs) without generating candidates. However, storing itemset information in utility-lists is time-consuming and memory-consuming, especially on dense datasets with long transactions. To address this problem, this paper proposes a novel HUIM algorithm called efficient high-utility itemset mining based on a novel data structure (EIM-DS). In EIM-DS, a novel data structure is designed by reorganizing the transaction database in order to find all HUIs effectively and reduce memory usage during the depth-first search. Based on this data structure, the extension utility and the local TWU utility are proposed and used as upper bounds, which greatly reduce the search space in both width and depth, especially on dense datasets. Experimental results on dense and sparse benchmark datasets show that EIM-DS outperforms the state-of-the-art algorithms in terms of running time and memory usage.

Keywords: data mining, high-utility itemset mining, pattern mining
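
The contrast between HUIM and FIM drawn in the abstract can be illustrated with a minimal sketch. It relies only on the standard definitions of itemset utility (quantity times unit profit, summed over the transactions containing the itemset) and transaction-weighted utilization (TWU). It does not reproduce the EIM-DS data structure or its extension and local TWU bounds, which the abstract does not specify. The toy database, the profit table, and all function names are hypothetical.

```python
from itertools import combinations

# Hypothetical toy database: each transaction maps item -> purchased quantity.
TRANSACTIONS = [
    {"a": 1, "b": 2, "c": 1},
    {"a": 2, "c": 3},
    {"b": 1, "d": 2},
    {"a": 1, "b": 1, "c": 2, "d": 1},
]
# Hypothetical unit profit of each item.
PROFIT = {"a": 5, "b": 2, "c": 1, "d": 3}


def transaction_utility(transaction):
    """Total profit of one transaction."""
    return sum(qty * PROFIT[item] for item, qty in transaction.items())


def utility(itemset, db):
    """Profit contributed by `itemset` in the transactions that contain all of it."""
    return sum(
        sum(t[item] * PROFIT[item] for item in itemset)
        for t in db
        if all(item in t for item in itemset)
    )


def twu(itemset, db):
    """Transaction-weighted utilization: an upper bound on the utility of
    `itemset` and of every superset of it."""
    return sum(
        transaction_utility(t) for t in db if all(item in t for item in itemset)
    )


def mine_huis(db, min_util):
    """Brute-force enumeration for illustration only.
    Single items whose TWU is below min_util are discarded first,
    because no itemset containing them can reach the threshold."""
    promising = sorted(item for item in PROFIT if twu((item,), db) >= min_util)
    result = {}
    for size in range(1, len(promising) + 1):
        for candidate in combinations(promising, size):
            u = utility(candidate, db)
            if u >= min_util:
                result[candidate] = u
    return result


if __name__ == "__main__":
    print(mine_huis(TRANSACTIONS, min_util=25))
```

In this toy data the itemset {a, c} has utility 26 while {c} alone has only 6, so utility is not anti-monotone the way support is in FIM. TWU is anti-monotone, which is why it serves as a pruning bound: item d has a TWU of 20, below the threshold of 25, so every itemset containing d is skipped without computing its utility.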


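SHEN Wei, FANG Wei. Research on a High-Utility Itemset Mining Algorithm Based on an Improved Dataset Structure [EB/OL]. Beijing: Sciencepaper Online (中国科技论文在线) [2019-11-20]. https://www.paper.edu.cn/releasepaper/content/201911-53.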


Paper No.	201911-53