A Method for Fast Mining of Frequent Web Page Sets Using an Efficient Weight Tree
First published: 2018-03-30
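Abstract: To understand user access behavior, mining frequent page sets from web logs has become a pressing need in web usage mining. The original T+ weight tree algorithm (TWTA) can correctly obtain frequent web page sets from web logs. However, it suffers from excessive running time, because TWTA generates a large number of candidate sets and is therefore not essentially a tree algorithm. This paper therefore proposes an efficient weight tree algorithm (EWTA) for fast mining of frequent page sets. It introduces a tree structure called the efficient weight tree (EWT) for mining frequent page sets, and it defines an assistant weight (AW) to filter out infrequent page sets. The EWT speeds up computation through tree pruning, and the AW guarantees that the pruning is correct. Experimental results show that EWTA runs much faster than TWTA, especially on databases with a large number of items, while using little memory.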
Keywords: data mining; frequent pattern mining; web log mining; efficient weight tree algorithm
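The abstract does not spell out how EWT or AW are defined. The following is therefore only a minimal sketch of the general idea that an auxiliary bound makes tree pruning safe in weighted pageset mining. It assumes the common weighted-support definition from weighted frequent pattern mining (mean page weight × support). It uses the global maximum page weight times support as a stand-in for AW. `mine_weighted_pagesets`, `max_w` and `min_wsup` are hypothetical names, and this is not the paper's EWTA.

```python
from typing import Dict, FrozenSet, List, Set


def mine_weighted_pagesets(
    sessions: List[Set[str]], weights: Dict[str, float], min_wsup: float
) -> Dict[FrozenSet[str], float]:
    """Depth-first search over a set-enumeration tree of pagesets.

    ASSUMED definitions (illustrative, not the paper's):
      support(X) = fraction of sessions containing every page in X
      weight(X)  = mean page weight of X
      wsup(X)    = weight(X) * support(X)
    """
    n = len(sessions)
    max_w = max(weights.values())  # hypothetical stand-in for the paper's AW
    result: Dict[FrozenSet[str], float] = {}

    def support(pageset: FrozenSet[str]) -> float:
        return sum(1 for s in sessions if pageset <= s) / n

    def expand(prefix: FrozenSet[str], candidates: List[str]) -> None:
        for i, page in enumerate(candidates):
            node = prefix | {page}
            sup = support(node)
            # wsup is NOT anti-monotone, so a node with low wsup may still have
            # frequent supersets. max_w * sup IS anti-monotone and bounds the wsup
            # of this node and all of its descendants, so cutting here is safe.
            if max_w * sup < min_wsup:
                continue
            wsup = sup * sum(weights[p] for p in node) / len(node)
            if wsup >= min_wsup:
                result[node] = wsup
            expand(node, candidates[i + 1:])

    expand(frozenset(), sorted(weights))
    return result


# Usage on a tiny hypothetical log of four sessions.
sessions = [{"home", "news", "sports"}, {"home", "news"}, {"home", "sports"}, {"news", "sports"}]
weights = {"home": 0.2, "news": 0.9, "sports": 0.5}
found = mine_weighted_pagesets(sessions, weights, 0.25)
print(sorted(tuple(sorted(k)) for k in found))
# [('home', 'news'), ('news',), ('news', 'sports'), ('sports',)]
```

Here `{home, news}` is reported even though `{home}` alone is not. Because weighted support is not anti-monotone, pruning on the weighted support itself would wrongly discard that branch, which is why a separate upper bound is needed.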
Fast Mining of Frequent Page Sets from Web Logs by an Efficient Weight Tree Algorithm
Abstract: Mining frequently visited web pages from web logs has become an urgent need in web usage mining for understanding user behavior. The original T+ weight tree algorithm (TWTA) can correctly obtain frequent web pages from web logs. However, it suffers from a long running time, because TWTA generates many candidates and is therefore not essentially a tree algorithm. In this paper, we propose an efficient weight tree algorithm (EWTA) to mine frequent web pages quickly. We create a new tree structure called the efficient weight tree (EWT) to mine frequent page sets, and we define an assistant weight (AW) to filter out infrequent page sets. The EWT speeds up the calculation by pruning infrequent page sets, and the AW guarantees that the pruning is correct. Experimental results show that, compared with TWTA, EWTA is much faster, especially for databases with many items, and it uses little memory at the same time.
Keywords: data mining; frequent pattern mining; web log mining; efficient weight tree algorithm
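This sketch illustrates what it means for a miner to work on a tree rather than on generated candidate sets. It builds a count-annotated prefix tree from web sessions, in the style of FP-growth. This is an assumed stand-in: the abstract does not describe the EWT's node layout. `Node`, `build_prefix_tree`, `count_nodes` and `page_support` are hypothetical names.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Node:
    page: Optional[str]
    count: int = 0  # sessions whose sorted page path passes through this node
    children: Dict[str, "Node"] = field(default_factory=dict)


def build_prefix_tree(sessions: List[Set[str]]) -> Node:
    """Compress sessions into one count-annotated prefix tree (FP-tree style)."""
    freq = Counter(page for s in sessions for page in s)
    # Global order: most frequent pages first, ties broken by name, so that
    # sessions share as many leading nodes as possible.
    order = {p: i for i, p in enumerate(sorted(freq, key=lambda p: (-freq[p], p)))}
    root = Node(page=None)
    for s in sessions:
        node = root
        for page in sorted(s, key=order.__getitem__):
            node = node.children.setdefault(page, Node(page=page))
            node.count += 1
    return root


def count_nodes(node: Node) -> int:
    """Number of nodes below `node` (the root itself is not counted)."""
    return sum(1 + count_nodes(c) for c in node.children.values())


def page_support(node: Node, page: str) -> int:
    """Sessions containing `page`: sum of counts over every node labelled `page`."""
    own = node.count if node.page == page else 0
    return own + sum(page_support(c, page) for c in node.children.values())


# Usage on a tiny hypothetical log: 9 page visits collapse into 6 tree nodes.
sessions = [{"home", "news", "sports"}, {"home", "news"}, {"home", "sports"}, {"news", "sports"}]
tree = build_prefix_tree(sessions)
print(count_nodes(tree), page_support(tree, "sports"))  # 6 3
```

Ordering pages by descending frequency before insertion is the usual FP-tree choice. Sessions that share frequent pages then share a path, so the tree has at most as many nodes as there are page visits, and usually fewer.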