Sciencepaper Online (中国科技论文在线)

Upload date

November 29, 2010

[Journal article] GraphMiner: A Structural Pattern Mining System for Large Disk-based Graph Databases and Its Applications

汪卫, Wei Wang, Chen Wang, Yongtai Zhu, Baile Shi, Jian Pei, Xifeng Yan, Jiawei Han


Mining frequent structural patterns from graph databases is an important research problem with broad applications. Recently, we developed an effective index structure, ADI, and efficient algorithms for mining frequent patterns from large, disk-based graph databases [5], as well as constraint-based mining techniques. The techniques have been integrated into a research prototype system, GraphMiner. In this paper, we describe a demo of GraphMiner which showcases the technical details of the index structure and the mining algorithms including their efficient implementation, the mining performance and the comparison with some state-of-the-art methods, the constraint-based graph-pattern mining techniques and the procedure of constrained graph mining, as well as mining real data sets in novel applications.


Upload date

November 29, 2010

[Journal article] CLINCH: Clustering Incomplete High-Dimensional Data for Data Mining Application

汪卫, Zunping Cheng, Ding Zhou, Chen Wang, Jiankui Guo, Wei Wang, Baokang Ding, and Baile Shi

APWeb 2005, LNCS 3399, pp. 88-99, 2005.

Abstract

Clustering is a common technique in data mining for discovering hidden patterns in massive datasets. With the development of privacy-preserving data mining applications, clustering incomplete high-dimensional data has become more and more useful. Motivated by this need, we develop a novel algorithm, CLINCH, which can produce fine clusters on incomplete high-dimensional data spaces. To handle missing attributes, CLINCH employs a prediction method that can be more precise than traditional techniques. In addition, we introduce an efficient scheme in which dimensions are processed one by one to attack the "curse of dimensionality". Experiments show that our algorithm not only outperforms many existing high-dimensional clustering algorithms in scalability and efficiency, but also produces precise results.

Clustering, Incomplete Data, High-Dimensional Data


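Upload date

November 29, 2010

[Journal article] Privacy-Preserving Classification Mining

Wei Wang (汪卫), Weiping Ge (葛伟平), Haofeng Zhou (周皓峰), Baile Shi (施伯乐)

Journal of Computer Research and Development (计算机研究与发展), 2006, 43(1): 39-45

Abstract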

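Privacy-preserving classification mining has been one of the hot topics in data mining in recent years. The research focuses on how to transform the original, real data and then build a decision tree on the transformed data set. Based on a transition probability matrix, this paper proposes a novel privacy-preserving classification mining algorithm that applies to non-character data (Boolean, categorical, and numeric types) and to non-uniformly distributed original data, and that can also transform the label attribute. Experiments show that the classification tree built by the algorithm on the transformed data set achieves relatively high accuracy.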

Data mining, classification, decision tree, privacy preservation, transition probability matrix
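
The general idea of transforming data with a transition probability matrix can be illustrated with a small sketch. This is not the algorithm from the paper above, whose details the abstract does not give; it is a generic randomized-response style example under stated assumptions. The function names (`perturb`, `expected_perturbed`, `reconstruct`) and the example matrix are hypothetical. Each original value in category i is replaced by category j with probability P[i][j], and the original category distribution is then estimated from the perturbed one by solving a linear system. Building the decision tree itself is not shown.

```python
import random
from typing import List, Sequence

Matrix = List[List[float]]


def perturb(values: Sequence[str], categories: List[str], P: Matrix,
            rng: random.Random) -> List[str]:
    """Replace each value in category i by category j with probability P[i][j]."""
    index = {c: i for i, c in enumerate(categories)}
    return [rng.choices(categories, weights=P[index[v]])[0] for v in values]


def expected_perturbed(dist: List[float], P: Matrix) -> List[float]:
    """Expected distribution after perturbation: q[j] = sum_i dist[i] * P[i][j]."""
    n = len(dist)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


def solve(A: Matrix, b: List[float]) -> List[float]:
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("transition matrix is singular")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def reconstruct(perturbed: List[float], P: Matrix) -> List[float]:
    """Estimate the original distribution d from q = P^T d."""
    n = len(perturbed)
    PT = [[P[i][j] for i in range(n)] for j in range(n)]
    return solve(PT, perturbed)


if __name__ == "__main__":
    # Hypothetical matrix: keep a value with probability 0.8, else switch.
    P = [[0.8, 0.1, 0.1],
         [0.1, 0.8, 0.1],
         [0.1, 0.1, 0.8]]
    original = [0.5, 0.3, 0.2]
    q = expected_perturbed(original, P)
    print("perturbed distribution:", q)
    print("reconstructed estimate:", reconstruct(q, P))
```

In practice the perturbed distribution would be estimated from counts over the output of `perturb`, so the reconstruction is only approximate and its error depends on the sample size and on how close the matrix is to singular.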


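Upload date

November 29, 2010

[Journal article] ESPM: A Frequent Subtree Mining Algorithm

Wei Wang (汪卫), Yongtai Zhu (朱永泰), Chen Wang (王晨), Mingsheng Hong (洪铭胜), Baile Shi (施伯乐)

Journal of Computer Research and Development (计算机研究与发展), 2004, 41(10): 1720-1726

Abstract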

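With the development of the Internet, frequent pattern mining has extended from frequent itemsets to structured data: trees and graphs. Mining on these structures has been applied to more complex domains, such as bioinformatics, web logs, and XML documents. This paper proposes a novel algorithm, ESPM, for mining frequent subtrees in ordered labeled trees. Unlike previous work, it defers the tree isomorphism check to a late stage of the algorithm, which reduces the time cost of the whole mining process. Experiments on both synthetic and real data sets show the advantage of ESPM over other algorithms. Some possible improvements are also proposed.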

Data mining, frequent patterns, frequent subtrees, ESPM


Upload date

November 29, 2010

[Journal article] Preference-based Frequent Pattern Mining

汪卫, Moonjung Cho, Jian Pei, Haixun Wang, Wei Wang

Abstract

Frequent pattern mining is an important data mining problem with broad applications. Although there are many in-depth studies on efficient frequent pattern mining algorithms and constraint pushing techniques, the effectiveness of frequent pattern mining remains a serious concern: it is non-trivial and often tricky to specify appropriate support thresholds and proper constraints. In this paper, we propose a novel theme of preference-based frequent pattern mining. A user can simply specify a preference instead of setting detailed parameters in constraints. We identify the problem of preference-based frequent pattern mining and formulate the preferences for mining. We develop an efficient framework to mine frequent patterns with preferences. Interestingly, many preferences can be pushed deep into the mining by properly employing the existing efficient frequent pattern mining techniques. We conduct an extensive performance study to examine our method. The results indicate that preference-based frequent pattern mining is effective and efficient. Furthermore, we extend our discussion from pattern-based frequent pattern mining to preference-based data mining in principle and draw a general framework.

Data mining, Frequent-pattern mining
