Sciencepaper Online (中国科技论文在线)

Uploaded: 2005-08-02

Zhi-Hua Zhou (周志华)

Artificial Intelligence 143 (2003) 139-146


This paper reviews three recent books on data mining written from three different perspectives, i.e., databases, machine learning, and statistics. Although the exploration in this paper is suggestive rather than conclusive, it reveals that, besides some common properties, the different perspectives place strong emphasis on different aspects of data mining. The emphasis of the database perspective is on efficiency, because this perspective is strongly concerned with the whole discovery process and huge data volumes. The emphasis of the machine learning perspective is on effectiveness, because this perspective is heavily attracted by substantive heuristics that work well in data analysis although they may not always be useful. As for the statistics perspective, its emphasis is on validity, because this perspective cares greatly about the mathematical soundness behind mining methods.

Data mining, Databases, Machine learning, Statistics


Uploaded: 2005-08-02

[Journal article] NeC4.5: Neural Ensemble Based C4.5

Zhi-Hua Zhou (周志华), Member, IEEE, and Yuan Jiang

IEEE Transactions on Knowledge and Data Engineering, Vol. 16, No. 6, June 2004


Abstract

Decision trees have good comprehensibility, while neural network ensembles have strong generalization ability. In this paper, these merits are integrated into a novel decision tree algorithm, NeC4.5. The algorithm first trains a neural network ensemble. Then, the trained ensemble is employed to generate a new training set by replacing the desired class labels of the original training examples with the labels output by the trained ensemble. Some extra training examples are also generated from the trained ensemble and added to the new training set. Finally, a C4.5 decision tree is grown from the new training set. Since its learning results are decision trees, NeC4.5 is more comprehensible than a neural network ensemble. Moreover, experiments show that the generalization ability of NeC4.5 decision trees can be better than that of C4.5 decision trees.

Machine learning, decision tree, neural networks, ensemble learning, neural network ensemble, generalization, comprehensibility


Uploaded: 2005-08-02

[Journal article] Projection functions for eye detection

Zhi-Hua Zhou* (周志华), Xin Geng

Pattern Recognition 37 (2004) 1049-1056


Abstract

In this paper, the generalized projection function (GPF) is defined. Both the integral projection function (IPF) and the variance projection function (VPF) can be viewed as special cases of GPF. Another special case of GPF, i.e., the hybrid projection function (HPF), is developed through experimentally determining the optimal parameters of GPF. Experiments on three face databases show that IPF, VPF, and HPF are all effective in eye detection. Nevertheless, HPF is better than VPF, while VPF is better than IPF. Moreover, IPF is found to be more effective on occidental than on oriental faces, and VPF is more effective on oriental than on occidental faces. Analysis of the detections shows that this effect may be owed to the shadow of the noses and eyeholes of different races of people.

Eye detection, Face detection, Face recognition, Projection function, Race effect


Uploaded: 2005-08-02

[Journal article] Exploiting Unlabeled Data in Content-Based Image Retrieval

Zhi-Hua Zhou (周志华), Ke-Jia Chen, and Yuan Jiang

ECML 2004, LNAI 3201, pp. 525-536, 2004.


Abstract

In this paper, the SSAIR (Semi-Supervised Active Image Retrieval) approach, which attempts to exploit unlabeled data to improve the performance of content-based image retrieval (CBIR), is proposed. This approach combines the merits of semi-supervised learning and active learning. In detail, in each round of relevance feedback, two simple learners are trained from the labeled data, i.e., images from the user query and user feedback. Each learner then classifies the unlabeled images in the database and passes the most relevant and most irrelevant images to the other learner. After re-training with the additional labeled data, the learners classify the images in the database again, and their classifications are then merged. Images judged to be relevant with high confidence are returned as the retrieval result, while those judged with low confidence are put into the pool that is used in the next round of relevance feedback. Experiments show that the semi-supervised learning and active learning mechanisms are both beneficial to CBIR.
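One round of the feedback loop described in this abstract can be sketched as follows. This is a minimal illustration under assumptions, not the authors' implementation: the abstract does not say what the two "simple learners" are, so the sketch uses two hypothetical nearest-neighbour scorers that differ only in the order p of the Minkowski distance. Images are represented as feature tuples, scores are averaged to merge the two classifications, and the parameters `k` and `pool_size` are invented for illustration.

```python
def make_learner(p):
    """Hypothetical simple learner: scores an item in [-1, 1], > 0 meaning relevant."""
    def dist(a, b):
        return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

    def train(positives, negatives):
        def score(item):
            d_pos = min(dist(item, q) for q in positives)
            d_neg = min(dist(item, q) for q in negatives)
            total = d_pos + d_neg
            return 0.0 if total == 0 else (d_neg - d_pos) / total
        return score
    return train


def feedback_round(positives, negatives, unlabeled, k=1, pool_size=2):
    """One relevance-feedback round; assumes len(unlabeled) >= 2 * k."""
    trainers = [make_learner(1), make_learner(2)]
    # Each learner starts from the same labeled data (query plus user feedback).
    labeled = [(list(positives), list(negatives)) for _ in trainers]
    models = [t(pos, neg) for t, (pos, neg) in zip(trainers, labeled)]

    # Each learner passes its k most irrelevant and k most relevant
    # unlabeled items to the other learner as extra labeled data.
    for i, model in enumerate(models):
        ranked = sorted(unlabeled, key=model)
        other_pos, other_neg = labeled[1 - i]
        other_neg.extend(ranked[:k])
        other_pos.extend(ranked[-k:])

    # Re-train, classify again, and merge by averaging the two scores.
    models = [t(pos, neg) for t, (pos, neg) in zip(trainers, labeled)]
    merged = {item: sum(m(item) for m in models) / len(models) for item in unlabeled}

    # High merged score: returned as the retrieval result (best first).
    result = sorted(unlabeled, key=lambda item: merged[item], reverse=True)
    # Score near zero: low confidence, so offered to the user in the next round.
    pool = sorted(unlabeled, key=lambda item: abs(merged[item]))[:pool_size]
    return result, pool
```

The semi-supervised part is the exchange of confidently classified items between the learners; the active part is the pool, which asks the user about the items the merged classification is least sure of.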


Uploaded: 2005-08-02

[Journal article] Medical Diagnosis With C4.5 Rule Preceded by Artificial Neural Network Ensemble

Zhi-Hua Zhou (周志华), Member, IEEE, and Yuan Jiang

IEEE Transactions on Information Technology in Biomedicine, Vol. 7, No. 1, March 2003


Abstract

Comprehensibility is very important for any machine learning technique to be used in computer-aided medical diagnosis. Since an artificial neural network ensemble is composed of multiple artificial neural networks, its comprehensibility is worse than that of a single artificial neural network. In this paper, C4.5 Rule-PANE, which combines an artificial neural network ensemble with rule induction by regarding the former as a preprocess of the latter, is proposed. First, an artificial neural network ensemble is trained. Then, a new training data set is generated by feeding the feature vectors of the original training instances to the trained ensemble and replacing the expected class labels of the original training instances with the class labels output by the ensemble. Additional training data may also be appended by randomly generating feature vectors and combining them with their corresponding class labels output by the ensemble. Finally, a specific rule induction approach, i.e., C4.5 Rule, is used to learn rules from the new training data set. Case studies on diabetes, hepatitis, and breast cancer show that C4.5 Rule-PANE can generate rules with strong generalization ability, which profits from the artificial neural network ensemble, and strong comprehensibility, which profits from rule induction.

Artificial neural networks, ensemble learning, machine learning, rule induction
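The data-generation step described in this abstract (relabel the original instances with the ensemble's output, then append randomly generated instances labeled the same way) can be sketched as follows. This is a hedged illustration, not the published procedure. The abstract does not say how the ensemble combines its members or how random feature vectors are drawn, so the sketch assumes majority voting and uniform sampling within each attribute's observed range. The names `ensemble_predict` and `build_training_set` are hypothetical, and the ensemble members are plain callables standing in for trained neural networks.

```python
import random
from collections import Counter


def ensemble_predict(members, x):
    """Assumed combination rule: majority vote over callables x -> label."""
    votes = Counter(member(x) for member in members)
    return votes.most_common(1)[0][0]


def build_training_set(features, members, n_extra=0, seed=0):
    """Return (feature_vector, ensemble_label) pairs for the rule learner."""
    rng = random.Random(seed)

    # Replace the original class labels with the ensemble's output.
    new_set = [(x, ensemble_predict(members, x)) for x in features]

    # Assumption: extra feature vectors are drawn uniformly within the
    # range observed for each attribute in the original data.
    lows = [min(column) for column in zip(*features)]
    highs = [max(column) for column in zip(*features)]
    for _ in range(n_extra):
        x = tuple(rng.uniform(lo, hi) for lo, hi in zip(lows, highs))
        new_set.append((x, ensemble_predict(members, x)))
    return new_set
```

The returned list would then be handed to the rule induction step (C4.5 Rule in the abstract), which is not implemented here. Because every label comes from the ensemble, the rule learner is in effect approximating the ensemble's decision function rather than the original, possibly noisy, labels.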
