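A Similarity-Based Soft Clustering Algorithm for Web Document Clustering
First published: 2004-09-07
Abstract: The growth of the Internet has given people access to vast information resources. Web text mining is an effective technique for discovering potentially valuable knowledge in unstructured text, and Web text clustering helps users obtain up-to-date Web information from around the world that matches their interests. This paper proposes a similarity-based soft clustering algorithm (SISC) for text clustering, an effective soft clustering algorithm built on a similarity measure. Experiments comparing SISC with hard clustering algorithms such as K-means show that SISC clusters quickly and efficiently. The paper closes with an outlook on the prospects of text mining in information technology.
Keywords: Web text mining; text clustering; soft clustering; similarity; SISC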
A Similarity-based Soft Clustering Algorithm for Web Documents
Abstract: The rapid growth of the Internet provides a great deal of information resources. Web document mining is an efficient technique for discovering potentially valuable knowledge in unstructured documents. Web document clustering gives the user a good overall view of the information contained in a document collection. In this paper, we propose SISC (Similarity-based Soft Clustering), an efficient soft clustering algorithm for document clustering that is based on a given similarity measure. Experiments comparing SISC with existing hard clustering algorithms such as K-means indicate that SISC is both efficient and effective and is suitable for document clustering. Finally, the paper highlights the upcoming challenges of document mining and the opportunities it offers.
Keywords: Web document mining; document clustering; soft clustering; similarity; SISC
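The abstract contrasts soft clustering driven by a similarity measure with hard clustering such as K-means, but it does not give the details of SISC. The sketch below is therefore not SISC itself. It only illustrates the general distinction, assuming term-frequency vectors, cosine similarity as the similarity measure, and fixed cluster centroids. The function names (`tf_vector`, `cosine`, `soft_memberships`, `hard_assignment`) and the sample texts are hypothetical and chosen for illustration.

```python
import math
from collections import Counter


def tf_vector(text):
    """Build a simple term-frequency vector (assumed representation)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def soft_memberships(doc, centroids):
    """Soft assignment: one membership degree per cluster, summing to 1.

    Memberships are the similarities normalized by their sum. This is an
    illustrative choice, not the measure defined in the paper.
    """
    sims = [cosine(doc, c) for c in centroids]
    total = sum(sims)
    if total == 0:
        # No overlap with any cluster: spread membership uniformly.
        return [1.0 / len(centroids)] * len(centroids)
    return [s / total for s in sims]


def hard_assignment(doc, centroids):
    """Hard assignment (K-means style): index of the most similar cluster."""
    sims = [cosine(doc, c) for c in centroids]
    return max(range(len(sims)), key=sims.__getitem__)


if __name__ == "__main__":
    centroids = [tf_vector("web mining text"), tf_vector("soccer match goal")]
    doc = tf_vector("web text soccer")
    print(soft_memberships(doc, centroids))  # graded membership in both clusters
    print(hard_assignment(doc, centroids))   # a single cluster index
```

In a hard scheme the document about both topics is forced into one cluster, while the soft scheme keeps a graded membership in each, which is the property a soft clustering algorithm for Web documents relies on.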