Research on Key Technologies of Topic-Focused Web Crawlers
First published: 2012-11-09
Abstract: With the rapid development of Internet technology, the gap between the vast amount of information on the Web and people's ability to find the information they actually need has become increasingly pronounced, which calls for the support of search engine technology. Topic-focused web crawlers have emerged in response as a new, fourth generation of search engine. This paper concentrates on the core algorithms of topic-focused web crawlers, including search strategies and relevance computation. In addition, because of the characteristics of the Internet itself and of crawlers' search strategies, the tunneling phenomenon is widespread in topic-focused crawling; this paper discusses the phenomenon and proposes a corresponding improved algorithm.
Keywords: topic web crawler; search engine; relevance; tunneling