Design of a Focused Crawler System for Shipping Information
First published: 2013-11-20
Abstract: With the explosive growth of information on the Web, it is becoming increasingly difficult for people to find the information they need. This paper studies the focused crawler, a central component of topic-specific search engines, which are an active area of research. It examines the application of a focused crawler to the shipping information domain, designs and optimizes the crawler framework, and adapts the crawler's modules to the shipping topic. URL filtering, web page classification, construction of a form set, and automated form input are used to improve the crawl coverage and accuracy of the shipping information crawler. Finally, the crawler is tested experimentally, and the test results and an analysis are presented.
Keywords: focused crawler; filtering; classification; shipping information; Jsoup
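
The abstract names URL filtering as one of the techniques used to keep the crawl on topic but does not describe the rules. The following is a minimal sketch of what such a filter could look like, not the paper's implementation. The class name `ShippingUrlFilter`, the keyword list, and the skipped file extensions are all hypothetical, and only the Java standard library is used (no Jsoup calls).

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical URL filter for a shipping-focused crawler (illustrative only).
final class ShippingUrlFilter {

    // Assumed topic vocabulary; a real crawler would derive this from the topic model.
    private static final Set<String> TOPIC_TOKENS = Set.of(
            "ship", "ships", "shipping", "vessel", "vessels",
            "port", "ports", "freight", "cargo");

    // Assumed list of resource types that are not worth fetching as pages.
    private static final Set<String> SKIPPED_EXTENSIONS = Set.of(
            "jpg", "png", "gif", "css", "js", "pdf", "zip");

    // Returns true if the URL is a fetchable HTTP(S) page whose host or path
    // contains at least one topic token.
    static boolean isOnTopic(String url) {
        URI uri;
        try {
            uri = new URI(url);
        } catch (URISyntaxException e) {
            return false; // malformed URLs are dropped
        }
        String scheme = uri.getScheme();
        if (!"http".equalsIgnoreCase(scheme) && !"https".equalsIgnoreCase(scheme)) {
            return false;
        }
        if (uri.getHost() == null) {
            return false;
        }
        String path = uri.getPath() == null ? "" : uri.getPath().toLowerCase(Locale.ROOT);
        int dot = path.lastIndexOf('.');
        if (dot >= 0 && SKIPPED_EXTENSIONS.contains(path.substring(dot + 1))) {
            return false;
        }
        // Split host and path into alphanumeric tokens and match whole tokens,
        // so that "sports" does not match the keyword "port".
        String text = uri.getHost().toLowerCase(Locale.ROOT) + " " + path;
        for (String token : text.split("[^a-z0-9]+")) {
            if (TOPIC_TOKENS.contains(token)) {
                return true;
            }
        }
        return false;
    }

    // Keeps on-topic URLs in their original order and removes duplicates.
    static List<String> filter(List<String> candidates) {
        Set<String> kept = new LinkedHashSet<>();
        for (String url : candidates) {
            if (isOnTopic(url)) {
                kept.add(url);
            }
        }
        return new ArrayList<>(kept);
    }

    public static void main(String[] args) {
        List<String> links = List.of(
                "http://www.example.com/shipping/schedule.html",
                "http://www.example.com/sports/news.html",
                "http://www.example.com/port/logo.png",
                "http://www.example.com/shipping/schedule.html");
        System.out.println(filter(links));
    }
}
```

Matching whole tokens rather than substrings is a deliberate choice in this sketch: it trades some recall (for example, compound words) for fewer false positives. A production filter would likely combine it with the page classification step the abstract mentions.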