The Analysis and Design of a Focused Crawler System for BBS
First published: 2011-10-14
Abstract: BBS (bulletin board systems) are an important platform where Internet users post comments and exchange views freely, and they have become a gathering place for valuable information such as user needs and commercial opportunities. A focused crawler is a topic-oriented information collection system that automatically gathers topic-relevant information from the Internet according to user needs; it is increasingly widely used in topic-specific search engines and in site structure analysis. This paper first describes the working principle, module composition, and key implementation technologies of focused crawlers. Then, by analyzing the directory-style structure of dynamic web pages and the text structure of BBS, it designs a fairly general crawling scheme for BBS and describes the design of the focused crawler in detail. Finally, the proposed scheme is compared with a general-purpose Web crawler scheme.
Keywords: BBS; Focused Crawler; Search Algorithm; Monitoring