A Network Picture Crawler Based on NodeJs
First published: 2018-07-12
Abstract: This paper presents a design method for a web image crawler based on NodeJs. The program first issues HTTP requests from NodeJs and searches the target website by keyword to obtain the matching image information. It then analyzes the page structure to extract the image addresses, downloads the images, and uploads them to a server for storage. Finally, the program is further optimized and improved. The experimental results show that the asynchronous I/O of NodeJs keeps the crawler from being held up by network and database reads and writes, achieving a concurrency effect comparable to that of multithreaded languages. In addition, with the relevant open-source frameworks, the HTML DOM can be manipulated in a jQuery-like style without regular expressions, which reduces both the amount of code and the likelihood of errors. The approach is well targeted, collects data at a reasonable speed, and is stable.
Keywords: Program design; Spider; NodeJs; Picture grabbing
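The concurrency claim in the abstract can be illustrated with a minimal sketch. It assumes nothing about the paper's actual code: `mapWithLimit`, `fakeDownload`, and the `example.com` URLs are hypothetical names introduced here, and the download is simulated with a timer so the block runs without network access or third-party packages. It shows how a single-threaded NodeJs process can keep several I/O-bound tasks in flight at once while capping how many run concurrently.

```javascript
// Hypothetical helper (not from the paper): apply `worker` to every item,
// keeping at most `limit` asynchronous tasks in flight at any time.
async function mapWithLimit(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0; // index of the next item that has not been claimed yet

  // Each runner repeatedly claims the next unclaimed index until none remain.
  async function runner() {
    while (next < items.length) {
      const i = next++;
      results[i] = await worker(items[i], i);
    }
  }

  const runners = [];
  const count = Math.min(limit, items.length);
  for (let k = 0; k < count; k++) {
    runners.push(runner());
  }
  await Promise.all(runners);
  return results; // results keep the order of `items`
}

// Stand-in for a real image download (assumption: a real crawler would
// issue an HTTP request here and write the response body to storage).
function fakeDownload(url) {
  return new Promise((resolve) => {
    setTimeout(() => resolve({ url, bytes: url.length }), 10);
  });
}

// Usage: "download" five placeholder image URLs, two at a time.
const imageUrls = [1, 2, 3, 4, 5].map((n) => `https://example.com/img/${n}.jpg`);
mapWithLimit(imageUrls, 2, fakeDownload).then((files) => {
  console.log(`downloaded ${files.length} images`);
});
```

Capping the number of in-flight requests is a design choice of this sketch rather than something the abstract states; it is one common way to keep a crawler's request rate reasonable for the target site.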