Design and Implementation of a NodeJs-Based Web Image Crawler Tool

CHAI Qingshan; ZHOU Xiaoguang

First published: 2018-07-12

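CHAI Qingshan ¹
CHAI Qingshan (b. 1995), male, master's student; main research interest: logistics informatization and networking.
ZHOU Xiaoguang ¹
ZHOU Xiaoguang (b. 1957), male, professor and doctoral supervisor; main research interests: intelligent logistics transport and storage systems, and computer network management and control systems.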

1. Institute of Automation, Beijing University of Posts and Telecommunications, Beijing 100876

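Abstract: This paper proposes a design method for a web image crawler based on NodeJs. NodeJs first issues an HTTP request that searches the target website by keyword to obtain the matching image information. The crawler then analyzes the page structure to extract the image addresses, downloads the images, and uploads them to a server for storage. Finally, the program is further optimized and improved. Experimental results show that the asynchronous I/O of NodeJs keeps the crawler from being held up by network and database reads and writes, giving a concurrency effect comparable to that of multithreaded languages. In addition, the open-source frameworks used allow jQuery-style DOM operations on HTML without regular expressions, which reduces the amount of code and the likelihood of errors. The approach is well targeted, collects data at a reasonable speed, and is stable.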

Keywords: program design; crawler; NodeJs; image scraping


A Network Picture Crawler Based on NodeJs

CHAI Qingshan ¹
ZHOU Xiaoguang ¹

1. Institute of Automation, Beijing University of Posts and Telecommunications, Beijing 100876

Abstract: This paper presents a method for building a web image spider based on NodeJs. The method first uses NodeJs to issue an HTTP request that searches the target website by keyword for the corresponding picture information. It then obtains the picture addresses by analyzing the page structure, downloads the pictures, and saves them to a server. The program is then further optimized and improved. The experimental results show that the asynchronous I/O of NodeJs keeps the spider from being limited by network and database reads and writes, and achieves a concurrency effect comparable to that of multithreaded programming languages. In addition, with the relevant open-source framework, DOM operations on HTML can be written in a jQuery-like style and regular expressions are not needed, which reduces both the amount of code and the chance of errors. The method has the advantages of being well targeted, collecting data at a reasonable speed, and being stable.

Keywords: program design; spider; NodeJs; picture grabbing
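
As a rough illustration of the abstract's point about asynchronous I/O, the sketch below shows how a single-threaded NodeJs program can keep several downloads in flight at once. It is not the authors' code. The names `simulateDownload`, `mapWithConcurrency`, and `demo` are hypothetical, the downloads are simulated with timers instead of real HTTP requests, the example.com URLs are placeholders, and the limit of two concurrent downloads is an arbitrary assumption.

```javascript
// Sketch only: all names here are hypothetical and do not come from the paper.

// Stand-in for a real HTTP download: resolves after `delayMs` milliseconds
// without touching the network.
function simulateDownload(url, delayMs) {
  return new Promise((resolve) => {
    setTimeout(() => resolve({ url, bytes: url.length }), delayMs);
  });
}

// Run `worker` over `items` with at most `limit` calls in flight at once.
// Results are returned in the same order as `items`.
async function mapWithConcurrency(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0; // index of the next unclaimed item

  async function runner() {
    while (next < items.length) {
      const i = next++; // no await between the check and the claim, so no race
      results[i] = await worker(items[i], i);
    }
  }

  const runnerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: runnerCount }, () => runner()));
  return results;
}

// Demo: four simulated image downloads, at most two at a time.
async function demo() {
  const urls = [
    "https://example.com/a.jpg",
    "https://example.com/b.jpg",
    "https://example.com/c.jpg",
    "https://example.com/d.jpg",
  ];
  let inFlight = 0;
  let peak = 0; // highest number of simultaneous downloads observed

  const files = await mapWithConcurrency(urls, 2, async (url) => {
    inFlight++;
    peak = Math.max(peak, inFlight);
    const file = await simulateDownload(url, 20);
    inFlight--;
    return file;
  });
  return { files, peak };
}

demo().then(({ files, peak }) => {
  console.log("downloaded:", files.length, "peak in flight:", peak);
});
```

While one simulated download is waiting, the event loop is free to start or finish another, so the waits overlap without any threads. Capping the number in flight is a common way to avoid overloading the target site, though the paper's abstract does not say whether the authors apply such a cap.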

CHAI Qingshan, ZHOU Xiaoguang. Design and Implementation of a NodeJs-Based Web Image Crawler Tool [EB/OL]. Beijing: Sciencepaper Online (中国科技论文在线) [2018-07-12]. https://www.paper.edu.cn/releasepaper/content/201807-28.

Paper No.: 201807-28