Design and Implementation of a Distributed Log Stream Processing System Based on Flume/Kafka/Spark
First published: 2015-07-14
Abstract: With the development and popularization of mobile Internet technology, the daily production and trading activities of enterprises generate massive amounts of log data. How to process logs that traditional systems cannot handle, and how to extract the relevant business information from them, has become an urgent problem for enterprises in many industries. Distributed computing frameworks offer one approach to solving it. Flume is a distributed, reliable service for efficiently collecting, aggregating, and moving large amounts of log data; Kafka is a high-throughput, distributed publish/subscribe messaging system; Spark is a new-generation distributed big-data computing framework that follows Hadoop, and Spark Streaming is the Spark component dedicated to stream processing. This paper designs and implements a distributed log stream processing system based on Flume, Kafka, and Spark. With this system, enterprises can collect and analyze log stream data efficiently, reliably, and in real time, obtaining information that supports business decisions and thereby improving their service quality and competitiveness.
Keywords: distributed system; log stream; Kafka; Flume; Spark
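The abstract describes a pipeline in which Flume ships logs into Kafka and Spark Streaming aggregates them in micro-batches. As an illustration only, the plain-Python sketch below mimics the kind of per-batch aggregation the Spark Streaming stage would perform (counting HTTP status codes over one micro-batch of log lines); the log format, field names, and function names are assumptions, not taken from the thesis itself.

```python
from collections import Counter

def parse_log_line(line):
    # Assumed log format: "<ip> <timestamp> <method> <path> <status>"
    ip, timestamp, method, path, status = line.split()
    return {"ip": ip, "timestamp": timestamp,
            "method": method, "path": path, "status": status}

def count_statuses(batch):
    # Aggregate HTTP status codes over one micro-batch of log lines,
    # analogous to a map + reduceByKey step on a Spark Streaming DStream.
    return Counter(parse_log_line(line)["status"] for line in batch)

# One simulated micro-batch (hypothetical sample data).
batch = [
    "10.0.0.1 2015-07-14T12:00:01 GET /index.html 200",
    "10.0.0.2 2015-07-14T12:00:02 GET /missing 404",
    "10.0.0.1 2015-07-14T12:00:03 POST /login 200",
]
print(count_statuses(batch))  # Counter({'200': 2, '404': 1})
```

In the real system this computation would run inside Spark Streaming, with each micro-batch arriving from a Kafka topic rather than an in-memory list.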