Design and Implementation of a Distributed Log Stream Processing System Based on Flume/Kafka/Spark
First published: 2015-07-14
Abstract: With the development and popularization of mobile Internet technology, the daily production and trading activities of enterprises generate massive amounts of log data. How to process logs that traditional systems cannot handle, and how to extract the relevant business information from them, has become an urgent problem for enterprises in many industries. Distributed computing frameworks offer one approach to solving it. Flume is a distributed, reliable service for efficiently collecting, aggregating, and moving large amounts of log data; Kafka is a high-throughput, distributed publish/subscribe messaging system; Spark is a new-generation distributed big-data computing framework that follows Hadoop, and Spark Streaming is the Spark component dedicated to stream processing. This paper designs and implements a distributed log stream processing system based on Flume, Kafka, and Spark. With this system, enterprises can collect and analyze log stream data efficiently, reliably, and in real time, obtaining information that supports business decisions and thereby improving their service quality and competitiveness.
Keywords: distributed system; log stream; Kafka; Flume; Spark
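The abstract describes a pipeline in which Flume ships logs into Kafka and Spark Streaming aggregates them in micro-batches. As an illustration only, the plain-Python sketch below mimics the kind of per-batch aggregation the Spark Streaming stage would perform (counting HTTP status codes over one micro-batch of log lines); the log format, field names, and function names are assumptions, not taken from the thesis itself.

```python
from collections import Counter

def parse_log_line(line):
    # Assumed log format: "<ip> <timestamp> <method> <path> <status>"
    ip, timestamp, method, path, status = line.split()
    return {"ip": ip, "timestamp": timestamp,
            "method": method, "path": path, "status": status}

def count_statuses(batch):
    # Aggregate HTTP status codes over one micro-batch of log lines,
    # analogous to a map + reduceByKey step on a Spark Streaming DStream.
    return Counter(parse_log_line(line)["status"] for line in batch)

# One simulated micro-batch (hypothetical sample data).
batch = [
    "10.0.0.1 2015-07-14T12:00:01 GET /index.html 200",
    "10.0.0.2 2015-07-14T12:00:02 GET /missing 404",
    "10.0.0.1 2015-07-14T12:00:03 POST /login 200",
]
print(count_statuses(batch))  # Counter({'200': 2, '404': 1})
```

In the real system this computation would run inside Spark Streaming, with each micro-batch arriving from a Kafka topic rather than an in-memory list.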