Search and Analysis of News Events Service System Based on SolrCloud
First published: 2017-10-12
Abstract: News, as a body of knowledge that disseminates information and records society, has always been an important source of information in human life. Outdated news is generally forgotten. However, as society develops and the volume of information grows explosively, people need not only to find the information that interests them in massive amounts of news data, but also to uncover new value in huge archives of historical news. The news event analysis service system designed and implemented in this work is intended to help with exactly that. The experimental data come from GDELT, a very large news event dataset that contains more than 460 million records to date. A traditional MySQL-based system cannot support real-time storage and search over data at this scale; the solution is to use distributed big data technologies such as Spark and Solr.
Keywords: News Event Analysis; Big Data; Distributed Storage; Solr Search Engine