Research on Keyword-Based Spam Filtering Using the Vector Space Model
First published: 2008-10-10
Abstract: E-mail has become an important tool for information exchange, but the spread of junk e-mail (spam) wastes substantial network resources, and reactionary e-mail can even seriously undermine social stability, so anti-spam technology has become an active research topic. Content-oriented spam filtering techniques include black-and-white lists, keyword filtering, hash filtering, rule-based filtering, and probabilistic statistical filtering. Keyword filtering is commonly used because it is simple to implement, but it performs poorly, with high false-positive and false-negative rates. Because the vector space model (VSM) is widely used in information retrieval and information filtering, the author proposes a keyword spam filtering method based on the VSM. The method can learn incrementally from user feedback, forming an adaptive spam filtering system. Experiments show that the method has good filtering performance and incremental learning ability.
Keywords: spam filtering; keyword filtering; VSM; incremental learning
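The abstract does not give the method's details, so the following is only a minimal sketch of the general idea: a message is mapped to a vector over a fixed keyword list, scored by cosine similarity against a spam profile vector, and the profile is adjusted from user feedback. The class name `KeywordVsmFilter`, the raw term-frequency weighting, the threshold and learning rate values, and the Rocchio-style update are all assumptions made for illustration, not the author's algorithm. Whitespace tokenization is also an assumption; Chinese text would need a word segmenter.

```python
import math
from collections import Counter


def to_vector(text, keywords):
    """Map a message to raw term-frequency weights over a fixed keyword list."""
    counts = Counter(text.lower().split())
    return [float(counts[k]) for k in keywords]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class KeywordVsmFilter:
    """Hypothetical keyword filter in a vector space, with feedback updates."""

    def __init__(self, keywords, threshold=0.5, rate=0.2):
        self.keywords = list(keywords)
        self.threshold = threshold  # assumed decision threshold
        self.rate = rate            # assumed feedback step size
        # Start with every keyword weighted equally in the spam profile.
        self.profile = [1.0] * len(self.keywords)

    def score(self, text):
        return cosine(to_vector(text, self.keywords), self.profile)

    def is_spam(self, text):
        return self.score(text) >= self.threshold

    def feedback(self, text, spam):
        """Rocchio-style update (an assumption): move the profile toward
        messages the user marks as spam and away from legitimate ones,
        clamping weights at zero."""
        sign = 1.0 if spam else -1.0
        vec = to_vector(text, self.keywords)
        self.profile = [
            max(0.0, p + sign * self.rate * v)
            for p, v in zip(self.profile, vec)
        ]


spam_filter = KeywordVsmFilter(["free", "winner", "meeting"])
spam_filter.feedback("meeting at noon", spam=False)  # user marks a false positive
print(spam_filter.is_spam("free free winner"), spam_filter.is_spam("meeting at noon"))
```

In this sketch, a single piece of negative feedback lowers the weight of "meeting" in the profile, which is one simple way an incremental, user-adaptive filter could reduce false positives without retraining from scratch.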