Research and Implementation of a URL Feature Extraction System Based on N-Gram and TF-IDF
First published: 2019-11-19
Abstract: Web log analysis usually involves extracting features from URLs. Because URLs may contain undecoded parameters, applying traditional feature extraction algorithms to them directly yields features that are too numerous and too noisy. To address this problem, this paper designs a URL feature extraction system based on the N-Gram model and the TF-IDF model. Experiments show that, under the same conditions, the features extracted by the proposed method achieve better results after training and tuning.
Keywords: Log analysis; Feature extraction; N-Gram; TF-IDF
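To make the idea in the abstract concrete, the following is a minimal sketch of how character N-Grams and TF-IDF weights might be combined for URLs. It is not the system described in the paper, whose details are not given here. The function names `char_ngrams` and `tfidf_features`, the choice of 3-grams, the percent-decoding step, and the plain `tf * log(N / df)` weighting are all assumptions made for illustration.

```python
import math
from collections import Counter
from urllib.parse import unquote


def char_ngrams(url, n=3):
    """Percent-decode a URL, lowercase it, and return its overlapping character n-grams.

    Hypothetical helper: decoding first is an assumption about how
    undecoded parameters could be handled.
    """
    text = unquote(url).lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def tfidf_features(urls, n=3):
    """Return one {ngram: tf-idf weight} dict per URL (hypothetical helper)."""
    # Term counts per URL.
    docs = [Counter(char_ngrams(u, n)) for u in urls]

    # Document frequency: number of URLs that contain each n-gram.
    df = Counter()
    for counts in docs:
        df.update(counts.keys())

    total = len(docs)
    features = []
    for counts in docs:
        length = sum(counts.values())
        # An n-gram present in every URL gets weight 0 under this weighting.
        features.append({
            gram: (c / length) * math.log(total / df[gram])
            for gram, c in counts.items()
        })
    return features
```

Called on a small list such as `["/index.php?id=1", "/index.php?id=2", "/search?q=%3Cscript%3E"]`, `tfidf_features` returns one sparse dictionary per URL. N-grams shared by many URLs (for example those from `/index.php`) receive lower weights than rare ones such as those from the decoded `<script>` parameter.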