Mining Contrast Sequential Patterns Based on Subsequence Time Distribution Variation with Discreteness Constraints
First published: 2019-04-22
Abstract: A contrast sequential pattern is a pattern that occurs frequently in one sequence dataset but infrequently in another. Contrast sequential pattern mining has been widely applied in fields such as customer behavior analysis, bioinformatics, and medical diagnosis. Existing algorithms first require the user to set a distinguishing location and then use this fixed location to identify differences in the distribution of subsequences, i.e., a subsequence pattern that appears before the given distinguishing location in one sequence dataset and after the same location in the other. However, without sufficient prior knowledge it is difficult for users to set an appropriate location. Because the distinguishing location may differ from one subsequence to another, setting a fixed location may miss many meaningful patterns. In addition, previous studies have rarely considered the time distribution variation of subsequences or discreteness constraints on patterns. To address these problems, this paper proposes a method for mining contrast sequential patterns based on subsequence time distribution variation with discreteness constraints. A suffix-tree-based search algorithm, which transforms the dataset to be processed into a tree representation, is designed to mine contrast sequential patterns based on subsequence time distribution variation. Experiments on real-world time-series datasets validate the effectiveness and efficiency of the method in comparison with other state-of-the-art methods.
Keywords: contrast sequential pattern; discreteness constraints; classification
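
To make the basic definition in the abstract concrete, the following is a minimal Python sketch of a support-based contrast check: a pattern counts as contrasting if its support is high in one dataset and low in the other. It is not the suffix-tree algorithm proposed in the paper, and it ignores time distribution variation and discreteness constraints. The function names, the thresholds `alpha` and `beta`, and the choice of gap-tolerant subsequence matching are all assumptions made for illustration.

```python
from typing import Sequence


def is_subsequence(pattern: Sequence, sequence: Sequence) -> bool:
    """Return True if `pattern` occurs in `sequence` in order (gaps allowed)."""
    it = iter(sequence)
    # `item in it` advances the iterator until it finds `item`,
    # so later items must appear after earlier ones.
    return all(item in it for item in pattern)


def support(pattern: Sequence, dataset: Sequence[Sequence]) -> float:
    """Fraction of sequences in `dataset` that contain `pattern`."""
    if not dataset:
        return 0.0
    return sum(is_subsequence(pattern, s) for s in dataset) / len(dataset)


def is_contrast_pattern(
    pattern: Sequence,
    pos: Sequence[Sequence],
    neg: Sequence[Sequence],
    alpha: float = 0.6,  # hypothetical minimum support in `pos`
    beta: float = 0.4,   # hypothetical maximum support in `neg`
) -> bool:
    """Frequent in `pos` (support >= alpha) and infrequent in `neg` (support <= beta)."""
    return support(pattern, pos) >= alpha and support(pattern, neg) <= beta


if __name__ == "__main__":
    # Toy datasets invented for this example only.
    pos = ["abcd", "acbd", "abd"]
    neg = ["dcba", "bda", "cab"]
    print(is_contrast_pattern("ab", pos, neg))
    print(is_contrast_pattern("d", pos, neg))
```

A check of this kind compares only how often a pattern occurs in each class. The method described in the abstract additionally considers where in time the subsequence occurs, which this sketch does not model.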