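Comparative Analysis of Data Balancing Algorithms in Spatio-temporal Dynamic Model Applications: A Case Study of the Three Gorges Reservoir Area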
First published: 2019-12-25
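Abstract: In spatio-temporal models that couple machine learning with cellular automata (CA), imbalanced data distributions cause low simulation accuracy for key minority land classes, and solving this problem has considerable practical value. Building on the Markov-MLP-CA spatio-temporal dynamic model and taking the Three Gorges Reservoir area as an example, this study designed schemes with different data balance strategies and sampling algorithms, and compared the Markov-MLP-CA simulation results under each scheme. The results show: (1) As the balance degree of the training dataset was raised from 0.64% to 7.65%, 18.38%, 23.06%, and 100%, the KAPPA of wetland, a minority land class, rose from 26.19% to 33.69%, 36.57%, 36.86%, and 42.05%, respectively, and the KAPPA of shrubland also increased accordingly. (2) After the training data were balanced, the accuracy of the minority land classes improved to varying degrees. (3) The model coupling Markov-MLP-CA with the SMOTE-Tomek sampling algorithm achieved an overall KAPPA of 0.8404, the lowest volatility of per-class KAPPA (49.08%), and the highest macro-F1 (0.7219). The study concludes that: (1) improving the balance degree of the training data and the sampling algorithm raises the simulation accuracy of minority land classes and reduces the volatility of the KAPPA indices, thereby improving overall model performance; (2) model performance evaluation should jointly consider KAPPA, KAPPA index volatility, and macro-F1; (3) comparatively, the model coupling Markov-MLP-CA with the SMOTE-Tomek sampling algorithm has better simulation performance.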
Keywords: land use change simulation; data imbalance; SMOTE algorithm; multilayer perceptron; cellular automata
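The abstract reports results for a "balance degree" of the training set and for SMOTE-Tomek resampling but does not define either in detail. The following is a minimal NumPy sketch under stated assumptions, not the authors' implementation: balance degree is assumed to be the smallest class count divided by the largest class count (so 100% means perfectly balanced), and SMOTE-Tomek is implemented as SMOTE oversampling of every smaller class up to the largest class size, followed by removal of both members of each Tomek link. All function names are hypothetical.

```python
import numpy as np


def balance_degree(y):
    """Assumed definition: smallest class count divided by largest class count."""
    _, counts = np.unique(y, return_counts=True)
    return counts.min() / counts.max()


def smote(X, y, k=5, seed=None):
    """Oversample every smaller class up to the largest class size with SMOTE.

    Each synthetic sample lies on the segment between a randomly chosen
    sample and one of its k nearest neighbours from the same class.
    """
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    X_parts, y_parts = [X], [y]
    for c, n in zip(classes, counts):
        if n == target or n < 2:
            continue  # already the largest class, or too few samples to interpolate
        Xc = X[y == c]
        # Brute-force pairwise distances inside the class (fine for a sketch).
        d = np.linalg.norm(Xc[:, None, :] - Xc[None, :, :], axis=2)
        np.fill_diagonal(d, np.inf)  # a sample is not its own neighbour
        kk = min(k, n - 1)
        neighbours = np.argsort(d, axis=1)[:, :kk]
        m = target - n  # number of synthetic samples needed for this class
        base = rng.integers(0, n, size=m)
        partner = neighbours[base, rng.integers(0, kk, size=m)]
        gap = rng.random((m, 1))  # interpolation factor in [0, 1)
        X_parts.append(Xc[base] + gap * (Xc[partner] - Xc[base]))
        y_parts.append(np.full(m, c, dtype=y.dtype))
    return np.vstack(X_parts), np.concatenate(y_parts)


def remove_tomek_links(X, y):
    """Drop both members of every Tomek link: a pair of mutual nearest
    neighbours that carry different labels."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    nn = d.argmin(axis=1)
    is_link = (nn[nn] == np.arange(len(y))) & (y[nn] != y)
    return X[~is_link], y[~is_link]


def smote_tomek(X, y, k=5, seed=None):
    """SMOTE oversampling followed by Tomek-link cleaning."""
    X_os, y_os = smote(X, y, k=k, seed=seed)
    return remove_tomek_links(X_os, y_os)


# Usage: a toy 2-D dataset with a 90:10 class imbalance.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (90, 2)), rng.normal(3.0, 1.0, (10, 2))])
y = np.array([0] * 90 + [1] * 10)
X_bal, y_bal = smote_tomek(X, y, seed=42)
print(balance_degree(y), balance_degree(y_bal))
```

The brute-force distance matrices are quadratic in the number of samples, which is acceptable for a toy example; for raster-scale training samples, a spatial index or the `SMOTETomek` class in imbalanced-learn (`imblearn.combine`) would be more practical.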
Comparative Analysis of Data Equalization Algorithms in Spatio-temporal Dynamic Model: A Case Study of the Three Gorges Reservoir Area
Abstract: In spatio-temporal models that combine machine learning with cellular automata (CA), it is important to solve the problem that the simulation accuracy of key minority land classes is too low because of imbalanced data distribution. Based on the Markov-MLP-CA spatio-temporal dynamic model and taking the Three Gorges Reservoir area as an example, different data balance strategies and sampling algorithm schemes were designed, and the Markov-MLP-CA simulation results under the different schemes were compared and analyzed. The results show: (1) When the balance degree of the training dataset increased from 0.64% to 7.65%, 18.38%, 23.06%, and 100%, the KAPPA of wetland, a minority land class, increased from 26.19% to 33.69%, 36.57%, 36.86%, and 42.05%, respectively, and the KAPPA of shrubland also increased correspondingly. (2) After the training data were balanced, the accuracy of the minority land classes improved to varying degrees. (3) The model coupling Markov-MLP-CA with the SMOTE-Tomek sampling algorithm has the following advantages: the overall KAPPA is 0.8404, the volatility of KAPPA across land classes is the lowest (49.08%), and the Macro-F1 value is the highest (0.7219). This study concludes: (1) By improving the balance degree of the training data and the sampling algorithm, the simulation accuracy of the minority land classes can be improved, the volatility of the KAPPA index can be reduced, and the overall performance of the model can be improved. (2) KAPPA, KAPPA index volatility, and Macro-F1 should all be considered in model performance evaluation. (3) In comparison, the model coupling Markov-MLP-CA with the SMOTE-Tomek sampling algorithm has better simulation performance.
Keywords: Land use change simulation; Imbalanced data; SMOTE algorithm; Multilayer perceptron; Cellular automata
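The abstract recommends judging models jointly by KAPPA, the volatility of per-class KAPPA, and macro-F1. Because the exact definitions are not given here, the sketch below makes two labelled assumptions: per-class KAPPA is computed as a one-vs-rest Cohen's kappa for each land class, and "volatility" is taken as the coefficient of variation (standard deviation divided by mean) of those per-class values. It illustrates the metrics only and will not necessarily reproduce the reported 49.08% figure. Function names are hypothetical.

```python
import numpy as np


def confusion(y_true, y_pred, classes):
    """Confusion matrix with rows = reference class, columns = simulated class."""
    idx = {c: i for i, c in enumerate(classes)}
    cm = np.zeros((len(classes), len(classes)), dtype=float)
    for t, p in zip(y_true, y_pred):
        cm[idx[t], idx[p]] += 1
    return cm


def kappa_from_cm(cm):
    """Cohen's kappa: (observed agreement - chance agreement) / (1 - chance agreement)."""
    n = cm.sum()
    po = np.trace(cm) / n
    pe = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n**2
    return (po - pe) / (1 - pe)


def per_class_kappa(cm):
    """Assumed per-class KAPPA: one-vs-rest Cohen's kappa for each class."""
    n = cm.sum()
    out = []
    for k in range(cm.shape[0]):
        tp = cm[k, k]
        fn = cm[k].sum() - tp
        fp = cm[:, k].sum() - tp
        tn = n - tp - fn - fp
        out.append(kappa_from_cm(np.array([[tp, fn], [fp, tn]])))
    return np.array(out)


def macro_f1(cm):
    """Unweighted mean of per-class F1 scores."""
    tp = np.diag(cm)
    pred_totals, true_totals = cm.sum(axis=0), cm.sum(axis=1)
    precision = np.divide(tp, pred_totals, out=np.zeros_like(tp), where=pred_totals > 0)
    recall = np.divide(tp, true_totals, out=np.zeros_like(tp), where=true_totals > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)
    return f1.mean()


def kappa_volatility(kappas):
    """Hypothetical volatility measure: coefficient of variation of per-class kappa."""
    return kappas.std() / kappas.mean()


# Usage on a tiny two-class example.
cm = confusion([0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 1, 0], classes=[0, 1])
print(kappa_from_cm(cm), macro_f1(cm), kappa_volatility(per_class_kappa(cm)))
```

Reporting macro-F1 alongside overall KAPPA matters because overall KAPPA is dominated by the majority land classes, whereas macro-F1 and the spread of per-class KAPPA expose how well minority classes such as wetland are simulated.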