Influence of Training Error Probability Distribution on Generalization Performance
First published: 2018-05-18
Abstract: The empirical risk is the average of the prediction errors over the training samples. However, once the loss function is fixed, the prediction errors of the training samples for a given data set are not all high-probability events in the sample space, so treating these samples equally in the empirical risk is inappropriate; when analyzing risk bounds for a set of unbounded loss functions, the tail heaviness of the probability distribution of the training-sample prediction errors must therefore be taken into account. Accordingly, this paper sorts the training-sample prediction errors in descending order, re-examines the empirical risk and the structural risk, analyzes the upper bound of the expected risk, and proposes a new generalization error bound: the lower bound of the tail synthetic index. Based on quotient space theory, this bound is further combined with the leave-one-out (LOO) error bound, and a multi-layer topological structure is constructed on the hypothesis space to search for the optimal compact hypothesis subspace. Experiments demonstrate the robustness and effectiveness of the proposed method in improving the generalization performance of several learning algorithms.
Keywords: unbounded non-negative loss function set; statistical learning theory; topological structure; heavy-tail index
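The abstract's key step is sorting the training-sample prediction errors in descending order and measuring how heavy the tail of their distribution is. The paper's "tail synthetic index" is not specified in the abstract, so the sketch below illustrates the general idea with the classical Hill estimator instead; the function name, the choice of `k`, and the synthetic Pareto-like sample are all illustrative assumptions, not the paper's method.

```python
import math

def hill_estimator(errors, k):
    """Hill (1975) estimator of the tail index from the k largest
    order statistics of a positive sample.  A small index means a
    heavy tail: a few samples carry disproportionately large
    prediction errors, so averaging all samples equally in the
    empirical risk can be misleading."""
    if not 0 < k < len(errors):
        raise ValueError("need 0 < k < len(errors)")
    # Sort prediction errors in descending order, mirroring the
    # re-ordering step described in the abstract.
    x = sorted(errors, reverse=True)
    if x[k] <= 0.0:
        raise ValueError("the k+1 largest errors must be positive")
    # Mean log-excess of the k largest errors over the (k+1)-th.
    h = sum(math.log(x[i] / x[k]) for i in range(k)) / k
    return 1.0 / h

# Deterministic Pareto-like sample with true tail index 1.5:
# inverse-CDF values x = u**(-1/1.5) on an evenly spaced grid.
errors = [(i / 200.0) ** (-1.0 / 1.5) for i in range(1, 200)]
print(round(hill_estimator(errors, 50), 2))  # close to the true index 1.5
```

A tail index at or below 1 would indicate errors so heavy-tailed that their mean is unstable, which is exactly the regime where, as the abstract argues, the plain empirical risk stops being a trustworthy estimate of the expected risk.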