A Mixed Integer Programming Approach for Gene Selection
First published: 2013-09-16
Abstract: In most gene expression datasets used for cancer classification, the number of samples is very small compared to the number of genes. Feature selection is therefore an essential pre-processing step, and a challenging one, for removing irrelevant or redundant genes before classification. In this paper, we model the gene selection problem as a mixed integer programming problem based on the 1-norm support vector machine (SVM). This problem is difficult to solve because the number of integer variables (usually tens of thousands or even hundreds of thousands) is very large compared to the desired number of genes. To solve it, we propose an iterative mixed integer optimization algorithm that gradually selects a subset of genes. We test the proposed approach on colon cancer and leukemia gene expression datasets. The results show that the proposed algorithm performs better than the Fisher criterion, the t-statistic, the standard 2-norm SVM, and SVM recursive feature elimination (SVM-RFE), and that the selected gene subset yields higher classification accuracy.
Keywords: Pattern recognition and intelligent system; Gene selection; Support Vector Machine; SVM-RFE; Mixed Integer Programming
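The abstract does not spell out the model, so the following is only a minimal sketch of what a 1-norm SVM with binary gene-selection variables might look like, not the authors' formulation or their iterative algorithm. It assumes a big-M link between each weight and a binary indicator, a hard cap `k` on the number of selected genes, and SciPy's `milp` solver; the names `select_genes_mip`, `C`, `k`, and `big_m` are hypothetical.

```python
import numpy as np
from scipy.optimize import Bounds, LinearConstraint, milp


def select_genes_mip(X, y, k, C=1.0, big_m=10.0):
    """Illustrative cardinality-constrained 1-norm SVM (an assumed model).

    minimize   sum_j |w_j| + C * sum_i xi_i
    subject to y_i * (w . x_i + b) >= 1 - xi_i,  xi_i >= 0,
               |w_j| <= big_m * z_j,  sum_j z_j <= k,  z_j in {0, 1}.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, d = X.shape

    # Variable layout: [p (d), q (d), b (1), xi (n), z (d)], with w = p - q
    # and p, q >= 0, so that |w_j| is represented by p_j + q_j.
    n_cont = 2 * d + 1 + n
    cost = np.concatenate([np.ones(2 * d), [0.0], C * np.ones(n), np.zeros(d)])

    # Margin constraints: y_i * (x_i . (p - q) + b) + xi_i >= 1.
    yX = y[:, None] * X
    margin = np.hstack([yX, -yX, y[:, None], np.eye(n), np.zeros((n, d))])

    # Big-M link: p_j + q_j - big_m * z_j <= 0 (gene j is usable only if z_j = 1).
    link = np.hstack(
        [np.eye(d), np.eye(d), np.zeros((d, 1 + n)), -big_m * np.eye(d)]
    )

    # Cardinality cap: at most k genes selected.
    card = np.concatenate([np.zeros(n_cont), np.ones(d)])[None, :]

    constraints = [
        LinearConstraint(margin, lb=1.0, ub=np.inf),
        LinearConstraint(link, lb=-np.inf, ub=0.0),
        LinearConstraint(card, lb=0.0, ub=float(k)),
    ]

    # Only the intercept b is unbounded below; z is relaxed to [0, 1] and
    # then forced to be integral through the integrality vector.
    lower = np.zeros(n_cont + d)
    lower[2 * d] = -np.inf
    upper = np.concatenate([np.full(n_cont, np.inf), np.ones(d)])
    integrality = np.concatenate([np.zeros(n_cont), np.ones(d)])

    res = milp(
        cost,
        integrality=integrality,
        bounds=Bounds(lower, upper),
        constraints=constraints,
    )
    if not res.success:
        raise RuntimeError(res.message)

    w = res.x[:d] - res.x[d:2 * d]
    b = res.x[2 * d]
    z = res.x[n_cont:]
    selected = [j for j in range(d) if z[j] > 0.5]
    return selected, w, b
```

Solving this model directly is only practical for a handful of genes. With tens of thousands of binary variables, as the abstract notes, a single solve becomes very expensive, which is the motivation the authors give for selecting genes gradually over several iterations.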