Study on the Correlation Between Gene Copy Number Variation and Different Human Tumors
First published: 2017-03-20
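Abstract: Copy number variation (CNV) of genes, a form of DNA mutation, has been reported to be closely related to human tumors. To better understand the relationship between CNV and different human tumors, this paper applies a computational method that classifies six kinds of human tumors based on CNV level values. The CNV level of each gene is treated as a classification feature. The mRMR (minimum redundancy maximum relevance) algorithm is used to rank the CNV levels of 24,175 genes by importance, and the top 1,000 features most closely related to the different tumors are retained. IFS (incremental feature selection) is then used to select, from these 1,000 features, the optimal feature set that classifies the six tumors most accurately. Classification is performed with SMO (sequential minimal optimization, an algorithm for training support vector machines), and the classification error is estimated by 10-fold cross-validation. The classifier built on the CNV levels of the top 479 mRMR-ranked genes achieves the highest classification accuracy, 80.86%. Biological analysis of several important genes among these 479 shows that most of them are closely related to tumors. These results support the relationship between CNV and different human tumors and are significant for further understanding the link between genes and tumorigenesis and for discovering new tumor-causing genes.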
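The abstract does not spell out how the mRMR ranking works. The following is a minimal, dependency-free sketch of greedy mRMR in one common form (MID: relevance minus mean redundancy), using mutual information as the measure. It assumes the per-gene CNV levels have already been discretized (for example into loss/neutral/gain states); the function names and the toy genes g1 to g4 are hypothetical and do not come from the paper, whose implementation details are not given here.

```python
from collections import Counter
from math import log


def mutual_information(x, y):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    # I(X;Y) = sum p(a,b) * log(p(a,b) / (p(a) * p(b))), written with raw counts.
    return sum((c / n) * log(c * n / (px[a] * py[b])) for (a, b), c in pxy.items())


def mrmr_rank(features, labels, top_k):
    """Greedy mRMR ranking (MID form: relevance minus mean redundancy).

    features: dict mapping feature name -> list of discrete values, one per sample
    labels:   list of class labels, one per sample
    Returns up to top_k feature names in the order they were selected.
    """
    relevance = {f: mutual_information(v, labels) for f, v in features.items()}
    remaining = set(features)
    selected = []
    # Running sum of MI between each unselected feature and the selected set.
    redundancy_sum = {f: 0.0 for f in features}
    while remaining and len(selected) < top_k:
        if not selected:
            score = lambda f: relevance[f]
        else:
            score = lambda f: relevance[f] - redundancy_sum[f] / len(selected)
        # sorted() makes tie-breaking deterministic (first name wins on ties).
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
        for f in remaining:
            redundancy_sum[f] += mutual_information(features[f], features[best])
    return selected


# Toy example with hypothetical genes: 8 samples, 4 tumor classes (A-D).
labels = ["A", "A", "B", "B", "C", "C", "D", "D"]
features = {
    "g1": [0, 0, 0, 0, 1, 1, 1, 1],  # separates {A,B} from {C,D}
    "g2": [0, 0, 0, 0, 1, 1, 1, 1],  # exact copy of g1: relevant but redundant
    "g3": [0, 0, 1, 1, 0, 0, 1, 1],  # separates {A,C} from {B,D}: complementary
    "g4": [0, 1, 0, 1, 0, 1, 0, 1],  # independent of the labels: noise
}
print(mrmr_rank(features, labels, top_k=4))  # ['g1', 'g3', 'g2', 'g4']
```

Here g1, g2 and g3 are equally relevant, so a relevance-only ranking with alphabetical tie-breaking would return g1, g2, g3; the redundancy term pushes the duplicate g2 behind the complementary g3.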
Copy Number Variation in Genes Is Related to Several Different Human Cancers
Abstract: Genetic alterations can change biological properties, and many abnormal genes are associated with human diseases. Copy number variation (CNV) is a type of DNA mutation that has been reported to be closely related to human tumors. To better understand the relationship between CNV and human tumors, this paper adopts a computational method based on CNV level values to classify six kinds of human tumors. First, we used the minimum redundancy maximum relevance (mRMR) algorithm to rank 24,175 feature genes by importance and selected the top 1,000. Second, we used incremental feature selection (IFS) to screen out the optimal feature subset from these 1,000 features. Third, we classified the six tumors with the sequential minimal optimization (SMO) method and evaluated the results with 10-fold cross-validation. The highest classification accuracy, 80.86%, was obtained using the CNV level values of 479 genes. Finally, we analyzed the biological information of several important genes in the optimal feature set; the analysis shows that some of these genes are closely related to cancer. These results help confirm the relationship between CNV and different human tumors.
Keywords: Bioinformatics; Gene copy number variation; Cancer; Feature selection
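The IFS and 10-fold cross-validation steps can be sketched as follows: for n = 1, 2, ..., evaluate a classifier on the top-n ranked features with k-fold cross-validation and keep the n with the highest accuracy. To stay dependency-free, this sketch swaps in a simple nearest-centroid classifier in place of the SMO-trained support vector machine used in the paper. The function names, the tie-breaking rule (the smallest n reaching the best accuracy wins), and the toy data are assumptions for illustration only.

```python
import random


def fit_centroids(rows, labels):
    """Mean feature vector (centroid) of each class."""
    sums, counts = {}, {}
    for r, lab in zip(rows, labels):
        acc = sums.setdefault(lab, [0.0] * len(r))
        for j, v in enumerate(r):
            acc[j] += v
        counts[lab] = counts.get(lab, 0) + 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}


def predict(centroids, row):
    """Class with the smallest squared Euclidean distance; ties go to the first name."""
    def sq_dist(lab):
        return sum((v - c) ** 2 for v, c in zip(row, centroids[lab]))
    return min(sorted(centroids), key=sq_dist)


def cv_accuracy(rows, labels, k=10, seed=0):
    """k-fold cross-validated accuracy of the nearest-centroid stand-in classifier."""
    order = list(range(len(labels)))
    random.Random(seed).shuffle(order)
    correct = 0
    for f in range(k):
        test = set(order[f::k])  # every k-th shuffled index forms one fold
        if not test:
            continue
        train = [i for i in order if i not in test]
        cents = fit_centroids([rows[i] for i in train], [labels[i] for i in train])
        correct += sum(predict(cents, rows[i]) == labels[i] for i in test)
    return correct / len(labels)


def incremental_feature_selection(rows, labels, ranked, k=10):
    """IFS: score the top-1, top-2, ... ranked columns; return (best_n, accuracy curve)."""
    curve = []
    for n in range(1, len(ranked) + 1):
        subset = [[r[j] for j in ranked[:n]] for r in rows]
        curve.append(cv_accuracy(subset, labels, k))
    best_n = curve.index(max(curve)) + 1  # smallest subset reaching the best accuracy
    return best_n, curve


# Toy data: 30 samples, 3 classes; column 0 is informative, column 1 is constant.
labels = [c for c in "ABC" for _ in range(10)]
rows = [[10.0 * "ABC".index(c) + 0.1 * (i % 3), 0.0] for i, c in enumerate(labels)]
print(incremental_feature_selection(rows, labels, ranked=[0, 1]))  # (1, [1.0, 1.0])
print(incremental_feature_selection(rows, labels, ranked=[1, 0]))  # (2, [0.3333333333333333, 1.0])
```

With the constant column ranked first, every centroid coincides and all samples fall to class A, so accuracy starts at 1/3 and IFS only reaches its best at n = 2; this mirrors how the paper reads off the best subset size (479 genes) from the accuracy curve.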