The Influence of Sequence Identity on RNA-Binding Protein Prediction
First published: 2015-12-17
Abstract: RNA-binding proteins play important roles in various cellular processes, and a number of computational methods for predicting them have emerged in recent years. Both the proportion of positive and negative samples and the sequence identity among them need to be considered and weighed in these methods. In this article, we examine whether sequence identity affects prediction performance on balanced and unbalanced datasets. Tested on balanced and unbalanced test sets with sequence identity cutoffs of 35%, 30%, 25%, 20%, 15%, 10% and 5%, our method achieved almost the same area under the receiver operating characteristic (ROC) curve.
Keywords: bioinformatics; RNA-binding protein; sequence identity
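The evaluation described in the abstract, comparing the area under the ROC curve across test sets built at different sequence identity cutoffs, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `roc_auc` and `auc_by_cutoff` and the layout of the test-set dictionary are assumptions made for illustration, and the AUC is computed with the standard rank-sum (Mann-Whitney U) identity rather than any method named in the paper.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity.

    labels: sequence of 0/1 (assumed encoding: 1 = RNA-binding, 0 = non-binding)
    scores: predicted scores, higher meaning more likely RNA-binding
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    # Assign 1-based average ranks so that tied scores count as half a win.
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative")
    pos_rank_sum = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


def auc_by_cutoff(test_sets):
    """Map each identity cutoff to the AUC of its (labels, scores) test set.

    test_sets: hypothetical dict {cutoff_percent: (labels, scores)}
    """
    return {cutoff: roc_auc(labels, scores)
            for cutoff, (labels, scores) in test_sets.items()}
```

Because the AUC depends only on how positives rank relative to negatives, it can be compared between balanced and unbalanced test sets without rescaling. Calling `auc_by_cutoff` on one test set per cutoff (35% down to 5%) gives the kind of per-cutoff comparison the abstract reports.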