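A Clustering Algorithm for Multi-domain Protein Conformations Based on Principal Component Analysis
First published: 2015-12-09
Abstract: Proteins containing multiple domains can usually exist in a variety of conformational states in solution, which may be closely related to their biological function. Besides experimental techniques, computational methods such as molecular dynamics simulation are well suited to studying the relative motions between protein domains and to sampling different conformational states. A molecular dynamics simulation usually produces a trajectory containing a large number of protein conformations, so cluster analysis is an essential post-processing step: conformations are grouped by similarity and the typical conformations of the multi-domain protein are identified. In this paper, we cluster multi-domain protein conformations with the k-means algorithm, based on the results of principal component analysis (PCA) of the molecular dynamics trajectory. Cluster analysis of the tandem WW domains of formin binding protein 21 (FBP21-WW) shows that the clusters obtained with this algorithm are superior to those obtained using other criteria, such as distances between pairs of residues in different domains.
Keywords: molecular dynamics simulation; k-means algorithm; multi-domain protein; principal component analysis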
Clustering Multi-domain Protein Structures in the Subspace Defined by Principal Component Analysis
Abstract: A multi-domain protein can exist in solution as an equilibrium of different conformations, which may be critical to its biological function. Besides experimental techniques, computational methods such as molecular dynamics (MD) simulations are well suited to studying the inter-domain motions of the protein and to sampling different conformational states. An MD simulation usually generates a trajectory containing a large number of protein structures, so a post-processing cluster analysis is needed to group similar structures into clusters and identify the typical conformations of the multi-domain protein. In this paper, the widely used k-means clustering algorithm is implemented in the protein essential dynamics (ED) subspace defined by principal component analysis of the MD trajectory. Cluster analysis of the FBP21 (formin binding protein 21) tandem WW domains demonstrates that k-means clustering based on distances between structures in the ED subspace gives results superior to those obtained with other metrics, such as pairwise inter-domain residue distances.
Keywords: molecular dynamics simulations; k-means clustering algorithm; multi-domain protein; principal component analysis
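The abstract describes the workflow only in outline: project the MD frames onto the leading principal components, then run k-means on the projections. The following is a minimal sketch of that idea, not the authors' implementation. It assumes the frames have already been superposed on a reference structure and flattened to rows of Cartesian coordinates. The toy two-state trajectory, the choice of two principal components, and the function names `pca_project`, `assign` and `kmeans` are all illustrative assumptions.

```python
import numpy as np


def pca_project(coords, n_components):
    """Project frames (n_frames, 3 * n_atoms) onto the top principal components."""
    centered = coords - coords.mean(axis=0)
    # Rows of vt are eigenvectors of the coordinate covariance matrix,
    # ordered by decreasing variance.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    projections = centered @ vt[:n_components].T
    variances = singular_values ** 2 / (coords.shape[0] - 1)
    return projections, variances[:n_components]


def assign(points, centers):
    """Return the index of the nearest center (Euclidean) for every point."""
    d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means; initial centers are k randomly chosen points."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign(points, centers)
        # Keep the old center if a cluster happens to become empty.
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return assign(points, centers), centers


# Toy stand-in for an aligned trajectory (assumption): 10 atoms, two states
# that differ by a rigid shift of the last 5 atoms, 200 frames per state.
rng = np.random.default_rng(42)
base = rng.normal(size=30)
shift = np.zeros(30)
shift[15:] = 2.0
state_a = base + 0.1 * rng.normal(size=(200, 30))
state_b = base + shift + 0.1 * rng.normal(size=(200, 30))
trajectory = np.vstack([state_a, state_b])

# Cluster in the subspace spanned by the first two principal components.
projections, variances = pca_project(trajectory, n_components=2)
labels, centers = kmeans(projections, k=2)

# The frame closest to each centroid serves as a representative conformation.
representatives = [
    int(((projections - center) ** 2).sum(axis=1).argmin()) for center in centers
]
print("cluster sizes:", np.bincount(labels))
print("representative frame indices:", representatives)
```

For a real trajectory, the number of retained components and the number of clusters would both need to be chosen from the data, for example from the fraction of variance captured by each component. The abstract does not say how the authors made these choices.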