Identifying Genetic Risk Factors for Alzheimer's Disease via Shared Tree-Guided Feature Learning Across Multiple Tasks
IEEE Transactions on Knowledge and Data Engineering, 2018, 30(11): 2145 - 215 | March 15, 2018 | 10.1109/TKDE.2018.2816029
The genome-wide association study (GWAS) is a popular approach for identifying disease-associated genetic factors for Alzheimer's Disease (AD). However, the task remains challenging because of the small number of samples, the very high feature dimensionality, and the complex structure of the data. To accurately identify genetic risk factors for AD, we propose a novel method based on an in-depth exploration of the hierarchical structure among the features and the commonality across related tasks. Specifically, we first extract and encode the tree hierarchy among features; then, we integrate the tree structures with multi-task feature learning (MTFL) to simultaneously learn the features that are shared among related tasks and predictive of AD. Thus, we can combine the strengths of the prior structure information and MTFL to boost prediction performance. However, due to the highly complex regularizer that encodes the tree structure and the extremely high feature dimensionality, the learning process can be computationally prohibitive. To address this, we further develop a novel safe screening rule to quickly identify and remove irrelevant features before training. Experimental results demonstrate that the proposed approach significantly outperforms the state of the art in detecting genetic risk factors for AD, and that the speedup gained from the proposed screening can be several orders of magnitude.
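To make the idea of a tree-structured regularizer shared across tasks more concrete, the following minimal sketch computes one plausible form of such a penalty: a weighted sum, over the nodes of a feature tree, of the Frobenius norm of the coefficient rows belonging to that node, taken jointly over all tasks. This is an illustration under stated assumptions, not the paper's exact formulation; the function name `tree_guided_penalty`, the encoding of the tree as a list of index groups, and the uniform node weights are all hypothetical.

```python
import numpy as np


def tree_guided_penalty(W, groups, weights):
    """Weighted sum of per-node Frobenius norms (hypothetical penalty form).

    W       : (n_features, n_tasks) coefficient matrix, one column per task.
    groups  : list of feature-index lists, one per tree node (assumed encoding).
    weights : non-negative weight for each node (assumed, here user-supplied).
    """
    # np.linalg.norm on a 2-D array defaults to the Frobenius norm, so each
    # term couples all tasks for the features under one tree node.
    return sum(w * np.linalg.norm(W[idx, :]) for idx, w in zip(groups, weights))


# Toy example: 4 features, 2 tasks, and a two-level binary tree.
W = np.array([[3.0, 4.0],
              [0.0, 0.0],
              [0.0, 0.0],
              [0.0, 0.0]])
groups = [[0, 1, 2, 3],      # root node: all features
          [0, 1], [2, 3],    # internal nodes
          [0], [1], [2], [3]]  # leaves: individual features
weights = [1.0] * len(groups)

penalty = tree_guided_penalty(W, groups, weights)
```

Because every node's term is zero only when all of its features are zero in every task, a penalty of this shape tends to remove whole subtrees of features jointly across tasks, which is the kind of sparsity that a screening step can exploit by discarding such features before training.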