Deng Cai (蔡登)
Ph.D., Professor, Doctoral Supervisor
College of Computer Science and Technology, Zhejiang University
His main research area is machine learning.
Profile
- Name: Deng Cai
- Current status: Active researcher
- Supervisory role: Doctoral supervisor
- Degree: Ph.D.
- Academic title: Doctoral supervisor
- Professional rank: Professor (senior)
- Discipline: Artificial intelligence
- Research interests: Machine learning
Deng Cai is a professor and doctoral supervisor at the College of Computer Science, Zhejiang University. He is a recipient of the National Science Fund for Excellent Young Scholars, a Young Top-Notch Talent of the national Ten Thousand Talents Program, and a chief scientist of a Young Scientists 973 Program project. His main research area is machine learning. In recent years, he has published more than 130 papers in leading international conferences and journals in artificial intelligence and computer vision; these have been cited more than 15,000 times by others, giving an H-index of 53. He currently serves on the editorial board of IEEE TKDE and served as a senior program committee member for AAAI 2017 and IJCAI 2017. He received the Best Paper Award at the 2012 AAAI conference.
[Journal Article] Social-Aware Movie Recommendation via Multimodal Network Learning
IEEE Transactions on Multimedia, 2017, 20(2): 430-440
August 15, 2017
With the rapid development of the Internet movie industry, social-aware movie recommendation systems (SMRs) have become a popular online web service that provides relevant movie recommendations to users. To this end, many existing movie recommendation approaches learn a user ranking model from user feedback with respect to the movie's content. Unfortunately, this approach suffers from the sparsity problem inherent in SMR data. In the present work, we address the sparsity problem by learning a multimodal network representation for ranking movie recommendations. We develop a heterogeneous SMR network for movie recommendation that exploits the textual description and movie-poster image of each movie, as well as user ratings and social relationships. With this multimodal data, we then present a heterogeneous information network learning framework, called SMR-multimodal network representation learning (MNRL), for movie recommendation. To learn a ranking metric from the heterogeneous information network, we also developed a multimodal neural network model. We evaluated this model on a large-scale dataset from a real-world SMR website, and we find that SMR-MNRL achieves better performance than other state-of-the-art solutions to the problem.
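The core of the approach, fusing each movie's modalities into one representation and learning a ranking metric from user feedback, can be sketched minimally. The weighted-sum fusion, toy vectors, and hinge-style pairwise loss below are illustrative assumptions, not the paper's actual SMR-MNRL network:

```python
import numpy as np

def fuse_modalities(text_vec, image_vec, w_text=0.5, w_image=0.5):
    """Combine per-modality movie embeddings into one vector (weighted sum)."""
    return w_text * np.asarray(text_vec) + w_image * np.asarray(image_vec)

def pairwise_ranking_loss(user_vec, pos_movie, neg_movie, margin=1.0):
    """Hinge loss encouraging a watched movie to score above an unwatched one."""
    pos_score = float(np.dot(user_vec, pos_movie))
    neg_score = float(np.dot(user_vec, neg_movie))
    return max(0.0, margin - (pos_score - neg_score))

user = np.array([1.0, 0.0])
liked = fuse_modalities([1.0, 0.0], [0.8, 0.2])   # movie the user rated highly
other = fuse_modalities([0.0, 1.0], [0.1, 0.9])   # movie the user never rated
loss = pairwise_ranking_loss(user, liked, other)  # small: ranking nearly correct
```

Minimizing such a loss over many (user, watched, unwatched) triples is the generic way a ranking metric is learned from implicit feedback.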
IEEE Transactions on Knowledge and Data Engineering, 2018, 30(11): 2145-215
March 15, 2018
The genome-wide association study (GWAS) is a popular approach to identifying disease-associated genetic factors for Alzheimer's disease (AD). However, it remains challenging because of the small number of samples, the very high feature dimensionality, and complex structures. To accurately identify genetic risk factors for AD, we propose a novel method based on an in-depth exploration of the hierarchical structure among the features and the commonality across related tasks. Specifically, we first extract and encode the tree hierarchy among features; then, we integrate the tree structures with multi-task feature learning (MTFL) to learn the shared features that are predictive of AD among related tasks simultaneously. Thus, we can unify the strength of both the prior structure information and MTFL to boost prediction performance. However, due to the highly complex regularizer that encodes the tree structure and the extremely high feature dimensionality, the learning process can be computationally prohibitive. To address this, we further develop a novel safe screening rule to quickly identify and remove irrelevant features before training. Experimental results demonstrate that the proposed approach significantly outperforms the state of the art in detecting genetic risk factors of AD, and the speedup gained by the proposed screening can be several orders of magnitude.
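The screening idea, discarding features before training ever starts, can be illustrated with a deliberately naive correlation screen. This is not the paper's safe rule (which exploits the tree regularizer and a dual bound); it only shows the "filter, then train" pattern, and all data below are synthetic:

```python
import numpy as np

def naive_feature_screen(X, y, lam):
    """Keep only features whose absolute correlation with the response reaches
    the regularization level lam; the rest are dropped before training."""
    scores = np.abs(X.T @ y)
    return np.where(scores >= lam)[0]

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
y = X[:, 0] + 0.1 * rng.standard_normal(100)   # only feature 0 is relevant
Xn = X / np.linalg.norm(X, axis=0)             # unit-norm columns
kept = naive_feature_screen(Xn, y / np.linalg.norm(y), lam=0.5)
```

With the irrelevant columns removed, the expensive structured learner runs on a much smaller design matrix, which is where the orders-of-magnitude speedup comes from.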
[Journal Article] Weakly-Supervised Deep Embedding for Product Review Sentiment Analysis
IEEE Transactions on Knowledge and Data Engineering, 2017, 30(1): 185-197
September 26, 2017
Product reviews are valuable to prospective buyers in helping them make decisions. To this end, different opinion mining techniques have been proposed, where judging a review sentence's orientation (e.g., positive or negative) is one of the key challenges. Recently, deep learning has emerged as an effective means for solving sentiment classification problems. A neural network intrinsically learns a useful representation automatically without human effort. However, the success of deep learning relies heavily on the availability of large-scale training data. We propose a novel deep learning framework for product review sentiment classification which employs prevalently available ratings as weak supervision signals. The framework consists of two steps: (1) learning a high-level representation (an embedding space) which captures the general sentiment distribution of sentences through rating information; and (2) adding a classification layer on top of the embedding layer and using labeled sentences for supervised fine-tuning. We explore two kinds of low-level network structure for modeling review sentences, namely convolutional feature extractors and long short-term memory. To evaluate the proposed framework, we construct a dataset containing 1.1M weakly labeled review sentences and 11,754 labeled review sentences from Amazon. Experimental results show the efficacy of the proposed framework and its superiority over baselines.
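The two-step framework can be sketched with a linear stand-in for the network: fit an embedding direction from plentiful weakly-labeled ratings, then fit only a decision threshold on a small labeled set. Everything here (Poisson bag-of-words, least-squares "embedding", threshold search) is a toy assumption, not the paper's CNN/LSTM model:

```python
import numpy as np

rng = np.random.default_rng(1)
V = 10  # toy vocabulary size

# Step 1 (weak supervision): fit an embedding direction w so a sentence's
# bag-of-words projection predicts its star rating -- ratings as noisy labels.
X_weak = rng.poisson(1.0, size=(500, V)).astype(float)
w_true = rng.standard_normal(V)
ratings = X_weak @ w_true + 0.5 * rng.standard_normal(500)
w = np.linalg.lstsq(X_weak, ratings, rcond=None)[0]

# Step 2 (supervised fine-tuning): learn only a scalar threshold on top of
# the frozen embedding, using a small hand-labeled set.
X_lab = rng.poisson(1.0, size=(40, V)).astype(float)
y_lab = (X_lab @ w_true > 0).astype(int)      # ground-truth polarity
scores = X_lab @ w
best_t = min(scores, key=lambda t: ((scores > t).astype(int) != y_lab).mean())
accuracy = ((scores > best_t).astype(int) == y_lab).mean()
```

The point of the sketch is the division of labor: the representation is learned from cheap weak signals, while the scarce labels only tune a thin classification layer.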
[Journal Article] A Better Way to Attend: Attention With Trees for Video Question Answering
IEEE Transactions on Image Processing, 2018, 27(11): 5563-557
July 25, 2018
We propose a new attention model for video question answering. The main idea of attention models is to focus on the most informative parts of the visual data, and attention mechanisms are quite popular these days. However, most existing visual attention mechanisms regard the question as a whole. They ignore word-level semantics, where each word can receive different attention and some words need no attention at all. Nor do they consider the semantic structure of the sentences. Although the extended soft attention model for video question answering leverages word-level attention, it performs poorly on long question sentences. In this paper, we propose the heterogeneous tree-structured memory network (HTreeMN) for video question answering. Our proposed approach is based upon the syntax parse trees of the question sentences. HTreeMN treats the words differently: visual words are processed with an attention module while verbal ones are not. It also utilizes the semantic structure of the sentences by combining neighbors based on the recursive structure of the parse trees. The understandings of the words and the videos are propagated and merged from the leaves to the root. Furthermore, we build a hierarchical attention mechanism to distill the attended features. We evaluate our approach on two datasets. The experimental results show the superiority of our HTreeMN model over other attention models, especially on complex questions.
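A minimal sketch of the tree-structured idea: merge word vectors bottom-up along the parse tree, giving video attention only to designated visual words. The word list, random vectors, and tanh merge are illustrative assumptions, not the paper's trained modules:

```python
import numpy as np

VISUAL_WORDS = {"dog"}      # hypothetical: words that should attend to the video

def video_attention(word_vec, frames):
    """Soft attention over frame features, weighted by frame-word similarity."""
    w = np.exp(frames @ word_vec)
    w /= w.sum()
    return frames.T @ w

def encode(tree, word_vecs, frames):
    """Merge word vectors bottom-up along a parse tree; only visual words get
    a video-attention read, verbal words pass through untouched."""
    if isinstance(tree, str):                       # leaf = a word
        v = word_vecs[tree]
        return v + video_attention(v, frames) if tree in VISUAL_WORDS else v
    _, left, right = tree                           # internal node
    return np.tanh(encode(left, word_vecs, frames) +
                   encode(right, word_vecs, frames))

rng = np.random.default_rng(2)
vecs = {w: rng.standard_normal(4) for w in ["the", "dog", "runs"]}
frames = rng.standard_normal((3, 4))                # 3 frames, dim-4 features
root = encode(("S", ("NP", "the", "dog"), "runs"), vecs, frames)
```

The root vector is the question-and-video understanding propagated up from the leaves, ready for an answer classifier.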
[Journal Article] Multi-Task Vehicle Detection With Region-of-Interest Voting
IEEE Transactions on Image Processing, 2017, 27(1): 432-441
October 12, 2017
Vehicle detection is a challenging problem in autonomous driving systems, due to its large structural and appearance variations. In this paper, we propose a novel vehicle detection scheme based on multi-task deep convolutional neural networks (CNNs) and region-of-interest (RoI) voting. In the design of the CNN architecture, we enrich the supervised information with the subcategory, region overlap, bounding-box regression target, and category of each training RoI in a multi-task learning framework. This design allows the CNN model to share visual knowledge among different vehicle attributes simultaneously, and thus detection robustness can be effectively improved. In addition, most existing methods consider each RoI independently, ignoring the clues from its neighboring RoIs. In our approach, we utilize the CNN model to predict the offset direction of each RoI boundary toward the corresponding ground truth. Each RoI can then vote for those suitable adjacent bounding boxes that are consistent with this additional information. The voting results are combined with the score of each RoI itself to find a more accurate location from a large number of candidates. Experimental results on the real-world computer vision benchmarks KITTI and the PASCAL 2007 vehicle dataset show that our approach achieves superior performance in vehicle detection compared with other existing published works.
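The voting step can be sketched in one dimension: each RoI adds its confidence to overlapping neighbours that lie in its predicted offset direction, and the best-supported candidate wins. The intervals, scores, and sign-only offsets below are toy assumptions:

```python
def iou_1d(a, b):
    """Overlap-over-union for two [start, end] intervals."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union else 0.0

def roi_vote(rois, scores, offsets, iou_thresh=0.5):
    """Each RoI adds its score to overlapping neighbours lying in its
    predicted offset direction; return the index of the best-supported RoI."""
    totals = list(scores)
    for i, (roi, off) in enumerate(zip(rois, offsets)):
        for j, other in enumerate(rois):
            if i == j or iou_1d(roi, other) < iou_thresh:
                continue
            if (other[0] - roi[0]) * off > 0:   # neighbour lies toward the truth
                totals[j] += scores[i]
    return totals.index(max(totals))

# Three candidate windows; the detector thinks the truth is right of the first
# and left of the third, so the middle window should win despite a lower score.
best = roi_vote([(0, 10), (2, 12), (8, 18)], [0.6, 0.5, 0.2], [+1, 0, -1])
```

Combining vote totals with each RoI's own score is what lets neighbouring candidates correct a single mislocalized box.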
[Journal Article] The forgettable-watcher model for video question answering
Neurocomputing, 2018, 314: 386-393
November 7, 2018
A number of visual question answering approaches have been proposed recently, aiming at understanding visual scenes by answering natural language questions. While image question answering has drawn significant attention, video question answering is largely unexplored. Video QA differs from image QA in that the information and the events are scattered among multiple frames. In order to better utilize the temporal structure of the videos and the phrasal structure of the answers, we propose two mechanisms, re-watching and re-reading, and combine them into the forgettable-watcher model. We then construct a TGIF-QA dataset for video question answering with the help of automatic question generation. Finally, we evaluate the models on our dataset. The experimental results show the effectiveness of our proposed models.
Keywords: video analysis, video question answering, attention model
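A minimal sketch of the two mechanisms, assuming simple dot-product attention and a plain additive query update (the paper's actual parameterization differs):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(query, memory):
    """One soft-attention read: weighted average of the memory rows."""
    return memory.T @ softmax(memory @ query)

def forgettable_watcher(question, frames, answer_words, passes=2):
    """Re-watching/re-reading: re-attend the frames, then the answer words,
    refining the query with each read so earlier evidence is not forgotten."""
    q = question
    for _ in range(passes):
        q = q + attend(q, frames)        # re-watch the video
    for _ in range(passes):
        q = q + attend(q, answer_words)  # re-read the answer phrase
    return q

rng = np.random.default_rng(3)
q = forgettable_watcher(rng.standard_normal(4),
                        rng.standard_normal((5, 4)),   # 5 frame features
                        rng.standard_normal((3, 4)))   # 3 answer-word vectors
```

Each pass conditions the next read on what was already gathered, which is how temporal (frame) and phrasal (answer-word) structure both enter the final representation.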
[Journal Article] Split-Net: Improving face recognition in one forwarding operation
Neurocomputing, 2018, 314: 94-100
November 7, 2018
The performance of face recognition has improved substantially in recent years owing to deep convolutional neural networks (CNNs). Because of the semantic structure of face images, local parts as well as global shape are informative for learning robust deep face feature representations. In order to simultaneously exploit global and local information, existing deep learning methods for face recognition tend to train multiple CNN models and combine different features based on various local image patches, which requires multiple forwarding operations for each testing image and introduces much more computation as well as running time. In this paper, we aim at improving face recognition in only one forwarding operation by simultaneously exploiting global and local information in one model. To address this problem, we propose a unified end-to-end framework, named Split-Net, which splits selected intermediate feature maps into several branches instead of cropping the original images. Experimental results demonstrate that our approach can effectively improve the accuracy of face recognition with only a small increase in computation. Specifically, we increase accuracy by one percent on LFW under the standard protocol and reduce the error by 50% under the BLUFR protocol. Finally, the performance of Split-Net matches the state of the art with a smaller training set and less computation.
Keywords: deep face representation, region-based models, feature fusion
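The splitting idea can be sketched with numpy: branch on halves of an intermediate feature map, not on crops of the input image, so global and local heads share one forward pass. The tiny ReLU heads and all shapes are illustrative assumptions:

```python
import numpy as np

def branch_head(x, w):
    """Tiny per-branch head: ReLU over a linear map of the flattened input."""
    return np.maximum(0.0, x.ravel() @ w)

def split_net(feature_map, w_full, w_top, w_bottom):
    """Split an intermediate feature map into top/bottom halves (rather than
    cropping the input image), run a head on each part plus the whole map,
    and concatenate: global and local features in one forward pass."""
    h = feature_map.shape[0] // 2
    parts = [(feature_map, w_full),
             (feature_map[:h], w_top),
             (feature_map[h:], w_bottom)]
    return np.concatenate([branch_head(p, w) for p, w in parts])

rng = np.random.default_rng(4)
fmap = rng.standard_normal((4, 4))                 # stand-in feature map
out = split_net(fmap, rng.standard_normal((16, 3)),
                rng.standard_normal((8, 3)), rng.standard_normal((8, 3)))
```

Because the shared trunk below the split runs once, the extra cost over a single-branch model is only the small heads, matching the "one forwarding operation" claim.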
[Journal Article] Multi-label active learning based on submodular functions
Neurocomputing, 2018, 313: 436-442
November 3, 2018
In the data collection task, annotating an instance is more expensive in the multi-label learning problem, since each instance is associated with multiple labels. It is therefore all the more important to adopt active learning in multi-label learning to reduce the labeling cost. Recent research indicates that submodular function optimization works well on subset selection problems and provides theoretical performance guarantees while remaining extremely fast to optimize. In this paper, we propose a query strategy that constructs a submodular function over the selected instance-label pairs, which can measure and combine informativeness and representativeness. The active learning problem can thus be formulated as submodular function maximization, which can be solved efficiently and effectively by a simple lazy greedy algorithm. Experimental results show that the proposed approach outperforms several state-of-the-art multi-label active learning methods.
Keywords: multi-label active learning, submodular function optimization
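The optimization side can be sketched with plain greedy maximization of a monotone submodular coverage function. The lazy-evaluation speedup and the paper's actual informativeness/representativeness objective are omitted, and the coverage sets are hypothetical:

```python
def greedy_max(ground, f, k):
    """Plain greedy maximization of a monotone submodular f: repeatedly add
    the element with the largest marginal gain.  For such f this enjoys the
    classic (1 - 1/e) approximation guarantee."""
    S = set()
    for _ in range(k):
        best = max((e for e in ground if e not in S),
                   key=lambda e: f(S | {e}) - f(S))
        S.add(best)
    return S

# Coverage objective as a stand-in for the query-selection criterion:
# each candidate instance-label pair "covers" some concepts.
covers = {"a": {1, 2}, "b": {2, 3}, "c": {4}, "d": {1, 2, 3}}
f = lambda S: len(set().union(*(covers[e] for e in S))) if S else 0
picked = greedy_max(covers, f, 2)
```

Greedy first picks "d" (gain 3), after which "a" and "b" add nothing new, so "c" is chosen; diminishing marginal gains are exactly what submodularity formalizes.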
[Journal Article] Deep Rotation Equivariant Network
Neurocomputing, 2018, 290: 26-33
May 17, 2018
Recently, learning equivariant representations has attracted considerable research attention. Dieleman et al. introduced four operations that can be inserted into a convolutional neural network to learn deep representations equivariant to rotation. However, in their approach feature maps must be copied and rotated four times in each layer, which incurs substantial running-time and memory overhead. To address this problem, we propose the Deep Rotation Equivariant Network (DREN), consisting of cycle layers, isotonic layers, and decycle layers. Our proposed layers apply the rotation transformation to filters rather than feature maps, achieving a speedup of more than 2x with even less memory overhead. We evaluate DREN on the Rotated MNIST and CIFAR-10 datasets and demonstrate that it can improve the performance of state-of-the-art architectures.
Keywords: neural networks, rotation equivariance, deep learning
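The key trick, rotating the small filter instead of copying and rotating the whole feature map, can be demonstrated directly, along with the equivariance property it relies on. This is a sketch of a single cycle-layer-style operation, not the paper's full architecture:

```python
import numpy as np

def conv2d(img, filt):
    """Valid 2-D cross-correlation."""
    h, w = filt.shape
    H, W = img.shape
    out = np.empty((H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + h, j:j + w] * filt)
    return out

def cycle_layer(img, filt):
    """Apply the four 90-degree rotations of ONE filter -- rotating the small
    filter is far cheaper than rotating four copies of the feature map."""
    return [conv2d(img, np.rot90(filt, k)) for k in range(4)]

rng = np.random.default_rng(5)
img, filt = rng.standard_normal((6, 6)), rng.standard_normal((3, 3))
maps = cycle_layer(img, filt)

# Equivariance check: rotating the input and the filter together
# rotates the response.
lhs = conv2d(np.rot90(img), np.rot90(filt))
rhs = np.rot90(conv2d(img, filt))
```

The check shows why filter rotation suffices: the four rotated-filter responses to one image carry the same information as one filter's responses to four rotated images.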
[Journal Article] Improving face recognition with domain adaptation
Neurocomputing, 2018, 287: 45-51
April 26, 2018
Nearly all recent face recognition algorithms have been evaluated on the Labeled Faces in the Wild (LFW) dataset, and many of them achieve over 99% accuracy. However, the performance is still not sufficient for real-world applications. One problem is data bias. The faces in LFW and other web-collected datasets come from celebrities. They are quite different from the faces of ordinary people captured in daily life; in other words, they differ in face distribution. Replacing the training data with data from the same distribution is a simple solution; however, photos of ordinary people are much harder to collect because of privacy concerns. It is therefore useful to develop a method that transfers the knowledge in data of a different face distribution to help improve the final performance. In this paper, we crawl a large face dataset whose distribution is different from LFW and show an improvement in LFW accuracy with a simple domain adaptation technique. To the best of our knowledge, it is the first time that domain adaptation has been applied to the unconstrained face recognition problem with a million-scale dataset. Besides, we incorporate the face verification threshold into the FaceNet triplet loss function explicitly. Finally, we achieve 99.33% on the LFW benchmark with only a single CNN model, and similar performance even without face alignment.
Keywords: face recognition, domain adaptation, face verification loss
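The "verification threshold inside the triplet loss" idea can be sketched as follows; the exact formulation in the paper may differ, and the threshold and margin values are illustrative:

```python
import numpy as np

def thresholded_triplet_loss(anchor, pos, neg, t=1.0, margin=0.2):
    """Triplet loss with an explicit verification threshold t: squared
    distances of positive pairs are pushed below t - margin and those of
    negative pairs above t + margin, so the decision threshold used at
    deployment is baked into training."""
    d_pos = float(np.sum((anchor - pos) ** 2))
    d_neg = float(np.sum((anchor - neg) ** 2))
    return max(0.0, d_pos - (t - margin)) + max(0.0, (t + margin) - d_neg)

a = np.array([0.0, 0.0])
well_separated = thresholded_triplet_loss(a, np.array([0.5, 0.0]),
                                          np.array([2.0, 0.0]))   # zero loss
too_close = thresholded_triplet_loss(a, np.array([0.5, 0.0]),
                                     np.array([1.0, 0.0]))        # penalized
```

Unlike the vanilla triplet loss, which only constrains the positive-negative gap, anchoring both terms to a fixed t makes the learned embedding directly compatible with a single global verification threshold.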