
25 results found for this scholar

Upload date

November 12, 2020

[Journal Article] The forgettable-watcher model for video question answering

Neurocomputing, 2018, 314: 386-393

November 7, 2018

Abstract

A number of visual question answering approaches have been proposed recently, aiming to understand visual scenes by answering natural language questions. While image question answering has drawn significant attention, video question answering remains largely unexplored. Video-QA differs from Image-QA in that the information and events are scattered across multiple frames. To better utilize the temporal structure of the videos and the phrasal structure of the answers, we propose two mechanisms, re-watching and re-reading, and combine them into the forgettable-watcher model. We then build a TGIF-QA dataset for video question answering with the help of automatic question generation. Finally, we evaluate the models on our dataset, and the experimental results show the effectiveness of the proposed models.

Video analysis, Video question answering, Attention model
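
The re-watching mechanism described in the abstract above, attending over the frame sequence again with an updated question context, can be illustrated with a small temporal-attention example. This is only a hedged sketch under my own assumptions, not the authors' forgettable-watcher model: the attend helper, the feature sizes, and the way the first summary is fed back into the second pass are all invented for exposition.

    import torch
    import torch.nn.functional as F

    def attend(frames, query, proj):
        # frames: (T, D) per-frame features; query: (D,) question/context vector
        scores = frames @ proj(query)            # (T,) relevance score for each frame
        weights = F.softmax(scores, dim=0)       # temporal attention weights
        return weights @ frames                  # (D,) attention-weighted frame summary

    T, D = 20, 128
    frames = torch.randn(T, D)                   # stand-in per-frame CNN features
    question = torch.randn(D)                    # stand-in encoded question vector
    proj = torch.nn.Linear(D, D, bias=False)     # learned projection, shared across passes

    first_watch = attend(frames, question, proj)               # first question-guided pass
    re_watched = attend(frames, question + first_watch, proj)  # "re-watch" with updated context
    print(re_watched.shape)                                    # torch.Size([128])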

Upload date

November 12, 2020

[Journal Article] Split-Net: Improving face recognition in one forwarding operation

Neurocomputing, 2018, 314: 94-100

November 7, 2018

Abstract

The performance of face recognition has recently improved considerably owing to deep Convolutional Neural Networks (CNNs). Because of the semantic structure of face images, local parts as well as the global shape are informative for learning robust deep face feature representations. To exploit global and local information simultaneously, existing deep learning methods for face recognition tend to train multiple CNN models and combine features extracted from various local image patches, which requires multiple forwarding operations for each test image and introduces considerably more computation and running time. In this paper, we aim to improve face recognition in only one forwarding operation by exploiting global and local information simultaneously in a single model. To this end, we propose a unified end-to-end framework, named Split-Net, which splits selected intermediate feature maps into several branches instead of cropping the original images. Experimental results demonstrate that our approach effectively improves the accuracy of face recognition with only a small increase in computation. Specifically, we increase the accuracy by one percent on LFW under the standard protocol and reduce the error by 50% under the BLUFR protocol. The performance of Split-Net matches the state of the art with a smaller training set and less computation.

Deep face representation, Region based models, Feature fusion
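
The core Split-Net idea stated above, splitting an intermediate feature map into branches instead of cropping the input image, can be sketched as a toy module. This is a minimal illustration under assumed layer sizes and an assumed spatial split, not the paper's architecture; all module and variable names are hypothetical.

    import torch
    import torch.nn as nn

    class TinySplitNet(nn.Module):
        def __init__(self):
            super().__init__()
            self.stem = nn.Conv2d(3, 16, 3, padding=1)         # shared trunk
            self.branch_top = nn.Conv2d(16, 16, 3, padding=1)  # local branch (upper half of the map)
            self.branch_bot = nn.Conv2d(16, 16, 3, padding=1)  # local branch (lower half of the map)
            self.branch_all = nn.Conv2d(16, 16, 3, padding=1)  # global branch (whole map)
            self.pool = nn.AdaptiveAvgPool2d(1)

        def forward(self, x):
            f = torch.relu(self.stem(x))
            h = f.shape[2] // 2
            top, bot = f[:, :, :h], f[:, :, h:]                # split the feature map, not the image
            parts = [self.branch_top(top), self.branch_bot(bot), self.branch_all(f)]
            feats = [self.pool(p).flatten(1) for p in parts]   # pool each branch to a vector
            return torch.cat(feats, dim=1)                     # fused global + local feature

    x = torch.randn(2, 3, 112, 112)                            # a fake face batch
    print(TinySplitNet()(x).shape)                             # torch.Size([2, 48])

Because every branch operates on slices of one shared feature map, the fused feature is produced in a single forwarding pass, which is the point the abstract emphasizes.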

Upload date

November 12, 2020

[Journal Article] Multi-label active learning based on submodular functions

Neurocomputing, 2018, 313: 436-442

November 3, 2018

Abstract

In the data collection task, annotating instances is more expensive in the multi-label learning setting, since each instance is associated with multiple labels. It is therefore especially important to adopt active learning in multi-label learning to reduce the labeling cost. Recent research indicates that submodular function optimization works well on subset selection problems and provides theoretical performance guarantees while remaining extremely fast to optimize. In this paper, we propose a query strategy that constructs a submodular function over the selected instance-label pairs, which measures and combines informativeness and representativeness. The active learning problem can thus be formulated as submodular function maximization, which can be solved efficiently and effectively by a simple lazy greedy algorithm. Experimental results show that the proposed approach outperforms several state-of-the-art multi-label active learning methods.

Multi-label active learning, Submodular function optimization
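
The subset-selection step described here, maximizing a monotone submodular objective with a lazy greedy procedure, can be sketched on a stand-in objective. The facility-location coverage function and the random similarity matrix below are my own illustrative substitutes, not the paper's informativeness-plus-representativeness objective; only the lazy greedy mechanics follow the standard algorithm.

    import heapq
    import numpy as np

    def coverage(selected, sim):
        # f(S) = sum over all items of their best similarity to the selected set
        if not selected:
            return 0.0
        return sim[:, list(selected)].max(axis=1).sum()

    def lazy_greedy(sim, budget):
        n = sim.shape[0]
        selected, current = [], 0.0
        # heap of (negative stale upper bound on marginal gain, item)
        heap = [(-coverage([i], sim), i) for i in range(n)]
        heapq.heapify(heap)
        while len(selected) < budget and heap:
            _, i = heapq.heappop(heap)
            gain = coverage(selected + [i], sim) - current   # re-evaluate the true marginal gain
            if not heap or gain >= -heap[0][0]:              # lazy check: still beats the next bound?
                selected.append(i)
                current += gain
            else:
                heapq.heappush(heap, (-gain, i))             # bound was stale: push back and retry
        return selected

    rng = np.random.default_rng(0)
    x = rng.normal(size=(50, 8))                             # 50 candidate instance-label pairs
    sim = np.exp(-((x[:, None] - x[None]) ** 2).sum(-1))     # RBF similarity between candidates
    print(lazy_greedy(sim, budget=5))                        # indices of the 5 selected queries

The lazy evaluation is what keeps greedy selection fast: submodularity guarantees that cached marginal gains can only shrink, so most candidates never need to be re-evaluated.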

Upload date

November 12, 2020

[Journal Article] Deep Rotation Equivariant Network

Neurocomputing, 2018, 290: 26-33

May 17, 2018

Abstract

Recently, learning equivariant representations has attracted considerable research attention. Dieleman et al. introduced four operations that can be inserted into a convolutional neural network to learn deep representations equivariant to rotation. However, in their approach the feature maps must be copied and rotated four times in each layer, which incurs substantial running time and memory overhead. To address this problem, we propose the Deep Rotation Equivariant Network (DREN), consisting of cycle layers, isotonic layers and decycle layers. Our proposed layers apply the rotation transformation to filters rather than feature maps, achieving a speed-up of more than 2 times with even less memory overhead. We evaluate DREN on the Rotated MNIST and CIFAR-10 datasets and demonstrate that it can improve the performance of state-of-the-art architectures.

Neural network, Rotation equivariance, Deep learning
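
The key trick mentioned above, rotating the filters instead of copying and rotating the feature maps, can be shown in a few lines. This is a bare illustration of the idea, not the published cycle/isotonic/decycle layers; the shapes and the way rotated responses are stacked are assumptions made for the example.

    import torch
    import torch.nn.functional as F

    def rotated_filter_conv(x, weight):
        # x: (N, C, H, W) input; weight: (K, C, 3, 3) filters.
        # Convolve with the 0/90/180/270-degree rotations of the same filters.
        outs = [F.conv2d(x, torch.rot90(weight, r, dims=(2, 3)), padding=1)
                for r in range(4)]
        return torch.stack(outs, dim=1)      # (N, 4, K, H, W): one slice per rotation

    x = torch.randn(2, 3, 32, 32)
    w = torch.randn(8, 3, 3, 3)
    print(rotated_filter_conv(x, w).shape)   # torch.Size([2, 4, 8, 32, 32])

Rotating the small filter tensors is cheap compared with rotating and storing four copies of every feature map, which is where the reported speed and memory savings come from.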

Upload date

November 12, 2020

[Journal Article] Improving face recognition with domain adaptation

Neurocomputing, 2018, 287: 45-51

April 26, 2018

Abstract

Nearly all recent face recognition algorithms have been evaluated on the Labeled Faces in the Wild (LFW) dataset, and many of them achieve over 99% accuracy. However, the performance is still not sufficient for real-world applications. One problem is data bias: the faces in LFW and other web-collected datasets come from celebrities and are quite different from the faces of ordinary people captured in daily life; in other words, the face distributions differ. Replacing the training data with data from the target distribution is a simple solution, but photos of ordinary people are much harder to collect because of privacy concerns. It is therefore useful to develop a method that transfers knowledge from data with a different face distribution to help improve the final performance. In this paper, we crawl a large face dataset whose distribution differs from LFW and show an improvement in LFW accuracy with a simple domain adaptation technique. To the best of our knowledge, this is the first time that domain adaptation has been applied to the unconstrained face recognition problem with a million-scale dataset. In addition, we explicitly incorporate the face verification threshold into the FaceNet triplet loss function. Finally, we achieve 99.33% on the LFW benchmark with only a single CNN model and similar performance even without face alignment.

Face recognition, Domain adaptation, Face verification loss
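
The abstract states that the face verification threshold is made explicit in the triplet loss but does not spell out the formulation here, so the following is only one plausible, hedged reading: positives are pushed below an assumed threshold t and negatives above it, each with a margin. The exact form, the threshold value and the margin are my assumptions, not the paper's loss.

    import torch
    import torch.nn.functional as F

    def thresholded_triplet_loss(anchor, positive, negative, t=1.0, margin=0.2):
        d_ap = (anchor - positive).pow(2).sum(dim=1)   # squared distance to the positive
        d_an = (anchor - negative).pow(2).sum(dim=1)   # squared distance to the negative
        pos_term = F.relu(d_ap - (t - margin))         # positive pairs should fall below t
        neg_term = F.relu((t + margin) - d_an)         # negative pairs should exceed t
        return (pos_term + neg_term).mean()

    emb = lambda n: F.normalize(torch.randn(n, 128), dim=1)   # fake L2-normalized embeddings
    print(thresholded_triplet_loss(emb(4), emb(4), emb(4)).item())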

Collaborating Scholars

  • No collaborating authors yet