
25 results found for this scholar

Uploaded: November 12, 2020

[Journal Article] Fast approximate nearest neighbor search with the navigating spreading-out graph

Proceedings of the VLDB Endowment, 2019

Published: January 1, 2019

Abstract

Approximate nearest neighbor search (ANNS) is a fundamental problem in databases and data mining. A scalable ANNS algorithm should be both memory-efficient and fast. Some early graph-based approaches have shown attractive theoretical guarantees on search time complexity, but they all suffer from the problem of high indexing time complexity. Recently, some graph-based methods have been proposed to reduce indexing complexity by approximating the traditional graphs; these methods have achieved revolutionary performance on million-scale datasets. Yet, they still cannot scale to billion-node databases. In this paper, to further improve the search efficiency and scalability of graph-based methods, we start by introducing four aspects: (1) ensuring the connectivity of the graph; (2) lowering the average out-degree of the graph for fast traversal; (3) shortening the search path; and (4) reducing the index size. Then, we propose a novel graph structure called Monotonic Relative Neighborhood Graph (MRNG) which guarantees very low search complexity (close to logarithmic time). To further lower the indexing complexity and make it practical for billion-node ANNS problems, we propose a novel graph structure named Navigating Spreading-out Graph (NSG) by approximating the MRNG. The NSG takes the four aspects into account simultaneously. Extensive experiments show that NSG outperforms all the existing algorithms significantly. In addition, NSG shows superior performance in the e-commerce scenario of Taobao (Alibaba Group) and has been integrated into their billion-scale search engine.
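Graph-based ANNS methods of this family answer queries by greedy best-first traversal over the proximity graph. The sketch below is a generic, minimal version of that search routine, not the NSG implementation itself; the function name, the `ef` result-pool parameter, and the toy graph are illustrative assumptions.

```python
import heapq

def greedy_search(graph, dist, query, entry, ef=10):
    """Greedy best-first search over a proximity graph.

    graph: dict mapping node id -> list of neighbor ids
    dist:  function (node_id, query) -> distance
    ef:    size of the result pool (larger = more accurate, slower)
    Returns ids of the closest nodes found, best first.
    """
    visited = {entry}
    d0 = dist(entry, query)
    # Min-heap of candidates to expand, keyed by distance to the query.
    candidates = [(d0, entry)]
    # Max-heap (distances negated) holding the best `ef` results so far.
    results = [(-d0, entry)]
    while candidates:
        d, node = heapq.heappop(candidates)
        # Stop once the nearest unexpanded candidate is farther than
        # the worst current result: the search has converged.
        if len(results) >= ef and d > -results[0][0]:
            break
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            d_nb = dist(nb, query)
            if len(results) < ef or d_nb < -results[0][0]:
                heapq.heappush(candidates, (d_nb, nb))
                heapq.heappush(results, (-d_nb, nb))
                if len(results) > ef:
                    heapq.heappop(results)  # drop the worst result
    return [n for _, n in sorted((-d, n) for d, n in results)]
```

The four design aspects in the abstract all target this loop: connectivity guarantees the traversal can reach every node, a low out-degree shrinks the inner neighbor loop, and shorter search paths reduce the number of iterations before convergence.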


Uploaded: November 12, 2020

[Journal Article] Split-Net: Improving face recognition in one forwarding operation

Neurocomputing, 2018, 314: 94-100

Published: November 7, 2018

Abstract

The performance of face recognition has recently improved substantially owing to deep Convolutional Neural Networks (CNNs). Because of the semantic structure of face images, local parts as well as the global shape are informative for learning robust deep face feature representations. In order to simultaneously exploit global and local information, existing deep learning methods for face recognition tend to train multiple CNN models and combine different features based on various local image patches, which requires multiple forwarding operations for each testing image and introduces much more computation as well as running time. In this paper, we aim at improving face recognition in only one forwarding operation by simultaneously exploiting global and local information in one model. To address this problem, we propose a unified end-to-end framework, named Split-Net, which splits selected intermediate feature maps into several branches instead of cropping the original images. Experimental results demonstrate that our approach can effectively improve the accuracy of face recognition with little added computation. Specifically, we increase the accuracy by one percent on LFW under the standard protocol and reduce the error by 50% under the BLUFR protocol. The performance of Split-Net matches the state of the art with a smaller training set and less computation.
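The core idea of splitting an intermediate feature map into spatial branches (rather than cropping the input image) can be sketched as a plain tensor operation. This is an illustrative sketch only, not the Split-Net architecture; the function name and the grid layout are assumptions, and a single channel is shown for simplicity.

```python
def split_feature_map(fmap, rows=2, cols=2):
    """Split an H x W feature map into a rows x cols grid of patches,
    one per downstream branch.

    fmap: list of H rows, each a list of W activations (one channel).
    Returns a list of rows*cols sub-maps in row-major order.
    """
    h, w = len(fmap), len(fmap[0])
    rh, cw = h // rows, w // cols  # patch height and width
    branches = []
    for r in range(rows):
        for c in range(cols):
            patch = [row[c * cw:(c + 1) * cw]
                     for row in fmap[r * rh:(r + 1) * rh]]
            branches.append(patch)
    return branches
```

Because the split happens after shared lower layers, all branches reuse one forward pass through those layers, which is what keeps the whole model to a single forwarding operation per test image.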

Deep face representation, Region based models, Feature fusion


Uploaded: November 12, 2020

[Journal Article] The forgettable-watcher model for video question answering

Neurocomputing, 2018, 314: 386-393

Published: November 7, 2018

Abstract

A number of visual question answering approaches have been proposed recently, aiming at understanding visual scenes by answering natural language questions. While image question answering has drawn significant attention, video question answering is largely unexplored. Video-QA differs from Image-QA in that the information and the events are scattered among multiple frames. In order to better utilize the temporal structure of the videos and the phrasal structure of the answers, we propose two mechanisms, re-watching and re-reading, and combine them into the forgettable-watcher model. We then propose a TGIF-QA dataset for video question answering with the help of automatic question generation. Finally, we evaluate the models on our dataset. The experimental results show the effectiveness of our proposed models.

Video analysis, Video question answering, Attention model


Uploaded: November 12, 2020

[Journal Article] Multi-label active learning based on submodular functions

Neurocomputing, 2018, 313: 436-442

Published: November 3, 2018

Abstract

In the data collection task, it is more expensive to annotate instances in multi-label learning problems, since each instance is associated with multiple labels. It is therefore all the more important to adopt active learning methods in multi-label learning to reduce the labeling cost. Recent research indicates that submodular function optimization works well on subset selection problems and provides theoretical performance guarantees while remaining extremely fast to optimize. In this paper, we propose a query strategy that constructs a submodular function over the selected instance-label pairs, which can measure and combine informativeness and representativeness. The active learning problem can thus be formulated as a submodular function maximization problem, which can be solved efficiently and effectively by a simple lazy greedy algorithm. Experimental results show that the proposed approach outperforms several state-of-the-art multi-label active learning methods.
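The lazy greedy algorithm the abstract refers to is a standard accelerated variant of greedy submodular maximization: because marginal gains only shrink as the selected set grows, stale gain bounds can be kept in a priority queue and refreshed only when an item reaches the top. The sketch below is a generic version under that assumption, not the paper's query strategy; the function and variable names are illustrative.

```python
import heapq

def lazy_greedy(candidates, gain, k):
    """Lazy greedy maximization of a monotone submodular function.

    candidates: iterable of items (e.g. instance-label pairs)
    gain: function (item, selected_set) -> marginal gain of adding item
    k: budget (number of items to select)
    """
    selected = []
    sel_set = set()
    # Max-heap of (negated upper bound on gain, item); bounds start at
    # the singleton gains and are refreshed lazily.
    heap = [(-gain(c, sel_set), c) for c in candidates]
    heapq.heapify(heap)
    while heap and len(selected) < k:
        _, item = heapq.heappop(heap)
        current = gain(item, sel_set)  # refresh the stale bound
        # Submodularity: gains only shrink, so if the refreshed gain
        # still beats the next best bound, this item is the true argmax.
        if not heap or current >= -heap[0][0]:
            selected.append(item)
            sel_set.add(item)
        else:
            heapq.heappush(heap, (-current, item))
    return selected
```

In practice most iterations select an item after refreshing only one or two bounds, which is why the lazy variant retains the greedy algorithm's (1 - 1/e) approximation guarantee while being far faster than recomputing every marginal gain per round.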

Multi-label active learning, Submodular function optimization


Uploaded: November 12, 2020

[Journal Article] A Better Way to Attend: Attention With Trees for Video Question Answering

IEEE Transactions on Image Processing, 2018, 27(11): 5563-557

Published: July 25, 2018

Abstract

We propose a new attention model for video question answering. The main idea of attention models is to focus on the most informative parts of the visual data. Attention mechanisms are quite popular these days. However, most existing visual attention mechanisms regard the question as a whole. They ignore the word-level semantics, where each word can have a different attention and some words need no attention at all. Neither do they consider the semantic structure of the sentences. Although the extended soft attention model for video question answering leverages word-level attention, it performs poorly on long question sentences. In this paper, we propose the heterogeneous tree-structured memory network (HTreeMN) for video question answering. Our proposed approach is based upon the syntax parse trees of the question sentences. The HTreeMN treats the words differently: the visual words are processed with an attention module while the verbal ones are not. It also utilizes the semantic structure of the sentences by combining neighbors based on the recursive structure of the parse trees. The understandings of the words and the videos are propagated and merged from leaves to the root. Furthermore, we build a hierarchical attention mechanism to distill the attended features. We evaluate our approach on two datasets. The experimental results show the superiority of our HTreeMN model over the other attention models, especially on complex questions.


Collaborators

  • No co-authors yet