中国科技论文在线

上传时间

2020年11月12日

Neurocomputing，2018，290（）：26-33

2018年05月17日

Recently, learning equivariant representations has attracted considerable research attention. Dieleman et al. introduce four operations which can be inserted into convolutional neural network to learn deep representations equivariant to rotation. However, feature maps should be copied and rotated four times in each layer in their approach, which causes much running time and memory overhead. In order to address this problem, we propose Deep Rotation Equivariant Network consisting of cycle layers, isotonic layers and decycle layers. Our proposed layers apply rotation transformation on filters rather than feature maps, achieving a speed up of more than 2 times with even less memory overhead. We evaluate DRENs on Rotated MNIST and CIFAR-10 datasets and demonstrate that it can improve the performance of state-of-the-art architectures.

Neural network， Rotation equivariance， Deep learning

0

36浏览
0点赞
0收藏
0分享
0下载
0

引用

上传时间

2020年11月12日

【期刊论文】Sparse Coding Guided Spatiotemporal Feature Learning for Abnormal Event Detection in Large Videos

IEEE Transactions on Multimedia，-0001，21（1）：246 - 255

-1年11月30日

摘要

Abnormal event detection in large videos is an important task in research and industrial applications, which has attracted considerable attention in recent years. Existing methods usually solve this problem by extracting local features and then learning an outlier detection model on training videos. However, most previous approaches merely employ hand-crafted visual features, which is a clear disadvantage due to their limited representation capacity. In this paper, we present a novel unsupervised deep feature learning algorithm for the abnormal event detection problem. To exploit the spatiotemporal information of the inputs, we utilize the deep three-dimensional convolutional network (C3D) to perform feature extraction. Then, the key problem is how to train the C3D network without any category labels. Here, we employ the sparse coding results of the hand-crafted features generated from the inputs to guide the unsupervised feature learning. Specifically, we define a multilevel similarity relationship between these inputs according to the statistical information of the shared atoms. In the following, we introduce the quadruplet concept to model the multilevel similarity structure, which could be used to construct a generalized triplet loss for training the C3D network. Furthermore, the C3D network could be utilized to generate the features for sparse coding again, and this pipeline could be iterated for several times. By jointly optimizing between the sparse coding and the unsupervised feature learning, we can obtain robust and rich feature representations. Based on the learned representations, the sparse reconstruction error is applied to predicting the anomaly score of each testing input. Experiments on several publicly available video surveillance datasets in comparison with a number of existing works demonstrate that the proposed algorithm performs favorably against the state-of-the-art methods.

无

0

35浏览
0点赞
0收藏
0分享
0下载
0

引用

上传时间

2020年11月12日

【期刊论文】Long-Form Video Question Answering via Dynamic Hierarchical Reinforced Networks

IEEE Transactions on Image Processing，2019，28（12）：5939 - 595

2019年06月17日

摘要

Open-ended long-form video question answering is a challenging task in visual information retrieval, which automatically generates a natural language answer from the referenced long-form video contents according to a given question. However, the existing works mainly focus on short-form video question answering, due to the lack of modeling semantic representations from long-form video contents. In this paper, we introduce a dynamic hierarchical reinforced network for open-ended long-form video question answering, which employs an encoder-decoder architecture with a dynamic hierarchical encoder and a reinforced decoder. Concretely, we first propose a frame-level dynamic long-short term memory (LSTM) network with binary segmentation gate to learn frame-level semantic representations according to the given question. We then develop a segment-level highway LSTM network with a question-aware highway gate for segment-level semantic modeling. Furthermore, we devise the reinforced decoder with a hierarchical attention mechanism to generate natural language answers. We construct a large-scale long-form video question answering dataset. The extensive experiments on the long-form dataset and another public short-form dataset show the effectiveness of our method.

无

0

34浏览
0点赞
0收藏
0分享
0下载
0

引用

上传时间

2020年11月12日

【期刊论文】The forgettable-watcher model for video question answering

Neurocomputing，2018，314（）：386-393

2018年11月07日

摘要

A number of visual question answering approaches have been proposed recently, aiming at understanding the visual scenes by answering the natural language questions. While the image question answering has drawn significant attention, video question answering is largely unexplored. Video-QA is different from Image-QA since the information and the events are scattered among multiple frames. In order to better utilize the temporal structure of the videos and the phrasal structures of the answers, we propose two mechanisms: the re-watching and the re-reading mechanisms and combine them into the forgettable-watcher model. Then we propose a TGIF-QA dataset for video question answering with the help of automatic question generation. Finally, we evaluate the models on our dataset. The experimental results show the effectiveness of our proposed models.

Video analysis， Video question answering， Attention model

0

33浏览
0点赞
0收藏
0分享
0下载
0

引用

上传时间

2020年11月12日

【期刊论文】Question retrieval for community-based question answering via heterogeneous social influential network

Neurocomputing，2018，285（）：117-124

2018年04月12日

摘要

Community-based question answering platforms have attracted substantial users to share knowledge and learn from each other. As the rapid enlargement of community-based question answering (CQA) platforms, quantities of overlapped questions emerge, which makes users confounded to select a proper reference. It is urgent for us to take effective automated algorithms to reuse historical questions with corresponding answers. In this paper, we focus on the problem with question retrieval, which aims to match historical questions that are relevant or semantically equivalent to resolve one’s query directly. The challenges in this task are the lexical gaps between questions for the word ambiguity and word mismatch problem. Furthermore, limited words in queried sentences cause sparsity of word features. To alleviate these challenges, we propose a novel framework named HSIN which encodes not only the question contents but also the asker’s social interactions to enhance the question embedding performance. More specifically, we apply random walk based learning method with recurrent neural network to match the similarities between asker’s question and historical questions proposed by other users. Extensive experiments on a large-scale dataset from a real world CQA site Quora show that employing the heterogeneous social network information outperforms the other state-of-the-art solutions in this task.

CQA， Question retrieval， Deep learning， Social network

0

28浏览
0点赞
0收藏
0分享
0下载
0

引用