您当前所在位置: 首页 > 学者

蔡登

  • 11浏览

  • 0点赞

  • 0收藏

  • 0分享

  • 0下载

  • 0评论

  • 引用

期刊论文

A Better Way to Attend: Attention With Trees for Video Question Answering

暂无

IEEE Transactions on Image Processing,2018,27(11):5563 - 557 | 2018年07月25日 | 10.1109/TIP.2018.2859820

URL:https://ieeexplore.ieee.org/document/8419716

摘要/描述

We propose a new attention model for video question answering. The main idea of the attention models is to locate on the most informative parts of the visual data. The attention mechanisms are quite popular these days. However, most existing visual attention mechanisms regard the question as a whole. They ignore the word-level semantics where each word can have different attentions and some words need no attention. Neither do they consider the semantic structure of the sentences. Although the extended soft attention model for video question answering leverages the word-level attention, it performs poorly on long question sentences. In this paper, we propose the heterogeneous tree-structured memory network (HTreeMN) for video question answering. Our proposed approach is based upon the syntax parse trees of the question sentences. The HTreeMN treats the words differently where the visual words are processed with an attention module and the verbal ones not. It also utilizes the semantic structure of the sentences by combining the neighbors based on the recursive structure of the parse trees. The understandings of the words and the videos are propagated and merged from leaves to the root. Furthermore, we build a hierarchical attention mechanism to distill the attended features. We evaluate our approach on two data sets. The experimental results show the superiority of our HTreeMN model over the other attention models, especially on complex questions.

关键词:

学者未上传该成果的PDF文件,请等待学者更新

我要评论

全部评论 0

本学者其他成果

    同领域成果