蔡登

Journal article

Multi-Turn Video Question Answering via Hierarchical Attention Context Reinforced Networks

IEEE Transactions on Image Processing, 2019, 28(8): 3860 - 387 | February 27, 2019 | 10.1109/TIP.2019.2902106

URL:https://ieeexplore.ieee.org/document/8654010

Abstract

Multi-turn video question answering is a challenging task in visual information retrieval: the goal is to generate an accurate answer from the referenced video contents, given the visual conversation context and the current question. However, existing visual question answering methods mainly tackle single-turn video question answering and cannot be applied effectively to the multi-turn setting directly, because they do not sufficiently model the sequential conversation context. In this paper, we study multi-turn video question answering from the viewpoint of multi-stream hierarchical attention context reinforced network learning. We first propose a hierarchical attention context network for context-aware question understanding, which models the hierarchical sequential structure of the conversation context. We then develop a multi-stream spatio-temporal attention network that learns a joint representation of the dynamic video contents and the context-aware question embedding. Next, we devise a multi-step reasoning process to enhance the multi-stream hierarchical attention context network learning method. Finally, we predict the multiple-choice answer from the candidate answer set, and further develop a reinforced decoder network to generate open-ended natural language answers for multi-turn video question answering. We construct two large-scale multi-turn video question answering datasets, and extensive experiments show the effectiveness of our method.
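To make the idea of hierarchical attention over a conversation context more concrete, here is a minimal sketch in plain Python. It is not the architecture from the paper: it assumes untrained dot-product attention, toy two-dimensional embeddings, and hypothetical function names (`attend`, `hierarchical_context`). It only illustrates the two-level structure, in which the words inside each past turn are pooled first and the resulting turn vectors are then pooled into one context vector, both guided by the current question.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def attend(query, vectors):
    """Dot-product attention (an assumption, not the paper's scoring function).

    Returns the attention-weighted average of `vectors` and the weights.
    """
    weights = softmax([dot(query, v) for v in vectors])
    dim = len(vectors[0])
    pooled = [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]
    return pooled, weights


def hierarchical_context(question, turns):
    """Two-level attention over a conversation history.

    question: embedding of the current question (list of floats)
    turns:    past turns, each given as a list of word embeddings
    """
    # Level 1: pool the words of each turn into a single turn vector.
    turn_vectors = [attend(question, words)[0] for words in turns]
    # Level 2: pool the turn vectors into one context vector.
    context, turn_weights = attend(question, turn_vectors)
    return context, turn_weights


if __name__ == "__main__":
    question = [1.0, 0.0]
    turns = [
        [[1.0, 0.0], [0.9, 0.1]],  # turn whose words align with the question
        [[0.0, 1.0], [0.1, 0.9]],  # turn that is mostly unrelated
    ]
    context, turn_weights = hierarchical_context(question, turns)
    print("turn weights:", turn_weights)
    print("context vector:", context)
```

In this toy setup the first turn receives the larger turn-level weight because its words point in the same direction as the question embedding. A real system would replace the raw dot products with learned scoring functions and the toy vectors with encoder outputs.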
