Achievement Details - Sciencepaper Online (中国科技论文在线)

蔡登


Journal article

Moment Retrieval via Cross-Modal Interaction Networks With Query Reconstruction


IEEE Transactions on Image Processing, 2020, 29: 3750 - 376 | January 17, 2020 | DOI: 10.1109/TIP.2020.2965987

URL:https://ieeexplore.ieee.org/document/8962274

Abstract / Description

Moment retrieval aims to localize the most relevant moment in an untrimmed video according to a given natural language query. Existing works often focus on only one aspect of this emerging task, such as query representation learning, video context modeling, or multi-modal fusion, and thus fail to develop a comprehensive system for further performance improvement. In this paper, we introduce a novel Cross-Modal Interaction Network (CMIN) that considers multiple crucial factors for this challenging task, including the syntactic dependencies of natural language queries, long-range semantic dependencies in the video context, and sufficient cross-modal interaction. Specifically, we devise a syntactic GCN to leverage the syntactic structure of queries for fine-grained representation learning and propose a multi-head self-attention to capture long-range semantic dependencies from the video context. Next, we employ a multi-stage cross-modal interaction to explore the potential relations between video and query contents, and we also consider query reconstruction from the cross-modal representations of the target moment as an auxiliary task to strengthen the cross-modal representations. Extensive experiments on ActivityNet Captions and TACoS demonstrate the effectiveness of our proposed method.
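
The abstract mentions multi-head self-attention as the way long-range dependencies in the video context are captured. The following is a minimal NumPy sketch of generic multi-head self-attention over a sequence of frame features, not the paper's implementation. The function name `multi_head_self_attention`, the projection matrices `w_q`, `w_k`, `w_v`, `w_o`, the feature sizes, and the random inputs are all assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_self_attention(frames, w_q, w_k, w_v, w_o, num_heads):
    """Generic multi-head self-attention (hypothetical helper).

    frames: (T, d) array of per-frame features.
    w_q, w_k, w_v, w_o: (d, d) projection matrices (assumed, randomly set below).
    Returns the attended features (T, d) and attention weights (H, T, T).
    """
    T, d = frames.shape
    assert d % num_heads == 0, "feature size must be divisible by num_heads"
    d_h = d // num_heads

    def split_heads(x):
        # (T, d) -> (H, T, d_h)
        return x.reshape(T, num_heads, d_h).transpose(1, 0, 2)

    q = split_heads(frames @ w_q)
    k = split_heads(frames @ w_k)
    v = split_heads(frames @ w_v)

    # Every frame attends to every other frame, regardless of distance in time.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_h)  # (H, T, T)
    attn = softmax(scores, axis=-1)
    context = attn @ v  # (H, T, d_h)

    # Merge heads back into a single feature vector per frame.
    merged = context.transpose(1, 0, 2).reshape(T, d)
    return merged @ w_o, attn


# Illustrative inputs only: 6 frames, 8-dimensional features, 2 heads.
rng = np.random.default_rng(0)
T, d, H = 6, 8, 2
frames = rng.normal(size=(T, d))
w_q, w_k, w_v, w_o = (rng.normal(size=(d, d)) * 0.1 for _ in range(4))
out, attn = multi_head_self_attention(frames, w_q, w_k, w_v, w_o, H)
print(out.shape, attn.shape)
```

Each row of `attn` sums to 1 and spans all frames, so a frame's output can draw on any other frame in the video, which is the property the abstract refers to as long-range dependency modeling.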

Keywords: none
