Video Emotion Recognition Using Subtitles Semantics, Audio and Visual Features
First published: 2017-04-28
Abstract: Recognizing the emotions embedded in videos offers another way to classify media and to supply users with the videos they actually want, so effective techniques for video emotion recognition are in high demand. This paper proposes a novel framework for video emotion recognition that integrates textual features extracted from video subtitles with the audio and visual features embedded in the video content. Firstly, high-level dialogic semantic features are extracted from the video subtitles using Natural Language Processing (NLP) techniques. These semantic features capture emotion information by analyzing the concepts in video dialogs rather than through a simple analysis of words. Extracting high-level features from a large number of videos is also more practical than collecting the physiological signals from participants that implicit tagging requires. Secondly, a multimodal Deep Boltzmann Machine (DBM) is adopted to learn a joint representation from the audio, visual, and textual semantic features. Since dialogs or subtitles may be absent from some videos, the model is able to infer the joint representation without the textual semantics. Finally, the joint representations are fed into a Support Vector Machine (SVM) for video emotion classification and regression. Experimental results on an open database demonstrate the effectiveness of our framework.
Keywords: affective computing; video emotion recognition; dialogic semantics; multimodal DBM
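The final stage of the framework described above feeds the learned joint representations into an SVM for both emotion classification and regression. A minimal sketch of that stage, assuming the joint representations have already been produced by the multimodal DBM (random placeholder vectors, an assumed 64-dimensional feature size, and illustrative emotion labels and valence scores stand in for the real data here):

```python
# Sketch of SVM-based emotion classification and regression on
# precomputed joint representations. All shapes, class counts, and
# label ranges below are illustrative assumptions, not the paper's setup.
import numpy as np
from sklearn.svm import SVC, SVR

rng = np.random.default_rng(0)

# 200 videos, each with a 64-dim joint representation (placeholder for DBM output)
X = rng.normal(size=(200, 64))
y_class = rng.integers(0, 4, size=200)        # e.g. 4 discrete emotion classes
y_valence = rng.uniform(-1.0, 1.0, size=200)  # e.g. a continuous valence score

clf = SVC(kernel="rbf").fit(X, y_class)       # emotion classification
reg = SVR(kernel="rbf").fit(X, y_valence)     # emotion regression

pred_labels = clf.predict(X)
pred_valence = reg.predict(X)
print(pred_labels.shape, pred_valence.shape)
```

Using the same joint representation for both the discrete-label and continuous-score tasks is what lets a single fusion model serve classification and regression, as the abstract describes.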