
25 results found for this scholar

Upload date: November 12, 2020

[Journal Article] Social-Aware Movie Recommendation via Multimodal Network Learning

IEEE Transactions on Multimedia, 2017, 20(2): 430-440

Published: August 15, 2017

Abstract

With the rapid development of the Internet movie industry, social-aware movie recommendation systems (SMRs) have become popular online web services that provide relevant movie recommendations to users. To this end, many existing movie recommendation approaches learn a user ranking model from user feedback with respect to the movie's content. Unfortunately, this approach suffers from the sparsity problem inherent in SMR data. In the present work, we address the sparsity problem by learning a multimodal network representation for ranking movie recommendations. We develop a heterogeneous SMR network for movie recommendation that exploits the textual description and movie-poster image of each movie, as well as user ratings and social relationships. With this multimodal data, we then present a heterogeneous information network learning framework, called SMR-multimodal network representation learning (MNRL), for movie recommendation. To learn a ranking metric from the heterogeneous information network, we also develop a multimodal neural network model. We evaluate this model on a large-scale dataset from a real-world SMR website and find that SMR-MNRL achieves better performance than other state-of-the-art solutions to the problem.
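
As a rough illustration of the kind of model the abstract describes (and not the authors' actual SMR-MNRL implementation), the following PyTorch sketch fuses pre-extracted text and poster-image features into a movie embedding and trains a user-movie ranking metric with a pairwise (BPR-style) loss. All class names, feature dimensions, and the toy data are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultimodalRanker(nn.Module):
    """Toy stand-in for a multimodal ranking model: fuses text and
    image features into one movie embedding and scores it against a
    learned user embedding. Dimensions are illustrative."""

    def __init__(self, n_users, text_dim=300, img_dim=512, dim=128):
        super().__init__()
        self.user_emb = nn.Embedding(n_users, dim)
        self.text_proj = nn.Linear(text_dim, dim)
        self.img_proj = nn.Linear(img_dim, dim)
        self.fuse = nn.Linear(2 * dim, dim)

    def movie_emb(self, text_feat, img_feat):
        t = F.relu(self.text_proj(text_feat))
        v = F.relu(self.img_proj(img_feat))
        return self.fuse(torch.cat([t, v], dim=-1))

    def score(self, users, text_feat, img_feat):
        # Dot product between user embedding and fused movie embedding
        return (self.user_emb(users) * self.movie_emb(text_feat, img_feat)).sum(-1)

# Pairwise (BPR-style) ranking loss on a random toy batch:
# positive movies should score higher than negative ones.
model = MultimodalRanker(n_users=1000)
users = torch.randint(0, 1000, (32,))
pos_t, pos_v = torch.randn(32, 300), torch.randn(32, 512)
neg_t, neg_v = torch.randn(32, 300), torch.randn(32, 512)
loss = -F.logsigmoid(model.score(users, pos_t, pos_v)
                     - model.score(users, neg_t, neg_v)).mean()
loss.backward()
```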

Upload date: November 12, 2020

[Journal Article] Neural Machine Translation With Noisy Lexical Constraints

IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2020, 28: 1864-187

Published: June 4, 2020

Abstract

In neural machine translation, lexically constrained decoding generates translation outputs that strictly include constraints predefined by users; if the constraints are perfect, this improves translation quality at the cost of additional decoding overhead. Unfortunately, in real-world situations those constraints may contain mistakes, and incorrect constraints undermine lexically constrained decoding. In this article, we propose a novel framework that improves translation quality even when the constraints are noisy. The key to our framework is to treat the lexical constraints as external memories. More concretely, it encodes the constraints with a memory encoder and then leverages the memories with a memory integrator. Experiments demonstrate that our framework not only delivers substantial BLEU gains in handling noisy constraints but also achieves a speedup in decoding. These results motivate us to apply our models to a new scenario where the constraints are generated without the help of users. Experiments show that our models can indeed improve translation quality with the automatically generated constraints.
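
The core idea, treating constraints as external memories that the decoder can consult or ignore, can be sketched as a single memory-attention step. The following is a minimal, hypothetical PyTorch sketch (not the paper's actual memory encoder and integrator); all names and dimensions are assumed:

```python
import torch
import torch.nn as nn

class ConstraintMemory(nn.Module):
    """Encodes constraint token embeddings into memory slots and lets a
    decoder state attend over them; low attention weights effectively
    ignore noisy constraints. Purely illustrative."""

    def __init__(self, dim=256):
        super().__init__()
        self.encoder = nn.GRU(dim, dim, batch_first=True)
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, decoder_state, constraint_embs):
        # constraint_embs: (batch, n_constraint_tokens, dim)
        memory, _ = self.encoder(constraint_embs)
        # decoder_state: (batch, 1, dim) attends over the memory slots
        read, _ = self.attn(decoder_state, memory, memory)
        # Gated integration of the memory read into the decoder state
        g = torch.sigmoid(self.gate(torch.cat([decoder_state, read], dim=-1)))
        return g * read + (1 - g) * decoder_state

mem = ConstraintMemory()
state = torch.randn(8, 1, 256)          # one decoder step
constraints = torch.randn(8, 5, 256)    # embedded constraint tokens
print(mem(state, constraints).shape)    # torch.Size([8, 1, 256])
```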

Upload date: November 12, 2020

[Journal Article] Multi-Task Vehicle Detection With Region-of-Interest Voting

IEEE Transactions on Image Processing, 2017, 27(1): 432-441

Published: October 12, 2017

Abstract

Vehicle detection is a challenging problem in autonomous driving systems due to the large structural and appearance variations of vehicles. In this paper, we propose a novel vehicle detection scheme based on multi-task deep convolutional neural networks (CNNs) and region-of-interest (RoI) voting. In the design of the CNN architecture, we enrich the supervised information with the subcategory, region overlap, bounding-box regression target, and category of each training RoI in a multi-task learning framework. This design allows the CNN model to share visual knowledge among different vehicle attributes simultaneously, which effectively improves detection robustness. In addition, most existing methods consider each RoI independently, ignoring clues from its neighboring RoIs. In our approach, we use the CNN model to predict the offset direction of each RoI boundary toward the corresponding ground truth. Each RoI can then vote for those suitable adjacent bounding boxes that are consistent with this additional information. The voting results are combined with each RoI's own score to find a more accurate location among a large number of candidates. Experimental results on the real-world computer vision benchmarks KITTI and the PASCAL 2007 vehicle dataset show that our approach achieves superior vehicle detection performance compared with other published works.
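
The RoI voting step can be pictured as each candidate box casting score-weighted votes for overlapping neighbours. The NumPy sketch below is a deliberately simplified stand-in: it weights votes by IoU alone and omits the paper's offset-direction consistency check; all shapes and thresholds are assumptions:

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and an array of boxes, format [x1, y1, x2, y2]."""
    x1 = np.maximum(box[0], boxes[:, 0]); y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2]); y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = lambda b: (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
    return inter / (area(box) + area(boxes) - inter + 1e-9)

def roi_voting(boxes, scores, iou_thr=0.5):
    """Each RoI's final score is its own score plus IoU-weighted votes
    from overlapping neighbours -- a simplified stand-in for the paper's
    offset-direction-consistent voting."""
    final = scores.copy()
    for i in range(len(boxes)):
        overlaps = iou(boxes[i], boxes)
        neighbours = (overlaps > iou_thr) & (np.arange(len(boxes)) != i)
        final[i] += (scores[neighbours] * overlaps[neighbours]).sum()
    return final

boxes = np.array([[10, 10, 50, 50], [12, 11, 52, 49], [200, 200, 240, 240]], float)
scores = np.array([0.8, 0.7, 0.9])
print(roi_voting(boxes, scores))  # the first two boxes reinforce each other
```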

Upload date: November 12, 2020

[Journal Article] SIF: Self-Inspirited Feature Learning for Person Re-Identification

IEEE Transactions on Image Processing, 2020, 29: 4942-495

Published: March 4, 2020

Abstract

The person re-identification (ReID) task has received increasing attention in recent years, and its performance has improved significantly. The progress mainly comes from searching for new network structures to learn person representations. However, limited effort has been made to directly explore the potential performance of existing ReID networks through better training schemes, which leaves a large space for ReID research. In this paper, we propose a Self-Inspirited Feature Learning (SIF) method to enhance the performance of a given ReID network from the viewpoint of optimization. We design a simple adversarial learning scheme to encourage the network to learn more discriminative person representations. In our method, an auxiliary branch is added to the network only in the training stage, while the structure of the original network stays unchanged during the testing stage. In summary, SIF has three advantages: 1) it is designed under a general setting; 2) it is compatible with many existing feature learning networks on the ReID task; and 3) it is easy to implement and has steady performance. We evaluate SIF on three public ReID datasets: Market1501, DukeMTMC-reID, and CUHK03 (both labeled and detected). The results demonstrate a significant performance improvement brought by SIF. We also apply SIF to obtain state-of-the-art results on all three datasets. Specifically, mAP/Rank-1 accuracy are: 87.6%/95.2% (without re-ranking) on Market1501, 79.4%/89.8% on DukeMTMC-reID, 77.0%/79.5% on CUHK03 (labeled), and 73.9%/76.6% on CUHK03 (detected), respectively.
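
The pattern of a training-only adversarial auxiliary branch can be approximated with a gradient-reversal layer, a standard adversarial-training trick that is not necessarily the paper's exact scheme. A minimal PyTorch sketch, with all dimensions and the loss weighting assumed:

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; reverses gradients on the backward
    pass -- a common way to train a branch adversarially against the
    backbone."""
    @staticmethod
    def forward(ctx, x):
        return x
    @staticmethod
    def backward(ctx, g):
        return -g

class SIFLikeNet(nn.Module):
    """Backbone + main ID head, plus a training-only auxiliary head that
    competes with the backbone. At test time only the backbone feature
    is used, so inference cost is unchanged. Illustrative only."""
    def __init__(self, feat_dim=128, n_ids=751):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(2048, feat_dim), nn.ReLU())
        self.main_head = nn.Linear(feat_dim, n_ids)
        self.aux_head = nn.Linear(feat_dim, n_ids)   # dropped after training

    def forward(self, x, training=True):
        f = self.backbone(x)
        if not training:
            return f                      # test: plain feature extractor
        return self.main_head(f), self.aux_head(GradReverse.apply(f))

net = SIFLikeNet()
x = torch.randn(16, 2048)                 # stand-in pooled CNN features
labels = torch.randint(0, 751, (16,))
main_logits, aux_logits = net(x)
ce = nn.CrossEntropyLoss()
loss = ce(main_logits, labels) + 0.1 * ce(aux_logits, labels)
loss.backward()
```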

Upload date: November 12, 2020

[Journal Article] Moment Retrieval via Cross-Modal Interaction Networks With Query Reconstruction

IEEE Transactions on Image Processing, 2020, 29: 3750-376

Published: January 17, 2020

Abstract

Moment retrieval aims to localize the most relevant moment in an untrimmed video according to a given natural language query. Existing works often focus on only one aspect of this emerging task, such as query representation learning, video context modeling, or multi-modal fusion, and thus fail to develop a comprehensive system for further performance improvement. In this paper, we introduce a novel Cross-Modal Interaction Network (CMIN) that considers multiple crucial factors for this challenging task, including the syntactic dependencies of natural language queries, long-range semantic dependencies in video context, and sufficient cross-modal interaction. Specifically, we devise a syntactic GCN to leverage the syntactic structure of queries for fine-grained representation learning and propose a multi-head self-attention mechanism to capture long-range semantic dependencies from video context. Next, we employ a multi-stage cross-modal interaction to explore the potential relations between video and query contents, and we also consider query reconstruction from the cross-modal representations of the target moment as an auxiliary task to strengthen the cross-modal representations. Extensive experiments on ActivityNet Captions and TACoS demonstrate the effectiveness of our proposed method.
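
Two of the ingredients named above, long-range self-attention over video context and cross-modal interaction with the query, can be sketched in a single block. The syntactic GCN and the query-reconstruction auxiliary task are omitted; everything here (names, dimensions) is an assumption, not CMIN's actual architecture:

```python
import torch
import torch.nn as nn

class CrossModalBlock(nn.Module):
    """One illustrative interaction block: self-attention over video
    clip features (long-range context), then cross-attention from the
    video to the query words."""
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.video_self = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, video, query):
        # video: (batch, n_clips, dim); query: (batch, n_words, dim)
        v, _ = self.video_self(video, video, video)
        video = self.norm1(video + v)
        c, _ = self.cross(video, query, query)   # video attends to words
        return self.norm2(video + c)

block = CrossModalBlock()
video = torch.randn(2, 64, 256)   # 64 clip features
query = torch.randn(2, 12, 256)   # 12 word features
out = block(video, query)         # per-clip, query-aware representations
print(out.shape)                  # torch.Size([2, 64, 256])
```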

Collaborating Scholars

  • No collaborating authors yet