Shiguang Shan (山世光)
Ph.D., Professor, Doctoral Supervisor
Institute of Computing Technology, Chinese Academy of Sciences; CAS Key Laboratory of Intelligent Information Processing
Computer vision and machine learning theory, methods, and key technologies, with face recognition as a typical case; vision-based affective computing; cognitive neuroscience and brain science
Profile
- Name: Shiguang Shan
- Current status: Active researcher
- Supervisory role: Doctoral supervisor
- Degree: Ph.D.
- Academic title: Doctoral supervisor
- Professional rank: Senior (Professor)
- Discipline: Pattern Recognition
- Research interests: Computer vision and machine learning theory, methods, and key technologies, with face recognition as a typical case; vision-based affective computing; cognitive neuroscience and brain science
Shiguang Shan is a professor and doctoral supervisor at the Institute of Computing Technology (ICT), Chinese Academy of Sciences (CAS), and currently serves as Executive Deputy Director of the CAS Key Laboratory of Intelligent Information Processing.
Education and appointments:
- 1993.08–1997.08: B.S., Harbin Institute of Technology
- 1997.09–1999.07: M.S., Harbin Institute of Technology
- 1999.09–2004.07: Ph.D., Institute of Computing Technology, Chinese Academy of Sciences
- 2013.12–2015.01: Visiting Scholar, Carnegie Mellon University
- 2010.10–present: Professor, Institute of Computing Technology, Chinese Academy of Sciences
- 2011.11–present: Deputy Director, CAS Key Laboratory of Intelligent Information Processing
- 2013.03–present: Executive Deputy Director, CAS Key Laboratory of Intelligent Information Processing
His research covers computer vision and machine learning. He has published over 300 papers in journals and at conferences in China and abroad, including more than 90 CCF Class A papers, with over 20,000 Google Scholar citations. His face recognition research won the 2005 National Science and Technology Progress Award, Second Class (3rd contributor); his work on high-dimensional, nonlinear visual pattern analysis won the 2015 National Natural Science Award, Second Class (2nd contributor); and his work on visual manifold modeling and learning received the CVPR 2008 Best Student Poster Award Runner-up. Face recognition technology developed by his team has been deployed in products and systems of public security agencies, Huawei, and many others, delivering substantial economic and social benefits. He has served as Area Chair for more than ten major international conferences, including ICCV 2011, ACCV 2012/2016/2018, ICPR 2012/2014/2020, FG 2013/2018/2020, ICASSP 2014, BTAS 2018, AAAI 2020/2021, IJCAI 2021, and CVPR 2019/2020/2021, and serves or has served as Associate Editor (AE) of IEEE TIP, CVIU, PRL, Neurocomputing, and FCS. He is a recipient of the NSFC Excellent Young Scientists Fund, an awardee of a major national talent program, a recipient of the CCF Young Scientist Award, a Beijing Nova Program awardee, and an outstanding member of the Youth Innovation Promotion Association of CAS.
His research interests center on computer vision and machine learning theory, methods, and key technologies, with face recognition as a typical case; he has over 20 years of research experience in face recognition. In recent years he has focused on vision-based deep affective computing, such as remote, contactless physiological signal estimation, psychological state estimation, and mental state assessment. On the theory and algorithm side, he and his team have extensive experience in machine learning, especially deep learning, with particular attention to knowledge-augmented machine learning theory and methods under "X-data" conditions, where X-data includes small data, unlabeled data, semi-supervised data, weakly supervised data, noisy data, augmented data, and so on.
He is a co-founder of the Vision And Learning SEminar (VALSE), the first rotating chair of the VALSE steering committee, and a co-founder and first online-committee chair of the VALSE Webinar series. VALSE 2019 (Hefei) attracted more than 5,000 attendees, and VALSE Webinar sessions have peaked at 1,800 participants, making VALSE one of the most influential academic event series in computer vision in China.
As a personal interest, he follows progress in cognitive neuroscience and brain science closely, and enjoys pondering and discussing the essence of biological vision and the inspirations that brain science brings to visual computing.
[Journal Article] A comparative study on illumination preprocessing in face recognition
Pattern Recognition,2013,46(6):1691-1699
June 1, 2013
Illumination preprocessing is an effective and efficient approach to handling lighting variations in face recognition. Despite much attention to face illumination preprocessing, there has seldom been a systematic comparative study of existing approaches that yields insights and conclusions on how to design better illumination preprocessing methods. To fill this vacancy, we provide a comparative study of 12 representative illumination preprocessing methods (HE, LT, GIC, DGD, LoG, SSR, GHP, SQI, LDCT, LTV, LN and TT) from two novel perspectives: (1) localization for holistic approaches and (2) integration of large-scale and small-scale feature bands. Experiments on public face databases with illumination variations (YaleBExt, CMU-PIE, CAS-PEAL and FRGC V2.0) suggest that localization further improves the performance of holistic illumination preprocessing methods (HE, GIC, LTV and TT). Integration of large-scale and small-scale feature bands is also found helpful for illumination-insensitive face recognition with reflectance-field-estimation-based illumination preprocessing approaches (SSR, GHP, SQI, LDCT, LTV and TT).
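Two of the holistic methods the study compares, histogram equalization (HE) and gamma intensity correction (GIC), are simple enough to sketch directly. The following is a minimal numpy illustration, not code from the paper; function names and the gamma value are our own choices.

```python
import numpy as np

def histogram_equalization(img):
    """HE: flatten the intensity histogram of an 8-bit grayscale image."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = np.cumsum(hist).astype(np.float64)
    span = cdf.max() - cdf.min()
    cdf = (cdf - cdf.min()) / (span if span else 1.0)  # normalize to [0, 1]
    return (cdf[img] * 255).astype(np.uint8)

def gamma_intensity_correction(img, gamma=0.4):
    """GIC: power-law transform; gamma < 1 lifts dark (shadowed) regions."""
    norm = img.astype(np.float64) / 255.0
    return (np.power(norm, gamma) * 255).astype(np.uint8)
```

The "localization" perspective in the paper amounts to applying such holistic transforms to local blocks of the face image rather than to the whole image at once.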
[Journal Article] Adaptive discriminant learning for face recognition
Pattern Recognition,2013,46(9):2497-2509
September 1, 2013
Face recognition from a Single Sample per Person (SSPP) is extremely challenging because only one sample is available for each person. While many discriminant analysis methods, such as Fisherfaces and its numerous variants, have achieved great success in face recognition, these methods cannot work in this scenario, because more than one sample per person is needed to calculate the within-class scatter matrix. To address this problem, we propose Adaptive Discriminant Analysis (ADA), in which the within-class scatter matrix of each enrolled subject is inferred from his/her single sample by leveraging a generic set with multiple samples per person. Our method is motivated by the assumption that subjects who look alike generally share similar within-class variations. In ADA, a limited number of neighbors for each single sample are first determined from the generic set by using kNN regression or Lasso regression. Then, the within-class scatter matrix of this single sample is inferred as the weighted average of the within-class scatter matrices of these neighbors based on the arithmetic mean or Riemannian mean. Finally, the optimal ADA projection directions can be computed analytically by using the inferred within-class scatter matrices and the actual between-class scatter matrix. The proposed method is evaluated on three databases: the FERET database, the FRGC database, and a large real-world passport-like face database. The extensive results demonstrate the effectiveness of our ADA compared with existing solutions to the SSPP problem.
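The core inference step described above can be sketched in a few lines of numpy. This is a simplified illustration under our own assumptions (kNN by distance to each generic subject's mean face, uniform weights, arithmetic mean); the paper additionally supports Lasso-weighted neighbors and the Riemannian mean.

```python
import numpy as np

def within_scatter(samples):
    """Within-class scatter matrix of one subject's samples (rows = samples)."""
    centered = samples - samples.mean(axis=0)
    return centered.T @ centered

def ada_infer_scatter(single_sample, generic_sets, k=3):
    """Infer a within-class scatter for a single enrolled sample as the
    average of the scatters of its k nearest generic subjects, where
    nearness is the distance to each subject's mean face."""
    means = np.stack([s.mean(axis=0) for s in generic_sets])
    dists = np.linalg.norm(means - single_sample, axis=1)
    nearest = np.argsort(dists)[:k]
    return np.mean([within_scatter(generic_sets[i]) for i in nearest], axis=0)
```

The inferred scatter then plugs into the usual Fisher criterion in place of the (uncomputable) true within-class scatter of the single sample.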
[Journal Article] CovGa: A novel descriptor based on symmetry of regions for head pose estimation
Neurocomputing,2014,143():97-108
November 2, 2014
This paper proposes a novel method to estimate head yaw rotation using the symmetry of regions. We argue that the symmetry of 2D regions located in the same horizontal row is more intrinsically relevant to the yaw rotation of the head than the symmetry of 1D signals, while at the same time being insensitive to the identity of the face. Specifically, the proposed method relies on the effective combination of Gabor filters and covariance descriptors. We first extract the multi-scale and multi-orientation Gabor representations of the input face image, and then use covariance descriptors to compute the symmetry between two regions in terms of Gabor representations under the same scale and orientation. Since the covariance matrix can alleviate the influence caused by rotations and illumination, the proposed method is robust to such variations. In addition, the proposed method is further improved by combining it with a metric learning method named KISS MEtric learning (KISSME). Experiments on four challenging databases demonstrated that the proposed method outperformed the state of the art.
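The covariance-descriptor building block is easy to illustrate. The sketch below, not taken from the paper, computes a region covariance over per-pixel feature vectors (which would be Gabor responses in CovGa) and compares two such matrices with a log-Euclidean distance, one standard way to respect the geometry of SPD matrices.

```python
import numpy as np

def region_covariance(features):
    """Covariance descriptor of a region; `features` is (n_pixels, d),
    one feature vector (e.g., Gabor responses) per pixel."""
    c = np.cov(features, rowvar=False)
    return c + 1e-6 * np.eye(c.shape[0])  # regularize to keep it SPD

def log_euclidean_distance(c1, c2):
    """Distance between two SPD matrices via the matrix logarithm."""
    def logm(c):
        w, v = np.linalg.eigh(c)
        return (v * np.log(w)) @ v.T
    return np.linalg.norm(logm(c1) - logm(c2), ord="fro")
```

In CovGa, a small distance between the descriptors of two horizontally mirrored regions signals high symmetry, which is the cue tied to yaw.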
Keywords: Head pose estimation, Covariance descriptors, Gabor filters, Symmetry
[Journal Article] Data-driven hair segmentation with isomorphic manifold inference
Image and Vision Computing,2014,32(10):739-750
October 1, 2014
Hair segmentation is challenging due to diverse appearance, irregular region boundaries, and the influence of complex backgrounds. To deal with this problem, we propose a novel data-driven method, named Isomorphic Manifold Inference (IMI). The IMI method treats the coarse probability map and the binary segmentation map as a couple of isomorphic manifolds and tries to learn hair-specific priors from manually labeled training images. For an input image, the method first calculates a coarse probability map. It then exploits regression techniques to obtain the relationship between the coarse probability map of the test image and those of the training images. Finally, this relationship, i.e., a coefficient set, is transferred to the binary segmentation maps, and a soft segmentation of the test image is obtained as a linear combination of those binary maps. Further, we employ this soft segmentation as a shape cue and integrate it with color and texture cues into a unified segmentation framework. A better segmentation is achieved by Graph Cuts optimization. Extensive experiments are conducted to validate the effectiveness of the IMI method, compare the contributions of different cues, and investigate the generalization of the IMI method. The results strongly support our method.
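The coefficient-transfer idea can be sketched with plain least squares. This is a minimal stand-in for the regression step (the paper's actual regressors may differ), assuming probability maps and binary masks are given as arrays of the same shape.

```python
import numpy as np

def imi_soft_segmentation(test_prob, train_probs, train_masks):
    """Express the test probability map as a least-squares combination of
    training probability maps, then transfer the same coefficients to the
    training binary masks -- the isomorphic-manifold assumption."""
    A = train_probs.reshape(len(train_probs), -1).T      # (pixels, n_train)
    b = test_prob.ravel()
    coeffs, *_ = np.linalg.lstsq(A, b, rcond=None)
    soft = train_masks.reshape(len(train_masks), -1).T @ coeffs
    return np.clip(soft.reshape(test_prob.shape), 0.0, 1.0)
```

The resulting soft map then serves as the shape cue fed into the Graph Cuts stage.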
Keywords: Hair segmentation, Data driven, Shape model, Isomorphic manifold inference
[Journal Article] Maximal Likelihood Correspondence Estimation for Face Recognition Across Pose
IEEE Transactions on Image Processing,2014,23(10):4587 - 460
August 22, 2014
Due to the misalignment of image features, the performance of many conventional face recognition methods degrades considerably in the across-pose scenario. To address this problem, many image matching-based methods have been proposed to estimate semantic correspondence between faces in different poses. In this paper, we aim to solve two critical problems of previous image matching-based correspondence learning methods: 1) the failure to fully exploit face-specific structure information in correspondence estimation and 2) the failure to learn a personalized correspondence for each probe image. To this end, we first build a model, termed morphable displacement field (MDF), to encode face-specific structure information of semantic correspondence from a set of real samples of correspondences calculated from 3D face models. Then, we propose a maximal likelihood correspondence estimation (MLCE) method to learn personalized correspondence based on the maximal likelihood frontal face assumption. After obtaining the semantic correspondence encoded in the learned displacement, we can synthesize virtual frontal images of the profile faces for subsequent recognition. Using the linear discriminant analysis method with pixel-intensity features, state-of-the-art performance is achieved on three multipose benchmarks, i.e., the CMU-PIE, FERET, and MultiPIE databases. Owing to the rational MDF regularization and the use of a novel maximal likelihood objective, the proposed MLCE method can reliably learn correspondence between faces in different poses even in complex wild environments, i.e., the Labeled Faces in the Wild database.
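To make the displacement-field machinery concrete, here is a toy numpy sketch, not the paper's implementation: a "morphable" field as a convex combination of exemplar fields (which the paper derives from 3D face models; here they are arbitrary arrays), applied to an image with nearest-neighbor warping.

```python
import numpy as np

def morphable_field(basis_fields, alpha):
    """Convex combination of exemplar displacement fields.
    basis_fields: (n, h, w, 2) array of (dy, dx) fields; alpha: n weights."""
    alpha = np.asarray(alpha, dtype=float)
    alpha = alpha / alpha.sum()
    return np.tensordot(alpha, basis_fields, axes=1)

def apply_displacement(img, field):
    """Warp a grayscale image by a per-pixel (dy, dx) field,
    nearest-neighbor sampling with border clamping."""
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w]
    src_y = np.clip(np.rint(ys + field[..., 0]).astype(int), 0, h - 1)
    src_x = np.clip(np.rint(xs + field[..., 1]).astype(int), 0, w - 1)
    return img[src_y, src_x]
```

MLCE's contribution is choosing the combination weights per probe image so that the warped result is maximally likely under a frontal-face model.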
[Journal Article] Domain Adaptation for Face Recognition: Targetize Source Domain Bridged by Common Subspace
International Journal of Computer Vision ,2013,109():pages94–10
December 31, 2013
In many applications, a face recognition model learned on a source domain but applied to a novel target domain degrades, sometimes significantly, due to the mismatch between the two domains. Aiming to learn a better face recognition model for the target domain, this paper proposes a simple but effective domain adaptation approach that transfers supervision knowledge from a labeled source domain to the unlabeled target domain. Our basic idea is to convert the source domain images to the target domain (termed "targetizing" the source domain hereinafter) while keeping their supervision information. For this purpose, each source domain image is simply represented as a linear combination of sparse target domain neighbors in the image space, with the combination coefficients, however, learnt in a common subspace. The principle behind this strategy is that the common knowledge is only favorable for accurate cross-domain reconstruction, but for classification in the target domain, the specific knowledge of the target domain is also essential and thus should be mostly preserved (through targetization in the image space in this work). To discover the common knowledge, specifically, a common subspace is learnt in which the structures of both domains are preserved while the disparity between the source and target domains is reduced. The proposed method is extensively evaluated under three face recognition scenarios, i.e., domain adaptation across view angle, across ethnicity, and across imaging condition. The experimental results illustrate the superiority of our method over competitive alternatives.
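The "targetize" step, reconstructing a labeled source image from target-domain neighbors, can be sketched as below. This simplified version solves the coefficients by least squares directly in the image space; the paper's key refinement is learning them in a common subspace instead.

```python
import numpy as np

def targetize(source_sample, target_samples, k=5):
    """Reconstruct one labeled source image as a linear combination of its
    k nearest unlabeled target-domain samples (rows of target_samples).
    Returns the targetized image, the neighbor indices, and coefficients."""
    dists = np.linalg.norm(target_samples - source_sample, axis=1)
    nn = np.argsort(dists)[:k]
    A = target_samples[nn].T                        # (dim, k)
    coeffs, *_ = np.linalg.lstsq(A, source_sample, rcond=None)
    return A @ coeffs, nn, coeffs
```

The targetized images keep their source labels, so a classifier trained on them sees target-style appearance with source supervision.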
Pattern Recognition,2015,48(10):3113-3124
October 1, 2015
Face recognition on large-scale video in the wild is becoming increasingly important due to the ubiquity of video data captured by surveillance cameras, handheld devices, Internet uploads, and other sources. By treating each video as one image set, set-based methods have recently achieved great success in the field of video-based face recognition. In the wild, videos often contain extremely complex data variations and thus pose a big challenge for the set modeling of set-based methods. In this paper, we propose a novel Hybrid Euclidean-and-Riemannian Metric Learning (HERML) method to fuse multiple statistics of an image set. Specifically, we represent each image set simultaneously by its mean, covariance matrix, and Gaussian distribution, which generally complement each other for set modeling. However, it is not trivial to fuse them, since the mean, covariance matrix, and Gaussian model typically lie in multiple heterogeneous spaces equipped with Euclidean or Riemannian metrics. Therefore, we first implicitly map the original statistics into high-dimensional Hilbert spaces by exploiting Euclidean and Riemannian kernels. With a LogDet divergence-based objective function, the hybrid kernels are then fused by our hybrid metric learning framework, which can efficiently perform the fusion procedure on large-scale videos. The proposed method is evaluated on four public and challenging large-scale video face datasets. Extensive experimental results demonstrate that our method has a clear superiority over state-of-the-art set-based methods for large-scale video-based face recognition.
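The heterogeneous-kernel idea can be illustrated with a toy fusion: an RBF kernel on set means (Euclidean) plus a log-Euclidean kernel on set covariances (Riemannian). This sketch uses fixed fusion weights; HERML instead learns the fusion with its LogDet-divergence objective. Function names and the gamma values are our own.

```python
import numpy as np

def rbf_kernel(x, y, gamma=1.0):
    """Euclidean (RBF) kernel on set means."""
    return np.exp(-gamma * np.sum((x - y) ** 2))

def log_euclidean_kernel(c1, c2, gamma=1.0):
    """Riemannian kernel on covariance matrices via the log-Euclidean map."""
    def logm(c):
        w, v = np.linalg.eigh(c)
        return (v * np.log(w)) @ v.T
    return np.exp(-gamma * np.linalg.norm(logm(c1) - logm(c2), "fro") ** 2)

def fused_similarity(set_a, set_b, weights=(0.5, 0.5)):
    """Fixed-weight fusion of the two kernels over two image sets
    (rows = frame features)."""
    mu_a, mu_b = set_a.mean(axis=0), set_b.mean(axis=0)
    eps = 1e-6
    c_a = np.cov(set_a, rowvar=False) + eps * np.eye(set_a.shape[1])
    c_b = np.cov(set_b, rowvar=False) + eps * np.eye(set_b.shape[1])
    return (weights[0] * rbf_kernel(mu_a, mu_b)
            + weights[1] * log_euclidean_kernel(c_a, c_b))
```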
Keywords: Face recognition, Large-scale video, Multiple heterogeneous statistics, Hybrid Euclidean-and-Riemannian metric learning
[Journal Article] Learning prototypes and similes on Grassmann manifold for spontaneous expression recognition
Computer Vision and Image Understanding,2016,147():95-101
June 1, 2016
Video-based spontaneous expression recognition is a challenging task due to large inter-personal variations in both the expressing manners and the executing rates of the same expression category. One key issue is to explore a robust representation method that can effectively capture the facial variations while alleviating the influence of personality. In this paper, we propose to learn a set of typical patterns that are commonly shared by different subjects when performing expressions, namely "prototypes". Specifically, we first apply a statistical model (i.e., a linear subspace) on facial regions to generate the specific expression patterns for each video. A clustering algorithm is then employed on all these expression patterns, and the cluster means are regarded as the "prototypes". Accordingly, we further design "simile" features that measure the similarities of person-specific patterns to the learned "prototypes". Both techniques are conducted on the Grassmann manifold, which enriches the feature encoding and better reveals the data structure by introducing intrinsic geodesics. Extensive experiments are conducted on both posed and spontaneous expression databases. All results show that our method outperforms the state of the art and also possesses good transferability in cross-database scenarios.
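The two Grassmann-manifold primitives the method builds on, representing a video as a linear subspace and comparing subspaces via principal angles, look roughly like this. A minimal sketch under our own naming; the projection metric used here is one standard Grassmann distance, not necessarily the paper's exact choice.

```python
import numpy as np

def video_subspace(frames, dim=3):
    """Represent a set of frame features (rows) by an orthonormal basis of
    its top principal directions: a point on the Grassmann manifold."""
    u, _, _ = np.linalg.svd(frames.T, full_matrices=False)
    return u[:, :dim]

def grassmann_distance(u1, u2):
    """Projection-metric distance between two subspaces, from the
    principal angles (singular values of U1^T U2)."""
    s = np.clip(np.linalg.svd(u1.T @ u2, compute_uv=False), -1.0, 1.0)
    return np.sqrt(max(u1.shape[1] - np.sum(s ** 2), 0.0))
```

Clustering such subspace points yields the "prototypes", and a video's "simile" feature is its vector of similarities to them.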
Keywords: Expression prototype, Simile representation, Grassmann manifold, Spontaneous expression recognition
[Journal Article] Learning Expressionlets via Universal Manifold Model for Dynamic Facial Expression Recognition
IEEE Transactions on Image Processing,2016,25(12): 5920 - 59
October 5, 2016
Facial expression is a temporally dynamic event which can be decomposed into a set of muscle motions occurring in different facial regions over various time intervals. For dynamic expression recognition, two key issues, temporal alignment and semantics-aware dynamic representation, must be taken into account. In this paper, we attempt to solve both problems via manifold modeling of videos based on a novel mid-level representation, i.e., expressionlet. Specifically, our method contains three key stages: 1) each expression video clip is characterized as a spatial-temporal manifold (STM) formed by dense low-level features; 2) a universal manifold model (UMM) is learned over all low-level features and represented as a set of local modes to statistically unify all the STMs; and 3) the local modes on each STM can be instantiated by fitting to the UMM, and the corresponding expressionlet is constructed by modeling the variations in each local mode. With the above strategy, expression videos are naturally aligned both spatially and temporally. To enhance the discriminative power, the expressionlet-based STM representation is further processed with discriminant embedding. Our method is evaluated on four public expression databases, CK+, MMI, Oulu-CASIA, and FERA. In all cases, our method outperforms the known state of the art by a large margin.
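Stages 2) and 3) can be caricatured with a tiny k-means standing in for the UMM. This is a loose illustration under our own simplifications (the actual UMM is a richer statistical model and the low-level features are dense spatio-temporal descriptors, not raw vectors).

```python
import numpy as np

def learn_umm_modes(all_features, k=4, iters=20, seed=0):
    """Toy UMM: k-means centers over the pooled low-level features of all
    videos serve as the shared 'local modes'."""
    rng = np.random.default_rng(seed)
    centers = all_features[rng.choice(len(all_features), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((all_features[:, None] - centers) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = all_features[labels == j].mean(axis=0)
    return centers

def expressionlet(video_features, centers):
    """Instantiate the modes on one video: the per-mode covariance of the
    features assigned to each mode models its local variation."""
    labels = np.argmin(((video_features[:, None] - centers) ** 2).sum(-1), axis=1)
    d = video_features.shape[1]
    blocks = []
    for j in range(len(centers)):
        pts = video_features[labels == j]
        blocks.append(np.cov(pts, rowvar=False) if len(pts) > 1 else np.zeros((d, d)))
    return np.stack(blocks)
```

Because every video is described against the same shared modes, the representation is aligned across videos both spatially and temporally, which is the point of the UMM.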
[Journal Article] Spatial Pyramid Covariance-Based Compact Video Code for Robust Face Retrieval in TV-Series
IEEE Transactions on Image Processing ,2016,25(12): 5905 - 59
October 10, 2016
We address the problem of face video retrieval in TV-series, which searches video clips for the presence of a specific character, given one of his/her face tracks. This is tremendously challenging because, on the one hand, faces in TV-series are captured in largely uncontrolled conditions with complex appearance variations, and on the other hand, the retrieval task typically needs an efficient representation with low time and space complexity. To handle this problem, we propose a compact and discriminative representation for the huge body of video data, named compact video code (CVC). Our method first models a face track by its sample (i.e., frame) covariance matrix to capture the video data variations in a statistical manner. To incorporate discriminative information and obtain a more compact video signature suitable for retrieval, the high-dimensional covariance representation is further encoded as a much lower-dimensional binary vector, which finally yields the proposed CVC. Specifically, each bit of the code, i.e., each dimension of the binary vector, is produced via supervised learning in a max-margin framework, which aims to balance the discriminability and stability of the code. Besides, we extend the descriptive granularity of the covariance matrix from the traditional pixel level to the more general patch level, and propose a novel hierarchical video representation named spatial pyramid covariance along with a fast calculation method. Face retrieval experiments on two challenging TV-series video databases, i.e., the Big Bang Theory and Prison Break, demonstrate the competitiveness of the proposed CVC over state-of-the-art retrieval methods. In addition, as a general video matching algorithm, CVC is also evaluated on a traditional video face recognition task on a standard Internet database, i.e., YouTube Celebrities, showing quite promising performance with an extremely compact code of only 128 bits.
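The encode-then-hash pipeline can be sketched as follows. This is an illustrative hashing stand-in under our own assumptions: the hyperplanes here are given at random, whereas the paper learns each bit's hyperplane in a max-margin framework.

```python
import numpy as np

def compact_video_code(cov, projections):
    """Binarize a face-track covariance descriptor: log-map the SPD matrix,
    vectorize its upper triangle, then threshold linear projections
    (one bit per hyperplane)."""
    w, v = np.linalg.eigh(cov)
    log_cov = (v * np.log(w)) @ v.T            # matrix logarithm
    feat = log_cov[np.triu_indices(cov.shape[0])]
    return (projections @ feat > 0).astype(np.uint8)

def hamming_distance(a, b):
    """Retrieval ranks tracks by Hamming distance between codes."""
    return int(np.sum(a != b))
```

With 128 projections, each track collapses to 128 bits, which is what makes exhaustive search over large video corpora cheap.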