Hu Qinghua
Ph.D., Professor, Doctoral Supervisor
College of Artificial Intelligence, Tianjin University
His research focuses on machine learning with complex data, data mining, and their applications, with particular interest in knowledge discovery from hybrid and multi-modal data.
- Name: Hu Qinghua
- Current status: Active researcher
- Supervisor role: Doctoral supervisor
- Degree: Ph.D.
- Academic title: Doctoral supervisor
- Professional rank: Senior (Professor)
- Discipline: Computer System Architecture
- Research interests: Machine learning with complex data, data mining, and their applications, with particular interest in knowledge discovery from hybrid and multi-modal data.
Hu Qinghua, male, born in 1976, is a professor and doctoral supervisor at Tianjin University and a recipient of the National Science Fund for Excellent Young Scholars.
His research focuses on machine learning with complex data, data mining, and their applications, with particular interest in knowledge discovery from hybrid and multi-modal data. He has led two projects funded by the National Natural Science Foundation of China (NSFC) and has participated in several NSFC general and major projects as well as 973 Program subprojects. He published the monograph Applied Rough Computing (《应用粗糙计算》) with Science Press. He has authored more than 90 academic papers, nearly 60 of them indexed by SCI; these papers have been cited more than 1,000 times, including over 300 SCI citations by others, and several rank among the most-cited papers of the past five years in Pattern Recognition, Pattern Recognition Letters, and Knowledge-Based Systems. His conference papers received best student paper awards at PRICAI 2006 and at the 2007 and 2010 Chinese Conference on Rough Sets and Soft Computing. Students under his supervision have been named "HIT Gold Medal Graduates" and have won the "Heilongjiang Province Excellent Master's Thesis" award.
[Journal Article] Feature Selection for Monotonic Classification
IEEE Transactions on Fuzzy Systems, 2011, 20(1): 69-81
September 6, 2011
Monotonic classification is a special kind of task in machine learning and pattern recognition. Monotonicity constraints between features and the decision should be taken into account in these tasks. However, most existing techniques are not able to discover and represent the ordinal structures in monotonic datasets and are therefore inapplicable to monotonic classification. Feature selection has proven effective in improving classification performance and avoiding overfitting. To the best of our knowledge, no technique has been specially designed to select features for monotonic classification until now. In this paper, we introduce a function, called rank mutual information, to evaluate the monotonic consistency between features and the decision in monotonic tasks. This function combines the advantages of dominance rough sets in reflecting ordinal structures with the robustness of mutual information. Rank mutual information is then integrated with the min-redundancy max-relevance search strategy to compute optimal feature subsets. A collection of numerical experiments shows the effectiveness of the proposed technique.
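As a rough illustration of the idea, rank mutual information can be computed from dominated sets: for each sample, the set of samples whose feature (or decision) values are at least as large. The pure-Python sketch below follows the description in the abstract; the ascending orientation and base-2 logarithm are assumptions, not the paper's exact formulation.

```python
import math

def dominated_set(values, i):
    """Indices j whose value is at least values[i] (ascending dominance)."""
    return {j for j, v in enumerate(values) if v >= values[i]}

def rank_mutual_information(feature, decision):
    """Rank mutual information between a feature and the decision: high when
    samples ranked high by the feature are also ranked high by the decision."""
    n = len(feature)
    total = 0.0
    for i in range(n):
        fa = dominated_set(feature, i)
        fd = dominated_set(decision, i)
        inter = len(fa & fd)          # never 0: sample i is in both sets
        total += math.log2(len(fa) * len(fd) / (n * inter))
    return -total / n

# A monotonically consistent feature scores higher than a reversed one.
labels = [0, 0, 1, 1]
rmi_up = rank_mutual_information([1, 2, 3, 4], labels)
rmi_down = rank_mutual_information([4, 3, 2, 1], labels)
```

A feature that ranks samples consistently with the decision yields a positive score, while an anti-monotone feature is penalized, which is what makes the measure usable inside a min-redundancy max-relevance search.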
[Journal Article] On Robust Fuzzy Rough Set Models
IEEE Transactions on Fuzzy Systems, 2011, 20(4): 636-651
December 22, 2011
Rough sets, especially fuzzy rough sets, are a powerful mathematical tool for dealing with uncertainty in data analysis. The theory has been applied to feature selection, dimensionality reduction, and rule learning. However, the classical model of fuzzy rough sets is sensitive to noisy information, which is considered a main source of uncertainty in applications. This disadvantage limits the applicability of fuzzy rough sets. In this paper, we reveal why the classical fuzzy rough set model is sensitive to noise and how noisy samples influence fuzzy rough computation. Based on this discussion, we study the behavior of several current fuzzy rough models on noisy data and introduce several new robust models, whose properties are also discussed. Finally, a robust classification algorithm is designed based on fuzzy lower approximations. Numerical experiments illustrate the effectiveness of the models, and the classifiers developed from the proposed models achieve good generalization performance.
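The noise sensitivity discussed above comes from the infimum in the classical lower approximation: a single mislabeled neighbor drags the whole value down. The minimal sketch below contrasts the classical definition with a k-trimmed minimum, one possible robustification; the Łukasiewicz implicator, the similarity function, and the trimming rule are illustrative assumptions, and the paper's own robust models differ in detail.

```python
def luk_implicator(a, b):
    """Łukasiewicz implicator I(a, b) = min(1, 1 - a + b)."""
    return min(1.0, 1.0 - a + b)

def fuzzy_lower(x, samples, membership, similarity):
    """Classical fuzzy lower approximation of a class at x:
    inf over all samples y of I(R(x, y), A(y))."""
    return min(luk_implicator(similarity(x, y), membership[i])
               for i, y in enumerate(samples))

def robust_fuzzy_lower(x, samples, membership, similarity, k=1):
    """k-trimmed variant: ignore the k smallest implication values so that a
    few noisy samples cannot collapse the approximation."""
    vals = sorted(luk_implicator(similarity(x, y), membership[i])
                  for i, y in enumerate(samples))
    return vals[k]

# One mislabeled sample (membership 0) sits right next to x = 0.1.
sim = lambda a, b: max(0.0, 1.0 - abs(a - b))
samples = [0.0, 0.1, 0.2, 0.15]
membership = [1.0, 1.0, 1.0, 0.0]   # last sample is mislabeled noise
classical = fuzzy_lower(0.1, samples, membership, sim)
robust = robust_fuzzy_lower(0.1, samples, membership, sim, k=1)
```

The classical approximation collapses toward zero because of the single noisy neighbor, while the trimmed version stays high, which is the behavior the robust models aim for.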
[Journal Article] Rank Entropy-Based Decision Trees for Monotonic Classification
IEEE Transactions on Knowledge and Data Engineering, 2011, 24(11): 2052-206
June 30, 2011
In many decision-making tasks, the values of features and the decision are ordinal. Moreover, there is a monotonicity constraint: objects with better feature values should not be assigned to a worse decision class. Such problems are called ordinal classification with monotonicity constraints. Some learning algorithms have been developed to handle this kind of task in recent years. However, experiments show that these algorithms are sensitive to noisy samples and do not work well in real-world applications. In this work, we introduce a new measure of feature quality, called rank mutual information (RMI), which combines the robustness of Shannon's entropy with the ability of dominance rough sets to extract ordinal structures from monotonic datasets. We then design a decision tree algorithm (REMT) based on rank mutual information. Theoretical and experimental analysis shows that the proposed algorithm produces monotonically consistent decision trees if the training samples are monotonically consistent, and that its performance remains good when the data are contaminated with noise.
[Journal Article] Improved support vector machine algorithm for heterogeneous data
Pattern Recognition, 2015, 48(6): 2072-2083
June 1, 2015
A support vector machine (SVM) is a popular algorithm for classification learning. The classical SVM effectively manages classification tasks defined by numerical attributes. However, both numerical and nominal attributes occur in practical tasks, and the classical SVM does not fully consider the difference between them: nominal attributes are usually regarded as numerical after coding, which may deteriorate the performance of learning algorithms. In this study, we propose a novel SVM algorithm for learning with heterogeneous data, called the heterogeneous SVM (HSVM). Instead of coding nominal attributes directly, the proposed algorithm learns a mapping that embeds them into a real space by minimizing an estimated generalization error. Extensive experiments show that HSVM improves classification performance for both nominal and heterogeneous data.
Keywords: support vector machine, heterogeneous data, nominal attribute, numerical attribute, classification learning
[Journal Article] Heterogeneous Feature Selection With Multi-Modal Deep Neural Networks and Sparse Group LASSO
IEEE Transactions on Multimedia, 2015, 17(11): 1936-194
September 7, 2015
Heterogeneous feature representations are widely used in machine learning and pattern recognition, especially for multimedia analysis. The multi-modal, often also high-dimensional, features may contain redundant and irrelevant information that can deteriorate the performance of modeling in classification. It is a challenging problem to select the informative features for a given task from the redundant and heterogeneous feature groups. In this paper, we propose a novel framework to address this problem. This framework is composed of two modules, namely, multi-modal deep neural networks and feature selection with sparse group LASSO. Given diverse groups of discriminative features, the proposed technique first converts the multi-modal data into a unified representation with different branches of the multi-modal deep neural networks. Then, through solving a sparse group LASSO problem, the feature selection component is used to derive a weight vector to indicate the importance of the feature groups. Finally, the feature groups with large weights are considered more relevant and hence are selected. We evaluate our framework on three image classification datasets. Experimental results show that the proposed approach is effective in selecting the relevant feature groups and achieves competitive classification performance as compared with several recent baseline methods.
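The group-level selection can be illustrated with the proximal operator of the sparse group LASSO penalty, which zeroes out entire feature groups whose weights are small while shrinking the rest. This is a generic sketch of the penalty's selection mechanism, not the paper's solver; the regularization values are arbitrary.

```python
import math

def prox_sparse_group_lasso(groups, lam1, lam2):
    """One proximal step of the sparse group LASSO penalty
    lam1 * sum_j |w_j|  +  lam2 * sum_g ||w_g||_2,
    where `groups` is a list of weight lists, one per feature group."""
    out = []
    for w in groups:
        # element-wise soft threshold (the L1 part)
        s = [math.copysign(max(abs(v) - lam1, 0.0), v) for v in w]
        norm = math.sqrt(sum(v * v for v in s))
        # block soft threshold (the group part): the group survives or dies whole
        scale = max(0.0, 1.0 - lam2 / norm) if norm > 0 else 0.0
        out.append([scale * v for v in s])
    return out

# A strong group and a weak, noise-like group.
shrunk = prox_sparse_group_lasso([[2.0, -1.5], [0.1, 0.05]], lam1=0.05, lam2=0.5)
```

The weak group is driven exactly to zero, which is how the framework reads off which feature groups are irrelevant from the learned weight vector.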
[Journal Article] Data-Distribution-Aware Fuzzy Rough Set Model and its Application to Robust Classification
IEEE Transactions on Cybernetics, 2015, 46(12): 3073-308
November 12, 2015
Fuzzy rough sets (FRSs) are considered to be a powerful model for analyzing uncertainty in data. This model encapsulates two types of uncertainty: 1) fuzziness coming from the vagueness in human concept formation and 2) roughness rooted in the granulation coming with human cognition. The rough set theory has been widely applied to feature selection, attribute reduction, and classification. However, it is reported that the classical FRS model is sensitive to noisy information. To address this problem, several robust models have been developed in recent years. Nevertheless, these models do not consider a statistical distribution of data, which is an important type of uncertainty. Data distribution serves as crucial information for designing an optimal classification or regression model. Thus, we propose a data-distribution-aware FRS model that considers distribution information and incorporates it in computing lower and upper fuzzy approximations. The proposed model considers not only the similarity between samples, but also the probability density of classes. In order to demonstrate the effectiveness of the proposed model, we design a new sample evaluation index for prototype-based classification based on the model, and a prototype selection algorithm is developed using this index. Furthermore, a robust classification algorithm is constructed with prototype covering and nearest neighbor classification. Experimental results confirm the robustness and effectiveness of the proposed model.
[Journal Article] Efficient Background Modeling Based on Sparse Representation and Outlier Iterative Removal
IEEE Transactions on Circuits and Systems for Video Technology, 2014, 26(2): 278-289
December 12, 2014
Background modeling is a critical component for various vision-based applications. Most traditional methods tend to be inefficient when solving large-scale problems. In this paper, we introduce sparse representation into the task of large-scale stable-background modeling, and reduce the video size by exploring its discriminative frames. A cyclic iteration process is then proposed to extract the background from the discriminative frame set. The two parts combine to form our sparse outlier iterative removal (SOIR) algorithm. The algorithm operates in tensor space to obey the natural data structure of videos. Experimental results show that a few discriminative frames determine the performance of the background extraction. Furthermore, SOIR can achieve high accuracy and high speed simultaneously when dealing with real video sequences. Thus, SOIR has an advantage in solving large-scale tasks.
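The outlier-iterative-removal idea can be caricatured on a stack of grayscale frames: repeatedly re-estimate the background and exclude pixels that deviate too far from the current estimate. This toy version operates per pixel on plain lists and ignores the sparse tensor representation that makes SOIR scale; the deviation threshold is an arbitrary assumption.

```python
def iterative_background(frames, n_iters=5, thresh=50.0):
    """Estimate a static background from frames (lists of pixel values) by
    iteratively averaging only the pixels not flagged as foreground outliers."""
    n_pix = len(frames[0])
    # initial estimate: plain per-pixel mean over all frames
    bg = [sum(f[p] for f in frames) / len(frames) for p in range(n_pix)]
    for _ in range(n_iters):
        new_bg = []
        for p in range(n_pix):
            # keep only pixels close to the current background estimate
            inliers = [f[p] for f in frames if abs(f[p] - bg[p]) <= thresh]
            new_bg.append(sum(inliers) / len(inliers) if inliers else bg[p])
        bg = new_bg
    return bg

# Three clean frames plus one frame where a bright object crosses the pixel.
bg = iterative_background([[100], [100], [100], [255]])
```

The transient bright value is rejected as an outlier, so the estimate converges to the true static background rather than a biased mean.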
[Journal Article] Semisupervised Online Multikernel Similarity Learning for Image Retrieval
IEEE Transactions on Multimedia, 2016, 19(5): 1077-108
December 23, 2016
Metric learning plays a fundamental role in the fields of multimedia retrieval and pattern recognition. Recently, an online multikernel similarity (OMKS) learning method has been presented for content-based image retrieval (CBIR), which was shown to be promising for capturing the intrinsic nonlinear relations within multimodal features from large-scale data. However, the similarity function in this method is learned only from labeled images. In this paper, we present a new framework to exploit unlabeled images and develop a semisupervised OMKS algorithm. The proposed method is a multistage algorithm consisting of feature selection, selective ensemble learning, active sample selection, and triplet generation. The novel aspects of our work are the introduction of classification confidence to evaluate the labeling process and select the reliably labeled images to train the metric function, and a method for reliable triplet generation, where a new criterion for sample selection is used to improve the accuracy of label prediction for unlabeled images. Our proposed method offers advantages in challenging scenarios, in particular, for a small set of labeled images with high-dimensional features. Experimental results demonstrate the effectiveness of the proposed method as compared with several baseline methods.
[Journal Article] Streaming Feature Selection for Multilabel Learning Based on Fuzzy Mutual Information
IEEE Transactions on Fuzzy Systems, 2017, 25(6): 1491-150
August 3, 2017
Due to complex semantics, a sample may be associated with multiple labels in various classification and recognition tasks. Multilabel learning generates training models that map feature vectors to multiple labels. There are several significant challenges in multilabel learning. Samples are usually described by high-dimensional features, and some features may be extracted sequentially, so the full feature set is unknown at the beginning of learning; such features are referred to as streaming features. In this paper, we introduce fuzzy mutual information to evaluate the quality of features in multilabel learning and design efficient algorithms to conduct multilabel feature selection when the feature space is completely known or only partially known in advance. These algorithms are called multilabel feature selection with label correlation (MUCO) and multilabel streaming feature selection (MSFS), respectively. MSFS consists of two key steps: online relevance analysis and online redundancy analysis. In addition, we design a metric to measure the correlation between label sets, and both MUCO and MSFS take label correlation into consideration. The proposed algorithms can not only select features from streaming features but also select features for ordinal multilabel learning, and streaming feature selection is more efficient. The proposed algorithms are tested on a collection of multilabel learning tasks, and the experimental results illustrate their effectiveness.
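The two-step online analysis can be sketched as follows. Plain discrete mutual information stands in for the paper's fuzzy mutual information, and a single label vector stands in for the label set with its correlation metric; both are simplifying assumptions to keep the example short.

```python
import math
from collections import Counter

def mutual_info(xs, ys):
    """Plain discrete mutual information in bits (a stand-in for the fuzzy
    mutual information used in the paper)."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(c / n * math.log2(c / n / (px[x] / n * py[y] / n))
               for (x, y), c in pxy.items())

def streaming_select(feature_stream, labels, rel_thresh=0.05):
    """Process features as they arrive: keep a feature if it is relevant to
    the labels and not dominated by an already-selected feature."""
    selected = []   # list of (name, values)
    for name, values in feature_stream:
        rel = mutual_info(values, labels)
        if rel <= rel_thresh:        # online relevance analysis: discard
            continue
        # online redundancy analysis: drop the newcomer if a selected feature
        # is highly redundant with it and carries at least as much information
        redundant = any(mutual_info(values, old) >= rel and
                        mutual_info(old, labels) >= rel
                        for _, old in selected)
        if not redundant:
            selected.append((name, values))
    return [name for name, _ in selected]

stream = [("f1", [0, 0, 1, 1]),      # informative, arrives first
          ("f2", [1, 1, 0, 0]),      # informative but redundant with f1
          ("noise", [0, 1, 0, 1])]   # irrelevant
selected = streaming_select(stream, [0, 0, 1, 1])
```

Only the first informative feature survives: the redundant copy and the irrelevant feature are both filtered online, without ever holding the full feature set in memory.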
[Journal Article] Feature Selection Based on Neighborhood Discrimination Index
IEEE Transactions on Neural Networks and Learning Systems, 2017, 29(7): 2986-299
June 23, 2017
Feature selection is viewed as an important preprocessing step for pattern recognition, machine learning, and data mining. Neighborhood is one of the most important concepts in classification learning and can be used to distinguish samples with different decisions. In this paper, a neighborhood discrimination index is proposed to characterize the distinguishing information of a neighborhood relation. It reflects the distinguishing ability of a feature subset. The proposed discrimination index is computed from the cardinality of a neighborhood relation rather than from neighborhood similarity classes. Variants of the discrimination index, including the joint, conditional, and mutual discrimination indexes, are introduced to compute the change in distinguishing information caused by combining multiple feature subsets. They have properties similar to those of Shannon entropy and its variants. A parameter, named the neighborhood radius, is introduced into these discrimination measures to support the analysis of real-valued data. Based on the proposed discrimination measures, the significance of a candidate feature is defined and a greedy forward algorithm for feature selection is designed. Datasets selected from public data sources are used to compare the proposed algorithm with existing algorithms. The experimental results confirm that the discrimination index-based algorithm yields superior performance compared with other classical algorithms.
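A minimal sketch of the index following the abstract: count the ordered sample pairs that a feature subset fails to separate at a given neighborhood radius, then take the logarithm of how far that falls short of all n² pairs. The Chebyshev distance and base-2 logarithm are assumptions for illustration.

```python
import math

def neighborhood_cardinality(data, features, radius):
    """|N_B|: number of ordered sample pairs whose Chebyshev distance on the
    chosen features is within `radius` (i.e., pairs the subset cannot tell apart)."""
    n = len(data)
    count = 0
    for i in range(n):
        for j in range(n):
            if max(abs(data[i][f] - data[j][f]) for f in features) <= radius:
                count += 1
    return count

def discrimination_index(data, features, radius=0.1):
    """H(B) = log2(n^2 / |N_B|): larger values mean the feature subset keeps
    more sample pairs apart, i.e., it has more distinguishing ability."""
    n = len(data)
    return math.log2(n * n / neighborhood_cardinality(data, features, radius))

# Feature 0 spreads the samples out; feature 1 is constant and useless.
data = [[0.0, 0.5], [0.3, 0.5], [0.6, 0.5], [0.9, 0.5]]
h_informative = discrimination_index(data, [0])
h_constant = discrimination_index(data, [1])
```

The constant feature scores zero because its neighborhood relation lumps every pair together, while the spread-out feature separates all distinct samples, which is the signal the greedy forward search ranks candidates by.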