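Hong-Bin Shen (沈红斌)
Ph.D., Professor, Doctoral Supervisor
School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University
Bioinformatics, pattern recognition and image processing, protein engineering, big data mining and understanding.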
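- Name: Hong-Bin Shen (沈红斌)
- Current status: Active researcher
- Supervisor role: Doctoral supervisor
- Degree: Ph.D.
- Professional title: Senior (Professor)
- Discipline: Pattern recognition
- Research interests: Bioinformatics, pattern recognition and image processing, protein engineering, big data mining and understanding.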
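Hong-Bin Shen is a Distinguished Professor and doctoral supervisor at Shanghai Jiao Tong University.
He received his Ph.D. in Pattern Recognition and Intelligent Systems from Shanghai Jiao Tong University in 2007 and then carried out postdoctoral research at Harvard Medical School. He returned to China in 2008 to work at Shanghai Jiao Tong University, where he has served successively as Associate Professor, Distinguished Research Fellow, and Professor. In 2012 he was invited to the University of Michigan as a visiting professor, and each year from 2014 to 2016 he was invited to the University of Bologna to teach the graduate course Pattern Recognition in Bioinformatics.
His research focuses on the theory and methods of biomolecular pattern recognition and on information feature analysis and processing for mining massive biological data. He has produced innovative results in theoretical algorithms and models for recognizing protein structure and function, and in mining functional modules of protein networks. He has published more than 60 SCI-indexed papers in journals including Nature Protocols, and his work has been featured as a cover story by the international journal Journal of Cellular Biochemistry. Building on this theoretical work, he has deployed more than 20 online bioinformatics web servers, which have been in continuous service for seven years and have been used more than three million times in total, with some academic impact.
He has led 10 research projects, including the National Science Fund for Distinguished Young Scholars and the Excellent Young Scientists Fund of the National Natural Science Foundation of China, and a Major Research Plan project. His honors include the National Excellent Doctoral Dissertation Award, the Ministry of Education New Century Excellent Talents program, the Shanghai Pujiang Talent program, the 8th Shanghai Young Science and Technology Talent award, the Shanghai Jiao Tong University Rising Star in Teaching award, the Shanghai Education System Rising Star in Research award, the Shanghai Youth May Fourth Medal, and the Shanghai Jiao Tong University Candlelight Award (second prize). A project he led was shortlisted for the SAIL Award, the top honor of the 2018 World Artificial Intelligence Innovation Competition. He teaches the undergraduate platform course "C++ Programming" and the undergraduate elective "Biometric Features and Pattern Analysis", and graduate students under his supervision have repeatedly received support from the Shanghai Jiao Tong University special fund for innovation capability training and honors such as Outstanding Graduate.
He serves as Chair of the Pattern Recognition Technical Committee of the Shanghai Association of Automation (2015-), Vice Chair of the Bioinformatics Technical Committee of the Shanghai Computer Society (2015-), member of the Artificial Intelligence and Pattern Recognition Technical Committee of the China Computer Federation (2013-), and member of the Pattern Recognition Technical Committee of the Chinese Association for Artificial Intelligence (2014-). He is an Associate Editor of BMC Bioinformatics and SCIENCE CHINA Information Sciences; an editorial board member of five international journals, including Computational and Structural Biotechnology Journal, Protein and Peptide Letters, and PLoS ONE; and an editorial board member of Chinese journals including 中国科学:信息科学 (Scientia Sinica Informationis), 生物信息学 (Chinese Journal of Bioinformatics), 电子与信息学报 (Journal of Electronics & Information Technology), and 计算机科学 (Computer Science).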
[Journal article] LabCaS for Ranking Potential Calpain Substrate Cleavage Sites from Amino Acid Sequence
Methods in Molecular Biology, 2019, (1915): 111-120
January 9, 2019
Calpains are a family of Ca2+-dependent cysteine proteases involved in many important biological processes, where they selectively cleave relevant substrates at specific cleavage sites to regulate the function of the substrate proteins. Presently, our knowledge about the function of calpains and the mechanism of substrate cleavage is still limited because the experimental determination and validation of calpain binding are usually laborious and expensive. This chapter describes LabCaS, an algorithm that is designed for predicting the calpain substrate cleavage sites from amino acid sequences. LabCaS is built on a conditional random field (CRF) statistical model, which trains the cleavage site prediction on multiple features of amino acid residue preference, solvent accessibility information, pair-wise alignment similarity score, secondary structure propensity, and physicochemical properties. Large-scale benchmark tests have shown that LabCaS can achieve a reliable recognition of the cleavage sites for most calpain proteins with an average AUC score of 0.862. Due to its speed and convenience of use, the protocol should prove useful in large-scale calpain-based function annotations of newly sequenced proteins. The online web server of LabCaS is freely available at http://www.csbio.sjtu.edu.cn/bioinf/LabCaS .
Calpain, Cleavage site prediction, Conditional random fields, Ensemble learning, Protease substrate recognition, Sequence labeling
[Journal article] Identifying RNA-binding proteins using multi-label deep learning
Science China (Information Sciences), 2019, (1): 217-219
January 1, 2019
Dear editor, RNA-binding proteins (RBPs) are involved in both transcriptional and post-transcriptional gene regulation, such as RNA splicing and localization. In addition, their dysregulations are closely associated with many diseases [1]. For example, mutations in the RBPs FUS and TDP-43 can cause amyotrophic lateral sclerosis [2].
[Journal article] AnnoFly: annotating Drosophila embryonic images based on an attention-enhanced RNN model
Bioinformatics, 2019, 35(16): 2834–2842
August 15, 2019
Motivation: In the post-genomic era, image-based transcriptomics have received huge attention, because the visualization of gene expression distribution is able to reveal spatial and temporal expression pattern, which is significantly important for understanding biological mechanisms. The Berkeley Drosophila Genome Project has collected a large-scale spatial gene expression database for studying Drosophila embryogenesis. Given the expression images, how to annotate them for the study of Drosophila embryonic development is the next urgent task. In order to speed up the labor-intensive labeling work, automatic tools are highly desired. However, conventional image annotation tools are not applicable here, because the labeling is at the gene-level rather than the image-level, where each gene is represented by a bag of multiple related images, showing a multi-instance phenomenon, and the image quality varies by image orientations and experiment batches. Moreover, different local regions of an image correspond to different CV annotation terms, i.e. an image has multiple labels. Designing an accurate annotation tool in such a multi-instance multi-label scenario is a very challenging task. Results: To address these challenges, we develop a new annotator for the fruit fly embryonic images, called AnnoFly. Driven by an attention-enhanced RNN model, it can weight images of different qualities, so as to focus on the most informative image patterns. We assess the new model on three standard datasets. The experimental results reveal that the attention-based model provides a transparent approach for identifying the important images for labeling, and it substantially enhances the accuracy compared with the existing annotation methods, including both single-instance and multi-instance learning methods. Availability and implementation: http://www.csbio.sjtu.edu.cn/bioinf/annofly/.
J. Chem. Inf. Model., 2019, 59(4): 1658–1667
January 24, 2019
The reconstruction of a three-dimensional model from cryo-electron microscopy (cryo-EM) two-dimensional images is currently a mainstream technology for revealing the structure of biomacromolecules. In this structure solution protocol, an important step is to identify each particle’s projection orientation. Because the obtained single-particle images are often too noisy, clustering is an important step to mitigate noise by averaging images within the same class. The core of clustering is to place similar cryo-EM images into the same class; hence, measurement of similarity between data samples is an essential element in any clustering algorithm. As the cryo-EM images are highly noisy, directly measuring the similarity of two images will be easily biased by the hidden noise. In this study, we propose a new network structural similarity metric-based clustering protocol NCEM for clustering the noisy cryo-EM images. We first construct an image complex network for all of the cryo-EM single-particle images, where each image is represented as a node in the network. Then the similarity between two images is refined from the network structural geometry. By extending the similarity measurement from two independent images to their corresponding neighboring sets in the network, this new NCEM has typical advantages over direct measurement of two images for its noise resistance by using the structural information on the network. Our experimental results for both synthetic and real data sets demonstrate the efficacy of the protocol.
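The idea of comparing two images through their neighboring sets in a network, rather than directly, can be illustrated with a small sketch. This is not the NCEM metric itself, which the abstract does not specify: the k-nearest-neighbor graph and the Jaccard overlap used here are stand-in assumptions, and the function names `knn_graph` and `neighbor_set_similarity` are hypothetical.

```python
def knn_graph(similarity, k):
    """For each node, return the set of its k most similar other nodes.

    `similarity` is a square matrix (list of lists) of pairwise scores,
    where a larger value means the two images look more alike.
    """
    n = len(similarity)
    neighbors = []
    for i in range(n):
        ranked = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: similarity[i][j],
            reverse=True,
        )
        neighbors.append(set(ranked[:k]))
    return neighbors


def neighbor_set_similarity(neighbors, i, j):
    """Jaccard overlap of the closed neighborhoods of nodes i and j.

    Assumption: Jaccard overlap is used only as an illustrative
    network-based similarity; it is not taken from the paper.
    """
    a = neighbors[i] | {i}
    b = neighbors[j] | {j}
    return len(a & b) / len(a | b)


# Toy data: six "images" summarized by one number each, forming two groups.
positions = [0, 1, 2, 50, 51, 52]
similarity = [[-abs(p - q) for q in positions] for p in positions]
neighbors = knn_graph(similarity, k=2)

same_group = neighbor_set_similarity(neighbors, 0, 1)
different_group = neighbor_set_similarity(neighbors, 0, 3)
```

Because the score depends on whole neighborhoods, noise that perturbs a single pairwise comparison has less influence than it would on a direct image-to-image measurement.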
[Journal article] Recent methodology progress of deep learning for RNA–protein interaction prediction
WIREs RNA, 2019: e1544
May 8, 2019
Interactions between RNAs and proteins play essential roles in many important biological processes. Benefitting from the advances of next generation sequencing technologies, hundreds of RNA‐binding proteins (RBP) and their associated RNAs have been revealed, which enables the large‐scale prediction of RNA–protein interactions using machine learning methods. Till now, a wide range of computational tools and pipelines have been developed, including deep learning models, which have achieved remarkable performance on the identification of RNA–protein binding affinities and sites. In this review, we provide an overview of the successful implementation of various deep learning approaches for predicting RNA–protein interactions, mainly focusing on the prediction of RNA–protein interaction pairs and RBP‐binding sites on RNAs. Furthermore, we discuss the advantages and disadvantages of these approaches, and highlight future perspectives on how to design better deep learning models. Finally, we suggest some promising future directions of computational tasks in the study of RNA–protein interactions, especially the interactions between noncoding RNAs and proteins.
RNA, 2019, 25: 1604-1615
September 19, 2019
Circular RNAs (circRNAs), with their crucial roles in gene regulation and disease development, have become rising stars in the RNA world. To understand the regulatory function of circRNAs, many studies focus on the interactions between circRNAs and RNA-binding proteins (RBPs). Recently, the abundant CLIP-seq experimental data has enabled the large-scale identification and analysis of circRNA–RBP interactions, whereas, as far as we know, no computational tool based on machine learning has been proposed yet. We develop CRIP (CircRNAs Interact with Proteins) for the prediction of RBP-binding sites on circRNAs using RNA sequences alone. CRIP consists of a stacked codon-based encoding scheme and a hybrid deep learning architecture, in which a convolutional neural network (CNN) learns high-level abstract features and a recurrent neural network (RNN) learns long dependency in the sequences. We construct 37 data sets including sequence fragments of binding sites on circRNAs, and each set corresponds to an RBP. The experimental results show that the new encoding scheme is superior to the existing feature representation methods for RNA sequences, and the hybrid network outperforms conventional classifiers by a large margin, where both the CNN and RNN components contribute to the performance improvement.
circular RNA, RNA–protein interaction, deep learning, codon-based encoding
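As a rough illustration of what a codon-based encoding of an RNA sequence might look like, the sketch below slides a window of three nucleotides along the sequence with a stride of one and one-hot encodes each triplet over the 64 possible codons. The abstract does not give the details of the CRIP scheme, so the stride, the one-hot representation, and the names `stacked_codons` and `one_hot_codons` are assumptions made for this example.

```python
from itertools import product

NUCLEOTIDES = "ACGU"
# Map each of the 64 possible triplets to a column index (AAA -> 0, ..., UUU -> 63).
CODON_INDEX = {
    "".join(triplet): index
    for index, triplet in enumerate(product(NUCLEOTIDES, repeat=3))
}


def stacked_codons(seq):
    """Split a sequence into overlapping triplets (window 3, stride 1)."""
    seq = seq.upper().replace("T", "U")
    return [seq[i:i + 3] for i in range(len(seq) - 2)]


def one_hot_codons(seq):
    """Return one 64-dimensional one-hot row per overlapping triplet.

    Triplets containing characters other than A, C, G, U are left as
    all-zero rows (an assumption for handling ambiguous bases).
    """
    rows = []
    for codon in stacked_codons(seq):
        row = [0] * len(CODON_INDEX)
        index = CODON_INDEX.get(codon)
        if index is not None:
            row[index] = 1
        rows.append(row)
    return rows


codons = stacked_codons("AUGGC")
encoded = one_hot_codons("AUGGC")
```

A matrix of this shape (sequence positions by 64) could then be fed to a convolutional layer followed by a recurrent layer, in the spirit of the hybrid architecture the abstract describes.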
PLoS Comput Biol, 2019, 15(9): e1007324
September 17, 2019
Reverse engineering of gene regulatory networks (GRNs) is a central task in systems biology. Most of the existing methods for GRN inference rely on gene co-expression analysis or TF-target binding information, where the determination of co-expression is often unreliable merely based on gene expression levels, and the TF-target binding data from high-throughput experiments may be noisy, leading to a high ratio of false links and missed links, especially for large-scale networks. In recent years, the microscopy images recording spatial gene expression have become a new resource in GRN reconstruction, as the spatial and temporal expression patterns contain abundant gene interaction information. To date, the spatial expression resources have been largely underexploited, and only a few traditional image processing methods have been employed in the image-based GRN reconstruction. Moreover, co-expression analysis using conventional measurements based on image similarity may be inaccurate, because it is the local-pattern consistency rather than global-image-similarity that determines gene-gene interactions. Here we present GripDL (Gene regulatory interaction prediction via Deep Learning), which incorporates high-confidence TF-gene regulation knowledge from previous studies, and constructs GRNs for Drosophila eye development based on Drosophila embryonic gene expression images. Benefitting from the powerful representation ability of deep neural networks and the supervision information of known interactions, the new method outperforms traditional methods by a large margin and reveals new intriguing knowledge about Drosophila eye development.
[Journal article] Deep Multi-View Feature Learning for EEG-Based Epileptic Seizure Detection
IEEE Transactions on Neural Systems and Rehabilitation Engineering, 2019, 27(10): 1962 - 197
September 11, 2019
Epilepsy is a neurological illness caused by abnormal discharge of brain neurons, where epileptic seizure can lead to life-threatening emergencies. By analyzing the electroencephalogram (EEG) signals of patients with epilepsy, their conditions can be monitored and seizures can be detected and treated in time. As the identification of effective features in EEG signals is important for accurate seizure detection, this paper proposes a multi-view deep feature extraction method in an attempt to achieve this goal. The method first uses fast Fourier transform (FFT) and wavelet packet decomposition (WPD) to construct the initial multi-view features. A convolutional neural network (CNN) is then used to automatically learn deep features from the initial multi-view features, which reduces the dimensionality and yields features with better seizure identification ability. Furthermore, the multi-view Takagi-Sugeno-Kang fuzzy system (MV-TSK-FS), an interpretable rule-based classifier, is used to construct a classification model with strong generalizability based on the deep multi-view features obtained. Experimental studies show that the classification accuracy of the proposed multi-view deep feature extraction method is at least 1% higher than that of common feature extraction methods such as principal component analysis (PCA), FFT and WPD. The classification accuracy is also at least 4% higher than the average accuracy achieved with single-view deep features.
iScience, 2019, 20: 265-277
September 16, 2019
MicroRNAs (miRNAs) play crucial roles in biological processes involved in diseases. The associations between diseases and protein-coding genes (PCGs) have been well investigated, and miRNAs interact with PCGs to trigger them to be functional. We present a computational method, DimiG, to infer miRNA-associated diseases using a semi-supervised Graph Convolutional Network model (GCN). DimiG uses a multi-label framework to integrate PCG-PCG interactions, PCG-miRNA interactions, PCG-disease associations, and tissue expression profiles. DimiG is trained on disease-PCG associations and an interaction network using a GCN, which is further used to score associations between diseases and miRNAs. We evaluate DimiG on a benchmark set from verified disease-miRNA associations. Our results demonstrate that DimiG outperforms the best unsupervised method and is comparable to two supervised methods. Three case studies of prostate cancer, lung cancer, and inflammatory bowel disease further demonstrate the efficacy of DimiG, where top miRNAs predicted by DimiG are supported by literature.
Biomedical Information Technology (Second Edition), 2020: 217-237
January 1, 2020
The contact map, a two-dimensional representation of three-dimensional protein structure, plays an important role in protein structure prediction because it brings crucial restraints on protein conformation exploration. In this chapter, we have summarized automatic methodology development for contact map prediction, whose models are generally categorized into three classes: correlated mutation analysis, direct-correlation analysis, and supervised learning models. The first two classes are unsupervised algorithms, and the last needs training samples extracted from experimentally solved protein structures. Protein residue contact prediction is an extremely imbalanced modeling problem in big data modeling, because the number of residue pairs increases quadratically with sequence length. It has hence triggered the recent successful applications of deep learning models to this topic. We also show in this chapter that the sequence-encoding features extracted from multiple sequence alignment are one of the keys for enhancing predictive performance. With the more accurate and faster models proposed for this challenging topic, predicted contact knowledge is expected to be capable of dramatically speeding up the protein 3-D structure prediction area by providing reliable and timely spatial restraints between residues.
Big data modeling, Contact map, Deep learning, Direct-correlation analysis, Imbalanced classification, Protein structure prediction, Transitive noise
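To make the notion of a contact map concrete, the sketch below builds one from three-dimensional coordinates and counts how many residue pairs are in contact. It assumes one representative atom per residue and the commonly used 8 Å distance cutoff; the cutoff, the toy coordinates, and the function names `contact_map` and `pair_statistics` are choices made for this example rather than details from the chapter.

```python
import math
from itertools import combinations


def contact_map(coords, threshold=8.0):
    """Return a symmetric 0/1 matrix of residue contacts.

    `coords` holds one (x, y, z) tuple per residue, in angstroms.
    Two residues are in contact when their distance is below `threshold`
    (8 angstroms is an assumed, commonly used cutoff).
    """
    n = len(coords)
    cmap = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        if math.dist(coords[i], coords[j]) < threshold:
            cmap[i][j] = cmap[j][i] = 1
    return cmap


def pair_statistics(cmap):
    """Return (number of residue pairs, number of pairs in contact)."""
    n = len(cmap)
    total_pairs = n * (n - 1) // 2
    contacts = sum(cmap[i][j] for i, j in combinations(range(n), 2))
    return total_pairs, contacts


# Toy chain: ten residues on a straight line, 3.8 angstroms apart.
toy_coords = [(3.8 * k, 0.0, 0.0) for k in range(10)]
toy_map = contact_map(toy_coords)
total_pairs, contacts = pair_statistics(toy_map)
```

A chain of length L has L(L-1)/2 residue pairs, while the number of contacts in a real protein grows far more slowly, which is the source of the class imbalance mentioned in the abstract.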