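Hong-Bin Shen, Scholar Homepage - Sciencepaper Online

Hong-Bin Shen

PhD, Professor, Doctoral Supervisor

School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University

Bioinformatics, pattern recognition and image processing, protein engineering, big data mining and understanding.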

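Name: Hong-Bin Shen
Current status: Active researcher
Supervisor role: Doctoral supervisor
Degree: PhD
Professional title: Senior, Professor
Subject area: Pattern recognition
Research interests: Bioinformatics, pattern recognition and image processing, protein engineering, big data mining and understanding.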

Biography

Hong-Bin Shen is a Distinguished Professor and doctoral supervisor at Shanghai Jiao Tong University.

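He received his PhD in Pattern Recognition and Intelligent Systems from Shanghai Jiao Tong University in 2007 and then did postdoctoral research at Harvard Medical School. After returning to China in 2008 he joined Shanghai Jiao Tong University, where he has served successively as associate professor, special research fellow, and professor. In 2012 he was a visiting professor at the University of Michigan, and from 2014 to 2016 he was invited each year to the University of Bologna to teach the graduate course Pattern Recognition in Bioinformatics.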

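His research focuses on the theory and methods of biomolecular pattern recognition and on feature analysis and processing for mining massive biological datasets. He has produced innovative results in theoretical algorithms and models for recognizing protein structure and function and in mining functional modules from protein networks. He has published more than 60 SCI-indexed papers in journals including Nature Protocols, and one of his studies was featured as a cover story by the Journal of Cellular Biochemistry. Building on this theoretical work, he has implemented more than 20 online bioinformatics web servers, which have been in continuous operation for seven years and have been used more than three million times in total, reflecting their academic impact.
He has led 10 research projects, including grants from the National Science Fund for Distinguished Young Scholars and the Excellent Young Scientists Fund of the National Natural Science Foundation of China (NSFC), as well as an NSFC Major Research Plan project. His honors include the National Excellent Doctoral Dissertation award, the Ministry of Education's Program for New Century Excellent Talents, the Shanghai Pujiang Talent Program, the 8th Shanghai Youth Science and Technology Talent award, SJTU Teaching Rising Star, Shanghai Education System Research Rising Star, the Shanghai Youth May Fourth Medal, and the SJTU Candlelight Award (second prize). A project he led was shortlisted for the SAIL Award, the highest honor of the 2018 World Artificial Intelligence Innovation Competition. He teaches the undergraduate platform course "C++ Programming" and the undergraduate elective "Biological Features and Pattern Analysis", and graduate students under his supervision have repeatedly received SJTU special funding for innovation capability development and honors such as Outstanding Graduate.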

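He also serves as Director of the Pattern Recognition Committee of the Shanghai Association of Automation (2015-), Deputy Director of the Bioinformatics Committee of the Shanghai Computer Society (2015-), member of the Artificial Intelligence and Pattern Recognition Technical Committee of the China Computer Federation (2013-), and member of the Pattern Recognition Technical Committee of the Chinese Association for Artificial Intelligence (2014-). He is an associate editor of BMC Bioinformatics and SCIENCE CHINA Information Sciences; an editorial board member of five international journals, including Computational and Structural Biotechnology Journal, Protein and Peptide Letters, and PLoS ONE; and an editorial board member of Chinese journals including Scientia Sinica Informationis, Chinese Journal of Bioinformatics, Journal of Electronics & Information Technology, and Computer Science.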

Publications

Uploaded: 2020-11-30

[Journal Article] LabCaS for Ranking Potential Calpain Substrate Cleavage Sites from Amino Acid Sequence

Methods in Molecular Biology, 2019, 1915: 111-120

2019-01-09

Abstract

Calpains are a family of Ca²⁺-dependent cysteine proteases involved in many important biological processes, in which they selectively cleave relevant substrates at specific cleavage sites to regulate the function of the substrate proteins. Our knowledge of calpain function and of the mechanism of substrate cleavage is still limited, because experimentally determining and validating calpain binding is usually laborious and expensive. This chapter describes LabCaS, an algorithm designed to predict calpain substrate cleavage sites from amino acid sequences. LabCaS is built on a conditional random field (CRF) statistical model, which learns cleavage-site prediction from multiple features: amino acid residue preference, solvent accessibility, pairwise alignment similarity score, secondary structure propensity, and physicochemical properties. Large-scale benchmark tests have shown that LabCaS reliably recognizes the cleavage sites for most calpain proteins, with an average AUC score of 0.862. Because it is fast and easy to use, the protocol should be useful for large-scale calpain-based functional annotation of newly sequenced proteins. The LabCaS web server is freely available at http://www.csbio.sjtu.edu.cn/bioinf/LabCaS.

Keywords: Calpain, Cleavage site prediction, Conditional random fields, Ensemble learning, Protease substrate recognition, Sequence labeling
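LabCaS's accuracy above is summarized as an average AUC (area under the ROC curve). As a minimal sketch, not taken from LabCaS itself, the AUC for per-residue cleavage-site scores can be computed as the fraction of (true site, non-site) pairs in which the true site scores higher. The scores and labels below are made up for illustration.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney pairwise statistic.

    scores: predicted scores, higher meaning "more likely a cleavage site"
    labels: 1 for a true cleavage site, 0 otherwise
    Tied positive/negative pairs count as half a correct ordering.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical per-residue scores for five positions of one substrate.
scores = [0.9, 0.8, 0.3, 0.6, 0.1]
labels = [1, 0, 0, 1, 0]
print(round(roc_auc(scores, labels), 3))  # 0.833
```

The pairwise loop is O(P·N); for large benchmarks a rank-based implementation, such as scikit-learn's `roc_auc_score`, is the usual choice.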

Uploaded: 2020-11-30

[Journal Article] Identifying RNA-binding proteins using multi-label deep learning

Science China (Information Sciences), 2019, (1): 217-219

2019-01-01

Abstract

Dear editor, RNA-binding proteins (RBPs) are involved in both transcriptional and post-transcriptional gene regulation, such as RNA splicing and localization. In addition, their dysregulation is closely associated with many diseases [1]. For example, mutations in the RBPs FUS and TDP-43 can cause amyotrophic lateral sclerosis [2].

Uploaded: 2020-11-30

[Journal Article] AnnoFly: annotating Drosophila embryonic images based on an attention-enhanced RNN model

Bioinformatics, 2019, 35(16): 2834–2842

2019-08-15

Abstract

Motivation: In the post-genomic era, image-based transcriptomics has received considerable attention, because visualizing the distribution of gene expression reveals spatial and temporal expression patterns, which are important for understanding biological mechanisms. The Berkeley Drosophila Genome Project has collected a large-scale spatial gene expression database for studying Drosophila embryogenesis. Given these expression images, the next urgent task is to annotate them for the study of Drosophila embryonic development, and automatic tools are highly desirable to speed up this labor-intensive labeling. However, conventional image annotation tools are not applicable here, because labeling is done at the gene level rather than the image level: each gene is represented by a bag of multiple related images (a multi-instance setting), and image quality varies with image orientation and experimental batch. Moreover, different local regions of an image correspond to different controlled vocabulary (CV) annotation terms, i.e. an image has multiple labels. Designing an accurate annotation tool for such a multi-instance, multi-label scenario is very challenging.

Results: To address these challenges, we develop AnnoFly, a new annotator for fruit fly embryonic images. Driven by an attention-enhanced RNN model, it weights images of different quality so as to focus on the most informative image patterns. We assess the new model on three standard datasets. The experimental results show that the attention-based model provides a transparent way to identify the images most important for labeling, and that it substantially improves accuracy over existing annotation methods, including both single-instance and multi-instance learning methods.

Availability and implementation: http://www.csbio.sjtu.edu.cn/bioinf/annofly/.
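AnnoFly uses attention to weight images of differing quality within a gene's bag. The NumPy sketch below shows only the general idea, attention-based multi-instance pooling; it is not AnnoFly's attention-enhanced RNN, and the tanh scoring function, dimensions, and random parameters are assumptions for illustration.

```python
import numpy as np


def attention_pool(instances, w, v):
    """Attention-weighted pooling of a bag of instance feature vectors.

    instances: (n, d) array, one row per image in a gene's bag
    w: (d, h) projection matrix and v: (h,) scoring vector; both would be
       learned during training but are random here for illustration
    Returns the (d,) bag embedding and the (n,) attention weights.
    """
    hidden = np.tanh(instances @ w)              # (n, h)
    logits = hidden @ v                          # one score per image
    logits = logits - logits.max()               # stabilise the softmax
    weights = np.exp(logits) / np.exp(logits).sum()
    bag = weights @ instances                    # weighted average, (d,)
    return bag, weights


rng = np.random.default_rng(0)
images = rng.normal(size=(4, 8))   # a bag of 4 images with 8-dim features
w = rng.normal(size=(8, 5))
v = rng.normal(size=5)
bag, weights = attention_pool(images, w, v)
print(weights.round(3), bag.shape)
```

Images with larger weights dominate the bag embedding, and the weights themselves can be inspected to see which images the model relied on.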

Uploaded: 2020-11-30

[Journal Article] Clustering Enhancement of Noisy Cryo-Electron Microscopy Single-Particle Images with a Network Structural Similarity Metric

J. Chem. Inf. Model., 2019, 59(4): 1658–1667

2019-01-24

Abstract

The reconstruction of a three-dimensional model from two-dimensional cryo-electron microscopy (cryo-EM) images is currently a mainstream technology for revealing the structure of biomacromolecules. In this structure-solution protocol, an important step is identifying each particle's projection orientation. Because single-particle images are often very noisy, clustering is used to mitigate noise by averaging images within the same class. The core of clustering is to place similar cryo-EM images into the same class, so measuring the similarity between data samples is an essential element of any clustering algorithm. Because cryo-EM images are highly noisy, directly measuring the similarity of two images is easily biased by the hidden noise. In this study, we propose NCEM, a clustering protocol for noisy cryo-EM images based on a new network structural similarity metric. We first construct an image complex network for all of the cryo-EM single-particle images, in which each image is a node. The similarity between two images is then refined using the network's structural geometry. By extending the similarity measurement from two independent images to their neighboring sets in the network, NCEM is more resistant to noise than direct comparison of two images, because it exploits the structural information in the network. Experimental results on both synthetic and real datasets demonstrate the efficacy of the protocol.
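NCEM compares images through their neighbourhoods in an image network rather than directly. NCEM's own metric is defined in the paper; the sketch below substitutes a simpler, well-known stand-in, shared-nearest-neighbour (Jaccard) similarity on a k-NN graph, to show why comparing neighbourhoods damps noise in any single pairwise distance. The toy one-dimensional "images" are hypothetical.

```python
import numpy as np


def knn_sets(dist, k):
    """For each node, the set {node} plus its k nearest neighbours."""
    sets = []
    for i in range(dist.shape[0]):
        order = [int(j) for j in np.argsort(dist[i]) if j != i]
        sets.append({i, *order[:k]})
    return sets


def neighbourhood_similarity(dist, k):
    """Jaccard overlap of k-NN neighbourhoods in an image graph.

    dist: (n, n) symmetric matrix of pairwise image distances.
    Two images count as similar when their neighbourhoods overlap.
    """
    nbrs = knn_sets(dist, k)
    n = len(nbrs)
    sim = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            sim[i, j] = len(nbrs[i] & nbrs[j]) / len(nbrs[i] | nbrs[j])
    return sim


# Toy 1-D "images": two well-separated groups of three.
x = np.array([0.0, 0.1, 0.2, 5.0, 5.1, 5.2])
dist = np.abs(x[:, None] - x[None, :])
sim = neighbourhood_similarity(dist, k=2)
print(sim[0, 1], sim[0, 3])  # 1.0 0.0
```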

Uploaded: 2020-11-30

[Journal Article] Recent methodology progress of deep learning for RNA–protein interaction prediction

WIREs RNA, 2019, e1544

2019-05-08

Abstract

Interactions between RNAs and proteins play essential roles in many important biological processes. Thanks to advances in next-generation sequencing technologies, hundreds of RNA-binding proteins (RBPs) and their associated RNAs have been revealed, enabling large-scale prediction of RNA–protein interactions with machine learning methods. To date, a wide range of computational tools and pipelines have been developed, including deep learning models, which have achieved remarkable performance in identifying RNA–protein binding affinities and sites. In this review, we provide an overview of successful applications of various deep learning approaches to predicting RNA–protein interactions, focusing mainly on predicting RNA–protein interaction pairs and RBP-binding sites on RNAs. Furthermore, we discuss the advantages and disadvantages of these approaches and highlight perspectives on how to design better deep learning models. Finally, we suggest some promising future directions for computational tasks in the study of RNA–protein interactions, especially interactions between noncoding RNAs and proteins.

Uploaded: 2020-11-27

[Journal Article] CRIP: predicting circRNA–RBP-binding sites using a codon-based encoding and hybrid deep neural networks

RNA, 2019, 25: 1604-1615

2019-09-19

Abstract

Circular RNAs (circRNAs), with their crucial roles in gene regulation and disease development, have become rising stars in the RNA world. To understand the regulatory function of circRNAs, many studies focus on the interactions between circRNAs and RNA-binding proteins (RBPs). Recently, abundant CLIP-seq experimental data have enabled the large-scale identification and analysis of circRNA–RBP interactions, yet, to our knowledge, no machine learning-based computational tool had been proposed for this task. We develop CRIP (CircRNAs Interact with Proteins) to predict RBP-binding sites on circRNAs from RNA sequences alone. CRIP consists of a stacked codon-based encoding scheme and a hybrid deep learning architecture, in which a convolutional neural network (CNN) learns high-level abstract features and a recurrent neural network (RNN) learns long-range dependencies in the sequences. We construct 37 datasets of sequence fragments containing binding sites on circRNAs, each corresponding to one RBP. The experimental results show that the new encoding scheme is superior to existing feature representation methods for RNA sequences, and that the hybrid network outperforms conventional classifiers by a large margin, with both the CNN and RNN components contributing to the improvement.

Keywords: circular RNA, RNA–protein interaction, deep learning, codon-based encoding
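CRIP encodes RNA with a stacked codon-based scheme whose exact stacking is described in the paper. As an assumption-laden sketch, the simplest codon-level encoding slides a 3-nt window along the sequence and one-hot encodes which of the 64 possible codons it sees; the resulting (length - 2) x 64 matrix is the kind of input a CNN-RNN hybrid could consume.

```python
from itertools import product

# All 64 codons over the RNA alphabet, in a fixed (lexicographic) order.
CODONS = ["".join(c) for c in product("ACGU", repeat=3)]
CODON_INDEX = {codon: i for i, codon in enumerate(CODONS)}


def codon_one_hot(seq):
    """Encode an RNA sequence as overlapping codon (3-mer) one-hot vectors.

    Returns len(seq) - 2 rows of 64 ints. T is read as U; windows that
    contain any other character (e.g. N) become all-zero rows.
    """
    seq = seq.upper().replace("T", "U")
    rows = []
    for i in range(len(seq) - 2):
        row = [0] * len(CODONS)
        idx = CODON_INDEX.get(seq[i:i + 3])
        if idx is not None:
            row[idx] = 1
        rows.append(row)
    return rows


encoded = codon_one_hot("AUGGCU")
print(len(encoded), [row.index(1) for row in encoded])  # 4 [14, 58, 41, 39]
```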

Uploaded: 2020-11-27

[Journal Article] Predicting gene regulatory interactions based on spatial gene expression data and deep learning

PLoS Comput Biol, 2019, 15(9): e1007324

2019-09-17

Abstract

Reverse engineering of gene regulatory networks (GRNs) is a central task in systems biology. Most existing methods for GRN inference rely on gene co-expression analysis or TF-target binding information; co-expression determined from gene expression levels alone is often unreliable, and TF-target binding data from high-throughput experiments may be noisy, leading to many false and missed links, especially in large-scale networks. In recent years, microscopy images recording spatial gene expression have become a new resource for GRN reconstruction, as spatial and temporal expression patterns contain abundant information on gene interactions. So far, these spatial expression resources have been largely underexploited, and only a few traditional image processing methods have been applied to image-based GRN reconstruction. Moreover, co-expression analysis using conventional image-similarity measures may be inaccurate, because gene-gene interactions are determined by local-pattern consistency rather than global image similarity. Here we present GripDL (Gene regulatory interaction prediction via Deep Learning), which incorporates high-confidence TF-gene regulation knowledge from previous studies and constructs GRNs for Drosophila eye development from Drosophila embryonic gene expression images. Benefiting from the representational power of deep neural networks and the supervision provided by known interactions, the new method outperforms traditional methods by a large margin and reveals intriguing new knowledge about Drosophila eye development.

Uploaded: 2020-11-27

[Journal Article] Deep Multi-View Feature Learning for EEG-Based Epileptic Seizure Detection

IEEE Transactions on Neural Systems and Rehabilitation Engineering, 2019, 27(10): 1962-197

2019-09-11

Abstract

Epilepsy is a neurological illness caused by abnormal discharge of brain neurons, and epileptic seizures can lead to life-threatening emergencies. By analyzing the electroencephalogram (EEG) signals of patients with epilepsy, their condition can be monitored and seizures can be detected early enough for timely intervention. Because identifying effective features in EEG signals is important for accurate seizure detection, this paper proposes a multi-view deep feature extraction method to achieve this goal. The method first uses the fast Fourier transform (FFT) and wavelet packet decomposition (WPD) to construct the initial multi-view features. A convolutional neural network (CNN) then automatically learns deep features from these initial multi-view features, reducing their dimensionality and yielding features with better seizure identification ability. Furthermore, the multi-view Takagi-Sugeno-Kang fuzzy system (MV-TSK-FS), an interpretable rule-based classifier, is used to build a classification model with strong generalizability from the resulting deep multi-view features. Experimental studies show that the classification accuracy of the proposed multi-view deep feature extraction method is at least 1% higher than that of common feature extraction methods such as principal component analysis (PCA), FFT, and WPD, and at least 4% higher than the average accuracy achieved with single-view deep features.
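The first step described above builds an FFT view of the EEG signal. The paper's exact FFT features are not given in the abstract; a common choice, assumed for this sketch, is the average spectral power in the conventional delta, theta, alpha, and beta bands. The WPD view, the CNN, and the MV-TSK-FS classifier are not shown, and the sampling rate and signal are synthetic.

```python
import numpy as np


def fft_band_powers(signal, fs, bands):
    """Mean FFT power of a 1-D EEG segment inside each frequency band.

    signal: 1-D array of samples; fs: sampling rate in Hz
    bands: list of (low_hz, high_hz) pairs, low inclusive, high exclusive
    """
    power = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    return np.array([power[(freqs >= lo) & (freqs < hi)].mean()
                     for lo, hi in bands])


fs = 256                                        # hypothetical sampling rate
t = np.arange(fs) / fs                          # one second of samples
eeg = np.sin(2 * np.pi * 10 * t)                # synthetic 10 Hz rhythm
bands = [(1, 4), (4, 8), (8, 13), (13, 30)]     # delta, theta, alpha, beta
print(fft_band_powers(eeg, fs, bands).argmax())  # 2, i.e. the alpha band
```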

Uploaded: 2020-11-27

[Journal Article] Inferring Disease-Associated MicroRNAs Using Semi-supervised Multi-Label Graph Convolutional Networks

iScience, 2019, 20: 265-277

2019-09-16

Abstract

MicroRNAs (miRNAs) play crucial roles in biological processes involved in diseases. The associations between diseases and protein-coding genes (PCGs) have been well investigated, and miRNAs interact with PCGs to trigger their functions. We present DimiG, a computational method that infers miRNA-associated diseases using a semi-supervised graph convolutional network (GCN) model. DimiG uses a multi-label framework to integrate PCG-PCG interactions, PCG-miRNA interactions, PCG-disease associations, and tissue expression profiles. DimiG is trained with a GCN on disease-PCG associations and an interaction network, and the trained model is then used to score associations between diseases and miRNAs. We evaluate DimiG on a benchmark set of verified disease-miRNA associations. Our results demonstrate that DimiG outperforms the best unsupervised method and is comparable to two supervised methods. Three case studies, of prostate cancer, lung cancer, and inflammatory bowel disease, further demonstrate the efficacy of DimiG: the top miRNAs it predicts are supported by the literature.
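DimiG is built on graph convolutional networks. A minimal NumPy sketch of one GCN layer with the widely used propagation rule of Kipf and Welling, H' = ReLU(D^-1/2 (A + I) D^-1/2 H W), follows; DimiG's multi-label output layer, semi-supervised loss, and real PCG/miRNA network are not reproduced, and the toy graph and random weights are assumptions.

```python
import numpy as np


def normalized_adjacency(adj):
    """Symmetric normalisation D^-1/2 (A + I) D^-1/2 with self-loops added."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(a_norm, features, weight):
    """One graph convolution: aggregate normalised neighbour features,
    project them with `weight`, then apply ReLU."""
    return np.maximum(0.0, a_norm @ features @ weight)


# Toy interaction network of 4 nodes on a path 0-1-2-3.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
features = np.eye(4)                                   # one-hot node features
weight = np.random.default_rng(0).normal(size=(4, 2))  # would be learned
hidden = gcn_layer(normalized_adjacency(adj), features, weight)
print(hidden.shape)  # (4, 2)
```

Stacking such layers lets label information from disease-annotated nodes propagate to unlabelled neighbours, which is what makes the semi-supervised setting workable.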

Uploaded: 2020-11-27

[Journal Article] Chapter Seven - Artificial intelligence in bioinformatics: Automated methodology development for protein residue contact map prediction

Biomedical Information Technology (Second Edition), 2020, 217-237

2020-01-01

Abstract

The contact map, a two-dimensional representation of three-dimensional protein structure, plays an important role in protein structure prediction because it provides crucial restraints for exploring protein conformations. In this chapter, we summarize automatic methodology development for contact map prediction, whose models fall into three general classes: correlated mutation analysis, direct-correlation analysis, and supervised learning models. The first two classes are unsupervised algorithms, while the last requires training samples extracted from experimentally solved protein structures. Protein residue contact prediction is an extremely imbalanced big-data modeling problem, because the number of residue pairs grows quadratically with sequence length while only a small fraction of pairs are in contact. This has motivated the recent successful application of deep learning models to the topic. We also show in this chapter that sequence-encoding features extracted from multiple sequence alignments are one of the keys to enhancing predictive performance. As more accurate and faster models are proposed for this challenging topic, predicted contacts are expected to dramatically speed up protein 3-D structure prediction by providing reliable and timely spatial restraints between residues.

Keywords: Big data modeling, Contact map, Deep learning, Direct-correlation analysis, Imbalanced classification, Protein structure prediction, Transitive noise
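A worked illustration of the imbalance argument above: a protein of length L has on the order of L²/2 residue pairs to classify, while the number of true contacts typically grows only roughly linearly with L. The sketch assumes the common convention that a contact is a pair of C-beta atoms closer than 8 Å with sequence separation of at least 6; the chapter's own thresholds may differ.

```python
import numpy as np


def contact_map(coords, threshold=8.0, min_separation=6):
    """Boolean (L, L) contact map from one representative atom per residue.

    coords: (L, 3) coordinates in angstroms (commonly C-beta atoms)
    A pair is a contact if its distance is below `threshold` and the two
    residues are at least `min_separation` positions apart in sequence.
    """
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    idx = np.arange(len(coords))
    separated = np.abs(idx[:, None] - idx[None, :]) >= min_separation
    return (dist < threshold) & separated


def candidate_pairs(length, min_separation=6):
    """Number of residue pairs i < j with j - i >= min_separation."""
    m = length - min_separation
    return m * (m + 1) // 2 if m > 0 else 0


# Candidate pairs grow quadratically: doubling L roughly quadruples them.
print(candidate_pairs(100), candidate_pairs(200))  # 4465 18915
```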

Uploaded: 2020-11-27

[Journal Article] Ab-initio Membrane Protein Amphipathic Helix Structure Prediction Using Deep Neural Networks

IEEE/ACM Transactions on Computational Biology and Bioinformatics (Early Access), 2020, 1-1

2020-10-07

Abstract

The amphipathic helix (AH) features segregated polar and nonpolar residues and plays important roles in many membrane-associated biological processes by interacting with both the lipid and the soluble phases. Although the AH structure has been known for a long time, few ab initio machine learning-based prediction models have been reported, owing to the limited amount of training data. In this study, we report a new deep learning-based prediction model composed of a residual neural network and an uneven-thresholds decision algorithm. It is built on 121 membrane proteins, comprising 51,640 residue samples in total, curated from an up-to-date membrane protein structure database. Through a rigorous 10-fold nested cross-validation experiment, we demonstrate that our model outperforms the state-of-the-art approaches in this field, opening a new avenue for accurately predicting AHs. Analyses of the contributions of input residues, together with case studies, further reveal the high interpretability and generalizability of our model.
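The evaluation above uses 10-fold nested cross-validation. A minimal sketch of how such splits can be generated follows, assuming indices refer to whole proteins (121 here) so that residues of one protein never appear on both sides of a split; whether the paper splits by protein or by residue, and how it shuffles, is not stated in the abstract. In practice the indices would be shuffled before folding.

```python
def kfold_indices(n, k):
    """Split range(n) into k contiguous folds whose sizes differ by at most 1."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def nested_cv_splits(n, outer_k=10, inner_k=10):
    """Yield (outer_test, inner_train, inner_val) index lists.

    The outer test fold is held out completely; model selection (for example
    choosing decision thresholds) uses only inner folds built from the rest.
    """
    for test in kfold_indices(n, outer_k):
        held_out = set(test)
        train = [i for i in range(n) if i not in held_out]
        for val_pos in kfold_indices(len(train), inner_k):
            val = [train[p] for p in val_pos]
            val_set = set(val)
            inner_train = [i for i in train if i not in val_set]
            yield test, inner_train, val


splits = list(nested_cv_splits(121))  # one index per protein (assumed)
print(len(splits))  # 100 (10 outer folds x 10 inner folds)
```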

Uploaded: 2020-11-27

[Journal Article] Consistency and variation of protein subcellular location annotations

Proteins: Structure, Function, and Bioinformatics, 2020

2020-09-16

Abstract

A major challenge for protein databases is reconciling information from diverse sources. This is especially difficult when some of the information is secondary, human-interpreted data rather than primary data. For example, the Swiss-Prot database contains curated subcellular location annotations based on predictions from protein sequence, statements in scientific articles, and published experimental evidence. The Human Protein Atlas (HPA) consists of millions of high-resolution microscopic images showing protein spatial distribution at the cellular and subcellular levels. These images are manually annotated with protein subcellular locations by trained experts, and the annotations can capture variation in subcellular location across cell lines, tissues, and tissue states. A systematic investigation of the consistency between HPA and Swiss-Prot subcellular location assignments, which is important for understanding and using protein location data from the two databases, has not been described previously. In this paper, we quantitatively evaluate the consistency of subcellular location annotations between HPA and Swiss-Prot at multiple levels, as well as the variation of protein locations across cell lines and tissues. Our results show that the annotations of the two databases differ significantly in many cases, and we propose procedures for deriving and integrating protein subcellular location data. We also find that proteins with highly variable locations are more likely to be disease biomarkers, which supports incorporating subcellular location analysis into protein biomarker identification and screening.
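One simple way to quantify agreement at the protein level, used here purely as an illustration (the paper's own measures and multi-level comparison are not reproduced), is the Jaccard index of the two location sets. The protein IDs below are hypothetical, and real HPA and Swiss-Prot terms would first need mapping to a shared vocabulary.

```python
def location_agreement(db_a, db_b):
    """Per-protein Jaccard agreement between two location annotations.

    db_a, db_b: dicts mapping a protein ID to a set of location terms.
    Only proteins annotated in both sources are compared.
    """
    shared = db_a.keys() & db_b.keys()
    return {p: len(db_a[p] & db_b[p]) / len(db_a[p] | db_b[p])
            for p in sorted(shared) if db_a[p] | db_b[p]}


# Hypothetical annotations already mapped to a common vocabulary.
hpa = {"P1": {"Nucleus"}, "P2": {"Cytosol", "Nucleus"}, "P3": {"Golgi apparatus"}}
swissprot = {"P1": {"Nucleus"}, "P2": {"Cytosol"}}
print(location_agreement(hpa, swissprot))  # {'P1': 1.0, 'P2': 0.5}
```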

Uploaded: 2020-11-27

[Journal Article] A New Protocol for Atomic-Level Protein Structure Modeling and Refinement Using Low-to-Medium Resolution Cryo-EM Density Maps

Journal of Molecular Biology, 2020, 432(19): 5365-5377

2020-09-04

Abstract

The rapid progress of cryo-electron microscopy (cryo-EM) in structural biology has raised an urgent need for robust methods to create and refine atomic-level structural models using low-resolution EM density maps. We propose a new protocol to create initial models using I-TASSER protein structure prediction, followed by EM density map-based rigid-body structure fitting, flexible fragment adjustment and atomic-level structure refinement simulations. The protocol was tested on a large set of 285 non-homologous proteins and generated structural models with correct folds for 260 proteins, where 28% had RMSDs below 2 Å. Compared to other state-of-the-art methods, the major advantage of the proposed pipeline lies in the uniform structure prediction and refinement protocol, as well as the extensive structural re-assembly simulations, which allow for low-to-medium resolution EM density map-guided structure modeling starting from amino acid sequences. Interestingly, the quality of both the image fitting and subsequent structure refinement was found to be strongly correlated with the correctness of the initial I-TASSER models; this is mainly due to the different correlation patterns observed between force field and structural quality for the models with template modeling score (or TM-score, a metric quantifying the similarity of models to the native) above and below a threshold of 0.5. Overall, the results demonstrate a new avenue that is ready to use for large-scale cryo-EM-based structure modeling and atomic-level density map-guided structure refinement.
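The 0.5 threshold mentioned above refers to the standard TM-score of Zhang and Skolnick. For reference, the sketch below evaluates the score for a single, already computed superposition, using the standard length-dependent scale d0 = 1.24 (L - 15)^(1/3) - 1.8; tools such as TM-score and TM-align additionally search for the superposition that maximizes it, which is not done here.

```python
def tm_score(distances, target_length):
    """TM-score of a model for one fixed superposition onto the native.

    distances: distances (angstroms) between aligned model/native residue
        pairs after superposition; unaligned residues are simply omitted
    target_length: number of residues in the native structure (L > 15)
    """
    if target_length <= 15:
        raise ValueError("the d0 formula below assumes target_length > 15")
    d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


print(tm_score([0.0] * 100, 100))  # 1.0: every residue superposes exactly
print(tm_score([0.0] * 50, 100))   # 0.5: only half of the residues aligned
```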

Uploaded: 2020-11-27

[Journal Article] Signal-3L 3.0: Improving Signal Peptide Prediction through Combining Attention Deep Learning with Window-Based Scoring

J. Chem. Inf. Model., 2020, 60(7): 3679–3686

2020-06-05

Abstract

Signal peptides play an important role in guiding and translocating transmembrane and secreted proteins. With the explosive growth of protein sequences in recent years, computational prediction of signal peptides and their cleavage sites from protein sequences is highly desirable. In this work, we present Signal-3L 3.0, an improved approach for signal peptide recognition and cleavage-site prediction that uses a 3-layer hybrid method integrating deep learning algorithms with window-based scoring. The Signal-3L 3.0 prediction engine has three main components: (1) a deep bidirectional long short-term memory (Bi-LSTM) network with soft self-attention learns abstract features from sequences to determine whether a query protein contains a signal peptide; (2) a window-based screening method using statistical propensities generates a set of candidate cleavage sites; and (3) the prediction of a conditional random field with a hybrid convolutional neural network (CNN) and Bi-LSTM is fused with the window-based score to identify the final, unique cleavage site. Experimental results on benchmark datasets show that the new deep learning-driven Signal-3L 3.0 yields promising performance. The Signal-3L 3.0 online server is available at http://www.csbio.sjtu.edu.cn/bioinf/Signal-3L/.
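Step (2) above screens candidate cleavage sites with window-based propensity statistics. The actual Signal-3L 3.0 statistics and their fusion with the CRF output are defined in the paper; this sketch only shows a generic log-odds window score, and its propensity table is invented for illustration, loosely echoing signal peptidase's known preference for small residues at positions -3 and -1.

```python
import math


def window_score(seq, site, propensity, background, offsets=range(-3, 2)):
    """Log-odds score of a candidate cleavage site from a residue window.

    site: index of the first residue after the cleavage point.
    propensity[offset][aa]: frequency of aa at that offset in known sites.
    background[aa]: overall frequency of aa.
    Residues without both a propensity and a background value add 0.
    """
    score = 0.0
    for off in offsets:
        pos = site + off
        if 0 <= pos < len(seq):
            p = propensity.get(off, {}).get(seq[pos])
            q = background.get(seq[pos])
            if p and q:
                score += math.log(p / q)
    return score


# Hypothetical table favouring Ala at -3 and -1.
propensity = {-3: {"A": 0.4}, -1: {"A": 0.5}}
background = {"A": 0.1}
seq = "MLLAGAQLL"
best = max(range(1, len(seq)),
           key=lambda s: window_score(seq, s, propensity, background))
print(best)  # 6: cleavage between A (index 5) and Q (index 6)
```

Ranking every position this way yields the candidate set; a real predictor would keep the top few candidates and defer the final choice to the learned model.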

Uploaded: 2020-11-27

[Journal Article] RNA-binding protein recognition based on multi-view deep feature and multi-label learning

Briefings in Bioinformatics, 2020, bbaa174

2020-08-17

Abstract

RNA-binding proteins (RBPs) are a class of proteins that bind to and accompany RNAs in regulating biological processes. An RBP may have multiple target RNAs, and its aberrant expression can cause multiple diseases. Existing methods predict whether a specific RBP binds an RNA, and the position of the binding site, using binary classification models. However, most of them do not take into account the binding similarity and correlation between different RBPs. Methods that use multiple labels and long short-term memory (LSTM) networks have been proposed to account for binding similarity between RBPs, but their accuracy remains low because of insufficient feature learning and multi-label learning on RNA sequences. In response, this paper proposes the concept of an RNA-RBP Binding Network (RRBN) to provide theoretical support for multi-label learning to identify RBPs that can bind to RNAs. Experiments show that RRBN information can significantly improve the prediction of unknown RNA-RBP interactions. To further improve prediction accuracy, we present iDeepMV, a novel computational method that integrates multi-view deep learning into the multi-label learning framework. Taking the RNA sequences as the original view, iDeepMV first derives amino acid sequence and dipeptide component views from them. Deep neural network models are then designed for the respective views to perform deep feature learning. The extracted deep features are fed into multi-label classifiers trained with the RNA-RBP interaction information for the three views. Finally, a voting mechanism combines the results of the multi-label classifiers into a final decision. Our experimental results show that the prediction performance of iDeepMV, which combines multi-view deep feature learning models with RNA-RBP interaction information, is significantly better than that of state-of-the-art methods.
iDeepMV is freely available at http://www.csbio.sjtu.edu.cn/bioinf/iDeepMV for academic use. The code is freely available at http://github.com/uchihayht/iDeepMV.

Keywords: multi RNA-binding proteins recognition, multi-view deep feature learning, multi-label learning
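iDeepMV's final step combines the three views' multi-label predictions by voting. The paper's exact voting rule is not given in the abstract; the sketch below assumes a simple per-label majority vote over binary predictions, with made-up outputs for four RBP labels.

```python
def majority_vote(view_predictions):
    """Combine binary multi-label predictions from several views.

    view_predictions: list of equal-length 0/1 lists, one per view, where
    position j says whether that view predicts binding by RBP j.
    A label is accepted when more than half of the views predict it.
    """
    n_views = len(view_predictions)
    return [int(sum(votes) * 2 > n_views) for votes in zip(*view_predictions)]


# Three hypothetical views (RNA, amino acid, dipeptide) over four RBP labels.
rna_view = [1, 0, 1, 0]
amino_acid_view = [1, 1, 0, 0]
dipeptide_view = [0, 1, 1, 0]
print(majority_vote([rna_view, amino_acid_view, dipeptide_view]))  # [1, 1, 1, 0]
```

With an odd number of views a strict majority never ties; with an even number, this rule rejects tied labels.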

Uploaded: 2020-11-27

[Journal Article] Scoring disease-microRNA associations by integrating disease hierarchy into graph convolutional networks

Pattern Recognition, 2020, 105: 107385

2020-09-01

Abstract

In this study, we present DimiG 2.0, an updated predictor that uses a semi-supervised multi-label graph convolutional network (GCN) to infer disease-associated microRNAs (miRNAs) on an interaction network of protein-coding genes (PCGs) and miRNAs, using disease-PCG associations. DimiG 2.0 benefits from integrating the disease hierarchy into the GCN and has the following updates: 1) it incorporates the disease hierarchy to regularize the GCN, encouraging diseases in the hierarchy to share similar miRNAs; 2) it includes in model training PCGs that have interacting partners but no associated diseases, and these unlabeled PCGs enlarge the constructed interaction network; 3) it can predict associated miRNAs for 1017 diseases (up from 248); 4) it updates expression data across tissues to the latest GTEx v7, with expression values quantified in transcripts per million (TPM). Our results show that DimiG 2.0 outperforms state-of-the-art semi-supervised and supervised methods on the constructed benchmark sets.

Keywords: microRNAs, Protein coding genes, Interaction network, Graph convolutional network, Disease hierarchy
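DimiG 2.0 uses GTEx v7 expression quantified in TPM. For reference, the standard TPM definition is sketched below. GTEx distributes precomputed TPM values, so this only illustrates what the unit means, and the counts and lengths are made up.

```python
def tpm(counts, lengths):
    """Transcripts Per Million from read counts and transcript lengths.

    Each count is first divided by its transcript length (a per-length read
    rate); the rates are then rescaled so that they sum to one million.
    The length unit (bases or kilobases) cancels out in the rescaling.
    """
    rates = [c / l for c, l in zip(counts, lengths)]
    total = sum(rates)
    return [r / total * 1e6 for r in rates]


# The 2 kb transcript has twice the reads of the 1 kb one, so both are
# expressed at the same level; the short 250 b transcript is twice as high.
values = tpm([100, 200, 50], [1000, 2000, 250])
print([round(v) for v in values])  # [250000, 250000, 500000]
```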

Uploaded: 2020-11-27

[Journal Article] SPREAD: A Fully Automated Toolkit for Single-Particle Cryogenic Electron Microscopy Data 3D Reconstruction with Image-Network-Aided Orientation Assignment

J. Chem. Inf. Model., 2020, 60(5): 2614–2625

2020-01-28

Abstract

Over the past decade, cryogenic electron microscopy (cryo-EM) has become an important technology for determining three-dimensional (3D) structures of biomacromolecules. Many software tools have been developed for cryo-EM image processing and 3D reconstruction, covering various computational tasks in cryo-EM data analysis. Despite this progress, most of these tools focus on a single task, such as automatic particle picking or image clustering, and software packages covering the whole cryo-EM data processing pipeline are still few. In this study, we developed SPREAD, a fully automatic single-particle reconstruction and analysis toolkit for cryo-EM data that integrates 2D image classification, 3D initial model generation, model selection, and 3D refinement. In SPREAD, we adopt our previously proposed network-based clustering algorithm NCEM for 2D image classification, and the reference-free resolution measurement method SRes to automate model ranking and selection. Projection orientation assignment is one of the key steps in initial model generation and 3D refinement. In SPREAD, we use the network-based image similarity metric and introduce a new probabilistic orientation search method, named peak finding, to improve the assignment of projection orientations. To handle both particle images and projection images in 3D refinement, SPREAD builds a mixture image network containing both types of images based on the peak-finding results, and then recomputes node-pair similarities by a superposed random walk on the network. SPREAD achieves a fully automatic workflow that requires almost no expert domain knowledge or interactive manual operation. The software can be accessed free of charge at http://www.csbio.sjtu.edu.cn/bioinf/SPREAD/ for academic use.
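SPREAD recomputes node-pair similarities with a superposed random walk. As an illustration only, the sketch below implements the generic superposed random walk index from the link-prediction literature (Liu and Lü's local random walk, summed over steps); the degree normalisation of q and the toy graph are assumptions, and SPREAD's mixture image network and peak-finding construction are not reproduced.

```python
import numpy as np


def superposed_random_walk(adj, steps):
    """Superposed random walk (SRW) similarity on an undirected graph.

    adj: (n, n) symmetric non-negative weight matrix with no isolated nodes.
    Local random walk score after t steps:
        s_xy(t) = q_x * pi_xy(t) + q_y * pi_yx(t),
    where pi_xy(t) is the probability that a walk started at x is at y
    after t steps, and q_x is proportional to the degree of x.
    SRW adds these scores up for t = 1 .. steps.
    """
    deg = adj.sum(axis=1)
    trans = adj / deg[:, None]          # row-stochastic transition matrix
    q = deg / deg.sum()                 # initial resource ~ degree
    pi = np.eye(len(adj))               # row x: walk distribution from x
    score = np.zeros(adj.shape)
    for _ in range(steps):
        pi = pi @ trans                 # advance every walk by one step
        flow = q[:, None] * pi          # q_x * pi_xy(t)
        score += flow + flow.T
    return score


# Path graph 0-1-2: nodes 0 and 2 only become similar after two steps.
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
print(superposed_random_walk(adj, 1)[0, 2], superposed_random_walk(adj, 2)[0, 2] > 0)
```

Summing over steps lets pairs that are only indirectly connected still receive credit, which is useful when noise breaks some direct edges.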

Uploaded: 2020-11-27

[Journal Article] Protein–ligand binding residue prediction enhancement through hybrid deep heterogeneous learning of sequence and structure data

Bioinformatics, 2020, 36(10): 3018–3027

2020-05-15

Abstract

Motivation: Knowledge of protein–ligand binding residues is important for understanding the functions of proteins and their interaction mechanisms. Accurately identifying the potential binding sites of a specific ligand on an experimentally solved protein structure is still a challenging problem. Compared with structure-alignment-based methods, machine learning algorithms provide an alternative, flexible solution that is less dependent on annotated homologous protein structures. Several factors are important for an efficient protein–ligand prediction model, e.g. discriminative feature representation and an effective learning architecture that can handle large-scale, severely imbalanced data.

Results: In this study, we propose DELIA, a novel deep learning-based method for protein–ligand binding residue prediction. In DELIA, a hybrid deep neural network integrates 1D sequence-based features with 2D structure-based amino acid distance matrices. To overcome the severe data imbalance between binding and nonbinding residues, mini-batch oversampling, random undersampling, and stacking ensemble strategies are designed to enhance the model. Experimental results on five benchmark datasets demonstrate the effectiveness of the proposed DELIA pipeline.
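DELIA counters the binding/nonbinding imbalance partly by oversampling within mini-batches. A minimal sketch of one way to do that follows; the 50% positive fraction, the batch size, and the sampling-with-replacement scheme are assumptions, not DELIA's published settings. Random undersampling would instead keep only a random subset of negatives (for example with `random.sample`) before training.

```python
import random


def balanced_batches(pos_idx, neg_idx, batch_size, pos_fraction=0.5, seed=0):
    """Yield mini-batches that oversample the rare positive class.

    Positives (binding residues) are drawn with replacement for every batch;
    negatives are shuffled once and consumed without replacement, so one
    pass visits each negative exactly once (the last batch may be shorter).
    """
    if not pos_idx or not 0 < pos_fraction < 1:
        raise ValueError("need positives and 0 < pos_fraction < 1")
    rng = random.Random(seed)
    n_pos = max(1, round(batch_size * pos_fraction))
    n_neg = batch_size - n_pos
    if n_neg < 1:
        raise ValueError("batch_size too small for pos_fraction")
    negatives = list(neg_idx)
    rng.shuffle(negatives)
    for start in range(0, len(negatives), n_neg):
        batch = negatives[start:start + n_neg] + rng.choices(pos_idx, k=n_pos)
        rng.shuffle(batch)
        yield batch


# 10 binding vs 1,000 non-binding residues (hypothetical index ranges).
pos, neg = list(range(10)), list(range(10, 1010))
batches = list(balanced_batches(pos, neg, batch_size=32))
print(len(batches), sum(i < 10 for i in batches[0]))  # 63 16
```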

Uploaded: 2020-11-27

[Journal Article] CTNNB1/β-catenin dysfunction contributes to adiposity by regulating the cross-talk of mature adipocytes and preadipocytes

Science Advances, 2020, 6(2): eaax9605

2020-01-08

Abstract

Overnutrition results in adiposity and chronic inflammation with expansion of white adipose tissue (WAT). However, the genetic factors controlling fat mass and adiposity remain largely undetermined. We applied whole-exome sequencing to young obese subjects and identified rare gain-of-function mutations in CTNNB1/β-catenin associated with increased obesity risk. Specific ablation of β-catenin in mature adipocytes attenuated high-fat diet-induced obesity and reduced subcutaneous WAT (sWAT) mass expansion, with fewer proliferating Pdgfrα+ preadipocytes and fewer mature adipocytes. Mechanistically, β-catenin regulated the transcription of serum amyloid A3 (Saa3), an adipocyte-derived chemokine, through the β-catenin-TCF (T-cell-specific transcription factor) complex in mature adipocytes, and Saa3 activated macrophages to secrete several factors, including Pdgf-aa, which further promoted the proliferation of preadipocytes. These results suggest that β-catenin/Saa3/macrophages may mediate mature adipocyte-preadipocyte cross-talk and fat expansion in sWAT. The identification of β-catenin as a key regulator of fat expansion and human adiposity provides a basis for developing drugs targeting the Wnt/β-catenin pathway to combat obesity.
