Zhu Xiaoyan (朱小燕)
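- Name: Zhu Xiaoyan (朱小燕)
- Academic title: doctoral supervisor
- Discipline: computer applications
- Research interests: pattern recognition (Chinese character recognition, speech recognition), artificial intelligence, human-computer interaction, artificial neural networks, bioinformatics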
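Zhu Xiaoyan graduated from the University of Science and Technology Beijing in 1982, received a master's degree from Kobe University, Japan, in 1987, and received a Ph.D. from Nagoya Institute of Technology, Japan, in 1990. From 1990 to 1993, Zhu was a chief engineer at Electrodyne Research Institute Co., Ltd. ((株)エレクトロダイン研究所) in Japan, and returned to China in 1993. Zhu is currently a professor and doctoral supervisor in the Department of Computer Science at Tsinghua University. Main research areas are pattern recognition (Chinese character recognition, speech recognition), artificial intelligence, human-computer interaction, artificial neural networks, and bioinformatics, with more than 90 published papers. As principal investigator, Zhu has led projects under the 973 Program, Project 985, and the National Natural Science Foundation of China, as well as several international collaborative projects. The handwritten digit recognition system Zhu developed was rated "leading in China and advanced internationally." The "Offline Handwritten Chinese Character and Digit Recognition System" won a second prize of the State Education Commission Science and Technology Progress Award in January 1998.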
[Journal paper] Various Features with Integrated Strategies for Protein Name Classification
Budi Taruna Ongkowijaya, Shilin Ding, and Xiaoyan Zhu
ISPA Workshops 2005, LNCS 3759, pp. 213-222, 2005.
Classification is an integral part of a named entity recognition system: it assigns each recognized named entity to its corresponding class. This task has received little attention in the biomedical domain, because previous studies did not distinguish between feature sources and strategies. In this research, we analyze different sources and strategies for protein name classification and develop integrated strategies that combine the advantages of rule-based, dictionary-based, and statistical methods. The rule-based method uses terms and knowledge of protein nomenclature that provide strong cues for protein names. The dictionary-based method uses a set of rules to curate a protein name dictionary. These terms and dictionaries are combined with our own features in a statistical classifier; these features consist of word shape features and unigram and bigram features. With these information sources and integrated strategies, our system achieves state-of-the-art performance in classifying protein and non-protein names.
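A minimal sketch of the word shape and unigram/bigram features named above. It assumes a simple shape scheme (uppercase to `A`, lowercase to `a`, digits to `0`, other characters kept, repeated classes collapsed) and hypothetical feature-name prefixes (`uni=`, `bi=`, `shape=`). The abstract does not give the paper's exact feature definitions or classifier. A feature set like this would be handed to whatever statistical classifier is used.

```python
def word_shape(token):
    """Map each character to a class and collapse repeated classes.
    Shape scheme is an illustrative assumption, not the paper's definition."""
    classes = []
    for ch in token:
        if ch.isupper():
            classes.append("A")
        elif ch.islower():
            classes.append("a")
        elif ch.isdigit():
            classes.append("0")
        else:
            classes.append(ch)  # keep punctuation such as '-' as-is
    # Collapse consecutive duplicates, e.g. ["A", "A", "-", "0"] -> "A-0"
    return "".join(c for i, c in enumerate(classes) if i == 0 or c != classes[i - 1])


def name_features(name):
    """Unigram, bigram, and word-shape features for a candidate name
    (hypothetical feature prefixes)."""
    tokens = name.split()
    feats = set()
    for tok in tokens:
        feats.add("uni=" + tok.lower())
        feats.add("shape=" + word_shape(tok))
    for left, right in zip(tokens, tokens[1:]):
        feats.add("bi=" + left.lower() + "_" + right.lower())
    return feats
```

Shape features let the classifier generalize across names such as `IL-2` and `IL-6`, which share the shape `A-0` even though their unigrams differ.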
[Journal paper] Segmentation of Mandarin Braille Word and Braille Translation Based on Multi-knowledge
Jiang Minghu, Zhu Xiaoyan, Xia Ying, Tan Gang, Yuan Baozong, Tang Xiaofang
Proceedings of ICSP 2000
This paper addresses the segmentation of Braille words and the translation of Mandarin Braille into Chinese characters. The Braille word segmentation module consists of a rule base, a base of segmentation signs, and a knowledge base for disambiguation and error handling. Using adjacency constraints and bidirectional maximum matching against a dictionary, the system achieves a segmentation precision above 99% on common text. By incorporating a pinyin knowledge dictionary, we resolved the ambiguity in translating Braille to pinyin, and we developed a statistical language model for converting pinyin into characters. Using a multi-knowledge base to disambiguate each pinyin sentence, we built a multi-level graph, used Viterbi search to find the maximum-likelihood sequence of Chinese characters, and used an N-best algorithm to obtain the N most likely character sequences. Experimental results show that the system's overall precision for translating Braille codes into Chinese characters is 94.38%.
Keywords: Braille translation, Viterbi algorithm, smoothing method, multi-knowledge
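Bidirectional maximum matching, as used for the segmentation step above, can be sketched as follows. The example runs on Chinese characters for readability, whereas the paper applies the idea to Braille words. The tie-breaking rule (fewer words first, then fewer single-symbol words, otherwise the backward result) is a common convention assumed here, not necessarily the paper's. The paper also uses adjacency constraints, which this sketch omits.

```python
def forward_max_match(text, dictionary, max_len):
    """Greedy left-to-right segmentation: take the longest dictionary word at each position."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:  # fall back to a single symbol
                words.append(piece)
                i += size
                break
    return words


def backward_max_match(text, dictionary, max_len):
    """Greedy right-to-left segmentation."""
    words, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in dictionary:
                words.insert(0, piece)
                j -= size
                break
    return words


def bidirectional_max_match(text, dictionary):
    max_len = max((len(w) for w in dictionary), default=1)
    fwd = forward_max_match(text, dictionary, max_len)
    bwd = backward_max_match(text, dictionary, max_len)
    if fwd == bwd:
        return fwd

    # Assumed heuristic: fewer words, then fewer single-symbol words; ties go to backward.
    def score(seg):
        return (len(seg), sum(1 for w in seg if len(w) == 1))

    return fwd if score(fwd) < score(bwd) else bwd
```

For `"研究生命起源"` with a small dictionary containing `研究`, `研究生`, `生命`, and `起源`, forward matching greedily takes `研究生` and strands `命`, while backward matching recovers `研究 / 生命 / 起源`. The two passes disagree exactly where ambiguity exists, which is what makes comparing them useful.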
Zhu Xiaoyan (朱小燕), Wang Yu (王昱), Xu Wei (徐伟)
Chinese Journal of Computers (计算机学报), 2001, 24(2): 213-218
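In recent years, speech recognition based on hidden Markov models (HMMs) has advanced considerably. However, HMMs have inherent limitations, and overcoming the problems caused by the first-order assumption and the independence assumption has long been an active research topic. Introducing neural networks into speech recognition is one way to overcome these limitations. This paper applies recurrent neural networks to Mandarin speech recognition, modifies the original network model, and proposes a corresponding training method. Experimental results show that the model handles continuous signals well and performs on par with conventional HMMs. The new training strategy both speeds up training and markedly improves the model's classification performance.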
Keywords: speech recognition, hidden Markov model (HMM), recurrent neural network
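The abstract does not describe the modified network or the training method. The sketch below therefore shows only a plain Elman-style recurrent forward pass over acoustic frames, h_t = tanh(W_in x_t + W_rec h_{t-1}) and y_t = softmax(W_out h_t). The weights are tiny hypothetical values, and no training is shown. The recurrent term lets the output at frame t depend on earlier frames, which is how such networks sidestep the HMM's assumption that observations are conditionally independent.

```python
import math


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, x):
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def elman_forward(frames, W_in, W_rec, W_out):
    """Per-frame class posteriors from a simple recurrent network.
    W_in: hidden x input, W_rec: hidden x hidden, W_out: classes x hidden."""
    hidden = [0.0] * len(W_rec)  # initial state h_0 = 0
    outputs = []
    for x in frames:
        pre = [a + b for a, b in zip(matvec(W_in, x), matvec(W_rec, hidden))]
        hidden = [math.tanh(v) for v in pre]
        outputs.append(softmax(matvec(W_out, hidden)))
    return outputs
```

Feeding the same frame twice yields different posteriors on the second step, because the hidden state carries context forward.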
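[Journal paper] Speech and Natural Language Processing Techniques in a Computer Software System for the Blind (盲人用计算机软件系统中的语音和自然语言处理技术)
Zhu Xiaoyan (朱小燕), Zhuang Li (庄丽), Bao Ta (包塔)
Journal of Chinese Information Processing (中文信息学报), 2004, 18(4): 72-78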
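This paper describes the speech and language processing techniques used in the "北极光" ("Northern Lights") computer software system for blind users, developed at the State Key Laboratory of Intelligent Technology and Systems. The system captures and analyzes the on-screen information that must be fed back to the user and reads it aloud through a speech synthesis platform as spoken prompts. Combined with natural language processing techniques such as automatic Chinese word segmentation and language modeling, the system can convert between Chinese characters and Braille. Feedback can be output on a refreshable Braille display so that users obtain information by touch-reading the Braille dots. Users can also type with a Braille input method, and the input is converted into Chinese text.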
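Keywords: computer applications, Chinese information processing, speech synthesis, text analysis, automatic Chinese word segmentation, language model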
Zhu Xiaoyan (朱小燕), Shi Yifan (史一凡)
Chinese Journal of Computers (计算机学报), 2002, 25(5): 470-482
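This paper proposes a feedback-based method for handwritten character recognition. The method applies an artificial neural network architecture and learning algorithm to the system's feedback mechanism and proves theoretically that the learning method converges, which guarantees the algorithm's validity. It also gives visualization constraints on the feedback and criteria for judging the feedback. Experimental results show that the method substantially reduces the recognition error rate on heavily noisy handwritten digits. The method points to a new way of further improving the performance of handwritten character recognition systems.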
Keywords: handwritten character recognition, neural network, feedback
Zhu Xiaoyan (朱小燕), Huang Minlie (黄民烈)
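Research on dialogue systems has become a new focus in human-computer interaction, and dialogue management is the most important component of such systems. Building on existing approaches to dialogue management, this paper proposes a slot-feature-based automaton design that uses state compression and partitions the state set and action set into subspaces. Taking the confirmation process as an example, it explains the confirmation-strategy control function and its effect on the course of the dialogue. The paper also proposes a tree-structured intention hierarchy and applies it to topic detection and topic switching, solving the topic-switching problem in multi-topic dialogue systems. Finally, experiments show that the proposed design performs well in strategy control, topic detection, and topic switching, and is reasonably extensible.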
Keywords: dialogue system, human-computer interaction, dialogue management, dialogue strategy control
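A minimal sketch of slot-based dialogue control with a confirmation step, under stated assumptions. Each slot is summarized by one of three statuses, which is one simple way to compress the state. A hypothetical confidence-threshold function (accept at 0.8 or above, re-ask below 0.4, confirm in between) stands in for the confirmation-strategy control function. The paper's actual state compression, subspace partitioning, and control function are not given in the abstract, and the slot names are hypothetical.

```python
from enum import Enum


class SlotStatus(Enum):
    EMPTY = 0        # no usable value yet
    UNCONFIRMED = 1  # value heard, but needs explicit confirmation
    CONFIRMED = 2    # value accepted


def slot_status(confidence, accept=0.8, reject=0.4):
    """Hypothetical confirmation control: map recognizer confidence to a slot status."""
    if confidence is None or confidence < reject:
        return SlotStatus.EMPTY
    if confidence < accept:
        return SlotStatus.UNCONFIRMED
    return SlotStatus.CONFIRMED


def next_action(slots, required):
    """Choose the next system action from the compressed slot-status state."""
    for name in required:
        if slots.get(name, SlotStatus.EMPTY) is SlotStatus.EMPTY:
            return ("ask", name)
    for name in required:
        if slots[name] is SlotStatus.UNCONFIRMED:
            return ("confirm", name)
    return ("execute", None)
```

Moving `accept` up or `reject` down shifts how often the system confirms. That trade-off between dialogue length and error risk is what a confirmation-strategy control function tunes.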
Zhu Xiaoyan (朱小燕), Wang Dong (王东), Liu Ying (刘盈)
Journal of Software (软件学报), 2002, 14(9): 1523-1529
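Speech recognition has made steady progress, and many practical systems have appeared. Current recognition technology, however, still falls well short of requirements, and robustness is a major obstacle to improving system performance. This paper focuses on channel normalization, a common approach to countering the fragility of speech recognition systems, and proposes a new normalization strategy, multi-layer channel normalization (MLCN). The new algorithm applies recursive compensation and normalizes at two levels, the spectral domain and the cepstral domain, to reduce noise and remove channel distortion, thereby providing more robust feature parameters for subsequent recognition. On this basis, the paper explores a new speech recognition feature, the frequency-domain dynamic cepstral coefficient. With MLCN in place, frequency-domain dynamic information is properly integrated into the final feature vector. Experiments on the gallina system demonstrate the effectiveness of the new method.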
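Keywords: speech recognition, feature extraction, Mel-frequency cepstral coefficients, channel normalization, frequency-domain dynamic features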
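The abstract does not detail MLCN's recursive compensation. As a simpler stand-in, the sketch below implements recursive cepstral mean normalization in the cepstral domain only. It is a different, single-layer technique, not MLCN. The rationale is that a stationary linear channel multiplies the spectrum, which becomes an additive constant after the log and cepstral transform. Subtracting a running mean therefore removes that constant. The smoothing factor `alpha` is an assumed value.

```python
def recursive_cmn(frames, alpha=0.995, init_mean=None):
    """Recursive cepstral mean normalization (not the paper's MLCN).
    mean_t = alpha * mean_{t-1} + (1 - alpha) * c_t; output c_t - mean_t.
    frames: list of cepstral vectors (lists of floats)."""
    if not frames:
        return []
    # Initialize the running mean from the first frame unless one is supplied.
    mean = list(init_mean) if init_mean is not None else list(frames[0])
    out = []
    for c in frames:
        mean = [alpha * m + (1 - alpha) * v for m, v in zip(mean, c)]
        out.append([v - m for v, m in zip(c, mean)])
    return out
```

Because the running mean absorbs any constant cepstral offset, adding the same channel vector to every frame leaves the normalized output unchanged. A smaller `alpha` tracks a changing channel faster but also removes more of the genuine speech variation.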
Zhu Xiaoyan (朱小燕), Yan Binfeng (严斌峰)
Journal of Software (软件学报), 2003, 14(12): 2014-2020
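This paper proposes a speech recognition verification method based on a joint probability likelihood score and a probability likelihood ratio score. The likelihood ratio score is taken into account while search path scores are computed, so the system outputs a confidence level along with its final recognition result. Experimental results show that the method greatly reduces the system's false alarm rate while keeping recognition accuracy essentially unchanged.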
Keywords: likelihood ratio test, alternative model, utterance verification, speech recognition
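A minimal sketch of likelihood-ratio-based utterance verification. The recognized (target) model's log-likelihood is compared with an alternative model's, the difference is normalized by the number of frames, and the result is thresholded. The paper folds the ratio into path scoring during search. This sketch shows only a post-hoc accept/reject decision, and the threshold of 0.5 and per-frame normalization are assumptions.

```python
def verification_score(target_loglik, alt_loglik, num_frames):
    """Frame-normalized log-likelihood ratio: log p(O|target) - log p(O|alternative)."""
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    return (target_loglik - alt_loglik) / num_frames


def verify(target_loglik, alt_loglik, num_frames, threshold=0.5):
    """Accept the hypothesis only if its normalized LLR reaches the (assumed) threshold.
    Raising the threshold lowers false alarms at the cost of more false rejections."""
    score = verification_score(target_loglik, alt_loglik, num_frames)
    return score >= threshold, score
```

For example, `verify(-1000.0, -1100.0, 100)` gives a normalized score of 1.0 and accepts, while `verify(-1000.0, -1020.0, 100)` gives 0.2 and rejects. The score itself can serve as the confidence level reported alongside the result.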
[Journal paper] Discovering patterns to extract protein-protein interactions from full texts
Minlie Huang, Xiaoyan Zhu, Yu Hao, Donald G. Payan, Kunbin Qu and Ming Li
Bioinformatics, Vol. 20, No. 18, 2004, pp. 3604-3612
Motivation: Although several databases store protein-protein interactions, most such data still exist only in the scientific literature, scattered across texts written in natural language that defy data mining efforts. Much time and labor must be spent extracting protein pathways from the literature. Our aim is to develop a robust and powerful methodology to mine protein-protein interactions from biomedical texts.
Results: We present a novel and robust approach for extracting protein-protein interactions from the literature. Our method uses a dynamic programming algorithm to compute distinguishing patterns by aligning relevant sentences and the key verbs that describe protein interactions. A matching algorithm is designed to extract the interactions between proteins. Equipped only with a dictionary of protein names, our system achieves a recall of 80.0% and a precision of 80.5%.
Availability: The program is available on request from the authors.
Contact: zxy-dcs@tsinghua.edu.cn; mli@uwaterloo.ca
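The pattern-discovery idea above (align sentences with dynamic programming, keep what they share, then match) can be illustrated with a longest-common-subsequence alignment. LCS is a simplified stand-in chosen here. The paper's alignment scoring and its treatment of key verbs are not given in the abstract. The `PROTEIN` placeholder for dictionary-matched names is also an assumption.

```python
def align_pattern(a, b):
    """LCS alignment of two token lists; the shared tokens, in order, form a candidate pattern."""
    n, m = len(a), len(b)
    # dp[i][j] = LCS length of a[i:] and b[j:]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Trace forward through the table to recover one common subsequence.
    pattern, i, j = [], 0, 0
    while i < n and j < m:
        if a[i] == b[j]:
            pattern.append(a[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return pattern


def matches(pattern, sentence):
    """True if the pattern's tokens occur in the sentence in order (gaps allowed)."""
    it = iter(sentence)
    return all(tok in it for tok in pattern)  # 'in' consumes the iterator
```

Aligning "PROTEIN strongly interacts with PROTEIN in vivo" with "PROTEIN interacts directly with PROTEIN" yields the pattern `PROTEIN interacts with PROTEIN`, which then matches new sentences that contain those tokens in order.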
[Journal paper] A hybrid method for relation extraction from biomedical literature
Minlie Huang, Xiaoyan Zhu, Ming Li
International Journal of Medical Informatics (2006) 75, 443-455
Purpose: In recent years there has been growing interest in extracting entities and relations from biomedical literature. Many systems and approaches have been proposed to extract biological relations, but none of them achieves satisfactory results. Existing methods, whether parsing-based or pattern-based, either cannot handle the grammatical complexity of biomedical texts or are too complicated to adapt. Grammatical structures such as appositive and coordinative constructions are extremely common in biomedical texts, particularly in full texts, yet most researchers have not addressed them.
Methods: In this paper, we propose a new hybrid approach that combines shallow parsing and pattern matching to extract relations between proteins from biomedical scientific papers. Appositive and coordinative structures are interpreted from the shallow parsing analysis under both syntactic and semantic constraints. Long sentences are then split into sub-sentences, from which relations are extracted by a greedy pattern matching algorithm using automatically generated patterns.
Results: We applied the approach to extracting protein-protein interactions from full biomedical texts. It achieved an average F-score of 80% on individual verbs and 66% on all verbs. Shallow parsing analysis improves pattern matching remarkably: compared with the traditional pattern matching algorithm, our approach improves both precision and F-score by about 7%, and its performance is comparable to the best of the other systems. A demo system is available at http://spies.cs.tsinghua.edu.cn.
Keywords: Natural language processing, NLP, Information extraction, Relation extraction, Shallow parsing, Pattern matching, Appositive structure, Coordinative structure, Protein-protein interaction
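A deliberately crude sketch of why interpreting coordinative structures matters. A sentence such as "ProtA, ProtB and ProtC activates ProtD" should yield one relation per agent-target pair rather than a single unmatched pattern. The paper does this with shallow parsing under syntactic and semantic constraints, which this sketch does not attempt. Here, every known protein before the verb is treated as an agent and every one after it as a target. The protein names and the function are hypothetical.

```python
def split_coordination(tokens, proteins, verb):
    """Expand a coordinated 'X1, X2 and X3 <verb> Y1 and Y2' into pairwise relations.
    Naive assumption: known proteins left of the verb are agents, right of it are targets."""
    if verb not in tokens:
        return []
    k = tokens.index(verb)  # first occurrence of the verb
    agents = [t for t in tokens[:k] if t in proteins]
    targets = [t for t in tokens[k + 1:] if t in proteins]
    return [(a, verb, t) for a in agents for t in targets]
```

A real system also has to handle appositives ("ProtA, a kinase, activates ProtD"), which this split would wrongly treat as coordination. That is the kind of case the paper's syntactic and semantic constraints address.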