您当前所在位置: 首页 > 学者

俞凯

  • 50浏览

  • 0点赞

  • 0收藏

  • 0分享

  • 0下载

  • 0评论

  • 引用

期刊论文

Phone Synchronous Speech Recognition With CTC Lattices

暂无

IEEE/ACM Transactions on Audio, Speech and Language Processing,2017,25(1): | 2017年01月01日 | doi.org/10.1109/TASLP.2016.2625459

URL:https://dl.acm.org/doi/10.1109/TASLP.2016.2625459

摘要/描述

Connectionist temporal classification CTC has recently shown improved performance and efficiency in automatic speech recognition. One popular decoding implementation is to use a CTC model to predict the phone posteriors at each frame and then perform Viterbi beam search on a modified WFST network. This is still within the traditional frame synchronous decoding framework. In this paper, the peaky posterior property of CTC is carefully investigated and it is found that ignoring blank frames will not introduce additional search errors. Based on this phenomenon, a novel phone synchronous decoding framework is proposed by removing tremendous search redundancy due to blank frames, which results in significant search speed up. The framework naturally leads to an extremely compact phone-level acoustic space representation: CTC lattice. With CTC lattice, efficient and effective modular speech recognition approaches, second pass rescoring for large vocabulary continuous speech recognition LVCSR, and phone-based keyword spotting KWS, are also proposed in this paper. Experiments showed that phone synchronous decoding can achieve 3-4 times search speed up without performance degradation compared to frame synchronous decoding. Modular LVCSR with CTC lattice can achieve further WER improvement. KWS with CTC lattice not only achieved significant equal error rate improvement, but also greatly reduced the KWS model size and increased the search speed.

关键词:

学者未上传该成果的PDF文件,请等待学者更新

我要评论

全部评论 0

本学者其他成果

    同领域成果