
俞凯


Journal article

Investigating Raw Wave Deep Neural Networks for End-to-End Speaker Spoofing Detection


IEEE/ACM Transactions on Audio, Speech and Language Processing, 2018, 26(11) | November 1, 2018 | doi.org/10.1109/TASLP.2018.2851155

URL:https://dl.acm.org/doi/10.1109/TASLP.2018.2851155

Abstract

Recent advances in automatic speaker verification (ASV) have led to increased interest in securing these systems for real-world applications. Malicious spoofing attempts against ASV systems can lead to serious security breaches. In the context of ASV, a spoofing attack is a situation in which a potentially malicious person successfully masquerades as another person already known to the ASV system by falsifying or manipulating data. While most previous work focuses on enhanced, spoof-aware features, end-to-end models are a potential alternative. In this paper, we investigate training raw-waveform front-ends for deep convolutional, long short-term memory (LSTM), and vanilla neural networks, and analyze their suitability for spoofing detection with respect to frame size, number of output neurons, and sequence length. We propose a joint convolutional LSTM neural network (CLDNN), which outperforms previous attempts on the BTAS2016 dataset, reducing the half total error rate (HTER) from 0.82% to 0.19% and making it the current state-of-the-art model for this dataset. We show that end-to-end approaches are appropriate for the important task of replay detection and that the proposed model is capable of distinguishing device-invariant spoofing attempts. On the ASVspoof2015 dataset, the end-to-end solution achieves an equal error rate (EER) of 0.00% for the S1-S9 conditions. We show that an end-to-end approach based on raw waveform input can outperform common cepstral features without using context-dependent frame extensions. In addition, we evaluate a cross-database domain-mismatch scenario, in which the proposed CLDNN model trained on the BTAS2016 dataset achieves an EER of 25.7% on the ASVspoof2015 dataset.
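The abstract reports results as HTER (BTAS2016) and EER (ASVspoof2015). As a rough illustration of what these two numbers measure, the sketch below computes both from lists of detection scores. It assumes that higher scores mean "more likely genuine", that the HTER threshold is supplied by the caller (in practice it is usually fixed beforehand, e.g. on a development set), and that the EER is approximated by sweeping thresholds over the observed scores. The function names `error_rates`, `hter`, and `eer` are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple


def error_rates(genuine: List[float], spoof: List[float], threshold: float) -> Tuple[float, float]:
    """Return (FAR, FRR) at a given threshold.

    Assumption: a higher score means the trial is more likely genuine.
    FAR (false acceptance rate): fraction of spoofed trials accepted (score >= threshold).
    FRR (false rejection rate): fraction of genuine trials rejected (score < threshold).
    """
    far = sum(s >= threshold for s in spoof) / len(spoof)
    frr = sum(s < threshold for s in genuine) / len(genuine)
    return far, frr


def hter(genuine: List[float], spoof: List[float], threshold: float) -> float:
    """Half total error rate: the mean of FAR and FRR at a fixed threshold."""
    far, frr = error_rates(genuine, spoof, threshold)
    return (far + frr) / 2


def eer(genuine: List[float], spoof: List[float]) -> float:
    """Approximate equal error rate.

    Sweeps every observed score (plus +inf) as a candidate threshold, picks the
    one where |FAR - FRR| is smallest, and returns the mean of FAR and FRR there.
    """
    candidates = sorted(set(genuine) | set(spoof)) + [float("inf")]
    best_gap, best_rate = None, None
    for t in candidates:
        far, frr = error_rates(genuine, spoof, t)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_rate = gap, (far + frr) / 2
    return best_rate
```

For example, with genuine scores `[0.9, 0.8, 0.4, 0.7]` and spoof scores `[0.1, 0.6, 0.2, 0.3]`, one spoof is accepted and one genuine trial is rejected at threshold 0.5, so `hter(..., 0.5)` is 0.25; perfectly separated scores give an EER of 0.0, the kind of result the abstract reports for ASVspoof2015 S1-S9.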
