Self-supervised learning based knowledge distillation framework for automatic speech recognition for the hearing impaired
Keywords:
Automatic Speech Recognition, Self-Supervised, Knowledge Distillation, WER, Deep Learning
Abstract
Speech processing applications, particularly speech recognition, have received considerable attention in recent decades. Recent research has focused on applying deep learning to speech-related tasks; this branch of machine learning has outperformed alternatives across a range of applications, including speech, and has therefore become a particularly active research area. Noise, speaker variability, language variability, vocabulary size, and domain mismatch remain among the most significant challenges in speech recognition. We investigate self-supervised algorithms for learning from unlabelled data. These algorithms have progressed significantly in recent years, with their efficacy approaching that of supervised pre-training alternatives across data modalities such as image and video. The purpose of this research is to develop powerful models for audio speech recognition that do not require human annotation. We accomplish this by distilling knowledge from an automatic speech recognition (ASR) model trained on a large audio-only corpus. Our distillation technique combines a Connectionist Temporal Classification (CTC) loss with a Kullback-Leibler (KL) divergence loss. We demonstrate that distillation significantly speeds up training, and we evaluate our model using the Word Error Rate (WER) metric.
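The abstract names Word Error Rate (WER) as the evaluation metric. As a minimal illustration (not the authors' evaluation code), WER is the word-level Levenshtein edit distance between a reference transcript and a hypothesis, divided by the number of reference words:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: (substitutions + deletions + insertions) / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Dynamic-programming table: d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution (or match)
    return d[len(ref)][len(hyp)] / len(ref)

# One substitution ("sat" -> "sit") and one deletion ("the") over 6 reference words
print(wer("the cat sat on the mat", "the cat sit on mat"))  # 0.333...
```

A lower WER indicates better recognition; in practice, transcripts are typically normalized (case, punctuation) before scoring.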
Copyright (c) 2022 International Journal of Health Sciences

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.