Self-supervised learning based knowledge distillation framework for automatic speech recognition for hearing impaired

https://doi.org/10.53730/ijhs.v6nS1.7865

Authors

  • L. Ashok Kumar, Professor, Dept. of EEE, PSG College of Technology
  • D. Karthika Renuka, Associate Professor, Dept. of IT, PSG College of Technology
  • Shunmuga Priya M C, Research Scholar, Dept. of IT, PSG College of Technology
  • Madhumitha G, UG Student, Dept. of IT, PSG College of Technology
  • Priyanka S, UG Student, Dept. of IT, PSG College of Technology
  • Sangeeth M, UG Student, Dept. of IT, PSG College of Technology
  • Subhiksha R, UG Student, Dept. of IT, PSG College of Technology

Keywords:

Automatic Speech Recognition, Self-Supervised, Knowledge Distillation, WER, Deep Learning

Abstract

Speech processing applications, particularly speech recognition, have received considerable attention in recent decades. In recent years, research has focused on applying deep learning to speech-related tasks. This branch of machine learning has outperformed other approaches across a range of applications, including speech, and has therefore become a particularly appealing research subject. Noise, speaker variability, language variability, vocabulary size, and domain mismatch remain among the most significant challenges in speech recognition. We investigate self-supervised algorithms for learning from unlabelled data. Such algorithms have progressed significantly in recent years, with their efficacy approaching that of supervised pre-training alternatives across data modalities such as image and video. The purpose of this research is to develop powerful models for audio speech recognition that do not require human annotation. We accomplish this by distilling knowledge from an automatic speech recognition (ASR) model trained on a large audio-only corpus, integrating Connectionist Temporal Classification (CTC) loss and Kullback-Leibler (KL) divergence loss in the distillation technique. We demonstrate that distillation significantly speeds up training, and we evaluate our model using Word Error Rate (WER) as the evaluation metric.
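
The abstract describes the loss design only at a high level, so the sketch below is one plausible PyTorch reading of it rather than the authors' implementation: a student trained with a CTC term on the ground-truth transcripts plus a temperature-scaled KL-divergence term that matches a frozen teacher's frame-level output distribution, together with a plain edit-distance WER for evaluation. The tensor shapes, blank index, temperature, and the weighting factor alpha are all assumptions.

```python
# Illustrative sketch only: shapes, blank index, temperature, and alpha are
# assumptions for exposition, not values reported in the paper.
import torch.nn.functional as F


def distillation_loss(student_logits, teacher_logits, targets,
                      input_lengths, target_lengths,
                      temperature=2.0, alpha=0.5):
    """Combine a supervised CTC term with a soft-target KL-divergence term.

    student_logits, teacher_logits: (T, N, C) frame-level logits
    targets:                        (N, S) label indices (blank = 0 assumed)
    input_lengths, target_lengths:  as expected by F.ctc_loss
    """
    # Supervised CTC loss on the ground-truth transcripts.
    log_probs = F.log_softmax(student_logits, dim=-1)
    ctc = F.ctc_loss(log_probs, targets, input_lengths, target_lengths,
                     blank=0, zero_infinity=True)

    # KL divergence between temperature-softened teacher and student
    # frame-level distributions (standard soft-target distillation).
    soft_teacher = F.softmax(teacher_logits.detach() / temperature, dim=-1)
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    kl = F.kl_div(soft_student, soft_teacher,
                  reduction="batchmean") * temperature ** 2

    # Weighted sum; alpha balances supervision against distillation.
    return alpha * ctc + (1.0 - alpha) * kl


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: (substitutions + deletions + insertions) / reference
    word count, computed with a word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution / match
    return d[len(ref)][len(hyp)] / max(len(ref), 1)
```

For example, wer("the cat sat", "the cat sit") gives 1/3 (one substitution over three reference words). The temperature-squared scaling on the KL term follows the standard soft-target distillation recipe so that the gradient magnitudes of the two terms remain comparable as the temperature changes; whether the paper uses a temperature at all is not stated.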

Published

25-05-2022

How to Cite

Kumar, L. A., Renuka, D. K., Shunmuga, P. M. C., Madhumitha, G., Priyanka, S., Sangeeth, M., & Subhiksha, R. (2022). Self-supervised learning based knowledge distillation framework for automatic speech recognition for hearing impaired. International Journal of Health Sciences, 6(S1), 11728–11737. https://doi.org/10.53730/ijhs.v6nS1.7865

Issue

Vol. 6 No. S1 (2022)

Section

Peer Review Articles